Scaling with Your Data: An Introduction to Hadoop

Development, Workshop
Location: 2002
Average rating: 2.60 (5 ratings)

Cloudera will present a tutorial aimed at producers and users of large volumes of data. Do you deal with terabytes on a regular basis? Are traditional databases not doing what you need? Are your challenges related primarily to processing and analyzing data, rather than simply finding it? Hadoop and MapReduce might be just what you need. Google developed an integrated storage and processing framework to scale with the web. After Google published its results, the Apache Software Foundation, with major contributions from Yahoo!, Facebook, and others, got the Hadoop project off the ground. Hadoop provides a fully open source implementation of the same kind of system Google uses to perform deep analysis on web-scale data.

This half-day tutorial will teach you what you need to know to work more deeply with Hadoop, and help you think about the following questions:

  • What must our organization do differently to effectively use large-scale data?
  • What tools help us analyze large-scale data and extract meaningful results, and how do we use them?
  • How can we reorient our data generation and collection processes to enable more powerful analysis later?

We’ll cover the basics of working with large-scale data systems and introduce participants to the MapReduce programming model. More importantly, we’ll focus on how to “think in MapReduce.” We’ll walk through examples of converting common tasks into MapReduce, and provide the foundations that enable you to convert your own specific tasks to this model. We’ll also point you to the resources you need to get up and running with Hadoop in your own data center or in the cloud.
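To give a flavor of what “thinking in MapReduce” means, here is a minimal sketch of the model in plain Python (not Hadoop itself, and not part of the tutorial materials): a mapper emits key-value pairs, the framework shuffles them by key, and a reducer aggregates each group. The classic word-count task looks like this:

```python
from collections import defaultdict

def map_phase(documents):
    # Mapper: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does automatically between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

In real Hadoop the same mapper and reducer logic is distributed across a cluster, with the shuffle handled by the framework; the conceptual shape of the program, however, is exactly this.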


Christophe Bisciglia

Cloudera, Inc

Christophe Bisciglia joins Cloudera from Google, where he created and managed their Academic Cloud Computing Initiative. Starting in 2007, he began working with the University of Washington to teach students about Google’s core data management and processing technologies – MapReduce and GFS. This quickly brought Hadoop into the curriculum, and has since resulted in an extensive partnership with the National Science Foundation (NSF) which makes Google-hosted Hadoop clusters available for research and education worldwide. Beyond his work with Hadoop, he holds patents related to search quality and personalization, and spent a year working in Shanghai. Christophe earned his degree, and remains a visiting scientist, at the University of Washington.


Aaron Kimball

Cloudera, Inc.

Aaron Kimball is a software engineer at Cloudera, Inc., the commercial Hadoop company. Aaron is the principal developer of Sqoop, the SQL-to-Hadoop database import/export tool. Aaron has been working with Hadoop since early 2007 and contributes actively to its development. Through Cloudera, he additionally provides training to developers and system administrators working with Hadoop. Aaron holds a B.S. in Computer Science from Cornell University and an M.S. in Computer Science and Engineering from the University of Washington.
