Developing applications for Apache Hadoop

Sarah Sproehnle (Cloudera, Inc.)
Data Science, GA J
Please note: to attend, your registration must include Tutorials.
Average rating: 4.25 (4 ratings)

This tutorial explains how to use a Hadoop cluster for data analysis with Java MapReduce, Apache Hive, and Apache Pig. Experience with at least one programming language is recommended. Topics include:

  • Why are Hadoop and MapReduce needed?
  • Writing a Java MapReduce program
  • Common algorithms applied on Hadoop, such as indexing, classification, joining data sets, and graph processing
  • Data analysis with Hive and Pig
  • Overview of writing applications that use Apache HBase

Some programming experience is strongly recommended for this session.

Sarah Sproehnle

Cloudera, Inc.

Sarah Sproehnle is the Director of Educational Services for Cloudera,
where she helps customers learn to use Apache Hadoop for big data
processing. Cloudera provides commercial support, training and
services for the Apache Hadoop platform.

Comments on this page are now closed.


Sophia DeMartini
02/28/2012 7:19am PST

Hi Sam,

If the speaker provides us with their slides, we'll post them as soon as the session is over.

Thanks, Sophia

Sam Keen
02/28/2012 6:52am PST

Will the slides be available after the session?

Sophia DeMartini
02/26/2012 2:31pm PST

Hi Mohit,

Laptops are not provided to attendees, but we encourage you to bring your own.

Mohit Anchlia
02/26/2012 9:56am PST

Do I need to bring my own laptop or are machines available in the room?

Harsh Hatekar
02/23/2012 2:18pm PST


I would like to attend the afternoon session, but it seems to be fully booked. Are there any plans to add seats? Please let me know.

Sarah Sproehnle
01/13/2012 1:59pm PST

Some programming experience is strongly recommended for this session. The morning session does not require programming experience.

Ritee Rouf
01/12/2012 4:26am PST

Do you need to be a programmer to take this workshop?


