Introduction to Apache Hadoop

Thomas Wheeler (Cloudera, Inc.)
Location: Portland 255
Level: Novice
Average rating: 4.06 (48 ratings)
Slides: 1 PDF


This tutorial will present a mix of lecture and instructor-led demonstrations to explain what Apache Hadoop is and why it’s becoming a standard for large-scale data storage and processing.

  • Why the World Needs Hadoop
    • What Is Apache Hadoop?
    • How Did Apache Hadoop Originate?
    • The Economics of Hadoop
    • Common Use Cases
  • Fundamental Concepts
    • How Hadoop Differs from Other Distributed Computing Architectures
    • High-Level Architecture
    • The Anatomy of the Cluster
  • HDFS: The Hadoop Distributed Filesystem
    • Comparison to Standard Filesystems
    • HDFS Replication and Reliability
    • Demo: Accessing HDFS Using the Command Line
  • MapReduce
    • Data Processing with MapReduce
    • Thinking in MapReduce
    • Hadoop Streaming
    • Demo: MapReduce Example in Python
    • Visual Overview of Job Execution
    • Hadoop’s Java API for MapReduce
    • Demo: MapReduce Example in Java
  • Using Apache Hadoop Effectively
    • Partitioning the Keyspace
    • Improving Performance with a Combiner
    • Tips for Running at Scale
    • When Hadoop Is Not the Right Choice
  • The Hadoop Ecosystem
    • Apache Flume
    • Apache Sqoop
    • Apache Hive
    • Apache Pig
    • Apache HBase
    • Apache Mahout
    • Hadoop Versions and Distributions

This is a practical session focused on real-world applications of Apache Hadoop — at no point will I use the lame “wordcount” example that’s become cliché for explaining MapReduce to beginners.


If you’d like to follow along with the instructor-led demos of HDFS and MapReduce, please follow the instructions on this page to get the virtual machine and code samples.



Thomas Wheeler

Cloudera, Inc.

Tom Wheeler’s career spans more than fifteen years in the communications, biotech, financial, healthcare, aerospace and defense industries. Before joining Cloudera, he developed engineering software at Boeing, helped to design a high-volume data processing system for WebMD and served as senior programmer/analyst for a brokerage firm. Mr. Wheeler is a frequent presenter at both user groups and software conferences.

Comments on this page are now closed.


Thomas Wheeler
07/22/2013 5:15am PDT

Vivek: I burned a backup copy to a DVD. If you show up to the session a few minutes early, I will let you have it.

Vivek Vaid
07/22/2013 5:04am PDT

Tom, would you have the VM available on a USB stick? The download is 2.4 GB and will take hours at the interweb speeds here at the convention center.

Thomas Wheeler
06/06/2013 5:34am PDT

I apologize for this and have updated the link to point to a valid page. I intend to post the VM and examples on that page by July 1, three weeks prior to the workshop.

Grant Johnson
06/06/2013 5:21am PDT

The VM link is broken.


Contact Us

View a complete list of OSCON contacts