Testing Hadoop Applications

Hadoop: Tools & Technology, Gramercy Suite (NY Hilton)
Tutorial. Please note: to attend, your registration must include Tutorials.
Average rating: 4.62 (8 ratings)

Software testing is hard enough, but it becomes especially challenging when you’re doing large-scale, distributed data processing. This tutorial will present a mix of lecture and instructor-led demonstrations to explain how you can verify that your code performs exactly as you intended.

This session will focus on four key topics:

  1. Unit testing: Proving that a single piece of code works in isolation
  2. Integration testing: Verifying that these units work correctly in conjunction with one another
  3. Performance testing: Ensuring that the code runs at the expected speed and scale
  4. Diagnostics: How to extract valuable information from Hadoop that can help you isolate problems in your code
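One common way to make the unit-testing step practical is to keep the map and reduce logic in plain Java methods that never touch Hadoop types. You can then test that logic directly, without a cluster, a Context object, or Writable wrappers. The sketch below assumes a hypothetical word-count job. `WordCountLogic`, `map`, and `reduce` are illustrative names, not code from the tutorial demos. In a real project the checks would usually live in JUnit tests. A plain `main` keeps this example dependency-free.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical word-count logic kept free of Hadoop types so that it can be
 * unit tested in isolation. A thin Mapper/Reducer would delegate to these methods.
 */
final class WordCountLogic {

    /** Splits a line on non-word characters and emits (lowercased word, 1) for each word. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.split("\\W+")) {
            // split() can yield an empty leading token (e.g. for ", foo"), so skip it
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token.toLowerCase(Locale.ROOT), 1));
            }
        }
        return out;
    }

    /** Sums all counts seen for a single key, as the reducer would. */
    static int reduce(Iterable<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Minimal hand-rolled checks standing in for a unit-test framework
        List<Map.Entry<String, Integer>> pairs = map("Hello, hello world");
        check(pairs.size() == 3, "expected one pair per word");
        check(pairs.get(0).getKey().equals("hello"), "words should be lowercased");
        check(map("").isEmpty(), "an empty line should emit nothing");
        check(reduce(Arrays.asList(1, 1, 1)) == 3, "reduce should sum the counts");
        System.out.println("all checks passed");
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }
}
```

Because nothing here depends on HDFS or the MapReduce runtime, these checks run in milliseconds. Cluster-level behavior is then left to the integration tests.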

We will also discuss several problems developers commonly introduce into their code, as well as ways to recognize and solve them.

Tom Wheeler

Cloudera, Inc.

Tom Wheeler’s career spans more than fifteen years in the communications, biotech, financial, healthcare, aerospace and defense industries. Before joining Cloudera, he developed engineering software at Boeing, helped to design a high-volume data processing system for WebMD and served as senior programmer/analyst for a brokerage firm. Mr. Wheeler is a frequent presenter at both user groups and software conferences.

Comments on this page are now closed.


Shirley Bailes
10/23/2012 10:04am EDT

The current version that Tom provided below is now available as a ZIP file at the top of this page.

Tom Wheeler
10/23/2012 9:53am EDT

Hello attendees:

It seems that this Web site doesn’t have the latest version. I am working with O’Reilly to correct this, but in the meantime, you can find the current version of the slides and demos (including the MiniMRCluster and MiniDFSCluster example) here:


Tom Wheeler
10/17/2012 9:32pm EDT

Hi Michael,

There are no specific prerequisites. It’s not practical to do this session as a hands-on workshop, so it will be a mix of lecture and demonstration. Thus, you needn’t have anything in particular on your computer (or even have a computer with you at all). I do plan on making my slides and all code used in the demos available following the session.

Although my demos will use Cloudera’s CDH4 distribution, I expect that they would run equally well on any modern version of Apache Hadoop, whether it comes from the Apache site or through another vendor’s distribution.

michael semb wever
10/17/2012 9:16pm EDT

Are there prerequisites for this tutorial? A list of tools (and versions) we should have installed to save time on the demonstrations?

