How to Develop Big Data Applications for Hadoop

Location: Mission City B5
Average rating: 1.57 (23 ratings)

Hands-On Instructions
Attendees are invited to participate in the hands-on section of this tutorial using Karmasphere software and free Amazon Web Services credits, which will be distributed on a USB memory stick at the tutorial. If you’re interested, we recommend installing VMware Player (for Windows and Linux) or VMware Fusion (trial or full version, for Mac) on your laptop in advance.

Distributed applications running on Hadoop clusters can deliver powerful insights and results from the biggest data sets ever generated. But do you have to be a rocket scientist to use Hadoop? Fortunately, the answer is no. This tutorial will explain the theory of MapReduce and how to develop big data applications in Java and in higher-level languages such as Pig and Hive SQL. Using practical, real-world examples such as weblog processing, analytics, and text summarization, it will cover how to prototype, debug, monitor, test, and optimize big data applications for Hadoop’s distributed processing platform. Attendees will get hands-on instruction and will leave with a solid understanding of how to analyze data on Hadoop clusters, along with practical examples they can use and build on after the tutorial.

The tutorial will be in 5 parts:
Part 1 (30 mins): What you need to know about MapReduce and Hadoop
Part 2 (45 mins): Rapid prototyping and ad hoc analytics
Part 3 (15 mins): Real world case study
Part 4 (60 mins): Hands-on instruction with practical examples
Part 5 (30 mins): Your questions answered

Abe Taha


Abe was senior director of engineering for where he led engineering for all properties. Before that, Abe was director of engineering at Ning where he worked on Hadoop-based solutions for Ning users and led development of the Ning data services platform and systems management services.

Previously, Abe managed development for the Google Apps Infrastructure at Google, and while at Yahoo! served as senior engineering manager for several units, including Social Search Platform, Search Front-End Platform and Listings Platform. In addition, Abe has held engineering positions at technology companies including CNA eBusiness, Scient and Jackson Software.
Abe has completed course requirements for a PhD and holds an MS in Theoretical and Applied Mechanics from the University of Illinois at Urbana-Champaign, and earned an MS and a BS in Mechanical Engineering from Cairo University.
Abe, the recipient of numerous academic awards and fellowships, also holds a patent for development of a search engine with augmented relevance ranking by community participation.

Abe is currently the VP Engineering at Karmasphere.

Shevek


Shevek is a widely recognized mathematician and computer programmer with specific expertise in the Java programming language. His considerable experience ranges from theoretical computing to team management. Over the course of his career he has worked on cutting-edge academic research in systems, compilers, programming languages and computer security. Shevek has worked with organizations such as the U.K. Department of Trade and Industry, Raytheon Systems, and Weir, Strachan and Henshaw in the defense and nuclear industries. He received a Doctorate in Computing from the University of Bath, Bath, England. Shevek also holds a Master’s in Mathematics, with Honors, from the University of Bath.
Shevek is CTO and co-founder of Karmasphere.

Ken Krugler

Scale Unlimited

Ken is the president of Scale Unlimited, a consulting and training company for big data processing and web mining problems using Hadoop, Cascading, Solr and Elasticsearch. Previously he was the founder and CTO of Krugle, a vertical search engine and enterprise appliance for code and technical information. He’s a member of the Apache Software Foundation, a committer on the Tika and Bixo open source projects, and teaches Hadoop, Cassandra and Solr courses for Scale Unlimited, DataStax and Lucidworks.

Chris Wensel

Concurrent, Inc

Chris Wensel is the founder of Concurrent Inc., and the author of the
Cascading data processing open-source project. He also co-founded Scale
Unlimited, the first Hadoop and “Big Data” related professional services and
training company, where he mentored companies like Sun Microsystems, Apple,
and numerous startups in the Bay Area.

Chris bootstrapped his first Internet startup in the early ’90s, creating an
early Web server-side scripting language used by companies in the
real estate and insurance verticals. During the late ’90s, Chris focused on
distributed agent-based systems, for which he received several patents on
distributed computing. From there he became Chief Architect for the fastest
growing business unit at Thomson Reuters. Just prior to Concurrent, Chris
was a Consulting Architect to TeleAtlas geo-content management group in

Comments on this page are now closed.


Håkan Jonsson
02/07/2011 4:24pm PST

Should have asked everyone to prepare everything, including registration for Amazon, before the tutorial. Installation at a tutorial is a recipe for disaster.

Sean Boisen
02/03/2011 7:50am PST

Abe: Despite the problems, I’m still very interested in trying to go through the steps you intended for us and getting the hands-on experience. Could you post some more complete directions that could be followed on our own?

Anthony Cassandra
02/02/2011 2:08pm PST

Just to affirm the two major problems with this session: 1) it was not quite what it was advertised to be, and was more of a sales pitch/demo for a specific product than Hadoop-specific instruction; 2) the hands-on setup was way too complex, with too many dependencies, latencies and places for it to go wrong.

Abe Taha
02/02/2011 5:41am PST

We the presenters are very sorry to hear of your negative experiences in our session and even more sorry to concede that they have some validity.

We were very ambitious in our plans for this session, and admit that we did not hit the mark 100% and met with some bumps along the way.

Our intent was to offer a genuine primer on Hadoop and help attendees develop their first MapReduce job in a short period of time and become familiar with Hadoop and its processing without having to install or pay for their own cluster. We also wanted the attendees to have continued access to some tools and a cluster for the next 30 days so they could continue and solidify their learning back in their offices. So using a tool that is available for free in a community edition and giving some free access to a cloud implementation seemed like the right solution.

But yes, despite many cycles spent on maximizing wifi access and upgrading systems, we still ran into bandwidth problems. And in retrospect, we can see content areas we could have changed to prevent the infomercial perception and make it more hands-on and educational at a pace the audience could follow.

Again, it was our sincere intent to educate and share our knowledge and provide valuable tools to the attendees to assist that process. We are engineers, not sales or pitch men. We will take all these comments to heart and make sure that the next time we offer this tutorial the content and experience line up with our intent.

Thank you for your feedback, and we wish you the best in your adventure with Hadoop.

Abe and team.

Paul Soule
02/01/2011 10:57pm PST

This ‘tutorial’ was not about Hadoop, but was a sales pitch about AWS and the Karmasphere product. I have no interest in either. This is not what was described in the session description and it’s not something one would expect to pay for.

The session itself was chaotic. Signing up to AWS was problematic and I only received my AWS approval after the session so could not participate in the practical, which was of little value anyway as it concentrated on the tool not on Hadoop. Very poor indeed.

Doug Durham
02/01/2011 11:10am PST

It is unclear whether this would have been worthwhile even if the speakers had been prepared. There was never any intent to teach Hadoop. Instead it was all about how to use Karmasphere and Amazon tools. This is not Hadoop, and it bears little resemblance to the session description.

This is my first O’Reilly conference. I expected that there would be some level of quality control and oversight on the content. I didn’t expect to pay $1900 for a poorly planned 3-hour infomercial on a vendor’s product. This unfortunately will likely be my last.

Amjed Almousa
02/01/2011 8:22am PST

Setup was difficult to get done in the time frame. Also, the presenter was running too fast; had he gone a bit slower, I guess attendees could have benefited more.

Sean Boisen
02/01/2011 6:58am PST

The hands-on component was a major disappointment. Initial set-up took way too long, the Karmasphere server wasn’t up to the load, and then the steps of actually configuring and operating the Karmasphere system weren’t at all clear.

Bill Neubauer
02/01/2011 4:10am PST

Some of the setup should have been sent to attendees beforehand. Sitting around waiting for AWS is losing people.

Jenni Snyder
02/01/2011 3:25am PST

It’s unfortunate that this setup is so long and complicated; at some point I think it might make sense to just proceed with the demo.

