Hadoop Operations: Managing Big Data Clusters

Operations
Location: Regency 1
Level: Intermediate
Average rating: 3.69 (26 ratings)

Hadoop and MapReduce are enabling consumers of big data to store and process ever-increasing volumes of data: logs, sensor data, user-generated content, feeds – the list goes on.

At Cloudera, we’ve witnessed the same process occur at organizations across the world: what starts as a research project on a small Hadoop cluster ends up in the production pipeline, thus creating a new burden for the operations team. A number of tools for managing a reliable and scalable Hadoop installation have been produced by Hadoop’s vibrant user community, including major contributions from Yahoo!, Facebook, and IBM Research. We’ll demonstrate how to put these tools to use in your Hadoop cluster.

During this session, we’ll share war stories from clusters we’ve managed, as well as specific tips and tricks for scaling Hadoop from tens to thousands of nodes. We’ll cover the following in detail:

  • Cluster Setup: How do you get started? Where can you find stable binaries? How do you image your systems? How do you keep track of masters, slaves, and other deviants?
  • Monitoring and Alerting: How do you know if your cluster is feeling well? What monitoring tools work best with Hadoop? What tricks can help your monitoring systems scale with your cluster? (A minimal health-check sketch follows this list.)
  • Upgrades: Hadoop has a fast release cycle. When should you upgrade? Will it be smooth? How will this impact your users?
  • Optimization: Are your 100 servers performing more like 60 or 160? How do you tell? What about those 50 tuning parameters?
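
As a small illustration of the monitoring bullet above (not part of the session materials), here is a minimal health-check sketch in the style of a Nagios plugin. It assumes the hadoop command-line client of this era is on the PATH and that "hadoop dfsadmin -report" prints a summary line of the form "Datanodes available: N (M total, K dead)"; both the command and the parsing pattern are assumptions to adjust for your Hadoop version and monitoring setup.

    import re
    import subprocess
    import sys

    def dead_datanode_count():
        """Return the number of dead DataNodes, or None if the report can't be parsed."""
        # Ask the NameNode for its cluster summary via the stock dfsadmin tool.
        try:
            report = subprocess.run(["hadoop", "dfsadmin", "-report"],
                                    capture_output=True, text=True).stdout
        except OSError:
            return None  # hadoop client not found or not runnable
        # ASSUMPTION: the summary line looks like
        #   "Datanodes available: N (M total, K dead)".
        # Adjust this pattern to whatever your Hadoop version actually prints.
        match = re.search(r"\((\d+) total, (\d+) dead\)", report)
        return int(match.group(2)) if match else None

    if __name__ == "__main__":
        dead = dead_datanode_count()
        if dead is None:
            print("UNKNOWN: could not parse dfsadmin report")
            sys.exit(3)  # Nagios convention: 3 = unknown
        if dead > 0:
            print("CRITICAL: %d dead DataNode(s)" % dead)
            sys.exit(2)  # Nagios convention: 2 = critical
        print("OK: all DataNodes reporting")
        sys.exit(0)

The exit codes follow the common Nagios plugin convention (0 OK, 2 critical, 3 unknown), so a check like this can sit alongside whatever alerting already watches the rest of your infrastructure.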

Jeff Hammerbacher

Assistant Professor | Mount Sinai

Jeff Hammerbacher was an Entrepreneur in Residence at Accel Partners immediately prior to joining Cloudera. Before Accel, he conceived, built, and led the Data team at Facebook. The Data team was responsible for driving many of the applications of statistics and machine learning at Facebook, as well as building out the infrastructure to support these tasks for massive data sets. The team produced two open source projects: Hive, a system for offline analysis built on top of Hadoop, and Cassandra, a structured storage system on a P2P network. Before joining Facebook, Jeff was a quantitative analyst on Wall Street. Jeff earned his Bachelor’s Degree in Mathematics from Harvard University.

Comments

Steve Bennett
06/23/2009 2:05am PDT

An excellent presentation. Great to get technical.

Gary Richardson
06/22/2009 12:05pm PDT

Good coverage of HDFS.

I was hoping for more information on streaming logs into the warehouse from production systems.

Aaron Kulick
06/22/2009 9:37am PDT

Again, the A/V (audio) tech in there with his iPhone ringing away. Bad form. On a separate note, I found this presentation to be very informative and extremely well organized.

Ernest Mueller
06/22/2009 8:57am PDT

Really informative and, though technically in depth, well organized.
