In this talk I’ll present our experience, lessons learned, pitfalls to avoid, and deployment patterns for operationalizing Hadoop in Docker containers. The talk will be most beneficial to SREs, DevOps staff, and others focused on operations. Development, writing MapReduce jobs, and design issues will not be covered.
Running large-scale, multi-tenant Hadoop clusters means facing non-trivial challenges in resource control and security. We have deployed Docker on physical machines to partition our resources without too much overhead, providing lightweight virtual machines as compute nodes for Hadoop. To achieve this, and to be able to operate and maintain the installation, we have had to work through a number of issues.
Additionally, we had to resolve a series of networking issues to make sure that Docker containers could take advantage of a high-bandwidth, high-throughput network built and optimized for Hadoop.
Docker is popular among developers but less often applied to production systems, so operating it at large scale is a road less traveled. I hope this talk helps people who plan to use Docker and Hadoop together avoid some time-consuming issues and end up with expedited and reliable deployments.
Nasser Manesh has 25 years of experience in Unix, infrastructure, distributed systems, and backend operations, mostly in DevOps, team lead, and CTO roles. He has founded startups in consumer Internet, mobile, photography and art areas. Nasser is currently focused on Big Data infrastructure, Hadoop core (HDFS/YARN), Chef, Linux cgroups, and Docker at scale.