Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Running Production Hadoop Clusters in Docker Containers

Nasser Manesh (Altiscale, Inc.)
10:40am–11:20am Friday, 02/20/2015
Hadoop Platform
Location: 210 C/G

In this talk I’ll present our experience, lessons learned, pitfalls to avoid, and deployment patterns for operationalizing Hadoop in Docker containers. The talk will be most useful to SREs, DevOps staff, and others focused on operations. Development, writing MapReduce jobs, and design issues will not be covered.

Running large-scale multi-tenant Hadoop clusters means facing non-trivial challenges in resource control and security. We have deployed Docker on physical machines to partition our resources without much overhead, providing lightweight virtual machines as compute nodes for Hadoop. To achieve this, and to be able to operate and maintain the installation, we had to work through a number of issues:

  • Sizing and capacity planning: How many containers to run on a machine, and how to allocate resources to those containers
  • Configuration management: How to configure containers for a DataNode vs. a NodeManager vs. a NameNode
  • Accessing data: How to expose large and parallel disks to the containers that serve HDFS
  • Automation: How to bring up and manage containers on a large number of machines serving as Hadoop nodes
  • Metrics and monitoring: How to check the health of Docker containers and pass it up to Hadoop, and how to collect correct disk, memory, CPU, and network metrics that represent a container rather than the entire host
  • Streamlining troubleshooting: How to handle logging at the host and container level so that we can respond quickly to operational issues
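The abstract does not say how per-container metrics were collected. As a hypothetical illustration of the metrics point, the sketch below reads memory figures scoped to one container's cgroup instead of host-wide numbers such as /proc/meminfo. It assumes a cgroup v1 host using Docker's cgroupfs driver, so the root path `CGROUP_MEMORY_ROOT` and the helper names `parse_memory_stat` and `container_memory` are assumptions for illustration, not part of the talk.

```python
import os
from typing import Dict

# ASSUMPTION: cgroup v1 host with Docker's cgroupfs driver, where each
# container's memory cgroup lives under /sys/fs/cgroup/memory/docker/<id>.
# Hosts using the systemd cgroup driver lay this out differently.
CGROUP_MEMORY_ROOT = "/sys/fs/cgroup/memory/docker"


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse the 'name value' lines of a cgroup v1 memory.stat file."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def _read_int(path: str) -> int:
    """Read a single integer from a cgroup pseudo-file."""
    with open(path) as f:
        return int(f.read().strip())


def container_memory(container_id: str, root: str = CGROUP_MEMORY_ROOT) -> Dict[str, int]:
    """Return memory figures for one container's cgroup, not the whole host.

    Hypothetical helper for illustration only.
    """
    base = os.path.join(root, container_id)
    usage = _read_int(os.path.join(base, "memory.usage_in_bytes"))
    limit = _read_int(os.path.join(base, "memory.limit_in_bytes"))
    with open(os.path.join(base, "memory.stat")) as f:
        stats = parse_memory_stat(f.read())
    # Usage includes reclaimable page cache; subtracting inactive file-backed
    # pages gives a common "working set" approximation.
    working_set = max(usage - stats.get("total_inactive_file", 0), 0)
    return {"usage_bytes": usage, "limit_bytes": limit, "working_set_bytes": working_set}
```

On a host matching these assumptions, calling `container_memory("<full container id>")` returns that container's usage, its limit, and an approximate working set, which can then be reported per node rather than per physical machine.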

Additionally, we had to resolve a series of networking issues to make sure that Docker containers could take advantage of a high-bandwidth, high-throughput network built and optimized for Hadoop.

Docker is popular among developers but less commonly used in production systems, so operating it at large scale is a road less traveled. I’m hoping this talk helps people who plan to use Docker and Hadoop together avoid some time-consuming issues and end up with faster, more reliable deployments.


Nasser Manesh

Altiscale, Inc.

Nasser Manesh has 25 years of experience in Unix, infrastructure, distributed systems, and backend operations, mostly in DevOps, team lead, and CTO roles. He has founded startups in consumer Internet, mobile, photography, and art. Nasser is currently focused on Big Data infrastructure, Hadoop core (HDFS/YARN), Chef, Linux cgroups, and Docker at scale.