If you are building applications for the big data world, you’ll need access to a variety of big data platform configurations during development and continuous integration. These deployments need to scale down to an individual laptop or up to a realistic test cluster, and they need to be interchangeable, with quick setup and teardown. Chad Metcalf and Seshadri Mahalingam explore several strategies for meeting this challenge by containerizing and deploying Hadoop services in Docker. Chad and Seshadri share lessons learned from solutions that they built, some of which have been open sourced, and dig into topics including how to manage the image lifecycle, configuration, and persistent data; multihost networking with Docker Engine and Swarm; and creating different deployment environments with Docker Machine.
Chad Metcalf is a solutions engineering manager for Docker. Previously, Chad worked at Puppet Labs and was an infrastructure engineer at WibiData and Cloudera.
Seshadri Mahalingam is a software engineer at Trifacta, where, in addition to building out Wrangle, Trifacta’s domain-specific language for expressing data transformation, he develops the low-latency compute framework that powers Trifacta’s fluid and immersive data wrangling experience. Seshadri holds a BS in EECS from UC Berkeley, where he cotaught a class on open source software.