Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Automating cloud cluster deployment: Beyond the book

Bill Havanki (Cloudera)
1:15pm–1:55pm Thursday, September 28, 2017
Big data and the Cloud, Data Engineering & Architecture
Location: 1A 15/16/17 Level: Intermediate
Secondary topics: Cloud

Who is this presentation for?

  • Cluster administrators, big data developers, and cloud engineers

Prerequisite knowledge

  • Familiarity with running or using Hadoop clusters on-premises
  • Basic knowledge of cloud provider concepts like compute instances and virtual networking, especially in AWS

What you'll learn

  • Learn how to automate the creation of new clusters from scratch and use metrics gathered using the cloud provider to scale up


Often, when an organization first ventures into the cloud for running Hadoop clusters, it carries over practices that worked well on-premises, along with the idea that each cluster should last a long time and be carefully tended. It soon becomes apparent that there is a different, perhaps more effective way: deploying clusters on demand, scaling them as needed, and destroying them to save costs when demand slackens.

The problem is that it’s a lot of work to deploy a cluster in the cloud. There’s still the usual installation and configuration for all of the cluster services, but now you also need to think about allocating instances, placing them into your virtual networks, setting up security, creating new accounts, and more. How can all of that be done quickly enough to support an agile system of cloud cluster management?

Drawing on ideas from his book Moving Hadoop to the Cloud, Bill Havanki explains how you can automate the creation of new clusters from scratch and use metrics gathered using the cloud provider to scale up. Moving Hadoop to the Cloud covers many of the techniques you need, including creating instance images with most of your work baked in ahead of time, using automation to handle the rest of the work, and devising your own cloud-based metrics tailored to Hadoop clusters that inform you when your cluster could use more resources. Bill then takes you even further, demonstrating how to automate the creation of entire clusters, relying on the cloud provider API and your own scripting to make it happen. Once you can automatically create new clusters, you can also trigger similar actions from your metrics to scale your cluster up in response to demand, fully harnessing cloud flexibility for effective cluster management.
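As a rough illustration of the metrics-driven scale-up idea described above, the sketch below decides how many workers to add from a snapshot of cluster metrics. It is not taken from the talk or the book: the `ClusterMetrics` fields, the `workers_to_add` function, and the choice of pending containers as the trigger are all assumptions made for this example. The call to the cloud provider API that would actually allocate instances is deliberately left out.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class ClusterMetrics:
    """Hypothetical snapshot of cluster metrics gathered via the cloud provider."""
    worker_count: int            # workers currently in the cluster
    pending_containers: int      # containers waiting for resources (assumed metric)
    containers_per_worker: int   # containers a single worker can run (assumed constant)


def workers_to_add(metrics: ClusterMetrics, max_workers: int) -> int:
    """Return how many workers to add so the pending work can be scheduled.

    The result never pushes the cluster past max_workers.
    """
    if metrics.containers_per_worker <= 0:
        raise ValueError("containers_per_worker must be positive")
    if metrics.pending_containers <= 0:
        return 0  # no backlog, so no scale-up is needed
    needed = ceil(metrics.pending_containers / metrics.containers_per_worker)
    headroom = max(0, max_workers - metrics.worker_count)
    return min(needed, headroom)


if __name__ == "__main__":
    snapshot = ClusterMetrics(worker_count=10, pending_containers=25,
                              containers_per_worker=8)
    print(workers_to_add(snapshot, max_workers=20))
```

In a real deployment, the returned count would feed whatever scripting allocates and configures new instances. Capping at `max_workers` is one simple way to keep costs bounded.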

Bill Havanki


Bill Havanki is a software engineer at Cloudera, where he contributes to Hadoop components and systems for deploying Hadoop clusters into public cloud services. Previously, Bill worked for 15 years developing software for government contracts, focusing mostly on analytic frameworks and authentication and authorization systems. He holds a BS in electrical engineering from Rutgers University and an MS in computer engineering from North Carolina State University. A New Jersey native, Bill currently lives near Annapolis, Maryland, with his family.