Mar 15–18, 2020

Autoscaling big data operations in the cloud

Alexander Pierce (Pepperdata)
11:50am–12:30pm Tuesday, March 17, 2020
Location: LL20A
Secondary topics:  Cloud Platforms and SaaS

Who is this presentation for?

Data engineers, data architects, developers




Scaling the number of nodes in your cluster up and down on the fly is one of the major features that make cloud deployments so attractive to I&O teams. You can resize your cluster down when you have little or no workload. You can scale your cluster up to support increased workloads or to simply add processing power when a job slows. But estimating the right number of cluster nodes for a workload is difficult, user-initiated cluster scaling requires significant manual intervention, and mistakes can be costly and disruptive.

Autoscaling is a feature built into modern cloud services that enables applications to perform their best when demand changes. But the definition of performance can vary depending on the app. Some are CPU bound, others memory bound. Some are “spiky” in nature, while others are constant and predictable. Autoscaling automatically addresses these variables to help ensure optimal application performance under most conditions. Amazon EMR, Azure HDInsight, and Google Cloud Dataproc all provide autoscaling for big data and Hadoop, but each takes a different approach and offers a different cost model.

Alex Pierce evaluates these three leading cloud service providers with respect to Hadoop and big data autoscaling capabilities and offers guidance to help you determine which flavor of autoscaling will best fit your business needs. You’ll learn about the operational challenges associated with maintaining optimal performance for big data in the cloud and what milestones to set, and you’ll gain recommendations on how to create a successful cloud migration framework for your critical big data workloads.

Prerequisite knowledge

  • General knowledge of Hadoop, big data, and cloud platforms

What you'll learn

  • Evaluate three leading cloud service providers with respect to Hadoop autoscaling capabilities
  • Determine the best autoscaling option to fit your business requirements

Alexander Pierce


Alexander Pierce is a field engineer at Pepperdata.
