Downscaling: The Achilles heel of autoscaling Spark clusters

Prakhar Jain (Microsoft), Sourabh Goyal (Qubole)

4:35pm–5:15pm Wednesday, September 25, 2019

Location: 1A 21/22

Data Engineering and Architecture

Secondary topics: Deep dive into specific tools, platforms, or frameworks

Who is this presentation for?

Data engineers

Level

Intermediate

Description

Adding nodes at runtime (upscaling) to an already running Spark on YARN cluster is fairly easy. But taking those nodes away (downscaling) when the workload drops at some later point is difficult. To remove a node from a running cluster, you need to make sure that it isn’t being used for compute or storage. On production workloads, many nodes can’t be taken away because they are still running some containers, even though they are not fully utilized. In other words, containers are fragmented across different nodes: each node may be running only one or two containers or executors, although it has the resources to run more. Long-running Spark executors make this even more difficult. In other cases, a node holds shuffle data on its local disk that will be consumed later by a Spark application running on the cluster. The resource manager will never decide to reclaim such a node, because losing shuffle data could lead to costly recomputation of stages or tasks.
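The two conditions that block a node from being removed can be sketched in a few lines of Python. This is only an illustration of the reasoning in the description, not YARN's or Spark's actual logic: the `Node` class, the `can_reclaim` function, the slot counts, and the application names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical view of one worker node in the cluster."""
    name: str
    capacity: int                 # container slots the node could run
    containers: int = 0           # containers currently running
    shuffle_apps: set = field(default_factory=set)  # apps with shuffle files on local disk


def can_reclaim(node, running_apps):
    """A node is removable only if it runs no containers and holds
    no shuffle data that a still-running application may read."""
    holds_live_shuffle = bool(node.shuffle_apps & running_apps)
    return node.containers == 0 and not holds_live_shuffle


# Assumed example cluster: lightly loaded, but fragmented.
nodes = [
    Node("n1", capacity=4, containers=1),
    Node("n2", capacity=4, containers=2),
    Node("n3", capacity=4, shuffle_apps={"app-1"}),  # idle, but app-1 is still running
    Node("n4", capacity=4, shuffle_apps={"app-0"}),  # idle, app-0 has finished
]
running_apps = {"app-1"}

reclaimable = [n.name for n in nodes if can_reclaim(n, running_apps)]
utilization = sum(n.containers for n in nodes) / sum(n.capacity for n in nodes)
print(reclaimable, utilization)
```

In this assumed cluster only 3 of 16 slots are in use, yet just one of the four nodes can be removed: two are pinned by a handful of containers and one by shuffle data that may still be read.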

Prakhar Jain and Sourabh Goyal explore how to improve downscaling in Spark on YARN clusters in the presence of such constraints. They cover changes to the scheduling strategy for container allocation in YARN and in the Spark task scheduler, which together help achieve better packing of containers. This ensures that containers are packed onto a smaller set of nodes, leaving some nodes with no compute at all. By being careful about how you assign containers in the first place, you can reduce the chance of ending up with containers of the same application spread over many different nodes. They also examine enhancements to the Spark driver and the external shuffle service (ESS) that help you proactively delete shuffle data that you already know has been consumed. This ensures that nodes are not holding any unnecessary shuffle data, which frees them from storage duties and makes them available for reclamation, so downscaling happens faster.
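The effect of a packing-oriented allocation strategy can be shown with a small simulation. This is a minimal sketch under stated assumptions, not the scheduler changes presented in the talk: the `allocate` function, the strategy names `"pack"` and `"spread"`, and the uniform one-slot containers are hypothetical simplifications.

```python
def allocate(requests, node_names, capacity, strategy):
    """Place `requests` one-slot containers on nodes with `capacity` slots each.

    "spread" picks the least-loaded node with room;
    "pack" picks the most-loaded node that still has room.
    Ties go to the node listed first.
    """
    used = {name: 0 for name in node_names}
    for _ in range(requests):
        candidates = [n for n in node_names if used[n] < capacity]
        if not candidates:
            raise RuntimeError("cluster is full")
        if strategy == "pack":
            target = max(candidates, key=lambda n: used[n])
        else:
            target = min(candidates, key=lambda n: used[n])
        used[target] += 1
    return used


names = ["n1", "n2", "n3", "n4"]
spread = allocate(4, names, capacity=4, strategy="spread")
packed = allocate(4, names, capacity=4, strategy="pack")

# Nodes running no containers are candidates for downscaling.
idle_spread = [n for n, count in spread.items() if count == 0]
idle_packed = [n for n, count in packed.items() if count == 0]
print(idle_spread, idle_packed)
```

With the same four containers, spreading leaves every node busy, while packing fills one node and leaves the other three free of compute. Those three could then be reclaimed, provided they hold no shuffle data that is still needed.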

Prerequisite knowledge

Familiarity with cloud as a concept

What you'll learn

Identify efficient downscaling techniques for elastic clusters on the cloud

Prakhar Jain

Microsoft

Prakhar Jain is a senior software engineer on the Spark team at Microsoft. Previously, he worked on cluster orchestration and the big data stack at Qubole. Prakhar holds a bachelor's degree in computer science engineering from the Indian Institute of Technology Bombay, India.

Sourabh Goyal

Qubole

Sourabh Goyal is a member of the technical staff at Qubole, where he works on the Hadoop team. Sourabh holds a bachelor's degree in computer engineering from Netaji Subhas Institute of Technology, University of Delhi.