Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

TuneIn: How to get your jobs tuned while you are sleeping

Manoj Kumar (LinkedIn), Pralabh Kumar (LinkedIn), Arpan Agrawal (LinkedIn)
4:20pm–5:00pm Thursday, 09/13/2018
Data engineering and architecture
Location: 1E 09 Level: Intermediate
Average rating: ***** (5.00, 1 rating)

Who is this presentation for?

  • Hadoop and Spark developers and managers, cluster administrators, data scientists, and data analysts

Prerequisite knowledge

  • Familiarity with the Hadoop/Spark ecosystem

What you'll learn

  • Explore TuneIn, an auto-tuning framework developed on top of Dr. Elephant
  • Learn how a Hadoop/Spark job can be tuned automatically to optimize the resources used by the job

Description

Have you ever tuned a Spark, Hive, or Pig job? If so, you already know that it is a never-ending cycle: execute the job, observe it while it runs, make sense of hundreds of metrics, and rerun it with better parameters. Now imagine doing this for tens of thousands of jobs. Manual performance optimization at this scale is tedious and costly, requires significant domain expertise, and wastes a lot of resources.

LinkedIn solved this problem by developing Dr. Elephant, an open source self-serve performance monitoring and tuning tool for Hadoop and Spark. While it has proven to be very successful at LinkedIn as well as other companies, it relies on a developer’s initiative to check and apply the recommendations manually. It also expects some expertise from developers to arrive at the optimal configuration from the recommendations.

Manoj Kumar, Pralabh Kumar, and Arpan Agrawal offer an overview of TuneIn, an auto-tuning framework developed on top of Dr. Elephant. You’ll learn how LinkedIn uses an iterative optimization approach to find optimal parameter values, which optimization algorithms the team tried and why particle swarm optimization gave the best results, and how they avoided extra executions by tuning jobs during their regularly scheduled runs. Manoj, Pralabh, and Arpan also share techniques that ensure faster convergence and zero failed executions while tuning, explain how LinkedIn achieved a more than 50% reduction in resource usage by tuning a small set of parameters, and outline lessons learned and a future roadmap for the tool.
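The abstract names particle swarm optimization as the algorithm that worked best, but TuneIn’s actual implementation is not shown in this listing. As an illustrative sketch only, the toy Python below runs a generic PSO loop over a made-up cost surface standing in for a job’s resource usage as a function of two tuning parameters; the function names, parameter bounds, and cost model are all invented for the example.

```python
import random

def pso_minimize(cost, bounds, n_particles=20, iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=42):
    """Minimal particle swarm optimization (a sketch, not TuneIn's code).

    Each particle remembers its own best position; the swarm shares a
    global best that pulls every particle's velocity toward it.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # per-particle best positions
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp each parameter to its allowed range.
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost

# Toy cost surface: pretend resource usage is minimized at 4 GB of
# executor memory and 200 shuffle partitions (numbers invented here).
def toy_cost(params):
    mem_gb, partitions = params
    return (mem_gb - 4.0) ** 2 + ((partitions - 200.0) / 50.0) ** 2

best, best_cost = pso_minimize(toy_cost, bounds=[(1.0, 16.0), (64.0, 512.0)])
```

In a real setting the cost function would be a full job execution scored on metrics such as runtime and container usage, which is why the talk emphasizes tuning during regularly scheduled runs rather than launching extra executions.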

Manoj Kumar

LinkedIn

Manoj Kumar is a senior software engineer on the data team at LinkedIn, where he is currently working on auto-tuning Hadoop jobs. He has more than four years of experience in big data technologies like Hadoop, MapReduce, Spark, HBase, Pig, Hive, Kafka, and Gobblin. Previously, he worked on the data framework for slicing and dicing (30 dimensions, 50 metrics) advertising data at PubMatic and worked at Amazon.

Pralabh Kumar

LinkedIn

Pralabh Kumar is a senior software engineer on the data team at LinkedIn, where he is working on auto-tuning Spark jobs. He has more than seven years of experience in big data technologies like Spark, Hadoop, MapReduce, Cassandra, Hive, Kafka, and ELK. He contributes to Spark and Livy and has filed a couple of patents. Previously, he worked on the real-time system for unique customer identification at Walmart. He holds a degree from the University of Texas at Dallas.

Arpan Agrawal

LinkedIn

Arpan Agrawal is a software engineer on the analytics platforms and applications team at LinkedIn. He holds a graduate degree in computer science and engineering from IIT Kanpur.