Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

TuneIn: How to get your jobs tuned while you are sleeping

Manoj Kumar (LinkedIn), Pralabh Kumar (LinkedIn), Arpan Agrawal (LinkedIn)
4:20pm–5:00pm Thursday, 09/13/2018
Data engineering and architecture
Location: 1E 09 Level: Intermediate
Average rating: ***** (5.00, 1 rating)

Who is this presentation for?

  • Hadoop and Spark developers and managers, cluster administrators, data scientists, and data analysts

Prerequisite knowledge

  • Familiarity with the Hadoop/Spark ecosystem

What you'll learn

  • Explore TuneIn, an auto-tuning framework developed on top of Dr. Elephant
  • Learn how a Hadoop/Spark job can be tuned automatically to optimize the resources used by the job

Description

Have you ever tuned a Spark, Hive, or Pig job? If so, you already know that it is a never-ending cycle: execute the job, observe it while it runs, make sense of hundreds of metrics, and rerun it with better parameters. Now imagine doing this for tens of thousands of jobs. Manual performance optimization at this scale is tedious and costly, requires significant domain expertise, and wastes a lot of resources.

LinkedIn solved this problem by developing Dr. Elephant, an open source self-serve performance monitoring and tuning tool for Hadoop and Spark. While it has proven to be very successful at LinkedIn as well as other companies, it relies on a developer’s initiative to check and apply the recommendations manually. It also expects some expertise from developers to arrive at the optimal configuration from the recommendations.

Manoj Kumar, Pralabh Kumar, and Arpan Agrawal offer an overview of TuneIn, an auto-tuning framework developed on top of Dr. Elephant. You’ll learn how LinkedIn uses an iterative optimization approach to find optimal parameter values, which optimization algorithms the team tried and why particle swarm optimization gave the best results, and how they avoided extra executions by tuning jobs during their regularly scheduled runs. Manoj, Pralabh, and Arpan also share techniques that ensure faster convergence and zero failed executions while tuning, explain how LinkedIn achieved a more than 50% reduction in resource usage by tuning a small set of parameters, and outline lessons learned and a future roadmap for the tool.
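The abstract names particle swarm optimization as the algorithm that worked best, but TuneIn’s actual implementation is not shown in this listing. As an illustrative sketch only, the toy Python below runs a generic PSO loop over a made-up cost surface standing in for a job’s resource usage as a function of two tuning parameters; the function names, parameter bounds, and cost model are all invented for the example.

```python
import random

def pso_minimize(cost, bounds, n_particles=20, iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=42):
    """Minimal particle swarm optimization (a sketch, not TuneIn's code).

    Each particle remembers its own best position; the swarm shares a
    global best that pulls every particle's velocity toward it.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # per-particle best positions
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp each parameter to its allowed range.
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost

# Toy cost surface: pretend resource usage is minimized at 4 GB of
# executor memory and 200 shuffle partitions (numbers invented here).
def toy_cost(params):
    mem_gb, partitions = params
    return (mem_gb - 4.0) ** 2 + ((partitions - 200.0) / 50.0) ** 2

best, best_cost = pso_minimize(toy_cost, bounds=[(1.0, 16.0), (64.0, 512.0)])
```

In a real setting the cost function would be a full job execution scored on metrics such as runtime and container usage, which is why the talk emphasizes tuning during regularly scheduled runs rather than launching extra executions.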

Manoj Kumar

LinkedIn

Manoj Kumar is a senior software engineer on the data team at LinkedIn, where he is currently working on auto-tuning Hadoop jobs. He has more than four years of experience in big data technologies like Hadoop, MapReduce, Spark, HBase, Pig, Hive, Kafka, and Gobblin. Previously, he worked on the data framework for slicing and dicing (30 dimensions, 50 metrics) advertising data at PubMatic and worked at Amazon.

Pralabh Kumar

LinkedIn

Pralabh Kumar is a senior software engineer on the data team at LinkedIn, where he is working on auto-tuning Spark jobs. He has more than seven years of experience in big data technologies like Spark, Hadoop, MapReduce, Cassandra, Hive, Kafka, and ELK. He contributes to Spark and Livy and has filed a couple of patents. Previously, he worked on the real-time system for unique customer identification at Walmart. He holds a degree from the University of Texas at Dallas.

Arpan Agrawal

LinkedIn

Arpan Agrawal is a software engineer on the analytics platforms and applications team at LinkedIn. He holds a graduate degree in computer science and engineering from IIT Kanpur.