Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Netflix: Integrating Spark at petabyte scale

Daniel Weeks (Netflix)
1:15pm–1:55pm Thursday, 10/01/2015
Spark & Beyond
Location: 1 E20 / 1 E21
Level: Intermediate
Tags: media, featured
Average rating: 4.52 (23 ratings)
Slides: 1 PDF

The Big Data Platform team at Netflix maintains a cloud-based data warehouse with over 10 petabytes of data stored predominantly in Parquet format. Our platform has traditionally leveraged Pig for ETL processing, Hive for large analytic workloads, and Presto for interactive and exploratory use cases. For a long time, Spark seemed attractive to complement our platform, but technical gaps prevented effective use at scale in our environment. Recent improvements have allowed us to add Spark to our cloud data architecture and interoperate seamlessly with the other tools and services in our stack.

We will go into detail about our deployment configuration and what it takes to run Spark alongside traditional workloads on YARN. We will share examples of a few of our largest workflows translated to Spark and compare them in terms of both performance and complexity. We also identified cases where big data tools were being used to solve problems clearly outside their intended domains. The resulting implementations were awkward, and Spark solved the same problems elegantly. Finally, we will share our vision of how Spark will evolve our platform and push the state of big data processing at Netflix.

Daniel Weeks

Netflix

Daniel Weeks manages the Big Data Compute team at Netflix and is a Parquet committer. Prior to joining Netflix, Daniel focused on research in big data solutions and distributed systems.

Comments

Daniel Weeks
10/05/2015 2:26pm EDT

Michael,

I’ve added the slides and asked the admins to attach them here.

Thanks. Sorry you weren’t able to attend.

Michael Rowe
10/02/2015 8:31am EDT

Hi Daniel,

Are you going to be making your slides available? I tried to watch the talk but the room was overflowing…

Cheers!