July 20–24, 2015
Portland, OR

The evolution of the big data platform at Netflix

Eva Tse (Netflix, Inc)
5:00pm–5:40pm Wednesday, 07/22/2015
Scale Portland 256

Prerequisite Knowledge

This session will be most helpful for engineers who are curious about solving big data scalability challenges through architecture. Some knowledge of open source big data technologies and of the AWS cloud or an equivalent cloud service provider would help.

Description

At Netflix, the big data platform is the foundation for the analytics that drive all product decisions directly affecting our customer experience. As for scale, it is one of the three largest services running at Netflix in terms of compute power and data size.

In this talk, we will walk the audience through how we scale the platform to handle the increasing amount of data (over 400 billion events generated daily), the increasing demand for analytics (which translates to compute power), and the increasing number of users who depend on our platform to make business decisions.

Specifically, we will talk about how we built this architecture, which architectural choices we made along the way, and the challenges we faced, including:

  • How we have woven together Apache and community open source big data technologies (like Hadoop, Pig, Hive, Parquet, Presto, and Spark) into our stack
  • What we contribute to these open source projects to make them work for us
  • How we evaluate and evolve the choices of big data processing engines as we scale up
  • How we build our own big data infrastructure on top of these open source technologies, and which parts we have open-sourced in Netflix OSS (like Genie, Inviso, and Lipstick)
  • How we evolve the tooling (like our big data portal and our big data API) as we scale up the number of users who rely on the systems to make critical business decisions

Overall, you will learn about our open source-powered big data architecture in the AWS cloud, and how we built out the technology stack that makes up the big data platform at Netflix today.

Eva Tse

Netflix, Inc

Eva Tse leads the Big Data Platform team at Netflix. Her team architects and manages the Netflix big data platform in the AWS cloud. The platform is leveraged across Netflix for data analytics and ETL. The technology stack includes various open source projects (e.g., Pig, Hive, Presto, Parquet, Hadoop) and Netflix open-sourced tools and services (e.g., Genie, Lipstick, Inviso). Prior to joining Netflix, Eva led the server and metadata service teams for PowerCenter at Informatica. Eva holds an MS and BS in computer science from the University of Houston.