Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability & Efficiency

Jitender Aswani (Netflix), Di Lin (Netflix)
11:00am-11:40am Wednesday, March 27, 2019
Secondary topics: Data Integration and Data Pipelines; Data preparation, data governance, and data lineage; Media, Marketing, Advertising

Who is this presentation for?

Data Leaders, Data Architects, Data Engineers, Data Infrastructure Engineers

Level

Intermediate

Prerequisite knowledge

Understanding of Big Data Architecture, ETL pipeline development, Data Lineage Concepts

What you'll learn

The presenters will share the key data principles they applied to design a comprehensive, centralized lineage solution and discuss how they used open source data technologies to implement this architecture at scale. The audience will also learn from a few false starts the presenters experienced before course-correcting and going on to build a highly reusable data solution.

Description

Netflix's data infrastructure generates over 1 trillion events per day and stores over 100 PB of data. Data lineage plays a central role in our warehouse for establishing data integrity and trust.

Using a variety of publicly known use cases and a practical data architecture, Di Lin and Jitender Aswani will discuss how Netflix built a centralized lineage service to better understand the movement and evolution of data and related data artifacts within the company’s data warehouse, from the initial ingestion of trillions of events through multi-stage ETLs, reports, and dashboards.

By establishing end-to-end data lineage across all data artifacts at an extremely granular level, we are able to improve platform reliability by accurately forecasting job SLAs. We are also able to increase company-wide trust in the data and improve the efficiency of the data infrastructure by establishing appropriate data retention levels.
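
To make the link between lineage and SLA forecasting concrete, here is a minimal sketch, not the Netflix implementation. It assumes lineage is available as a mapping from each job to its upstream jobs, that the graph is acyclic, and that a typical runtime is known for each job. All job names, runtimes, and the `estimated_finish` helper are hypothetical.

```python
from functools import lru_cache

# Hypothetical lineage: each job maps to the upstream jobs it reads from.
UPSTREAMS = {
    "raw_events": [],
    "sessionize": ["raw_events"],
    "dim_titles": [],
    "daily_plays": ["sessionize", "dim_titles"],
    "dashboard": ["daily_plays"],
}

# Hypothetical typical runtimes, in minutes.
RUNTIME_MIN = {
    "raw_events": 30,
    "sessionize": 45,
    "dim_titles": 10,
    "daily_plays": 20,
    "dashboard": 5,
}


@lru_cache(maxsize=None)
def estimated_finish(job: str) -> int:
    """Minutes after the start of the run at which `job` is expected to finish.

    A job can start only once its slowest upstream has finished, so the
    estimate follows the longest (critical) path through the lineage graph.
    Assumes the lineage graph has no cycles.
    """
    start = max((estimated_finish(u) for u in UPSTREAMS[job]), default=0)
    return start + RUNTIME_MIN[job]


if __name__ == "__main__":
    for job in UPSTREAMS:
        print(f"{job}: expected to finish at +{estimated_finish(job)} min")
```

In this sketch the forecast for a dashboard depends on every job upstream of it, which is why the estimate is only as complete as the lineage behind it.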

Establishing trust in data is essential for all companies. Developing and scaling such a complex platform requires many iterations and a large investment of resources. We hope the audience will benefit from what we learned and go on to build and scale similar data solutions at their own companies.

Jitender Aswani

Netflix

Jitender Aswani supports the Infrastructure and Security Data Engineering teams at Netflix. His team designs, builds, and deploys scalable big data architecture and solutions that help business and operations teams achieve consistent capacity, reliability, and security gains. Jitender is a lifelong student of smart data products and data science solutions that push organizations to make data-inspired decisions and adopt an analytics-first approach.

Di Lin

Netflix

Di Lin is a Senior Data Engineer on the Infrastructure and Information Security team at Netflix. Di’s primary focus at Netflix is building and scaling complex data systems that help infrastructure teams improve reliability and efficiency. Prior to Netflix, Di was a Data Engineer at Facebook, where he built various company-wide data products related to identity and subscriber growth.
