Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Scaling data lineage at Netflix to improve data infrastructure reliability and efficiency

Jitender Aswani (Netflix), Di Lin (Netflix), Girish Lingappa (Netflix)
11:00am–11:40am Wednesday, March 27, 2019
Average rating: 3.40 out of 5 (15 ratings)

Who is this presentation for?

  • Data leaders, data architects, data engineers, and data infrastructure engineers



Prerequisite knowledge

  • A basic understanding of big data architecture, ETL pipeline development, and data lineage concepts

What you'll learn

  • Explore the key data principles Netflix applied to design a comprehensive centralized lineage solution
  • Discover how the company leveraged open source data technologies to implement this architecture at scale


Netflix's data infrastructure generates over a trillion events per day and stores over 100 PB of data. Data lineage plays a central role in the company’s warehouse for establishing data integrity and trust.

Using a variety of publicly known use cases and a practical data architecture, Di Lin, Girish Lingappa, and Jitender Aswani explain how Netflix built a centralized lineage service to better understand the movement and evolution of data and related data artifacts within the company’s data warehouse, from the initial ingestion of trillions of events through multistage ETLs, reports, and dashboards.

By establishing end-to-end data lineage across all data artifacts at an extremely granular level, Netflix is able to improve platform reliability by accurately forecasting job SLAs. The same lineage also increases company-wide trust in the data and makes the data infrastructure more efficient by establishing appropriate data retention levels.
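
As a rough illustration of the idea, lineage of this kind is often modeled as a directed graph whose nodes are data artifacts and whose edges point from a source to whatever is derived from it. The sketch below is a minimal in-memory example under that assumption, not a description of Netflix's service. The `LineageGraph` class and the artifact names (`raw_events`, `etl_stage_1`, and so on) are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage graph: nodes are data artifacts, edges point downstream."""

    def __init__(self):
        self._downstream = defaultdict(set)  # source -> artifacts derived from it
        self._upstream = defaultdict(set)    # target -> artifacts it is derived from

    def add_edge(self, source, target):
        """Record that `target` is derived from `source`."""
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    def _walk(self, start, edges):
        # Breadth-first traversal that collects every artifact reachable from `start`.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def downstream(self, artifact):
        """All artifacts that directly or indirectly depend on `artifact`."""
        return self._walk(artifact, self._downstream)

    def upstream(self, artifact):
        """All artifacts that `artifact` directly or indirectly depends on."""
        return self._walk(artifact, self._upstream)


# Hypothetical pipeline: ingested events -> two ETL stages -> report and dashboard.
graph = LineageGraph()
graph.add_edge("raw_events", "etl_stage_1")
graph.add_edge("etl_stage_1", "etl_stage_2")
graph.add_edge("etl_stage_2", "report")
graph.add_edge("etl_stage_2", "dashboard")

impacted = graph.downstream("etl_stage_1")  # what a delay in this stage would affect
sources = graph.upstream("dashboard")       # what the dashboard ultimately relies on
```

A downstream query of this kind could feed a question such as which reports a late job puts at risk, while an artifact with no downstream consumers is a natural candidate for a shorter retention period.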

Jitender Aswani


Jitender Aswani supports the infrastructure and security data engineering teams at Netflix. His team designs, builds, and deploys scalable big data architecture and solutions to enable business and operations teams to achieve consistent capacity, reliability, and security gains. Jitender is a lifelong student of smart data products and data science solutions that push organizations to make data-inspired decisions and adopt analytics-first approaches.

Di Lin


Di Lin is a senior data engineer on the infrastructure and information security team at Netflix, where he focuses on building and scaling complex data systems to help infrastructure teams improve reliability and efficiency. Previously, he was a data engineer at Facebook, where he built company-wide data products related to identity and subscriber growth.

Girish Lingappa


Girish Lingappa is a senior software engineer at Netflix.