Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Scaling data lineage at Netflix to improve data infrastructure reliability and efficiency

Jitender Aswani (Netflix), Di Lin (Netflix), Girish Lingappa (Netflix)
11:00am-11:40am Wednesday, March 27, 2019
Secondary topics: Data Integration and Data Pipelines; Data preparation, data governance, and data lineage; Media, Marketing, Advertising
Average rating: 3.40 of 5 (15 ratings)

Who is this presentation for?

  • Data leaders, data architects, data engineers, and data infrastructure engineers

Level

Intermediate

Prerequisite knowledge

  • A basic understanding of big data architecture, ETL pipeline development, and data lineage concepts

What you'll learn

  • Explore the key data principles Netflix applied to design a comprehensive centralized lineage solution
  • Discover how the company leveraged open source data technologies to implement this architecture at scale

Description

Netflix's data infrastructure generates over a trillion events per day and stores over 100 PB of data. Data lineage plays a central role in the company’s warehouse for establishing data integrity and trust.

Drawing on a variety of publicly known use cases and a practical data architecture, Di Lin, Girish Lingappa, and Jitender Aswani explain how Netflix built a centralized lineage service to better understand the movement and evolution of data and related data artifacts within the company’s data warehouse, from the initial ingestion of trillions of events through multistage ETLs, reports, and dashboards.

By establishing end-to-end data lineage across all data artifacts at an extremely granular level, Netflix is able to improve platform reliability by accurately forecasting job SLAs. The same lineage also increases company-wide trust in the data and makes the data infrastructure more efficient by helping to set appropriate data retention levels.

Jitender Aswani

Netflix

Jitender Aswani supports the infrastructure and security data engineering teams at Netflix. His team designs, builds, and deploys scalable big data architecture and solutions to enable business and operations teams to achieve consistent capacity, reliability, and security gains. Jitender is a lifelong student of smart data products and data science solutions that push organizations to make data-inspired decisions and adopt analytics-first approaches.

Di Lin

Netflix

Di Lin is a senior data engineer on the infrastructure and information security team at Netflix, where he focuses on building and scaling complex data systems to help infrastructure teams improve reliability and efficiency. Previously, he was a data engineer at Facebook, where he built company-wide data products related to identity and subscriber growth.

Girish Lingappa

Netflix

Girish Lingappa is a senior software engineer at Netflix.