Netflix’s data infrastructure generates over a trillion events per day and stores over 100 PB of data. Data lineage plays a central role in establishing integrity and trust in the company’s data warehouse.
Drawing on a variety of publicly known use cases and a practical data architecture, Di Lin, Girish Lingappa, and Jitender Aswani explain how Netflix built a centralized lineage service to better understand the movement and evolution of data and related data artifacts within the company’s data warehouse, from the initial ingestion of trillions of events through multistage ETLs, reports, and dashboards.
By establishing end-to-end data lineage across all data artifacts at an extremely granular level, Netflix is able to improve platform reliability by accurately forecasting job SLAs. The same lineage also increases company-wide trust in the data and makes the data infrastructure more efficient by informing appropriate data retention levels.
Jitender Aswani supports the infrastructure and security data engineering teams at Netflix. His team designs, builds, and deploys scalable big data architecture and solutions to enable business and operations teams to achieve consistent capacity, reliability, and security gains. Jitender is a lifelong student of smart data products and data science solutions that push organizations to make data-inspired decisions and adopt analytics-first approaches.
Di Lin is a senior data engineer on the infrastructure and information security team at Netflix, where he focuses on building and scaling complex data systems to help infrastructure teams improve reliability and efficiency. Previously, he was a data engineer at Facebook, where he built company-wide data products related to identity and subscriber growth.
Girish Lingappa is a senior software engineer at Netflix.
©2019, O'Reilly Media, Inc.