More than 109 million subscribers watch more than 125 million hours of TV shows and movies per day on Netflix. This generates a massive amount of data flowing through its data pipeline, which can be used to improve the service and user experience and to power data analytics use cases such as personalization, operational insight, and fraud detection.
Netflix is now building a stream-processing-as-a-service (SPaaS) platform on top of Apache Flink that is self-serve, operable, scalable, fault tolerant, and multitenant. Steven Wu explains how Netflix’s SPaaS platform empowers users to focus on extracting insights from data streams and building stream processing applications. He also shares lessons learned from building and operating the largest SPaaS use case: Netflix’s Keystone data pipeline, a self-serve platform for creating near-real-time event pipelines that processes three trillion events and 12 PB of data every day.
Steven Wu is a software engineer working on the real-time data infrastructure that powers a massive data ingestion pipeline and stream processing at Netflix. Steven is passionate about building scalable and operable distributed systems. Previously, he worked on the cloud platform that serves as the foundation of Netflix’s cloud-native microservice architecture. Before Netflix, he worked on Yahoo’s Messenger server team, where he was a key contributor in revamping Messenger’s backend and supporting multicolo deployment; he also designed and implemented a distributed key-value storage system.
©2018, O'Reilly Media, Inc.