Engineering the Future of Software
Feb 25–26, 2018: Training
Feb 26–28, 2018: Tutorials & Conference
New York, NY

Building stream processing as a service at Netflix

Steven Wu (Netflix)
1:15pm–2:05pm Tuesday, February 27, 2018
Cloud native, Distributed systems, Scale
Location: Grand Ballroom West
Secondary topics: Best Practice, Case Study

Who is this presentation for?

  • Engineers

Prerequisite knowledge

  • A basic understanding of distributed systems

What you'll learn

  • Why and how Netflix is building a stream-processing-as-a-service platform

Description

Over 109 million subscribers enjoy more than 125 million hours of TV shows and movies per day on Netflix. This generates a massive amount of data flowing through Netflix’s data pipeline, which can be used to improve the service and user experience and to power data analytics use cases such as personalization, operational insight, and fraud detection.

Netflix is now building a stream-processing-as-a-service (SPaaS) platform on top of Apache Flink that is self-serve, operable, scalable, fault tolerant, and multitenant. Steven Wu explains how Netflix’s SPaaS platform empowers users to build stream processing applications and focus on extracting insights from data streams. He also shares lessons learned from building and operating the largest SPaaS use case: Netflix’s Keystone data pipeline, a self-serve platform for creating near-real-time event pipelines that processes three trillion events and 12 PB of data every day.


Steven Wu

Netflix

Steven Wu is a software engineer working on the real-time data infrastructure that powers a massive data ingestion pipeline and stream processing at Netflix. Steven is passionate about building scalable and operable distributed systems. Previously, he worked on the cloud platform that serves as the foundation of Netflix’s cloud-native microservice architecture. Before Netflix, he worked on Yahoo’s Messenger server team, where he was a key contributor in revamping Messenger’s backend and supporting multicolo deployment; he also designed and implemented a distributed key-value storage system.