Presented By O'Reilly and Cloudera
December 5–6, 2016: Training
December 6–8, 2016: Tutorials & Conference

Learn stream processing with Apache Beam

Tyler Akidau (Google), Slava Chernyak (Google), Dan Halperin (Google)
1:30pm–5:00pm Tuesday, December 6, 2016
IoT and intelligent real-time applications
Location: 323 Level: Beginner

Prerequisite Knowledge

Materials or downloads needed in advance

  • A laptop with the Beam execution engine of your choice (Flink, Spark, or Cloud Dataflow) installed and set up
  • A GitHub account

What you'll learn

  • Understand the foundations of stream processing and how easily portable streaming pipelines can be built with Apache Beam
  • Explore a series of examples that help shed light on the important topics of windowing, watermarks, and triggers
  • Observe firsthand the different shapes of materialized output made possible by the flexibility of the Beam streaming model
  • Experience the portability afforded by Beam as you work through examples using the runner of your choice: Apache Flink, Apache Spark, or Google Cloud Dataflow


Stream processing is increasingly relevant in today’s world of big data, thanks to the lower latency, higher-value results, and more predictable resource utilization afforded by stream processing engines. At the same time, without a solid understanding of the necessary building blocks, streaming can feel like a complex and subtle beast. It doesn’t have to be that way. Join Tyler Akidau, Slava Chernyak, and Dan Halperin for a tour of stream processing concepts via a walkthrough of the most easy-to-use yet sophisticated stream processing model on the planet: Apache Beam (incubating). This hands-on tutorial covers the basics of robust stream processing (windowing, watermarks, and triggers), with the option to work through exercises using the runner of your choice (Flink, Spark, or Google Cloud Dataflow).


Tyler Akidau


Tyler Akidau is a senior staff software engineer at Google Seattle, where he leads technical infrastructure’s internal data processing teams for MillWheel and Flume. Tyler is a founding member of the Apache Beam PMC and has spent the last seven years working on massive-scale data processing systems. Though deeply passionate and vocal about the capabilities and importance of stream processing, he is also a firm believer that batch and streaming are two sides of the same coin and that the real endgame for data processing systems is the seamless merging of the two. He is the author of the 2015 “Dataflow Model” paper and the “Streaming 101” and “Streaming 102” blog posts. His preferred mode of transportation is by cargo bike, with his two young daughters in tow.


Slava Chernyak


Slava Chernyak is a senior software engineer at Google. Slava spent over five years working on Google’s internal massive-scale streaming data processing systems and has since become involved with designing and building Google Cloud Dataflow Streaming from the ground up. Slava is passionate about making massive-scale stream processing available and useful to a broader audience. When he is not working on streaming systems, Slava is out enjoying the natural beauty of the Pacific Northwest.


Dan Halperin


Dan Halperin is a PPMC member and committer on Apache Beam (incubating). He has worked on Beam and Google Cloud Dataflow for 18 months. Previously, he was the director of research for scalable data analytics at the University of Washington eScience Institute, where he worked on scientific big data problems in oceanography, astronomy, medical informatics, and the life sciences. Dan holds a PhD in computer science and engineering from the University of Washington.