Presented by O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Realizing the promise of portability with Apache Beam

Tyler Akidau (Google)
11:15–11:55 Thursday, 25 May 2017
Level: Beginner
Average rating: 3.80 (5 ratings)

Who is this presentation for?

  • Developers, architects, managers, and anyone interested in learning about Beam's capabilities for pipeline portability

Prerequisite knowledge

  • Familiarity with big data processing concepts

What you'll learn

  • Understand what Beam and its associated ecosystem look like today, how Beam can deliver on the promise of pipeline portability across multiple execution engines and environments, and where the project is headed in the future


The world of big data involves an ever-changing field of players. Much as SQL is a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms. Tyler Akidau explains how this vision has been realized and discusses the challenges that lie ahead.

Topics include:

  • The capabilities of the Beam model for data processing and the current state of the Beam ecosystem
  • The benefits Beam provides for portability and ease of use
  • A demo of the same Beam pipeline running on multiple runners in multiple deployment scenarios (e.g., Apache Flink on Google Cloud, Apache Spark on AWS, and Apache Apex on-premises)
  • Some of the challenges Beam aims to address in the future

Tyler Akidau


Tyler Akidau is a senior staff software engineer at Google Seattle, where he leads Technical Infrastructure's internal data processing teams for MillWheel and Flume. Tyler is a founding member of the Apache Beam PMC and has spent the last seven years working on massive-scale data processing systems. Though deeply passionate and vocal about the capabilities and importance of stream processing, he is also a firm believer that batch and streaming are two sides of the same coin and that the real endgame for data processing systems is the seamless merging of the two. He is the author of the 2015 “Dataflow Model” paper and the “Streaming 101” and “Streaming 102” blog posts. His preferred mode of transportation is the cargo bike, with his two young daughters in tow.