Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Realizing the promise of portability with Apache Beam

Reuven Lax (Google)
11:20am–12:00pm Thursday, September 28, 2017
Secondary topics: Streaming

Who is this presentation for?

  • Developers, architects, managers, and anyone interested in what Beam offers for pipeline portability

Prerequisite knowledge

  • Familiarity with big data processing concepts

What you'll learn

  • Explore Beam and its associated ecosystem
  • Learn how Beam can deliver on the promise of pipeline portability across multiple execution engines and environments, and where the project is headed

Description

Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms. By cleanly separating the user’s processing logic from details of the underlying execution engine, the same pipelines will run on any Apache Beam runtime environment, whether it’s on-premises or in the cloud, on open source frameworks like Apache Spark or Apache Flink or on managed services like Google Cloud Dataflow.
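
The separation described above can be illustrated with a small sketch. This is a toy model written for this page, not the Apache Beam API: the names build_pipeline, apply_steps, LocalRunner, and ThreadedRunner are hypothetical. It shows only the general idea that a pipeline defined without reference to an engine can be handed unchanged to different executors and produce the same result.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy model, NOT the Apache Beam API: a pipeline is plain data,
# an ordered list of (kind, function) steps with no reference to any engine.
def build_pipeline():
    return [
        ("flat_map", lambda line: line.lower().split()),  # split lines into words
        ("filter", lambda word: len(word) > 3),           # keep words longer than 3 chars
    ]

def apply_steps(steps, element):
    """Run one input element through every step and return its outputs."""
    items = [element]
    for kind, fn in steps:
        if kind == "flat_map":
            items = [out for item in items for out in fn(item)]
        elif kind == "filter":
            items = [item for item in items if fn(item)]
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return items

class LocalRunner:
    """Executes the pipeline sequentially in the calling thread."""
    def run(self, steps, inputs):
        counts = Counter()
        for element in inputs:
            counts.update(apply_steps(steps, element))
        return dict(counts)

class ThreadedRunner:
    """Executes the same pipeline on a thread pool, one task per element."""
    def run(self, steps, inputs):
        counts = Counter()
        with ThreadPoolExecutor(max_workers=4) as pool:
            for outputs in pool.map(lambda e: apply_steps(steps, e), inputs):
                counts.update(outputs)
        return dict(counts)

LINES = ["Beam pipelines stay portable", "portable pipelines run anywhere"]

pipeline = build_pipeline()  # defined once, with no engine details
local_result = LocalRunner().run(pipeline, LINES)
threaded_result = ThreadedRunner().run(pipeline, LINES)
print(local_result == threaded_result)
```

In this sketch the choice of executor is made only at the final call site, which loosely mirrors the idea of selecting an execution engine separately from the processing logic.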

Reuven Lax offers an overview of Beam's basic concepts and demonstrates that portability in action. After introducing the capabilities of the Beam model for data processing and the current state of the Beam ecosystem, Reuven outlines the benefits Beam provides in portability and ease of use and demos the same Beam pipeline running on multiple runners in multiple deployment scenarios (e.g., Apache Flink on Google Cloud, Apache Spark on AWS, Apache Apex on-premises). Along the way, Reuven covers some of the challenges Beam aims to address in the future.

Reuven Lax

Google

Reuven Lax is a senior staff software engineer at Google, the tech lead for cloud-based stream processing (i.e., the streaming engine behind Google Cloud Dataflow), and the former tech lead of MillWheel.