Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference

How Apache Beam can advance your enterprise workloads

1:45pm–2:25pm Thursday, December 7, 2017
Average rating: ***** (5.00, 1 rating)

Prerequisite knowledge

  • A basic understanding of big data technologies, such as Apache Flink and Apache Spark, and of cloud platforms

What you'll learn

  • Learn how to build data pipelines using Apache Beam
  • Discover how Beam can execute the same code across different runners, with a demo based on an IoT use case in which Apache Beam processes MQTT messages


Apache Beam is an evolution of the Dataflow model created by Google to process massive amounts of data. The name Beam (Batch + strEAM) comes from the idea of having a unified model for both batch and stream data processing. Programs written using Beam can be executed on different processing frameworks via runners.

Apache Beam is designed to provide efficient and portable data processing pipelines. The same Beam pipeline works in both batch and streaming modes and runs on a variety of open source and cloud-based data processing backends, including Apache Flink, Apache Spark, and Google Cloud Dataflow. Jean-Baptiste Onofré offers an overview of Apache Beam’s programming model, explores mechanisms for efficiently building data pipelines, and demos an IoT use case dealing with MQTT messages.


Jean-Baptiste Onofré


Jean-Baptiste Onofré is a fellow and software architect at cloud and big data integration software company Talend. An ASF member and contributor to roughly 20 different Apache projects, Jean-Baptiste specializes in both system integration and big data. He is also a champion and PPMC member on multiple Apache projects, including Apache Beam.