Apache Beam is an evolution of the Dataflow model, which Google created to process massive amounts of data. The name Beam (Batch + strEAM) reflects the goal of a single, unified model for both batch and stream data processing. Programs written with Beam can run on different processing frameworks through runners.
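The idea of declaring a pipeline once and handing it to interchangeable runners can be sketched in plain Python. This is a toy illustration only, not the Apache Beam API: every name in it (`PIPELINE`, `apply_steps`, `run_batch`, `run_streaming`) and the sensor readings are invented for the example, and real runners also handle distribution, windowing, and fault tolerance, which are left out here.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical names throughout: this is NOT the Apache Beam API.
Reading = Tuple[str, float]  # (sensor id, temperature)


def to_fahrenheit(reading: Reading) -> Reading:
    """Convert a Celsius reading to Fahrenheit."""
    sensor, celsius = reading
    return sensor, celsius * 9 / 5 + 32


def is_hot(reading: Reading) -> bool:
    """Keep only readings above 80 degrees Fahrenheit."""
    return reading[1] > 80.0


# The pipeline is declared once, as an ordered list of steps.
PIPELINE = [("map", to_fahrenheit), ("filter", is_hot)]


def apply_steps(element: Reading, steps) -> Optional[Reading]:
    """Run one element through every step; return None if it is filtered out."""
    for kind, fn in steps:
        if kind == "map":
            element = fn(element)
        elif kind == "filter" and not fn(element):
            return None
    return element


def run_batch(steps, data: List[Reading]) -> List[Reading]:
    """Batch-style runner: takes a bounded list and returns all results at once."""
    results = (apply_steps(element, steps) for element in data)
    return [r for r in results if r is not None]


def run_streaming(steps, source: Iterable[Reading]) -> Iterator[Reading]:
    """Streaming-style runner: yields each result as soon as its input arrives."""
    for element in source:
        result = apply_steps(element, steps)
        if result is not None:
            yield result


# Made-up sample data in Celsius.
readings = [("s1", 20.0), ("s2", 30.0), ("s1", 35.0)]

batch_out = run_batch(PIPELINE, readings)
stream_out = list(run_streaming(PIPELINE, iter(readings)))
```

Both runners receive the same `PIPELINE` definition and produce the same results; only the way the input is consumed differs, which is the property the unified model is aiming for.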
Apache Beam is designed to provide efficient and portable data processing pipelines. The same Beam pipeline handles both batch and streaming workloads and runs on a variety of open source and private cloud data processing backends, including Apache Flink, Apache Spark, and Google Cloud Dataflow. Jean-Baptiste Onofré offers an overview of Apache Beam’s programming model, explores mechanisms for efficiently building data pipelines, and demos an IoT use case dealing with MQTT messages.
Jean-Baptiste Onofré is a fellow and software architect at Talend, a cloud and big data integration software company. An ASF member and contributor to roughly 20 different Apache projects, Jean-Baptiste specializes in both system integration and big data. He is also a champion and PPMC member on multiple Apache Beam projects.