Apache Beam lets you process unbounded, out-of-order, global-scale data with portable high-level pipelines, but not all use cases are pipelines of simple “map” and “combine” operations. Aljoscha Krettek introduces Beam’s new State API, which brings scalability and consistency to fine-grained stateful processing while interoperating with Beam’s other features, such as consistent event-time windowing and windowed side inputs. Pipelines that use it remain portable to any Beam runner, including Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Aljoscha covers the new state and timer features in Beam and shows how to use them to express common real-world use cases in a backend-agnostic manner.
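To make the idea of per-key state combined with event-time timers concrete, here is a minimal, library-free sketch in Python. It is not Beam's API: the names `KeyState`, `BufferUntilTimer`, `flush_delay`, and `advance_watermark` are hypothetical, chosen only for illustration. In a real Beam pipeline the runner would own the state cells and fire the timers as its watermark advances.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KeyState:
    """Hypothetical per-key cell: buffered values plus at most one pending timer."""
    buffer: list = field(default_factory=list)
    timer_at: Optional[int] = None  # event time at which the buffer should flush


class BufferUntilTimer:
    """Toy model of a stateful, timer-driven step (an assumption, not Beam's API)."""

    def __init__(self, flush_delay):
        self.flush_delay = flush_delay
        self.states = defaultdict(KeyState)  # state is partitioned by key

    def process(self, key, value, event_time):
        """Buffer the value; set a timer when the key has none pending."""
        state = self.states[key]
        state.buffer.append(value)
        if state.timer_at is None:
            state.timer_at = event_time + self.flush_delay

    def advance_watermark(self, watermark):
        """Fire every timer the watermark has reached and emit that key's buffer."""
        fired = []
        for key in sorted(self.states):  # sorted only to make the output deterministic
            state = self.states[key]
            if state.timer_at is not None and state.timer_at <= watermark:
                fired.append((key, list(state.buffer)))
                state.buffer.clear()
                state.timer_at = None
        return fired


step = BufferUntilTimer(flush_delay=10)
step.process("user-a", 1, event_time=0)
step.process("user-b", 7, event_time=3)
step.process("user-a", 2, event_time=5)

early = step.advance_watermark(9)    # no timer is due yet
first = step.advance_watermark(10)   # the timer for "user-a" (set at 0 + 10) fires
second = step.advance_watermark(13)  # the timer for "user-b" (set at 3 + 10) fires
```

The sketch keeps state strictly per key and lets only the watermark, not wall-clock time, decide when buffered values are emitted. That is the general shape of buffering and batching use cases that simple “map” and “combine” steps do not express directly.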
Examples of new use cases unlocked by Beam’s new mutable state and timers include:
Aljoscha Krettek is a cofounder and software engineer at Ververica. Previously, he worked at IBM Germany and at the IBM Almaden Research Center in San Jose. Aljoscha is a PMC member of Apache Beam and Apache Flink, where he mainly works on the streaming API and designed and implemented the most recent additions to the windowing and state APIs. He studied computer science at TU Berlin.
©2017, O’Reilly UK Ltd