The rise of unbounded, out-of-order, global-scale data requires increasingly sophisticated programming models to make stream processing feasible. When computing over an unbounded stream of data, each use case entails its own balance between three factors: completeness (confidence that you have all the data), latency (waiting to learn from the data), and cost (adding compute power to lower latency).
Kenneth Knowles shows how Apache Beam gives you control over this balance in a unified programming model that is portable to any Beam runner. Beam gives you this power by identifying and separating four concerns common to all streaming computations:
Regardless of backend, these questions must be answered. With Beam, you can answer these questions independently with loosely coupled APIs corresponding to each question: what—reading, transformation, aggregation, and writing; where—event time windowing; when—watermarks and triggers; and how—accumulation modes. With these, you can build a readable and portable pipeline focused on your problem rather than the quirks of your backend, which you can then execute on your runner of choice, including Apache Flink, Apache Spark, Apache Gearpump (also incubating), Apache Apex, or Google Cloud Dataflow.
Kenn Knowles is a founding committer of Apache Beam (incubating). Kenn has been working on Google Cloud Dataflow—Google’s Beam backend—since 2014. Prior to that, he built backends for startups such as Cityspan, Inkling, and Dimagi. Kenn holds a PhD in programming languages from the University of California, Santa Cruz.
Comments on this page are now closed.
©2017, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.