Stream processing is increasingly relevant in today’s world of big data, thanks to the lower latency, higher-value results, and more predictable resource utilization afforded by stream processing engines. At the same time, without a solid understanding of the necessary building blocks, streaming can feel like a complex and subtle beast. It doesn’t have to be that way.
Join Tyler Akidau and Frances Perry for a tour of stream processing concepts via a walkthrough of the easiest to use yet most sophisticated stream processing model on the planet, Apache Beam (incubating). You’ll explore a series of examples that help shed light on the important topics of windowing, watermarks, and triggers; observe firsthand the different shapes of materialized output made possible by the flexibility of the Beam streaming model; experience the portability afforded by Beam, as you work through examples using the runner of your choice (Apache Flink, Apache Spark, or Google Cloud Dataflow); and interact with engineers who have years of experience with massive-scale stream processing.
Frances Perry is a software engineer who likes to make big data processing easy, intuitive, and efficient. After many years working on Google’s internal data processing stack, Frances joined the Cloud Dataflow team to make this technology available to external cloud customers. She led the early work on Dataflow’s unified batch/streaming programming model and is on the PMC for Apache Beam.
Tyler Akidau is a senior staff software engineer at Google Seattle, where he leads technical infrastructure internal data processing teams for MillWheel and Flume. Tyler is a founding member of the Apache Beam PMC and has spent the last seven years working on massive-scale data processing systems. Though deeply passionate and vocal about the capabilities and importance of stream processing, he is also a firm believer that batch and streaming are two sides of the same coin and that the real endgame for data processing systems is the seamless merging between the two. He is the author of the 2015 “Dataflow Model” paper and “Streaming 101” and “Streaming 102” blog posts. His preferred mode of transportation is by cargo bike, with his two young daughters in tow.
©2017, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.