Stream processing is increasingly relevant in today’s world of big data, thanks to the lower latency, higher-value results, and more predictable resource utilization afforded by stream processing engines. At the same time, without a solid understanding of the necessary building blocks, streaming can feel like a complex and subtle beast. It doesn’t have to be that way. Join Tyler Akidau, Slava Chernyak, and Dan Halperin for a tour of stream processing concepts via a walkthrough of the easiest-to-use yet most sophisticated stream processing model on the planet: Apache Beam (incubating). This hands-on tutorial covers the basics of robust stream processing (windowing, watermarks, and triggers), with the option to work through exercises using the runner of your choice (Flink, Spark, or Google Cloud Dataflow).
Tyler Akidau is a senior staff software engineer at Google Seattle, where he leads the technical infrastructure internal data processing teams for MillWheel and Flume. Tyler is a founding member of the Apache Beam PMC and has spent the last seven years working on massive-scale data processing systems. Though deeply passionate and vocal about the capabilities and importance of stream processing, he is also a firm believer that batch and streaming are two sides of the same coin and that the real endgame for data processing systems is the seamless merging of the two. He is the author of the 2015 “Dataflow Model” paper and the “Streaming 101” and “Streaming 102” blog posts. His preferred mode of transportation is by cargo bike, with his two young daughters in tow.
Slava Chernyak is a senior software engineer at Google. Slava spent over five years working on Google’s internal massive-scale streaming data processing systems and has since become involved with designing and building Google Cloud Dataflow Streaming from the ground up. Slava is passionate about making massive-scale stream processing available and useful to a broader audience. When he is not working on streaming systems, Slava is out enjoying the natural beauty of the Pacific Northwest.
Dan Halperin is a PPMC member and committer on Apache Beam (incubating). He has worked on Beam and Google Cloud Dataflow for 18 months. Previously, he was the director of research for scalable data analytics at the University of Washington eScience Institute, where he worked on scientific big data problems in oceanography, astronomy, medical informatics, and the life sciences. Dan holds a PhD in computer science and engineering from the University of Washington.
©2016, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.