Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Learn stream processing with Apache Beam

Frances Perry (Google), Tyler Akidau (Google)
9:00am–12:30pm Tuesday, March 14, 2017
Stream processing and analytics
Location: 210 A/E
Level: Beginner
Secondary topics: Streaming
Average rating: 3.00 (2 ratings)

Who is this presentation for?

  • Anyone wanting to learn the basics of stream processing and Apache Beam

Prerequisite knowledge

Materials or downloads needed in advance

  • A laptop
  • A GitHub account
  • Any initial setup for the Beam execution engine of your choice (Flink, Spark, or Cloud Dataflow) already completed

What you'll learn

  • Understand the foundations of stream processing and the ease with which portable streaming can be accomplished via the Apache Beam platform

Description

Stream processing is increasingly relevant in today’s world of big data, thanks to the lower latency, higher-value results, and more predictable resource utilization afforded by stream processing engines. At the same time, without a solid understanding of the necessary building blocks, streaming can feel like a complex and subtle beast. It doesn’t have to be that way.

Join Tyler Akidau and Frances Perry for a tour of stream processing concepts via a walkthrough of the easiest-to-use yet most sophisticated stream processing model on the planet, Apache Beam (incubating). You’ll explore a series of examples that help shed light on the important topics of windowing, watermarks, and triggers; observe firsthand the different shapes of materialized output made possible by the flexibility of the Beam streaming model; experience the portability afforded by Beam as you work through examples using the runner of your choice (Apache Flink, Apache Spark, or Google Cloud Dataflow); and interact with engineers who have years of experience with massive-scale stream processing.
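For readers new to the three terms named above, the following is a minimal plain-Python sketch of how windowing, watermarks, and triggers relate. It is not the Beam API and is not taken from the tutorial materials; the event format, the 60-second window size, and the names `window_start` and `process` are assumptions made up for illustration. Events are grouped into fixed event-time windows, a watermark signals that earlier event times are believed complete, and a simple trigger emits each window once the watermark passes its end.

```python
from collections import defaultdict

WINDOW_SIZE = 60  # seconds; hypothetical fixed-window width


def window_start(event_time, size=WINDOW_SIZE):
    """Return the start of the fixed window containing event_time."""
    return event_time - (event_time % size)


def process(events, size=WINDOW_SIZE):
    """Sum values per fixed event-time window.

    events: iterable of ("data", event_time, value) or ("watermark", time).
    Returns a list of (window_start, total, label) panes in emission order.
    """
    open_windows = defaultdict(int)  # window start -> running sum
    watermark = float("-inf")
    panes = []
    for event in events:
        if event[0] == "watermark":
            watermark = max(watermark, event[1])
            # Trigger: fire every open window whose end the watermark has passed.
            closed = sorted(w for w in open_windows if w + size <= watermark)
            for start in closed:
                panes.append((start, open_windows.pop(start), "on-time"))
        else:
            _, event_time, value = event
            start = window_start(event_time, size)
            if start + size <= watermark:
                # The window has already fired: emit the value as a late pane.
                panes.append((start, value, "late"))
            else:
                open_windows[start] += value
    return panes


events = [
    ("data", 5, 1),
    ("data", 65, 2),
    ("data", 10, 3),    # out of order, but window [0, 60) is still open
    ("watermark", 60),
    ("data", 30, 7),    # arrives after the watermark passed its window
    ("watermark", 120),
]
panes = process(events)
print(panes)
```

The out-of-order element at time 10 still lands in the first window because that window has not fired yet, while the element at time 30 arrives after the watermark and is reported separately as late. Real engines offer richer choices here (early firings, allowed lateness, accumulating versus discarding panes), which this sketch leaves out.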

Frances Perry

Google

Frances Perry is a software engineer who likes to make big data processing easy, intuitive, and efficient. After many years working on Google’s internal data processing stack, Frances joined the Cloud Dataflow team to make this technology available to external cloud customers. She led the early work on Dataflow’s unified batch/streaming programming model and is on the PMC for Apache Beam.

Tyler Akidau

Google

Tyler Akidau is a senior staff software engineer at Google Seattle, where he leads the technical infrastructure internal data processing teams for MillWheel and Flume. Tyler is a founding member of the Apache Beam PMC and has spent the last seven years working on massive-scale data processing systems. Though deeply passionate and vocal about the capabilities and importance of stream processing, he is also a firm believer that batch and streaming are two sides of the same coin and that the real endgame for data processing systems is the seamless merging of the two. He is the author of the 2015 “Dataflow Model” paper and the “Streaming 101” and “Streaming 102” blog posts. His preferred mode of transportation is by cargo bike, with his two young daughters in tow.