Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

The evolution of massive-scale data processing

Tyler Akidau (Google)
1:15pm–1:55pm Thursday, 09/29/2016
Data innovations
Location: 1 E 07/1 E 08
Level: Beginner
Tags: real-time
Average rating: 4.67 (3 ratings)

Prerequisite knowledge

  • Familiarity with basic data processing concepts (both batch and streaming)
  • A basic understanding of the high-level topics presented in Tyler's O'Reilly Radar articles, "The World Beyond Batch: Streaming 101" and "Streaming 102"

What you'll learn

  • Understand the building blocks of massive-scale data processing systems in general
  • Gain an improved ability to choose the right system for your needs
  • Learn a set of insights to apply when engineering your own data processing applications

Description

    Tyler Akidau offers a whirlwind tour of the evolution of massive-scale data processing at Google, from the original MapReduce paradigm to the high-level pipelines of Flume to the streaming approach of MillWheel to the portable, unified streaming/batch model of Google Cloud Dataflow and Apache Beam (incubating). Tyler examines in detail the basic architectural concepts that underlie these four models, highlights their similarities, contrasts their differences (particularly regarding traditional batch versus streaming), and provides insight into the use cases that drove the progression of the designs to what exists today. He also highlights similarities and differences with related open source systems such as Flink, Spark, Storm, and Gearpump, calling out ways in which they’re converging on and diverging from the Beam model and what that means when running Beam pipelines on their respective runners.


    Tyler Akidau


    Tyler Akidau is a senior staff software engineer at Google Seattle, where he leads technical infrastructure internal data processing teams for MillWheel and Flume. Tyler is a founding member of the Apache Beam PMC and has spent the last seven years working on massive-scale data processing systems. Though deeply passionate and vocal about the capabilities and importance of stream processing, he is also a firm believer that batch and streaming are two sides of the same coin and that the real endgame for data processing systems is the seamless merging between the two. He is the author of the 2015 “Dataflow Model” paper and “Streaming 101” and “Streaming 102” blog posts. His preferred mode of transportation is by cargo bike, with his two young daughters in tow.

    Comments on this page are now closed.


    Tyler Akidau
    10/06/2016 3:27pm EDT


    Hajkan Jonsson
    10/06/2016 4:07am EDT

    Great talk in NYC. Can you please share the slides to the speaker slides page?

    10/03/2016 9:37pm EDT

    Could you share the presentation slides?