Tyler Akidau offers a whirlwind tour of the evolution of massive-scale data processing at Google, from the original MapReduce paradigm to the high-level pipelines of Flume to the streaming approach of MillWheel to the portable, unified streaming/batch model of Google Cloud Dataflow and Apache Beam (incubating). Tyler examines in detail the basic architectural concepts that underlie these four models, highlights their similarities, contrasts their differences (particularly regarding traditional batch versus streaming), and provides insight into the use cases that drove the progression of the designs to what exists today. He also highlights similarities and differences with related open source systems such as Flink, Spark, Storm, and Gearpump, calling out ways in which they’re converging on and diverging from the Beam model and what that means when running Beam pipelines on their respective runners.
Tyler Akidau is a senior staff software engineer at Google Seattle, where he leads the technical infrastructure internal data processing teams for MillWheel and Flume. Tyler is a founding member of the Apache Beam PMC and has spent the last seven years working on massive-scale data processing systems. Though deeply passionate and vocal about the capabilities and importance of stream processing, he is also a firm believer that batch and streaming are two sides of the same coin and that the real endgame for data processing systems is the seamless merging of the two. He is the author of the 2015 “Dataflow Model” paper and the “Streaming 101” and “Streaming 102” blog posts. His preferred mode of transportation is by cargo bike, with his two young daughters in tow.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.