Join Tyler Akidau for a whirlwind tour of the conceptual building blocks of massive-scale data processing systems over the last decade, as Tyler compares and contrasts systems at Google with popular open source systems in use today.
Tyler explores the evolution of massive-scale data processing at Google, from the original MapReduce paradigm to the high-level pipelines of Flume to the streaming approach of MillWheel to the unified streaming/batch model of Cloud Dataflow. Along the way, Tyler examines in detail the basic architectural concepts that underlie the four models—highlighting their similarities, contrasting their differences (particularly regarding traditional batch versus streaming), and providing insight into the use cases that drove the progression of the designs to what exists today—and discusses the similarities and differences with related open source systems, such as Hadoop, Spark, Storm, and Flink.
Tyler Akidau is a senior staff software engineer at Google Seattle. He leads technical infrastructure’s internal data processing teams in Seattle (MillWheel & Flume), is a founding member of the Apache Beam PMC, and has spent the last seven years working on massive-scale data processing systems. Though deeply passionate and vocal about the capabilities and importance of stream processing, he is also a firm believer in batch and streaming as two sides of the same coin, with the real endgame for data processing systems the seamless merging between the two. He is the author of the 2015 “Dataflow Model” paper and “Streaming 101” and “Streaming 102.” His preferred mode of transportation is by cargo bike, with his two young daughters in tow.
Comments on this page are now closed.
©2017, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.