Join Tyler Akidau for a whirlwind tour of the conceptual building blocks of massive-scale data processing systems over the last decade, as Tyler compares and contrasts systems at Google with popular open source systems in use today.
Tyler explores the evolution of massive-scale data processing at Google, from the original MapReduce paradigm to the high-level pipelines of Flume to the streaming approach of MillWheel to the unified streaming/batch model of Cloud Dataflow. Along the way, Tyler examines in detail the basic architectural concepts that underlie the four models, highlighting their similarities, contrasting their differences (particularly regarding traditional batch versus streaming), and providing insight into the use cases that drove the designs to their current form. He also discusses how these models compare with related open source systems such as Hadoop, Spark, Storm, and Flink.
Tyler Akidau is a senior staff software engineer at Google Seattle, where he leads technical infrastructure's internal data processing teams for MillWheel and Flume. Tyler is a founding member of the Apache Beam PMC and has spent the last seven years working on massive-scale data processing systems. Though deeply passionate and vocal about the capabilities and importance of stream processing, he is also a firm believer that batch and streaming are two sides of the same coin and that the real endgame for data processing systems is the seamless merging of the two. He is the author of the 2015 “Dataflow Model” paper and the “Streaming 101” and “Streaming 102” blog posts. His preferred mode of transportation is by cargo bike, with his two young daughters in tow.
©2017, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.
Comments
Hey, this was a great presentation. Where can I find the slides to this? Would love to go back and revisit!
Cheers!