Come explore the evolution of massive-scale data processing over the last decade. The backbone of the talk will follow the progression of systems in use at Google, from the classic MapReduce paradigm, to the high-level pipelines of Flume, to the streaming approach of MillWheel, to the unified streaming/batch model of Cloud Dataflow. I’ll look in detail at the basic architectural concepts that underlie the four models, highlight their similarities, contrast their differences (particularly regarding traditional batch vs. streaming), and provide insight into the use cases that drove the refinement of the designs into what exists today. As we go, I’ll also discuss the common patterns and differentiating characteristics found in contemporary open source systems such as Hadoop, Spark, Storm, and Flink.
Tyler Akidau is a senior staff software engineer at Google Seattle, where he leads technical infrastructure’s internal data processing teams for MillWheel and Flume. Tyler is a founding member of the Apache Beam PMC and has spent the last seven years working on massive-scale data processing systems. Though deeply passionate and vocal about the capabilities and importance of stream processing, he is also a firm believer that batch and streaming are two sides of the same coin and that the real endgame for data processing systems is the seamless merging of the two. He is the author of the 2015 “Dataflow Model” paper and the “Streaming 101” and “Streaming 102” blog posts. His preferred mode of transportation is by cargo bike, with his two young daughters in tow.
©2015, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.