History has shown the limitations of existing streaming systems with respect to reliability, flexibility, and ease of use. The industry has responded in turn with the Lambda Architecture, a clever confederation of batch and streaming systems that provides low-latency, eventually-correct results, while maintaining the ability to respond to changes in upstream data. Lambda proponents have long argued that it’s not possible to have all these things at once within a single streaming system. We respectfully disagree. :-)
We believe it is possible to build a streaming system you can rely on, making the Lambda Architecture unnecessary. In this talk, I’ll cover:
This talk is both high-level and quite technical. Opinions vary about what streaming is, and the talk gives an overview of the existing approaches. It then covers in detail the streaming use case that no other general streaming system has yet conquered: providing low-latency, correct results with the flexibility to adjust to changes in source data, all at massive scale. We hope to give the audience an understanding of the issues they might face in building standalone streaming pipelines, regardless of the architecture used, with an eye toward the features of Google Cloud Dataflow that make it particularly well suited to that problem domain.
Tyler Akidau is a staff software engineer at Google. The current tech lead for internal streaming data processing systems (e.g. MillWheel), he’s spent five years working on massive-scale streaming data processing systems. He passionately believes in streaming data processing as the more general model of large-scale computation. His preferred mode of transportation is by cargo bike, with his two young daughters in tow.
©2015, O’Reilly UK Ltd
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.