Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Streaming architecture: Why flow instead of state?

Ted Dunning (MapR)
5:10pm–5:50pm Wednesday, 03/30/2016
Data Innovations

Location: 210 D/H
Tags: real-time
Average rating: 4.11 (9 ratings)

Until recently, batch processing was the standard model for big data. This is largely due to the strong influence of the original MapReduce implementation in Hadoop and the difficulty of replacing MapReduce in the original Hadoop framework.

Today, there is a shift toward streaming architectures built with tools such as Apache Spark and Kafka. These architectures offer large benefits in simplicity and robustness, but they are also surprisingly different from previous message-queuing designs. The changes in these new systems allow far greater scalability and make fault tolerance relatively simple to achieve while maintaining good latency.

Ted Dunning explores the key design techniques used in modern systems, including percolators, the big data oscilloscope, replayable queues, state-point queuing, and universal microarchitectures.

Benefits of these techniques include:

  • A decrease in total system complexity
  • Flexible throughput/latency tradeoffs
  • Fault tolerance without the difficulties of the Lambda architecture
  • Easy debuggability

Ted Dunning


Ted Dunning is the chief technology officer at MapR. He’s also a board member for the Apache Software Foundation; a PMC member and committer of the Apache Mahout, Apache ZooKeeper, and Apache Drill projects; and a mentor for various incubator projects. Ted has years of experience with machine learning and other big data solutions across a range of sectors. He’s contributed to clustering, classification, and matrix decomposition algorithms in Mahout and to the new Mahout Math library and designed the t-digest algorithm used in several open source projects and by a variety of companies. Previously, Ted was chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems and built fraud-detection systems for ID Analytics (LifeLock). Ted has coauthored a number of books on big data topics, including several published by O’Reilly related to machine learning, and has 24 issued patents to date plus a dozen pending. He holds a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.