Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Stream all the things!

Dean Wampler (Anyscale)
11:20am–12:00pm Wednesday, September 27, 2017
Data Engineering & Architecture, Stream processing and analytics
Location: 1E 07/08
Level: Intermediate
Secondary topics: Streaming
Average rating: 3.00 (3 ratings)

Who is this presentation for?

  • Software architects and senior developers interested in architecture trends

Prerequisite knowledge

  • Prior experience building architectures for data-centric or enterprise systems

What you'll learn

  • Understand the forces driving streaming architectures in both the data and microservice spheres
  • Learn what streaming actually means, the tools that address particular needs, and how architectures are converging


Big data started with an emphasis on batch-oriented architectures, where data is captured in large, scalable stores and then processed using batch jobs. To reduce the gap between data arrival and information extraction, these architectures are now evolving to be stream oriented, where data is processed as it arrives. (Fast data is the new buzzword for this process.)

Microservices are inherently message driven, which is a core tenet of reactive systems: they respond to requests for service and send messages to other microservices in turn. Hence, they are also stream oriented in a sense.

The word “stream” is trendy, and it is used in both spheres because both are concerned with never-ending sequences of data, but the resemblance is not superficial. The two spheres share many of the same challenges and design patterns. Hence, the movement to stream-oriented architectures is driving a convergence of data-centric and microservice architectures.

Dean Wampler defines “stream” based on the characteristics of such systems, using specific tools as examples, and argues that big data and microservice architectures are converging. Dean begins by quantifying what streaming means in the context of four axes of concern that cross the fast data and microservice divide:

  • Low latency: How low?
  • High volume: How high?
  • Integration with other tools: Which ones and how?
  • Data processing: What kinds? In bulk? As individual events?
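
As a rough illustration of the last axis, the sketch below contrasts processing in bulk with processing individual events. It is not taken from the talk; the function names and the sample readings are hypothetical, and a running average stands in for whatever computation a real system would perform.

```python
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Bulk style: wait for the whole data set, then compute one answer."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Per-event style: emit an updated average as each reading arrives."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


if __name__ == "__main__":
    data = [2.0, 4.0, 6.0]  # hypothetical sample readings
    print(batch_average(data))            # 4.0
    print(list(streaming_average(data)))  # [2.0, 3.0, 4.0]
```

The bulk version produces nothing until all the data is present, while the per-event version yields a usable result after every arrival, which is the latency trade-off the axes above are meant to quantify.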

Dean then considers specific examples of streaming tools and explains how they fit on these axes, including heavy hitters in the data world, such as Spark and Kafka, and microservice toolkits, such as Akka and Rx. Dean concludes by speculating on the future of these trends: he believes that fast data and microservice architectures will converge, driven by the ever-growing importance of data and the scalability of fast data streaming.

Dean Wampler


Dean Wampler is an expert in streaming data systems, focusing on applications of machine learning and artificial intelligence (ML/AI). He’s head of developer relations at Anyscale, which is developing Ray for distributed Python, primarily for ML/AI. Previously, he was an engineering VP at Lightbend, where he led the development of Lightbend CloudFlow, an integrated system for building and running streaming data applications with Akka Streams, Apache Spark, Apache Flink, and Apache Kafka. Dean is the author of Fast Data Architectures for Streaming Applications, Programming Scala, and Functional Programming for Java Developers, and he’s the coauthor of Programming Hive, all from O’Reilly. He’s a contributor to several open source projects. A frequent conference speaker and tutorial teacher, he’s also the co-organizer of several conferences around the world and several user groups in Chicago. He earned his PhD in physics from the University of Washington.