Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Hands-on machine learning with Kafka-based streaming pipelines

Boris Lublinsky (Lightbend), Dean Wampler (Anyscale)
9:00am-12:30pm Tuesday, March 26, 2019
Average rating: 3.85 (13 ratings)

Who is this presentation for?

  • Data engineers and architects interested in techniques for serving ML models in streaming data pipelines



Prerequisite knowledge

  • Programming experience, preferably with Java or Scala
  • Familiarity with ML toolkits, Kafka, and Akka Streams (useful but not required)

Materials or downloads needed in advance

  • A laptop with Scala 2.11, SBT, Docker, Git, and IntelliJ (recommended) installed
  • Read the course GitHub page and follow the setup instructions

What you'll learn

  • Understand the trade-offs of using REST-based services versus embedding for model serving
  • Learn how to combine Kafka with streaming libraries like Akka Streams and Kafka Streams to implement model serving for streaming data pipelines and how to leverage the strengths of these tools while avoiding their weaknesses
  • Discover how these approaches compare to Spark Streaming and Flink for stream-based model serving and how well different ML systems support these patterns


How should you train models and serve them (that is, score with them)? One possibility is to treat the trained model as code, then run that code for scoring. This works fine if the model will never change for the lifetime of the scoring process, but it’s not ideal for long-running data streams, where you would like to retrain the model periodically (due to concept drift) and score with the new model. A better way is to treat the model as data and have this model data exchanged between the training and scoring systems, which allows updating models in the running context.
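The model-as-data idea can be illustrated with a minimal Python sketch. It is not the presenters' implementation: `LinearModel`, `ModelServer`, and the JSON model format are hypothetical names chosen for this example, and the two handler methods stand in for consumers of a model-update stream and a data stream (for instance, two Kafka topics).

```python
import json
from typing import List, Optional, Sequence


class LinearModel:
    """Tiny stand-in for a real trained model: score = w . x + b."""

    def __init__(self, weights: Sequence[float], bias: float) -> None:
        self.weights: List[float] = list(weights)
        self.bias = bias

    @classmethod
    def from_bytes(cls, payload: bytes) -> "LinearModel":
        # Assumed wire format (hypothetical): {"weights": [...], "bias": ...}
        spec = json.loads(payload.decode("utf-8"))
        return cls(spec["weights"], spec["bias"])

    def score(self, features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


class ModelServer:
    """Scores records with whatever model arrived most recently."""

    def __init__(self) -> None:
        self.model: Optional[LinearModel] = None

    def on_model(self, payload: bytes) -> None:
        # A new model is just another message: deserialize it and swap it in
        # without restarting the scoring process.
        self.model = LinearModel.from_bytes(payload)

    def on_record(self, features: Sequence[float]) -> Optional[float]:
        # Until the first model arrives there is nothing to score with.
        if self.model is None:
            return None
        return self.model.score(features)
```

Because the model arrives as a message, retraining only requires publishing a new payload; the scoring process keeps running and picks it up on the next record.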

Boris Lublinsky and Dean Wampler walk you through different approaches to model training and serving that use this technique, where you make one or both functions an integrated part of the data processing pipeline implementation (i.e., as an additional functional transformation of the data). The advantage of this approach is that model serving is implemented as part of the larger data transformation pipeline. Such pipelines can be implemented either using streaming engines (e.g., Spark Streaming, Flink, or Beam) or streaming libraries (e.g., Akka Streams or Kafka Streams). Boris and Dean demonstrate example implementations using Akka Streams, Flink, and Spark Structured Streaming.

Along the way, they cover speculative execution of model serving. The advantage of this approach is the ability to provide the following features for model serving applications:

  • Guaranteed execution time: If you have several models and the fastest one has a fixed execution time for scoring, you can ensure that scoring completes within a fixed time bound (latency budget), as long as this bound is larger than that model’s execution time.
  • Consensus-based model serving: When you have several models, you can implement model serving where the score is the outcome of a majority “vote” of the models.
  • Quality-based model serving: When you have multiple models, you can layer over them an algorithm that evaluates the quality of each scoring result and picks the result with the best quality. This requires a quality metric to exist. One possibility is that each score includes a confidence level and the result with the highest confidence wins.
  • A/B or blue-green testing: You can put a new model in production, route some percentage of the traffic to it, then evaluate if it’s actually better before switching all traffic to it.
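The first three features above can be sketched in Python with a thread pool, as a rough illustration under stated assumptions rather than the approach shown in the tutorial. All names here (`Score`, `speculative_score`, `majority_vote`, `highest_confidence`) are hypothetical, and a model is assumed to be a plain callable that returns a label with a confidence.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, List, NamedTuple, Sequence


class Score(NamedTuple):
    label: str
    confidence: float


Model = Callable[[Sequence[float]], Score]
Decider = Callable[[List[Score]], Score]

# Shared pool so that slow models can finish in the background
# after the latency budget has expired.
_pool = ThreadPoolExecutor(max_workers=8)


def majority_vote(scores: List[Score]) -> Score:
    """Consensus-based: pick the most common label, then the most
    confident score carrying that label."""
    winner, _ = Counter(s.label for s in scores).most_common(1)[0]
    return max((s for s in scores if s.label == winner),
               key=lambda s: s.confidence)


def highest_confidence(scores: List[Score]) -> Score:
    """Quality-based: the score with the highest confidence wins."""
    return max(scores, key=lambda s: s.confidence)


def speculative_score(models: Sequence[Model],
                      features: Sequence[float],
                      budget_seconds: float,
                      decide: Decider) -> Score:
    """Run all models in parallel and decide among those that answered
    within the latency budget."""
    futures = [_pool.submit(model, features) for model in models]
    done, _ = wait(futures, timeout=budget_seconds)
    # Keep submission order; ignore models that failed or are still running.
    scores = [f.result() for f in futures
              if f in done and f.exception() is None]
    if not scores:
        raise TimeoutError("no model answered within the latency budget")
    return decide(scores)
```

The latency guarantee holds only if at least one model reliably finishes inside the budget; results from slower models are simply discarded for that request.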

They also cover performance optimizations for model training. Training from scratch requires all the relevant historical data, so it needs much more compute than scoring typically does. To avoid this overhead, incremental training updates can sometimes be done instead. Minibatch training has existed for a while as a technique for training models on very large datasets, independent of the notion of streaming. This technique is directly applicable to the streaming context, where new data is arriving all the time.

Another common approach to simplifying model serving is to train a sophisticated model, such as a neural net, and then train a simpler model, such as a logistic regression, using the neural net as a data generator. In other words, the simpler model approximates the complex model, trading off accuracy for better scoring performance. A variation of this approach is to use both models in the speculative execution, latency-sensitive context mentioned above.

You’ll also learn the advantages of separating training and serving into two different systems (more implementation flexibility and the ability to optimize training and serving independently); how to use batch or minibatch training, saving the intermediate model locally so that training can be restarted; how to train on a dedicated cluster where the hardware and software are optimized for model training; and how to leverage existing, publicly available models for well-known domains like NLP, where updates to the model are rarely required, thereby eliminating the need to do training yourself.
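The minibatch idea mentioned above can be sketched in a few lines of Python. This is a minimal illustration, assuming a linear regression updated by one gradient step per arriving minibatch; `IncrementalLinearRegression`, `partial_fit`, and `synthetic_stream` are hypothetical names for this example, and the synthetic data (y = 2x + 1) stands in for a real stream.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features, target)


class IncrementalLinearRegression:
    """Linear model trained one minibatch at a time with gradient descent."""

    def __init__(self, n_features: int, learning_rate: float = 0.2) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias

    def partial_fit(self, batch: Sequence[Example]) -> None:
        """Apply one gradient step of the mean squared error on this batch.
        Earlier batches are never revisited."""
        n = len(batch)
        grad_w = [0.0] * len(self.weights)
        grad_b = 0.0
        for features, target in batch:
            error = self.predict(features) - target
            for i, x in enumerate(features):
                grad_w[i] += error * x / n
            grad_b += error / n
        self.weights = [w - self.learning_rate * g
                        for w, g in zip(self.weights, grad_w)]
        self.bias -= self.learning_rate * grad_b


def synthetic_stream(n_batches: int, batch_size: int,
                     seed: int = 42) -> Iterator[List[Example]]:
    """Hypothetical data source: noiseless samples of y = 2x + 1."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        yield [([x], 2.0 * x + 1.0) for x in xs]
```

Feeding each batch from `synthetic_stream` into `partial_fit` moves the weight toward 2 and the bias toward 1 without ever holding the full history in memory, which is the property that makes the technique a fit for streaming.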

Boris and Dean conclude by considering real-world production concerns like data governance for metadata, management and monitoring, reactive principles (e.g., availability requirements and how to meet them as well as how and when to scale as needed), and security.


Boris Lublinsky


Boris Lublinsky is a principal architect at Lightbend, where he specializes in big data, stream processing, and services. Boris has over 30 years’ experience in enterprise architecture. Previously, he was responsible for setting architectural direction, conducting architecture assessments, and creating and executing architectural road maps in fields such as big data (Hadoop-based) solutions, service-oriented architecture (SOA), business process management (BPM), and enterprise application integration (EAI). Boris is the coauthor of Applied SOA: Service-Oriented Architecture and Design Strategies, Professional Hadoop Solutions, and Serving Machine Learning Models. He’s also a cofounder of and a frequent speaker at several Chicago user groups.


Dean Wampler


Dean Wampler is an expert in streaming data systems, focusing on applications of machine learning and artificial intelligence (ML/AI). He’s head of developer relations at Anyscale, which is developing Ray for distributed Python, primarily for ML/AI. Previously, he was an engineering VP at Lightbend, where he led the development of Lightbend CloudFlow, an integrated system for building and running streaming data applications with Akka Streams, Apache Spark, Apache Flink, and Apache Kafka. Dean is the author of Fast Data Architectures for Streaming Applications, Programming Scala, and Functional Programming for Java Developers, and he’s the coauthor of Programming Hive, all from O’Reilly. He’s a contributor to several open source projects. A frequent conference speaker and tutorial teacher, he’s also the co-organizer of several conferences around the world and several user groups in Chicago. He earned his PhD in physics from the University of Washington.