Hands-on Machine Learning with Kafka-based Streaming Pipelines
Location: LL21 C/D
Who is this presentation for?
Software engineers and architects looking to implement production-grade model serving
Prerequisite knowledge
- Basic knowledge of programming (Scala or Java)
- Knowledge of stream processing
- Understanding of model training and serving
Materials or downloads needed in advance
What you'll learn
How should you train models and serve them (score with them)? One possibility is to treat the trained model as code, then run that code for scoring. This works fine if the model never changes for the lifetime of the scoring process, but it is not ideal for long-running data streams, where you would like to retrain the model periodically (for example, due to concept drift) and score with the new model.
A better way is to treat the model as data and exchange this model data between the training and scoring systems, which allows models to be updated while the scoring process is running.
We’ll cover different approaches to model training and serving that use this technique, where one or both functions become an integrated part of the data processing pipeline implementation, i.e., an additional functional transformation of the data. The advantage of this approach is that model serving is implemented as part of the larger data transformation pipeline. Such pipelines can be implemented using either streaming engines, for example Spark Streaming, Flink, or Beam, or streaming libraries, for example Akka Streams or Kafka Streams.
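A minimal sketch of the model-as-data idea, in Java: the model travels as a plain message holding logistic-regression parameters, and the scoring side swaps it in whenever a new one arrives. The names `ModelMessage` and `Scorer` are hypothetical, logistic regression is only an assumed model type, and the sketch assumes data records and model updates reach the same single-threaded operator (as they might when two input streams are connected in a streaming engine).

```java
import java.util.Optional;

public class ModelAsData {
    // The model travels as data: a version tag plus logistic-regression parameters.
    record ModelMessage(String version, double[] weights, double bias) {}

    // Scoring side: keeps the current model as state and replaces it whenever
    // a new ModelMessage arrives on the model stream.
    static final class Scorer {
        private ModelMessage current; // null until the first model arrives

        void onModel(ModelMessage model) {
            current = model; // hot swap: later records are scored with the new model
        }

        Optional<Double> onRecord(double[] features) {
            ModelMessage m = current;
            if (m == null) return Optional.empty(); // no model yet: skip scoring
            double z = m.bias();
            for (int i = 0; i < features.length; i++) {
                z += m.weights()[i] * features[i];
            }
            return Optional.of(1.0 / (1.0 + Math.exp(-z))); // logistic score
        }

        String currentVersion() {
            return current == null ? "none" : current.version();
        }
    }

    public static void main(String[] args) {
        Scorer scorer = new Scorer();
        double[] x = {1.0, 2.0};
        System.out.println(scorer.onRecord(x)); // Optional.empty
        scorer.onModel(new ModelMessage("v1", new double[] {0.0, 0.0}, 0.0));
        System.out.println(scorer.onRecord(x)); // Optional[0.5]
        scorer.onModel(new ModelMessage("v2", new double[] {0.5, 0.25}, 0.0));
        System.out.println(scorer.currentVersion() + " " + scorer.onRecord(x));
    }
}
```

Because only data crosses the boundary, the training side could publish a new `ModelMessage` (for example, on a Kafka topic) without redeploying the scorer.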
The tutorial will demonstrate example implementations using Akka Streams, Flink, and Spark Structured Streaming.
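To make "serving as an additional functional transformation" concrete, here is a hedged sketch using plain `java.util.function.Function` composition over a `java.util.stream.Stream` rather than any of the engines named above. The stages (`parse`, `score`, `label`) and the fixed linear model are illustrative assumptions. In Kafka Streams, for instance, each stage would roughly correspond to a `mapValues` step on a `KStream`.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ScoringStage {
    // Stage 1 (hypothetical): parse a CSV line into a feature vector.
    static final Function<String, double[]> parse = line -> {
        String[] parts = line.split(",");
        double[] features = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            features[i] = Double.parseDouble(parts[i].trim());
        }
        return features;
    };

    // Stage 2: model serving is just another transformation.
    // A fixed linear model stands in for a real one here.
    static final Function<double[], Double> score = f -> 0.5 * f[0] + 0.25 * f[1];

    // Stage 3: downstream logic that consumes the score.
    static final Function<Double, String> label = s -> s >= 1.0 ? "HIGH" : "LOW";

    // The pipeline is simply the composition of the three stages.
    static final Function<String, String> pipeline = parse.andThen(score).andThen(label);

    static List<String> run(List<String> input) {
        return input.stream().map(pipeline).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("1, 1", "2, 4"))); // [LOW, HIGH]
    }
}
```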
Additionally, the tutorial will cover speculative execution of model serving, where several models score the same input in parallel. The advantage of this approach is the ability to provide the following features for model serving applications:
- Guaranteed execution time. If we have several models, and the fastest one has a fixed execution time for scoring, it is possible to ensure that scoring completes within a fixed time bound (latency budget), as long as this bound is larger than the execution time of that fastest model.
- Consensus-based model serving. When we have several models, we can implement model serving where the score is the outcome of a majority “vote” of the models.
- Quality-based model serving. When we have multiple models, we layer over them an algorithm that evaluates the quality of each scoring result and picks the result with the best quality. This requires a quality metric to exist. One possibility is that each score includes a confidence level, and the result with the highest confidence wins.
- A/B or blue/green testing. Related to the previous bullet, we might put a new model in production, route some percentage of the traffic to it, then evaluate whether it is actually better before switching all traffic to it.
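The following sketch shows one way the speculative-execution features above could be wired together with `CompletableFuture`. It assumes the fixed-time model's cost is known and below the latency budget, and that every model can report a confidence. It also uses hash-based routing for the A/B split. `Score`, `scoreWithinBudget`, `majorityVote`, `mostConfident`, and `routeToCandidate` are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

public class SpeculativeServing {
    // A score plus a self-reported confidence (assumes every model can provide one).
    record Score(String label, double confidence) {}

    // Guaranteed execution time: the other models start asynchronously, the
    // fixed-time model (assumed cheaper than the budget) runs on the caller,
    // then the others get whatever is left of the budget. Late or failed
    // models are simply left out of the result.
    static List<Score> scoreWithinBudget(Function<double[], Score> fixedTimeModel,
                                         List<Function<double[], Score>> otherModels,
                                         double[] x, long budgetMillis,
                                         ExecutorService pool) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(budgetMillis);
        List<CompletableFuture<Score>> pending = new ArrayList<>();
        for (Function<double[], Score> model : otherModels) {
            pending.add(CompletableFuture.supplyAsync(() -> model.apply(x), pool));
        }
        List<Score> results = new ArrayList<>();
        results.add(fixedTimeModel.apply(x)); // always present
        for (CompletableFuture<Score> future : pending) {
            long remaining = Math.max(0L, deadline - System.nanoTime());
            try {
                results.add(future.get(remaining, TimeUnit.NANOSECONDS));
            } catch (TimeoutException | ExecutionException e) {
                // Too slow or failed: drop it. CompletableFuture.cancel does not
                // interrupt the task; it only completes the future as cancelled.
                future.cancel(false);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return results;
    }

    // Consensus-based: majority vote over labels (ties are broken arbitrarily).
    static String majorityVote(List<Score> scores) {
        Map<String, Integer> counts = new HashMap<>();
        for (Score s : scores) counts.merge(s.label(), 1, Integer::sum);
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .orElseThrow()
                .getKey();
    }

    // Quality-based: keep the result with the highest confidence.
    static Score mostConfident(List<Score> scores) {
        return scores.stream()
                .max(Comparator.comparingDouble(Score::confidence))
                .orElseThrow();
    }

    // A/B testing: deterministically send about `percent`% of record keys to
    // the candidate model, so a given key always sees the same model.
    static boolean routeToCandidate(String recordKey, int percent) {
        return Math.floorMod(recordKey.hashCode(), 100) < percent;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Function<double[], Score> fixed = x -> new Score("cat", 0.6);
        List<Function<double[], Score>> others = List.of(
                x -> new Score("dog", 0.9),
                x -> new Score("cat", 0.7),
                x -> { // deliberately slower than the budget
                    try {
                        Thread.sleep(2_000);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    return new Score("slow", 1.0);
                });
        List<Score> results = scoreWithinBudget(fixed, others, new double[] {1.0}, 200, pool);
        System.out.println("vote: " + majorityVote(results) + ", best: " + mostConfident(results));
        pool.shutdownNow(); // interrupts the slow model's sleep
    }
}
```

Running the fixed-time model on the calling thread while the others run in the pool is what bounds the latency. Under the stated assumption, the call returns after roughly the budget, and whatever finished in time feeds the vote or the confidence comparison.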
For model training, we’ll cover the following:
- Performance optimizations:
Training from scratch requires all the relevant historical data, so it needs much more compute than scoring typically does. To avoid this overhead, incremental training updates can sometimes be done instead. Mini-batch training has long been used as a technique for training models on very large data sets, independent of the notion of streaming. This technique is directly applicable to the streaming context, where new data arrives all the time.
Another common approach to simplifying model serving is to train a sophisticated model, such as a neural net, then train a simpler model, such as a logistic regression, using the neural net as a data generator. In other words, the simpler model approximates the complex model, trading off accuracy for better scoring performance. A variation of this approach is to use both models in the speculative execution, latency-sensitive context mentioned above.
- Advantages of separating training and serving into two different systems include more implementation flexibility and the ability to optimize training and serving independently.
Use batch or mini-batch training, saving intermediate models locally so that training can be restarted.
Train on a dedicated cluster where the hardware and software are optimized for model training.
Leverage existing, publicly available models for well-known domains like NLP, where the models rarely need to be updated, thereby eliminating the need to do training yourself!
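Two of the optimizations above, incremental mini-batch updates and training a simpler model from a more complex one, can be combined in a small sketch. A logistic-regression student is updated by one gradient step per mini-batch, and its labels come from a "teacher" function used as a data generator. The teacher (`TEACHER`) is a hypothetical stand-in for an expensive model such as a neural net. Because this toy teacher is itself logistic, the student can match it exactly, whereas with a real neural net it would only approximate it.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

public class MiniBatchStudent {
    // Student model: logistic regression updated incrementally, one mini-batch at a time.
    static final class LogisticRegression {
        final double[] w;
        double b;

        LogisticRegression(int dim) { w = new double[dim]; }

        double predict(double[] x) {
            double z = b;
            for (int i = 0; i < w.length; i++) z += w[i] * x[i];
            return 1.0 / (1.0 + Math.exp(-z));
        }

        // One gradient step on the mean log loss of the batch; earlier training
        // is kept, so nothing is retrained from scratch.
        void update(double[][] xs, double[] ys, double learningRate) {
            double[] gradW = new double[w.length];
            double gradB = 0.0;
            for (int n = 0; n < xs.length; n++) {
                double err = predict(xs[n]) - ys[n]; // d(log loss)/dz
                for (int i = 0; i < w.length; i++) gradW[i] += err * xs[n][i];
                gradB += err;
            }
            for (int i = 0; i < w.length; i++) w[i] -= learningRate * gradW[i] / xs.length;
            b -= learningRate * gradB / xs.length;
        }
    }

    // Hypothetical stand-in for an expensive "teacher" (e.g. a neural net):
    // it returns soft labels (probabilities) that the student learns to imitate.
    static final ToDoubleFunction<double[]> TEACHER =
            x -> 1.0 / (1.0 + Math.exp(-(3.0 * x[0] - 2.0 * x[1])));

    static LogisticRegression distill(int batches, int batchSize, long seed) {
        LogisticRegression student = new LogisticRegression(2);
        Random rnd = new Random(seed);
        // Each iteration plays the role of a new mini-batch arriving on the stream.
        for (int step = 0; step < batches; step++) {
            double[][] xs = new double[batchSize][2];
            double[] ys = new double[batchSize];
            for (int n = 0; n < batchSize; n++) {
                xs[n][0] = rnd.nextGaussian();
                xs[n][1] = rnd.nextGaussian();
                ys[n] = TEACHER.applyAsDouble(xs[n]); // teacher used as a data generator
            }
            student.update(xs, ys, 0.5);
        }
        return student;
    }

    public static void main(String[] args) {
        LogisticRegression s = distill(2_000, 32, 42L);
        System.out.printf("w = [%.2f, %.2f], b = %.2f%n", s.w[0], s.w[1], s.b);
    }
}
```

With a learning rate of 0.5 and 2,000 batches of 32, the student's weights should end up close to the teacher's [3, -2], with a bias near 0. These settings are illustrative, not tuned.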
Finally, we will consider additional real-world production concerns:
- Data governance metadata: organizations will want traceability for scoring. Specifically, they will want to know which version of a model was used to score each record. This can be implemented in several ways:
Include a version label or UUID of the model along with the score that is added to each record.
Use the timestamp and a repository that maps timestamps to model versions to determine the model that was used. However, this is tricky, because you have to track carefully the subsecond timestamp at which each new model was added to the system. This approach also won’t really work in speculative execution scenarios, where several models score the same record concurrently.
- Management and monitoring
- Reactive principles:
Availability requirements and how to meet them (e.g., failover to a parallel pipeline)
How and when to scale as needed
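The first data-governance option (stamping each score with the model's identity) can be sketched as follows. `VersionedModel`, `ScoredRecord`, and their field names are hypothetical. The key detail is reading the current model reference once per record, so the score and its metadata always refer to the same model even if an update arrives concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.function.ToDoubleFunction;

public class VersionedScoring {
    // Hypothetical model wrapper: scoring logic plus identity metadata.
    record VersionedModel(String name, String version, UUID id, ToDoubleFunction<double[]> fn) {}

    // The output record carries the score AND the identity of the model that
    // produced it, so lineage does not depend on reconstructing timestamps later.
    record ScoredRecord(String recordKey, double score, String modelName,
                        String modelVersion, UUID modelId) {}

    // volatile so a model update from another thread becomes visible to scorers.
    private volatile VersionedModel current;

    void updateModel(VersionedModel model) { current = model; }

    ScoredRecord score(String key, double[] features) {
        VersionedModel m = current; // read once so score and metadata always match
        if (m == null) throw new IllegalStateException("no model loaded yet");
        return new ScoredRecord(key, m.fn().applyAsDouble(features),
                m.name(), m.version(), m.id());
    }

    public static void main(String[] args) {
        VersionedScoring scorer = new VersionedScoring();
        scorer.updateModel(new VersionedModel("fraud", "1", UUID.randomUUID(), f -> f[0] * 0.1));
        List<ScoredRecord> out = new ArrayList<>();
        out.add(scorer.score("r1", new double[] {5.0}));
        scorer.updateModel(new VersionedModel("fraud", "2", UUID.randomUUID(), f -> f[0] * 0.2));
        out.add(scorer.score("r2", new double[] {5.0}));
        out.forEach(System.out::println);
    }
}
```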
Boris Lublinsky is a software architect at Lightbend, where he specializes in big data, stream processing, and services. Boris has over 30 years’ experience in enterprise architecture. Over his career, he has been responsible for setting architectural direction, conducting architecture assessments, and creating and executing architectural roadmaps in fields such as big data (Hadoop-based) solutions, service-oriented architecture (SOA), business process management (BPM), and enterprise application integration (EAI). Boris is the coauthor of Applied SOA: Service-Oriented Architecture and Design Strategies, Professional Hadoop Solutions, and Serving Machine Learning Models. He is also cofounder of and frequent speaker at several Chicago user groups.
Dean Wampler is the vice president of fast data engineering at Lightbend, where he leads the creation of the Lightbend Fast Data Platform, a distribution of scalable, distributed stream processing tools including Spark, Flink, Kafka, and Akka, with machine learning and management tools. Dean is the author of Programming Scala and Functional Programming for Java Developers and the coauthor of Programming Hive, all from O’Reilly. He is a contributor to several open source projects. A frequent Strata speaker, he’s also the co-organizer of several conferences around the world and several user groups in Chicago.
View a complete list of O'Reilly AI contacts