Hands-on machine learning with Kafka-based streaming pipelines
Who is this presentation for?
- Architects and software and data engineers interested in building streaming and model serving systems
One possibility of training and serving (scoring) models is to treat trained model as code, then run that code for scoring. This works fine if the model never changes for the lifetime of the scoring process but isn’t ideal with long-running data streams, where you’d like to retrain the model periodically (due to concept drift) and score with the new model. The better way is to treat the model as data and have this model data exchanged between the training and scoring systems, which allows updating models in the running context.
Boris Lublinsky and Dean Wampler explore different approaches to model training and serving that use this technique, where one or both functions are made an integrated part of the data-processing pipeline implementation (i.e., as an additional functional transformation of the data). The advantage of this approach is that model serving is implemented as part of the larger data-transformation pipeline. Such pipelines can be implemented using streaming engines—Spark Streaming, Flink, or Beam—or streaming libraries—Akka Streams or Kafka Streams. Boris and Dean will use Akka Streams, Flink, and Spark Structured Streaming in their demos.
Speculative execution of model serving
Guaranteed execution time
- With several models, where the fastest one provides a fixed execution time for scoring, it’s possible to ensure that scoring completes within a fixed time bound (latency budget), as long as this bound is larger than the execution time of the faster model with the fixed execution time.
Consensus-based model serving
- With several models, you can implement model serving where the score is the outcome of a majority “vote” of the models.
Quality-based model serving
- With multiple models, you can layer over them an algorithm that evaluates the quality of scoring result and picks the result with the best quality, which requires a quality metric to exist; may have a confidence level on each score and the result with the highest confidence “wins.”
- You may also wish to implement A/B or blue/green testing, where you put a new model in production and route some percentage of traffic to it, then evaluate if it’s actually better before switching all traffic to it.
- Training from scratch requires all relevant historical data, so much more compute is required than is typically necessary for scoring; to avoid this, sometimes incremental training updates can be done instead.
- Mini-batch training has existed for a while as a technique for training models on very large data sets, independent of the notion of streaming, which is directly applicable to the streaming context, where new data arrives all the time.
- Training a sophisticated model, such as a neural net, and then training a simpler model, such as a logistic regression, and using the neural net as a data generator is another common approach; the simpler model approximates the complex model, trading accuracy for better scoring performance—a variation is to use both models in the speculative execution, latency-sensitive context mentioned above.
- Advantages of separating training and serving into two different systems include more implementation flexibility and the ability to optimize training and serving independently.
- Use batch or mini-batch training to save the intermediate model locally to restart training.
- Train on a dedicated cluster where the hardware and software are optimized for model training.
- Leverage existing, publicly available models for well-known domains like natural language processing (NLP), where updates to the model are rarely required, eliminating the need to do training yourself.
Real-world production concerns
Data governance metadata
- Organizations will want some traceability about scoring and will want to know which version of a model was used to score each record. This can be implemented several ways: include a version label or universally unique identifier (UUID) for the model with the score that’s added to the records or use the time stamp and a repository that maps time stamps to model versions to determine the model that was used. However, using a time stamp is tricky, because you have to carefully track at which subsecond time stamp a new model was added to the system. This approach also won’t really work with speculative execution scenarios.
Management and monitoring
- Reactive principles: Availability requirements and how to meet them (e.g., failover to parallel pipeline?) and how and when to scale as needed.
- Experience programming, preferably with Java or Scala, and with various ML toolkits, Kafka, Akka Streams Spark and Flink (useful but not required)
Materials or downloads needed in advance
- A laptop with Java and Scala installed and access to Github (before the tutorial, please set up your laptop by cloning the "GitHub repo":https://github.com/lightbend/model-serving-tutorial. You can also download the latest release. Then follow the README's setup instructions.)
What you'll learn
- Gain an understanding of streaming architectures and their usage for model serving
- See working examples of streaming implementations leveraging modern stream processing engines and frameworks
- Discover practical approaches to evaluating tradeoffs of different streaming implementations
Boris Lublinsky is a principal architect at Lightbend, where he specializes in big data, stream processing, and services. Boris has over 30 years’ experience in enterprise architecture. Previously, he was responsible for setting architectural direction, conducting architecture assessments, and creating and executing architectural road maps in fields such as big data (Hadoop-based) solutions, service-oriented architecture (SOA), business process management (BPM), and enterprise application integration (EAI). Boris is the coauthor of Applied SOA: Service-Oriented Architecture and Design Strategies, Professional Hadoop Solutions, and Serving Machine Learning Models. He’s also cofounder of and frequent speaker at several Chicago user groups.
Dean Wampler is an expert in streaming data systems, focusing on applications of machine learning and artificial intelligence (ML/AI). He’s head of developer relations at Anyscale, which is developing Ray for distributed Python, primarily for ML/AI. Previously, he was an engineering VP at Lightbend, where he led the development of Lightbend CloudFlow, an integrated system for building and running streaming data applications with Akka Streams, Apache Spark, Apache Flink, and Apache Kafka. Dean is the author of Fast Data Architectures for Streaming Applications, Programming Scala, and Functional Programming for Java Developers, and he’s the coauthor of Programming Hive, all from O’Reilly. He’s a contributor to several open source projects. A frequent conference speaker and tutorial teacher, he’s also the co-organizer of several conferences around the world and several user groups in Chicago. He earned his PhD in physics from the University of Washington.
Comments on this page are now closed.
For conference registration information and customer service
For more information on community discounts and trade opportunities with O’Reilly conferences
For information on exhibiting or sponsoring a conference
For media/analyst press inquires