Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Apache Kafka and the four challenges of production machine learning systems

Jay Kreps (Confluent)
5:25pm–6:05pm Wednesday, 09/12/2018
Data engineering and architecture
Location: 1A 21/22
Level: Intermediate
Secondary topics: Model lifecycle management
Average rating: 4.00 (2 ratings)

Who is this presentation for?

  • Developers and architects

Prerequisite knowledge

  • Basic knowledge of Apache Kafka and machine learning

What you'll learn

  • How to use Apache Kafka and stream processing to make machine learning systems easier to build


Machine learning has become mainstream, and suddenly businesses everywhere are looking to build systems that use it to optimize aspects of their products, processes, or customer experience. The cartoon version of machine learning sounds quite easy: you feed in training data made up of examples of good and bad outcomes, and the computer automatically learns from these and spits out a model that can make similar predictions on data it has not seen before. What could be easier?

Those with real experience running machine learning in production know that these systems are, in fact, shockingly hard to build, deploy, and operate. Jay Kreps explores some of the difficulties of building production machine learning systems and explains how Apache Kafka and stream processing can help.

Jay Kreps


Jay Kreps is the cofounder and CEO of Confluent, a company focused on Apache Kafka. Previously, Jay was one of the primary architects at LinkedIn, where he focused on data infrastructure and data-driven products. He was among the original authors of a number of open source projects in the scalable data systems space, including Voldemort (a key-value store), Azkaban (a workflow scheduler), Kafka (a distributed messaging system), and Samza (a stream processing system).