Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Model serving and management at scale using open source tools

Dan Crankshaw (UC Berkeley RISELab)
9:00am–12:30pm Tuesday, 09/11/2018
Data science and machine learning
Location: 1E 06 Level: Intermediate
Secondary topics: Model lifecycle management
Average rating: ***** (5.00, 1 rating)

Who is this presentation for?

  • Data scientists, data engineers, machine learning practitioners, and machine learning platform developers

Prerequisite knowledge

  • A basic understanding of Python and machine learning (A working knowledge of scikit-learn is enough.)
  • Experience building web applications and using databases and web servers (useful but not required)

Materials or downloads needed in advance

  • A laptop with a recent version of Docker, Python 2.7 or 3.6, and Jupyter installed
  • Clone the course GitHub repository (optional; link TBD)

What you'll learn

  • Understand the key challenges of deploying machine learning applications and the trade-offs between the set of currently available prediction-serving systems
  • Gain hands-on experience using the Clipper prediction server to deploy a real machine learning application


Machine learning is being deployed in a growing number of applications that demand real-time, accurate, and robust predictions under heavy serving loads. Supporting these applications brings its own set of challenges, requiring the integration of machine learning software with other systems, including user-facing application code, live databases, and high-volume data streams. To address these challenges, a new class of systems has emerged: prediction-serving systems, which deploy trained machine learning models to serve predictions at inference time.

While the field of model serving is still young, there are already several prediction-serving systems that address different deployment challenges and span different points in the design space, including TensorFlow Serving, Clipper, MXNet Model Server, AWS SageMaker, and Microsoft Azure's Machine Learning Studio. Some are designed for a specific use case or machine learning framework, while others aim to be more general. They vary in their performance, scalability, fault tolerance, and support for specialized hardware such as GPUs and TPUs. They vary in their ability to support more complex machine learning applications, including A/B testing, online exploration, and model composition. And they span open source systems, cloud services, and proprietary products.

Dan Crankshaw offers an overview of the current challenges and trade-offs involved in prediction serving, both for those trying to find the best existing system for their application and for those planning to build their own prediction server, and explores the current state of prediction-serving infrastructure.

He then leads a deep dive into the Clipper serving system, an open source, low-latency, general-purpose prediction-serving system from the RISELab, and shows you how to get started. Clipper interposes between applications that consume predictions and the machine learning models that produce them. By adopting a modular serving architecture and isolating models in their own containers, Clipper simplifies the model deployment process and allows models to be evaluated in the same runtime environment used during training. Clipper's modular architecture provides simple mechanisms for scaling out models to meet increased throughput demands, and it leverages Kubernetes to provide fault tolerance and fine-grained physical resource allocation on a per-model basis. Further, by abstracting models behind a uniform serving interface, Clipper lets developers compose many machine learning models within a single application to support increasingly common techniques such as ensemble methods, multiarmed bandit algorithms, and model composition. Along the way, Dan shares a case study of an industrial Clipper deployment and discusses his experience transforming a research prototype into an active open source system.
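The deployment workflow described above can be sketched with Clipper's Python admin API (`clipper_admin`). This is a minimal illustration rather than the tutorial's actual code: the application and model names ("digits-app", "digits-model") are invented for the example, and the deployment calls require a running Docker daemon, so they are gated behind a flag here.

```python
# Minimal sketch: training a scikit-learn model and deploying it with
# Clipper's clipper_admin API. The deployment section assumes Docker is
# running and clipper_admin is installed; names are illustrative.
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_digits

digits = load_digits()
model = LogisticRegression(max_iter=1000).fit(digits.data, digits.target)

# Clipper deploys an arbitrary Python closure: it receives a batch of
# inputs and must return one string per input.
def predict(inputs):
    return [str(p) for p in model.predict(inputs)]

# Sanity-check the closure locally before deploying it.
print(predict(digits.data[:3]))

DEPLOY = False  # set True with a running Docker daemon
if DEPLOY:
    from clipper_admin import ClipperConnection, DockerContainerManager
    from clipper_admin.deployers import python as python_deployer

    conn = ClipperConnection(DockerContainerManager())
    conn.start_clipper()
    conn.register_application(name="digits-app", input_type="doubles",
                              default_output="-1.0", slo_micros=100000)
    # Each model version runs in its own container, using the same
    # Python environment as the one used here for training.
    python_deployer.deploy_python_closure(
        conn, name="digits-model", version=1,
        input_type="doubles", func=predict)
    conn.link_model_to_app(app_name="digits-app", model_name="digits-model")
```

Testing the closure locally before deployment mirrors one of Clipper's design points: because the model is shipped as a container with the training environment, the function that works on your laptop is the function that serves predictions.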

Dan concludes by walking you through developing and deploying your first machine learning application using Clipper. You'll learn how to use the Clipper API and how to perform common tasks such as deploying a model, rolling it back, and scaling out an application; you'll also get the opportunity to develop a custom machine learning application and deploy it for use in a web application.


Dan Crankshaw

UC Berkeley RISELab

Dan Crankshaw is a PhD student in the CS Department at UC Berkeley, where he works in the RISELab. After cutting his teeth doing large-scale data analysis on cosmology simulation data and building systems for distributed graph analysis, Dan has turned his attention to machine learning systems. His current research interests include systems and techniques for serving and deploying machine learning, with a particular emphasis on low-latency and interactive applications.