Machine learning is being deployed in a growing number of applications that demand real-time, accurate, and robust predictions under heavy serving loads. Supporting these applications comes with its own set of challenges, requiring the integration of machine learning software with other systems, including user-facing application code, live databases, and high-volume data streams. To address these challenges, a new class of system has emerged: the prediction-serving system, designed to deploy machine learning models at inference time.
While the field of model serving is still new, there are already several prediction-serving systems that address different deployment challenges and span different points in the design space, including TensorFlow Serving, Clipper, MXNet Model Server, AWS SageMaker, and Microsoft Azure's Machine Learning Studio. Some are designed for a specific use case or machine learning framework, while others attempt to be more general. They vary in their performance, scalability, fault tolerance, and support for specialized hardware such as GPUs and TPUs. They also vary in their ability to support more complex machine learning applications, including A/B testing, online exploration, and model composition. And they include open source systems, cloud services, and proprietary products.
Dan Crankshaw offers an overview of the current challenges and trade-offs involved in prediction serving, for those attempting to find the best existing system for their application as well as those planning on building their own prediction server, and explores the current state of prediction-serving infrastructure. He then leads a deep dive into the Clipper serving system—an open source, low-latency, general-purpose prediction-serving system from the RISELab—and shows you how to get started. Clipper interposes between applications that consume predictions and the machine learning models that produce them. By adopting a modular serving architecture and isolating models in their own containers, Clipper simplifies the model deployment process and allows models to be evaluated using the same runtime environment as that used during training. Clipper's modular architecture provides simple mechanisms for scaling out models to meet increased throughput demands, and it leverages Kubernetes to provide fault tolerance and fine-grained physical resource allocation on a per-model basis. Further, by abstracting models behind a uniform serving interface, Clipper allows developers to compose many machine learning models within a single application to support increasingly common techniques such as ensemble methods, multiarmed bandit algorithms, and model composition. Along the way, Dan shares a case study of an industrial Clipper deployment and discusses his experience transforming a research prototype into an active open source system.
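The uniform serving interface described above is what makes composition patterns like ensembling possible: every model, regardless of framework, looks the same to the application. The sketch below illustrates the idea in plain Python; it is not Clipper's actual API, and the model functions (`model_a`, `model_b`, `ensemble`) are hypothetical stand-ins for framework-specific models hidden behind a common batch-predict contract.

```python
from statistics import mean

# Each "model" exposes the same interface: a batch of inputs in,
# a list of predictions out. Behind this contract they could be
# TensorFlow, MXNet, or scikit-learn models in separate containers.
def model_a(xs):
    return [2.0 * x for x in xs]

def model_b(xs):
    return [x + 1.0 for x in xs]

def ensemble(models, xs):
    """Compose models behind the uniform interface by averaging
    their per-input predictions."""
    preds = [m(xs) for m in models]          # one prediction list per model
    return [mean(col) for col in zip(*preds)]  # average across models
```

Because every model honors the same contract, the application can add, remove, or swap component models without changing the ensemble logic.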
Dan concludes by walking you through developing and deploying your first machine learning application using Clipper. You'll learn how to use the Clipper API and how to perform common tasks such as deploying and rolling back a model and scaling out an application; you'll also get the opportunity to develop a custom machine learning application and deploy it for use in a web application.
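A first deployment with Clipper's Python admin client typically looks something like the sketch below. It assumes Docker is running and the `clipper_admin` package is installed; the function and parameter names follow Clipper's documented quickstart (circa the 0.3 release) but may differ across versions, and the application and model names are made up for illustration.

```python
def feature_sum(xs):
    """The prediction function Clipper will containerize. Clipper's
    Python closure deployer passes a batch of inputs and expects a
    list of string outputs, one per input."""
    return [str(sum(x)) for x in xs]

def deploy_to_clipper():
    """Sketch of the deployment workflow (assumption: requires a
    running Docker daemon and `pip install clipper_admin`)."""
    from clipper_admin import ClipperConnection, DockerContainerManager
    from clipper_admin.deployers import python as python_deployer

    conn = ClipperConnection(DockerContainerManager())
    conn.start_clipper()  # launch Clipper's management and query containers

    # Register an application: the REST endpoint clients will query.
    conn.register_application(name="hello-world", input_type="doubles",
                              default_output="-1.0", slo_micros=100000)

    # Package the Python closure into a model container and deploy it.
    python_deployer.deploy_python_closure(conn, name="sum-model", version=1,
                                          input_type="doubles",
                                          func=feature_sum)

    # Route the application's queries to this model.
    conn.link_model_to_app(app_name="hello-world", model_name="sum-model")
```

Rolling out a new version is then a matter of deploying the closure again with an incremented `version`, and rolling back selects an earlier version, without touching the application endpoint.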
Dan Crankshaw is a PhD student in the CS Department at UC Berkeley, where he works in the RISELab. After cutting his teeth doing large-scale data analysis on cosmology simulation data and building systems for distributed graph analysis, Dan has turned his attention to machine learning systems. His current research interests include systems and techniques for serving and deploying machine learning, with a particular emphasis on low-latency and interactive applications.
©2018, O'Reilly Media, Inc.