Put AI to Work

April 15-18, 2019
New York, NY

Deploying deep learning models on GPU-enabled Kubernetes clusters

Mathew Salvaris (Microsoft), Fidan Boylu Uz (Microsoft)

11:05am–11:45am Wednesday, April 17, 2019

Implementing AI
Location: Trianon Ballroom

Secondary topics: Deep Learning and Machine Learning tools, Edge computing and Hardware, Platforms and infrastructure

Who is this presentation for?

Data scientists, solution architects, and deep learning practitioners

Level

Intermediate

Prerequisite knowledge

Familiarity with deep learning, Python, and Kubernetes

What you'll learn

Learn how to deploy deep learning models on Kubernetes, create a web service with Docker and test it locally, and create a Kubernetes cluster with GPUs and deploy the web service
Explore best practices for testing the model and obtaining throughput metrics
Understand GPU versus CPU benchmarking results that can serve as a rough guide to estimating the performance of deployed models
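One possible way to obtain throughput metrics when testing a service is sketched below. It is not the speakers' tooling: the function names are hypothetical, and `fake_score` is a stand-in that sleeps instead of calling a real scoring endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(score_fn, payloads, concurrency=4):
    """Call score_fn on every payload from a thread pool.

    Returns (requests_per_second, per_request_latencies_in_seconds).
    """

    def timed_call(payload):
        start = time.perf_counter()
        score_fn(payload)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, payloads))
    wall_elapsed = time.perf_counter() - wall_start
    return len(payloads) / wall_elapsed, latencies


def fake_score(payload):
    """Hypothetical stand-in for an HTTP call to the scoring service."""
    time.sleep(0.01)
    return {"label": "placeholder"}


if __name__ == "__main__":
    rps, latencies = measure_throughput(fake_score, [b"img"] * 20)
    print(f"{rps:.1f} requests/s, worst latency {max(latencies) * 1000:.1f} ms")
```

Replacing `fake_score` with a function that posts an image to the locally running container would turn this into a simple local load test.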

Description

One of the major challenges data scientists face is that once they have trained a model, they need to deploy it at production scale. It’s widely accepted that GPUs should be used for deep learning training because they are significantly faster than CPUs. However, for tasks like inference (which are not as resource heavy as training), CPUs are usually sufficient and are more attractive due to their lower cost. But when inference speed is a bottleneck, GPUs provide considerable gains from both financial and time perspectives. Coupled with containerized applications and container orchestrators like Kubernetes, it’s now possible to go from training to deployment with GPUs faster and more easily while satisfying latency and throughput goals for production-grade deployments.

Mathew Salvaris and Fidan Boylu Uz offer a step-by-step guide to creating a pretrained deep learning model, packaging it in a Docker container, and deploying it as a web service on a Kubernetes cluster. You’ll learn how to test and verify each step and discover the gotchas you may encounter. You’ll also see a demo of calls to the deployed service that score images on a predeployed Kubernetes cluster, along with benchmarking results that provide a rough gauge of the performance of deep learning models on GPU and CPU clusters.
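The request and response flow of an image-scoring web service like the one described above might look roughly like the following sketch. The JSON field names (`image`, `predictions`) and the `placeholder_predict` function are assumptions made for illustration; the session's actual API is not described here.

```python
import base64
import json


def build_scoring_request(image_bytes):
    """Client side: encode raw image bytes as a JSON request body."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded})


def handle_scoring_request(body, predict_fn):
    """Service side: decode the body, run the model, return a JSON response."""
    payload = json.loads(body)
    image_bytes = base64.b64decode(payload["image"])
    return json.dumps({"predictions": predict_fn(image_bytes)})


def placeholder_predict(image_bytes):
    """Hypothetical stand-in for the pretrained model's predict call."""
    return [{"label": "unknown", "bytes_received": len(image_bytes)}]


if __name__ == "__main__":
    request_body = build_scoring_request(b"\x89PNG")
    print(handle_scoring_request(request_body, placeholder_predict))
```

Keeping the handler independent of any web framework makes it possible to verify the encoding and decoding steps before the container is built.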

The tests use two frameworks—TensorFlow (1.8) and Keras (2.1.6) with a TensorFlow (1.6) backend—for five different models:

MobileNetV2 (3.4M parameters)
NasNetMobile (4.2M parameters)
ResNet50 (23.5M parameters)
ResNet152 (58.1M parameters)
NasNetLarge (84.7M parameters)
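To give a feel for the range these parameter counts span, the sketch below estimates the storage needed for each model's weights. It assumes 32-bit floating point weights (4 bytes per parameter), which is an assumption rather than something stated in the session, and it ignores activations and framework overhead.

```python
# Parameter counts (in millions) as listed above.
PARAMETERS_IN_MILLIONS = {
    "MobileNetV2": 3.4,
    "NasNetMobile": 4.2,
    "ResNet50": 23.5,
    "ResNet152": 58.1,
    "NasNetLarge": 84.7,
}

BYTES_PER_FLOAT32 = 4  # assumption: weights stored as float32


def weight_size_mb(params_millions, bytes_per_param=BYTES_PER_FLOAT32):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return params_millions * bytes_per_param


if __name__ == "__main__":
    for name, params in PARAMETERS_IN_MILLIONS.items():
        print(f"{name:<13} {params:>5.1f}M params  ~{weight_size_mb(params):.1f} MB")
```

By parameter count alone, NasNetLarge is roughly 25 times the size of MobileNetV2.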

These models were selected to cover a wide range of networks, from small, parameter-efficient models such as MobileNet to large networks such as NasNetLarge. For each, a Docker image with an API for scoring images has been prepared and deployed on four different cluster configurations:

1-node GPU cluster with 1 pod
2-node GPU cluster with 2 pods
3-node GPU cluster with 3 pods
5-node CPU cluster with 35 pods
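A GPU configuration like those above could be expressed as a Kubernetes Deployment along the lines of the sketch below. The deployment name, image name, and port are hypothetical, and requesting one GPU per pod through the `nvidia.com/gpu` resource limit is an assumption about the setup (it also assumes the NVIDIA device plugin is installed on the cluster).

```python
import json


def gpu_deployment_manifest(name, image, replicas, gpus_per_pod=1, port=80):
    """Build a Deployment manifest whose pods each request GPUs."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # GPUs are requested through the limits section of the container.
        "resources": {"limits": {"nvidia.com/gpu": gpus_per_pod}},
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names; three replicas mirror the 3-node, 3-pod case.
    manifest = gpu_deployment_manifest("scoring-service", "example/scoring:latest", 3)
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, which accepts JSON as well as YAML.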

Overall, results show that throughput scales almost linearly with the number of GPUs and that GPUs always outperform CPUs at a similar price point. Mathew and Fidan also found that performance on GPU clusters was far more consistent than on CPU clusters, possibly because the GPU deployments avoid the contention for resources between the model and the web service that’s present in the CPU-only deployment. These results suggest that for deep learning inference tasks that use models with a high number of parameters, GPU-based deployments benefit from the lack of resource contention and provide significantly higher throughput than a CPU cluster of similar cost.
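One way to check a claim such as "almost linear scaling" against your own measurements is to compare each measured throughput with the ideal linear extrapolation from the smallest cluster. The sketch below shows that calculation; the numbers in the usage example are made up for illustration and are not the session's results.

```python
def scaling_efficiency(throughput_by_gpu_count):
    """Return measured throughput divided by ideal linear throughput.

    The ideal is extrapolated from the smallest GPU count in the input,
    so a value of 1.0 means perfectly linear scaling.
    """
    baseline_count = min(throughput_by_gpu_count)
    per_gpu_baseline = throughput_by_gpu_count[baseline_count] / baseline_count
    return {
        count: throughput / (per_gpu_baseline * count)
        for count, throughput in sorted(throughput_by_gpu_count.items())
    }


if __name__ == "__main__":
    # Made-up throughput values (images per second), for illustration only.
    measured = {1: 10.0, 2: 19.0, 3: 27.0}
    for count, efficiency in scaling_efficiency(measured).items():
        print(f"{count} GPU(s): {efficiency:.2f} of ideal linear throughput")
```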

The session uses notebooks that you can return to later.

Mathew Salvaris

Microsoft

Mathew Salvaris is a senior data scientist at Microsoft. Previously, Mathew was a data scientist for a small startup that provided analytics for fund managers; a postdoctoral researcher at UCL’s Institute of Cognitive Neuroscience, where he worked with Patrick Haggard in the area of volition and free will and devised models to decode human decisions in real time from the motor cortex using electroencephalography (EEG); a postdoctoral researcher at the University of Essex’s Brain Computer Interface group; and a visiting researcher at Caltech. Mathew holds a PhD in brain-computer interfaces and an MSc in distributed artificial intelligence.

Fidan Boylu Uz

Microsoft

Fidan Boylu Uz is a senior data scientist at Microsoft, where she’s responsible for the successful delivery of end-to-end advanced analytic solutions. She’s also worked on a number of projects in predictive maintenance and fraud detection. Fidan has 10+ years of technical experience in data mining and business intelligence. Previously, she was a professor conducting research and teaching courses on data mining and business intelligence at the University of Connecticut. She has a number of academic publications on machine learning and optimization and their business applications and holds a PhD in decision sciences.


©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com