Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

From training to serving: Deploying TensorFlow models with Kubernetes

Brian Foo (Google), Holden Karau (Independent), Jay Smith (Google)
1:30pm–5:00pm Tuesday, 09/11/2018
Data engineering and architecture
Location: 1E 09
Level: Intermediate
Secondary topics: Model lifecycle management
Average rating: 2.00 (7 ratings)

Who is this presentation for?

  • Software engineers, data scientists, and solutions architects

Prerequisite knowledge

  • A working knowledge of Python and shell scripting (Familiarity with running Python code in a Jupyter notebook is recommended.)
  • A basic understanding of the TensorFlow Estimator or Keras API

Materials or downloads needed in advance

  • A laptop with the Google Chrome browser installed and the ability to access Google Cloud Platform (check that your company's firewall allows it)

What you'll learn

  • Learn best practices for migrating training code to production code for serving machine learning models—unit-testing TensorFlow APIs, writing server-client RPC interfaces, building and testing Docker images, deploying on a cluster of machines, monitoring the cluster, and performance profiling
  • Explore open source software that helps automate the deployment process
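As a rough illustration of the server-client and unit-testing items above, here is a minimal Python sketch of a client for TensorFlow Serving's REST `:predict` endpoint. It assumes the server listens on TensorFlow Serving's default REST port (8501) and serves a model under the hypothetical name `my_model`. This is not the tutorial's own code, which may use gRPC or a different setup.

```python
import json
import urllib.request


def build_predict_request(instances):
    """Return the JSON body for TensorFlow Serving's row-format :predict call."""
    return json.dumps({"instances": instances}).encode("utf-8")


def parse_predict_response(body):
    """Extract the list of predictions from a :predict JSON response body."""
    return json.loads(body)["predictions"]


def predict(instances, host="localhost", port=8501, model="my_model"):
    """POST instances to a running TensorFlow Serving REST endpoint.

    Assumptions: default REST port 8501 and a hypothetical model name.
    """
    url = f"http://{host}:{port}/v1/models/{model}:predict"
    req = urllib.request.Request(
        url,
        data=build_predict_request(instances),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return parse_predict_response(resp.read())
```

Keeping payload construction and response parsing in pure functions means they can be unit-tested without a running server; only `predict` needs a live endpoint.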


TensorFlow and Keras are popular libraries for training deep learning models, in part because they support hardware accelerators such as GPUs and TPUs. Brian Foo, Jay Smith, and Holden Karau explain how to bring deep learning models from training to serving in a cloud production environment. You’ll learn how to unit-test, export, package, deploy, optimize, serve, monitor, and test models using Docker and TensorFlow Serving in Kubernetes.

Topics include:

  • An introduction to the TensorFlow and Keras APIs
  • Writing and testing a server-client API for model serving
  • Deployment and services: Using Docker and Kubernetes to deploy model servers and using a frontend service to invoke the backend model servers
  • Profiling serving performance on different hardware (CPUs, GPUs, TPUs) and with different batch sizes
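To make the batch-size profiling topic concrete, the following sketch times a prediction function at several batch sizes and reports median latency and throughput. `predict_fn` and `make_batch` are hypothetical placeholders: in practice `predict_fn` might wrap a call to a model server running on CPU, GPU, or TPU, and the resulting numbers depend entirely on your hardware and model.

```python
import statistics
import time


def profile_batch_sizes(predict_fn, make_batch, batch_sizes, trials=5):
    """Time predict_fn on batches of each size.

    Returns {batch_size: (median_latency_seconds, examples_per_second)}.
    predict_fn and make_batch are hypothetical, caller-supplied callables.
    """
    results = {}
    for size in batch_sizes:
        batch = make_batch(size)
        predict_fn(batch)  # warm-up call, excluded from timing
        latencies = []
        for _ in range(trials):
            start = time.perf_counter()
            predict_fn(batch)
            latencies.append(time.perf_counter() - start)
        median = statistics.median(latencies)
        throughput = size / median if median > 0 else float("inf")
        results[size] = (median, throughput)
    return results


if __name__ == "__main__":
    # Stand-in "model": sums each input row. Replace with a real serving call.
    fake_predict = lambda batch: [sum(row) for row in batch]
    make_batch = lambda n: [[1.0] * 128 for _ in range(n)]
    for size, (lat, tput) in profile_batch_sizes(fake_predict, make_batch, [1, 8, 32, 128]).items():
        print(f"batch={size:4d}  median={lat * 1e3:.3f} ms  throughput={tput:.0f} ex/s")
```

Larger batches usually raise throughput at the cost of per-request latency, which is the trade-off this kind of sweep is meant to expose.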

Brian Foo


Brian Foo is a senior software engineer for Google Cloud working on applied artificial intelligence, where he builds demos for Google Cloud’s strategic customers and creates open source tutorials to improve public understanding of AI. Previously, Brian worked at Uber, where he trained machine learning models and built a large-scale training and inference pipeline for mapping and sensing/perception applications using Hadoop and Spark. He also headed the real-time bidding optimization team at Rocket Fuel, where he worked on algorithms that selected the millions of ads shown every second across platforms such as web, mobile, and programmatic TV. Brian holds a BS in EECS from UC Berkeley and a PhD in EE telecommunications from UCLA.

Holden Karau


Holden Karau is a transgender Canadian software engineer working in the Bay Area. Previously, she worked at IBM, Alpine, Databricks, Google (twice), Foursquare, and Amazon. Holden is the coauthor of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She’s a committer on the Apache Spark, SystemML, and Mahout projects. When not in San Francisco, Holden speaks internationally about different big data technologies (mostly Spark). She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work, she enjoys playing with fire, riding scooters, and dancing.

Jay Smith


Jason “Jay” Smith is a Cloud customer engineer at Google. He spends his days helping enterprises find ways to expand their workload capabilities on Google Cloud. He’s on the Kubeflow go-to-market team and contributes code to help people build an ecosystem for their machine learning operations. His passions include big data, ML, and helping organizations find ways to collect, store, and analyze information.



Brian Foo
09/12/2018 7:54am EDT

Thank you all for attending this tutorial!

I realize that this was a dense lab, and a number of technical issues prevented many of you from completing it end to end. We will have a GitHub repository up in about a week so you can run the lab on your own, both locally and on Google Cloud, along with solution files so you can easily check your work.