Presented By O’Reilly and Cloudera

Make Data Work

September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Schedule: Model lifecycle management sessions

Companies are realizing that machine learning model development is not quite the same as software development. Completing the ML model building process doesn’t automatically translate to a working system. The data community is still building tools to help manage the entire lifecycle, which also includes model deployment, monitoring, and operations. While tools and best practices are just beginning to emerge and be shared, model lifecycle management is one of the most active areas in the data space.

9:00am–12:30pm Tuesday, 09/11/2018

Model serving and management at scale using open source tools

Location: 1E 06 Level: Intermediate

Dan Crankshaw (UC Berkeley RISELab)

Average rating: 5.00 (1 rating)

Dan Crankshaw offers an overview of the current challenges in deploying machine learning applications into production and the current state of prediction serving infrastructure. He then leads a deep dive into the Clipper serving system and shows you how to get started.

1:30pm–5:00pm Tuesday, 09/11/2018

From training to serving: Deploying TensorFlow models with Kubernetes

Location: 1E 09 Level: Intermediate

Brian Foo (Google), Holden Karau (Independent), Jay Smith (Google)

Average rating: 2.00 (7 ratings)

TensorFlow and Keras are popular libraries for training deep models due to hardware accelerator support. Brian Foo, Jay Smith, and Holden Karau explain how to bring deep learning models from training to serving in a cloud production environment. You'll learn how to unit-test, export, package, deploy, optimize, serve, monitor, and test models using Docker and TensorFlow Serving in Kubernetes.

1:15pm–1:55pm Wednesday, 09/12/2018

Why data scientists should love Linux containers

Location: 1A 08 Level: Beginner

William Benton (Red Hat)

Average rating: 5.00 (2 ratings)

Containers are a hot technology for application developers, but they also provide key benefits for data scientists. William Benton details the advantages of containers for data scientists and AI developers, focusing on high-level tools that will enable you to become more productive and collaborate more effectively.

2:05pm–2:45pm Wednesday, 09/12/2018

Executive Briefing: Why machine-learned models crash and burn in production and what to do about it

Location: 1E 14 Level: Intermediate

David Talby (Pacific AI)

Average rating: 4.40 (5 ratings)

Machine learning and data science systems often fail in production in unexpected ways. David Talby shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries.

2:05pm–2:45pm Wednesday, 09/12/2018

Bighead: Airbnb's end-to-end machine learning platform

Location: 1A 08 Level: Beginner

Atul Kale (Airbnb), Xiaohan Zeng (Airbnb)

Average rating: 5.00 (3 ratings)

Atul Kale and Xiaohan Zeng offer an overview of Bighead, Airbnb's user-friendly and scalable end-to-end machine learning framework that powers Airbnb's data-driven products. Built on Python, Spark, and Kubernetes, Bighead integrates popular libraries like TensorFlow, XGBoost, and PyTorch and is designed to be used in modular pieces.

2:05pm–2:45pm Wednesday, 09/12/2018

MLflow: An open platform to simplify the machine learning lifecycle

Location: Expo Hall

Mani Parkhe (Databricks), Andrew Chen (Databricks)

Successfully building and deploying a machine learning model is difficult to do once. Enabling other data scientists to reproduce your pipeline, compare the results of different versions, track what's running where, and redeploy and roll back updated models is much harder. Mani Parkhe and Andrew Chen offer an overview of MLflow, a new open source project from Databricks that simplifies this process.

4:35pm–5:15pm Wednesday, 09/12/2018

Using machine learning to drive intelligence at the edge

Location: 1E 09 Level: Intermediate

Dave Shuman (Cloudera), Bryan Dean (Red Hat)

The focus on the IoT is turning increasingly to the edge, and the way to make the edge more intelligent is by building machine learning models in the cloud and pushing them back out to the edge. Dave Shuman and Bryan Dean explain how Cloudera and Red Hat executed this architecture at one of Europe's leading manufacturers and give a demo highlighting it.

5:25pm–6:05pm Wednesday, 09/12/2018

Apache Kafka and the four challenges of production machine learning systems

Location: 1A 21/22 Level: Intermediate

Jay Kreps (Confluent)

Average rating: 4.00 (2 ratings)

Machine learning has become mainstream, and suddenly businesses everywhere are looking to build systems that use it to optimize aspects of their product, processes, or customer experience. Jay Kreps explores some of the difficulties of building production machine learning systems and explains how Apache Kafka and stream processing can help.

5:25pm–6:05pm Wednesday, 09/12/2018

Deploying machine learning models in the enterprise

Location: 1E 10/11 Level: Intermediate

Diego Oppenheimer (Algorithmia)

Average rating: 4.50 (2 ratings)

After big investments in collecting and cleaning data and building machine learning (ML) models, enterprises face big challenges in deploying models to production and managing a growing portfolio of ML models. Diego Oppenheimer covers the strategic and technical hurdles each company must overcome and the best practices developed while deploying over 4,000 ML models for 70,000 engineers.

1:10pm–1:50pm Thursday, 09/13/2018

Deep learning on YARN: Running distributed TensorFlow, MXNet, Caffe, and XGBoost on Hadoop clusters

Location: 1A 10 Level: Intermediate

Wangda Tan (Cloudera)

Average rating: 4.50 (2 ratings)

In order to train deep learning and machine learning models, you must leverage applications such as TensorFlow, MXNet, Caffe, and XGBoost. Wangda Tan discusses new features in Apache Hadoop 3.x to better support deep learning workloads and demonstrates how to run these applications on YARN.

2:00pm–2:40pm Thursday, 09/13/2018

Building a high-performance model serving engine from scratch using Kubernetes, GPUs, Docker, Istio, and TensorFlow

Location: Expo Hall Level: Intermediate

Chris Fregly (Amazon Web Services)

Average rating: 3.50 (2 ratings)

Chris Fregly details a full-featured, open source end-to-end TensorFlow model training and deployment system, using the latest advancements with Kubernetes, TensorFlow, and GPUs.

2:00pm–2:40pm Thursday, 09/13/2018

Kubeflow explained: Portable machine learning on Kubernetes

Location: 1A 10 Level: Intermediate

Michelle Casbon (Google)

Average rating: 5.00 (2 ratings)

Michelle Casbon demonstrates how to build a machine learning application with Kubeflow. Kubeflow makes it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere and supports the full lifecycle of an ML product, including iteration via Jupyter notebooks. Join Michelle to find out what Kubeflow currently supports and the long-term vision for the project.

4:20pm–5:00pm Thursday, 09/13/2018

Infrastructure for deploying machine learning to production in large financial institutions: Lessons learned and best practices

Location: 1A 08 Level: Intermediate

Harish Doddi (Datatron), Jerry Xu (Datatron Technologies)

Large financial institutions have many data science teams (e.g., those for fraud, credit risk, and marketing), each often using a diverse set of tools to build predictive models. There are many challenges involved in productionizing these predictive AI models. Harish Doddi and Jerry Xu share challenges and lessons learned deploying AI models to production in large financial institutions.

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com