Presented By
O’Reilly + Cloudera

Make Data Work

March 25-28, 2019
San Francisco, CA

Schedule: Model lifecycle management sessions

Companies are realizing that machine learning model development is not quite the same as software development. Completion of the ML model building process doesn’t automatically translate to a working system. The data community is still building tools to help manage the entire lifecycle, which also includes model deployment, monitoring, and operations. While tools and best practices are just beginning to emerge and be shared, model lifecycle management is one of the most active areas in the data space.

9:00am–12:30pm Tuesday, March 26, 2019

Hands-on machine learning with Kafka-based streaming pipelines

Data Engineering & Architecture, Streaming and IoT
Location: 2007

Boris Lublinsky (Lightbend), Dean Wampler (Anyscale)

Average rating:

(3.85, 13 ratings)

Boris Lublinsky and Dean Wampler walk you through using ML in streaming data pipelines and doing periodic model retraining and low-latency scoring in live streams. You'll explore using Kafka as a data backplane, the pros and cons of microservices versus systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, model metadata tracking, and other techniques.

1:30pm–5:00pm Tuesday, March 26, 2019

The hitchhiker's guide to deep learning-based recommenders in production

Data Science, Machine Learning & AI
Location: 2002

Abhishek Kumar (Publicis Sapient), Pramod Singh (Walmart Labs)

Average rating:

(4.17, 6 ratings)

Abhishek Kumar and Pramod Singh walk you through deep learning-based recommender and personalization systems they've built for clients. Join in to learn how to use TensorFlow Serving and MLflow for end-to-end productionalization, including model serving, Dockerization, reproducibility, and experimentation, and Kubernetes for deployment and orchestration of ML-based microarchitectures.

1:30pm–5:00pm Tuesday, March 26, 2019

Cross-cloud model training and serving with Kubeflow

Data Engineering & Architecture
Location: 2007

Holden Karau (Independent), Francesca Lazzeri (Microsoft), Trevor Grant (IBM)

Average rating:

(3.00, 2 ratings)

Holden Karau, Francesca Lazzeri, and Trevor Grant offer an overview of Kubeflow and walk you through using it to train and serve models across different cloud environments (and on-premises). You'll use a script to do the initial setup work, so you can jump (almost) straight into training a model on one cloud and then look at how to set up serving in another cluster/cloud.

11:00am–11:40am Wednesday, March 27, 2019

Automating DevOps for machine learning

Data Engineering & Architecture
Location: 2008

Diego Oppenheimer (Algorithmia)

Average rating:

(4.00, 11 ratings)

You've invested heavily in cleaning your data, feature engineering, training, and tuning your model, but now you have to deploy your model into production, and you discover it's a huge challenge. Diego Oppenheimer shares common architectural patterns and best practices from the most advanced organizations that deploy models for scalability and accessibility.

11:50am–12:30pm Wednesday, March 27, 2019

Deep learning beyond the learning

Data Engineering & Architecture
Location: 2008

Tobias Knaup (Mesosphere), Joerg Schad (ArangoDB)

Average rating:

(4.50, 2 ratings)

There are many great tutorials for training your deep learning models, but training is only a small part of the overall deep learning pipeline. Tobias Knaup and Joerg Schad offer an introduction to building a complete automated deep learning pipeline, starting with exploratory analysis, overtraining, model storage, model serving, and monitoring.

2:40pm–3:20pm Wednesday, March 27, 2019

Online evaluation of machine learning models

Data Science, Machine Learning & AI
Location: 2011

Ted Dunning (MapR, now part of HPE)

Average rating:

(4.70, 10 ratings)

Evaluating machine learning models is surprisingly hard, particularly because these systems interact in very subtle ways. Ted Dunning breaks the problem of evaluation apart into operational and functional evaluation, demonstrating how to do each without unnecessary pain and suffering. Along the way, he shares exciting visualization techniques that will help make differences strikingly apparent.

4:20pm–5:00pm Wednesday, March 27, 2019

Scaling model training: From flexible training APIs to resource management with Kubernetes

Data Science, Machine Learning & AI
Location: 2011

Kelley Rivoire (Stripe)

Average rating:

(4.33, 3 ratings)

Production ML applications benefit from reproducible, automated retraining and deployment of ever more predictive models trained on ever-increasing amounts of data. Kelley Rivoire explains how Stripe built a flexible API for training machine learning models that's used to train thousands of models per week on Kubernetes, supporting automated deployment of new models with improved performance.

4:20pm–5:00pm Wednesday, March 27, 2019

MLflow: An open platform to simplify the machine learning lifecycle

Data Engineering & Architecture
Location: 2008

Corey Zumar (Databricks)

Average rating:

(4.89, 9 ratings)

Developing applications that leverage machine learning is difficult. Practitioners need to be able to reproduce their model development pipelines, as well as deploy models and monitor their health in production. Corey Zumar offers an overview of MLflow, which simplifies this process by managing, reproducing, and operationalizing machine learning through a suite of model tracking and deployment APIs.

5:10pm–5:50pm Wednesday, March 27, 2019

Persistent storage for machine learning in Kubeflow

Data Engineering & Architecture
Location: 2008

Skyler Thomas (MapR), Terry He (MapR Technologies)

Average rating:

(4.75, 4 ratings)

Kubeflow separates compute and storage to provide the ability to deploy best-of-breed open source systems for machine learning to any cluster running Kubernetes, whether on-premises or in the cloud. Skyler Thomas and Terry He explore the problems of state and storage and explain how distributed persistent storage can logically extend the compute flexibility provided by Kubeflow.

5:10pm–5:50pm Wednesday, March 27, 2019

Talking to the machines: Monitoring production machine learning systems

Data Science, Machine Learning & AI
Location: 2011

Ting-Fang Yen (DataVisor)

Average rating:

(4.00, 3 ratings)

Ting-Fang Yen details an approach for monitoring production machine learning systems that handle billions of requests daily by discovering detection anomalies, such as spurious false positives, as well as gradual concept drifts when the model no longer captures the target concept. Join in to explore new tools for detecting undesirable model behaviors early in large-scale online ML systems.

5:10pm–5:50pm Wednesday, March 27, 2019

Point, click, predict

Data Science, Machine Learning & AI, Expo Hall
Location: Expo Hall

Kevin Moore (Salesforce)

Average rating:

(4.50, 2 ratings)

Kevin Moore walks you through how TransmogrifAI, Salesforce's open source AutoML library built on Spark, automatically generates models customized to a company's dataset and use case and provides insights into why the model is making the predictions it does.

11:50am–12:30pm Thursday, March 28, 2019

Executive Briefing: Why machine-learned models crash and burn in production and what to do about it

Executive Briefing and best practices, Strata Business Summit
Location: 2020

David Talby (Pacific AI)

Average rating:

(4.90, 10 ratings)

Machine learning and data science systems often fail in production in unexpected ways. David Talby shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries.

4:40pm–5:20pm Thursday, March 28, 2019

Model governance in the enterprise

Data Engineering & Architecture
Location: 2018

Harish Doddi (Datatron), Jerry Xu (Datatron Technologies)

Average rating:

(4.00, 1 rating)

Harish Doddi and Jerry Xu share the challenges they faced scaling machine learning models and detail the solutions they're building to conquer them.

©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com