Mar 15–18, 2020

Continuous Delivery for Machine Learning: Automating the end-to-end lifecycle

Danilo Sato (ThoughtWorks)
9:00am–12:30pm Monday, March 16, 2020
Location: LL21 D

Who is this presentation for?

Data scientists or analysts

Level

Intermediate

Description

Releasing Machine Learning systems into production is harder than releasing traditional software. These systems are non-deterministic, hard to test, hard to explain, and hard to improve. You are not finished when you find your first working model; you also need to think about integration, testing, deployment, scaling, and monitoring. What’s more, after launch, you will want to continuously adapt and improve your model to respond to the changing environment.

ThoughtWorks pioneered Continuous Delivery and has now extended it to address the challenges of Machine Learning systems, calling the new approach Continuous Delivery for Machine Learning (CD4ML). CD4ML is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles.
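To make the idea of reproducible increments concrete, here is a minimal sketch, not taken from the workshop materials, of deriving one identifier from the three inputs of a training run: code, data, and parameters. The function name `fingerprint` and the sample inputs are hypothetical, and tools such as DVC track content far more thoroughly than this.

```python
import hashlib
import json


def fingerprint(code: bytes, data: bytes, params: dict) -> str:
    """Return a short, deterministic ID for one training run's inputs.

    Hypothetical helper for illustration only.
    """
    digest = hashlib.sha256()
    # Serialize parameters with sorted keys so key order does not matter.
    encoded_params = json.dumps(params, sort_keys=True).encode("utf-8")
    for part in (code, data, encoded_params):
        # Length-prefix each part so that shifting bytes between parts
        # cannot produce the same digest.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()[:12]


# Placeholder inputs; a real pipeline would read the files from disk.
run_a = fingerprint(b"train.py v1", b"sample rows", {"max_depth": 5, "seed": 42})
run_b = fingerprint(b"train.py v1", b"sample rows", {"seed": 42, "max_depth": 5})
run_c = fingerprint(b"train.py v1", b"sample rows", {"max_depth": 6, "seed": 42})
```

Here `run_a` and `run_b` are identical because only the key order differs, while `run_c` differs because one parameter changed. A change to any of the three inputs yields a new identifier.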

In this hands-on training, we will demonstrate how to apply CD4ML. Using a real Machine Learning application in a live scenario, you will learn how to:

  • Create your deployment pipelines;
  • Version your model training workflow to make it reproducible;
  • Improve your model in a development environment, test its performance, and depending on the outcome, automatically deploy the new model into a production environment;
  • Track model performance across various experiments; and
  • Monitor and observe your model in production to close the data feedback loop.
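The step of testing a model and deploying it depending on the outcome can be pictured as a quality gate inside the pipeline. The sketch below is an assumption for illustration, not the workshop's implementation: `GateResult`, `evaluate_gate`, and the threshold values are hypothetical names and numbers, and the scores stand for whatever evaluation metric the team chooses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    deploy: bool
    reason: str


def evaluate_gate(candidate_score: float,
                  production_score: float,
                  min_score: float = 0.80,
                  min_gain: float = 0.0) -> GateResult:
    """Decide whether a candidate model may replace the production model.

    Hypothetical gate: higher scores are assumed to be better.
    """
    # Reject candidates that miss the absolute quality bar.
    if candidate_score < min_score:
        return GateResult(False, "below minimum score")
    # Reject candidates that do not beat production by more than min_gain.
    if candidate_score - production_score <= min_gain:
        return GateResult(False, "no improvement over production")
    return GateResult(True, "candidate promoted")


promoted = evaluate_gate(candidate_score=0.85, production_score=0.82)
regressed = evaluate_gate(candidate_score=0.82, production_score=0.85)
too_weak = evaluate_gate(candidate_score=0.70, production_score=0.60)
```

In a deployment pipeline, a stage could run a check like this after training and only trigger the production deployment when `deploy` is true; the `reason` string gives the team something to log when a candidate is rejected.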

The tech stack for this scenario will be Python with scikit-learn, DVC (Data Science Version Control), mlflow, GoCD, Docker, Git, ElasticSearch, FluentD, Kibana, and Google Cloud Platform.

Prerequisite knowledge

Basic knowledge of developing ML models (preferably in Python), source control with Git, and using Docker for local development.

Materials or downloads needed in advance

A GitHub account and Docker installed on your laptop, so that you can run our containerized development environment.

What you'll learn

You will understand the importance of CD4ML and its technical components, and experience a specific implementation that uses open source tools and the public cloud to automate the end-to-end lifecycle of Machine Learning applications.

Danilo Sato

ThoughtWorks

Danilo Sato is a principal consultant at ThoughtWorks with more than 15 years of experience in many areas of architecture and engineering: software, data, infrastructure, and machine learning. Balancing strategy with execution, Danilo helps clients refine their technology strategy while adopting practices to reduce the time between having an idea, implementing it, and running it in production using the cloud, DevOps, and continuous delivery. He is the author of DevOps in Practice: Reliable and Automated Software Delivery, is a member of ThoughtWorks’ Technology Advisory Board and Office of the CTO, and is an experienced international conference speaker.

