Continuous delivery for machine learning: Automating the end-to-end lifecycle
Who is this presentation for?
- Data scientists or analysts
Releasing ML systems into production is harder than releasing traditional software. ML systems are nondeterministic, hard to test, hard to explain, and hard to improve. You aren’t finished when you find your first working model; you also need to think about integration, testing, deployment, scaling, and monitoring. What’s more, after launch, you want to keep adapting and improving your model as its environment changes.
ThoughtWorks pioneered continuous delivery (CD) and has since extended it to address the challenges of ML systems, calling the new approach continuous delivery for machine learning (CD4ML). CD4ML is a software engineering approach in which a cross-functional team produces ML applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time in short adaptation cycles.
Danilo Sato demonstrates how to apply CD4ML using a real ML application in a live scenario.
- Create your deployment pipelines
- Version your model training workflow to make it reproducible
- Improve your model in a development environment, test its performance, and depending on the outcome, automatically deploy the new model into a production environment
- Track model performance across various experiments
- Monitor and observe your model in production to close the data feedback loop
The tech stack is Python with scikit-learn, Data Science Version Control (DVC), MLflow, GoCD, Docker, Git, Elasticsearch, Fluentd, Kibana, and Google Cloud Platform.
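The "test its performance and, depending on the outcome, automatically deploy" step in the list above can be pictured as a small quality gate that a pipeline stage runs after training. The following is a minimal sketch, not the implementation shown in the session: the names `accuracy`, `deployment_gate`, and `GateResult`, as well as the `min_score` and `min_gain` thresholds, are hypothetical, and it uses only the Python standard library so that it runs anywhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """Outcome of the gate (hypothetical helper type)."""
    deploy: bool
    reason: str


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def deployment_gate(candidate_score, production_score, min_score=0.80, min_gain=0.0):
    """Decide whether a candidate model may replace the production model.

    Both thresholds are illustrative assumptions:
    - min_score: absolute floor the candidate must reach
    - min_gain: required improvement over the production score
    """
    if candidate_score < min_score:
        return GateResult(False, f"score {candidate_score:.3f} is below the floor {min_score:.3f}")
    if candidate_score - production_score < min_gain:
        return GateResult(False, "candidate does not beat production by the required margin")
    return GateResult(True, "candidate passes both checks")
```

A pipeline stage could call `deployment_gate(accuracy(y_true, y_pred), production_score)` on a held-out validation set and fail the stage when `deploy` is false, so that only models that pass the gate move on to the deployment stage.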
Prerequisite knowledge
- A basic understanding of developing ML models (preferably in Python), source control with Git, and using Docker for local development
Materials or downloads needed in advance
- A laptop with a GitHub account and Docker installed
What you'll learn
- Understand the importance of CD4ML and its technical components, and experience a specific implementation that uses open source tools and the public cloud to automate the end-to-end lifecycle of ML applications
Danilo Sato is a principal consultant at ThoughtWorks with more than 17 years of experience in many areas of architecture and engineering: software, data, infrastructure, and machine learning. Balancing strategy with execution, Danilo helps clients refine their technology strategy while adopting practices to reduce the time between having an idea, implementing it, and running it in production using the cloud, DevOps, and continuous delivery. He is the author of DevOps in Practice: Reliable and Automated Software Delivery, is a member of ThoughtWorks’ Technology Advisory Board and Office of the CTO, and is an experienced international conference speaker.