Managing the complete machine learning lifecycle with MLflow
Who is this presentation for?
- Data scientists, developers, and machine learning developers
Description
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike traditional software developers, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce their work. In addition, developers need to use many distinct systems to productionize models.
To address these challenges, Jules Damji walks you through MLflow, an open source project that simplifies the entire ML lifecycle. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
Prerequisite knowledge
- General knowledge of Python
- A basic understanding of machine learning concepts
Materials or downloads needed in advance
- A laptop with 8–16 GB of memory and Chrome or Firefox installed
- Preregistration for Databricks Community Edition
What you'll learn
- Understand the three main components of open source MLflow (MLflow Tracking, MLflow Projects, and MLflow Models) and how each helps address the challenges of the ML lifecycle
- Learn how to use MLflow Tracking to record and query experiments (code, data, config, and results), how to use the MLflow Projects packaging format to reproduce runs, and how to use the general MLflow Models format to send models to diverse deployment tools
Jules Damji
Databricks
Jules S. Damji is an Apache Spark community and developer advocate at Databricks. He’s a hands-on developer with over 20 years of experience. Previously, he worked at leading companies such as Sun Microsystems, Netscape, @Home, LoudCloud/Opsware, Verisign, ProQuest, and Hortonworks, building large-scale distributed systems. He holds a BSc and an MSc in computer science and an MA in political advocacy and communication from Oregon State University, the California State University, and Johns Hopkins University, respectively.
Contact us
confreg@oreilly.com
For conference registration information and customer service
partners@oreilly.com
For more information on community discounts and trade opportunities with O’Reilly conferences
strataconf@oreilly.com
For information on exhibiting or sponsoring a conference
pr@oreilly.com
For media/analyst press inquiries
Comments
Hello Pete,
Yes, indeed!
I fixed it this AM and pushed it, so a git pull in your cloned environment should fix it.
Cheers
Hi, I discovered a typo in req.txt.
It should require scikit-learn, not sckit-learn.
Instructions for the tutorial are at this public GitHub link:
https://github.com/dmatrix/spark-saturday/tree/master/tutorials/mlflow/src/python
Cheers!
I have also uploaded the zip file to Google Drive. Worst case, you can download and unzip it.
All you need is a copy of the labs and data.
https://dbricks.co/StrataNYC
cheers
Jules
Alternatively, you could download it as a zip file: go to this URL, click the “Clone or Download” button, and unzip it. Since you are not committing anything back, you should be fine just having a copy of the files.
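Are you getting a permission denied error when you clone it?
Did you try git clone git@github.com:dmatrix/spark-saturday.git
or git clone https://github.com/dmatrix/spark-saturday.git?
Either should work; this is a public Git repository, and many people have cloned and forked it. Note that the git@github.com form clones over SSH and needs an SSH key registered with your GitHub account, whereas the https:// form does not.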
Hello, I got a permission error when accessing the Git repository! Anyone else?