Sep 23–26, 2019

Managing the Complete Machine Learning Lifecycle with MLflow

Jules Damji (Databricks)
9:00am–12:30pm Tuesday, September 24, 2019
Location: 1A 21/22
Secondary topics:  Model Development, Governance, Operations

Who is this presentation for?

data scientists, developers, or machine learning developers

Level

Intermediate

Description

ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models.

To solve these challenges, MLflow, an open source project, simplifies the entire ML lifecycle. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.

What You Will Learn:

  • The three main components of open source MLflow (MLflow Tracking, MLflow Projects, and MLflow Models) and how each helps address challenges of the ML lifecycle.
  • How to use MLflow Tracking to record and query experiments: code, data, config, and results.
  • How to use the MLflow Projects packaging format to reproduce runs.
  • How to use the MLflow Models general format to send models to diverse deployment tools.

Prerequisite knowledge

  • A fully charged laptop (8–16GB memory) with Chrome or Firefox
  • Pre-registration for Databricks Community Edition
  • Basic knowledge of the Python programming language
  • Basic understanding of machine learning concepts

Jules Damji

Databricks

Jules S. Damji is an Apache Spark Community and Developer Advocate at Databricks. He is a hands-on developer with over 20 years of experience building large-scale distributed systems at leading companies such as Sun Microsystems, Netscape, @Home, LoudCloud/Opsware, VeriSign, ProQuest, and Hortonworks. He holds a B.Sc. and an M.Sc. in Computer Science and an MA in Political Advocacy and Communication from Oregon State University, Cal State, and Johns Hopkins University, respectively.
