Sep 23–26, 2019

SOLD OUT: Managing the complete machine learning lifecycle with MLflow

Jules Damji (Databricks)
9:00am–12:30pm Tuesday, September 24, 2019
Location: 1A 21
Average rating: 4.00 (1 rating)

Who is this presentation for?

  • Data scientists, developers, and machine learning developers

Level

Intermediate

Description

ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models.

To solve this problem, Jules Damji walks you through MLflow, an open source project that simplifies the entire ML lifecycle. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.

Prerequisite knowledge

  • General knowledge of Python
  • A basic understanding of machine learning concepts

Materials or downloads needed in advance

  • A laptop with 8–16 GB of memory and Chrome or Firefox installed
  • A Databricks Community Edition account, registered in advance

What you'll learn

  • Understand the three main components of open source MLflow (MLflow Tracking, MLflow Projects, and MLflow Models) and how each helps address challenges of the ML lifecycle
  • Learn how to use MLflow Tracking to record and query experiments (code, data, config, and results), how to use the MLflow Projects packaging format to reproduce runs, and how to use the MLflow Models general format to send models to diverse deployment tools

Jules Damji

Databricks

Jules S. Damji is an Apache Spark community and developer advocate at Databricks. He’s a hands-on developer with over 20 years of experience. Previously, he worked at leading companies such as Sun Microsystems, Netscape, @Home, LoudCloud/Opsware, Verisign, ProQuest, and Hortonworks, building large-scale distributed systems. He holds a BSc and MSc in computer science and an MA in political advocacy and communication from Oregon State University, California State University, and Johns Hopkins University, respectively.

Comments on this page are now closed.

Comments

Jules Damji | Apache Spark Developer and Community Advocate
09/22/2019 4:51pm EDT

Hello Pete,

Yes, indeed!

I fixed it this AM and pushed the change, so a git pull in your cloned environment should pick up the fix.

Cheers

Pete Carlson
09/21/2019 5:44pm EDT

Hi, I discovered a typo in req.txt.

It should require scikit-learn, not sckit-learn.

Jules Damji | Apache Spark Developer and Community Advocate
09/20/2019 7:19pm EDT

Instructions for the tutorial are at this public GitHub link:

https://github.com/dmatrix/spark-saturday/tree/master/tutorials/mlflow/src/python

Cheers!

Jules Damji | Apache Spark Developer and Community Advocate
09/20/2019 6:59pm EDT

I have also uploaded the zip file to Google Drive. In the worst case, you can download and unzip it.

All you need is a copy of the labs and data.

https://dbricks.co/StrataNYC

cheers
Jules

Jules Damji | Apache Spark Developer and Community Advocate
09/20/2019 6:24pm EDT

Alternatively, you can just download the repository as a zip file.
Go to this URL, click the “Clone or Download” button, and unzip the downloaded file. Since you are not committing anything back, having a copy of the files is enough.

Jules Damji | Apache Spark Developer and Community Advocate
09/20/2019 6:10pm EDT

Are you getting a permission denied error when you try to clone it?

Did you try git clone https://github.com/dmatrix/spark-saturday.git

or git clone git@github.com:dmatrix/spark-saturday.git

Either should work. This is a public repository, and many people have cloned and forked it.

Milton Volpato | Head of Analytics & Data Science
09/20/2019 4:50pm EDT

Hello, I got a permission error accessing the git repository! Anyone else?

