Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Machine learning platform lifecycle management

Hope Wang (Intuit)
14:55–15:35 Thursday, 24 May 2018
Data engineering and architecture, Data-driven business management
Location: Capital Suite 7
Level: Intermediate
Secondary topics:  Financial Services, Managing and Deploying Machine Learning
Average rating: 4.00 (3 ratings)

Who is this presentation for?

  • Software architects, engineers, and data scientists

Prerequisite knowledge

  • Basic knowledge of machine learning development

What you'll learn

  • Learn how to build and visualize end-to-end lifecycle management
  • Explore the components of a machine learning platform and learn how different components associate and interact
  • Understand how to execute and manage models in a production environment
  • Explore a case study of taking a model through the deployment process

Description

Data science and machine learning are critical enabling factors for data-driven organizations. Engineering organizations face rapidly rising expectations to develop and scale machine learning capabilities. A machine learning platform is not just the sum of its parts; the key is how it supports the model lifecycle end to end. This includes data discovery, feature engineering, iterative model development, model training, and model scoring (batch and online). The management of artifacts, their associations, and their deployment across various platform components is vital.

While there are a number of mature technologies that support each phase of this lifecycle, few solutions tie these components together into a cohesive machine learning platform. To support the lifecycle of a model, you must be able to manage the various ML-related artifacts and their associations and automate deployment. A lifecycle management service built for this purpose should handle storage, versioning, visualization (including associations), and deployment of artifacts. The platform should support model development in different programming languages, and language and package versions should be configured specifically for each model. Having the custom environment follow the model through the lifecycle is important to guarantee that the model always runs in the same environment. Thus, the environment should be externalized, associated, and deployed together with the model. Other considerations include the connections between various artifacts and platforms: the data and datasets (source data and feature data, training datasets, and scoring result sets), the code (notebook code, model code, deployment code, etc.), model-specific environments, and platforms (development and training platforms, batch and online scoring platforms).
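
As a rough illustration of the idea above, the following Python sketch shows one way a lifecycle management service might version artifacts, record associations, and gather a model together with its environment for deployment. It is not Intuit's implementation: the class names (Artifact, LifecycleRegistry), the method names, and the sample payloads are all hypothetical, and a real service would persist artifacts rather than hold them in memory.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, str, int]  # (kind, name, version)


@dataclass(frozen=True)
class Artifact:
    """One immutable version of an ML-related artifact (hypothetical model)."""
    kind: str      # e.g. "model", "environment", "dataset", "code"
    name: str
    version: int
    payload: dict  # illustrative metadata, such as a path or package pins

    @property
    def key(self) -> Key:
        return (self.kind, self.name, self.version)


class LifecycleRegistry:
    """Hypothetical in-memory registry that stores, versions, and links artifacts."""

    def __init__(self) -> None:
        self._store: Dict[Key, Artifact] = {}
        self._latest: Dict[Tuple[str, str], int] = {}
        self._links: Dict[Key, List[Key]] = {}

    def register(self, kind: str, name: str, payload: dict) -> Artifact:
        """Store a new version; versions start at 1 and increase per (kind, name)."""
        version = self._latest.get((kind, name), 0) + 1
        artifact = Artifact(kind, name, version, dict(payload))
        self._store[artifact.key] = artifact
        self._latest[(kind, name)] = version
        return artifact

    def associate(self, source: Artifact, target: Artifact) -> None:
        """Record that `source` depends on `target` (e.g. model -> environment)."""
        self._links.setdefault(source.key, []).append(target.key)

    def deployment_bundle(self, model: Artifact) -> List[Artifact]:
        """Return the model plus every artifact directly associated with it."""
        linked = [self._store[k] for k in self._links.get(model.key, [])]
        return [model] + linked


registry = LifecycleRegistry()
env = registry.register("environment", "churn-env", {"python": "3.6"})
model_v1 = registry.register("model", "churn", {"path": "models/churn/1"})
model_v2 = registry.register("model", "churn", {"path": "models/churn/2"})
registry.associate(model_v2, env)

bundle = registry.deployment_bundle(model_v2)
print([(a.kind, a.name, a.version) for a in bundle])
# [('model', 'churn', 2), ('environment', 'churn-env', 1)]
```

Because the environment is registered as its own versioned artifact and linked to a specific model version, a deployment step can ship both together, which is the property the description calls out as necessary for a model to always run in the same environment.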

Hope Wang explains how her team at Intuit is managing the machine learning lifecycle, how different components associate and interact with each other, and how to execute in a production environment. Hope then shares an example of how an integrated process was developed for data engineers and data scientists to manage the entire lifecycle of a model from ideation through development, training, and ultimately, scoring.

Hope Wang

Intuit

Hope Wang is a software engineer in Intuit’s Small Business Data and Analytics Group. Hope is a self-taught, self-motivated, fully powered hacker who is passionate about innovation. She holds a master’s degree in biomedical engineering from the University of Southern California.