Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Streamlining a Machine Learning Project Team

Sourav Dey (Manifold), Alex Ng (Manifold)
1:30pm-5:00pm Tuesday, March 26, 2019
Secondary topics: AI and machine learning in the enterprise

Who is this presentation for?

Practicing data scientists and data engineers, CDOs with a mandate to build a team inside the organization

Level

Intermediate

Prerequisite knowledge

Some basic knowledge of the software engineering process and familiarity with machine learning vocabulary (model, training, etc.)

Materials or downloads needed in advance

None

What you'll learn

This talk provides a map of the new terrain for how to get value from machine learning in a way that will positively impact the company's bottom line, by streamlining teams and time to production.

Description

Artificial Intelligence is already helping many businesses become more responsive and competitive, but how do you move machine learning models efficiently from research to deployment at enterprise scale? It is imperative to plan for deployment from day one, both in tool selection and in the feedback and development process.

As recently as a few years ago, data scientists were the people who played in a sandbox—when they came up with a useful model, it was thrown over the wall to another team that would reimplement it to put it into production. Those days are over now: there’s only one Git repo in the entire company, and everything you commit is essentially in production. But teams are still run as if data science is mainly about experimentation.

This tutorial presents best practices for working in this new reality. Data scientists can still play in a sandbox, but in a way that makes taking models into production turnkey. Just as DevOps is about people working at the intersection of development and operations, there are now people working at the intersection of data science and software engineering who need to be integrated into the team with tools and support. At Manifold, we’ve developed the Lean AI process and the open-source Orbyter package for Docker-first data science to help do just that.

Sourav Dey and Alex Ng explain how to streamline a machine learning project and help your engineers work as an integrated part of your development and production teams.

Topics include:

  • Understanding both the business problem and the data
  • Containerized data science for cleaner workflows
  • Data engineering as a core competency
  • Building iterative data models to deliver value early
  • Best practices for bookkeeping ML experiments
  • Developing user trust in the data models
  • Seamless deployment at production scale
  • Observing and validating on-the-ground model use

Sourav Dey

Manifold

Sourav Dey is CTO at Manifold, an artificial intelligence engineering services firm with offices in Boston and Silicon Valley. Prior to Manifold, Sourav led teams to build data products across the technology stack, from smart thermostats and security cams (Google/Nest) to power grid forecasting (AutoGrid) to wireless communication chips (Qualcomm). He holds patents for his work, has been published in several IEEE journals, and has won numerous awards. He earned his PhD, MS, and BS degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology (MIT).

Alex Ng

Manifold

Alexander Ng is a Senior Data Engineer at Manifold. His previous work includes a stint as engineer and technical lead doing DevOps at Kryuus, as well as engineering work for the Navy. He holds a BS degree from Boston University in Electrical Engineering.

Comments

Sourav Dey | CTO
11/13/2018 6:16am PST

Hi all, we’re really looking forward to this tutorial! To help us make this session as productive as possible for you, please let us know some of the specific challenges you’ve run into—whether technical or team-related—when trying to move ML projects from development to production. We may not be able to address every specific scenario, but we’ll be sure to cover some of the most interesting in addition to the ones we commonly see.