Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Streamlining a machine learning project team

Sourav Dey (Manifold), Alex Ng (Manifold)
1:30pm-5:00pm Tuesday, March 26, 2019
Average rating: 4.25 (4 ratings)

Who is this presentation for?

  • Practicing data scientists and data engineers, and CDOs with a mandate to build a team inside the organization



Prerequisite knowledge

  • Basic knowledge of the software engineering process
  • Familiarity with machine learning concepts and vocabulary (model, training, etc.)

What you'll learn

  • Understand how to get value from machine learning in a way that will positively impact the company's bottom line, by streamlining teams and time to production


Artificial intelligence is already helping many businesses become more responsive and competitive, but how do you move machine learning models efficiently from research to deployment at enterprise scale? It’s imperative to plan for deployment from day one, both in tool selection and in the feedback and development process.

As recently as a few years ago, data scientists were the people who played in a sandbox—when they came up with a useful model, it was thrown over the wall to another team that would reimplement it to put it into production. Those days are over now: there’s only one Git repo in the entire company, and everything you commit is essentially in production. But teams are still run as if data science is mainly about experimentation.

Sourav Dey and Alex Ng share best practices for working in this new reality. Data scientists can still play in a sandbox, but they must do so in a way that offers a turnkey path for taking models into production. Just as DevOps practitioners work at the intersection of development and operations, people today work at the intersection of data science and software engineering, and they need to be integrated into the team with tools and support. Manifold developed the Lean AI process and the open source Orbyter package for Docker-first data science to help do just that.

Sourav and Alex explain how to streamline a machine learning project and help your engineers work as an integrated part of your development and production teams.

Topics include:

  • Understanding both the business problem and the data
  • Containerized data science for cleaner workflows
  • Data engineering as a core competency
  • Building iterative data models to deliver value early
  • Best practices for bookkeeping ML experiments
  • Developing user trust in the data models
  • Seamless deployment at production scale
  • Observing and validating on-the-ground model use

Sourav Dey


Sourav Dey is CTO at Manifold, an artificial intelligence engineering services firm with offices in Boston and Silicon Valley. Previously, Sourav led teams building data products across the technology stack, from smart thermostats and security cams at Google Nest to power grid forecasting at AutoGrid to wireless communication chips at Qualcomm. He holds patents for his work, has been published in several IEEE journals, and has won numerous awards. He holds PhD, MS, and BS degrees in electrical engineering and computer science from MIT.


Alex Ng


Alexander Ng is a senior data engineer at Manifold. His previous work includes a stint as an engineer and technical lead doing DevOps at Kyruus, as well as engineering work for the Navy. He holds a BS in electrical engineering from Boston University.

Comments on this page are now closed.


Sourav Dey | CTO
11/13/2018 6:16am PST

Hi all, we’re really looking forward to this tutorial! To help us make this session as productive as possible for you, please let us know some of the specific challenges you’ve run into—whether technical or team-related—when trying to move ML projects from development to production. We may not be able to address every specific scenario, but we’ll be sure to cover some of the most interesting in addition to the ones we commonly see.