Sep 23–26, 2019

Efficient ML engineering: Tools and best practices

Sourav Dey (Manifold), Jakov Kucan (Manifold)
9:00am–12:30pm Tuesday, September 24, 2019
Location: 1A 12/14
Average rating: 4.00 (2 ratings)

Who is this presentation for?

  • Practicing data scientists, data engineers, and CDOs with a mandate to build a team inside the organization

Level

Beginner

Description

Business value comes from solving real needs by putting models into production. You need to be able to move ML models efficiently from research to deployment at enterprise scale. Part of the answer is about using the right workflow, and the other part is about choosing the right tools. The recent rise of the ML engineer is in large part due to evolving workflow best practices: just as DevOps folks have been working at the intersection of development and operations, today, ML engineers are working at the intersection of data science and software engineering—that is, ML ops. These folks must be integrated into the team with efficient tools and effective support. Manifold developed the Lean AI process and the open source Orbyter package for Docker-first data science to streamline the development process and help companies put successful models into production as smoothly and efficiently as possible. Even if you’ve never used Docker before, Orbyter makes containerization simple and elegant—which in turn makes your team’s work seamless and clean.

Sourav Dey and Jakov Kucan walk you through the six steps of the Lean AI process and explain how it helps your ML engineers work as an integrated part of your development and production teams. You’ll get a hands-on example using real-world data, so you can get up and running with Docker and Orbyter and see firsthand how streamlined they can make your workflow. They cover creating an AI specification by understanding both your business and your data; using containerized data science for cleaner workflows (no experience needed); developing ML engineering as a core competency; being deliberate, disciplined, and coordinated with your process; and deploying seamlessly at production scale.

Prerequisite knowledge

  • A basic understanding of the software engineering process
  • Familiarity with machine learning vocabulary (model, training, etc.)

Materials or downloads needed in advance

  • A laptop
  • This tutorial requires Docker to be installed on participants' laptops. On Mac, use Docker for Mac; on Linux, a system installation should work; on Windows, use Docker for Windows. This is a large installation and should be done BEFORE arriving onsite.
  • Additionally, the workshop requires that attendees have Python 3.x installed on their computers. Please also pull the Docker image for the tutorial (BEFORE you arrive onsite) by issuing the command:
  • `docker pull manifoldai/orbyter-ml-dev`
  • More information here: https://hub.docker.com/r/manifoldai/orbyter-ml-dev.
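
The setup requirements above can be checked before arriving with a short preflight script. This is only a minimal sketch, not part of the official tutorial materials: the function name `check_prerequisites` and its parameters are hypothetical, and it only confirms that the interpreter is Python 3.x and that a `docker` executable is on the PATH. It does not verify that the Docker daemon is running or that the image has been pulled.

```python
import shutil
import sys

# Image name taken from the tutorial page; everything else here is illustrative.
IMAGE = "manifoldai/orbyter-ml-dev"


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems found; an empty list means the basics are in place.

    Hypothetical helper: the parameters exist only so the check can be
    exercised without touching the real environment.
    """
    problems = []
    # The tutorial asks for Python 3.x.
    if version_info[0] < 3:
        problems.append("Python 3.x is required")
    # shutil.which returns None when the executable is not on the PATH.
    if which("docker") is None:
        problems.append("Docker CLI not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("MISSING:", issue)
    if not issues:
        print(f"Basics look fine. If you have not already, run: docker pull {IMAGE}")
```

Running the script with the same Python interpreter you plan to use in the workshop prints either the missing items or a reminder of the pull command.
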

What you'll learn

  • Discover how to get value from machine learning in a way that will affect the company's bottom line by building teams of data scientists and engineers that are well integrated into organizational teams delivering models into production

    Sourav Dey

    Manifold

    Sourav Dey is CTO at Manifold, an artificial intelligence engineering services firm with offices in Boston and Silicon Valley. Sourav leads the engineering team, focusing on work across client projects, developing platform technologies to make Manifold ML engineers more efficient, and communicating with business stakeholders. Prior to Manifold, Sourav led teams building data products across the technology stack, from smart thermostats and security cams at Google-Nest to wireless communication at Qualcomm. Sourav’s career has always been at the intersection of math and computer science: he holds a PhD in signal processing from MIT and bachelor's degrees in math and CS, also from MIT.


    Jakov Kucan

    Manifold

    Jakov Kucan is a senior architect at Manifold, an artificial intelligence engineering services firm with offices in Boston and Silicon Valley. Previously, Jakov was chief architect at Kyruus and director of product strategy at PTC. He’s a skilled architect and engineer, able to see through the details of implementations, keep track of the dependencies within a large design, and communicate the vision and ideas to both technical and nontechnical audiences. Jakov earned his PhD in computer science from MIT and his MA degree in mathematics and BSE degree in computer science and engineering from the University of Pennsylvania. He’s an author of several publications and patent applications.

    Comments on this page are now closed.

    Comments

    Calvin Lawson | Sr. Data Scientist
    09/25/2019 12:36pm EDT

    Thanks Sourav! Downloaded and saved, appreciate it.

    Sourav Dey | CTO
    09/24/2019 12:57pm EDT

    Thanks Calvin! That is great feedback and something we will incorporate in the future. The slides are here: https://www.manifold.ai/2019StrataNY

    Calvin Lawson | Sr. Data Scientist
    09/24/2019 12:54pm EDT

    Sourav, thanks for the tutorial! You mentioned the slides would be available, where would I go to get them?

    Also, a little feedback: it would be awesome if you had mentioned above that we’ll need to run bash scripts during the tutorial. Most of us at least have Git Bash or (even better) WSL.

    Thanks again! Good stuff.

    Sourav Dey | CTO
    09/23/2019 1:54pm EDT

    Hi Ben, thanks for your interest! We do touch upon that topic, but it’s not the focus of the tutorial. So in terms of how much time: not much.

    Ben Teeuwen | Lead Data Scientist
    09/22/2019 3:42am EDT

    How much time of the 3,5h is spent on monitoring / updating the model in production after deployment?

