Problems taking AI to production and how to fix them

Jim Scott (NVIDIA)

1:15pm–1:55pm Thursday, September 26, 2019

Location: 1A 21/22

Data Engineering and Architecture

Secondary topics: Model Development, Governance, Operations


Who is this presentation for?

Data scientists, data engineers, DevOps, and data ops

Level

Intermediate

Description

By working with clients across many industries, including chemical sciences, healthcare, and oil and gas, Jim Scott has documented a number of major impediments to successfully operationalizing AI systems and keeping them running in a production environment. These include problems with data formats and optimization, and versioning of the data, models, and parameters that grows too complex over time. He also explores tooling-management issues around notebook applications like Jupyter and around the workflow management tools used to track and manage an execution pipeline.
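
One common way to keep data, model, and parameter versions from drifting apart is to fingerprint all three together, so every run has a single ID that changes whenever any input changes. The sketch below is a hypothetical illustration using only Python's standard library; `run_fingerprint` and its fields are assumptions, not tooling described in the session.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def run_fingerprint(data: bytes, model_source: str, params: dict) -> dict:
    """Hypothetical helper: hash the training data, model code, and parameters,
    then derive one short run ID from all three hashes."""
    record = {
        "data": sha256_hex(data),
        "model": sha256_hex(model_source.encode("utf-8")),
        # sort_keys makes the hash independent of dict insertion order
        "params": sha256_hex(json.dumps(params, sort_keys=True).encode("utf-8")),
    }
    record["run_id"] = sha256_hex(json.dumps(record, sort_keys=True).encode("utf-8"))[:12]
    return record


fp = run_fingerprint(b"col_a,col_b\n1,2\n", "def model(x): return x", {"lr": 0.1, "epochs": 3})
print(fp["run_id"])
```

Changing any byte of the data, the model code, or a parameter value yields a different `run_id`, while reordering parameter keys does not.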

New problems arise when log output volumes grow quickly and significant volumes of data begin to move. Source data moves to the GPU, log data moves back to storage, and then the log data moves to the machines that handle the distributed compute for post-model analytics that evaluate performance characteristics. Networks don’t provide infinite bandwidth, and most enterprises don’t run extremely high-speed networks.
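
A back-of-the-envelope calculation shows why these repeated moves add up. The volumes and the 10 Gb/s link below are hypothetical, and the formula assumes ideal throughput with no protocol overhead or contention, so real transfers would take longer.

```python
def transfer_seconds(num_bytes: float, link_gbps: float) -> float:
    """Ideal time to move num_bytes over a link of link_gbps gigabits per second."""
    return (num_bytes * 8) / (link_gbps * 1e9)


# Hypothetical data movement for one training and evaluation cycle, in terabytes
moves_tb = {
    "source data -> GPU nodes": 2.0,
    "log data -> storage": 5.0,
    "log data -> analytics cluster": 5.0,
}

total_s = sum(transfer_seconds(tb * 1e12, link_gbps=10) for tb in moves_tb.values())
print(f"{total_s / 3600:.2f} hours at 10 Gb/s")  # 2.67 hours
```

Because the same log data crosses the network twice, shrinking or co-locating it reduces transfer time more than the raw volume alone suggests.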

The problems expand further when preparing models for production deployment and adapting them for real-time use rather than just training and testing. You’ll learn about model deployment and scoring with canary and decoy models using the rendezvous architecture.
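
In the rendezvous architecture as commonly described, every request goes to all live models, and a rendezvous component returns the most preferred result that arrives before a deadline. A decoy model scores nothing and only records the exact input the other models saw, while a canary is a stable baseline whose output is logged for comparison. The sketch below is a simplified, in-process assumption: it uses threads in place of the message streams a production system would use, and the model functions, preference order, and deadline are all hypothetical.

```python
import concurrent.futures as cf
import time
from typing import Callable, Dict, List, Optional

Model = Callable[[dict], Optional[float]]


def rendezvous(request: dict, models: Dict[str, Model], preference: List[str],
               deadline_s: float, log: List[dict]) -> Optional[float]:
    """Score one request with every model and return the most preferred result
    that arrived before the deadline. All on-time results are logged."""
    pool = cf.ThreadPoolExecutor(max_workers=len(models))
    futures = {pool.submit(fn, request): name for name, fn in models.items()}
    done, _ = cf.wait(futures, timeout=deadline_s)
    # Keep only results that finished in time without raising.
    results = {futures[f]: f.result() for f in done if f.exception() is None}
    pool.shutdown(wait=False)  # late models keep running; their results are ignored
    log.append({"request": request, "results": results})
    for name in preference:
        if results.get(name) is not None:
            return results[name]
    return None


archive: List[dict] = []


def decoy(request: dict) -> None:
    archive.append(dict(request))  # record exactly what every model received
    return None


def canary(request: dict) -> float:
    return 0.25 * request["x"]  # stable baseline: logged, never in the preference list


def primary(request: dict) -> float:
    return 0.5 * request["x"]


def challenger(request: dict) -> float:
    time.sleep(1.0)  # simulates a new model too slow for the deadline
    return 0.75 * request["x"]


log: List[dict] = []
models = {"decoy": decoy, "canary": canary, "primary": primary, "challenger": challenger}
answer = rendezvous({"x": 4}, models, ["challenger", "primary"], deadline_s=0.2, log=log)
print(answer)  # 2.0, because the challenger missed the deadline
```

The deadline keeps latency bounded even when a new model is slow, and because the canary's and decoy's outputs sit in the log next to the returned score, drift in either the inputs or the models can be analyzed later.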

Prerequisite knowledge

A basic understanding of the daily activities that go into creating models, including working with data sources (useful but not required)

What you'll learn

Discover specific problems and reference points within the industry
Learn how to rectify issues with getting to production

Jim Scott

NVIDIA

Jim Scott is the head of developer relations, data science, at NVIDIA. He’s passionate about building combined big data and blockchain solutions. Over his career, Jim has held positions running operations, engineering, architecture, and QA teams in the financial services, regulatory, digital advertising, IoT, manufacturing, healthcare, chemicals, and geographical management systems industries. Jim has built systems that handle more than 50 billion transactions per day, and his work with high-throughput computing at Dow was a precursor to more standardized big data concepts like Hadoop. Jim is also the cofounder of the Chicago Hadoop Users Group (CHUG).