Sep 23–26, 2019

Problems taking AI to production and how to fix them

Jim Scott (NVIDIA)
1:15pm–1:55pm Thursday, September 26, 2019
Location: 1A 21/22
Average rating: 2.67 (3 ratings)

Who is this presentation for?

  • Data scientists, data engineers, DevOps, and data ops

Level

Intermediate

Description

By working with clients across many industries, including chemical sciences, healthcare, and oil and gas, Jim Scott has documented a number of major impediments to successfully operationalizing AI systems and keeping them running in a production environment. These include problems with data formats and optimization, and versioning of the data, models, and parameters that grows too complex over time. He also explores tooling-management issues around notebook applications like Jupyter and around the workflow management tools used to track and manage an execution pipeline.
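One common way to keep versioning of data, models, and parameters from growing unmanageable is to record a content hash of each alongside every training run, so any result can be traced back to the exact inputs that produced it. The sketch below illustrates that idea under this assumption; it is not the approach presented in the talk, and `build_manifest` and its field names are hypothetical.

```python
import hashlib
import json


def sha256_of_bytes(payload: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(payload).hexdigest()


def build_manifest(data: bytes, params: dict, model_blob: bytes) -> dict:
    """Record content hashes for data, parameters, and model in one manifest.

    Hypothetical helper: the field names are illustrative, not a standard format.
    """
    # Serialize parameters with sorted keys so logically equal dicts hash the same.
    params_json = json.dumps(params, sort_keys=True).encode("utf-8")
    manifest = {
        "data_sha256": sha256_of_bytes(data),
        "params_sha256": sha256_of_bytes(params_json),
        "model_sha256": sha256_of_bytes(model_blob),
    }
    # A short run ID derived from all three hashes changes if any input changes.
    combined = json.dumps(manifest, sort_keys=True).encode("utf-8")
    manifest["run_id"] = sha256_of_bytes(combined)[:12]
    return manifest


if __name__ == "__main__":
    m1 = build_manifest(b"a,b\n1,2\n", {"lr": 0.01, "epochs": 5}, b"weights-v1")
    m2 = build_manifest(b"a,b\n1,2\n", {"epochs": 5, "lr": 0.01}, b"weights-v1")
    m3 = build_manifest(b"a,b\n1,2\n", {"lr": 0.02, "epochs": 5}, b"weights-v1")
    print(m1["run_id"] == m2["run_id"])  # True: key order does not matter
    print(m1["run_id"] == m3["run_id"])  # False: a parameter changed
```

Storing such a manifest next to each model artifact lets a team answer "which data and settings produced this model?" without relying on notebook history.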

New problems arise when log output volumes grow quickly and significant volumes of data begin to move. Source data moves to the GPU, log data moves back to storage, and the log data then moves to the machines that run the distributed compute for postmodel analytics, which evaluate the model's performance characteristics. Networks don’t provide infinite bandwidth, and most enterprises do not run extremely high-speed networks.
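A back-of-the-envelope estimate shows why these data movements matter: transfer time is roughly the number of bits moved divided by the usable bandwidth. The helper names, the data volumes for each hop, and the 0.7 default efficiency factor below are all hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def transfer_seconds(num_bytes: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate wall-clock seconds to move num_bytes over a network link.

    efficiency is an assumed fraction of nominal bandwidth actually achieved
    (protocol overhead, contention); 0.7 is illustrative, not measured.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and efficiency in (0, 1]")
    bits = num_bytes * 8
    effective_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / effective_bits_per_second


def pipeline_seconds(hops: List[Tuple[str, float]], link_gbps: float,
                     efficiency: float = 0.7) -> float:
    """Sum transfer time over sequential hops that share the same link speed."""
    return sum(transfer_seconds(size, link_gbps, efficiency) for _, size in hops)


# Hypothetical data volumes (bytes) for the three movements described above.
HOPS = [
    ("source data -> GPU nodes", 2e12),
    ("log data -> storage", 5e11),
    ("log data -> analytics cluster", 5e11),
]

if __name__ == "__main__":
    for name, size in HOPS:
        print(f"{name}: {transfer_seconds(size, 10, efficiency=1.0):.0f} s")
    print(f"total: {pipeline_seconds(HOPS, 10, efficiency=1.0):.0f} s")
```

At an idealized 10 Gb/s with no overhead, these three hypothetical hops take 1,600, 400, and 400 seconds, or 2,400 seconds (40 minutes) in total, and every retraining and evaluation cycle repeats them. Any realistic efficiency below 1.0 only makes the total longer.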

Moving forward, the problems expand when preparing models for production deployment and adapting them for real-time use rather than just training and testing. You’ll learn about model deployment and scoring with canary and decoy models using the rendezvous architecture.
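In the rendezvous architecture as commonly described, every request is sent to all live models through a shared input stream, a decoy simply archives the exact inputs the models received, a canary runs a stable baseline model whose output is kept for comparison, and a rendezvous component returns the best answer available within a deadline. The toy sketch below simulates that flow in a single process. The `Reply` and `Rendezvous` classes, the model names, and the simulated latencies are hypothetical; a production system would use real streams and measured response times.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Reply:
    model: str
    score: float
    latency_ms: float  # simulated here; measured in a real deployment


@dataclass
class Rendezvous:
    """Toy rendezvous server (hypothetical sketch, not a library API).

    preference: model names in priority order; the first one that answers
    within deadline_ms wins.
    """
    preference: List[str]
    deadline_ms: float
    decoy_log: List[dict] = field(default_factory=list)
    canary_log: List[float] = field(default_factory=list)

    def score(self, request: dict,
              models: Dict[str, Callable[[dict], Reply]]) -> Optional[Reply]:
        # Decoy: archive the exact input every model received; it returns nothing.
        self.decoy_log.append(dict(request))
        # Every model sees every request (in a real system, via a shared stream).
        replies = {name: fn(request) for name, fn in models.items()}
        # Canary: a stable baseline whose output is recorded for comparison only.
        if "canary" in replies:
            self.canary_log.append(replies["canary"].score)
        # Return the highest-priority reply that arrived before the deadline.
        for name in self.preference:
            reply = replies.get(name)
            if reply is not None and reply.latency_ms <= self.deadline_ms:
                return reply
        return None


if __name__ == "__main__":
    models = {
        "new_model": lambda r: Reply("new_model", 0.91, latency_ms=120.0),
        "current_model": lambda r: Reply("current_model", 0.87, latency_ms=40.0),
        "canary": lambda r: Reply("canary", 0.80, latency_ms=30.0),
    }
    rv = Rendezvous(preference=["new_model", "current_model"], deadline_ms=100.0)
    best = rv.score({"user_id": 42}, models)
    print(best.model)  # current_model: new_model missed the 100 ms deadline
```

Because the preferred model can be swapped by changing only the `preference` list, a new model can run alongside the current one and be promoted once its results compare well against the canary, without redeploying the callers.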

Prerequisite knowledge

  • A basic understanding of the daily activities that go into creating models, including working with data sources (useful but not required)

What you'll learn

  • Discover specific problems and reference points within the industry
  • Learn how to rectify issues with getting to production

Jim Scott

NVIDIA

Jim Scott is the head of developer relations, data science, at NVIDIA. He’s passionate about building combined big data and blockchain solutions. Over his career, Jim has held positions running operations, engineering, architecture, and QA teams in the financial services, regulatory, digital advertising, IoT, manufacturing, healthcare, chemicals, and geographical management systems industries. Jim has built systems that handle more than 50 billion transactions per day, and his work with high-throughput computing at Dow was a precursor to more standardized big data concepts like Hadoop. Jim is also the cofounder of the Chicago Hadoop Users Group (CHUG).
