Sep 23–26, 2019

Problems Taking AI to Production, and How to Fix Them!

Jim Scott (MapR Technologies)
1:15pm–1:55pm Thursday, September 26, 2019
Location: 1A 21/22
Secondary topics: Model Development, Governance, Operations

Who is this presentation for?

Data scientists, data engineers, DevOps, and DataOps

Level

Intermediate

Prerequisite knowledge

A general understanding of the day-to-day work that goes into creating models, including working with data sources. Deep knowledge is not required, but a cursory understanding is useful.

What you'll learn

Detailed descriptions of very specific problems, industry reference points for each of them, and guidance on how to fix the issues that block getting to production.

Description

Through work with clients across many industries, including chemical sciences, health care, and oil and gas, I have documented a number of problems that are major impediments to successfully operationalizing these systems and keeping them running in a production environment.

The talk covers problems with data formats and how to optimize them. Over time, versioning of the data, models, and parameters can grow complex. It also addresses tooling management issues around notebook applications such as Jupyter, as well as workflow management tools for tracking and managing an execution pipeline.

New problems arise when log output volumes grow quickly and significant data movement begins. Source data moves to the GPUs, and log data moves back to storage. The log data then moves again to the machines that run the distributed compute for post-model analytics, which evaluate the models' performance characteristics. Networks do not provide infinite bandwidth, and most enterprises do not run extremely high-speed networks.

The problems expand further when preparing models for production deployment and adapting them for real-time use, not just training and testing. The talk also covers model deployment and scoring with canary and decoy models using the rendezvous architecture.

Jim Scott

MapR Technologies

Jim Scott is the director of enterprise strategy and architecture at MapR Technologies. He is passionate about building combined big data and blockchain solutions. Over his career, Jim has held positions running operations, engineering, architecture, and QA teams in the financial services, regulatory, digital advertising, IoT, manufacturing, healthcare, chemicals, and geographical management systems industries. Jim has built systems that handle more than 50 billion transactions per day, and his work with high-throughput computing at Dow Chemical was a precursor to more standardized big data concepts like Hadoop. Jim is also the cofounder of the Chicago Hadoop Users Group (CHUG).

Contact us

confreg@oreilly.com

For conference registration information and customer service

partners@oreilly.com

For more information on community discounts and trade opportunities with O’Reilly conferences

strataconf@oreilly.com

For information on exhibiting or sponsoring a conference

Contact list

View a complete list of Strata Data Conference contacts