Problems Taking AI to Production, and How to Fix Them!
Who is this presentation for?
Data scientists, data engineers, DevOps, and DataOps practitioners
Prerequisite knowledge
A general understanding of the daily activities that go into creating models, including data sources. Deep knowledge is not required, but a cursory understanding is useful.
What you'll learn
By working with a variety of clients across many industries, including chemical sciences, health care, and oil and gas, I have documented a number of problems that are major impediments to successfully operationalizing these systems and to keeping them running in a production environment.
Problems with data formats, and with optimizing those formats, will be discussed. Over time, versioning of the data, models, and parameters can grow complex. Tooling management issues around notebook applications such as Jupyter will be discussed, as well as workflow management tools for tracking and managing an execution pipeline.
New problems arise when log output volumes grow quickly and significant volumes of data begin moving: source data moves to the GPU, log data moves back to storage, and the log data then moves to the machines running the distributed computation for post-model analytics, which evaluate the performance characteristics of the models. Networks do not provide infinite bandwidth, and most enterprises do not run extremely high-speed networks.
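To make the bandwidth point concrete, here is a rough back-of-the-envelope sketch, not a figure from the talk. The per-hop data volumes, the 10 Gb/s link speed, and the 70% effective-throughput factor are all assumed values chosen purely for illustration, and the hops are assumed to share one link and run one after another.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours to move data_tb (decimal terabytes) over a link of link_gbps.

    efficiency is an assumed fraction of the nominal link speed actually achieved.
    """
    bits = data_tb * 1e12 * 8                   # terabytes -> bits
    effective_bps = link_gbps * 1e9 * efficiency
    return bits / effective_bps / 3600          # seconds -> hours

# Hypothetical per-hop volumes (TB) for one training and evaluation cycle.
hops = {
    "source data to GPU nodes": 10.0,
    "log data back to storage": 4.0,
    "log data to analytics cluster": 4.0,
}

total = 0.0
for hop, tb in hops.items():
    hours = transfer_hours(tb, link_gbps=10.0)
    total += hours
    print(f"{hop}: {hours:.2f} h")
print(f"total: {total:.2f} h per cycle at 10 Gb/s")
```

The log data is counted twice because it crosses the network twice; every additional hop in the pipeline adds its own transfer time on the same constrained link.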
The problems expand further when preparing models for production deployment and adapting them for real-time use rather than just training and testing. Model deployment and scoring with canary and decoy models, using the rendezvous architecture, will be discussed.
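As a rough illustration of the rendezvous idea, the following is a minimal, single-process sketch under stated assumptions, not the speaker's implementation. In a real deployment the models would read requests from a shared input stream and run in parallel; here they are called in turn, and each model's response time is a simulated, hypothetical number. Every live model, including a canary that serves as a stable baseline, scores every request; a decoy only archives the raw input; and the rendezvous step returns the most preferred result that arrived before the deadline. The function `rendezvous` and the model names `challenger`, `champion`, and `canary` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, float]
ScoreFn = Callable[[Features], float]

@dataclass
class Score:
    model: str
    value: float
    latency_ms: float  # simulated response time (hypothetical in this sketch)

@dataclass
class RendezvousResult:
    chosen: Optional[Score]
    all_scores: List[Score]  # every model's output, kept for offline comparison

def rendezvous(features: Features,
               models: Dict[str, Tuple[ScoreFn, float]],
               preference: List[str],
               deadline_ms: float,
               decoy_log: List[Features]) -> RendezvousResult:
    # Decoy: archives the input exactly as the models received it; scores nothing.
    decoy_log.append(dict(features))
    # Every live model, including the canary, scores every request.
    all_scores = [Score(name, fn(features), latency)
                  for name, (fn, latency) in models.items()]
    by_name = {s.model: s for s in all_scores}
    # Return the most preferred model whose answer arrived within the deadline.
    for name in preference:
        s = by_name.get(name)
        if s is not None and s.latency_ms <= deadline_ms:
            return RendezvousResult(s, all_scores)
    return RendezvousResult(None, all_scores)

# Hypothetical models: (scoring function, simulated latency in ms).
decoy_log: List[Features] = []
models: Dict[str, Tuple[ScoreFn, float]] = {
    "challenger": (lambda f: 0.9 * f["amount"], 120.0),  # newer but too slow here
    "champion": (lambda f: 0.8 * f["amount"], 30.0),
    "canary": (lambda f: 0.75 * f["amount"], 25.0),      # baseline, never returned
}
result = rendezvous({"amount": 100.0}, models, ["challenger", "champion"], 50.0, decoy_log)
canary = next(s for s in result.all_scores if s.model == "canary")
print(result.chosen.model, result.chosen.value - canary.value)
```

Keeping the canary out of the preference list means it never answers a caller, but its scores give a fixed reference for spotting drift in the models that do; the decoy log preserves the exact inputs for later retraining or replay.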
Jim Scott is the director of enterprise strategy and architecture at MapR Technologies. He is passionate about building combined big data and blockchain solutions. Over his career, Jim has held positions running operations, engineering, architecture, and QA teams in the financial services, regulatory, digital advertising, IoT, manufacturing, healthcare, chemicals, and geographical management systems industries. Jim has built systems that handle more than 50 billion transactions per day, and his work with high-throughput computing at Dow Chemical was a precursor to more standardized big data concepts like Hadoop. Jim is also the cofounder of the Chicago Hadoop Users Group (CHUG).
For more information on community discounts and trade opportunities with O’Reilly conferences
View a complete list of Strata Data Conference contacts