Production models interact with the real world, and it’s terrifying how often nobody knows how they’re performing on live data. Bias and variance can creep into your models over time, and you should know when that happens. Yet many data scientists and their organizations don’t track how their models perform over time. The world changes, often slowly, and most models perform worse as time goes on. Nuances in a changing environment (new language usage, differing shopping habits, a changing political landscape, and many other factors) can unravel models that were once finely tuned.
With the AI and ML explosion, some organizations have hundreds of models running in production every day. Ensuring everything is working well is a huge undertaking, and unfortunately, many organizations are simply ignoring the problem. Donald Miner details how to track machine learning models in production to ensure model reliability, consistency, and performance into the future. You’ll come away with insights on three major topics. First, Donald covers why you should invest time in monitoring your machine learning models and shares several anecdotes about the dangers of not paying attention to how a model’s performance can change over time. Second, you’ll learn which metrics you should be gathering for each model through a list of “vitals”: what each one tells you, what value it provides, and how to measure it. Some of the vitals include classification label distribution over time, distribution of regression results, measurement of bias, measurement of variance, change in output from previous models, and changes in accuracy over time. Third, you’ll get some implementation strategies to keep watch on model drift over time. Many organizations already have data scientists on their team, and Donald explains how many data science approaches apply to model monitoring, how to determine if a model requires attention, and how to productionalize these strategies.
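As a rough illustration of one of the vitals named above, classification label distribution over time, here is a minimal Python sketch. It is not taken from the talk: the function names, the choice of total variation distance as the comparison, the 0.1 threshold, and the sample predictions are all assumptions made for the example.

```python
from collections import Counter


def label_distribution(labels):
    """Return the fraction of predictions assigned to each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(baseline, recent):
    """Half the L1 distance between two label distributions.

    0.0 means the distributions are identical; 1.0 means they share no labels.
    """
    labels = set(baseline) | set(recent)
    return 0.5 * sum(
        abs(baseline.get(label, 0.0) - recent.get(label, 0.0)) for label in labels
    )


def label_drift_alert(baseline_labels, recent_labels, threshold=0.1):
    """Flag a model whose recent label mix has moved away from its baseline.

    The default threshold is an arbitrary illustrative value, not a recommendation.
    """
    distance = total_variation_distance(
        label_distribution(baseline_labels),
        label_distribution(recent_labels),
    )
    return distance, distance > threshold


# Hypothetical predictions logged in two time windows.
last_month = ["spam"] * 20 + ["ham"] * 80
this_week = ["spam"] * 45 + ["ham"] * 55

distance, needs_attention = label_drift_alert(last_month, this_week)
print(f"distance={distance:.2f}, needs_attention={needs_attention}")
```

A check like this needs no ground-truth labels, so it can run on live predictions as they arrive. A shift in the label mix does not prove the model is wrong, only that its outputs have changed enough to deserve a closer look.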
Donald Miner is the founder of the data science consulting firm Miner & Kasch and specializes in large-scale data analysis and applying machine learning to real-world problems. Donald is the author of the O’Reilly book MapReduce Design Patterns and multiple industry reports. He’s architected and implemented dozens of mission-critical and large-scale data analysis systems within the US Government and Fortune 500 companies. He has applied machine learning techniques to analyze data across several verticals, including financial, retail, telecommunications, healthcare, government intelligence, and entertainment. His PhD is from the University of Maryland, Baltimore County, where he focused on artificial intelligence and multiagent systems. He lives in Maryland with his wife and three young sons.
©2019, O'Reilly Media, Inc.