Academic machine learning almost exclusively involves offline evaluation of machine learning models. In the real world this is, somewhat surprisingly, only good enough for a rough cut that eliminates the real dogs. For production work, online evaluation is often the only option to determine which of several final-round candidates might be chosen for further use.
As Einstein is rumored to have said, theory and practice are the same, in theory. In practice, they are different. So it is with models. Part of the problem is interaction with other models and systems. Part of the problem has to do with the variability of the real world. Often, there are adversaries at work. It may even be sunspots. One particular problem arises when models choose their own training data and thus couple back onto themselves.
In addition to these difficulties, production models almost always have service-level agreements that have to do with how quickly they must produce results and how often they are allowed to fail. These operational considerations can be as important as the accuracy of the model: the right results returned late are worse than slightly wrong results returned in time.
Ted Dunning offers a survey of useful ways to evaluate models in the real world, breaking the problem of evaluation apart into operational and function evaluation and demonstrating how to do each without unnecessary pain and suffering. You’ll learn about decoy and canary models, nonlinear latency histogramming, model-delta diagrams, and more. These techniques may sound arcane, but each is simple at heart and doesn’t require any advanced mathematics to understand. Along the way, he shares exciting visualization techniques that will help make differences strikingly apparent.
Ted Dunning is the chief technology officer at MapR, an HPE company. He’s also a board member for the Apache Software Foundation, a PMC member, and committer on a number of projects. Ted has years of experience with machine learning and other big data solutions across a range of sectors. He’s contributed to clustering, classification, and matrix decomposition algorithms in Mahout and to the new Mahout Math library and designed the t-digest algorithm used in several open source projects and by a variety of companies. Previously, Ted was chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems and built fraud-detection systems for ID Analytics (LifeLock). Ted has coauthored a number of books on big data topics, including several published by O’Reilly related to machine learning, and has 24 issued patents to date plus a dozen pending. He holds a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.
©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com