Most available data science tools put a lot of effort into supporting and automating the four main steps of machine learning (i.e., defining the business problem, collecting and preparing data, training and testing the model, and deploying the model). But the fifth and crucial step, monitoring the model’s performance, is extremely challenging. How does one intelligently monitor the performance of unsupervised machine learning models, especially if there are many of them, involving multiple types of algorithms that feed into each other? What if your model is trained and deployed on multiple data streams in parallel (data from multiple customers, products, etc.)?
Ira Cohen explains how Anodot devised a way to intelligently monitor the performance of its highly complex unsupervised machine learning models. Anodot’s system, designed to provide real-time anomaly detection, runs approximately 30 different types of unsupervised machine learning algorithms, each with its own parameters and tuning capabilities. Adding to the complexity, the outputs of some of the algorithms act as the inputs to others. These algorithms run constantly on the vast number of signals sent to Anodot’s SaaS cloud (currently more than 120 million signals are reported to Anodot every one to five minutes). Ira shares multiple examples of how this approach has helped Anodot detect, fix, and eventually design better learning algorithms, and describes the general methodology, which Anodot calls “learning the learner.”
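For a rough sense of scale, assume each signal reports one data point per reporting interval (an assumption; the reporting granularity is not stated beyond "every one to five minutes"). With at least 120 million signals, the ingest rate R is then bounded below as follows:

```latex
\begin{aligned}
R &= \frac{N}{\Delta t}, \qquad N \ge 1.2 \times 10^{8}, \qquad \Delta t \in [60\,\mathrm{s},\ 300\,\mathrm{s}] \\
R\big|_{\Delta t = 300\,\mathrm{s}} &\ge \frac{1.2 \times 10^{8}}{300\,\mathrm{s}} = 4 \times 10^{5}\ \text{points/s} \\
R\big|_{\Delta t = 60\,\mathrm{s}} &\ge \frac{1.2 \times 10^{8}}{60\,\mathrm{s}} = 2 \times 10^{6}\ \text{points/s}
\end{aligned}
```

At hundreds of thousands to millions of points per second flowing through roughly 30 interacting algorithms, watching their behavior by hand is clearly impractical.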
Anodot’s solution first collects time series metrics that constantly measure various performance indicators for each of the algorithms. These include the number of anomalies discovered for each customer and their score distribution, the number of seasonal patterns discovered, the selection algorithm’s classification changes and their rates, the number and quality of clusters produced by the various clustering algorithms, and much more. Since manually tracking changes in these algorithm performance metrics is not feasible, Anodot created a two-step process that continuously measures the models’ performance and uses algorithms to learn their behavior, so that abnormal changes can be detected and acted upon by the data science team. For example, when Anodot is alerted to an abnormal change in the distribution of scores for the anomalies the system finds, it can quickly determine whether the change was caused by algorithm tuning or reflects a valid shift. Similarly, after changes to the clustering algorithm parameters, Anodot is alerted if the quality of the clusters degrades abnormally. It also uses outlier detection to determine whether results in production differ significantly from the results of experiments during development.
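The two-step idea described above (collect a time series of performance metrics per algorithm, then learn each metric's normal behavior and alert on abnormal changes) can be illustrated with a minimal sketch. It assumes a simple exponentially weighted moving average (EWMA) baseline with a z-score threshold. This is a stand-in technique chosen for brevity, not Anodot's actual algorithm, and the class names, parameters (`alpha`, `threshold`, `warmup`), metric names, and sample values are all hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class EwmaMonitor:
    """Learns a baseline for one performance metric and flags abnormal values.

    Hypothetical stand-in technique: EWMA estimates of mean and variance
    plus a z-score threshold. Not Anodot's actual method.
    """

    alpha: float = 0.1      # smoothing factor; higher adapts faster (assumed value)
    threshold: float = 4.0  # |z| above which a value is flagged (assumed value)
    warmup: int = 10        # observations to learn before any flagging (assumed value)
    mean: float = field(default=0.0, init=False)
    var: float = field(default=0.0, init=False)
    n: int = field(default=0, init=False)

    def update(self, value: float) -> bool:
        """Score `value` against the current baseline, then fold it into the baseline."""
        is_abnormal = False
        if self.n >= self.warmup:
            std = math.sqrt(self.var)
            # Skip scoring while the baseline has zero variance.
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                is_abnormal = True
        if self.n == 0:
            self.mean = value  # the first observation seeds the baseline
        else:
            # Incremental EWMA update of mean and variance.
            diff = value - self.mean
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        self.n += 1
        return is_abnormal


class PerformanceMonitor:
    """Keeps an independent baseline per (stream, metric) pair, e.g. per customer."""

    def __init__(self, **params: float) -> None:
        self._params = params
        self._monitors: dict[tuple[str, str], EwmaMonitor] = {}

    def observe(self, stream: str, metric: str, value: float) -> bool:
        key = (stream, metric)
        if key not in self._monitors:
            self._monitors[key] = EwmaMonitor(**self._params)
        return self._monitors[key].update(value)


# Usage with made-up numbers: daily anomaly counts for one customer, then a jump.
monitor = PerformanceMonitor(alpha=0.1, threshold=4.0, warmup=10)
counts = [50, 52, 49, 51, 50, 48, 53, 50, 51, 49, 50, 52, 200]
flags = [monitor.observe("customer-a", "anomaly_count", c) for c in counts]
print(flags[-1])  # True: 200 is far outside the learned baseline
```

Keeping a separate baseline per (stream, metric) pair reflects the need to monitor models deployed on many data streams in parallel. The sketch keeps updating the baseline after a flagged value, so a genuine, persistent shift eventually becomes the new normal; a production system might instead down-weight flagged points.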
Ira Cohen is a cofounder and chief data scientist at Anodot, where he’s responsible for developing and inventing the company’s real-time multivariate anomaly detection algorithms that work with millions of time series signals. He holds a PhD in machine learning from the University of Illinois at Urbana-Champaign and has over 12 years of industry experience.
©2017, O'Reilly Media, Inc.