Ira Cohen shares a novel two-step approach for building more reliable prediction models by incorporating anomaly information into them. In the first step, anomaly detection algorithms discover anomalies in the time series that make up the training data. In the second, multiple prediction models, including time series models and deep networks, are trained on data enriched with the anomaly information discovered in the first step.
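As a rough illustration of the two-step idea, the sketch below flags anomalies in a training series and then uses those flags when fitting a forecasting model. It is not the method presented in the session: the robust z-score detector, the small autoregressive model, the choice to drop training rows that touch an anomaly, and all names (`detect_anomalies`, `fit_ar`, `lagged_design`) and the synthetic data are assumptions made for this example.

```python
import numpy as np

def detect_anomalies(series, threshold=5.0):
    """Step 1: flag points whose robust z-score (median/MAD) exceeds threshold."""
    median = np.median(series)
    mad = np.median(np.abs(series - median))
    scale = 1.4826 * mad if mad > 0 else 1.0  # 1.4826 makes MAD comparable to a std dev
    return np.abs(series - median) / scale > threshold

def lagged_design(series, n_lags):
    """Build (X, y): each row of X is an intercept plus n_lags consecutive values, y is the next value."""
    n = len(series)
    lags = np.column_stack([series[i:n - n_lags + i] for i in range(n_lags)])
    X = np.column_stack([np.ones(len(lags)), lags])
    return X, series[n_lags:]

def fit_ar(series, n_lags, is_anomaly=None):
    """Step 2: least-squares autoregressive fit.

    If anomaly flags are supplied, every training row whose inputs or target
    include a flagged point is dropped (one simple way to use the flags).
    """
    X, y = lagged_design(series, n_lags)
    if is_anomaly is not None:
        n = len(series)
        touched = np.column_stack(
            [is_anomaly[i:n - n_lags + i] for i in range(n_lags + 1)]
        )
        keep = ~touched.any(axis=1)
        X, y = X[keep], y[keep]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic data (hypothetical): a daily-style cycle plus noise, with spikes in training only.
rng = np.random.default_rng(0)
t = np.arange(720)
clean = 10 + 3 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.2, t.size)
train, test = clean[:600].copy(), clean[600:]
spike_idx = np.array([50, 170, 290, 410, 530])
train[spike_idx] += 25.0

flags = detect_anomalies(train)                    # step 1
naive = fit_ar(train, n_lags=4)                    # ignores anomalies
aware = fit_ar(train, n_lags=4, is_anomaly=flags)  # step 2, anomaly-aware

# Compare one-step-ahead error on the anomaly-free holdout.
X_test, y_test = lagged_design(test, 4)
naive_mse = np.mean((X_test @ naive - y_test) ** 2)
aware_mse = np.mean((X_test @ aware - y_test) ** 2)
print(f"naive MSE: {naive_mse:.4f}  anomaly-aware MSE: {aware_mse:.4f}")
```

Dropping affected rows is only one option; the flags could instead be passed to the model as an extra input feature or used to down-weight or impute the affected points.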
Anomaly detection on individual time series is a necessary but insufficient step: applied across a set of live data streams, it can raise so many anomalies that it causes anomaly fatigue and limits effective decision making. One way to address this is to carry out anomaly detection in a multidimensional space. However, this is typically very expensive computationally and hence not suitable for live data streams. Another approach is to carry out anomaly detection on individual data streams and then leverage correlation analysis to minimize false positives, which in turn helps surface actionable insights faster.
Arun Kejariwal walks you through how to marry correlation analysis with anomaly detection, discusses how the topics are intertwined, and details the challenges you may encounter based on production data. Arun also showcases how deep learning can be leveraged to learn nonlinear correlation, which in turn can be used to further contain the false positive rate of an anomaly detection system.
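The per-stream detection followed by correlation analysis described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the speaker's system: it uses a robust z-score detector, linear (Pearson) correlation rather than a learned nonlinear one, and a simple rule that keeps an anomaly only if a strongly correlated stream is also anomalous within a few time steps. The function names, thresholds, and synthetic streams are all hypothetical.

```python
import numpy as np

def robust_flags(series, threshold=5.0):
    """Per-stream detector: robust z-score based on the median and MAD."""
    median = np.median(series)
    mad = np.median(np.abs(series - median))
    scale = 1.4826 * mad if mad > 0 else 1.0
    return np.abs(series - median) / scale > threshold

def corroborate(flags, corr, min_corr=0.8, window=2):
    """Keep a flag on stream i at time t only if some stream j with
    |corr[i, j]| >= min_corr is also flagged within +/- window steps.
    Streams with no correlated peer have nothing to corroborate them,
    so their flags are suppressed (a deliberate choice for this sketch)."""
    # Spread each flag over +/- window steps so near-simultaneous flags match.
    dilated = flags.copy()
    for s in range(1, window + 1):
        dilated[:, s:] |= flags[:, :-s]
        dilated[:, :-s] |= flags[:, s:]
    kept = np.zeros_like(flags)
    for i in range(flags.shape[0]):
        peers = np.flatnonzero(np.abs(corr[i]) >= min_corr)
        peers = peers[peers != i]
        support = dilated[peers].any(axis=0)  # all False when there are no peers
        kept[i] = flags[i] & support
    return kept

# Synthetic streams (hypothetical): three share a common cycle, one is unrelated.
rng = np.random.default_rng(1)
n = 500
t = np.arange(n)
cycle = 3 * np.sin(2 * np.pi * t / 24)
streams = np.vstack([
    10 + cycle + rng.normal(0, 0.2, n),
    20 + cycle + rng.normal(0, 0.2, n),
    30 + cycle + rng.normal(0, 0.2, n),
    rng.normal(5, 1.0, n),
])
streams[0, 100] += 25.0  # isolated glitch on one stream
streams[0, 300] += 25.0  # incident visible on all three related streams,
streams[1, 300] += 25.0  # arriving one step late on the third
streams[2, 301] += 25.0
streams[3, 200] += 12.0  # glitch on the unrelated stream

raw = np.vstack([robust_flags(s) for s in streams])

# Estimate correlation only on time steps where no stream is flagged,
# so the anomalies themselves do not distort the estimate.
quiet = ~raw.any(axis=0)
corr = np.corrcoef(streams[:, quiet])

kept = corroborate(raw, corr)
print("raw flags per stream: ", raw.sum(axis=1))
print("kept flags per stream:", kept.sum(axis=1))
```

In this sketch the isolated glitches are discarded while the incident shared by the correlated streams survives. Suppressing uncorroborated anomalies trades some recall for fewer alerts, so the correlation threshold and time window would need tuning against real data.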
Ira Cohen is a cofounder and chief data scientist at Anodot, where he’s responsible for developing and inventing the company’s real-time multivariate anomaly detection algorithms that work with millions of time series signals. He holds a PhD in machine learning from the University of Illinois at Urbana-Champaign and has over 12 years of industry experience.
Arun Kejariwal is an independent lead engineer. Previously, he was a statistical learning principal at Machine Zone (MZ), where he led a team of top-tier researchers and worked on research and development of novel techniques for install-and-click fraud detection, for assessing the efficacy of TV campaigns, and for optimizing marketing campaigns. His team built novel methods for bot detection, intrusion detection, and real-time anomaly detection. At Twitter, he developed and open-sourced techniques for anomaly detection and breakout detection. His research includes the development of practical and statistically rigorous techniques and methodologies to deliver high performance, availability, and scalability in large-scale distributed clusters. Some of the techniques he helped develop have been presented at international conferences and published in peer-reviewed journals.
©2018, O'Reilly Media, Inc.