The basic ideas of anomaly detection are simple. You build a model and you look for data points that don’t match that model. Building a practical anomaly detection system requires deal with practical details starting with algorithm selection, data flow architecture, anomaly alerting, user interfaces and visualizations. We will describe the major classes of anomaly detection systems and show how to build anomaly detection systems for:
a) rate shifts to determine when events such as web traffic, purchases or process progress beacons shift rate
b) topic spotting to determine when new topics appear in a content stream such as Twitter
c) network flow anomalies to determine when systems with defined inputs and outputs act strangely.
While describing how to solve these problems, we will describe how clustering, dimensionality reduction, and density estimation can be used in systems that adapt and learn about their environment and how these systems can tell you when something has changed.
Ted Dunning has been involved with a number of startups with the latest being MapR Technologies where he is Chief Application Architect working on advanced Hadoop-related technologies. He is also a PMC member for the Apache Zookeeper and Mahout projects. Opinionated about software and data-mining and passionate about open source, he is an active participant of Hadoop and related communities and loves helping projects get going with new technologies.
For exhibition and sponsorship opportunities, contact Susan Stewart at firstname.lastname@example.org
For information on trade opportunities with O'Reilly conferences, email email@example.com
For media-related inquiries, contact Maureen Jennings at firstname.lastname@example.org
View a complete list of Strata contacts