Anomaly detection as a data science technique is used in many applications, from the IoT to finance. With the rise of the industrial Internet and the explosion of sensor data, businesses from transport to manufacturing are keen to develop predictive maintenance. Another key application is fraud detection, both in finance and in the social benefit system.
However, several challenges still hinder the application of anomaly detection. For a start, the problem is hard by definition: deciding what counts as “normal” and what counts as an “anomaly” is often part of the problem statement itself. Anomalies are also relatively rare events, so anomaly detection suffers from the accuracy paradox: a model that misses every single anomaly can still be very accurate, which raises the question of what a good measure of success actually is. Finally, the very different data types, the preprocessing they require, and the complex ensemble machine-learning methods often needed pose additional challenges.
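The accuracy paradox can be made concrete with a small sketch. It assumes a hypothetical dataset of 10,000 observations, 1% of which are anomalies, and scores a trivial detector that labels everything as normal. Precision and recall are shown as one common alternative to accuracy; the session description does not say which metrics the speaker recommends.

```python
"""Illustrate the accuracy paradox on an imbalanced anomaly-detection task.

Assumption: hypothetical labels, 100 anomalies among 10,000 observations.
Label 1 means "anomaly", label 0 means "normal".
"""


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating 1 as the positive (anomaly) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Compute accuracy alongside precision and recall for the anomaly class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when the model never flags anything.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical ground truth: 100 anomalies followed by 9,900 normal readings.
y_true = [1] * 100 + [0] * 9_900

# A trivial "detector" that never flags an anomaly.
always_normal = [0] * len(y_true)

metrics = evaluate(y_true, always_normal)
print(metrics)
# {'accuracy': 0.99, 'precision': 0.0, 'recall': 0.0}
```

The detector reaches 99% accuracy while catching none of the anomalies, which is why accuracy alone is a misleading measure of success when anomalies are rare.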
Alessandra Staglianò illustrates these challenges through two very different use cases, the IoT and fraud detection, and explains how to overcome them. You’ll explore the differences and similarities between the two industries and learn how to set up a framework for solving anomaly detection problems in these situations.
Alessandra Staglianò is a data scientist who has worked on multiple complex projects. Alongside her work with a range of machine-learning techniques, she specializes in extracting relevant information from noisy and redundant data. Her earlier research has been published in a variety of journals. Alessandra holds a PhD in computer science, specializing in machine learning and machine vision.
©2016, O’Reilly UK Ltd • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.