In this talk, we discuss some of the lessons learned from building anomaly detection systems for three operational systems.
The first case study involves detecting anomalies in the log files produced by a large-scale cloud computing facility in order to spot and head off potential problems and to improve the facility's overall efficiency. The second involves searching the daily data that a large network produces, including network flow data, network packet data, and log files, for anomalous events that are important enough for network analysts to investigate in detail. The goal of the third case study is to identify all anomalies that occur each day in the hyperspectral images collected by one of NASA’s Earth-observing satellites (EO-1).
From these three case studies, we identify eight techniques that have consistently proved useful and discuss how best to deploy them.
Robert Grossman is a faculty member and the Chief Research Informatics Officer in the Biological Sciences Division of the University of Chicago. He is the Director of the Center for Data Intensive Science and a Senior Fellow in the Computation Institute (CI) and the Institute for Genomics and Systems Biology (IGSB). He is also the Founder and a Partner of Open Data Group, which specializes in building predictive models over big data. He has led the development of open source software tools for analyzing big data (Augustus), distributed computing (Sector), and high performance networking (UDT). In 1996 he founded Magnify, Inc., which provides data mining solutions to the insurance industry and was sold to ChoicePoint in 2005. He is also the Chair of the Open Cloud Consortium, which is a not-for-profit that supports the research community by operating cloud infrastructure, such as the Open Science Data Cloud. He blogs occasionally about big data, data science, and data engineering at rgrossman.com.