Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Correlation analysis on live data streams

Arun Kejariwal (Independent), Francois Orsini (MZ)
16:35–17:15 Wednesday, 23 May 2018
Data science and machine learning
Location: Capital Suite 12
Level: Intermediate
Secondary topics:  Time Series and Graphs
Average rating: 3.14 (7 ratings)

Who is this presentation for?

  • Data scientists and data analysts

What you'll learn

  • Learn how to marry correlation analysis with anomaly detection to surface actionable insights from live data streams


There has been a shift from big data to live streaming data to facilitate faster data-driven decision making. As the number of live data streams grows, partly as a result of the expanding IoT, it is critical to develop techniques to better extract actionable insights.

One current application, anomaly detection, is a necessary but insufficient step, because anomaly detection over a set of live data streams may result in anomaly fatigue, limiting effective decision making. One way to address this is to carry out anomaly detection in a multidimensional space. However, this is typically very expensive computationally and hence not suitable for live data streams. Another approach is to carry out anomaly detection on individual data streams and then leverage correlation analysis to minimize false positives, which in turn helps surface actionable insights faster.
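The second approach can be illustrated with a minimal sketch. The abstract does not say which detector or which corroboration rule the speakers use, so everything here is an assumption made for illustration: the per-stream detector is a median/MAD-based robust z-score, the correlation measure is plain Pearson, and an anomaly is kept only when a strongly correlated stream flags the same index. The function names, the `threshold` and `min_corr` parameters, and the sample streams are all hypothetical.

```python
from math import sqrt
from statistics import median


def mad_anomalies(series, threshold=3.5):
    """Indices whose median/MAD-based robust z-score exceeds `threshold`."""
    med = median(series)
    mad = median(abs(x - med) for x in series)
    if mad == 0:
        return set()  # no spread to scale by; flag nothing in this sketch
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return {i for i, x in enumerate(series)
            if 0.6745 * abs(x - med) / mad > threshold}


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def corroborated_anomalies(streams, min_corr=0.8, threshold=3.5):
    """Keep a per-stream anomaly only if a strongly correlated stream
    (|r| >= min_corr) flags the same index. Assumed rule, not the speakers'."""
    flags = {name: mad_anomalies(vals, threshold) for name, vals in streams.items()}
    kept = {name: set() for name in streams}
    names = list(streams)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(streams[a], streams[b])) >= min_corr:
                shared = flags[a] & flags[b]
                kept[a] |= shared
                kept[b] |= shared
    return kept


# Hypothetical streams: cpu and mem move together; disk spikes on its own.
streams = {
    "cpu":  [10, 11, 10, 12, 11, 40, 10, 11, 12, 10],
    "mem":  [20, 22, 20, 24, 22, 80, 20, 22, 24, 20],
    "disk": [5, 7, 6, 5, 30, 7, 6, 5, 7, 6],
}
result = corroborated_anomalies(streams)
```

In this toy data each stream flags one anomaly on its own, but only the spike shared by the correlated `cpu` and `mem` streams survives; the isolated `disk` spike is suppressed. A production system would compute both the detector and the correlation over sliding windows rather than over the full series.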

Arun Kejariwal and Francois Orsini explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making.

Topics include:

  • An overview of correlation analysis
  • Robust correlation analysis
  • Overview of alternative measures, such as co-median
  • Trade-offs between speed and accuracy
  • Correlation analysis in large dimensions
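
As an illustration of the co-median topic above, here is a small sketch of one commonly used definition: the comedian is the median of the products of deviations from each series' median, and dividing it by the product of the two median absolute deviations (MADs) gives a robust analogue of the correlation coefficient. Whether the talk uses exactly this definition is an assumption, and the function names and sample data are hypothetical.

```python
from statistics import median


def mad(xs):
    """Median absolute deviation from the median."""
    m = median(xs)
    return median(abs(x - m) for x in xs)


def comedian(xs, ys):
    """Median of the products of deviations from each series' median."""
    mx, my = median(xs), median(ys)
    return median((x - mx) * (y - my) for x, y in zip(xs, ys))


def correlation_median(xs, ys):
    """Comedian scaled by both MADs: a robust analogue of Pearson's r."""
    denom = mad(xs) * mad(ys)
    if denom == 0:
        raise ValueError("MAD is zero for at least one series")
    return comedian(xs, ys) / denom


xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2, 4, 6, 8, 10, 12, 14]           # exactly 2 * xs
ys_outlier = [2, 4, 6, 8, 10, 12, 140]  # same, with one corrupted reading

clean = correlation_median(xs, ys)              # 1.0
corrupted = correlation_median(xs, ys_outlier)  # still 1.0
```

Because every step is a median, the single corrupted reading leaves the estimate unchanged here, whereas a mean-based Pearson coefficient would shift. The cost is the sorting or selection needed for each median, which is one form of the speed versus accuracy trade-off listed above.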

Arun Kejariwal


Arun Kejariwal is an independent lead engineer. Previously, he was a statistical learning principal at Machine Zone (MZ), where he led a team of top-tier researchers and worked on research and development of novel techniques for install-and-click fraud detection, for assessing the efficacy of TV campaigns, and for optimizing marketing campaigns. His team also built novel methods for bot detection, intrusion detection, and real-time anomaly detection. At Twitter, he developed and open-sourced techniques for anomaly detection and breakout detection. His research includes the development of practical and statistically rigorous techniques and methodologies to deliver high performance, availability, and scalability in large-scale distributed clusters. Some of the techniques he helped develop have been presented at international conferences and published in peer-reviewed journals.


Francois Orsini


Francois Orsini is the chief technology officer for MZ’s Satori business unit. Previously, he served as vice president of platform engineering and chief architect, bringing his expertise in building server-side architecture and implementation for a next-gen social and server platform; was a database architect and evangelist at Sun Microsystems; and worked in OLTP database systems, middleware, and real-time infrastructure development at companies like Oracle, Sybase, and Cloudscape. Francois has extensive experience working with database and infrastructure development, honing his expertise in distributed data management systems, scalability, security, resource management, HA cluster solutions, and soft real-time and connectivity services. He also collaborated with Visa International and Visa USA to implement the first Visa Cash Virtual ATM for the internet and founded a VC-backed startup called Unikala in 1999. Francois holds a bachelor’s degree in civil engineering and computer sciences from the Paris Institute of Technology.