Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Anomaly detection on live data

Arun Kejariwal (Independent), Francois Orsini (MZ), Dhruv Choudhary (MZ)
4:35pm–5:15pm Thursday, September 28, 2017
Secondary topics:  IoT, Streaming
Average rating: *****
(5.00, 3 ratings)

Who is this presentation for?

  • Data scientists and analysts

What you'll learn

  • Learn how Satori can be leveraged for anomaly detection on live data


Data-driven decision making has become the norm in every industry, and there has been a shift from leveraging big data to leveraging live data in order to facilitate faster decision making in a stateless compute tier. Although niche services such as Periscope and Facebook Live focus on live video, there is no general-purpose platform that democratizes live data processing. To address this gap, MZ recently launched the Satori platform, which provides live messaging, with a cloud-based, managed messaging service; live discovery, with a dynamic SQL-based real-time message filtering service that can query at line rate with no configuration or need to index data in advance; and live reactions, through in-stream bots that attach to data channels and react at ultralow latencies.

Arun Kejariwal, Francois Orsini, and Dhruv Choudhary explore Satori’s design and architecture and share techniques for anomaly detection on live data. Anomalies occur frequently in live data for a multitude of reasons, so detecting and filtering them is essential for robust decision making.

Topics include:

  • How to handle low SNR (signal-to-noise ratio), which is typical of live data
  • How to handle seasonality, trend, and structural changes
  • One-pass incremental algorithms
  • Trade-offs between speed and accuracy
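
As an illustration of what a one-pass incremental algorithm can look like, here is a minimal sketch of a streaming z-score detector built on Welford's online mean and variance update. It is not the speakers' method and does not use any Satori API; the class name, the `threshold` and `warmup` parameters, and the sample stream are all hypothetical choices made for this example.

```python
import math


class StreamingZScoreDetector:
    """One-pass detector: keeps only a count, a mean, and a sum of squares."""

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # hypothetical cutoff, in standard deviations
        self.warmup = warmup        # hypothetical number of points seen before flagging
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against the statistics seen so far, then fold x into them."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's update: constant time and constant memory per point.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


# Made-up stream: values near 10 with a single spike at index 11.
stream = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 9.9, 10.0, 25.0, 10.1]
detector = StreamingZScoreDetector()
flags = [detector.update(x) for x in stream]
```

This sketch folds every point, including flagged ones, into the running statistics, which keeps the update cheap but lets a large spike inflate the variance afterward. Skipping or down-weighting flagged points, or handling seasonality and trend, would cost extra state and computation, which is one small instance of the speed and accuracy trade-off listed above.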

Arun Kejariwal


Arun Kejariwal is an independent lead engineer. Previously, he was a statistical learning principal at Machine Zone (MZ), where he led a team of top-tier researchers and worked on research and development of novel techniques for install-and-click fraud detection, for assessing the efficacy of TV campaigns, and for optimizing marketing campaigns. His team built novel methods for bot detection, intrusion detection, and real-time anomaly detection. He also developed and open-sourced techniques for anomaly detection and breakout detection at Twitter. His research includes the development of practical and statistically rigorous techniques and methodologies to deliver high performance, availability, and scalability in large-scale distributed clusters. Some of the techniques he helped develop have been presented at international conferences and published in peer-reviewed journals.


Francois Orsini


Francois Orsini is the chief technology officer for MZ’s Satori business unit. Previously, he served as vice president of platform engineering and chief architect, bringing his expertise in building server-side architecture and implementation for a next-gen social and server platform; was a database architect and evangelist at Sun Microsystems; and worked in OLTP database systems, middleware, and real-time infrastructure development at companies like Oracle, Sybase, and Cloudscape. Francois has extensive experience working with database and infrastructure development, honing his expertise in distributed data management systems, scalability, security, resource management, HA cluster solutions, and soft real-time and connectivity services. He also collaborated with Visa International and Visa USA to implement the first Visa Cash Virtual ATM for the internet and founded a VC-backed startup called Unikala in 1999. Francois holds a bachelor’s degree in civil engineering and computer sciences from the Paris Institute of Technology.


Dhruv Choudhary


Dhruv Choudhary is a research scientist at MZ, where he is researching stream anomaly detection algorithms for time series analysis and computer vision. Previously, Dhruv worked in the connected car space building data products around driver aggression, car behavior, and risk analysis. He holds a master’s degree from Georgia Tech, where he focused on applying control theory techniques to systems problems; his thesis formulated energy efficient thread scheduling for asymmetric architectures as an optimal control problem.
