Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

A spike in sales is not always good news: On the importance of learning the relationships between time series metrics at scale

Inbal Tadeski (Anodot)
Machine Learning & Data Science
Location: 1A 06/07 Level: Intermediate
Secondary topics: Deep learning
Average rating: 3.40 (5 ratings)

If you need to predict how much revenue an ecommerce site will generate this quarter, you could use the previous quarter’s revenue as a guide, but this ignores other relevant parameters, such as how much traffic came to the site in the current quarter, the site’s bounce rate, or other metrics that may be much better predictors. However, to understand which metrics can be used as predictors (or for other tasks), you must first understand which metrics are related to each other and how. For a small-scale operation, these relationships can be defined manually. For certain types of metrics, such as IT metrics, tools such as configuration management databases (CMDBs) may automate some of the discovery of these relationships. But if you want to incorporate metrics beyond IT, such as application metrics or business metrics like revenue, at the vast scale most digital businesses require, machine learning tools are needed.

Inbal Tadeski shares key machine learning methods for correlating metrics at scale without any manual configuration. Implementing these methods at scale can be computationally expensive, so Inbal also shares methods for reducing the computational resources needed. In particular, she discusses how to scale the similarity and clustering methods. Along the way, Inbal explains how to identify causality, since correlation does not necessarily imply causation. In many cases, it may not matter that metrics are correlated but not causally related. Sometimes, however, it does.

Topics include:

  • Abnormal similarities: If certain metrics tend to deviate from their normal behavior at the same time or at similar intervals, they may be related. Inbal outlines what to look for in abnormal similarities and what types of algorithms can be used to identify them.
  • Metadata similarities: Each metric has metadata associated with it, describing what is measured, where, and how. When collecting many metrics, similarities in their metadata properties can be an extremely valuable way to identify related or correlated metrics. Inbal shares algorithms for discovering similarities in the metadata of millions to billions of metrics.
  • Normal behavior similarities: Machine learning algorithms can be used to compare the shapes and behavior of metrics when they are behaving normally. While it seems straightforward to use standard correlation measures such as the Pearson correlation coefficient for this, off-the-shelf algorithms can generate many false positives. Inbal explores techniques to neutralize these false positives and generate usable results.
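
As a rough illustration of the false-positive problem with the Pearson correlation coefficient, the sketch below builds two synthetic metrics that are unrelated except for a shared upward trend. The metric names, the data, and the use of first differences to remove the trend are assumptions made for this example. Differencing is one common remedy and is not necessarily the technique presented in the session.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def pearson_of_changes(x, y):
    """Pearson correlation of first differences, which removes a shared linear trend."""
    return pearson(np.diff(x), np.diff(y))


# Hypothetical data: two metrics with independent noise that both grow over time.
rng = np.random.default_rng(0)
t = np.arange(500)
metric_a = 0.5 * t + rng.normal(0, 5, size=t.size)
metric_b = 0.3 * t + rng.normal(0, 5, size=t.size)

raw = pearson(metric_a, metric_b)                   # dominated by the shared trend
detrended = pearson_of_changes(metric_a, metric_b)  # compares point-to-point changes only

print(f"raw: {raw:.2f}, differenced: {detrended:.2f}")
```

On the raw series the coefficient is close to 1 even though the metrics share nothing but growth over time. On the differenced series it falls close to zero, which better reflects the lack of a real relationship.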

Inbal Tadeski


Inbal Tadeski is a data scientist at Anodot, a provider of real-time machine learning anomaly detection and analytics solutions for detection of business incidents. Previously, Inbal was a research engineer at HP Labs, where she specialized in machine learning and data mining. She holds an MSc in computer science with a focus on machine learning from Hebrew University in Jerusalem and a BSc in computer science from Ben Gurion University.