Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

How to Detect Anomalies in High Cardinality Dimensions and Make Them Actionable

Shankar Vedaraman (Netflix), Christopher Colburn (Netflix)
10:40am–11:20am Thursday, 02/19/2015
Data Science
Location: LL20 A
Average rating: 4.65 (23 ratings)

Anomaly detection is the process of identifying data points that do not conform to normal behavior, and it is used ubiquitously at Netflix. For example, real-time systems detect outliers and raise alerts when internal systems do not meet a service level agreement. In data warehousing applications, traditional outlier detection methods (e.g., flagging values more than some number of standard deviations from the mean) work for low-cardinality dimensions that are normally distributed, but the dimensions of interest are typically neither normally distributed nor low-cardinality. In these settings, the resulting false positives and false negatives create unnecessary overhead and limit the end user’s ability to respond.
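For reference, the traditional standard-deviation rule can be sketched in a few lines of Python. The function name, the threshold `k`, and the sample counts are illustrative assumptions, not taken from the talk.

```python
from statistics import mean, stdev


def zscore_outliers(values, k=3.0):
    """Return indices of values more than k sample standard deviations from the mean.

    Hypothetical helper illustrating the classic rule; k is an assumed threshold.
    """
    if len(values) < 2:
        return []  # stdev needs at least two points
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # constant series: nothing can deviate
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]


# Hypothetical daily failure counts for a single series, with a spike on the last day.
daily_failures = [10, 12, 11, 9, 10, 11, 12, 10, 50]
print(zscore_outliers(daily_failures, k=2.0))  # [8]
print(zscore_outliers(daily_failures, k=3.0))  # []
```

With `k=2.0` only the spike at index 8 is flagged, while with `k=3.0` nothing is, because the spike inflates the sample standard deviation enough to hide itself. Applying such a rule independently to thousands of dimension members (for example, one series per bank) also multiplies the expected number of false alarms at any fixed threshold.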

In this session, we present a case study at Netflix where we deployed a variant of the Singular Value Decomposition for anomaly detection in high-cardinality dimensions. We then wrapped it in a Business Intelligence tool that presents actionable insights to business users.
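The talk does not spell out its SVD variant, so what follows is only a minimal sketch of the generic idea, assuming plain truncated SVD from NumPy: arrange the data as a matrix with one row per member of the high-cardinality dimension and one column per time bucket, treat the top singular components as the normal behavior shared across rows, and flag cells whose residual is unusually large. The function name, `rank`, `k`, and the synthetic data are hypothetical.

```python
import numpy as np


def svd_residual_anomalies(X, rank=1, k=3.0):
    """Flag cells of X that a low-rank SVD approximation fails to explain.

    X: 2-D array, one row per dimension member (e.g. a bank), one column per time bucket.
    rank: number of singular components treated as "normal" structure (assumed).
    k: residual z-score threshold (assumed).
    Returns a list of (row, col) index pairs.
    """
    X = np.asarray(X, dtype=float)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Rebuild X from its top `rank` components: the pattern shared across rows.
    X_low = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]
    residual = X - X_low
    sd = residual.std()
    if sd <= 1e-9 * np.abs(X).max():
        return []  # residual is numerically zero: X is already (almost) low rank
    # Score every cell in one pass instead of testing each row in isolation.
    z = (residual - residual.mean()) / sd
    rows, cols = np.nonzero(np.abs(z) > k)
    return list(zip(rows.tolist(), cols.tolist()))


# Synthetic data: each bank's volume is its size times a shared daily pattern.
bank_size = np.array([100, 200, 300, 400, 500, 600, 700, 800], dtype=float)
daily_pattern = np.array([1.0, 1.1, 0.9, 1.0, 1.2, 0.8])
X = np.outer(bank_size, daily_pattern)
X[2, 3] += 500  # inject an anomaly for bank 2 on day 3
print(svd_residual_anomalies(X))  # [(2, 3)]
```

Because every row is scored against structure learned from all rows, small and large banks are judged against the same shared pattern rather than each against its own noisy history. A plain SVD can still be pulled toward very large anomalies, which is one motivation for robust variants.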

We will then discuss a specific application centered on payment processing. With more than 50 million customers worldwide, Netflix has to ensure that the payment methods provided by customers do not fail because of processing problems in the payment network. A typical payment transaction goes through at least four external participants (issuers, acquirers, payment gateways, processors, etc.) in addition to Netflix’s own systems. The wide array of banks that customers use to pay for Netflix creates a high-cardinality dimension, and the complexity of the payment transaction calls for a different solution than the common methods mentioned above. We will also present the decoupled cloud architecture that enables a high-performing, scalable solution.
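One way to see why banks form a high-cardinality dimension is to pivot raw payment attempts into one row per bank and one column per day. The sketch below assumes a hypothetical `(bank, day, succeeded)` record format; the actual Netflix schema and pipeline are not described in the talk.

```python
from collections import defaultdict


def failure_rate_matrix(transactions, days):
    """Pivot (bank, day, succeeded) records into one failure-rate row per bank.

    transactions: iterable of (bank, day, succeeded) tuples (hypothetical schema).
    days: ordered list of day labels that become the columns.
    Returns (banks, matrix) where matrix[i][j] is the failure rate of banks[i]
    on days[j], or None when that bank had no attempts that day.
    """
    attempts = defaultdict(int)
    failures = defaultdict(int)
    for bank, day, succeeded in transactions:
        attempts[(bank, day)] += 1
        if not succeeded:
            failures[(bank, day)] += 1

    def rate(bank, day):
        n = attempts.get((bank, day), 0)
        return failures.get((bank, day), 0) / n if n else None

    banks = sorted({bank for bank, _ in attempts})
    matrix = [[rate(b, d) for d in days] for b in banks]
    return banks, matrix


txns = [
    ("BankA", "d1", True), ("BankA", "d1", False), ("BankA", "d2", True),
    ("BankB", "d1", True), ("BankB", "d2", False),
    ("BankC", "d1", False),
]
print(failure_rate_matrix(txns, ["d1", "d2"]))
# (['BankA', 'BankB', 'BankC'], [[0.5, 0.0], [0.0, 1.0], [1.0, None]])
```

With thousands of issuing banks, the matrix gains thousands of rows, many of them sparse; the `None` cells need a policy (such as dropping them or imputing a value) before any matrix method is applied.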


Shankar Vedaraman


Shankar Vedaraman leads the data engineering organization for Growth, Business Operations, and Infrastructure at Netflix. His team is responsible for engineering and managing the data models and services that enable data consumption for a variety of use cases. Shankar is passionate about engineering data products that support data-driven research to solve business problems.


Christopher Colburn


Christopher Colburn is just another data scientist at Netflix.

Comments on this page are now closed.


Shankar Vedaraman
02/22/2015 2:30am PST

Hi Javier and Chris,

Yes, we’ll post the slides. We used Prezi, so we need to download the presentation as a PDF, adjust it a bit, and then upload it. We’ll do it on Monday. For now, you can go to the link below for the open source details.

Chris Barbosky
02/22/2015 2:23am PST

Yes, the slides would be helpful; at the very least, please post the info on the items that will be open sourced. Thanks! Great talk!

Javier von Stecher
02/21/2015 7:14am PST

I enjoyed this talk a lot. Are you going to post the slides?