Presented By O'Reilly and Cloudera
Make Data Work
5–7 May, 2015 • London, UK

Fast > Perfect: Practical approximation examples for mobile app analytics using Spark Streaming

Kevin Schmidt (Mind Candy Ltd), Luis Angel Vicente Sanchez (Mind Candy Ltd.)
16:15–16:55 Thursday, 7/05/2015
Data Science
Location: King's Suite - Balmoral

Prerequisite Knowledge

Basic knowledge of Spark Streaming, basic data structures, and computer science fundamentals.


For mobile games, constant tweaks are the difference between success and failure. Product managers need instant access to the latest metrics, for example to see how an acquisition campaign is doing or how a change affects spending per user. Data and analytics must be available in real time. However, calculating properties such as the uniqueness or newness of a data point requires a list of all data points seen so far, which is both memory-intensive and tricky to maintain in a real-time stream processing system like Spark Streaming. Probabilistic data structures allow these properties to be approximated with a fixed-size memory representation, and are very well suited to this kind of stream processing. Getting from the theory of approximation to a useful metric at a low error rate, even for many millions of users, is another story. In our talk we will look at practical ways of achieving this:

  • Which approximations we use for a selection of useful metrics
  • Why we picked a specific probabilistic data structure
  • How we store it in Cassandra as a time series
  • How we implemented it in Spark Streaming (including example code to get started)
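
To make the idea of a fixed-memory approximation concrete, here is a minimal sketch of a unique-user counter. The abstract does not say which probabilistic data structure the speakers chose, so HyperLogLog is an assumption made purely for illustration, as are the class name, the `p` parameter, and the user-ID format. The sketch is plain Python rather than Spark Streaming code.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative fixed-memory distinct counter (not the speakers' code)."""

    def __init__(self, p=12):
        self.p = p                       # number of hash bits used as register index
        self.m = 1 << p                  # number of registers (4096 for p=12)
        self.registers = [0] * self.m    # memory use is fixed, whatever the input size
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Derive a 64-bit hash from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Register-wise maximum yields the sketch of the union of both inputs.
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Hypothetical usage: one sketch per time window, merged for a longer period.
hour_1, hour_2 = HyperLogLog(), HyperLogLog()
for i in range(0, 3000):
    hour_1.add(f"user-{i}")
for i in range(2000, 5000):
    hour_2.add(f"user-{i}")
approx_uniques = hour_1.merge(hour_2).count()
```

With `p=12` the sketch keeps 4,096 small registers regardless of how many users it sees, and the expected relative error is roughly 1.04/√m, about 1.6%. Because `merge` is a register-wise maximum, per-window sketches could in principle be stored as a time series and combined later to answer questions over longer periods; whether the speakers do this in Cassandra in the same way is not stated in the abstract.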

Kevin Schmidt

Mind Candy Ltd

Kevin built up the data science and engineering team at Mind Candy, and with the team created a scalable architecture for mobile game analytics. Before Mind Candy, Kevin headed the data science and back-end services team at, working with ten years of music listening data from millions of users. He also spent four years working on private clouds and service architecture at Goldman Sachs.


Luis Angel Vicente Sanchez

Mind Candy Ltd.

Luis is a senior data engineer at Mind Candy. He was the first to introduce Spark Streaming at the company and is responsible for the real-time mobile analytics platform. He has more than 10 years of experience in software engineering and architecture.