Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

The Bag of Little Bootstraps: A/B experimenting with big data made small

Emily Sommer (Etsy)
11:15–11:55 Thursday, 2/06/2016
Data-driven business
Location: Capital Suite 4 Level: Non-technical
Average rating: 4.75 (8 ratings)

Etsy is a data-driven business: it runs A/B tests to evaluate the success and impact of almost every feature that it launches. Historically, it has calculated significance and confidence intervals for its A/B test results using classic parametric testing (which makes assumptions about what the data looks like and requires a lot of data). While these constraints worked well enough for most experiments, iOS and Android results suffered: errors in mobile logging, less data overall, and multiple “visits” per user made it impossible to draw sound conclusions from these crucial experiments.

After fixing iOS and Android logging, Etsy took a deeper look at other ways to evaluate mobile experiment data. To correct for some of these problems, Etsy implemented the Bag of Little Bootstraps, a clever take on bootstrapping in which many Monte Carlo iterations are run on many smaller subsets of one’s data to produce “full-size” datasets and corresponding statistics. The significance of an experiment is calculated for each of these subsets of data, and the distribution of these significance tests is examined to determine overall significance. Emily Sommer explains how Etsy does this at the browser level rather than the visit level to help ensure that its trials are as independent as possible. Not only has this improved the quality of its A/B test analysis, but it has also saved a lot of processing time and power compared with a traditional bootstrap implementation.
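
The procedure described above can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not Etsy's implementation: the function name `blb_diff_in_means`, the parameters `n_subsets`, `n_resamples`, and `gamma`, and the choice of a difference in means as the statistic are all assumptions made for the example. It also assumes the input has already been aggregated to one value per browser, and it combines the subsets by averaging their percentile intervals, which is one common way to summarize the per-subset results.

```python
import numpy as np


def _subset(rng, values, gamma):
    # Draw a small subset of size b = n**gamma WITHOUT replacement.
    b = min(values.size, max(2, int(values.size ** gamma)))
    return rng.choice(values, size=b, replace=False)


def _resampled_mean(rng, subset, full_size):
    # Simulate a "full-size" bootstrap sample by drawing multinomial
    # counts over the subset, then take the count-weighted mean.
    probs = np.full(subset.size, 1.0 / subset.size)
    counts = rng.multinomial(full_size, probs)
    return np.dot(counts, subset) / full_size


def blb_diff_in_means(control, treatment, n_subsets=10, n_resamples=100,
                      gamma=0.7, alpha=0.05, seed=0):
    """Hypothetical sketch: interval for mean(treatment) - mean(control)."""
    rng = np.random.default_rng(seed)
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)

    intervals = []
    for _ in range(n_subsets):
        sub_c = _subset(rng, control, gamma)
        sub_t = _subset(rng, treatment, gamma)
        diffs = np.empty(n_resamples)
        for i in range(n_resamples):
            diffs[i] = (_resampled_mean(rng, sub_t, treatment.size)
                        - _resampled_mean(rng, sub_c, control.size))
        # Percentile interval for this subset.
        intervals.append(np.quantile(diffs, [alpha / 2, 1 - alpha / 2]))

    # Average the per-subset interval endpoints (an assumed summary).
    low, high = np.mean(intervals, axis=0)
    return low, high


# Usage with synthetic per-browser values (made-up data for illustration).
demo_rng = np.random.default_rng(1)
demo_control = demo_rng.normal(0.0, 1.0, 5000)
demo_treatment = demo_rng.normal(0.5, 1.0, 5000)
print(blb_diff_in_means(demo_control, demo_treatment))
```

Each resample only touches the small subset, because the multinomial counts stand in for a full-size sample. That is where the savings over a traditional bootstrap come from. If the averaged interval excludes zero, the sketch would treat the difference as significant at the chosen `alpha`.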

Emily discusses in detail how the Bag of Little Bootstraps works, why it’s so effective, and how it can be generalized for anyone running an A/B test, going over gotchas Etsy came across and covering the key things to consider when making these sorts of decisions with one’s data.

Topics include:

  • Data that is independent and identically distributed (iid)
  • Bootstrapping to help reduce skew
  • Overview of the Bag of Little Bootstraps
  • Generalizing the implementation
  • Other fun things to consider (hyperparameters, for instance)

Emily Sommer


Emily Sommer is a data engineer at Etsy. She brings a wealth of practical knowledge and a can-do attitude to both her team and the students she tutors through ScriptEd.