Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

What does your postcode say about you? A technique to understand rare events based on demographics

Gary Willis (ASI)
14:55–15:35 Thursday, 25 May 2017
Data science and advanced analytics
Location: Hall S21/23 (B)
Level: Advanced
Average rating: 3.20 out of 5 (5 ratings)

Who is this presentation for?

  • Data scientists

Prerequisite knowledge

  • Knowledge of decision trees and information gain (useful but not required)

What you'll learn

  • Explore a novel technique that combines public data with an unsupervised tree-based learning algorithm to help companies leverage the locational data they hold on their clients


It is widely accepted that where you live says a lot about who you are, demographically speaking. At the same time, many companies are eager to learn more about their customers in order to better understand them. Because they already know where those customers live, many of these companies are sitting on an extremely rich dataset from which they could learn a great deal. This data can also be used to optimize their marketing strategy and help them expand their customer base.

Gary Willis offers a technical presentation of a novel algorithm to help companies leverage locational data they have on their clients. The technique enriches a customer dataset using UK census data and then applies a novel, tree-based unsupervised learning algorithm to extract differentiating demographic features, making it possible to identify high-value postcodes without performing anomaly detection on the entirety of the UK population.
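The abstract does not spell out the algorithm, so the following is only a minimal sketch of the two ingredients it names: enriching customer postcodes with census-style features, then using information gain (the decision-tree criterion listed under the prerequisites) to find the demographic feature that best separates areas with customers from areas without. It is a stand-in for illustration, not the method presented in the talk. The postcode districts, feature names (`median_age`, `pct_owner_occupied`), values, and helper functions are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the labels on `value <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def enrich(census, customer_postcodes):
    """Join customers to census areas; label an area 1 if it has any customer."""
    counts = Counter(customer_postcodes)
    areas = sorted(census)
    rows = [census[a] for a in areas]
    labels = [1 if counts[a] > 0 else 0 for a in areas]
    return areas, rows, labels


def best_split(rows, labels, features):
    """Return (gain, feature, threshold) for the most informative single split."""
    best = (0.0, None, None)
    for feature in features:
        values = [row[feature] for row in rows]
        uniq = sorted(set(values))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(uniq, uniq[1:]):
            threshold = (lo + hi) / 2
            gain = information_gain(values, labels, threshold)
            if gain > best[0]:
                best = (gain, feature, threshold)
    return best


# Hypothetical census table keyed by made-up postcode districts.
census = {
    "AA1": {"median_age": 29, "pct_owner_occupied": 35},
    "AA2": {"median_age": 31, "pct_owner_occupied": 60},
    "AA3": {"median_age": 27, "pct_owner_occupied": 55},
    "AA4": {"median_age": 48, "pct_owner_occupied": 40},
    "AA5": {"median_age": 52, "pct_owner_occupied": 65},
    "AA6": {"median_age": 45, "pct_owner_occupied": 50},
}
# Hypothetical customer records, reduced to their postcode district.
customer_postcodes = ["AA1", "AA1", "AA2", "AA3"]

areas, rows, labels = enrich(census, customer_postcodes)
gain, feature, threshold = best_split(
    rows, labels, ["median_age", "pct_owner_occupied"])
print(feature, threshold, gain)
```

In this toy data the customers cluster in the younger districts, so the split on `median_age` separates the two groups cleanly. A full tree would apply the same search recursively within each side of the split.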

Along the way, Gary also discusses a wide range of further potential applications of census data and other datasets. For instance, fires and A&E admissions are relatively rare events, and one would like to understand them without having to perform anomaly detection on the entire UK population or all UK households.


Gary Willis


Gary Willis is a data scientist at ASI with a diverse background in applying machine-learning techniques to commercial data science problems. Gary holds a PhD in statistical physics; his research looked at Markov Chain Monte Carlo simulations of complex systems.