Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Learning location: Real-time feature extraction for mobile analytics

Sander Pick (Set), Andrew Hill (Textile), Carson Farmer (Set)
2:05pm–2:45pm Wednesday, September 27, 2017
Machine Learning & Data Science
Location: 1A 06/07 Level: Intermediate
Average rating: 4.00 (1 rating)

Who is this presentation for?

  • Data scientists, mobile developers, and CTOs

Prerequisite knowledge

  • Basic knowledge of machine learning, statistical computation, and simple/complex data structures

What you'll learn

  • Learn why your mobile applications should be using sketch-based algorithms, how data sketching and streaming data structures can facilitate location-based feature extraction, and some of the pitfalls and features of location data
  • Understand streaming data sketch concepts, as well as some of the tools and software solutions that are currently available


Modern advances in information and communication technology have converged with popular culture to create an environment that is overflowing with new forms of location data. As excitement surrounding these types of personal mobility data sources continues to build, the potential to develop location-based services that are more dynamic, intelligent, responsive, and powerful builds with it. However, issues of scalability, privacy, and timeliness arise when operating with massive human mobility data sources. Furthermore, many of the machine learning approaches favored by the data science community rely on data preprocessing in the form of feature engineering before the true art of modeling can begin. Andrew Hill and Sander Pick explore new approaches to location-based feature extraction based on an emerging branch of computer and data science focused on streaming algorithms, or “sketches.”

In the data sketching world, the idea is that exact queries on massive data sources can be untenable, but if an approximate answer is acceptable, it is often possible to perform these queries orders of magnitude faster (often with mathematically proven error bounds). Andrew and Sander start by going over some common data sketching algorithms that are easily deployable for a range of data science problems. Building on this, they highlight some of the ways in which these algorithms can be extended to location data and, using a suite of relatively simple location-aware data sketches, demonstrate how these data structures can be deployed in a novel way on mobile devices to extract streaming features in real time for subsequent machine learning.
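To make the idea of a sketch concrete, the following is a minimal illustration of one well-known sketching algorithm, the Count-Min Sketch, applied to location fixes. It is not taken from the talk and is not the speakers' implementation. The grid-cell bucketing, the class and function names, the sample coordinates, and the width and depth values are all assumptions chosen for illustration. A Count-Min Sketch never underestimates a count; with width near e/ε and depth near ln(1/δ), the standard analysis (which assumes pairwise-independent hash functions) bounds the overestimate by ε times the total count with probability at least 1 − δ.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate frequency counter that uses a fixed amount of memory."""

    def __init__(self, width=272, depth=5):
        # Illustrative sizes: width ~ e / 0.01, depth ~ ln(1 / 0.0067).
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One salted hash per row; the salt makes the rows behave differently.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum is the best guess.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


def grid_cell(lat, lon, cell_deg=0.001):
    """Hypothetical helper: bucket a coordinate into a square grid cell key."""
    return f"{math.floor(lat / cell_deg)}:{math.floor(lon / cell_deg)}"


# Stream a few made-up location fixes through the sketch.
sketch = CountMinSketch()
fixes = [(40.7549, -74.0021)] * 3 + [(40.7128, -74.0060)]
for lat, lon in fixes:
    sketch.add(grid_cell(lat, lon))

# Approximate visit count for the first cell; usable as a model feature.
visits = sketch.estimate(grid_cell(40.7549, -74.0021))
```

Memory use stays constant regardless of how many fixes are streamed, which is the property that makes this kind of structure plausible on a mobile device. The trade-off is that rarely visited cells may be overcounted when they collide with frequently visited ones.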

Along the way, Andrew and Sander cover multiple real-world examples, problems, and solutions driven by simple data sketches, share resources for exploring, learning about, and developing your own sketching frameworks, and outline the benefits of data sketching within a reactive application, as well as some of the issues that arise when operating in a data-rich, real-time streaming environment.

Sander Pick


Sander Pick is CTO at Set, an on-device machine learning platform that aims to embed user intelligence into every mobile application. Previously, Sander worked at Apple and Mission Motors. A Montanan, Sander likes focus, climbing, and open spaces.

Andrew Hill


Andrew Hill is cofounder and CEO of Textile, where he is building technology to help data scientists create the future of predictive models from personal location and behavior data. Textile provides an SDK to access more than 200 features extracted in real time and designed for machine learning. Previously, Andrew was chief science officer at CARTO. He holds a PhD from the University of Colorado Boulder.

Carson Farmer


Carson Farmer is lead data scientist at Set, a technology startup building new technologies that help mobile application developers make better use of behavioral data while protecting users’ privacy. Carson is also an assistant professor of geocomputation in the Department of Geography at the University of Colorado Boulder, where his research focuses on human mobility and space-time interactions.