Modern advances in information and communication technology have converged with popular culture to create an environment that is overflowing with new forms of location data. As excitement surrounding these types of personal mobility data sources continues to build, the potential to develop location-based services that are more dynamic, intelligent, responsive, and powerful builds with it. However, issues of scalability, privacy, and timeliness arise when operating with massive human mobility data sources. Furthermore, many of the machine learning approaches favored by the data science community rely on data preprocessing in the form of feature engineering before the true art of modeling can begin. Andrew Hill and Sander Pick explore new approaches to location-based feature extraction based on an emerging branch of computer and data science focused on streaming algorithms, or “sketches.”
The idea behind data sketching is that exact queries on massive data sources can be untenable, but if an approximate answer is acceptable, it is often possible to run these queries orders of magnitude faster, often with mathematically proven error bounds. Andrew and Sander start by going over some common data sketching algorithms that are easily deployable for a range of data science problems. Building on this, they highlight some of the ways in which these algorithms can be extended to location data and, using a suite of relatively simple location-aware data sketches, demonstrate how these data structures can be deployed in a novel way on mobile devices to extract streaming features in real time for subsequent machine learning.
Along the way, Andrew and Sander cover multiple real-world examples, problems, and solutions driven by simple data sketches, share resources for exploring, learning about, and developing your own sketching frameworks, and outline the benefits of data sketching within a reactive application, as well as some of the issues that arise when operating in a data-rich, real-time streaming environment.
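As a rough illustration of the approximate-query idea described above, here is a minimal Count-Min sketch in Python that counts visits to coarse grid cells from a stream of coordinates. The abstract does not say which sketches the speakers use, so the choice of Count-Min, the names `CountMinSketch` and `grid_cell`, the 0.01-degree cell size, and the sample coordinates are all assumptions made for illustration. With ideal hash functions, an estimate never falls below the true count and exceeds it by more than `epsilon` times the stream length with probability at most `delta`; the salted BLAKE2 hashes here are a practical stand-in for that assumption.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate frequency counter that uses a fixed amount of memory."""

    def __init__(self, epsilon=0.01, delta=0.01):
        # Standard sizing: width = ceil(e / epsilon), depth = ceil(ln(1 / delta)).
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.total = 0
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _columns(self, key):
        # One salted hash per row stands in for independent hash functions.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        self.total += count
        for row, col in self._columns(key):
            self.table[row][col] += count

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum across rows
        # is the tightest available estimate and never undercounts.
        return min(self.table[row][col] for row, col in self._columns(key))


def grid_cell(lat, lon, cell_degrees=0.01):
    """Snap a coordinate to a coarse grid cell label (hypothetical scheme)."""
    return f"{math.floor(lat / cell_degrees)}:{math.floor(lon / cell_degrees)}"


# Hypothetical stream: 50 fixes near one place, 5 near another.
home = (40.0150, -105.2705)
away = (46.8721, -113.9940)
sketch = CountMinSketch(epsilon=0.01, delta=0.01)
for lat, lon in [home] * 50 + [away] * 5:
    sketch.add(grid_cell(lat, lon))

home_estimate = sketch.estimate(grid_cell(*home))
print("estimated visits to home cell:", home_estimate)
```

Memory stays fixed at `width * depth` counters no matter how many distinct cells appear in the stream, which is the property that makes this kind of structure plausible on a mobile device. The per-cell estimates could then feed a feature such as "visits to the most frequent cell," though that step is beyond what the abstract describes.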
Sander Pick is CTO at Set, an on-device machine learning platform that aims to embed user intelligence into every mobile application. Previously, Sander worked at Apple and Mission Motors. A Montanan, Sander likes focus, climbing, and open spaces.
Andrew Hill is cofounder and CEO of Textile, where he is building technology to help data scientists create the future of predictive models from personal location and behavior data. Textile provides an SDK that gives access to more than 200 features extracted in real time and designed for machine learning. Previously, Andrew was chief science officer at CARTO. He holds a PhD from the University of Colorado Boulder.
Carson Farmer is lead data scientist at Set, a technology startup focused on building innovative new technologies to help mobile application developers make better use of behavioral data, with a focus on protecting users’ privacy. Carson is also an assistant professor of geocomputation in the Department of Geography at the University of Colorado Boulder, where his research focuses on human mobility and space-time interactions.
©2017, O'Reilly Media, Inc.