Forecasting Space-time Events

Jeremy Heffner (Azavea)
Privacy, Law & Ethics
Location: 116
Average rating: 4.50 (8 ratings)

This session uses the speaker’s experience in building a crime forecasting package to outline some tools and techniques useful in modeling space-time event data. While the case study focuses on modeling crime, the techniques and tools presented are applicable to a broad selection of domains. In particular, attendees will leave the session with:

  • An understanding of key concepts in modeling space-time events
  • An introduction to the open source GeoTrellis framework for raster processing
  • An overview of the modeling pipeline used within the case study

Concepts

While many data scientists work with data that includes geographic information, this data is often used in rudimentary ways or limited to vector data sets such as the point locations of stores or users. The session will introduce the strengths and weaknesses of raster-based geographic analysis and discuss some challenges of modeling data at a fine geographic and temporal resolution. For example, how can uncertainty around the time of occurrence of an event be represented? Finally, the approach of modeling space-time events as stochastic point processes will be outlined.
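As an illustration of the question about time-of-occurrence uncertainty, the sketch below shows one possible representation, not necessarily the one used in the case study. Each event carries a time window rather than a single timestamp, and its unit weight is spread across the hourly bins the window overlaps, in proportion to the overlap. The weights are then accumulated into a sparse space-time raster. The cell size, time bin, origin, and function names are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Illustrative assumptions, not values from the case study.
CELL_SIZE = 100.0               # raster cell edge length, in map units
TIME_BIN = timedelta(hours=1)   # temporal resolution of the raster
ORIGIN = datetime(2014, 1, 1)   # start of the time axis


def spread_event(x, y, start, end):
    """Spread one event over the time bins its [start, end) window overlaps.

    Returns {(col, row, time_bin): weight}; the weights sum to 1.
    """
    col = int(x // CELL_SIZE)
    row = int(y // CELL_SIZE)
    first = (start - ORIGIN) // TIME_BIN
    if end <= start:
        # Exact timestamp: all weight goes to a single bin.
        return {(col, row, first): 1.0}
    last = (end - ORIGIN) // TIME_BIN
    total = (end - start).total_seconds()
    weights = {}
    for b in range(first, last + 1):
        bin_start = ORIGIN + b * TIME_BIN
        bin_end = bin_start + TIME_BIN
        overlap = (min(end, bin_end) - max(start, bin_start)).total_seconds()
        if overlap > 0:
            weights[(col, row, b)] = overlap / total
    return weights


def rasterize(events):
    """Accumulate (x, y, start, end) events into a sparse space-time grid."""
    grid = defaultdict(float)
    for x, y, start, end in events:
        for key, weight in spread_event(x, y, start, end).items():
            grid[key] += weight
    return dict(grid)
```

An event reported as occurring at some point between 22:00 and 02:00 would contribute a weight of 0.25 to each of the four hourly bins for its cell, so the grid total still equals the number of events.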

GeoTrellis

The case study uses the open source GeoTrellis framework to conduct geographic processing. GeoTrellis is currently an incubating project within the Eclipse Foundation’s LocationTech working group. The project provides fast and scalable geographic processing with an emphasis on raster-based analysis and routing through transportation networks. GeoTrellis is written in Scala and is currently being extended to integrate with Apache Spark.

Modeling

The modeling pipeline within the case study consists of several loosely coupled components. In addition to GeoTrellis, the project uses R for machine learning and the Amazon Simple Workflow service for pipeline orchestration. The presentation will outline the basic structure of the modeling process including details of the statistical techniques utilized within the process.

Several statistical techniques were examined throughout the development of the project. The final approach is a stacked model that uses a gradient boosting machine (GBM) to model the presence of events and a generalized additive model (GAM) to transform these predictions into expected counts. The session will conclude by outlining some approaches to evaluating predictive accuracy for these types of data sets.
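The abstract does not say which evaluation approaches the session covers. As one example of the kind of measure used for gridded forecasts, the sketch below computes the predictive accuracy index (PAI) commonly reported in the crime forecasting literature: the share of observed events that fall in the flagged cells, divided by the share of the study area those cells cover. The function name, the dictionary inputs, and the assumption of equal-area cells are illustrative choices, not details from the case study.

```python
def predictive_accuracy_index(forecast, observed, area_fraction):
    """Compute PAI for the top-scoring fraction of equal-area grid cells.

    forecast: {cell: predicted score or expected count}
    observed: {cell: number of events that actually occurred}
    area_fraction: share of cells to flag as high risk, in (0, 1]
    """
    if not 0 < area_fraction <= 1:
        raise ValueError("area_fraction must be in (0, 1]")
    if not forecast:
        raise ValueError("forecast must contain at least one cell")

    # Flag the highest-scoring cells up to the requested share of the area.
    ranked = sorted(forecast, key=forecast.get, reverse=True)
    n_flagged = max(1, round(area_fraction * len(ranked)))
    flagged = set(ranked[:n_flagged])

    total_events = sum(observed.values())
    if total_events == 0:
        return 0.0  # nothing to hit, so the index is not informative

    hits = sum(count for cell, count in observed.items() if cell in flagged)
    hit_rate = hits / total_events
    flagged_share = n_flagged / len(ranked)
    return hit_rate / flagged_share
```

A value of 1 means the forecast does no better than flagging cells at random. If flagging 20% of the cells captures 60% of the events, the index is 3.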

Jeremy Heffner

Azavea

I’m the Senior Data Scientist at Azavea, a geospatial software firm located in Philadelphia. My primary focus is working with crime data to model patterns and forecast risk — the intersection of geography, data science, and social good.

Keywords: geographic data, raster processing, predictive analysis, spacetime event modeling, weather, demographics, machine learning, early warning systems, R, Scala, Python