Making Open Work
May 8–9, 2017: Training & Tutorials
May 10–11, 2017: Conference
Austin, TX

How exploring open taxi data from New York City can lead to a new bus route

11:50am–12:30pm Wednesday, May 10, 2017
Data, Big and Small
Location: Meeting Room 18 C/D
Level: Beginner
Average rating: 2.00 (5 ratings)

Who is this presentation for?

  • Members of the open source community interested in interdisciplinary subject matter, especially with respect to data analysis, data exploration, and combining computation with other fields like transportation, cities, and urban planning

Prerequisite knowledge

  • Familiarity with data analysis (even in Excel or tabular form)
  • A basic understanding of Python

What you'll learn

  • Explore a potential application for NYC's open taxi dataset
  • Understand the steps to exploring an open dataset, from formulating a question to cleaning and parsing through data to adopting an approach, exploring results, and offering a potential recommendation based on the analysis undertaken


Anastasia Sagalovitch explains how she used New York City’s open taxi dataset with Python to determine areas of frequent pick-ups and drop-offs within a time frame, then superimposed those hotspots atop a map of the subway system to identify which ones fall within or outside of a particular radius of established subway stops. Ana dives into an analysis of night-hour trips that surfaced three major hotspots falling a quarter mile outside of major subway routes. Drawing on these hotspots (and additional domain knowledge of nightlife in New York City), Ana proposes a bus route to accommodate these three hubs. This kind of analysis is just the beginning of exploring open datasets and interacting with the data generated as citizens use a city’s infrastructure and services.
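
The abstract does not say how the hotspots were computed, so the following is only a minimal sketch of one way the idea could be approached, not the speaker's method. It assumes hotspots are found by counting pick-up points in a simple latitude/longitude grid, and it uses the haversine formula to check whether each hotspot lies more than a quarter mile from the nearest subway stop. The function names, the grid cell size, the count threshold, and all coordinates are hypothetical and chosen for illustration.

```python
import math
from collections import Counter

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def find_hotspots(points, cell_deg=0.005, min_count=3):
    """Bin points into a grid; return (center_lat, center_lon, count) for busy cells.

    cell_deg and min_count are illustrative assumptions, not values from the talk.
    """
    counts = Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg)) for lat, lon in points
    )
    return [
        ((i + 0.5) * cell_deg, (j + 0.5) * cell_deg, n)
        for (i, j), n in counts.items()
        if n >= min_count
    ]


def underserved(hotspots, stations, radius_miles=0.25):
    """Keep hotspots whose nearest station is farther away than radius_miles."""
    return [
        h
        for h in hotspots
        if min(haversine_miles(h[0], h[1], s_lat, s_lon) for s_lat, s_lon in stations)
        > radius_miles
    ]


# Made-up sample data: two small clusters of pick-ups and one stray point.
sample_pickups = [
    (40.7301, -73.9901), (40.7302, -73.9902), (40.7303, -73.9903),
    (40.7601, -73.9401), (40.7602, -73.9402), (40.7603, -73.9403),
    (40.7011, -74.0011),
]
# Made-up station location placed close to the first cluster only.
sample_stations = [(40.7320, -73.9920)]

sample_hotspots = find_hotspots(sample_pickups)
sample_underserved = underserved(sample_hotspots, sample_stations)
```

With real data, the pick-up coordinates would come from the taxi trip records filtered to the hours of interest, and the station list from a subway stop dataset. A fixed grid is the simplest possible choice here; a clustering method could be swapped in for `find_hotspots` without changing the distance check.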

Anastasia Sagalovitch is a graduate student at NYU working on combining sustainability with computation and resource reuse. Previously, Ana has investigated emissions trading schemes and green revolving loan funds, explored open transportation datasets in the context of networks, built an agent-based model to simulate how cells could communicate using a problem in graph theory, and interned at a cleantech incubator.