Digging into Open Data

Location: Portland 252
Level: Intermediate
Average rating: 3.00 (11 ratings)

There are loads of places to find data – open government data at many levels, publicly released data from companies, and researched data from organizations. Ideally, these sources would be provided as web services. However, often they are a mish-mash of Excel or other loosely structured files, HTML tables, or even PDF documents.

It’s easy to become discouraged with so many obstacles to merely acquiring information for your app or site. Fortunately, there are many tools and techniques to help you gather, parse, and clean up data from a variety of sources.

This session will use a real-world project, Politilines, as a running example. I will demonstrate how we found, gathered, parsed, and made sense of the public data needed for Politilines.

The following topics will be covered:

  • Finding Data:
    • Commercial repositories
    • Locating free data
  • Retrieving Data:
    • Choosing the right tools based on type of data and level of expertise
    • Example: Using Python and Mechanize to get and parse data
  • Making Sense of Data:
    • Tools for processing and analyzing data
    • Example: Using Natural Language Toolkit
  • Business Considerations:
    • The ins and outs of using existing tools or rolling your own data parsing scripts
    • Thinking ahead – the stability of open data
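
To give a flavor of the "Retrieving Data" topic, here is a minimal sketch of turning an HTML table into structured records. It is not the code used for Politilines: it swaps Mechanize for Python's standard-library html.parser so that it runs without extra packages, and the TableParser class, the parse_table helper, and the sample table contents are all invented for illustration.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row being built, or None when outside <tr>
        self._cell = None   # text fragments of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table rows found in `html` as lists of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample data, standing in for a page fetched from the web.
SAMPLE_HTML = """
<table>
  <tr><th>Candidate</th><th>Topic</th></tr>
  <tr><td>Candidate A</td><td>Economy</td></tr>
  <tr><td>Candidate B</td><td>Health  care</td></tr>
</table>
"""

rows = parse_table(SAMPLE_HTML)
# Treat the first row as the header and map the rest to dictionaries.
records = [dict(zip(rows[0], row)) for row in rows[1:]]
```

In practice the HTML would come from a fetched page rather than a string literal, and real-world tables (nested tables, missing closing tags, merged cells) usually need more cleanup than this sketch attempts.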

Kim Rees


Kim Rees is a founding partner of Periscopic (http://www.periscopic.com), an award-winning information visualization firm. Their work has been featured at MoMA as well as in several online and print publications, including CommArts’ Interactive Annual, The Information Design Sourcebook, VisWeek Discovery Exhibit, Adobe Success Stories, CommArts Insights, Infosthetics.com, FlowingData.com, and numerous websites, blogs, and regional media outlets. Periscopic’s body of work was recently nominated for the Cooper-Hewitt National Design Awards.

Kim is a prominent individual in the information visualization community. She has published papers in Parsons Journal of Information Mapping, was an award winner in the VAST 2010 Challenge, and is a guest blogger for Infosthetics.com. Kim has been featured on CommArts Insights and has presented at several industry events including O’Reilly Strata, Wolfram Data Summit, the Tableau Software Conference, AIGA SHIFT, WebVisions, CERF Biennial Conference, and Portland Data Visualization, among others. Recently she has also been a CommArts Interactive Annual judge and is the Technical Editor for Visualize This by Nathan Yau. Kim received her BA in Computer Science from New York University.

Comments on this page are now closed.


Magnus Runesson
07/23/2012 4:42pm PDT

Good links to resources, had expected more depth.

