Cleaning Gritty Data to tell a story in the News Room or the Board Room

Data Science
Location: Room 1-6 Level: Intermediate

When handling data from outside your organisation, the first step is always to gather it, clean it, check its quality, and structure it into a format you can use. This gnarly process is fun and satisfying, and it can quickly lead to real results.

This talk gives quirky examples in Ruby and Python, illustrating questions such as:

  • how to choose heuristics for pulling out structure
  • how to get past strange search interfaces and opaque header rules
  • how to mix human data fixing with computer fixing
  • how to get a quality feedback loop from customers
  • how to handle the unexpected robustly
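
As a rough illustration of mixing computer fixing with human fixing, here is a minimal Python sketch. It is not taken from the talk: the field names, the `parse_amount` heuristic, and the sample rows are all hypothetical. Rows that the heuristic can parse are cleaned automatically, and anything it cannot parse is set aside for a person to review instead of crashing the run.

```python
import re

def parse_amount(raw):
    """Heuristic (hypothetical): parse a messy currency string.

    Returns a float, or None when the value does not look like a plain amount.
    """
    if raw is None:
        return None
    text = raw.strip().replace(",", "")
    match = re.fullmatch(r"[£$€]?\s*(\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None

def clean_rows(rows):
    """Split rows into machine-cleaned rows and rows needing human review."""
    cleaned, needs_review = [], []
    for row in rows:
        amount = parse_amount(row.get("amount"))
        if amount is None:
            # Unexpected input: keep the original row for a person to fix.
            needs_review.append(row)
        else:
            cleaned.append({**row, "amount": amount})
    return cleaned, needs_review

# Made-up sample data, for illustration only.
rows = [
    {"name": "A", "amount": "£1,200.50"},
    {"name": "B", "amount": "approx. 300"},
    {"name": "C", "amount": None},
]
cleaned, needs_review = clean_rows(rows)
```

The review queue is a plain list here; in practice it could be a spreadsheet or a form that people correct by hand and feed back into the pipeline.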

The examples come from the real world: they have been used in newspapers and on TV, to generate marketing leads, and to build new business datasets.

Thomas Levine

csv soundsystem

Thomas Levine has been hacking with computers since he was young. In the process, he noticed back and wrist pain, prompting him to research physical ergonomics of computer use and, in turn, to study statistics. His favorite color is pink.

