Distilling Data Exhaust: How to Surface Insights and Build Products

Location: Mission City M
Average rating: 3.37 (30 ratings)

As the world becomes more heavily instrumented, we are collecting massive amounts of raw data, much of which sits unused in log files or data warehouses. At the same time, statistical techniques, cloud computing, and software frameworks have matured to a point where a small team or even a single person can rapidly extract insights and build products on top of this data.

As a case study, this talk will combine datasets from Twitter, LinkedIn, Wikipedia, Mechanical Turk and other sources to extract insights about the O’Reilly Strata Conference and its attendees.

During the process, we will walk you through the nuts and bolts of building data products using these tools. We will cover coming up with ideas, tracking down or creating data, using visualizations during development, and wiring together a prototype, and we will show some algorithmic tricks that can get you out of a jam.

Pete Skomoroch


Pete Skomoroch is a Research Scientist at LinkedIn, focusing on building data-driven products. For the past several years, he has been a consultant at Data Wrangling in Washington, DC, working on projects involving search, finance, and recommendation systems. Before joining LinkedIn, he was the Director of Advanced Analytics at Juice Analytics and a Sr. Research Engineer at AOL Search. He spent the previous six years in Boston implementing pattern detection algorithms for streaming sensor data at MIT Lincoln Laboratory and constructing predictive models for large retail datasets at Profitlogic. Pete has a B.S. in Mathematics and Physics from Brandeis University.

Comments on this page are now closed.


Pete Skomoroch
02/04/2011 7:11am PST

Sure thing, I’ll get that Strata attendee cluster analysis up on my blog or on the O’Reilly Strata site.

Xinh Huynh
02/04/2011 5:15am PST

There was a neat graph clustering Strata attendees. Would you mind sharing it?
