Schedule: Practitioner sessions

Get hands-on as a data scientist, from tools and techniques to data product design. Figure out real-time analytics, effective use of Hadoop, and how to build out a crack data science team.

Location: Mission City M
Joseph Adler (Facebook), Hilary Mason (Cloudera Fast Forward Labs), Drew Conway (Alluvium), Jake Hofman (Yahoo!)
Average rating: ***..
(3.17, 29 ratings)
This tutorial offers a basic introduction to practicing data science. We'll walk through several typical projects that range from conceptualization to acquiring data, to analyzing and visualizing it, to drawing conclusions. Read more.
Location: Mission City B5
Abe Taha (Karmasphere), Shevek (Karmasphere), Ken Krugler (Scale Unlimited), Chris Wensel (Concurrent, Inc.)
Average rating: *....
(1.57, 23 ratings)
This tutorial will explain MapReduce and how to develop big data applications in Java and in high-level languages such as Pig and Hive SQL. Using examples, it will cover how to prototype, debug, monitor, test and optimize big data applications for Hadoop. Attendees will get hands-on instruction and will leave with practical examples and a solid understanding of how to analyze data on Hadoop clusters. Read more.
Location: Mission City B5
Jonathan Ellis (DataStax)
Average rating: ***..
(3.53, 19 ratings)
Apache Cassandra is a second-generation distributed database originally open-sourced by Facebook. Its write-optimized shared-nothing architecture results in excellent performance and scalability. This tutorial will cover application design with Cassandra through a series of exercises with Twissandra, a simple Twitter clone written in Python and Django. Read more.
Location: Mission City M
Pete Skomoroch (Workday)
Average rating: ***..
(3.37, 30 ratings)
Learn how to leverage data exhaust, the digital byproduct of our online activities, to solve problems and discover insights about the world around you. We will walk through a real-world example that combines several datasets and statistical techniques to discover insights and make predictions about attendees at O'Reilly Strata. Read more.
Location: Mission City M
Brian Dolan (Discovix), Joe Hellerstein (UC Berkeley)
Average rating: ***..
(3.75, 20 ratings)
A discussion of Big Data approaches to analysis problems in marketing, forecasting, academia and enterprise computing. We focus on practices to enhance collaboration and employ rich statistical methods: a Magnetic, Agile and Deep (MAD) approach to analytics. While the approach is language-agnostic, we show that sophisticated statistics can be easily scaled in traditional environments like SQL. Read more.
Location: Mission City M
Matt Biddulph (Product Club)
Average rating: ***..
(3.64, 25 ratings)
If you're a new startup looking for investment, or a team at a large company seeking the green light for a new product, nothing convinces like real running code. But how do you solve the chicken-and-egg problem of filling your early prototype with real data? We'll discuss how to use open datasets and public web APIs as a proxy for the final product while you're still in the development stage. Read more.
Location: Mission City M
Average rating: ***..
(3.93, 29 ratings)
How do you build a crack team of data scientists on a shoestring budget? In this 40-minute presentation from the co-founder of Infochimps, Flip Kromer will draw from his experiences as a teacher and his vast programming and data experience to share lessons learned in building a team of smart, enthusiastic hires. Read more.
Location: Mission City M
Joseph Turian (Workday)
Average rating: ****.
(4.35, 23 ratings)
Certain recent academic developments in large data have immediate and sweeping applications in industry. They offer forward-thinking businesses the opportunity to achieve technical competitive advantages. However, these little-known techniques have not been discussed outside academia until now. What if you knew about important new large data techniques that your competition doesn't yet know about? Read more.
Location: Mission City M
Patrick Chanezon (Docker), Ryan Boyd (Neo4j)
Average rating: ***..
(3.45, 11 ratings)
Many of the tools Google created to store, query, analyze, and visualize data are exposed to external developers. This talk will give you an overview of Google services for Data Crunchers: Google Storage for developers, BigQuery, Machine Learning API, App Engine, and Visualization API. Read more.
Location: Mission City M
Benoit Sigoure (StumbleUpon, Inc.)
Average rating: ****.
(4.00, 3 ratings)
OpenTSDB is an open-source, distributed time series database designed to monitor large clusters of commodity machines at an unprecedented level of granularity. OpenTSDB allows operations teams to keep track of all the metrics exposed by operating systems, applications and network equipment, and makes the data easily accessible. Read more.
Location: Mission City B1
Rod Cope (OpenLogic, Inc.)
Average rating: ***..
(3.83, 12 ratings)
Hadoop and HBase make it easy to store terabytes of data, but how do you scale your search mechanism to sift through these mountains of bits and retrieve large result sets in a matter of milliseconds? Careful use of the Solr search server, based on Lucene, made these requirements come to life in our production environment. Come learn how we query terabytes of data in a highly available system. Read more.
Location: Mission City B5
Sam Shah (LinkedIn)
Average rating: ***..
(3.78, 9 ratings)
How do you go about building a product around data using Hadoop? This talk will present how LinkedIn builds and maintains such features as People You May Know. We will present our open-sourced architecture for doing so, as well as the knowledge we've gained in the process. Read more.
Location: Mission City B4
Matthew Russell (Digital Reasoning Systems)
Average rating: ***..
(3.50, 6 ratings)
This talk demonstrates how an eclectic blend of storage, analysis, and visualization techniques can be used not only to gain serious insight from Twitter data, but also to answer fun questions such as "What do Justin Bieber and the Tea Party have (and not have) in common?" Read more.
Location: Mission City M
Doug Cutting (Cloudera)
Average rating: ***..
(3.50, 8 ratings)
Apache Avro provides an expressive, efficient standard for representing large data sets. Avro data is programming-language neutral and MapReduce-friendly. Hopefully it can replace gzipped CSV-like formats as a dominant format for data. Read more.
Location: Mission City B1
Tags: real_time, cep, iep
Theo Schlossnagle (Circonus)
Average rating: ***..
(3.75, 4 ratings)
With thousands of datapoints per second from nodes around the world, how can you tell when something isn't right? The bottom line is: it's hard, but with the right tools it is achievable. Read more.
Location: Mission City B5
Isabel Drost-Fromm (Apache Software Foundation / Nokia Gate 5 GmbH)
Average rating: **...
(2.75, 12 ratings)
With growing amounts of digital data at the fingertips of software developers, the need for a scalable, easy-to-use framework is tremendous. This talk introduces Apache Mahout, a project with the goal of implementing scalable machine learning algorithms for the masses. Read more.
Location: Mission City B4
Justin Sheehy (Basho Technologies)
Average rating: *****
(5.00, 1 rating)
Riak Core is a general implementation of a distributed systems model, enabling you to build a customized, scalable, highly available distributed system without a huge investment. Justin will explain that model, its history, and how it can be used to build new data processing systems. Read more.
Location: Mission City M
Pablo Castro (Microsoft)
Average rating: ****.
(4.58, 12 ratings)
Sharing data on the Web comes with a tough trade-off between minimalism and enabling creative new scenarios. This session will explore Web APIs that focus on exposing data and let clients decide how to use it. We'll share our experiences while designing the Open Data Protocol, what we found to be great and terrible ideas, and what we hear from folks running OData Web APIs. Read more.
Location: Mission City M
Kevin Weil (Twitter, Inc.)
Average rating: ***..
(3.82, 17 ratings)
Most analytics systems rely on large offline computations, which means results come in hours or days behind. Twitter is all about realtime, but with over 160 million users producing over 90 million tweets per day, we need realtime analytics that scale horizontally. This talk discusses the development of that infrastructure, as well as the products we are beginning to build on top of it. Read more.
Location: Mission City M
Benjamin Black (Boundary)
Average rating: ****.
(4.18, 17 ratings)
The rise of sensor network data and the expectation of low-latency query responses combine to render available databases and storage platforms obsolete. We have built a platform for web-scale OLAP, and in this talk I will cover how we made our infrastructure capable of real-time update and query performance over hundreds of terabytes of multidimensional data. Read more.
Location: Mission City B1
Nicholas Yee (PARC), Nic Ducheneaut (PARC)
Average rating: ****.
(4.33, 3 ratings)
Virtual worlds are a goldmine of untapped insights, even for predicting physical behaviors. Not only will we share PARC findings and methods developed to extract key data from online games, but more importantly, we'll discuss how social scientists converted and processed raw behavioral metrics into meaningful psychological variables that can be applied to a broad spectrum of business applications. Read more.


  • Thomson Reuters
  • EMC Data Computing Division
  • EnterpriseDB
  • Microsoft
  • Gnip
  • Rackspace Hosting
  • IBM
  • Windows Azure MarketPlace DataMarket
  • Amazon Mechanical Turk
  • Amazon Web Services
  • Aster Data
  • Cloudera
  • Clustrix
  • DataStax, Inc. (formerly Riptano, Inc.)
  • Digital Reasoning Systems
  • Heritage Provider Network
  • Impetus
  • Jaspersoft
  • Karmasphere
  • LinkedIn
  • MarkLogic
  • Pentaho
  • Pervasive
  • Revolution Analytics
  • Splunk
  • Urban Mapping
  • Wolfram|Alpha
  • Esri
  • ParAccel
  • Tableau Software

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities at the conference, contact Susan Young at

Download the Strata Sponsor/Exhibitor Prospectus

Contact Us

View a complete list of Strata Contacts