Schedule: Deep Data sessions

Michael Rys (Microsoft Corp.)
Average rating: 3.00 (1 rating)
Contrary to popular belief, SQL and NoSQL are not at odds with each other; they are duals. In fact, NoSQL should really be called coSQL. Recognizing this duality can change the way we think about which technology to use when, and what we need to invest in next.
Ballroom AB
Deep Data is a no-holds-barred program for data scientists. The advanced technical content will keep you up to speed with the latest techniques, and give you the opportunity to debate and network with the most skilled data scientists in our industry.
Claudia Perlich (Dstillery)
Average rating: 4.00 (1 rating)
With the collection of almost every piece of information about your customers comes the ability to start asking your data the right question: why do they do what they do? And, even more, what would they do if I could interact with them? We show, for the case of online display advertising, how causal analysis gives interesting new answers about the right (and wrong) ways of spending your money.
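As a rough illustration of the causal question, here is a minimal sketch that assumes a randomized holdout: some users are randomly withheld from the campaign, and lift is the difference in conversion rates between the two arms. The `Group` and `ad_lift` names and the counts are hypothetical; the abstract does not say which causal method the talk uses.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Counts for one arm of a hypothetical randomized ad experiment."""
    users: int        # users assigned to this arm
    conversions: int  # users in this arm who later converted

    @property
    def rate(self) -> float:
        return self.conversions / self.users if self.users else 0.0


def ad_lift(exposed: Group, holdout: Group) -> tuple[float, float]:
    """Estimate the causal effect of showing the ad.

    Valid only if users were *randomly* assigned to the two arms. Comparing
    users who happened to see the ad with users who did not would mix the ad's
    effect with whatever made those users likely to be targeted.
    Returns (absolute lift, relative lift).
    """
    absolute = exposed.rate - holdout.rate
    relative = absolute / holdout.rate if holdout.rate else float("inf")
    return absolute, relative


# Made-up counts, for illustration only.
abs_lift, rel_lift = ad_lift(Group(100_000, 300), Group(100_000, 250))
print(f"absolute lift {abs_lift:.4%}, relative lift {rel_lift:.1%}")
```

The randomization is what licenses the causal reading; without it, the same subtraction only measures correlation.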
Monica Rogati (Data Natives)
Average rating: 4.50 (2 ratings)
Getting training data for a recommender system is easy: if users clicked it, it’s a positive; if they didn’t, it’s a negative. … Or is it? In this talk, we use examples from production recommender systems to bring training data to the forefront: from overcoming presentation bias, to the art of crowdsourcing subjective judgments, to creative exploitation of data exhaust and feature creation.
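One common way to soften presentation bias is a "skip-above"-style heuristic: treat an unclicked item as a negative only if the user plausibly saw it. The sketch below assumes the user examined everything down to the lowest-ranked click; the function name and data layout are hypothetical, not taken from the talk.

```python
def examined_labels(ranked_items: list[str], clicked: set[str]) -> list[tuple[str, int]]:
    """Turn one result page into (item, label) training pairs.

    Heuristic: assume the user looked at everything down to the lowest-ranked
    click. Unclicked items above that point become negatives; items below it
    are dropped, because "not clicked" there may just mean "never seen".
    """
    positions = [i for i, item in enumerate(ranked_items) if item in clicked]
    if not positions:
        return []  # no click: we cannot tell what, if anything, was examined
    lowest_click = max(positions)
    return [(item, int(item in clicked)) for item in ranked_items[: lowest_click + 1]]


page = ["a", "b", "c", "d", "e"]  # display order, top first
print(examined_labels(page, {"c"}))
```

Here "d" and "e" yield no label at all, rather than being counted as negatives they may never have earned.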
Jacob Perkins (Weotta)
Average rating: 3.00 (1 rating)
Learn various ways to bootstrap a custom corpus for training highly accurate natural language processing models. Real-world examples will be presented with Python code samples using NLTK. Each example will show you how, starting from scratch, you can rapidly produce a highly accurate custom corpus for training the kinds of natural language processing models you need.
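The talk's examples use NLTK; the following is only a generic, standard-library sketch of one bootstrapping pattern, not NLTK's API or the speaker's recipe. It assumes a handful of hand-picked seed keywords per label: texts matching exactly one label are labeled, then each label's keyword set grows with words that recur only in that label's texts. All names (`seed_label`, `expand_seeds`, `min_count`) are hypothetical.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z']+")


def tokenize(text: str) -> set[str]:
    return set(TOKEN.findall(text.lower()))


def seed_label(texts: list[str], seeds: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Label a text only when keywords from exactly one label occur in it."""
    corpus = []
    for text in texts:
        words = tokenize(text)
        hits = [label for label, keywords in seeds.items() if words & keywords]
        if len(hits) == 1:  # skip unmatched and ambiguous texts
            corpus.append((text, hits[0]))
    return corpus


def expand_seeds(corpus, seeds, min_count=2):
    """Grow each label's keyword set with words seen in at least `min_count`
    of its texts and in none of any other label's texts."""
    counts = {label: Counter() for label in seeds}
    for text, label in corpus:
        counts[label].update(tokenize(text))  # document frequency per label
    expanded = {}
    for label, counter in counts.items():
        elsewhere = set()
        for other, other_counter in counts.items():
            if other != label:
                elsewhere |= set(other_counter)
        new_words = {w for w, n in counter.items() if n >= min_count and w not in elsewhere}
        expanded[label] = seeds[label] | new_words  # new set; seeds is not mutated
    return expanded


texts = ["great food and great service", "terrible service, cold food",
         "great view", "cold and rude staff", "great but rude"]
seeds = {"pos": {"great"}, "neg": {"terrible", "rude"}}
corpus = seed_label(texts, seeds)      # "great but rude" is ambiguous, so skipped
bigger = expand_seeds(corpus, seeds)   # "cold" becomes a negative cue
print(corpus, bigger, sep="\n")
```

In practice each round would be spot-checked by hand before the expanded keywords are trusted, since frequent function words can slip through on small samples.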
Ben Gimpert (Altos Research)
Average rating: 3.00 (1 rating)
Twenty-first-century big data is being used to train predictive models of emotional sentiment, customer churn, patient health, and other behavioral complexities. Variable importance and feature selection reduce the dimensionality of our models, so that an infeasible, complex problem may become somewhat more predictable.
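One widely used variable-importance measure is permutation importance: shuffle one feature column and see how much the model's score drops. The sketch below is a plain-Python version of that technique, with a toy model and data that are hypothetical; the abstract does not say which importance measure the talk covers, and the selection `threshold` is an arbitrary assumption.

```python
import random


def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is shuffled.

    A feature the model relies on loses accuracy when scrambled; a feature it
    ignores scores about zero.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def select_features(importances, threshold=0.01):
    """Indices of features whose importance exceeds a (hypothetical) cutoff."""
    return [j for j, imp in enumerate(importances) if imp > threshold]


def toy_model(row):
    return int(row[0] > 0.5)  # uses feature 0 only; feature 1 is noise


X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
y = [toy_model(row) for row in X]
imps = permutation_importance(toy_model, X, y)
print(imps, select_features(imps))
```

Because it only needs predictions, this measure works with any fitted model, at the cost of re-scoring the data once per feature per repeat.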
Matt Biddulph (Product Club)
Average rating: 4.00 (1 rating)
The tools of social network analysis are based on mathematical network theory. There is very little in these techniques that actually requires that the data represents social activity. We'll show how these techniques can be applied to data from areas such as geo, linguistics and the Wikipedia link graph. We'll visualise and explore the data using Gephi, the "Photoshop for graphs".
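To make the point concrete, PageRank, a standard network-centrality score, runs just as happily on a link graph of articles as on a friendship graph. Below is a minimal power-iteration sketch; the tiny graph is made up, and in practice a library such as NetworkX (which provides a `pagerank` function) or Gephi would do this work.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a directed link graph.

    Pages with no outgoing links ("dangling" nodes) spread their rank evenly
    over all pages, so the scores keep summing to 1.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new[target] += share
        rank = new
    return rank


# Hypothetical Wikipedia-style link graph: article -> articles it links to.
links = {
    "Python": ["Guido van Rossum", "Programming language"],
    "Java": ["Programming language"],
    "Guido van Rossum": ["Python"],
    "Programming language": [],
}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {page}")
```

Nothing in the function knows or cares whether an edge means "follows", "links to", or "is adjacent to", which is exactly the abstract's point.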
Average rating: 4.00 (1 rating)
Relational databases were based on set theory, which insists that the order of items does not matter. For many (most?) data problems, however, order does matter. By using array theory, a relational-like database gains a considerable advantage over set-theory-based engines.
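A small illustration of why order matters: computing "change since the previous row" is a single pass over ordered data, but over an unordered set each row must first find its predecessor, in effect a self-join. This plain-Python sketch only contrasts the two data models; it does not reproduce the array-theory engine the talk describes, and it assumes timestamps are unique.

```python
Row = tuple[int, float]  # (timestamp, price)


def deltas_ordered(rows: list[Row]) -> list[Row]:
    """Change since the previous row, given rows already in time order: one pass."""
    return [(t2, v2 - v1) for (_, v1), (t2, v2) in zip(rows, rows[1:])]


def deltas_unordered(rows: set[Row]) -> list[Row]:
    """Same query when rows are an unordered set: each row has to find its
    predecessor by scanning all other rows (a self-join), O(n^2) here."""
    out = []
    for t2, v2 in rows:
        earlier = [(t1, v1) for t1, v1 in rows if t1 < t2]
        if earlier:
            _, v1 = max(earlier)  # latest earlier timestamp = predecessor
            out.append((t2, v2 - v1))
    return sorted(out)


prices = [(1, 10.0), (2, 12.0), (3, 11.0), (4, 15.0)]
print(deltas_ordered(prices))
print(deltas_unordered(set(prices)))
```

Both calls agree; the difference is that the ordered version gets the predecessor for free from position, while the set version has to reconstruct it.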
Robert Lancaster (Orbitz Worldwide)
Average rating: 4.00 (1 rating)
We examine the effectiveness of a statistical technique known as survival analysis to optimize the cache time-to-live for hotel rates in a hotel rate cache. We describe how we collect and prepare nearly a billion records per day utilizing MongoDB and Hadoop. Finally, we show how this analysis is improving the operation of our hotel rate cache.
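Survival analysis fits the TTL problem because many cached rates are observed without ever changing (censored records). The sketch below uses the Kaplan-Meier estimator, a standard non-parametric choice that handles censoring; the abstract does not name the estimator Orbitz used, and the durations (in hypothetical hours), the `min_fresh` threshold, and the helper names are all assumptions.

```python
from collections import Counter


def kaplan_meier(durations, changed):
    """Kaplan-Meier estimate of S(t) = P(a cached rate is still unchanged after t).

    durations[i]: how long rate i was observed
    changed[i]:   True if it changed at that time, False if observation simply
                  stopped (a censored record)
    Returns [(t, S(t))] at each time where at least one change occurred.
    """
    changes = Counter(t for t, c in zip(durations, changed) if c)
    leaving = Counter(durations)  # records that changed or were censored at t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = changes[t]
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def staleness_horizon(curve, min_fresh):
    """First time at which estimated S(t) drops below `min_fresh`.

    A TTL shorter than this keeps the estimated chance of serving a stale
    rate at or below 1 - min_fresh. None means the curve never drops that low.
    """
    for t, s in curve:
        if s < min_fresh:
            return t
    return None


durations = [1, 2, 2, 3, 4, 5, 5, 6]  # hypothetical hours observed
changed = [True, True, False, True, False, True, True, False]
curve = kaplan_meier(durations, changed)
print(curve, staleness_horizon(curve, min_fresh=0.7))
```

Simply dropping the censored records would bias the estimate toward short lifetimes, which is why a censoring-aware estimator is used here.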
Pete Skomoroch (Workday), Michael Driscoll (Metamarkets), DJ Patil (White House Office of Science and Technology Policy), Toby Segaran (Google), Pete Warden (TensorFlow), Amy Heineike (Primer)
Average rating: 4.00 (1 rating)
Join leading data scientists in debating hot issues in the profession.


  • EMC
  • Microsoft
  • HPCC Systems™ from LexisNexis® Risk Solutions
  • MarkLogic
  • Shared Learning Collaborative
  • Cloudera
  • Digital Reasoning Systems
  • Pentaho
  • Rackspace Hosting
  • Teradata Aster
  • VMware
  • IBM
  • NetApp
  • Oracle
  • 1010data
  • 10gen
  • Acxiom
  • Amazon Web Services
  • Calpont
  • Cisco
  • Couchbase
  • Cray
  • Datameer
  • DataSift
  • DataStax
  • Esri
  • Facebook
  • Feedzai
  • Hadapt
  • Hortonworks
  • Impetus
  • Jaspersoft
  • Karmasphere
  • Lucid Imagination
  • MapR Technologies
  • Pervasive
  • Platform Computing
  • Revolution Analytics
  • Scaleout Software
  • Skytree, Inc.
  • Splunk
  • Tableau Software
  • Talend

For information on exhibition and sponsorship opportunities at the conference, contact Susan Stewart at

For information on trade opportunities with O'Reilly conferences contact Kathy Yu at mediapartners

For media-related inquiries, contact Maureen Jennings at

View a complete list of Strata contacts