Office Hours

Office Hours give you a chance to meet face-to-face with expert Strata presenters.

No sign-ups necessary. Just stop by to discuss the speaker's area of expertise, give feedback about their sessions, or ask questions.

Office Hours will take place in the Exhibit Hall, as scheduled below.

Wednesday, February 29th


  • How do new tools and technologies enable us to solve previously unsolvable problems?
  • Which industry verticals can benefit and how?
  • What are some use cases currently in practice?
  • What implementation and technology challenges do those use cases face?
  • Curation: the human side of ETL.
  • Why Big Data rarely equals Big Insight.
  • Maps and Big Data: where to meet?
  • Data v Metadata: Who Shall Rule?

The challenge is not only to analyze Big Data, but also to process it, and to respond to it, in real time. How do businesses navigate through this ocean of information effortlessly, immersed in it, seeing, hearing, and responding to it moment to moment? We need tools born of 3D animation. And to work with it and find the best solutions, we need the kinds of gameplay mechanics pioneered by massively multiplayer gaming.

I'll present examples from my work in applying immersive animation techniques and gaming dynamics, and discuss how they can address the challenges of consuming – and responding to – the data deluge, turning information overload into business advantage.


  • Why context matters to data analysis
  • The toolkit for working with big data
  • Infochimps Big Data tools and how they can help the average developer or an enterprise
  • How to get started in data science
  • How to use predictive modeling in practice
  • How to apply elastic net and random forests in practice
  • How to scale predictive modeling algorithms
  • When to use predictive modeling vs “simpler” ad hoc methods

In addition to audience questions (moderated by Alex Howard), the session will pull recent use-cases from the press and evaluate them against policy frameworks that would help data professionals think about how their products might (not) unearth privacy issues. Example cases that we may discuss include:

  • Various voter micro-targeting efforts
  • The TomTom in-car navigation
  • Government's data-matching cost-savings initiatives


Ted Dunning will primarily discuss Mahout, MapR, and machine learning, but will be happy to answer any questions that start with M—or anything else that he knows something about.

  • MSFT Big Data Vision and Roadmap
  • Business value of Big Data and the associated technology shifts required to fully leverage this phenomenon
  • Importance of BI + Big Data
  • SQL Azure Labs (Social Analytics, Data Explorer, "Data Hub" aka Private Marketplace, Trust Services)
  • Azure Marketplace
  • What big data problems does the social sector have, both broad-spectrum problems like data collection and specific data science needs like mapping?
  • How do you keep data scientists and social organizations involved in the process?
  • How do you get data scientists and social sector folks to talk to each other?
  • How do we broaden our horizons (include developers / designers / governments)?


Ready to discuss whatever is on your mind!

Sarah will discuss anything Hadoop-related, such as MapReduce, Hive, Pig, and HBase.

Martin will discuss becoming a data-driven organization; why Hadoop is relevant to your organization; how Karmasphere can help; and how Karmasphere fits into the Hadoop Ecosystem.

Thursday, March 1st


Practical aspects of dealing with unstructured data.

Effective Data Visualization

Noah will cover best practices for data visualization, information design, and user experience.


Anything related to Google Maps APIs, Google Earth, KML, and geodata in general.

Introduction to and a discussion of "The Internet of Things"

HBase schema design


  • Mobile-to-cloud data synchronization
  • Geospatial data processing
  • Unstructured data management
  • Analyzing unstructured data with MapReduce
  • Storm
  • Cascalog
  • ElephantDB
  • Designing realtime Big Data architectures

Spend some time with Bill working through a worksheet exercise to identify potential areas of the business where big data and advanced analytics could drive business value.

  • Usability and User Experience
  • Contextualization
  • The Data Challenges of Developing Countries
  • Platform Accessibility


  • Visual analytics: How to go from eye candy to actual decision support
  • Corporate fear factor: Dealing with the business/management obstacles to analytics
  • In-house vs. outsourcing: How to think about build vs. buy for big data
  • Causal analysis in observational data: Rather than spending effort and money on A/B tests, we show how to use predictive modeling to derive a reliable measure of the actual effect that an ad has.
  • Inventory optimization: How you can use predictive modeling (logistic regression) to evaluate a particular inventory with respect to its impact on the conversion probability of customers. More generally, we look at counterfactual modeling that deals with biases in data samples for predictive modeling: you can only get enough data when you look under the streetlight, but you need to illuminate the dark. Staged predictive models can combine the predictive information from under the streetlight with the 'correct' adjustments to work in the data.
  • Finding groups of URLs: clustering websites without crawling or parsing the content
  • Lessons from running a large-scale, semi-automated machine learning system in production

And more generally beyond the advertising applications:

  • On the dangers of leakage and why your data scientist has to pull his/her own data
  • Ranking and probability estimation in millions of dimensions
  • Open topics
  • Feedback on ideas
  • Review biz ideas


