Schedule: Hadoop & Big Data: Tech sessions

R and Hadoop, the two hottest stars on the Analytics stage, were meant to be together. The open source RHadoop project was established to make it happen. We'll go over what RHadoop does for you, how to use it, and why you should add it to your toolset.
Henry Robinson (Cloudera)
At Cloudera, we've found that monitoring Apache Hadoop is itself a big data problem. Here I'll present work we've been doing on turning the vast amounts of monitoring data a Hadoop cluster generates into meaningful signals to help us wrestle with the biggest challenges of maintaining large distributed systems: failure of machines, processes and people, and root-cause analysis after the fact.
Mark Pollack (SpringSource/VMware)
Hadoop is not an island. To deliver a complete Big Data solution, a data pipeline needs to be developed that incorporates and orchestrates many diverse technologies. Using an example of real-time weblog processing, in this session we will demonstrate how the open source Spring Batch and Spring Integration projects can be used to build manageable and robust pipeline solutions around Hadoop.
Deepak Senapati (Cloudera)
Cloudera Data Scientist Josh Wills will share insights and “how to” tricks about Crunch, a Java library that aims to make writing, testing and running MapReduce pipelines that run over any type of data easy, efficient and even fun.
Steve Francia (10gen)
Learn how to integrate MongoDB with Hadoop for large-scale distributed data processing.
Stefan Groschupf (Datameer)
Using Hadoop-based business intelligence analytics, we analyzed Hadoop source code over time. This talk illustrates text and related analytics with Hadoop on Hadoop to reveal the true hidden secrets of the elephant. This entertaining session highlights the value of data correlation across multiple datasets and the visualization of those correlations to reveal hidden data relationships.
Jonathan Ellis (DataStax)
NoSQL, Big Data, massive scale, real-time, in the cloud: do I need it, do I want it, and how the heck can I even know if it’s right for me? Choosing any database solution is a critical and tricky decision. Navigating the murky waters of NoSQL can be even tougher.
Nathan Marz (Twitter)
Storm is an open-source realtime computation system relied upon by Twitter for much of its analytics. Storm does for realtime computation what Hadoop did for batch computation. It has a huge range of applications and combines ease of use with a robust foundation.
Sean Byrnes (Flurry, Inc.)
Flurry provides an analytics and advertising platform for smartphone applications. Every month we track more than 20 billion sessions across more than 330 million devices. This talk will go over the Hadoop and HBase architecture we run and the challenges we face managing a massively growing data set.
James Phillips (Couchbase, Inc.)
Mobile devices offer boundless opportunities for collection and presentation of temporally and spatially relevant data. But there are obstacles: intermittent connectivity as well as processing, storage and other constraints. Featuring real-world apps, this session covers device data collection; device-device and device-cloud data synchronization; and data aggregation and analysis in the cloud.
Baldwin Ferruza (Inktank)
Data storage needs are increasing at an exponential rate. Incumbent storage systems are proprietary, expensive to buy and expensive to maintain. With the advent of the cloud, everyone expects auto-scaling. Ceph is a massively scalable storage system that aims to fill the distributed storage system void.


  • EMC
  • Microsoft
  • HPCC Systems™ from LexisNexis® Risk Solutions
  • MarkLogic
  • Shared Learning Collaborative
  • Cloudera
  • Digital Reasoning Systems
  • Pentaho
  • Rackspace Hosting
  • Teradata Aster
  • VMware
  • IBM
  • NetApp
  • Oracle
  • 1010data
  • 10gen
  • Acxiom
  • Amazon Web Services
  • Calpont
  • Cisco
  • Couchbase
  • Cray
  • Datameer
  • DataSift
  • DataStax
  • Esri
  • Facebook
  • Feedzai
  • Hadapt
  • Hortonworks
  • Impetus
  • Jaspersoft
  • Karmasphere
  • Lucid Imagination
  • MapR Technologies
  • Pervasive
  • Platform Computing
  • Revolution Analytics
  • Scaleout Software
  • Skytree, Inc.
  • Splunk
  • Tableau Software
  • Talend

For information on exhibition and sponsorship opportunities at the conference, contact Susan Stewart at

For information on trade opportunities with O'Reilly conferences contact Kathy Yu at mediapartners

For media-related inquiries, contact Maureen Jennings at
