Schedule: Hadoop & Big Data: Applied sessions

This track looks at big data platforms, with a particular focus on those that can handle massive amounts of information in parallel. We’ll look at real-time, scale-out architectures that crunch data fast, and what it takes to build and run them.

Ballroom CD
Eric Baldeschwieler (Independent)
In this session, Hortonworks CEO Eric Baldeschwieler will review the current state of Apache Hadoop, explain how the ecosystem is working together to close the existing technological and knowledge gaps, and present a roadmap for the future of the project.
Ballroom CD
Jack Norris (MapR Technologies)
Average rating: 4.00 (2 ratings)
This session will draw on numerous customer examples to reveal powerful tips, tricks, and in-depth use cases to show how Hadoop can easily integrate, scale, and analyze important data.
Ballroom CD
Sam Shah (LinkedIn)
Average rating: 4.50 (2 ratings)
In this talk, we'll build a complete, scalable collaborative filtering ("people who X also Y") system that is almost identical to what prominent Internet properties use today. We'll talk about model improvements, performance enhancements, and practical considerations. This is a practical talk accessible to all.
Ballroom CD
Asad Khan (Microsoft)
As more companies adopt Hadoop to perform data-intensive tasks on large data sets, there is a burning need to make Hadoop available to a broader set of developers. This talk covers two approaches Microsoft is exploring for this purpose: (1) JavaScript interfaces to run Hadoop jobs, and (2) web interfaces for Hadoop that let developers write and run MapReduce jobs from any platform.
Ballroom CD
Vipul Sharma (Eventbrite)
Average rating: 5.00 (1 rating)
This talk will cover the details, architecture, and challenges of building a recommendation system on a massive social graph. It will describe how we applied learning on large datasets using Apache Hadoop and how we scaled to millions of reads and writes. We will also showcase Eventbrite's data platform architecture.
Ballroom CD
Stefan Groschupf (Datameer)
This session discusses financial services use cases and challenges in using Hadoop analytics, including long-term storage and analysis of transactions, identifying cross-sell and up-sell opportunities by analyzing web log files and customer profiles, value-at-risk analytics, and understanding SLA issues and identifying problems in a large, service-oriented architecture with thousands of nodes.
Ballroom CD
Hundreds of hours of video recordings culled from multiple cameras. Most of these recordings hold little value, as the scene does not change for extended periods of time. For organizations that must keep the original intact, analyzing these recordings can be very difficult. Using Map/Reduce, we can harness parallel processing to identify and tag useful periods of time for faster analysis.
Ballroom CD
Ed Kohlwey (Booz Allen Hamilton)
Average rating: 1.00 (1 rating)
Map/Reduce has created tremendous interest in parallel programming and big data analytics, but it isn't always the right tool for the job. Many new projects have emerged in this space over the last year including two cluster schedulers (YARN and Mesos) and numerous parallel computing environments. We'll provide an introduction to these new technologies, including some you might not have heard of.
Ballroom CD
Ron Bodkin (Google), Kumar Palaniappan (NetApp)
NetApp collects 250 TB per year of unstructured data from devices that phone home. They need to be able to do ad hoc analysis and build predictive models for device support and cross-sales. We discuss our experiences building a Big Data system with NetApp using Hadoop and HBase to improve customer service, drive sales and develop better products.


  • EMC
  • Microsoft
  • HPCC Systems™ from LexisNexis® Risk Solutions
  • MarkLogic
  • Shared Learning Collaborative
  • Cloudera
  • Digital Reasoning Systems
  • Pentaho
  • Rackspace Hosting
  • Teradata Aster
  • VMware
  • IBM
  • NetApp
  • Oracle
  • 1010data
  • 10gen
  • Acxiom
  • Amazon Web Services
  • Calpont
  • Cisco
  • Couchbase
  • Cray
  • Datameer
  • DataSift
  • DataStax
  • Esri
  • Facebook
  • Feedzai
  • Hadapt
  • Hortonworks
  • Impetus
  • Jaspersoft
  • Karmasphere
  • Lucid Imagination
  • MapR Technologies
  • Pervasive
  • Platform Computing
  • Revolution Analytics
  • Scaleout Software
  • Skytree, Inc.
  • Splunk
  • Tableau Software
  • Talend