Strata + Hadoop World 2012 Schedule

Below are the confirmed and scheduled talks at Strata + Hadoop World 2012 (schedule subject to change).

Customize Your Own Schedule

Create your own Strata + Hadoop World schedule using the personal scheduler function. Mark the tutorials, sessions, keynotes, and events you want to attend by selecting the calendar icon next to each listing, then go to your personal schedule to generate your customized schedule.

See the list of all events happening onsite, starting on Monday, October 22.

Beekman / Sutton North (NY Hilton)
9:00am Communicating Data Clearly Naomi Robbins (NBR)
1:30pm Designing Data Visualizations Workshop Noah Iliinsky (Amazon Web Services)
Sutton Center / Sutton South (NY Hilton)
1:30pm Crunching Big Data with R and Hadoop Ed Kohlwey (Booz Allen Hamilton), Stephanie Beben (Booz Allen Hamilton)
Murray Hill (NY Hilton)
9:00am An Introduction to Hadoop Mark Fei (Cloudera)
1:30pm Search and Real-time Analytics on Big Data Sewook Wee (Accenture), Ryan Tabora (Think Big Analytics), Jason Rutherglen (Datastax)
Gramercy Suite (NY Hilton)
9:00am Testing Hadoop Applications Tom Wheeler (Cloudera, Inc.)
1:30pm Hadoop Data Warehousing with Hive Dean Wampler (Lightbend)
Grand East (NY Hilton)
9:00am Using HBase Amandeep Khurana (Cloudera), Matteo Bertozzi (Cloudera)
1:30pm Best Practices for Building and Deploying Predictive Models over Big Data Robert Grossman (Open Data Group), Collin Bennett (Open Data Group)
Grand West (NY Hilton)
Regent Parlor (NY Hilton)
9:00am Dealing with Dirty Data - Finding the Right Tool for the Job Susan McGregor (Columbia University), Alice Brennan (The New York World), Michael Sullivan (The New York World)
1:30pm Building a Large-scale Data Collection System Using Flume NG Hari Shreedharan (Cloudera Inc.), Will McQueen (Cloudera Inc.), Arvind Prabhakar (Cloudera), Prasad Mujumdar (Cloudera Inc.), Mike Percy (Cloudera)
Nassau (NY Hilton)
Plenary
12:30pm Lunch sponsored by Intel (America's Hall, NY Hilton)
5:00pm Attendee Reception (Grand Ballroom Foyer, NY Hilton)
6:30pm Startup Showcase (Metropolitan West, Sheraton NY)
9:00am-12:30pm (3h 30m) Visualization & Interface
Communicating Data Clearly
Naomi Robbins (NBR)
Communicating Data Clearly describes how to draw clear, concise, accurate graphs that are easier to understand than many of the graphs one sees today. The tutorial emphasizes how to avoid common mistakes that produce confusing or even misleading graphs. It covers graphs for one, two, three, and many variables, as well as general principles for creating effective graphs.
1:30pm-5:00pm (3h 30m) Visualization & Interface
Designing Data Visualizations Workshop
Noah Iliinsky (Amazon Web Services)
This workshop is a jumpstart lesson on how to get from a blank page and a pile of data to a useful data visualization. We'll focus on the design process, not specific tools. Bring your sample data and paper or a laptop; leave with new visualization ideas.
9:00am-12:30pm (3h 30m) Data Science
A Hands-on Introduction to Cross-disciplinary Analytics With Python
Roy Hyunjin Han (CrossCompute)
Python is the language of choice when it comes to integrating analytical components. We will present a series of concepts and walkthroughs that illustrate how easy scientific computing is in Python, from machine learning and time series to spatial relationships and network analysis.
1:30pm-5:00pm (3h 30m) Hadoop: Tools & Technology
Crunching Big Data with R and Hadoop
Ed Kohlwey (Booz Allen Hamilton), Stephanie Beben (Booz Allen Hamilton)
In this tutorial, we’ll provide an introduction to an open source Map/Reduce library for R called RHadoop that makes Map/Reduce programming convenient and easy to understand for statistical modeling users. The session will cover the basics of RHadoop, common techniques and best practices, and some interactive real-world examples.
9:00am-12:30pm (3h 30m) Hadoop: Tools & Technology
An Introduction to Hadoop
Mark Fei (Cloudera)
Apache Hadoop is enabling companies across many different industries to process and analyze large data sets. In this tutorial you will learn why and how people are using Hadoop and related technologies such as Hive, Pig, and HBase.
1:30pm-5:00pm (3h 30m) Hadoop: Case Studies, Hadoop: Tools & Technology
Search and Real-time Analytics on Big Data
Sewook Wee (Accenture), Ryan Tabora (Think Big Analytics), Jason Rutherglen (Datastax)
This tutorial will help participants understand why distributed search is important and teach them how to use the landscape of available tools. Drawing on our hands-on experience at NetApp, we will show participants how to set up and use search technologies such as Apache Solr and Lucene to enable real-time Big Data analytics with Hadoop, HBase, and other NoSQL stores.
9:00am-12:30pm (3h 30m) Hadoop: Tools & Technology
Testing Hadoop Applications
Tom Wheeler (Cloudera, Inc.)
This tutorial will explore the tools and techniques you need to ensure that your MapReduce applications are both correct and efficient. You'll learn how to do unit testing, integration testing, and performance testing for your Hadoop jobs, as well as how to interpret diagnostic information to isolate and solve problems in your code.
1:30pm-5:00pm (3h 30m) Hadoop: Tools & Technology
Hadoop Data Warehousing with Hive
Dean Wampler (Lightbend)
This hands-on tutorial teaches you how to set up and use Hive, a high-level data warehouse tool for Hadoop. Hive provides a SQL-like query language, HiveQL, that is easy to learn for people with prior SQL experience, making Hive attractive for data warehousing teams. Hive leverages the power of Hadoop for working with massive data sets without requiring expertise in MapReduce programming.
9:00am-12:30pm (3h 30m) Hadoop: Tools & Technology
Using HBase
Amandeep Khurana (Cloudera), Matteo Bertozzi (Cloudera)
HBase is one of the more popular open source NoSQL databases that have cropped up over the last few years. Building applications that use HBase effectively is challenging. This tutorial is geared towards teaching the basics of building applications using HBase and covers concepts that a developer should know while using HBase as a backend store for their application.
1:30pm-5:00pm (3h 30m) Business & Industry, Data Science
Best Practices for Building and Deploying Predictive Models over Big Data
Robert Grossman (Open Data Group), Collin Bennett (Open Data Group)
A successful big data analytic project is not just about selecting the right algorithm for building a predictive model, but also about how to deploy the model efficiently into operational systems, how to evaluate the effectiveness of the model, and how to continuously improve it. In this tutorial we cover best practices for each of these phases in the life cycle of a predictive model.
9:00am-5:00pm (8h) Business & Industry, Data Driven Business Day
Data Driven Business Day
For business strategists, marketers, product managers, and entrepreneurs, Data Driven Business Day looks at how to use data to make better business decisions faster. Packed with case studies, panels, and eye-opening presentations, this fast-paced day focuses on how to solve today's thorniest business problems with Big Data. It's the missing MBA for a data-driven, always-on business world.
9:00am-12:30pm (3h 30m) Data Science
Dealing with Dirty Data - Finding the Right Tool for the Job
Susan McGregor (Columbia University), Alice Brennan (The New York World), Michael Sullivan (The New York World)
This tutorial will provide novice users with an overview of a range of common tools used for data cleaning and analysis, including Microsoft Excel, Google Refine, Python, and R, along with their relative strengths and weaknesses. Attendees will not only learn useful new skills but will also know what kind of expertise to seek out for help with more complex tasks.
1:30pm-5:00pm (3h 30m) Hadoop: Tools & Technology
Building a Large-scale Data Collection System Using Flume NG
Hari Shreedharan (Cloudera Inc.), Will McQueen (Cloudera Inc.), Arvind Prabhakar (Cloudera), Prasad Mujumdar (Cloudera Inc.), Mike Percy (Cloudera)
Apache Flume (incubating) is a scalable, reliable, fault-tolerant, distributed system designed to collect and transfer massive amounts of event data from disparate systems into a storage tier such as Hadoop HDFS. In this tutorial we show how to build a large-scale, scalable data collection and transfer system using Flume NG, the next generation of Flume.
9:00am-5:00pm (8h) Bridge to Big Data
Bridge to Big Data
For CIOs, IT executives, and technology professionals, Strata's Bridge to Big Data lays out the roadmap to get your organization up to speed on big data. In this all-day event, learn how to create a big data strategy, manage your first pilot project, demystify vendor solutions, and understand how big data differs from BI.
6:30pm-8:00pm (1h 30m)
Startup Showcase
Part of NYC Data Week. Don't miss Startup Showcase, Strata's live demo program and competition for startups and early-stage companies. Judges Tim O'Reilly and Fred Wilson will pick winners from 10 finalist companies selected to present at the showcase.
5:00pm-6:00pm (1h)
Attendee Reception
Join your fellow big data enthusiasts at the Strata Conference & Hadoop World Attendee Reception on Tuesday, October 23.
12:30pm-1:30pm (1h)
Break: Lunch sponsored by Intel

Sponsors

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities, contact Susan Stewart at sstewart@oreilly.com.

Media Partner Opportunities

For information on trade opportunities, contact Kathy Yu at mediapartners@oreilly.com.

Press and Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com.

Contact Us

View a complete list of Strata contacts.