Strata + Hadoop World 2012 Schedule

Below are the confirmed and scheduled talks at Strata + Hadoop World 2012 (schedule subject to change).

Customize Your Own Schedule

Create your own Strata + Hadoop World schedule using the personal scheduler function. Mark the tutorials, sessions, keynotes, and events you want to attend by selecting the calendar icon next to each listing. Then go to your personal schedule to generate your customized schedule.

See the list of all events happening onsite, starting on Monday, October 22.

Beekman / Sutton North (NY Hilton)
10:50am Helping the World’s Farmers Adapt to Climate Change Siraj Khaliq (The Climate Corporation)
11:40am Moneyballing Criminal Justice: Using Data to Reduce Crime Anne Milgram (NYU Law Center on the Administration of Criminal Law Center)
2:30pm How a Traditional Media Company Embraced Big Data Oscar Padilla (Entravision Communications), Franklin Rios (Luminar), Vineet Tyagi (Impetus Technologies)
4:10pm Big Data Analytic Solution Accelerators Kevin Foster (IBM)
Murray Hill (NY Hilton)
10:50am Data Analysis for Explorers Jesper Andersen (Bloom Studios)
11:40am How to See Data Kim Rees (Periscopic)
1:40pm Making Major League Data Work: Carving Up Big Data into Useful Applications for Specific Audiences Richard Brath (Uncharted Software), Noah Schwartz (Bloomberg Sports)
2:30pm hGraph: An Open System for Visualizing Personal Health Metrics Juhan Sonin (Involution Studios)
5:00pm Data Visualization: Statistical Graphics or Data Art? Nigel Holmes (Explanation Graphics), Jon Peltier (Peltier Technical Services), Naomi Robbins (NBR)
Sutton Center / Sutton South (NY Hilton)
10:50am MapReduce Design Patterns Donald Miner (Miner & Kasch)
1:40pm Finance vs. Machine Learning Cathy O'Neil (Intent Media)
4:10pm Creative Thinking and Data Science Mike Stringer (Datascope Analytics)
5:00pm Breeding Data Scientists Amy O'Connor (Nokia), Danielle Dean (Nokia)
Gramercy Suite (NY Hilton)
10:50am Data Science with Hadoop at Opower Erik Shilts (Opower)
11:40am DCGS-Army Standard Cloud Multimedia David Bauer (Data Tactics Corporation)
1:40pm Facebook’s Large Scale Monitoring System Built on HBase Liyin Tang (Facebook), Vinod Venkataraman (Facebook), Charles Thayer (Facebook)
4:10pm Hadoop, HBase, and Healthcare Ryan Brush (Cerner Corporation)
5:00pm Searching for the Genetic Causes of Disease with Hadoop Charles Schmitt (Renaissance Computing Institute)
Grand East (NY Hilton)
10:50am Using Hadoop to do Agile Iterative ETL Ben Werther (Platfora), Kevin Beyer (Platfora)
11:40am Large Scale ETL with Hadoop Eric Sammer (Rocana)
1:40pm Future of Data Processing with Apache Hadoop Arun Murthy (Hortonworks Inc.)
4:10pm HDFS - What is New and Future Sanjay Radia (Hortonworks), Todd Lipcon (Cloudera)
5:00pm High Availability for the HDFS NameNode: Phase 2 Aaron Myers (Cloudera, Inc.), Todd Lipcon (Cloudera)
Grand West (NY Hilton)
10:50am Big Data for the Masses: How We Opened Up the Doors to Google’s Dremel Michael Manoochehri (Google, Inc.), Jim Caputo (Google, Inc.)
1:40pm Deconstructing the Database Rich Hickey (Datomic)
2:30pm Beyond Hadoop: Fast Ad-Hoc Queries on Big Data Mike Driscoll (Metamarkets), Eric Tschetter (Metamarkets)
Regent Parlor (NY Hilton)
10:50am Drill into Big Data Tomer Shiran (MapR Technologies), Jack Norris (MapR Technologies)
11:40am Technology Strategies for Big Data Analytics Scott Chastain (SAS)
1:40pm Drive Smarter Decisions with Microsoft Big Data Shawn Bice (Microsoft)
2:30pm Bringing Real-Time, End-to-End Analytics into Everyday Use Greg Khairallah (Intel), Vin Sharma (Intel)
4:10pm 'Data Exponential' - K-12 Learning Analytics for Personalized Learning at Scale: Opportunities and Challenges Roy Pea (Stanford University), Stephen Coller (Bill and Melinda Gates Foundation), H. Taylor Martin (Utah State University), Ken Koedinger (Carnegie Mellon)
5:00pm Human Intelligence as a Signal to Predictive Analytics Rob Metcalf (Digital Reasoning), Laks Srinivasan (Opera Solutions)
Nassau (NY Hilton)
10:50am Hadoop as a Complementary Data Platform at PayPal Moises Nascimento (PayPal), Nagaraju Chayapathi (PayPal)
11:40am Tying the Knot Between Hadoop and EDW David Jonker (SAP)
1:40pm Hadoop Analytics Without a Ph.D Richard Daley (Pentaho Corporation)
2:30pm Monitoring Cloud Data Gary Dusbabek (Rackspace)
4:10pm How Comcast Turns Big Data into Real-Time Operational Insights Patrick Shumate (Comcast Cable), Raanan Dagan (Splunk)
5:00pm Designing Hadoop for the Enterprise Data Center Jacob Rapp (Cisco Systems), Eric Sammer (Rocana)
8:45am Plenary
Room: Grand Ballroom (NY Hilton)
Wednesday Welcome Edd Wilder-James (Silicon Valley Data Science), Alistair Croll (Solve For Interesting)
8:55am Plenary
Room: Grand Ballroom (NY Hilton)
Big Answers Mike Olson (Cloudera)
9:10am Plenary
Room: Grand Ballroom (NY Hilton)
The End of the Data Warehouse Ben Werther (Platfora)
9:15am Plenary
Room: Grand Ballroom (NY Hilton)
Moneyball for New York City Michael Flowers (NYC Mayor's Office of Policy and Strategic Planning)
9:25am Plenary
Room: Grand Ballroom (NY Hilton)
Thinking Big Together: Driving the Future of Data Science Annika Jimenez (Pivotal), Anthony Goldbloom (Kaggle)
9:35am Plenary
Room: Grand Ballroom (NY Hilton)
The Composite Database Rich Hickey (Datomic)
9:45am Plenary
Room: Grand Ballroom (NY Hilton)
The Democratization of Big Data: Bringing Hadoop to the Masses James Markarian (Informatica)
9:50am Plenary
Room: Grand Ballroom (NY Hilton)
Big Data Direct – The Era of Self-driven Big Data Exploration Sharmila Mulligan (ClearStory Data)
10:00am Plenary
Room: Grand Ballroom (NY Hilton)
Bringing the 'So What' to Big Data Tim Estes (Digital Reasoning)
10:20am Morning Break sponsored by Informatica
Room: Break
3:10pm Afternoon Break sponsored by Platfora
Room: Break
8:00am Coffee break sponsored by Hortonworks
Room: Grand Ballroom Foyer (NY Hilton)
5:40pm Plenary
Room: Grand Ballroom Foyer (NY Hilton)
Attendee Reception
12:20pm Lunch sponsored by Greenplum, a division of EMC
Room: America's Hall (NY Hilton)
Wednesday Lunch and BoFs
6:40pm Plenary
Room: Liberty Theatre, 234 W 42nd Street
TBC
8:00pm Plenary
Room: Liberty Theatre, 234 W 42nd Street
Data After Dark
10:50am-11:30am (40m) Hadoop: Case Studies
Helping the World’s Farmers Adapt to Climate Change
Siraj Khaliq (The Climate Corporation)
Big Data takes on the planet’s toughest challenge by analyzing weather’s complex behavior. Using hundreds of terabytes of data and trillions of simulation datapoints, The Climate Corporation models weather’s impact on crops to create customized insurance for farmers facing the financial impact of extreme weather.
11:40am-12:20pm (40m) Data Science
Moneyballing Criminal Justice: Using Data to Reduce Crime
Anne Milgram (NYU Law Center on the Administration of Criminal Law Center)
Anne Milgram, Senior Fellow at the NYU Law Center on the Administration of Criminal Law Center.
1:40pm-2:20pm (40m) Business & Industry
Zillow: Disrupting the Real Estate Marketplace with Data
Stan Humphries (Zillow)
Real estate used to be an industry that had large information asymmetry between professionals and consumers. Zillow has leveled the playing field through its living database on over 100 million homes. Advanced statistical modeling gives consumers even more information and tools, such as the well-known Zestimate. Using data and analytics, Zillow has changed the real estate industry forever.
2:30pm-3:10pm (40m) Business & Industry
How a Traditional Media Company Embraced Big Data
Oscar Padilla (Entravision Communications), Franklin Rios (Luminar), Vineet Tyagi (Impetus Technologies)
How a traditional Spanish-language media company made the strategic decision to build a robust analytics intelligence division to more effectively target the Hispanic market. Attendees will walk away with insights on how this traditional media company implemented a big data and MapReduce operation from the ground up.
4:10pm-4:50pm (40m) Business & Industry
Big Data Analytic Solution Accelerators
Kevin Foster (IBM)
In this session, Kevin Foster, IBM Big Data Solution Architect, will provide an overview of big data analytic accelerators and how they are being used by organizations to speed up deployments and solve big data problems sooner.
5:00pm-5:40pm (40m) Business & Industry
Explore/Exploit: Driving Business Value with Big Data
Raymie Stata (Altiscale)
Success in Big Data requires both finding new signals buried in your data sources ("explore"), and using those signals to drive business value ("exploit"). Based on his background in Web Search and Internet Advertising, the speaker will describe these two aspects of Big Data and some of the success factors for each.
10:50am-11:30am (40m) Visualization & Interface
Data Analysis for Explorers
Jesper Andersen (Bloom Studios)
In this session we will discuss how subjectivity can be encoded in data, and how this data can be used to help users experience a city more gracefully. We'll create maps and visualizations that reinforce the ways users engage with cities and augment these experiences using social and crowd-sourced data sources, analytics and both artistic and literal visualization to convey this information.
11:40am-12:20pm (40m) Visualization & Interface
How to See Data
Kim Rees (Periscopic)
Data has been locked in a mindset of rows and columns. Our brains are trapped by database schemas. To get out of that predisposition and communicate visually requires new thinking. This session covers techniques for reframing our thoughts about data, how to describe data, forming a narrative, and coming up with visual solutions.
1:40pm-2:20pm (40m) Visualization & Interface
Making Major League Data Work: Carving Up Big Data into Useful Applications for Specific Audiences
Richard Brath (Uncharted Software), Noah Schwartz (Bloomberg Sports)
MLB captures 10Tb of game data every year. The data is valuable, but it quickly became clear that using it effectively required different visual front-ends for fans, players, coaches and scouts. The ability to adapt and address different audiences helped the success of this project and can help other big data projects.
2:30pm-3:10pm (40m) Visualization & Interface
hGraph: An Open System for Visualizing Personal Health Metrics
Juhan Sonin (Involution Studios)
hGraph is a compelling, standardized visual representation of a patient's health status for clinicians and patients. Designed to increase awareness of the individual's factors that can affect one's health and lead to improved outcomes, hGraph aggregates all of an individual's health metrics in one location, in a single picture.
4:10pm-4:50pm (40m) Visualization & Interface
Visualization – An Emerging Collaboration Opportunity
Lee Feinberg (DecisionViz)
Attendees will learn, through practical examples, how to build a collaborative environment that accelerates the value of big data, with the goal of “making data part of every conversation.”
5:00pm-5:40pm (40m) Visualization & Interface
Data Visualization: Statistical Graphics or Data Art?
Nigel Holmes (Explanation Graphics), Jon Peltier (Peltier Technical Services), Naomi Robbins (NBR)
"Data visualization" means different things to different people. Some say that to be effective, visualizations need to be clear, concise and accurate. Others say that to be effective, visualizations need to be eye-catching, engaging, and innovative. Naomi Robbins will moderate a panel composed of Jon Peltier and Nigel Holmes.
10:50am-11:30am (40m) Data Science, Hadoop: Case Studies
MapReduce Design Patterns
Donald Miner (Miner & Kasch)
The Hadoop and data science communities have matured to the point now that common design patterns across domains are beginning to emerge. Now that Hadoop is maturing and momentum is gaining in the user base, the experienced users can start documenting design patterns that can be shared. In this talk, we'll talk about what makes up a MapReduce design pattern and give some examples.
11:40am-12:20pm (40m) Data Science
Analyzing Millions of GitHub Commits: What Makes Developers Happy, Angry, and Everything in Between?
Ilya Grigorik (Google), Brian Doll (SourceClear)
Open-source developers all over the world contribute to millions of projects every day on GitHub: writing and reviewing code, filing bug reports and updating docs. Data from these events provides an amazing window into open source trends: project momentum, language adoption, community demographics, and more.
1:40pm-2:20pm (40m) Data Science
Finance vs. Machine Learning
Cathy O'Neil (Intent Media)
In this talk, techniques from mathematical financial models will be compared and contrasted with methods coming from machine learning. Specifically, we will discuss the concept of time series data, taking account of seasonality, how to avoid overfitting, continuous updating, and fitting a Bayesian prior to your data science model. We will also discuss the question of when to use what tools.
2:30pm-3:10pm (40m) Data Science
Best Practices for Reproducible Research: A Case Study in Quantitative Finance
Chang She (Cloudera)
Proper tooling and good habits that maximize reproducibility are essential to being productive as a data scientist. From management of raw data to model version control, the entire workflow must be carefully controlled from end-to-end to produce quality research that scales with the quantity and complexity of data being analyzed.
4:10pm-4:50pm (40m) Data Science
Creative Thinking and Data Science
Mike Stringer (Datascope Analytics)
An effective data science team looks a lot like an effective design team: brainstorming creative ideas, making prototypes, receiving feedback, telling stories, and deeply understanding the needs of others.
5:00pm-5:40pm (40m) Data Science
Breeding Data Scientists
Amy O'Connor (Nokia), Danielle Dean (Nokia)
Amy O'Connor, Sr. Director of Nokia Analytics, together with her daughter and Nokia Intern, Danielle Dean, will share what makes a great data scientist, their different paths to acquiring the diverse skill sets that are needed and finally Amy will discuss how to spot, attract and train emerging data scientists in what is quickly becoming a heated market.
10:50am-11:30am (40m) Hadoop: Case Studies
Data Science with Hadoop at Opower
Erik Shilts (Opower)
How does Opower deliver insights to millions of households with big (and getting bigger) data? I discuss how to effectively use Hadoop, integrate it with R and Python, and harness an engaged workforce to solve data science and efficiency problems.
11:40am-12:20pm (40m) Hadoop: Case Studies
DCGS-Army Standard Cloud Multimedia
David Bauer (Data Tactics Corporation)
DCGS-Army Standard Cloud Multimedia (DSC-M) is focused on the Full Motion Video aspects of our Cloudera Hadoop-based implementation for the U.S. Army.
1:40pm-2:20pm (40m) Hadoop: Case Studies, Hadoop: Tools & Technology
Facebook’s Large Scale Monitoring System Built on HBase
Liyin Tang (Facebook), Vinod Venkataraman (Facebook), Charles Thayer (Facebook)
ODS is Facebook's internal large-scale monitoring system. HBase turns out to be a good fit for its workload and solves some manageability and scalability challenges with the previous MySQL-based setup. We would like to share a series of valuable experiences learnt from building this large-scale realtime system based on HBase.
2:30pm-3:10pm (40m) Hadoop: Case Studies
Commercial Graph: A Map of Financial Relationships
Michael Radwin (Intuit)
Imagine the social graph where personal relationships are replaced by commercial relationships based on real financial data. Imagine the possibilities for small businesses to grow, connect, transact and prosper.
4:10pm-4:50pm (40m) Hadoop: Case Studies
Hadoop, HBase, and Healthcare
Ryan Brush (Cerner Corporation)
A look at using Hadoop, HBase and other technologies to bring together and process health data from many sources in real time. This includes techniques for dealing with data that's incomplete or out-of-order when it arrives, merging bulk and real-time data sets, and creating search indexes and data models to enable better health care.
5:00pm-5:40pm (40m) Hadoop: Case Studies
Searching for the Genetic Causes of Disease with Hadoop
Charles Schmitt (Renaissance Computing Institute)
Your DNA, written out as a string of G, A, T, and C, is about three and a half gigabytes long. That string is about 99.9% identical to an arbitrary Reference Genome. Practically all of those differences are harmless, but a tiny fraction can cause disease, contribute to disease, or just change how your body reacts to drugs. We're using Hadoop to find the variants that actually matter.
10:50am-11:30am (40m) Hadoop: Tools & Technology
Using Hadoop to do Agile Iterative ETL
Ben Werther (Platfora), Kevin Beyer (Platfora)
With traditional ETL (extract-transform-load) you need to decide how you want to transform and store the data before it arrives. Hadoop allows a much more agile pipeline – store the raw data, add a little metadata, and iteratively pull from it at whatever level of detail is needed right now by the application. We'll explore this approach and show you how you can start using it today.
11:40am-12:20pm (40m) Hadoop: Tools & Technology
Large Scale ETL with Hadoop
Eric Sammer (Rocana)
While many of the necessary building blocks for data processing exist within the Hadoop ecosystem, it can be a challenge to assemble them as a production ETL platform. This presentation covers one approach to data ingest, organization, format selection, process orchestration, and external system integration, based on collective experience acquired across many production Hadoop deployments.
1:40pm-2:20pm (40m) Hadoop: Tools & Technology
Future of Data Processing with Apache Hadoop
Arun Murthy (Hortonworks Inc.)
Apache Hadoop MapReduce has undergone a complete overhaul to emerge as Apache Hadoop YARN, a generic compute fabric to support MapReduce and other application paradigms. This really changes the game, recasting Hadoop as a much more powerful data-processing system and making Hadoop very different from itself 12 months ago. Now, ever wonder what it might look like in 12 months or 24 months or longer?
2:30pm-3:10pm (40m) Hadoop: Tools & Technology
Letting More Developers Dance with Elephants: What We Learned
Matt Winkler (Microsoft)
In this session we’ll discuss our experience extending Hadoop development to new platforms and languages, and key aspects of using non-JVM languages in the Hadoop environment.
4:10pm-4:50pm (40m) Hadoop: Tools & Technology
HDFS - What is New and Future
Sanjay Radia (Hortonworks), Todd Lipcon (Cloudera)
Hadoop 2.0 offers significant HDFS improvements: new append-pipeline, federation, wire compatibility, NameNode HA, performance improvements, etc. We describe the new features and their benefits, and our plans for HDFS over the next year, which include snapshots, disaster recovery, RAID, performance improvements, etc. We conclude with some of the misconceptions and myths about HDFS.
5:00pm-5:40pm (40m) Hadoop: Tools & Technology
High Availability for the HDFS NameNode: Phase 2
Aaron Myers (Cloudera, Inc.), Todd Lipcon (Cloudera)
The initial implementation of a highly-available HDFS NameNode successfully removed all single points of failure from HDFS. This talk discusses further improvements to this work, including automatic failure detection and failover initiation, as well as removing the dependency on an HA NFS filer.
10:50am-11:30am (40m) Hadoop & Beyond
Big Data for the Masses: How We Opened Up the Doors to Google’s Dremel
Michael Manoochehri (Google, Inc.), Jim Caputo (Google, Inc.)
Google’s Dremel is a scalable, interactive ad-hoc query system capable of running SQL-like queries over trillion-row tables in seconds. BigQuery is the externalization of this technology as a REST API and web app. This session will discuss the capabilities of Dremel and dive into the design challenges necessary to make this technology accessible and performant for developers and business users.
11:40am-12:20pm (40m) Hadoop & Beyond
How Draw Something Absorbed 50 Million New Users, in 50 Days, With Zero App Downtime
Frank Weigel (Couchbase, Inc.)
OMGPOP’s Draw Something broke all records when it went viral, skyrocketing to more than 50 million downloads and billions of drawings within a few weeks of launch – with no downtime. This session highlights the application architecture and data management technology that enabled this growth, and provides a real-time data management model for developers of any interactive web application.
1:40pm-2:20pm (40m) Hadoop & Beyond
Deconstructing the Database
Rich Hickey (Datomic)
The big data movement has highlighted the value of historical information, and storage is readily available, so why are you still using an update-in-place database? In this talk we'll deconstruct the traditional monolithic database with an eye towards leveraging the scaling properties of distributed architectures, while meeting the business needs for complete historical information.
2:30pm-3:10pm (40m) Hadoop & Beyond
Beyond Hadoop: Fast Ad-Hoc Queries on Big Data
Mike Driscoll (Metamarkets), Eric Tschetter (Metamarkets)
Hadoop is considered THE technology for addressing Big Data. While it shines as a processing platform, it does not respond anywhere close to "human time". In developing our solution, we needed the ability to query across billions of rows in seconds. Hear how and why we developed Druid, our distributed, in-memory OLAP data store after investigating various commercial and open source alternatives.
4:10pm-4:50pm (40m) Hadoop & Beyond
Trecul : Data Flow Processing Using LLVM-based JIT Compilation on Top of Hadoop
David Blair (Akamai Technologies)
Trecul is a dataflow system that powers Akamai's Online Advertising business, processing billions of events hourly. Trecul is built on top of HDFS & Hadoop Pipes to achieve fantastic runtime performance. We'll talk about its use of LLVM-based JIT compilation so everything runs as native C++ code, no Java and no runtime interpreter. Akamai has open-sourced Trecul and it is available on GitHub.
5:00pm-5:40pm (40m) Hadoop & Beyond, Hadoop: Tools & Technology
GraphBuilder – Scalable Graph Construction using Hadoop
Nilesh Jain (Intel Corp)
The exponential growth of graph-based data analysis is fueling the need for machine learning. Recently, frameworks have emerged to perform these computations at large scale. But feeding data to these frameworks is a challenge in itself. This talk introduces the GraphBuilder library for Hadoop, which makes the job easier for programmers. Several case studies showcase the utility of the library.
10:50am-11:30am (40m) Sponsored
Drill into Big Data
Tomer Shiran (MapR Technologies), Jack Norris (MapR Technologies)
Google pioneered the use of the MapReduce framework and inspired the creation of Hadoop through their 2004 white paper. To understand the future of Hadoop and the future of Big Data, it’s important to understand how Google processes and analyzes Big Data internally.
11:40am-12:20pm (40m) Sponsored
Technology Strategies for Big Data Analytics
Scott Chastain (SAS)
This presentation provides an overview of how to comprehensively address big data, including emerging strategies for information management, analytics, and high performance computing.
1:40pm-2:20pm (40m) Sponsored
Drive Smarter Decisions with Microsoft Big Data
Shawn Bice (Microsoft)
Big Data is attracting strong interest from technologists and business users alike. Yet few organizations can actually reap the benefits of Big Data today because the barriers to entry are still too high. Existing tools are complex and require deep expertise in Hadoop and Data Analysis that are both in short supply.
2:30pm-3:10pm (40m) Sponsored
Bringing Real-Time, End-to-End Analytics into Everyday Use
Greg Khairallah (Intel), Vin Sharma (Intel)
Over the next decade, organizations will need to absorb, analyze, and act upon 50 times more data than they do today. To do this, they will need a scalable infrastructure that can support data-driven discovery and decision-making in real-time.
4:10pm-4:50pm (40m) Sponsored
'Data Exponential' - K-12 Learning Analytics for Personalized Learning at Scale: Opportunities and Challenges
Roy Pea (Stanford University), Stephen Coller (Bill and Melinda Gates Foundation), H. Taylor Martin (Utah State University), Ken Koedinger (Carnegie Mellon)
Kicking off with an Ignite-style presentation on the growing importance of our topic, this panel will feature multiple perspectives on what K-12 education can learn from Big Data efforts underway in other industries.
5:00pm-5:40pm (40m) Sponsored
Human Intelligence as a Signal to Predictive Analytics
Rob Metcalf (Digital Reasoning), Laks Srinivasan (Opera Solutions)
This presentation will provide a detailed understanding of the latest techniques in entity resolution and simplified training of machine learning models and the direct impact on the quality of a comprehensive predictive analytics solution. Specific use cases in the financial services and intelligence communities will be featured.
10:50am-11:30am (40m) Sponsored
Hadoop as a Complementary Data Platform at PayPal
Moises Nascimento (PayPal), Nagaraju Chayapathi (PayPal)
PayPal utilizes Hadoop as a cost-effective data platform to handle growing data volumes. Hadoop along with other traditional data platforms serves different business needs at PayPal for customer sentiment analysis, fraud detection, market segmentation, etc. PayPal will share some early experiences with Informatica on Hadoop to move & integrate data on Hadoop & between different data platforms.
11:40am-12:20pm (40m) Sponsored
Tying the Knot Between Hadoop and EDW
David Jonker (SAP)
Opposites attract and that’s the case with Hadoop and Enterprise Data Warehouses. Both have a role to play in your Big Data projects. This session explores the various approaches to marrying Hadoop to your EDW, and why you’ll want to do that in the first place.
1:40pm-2:20pm (40m) Sponsored
Hadoop Analytics Without a Ph.D
Richard Daley (Pentaho Corporation)
Maximize the value of data stored in Hadoop via operational and ad-hoc reporting, highly interactive analysis, advanced visualizations and dashboards.
2:30pm-3:10pm (40m) Sponsored
Monitoring Cloud Data
Gary Dusbabek (Rackspace)
Monitoring thousands of servers generates a lot of data. Many organizations trying to harness the power of big data struggle with the same types of challenges as Rackspace's Cloud Monitoring team.
4:10pm-4:50pm (40m) Sponsored
How Comcast Turns Big Data into Real-Time Operational Insights
Patrick Shumate (Comcast Cable), Raanan Dagan (Splunk)
How do you keep up with the velocity and variety of data streaming in from the operational systems that power your business? What about getting analytics on your data even before you persist and replicate it?
5:00pm-5:40pm (40m) Sponsored
Designing Hadoop for the Enterprise Data Center
Jacob Rapp (Cisco Systems), Eric Sammer (Rocana)
In this joint session, experts from Cisco and Cloudera reveal the fundamental design considerations of Hadoop in the Enterprise Data Center. Drawing from lessons learned in the real world, they'll share best practices from deployments of Cloudera's Hadoop distribution alongside Cisco's networking components.
8:45am-8:55am (10m)
Wednesday Welcome
Edd Wilder-James (Silicon Valley Data Science), Alistair Croll (Solve For Interesting)
Opening remarks by the Strata program chairs, Edd Dumbill and Alistair Croll.
8:55am-9:10am (15m)
Big Answers
Mike Olson (Cloudera)
Society confronts enormous challenges today: How will we feed nine billion people? How can we diagnose and treat diseases better, and more cheaply? How will we produce more energy, more cleanly, than ever before? Big questions like these demand new approaches, and "Big Data" is a crucial part of the toolkit we will use over the coming years to answer them.
9:10am-9:15am (5m) Sponsored
The End of the Data Warehouse
Ben Werther (Platfora)
Hadoop is scalable, inexpensive and can store near-infinite amounts of data. But driving it requires exotic skills and hours of batch processing to answer straightforward questions. Learn how everything is about to change.
9:15am-9:25am (10m)
Moneyball for New York City
Michael Flowers (NYC Mayor's Office of Policy and Strategic Planning)
New York City is a complex, thriving organism. Hear how data science has played a surprising and effective role in helping the city government provide services to over 8 million people, from preventing public safety catastrophes to improving New Yorkers' quality of life.
9:25am-9:35am (10m) Sponsored
Thinking Big Together: Driving the Future of Data Science
Annika Jimenez (Pivotal), Anthony Goldbloom (Kaggle)
Data science is a team sport. Collaboration inside and outside your organization is the ultimate Big Data technique. Success depends on having a collaboration platform and solving the number one problem of the Big Data era: the supply and demand for data scientists. Learn how you can take action today to accelerate the success of your data science efforts.
9:35am-9:45am (10m)
The Composite Database
Rich Hickey (Datomic)
While moving away from single powerful servers, distributed databases still tend to be monolithic solutions. But e.g. key-value storage is rapidly becoming a commodity service, on which richer databases might be built. What are the implications?
9:45am-9:50am (5m) Sponsored
The Democratization of Big Data: Bringing Hadoop to the Masses
James Markarian (Informatica)
Data integration for Big Data projects can consume up to 80% of the development effort, and yet too many developers reinvent the wheel by hand-coding custom connectors, data parsers, and data integration transformations. A metadata-driven, codeless IDE with pre-built transformations and data quality rules has proven to be up to 10X more productive than hand coding and easier to maintain.
9:50am-10:00am (10m)
Big Data Direct – The Era of Self-driven Big Data Exploration
Sharmila Mulligan (ClearStory Data)
In recent years, "Big Data" has matured from a vague description of massive corporate data to a household term that refers to not just volume but the diversity of data and velocity of change. Today, there's a wealth of data trapped in corporate data repositories, new platforms like Hadoop, a new generation of data marketplaces and volumes generated hourly on the Web.
10:00am-10:15am (15m)
Bringing the 'So What' to Big Data
Tim Estes (Digital Reasoning)
The onset of the Big Data phenomenon has created a unique opportunity, but the challenge ahead of us is to move beyond Big Data infrastructure to morally and practically useful applications. This requires new technologies that close the "Understanding Gap" and, by doing so, can make great strides to prevent evil, reduce suffering, and create more actualized human potential.
10:20am-10:50am (30m)
Break: Morning Break sponsored by Informatica
3:10pm-4:10pm (1h)
Break: Afternoon Break sponsored by Platfora
8:00am-8:45am (45m)
Break: Coffee break sponsored by Hortonworks
5:40pm-6:40pm (1h)
Attendee Reception
Join your fellow big data enthusiasts at the Strata Conference & Hadoop World Attendee Reception on Wednesday, October 24. *Sponsored by Microsoft*
12:20pm-1:40pm (1h 20m)
Wednesday Lunch and BoFs
Birds of a Feather (BoF) sessions are informal roundtable discussions happening during lunch on both days of the conference. You can join any BoF table or start your own with a topic of your choice. The BoF sign-up board will be near the Registration area.
6:40pm-8:00pm (1h 20m)
Plenary
To be confirmed
8:00pm-11:00pm (3h)
Data After Dark
The must-attend data party of the year, Data After Dark is hosted by O'Reilly Strata at Liberty Theatre off Broadway, on Wednesday evening, October 24.

Sponsors

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities, contact Susan Stewart at sstewart@oreilly.com.

Media Partner Opportunities

For information on trade opportunities, contact Kathy Yu at mediapartners@oreilly.com.

Press and Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com.

Contact Us

View a complete list of Strata contacts.