
Strata + Hadoop World Schedule

Below are the confirmed and scheduled talks at Strata + Hadoop World 2013. Note: The schedule is subject to change.

Grand Ballroom East
11:00am Running Non-MapReduce Big Data applications on Apache Hadoop Siddharth Seth (Hortonworks Inc), Hitesh Shah (Hortonworks Inc)
Sutton Center - Sutton South
11:00am Managing a Rapidly Evolving Analytics Pipeline Feng Peng (LinkTime Cloud)
11:50am Real-time Stream Processing Architecture for Comcast IP Video Chris Lintz (Comcast), Gabriel Commeau (Comcast)
2:35pm Information Security for the Data Management Professional Micheline Casey (Federal Reserve Board)
3:45pm Big Data Architectural Patterns Eddie Satterly (Splunk)
Grand Ballroom West
11:00am Building Your Analytics Shop, Step By Step Q McCallum (@qethanm), Brett Goldstein (University of Chicago)
11:50am How is a rational (big) data deployment approach like optimizing the generation mix of a power company? John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
1:45pm How Nordstrom Utilizes Human Intelligence to Blend Brick-and-Mortar with Online Commerce Erin Shellman (Nordstrom), David Von Lehman (Nordstrom)
Beekman Parlor - Sutton North
11:00am How to do Predictive Analytics with Limited Data Ulrich Rueckert (Datameer)
11:50am Building More Productive Data Science and Analytics Workflows Wes McKinney (Two Sigma Investments)
3:45pm Getting the Most Out of Time-series Data Robert Johnson (Interana)
4:35pm Data Science of Love Vaclav Petricek (eHarmony)
Regent Parlor
11:50am How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron & Big Data. Jorge A. Lopez (Amazon Web Services), Matt Brandwein (Cloudera)
1:45pm Data Science Without a Scientist Matt Schumpert (Datameer)
2:35pm SQL on Hadoop: The Secret Paul Groom (Kognitio)
3:45pm Adaptive Data Preparation™ … From Raw Data to Ready Data in Minutes, not Months Luca Barone (Cisco), Charles Zedlewski (Cloudera), Timothy Weaver (Dannon), John Garris (UBS), Prakash Nanduri (Paxata), Howard Dresner (sandhill.com), Ben Haines (Box)
Murray Hill Suite
11:50am GraphLab: Large-Scale Machine Learning on Graphs Carlos Guestrin (Apple | University of Washington), Joseph Gonzalez (UC Berkeley)
1:45pm MySQL's NoSQL Interface Dave Stokes (MySQL Community Team)
Rhinelander South
11:50am Data Blending – The Next Step in Big Data Rob Rosen (Pentaho), Andrew Robbins (Paytronix), Ross Macleod (Paytronix)
2:35pm TBC
Gramercy Suite B
11:50am Apache Hive & Stinger: Petabyte Scale SQL, IN Hadoop Arun Murthy (Hortonworks), Alan Gates (Hortonworks), Owen O'Malley (HortonWorks)
1:45pm Deeper Insight into Operational BigData Cluster Samuel Kommu (Cisco Systems)
2:35pm Driving Business Insights at Morgan Stanley: Crowd, Content, & Context Michael Dobrovolsky (Morgan Stanley Wealth Management)
3:45pm Apache Hadoop on the Open Cloud Nirmal Ranganathan (Rackspace), David Dobbins (Rackspace Hosting)
Gramercy Suite A
11:00am Four Pillars of Visualization Noah Iliinsky (Amazon Web Services)
11:50am Visualizing Big Graphs and Social Networks Richard Brath (Uncharted Software), David Jonker (Uncharted Software Inc.)
1:45pm Turkers Mapping Africa Lyndon Estes (Princeton University)
2:35pm Addressing Legacy Risks with Hadoop Ravi Hubbly (Leidos)
3:45pm Sparking Global Business Transformation with Big Data David Thompson (Western Union)
4:35pm Evolution in Characterizing Internet Usage Amie Elcan (CenturyLink)
8:45am Plenary
Room: Grand Ballroom
Wednesday Keynote Welcome Edd Wilder-James (Silicon Valley Data Science), Alistair Croll (Solve For Interesting)
8:50am Plenary
Room: Grand Ballroom
The Future of Hadoop: What Happened & What's Possible? Doug Cutting (Cloudera)
9:05am Plenary
Room: Grand Ballroom
Designing Your Data-Centric Organization Josh Klahr (Pivotal)
9:15am Plenary
Room: Grand Ballroom
Encouraging You to Change the World with Big Data David Parker (SAP)
9:20am Plenary
Room: Grand Ballroom
The Value of Social (for) TV Shawndra Hill (University of Pennsylvania)
9:30am Plenary
Room: Grand Ballroom
Ubiquitous Satellite Imagery of our Planet Will Marshall (Planet Labs)
9:40am Plenary
Room: Grand Ballroom
The Big Data Journey: Taking a holistic approach John Choi (IBM)
9:45am Plenary
Room: Grand Ballroom
Rethink How You See Data Sharmila Mulligan (ClearStory Data)
9:50am Plenary
Room: Grand Ballroom
Can Big Data Save Them? Jim Kaskade (Infochimps)
10:00am Plenary
Room: Grand Ballroom
Changing the Face of Technology - Black Girls CODE Peta Clarke (Google), Donna Knutt (Black Girls Code)
10:05am Plenary
Room: Grand Ballroom
Beyond R and Ph.D.s: The Mythology of Data Science Debunked Douglas Merrill (ZestFinance)
10:15am Plenary
Room: Grand Ballroom
Is Bigger Really Better? Predictive Analytics with Fine-grained Behavior Data Foster Provost (NYU | Stern)
10:30am Morning Break sponsored by Platfora
Room: Sponsor Pavilion
12:30pm Lunch sponsored by Pivotal
Room: America's Hall 1 & 2
Wednesday Lunchtime Industry BoFs
8:00am Coffee Break
Room: Sutton Foyer
Wednesday Coffee BoFs
3:15pm Afternoon Break sponsored by ClearStory Data
Room: Sutton Foyer
Hadoop World
11:00am-11:40am (40m) Hadoop Platform
Running Non-MapReduce Big Data applications on Apache Hadoop
Siddharth Seth (Hortonworks Inc) et al
Apache Hadoop has become popular from its specialization in the execution of MapReduce programs. However, it has been hard to leverage existing Hadoop infrastructure for various other processing paradigms such as real-time streaming, graph processing and message-passing. Learn how this barrier was removed and how new applications are being built and run on Apache Hadoop.
Hadoop World
11:50am-12:30pm (40m) Hadoop Platform
Unifying Your Data Management Platform with Hadoop: Batch and Real-time Machine Data Ingest, Alerts, and Analytics
Jayant Shekhar (Sparkflows Inc.)
Hadoop has evolved significantly in recent years, today serving as a unified platform for near-real-time (NRT) and batch workflows, such as querying, analysis and alerting for logs and machine data. In this session, we'll dive into the details of using SolrCloud and Cloudera Impala together to serve search queries, by integrating Flume to stream events into Solr, Impala and HBase.
Hadoop World
1:45pm-2:25pm (40m) Hadoop Platform
SAS on Your Cluster, Serving your Data (Analysts)
Paul Kent (SAS)
Analytically focused organizations are building general purpose Hadoop Clusters and want to deploy a wide range of Analytic Software. As the level of data sharing goes up and the variety of tools used to access data increases, you’ll be faced with choices: what format to store your data in; what catalog to describe the data and its layouts; and how/when/where to decide between tools.
Hadoop World
11:00am-11:40am (40m) Hadoop in Action
Managing a Rapidly Evolving Analytics Pipeline
Feng Peng (LinkTime Cloud)
At Twitter our Hadoop-centric data analytics pipeline has been rapidly growing in terms of both size and complexity. With thousands of evolving data sources and analytics programs, orchestrating the analytics production becomes extremely difficult without a systematic solution. We will describe our production challenges and illustrate how the service we built helps us address them.
Hadoop World
11:50am-12:30pm (40m) Hadoop in Action
Real-time Stream Processing Architecture for Comcast IP Video
Chris Lintz (Comcast) et al
Real-time analytics produced by IP video players help ensure that Comcast delivers the highest quality experience to customers. While ingesting as many messages as Tweets produced every day, these real-time insights are achieved through an in-house architecture leveraging Flume NG and Storm.
Hadoop World
1:45pm-2:25pm (40m) Hadoop in Action
Ancestry.com: Managing Big Data Reaching Back to the 11th Century with Hadoop
Scott Sorensen (Ancestry.com)
New, affordable DNA sequencing will generate massive new flows of data. Ancestry.com currently manages 4 petabytes of searchable data and is on track to increase this figure exponentially with its new DNA product. Ancestry.com CTO, Scott Sorensen, explains how the company manages tremendous amounts of new data through two categories of Hadoop use cases: 1) analytics and 2) product features.
2:35pm-3:15pm (40m) Enterprise Data
Information Security for the Data Management Professional
Micheline Casey (Federal Reserve Board)
Traditionally, security has tended to mean "lock down and protect". But there is a balance between securing data and still supporting information sharing and reuse. This presentation is meant to educate data management professionals at all levels on how to manage this balance.
3:45pm-4:25pm (40m) Enterprise Data
Big Data Architectural Patterns
Eddie Satterly (Splunk)
In this session you will hear from big data experts with real world experience on the architectural patterns and tools integrations used to solve real business problems with data.
4:35pm-5:15pm (40m) Enterprise Data
Monetize Your Data Firehose Like a High-Frequency Trader
Volkmar Uhlig (Adello)
Machine-generated data is getting stale fast. Operational data, sensor data, or video feeds require new automated approaches to capture value. In this session we will show how to apply the lessons learned from automated trading systems and high-frequency trading to today’s Big Data problems to monetize information.
11:00am-11:40am (40m) Enterprise Data
Building Your Analytics Shop, Step By Step
Q McCallum (@qethanm) et al
Data analysis has become a key element of a business, yet there is painfully little guidance for leadership roles who are tasked with building and managing this critical function. We've spoken with various companies to get their take on how to build an analytics shop, and we'd like to share that information with you.
11:50am-12:30pm (40m) Enterprise Data
How is a rational (big) data deployment approach like optimizing the generation mix of a power company?
John Akred (Silicon Valley Data Science) et al
A modern CIO rationalizing a company’s data architecture must consider a mix of deployment options like a utility executive has to invest in a good generation mix. We articulate a framework for applying the deployment levers available to architects as they plot a course forward in this era of big data technologies, born of our deep experience implementing the world's largest data platforms.
1:45pm-2:25pm (40m) Enterprise Data
How Nordstrom Utilizes Human Intelligence to Blend Brick-and-Mortar with Online Commerce
Erin Shellman (Nordstrom) et al
Nordstrom started modestly in 1901 as a small shoe store in Seattle, and has since expanded to 117 full-line department stores and 138 Rack stores across the country. The art of retailing has changed dramatically over the last century and retailers today are concerned with understanding customer behavior and preferences both in the physical world and online.
11:00am-11:40am (40m) Data Science
How to do Predictive Analytics with Limited Data
Ulrich Rueckert (Datameer)
Even if one has big data, sometimes there is a lack of key data. This is a problem for predictive analytics: if there is only a limited amount of training material (e.g. user ratings, categorized documents), then it is hard to generate accurate models. The talk introduces new semi-supervised learning methods to overcome this problem by utilizing the vast amount of unlabeled data.
11:50am-12:30pm (40m) Data Science
Building More Productive Data Science and Analytics Workflows
Wes McKinney (Two Sigma Investments)
This talk will look at end-to-end data workflows (i.e. the sequence of preparation, analysis, visualization, and collaboration) and discuss technologies and tools (both programming and UI-driven) that can help individuals and organizations do more with their data.
1:45pm-2:25pm (40m) Data Science
Text Analytics at Scale: Listening to 45 Million Customers
Heather Wasserlein (Intuit)
Voice of the customer (VOC) data is a rapidly growing, unstructured, untapped data source – for your web site and across social media sites. Topic discovery through clustering of user verbatims, integrated with decision support data, can unleash valuable, actionable insights from millions of customers.
2:35pm-3:15pm (40m) Data Science
How to Get Statistics Right in AB Testing: The Short Answer (With Proof from Four Years of Fundraising Data from Wikipedia)
Zack Exley (Brand New Congress) et al
There's something about AB testing that invites statistical malpractice, and that makes communication between academics and practitioners very difficult. Wikipedia's revenue depends on doing testing right. We'd like to present simple methods that we believe accurately predict future performance from AB test results, while minimizing sample size, along with proofs from four years of test data.
3:45pm-4:25pm (40m) Data Science
Getting the Most Out of Time-series Data
Robert Johnson (Interana)
Many of the world's largest datasets are time series. With today's technology the number of things in the world doesn't seem that big, but how those things change over time is. Unfortunately many data tools don't natively consider time a first-class concept. I'll be talking about a variety of ways to organize your data and architect your data systems to get the most out of your time-based data.
4:35pm-5:15pm (40m) Data Science
Data Science of Love
Vaclav Petricek (eHarmony)
Humans have a mixed record in choosing romantic partners. Are looks or brains more important for a happy marriage? This session will show you how big data and large scale machine learning can help us model such a complex behavior and tell us which traits in a partner actually matter. Who knows - maybe Hadoop will help you find Love ;-)
11:00am-11:40am (40m) Sponsored
Instant Results and Infinite Storage with SAP and Hadoop
David Parker (SAP)
Learn how solutions from SAP and our Hadoop partners can help your organization gain unprecedented insight from Big Data.
11:50am-12:30pm (40m) Sponsored
How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron & Big Data.
Jorge A. Lopez (Amazon Web Services) et al
Mainframe is Big Data too! Leveraging it in Hadoop creates a remarkable competitive advantage, but exploiting it without the right tools is nearly impossible, requiring you to wrestle with thousands of lines of Java, Pig, Hive, COBOL and more. This session presents a smarter way to ingest and process mainframe data in Hadoop, and how to bridge the technical, skill and cost gaps between the two.
1:45pm-2:25pm (40m) Sponsored
Data Science Without a Scientist
Matt Schumpert (Datameer)
With data scientists in short supply, it's surprising that much of their precious time is spent doing "data plumbing"—preparing data or servicing business users rather than doing actual data science. In this session, we'll look at the gradual evolution of tools that's moving us towards self-service data science.
2:35pm-3:15pm (40m) Sponsored
SQL on Hadoop: The Secret
Paul Groom (Kognitio)
Is Hadoop ready for high-concurrency complex BI? Even with Hadoop 2.0 on the way? Advanced analytics requires rip-roaring performance and fast, low-latency execution. Disk is not the solution; in-memory is where the hot BI data needs to reside. This informative session will offer expert advice, opinions from the "bleeding edge," and some hidden secrets from 25 years of work with big data.
3:45pm-4:25pm (40m) Sponsored
Adaptive Data Preparation™ … From Raw Data to Ready Data in Minutes, not Months
Luca Barone (Cisco) et al
Join a lively panel discussion moderated by the undisputed father of BI, Howard Dresner, featuring the emerging leaders of the Gen D Revolution: Luca Barone from Cisco, Timothy Weaver from Dannon, John Garris from UBS and Prakash Nanduri from Paxata.
Hadoop World
11:00am-11:40am (40m) Hadoop & Beyond
The Evolution of Hadoop at Stripe: Replicating MongoDB into HBase in Realtime, and How We Bolted Analytics onto an Existing System
Colin Marc (Stripe)
Most startups don't start to think about having a real analytics platform until it's too late, and Stripe is certainly no exception. In this session, I'll describe how we approached building such a platform, and walk through the steps (and missteps) we took in making our production data available in Hadoop - in realtime - for processing and querying.
Hadoop World
11:50am-12:30pm (40m) Hadoop & Beyond
GraphLab: Large-Scale Machine Learning on Graphs
Carlos Guestrin (Apple | University of Washington) et al
GraphLab is like Hadoop for graphs. Users express graph processing algorithms using a simple API and the GraphLab runtime efficiently executes that computation on multicore and distributed architectures. By leveraging advances in graph representation, asynchronous communication, and scheduling, GraphLab is able to achieve orders-of-magnitude performance gains over existing systems like Hadoop.
Hadoop World
1:45pm-2:25pm (40m) Hadoop & Beyond
MySQL's NoSQL Interface
Dave Stokes (MySQL Community Team)
MySQL 5.6 includes a NoSQL interface, using an integrated memcached daemon that can automatically store data and retrieve it from InnoDB tables, turning the MySQL server into a fast “key-value store” for single-row insert, update, or delete operations. This session explores using this interface and other 'simple' options for those with MySQL database instances seeking to explore big data access.
Hadoop World
2:35pm-3:15pm (40m) Hadoop Platform
Practical Performance Analysis and Tuning for Cloudera Impala
Greg Rahn (Cloudera)
Impala brings SQL to Hadoop, but it also brings SQL performance tuning to those using the platform. This technical session will cover several topics in Impala performance analysis to aid in answering the question “why is my query slow?” as well as practical tips and techniques to get the best performance from Impala.
Hadoop World
3:45pm-4:25pm (40m) Hadoop Platform
Hadoop Internals for Oracle Developers and DBAs
Tanel Poder (gluent.)
If you are a developer or DBA with Oracle background and want to learn how Hadoop works, this session is for you. We will go through the Hadoop HDFS and MapReduce data processing flow and compare it to the already familiar Oracle database parallel processing - which should make understanding the internals of this new technology a breeze.
Hadoop World
4:35pm-5:15pm (40m) Hadoop Platform
Trickery and Tooling for Distributed System Diagnosis and Debugging
Philip Zeyliger (Cloudera)
All is quiet on the log file front, yet the system is down. What next? This talk will cover the tricks of the trade for debugging distributed systems. Motivated by experience gained diagnosing Hadoop, we’ll dig into the JVM, Linux esoterica, and outlier visualization.
11:00am-11:40am (40m) Sponsored
Turn Hadoop Data into Business Insights: A New Approach for Rapid Exploration and Analysis
Brett Sheppard (Splunk)
Learn firsthand how a leading enterprise used Splunk and their Hadoop distribution to empower the organization with new access to Hadoop data. See how they got up and running in under an hour and enabled their developers to start writing big data apps.
11:50am-12:30pm (40m) Sponsored
Data Blending – The Next Step in Big Data
Rob Rosen (Pentaho) et al
Attend this session to learn: What is 'data blending'? How you can take this "next step" with little investment or new skills. Examples of companies taking the "next step" in big data and benefiting from at-the-source Data Blending.
1:45pm-2:25pm (40m) Sponsored
Testing Riak for Multiple Data-Center Support: A Case Study
Jim Englert (Gilt)
In July 2013 a team from Basho joined up with a team of Gilt engineers at Gilt's Dublin office to spend a few days testing how Riak would handle Gilt's production traffic on the company's main user store. In this talk Jim will discuss this process, the results of this stress test, and how Gilt--one of the top eCommerce companies in the U.S...
2:35pm-3:15pm (40m)
Session
To be confirmed
11:00am-11:40am (40m) Sponsored
The Big Data Journey: Identifying roads to success and transforming your organization
John Choi (IBM)
How can big data really help me? What's real and what's hype? How do I ensure my Big Data projects are successful? How do I get started? We will provide real world examples and heuristics from organizations successfully navigating their Big Data journey from early projects to organizational transformation.
11:50am-12:30pm (40m) Sponsored
Apache Hive & Stinger: Petabyte Scale SQL, IN Hadoop
Arun Murthy (Hortonworks) et al
Apache Hive is the de facto standard for SQL-in-Hadoop today with more enterprises relying on this open source project than any alternative. New enterprise requirements for Hive to become more real time or interactive have evolved… and the Hive community has responded. Please join Arun Murthy, Owen O'Malley and Alan Gates to learn more about Stinger and improvements to Apache Hive.
1:45pm-2:25pm (40m) Sponsored
Deeper Insight into Operational BigData Cluster
Samuel Kommu (Cisco Systems)
Is it possible to use a BigData cluster for other applications? Should the cluster be virtualized or on bare metal? Local storage or Shared? Which Hadoop version? Cisco will examine and discuss some of these concepts, to help plan and optimize a Big Data cluster running multiple applications without impacting performance.
2:35pm-3:15pm (40m) Sponsored
Driving Business Insights at Morgan Stanley: Crowd, Content, & Context
Michael Dobrovolsky (Morgan Stanley Wealth Management)
Morgan Stanley is gaining deeper insights from big data to improve operational efficiency and customer value. In this session you’ll learn how Morgan Stanley matured its big data solution with Hadoop to scale big data deployments, leverage crowd innovation, and tackle the challenges associated with big data overload and complexity.
3:45pm-4:25pm (40m) Sponsored
Apache Hadoop on the Open Cloud
Nirmal Ranganathan (Rackspace) et al
We'll discuss some of the use cases for when a virtual Hadoop cluster makes sense and share some of our experiences and some of the decisions that drove the product design of Rackspace Cloud Big Data, an upcoming HDP-as-a-service offering from Rackspace Hosting.
11:00am-11:40am (40m) Design
Four Pillars of Visualization
Noah Iliinsky (Amazon Web Services)
This talk discusses the broad design considerations necessary for effective visualizations. Attendees will learn about purpose, content, structure, and formatting. We will also discuss why they must be selected in this order, and discuss the importance and impact each has on your visualization.
11:50am-12:30pm (40m) Design
Visualizing Big Graphs and Social Networks
Richard Brath (Uncharted Software) et al
Visualizations of big graphs often look like spaghetti and can be difficult to use. Working backwards from the analytic questions, we will show some very different 2D and 3D visualizations for social networks. We'll also cover some of the challenges and discuss some open source tools.
1:45pm-2:25pm (40m) Data, Connectivity, and Society
Turkers Mapping Africa
Lyndon Estes (Princeton University)
Knowing where farming occurs and where it will expand is crucial for understanding food security and our changing environment. However, the satellite-based maps we currently rely on are often inaccurate, particularly in Africa. Our project is harnessing open source software, big data, and crowdsourcing to create better crop field maps for Africa.
Hadoop World
2:35pm-3:15pm (40m) Hadoop in Action
Addressing Legacy Risks with Hadoop
Ravi Hubbly (Leidos)
Enterprises continue to rely on legacy mainframe-based systems even though utilizing these legacy systems is prone to risks. This is mainly because prior efforts at modernization of these legacy systems have been difficult. In this session we will discuss usage scenarios where utilizing Hadoop has assisted in modernizing legacy systems and positioned businesses for big data benefits.
Hadoop World
3:45pm-4:25pm (40m) Hadoop in Action
Sparking Global Business Transformation with Big Data
David Thompson (Western Union)
In business there are demands that, if not managed well, can cause friction. This friction can be between colleagues and it can be felt by customers and clients. Consider financial services. Leaders constantly face pressures, from meeting revenue targets and consumer needs to engaging in activities like honoring individuals’ privacy rights and protecting people and the business from fraud.
4:35pm-5:15pm (40m) Data, Connectivity, and Society
Evolution in Characterizing Internet Usage
Amie Elcan (CenturyLink)
As use of the Internet evolves, the data collected about Internet traffic must evolve in parallel to ensure the performance of applications and to keep access affordable. The ability to characterize how the Internet is being used is essential to the telecom industry. Case studies using R and Python Pandas will be presented to demonstrate the power of analytics to answer strategic questions.
8:45am-8:50am (5m)
Wednesday Keynote Welcome
Edd Wilder-James (Silicon Valley Data Science) et al
Program Chairs, Edd Dumbill and Alistair Croll, welcome you to the second day of keynotes.
8:50am-9:05am (15m)
The Future of Hadoop: What Happened & What's Possible?
Doug Cutting (Cloudera)
Doug will talk broadly about the future capability of Hadoop in the context of the road traveled so far. What are the limits of Hadoop? How should you think about workloads like SQL and Search? What's next?
9:05am-9:15am (10m) Sponsored
Designing Your Data-Centric Organization
Josh Klahr (Pivotal)
Data is coming at us from everywhere – in small quantities, large magnitudes, and in almost every format. As Pivotal’s Vice President of Data Platform Product Management, Josh Klahr has the know-how to provide insights on how to build an organization that strategically manages this data in today’s modern and complex enterprise environments.
9:15am-9:20am (5m) Sponsored
Encouraging You to Change the World with Big Data
David Parker (SAP)
Big Data is impacting society in ways never possible before – enabling us all to gain insights that can transform the way we do business, work with others, and live our lives. SAP recognizes that this transformation needs grassroots support...
9:20am-9:30am (10m)
The Value of Social (for) TV
Shawndra Hill (University of Pennsylvania)
In this keynote I will discuss how TV networks and advertisers can derive value from all of the online social activity about TV.
9:30am-9:40am (10m)
Ubiquitous Satellite Imagery of our Planet
Will Marshall (Planet Labs)
Planet Labs is launching the largest ever fleet of Earth-imaging satellites in December. These will enable high resolution imagery of the entire planet to be taken on a more frequent basis. The data is of large potential value: humanitarian applications range from monitoring deforestation and the ice caps to disaster relief and improving agriculture yields in developing nations.
9:40am-9:45am (5m) Sponsored
The Big Data Journey: Taking a holistic approach
John Choi (IBM)
What is Big Data? What will it mean for my organization? What technologies do I need? In this session, we will provide a view of what Big Data really means for organizations and how people, processes, and technologies, when brought together, can catalyze a transformational journey.
9:45am-9:50am (5m) Sponsored
Rethink How You See Data
Sharmila Mulligan (ClearStory Data)
Is your big data analysis constrained by slow cycles, specialist-only access, and a process of one-shot, big data analysis? Traditional approaches are painful, costly and tedious. See a whole new way to speed the cycle, converge and analyze diverse data, and interact on insights.
9:50am-10:00am (10m)
Can Big Data Save Them?
Jim Kaskade (Infochimps)
Data and analytics is a means to an end. Jim highlights a new revolution of analytic applications with some touching examples in the healthcare industry with cancer research and medication therapy management.
10:00am-10:05am (5m)
Changing the Face of Technology - Black Girls CODE
Peta Clarke (Google) et al
Details to come..
10:05am-10:15am (10m)
Beyond R and Ph.D.s: The Mythology of Data Science Debunked
Douglas Merrill (ZestFinance)
Most people think success in big data analysis is about the right mix of vast amounts of data, mathematics and Ph.D.’s (oh my!). Those people are wrong. You need artistry too. This talk will provide some examples of "pure" ML failures and give suggestions on how to build an appropriately artistic team.
10:15am-10:25am (10m)
Is Bigger Really Better? Predictive Analytics with Fine-grained Behavior Data
Foster Provost (NYU | Stern)
Predictive analytics is one of the most mature areas of data science and an area where "big data" often is associated with competitive advantage. However, concrete results supporting the advantage conferred by big data are few and far between.
10:30am-11:00am (30m)
Break: Morning Break sponsored by Platfora
12:30pm-1:45pm (1h 15m) Event
Wednesday Lunchtime Industry BoFs
Birds of a Feather (BoF) sessions are informal roundtable discussions happening throughout the day on Tuesday and Wednesday. Lunch BoFs will be organized around industries such as finance, media, retail, and more.
8:00am-8:45am (45m) Event
Wednesday Coffee BoFs
Have a particular topic you’d like to discuss with other Strata Conference + Hadoop World attendees during morning coffee? Join in or organize a Birds of a Feather discussion table in the Attendee Lounge (3rd floor). Sign-up board is near the Attendee Lounge.
3:15pm-3:45pm (30m)
Break: Afternoon Break sponsored by ClearStory Data

Sponsorship Opportunities

For exhibition and sponsorship opportunities, contact Susan Stewart at sstewart@oreilly.com

Media Partner Opportunities

For information on trade opportunities with O'Reilly conferences email mediapartners@oreilly.com

Press & Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com