Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY
 
1 E10 / 1 E11
9:00am Hardcore data science Ben Lorica (O'Reilly Media), Reza Zadeh (Matroid & Stanford), David Blei (Columbia University), Anima Anandkumar (UC Irvine), Hussein Mehanna (Facebook), Jennifer Chayes (Microsoft Research), Ben Recht (University of California, Berkeley), Tanzeem Choudhury (Cornell and HealthRhythms), Jenn Wortman Vaughan (Microsoft Research), Adam Marcus (B12), Stefanie Jegelka (M.I.T.), Mikhail Bilenko (Microsoft), Reynold Xin (Databricks)
1 E12 / 1 E13
9:00am PyData at Strata Travis Oliphant (Anaconda), Peter Wang (Anaconda), Kyle Kelley (Netflix), Andrew Odewahn (O'Reilly Media), Paige Bailey (Chevron), Jeff Reback (Continuum Analytics), Andy Terrel (NumFOCUS), Bryan Van de Ven (Continuum Analytics), Sarah Bird (Aptivate), James Powell (NumFOCUS), Phil Cloud (Continuum), Jason Grout (Bloomberg LP), Chris Colbert (Anaconda Powered by Continuum Analytics), Owen Zhang (DataRobot), Peter Prettenhofer (DataRobot), Damon McDougall (UT Austin), Michael Droettboom (Space Telescope Science Institute), Jim Crist (Continuum Analytics), Benjamin Zaitlen (Anaconda), Andreas Mueller (NYU, scikit-learn)
1 E14 / 1 E15
9:00am Data-driven business day Alistair Croll (Solve For Interesting), Farrah Bostic (The Difference Engine), Mark Madsen (Third Nature), Krish Venkataraman (Syncsort), Amy OConnor (Cloudera), Bill Franks (Teradata Corporation), Jake Kendall (Bill & Melinda Gates Foundation), Tricia Wang (Constellate Data), Cécile Barbaroux (Schibsted Classified Media), Kristi Marotta (Allstate), Adam Devine (WorkFusion), Rahel Jhirad (Hearst), Alexander White (Next Big Sound), Jana Eggers (Nara Logics), Vincent Dell'Anno (Accenture), Fredrik Backner (Telia Company), Bill Moschella (Evariant), Florin Trandafir (Nokia)
1 E16 / 1 E17
9:00am R Day Garrett Grolemund (RStudio), Yihui Xie (RStudio, Inc.), Nathan Stephens (RStudio, Inc.), Randall Pruim (Calvin College)
1 E18
9:00am Innovation + growth Roger Magoulas (O'Reilly Media), Roger Chen, Ari Gesher (Palantir Technologies), Hilary Mason (Fast Forward Labs), Eva Ho (Susa Ventures), Matthew Tamayo-Rios (Kryptnostic), Ann Johnson (Interana), Gary Marcus (Geometric Intelligence), Shivon Zilis (Bloomberg Beta), Jacomo Corbo (QuantumBlack), Peter Brodsky (HyperScience), Cack Wilhelm (Scale Venture Partners), Alex Rice (HackerOne), Chris Wake (Spire Global, Inc.), Harper Reed (Modest), Dennis Mortensen (x.ai), Rajiv Maheswaran (Second Spectrum), Jessica Stauth (Quantopian)
1 E19 / 1 E20 / 1 E21
9:00am Spark Camp: An introduction to Apache Spark with hands-on tutorials Anthony D. Joseph (UC Berkeley | Databricks)
3D 02/11
9:00am Hadoop application architectures: Fraud detection Gwen Shapira (Confluent), Jonathan Seidman (Cloudera), Ted Malaska (Blizzard Entertainment), Mark Grover (Lyft)
1:30pm Building a Hadoop data application Tom White (Cloudera), Ryan Blue (Cloudera)
3D 03/10
9:00am Data science for Wall Street Sean Owen (Cloudera), Juliet Hougland (Cloudera), Sandy Ryza (Clover Health)
1:30pm Architecting a data platform Stephen O'Sullivan (Silicon Valley Data Science), John Akred (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science)
3D 04/09
9:00am Many streams lead to Kafka - An event data workshop Jesse Anderson (Big Data Institute), Ewen Cheslack-Postava (Confluent)
3D 05/08
9:00am Data 101 Marie Beaugureau (O'Reilly Media, Inc.), Paco Nathan (O'Reilly Media), Tim Berglund (Confluent), Edd Wilder-James (Google), Matthew Gee (Impact Lab/University of Chicago), Yael Garten (LinkedIn), Katie Kent (Galvanize)
1:30pm Developing a modern enterprise data strategy Scott Kurth (Silicon Valley Data Science), Edd Wilder-James (Google)
Hall B
3D 06/07
9:00am Introduction to visualizations using D3 Brian Suda (optional.is)
1:30pm Apache Drill bootcamp Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
1 E6 / 1 E7
9:00am Machine Learning 101 Alice Zheng (Amazon), Chris DuBois (Dato), Piotr Teterwak (Dato), Srikrishna Sridhar (Dato)
3D 01/12
9:00am Spark Development Bootcamp Laurent Weichberger (OmPoint Innovations, LLC)
1B 03
9:00am Practical data science on Hadoop Brandon MacKenzie (IBM), John Rollins (IBM), Jacques Roy (IBM), Chris Fregly (PipelineAI), Mokhtar Kandil (IBM)
1B 04
9:00am Designing and building big data applications Nathan Neff (Cloudera)
6:30pm Plenary
Room: Javits North
Startup Showcase
5:00pm Opening Reception sponsored by MarkLogic, VMware, Infosys, and Continuum Analytics
Room: 3E
Opening Reception
12:30pm 12:30pm - 1:30pm Lunch, Sponsored by Intel (3A & 3B) | 3:00pm - 3:30pm Afternoon Break, sponsored by IBM (various locations)
Room: 3A & 3B
7:00am 8:00am - 9:00am Coffee Break | 10:30am - 11:00am Morning Break, sponsored by SAS (3D, 1E)
Room: Break
9:00am-5:00pm (8h) Hardcore Data Science
Hardcore data science
Ben Lorica (O'Reilly Media), Reza Zadeh (Matroid & Stanford), David Blei (Columbia University), Anima Anandkumar (UC Irvine), Hussein Mehanna (Facebook), Jennifer Chayes (Microsoft Research), Ben Recht (University of California, Berkeley), Tanzeem Choudhury (Cornell and HealthRhythms), Jenn Wortman Vaughan (Microsoft Research), Adam Marcus (B12), Stefanie Jegelka (M.I.T.), Mikhail Bilenko (Microsoft), Reynold Xin (Databricks)
All-Day: Strata's regular data science track has great talks with real-world experience from leading-edge speakers. But we didn't stop there: we added the Hardcore Data Science day to give you a chance to go even deeper. The Hardcore day will add new techniques and technologies to your data science toolbox, shared by leading data science practitioners from startups, industry, consulting...
9:00am-5:00pm (8h) Data Science & Advanced Analytics
PyData at Strata
Travis Oliphant (Anaconda), Peter Wang (Anaconda), Kyle Kelley (Netflix), Andrew Odewahn (O'Reilly Media), Paige Bailey (Chevron), Jeff Reback (Continuum Analytics), Andy Terrel (NumFOCUS), Bryan Van de Ven (Continuum Analytics), Sarah Bird (Aptivate), James Powell (NumFOCUS), Phil Cloud (Continuum), Jason Grout (Bloomberg LP), Chris Colbert (Anaconda Powered by Continuum Analytics), Owen Zhang (DataRobot), Peter Prettenhofer (DataRobot), Damon McDougall (UT Austin), Michael Droettboom (Space Telescope Science Institute), Jim Crist (Continuum Analytics), Benjamin Zaitlen (Anaconda), Andreas Mueller (NYU, scikit-learn)
Python has become an increasingly important part of the data engineering and analytics tool landscape. PyData at Strata provides in-depth coverage of the tools and techniques gaining traction with the data audience, including IPython Notebook, NumPy/matplotlib for visualization, SciPy, and scikit-learn, as well as how to scale Python performance and handle large, distributed data sets.
9:00am-5:00pm (8h) Data-driven Business
Data-driven business day
Alistair Croll (Solve For Interesting), Farrah Bostic (The Difference Engine), Mark Madsen (Third Nature), Krish Venkataraman (Syncsort), Amy OConnor (Cloudera), Bill Franks (Teradata Corporation), Jake Kendall (Bill & Melinda Gates Foundation), Tricia Wang (Constellate Data), Cécile Barbaroux (Schibsted Classified Media), Kristi Marotta (Allstate), Adam Devine (WorkFusion), Rahel Jhirad (Hearst), Alexander White (Next Big Sound), Jana Eggers (Nara Logics), Vincent Dell'Anno (Accenture), Fredrik Backner (Telia Company), Bill Moschella (Evariant), Florin Trandafir (Nokia)
All-day: For business strategists, marketers, product managers, and entrepreneurs, Data-Driven Business looks at how to use data to make better business decisions faster. Packed with case studies, panels, and eye-opening presentations, this fast-paced day focuses on how to solve today's thorniest business problems with big data. It's the missing MBA for a data-driven, always-on business world.
9:00am-5:00pm (8h) Data Science & Advanced Analytics
R Day
Garrett Grolemund (RStudio), Yihui Xie (RStudio, Inc.), Nathan Stephens (RStudio, Inc.), Randall Pruim (Calvin College)
From advanced visualization, collaboration, and reproducibility to data manipulation, R Day at Strata covers a raft of current topics that analysts and R users need to pay attention to. The R Day tutorials come from leading luminaries and R committers, the folks keeping the R ecosystem abreast of the challenges facing analysts and others who work with data.
9:00am-5:00pm (8h) Data-driven Business
Innovation + growth
Roger Magoulas (O'Reilly Media), Roger Chen, Ari Gesher (Palantir Technologies), Hilary Mason (Fast Forward Labs), Eva Ho (Susa Ventures), Matthew Tamayo-Rios (Kryptnostic), Ann Johnson (Interana), Gary Marcus (Geometric Intelligence), Shivon Zilis (Bloomberg Beta), Jacomo Corbo (QuantumBlack), Peter Brodsky (HyperScience), Cack Wilhelm (Scale Venture Partners), Alex Rice (HackerOne), Chris Wake (Spire Global, Inc.), Harper Reed (Modest), Dennis Mortensen (x.ai), Rajiv Maheswaran (Second Spectrum), Jessica Stauth (Quantopian)
This is a day to learn about the data innovations that have the potential to blindside even the most careful organizations. Aimed at decision makers, the Innovation + Growth program focuses on how data-oriented startups, academics, and venture capitalists approach innovation and the potential to disrupt incumbent business models.
9:00am-5:00pm (8h) Spark & Beyond
Spark Camp: An introduction to Apache Spark with hands-on tutorials
Anthony D. Joseph (UC Berkeley | Databricks)
Spark Camp provides a day-long, hands-on intro to the Spark platform, including the core API, Spark SQL, Spark Streaming, MLlib, GraphX, and more. We will cover each Spark component through a series of technical talks targeted at developers who are new to Spark, intermixed with hands-on lab work.
9:00am-12:30pm (3h 30m) Hadoop Internals & Development
Hadoop application architectures: Fraud detection
Gwen Shapira (Confluent), Jonathan Seidman (Cloudera), Ted Malaska (Blizzard Entertainment), Mark Grover (Lyft)
Looking for a deeper understanding of how to architect real-time data processing solutions? Then this tutorial is for you. In Part 1 of "Architecture Day," we will build a fraud-detection system and use it as an example to discuss considerations for building such a system, how you’d integrate various technologies, and why those choices make sense for the use case in question.
1:30pm-5:00pm (3h 30m) Production Ready Hadoop
Building a Hadoop data application
Tom White (Cloudera), Ryan Blue (Cloudera)
In the second (afternoon) half of the Architecture Day tutorial, attendees will build a data application from the ground up. As a part of the tutorial, we will demonstrate how Kite codifies the best practices from the Hadoop Architecture Day morning session.
9:00am-12:30pm (3h 30m) Data Science & Advanced Analytics
Data science for Wall Street
Sean Owen (Cloudera), Juliet Hougland (Cloudera), Sandy Ryza (Clover Health)
In this tutorial, attendees will get a taste of how large-scale data science techniques and technologies developed for the consumer internet can be applied in the world of finance. We will guide an exploration of the relationship between the traffic on Wikipedia pages and the movement of stock prices.
1:30pm-5:00pm (3h 30m) Spark & Beyond
Architecting a data platform
Stephen O'Sullivan (Silicon Valley Data Science), John Akred (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science)
What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
9:00am-12:30pm (3h 30m) Data Innovations
Many streams lead to Kafka - An event data workshop
Jesse Anderson (Big Data Institute), Ewen Cheslack-Postava (Confluent)
This is a hands-on workshop where you’ll learn how to leverage the capabilities of Kafka to collect, manage, and process stream data for big data projects and general purpose enterprise data integration needs alike. When your data is captured in real-time and available as real-time subscriptions, you can start to compute new datasets in real-time off these original feeds.
1:30pm-5:00pm (3h 30m) IoT & Real-time
Process, store, and analyze like a boss with Team Apache: Kafka, Spark, and Cassandra
Patrick McFadin (DataStax)
This tutorial is all about managing large volumes of data coming at your data center fast and continuously. If you don't have a strategy, then allow me to help. Amazing Apache Project software can make this problem a lot easier to deal with. Spend a few hours and learn about how each part works, and how they work together. Your users will thank you.
9:00am-12:30pm (3h 30m) Business & Innovation
Data 101
Marie Beaugureau (O'Reilly Media, Inc.), Paco Nathan (O'Reilly Media), Tim Berglund (Confluent), Edd Wilder-James (Google), Matthew Gee (Impact Lab/University of Chicago), Yael Garten (LinkedIn), Katie Kent (Galvanize)
Whether starting a data science program, reaching the breaking point with your current data technology, or figuring out what the competition is up to, these sessions will give you a bird's-eye view of data technologies, techniques, and data-driven organizations.
1:30pm-5:00pm (3h 30m) Data-driven Business
Developing a modern enterprise data strategy
Scott Kurth (Silicon Valley Data Science), Edd Wilder-James (Google)
Big data and data science have great potential for accelerating business, but how do you reconcile the opportunity with the sea of possible technologies? Conventional data strategy has little to guide us, focusing more on governance than on creating new value. In this tutorial, we explain how to create a modern data strategy that powers data-driven business.
9:00am-5:00pm (8h) Cultivate
Cultivate: Leading Through Culture
We’re at the cusp of a new network age. The companies defining it are fast, flat, and flexible. They devour data and focus obsessively on their customers. “Analyze and adapt” is their Standing Operating Procedure. At Cultivate, they’ll tell you how they do it—and how you can, too.
9:00am-12:30pm (3h 30m) Design, User Experience, & Visualization
Introduction to visualizations using D3
Brian Suda (optional.is)
The term data visualization can mean anything from charts and graphs to infographics to big data and everything in between. In this tutorial, we’ll look at the basics of how to design with data, specifically using the industry-standard D3 library. By the end, you'll be able to create data visualizations with your own data sets.
1:30pm-5:00pm (3h 30m) Spark & Beyond
Apache Drill bootcamp
Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
Apache Drill is an open source distributed SQL engine for Hadoop, NoSQL databases, and other services. Drill's unique schema-free JSON data model enables self-service data exploration and analysis by eliminating the need to define/maintain schemas and transform data. This is a comprehensive hands-on tutorial that will enable you to start exploring and analyzing your data in place, wherever it is.
9:00am-5:00pm (8h) Data Science & Advanced Analytics
Machine Learning 101
Alice Zheng (Amazon), Chris DuBois (Dato), Piotr Teterwak (Dato), Srikrishna Sridhar (Dato)
This hands-on, beginner-friendly tutorial provides a quick start to building intelligent business applications using machine learning. Learn about machine learning basics, feature engineering, recommender systems, and deep learning. The program includes hands-on portions to build and deploy large-scale machine learning applications.
9:00am-5:00pm (8h) Training
Spark Development Bootcamp
Laurent Weichberger (OmPoint Innovations, LLC)
This three-day curriculum features advanced lectures and hands-on technical exercises for Spark usage in data exploration, analysis, and building big data applications.
9:00am-5:00pm (8h) Training
Practical data science on Hadoop
Brandon MacKenzie (IBM), John Rollins (IBM), Jacques Roy (IBM), Chris Fregly (PipelineAI), Mokhtar Kandil (IBM)
In this three-day course, you will:
* Learn how to use machine learning, text analysis, and real-time analytics to solve frequently encountered, high-value business problems
* Understand data science methodology and the end-to-end workflow of solving a problem, including data preparation, model building and validation, and model deployment
* Use Apache Spark and other tools for analytics
9:00am-5:00pm (8h) Training
Designing and building big data applications
Nathan Neff (Cloudera)
Cloudera University’s three-day course for designing and building big data applications prepares you to analyze and solve real-world problems using Apache Hadoop and associated tools in the enterprise data hub (EDH).
6:30pm-8:00pm (1h 30m) Events
Startup Showcase
What new companies are at the leading edge of the data space? Meet some of the best, most innovative founders as they demonstrate their game-changing ideas at the Startup Showcase.
5:00pm-6:30pm (1h 30m) Events
Opening Reception
Grab a drink, mingle with fellow Strata + Hadoop World participants, and see the latest technologies and products from leading companies in the data space.
12:30pm-1:30pm (1h)
Break: 12:30pm - 1:30pm Lunch, Sponsored by Intel (3A & 3B) | 3:00pm - 3:30pm Afternoon Break, sponsored by IBM (various locations)
7:00am-9:00am (2h)
Break: 8:00am - 9:00am Coffee Break | 10:30am - 11:00am Morning Break, sponsored by SAS (3D, 1E)