Make Data Work
Oct 15–17, 2014 • New York, NY
 

Strata + Hadoop World 2014 Schedule


1 E8/1 E9
9:00am Data Science at the Command Line Jeroen Janssens (Data Science Workshops)
1:30pm Getting Started with HBase Application Development Sridhar Reddy (MapR Technologies), Carol McDonald (MapR Technologies)
1 C03/1 C04
9:00am Building Privacy Protected Data Systems Ari Gesher (Palantir Technologies), John Grant (Palantir Technologies), Courtney Bowman (Palantir Technologies)
1:30pm Becoming a Scalable Data Scientist Carlos Guestrin (Apple | University of Washington), Alice Zheng (Amazon), Shawn Scully (Dato)
Hall A 23/24
9:00am Spark Camp Paco Nathan (O'Reilly Media), Michael Armbrust (Databricks), Tathagata Das (Databricks), Matei Zaharia (Databricks), Reynold Xin (Databricks), Ameet Talwalkar (Determined AI), Holden Karau (IBM), Joseph Bradley (Databricks), Sameer Farooqui (Databricks), Patrick Wendell (Databricks)
1 E20/1 E21
9:00am Data-Driven Business Day Alistair Croll (Solve For Interesting), Farrah Bostic (The Difference Engine), Edd Wilder-James (Silicon Valley Data Science), Jennifer Zeszut (Beckon), Brian Dalessandro (Zocdoc), Jana Eggers (Nara Logics), Joe Caserta (Caserta Concepts), Joy Beatty (Seilevel), Kim Rees (Periscopic), Peter Ferns (Goldman Sachs & Co), Brigitte Piniewski (nonaffiliated), Nellwyn Thomas (Etsy), Michael Rosenbaum (Pegged Software), Merici Vinton (OI Engine @ IDEO), Mary Ann Wayer (Premier Inc), Rohit Jain (Esgyn), Amy Gaskins (Panopticon), Jen van der Meer (Reason Street), Mark Doms (United States Department of Commerce), Halle Tecco (Rock Health)
1 E10/1 E11
9:00am Architectural Considerations for Hadoop Applications Mark Grover (Lyft), Jonathan Seidman (Cloudera), Gwen Shapira (Confluent), Ted Malaska (Blizzard Entertainment)
1:30pm Building A Data Platform Stephen O'Sullivan (Silicon Valley Data Science), John Akred (Silicon Valley Data Science), Richard Williamson (Silicon Valley Data Science)
1 E12/1 E13
9:00am PyData at Strata Fernando Perez (UC Berkeley and Lawrence Berkeley National Laboratory), Brian Granger (Cal Poly San Luis Obispo), Andy Terrel (NumFOCUS), Peter Wang (Anaconda), Jake Vanderplas (eScience Institute, University of Washington), Olivier Grisel (Inria & scikit-learn), Travis Oliphant (Anaconda), Wes McKinney (Two Sigma Investments), Trent Nelson (Continuum Analytics), Kayur Patel (Google), Kester Tong (Google)
1 E14/1 E15
9:00am Hardcore Data Science Ben Lorica (O'Reilly Media), Ted Dunning (MapR Technologies), Tim Kraska (Brown University), Alice Zheng (Amazon), Anna Gilbert (University of Michigan), Jon Kleinberg (Cornell University), Kira Radinsky (eBay | Technion), Rob Fergus (New York University and Facebook), Ben Recht (University of California, Berkeley), Brian Whitman (Spotify), Hanna Wallach (Microsoft Research NYC & University of Massachusetts Amherst), Dafna Shahaf (The Hebrew University of Jerusalem)
1 E6/1 E7
9:00am D3.js Tutorial - D3 For Everyone! Sebastian Gutierrez (DashingD3js.com)
1:30pm Just Enough Math Paco Nathan (O'Reilly Media), Allen Day (MapR Technologies)
1 D03/1 D04
9:00am Industrial Internet Jon Bruner (O'Reilly Media), Daniel Koffler (Rio Tinto Alcan), Ami Daniel (Windward), David Simchi-Levi (MIT), Victor Fang (Pivotal), Yu Cao (EMC), Nathan Oostendorp (Sight Machine), Alasdair Allan (Babilim Light Industries), Cameron Turner (The Data Guild), Leo Spiegel (Pivotal), Edy Liongosari (Accenture), Mark Grabb (General Electric Global Research Center)
1 E16/1 E17
9:00am R Day Hadley Wickham (Rice University / RStudio), Winston Chang (RStudio), Garrett Grolemund (RStudio), Joseph Allaire (RStudio, Inc.), Yihui Xie (RStudio, Inc.)
1 E05
9:00am Owning Time Series With Team Apache: Cassandra, Spark, Spark Streaming, and Kafka Patrick McFadin (DataStax), Helena Edelson (Apple)
12:30pm Lunch
Room: North Hall and Hall 1A
5:00pm Plenary: Startup Showcase
Room: North Hall Mezzanine
7:00am Coffee Break
Room: Hall E
9:00am-12:30pm (3h 30m) Data Science
Data Science at the Command Line
Jeroen Janssens (Data Science Workshops)
The command line, although invented decades ago, remains an amazing environment for doing data science. By combining small, yet powerful, command-line tools you can quickly obtain, scrub, explore, visualize, and model your data. In this hands-on tutorial you will gain a solid understanding of how to leverage the power of the command line and integrate it into your existing data science workflow.
1:30pm-5:00pm (3h 30m) Hadoop in Action
Getting Started with HBase Application Development
Sridhar Reddy (MapR Technologies), Carol McDonald (MapR Technologies)
This tutorial will help you get a jump start on HBase development. We’ll start with a quick overview of HBase, the HBase data model, and architecture, and then we’ll dive directly into code to help you understand how to build HBase applications. We will also offer guidelines for good schema design, and will cover a few advanced concepts such as using HBase for transactions.
9:00am-12:30pm (3h 30m) Business & Industry, Law, Ethics & Open Data, Security
Building Privacy Protected Data Systems
Ari Gesher (Palantir Technologies), John Grant (Palantir Technologies), Courtney Bowman (Palantir Technologies)
Technologists focused on privacy and civil liberties will run through the material in their book. The workshop will cover how to think about privacy, privacy protection properties that a system can have and the architectures that implement them, related issues in information security, and privacy issues in data collection.
1:30pm-5:00pm (3h 30m) Data Science
Becoming a Scalable Data Scientist
Carlos Guestrin (Apple | University of Washington), Alice Zheng (Amazon), Shawn Scully (Dato)
This tutorial focuses on hands-on data science skills from prototyping to production. Using GraphLab tools, we walk through multiple case studies such as fraud detection, social network analysis, and building personalized recommendation services.
9:00am-5:00pm (8h) Hadoop & Beyond
Spark Camp
Paco Nathan (O'Reilly Media), Michael Armbrust (Databricks), Tathagata Das (Databricks), Matei Zaharia (Databricks), Reynold Xin (Databricks), Ameet Talwalkar (Determined AI), Holden Karau (IBM), Joseph Bradley (Databricks), Sameer Farooqui (Databricks), Patrick Wendell (Databricks)
Spark Camp, organized by the creators of the Apache Spark project at Databricks, will be a day-long, hands-on introduction to the Spark platform, including Spark Core, the Spark Shell, Spark Streaming, Spark SQL, MLlib, and more.
9:00am-5:00pm (8h) Data-Driven Business Day
Data-Driven Business Day
Alistair Croll (Solve For Interesting), Farrah Bostic (The Difference Engine), Edd Wilder-James (Silicon Valley Data Science), Jennifer Zeszut (Beckon), Brian Dalessandro (Zocdoc), Jana Eggers (Nara Logics), Joe Caserta (Caserta Concepts), Joy Beatty (Seilevel), Kim Rees (Periscopic), Peter Ferns (Goldman Sachs & Co), Brigitte Piniewski (nonaffiliated), Nellwyn Thomas (Etsy), Michael Rosenbaum (Pegged Software), Merici Vinton (OI Engine @ IDEO), Mary Ann Wayer (Premier Inc), Rohit Jain (Esgyn), Amy Gaskins (Panopticon), Jen van der Meer (Reason Street), Mark Doms (United States Department of Commerce), Halle Tecco (Rock Health)
All-Day: For business strategists, marketers, product managers, and entrepreneurs, Data-Driven Business looks at how to use data to make better business decisions faster. Packed with case studies, panels, and eye-opening presentations, this fast-paced day focuses on how to solve today's thorniest business problems with Big Data. It's the missing MBA for a data-driven, always-on business world.
9:00am-12:30pm (3h 30m) Hadoop in Action
Architectural Considerations for Hadoop Applications
Mark Grover (Lyft), Jonathan Seidman (Cloudera), Gwen Shapira (Confluent), Ted Malaska (Blizzard Entertainment)
Are you looking for a deeper understanding of how to integrate components in the Apache Hadoop ecosystem to implement data management and processing solutions? Then this tutorial is for you. We'll provide a clickstream analytics example illustrating how to architect solutions with Apache Hadoop, along with best practices and recommendations for using Hadoop and related tools.
1:30pm-5:00pm (3h 30m) Hadoop Platform
Building A Data Platform
Stephen O'Sullivan (Silicon Valley Data Science), John Akred (Silicon Valley Data Science), Richard Williamson (Silicon Valley Data Science)
What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive and realtime analytical workloads.
9:00am-5:00pm (8h) Data Science
PyData at Strata
Fernando Perez (UC Berkeley and Lawrence Berkeley National Laboratory), Brian Granger (Cal Poly San Luis Obispo), Andy Terrel (NumFOCUS), Peter Wang (Anaconda), Jake Vanderplas (eScience Institute, University of Washington), Olivier Grisel (Inria & scikit-learn), Travis Oliphant (Anaconda), Wes McKinney (Two Sigma Investments), Trent Nelson (Continuum Analytics), Kayur Patel (Google), Kester Tong (Google)
Python has become an increasingly important part of the data engineering and analytics tool landscape. PyData at Strata provides in-depth coverage of the tools and techniques gaining traction with the data audience, including IPython Notebook, NumPy/matplotlib for visualization, SciPy, scikit-learn, and how to scale Python performance, including how to handle large, distributed data sets.
9:00am-5:00pm (8h) Hardcore Data Science
Hardcore Data Science
Ben Lorica (O'Reilly Media), Ted Dunning (MapR Technologies), Tim Kraska (Brown University), Alice Zheng (Amazon), Anna Gilbert (University of Michigan), Jon Kleinberg (Cornell University), Kira Radinsky (eBay | Technion), Rob Fergus (New York University and Facebook), Ben Recht (University of California, Berkeley), Brian Whitman (Spotify), Hanna Wallach (Microsoft Research NYC & University of Massachusetts Amherst), Dafna Shahaf (The Hebrew University of Jerusalem)
All-Day: Strata's regular data science track has great talks with real-world experience from leading-edge speakers. But we didn't stop there: we added the Hardcore Data Science day to give you a chance to go even deeper. The Hardcore day will add new techniques and technologies to your data science toolbox, shared by leading data science practitioners from startups, industry, consulting...
9:00am-12:30pm (3h 30m) Design & Interfaces
D3.js Tutorial - D3 For Everyone!
Sebastian Gutierrez (DashingD3js.com)
D3.js has a very steep learning curve. However, there are three main concepts that, once you get your head around them, will make the climb much easier. Focusing on these three main concepts, we will walk through many examples to teach the fundamental building blocks of D3.js.
1:30pm-5:00pm (3h 30m) Business & Industry
Just Enough Math
Paco Nathan (O'Reilly Media), Allen Day (MapR Technologies)
Advanced math for business people: “just enough math” to take advantage of new classes of open source frameworks. Many take college math up to calculus, but never learn how to approach sparse matrices, complex graphs, or supply chain optimizations. This tutorial ties these pieces together into a conceptual whole, with use cases and simple Python code, as a new approach to computational thinking.
9:00am-5:00pm (8h) Business & Industry
Industrial Internet
Jon Bruner (O'Reilly Media), Daniel Koffler (Rio Tinto Alcan), Ami Daniel (Windward), David Simchi-Levi (MIT), Victor Fang (Pivotal), Yu Cao (EMC), Nathan Oostendorp (Sight Machine), Alasdair Allan (Babilim Light Industries), Cameron Turner (The Data Guild), Leo Spiegel (Pivotal), Edy Liongosari (Accenture), Mark Grabb (General Electric Global Research Center)
Big Data is reaching beyond the Internet and into the machines that drive our world. Visit Industrial Internet day to gain insights from the way that power plants, factories, cars, and airplanes make use of sensors and software intelligence to improve operations and help managers make good decisions.
9:00am-5:00pm (8h) Data Science
R Day
Hadley Wickham (Rice University / RStudio), Winston Chang (RStudio), Garrett Grolemund (RStudio), Joseph Allaire (RStudio, Inc.), Yihui Xie (RStudio, Inc.)
From advanced visualization, collaboration, and reproducibility to data manipulation, R Day at Strata covers a raft of current topics that analysts and R users need to pay attention to. The R Day tutorials come from leading luminaries and R committers, the folks keeping the R ecosystem apace with the challenges facing analysts and others who work with data.
9:00am-12:30pm (3h 30m) Hadoop & Beyond
Owning Time Series With Team Apache: Cassandra, Spark, Spark Streaming, and Kafka
Patrick McFadin (DataStax), Helena Edelson (Apple)
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data. Add in Apache Spark and Kafka, and you have an amazing time series solution. We will discuss data models and walk through deployment and code to build a functional, real-time application. Languages used: Java, Scala.
12:30pm-1:30pm (1h)
Break: Lunch
5:00pm-6:30pm (1h 30m) Events
Startup Showcase
Don't miss Startup Showcase, Strata Conference + Hadoop World's live demo program and competition for startups and early-stage companies. The judges will pick winners from 10 finalist companies selected to present at the showcase. This event is part of NYC Data Week.
7:00am-9:00am (2h)
Break: Coffee Break