Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA
 
LL20 A
9:00am PyData at Strata (Full Day) T.J. Alumbaugh (Continuum Analytics), James Powell (NumFOCUS), Bryan Van de Ven (Continuum Analytics), Sarah Bird (Continuum Analytics), Jake VanderPlas (eScience Institute, University of Washington), Katrina Riehl (Continuum Analytics)
LL20 C
9:00am R Day (Full Day) Garrett Grolemund (RStudio), Nina Zumel (Win-Vector LLC), John Mount (Win-Vector LLC), Stephen Elston (Quantia Analytics, LLC)
LL20 D
9:00am Introduction to visualizations using D3 (Half Day) Brian Suda (optional.is)
1:30pm Hadoop application architectures: Fraud detection (Half Day) Jonathan Seidman (Cloudera), Ted Malaska (Capital One), Gwen Shapira (Confluent), Mark Grover (Lyft)
LL21 B
9:00am Apache Hadoop operations for production systems (Full Day) Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera), Darren Lo (Cloudera), Jordan Hambleton (Cloudera)
LL21 C/D
9:00am Architecting a data platform (Half Day) John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Data Whisperers), Gary Dusbabek (Silicon Valley Data Science)
1:30pm Developing a modern enterprise data strategy (Half Day) Edd Wilder-James (Google), Scott Kurth (Silicon Valley Data Science)
LL21 E/F
9:00am Spark camp: Exploring Wikipedia with Spark (Full Day) Sameer Farooqui (Databricks)
210 A/E
9:00am Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX (Half Day) Jayant Shekhar (Sparkflows Inc.), Amandeep Khurana (Cloudera), Krishna Sankar (U.S. Bank), Vartika Singh (Cloudera)
1:30pm Building data pipelines with Apache Kafka (Half Day) Joseph Adler (Facebook), Ewen Cheslack-Postava (Confluent), Ian Wrigley (StreamSets)
210 C/G
9:00am Sponsored by IBM Hardcore data science (Full Day)
210 D/H
9:00am Data-driven business day (Full Day) Alistair Croll (Solve For Interesting)
LL20 B
9:00am Data 101 (Half Day) Marie Beaugureau (O'Reilly Media, Inc.)
1:30pm A practitioner’s guide to securing your Hadoop cluster (Half Day) Mubashir Kazia (Cloudera), Benjamin Spivey (Cloudera), Sravya Tirukkovalur (Cloudera), Michael Yoder (Cloudera)
LL21 A
9:00am Introduction to Apache Kafka (Half Day) Jesse Anderson (Big Data Institute), Ewen Cheslack-Postava (Confluent), Joseph Adler (Facebook), Ian Wrigley (StreamSets)
210 B/F
9:00am Practical machine learning (Full Day) Chris DuBois (Dato), Brian Kent (Dato), Srikrishna Sridhar (Dato), Piotr Teterwak (Dato)
Hilton, Almaden Ballroom
6:30pm Sponsored by IBM/Cloudant Startup Showcase | Room: Grand Ballroom 220
12:30pm - 1:30pm Lunch | 3:00pm - 3:30pm Afternoon Break | Room: 230 A-C
5:00pm Sponsored by Platfora Opening Reception | Room: Expo Hall
8:00am - 9:00am Coffee Break | 10:30am - 11:00am Morning Break | Room: Foyer
9:00am-5:00pm (8h) Data Science & Advanced Analytics
PyData at Strata (Full Day)
T.J. Alumbaugh (Continuum Analytics), James Powell (NumFOCUS), Bryan Van de Ven (Continuum Analytics), Sarah Bird (Continuum Analytics), Jake VanderPlas (eScience Institute, University of Washington), Katrina Riehl (Continuum Analytics)
Python has become an increasingly important part of the data-engineer and analytic-tool landscapes. PyData at Strata provides in-depth coverage of the tools and techniques gaining traction with the data audience, including IPython Notebook, NumPy/matplotlib, SciPy, and scikit-learn, and explores how to scale Python performance, including handling large, distributed datasets.
9:00am-5:00pm (8h) Data Science & Advanced Analytics
R Day (Full Day)
Garrett Grolemund (RStudio), Nina Zumel (Win-Vector LLC), John Mount (Win-Vector LLC), Stephen Elston (Quantia Analytics, LLC)
From advanced visualization, collaboration, and reproducibility to big data, R Day at Strata covers a raft of current topics that analysts and R users need to pay attention to. The R Day tutorials come from leading luminaries and R committers, the folks keeping the R ecosystem apace with the challenges facing analysts and others who work with data.
9:00am-12:30pm (3h 30m) Visualization & User Experience
Introduction to visualizations using D3 (Half Day)
Brian Suda (optional.is)
The term "data visualization" can mean anything from charts and graphs to infographics to big data and everything in between. Brian Suda explores the basics of how to design with data, specifically using the industry-standard D3 library. By the end of Brian's tutorial, you'll be able to create data visualizations with your own datasets.
1:30pm-5:00pm (3h 30m) Hadoop Internals & Development
Hadoop application architectures: Fraud detection (Half Day)
Jonathan Seidman (Cloudera), Ted Malaska (Capital One), Gwen Shapira (Confluent), Mark Grover (Lyft)
Jonathan Seidman, Ted Malaska, Gwen Shapira, and Mark Grover walk participants through building a fraud-detection system, using an end-to-end case study to provide a concrete example of how to architect and implement real-time systems via Apache Hadoop components like Kafka, HBase, Impala, and Spark.
9:00am-5:00pm (8h) Enterprise Adoption
Apache Hadoop operations for production systems (Full Day)
Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera), Darren Lo (Cloudera), Jordan Hambleton (Cloudera)
In this full-day tutorial, participants will get an overview of all aspects of successfully managing Hadoop clusters—from installation to configuration management, service monitoring, troubleshooting, and support integration—with an emphasis on production systems.
9:00am-12:30pm (3h 30m) Spark & Beyond
Architecting a data platform (Half Day)
John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Data Whisperers), Gary Dusbabek (Silicon Valley Data Science)
What are the essential components of a data platform? John Akred, Stephen O'Sullivan, and Gary Dusbabek explain how the various parts of the Hadoop, Spark, and big data ecosystems fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
1:30pm-5:00pm (3h 30m) Data-driven Business
Developing a modern enterprise data strategy (Half Day)
Edd Wilder-James (Google), Scott Kurth (Silicon Valley Data Science)
Big data and data science have great potential for accelerating business, but how do you reconcile the business opportunity with the sea of possible technologies? Conventional data strategy has little to guide us, focusing more on governance than on creating new value. Edd Wilder-James and Scott Kurth explain how to create a modern data strategy that powers data-driven business.
9:00am-5:00pm (8h) Spark & Beyond
Spark camp: Exploring Wikipedia with Spark (Full Day)
Sameer Farooqui (Databricks)
The real power and value proposition of Apache Spark is in building a unified use case that combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. Through hands-on examples, Sameer Farooqui explores various Wikipedia datasets to illustrate a variety of ideal programming paradigms.
9:00am-12:30pm (3h 30m) Spark & Beyond, Machine learning
Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX (Half Day)
Jayant Shekhar (Sparkflows Inc.), Amandeep Khurana (Cloudera), Krishna Sankar (U.S. Bank), Vartika Singh (Cloudera)
Jayant Shekhar, Amandeep Khurana, Krishna Sankar, and Vartika Singh guide participants through techniques for building machine-learning apps using Spark MLlib and Spark ML and demonstrate the principles of graph processing with Spark GraphX.
1:30pm-5:00pm (3h 30m) IoT and Real-time
Building data pipelines with Apache Kafka (Half Day)
Joseph Adler (Facebook), Ewen Cheslack-Postava (Confluent), Ian Wrigley (StreamSets)
Joseph Adler, Ewen Cheslack-Postava, and Ian Wrigley demonstrate the features of Apache Kafka that make it easy to build fast, secure, and reliable data pipelines and explain how to use Copycat, Kafka Streams, and Kafka Security as they coach you through building a working enterprise data pipeline.
9:00am-5:00pm (8h) Hardcore Data Science
Hardcore data science (Full Day)
Ben Lorica leads a full day of hardcore data science, exploring emerging topics and new areas of study made possible by vast troves of raw data and cutting-edge architectures for analyzing and exploring information. Along the way, leading data science practitioners teach new techniques and technologies to add to your data science toolbox.
9:00am-5:00pm (8h) Data-driven Business Day
Data-driven business day (Full Day)
Alistair Croll (Solve For Interesting)
Alistair Croll leads a full day of case studies, panels, and eye-opening presentations that explain how to use data to make better business decisions faster. Tailored to business strategists, marketers, product managers, and entrepreneurs, this fast-paced day focuses on how to solve today's thorniest business problems with big data. It's the missing MBA for a data-driven, always-on business world.
9:00am-12:30pm (3h 30m) Data-driven Business
Data 101 (Half Day)
Marie Beaugureau (O'Reilly Media, Inc.)
Data 101 introduces you to core principles of data architecture, teaches you how to build and manage successful data teams, and inspires you to do more with your data through real-world applications. Setting the foundation for deeper dives on the following days of Strata + Hadoop World, Data 101 reinforces data fundamentals and helps you focus on how data can solve your business problems.
1:30pm-5:00pm (3h 30m) Security
A practitioner’s guide to securing your Hadoop cluster (Half Day)
Mubashir Kazia (Cloudera), Benjamin Spivey (Cloudera), Sravya Tirukkovalur (Cloudera), Michael Yoder (Cloudera)
Mubashir Kazia, Ben Spivey, Sravya Tirukkovalur, and Michael Yoder guide participants through the process of securing a Hadoop cluster. Participants will start with a Hadoop cluster with no security and then add security features related to authentication, authorization, encryption of data at rest, encryption of data in transit, and complete data governance.
9:00am-12:30pm (3h 30m) IoT and Real-time
Introduction to Apache Kafka (Half Day)
Jesse Anderson (Big Data Institute), Ewen Cheslack-Postava (Confluent), Joseph Adler (Facebook), Ian Wrigley (StreamSets)
Ewen Cheslack-Postava, Joseph Adler, Jesse Anderson, and Ian Wrigley show how to use Apache Kafka to collect, manage, and process stream data for big data projects and general-purpose enterprise data-integration needs alike. Once your data is captured in real time and available as real-time subscriptions, you can start to compute new datasets in real time from these original feeds.
1:30pm-5:00pm (3h 30m) IoT and Real-time
An introduction to time series with Team Apache (Half Day)
Patrick McFadin (DataStax)
Patrick McFadin gives a comprehensive overview of the powerful Team Apache: Apache Kafka, Spark, and Cassandra. Patrick demonstrates data models, covers deployment considerations, and explains code for different requirements.
9:00am-5:00pm (8h) Data Science & Advanced Analytics, Machine learning
Practical machine learning (Full Day)
Chris DuBois (Dato), Brian Kent (Dato), Srikrishna Sridhar (Dato), Piotr Teterwak (Dato)
This hands-on tutorial provides a quick start to building intelligent business applications using machine learning. Learn about machine-learning basics, feature engineering, anomaly detection, recommender systems, and deep learning as you are guided through all the steps of prototyping and production: data cleaning, feature engineering, model building and evaluation, and deployment.
9:00am-5:00pm (8h) Data-driven Business
Emerging technology (Full Day)
Learn about the data innovations that have the potential to blindside even the most careful organizations. Aimed at decision makers, the Emerging Technology program focuses on how data-oriented startups, academics, and venture capitalists approach innovation and the potential for innovative technology to disrupt incumbent business models.
6:30pm-8:00pm (1h 30m) Event
Startup Showcase
What new companies are at the leading edge of the data space? Meet some of the best, most innovative founders as they demonstrate their game-changing ideas at the Startup Showcase.
12:30pm-1:30pm (1h)
Break: 12:30pm - 1:30pm Lunch | 3:00pm - 3:30pm Afternoon Break
5:00pm-6:30pm (1h 30m) Event
Opening Reception
Grab a drink, mingle with fellow Strata + Hadoop World participants, and see the latest technologies and products from leading companies in the data space.
8:00am-9:00am (1h)
Break: 8:00am - 9:00am Coffee Break | 10:30am - 11:00am Morning Break