Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Tutorials

Tuesday, March 29

9:00am–5:00pm Tuesday, 03/29/2016
Location: LL21 B
Kathleen Ting (Cloudera), Vikram Srivastava (Cloudera), Darren Lo (Cloudera), Jordan Hambleton (Cloudera)
Average rating: ***..
(3.50, 8 ratings)
In this full-day tutorial, participants will get an overview of all aspects of successfully managing Hadoop clusters, from installation to configuration management, service monitoring, troubleshooting, and support integration, with an emphasis on production systems.
9:00am–12:30pm Tuesday, 03/29/2016
Location: 210 A/E
Jayant Shekhar (Sparkflows Inc.), Amandeep Khurana (Cloudera), Krishna Sankar (U.S. Bank), Vartika Singh (Cloudera)
Average rating: **...
(2.80, 45 ratings)
Jayant Shekhar, Amandeep Khurana, Krishna Sankar, and Vartika Singh guide participants through techniques for building machine-learning apps using Spark MLlib and Spark ML and demonstrate the principles of graph processing with Spark GraphX.
9:00am–5:00pm Tuesday, 03/29/2016
Location: 210 B/F
Chris DuBois (Dato), Brian Kent (Dato), Srikrishna Sridhar (Dato), Piotr Teterwak (Dato)
Average rating: ***..
(3.21, 29 ratings)
This hands-on tutorial provides a quick start to building intelligent business applications using machine learning. Learn about machine-learning basics, feature engineering, anomaly detection, recommender systems, and deep learning as you are guided through all the steps of prototyping and production: data cleaning, feature engineering, model building and evaluation, and deployment.
9:00am–12:30pm Tuesday, 03/29/2016
Location: LL21 C/D
John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Data Whisperers), Gary Dusbabek (Silicon Valley Data Science)
Average rating: ***..
(3.96, 49 ratings)
What are the essential components of a data platform? John Akred, Stephen O'Sullivan, and Gary Dusbabek explain how the various parts of the Hadoop, Spark, and big data ecosystems fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
9:00am–12:30pm Tuesday, 03/29/2016
Location: LL21 A
Tags: real-time
Jesse Anderson (Big Data Institute), Ewen Cheslack-Postava (Confluent), Joseph Adler (Facebook), Ian Wrigley (StreamSets)
Average rating: ***..
(3.90, 21 ratings)
Ewen Cheslack-Postava, Joseph Adler, Jesse Anderson, and Ian Wrigley show how to use Apache Kafka to collect, manage, and process stream data for big data projects and general-purpose enterprise data-integration needs alike. Once your data is captured in real time and available as real-time subscriptions, you can start to compute new datasets in real time from these original feeds.
9:00am–5:00pm Tuesday, 03/29/2016
Location: LL20 C
Garrett Grolemund (RStudio), Nina Zumel (Win-Vector LLC), John Mount (Win-Vector LLC), Stephen Elston (Quantia Analytics, LLC)
Average rating: ***..
(3.88, 8 ratings)
From advanced visualization, collaboration, and reproducibility to big data, R Day at Strata covers a raft of current topics that analysts and R users need to pay attention to. The R Day tutorials come from leading luminaries and R committers, the folks keeping the R ecosystem apace of the challenges facing analysts and others who work with data.
9:00am–5:00pm Tuesday, 03/29/2016
SOLD OUT
Location: LL21 E/F
Sameer Farooqui (Databricks)
Average rating: ***..
(3.90, 41 ratings)
The real power and value proposition of Apache Spark is in building a unified use case that combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. Through hands-on examples, Sameer Farooqui explores various Wikipedia datasets to illustrate a variety of ideal programming paradigms.
9:00am–12:30pm Tuesday, 03/29/2016
Location: LL20 B
Marie Beaugureau (O'Reilly Media, Inc.)
Average rating: ****.
(4.33, 12 ratings)
Data 101 introduces you to core principles of data architecture, teaches you how to build and manage successful data teams, and inspires you to do more with your data through real-world applications. Setting the foundation for deeper dives on the following days of Strata + Hadoop World, Data 101 reinforces data fundamentals and helps you focus on how data can solve your business problems.
9:00am–5:00pm Tuesday, 03/29/2016
Location: 210 C/G
Average rating: ****.
(4.00, 21 ratings)
Ben Lorica leads a full day of hardcore data science, exploring emerging topics and new areas of study made possible by vast troves of raw data and cutting-edge architectures for analyzing and exploring information. Along the way, leading data science practitioners teach new techniques and technologies to add to your data science toolbox.
9:00am–5:00pm Tuesday, 03/29/2016
Location: 210 D/H
Alistair Croll (Solve For Interesting)
Average rating: ***..
(3.76, 17 ratings)
Alistair Croll leads a full day of case studies, panels, and eye-opening presentations that explain how to use data to make better business decisions faster. Tailored to business strategists, marketers, product managers, and entrepreneurs, this fast-paced day focuses on how to solve today's thorniest business problems with big data. It's the missing MBA for a data-driven, always-on business world.
9:00am–5:00pm Tuesday, 03/29/2016
Location: LL20 A
T.J. Alumbaugh (Continuum Analytics), James Powell (NumFOCUS), Bryan Van de Ven (Continuum Analytics), Sarah Bird (Continuum Analytics), Jake VanderPlas (eScience Institute, University of Washington), Katrina Riehl (Continuum Analytics)
Average rating: ****.
(4.33, 18 ratings)
Python has become an increasingly important part of the data-engineer and analytic-tool landscapes. PyData at Strata provides in-depth coverage of the tools and techniques gaining traction with the data audience, including IPython Notebook, NumPy/matplotlib, SciPy, and scikit-learn, and explores how to scale Python performance, including handling large, distributed datasets.
9:00am–5:00pm Tuesday, 03/29/2016
Location: Hilton, Almaden Ballroom
Average rating: ***..
(3.40, 5 ratings)
Learn about the data innovations that have the potential to blindside even the most careful organizations. Aimed at decision makers, the Emerging Technology program focuses on how data-oriented startups, academics, and venture capitalists approach innovation and the potential for innovative technology to disrupt incumbent business models.
9:00am–12:30pm Tuesday, 03/29/2016
Location: LL20 D
Brian Suda (optional.is)
Average rating: ***..
(3.33, 12 ratings)
The term "data visualization" can mean anything from charts and graphs to infographics to big data and everything in between. Brian Suda explores the basics of how to design with data, specifically using the industry-standard D3 library. By the end of Brian's tutorial, you'll be able to create data visualizations with your own datasets.
1:30pm–5:00pm Tuesday, 03/29/2016
Location: LL20 D
Jonathan Seidman (Cloudera), Ted Malaska (Capital One), Gwen Shapira (Confluent), Mark Grover (Lyft)
Average rating: ****.
(4.48, 23 ratings)
Jonathan Seidman, Ted Malaska, Gwen Shapira, and Mark Grover walk participants through building a fraud-detection system, using an end-to-end case study to provide a concrete example of how to architect and implement real-time systems via Apache Hadoop components like Kafka, HBase, Impala, and Spark.
1:30pm–5:00pm Tuesday, 03/29/2016
Location: LL20 B
Mubashir Kazia (Cloudera), Benjamin Spivey (Cloudera), Sravya Tirukkovalur (Cloudera), Michael Yoder (Cloudera)
Average rating: ****.
(4.20, 10 ratings)
Mubashir Kazia, Ben Spivey, Sravya Tirukkovalur, and Michael Yoder guide participants through the process of securing a Hadoop cluster. Participants will start with a Hadoop cluster with no security and then add security features related to authentication, authorization, encryption of data at rest, encryption of data in transit, and complete data governance.
1:30pm–5:00pm Tuesday, 03/29/2016
Location: LL21 C/D
Edd Wilder-James (Google), Scott Kurth (Silicon Valley Data Science)
Average rating: ***..
(3.59, 22 ratings)
Big data and data science have great potential for accelerating business, but how do you reconcile the business opportunity with the sea of possible technologies? Conventional data strategy has little to guide us, focusing more on governance than on creating new value. Edd Wilder-James and Scott Kurth explain how to create a modern data strategy that powers data-driven business.
1:30pm–5:00pm Tuesday, 03/29/2016
Location: 210 A/E
Tags: real-time
Joseph Adler (Facebook), Ewen Cheslack-Postava (Confluent), Ian Wrigley (StreamSets)
Average rating: ***..
(3.06, 32 ratings)
Joseph Adler, Ewen Cheslack-Postava, and Ian Wrigley demonstrate the features of Apache Kafka that make it easy to build fast, secure, and reliable data pipelines and explain how to use Copycat, Kafka Streams, and Kafka Security as they coach you through building a working enterprise data pipeline.
1:30pm–5:00pm Tuesday, 03/29/2016
Location: LL21 A
Tags: real-time
Patrick McFadin (DataStax)
Average rating: ****.
(4.07, 14 ratings)
Patrick McFadin gives a comprehensive overview of the powerful Team Apache: Apache Kafka, Spark, and Cassandra. Patrick demonstrates data models, covers deployment considerations, and explains code for different requirements.