Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA
 
LL20 A
9:00am Data Case Studies Barbara Eckman (Comcast), Dirk Jungnickel (Emirates Integrated Telecommunications Company (du)), Kishore Papineni (Astellas Pharma), Paul Barth (Podium Data), Carlo Torniai (Pirelli Tyre), Bryan Harrison (American Express), Chris Murphy (Zurich Insurance Group), Martin Lidl (Deloitte), Maura Lynch (Pinterest), Nixon Patel (Kovid Group), Bas Geerdink (ING), Robin Li (Tapjoy), Yohan Chin (Tapjoy), Jim Harrold (NationBuilder), Lana Novikova (Heartbeat AI Technologies)
LL20 C
9:00am Data 101 Edd Wilder-James (Silicon Valley Data Science), Ellen Friedman (Independent), Jim Scott (MapR Technologies), Gabriela de Queiroz (R-Ladies), Melanie Warrick (Google), Aneesh Karve (Quilt Data, Inc)
1:30pm Determining the economic value of your data William Schmarzo (Dell EMC)
LL20 D
9:00am Building your first big data application on AWS Radhika Ravirala (Amazon Web Services (AWS)), Ryan Nienhuis (Amazon Web Services (AWS)), Ben Snively (Amazon Web Services (AWS)), Dario Rivera (Amazon Web Services (AWS))
1:30pm Scalable deep learning for the enterprise with DL4J Dave Kale (Skymind), Susan Eraly (Skymind), Josh Patterson (Skymind)
LL21 B
9:00am Just enough Scala for Spark Dean Wampler (Lightbend)
1:30pm Guerrilla guide to Python and Apache Hadoop Juliet Hougland (Cloudera)
LL21 C/D
9:00am Using R for scalable data analytics: From single machines to Hadoop Spark clusters Vanja Paunic (Microsoft), Robert Horton (Microsoft), Hang Zhang (Microsoft), Srini Kumar (LevaData, Inc.), Mengyue Zhao (Microsoft), John-Mark Agosta (Microsoft), Mario Inchiosa (Microsoft), Debraj GuhaThakurta (Microsoft Corporation)
1:30pm Modeling big data with R, sparklyr, and Apache Spark John Mount (Win-Vector LLC)
LL21 E/F
9:00am Getting started with TensorFlow Amy Unruh (Google), Yufeng Guo (Google)
1:30pm Architecting a next-generation data platform Jonathan Seidman (Cloudera), Ted Malaska (Blizzard), Mark Grover (Cloudera), Gwen Shapira (Confluent)
210 A/E
9:00am Learn stream processing with Apache Beam Frances Perry (Google), Tyler Akidau (Google)
210 C/G
9:00am Exploration and visualization of large, complex datasets with R, Hadoop, and Spark Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting)
1:30pm Introduction to visualizations using D3 Brian Suda (optional.is)
210 D/H
9:00am Architecting a data platform John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
1:30pm Architecting and building enterprise-class Spark and Hadoop in cloud environments James Malone (Google), John Mikula (Google Cloud)
LL21 A
9:00am Deploying and operating big data analytic apps on the public cloud Jennifer Wu (Cloudera), Eugene Fratkin (Cloudera), Andrei Savu (Cloudera), Tony Wu (Cloudera)
1:30pm A practitioner’s guide to securing your Hadoop cluster Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera)
210 B/F
9:00am Unraveling data with Spark using machine learning Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera)
1:30pm Developing a modern enterprise data strategy Edd Wilder-James (Silicon Valley Data Science), Scott Kurth (Silicon Valley Data Science)
LL20 B
9:00am Data, Transportation, Logistics Michael Abbott (Kleiner Perkins Caufield & Byers), Christopher Pouliot (Nio), Jennifer Anderson, Renee DiResta (Haven), Coco Krumme (Haven | UC Berkeley), Ryan Baumann (Mapbox), Jay White Bear (IBM), Andre Luckow (BMW Group), Rajiv Paul (Yakit), Evangelos Simoudis (Synapse Partners), Roland Major (Transport for London), Rodrigo Fontecilla (Unisys), Lloyd Palum (Vnomics), Andreas Ribbrock (#zeroG, A Lufthansa Systems Company)
San Jose Ballroom, Marriott
9:00am sponsored by Huawei Technologies Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML Andy Konwinski (Databricks)
6:30pm Startup Showcase | Room: Grand Ballroom
5:00pm Sponsored by Clearstory & GitHub Opening Reception | Room: Hall 1, 2, 3
12:30pm Lunch | Room: 230 A-C
7:30am Coffee break (7:30am - 9am) | Room: LL Foyer and Executive Concourse
10:30am Morning break sponsored by Google | Room: Executive Concourse
3:00pm Afternoon break | Room: Executive Concourse
8:15am Speed Networking | Room: East Lobby
9:00am-5:00pm (8h)
Data Case Studies
Barbara Eckman (Comcast), Dirk Jungnickel (Emirates Integrated Telecommunications Company (du)), Kishore Papineni (Astellas Pharma), Paul Barth (Podium Data), Carlo Torniai (Pirelli Tyre), Bryan Harrison (American Express), Chris Murphy (Zurich Insurance Group), Martin Lidl (Deloitte), Maura Lynch (Pinterest), Nixon Patel (Kovid Group), Bas Geerdink (ING), Robin Li (Tapjoy), Yohan Chin (Tapjoy), Jim Harrold (NationBuilder), Lana Novikova (Heartbeat AI Technologies)
In a series of 12 half-hour talks aimed at a business audience, you’ll hear data-themed case studies from household brands and global companies, explaining the challenges they wanted to tackle, the approaches they took, and the benefits—and drawbacks—of their solutions. If you want practical insights about applied data, look no further.
9:00am-12:30pm (3h 30m)
Data 101
Edd Wilder-James (Silicon Valley Data Science), Ellen Friedman (Independent), Jim Scott (MapR Technologies), Gabriela de Queiroz (R-Ladies), Melanie Warrick (Google), Aneesh Karve (Quilt Data, Inc)
Data 101 introduces you to core principles of data architecture, teaches you how to build and manage successful data teams, and inspires you to do more with your data through real-world applications. Setting the foundation for deeper dives on the following days of Strata + Hadoop World, Data 101 reinforces data fundamentals and helps you focus on how data can solve your business problems.
1:30pm-5:00pm (3h 30m) Data-driven business management, Strata Business Summit
Determining the economic value of your data
William Schmarzo (Dell EMC)
Organizations need a model to measure how effectively they are using data and analytics. Once they know where they are and where they need to go, they then need a framework to determine the economic value of their data. William Schmarzo explores techniques for getting business users to “think like a data scientist” so they can assist in identifying data that makes the best performance predictors.
9:00am-12:30pm (3h 30m) Big data and the Cloud Cloud
Building your first big data application on AWS
Radhika Ravirala (Amazon Web Services (AWS)), Ryan Nienhuis (Amazon Web Services (AWS)), Ben Snively (Amazon Web Services (AWS)), Dario Rivera (Amazon Web Services (AWS))
Want to ramp up your knowledge of Amazon's big data web services and launch your first big data application on the cloud? Ben Snively, Radhika Ravirala, Ryan Nienhuis, and Dario Rivera walk you through building a big data application using open source technologies, such as Apache Hadoop, Spark, and Zeppelin, and AWS managed services, such as Amazon EMR, Amazon Kinesis, and more.
1:30pm-5:00pm (3h 30m) Data science & advanced analytics Deep learning
Scalable deep learning for the enterprise with DL4J
Dave Kale (Skymind), Susan Eraly (Skymind), Josh Patterson (Skymind)
Dave Kale, Susan Eraly, and Josh Patterson explain how to build, train, and deploy neural networks using Deeplearning4j. Topics include the fundamentals of deep learning, ND4J and DL4J, and scalable training using GPUs and Apache Spark. You'll gain hands-on experience with several models, including convolutional and recurrent neural nets.
9:00am-12:30pm (3h 30m) Spark & beyond
Just enough Scala for Spark
Dean Wampler (Lightbend)
Apache Spark is written in Scala. Hence, many if not most data engineers adopting Spark are also adopting Scala, while most data scientists continue to use Python and R. Dean Wampler offers an overview of the core features of Scala you need to use Spark effectively, using hands-on exercises with the Spark APIs.
1:30pm-5:00pm (3h 30m) Data science & advanced analytics Pydata
Guerrilla guide to Python and Apache Hadoop
Juliet Hougland (Cloudera)
Using an interactive demo format with accompanying online materials and data, data scientist Juliet Hougland offers a practical overview of the basics of using Python data tools with a Hadoop cluster.
9:00am-12:30pm (3h 30m) Data science & advanced analytics R
Using R for scalable data analytics: From single machines to Hadoop Spark clusters
Vanja Paunic (Microsoft), Robert Horton (Microsoft), Hang Zhang (Microsoft), Srini Kumar (LevaData, Inc.), Mengyue Zhao (Microsoft), John-Mark Agosta (Microsoft), Mario Inchiosa (Microsoft), Debraj GuhaThakurta (Microsoft Corporation)
Join in to learn how to do scalable, end-to-end data science in R on single machines as well as on Spark clusters. You'll be assigned an individual Spark cluster with all contents preloaded and software installed and use it to gain experience building, operationalizing, and consuming machine-learning models using distributed functions in R.
1:30pm-5:00pm (3h 30m) Data science & advanced analytics R
Modeling big data with R, sparklyr, and Apache Spark
John Mount (Win-Vector LLC)
Sparklyr provides an R interface to Spark. With sparklyr, you can manipulate Spark datasets, bring them into R for analysis and visualization, and orchestrate distributed machine learning in Spark from R with the Spark MLlib and H2O Sparkling Water libraries. John Mount demonstrates how to use sparklyr to analyze big data in Spark.
9:00am-12:30pm (3h 30m) Data science & advanced analytics Deep learning
Getting started with TensorFlow
Amy Unruh (Google), Yufeng Guo (Google)
Amy Unruh and Yufeng Guo walk you through training and deploying a machine-learning system using TensorFlow, a popular open source library. Amy and Yufeng begin by giving an overview of TensorFlow and demonstrating some fun, already-trained TensorFlow models.
1:30pm-5:00pm (3h 30m) Hadoop platform and applications Architecture
Architecting a next-generation data platform
Jonathan Seidman (Cloudera), Ted Malaska (Blizzard), Mark Grover (Cloudera), Gwen Shapira (Confluent)
Using Entity 360 as an example, Jonathan Seidman, Ted Malaska, Mark Grover, and Gwen Shapira explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, using components like Kafka, Impala, Kudu, Spark Streaming, and Spark SQL with Hadoop to enable new forms of data processing and analytics.
9:00am-12:30pm (3h 30m) Stream processing and analytics Streaming
Learn stream processing with Apache Beam
Frances Perry (Google), Tyler Akidau (Google)
Come learn the basics of stream processing via a guided walkthrough of the most sophisticated and portable stream processing model on the planet—Apache Beam (incubating). Tyler Akidau and Frances Perry cover the basics of robust stream processing with the option to execute exercises on top of the runner of your choice—Flink, Spark, or Google Cloud Dataflow.
1:30pm-5:00pm (3h 30m) Stream processing and analytics Streaming
Building real-time data pipelines with Apache Kafka
Ian Wrigley (Confluent)
Ian Wrigley demonstrates how Kafka Connect and Kafka Streams can be used together to build real-world, real-time streaming data pipelines. Using Kafka Connect, you'll ingest data from a relational database into Kafka topics as the data is being generated and then process and enrich the data in real time using Kafka Streams before writing it out for further analysis.
9:00am-12:30pm (3h 30m) Data science & advanced analytics, Visualization & user experience R
Exploration and visualization of large, complex datasets with R, Hadoop, and Spark
Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting)
Divide and recombine techniques provide scalable methods for exploration and visualization of otherwise intractable datasets. Stephen Elston and Ryan Hafen lead a series of hands-on exercises to help you develop skills in exploration and visualization of large, complex datasets using R, Hadoop, and Spark.
1:30pm-5:00pm (3h 30m) Visualization & user experience
Introduction to visualizations using D3
Brian Suda (optional.is)
Visualizations are a key part of conveying any dataset. D3 is the most popular, easiest, and most extensible way to get your data online in an interactive way. Brian Suda outlines best practices for good data visualizations and explains how you can build them using D3.
9:00am-12:30pm (3h 30m) Spark & beyond Architecture
Architecting a data platform
John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
What are the essential components of a data platform? John Akred and Stephen O'Sullivan explain how the various parts of the Hadoop, Spark, and big data ecosystems fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
1:30pm-5:00pm (3h 30m) Big data and the Cloud, Spark & beyond Architecture, Cloud
Architecting and building enterprise-class Spark and Hadoop in cloud environments
James Malone (Google), John Mikula (Google Cloud)
James Malone explores using managed Spark and Hadoop solutions in public clouds alongside cloud products for storage, analysis, and message queues to meet enterprise requirements via the Spark and Hadoop ecosystem.
9:00am-12:30pm (3h 30m) Big data and the Cloud Architecture, Cloud
Deploying and operating big data analytic apps on the public cloud
Jennifer Wu (Cloudera), Eugene Fratkin (Cloudera), Andrei Savu (Cloudera), Tony Wu (Cloudera)
Jennifer Wu, Eugene Fratkin, Andrei Savu, and Tony Wu explore best practices for Hadoop deployments in the public cloud and provide detailed guidance for deploying, configuring, and managing Hive, Spark, and Impala in the public cloud.
1:30pm-5:00pm (3h 30m) Platform Security and Cybersecurity
A practitioner’s guide to securing your Hadoop cluster
Mark Donsky (Cloudera), Andre Araujo (Cloudera), Michael Yoder (Cloudera), Manish Ahluwalia (Cloudera)
Mark Donsky, André Araujo, Michael Yoder, and Manish Ahluwalia walk you through securing a Hadoop cluster. You’ll start with a cluster with no security and then add security features related to authentication, authorization, encryption of data at rest, encryption of data in transit, and complete data governance.
9:00am-12:30pm (3h 30m) Spark & beyond
Unraveling data with Spark using machine learning
Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera)
Vartika Singh, Jayant Shekhar, and Jeffrey Shmain walk you through approaches that use the machine-learning algorithms available in the Spark framework (and more) to find and interpret meaningful patterns in real-world data in order to derive value.
1:30pm-5:00pm (3h 30m) Data-driven business management, Strata Business Summit
Developing a modern enterprise data strategy
Edd Wilder-James (Silicon Valley Data Science), Scott Kurth (Silicon Valley Data Science)
Big data and data science have great potential for accelerating business, but how do you reconcile the business opportunity with the sea of possible technologies? Data should serve the strategic imperatives of a business—those aspirations that will define an organization’s future vision. Scott Kurth and Edd Wilder-James explain how to create a modern data strategy that powers data-driven business.
9:00am-5:00pm (8h)
Data, Transportation, Logistics
Michael Abbott (Kleiner Perkins Caufield & Byers), Christopher Pouliot (Nio), Jennifer Anderson, Renee DiResta (Haven), Coco Krumme (Haven | UC Berkeley), Ryan Baumann (Mapbox), Jay White Bear (IBM), Andre Luckow (BMW Group), Rajiv Paul (Yakit), Evangelos Simoudis (Synapse Partners), Roland Major (Transport for London), Rodrigo Fontecilla (Unisys), Lloyd Palum (Vnomics), Andreas Ribbrock (#zeroG, A Lufthansa Systems Company)
Data, Transportation, and Logistics Day offers a daylong deep dive into how data science is changing transportation and logistics. We’ll investigate the latest advances in and applications of self-driving vehicles, automated drones, and embedded sensors and explore how new uses of data are challenging the industry to evolve infrastructure for the future.
9:00am-5:00pm (8h) Spark & beyond Streaming, Text
Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML
Andy Konwinski (Databricks)
Andy Konwinski introduces you to Apache Spark 2.0 core concepts with a focus on Spark's machine-learning library, using text mining on real-world data as the primary end-to-end use case.
6:30pm-8:00pm (1h 30m) Event
Startup Showcase
What new companies are at the leading edge of the data space? Meet some of the best, most innovative founders as they demonstrate their game-changing ideas at the Startup Showcase.
5:00pm-6:30pm (1h 30m) Event
Opening Reception
Grab a drink, mingle with fellow Strata + Hadoop World attendees, and see the latest technologies and products from leading companies in the data space.
12:30pm-1:30pm (1h)
Break: Lunch
7:30am-8:15am (45m)
Break: Coffee break (7:30am - 9am)
10:30am-11:00am (30m)
Break: Morning break sponsored by Google
3:00pm-3:30pm (30m)
Break: Afternoon break
8:15am-8:45am (30m) Event
Speed Networking
Gather before tutorials on Tuesday morning for a speed networking event. Enjoy casual conversation while meeting fellow attendees.