Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Tutorials

On Tuesday, September 26, choose from all-day and half-day tutorials. These expert-led presentations give you a chance to dive deep into the subject matter. Please note: to attend, your registration package must include tutorials on Tuesday. Tutorial registration does not include access to training courses.

Tuesday, September 26

9:00am–12:30pm Tuesday, September 26, 2017
Location: 1A 12/14 Level: Intermediate
Secondary topics:  Deep learning
Vartika Singh (Cloudera), Jeffrey Shmain (Cloudera)
Average rating: **...
(2.50, 6 ratings)
Vartika Singh and Jeffrey Shmain walk you through various approaches using the machine learning algorithms available in Spark ML to understand and decipher meaningful patterns in real-world data. Vartika and Jeff also demonstrate how to use open source deep learning frameworks with Spark to run classification problems on image and text datasets.
9:00am–12:30pm Tuesday, September 26, 2017
Location: 1E 12/13 Level: Intermediate
Secondary topics:  Architecture
John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
Average rating: ***..
(3.27, 11 ratings)
What are the essential components of a data platform? John Akred and Stephen O'Sullivan explain how the various parts of the Hadoop, Spark, and big data ecosystems fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
9:00am–12:30pm Tuesday, September 26, 2017
Location: 1E 14 Level: Intermediate
Secondary topics:  Streaming
Ian Wrigley (StreamSets)
Average rating: ****.
(4.50, 4 ratings)
Ian Wrigley demonstrates how Kafka Connect and Kafka Streams can be used together to build real-world, real-time streaming data pipelines. Using Kafka Connect, you'll ingest data from a relational database into Kafka topics as the data is being generated and then process and enrich the data in real time using Kafka Streams before writing it out for further analysis.
9:00am–12:30pm Tuesday, September 26, 2017
Location: 1A 18 Level: Intermediate
Secondary topics:  Deep learning, ecommerce
Mo Patel (Teradata), Junxia Li (Think Big Analytics)
Junxia Li and Mo Patel demonstrate how to apply deep learning to improve consumer recommendations by training neural nets to learn categories of interest for recommendations using embeddings. You'll also learn how to achieve wide and deep learning with WALS matrix factorization, now used in production for the Google Play store.
9:00am–12:30pm Tuesday, September 26, 2017
Location: 1A 21/22 Level: Intermediate
Yufeng Guo (Google), Amy Unruh (Google)
Average rating: **...
(2.00, 9 ratings)
Yufeng Guo and Amy Unruh walk you through training and deploying a machine learning system using TensorFlow, a popular open source library. Yufeng and Amy take you from a conceptual overview all the way to building complex classifiers and explain how you can apply deep learning to complex problems in science and industry.
9:00am–12:30pm Tuesday, September 26, 2017
Location: 1E 10 Level: Intermediate
Secondary topics:  Architecture, Cloud
Jennifer Wu (Cloudera), Fahd Siddiqui (Cloudera), Paul George (Cloudera), Eugene Fratkin (Cloudera)
Average rating: *....
(1.50, 2 ratings)
Jennifer Wu, Paul George, Fahd Siddiqui, and Eugene Fratkin lead a deep dive into running data engineering workloads in a managed service capacity in the public cloud. Along the way, they share AWS infrastructure best practices and explain how data engineering workloads interoperate with data analytic workloads.
9:00am–12:30pm Tuesday, September 26, 2017
Location: 1E 15/16 Level: Intermediate
Matthew Rocklin (Anaconda), Ben Zaitlen (Anaconda)
Average rating: *****
(5.00, 1 rating)
The Python data science stack, which includes NumPy, pandas, and scikit-learn, is efficient and intuitive but only for in-memory data and a single core. Matthew Rocklin and Ben Zaitlen demonstrate how to parallelize and scale your Python workloads to multicore machines and multimachine clusters.
9:00am–12:30pm Tuesday, September 26, 2017
Location: 1A 23/24 Level: Beginner
Secondary topics:  Cloud
Pranav Rastogi (Microsoft)
Average rating: **...
(2.50, 2 ratings)
As big data solutions are rapidly moving to the cloud, it's becoming increasingly important to know how to use Apache Hadoop, Spark, R Server, and other open source technologies in the cloud. Pranav Rastogi walks you through building big data applications on Azure HDInsight and other Azure services.
9:00am–5:00pm Tuesday, September 26, 2017
Location: 1A 08/10
Secondary topics:  Text
Brooke Wenig (Databricks)
Brooke Wenig introduces you to Apache Spark 2.0 core concepts with a focus on Spark's machine learning library, using text mining on real-world data as the primary end-to-end use case.
9:00am–5:00pm Tuesday, September 26, 2017
Location: 1E 07/08
Bradford Cross (DCVC), Robert Passarella (Alpha Features), Jason Morton (Ascendant), Leigh Drogen (Estimize), Bob Levy (Virtual Cove, Inc.), Abraham Thomas (Quandl), Alistair Croll (Solve For Interesting), Vincent-Charles Hodder (Local Logic), Priya Koul (American Express), Tanvi Singh (Credit Suisse), José Ribau (CIBC), Michael Beal (Data Capital Management), Jike Chong (Tsinghua University | Acorns)
Finance is information. From analyzing risk and detecting fraud to predicting payments and improving customer experience, data technologies are transforming the financial industry. And we're diving deep into this change with a new day of data-meets-finance talks, tailored for Strata Data Conference events in the world's financial hubs.
9:00am–12:30pm Tuesday, September 26, 2017
Location: 1E 11
Dan Roesch (Roesch & Associates LLC), Edd Wilder-James (Google), Mikio Braun (Zalando SE), Javier Esplugas (DHL Supply Chain), Kevin Parent (Conduce), Jim Scott (MapR Technologies), Melanie Warrick (Google), Sarah Manning (Etsy)
Data 101 introduces you to core principles of data architecture, teaches you how to build and manage successful data teams, and inspires you to do more with your data through real-world applications. Setting the foundation for deeper dives on the following days of Strata + Hadoop World, Data 101 reinforces data fundamentals and helps you focus on how data can solve your business problems.
1:30pm–5:00pm Tuesday, September 26, 2017
Location: 1E 11 Level: Intermediate
John Akred (Silicon Valley Data Science), Heather Nelson (Silicon Valley Data Science)
Average rating: ****.
(4.50, 2 ratings)
John Akred and Heather Nelson share methods and observations from three years of effectively deploying data science in enterprise organizations. You'll learn how to build, run, and get the most value from data science teams and how to work with and plan for the needs of the business.
1:30pm–5:00pm Tuesday, September 26, 2017
Location: 1E 15/16 Level: Intermediate
Secondary topics:  Architecture, Cloud
Ryan Nienhuis (Amazon Web Services), Radhika Ravirala (Amazon Web Services (AWS)), Allan MacInnis (Amazon Web Services), Ben Snively (Amazon Web Services (AWS))
Average rating: ****.
(4.00, 2 ratings)
Want to learn how to use Amazon's big data web services to launch your first big data application on the cloud? Ryan Nienhuis, Radhika Ravirala, Allan MacInnis, and Ben Snively walk you through building a big data application using a combination of open source technologies and AWS managed services.
1:30pm–5:00pm Tuesday, September 26, 2017
Location: 1E 12/13 Level: Advanced
Secondary topics:  Architecture
Jonathan Seidman (Cloudera), Gwen Shapira (Confluent), Mark Grover (Lyft)
Average rating: ****.
(4.11, 9 ratings)
Using Customer 360 and the IoT as examples, Jonathan Seidman, Mark Grover, and Gwen Shapira explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, using components like Kafka, Impala, Kudu, Spark Streaming, and Spark SQL with Hadoop to enable new forms of data processing and analytics.
1:30pm–5:00pm Tuesday, September 26, 2017
Location: 1A 23/24 Level: Intermediate
Secondary topics:  Deep learning, Pydata, Text
David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed)
Natural language processing is a key component in many data science systems that must understand or reason about text. David Talby, Claudiu Branzan, and Alex Thomas lead a hands-on tutorial on scalable NLP using spaCy for building annotation pipelines, TensorFlow for training custom machine-learned annotators, and Spark ML and TensorFlow for using deep learning to build and apply word embeddings.
1:30pm–5:00pm Tuesday, September 26, 2017
Location: 1E 10 Level: Advanced
Secondary topics:  R
Jared Lander (Lander Analytics)
Average rating: ***..
(3.25, 4 ratings)
Modern statistics has become almost synonymous with machine learning, a collection of techniques that utilize today's incredible computing power. Jared Lander walks you through the available methods for implementing machine learning algorithms in R and explores underlying theories such as the elastic net, boosted trees, and cross-validation.
1:30pm–5:00pm Tuesday, September 26, 2017
Location: 1A 18 Level: Intermediate
Secondary topics:  Cloud
Mark Donsky (Cloudera), Manish Ahluwalia (Nerdwallet), Andre Araujo (Cloudera), Syed Rafice (Cloudera)
Average rating: *****
(5.00, 1 rating)
Mark Donsky, André Araujo, Syed Rafice, and Manish Ahluwalia walk you through securing a Hadoop cluster. You'll start with a cluster with no security and then add security features related to authentication, authorization, encryption of data at rest, encryption of data in transit, and complete data governance.
1:30pm–5:00pm Tuesday, September 26, 2017
Location: 1E 14 Level: Beginner
Secondary topics:  Architecture, Streaming
Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Arun Kejariwal (MZ), Neng Lu (Twitter), Sijie Guo (Streamlio)
Average rating: ***..
(3.00, 3 ratings)
Karthik Ramasamy, Sanjeev Kulkarni, Avrilia Floratau, Ashvin Agrawal, Arun Kejariwal, and Sijie Guo walk you through state-of-the-art streaming systems, algorithms, and deployment architectures, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them.
1:30pm–5:00pm Tuesday, September 26, 2017
Location: 1A 21/22 Level: Beginner
Secondary topics:  Deep learning
Julia Lintern (Metis)
Julia Lintern offers a deep dive into deep learning with Keras, beginning with basic neural nets before exploring convolutional neural nets and recurrent neural nets. Along the way, Julia explains both the design theory behind and the Keras implementations of today's most widely used deep learning algorithms.
1:30pm–5:00pm Tuesday, September 26, 2017
Location: 1A 12/14 Level: Intermediate
Secondary topics:  Deep learning, Healthcare
Josh Patterson (Skymind), Vartika Singh (Cloudera), Dave Kale (Skymind), Tom Hanlon (Skymind)
Average rating: **...
(2.00, 1 rating)
Josh Patterson, Vartika Singh, David Kale, and Tom Hanlon walk you through interactively developing and training deep neural networks to analyze digital health data using the Cloudera Workbench and Deeplearning4j (DL4J). You'll learn how to use the Workbench to rapidly explore real-world clinical data, build data-preparation pipelines, and launch training of neural networks.