Presented by O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Machine Learning and Data Science

If you're in data, you need to understand machine learning

Using algorithms that learn iteratively, machine learning lets you discover hidden insights in your data. It's a simple idea with phenomenal impact and sophisticated use cases such as recommenders, text mining, real-time analytics, large-scale anomaly detection, and business forecasting.

Strata is a unique opportunity to gain unparalleled depth and breadth in machine learning and deep learning. Take a look at the sessions below.

Monday & Tuesday, September 25–26: 2-Day Training Courses (Platinum & Training passes)
Locations: 1A 01/02, 1A 15/16/17
Tuesday September 26: Tutorials (Gold & Silver passes)
Locations: 1A 06/07, 1A 12/14, 1A 18, 1A 21/22, 1A 23/24
12:30pm Lunch
Wednesday September 27: Keynotes & Sessions (Gold, Silver & Bronze passes)
Locations: 1A 06/07, 1A 08/10, 1A 12/14
8:45am | Location: 3E
Strata Data Conference Keynotes
12:00pm
Wednesday Industry Tables at Lunch
6:05pm | Location: Expo Hall
Booth Crawl
7:30pm | Location: 230 Fifth Penthouse
Data After Dark
Thursday September 28: Keynotes & Sessions (Gold, Silver & Bronze passes)
Locations: 1A 06/07, 1A 08/10, 1A 12/14
8:45am | Location: San Jose Ballroom
Strata Data Conference Keynotes
12:00pm
Thursday Industry Tables at Lunch
9:00am–5:00pm Monday, September 25 & Tuesday, September 26
Location: 1A 01/02
Secondary topics: Deep learning
SOLD OUT
Dana Mastropole (The Data Incubator)
Average rating: 2.50 (2 ratings)
Dana Mastropole and Michael Li demonstrate TensorFlow's capabilities through its Python interface and explore TFLearn, a high-level deep learning library built on TensorFlow. Join in to learn how to use TFLearn and TensorFlow to build machine learning models on real-world data.
9:00am–5:00pm Monday, September 25 & Tuesday, September 26
Location: 1A 15/16/17
Secondary topics: Streaming
SOLD OUT
Joseph Kambourakis (Databricks)
Average rating: 5.00 (1 rating)
Joseph Kambourakis walks you through using Apache Spark to perform exploratory data analysis (EDA), develop machine learning pipelines, and use the APIs and algorithms available in the Spark MLlib DataFrames API.
9:00am–12:30pm Tuesday, September 26, 2017
Location: 1A 12/14 | Level: Intermediate
Secondary topics: Deep learning
Vartika Singh (Cloudera), Jeffrey Shmain (Cloudera)
Average rating: 2.50 (6 ratings)
Vartika Singh and Jeffrey Shmain walk you through various approaches that use the machine learning algorithms available in Spark ML to uncover and interpret meaningful patterns in real-world data. Vartika and Jeff also demonstrate how to use open source deep learning frameworks with Spark to solve classification problems on image and text datasets.
9:00am–12:30pm Tuesday, September 26, 2017
Location: 1A 18 | Level: Intermediate
Secondary topics: Deep learning, ecommerce
Mo Patel (Teradata), Junxia Li (Think Big Analytics)
Junxia Li and Mo Patel demonstrate how to apply deep learning to improve consumer recommendations by training neural nets that use embeddings to learn categories of interest. You'll also learn how to achieve wide and deep learning with WALS matrix factorization, an approach now used in production for the Google Play store.
9:00am–12:30pm Tuesday, September 26, 2017
Location: 1A 21/22 | Level: Intermediate
Yufeng Guo (Google), Amy Unruh (Google)
Average rating: 2.00 (9 ratings)
Yufeng Guo and Amy Unruh walk you through training and deploying a machine learning system using TensorFlow, a popular open source library. Yufeng and Amy take you from a conceptual overview all the way to building complex classifiers and explain how you can apply deep learning to complex problems in science and industry.
9:00am–5:00pm Tuesday, September 26, 2017
Location: 1A 06/07
Ben Lorica (O'Reilly Media), Assaf Araki (Intel), Jacob Schreiber (University of Washington), Alex Ratner (Stanford University), Madeleine Udell (Cornell University), Yunsong Guo (Pinterest), Katherine Heller (Duke University), Alan Nichol (Rasa), Gerard de Melo (Rutgers University), Tamara Broderick (MIT), Inbal Tadeski (Anodot), Daniel Kang (Stanford University), Bichen Wu (UC Berkeley), Shaked Shammah (Hebrew University)
A full day of hardcore data science, exploring emerging topics and new areas of study made possible by vast troves of raw data and cutting-edge architectures for analyzing and exploring information. Along the way, leading data science practitioners teach new techniques and technologies to add to your data science toolbox.
1:30pm–5:00pm Tuesday, September 26, 2017
Location: 1A 21/22 | Level: Beginner
Secondary topics: Deep learning
Julia Lintern (Metis)
Julia Lintern offers a deep dive into deep learning with Keras, beginning with basic neural nets before exploring convolutional and recurrent neural nets. Along the way, Julia explains both the design theory behind today's most widely used deep learning algorithms and their Keras implementations.
1:30pm–5:00pm Tuesday, September 26, 2017
Location: 1A 23/24 | Level: Intermediate
Secondary topics: Deep learning, PyData, Text
David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed)
Natural language processing is a key component in many data science systems that must understand or reason about text. David Talby, Claudiu Branzan, and Alex Thomas lead a hands-on tutorial on scalable NLP using spaCy for building annotation pipelines, TensorFlow for training custom machine-learned annotators, and Spark ML and TensorFlow for using deep learning to build and apply word embeddings.
1:30pm–5:00pm Tuesday, September 26, 2017
Location: 1E 10 | Level: Advanced
Secondary topics: R
Jared Lander (Lander Analytics)
Average rating: 3.25 (4 ratings)
Modern statistics has become almost synonymous with machine learning, a collection of techniques that take advantage of today's incredible computing power. Jared Lander walks you through the available methods for implementing machine learning algorithms in R and explores underlying theories such as the elastic net, boosted trees, and cross-validation.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1A 06/07 | Level: Intermediate
Eric Colson (Stitch Fix)
Average rating: 4.67 (3 ratings)
While companies often use data science in a supporting role, the emergence of new business models has made it possible for some companies to differentiate themselves through data science. Eric Colson explores what it means to differentiate by data science and explains why companies must now think very differently about the role and placement of data science in the organization.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1A 08/10 | Level: Intermediate
Secondary topics: Financial services
Justin Bleich (Coatue Management)
Average rating: 4.00 (1 rating)
Prophet is a Bayesian nonlinear time series forecasting model recently released by Facebook. Justin Bleich explains how Coatue, a hedge fund that uses data science to drive investment decisions, extends Prophet to include exogenous covariates when generating forecasts and applies it to nowcasting macroeconomic series using higher-frequency data from sources such as Google Trends.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1A 12/14 | Level: Intermediate
Secondary topics: AI, Deep learning, ecommerce
Mikio Braun (Zalando SE)
Average rating: 3.71 (7 ratings)
Deep learning has become the go-to solution for many application areas, such as image classification or speech processing, but does it work for all application areas? Mikio Braun offers background on deep learning and shares his practical experience working with these exciting technologies.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1A 06/07 | Level: Intermediate
Matthew Roche (Microsoft), Jennifer Marie Stevens (Microsoft)
Average rating: 5.00 (1 rating)
The data-driven business must bridge the language gap between data scientists and business users. Matthew Roche and Jennifer Stevens walk you through building a business glossary that codifies your semantic layer and enables greater conversational fluency between business users and data scientists.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1A 08/10 | Level: Intermediate
Tristan Zajonc (Cloudera), Thomas Dinsmore (Cloudera), Lucas Glass (QuintilesIMS)
Average rating: 3.00 (1 rating)
Data science alone is easy. Data science with others, whether in the enterprise or on shared distributed systems, requires a bit more work. Tristan Zajonc and Thomas Dinsmore discuss common technology considerations and patterns for collaboration in large teams and for moving machine learning into production at scale.
1:15pm–1:55pm Wednesday, September 27, 2017
Location: 1A 12/14 | Level: Intermediate
Secondary topics: Deep learning
Yuhao Yang (Intel), Zhichao Li (Intel)
Average rating: 4.00 (2 ratings)
Yuhao Yang and Zhichao Li discuss building end-to-end analytics and deep learning applications, such as speech recognition and object detection, on top of BigDL and Spark and explore recent developments in BigDL, including Python APIs, notebook and TensorBoard support, read/write support for TensorFlow models, better support for recurrent and recursive nets, and 3D image convolutions.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1A 06/07 | Level: Intermediate
Sander Pick (Set), Andrew Hill (Set), Carson Farmer (Set)
Average rating: 4.00 (1 rating)
Location-based data is full of information about our everyday lives, but GPS and WiFi signals create extremely noisy mobile location data, making it hard to extract features, especially when working with real-time data. Andrew Hill and Sander Pick explore new strategies for extracting information from location data while remaining scalable, privacy focused, and contextually aware.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1A 08/10 | Level: Beginner
Moderated by: Jason Grout (Bloomberg LP)
Panelists: Jessica Forde (Jupyter)
Average rating: 4.80 (5 ratings)
With JupyterLab, users compute with multiple notebooks, editors, and consoles that work together in a tabbed layout. Jason Grout and Jessica Forde offer an overview of JupyterLab, the next generation of the Jupyter Notebook, demonstrate how to use third-party plugins to extend and customize many aspects of JupyterLab, and explain how it fits within the overall vision of Project Jupyter.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1A 12/14 | Level: Advanced
Secondary topics: Media, Text
Eui-Hong Han (The Washington Post), Ling Jiang (The Washington Post)
Average rating: 4.50 (2 ratings)
The quality of online comments is critical to the Washington Post. However, managing the quality of the comment section currently requires costly manual effort. Eui-Hong Han and Ling Jiang discuss ModBot, a machine learning-based tool developed for automatic comment moderation, and share the challenges they faced in developing ModBot and deploying it into production.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 06/07 | Level: Beginner
Secondary topics: Data for good, ecommerce, Healthcare
Average rating: 4.67 (3 ratings)
Zocdoc is an online marketplace that allows easy doctor discovery and instant online booking. However, dealing with healthcare involves many constraints and challenges that render standard approaches to common problems infeasible. Brian Dalessandro surveys the various machine learning problems Zocdoc has faced and shares the data, legal, and ethical constraints that shape its solution space.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 08/10 | Level: Intermediate
Secondary topics: PyData
Matthew Rocklin (Anaconda)
Average rating: 4.67 (3 ratings)
Dask parallelizes Python libraries like NumPy, pandas, and scikit-learn, bringing a popular data science stack to the world of distributed computing. Matthew Rocklin discusses Dask's architecture and its current real-world applications and explores computational task scheduling and parallel computing within Python generally.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 12/14 | Level: Intermediate
Secondary topics: Deep learning
Joshua Patterson (NVIDIA), Michael Balint (NVIDIA), Satish Varma Dandu (NVIDIA)
Average rating: 4.00 (1 rating)
How can deep learning be employed to create a system that monitors network traffic, operations data, and system logs to reliably flag risk and unearth potential threats? Satish Dandu, Joshua Patterson, and Michael Balint explain how to bootstrap a deep learning framework to detect risk and threats in operational production systems, using best-of-breed GPU-accelerated open source tools.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1A 06/07 | Level: Intermediate
Patrick Hall (H2O.ai | George Washington University), Sri Satish (H2O.ai)
Average rating: 5.00 (1 rating)
Interpreting deep learning and machine learning models is not just another regulatory burden to be overcome. People who use these technologies have the right to trust and understand AI. Patrick Hall and Sri Satish share techniques for interpreting deep learning and machine learning models and telling stories from their results.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1A 08/10 | Level: Intermediate
Secondary topics: PyData
Shoumik Palkar (Stanford University), Matei Zaharia (Stanford University)
Average rating: 5.00 (2 ratings)
Modern data applications combine functions from many optimized libraries (e.g., pandas and TensorFlow) and yet do not achieve peak hardware performance due to data movement across functions. Shoumik Palkar and Matei Zaharia offer an overview of Weld, a new interface to implement functions in these libraries while enabling optimizations across them.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1A 12/14 | Level: Intermediate
Secondary topics: Financial services, Platform
Nadeem Gulzar (Danske Bank Group), Sune Askjær (Think Big Analytics, a Teradata Company)
Average rating: 5.00 (3 ratings)
Fraud in banking is an arms race, and criminals are now using machine learning to improve their attack effectiveness. Sune Askjær and Nadeem Gulzar explore how Danske Bank uses deep learning for better fraud detection, covering model effectiveness, TensorFlow versus boosted decision trees, operational considerations in training and deploying models, and lessons learned along the way.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1A 06/07 | Level: Intermediate
David Talby (Pacific AI)
Average rating: 5.00 (2 ratings)
Machine learning and data science systems often fail in production in unexpected ways. David Talby shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1A 08/10 | Level: Advanced
Secondary topics: Media
Seth Hendrickson (Cloudera), DB Tsai (Netflix)
Average rating: 5.00 (1 rating)
Recent developments in Spark MLlib have given users the power to express a wider class of ML models and decrease model training times via the use of custom parameter optimization algorithms. Seth Hendrickson and DB Tsai explain when and how to use this new API and walk you through creating your own Spark ML optimizer. Along the way, they also share performance benefits and real-world use cases.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1A 12/14 | Level: Intermediate
Secondary topics: Cloud, Deep learning
Leo Dirac (Amazon Web Services)
Average rating: 5.00 (6 ratings)
Leo Dirac demonstrates how to apply the latest deep learning techniques to semantically understand images. You'll learn what embeddings are, how to extract them from your images using deep convolutional neural networks (CNNs), and how they can be used to cluster and classify large datasets of images.
11:20am–12:00pm Thursday, September 28, 2017
Location: 1A 06/07 | Level: Intermediate
Secondary topics: Text
Paco Nathan (O'Reilly Media)
Average rating: 5.00 (3 ratings)
Paco Nathan demonstrates how to use PyTextRank, an open source Python implementation of TextRank that builds atop spaCy, datasketch, NetworkX, and other popular libraries to prepare raw text for AI applications in media and learning. PyTextRank lets you move beyond outdated techniques such as stemming, n-grams, or bag-of-words while performing advanced NLP on single-server solutions.
11:20am–12:00pm Thursday, September 28, 2017
Location: 1A 08/10 | Level: Intermediate
Eduardo Arino de la Rubia (Domino Data Lab)
Average rating: 5.00 (5 ratings)
The promise of the automated statistician is as old as statistics itself. Eduardo Arino de la Rubia explores the tools created by the open source community to free data scientists from tedium, enabling them to work on the high-value aspects of insight creation. Along the way, Eduardo compares open source tools such as TPOT and auto-sklearn and discusses their place in the data science workflow.
11:20am–12:00pm Thursday, September 28, 2017
Location: 1A 12/14 | Level: Intermediate
Secondary topics: ecommerce, Streaming
Average rating: 5.00 (1 rating)
In the last few years, deep learning has achieved significant success in a wide range of domains, including computer vision, artificial intelligence, speech, NLP, and reinforcement learning. However, deep learning in recommender systems has, until recently, received relatively little attention. Nick Pentreath explores recent advances in this area in both research and practice.
1:15pm–1:55pm Thursday, September 28, 2017
Location: 1A 06/07 | Level: Intermediate
Secondary topics: Architecture, Financial services
Steven Totman (Cloudera), Faraz Rasheed (TD Bank)
Average rating: 5.00 (2 ratings)
Steven Totman and Faraz Rasheed offer an overview of Griffin, a high-level, easy-to-use framework built on top of Spark, which encapsulates the complexities of common model development tasks within four phases: data understanding, feature extraction, model development, and serving modeling results.
1:15pm–1:55pm Thursday, September 28, 2017
Location: 1A 08/10 | Level: Intermediate
Secondary topics: Cloud, R
Edgar Ruiz (RStudio)
Average rating: 4.00 (1 rating)
With R and sparklyr, a Spark standalone cluster can be used to analyze large datasets found in S3 buckets. Edgar Ruiz walks you through setting up a Spark standalone cluster using EC2 and offers an overview of S3 bucket folder and file setup, connecting R to Spark, the settings needed to read S3 data into Spark, and an approach to importing and wrangling the data.
1:15pm–1:55pm Thursday, September 28, 2017
Location: 1A 12/14 | Level: Beginner
Secondary topics: AI
Average rating: 2.75 (4 ratings)
Businesses have spent decades trying to make better decisions by collecting and analyzing structured data. New AI technologies are beginning to transform this process. Richard Tibbetts explores AI that guides business analysts to ask statistically sensible questions and lets junior data scientists answer questions in minutes that previously took trained statisticians hours.
2:05pm–2:45pm Thursday, September 28, 2017
Location: 1A 06/07 | Level: Intermediate
Ted Dunning (MapR Technologies)
Average rating: 4.50 (2 ratings)
Ted Dunning offers an overview of tensor computing, covering in practical terms the high-level principles behind tensor computing systems, and explains how it can be put to good use in a variety of settings beyond training deep neural networks (the most common use case).
2:05pm–2:45pm Thursday, September 28, 2017
Location: 1A 12/14 | Level: Beginner
Secondary topics: Deep learning, Platform
Average rating: 3.00 (1 rating)
Bargava Subramanian and Harjinder Mistry explain how machine learning and deep learning techniques are helping Red Hat build smart developer tools that make software developers more efficient.
2:55pm–3:35pm Thursday, September 28, 2017
Location: 1A 06/07 | Level: Intermediate
Secondary topics: Text
Michelle Casbon (Qordoba)
Average rating: 4.00 (4 ratings)
Michelle Casbon explores the machine learning and natural language processing that enable teams to build products that feel native to every user and explains how Qordoba is tackling the underserved domain of localization using open source tools, including Kubernetes, Docker, Scala, Apache Spark, Apache Cassandra, and Apache PredictionIO (incubating).
2:55pm–3:35pm Thursday, September 28, 2017
Location: 1A 08/10 | Level: Intermediate
Secondary topics: Deep learning
Mike Pittaro (Dell EMC)
The advances we see in machine learning would be impossible without hardware improvements, but building a high-performance hardware platform is tricky. It involves making hardware choices and understanding software frameworks, algorithms, and how they interact. Mike Pittaro shares the secrets of matching the right hardware and tools to the right algorithms for optimal performance.
2:55pm–3:35pm Thursday, September 28, 2017
Location: 1A 12/14 | Level: Intermediate
Secondary topics: Deep learning, Streaming
Josh Patterson (Skymind), Kirit Basu (StreamSets)
Enterprises building data lakes often have to deal with very large volumes of image data that they have collected over the years. Josh Patterson and Kirit Basu explain how some of the most sophisticated big data deployments are using convolutional neural nets to automatically classify images and add rich context about the content of the image, in real time, while ingesting data at scale.
4:35pm–5:15pm Thursday, September 28, 2017
Location: 1A 06/07 | Level: Intermediate
Secondary topics: IoT, Streaming
Average rating: 5.00 (3 ratings)
Services such as YouTube, Netflix, and Spotify popularized streaming in different industry segments, but these services do not center on live data (best exemplified by sensor data), which will be increasingly important in the future. Arun Kejariwal, Francois Orsini, and Dhruv Choudhary demonstrate how to use Satori to collect, discover, and react to live data feeds at ultralow latencies.
4:35pm–5:15pm Thursday, September 28, 2017
Location: 1A 08/10 | Level: Non-technical
Secondary topics: Cloud
Karim Chine (RosettaHUB)
Karim Chine offers an overview of rosettaHUB, which aims to establish a global open data science metacloud centered on usability, reproducibility, auditability, and shareability. Karim also shares the results of the rosettaHUB/AWS Educate initiative, which involved 30 higher education institutions and research labs and over 3,000 researchers, educators, and students.
4:35pm–5:15pm Thursday, September 28, 2017
Location: 1A 12/14 | Level: Intermediate
Secondary topics: Deep learning, Healthcare
Jon Fuller (KNIME), Olivia Klose (Microsoft)
Average rating: 3.00 (1 rating)
Jon Fuller and Olivia Klose explain how KNIME, Apache Spark, and Microsoft Azure enable fast and cheap automated classification of malignant lymphoma type in digital pathology images. The trained model is deployed to end users as a web application using the KNIME WebPortal.