Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Schedule: Data science & advanced analytics sessions

9:00am–12:30pm Tuesday, September 26, 2017
Location: 1A 21/22 Level: Intermediate
Yufeng Guo (Google), Amy Unruh (Google)
Average rating: 2.00 (9 ratings)
Yufeng Guo and Amy Unruh walk you through training and deploying a machine learning system using TensorFlow, a popular open source library. Yufeng and Amy take you from a conceptual overview all the way to building complex classifiers and explain how you can apply deep learning to complex problems in science and industry.
9:00am–12:30pm Tuesday, September 26, 2017
Location: 1E 15/16 Level: Intermediate
Matthew Rocklin (Anaconda), Ben Zaitlen (Anaconda)
Average rating: 5.00 (1 rating)
The Python data science stack, which includes NumPy, pandas, and scikit-learn, is efficient and intuitive, but only for in-memory data on a single core. Matthew Rocklin and Ben Zaitlen demonstrate how to parallelize and scale your Python workloads to multicore machines and multimachine clusters.
9:00am–5:00pm Tuesday, September 26, 2017
Location: 1A 06/07
Ben Lorica (O'Reilly Media), Assaf Araki (Intel), Jacob Schreiber (University of Washington), Alex Ratner (Stanford University), Madeleine Udell (Cornell University), Yunsong Guo (Pinterest), Katherine Heller (Duke University), Alan Nichol (Rasa), Gerard de Melo (Rutgers University), Tamara Broderick (MIT), Inbal Tadeski (Anodot), Daniel Kang (Stanford University), Bichen Wu (UC Berkeley), Shaked Shammah (Hebrew University)
A full day of hardcore data science, exploring emerging topics and new areas of study made possible by vast troves of raw data and cutting-edge architectures for analyzing and exploring information. Along the way, leading data science practitioners teach new techniques and technologies to add to your data science toolbox.
1:30pm–5:00pm Tuesday, September 26, 2017
Location: 1A 23/24 Level: Intermediate
Secondary topics: Deep learning, Pydata, Text
David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed)
Natural language processing is a key component in many data science systems that must understand or reason about text. David Talby, Claudiu Branzan, and Alex Thomas lead a hands-on tutorial on scalable NLP using spaCy for building annotation pipelines, TensorFlow for training custom machine-learned annotators, and Spark ML and TensorFlow for using deep learning to build and apply word embeddings.
1:30pm–5:00pm Tuesday, September 26, 2017
Location: 1E 10 Level: Advanced
Secondary topics: R
Jared Lander (Lander Analytics)
Average rating: 3.25 (4 ratings)
Modern statistics has become almost synonymous with machine learning—a collection of techniques that utilize today's incredible computing power. Jared Lander walks you through the available methods for implementing machine learning algorithms in R and explores underlying theories such as the elastic net, boosted trees, and cross-validation.
11:20am–12:00pm Wednesday, September 27, 2017
Location: 1E 15/16 Level: Intermediate
Mike Driscoll (Metamarkets)
Average rating: 4.00 (3 ratings)
Most analytics tools in use today provide static visuals that don’t reveal the full, real-time picture. Mike Driscoll shows how to take an interactive approach to analytics. From design techniques to discovering new forms of data exploration, he demonstrates how to put the full power of big data into the hands of the people who need it to make key business decisions.
2:05pm–2:45pm Wednesday, September 27, 2017
Location: 1A 12/14 Level: Advanced
Secondary topics: Media, Text
Eui-Hong Han (The Washington Post), Ling Jiang (The Washington Post)
Average rating: 4.50 (2 ratings)
The quality of online comments is critical to the Washington Post. However, managing comment quality currently requires costly manual effort. Eui-Hong Han and Ling Jiang discuss ModBot, a machine learning-based tool developed for automatic comment moderation, and share the challenges they faced in developing ModBot and deploying it into production.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 06/07 Level: Beginner
Secondary topics: Data for good, ecommerce, Healthcare
Average rating: 4.67 (3 ratings)
Zocdoc is an online marketplace that allows easy doctor discovery and instant online booking. However, dealing with healthcare involves many constraints and challenges that render standard approaches to common problems infeasible. Brian Dalessandro surveys the various machine learning problems Zocdoc has faced and shares the data, legal, and ethical constraints that shape its solution space.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 08/10 Level: Intermediate
Secondary topics: Pydata
Matthew Rocklin (Anaconda)
Average rating: 4.67 (3 ratings)
Dask parallelizes Python libraries like NumPy, pandas, and scikit-learn, bringing a popular data science stack to the world of distributed computing. Matthew Rocklin discusses the architecture of Dask and its current applications in the wild and explores computational task scheduling and parallel computing within Python more generally.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 12/14 Level: Intermediate
Secondary topics: Deep learning
Joshua Patterson (NVIDIA), Michael Balint (NVIDIA), Satish Varma Dandu (NVIDIA)
Average rating: 4.00 (1 rating)
How can deep learning be employed to create a system that monitors network traffic, operations data, and system logs to reliably flag risk and unearth potential threats? Satish Dandu, Joshua Patterson, and Michael Balint explain how to bootstrap a deep learning framework to detect risk and threats in operational production systems, using best-of-breed GPU-accelerated open source tools.
2:55pm–3:35pm Wednesday, September 27, 2017
Location: 1A 18 Level: Intermediate
Secondary topics: Financial services
Tobi Bosede (Johns Hopkins)
Whether an entity seeks to create trading algorithms or mitigate risk, predicting trade volume is an important task. Focusing on futures trading that relies on Apache Spark to process large amounts of data, Tobi Bosede considers the use of penalized regression splines for trade volume prediction and the relationship between price volatility and trade volume.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1A 08/10 Level: Intermediate
Secondary topics: Pydata
Shoumik Palkar (Stanford University), Matei Zaharia (Stanford University)
Average rating: 5.00 (2 ratings)
Modern data applications combine functions from many optimized libraries (e.g., pandas and TensorFlow) and yet do not achieve peak hardware performance due to data movement across functions. Shoumik Palkar and Matei Zaharia offer an overview of Weld, a new interface to implement functions in these libraries while enabling optimizations across them.
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1E 14 Level: Non-technical
Behrooz Hashemian (Massachusetts Institute of Technology)
People are leaving an increasing number of digital traces in their everyday lives. Since these traces are mostly anonymized, the information gained by advanced data analytics is limited to each individual trace. Behrooz Hashemian explains how to fuse various traces and build multidimensional insight by taking advantage of patterns in people's behavior.
5:25pm–6:05pm Wednesday, September 27, 2017
Location: 1A 06/07 Level: Intermediate
David Talby (Pacific AI)
Average rating: 5.00 (2 ratings)
Machine learning and data science systems often fail in production in unexpected ways. David Talby shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries.
11:20am–12:00pm Thursday, September 28, 2017
Location: 1A 08/10 Level: Intermediate
Eduardo Arino de la Rubia (Domino Data Lab)
Average rating: 5.00 (5 ratings)
The promise of the automated statistician is as old as statistics itself. Eduardo Arino de la Rubia explores the tools created by the open source community to free data scientists from tedium, enabling them to work on the high-value aspects of insight creation. Along the way, Eduardo compares open source tools such as TPOT and auto-sklearn and discusses their place in the data science workflow.
11:20am–12:00pm Thursday, September 28, 2017
Location: 1A 12/14 Level: Intermediate
Secondary topics: ecommerce, Streaming
Average rating: 5.00 (1 rating)
In the last few years, deep learning has achieved significant success in a wide range of domains, including computer vision, artificial intelligence, speech, NLP, and reinforcement learning. However, deep learning in recommender systems has, until recently, received relatively little attention. Nick Pentreath explores recent advances in this area in both research and practice.
2:05pm–2:45pm Thursday, September 28, 2017
Location: 1A 06/07 Level: Intermediate
Ted Dunning (MapR Technologies)
Average rating: 4.50 (2 ratings)
Ted Dunning offers an overview of tensor computing—covering, in practical terms, the high-level principles behind tensor computing systems—and explains how it can be put to good use in a variety of settings beyond training deep neural networks (the most common use case).
2:05pm–2:45pm Thursday, September 28, 2017
Location: 1A 12/14 Level: Beginner
Secondary topics: Deep learning, Platform
Average rating: 3.00 (1 rating)
Bargava Subramanian and Harjinder Mistry explain how machine learning and deep learning techniques are helping Red Hat build smart developer tools that make software developers more efficient.
2:55pm–3:35pm Thursday, September 28, 2017
Location: 1A 06/07 Level: Intermediate
Secondary topics: Text
Michelle Casbon (Qordoba)
Average rating: 4.00 (4 ratings)
Michelle Casbon explores the machine learning and natural language processing that enable teams to build products that feel native to every user and explains how Qordoba is tackling the underserved domain of localization using open source tools, including Kubernetes, Docker, Scala, Apache Spark, Apache Cassandra, and Apache PredictionIO (incubating).
2:55pm–3:35pm Thursday, September 28, 2017
Location: 1A 12/14 Level: Intermediate
Secondary topics: Deep learning, Streaming
Josh Patterson (Skymind), Kirit Basu (StreamSets)
Enterprises building data lakes often have to deal with very large volumes of image data that they have collected over the years. Josh Patterson and Kirit Basu explain how some of the most sophisticated big data deployments are using convolutional neural nets to automatically classify images and add rich context about the content of the image, in real time, while ingesting data at scale.
4:35pm–5:15pm Thursday, September 28, 2017
Location: 1A 06/07 Level: Intermediate
Secondary topics: IoT, Streaming
Average rating: 5.00 (3 ratings)
Services such as YouTube, Netflix, and Spotify popularized streaming in different industry segments, but these services do not center around live data—best exemplified by sensor data—which will be increasingly important in the future. Arun Kejariwal, Francois Orsini, and Dhruv Choudhary demonstrate how to leverage Satori to collect, discover, and react to live data feeds at ultralow latencies.
4:35pm–5:15pm Thursday, September 28, 2017
Location: 1E 15/16 Level: Intermediate
Secondary topics: Text
Noemi Derzsy (Rensselaer Polytechnic Institute)
Open source data has enabled society to engage in community-based research and has provided government agencies with more visibility and trust from individuals. Noemi Derzsy offers an overview of the openNASA platform and discusses openNASA metadata analysis and tools for applying NLP and topic modeling techniques to understand open government dataset associations.