Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Hardcore Data Science

9:00am - 5:00pm, Tuesday, September 26, 2017

Data science is a hot topic, but much of it is simply business intelligence in a new mantle. Ben Lorica and Angie Ma lead a full day of hardcore data science, exploring emerging topics and new areas of study made possible by vast troves of raw data and cutting-edge architectures for analyzing and exploring information. Along the way, leading data science practitioners teach new techniques and technologies to add to your data science toolbox. We’ll cover topics such as data management, machine learning, natural language processing, crowdsourcing, and algorithm design.

Who should attend: data scientists, data engineers, statisticians, data modelers, and analysts with a strong understanding of data science fundamentals will find themselves at home in this tutorial, as will CTOs, chief scientists, and academic researchers.

Tuesday, 09/26/2017

9:00am

9:00am–9:05am Tuesday, 09/26/2017
Location: 1A 06/07
Ben Lorica (O'Reilly Media), Assaf Araki (Intel)
Hosts Ben Lorica and Assaf Araki welcome you to Hardcore Data Science day.

9:05am

9:05am–9:30am Tuesday, 09/26/2017
Secondary topics:  Pydata
Jacob Schreiber (University of Washington)
Average rating: ****.
(4.00, 5 ratings)
Jacob Schreiber offers an overview of pomegranate, a flexible probabilistic modeling package implemented in Cython for speed. Jacob explores the models it supports, such as Bayesian networks and hidden Markov models, shows how to implement them easily, and explains how the underlying modular implementation unlocks several benefits for the modern data scientist.

9:30am

9:30am–10:00am Tuesday, 09/26/2017
Artificial Intelligence, Machine Learning & Data Science
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Hardcore Data Science
Alex Ratner (Stanford University)
Average rating: ****.
(4.40, 5 ratings)
As data-hungry algorithms become the norm in machine learning, the bottleneck is now acquiring labeled training data. Alex Ratner explores data programming, a paradigm for the programmatic creation of training sets in which users express weak supervision strategies or domain heuristics as simple scripts called labeling functions, which are then automatically denoised.
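As a rough illustration of the labeling-function idea described above, the sketch below writes three heuristics as plain Python functions and combines their votes. Everything here is hypothetical: the function names, the keyword rules, and the label constants are invented for the example, and a simple majority vote stands in for the automatic denoising step, which in data programming is a learned model rather than a vote.

```python
from collections import Counter

# Hypothetical label values for a toy "is this a complaint?" task.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_mentions_refund(text):
    """Heuristic: messages asking for a refund are likely complaints."""
    return POSITIVE if "refund" in text.lower() else ABSTAIN


def lf_says_thanks(text):
    """Heuristic: messages that say thanks are likely not complaints."""
    return NEGATIVE if "thank" in text.lower() else ABSTAIN


def lf_many_exclamations(text):
    """Heuristic: two or more exclamation marks suggest a complaint."""
    return POSITIVE if text.count("!") >= 2 else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_refund, lf_says_thanks, lf_many_exclamations]


def majority_vote(text, lfs=LABELING_FUNCTIONS):
    """Combine noisy votes; a stand-in for a learned denoising model."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    # Abstain on ties instead of picking a label arbitrarily.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    for message in ["I want a refund now!!", "Thank you so much", "hello"]:
        print(message, "->", majority_vote(message))
```

Each function may abstain, so coverage and conflicts between heuristics become visible; a learned denoising model would weight the functions by estimated accuracy rather than counting them equally.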

10:00am

10:00am–10:30am Tuesday, 09/26/2017
Machine Learning & Data Science
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Hardcore Data Science
Madeleine Udell (Cornell University)
Average rating: ****.
(4.10, 10 ratings)
Madeleine Udell explains how to fill in missing data with generalized low-rank models.
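To make the idea of filling in missing data with a low-rank model concrete, here is a minimal sketch of the simplest special case: a rank-one model with quadratic loss, fitted by alternating least squares on the observed entries only. It is not the method presented in the talk; generalized low-rank models cover many more losses and regularizers. The function names, the `reg` parameter, and the use of `None` to mark missing entries are assumptions made for this example.

```python
def fit_rank_one(A, reg=1e-6, n_iters=200):
    """Fit A[i][j] ~ x[i] * y[j] using only the observed (non-None) entries."""
    m, n = len(A), len(A[0])
    x = [1.0] * m
    y = [1.0] * n
    for _ in range(n_iters):
        # Update each row factor with the column factors held fixed.
        for i in range(m):
            obs = [j for j in range(n) if A[i][j] is not None]
            num = sum(y[j] * A[i][j] for j in obs)
            den = sum(y[j] ** 2 for j in obs) + reg
            x[i] = num / den
        # Update each column factor with the row factors held fixed.
        for j in range(n):
            obs = [i for i in range(m) if A[i][j] is not None]
            num = sum(x[i] * A[i][j] for i in obs)
            den = sum(x[i] ** 2 for i in obs) + reg
            y[j] = num / den
    return x, y


def impute(A, x, y):
    """Keep observed entries; fill missing ones from the fitted model."""
    return [
        [A[i][j] if A[i][j] is not None else x[i] * y[j]
         for j in range(len(A[0]))]
        for i in range(len(A))
    ]


if __name__ == "__main__":
    # A rank-one table (outer product of [1,2,3,4] and [1,2,3]) with two holes.
    table = [
        [None, 2.0, 3.0],
        [2.0, 4.0, 6.0],
        [3.0, 6.0, 9.0],
        [4.0, 8.0, None],
    ]
    x, y = fit_rank_one(table)
    for row in impute(table, x, y):
        print([round(v, 3) for v in row])
```

The small `reg` term only keeps the divisions well defined when a row or column has no observed entries; real data would call for a higher rank and a loss suited to each column's type.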

10:30am

Morning break (30m)

11:00am

11:00am–11:30am Tuesday, 09/26/2017
Machine Learning & Data Science
Location: 1A 06/07
Secondary topics:  Deep learning, ecommerce, Marketing, Platform
Yunsong Guo (Pinterest)
Average rating: ***..
(3.83, 6 ratings)
Pinterest has always prioritized user experiences. Yunsong Guo explores how Pinterest uses machine learning (particularly linear models, gradient-boosted decision trees, and deep neural networks) in its most important product, the home feed, to improve user engagement. Along the way, Yunsong shares how Pinterest drastically increased its international user engagement, along with lessons on finding the most impactful features.

11:30am

11:30am–12:00pm Tuesday, 09/26/2017
Machine Learning & Data Science
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Hardcore Data Science, Healthcare
Katherine Heller (Duke University)
Average rating: ***..
(3.75, 4 ratings)
Katherine Heller discusses multiple ways in which healthcare data is acquired and explains how machine learning methods are currently being introduced into clinical settings.

12:00pm

12:00pm–12:30pm Tuesday, 09/26/2017
Artificial Intelligence, Machine Learning & Data Science
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Deep learning, Hardcore Data Science
Alan Nichol (Rasa)
Average rating: ****.
(4.00, 2 ratings)
There's a large body of research on machine learning-based dialogue, but most voice and chat systems in production are still implemented using a state machine and a set of rules. Alan Nichol offers an overview of Rasa's applied AI research in language understanding and dialogue and explains how open source implementations bring the state of the art to thousands of developers.

12:30pm

Lunch (1h)

1:30pm

1:30pm–2:00pm Tuesday, 09/26/2017
Data science & advanced analytics, Machine Learning & Data Science
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Deep learning, Text
Gerard de Melo (Rutgers University)
Average rating: ****.
(4.00, 3 ratings)
How can we exploit the massive amounts of data now available on the web to enable more intelligent applications? Gerard de Melo shares results on applying deep learning techniques to web-scale amounts of data to learn neural representations of language and world knowledge. The resulting resources can be used in Spark to work with text in over 300 languages.

2:00pm

2:00pm–2:30pm Tuesday, 09/26/2017
Machine Learning & Data Science
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Hardcore Data Science
Tamara Broderick (MIT)
Average rating: ****.
(4.50, 6 ratings)
Tamara Broderick demonstrates new advances in computation for Bayesian machine learning that allow reliable quantification of uncertainty and robustness at modern data scales, illustrated with examples in microcredit and online advertising.

2:30pm

2:30pm–3:00pm Tuesday, 09/26/2017
Machine Learning & Data Science
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Deep learning
Inbal Tadeski (Anodot)
Average rating: ***..
(3.40, 5 ratings)
Inbal Tadeski demonstrates the importance of identifying relationships between time metrics so that they can be used for predictions, root cause diagnosis, and more. Inbal shares accurate methods that work at large scale, such as behavioral pattern similarity clustering algorithms, along with strategies for reducing false positives, false negatives, and computational cost and for distinguishing correlation from causation.

3:00pm

Afternoon break (30m)

3:30pm

3:30pm–4:00pm Tuesday, 09/26/2017
Secondary topics:  Deep learning
Daniel Kang (Stanford University)
Average rating: ****.
(4.67, 3 ratings)
Video is one of the fastest-growing sources of data with rich semantic information, and advances in deep learning have made it possible to query this information with near-human accuracy. However, inference remains prohibitively expensive: even the most powerful GPU cannot run state-of-the-art models in real time. Daniel Kang offers an overview of NoScope, which runs queries over video 1,000x faster.

4:00pm

4:00pm–4:30pm Tuesday, 09/26/2017
Machine Learning & Data Science
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Deep learning, Geospatial, Hardcore Data Science, Smart cities
Bichen Wu (UC Berkeley)
Average rating: ***..
(3.00, 2 ratings)
Bichen Wu explores perception tasks for autonomous driving and explains how to design efficient neural networks to address critical issues such as latency, energy efficiency, and model size.

4:30pm

4:30pm–5:00pm Tuesday, 09/26/2017
Artificial Intelligence
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Deep learning
Shaked Shammah (Hebrew University)
Average rating: ****.
(4.00, 1 rating)
Deep learning is amazing, but it sometimes fails miserably, even for very simple, practical problems. Shaked Shammah discusses four types of common problems in which deep learning fails. Some can be solved by using specific approaches to network architecture and loss functions. For others, deep learning is simply not the right way to go.