Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Hardcore Data Science conference sessions

Tuesday, March 29

9:00am–5:00pm Tuesday, 03/29/2016
Location: 210 C/G
Average rating: 4.00 (21 ratings)
Ben Lorica leads a full day of hardcore data science, exploring emerging topics and new areas of study made possible by vast troves of raw data and cutting-edge architectures for analyzing that data. Along the way, leading data science practitioners teach new techniques and technologies to add to your data science toolbox.
9:05am–9:30am Tuesday, 03/29/2016
Location: 210 C/G
Moritz Hardt (Google)
Average rating: 4.31 (13 ratings)
Moritz Hardt outlines a reusable holdout method, which can be used many times without losing the guarantees of fresh data. Moritz also explains how to design reliable machine-learning benchmarks for a number of applications such as data science competitions and hyperparameter tuning.
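For readers who want the gist ahead of the talk: the reusable holdout (the "Thresholdout" mechanism of Dwork, Feldman, Hardt, et al.) only reveals holdout information when the training and holdout estimates of a statistic disagree by more than a noisy threshold. A minimal sketch, with illustrative function and parameter names rather than code from the session:

```python
import numpy as np

def thresholdout(train_vals, holdout_vals, threshold=0.04, sigma=0.01, rng=None):
    """Answer one adaptive query (e.g., a candidate model's per-example accuracy)
    while reusing the holdout set. Parameter values are illustrative."""
    rng = rng or np.random.default_rng(0)
    train_mean = float(np.mean(train_vals))
    holdout_mean = float(np.mean(holdout_vals))
    # Only "spend" holdout information when train and holdout disagree
    # by more than a noisy threshold; otherwise echo the training answer.
    if abs(train_mean - holdout_mean) < threshold + rng.normal(0, sigma):
        return train_mean
    return holdout_mean + rng.normal(0, sigma)
```

Each adaptive query (say, the accuracy of the latest model variant) passes through this gate, which is what allows the same holdout set to be reused many times.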
9:30am–10:00am Tuesday, 03/29/2016
Location: 210 C/G
Average rating: 4.14 (14 ratings)
Xavier Amatriain explains the lessons learned building real-life machine-learning systems at Quora.
10:00am–10:30am Tuesday, 03/29/2016
Location: 210 C/G
Alice Zheng
Average rating: 3.79 (14 ratings)
Feature engineering is widely practiced, but understanding the hows and whys behind this process often relies on folklore and guesswork. Alice Zheng offers a systematic view of feature engineering and discusses the underpinnings of a few popular methods.
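As a concrete taste of the kind of methods covered, the sketch below applies two of the most common feature-engineering transforms (a log transform for heavy-tailed counts and one-hot encoding for categories) with scikit-learn; the toy data and column names are made up:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder

# Toy data; column names are invented for illustration.
df = pd.DataFrame({
    "page_views": [3, 12, 450, 8_800],    # heavy-tailed count
    "country":    ["us", "de", "us", "in"],
})

features = ColumnTransformer([
    # log1p tames heavy-tailed counts so downstream models see a gentler scale
    ("log_views", FunctionTransformer(np.log1p), ["page_views"]),
    # one-hot turns categories into indicator columns
    ("country", OneHotEncoder(handle_unknown="ignore"), ["country"]),
])

X = features.fit_transform(df)
print(X.shape)
```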
11:00am–11:30am Tuesday, 03/29/2016
Location: 210 C/G
Alexander Ulanov (Hewlett Packard Labs)
Average rating: 3.64 (11 ratings)
Alexander Ulanov outlines a scalable implementation of deep neural networks for Spark, which uses batch BLAS operations to speed up the computations, employs Spark data parallelism for scaling, and provides friendly and extensible user and developer interfaces.
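This design is reflected in Spark MLlib's multilayer perceptron classifier. A minimal PySpark sketch using Spark's bundled sample data (the layer sizes are arbitrary choices for illustration, not recommendations from the talk):

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import MultilayerPerceptronClassifier

spark = SparkSession.builder.appName("mlp-sketch").getOrCreate()

# LibSVM-formatted features/labels; this file ships with the Spark distribution.
data = spark.read.format("libsvm").load(
    "data/mllib/sample_multiclass_classification_data.txt")
train, test = data.randomSplit([0.8, 0.2], seed=42)

# layers = [number of input features, hidden units..., number of classes]
mlp = MultilayerPerceptronClassifier(layers=[4, 16, 3],
                                     blockSize=128,   # rows per batched BLAS call
                                     maxIter=100, seed=42)
model = mlp.fit(train)
model.transform(test).select("label", "prediction").show(5)
```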
11:30am–12:00pm Tuesday, 03/29/2016
Location: 210 C/G
John Canny (UC Berkeley)
Average rating: 4.07 (14 ratings)
GPUs have proven their value for machine learning, offering orders-of-magnitude speedups on dense and sparse data. They define the current performance limits for machine learning but have limited model capacity. John Canny explains how to mitigate that challenge and achieve linear speedups with GPUs on commodity networks. The result defines the hitherto unseen "outer limits" of ML performance.
12:00pm–12:30pm Tuesday, 03/29/2016
Location: 210 C/G
Stephen Merity (Salesforce Research), Caiming Xiong (MetaMind)
Average rating: 4.15 (13 ratings)
Stephen Merity, Richard Socher, and Caiming Xiong discuss their recent work on extending the dynamic memory network (DMN) to question answering in both the textual and visual domains and explore how memory networks and attention mechanisms can allow for better interpretability of deep learning models.
1:30pm–2:00pm Tuesday, 03/29/2016
Location: 210 C/G
Erin LeDell (H2O.ai)
Average rating: 4.25 (12 ratings)
Erin LeDell covers the basics of ensemble learning and offers an introduction to the scalable open source machine-learning library H2O. Erin then demonstrates the H2O Ensemble package and dives into more advanced topics, including super learning (stacking) and scalable ensemble learning with H2O Ensemble.
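The session demos the H2O Ensemble R package; in H2O's current Python API, the same stacking idea looks roughly like the sketch below, where the data path and model parameters are placeholders rather than the code shown on stage:

```python
import h2o
from h2o.estimators import (H2OGradientBoostingEstimator,
                            H2ORandomForestEstimator,
                            H2OStackedEnsembleEstimator)

h2o.init()
train = h2o.import_file("train.csv")              # placeholder path
x, y = train.columns[:-1], train.columns[-1]

# Base learners share fold assignments and keep their cross-validated
# predictions, which become the metalearner's training data (stacking).
common = dict(nfolds=5, fold_assignment="Modulo",
              keep_cross_validation_predictions=True, seed=1)
gbm = H2OGradientBoostingEstimator(**common)
gbm.train(x=x, y=y, training_frame=train)
drf = H2ORandomForestEstimator(**common)
drf.train(x=x, y=y, training_frame=train)

stack = H2OStackedEnsembleEstimator(base_models=[gbm, drf])
stack.train(x=x, y=y, training_frame=train)
```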
2:00pm–2:30pm Tuesday, 03/29/2016
Location: 210 C/G
Tags: geospatial
Alexander Gray (Skytree, Inc.)
Average rating: 3.67 (12 ratings)
Alex Gray presents a novel approach to score and detect anomalies in large-scale data based on probabilistic machine-learning models. Alex focuses on unsupervised learning and uses a real-world use case—finding outliers in geospatial behavior—to demonstrate how an outlier detection framework can be applied to find anomalies in a dataset with millions of instances.
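One common way to implement this style of unsupervised, model-based outlier scoring (a generic illustration, not Skytree's specific method) is to fit a density model and flag the lowest-likelihood points; a minimal scikit-learn sketch on synthetic lat/long data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic "geospatial" points: a dense cluster plus a few far-away outliers.
normal = rng.normal(loc=[37.77, -122.42], scale=0.01, size=(10_000, 2))
outliers = rng.uniform(low=[30, -130], high=[45, -115], size=(20, 2))
X = np.vstack([normal, outliers])

# Fit a probabilistic model of "normal" behavior and score by log-likelihood.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
scores = gmm.score_samples(X)              # low score = unlikely = anomalous
threshold = np.percentile(scores, 0.5)     # flag the least likely 0.5%
anomalies = X[scores < threshold]
print(len(anomalies))
```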
2:30pm–3:00pm Tuesday, 03/29/2016
Location: 210 C/G
Lise Getoor (University of California, Santa Cruz)
Average rating: 4.00 (15 ratings)
Lise Getoor explores scalable collective reasoning in graphs.
3:30pm–4:00pm Tuesday, 03/29/2016
Location: 210 C/G
Tags: science
Laura Waller (UC Berkeley)
Average rating: 4.30 (10 ratings)
Laura Waller gives an overview of new optical microscopes that employ simple experimental systems and efficient nonlinear inverse algorithms to achieve high-resolution 3D and phase images. By leveraging recent advances in data science, these microscopes can produce gigapixel-scale images at each time frame, computed efficiently and with good robustness to noise and model mismatch.
4:00pm–4:30pm Tuesday, 03/29/2016
Location: 210 C/G
Rajat Monga (Google)
Average rating: 3.71 (14 ratings)
TensorFlow is an open source software library for numerical computation with a focus on machine learning. Rajat Monga offers an introduction to TensorFlow and explains how to use it to train and deploy machine-learning models to make your next application smarter.
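A minimal sketch of training and saving a small classifier with TensorFlow's current tf.keras API; the session predates tf.keras, so treat this purely as illustration rather than the code shown in the talk:

```python
import tensorflow as tf

# Small image classifier on MNIST, which ships with tf.keras.datasets.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test))

# Save the trained model for later deployment (legacy HDF5 format).
model.save("mnist_model.h5")
```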
4:30pm–5:00pm Tuesday, 03/29/2016
Location: 210 C/G
Tags: ai
Mike Cafarella (University of Michigan)
Average rating: 4.60 (10 ratings)
Dark data is the great mass of data buried in text, tables, figures, and images that lacks structure and so is essentially unprocessable by existing software. DeepDive is a system that extracts value from dark data. Mike Cafarella offers an introduction to DeepDive, exploring the key technical innovations that enable DeepDive to produce statistical inference at massive scale.