Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Hardcore Data Science

9:00 - 17:00, Tuesday, 23 May 2017
Location: London Suite 3

Data science is a hot topic, but much of it is simply business intelligence in a new mantle. Ben Lorica and Angie Ma lead a full day of hardcore data science, exploring emerging topics and new areas of study made possible by vast troves of raw data and cutting-edge architectures for analyzing and exploring information. Along the way, leading data science practitioners teach new techniques and technologies to add to your data science toolbox. We’ll cover topics such as data management, machine learning, natural language processing, crowdsourcing, and algorithm design.

Who should attend: data scientists, data engineers, statisticians, data modelers, and analysts with a strong understanding of data science fundamentals will find themselves at home in this tutorial, as will CTOs, chief scientists, and academic researchers.

Tuesday, 23/05/2017

9:00–9:05 Tuesday, 23/05/2017
Hardcore Data Science
Location: London Suite 2/3
Angie Ma (Faculty), Ben Lorica (O'Reilly)
Ben Lorica and Angie Ma welcome you to the all-day Hardcore Data Science tutorial.
9:05–9:30 Tuesday, 23/05/2017
Hardcore Data Science
Location: London Suite 2/3
Secondary topics: IoT, Streaming
Level: Intermediate
Ira Cohen (Anodot)
Average rating: *****
(5.00, 4 ratings)
Identifying the relationships between time series metrics lets them be used for predictions, root cause diagnosis, and more. Ira Cohen shares accurate methods that work on a large scale (e.g., behavioral pattern similarity clustering algorithms) and strategies for reducing false positives and false negatives, reducing computational resources, and distinguishing correlation from causation.
9:30–10:00 Tuesday, 23/05/2017
Hardcore Data Science
Location: London Suite 2/3
Yingsong Zhang (ASI Data Science)
Sometimes the labels available for a dataset are insufficient. In such situations, semisupervised learning can be of great practical value. Yingsong Zhang explores illustrative examples of how to derive creative solutions from textbook approaches.
10:00–10:30 Tuesday, 23/05/2017
Data science and advanced analytics
Location: London Suite 2/3
Level: Intermediate
Ali Hürriyetoglu (Statistics Netherlands), Nelleke Oostdijk (Radboud University)
Average rating: *....
(1.00, 1 rating)
Identifying relevant tweets in tweet collections that are gathered via keywords is a huge challenge. Ali Hürriyetoglu and Nelleke Oostdijk share the results of a study on using unsupervised and supervised machine learning with linguistic insight to enable people to identify relevant tweets for their needs and offer an overview of their tool, Relevancer.
10:30–11:00 Tuesday, 23/05/2017
Location: London Suite 2/3
Morning break (30m)
11:00–11:30 Tuesday, 23/05/2017
Hardcore Data Science
Location: London Suite 2/3
Level: Intermediate
Robin Senge (inovex)
Reliable prediction is the ability of a predictive model to explicitly measure the uncertainty involved in a prediction without feedback. Robin Senge shares two approaches to measure different types of uncertainty involved in a prediction.
11:30–12:00 Tuesday, 23/05/2017
Level: Intermediate
Mathew Salvaris (Microsoft), Miguel Gonzalez-Fierro (Microsoft)
The speed of a machine-learning algorithm can be crucial in problems that require retraining in real time. Mathew Salvaris and Miguel González-Fierro introduce Microsoft's recently open sourced LightGBM library for decision trees, which outperforms other libraries in both speed and performance, and demo several applications using LightGBM.
12:00–12:30 Tuesday, 23/05/2017
Data science and advanced analytics
Location: London Suite 2/3
Secondary topics: AI, Deep learning
Level: Intermediate
Amitai Armon (Intel), Yahav Shadmi (Intel)
Average rating: *****
(5.00, 2 ratings)
Neural-network models have a set of configuration hyperparameters tuned to optimize a given model's accuracy. Yahav Shadmi demonstrates how to select hyperparameters to significantly reduce training time while maintaining accuracy, presents examples for popular neural network models used for text and images, and describes a real-world optimization method for tuning.
12:30–13:30 Tuesday, 23/05/2017
Location: London Suite 2/3
Lunch (1h)
13:30–14:00 Tuesday, 23/05/2017
Hardcore Data Science
Location: London Suite 2/3
Level: Intermediate
Kay Brodersen (Google)
Average rating: *****
(5.00, 2 ratings)
Causal relationships empower us to understand the consequences of our actions and decide what to do next. This is why identifying causal effects has been at the heart of data science. Kay Brodersen offers an introduction to CausalImpact, a new analysis library developed at Google for identifying the causal effect of an intervention on a metric over time.
14:00–14:30 Tuesday, 23/05/2017
Hardcore Data Science, Spark & beyond
Location: London Suite 2/3
Secondary topics: Deep learning
Level: Intermediate
Ding Ding (Intel)
Built on Apache Spark, BigDL offers feature parity with existing deep learning frameworks, with better performance. Ding Ding explains how BigDL helps make the big data platform a unified data analytics platform, enabling more accessible deep learning for big data users and data scientists.
14:30–15:00 Tuesday, 23/05/2017
Hardcore Data Science
Location: London Suite 2/3
Secondary topics: Deep learning
Alan Mosca (nPlan)
Average rating: ****.
(4.00, 1 rating)
Alan Mosca discusses using ensembles in deep learning and tackles a benchmark problem in computer vision with Toupee, a library and toolkit for experimentation with deep learning and ensembles.
15:00–15:30 Tuesday, 23/05/2017
Location: London Suite 2/3
Afternoon break (30m)
15:30–16:00 Tuesday, 23/05/2017
Hardcore Data Science
Location: London Suite 2/3
Secondary topics: Deep learning
Eduard Vazquez (Cortexica Vision Systems)
Average rating: ***..
(3.00, 1 rating)
Cortexica had the first commercial implementation of a deep convolutional network on a GPU back in 2010. However, in the real world, running a CNN is not always possible. Eduard Vazquez discusses current challenges that commercial applications based on this technology are facing and how some of them can be tackled.
16:00–16:30 Tuesday, 23/05/2017
Hardcore Data Science
Location: London Suite 2/3
Aida Mehonic (The Alan Turing Institute)
Average rating: ***..
(3.00, 1 rating)
Aida Mehonic explains how ASI Data Science has trained a deep neural net on historical prices of liquid financial contracts. The neural net has already outperformed comparable strategies based on expert systems.
16:30–16:55 Tuesday, 23/05/2017
Hardcore Data Science
Location: London Suite 2/3
Secondary topics: Deep learning
David Barber (UCL)
Average rating: ****.
(4.00, 1 rating)
David Barber considers two issues related to training deep learning systems: natural language modeling and the use of higher-order optimization methods. He offers an overview of both topics, explores recent work, and demonstrates how to use them effectively.
16:55–17:00 Tuesday, 23/05/2017
Hardcore Data Science
Location: London Suite 2/3
Ben Lorica offers a recap of the day's talks and closing remarks.