Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Schedule: Data science & advanced analytics sessions

11:30–12:00 Wednesday, 1/06/2016
Location: Capital Suite 4 Level: Advanced
Alexandre Dalyac (Tractable), Robert Hogan (Tractable)
Average rating: 3.25 (4 ratings)
The bottleneck in computer vision is creating sufficiently large labeled training sets for each new task. Alexandre Dalyac and Robert Hogan address this issue by combining dimensionality reduction, information retrieval, and domain adaptation techniques in a software product that acts as a human-algorithm interface, facilitating the transfer of expertise from human to machine.
13:30–17:00 Wednesday, 1/06/2016
Location: Capital Suite 13 Level: Intermediate
Aimee Gott (Mango Solutions), Mark Sellors (Mango Solutions), Richard Pugh (Mango Solutions)
Average rating: 4.67 (3 ratings)
Aimee Gott, Mark Sellors, and Richard Pugh explore techniques for optimizing your workflow in R when working with big data, including how to efficiently extract data from a database, techniques for visualization and analysis, and how all of this can be incorporated into a single, reproducible report, directly from R.
11:15–11:55 Thursday, 2/06/2016
Location: Capital Suite 8/9 Level: Advanced
Andy Petrella (Kensu), Melanie Warrick (Google)
Average rating: 3.35 (17 ratings)
Deep learning is taking data science by storm, thanks to the combination of stable distributed computing technologies, increasing amounts of data, and available computing resources. Andy Petrella and Melanie Warrick show how to implement a Spark-ready version of the long short-term memory (LSTM) neural network, which is widely used for some of the hardest natural language processing and understanding problems.
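For readers unfamiliar with LSTMs, the per-timestep computation of a standard LSTM cell can be sketched in a few lines of NumPy. This is a single-machine illustration only, not the Spark-ready implementation from the talk; the function names and the stacked gate order (input, forget, output, candidate) are assumptions of this sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell (hypothetical helper).

    W: (4*n_hidden, n_in), U: (4*n_hidden, n_hidden), b: (4*n_hidden,).
    Gate blocks are stacked as input, forget, output, candidate (assumed order).
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate: how much new content to write
    f = sigmoid(z[n:2 * n])      # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * n:3 * n])  # output gate: how much cell state to expose
    g = np.tanh(z[3 * n:4 * n])  # candidate cell content
    c = f * c_prev + i * g       # new cell state
    h = o * np.tanh(c)           # new hidden state
    return h, c

def lstm_forward(xs, W, U, b, n_hidden):
    """Run the cell over a sequence, starting from zero states."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    hs = []
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
        hs.append(h)
    return np.stack(hs), c

# Usage: a random 5-step sequence with 3 input features and 4 hidden units.
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)
hs, c_last = lstm_forward(rng.normal(size=(5, n_in)), W, U, b, n_hidden)
print(hs.shape)  # (5, 4): one hidden state per timestep
```

The forget gate is what lets the cell carry information across long sequences; distributing training (as in the talk) changes how gradients are aggregated, not this per-step arithmetic.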
12:05–12:45 Thursday, 2/06/2016
Location: Capital Suite 8/9 Level: Intermediate
David Talby (Pacific AI), Claudiu Branzan (Accenture)
Average rating: 2.89 (9 ratings)
David Talby and Claudiu Branzan offer a live demo of an end-to-end system that makes nontrivial clinical inferences from free-text patient records. Infrastructure components include Kafka, Spark Streaming, Spark, Titan, and Elasticsearch; data science components include custom UIMA annotators, curated taxonomies, machine-learned dynamic ontologies, and real-time inferencing.
14:05–14:45 Thursday, 2/06/2016
Location: Capital Suite 8/9 Level: Intermediate
Jo-fai Chow (H2O.ai)
Average rating: 3.50 (2 ratings)
The generalized low-rank model is a new machine-learning approach for reconstructing missing values and identifying important features in heterogeneous data. Through a series of examples, Jo-fai Chow demonstrates how to fit low-rank models in a parallelized framework and how to use these models to make better predictions.
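To make the idea of "reconstructing missing values" with a low-rank model concrete, here is a minimal sketch of the simplest special case: quadratic loss with L2 regularization on a purely numeric matrix, fit by alternating least squares over the observed entries only. Generalized low-rank models also support other losses and regularizers for heterogeneous (e.g. categorical) columns; this sketch does not cover those, and `low_rank_impute` is a hypothetical helper, not H2O's API.

```python
import numpy as np

def low_rank_impute(A, rank, n_iters=100, reg=1e-3, seed=0):
    """Fill NaN entries of A with a rank-`rank` approximation X @ Y.

    X (m x rank) and Y (rank x n) are fit by alternating ridge-regularized
    least squares, using only the observed (non-NaN) entries of A.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    mask = ~np.isnan(A)
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(m, rank))
    Y = rng.normal(size=(rank, n))
    I = reg * np.eye(rank)
    for _ in range(n_iters):
        # Update each row factor using only that row's observed columns.
        for i in range(m):
            cols = mask[i]
            Yc = Y[:, cols]
            X[i] = np.linalg.solve(Yc @ Yc.T + I, Yc @ A[i, cols])
        # Update each column factor using only that column's observed rows.
        for j in range(n):
            rows = mask[:, j]
            Xr = X[rows]
            Y[:, j] = np.linalg.solve(Xr.T @ Xr + I, Xr.T @ A[rows, j])
    # Keep observed values; fill only the missing ones from the model.
    filled = np.where(mask, A, X @ Y)
    return filled, X, Y

# Usage: a rank-1 matrix with one hidden entry (the true value is 6).
A = np.array([[1, 2, 3], [2, 4, 6], [3, np.nan, 9], [4, 8, 12]], dtype=float)
filled, X, Y = low_rank_impute(A, rank=1)
print(filled[2, 1])
```

The rows of `X` act as low-dimensional representations of each record, which is also how such a model can surface important features or feed a downstream predictor.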
14:05–14:45 Thursday, 2/06/2016
Location: Capital Suite 15/16 Level: Intermediate
Average rating: 4.56 (9 ratings)
Which venues have similar visiting patterns? How can we detect when a user is on vacation? Can we predict which venues will be favorited by users by examining their friends' preferences? Natalino Busa explains how these predictive analytics tasks can be accomplished by using Spark SQL, Spark ML, and just a few lines of Scala code.
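One common way to answer the first question, "which venues have similar visiting patterns?", is to represent each venue as a vector of visit counts per user and compare venues by cosine similarity. The sketch below is a single-machine Python illustration, not the Spark SQL/Spark ML Scala code from the talk; the `(user, venue)` check-in layout and the helper names are assumptions.

```python
import math
from collections import Counter, defaultdict

def venue_vectors(checkins):
    """Map each venue to a sparse vector (Counter) of visit counts per user."""
    vecs = defaultdict(Counter)
    for user, venue in checkins:
        vecs[venue][user] += 1
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(checkins, venue, top_k=3):
    """Rank other venues by cosine similarity of their visitor profiles."""
    vecs = venue_vectors(checkins)
    target = vecs[venue]
    scores = [(other, cosine(target, vec))
              for other, vec in vecs.items() if other != venue]
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:top_k]

# Usage with a toy set of hypothetical check-ins.
checkins = [("ann", "cafe"), ("ann", "cafe"), ("bob", "cafe"),
            ("ann", "gym"), ("bob", "gym"), ("cat", "museum")]
result = most_similar(checkins, "cafe")
print(result)
```

At scale, the same computation maps naturally onto a distributed join or a column-similarity routine, which is where a framework like Spark comes in.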
16:35–17:15 Thursday, 2/06/2016
Location: Capital Suite 8/9 Level: Intermediate
Tags: health, science
Tom White (Cloudera)
Average rating: 3.40 (5 ratings)
The advent of next-generation DNA sequencing technologies is revolutionizing life sciences research by routinely generating extremely large datasets. Tom White explains how big data tools developed to handle large-scale Internet data (like Hadoop) help scientists effectively manage data at this new scale and make it possible to address a host of questions that were previously out of reach.
17:25–18:05 Thursday, 2/06/2016
Location: Capital Suite 8/9 Level: Intermediate
Jeroen Janssens (Data Science Workshops B.V.)
Average rating: 4.00 (5 ratings)
A polyglot is a person who knows and is able to use several languages. There is a plethora of programming languages and computing environments available for working with data, and some data science projects require using multiple languages together. Jeroen Janssens discusses three approaches to becoming a polyglot data scientist.
17:25–18:05 Thursday, 2/06/2016
Location: Capital Suite 10/11 Level: Intermediate
Tags: ai
Marc Warner (ASI), Stuart Russell (UC Berkeley), Jaan Tallinn (CSER)
Average rating: 2.50 (6 ratings)
Stuart Russell and Jaan Tallinn explore and debate the future of artificial intelligence in a panel discussion moderated by Marc Warner.
11:15–11:55 Friday, 3/06/2016
Location: Capital Suite 8/9 Level: Intermediate
Thomas Wiecki (Quantopian)
Average rating: 3.50 (6 ratings)
Thomas Wiecki explores the prevalence of backtest overfitting and uses empirical findings to debunk several common myths in quantitative finance. Thomas demonstrates how he trained a machine-learning classifier on Quantopian's large and unique dataset of over 800,000 trading algorithms to predict whether an algorithm is overfit and how its future performance is likely to unfold.
11:15–11:55 Friday, 3/06/2016
Location: Capital Suite 17 Level: Intermediate
Anirudh Koul (Microsoft), Saqib Shaikh (Microsoft)
Average rating: 5.00 (3 ratings)
Anirudh Koul and Saqib Shaikh explore cutting-edge advances at the intersection of vision, language, and deep learning that help the blind community "see" the physical world, and explain how developers can use this state-of-the-art image-captioning and computer-vision technology in their own applications.
12:05–12:45 Friday, 3/06/2016
Location: Capital Suite 8/9 Level: Intermediate
Andy Petrella (Kensu), Dean Wampler (Lightbend)
Average rating: 3.12 (8 ratings)
Andy Petrella and Dean Wampler explore what it means to do data science today and why Scala succeeds at coping with large and fast data where older languages fail. Andy and Dean then discuss ongoing advanced data science projects that use Scala as their main language, including Splash, mic-cut problem, OptiML, needle (DL), ADAM, and more.
14:05–14:45 Friday, 3/06/2016
Location: Capital Suite 8/9 Level: Intermediate
Marcel Kornacker (Cloudera)
Average rating: 4.33 (6 ratings)
Marcel Kornacker explains how nested data structures can increase analytic productivity, using the well-known TPC-H schema to demonstrate how to simplify analytic workloads with nested schemas.
14:05–14:45 Friday, 3/06/2016
Location: Capital Suite 10/11 Level: Intermediate
Tags: geospatial, iot
Dirk Gorissen (Skycap | World Bank)
Dirk Gorissen demonstrates how to use machine learning to detect land mines from a drone-mounted ground-penetrating radar sensor.
14:05–14:45 Friday, 3/06/2016
Location: Capital Suite 17 Level: Intermediate
Alyona Medelyan (Thematic)
Average rating: 3.33 (6 ratings)
With the rise of deep learning, natural language understanding (NLU) techniques are becoming more effective and less reliant on costly annotated data. This opens up many new possibilities for what businesses can do with language. Alyona Medelyan explains what the newest NLU tools can achieve today and presents their common use cases.
16:35–17:15 Friday, 3/06/2016
Location: Capital Suite 8/9 Level: Intermediate
Gary Willis (ASI)
Applying a data-driven approach to the recruitment process has long been an aspirational goal for many organizations. In recent years, data science has made it a reality. Gary Willis explains how data science and, more importantly, an intelligent approach to interview design have enabled companies to start identifying unconscious bias in their recruitment process.