Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Schedule: Data science and advanced analytics sessions

The practice of data science—from the latest advances in machine learning, prediction, and technology to the softer topics of building a data science team and managing towards a culture of change.

9:00 - 17:00 Monday, 22 May & Tuesday, 23 May
Location: Capital Suite 16
Kai Voigt (Cloudera)
Learn how Spark and Hadoop enable data scientists to help companies reduce costs, increase profits, improve products, retain customers, and identify new opportunities. Using in-class simulations and exercises, Kai Voigt walks you through applying data science methods to real-world challenges in different industries, offering preparation for data scientist roles in the field.

9:00 - 17:00 Monday, 22 May & Tuesday, 23 May
Location: Capital Suite 17
Secondary topics:  Deep learning
Robert Schroll (The Data Incubator)
Robert Schroll demonstrates TensorFlow's capabilities through its Python interface, walking you through building machine-learning algorithms piece by piece and using the higher-level abstractions provided by TensorFlow. You'll then use this knowledge to build machine-learning models on real-world data.

9:00 - 12:30 Tuesday, 23 May 2017
Location: Capital Suite 12
Level: Beginner
Charlotte Werger (ASI Data Science)
Average rating: ***..
(3.80, 5 ratings)
Charlotte Werger offers a hands-on overview of implementing machine learning with Python, providing practical experience while covering the most commonly used libraries, including NumPy, pandas, and scikit-learn.

9:00 - 12:30 Tuesday, 23 May 2017
SOLD OUT
Location: Capital Suite 13
Secondary topics:  AI, Deep learning
Level: Intermediate
Alison Lowndes (NVIDIA)
Average rating: **...
(2.50, 4 ratings)
Alison Lowndes leads a hands-on exploration of approaches to the challenging problem of detecting if an object of interest is present within an image and, if so, recognizing its precise location within the image. Along the way, Alison walks you through testing three different approaches to deploying a trained DNN for inference.

9:00 - 17:00 Tuesday, 23 May 2017
Location: London Suite 2/3
Angie Ma (ASI), Ben Lorica (O'Reilly Media), Ira Cohen (Anodot), Yingsong Zhang (ASI Data Science), Ali Hürriyetoglu (Statistics Netherlands), Nelleke Oostdijk (Radboud University), Robin Senge (inovex GmbH), Mathew Salvaris (Microsoft), Miguel Gonzalez-Fierro (Microsoft), Amitai Armon (Intel), Yahav Shadmi (Intel), Kay Brodersen (Google), Ding Ding (Intel), Alan Mosca (Sendence | Birkbeck, University of London), Eduard Vazquez (Cortexica Vision Systems), Aida Mehonic (ASI Data Science), David Barber (Department of Computer Science, UCL)
A full day of hardcore data science, exploring emerging topics and new areas of study made possible by vast troves of raw data and cutting-edge architectures for analyzing and exploring information. Along the way, leading data science practitioners teach new techniques and technologies to add to your data science toolbox.

13:30 - 17:00 Tuesday, 23 May 2017
Location: Capital Suite 13
Secondary topics:  Cloud, Deep learning
Level: Advanced
Anima Anandkumar (UC Irvine)
Average rating: ***..
(3.67, 3 ratings)
Deep learning is the state of the art in domains such as computer vision and natural language understanding. Apache MXNet is a highly flexible and developer-friendly deep learning framework. Anima Anandkumar offers hands-on experience using Apache MXNet with preconfigured Deep Learning AMIs and CloudFormation templates to help speed up your development.

11:15 - 11:55 Wednesday, 24 May 2017
Location: Hall S21/23 (A)
Secondary topics:  Deep learning
Level: Intermediate
Mikio Braun (Zalando SE)
Average rating: ***..
(3.14, 7 ratings)
Deep learning has become the go-to solution for challenges such as image classification or speech processing, but does it work for all application areas? Mikio Braun offers background on deep learning and shares his practical experience working with these exciting technologies.

11:15 - 11:55 Wednesday, 24 May 2017
Location: Hall S21/23 (B)
Level: Intermediate
Matthew Rocklin (Anaconda)
Average rating: ****.
(4.33, 3 ratings)
Dask parallelizes Python libraries like NumPy, pandas, and scikit-learn, bringing a popular data science stack to the world of distributed computing. Matthew Rocklin discusses the architecture of Dask and its current applications in the wild.

11:15 - 11:55 Wednesday, 24 May 2017
Location: Capital Suite 8/9
Secondary topics:  AI, Deep learning, Ecommerce
Level: Beginner
Yishay Carmiel (IntelligentWire)
Average rating: ***..
(3.00, 1 rating)
For years, people have been talking about the great promise of conversational AI. Recently, deep learning has taken us a few steps further toward achieving tangible goals, making a big impact on technologies like speech recognition and natural language processing. Yishay Carmiel offers an overview of the impact of deep learning, recent breakthroughs, and challenges for the future.

11:15 - 11:55 Wednesday, 24 May 2017
Location: Capital Suite 7
Level: Intermediate
Damien Lefortier (Facebook)
Average rating: **...
(2.71, 7 ratings)
There are use cases where the only accessible feedback for training machine-learning models is partial and biased (e.g., when feedback is obtained through surveys). Damien Lefortier shares methods to handle these cases and explains how to ensure that they are performing well.

12:05 - 12:45 Wednesday, 24 May 2017
Location: Hall S21/23 (A)
Secondary topics:  Cloud
Anima Anandkumar (UC Irvine)
Average rating: ****.
(4.00, 2 ratings)
Anima Anandkumar demonstrates how to use preconfigured Deep Learning AMIs and CloudFormation templates on AWS to help speed up deep learning development and shares use cases in computer vision and natural language processing.

12:05 - 12:45 Wednesday, 24 May 2017
Location: Hall S21/23 (B)
Secondary topics:  R-lang
Level: Intermediate
Colin Gillespie (Jumping Rivers | Newcastle University)
Average rating: ****.
(4.33, 6 ratings)
R has a reputation for being slow. Colin Gillespie covers key ideas and techniques for making your R code as efficient as possible, from R setup to common R coding problems to linking R with C++ for an extra speed boost.

14:05 - 14:45 Wednesday, 24 May 2017
Location: Hall S21/23 (B)
Level: Intermediate
Seth Hendrickson (Cloudera)
Average rating: ***..
(3.57, 7 ratings)
There are many resources available for learning how to use Spark to build collaborative filtering models. However, there are relatively few that explain how to build a large-scale, end-to-end recommender system. Seth Hendrickson demonstrates how to create such a system using Spark Streaming, Spark ML, and Elasticsearch.

14:05 - 14:45 Wednesday, 24 May 2017
Location: Capital Suite 12
Level: Intermediate
Holden Karau (IBM), Seth Hendrickson (Cloudera)
Average rating: ***..
(3.25, 8 ratings)
Structured Streaming is new in Apache Spark 2.0, and work is being done to integrate the machine-learning interfaces with this new streaming system. Holden Karau and Seth Hendrickson demonstrate how to do streaming machine learning using Structured Streaming and walk you through creating your own streaming model.

14:05 - 14:45 Wednesday, 24 May 2017
Location: Capital Suite 7
Secondary topics:  Deep learning
Level: Intermediate
Average rating: ***..
(3.33, 3 ratings)
Deep learning is one of the most exciting techniques in machine learning. Miguel González-Fierro explores the problem of image classification using ResNet, the deep neural network that surpassed human-level accuracy for the first time, and demonstrates how to create an end-to-end process to operationalize deep learning in computer vision for business problems using Microsoft R Server and GPU VMs.

14:55 - 15:35 Wednesday, 24 May 2017
Location: Hall S21/23 (A)
Level: Advanced
Ted Dunning (MapR Technologies)
Average rating: ***..
(3.00, 2 ratings)
Ted Dunning offers an overview of tensor computing, covering in practical terms the high-level principles behind tensor computing systems, and explains how it can be put to good use in a variety of settings beyond training deep neural networks (the most common use case).

14:55 - 15:35 Wednesday, 24 May 2017
Location: Hall S21/23 (B)
Level: Intermediate
Michelle Casbon (Qordoba)
Average rating: *****
(5.00, 1 rating)
Supporting multiple locales involves the maintenance and generation of localized strings. Michelle Casbon explains how machine learning and natural language processing are applied to the underserved domain of localization using primarily open source tools, including Scala, Apache Spark, Apache Cassandra, and Apache Kafka.

16:35 - 17:15 Wednesday, 24 May 2017
Location: Hall S21/23 (B)
Level: Beginner
Sean Owen (Cloudera)
Average rating: ***..
(3.80, 5 ratings)
Nobody seems to agree just what data science is. Is it engineering, statistics, or both? David Donoho's "50 Years of Data Science" offers a criticism of the hype around data science from a statistics perspective, arguing that it's not a new field. Sean Owen responds, offering counterpoints from an engineer's perspective, in search of a better understanding of how to teach and practice data science in 2017.

16:35 - 17:15 Wednesday, 24 May 2017
Location: Capital Suite 10/11
Secondary topics:  Deep learning, Text Analysis and Mining
Level: Beginner
Jonathon Morgan (New Knowledge)
Average rating: *****
(5.00, 12 ratings)
Jonathon Morgan explores computer vision, deep learning, and natural language processing techniques for uncovering communities of white nationalists and neo-Nazis on social media and identifying which ones are on the path to radicalization.

16:35 - 17:15 Wednesday, 24 May 2017
Location: Capital Suite 12
Secondary topics:  Ecommerce, Financial services
Level: Intermediate
Harry Powell (Barclays), Raffael Strassnig (Barclays)
Average rating: ****.
(4.00, 6 ratings)
Harry Powell and Raffael Strassnig demonstrate how to model unobserved customer preferences over businesses by thinking about transactional data as a bipartite graph and then computing a new similarity metric, the expected degrees of separation, to characterize the full graph.

16:35 - 17:15 Wednesday, 24 May 2017
Location: Capital Suite 7
Secondary topics:  AI, Deep learning
Level: Beginner
Laura Froelich (Think Big Analytics, a Teradata Company)
Average rating: ***..
(3.00, 3 ratings)
Laura Froelich explores applications of deep learning in companies, looking at practical examples of assessing the opportunity for AI, phased adoption, and lessons going from research to prototype to scaled production deployment, and discusses the future of enterprise AI.

17:25 - 18:05 Wednesday, 24 May 2017
Location: Hall S21/23 (A)
Secondary topics:  Deep learning
Sherry Moore (Google)
Average rating: ***..
(3.60, 5 ratings)
Sherry Moore discusses TensorFlow progress and adoption over 2016 and looks ahead to TensorFlow efforts in future areas of importance, such as performance, usability, and ubiquity.

17:25 - 18:05 Wednesday, 24 May 2017
Location: Hall S21/23 (B)
Secondary topics:  AI, IoT, Logistics, Streaming
Level: Beginner
Dr.-Ing. Michael Nolting (Volkswagen Commercial Vehicles)
Average rating: *....
(1.67, 6 ratings)
It is nearly impossible to sample enough training data initially to prevent autonomous driving accidents on the road, as has been sadly proven by Tesla’s autopilot. Michael Nolting explains that to overcome this problem, a real-time system has to be created to detect dangerous situations at runtime, a process much like website monitoring.

17:25 - 18:05 Wednesday, 24 May 2017
Location: Capital Suite 12
Level: Intermediate
Natalino Busa (Teradata)
Natalino Busa shares an implementation for classifying pictures based on Spark and Slider that was developed during the 2016 Yelp Restaurant Photo Classification challenge. Spark processes data and trains the ML model, which consists of deep learning and ensemble classification methods, while picture scoring is exposed via an API that is persisted and scaled with Slider.

17:25 - 18:05 Wednesday, 24 May 2017
Location: Capital Suite 7
Level: Intermediate
Arshak Navruzyan (Startup.ML)
Average rating: ****.
(4.80, 5 ratings)
Deep learning affords novel and powerful techniques for video prediction and analysis. Arshak Navruzyan explores the current state of the art for video analysis using deep learning techniques and the associated challenges.

11:15 - 11:55 Thursday, 25 May 2017
Location: Hall S21/23 (A)
Secondary topics:  Deep learning
Level: Intermediate
Average rating: ****.
(4.00, 2 ratings)
Nikolay Manchev offers an overview of the restricted Boltzmann machine, a type of neural network with a wide range of applications, and shares his experience using it on Hadoop (MapReduce and Spark) to process unstructured and semistructured data at scale.

11:15 - 11:55 Thursday, 25 May 2017
Location: Hall S21/23 (B)
Level: Intermediate
David Talby (Pacific AI)
Average rating: ***..
(3.57, 7 ratings)
Machine learning and data science systems often fail in production in unexpected ways. David Talby shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries.

11:15 - 11:55 Thursday, 25 May 2017
Location: Capital Suite 10/11
Level: Beginner
Aurélien Géron (Kiwisoft)
Average rating: *****
(5.00, 4 ratings)
Collaborative filtering is great for recommendations, yet it suffers from the cold-start problem. New content with no views is ignored, and new users get poor recommendations. Aurélien Géron shares a solution: knowledge graphs. With a knowledge graph, you can truly understand your users' interests and make better, more relevant recommendations.

11:15 - 11:55 Thursday, 25 May 2017
Location: Capital Suite 2/3
Secondary topics:  Deep learning
Radhika Rangarajan explains how Intel works with its users to build deep learning-powered big data analytics applications (object detection, image recognition, NLP, etc.) using BigDL.

12:05 - 12:45 Thursday, 25 May 2017
Location: Hall S21/23 (A)
Secondary topics:  AI, Cloud, Deep learning
Level: Intermediate
Barbara Fusinska (Microsoft)
Average rating: ***..
(3.00, 5 ratings)
The popularity of deep learning is due in part to its capabilities in recognizing patterns from inputs such as images or sounds. Barbara Fusinska offers an overview of Microsoft Cognitive Toolkit, an open source framework offering various modules and algorithms enabling machines to learn like a human brain.

12:05 - 12:45 Thursday, 25 May 2017
Location: Hall S21/23 (B)
Secondary topics:  Cloud
Level: Intermediate
Leah McGuire (Salesforce)
Average rating: ****.
(4.00, 2 ratings)
What if you had to build more models than there are data scientists in the world, a feat that enterprise companies serving hundreds of thousands of businesses often have to accomplish? Leah McGuire offers an overview of Salesforce's general-purpose machine-learning platform that automatically builds per-company optimized models for any given predictive problem at scale, beating out most hand-tuned models.

14:05 - 14:45 Thursday, 25 May 2017
Location: Hall S21/23 (A)
Secondary topics:  Deep learning, PyData
Level: Intermediate
Martin Görner (Google)
Average rating: ****.
(4.75, 12 ratings)
With TensorFlow, deep machine learning has transitioned from an area of research into mainstream software engineering. Martin Görner walks you through building and training a neural network that recognizes handwritten digits with >99% accuracy using Python and TensorFlow.

14:05 - 14:45 Thursday, 25 May 2017
Location: Hall S21/23 (B)
Level: Intermediate
Rumman Chowdhury (Accenture)
Average rating: ***..
(3.00, 1 rating)
Multilevel regression and poststratification (MRP) is a method of estimating granular results from higher-level analyses. While it is generally used to estimate survey responses at a more granular level, MRP has clear applications in industry-level data science. Rumman Chowdhury reviews the methodology behind MRP and provides a hands-on programming tutorial.

14:55 - 15:35 Thursday, 25 May 2017
Location: Hall S21/23 (A)
Secondary topics:  Deep learning
Level: Beginner
Nir Lotan (Intel), Barak Rozenwax (Intel)
Average rating: *****
(5.00, 3 ratings)
Barak Rozenwax and Nir Lotan explain how to easily train and deploy deep learning models for image and text analysis problems using Intel's Deep Learning SDK, which enables you to use deep learning frameworks that were optimized to run fast on regular CPUs, including Caffe and TensorFlow.

14:55 - 15:35 Thursday, 25 May 2017
Location: Hall S21/23 (B)
Level: Advanced
Gary Willis (ASI)
Average rating: ***..
(3.20, 5 ratings)
Gary Willis offers a technical presentation of a novel algorithm that uses public data and an unsupervised tree-based learning algorithm to help companies leverage locational data they have on their clients. Along the way, Gary also discusses a wide range of further potential applications.

14:55 - 15:35 Thursday, 25 May 2017
Location: Capital Suite 13
Level: Beginner
Matt Brandwein (Cloudera), Tristan Zajonc (Cloudera)
Average rating: ***..
(3.00, 1 rating)
Self-service data science is easier said than delivered, especially on Apache Hadoop. Most organizations struggle to balance the diverging needs of the data scientist, data engineer, operator, and architect. Matt Brandwein and Tristan Zajonc cover the underlying root causes of these challenges and introduce new capabilities being developed to make self-service data science a reality.

14:55 - 15:35 Thursday, 25 May 2017
Location: Capital Suite 17
Level: Intermediate
Iñaki Puigdollers (Social Point)
Average rating: ****.
(4.00, 1 rating)
Low cost, big impact: this is what data science can bring to your business. Iñaki Puigdollers explores how the analytics department changed Social Point games, creating an even better gaming experience and business.

16:35 - 17:15 Thursday, 25 May 2017
Location: Hall S21/23 (A)
Level: Beginner
Paco Nathan (O'Reilly Media)
Average rating: ****.
(4.50, 2 ratings)
Paco Nathan explains how O'Reilly employs AI, from the obvious (chatbots, case studies about other firms) to the less so (using AI to show the structure of content in detail, enhance search and recommendations, and guide editors for gap analysis, assessment, pathing, etc.). Approaches include vector embedding search, summarization, TDA for content gap analysis, and speech-to-text to index video.

16:35 - 17:15 Thursday, 25 May 2017
Location: Hall S21/23 (B)
Level: Beginner
Galiya Warrier (Microsoft)
Galiya Warrier demonstrates how to apply a conversational interface (in the form of a chatbot) to communicate with an existing data science model.

16:35 - 17:15 Thursday, 25 May 2017
Location: Capital Suite 8/9
Secondary topics:  Deep learning, Streaming
Level: Intermediate
Kamran Yousaf (Redis Labs)
Average rating: ***..
(3.50, 6 ratings)
Kamran Yousaf explains how to substantially accelerate and radically simplify common practices in machine learning, such as running a trained model in production, to meet real-time expectations, using Redis modules that natively store and execute common models generated by Spark ML and TensorFlow algorithms.

16:35 - 17:15 Thursday, 25 May 2017
Location: Capital Suite 12
Secondary topics:  Deep learning, IoT
Level: Intermediate
Mads Ingwar (Think Big), Eliano Marques (Think Big)
Average rating: ****.
(4.50, 2 ratings)
Eliano Marques and Mads Ingwar share a case study on how to leverage data science to plan ship engine maintenance by warning about potential piston ring failure.