Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Schedule: Emerging technologies and case studies sessions

9:00–12:30 Tuesday, 22 May 2018
Location: Capital Suite 10 Level: Beginner
Secondary topics:  Text and Language processing and analysis
Barbara Fusinska (Google)
Average rating: ****.
(4.33, 3 ratings)
Natural language processing techniques help address tasks like text classification, information extraction, and content generation. Barbara Fusinska offers an overview of natural language processing and walks you through building a bag-of-words representation, using Python and its machine learning libraries, and then using it for text classification.
9:00–17:00 Tuesday, 22 May 2018
Location: Capital Suite 2/3
Dan Jeavons (Shell), Hollie Lubbock (Fjord), Jivan Virdee (Fjord), Fausto Morales (Arundo), Marty Cochrane (Arundo), Jane McConnell (Teradata), Paul Ibberson (Teradata), Michael Troughton (Conduce), Jonathan Genah (DHL Supply Chain), Allison Nau (Cox Automotive UK), Dave Fitch (The Data Lab), Maria Assunta Palmieri (Data Reply), Niranjan Thomas (Dow Jones), Erik Elgersma (FrieslandCampina), Viola Melis (Typeform), Carme Artigas (Synergic Partners), Nuria Bombardo (Pepsico)
Hear practical insights from household brands and global companies: the challenges they tackled, the approaches they took, and the benefits and drawbacks of their solutions.
11:15–11:55 Wednesday, 23 May 2018
Location: Expo Hall Level: Beginner
Secondary topics:  Media, Advertising, Entertainment
Daniel Gilbert (News UK), Jonathan Leslie (Pivigo)
Average rating: ***..
(3.75, 4 ratings)
In the era of 24-hour news and online newspapers, editors in the newsroom must quickly and efficiently make sense of the enormous amounts of data that they encounter and make decisions about their content. Daniel Gilbert and Jonathan Leslie discuss an ongoing partnership between News UK and Pivigo in which a team of data science trainees helped develop an AI platform to assist in this task.
14:05–14:45 Wednesday, 23 May 2018
Location: Capital Suite 7 Level: Intermediate
Secondary topics:  Security and Privacy
Joshua Patterson (NVIDIA), Chau Dang (NVIDIA)
Joshua Patterson and Mike Wendt explain how NVIDIA used GPU-accelerated open source technologies to improve its cyberdefense platforms by leveraging software from the GPU Open Analytics Initiative (GOAI) and how the company accelerated anomaly detection with more efficient machine learning models, faster deployment, and more granular data exploration.
14:05–14:45 Wednesday, 23 May 2018
Location: Capital Suite 10/11 Level: Intermediate
Manas Ranjan Kar (Episource)
Average rating: ***..
(3.00, 3 ratings)
Episource is building a scalable NLP engine to help summarize medical charts and extract medical coding opportunities and their dependencies in order to recommend the best possible ICD-10 codes. Manas Ranjan Kar offers an overview of the wide variety of deep learning algorithms involved and the complex in-house training-data creation exercises that were required to make it work.
14:55–15:35 Wednesday, 23 May 2018
Location: Capital Suite 7 Level: Intermediate
Lee Blum (Verint Systems)
Lee Blum offers an overview of Verint's large-scale cyber-defense system built to serve its data scientists with versatile analytic operations on petabytes of data and trillions of records, covering the company's extremely challenging use case, decision considerations, major design challenges, tips and tricks, and the system’s overall results.
14:55–15:35 Wednesday, 23 May 2018
Location: Capital Suite 13 Level: Beginner
Aurélien Géron (Kiwisoft)
Average rating: ***..
(3.67, 3 ratings)
Convolutional neural networks (CNNs) can now complete many computer vision tasks with superhuman ability. This will have a large impact on manufacturing by improving anomaly detection, product classification, analytics, and more. Aurélien Géron details common CNN architectures, explains how they can be applied to manufacturing, and covers potential challenges along the way.
16:35–17:15 Wednesday, 23 May 2018
Location: Capital Suite 8/9 Level: Intermediate
Sean Glover (Lightbend)
Average rating: **...
(2.50, 2 ratings)
Kafka is best suited to run close to the metal on dedicated machines in static clusters, but these clusters are quickly becoming extinct. Companies want mixed-use clusters that take advantage of every resource available. Sean Glover offers an overview of leading Kafka implementations on DC/OS and Kubernetes to explore how reliably they run Kafka in container-orchestrated clusters.
16:35–17:15 Wednesday, 23 May 2018
Location: Capital Suite 15/16 Level: Intermediate
Jude McCorry (The Data Lab), Mahmood Adil (NHS National Services Scotland)
Average rating: *****
(5.00, 2 ratings)
Jude McCorry and Mahmood Adil offer an overview of Data Collaboratives, a new form of collaboration beyond the public-private partnership model, in which participants from different sectors exchange data, skills, leadership, and knowledge to solve complex problems facing children in Scotland and worldwide.
17:25–18:05 Wednesday, 23 May 2018
Location: Capital Suite 10/11 Level: Intermediate
Jorie Koster-Hale (Dataiku)
Average rating: *****
(5.00, 3 ratings)
Because crime is affected by a number of geospatial and temporal features, predicting crime poses a unique technical challenge. Jorie Koster-Hale shares an approach using a combination of open source data, machine learning, time series modeling, and geostatistics to determine where crime will occur, what predicts it, and what we can do to prevent it in the future.
17:25–18:05 Wednesday, 23 May 2018
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  Text and Language processing and analysis
Darren Cook (QQ Trend)
Darren Cook demonstrates how to use LSTMs, state-of-the-art tokenizers, dictionaries, and other data sources to tackle translation, focusing on one of the most difficult language pairs: Japanese to English.
17:25–18:05 Wednesday, 23 May 2018
Location: Capital Suite 15/16 Level: Non-technical
Richard Goyder (IMC Business Architecture | Scaled Insights), Barry Singleton (IMC Business Architecture)
Average rating: ***..
(3.60, 5 ratings)
Big data analytics tends to focus on what is easily available, which is by and large data about what has already happened, the implicit assumption being that past behavior will predict future behavior. Organizations already possess data they aren’t exploiting. Barry Singleton and Richard Goyder explain how, with the right tools, it can be used to develop far more powerful predictive algorithms.
12:05–12:45 Thursday, 24 May 2018
Location: S11B Level: Intermediate
Secondary topics:  Transportation and Logistics
Mark Grover (Lyft), Ted Malaska (Capital One)
Average rating: *****
(5.00, 6 ratings)
Many details go into building a big data system for speed, from determining an acceptable latency for data access and deciding where to store the data to solving multiregion problems, or even knowing just what data you have and where stream processing fits in. Mark Grover and Ted Malaska share challenges, best practices, and lessons learned doing big data processing and analytics at scale and at speed.
14:55–15:35 Thursday, 24 May 2018
Location: Capital Suite 12 Level: Non-technical
David Asboth (Cox Automotive Data Solutions), Shaun McGirr (Cox Automotive Data Solutions)
Average rating: ****.
(4.60, 5 ratings)
Cox Automotive is the world’s largest automotive service organization, which means it can combine data from across the entire vehicle lifecycle. Cox is on a journey to turn this data into insights. David Asboth and Shaun McGirr share their experience building up a data science team at Cox and scaling the company's data science process from laptop to Hadoop cluster.
16:35–17:15 Thursday, 24 May 2018
Location: Capital Suite 10/11 Level: Beginner
Secondary topics:  Financial Services
Jonathan Leslie (Pivigo), Tom Harrison (Hackney Council), Maryam Qurashi (Pivigo)
Average rating: *****
(5.00, 5 ratings)
One major challenge to social housing is determining how best to target interventions when tenants fall behind on rent payments. Jonathan Leslie, Maryam Qurashi, and Tom Harrison discuss a recent project in which a team of data scientist trainees helped Hackney Council devise a more efficient, targeted strategy to detect and prioritize such situations.