Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Data Science & Machine Learning

If you're in data, you need to understand machine learning

Machine learning lets you discover hidden insight from your data. It's a simple idea with phenomenal impact and sophisticated use cases like recommenders, text mining, real-time analytics, large-scale anomaly detection, and business forecasting.

At Strata, you’ll get a deeper and broader understanding of machine and deep learning—take a look at the sessions below.

Monday-Tuesday 21-22 May: 2-Day Training (Platinum & Training passes)
Tuesday 22 May: Tutorials (Gold & Silver passes)
Wednesday 23 May: Keynotes & Sessions (Platinum, Gold, Silver & Bronze passes)
9:00 | Location: Auditorium
Strata Data Conference Keynotes
10:45
Morning break
Thursday 24 May: Keynotes & Sessions (Platinum, Gold, Silver & Bronze passes)
9:00 | Location: Auditorium
Strata Data Conference Keynotes
10:45
Morning break
9:00–12:30 Tuesday, 22 May 2018
Location: Capital Suite 10 Level: Beginner
Secondary topics:  Text and Language processing and analysis
Barbara Fusinska (Google)
Average rating: ****.
(4.33, 3 ratings)
Natural language processing techniques help address tasks like text classification, information extraction, and content generation. Barbara Fusinska offers an overview of natural language processing and walks you through building a bag-of-words representation, using Python and its machine learning libraries, and then using it for text classification. Read more.
9:00–12:30 Tuesday, 22 May 2018
Location: Capital Suite 15 Level: Intermediate
Vartika Singh (Cloudera), Juan Yu (Cloudera), Marton Balassi (Cloudera), Steven Totman (Cloudera)
Average rating: ***..
(3.75, 4 ratings)
Vartika Singh, Marton Balassi, Steven Totman, and Juan Yu outline approaches for preprocessing, training, inference, and deployment across datasets (time series, audio, video, text, etc.) that leverage Spark, its extended ecosystem of libraries, and deep learning frameworks. Read more.
9:00–17:00 Tuesday, 22 May 2018
Location: Capital Suite 2/3
Dan Jeavons (Shell), Hollie Lubbock (Fjord), Jivan Virdee (Fjord), Fausto Morales (Arundo), Marty Cochrane (Arundo), Jane McConnell (Teradata), Paul Ibberson (Teradata), Michael Troughton (Conduce), Jonathan Genah (DHL Supply Chain), Allison Nau (Cox Automotive UK), Dave Fitch (The Data Lab), Maria Assunta Palmieri (Data Reply), Niranjan Thomas (Dow Jones), Erik Elgersma (FrieslandCampina), Viola Melis (Typeform), Carme Artigas (Synergic Partners), Nuria Bombardo (Pepsico)
Hear practical insights from household brands and global companies: the challenges they tackled, approaches they took, and the benefits—and drawbacks—of their solutions. Read more.
13:30–17:00 Tuesday, 22 May 2018
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  Text and Language processing and analysis
David Talby (Pacific AI), Claudiu Branzan (Accenture)
Average rating: ****.
(4.33, 3 ratings)
Natural language processing is a key component in many data science systems. David Talby and Claudiu Branzan lead a hands-on tutorial on scalable NLP using spaCy for building annotation pipelines, Spark NLP for building distributed natural language machine-learned pipelines, and Spark ML and TensorFlow for using deep learning to build and apply word embeddings. Read more.
11:15–11:55 Wednesday, 23 May 2018
Location: Capital Suite 10/11 Level: Beginner
Secondary topics:  Transportation and Logistics
Average rating: ****.
(4.45, 11 ratings)
Because in-house data science teams work with a range of business functions, traditional data science processes are often too abstract to cope with the complexity of these environments. Alberto Rey Villaverde and Grigorios Mingas share case studies from easyJet that highlight some unpredictable hurdles related to requirements, data, infrastructure, and deployment and explain how they solved them. Read more.
11:15–11:55 Wednesday, 23 May 2018
Location: Capital Suite 12 Level: Non-technical
Secondary topics:  Security and Privacy
Steven Touw (Immuta)
Average rating: ****.
(4.25, 4 ratings)
The Strata Data conference in London takes place during one of the most important weeks in the history of data regulation, as GDPR begins to be enforced. Steve Touw explores the effects of the GDPR on deploying machine learning models in the EU. Read more.
11:15–11:55 Wednesday, 23 May 2018
Location: Capital Suite 13 Level: Advanced
Mathew Salvaris (Microsoft), Miguel Gonzalez-Fierro (Microsoft), Ilia Karmanov (Microsoft)
Average rating: ****.
(4.00, 5 ratings)
Mathew Salvaris, Miguel Gonzalez-Fierro, and Ilia Karmanov offer a comparison of two platforms for running distributed deep learning training in the cloud, using a ResNet network trained on the ImageNet dataset as an example. You'll examine the performance of each as the number of nodes scales and learn some tips and tricks as well as some pitfalls to watch out for. Read more.
11:15–11:55 Wednesday, 23 May 2018
Location: Capital Suite 14
Secondary topics:  Media, Advertising, Entertainment, Security and Privacy
Guillaume Chaslot (AlgoTransparency)
Average rating: ****.
(4.17, 6 ratings)
An increasing number of ex-Google and ex-Facebook employees state that social media is starting to control us rather than the other way around. How can we determine if social media is a pure reflection of people's interests or if it pushes us toward specific narratives? Guillaume Chaslot explores methodologies to find out which narratives are favored by social media recommendation engines. Read more.
11:15–11:55 Wednesday, 23 May 2018
Location: Expo Hall Level: Beginner
Secondary topics:  Media, Advertising, Entertainment
Dan Gilbert (News UK), Jonathan Leslie (Pivigo)
Average rating: ***..
(3.75, 4 ratings)
In the era of 24-hour news and online newspapers, editors in the newsroom must quickly and efficiently make sense of the enormous amounts of data that they encounter and make decisions about their content. Daniel Gilbert and Jonathan Leslie discuss an ongoing partnership between News UK and Pivigo in which a team of data science trainees helped develop an AI platform to assist in this task. Read more.
12:05–12:45 Wednesday, 23 May 2018
Location: Capital Suite 10/11 Level: Intermediate
Secondary topics:  Financial Services
Baiju Devani (Aviva Canada), Etienne Chasse St-Laurent (Aviva Canada)
Average rating: ***..
(3.33, 3 ratings)
Risk-sharing pools allow insurers to get rid of risks they are forced to insure in highly regulated markets. Insurers thus cede both the risk and its premium. But are they ceding the right risk or simply giving up premium? Baiju Devani and Étienne Chassé St-Laurent share an applied machine learning approach that leverages an ensemble of models to gain a distinctive market advantage. Read more.
12:05–12:45 Wednesday, 23 May 2018
Location: Capital Suite 12
Secondary topics:  Media, Advertising, Entertainment, Security and Privacy
Elisa Celis (EPFL)
Average rating: ****.
(4.25, 4 ratings)
There is a pressing need to design new algorithms that are socially responsible in how they learn and socially optimal in the manner in which they use information. Elisa Celis explores the emergence of bias in algorithmic decision making and presents first steps toward developing a systematic framework to control biases in classical problems, such as data summarization and personalization. Read more.
12:05–12:45 Wednesday, 23 May 2018
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  E-commerce and Retail, Media, Advertising, Entertainment
Average rating: ****.
(4.43, 7 ratings)
In the last few years, deep learning has achieved significant success in a wide range of domains, including computer vision, artificial intelligence, speech, NLP, and reinforcement learning. However, deep learning in recommender systems has, until recently, received relatively little attention. Nick Pentreath explores recent advances in this area in both research and practice. Read more.
12:05–12:45 Wednesday, 23 May 2018
Location: Capital Suite 14 Level: Intermediate
Secondary topics:  Visualization, Design, and UX
Jeff Fletcher (Cloudera)
Average rating: ****.
(4.73, 11 ratings)
As big data adoption grows, Apache Hadoop, Apache Spark, and machine learning technologies are increasingly being used to analyze ever-larger datasets, but we still have to keep telling stories about the data and making sure the message is clear. Jeff Fletcher details the tools and techniques that are relevant to data visualization practitioners working with large datasets and predictive models. Read more.
12:05–12:45 Wednesday, 23 May 2018
Location: Expo Hall Level: Intermediate
Konstantinos Georgatzis (QuantumBlack), Martha Imprialou (QuantumBlack)
Konstantinos Georgatzis and Martha Imprialou explain how to interpret the predictions given by your black-box model and how machine learning is helping to drive decision making today. Read more.
14:05–14:45 Wednesday, 23 May 2018
Location: Capital Suite 10/11 Level: Intermediate
Manas Ranjan Kar (Episource)
Average rating: ***..
(3.00, 3 ratings)
Episource is building a scalable NLP engine to help summarize medical charts and extract medical coding opportunities and their dependencies to recommend the best possible ICD-10 codes. Manas Ranjan Kar offers an overview of the wide variety of deep learning algorithms involved and the complex in-house training-data creation exercises that were required to make it work. Read more.
14:05–14:45 Wednesday, 23 May 2018
Location: Capital Suite 12 Level: Intermediate
Secondary topics:  Telecom, Time Series and Graphs
Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)
Average rating: ***..
(3.00, 1 rating)
Heitor Murilo Gomes and Albert Bifet offer an overview of StreamDM, a real-time analytics open source software library built on top of Spark Streaming, developed at Huawei's Noah’s Ark Lab and Télécom ParisTech. Read more.
14:05–14:45 Wednesday, 23 May 2018
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  Security and Privacy
Eran Avidan (Intel)
Average rating: ****.
(4.50, 2 ratings)
Deep learning is revolutionizing many domains within computer vision, but doing real-time analysis is challenging. Eran Avidan offers an overview of a novel architecture based on Redis, Docker, and TensorFlow that enables real-time analysis of high-resolution streaming video. Read more.
14:05–14:45 Wednesday, 23 May 2018
Location: Capital Suite 14 Level: Non-technical
Jivan Virdee (Fjord), Hollie Lubbock (Fjord)
Average rating: *****
(5.00, 2 ratings)
Artificial intelligence systems are powerful agents of change in our society, but as this technology becomes increasingly prevalent—transforming our understanding of ourselves and our society—issues around ethics and regulation will arise. Jivan Virdee and Hollie Lubbock explore how to address fairness, accountability, and the long-term effects on our society when designing with data. Read more.
14:05–14:45 Wednesday, 23 May 2018
Location: Capital Suite 15/16 Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines sessions
Ihab Ilyas (University of Waterloo)
Average rating: ****.
(4.40, 5 ratings)
Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details in configuring and deploying ML techniques are the biggest hurdle. Ihab Ilyas provides insight into various techniques and discusses how machine learning, human expertise, and problem semantics collectively can deliver a scalable, high-accuracy solution. Read more.
14:55–15:35 Wednesday, 23 May 2018
Location: Capital Suite 10/11
Harvinder Atwal (Moneysupermarket)
Average rating: *****
(5.00, 4 ratings)
Harvinder Atwal offers an entertaining and practical introduction to DataOps, a new and independent approach to delivering data science value at scale, and shares experience-based solutions for increasing your velocity of value creation, including Agile prioritization and collaboration, new operational processes for an end-to-end data lifecycle, and more. Read more.
14:55–15:35 Wednesday, 23 May 2018
Location: Capital Suite 12 Level: Intermediate
Secondary topics:  E-commerce and Retail, Financial Services, Time Series and Graphs
Mikio Braun (Zalando)
Average rating: ****.
(4.40, 15 ratings)
Time series data has many applications in industry, in particular predicting the future based on historical data. Mikio Braun offers an overview of time series analysis with a focus on modern machine learning approaches and practical considerations, including recommendations for what works and what doesn't. Read more.
14:55–15:35 Wednesday, 23 May 2018
Location: Capital Suite 13 Level: Beginner
Aurélien Géron (Kiwisoft)
Average rating: ***..
(3.67, 3 ratings)
Convolutional neural networks (CNNs) can now complete many computer vision tasks with superhuman ability. This will have a large impact on manufacturing by improving anomaly detection, product classification, analytics, and more. Aurélien Géron details common CNN architectures, explains how they can be applied to manufacturing, and covers potential challenges along the way. Read more.
14:55–15:35 Wednesday, 23 May 2018
Location: Capital Suite 14 Level: Beginner
Secondary topics:  Visualization, Design, and UX
Brian O'Neill (Designing for Analytics)
Average rating: ****.
(4.00, 2 ratings)
Gartner says 85%+ of big data projects will fail. Your own company may have even spent millions on a recent project that isn’t really delivering the value or UX everyone hoped for. Brian O'Neill explains why CDOs, PMs, and business leaders who leverage design to prioritize utility, usability, and customer value will realize the best ROIs and demonstrates how to start evaluating your UX. Read more.
14:55–15:35 Wednesday, 23 May 2018
Location: Expo Hall Level: Intermediate
Secondary topics:  Managing and Deploying Machine Learning
Emre Velipasaoglu (Lightbend)
Average rating: ***..
(3.67, 3 ratings)
Most machine learning algorithms are designed to work on stationary data, but real-life streaming data is rarely stationary. Models lose prediction accuracy over time if they are not retrained. Without model quality monitoring, retraining decisions are suboptimal and costly. Emre Velipasaoglu reviews monitoring methods, focusing on their applicability in fast data and streaming applications. Read more.
16:35–17:15 Wednesday, 23 May 2018
Location: Capital Suite 10/11 Level: Non-technical
Secondary topics:  Text and Language processing and analysis
Naveed Ghaffar (Narrative Economics), Rashed Iqbal (UCLA)
Average rating: ***..
(3.67, 3 ratings)
Narratives are significant vectors of rapid change in culture, economic behavior, and the Zeitgeist of a society. Narrative economics studies the impact of popular human-interest stories on economic fluctuations. Naveed Ghaffar and Rashed Iqbal outline a framework that uses natural language understanding to extract and analyze narratives in human communication. Read more.
16:35–17:15 Wednesday, 23 May 2018
Location: Capital Suite 12 Level: Intermediate
Secondary topics:  Time Series and Graphs
Arun Kejariwal (Independent), Francois Orsini (MZ)
Average rating: ***..
(3.14, 7 ratings)
The rate of growth of data volume and velocity has been accelerating along with increases in the variety of data sources. This poses a significant challenge to extracting actionable insights in a timely fashion. Arun Kejariwal and Francois Orsini explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making. Read more.
16:35–17:15 Wednesday, 23 May 2018
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  Data Integration and Data Pipelines sessions, Media, Advertising, Entertainment
Sergey Ermolin (Intel), Olga Ermolin (MLS Listings)
Average rating: ****.
(4.00, 1 rating)
Aggregation of geospecific real estate databases results in duplicate entries for properties located near geographical boundaries. Sergey Ermolin and Olga Ermolin detail an approach for identifying duplicate entries via the analysis of images that accompany real estate listings that leverages a transfer learning Siamese architecture based on VGG-16 CNN topology. Read more.
16:35–17:15 Wednesday, 23 May 2018
Location: Capital Suite 14 Level: Beginner
Secondary topics:  Visualization, Design, and UX
Bargava Subramanian (Binaize), Amit Kapoor (narrativeVIZ)
Average rating: *****
(5.00, 1 rating)
Creating visualizations for data science requires an interactive setup that works at scale. Bargava Subramanian and Amit Kapoor explore the key architectural design considerations for such a system and discuss the four key trade-offs in this design space: rendering for data scale, computation for interaction speed, adapting to data complexity, and being responsive to data velocity. Read more.
17:25–18:05 Wednesday, 23 May 2018
Location: Capital Suite 10/11 Level: Intermediate
Jorie Koster-Hale (Dataiku)
Average rating: *****
(5.00, 3 ratings)
Because crime is affected by a number of geospatial and temporal features, predicting crime poses a unique technical challenge. Jorie Koster-Hale shares an approach using a combination of open source data, machine learning, time series modeling, and geostatistics to determine where crime will occur, what predicts it, and what we can do to prevent it in the future. Read more.
17:25–18:05 Wednesday, 23 May 2018
Location: Capital Suite 12 Level: Intermediate
Secondary topics:  Security and Privacy, Time Series and Graphs
Fabian Yamaguchi (ShiftLeft)
Average rating: ****.
(4.33, 3 ratings)
Fabian Yamaguchi offers an overview of Code Property Graph (CPG), a unique approach that allows the functional elements of code to be represented in an interconnected graph of data and control flows, which enables semantic information about code to be stored scalably on distributed graph databases over the web while allowing them to be rapidly accessed. Read more.
17:25–18:05 Wednesday, 23 May 2018
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  Text and Language processing and analysis
Darren Cook (QQ Trend)
Darren Cook demonstrates how to use LSTMs, state-of-the-art tokenizers, dictionaries, and other data sources to tackle translation, focusing on one of the most difficult language pairs: Japanese to English. Read more.
17:25–18:05 Wednesday, 23 May 2018
Location: Capital Suite 14 Level: Intermediate
Mark Grover (Lyft), Deepak Tiwari (Lyft)
Average rating: ***..
(3.83, 6 ratings)
Sure, you’ve got the best and fastest running SQL engine, but you’ve still got some problems: Users don’t know which tables exist or what they contain; sometimes bad things happen to your data, and you need to regenerate partitions but there is no tool to do so. Mark Grover and Deepak Tiwari explain how to make your team and your larger organization more productive when it comes to consuming data. Read more.
11:15–11:55 Thursday, 24 May 2018
Location: Capital Suite 10/11 Level: Beginner
Average rating: **...
(2.50, 2 ratings)
Tuning a Spark ML model using cross-validation involves a computationally expensive search over a large parameter space. Nick Pentreath and Bryan Cutler explain how enabling Spark to evaluate models in parallel can significantly reduce the time to complete this process for large workloads and share best practices for choosing the right configuration to achieve optimal resource usage. Read more.
11:15–11:55 Thursday, 24 May 2018
Location: Capital Suite 12 Level: Beginner
Jeroen Janssens (Data Science Workshops)
Average rating: ***..
(3.00, 2 ratings)
"Anyone who does not have the command line at their beck and call is really missing something," tweeted Tim O'Reilly when Jeroen Janssens's Data Science at the Command Line was recently made available online for free. Join Jeroen to learn what you're missing out on if you're not applying the command line and many of its power tools to typical data science problems. Read more.
11:15–11:55 Thursday, 24 May 2018
Location: Capital Suite 13 Level: Beginner
Secondary topics:  Managing and Deploying Machine Learning
Ramesh Sridharan (Captricity)
Average rating: ****.
(4.00, 1 rating)
Most uses of deep learning involve models trained with large datasets. Ramesh Sridharan explains how Captricity uses deep learning with tiny datasets at scale, training thousands of models using tens to hundreds of examples each. These models are dynamically trained using an automatic deployment framework, and carefully chosen metrics further exploit error properties of the resulting models. Read more.
11:15–11:55 Thursday, 24 May 2018
Location: Expo Hall Level: Beginner
Secondary topics:  Time Series and Graphs
Jared Lander (Lander Analytics)
Average rating: ****.
(4.00, 2 ratings)
Temporal data is being produced in ever-greater quantity, but fortunately our time series capabilities are keeping pace. Jared Lander explores techniques for modeling time series, from traditional methods such as ARMA to more modern tools such as Prophet and machine learning models like XGBoost and neural nets. Along the way, Jared shares theory and code for training these models. Read more.
11:15–11:55 Thursday, 24 May 2018
Location: Capital Suite 14 Level: Intermediate
Secondary topics:  Managing and Deploying Machine Learning
Ted Dunning (MapR, now part of HPE)
Average rating: *****
(5.00, 1 rating)
Ted Dunning offers an overview of the rendezvous architecture, which is geared to deal with much of the complexity involved in deploying models to production, thus allowing more time to be spent thinking and doing real data science. Ted covers the ideas behind the architecture, practical scenarios, and advantages and disadvantages of the architecture. Read more.
12:05–12:45 Thursday, 24 May 2018
Location: Capital Suite 10/11 Level: Intermediate
Secondary topics:  Financial Services
Mike Lee Williams (Cloudera Fast Forward Labs)
Average rating: *****
(5.00, 2 ratings)
Interpretable models result in more accurate, safer, and more profitable machine learning products, but interpretability can be hard to ensure. Michael Lee Williams examines the growing business case for interpretability, explores concrete applications including churn, finance, and healthcare, and demonstrates the use of LIME, an open source, model-agnostic tool you can apply to your models today. Read more.
12:05–12:45 Thursday, 24 May 2018
Location: Capital Suite 12 Level: Intermediate
Secondary topics:  Financial Services
Calum Murray (Intuit)
Average rating: *....
(1.50, 2 ratings)
Machine learning-based applications are becoming the new norm. Calum Murray shares five use cases at Intuit that use the data of over 60 million users to create delightful experiences for customers by solving repetitive tasks, freeing them up to spend time more productively or solving very complex tasks with simplicity and elegance. Read more.
12:05–12:45 Thursday, 24 May 2018
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  Data Platforms, Managing and Deploying Machine Learning
Moty Fania (Intel)
Moty Fania explains how Intel implemented an AI inference platform to enable internal visual inspection use cases and shares lessons learned along the way. The platform is based on open source technologies and was designed for real-time streaming and online actuation. Read more.
12:05–12:45 Thursday, 24 May 2018
Location: Expo Hall Level: Intermediate
Secondary topics:  Time Series and Graphs
Erik Nordström (Timescale)
Erik Nordström explains how and why to use PostgreSQL as a Prometheus backend to support complex questions (and get a proper SQL interface), offers an overview of pg_prometheus, a custom Prometheus datatype, and prometheus-postgresql-adapter, a remote storage adaptor for PostgreSQL, and shares his experience with TimescaleDB, which enables PostgreSQL to scale for classic monitoring volumes. Read more.
14:05–14:45 Thursday, 24 May 2018
Location: Capital Suite 10/11 Level: Beginner
Paco Nathan (derwen.ai)
Average rating: ****.
(4.50, 2 ratings)
Human in the loop (HITL) has emerged as a key design pattern for managing teams where people and machines collaborate. Such systems are mostly automated, with exceptions referred to human experts, who help train the machines further. Paco Nathan offers an overview of HITL from the perspective of a business manager, focusing on use cases within O'Reilly Media. Read more.
14:05–14:45 Thursday, 24 May 2018
Location: Capital Suite 12 Level: Beginner
Kaylea Haynes (Peak)
Deciding how much stock to hold is a challenge for hire businesses. There is a fine balance between holding enough stock to fulfill hires and holding so much that overall utilization is too low to achieve the return on investment. Kaylea Haynes shares a case study on forecasting the demand for thousands of assets across multiple locations. Read more.
14:05–14:45 Thursday, 24 May 2018
Location: Expo Hall Level: Intermediate
Secondary topics:  Financial Services, Text and Language processing and analysis
David Talby (Pacific AI), Saif Addin Ellafi (John Snow Labs), Paul Parau (UiPath)
Average rating: ****.
(4.50, 4 ratings)
Spark NLP natively extends Spark ML to provide natural language understanding capabilities with performance and scale that were not possible to date. David Talby, Saif Addin Ellafi, and Paul Parau explain how Spark NLP was used to augment the Recognos smart data extraction platform in order to automatically infer fuzzy, implied, and complex facts from long financial documents. Read more.
14:05–14:45 Thursday, 24 May 2018
Location: Capital Suite 2/3 Level: Intermediate
Secondary topics:  Telecom
Sven Loeffler (Deutsche Telekom)
Average rating: **...
(2.00, 1 rating)
Sven Löffler offers an overview of the Data Intelligence Hub, T-Systems's implementation of the Fraunhofer Industrial Data Space: a reference architecture for the standardized and secure data exchange between industries in the context of the internet of things. Read more.
14:55–15:35 Thursday, 24 May 2018
Location: Capital Suite 10/11 Level: Intermediate
Elena Terenzi (Microsoft), Michael Lanzetta (Microsoft)
Average rating: ****.
(4.00, 3 ratings)
Michael Lanzetta and Elena Terenzi offer an overview of a collaboration between Microsoft and the Royal Holloway University that applied deep learning to locate illegal small-scale mines in Ghana using satellite imagery, scaled training using Kubernetes, and investigated the mines' impact on surrounding populations and environment. Read more.
14:55–15:35 Thursday, 24 May 2018
Location: Capital Suite 12 Level: Non-technical
David Asboth (Cox Automotive Data Solutions), Shaun McGirr (Cox Automotive Data Solutions)
Average rating: ****.
(4.60, 5 ratings)
Cox Automotive is the world’s largest automotive service organization, which means it can combine data from across the entire vehicle lifecycle. Cox is on a journey to turn this data into insights. David Asboth and Shaun McGirr share their experience building up a data science team at Cox and scaling the company's data science process from laptop to Hadoop cluster. Read more.
14:55–15:35 Thursday, 24 May 2018
Location: Capital Suite 13 Level: Intermediate
Secondary topics:  Financial Services, Time Series and Graphs
Francesca Lazzeri (Microsoft), Jaya Susan Mathew (Microsoft)
Average rating: ****.
(4.00, 2 ratings)
Advancements in computing technologies and ecommerce platforms have amplified the risk of online fraud, which results in billions of dollars of loss for the financial industry. This trend has urged companies to consider AI techniques, including deep learning, for fraud detection. Francesca Lazzeri and Jaya Mathew explain how to operationalize deep learning models with Azure ML to prevent fraud. Read more.
14:55–15:35 Thursday, 24 May 2018
Location: Expo Hall Level: Beginner
Secondary topics:  Data Integration and Data Pipelines sessions, Data Platforms
Stamatis Stefanakos (D ONE AG)
Average rating: ****.
(4.33, 3 ratings)
Switzerland-based startup WinJi capitalizes on two current megatrends: big data and renewable energy. Stamatis Stefanakos offers an overview of WinJi's TruePower Asset Management Platform, covering the overall architecture and the motivation behind it, the physics behind the data, and the business case. Read more.
14:55–15:35 Thursday, 24 May 2018
Location: Capital Suite 14 Level: Intermediate
Chen Salomon (Playbuzz)
Average rating: ****.
(4.00, 1 rating)
A/B testing is the foundation of data-driven decision making. In today's world, advertising is crucial to a website's revenue, so it is even more important to measure the effects of changes correctly. Chen Salomon demonstrates how to correctly design and implement an advertisement A/B test and shares pitfalls, potential biases related to advertisement metrics, and possible mitigations. Read more.
16:35–17:15 Thursday, 24 May 2018
Location: Capital Suite 10/11 Level: Beginner
Secondary topics:  Financial Services
Jonathan Leslie (Pivigo), Tom Harrison (Hackney Council), Maryam Qurashi (Pivigo)
Average rating: *****
(5.00, 5 ratings)
One major challenge to social housing is determining how best to target interventions when tenants fall behind on rent payments. Jonathan Leslie, Maryam Qurashi, and Tom Harrison discuss a recent project in which a team of data scientist trainees helped Hackney Council devise a more efficient, targeted strategy to detect and prioritize such situations. Read more.
16:35–17:15 Thursday, 24 May 2018
Location: Capital Suite 13 Level: Intermediate
Amit Kapoor (narrativeVIZ), Bargava Subramanian (Binaize)
Amit Kapoor and Bargava Subramanian lead three live demos of deep learning (DL) done in the browser—building explorable explanations to aid insight, building model inference applications, and rapid prototyping and training an ML model—using the emerging client-side JavaScript libraries for DL. Read more.
16:35–17:15 Thursday, 24 May 2018
Location: Capital Suite 14 Level: Intermediate
Pascal Bugnion (ASI Data Science)
Jupyter widgets let you create lightweight, interactive graphical interfaces directly in Jupyter notebooks. Pascal Bugnion demonstrates how to use Jupyter widgets to implement human-in-the-loop machine learning with highly interactive user interfaces. Read more.