Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Data Science, Machine Learning & AI

If you're in data, you need to understand machine learning & AI

Machine learning lets you discover hidden insight from your data. It's a simple idea with phenomenal impact and sophisticated use cases like recommenders, text mining, real-time analytics, large-scale anomaly detection, and business forecasting.

At Strata, you’ll get a deeper and broader understanding of machine and deep learning—take a look at the sessions below.

Monday 29 April - Tuesday 30 April: 2-Day Training (Platinum & Training passes)
Tuesday 30 April: Tutorials (Gold & Silver passes)
Wednesday 1 May: Keynotes & Sessions (Platinum, Gold, Silver & Bronze passes)
9:00 | Location: Auditorium
Strata Data Conference Keynotes
10:45
Morning break
Thursday 2 May: Keynotes & Sessions (Platinum, Gold, Silver & Bronze passes)
9:00 | Location: Auditorium
Strata Data Conference Keynotes
10:45
Morning break
9:00 - 17:00 Monday, 29 April & Tuesday, 30 April
Location: S11 C
Secondary topics:  Deep Learning
Ana Hocevar (The Data Incubator)
The TensorFlow library supports computational graphs with automatic parallelization across resources, an architecture well suited to implementing neural networks. This training introduces TensorFlow's capabilities in Python, moving from building machine learning algorithms piece by piece to using the Keras API provided by TensorFlow, with several hands-on applications.
9:00 - 17:00 Monday, 29 April & Tuesday, 30 April
Location: Capital Suite 1
Secondary topics:  Data preparation, data governance, and data lineage
Don Fox (The Data Incubator)
We will walk through all the steps of developing a machine learning pipeline, from prototyping to production. We’ll look at data cleaning, feature engineering, model building/evaluation, and deployment. Students will extend these models into two applications from real-world datasets. All work will be done in Python.
9:00 - 17:00 Monday, 29 April & Tuesday, 30 April
Location: Capital Suite 7
Secondary topics:  Deep Learning
Ian Cook (Cloudera)
Advancing your career in data science requires learning new languages and frameworks, but learners face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by elucidating the abstractions common to these systems. Through hands-on exercises, you'll overcome obstacles to getting started using new tools.
9:00 - 17:00 Monday, 29 April & Tuesday, 30 April
Location: London Suite 3
Secondary topics:  Deep Learning, Model lifecycle management
Amir Issaei (Databricks)
The course covers the fundamentals of neural networks and how to build distributed Keras/TensorFlow models on top of Spark DataFrames. Throughout the class, you will use Keras, TensorFlow, Deep Learning Pipelines, and Horovod to build and tune models. You will also use MLflow to track experiments and manage the machine learning lifecycle. NOTE: This course is taught entirely in Python.
9:00 - 12:30 Tuesday, 30 April 2019
Location: Capital Suite 14
Secondary topics:  Model lifecycle management
Danilo Sato (ThoughtWorks), Christoph Windheuser (ThoughtWorks Inc.)
In this workshop, we show how to apply Continuous Delivery (CD), a concept ThoughtWorks pioneered, to data science and machine learning. CD lets data scientists make changes to their models while safely integrating and deploying them into production, using testing and automation techniques to release reliably at any time and at high frequency.
9:00 - 12:30 Tuesday, 30 April 2019
Location: Capital Suite 15
Secondary topics:  AI and Data technologies in the cloud, Data preparation, data governance, and data lineage, Health and Medicine
S.P.T. Krishnan (REAN Cloud (A Hitachi Vantara company))
This tutorial provides an overview of the latest serverless big data and machine learning technologies from AWS, followed by a deep dive into using them to process and analyze two datasets: publicly available Bureau of Labor Statistics data and chest X-ray image data.
9:00 - 12:30 Tuesday, 30 April 2019
Location: Capital Suite 2/3
Secondary topics:  AI and Data technologies in the cloud, Model lifecycle management
Holden Karau (Google), Trevor Grant (IBM), Ilan Filonenko (Bloomberg LP), Francesca Lazzeri (Microsoft)
This workshop quickly introduces Kubeflow and shows how to use it to train and serve models across different cloud environments (and on-prem). We’ll have a script ready to do the initial setup work, so you can jump (almost) straight into training a model on one cloud and then look at how to set up serving in another cluster or cloud. We will start with a simple model, with follow-up links.
9:00 - 12:30 Tuesday, 30 April 2019
Location: Capital Suite 4
Secondary topics:  AI and Data technologies in the cloud, Deep Learning
Amy Unruh (Google)
This tutorial provides an introduction to designing and building machine learning models on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, you’ll learn machine learning (ML) and TensorFlow concepts and build skills in developing, evaluating, and productionizing ML models.
13:30 - 17:00 Tuesday, 30 April 2019
Location: Capital Suite 14
Secondary topics:  Deep Learning, Text and Language processing and analysis
Alexander Thomas, Claudiu Branzan (G2 Web Services)
This is a hands-on tutorial for scalable NLP using the highly performant, highly scalable open-source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.
13:30 - 17:00 Tuesday, 30 April 2019
Location: Capital Suite 15
Secondary topics:  AI and Data technologies in the cloud, Deep Learning, Financial Services, Temporal data and time-series
Francesca Lazzeri (Microsoft), Aashish Bhateja (Microsoft)
Time series modeling and forecasting are fundamentally important in many practical domains, and over the past few decades machine learning-based forecasting has become widely used in private and public decision-making. In this tutorial, we will walk you through the core steps for using Azure Machine Learning to build and deploy your time series forecasting models.
13:30 - 17:00 Tuesday, 30 April 2019
Location: Capital Suite 4
Secondary topics:  AI and Data technologies in the cloud, Deep Learning
Amy Unruh (Google)
This tutorial provides an introduction to designing and building machine learning models on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, you’ll learn machine learning (ML) and TensorFlow concepts and build skills in developing, evaluating, and productionizing ML models.
11:15 - 11:55 Wednesday, 1 May 2019
Location: Expo Hall (Capital Hall N24)
Secondary topics:  Media, Marketing, Advertising, Retail and e-commerce
Mounia Lalmas (Spotify)
Our mission is "to match fans and artists in a personal and relevant way". In this talk, Mounia describes some of the research we are doing to achieve this, from using machine learning to metric validation, including work done in the context of Home, Search, and Voice.
11:15 - 11:55 Wednesday, 1 May 2019
Location: Capital Suite 14
Secondary topics:  Deep Learning, Media, Marketing, Advertising, Text and Language processing and analysis
In this talk, you will learn how to use Spark NLP and Apache Spark to standardize semi-structured text, and see how Indeed standardizes resume content at scale.
11:15 - 11:55 Wednesday, 1 May 2019
Location: Capital Suite 15/16
Secondary topics:  Ethics, Security and Privacy
The application of AI algorithms in domains such as criminal justice, credit scoring, and hiring holds great promise. At the same time, it raises legitimate concerns about algorithmic fairness, and there is a growing demand for fairness, accountability, and transparency from machine learning (ML) systems. In this talk, we cover how to build an ML pipeline that meets these demands using open source tools.
11:15 - 11:55 Wednesday, 1 May 2019
Location: Capital Suite 17
Secondary topics:  Deep Learning, Financial Services, Temporal data and time-series
Sami Niemi (Barclays)
Predicting transaction fraud in debit and credit card payments in real time is an important challenge, which state-of-the-art supervised machine learning models can help to solve. Barclays has been developing and testing different solutions and will show how well different models perform in a variety of situations, such as card-present and card-not-present debit and credit card transactions.
12:05 - 12:45 Wednesday, 1 May 2019
Location: Expo Hall (Capital Hall N24)
Secondary topics:  AI and machine learning in the enterprise, Text and Language processing and analysis
Matthew Honnibal (Explosion AI)
In this talk, I'll discuss "one weird trick" that can give your NLP project a better chance of success. The advice is this: avoid a "waterfall" methodology where data definition, corpus construction, modelling and deployment are performed as separate phases of work.
12:05 - 12:45 Wednesday, 1 May 2019
Location: Capital Suite 14
Secondary topics:  Text and Language processing and analysis
Yves Peirsman (NLP Town)
In this age of big data, NLP professionals are all too often faced with a lack of data: written language is abundant, but labelled texts are much harder to come by. In my talk, I will discuss the most effective ways of addressing this challenge, from the semi-automatic construction of labelled training data to transfer learning approaches that reduce the need for labelled training examples.
12:05 - 12:45 Wednesday, 1 May 2019
Location: Capital Suite 15/16
Secondary topics:  Visualization, Design, and UX
Michael Freeman (University of Washington)
Statistical and machine learning techniques are only useful when they're understood by decision makers. While implementing these techniques is easier than ever, communicating about their assumptions and mechanics is not. In this session, participants will learn a design process for crafting visual explanations of analytical techniques and communicating them to stakeholders.
12:05 - 12:45 Wednesday, 1 May 2019
Location: Capital Suite 17
Secondary topics:  Deep Learning, Temporal data and time-series
Arun Kejariwal (Independent), Ira Cohen (Anodot)
Recently, sequence-to-sequence (S2S) modeling has also been applied to time series data. In this talk, we first give an overview of S2S and its early use cases, then walk through how S2S modeling can be leveraged for real-time anomaly detection and forecasting.
14:05 - 14:45 Wednesday, 1 May 2019
Location: Capital Suite 14
Secondary topics:  Media, Marketing, Advertising, Text and Language processing and analysis
Maryam Jahanshahi (TapRecruit)
In this talk I will discuss exponential family embeddings, which are methods that extend the idea behind word embeddings to other data types. I will describe how we used dynamic embeddings to understand how data science skill-sets have transformed over the last 3 years using our large corpus of job descriptions. The key takeaway is that these models can enrich analysis of specialized datasets.
14:05 - 14:45 Wednesday, 1 May 2019
Location: Capital Suite 15/16
Secondary topics:  Financial Services, Temporal data and time-series
Alun Biffin (Van Lanschot Kempen), David Dogon (Van Lanschot Kempen)
In this talk, we describe how machine learning revolutionized the stock-picking process for portfolio managers at Kempen Capital Management by filtering the vast small-cap investment universe down to a handful of optimal stocks.
14:05 - 14:45 Wednesday, 1 May 2019
Location: Capital Suite 17
Secondary topics:  Deep Learning, Text and Language processing and analysis
David Low (Pand.ai)
Transfer learning has proven a tremendous success in computer vision as a result of the ImageNet competition. In the past months, natural language processing has witnessed several breakthroughs with transfer learning, namely ELMo, OpenAI Transformer, and ULMFiT. In this talk, David showcases the use of transfer learning in an NLP application with state-of-the-art (SOTA) accuracy.
14:05 - 14:45 Wednesday, 1 May 2019
Location: Expo Hall (Capital Hall N24)
Secondary topics:  Security and Privacy
Mikio Braun (Zalando SE)
In this talk, we will look at techniques and concepts around fairness, privacy, and security when it comes to machine learning models.
14:55 - 15:35 Wednesday, 1 May 2019
Location: Expo Hall (Capital Hall N24)
Secondary topics:  Deep Learning
Wolff Dobson (Google)
In this talk, we will cover the latest in TensorFlow, both for beginners and for developers migrating from 1.x to 2.0. We'll cover the best ways to set up your model, feed your data to it, and distribute it for fast training. We'll also look at how TensorFlow has been recently upgraded to be more intuitive.
14:55 - 15:35 Wednesday, 1 May 2019
Location: Capital Suite 14
Secondary topics:  Data preparation, data governance, and data lineage
Ihab Ilyas (University of Waterloo | Tamr)
Last year, we covered two primary challenges in applying machine learning to data curation: entity consolidation and using probabilistic inference to suggest data repairs for identified errors and anomalies. This year, we'll cover these challenges in greater detail and explain why data unification projects quickly come to require human-guided machine learning and a probabilistic model.
14:55 - 15:35 Wednesday, 1 May 2019
Location: Capital Suite 15/16
Secondary topics:  Ethics, Financial Services, Health and Medicine
Eitan Anzenberg (Flowcast AI)
Machine learning applications balance interpretability and performance. Linear models provide formulas to directly compare the influence of the input variables, while non-linear algorithms produce more accurate models. We utilize "what-if" scenarios to calculate the marginal influence of features per prediction and compare with standardized methods such as LIME.
16:35 - 17:15 Wednesday, 1 May 2019
Location: Capital Suite 14
Secondary topics:  AI and machine learning in the enterprise, Data preparation, data governance, and data lineage, Text and Language processing and analysis, Transportation and Logistics
Divya Choudhary (GOJEK)
Data scientists around the globe would agree that addresses are the most unorganised textual data. Structuring addresses has almost led to a new stream of NLP itself. Who would've imagined that address text data could be used to develop one of the coolest product features: finding the most precise pick-up/drop-off locations for e-commerce, logistics, food delivery, or ride/car service companies!
16:35 - 17:15 Wednesday, 1 May 2019
Location: Capital Suite 15/16
Secondary topics:  Automation in data science and big data, Temporal data and time-series
Shivnath Babu (Unravel Data Systems | Duke University), Alkis Simitsis (Micro Focus)
Cost and resource provisioning are critical components of the big data stack. A magic 8-ball for the big data stack would give an enterprise a glimpse into its future needs and would enable effective and cost-efficient project and operational planning. This talk covers how to build that magic 8-ball, a decomposable time-series model, for optimal cost and resource allocation for the big data stack.
16:35 - 17:15 Wednesday, 1 May 2019
Location: Capital Suite 17
Secondary topics:  Deep Learning, Temporal data and time-series
Guoqiong Song (Intel)
Collecting and processing massive time series data (e.g., logs and sensor readings) and detecting anomalies in real time are critical for many emerging smart systems in areas such as industry, manufacturing, AIOps, and IoT. This talk shares how to detect anomalies in time series data using Analytics Zoo and BigDL at scale on a standard Spark cluster.
17:25 - 18:05 Wednesday, 1 May 2019
Location: Capital Suite 14
Secondary topics:  Security and Privacy
Chris Wallace (Cloudera)
Imagine building a model whose training data is collected on edge devices such as cell phones or sensors. Each device collects data unlike any other, and the data cannot leave the device because of privacy concerns or unreliable network access. This challenging situation is known as federated learning. In this talk we’ll cover the algorithmic solutions and the product opportunities.
17:25 - 18:05 Wednesday, 1 May 2019
Location: Capital Suite 15/16
Secondary topics:  Text and Language processing and analysis
Weifeng Zhong (American Enterprise Institute)
We developed a machine learning algorithm to “read” the People’s Daily (the official newspaper of the Communist Party of China) and predict changes in China’s policy priorities using only the information in the newspaper. The output of this algorithm, which we call the Policy Change Index (PCI) of China, turns out to be a leading indicator of the actual policy changes in China since 1951.
11:15 - 11:55 Thursday, 2 May 2019
Location: Capital Suite 14
Secondary topics:  AI and machine learning in the enterprise, Financial Services, Security and Privacy, Text and Language processing and analysis
Charlotte Werger (Van Lanschot Kempen)
This talk discusses a best-practice use case for detecting fraud at a financial institution. Where traditional systems fall short, machine learning models can provide a solution. Sifting through large amounts of transaction data, external hit lists, and unstructured text data, we managed to build a dynamic and robust monitoring system that successfully detects unwanted client behavior.
11:15 - 11:55 Thursday, 2 May 2019
Location: Capital Suite 15/16
Secondary topics:  Media, Marketing, Advertising, Retail and e-commerce
Sophie Watson (Red Hat)
Identifying relevant documents quickly and efficiently enhances both user experience and business revenue every day. Sophie Watson demonstrates how to implement Learning to Rank algorithms and provides you with the information you need to implement your own successful ranking system.
11:15 - 11:55 Thursday, 2 May 2019
Location: Capital Suite 17
Secondary topics:  Deep Learning, Graph technologies and analytics, Security and Privacy
Scott Stevenson (Faculty)
Modern deep learning systems allow us to build speech synthesis systems with the naturalness of a human speaker. Whilst there are myriad benevolent applications, this also ushers in a new era of fake news. This talk will explore the danger of such systems, as well as how deep learning can also be used to build countermeasures to protect against political disinformation.
11:15 - 11:55 Thursday, 2 May 2019
Location: Expo Hall (Capital Hall N24)
Secondary topics:  Ethics
Machine-learning algorithms are good at learning new behaviors, but bad at identifying when those behaviors are harmful or don’t make sense. Bias, ethics, and fairness are big risk factors in machine learning (ML). We have a lot of experience dealing with intelligent beings: one another. In this talk, we use this common sense to build a checklist for protecting against ethical violations with ML.
12:05 - 12:45 Thursday, 2 May 2019
Location: Expo Hall (Capital Hall N24)
Secondary topics:  Text and Language processing and analysis
Ines Montani (Explosion AI)
In this talk, I'll explain spaCy's new support for efficient and easy transfer learning, and show you how it can kickstart new NLP projects with our new annotation tool, Prodigy Scale.
12:05 - 12:45 Thursday, 2 May 2019
Location: Capital Suite 14
Secondary topics:  Deep Learning, Text and Language processing and analysis
Moshe Wasserblat presents an overview of NLP Architect, an open source deep learning NLP library that provides state-of-the-art NLP models. It makes it easy for researchers to implement NLP algorithms and for data scientists to build NLP-based solutions that extract insight from textual data to improve business operations.
12:05 - 12:45 Thursday, 2 May 2019
Location: Capital Suite 15/16
SEONMIN KIM (LINE Corp)
Kim introduces ways to mitigate the risk of mobile payments using data analysis techniques drawn from actual case studies of mobile fraud, including tree-based machine learning, graph analytics, and statistical approaches.
12:05 - 12:45 Thursday, 2 May 2019
Location: Capital Suite 17
Swetha Machanavajhala (Microsoft), Xiaoyong Zhu (Microsoft)
In this auditory world, the human brain processes and reacts effortlessly to a variety of sounds. While many of us take this for granted, over 360 million people worldwide are deaf or hard of hearing. We will explain how to make the auditory world inclusive, and meet the great demand in other sectors, by applying deep learning to audio in Azure.
14:05 - 14:45 Thursday, 2 May 2019
Location: Expo Hall (Capital Hall N24)
Secondary topics:  Data Integration and Data Pipelines, Deep Learning
Alex Jaimes (Dataminr)
When emergency events occur, social signals and sensor data are generated. In this talk, I will describe how machine learning and deep learning are applied to process large amounts of heterogeneous data from various sources in real time, with a particular focus on how such information can help first responders in emergencies and critical events and support other social good use cases.
14:05 - 14:45 Thursday, 2 May 2019
Location: Capital Suite 14
Secondary topics:  Graph technologies and analytics
Mingxi Wu (TigerGraph)
A graph query language is the key to unlocking value from connected data. In this talk, we point out 8 prerequisites of a practical graph query language, drawn from our 6 years of experience with real-world graph analytics use cases, and compare GSQL, Gremlin, Cypher, and SPARQL in this regard.
14:05 - 14:45 Thursday, 2 May 2019
Location: Capital Suite 15/16
Secondary topics:  IoT and its applications, Temporal data and time-series
Christian Hidber (bSquare)
Reinforcement learning (RL) learns complex processes autonomously, such as walking, beating the world champion at Go, or flying a helicopter. No big data sets with the “right” answers are needed: the algorithms learn by experimenting. We show “how” and “why” RL works in an intuitive fashion and highlight how to apply it to an industrial hydraulics application with 7000 clients in 42 countries.
14:05 - 14:45 Thursday, 2 May 2019
Location: Capital Suite 17
Secondary topics:  Deep Learning
Deep learning has enabled massive breakthroughs in offbeat areas, including a better understanding of how an artist paints or composes music. As part of Nischal and Raghotham’s much-loved project, Deep Learning for Humans, they want to build a font classifier and show a broad audience how fonts can be classified, and how and why two or more fonts are similar.
14:55 - 15:35 Thursday, 2 May 2019
Location: Expo Hall (Capital Hall N24)
Secondary topics:  Deep Learning, Media, Marketing, Advertising, Retail and e-commerce
Oliver Gindele (Datatonic)
The success of deep learning has reached the realm of structured data in the past few years, where neural networks have been shown to improve the effectiveness and predictability of recommendation engines. This session gives a brief overview of such deep recommender systems and how they can be implemented in TensorFlow.
14:55 - 15:35 Thursday, 2 May 2019
Location: Capital Suite 14
Shioulin Sam (Cloudera Fast Forward Labs)
Machine learning requires large datasets, a prohibitive limitation in many real-world applications. What if we could build models from scratch that could recognize images using only a handful of labeled examples? In this talk, we will cover algorithmic solutions that enable learning with limited data, and discuss product opportunities.
14:55 - 15:35 Thursday, 2 May 2019
Location: Capital Suite 15/16
Secondary topics:  IoT and its applications, Temporal data and time-series, Transportation and Logistics
Christopher Hooi (Land Transport Authority of Singapore)
The Fusion Analytics for Public Transport Event Response (FASTER) system provides a real-time advanced analytics solution for early warning of potential train incidents. Using novel fusion analytics of multiple data sources, FASTER harnesses engineering and commuter-centric IoT data to activate contingency plans at the earliest possible time and reduce the impact on commuters.
14:55 - 15:35 Thursday, 2 May 2019
Location: Capital Suite 17
Secondary topics:  AI and machine learning in the enterprise, Deep Learning
Yoav Einav (GigaSpaces)
Technological advancements are transforming customer experience, and businesses are beginning to benefit from deep learning innovations that automate call center routing to the most appropriate agent. This session discusses how deep learning models can be run with the Intel BigDL and Spark frameworks co-located on an in-memory computing platform to enhance the customer experience without the need for GPUs.
16:35 - 17:15 Thursday, 2 May 2019
Location: Capital Suite 14
Secondary topics:  AI and machine learning in the enterprise, Financial Services, Security and Privacy
Brennan Lodge (Goldman Sachs), Jay Kesavan (Bowery Analytics LLC)
Cyber security analysts are under siege to keep pace with the ever-changing threat landscape. They are overworked, burned out, and bombarded with alerts that they must carefully investigate. To empower our cyber security analysts, we can use a data science model for alert evaluations.
16:35 - 17:15 Thursday, 2 May 2019
Location: Capital Suite 15/16
Secondary topics:  IoT and its applications, Transportation and Logistics
GRDF helps bring natural gas to nearly 11 million customers every day. In partnership with GRDF, Dataiku worked to optimise the manual process of qualifying addresses to visit and ultimately save GRDF time and money. This solution was the culmination of a year-long adventure in the land of maintenance experts, legacy IT systems and agile development.