Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Schedule: Data preparation, data governance, and data lineage sessions

9:00–17:00 Monday, 29 April & Tuesday, 30 April
Data Science, Machine Learning & AI
Location: Capital Suite 1
Don Fox (The Data Incubator)
We will walk through all the steps of developing a machine learning pipeline, from prototyping to production. We'll look at data cleaning, feature engineering, model building and evaluation, and deployment. Students will then extend these models to two applications built on real-world datasets. All work will be done in Python.
9:00–12:30 Tuesday, 30 April 2019
Data Science, Machine Learning & AI
Location: Capital Suite 15
S.P.T. Krishnan (REAN Cloud (A Hitachi Vantara company))
This session provides an overview of the latest serverless big data and machine learning technologies from AWS, followed by a deep dive into using them to process and analyze two different datasets: publicly available data from the Bureau of Labor Statistics, and chest X-ray image data.
12:05–12:45 Wednesday, 1 May 2019
Peter Billen (Accenture)
In this session, we will explain how to use metadata to automate the delivery and operation of a data platform. By injecting automation into the delivery process, we shorten time-to-market while improving the quality of the initial user experience. Typical examples include data profiling and prototyping, test automation, continuous delivery and deployment, and automated code creation.
14:05–14:45 Wednesday, 1 May 2019
Ananth Durai (Slack Technologies Inc)
Logs are everywhere: every organization collects vast amounts of log data every day. But logs are only as valuable as the trust they earn when used to make business-critical decisions, so building trustworthy, reliable logs is critical to creating a data-driven organization. Ananth walks through his experience building reliable logging infrastructure at Slack and how it helped build confidence in the data.
14:05–14:45 Wednesday, 1 May 2019
Mark Donsky (Okera), Steven Ross (Cloudera)
The General Data Protection Regulation (GDPR) took effect in May 2018 for firms doing any business in the EU. However, many companies aren't prepared for the strict regulation or for the fines for noncompliance (up to €20 million or 4% of global annual revenue, whichever is higher). This session will explore the capabilities your data environment needs in order to simplify compliance with GDPR as well as with future regulations.
14:55–15:35 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Mark Grover (Lyft), Deepak Tiwari (Lyft)
Lyft’s data platform is at the heart of Lyft’s business: decisions ranging from pricing to ETAs to business operations rely on it, and it powers the enormous scale and speed at which Lyft operates. In this talk, Mark Grover walks through the choices Lyft has made in developing and sustaining the data platform, why it made them, and what lies ahead.
14:55–15:35 Wednesday, 1 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 14
Ihab Ilyas (University of Waterloo | Tamr)
Last year, we covered two primary challenges in applying machine learning to data curation: entity consolidation, and using probabilistic inference to suggest repairs for identified errors and anomalies. This year, we'll cover these limitations in greater detail and explain why data unification projects quickly come to require human-guided machine learning and a probabilistic model.
14:55–15:35 Wednesday, 1 May 2019
Paco Nathan (derwen.ai)
Data governance is an almost overwhelming topic. This talk surveys its history and themes, along with the tools, processes, and standards involved. Mistakes lead to data quality issues, lack of availability, and other risks that prevent an organization from leveraging its data. On the other hand, compliance efforts aim to prevent the risks of leveraging data inappropriately. Ultimately, risk management serves as the "thin edge of the wedge" in the enterprise.
16:35–17:15 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Neelesh Salian (Stitch Fix)
Developing data infrastructure is not trivial, and neither is changing it: making changes that can affect your team takes effort and discipline. In this talk, we'll share what we on Stitch Fix's Data Platform team do to maintain and innovate on the infrastructure that serves our data scientists.
16:35–17:15 Wednesday, 1 May 2019
Case studies, Strata Business Summit
Location: Capital Suite 12
Maurício Lins (everis consultancy UK), Lidia Crespo (Santander UK)
Big data is usually regarded as a threat to data privacy. However, with the right principles and mindset, it can be a game changer for putting customers first and treating data privacy as an inalienable right. Santander UK applied this model to comply with GDPR, using graph technology, Hadoop, Spark, and Kudu to drive data obfuscation and data portability, as well as machine learning exploration.
16:35–17:15 Wednesday, 1 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 14
Divya Choudhary (GOJEK)
Data scientists around the globe would agree that addresses are among the most unorganised textual data; structuring them has almost become a stream of NLP in its own right. Who would have imagined that address text could be used to develop one of the coolest product features: finding the most precise pick-up and drop-off locations for e-commerce, logistics, food delivery, and ride-hailing or car services companies?
11:15–11:55 Thursday, 2 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Sandeep U (Intuit)
Teams today rely on tribal data dictionaries, which are a mixed bag with respect to correctness: some datasets have accurate attribute details, while others are incorrect and outdated. This significantly affects the productivity of analysts and data scientists. Existing data dictionary tools are updated manually and are difficult to maintain. This talk covers three patterns we have deployed to manage data dictionaries.
14:55–15:35 Thursday, 2 May 2019
Sonal Goyal (Nube)
Enterprise data on customers, vendors, products, etc. is siloed and represented differently across diverse systems, hurting analytics, compliance, regulatory reporting, and 360-degree views. Traditional rule-based master data management (MDM) systems with legacy architectures struggle to unify this growing data. This talk covers a modern master data application built with Spark, Cassandra, machine learning, and Elastic.