Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Schedule: Data preparation, data governance, and data lineage sessions

Add to your personal schedule
9:00 - 17:00 Monday, 29 April & Tuesday, 30 April
Data Science, Machine Learning & AI
Location: Capital Suite 1
Robert Schroll (The Data Incubator)
Robert Schroll walks you through all the steps of developing a machine learning pipeline, from prototyping to production. You'll explore data cleaning, feature engineering, model building and evaluation, and deployment and then extend these models into two applications from real-world datasets. All work will be done in Python. Read more.
Add to your personal schedule
9:0012:30 Tuesday, 30 April 2019
Data Science, Machine Learning & AI
Location: Capital Suite 4
Krishnan Saidapet (REAN Cloud, A Hitachi Vantara company)
Krishnan Saidapet offers an overview of the latest big data and machine learning serverless technologies from AWS and leads a deep dive into using them to process and analyze two different datasets: the publicly available Bureau of Labor Statistics dataset and the Chest X-Ray Image Data dataset. Read more.
Add to your personal schedule
12:0512:45 Wednesday, 1 May 2019
Peter Billen (Accenture)
In this session we will explain how to use metadata to automate delivery and operations of a data platform. By injecting automation into the delivery processes we shorten the time-to-market while improving the quality of the initial user experience. Typical examples include: Data profiling and prototyping, Test automation, Continuous delivery and deployment, Automated code creation Read more.
Add to your personal schedule
14:0514:45 Wednesday, 1 May 2019
Ananth Durai (Slack Technologies Inc)
Logs are everywhere. Every organization collects tons of data every day. The logs are as good as the trust it earns to make business-critical decisions. Building trust and reliability of logs are critical to creating a data-driven organization. Ananth walkthrough his experience building reliable logging infrastructure at Slack and how it helped to build confidence on data. Read more.
Add to your personal schedule
14:0514:45 Wednesday, 1 May 2019
Mark Donsky (Okera), Nikki Rouda (Amazon Web Services)
The implications of new privacy regulations for data management and analytics, such as the General Data Protection Regulation (GDPR) and the upcoming California Consumer Protection Act (CCPA), can seem complex. Mark Donsky and Nikki Rouda highlight aspects of the rules and outline the approaches that will assist with compliance. Read more.
Add to your personal schedule
14:5515:35 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Mark Grover (Lyft), Deepak Tiwari (Lyft)
Lyft’s data platform is at the heart of Lyft’s business. Decisions all the way from pricing, to ETA, to business operations rely on Lyft’s data platform. Moreover, it powers the enormous scale and speed at which Lyft operates. In this talk, Mark Grover walks through various choices Lyft has made in the development and sustenance of the data platform and why along with what lies ahead in future. Read more.
Add to your personal schedule
14:5515:35 Wednesday, 1 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 14
Ihab Ilyas (University of Waterloo | Tamr)
Last year, Ihab Ilyas covered two primary challenges in applying machine learning to data curation: entity consolidation and using probabilistic inference to suggest data repair for identified errors and anomalies. This year, he explores these limitations in greater detail and explains why data unification projects quickly require human-guided machine learning and a probabilistic model. Read more.
Add to your personal schedule
14:5515:35 Wednesday, 1 May 2019
Paco Nathan (derwen.ai)
Effective data governance is foundational for AI adoption in enterprise, but it's an almost overwhelming topic. Paco Nathan offers an overview of its history, themes, tools, process, standards, and more. Join in to learn what impact machine learning has on data governance and vice versa. Read more.
Add to your personal schedule
16:3517:15 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Neelesh Salian (Stitch Fix)
Developing data infrastructure is not trivial and neither is changing it. It takes effort and discipline to make changes that can affect your team. In this talk, we shall learn what we, in Stitch Fix's Data Platform team, do to maintain and innovate our infrastructure for our Data Scientists. Read more.
Add to your personal schedule
16:3517:15 Wednesday, 1 May 2019
Case studies, Strata Business Summit
Location: Capital Suite 12
Maurício Lins (everis consultancy UK), Lidia Crespo (Santander UK)
Big data is usually regarded as a menace for data privacy. However, with the right principles and mind-set, it can be a game changer to put customers first and consider data privacy an inalienable right. Santander UK applied this model to comply with GDPR by using graph technology, Hadoop, Spark, Kudu to drive data obscuring and data portability, and driving machine learning exploration. Read more.
Add to your personal schedule
16:3517:15 Wednesday, 1 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 14
Divya Choudhary (University of Southern California)
Addresses are the most unorganized textual data. In fact, structuring addresses has almost led to a new stream of NLP itself. Who would've imagined that address text data can be used to develop one of the coolest product features: finding the most precise pickup and drop-off locations for ecommerce, logistics, food delivery, and ride-hailing companies. Divya Choudhary explains. Read more.
Add to your personal schedule
11:1511:55 Thursday, 2 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Sandeep U (Intuit)
Teams today rely on dictionaries of collective wisdom—a mixed bag wit regard to correctness: some datasets have accurate attribute details, while others are incorrect and outdated. This significantly impacts productivity of analysts and scientists. Sandeep Uttamchandani outlines three patterns to better manage data dictionaries. Read more.
Add to your personal schedule
14:5515:35 Thursday, 2 May 2019
Sonal Goyal (Nube)
Enterprise data on customers, vendors, products etc is siloed and represented differently in diverse systems, hurting analytics, compliance, regulatory reporting and 360 views. Traditional rule based MDM systems with legacy architectures struggle to unify this growing data. This talk covers a modern master data application using Spark, Cassandra, ML and Elastic. Read more.