Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Schedule: Data preparation, data governance, and data lineage sessions

9:00–17:00 Monday, 29 April & Tuesday, 30 April
Data Science, Machine Learning & AI
Location: Capital Suite 1
Robert Schroll (The Data Incubator)
Average rating: 4.75 (4 ratings)
Robert Schroll walks you through all the steps of developing a machine learning pipeline, from prototyping to production. You'll explore data cleaning, feature engineering, model building and evaluation, and deployment, and then extend these models into two applications using real-world datasets. All work will be done in Python.
9:00–12:30 Tuesday, 30 April 2019
Data Science, Machine Learning & AI
Location: Capital Suite 4
Krishnan Saidapet (REAN Cloud, A Hitachi Vantara company)
Average rating: 3.43 (7 ratings)
Krishnan Saidapet offers an overview of the latest big data and machine learning serverless technologies from Amazon Web Services (AWS) and leads a deep dive into using them to process and analyze two different datasets: the publicly available Bureau of Labor Statistics dataset and the Chest X-Ray Image Data dataset.
12:05–12:45 Wednesday, 1 May 2019
Peter Billen (Accenture)
Average rating: 4.50 (6 ratings)
Peter Billen explains how to use metadata to automate delivery and operations of a data platform. By injecting automation into the delivery processes, you shorten the time to market while improving the quality of the initial user experience. Typical examples include data profiling and prototyping, test automation, continuous delivery and deployment, and automated code creation.
14:05–14:45 Wednesday, 1 May 2019
Mark Donsky (Okera), Nikki Rouda (Amazon Web Services)
Average rating: 4.67 (3 ratings)
The implications of new privacy regulations for data management and analytics, such as the General Data Protection Regulation (GDPR) and the upcoming California Consumer Privacy Act (CCPA), can seem complex. Mark Donsky and Nikki Rouda highlight aspects of the rules and outline the approaches that will assist with compliance.
14:55–15:35 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Mark Grover (Lyft), Deepak Tiwari (Lyft)
Average rating: 4.69 (13 ratings)
Lyft’s data platform is at the heart of the company's business. Decisions from pricing to ETA to business operations rely on it, and it powers the enormous scale and speed at which Lyft operates. Mark Grover and Deepak Tiwari walk you through the choices Lyft made in developing and maintaining the data platform, along with what lies ahead.
14:55–15:35 Wednesday, 1 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 14
Ihab Ilyas (Tamr | University of Waterloo)
Average rating: 4.71 (7 ratings)
Last year, Ihab Ilyas covered two primary challenges in applying machine learning to data curation: entity consolidation and using probabilistic inference to suggest data repair for identified errors and anomalies. This year, he explores these challenges in greater detail and explains why data unification projects quickly require human-guided machine learning and a probabilistic model.
14:55–15:35 Wednesday, 1 May 2019
Paco Nathan (derwen.ai)
Average rating: 4.14 (7 ratings)
Effective data governance is foundational for AI adoption in the enterprise, but it's an almost overwhelming topic. Paco Nathan offers an overview of its history, themes, tools, process, standards, and more. Join in to learn what impact machine learning has on data governance and vice versa.
16:35–17:15 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Neelesh Salian (Stitch Fix)
Average rating: 4.25 (4 ratings)
Developing data infrastructure is not trivial; neither is changing it. It takes effort and discipline to make changes that can affect your team. Neelesh Salian discusses how Stitch Fix's data platform team maintains and innovates its infrastructure for the company's data scientists.
16:35–17:15 Wednesday, 1 May 2019
Case studies, Strata Business Summit
Location: Capital Suite 12
Maurício Lins (everis consultancy UK), Lidia Crespo (Santander UK)
Average rating: 4.50 (4 ratings)
Big data is usually regarded as a menace to data privacy. But with data privacy principles and a customer-first mindset, it can be a game changer. Maurício Lins and Lidia Crespo explain how Santander UK applied this model to comply with GDPR, using graph technology, Hadoop, Spark, and Kudu to drive data obscuring, data portability, and machine learning exploration.
11:15–11:55 Thursday, 2 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Sandeep Uttamchandani (Intuit)
Average rating: 4.67 (3 ratings)
Teams today rely on dictionaries of collective wisdom, which are a mixed bag with regard to correctness: some datasets have accurate attribute details, while others are incorrect or outdated. This significantly reduces the productivity of analysts and scientists. Sandeep Uttamchandani outlines three patterns to better manage data dictionaries.
14:55–15:35 Thursday, 2 May 2019
Sonal Goyal (Nube)
Average rating: 1.00 (4 ratings)
Enterprise data on customers, vendors, and products is often siloed and represented differently in diverse systems, hurting analytics, compliance, regulatory reporting, and 360-degree views. Traditional rule-based MDM systems with legacy architectures struggle to unify this growing data. Sonal Goyal offers an overview of a modern master data application using Spark, Cassandra, ML, and Elastic.