
Presented By
O’Reilly + Cloudera

Make Data Work

29 April–2 May 2019
London, UK

Schedule: Data preparation, data governance, and data lineage sessions

9:00–17:00 Monday, 29 April & Tuesday, 30 April 2019

Hands-on data science with Python

Data Science, Machine Learning & AI
Location: Capital Suite 1

Robert Schroll (The Data Incubator)

Average rating: 4.75 (4 ratings)

Robert Schroll walks you through all the steps of developing a machine learning pipeline from prototyping to production. You'll explore data cleaning, feature engineering, model building and evaluation, and deployment, and then extend these models into two applications from real-world datasets. All work will be done in Python.

9:00–12:30 Tuesday, 30 April 2019

Using AWS serverless technologies to analyze large datasets

Data Science, Machine Learning & AI
Location: Capital Suite 4

Krishnan Saidapet (REAN Cloud, A Hitachi Vantara company)

Average rating: 3.43 (7 ratings)

Krishnan Saidapet offers an overview of the latest big data and machine learning serverless technologies from Amazon Web Services (AWS) and leads a deep dive into using them to process and analyze two different datasets: the publicly available Bureau of Labor Statistics dataset and the Chest X-Ray Image Data dataset.

12:05–12:45 Wednesday, 1 May 2019

Leveraging metadata for automating delivery and operations of advanced data platforms

Data Engineering and Architecture
Location: S11 B

Peter Billen (Accenture)

Average rating: 4.50 (6 ratings)

Peter Billen explains how to use metadata to automate delivery and operations of a data platform. By injecting automation into the delivery processes, you shorten the time to market while improving the quality of the initial user experience. Typical examples include data profiling and prototyping, test automation, continuous delivery and deployment, and automated code creation.

14:05–14:45 Wednesday, 1 May 2019

Executive Briefing: Big data in the era of heavy worldwide privacy regulations

Executive Briefing and best practices, Strata Business Summit
Location: Capital Suite 13

Mark Donsky (Okera), Nikki Rouda (Amazon Web Services)

Average rating: 4.67 (3 ratings)

The implications for data management and analytics of new privacy regulations, such as the General Data Protection Regulation (GDPR) and the upcoming California Consumer Privacy Act (CCPA), can seem complex. Mark Donsky and Nikki Rouda highlight aspects of the rules and outline approaches that assist with compliance.

14:55–15:35 Wednesday, 1 May 2019

The Lyft data platform: Now and in the future

Data Engineering and Architecture
Location: Capital Suite 8/9

Mark Grover (Lyft), Deepak Tiwari (Lyft)

Average rating: 4.69 (13 ratings)

Lyft’s data platform is at the heart of the company's business. Decisions from pricing to ETA to business operations rely on Lyft’s data platform. Moreover, it powers the enormous scale and speed at which Lyft operates. Mark Grover and Deepak Tiwari walk you through the choices Lyft made in the development and sustenance of the data platform, along with what lies ahead in the future.

14:55–15:35 Wednesday, 1 May 2019

Solving data cleaning and unification using human-guided machine learning

Data Science, Machine Learning & AI
Location: Capital Suite 14

Ihab Ilyas (University of Waterloo)

Average rating: 4.71 (7 ratings)

Last year, Ihab Ilyas covered two primary challenges in applying machine learning to data curation: entity consolidation and using probabilistic inference to suggest data repair for identified errors and anomalies. This year, he explores these challenges in greater detail and explains why data unification projects quickly require human-guided machine learning and a probabilistic model.

14:55–15:35 Wednesday, 1 May 2019

Executive Briefing: Overview of data governance

Executive Briefing and best practices, Strata Business Summit
Location: Capital Suite 13

Paco Nathan (derwen.ai)

Average rating: 4.14 (7 ratings)

Effective data governance is foundational for AI adoption in the enterprise, but it's an almost overwhelming topic. Paco Nathan offers an overview of its history, themes, tools, process, standards, and more. Join in to learn what impact machine learning has on data governance and vice versa.

16:35–17:15 Wednesday, 1 May 2019

How do you evolve your data infrastructure?

Data Engineering and Architecture
Location: Capital Suite 8/9

Neelesh Salian (Stitch Fix)

Average rating: 4.25 (4 ratings)

Developing data infrastructure is not trivial; neither is changing it. It takes effort and discipline to make changes that can affect your team. Neelesh Salian discusses how Stitch Fix's data platform team maintains and innovates its infrastructure for the company's data scientists.

16:35–17:15 Wednesday, 1 May 2019

The vindication of big data: How Santander UK uses Hadoop to defend privacy

Case studies, Strata Business Summit
Location: Capital Suite 12

Maurício Lins (everis NTT DATA UK), Lidia Crespo (Santander UK)

Average rating: 4.50 (4 ratings)

Big data is usually regarded as a menace to data privacy. But with data privacy principles and a customer-first mindset, it can be a game changer. Maurício Lins and Lidia Crespo explain how Santander UK applied this model to comply with GDPR, using graph technology, Hadoop, Spark, and Kudu to drive data obscuring, data portability, and machine learning exploration.

11:15–11:55 Thursday, 2 May 2019

Half-correct and half-wrong collective data wisdom: 3 patterns to sanity

Data Engineering and Architecture
Location: Capital Suite 8/9

Sandeep Uttamchandani (Intuit)

Average rating: 4.67 (3 ratings)

Teams today rely on dictionaries of collective wisdom, which are a mixed bag with regard to correctness: some datasets have accurate attribute details, while others are incorrect and outdated. This significantly impacts the productivity of analysts and scientists. Sandeep Uttamchandani outlines three patterns to better manage data dictionaries.

14:55–15:35 Thursday, 2 May 2019

Mastering data with Spark and machine learning

Data Engineering and Architecture
Location: S11 B

Sonal Goyal (Nube)

Average rating: 1.00 (4 ratings)

Enterprise data on customers, vendors, and products is often siloed and represented differently in diverse systems, hurting analytics, compliance, regulatory reporting, and 360 views. Traditional rule-based MDM systems with legacy architectures struggle to unify this growing data. Sonal Goyal offers an overview of a modern master data application using Spark, Cassandra, ML, and Elastic.

