Mar 15–18, 2020

Schedule: Data Quality sessions

9:00am–12:30pm Monday, March 16, 2020
Location: LL20D
Matt Harrison (MetaSnake)
You can use pandas to load data, inspect it, tweak it, visualize it, and analyze it in only a few lines of code. Matt Harrison leads a deep dive into plotting and Matplotlib integration, data quality, and issues such as missing data. Matt uses the split-apply-combine paradigm with groupby and pivot and explains stacking and unstacking data.
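As a rough illustration of the operations this session description names (not taken from the session materials), the sketch below counts missing values, applies split-apply-combine with groupby, pivots to a wide table, and then stacks and unstacks it. The column names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the NaN stands in for a missing reading.
df = pd.DataFrame({
    "city": ["Austin", "Austin", "Boston", "Boston"],
    "year": [2018, 2019, 2018, 2019],
    "sales": [10.0, np.nan, 7.0, 9.0],
})

# Data quality check: count missing values per column.
missing = df.isna().sum()

# Split-apply-combine: split rows by city, apply mean, combine into a Series.
# mean() skips NaN by default.
mean_by_city = df.groupby("city")["sales"].mean()

# Pivot: one row per city, one column per year.
wide = df.pivot(index="city", columns="year", values="sales")

# Stack moves the column level (year) into the row index, giving long form;
# unstack moves it back out into columns.
long = wide.stack()
round_trip = long.unstack()

print(missing)
print(mean_by_city)
print(wide)
```

The same pattern scales to real data loaded with pd.read_csv; only the column names change.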
11:00am–11:40am Tuesday, March 17, 2020
Location: Expo Hall
Sandeep U (Intuit), Giriraj Bagadi (Intuit), Sunil Goplani (Intuit)
Data quality metrics today focus on quantifying whether "data is a mess." But what are lead indicators to track before data actually becomes a mess? This talk shares our experiences in developing lead indicators for data quality for our production data pipelines at Intuit. The talk covers details of lead indicators, tools developed to optimize, and lessons that moved the needle on data quality.
11:50am–12:30pm Tuesday, March 17, 2020
Location: Expo Hall
Abe Gong (Superconductive Health)
Data organizations everywhere struggle with pipeline debt: untested, unverified assumptions that corrupt data quality, drain productivity, and erode trust in data. This presentation shares best practices gathered from across the data community in the course of developing a leading open source library for fighting pipeline debt and ensuring data quality: Great Expectations.
2:35pm–3:15pm Tuesday, March 17, 2020
Location: LL21 D
Barr Moses (Monte Carlo Data)
Ever had your CEO or customer look at your report and say the numbers look way off? I'll introduce the concept of “data downtime”: periods of time when data is partial, erroneous, missing, or otherwise inaccurate. Data downtime is highly costly for organizations, yet is often addressed ad hoc. We’ll discuss why data downtime matters to the data industry and how best-in-class teams address it.
1:45pm–2:25pm Wednesday, March 18, 2020
Location: Expo Hall
Mehul Sheth (Druva)
Any software product needs to be tested against data. It is difficult to have a random but realistic data set representing production data. This session highlights the process of using production data to generate models. Production data is accessed without exposing it or violating any customer agreements on privacy. The models are then used to generate test data at scale, in lower environments.
