Mar 15–18, 2020

Schedule: Data Wrangling and Integration sessions

11:00am–11:40am Tuesday, March 17, 2020
Location: LL21 F
Sandeep Uttamchandani (Intuit), Giriraj Bagdi (Intuit), Sunil Goplani (Intuit)
Data quality metrics focus on quantifying whether data is a mess. But you need to identify lead indicators before data becomes a mess. Sandeep Uttamchandani, Giriraj Bagdi, and Sunil Goplani explore developing lead indicators for data quality for Intuit's production data pipelines. You'll learn about the details of lead indicators, optimization tools, and lessons that moved the needle on data quality.
11:50am–12:30pm Tuesday, March 17, 2020
Location: LL21 F
Abe Gong (Superconductive Health)
Data organizations everywhere struggle with pipeline debt: untested, unverified assumptions that corrupt data quality, drain productivity, and erode trust in data. Abe Gong shares best practices gathered from across the data community in the course of developing a leading open source library for fighting pipeline debt and ensuring data quality: Great Expectations.
4:15pm–5:45pm Tuesday, March 17, 2020
Location: 230 A
Sarah Guido (InVision)
Getting your data ready for modeling is the essential first step in the machine learning process. Sarah Guido outlines the basics of preparing and standardizing data for use in machine learning models.
11:00am–11:40am Wednesday, March 18, 2020
Location: LL21 F
Mehul Sheth (Druva)
Any software product needs to be tested against data, and it's difficult to build a random but realistic dataset that represents production data. Mehul Sheth highlights using production data to generate models. Production data is accessed without exposing it or violating any customer agreements on privacy, and the models then generate test data at scale in lower environments.
11:00am–12:30pm Wednesday, March 18, 2020
Location: 230 A
Martin Frigaard (Aperture Digital)
Martin Frigaard not only outlines how to collect, manipulate, summarize, and visualize data, but also explores how to communicate your findings in a convincing way your audience will understand and appreciate.
11:50am–12:30pm Wednesday, March 18, 2020
Location: LL21 F
Benjamin Batorsky (MIT Sloan School of Management)
Identifying and labeling named entities such as companies or people in text is a key part of text processing pipelines. Benjamin Batorsky outlines how to train, test, and implement a named entity recognition (NER) model with spaCy. You'll get a sneak peek into how to use these techniques with large, non-English corpora.

Contact us

confreg@oreilly.com

For conference registration information and customer service

partners@oreilly.com

For more information on community discounts and trade opportunities with O’Reilly conferences

Become a sponsor

For information on exhibiting or sponsoring a conference

pr@oreilly.com

For media/analyst press inquiries