Mar 15–18, 2020

Fighting pipeline debt with Great Expectations

Abe Gong (Superconductive Health)
11:50am–12:30pm Tuesday, March 17, 2020
Location: Expo Hall

Who is this presentation for?

Data engineers, data architects, developers

Level

Intermediate

Description

Data organizations everywhere struggle with pipeline debt: untested, unverified assumptions that corrupt data quality, drain productivity, and erode trust in data. Since its launch in early 2018, Great Expectations has become the leading open source library for fighting pipeline debt.

Abe Gong shares insights gathered from across the data community in the course of developing Great Expectations. He details success stories and best practices from teams fighting pipeline debt, plus features in Great Expectations developed to support those practices.

You’ll explore practical patterns for deploying data validation in production infrastructure. Although infrastructure choices lead to thousands of different data pipeline configurations, almost all deployment patterns for data testing fall into a few specific categories. Within these categories, many data teams have built in-house versions of the same components and business logic. Great Expectations now supports production deployment out of the box. Instead of building these components for yourself over weeks or months, you can now stand up production-ready pipeline validation in a day. This “expectations on rails” framework is flexible, extensible, and plays nice with other data engineering tools.

You’ll also learn best practices for improving data quality test coverage. Writing and maintaining tests can be repetitive and time consuming, but savvy teams can move dramatically faster with tools including automated data profiling, data visualization for accelerated review, and frameworks for managing test fixtures and data. Great Expectations provides a pluggable framework for bringing these kinds of tools together so data teams can automate what can be automated and focus on the substance of the data rather than the mechanics of test creation.
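To make the idea of automated data profiling concrete, here is a minimal sketch in plain Python. It is an illustration only and does not use the Great Expectations API: the function `profile_column`, the dictionary keys, and the sample data are all hypothetical names chosen for this example. The sketch inspects a small sample and proposes candidate tests that a person would then review.

```python
# Hypothetical sketch of automated profiling; NOT the Great Expectations API.
from typing import Any, Dict, List


def profile_column(name: str, values: List[Any]) -> List[Dict[str, Any]]:
    """Propose candidate expectations for one column from a data sample."""
    candidates: List[Dict[str, Any]] = []
    non_null = [v for v in values if v is not None]

    # If the sample has no nulls, propose a not-null check.
    if len(non_null) == len(values):
        candidates.append({"column": name, "check": "not_null"})

    # If every non-null value is numeric, propose the observed range.
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    if non_null and is_numeric:
        candidates.append(
            {"column": name, "check": "between",
             "min": min(non_null), "max": max(non_null)}
        )
    return candidates


# Made-up sample data for illustration.
sample = {
    "age": [34, 51, 27],
    "email": ["a@example.com", None, "c@example.com"],
}
suite = [c for col, vals in sample.items() for c in profile_column(col, vals)]
```

The proposed checks are only a starting point: a range observed in one sample is an assumption about future data, so a reviewer would normally widen or tighten it before relying on it.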

Abe also covers keeping documentation in sync with code and data by compiling it from tests. Maintaining data documentation is crucial for data teams that share data across team boundaries, but it’s also time consuming and thankless. As a result, many data systems suffer from “documentation rot,” where data documentation is chronically outdated, incomplete, and therefore only semitrusted.

Great Expectations’s compile-to-docs feature flips the normal workflow by allowing teams to compile their data tests to clean, human-readable data documentation. Since documentation is compiled from tests and tests are run regularly against new data as it arrives, the documentation is guaranteed to never go stale. Abe provides examples of how data teams have used this feature to radically change their workflows.
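The compile-to-docs idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the Great Expectations implementation or API: `render_expectation`, `validate`, `compile_docs`, and the dictionary layout are hypothetical. The point it shows is that the same test definition drives both the validation result and the human-readable sentence, so the two cannot drift apart.

```python
# Hypothetical sketch of compiling tests to documentation;
# NOT the Great Expectations API.
from typing import Any, Dict, List

Expectation = Dict[str, Any]
Row = Dict[str, Any]


def render_expectation(exp: Expectation) -> str:
    """Turn one test definition into a human-readable sentence."""
    column = exp["column"]
    if exp["check"] == "not_null":
        return f"{column}: values must never be null."
    if exp["check"] == "between":
        return f"{column}: values must be between {exp['min']} and {exp['max']}."
    raise ValueError(f"unknown check: {exp['check']}")


def validate(exp: Expectation, rows: List[Row]) -> bool:
    """Run one test against a batch of rows."""
    values = [row.get(exp["column"]) for row in rows]
    if exp["check"] == "not_null":
        return all(v is not None for v in values)
    if exp["check"] == "between":
        return all(v is not None and exp["min"] <= v <= exp["max"] for v in values)
    raise ValueError(f"unknown check: {exp['check']}")


def compile_docs(suite: List[Expectation], rows: List[Row]) -> str:
    """Compile the test suite, plus its latest results, into plain-text docs."""
    lines = []
    for exp in suite:
        status = "PASS" if validate(exp, rows) else "FAIL"
        lines.append(f"[{status}] {render_expectation(exp)}")
    return "\n".join(lines)


# Made-up suite and batch for illustration.
suite = [
    {"column": "age", "check": "not_null"},
    {"column": "age", "check": "between", "min": 0, "max": 120},
]
docs = compile_docs(suite, [{"age": 34}, {"age": 130}])
```

Regenerating `docs` each time a new batch is validated is what keeps the documentation current in this sketch: editing a test changes the documentation on the next run, with no separate document to maintain.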

Prerequisite knowledge

  • Experience working within technical systems where maintaining data quality and trust in data is an important concern

What you'll learn

  • Learn how to improve data quality through data testing, validation, and documentation

Abe Gong

Superconductive Health

Abe Gong is CEO and cofounder at Superconductive Health. A seasoned entrepreneur, Abe has been leading teams using data and technology to solve problems in healthcare, consumer wellness, and public policy for over a decade. Previously, he was chief data officer at Aspire Health, the founding member of the Jawbone data science team, and lead data scientist at Massive Health. Abe holds a PhD in public policy, political science, and complex systems from the University of Michigan. He speaks and writes regularly on data science, healthcare, and the internet of things.

