Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

The devil is in the details: Interactive, multiscale visualization of data lineage

Sean Kandel (Trifacta)
2:55pm–3:35pm Wednesday, 09/28/2016
Visualization & user experience
Location: 1 E 10/1 E11
Level: Intermediate
Average rating: 5.00 (1 rating)

Prerequisite knowledge

  • An understanding of data lineage, metadata, data visualization, and large-scale data processing

What you'll learn

  • Explore an entirely new approach to visualizing metadata and data lineage

Description

    Data lineage is critical to answering a wide range of questions about how data is being used within an organization. Which datasets and table columns are driving key performance indicators? How is certain privacy-sensitive data being used? Where do errors or outliers arise, and how do they propagate forward? Where are inefficient or unnecessary processing steps being taken? Tracking data lineage is also critical in real-world use cases such as regulatory reporting and compliance.

    Sean Kandel presents novel interactive visualizations for exploring data lineage across multiple levels of detail. From high-level overviews of input-output relationships to fine-grained column dependency tracking, Sean explains how analysts can rapidly navigate lineage data and formulate provenance queries to gain insight into how data is being processed and transformed. By incorporating summary statistics, distributions, and data quality metrics, these visualizations further augment lineage views, letting analysts jointly inspect schema-level metadata and the results of large-scale data processing.
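
As a rough illustration of what a provenance query over column-level lineage might look like, here is a minimal Python sketch. It is not taken from the talk or from any Trifacta product: the column names, the `COLUMN_DEPS` mapping, and the helper functions are all hypothetical, and lineage is assumed to be stored as a simple map from each derived column to the columns it reads. The last function rolls the same data up to table level, in the spirit of moving between fine-grained and high-level views.

```python
from collections import defaultdict, deque

# Hypothetical column-level lineage (an assumption for illustration):
# each derived column maps to the set of columns it is computed from.
COLUMN_DEPS = {
    "clean.orders.amount_usd": {"raw.orders.amount", "raw.fx.rate"},
    "clean.orders.customer_id": {"raw.orders.customer_id"},
    "report.kpi.revenue": {"clean.orders.amount_usd"},
    "report.kpi.active_customers": {"clean.orders.customer_id"},
}


def invert(deps):
    """Flip the edges so each source column maps to the columns derived from it."""
    out = defaultdict(set)
    for target, sources in deps.items():
        for source in sources:
            out[source].add(target)
    return dict(out)


def reachable(start, edges):
    """Breadth-first search: every node reachable from start, excluding start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def upstream(column):
    """Which columns feed this one? (e.g. what drives a KPI)"""
    return reachable(column, COLUMN_DEPS)


def downstream(column):
    """Where does this column propagate? (e.g. tracing an error forward)"""
    return reachable(column, invert(COLUMN_DEPS))


def table_of(column):
    """Drop the final path component to get the owning table."""
    return column.rsplit(".", 1)[0]


def table_level(deps):
    """Collapse column dependencies into a coarser table-to-table overview."""
    tables = defaultdict(set)
    for target, sources in deps.items():
        for source in sources:
            if table_of(source) != table_of(target):
                tables[table_of(target)].add(table_of(source))
    return dict(tables)
```

Under these assumptions, `upstream("report.kpi.revenue")` answers "which columns drive this KPI?", `downstream("raw.fx.rate")` answers "where would an error in this column propagate?", and `table_level(COLUMN_DEPS)` gives the coarse input-output overview of the same graph.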

    Sean Kandel

    Trifacta

    Sean Kandel is the founder and chief technical officer at Trifacta. Sean holds a PhD from Stanford University, where his research focused on new interactive tools for data transformation and discovery, such as Data Wrangler. Prior to Stanford, Sean worked as a data analyst at Citadel Investment Group.