Presented by O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Why the next wave of data lineage is driven by automation, visualization, and interaction

Sean Kandel (Trifacta), Wei Zheng (Trifacta)
4:20pm–5:00pm Wednesday, March 15, 2017
Visualization & user experience
Location: 212 A-B
Level: Intermediate
Average rating: 4.33 (3 ratings)

Who is this presentation for?

  • Engineers, data analysts, and anyone creating data visualizations

Prerequisite knowledge

  • General knowledge of data lineage, data analytics, data visualization, and metadata

What you'll learn

  • Explore an entirely new approach to visualizing metadata and data lineage
  • Learn what’s required to efficiently apply these techniques to large-scale data


Erroneous or inconsistent reporting pipelines can cost organizations millions of dollars a year through inaccurate forecasts, faulty decision making, and regulatory fines. Ensuring accurate reports across an organization is also expensive: compliance alone has been estimated to cost financial institutions approximately 10 percent of their operational budgets. While existing data governance and data quality tools can define rules and standards for working with data, many analytic projects circumvent these systems.

Sean Kandel and Wei Zheng offer an overview of an entirely new approach to visualizing metadata and data lineage, demonstrating automated methods for detecting, visualizing, and interacting with potential anomalies in reporting pipelines. Sean and Wei describe routines for mining data lineage across an organization to identify duplicate and inconsistent calculations or derivations, and they show how to use anomaly detection and data validation to enhance schema- or logical-level validation. You'll learn how to explore this rich catalogue of data through new interactive visualizations and what's required to efficiently apply these techniques to large-scale data.

Sean Kandel


Sean Kandel is the founder and chief technical officer at Trifacta. Sean holds a PhD from Stanford University, where his research focused on new interactive tools for data transformation and discovery, such as Data Wrangler. Prior to Stanford, Sean worked as a data analyst at Citadel Investment Group.

Wei Zheng


As vice president of products at Trifacta, Wei Zheng combines her passion for technology with experience in enterprise software to define and shape Trifacta’s product offerings. Having founded several startups of her own, Wei believes strongly in innovative technology that solves real-world business problems. Most recently, she led product management efforts at Informatica, where she helped launch several new solutions including its Hadoop and data-virtualization products.
