Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Driving the next wave of data lineage with automation, visualization, and interaction

Sean Kandel (Trifacta)
12:05–12:45 Thursday, 25 May 2017
Level: Beginner
Average rating: 3.00 (6 ratings)

Who is this presentation for?

  • Data scientists, data architects, Hadoop users, and business analysts

Prerequisite knowledge

  • A basic understanding of data lineage, metadata, and data preparation

What you'll learn

  • New methods for detecting, visualizing, and interacting with anomalies that arise within reporting data

Description

Erroneous or inconsistent reporting pipelines can cost organizations millions of dollars a year due to inaccurate forecasts, faulty decision making, and regulatory fines. Furthermore, ensuring accurate reports across an organization is expensive. For example, ensuring compliance has been estimated to cost financial institutions approximately 10% of their operational budgets. While existing data governance and data quality tools can define rules and standards for working with data, many analytic projects circumvent these systems.

Sean Kandel shares automated methods for detecting, visualizing, and interacting with potential anomalies in reporting pipelines, describing routines for mining data lineage across an organization to identify duplicate and inconsistent calculations or derivations. Sean explains how to leverage anomaly detection and data validation to enhance schema- or logical-level validation and outlines how organizations can explore this rich catalogue of data through new interactive visualizations. Along the way, Sean covers what’s required to efficiently apply these techniques to large-scale data.
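
To make the idea of mining lineage for duplicate and inconsistent derivations concrete, here is a minimal Python sketch. It is not the method presented in the session. It assumes lineage metadata is already available as simple records, and the `Derivation` type, the report and metric names, and the whitespace-and-case normalization are all hypothetical simplifications chosen for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Derivation(NamedTuple):
    """One lineage record (hypothetical shape, for illustration only)."""
    report: str       # report or pipeline that defines the field
    metric: str       # name of the derived field
    expression: str   # formula text as recorded in the lineage metadata


def normalize(expression: str) -> str:
    """Crude canonical form: lowercase with all whitespace removed."""
    return "".join(expression.lower().split())


def find_anomalies(derivations):
    """Return (inconsistent, duplicates) found across lineage records."""
    by_metric = defaultdict(set)    # metric name -> distinct formulas
    by_formula = defaultdict(set)   # formula -> (report, metric) pairs
    for d in derivations:
        formula = normalize(d.expression)
        by_metric[d.metric].add(formula)
        by_formula[formula].add((d.report, d.metric))
    # Same metric name defined by more than one formula.
    inconsistent = {m: sorted(f) for m, f in by_metric.items() if len(f) > 1}
    # Same formula computed in more than one place.
    duplicates = {f: sorted(u) for f, u in by_formula.items() if len(u) > 1}
    return inconsistent, duplicates


# Made-up sample records, not real data.
lineage = [
    Derivation("sales_weekly", "margin", "revenue - cost"),
    Derivation("finance_monthly", "margin", "(revenue - cost) / revenue"),
    Derivation("ops_dashboard", "gross_profit", "Revenue  -  Cost"),
]
inconsistent, duplicates = find_anomalies(lineage)
print(inconsistent)
print(duplicates)
```

In this sample, `margin` is flagged as inconsistent because two reports define it with different formulas, and `revenue-cost` is flagged as a duplicate because it is computed under two different names. Plain string normalization will miss algebraically equivalent formulas, so a real system would presumably need to compare parsed expressions and scale the grouping step to large lineage graphs.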

Sean Kandel

Trifacta

Sean Kandel is the founder and chief technical officer at Trifacta. Sean holds a PhD from Stanford University, where his research focused on new interactive tools for data transformation and discovery, such as Data Wrangler. Prior to Stanford, Sean worked as a data analyst at Citadel Investment Group.