Erroneous or inconsistent reporting pipelines can cost organizations millions of dollars a year through inaccurate forecasts, faulty decision-making, and regulatory fines. Yet ensuring accurate reporting across an organization is itself expensive: compliance alone has been estimated to cost financial institutions approximately 10% of their operational budgets. While existing data governance and data quality tools let teams define rules and standards for working with data, many analytics projects bypass these systems entirely.
Sean Kandel shares automated methods for detecting, visualizing, and interacting with potential anomalies in reporting pipelines, including routines for mining data lineage across an organization to identify duplicate and inconsistent calculations or derivations. Sean explains how to use anomaly detection and data validation to enhance schema-level and logic-level validation, and outlines how organizations can explore the resulting catalogue of data through new interactive visualizations. Along the way, Sean covers what's required to apply these techniques efficiently to large-scale data.
Sean Kandel is the founder and chief technical officer at Trifacta. Sean holds a PhD from Stanford University, where his research focused on new interactive tools for data transformation and discovery, such as Data Wrangler. Prior to Stanford, Sean worked as a data analyst at Citadel Investment Group.
©2017, O'Reilly UK Ltd