Erroneous or inconsistent reporting pipelines can cost organizations millions of dollars a year through inaccurate forecasts, faulty decision-making, and regulatory fines. Ensuring accurate reporting across an organization is itself expensive: compliance alone has been estimated to cost financial institutions approximately 10 percent of their operational budgets. While existing data governance and data quality tools can define rules and standards for working with data, many analytics projects circumvent these systems.
Sean Kandel and Wei Zheng offer an overview of an entirely new approach to visualizing metadata and data lineage, demonstrating automated methods for detecting, visualizing, and interacting with potential anomalies in reporting pipelines. Sean and Wei describe routines for mining data lineage across an organization to identify duplicate and inconsistent calculations or derivations, and they show how to leverage anomaly detection and data validation to enhance schema- or logic-level validation. You'll learn how to explore this rich catalogue of data through new interactive visualizations and what's required to apply these techniques efficiently to large-scale data.
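The abstract does not say how the lineage mining works. As a minimal sketch, assuming lineage has already been extracted into records of (report, column, formula) (a hypothetical format, not the speakers' tooling), duplicate and inconsistent derivations could be flagged by grouping: the same formula appearing under several names suggests a duplicate calculation, and the same column name defined by different formulas suggests an inconsistency.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """Hypothetical lineage record: one derived column and how it was computed."""
    report: str   # report or pipeline that produced the column
    column: str   # name of the derived column
    formula: str  # expression used to compute it


def normalize(formula: str) -> str:
    # Crude canonical form (an assumption for illustration): lowercase and drop
    # all whitespace, so "Revenue - Cost" and "revenue-cost" compare equal.
    return "".join(formula.lower().split())


def find_duplicates(derivations):
    """Map each normalized formula to the (report, column) pairs computing it,
    keeping only formulas computed in more than one place."""
    by_formula = defaultdict(set)
    for d in derivations:
        by_formula[normalize(d.formula)].add((d.report, d.column))
    return {f: sorted(locs) for f, locs in by_formula.items() if len(locs) > 1}


def find_inconsistencies(derivations):
    """Map each column name to its distinct normalized formulas,
    keeping only names defined in more than one way."""
    by_column = defaultdict(set)
    for d in derivations:
        by_column[d.column.lower()].add(normalize(d.formula))
    return {c: sorted(fs) for c, fs in by_column.items() if len(fs) > 1}


# Hypothetical sample lineage for illustration only.
lineage = [
    Derivation("q3_sales", "margin", "revenue - cost"),
    Derivation("exec_dashboard", "margin", "(revenue - cost) / revenue"),
    Derivation("finance_close", "gross_profit", "Revenue - Cost"),
]

if __name__ == "__main__":
    print(find_duplicates(lineage))
    print(find_inconsistencies(lineage))
```

Here `margin` is flagged as inconsistent (two different formulas), and `q3_sales.margin` and `finance_close.gross_profit` are flagged as duplicates of `revenue-cost`. String normalization is deliberately simple; a production system would more likely compare parsed expression trees so that, for example, `a + b` and `b + a` match.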
Sean Kandel is the founder and chief technical officer at Trifacta. Sean holds a PhD from Stanford University, where his research focused on new interactive tools for data transformation and discovery, such as Data Wrangler. Prior to Stanford, Sean worked as a data analyst at Citadel Investment Group.
As vice president of products at Trifacta, Wei Zheng combines her passion for technology with experience in enterprise software to define and shape Trifacta’s product offerings. Having founded several startups of her own, Wei believes strongly in innovative technology that solves real-world business problems. Most recently, she led product management efforts at Informatica, where she helped launch several new solutions including its Hadoop and data-virtualization products.
©2017, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.