Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

How Intuit reduced time to reliable insights for data pipelines

Sandeep U (Intuit)
11:50am–12:30pm Wednesday, March 27, 2019
Secondary topics: Data Integration and Data Pipelines; Financial Services
Average rating: 4.57 (7 ratings)

Who is this presentation for?

  • Data engineers and architects

Level

Beginner

Prerequisite knowledge

  • A basic understanding of big data platforms and data pipelines

What you'll learn

  • Discover three design patterns to minimize the time to reliable insights in your data platform

Description

There’s no scarcity of raw data today within enterprises. So why can’t analysts and data scientists generate insights fast enough to keep up with business needs?

The single metric Intuit uses is time to reliable insights: the total time spent to ingest, transform, catalog, analyze, and publish. Based on the company’s experience, there are three elephants in the room when it comes to time to reliable insights—time to discover, time to catalog, and time to debug for data quality—and these aspects are unbounded. Sandeep Uttamchandani shares three design patterns/frameworks Intuit implemented to address these unbounded aspects, expediting time to reliable insights.

Topics include:

  • Circuit breakers (time to debug): Inspired by design patterns used in microservices, Intuit implemented circuit breakers in data pipelines to make data availability in dashboards a function of the data quality. This radically reduces time to debug.
  • Source crawlers (time to discover): With hundreds of data sources, discovering new sources or tracking changes in tables and columns is error-prone, leading to inconsistent results. Source crawlers help make time to discover bounded.
  • Table bounties (time to catalog): Similar to bug bounties used for addressing security vulnerabilities, Intuit uses table bounties to fan out the work of populating data dictionaries. The process automatically defines a bounty based on table access patterns.
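To make the circuit-breaker idea above concrete, here is a minimal sketch of how a data-quality check can gate publishing to a dashboard. All names (`DataQualityError`, `publish_if_healthy`, the null-rate check and its threshold) are illustrative assumptions, not Intuit's actual implementation:

```python
# Hypothetical circuit breaker for a pipeline's publish step: if a data-quality
# check fails, the breaker "trips" and nothing reaches the dashboard, so
# availability becomes a function of data quality.

class DataQualityError(Exception):
    """Raised when a batch fails its quality checks (the breaker trips)."""

def null_rate(rows, column):
    """Fraction of rows where `column` is missing or None."""
    if not rows:
        return 1.0  # an empty batch is treated as fully unhealthy
    return sum(1 for r in rows if r.get(column) is None) / len(rows)

def publish_if_healthy(rows, column="amount", max_null_rate=0.05):
    """Return the batch for publishing only if it passes the quality gate."""
    rate = null_rate(rows, column)
    if rate > max_null_rate:
        raise DataQualityError(
            f"null rate {rate:.0%} in '{column}' exceeds {max_null_rate:.0%}"
        )
    return rows  # in a real pipeline, handed to the serving/dashboard layer

good_batch = [{"amount": 10}, {"amount": 20}]
bad_batch = [{"amount": None}, {"amount": 5}]

publish_if_healthy(good_batch)      # passes the gate
try:
    publish_if_healthy(bad_batch)   # 50% nulls: breaker trips
except DataQualityError as exc:
    print("breaker tripped:", exc)
```

A production version would typically run a suite of such checks (freshness, row counts, schema drift) and record trip events for debugging, which is what shrinks time to debug.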

Sandeep U

Intuit

Sandeep Uttamchandani is the hands-on chief data architect and head of data platform engineering at Intuit, where he’s leading the cloud transformation of the big data analytics, ML, and transactional platform used by 3M+ small business users for financial accounting, payroll, and billions of dollars in daily payments. Previously, Sandeep held engineering roles at VMware and IBM and founded a startup focused on ML for managing enterprise systems. Sandeep’s experience uniquely combines building enterprise data products with operational expertise in managing petabyte-scale data and analytics platforms in production for IBM’s federal and Fortune 100 customers. Sandeep has received several excellence awards. He has over 40 issued patents and 25 publications in key systems conferences such as VLDB, SIGMOD, CIDR, and USENIX. Sandeep is a regular speaker at academic institutions and conducts conference tutorials for data engineers and scientists. He advises PhD students and startups, serves as a program committee member for systems and data conferences, and was an associate editor for ACM Transactions on Storage. He blogs on LinkedIn and his personal blog, Wrong Data Fabric. Sandeep holds a PhD in computer science from the University of Illinois at Urbana-Champaign.
