Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Circuit breakers to safeguard for garbage in, garbage out

5:25pm–6:05pm Wednesday, 09/12/2018
Secondary topics:  Data Integration and Data Pipelines, Financial Services

Who is this presentation for?

  • DevOps engineers, big data engineers, and data architects

Prerequisite knowledge

  • A basic understanding of big data technologies and data pipelines

What you'll learn

  • Explore a circuit breaker pattern for building checks in your data pipeline to ensure reliable insights are generated for data analysts and data scientists

Description

Do your analysts always trust the insights generated by your data platform? Faced with an unexpected insight, does your analyst team spend time verifying data quality, ETL correctness, and job dependencies? As financial use cases increasingly incorporate social feeds, these verifications become extremely complex and do not scale, given the volume, velocity, and variety of the data. The circuit breaker is a common design pattern used by software developers to ensure graceful handling of errors in a service-oriented architecture. Taking inspiration from this pattern, Sandeep Uttamchandani outlines a circuit breaker pattern developed for data pipelines that detects and corrects problems so that the insights delivered are consistently reliable.

The process of converting data into insights involves a multistage pipeline with ingestion, cleansing, transformations, and analytical operations. Each stage implements a circuit breaker that continuously evaluates metrics and correctness rules. If any of these are violated, the circuit is broken, and processing does not progress to the next stage in the pipeline. The checks are a collection of runtime analyses of data quality, job health, and operational error logs from the analytical engines and data stores. They are implemented as a combination of domain-knowledge rules and machine learning for anomaly detection. Depending on the type of error, the circuit breaker framework either attempts to repair and reschedule the job or cancels it and notifies the user. Sandeep explains how this pattern was developed and how it is applied.
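
The stage-by-stage gating described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the framework presented in the session: the names `Check`, `Stage`, `run_pipeline`, and `CircuitOpen` are hypothetical, a simple z-score on row counts stands in for the machine learning anomaly detection, and an in-place repair function stands in for repairing and rescheduling a job.

```python
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Batch = List[Dict[str, float]]


class CircuitOpen(Exception):
    """Raised when a check fails and cannot be repaired; the pipeline stops."""


@dataclass
class Check:
    name: str
    passes: Callable[[Batch], bool]                    # True when the batch is healthy
    repair: Optional[Callable[[Batch], Batch]] = None  # hypothetical fix-up hook


@dataclass
class Stage:
    name: str
    run: Callable[[Batch], Batch]
    checks: List[Check]


def no_negative_amounts(batch: Batch) -> bool:
    """Domain-knowledge rule (assumed for illustration): amounts are never negative."""
    return all(row["amount"] >= 0 for row in batch)


def drop_negative_amounts(batch: Batch) -> Batch:
    """Repair for the rule above: discard the offending rows."""
    return [row for row in batch if row["amount"] >= 0]


def row_count_is_typical(history: List[int], threshold: float = 3.0) -> Callable[[Batch], bool]:
    """Stand-in for anomaly detection: flag row counts far from the historical mean.

    `history` needs at least two values so a standard deviation can be computed.
    """
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)

    def passes(batch: Batch) -> bool:
        if stdev == 0:
            return len(batch) == mean
        return abs(len(batch) - mean) / stdev <= threshold

    return passes


def run_pipeline(stages: List[Stage], batch: Batch,
                 notify: Callable[[str], None] = print) -> Batch:
    """Run stages in order; a failed check blocks progress to the next stage."""
    for stage in stages:
        batch = stage.run(batch)
        for check in stage.checks:
            if check.passes(batch):
                continue
            # Attempt one repair, then re-evaluate the same check.
            if check.repair is not None:
                batch = check.repair(batch)
                if check.passes(batch):
                    continue
            # No repair available, or the repair did not help: break the circuit.
            notify(f"circuit open at {stage.name}: {check.name}")
            raise CircuitOpen(f"{stage.name}: {check.name}")
    return batch


if __name__ == "__main__":
    stages = [
        Stage("ingest", run=lambda b: b,
              checks=[Check("row_count", row_count_is_typical([100, 98, 103, 101]))]),
        Stage("cleanse", run=lambda b: b,
              checks=[Check("non_negative", no_negative_amounts, repair=drop_negative_amounts)]),
    ]
    batch = [{"amount": 1.0}] * 99 + [{"amount": -5.0}]
    print(len(run_pipeline(stages, batch)))
```

Raising an exception rather than returning a status flag makes it hard for a downstream stage to run on unchecked data by accident, which is the property the pattern is meant to guarantee. A production version would also need to persist circuit state and integrate with a job scheduler, both of which are omitted here.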

Sandeep Uttamchandani

Intuit

Sandeep Uttamchandani is a Distinguished Engineer at Intuit, where he focuses on platforms for storage, databases, analytics, and machine learning. Previously, Sandeep was cofounder and CEO of a machine learning startup focused on finding security vulnerabilities in cloud-native deployment stacks. Sandeep has nearly two decades of experience in storage and data platforms; he has held various technical leadership roles and contributed to multiple enterprise products at companies including VMware and IBM. Sandeep holds 35+ issued patents. He has authored 20+ conference and journal publications and regularly blogs on All Things Enterprise Data. He holds a PhD from the University of Illinois Urbana-Champaign.

Comments

Kliment Mamykin | VP DATA ENGINEERING
09/17/2018 5:48pm EDT

Hi, I am looking for the slides of this presentation. Are they available anywhere?