Fueling innovative software
July 15-18, 2019
Portland, OR

Observability for data pipelines: Monitoring, alerting, and tracing lineage

Jiaqi Liu (University of Chicago, CTDS)
1:45pm–2:25pm Thursday, July 18, 2019
Secondary topics: Data Driven
Average rating: 4.33 (3 ratings)

Who is this presentation for?

  • Engineers and data scientists building data pipelines

Level

Intermediate

Description

Data-intensive applications, with many layers of transformations and movement from different data sources, can often be challenging to maintain and iterate on even after they are initially built and validated. To truly expand and develop a code base, developers must be able to test confidently during the development process and monitor the production system. Monitoring and testing data pipelines or real-time streaming processes can be very different from monitoring web services.

Jiaqi Liu draws on her experience building and maintaining both batch and real-time stream data pipelines to discuss how to leverage monitoring tools like Prometheus and Grafana to define and visualize metrics, how and when to alert on common health indicators, and how to gain visibility into not just the health of the system but also the health of the data. General concepts she touches on include observability of pipeline health, interpretability of data results, and building features into data pipelines that make monitoring and testing just a little bit easier, such as tracing data lineage and designing for immutable data.

Prerequisite knowledge

  • A basic understanding of data pipelines

What you'll learn

  • Learn to identify key health metrics for monitoring data pipelines and work with time series data and monitoring tools like Prometheus and Grafana
  • Discover best practices for building pipelines that enable tracing data lineage

Jiaqi Liu

University of Chicago, CTDS

Jiaqi Liu is a Lead Software Engineer at the Center for Translational Data Science at the University of Chicago. Previously, she was Tech Lead at Button and Data Scientist at Capital One Labs. She is passionate about bridging the gap between the science and engineering parts of data-driven work, champions inclusivity in the workplace, and advocates for a culture where everyone’s input matters. She is also active in the Write/Speak/Code and Women Who Code communities, and she is a director at Women Who Code NYC.