Building and maintaining complex distributed systems
June 19–20, 2017: Training
June 20–22, 2017: Tutorials & Conference
San Jose, CA

The problem with preaggregated metrics

Christine Yen (Honeycomb)
11:25am–12:05pm Thursday, June 22, 2017
Monitoring, Tracing, & Metrics
Location: LL20 A/B
Level: Intermediate
Average rating: 4.60 out of 5 (5 ratings)

Who is this presentation for?

  • Infrastructure and operations engineers and application engineers

Prerequisite knowledge

  • Familiarity with metrics or monitoring systems and the sorts of questions typically answered by investigating machine data or logs
  • Experience with operational metrics and using data to debug system behavior, either for an outage or in response to support requests (useful but not required)

What you'll learn

  • Learn when preaggregated metrics are a suboptimal solution
  • Understand the trade-offs made in the implementation of a preaggregated or time series metrics solution and how they limit the flexibility available to the end users of that metrics system

Description

Preaggregated metrics and time series form the backbone of many monitoring setups. They have many redeeming qualities but simply aren’t sufficient for capturing or responding to the many ways things can go wrong in modern or complex systems. Preaggregating a small set of metrics is a perfectly reasonable technique for top-level KPIs but not for the day-to-day operations and debugging work done by your engineers on the front lines: it forces your engineers to predict which metrics will be interesting sometime in the future and hobbles their ability to react quickly to unexpected factors.

Christine Yen outlines the problems inherent in the use and implementation of preaggregated metrics and covers the implementation details of building a round-robin database (RRD), the basis of many preaggregated metrics systems, highlighting another axis along which data is constrained. Contiguous time series stored on disk are speedy to read and easy to conceptualize but are at risk of a combinatorial explosion of inputs blowing up the underlying storage. Along the way, Christine stresses the importance of context. Relying on individual metrics and segments is like trying to extrapolate a 3D model of a room from hundreds of one-dimensional data points. When exploring a dataset, it’s crucial to be able to easily understand and visualize the interplay between the various attributes and measurements of a system event, but these one-dimensional metrics rob your engineers of this ability.
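The two constraints described above can be illustrated with a minimal Python sketch. It is not taken from the talk: the tag names, cardinalities, and sample events are hypothetical, chosen only to show how one series per tag combination multiplies out, and how a count preaggregated on one attribute loses the context that raw events keep.

```python
from collections import Counter
from math import prod

# Hypothetical tag cardinalities, chosen only for illustration.
tag_cardinalities = {"endpoint": 50, "status_code": 8, "host": 200, "customer": 1000}


def series_count(cardinalities):
    """Worst-case number of series if every tag combination gets its own series."""
    return prod(cardinalities.values())


# Hypothetical raw events; each keeps all of its attributes together.
events = [
    {"endpoint": "/login", "status_code": 500, "host": "web-1", "duration_ms": 840},
    {"endpoint": "/login", "status_code": 200, "host": "web-2", "duration_ms": 35},
    {"endpoint": "/search", "status_code": 500, "host": "web-1", "duration_ms": 910},
]

# Preaggregated view: the grouping (status code only) was fixed in advance,
# so the host and endpoint behind each count are no longer recoverable.
errors_by_status = Counter(e["status_code"] for e in events)


def group_count(rows, *keys):
    """Count raw events by any combination of attributes, chosen at query time."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


print(series_count(tag_cardinalities))            # worst-case series to store
print(errors_by_status[500])                      # total errors, context lost
print(group_count(events, "host", "status_code"))  # errors broken down by host
```

With these assumed cardinalities the worst case is 50 × 8 × 200 × 1000 = 80,000,000 series, which is why preaggregated systems usually restrict which tag combinations are stored. The raw-event grouping, by contrast, can answer a question such as "which host produced the 500s?" even though nobody predicted it ahead of time.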

Christine Yen

Honeycomb

Christine Yen is the cofounder of Honeycomb, a startup with a new approach to observability and debugging systems with data. Christine has built systems and products at companies large and small and likes to have her fingers in as many pies as possible. Previously, she built Parse’s analytics product (and leveraged Facebook’s data systems to expand it) and wrote software at a few now-defunct startups.