Build & maintain complex distributed systems
October 1–2, 2017: Training
October 2–4, 2017: Tutorials & Conference
New York, NY

Customer-centric observability

Mark McBride (Turbine Labs)
1:30pm–2:10pm Wednesday, October 4, 2017

Who is this presentation for?

  • Systems engineers, DevOps engineers, and operations engineers

Prerequisite knowledge

  • Experience using an observability system (e.g., Datadog, Wavefront, Graphite, or Kibana)

What you'll learn

  • Discover three metrics that can serve as an excellent high-level proxy for overall system behavior
  • Understand why you should establish a common set of metrics that define system behavior, allowing your teams to communicate clearly about system status

Description

The proliferation of good metrics collection and visualization toolkits over the past five years has been a huge benefit to developers. But with so many metrics available, a rapidly growing number of services, and limited cognitive capacity, which ones should we focus on?

Mark McBride outlines three key metrics (request rate, success rate, and the latency histogram) that together provide a high-level abstraction of the customer experience. If these three metrics are good, your system is healthy from a customer's perspective. Drawing on concrete examples from a multiyear effort to improve service reliability while dramatically scaling a consumer site, Mark walks you through a customer-centric monitoring approach that fosters better teamwork and faster incident resolution.
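To make the three metrics concrete, here is a minimal Python sketch that computes them for one time window of request records. The `Request` record, the `summarize` function, the bucket bounds in `BUCKET_BOUNDS_MS`, and the rule that any non-5xx response counts as a success are all hypothetical choices for illustration, not the speaker's implementation; in practice these numbers would usually come from an observability system rather than from raw records.

```python
from bisect import bisect_left
from dataclasses import dataclass

# Hypothetical request record; field names are illustrative assumptions.
@dataclass
class Request:
    status: int        # HTTP status code returned to the caller
    latency_ms: float  # latency observed for this request, in milliseconds

# Assumed histogram bucket upper bounds in milliseconds; a final open-ended
# bucket catches anything slower than the last bound.
BUCKET_BOUNDS_MS = [10, 50, 100, 250, 500, 1000]

def summarize(requests, window_s):
    """Request rate, success rate, and latency histogram for one time window."""
    total = len(requests)
    # Assumption: any non-5xx response counts as a success for the customer.
    successes = sum(1 for r in requests if r.status < 500)
    histogram = [0] * (len(BUCKET_BOUNDS_MS) + 1)
    for r in requests:
        # bisect_left returns the first bucket whose upper bound is >= latency.
        histogram[bisect_left(BUCKET_BOUNDS_MS, r.latency_ms)] += 1
    return {
        "request_rate_rps": total / window_s,
        # Success rate is undefined when there was no traffic in the window.
        "success_rate": successes / total if total else None,
        "latency_histogram": histogram,
    }

if __name__ == "__main__":
    window = [Request(200, 12.0), Request(200, 40.0), Request(503, 900.0), Request(404, 8.0)]
    print(summarize(window, window_s=2.0))
```

Keeping the full histogram rather than a single average preserves the shape of the latency distribution, so a slow tail that affects a small share of customers stays visible.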

As your service gets refactored into smaller services, internal teams become customers as well. These three key metrics serve as a common frame of reference for talking about service behavior across teams. Teams can quickly evaluate how their service is behaving for customers and can also quickly evaluate how their dependencies are serving them. This makes communication about performance and reliability issues crisper and dramatically improves incident troubleshooting and resolution.
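The same metrics can be sliced per dependency edge so that each team sees both sides of its service. The sketch below is a hypothetical illustration: it assumes every inter-service call is recorded as a `Call` with a caller and a callee, treats non-5xx responses as successes, and summarizes latency with a nearest-rank p99 in place of the full histogram to keep the example short. The hypothetical `customer_view` shows how a service is behaving for its callers, and `dependency_view` shows how its dependencies are serving it.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical record emitted for every request between services.
@dataclass
class Call:
    caller: str        # who made the request (an internal service, or "external")
    callee: str        # which service handled it
    status: int        # HTTP status code
    latency_ms: float  # latency observed by the caller, in milliseconds

def p99(latencies):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies)
    rank = math.ceil(0.99 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

def per_edge_metrics(calls, window_s):
    """Compute request rate, success rate, and p99 latency per (caller, callee) edge."""
    by_edge = defaultdict(list)
    for c in calls:
        by_edge[(c.caller, c.callee)].append(c)
    metrics = {}
    for edge, edge_calls in by_edge.items():
        # Assumption: non-5xx responses count as successes.
        ok = sum(1 for c in edge_calls if c.status < 500)
        metrics[edge] = {
            "request_rate_rps": len(edge_calls) / window_s,
            "success_rate": ok / len(edge_calls),
            "p99_ms": p99([c.latency_ms for c in edge_calls]),
        }
    return metrics

def customer_view(metrics, service):
    """How `service` is behaving for each of its callers (its customers)."""
    return {caller: m for (caller, callee), m in metrics.items() if callee == service}

def dependency_view(metrics, service):
    """How each dependency of `service` is serving it."""
    return {callee: m for (caller, callee), m in metrics.items() if caller == service}

if __name__ == "__main__":
    calls = [
        Call("web", "auth", 200, 5.0),
        Call("web", "auth", 500, 30.0),
        Call("web", "search", 200, 80.0),
        Call("mobile", "auth", 200, 7.0),
    ]
    m = per_edge_metrics(calls, window_s=1.0)
    print(customer_view(m, "auth"))   # auth as seen by web and mobile
    print(dependency_view(m, "web"))  # auth and search as seen by web
```

Because every team reads the same three numbers from both directions, a caller and a callee looking at the same edge are literally looking at the same figures.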


Mark McBride

Turbine Labs

Mark McBride is the founder and CEO of Turbine Labs, which builds products that help engineers ship features more quickly and safely. Previously, Mark was the services engineer lead at Nest Labs and Google, where he was responsible for developing Nest's server infrastructure, which lets Nest customers connect with their homes from wherever they are. He was also an early developer on Twitter's streaming API, which delivered thousands of messages per second in real time to millions of users. During his time at Twitter, Mark managed developer productivity and led the web delivery, developer tools, and infrastructure test teams; he also worked with a variety of deploy pipelines and led development of some of Twitter's early service migrations, which grew into a suite of tools used to migrate millions of requests per second from legacy services to modern replacements.