Build Systems that Drive Business
June 11–12, 2018: Training
June 12–14, 2018: Tutorials & Conference
San Jose, CA

End-to-end observability for fun and profit

Ben Hartshorne (Honeycomb), Christine Yen (Honeycomb)
9:00am–12:30pm Tuesday, June 12, 2018
Location: LL21 E/F
Level: Intermediate
Secondary topics: Systems Monitoring & Orchestration
Average rating: 3.00 out of 5 (5 ratings)

Prerequisite knowledge

  • Experience keeping production machines healthy
  • A basic understanding of the wide range of factors that can cause an HTTP request from a client to a server to fail
  • Ability to interact with an API via the language and environment of your choice

Materials or downloads needed in advance

  • A laptop with the API access tool of your choice installed. curl, jq, and bash would be sufficient but painful; a scripting language would be better. Anything that can issue HTTP requests and parse JSON responses will work.
  • A working Go environment, if you'd like to run the sample lightbulb server yourself (the GitHub repository will be public, and the server will likely be written in Go)

What you'll learn

  • Explore end-to-end (e2e) checks and learn how to implement them

Description

What does “uptime” really mean for your system? An end-to-end (e2e) check is where the rubber hits the road for your user experience and is the operator’s best tool for measuring uptime as experienced by your users. Creating and evolving e2e checks also establishes a basis for defining the SLOs and SLIs that we are willing to support.
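As a rough illustration of how e2e check results can feed an SLI, the sketch below computes availability as the fraction of checks that succeeded, optionally within a latency bound, and compares it to an SLO target. The `CheckResult` fields, the function names, and the 99% default target are assumptions made for this example; they are not taken from the session materials.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckResult:
    timestamp: float   # seconds since the epoch when the check ran
    ok: bool           # did the full round trip succeed?
    latency_ms: float  # how long the round trip took


def availability(results: List[CheckResult],
                 max_latency_ms: Optional[float] = None) -> Optional[float]:
    """Fraction of checks that succeeded, optionally within a latency bound."""
    if not results:
        return None  # no data is not the same as 100% up
    good = sum(
        1 for r in results
        if r.ok and (max_latency_ms is None or r.latency_ms <= max_latency_ms)
    )
    return good / len(results)


def meets_slo(results: List[CheckResult], target: float = 0.99,
              max_latency_ms: Optional[float] = None) -> bool:
    """Compare the measured SLI against a hypothetical SLO target."""
    sli = availability(results, max_latency_ms)
    return sli is not None and sli >= target
```

Treating a slow success as a failure (via `max_latency_ms`) is one way to make the number reflect what users experience rather than what the server reports.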

Ben Hartshorne and Christine Yen explore what it means for a system to be “up” by explaining what makes a good end-to-end (e2e) check and what techniques are valuable when thinking about them. Along the way, you’ll learn how to write and evolve an e2e check against a common API.

The class will write an e2e check together against a common API that everyone can access (a small server driving a Philips Hue bulb at the front of the room), using the simple lightbulb server as a touchpoint from which to gauge the “correctness” of the system. You’ll also write your own e2e check for the server, in whichever language and environment you prefer.

Ben and Christine then explore capturing, visualizing, and alerting on results (e.g., What’s useful to capture? What metadata should we have along the way? What existing paging alerts are obsoleted by an effective e2e check?) and unveil a new, extended version of the lightbulb server, with multiple light bulbs representing a sharded backend. You’ll update your e2e checks for the more complicated architecture before exploring some real-world trade-offs of e2e checks.
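A minimal sketch of what such a check might look like follows. It writes a fresh value to each bulb, reads it back, and records metadata about the attempt. The base URL, the `/bulbs/<name>` endpoint, and the JSON shape are all assumptions for illustration; this page does not describe the actual lightbulb server API. Looping over several bulbs stands in for checking each shard of the extended server.

```python
import json
import time
import urllib.request
import uuid

BASE_URL = "http://localhost:8080"  # assumption: where the lightbulb server listens


def http_set_color(bulb, color):
    """PUT a color to a hypothetical /bulbs/<bulb> endpoint."""
    req = urllib.request.Request(
        f"{BASE_URL}/bulbs/{bulb}",
        data=json.dumps({"color": color}).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        resp.read()


def http_get_color(bulb):
    """GET the current color from the same hypothetical endpoint."""
    with urllib.request.urlopen(f"{BASE_URL}/bulbs/{bulb}", timeout=5) as resp:
        return json.load(resp)["color"]


def run_check(bulbs, set_color=http_set_color, get_color=http_get_color):
    """Write then read back a value on every bulb; return one record per bulb."""
    results = []
    for bulb in bulbs:
        # A fresh value each run, so a stale read cannot pass by accident.
        color = f"#{uuid.uuid4().hex[:6]}"
        record = {"bulb": bulb, "wrote": color, "read": None,
                  "ok": False, "error": None}
        start = time.monotonic()
        try:
            set_color(bulb, color)
            record["read"] = get_color(bulb)
            record["ok"] = record["read"] == color
            if not record["ok"]:
                record["error"] = "read-back mismatch"
        except Exception as exc:  # any failure counts as "down" for the user
            record["error"] = f"{type(exc).__name__}: {exc}"
        record["duration_ms"] = (time.monotonic() - start) * 1000
        results.append(record)
    return results
```

Passing the transport functions as parameters keeps the check logic separate from the HTTP details, so the same check can be exercised against an in-memory fake before it is pointed at a real server.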

Ben Hartshorne

Honeycomb

Ben Hartshorne is an engineer at Honeycomb. For the last 12 years, Ben has built monitoring, alerting, and observability systems for companies ranging from startups like Simply Hired and Parse to large organizations such as Wikimedia and Facebook. Strangely, he actually enjoys this work and is happy to finally be building a company that will help tease out nuances in data that seem to be missing from all the other crappy open source systems he’s used. Though unlikely to pass on a good scotch, he’ll reach for the bourbon or rye first.

Christine Yen

Honeycomb

Christine Yen is the cofounder of Honeycomb, a startup with a new approach to observability and debugging systems with data. Christine has built systems and products at companies large and small and likes to have her fingers in as many pies as possible. Previously, she built Parse’s analytics product (and leveraged Facebook’s data systems to expand it) and wrote software at a few now-defunct startups.