Build Systems that Drive Business
June 11–12, 2018: Training
June 12–14, 2018: Tutorials & Conference
San Jose, CA

Principia SLOdica: A treatise on the metrology of service level objectives

Jamie Wilkinson (Google)
3:40pm–4:20pm Wednesday, June 13, 2018
Monitoring, Observability, and Performance
Location: LL21 A/B
Level: Intermediate
Secondary topics: Systems Monitoring & Orchestration

Prerequisite knowledge

  • Experience with alerting systems

What you'll learn

  • Explore SLOs and the concept of the error budget

Description

As systems grow, they get more components—and more ways to fail. The alerts of the last system’s design can slowly “boil the frog,” and suddenly no one has time to help the system scale further because they’re constantly firefighting. Alert fatigue sets in, and the team burns out.

Jamie Wilkinson offers an overview of SLOs and the concept of the error budget, examines the motivation for moving from cause-based to symptom-based alerting, and demonstrates how to implement symptom-based alerting in your own projects. By paging only when the SLO is not met or when the error budget is being burned at a predetermined rate, you can avoid alert fatigue and keep your team ready for action when it counts. You’ll learn about alerting on your SLOs and error budget, how the implementation of that alerting changes as systems scale, and the tools you’ll need once the alerts themselves no longer tell you which part is broken.
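The paging rule described above (page only when the error budget is burning at or above a predetermined rate) can be sketched in a few lines of Python. This is a minimal illustration and not the implementation shown in the session. The `Window` type, the function names, and the threshold of 2.0 used in the usage note are all hypothetical, and a real alerting system would evaluate this against its own metrics store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request counts over one lookback window (hypothetical shape)."""
    total: int
    errors: int


def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail under the SLO.

    For example, an SLO target of 0.999 leaves a budget of roughly 0.001.
    """
    return 1.0 - slo_target


def burn_rate(window: Window, slo_target: float) -> float:
    """Ratio of the observed error fraction to the allowed error fraction.

    A value of 1.0 means errors arrive exactly as fast as the budget
    permits; larger values mean the budget is being spent faster.
    """
    if window.total == 0:
        # No traffic in the window, so no budget was consumed.
        return 0.0
    observed_error_ratio = window.errors / window.total
    return observed_error_ratio / error_budget(slo_target)


def should_page(window: Window, slo_target: float, threshold: float) -> bool:
    """Page only when the burn rate reaches the chosen threshold.

    The threshold is an assumption of this sketch; the right value
    depends on the SLO period and how quickly a human must respond.
    """
    return burn_rate(window, slo_target) >= threshold
```

For instance, with a 99.9% SLO, `should_page(Window(total=100_000, errors=500), 0.999, threshold=2.0)` pages because errors arrive at about five times the budgeted rate, while a window with 50 errors in the same traffic does not. Note that the check looks only at the symptom (user-visible errors), not at which component caused them.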

Jamie Wilkinson

Google

Jamie Wilkinson is a site reliability engineer at Google. He’s a contributing author to the SRE Book and has presented on contemporary topics at prominent conferences such as linux.conf.au, Monitorama, PuppetConf, Velocity, and SREcon. His interests began in monitoring and the automation of small installations and have continued with human factors in automation and systems maintenance on large systems. Despite his more than 15 years in the industry, he’s still trying to automate himself out of a job.

Comments

Stone Yen | SR. MANAGER
06/19/2018 2:55pm PDT

Great session. May I know where I can download the slides?