Developers are increasingly expected to be on call, provide out-of-hours support, and respond to production outages. Without much experience handling incidents, this can be scary and intimidating—like being dropped in the deep end. But it doesn’t have to be that way.
The content team at the Financial Times has transformed its incident response from a number of mildly terrifying multihour outages to a stable platform where team members feel comfortable on call. Drawing on this experience, Euan Finlay shares practical tips and advice on setting up an incident response framework, what to do when “everything is on fire,” and how to improve things afterward—along with some horror stories of his own.
Euan is part of the Operations & Reliability team at the FT, managing incidents across the globe. Before that, he led a distributed team responsible for Go microservices, Docker containers in Kubernetes, and the backend APIs powering the website.
On the Ops-ier side of DevOps, he has occasionally admitted to being a sysadmin in public.
©2018, O’Reilly UK Ltd