We all know that sinking feeling when a critical service fails. Outages are inevitable in complex systems, but a good incident commander makes the process of resolution vastly more efficient and pleasant. All engineers, SREs and software engineers alike, need to be able to respond to a page when their system isn’t meeting its objectives, but coordinating a major incident requires a skillset every bit as specialized and important as architecting a high-throughput service. Those skills are learnable; they’re also hard to teach.
Join Beth Long and Elisa Binette to learn how to build strong incident management skills at the individual level and shape organizational processes to drive down MTTR, making both customers and engineers happier. Beth and Elisa break down the characteristics of a good incident commander (IC), from breadth of technical knowledge to interteam diplomacy, and offer tips on how to build and train your organization’s IC pool. Along the way, they share lessons learned from coordinating incidents that span multiple teams and products.
Beth Adele Long abandoned a potential career as a rocket scientist to tinker with websites. She’s currently a DevOps solutions strategist for New Relic and the project lead for New Relic’s collaboration with the SNAFUcatchers industry consortium. She’s obsessed with joint cognitive systems and good pens.
Elisa Binette is a senior engineering manager within the reliability organization at New Relic. The group focuses on helping teams measure and achieve their reliability goals, improving reliability for both the company’s engineers and its end customers. She’s actively involved with PDXWIT, a local nonprofit whose purpose is to strengthen the Portland women in tech community. She also loves martial arts, which she has practiced and taught for many years.