Engineer for the future of Cloud
June 10-13, 2019
San Jose, CA

Lowering costs of coordination during service outages: A multiple case analysis

Laura Maguire (The Ohio State University)
2:20pm-3:00pm Thursday, June 13, 2019
Average rating: 3.50 (8 ratings)

Level

Non-technical

Prerequisite knowledge

  • Experience on an on-call squad maintaining site reliability (useful but not required)

What you'll learn

  • Explore sophisticated, nuanced practices that ease the cognitive burden of coping with complex, time-pressured incidents

Description

Laura Maguire sheds light on how engineers and their practices, tooling, organizations, and vendors interact to produce great code, stellar reliability, and minimal tech debt... or not. Expanding upon the 2017 STELLA report from the Ohio State University Cognitive Systems Engineering Lab and fresh data from the second cycle of the SNAFU Catchers Consortium, Laura focuses on a critical aspect of DevOps practices: coordination. Aspects of social coding such as pull requests, daily stand-ups, and wikis wouldn’t work without it, and key tools such as GitHub and Slack exist specifically to facilitate it. In studying incident response across digital service organizations, she has found that coordination is a highly complex, nuanced, and sophisticated aspect of modern software engineering.

Using case study examples, Laura explores how the coordination needed to carry out joint activity successfully is both engineered through top-down decisions (as defined by vendor agreements, organizational structure, and tech architecture) and emergent through bottom-up practices (as defined by the strategies and techniques practitioners use). She discusses how some methods of coordination can increase attentional costs for members of the engineering team, which can slow the identification or resolution of complex system outages, and offers promising directions for controlling these costs. She shares insights into the complex coordinative functions that incident response engineering squads have in common with high-performing teams in other high-risk, high-consequence domains, and she challenges you to start exploring new ways to lower the costs of coordination within your own squads.


Laura Maguire

The Ohio State University

Laura Maguire studies human performance in high-risk, high-consequence work. As a researcher with the SNAFU Catchers Consortium, she has spent the last two years studying critical digital infrastructure and the teams tasked with keeping it running. She has a master’s degree in human factors and systems safety and is currently completing her PhD in cognitive systems engineering at the Ohio State University.


Comments

Laura Maguire | GRADUATE RESEARCHER
03/14/2019 5:41am PDT

I’m looking forward to sharing what this year’s round of research within the SNAFU Catchers Consortium has produced! If you haven’t already read our report from the Winter Storm Stella workshop, you can download it for free at stella.report.

What’s your greatest challenge in incident response?