4–7 Nov 2019

Measuring chaos: Chaos engineering and team health

Paul Osman (Under Armour Connected Fitness)
15:50–16:30 Thursday, 7 November 2019
Location: Hall A1
Average rating: 4.75 (4 ratings)

Who is this presentation for?

  • Software engineers, site reliability engineers, and engineering managers

Level

Beginner

Description

Chaos engineering is exploding in popularity. Once restricted to companies like Netflix, it’s becoming a common practice in organizations of all sizes. Many great talks have covered techniques for introducing chaos engineering to your organization, but without effective methods for measuring its impact, organizations can fall victim to resiliency theater. The results are predictable: adoption struggles, teams feel burned out, and chaos engineering feels like a chore.

Paul Osman details how Under Armour measures the impact of chaos engineering. He walks you through a service maturity model the company created and how it uses game days to evaluate services against it. You’ll see how Under Armour uses this data to create an overall view of team health, blamelessly identifying teams that are overburdened and need additional help from its infrastructure and SRE teams to get back on track. He also walks you through how the company uses health report cards to create visibility and shared accountability for team health and psychological safety.

Under Armour’s reliability engineering team is leading the shift in its culture from firefighting (reactive) to building inspection (proactive). Osman explains how Under Armour tracks incidents through five stages (ending with continuous chaos experiments) and uses incident data to prioritize proactive reliability work.

Prerequisite knowledge

  • Familiarity with microservice architectures and incident response processes

What you'll learn

  • Learn effective tools for communicating progress on the resiliency of software systems

Paul Osman

Under Armour Connected Fitness

Paul Osman is an engineering manager at Under Armour Connected Fitness. He’s been building external and internal platforms for over 10 years. From public APIs targeted at third parties to internal platform teams, he’s helped build distributed systems that power large-scale consumer applications. He’s also managed teams of engineers to deliver service-based software systems rapidly and with confidence.

