All Software Architecture, All the Time
June 10-13, 2019
San Jose, CA

Chaos engineering: When the network breaks (Velocity)

2:10pm–2:55pm Thursday, June 13, 2019
Overcoming Obstacles: Lessons in Resilience
Location: Expo Hall Sessions
Average rating: 4.67 (3 ratings)

Who is this presentation for?

  • Site reliability engineers, network engineers, system admins, and network admins

Level

Beginner

Prerequisite knowledge

  • A basic understanding of production environments and the infrastructure required to run systems
  • Familiarity with Linux, cloud infrastructure, hardware, networking, and systems troubleshooting
  • General knowledge of chaos engineering: http://techblog.netflix.com/2014/09/introducing-chaos-engineering.html

What you'll learn

  • Understand how and when your network breaks, and how network chaos engineering attacks can be used to improve the resilience of your cloud infrastructure
  • Learn about different types of network chaos engineering attacks including packet loss, packet corruption, latency, and blackhole
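As a rough illustration, and not the tooling used in the session, the four attack types listed above can be reproduced on a single Linux host with iproute2. The netem queueing discipline's `loss`, `corrupt`, and `delay` options cover packet loss, packet corruption, and latency, and a `blackhole` route silently drops traffic to a prefix. The Python sketch below only builds the command lines and does not run them. The helper names, the interface `eth0`, and the documentation prefix `203.0.113.0/24` are hypothetical choices made for the example.

```python
import shlex
from typing import List

# Hypothetical helpers: they build (but never execute) Linux commands that
# emulate network chaos attacks. Assumes iproute2 (`tc` with netem, `ip`).

NETEM_OPTIONS = {
    "packet_loss": "loss",          # drop a percentage of packets, e.g. "10%"
    "packet_corruption": "corrupt", # flip a bit in a percentage of packets, e.g. "5%"
    "latency": "delay",             # add a fixed delay to each packet, e.g. "200ms"
}


def netem_command(interface: str, attack: str, value: str) -> List[str]:
    """Return a `tc qdisc replace ... netem` argv for loss, corruption, or latency."""
    if attack not in NETEM_OPTIONS:
        raise ValueError(f"unsupported netem attack: {attack}")
    # `replace` instead of `add`, so re-running does not fail on an existing root qdisc.
    return ["tc", "qdisc", "replace", "dev", interface, "root", "netem",
            NETEM_OPTIONS[attack], value]


def blackhole_command(destination: str) -> List[str]:
    """Return an `ip route add blackhole` argv that silently drops traffic to a prefix."""
    return ["ip", "route", "add", "blackhole", destination]


def rollback_commands(interface: str, destination: str) -> List[List[str]]:
    """Return the argvs that undo both kinds of attack, so every experiment has an exit."""
    return [
        ["tc", "qdisc", "del", "dev", interface, "root"],
        ["ip", "route", "del", "blackhole", destination],
    ]


if __name__ == "__main__":
    for attack, value in [("packet_loss", "10%"),
                          ("packet_corruption", "5%"),
                          ("latency", "200ms")]:
        print(shlex.join(netem_command("eth0", attack, value)))
    print(shlex.join(blackhole_command("203.0.113.0/24")))
    for cmd in rollback_commands("eth0", "203.0.113.0/24"):
        print(shlex.join(cmd))
```

Keeping the commands as argv lists makes them easy to review before anything runs, and pairing each attack with a rollback reflects the chaos engineering habit of always having a way to halt an experiment. Running the commands for real requires root privileges and should start on a non-production host.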

Description

Chaos engineering is a disciplined approach to identifying failures before they become outages. By proactively testing how a system responds under stress, you can find and fix weaknesses before they end up in the news. Chaos engineering lets you compare what you expect to happen with what actually happens in your systems: you deliberately break things to learn how to build more resilient ones.

Tammy Butow walks you through network chaos engineering, covering the tools and practices you need to implement chaos engineering in your organization. Even if you already practice chaos engineering, you'll find new ways to apply it to improve the resilience of your network and services. You'll also discover how other companies use chaos engineering and the results they've achieved in building reliable distributed systems.

Tammy explains chaos engineering and its principles, why many engineering teams (including Netflix, Gremlin, Dropbox, National Australia Bank, Under Armour, Twilio, and more) use it, and how every engineering team can use it to create reliable systems. She shows you how to get started with your own team and how to measure success, and she covers chaos tools, the new chaos features built into cloud services, and war game environments for learning chaos engineering. She also explains how to practice chaos engineering on AWS DocumentDB, AWS DynamoDB, AWS RDS, and AWS S3.

Advanced topics include combining monitoring tools with chaos engineering to build reliable distributed systems. You'll also learn where to find further resources and how to join the chaos engineering community.