Successful companies embrace their failures as opportunities to make their applications and organizations more resilient. Part of that means paying close attention to outage investigations and digging deep to understand how their (likely complex) systems failed.
I’ll use real Etsy.com examples to dig into the anatomy of an outage and to show what a blameless, satisfying postmortem meeting looks like.
Humans and their behavior under stressful conditions are also components of our architectures, and they need just as much attention as load balancers and schema changes do. I’ll talk about how the fields of Human Factors and Resilience Engineering converge on web operations, and what we can learn from them.
Some of the topics covered, all of which will have real-world illustrations:
John has worked in systems operations for over fourteen years in biotech, government, and online media. He started out tuning parallel clusters that ran vehicle crash simulations for the U.S. government, then moved on to the Internet in 1997. He built the back-end infrastructure at Salon, InfoWorld, Friendster, and Flickr.