This tutorial is for you because:
You work in operations that are subject to outages (in management, SRE, web operations, etc.) and want to learn a system for improving response to these events.
Without question, the future of computing promises more scale, more complexity, and certainly more change, all at greater velocity. Yet scale, complexity, and change, especially at ever-increasing velocity, are the natural enemies of stability, performance, availability, and reliability.
Many companies have experienced the fear, pain, and embarrassment of handling a technology failure so significant it shook the core of the business. Without a standardized way to organize the people responding to incidents and solving technology problems, the time to restore services gets longer and longer.
The Incident Management System (IMS) has been battle-tested by the American fire service for more than 40 years across fires, rescues, hazardous-materials incidents, and every other type of emergency. Rob Schnepp, Chris Hawley, and Ron Vidal explain how they adapted IMS for IT and offer an early look at content from Incident Management for IT Operations, their upcoming book from O’Reilly Media.
Rob, Chris, and Ron dive into the nuts and bolts of the Incident Management System, which is in use by a number of site reliability teams, and demonstrate how not to let a good crisis go to waste by learning from each response in productive after-action reviews (AARs). You’ll leave knowing what the Incident Management System is and why it’s the best framework for organizing the people responding to an incident, how an incident commander (IC) works with subject-matter experts (SMEs) to solve high-severity problems, and how to put AAR findings into production to prevent future incidents.
Rob Schnepp is a 30-year veteran of the fire service and retired as the division chief of special operations for the Alameda County, CA, Fire Department. Rob has vast experience in emergency response and served as incident commander on numerous large-scale emergencies. Rob has written two hazardous materials response textbooks and numerous peer-reviewed fire-service-related articles on incident command. He is an instructor at the National Fire Academy and for the US Defense Threat Reduction Agency, providing hazmat/WMD training to an international audience. Rob is a principal in Blackrock 3 Partners, a firm specializing in consulting, training, and war-gaming in the areas of incident management and command.
Chris Hawley is deputy program manager on the contract that manages the International Counterproliferation Program (ICP) of the Defense Threat Reduction Agency (DTRA), the US Department of Defense’s official combat support agency for countering the entire spectrum of chemical, biological, radiological, nuclear, and high-yield explosive threats globally.
Ron Vidal is a partner at Blackrock 3 Partners, a leading incident management firm. Ron’s technology career spans 30 years as a senior executive in critical infrastructure including fiber optic and wireless telecommunications networks, data centers, electric power networks, and oil and gas facilities for Level 3 Communications, MFS Communications, UUNet Technologies, and Kiewit. Ron led teams on $19 billion of M&A transactions and $14 billion of public market financings. Ron managed Level 3’s executive response in New York City after the 9/11 World Trade Center terrorist attack and previously served on Mayor Dinkins’s NYC Task Force on Network Reliability. Ron is a technical peer reviewer for FEMA’s Assistance to Firefighters Grant program and has been a volunteer firefighter in four states. Ron is a member of two working groups on the California Cybersecurity Task Force.
©2016, O'Reilly Media, Inc.