Managing a high-severity incident is inherently stressful, and it’s even worse when the available data is incomplete and heterogeneous. Lyft runs Envoy at every hop of the network, providing best-in-class observability across the entirety of Lyft’s network topology. Having that homogeneous set of data vastly reduces the time it takes to identify a production issue.
Constance Caramanolis simulates a production incident and walks you through a page, from the dreaded PagerDuty notification to resolution, demonstrating how engineers at Lyft use Envoy’s extensive metrics to identify the root cause of an incident and then remedy it.
Constance Caramanolis is a software engineer on the server networking team at Lyft, where, for the past two years, she has built and deployed Envoy and its ecosystem. Constance focuses on configuration management, network security, and engineering education and is an Envoy maintainer. Previously, Constance worked at Microsoft on several different projects and teams.
©2018, O’Reilly UK Ltd • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.