At a certain point of complexity, systems are nearly impossible to understand. So how do you stay reliable when you can’t keep the whole system in your head? Tom Croucher discusses the approaches that Uber takes to ensuring its systems stay reliable by exploring real outages and the lessons they teach us.
Tom Croucher is a staff engineer on the SRE team at Uber, probably the fastest-growing technology company in the world. Previously, he was the CTO at Change.org, consulted for clients like Walmart, Nexenta, MySpace, Comcast, and the New York Times, and worked at Joyent on the Node.js team and at Yahoo on the homepage team. Tom is the coauthor of the O’Reilly book Up and Running with Node.js and has contributed to a number of web standards for the World Wide Web Consortium (W3C) and the British Standards Institution (BSI). He has worked with some of the world’s leading brands, including NASA, Tesco, Three UK, and the UK’s Channel 4 Television.
©2016, O'Reilly Media, Inc.