We often think of reliability in terms of improving individual components like machines, servers, and operational processes. But in a distributed system, something is always failing, and the more parts you add, the more vectors there are for failure. While tools and optimizations can help, reaching the highest levels of reliability requires rethinking the basic design of how components behave and interact.
Astrid Atkinson discusses techniques for building systems that are resilient by design.
Astrid Atkinson is director of software engineering at Google, where she leads development frameworks. During her 10+ years at Google, Astrid has built infrastructure, managed a variety of engineering teams, and spent more than five years on call for Google.com. She has led teams across the infrastructure map, from the team responsible for building and running Google’s web-serving layer to App Engine, cloud systems, and core search.
Comments
@Yasmeen Frishman – you can now find the slides at the top of this page.
I also think that this was a great keynote! Will the slides also be available aside from the recording of this session?
This was a great keynote! Will the content be available for viewing shortly?