Build resilient systems at scale
May 27–29, 2015 • Santa Clara, CA

Engineering for the long game: Managing complexity in distributed systems

Astrid Atkinson (Google)
10:15am–10:35am Thursday, 05/28/2015
Location: Mission City Ballroom
Average rating: 4.45 (60 ratings)

Prerequisite Knowledge

Basic understanding of distributed systems.


In a long-lived distributed system, the challenges of scaling shift from simple load (“can we handle peak load on New Year’s Eve?”) to more organizational problems of managing failure domains and development complexity. Changing one binary is hard enough – spreading dependencies across 15 or 50 poses a new set of challenges. From breaking systems into microservices to engineering for organizational resilience, this session discusses long-game approaches to making sure systems and organizations can support continuous innovation.

Astrid Atkinson


Astrid has built infrastructure and managed a variety of engineering teams during her 10+ years at Google, as well as spending 5+ years on call. She led the team responsible for running and building out Google’s web serving layer and managed site reliability for Google’s social products.

As part of the Cloud Platform team, she led the development of the next generation of app- and service-level infrastructure, including next-generation App Engine. She currently works in Search Infrastructure.