In a long-lived distributed system, the challenges of scaling shift from simple questions of load (“can we handle the peak on New Year’s Eve?”) to more organizational problems of managing failure domains and development complexity. Changing one binary is hard enough; spreading dependencies across 15 or 50 binaries poses a new set of challenges. From breaking systems into microservices to engineering for organizational resilience, this session discusses long-game approaches to making sure systems and organizations can support continuous innovation.
Astrid has built infrastructure and managed a variety of engineering teams during her 10+ years at Google, and has spent 5+ years on call for google.com. She led the team responsible for running and building out Google’s web serving layer and managed site reliability for Google’s social products.
As part of the Cloud Platform team, she led the development of the next generation of app- and service-level infrastructure, including next-generation App Engine. She currently works in Search Infrastructure.