Drawing on her experience at Uber, Susan Fowler explains how to smash a monolithic API into many hundreds of containerized microservices for fun and profit—and how to create a posture of resiliency and reliability around an infrastructure that grows and changes daily with incredible velocity.
Susan organizes her talk into three key areas:
Process management: It’s not just about building software. You also need processes around building software, including incident reviews for security incidents and outages. You need workflows for following up on the action items those reviews produce, running in parallel with new code development, so that recurring software failures do not sabotage new development or drain resources (or morale). Process management and organizational overhead become critical as the number of services and their complexity grow.
Metrics and monitoring: If you can’t quantify what you are doing, why are you doing it? If you change a piece of code but can’t measure the effect of that change, why did you bother to change it? Containerization and microservices give us new visibility into the smallest parts of our software stack, along with greater agility. But we have to know where to look, how to instrument our monitoring, and what to do with the metrics once we have them. By checking cross-service boundaries, monitoring code defects, performing exhaustive root-cause analysis, and so on, we begin to build a detailed picture of very small pieces of a very large and complex architecture. Susan shows how incident mitigation and a comprehensive review process driven by the SRE team have allowed Uber to “see into” parts of its services that were previously opaque.
Evangelizing, teaching, and sticking to it: One thing that comes up repeatedly in conversations with developers outside Uber (and developers new to Uber) is that concerns such as orthogonality, black-box design, fragile parsers, language barriers, unit tests, code reviews, and general coding discipline demand even more care with microservices. Neglecting these practices creates technical debt, and microservices magnify that debt rather than eliminate it. The same is true of containers. Technical debt still exists in a microservice- and container-driven world; it is simply isolated into smaller pools, and it surfaces during outages or when new people join. It still costs developer time and organizational time. Susan discusses the process Uber SRE has developed for approaching these situations, which ultimately leads to increased reliability across the stack.
Susan Fowler is a site reliability engineer at Uber, where she splits her time between embedding within business-critical microservice teams and running a production-readiness initiative across Uber’s diverse set of microservices.
©2016, O'Reilly Media, Inc.