September 19–20, 2016: Training
September 20–22, 2016: Tutorials & Conference
New York, NY

Mitigating sprawl with microservices and containerization

Susan Fowler (Uber)
1:30pm–2:10pm Wednesday, 09/21/2016
Infrastructure reimagined
DevOps, Resilience engineering
Beekman
Audience level: Advanced
Average rating: 4.42 (12 ratings)

Prerequisite knowledge

  • Familiarity with process management, the software development life-cycle, containerization, microservices, and the concepts of broad-scale architecture
What you'll learn

  • Understand why breaking monoliths into microservices and buying into containerization moves the need for discipline in development to new, possibly unforeseen places that are less about software development and often more about process management
Description

    Drawing on her experience at Uber, Susan Fowler explains how to smash a monolithic API into many hundreds of containerized microservices for fun and profit—and how to create a posture of resiliency and reliability around an infrastructure that grows and changes daily with incredible velocity.

    Susan organizes her talk into three key areas:
    Process management: It’s not just about building software. You need processes around building software, including incident reviews for security issues and outages. You also need process flows for following up on what those reviews uncover (and for providing parallel paths to new code development) so that ongoing software failures do not sabotage new development or deplete resources (or morale). Process management and organizational overhead become critical as the number of services and the overall complexity increase.

    Metrics and monitoring: If you can’t quantify what you are doing, why are you doing it? If you change a piece of code but can’t quantify the effect of that change, why did you bother to change it? Containerization and microservices give us new introspection into the tiniest spaces in our software stack with increased agility. But we have to know where to look, how to instrument that monitoring, and what to do with the metrics once we have them; by checking at cross-service boundaries, monitoring code defects, performing exhaustive root cause analysis, et cetera, we begin to see a deep picture of very tiny pieces of a very large and complex architecture. Susan shows how incident mitigation and a comprehensive review process driven by the SRE team have allowed Uber to “see into” parts of its services that previously were opaque or hard to get visibility into.
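    As a rough illustration of checking at cross-service boundaries, the sketch below counts calls, errors, and latency for each caller/callee pair. It is not taken from the talk and does not describe Uber's tooling: the BoundaryMetrics class, its track and error_rate methods, and the service names "dispatch" and "pricing" are all hypothetical, and a real deployment would export these numbers to a metrics system rather than hold them in memory.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class BoundaryMetrics:
    """Hypothetical in-memory recorder for calls that cross a service boundary."""

    def __init__(self):
        # Each mapping is keyed by a (caller, callee) pair.
        self.calls = defaultdict(int)
        self.errors = defaultdict(int)
        self.latencies = defaultdict(list)

    @contextmanager
    def track(self, caller, callee):
        """Time one cross-service call and record whether it raised."""
        key = (caller, callee)
        start = time.perf_counter()
        try:
            yield
        except Exception:
            self.errors[key] += 1
            raise  # the caller still sees the original failure
        finally:
            self.calls[key] += 1
            self.latencies[key].append(time.perf_counter() - start)

    def error_rate(self, caller, callee):
        """Fraction of recorded calls across this boundary that raised."""
        key = (caller, callee)
        total = self.calls.get(key, 0)
        return self.errors.get(key, 0) / total if total else 0.0


metrics = BoundaryMetrics()

# One successful call and one simulated failure across the same boundary.
with metrics.track("dispatch", "pricing"):
    pass

try:
    with metrics.track("dispatch", "pricing"):
        raise TimeoutError("simulated downstream timeout")
except TimeoutError:
    pass

print(metrics.error_rate("dispatch", "pricing"))
```

    Comparing the error rate and latency distribution for a boundary before and after a change is one simple way to quantify the effect of that change.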

    Evangelizing, teaching, and sticking to it: One theme that comes up repeatedly in conversations with developers outside of Uber (and developers new to Uber) is that concerns like orthogonality, black box design, fragile parsers, language barriers, unit tests, code reviews, and general coding discipline demand even more care in this setting. Neglecting them creates technical debt, and microservices magnify that debt rather than eliminate it. The same is true of containers. Technical debt exists in a microservice- and container-driven world; it is just isolated into smaller pools, and it surfaces in outages or when new people are brought in. But it still costs developer time and organizational time. Susan discusses the process Uber SRE has developed for approaching these situations, ultimately leading to increased reliability across the stack.

    Susan Fowler


    Susan Fowler is a site reliability engineer at Uber, where she splits her time between embedding within business-critical microservice teams and running a production-readiness initiative across Uber’s diverse set of microservices.