Engineer for the future of Cloud
June 10-13, 2019
San Jose, CA

SRE classroom: How to design a reliable application in three hours

Jenny Liao (Google)
1:30pm–5:00pm Tuesday, June 11, 2019

Who is this presentation for?

  • Site reliability engineers, system engineers, system administrators, and engineering managers who want to better understand quantitative evaluation of microservices-based projects


Explore the key concepts behind microservices with Jenny Liao before she guides you through applying the concepts to evaluate and build systems of your own. Topics include:


  • Consensus in distributed systems
  • Request routing and load balancing
  • Capacity planning
  • Failure tolerance

Part one: Workshop goals, expectations (30 minutes)

We have a problem. Let’s solve it with software.

  • Present the initial problem statement
  • Introduce the service-level objective (SLO)
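
As a rough illustration of what an availability SLO implies in practice, the sketch below converts a target into an "error budget" of allowed downtime. The function name, the 30-day window, and the 99.9% target are assumptions chosen for illustration; they are not taken from the workshop materials.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Return the minutes of downtime an availability SLO allows in a window.

    `slo` is a fraction such as 0.999 (hypothetical target, not from the workshop).
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    total_minutes = window_days * 24 * 60  # minutes in the measurement window
    return (1.0 - slo) * total_minutes     # share of the window that may be "bad"


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves roughly 43 minutes of budget.
    print(round(error_budget_minutes(0.999), 1))
```

A tighter target shrinks the budget quickly, which is one reason the SLO is worth fixing before any design work starts.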

Let’s discuss terminology and concepts so that we’re all talking about the same things with the same vocabulary.

  • Machine: hardware (memory, processor) and software (libraries, invariants)
  • Distributed: hardware (data center, network) and software (algorithms, failures). What is distributed consensus, and why is it important?

Hands-on workshop: Part one (30 minutes)

  • Expectation: Identify the components necessary to build a working system in a single location. Produce a sketch of this working system (Unified Modeling Language [UML] not required).


Part two: The solution has limitations; let’s improve it. (30 minutes)

We have identified single points of failure…

  • …Because things failed. The system failed. And we lost users.

Let’s replicate this thing.

  • What parts are useful to duplicate? Replicate? How do we arrange this so that we make the computers do all the work?
  • How do we know that these systems are doing what we expect?
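
One way to reason about the value of replication is a back-of-the-envelope availability estimate. The sketch below is a simplification that assumes replicas fail independently, which real systems rarely satisfy exactly (shared networks, shared deploys); the function names and the 99% figure are hypothetical.

```python
from math import prod


def replicated_availability(a: float, n: int) -> float:
    """Availability of n redundant replicas, assuming independent failures.

    The group is down only when every replica is down at the same time.
    """
    return 1.0 - (1.0 - a) ** n


def serial_availability(parts: list[float]) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(parts)


if __name__ == "__main__":
    print(replicated_availability(0.99, 2))   # two replicas of a 99% component
    print(serial_availability([0.99, 0.99]))  # two 99% components in series
```

Under these assumptions, duplicating a component raises availability, while adding a dependency that every request must pass through lowers it. That trade-off is the kind of question the exercise asks groups to weigh.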

We have performance bottlenecks.

  • How do we identify bottlenecks?
  • Conversely, how do we know that we have removed these bottlenecks?

How can we apply these concepts to a real piece of software?

  • What limitations does this introduce?

Hands-on workshop: Part two (30 minutes)

  • Expectation: Identify which components can usefully run in multiple locations; evaluate how to write an SLO (and how to apply it); produce a system that runs in multiple data centers.

Part three: Discussion and conclusions (30 minutes)

  • Present an example solution
  • Discuss commonly encountered limitations
  • What key points have we learned?
  • How does it apply beyond this workshop? Assessing and evaluating third-party (e.g., cloud) systems and integrating them into your design

Hands-on exercises:
For each exercise, you’ll work in small groups to apply the concepts in the preceding presentation to the problem. As Jenny discusses additional aspects of distributed-systems design, the groups will apply these concepts to their in-progress solutions.

You won’t need laptops or specific coding experience; you will need enthusiasm for collaborating in small groups and for discussion-based problem solving.

Jenny Liao


Jenny Liao is a software engineer in Google’s Pittsburgh office. A Carnegie Mellon alumna, she has a healthy dose of Pittsburgh pride. Jenny is passionate about distributed systems design and is always excited to connect with more people. She enjoys painting, singing, and playing with dogs in her free time.
