Engineer for the future of Cloud
June 10-13, 2019
San Jose, CA

SRE classroom: How to design a reliable application in three hours

Jenny Liao (Google)
1:30pm–5:00pm Tuesday, June 11, 2019
Average rating: 4.43 (14 ratings)

Who is this presentation for?

  • Site reliability engineers, system engineers, system administrators, and engineering managers who want to better understand the quantitative evaluation of microservices-based projects and the design, scalability, and robustness of distributed systems.

Prerequisite knowledge

  • You won’t need laptops or specific coding experience. You will need enthusiasm for collaborating in small groups and for discussion-based problem solving.

What you'll learn

  • Learn how to solve problems with software
  • Evaluate and build your own large system


Explore the key concepts behind large system design with Jenny Liao, as she guides you through building, scaling, and provisioning a system. Apply the concepts you learn to evaluate and build systems of your own. You will be working in small groups.

Key Concepts:

  • Microservices
  • Failure tolerance
  • CAP theorem
  • Consensus in distributed systems
  • SLOs
  • Load balancing
  • Capacity planning

Part one: Workshop introduction: goals, expectations, and problem statement

We have a problem. Let’s solve it with software.

  • Setting the tone: goals and expectations
  • Present the problem statement
  • Introduction of the service-level objectives (SLOs)
  • Discussion of terminology and concepts so we’re using the same vocabulary
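One way to make an SLO concrete is to convert it into an error budget. The sketch below is an illustration only and is not taken from the workshop material; the 30-day window and the example targets are assumptions.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full unavailability that an availability SLO permits.

    `slo` is a fraction such as 0.999; `window_days` is an assumed
    measurement window, not a value from the workshop.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


# Example targets chosen for illustration only.
for target in (0.99, 0.999, 0.9999):
    budget = error_budget_minutes(target)
    print(f"{target:.2%} over 30 days allows {budget:.1f} minutes of downtime")
```

Each additional "nine" in the target cuts the permitted downtime by a factor of ten, which is why the SLO is fixed before any design work starts.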

Hands-on workshop, part one (40 minutes)

  • Expectation: Identify the components necessary to build a working system in a single location. Produce a sketch of this working system.
  • Unified Modeling Language (UML) not required
  • Afterwards: 5-minute break
  • After the break, sample solution

Part two: The solution has limitations; let’s improve it.

We have identified single points of failure…because things fail. The system failed. And we lost users.

Let’s replicate this thing.

  • What parts are useful to replicate?
  • How do we arrange this so that work is shared efficiently?
  • How do we know that these systems are doing what we expect?
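The question of which parts are worth replicating can be explored with a rough availability estimate. The sketch below assumes replicas fail independently, which real systems rarely satisfy exactly, and the component names and availability figures are hypothetical.

```python
from math import prod


def replicated_availability(single: float, replicas: int) -> float:
    """Availability of a component that is up if at least one replica is up.

    Assumes replica failures are independent (a simplifying assumption).
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1.0 - (1.0 - single) ** replicas


def chain_availability(components: list[float]) -> float:
    """Availability of a request path that needs every component to be up."""
    return prod(components)


# Hypothetical single-instance availabilities for a two-component path.
frontend, storage = 0.99, 0.99
print("no replication:     ", chain_availability([frontend, storage]))
print("storage replicated: ", chain_availability(
    [frontend, replicated_availability(storage, 2)]))
print("both replicated:    ", chain_availability(
    [replicated_availability(frontend, 2), replicated_availability(storage, 2)]))
```

Under these assumptions, replicating only one component leaves the other as the dominant source of failure, which is the single-point-of-failure problem described above.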

We have correctness problems.

  • How do we identify correctness or consistency issues?
  • How do we know if we have addressed them?
  • How can we apply these concepts to a real piece of software?
  • What limitations does this introduce?

Hands-on workshop, part two (30 minutes)

  • Expectation: Identify which components can usefully run with multiple replicas and/or in multiple locations. Produce a system that runs correctly in multiple data centers.
  • Afterwards: 5-minute break
  • After the break, sample solution

Part three: Provisioning, SLOs, and conclusions

Let’s provision the system based on storage, bandwidth, and latency needs. Let’s evaluate our system against the SLO requirements.

We have a map of components in our system.

  • How many replicas do we need for each component?
  • Is the system reliable enough?
  • Does the system give correct results?
  • How much hardware do we need?
  • Are there any bottlenecks? How can we address them?
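The replica-count question can be approached with back-of-the-envelope arithmetic. The following is a minimal sketch, not the workshop's sample solution: the traffic numbers and component names are hypothetical, and the default of two spare replicas (one for planned maintenance, one for an unplanned failure) is an assumed provisioning policy.

```python
import math


def replicas_needed(peak_qps: float, qps_per_replica: float, spare: int = 2) -> int:
    """Replicas required to serve peak load, plus spare capacity.

    `spare` is an assumed policy (N+2 by default), not a workshop requirement.
    """
    if peak_qps < 0 or qps_per_replica <= 0 or spare < 0:
        raise ValueError("peak_qps and spare must be >= 0; qps_per_replica must be > 0")
    return math.ceil(peak_qps / qps_per_replica) + spare


# Hypothetical components: (peak queries per second, capacity of one replica).
components = {
    "frontend": (10_000, 1_500),
    "storage": (4_000, 500),
}

for name, (peak, per_replica) in components.items():
    print(f"{name}: {replicas_needed(peak, per_replica)} replicas")
```

The same calculation can be repeated per resource (storage, bandwidth, CPU); the resource that demands the most machines for a component is its likely bottleneck.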

Hands-on workshop, part three (35 minutes)

  • Expectation: Identify how many machines you need for each component of the system. Determine if the SLOs are reasonably achievable with your design.
  • Afterwards: 5-minute break
  • After the break, sample solution

Finale: Discussion and conclusions

  • What key points have we learned?
  • How can we apply these key points beyond this workshop?
  • Last thoughts

Hands-on workshop exercises:

For each exercise, you’ll work in small groups to make incremental progress on your solutions. Jenny will present a sample solution with distributed systems design concepts and key takeaways after each hands-on portion of the workshop.

Jenny Liao


Jenny Liao is a software engineer in Google’s Pittsburgh office. A Carnegie Mellon alumna, she has a healthy dose of Pittsburgh pride. Jenny is passionate about distributed systems design and is always excited to connect with more people. She enjoys painting, singing, and playing with dogs in her free time.


Jenny Liao
06/27/2019 3:08am PDT

Hi Jaya – the material is not available online at the moment, but we are looking into open sourcing this material in the future!

Jaya Vara Naga Sindoori Sistla | SOFTWARE ENGINEER
06/13/2019 9:12am PDT

Hi Jenny, I really liked your session. Can you please share the session material with slides?