Making Open Work
May 8–9, 2017: Training & Tutorials
May 10–11, 2017: Conference
Austin, TX

Site reliability engineering

Jean Joswig (Google)
9:00am–12:30pm Monday, May 8, 2017
Architecture, Infrastructure
Location: Meeting Room 10 A/B
Level: Intermediate
Average rating: 4.62 (8 ratings)

Who is this presentation for?

  • Site reliability engineers, DevOps engineers, systems engineers, and managers

What you'll learn

  • Learn the theoretical concepts behind distributed systems and how to use design patterns from distributed systems in a practical, user-facing production system
  • Understand how to approach a high-level architectural problem such as “design imgur,” reason about what designs would be appropriate, and identify how to implement a design (which resources the system requires, such as hardware, network bandwidth, and software components)
  • Discover how site reliability influences implementation
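As a rough illustration of what "identify which resources the system requires" can look like for an image-hosting design, here is a minimal back-of-envelope sketch. It is not taken from the tutorial materials; the function name, parameters, and every input number are hypothetical assumptions chosen only to show the arithmetic.

```python
import math


def estimate_resources(
    uploads_per_day: int,
    avg_image_bytes: int,
    reads_per_upload: int,
    peak_factor: float,
    server_egress_bps: float,
) -> dict:
    """Back-of-envelope sizing for a hypothetical image-hosting service.

    All inputs are assumptions supplied by the caller, not measured data.
    """
    seconds_per_day = 86_400

    # New storage needed each day, ignoring replication and thumbnails.
    storage_bytes_per_day = uploads_per_day * avg_image_bytes

    # Average read rate, assuming each upload is viewed reads_per_upload times
    # and that views are spread evenly over the day.
    avg_reads_per_sec = uploads_per_day * reads_per_upload / seconds_per_day

    # Peak traffic is modeled as a simple multiple of the average.
    peak_reads_per_sec = avg_reads_per_sec * peak_factor

    # Outbound bandwidth at peak, in bits per second.
    peak_egress_bps = peak_reads_per_sec * avg_image_bytes * 8

    # Servers needed if outbound bandwidth is the only bottleneck.
    servers_for_egress = math.ceil(peak_egress_bps / server_egress_bps)

    return {
        "storage_bytes_per_day": storage_bytes_per_day,
        "peak_reads_per_sec": peak_reads_per_sec,
        "peak_egress_bps": peak_egress_bps,
        "servers_for_egress": servers_for_egress,
    }


if __name__ == "__main__":
    # Hypothetical workload: 10 uploads/sec, 1 MB images, 100 views each,
    # peak at 2x average, 10 Gbps of usable egress per server.
    estimate = estimate_resources(
        uploads_per_day=864_000,
        avg_image_bytes=1_000_000,
        reads_per_upload=100,
        peak_factor=2.0,
        server_egress_bps=10e9,
    )
    for name, value in estimate.items():
        print(f"{name}: {value}")
```

A real design exercise would also account for replication, caching, and failure headroom, each of which changes these totals; the sketch only shows how a workload assumption translates into storage and bandwidth figures.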

Description

Members of Google’s Site Reliability Engineering (SRE) team guide you through the principles of systems engineering. You’ll work in small groups to solve a systems problem, using ideas from distributed computing to build a sample system and gain practical experience with the issues surrounding large-scale system design.

Each group will have a facilitator to answer questions and guide discussion. You’ll leave with stronger analytic skills and with an awareness of technologies and approaches for large-scale design. Nontechnical participants will also leave knowing how to evaluate systems for reliability and suitability for high-availability environments.

Jean Joswig

Google

Jean Joswig is a site reliability engineer at Google working on data center automation.