The A in SRE: Architecting for reliability
Who is this presentation for?
- Architects, tech leads, and SREs
Site reliability engineering (SRE) has become a popular discipline for improving the reliability of an organization's IT landscape. Typically, SRE improves the reliability of existing services by optimizing operational procedures and the feedback loops to the teams. In some situations, though, you need to change the architecture itself to make a service more reliable. These architectural redesigns are costly and could have been avoided had the service-level objectives (SLOs) been clear from the beginning. If your objectives are unclear, or not defined at all, you risk either implementing too few measures to make your system reliable or implementing too many, which leads to an overly complex system that can just as easily become unreliable.
SLOs must be clear enough to be, among other things, understandable, measurable, and reachable within the context of the service. These criteria help get the SLOs accepted within an organization, help teams select the right stability patterns, and justify to the organization why specific architectural stability patterns are needed. Observability patterns built around the three pillars (event logs, metrics, and tracing) can then be applied to make the system observable and the SLOs measurable.
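As a rough illustration of what "measurable" can mean in practice, the sketch below computes a request-based availability indicator and the share of the error budget still unspent. It is not taken from the tutorial material: the names `AvailabilitySLO`, `measure_sli`, and `budget_remaining`, and the choice of a request-success ratio as the indicator, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilitySLO:
    """Hypothetical availability SLO: the target fraction of good requests."""

    target: float  # e.g. 0.999 means 99.9% of requests must succeed

    def error_budget(self, total_requests: int) -> float:
        """Number of requests that may fail before the SLO is breached."""
        return (1.0 - self.target) * total_requests


def measure_sli(good_requests: int, total_requests: int) -> float:
    """Measured indicator: the fraction of requests that succeeded."""
    if total_requests == 0:
        return 1.0  # no traffic means nothing has failed (an assumption)
    return good_requests / total_requests


def budget_remaining(slo: AvailabilitySLO, good_requests: int,
                     total_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed = slo.error_budget(total_requests)
    failed = total_requests - good_requests
    if allowed == 0:
        return 0.0 if failed == 0 else float("-inf")
    return 1.0 - failed / allowed


if __name__ == "__main__":
    slo = AvailabilitySLO(target=0.99)
    sli = measure_sli(good_requests=995, total_requests=1000)
    print(f"SLI={sli:.3f}, SLO met: {sli >= slo.target}")
    print(f"budget remaining: {budget_remaining(slo, 995, 1000):.0%}")
```

An objective stated this way is understandable (one ratio), measurable (two counters), and can be checked for reachability against historical traffic before the architecture is fixed.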
Drawing on their real-world experience, Marco van der Linden and Tom Hofte demonstrate how to design reliable and observable systems based on clear SLOs. You’ll work in teams on a fictional case to define SLOs, apply stability patterns to ensure system reliability, and make the system observable.
Join in to learn how to better define clear SLOs and translate them into a reliable and observable system, using well-established architectural patterns.
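The abstract does not name the stability patterns covered in the session. As one commonly cited example, the sketch below shows a minimal circuit breaker that fails fast once a dependency has failed repeatedly. The class name, thresholds, and injectable `clock` parameter are all assumptions for illustration, and a production version would also need thread safety and metrics.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Minimal circuit breaker: opens after N consecutive failures."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so the timeout can be tested
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        """True while calls are rejected; False once the timeout elapses."""
        if self._opened_at is None:
            return False
        return self._clock() - self._opened_at < self.reset_timeout

    def call(self, func: Callable[[], T]) -> T:
        if self.is_open:
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                # Open (or re-open after a failed trial call).
                self._opened_at = self._clock()
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping a remote call as `breaker.call(lambda: fetch())`, where `fetch` is a placeholder for your own client function, keeps a failing dependency from tying up callers. The rejected calls are themselves a useful signal to count against an SLO.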
Prerequisite knowledge
- A basic understanding of SRE and SLOs
Materials or downloads needed in advance
- A laptop
What you'll learn
- How to define clear SLOs, how to translate SLOs into the right architecture by applying well-established stability patterns, and how to make a system observable
Marco van der Linden
Marco van der Linden is a Netherlands-based IT solutions architect and consultant at Xebia. Marco has more than 15 years’ experience in IT. Previously, he was at IBM and consulted for multiple companies. He’s worked on all kinds of systems using various technologies but is especially interested in distributed systems design. He hosts meetups on RESTful API design, microservices design, and reliability engineering and leads DASA DevOps training. In his spare time, Marco likes to take long walks with his family, do a bit of fencing (épée), and read books.
Tom Hofte
Tom Hofte is an IT architect at Xebia. Tom has worked as a lead architect in IT for more than 10 years, focusing on integration architectures and distributed system design. He began his career as a developer and over the years has taken on a number of roles within project teams, giving him a deep understanding of technology and of delivering IT projects throughout the complete lifecycle, from concept to grave.
Comments on this page are now closed.
Unfortunately, no recording was made, so besides the case study and slides, there is nothing else we can share.
Will the recording be available?
The correct link is: http://bit.ly/ainsretutorial_material
Thanks for attending our tutorial yesterday. We really appreciated your enthusiasm and discussions during the tutorial. We’ve made the material available under this link: http://bit.ly/aisretutorial_material. The link will be valid till 18/4.
- Tom and Marco