7–9 November 2016: Conference & Tutorials
9–10 November 2016: Training
Amsterdam, The Netherlands

Own your reliability

Adam Surák (Algolia)
13:45–14:25 Monday, 7/11/2016
Reimaging DevOps, security, and infrastructure
Networking, Resilience engineering
Emerald Room & Lounge
Audience level: Intermediate
Average rating: 4.85 (13 ratings)

Prerequisite knowledge

  • A general understanding of server and network infrastructure
  • Hands-on experience running systems in production (useful but not required)

What you'll learn

  • Gain a more critical view of your current and future system design efforts


Who do you trust? What do you control? What are your dependencies? Reliability on the Internet is an adrenaline-fueled adventure, but sometimes we all want a good night's sleep and a working service. Adam Surák takes a closer look at some reliability nightmares and explains how to deal with them, sharing the design lessons he has learned running servers in almost 40 data centers across 15 regions, achieving close to 100% availability globally and 100% in the vast majority of the regions.

To demonstrate how our design and operations decisions affect us, Adam quickly reviews the basics before exploring in detail the SLAs that we commit to every day while having only a vague idea of what they mean. Adam then offers an overview of blackbox monitoring tools, from very simple, low-precision tools testing traffic to very sophisticated, high-precision tools measuring real-user traffic. Although some see cloud solutions as a silver bullet for everything, Adam explains that this is not the case: the cloud has its own issues. Adam concludes with an overview of commonly underestimated dependencies in our software, infrastructure, and people.
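
As a rough illustration of what an SLA percentage means in practice, the arithmetic can be sketched in a few lines of Python. This is not material from the talk: the function names, the 30-day period, and the availability figures are all assumptions chosen for the example, and the combined figure assumes dependencies fail independently, which real systems rarely do.

```python
# Illustrative sketch only: names and numbers are assumptions,
# not figures from the session.

def allowed_downtime_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Minutes of downtime permitted over the period at the given availability."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def serial_availability(*availabilities_pct: float) -> float:
    """Combined availability (percent) when every dependency must be up.

    Assumes failures are independent, which is an optimistic simplification.
    """
    combined = 1.0
    for pct in availabilities_pct:
        combined *= pct / 100
    return combined * 100


if __name__ == "__main__":
    # Downtime budget for a few common SLA targets over a 30-day period.
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% allows about {minutes:.1f} minutes of downtime")

    # A service that needs three dependencies, each at 99.9%.
    print(f"Combined: {serial_availability(99.9, 99.9, 99.9):.3f}%")
```

Under these assumptions, a service that depends on three components at 99.9% each ends up below 99.9% itself, which is one way an underestimated dependency erodes a commitment.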


Adam Surák


Adam Surák is a software engineer at Algolia, where he focuses on the sweet spot between software engineering, systems, and networks.