All Software Architecture, All the Time
June 10-13, 2019
San Jose, CA

A microservices murder mystery: Discover the root cause of an outage

Thomas Rampelberg (Buoyant)
11:00am–11:45am Thursday, June 13, 2019
Cloud native, Microservices
Location: 210 B/F
Secondary topics: Best Practice

Who is this presentation for?

  • DevOps engineers, SREs, and developers



Prerequisite knowledge

  • A basic understanding of containers and Kubernetes

What you'll learn

  • Understand how service meshes work
  • Explore Linkerd
  • Learn how to discover the cause of an outage


When you’re operating multiple services, outages can feel like murder mysteries. Building out forensics tools such as monitoring and observability is essential. Unfortunately, it’s a real challenge to balance building new features against building the tools that help pinpoint root causes.

Linkerd 2.0 provides many of the tools you need to tame the chaos of operating microservices in a cloud native world. Because it’s a transparent proxy that runs alongside your application, no code changes are required. It also comes with Prometheus to store the metrics for you and prebuilt Grafana dashboards that show what’s important for your services: success rate, latency, and throughput.
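
As a rough illustration of the three metrics named above, the sketch below computes success rate, throughput, and latency percentiles from a list of request records. It is not Linkerd's implementation: the `Request` record, the `golden_metrics` function, the rule that any status below 500 counts as a success, and the nearest-rank percentile method are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one observed request (not a Linkerd type)."""
    status: int         # HTTP status code returned to the caller
    latency_ms: float   # time taken to serve the request, in milliseconds


def golden_metrics(requests, window_seconds):
    """Summarize success rate, throughput, and latency for one time window.

    Assumption: a response counts as successful when its status is below 500.
    """
    if not requests or window_seconds <= 0:
        raise ValueError("need at least one request and a positive window")

    total = len(requests)
    successes = sum(1 for r in requests if r.status < 500)
    latencies = sorted(r.latency_ms for r in requests)

    def percentile(p):
        # Nearest-rank percentile: smallest value with at least p% of
        # observations at or below it.
        rank = max(1, math.ceil(p / 100 * total))
        return latencies[rank - 1]

    return {
        "success_rate": successes / total,
        "rps": total / window_seconds,
        "p50_ms": percentile(50),
        "p99_ms": percentile(99),
    }


if __name__ == "__main__":
    sample = [Request(200, 10.0), Request(200, 20.0),
              Request(500, 30.0), Request(200, 40.0)]
    print(golden_metrics(sample, window_seconds=2))
```

A drop in `success_rate` or a jump in `p99_ms` for a single service is the kind of signal that narrows down where an outage started.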

Thomas Rampelberg explains the benefits Linkerd offers, demos the installation of Linkerd on Kubernetes, and debugs a real-world problem. He digs into the functionality you can build on top of the tools Linkerd provides, such as alerting and autoscaling. By the end, he’ll have answered the following questions: What is a service mesh? How does a service mesh work? Why doesn’t my code need to change? Who benefits from these tools? When do I need to add Linkerd to my services?

Code samples and resources will be provided.

Thomas Rampelberg


Thomas Rampelberg is a software engineer at Buoyant. He’s made a career of building infrastructure software that allows developers and operators to focus on what’s important to them. Previously, at Mesosphere, he helped create DC/OS, one of the first container orchestration platforms, which is used by many of the Fortune 500. He has since moved on to the next big problem in the space: providing insight into what’s happening between services, improving the reliability of their communication, and securing it using best practices.