Build Systems that Drive Business
June 11–12, 2018: Training
June 12–14, 2018: Tutorials & Conference
San Jose, CA

Performance debugging: Finding bottlenecks in distributed systems

2:10pm–2:50pm Thursday, June 14, 2018
Distributed Systems
Location: LL21 A/B Level: Intermediate
Secondary topics: Resilient, Performant & Secure Distributed Systems
Average rating: 2.50 (8 ratings)

Prerequisite knowledge

  • An understanding of distributed systems, monitoring, and profiling

What you'll learn

  • Learn how to debug bottlenecks in distributed systems, at both a macro and a micro level


Whether a company is growing rapidly or already serves a large customer base, the performance of its software is crucial. That performance can be affected by many variables, from how the company delivers applications to customers, to how host machines run the software, to everything in between. Performance debugging is an essential part of making code production ready, particularly as a company and its products grow. Removing the bottlenecks that keep existing software from performing optimally lets a business’s systems scale to handle more usage. However, most of the battle in performance debugging is identifying the bottlenecks, not fixing them, which is why skills such as tracing, monitoring, and profiling are invaluable.

Christian Grabowski shares his experience debugging bottlenecks in distributed systems at both a macro level (metrics, distributed tracing) and a micro level (user-space and kernel-space profiling). He focuses on two cases: tuning REST API services to cope with databases that doubled in size within a single day, and taming a resource-hungry, high-throughput metrics ingestion service.

In the macro view, the goal is to identify which service or services are the bottleneck in a distributed system. Which service is preventing higher throughput? Which is adding latency? Which is consuming all of the resources? Fortunately, many tools, such as operational metrics and distributed tracing, can pinpoint the answers to these questions. The micro view, by contrast, examines where bottlenecks exist inside a single service: in particular blocks of code, in the balance of resources allocated to it, or in the configuration of the service or its host machine. Newer technologies, such as dynamic tracing with eBPF, are emerging to help identify these issues.
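As a rough illustration of the macro view, the sketch below asks "which service is adding latency?" of a single distributed trace. It assumes the trace has already been exported as a flat list of spans. The `Span` fields and the `self_time_by_service` helper are hypothetical and do not match the schema of any particular tracer. For each span, it computes "self time": the span's duration minus the time covered by its child spans, counting overlapping children only once. It then totals that self time per service, so the service doing the most work of its own rises to the top.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical span shape: real tracers use their own field names.
    span_id: str
    parent_id: Optional[str]
    service: str
    start_ms: float
    end_ms: float


def merged_length(intervals):
    """Total length covered by (start, end) intervals, counting overlaps once."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint interval: close out the previous run and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping interval: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def self_time_by_service(spans):
    """Sum each span's self time per service, sorted from largest to smallest."""
    children = {}
    for s in spans:
        if s.parent_id is not None:
            children.setdefault(s.parent_id, []).append(s)

    totals = {}
    for s in spans:
        # Clip children to the parent's window so clock skew cannot make self time negative.
        clipped = [
            (max(c.start_ms, s.start_ms), min(c.end_ms, s.end_ms))
            for c in children.get(s.span_id, [])
            if c.end_ms > s.start_ms and c.start_ms < s.end_ms
        ]
        self_ms = (s.end_ms - s.start_ms) - merged_length(clipped)
        totals[s.service] = totals.get(s.service, 0.0) + self_ms
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical trace: a gateway calls a service, which queries a database and a cache concurrently.
trace = [
    Span("a", None, "api-gateway", 0, 100),
    Span("b", "a", "user-service", 5, 95),
    Span("c", "b", "postgres", 10, 80),
    Span("d", "b", "redis", 12, 20),
]
print(self_time_by_service(trace))
# {'postgres': 70.0, 'user-service': 20.0, 'api-gateway': 10.0, 'redis': 8.0}
```

Self time is only a first cut. It points at the service to investigate next, where the micro-level tools (profilers, dynamic tracing) take over. It says nothing about contention or queuing inside that service.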

Join Christian to learn how to find and resolve these bottlenecks so that your software scales and performs substantially better.


Christian Grabowski


Christian Grabowski is a backend software engineer at NS1, a next-generation DNS and traffic management company. Christian has worn many engineering hats over the course of his career and has worked on a wide variety of software, but he most enjoys digging into nitty-gritty, low-level code. When he’s not developing fast, intelligent DNS services, he’s active in the open source community, contributing to projects such as gobpf, BCC, and Kubernetes.