San Jose • New York • London

Build Systems that Drive Business

June 11–12, 2018: Training
June 12–14, 2018: Tutorials & Conference

San Jose, CA

Performance debugging: Finding bottlenecks in distributed systems

Christian Grabowski (NS1)

2:10pm–2:50pm Thursday, June 14, 2018

Distributed Systems
Location: LL21 A/B

Level: Intermediate

Secondary topics: Resilient, Performant & Secure Distributed Systems

Prerequisite knowledge

An understanding of distributed systems, monitoring, and profiling

What you'll learn

Learn how to debug bottlenecks in distributed systems, at both a macro and a micro level

Description

Whether a company is growing rapidly or already has a large customer base, the performance of its software is crucial and can be affected by a range of variables, from how the company delivers applications to customers to how host machines run the software, and everything in between. Performance debugging is a key part of ensuring code is production ready, particularly as a company and its products grow. Removing the bottlenecks that keep existing software from performing optimally lets a business's system scale and handle more usage. However, most of the battle in debugging is identifying the bottlenecks rather than fixing them. Tracing, monitoring, and profiling are invaluable skills for finding them.

Christian Grabowski shares his experience debugging bottlenecks in distributed systems at both a macro level (metrics, distributed tracing) and a micro level (user-space and kernel-space profiling). He focuses on two cases: tuning REST API services to handle databases that had doubled in size within a day, and taming a resource-hungry, high-throughput metrics ingestion service.

In the macro view, the goal is to identify the bottleneck(s) of a distributed system. Which service is preventing higher throughput? Which service is adding latency? Which service is using all of the resources? Thankfully, many tools are available to answer these questions, such as operational metrics and distributed tracing. The micro view, on the other hand, examines where bottlenecks exist within a single service. These can lie in blocks of code, in the balance of resources, or in the configuration of the service or machine. Newer technology, such as dynamic tracing with eBPF, is emerging to help identify these issues.
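As a rough illustration of the macro view, the sketch below takes the spans of one distributed trace and attributes latency to services by "self time" (a span's duration minus the time spent in its direct children). It is a minimal sketch, not the speaker's method or any tracing library's API: the `Span` fields, the service names, and the timings are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One unit of work in a trace (hypothetical, simplified shape)."""
    span_id: str
    parent_id: Optional[str]
    service: str
    start_ms: float
    end_ms: float


def self_time_by_service(spans):
    """Sum each service's self time: span duration minus direct children."""
    # Total duration of the direct children of each span.
    child_time = defaultdict(float)
    for s in spans:
        if s.parent_id is not None:
            child_time[s.parent_id] += s.end_ms - s.start_ms

    totals = defaultdict(float)
    for s in spans:
        duration = s.end_ms - s.start_ms
        # Clamp at zero: concurrent children can sum to more than the parent.
        totals[s.service] += max(duration - child_time[s.span_id], 0.0)
    return dict(totals)


def slowest_service(spans):
    """Return the service with the largest total self time."""
    totals = self_time_by_service(spans)
    return max(totals, key=totals.get)


# Hypothetical trace: a gateway calls auth, then an API that queries a database.
TRACE = [
    Span("a", None, "api-gateway", 0, 120),
    Span("b", "a", "auth", 5, 20),
    Span("c", "a", "records-api", 25, 115),
    Span("d", "c", "database", 30, 100),
]

if __name__ == "__main__":
    for service, ms in sorted(self_time_by_service(TRACE).items(),
                              key=lambda kv: kv[1], reverse=True):
        print(f"{service}: {ms:.0f} ms")
```

In this made-up trace the database accounts for most of the request's latency, which marks it as the service to examine next in the micro view with a profiler or dynamic tracing.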

Join Christian to learn how to find and resolve these bottlenecks so that software scales and performs substantially better.

Christian Grabowski

NS1

Christian Grabowski is a backend software engineer at NS1, a next-generation DNS and traffic management company. Christian has worn many engineering hats over the course of his career and has worked on a variety of software, but he most enjoys getting into nitty-gritty, low-level code. When he’s not developing fast, intelligent DNS services, he’s active in the open source community, contributing to projects such as gobpf, BCC, and Kubernetes.
