Build Systems that Drive Business
June 11–12, 2018: Training
June 12–14, 2018: Tutorials & Conference
San Jose, CA

Networks, echolocation, and fish GIFs

Victoria Nguyen (Fastly)
3:40pm–4:20pm Wednesday, June 13, 2018
Distributed Data, Hardware, Storage, and Datacenters
Location: 230 B Level: Beginner
Secondary topics: Systems Monitoring & Orchestration

Prerequisite knowledge

  • Familiarity with basic networking tools (e.g., ping and traceroute) and vocabulary (routing, transit provider, hops/paths, etc.)

What you'll learn

  • Learn how Fastly overhauled the monitoring and data collection of its globally distributed network


Description

Traditional approaches to debugging network issues across a globally distributed system are a pain, and when you’re responsible for an enormous amount of customer traffic, it’s important to tread lightly. If data collection runs too frequently, at best it consumes CPU reserved for serving traffic, and at worst you risk DDoS-ing your own servers. Data collection that is too sparse (or that targets only specific caches) makes determining the quickest path and diagnosing packet loss largely speculative.

Conventional ping and traceroute tools are insufficient at Fastly’s scale, so the company had to build its own. Victoria Nguyen explains how Fastly overhauled the monitoring and data collection of its globally distributed network without its caches noticing. You’ll learn how the company uses hashing to balance data collection evenly across the caches within a site and collects data separately for each transit provider.
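The session does not say which hashing scheme Fastly uses. As one illustration, rendezvous (highest-random-weight) hashing lets every cache in a site independently agree on which single cache owns a given probe job, with no coordination between them. The sketch below assumes a job is a (destination site, transit provider) pair; probeJob, assignProber, and the cache, site, and provider labels are all hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// probeJob is one unit of measurement work: probing a destination site
// over a particular transit provider. Both fields are hypothetical.
type probeJob struct {
	DestSite string
	Provider string
}

// score returns a deterministic pseudo-random weight for a (cache, job)
// pair using 64-bit FNV-1a. Zero bytes separate fields so that
// ("ab", "c") and ("a", "bc") hash differently.
func score(cache string, job probeJob) uint64 {
	h := fnv.New64a()
	h.Write([]byte(cache))
	h.Write([]byte{0})
	h.Write([]byte(job.DestSite))
	h.Write([]byte{0})
	h.Write([]byte(job.Provider))
	return h.Sum64()
}

// assignProber picks the cache with the highest score for the job
// (rendezvous hashing). Every cache evaluating the same list reaches the
// same answer, so each job is owned by exactly one cache. Equal scores are
// broken by the lexicographically smaller name. Returns "" for no caches.
func assignProber(caches []string, job probeJob) string {
	var best string
	var bestScore uint64
	for i, c := range caches {
		s := score(c, job)
		if i == 0 || s > bestScore || (s == bestScore && c < best) {
			best, bestScore = c, s
		}
	}
	return best
}

func main() {
	caches := []string{"cache-01", "cache-02", "cache-03", "cache-04"}
	providers := []string{"provider-a", "provider-b"}
	dests := []string{"site-x", "site-y", "site-z"}

	// Count how many jobs each cache ends up owning.
	counts := map[string]int{}
	for _, d := range dests {
		for _, p := range providers {
			owner := assignProber(caches, probeJob{DestSite: d, Provider: p})
			counts[owner]++
			fmt.Printf("%s via %s -> %s\n", d, p, owner)
		}
	}
	fmt.Println(counts)
}
```

Because a cache's score for a job never changes, removing a cache reassigns only the jobs it owned, and adding one takes over only the jobs it now wins. The split across caches is statistically even rather than exact, which becomes closer to uniform as the number of jobs grows.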

Refining the tools Fastly uses to pinpoint latency, packet loss, and quickest paths has been an iterative process. In the latest iteration, the company built on its own platform, using caching, request routing in VCL, and HTTP to create more flexible monitoring and data collection tools. The system is written in Go, which keeps HTTP requests lightweight and concurrent, and the API is wired to Slack bots so that anyone can ping or traceroute between sites without having to SSH into production servers or coordinate with each other during an incident, which of course is the only thing anyone ever cares about.


Victoria Nguyen

Fastly

Victoria Nguyen is a network systems engineer at Fastly. She loves rock climbing and Halloween.