Build resilient systems at scale
28–30 October 2015 • Amsterdam, The Netherlands

Service instrumentation, monitoring, and alerting with Prometheus (continued)

Björn Rabenstein (SoundCloud), Julius Volz (SoundCloud)
16:00–17:30 Wednesday, 28/10/2015
Tutorial
Location: Auditorium
Average rating: 3.77 (13 ratings)

Prerequisite Knowledge

  • Basic knowledge of monitoring and alerting
  • Experience with distributed services
  • Programming skills in Python or Go

Materials or downloads needed in advance

  • Familiarize yourself with Prometheus. A good starting point is the SoundCloud blog post.
  • Next, look at the Week of Monitoring blog posts by Boxever.
  • Finally, you can find comprehensive documentation at prometheus.io.
  • Bring a Linux or Mac laptop. Optionally, have Linux running in a VM on a Mac or on any other platform. Install the most recent binaries for your platform:
      • The Prometheus server
      • The Node Exporter. If you have Linux running in a VM, install the Node Exporter there (it works much better on Linux).

  • An example application to be instrumented will be offered in Go and Python. If you prefer Python, make sure you have a working Python development environment on your laptop and install the Python client library by running "pip install prometheus_client".
  • If you prefer Go, make sure you have a working Go 1.4.x development environment on your machine and clone https://github.com/prometheus/client_golang into $GOPATH/src/github.com/prometheus/client_golang.
  • Make sure the following works:

      cd $GOPATH/src/github.com/prometheus/client_golang/examples/simple
      go get
      go build

Description

Running a multitude of highly scalable services in large clusters poses a challenge for monitoring. Prometheus is a next-generation monitoring system built to cope with that challenge. Over the last three years, it has been developed as an open-source project at SoundCloud, where it has become the standard monitoring system. Other early adopters and contributors include Boxever and Docker. Since its wider announcement in January 2015, the project has rapidly gained attention, including support from third-party tools such as Google’s cAdvisor and CoreOS’s etcd.

This tutorial will start with an introduction to the fundamental concepts of Prometheus and the various components of its ecosystem:

  1. The core collection server with its time series database
  2. The various client libraries
  3. The various exporters to export metrics from third-party systems into the Prometheus ecosystem
  4. The alerting component Alertmanager
  5. The dashboard builder Promdash
  6. The Pushgateway for metrics of short-lived jobs
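
To make the data flow between the first two components more concrete: instrumented processes expose their metrics over HTTP in a plain-text format, and the collection server scrapes and parses that text. The following is a minimal, stdlib-only Python sketch of the parsing side. It assumes the simple line shape `name{label="value"} value`; `parse_samples` is a hypothetical helper for illustration, not part of Prometheus, and it deliberately ignores label escaping and optional timestamps.

```python
import re

# Assumed, simplified line shape: metric_name{label="value",...} 1.23
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Parse scraped text into (name, labels, value) tuples.

    Hypothetical helper: comment lines (# HELP, # TYPE) and lines that
    do not match the simplified shape are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    scraped = (
        "# HELP up Whether the target is reachable.\n"
        "# TYPE up gauge\n"
        'up{job="node"} 1\n'
    )
    for sample in parse_samples(scraped):
        print(sample)
```

Each parsed tuple corresponds to one sample of one time series: the metric name plus its label set identifies the series, and the value is what gets stored at scrape time.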

The introduction is followed by a hands-on workshop, where participants will instrument the code of a toy service and create a dashboard and alerts for the service and the hosts it is running on.
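
As a rough idea of what instrumenting a toy service involves, here is a minimal Python sketch that counts requests and exposes the counts on a `/metrics` endpoint. It is not the workshop's example application and does not use the `prometheus_client` library; it relies only on the standard library so the mechanics stay visible. The metric name `toy_requests_total`, the `path` label, and the helper names are assumptions made for illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Request counts keyed by path. A real client library would manage
# this state; a plain dict keeps the sketch small.
REQUESTS = {}
LOCK = threading.Lock()


def render_metrics():
    """Render the counts in a Prometheus-style text format."""
    lines = [
        "# HELP toy_requests_total Requests handled by the toy service.",
        "# TYPE toy_requests_total counter",
    ]
    with LOCK:
        for path, count in sorted(REQUESTS.items()):
            lines.append('toy_requests_total{{path="{}"}} {}'.format(path, count))
    return "\n".join(lines) + "\n"


class ToyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            # Scrapes are served but not counted as service requests.
            body = render_metrics().encode("utf-8")
        else:
            with LOCK:
                REQUESTS[self.path] = REQUESTS.get(self.path, 0) + 1
            body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def start_server(port=0):
    """Start the toy service in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), ToyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server(8000)
    print("Serving on http://127.0.0.1:8000 (metrics at /metrics)")
    threading.Event().wait()  # block until interrupted
```

Labelling by raw request path is convenient in a toy but creates one time series per distinct path, so a real service would normally label by a bounded set of handler names instead.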


Björn Rabenstein

SoundCloud

Björn Rabenstein is a production engineer at SoundCloud and one of the main Prometheus developers. Previously, Björn was a site reliability engineer at Google and a number cruncher for science.


Julius Volz

SoundCloud

Julius Volz is a production engineer at SoundCloud and co-founder of the Prometheus project. In the past, he worked as a site reliability engineer in Google’s production offline storage team to back up the internet and more.