Build resilient systems at scale
October 12–14, 2015 • New York, NY

Service instrumentation, monitoring, and alerting with Prometheus

Björn Rabenstein (Grafana Labs), Julius Volz (SoundCloud)
1:30pm–5:00pm Monday, 10/12/2015
Location: Beekman Parlor

Prerequisite Knowledge

  • Basic knowledge of monitoring and alerting
  • Experience with distributed services
  • Programming skills in Python or Go

Materials or downloads needed in advance

  • Familiarize yourself with Prometheus. A good starter is the SoundCloud blog post.
  • Next, you should look at the Week of Monitoring blog posts by Boxever.
  • Finally, you can find comprehensive documentation at
  • Bring a Linux or Mac laptop. Optionally, have Linux running in a VM on a Mac or on any other platform. Install the most recent binaries for your platform:
      • The Prometheus server
      • The Node Exporter. If you have Linux running in a VM, install the Node Exporter there (it works much better on Linux).

  • An example application to be instrumented will be offered in Go and Python. If you prefer Python, make sure you have a working Python development environment on your laptop and install the Python client library by running "pip install prometheus_client"
  • If you prefer Go, make sure you have a working Go 1.4.x development environment on your machine and clone into $GOPATH/src/
  • Make sure the following works:

    cd $GOPATH/src/
    go get
    go build


Running a multitude of highly scalable services in large clusters poses a challenge for monitoring. Prometheus is a next-generation monitoring system built to cope with that challenge. Over the last three years, it has been developed as an open-source project at SoundCloud, where it has become the standard monitoring system. Other early adopters and contributors are Boxever and Docker. Since its wider announcement in January 2015, the project has rapidly gained attention, including support in third-party tools such as Google’s cAdvisor and CoreOS’s etcd.

This tutorial will start with an introduction to the fundamental concepts of Prometheus and the various components of its ecosystem:

  • The core collection server with its time series database
  • The various client libraries
  • The various exporters to export metrics from third-party systems into the Prometheus ecosystem
  • The alerting component Alertmanager
  • The dashboard builder Promdash
  • The Pushgateway for metrics of short-lived jobs
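
To make the split between client libraries and the collection server more concrete, here is a minimal sketch of what a client library does at its core: it keeps metric values in memory and renders them as text for the server to collect. This is a hand-rolled illustration using only the Python standard library, not the actual Prometheus client; the class ToyCounter and the metric name toy_http_requests_total are hypothetical.

```python
import threading


class ToyCounter:
    """Hypothetical stand-in for a client-library counter with one label."""

    def __init__(self, name, help_text, label_name):
        self.name = name
        self.help_text = help_text
        self.label_name = label_name
        self._values = {}               # label value -> current count
        self._lock = threading.Lock()   # request handlers may run concurrently

    def inc(self, label_value, amount=1.0):
        """Increase the counter for one label value; counters never go down."""
        if amount < 0:
            raise ValueError("counters can only be incremented")
        with self._lock:
            self._values[label_value] = self._values.get(label_value, 0.0) + amount

    def render(self):
        """Render all samples in the Prometheus text exposition format."""
        lines = [
            "# HELP {} {}".format(self.name, self.help_text),
            "# TYPE {} counter".format(self.name),
        ]
        with self._lock:
            for label_value, value in sorted(self._values.items()):
                lines.append('{}{{{}="{}"}} {}'.format(
                    self.name, self.label_name, label_value, value))
        return "\n".join(lines) + "\n"


# Simulate a service that handled three requests.
http_requests = ToyCounter(
    "toy_http_requests_total", "Total HTTP requests handled.", "code")
http_requests.inc("200")
http_requests.inc("200")
http_requests.inc("500")

exposition = http_requests.render()
print(exposition)
```

In a real setup, the service would serve this text over HTTP and the Prometheus server would fetch it at regular intervals and store the samples in its time series database.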

The introduction is followed by a hands-on workshop where participants will instrument the code of a toy service, apply best practices of Prometheus monitoring, and create dashboards and alerts for the service and the hosts it is running on.
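
As a rough idea of what instrumenting a toy service can look like in Python, the following sketch uses the prometheus_client library named in the prerequisites. It is not the workshop's example application: the function handle_request and the metric names are made up for illustration, and a private registry is used so the snippet stays self-contained.

```python
import time

from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# A private registry keeps this sketch independent of any global state.
registry = CollectorRegistry()

# Hypothetical metrics for a toy service.
REQUESTS = Counter(
    "toy_requests_total",
    "Requests handled by the toy service.",
    ["outcome"],
    registry=registry,
)
LATENCY = Histogram(
    "toy_request_duration_seconds",
    "Time spent handling a request.",
    registry=registry,
)


@LATENCY.time()  # observe the duration of every call
def handle_request(fail=False):
    """Pretend to do some work, then count the request by outcome."""
    time.sleep(0.001)
    outcome = "error" if fail else "ok"
    REQUESTS.labels(outcome).inc()
    return outcome


for _ in range(3):
    handle_request()
handle_request(fail=True)

# Render the current state in the text format that a Prometheus server scrapes.
metrics_text = generate_latest(registry).decode("utf-8")
print(metrics_text)
```

A long-running service would typically not print the metrics but expose them over HTTP, for example with the library's start_http_server function on a port of your choice, so that a Prometheus server can scrape them.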


Björn Rabenstein

Grafana Labs

Björn Rabenstein is an engineer at Grafana Labs and a Prometheus developer. Previously, he was a production engineer at SoundCloud, a site reliability engineer at Google, and a number cruncher for science.


Julius Volz


Julius Volz is a production engineer at SoundCloud and co-founder of the Prometheus project. In the past, he worked as a site reliability engineer in Google’s production offline storage team to back up the internet and more.

Comments on this page are now closed.


Björn Rabenstein
10/14/2015 6:53am EDT

Thanks everybody for attending. Here are the slides and the instructions.

Björn Rabenstein
10/06/2015 12:13pm EDT

Hi Nelson,

No worries at all. It will be super easy :) as long as you know either Python or Go, have performed the downloads, and have at least looked a bit at the introductions and the documentation.

And yes, we have given the tutorial before. Nobody felt stupid after it…

10/06/2015 9:43am EDT

Hi Guys!

That sounds like a great topic.

Have you ever given this tutorial before? With all the materials and downloads needed, it seems really challenging.

Björn Rabenstein
08/31/2015 7:20am EDT

Prometheus is a fast-moving project. It probably makes sense to perform (or update) the installs mentioned above just a week or so before the conference starts.
