Making Open Work
May 8–9, 2017: Training & Tutorials
May 10–11, 2017: Conference
Austin, TX

Prometheus: The next-generation monitoring system

11:50am–12:30pm Wednesday, May 10, 2017
Location: Ballroom E
Level: Intermediate
Average rating: 3.62 (8 ratings)

Who is this presentation for?

  • Software developers and operations engineers

Prerequisite knowledge

  • Intermediate development and operations experience

What you'll learn

  • Explore the Prometheus monitoring system and understand its design and features


Prometheus is an open source systems monitoring and alerting toolkit inspired by Google’s internal monitoring tool Borgmon. It is well suited to monitoring containerized and dynamically orchestrated microservices, and it is poised to become the de facto standard for monitoring the next generation of cloud-native applications.

Prometheus takes a different approach from the one popularized by traditional monitoring systems. Cindy Sridharan explores the architecture and philosophy of Prometheus and explains how powerful features like the query language, flexible data model, and relabeling can be leveraged to gain valuable insights about application performance. You’ll learn why Prometheus is a perfect fit for modern, cloud-native applications: think applications and batch workloads running in a containerized, dynamically orchestrated, “microservices architecture” environment where failure is the norm. Along the way, Cindy explains how easy it is to integrate Prometheus clients into services, which enables building and scaling seamlessly; how alerting and notifications driven by time series data and based on percentiles greatly simplify understanding and reasoning about distributed service availability; how the Pushgateway enables applications to “push” metrics for Prometheus to scrape; and how the Alertmanager deduplicates, groups, and routes Prometheus alerts to services like Slack and PagerDuty.
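
To make the label-based data model and the pull approach concrete, here is a minimal sketch in plain Python. It is not the official client library: the `Counter` class, its methods, and the metric name are hypothetical stand-ins, and only the rendered text follows the Prometheus text exposition format. A real service would serve this text over HTTP for the Prometheus server to scrape.

```python
from collections import defaultdict


class Counter:
    """Toy labeled counter; a hypothetical stand-in for a real client library."""

    def __init__(self, name, help_text, label_names):
        self.name = name
        self.help_text = help_text
        self.label_names = tuple(label_names)
        self._values = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        # Each distinct combination of label values is its own time series.
        key = tuple(labels[n] for n in self.label_names)
        self._values[key] += amount

    def expose(self):
        # Render the current state as text that a scraper could read.
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            pairs = ",".join(f'{n}="{v}"' for n, v in zip(self.label_names, key))
            lines.append(f"{self.name}{{{pairs}}} {value:g}")
        return "\n".join(lines) + "\n"


# Hypothetical metric and label names, chosen only for illustration.
http_requests = Counter("http_requests_total", "Total HTTP requests.", ["method", "code"])
http_requests.inc(method="get", code="200")
http_requests.inc(method="get", code="200")
http_requests.inc(method="post", code="500")
print(http_requests.expose())
```

The counter keeps its state in the process and only reports it when asked, which is the sense in which the client side is stateful and the server pulls.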

Cindy also outlines what Prometheus does not offer (anomaly detection, request tracing, horizontal scalability out of the box, and long-term storage, for example) and covers some of the other open source tools in the ecosystem that address these gaps, looking at how request tracing across different services can be implemented with the OpenTracing spec and a backend like Zipkin and how tools like DigitalOcean’s Vulcan augment Prometheus with long-term storage. Cindy then concludes with a brief example of monitoring a Dockerized application with Prometheus (contingent on how this issue pans out) or monitoring a bare-bones Kubernetes cluster with Prometheus.

Topics include:

  • Why (and how) the pull model works and how it scales
  • The power and flexibility of the dimensional data model and the benefits that labels and the PromQL query language bring
  • How time series data can be turned into meaningful and actionable alerts
  • How features like built-in service discovery help not just with basic metrics scraping but also act as a pivotal integration point with other systems like Consul
  • How more advanced features like relabeling work and what benefits they bring
  • The design of stateful client libraries
  • Why metric naming conventions and non-distributed storage were design goals
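
As a rough illustration of the relabeling and service discovery topics above, the following Python sketch applies simplified relabeling rules to the labels of a discovered target. It is an approximation, not Prometheus code: the `relabel` function and the rule dictionaries are hypothetical, only the `keep`, `drop`, and `replace` actions are modeled, and the sample label values are made up. The `__meta_consul_*` names mirror the labels that Consul service discovery attaches to targets.

```python
import re


def relabel(labels, rules):
    """Apply simplified relabeling rules to one target's labels.

    Returns the rewritten label dict, or None if the target is dropped.
    """
    labels = dict(labels)
    for rule in rules:
        # Source label values are joined with ";" before matching.
        value = ";".join(labels.get(name, "") for name in rule["source_labels"])
        match = re.fullmatch(rule["regex"], value)  # the regex is fully anchored
        action = rule.get("action", "replace")
        if action == "keep":
            if match is None:
                return None
        elif action == "drop":
            if match is not None:
                return None
        elif action == "replace":
            if match is not None:
                # Python writes group references as \1 where Prometheus uses $1.
                labels[rule["target_label"]] = match.expand(rule["replacement"])
        else:
            raise ValueError(f"unsupported action: {action}")
    return labels


# Hypothetical rules: keep only targets tagged "prod", then copy the
# discovered service name into the "job" label.
rules = [
    {"source_labels": ["__meta_consul_tags"], "regex": r".*,prod,.*", "action": "keep"},
    {"source_labels": ["__meta_consul_service"], "regex": r"(.+)",
     "target_label": "job", "replacement": r"\1"},
]

discovered = {"__meta_consul_service": "checkout", "__meta_consul_tags": ",prod,api,"}
print(relabel(discovered, rules))
print(relabel({"__meta_consul_service": "scratch", "__meta_consul_tags": ",dev,"}, rules))
```

The first target is kept and gains `job="checkout"`; the second lacks the `prod` tag, so the `keep` rule drops it and the function returns `None`.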

Cindy Sridharan


Cindy Sridharan is a Distributed Systems Engineer. She likes thinking about building resilient and maintainable systems. She maintains a blog where she shares her ideas and experience about several of these topics. She is the author of a report on Distributed Systems Observability with O’Reilly.