Making Open Work
May 8–9, 2017: Training & Tutorials
May 10–11, 2017: Conference
Austin, TX

Prometheus: The next-generation monitoring system

Cindy Sridharan (imgix)
11:50am–12:30pm Wednesday, May 10, 2017
Infrastructure
Location: Ballroom E
Level: Intermediate
Average rating: 3.62 (8 ratings)

Who is this presentation for?

  • Software developers and operations engineers

Prerequisite knowledge

  • Intermediate development and operations experience

What you'll learn

  • Explore the Prometheus monitoring system and understand its design and features

Description

Prometheus is an open source systems monitoring and alerting toolkit inspired by Borgmon, Google’s internal monitoring tool. It is well suited to monitoring containerized and dynamically orchestrated microservices. Prometheus is poised to become the de facto standard for monitoring the next generation of cloud-native applications.

Prometheus takes a different approach from the one popularized by traditional monitoring systems. Cindy Sridharan explores the architecture and philosophy of Prometheus and explains how powerful features like the query language, flexible data model, and relabeling can be leveraged to gain valuable insights into application performance. You’ll learn why Prometheus is a perfect fit for modern, cloud-native applications: think applications and batch workloads running in a containerized, dynamically orchestrated, “microservices architecture” environment where failure is the norm. Along the way, Cindy explains how easy it is to integrate Prometheus clients into services, which enables building and scaling seamlessly; how alerting and notifications driven by time series data and based on percentiles greatly simplify understanding and reasoning about distributed service availability; how the Pushgateway enables applications to “push” metrics to Prometheus; and how the Alertmanager deduplicates, groups, and routes Prometheus alerts to services like Slack and PagerDuty.
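
As a rough illustration of the deduplicate, group, and route steps mentioned above, here is a toy sketch in Python. It is not the Alertmanager's implementation or its configuration format; the `Alert` type, the helper functions, and the receiver names are all hypothetical and chosen only to show the idea.

```python
from collections import defaultdict
from typing import NamedTuple


class Alert(NamedTuple):
    """Hypothetical alert: a name plus a frozen set of label pairs."""
    name: str
    labels: frozenset


def make_alert(name, **labels):
    return Alert(name, frozenset(labels.items()))


def deduplicate(alerts):
    """Drop alerts with identical name and labels, keeping first-seen order."""
    return list(dict.fromkeys(alerts))


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the labels listed in group_by."""
    groups = defaultdict(list)
    for alert in alerts:
        labels = dict(alert.labels)
        key = tuple(labels.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


def route(alert, routes, default="slack"):
    """Return the first receiver whose matchers all equal the alert's labels."""
    labels = dict(alert.labels)
    for matchers, receiver in routes:
        if all(labels.get(k) == v for k, v in matchers.items()):
            return receiver
    return default


incoming = [
    make_alert("HighLatency", service="api", severity="page"),
    make_alert("HighLatency", service="api", severity="page"),  # exact duplicate
    make_alert("HighErrorRate", service="api", severity="page"),
    make_alert("DiskFilling", service="db", severity="warn"),
]
unique = deduplicate(incoming)
groups = group_alerts(unique, group_by=["service"])
routes = [({"severity": "page"}, "pagerduty")]  # assumed routing table
receivers = {alert.name: route(alert, routes) for alert in unique}
```

In this sketch the two identical `HighLatency` alerts collapse into one, the two `api` alerts land in the same group so they could be sent as a single notification, and only alerts labeled `severity="page"` go to the hypothetical `pagerduty` receiver.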

Cindy also outlines what Prometheus does not offer (for example, anomaly detection, request tracing, horizontal scalability out of the box, and long-term storage) and covers some of the other open source tools in the ecosystem that tackle these issues, looking at how request tracing across different services can be implemented with the OpenTracing spec and a backend like Zipkin and how tools like DigitalOcean’s Vulcan augment Prometheus with long-term storage. Cindy then concludes with a brief example of monitoring a Dockerized application with Prometheus (contingent on how this issue pans out) or monitoring a bare-bones Kubernetes cluster with Prometheus.

Topics include:

  • Why (and how) the pull model works and how it scales
  • The power and flexibility of the dimensional data model and the benefits that labels and the PromQL query language bring
  • How time series data can be turned into meaningful and actionable alerts
  • How features like built-in service discovery not only help with basic metrics scraping but also act as a pivotal integration point with other systems like Consul
  • How more advanced features like relabeling work and what benefits they bring
  • The design of stateful client libraries
  • Why metric naming conventions and non-distributed storage were design goals
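
To make the dimensional data model and the pull model from the topic list more concrete, the following is a minimal Python sketch using only the standard library. It assumes a service keeps one counter value per label combination and renders them in the Prometheus text exposition format for a scraper to pull. The metric name `http_requests_total`, the label names, and the helpers `inc`, `render`, and `select` are illustrative assumptions, not part of any Prometheus client library.

```python
from collections import Counter

# Each time series is identified by a metric name plus a set of label pairs.
METRIC = "http_requests_total"  # assumed example metric name
samples = Counter()


def inc(**labels):
    """Increment the counter for one label combination."""
    samples[tuple(sorted(labels.items()))] += 1


def render():
    """Render all series as text that a scrape could return.

    Label values are not escaped here, so this only handles simple values.
    """
    lines = [f"# TYPE {METRIC} counter"]
    for key, value in sorted(samples.items()):
        label_text = ",".join(f'{name}="{val}"' for name, val in key)
        lines.append(f"{METRIC}{{{label_text}}} {value}")
    return "\n".join(lines) + "\n"


def select(**matchers):
    """Sum every series whose labels equal all matchers."""
    return sum(
        value
        for key, value in samples.items()
        if all(dict(key).get(name) == val for name, val in matchers.items())
    )


inc(method="GET", status="200")
inc(method="GET", status="200")
inc(method="GET", status="500")
inc(method="POST", status="200")

text = render()
get_total = select(method="GET")
```

Here `text` holds one line per label combination, such as `http_requests_total{method="GET",status="200"} 2`, and a server would conventionally fetch it over HTTP from a `/metrics` endpoint. The `select(method="GET")` call is loosely analogous to the PromQL expression `sum(http_requests_total{method="GET"})`: the labels, not the metric name, carry the dimensions being sliced.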

Cindy Sridharan

imgix

Cindy Sridharan is an engineer at imgix, where she works on API development, infrastructure, and other miscellaneous backend engineering tasks. Cindy likes thinking about building resilient and maintainable systems and recently started writing about several of these topics.