Build & maintain complex distributed systems
October 1–2, 2017: Training
October 2–4, 2017: Tutorials & Conference
New York, NY

Monitoring in the time of cloud native

11:35am–12:15pm Wednesday, October 4, 2017

Who is this presentation for?

  • SREs, architects, and infrastructure engineers

Prerequisite knowledge

  • A basic familiarity with monitoring systems (useful but not required)

What you'll learn

  • Explore the state of the art when it comes to monitoring
  • Understand the benefits, trade-offs, and challenges involved in adopting these tools
  • Get a blueprint for iteratively evolving and modernizing your monitoring stacks instead of ripping out everything you currently have and replacing it with shiny new tools


The infrastructure space is in the midst of a paradigm-shifting change. The way organizations—from the smallest of startups to established companies—build and operate systems has evolved. But as the systems we build become more distributed and (in the case of containerization) ephemeral, traditional monitoring tools have proven to be grossly insufficient. Newer tools modeled along the lines of Google and Facebook’s internal tools have emerged to meet this challenge, and given how far both SaaS and open source tools have evolved in recent years, we now have an embarrassment of observability tools to choose from.

These tools offer a great number of benefits over their predecessors, but they also bring their own set of technical and organizational challenges. Starting over from scratch isn’t a luxury most of us enjoy, and the most challenging part about modernizing one’s observability stack is iteratively evolving it. Cindy Sridharan offers an honest overview of monitoring challenges and trade-offs.

Topics include:

  • An overview of the three pillars of modern observability: logging, metrics collection, and request tracing
  • The pros and cons of each in terms of resource utilization, ease of use, ease of operation, and cost effectiveness
  • An honest look at the challenges involved in scaling all three when used in conjunction
  • What to monitor and how in a modern cloud-native environment; what is better-suited to be aggregated as metrics versus being logged; how to use the data from all three sources to derive actionable alerts and insightful analysis
  • The current crop of monitoring systems using three open source systems as a blueprint—Prometheus (for metrics), an OSS system that adheres to the OpenTracing spec like Uber’s Jaeger, and ELK for logging—and where each falls short
  • When it makes sense to augment the three aforementioned tools with additional tools

Cindy Sridharan


Cindy Sridharan is a distributed systems engineer who likes thinking about building resilient and maintainable systems. She maintains a blog where she shares her ideas and experience on these topics, and she is the author of the O’Reilly report Distributed Systems Observability.