In web operations, a monitoring system is often our main interface to our infrastructure. A monitoring system alerts us, interrupts us, wakes us up. When it does, maybe the site is on fire (signal); maybe a new application broke old monitoring assumptions and triggered spurious alerts (noise).
As an industry we are disciplined about deploying monitoring systems to support our applications, because the only thing worse than an outage is an outage you hear about first from your customers. We have also learned to use monitoring to react swiftly to infrastructure and application failures. Where we still fail is in managing our monitoring: keeping the flow of alerts at a high signal-to-noise ratio and not drowning under a deluge of dubious alerts. Our monitoring systems are just too noisy.
Noisy monitoring leads to outages, even more noise, and ultimately frustration and resentment. How do we decrease the noise and turn monitoring from a necessary evil into an operational strength? That is our topic.
This session, aimed at anyone who cares about monitoring, will include:
Alexis Lê-Quôc is the CTO of Datadog, an infrastructure monitoring service. He’s spoken at various tech conferences (including Velocity) on operations, architecture and customer support.