Monitoring at Scale

Moderated by: Jeremy Brinkley
Location: D138

As we have built and scaled Nagios (and, to some extent, other monitoring systems), we have run into challenges, such as distributed operation, that we have had to overcome. We’d like to discuss the solutions that users of monitoring systems have arrived at as they have grown to larger numbers of nodes, locations, and things to monitor. We can share some of the solutions we’ve developed for distributed monitoring and data-driven monitoring configurations, and would like to hear how others manage their environments in the face of similar challenges.

Additional topics of discussion:

  • Deployment methodology (monitoring-driven deployment)
  • Managing metrics data and performance data reporting
  • Managing configuration data with respect to monitoring
  • Usage profiles for monitoring systems outside of server and network operations