Build resilient systems at scale
October 12–14, 2015 • New York, NY

Monitoring and alerting fundamentals

Sean Allen (Wallaroo Labs)
1:30pm–3:00pm Monday, 10/12/2015
Tutorial
Location: Regent Parlor
Average rating: 2.93 (14 ratings)

Prerequisite Knowledge

Attendees should have a basic understanding of running software in a production environment and the general problems that can occur.

Description

While often discussed together, monitoring and alerting serve two different needs. Monitoring gives you the information to diagnose unknown problems and study historical trends. Alerting tells you what is going wrong right now and needs fixing. When setting up monitoring and alerting systems, many people start from the available tools, pick one, and start monitoring. A better starting point is the goal: ultimately, monitoring and alerting should provide business value, just like new features.

There are two general approaches to monitoring. One is to collect every metric possible and filter them later. The other is to collect only the things you can map to a specific value. Whichever approach you take, you’ll need to choose what to monitor, which metrics result in alerts, and the thresholds at which those alerts occur.

Examples of questions that alerts could answer are:

  • Is my website up and accessible?
  • Does all the important functionality work?
  • Is each server up?
  • Are all the applications we deployed up?
  • What’s my CPU usage per machine? Disk? Memory? Swap?
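
As a rough illustration of how a metric and a threshold turn into an alert, here is a minimal Python sketch. It is not taken from the tutorial: the `Threshold` class, the `evaluate` function, the metric names, and the limits are all hypothetical, and it assumes each sample is a plain dictionary of the latest readings from one machine.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical alert rule: fire when `metric` rises above `limit`."""
    metric: str
    limit: float


def evaluate(sample, thresholds):
    """Return one alert message per rule that the sample violates.

    `sample` maps metric names to their latest readings. A metric with
    no reading is reported too, since missing data often means the
    collector or the machine itself is down.
    """
    alerts = []
    for rule in thresholds:
        value = sample.get(rule.metric)
        if value is None:
            alerts.append(f"{rule.metric}: no data")
        elif value > rule.limit:
            alerts.append(f"{rule.metric}: {value} exceeds {rule.limit}")
    return alerts
```

Calling `evaluate({"cpu_percent": 97.5}, [Threshold("cpu_percent", 90.0)])` returns a single alert for CPU. Everything collected but not listed in a rule stays available for later diagnosis without paging anyone, which is the monitoring-versus-alerting split described above.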

In the Monitoring and Alerting Fundamentals tutorial, we’ll address how to choose what to monitor to meet your end goals and the difference between alerting and monitoring, then go deeper into monitoring, with discussions on:

  • Application level metrics
  • Anomaly detection
  • Log aggregation and correlation
  • and MUCH, MUCH more…

These topics are best addressed using real-world examples, so audience participation is strongly encouraged. Bring examples of things you would like to know about the health of your existing system and aspects of your existing monitoring that aren’t meeting your needs, and we can use them as the basis for discussion.

Sean Allen

Wallaroo Labs

Sean T. Allen is vice president of engineering at Wallaroo Labs and a member of the Pony core team. His turn-ons include programming languages, distributed computing, Hiwatt amplifiers, and Fender Telecasters. His turn-offs include mayonnaise, stirring yogurt, and sloppy code. He is one of the authors of Storm Applied.
