The Dark Art of Building a Production Incident System

Operations, Grand Ballroom West

Incident management and alerting are the spine of operations monitoring. Getting them to work properly remains a great challenge, even today. Besides in-depth knowledge of how IT infrastructure and applications work, you also need a fair amount of mathematical skill if you don’t want to define tens of thousands of alerting thresholds manually.

Our challenge was even tougher. We needed to develop a reliable incident system that works consistently across more than 1,000 applications without requiring any manual configuration. This talk covers how we did it.

Even if you are not a statistics maniac, you will get in-depth insight into how to build a better incident system. We will cover a wide variety of topics, including less common questions like:

  • Which metrics should you track, and which shouldn’t you?
  • Why should you differentiate between violations and incidents?
  • What makes a metric suitable for baselining and violation detection?
  • What are the top reasons for too many or too few incidents?
  • Why might your incident system trigger too quickly or too slowly?
  • What is percentile drift detection, and how does it help improve your alerts?
  • How does the structure of your infrastructure and applications help reduce the number of incidents you get?

Whether you are responsible for setting up incident management or are just a consumer of it, this talk is for you. It will help you develop more trust in your incidents and give you the skills to improve them.

Alois Reitbauer

Dynatrace

Alois Reitbauer works as a technology strategist and evangelist for Compuware’s APM Division. He specializes in architecture- and performance-related topics in the Java and AJAX space. As part of the product management team, he drives the future of the dynaTrace product line and works closely with technology companies on implementing performance management solutions. He is a frequent speaker at technology conferences on performance- and architecture-related topics and regularly publishes articles and blog posts on blog.dynatrace.com.
