New Feature: Alarm Auto-resolution
We’d like to announce a new PagerDuty feature: auto-resolution of alarms. Auto-resolution is a per-alarm setting; if enabled, the alarm automatically resolves itself after a specified amount of time.
Alarm auto-resolution is an important safety mechanism in case an alarm is forgotten in the Triggered state. To see why, it helps to understand how PagerDuty alarms work.
Alarms in PagerDuty are stateful. Each alarm starts out in the Idle state. Upon receiving a trigger email, the alarm transitions to the Triggered state and begins to alert your team based on the rules specified by the alarm’s alarm group. However, if an alarm that is already Triggered receives additional trigger emails, it logs them but *does not restart the alerting process*. This can be dangerous, as I’ll explain below.
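To make the state behavior concrete, here is a minimal sketch of the rules described above. It is a toy model, not PagerDuty’s implementation: the `Alarm` class, its method names, and the `alerts_started` counter are all hypothetical, and it assumes that resolving an alarm returns it to the Idle state.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    TRIGGERED = "triggered"


class Alarm:
    """Toy model of a stateful alarm (hypothetical, not PagerDuty code)."""

    def __init__(self):
        self.state = State.IDLE
        self.log = []            # every trigger email is recorded here
        self.alerts_started = 0  # how many times the alerting process began

    def receive_trigger(self, email):
        self.log.append(email)
        if self.state is State.IDLE:
            # First trigger: move to Triggered and start alerting.
            self.state = State.TRIGGERED
            self.alerts_started += 1
        # Already Triggered: the email is logged, but alerting is NOT restarted.

    def resolve(self):
        # Assumption: resolving puts the alarm back in Idle.
        self.state = State.IDLE


alarm = Alarm()
alarm.receive_trigger("disk full")
alarm.receive_trigger("database down")  # logged only, nobody is re-alerted
print(alarm.state.name, len(alarm.log), alarm.alerts_started)  # TRIGGERED 2 1

alarm.resolve()
alarm.receive_trigger("disk full again")  # alerting starts again
print(alarm.state.name, len(alarm.log), alarm.alerts_started)  # TRIGGERED 3 2
```

In this sketch the second trigger email never reaches anyone. Only after `resolve()` does a new trigger start the alerting process again.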
In the normal case, an alarm is triggered and notifies the person on call. That person receives the phone/SMS/email alert, fixes the problem and resolves the alarm. In some cases, the person on call does not receive the alert (this can happen if their phone’s battery dies, they have no reception, or they leave the phone in another room and go to sleep). In these cases, the alarm is automatically escalated to a secondary person, who then picks up the alert and resolves the alarm. It’s also possible (and this has happened a few times to some of our customers) that an alarm triggers and contacts all of the people in the escalation chain, but nobody picks it up.
When an alarm runs out of people to notify, it stays in the Triggered state until someone resolves it. This is a dangerous state for an alarm to be in, because, as I mentioned above, any trigger emails to the alarm will not restart the alerting process. The alarm must be explicitly resolved to re-enable alerting.
This is where auto-resolution comes in. We strongly recommend you turn it on for all of your alarms. Here’s how to enable auto-resolution for an alarm:
- Click on the Alarms tab, and click one of your alarms.
- Near the top of the page, you’ll see “Auto resolve”. Click “change”.
- Set the amount of time after which the alarm is auto-resolved. Base this on how long the alarm would take to run out of people to notify (as specified by the rules set in your alarm group).
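As a rough illustration of choosing that time, the sketch below adds up how long an alarm spends on each person in the escalation chain. The delays, the function names, and the extra safety margin are assumptions made for the example; they do not come from PagerDuty’s settings or API.

```python
from datetime import timedelta


def time_to_exhaust(escalation_delays):
    """Total time before the alarm runs out of people to notify."""
    return sum(escalation_delays, timedelta())


def suggested_auto_resolve(escalation_delays, margin=timedelta(minutes=5)):
    """Exhaustion time plus a small margin (the margin is an assumption)."""
    return time_to_exhaust(escalation_delays) + margin


# Hypothetical alarm group: primary for 15 min, secondary for 15 min,
# then a manager for 30 min before the chain is exhausted.
delays = [timedelta(minutes=15), timedelta(minutes=15), timedelta(minutes=30)]
print(suggested_auto_resolve(delays))  # 1:05:00
```

With a value like this, the alarm resolves shortly after the last person in the chain has been notified, so the next trigger email starts alerting again.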