When a customer outage occurs, its impact is felt across the organization. While the technical response is underway, stakeholders from public relations, customer support, legal,…
Monitoring is pivotal to staying proactive in your ITOps architecture. In recent years, we have seen an explosion in both the number of and…
Having one person on-call isn’t enough. What happens if your on-call engineer sleeps through an alert? What happens if their phone’s battery dies without them knowing, or if an alert arrives at a really inconvenient time, like when they are stuck on a bus or in traffic? It will happen. Here we present best practices for backup: one or more people, waiting in the wings, ready to spring into action if your primary on-call is unable to respond.
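As a rough illustration of the backup idea, here is a minimal sketch of an escalation chain: the primary is paged first, and if they do not acknowledge, the alert moves to the next person. The names `Responder`, `escalate`, `minutes_until_paged`, and the timeout values are all hypothetical, chosen only for this example; they are not any product's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Responder:
    """One person in the on-call chain (hypothetical structure)."""
    name: str
    ack_timeout_minutes: int  # how long to wait for an ack before escalating


def escalate(
    chain: Sequence[Responder],
    acknowledged: Callable[[Responder], bool],
) -> Optional[Responder]:
    """Page each responder in order and return the first who acknowledges.

    `acknowledged` stands in for "page this person and wait up to their
    timeout"; it returns True if they responded. Returns None if nobody did.
    """
    for responder in chain:
        if acknowledged(responder):
            return responder
    return None


def minutes_until_paged(chain: Sequence[Responder], index: int) -> int:
    """Worst-case delay before the responder at `index` is paged."""
    return sum(r.ack_timeout_minutes for r in chain[:index])


# Example chain: a primary plus two backups (assumed values).
chain = [
    Responder("primary", 15),
    Responder("secondary", 15),
    Responder("manager", 30),
]

# Simulate the primary sleeping through the alert.
handler = escalate(chain, lambda r: r.name != "primary")
```

In this sketch the secondary ends up handling the alert, and `minutes_until_paged(chain, 2)` shows how long an incident could sit unacknowledged before it reaches the third person, which is one way to sanity-check whether the timeouts are short enough.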
Since we launched on-call handoff notifications, lots of our customers have used them to be notified about their on-call responsibilities to make sure they never…
Whenever we meet someone, the first question we are asked is what we do for a living. We are always on the job, even though…
It’s easy to feel underutilized as an engineer working in a NOC. Especially in a larger organization, you may find yourself siloed into owning highly…
Anything can happen while you’re on-call. You can experience a quiet, incident-free shift or suffer a severe outage that makes your head explode. Since you…
Last week, we gave some suggestions for how you can spend your time when you are on-call. However, here are some things that you absolutely…
In a recent survey we conducted of on-call engineers, 51.5% of people stated that while on-call during non-business hours they like to spend time with…
| In Features
Long gone are the days of emails being primarily used to catch up with friends and forward those annoying chain letters so you aren’t cursed…
The On-Call Scheduling Best Practices Series is back! In the first on-call best practices series, we covered what equipment is needed and how people want…
| In Reliability
As a general rule, whatever percentage you think your test coverage is, it isn’t. Whatever amount of the known surface area you’re covering, there’s going…
| In Features
Tired of getting a flood of PagerDuty incidents whenever a problem occurs with one of your systems? Do many of the incidents seem identical? Do…
This is Part 1 in a multi-part series dealing with tips for being on-call.