Incident Management | Tags | PagerDuty Build It | Ship It | Own It Fri, 25 Aug 2023 17:34:36 +0000

Building Trust with our Customers with PagerDuty for PagerDuty: Crisis Response Management Operations by Jason Flint https://www.pagerduty.com/blog/building-trust-with-our-customers-with-pagerduty-for-pagerduty-crisis-response-management-operations/ Mon, 04 Sep 2023 12:00:16 +0000 https://www.pagerduty.com/?p=83701

The post Building Trust with our Customers with PagerDuty for PagerDuty: Crisis Response Management Operations appeared first on PagerDuty.

A critical partner in your supply chain just went down. An earthquake just hit your main operations hub. Breaking news about your organization just hit social media. Bad news first: there’s always another crisis or existential threat to your organization on the horizon. If you don’t have an established Crisis Response process and team in place, you’re running a high risk of failure. If you do have a process and team, you should be iterating continuously as threats evolve and grow more sophisticated, or you will also risk failure.

The good news is that the PagerDuty Operations Cloud can support your Crisis Response Management Operations for those next-level threats. We know this because it’s what we use to manage the unpredictability of the global operating environment and to keep our employees safe.

In this blog, we cover the evolving nature of global preparedness and the convergence of threats along with how you can enhance your crisis response management operations with PagerDuty.

We’re not in Kansas Anymore

PagerDuty runs an annual Global Preparedness campaign to provide its employees with the tools and resources they need to be resilient in a rapidly changing world. We want our employees globally to establish best-in-class personal and professional resilience so we can do our very best work in support of our customers. We do this by enabling our employees to act in moments of crisis. Throughout September, we deliver emergency response guides, CPR certification classes, emerging threat training sessions and disaster response movie trivia. Two mainstays of our annual programming are our customary volunteering opportunity, run in partnership with our Social Impact team, and our crisis response system tests, which are followed by a functional drill.

Business leaders should always plan for the company’s worst day before it becomes a reality. Because we anticipate that future crises will require more complex responses and tools to resolve, this year’s campaign also highlights perfect storm scenarios that span categories of threats. For example, the intensity of the cyber threats and extreme natural disasters we’ve seen over the last year or so creates an opportunity to examine the effectiveness of our crisis response management operations against combined threats.

One thing is certain when keeping in mind Murphy’s Law: if a crisis can happen, it will happen. However, it may not happen the traditional way you planned and prepared for it. In other words, buckle up because Kansas is no more.

PagerDuty for PagerDuty

PagerDuty’s Crisis Response Management Operations Guide was built to showcase how PagerDuty uses PagerDuty. With regard to preparedness, we leverage our platform for periodic functional exercises and crisis simulations to ensure our teams are ready as we scan the horizon for emerging threats. After these exercises or simulations, we complete our postmortems (also known as after action reports or hotwashes) in the platform, capitalizing on all of the “Incident Action Plan” information that we and the system captured or generated during the simulated response: the timeline of events, responder requests, hand-off times, status updates, and notes.

For operational resilience, when we do need to respond to a critical event, the operations side just works. Using PagerDuty, we don’t have to worry about contact information being out of date, which conference bridge we should be using, or where the latest version of our playbook is stored. These problems lengthen mean time to respond and ultimately slow the resolution of a critical event. PagerDuty supports 700+ native integrations and even more with our API, so we never miss a notification or need to spend time punching in conference call numbers on a dialpad. With the ability to add documentation links into our services, our plans and playbooks also never get lost.

At PagerDuty, we’re empowering teams to build the future and using our own platform to respond to the increasingly complex crises that threaten our people and operations. If you want to learn how PagerDuty can help you modernize your crisis response management operations, read our Crisis Response Management Ops Guide and sign up for a free trial.

Day in the Life Video by Catherine Craglow https://www.pagerduty.com/resources/video/day-in-the-life-video/ Mon, 31 Jul 2023 13:04:17 +0000 https://www.pagerduty.com/?post_type=resource&p=83408 The post Day in the Life Video appeared first on PagerDuty.

PagerDuty Launches New Innovations to Reduce Tool Sprawl and Optimize Operations by Ariel Russo https://www.pagerduty.com/blog/pagerduty-launches-new-innovations-to-reduce-tool-sprawl-and-optimize-operations/ Tue, 23 May 2023 12:00:30 +0000 https://www.pagerduty.com/?p=82483

The post PagerDuty Launches New Innovations to Reduce Tool Sprawl and Optimize Operations appeared first on PagerDuty.

GIF of PagerDuty Operations Cloud highlighting products: Incident Response, Process Automation, AIOps, and Customer Service Ops

The number of tools used by distributed teams to manage incidents has multiplied over the years, leading to a valley of tool sprawl. Throw in manual processes and you’ve got too much toil and multiple points of failure. Maintaining disparate tools and systems isn’t just unwieldy, it’s expensive. 

Our latest capabilities add to the PagerDuty Operations Cloud to make it easier than ever for teams to consolidate their incident management stack. New innovations coming to Incident Workflows, Custom Fields on Incidents, and Status Update Notification Templates will further help organizations shift from a manual, reactive state towards a more proactive, preventative approach to incident response.

When used together in an integrated fashion, these features create a multiplier effect, delivering an unparalleled level of operational efficiency and business acceleration. This interoperability is core to what allows the PagerDuty Operations Cloud to empower organizations to manage incidents from ingest to resolution on a unified platform, without the need for third-party tools and homegrown solutions. Let’s take a closer look at what’s new, or check out the updates for yourself in the product tour.

Custom Fields to enrich incident data

No more chasing information across disparate systems–capture incident context in one centralized place with Custom Fields on Incidents! Custom Fields allow teams to pull in important incident data from any system of record and put it at the fingertips of responders so they have the information needed to resolve incidents faster. Custom Fields on Incidents will be available on web, mobile, and through the API. Sign up for early access.

Menu to select custom field values to add to the incident details page

Enhanced templates for stakeholder communications 

Automate how status updates are created to drive efficiency and consistency, rather than manually crafting update messages from scratch. Response teams now have access to an expanded set of fields in their templates, including “Business Impact,” “Conference Bridge,” and “Slack Channel.” Templates will soon also support Custom Fields (sign up for Early Access). These new fields help response teams add important context about the incident at hand to their communications to stakeholders. They can also create communications from templates as part of an Incident Workflows action.
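As a rough illustration of what templating a status update means, the sketch below fills a message from named incident fields using Python's standard `string.Template`. The field names mirror the ones mentioned above, but the template text, the `render_status_update` helper, and the sample values are hypothetical; this is not PagerDuty's template syntax.

```python
from string import Template

# Hypothetical template text; PagerDuty's own template syntax may differ.
STATUS_TEMPLATE = Template(
    "Incident: $title\n"
    "Business Impact: $business_impact\n"
    "Conference Bridge: $conference_bridge\n"
    "Slack Channel: $slack_channel"
)


def render_status_update(fields: dict) -> str:
    """Fill the template, marking any missing field as 'TBD' instead of failing."""
    defaults = {
        "title": "TBD",
        "business_impact": "TBD",
        "conference_bridge": "TBD",
        "slack_channel": "TBD",
    }
    return STATUS_TEMPLATE.substitute({**defaults, **fields})


if __name__ == "__main__":
    print(render_status_update({
        "title": "Checkout latency elevated",
        "business_impact": "Some customers cannot complete purchases",
        "slack_channel": "#inc-checkout",
    }))
```

Defaulting missing fields to a visible placeholder keeps every update in the same shape, which is the consistency benefit a template is meant to provide.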

Menu for creating a status update from a template within Incident Workflows

Integration between Incident Workflows and ServiceNow and Jira Server

Improve the effectiveness and efficiency of your ITSM tools. PagerDuty customers can now run PagerDuty Incident Workflows from ServiceNow incident records and Jira issue records. This means customers can access powerful workflow automation from the places they already work. This functionality is now available in v7.9 ServiceNow application (Utah certified) and v4 Jira Server. To learn more, check out the KB articles for ServiceNow and Jira Server integrations.

Dashboard of ServiceNow "Run a Workflow" option.

ServiceNow and PagerDuty integration of running workflow.

Expanded Incident Workflow actions 

Reduce operating costs by automating manual steps of the incident response process using Incident Workflows. Today we’re announcing a new set of actions, planned for launch in Q2, that further expands the range of PagerDuty features that can be automated through Incident Workflows. These actions include running Automation Actions, using Status Update Notification Templates to send a status update, creating a Microsoft Teams meeting or channel, adding a note to an incident, reassigning an incident, and changing incident priority.

Incident Workflow builder, now with new workflow actions including Create MS Teams meeting and Run Automation Actions

Example of an Incident Workflow, broken down by steps

Conclusion

Today’s announcement summarizes a few of the ways that PagerDuty is designing our products and features to help our customers mitigate risk to revenue and minimize toil by helping them manage incidents end-to-end. In building our products cohesively as a platform for action, we can enable teams to automate and accelerate critical work–to ultimately transform operations and move business forward faster. The power of the PagerDuty Operations Cloud lies in the synergies provided through the seamless integration across the entire product suite, and these features work together in concert, lowering the barrier to adopting more proactive, preventative processes. 

If you’d like to learn more about the latest release, register for our launch webinar. Our product team will be diving into and demoing these features. 

To see the latest features in action, check out our product tour.

The 4 Types of Incidents as Zombies from ‘The Last of Us’ by Hannah Culver https://www.pagerduty.com/blog/4-incident-types-as-zombies/ Thu, 18 May 2023 12:00:06 +0000 https://www.pagerduty.com/?p=82498

The post The 4 Types of Incidents as Zombies from ‘The Last of Us’ appeared first on PagerDuty.

Seems like everyone has watched or is watching “The Last of Us.” This show is based on a video game of the same name. It features Pedro Pascal (from “The Mandalorian”) and his latest surrogate child, Bella Ramsey (from “Game of Thrones”). But this adventure is challenging for a plethora of reasons. Most notably, zombies. In 2003, a fungus, Cordyceps, brought on a global zombie pandemic. Twenty years later, a few humans are trying to endure and survive in what’s left. Spoiler alert for anyone who hasn’t watched season one yet: it’s hard. And zombies are scary.

While incident response is rarely life or death, it can be an adrenaline spike akin to watching the show. And some of the incidents you may face have similarities to the zombies we’ve seen so far in “The Last of Us.” These incidents have a “headshot” that can help you survive against all odds.

Runners

The first zombies we see in “The Last of Us” are runners. These are fresh and may still look human compared to ones that have been infected longer. While easy to kill, there’s one factor that makes them dangerous: you never expect them. They’re novel. For those who were around for the 2003 end of the world, zombies were only a work of fiction. Nobody prepared for the end of the world (except for Bill, *sob*). For those people hanging on in 2023, runners are still jarring. They’re usually someone you know. A friend. And they change fast, as we saw at the end of episode five.

If we had to compare this one to an incident, it’d be the one that happens out of nowhere and is rare or an anomaly. The system is fine! Then it’s not and you’re thinking, “How did I miss that?” So what do you do about it? Look for the signal in the noise that tells you something is going wrong. For those infected, this could be twitching, coughing or unexpected mood swings.

There are similar warning signs for an incident. Latency a bit high? Could be nothing. But combine it with customer support noting an increase in complaints about slowness? You may have a runner. Monitoring only gets you so far. You need to make sense of the data, both from machines and humans. Correlating that data with changes in your ecosystem can help you attack your runner before it bites you.
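Here is a minimal sketch of that kind of correlation, assuming you already collect a latency metric, a count of slowness complaints, and a list of change times. The `Window` type, both helper functions, and every threshold and sample value are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Window:
    minute: int            # minutes since the start of observation
    p95_latency_ms: float  # machine signal
    complaints: int        # human signal (support tickets about slowness)


def find_suspect_windows(windows, latency_limit_ms, complaint_limit):
    """Return windows where BOTH signals are elevated at the same time."""
    return [
        w for w in windows
        if w.p95_latency_ms > latency_limit_ms and w.complaints > complaint_limit
    ]


def recent_changes(change_minutes, window, lookback=30):
    """Return changes (e.g. deploys) that landed shortly before a window."""
    return [m for m in change_minutes if 0 <= window.minute - m <= lookback]


if __name__ == "__main__":
    windows = [
        Window(0, 180.0, 1),
        Window(10, 420.0, 1),   # latency alone: could be nothing
        Window(20, 450.0, 6),   # latency plus complaints: worth a look
    ]
    deploys = [5, 15]
    for w in find_suspect_windows(windows, latency_limit_ms=300.0, complaint_limit=3):
        print(w.minute, recent_changes(deploys, w))
```

Requiring both signals before flagging a window is a deliberate choice here: it trades some sensitivity for fewer false alarms, which may or may not suit your services.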

Stalkers

This is your garden variety of zombies. They’re not too difficult to kill. They don’t have any special abilities to speak of. And you can almost always expect them. Going down into the basement of an abandoned gas station? Of course you’ll find a stalker. Empty mall? Yep, you should have known, Ellie and Riley! Stalkers aren’t fun by any means, and can be deadly. More often than not, though, the average survivor can take care of a stalker. But, what happens when there are a few stalkers all at once? Or you find 12 stalkers back to back, all in the same day? What if you’re fighting two to three stalkers every day for a year?

You can see where I’m going with this. Stalkers are like death by a thousand cuts. The more you have to tangle with them, the more dangerous they are. Like your most common incidents. They’re not fire drills, they’re annoying. And one isn’t so bad, but one every single day hurts. It takes time away from value-add work to fix something that you’ll need to fix again soon.

Automation isn’t something that Joel and Ellie can do in their world. But in our zombie-free existence, we can apply it to make incident response more efficient. For well-understood issues and incidents that happen frequently, crafting auto-remediation to resolve the problem without human intervention can immediately add time back to your day. And, it’s a great way to drive automation initiatives within the organization. Solving this small but frequent problem has a direct ROI associated with it. Leverage that to further automation initiatives for other types of incidents.
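One way to picture auto-remediation is as a lookup from a well-understood alert type to a vetted fix, with everything else still going to a human. The sketch below assumes exactly that; the registry, the `disk_full` alert type, and the placeholder cleanup step are all hypothetical and stand in for whatever automation tooling you actually use.

```python
from typing import Callable, Dict

# Hypothetical registry: alert type -> remediation step for a well-understood issue.
REMEDIATIONS: Dict[str, Callable[[str], str]] = {}


def remediation(alert_type: str):
    """Decorator that registers a function as the fix for one alert type."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        REMEDIATIONS[alert_type] = func
        return func
    return register


@remediation("disk_full")
def clear_temp_files(host: str) -> str:
    # Placeholder: a real step would run a vetted cleanup job on the host.
    return f"cleared temp files on {host}"


def handle_alert(alert_type: str, host: str) -> str:
    """Auto-remediate known alert types; hand everything else to a human."""
    fix = REMEDIATIONS.get(alert_type)
    if fix is None:
        return f"escalated {alert_type} on {host} to on-call"
    return fix(host)


if __name__ == "__main__":
    print(handle_alert("disk_full", "web-01"))
    print(handle_alert("unknown_error", "web-01"))
```

Starting with a single frequent alert type keeps the scope small and makes the time saved easy to measure.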

Clickers

Clickers are ominous, obsessive hunters that are harder to kill. As they’re blind, they use echolocation to hunt their prey. Headshots don’t work as their heads are armored with tough fungus. They’re one of the most feared and hated types of zombies in “The Last of Us,” and it’s easy to see why. Can you imagine coming up against this thing and realizing your typical solution doesn’t work the way it should? And against a more dangerous enemy?

This one may be the hardest to correlate to an incident, because clickers seem to be almost impossible to kill in the show. Everyone’s advice? Run. Before they hear you. But with incidents, you can’t do that. So, if this zombie were an incident, it would be the one that only two or three people have seen before. You’ve heard about this issue, and it’s from deep in the tech stack. But not enough people who knew about this incident shared with the class. When it happens, it feels like a bigger issue than it is.

Like a knife to the neck of a clicker, there’s a solution to this type of incident. And success comes down to the same thing: knowledge and a plan. If you know that a clicker’s head has armor, you go for the neck. It’s close combat, but effective. And since enough people have survived clickers, the knowledge spread across the surviving population.

For an incident, the best way to fix your clickers is documentation, runbooks, and historical context. Someone knows how to resolve the problem. If they share this knowledge, teams can document the process and create a runbook for the next time this scary (but repairable) problem happens. Additionally, teams can rely on AI to surface past incident data. We can learn a lot from look-alike incidents. This past incident data helps teams understand what worked for an incident and what didn’t. If you don’t have AI to assist, you can always scan through old retrospectives as well for this historical context. Centralizing all this information is also important so that everyone can find it. That way, you may not know how to solve every problem that happens, but you know how to find that knowledge. There’s power in that, even if there’s no perfect “headshot.”
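If you are scanning old retrospectives by hand, even a crude word-overlap score can help rank which past incidents to read first. The sketch below uses Jaccard similarity over incident summaries as a simple stand-in; it is not an AI technique and not how any particular product surfaces related incidents. The function names and sample incident IDs are hypothetical.

```python
import re


def tokens(text: str) -> set:
    """Lowercase a summary and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Share of words two summaries have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def look_alikes(new_summary: str, past_incidents: dict, top_n: int = 3):
    """Rank past incidents (id -> summary) by similarity to the new one."""
    new_tokens = tokens(new_summary)
    scored = [
        (jaccard(new_tokens, tokens(summary)), incident_id)
        for incident_id, summary in past_incidents.items()
    ]
    scored.sort(reverse=True)
    return [
        (incident_id, round(score, 2))
        for score, incident_id in scored[:top_n]
        if score > 0
    ]


if __name__ == "__main__":
    past = {
        "INC-1": "payment queue consumer stalled",
        "INC-2": "login page slow",
        "INC-3": "queue consumer stalled after deploy",
    }
    print(look_alikes("queue consumer stalled", past))
```

This only works if summaries and retrospectives live somewhere searchable, which is another reason to centralize them.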

Bloaters

Bloaters look more like the demogorgons in “Stranger Things” than something that was, at one point, a human. They kill most people in the vicinity either by brute force or toxic clumps of fungus that they toss in the air like grenades. We only saw one of these in “The Last of Us” so far and it made quite the impression, annihilating most of the fighting population of Kansas City. Bloaters should be avoided at ALL costs. And any signs of them should be dealt with early before the issue compounds. Remember how the zombies were filling up the tunnels and the rebels had other initiatives to take care of? Yeah, that was technical debt and someone should have fixed it.

But that’s the way it goes. You know there’s a problem, even if you don’t know exactly how it’ll manifest. Then you’ve got a major incident on your hands–a bloater. And the best and only real way to deal with these is with a coordinated, end-to-end incident response. Make sure that you understand key components of incident response such as:

  • Escalation policies
  • Roles and responsibilities during the incident
  • Communication standards, both internal and external
  • Workflows that you can trigger automatically to take the heavy lifting off responders

With these plans in place, you will be able to resolve the incident more smoothly, faster, and with less customer impact.
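To make the first item in the list above concrete, here is a minimal sketch of an escalation policy expressed as data: an ordered list of levels, each with responders and an acknowledgement timeout. The `EscalationLevel` type, the `current_level` helper, and the sample timeouts are hypothetical and do not describe any specific product's behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EscalationLevel:
    responders: List[str]   # who gets notified at this level
    timeout_minutes: int    # how long to wait for an acknowledgement


def current_level(
    policy: List[EscalationLevel], minutes_unacknowledged: int
) -> Optional[EscalationLevel]:
    """Return the level that should be notified after a given wait,
    or None if every level has timed out."""
    elapsed = 0
    for level in policy:
        elapsed += level.timeout_minutes
        if minutes_unacknowledged < elapsed:
            return level
    return None


if __name__ == "__main__":
    policy = [
        EscalationLevel(["primary on-call"], 15),
        EscalationLevel(["secondary on-call"], 15),
        EscalationLevel(["engineering manager"], 30),
    ]
    print(current_level(policy, 20))
```

Writing the policy down as data, rather than as tribal knowledge, is what lets it be reviewed before the major incident instead of during it.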

What zombie are you worried most about?

What’s keeping you up at night? Fear of an impending bloater, or notifications about yet another stalker? While we may not find a cure for the zombies in “The Last of Us,” we can work on technology incidents and make those easier and less catastrophic for us and our customers.

PagerDuty is here to help you improve your digital operations. Whatever challenges you’re facing right now, our team can help you endure and thrive, not just survive. Check out our weekly demos to learn more.

Automate Incident Management Across Teams with Slack and PagerDuty by Nisha Prajapati https://www.pagerduty.com/resources/ebook/automate-incident-management-across-teams-with-slack/ Wed, 03 May 2023 19:10:53 +0000 https://www.pagerduty.com/?post_type=resource&p=82309 The post Automate Incident Management Across Teams with Slack and PagerDuty appeared first on PagerDuty.

Doing More with Less: Building Greater Operational Efficiency with PagerDuty by Nancy Lee https://www.pagerduty.com/blog/doing-more-with-less-building-greater-operational-efficiency-with-pagerduty/ Wed, 14 Dec 2022 14:00:13 +0000 https://www.pagerduty.com/?p=80617

The post Doing More with Less: Building Greater Operational Efficiency with PagerDuty appeared first on PagerDuty.

How many of us can say with confidence that we know a tool inside and out? If you’re like most, you probably use just a small fraction of a product’s features. When it comes to feature-rich software like Microsoft Word or Excel, it’s a safe bet that most users are aware of less than half of the features, and use even fewer on a regular basis. And the longer we’ve been using a piece of software, the more likely we fall into this trap of feature underutilization.

I started noticing this in my own life a year and a half ago when a coworker who had recently joined the team told me she found a more efficient way to generate closed captions for our instructional videos. I asked if it was a tool in her Adobe Creative Suite. 

“Nope, it’s actually YouTube!” she replied.

“What? That’s amazing!” I said. “How did we not know about this?” I was shocked. For the past 6 months, we had been paying for a separate tool for its closed captioning capabilities when, all along, we could’ve used YouTube’s free captioning feature in our Google accounts. 

More recently, I had my proverbial mind blown yet again when I learned of Slack’s reminder feature. Making my to-do list for the next day, I was scheduling reminders in my Google calendar to follow up with a teammate, call my doctor, and pay the gas bill. My husband looked on in amusement as I added one event after another in my calendar.

“What are you doing?” he asked.  

“Setting reminders for the things I have to do tomorrow,” I replied, mildly annoyed at this interruption to my sacred routine.

“Why don’t you use the Slack reminder feature?” he said. “That way, you’re not filling up half your calendar with reminders and making it hard for people to book meetings.” 

“I had no idea you could do that!” Like the YouTube incident, I was incredulous that I was only learning of this feature now.

As I started scheduling Slack reminders for the following day, I wondered how often we hear our customers use that phrase — “I had no idea you could do that!” It’s not surprising when you think about it. We often purchase a tool for a specific use case. In our haste to implement a solution, we approach the task with blinders on, paying attention to only those features that will help us achieve our goal. “Problem solved!” we declare. Never mind that we only learned a tenth of the software’s capabilities. Years later, we’re still clicking the same buttons and following the same scripts, oblivious to the slew of new features that promise to enhance our user experience.  

It’s human nature to take the path of least resistance. But at a time when many tech companies are being asked to manage costs and do more with less, perhaps a good place to start looking for efficiencies is in our existing investments.

One business area that shines a light on this is Customer Education. At PagerDuty, customer training and enablement sits with PagerDuty University. A comment we often see in our course evaluations is “I had no idea PagerDuty could do [fill in the blank with a feature that’s existed for months or even years]!” Some customers may have started using PagerDuty for on-call management and alerting, and never ventured beyond those basic capabilities. They’ve become so accustomed to using PagerDuty for a single use case that they don’t realize its product portfolio actually encompasses multiple solutions for use cases across their digital operations. 

For organizations facing pressure from the current macroeconomic environment, PagerDuty’s end-to-end digital operations capabilities can help consolidate tool spend and boost productivity by reducing context switching. PagerDuty University helps customers by driving awareness of this end-to-end experience, from pre-incident creation (enriching and routing events) to post-incident mobilization (response automation) to business-wide orchestration (automated stakeholder communication) and beyond. Rather than investing in point solutions that address a single problem, our customers can leverage the solutions they need, when they need them, adopting additional capabilities and products as they continue to evolve their Digital Operations with PagerDuty.

Those of us who work in Customer Education understand that it’s our job to not only improve a customer’s time to value, but to ensure that they continue to see the return on their investment post-onboarding and beyond. For PagerDuty University, that means making sure that our customers receive proper enablement on PagerDuty’s advanced capabilities such as Event Intelligence and Incident Workflows (in Early Access!), as well as other products and use cases such as Customer Service Operations and Process Automation. Tool consolidation, cost savings, automating away toil, and better customer experience: these are some of the biggest returns our customers walk away with post-training.

Our instructor-led training courses are centered around achieving customer goals. Rather than training customers on every PagerDuty feature, we first try to understand what business challenges they’re trying to solve, and build training that guides them efficiently to reaching those goals. Often in SaaS, we talk about time to value — we like to think of our technical training team as “guides to value.” 

PagerDuty University’s free, on-demand training complements our instructor-led training by digging into each product feature, situated in real-life scenarios so users always understand the larger context in which these features are used and the problems they solve. Our self-paced eLearning modules are suitable for customers who are trialing a free account, those who want to check out new features, or those who simply prefer the self-serve aspect of on-demand training. 

It should come as no surprise that those of us who work in Education Services love learning. We use that love of learning to drive customer success, which sits at the heart of everything. Whether it’s driving adoption, improving onboarding, or imparting industry best practices, we strive to make sure that we never hear one of our customers say “I had no idea PagerDuty could do that!”

Getting Started Workshop: Rundeck By PagerDuty by Nisha Prajapati https://www.pagerduty.com/resources/webinar/getting-started-with-rundeck-workshop/ Mon, 12 Dec 2022 20:49:04 +0000 https://www.pagerduty.com/?post_type=resource&p=80503 The post Getting Started Workshop: Rundeck By PagerDuty appeared first on PagerDuty.

3 Ways You Might Have a NOC Process Hangover by Hannah Culver https://www.pagerduty.com/blog/3-ways-you-might-have-a-noc-process-hangover/ Mon, 24 Oct 2022 13:00:33 +0000 https://www.pagerduty.com/?p=79024

The post 3 Ways You Might Have a NOC Process Hangover appeared first on PagerDuty.

NOC, or network operation center, processes have been set in stone for decades. But it’s time for some of these processes to evolve. Digital transformation and the cloud era have led to the rise of DevOps, and with it, service ownership. Service ownership means that developers take responsibility for supporting the software they deliver at every stage of the life cycle. This brings development teams closer to their customers, the business, and the value they deliver.

It also requires a departure from the traditional NOC incident handling methods. Yet, as organizations transition towards service ownership, some old NOC processes remain. Here are three common NOC process hangovers and how to replace or update them.

Process hangover: L1 responders aren’t able to resolve issues

NOCs used to be the command center for technology issues. They functioned like a brain, sending out signals to relevant appendages. Issue with networking? Route to networking. Issue with security? Route to security. The NOC’s central function was to involve the correct SME to resolve an issue. This meant digging through spreadsheets (and sometimes physical contact books!) to figure out who was responsible for what.

When everything was on premise and in person, this made sense. There were fewer services, and incidents could be neatly separated by departments. If the database was having an issue, you could call up the database on-call responder. The responder (who would likely be in office or close enough to respond in person) could then go to the datacenter and take a look.

Now, in the remote work, cloud era, where organizations have hundreds or thousands of services maintained by dozens or even hundreds of teams spread across the globe, the Rolodex method has outlived its purpose. It’s next to impossible to maintain accurate spreadsheets to know which teams are responsible for which services. And, as the organization changes, records grow stale quickly. Services can move between teams. Teams change as people move between them, or leave or join the company. Now, an L1 responder has to work too hard to identify the right person in an efficient and timely manner.

Organizations need a way to remove these manual steps to find the right person and route incidents directly to SMEs who can jump in to respond to any issues. This can happen in a variety of ways. For some organizations, a DevOps service ownership model is the right path forward. Those who write the code are assigned to respond and fix the service during an incident. The alert is routed directly to the on-call person on the development team that supports the service, and the SME takes it from there.
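The routing described above can be sketched as two lookups: service to owning team, then team to whoever is on call. Everything in the example below is hypothetical, including the service and team names, the weekly rotation rule, and the fallback queue; real schedules are usually richer than a simple weekly rotation.

```python
from datetime import date

# Hypothetical ownership map: service -> owning team.
SERVICE_OWNERS = {"checkout": "payments-team", "search": "discovery-team"}

# Hypothetical rosters; each team rotates through its list weekly.
ROTATIONS = {
    "payments-team": ["ana", "bo"],
    "discovery-team": ["chen"],
}


def on_call_for(team: str, day: date) -> str:
    """Pick the responder by rotating weekly through the team's roster."""
    roster = ROTATIONS[team]
    week = day.isocalendar()[1]  # ISO week number
    return roster[week % len(roster)]


def route_alert(service: str, day: date) -> str:
    """Route an alert straight to the owning team's on-call responder."""
    team = SERVICE_OWNERS.get(service)
    if team is None:
        return "unowned-service-triage"  # hypothetical fallback queue
    return on_call_for(team, day)


if __name__ == "__main__":
    print(route_alert("checkout", date(2023, 1, 2)))
```

Because ownership and rosters are data rather than a spreadsheet someone has to remember to update, a service moving between teams becomes a one-line change.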

For other organizations, it might make sense to have a hybrid approach where L1 responders serve as the first line of defense before escalating to distributed, on-call teams for their services. L1 responders shouldn’t be a routing center that connects the issue with another team. Instead, they should be empowered to resolve an incident themselves. You can set up your L1 responders to be more effective by enabling them with the ability to both troubleshoot and selectively resolve incidents. Access to automation and resources like runbooks can empower L1 responders to help accelerate the diagnosis and remediation process, oftentimes without needing to disrupt the subject matter experts in charge of a given service via an escalation. By putting automation in the hands of L1 responders, organizations can avoid unnecessary escalations and empower L1s to resolve issues faster.

Process hangover: Major incidents aren’t called or are called too late

We’ve heard it before: time is money. And when NOCs were the primary method of ensuring incidents were responded to, they had an additional responsibility. A NOC needed to ensure that resources were well managed. This meant no unnecessary personnel responding to problems. NOCs often took the blame if they called a major incident too soon and interrupted people for a minor problem. These disruptions took SMEs away from their work innovating. So it was crucial for NOC responders to only call major incidents when it was clear there was a much bigger issue at play.

But now, time isn’t money, uptime is money. The cost of a major incident that’s flown under the radar is larger than the cost of tagging in some extra help. Imagine you’re an online retailer and your shopping cart function is down. Every minute your customers can’t add items to their cart, you’re losing hundreds of thousands of dollars. Plus, customer expectations have increased over the last few years. Customers expect that their app, tool, platform, streaming service, etc. works without interruption. And it erodes customer trust when it doesn’t. In fact, according to PwC, 1 in 3 customers would stop doing business with a brand they loved after one bad experience.

Organizations need to call major incidents sooner to mitigate customer impact. Yes, this may mean waking someone unnecessarily once in a while. But, that’s far less likely with service ownership. SMEs responsible for a service have a better understanding of when to call a major incident than an L1 responder would. So there are fewer false alarms.

Process hangover: Come-and-go war rooms

NOCs often serve as the communication hub for a major incident. This helps responders working to resolve an issue keep on task. Back when many companies had everything (and everyone) on-premise, there was a war room. People came there and the NOC coordinator kept everyone up to date. Now, with distributed teams and systems, physical war rooms are a thing of the past. Many companies instead have virtual war rooms with a video conferencing bridge or chat channel that remains open during an incident.

Other stakeholders may want to treat this war room like a physical one, dropping in as they please. But, in this virtual world, this means that these stakeholders are asking the incident responders questions. This delays the resolution. Companies with come-and-go virtual war rooms may experience more miscommunications and frustration. Responders feel frustrated by interruptions and stakeholders feel frustrated with the lack of communication.

One way to mitigate this is to close the war room to non-participants. If someone isn’t a part of the incident response team, they don’t need access to the response team’s virtual war room. Instead, what they need is an internal liaison. This is a designated communicator from the incident response team.

The internal communication liaison consolidates incident information and relays it to relevant stakeholders. To make this easier, communication liaisons can use status update notification templates. These templates dictate how to craft communications for a specific audience. They ensure that stakeholders receive any information necessary to make decisions. And no responders have to stop working on the incident at hand to share updates.
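As a rough illustration of the template idea above, here is a minimal Python sketch. The audience names, template wording and field names are all hypothetical, chosen only for this example. They are not PagerDuty's status update templates, just one way a liaison's per-audience templates could be modeled.

```python
from string import Template

# Hypothetical templates, one per audience; the wording is illustrative only.
STATUS_TEMPLATES = {
    "executives": Template(
        "[$severity] $title: customer impact is $impact. "
        "Next update by $next_update."
    ),
    "support": Template(
        "[$severity] $title: tell customers that $customer_message. "
        "Next update by $next_update."
    ),
}


def render_status_update(audience: str, **fields: str) -> str:
    """Fill the template for one audience.

    Raises KeyError if the audience is unknown or a required field is
    missing, so an incomplete update is never sent by accident.
    """
    return STATUS_TEMPLATES[audience].substitute(**fields)


update = render_status_update(
    "executives",
    severity="SEV-2",
    title="Checkout errors",
    impact="limited to cart updates",
    next_update="14:30 UTC",
)
print(update)
```

Because `substitute` refuses to render when a field is missing, the liaison finds out about a gap before stakeholders do.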

Hangovers aren’t fun, but they always end

NOCs are a tried and true way of managing incidents for many organizations. But NOC methods become out of date when moving into this era of digital transformation. Seamless communication and rapid response are key to preserving customer trust. Looking forward, teams will involve SMEs immediately and call major incidents sooner rather than later. They’ll also communicate with key stakeholders throughout an incident while setting boundaries.

And often teams need a digital operations platform to help support this transition. PagerDuty allows teams to bring major incident best practices to their organization, resolving critical incidents faster and preventing future occurrences. Try us for free for 14 days.

The post 3 Ways You Might Have a NOC Process Hangover appeared first on PagerDuty.

]]>
Live Call Routing: The Fastest Way to Reach an On-Call Staff by Nisha Prajapati https://www.pagerduty.com/resources/webinar/live-call-routing-fastest-way-to-reach-on-call-staff/ Fri, 16 Sep 2022 19:45:37 +0000 https://www.pagerduty.com/?post_type=resource&p=78107 The post Live Call Routing: The Fastest Way to Reach an On-Call Staff appeared first on PagerDuty.

]]>
]]>
PagerDuty’s Global Preparedness Month Returns: Securing our hybrid workplaces from emerging threats by Jason Flint https://www.pagerduty.com/blog/pagerdutys-global-preparedness-month-returns-securing-our-hybrid-workplaces-from-emerging-threats/ Thu, 01 Sep 2022 13:00:50 +0000 https://www.pagerduty.com/?p=77964 The world remains increasingly complex. Threats from bad actors continue to disrupt our societies, dominate news cycles and impact our lives in many ways. COVID-19...

The post PagerDuty’s Global Preparedness Month Returns: Securing our hybrid workplaces from emerging threats appeared first on PagerDuty.

]]>
The world remains increasingly complex. Threats from bad actors continue to disrupt our societies, dominate news cycles and impact our lives in many ways. COVID-19 taught us a few lessons about planning for the unexpected and at the same time led many to reconsider where they choose to work. With hybridized work continuing to be the de facto mode of work for many, professionals tasked with crisis response and physical security management are faced with serious challenges now that a majority of us are no longer working under one roof with one set of safety protocols. 

PagerDuty operates around the world and around the clock, so our 24×7 operation to keep Dutonians safe starts by enabling every employee through training, knowledge-sharing and effective tools so they can take action. Global Preparedness Month is our World Cup of employee safety where we bring all of our enablement activities home. It’s interactive, cross-functional, fun and endorsed by our Senior Leadership.

In this blog, we’ll cover how we prepare our teams for disaster and how we use PagerDuty internally for our “Harden the Target” initiative aimed at addressing the gaps in critical physical security events in the hybridized world.

Orchestrating in real-time with PagerDuty

When facing a critical breach or outage in physical security systems, teams need to understand the where and when in real-time. Let’s face it, humans can’t monitor multiple, concurrent critical physical security events as well as machines. Applying the principles of digital operations with real-time alerting and automation is key to better insights and actionable information. PagerDuty’s capabilities help us do just this and also serve as a force multiplier.

We found that consolidating on the PagerDuty platform gives us a more thorough common operating picture to respond to critical events as they unfold across different crisis response and physical security systems. A 1:1 relationship between our configured services and our critical physical security event types gives us increased visibility to what is happening and decreases our mean time to respond to or interrupt potentially harmful activity. The platform’s machine learning and intelligent grouping capabilities give us additional intel for trend and pattern recognition such as repeated failed access attempts or the frequency of equipment outages.

Screenshot of PagerDuty Service Directory page
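To make the 1:1 relationship between services and event types concrete, here is a small Python sketch. The event types, service names and sample events are hypothetical. The counting function is a deliberately simple stand-in for pattern recognition; it is not the platform's machine learning or intelligent grouping.

```python
from collections import Counter

# Hypothetical mapping: each physical security event type routes to
# exactly one service, and no two event types share a service.
SERVICE_FOR_EVENT_TYPE = {
    "failed_access_attempt": "Physical Security: Access Control",
    "equipment_outage": "Physical Security: Equipment Health",
    "audible_alarm": "Physical Security: Alarms",
}


def is_one_to_one(mapping: dict[str, str]) -> bool:
    """True when no two event types point at the same service."""
    return len(set(mapping.values())) == len(mapping)


def repeated_events(events: list[dict], threshold: int) -> dict:
    """Count events per (type, source) and keep those at or above threshold."""
    counts = Counter((e["type"], e["source"]) for e in events)
    return {key: n for key, n in counts.items() if n >= threshold}


# Illustrative sample data only.
events = [
    {"type": "failed_access_attempt", "source": "door-7"},
    {"type": "failed_access_attempt", "source": "door-7"},
    {"type": "equipment_outage", "source": "camera-2"},
    {"type": "failed_access_attempt", "source": "door-7"},
]

print(is_one_to_one(SERVICE_FOR_EVENT_TYPE))
print(repeated_events(events, threshold=3))
```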

This use case layers physical security management best practices and FEMA’s Area Command over PagerDuty’s traditional incident response. This setup acts as our virtual Global Security Operations Center (GSOC). PagerDuty’s 650+ integrations (e.g., Slack, Teams, Zoom, etc.), email clients and API ensure that we can ingest crucial event information from virtually any physical security, threat intelligence or status monitoring system, and do something timely with that data.
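As a sketch of what ingesting an event through the API can look like, the Python below builds a trigger event in the shape accepted by the PagerDuty Events API v2. The routing key, summary, source and custom details are placeholders invented for this example, and the code only builds and prints the JSON body rather than sending it.

```python
import json
from datetime import datetime, timezone

# Events API v2 endpoint; events are submitted as a JSON POST body.
EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"

ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(
    routing_key: str,
    summary: str,
    source: str,
    severity: str,
    details: dict,
) -> dict:
    """Build an Events API v2 trigger event without sending it."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity!r}")
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "custom_details": details,
        },
    }


# Placeholder values; a real routing key comes from the service's integration.
event = build_trigger_event(
    routing_key="YOUR_INTEGRATION_KEY",
    summary="Repeated failed badge access at door-7",
    source="access-control-system",
    severity="critical",
    details={"door": "door-7", "attempts": 3},
)
print(json.dumps(event, indent=2))
```

Sending this body to `EVENTS_API_URL` with any HTTP client would open an incident on the service that owns the routing key.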

Through the platform, we can orchestrate alerts to response teams, set off audible alarms for our staff or run automated response plays for contract security teams using third-party video analytics. PagerDuty ensures a critical physical security event does not go unnoticed and escalates to the right team at the right time.
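The idea of escalating to the right team at the right time can be sketched as a simple time-based policy. The Python below is a simplified model under stated assumptions: the targets and timeouts are hypothetical, and the logic is an illustration of tiered escalation, not PagerDuty's escalation policy implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationRule:
    target: str
    escalate_after_minutes: int  # time this level has to acknowledge


# Hypothetical policy for a critical physical security event.
POLICY = [
    EscalationRule("GSOC on-call", 5),
    EscalationRule("Security manager", 10),
    EscalationRule("Incident Commander", 15),
]


def current_target(policy: list[EscalationRule], minutes_unacknowledged: int) -> str:
    """Return who should be notified after the given unacknowledged time.

    Each level's timeout is added to the running total; once every
    timeout has elapsed, the event stays with the last level.
    """
    elapsed = 0
    for rule in policy:
        elapsed += rule.escalate_after_minutes
        if minutes_unacknowledged < elapsed:
            return rule.target
    return policy[-1].target


for minutes in (0, 5, 15, 60):
    print(minutes, current_target(POLICY, minutes))
```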

Practice keeps us primed for response

Throughout September, we engage our employees with emergency preparedness activities such as weekly educational themes, volunteer activities in partnership with PD.org and gamification like building your best-looking disaster kit or our disaster movie trivia. #stayready is our running theme and getting our workforce ready to respond to a major natural disaster such as an earthquake or a man-made disaster such as an active shooter event is our focal point. We also look to have some fun together and pick up key learnings from each other. 

We’re running systems tests and drills to lock in those response habits with our emergency communications tests and The Great ShakeOut earthquake drill, all while coordinating on the PagerDuty platform. Through this process we keep our on-call schedules up to date and use the on-call readiness tool to confirm everyone is set up for time-sensitive alerts as Incident Commanders. The end result is captured in post-mortems and ensures that our team activations, incident command and escalation protocols are correctly programmed in the platform and continue to reflect the needs of the evolving threat landscape.

At PagerDuty, we’re empowering teams to build the future, and with our platform configured for critical physical security events, we’ve got another tool in our toolbox to stay ahead of the emerging threats to our people and business. If you want to learn how PagerDuty can help your organization respond to the many threats posed by bad actors and serve as a force multiplier or as your virtual Global Security Operations Center (GSOC), please sign up for a free trial.

The post PagerDuty’s Global Preparedness Month Returns: Securing our hybrid workplaces from emerging threats appeared first on PagerDuty.

]]>