incident response | Tags | PagerDuty
Build It | Ship It | Own It

Three Teams That Can Use AIOps to Work Smarter, Not Harder
by Hannah Culver
https://www.pagerduty.com/blog/3-use-cases-for-aiops/
Mon, 28 Aug 2023 12:00:29 +0000

There isn’t a boardroom today that isn’t asking how AI and generative AI can help drive efficiency and accelerate the business. For organizations looking to capitalize on ML and automation to improve their efficiency during incidents, AIOps is a tangible, proven application and an exciting opportunity for ITOps teams.

As we’ve seen across market landscape evaluations, there are a number of ways that solutions can be implemented. Despite this, the goals AIOps solutions aim to achieve remain fairly consistent: fewer incidents and faster resolution. But which teams stand to benefit from this powerful technology, and how will AIOps help them achieve their desired business outcomes?

Understanding how different teams can implement best practices to see a reduction in MTTR, total incidents, and time to adopt automation will help ensure that each team is taking value from your investment. Here are three teams that stand out as having much to gain from leveraging AIOps: Network Operation Center (NOC) teams, Major Incident Management (MIM) teams, and distributed service owning teams. Let’s cover each.

NOC teams

If you have a NOC, it acts as your central nervous system. You may also be in the middle of undertaking modernization efforts to reduce both cost and risk.

Many of our NOC customers tell us about challenges such as:

  • Eyes-on-glass operational style causes incidents to go undetected
  • Catch and dispatch means too many escalations to SMEs or routing incidents to the wrong team
  • Manual work drives up MTTR
  • L1/L2 teams experience high turnover and blame culture is common

To move beyond this, organizations can create L0 automation. This is automation that serves as the first responder, only bringing in humans when necessary. For well-understood, well-documented issues, L0 automation can auto-remediate incidents without a responder intervening. But for other more complex issues that require a hands-on approach, NOC teams can create L0 automation that immediately pulls in diagnostic information before the responder looks at an incident, routes incidents intelligently according to event data, and populates the incident notes with pertinent documentation and runbooks.
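
The L0 pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PagerDuty's implementation: the `Incident` class, the lookup tables, the team names, and the runbook URL are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical lookup tables; none of these names are PagerDuty APIs.
KNOWN_FIXES = {"disk_full": "rotate_logs"}               # well-understood issue -> automated fix
ROUTES = {"database": "dba-team", "network": "noc-l2"}   # event component -> owning team
RUNBOOKS = {"database": "https://example.com/runbooks/database"}


@dataclass
class Incident:
    kind: str
    component: str
    notes: list = field(default_factory=list)
    assigned_to: Optional[str] = None
    resolved: bool = False


def l0_first_response(incident: Incident) -> Incident:
    """Act as the first responder; involve a human only when needed."""
    fix = KNOWN_FIXES.get(incident.kind)
    if fix is not None:
        # Well-documented issue: remediate without paging anyone.
        incident.notes.append(f"auto-remediated with {fix}")
        incident.resolved = True
        return incident
    # Otherwise prepare the incident before a responder looks at it.
    incident.notes.append(f"diagnostics collected for {incident.component}")
    if incident.component in RUNBOOKS:
        incident.notes.append(f"runbook: {RUNBOOKS[incident.component]}")
    # Route according to event data, falling back to a default queue.
    incident.assigned_to = ROUTES.get(incident.component, "noc-l1")
    return incident
```

Keeping the fixes, routes, and runbooks in plain tables means the NOC can extend coverage one well-understood issue at a time without touching the control flow.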

PagerDuty AIOps helps NOCs modernize and move away from eyes-on-glass methods. These NOCs are a center of excellence within their organizations, spearheading data-driven optimization, enabling best practices, and ensuring incident readiness.

MIM teams

When critical, customer-impacting incidents happen, you don’t have time to waste. But with complexity and noise on the rise, how do Major Incident Management teams improve to meet growing customer expectations?

We see MIM teams with common challenges such as:

  • Finding out about major incidents from an overwhelming number of customers/users calling in or from delayed team escalations
  • Lack of context as initial triage takes too long to assess severity and business impact
  • Long MTTR waiting for the right people, the right diagnostics, the right runbooks, etc.
  • Disjointed tooling leading to communication barriers for responders and corresponding teams

MIM teams can overcome these challenges with a variety of automation and ML tactics. First, organizations can create automation that immediately routes high-priority or high-severity incidents to a MIM team and pulls in the other teams needed via incident workflows. Additionally, ML can gather key context, such as how rare an incident like this is, whether it has happened before and how it was resolved, and which change events might be correlated with the failure.
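
To make the routing and context-gathering ideas concrete, here is a small Python sketch. It uses plain counting over a made-up incident history as a stand-in for ML, and every name in it (`HISTORY`, `MAJOR_PRIORITIES`, `triage`, the team names) is an assumption for illustration only.

```python
from collections import Counter

# Hypothetical incident history: (title, how it was resolved).
HISTORY = [
    ("checkout latency", "rolled back deploy"),
    ("checkout latency", "scaled up workers"),
    ("login errors", "restarted auth service"),
    ("checkout latency", "rolled back deploy"),
]
MAJOR_PRIORITIES = {"P1", "P2"}  # assumption: what counts as "major" is team-specific


def triage(title: str, priority: str) -> dict:
    """Route by priority and attach context drawn from past incidents."""
    past_fixes = [fix for seen, fix in HISTORY if seen == title]
    return {
        "route_to": "mim-team" if priority in MAJOR_PRIORITIES else "service-owner",
        "seen_before": len(past_fixes),                    # has it happened before?
        "share_of_history": len(past_fixes) / len(HISTORY),  # how rare is it?
        "most_common_fix": Counter(past_fixes).most_common(1)[0][0] if past_fixes else None,
    }
```

A real system would match similar incidents rather than identical titles, but the output shape is the point: the responder gets the route, the rarity, and the past resolution in one place.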

PagerDuty AIOps helps MIM teams detect major incidents faster, improve MTTR and customer experience, and save SMEs time. This reduces the cost of each incident and mitigates risk.

Distributed service owning teams

DevOps and distributed service owning teams are under more pressure than ever to deliver exceptional customer experiences. But with competing priorities and fewer resources, this is easier said than done.

Many of our customers share challenges they are facing such as:

  • Disparate monitoring tools with no central pane of glass
  • Too much noise leading to incorrect escalations and false incidents
  • Lack of context and information silos
  • Toil and time taken away from value-add initiatives

For service owning teams looking to overcome these challenges, an AIOps tool that can aggregate data from all the monitoring sources in the technical ecosystem can help bring clarity to incident response. Additionally, with ML, teams can reduce noise by automatically grouping together alerts based on context, time, and previous event data that the model has trained on. With this and the ML-surfaced triage information, incident response is streamlined so teams can get back to innovating faster.
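
As a rough illustration of grouping alerts by context and time, the sketch below merges alerts that share a service and arrive close together. It is a simple rule-based stand-in, not the trained model described above; the function name, the tuple layout, and the five-minute window are assumptions.

```python
def group_alerts(alerts, window_seconds=300):
    """Group alerts that share a service and arrive within window_seconds
    of the previous alert in the same group.

    alerts: iterable of (timestamp_seconds, service, message) tuples.
    """
    groups = []
    latest_group = {}  # service -> index of that service's most recent group
    for ts, service, message in sorted(alerts):
        idx = latest_group.get(service)
        if idx is not None and ts - groups[idx]["last_seen"] <= window_seconds:
            # Same service, close in time: treat as part of the same incident.
            groups[idx]["alerts"].append(message)
            groups[idx]["last_seen"] = ts
        else:
            groups.append({"service": service, "last_seen": ts, "alerts": [message]})
            latest_group[service] = len(groups) - 1
    return groups
```

With four alerts where two `api` alerts arrive a minute apart, one `db` alert follows, and a third `api` alert arrives much later, the function returns three groups instead of four separate notifications.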

PagerDuty AIOps helps service owning teams spend less time firefighting, reduce MTTR, and create exceptional customer experiences. This improves culture and team retention while increasing revenue for the entire organization. 

Ready to get started?

With PagerDuty AIOps, teams like the ones we looked at see 87% fewer incidents, 14% faster MTTR, and 9x faster automation adoption. This helps organizations move faster, focus on the work that matters most to customers, and reduce risk and team burnout. Best of all, teams from dev to IT can see value from PagerDuty AIOps.

PagerDuty AIOps works in conjunction with the rest of the PagerDuty Operations Cloud to help organizations manage their operations by leveraging AI and automation to supercharge their digital transformation. With over 700 integrations, GenAI capabilities, and end-to-end event-driven automation, PagerDuty gives customers a 400% ROI and the right tools to leapfrog the competition.

To try PagerDuty AIOps out yourself, you can take an interactive product tour or try us for free for 14 days.

Debug Faster By Capturing Crash States in Kubernetes
by Nisha Prajapati
https://www.pagerduty.com/resources/webinar/debug-faster-by-capturing-crash-states-in-kubernetes/
Thu, 24 Aug 2023 15:29:56 +0000

3 New Updates to the PagerDuty Scheduling Experience
by Débora Cambé
https://www.pagerduty.com/blog/3-new-updates-to-the-pagerduty-scheduling-experience/
Fri, 18 Aug 2023 12:00:31 +0000

With the acceleration of cloud and digital transformation initiatives, enterprises are under pressure to adopt more agile DevOps practices to be responsive to the business. But the increased complexity of digital systems and reliance on digital business only make incidents more expensive.

When incidents happen, protecting customer experience and minimizing downtime starts with bringing in the right subject matter experts to fix the problem. For many organizations adopting agile methodologies, building service ownership (you build it, you own it) into the incident response process is key to success. However, the cultural transformation of putting developers on call for their services in production is no small feat. Having the right platform to assign responders to dedicated on-call schedules, and to mobilize the right people at the right time when seconds matter, makes all the difference.

As the best-in-class solution for incident response, the PagerDuty scheduling experience continues to be a focus area for ongoing iteration to ensure that the customer workflow is as seamless as possible.

Therefore, we’re proud to launch a number of highly requested updates to scheduling. Highlights include the ability to rename layers and manage users associated with a schedule. Keep reading to learn all the details.

Consolidated Schedule View

The new schedule details page brings the most relevant information about each schedule front and center. This way, both on-call users and admins / team managers can quickly get an overview of current and upcoming rotations. Here are the details you can easily see on this page: 

  • Who’s on call
  • Who will be next on call
  • Calendar feeds
  • Collapsible menus to check which users, teams and escalation policies are associated with the schedule

Screenshot of consolidated schedule view

Dynamic Schedule Creation

By listening to our customers’ feedback on usability, we were able to design a more fluid and dynamic schedule creation experience. New capabilities include:

  • Mandatory schedule names: every new schedule now requires a name, giving users, team managers, and admins more clarity about existing schedules.
  • Dropdown time picker: instead of manually typing a handoff time, you can now use a time picker, also available when adding restrictions to a schedule.
  • Relocated buttons: the cancel and save buttons are now on the right-hand side of the page and they follow the user’s page scroll, making them easier to find.

Screenshot of dynamic schedule creation

Flexible Editing

More than ever, change is the only constant for organizations and their teams. So it’s important that their schedules can quickly be adapted to reflect their current structure and process. Check out these three new functionalities that improve schedule editing:

  • Manage teams associated with a schedule: admins can now add or remove teams directly from a schedule.
  • Change a schedule layer’s name: users and admins can edit a schedule layer’s name to give it a more descriptive title and adjust it in a way that makes more sense to the organization – it even supports emojis. For example, you can create an East Coast layer and a West Coast layer.
  • Reorder and collapse schedule layers: the new drag-and-drop functionality allows users to reorder layers easily in both the schedule layer creation and editing pages.

Screenshot of flexible editing

Deep Dive on the New Scheduling Experience

Want to see these new capabilities in action? Watch the video below, where Senior Product Manager Kara Smith joins Developer Advocate Mandi Walls to show off all the scheduling UI enhancements.

We’ve launched these updates to make the scheduling experience easier, but we’re not stopping there. Stay tuned to see how we’re continuing to build out the PagerDuty Operations Cloud to help scale teams with the power of AI and automation to transform the entire incident management process. Try it for yourself with our free 14-day trial.

How to Maximize Time Savings and Reduce Toil During Incident Response
by Laura Chu
https://www.pagerduty.com/blog/how-to-maximize-time-savings-and-reduce-toil-during-incident-response/
Mon, 31 Jul 2023 12:00:44 +0000

Illustration of the PagerDuty Operations Cloud.

Incidents are a costly burden on businesses. Despite assembling the right people and teams, the manual work, tool setup and prolonged tasks can negatively impact customer experience. The need for adaptable processes to address diverse incident types further complicates the situation.

This is where the PagerDuty Operations Cloud steps in. It streamlines and automates all the various manual steps in the incident response process. The result is a cohesive and end-to-end incident management experience that frees up responders to focus on the critical thinking requirements to resolve the incident.

At the heart of the PagerDuty Operations Cloud lies Incident Response–the backbone for effectively managing an orchestrated response to address customer-impacting incidents. To help our customers build a resilient approach to digital operations, we aim to deliver a solution that is:

  • Automated to eliminate inefficiencies
  • Flexible to accommodate each team’s specific processes
  • Proactive to learn from failure and repeat incidents

This year, PagerDuty has introduced Incident Workflows, Custom Fields on Incidents and Status Update Notification Templates. These latest additions work in concert to further streamline incident management processes, enabling you and your team to focus on resolving incidents and delivering exceptional digital experiences to your customers. Every minute matters in incident response, so saving time at each step of the process is crucial.

Here are Three Ways to Cut Down Incident Time

Experience significant time savings with Incident Workflows

Incident Workflows, a powerful capability within PagerDuty, empowers you to easily customize workflows for different incidents and automate manual steps by integrating them into a unified process. With Incident Workflows, actions can be orchestrated based on the incident type via a customizable, user-friendly no-code/low-code builder. 

For example, let’s say your incident process requires five manual steps. With Incident Workflows, you can automate the entire process. 

Screenshot demonstrating how users can create different steps within an incident workflow.

Responders no longer need to worry about manual steps once the Incident Workflow is configured. Instead, they can initiate the appropriate incident workflow (e.g., P1, P2), allowing the PagerDuty Operations Cloud to coordinate the right teams to promptly address and resolve incidents. This gives teams more time back to focus on the task at hand: resolving the incident.
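
Conceptually, a workflow is an ordered list of steps keyed by incident type. The Python sketch below illustrates that idea only; it does not reflect how Incident Workflows are built or configured in PagerDuty, and every function and name in it is hypothetical.

```python
# Hypothetical step functions: each stands in for one formerly manual task.
def create_chat_channel(incident):
    return f"chat channel created for {incident['id']}"


def open_conference_bridge(incident):
    return f"conference bridge opened for {incident['id']}"


def add_responders(incident):
    return f"responders added to {incident['id']}"


def notify_stakeholders(incident):
    return f"stakeholders notified about {incident['id']}"


def create_ticket(incident):
    return f"ticket created for {incident['id']}"


# Assumption: one workflow per priority, defined once and reused for every incident.
WORKFLOWS = {
    "P1": [create_chat_channel, open_conference_bridge, add_responders,
           notify_stakeholders, create_ticket],
    "P2": [create_chat_channel, add_responders],
}


def run_workflow(incident):
    """Run every step configured for the incident's priority, in order."""
    return [step(incident) for step in WORKFLOWS.get(incident["priority"], [])]
```

Calling `run_workflow({"id": "INC-1", "priority": "P1"})` runs all five steps in sequence, which mirrors the five-manual-step example above.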

Screenshot showing how a list of workflows can be found by clicking “Run Workflow.”

Take advantage of our latest generally available Incident Workflow Templates, which enable you to quickly operationalize best practices for managing major incidents, standardizing collaboration tools and ensuring the right stakeholders are informed with the latest updates. These templates are designed to empower responders, who have not previously used Incident Workflows, to quickly adopt and implement this functionality, leading to faster incident resolution.

Screenshot showing choice of three Incident Workflow templates.

Better context for faster incident resolution

Context is key for responders during incidents. Having the right information is essential for sharing with other responders and helps guide their actions, such as sending status updates or writing a postmortem. Details such as “data regions” or “customer impact” help teams prioritize efforts effectively. To assist with this, PagerDuty introduced Custom Fields on Incidents.

This new feature allows teams to easily extract important incident data from any system of record and place it where responders can access it, whether on the incident details page or in a status update. PagerDuty empowers responders to save valuable time during triage and make more informed decisions by including relevant critical data.

Screenshot showing fields that allow you to customize the text so the information is the same and consistent across teams.

Simplify stakeholder updates with notification templates

Effective communication with key stakeholders during incidents is crucial. However, crafting these notifications can be time-consuming and resource-intensive. By using Status Update Notification Templates, you can leverage customizable templates that alleviate the strain of writing communications, streamline the process and reduce the time and effort required to share critical updates.

These templates eliminate the guesswork in formatting updates by providing pre-designed templates tailored to your organization’s needs. With Status Update Notification Templates, you can streamline the process of sharing incident updates, ensuring clear and consistent communication with stakeholders.

Screenshot showing a pre-designed template that can be customized to provide updates for stakeholders.

Get 1+1=3 with the PagerDuty Operations Cloud

These features work great alone, but together they provide a better end-to-end incident management experience. With Incident Workflows, sending templated status updates becomes effortless, and soon, you’ll be able to include Custom Fields directly in those updates. For instance, imagine using a custom field to add an object like “data region” and seamlessly launch an Incident Workflow that includes a status update with the same custom field. In the near future, a responder will be able to automatically populate the same information to a Jira ticket or reassign the incident to the right regional responder. 
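
The idea of a custom field flowing into a templated status update can be sketched with Python's standard `string.Template`. The template text and field names below are made up for illustration and are not PagerDuty's template syntax.

```python
from string import Template

# Hypothetical template and field names, for illustration only.
STATUS_UPDATE = Template(
    "[$priority] $title\n"
    "Data region: $data_region\n"
    "Customer impact: $customer_impact\n"
    "Next update in $next_update_minutes minutes."
)


def render_status_update(incident: dict, custom_fields: dict) -> str:
    """Fill the template from incident data plus custom fields.

    safe_substitute leaves a missing placeholder visible instead of raising,
    so a gap is obvious to whoever reviews the update before it is sent.
    """
    return STATUS_UPDATE.safe_substitute({**incident, **custom_fields})
```

Because the custom field is entered once on the incident and reused by the template, responders do not retype the same detail in every update.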

This powerful orchestration across a unified platform allows you to streamline work across the entire incident lifecycle for maximum time savings, resulting in faster resolution and better customer experiences without impacting revenue.  

Graphic showing Incident Workflows can be the orchestrator by taking text from Custom Fields and automatically updating Status Update Notifications Template or creating a Jira ticket.

Watch a demonstration of how these features work together.

Dynamic Digital Ecosystem

PagerDuty brings all of these capabilities to a desktop web interface, mobile application, chat experience and API, so you can work in the way that suits you best.

Graphic displaying different mediums for the PagerDuty Operations Cloud (i.e., web, mobile, chat, API).

Don’t Wait, Try it Out

PagerDuty empowers you to streamline your incident response process by leveraging the PagerDuty Operations Cloud with Incident Workflows and integrating various tools and templates. This integration optimizes your incident management, ensuring fast and effective response. As a result, your organization can experience reduced operating costs while freeing up resources to prioritize innovation and growth. 

Curious to see these features in action? Embark on our Product Tour or try our free 14-day trial to witness the power of the PagerDuty Operations Cloud firsthand.

Product Launch: Reduce Tool Sprawl and Optimize Operations with PagerDuty
by Nisha Prajapati
https://www.pagerduty.com/resources/webinar/reduce-tool-sprawl-and-optimize-operations/
Wed, 21 Jun 2023 23:00:17 +0000

Top 5 Use Cases for Custom Fields on Incidents
by Ariel Russo
https://www.pagerduty.com/blog/top-5-use-cases-for-custom-fields-on-incidents/
Thu, 15 Jun 2023 12:00:27 +0000

Chasing down critical information in disparate systems of record while trying to resolve an incident can make an already stressful situation even more taxing. Extra clicks, extra logins, copy/paste, socializing that information with other responders–it all wastes time and introduces more room for human error. Now PagerDuty customers can use Custom Fields on Incidents to enrich their incident data. This new feature allows teams to pull in important incident data from any system of record and put it at the fingertips of responders so they have the information needed to resolve incidents faster. 

Intrigued? Here are the top 5 ways PagerDuty customers use Custom Fields:

1. Label incident impact 

The most common use case for Custom Fields is to capture and assess incident impact. One enterprise SaaS company is using Custom Fields to identify the region, components, and customers impacted by an incident. When their responders open up an incident record, Custom Fields aggregate this critical information from across different systems in one clear and consistent place. This enables responders to quickly understand the downstream implications of the incident at hand. 

2. Sync with important ITSM data 

Many organizations use both PagerDuty and an ITSM ticketing system. Sometimes, it’s necessary to work with data from both at the same time. Rather than flipping between tabs to search for information, one financial institution is using Custom Fields to add the relevant ITSM fields to the incident details page in PagerDuty. For example, you could attach an ITSM incident or problem ID number to your view in PagerDuty. 

3. Attach links to third-party or homegrown tools 

It’s often useful to link to supporting tools directly from a PagerDuty incident. This could be a homegrown tool or a third-party vendor for documentation, for example. One travel company is using PagerDuty Custom Fields to append third-party postmortem links to the associated incident. This makes it easy to track and cross-reference information. It also helps the organization enforce adherence to their 2-week SLA for postmortems on incidents.

4. Connect conference bridges for different regions

One multinational financial institution is using Custom Fields to attach different conference bridges for their operations centers and stakeholders that are spread across multiple geographical regions. In particular, they are using this new flexible field to capture the ‘stakeholder bridge’ which happens to be a URL. Now that the different groups no longer have to chase down links and phone numbers from separate sources, it makes bringing everyone together faster and simpler than ever. 

5. Assign incident response roles

There are several roles that need to be assumed during an incident response. These include, but are not limited to, incident commander, deputy, scribe, and subject-matter expert. To make those role designations clearer and keep the team operating smoothly, one automotive services company is using Custom Fields to add response roles. Now the team never has to question roles and responsibilities for a given incident, whether they are actively working on a resolution or reviewing past data. 

Configuration steps to create a new custom field called “Incident Commander”

Conclusion

These use cases are a starting point to understand how Custom Fields can bring value to your organization, but the sky is the limit for other ways you might apply them. Whatever your use case, Custom Fields will help you utilize PagerDuty as a single place to manage incidents end-to-end. Custom Fields on Incidents are generally available via web, mobile and the API for customers on Business and Digital Operations plans. If you are an existing customer, you can try out Custom Fields today. If you’re a prospect or on a lower-tier PagerDuty plan, check out the product tour to see Custom Fields in action.

AIOps and Automation: A Conversation Featuring Guest Speaker Carlos Casanova, Forrester Principal Analyst
by Heath Newburn
https://www.pagerduty.com/blog/heath-newburn-speaks-with-carlos-casanova/
Fri, 09 Jun 2023 12:00:36 +0000

At the beginning of 2023, I had a great conversation with Carlos Casanova, a Forrester Principal Analyst, in a webinar about how AIOps can help drive successful organizational change. In our conversation, Carlos divided the AIOps market into two camps: technology-centric (primarily APM/Observability players) and process-centric. PagerDuty is a process-centric solution leveraging multiple technologies.

With process-centric AIOps solutions, organizations gain additional context and insights into their data. This reduces the time to act, helps improve data quality, enhances decision-making, improves routing and notification efficiency, and ultimately increases the value of services delivered by IT.

This ability to increase speed with greater context shrinks the duration of critical incidents. Importantly, the initial routing can be to a virtual operator, meaning that automation could gather additional triage/debug information or potentially complete a fix before engaging a human responder.

Throughout our conversation, Carlos and I kept returning to the theme of creating better context for responders. When I asked him about what capabilities he sees as most important for solving core AIOps use cases, he said, “Quickly identifying the correlation across disparate alerts drastically reduces the noise that individuals are dealing with. Providing all impacted individuals with this clean data signal is vital to improving operations. With this data, individuals can more easily and quickly garner insight into what is truly going on in the environment. They can then quickly determine the right actions to take, decide who needs to be involved for faster remediation, and reduce the amount of effort necessary, which frees up time for other events and alerts.”

But teams often struggle with getting started. We agreed that the cost of waiting and planning probably outweighs the cost of starting and iterating. He added, “The overall initiative may look daunting, but there are achievable quick wins. Waiting is not recommended. Start with small tactical efforts that roll up to your larger and longer-term strategic goals to show progress, demonstrate value, and build momentum.”

So speed is also a continuous theme: quickly getting context, rapidly responding with automation, and starting the process immediately to see these wins. But we also know that the pressure has continued to grow. 

Teams have been affected by the economic downturn and slowdown. When I asked him about how teams can increase efficiency and measure success, we spoke about automation being key to success.

Carlos responded, “Simple scenarios that occur often are great candidates for automating all or part of their remediation. Fully or even partially automating five or 10 simple scenarios instantly frees up large amounts of time for individuals to focus on the more complex scenarios that organizations might not feel comfortable automating.”

But we also have to recognize the forming, storming, and norming before we get to performing in projects. There will be changes to how we measure and think about success that we have to embrace. 

“AIOps can also empower IT to alleviate workloads to help their delivery teams ‘do more with less.’ It’s important to remember that these changes invalidate existing metrics. You must establish new baselines, since individuals will no longer be performing the simple and low-level actions. For example, a technician manually resolves 300 incidents per week. Thirty are simple and have easily automated remediations. The MTTR on these might drop by 90%. Elimination of the simple incidents, however, only allows the technician to take on 10 medium-complexity incidents in their place. That means the technician will handle 20 fewer incidents per week. The average MTTR for the technician will go up, and incidents will stay in their queue longer, with a higher ratio of medium- and high-complexity incidents,” Carlos said.
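
The arithmetic in this example can be checked with a short Python sketch. The incident counts come from the quote; the per-incident MTTR figures (10 minutes for simple incidents, 60 minutes for everything else) are assumptions added purely to show why the technician's average rises.

```python
def technician_week(total=300, simple=30, extra_medium=10,
                    simple_mttr_min=10.0, other_mttr_min=60.0):
    """Compare a technician's weekly numbers before and after automating
    the simple incidents. MTTR values are illustrative assumptions."""
    # Before: the technician handles everything, including the quick simple ones.
    before_avg = (simple * simple_mttr_min
                  + (total - simple) * other_mttr_min) / total
    # After: simple incidents are automated; some capacity goes to harder work.
    handled_after = total - simple + extra_medium
    # Only non-simple incidents remain in the queue, so the average is theirs.
    after_avg = other_mttr_min
    return {
        "handled_after": handled_after,
        "fewer_incidents": total - handled_after,
        "avg_mttr_before_min": before_avg,
        "avg_mttr_after_min": after_avg,
    }
```

Under these assumptions the technician handles 280 incidents instead of 300, and their average MTTR rises from 55 to 60 minutes even though overall operations improved, which is why the old baseline no longer measures success.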

One of the most common questions I run into is how to get started. Traditionally, AIOps is viewed as a potentially years-long initiative. It can be daunting to begin the journey with so much uncertainty and change. PagerDuty has greatly simplified the process by crafting a one-click process for event correlation so teams can see value immediately, but this isn’t the end of the journey to AIOps.

Carlos shared his insights on getting started, as well as facing the reduction in available OpEx. “Budgets are always a challenge, but to a large extent, you can overcome that hurdle by demonstrating and clearly articulating the value of AIOps. Develop a narrative for your business case that speaks to the value of improved experiences with the organization. Demonstrate how improved routing and notifications with enhanced contextually relevant data enables the same workforce to handle more workloads with less effort. Explain how patterns and trends empower lower-level resources to execute more advanced actions because they are provided suggestive actions that are based on the more experienced and senior staff members. All of this helps organizations deal with the economic challenges they’re currently facing while also improving the quality of products and services they deliver. It’s important for organizations to demonstrate their chosen solution has a fast time to value. For example, to improve user experiences, how quickly can the solution provide complete visualizations of transactions to support personnel to resolve an outage? To provide a faster response time, how quickly can the solution analyze the environment and correlate new alerts into singular incidents that can be handled immediately or in an automated fashion? Time to value is vital in difficult economic times.”

Time to value can be even more important than ROI for many of our customers. Speed is what will delineate winners and losers in digital battlegrounds. How quickly we can deal with inevitable issues and iterate improvements is what sets teams apart from competitors and provides an excellent customer experience.

As I&O leaders work through economic uncertainty that’s forcing them to cut costs and do more with less, they require new tools and approaches that help them scale and optimize their existing resources. AIOps provides teams with a reliable way to process high volumes of data and events, manage routing and response in real-time, and help teams resolve incidents faster. If you’re interested in learning how to tackle those challenges for your business, watch this webinar to hear the rest of my conversation with Carlos.  

PagerDuty Launches New Innovations to Reduce Tool Sprawl and Optimize Operations
by Ariel Russo
https://www.pagerduty.com/blog/pagerduty-launches-new-innovations-to-reduce-tool-sprawl-and-optimize-operations/
Tue, 23 May 2023 12:00:30 +0000

The post PagerDuty Launches New Innovations to Reduce Tool Sprawl and Optimize Operations appeared first on PagerDuty.

]]>
GIF of PagerDuty Operations Cloud highlighting products: Incident Response, Process Automation, AIOps, and Customer Service Ops

The number of tools used by distributed teams to manage incidents has multiplied over the years, leading to a valley of tool sprawl. Throw in manual processes and you’ve got too much toil and multiple points of failure. Maintaining disparate tools and systems isn’t just unwieldy, it’s expensive. 

Our latest capabilities add to the PagerDuty Operations Cloud to make it easier than ever for teams to consolidate their incident management stack. New innovations coming to Incident Workflows, Custom Fields on Incidents, and Status Update Notification Templates will further help organizations shift from a manual, reactive state towards a more proactive, preventative approach to incident response.

When used together in an integrated fashion, these features create a multiplier effect, delivering an unparalleled level of operational efficiency and business acceleration. This interoperability is core to what allows the PagerDuty Operations Cloud to empower organizations to manage incidents from ingest to resolution on a unified platform, without the need for third-party tools and homegrown solutions. Let’s take a closer look at what’s new, or check out the updates for yourself in the product tour.

Custom Fields to enrich incident data

No more chasing information across disparate systems–capture incident context in one centralized place with Custom Fields on Incidents! Custom Fields allow teams to pull in important incident data from any system of record and put it at the fingertips of responders so they have the information needed to resolve incidents faster. Custom Fields on Incidents will be available on web, mobile, and through the API. Sign up for early access.

Menu to select custom field values to add to the incident details page

Enhanced templates for stakeholder communications 

Automate how status updates are created to drive efficiency and consistency, rather than manually crafting update messages from scratch. Response teams now have access to an expanded set of fields in their templates, including “Business Impact,” “Conference Bridge,” and “Slack Channel.” Templates will soon also support Custom Fields (sign up for Early Access). These new fields help response teams add important context about the incident at hand to their communications to stakeholders. They can also create communications from templates as part of an Incident Workflows action.

Menu for creating a status update from a template within Incident Workflows

Integration between Incident Workflows and ServiceNow and Jira Server

Improve the effectiveness and efficiency of your ITSM tools. PagerDuty customers can now run PagerDuty Incident Workflows from ServiceNow incident records and Jira issue records. This means customers can access powerful workflow automation from the places they already work. This functionality is now available in the v7.9 ServiceNow application (Utah certified) and in v4 for Jira Server. To learn more, check out the KB articles for ServiceNow and Jira Server integrations.

Dashboard of ServiceNow "Run a Workflow" option.

ServiceNow and PagerDuty integration of running workflow.

Expanded Incident Workflow actions 

Reduce operating costs by automating manual steps of the incident response process using Incident Workflows. Today we’re announcing a new set of actions planned for launch in Q2 that further expands the range of PagerDuty features that can be automated through Incident Workflows. These actions include running Automation Actions, using Status Update Notification Templates to send a status update, creating a Microsoft Teams meeting or channel, adding a note to an incident, reassigning an incident, and changing incident priority.

Incident Workflow builder, now with new workflow actions including Create MS Teams meeting and Run Automation Actions

Example of an Incident Workflow, broken down by steps

Conclusion

Today’s announcement summarizes a few of the ways that PagerDuty is designing our products and features to help our customers mitigate risk to revenue and minimize toil by helping them manage incidents end-to-end. In building our products cohesively as a platform for action, we can enable teams to automate and accelerate critical work–to ultimately transform operations and move business forward faster. The power of the PagerDuty Operations Cloud lies in the seamless integration across the entire product suite. These features work in concert, lowering the barrier to adopting more proactive, preventative processes.

If you’d like to learn more about the latest release, register for our launch webinar. Our product team will be diving into and demoing these features. 

To see the latest features in action, check out our product tour.

]]>
Learn How PagerDuty Customers Save Money and Achieve Fast ROI by Rachel Schmitz https://www.pagerduty.com/blog/customers-save-time-and-money/ Fri, 19 May 2023 12:00:56 +0000 https://www.pagerduty.com/?p=82441 Saving time and money is always important, but these days, it’s a mission-critical business imperative. At PagerDuty, we help organizations realize transformational gains in efficiency...

The post Learn How PagerDuty Customers Save Money and Achieve Fast ROI appeared first on PagerDuty.

]]>
Saving time and money is always important, but these days, it’s a mission-critical business imperative. At PagerDuty, we help organizations realize transformational gains in efficiency that drive both immediate financial impact and long-term business success. 

PagerDuty delivers clear value for any organization at any stage of operational maturity. 

  • $356K savings/year per team of ten.1
  • 70% faster time to resolve.1
  • 795% return on investment.1
  • 2-month payback period.1

But you don’t have to take our word for it – the real-life experiences of our customers speak volumes. Here are a few examples of how PagerDuty creates value for global industry leaders. 

The Value SAP Receives from PagerDuty

SAP is a market leader in enterprise application software. And with more than three quarters of the world’s transaction revenue touching an SAP system, uptime is critical. 

SAP needed to digitally transform its business and move customer-facing services to the cloud. They also needed fewer and less severe incidents, since each one could hurt the customer experience and put revenue at risk.

The task initially appeared easier said than done, especially given SAP’s size. Many teams were using custom in-house tools that weren’t scalable across the organization. There were “islands” of automation where certain sub-processes were moving quickly, but this acceleration wasn’t happening at scale. The exceptionally wide variety of tools and processes across business units and global theaters also made collaboration particularly burdensome. 

SAP’s Global Cloud Services team now uses PagerDuty to orchestrate their major incident response. We helped improve communication between teams and stakeholders, providing real-time information about the status of an incident and often reducing response time from hours to minutes. 

"25% reduction in the number of responders needed for major incidents within 2 months"

PagerDuty helped SAP achieve incredible results in just a couple of months, including: 

  • 25% reduction in the number of responders needed for major incidents.
  • 30% reduction in response times.
  • 26% reduction in resolution times.
  • Greater cross-team collaboration and ownership of services. 
  • Seamless integration with various commercial and in-house tools. 

Read here to learn more about how SAP’s Global Cloud Services team improved operational excellence.  

The Value Brink’s Receives From PagerDuty 

Brink’s is a well-known leader in cash management, operating more than 16,000 secured trucks serving customers in more than 100 countries. Technology keeps the money moving—but a few years ago the company realized that to grow the business, its technology needed an upgrade.

Teams were managing workflows manually and spending too much time and money on repetitive, mundane tasks. Moreover, attempts to deploy changes in the IT environment were both time consuming and inconsistent. That’s when Brink’s decided to turn to us for help. 

PagerDuty Process Automation quickly demonstrated its value by reducing toil and facilitating faster deployments and migrations. This made employees’ lives easier while delivering agility, scalability, and savings to the business. The company further expanded PagerDuty-powered automation to other stakeholders and services, for example to reduce the time it took engineers to provision virtual machines.

"By automating one workflow, Brink's saves over 500 hours annually"

By choosing an easy-to-use solution and automating well-documented processes, the Brink’s team realized immediate value and saw a fast return on investment, including:

  • 99% less time spent on manual tasks while reducing risk of manual errors. 
  • More than 500 FTE engineering hours saved annually.
  • Developers’ waiting time reduced from two weeks to three minutes via self-serve automated workflows.

Read here to learn more about how Brink’s successfully used automation to drive constant, iterative improvements to the business and, in turn, to its customers. 

Reduce Costs and Accelerate Growth

The PagerDuty Operations Cloud is the platform for action that empowers organizations to anticipate, automate, and accelerate critical work and to transform operations. It’s essential infrastructure that allows teams to focus on high-priority work, substantially reduce operating costs, and radically accelerate innovation and growth. 

The results from SAP and Brink’s showcase how PagerDuty helps them save time and money, and that value holds for customers of all sizes and industries.

Learn more about how PagerDuty can help you save time and money, or sign up for a free trial.

1 IDC Business Value White Paper, sponsored by PagerDuty, PagerDuty Helps Organizations Optimize Their Digital Operations Management, doc #US47011820, January 2021

]]>
The 4 Types of Incidents as Zombies from ‘The Last of Us’ by Hannah Culver https://www.pagerduty.com/blog/4-incident-types-as-zombies/ Thu, 18 May 2023 12:00:06 +0000 https://www.pagerduty.com/?p=82498 Seems like everyone has watched or is watching “The Last of Us.” This show is based on a video game of the same name. It...

The post The 4 Types of Incidents as Zombies from ‘The Last of Us’ appeared first on PagerDuty.

]]>
Seems like everyone has watched or is watching “The Last of Us.” This show is based on a video game of the same name. It features Pedro Pascal (from “The Mandalorian”) and his latest surrogate child, Bella Ramsey (from “Game of Thrones”). But this adventure is challenging for a plethora of reasons. Most notably, zombies. In 2003, a fungus, Cordyceps, brought on a global zombie pandemic. Twenty years later, a few humans are trying to endure and survive in what’s left. Spoiler alert for anyone who hasn’t watched season one yet: it’s hard. And zombies are scary.

While incident response is rarely life or death, it can be an adrenaline spike akin to watching the show. And some of the incidents you may face have similarities to the zombies we’ve seen so far in “The Last of Us.” These incidents have a “headshot” that can help you survive against all odds.

Runners

The first zombies we see in “The Last of Us” are runners. These are fresh and may still look human compared to ones that have been infected longer. While easy to kill, there’s one factor that makes them dangerous: you never expect them. They’re novel. For those who were around for the 2003 end of the world, zombies were only a work of fiction. Nobody prepared for the end of the world (except for Bill, *sob*). For those people hanging on in 2023, runners are still jarring. They’re usually someone you know. A friend. And they change fast, as we saw at the end of episode five.

If we had to compare this one to an incident, it’d be the one that happens out of nowhere and is rare or an anomaly. The system is fine! Then it’s not and you’re thinking, “How did I miss that?” So what do you do about it? Look for the signal in the noise that tells you something is going wrong. For those infected, this could be twitching, coughing or unexpected mood swings.

There are similar warning signs for an incident. Latency a bit high? Could be nothing. But combine it with customer support noting an increase in complaints about slowness? You may have a runner. Monitoring only gets you so far. You need to make sense of the data, both from machines and humans. Correlating that data with changes in your ecosystem can help you attack your runner before it bites you.
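
To make that correlation idea concrete, here’s a minimal sketch in Python. It assumes each signal (a latency blip, a support complaint, a deploy) has already been normalized into a small record. The `Signal` type, the source labels, and the `find_runners` helper are all hypothetical names for illustration, not part of any PagerDuty product. The idea is simply to flag a service when signals from several different sources land close together in time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Signal:
    source: str    # hypothetical labels, e.g. "latency", "support", "deploy"
    service: str
    at: datetime


def find_runners(signals, window=timedelta(minutes=30), min_sources=2):
    """Return services where at least `min_sources` distinct sources
    reported something within a single time window."""
    by_service = {}
    for s in sorted(signals, key=lambda s: s.at):
        by_service.setdefault(s.service, []).append(s)

    flagged = []
    for service, items in by_service.items():
        for i, first in enumerate(items):
            # Collect every source seen within `window` of this signal.
            sources = {s.source for s in items[i:] if s.at - first.at <= window}
            if len(sources) >= min_sources:
                flagged.append(service)
                break
    return sorted(flagged)
```

A single latency blip on its own stays quiet. The same blip next to a deploy and a handful of support complaints gets flagged. The window size and the number of sources required are judgment calls you’d tune for your own environment.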

Stalkers

These are your garden-variety zombies. They’re not too difficult to kill. They don’t have any special abilities to speak of. And you can almost always expect them. Going down into the basement of an abandoned gas station? Of course you’ll find a stalker. Empty mall? Yep, you should have known, Ellie and Riley! Stalkers aren’t fun by any means, and can be deadly. More often than not, though, the average survivor can take care of a stalker. But what happens when there are a few stalkers all at once? Or you find 12 stalkers back to back, all in the same day? What if you’re fighting two to three stalkers every day for a year?

You can see where I’m going with this. Stalkers are like death by a thousand cuts. The more you have to tangle with them, the more dangerous they are. Like your most common incidents. They’re not fire drills, they’re annoying. And one isn’t so bad, but one every single day hurts. It takes time away from value-add work to fix something that you’ll need to fix again soon.

Automation isn’t something that Joel and Ellie can do in their world. But in our zombie-free existence, we can apply it to make incident response more efficient. For well-understood issues and incidents that happen frequently, crafting auto-remediation to resolve the problem without human intervention can immediately add time back to your day. And, it’s a great way to drive automation initiatives within the organization. Solving this small but frequent problem has a direct ROI associated with it. Leverage that to further automation initiatives for other types of incidents.
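
As a rough illustration of what auto-remediation for a stalker can look like, here’s a small Python sketch. Everything in it is an assumption made for the example: the incident types, the `restart_worker` and `clear_disk_cache` stand-ins, and the `handle` function. In practice each remediation would call your own tooling. The pattern is what matters: known problem, known fix, and a human only when the fix is missing or fails.

```python
from typing import Callable, Dict


# Hypothetical remediation steps; real ones would call your own tooling.
def restart_worker(incident: dict) -> bool:
    return True  # pretend the restart succeeded


def clear_disk_cache(incident: dict) -> bool:
    # Pretend the cleanup cannot help once the disk is almost completely full.
    return incident.get("disk_used_pct", 0) < 99


# Map each well-understood incident type to its automated fix.
REMEDIATIONS: Dict[str, Callable[[dict], bool]] = {
    "worker_stuck": restart_worker,
    "disk_almost_full": clear_disk_cache,
}


def handle(incident: dict) -> str:
    action = REMEDIATIONS.get(incident["type"])
    if action is None:
        return "escalate"       # unknown problem: page a human
    if action(incident):
        return "auto_resolved"  # fixed without waking anyone up
    return "escalate"           # the fix did not work: page a human
```

Starting with one or two entries in that mapping is enough to measure the time saved, which is the ROI story you can take to the rest of the organization.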

Clickers

Clickers are ominous, obsessive hunters that are harder to kill. As they’re blind, they use echolocation to hunt their prey. Headshots don’t work as their heads are armored with tough fungus. They’re one of the most feared and hated types of zombies in “The Last of Us,” and it’s easy to see why. Can you imagine coming up against this thing and realizing your typical solution doesn’t work the way it should? And against a more dangerous enemy?

This one may be the hardest to correlate to an incident, because clickers seem to be almost impossible to kill in the show. Everyone’s advice? Run. Before they hear you. But with incidents, you can’t do that. So, if this zombie was an incident, it would be the one that only two or three people have seen before. You’ve heard about this issue, and it’s from deep in the tech stack. But not enough people who knew about this incident shared with the class. When it happens, it feels like a bigger issue than it is.

Like a knife to the neck of a clicker, there’s a solution to this type of incident. And success comes down to the same thing: knowledge and a plan. If you know that a clicker’s head has armor, you go for the neck. It’s close combat, but effective. And since enough people have survived clickers, the knowledge spread across the surviving population.

For an incident, the best way to fix your clickers is documentation, runbooks, and historical context. Someone knows how to resolve the problem. If they share this knowledge, teams can document the process and create a runbook for the next time this scary (but repairable) problem happens. Additionally, teams can rely on AI to surface past incident data. Look-alike incidents have lots we can learn from. This past incident data helps teams understand what worked for an incident and what didn’t. If you don’t have AI to assist, you can always scan through old retrospectives as well for this historical context. Centralizing all this information is also important so that everyone can find it. That way, you may not know how to solve every problem that happens, but you know how to find that knowledge. There’s power in that, even if there’s no perfect “headshot.”
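
If you don’t have AI to do the matching, even a crude look-alike search over past incident titles can point responders at the right runbook. The Python sketch below uses plain word overlap (Jaccard similarity), which is a deliberately simple stand-in and not how any particular AI feature works. The incident records, the `runbook` field, and the `similar_incidents` helper are hypothetical.

```python
def tokens(text):
    """Lowercase a title and split it into a set of words."""
    return set(text.lower().split())


def similar_incidents(new_title, past, top_n=3, min_score=0.2):
    """Rank past incidents by word overlap with the new incident title."""
    new = tokens(new_title)
    scored = []
    for incident in past:
        old = tokens(incident["title"])
        union = new | old
        # Jaccard similarity: shared words divided by all distinct words.
        score = len(new & old) / len(union) if union else 0.0
        if score >= min_score:
            scored.append((score, incident))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [incident for _, incident in scored[:top_n]]
```

This only works if past incidents, retrospectives, and runbooks live somewhere searchable, which is the centralization point above.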

Bloaters

Bloaters look more like the demogorgons in “Stranger Things” than something that was, at one point, a human. They kill most people in the vicinity either by brute force or toxic clumps of fungus that they toss in the air like grenades. We only saw one of these in “The Last of Us” so far and it made quite the impression, annihilating most of the fighting population of Kansas City. Bloaters should be avoided at ALL costs. And any signs of them should be dealt with early before the issue compounds. Remember how the zombies were filling up the tunnels and the rebels had other initiatives to take care of? Yeah, that was technical debt and someone should have fixed it.

But that’s the way it goes. You know there’s a problem, even if you don’t know exactly how it’ll manifest. Then you’ve got a major incident on your hands–a bloater. And the best and only real way to deal with these is with a coordinated, end-to-end incident response. Make sure that you understand key components of incident response such as:

  • Escalation policies
  • Roles and responsibilities during the incident
  • Communication standards, both internal and external
  • Workflows that you can trigger automatically to take the heavy lifting off responders

With these plans in place, you will be able to resolve the incident more smoothly, faster, and with less customer impact.
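
To show what the first item on that list means in practice, here’s a toy escalation policy in Python. The `Level` type, the targets, and the timeouts are made up for illustration. A real policy would live in your incident management platform rather than in a script. The logic is the part worth understanding: each level gets a fixed amount of time to acknowledge before the incident moves up.

```python
from dataclasses import dataclass


@dataclass
class Level:
    target: str       # who gets notified at this level
    timeout_min: int  # minutes to acknowledge before escalating


def current_target(levels, minutes_unacknowledged):
    """Return who should be notified after the given unacknowledged time."""
    elapsed = 0
    for level in levels:
        elapsed += level.timeout_min
        if minutes_unacknowledged < elapsed:
            return level.target
    # Every timeout has passed: stay with the last level.
    return levels[-1].target


policy = [
    Level("primary on-call", 5),
    Level("secondary on-call", 10),
    Level("engineering manager", 15),
]
```

With this policy, an incident that sits unacknowledged for five minutes moves from the primary to the secondary on-call, and after fifteen minutes it reaches the engineering manager.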

What zombie are you worried most about?

What’s keeping you up at night? Fear of an impending bloater, or notifications about yet another stalker? While we may not find the cure for zombies in “The Last of Us,” we can work on technology incidents and make those easier and less catastrophic for us and our customers.

PagerDuty is here to help you improve your digital operations. Whatever challenges you’re facing right now, our team can help you endure and thrive, not just survive. Check out our weekly demos to learn more.

]]>