automation | Tags | PagerDuty Build It | Ship It | Own It Wed, 16 Aug 2023 18:59:17 +0000

Three Teams That Can Use AIOps to Work Smarter, Not Harder by Hannah Culver https://www.pagerduty.com/blog/3-use-cases-for-aiops/ Mon, 28 Aug 2023 12:00:29 +0000

The post Three Teams That Can Use AIOps to Work Smarter, Not Harder appeared first on PagerDuty.

There isn’t a boardroom today that isn’t asking how AI and generative AI applications can help drive efficiency and accelerate the business. For organizations looking to capitalize on ML and automation to improve their efficiency during incidents, AIOps is a tangible, proven application and an exciting opportunity for ITOps teams.

As we’ve seen across market landscape evaluations, there are a number of ways that solutions can be implemented. Despite this, the outcomes AIOps solutions aim to deliver remain fairly consistent: fewer incidents and faster resolution. But which teams stand to benefit from this powerful technology, and how will AIOps help them achieve their desired business outcomes?

Understanding how different teams can implement best practices to reduce MTTR, total incidents, and time to adopt automation will help ensure that each team gets value from your investment. Here are three teams that stand out as having much to gain from leveraging AIOps: Network Operations Center (NOC) teams, Major Incident Management (MIM) teams, and distributed service owning teams. Let’s cover each.

NOC teams

If you have a NOC, it acts as your central nervous system. You may also be in the middle of undertaking modernization efforts to reduce both cost and risk.

Many of our NOC customers tell us about challenges such as:

  • Eyes-on-glass operational style causes incidents to go undetected
  • Catch and dispatch means too many escalations to SMEs or routing incidents to the wrong team
  • Manual work drives up MTTR
  • L1/L2 teams experience high turnover and blame culture is common

To move beyond this, organizations can create L0 automation. This is automation that serves as the first responder, only bringing in humans when necessary. For well-understood, well-documented issues, L0 automation can auto-remediate incidents without a responder intervening. But for other more complex issues that require a hands-on approach, NOC teams can create L0 automation that immediately pulls in diagnostic information before the responder looks at an incident, routes incidents intelligently according to event data, and populates the incident notes with pertinent documentation and runbooks.
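
The L0 pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PagerDuty's implementation: the `Incident` fields, the `AUTO_REMEDIATIONS`, `ROUTING` and `RUNBOOKS` tables, the team names and the example URL are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Incident:
    title: str
    event_class: str  # hypothetical classifier, e.g. "disk_full"
    service: str
    notes: List[str] = field(default_factory=list)
    assigned_team: str = ""
    resolved: bool = False


def clear_temp_files(incident: Incident) -> bool:
    """Hypothetical remediation for a well-understood issue."""
    incident.notes.append("Auto-remediation: cleared temp files")
    return True


# Assumed lookup tables; in practice these would come from your own tooling.
AUTO_REMEDIATIONS: Dict[str, Callable[[Incident], bool]] = {
    "disk_full": clear_temp_files,
}
ROUTING: Dict[str, str] = {"payments-api": "payments-oncall"}
RUNBOOKS: Dict[str, str] = {
    "db_replication_lag": "https://wiki.example.com/runbooks/replication-lag",
}


def collect_diagnostics(incident: Incident) -> str:
    """Placeholder for real diagnostic collection (logs, metrics, etc.)."""
    return f"Diagnostics for {incident.service}: (placeholder output)"


def l0_first_responder(incident: Incident) -> Incident:
    # 1. Well-understood issue: remediate without paging a human.
    remediate = AUTO_REMEDIATIONS.get(incident.event_class)
    if remediate is not None and remediate(incident):
        incident.resolved = True
        return incident
    # 2. Otherwise prepare the incident before a responder looks at it.
    incident.notes.append(collect_diagnostics(incident))
    incident.assigned_team = ROUTING.get(incident.service, "noc-l1")
    runbook = RUNBOOKS.get(incident.event_class)
    if runbook:
        incident.notes.append(f"Runbook: {runbook}")
    return incident
```

In this sketch a "disk_full" incident is closed automatically, while a replication-lag incident on "payments-api" reaches the owning team with diagnostics and a runbook link already attached.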

PagerDuty AIOps helps NOCs modernize and move away from eyes-on-glass methods. These NOCs are a center of excellence within their organizations, spearheading data-driven optimization, enabling best practices, and ensuring incident readiness.

MIM teams

When critical, customer impacting incidents happen, you don’t have time to waste. But, with complexity and noise on the rise, how do Major Incident Management teams improve to meet growing customer expectations?

We see MIM teams with common challenges such as:

  • Finding out about major incidents from an overwhelming number of customers or users calling in, or from delayed team escalations
  • Lack of context as initial triage takes too long to assess severity and business impact
  • Long MTTR waiting for the right people, the right diagnostics, the right runbooks, etc.
  • Disjointed tooling leading to communication barriers for responders and corresponding teams

MIM teams can overcome these challenges with a variety of automation and ML tactics. First, organizations can create automation that immediately routes high-priority or high-severity incidents to a MIM team and tags in the appropriate teams via incident workflows. Additionally, ML can gather key context such as how rare an incident like this is, whether it has happened before and how it was resolved, and which change events might be correlated with the failure.
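
As a rough illustration of the routing and context-gathering ideas above, the Python sketch below routes by priority and looks up similar past incidents. It is a simplified stand-in: exact-match lookup replaces the ML described in the text, and the `PastIncident` record, the `MAJOR_PRIORITIES` set and the team names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PastIncident:
    service: str
    summary: str
    resolution: str


@dataclass
class TriageContext:
    route_to: str
    similar_count: int  # how often this kind of incident has been seen
    last_resolution: Optional[str]  # how it was resolved last time, if ever


# Assumption: P1 and P2 are the priorities handled as major incidents.
MAJOR_PRIORITIES = {"P1", "P2"}


def triage(priority: str, service: str, summary: str,
           history: List[PastIncident]) -> TriageContext:
    # Exact matching stands in for a real similarity model.
    similar = [p for p in history
               if p.service == service and p.summary == summary]
    if priority in MAJOR_PRIORITIES:
        route_to = "major-incident-team"
    else:
        route_to = f"{service}-oncall"
    last_resolution = similar[-1].resolution if similar else None
    return TriageContext(route_to, len(similar), last_resolution)
```

A responder who opens the incident would then see at a glance whether the failure is rare or recurring and what fixed it previously.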

PagerDuty AIOps helps MIM teams detect major incidents faster, improve MTTR and customer experience, and save SMEs time. This reduces the cost of each incident and mitigates risk.

Distributed service owning teams

DevOps and distributed service owning teams are under more pressure than ever to deliver exceptional customer experiences. But with competing priorities and fewer resources, this is easier said than done.

Many of our customers share challenges they are facing such as:

  • Disparate monitoring tools with no central pane of glass
  • Too much noise leading to incorrect escalations and false incidents
  • Lack of context and information silos
  • Toil and time taken away from value-add initiatives

For service owning teams looking to overcome these challenges, an AIOps tool that can aggregate data from all the monitoring sources in the technical ecosystem can help bring clarity to incident response. Additionally, with ML, teams can reduce noise by automatically grouping together alerts based on context, time, and previous event data that the model has trained on. With this and the ML-surfaced triage information, incident response is streamlined so teams can get back to innovating faster.
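
To make the noise-reduction idea concrete, here is a minimal Python sketch that groups alerts from the same service when they arrive close together in time. It is a rule-based stand-in for the ML grouping described above, not the model itself; the `Alert` tuple shape and the 300-second default window are assumptions.

```python
from typing import Dict, List, Tuple

# (timestamp in seconds, service, message); a hypothetical alert shape
Alert = Tuple[float, str, str]


def group_alerts(alerts: List[Alert],
                 window_seconds: float = 300) -> List[List[Alert]]:
    """Group alerts per service when each arrives within the window
    of the previous alert for that service."""
    groups: List[List[Alert]] = []
    # service -> (index of its open group, timestamp of its latest alert)
    last_seen: Dict[str, Tuple[int, float]] = {}
    for alert in sorted(alerts, key=lambda a: a[0]):
        ts, service, _ = alert
        prev = last_seen.get(service)
        if prev is not None and ts - prev[1] <= window_seconds:
            groups[prev[0]].append(alert)
            last_seen[service] = (prev[0], ts)
        else:
            groups.append([alert])
            last_seen[service] = (len(groups) - 1, ts)
    return groups
```

Each resulting group could then be treated as one incident rather than several separate pages.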

PagerDuty AIOps helps service owning teams spend less time firefighting, reduce MTTR, and create exceptional customer experiences. This improves culture and team retention while increasing revenue for the entire organization. 

Ready to get started?

With PagerDuty AIOps, teams like the ones we looked at see 87% fewer incidents, 14% faster MTTR, and 9x faster automation adoption. This helps organizations move faster, focus on the work that matters most to customers, and reduce risk and team burnout. Best of all, teams from dev to IT can see value from PagerDuty AIOps.

PagerDuty AIOps works in conjunction with the rest of the PagerDuty Operations Cloud to help organizations manage their operations by leveraging AI and automation to supercharge their digital transformation. With over 700 integrations, GenAI capabilities, and end-to-end event-driven automation, PagerDuty gives customers a 400% ROI and the right tools to leapfrog the competition.

To try PagerDuty AIOps out yourself, you can take an interactive product tour or try us for free for 14 days.

Automating Edge Computing with PagerDuty by Nisha Prajapati https://www.pagerduty.com/resources/solutions-brief/automating-edge-computing/ Wed, 16 Aug 2023 18:59:17 +0000 The post Automating Edge Computing with PagerDuty appeared first on PagerDuty.

Top Ten Toilsome Tech Tasks to Automate Today by Nisha Prajapati https://www.pagerduty.com/resources/ebook/top-ten-toilsome-tech-tasks-to-automate-today/ Tue, 15 Aug 2023 22:24:00 +0000 The post Top Ten Toilsome Tech Tasks to Automate Today appeared first on PagerDuty.

Day in the Life Video by Catherine Craglow https://www.pagerduty.com/resources/video/day-in-the-life-video/ Mon, 31 Jul 2023 13:04:17 +0000 The post Day in the Life Video appeared first on PagerDuty.

How to Maximize Time Savings and Reduce Toil During Incident Response by Laura Chu https://www.pagerduty.com/blog/how-to-maximize-time-savings-and-reduce-toil-during-incident-response/ Mon, 31 Jul 2023 12:00:44 +0000

The post How to Maximize Time Savings and Reduce Toil During Incident Response appeared first on PagerDuty.

Illustration of the PagerDuty Operations Cloud.

Incidents are a costly burden on businesses. Despite assembling the right people and teams, the manual work, tool setup and prolonged tasks can negatively impact customer experience. The need for adaptable processes to address diverse incident types further complicates the situation.

This is where the PagerDuty Operations Cloud steps in. It streamlines and automates the manual steps in the incident response process. The result is a cohesive, end-to-end incident management experience that frees up responders to focus on the critical thinking required to resolve the incident.

At the heart of the PagerDuty Operations Cloud lies Incident Response–the backbone for effectively managing an orchestrated response to address customer-impacting incidents. To help our customers build a resilient approach to digital operations, we aim to deliver a solution that is:

  • Automated to eliminate inefficiencies
  • Flexible to accommodate each team’s specific processes
  • Proactive to learn from failure and repeat incidents

This year, PagerDuty has introduced Incident Workflows, Custom Fields on Incidents and Status Update Notification Templates. These latest additions work in concert to further streamline incident management processes, enabling you and your team to focus on resolving incidents and delivering exceptional digital experiences to your customers. With every minute mattering in incident response, saving time during every step of the process becomes crucial, leading to a positive and impactful transformation in your business operations.

Here are Three Ways to Cut Down Incident Time

Experience significant time savings with Incident Workflows

Incident Workflows, a powerful capability within PagerDuty, empowers you to easily customize workflows for different incidents and automate manual steps by integrating them into a unified process. With Incident Workflows, actions can be orchestrated based on the incident type via a customizable, user-friendly no-code/low-code builder. 

For example, let’s say your incident process requires five manual steps. With Incident Workflows, you can automate the entire process. 

Screenshot demonstrating how users can create different steps within an incident workflow.

Responders no longer need to worry about manual steps once the Incident Workflow is configured. Instead, they can initiate the appropriate incident workflow (e.g., P1, P2), allowing the PagerDuty Operations Cloud to coordinate the right teams to promptly address and resolve incidents. This gives teams more time back to focus on the task at hand: resolving the incident.

Screenshot showing how a list of workflows can be found by clicking “Run Workflow.”

Take advantage of our latest generally available Incident Workflow Templates, which enable you to quickly operationalize best practices for managing major incidents, standardizing collaboration tools and ensuring the right stakeholders are informed with the latest updates. These templates are designed to empower responders, who have not previously used Incident Workflows, to quickly adopt and implement this functionality, leading to faster incident resolution.

Screenshot showing choice of three Incident Workflow templates.

Better context for faster incident resolution

Context is key for responders during incidents. Having the right information is essential for sharing with other responders and helps guide their actions, such as sending status updates or writing a postmortem. Details such as “data regions” or “customer impact” help teams prioritize efforts effectively. To assist with this, PagerDuty introduced Custom Fields on Incidents.

This new feature allows teams to easily extract important incident data from any system of record and place it where responders can access it, whether on the incident details page or in a status update. PagerDuty empowers responders to save valuable time during triage and make more informed decisions by including relevant critical data.

Screenshot showing fields that allow you to customize the text so the information is the same and consistent across teams.

Simplify stakeholder updates with notification templates

Effective communication with key stakeholders during incidents is crucial. However, crafting these notifications can be time-consuming and resource-intensive. By using Status Update Notification Templates, you can leverage customizable templates that alleviate the strain of writing communications, streamline the process and reduce the time and effort required to share critical updates.

These templates eliminate the guesswork in formatting updates by providing pre-designed templates tailored to your organization’s needs. With Status Update Notification Templates, you can streamline the process of sharing incident updates, ensuring clear and consistent communication with stakeholders.

Screenshot showing a pre-designed template that can be customized to provide updates for stakeholders.
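
The templating idea in this section can be illustrated with Python's standard `string.Template`. This is a generic sketch, not PagerDuty's template syntax: the placeholder names (`priority`, `title`, `status`, `customer_impact`, `next_update`) are hypothetical fields chosen for the example.

```python
from string import Template
from typing import Mapping

# Hypothetical status update layout; the placeholder names are assumptions.
STATUS_UPDATE = Template(
    "[$priority] $title\n"
    "Status: $status\n"
    "Impact: $customer_impact\n"
    "Next update: $next_update"
)


def render_status_update(fields: Mapping[str, str]) -> str:
    # safe_substitute leaves unknown placeholders untouched instead of
    # raising, so a missing field stays visible in the draft.
    return STATUS_UPDATE.safe_substitute(fields)
```

Because missing values remain visible as `$next_update`-style placeholders, a responder can spot and fill gaps before the update goes out.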

Get 1+1=3 with the PagerDuty Operations Cloud

These features work great alone, but together they provide a better end-to-end incident management experience. With Incident Workflows, sending templated status updates becomes effortless, and soon, you’ll be able to include Custom Fields directly in those updates. For instance, imagine using a custom field to add an object like “data region” and seamlessly launch an Incident Workflow that includes a status update with the same custom field. In the near future, a responder will be able to automatically populate the same information to a Jira ticket or reassign the incident to the right regional responder. 

This powerful orchestration across a unified platform allows you to streamline work across the entire incident lifecycle for maximum time savings, resulting in faster resolution and better customer experiences without impacting revenue.  

Graphic showing Incident Workflows can be the orchestrator by taking text from Custom Fields and automatically updating Status Update Notifications Template or creating a Jira ticket.

Watch a demonstration of how these features work together.

Dynamic Digital Ecosystem

PagerDuty makes all of these capabilities available through a desktop web interface, mobile application, chat experience and API, so you can work in the way that suits you best.

Graphic displaying different mediums for the PagerDuty Operations Cloud (i.e., web, mobile, chat, API).

Don’t Wait, Try it Out

PagerDuty empowers you to streamline your incident response process by leveraging the PagerDuty Operations Cloud with Incident Workflows and integrating various tools and templates. This integration optimizes your incident management, ensuring fast and effective response. As a result, your organization can experience reduced operating costs while freeing up resources to prioritize innovation and growth. 

Curious to see these features in action? Embark on our Product Tour or try our free 14-day trial to witness the power of the PagerDuty Operations Cloud firsthand.

PagerDuty Extends Operations Cloud Leadership into AIOps and Automation by Jonathan Rende https://www.pagerduty.com/blog/pagerduty-extends-operations-cloud-leadership-into-aiops-and-automation/ Tue, 11 Jul 2023 22:51:12 +0000

The post PagerDuty Extends Operations Cloud Leadership into AIOps and Automation appeared first on PagerDuty.

Forrester Names PagerDuty a Leader in first-ever Process-Centric AIOps Wave

From helping pioneer the DevOps movement to establishing best practices around service ownership to being the standard in incident response, PagerDuty has a long history of leadership. PagerDuty is honored to add to this list and now be recognized as a leader in the AIOps and Automation space by Forrester. To explain why PagerDuty was listed as a leader, it’s important to look at our current economic climate and compare it to the past.

It’s been more than a decade since the last time the Three C’s–Cost Control, Consolidation (of vendors) and Compliance–received so much oversight and scrutiny. Just like in 2008, centralized decision making and cost controls are driving organizations to consolidate onto entire vendor suites rather than buying only best-of-breed products for each job.

It’s no surprise that additional financial oversight is now a part of every purchase, every budget item, and every activity for IT and development. Everyone expects more–and they expect it right now, not next quarter or even next year.

AIOps, however, has always promised to do more. For us, more is about exceeding SLAs, and improving availability and reliability. More is about cost savings because fewer humans should be needed in a major outage and incident. More should not only be about being more responsive, but be about preventing issues in the first place.

What’s Changed Since the Last Financial Crisis

In 2008, important financial institutions failed because of their credit and lending practices. This started a market downturn in which the global economy contracted. The result was cost controls and a need to improve business efficiency, which in turn drove more central decision making. It was a stark contrast to the strategy of top-line growth at all costs, which typically results in distributed decision making and vendor/tool sprawl.

So, what’s different now and why is PagerDuty’s AIOps a leading solution to look into?

  • Now (vs 2008) machine learning is and should be an operational part of every data centric digital business. PagerDuty’s AIOps solution has progressed fast over the last four years to help both reduce the time to resolve issues by 25% and reduce unnecessary interruptions (noise) by over 90%. 
    • By combining our event correlation (Intelligent event grouping) and event orchestration (event rules engine) with existing observability processes, we better target which experts are needed for which problems. We make escalation policies more effective and powerful.
    • PagerDuty’s AIOps product can also make those experts more productive when they do get called in to a major incident by automating the diagnostics process and pinpointing the offending or culprit services responsible for the problem.
    • Equally important, by combining event rules with automation jobs (event-driven automation), an entire class of lower priority problems can be remediated without human intervention, eliminating the need for responders or experts to engage at all.
    • Lastly, with generative AI, we have potentially democratized the tools, which should broaden use even faster.
  • Now (vs 2008) there is no need to hire expensive professional services or bring in tons of white lab coat experts to configure systems. You can and should demand to see the value in days and weeks, not months or longer. PagerDuty’s AIOps offers 5-10x reduction in time to value over alternative solutions.
  • Now (vs 2008) we have proven, high value products that offer AI as an integrated part of your existing practices vs separate bespoke solutions in event management. Whether you have centralized IT Operations (e.g., a network operations center with SREs), a decentralized operating model with service ownership by developers, or a combination of both (hybrid model), there is no need to add new vendors or build new approaches. PagerDuty’s AIOps solution works in all models.

The promise of off-the-shelf AIOps products is a reality. There are real products from established leaders like PagerDuty to apply against your needs and requirements.

And now, we’re proud to share PagerDuty’s AIOps leadership as part of the recent Forrester Wave. Consider this your personal guide to help in your AIOps journey. Enjoy.

What is Zero Trust Security and Why Should You Care? by Joseph Mandros https://www.pagerduty.com/blog/what-is-zero-trust-security-and-why-should-you-care/ Tue, 13 Jun 2023 13:00:45 +0000

The post What is Zero Trust Security and Why Should You Care? appeared first on PagerDuty.

Automation has become a game changer for businesses seeking efficiency and scalability in a rather unclear and volatile macroeconomic landscape. Streamlining processes, improving productivity, and reducing the incidence of human error are just a few benefits that automation brings.

However, as organizations embrace automation, it’s crucial to ensure modern security measures are in place to protect these new and evolving assets. While other security models control the majority of the narrative across the business landscape, zero trust is quickly emerging as a necessary security implementation concept.

With our recent release of the next-generation architecture for PagerDuty Runbook Automation and PagerDuty Process Automation, we are positioned as the ideal partner to help organizations implement and grow within a zero trust security architecture for the modern enterprise.

To learn more, keep reading or register for our webinar about zero trust security, happening this Thursday, June 15th, at 6 A.M. PT and 11 A.M. PT.

What is zero trust security?

Zero trust security is a model that challenges the traditional perimeter-based security approach by assuming that no user or device can be inherently trusted—regardless of their location. It emphasizes continuous verification and validation of identities, devices, and network traffic before granting access to resources. It achieves this through multi-factor authentication, granular access controls, encryption, and monitoring, enabling organizations to minimize the risk of data breaches and unauthorized access.

By shifting the traditional perimeter-based security paradigm and adopting a “trust no one” approach, zero trust security offers a holistic framework that aligns seamlessly with modern automation initiatives. Additionally, it can positively impact the process evolution of a business’ inner workings as the world becomes increasingly more complex—and prone to bank-breaking threats.

Source: https://www.microsoft.com/en-us/security/business/zero-trust

What’s the big deal?

Zero trust security often stands out as a superior approach compared to traditional security models, largely due to its fundamental shift to a modern technological mindset and comprehensive implementation.

Unlike perimeter-based security models that rely on the assumption that internal networks are inherently trustworthy, zero trust security adopts a “trust no one” philosophy. It implements strict access controls, continuous authentication, and rigorous monitoring at every level, ensuring that every user, device, and network component is treated as potentially untrusted. This approach significantly reduces the attack surface and prevents lateral movement within the network, making it highly effective against both external threats and insider risks.

Additionally, zero trust security provides adaptive access controls that dynamically adjust privileges based on context, bolstering security without impeding productivity. By combining strong authentication, encryption, and segmentation, zero trust security offers a holistic and proactive defense strategy that fortifies organizations against sophisticated threats, making it a superior choice for today’s dynamic and interconnected digital landscapes.
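
The "trust no one" evaluation described above can be sketched as a per-request check in Python. This is a simplified illustration of the principle, not a complete zero trust implementation: the `AccessRequest` fields, the `GRANTS` table and the example user are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class AccessRequest:
    user: str
    mfa_verified: bool
    device_compliant: bool
    on_corporate_network: bool  # deliberately ignored by authorize()
    resource: str
    action: str


# Hypothetical least-privilege grants: (user, resource) -> allowed actions.
GRANTS: Dict[Tuple[str, str], Set[str]] = {
    ("alice", "billing-db"): {"read"},
}


def authorize(req: AccessRequest) -> bool:
    """Evaluate every request on identity, device and explicit grant.
    Network location never grants trust on its own."""
    if not req.mfa_verified:
        return False
    if not req.device_compliant:
        return False
    return req.action in GRANTS.get((req.user, req.resource), set())
```

The point of the sketch is that `on_corporate_network` plays no role in the decision: a request from inside the perimeter is verified exactly like one from outside it.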

Businesses of all sizes can benefit from implementing a security model like zero trust, with contributing factors such as:

  • Protecting Sensitive Data: Zero trust security ensures that access to valuable data is strictly controlled and authenticated, reducing the risk of unauthorized access, data breaches, and potential financial and reputational damages.
  • Mitigating Insider Threats: Zero trust security addresses the risk of insider threats by assuming that no user or device should be implicitly trusted. This helps organizations identify and address potential risks before they cause harm.
  • Adapting to Evolving Cyber Threats: Traditional security models often rely on perimeter-based defenses, assuming that internal network traffic is safe. However, modern cyber threats—such as advanced persistent threats and zero-day exploits—can bypass traditional defenses. Zero trust security takes a more granular approach, implementing continuous auditing, multi-factor authentication, and strict access controls to protect against these evolving threats.
  • Supporting Remote and Mobile Workforces: With the rise of remote work and the increasing use of mobile devices, businesses face new challenges in securing their networks and data. Zero trust security allows organizations to implement secure access controls, regardless of the user’s location or device. This flexibility ensures that employees can work remotely while maintaining a strong security posture.
  • Meeting Compliance and Regulatory Requirements: Implementing zero trust security can help organizations meet compliance and regulatory requirements by enforcing access controls, monitoring data usage, and demonstrating a proactive approach to cybersecurity.
  • Building Customer Trust: In today’s data-driven world, customers value the security and privacy of their personal information. By implementing robust zero-trust security measures, businesses can build trust with their customers, demonstrating their commitment to protecting sensitive data and mitigating cyber risks.

PagerDuty Process Automation + Zero Trust

Digital transformation initiatives rely on cloud technologies to rapidly scale the business, but automating operations and cloud infrastructure brings new security challenges. The main challenge is that engineers need the most secure protocols to run automation in restricted application environments that mandate a zero trust architecture, where direct SSH zone access is deprecated.

Additionally, significant engineering effort is required to deploy and manage automation that performs well across hundreds of remote environments and geographical regions. Lastly, creating resilient automation runbooks is time consuming and prone to error when coordinating within a variety of complex environments.

With PagerDuty Runbook Automation, engineers can now run automation from a central system that triggers the execution through enhanced Runners or AWS SSM within the remote environments—without needing to rely on SSH firewall rules.

PagerDuty Runbook Automation dispatching tasks to remote environments using zero-trust principles.

The new Runners can leverage common plugins like Ansible and Kubernetes and customers can create new types of runbooks where engineers target many remote secure environments and explicitly state where and how tasks will be independently routed and executed within each environment. This enables better performance, scale, and fault tolerance.

For customers with high security requirements, PagerDuty Runbook Automation and Process Automation can now enable remote operations without the need to open ports, such as SSH, in their firewalls. This new functionality simplifies secure connectivity to automation by reducing the need for customers to deploy their own bastion or jump host and public endpoints.
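
One common way to avoid inbound firewall ports is for an agent inside the remote environment to make outbound calls to a central system and pull its own work. The in-memory Python sketch below illustrates that general pattern only; it is an assumption for illustration and does not describe how PagerDuty Runners are actually implemented. `CentralServer`, `Runner` and the task names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple


class CentralServer:
    """Stands in for the central automation system."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[str]] = {}
        self.results: List[Tuple[str, str, str]] = []

    def enqueue(self, environment: str, task: str) -> None:
        self._queues.setdefault(environment, deque()).append(task)

    def fetch_job(self, environment: str) -> Optional[str]:
        queue = self._queues.get(environment)
        return queue.popleft() if queue else None

    def report(self, environment: str, task: str, output: str) -> None:
        self.results.append((environment, task, output))


class Runner:
    """Lives inside the remote environment and only makes outbound calls."""

    def __init__(self, environment: str, server: CentralServer,
                 handlers: Dict[str, Callable[[], str]]) -> None:
        self.environment = environment
        self.server = server
        self.handlers = handlers

    def poll_once(self) -> bool:
        # The runner asks for work; the server never connects inward.
        task = self.server.fetch_job(self.environment)
        if task is None:
            return False
        handler = self.handlers.get(task)
        output = handler() if handler else f"unknown task: {task}"
        self.server.report(self.environment, task, output)
        return True
```

Here each runner only ever sees jobs addressed to its own environment, and the central system needs no network path into that environment.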

To learn more about zero trust security and PagerDuty Process Automation, be sure to register for the webinar happening this Thursday, June 15th, at 6 A.M. PT and 11 A.M. PT.

Take Advantage of the New Product Trial of Runbook Automation for Incident Resolution by Jorge Villamariona https://www.pagerduty.com/blog/take-advantage-of-the-new-product-trial-of-runbook-automation-for-incident-resolution/ Mon, 12 Jun 2023 12:00:48 +0000

The post Take Advantage of the New Product Trial of Runbook Automation for Incident Resolution appeared first on PagerDuty.

The PagerDuty Operations Cloud is the platform that enables our customers to manage the full lifecycle of urgent incidents. Many of our customers are leveraging Process Automation to augment their incident response teams and as a key driver to grow and scale their capabilities. 

The work resulting from urgent incidents cannot be postponed because it impacts a company’s revenue or ability to service customers. Often, this work is repetitive and could be delegated to first responders. However, the deeper context needed to accurately diagnose and remediate these incidents is locked away in production environments and requires knowledge, skills and access privileges from specialists. Responders frequently have to escalate the incident to already overworked specialists, a time-consuming process that can be disruptive, frustrating and repetitive.

By automating repetitive and time-consuming tasks from your incident resolution process, you can free up your engineers to focus on higher-value activities that require creativity and critical thinking. This, in turn, can lead to shorter MTTR, better customer experiences, faster innovation, revenue protection and improved profitability.

PagerDuty Automated Incident Resolution provides pre-built and customizable diagnostic and remediation capabilities that help first-responders determine the cause and initiate remediation within their production environment, which saves time and requires fewer individuals to assist with the response. Automating this repetition can speed up MTTR by 25% and reduce costs and interruptions by at least 50%.

In order to make it easier for our customers to realize how Automated Incident Resolution can speed up their MTTR, we rolled out the In-Product Trial of Runbook Automation for Incident Resolution last month. This trial is exclusively available to our Business and Digital Operations customers. PagerDuty users can request a trial from the automation tab using the Web UI:

Screenshot of automation tab using PagerDuty Web user interface.

The account owner will see an approval request via email. Upon approval, the account owner can set up a trial for Automation Actions and get a fully functional Runbook Automation instance in just a few minutes. Users will see a visual guide to help them get started authoring automation, including: Creating a Runbook Automation (RBA) Instance, Adding a Runner (a program that allows you to execute automation jobs in your environment), Adding Automation Actions (which allows you to invoke automation jobs and workflows from PagerDuty), Running Actions (from the PagerDuty incident details page), and viewing the output from the automation.

Screenshot of running Automation Actions.

We encourage you to take full advantage of this trial to further automate and optimize your incident response process. We look forward to hearing your incident resolution automation success story.

The post Take Advantage of the New Product Trial of Runbook Automation for Incident Resolution appeared first on PagerDuty.

]]>
AIOps and Automation: A Conversation Featuring Guest Speaker Carlos Casanova, Forrester Principal Analyst by Heath Newburn https://www.pagerduty.com/blog/heath-newburn-speaks-with-carlos-casanova/ Fri, 09 Jun 2023 12:00:36 +0000 https://www.pagerduty.com/?p=82855 At the beginning of 2023, I had a great conversation with Carlos Casanova, a Forrester Principal Analyst, in a recent webinar about how AIOps can...

The post AIOps and Automation: A Conversation Featuring Guest Speaker Carlos Casanova, Forrester Principal Analyst appeared first on PagerDuty.

]]>
At the beginning of 2023, I had a great conversation with Carlos Casanova, a Forrester Principal Analyst, in a webinar about how AIOps can help drive successful organizational change. During our conversation, Carlos divided the AIOps market into two camps: technology-centric (primarily APM/Observability players) and process-centric. PagerDuty is a process-centric solution leveraging multiple technologies.

With process-centric AIOps solutions, organizations gain additional context and insights into their data. This reduces the time to act, helps improve data quality, enhances decision-making, improves routing and notification efficiency, and ultimately increases the value of services delivered by IT.

This ability to increase speed with greater context shortens the time it takes to resolve critical incidents. An important thing to note is that the initial routing can be to a virtual operator, meaning that automation could gather additional triage or debug information, or potentially complete a fix, before engaging a human responder.
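To make the virtual operator idea concrete, here is a minimal Python sketch of automation-first routing. It is an illustration under stated assumptions, not how PagerDuty implements routing: the `route` function, the incident dictionary shape, the `automations` lookup table, and the `page_human` callback are all hypothetical names invented for this example.

```python
def route(incident, automations, page_human):
    """Send an incident to automation first, then to a human only if needed.

    incident:    dict with at least "id" and "type" (hypothetical shape)
    automations: dict mapping incident type -> callable returning
                 {"resolved": bool, "diagnostics": dict}
    page_human:  callable(incident, context) that engages a responder
    """
    context = {}
    handler = automations.get(incident["type"])
    if handler is not None:
        result = handler(incident)
        # Keep whatever triage/debug data the automation gathered.
        context = result.get("diagnostics", {})
        if result.get("resolved"):
            # Fixed before any human was engaged.
            return {"handled_by": "automation", "context": context}
    # No automation matched, or it could not fix the issue: page a human
    # and hand over the context collected so far.
    page_human(incident, context)
    return {"handled_by": "human", "context": context}
```

In this sketch, a responder who does get paged starts with the diagnostics the automation already collected rather than an empty incident.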

Throughout our conversation, Carlos and I kept returning to the theme of creating better context for responders. When I asked him which capabilities he sees as most important for solving core AIOps use cases, he said, “Quickly identifying the correlation across disparate alerts drastically reduces the noise that individuals are dealing with. Providing all impacted individuals with this clean data signal is vital to improving operations. With this data, individuals can more easily and quickly garner insight into what is truly going on in the environment. They can then quickly determine the right actions to take, decide who needs to be involved for faster remediation, and reduce the amount of effort necessary, which frees up time for other events and alerts.”
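As a rough illustration of what correlating disparate alerts can look like, the Python sketch below groups alerts for the same service that arrive close together in time. This is a deliberately simple time-window heuristic chosen for the example, not the correlation method PagerDuty or any other AIOps product uses; the `Alert` and `Incident` classes, the `correlate` function, and the 300-second default window are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    service: str
    timestamp: float  # seconds since epoch
    summary: str


@dataclass
class Incident:
    service: str
    alerts: list = field(default_factory=list)


def correlate(alerts, window_seconds=300.0):
    """Group alerts into incidents.

    An alert joins the most recent incident for its service if it arrives
    within window_seconds of that incident's latest alert; otherwise it
    opens a new incident.
    """
    incidents = []
    latest_by_service = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = latest_by_service.get(alert.service)
        if (current is not None
                and alert.timestamp - current.alerts[-1].timestamp <= window_seconds):
            current.alerts.append(alert)
        else:
            current = Incident(service=alert.service, alerts=[alert])
            incidents.append(current)
            latest_by_service[alert.service] = current
    return incidents
```

With this grouping, a burst of alerts from one service reaches the responder as a single incident instead of one page per alert, which is the noise reduction Carlos describes.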

But teams often struggle with getting started. We agreed that the cost of waiting and planning is probably higher than the cost of starting and iterating. He added, “The overall initiative may look daunting, but there are achievable quick wins. Waiting is not recommended. Start with small tactical efforts that roll up to your larger and longer-term strategic goals to show progress, demonstrate value, and build momentum.”

So speed is also a continuous theme: quickly getting context, rapidly responding with automation, and starting the process immediately to see these wins. But we also know that the pressure has continued to grow. 

Teams have been affected by the economic downturn and slowdown. When I asked him about how teams can increase efficiency and measure success, we spoke about automation being key to success.

Carlos responded, “Simple scenarios that occur often are great candidates for automating all or part of their remediation. Fully or even partially automating five or 10 simple scenarios instantly frees up large amounts of time for individuals to focus on the more complex scenarios that organizations might not feel comfortable automating.”

But we also have to recognize the forming, storming, and norming before we get to performing in projects. There will be changes to how we measure and think about success that we have to embrace. 

“AIOps can also empower IT to alleviate workloads to help their delivery teams ‘do more with less.’ It’s important to remember that these changes invalidate existing metrics. You must establish new baselines, since individuals will no longer be performing the simple and low-level actions. For example, a technician manually resolves 300 incidents per week. Thirty are simple and have easily automated remediations. The MTTR on these might drop by 90%. Elimination of the simple incidents, however, only allows the technician to take on 10 medium-complexity incidents in their place. That means the technician will handle 20 fewer incidents per week. The average MTTR for the technician will go up, and incidents will stay in their queue longer, with a higher ratio of medium- and high-complexity incidents,” Carlos said.
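The arithmetic in Carlos's example can be checked with a short Python sketch. Only the incident counts (300 per week, 30 simple, 10 added in their place) come from his example. The per-incident MTTR figures are hypothetical values chosen purely to show the direction of the shift, and the sketch treats all non-simple incidents as one "complex" group for simplicity.

```python
# Hypothetical MTTR values in minutes; these are NOT from the source.
SIMPLE_MTTR = 10.0
COMPLEX_MTTR = 60.0


def weekly_baseline(simple, complex_):
    """Return (incidents handled, average MTTR in minutes) for one technician."""
    total = simple + complex_
    average = (simple * SIMPLE_MTTR + complex_ * COMPLEX_MTTR) / total
    return total, average


# Before automation: 300 incidents, 30 of them simple.
before = weekly_baseline(simple=30, complex_=270)

# After automation: the 30 simple incidents no longer reach the technician,
# who takes on 10 more complex ones in their place.
after = weekly_baseline(simple=0, complex_=280)

print(before)  # (300, 55.0)
print(after)   # (280, 60.0)
```

Under these assumed values the technician handles 20 fewer incidents and their average MTTR rises from 55 to 60 minutes, even though overall operations improved. That is why the old per-technician baselines stop being meaningful.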

One of the most common questions I run into is how to get started. Traditionally, AIOps is viewed as a potentially years-long initiative. It can be daunting to begin the journey with so much uncertainty and change. PagerDuty has greatly simplified getting started by building a one-click process for event correlation so teams can see value immediately, but this isn’t the end of the AIOps journey.

Carlos shared his insights on getting started, as well as facing the reduction in available OpEx. “Budgets are always a challenge, but to a large extent, you can overcome that hurdle by demonstrating and clearly articulating the value of AIOps. Develop a narrative for your business case that speaks to the value of improved experiences with the organization. Demonstrate how improved routing and notifications with enhanced contextually relevant data enables the same workforce to handle more workloads with less effort. Explain how patterns and trends empower lower-level resources to execute more advanced actions because they are provided suggestive actions that are based on the more experienced and senior staff members. All of this helps organizations deal with the economic challenges they’re currently facing while also improving the quality of products and services they deliver. It’s important for organizations to demonstrate their chosen solution has a fast time to value. For example, to improve user experiences, how quickly can the solution provide complete visualizations of transactions to support personnel to resolve an outage? To provide a faster response time, how quickly can the solution analyze the environment and correlate new alerts into singular incidents that can be handled immediately or in an automated fashion? Time to value is vital in difficult economic times.”

Time to value can be even more important than ROI for many of our customers. Speed is what will delineate winners and losers in digital battlegrounds. How quickly we can deal with inevitable issues and iterate improvements is what sets teams apart from competitors and provides an excellent customer experience.

As I&O leaders work through economic uncertainty that’s forcing them to cut costs and do more with less, they require new tools and approaches that help them scale and optimize their existing resources. AIOps provides teams with a reliable way to process high volumes of data and events, manage routing and response in real time, and resolve incidents faster. If you’re interested in learning how to tackle those challenges for your business, watch this webinar to hear the rest of my conversation with Carlos.


]]>
How to Show Business Value and ROI of Automation by Nisha Prajapati https://www.pagerduty.com/resources/ebook/how-to-show-business-value-and-roi-of-automation/ Mon, 15 May 2023 20:07:16 +0000 https://www.pagerduty.com/?post_type=resource&p=82373 The post How to Show Business Value and ROI of Automation appeared first on PagerDuty.

]]>

]]>