AIOps | Categories | PagerDuty https://www.pagerduty.com/blog/category/aiops/ Build It | Ship It | Own It

Three Teams That Can Use AIOps to Work Smarter, Not Harder by Hannah Culver https://www.pagerduty.com/blog/3-use-cases-for-aiops/ Mon, 28 Aug 2023 12:00:29 +0000

The post Three Teams That Can Use AIOps to Work Smarter, Not Harder appeared first on PagerDuty.

There isn’t a boardroom today that isn’t asking how AI and generative AI applications can help drive efficiency and accelerate the business. For organizations looking to capitalize on ML and automation to improve their efficiency during incidents, AIOps is a tangible, proven application and an exciting opportunity for ITOps teams.

As we’ve seen across market landscape evaluations, there are a number of ways that solutions can be implemented. Despite this, the outcomes AIOps solutions aim for remain fairly consistent: fewer incidents and faster resolution. But which teams stand to benefit from this powerful technology, and how will AIOps help them achieve their desired business outcomes?

Understanding how different teams can implement best practices to reduce MTTR, total incidents, and time to adopt automation will help ensure that each team gets value from your investment. Here are three teams that stand out as having much to gain from leveraging AIOps: Network Operations Center (NOC) teams, Major Incident Management (MIM) teams, and distributed service owning teams. Let’s cover each.

NOC teams

If you have a NOC, it acts as your central nervous system. You may also be in the middle of undertaking modernization efforts to reduce both cost and risk.

Many of our NOC customers tell us about challenges such as:

  • Eyes-on-glass operational style causes incidents to go undetected
  • Catch and dispatch means too many escalations to SMEs or routing incidents to the wrong team
  • Manual work drives up MTTR
  • L1/L2 teams experience high turnover and blame culture is common

To move beyond this, organizations can create L0 automation. This is automation that serves as the first responder, only bringing in humans when necessary. For well-understood, well-documented issues, L0 automation can auto-remediate incidents without a responder intervening. But for other more complex issues that require a hands-on approach, NOC teams can create L0 automation that immediately pulls in diagnostic information before the responder looks at an incident, routes incidents intelligently according to event data, and populates the incident notes with pertinent documentation and runbooks.
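
To make the idea concrete, here is a minimal Python sketch of an L0 first responder. It is an illustration only, not PagerDuty's implementation: the `Incident` fields, the `REMEDIATIONS`, `RUNBOOKS`, and `OWNERS` tables, and the team names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Incident:
    kind: str                      # hypothetical issue type, e.g. "disk_full"
    service: str
    notes: List[str] = field(default_factory=list)
    assigned_team: str = ""
    resolved: bool = False


# Hypothetical lookup tables; a real NOC would load these from its own tooling.
REMEDIATIONS: Dict[str, Callable[[Incident], bool]] = {
    "disk_full": lambda inc: True,  # stand-in for a cleanup job that succeeds
}
RUNBOOKS = {"db_latency": "runbooks/db-latency.md"}
OWNERS = {"payments-db": "database-team"}


def l0_respond(incident: Incident) -> Incident:
    """Act as the first responder; involve humans only when necessary."""
    fix = REMEDIATIONS.get(incident.kind)
    if fix is not None and fix(incident):
        incident.resolved = True
        incident.notes.append("auto-remediated by L0 automation")
        return incident
    # Complex issue: gather context before a responder opens the incident.
    incident.notes.append(f"diagnostics collected for {incident.service}")
    if incident.kind in RUNBOOKS:
        incident.notes.append(f"runbook: {RUNBOOKS[incident.kind]}")
    # Route by service owner, falling back to a default (assumed) L1 queue.
    incident.assigned_team = OWNERS.get(incident.service, "noc-l1")
    return incident
```

A well-understood `disk_full` incident is closed without paging anyone, while a `db_latency` incident reaches the owning team with diagnostics and a runbook already in its notes.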

PagerDuty AIOps helps NOCs modernize and move away from eyes-on-glass methods. These NOCs are a center of excellence within their organizations, spearheading data-driven optimization, enabling best practices, and ensuring incident readiness.

MIM teams

When critical, customer-impacting incidents happen, you don’t have time to waste. But with complexity and noise on the rise, how do Major Incident Management teams improve to meet growing customer expectations?

We see MIM teams with common challenges such as:

  • Finding out about major incidents from an overwhelming number of customers/users calling in or from delayed team escalations
  • Lack of context as initial triage takes too long to assess severity and business impact
  • Long MTTR waiting for the right people, the right diagnostics, the right runbooks, etc.
  • Disjointed tooling leading to communication barriers for responders and corresponding teams

MIM teams can overcome these challenges with a variety of automation and ML tactics. First, organizations can create automation that immediately routes high priority or severity incidents to a MIM team and tags in the appropriate teams needed via incident workflows. Additionally, ML can gather key context such as how rare an incident like this is, whether it has happened before and how it was resolved, and which change events might be correlated with the failure.

PagerDuty AIOps helps MIM teams detect major incidents faster, improve MTTR and customer experience, and save SMEs time. This reduces the cost of each incident and mitigates risk.

Distributed service owning teams

DevOps and distributed service owning teams are under more pressure than ever to deliver exceptional customer experiences. But with competing priorities and fewer resources, this is easier said than done.

Many of our customers share challenges they are facing such as:

  • Disparate monitoring tools with no central pane of glass
  • Too much noise leading to incorrect escalations and false incidents
  • Lack of context and information silos
  • Toil and time taken away from value-add initiatives

For service owning teams looking to overcome these challenges, an AIOps tool that can aggregate data from all the monitoring sources in the technical ecosystem can help bring clarity to incident response. Additionally, with ML, teams can reduce noise by automatically grouping together alerts based on context, time, and previous event data that the model has trained on. With this and the ML-surfaced triage information, incident response is streamlined so teams can get back to innovating faster.
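
As a rough illustration of the aggregation step, the sketch below maps payloads from two monitoring tools onto one shared event shape so they can be grouped together later. The tool names (`tool_a`, `tool_b`) and every payload field are hypothetical; real integrations define their own formats.

```python
from typing import Any, Dict


def normalize(source: str, payload: Dict[str, Any]) -> Dict[str, str]:
    """Map a tool-specific payload onto one shared event shape."""
    if source == "tool_a":  # hypothetical metrics tool
        return {
            "service": payload["svc"],
            "summary": payload["msg"],
            "severity": payload["level"].lower(),
        }
    if source == "tool_b":  # hypothetical log-alerting tool
        return {
            "service": payload["labels"]["service"],
            "summary": payload["title"],
            "severity": "critical" if payload["priority"] <= 1 else "warning",
        }
    raise ValueError(f"unknown monitoring source: {source}")
```

Once every event carries the same `service`, `summary`, and `severity` keys, downstream grouping and routing logic no longer needs to know which tool produced it.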

PagerDuty AIOps helps service owning teams spend less time firefighting, reduce MTTR, and create exceptional customer experiences. This improves culture and team retention while increasing revenue for the entire organization. 

Ready to get started?

With PagerDuty AIOps, teams like the ones we looked at see 87% fewer incidents, 14% faster MTTR, and 9x faster automation adoption. This helps organizations move faster, focus on the work that matters most to customers, and reduce risk and team burnout. Best of all, teams from dev to IT can see value from PagerDuty AIOps.

PagerDuty AIOps works in conjunction with the rest of the PagerDuty Operations Cloud to help organizations manage their operations by leveraging AI and automation to supercharge their digital transformation. With over 700 integrations, GenAI capabilities, and end-to-end event-driven automation, PagerDuty gives customers a 400% ROI and the right tools to leapfrog the competition.

To try PagerDuty AIOps out yourself, you can take an interactive product tour or try us for free for 14 days.

PagerDuty Extends Operations Cloud Leadership into AIOps and Automation by Jonathan Rende https://www.pagerduty.com/blog/pagerduty-extends-operations-cloud-leadership-into-aiops-and-automation/ Tue, 11 Jul 2023 22:51:12 +0000

The post PagerDuty Extends Operations Cloud Leadership into AIOps and Automation appeared first on PagerDuty.

Forrester Names PagerDuty a Leader in first-ever Process-Centric AIOps Wave

From helping pioneer the DevOps movement to establishing best practices around service ownership to being the standard in incident response, PagerDuty has a long history of leadership. PagerDuty is honored to add to this list and now be recognized as a leader in the AIOps and Automation space by Forrester. To explain why PagerDuty was listed as a leader, it’s important to look at our current economic climate and compare it to the past.

It’s been more than a decade since the last time the Three C’s–Cost Control, Consolidation (of vendors) and Compliance–received so much oversight and scrutiny. Just like in 2008, centralized decision making and cost controls are driving organizations to consolidate onto entire vendor suites rather than buying only best-of-breed products for each job.

It’s no surprise that additional financial oversight is now a part of every purchase, every budget item, and every activity for IT and development. Everyone expects more–and they expect it right now, not next quarter or even next year.

AIOps, however, has always promised to do more. For us, more is about exceeding SLAs and improving availability and reliability. More is about cost savings because fewer humans should be needed in a major outage and incident. More should not only be about being more responsive, but also about preventing issues in the first place.

What’s Changed Since the Last Financial Crisis

In 2008, important financial institutions failed because of their credit and lending practices. This started a market downturn in which the global economy contracted. The result was cost controls and a need to improve business efficiencies, which in turn drove more central decision making. It was a stark contrast to the strategy of top-line growth at all costs, which typically results in distributed decision making and vendor/tool sprawl.

So, what’s different now and why is PagerDuty’s AIOps a leading solution to look into?

  • Now (vs 2008) machine learning is and should be an operational part of every data-centric digital business. PagerDuty’s AIOps solution has progressed fast over the last four years to help both reduce the time to resolve issues by 25% and reduce unnecessary interruptions (noise) by over 90%.
    • By combining our event correlation (Intelligent event grouping) and event orchestration (event rules engine) with existing observability processes, we better target which experts are needed for which problems. We make escalation policies more effective and powerful.
    • PagerDuty’s AIOps product can also make those experts more productive when they do get called in to a major incident, by automating the diagnostics process and pinpointing the offending or culprit services responsible for the problem.
    • Equally important, by combining event rules with automation jobs (event-driven automation), an entire class of lower priority problems can be remediated without human intervention, eliminating the need for responders or experts to engage at all.
    • Lastly, with generative AI, we have potentially democratized the tools, which will broaden use even faster.
  • Now (vs 2008) there is no need to hire expensive professional services or bring in tons of white lab coat experts to configure systems. You can and should demand to see the value in days and weeks, not months or longer. PagerDuty’s AIOps offers a 5-10x reduction in time to value over alternative solutions.
  • Now (vs 2008) we have proven, high value products that offer AI as an integrated part of your existing practices vs separate bespoke solutions in event management. Whether you have centralized IT Operations (e.g., network operations center with SREs), decentralized operating models with service ownership by developers, or a combination of both (hybrid model), there is no need to add new vendors or build new approaches. PagerDuty’s AIOps solution works in all models.

The promise of off-the-shelf AIOps products is a reality. There are real products from established leaders like PagerDuty to apply against your needs and requirements.

And now, we’re proud to share PagerDuty’s AIOps leadership as part of the recent Forrester Wave. Consider this your personal guide to help in your AIOps journey. Enjoy.

AIOps and Automation: A Conversation Featuring Guest Speaker Carlos Casanova, Forrester Principal Analyst by Heath Newburn https://www.pagerduty.com/blog/heath-newburn-speaks-with-carlos-casanova/ Fri, 09 Jun 2023 12:00:36 +0000

The post AIOps and Automation: A Conversation Featuring Guest Speaker Carlos Casanova, Forrester Principal Analyst appeared first on PagerDuty.

At the beginning of 2023, I had a great conversation with Carlos Casanova, a Forrester Principal Analyst, in a webinar about how AIOps can help drive successful organizational change. In our conversation, Carlos divided the AIOps market into two camps: technology-centric (primarily APM/Observability players) and process-centric. PagerDuty is a process-centric solution leveraging multiple technologies.

With process-centric AIOps solutions, organizations gain additional context and insights into their data. This reduces the time to act, helps improve data quality, enhances decision-making, improves routing and notification efficiency, and ultimately increases the value of services delivered by IT.

This ability to increase speed with greater context shrinks the time spent on critical incidents. An important thing to note is that the initial routing can be to a virtual operator, meaning that automation could drive additional triage/debug information or potentially complete a fix before engaging a human responder.

Throughout our conversation, Carlos and I kept returning to the theme of creating better context for responders. When I asked him about what capabilities he sees as most important for solving core AIOps use cases, he said, “Quickly identifying the correlation across disparate alerts drastically reduces the noise that individuals are dealing with. Providing all impacted individuals with this clean data signal is vital to improving operations. With this data, individuals can more easily and quickly garner insight into what is truly going on in the environment. They can then quickly determine the right actions to take, decide who needs to be involved for faster remediation, and reduce the amount of effort necessary, which frees up time for other events and alerts.”

But teams often struggle with getting started. We agreed that waiting and planning probably costs more than starting and iterating. He added, “The overall initiative may look daunting, but there are achievable quick wins. Waiting is not recommended. Start with small tactical efforts that roll up to your larger and longer-term strategic goals to show progress, demonstrate value, and build momentum.”

So speed is also a continuous theme: quickly getting context, rapidly responding with automation, and starting the process immediately to see these wins. But we also know that the pressure has continued to grow. 

Teams have been affected by the economic downturn and slowdown. When I asked him about how teams can increase efficiency and measure success, we spoke about automation being key to success.

Carlos responded, “Simple scenarios that occur often are great candidates for automating all or part of their remediation. Fully or even partially automating five or 10 simple scenarios instantly frees up large amounts of time for individuals to focus on the more complex scenarios that organizations might not feel comfortable automating.”

But we also have to recognize the forming, storming, and norming before we get to performing in projects. There will be changes to how we measure and think about success that we have to embrace. 

“AIOps can also empower IT to alleviate workloads to help their delivery teams ‘do more with less.’ It’s important to remember that these changes invalidate existing metrics. You must establish new baselines, since individuals will no longer be performing the simple and low-level actions. For example, a technician manually resolves 300 incidents per week. Thirty are simple and have easily automated remediations. The MTTR on these might drop by 90%. Elimination of the simple incidents, however, only allows the technician to take on 10 medium-complexity incidents in their place. That means the technician will handle 20 fewer incidents per week. The average MTTR for the technician will go up, and incidents will stay in their queue longer, with a higher ratio of medium- and high-complexity incidents,” Carlos said.
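
Carlos's arithmetic can be worked through in a few lines of Python. The incident counts come from his example; the handling times are assumptions chosen purely for illustration (they are picked so that the time freed by automating 30 simple incidents equals the time needed for 10 medium-complexity ones).

```python
# Figures from Carlos's example.
incidents_before = 300   # incidents resolved manually per week
simple = 30              # simple incidents that get automated away
medium_added = 10        # medium-complexity incidents taken on instead

# Assumed handling times in minutes (NOT from the conversation).
SIMPLE_MIN = 10
MEDIUM_MIN = 30
OTHER_MIN = 40           # average for the 270 incidents that stay manual

others = incidents_before - simple                   # 270
incidents_after = others + medium_added              # 280
handled_fewer = incidents_before - incidents_after   # 20

avg_before = (simple * SIMPLE_MIN + others * OTHER_MIN) / incidents_before
avg_after = (others * OTHER_MIN + medium_added * MEDIUM_MIN) / incidents_after

print(f"handled per week: {incidents_before} -> {incidents_after}")
print(f"average minutes per incident: {avg_before:.1f} -> {avg_after:.1f}")
```

Under these assumed times the technician handles 20 fewer incidents per week and their average time per incident rises from 37.0 to about 39.6 minutes, even though the team as a whole is better off. That is why the old baseline stops being a fair measure.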

One of the most common questions I run into is how to get started. Traditionally, AIOps is viewed as a potentially years-long initiative. It can be daunting to begin the journey with so much uncertainty and change. PagerDuty has greatly simplified the process by crafting a one-click process for event correlation so teams can see value immediately, but this isn’t the end of the journey to AIOps.

Carlos shared his insights on getting started, as well as facing the reduction in available OpEx. “Budgets are always a challenge, but to a large extent, you can overcome that hurdle by demonstrating and clearly articulating the value of AIOps. Develop a narrative for your business case that speaks to the value of improved experiences with the organization. Demonstrate how improved routing and notifications with enhanced contextually relevant data enables the same workforce to handle more workloads with less effort. Explain how patterns and trends empower lower-level resources to execute more advanced actions because they are provided suggestive actions that are based on the more experienced and senior staff members. All of this helps organizations deal with the economic challenges they’re currently facing while also improving the quality of products and services they deliver. It’s important for organizations to demonstrate their chosen solution has a fast time to value. For example, to improve user experiences, how quickly can the solution provide complete visualizations of transactions to support personnel to resolve an outage? To provide a faster response time, how quickly can the solution analyze the environment and correlate new alerts into singular incidents that can be handled immediately or in an automated fashion? Time to value is vital in difficult economic times.”

Time to value can be even more important than ROI for many of our customers. Speed is what will delineate winners and losers in digital battlegrounds. How quickly we can deal with inevitable issues and iterate improvements is what sets teams apart from competitors and provides an excellent customer experience.

As I&O leaders work through economic uncertainty that’s forcing them to cut costs and do more with less, they require new tools and approaches that help them scale and optimize their existing resources. AIOps provides teams with a reliable way to process high volumes of data and events, manage routing and response in real-time, and help teams resolve incidents faster. If you’re interested in learning how to tackle those challenges for your business, watch this webinar to hear the rest of my conversation with Carlos.  

Top 3 Incident Response Problems AIOps Can Help Your Teams Solve by Hannah Culver https://www.pagerduty.com/blog/top-3-incident-response-problems-aiops-can-solve/ Thu, 20 Apr 2023 12:00:58 +0000

The post Top 3 Incident Response Problems AIOps Can Help Your Teams Solve appeared first on PagerDuty.

More data for data’s sake doesn’t help anyone. What organizations need is more information–actionable insight. With data coming from incoming streams of events and alerts, teams don’t have enough time to look at each one. And they struggle to parse and consolidate this data in order to figure out what they need to do next to resolve an incident. Processing this data to make it more usable and helpful during incident response often results in a rote series of manual, repetitive tasks each time an incident occurs, wasting time. It’s no wonder teams are increasingly turning to AIOps and automation for help. AIOps helps teams turn data into information and reduce that manual work. Let’s break down three ways AIOps allows teams to overcome challenges and reduce customer disruption.

Reducing noise for fewer incidents

Not every alert should become an incident. Yet for many organizations, this is what happens. Even if you’re only experiencing one problem, you may receive dozens or hundreds of pings for the same issue. This is distracting and bogs responders down. Noise should be the first thing you focus on because eliminating it:

  • Gives responders back time when they don’t need to filter out what’s important from what’s irrelevant.
  • Decreases the cognitive load that responders carry. Responders don’t need to think about 63 different alerts. They can focus on the one that matters. This reduces on-call anxiety.
  • Reduces the distractions that get in a responder’s way during an incident. This helps responders focus on getting a fix in place faster.

To reduce noise, you can analyze the noisiest incidents you’re facing. Which ones are the same incident? Take a look at the alerts you’re receiving and see if there’s a way to group them based on event data that you gather from your monitoring tools. What’s loudest? This is an opportunity to fine-tune your monitoring tools so they’re only sending you what’s most valuable. Keep in mind that this often requires routine maintenance. Monitoring tools become messy, especially when data is scattered across vendors. You’ll want to gut check this whenever you notice noise levels are increasing.

PagerDuty AIOps makes it easier to reduce alert noise within a single tool. Users can set PagerDuty to ingest and deduplicate events from those disparate signals. Then PagerDuty AIOps groups the events into an existing incident. This prevents a new incident from being created. Teams have access to event data in the form of alerts without extra notifications. The result is that teams can better weather alert storms by bringing focus to what’s needed.
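
The deduplicate-then-group flow can be sketched in Python as follows. This is a simplified illustration, not PagerDuty's grouping logic: it assumes each event carries a `dedup_key` and a `service` field, and it groups purely by service, whereas a real system would weigh much richer context.

```python
from typing import Dict, List


def ingest(events: List[dict], open_incidents: Dict[str, List[dict]]) -> int:
    """Attach events to open incidents; return how many new incidents opened."""
    seen_keys = {e["dedup_key"] for alerts in open_incidents.values() for e in alerts}
    opened = 0
    for event in events:
        if event["dedup_key"] in seen_keys:
            continue  # duplicate of an alert we already hold
        seen_keys.add(event["dedup_key"])
        if event["service"] not in open_incidents:
            open_incidents[event["service"]] = []
            opened += 1  # only a brand-new incident would notify responders
        # Otherwise the alert joins the existing incident without a new page.
        open_incidents[event["service"]].append(event)
    return opened
```

In this sketch, four incoming events for two services open only two incidents, and a later alert on a service that already has an open incident opens none. The alert data is still kept, just without the extra notification.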

Gaining context for better triage

Technically, all the information a responder needs to resolve an incident exists. But it’s buried within multiple disparate streams of data. Humans alone cannot condense all this data into succinct actionable insights. This means teams spend a long time looking for answers that machine learning (ML) could find for them instead. ML can look at both historical event data and human interaction. Then ML translates the analyzed data into actionable insights. With ML, teams can answer key questions such as:

  • Where should my team look first?
  • Are other teams working on the same problem?
  • Is this a common incident or completely new?
  • Have we seen this before; how was it resolved?
  • Did any relevant changes occur before this incident?

But developing your own ML can be a daunting task. It requires time and resources such as headcount. Many organizations choose to partner with a vendor instead.

PagerDuty AIOps ML algorithms help surface critical information such as:

  • Probable Origin: determines probable cause based on previous incidents affecting your service.
  • Related Incidents: shares if a current incident is affecting your service.
  • Outlier Incidents: indicates whether this incident happens frequently, rarely, or is a total anomaly.
  • Past Incidents: shows the incident details and how responders resolved it in the past.
  • Change Correlation: connects with your change integrations to show changes to your service, then leverages ML to correlate patterns between change events and incidents.
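
To give a feel for the "did anything change before this incident?" question, here is a deliberately simple Python sketch. It uses plain time proximity as a stand-in and is not the ML correlation described above; the `at` and `id` fields and the one-hour default lookback are assumptions.

```python
from datetime import datetime, timedelta
from typing import List


def recent_changes(incident_at: datetime, changes: List[dict],
                   lookback: timedelta = timedelta(hours=1)) -> List[dict]:
    """Return changes made within `lookback` before the incident, newest first."""
    window = [c for c in changes
              if timedelta(0) <= incident_at - c["at"] <= lookback]
    return sorted(window, key=lambda c: c["at"], reverse=True)
```

A responder would see the most recent deploys first and could start triage there, while changes made hours earlier or after the incident began are left out.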

Each time this information is surfaced for your team without having to manually dig, you get to resolve the incident faster. That decreased MTTR provides you with more time to focus on value-add initiatives.

Self-healing by crafting auto-remediation

One initiative you can focus on to spend less time firefighting is automation. This is where you can orchestrate a fix and self-heal before the problem even becomes an incident. It’s resolved before it hits a responder. Now someone gets to sleep through the night instead of responding to a notification. But this initiative can seem very intimidating. The reality is that starting small and tackling low-hanging fruit can make self-healing easier than you may expect.

You can identify well-understood resolution scenarios where you can automate the response. These may be scenarios that your team would classify as frequent, or ones where the resolution is straightforward. Teams can then create automation to resolve these without human intervention. Then, as that automation starts to take effect, your teams will start to free up time to work on new automation initiatives.

PagerDuty’s Event Orchestration helps teams create automation that spans the entire technical ecosystem. Event Orchestration enriches and routes events, then kicks off automation to self-heal. This feature allows users to trigger remediations for well-understood incidents via webhook. For more complex issues where auto-remediation might not be a possibility, teams can also leverage automation to kick off diagnostics. This builds upon the triage information responders have when they first view their incident.
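
The enrich, route, and trigger pattern might look roughly like this in Python. This is a generic sketch rather than the Event Orchestration rule format: the event fields (`service`, `class`), the routing rule, and `WEBHOOK_URL` are hypothetical, and the request is only built, not sent.

```python
import json
from typing import Optional, Tuple
from urllib.request import Request

WEBHOOK_URL = "https://automation.example.com/hooks/restart"  # hypothetical endpoint


def orchestrate(event: dict) -> Tuple[dict, Optional[Request]]:
    """Enrich and route an event; prepare a remediation webhook if one applies."""
    # Enrich and route: tag the owning team from an assumed naming convention.
    team = "platform" if event["service"].startswith("k8s-") else "default"
    enriched = {**event, "team": team}
    # Trigger: only well-understood event classes get automatic remediation.
    if event.get("class") == "pod_crashloop":
        body = json.dumps({"service": event["service"]}).encode("utf-8")
        request = Request(
            WEBHOOK_URL,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        return enriched, request
    return enriched, None
```

The caller decides whether to send the prepared request (for example with `urllib.request.urlopen`) or, when no request is returned, to hand the enriched event to a human responder.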

Looking to get started with AIOps?

AIOps can help teams see fewer incidents and faster resolution. PagerDuty can help you achieve this, and more, with PagerDuty AIOps. See PagerDuty AIOps in action by requesting a trial or taking our product tour. In the market for AIOps? Read our buyer’s guide.

What’s New: Updates to Incident Response, PagerDuty® Process Automation Software & PagerDuty® Runbook Automation, Mobile App Experience, and More! by Vera Chan https://www.pagerduty.com/blog/whats-new-product-update-2022-11/ Wed, 30 Nov 2022 14:00:43 +0000

The post What’s New: Updates to Incident Response, PagerDuty® Process Automation Software & PagerDuty® Runbook Automation, Mobile App Experience, and More! appeared first on PagerDuty.

We’re excited to announce a new set of updates and enhancements to the PagerDuty Operations Cloud in addition to the November Product Launch announcements made earlier this month. Recent development and app updates from the product team cover Incident Response, PagerDuty® Process Automation, the PagerDuty Mobile App, Integrations, as well as Community & Advocacy Events updates. We continue to help customers further automate to optimize cloud operations and reduce the number of issues escalated to other teams. Get started now and learn about:

Incident Response

Early Access for Incident Workflows

Use a no-code/low-code builder to create customizable incident workflows that will reduce the manual work required to escalate, mobilize, and orchestrate the right incident response for any use case. Automatically trigger an orchestrated response using if-this-then-that logic to sequence common incident actions, such as adding a responder, subscribing stakeholders, or starting a conference bridge. To learn more, check out the deep-dive blog, our KB article and sign up for Early Access.
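
For readers who think in code, the if-this-then-that idea behind a workflow can be sketched in Python. The builder itself is no-code/low-code, so this is only a conceptual model: the `Workflow` class, the incident fields, and the action names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Workflow:
    name: str
    condition: Callable[[dict], bool]   # the "if this" part
    actions: List[str]                  # the "then that" part, in order


def run_workflows(incident: dict, workflows: List[Workflow]) -> List[str]:
    """Return the ordered actions of every workflow whose condition matches."""
    performed: List[str] = []
    for workflow in workflows:
        if workflow.condition(incident):
            performed.extend(workflow.actions)
    return performed
```

A workflow conditioned on priority could sequence adding a responder and starting a conference bridge, while a second one subscribes stakeholders for a particular service. Incidents that match neither trigger nothing.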

View a demo of Incident Workflows above or watch it in action later

(Featured above: Incident Workflows Builder)

(Featured above: Incident Workflows Conditional Trigger Setup)

(Featured above: Incident Workflows Conditional Trigger Configuration)

AIOps

Flexible Time Windows

Our latest AIOps feature, Flexible Time Windows, is now generally available for all Event Intelligence or DigOps users. You can use it to access greater precision and flexibility when tuning for system noise, along with recommendations on optimal time windows. A configurable time interval lets users tailor Intelligent Alert Grouping to optimize noise reduction for each service, all using a simple dropdown menu.
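
To see why the size of the time window matters, consider this small Python sketch. It is not how Intelligent Alert Grouping works internally; it assumes a simple rolling window in which an alert joins the current group if it arrives within `window_seconds` of the previous alert.

```python
from typing import List


def group_alerts(timestamps: List[int], window_seconds: int) -> List[List[int]]:
    """Group alert timestamps (in seconds) using a rolling time window."""
    groups: List[List[int]] = []
    for ts in sorted(timestamps):
        if groups and ts - groups[-1][-1] <= window_seconds:
            groups[-1].append(ts)   # close enough to the previous alert
        else:
            groups.append([ts])     # gap too large: start a new group
    return groups
```

With alerts at 0, 60, 130, 600, and 640 seconds, a 120-second window yields two groups, while a 30-second window leaves all five alerts ungrouped. A noisy service may therefore benefit from a wider window than a quiet one.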

(Featured above: Flexible Time Windows)

Learn more about this feature and all of Intelligent Alert Grouping in the Knowledge Base.

PagerDuty® Process Automation

PagerDuty® Process Automation Software and PagerDuty® Runbook Automation Version 4.8.0

Check out the new features and enhancements for PagerDuty® Process Automation, PagerDuty Runbook Automation, and Rundeck Community.

Highlights include:

  • RSS Feed Plugin. The new RSS Feed Plugin helps users quickly understand whether an incident is due to an internal issue or a third-party. The RSS Feed Plugin allows users to query and parse RSS feeds for events from SaaS tools and public cloud providers. For users deploying the Automated-Diagnostics Solution, this plugin provides a logical first step for implementation.

(Featured above: Events RSS Feed Plugin Sample Configuration)

(Featured above: Output in Runbook Automation)

(Featured above: Output in the PagerDuty App for Slack)

(Featured above: Process Automation ServiceNow App Configuration Settings)

  • Job Resume now works with Parallel/Ruleset strategies. Now users can resume long running jobs at the point of failure vs. re-running a multi-hour / day job. Users can execute previously failed step(s) with the same inputs on Parallel and Ruleset execution strategies. When enabled on a Job, the plugin will record the internal Workflow State as the Execution progresses. When one or more steps fail, the Workflow State prior to executing the failed step(s) is recorded and stored and can be restarted if needed.
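
The resume-at-point-of-failure idea can be illustrated with a short Python sketch. It is a sequential simplification and not the plugin's implementation (which targets Parallel and Ruleset strategies); the `run_job` function and the shape of `state` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_job(steps: List[Step], state: Dict[str, str]) -> Dict[str, str]:
    """Run steps in order, skipping any that a previous run already completed."""
    for name, action in steps:
        if state.get(name) == "succeeded":
            continue  # recorded as done earlier; do not re-run it
        try:
            action()
            state[name] = "succeeded"
        except Exception:
            state[name] = "failed"
            break  # stop here; a later call resumes from this step
    return state
```

Calling `run_job` again with the recorded `state` re-executes only the failed step and anything after it, so a long-running job does not have to start over.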

Read the release notes to learn more.

Learn more:

PagerDuty Mobile App

Updates to the PagerDuty Mobile App User Login Experience

The login experience now begins with a requirement to enter your email address. Your email address will then drive the rest of the login experience, presenting you with more streamlined and relevant options at every step of the login experience. You will see the following changes:

  • The current login screen with 4 buttons will be replaced by a single email entry field
  • The email address will be entered by the user
  • That email address will determine what service regions you are in, and what login options have been enabled on your accounts (username/password, or single-sign-on)
  • We will then present you with additional fields (e.g. password), or perhaps send you directly to your single-sign-on provider (if only one single-sign-on provider is associated with your email address)

With these updates, you won’t need to choose SSO/Service Region before entering your credentials. 

We started rolling out this change gradually on December 5th, 2022 and will continue throughout the month of December. If you have additional questions, please contact support@pagerduty.com.

New PagerDuty Mobile Home Screen Experience

If you navigate to “Home” in the PagerDuty Mobile App, “My open incidents” now sits front and center to reveal the most recent incidents along with their related details. Responders who want a broader awareness of digital operations can easily view and access non-incident information like on-call shifts and impacted technical services. This is just the latest in a series of new mobile app enhancements we’ve rolled out this year. You can learn more by reading the blog or knowledge base article.

(Featured above: Mobile Home Screen Experience Main Dashboard and Upcoming Shifts Menu)

Integrations

Expanded Incident Response Functionality in the PagerDuty App for Microsoft Teams

Incident responders can now perform additional incident management actions from the PagerDuty App for Microsoft Teams. These updates include the ability to: 

  • Change priority
  • Reassign
  • Add responders
  • Escalate
  • Run PagerDuty Automation Actions

(Featured above: PagerDuty App for Microsoft Teams Incident Actions)

(Featured above: PagerDuty App for Microsoft Teams Automation Actions)

Learn more:

Product Deprecations

Please take note and keep your teams informed of our upcoming product deprecations.

V1 Webhooks EOL

The End of Life date for v1 Webhooks is 10/31/2022. This means:

  • You will no longer be able to create new v1 Webhooks or use existing connections to v1 Webhook extensions
  • Apps or integrations that are using v1 Webhooks will stop working

For more information, please:

Important Dates:

V2 Zendesk Integration EOL

PagerDuty’s v2 App for Zendesk will End-Of-Life in March 2023. Migrate now to continue to send Zendesk Support Ticket events to PagerDuty. You can read about the benefits of migrating to v3 in the Integrations section above.

Event Rules EOL & Migration to Event Orchestration

PagerDuty Event Rules End-Of-Life is January 31, 2023. You can:

  • Learn more about the migration in the knowledge base
  • Learn more about Event Orchestration
  • Contact your account manager

We have plenty of migration paths to support this EOL. Additionally, on the EOL date, we will auto-migrate any remaining event rules you are using to Event Orchestrations, one-to-one. From then on, you’ll be able to do everything in Event Orchestration that you can in Event Rules today. Event Orchestration has the same features as Event Rules and it uses the same backend architecture, ensuring that event processing has billions-of-events-worth of testing already baked in.

Webinars & Events

Join us for the following webinars and events to learn more about PagerDuty’s recent product updates and how they benefit customers. These are just a few of many:

Webinars

Fewer Rules & Less Noise: Your Guide to Event Orchestration with PagerDuty University

Join Hannah Lodise for a quick 3-minute walkthrough of Event Orchestration basics. Your teams will learn to leverage automation to reduce noise, more efficiently process events at ingest, and even avoid incidents entirely!

Register now!

Terraform Quarterly Roundtable

Date & Time: February 21, 2023 10:00am PT (US & Canada)

Join Scott McAllister and José Antonio Reyes from PagerDuty to hear from industry peers and share your own experiences regarding best practices, learnings, and things to consider when implementing Infrastructure as Code with Terraform and PagerDuty.

Reserve your spot today!

Evolve to resolve: fewer incidents, faster response (November Product Launch!)

If you missed our latest product launch earlier this month, you can still watch it for a deep dive into our latest AIOps and Automation capabilities. Kat Gaines, Jonathan Rende, Julia Nasser, Sam Ferguson, and Hadijah Creary introduced:

  • Incident Workflows
  • PagerDuty Status Page and Status Update Notification Templates
  • Flexible time windows for Intelligent Alert Grouping
  • Updated for 2022! Incident Response Ops Guide

Watch it on demand!

In-Person Events

Join PagerDuty at AWS re:Invent 2022, or visit our booth at the Gartner IT Infrastructure, Operations and Cloud Strategies Conference, for our latest product demo and to learn more about PagerDuty from our product experts and team of solutions consultants.

Register for upcoming events in December here!

PagerDuty Community Twitch Stream

Join us on our Twitch channels, PagerDuty Twitch Stream and PagerDuty Community Twitch Stream, to catch up on one of our latest streams led by our Developer Advocates! Catch our past streams via the YouTube Twitch Streams Channel.


If your team could benefit from any of these enhancements, be sure to contact your account manager and sign up for a 14-day free trial.

The post What’s New: Updates to Incident Response, PagerDuty® Process Automation Software & PagerDuty® Runbook Automation, Mobile App Experience, and More! appeared first on PagerDuty.

]]>
PagerDuty and DataOps: Enabling Organizations to Improve Decision Making with Better Data by Jorge Villamariona https://www.pagerduty.com/blog/pagerduty-and-dataops/ Thu, 27 Oct 2022 01:00:01 +0000 https://www.pagerduty.com/?p=79281 This blog was co-authored by Jorge Villamariona from Product Marketing and May Tong from Technology Ecosystem Introduction Many organizations have been digitally transforming their operations...

The post PagerDuty and DataOps: Enabling Organizations to Improve Decision Making with Better Data appeared first on PagerDuty.

]]>
This blog was co-authored by Jorge Villamariona from Product Marketing and May Tong from Technology Ecosystem

Introduction

Many organizations have been digitally transforming their operations and the majority of them are moving to the cloud.  With this transformation, data teams have to analyze ever larger and more complex data sets to allow downstream teams to make faster and more accurate decisions on a daily basis. Consequently, most organizations need to work with: customer data, product data, usage data, advertising data, and financial data. Some of the datasets are structured, some are semi-structured, and some unstructured. In short, there are endless amounts of data of various types arriving from multiple sources at increasing rates.

As the volume, velocity, and variety (commonly known as the 3Vs) of big data grew, the traditional approaches to managing the data lifecycle started to fall short. Concurrently, toward the end of the first decade of the 2000s, software development teams started adopting agile methodologies for the software development lifecycle. These methodologies became known as DevOps (a portmanteau of Development and Operations). The following diagram illustrates the DevOps process at a high level.

 

DevOps Process


Meanwhile, data professionals took a page from their next-door software development colleagues and started applying DevOps methodologies and concepts to their own complex data environments. This is what brought about the DataOps approach.

So, what is DataOps?

DataOps is the practice of combining software and data engineering, quality assurance, and infrastructure operations into a single nimble organization. DataOps optimizes how organizations develop and deploy data applications. It leverages process evolution, organizational alignment, and multiple technologies to enable relationships among everyone who participates in producing, moving, transforming, and consuming data: developers, data engineers, data scientists, analysts, and business users. It fosters collaboration, removes silos, and gives teams the ability to use data across the organization to make better business decisions. Overall, DataOps helps teams collect, prepare, and analyze data, and make faster and more accurate decisions from a complete data set. DataOps also reduces data downtime and failures by monitoring data for quality.

What Problems Does DataOps Solve?

DataOps addresses a number of common challenges in your organization’s data environments, among them:

  1. Removing silos and promoting collaboration between teams: Data engineers, scientists, and analysts must collaborate. This requires a significant cultural shift, and companies need to allow their employees to iterate rapidly with data-driven ideas.
  2. Improving efficiency and agility: The time spent responding to bugs and defects can be dramatically reduced with greater levels of communication and collaboration between teams and the use of automation.
  3. Improving data quality: DataOps gives data professionals the ability to automatically format data and use multiple data sources to help teams analyze the data and make better decisions.
  4. Eliminating data downtime and failures: Data teams monitor the data for quality.

What is Data Observability?

“Data observability” provides the tools and methodologies to monitor and manage the health of an organization’s data across multiple tools and across the complete data lifecycle. Data observability allows organizations to proactively correct problems in real-time before the problems impact business users.

What is the relationship between Data Observability and DataOps?

Data observability is a framework that enables DataOps.  DataOps teams use agile approaches to extract business value from enterprise data. But any problems with incorrect or inaccurate data could create serious challenges, especially if issues (aka data downtime) are not detected before they impact the business. Fortunately, with AI-powered data observability, organizations can detect, resolve and prevent data downtime.

Data observability tools are concerned with five aspects of data: freshness, statistical distribution, volume, schema, and lineage. The correct use of data observability tools results in better quality data, enhanced trust, and a more operationally mature environment.
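To make three of these checks concrete, here is a minimal sketch of freshness, volume, and schema checks on a single dataset. It is not taken from any particular observability tool; the function names, the 20% volume tolerance, and the example thresholds are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def check_freshness(last_loaded_at, max_age, now=None):
    """Freshness: was the dataset updated recently enough?"""
    now = now or datetime.now(timezone.utc)
    return (now - last_loaded_at) <= max_age


def check_volume(row_count, expected_rows, tolerance=0.2):
    """Volume: is the row count within +/- tolerance of what we expect?"""
    lower = expected_rows * (1 - tolerance)
    upper = expected_rows * (1 + tolerance)
    return lower <= row_count <= upper


def check_schema(actual_columns, expected_columns):
    """Schema: return (missing, unexpected) column names."""
    actual, expected = set(actual_columns), set(expected_columns)
    return sorted(expected - actual), sorted(actual - expected)
```

A failed check is the kind of signal that would be raised as an alert before downstream users notice stale or incomplete data.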

Who are the stakeholders in DataOps?

Surely, building a strong centralized data team that builds relationships between all of the departments within an organization is a key factor in achieving data operational maturity. The data team usually publishes the most relevant datasets, thus ensuring that decisions, analyses, and data models are done from a single source of truth. At the other end of the spectrum are the data analysts and line-of-business users who consume these datasets by asking questions and extracting answers from the data. Carefully and intentionally defining roles and responsibilities helps organizations avoid conflicts, redundancies, and inefficiencies.

DataOps Personas

Here are the most common profiles (aka personas) that take part in the data lifecycle:

  • Data Engineers: These data professionals are in charge of capturing the data and building the pipelines that bring it from the source systems into data stores so that analysts and data scientists can access it. They publish core datasets after cleansing and transforming the data. They are in charge of providing timely data that is clean, curated, and accessible to those who need it. In the most traditional data environments the ETL (Extraction, Transformation, and Loading) acronym appears in their title.
  • Data Scientists: Apply their knowledge of statistics to build predictive and prescriptive models. Their most common environments are Scala, Python, and R. Aside from statistics, they are generally experts in data mining, machine learning, and deep learning. The financial industry, for example, has traditionally referred to them as quants, because of their solid background in mathematics.
  • Data Analysts/Business Analysts: Are data professionals who are generally part of line-of-business or functional groups (sales, marketing, etc.).  They are familiar with how the organization operates, the strategic objectives, and where, and how data is needed.  They transform business questions into data queries.   They have a deep understanding of the information and key metrics executives need to measure and achieve their goals. They are experts at utilizing front-end BI (Business Intelligence) tools.
  • Data Platform Administrators: Manage the infrastructure so that it works well, has ample capacity, and provides high quality of service to every department relying on it. They are responsible for transactional databases, data warehouses, data lakes, BI tools and so on. Additionally, they establish the access policies and control infrastructure and licensing costs.
  • Line of Business Data Consumers: Are the final users of the data, and generally use the data to make decisions. They rely on BI tools and are responsible for taking action based on what the data says. For example, sales leaders may decide to invest more in a particular geography based on sales activity. Similarly, marketing managers may decide to allocate campaign funds to certain types of campaigns based on ROI metrics.
  • Chief Data Officer: This person oversees the whole data team operation. Typically they report to the CEO or CTO, and sometimes to the CIO.

 

DataOps Stakeholders and their tools

Stakeholders in the DataOps process at PagerDuty

The diagram above places the stakeholders in their traditional area of responsibility within the DataOps process at PagerDuty.  Undoubtedly, there will be varying degrees of overlap in different organizations.

DataOps at PagerDuty

At PagerDuty we have implemented a DataOps practice that leverages PagerDuty and a handful of our technology partners. By applying PagerDuty and DataOps principles we have been able to:

  • Move away from several data warehouses to a single data warehouse where datasets from MuleSoft, Segment, Fivetran, Kafka, and Spark pipelines get consolidated into a single source of truth.
  • Meet data SLAs from multiple data workloads by taking advantage of automation and data technology partnerships.
  • Leverage observability to detect, resolve, and prevent incidents with our data before users learn about them.
  • Shift the focus of the data team from administrative tasks to data driven insights and data science.
  • Future-proof our data environment to meet the demands of proliferating data use cases.  These range from BI to new Artificial Intelligence (AI) applications from over 400 internal users in multiple departments and thousands of customers.

 

DevOps Process at PagerDuty

DataOps Environment at PagerDuty

The diagram above depicts several of the key components that make up our DataOps environment.  While every organization’s data needs and data environment are unique, you can see that our problems and architecture are not all that unique (multiple data warehouses, multiple ETL tools, strict SLAs, sprawling demand for datasets). More than likely, you are already spotting several shared high level problems as well as architectural similarities with your own data environment.

You can also leverage PagerDuty in your DataOps environment

The PagerDuty digital operations platform alerts data teams and downstream data users and consumers as soon as data issues arise to prevent data downtime. We are excited to announce our six currently published DataOps or data-related integrations within our ecosystem. These technology partners solve data pipeline and data quality problems across the organization.  They improve collaboration, reduce friction, and reduce data failures by improving alignment:

  • Monte Carlo: Provides end-to-end data observability, solving data downtime before it happens.
  • Lightup: Helps enterprises achieve great data quality at cloud scale.
  • Arize: A Machine Learning (ML) observability platform to monitor, troubleshoot, and resolve ML model issues.
  • WhyLabs: Prevents costly AI failures by providing data and model monitoring.
  • Prefect: Builds and monitors data pipelines with real-time alerting.
  • Astronomer: Reduces data downtime with real-time data monitoring on pipelines.
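For teams wiring their own data checks into PagerDuty, a trigger event can be sent through the PagerDuty Events API v2. The sketch below builds such an event and posts it with the Python standard library. The routing key, summary text, and source name are placeholders, and the helper function names are hypothetical; treat this as a starting point under those assumptions, not as the way the listed partner integrations work.

```python
import json
import urllib.request

# Events API v2 endpoint for enqueueing events.
EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger_event(routing_key, summary, source, severity="error", dedup_key=None):
    """Build an Events API v2 trigger event for a data issue."""
    event = {
        "routing_key": routing_key,      # integration key of the target service
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,        # critical, error, warning, or info
        },
    }
    if dedup_key:
        # Repeated events with the same dedup_key update one alert.
        event["dedup_key"] = dedup_key
    return event


def send_event(event):
    """POST the event to PagerDuty and return the HTTP status code."""
    request = urllib.request.Request(
        EVENTS_API_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

For example, a failed freshness check on a hypothetical `orders` table could call `send_event(build_trigger_event(key, "orders table is stale", "warehouse.orders", "warning", "orders-freshness"))`.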

 

Image of PagerDuty integrations

PagerDuty DataOps Ecosystem

Most importantly, these new DataOps integrations with PagerDuty cover key areas such as: data pipeline orchestration, testing and production quality, deployment automation, and data science/ML model management.  We encourage you to try PagerDuty along with some of these PagerDuty ecosystem technology partners to help you drive tighter collaboration amongst cross-functional teams and achieve better and faster decisions with less data downtime.  Similarly, if you are thinking about building a PagerDuty Integration, please sign up for a developer account to get started.

The post PagerDuty and DataOps: Enabling Organizations to Improve Decision Making with Better Data appeared first on PagerDuty.

]]>
What’s New: Updates to Mobile, PagerDuty® Process Automation Software & PagerDuty® Runbook Automation, and More! by Vera Chan https://www.pagerduty.com/blog/whats-new-product-update-2022-09-29/ Thu, 29 Sep 2022 13:00:43 +0000 https://www.pagerduty.com/?p=78880 We’re excited to announce a new set of updates and enhancements to the PagerDuty Operations Cloud. Recent development and app updates from the product team...

The post What’s New: Updates to Mobile, PagerDuty® Process Automation Software & PagerDuty® Runbook Automation, and More! appeared first on PagerDuty.

]]>
We’re excited to announce a new set of updates and enhancements to the PagerDuty Operations Cloud. Recent development and app updates from the product team include Incident Response, PagerDuty® Process Automation, as well as Community & Advocacy Events updates. We continue to help customers automate everywhere to optimize cloud operations and reduce the number of issues escalated to other teams. Get started now and learn about:

PagerDuty Mobile App

New! Create and Manage Maintenance Windows Through the PagerDuty Mobile App

The Maintenance Windows feature for the PagerDuty mobile app is generally available (as of September 15th, 2022). Maintenance windows help responders temporarily disable a service, including all of its integrations, while it is in maintenance mode. When a service is in a maintenance window, all the service integrations are effectively “switched off” so that no new incidents will trigger. Now, users away from their desk or office have the flexibility to create, update, and delete maintenance windows through the PagerDuty mobile app.
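The suppression behavior can be pictured with a small model. This is a simplified sketch of the idea only, not PagerDuty's internal logic: the `MaintenanceWindow` type and both function names are hypothetical, and the window is assumed to include its start time and exclude its end time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MaintenanceWindow:
    """Hypothetical record of a scheduled maintenance window."""
    service_id: str
    start: datetime
    end: datetime


def in_maintenance(service_id, at, windows):
    """True if the service has a window covering the given time."""
    return any(
        w.service_id == service_id and w.start <= at < w.end
        for w in windows
    )


def should_trigger_incident(service_id, at, windows):
    """Events for a service in maintenance do not open new incidents."""
    return not in_maintenance(service_id, at, windows)
```

Only the service named in the window is affected; events for other services keep triggering incidents as usual.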

(Featured above: Active Maintenance Windows on Mobile)

(Featured above: Maintenance Windows on Mobile Create and Schedule Maintenance Window)

PagerDuty® Process Automation

PagerDuty® Process Automation Software and PagerDuty® Runbook Automation Version 4.6.0

Check out the new features and enhancements for PagerDuty®  Process Automation (formerly Rundeck Enterprise), PagerDuty® Runbook Automation and Rundeck Community in this release, including: 

  • Enhancements to the Amazon ECS node source plugin. Users can now integrate with multiple clusters in a given region, making it easier to manage ECS Tasks across larger environments

  • A number of important security and compliance updates, and bug fixes

Learn more:

Product Deprecations

Please take note and keep your teams informed of our upcoming product deprecations.

V1/V2 Webhooks EOL

The End of Life date for v1 Webhooks is 10/31/2022. This means:

  • You will no longer be able to create new v1 Webhooks or use existing connections to v1 Webhook extensions
  • Apps or integrations that are using v1 Webhooks will stop working

For more details and steps to migrate to v3 Webhooks, please refer to this migration guide.

If you have additional questions, please reach out to support@pagerduty.com.

Important Dates:

  • V2 Webhooks – V2 webhook extensions will be unsupported in October, 2022.

Required Permissions:

  • Admins or Account Owners can migrate an entire account
  • Team Managers can only migrate webhooks for their assigned Teams

If you have any questions, please reach out to your PagerDuty contact or our support team at support@pagerduty.com.

Event Rules EOL & Migration to Event Orchestration

PagerDuty Event Rules End-Of-Life is January 31, 2023. You can:

  • Learn more about the migration in the knowledge base
  • Learn more about Event Orchestration
  • Contact your account manager

We have plenty of migration paths to support this EOL. Additionally, on the EOL date, we will auto-migrate any remaining event rules you are using to Event Orchestrations, one-to-one. From then on, you’ll be able to do everything in Event Orchestration that you can in Event Rules today. Event Orchestration has the same features as Event Rules and it uses the same backend architecture, ensuring that event processing has billions-of-events-worth of testing already baked in.

Webinars & Events

Join us for the following webinars and events to learn more about PagerDuty’s recent product updates and how they benefit customers. These are just a few of many:

Webinars

Cloud Ready: Accelerate Your Modernization Journey in the Cloud with PagerDuty and AWS

Join Inga Weizman and Mandi Walls as they walk you through how PagerDuty’s industry-leading AWS integrations are built to help organizations using AWS and PagerDuty to automate incident response and speed up cloud adoption while also minimizing downtime and impact to customers. They’ll cover how PagerDuty helps organizations:

  • Reduce downtime and customer impact with service ownership while enabling teams to drive continuous improvement and innovation
  • Modernize and optimize your operations with a set of enterprise-grade AWS integrations
  • Automate incident response with PagerDuty’s Runbook Automation and newest set of AWS plugins and prebuilt jobs that make it easier to get up and running with auto-diagnostics

Register today!

Improve Efficiency of Incident Response with Automated Diagnostics for AWS in PagerDuty

Join Greg Chase, Sebastian Joseph, and John Kiefer from PagerDuty as they discuss how PagerDuty Automated Diagnostics for AWS can help customers quickly triage problems in AWS environments. You’ll see:

  • How first responders can diagnose problems in AWS like senior engineers
  • How to use prebuilt diagnostics for AWS services available for PagerDuty
  • How your senior engineers can create new diagnostics for first responders
  • Demos of how all of the above works

Register Today!

Customer Service Operations: The Proactive Approach with Zendesk

Join Kat Gaines and Carrie Lacina as they showcase how PagerDuty for Customer Service Operations and Zendesk empower customer service teams to resolve issues faster to get ahead of customer-impacting incidents. Join to learn:

  • How to leverage machine learning to inform customers before they know about a problem, with information on what to expect next and provide differentiated responses for VIP customers
  • The benefits of using PagerDuty Automation Actions in Zendesk to validate customer problems and capture critical information via automation to diagnose and resolve cases faster
  • How PagerDuty and Zendesk customers drive loyalty, improve CSAT, and exceed customer SLAs by resolving issues before they impact business

Register today!

Register for upcoming events in October here!

PagerDuty Community Twitch Stream

Join us on our Twitch channels, PagerDuty Twitch Stream and PagerDuty Community Twitch Stream, to catch up on one of our latest streams led by our Developer Advocates! Catch our past streams via the YouTube Twitch Streams Channel.


If your team could benefit from any of these enhancements, be sure to contact your account manager and sign up for a 14-day free trial.

The post What’s New: Updates to Mobile, PagerDuty® Process Automation Software & PagerDuty® Runbook Automation, and More! appeared first on PagerDuty.

]]>
What’s New: Updates to PagerDuty® Process Automation Software & PagerDuty® Runbook Automation, Integrations, and More! by Vera Chan https://www.pagerduty.com/blog/whats-new-product-update-2022-08-31/ Wed, 31 Aug 2022 13:00:10 +0000 https://www.pagerduty.com/?p=78256 We’re excited to announce a new set of updates and enhancements to the PagerDuty Operations Cloud. Recent development and app updates from the product team...

The post What’s New: Updates to PagerDuty® Process Automation Software & PagerDuty® Runbook Automation, Integrations, and More! appeared first on PagerDuty.

]]>
We’re excited to announce a new set of updates and enhancements to the PagerDuty Operations Cloud. Recent development and app updates from the product team include PagerDuty® Process Automation, our Partner Integrations and App Ecosystem, as well as Community & Advocacy Events updates. We continue to help customers automate everywhere to optimize cloud operations and reduce the number of issues escalated to other teams. Get started now and:

PagerDuty® Process Automation

PagerDuty® Process Automation Software and PagerDuty® Runbook Automation Version 4.5.0

The latest PagerDuty® Process Automation Version 4.5.0 includes the following:

  • New Sumo Logic Job Step Plugin: Now Sumo Logic users can automate operational tasks, such as retrieving logs for incident diagnostics, by integrating with a Sumo Logic instance.
  • AWS ECS Node Executor Plugin: Users can now run commands across multiple ECS containers in a single Job Step or from the Commands tab. This makes it easier to accomplish tasks, such as retrieving time-critical diagnostics during an incident before containers are redeployed.

View the highlights and learn about additional updates and fixes to plugins and more.

Upgrades, fixes, and package updates are also available for the Rundeck Open Source Product. You can learn more about them and view our public pull requests in GitHub.

Learn more about the Nessie Orchid Tower Release (4.5.0), August 10, 2022.

Partner Integrations & App Ecosystem

PagerDuty App for Jira Data Center: Annual Marketplace App Recertification

If you’ve already integrated PagerDuty with Jira Data Center to tackle critical service requests and accelerate incident resolution with bi-directional sync between Jira Server issues and PagerDuty incidents, we have great news for you! We have officially re-certified the PagerDuty App for Jira Data Center for another year, maintaining our Data Center Approved Status in the Atlassian marketplace.

Learn more about the app in the knowledge base

PagerDuty App for Jira Cloud: New Option to Display PagerDuty on Jira Cloud Sidebar Available This September

  • New Jira Cloud customers will not see PagerDuty displayed on the sidebar by default for all Jira Projects.
  • Existing Jira Cloud project admins have the ability to hide or display PagerDuty on the Jira sidebar per project.

(Featured above: PagerDuty sidebar disabled in Jira Cloud)

(Featured above: PagerDuty sidebar enabled in Jira Cloud)

If you have any questions, please reach out to your PagerDuty account team or Customer Support (support@pagerduty.com).

PagerDuty App for BMC Helix/Remedy: Transfer of Ownership to Partner KTSL

PagerDuty has transferred ownership of the BMC Helix/Remedy integration to KTSL, a BMC Elite partner that specializes in service management and integration. Going forward, KTSL will build and support the integration.


(Featured above: PagerDuty Demo from Andrew North on Vimeo)

Product Deprecations

Please take note and keep your teams informed of our upcoming product deprecations.

Event Rules EOL & Migration to Event Orchestration

PagerDuty has decided to EOL (end-of-life) Event Rules on January 31, 2023. We have made this decision to ensure that we are dedicating our resources toward building the most robust and reliable event-driven enrichment and automation experience for our customers. Event Orchestration was released earlier this year as the next evolution of Event Rules, and it is now the best way for users to compress rule volumes, improve noise reduction, and more effectively automate away well-understood manual work.

We have plenty of migration paths to support this EOL. Additionally on the EOL date, we will auto-migrate any remaining event rules you are using to Event Orchestrations, one-to-one. From then on, you’ll be able to do everything in Event Orchestration that you can in Event Rules today. Event Orchestration has the same features as Event Rules and it uses the same backend architecture, ensuring that event processing has billions-of-events-worth of testing already baked in.

(Featured above: PagerDuty Event Orchestration)

V1/V2 Webhooks

If you are currently using V1/V2 webhook extensions in your PagerDuty environment, you need to migrate them to V3 webhook subscriptions to maintain functionality.

Please follow our migration guide.

Important Dates:

  • V1 Webhooks – V1 webhook extensions have been unsupported (no new features or bug fixes) since November 13, 2021 and will stop working in October, 2022.
  • V2 Webhooks – V2 webhook extensions will be unsupported in October, 2022 and will stop working in March, 2023.

Required Permissions:

  • Admins or Account Owners can migrate an entire account.
  • Team Managers can only migrate webhooks for their assigned Teams.

What are Webhooks? Webhooks allow you to receive HTTP callbacks when significant events happen in your PagerDuty account, for example, when an incident triggers, escalates, or resolves. Details about the event are sent to your specified URL, such as Slack or your own custom PagerDuty webhook processor.
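As a rough sketch of what a custom webhook processor does with such a callback, the function below parses a JSON body and dispatches on the event type. The payload shape (an `event` object with `event_type` and `data.title`) is modeled on V3 webhook subscriptions but should be treated as an assumption here, and `summarize_webhook` is a hypothetical helper name; check the webhook documentation for the exact fields before relying on it.

```python
import json

# Event types this processor cares about, mapped to a short label.
HANDLED_EVENT_TYPES = {
    "incident.triggered": "TRIGGERED",
    "incident.escalated": "ESCALATED",
    "incident.resolved": "RESOLVED",
}


def summarize_webhook(body: bytes) -> str:
    """Turn a webhook request body into a one-line summary."""
    event = json.loads(body).get("event", {})
    event_type = event.get("event_type", "unknown")
    title = event.get("data", {}).get("title", "(no title)")
    label = HANDLED_EVENT_TYPES.get(event_type)
    if label is None:
        # Unrecognized events are acknowledged but not acted on.
        return f"ignored {event_type}"
    return f"{label}: {title}"
```

In a real processor this function would sit behind an HTTP endpoint at the URL you registered, and the summary would be forwarded to chat, a ticketing system, or a log.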

If you have any questions, please reach out to your PagerDuty contact or our support team at support@pagerduty.com.

Learn more about webhooks.

Webinars & Events

Join us for the following webinars and events to learn more about PagerDuty’s recent product updates and how they benefit customers. These are just a few of many:

Events

PagerDuty Summit 2022 (On Demand)

What's New in PagerDuty and PagerDuty Product Updates : PagerDuty Summit

Missed us at PagerDuty Summit this year? Summit talks are now available on demand so you can catch our newest demos, technical sessions, and keynotes from executives and industry leaders anytime!

Watch Summit 2022 on demand

Accelerating Incident Resolution with PagerDuty through Automated Diagnostics for AWS
Thursday, Sep 29, 2022 — 7:00 AM PDT
Thursday, Sep 29, 2022 — 10:00 AM PDT
Thursday, Sep 29, 2022 — 5:00 PM PDT

Join Jake Cohen and Greg Chase from PagerDuty as they discuss how PagerDuty’s:

  • Event Intelligence can help reduce noise and drive to next best action for fewer incidents and faster resolution
  • Automated Diagnostics help jumpstart time to triage and accelerate root cause analysis

Register Today!

Customer Service Operations: The Proactive Approach with Zendesk
Wednesday, Sep 28, 2022 10 AM PDT

Join Kat Gaines and Carrie Lacina to learn how to:

  • Leverage machine learning to inform customers before they know about a problem, with information on what to expect next and provide differentiated responses for VIP customers
  • Use Automation Actions in Zendesk to validate customer problems and capture critical information via automation to diagnose and resolve cases faster
  • Drive loyalty, improve CSAT, and exceed customer SLAs by resolving issues before they impact the business

Register today!

Harness the Power of Automation-First AIOps to Improve MTTR
Thursday, Sep 22, 2022 10:00 AM PDT

Join Heath Newburn from PagerDuty as he walks you through:

  • Common problems enterprises are looking to AIOps to solve
  • Key criteria to consider when evaluating solutions to get value quickly
  • How PagerDuty’s automation-first, people-centric approach to AIOps enables machines and people to do what they do best

Register today!

Register for upcoming events in September here!

PagerDuty Community Twitch Stream

Join us on our Twitch channels, PagerDuty Twitch Stream and PagerDuty Community Twitch Stream, to catch up on one of our latest streams led by our Developer Advocates! Catch our past streams via the YouTube Twitch Streams Channel.


If your team could benefit from any of these enhancements, be sure to contact your account manager and sign up for a 14-day free trial.

The post What’s New: Updates to PagerDuty® Process Automation Software & PagerDuty® Runbook Automation, Integrations, and More! appeared first on PagerDuty.

]]>
PagerDuty Debuts as a Leader in 2022 GigaOm Radar for AIOps Solutions by Heath Newburn https://www.pagerduty.com/blog/pagerduty-debuts-as-a-leader-in-2022-gigaom-radar-for-aiops-solutions/ Tue, 09 Aug 2022 13:00:32 +0000 https://www.pagerduty.com/?p=77888 Every year there is a surprise in a Radar report. While it won’t be a surprise to our thousands of customers who are seeing tremendous...

The post PagerDuty Debuts as a Leader in 2022 GigaOm Radar for AIOps Solutions appeared first on PagerDuty.

]]>
Every year there is a surprise in a Radar report. While it won’t be a surprise to our thousands of customers who are seeing tremendous benefits with us, PagerDuty is excited to be named a Leader in the 2022 GigaOm Radar for AIOps Solutions.

GigaOm uses extensive criteria to evaluate vendors in its Radar. From the report: “This year we’re distinguishing AIOps solutions that require displacing existing tools from those that can be added to the IT tool box without major disruption.” This distinction was one of the keys to PagerDuty being positioned as a Leader and rated Outstanding on Tool Displacement.

Time to value and total cost of ownership have long been hallmarks of PagerDuty’s business value. Customers can trust PagerDuty to help them rapidly maximize the value of their existing systems without having to rip and replace.

Our SaaS platform delivers simplified setup, snap-on integrations, and easy-to-use event routing and enrichment that were all important to our Outstanding score for Ease of Implementation. 

Diagram of GigaOm AIOps Radar Report

With more than 650 integrations, PagerDuty was rated as Exceptional on Data Consumption, System Integration, and Cloud Monitoring. This breadth of capabilities ensures that practitioners get quick time to value with virtually no customization required. The UI, the API, and the Terraform provider allow subject matter experts to leverage all their data sources (monitoring, CI/CD, DevOps, Security, BizOps, etc.) to create the context needed for even new team members to rapidly solve problems.

Our Event Orchestration allows for simple yet powerful routing, enrichment, and automated responses to problems. By moving away from massive, complex rule bases to simplified node-based graph routing, SRE and DevOps teams can control exactly how they want to use events to create context, provide diagnostics, and automatically resolve problems where appropriate. The simple graphical interface provides for easy experimentation, while the underlying Terraform provider enables self-service capabilities, removing the burden from a centralized team. This holistic self-service capability was highlighted in our Outstanding rating for Manageability.

GigaOm recognized the advantage of an automation-first approach to AIOps. Our Rundeck and Catalytic acquisitions have enabled us to offer comprehensive automation integration across the platform in the form of built-in Automation Actions and flexible workflows. Balancing workloads between your humans and your machines is critical to maintaining productivity and preventing burnout. Leveraging automation as the first responder in incident resolution can remove toil and accelerate time to resolution. In cases where a responder is not required, common problem signatures can be identified and handled at machine speed with automated remediation. But no, the machines are not coming for our jobs: while auto-remediation can handle a small percentage of well-understood fixes, more often than not automation serves as a second pair of hands to augment the responders at the center of incident response and investigation.

Although this is only our first year in the Radar, we have built on the past several years’ success with Event Intelligence and are committed to growing capabilities for our customers to deliver new business outcomes. We are on track to process more than 20 billion events for clients this year. By leveraging our many years of data as a SaaS platform to understand how clients reduce noise and resolve problems, we have been able to grow our machine learning, automation, and analytics capabilities, allowing teams to focus on keeping production running and delivering better solutions.

Read the report for yourself here and learn more about PagerDuty’s solution for AIOps here.

What is Event Orchestration? 7 ways to start using this powerful new feature from PagerDuty to reduce noise and automate away manual toil today by Vivian Chan https://www.pagerduty.com/blog/what-is-event-orchestration-7-ways-to-start-using-this-powerful-new-feature-from-pagerduty-to-reduce-noise-and-automate-away-manual-toil-today/ Tue, 02 Aug 2022 13:00:02 +0000

The post What is Event Orchestration? 7 ways to start using this powerful new feature from PagerDuty to reduce noise and automate away manual toil today appeared first on PagerDuty.

Does your team deal with too much noise? Does your heart sink a bit when you think about how much your rulesets have sprawled in order to manage your event processing needs? That’s why we released Event Orchestration earlier this year to help teams reduce the amount of manual work that goes into event management. Event Orchestration is the next evolution of our Event Rules feature set, which helps to route, enrich, and modify events on ingest to remove noise and automate processes.

We took Event Rules and supercharged them to handle more complex, custom logic and sophisticated conditional event processing. We even wrote our own condition language (PagerDuty Condition Language or PCL, pronounced “pickle”) to enable this. You can learn about how we built it from Staff Engineer Barry Kim’s Summit session “PCL 101” here.

Event Orchestration is now the best way for users to compress rule volumes, improve noise reduction, and more effectively automate away well-understood manual work. We recently announced that we will end-of-life Event Rules and migrate all customers to Event Orchestration early next year, so that we can dedicate our resources to building the most robust and reliable event-driven enrichment and automation experience for our customers. For more information about this and the various migration options, we’ve outlined everything in this Knowledge Base article.

In this blog, I’m going to walk through how Event Orchestration is different from Event Rules and review seven common use cases for Event Orchestration that we’re seeing make the most impact for our customers. 

What is Event Orchestration? And how’s it different from Event Rules?

Event Orchestration is a direct upgrade from Event Rules. Basic Event Orchestrations can perform all the same basic event processing actions that Event Rules can perform, with the added benefits of an improved UI, better rule creation, API and Terraform support, and advanced conditions. For customers with the Event Intelligence add-on or Digital Operations plans, Advanced Event Orchestrations bring even more functionality to the table, including contextual conditions, webhooks, paused incident notifications, rule nesting, and a direct integration with Automation Actions.

Below are a few of the key ways that Event Orchestration is superior to Event Rules:

  • Easier to use: Architecturally, Event Orchestration takes advantage of PagerDuty’s more modern approach to front-end development by leveraging React as its core frontend stack. This allows customers to navigate their rules with less lag and greater support for accessibility improvements in the future.
  • More complex event processing: Because of the condition language that Event Orchestration supports and the capability to nest rules, customers using Event Orchestration can perform complex event processing actions with a fraction of the configuration effort. What could once be accomplished with 10 event rules can now be done with 1 Event Orchestration rule.
  • More robust support for automation: Users can trigger Automation Actions or webhooks with custom headers.
  • More precise event processing: Rule nesting allows users to execute automations with a high degree of precision, because customers can itemize in detail each known failure state for their systems and deploy automation to each with confidence.
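To make rule nesting a little more concrete, here is a minimal sketch in plain Python. It is not PCL and not the PagerDuty API: the `Rule` class, the `evaluate` function, the field names, and the action names are all hypothetical, and it assumes that the first matching rule wins at each level of nesting.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, str]


@dataclass
class Rule:
    """Hypothetical rule: a condition, some actions, and optional nested rules."""
    label: str
    condition: Callable[[Event], bool]
    actions: Dict[str, str] = field(default_factory=dict)
    children: List["Rule"] = field(default_factory=list)


def evaluate(rules: List[Rule], event: Event) -> Dict[str, str]:
    """Apply the first matching rule, then descend into its nested rules."""
    for rule in rules:
        if rule.condition(event):
            merged = dict(rule.actions)
            # Nested rules only run for events that matched the parent.
            merged.update(evaluate(rule.children, event))
            return merged
    return {}


rules = [
    Rule(
        label="database alerts",
        condition=lambda e: e.get("component") == "database",
        actions={"route_to": "db-team"},
        children=[
            Rule("disk full",
                 lambda e: "disk full" in e.get("summary", "").lower(),
                 {"automation": "clean-tmp-files"}),
            Rule("replication lag",
                 lambda e: "replication lag" in e.get("summary", "").lower(),
                 {"automation": "collect-replica-status"}),
        ],
    ),
    Rule("catch-all", lambda e: True, {"route_to": "triage"}),
]

result = evaluate(rules, {"component": "database", "summary": "Disk full on db-01"})
print(result)
```

The parent rule handles routing once, and each nested rule only has to describe one known failure state and the automation that goes with it.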

What are the most common use cases for Event Orchestration? 

With all this additional functionality, I hope it’s clear that Event Orchestration has the potential to significantly improve your team’s experience as a part of major and minor incident response. But where should people get started? 

One of the most popular sessions in our on-demand video library at Summit 2022 was 7 Ways to Use Event Orchestration to Reduce Noise and Automate More Often. In the session, Professional Services Consultant Eddie Willits, joined by Senior Product Manager Frank Emery, walks through Event Orchestration and the most common ways customers are using this powerful new capability. I’ve summarized them below, but if you’re an audio/visual learner, you can also watch their quick 20-minute session.

Here are the 7 most common use cases for Event Orchestration today:

1) Suppression

The trouble with noise is that it’s very distracting. It’s especially annoying when it wasn’t even worth stopping what you were doing to look at it in the first place. Classic examples of this would be events coming from a staging environment or non-critical development events that are sent after hours. How can you ensure that your team only works on the incidents that matter?

Event Orchestration helps teams stay focused by interrupting responders with only the most important, time-critical alerts. You can design an orchestration that looks for a certain type of low-priority signal and calls PagerDuty’s Pause Incident Notification to handle irrelevant, low-value, or distracting events by automatically downgrading them or suppressing them entirely. Instead of spending time acknowledging distracting events, responders can stay focused on critical alerts affecting the business.
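The decision logic behind a suppression rule can be sketched in a few lines of plain Python. This is only an illustration, not PCL or a PagerDuty API: the `decide` function, the `environment` and `severity` field names, the action names, and the 09:00 to 18:00 working hours are all assumptions.

```python
def decide(event: dict, local_hour: int) -> str:
    """Return a hypothetical action name for an incoming event."""
    environment = event.get("environment", "production")
    severity = event.get("severity", "critical")
    after_hours = local_hour < 9 or local_hour >= 18  # assumed working hours

    # Staging events are never worth interrupting anyone for.
    if environment == "staging":
        return "suppress"
    # Non-critical development events can wait until the morning.
    if environment == "development" and severity != "critical" and after_hours:
        return "suppress"
    # Low-severity signals are kept, but should not page a responder.
    if severity in ("info", "warning"):
        return "downgrade"
    return "notify"


print(decide({"environment": "staging", "severity": "critical"}, local_hour=14))
print(decide({"environment": "production", "severity": "critical"}, local_hour=2))
```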

2) Automated maintenance windows

How often are you thinking “I’m performing maintenance at midnight tonight! How do I make sure that service owners are not woken up?” 

Event Orchestration supports this use case with custom logic that accommodates recurring or scheduled rule conditions. Customers can define when all alerts should be suppressed or re-routed to support an ongoing or planned maintenance window. You can even get more specific than a blanket maintenance window per service by setting up rules that handle alerts differently depending on the monitoring tool that sent them. One example we’ve seen customers lean into is an orchestration that adjusts the severity of production events depending on whether they arrive during on-call or off-call hours.

NOTE: We’re often asked about what happens to the alerts when they’re put in maintenance. Events that come into PagerDuty are always viewable for reference, even if suppressed. These can be seen in the “Alerts” menu. 
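For readers who want to see what a recurring window condition boils down to, here is a rough sketch in plain Python. It is not how Event Orchestration evaluates schedules internally: the `in_window` helper and the Saturday 23:00 to 02:00 window are assumptions made up for the example.

```python
from datetime import datetime, time
from typing import Set


def in_window(now: datetime, weekdays: Set[int], start: time, end: time) -> bool:
    """True if `now` falls inside a recurring window that opens on `weekdays`.

    Weekdays follow datetime.weekday() (Monday is 0). A window whose end is
    earlier than its start is treated as crossing midnight.
    """
    current = now.time()
    if start <= end:
        return now.weekday() in weekdays and start <= current < end
    if current >= start:
        # Before midnight, on the day the window opened.
        return now.weekday() in weekdays
    if current < end:
        # After midnight, so the window opened the previous day.
        return (now.weekday() - 1) % 7 in weekdays
    return False


# Hypothetical window: Saturdays from 23:00 until 02:00 on Sunday.
SATURDAY = {5}
start, end = time(23, 0), time(2, 0)

print(in_window(datetime(2022, 8, 6, 23, 30), SATURDAY, start, end))
print(in_window(datetime(2022, 8, 7, 3, 0), SATURDAY, start, end))
```

A rule built on a check like this could suppress or re-route matching events while the window is open and fall back to normal handling otherwise.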

3) Controlling Alert Storms

Nobody wants to deal with an alert storm. But they do happen. The question is how to control your team’s experience when one happens during a partial or full outage so that it’s minimally disruptive and they can focus on the most important task at hand: getting to the fix.

With Event Orchestration, customers can use threshold-based rules to control incident creation behavior during alert storms. You can configure rules whose actions run only until event volume reaches a certain threshold, or only once it goes above that threshold. This gives you even more precision for event enrichment, routing, or grouping in relation to event volume.
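One way to picture a threshold-based rule is as a sliding-window counter. The sketch below is plain Python and purely illustrative: the `ThresholdRule` class, the action names, and the "3 events per 60 seconds" limit are assumptions, not PagerDuty settings.

```python
from collections import deque


class ThresholdRule:
    """Counts events in a sliding time window and picks an action by volume."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def action_for(self, timestamp: float) -> str:
        # Forget events that have aged out of the window.
        while self.timestamps and timestamp - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        self.timestamps.append(timestamp)
        # Up to the limit, behave normally; above it, stop creating new incidents.
        if len(self.timestamps) <= self.limit:
            return "create_incident"
        return "group_into_existing"


rule = ThresholdRule(limit=3, window_seconds=60)
actions = [rule.action_for(t) for t in (0, 10, 20, 30, 100)]
print(actions)
```

In this run the fourth event arrives while three others are still inside the window, so it is grouped. By the time the fifth arrives the storm has passed, and normal incident creation resumes.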

4) Routing and Enrichment

When troubleshooting, responders need to be able to quickly understand what happened during an outage. How can you highlight this information better in an incident so responders don’t waste time looking for it?

Event Orchestration can help customers with an automated way to approach standardization of incident data by: 

  • overriding malformed fields
  • replacing fields based on known use cases
  • updating the severity/priority/urgency
  • adjusting incident creation behavior (email integration)

As an example, you could set up an orchestration so that any time an event’s payload contains “Response Time is High” and the reported response time is over 1000 ms, the incident is immediately flagged as Priority 1.
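Expressed as plain Python rather than PCL, that example rule might look roughly like the sketch below. The `enrich` function, the `priority` field, and the assumption that the response time appears in the summary as a number followed by "ms" are all hypothetical.

```python
import re


def enrich(event: dict) -> dict:
    """Return a copy of the event, flagged P1 if response time exceeds 1000 ms."""
    enriched = dict(event)
    summary = event.get("summary", "")
    if "Response Time is High" in summary:
        # Assumes the value is reported in the summary, e.g. "1500ms" or "1500 ms".
        match = re.search(r"(\d+(?:\.\d+)?)\s*ms", summary)
        if match and float(match.group(1)) > 1000:
            enriched["priority"] = "P1"
    return enriched


slow = enrich({"summary": "Response Time is High: 1500ms on checkout"})
okay = enrich({"summary": "Response Time is High: 800 ms on checkout"})
print(slow.get("priority"), okay.get("priority"))
```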

5) Providing Runbooks

Any time someone new joins your team, especially when they’re on the junior side, it takes a while to onboard them on the specific approaches that are part of your incident response processes. It takes time to explain and train people on how to approach even well-understood, common incidents. One of the most basic forms of automation we’ve seen customers use to address this problem is simply to write down how they solve these issues in runbooks that can be shared as tried-and-true ways to handle repeat issues.

Event Orchestration makes it easy to add notes that contain links to runbooks, or resolution instructions for known issues. That way, while triaging the incident and looking at the alert payload, the runbook is easily accessible for reference. Embedding this actionable intelligence during event processing on ingest means that L1 responders can easily solve common, well-understood issues without further escalation to senior engineers. 
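The idea is simple enough to sketch in plain Python. Nothing here is a PagerDuty API: the `RUNBOOKS` mapping, the `add_runbook_note` function, the `note` field, and the example.com URLs are placeholders invented for the illustration.

```python
# Hypothetical mapping from a known problem signature to its runbook.
RUNBOOKS = {
    "disk full": "https://wiki.example.com/runbooks/disk-full",
    "certificate expir": "https://wiki.example.com/runbooks/cert-renewal",
}


def add_runbook_note(event: dict) -> dict:
    """Return a copy of the event with a runbook note for known signatures."""
    annotated = dict(event)
    summary = event.get("summary", "").lower()
    for signature, url in RUNBOOKS.items():
        if signature in summary:
            annotated["note"] = f"Runbook: {url}"
            break
    return annotated


noted = add_runbook_note({"summary": "Disk full on db-01"})
print(noted["note"])
```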

6) Updating Systems of Record

Customers using specific ITSM tools for major and minor incidents will be interested in how to keep their system of record in sync with their PagerDuty incidents.

With Event Orchestration webhooks, users can ensure that connected systems are updated as events are ingested. Specific rules contain webhooks that fire off payloads to these systems, which create records with up-to-date event payload information. We’ve seen this used with Jira, ServiceNow, and homegrown CMDB systems. Learn more about PagerDuty’s integrations with ITSM solutions here.
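To show the shape of such a webhook call, here is a small sketch using only Python’s standard library. It builds the request without sending it. The URL, the `X-Api-Token` header, and the choice of payload fields are placeholders; a real system of record defines its own endpoint, authentication, and schema.

```python
import json
import urllib.request


def build_webhook_request(url: str, event: dict, headers: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying selected event fields."""
    body = json.dumps({
        "summary": event.get("summary"),
        "source": event.get("source"),
        "severity": event.get("severity"),
    }).encode("utf-8")
    all_headers = {"Content-Type": "application/json", **headers}
    return urllib.request.Request(url, data=body, headers=all_headers, method="POST")


request = build_webhook_request(
    "https://itsm.example.com/api/records",   # hypothetical endpoint
    {"summary": "Disk full on db-01", "source": "db-01", "severity": "critical"},
    {"X-Api-Token": "replace-me"},            # hypothetical custom header
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would actually send it.
```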

7) Automated Diagnostics and Remediation

Everyone wants to start automating their operational processes. This is not surprising: there are a LOT of manual steps associated with incidents. However, it can be hard to know where and how to start. 

Automated diagnostics are a low-risk, high-value way to trim down MTTR. Think of all the diagnostics you’d have to run at the beginning of an investigation. Now imagine if those had already been run by the time your responder got to the incident.

Event Orchestration makes it simple to integrate automation tools via webhooks. It also has a built-in native integration with PagerDuty Automation Actions, which can trigger automated diagnostics and remediation all in the PagerDuty platform. This helps cut down overall time to resolution since diagnostic results are piped directly into incident details and ready for the responder to review.

Learn more about Event Orchestration

You can read more about Event Orchestration or check out some of our videos on YouTube, including Event Orchestration in Terraform and Fun and Math Behind Event Orchestration.

To learn more about how to extend Event Orchestration across services, read this blog about Global Event Orchestration or take our product tour.
