features | Tags | PagerDuty Build It | Ship It | Own It Mon, 17 Apr 2023 20:13:25 +0000

Reduce MTTR and Take Automation to a New Level with PagerDuty Global Event Orchestration by Hannah Culver https://www.pagerduty.com/blog/global-event-orchestration-generally-available/ Tue, 18 Apr 2023 12:00:58 +0000

The post Reduce MTTR and Take Automation to a New Level with PagerDuty Global Event Orchestration appeared first on PagerDuty.

PagerDuty’s Global Event Orchestration is now generally available. Global Event Orchestration’s powerful decision engine enriches events, controls their routing, and triggers self-healing actions based on event data. Teams can use this functionality across any or all services within PagerDuty. This feature is a continued investment in Event Orchestration, demonstrating PagerDuty’s commitment to providing customers with best-in-class automation capabilities.

Customers in our early access program are already seeing value in Global Event Orchestration, touting reduced MTTR and better standardization of incident response at scale. As Kiril Yurovnik, Technical Lead at Riskified, said, “With a growing number of events, minimizing noise and toil is imperative, especially as organizations aim to optimize their IT processes amid the current economic environment. We’ve been using PagerDuty’s Global Event Orchestration as part of the early availability program, and the results have been strong. Riskified has been able to scale noise reduction, especially from non-production environments, saving our team valuable time to spend time innovating on what’s next.” 

What are Global Event Orchestrations?

Global Event Orchestration is like Service Event Orchestration in that it allows users to define complex rules that determine what happens to an event as it is processed. The difference is that Global Event Orchestration enriches events at ingest. Then, once the data is normalized, the event is routed to a service based on various criteria. This ensures that responders have the best event data possible to begin the response process.
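The enrich-then-route flow described above can be pictured with a minimal Python sketch. This is not PagerDuty's rule engine or API: the `Event` class, the field names, the `prod-` host convention, and the service names are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Event:
    source: str
    summary: str
    fields: dict = field(default_factory=dict)
    service: Optional[str] = None  # filled in by routing


def normalize(event: Event) -> Event:
    """Give every event the same shape, whatever tool sent it."""
    raw = event.fields
    # Assumed for illustration: tools name the severity field differently.
    severity = raw.get("severity") or raw.get("level") or raw.get("priority") or "unknown"
    event.fields = {**raw, "severity": str(severity).lower()}
    return event


def enrich(event: Event) -> Event:
    """Add context before routing (hypothetical host naming convention)."""
    host = event.fields.get("host", "")
    event.fields["environment"] = "production" if host.startswith("prod-") else "non-production"
    return event


# Ordered routing rules: the first matching condition wins.
ROUTES: list[tuple[Callable[[Event], bool], str]] = [
    (lambda e: e.fields["environment"] != "production", "non-prod-triage"),
    (lambda e: "database" in e.summary.lower(), "database-oncall"),
]
DEFAULT_SERVICE = "catch-all"


def orchestrate(event: Event) -> Event:
    """Normalize and enrich at ingest, then route on the cleaned-up data."""
    event = enrich(normalize(event))
    event.service = next((svc for cond, svc in ROUTES if cond(event)), DEFAULT_SERVICE)
    return event
```

Because routing runs only after normalization and enrichment, each routing rule can rely on fields such as `severity` and `environment` being present, regardless of which monitoring tool produced the event.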

Global Event Orchestration has three key components that make it successful for scaling incident response. 

Global Orchestration Rules allow users to apply actions to events across services. Teams can create rules that process event data across services and use the processed data to improve event routing. This lets organizations establish and improve auto-remediation, so a human doesn’t need to be involved to resolve an incident. More intelligent routing also reduces the blast radius of an incident.

Enhanced integration key management reduces the workload of managing integration keys for different monitoring tools. This allows users to combine integration keys into one event orchestration. Even better, enhanced integration key management is now available for all PagerDuty plans.

Additional APIs allow for management at scale. Teams can use REST APIs for event source and Global Orchestration Rule management. Both of these APIs have Terraform support. These APIs are in addition to the REST APIs for Event Orchestration/Service Orchestration management.

“Leveraging PagerDuty’s Global Event Orchestration has been critical to ensure that our event routing processes are efficient and scalable to optimize IT operations and spend,” said Brian Long, Cloud Infrastructure Engineer at Hyland. “With Global Event Orchestration, our organization is able to detect the “resolved” condition from our notifications to execute as a resolve and reduce the number of places these conditions need to be configured by at least a factor of three. This frees up our time to focus on innovation, not configuration.”

How can Global Event Orchestration help my team?

With Global Event Orchestration, teams will see:

  • Codified incident response processes: democratize and distribute well-understood incident responses across distributed teams
  • Fewer incidents: use contextual event data from all services within your ecosystem to improve suppression accuracy
  • Faster resolution: apply automation across teams and enable automated diagnostics at scale with standardized enrichment and data normalization

How teams use Global Event Orchestration may vary based on organizational structure. Capabilities align with two groups: operations teams (ITOps, SRE, and NOC) and developer teams.

ITOps teams will be able to capitalize on the event normalization capabilities, ensuring that all events look the same as they come in.

SRE teams can create and extend automation across any or all services within a technical ecosystem. This makes scaling and standardizing automation across an organization easier than ever.

For L1 response teams such as NOCs, Global Event Orchestration helps them handle the massive incoming wave of events. Events can be routed to the NOC if they meet certain criteria. And, as the event passes through levels of rules and nested rules, automation can deliver diagnostics to the L1 responder. If the fix for an incident is well-known, organizations can create auto-remediation.

Developer teams will see fewer incidents and faster resolution. With auto-remediation, incidents can be resolved before they even hit the services that the developer teams are on call for. And, with in-depth routing criteria, incidents don’t bounce from team to team. If automation or the NOC or L1 responders can’t resolve it, the incident will go to the subject matter expert (SME). And, by the time the SME begins to work on the incident, diagnostic information is already available, reducing resolution time.
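The escalation order described above (automation first, then L1, then the SME, with diagnostics already attached) might look like the following Python sketch. The incident kinds, the catalogue of fixes, and the `handle` function are hypothetical; they illustrate the idea rather than any PagerDuty feature.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Incident:
    kind: str
    diagnostics: list[str] = field(default_factory=list)
    resolved_by: Optional[str] = None
    assigned_to: Optional[str] = None


# Hypothetical catalogue of well-known fixes, keyed by incident kind.
# Each fix returns True when the remediation succeeded.
KNOWN_FIXES: dict[str, Callable[[], bool]] = {
    "disk_full": lambda: True,      # e.g. a log rotation that worked
    "stuck_worker": lambda: False,  # e.g. a restart that did not help
}

# Kinds the L1 team (NOC) is expected to handle before escalating.
NOC_KINDS = {"stuck_worker", "cert_expiring"}


def handle(incident: Incident) -> Incident:
    # Diagnostics are gathered first so whoever picks this up has context.
    incident.diagnostics.append(f"collected diagnostics for {incident.kind}")

    fix = KNOWN_FIXES.get(incident.kind)
    if fix is not None and fix():
        incident.resolved_by = "automation"
        return incident

    # Automation could not resolve it: route to L1 if in scope, else to the SME.
    incident.assigned_to = "noc" if incident.kind in NOC_KINDS else "sme"
    return incident
```

In this sketch an incident only reaches a person when no known fix exists or the fix fails, and the diagnostics list is already populated by the time it does.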

How can I get started today?

Global Event Orchestration is generally available for all PagerDuty AIOps customers. To see it in action, join us on Twitch Friday, April 14. 

PagerDuty AIOps helps teams experience fewer incidents, faster resolution, and greater productivity without long implementations or heavy ongoing maintenance. To try PagerDuty AIOps, you can request a trial here or take our product tour. If you want to talk to sales, contact us through this form.

To learn more about Global Event Orchestration, register for this webinar. If you’re a PagerDuty AIOps customer looking to create your first Global Event Orchestration, this knowledge base article can show you how to get started.

Introducing PagerDuty AIOps: Harnessing the Power of AI to Transform Modern Operations for the Enterprise by Hannah Culver https://www.pagerduty.com/blog/introducing-pagerduty-aiops/ Tue, 11 Apr 2023 12:00:40 +0000

The post Introducing PagerDuty AIOps: Harnessing the Power of AI to Transform Modern Operations for the Enterprise appeared first on PagerDuty.

Today, PagerDuty launched a new AIOps solution to leverage the power of AI, provide built-in automation and build on the company’s foundation data model to transform modern operations for the enterprise. PagerDuty has long suppressed noise to help distributed development teams focus. Now, PagerDuty AIOps addresses the large-scale event correlation, compression, and automation needs of ITOps, Command Centers, NOCs, and SRE teams with Global Event Orchestration (now generally available), and Global Alert Grouping (EA in H2 2023). If you’re interested in being a part of the early access program for Global Alert Grouping, sign up here. Going beyond event management, PagerDuty AIOps helps organizations work more efficiently, including giving them the ability to execute end-to-end, event-driven automation.

Our early access customers are already seeing results with PagerDuty AIOps, including 87% average noise reduction, automated incident response deployed 9x faster than with existing solutions, and 14% faster MTTR.

As Kiril Yurovnik, Technical Lead at Riskified, said, “With a growing number of events, minimizing noise and toil is imperative, especially as organizations aim to optimize their IT processes amid the current economic environment. We’ve been using PagerDuty’s Global Event Orchestration as part of the early availability program, and the results have been strong. Riskified has been able to scale noise reduction, especially from non-production environments, saving our team valuable time to spend time innovating on what’s next.”

You can see PagerDuty AIOps in action by taking our product tour.

What is PagerDuty AIOps?

According to PagerDuty platform data, event volumes have grown by 70% YoY. As a result, businesses suffer from too much noise and too much toil while their response teams slog through chaotic, manual response processes.  

And when ITOps and SRE teams who act as first responders for incidents lack access to crucial context and visibility system-wide, they can’t take the next best action. This operational inefficiency has a compounding effect. It increases the cost of operations, reduces productivity across the technical organization, and takes away from value-add work.

In a resource-constrained environment, teams can’t wait for year-long implementations; they need help now. Organizations are looking for a solution that has fast time to value, integrates with their existing systems, and provides fast ROI.

PagerDuty AIOps helps teams reduce noise, triage efficiently to drive the right actions towards resolution, and remove manual, repetitive work from the incident response process. PagerDuty AIOps works out of the box without requiring long implementations or heavy, ongoing maintenance. Organizations continue to see best-in-class results. Built-in noise reduction, with ML models that learn and adapt based on user behavior, means teams see fewer incidents overall. And end-to-end, event-driven automation ensures that resolution is faster and requires less input from humans who are needed for value-add work.

“Leveraging PagerDuty’s Global Event Orchestration has been critical to ensure that our event routing processes are efficient and scalable to optimize IT operations and spend,” said Brian Long, Cloud Infrastructure Engineer at Hyland. “With Global Event Orchestration, our organization is able to detect the “resolved” condition from our notifications to execute as a resolve and reduce the number of places these conditions need to be configured by at least a factor of three. This frees up our time to focus on innovation, not configuration.”

Here’s what PagerDuty AIOps includes: 

  • Event correlation, noise compression, and triage context functionality, moving site reliability engineers and information technology teams from managing multiple vendors and manual processes to a single powerful solution that drives to resolution quickly.
  • End-to-end automation, from event ingestion through auto-remediation, to help teams shift from reactive to proactive by capturing and actioning critical events before they become value-destroying incidents.  
  • Advanced noise reduction features (available in our early access program) that group alerts across services and allow customers to leverage both defined rules and machine learning to only surface the incidents that matter.
  • A visibility console that gives operations teams a single source of truth to monitor and quickly manage all incidents before major incidents occur with far-ranging business, IT, and financial impacts. 
  • Global Event Orchestration, a powerful decision engine to enrich and control routing or trigger self-healing actions.
  • With more than 700 integrations on the PagerDuty Operations Cloud platform, teams can trust our automation-led, people-centric AIOps solution to help save time and money.

How does PagerDuty AIOps work?

PagerDuty AIOps has sets of capabilities that help organizations standardize and scale incident best practices across all teams and services. And, it comes with new features custom-built to serve ITOps, Command Centers, NOCs, and SRE teams.

Reduce noisy incidents: reduce incident noise with the click of a button, either within a service or across services with Global Alert Grouping. Use built-in ML models, or create your own logic. And combine intelligent ML and rule-based alert grouping methods for customizable grouping capabilities. Group alerts by content, time, or other criteria for noise reduction that fits your organization’s needs.
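Grouping by content and time can be illustrated with a small rule-based Python sketch. It is not PagerDuty's grouping algorithm and involves no ML; the `Alert` fields, the `(service, summary)` content key, and the five-minute default window are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: float  # seconds since epoch
    service: str
    summary: str


def group_alerts(alerts: list[Alert], window_seconds: float = 300.0) -> list[list[Alert]]:
    """Group alerts that share a content key and arrive close together in time.

    The content key is (service, summary). An alert joins the most recent
    group with the same key if it arrived within `window_seconds` of that
    group's latest alert; otherwise it starts a new group.
    """
    groups: list[list[Alert]] = []
    latest_group_for_key: dict[tuple[str, str], list[Alert]] = {}

    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.service, alert.summary)
        current = latest_group_for_key.get(key)
        if current is not None and alert.timestamp - current[-1].timestamp <= window_seconds:
            current.append(alert)
        else:
            current = [alert]
            groups.append(current)
            latest_group_for_key[key] = current
    return groups
```

Each resulting group would correspond to one incident, so a burst of identical alerts pages a responder once rather than once per alert.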

Screen recording of PagerDuty noise reduction via alert grouping.

Accelerate triage time and drive action: Leverage ML to surface the most important information for responders immediately. When an incident occurs, responders can quickly discover the probable origin of the incident, if the incident has previously occurred, and if a change was the likely cause.

Screen recording of PagerDuty triage features including past incidents and probable origin.

Automate the redundant: Leverage event orchestration’s powerful decision engine to enrich and control routing or trigger self-healing actions based on event conditions across any or all services within PagerDuty with Global Event Orchestration.

Screenshot of PagerDuty Global Event Orchestration rule builder.

Visualize what matters: Create a custom dashboard that provides a comprehensive view of your operations posture across services. Additionally, you’ll get full visibility into your event data so that you can prioritize what gets ingested and processed and have total transparency into your event usage.

Screen recording of PagerDuty Visibility Console where users can visualize all their event data.

How can I get started with PagerDuty AIOps today?

Current PagerDuty customers with Professional or Business plans can purchase PagerDuty AIOps themselves in the account subscriptions menu.

For Event Intelligence customers, contact your account team about migration options to get access to new features available in PagerDuty AIOps. For more details, please see our knowledge base article.

Whether you’re a current PagerDuty customer or looking to get started, you can see PagerDuty AIOps in action by requesting a trial or taking our product tour. If you have questions and want to speak with our sales team, you can reach out here.

Getting Started Workshop: Rundeck By PagerDuty by Nisha Prajapati https://www.pagerduty.com/resources/webinar/getting-started-with-rundeck-workshop/ Mon, 12 Dec 2022 20:49:04 +0000

The post Getting Started Workshop: Rundeck By PagerDuty appeared first on PagerDuty.

Automating Work in Real Time Through the PagerDuty Operations Cloud by Greg Chase https://www.pagerduty.com/blog/automating-work-in-real-time-through-the-pagerduty-operations-cloud/ Tue, 14 Dec 2021 14:00:10 +0000

The post Automating Work in Real Time Through the PagerDuty Operations Cloud appeared first on PagerDuty.

Did you catch PagerDuty’s Fall Launch? We revealed a raft of new capabilities to help our customers accelerate critical work as part of the PagerDuty Operations Cloud. We announced general availability of PagerDuty Rundeck Actions. We also previewed early access for Rundeck Cloud.

The PagerDuty Operations Cloud includes the ability to automate work, such as event-driven automation and automated incident response.

These new offerings reflect PagerDuty’s vision to democratize automation for all users. We do this by allowing engineers to delegate automated IT processes to IT users. This way employees can take real-time action themselves to get their work done.

Bridging the Automation Gap

Companies these days are awash in “task automation”. This includes Python and bash scripts, Ansible and Jenkins frameworks, and cloud infrastructure platforms such as Amazon EC2. These automations are designed for specific purposes. They are often only usable by the experts who deploy them.

Other users need the operations performed by this automation. Yet, they must escalate to the experts to have that automation executed for them. We call this disconnect the “Automation Gap”.

PagerDuty closes the automation gap by making automation available to the people who need it.

Three kinds of disconnects interplay to create the Automation Gap. These are knowledge gaps, skills gaps, and access gaps. Gaps in knowledge are a reflection of how vast and complex our IT environments have become. No single person can know the specific context, version, and dependencies of every component. Specialization is a must. Skills gaps are the reality that different users have different technical skill sets. Not everyone knows how to administer a database or automate a continuous integration pipeline. In fact, the time of a company’s most skilled technicians is usually in high demand. Anything that can offload their work helps scale the business. Finally, maintaining security according to best practices is what causes access gaps. IT staff should not have wide access to super-user credentials.

A gap between automation and the people who need the results is called The Automation Gap

The Rundeck platform by PagerDuty helps bridge these gaps. Now engineers can delegate automation to stakeholders. This cuts escalations, interruptions, and waiting time. Engineers standardize automation into operational processes. This makes it easier for colleagues to traverse knowledge and skills gaps. Processes can incorporate existing task automation as individual steps in an operational workflow. This abstracts the context of each step, allowing a consistent operational experience. Rundeck also helps optimize and improve security. Processes can run privileged operations on resources without needing to share secrets with users. Rundeck integrates with single-sign-on (SSO) to enable role-based access control (RBAC). Rundeck logs all activity at process and step levels to comply with audit requirements. Finally, Rundeck plugs into operations platforms and applications that IT users work in. These include Jira, ServiceNow, PagerDuty, and even Slack and Microsoft Teams. This means IT end users can get access to operations they need, in the context of screens they work with.
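The combination of delegation, role-based access, hidden secrets, and audit logging can be sketched in a few lines of Python. This is a conceptual illustration, not Rundeck's API: `Job`, `AutomationServer`, the role names, and the log format are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Job:
    name: str
    allowed_roles: set[str]
    # The step receives the secret from the platform; the caller never sees it.
    step: Callable[[str], str]
    secret_name: str


@dataclass
class AutomationServer:
    secrets: dict[str, str]
    audit_log: list[str] = field(default_factory=list)

    def run(self, job: Job, user: str, roles: set[str]) -> str:
        # Role-based access control: the user needs at least one allowed role.
        if not roles & job.allowed_roles:
            self.audit_log.append(f"DENIED {user} -> {job.name}")
            raise PermissionError(f"{user} may not run {job.name}")
        # The privileged credential is injected here, server-side only.
        output = job.step(self.secrets[job.secret_name])
        self.audit_log.append(f"RAN {user} -> {job.name}")
        return output
```

A support engineer holding an allowed role could then run a job such as a database restart and see only its output, while the credential stays on the server and both successful and denied attempts land in the audit log.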

Rundeck is about enabling your end users to do what before only your expert engineers were able to do. And, it’s in this spirit that we’ve designed our new products, Rundeck Actions and Rundeck Cloud.  Now incident responders and other IT end users can get work done in real time instead of waiting.  Operations close out faster. IT teams see fewer interruptions. Everyone is a lot less frustrated.

Connect Responders to Automated Diagnostics and Remediation

Rundeck Actions connects automation to incident responders in PagerDuty. It allows automation engineers to connect and curate automated calls. Engineers can then publish and delegate this automation for use by first responders. These calls invoke automation in production environments to run diagnostics and repair actions.

See the power you can provide to your responders to run diagnostics and repairs.

Rundeck Actions connects with production environments through an Action Runner. The Action Runner sits behind a firewall and calls back to the Rundeck Actions endpoint. It looks for steps to execute and returns output from the automation. PagerDuty then records this output in the incident timeline.
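The call-back pattern described here, where the runner initiates every connection, can be simulated in memory with a short Python sketch. The class and method names (`ActionsEndpoint`, `Runner`, `poll_once`) are hypothetical and the real product's protocol is not shown; the sketch only illustrates why no inbound firewall rule is needed.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ActionsEndpoint:
    """Stand-in for the cloud side: holds queued steps and the incident timeline."""
    pending: deque = field(default_factory=deque)
    timeline: list[str] = field(default_factory=list)

    def request_action(self, incident_id: str, step_name: str) -> None:
        self.pending.append((incident_id, step_name))

    def next_step(self) -> Optional[tuple[str, str]]:
        return self.pending.popleft() if self.pending else None

    def report(self, incident_id: str, step_name: str, output: str) -> None:
        self.timeline.append(f"[{incident_id}] {step_name}: {output}")


@dataclass
class Runner:
    """Runs behind the firewall; it only makes outbound calls to the endpoint."""
    endpoint: ActionsEndpoint
    steps: dict[str, Callable[[], str]]

    def poll_once(self) -> bool:
        """Ask for work, run it locally, send the output back. True if work was done."""
        work = self.endpoint.next_step()
        if work is None:
            return False
        incident_id, step_name = work
        step = self.steps.get(step_name)
        output = step() if step else "unknown step"
        self.endpoint.report(incident_id, step_name, output)
        return True
```

In a real deployment `poll_once` would run in a loop over an outbound HTTPS connection; here the endpoint is a plain object so the flow from request to timeline entry can be followed end to end.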

Rundeck Actions is now generally available and has an ambitious roadmap ahead of it. Planned features include integrations with Slack, Microsoft Teams, and PagerDuty Event Orchestration. Let us know if you’d like a demo of Rundeck Actions.

Automate Operations Without Operating the Automation With Rundeck Cloud

Rundeck Cloud provides the benefit of automating IT processes, without having to maintain automation servers. Users work in the same design time and run time user experience as Rundeck Enterprise.  The only difference is the operating environment is provided as a service.

Rundeck Cloud delivers the same general automation capabilities as Rundeck Enterprise with these additional features:

  • Fast start: Get access to the design time environment in minutes rather than waiting for an installation process.
  • Always up to date: Upgrades and patches are applied automatically, so you always have the latest software capabilities, bug fixes, and security updates.
  • On demand scaling: Adjust capacity as needed to fit your automation workload.
  • Built for availability: A resilient infrastructure with redundancy and load balancing is handled for you according to our service level agreement (SLA).
  • Hardened security: Built using security best practices, so you can comply with safety and security requirements by integrating with your SSO and privileged access management while logging a comprehensive audit trail.

Similar to Rundeck Actions, Rundeck Cloud securely connects to your production infrastructure through a Cloud Runner that sits behind your firewall or VPC. Think of the Cloud Runner as a remote execution server that calls back to the Rundeck Cloud endpoint looking for job steps to execute.

A single Rundeck Cloud instance can be associated with multiple runners, which means Rundeck Cloud can act as your process automation control plane across disparate environments, cloud accounts, and SaaS services.

We are accepting limited sign-ups for customers who want to try out Rundeck Cloud in our Early Access program. We expect Rundeck Cloud to be generally available sometime in early 2023.

 

Optimizing Response Time with Change Event Data by Nisha Prajapati https://www.pagerduty.com/resources/webinar/optimizing-response-time-with-change-event-data/ Wed, 11 Aug 2021 18:54:54 +0000

The post Optimizing Response Time with Change Event Data appeared first on PagerDuty.

New Webinar Alert: PagerDuty 201 by Camden Louie https://www.pagerduty.com/blog/new-webinar-pdu-201/ Thu, 25 Mar 2021 13:00:47 +0000

The post New Webinar Alert: PagerDuty 201 appeared first on PagerDuty.

When we launched our first-ever training webinar, PagerDuty 101, the PagerDuty team was eager to help new users hit the ground running and provide a quick on-ramp to educate customers about the PagerDuty platform. In that one hour session, users learned how to configure the basic objects in their account to best suit their team’s needs—from adding users, schedules, and escalation policies, to technical and business services.

Since then, our product (and our customers) have come a long way. Technical complexities continue to increase with unforeseen circumstances like the COVID-19 pandemic, leaving teams with fewer resources to resolve incidents. So, we asked ourselves: “How do we ensure users are following PagerDuty platform best practices to make their lives a little bit easier?” To answer this question, we looked inward. We know that the platform is a huge asset that helps automate the incident response process and mobilize Dutonians across teams, offices, and countries when incidents occur. In other words, PagerDuty is there to help during difficult times. Our leadership team used the tool to share real-time, localized information when the company decided to work from home indefinitely across the world.

But this type of seamless, automated action only works when an organization’s PagerDuty account is set up according to our suggested practices. There was a lot of time and intention put into our internal subdomain to ensure that the global and local leadership teams were empowered to use PagerDuty capabilities quickly and effectively—even when faced with a global pandemic. Since then, local teams have continued to take advantage of the PagerDuty platform as offices discuss safe reopening practices.

The Gold Standard

So how does a team learn best practices without having to go through the painful process of making costly, time-consuming mistakes? We learned that, unless users had been guided by experts like our Customer Success Managers or Solutions Consultants, they did not have a good idea of the different suggested and best practices laid out in the robust (but dense) PagerDuty Knowledge Base. So how can we provide this necessary knowledge for the many diverse teams using the PagerDuty platform?

To help fill the gap, PagerDuty University is excited to announce our newest webinar series, PagerDuty 201, to help our customers apply their learnings and configure their account according to our gold standard best practices. PagerDuty 201 goes through suggested practices and looks at more advanced functionality to maximize your investment.

Using the methods learned in PagerDuty 101, you can learn how to set up Technical and Business Services and look at why one model might benefit your team more than another. You can also learn about the extensive API functionality, as well as the creation of blameless postmortems and the huge benefit we see with organizations who use them. We know that seeing live examples and being able to ask our Technical Trainers questions is key to success. One of our inaugural attendees shared: “Absolutely thrilled that [PagerDuty 201] is going to be a regular event, will recommend it to dev managers and interested [individual contributors].”

To ensure you are ready to uplevel your PagerDuty use, PagerDuty 201 will cover:

  • PagerDuty Overview
  • Technical and Business Services
  • Status Dashboard & Communicating with Stakeholders
  • API Documentation
  • Event Rules
  • Triggering an Incident via the API
  • Postmortems

Don’t forget that, in addition to these webinars, the PDU e-learning courses, online resources, and Support Knowledge Base are always available to you! We hope to see you at PagerDuty 101 (offered twice monthly) and PagerDuty 201 (now offered monthly) soon!

Product Keynote – It’s Time: What’s New in PagerDuty’s Platform by Bianca Wood https://www.pagerduty.com/resources/webinar/product-keynote/ Thu, 08 Oct 2020 21:21:54 +0000

The post Product Keynote – It’s Time: What’s New in PagerDuty’s Platform appeared first on PagerDuty.

PagerDuty Deep Dive: How to Optimise Your Digital Ops Platform by Bianca Wood https://www.pagerduty.com/resources/webinar/deep-dive/ Thu, 23 Jul 2020 16:44:13 +0000

The post PagerDuty Deep Dive: How to Optimise Your Digital Ops Platform appeared first on PagerDuty.

PagerDuty: The Year in Review by Rachel Obstler https://www.pagerduty.com/blog/products-launched-year-in-review/ Thu, 18 Oct 2018 13:00:52 +0000

The post PagerDuty: The Year in Review appeared first on PagerDuty.

We just held our annual conference, PagerDuty Summit 2018, where we shared new product announcements and demoed new capabilities. But while we always have big things that our engineering teams have produced to announce at Summit, there’s also a lot of work that happens throughout the year across our platform—and we just didn’t have time to demo it all.

But we did string together many of the features and capabilities we launched in the past year in a short form verse, which we shared at Summit. For those of you who couldn’t make it to the conference, here it is (and hope to see you next year)!

Our Event Intelligence product
Helps responders to focus.
It’s applied machine learning,
Not some hocus pocus.

It makes use of your actions
From your past response.
So if you’re an existing customer
You’ll see benefit at once.

And if you’re trying to route
Alerts to the right teams,
Our new alert rules engine
Will automate those streams.

And these rules can run response plays
That automate response actions.
They add responders and send updates
With no human interaction.

Still sometimes you get awoken
By alerts that are a-flapping,
Our new threshold alerting
Means we won’t disrupt your napping.

Integrations: our count
Is now over 15 score.
But there are always new tools
So we’ll keep adding more.

Like with Azure’s new metrics,
We’re adding alert types to the group.
And with VSTS, (I mean Azure DevOps)
We close the DevOps loop.

And with other oft-used tools,
We’re going deeper, too.
With updates for Splunk
And a Jira Version 2

And ServiceNow’s on V5
With priority syncing for starters.
We sync so well together
We’re now a Gold partner.

ServiceNow security and change items
Are also now supported,
And with our new one-click setup
Teams and schedules are ported.

The breadth and depth of our ecosystem
Insures your investment tenfold
Like with AWS Cloudwatch and Marketplace
Where we are now sold.

With our mobile app, now
You can make priority switches
To declare an incident major,
Add responders, and join bridges.

So if something goes wrong
When you aren’t at home,
You can incident command
All from your phone.

And if you are in your chat tool
Working fast while the world’s ablaze,
Our new APIs let you send
Stakeholder updates and run plays.

And as your growth company
Adds employees apace,
Our team hierarchy feature
Will keep them organized in place.

Now before we close,
We would like to thank
All who gave product feedback
Even when it was quite frank.

If you have more, please do share
We at PagerDuty are all ears
Thank you for being our customers,
Now on to innovating for the next year!

Automate Cross-Functional Team Responses For Any Situation by Paul Rechsteiner https://www.pagerduty.com/blog/response-plays/ Tue, 12 Dec 2017 12:00:42 +0000

The post Automate Cross-Functional Team Responses For Any Situation appeared first on PagerDuty.

Continuing our ongoing effort to make incident response best practices easy to adopt, PagerDuty is pleased to announce that response plays are now available! Response plays let you automate precise cross-functional team responses for any situation, so that organizations can plan their major incident responses during peacetime and mobilize instantly when it’s wartime. While improving mobilization time and driving down total time-to-resolution, response plays also eliminate the perpetual need to keep “who to page” response documentation up to date.

How do response plays work?

Each response play lets you configure the following response actions in advance:

  • the on-call teams and conference information needed for a coordinated response
  • the cross-functional stakeholders to subscribe to incidents on a service
  • an optional status update for subscribers
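The three configurable actions above can be pictured as a small data structure. The following is a minimal sketch only: the `ResponsePlay` and `Incident` classes, their field names, and `run_play` are hypothetical illustrations, not PagerDuty's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResponsePlay:
    """Hypothetical model of a play: actions configured in advance."""
    name: str
    responders: List[str] = field(default_factory=list)   # on-call teams to page
    conference_url: Optional[str] = None                   # bridge for the response
    subscribers: List[str] = field(default_factory=list)  # stakeholders to subscribe
    status_update: Optional[str] = None                    # optional message to subscribers


@dataclass
class Incident:
    """Hypothetical incident record that a play acts on."""
    title: str
    responders: List[str] = field(default_factory=list)
    conference_url: Optional[str] = None
    subscribers: List[str] = field(default_factory=list)
    status_updates: List[str] = field(default_factory=list)


def run_play(play: ResponsePlay, incident: Incident) -> Incident:
    """Apply every preconfigured action of the play to the incident."""
    for responder in play.responders:
        if responder not in incident.responders:   # do not page the same team twice
            incident.responders.append(responder)
    if play.conference_url is not None:
        incident.conference_url = play.conference_url
    for subscriber in play.subscribers:
        if subscriber not in incident.subscribers:
            incident.subscribers.append(subscriber)
    if play.status_update is not None:
        incident.status_updates.append(play.status_update)
    return incident
```

In this sketch, running a play is a single call such as `run_play(play, incident)`, which mirrors the idea that everything is decided ahead of time and nothing has to be looked up during the incident.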

To run a play on an incident, select it from the list of available plays. In the PagerDuty web app, this takes a couple of clicks:

"Run a Play" action on the incident page in the PagerDuty web app, showing the list of available plays

Plays are also available in the mobile app, enabling response automation for on-call responders. This means that responders can mobilize a coordinated response or communicate to stakeholders with just a couple of taps:

PagerDuty mobile app showing a response play selected from a list and being run

You can configure a response play to mobilize a complete major incident response team, including incident commander, communications liaison, and technical on-calls for the involved systems, which looks like this:

Response play entitled 'Mobilize P1 response' has three escalation policies as responders and a custom message

Having all of these response actions packaged up as a play lets any responder easily engage a major incident response during triage, without having to look up the corresponding documentation and then identify the appropriate individuals and teams to involve. This kind of automation saves time and fits into nearly any triage workflow, including DevOps, NOC, and customer support.

Go directly from signal to action

If you have monitoring that directly measures customer or business impact, you can skip manual triage altogether and go directly from monitoring tool alert to mobilizing a coordinated response. For example, external monitoring can determine when your retail site is unreachable, and in that situation, there’s no need for someone to investigate before engaging the team; instead, an incident commander and the relevant subject matter experts should be automatically mobilized when this happens.

This is easy to set up with response plays. Any response play can be attached to a service, and it then runs automatically on each new incident created on that service. This approach minimizes the time from incident detection until the response team is mobilized and reduces opportunities for user error during mobilization.
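The attach-to-a-service behavior can be sketched in a few lines. This is an illustrative model only: `PlayRegistry`, `attach`, and `create_incident` are hypothetical names, and the real product configures this in the service settings rather than in code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Incident:
    """Hypothetical incident record."""
    title: str
    service: str
    responders: List[str] = field(default_factory=list)


class PlayRegistry:
    """Hypothetical map from a service to the responders its attached play mobilizes."""

    def __init__(self) -> None:
        self._attached: Dict[str, List[str]] = {}

    def attach(self, service: str, responders: List[str]) -> None:
        # Attaching a play means every new incident on this service runs it.
        self._attached[service] = list(responders)

    def create_incident(self, service: str, title: str) -> Incident:
        incident = Incident(title=title, service=service)
        # No manual triage step: the attached play runs as the incident is created.
        incident.responders.extend(self._attached.get(service, []))
        return incident
```

With a play attached to a hypothetical "Retail Site" service, an incident created from an external monitoring alert already has the incident commander and subject matter experts mobilized, while incidents on services without an attached play start with no responders added.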

Automate incident stakeholders

A response play can also subscribe stakeholders to an incident. This can be done on demand during an incident response, when the incident commander or communications liaison determines that stakeholders need to be involved. Alternatively, stakeholders can be subscribed to new incidents immediately by attaching the corresponding response play to a service. If your response process calls for it, you can even combine mobilization and stakeholder subscription in a single response play, like this:

Response play entitled "Mobilize P1 response" will simultaneously notify 3 responders with a custom message and subscribe 5 stakeholders to the incident

Improve your operational effectiveness with response plays

Response plays are a valuable automation capability for improving your organization’s incident response practices. Whether your goal is improving response precision, reducing mobilization time, efficiently engaging stakeholders, or saving responder time during an incident, response plays will help you achieve it.

To learn more about incident response best practices, take a look at PagerDuty’s Incident Response Guide. You can also check out our mobilizing coordinated responses and effective stakeholder communication guides for an in-depth explanation of how to best use PagerDuty capabilities for these needs.

Response plays are available immediately to all PagerDuty customers on Standard and Enterprise plans at no extra cost. Please contact our support team for additional information about response plays, or our sales team if you would like to upgrade to a plan with this feature.

The post Automate Cross-Functional Team Responses For Any Situation appeared first on PagerDuty.

]]>