Troubleshooting should not be this much trouble.
Did you know that 85% of the duration of a typical incident is spent in diagnosis, involving at least 4 engineers? The fundamental goal of incident response is to figure out, as fast as possible, what went wrong and who needs to fix it. Speeding up diagnosis gets you to the resolver, and to the resolution, that much quicker.
The challenge for many companies is that the deeper data needed to make accurate diagnoses is locked away in production environments, and requires specialists to extract because of their knowledge, skills, and access privileges.
To answer the questions of “What went wrong?” and “Who can fix it?” a first responder has to summon on average at least 3 other engineers to pull information to which only they have proper access.
The bottom line is that too much time and too many person-hours are spent repeatedly gathering diagnostic data. Automating this repetition will reduce MTTR by at least 15 minutes and cut costs and interruptions by at least 50%.
Let responders do the troubleshooting, not your developers.
Here’s a better operating pattern: use PagerDuty’s Automated Diagnostics solution to automate your most common troubleshooting procedures for responders in PagerDuty, and stop disrupting the day-to-day work of your expert engineers.
Automated Diagnostics saves time and reduces interruptions throughout an incident by allowing responders to triage problems efficiently and escalate only to engineers who can resolve the issue. Resolvers have the data they need on hand, and this troubleshooting data is captured in the incident response record for future retrospectives.
Resolve faster
Diagnose and resolve incidents faster—while also reducing error budget consumption.
Increase efficiency
Resolve more incidents with 40% fewer escalations and fewer responders per incident.
Continuously improve
Automate more as you go, cutting MTTR by 25 minutes and reducing toil while bolstering engineering capacity.
Stop escalating and start mitigating
Automated Diagnostics helps responders using PagerDuty Incident Response rapidly triage incidents with introspective data from services, data that was previously available only by escalating to domain experts. Responders can use this expanded awareness to rule out other possible causes among dependencies and to check for false positives. This lets them escalate quickly and efficiently to the right resolver to mitigate the issue and resolve the incident.
Better understand current state
Automation emojis make it easier for your first responders, and any other stakeholders, to understand which services are impacted and which have recovered.
Invoke automation with one hand
Get on-demand automation from the incident, use pre-approved automated actions, and view diagnostic output on an incident—all from inside the PagerDuty app.
Automate workarounds to reduce severity
Automated Diagnostics allows customers to compose automated mitigation processes such as triggering fail-over and disaster recovery, and remediation processes such as service restarts. Customers can deploy such workarounds in as little as half a day to reduce the severity of an outage until a more permanent fix can be implemented.
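To make the idea of a composed workaround concrete, here is a minimal Python sketch of a mitigation process built from ordered steps that stops at the first failure. It is an illustration only: `run_workaround` and the three placeholder actions are hypothetical names, not part of any PagerDuty product, and real steps would call your own fail-over and restart tooling.

```python
from typing import Callable, List, Tuple

# A step is a label plus an action that reports success (True) or failure (False).
Step = Tuple[str, Callable[[], bool]]


def run_workaround(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first one that reports failure."""
    log: List[str] = []
    for name, action in steps:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok:
            # Do not continue a half-applied workaround; hand over to a person.
            log.append("stopped: escalate to a resolver")
            break
    return log


# Hypothetical placeholder actions; each always succeeds in this sketch.
def fail_over_to_replica() -> bool:
    return True


def restart_service() -> bool:
    return True


def verify_health() -> bool:
    return True


for line in run_workaround([
    ("fail over to replica", fail_over_to_replica),
    ("restart service", restart_service),
    ("verify health", verify_health),
]):
    print(line)
```

Stopping at the first failed step is a design choice in this sketch: a workaround that only partly applied is usually a reason to escalate rather than to keep going.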
Trigger diagnostics proactively with Event Orchestration
When connected to PagerDuty Event Intelligence, it is possible to run diagnostic jobs proactively, even before responders are notified, so they have the information they need on hand as they acknowledge an incident. For well-understood cases, it's even possible to trigger automated workarounds, eliminating the need to summon responders at all if the automated remediation resolves the incident.
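The decision flow described here can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not PagerDuty's rule syntax or API: `Event`, `Rule`, and `handle_event` are hypothetical names, and the diagnostics and workaround are canned lambdas standing in for real jobs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Event:
    """Hypothetical incoming monitoring event."""
    service: str
    summary: str


@dataclass
class Rule:
    """Hypothetical rule: which events it matches and what to run for them."""
    matches: Callable[[Event], bool]
    diagnostic: Callable[[Event], str]
    # Optional workaround; returns True if it resolved the problem.
    workaround: Optional[Callable[[Event], bool]] = None


def handle_event(event: Event, rules: List[Rule]) -> Dict[str, object]:
    """Run matching diagnostics first, then decide whether to notify anyone."""
    findings: List[str] = []
    resolved = False
    for rule in rules:
        if not rule.matches(event):
            continue
        findings.append(rule.diagnostic(event))
        if rule.workaround is not None and rule.workaround(event):
            resolved = True
    # Notify a responder unless a workaround resolved the problem.
    return {"notify": not resolved, "findings": findings}


rules = [
    Rule(
        matches=lambda e: "queue backlog" in e.summary,
        diagnostic=lambda e: f"{e.service}: 3 workers stalled",
        workaround=lambda e: True,  # pretend a worker restart cleared the backlog
    ),
    Rule(
        matches=lambda e: "latency" in e.summary,
        diagnostic=lambda e: f"{e.service}: p99 latency 2.4s",
    ),
]

print(handle_event(Event("orders", "queue backlog growing"), rules))
print(handle_event(Event("search", "latency above threshold"), rules))
```

In this sketch the first event is handled without notifying anyone because its workaround reports success, while the second still notifies a responder, who receives the findings along with the notification.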
Automated Diagnostics for AWS
Automated Diagnostics for AWS in PagerDuty provides frequently used, out-of-the-box diagnostic jobs for commonly used services, including Amazon CloudWatch, AWS Lambda, Amazon EC2, Amazon ECS, Amazon ELB, Amazon RDS, and Amazon VPC. Customers can easily configure these template jobs for their specific environments and extend the diagnostic steps in a job definition, so they can get started right away.
How it works
When an incident is generated in PagerDuty, diagnostics from infrastructure, monitoring tools, cloud providers, and more can be invoked automatically or by responders with the click of a button.
This information is then presented in PagerDuty in a format that first responders can readily consume, so they can make more informed decisions about how to start troubleshooting the incident or whom to pull in for assistance.
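As a rough illustration of this flow, the Python sketch below runs a set of diagnostic jobs for an incident and formats their output as a single note a first responder can read. Everything in it is hypothetical: `Incident`, `DiagnosticJob`, `run_diagnostics`, and the two sample checks are stand-ins rather than PagerDuty APIs, and real jobs would query your infrastructure instead of returning canned strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Incident:
    """Hypothetical stand-in for an incident record."""
    title: str
    service: str
    notes: List[str] = field(default_factory=list)


@dataclass
class DiagnosticJob:
    """A named check that returns a short, human-readable result."""
    name: str
    run: Callable[[str], str]


def check_disk(service: str) -> str:
    # Canned output; a real job would query the host.
    return f"{service}: disk usage 91% on /var"


def check_recent_deploys(service: str) -> str:
    # Canned output; a real job would query the deploy tool.
    return f"{service}: last deploy 12 minutes ago"


def run_diagnostics(incident: Incident, jobs: List[DiagnosticJob]) -> str:
    """Run every job and attach one combined note to the incident."""
    lines = [f"Diagnostics for '{incident.title}'"]
    for job in jobs:
        try:
            result = job.run(incident.service)
        except Exception as exc:  # one failing job should not block the others
            result = f"failed: {exc}"
        lines.append(f"- {job.name}: {result}")
    note = "\n".join(lines)
    incident.notes.append(note)
    return note


incident = Incident(title="High error rate", service="checkout")
jobs = [
    DiagnosticJob("Disk usage", check_disk),
    DiagnosticJob("Recent deploys", check_recent_deploys),
]
print(run_diagnostics(incident, jobs))
```

Storing the combined note on the incident mirrors the point made earlier on this page: the troubleshooting data stays with the incident record, so resolvers and later retrospectives see the same output the first responder saw.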
Automated Diagnostics Solutions Package
The PagerDuty Automated Diagnostics solution bundle consists of the following:
- PagerDuty Automation Actions. An add-on to PagerDuty Incident Response that securely connects PagerDuty end users with remotely executed automation.
- PagerDuty Runbook Automation. A SaaS offering that is seamlessly connected with PagerDuty through Automation Actions.
- Plugin integrations with 35 common components and services, allowing their APIs to be rapidly and securely incorporated into automated workflows.
- Predefined diagnostic jobs providing common diagnostics for OS and infrastructure tools and services.
- Automated Diagnostics implementation and customization Quickstart services.