Strengthen Your DORA Metrics with PagerDuty
For technical teams, the findings from DORA provide a model for measuring and improving performance. With almost a decade of data gathered from more than 33,000 professionals worldwide, the capabilities and frameworks detailed by the research help teams pinpoint areas for improvement and areas to celebrate.
The team at DORA categorizes capabilities into three sections: Technical Capabilities, Process Capabilities and Cultural Capabilities. These are all important considerations for teams hoping to use the DORA findings to improve their own performance.
The DORA research is tool-agnostic and does not prescribe how teams should improve their performance or prioritize their goals, so it can be challenging to put together a tactical plan. With the myriad tools available to teams today, choosing the best combination for your team can seem daunting.
If your team is using the DORA metrics as part of your improvement goals, the PagerDuty Operations Cloud can help. As a central component of your environment, PagerDuty brings visibility that helps with a number of key metrics and complements the tools that impact others. Several PagerDuty features fit well into the capabilities described by the DORA team.
Reducing Unplanned Work
A key benefit of using PagerDuty for incident response is reducing the overall time spent on unplanned work across the organization. This is important in several ways unrelated to the DORA capabilities, primarily around making better use of resources and focusing on work that provides the most value to the organization. Less time spent in incidents = more time spent delighting customers.
From an organizational culture perspective, one of the challenges of incidents and unplanned work is that they can be invisible: the time isn’t tracked in work plans or documented as part of the time required to build a new feature. Making unplanned work visible helps teams manage the burden and work towards improving outcomes.
Two DORA capabilities, work in process limits and visual management, can both be improved by deploying PagerDuty to capture the impact of unplanned work and incidents on your team. Including PagerDuty data on how many engineers are impacted by out-of-hours incidents, overnight or “sleep time” incidents, and even work-day incidents gives managers additional insight into how teams are performing.
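To make that breakdown concrete, here is a minimal sketch that buckets incidents by when they started. The hour boundaries, the category names, and the (timestamp, responder) input shape are all assumptions for illustration; they are not PagerDuty's own definitions, and you would feed in data exported from your own account.

```python
from collections import Counter
from datetime import datetime

# Assumed boundaries in local time; adjust them to your team's working pattern.
WORK_START, WORK_END = 9, 18    # work day: 09:00-17:59, Monday to Friday
SLEEP_START, SLEEP_END = 22, 6  # sleep time: 22:00-05:59, any day


def classify(created_at: datetime) -> str:
    """Return 'sleep_time', 'work_day', or 'out_of_hours' for an incident start."""
    hour = created_at.hour
    if hour >= SLEEP_START or hour < SLEEP_END:
        return "sleep_time"
    if created_at.weekday() < 5 and WORK_START <= hour < WORK_END:
        return "work_day"
    return "out_of_hours"


def summarize(incidents):
    """Count incidents and distinct responders per category.

    `incidents` is an iterable of (created_at, responder_name) pairs,
    a hypothetical shape chosen for this sketch.
    """
    counts = Counter()
    responders = {}
    for created_at, responder in incidents:
        bucket = classify(created_at)
        counts[bucket] += 1
        responders.setdefault(bucket, set()).add(responder)
    return {
        bucket: {"incidents": counts[bucket], "engineers": len(responders[bucket])}
        for bucket in counts
    }
```

A summary like this, tracked week over week, gives managers a simple view of how much unplanned work lands outside the working day.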
Integrations and Extensions
Integrations and extensions are powerful features that place PagerDuty at the center of your operational capabilities.
Integrations allow PagerDuty to receive information and alerts from other services, interrogate them, assign them to services, and initiate incidents. PagerDuty integrates with many third-party services that provide monitoring, observability and tracing functionality for various types of events in your environment.
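Most monitoring tools handle this for you, but a custom integration can be sketched in a few lines. The example below builds a trigger event for the PagerDuty Events API v2 and posts it with the Python standard library. The function names are hypothetical, and the routing key is a placeholder for the integration key of a service in your own account; treat this as a sketch, not a reference client.

```python
import json
import urllib.request

EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"
SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(routing_key, summary, source, severity="error"):
    """Build an Events API v2 payload that opens (triggers) an alert."""
    if severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {sorted(SEVERITIES)}")
    return {
        "routing_key": routing_key,  # integration key for the target service
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }


def send_event(event, url=EVENTS_URL):
    """POST the event as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling `send_event(build_trigger_event("YOUR_INTEGRATION_KEY", "Checkout latency above threshold", "checkout-api"))` would send the alert to whichever service owns that key.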
Extensions help you streamline your PagerDuty workflows with third-party tools like Slack, Microsoft Teams, Jira Cloud and Zoom that enhance your incident response experience.
Integrations and extensions mean that your teams can bring any number of tools with them to PagerDuty. The flexibility provided by more than 700 integrations gives all of your teams the tools they need, whether they are running machine learning applications, web platforms or databases.
Selecting the right tool for the job saves teams time and confusion, but it shouldn’t sacrifice the ability to respond to incidents and preserve reliability. Make PagerDuty part of your cross-team baseline, as described in the capability Empowering Teams to Choose Tools.
Change Events
Change events are non-alerting events that can be sent to PagerDuty from your build and deploy tools. They give your teams insight into what has changed in a service, and are an invaluable first-stop when investigating incidents.
Having a good continuous delivery practice is a key capability for DORA-aligned teams, and the research shows that teams using continuous delivery practices spend less time on unplanned work. Change events help your team shorten the time they do spend on unplanned work by showing responders what changed recently.
Change events can also provide wider visibility for deployment changes in your environment, improving your deployment automation, and helping streamline change approval. Even when teams are using different build and deployment tools, their change events can be captured by PagerDuty to help manage service reliability.
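As a rough illustration, a deploy script could report each release with a change event. The sketch below builds a payload for the Events API v2 change endpoint and posts it using only the standard library. The function names, the "View build" link text, and the contents of `details` are assumptions made for this example; the routing key is a placeholder for an integration key from your own service.

```python
import json
import urllib.request
from datetime import datetime, timezone

CHANGE_URL = "https://events.pagerduty.com/v2/change/enqueue"


def build_change_event(routing_key, summary, source, details=None, link=None, when=None):
    """Build a change event payload describing a deploy or config change."""
    when = when or datetime.now(timezone.utc)
    event = {
        "routing_key": routing_key,
        "payload": {
            "summary": summary,
            "source": source,
            "timestamp": when.isoformat(),    # ISO 8601 time of the change
            "custom_details": details or {},  # free-form build metadata
        },
    }
    if link:
        event["links"] = [{"href": link, "text": "View build"}]
    return event


def send_change_event(event, url=CHANGE_URL):
    """POST the change event as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Because the payload is plain JSON, the same call can be made from any build or deployment tool your teams already use.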
Automation
PagerDuty’s various Automation solutions play an important role not just in incident response, but also in completing general technical tasks. Process Automation, Runbook Automation, and Automation Actions all contribute to teams spending less time on simple, well-understood tasks and more time on work that adds value.
Many DORA capabilities emphasize automation, so using a general-purpose tool provides teams with a single interface for automating many tasks. One of the key capabilities strengthened by automation is cloud infrastructure, in which on-demand self-service is a requirement. Teams using PagerDuty’s automation solutions can create jobs and delegate work to whoever in the organization requires the tasks to be completed, creating true self-service workflows.
Terraform Provider
Related to automation is PagerDuty’s support for Infrastructure as Code (IaC) via the Terraform provider. IaC and similar solutions help teams track their changes to infrastructure and other components via Version Control, another of DORA’s technical capabilities.
Managing a large PagerDuty environment can be complicated using only the UI, but making use of Terraform to create objects and provide teams with templates helps everyone on the team improve the reproducibility and traceability of their changes.
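One way to hand teams a template is to generate Terraform's JSON configuration syntax from a small script and commit the result to version control. The sketch below emits a `pagerduty_service` resource for a team; the function names, the naming scheme, the description text, and the `services.tf.json` file name are assumptions for illustration, and the escalation policy ID is a placeholder you would replace with a real one.

```python
import json


def service_template(team, service_name, escalation_policy_id):
    """Return a Terraform JSON document defining one PagerDuty service."""
    # Terraform resource labels cannot contain spaces, so normalize the name.
    label = f"{team}_{service_name}".lower().replace("-", "_").replace(" ", "_")
    return {
        "resource": {
            "pagerduty_service": {
                label: {
                    "name": f"{team} {service_name}",
                    "description": f"Managed by Terraform for team {team}",
                    "escalation_policy": escalation_policy_id,
                }
            }
        }
    }


def write_template(document, path="services.tf.json"):
    """Write the document where `terraform plan` can pick it up."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(document, handle, indent=2)
        handle.write("\n")
```

Reviewing the generated file in a pull request gives every change to your PagerDuty configuration the same traceability as a code change.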
Service Graph and Business Services
Finally, PagerDuty’s Service Graph and Business Services features enable teams to create relationships among services, illuminating the impact of incidents when they happen in large environments. Status Pages give the entire organization a place to see which services are impacted during an incident and how those impacts relate to the customer.
Business services in PagerDuty are representations of user-facing features; users might see odd behaviors in a shopping cart experience, but will have no idea which backend service is causing the behavior. Building the relationships in the service graph provides data to your organization about the health of user-facing features and capabilities while also helping responders troubleshoot issues with dependencies.
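The idea behind those relationships can be sketched with a small dependency map. The service names and the dictionary shape below are hypothetical and do not reflect how PagerDuty stores its service graph; the point is only to show how walking dependencies connects a failing backend service to the user-facing feature it affects.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it depends on.
DEPENDS_ON = {
    "Shopping Cart": ["cart-api", "checkout-api"],  # business service
    "cart-api": ["inventory-db"],
    "checkout-api": ["payments-gateway", "inventory-db"],
}


def dependencies(service, graph):
    """Return every direct and indirect dependency, in breadth-first order."""
    seen = []
    queue = deque(graph.get(service, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited through another path
        seen.append(current)
        queue.extend(graph.get(current, []))
    return seen


def impacted_business_services(failing, graph, business_services):
    """List the business services that depend on the failing service."""
    return [
        name for name in business_services
        if failing in dependencies(name, graph)
    ]
```

With this map, `impacted_business_services("inventory-db", DEPENDS_ON, ["Shopping Cart"])` reports that the shopping cart experience is affected, which is the question a responder or stakeholder usually asks first.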
These features help strengthen a team’s Monitoring and observability capability, as well as the Monitoring systems to inform business decisions capability.
Do more DORA with PagerDuty
The tools your team uses should help your organization reach its goals. Implementing PagerDuty’s features will help your organization improve not only in responding to incidents, but also in creating reliable services your users love. To learn more, visit our website and join our community forum.