Change Healthcare enables a better and more efficient healthcare system by offering advanced technology and services to healthcare providers. One of its solutions is Change Healthcare Stratus Imaging—a clinical imaging platform in the cloud for securely sharing imaging data between hospitals, doctors, and patients. Change Healthcare Stratus Imaging connects 100 of the largest hospital systems and 3,400 medical imaging centers to their patients, supporting nearly 100 million studies a year.
Change Healthcare’s applications are integral to many clinical professionals’ day-to-day jobs. Because users expect consistent performance and data integrity, Change Healthcare provides a four-nines (99.99%) availability SLA for critical services and application workflows. “Very high and guaranteed application availability is one of the fundamental requirements for mission-critical medical systems like ours,” shared Rusty Boguslavsky, Vice President, Cloud Infrastructure and Operations.
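To put a four-nines commitment in concrete terms, the arithmetic below converts an availability percentage into a downtime budget. It is a minimal sketch: the function name and the 30-day and 365-day windows are illustrative assumptions, since the article does not say how Change Healthcare measures its SLA.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Maximum downtime, in minutes, allowed over a period at a given availability.

    The measurement window is an assumption; the article does not specify one.
    """
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_pct / 100)


# Four nines (99.99%) over two illustrative windows.
for label, days in [("30-day month", 30), ("365-day year", 365)]:
    print(f"{label}: {downtime_budget_minutes(99.99, days):.2f} minutes")
```

At 99.99%, the budget works out to roughly 4.3 minutes in a 30-day month, or about 52.6 minutes in a year, which is why time spent on manual triage matters so much.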
Like many companies moving to cloud-native architectures, Change Healthcare needed to address several challenges to meet its high-availability service level objectives.
Large numbers of microservices led to high alert volumes and false positive alerts. Complex dependencies made it difficult for engineers and managers to understand the broader business impact of an incident, and responders had to log into multiple tools and servers to perform diagnostic and repair tasks. Documenting runbooks for each service was a burden for site reliability engineers (SREs), and following the manual tasks prescribed in the runbooks carried a risk of human error.
“We were finding that the manual steps for triaging an incident were taking too long, up to half the total duration of an incident, inflating our mean time to resolve (MTTR),” said Boguslavsky.
To address these challenges, Change Healthcare turned to PagerDuty to manage incident response and fuel automation. The team integrated PagerDuty with their monitoring systems to collect data signals and to automatically trigger and document incidents. This reduces the volume of alerts and false positives that responders need to wade through. The separation of technical and business services in PagerDuty helps localize diagnostics and drive technical resolution. Responders also use Service Graph in PagerDuty to get more context on an incident’s impact on the business and to further narrow down the root cause.
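As an illustration of how a monitoring signal can become a PagerDuty incident, the sketch below builds a trigger event for the PagerDuty Events API v2. The article does not describe Change Healthcare's actual integrations, so the service and check names, the placeholder routing key, and the deduplication scheme are all hypothetical. Repeated signals that share a `dedup_key` are grouped into one open alert, which is one way alert volume can be kept down.

```python
import json
import urllib.request

EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger_event(routing_key, service, check, summary, severity="error"):
    """Build an Events API v2 trigger event.

    The dedup_key format (service:check) is a hypothetical convention:
    events with the same key are grouped into the same open alert.
    """
    return {
        "routing_key": routing_key,      # integration key of the target service
        "event_action": "trigger",
        "dedup_key": f"{service}:{check}",
        "payload": {
            "summary": summary,
            "source": service,
            "severity": severity,        # critical, error, warning, or info
            "custom_details": {"check": check},
        },
    }


def send_event(event):
    """POST the event to PagerDuty and return the HTTP status code."""
    request = urllib.request.Request(
        EVENTS_API_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical signal; nothing is sent unless send_event() is called.
event = build_trigger_event(
    routing_key="YOUR_INTEGRATION_KEY",
    service="image-archive",
    check="latency",
    summary="image-archive latency above threshold",
)
print(json.dumps(event, indent=2))
```

Building the event separately from sending it keeps the payload easy to inspect and test without network access.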
With the addition of PagerDuty Process Automation, the team can fully automate the execution of diagnostic and incident resolution runbooks during an incident. Automated diagnostic runbooks ensure that the most relevant information for a specific incident is available to an SRE before they even respond. This deeper context helps SREs immediately isolate the problem, determine who needs to be involved, and move rapidly to incident resolution. Responders can then trigger resolution automation as needed to remediate the problem. Process Automation orchestrates cross-system tasks, eliminating the need for an SRE to log into multiple tools or directly access systems through a bastion server. This speeds up resolution and reduces the chance of error.
As part of continuous health checking, Change Healthcare runs robotic interactions between their services, which are expected to succeed 100% of the time. When an incident or customer issue occurs, the results of these robotic checks help show whether the system is behaving as expected.
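The idea of checks that must succeed every time can be sketched as follows. This is an illustrative example only: the `CheckResult` type, the service names, and the function are hypothetical and do not represent Change Healthcare's tooling. It reports every service pair whose success rate has dropped below 100%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one robotic interaction between two services (hypothetical)."""
    source: str
    target: str
    succeeded: bool


def failing_interactions(results):
    """Return the success rate of each (source, target) pair below 100%."""
    totals = {}
    for result in results:
        pair = (result.source, result.target)
        passed, total = totals.get(pair, (0, 0))
        totals[pair] = (passed + int(result.succeeded), total + 1)
    return {
        pair: passed / total
        for pair, (passed, total) in totals.items()
        if passed < total
    }


# Hypothetical results: one pair is healthy, one failed half of its checks.
results = [
    CheckResult("viewer", "archive", True),
    CheckResult("viewer", "archive", False),
    CheckResult("gateway", "viewer", True),
    CheckResult("gateway", "viewer", True),
]
print(failing_interactions(results))
```

Because the expected success rate is 100%, any pair that appears in the output is an immediate pointer to where responders should look first.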
Change Healthcare is required to comply with HIPAA, FDA, and other cybersecurity and privacy regulations to protect the security and privacy of patient information. They chose to follow the HITRUST CSF, achieving HITRUST and SOC 2 Type 2 certification. Maintaining these certifications through manual documentation was time-consuming and carried a risk of human error.
With PagerDuty Process Automation, automated runbook jobs are pre-approved and validated to satisfy regulatory change controls, making them available for use during a time-critical incident. Role-based access control for automation users, together with integration with the underlying secrets management infrastructure for privileged access, ensures safe operation of diagnostic and remediation automation while improving overall security posture. And because every automated action is logged, including the users, jobs, systems, and results involved, Change Healthcare is able to comply with regulatory auditability requirements.
“For cybersecurity, runbooks significantly reduce the risk of misconfiguration of existing controls, and subsequently reduce the risk of new vulnerabilities based on human mistakes or completely unintentional actions,” explained Boguslavsky. “Also pre-validated system changes by automated runbooks satisfy change control requirements for medical devices, meeting FDA requirements.”
PagerDuty has helped Change Healthcare exceed customer SLAs with improved application reliability while maintaining industry compliance.
“PagerDuty helped us significantly improve application reliability, reduce overall MTTR, and consistently support our committed SLAs,” said Boguslavsky.
Hear more from Rusty in his Summit ’22 presentation: Automating Incident Resolution – the HITRUST Compliant Way.
To find out how PagerDuty Process Automation can help you automate and delegate business and IT processes, contact your account manager or request a demo.