Manheim® is North America’s leading provider of vehicle remarketing services, connecting buyers and sellers to the largest wholesale used vehicle marketplace and most extensive auction network. Through its 115 physical, digital and mobile auction sales, the company helps dealer and commercial clients achieve business results by providing innovative end-to-end inventory solutions. At Manheim, the team is heavily investing in the future of software delivery, which is a strategic business driver. To move this aspect of the business forward, Manheim has adopted DevOps practices alongside traditional IT operations. Jason Riggins, director of production engineering, is in charge of providing strategic direction, leadership and oversight for a number of teams: release engineering, development operations, and site operations. These “production engineering” teams serve as a core foundation for ensuring reliable software delivery and, therefore, the revenue streams that depend on it.
Overcoming the communication gap when critical incidents occurred
Prior to having PagerDuty in place, Manheim had a manual follow-up process when incidents occurred and critical apps and services were impacted. On-call responders would have to use a Google phone number or on-premises phone system to file an incident.
Manheim needed to keep up with their production engineering teams and improve their methods for recruiting the right responders when incidents occurred. They required a platform that could be standardized across all development teams. The challenge was that the company had become very siloed as a result of past organizational changes, and the silos within IT operations were becoming more and more of a problem. “We were less agile. We had tickets built up in the queue, issues due to sheer siloed teams, and a communication gap that was created between the different units,” stated Riggins. “Something had to change for us to remain cutting edge.”
“PagerDuty was overall a more mature product, which is why we chose them.”
– Jason Riggins, Director of Production Engineering, Manheim
Recruiting the right people, with the right information to reduce downtime
“We did a bake-off between PagerDuty and one of their competitors. One of the reasons we went with PagerDuty is because of the track record that they had already established in the industry and the proven customer base that existed. PagerDuty was overall a more mature and feature-rich platform, which is why we chose them,” stated Riggins.
Manheim has since changed the way they develop software and improved their IT support and response capabilities. “A big part of these business-impacting changes is having PagerDuty because it helped us become more efficient,” stated Riggins. Currently, the organization is using PagerDuty for incident management of various services, on-call scheduling, escalation policies, event management and a customized API integration. Being able to customize notifications, scheduling and escalation policies helps Manheim recruit the right team, every time, while the event management feature enables Manheim to aggregate incidents and reduce mean time to resolution (MTTR). Implementing PagerDuty enabled Riggins and his teams to seamlessly assign workloads and incidents to the appropriate teams. The organization no longer depends on one team opening tickets when an incident occurs, and then waiting for another team to respond. “With PagerDuty, the power of managing incidents within and across teams allows them to develop their own escalation policies and become self-managed,” said Riggins. As a result of these business-impacting benefits and tangible ROI, Manheim continues to grow its implementation of PagerDuty.
When Manheim started looking for a solution, they found PagerDuty helped automate the work of their after-hours teams. “PagerDuty enabled us to move after-hours dedicated headcount to the daytime, which increased overall productivity,” said Riggins. That’s when the company moved to a capability team model built on a “you build it, you run it” approach. The team develops the software and supports it, and PagerDuty helps with the monitoring and alerting of the application, increasing overall availability. The Enterprise Operation Center (EOC) identifies the most critical alerts that need an immediate response from each DevOps team, rather than sending them every single alert. Teams also add the EOC to their escalation policy, so that if a developer on call misses an alert, it escalates to the EOC, which serves as another line of defense for the organization.
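The escalation arrangement described above can be pictured with a small sketch. This is only an illustration of the idea, not Manheim's configuration and not the PagerDuty API: the class names, the responder names, the "critical" severity label and the 15-minute timeouts are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EscalationLevel:
    responders: List[str]   # who is notified at this level
    timeout_minutes: int    # how long to wait before moving to the next level


@dataclass
class EscalationPolicy:
    name: str
    levels: List[EscalationLevel]

    def responders_at(self, minutes_unacknowledged: int) -> List[str]:
        """Return who is notified once an alert has gone unacknowledged this long."""
        elapsed = 0
        for level in self.levels:
            elapsed += level.timeout_minutes
            if minutes_unacknowledged < elapsed:
                return level.responders
        # Past the final timeout, the alert stays with the last level.
        return self.levels[-1].responders


def alerts_to_forward(alerts: List[dict]) -> List[dict]:
    """Keep only the alerts that need an immediate response (assumed label: 'critical')."""
    return [alert for alert in alerts if alert["severity"] == "critical"]


# Hypothetical policy: the developer on call is paged first, then the EOC
# acts as the second line of defense if the alert is not acknowledged.
capability_team_policy = EscalationPolicy(
    name="capability-team",
    levels=[
        EscalationLevel(responders=["developer-on-call"], timeout_minutes=15),
        EscalationLevel(responders=["eoc-team"], timeout_minutes=15),
    ],
)
```

Under these assumptions, `capability_team_policy.responders_at(5)` returns the developer on call, while `capability_team_policy.responders_at(20)` returns the EOC team, which mirrors the fallback role the EOC plays in the text.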
Another advantage PagerDuty offers Manheim is its out-of-the-box integrations with Datadog, New Relic, and Amazon CloudWatch. “We have a suite of monitoring tools and PagerDuty allows us to be more proactive, quickly,” said Riggins. There was a time when the organization rolled out a change and their service levels dropped, queues were backed up and the response time wasn’t where they needed it to be. Manheim has now standardized on an operations stack of PagerDuty, Datadog, and New Relic.
Finding a comprehensive and agile incident management solution
PagerDuty enables Manheim to connect critical IT services and the event data they generate with the right teams at the right time, which increases their operational agility and IT efficiency. Overall, the enterprise-grade incident management solution has improved trust and communication within the organization. “PagerDuty is going to end up growing in user count and become the standard. We rely on the stability and accuracy that we get through PagerDuty. If you ripped the solution out, we would be back to square one,” stated Riggins.