SPS Commerce, the largest retail network, connects over 90,000 retail businesses of all sizes across the globe. Companies turn to SPS to streamline operations and support new order management models, such as the ability to ship directly to consumers.
As Andy Domeier, Senior Director of Technology at SPS, explained, “Companies have very different backend systems and technical abilities, which can make collaboration complicated. Retailers and suppliers need to work together regardless of size, and we offer a variety of full-service offerings to connect these companies across our network.” He leads a group of technology teams that include the Site Reliability Engineering (SRE), Cloud Operations, System Operations, and Continuous Improvement teams, responsible for ensuring the network is always on and working seamlessly for their customers.
To support the company’s growth, Domeier set out in 2013 to streamline existing digital operations so they could scale with the future needs of the business.
At the time, Domeier’s teams faced new challenges as the company’s retail network grew. For example, they saw an increase in noise and clutter as they adopted new monitoring and data observation tools.
When incidents arose, teams had to scramble, with little visibility because of the alert noise coming from various monitoring tools. They also had difficulty notifying the subject matter expert (SME) for each issue or affected service. SPS needed a solution to help streamline this process and a platform to help manage the entire incident lifecycle.
Domeier and his teams faced challenges with:
“We needed something that could integrate with our monitoring tools, send alerts, and act as a hub to make sure those alerts were sent to the right person,” explained Domeier.
Domeier centralized all of the monitoring tools and teams onto PagerDuty, giving everyone a consistent view of service performance. This removed friction from the incident response process and enabled SPS to maintain “organizational velocity”. Leveraging PagerDuty’s broad ecosystem of over 500 integrations, SPS connected all of its cloud monitoring tools, including Amazon CloudWatch, Grafana, LogicMonitor, Prometheus, Sentry, and Sumo Logic, to PagerDuty. Additionally, Domeier’s teams leveraged PagerDuty’s integration with Slack, so that teams could trigger, respond to, and resolve incidents, all within the chat application. As a result, SPS technology teams smoothly moved their mission-critical services onto PagerDuty, improved how they monitored their ecosystem of tooling and performance solutions, and could take immediate action on incidents.
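To make the "hub" idea concrete, here is a minimal sketch of how a monitoring tool or script might hand an alert to PagerDuty through the Events API v2. It is an illustration, not a description of SPS's actual setup: the service names and routing keys in `ROUTING_KEYS` are hypothetical placeholders, and the helper names `build_trigger_event` and `send_event` are invented for this example. Deciding which person gets notified happens inside PagerDuty, through the escalation policy attached to the service that owns each routing key.

```python
import json
import urllib.request

# Public Events API v2 endpoint for alert events.
EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"

# Severities accepted by the Events API v2 payload.
VALID_SEVERITIES = {"critical", "error", "warning", "info"}

# Hypothetical mapping of internal service names to integration (routing) keys.
# Real keys come from each PagerDuty service's integration settings.
ROUTING_KEYS = {
    "order-api": "ROUTING_KEY_FOR_ORDER_API",
    "edi-gateway": "ROUTING_KEY_FOR_EDI_GATEWAY",
}


def build_trigger_event(service, summary, source, severity="error", dedup_key=None):
    """Build a 'trigger' event body for the service that owns the alert."""
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity!r}")
    if service not in ROUTING_KEYS:
        raise KeyError(f"no routing key configured for service: {service!r}")

    event = {
        "routing_key": ROUTING_KEYS[service],
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
        },
    }
    # A dedup_key lets repeated alerts for the same problem group into one incident.
    if dedup_key is not None:
        event["dedup_key"] = dedup_key
    return event


def send_event(event, url=EVENTS_API_URL):
    """POST the event as JSON and return the decoded response (needs network access)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

In practice, most of the monitoring tools named above ship with their own PagerDuty integrations, so a hand-written sender like this would mainly be useful for custom checks or internal scripts.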
In recent years, the company adopted a full-service ownership model, where developers own their code in production. Full-service ownership enabled SPS teams to minimize downtime and maintain a consistent customer experience.
“We’ve seen a positive internal cultural shift,” explained Domeier. “Before, our development teams would deliver their code to production with little transparency to their service’s health and availability. But as we architect and deploy new services, managing these services using PagerDuty has allowed development teams to see their code all the way through deployment and take ownership when incidents arise. Our Technology team is a talented and truly special group of individuals around the world!”
The company’s customer success teams have also started using PagerDuty. Because the company’s platform must be always on, these teams now proactively escalate customer-facing issues to engineering teams before customers are impacted. They also use PagerDuty to route important notifications about specific customers to Technical Account Managers, improving the quality of service SPS is able to provide.
With PagerDuty, SPS has seen several benefits, including:
“PagerDuty’s incident data is a gold mine of improvement insights,” said Domeier.
As the world went remote and consumers went digital in 2020, so did SPS. Using PagerDuty, the company smoothly transitioned to remote work, despite high volumes within its network. “Since the pandemic began, we’ve found that retailers needed to find ways to be more efficient, more effective, and save money,” explained Domeier. “This led to an increase in the use of our retail network, and PagerDuty has been able to help us keep organizational velocity even as we’ve moved to a remote working environment.”
Looking ahead, SPS plans to embed PagerDuty into its service creation process to streamline the work of development and support teams as new products, features, and services are built. SPS is also planning to build more automation around the PagerDuty platform so developers can gain more context about new code and service deployments using PagerDuty’s change events. Domeier is also looking into other PagerDuty products, such as Event Intelligence and Analytics, as his teams continue to see new operational data surface from the platform.
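As a rough illustration of the deployment automation described above, a deploy pipeline could record each release as a PagerDuty change event so that responders see recent changes next to an incident. The sketch below only builds the request body. The helper name `build_change_event`, the routing key, and the `custom_details` fields are assumptions made for this example and do not reflect SPS's implementation.

```python
import json
from datetime import datetime, timezone

# Public Events API v2 endpoint for change events.
CHANGE_EVENTS_URL = "https://events.pagerduty.com/v2/change/enqueue"


def build_change_event(routing_key, summary, source, timestamp=None, custom_details=None):
    """Build a change event body describing a deployment or other change."""
    if timestamp is None:
        # Default to the current time in UTC, formatted as ISO 8601.
        timestamp = datetime.now(timezone.utc).isoformat()

    payload = {
        "summary": summary,
        "source": source,
        "timestamp": timestamp,
    }
    # Optional free-form context such as a build number or commit hash.
    if custom_details:
        payload["custom_details"] = dict(custom_details)

    return {"routing_key": routing_key, "payload": payload}


if __name__ == "__main__":
    # Hypothetical values; a real pipeline would fill these from its own metadata.
    body = build_change_event(
        routing_key="ROUTING_KEY_FOR_ORDER_API",
        summary="Deployed order-api build 128",
        source="ci-pipeline",
        custom_details={"build": "128", "environment": "production"},
    )
    # The body would be sent as a JSON POST to CHANGE_EVENTS_URL.
    print(json.dumps(body, indent=2))
```

Change events do not page anyone. They add context, which fits the stated goal of helping developers understand what changed around the time an incident began.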
“PagerDuty allows my teams to focus on what’s important to us and continue to move our business forward,” explained Domeier.