Monitoring is an ancient discipline, but one that has evolved significantly in the past few years. Modern monitoring platforms collect a lot of data from our systems: work and resource metrics, events happening inside and outside our applications, distributed traces, real user monitoring data, and more.
But are we using all that data in a way that helps us avoid outages without causing alert fatigue? Or are we suffering from information overload in our monitoring systems? We’ll present strategies for organising your system data so that your teams can anticipate user-facing issues and avoid alert fatigue by paging only when immediate attention is required.