Alert policy notifications are delayed by minutes [77664368]

Fixed

Feature Request

Description

de...@danielcompton.net

created issue #1

Apr 6, 2018 11:35PM

I was testing the Stackdriver Monitoring alerts, and it took three and a half minutes from the policy failing to receiving the PagerDuty and email notifications (both arrived at the same time).

Timeline: (all in NZST)
11:13:48 - checks start failing
11:19:45 - violation began (according to Stackdriver)
11:23:24 - Slack notification from PagerDuty, saying that the incident started at 11:19:45
11:23:33 - Email from Stackdriver Monitoring
11:24:47 - Alerting resolved automatically

Something is severely wrong if it can take 3.5 minutes from policy violation to notification. Additionally, in Stackdriver it says that the incident duration was a minute instead of 4 minutes. This is on top of the minimum 5 minute uptime check alerting policy, meaning that the time between an issue starting and getting first notification is 10 minutes (!).
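As a rough check of the intervals in the timeline above, here is a minimal Python sketch. The timestamps are copied from the timeline; the dictionary keys and the helper names (`seconds_between`, `as_min_sec`, `compute_intervals`) are made up for illustration and are not part of any Stackdriver API.

```python
from datetime import datetime

# Timestamps copied from the timeline above (all NZST, same day).
# The key names are illustrative labels, not Stackdriver terminology.
TIMELINE = {
    "checks_start_failing": "11:13:48",
    "violation_began": "11:19:45",
    "pagerduty_notification": "11:23:24",
    "stackdriver_email": "11:23:33",
    "resolved": "11:24:47",
}

# Pairs of events whose separation is worth looking at.
INTERVALS = {
    "checks failing -> violation began": ("checks_start_failing", "violation_began"),
    "violation began -> first notification": ("violation_began", "pagerduty_notification"),
    "violation began -> email": ("violation_began", "stackdriver_email"),
    "checks failing -> first notification": ("checks_start_failing", "pagerduty_notification"),
    "violation began -> resolved": ("violation_began", "resolved"),
}


def seconds_between(start: str, end: str) -> int:
    """Return whole seconds from start to end, both HH:MM:SS on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def as_min_sec(seconds: int) -> str:
    """Format a number of seconds as e.g. '3m39s'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m{secs:02d}s"


def compute_intervals() -> dict:
    """Map each interval label to its length in seconds."""
    return {
        label: seconds_between(TIMELINE[a], TIMELINE[b])
        for label, (a, b) in INTERVALS.items()
    }


if __name__ == "__main__":
    for label, secs in compute_intervals().items():
        print(f"{label}: {as_min_sec(secs)}")
```

With these timestamps, the gap from the violation beginning to the first notification comes out at 3m39s, and the gap from the checks first failing to the first notification at 9m36s.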

Comments

di...@google.com #2 Apr 8, 2018 08:11PM

Assigned to di...@google.com.

Please note that the alerting policy notification delay depends on the metric collection delay, the duration window, and the network delay in delivering the notification. If you specify a five-minute duration window, the notification will be delayed at least five minutes from when the event first occurs. You can find more details in this public document [1].

Moreover, a lengthy delay (longer than the duration window) can cause conditions to enter an "unknown" state. When the data finally arrives, Stackdriver Monitoring might have lost some of the recent history of the conditions. For uptime checks, metric collection delay can be an average of four minutes, up to a maximum of 5 minutes and 30 seconds (from the end of the duration window). So the 5-minute delay you are seeing for the uptime checks, and the total delay of around 10 minutes, are expected.
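To make the arithmetic above concrete, here is a minimal Python sketch that adds up the components named in this comment. It assumes the delays are simply additive, which is a simplification and not a statement of how Stackdriver Monitoring computes latency internally. The function and constant names are hypothetical, and the delivery delay defaults to zero only because no figure is given for it here.

```python
# Figures taken from this comment; the constant names are illustrative.
DURATION_WINDOW_S = 5 * 60                   # five-minute duration window
UPTIME_COLLECTION_DELAY_AVG_S = 4 * 60       # average metric collection delay
UPTIME_COLLECTION_DELAY_MAX_S = 5 * 60 + 30  # maximum metric collection delay


def estimate_notification_latency_s(
    duration_window_s: int,
    collection_delay_s: int,
    delivery_delay_s: int = 0,
) -> int:
    """Estimate seconds from the event first occurring to the notification.

    Assumes a purely additive model: the duration window must elapse,
    then the metric collection delay (counted from the end of the window),
    then whatever time delivery of the notification takes.
    """
    parts = (duration_window_s, collection_delay_s, delivery_delay_s)
    if any(p < 0 for p in parts):
        raise ValueError("delays must be non-negative")
    return sum(parts)


if __name__ == "__main__":
    typical = estimate_notification_latency_s(
        DURATION_WINDOW_S, UPTIME_COLLECTION_DELAY_AVG_S
    )
    worst = estimate_notification_latency_s(
        DURATION_WINDOW_S, UPTIME_COLLECTION_DELAY_MAX_S
    )
    print(f"typical: {typical / 60:.1f} min, worst case: {worst / 60:.1f} min")
```

Under this additive assumption, a five-minute window plus the average collection delay gives 9 minutes, and with the maximum collection delay 10.5 minutes, before any delivery delay is counted.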

Please let me know if this answers your query.

[1]: https://cloud.google.com/monitoring/alerts/#notification-latency

de...@danielcompton.net #3 Apr 16, 2018 12:13PM

> For uptime checks, metric collection delay can be an average of four minutes, up to a maximum of 5 minutes and 30 seconds (from the end of the duration window)

This seems like a very long time. I would really like to see that time reduced to something on the order of a handful of seconds, not 5 minutes, if possible. I don't think I've used any other monitoring solution with this much latency built in; it is quite surprising as a user.

Message last modified on Apr 16, 2018 12:13PM

di...@google.com #4 Apr 16, 2018 01:44PM

The product engineering team is aware of this feature request. However, I don't have an ETA for a fix or implementation. You may follow this thread for future updates.

di...@google.com Apr 18, 2018 04:37PM

Reassigned to gc...@google.com.

[Deleted User] #5 Mar 21, 2019 10:51AM

Any updates on this issue?

sa...@google.com Aug 9, 2023 04:25AM

Reassigned to sa...@google.com.

sa...@google.com #6 Aug 9, 2023 12:28PM

Marked as fixed, reassigned to gc...@google.com.

Hello,

I’m pleased to inform you that our product engineering team has resolved the reported issue. Please verify that the problem is resolved on your end as well. If you encounter any further issues or have additional concerns, please don't hesitate to create a new thread on the Issue Tracker with a detailed description of your issue.

I will now proceed to close this issue. If you have any other questions or need further assistance, please feel free to let us know.