Fixed
Status Update
Comments
di...@google.com <di...@google.com> #2
Please note that the alerting policy notification delay depends on the metric collection delay, the duration window, and the network delay in delivering the notification. If you specify a five-minute duration window, the notification will be delayed at least five minutes from when the event first occurs. You can find more details in this public document [1].
Moreover, a lengthy delay (longer than the duration window) can cause conditions to enter an "unknown" state. When the data finally arrives, Stackdriver Monitoring might have lost some of the recent history of the conditions. For uptime checks, the metric collection delay averages four minutes, up to a maximum of 5 minutes and 30 seconds (from the end of the duration window). So the 5-minute delay you are seeing for the uptime checks, and the total delay of around 10 minutes, are expected.
Please let me know if this answers your query.
[1]: https://cloud.google.com/monitoring/alerts/#notification-latency
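As a rough illustration of how these components could combine, here is a minimal sketch that assumes the delays simply add up. That additive model is an assumption, not something the linked document states, and `expected_notification_delay` is a hypothetical helper. The figures are the ones quoted above; the delivery delay is not given, so it defaults to zero.

```python
from datetime import timedelta


def expected_notification_delay(duration_window, collection_delay,
                                delivery_delay=timedelta(0)):
    """Sum the three delay components (assumed to be purely additive)."""
    return duration_window + collection_delay + delivery_delay


# Five-minute duration window, as in the uptime check policy discussed here.
window = timedelta(minutes=5)

# Collection delay figures quoted in this comment: average and maximum.
typical = expected_notification_delay(window, timedelta(minutes=4))
worst = expected_notification_delay(window, timedelta(minutes=5, seconds=30))

print("typical:", typical)
print("worst case:", worst)
```

Under that assumption, the typical and worst-case totals bracket the roughly 10 minutes reported in this issue.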
de...@danielcompton.net <de...@danielcompton.net> #3
> And For uptime checks, Metric collection delay can be an average of four minutes, up to a maximum of 5 minutes and 30 seconds (from the end of the duration window)
This seems like a very long time. I would really like to see that time reduced to something on the order of a handful of seconds, not 5 minutes, if possible. I don't think I've used any other monitoring solution with this much latency built in; it is quite surprising as a user.
di...@google.com <di...@google.com> #4
The Product Engineering team is aware of this feature request. However, I don't have an ETA for the fix and implementation. In the meantime, you may follow this thread for future updates.
[Deleted User] <[Deleted User]> #5
Any updates on this issue?
sa...@google.com <sa...@google.com> #6
Hello,
I’m pleased to inform you that our product engineering team has successfully resolved the reported issue. Please verify if the problem has been resolved from your end as well. If you encounter any further issues or have any additional concerns, please don't hesitate to create a new thread on the
I will now proceed to close this issue. If you have any other questions or need further assistance, please feel free to let us know.
Description
Timeline: (all in NZST)
11:13:48 - checks start failing
11:19:45 - violation began (according to Stackdriver)
11:23:24 - Slack notification from PagerDuty, saying that the incident started at 11:19:45
11:23:33 - Email from Stackdriver Monitoring
11:24:47 - Alerting resolved automatically
Something is severely wrong if it can take 3.5 minutes from policy violation to notification. Additionally, Stackdriver says that the incident duration was a minute instead of 4 minutes. This is on top of the minimum 5-minute uptime check alerting policy, meaning that the time between an issue starting and the first notification arriving is 10 minutes (!).
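The gaps in the timeline above can be checked with a short calculation. This is only a sketch over the timestamps listed in this description; the event names and the `gap` helper are labels made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the timeline above (all NZST, same day).
FMT = "%H:%M:%S"
events = {
    "checks_start_failing": "11:13:48",
    "violation_began": "11:19:45",
    "pagerduty_slack": "11:23:24",
    "stackdriver_email": "11:23:33",
}
t = {name: datetime.strptime(ts, FMT) for name, ts in events.items()}


def gap(start, end):
    """Return the elapsed time between two named events as a timedelta."""
    return t[end] - t[start]


failure_to_violation = gap("checks_start_failing", "violation_began")
violation_to_notification = gap("violation_began", "pagerduty_slack")
failure_to_notification = gap("checks_start_failing", "pagerduty_slack")

print("failure -> violation:     ", failure_to_violation)
print("violation -> notification:", violation_to_notification)
print("failure -> notification:  ", failure_to_notification)
```

This gives 5 min 57 s from first failure to the recorded violation, 3 min 39 s from violation to the first notification, and 9 min 36 s end to end, which matches the roughly 3.5 and 10 minute figures quoted above.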