Incidents may fail to close for absence type conditions (conditionAbsent) [222104474]

Assigned

Bug

Status Update

No update yet.

Description

jo...@google.com

created issue #1

Mar 1, 2022 04:35PM

Summary

Under certain circumstances, an alerting incident may fail to auto-close. The circumstances are:

Alert policy with MetricAbsence type condition.
The condition is met (data is initially present, then goes missing), which causes an incident to be created.
Data remains absent for greater than 24 hours.
Data returns after 24 hours.
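
As a rough illustration only, the timing criteria above can be expressed as a small check. This is a sketch of the circumstances as described in this report, not the alerting service's internal logic; the function name, parameter names, and the strict "greater than 24 hours" comparison are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Optional

# Threshold taken from the report: data absent for greater than 24 hours.
ABSENCE_THRESHOLD = timedelta(hours=24)


def may_fail_to_auto_close(
    absent_since: datetime,
    data_returned_at: Optional[datetime],
) -> bool:
    """Return True if an absence incident matches the reported circumstances.

    Hypothetical helper: `absent_since` is when the data went missing,
    `data_returned_at` is when it came back (None if it has not returned).
    """
    if data_returned_at is None:
        # Data has not returned yet, so auto-close is not expected at all.
        return False
    # The report concerns data that returns after more than 24 hours of absence.
    return data_returned_at - absent_since > ABSENCE_THRESHOLD
```

An incident whose data returned within 24 hours falls outside the reported criteria, so the check returns False for it.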

We've identified a bug that prevents some incidents from auto-closing after the data returns. This behavior is inconsistent and leads to end-user confusion: why would identical policies with identical data streams behave differently? The difference in observed behavior is rooted in an internal implementation detail that needs to be addressed.

Mitigation

If an incident that meets the above criteria fails to auto-close as expected, users must wait 7 days (or the configured auto-close period) for the incident to auto-close.
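
To estimate when an affected incident would close under this mitigation, a minimal sketch follows. The report does not say which event starts the wait, so the caller supplies that timestamp; `wait_starts_at` and the function name are hypothetical, and the 7-day default simply mirrors the period mentioned above.

```python
from datetime import datetime, timedelta

# Default taken from the mitigation text: 7 days unless configured otherwise.
DEFAULT_AUTO_CLOSE = timedelta(days=7)


def fallback_close_time(
    wait_starts_at: datetime,
    auto_close_period: timedelta = DEFAULT_AUTO_CLOSE,
) -> datetime:
    """Return the time at which the incident should close on its own.

    Assumption: the caller knows when the auto-close wait begins; this
    report does not specify that starting point.
    """
    return wait_starts_at + auto_close_period
```

For a policy with a shorter configured period, pass it explicitly, for example `fallback_close_time(start, timedelta(days=2))`.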

Long-term fix

The product team is aware of this issue and actively working on a lasting solution. At the moment, there is no ETA available for a fix. All future updates regarding this issue will be communicated on this public issue tracker.

Comments

rp...@slb.com <rp...@slb.com> #2 Sep 16, 2022 10:04PM

I have forwarded this request to the engineering team. We will update this issue with any progress updates and a resolution.

nc...@slb.com <nc...@slb.com> #3 Sep 26, 2022 04:49PM

Hello! Sorry to bring up this issue after almost a year, but I wanted to add that we chose agent.googleapis.com/memory/percent_used as the metric identifier, and autoscaling didn't work out for us either. It would be appreciated if you could guide us.

ka...@google.com <ka...@google.com> #4 Apr 4, 2023 08:30PM

Hi, at the moment we are using the cpu_utilization/target_utilization attribute (in app.yaml) for autoscaling in the App Engine flexible environment. However, it would be great if we could also specify memory_utilization metrics to drive autoscaling decisions. That would give us more control over instance autoscaling than we have now.

za...@gmail.com <za...@gmail.com> #5 Jun 15, 2023 05:25AM

Hi, I do not see any memory metrics in either the console or Stackdriver. Is this connected to this issue?

su...@google.com <su...@google.com> #6 Jul 31, 2023 05:42AM

Hello Google team, I was directed to this issue by the support team when we raised concerns about the unavailability of memory metrics for autoscaling. Is this feature released, on the roadmap, or not being considered? Please provide some details.