Assigned
Status Update
Comments
dh...@google.com <dh...@google.com>
sa...@vaultedge.com <sa...@vaultedge.com> #2
I will try to explain the problem we are facing.
We use the App Engine flexible environment with automatic scaling, which scales based on CPU utilization. When CPU utilization falls below the configured threshold, it starts to scale down the pods. The application receives a SIGTERM signal, followed by a 3-second wait, after which the pod is forcibly shut down. The pod being scaled down may still be processing a request, and 3 seconds is not enough for that request to complete, so the request fails.
Why are we not able to requeue the request during the 3-second wait time?
This is because the request starts with an MFA from the user. To requeue the request, we would have to go back to the user to get the MFA again. As an API provider, this is not within our purview.
Ask:
The requirement is a parameter similar to the "Idle Timeout" setting that is available for basic scaling. We hope this would slow the scale-down process and so reduce the number of requests that are terminated while still running.
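For readers following the thread, the shutdown sequence described above (SIGTERM, then a short grace window before the forced stop) can be sketched roughly as follows. This is only an illustration, not our production code: `DrainTracker`, `GRACE_SECONDS`, `shutdown_event`, and the handler names are hypothetical, and the 3-second value is simply the window we observed.

```python
import signal
import threading

GRACE_SECONDS = 3.0  # assumption: the 3-second window observed in this report


class DrainTracker:
    """Hypothetical helper: counts in-flight requests during shutdown."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self.draining = False

    def start_request(self):
        # Returns False once draining has begun, so new work can be refused.
        with self._cond:
            if self.draining:
                return False
            self._in_flight += 1
            return True

    def finish_request(self):
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout=GRACE_SECONDS):
        # Stop accepting work, then wait for in-flight requests to finish.
        # Returns True only if all of them finished within the timeout.
        with self._cond:
            self.draining = True
            return self._cond.wait_for(
                lambda: self._in_flight == 0, timeout=timeout
            )


tracker = DrainTracker()
shutdown_event = threading.Event()


def handle_sigterm(signum, frame):
    # Keep the handler minimal: only flag that shutdown was requested.
    shutdown_event.set()


def install_handler():
    # Must be called from the main thread.
    signal.signal(signal.SIGTERM, handle_sigterm)


def wait_and_drain():
    # Blocks until SIGTERM arrives, then tries to drain within the window.
    shutdown_event.wait()
    return tracker.drain()
```

A sketch like this only makes the failure visible: when a request needs longer than the grace window, `drain()` returns `False` and the request is still lost, which is why we are asking for a way to slow down the scale-down itself.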
Description
What you would like to accomplish:
App Engine Flex scaling issues
Scale automatically and process all requests in App Engine Flex, even when CPU utilization is high.