Status Update
Comments
to...@google.com <to...@google.com> #2
Hello,
Thanks for reaching out to us!
The Product Engineering Team has been made aware of your feature request and will address it in due course. Though we can't provide an ETA for feature requests or guarantee their implementation, rest assured that your feedback is always taken very seriously, as it allows us to improve our products. Thank you for your trust and continued support in improving Google Cloud Platform products.
In case you want to report a new issue, please do not hesitate to create a new issue on the Issue Tracker.
Thanks & Regards,
Manish Bavireddy.
Google Cloud Support
je...@gmail.com <je...@gmail.com> #3
We are stuck with the same problem.
We have an open case with Google Support about this and, among others, they have redirected us here.
We have tried the following workaround:
START OF THE WORKAROUND
As a temporary fix, the recommended way to match on a port is at the Gateway level. In the following example, the Route will only match incoming traffic on port 80 because it attaches to the "http" Gateway listener.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-gateway
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: http
    protocol: HTTP
    port: 80
  - name: https
    protocol: HTTPS
    port: 443
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http-store-80
spec:
  parentRefs:
  - name: my-gateway
    sectionName: http
  hostnames:
  - "store.example.com"   # placeholder; the original hostname was cut off
  rules:
  - backendRefs:
    - name: my-service
      port: 8080
END OF THE WORKAROUND
But the workaround has not worked; the problem persists.
We are posting this comment to ask what the status of the issue is and whether other workarounds are available.
Thanks & Regards,
Joan Cholvi.
Mercadona
ca...@gmail.com <ca...@gmail.com> #4
Hi,
Thanks for your response.
The information has been shared with the Product Team and further updates will be provided in this thread.
According to the
Please note that the Issue Tracker is primarily meant for reporting bugs and requesting new features. If you have any additional issues or concerns, please don't hesitate to create a new thread on the Issue Tracker.
Thanks
vi...@rivile.lt <vi...@rivile.lt> #6
Hello,
We also had the problem of receiving a 404 HTTP error with the body message 'fault filter abort' whenever a client connects to the gateway with a port inside the 'Host' header.
For everyone looking for a workaround, try it this way: for every domain, create one HTTPRoute with two matchers for the HTTP "Host" header, one matching "Host" against "domain:port" and one matching "Host" against "domain" without the port.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-workaround-https-route
spec:
  # hostnames:
  # - "mydomain.com"
  parentRefs:
  - kind: Gateway
    name: my-gateway
    sectionName: https-listener
  rules:
  # These Host header matchers are the workaround.
  # Open issue at Google: https://issuetracker.google.com/issues/294510336
  - matches:
    # Matches the domain with the port
    - headers:
      - name: "Host"
        value: "mydomain.com:443"
      path:
        value: /
        type: PathPrefix
    # Matches the domain only
    - headers:
      - name: "Host"
        value: "mydomain.com"
      path:
        value: /
        type: PathPrefix
    # Not needed for the workaround, but may also be useful for people switching from Ingress to Gateway
    filters:
    - type: RequestHeaderModifier
      requestHeaderModifier:
        set:
        - name: X-Forwarded-Host
          value: "{tls_sni_hostname}"
    backendRefs:
    - name: my-backend-service
      port: 8080
The downside is that you have to create matchers instead of simply listing all the domains you want to route through this rule. If you use multiple HTTPRoute resources for the same domains, make sure that the rules do not overwrite each other.
I hope the issue gets resolved quickly; we can't apply this workaround to all domains, and so we can't migrate all load balancers to the Gateway API.
[Deleted User] <[Deleted User]> #7
Experiencing the same issue after upgrading the gateway class from gke-l7-gxlb to gke-l7-global-external-managed.
w....@gmail.com <w....@gmail.com> #9
This feature is critical for running Jobs. We have started running jobs that are very expensive for us (in terms of database usage and data consumption that we pay for). The jobs need to run all the way through and complete. Instead, the autoscaler is triggering a scale-down in the middle of execution.
I have spoken to other people who use Prefect and Airflow to orchestrate on Autopilot, and this random eviction of pods is forcing all of us to move out of GKE Autopilot because the jobs aren't atomic.
You should really add this feature back. More people are starting to use Prefect, and Autopilot is the obvious way to run the infrastructure, EXCEPT for the fact that you can't set this one annotation.
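For reference, a minimal sketch of what we would like to be able to apply again; the Job name, container name, and image below are placeholders, only the annotation is the point:
apiVersion: batch/v1
kind: Job
metadata:
  name: parent-job                          # placeholder name
spec:
  template:
    metadata:
      annotations:
        # The annotation this thread is about: it asks the cluster autoscaler
        # not to evict this Pod when scaling down or compacting nodes.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      restartPolicy: Never
      containers:
      - name: worker                        # placeholder
        image: example.com/worker:latest    # placeholder image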
[Deleted User] <[Deleted User]> #10
We are on GKE Autopilot version: 1.24.8-gke.2000
st...@vaticlabs.com <st...@vaticlabs.com> #17
GKE Autopilot supports extended duration Pods from version 1.27 or later with the cluster-autoscaler.kubernetes.io/safe-to-evict=false annotation [1].
Description
Following a recent upgrade (1.21.6-gke.1503 in our case), the cluster-autoscaler.kubernetes.io/safe-to-evict=false annotation was banned on GKE Autopilot. For deployments this is fine, but for complex jobs this can quickly become a major headache. We use Kubernetes for multi-step jobs that can span from a few minutes all the way to a few days. It often looks something like this: a parent job that spawns a set of sub-jobs and waits for them to complete.
Imagine this, but with many more sub-jobs involved, and multiple parent jobs being started at various times, meaning nodes and pods get created and complete all over the place. Sub-job pods mostly don't care if they get evicted, but when it happens to a parent job, whose purpose is mostly to prepare data, fill queues, monitor sub-jobs and report progress, recovering its state is way too big a headache.
This is why until now we have been using the safe-to-evict annotation to ensure jobs run to completion without ever being restarted. So when the annotation was banned a few weeks ago, it had a major impact on our production workloads which is still present to this day, as we have failed to find a workaround despite investigating this with Googlers over on StackOverflow.
Workarounds that were attempted:
- A PodDisruptionBudget with maxUnavailable (and minAvailable wouldn't work either, because we would need one policy per job and we're spawning jobs with unique names all over the place; see the sketch after this list).
- terminationGracePeriodSeconds (doesn't work because the autoscaler caps it at 10 minutes when moving pods).
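For illustration, a per-job PodDisruptionBudget would look roughly like the sketch below (placeholder names); since every job gets a unique name, one such object would be needed per job, which is why this approach doesn't scale:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: parent-job-1234-pdb                 # placeholder; one PDB per job
spec:
  maxUnavailable: 0                         # disallow voluntary evictions of matching Pods
  selector:
    matchLabels:
      # 'job-name' is the label the Job controller adds to the Pods it creates;
      # it is unique per job, so this selector cannot be shared across jobs.
      job-name: parent-job-1234             # placeholder job name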
Please consider allowing this annotation again as there are legitimate use cases for it, in addition to it being the only reliable way of preventing pod eviction. That last point was even acknowledged in an email I got from GCP dated October 14, 2021.
Please at least consider unbanning the annotation while you figure out some kind of viable alternative. We would also appreciate having some form of advance notice for substantial changes like this in the future.