Request for new functionality
Description
GKE Upgrade Introduction
The GKE upgrade docs describe:
During automatic or manual node upgrades, PodDisruptionBudgets (PDBs) and Pod termination grace period are respected for a maximum of 1 hour. If Pods running on the node can't be scheduled onto new nodes after one hour, GKE initiates the upgrade anyway.
Problem
Usually, GKE waits up to one hour for pods protected by a PodDisruptionBudget to be evicted. Very often, however, GKE does not wait at all and instead deletes protected pods immediately.
The GKE logs indicate that a protected pod is deleted immediately when GoogleContainerEngine fails to create an Eviction object for the pod with the error "context deadline exceeded".
Details
Here is the failing Eviction object creation as it appears in the GKE logs. Notice "callerSuppliedUserAgent": "GoogleContainerEngine" and the error "Timeout: request did not complete within requested timeout - context deadline exceeded". I removed the private information from this log entry and replaced it with <my-...>.
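To check whether a cluster is hitting the same failure, matching entries can be filtered out of exported logs. The following is a minimal Python sketch, assuming entries exported as JSON in the Cloud Audit Logs layout. Only callerSuppliedUserAgent and the error text come from the log entry described above; the field paths protoPayload.methodName, protoPayload.resourceName and protoPayload.status.message, the method-name suffix pods.eviction.create, and the function name failed_gke_evictions are assumptions to verify against your own entries.

```python
from typing import Any, Iterable, Iterator, Mapping

# ASSUMPTION: field layout of a GKE Kubernetes audit log entry exported as JSON
# (Cloud Audit Logs "protoPayload"). Check against your own exported entries.
GKE_AGENT = "GoogleContainerEngine"
TIMEOUT_TEXT = "context deadline exceeded"


def failed_gke_evictions(entries: Iterable[Mapping[str, Any]]) -> Iterator[str]:
    """Yield the resourceName of every Eviction creation by GKE that timed out.

    failed_gke_evictions is a hypothetical helper, not part of any Google library.
    """
    for entry in entries:
        payload = entry.get("protoPayload", {})
        agent = payload.get("requestMetadata", {}).get("callerSuppliedUserAgent", "")
        method = payload.get("methodName", "")
        message = payload.get("status", {}).get("message", "")
        # Keep only GKE-initiated eviction creations that failed with the timeout error.
        if agent == GKE_AGENT and method.endswith("pods.eviction.create") and TIMEOUT_TEXT in message:
            yield payload.get("resourceName", "<unknown>")


# Usage with one hand-written sample entry (placeholder names, not real data).
sample = [
    {
        "protoPayload": {
            "methodName": "io.k8s.core.v1.pods.eviction.create",
            "resourceName": "core/v1/namespaces/<my-namespace>/pods/<my-pod>/eviction",
            "requestMetadata": {"callerSuppliedUserAgent": "GoogleContainerEngine"},
            "status": {
                "message": "Timeout: request did not complete within requested timeout - context deadline exceeded"
            },
        }
    }
]
for name in failed_gke_evictions(sample):
    print(name)
```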
Expectation
During an upgrade, GKE should never delete a pod if it fails to create an Eviction object for that pod. GKE must respect the PodDisruptionBudget for up to one hour, as described in the GKE upgrade docs.
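The expected behavior can be modeled as a bounded retry loop. This is a minimal Python sketch, not GKE's actual implementation: drain_pod, EvictionTimeout, EvictionBlocked and FakeClock are hypothetical names, and the 5-second retry interval is an arbitrary assumption. The point it illustrates is that a timed-out Eviction creation is handled exactly like a PodDisruptionBudget refusal: retry, and only give up once the one-hour limit from the GKE upgrade docs has passed.

```python
from typing import Callable


class EvictionTimeout(Exception):
    """Hypothetical: creating the Eviction object failed with 'context deadline exceeded'."""


class EvictionBlocked(Exception):
    """Hypothetical: the eviction was refused because a PodDisruptionBudget forbids it."""


def drain_pod(
    evict: Callable[[], None],
    clock: Callable[[], float],
    sleep: Callable[[float], None],
    budget_s: float = 3600.0,  # the documented one-hour limit
    retry_s: float = 5.0,      # ASSUMPTION: arbitrary retry interval
) -> str:
    """Retry eviction until it succeeds or the budget is spent; never delete early."""
    start = clock()
    while clock() - start < budget_s:
        try:
            evict()
            return "evicted"
        except (EvictionTimeout, EvictionBlocked):
            # A timeout is treated like a PDB refusal: wait and retry.
            sleep(retry_s)
    # Only after the one-hour limit may the upgrade proceed anyway.
    return "budget-exhausted"


class FakeClock:
    """Simulated clock so the sketch runs without waiting an hour."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


# Usage: two timed-out attempts, then the eviction succeeds.
clock = FakeClock()
attempts = iter([EvictionTimeout(), EvictionTimeout(), None])


def flaky_evict() -> None:
    err = next(attempts)
    if err is not None:
        raise err


print(drain_pod(flaky_evict, clock.now, clock.sleep))  # evicted
print(clock.t)  # 10.0
```

Injecting the clock and sleep functions keeps the sketch testable: with two failed attempts and a 5-second interval, the simulated clock ends at 10 seconds, and the pod is evicted rather than deleted.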