Status Update
Comments
to...@google.com <to...@google.com> #2
Hello,
Thanks for reaching out to us!
The Product Engineering Team has been made aware of your feature request and will address it in due course. Though we can't provide an ETA for feature requests or guarantee their implementation, rest assured that your feedback is always taken very seriously, as it allows us to improve our products. Thank you for your trust and continued support in improving Google Cloud Platform products.
In case you want to report a new issue, please do not hesitate to create a new issue on the Issue Tracker.
Thanks & Regards,
Manish Bavireddy.
Google Cloud Support
je...@gmail.com <je...@gmail.com> #3
We are stuck with the same problem.
We have an open case with Google Support about this and, among others, they have redirected us here.
We have tried the following workaround:
START OF THE WORKAROUND
As a temporary fix, the recommended way to match on a port is at the Gateway level. In the following example, the Route will only match incoming traffic on port 80 because it attaches to the "http" Gateway listener.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-gateway
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: http
    protocol: HTTP
    port: 80
  - name: https
    protocol: HTTPS
    port: 443
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http-store-80
spec:
  parentRefs:
  - name: my-gateway
    sectionName: http
  hostnames:
  - "store.example.com"   # placeholder; the original hostname was cut off
  rules:
  - backendRefs:
    - name: my-service
      port: 8080
END OF THE WORKAROUND
But the workaround has not worked; the problem persists.
We are posting this comment to ask what the status of the issue is and whether other workarounds are available.
Thanks & Regards,
Joan Cholvi.
Mercadona
ca...@gmail.com <ca...@gmail.com> #4
Hi,
Thanks for your response.
The information has been shared with the Product Team and further updates will be provided in this thread.
According to the
Please note that the Issue Tracker is primarily meant for reporting bugs and requesting new features. If you have any additional issues or concerns, please don't hesitate to create a new thread on the Issue Tracker.
Thanks
vi...@rivile.lt <vi...@rivile.lt> #6
Hello,
We also had the problem of receiving a 404 HTTP error with the body message 'fault filter abort' whenever a client connects to the gateway with a port inside the 'Host' header.
For everyone looking for a workaround, try it this way: for every domain, create one HTTPRoute with two matchers for the HTTP "Host" header, one matching "Host" against "domain:port" and one matching "Host" against "domain" without the port.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-workaround-https-route
spec:
  # hostnames:
  # - "mydomain.com"
  parentRefs:
  - kind: Gateway
    name: my-gateway
    sectionName: https-listener
  rules:
  # These Host header matchers are the workaround.
  # Open issue at Google: https://issuetracker.google.com/issues/294510336
  - matches:
    # Matches the domain with the port
    - headers:
      - name: "Host"
        value: "mydomain.com:443"
      path:
        value: /
        type: PathPrefix
    # Matches the domain only
    - headers:
      - name: "Host"
        value: "mydomain.com"
      path:
        value: /
        type: PathPrefix
    # Not needed for the workaround, but may also be useful for people switching from Ingress to Gateway
    filters:
    - type: RequestHeaderModifier
      requestHeaderModifier:
        set:
        - name: X-Forwarded-Host
          value: "{tls_sni_hostname}"
    backendRefs:
    - name: my-backend-service
      port: 8080
The downside is that you have to create matchers instead of simply listing all the domains you want to route through this rule. If you use multiple HTTPRoute resources for the same domains, make sure that the rules do not overwrite each other.
I hope the issue gets resolved quickly; we can't apply this workaround to all domains, and so we can't migrate all load balancers to the Gateway API.
[Deleted User] <[Deleted User]> #7
Experiencing the same issue after upgrading the gateway class from gke-l7-gxlb to gke-l7-global-external-managed.
w....@gmail.com <w....@gmail.com> #9
This feature is critical for running Jobs. We have started running jobs that are very expensive for us (in terms of database usage and data consumption that we pay for). The jobs need to run all the way through and complete. Instead, the autoscaler is triggering a scale-down in the middle of execution.
I have spoken to other people who use Prefect and Airflow to orchestrate on Autopilot, and this random eviction of pods is forcing all of us to move out of GKE Autopilot because the jobs aren't atomic.
You should really add this feature back. More people are starting to use Prefect, and Autopilot is the obvious way to run the infrastructure, EXCEPT for the fact that you can't set this one annotation.
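For reference, a minimal sketch of what we would like to be able to apply again; the Job name, container name, and image below are placeholders, only the annotation is the point:
apiVersion: batch/v1
kind: Job
metadata:
  name: parent-job                          # placeholder name
spec:
  template:
    metadata:
      annotations:
        # The annotation this thread is about: it asks the cluster autoscaler
        # not to evict this Pod when scaling down or compacting nodes.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      restartPolicy: Never
      containers:
      - name: worker                        # placeholder
        image: example.com/worker:latest    # placeholder image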
[Deleted User] <[Deleted User]> #10
We are on GKE Autopilot version: 1.24.8-gke.2000
st...@vaticlabs.com <st...@vaticlabs.com> #17
GKE Autopilot supports extended duration Pods from version 1.27 or later with the cluster-autoscaler.kubernetes.io/safe-to-evict=false annotation [1].
Description
Following a recent upgrade (1.21.6-gke.1503 in our case), the cluster-autoscaler.kubernetes.io/safe-to-evict=false annotation was banned on GKE Autopilot. For deployments this is fine, but for complex jobs this can quickly become a major headache. We use Kubernetes for multi-step jobs that can span from a few minutes all the way to a few days. It often looks something like this: a parent job that spawns a set of sub-jobs and waits for them to complete.
Imagine this, but with many more sub-jobs involved, and multiple parent jobs being started at various times, meaning nodes and pods get created and complete all over the place. Sub-job pods mostly don't care if they get evicted, but when it happens to a parent job, whose purpose is mostly to prepare data, fill queues, monitor sub-jobs and report progress, recovering its state is way too big a headache.
This is why until now we have been using the safe-to-evict annotation to ensure jobs run to completion without ever being restarted. So when the annotation was banned a few weeks ago, it had a major impact on our production workloads which is still present to this day, as we have failed to find a workaround despite investigating this with Googlers over on StackOverflow.
Workarounds that were attempted:
- A PodDisruptionBudget with maxUnavailable (and minAvailable wouldn't work either, because we would need one policy per job and we're spawning jobs with unique names all over the place; see the sketch after this list).
- terminationGracePeriodSeconds (doesn't work because the autoscaler caps it at 10 minutes when moving pods).
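For illustration, a per-job PodDisruptionBudget would look roughly like the sketch below (placeholder names); since every job gets a unique name, one such object would be needed per job, which is why this approach doesn't scale:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: parent-job-1234-pdb                 # placeholder; one PDB per job
spec:
  maxUnavailable: 0                         # disallow voluntary evictions of matching Pods
  selector:
    matchLabels:
      # 'job-name' is the label the Job controller adds to the Pods it creates;
      # it is unique per job, so this selector cannot be shared across jobs.
      job-name: parent-job-1234             # placeholder job name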
Please consider allowing this annotation again as there are legitimate use cases for it, in addition to it being the only reliable way of preventing pod eviction. That last point was even acknowledged in an email I got from GCP dated October 14, 2021.
Please at least consider unbanning the annotation while you figure out some kind of viable alternative. We would also appreciate having some form of advance notice for substantial changes like this in the future.