Fixed
Status Update
Comments
al...@gmail.com <al...@gmail.com> #2
I did a cluster upgrade and saw it work for about half an hour, until the workloads again went above 300; then it started failing again.
al...@gmail.com <al...@gmail.com> #4
Only 6 of the 9 nodes show up in the infrastructure tab. I've given the project number to "Terrence".
tu...@google.com <tu...@google.com> #5
I've inspected the project and was able to load nested resources under each tab. Many of the namespaces had 100+ pods deployed. I did notice some latency in the table drawing. Are you able to now load the resources you'd expect?
al...@gmail.com <al...@gmail.com> #6
I can indeed, but I suspect some things are dropped as well. For example, only 4 of the 8 nodes show up in the infrastructure tab.
ig...@google.com <ig...@google.com> #7
The metadata agent crashlooping indicates that it's an older version. The crashes were fixed in v0.0.20-2.
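A quick way to check which metadata agent version a cluster is actually running is to inspect the container image tags. This is a sketch; the `metadata-agent` name filter and the `kube-system` namespace are assumptions based on the default GKE setup.

```shell
# List the metadata agent pods and their container images; the image tag
# reveals the agent version (e.g. whether it is older than v0.0.20-2).
kubectl -n kube-system get pods \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}' \
  | grep metadata-agent
```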
al...@gmail.com <al...@gmail.com> #8
That's the version that comes with the cluster. I tried updating it, but GKE resets it to 0.2-0.0.19-1.
I'm running the latest version of GKE available for non alpha clusters: 1.10.5-gke.3
ig...@google.com <ig...@google.com> #9
Yes, you are using the managed GKE configs. Unfortunately, it will take time to release the fixed version into GKE. We are investigating other options for shorter-term mitigation.
One workaround you could try is running a second metadata agent by following the instructions at https://cloud.google.com/monitoring/kubernetes-engine/customizing , but applying https://storage.googleapis.com/stackdriver-kubernetes/stable/metadata-agent.yaml instead of https://storage.googleapis.com/stackdriver-kubernetes/stable/agents.yaml . The metadata agent in kube-system will still crash, but you should get fresher metadata overall.
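The workaround above boils down to applying the standalone metadata-agent manifest (rather than the full agents bundle) against the affected cluster. A minimal sketch, assuming `kubectl` is already configured for the cluster:

```shell
# Install a standalone metadata agent alongside the managed (crashing) one.
# Uses the metadata-agent-only manifest instead of the full agents.yaml bundle.
kubectl apply -f https://storage.googleapis.com/stackdriver-kubernetes/stable/metadata-agent.yaml
```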
al...@gmail.com <al...@gmail.com> #10
Indeed. Installing an extra agent brings the metrics back in the k8s beta metrics.
al...@gmail.com <al...@gmail.com> #11
Note that I still see metadata agent restarts, but it's not crash-looping anymore.
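To tell occasional restarts apart from a crash loop, the pod status is enough: a rising RESTARTS count with a `Running` status is the former, while `CrashLoopBackOff` is the latter. A sketch, assuming the agent pods contain "metadata-agent" in their names:

```shell
# Show status and restart counts for the metadata agent pods in all
# namespaces; look for CrashLoopBackOff vs. a merely non-zero RESTARTS.
kubectl get pods --all-namespaces | grep -i metadata-agent
```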
ig...@google.com <ig...@google.com> #12
Yes, those restarts are expected for now. We are working on fixing this, but they are mostly harmless.
ig...@google.com <ig...@google.com> #13
Looks like there's a workaround, and this is fixed in the latest agents.
jd...@google.com <jd...@google.com> #14
Yes, we have a workaround for it, but I'd like to keep this open until the actual fix is rolled out on GKE.
sh...@google.com <sh...@google.com> #15
Reassign to myself, because I am going to drive the release of this fix into GKE.
Alex, could you confirm that the manual installation above fixes the issue in your GKE cluster? Thanks!
al...@gmail.com <al...@gmail.com> #16
Even better, I used this trick to run Kubernetes beta on an existing cluster!!!
Personally, I think this (running the agent separate) would have been the better approach for testing this release.
ig...@google.com <ig...@google.com> #17
Alex, to get the full experience on an existing cluster that doesn't already have managed configs, rather than using the workaround from comment #9 (of installing only the metadata agent), just follow the instructions in https://cloud.google.com/monitoring/kubernetes-engine/customizing . That way you'll get all 3 agents and will see both monitoring and logging data. Not sure if that's what you meant by "this trick"...
al...@gmail.com <al...@gmail.com> #18
Yes, sorry. I indeed installed all the agents and followed the instructions on the customizing page. This is really a life saver... I also installed the Prometheus agent (described in the docs as well; this also works on old clusters).
ig...@google.com <ig...@google.com> #19
Ah, great. Then this is working as intended. We are in the process of updating the documentation to mention this possibility.
sh...@google.com <sh...@google.com> #20
Update:
This bug is fixed in GKE 1.10.6, which will be fully rolled out today. It also fixes a number of other known Beta issues.
Please read the Stackdriver release notes: https://cloud.google.com/monitoring/kubernetes-engine/release-guide , and the GKE release notes: https://cloud.google.com/kubernetes-engine/release-notes#fixes
I have verified that the issues are fixed in my own cluster after upgrading.
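Picking up the fix amounts to upgrading the cluster to 1.10.6 once it is available in your zone. A sketch using real `gcloud` commands; `CLUSTER_NAME` and `ZONE` are placeholders for your own cluster:

```shell
# Upgrade the master to the release containing the fix.
gcloud container clusters upgrade CLUSTER_NAME \
  --zone ZONE \
  --master --cluster-version 1.10.6

# Then upgrade the nodes to match the new master version
# (without --master, the nodes are upgraded).
gcloud container clusters upgrade CLUSTER_NAME --zone ZONE
```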
Please feel free to re-open if you still encounter these issues.
Thanks a lot for reporting bugs and collaborating with us!