Comments
ka...@google.com <ka...@google.com>
ma...@google.com <ma...@google.com> #2
This was partially addressed by https://android-review.googlesource.com/c/platform/frameworks/support/+/2076902 , available now in 1.1 RC02, see b/230665435 .
As compilation on user builds requires a full target reinstall (which is a behavior change), we offer an opt out, which can be used to accomplish this feature request:
1) Configure every macrobenchmark to use `CompilationMode.Full()`.
2) Manually issue the compile command `cmd package compile -f -m speed <packagename>` for your target.
3) Pass the instrumentation arg `androidx.benchmark.compilation.enable` = `false` to skip compilation/reinstall for each macrobenchmark.
This should still give you the numbers you've been seeing, while avoiding the cost of a large AOT compilation on each test.
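As a rough sketch of steps 2 and 3, the helper below only builds the command line and the Gradle property; it does not run anything. It assumes the target is reached through `adb shell` and that the instrumentation argument is forwarded with the Android Gradle Plugin's `android.testInstrumentationRunnerArguments` property. The package name `com.example.app` and the function names are placeholders, not part of the library.

```python
import shlex

# Instrumentation argument named in the comment above.
COMPILATION_ENABLE_ARG = "androidx.benchmark.compilation.enable"


def compile_command(package_name: str) -> list[str]:
    """Build the adb invocation that AOT-compiles the target package once."""
    return ["adb", "shell", "cmd", "package", "compile",
            "-f", "-m", "speed", package_name]


def gradle_runner_argument(enabled: bool = False) -> str:
    """Build the Gradle property that forwards the instrumentation argument."""
    value = "true" if enabled else "false"
    return (
        "-Pandroid.testInstrumentationRunnerArguments."
        f"{COMPILATION_ENABLE_ARG}={value}"
    )


if __name__ == "__main__":
    # "com.example.app" is a placeholder package name (assumption).
    print(shlex.join(compile_command("com.example.app")))
    print(gradle_runner_argument())
```

The compile command would be run once before the benchmark suite, and the Gradle property appended to whatever task runs the macrobenchmarks.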
Leaving this bug open, since in general we should be able to do this more automatically for everything without warmup driven profiles.
(Somewhat related bug: excess compilations were being issued specifically for `CompilationMode.None`, `StartupMode.COLD` benchmarks. This has been fixed, but not shipped publicly yet: b/231976084)
Description
Currently, metrics-server runs as a Deployment with a single replica.
Problem
The metrics-server is registered as the backend for the
When the metrics-server Pod is unhealthy, disrupted, or rescheduled, it can no longer serve API requests for the metrics endpoint.
This disrupts first-party and third-party Kubernetes controllers, especially those that use API Discovery to discover which API groups and resources are available.
Two notable error examples:
* The Kubernetes namespace garbage collector fails to fully clean up Namespaces when the metrics-server is unavailable.
* Config Sync fails to sync and/or update resource status when the metrics-server is unavailable.
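To make the failure mode concrete, here is a minimal sketch of how a client could tell that an aggregated API is unavailable by inspecting an APIService object's `Available` status condition. The function name and the sample object (including its name and reason string) are illustrative assumptions, not taken from this report.

```python
def is_api_service_available(api_service: dict) -> bool:
    """Return True if the APIService has an Available condition with status "True"."""
    conditions = api_service.get("status", {}).get("conditions", [])
    return any(
        cond.get("type") == "Available" and cond.get("status") == "True"
        for cond in conditions
    )


# Hypothetical APIService payload, shaped like the JSON the API server returns.
sample_unavailable = {
    "metadata": {"name": "v1beta1.example.group"},
    "status": {
        "conditions": [
            {"type": "Available", "status": "False", "reason": "MissingEndpoints"}
        ]
    },
}

if __name__ == "__main__":
    print(is_api_service_available(sample_unavailable))
```

A controller that performs full API discovery may see errors for the unavailable group, which would be consistent with the two symptoms listed above.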
Possible Solution
The metrics-server Deployment used by GKE does not specify a replica count, which makes it default to 1 replica.
The official component GitHub repo has example YAML for deploying a highly available metrics-server:
In addition to the recommended configuration, it would also be a good idea to define a PodDisruptionBudget so that both Pods are never disrupted at the same time, and to set enable-aggregator-routing on the kube-apiserver to share the traffic load between the instances.
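A minimal sketch of such a PodDisruptionBudget, built as a plain dictionary and printed as JSON (which `kubectl apply -f` accepts). The object name, the `kube-system` namespace, and the `k8s-app: metrics-server` label are assumptions about how the Deployment is labeled; adjust them to match the actual manifest.

```python
import json


def pod_disruption_budget(name: str, namespace: str,
                          match_labels: dict, min_available: int = 1) -> dict:
    """Build a policy/v1 PodDisruptionBudget manifest as a dictionary."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Keep at least this many Pods running during voluntary disruptions.
            "minAvailable": min_available,
            "selector": {"matchLabels": match_labels},
        },
    }


if __name__ == "__main__":
    # Name, namespace, and labels below are assumptions, not confirmed values.
    pdb = pod_disruption_budget(
        "metrics-server", "kube-system", {"k8s-app": "metrics-server"}
    )
    print(json.dumps(pdb, indent=2))
```

With two replicas and `minAvailable: 1`, a node drain can evict one Pod at a time but not both.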
Cost & Node Requirement Concerns
Making metrics-server HA requires clusters to have at least 2 nodes, so some tweaks may be required for this solution to work on single-node and zero-node clusters, such as Autopilot clusters.
One way to handle this might be to create a simple controller that modifies the metrics-server Deployment config depending on how many Nodes are in the cluster at any given time. This way the config could be changed to be single-replica on one-node clusters, or even scale to zero on zero-node clusters without causing constant errors about the Deployment not having any healthy replicas.
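The decision logic of such a controller could be as small as the sketch below. It covers only the replica calculation, under the assumption that the HA target is 2 replicas; watching Nodes and patching the Deployment are left out, and the function name is hypothetical.

```python
def desired_replicas(node_count: int, ha_replicas: int = 2) -> int:
    """Pick a metrics-server replica count for the current number of Nodes.

    0 nodes -> 0 replicas, 1 node -> 1 replica, 2 or more -> ha_replicas.
    """
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    # Never ask for more replicas than there are nodes to spread them across.
    return min(node_count, ha_replicas)


if __name__ == "__main__":
    for nodes in (0, 1, 2, 5):
        print(nodes, desired_replicas(nodes))
```

Capping at the node count keeps the replicas on separate nodes, so an anti-affinity rule would never leave a replica unschedulable.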