Assigned
ba...@google.com <ba...@google.com>
on...@google.com <on...@google.com> #2
I had the same problem!
I was able to solve it by putting the .aidl file in an aidl directory.
See the attached screenshot for the project structure.
Description
Problem you have encountered:
I am trying to run GPU-Operator on a GKE cluster. It works fine on standard node pools, but GPU-Operator requires disabling the default GPU device plugin DaemonSet provided by GKE, and that is a problem on nodes created by the node auto-provisioner running COS_CONTAINERD.
Ref: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/google-gke.html#using-the-google-driver-installer
As far as I can tell, the only way to disable it is with a node label ("gke-no-default-nvidia-gpu-device-plugin=true"). After the node was provisioned, I manually added the label to it and everything worked as expected. It would be nice to eliminate this manual step so that GPU-Operator is supported on auto-provisioned nodes.
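Until such a setting exists, the manual step can at least be scripted. The sketch below is a minimal, hypothetical helper (not part of GKE or GPU-Operator) that takes the parsed output of `kubectl get nodes -o json` and lists auto-provisioned GPU nodes still missing the opt-out label. It assumes that GPU nodes carry the `cloud.google.com/gke-accelerator` label and that auto-provisioned node pools have names starting with `nap-`; verify both on your own cluster.

```python
# Label that disables GKE's default NVIDIA GPU device plugin DaemonSet on a node.
DISABLE_PLUGIN_LABEL = "gke-no-default-nvidia-gpu-device-plugin"

# Assumptions (not from the original report): GPU nodes carry the accelerator
# label, and node auto-provisioning names its node pools with a "nap-" prefix.
ACCELERATOR_LABEL = "cloud.google.com/gke-accelerator"
NODEPOOL_LABEL = "cloud.google.com/gke-nodepool"
NAP_PREFIX = "nap-"


def nodes_needing_label(node_list):
    """Return names of auto-provisioned GPU nodes that lack the opt-out label.

    node_list is the parsed JSON of `kubectl get nodes -o json`:
    a dict with an "items" list of Node objects.
    """
    names = []
    for node in node_list.get("items", []):
        meta = node.get("metadata", {})
        labels = meta.get("labels") or {}
        is_gpu = ACCELERATOR_LABEL in labels
        is_nap = labels.get(NODEPOOL_LABEL, "").startswith(NAP_PREFIX)
        already_labeled = labels.get(DISABLE_PLUGIN_LABEL) == "true"
        if is_gpu and is_nap and not already_labeled:
            names.append(meta["name"])
    return names


def label_patch():
    """Node patch body that adds the opt-out label to metadata.labels."""
    return {"metadata": {"labels": {DISABLE_PLUGIN_LABEL: "true"}}}


def kubectl_commands(node_list):
    """Build one `kubectl label node` command per node that needs the label."""
    return [
        f"kubectl label node {name} {DISABLE_PLUGIN_LABEL}=true"
        for name in nodes_needing_label(node_list)
    ]
```

The commands can be run as-is, or the `label_patch()` body can be passed to the official Kubernetes Python client's `CoreV1Api().patch_node(name, body)`. Either way this still runs after a node has been provisioned, which is exactly the step the request below aims to remove.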
What you expected to happen:
I believe the easiest way to eliminate this manual step and support GPU-Operator on auto-provisioned nodes would be to allow node labels to be defined in the auto-provisioning configuration, so that they are applied to auto-provisioned nodes when they are created.
Here: https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#Cluster.AutoprovisioningNodePoolDefaults
Or here: https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#Cluster.NodePoolAutoConfig
Additional note: it also works well with the driver pre-installed ("gpu-driver-version=default"), so the docs could be updated to mention this.
Other information (workarounds you have tried, documentation consulted, etc.):