Bug P2
Comments
tg...@nvidia.com <tg...@nvidia.com> #2
Hi, could you please clarify the issue description or share a screenshot of the problem you are facing?
im...@google.com <im...@google.com> #3
Hi, it's a feature request. Can you change it to a feature request?
I am requesting a Dataproc image version that supports Spark 3.4.
https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-2.2
im...@google.com <im...@google.com> #4
Hello,
Thank you for reaching out to us with your request.
We have duly noted your feedback and will thoroughly validate it. While we cannot provide an estimated time of implementation or guarantee the fulfillment of the feature request, please be assured that your input is highly valued. Your feedback enables us to enhance our products and services.
sr...@nvidia.com <sr...@nvidia.com> #5
Is it possible to get early access to the fix, so we or the end customer can test it?
im...@google.com <im...@google.com> #6
The updated images are not yet built to be released externally, but you may apply the workaround by setting the following parameters in the startup scripts or initialization actions:
yarn.scheduler.capacity.root.default.user-limit-factor=2
yarn.scheduler.capacity.root.dataproc-driverpool-driver-queue.user-limit-factor=2
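For reference, a minimal initialization-action sketch that applies this workaround could look like the script below. It assumes the bdconfig utility and the dataproc-role metadata key that ship on standard Dataproc images, plus the usual /etc/hadoop/conf path; the ResourceManager service name may vary by image version, so treat this as an untested illustration rather than a supported script:

#!/bin/bash
# Untested sketch: raise the user-limit-factor on the default and
# driver-pool queues so one user's jobs can fill the whole queue.
set -euxo pipefail

ROLE="$(/usr/share/google/get_metadata_value attributes/dataproc-role)"

if [[ "${ROLE}" == "Master" ]]; then
  for queue in root.default root.dataproc-driverpool-driver-queue; do
    bdconfig set_property \
      --configuration_file /etc/hadoop/conf/capacity-scheduler.xml \
      --name "yarn.scheduler.capacity.${queue}.user-limit-factor" \
      --value 2 \
      --clobber
  done
  # Assumed service name; alternatively run: yarn rmadmin -refreshQueues
  systemctl restart hadoop-yarn-resourcemanager
fi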
Description
Problem you have encountered:
In clusters created with driver pools, the default queue's capacity (yarn.scheduler.capacity.root.default.capacity) is set to 50% while its maximum capacity (yarn.scheduler.capacity.root.default.maximum-capacity) is set to 100%. This means that a single user's jobs cannot get more than half of the cluster's resources, as the user limit factor (yarn.scheduler.capacity.root.default.user-limit-factor) is set to 1 by default (see these docs [1]).
What you expected to happen:
Jobs being able to use 100% of cluster resources (regardless of their user)
Steps to reproduce:
Other information (workarounds you have tried, documentation consulted, etc):
Changing the default queue's capacity to 100% or the user limit factor to 2.0 would fix the issue; however, capacity scheduler configurations can't be submitted as properties during cluster creation (they are blacklisted), so this has to be done manually or in startup/initialization scripts.
[1]
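As an illustration of the startup/init-script approach, a hypothetical way to attach such a script at cluster creation time (bucket, script, and cluster names below are placeholders) would be:

gsutil cp set-user-limit-factor.sh gs://my-bucket/init/
gcloud dataproc clusters create my-driver-pool-cluster \
  --region=us-central1 \
  --initialization-actions=gs://my-bucket/init/set-user-limit-factor.sh

Alternatively, the same properties can be edited manually in /etc/hadoop/conf/capacity-scheduler.xml on the master, followed by a queue refresh.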