Status Update
Comments
cj...@google.com <cj...@google.com> #3
Hello,
This issue report has been forwarded to the Cloud Dataproc Product team so that they may investigate it, but there is currently no ETA for a resolution. Future updates regarding this issue will be provided here.
cj...@google.com <cj...@google.com> #5
Thank you Sushma!
cj...@google.com <cj...@google.com> #8
Hello Prakash and Yaswanth,
I spoke with one of our engineers, and they suggested that the service account (SA) multi-tenancy feature [1] might provide a partial solution to your problem, or at least some building blocks that get you close to one, while I continue to discuss the issue with engineering.
Here is a response from product engineering:
Is the idea that you want to run a job inside the cluster that uses a different SA than the VM's default SA for connections to GCS, BigQuery, etc.? We have a feature that can do this [1].
The awkward part is that you need to know upfront all of the users and all of the service accounts they map to so that you can declare configuration during cluster creation. You can't add, remove or modify this mapping on a running cluster (though we have work in progress to improve this).
It can't be group-based. You really need to know all of the users and SAs at cluster creation, and you can't make changes except by deleting and recreating the cluster.
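For concreteness, here is a minimal sketch of what declaring that mapping at cluster-creation time can look like with the google-cloud-dataproc Python client. The project, region, cluster name, and user/SA pairs are placeholders, and the exact IdentityConfig shape should be verified against the current API reference.

```python
# Sketch: declare the user -> service account mapping up front, at
# cluster creation. The mapping cannot be changed on a running cluster.
from google.cloud import dataproc_v1

project_id = "my-project"   # placeholder
region = "us-central1"      # placeholder

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": "multi-tenant-cluster",
    "config": {
        # Other required cluster settings (machine types, zone, etc.)
        # are omitted from this sketch.
        "security_config": {
            "identity_config": {
                # Every user and their SA must be known here, up front.
                "user_service_account_mapping": {
                    "alice@example.com": "sa-alice@my-project.iam.gserviceaccount.com",
                    "bob@example.com": "sa-bob@my-project.iam.gserviceaccount.com",
                }
            }
        },
    },
}

operation = client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
operation.result()  # block until cluster creation finishes
```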
The other possibility: I know of at least one customer that ships exported SA JSON key files into their cluster and then configures the GCS connector in their jobs to use the JSON key file instead of the VM credentials (a sketch follows below). The drawbacks are the extra configuration overhead and the fact that exported key files are not considered a security best practice; it is easy to leak them into other systems with insufficient protections.
This sounds like an anti-pattern that your security team would object to.
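To make that trade-off concrete, here is a hedged PySpark sketch of the key-file configuration described above. The property names are the GCS connector's service-account-keyfile auth settings as I understand them; the key path and bucket are hypothetical, and the security caveat above still applies.

```python
# Sketch: point the GCS connector at an exported service account JSON
# key instead of the VM's default credentials.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("keyfile-auth-example")
    .config("spark.hadoop.fs.gs.auth.type", "SERVICE_ACCOUNT_JSON_KEYFILE")
    .config("spark.hadoop.fs.gs.auth.service.account.json.keyfile",
            "/secrets/team-sa-key.json")  # hypothetical key path
    .getOrCreate()
)

# Reads now authenticate as the key's service account, not the VM SA.
df = spark.read.text("gs://example-bucket/input/*.txt")
print(df.count())
```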
[1]
pr...@verizon.com <pr...@verizon.com> #9
- 500 data scientists
- a lot of groups (100+)
- cannot create 100 service accounts

When I create a cluster, it currently uses the project's default service account or the service account specified when the cluster is created. Instead, it should use my own credentials for interacting with GCS or BigQuery.
Description
Please add a test to exercise this use case:
- 500 data scientists
- a lot of groups (100+)
- cannot create 100 service accounts

When I create a cluster, it currently uses the project's default service account or the service account specified when the cluster is created. Instead, it should use my own credentials for interacting with GCS or BigQuery.
The way it was working 1.5+ years ago, in the middle of 2023:
- grant the service account access to the GCS bucket
- when reads happen, the read should be executed as my user, not as the service account
- authorization should be granted via groups

When I create a cluster, I should be able to access each downstream service using my own principal rather than granting the permissions to the service account.
For an IC cluster, only I will have access, and access will only come from my user; there is no sharing in an IC cluster.
For a general-purpose (non-IC) cluster, access is determined at the time of each request (to GCS, BigQuery, or any other service). Service requests are issued as the user who launched the job.
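As an illustration of the requested semantics (not an existing Dataproc feature), here is a minimal Python sketch assuming the requesting user's own Application Default Credentials (e.g. from `gcloud auth application-default login`) were available where the job runs. The GCS read is then authorized against the user's IAM bindings, which can be granted via groups, rather than against any cluster service account. The bucket and object names are placeholders.

```python
# Sketch: perform a GCS read as the calling user, not a service account,
# assuming the user's own credentials are available via ADC.
import google.auth
from google.cloud import storage

credentials, project = google.auth.default()
client = storage.Client(project=project, credentials=credentials)

# This request is authorized against the *user's* (possibly
# group-granted) permissions on the bucket.
blob = client.bucket("example-bucket").blob("data/part-0000.csv")
data = blob.download_as_bytes()
print(len(data))
```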
=== For internal use only ===
go/vgo/55137450 # SME Consult
go/vgo/55085759 # Vector Case
go/vgo/57661976 # SME Consult
go/vgo/57500204 # Vector Case