Status Update
Comments
ha...@google.com <ha...@google.com>
ja...@jamie-thomson.net <ja...@jamie-thomson.net> #2
I wholeheartedly agree. We put policy tags live one week ago and the first feedback we got was "Is there any way to make SELECT * work". I would ask that you prioritise this if possible. Of the options presented above I would prefer option 1 of course however option 2 would be satisfactory.
Credit to the original poster who has clearly put a lot of effort into submitting such a detailed issue.
pi...@gmail.com <pi...@gmail.com> #3
Thanks!
Someone suggested the idea of using Data Masking, which would solve the issue as well. In an better way, even. Masked fields could still be queried and used in JOINS and aggregations, but when displayed the values would be replaced by unique hashes for unauthorized users.
ja...@jamie-thomson.net <ja...@jamie-thomson.net> #4
Masked fields could still be queried and used in JOINS and aggregations, but when displayed the values would be replaced by unique hashes for unauthorized users.
Oh yeah, absolutely. That would be fantastic.
as...@gmail.com <as...@gmail.com> #5
se...@google.com <se...@google.com>
bw...@google.com <bw...@google.com>
bw...@google.com <bw...@google.com> #6
This will be addressed with the forthcoming data masking features. For columns that the user does not have access to, you can specify that the user will get NULL, a static value, or a hashed value.
Description
What you would like to accomplish:
TLDR; Have a syntax like
SELECT AUTHORIZED * FROM table
on a table with column-level policy tags that would automatically select all the columns that the user has the right to see (including for nested sub-columns).Extended use case description
Column-level security is feature that really got us excited on the paper. Before it, we had to shard our tables into two parts:
my_dataset.public_table
my_dataset.private_table
Where
public_table
is a copy ofprivate_table
where we filtered out sensitive columns (or conversely,private_table
is a copy ofpublic_table
where we added sensitive columns).While this was covering our use case, it was a chore to maintain because:
private_table
instead ofpublic_table
Therefore, we were really excited by the newly-added column-level security feature. We toyed around with it on a POC and discovered in the documentation that when performing a
SELECT *
, every column you did not have access to had to be manually excluded from the querySELECT * EXCEPT(private_field_1, private_field_2)
. Needless to say, this would be extremely annoying to do for our users, especially when they can doSELECT * FROM public_table
with our current implementation.As a workaround, we could of course organize our table with a column struct
private_cols
and ask our users to doSELECT * EXCEPT (private_cols)
, but then we would have the same issue of not being able to update our security policy without having to change the table's schema, and not being able to have more fined-grained policies.Last but not least, when the table schema is nested with some columns being private, the EXCEPT solution becomes even more complex to use, as this example demonstrate :
And so on...
How this might work
Option 1.
Make the
SELECT * FROM table
automatically select the columns that the user has the right to use.Option 2.
If we don't want to alter the current behavior, we could add a function or keyword that would enable such behavior:
SELECT AUTHORIZED * FROM table
or
SELECT AUTHORIZED(*) FROM table
Of course, the tricky part would also be to make it work recursively on nested structures.
I am not familiar with BigQuery's internals, but I would not be surprised if such feature was very difficult to add (especially if the security checks are done after the query is fully compiled). It might also cause issues with query result caching, because the exact same query run by two distinct users would produce different results, but since there is already a
SESSION_USER()
function that have a similar effect, it doesn't seem to be unprecedented.It probably isn't an easy feature to add, but I believe it would make column-level security much more usable for us.