Comments
to...@ziggipapers.com <to...@ziggipapers.com> #2
Edit:
- Spurious line "65 / ~350" after the first paragraph
te...@gmail.com <te...@gmail.com> #3
+1
[Deleted User] <[Deleted User]> #4
Could you perhaps use REGEX to parse the json string? Something like this should work (with some modifications for your use case):
WITH yourTable AS (
  SELECT '{"bar": ["vimota", ""]}' AS json
  UNION ALL
  SELECT '{"bar": [, "Brazil"]}'
)
SELECT
  ARRAY(
    SELECT REGEXP_EXTRACT(num, r'"(.*)"')
    FROM UNNEST(SPLIT(REGEXP_EXTRACT(JSON_EXTRACT(json, '$.bar'), r'\[(.*)\]'))) AS num
    WHERE REGEXP_EXTRACT(num, r'"(.*)"') IS NOT NULL
  )
FROM yourTable;
tp...@gmail.com <tp...@gmail.com> #5
Nope, many of our JSON arrays contain string values with user-input characters like ", which would break a regex-based approach to parsing the JSON, since we'd have to distinguish " from \" from \\" from \\\", etc.
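To make the failure mode concrete, here is a hypothetical input (not from the thread) run through the query from #4: the array holds a single element whose value is say "hi", bye, but the comma-based SPLIT and the quote regex cannot see JSON escaping, so the result is mangled:
SELECT
  ARRAY(
    SELECT REGEXP_EXTRACT(num, r'"(.*)"')
    FROM UNNEST(SPLIT(REGEXP_EXTRACT(JSON_EXTRACT(json, '$.bar'), r'\[(.*)\]'))) AS num
    WHERE REGEXP_EXTRACT(num, r'"(.*)"') IS NOT NULL
  ) AS parsed
FROM (SELECT '{"bar": ["say \\"hi\\", bye"]}' AS json);
-- Returns one broken element, 'say \"hi\', instead of the single decoded value.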
[Deleted User] <[Deleted User]> #6
Thanks for the feedback! We'll take this suggestion into account as we plan JSON-related functionality, and I'll update here if and when there is more to share.
su...@gmail.com <su...@gmail.com> #7
Thanks! In the meantime, what's the best way to turn a JSON array into a BigQuery array? Looking through the docs on JSON functions, I don't see a way to achieve this other than writing a custom JavaScript UDF, which imposes the strict limitations of queries that use UDFs.
es...@gmail.com <es...@gmail.com> #8
The best option right now--if you need to take escaping into account--is using a JavaScript UDF. If you generally have a small number of JSON array elements and you want to handle escaped strings, you could use a hack like this one:
CREATE TEMP FUNCTION JsonExtractArray(json STRING) AS (
  (SELECT ARRAY_AGG(v IGNORE NULLS)
   FROM UNNEST([
     JSON_EXTRACT_SCALAR(json, '$.foo[0]'),
     JSON_EXTRACT_SCALAR(json, '$.foo[1]'),
     JSON_EXTRACT_SCALAR(json, '$.foo[2]'),
     JSON_EXTRACT_SCALAR(json, '$.foo[3]'),
     JSON_EXTRACT_SCALAR(json, '$.foo[4]'),
     JSON_EXTRACT_SCALAR(json, '$.foo[5]'),
     JSON_EXTRACT_SCALAR(json, '$.foo[6]'),
     JSON_EXTRACT_SCALAR(json, '$.foo[7]'),
     JSON_EXTRACT_SCALAR(json, '$.foo[8]'),
     JSON_EXTRACT_SCALAR(json, '$.foo[9]')]) AS v)
);
Indexes past the end of the array return NULL, and ARRAY_AGG(... IGNORE NULLS) drops them, so shorter arrays work too (up to ten elements here). Even though there is an escaped quote inside the "bar" string in this example, you'll get the expected four elements:
SELECT JsonExtractArray('{"foo":[1,2,3,"ba\\"r"]}');
gi...@gmail.com <gi...@gmail.com> #9
Yeah, hardcoding a max length on the input arrays is a non-starter for us.
da...@newfangled.com <da...@newfangled.com> #10
In the meantime there are a few options:
- Process the data differently (e.g. using Cloud Dataflow or another tool) so that you can load it from newline-delimited JSON into BigQuery.
- Use a JavaScript UDF that takes the input JSON and returns the desired type; this is fairly straightforward but generally uses more CPU (and hence may require a higher billing tier).
- Use SQL functions with the understanding that the solution breaks down if there are too many elements.
[Deleted User] <[Deleted User]> #11
> Process the data differently (e.g. using Cloud Dataflow or another tool) so that you can load it from newline-delimited JSON into BigQuery.
We've been taking advantage of BigQuery to follow an ELT (extract-load-transform) pattern, where the T happens in BigQuery SQL itself, so adding another T step (making it ETLT) would be a heavy and undesirable change for us.
> Use a JavaScript UDF that takes the input JSON and returns the desired type; this is fairly straightforward but generally uses more CPU (and hence may require a higher billing tier).
(Discussed above.)
> Use SQL functions with the understanding that the solution breaks down if there are too many elements.
(Discussed above.)
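To make the JavaScript UDF option concrete, here is a minimal sketch (not from the thread; it assumes the array sits under a top-level "bar" key as in the earlier examples, and the function name is illustrative). Unlike the indexed-path hack in #8, it handles arbitrary array lengths and escaped quotes, at the extra CPU cost noted above:
CREATE TEMP FUNCTION JsonExtractStringArray(json STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  // JSON.parse does full JSON unescaping, so escaped quotes in values are handled.
  // String() coerces numeric elements so mixed arrays still fit ARRAY<STRING>.
  return JSON.parse(json).bar.map(String);
""";

SELECT JsonExtractStringArray('{"bar": ["say \\"hi\\"", "Brazil"]}');
-- Returns the two decoded values: say "hi" and Brazil.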
[Deleted User] <[Deleted User]> #12
We have a similar issue with a map stored in JSON; parsing via regex is rather error-prone. Right now it seems a JavaScript UDF is the only option, and as mentioned before, I'm worried about performance. In our case it's up to ~1M rows, with each row containing a map encoded as JSON (up to ~100 key-value pairs; it might become more later).
Should I open a separate ticket for json_extract_map: string -> map<string, string> ?
ro...@gmail.com <ro...@gmail.com> #13
Yes, please do (this is more along the lines of supporting a new type). Thanks!
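Until something along those lines exists, one possible stopgap (a sketch only; the function name is illustrative) is a JavaScript UDF that returns the map as an array of key/value STRUCTs, since BigQuery has no MAP type:
CREATE TEMP FUNCTION JsonExtractMap(json STRING)
RETURNS ARRAY<STRUCT<key STRING, value STRING>>
LANGUAGE js AS """
  // Turn each top-level key of the JSON object into a {key, value} struct.
  var obj = JSON.parse(json);
  return Object.keys(obj).map(function(k) {
    return {key: k, value: String(obj[k])};
  });
""";

SELECT JsonExtractMap('{"a": "x", "b": "y"}');
-- Returns [(key "a", value "x"), (key "b", value "y")].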
[Deleted User] <[Deleted User]> #15
>> Process the data differently (e.g. using Cloud Dataflow or another tool) so that you can load it from newline-delimited JSON into BigQuery.
> We've been taking advantage of BigQuery to follow an ELT (extract-load-transform) pattern, where the T happens in BigQuery SQL itself, so adding another T step (making it ETLT) would be a heavy and undesirable change for us.
I think what the StackOverflow user, Elliott Brossard, was proposing is that instead of using an ELT pattern, you use an ETL pattern, with Dataproc/Dataflow as your transformation technology/layer.
Basically:
1. Extract from the source into Google Cloud Storage. (E)
2. Run a Dataproc/Dataflow job to parse your data and transform it as necessary. (T)
3. Write the result(s) to BigQuery. (L)
du...@gmail.com <du...@gmail.com> #16
Another option is to add a STRING_TO_ARRAY() function, since we already have the reverse one: ARRAY_TO_STRING().
It should basically do this:
regexp_extract_all(json_extract(FIELD, '$.keyWithArrayAsVal'), r'{[^}]+}')
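As a rough illustration of that suggestion (hypothetical inline input; note it still breaks if an element contains a nested object or a "}" inside a string value):
SELECT REGEXP_EXTRACT_ALL(
  JSON_EXTRACT('{"keyWithArrayAsVal": [{"a": 1}, {"a": 2}]}', '$.keyWithArrayAsVal'),
  r'{[^}]+}') AS elements;
-- elements = ['{"a":1}', '{"a":2}']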
ap...@gmail.com <ap...@gmail.com> #17
Any updates on this?
[Deleted User] <[Deleted User]> #18
Not yet. The best workarounds are those listed above, e.g. a JavaScript UDF or splitting with a regex, assuming the strings don't have escaped quotes in them.
se...@gmail.com <se...@gmail.com> #19
+1 for this -- it would be extraordinarily helpful for occasions where you use a 3rd-party tool that integrates with BigQuery but where you can't control how the data arrives (e.g. as a string).
A prime example of such a tool is Segment.com:
- can be configured to use Redshift or BigQuery as a data warehouse
- stringifies arrays before sending to warehouse
al...@curvestone.io <al...@curvestone.io> #20
+1 for this.
sw...@terpmail.umd.edu <sw...@terpmail.umd.edu> #21
+100 for this.
Please implement a solution for this.
sw...@terpmail.umd.edu <sw...@terpmail.umd.edu> #22
+1 for this.
[Deleted User] <[Deleted User]> #23
+1 for this
[Deleted User] <[Deleted User]> #24
+1 for this
gr...@gmail.com <gr...@gmail.com> #25
+1 for this
al...@gmail.com <al...@gmail.com> #26
+1
[Deleted User] <[Deleted User]> #27
+1
as...@gmail.com <as...@gmail.com> #28
+1
It seems insane that in late 2019, BigQuery can't unnest a stringified JSON array without resorting to performance-breaking hacks. What the heck Google?
da...@gmail.com <da...@gmail.com> #29
Unnest JSON Arrays is blocking us to migrate our application completely into the cloud.
[Deleted User] <[Deleted User]> #30
+1
ca...@gmail.com <ca...@gmail.com> #31
+1
bj...@measureone.com <bj...@measureone.com> #32
+1
ru...@gmail.com <ru...@gmail.com> #33
+1
I ran into this issue again today. I was able to use a workaround using REGEXP_EXTRACT, but now I have to teach this hack (and associated pitfalls and limitations) to the whole team.
ne...@acerb.biz <ne...@acerb.biz> #34
+1
ct...@gmail.com <ct...@gmail.com> #35
+1
What I wouldn't give for a JSON_EXTRACT_ARRAY function. JSON_EXTRACT already allows array access by index, and JSON_EXTRACT_SCALAR will actually return NULL if the result is an array (or an object), so it seems safe to assume there are already means within those functions to parse JSON arrays - can we not expose arrays natively?
In addition, pretty much every ETL platform converts JSON columns into strings within BigQuery, but if the data in those columns cannot be readily converted into an array, BigQuery becomes a real handicap in the ETL process. I appreciate any consideration here.
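The behavior described there is easy to verify (an illustrative query, not from the thread):
SELECT
  JSON_EXTRACT('{"foo": [1, 2, 3]}', '$.foo[1]') AS second_element,  -- returns "2"
  JSON_EXTRACT_SCALAR('{"foo": [1, 2, 3]}', '$.foo') AS whole_array; -- returns NULL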
en...@slidespro.io <en...@slidespro.io> #36
+1
cd...@escp.eu <cd...@escp.eu> #37
+1
pr...@gmail.com <pr...@gmail.com> #38
+1
lu...@britanniaeducationtrust.com <lu...@britanniaeducationtrust.com> #39
+1
ig...@carely.group <ig...@carely.group> #40
+1
by...@gmail.com <by...@gmail.com> #41
+1 this would be really helpful.
dy...@gmail.com <dy...@gmail.com> #42
+1
th...@gmail.com <th...@gmail.com> #43
+1
or...@gmail.com <or...@gmail.com> #44
+1
en...@gmail.com <en...@gmail.com> #45
+1
jd...@gmail.com <jd...@gmail.com> #46
+1 !
km...@gmail.com <km...@gmail.com> #47
I just tried to create a UDF called json_extract_array, but it reported the error "User-defined function name 'json_extract_array' conflicts with a reserved built-in function name". Unfortunately it's not usable yet :) so it looks like there is some progress.
rm...@gmail.com <rm...@gmail.com> #48
+1
al...@gmail.com <al...@gmail.com> #49
Comment has been deleted.
pr...@gmail.com <pr...@gmail.com> #50
+1
mg...@gmail.com <mg...@gmail.com> #51
Comment has been deleted.
mp...@gmail.com <mp...@gmail.com> #52
+1
ju...@gmail.com <ju...@gmail.com> #53
+1
pa...@gmail.com <pa...@gmail.com> #54
+1
an...@gmail.com <an...@gmail.com> #55
+1
ar...@gmail.com <ar...@gmail.com> #56
+1
ro...@gmail.com <ro...@gmail.com> #57
se...@queries.co.jp <se...@queries.co.jp> #59
Thanks for all the upvotes.
ja...@confluent.io <ja...@confluent.io> #60
+1
jp...@google.com <jp...@google.com>
vi...@gmail.com <vi...@gmail.com> #61
+1
dh...@gmail.com <dh...@gmail.com> #62
+1
sm...@gmail.com <sm...@gmail.com> #63
+1
ks...@stanford.edu <ks...@stanford.edu> #64
+1 Please add this feature
di...@gmail.com <di...@gmail.com> #65
+1
hu...@gmail.com <hu...@gmail.com> #66
+1 Please add
va...@gmail.com <va...@gmail.com> #67
+1
be...@gmail.com <be...@gmail.com> #68
+1
ou...@gmail.com <ou...@gmail.com> #69
+1
jo...@gmail.com <jo...@gmail.com> #70
+1
va...@breaktrue.ai <va...@breaktrue.ai> #71
+1
pr...@gmail.com <pr...@gmail.com> #72
+10000000000000
be...@gmail.com <be...@gmail.com> #73
+1000000000000000000000 please fix this after 8 years!!
su...@gmail.com <su...@gmail.com> #74
Google, come on, do this!
rh...@gmail.com <rh...@gmail.com> #75
Yes - about time!
ni...@elunic.com <ni...@elunic.com> #76
PUSH
ba...@gmail.com <ba...@gmail.com> #77
+1
ma...@gmail.com <ma...@gmail.com> #78
+googol
sa...@gmail.com <sa...@gmail.com> #79
+1
[Deleted User] <[Deleted User]> #80
+1
aa...@aaf.lu <aa...@aaf.lu> #81
+1
se...@vezign.com <se...@vezign.com> #82
Oh no... this has been on here since 2016... not much hope for it being resolved eh?
vi...@gmail.com <vi...@gmail.com> #83
+1
ca...@gmail.com <ca...@gmail.com> #84
+1
da...@dbuxton.com <da...@dbuxton.com> #85
+1
mj...@mozilla.com <mj...@mozilla.com> #86
+1
de...@burai.online <de...@burai.online> #87
Comment has been deleted.
ss...@umbrellab.com <ss...@umbrellab.com> #88
+1
2025 and it's still not possible?
ni...@elunic.com <ni...@elunic.com> #89
How can such a vital bug not be fixed after almost 10 years....
ti...@telus.com <ti...@telus.com> #90
+1 I was hyped after google next this year to start building some new integrations with all the new LLM capabilities that have come out in the last couple of years, but have been stopped dead in my tracks by lack of a basic feature that still hasn't been implemented in 10 years. My disappointment is immeasurable and my day is ruined.
Description
From what I can tell, there are two pieces of the puzzle that could be combined to make this possible:
1) The Drive API (running as an Advanced Service) currently allows for comment insertions, but these comments cannot be tied to user selections because the comment anchor resource for Docs is "proprietary."
Inspecting selection-bound comments returned by the list endpoint suggests that, behind the scenes, Google has an anchors resource that is a core part of the Docs product, and that these anchors have simple string identifiers like "kix.74d1qqbpcb6s".
2) The DocumentApp service can identify the current user's cursor selection from the active Document (e.g. via DocumentApp.getActiveDocument().getSelection()); however, the Selection class does not have a method to extract anything that can be used as a Drive comment anchor.
My proposal is to add a method to the selection class that creates a "kix" anchor that can be used with the Drive API comments endpoint.
This same approach should also be taken for Sheets and any other editor where the active user's selections are made programmatically available.