Assigned
Status Update
Comments
de...@derekperkins.com <de...@derekperkins.com> #2
+1
bw...@google.com <bw...@google.com> #3
+1
de...@derekperkins.com <de...@derekperkins.com> #4
I'm very excited for Table Functions, that enables the parameterized view feature mentioned in my second comment, and we will definitely use it right now. It still doesn't provide all that I am looking for, since it doesn't allow for scripting or side-effects, which would really unleash the power of BigQuery as a data platform api.
bw...@google.com <bw...@google.com> #5
Understood. We will consider this for a future roadmap item.
de...@derekperkins.com <de...@derekperkins.com> #7
Following up to see if there has been any movement
de...@derekperkins.com <de...@derekperkins.com> #8
I've attempted to make this work with Table Functions, but they're pretty much unusable without having optional parameters
Description
What you would like to accomplish:
We are really having a hard time making BigQuery / BI tools easy to use for our customers. Queries to our raw data are semi-slow / expensive, plus are complex. To date, we have given users 1000+ line queries for it, but those are very intimidating, plus prone to break.
For our dashboard users, we have a multi-stage caching setup where we hash the inputs of a query, write the output to a BigQuery table, then use that as the data source for further queries / pagination. I thought I could replace that system with a stored procedure, and I can get 99.9% of the way there.
The other benefit of using a script is that I can provide a stable api for our DataStudio (or whatever BI tool)
CALL rollup_data_func(a, b, c)
, but not lock myself into implementation details. I can add/remove caching tiers, change source tables/fields, etc., as long as I maintain that api contract.The problem is BigQuery doesn't allow you to interact a table response. When using the api, the script output is exactly what I would expect.
"The jobs.getQueryResults method returns the query results for the last SELECT, DML, or DDL statement to execute in the script, with no query results if none of the above statements have executed"
All that I'm looking for is for BigQuery to have the same semantics. Let me interact with that output,
SELECT * FROM (CALL rollup_data_func(a, b, c)) ORDER BY col1
.If applicable, reasons why alternative solutions are not sufficient:
Without that, it is impossible to connect data directly from a stored procedure to DataStudio (or any other BigQuery connector). Writing to a table is not a workaround, as I'm already lazily creating a temporary table, then using that output. I still have to trigger the creation of that table, and using a cron to execute every possible combination of input parameters isn't possible, and I can't require my users to build in a custom "generate data" button that they have to click whenever the change parameters.
Another option is that instead of using pre-built BigQuery connectors, we would have to build our own custom integrations into every BI tool on the planet. We don't have the resources, plus that would reduce BigQuery's spend potential.
I've tried returning the query string and executing that inline, but that errors out on the DataStudio side because of the script declaration. That also rules out returning the table data in a predefined array of structs as an
OUT
variable.Sample Procedure