Change theme
Help
Press space for more information.
Show links for this issue (Shortcut: i, l)
Copy issue ID
Previous Issue (Shortcut: k)
Next Issue (Shortcut: j)
Sign in to use full features.
Vote: I am impacted
Notification menu
Refresh (Shortcut: Shift+r)
Go home (Shortcut: u)
Pending code changes (auto-populated)
View issue level access limits(Press Alt + Right arrow for more information)
Unintended behavior
View staffing
Description
Problem you have encountered:
BigQuery does not prune blocks with NULL values when a column filter is applied.
What you expected to happen:
BigQuery should skip blocks containing NULL values based on metadata, thereby avoiding scanning those blocks.
Steps to reproduce:
To replicate this issue, I created a table with a sparse column (90% NULLs, 10% non-NULL values). The table was clustered by this column to optimize for queries filtering on it. THen I queried for rows with a specific value in the clustering column (the value can be found using the data preview feature, for example).
Test table
The resulting table has 3.18 GB of logical bytes after materialization.
Test query
Running this query results in 2.59 GB of bytes processed.
Clustering does appear to be working because parts of the table are being skipped. However, I expected approximately 90% of the table to be skipped because those blocks contain NULL values. In this case, BigQuery only skipped 18% of the table.
To compare, I created an example with a dense column.
Test table
This table has 5.96 GB of logical bytes after materialization.
Test query
Running this query results in only 14.13 MB of bytes processed.
The results are much better when the column is dense compared to when it is sparse. I would expect a similar amount of bytes processed in both cases, assuming that the blocks with NULL values are properly skipped.