Description
Problem you have encountered:
When querying a materialized view (MV) built on CDC-based tables (populated via Datastream) whose staleness setting was recently changed from 8 hours to 2 hours, the MV returns inflated aggregated metrics (or duplicate rows when not aggregated) compared to running the same query directly on the base tables. This occurs even when the underlying base table data (e.g., for a specific date) remains unchanged for longer than the staleness period. Setting the MV's max staleness to exactly twice that of the base tables sometimes mitigates the issue temporarily, but the problem ultimately recurs. Toggling "Use cached results" on and off, or changing the `allow_non_incremental_definition` parameter (true/false), does not resolve the problem.
What you expected to happen:
The materialized view should produce results identical to those from direct queries on the base tables, without duplicate entries or inflated aggregation values.
Steps to reproduce:
Create Dummy Base Tables:
Assume you have two CDC-based tables: `project.dataset.transactions` and `project.dataset.users`. The `transactions` and `users` tables are continuously updated via Datastream with a known staleness of 2 hours (it was 8 hours until a week ago).
Create a Materialized View:
Use the following dummy query as an example:
Query the Materialized View:
Execute a query similar to:
Query the Base Tables:
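The results of this query and of the MV query can then be compared group by group. A minimal sketch, assuming both result sets have already been fetched into dictionaries keyed by a hypothetical grouping column (a date string here); the `Aggregate` type and the `find_mismatches` helper are illustrative names, not part of BigQuery or of the original queries:

```python
from decimal import Decimal
from typing import Dict, NamedTuple


class Aggregate(NamedTuple):
    """One aggregated row: the two metrics named in this report."""
    total_amount: Decimal
    transaction_count: int


ZERO = Aggregate(Decimal("0"), 0)


def find_mismatches(mv_rows: Dict[str, Aggregate],
                    base_rows: Dict[str, Aggregate]) -> Dict[str, Aggregate]:
    """Return, per group key, how much the MV result exceeds the base result.

    Keys missing from one side are treated as zero on that side.
    Groups where both sides agree are omitted.
    """
    mismatches = {}
    for key in sorted(set(mv_rows) | set(base_rows)):
        mv = mv_rows.get(key, ZERO)
        base = base_rows.get(key, ZERO)
        if mv != base:
            mismatches[key] = Aggregate(
                mv.total_amount - base.total_amount,
                mv.transaction_count - base.transaction_count,
            )
    return mismatches


# Made-up numbers for illustration only; not taken from the report.
base_rows = {
    "2024-01-01": Aggregate(Decimal("100.00"), 4),
    "2024-01-02": Aggregate(Decimal("50.00"), 2),
}
mv_rows = {
    "2024-01-01": Aggregate(Decimal("200.00"), 8),
    "2024-01-02": Aggregate(Decimal("50.00"), 2),
}

for key, excess in find_mismatches(mv_rows, base_rows).items():
    print(key, excess.total_amount, excess.transaction_count)
```

A positive excess in both metrics for a group whose base data has not changed would be consistent with rows being counted more than once in the MV.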
You should see a much higher `total_amount` and `transaction_count` from the MV than from the direct query against the base tables.
Other information (workarounds you have tried, documentation consulted, etc):
- Setting the MV's `max_staleness` to twice the base tables' staleness (i.e., 4 hours) can sometimes mask the issue, but the problem eventually recurs.
- Changing the `allow_non_incremental_definition` parameter (true/false) did not resolve the problem.
- The `max_staleness` settings of materialized views may be misaligned with CDC updates, causing overlapping refresh windows that lead to duplicate processing of data.
Please let me know if further details or additional reproduction steps are needed. Thank you for your assistance.