Status Update
Comments
al...@google.com <al...@google.com> #2
I am able to also reproduce issue #1 now; I've committed a change to the sample project that sets enableMultiInstanceInvalidation() on the DB and adjusts the timing a bit more. After running it for 15 minutes or so I am seeing multiple skipped updates.
My production app uses enableMultiInstanceInvalidation() as well, so this may be a significant factor.
sg...@google.com <sg...@google.com> #3
Any thoughts on this one? I mean, it makes JournalMode.TRUNCATE largely useless, unless one has a penchant for random app misbehaviors. I'd like to keep using TRUNCATE, as it avoids certain headaches from WRITE_AHEAD_LOGGING.
al...@google.com <al...@google.com> #4
To be clear, "batching" does not take care of the missed invalidation notifications. Updates for a table may be spaced hours apart, and in my production app about 5% of notifications are lost, meaning that the user does not see the update until the next one comes in, which may be hours later. It's a chatroom feature, where reliable, timely updates are important.
Also, not setting enableMultiInstanceInvalidation() does not fix the issue, it only changes the timing; notifications go missing regardless of whether I include one process or multiple.
ag...@google.com <ag...@google.com> #5
Hi - Sorry, we haven't had the chance to investigate this. I know this might be a lot to ask, and thanks for giving us a sample app, but have you tried adding a transaction in the invalidation tracker? You can check out Room's source code here:
Also what headaches are you trying to avoid from WAL mode?
al...@google.com <al...@google.com> #6
Thanks for the pointer. I've built Room from source now as you suggested and do see that the problem goes away once I use the transactionality block in InvalidationTracker.mRefreshRunnable for all JournalModes. Don't know why that wasn't done in the first place, it certainly is a misconception to assume that TRUNCATE does not need it.
Regarding headaches from WAL mode:
- Android's SqliteDatabase runs TRUNCATE transactions effectively serializable due to grabbing exclusive locks right at the start of a transaction, so I can worry less about transactions failing, at the cost of reduced concurrency. My app isn't that heavy on DB ops, so favoring reliability over concurrency makes lots of sense. I'd actually assume that this is true for most Android apps, and that TRUNCATE would make a better default, not fancy WAL.
- WAL has certain other disadvantages, as pointed out here:
https://sqlite.org/wal.html In particular the risk of extended failure from SQLITE_BUSY is high for my app as it has DB connections from two processes (https://sqlite.org/wal.html#busy ); of particular concern is the recovery case after a crash, where one process may hold an exclusive lock for an extended period of time at startup, essentially causing the other process to error out without much recourse.
Hope this can be fixed soon in an official build; seems a trivial change.
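The kind of race that a transaction around the refresh step would close can be modelled in plain Java. This is a minimal sketch and not Room's code: ModificationLog, refresh and reportedTables are hypothetical names, the lock only stands in for the exclusive database lock, and the read-then-clear-all behavior is an assumption about how the tracker consumes its modification flags. A write that lands between the read and the reset is wiped without ever being reported unless both steps run under one lock.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

final class RefreshRaceSketch {

    /** Hypothetical model of a per-table "invalidated" flag log; not Room's code. */
    static final class ModificationLog {
        private final boolean[] invalidated;
        // Stands in for the exclusive database lock held during a transaction.
        private final ReentrantLock dbLock = new ReentrantLock();

        ModificationLog(int tableCount) {
            invalidated = new boolean[tableCount];
        }

        /** A writer marks a table as changed. */
        void write(int tableId) {
            dbLock.lock();
            try {
                invalidated[tableId] = true;
            } finally {
                dbLock.unlock();
            }
        }

        /** Reads the flagged tables, then clears all flags; atomic only if inTransaction. */
        List<Integer> refresh(boolean inTransaction, Runnable betweenSteps) {
            if (inTransaction) dbLock.lock();
            try {
                List<Integer> ids = readFlagged();
                betweenSteps.run(); // test hook: lets a writer try to interleave here
                clearAll();
                return ids;
            } finally {
                if (inTransaction) dbLock.unlock();
            }
        }

        private List<Integer> readFlagged() {
            dbLock.lock();
            try {
                List<Integer> ids = new ArrayList<>();
                for (int i = 0; i < invalidated.length; i++) {
                    if (invalidated[i]) ids.add(i);
                }
                return ids;
            } finally {
                dbLock.unlock();
            }
        }

        private void clearAll() {
            dbLock.lock();
            try {
                Arrays.fill(invalidated, false);
            } finally {
                dbLock.unlock();
            }
        }
    }

    /** Returns every table id reported across two refreshes, with a write racing the first. */
    static List<Integer> reportedTables(boolean inTransaction) {
        ModificationLog log = new ModificationLog(2);
        log.write(0);
        Thread writer = new Thread(() -> log.write(1));
        List<Integer> reported = new ArrayList<>(log.refresh(inTransaction, () -> {
            writer.start();
            // Without a transaction the write can complete between read and reset.
            // With one, the writer blocks on dbLock until the refresh has finished.
            if (!inTransaction) join(writer);
        }));
        join(writer);
        reported.addAll(log.refresh(inTransaction, () -> { }));
        return reported;
    }

    private static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reportedTables(false)); // [0]     table 1 is never reported
        System.out.println(reportedTables(true));  // [0, 1]  table 1 arrives on the next refresh
    }
}
```

The hook makes the interleaving deterministic, which is the property the intermittent production failure lacks; it models the timing only and says nothing about SQLite's actual locking rules.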
al...@google.com <al...@google.com> #7
ap...@google.com <ap...@google.com> #8
Yes please! It would be ideal if you can write a test for this though, in the test app specifically:
Similar to your sample app, open a DB in TRUNCATE mode, have one thread that inserts a lot and another one reading notifications, and at the end make sure the number of notifications received matches the number of items inserted.
ap...@google.com <ap...@google.com> #9
ap...@google.com <ap...@google.com> #10
Just to clarify on #8, we cannot count the number of invalidation events, as the database might combine them. Instead, we need to make sure the latest value is always eventually dispatched.
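The "latest value is eventually dispatched" check can be sketched in plain Java. This is an illustration under assumptions, not the test that was later merged: CoalescingSource is a hypothetical stand-in for a notifier that may fold several writes into one callback, and the 10 second timeout is simply a generous bound for slow devices.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class LatestValueCheckSketch {

    /** Hypothetical notifier that may fold several writes into one callback. */
    static final class CoalescingSource {
        final AtomicInteger value = new AtomicInteger();
        final AtomicInteger lastSeen = new AtomicInteger();
        final AtomicInteger callbacks = new AtomicInteger();
        private final AtomicBoolean pending = new AtomicBoolean();
        private final ExecutorService executor = Executors.newSingleThreadExecutor();

        void write(int newValue) {
            value.set(newValue);
            // Queue a callback only if none is waiting; later writes piggyback on it.
            if (pending.compareAndSet(false, true)) {
                executor.execute(() -> {
                    pending.set(false);
                    callbacks.incrementAndGet();
                    lastSeen.set(value.get());
                });
            }
        }

        void shutdown() {
            executor.shutdown();
        }
    }

    /** Polls until the observer has seen the expected value or the timeout expires. */
    static boolean awaitLatest(CoalescingSource source, int expected, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (source.lastSeen.get() != expected) {
            if (System.nanoTime() >= deadline) return false;
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    /** Writes 1..writes, then checks that the final value is eventually observed. */
    static boolean latestValueArrives(int writes) {
        CoalescingSource source = new CoalescingSource();
        try {
            for (int i = 1; i <= writes; i++) {
                source.write(i);
            }
            boolean arrived = awaitLatest(source, writes, 10_000);
            // The callback count is only bounded by the write count, so it is not compared for equality.
            int count = source.callbacks.get();
            return arrived && count >= 1 && count <= writes;
        } finally {
            source.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(latestValueArrives(1000));
    }
}
```

Asserting on the final observed value keeps the check valid no matter how many writes share a callback, whereas an equality check on the callback count would fail whenever coalescing happens.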
al...@google.com <al...@google.com> #11
As it's an intermittent problem (triggered by specific transaction timing), it's difficult to surface in the first place, so this type of instrumented test cannot really reliably find the problem; it was hard enough trying to surface it with manually run code in a controlled environment. Also, the longer the time interval we choose, the harder it gets to reproduce within a finite amount of time.
I think the main concern with fixing the issue is not actually the PR's ability to address the issue, but the potential to introduce regressions. I have no idea why there was no transactionality for the TRUNCATE path; was it really just because the thinking was that it was superfluous? This can only be addressed by existing test coverage and maybe insight from whoever wrote the original code and other knowledgeable team members. I traced its history and found no helpful pointers; I remember finding that it has always looked like this since it was introduced in the early days of Room.
Do you guys really want the sketchy black box test, or maybe just stick with some extra code documentation, existing regression tests, plus the repro case? It almost seems not worth the trouble.
ap...@google.com <ap...@google.com> #12
Seems like it was added here:
For testing, I agree it is not great as it only shows up as a flake, but we have a couple of stress tests similar to that and we usually use 10 secs (to account for cloud devices). Under normal circumstances (device is not slowed down and db is not locked), it shouldn't take more than ~100ms to get the update after the write (usually much faster). But with virtual device testing we cannot rely on timing, hence we pick large enough numbers (if a virtual device idles for 10 seconds, then that is an infra problem).
It is at least better than nothing and if it flakes, that'll keep bugging us until finding a better solution.
Btw, if you can detect the exact ordering of events that will cause the problem, then we can add package private restricted APIs from the InvalidationTracker to have fine tuned control over them in tests.
ap...@google.com <ap...@google.com> #13
ap...@google.com <ap...@google.com> #14
oh-oh, I'm sorry to hear that, we definitely don't require you to enter any SSH key password. If anything, only the built-in Git integration in Android Studio will ask you for Github credentials if you use it to push changes to Github. Maybe you have some additional plugin that is asking for it? The Github-based setup downloads the correct Android Studio version needed for the project, but some plugins are installed separately and can persist across IDE installations.
al...@google.com <al...@google.com> #15
#13 I'm curious what went wrong w/ the Github setup. I don't want to hijack this bug but if you can either file an issue at
bu...@google.com <bu...@google.com> #16
I don't think I will add the test; ultimately I am unfamiliar with AOSP tooling and in particular the test setup for this, so it's an open-ended time commitment, and I already spent too much time on this tiny code problem.
I can still make a PR for the code change that fixes the bug (based on my testing via repro & actual app), but it's so trivial that you might just as well change it yourselves. As I mentioned in #10, the primary concern with the code change would be regressions, which are not captured by the envisioned test anyway, so I'd consider it defensible to go without test. The alternative, no fix, leaves TRUNCATE in a poor state.
ap...@google.com <ap...@google.com> #17
I meant
al...@google.com <al...@google.com> #18
are you still blocked on the SSH issue? FYI we do have other people sending PRs from Github without that issue so I would like to figure out what is broken in your case (as it might be affecting other people as well).
I want to respect your time though so if you don't want to deal with it anymore, you can send a PR without tests and we'll take it from there to add tests (though unfortunately, we cannot merge a PR without tests so we'll need to add them).
pr...@google.com <pr...@google.com> #19
Thanks for your and Aurimas' help on the SSH issue last year, doing a checkout with filter and SSH caused the problems. I suggested updating the online instructions to point out this gotcha, not sure what happened after that on your end.
I am sending you a PR with the fix & added test. I gave it one more push to get it off my mind.
na...@google.com <na...@google.com> #20
Branch: androidx-main
commit a5c2900ca26e8f24d4492e5a0e90f0669702c680
Author: Uli Bubenheimer <bubenheimer@users.noreply.github.com>
Date: Mon Apr 19 09:22:25 2021
[GH] Fix intermittent InvalidationTracker issues in JournalMode.TRUNCATE
## Proposed Changes
- Fixes issue in Room JournalMode.TRUNCATE where the InvalidationTracker callback is sometimes invoked invalidly, too late, or not at all.
- Adds InvalidationTrackerBehavioralTest
## Testing
Test: ./gradlew test connectedCheck -x :room:room-benchmark:cC -x :room:integration-tests:room-incremental-annotation-processing:test
## Issues Fixed
Fixes:
This is an imported pull request from
Resolves #159
Github-Pr-Head-Sha: b32fc85dd1368db5fe046b1edfecdb82ac3e5478
GitOrigin-RevId: 169da50d9ce5d0be6aa686d603d51a380f46ab63
Change-Id: I0c04f25303043f45c8efe687df93f7122acbc7d4
A room/integration-tests/testapp/src/androidTest/java/androidx/room/integration/testapp/test/InvalidationTrackerBehavioralTest.java
M room/runtime/src/main/java/androidx/room/InvalidationTracker.java
pr...@google.com <pr...@google.com> #21
The following release(s) address this bug. It is possible this bug has only been partially addressed:
androidx.activity:activity:1.10.0-alpha02
androidx.core:core-splashscreen:1.2.0-alpha02
androidx.work:work-multiprocess:2.10.0-alpha03
androidx.work:work-runtime:2.10.0-alpha03
na...@google.com <na...@google.com> #22
The following release(s) address this bug. It is possible this bug has only been partially addressed:
androidx.car.app:app:1.7.0-beta02
androidx.car.app:app-automotive:1.7.0-beta02
androidx.privacysandbox.ui:ui-client:1.0.0-alpha10
androidx.privacysandbox.ui:ui-provider:1.0.0-alpha10
androidx.wear.watchface:watchface-complications-data:1.3.0-alpha04
pr...@google.com <pr...@google.com> #23
The following release(s) address this bug. It is possible this bug has only been partially addressed:
androidx.input:input-motionprediction:1.0.0-beta05
androidx.webkit:webkit:1.12.1
na...@google.com <na...@google.com> #24
The following release(s) address this bug. It is possible this bug has only been partially addressed:
androidx.core:core-location-altitude:1.0.0-alpha03
na...@google.com <na...@google.com> #25
The following release(s) address this bug. It is possible this bug has only been partially addressed:
androidx.exifinterface:exifinterface:1.4.0-alpha01
pr...@google.com <pr...@google.com> #26
The following release(s) address this bug. It is possible this bug has only been partially addressed:
androidx.core:core-telecom:1.0.0-beta01
androidx.mediarouter:mediarouter:1.8.0-alpha01
androidx.transition:transition:1.6.0-alpha01
na...@google.com <na...@google.com> #27
The following release(s) address this bug. It is possible this bug has only been partially addressed:
androidx.browser:browser:1.9.0-alpha01
androidx.versionedparcelable:versionedparcelable:1.2.1
pr...@google.com <pr...@google.com> #28
The following release(s) address this bug. It is possible this bug has only been partially addressed:
androidx.tracing:tracing:1.3.0-beta01
na...@google.com <na...@google.com> #29
The following release(s) address this bug. It is possible this bug has only been partially addressed:
androidx.core:core-i18n:1.0.0-beta01
androidx.leanback:leanback:1.2.0-beta01
Description
R8 may double-outline platform NewApi calls when automatic outlining is enabled, so we should consider no-op'ing the @DoNotInline annotation when we're using an R8 version that does outlining. Looping in from email,