Comments
el...@google.com <el...@google.com> #2
I am able to also reproduce issue #1 now; I've committed a change to the sample project that sets enableMultiInstanceInvalidation() on the DB and adjusts the timing a bit more. After running it for 15 minutes or so I am seeing multiple skipped updates.
My production app uses enableMultiInstanceInvalidation() as well, so this may be a significant factor.
as...@gmail.com <as...@gmail.com> #3
Any thoughts on this one? I mean, it makes JournalMode.TRUNCATE largely useless, unless one has a penchant for random app misbehaviors. I'd like to keep using TRUNCATE, as it avoids certain headaches from WRITE_AHEAD_LOGGING.
yb...@google.com <yb...@google.com> #4
To be clear, "batching" does not take care of the missed invalidation notifications. Updates for a table may be spaced hours apart, and in my production app about 5% of notifications are lost, meaning that the user does not see the update until the next one comes in, which may be hours later. It's a chatroom feature, where reliable, timely updates are important.
Also, not setting enableMultiInstanceInvalidation() does not fix the issue, it only changes the timing; notifications go missing regardless of whether I include one process or multiple.
as...@gmail.com <as...@gmail.com> #5
Hi - Sorry, we haven't had the chance to investigate this. I know this might be a lot to ask, and thanks for giving us a sample app, but have you tried adding a transaction in the invalidation tracker? You can check out Room's source code here:
Also what headaches are you trying to avoid from WAL mode?
as...@gmail.com <as...@gmail.com> #6
Thanks for the pointer. I've built Room from source now as you suggested and do see that the problem goes away once I use the transactionality block in InvalidationTracker.mRefreshRunnable for all JournalModes. I don't know why that wasn't done in the first place; it certainly is a misconception to assume that TRUNCATE does not need it.
Regarding headaches from WAL mode:
- Android's SQLiteDatabase runs TRUNCATE transactions effectively serializable due to grabbing exclusive locks right at the start of a transaction, so I can worry less about transactions failing, at the cost of reduced concurrency. My app isn't that heavy on DB ops, so favoring reliability over concurrency makes lots of sense. I'd actually assume that this is true for most Android apps, and that TRUNCATE would make a better default, not fancy WAL.
- WAL has certain other disadvantages, as pointed out here:
https://sqlite.org/wal.html In particular the risk of extended failure from SQLITE_BUSY is high for my app as it has DB connections from two processes (https://sqlite.org/wal.html#busy); of particular concern is the recovery case after a crash, where one process may hold an exclusive lock for an extended period of time at startup, essentially causing the other process to error out without much recourse.
Hope this can be fixed soon in an official build; seems a trivial change.
yb...@google.com <yb...@google.com> #7
el...@google.com <el...@google.com> #8
Yes please! It would be ideal if you can write a test for this though, in the test app specifically:
Similar to your sample app: open a DB in TRUNCATE mode, have one thread that inserts a lot and another one reading notifications, and at the end make sure the number of notifications received matches the number of items inserted.
el...@google.com <el...@google.com> #9
as...@gmail.com <as...@gmail.com> #10
Just to clarify on #8: we cannot count the number of invalidation events, as the database might combine them. Instead, we need to make sure the latest value is always eventually dispatched.
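To make that property concrete, here is a minimal plain-Java sketch of the check. It is a model only and does not use Room or SQLite: `LatestValueCheck`, `table`, `pending`, `lastSeen` and `dispatches` are hypothetical names standing in for the table contents, a pending-invalidation flag and the observer's state. Writes that arrive while a refresh is still queued are coalesced, so the number of dispatches is not asserted, only that the last written value is eventually observed.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model, not Room's implementation.
class LatestValueCheck {
    // Stand-in for the table: holds the most recently written value.
    final AtomicInteger table = new AtomicInteger(0);
    // Stand-in for a "refresh already scheduled" flag; lets writes coalesce.
    final AtomicBoolean pending = new AtomicBoolean(false);
    // What the observer saw last, and how many times it was notified.
    final AtomicInteger lastSeen = new AtomicInteger(0);
    final AtomicInteger dispatches = new AtomicInteger(0);
    // Single background thread that delivers notifications.
    final ExecutorService notifier = Executors.newSingleThreadExecutor();

    void write(int value) {
        table.set(value);
        // Schedule a refresh only if none is queued; otherwise this write
        // is covered by the refresh that is already pending.
        if (pending.compareAndSet(false, true)) {
            notifier.execute(() -> {
                // Clear the flag before reading, so a write that lands
                // after the read schedules a new refresh.
                pending.set(false);
                lastSeen.set(table.get());
                dispatches.incrementAndGet();
            });
        }
    }

    // Performs `writes` writes, waits for queued refreshes to drain,
    // and returns the last value the observer saw.
    int run(int writes) throws InterruptedException {
        for (int i = 1; i <= writes; i++) {
            write(i);
        }
        notifier.shutdown();
        notifier.awaitTermination(10, TimeUnit.SECONDS);
        return lastSeen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        LatestValueCheck check = new LatestValueCheck();
        int last = check.run(1000);
        System.out.println("last seen = " + last
                + ", dispatches = " + check.dispatches.get());
    }
}
```

A real test would replace the model with a Room database opened in TRUNCATE mode and an observer on the invalidation tracker; the assertion shape (last observed value equals last written value, with no assumption about the dispatch count) would stay the same.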
ap...@google.com <ap...@google.com> #11
As it's an intermittent problem (triggered by specific transaction timing), it's difficult to surface in the first place, so this type of instrumented test cannot reliably find the problem. It was hard enough trying to surface it with manually run code in a controlled environment. Also, the longer the time interval we choose, the harder it gets to reproduce within a finite amount of time.
I think the main concern with fixing the issue is not actually the PR's ability to address the issue, but the potential to introduce regressions. I have no idea why there was no transactionality for the TRUNCATE path; was it really just because the thinking was that it was superfluous? This can only be addressed by existing test coverage and maybe insight from whoever wrote the original code and other knowledgeable team members. I traced its history and found no helpful pointers; as far as I remember, it has looked like this ever since it was introduced in the early days of Room.
Do you guys really want the sketchy black box test, or maybe just stick with some extra code documentation, existing regression tests, plus the repro case? It almost seems not worth the trouble.
Description
Component used: Room
Version used: 2.4.0-alpha02
I just added foreign keys on a couple of entities and got this error when trying to use AutoMigration.
Here's the generated code:
Stack trace: