Comments
da...@google.com <da...@google.com> #2
yb...@google.com <yb...@google.com> #3
This was actually very hard to reproduce.
I've tried many variations; it seems to also be related to cancelling and launching from the main thread (which is also the thread used by the lifecycle dispatcher).
Out of all those variations, only canceledTransaction_withDelay_lifecycleLikeScope_launchOnMain fails.
I don't have a fix yet, though :)
yb...@google.com <yb...@google.com> #4
OK, I think I figured out what is going on.
The logic that reserves a thread in the transaction executor uses a job (controlJob) to handle cancellation.
The reserved thread, which is responsible for resuming the continuation, uses this job to know when to finish (controlJob.join()).
Unfortunately, there is a race condition: when the executor reservation thread resumes the continuation, the resumed coroutine returns immediately because it has already been cancelled, but the executing (reserved) thread then just waits for the job to finish. That never happens, because the transaction never even started.
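The race described above can be reproduced without Room. This is a minimal sketch under stated assumptions, not Room's actual code: it assumes kotlinx-coroutines-core 1.4 or newer (for the prompt-cancellation behavior of suspendCancellableCoroutine), it uses a coroutine on a single runBlocking event loop in place of the real executor thread, and simulateRace and RaceResult are hypothetical names. The "reserved thread" resumes the caller successfully, the caller is then cancelled before it gets dispatched, and the join on controlJob never returns (a 500 ms timeout stands in for "forever").

```kotlin
import kotlin.coroutines.resume
import kotlinx.coroutines.*

// Hypothetical result type for this sketch.
data class RaceResult(val callerWasCancelled: Boolean, val reservedSideTimedOut: Boolean)

fun simulateRace(): RaceResult = runBlocking {
    // Hypothetical model of controlJob: completed by the caller once its "transaction" is done.
    val controlJob = Job()
    var handOff: CancellableContinuation<String>? = null
    var callerWasCancelled = false

    val caller = launch {
        try {
            // Suspend until a "thread" is handed over to us.
            val thread = suspendCancellableCoroutine<String> { cont -> handOff = cont }
            // Never reached in this scenario: run the transaction, then release the thread.
            println("running transaction on $thread")
            controlJob.cancel()
        } catch (e: CancellationException) {
            callerWasCancelled = true
            throw e
        }
    }
    yield() // let the caller start and suspend inside suspendCancellableCoroutine

    // "Reserved thread" side: hand the thread over, then wait for the caller to finish.
    handOff!!.resume("reserved-thread") // succeeds: the caller is not cancelled yet
    caller.cancel()                     // ...but it is cancelled before it is dispatched
    val joined = withTimeoutOrNull(500) { controlJob.join() }

    caller.join()
    RaceResult(callerWasCancelled, reservedSideTimedOut = joined == null)
}
```

Calling simulateRace() should return RaceResult(callerWasCancelled = true, reservedSideTimedOut = true): the caller observes cancellation even though resume succeeded, and nothing is left to complete controlJob.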
A possible fix is to ensure the control job is cancelled when the job of the current context is cancelled (or otherwise completes):
val controlJob = Job()
coroutineContext[Job]!!.invokeOnCompletion {
    controlJob.cancel()
}
This way, if the continuation cannot be resumed (because it is cancelled), we won't block the executing thread.
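The proposed fix can be exercised in the same setting. Again this is only a sketch, assuming kotlinx-coroutines-core 1.4 or newer: createLinkedControlJob and reservedSideUnblocks are hypothetical names, and the helper uses currentCoroutineContext()[Job]?. instead of coroutineContext[Job]!! so that it also tolerates a context without a Job. Once the calling coroutine is cancelled, its completion handler cancels controlJob, and a join on it returns instead of hanging.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper mirroring the snippet above: the control job is cancelled
// as soon as the calling coroutine's Job completes, normally or by cancellation.
suspend fun createLinkedControlJob(): Job {
    val controlJob = Job()
    currentCoroutineContext()[Job]?.invokeOnCompletion {
        controlJob.cancel()
    }
    return controlJob
}

// Returns true if the "reserved thread" side stops waiting once the caller is cancelled.
fun reservedSideUnblocks(): Boolean = runBlocking {
    lateinit var controlJob: Job
    val caller = launch {
        controlJob = createLinkedControlJob()
        awaitCancellation() // stand-in for "cancelled before the transaction ever starts"
    }
    yield()         // let the caller create and link its control job
    caller.cancel() // the calling context is cancelled...
    // ...so joining the control job now completes instead of hanging.
    withTimeoutOrNull(500) { controlJob.join() } != null
}
```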
Thanks for the nice repro; it is really an edge case, so without it we wouldn't have been able to figure out what is going on. I'll work on cleaning up the CL for a proper fix.
Thanks!
yb...@google.com <yb...@google.com> #5
The current code actually tries to accommodate a similar case in Executor.acquireTransactionThread by cancelling the job if that suspendCancellableCoroutine is cancelled. In this case, however, the suspendCancellableCoroutine actually completes successfully: cancellation only happens when the result is handed back to the caller, at which point there is no longer any place to cancel the job.
yb...@google.com <yb...@google.com> #6
It seems like the right fix is to use:
continuation.resume(coroutineContext[ContinuationInterceptor]!!) {
    // on cancel block
    controlJob.cancel()
}
to cover that case. Unfortunately, though, it is an experimental API.
The docs say it is meant for when we need to close a resource, which is exactly our use case:
This function shall be used when resuming with a resource that must be closed by the code that had called the corresponding suspending function, e.g.:
yb...@google.com <yb...@google.com> #7
Actually,
continuation.resume(coroutineContext[ContinuationInterceptor]!!) {
    // on cancel block
    controlJob.cancel()
}
does not seem to work: I can still make it flake. It is very unlikely, but possible.
On the other hand, the initial solution of cancelling the job via the coroutine context's Job seems to work fine.
yb...@google.com <yb...@google.com> #8
OK, that one fails because even if we get into execute, the following withContext may never start, again leaving the job lingering around.
I think it is best to tie that job to the context that creates the transaction context, as the job should never outlive it.
ap...@google.com <ap...@google.com> #9
Branch: androidx-master-dev
commit c89f069d9d7de77f8f0961f15f318494687cdac2
Author: Yigit Boyar <yboyar@google.com>
Date: Sun Jan 26 08:51:17 2020
Tie transaction job to the calling context
This CL fixes a bug in room suspend transactions where we would deadlock
the transactions if the calling coroutine gets cancelled before we can
even start the transaction. It would leave the control job in active
state with no other callback to cancel it (to unlock the runner).
With this change, we automatically cancel it if the calling context
has a job. Note that it is not added as a child job since doing so would
cancel the calling context when job is cancelled. We want that to be
just a one way cancelation.
Bug: 148181325
Test: SuspendingTransactionCancelationTest
Change-Id: I18a76128d325f099b9677af1bb5d35cedf43e3d5
A room/integration-tests/kotlintestapp/src/androidTest/java/androidx/room/integration/kotlintestapp/test/SuspendingTransactionCancellationTest.kt
M room/ktx/src/main/java/androidx/room/RoomDatabase.kt
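The "one way" link described in the commit message can be illustrated with plain Job objects. This is a minimal sketch assuming kotlinx-coroutines-core; linkOneWay, callerSurvivesRelease and controlFollowsCaller are hypothetical names, not Room API. The control job is created as a standalone Job() that only listens to the caller's completion, so releasing it leaves the caller alone, while cancelling the caller cancels it.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: controlJob is deliberately NOT created as Job(caller);
// it only observes the caller's completion.
fun linkOneWay(caller: Job): Job {
    val controlJob = Job()
    caller.invokeOnCompletion { controlJob.cancel() }
    return controlJob
}

// Releasing the transaction (cancelling controlJob) leaves the caller active.
fun callerSurvivesRelease(): Boolean {
    val caller = Job()
    val controlJob = linkOneWay(caller)
    controlJob.cancel()
    return caller.isActive
}

// Cancelling the caller also cancels controlJob.
fun controlFollowsCaller(): Boolean {
    val caller = Job()
    val controlJob = linkOneWay(caller)
    caller.cancel()
    return controlJob.isCancelled
}
```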
an...@google.com <an...@google.com> #11
[Deleted User] <[Deleted User]> #12
zs...@google.com <zs...@google.com> #13
This deadlocks in tests as well:
@Test
fun bug() = runTest {
    createTestDatabase().withTransaction {
        println("hello world")
    }
}
This will hang inside withTransaction (more precisely, in acquireTransactionThread), waiting for controlJob.join().
Description
Version used: 2.2.3
Devices/Android versions reproduced on: Pixel XL (API 26 - Emulator), Samsung Galaxy Tab A (API 28)
There seems to be an issue with calling a DAO function that is annotated with @Transaction.
The issue emerges when cancelling the coroutine Job that called the function.
Let's say that we have a simple function which:
1. Launches a new coroutine
2. Inside this coroutine, calls a DAO function annotated with @Transaction
Then, if we cancel this Job immediately after starting it, a deadlock happens.
None of the subsequent calls to the function take place.
I've made a sample project available here:
It has two buttons. The first one, "Fetch user once", fetches a user from the database (using a function called "getUser()").
The second button, "Fetch user twice", calls this function twice.
As you can see in the sample code, we have a shared object to which we assign the latest Job.
At the beginning of every function call, we cancel the previous Job and start a new one.
Everything that happens is displayed in the TextView below the buttons.
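The cancel-previous-then-launch pattern of the sample app can be sketched as below. The sample project's code is not reproduced here, so LatestJobHolder, fetchUser, fetchTwice and the fake getUser (a plain delay, with no Room or @Transaction involved) are hypothetical stand-ins, and kotlinx-coroutines-core is assumed. With this non-transactional fake, "Fetch user twice" simply cancels the first Job and the second one delivers the user; the report is that the same pattern hangs once the DAO call is a @Transaction function.

```kotlin
import kotlinx.coroutines.*

// Hypothetical holder for the "latest Job", as in the sample app.
class LatestJobHolder(private val scope: CoroutineScope) {
    private var latestJob: Job? = null
    private var counter = 0

    fun fetchUser(getUser: suspend () -> String, onResult: (String) -> Unit): Job {
        latestJob?.cancel()          // cancel the previous Job...
        val id = ++counter
        val job = scope.launch {     // ...and start a new one
            onResult("Job $id: ${getUser()}")
        }
        latestJob = job
        return job
    }
}

// Simulates pressing "Fetch user twice" with a fake, non-transactional getUser.
fun fetchTwice(): List<String> = runBlocking {
    val results = mutableListOf<String>()
    val holder = LatestJobHolder(this)
    val fakeGetUser: suspend () -> String = { delay(100); "User(uid=123)" }
    val first = holder.fetchUser(fakeGetUser) { results += it }
    val second = holder.fetchUser(fakeGetUser) { results += it }
    second.join()
    check(first.isCancelled) // the first Job was cancelled by the second call
    results
}
```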
After pressing the first button, we can see that the Job is loading data ("Job 1 - Loading Data") for 3 seconds (there is a delay in the DAO function), and then the fetched data is displayed. In the logs we have:
D/MainActivity: Starting singular fetch
D/MainActivity: [main] Starting job 1
D/[main] Job 1: Getting user
D/UserDao: [arch_disk_io_2] getUserWithTransaction - start delay
D/UserDao: [arch_disk_io_2] getUserWithTransaction - after delay
D/[main] Job 1: Got user: User(uid=123, firstName=Adam)
If we then press the second button, the TextView changes to "Job 3 - Loading data" (it's 3 because we called the function twice) and then nothing changes. No data is displayed (because fetching failed) and no error message is shown. In the logs we see:
D/MainActivity: Starting double fetch
D/MainActivity: [main] Starting job 2
D/[main] Job 2: Getting user
D/MainActivity: [main] Starting job 3
D/[main] Job 3: Getting user
E/[main] Job 2: Getting user cancelled
kotlinx.coroutines.JobCancellationException: StandaloneCoroutine was cancelled; job=StandaloneCoroutine{Cancelling}@86c7792
From now on, every call to this function freezes. For example, the next click on "Fetch user once" changes the TextView message to "Job 4 - Loading data..." but nothing happens after that. Again, in the logs:
D/MainActivity: Starting singular fetch
E/[main] Job 3: Getting user cancelled
kotlinx.coroutines.JobCancellationException: StandaloneCoroutine was cancelled; job=StandaloneCoroutine{Cancelling}@5cdea19
D/MainActivity: [main] Starting job 4
D/[main] Job 4: Getting user
As you can see, in the second and third cases no logs from UserDao are displayed. This means that the UserDao function wasn't even called.
Right now, the only way to recover is to restart the application.