Comments
ra...@google.com <ra...@google.com> #2
They won't grow unbounded. The current implementation only looks at the top 200 scheduled requests by priority. If you are seeing OOM errors, WorkManager is very unlikely to be the root cause; the app was likely already under memory pressure.
sh...@pinterest.com <sh...@pinterest.com> #3
Ah, I see. In my sample app I got mConstrainedWorkSpecs to grow to size 220 and then no larger. While I agree that something bad is likely happening in our app, we are seeing ArrayDeque.doubleCapacity fail with allocations in the 30-65MB range, which also seems strange, so I added breakpoints in our app and noticed these collections growing large.
se...@google.com <se...@google.com> #4
I've seen similar cases in internal apps where a queue in SerialExecutor grows very long and results in a failure in ArrayDeque.doubleCapacity, so I think there can be an issue in WM.
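(For readers unfamiliar with the mechanism: WorkManager's SerialExecutor drains submitted commands one at a time from an ArrayDeque, so a long backlog forces repeated doubling of the deque's backing array via ArrayDeque.doubleCapacity, and that single large allocation is what shows up in the OOM stacks. Below is a minimal sketch of the pattern, not WorkManager's actual class; the name SimpleSerialExecutor and the demo numbers are made up for illustration.)

```kotlin
import java.util.ArrayDeque
import java.util.concurrent.Executor
import java.util.concurrent.Executors

// Simplified stand-in for a serial executor: every submitted command is queued
// in an ArrayDeque and run one at a time on the delegate executor. When tasks
// are enqueued much faster than they drain, the deque's backing array keeps
// doubling, and that single huge allocation can fail under memory pressure.
class SimpleSerialExecutor(private val delegate: Executor) : Executor {
    private val tasks = ArrayDeque<Runnable>()
    private var active: Runnable? = null

    @Synchronized
    override fun execute(command: Runnable) {
        tasks.offer(Runnable {
            try {
                command.run()
            } finally {
                scheduleNext()
            }
        })
        if (active == null) scheduleNext()
    }

    @Synchronized
    private fun scheduleNext() {
        active = tasks.poll()
        active?.let { delegate.execute(it) }
    }

    @Synchronized
    fun pendingCount(): Int = tasks.size
}

fun main() {
    val pool = Executors.newSingleThreadExecutor { r -> Thread(r).apply { isDaemon = true } }
    val serial = SimpleSerialExecutor(pool)
    serial.execute(Runnable { Thread.sleep(1_000) }) // first task blocks the single lane
    repeat(10_000) { serial.execute(Runnable { }) }  // backlog piles up behind it
    println("pending tasks: ${serial.pendingCount()}") // roughly 10000
}
```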
sh...@pinterest.com <sh...@pinterest.com> #5
#4: yes, that's what we're seeing in our app. At some point the task list of the SerialExecutor gets so long that it fails doubling its capacity.
Is it possible that work could get scheduled, then rescheduled or reshuffled such that the list of eligible WorkSpecs returned from the DAO changes, and mConstrainedWorkSpecs just keeps growing? mConstrainedWorkSpecs never seems to get cleared if the jobs are not run, and schedule() just calls .addAll with all eligible WorkSpecs, so while I see how the number of jobs passed into GreedyScheduler.schedule is capped at 200, the internal collections technically are not.
sh...@pinterest.com <sh...@pinterest.com> #7
Sorry, I mean in the degenerate case where work never runs due to it being constrained (e.g. no network).
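(For anyone trying to reproduce that degenerate case, here is a minimal sketch of the setup described above, assuming the androidx.work KTX request builder; the worker class, function name, and count are placeholders rather than anything from the linked sample project: enqueue many network-constrained one-time requests while the device has no connectivity, so the work stays eligible but never runs.)

```kotlin
import android.content.Context
import androidx.work.Constraints
import androidx.work.NetworkType
import androidx.work.OneTimeWorkRequestBuilder
import androidx.work.WorkManager
import androidx.work.Worker
import androidx.work.WorkerParameters

// Trivial worker; in the degenerate case it never actually runs because the
// network constraint is never satisfied (e.g. airplane mode).
class NoOpWorker(context: Context, params: WorkerParameters) : Worker(context, params) {
    override fun doWork(): Result = Result.success()
}

// Enqueues many network-constrained one-time requests. While the constraint is
// unmet none of them run, and each scheduling pass hands the still-eligible
// WorkSpecs back to GreedyScheduler, which is the growth pattern described above.
fun enqueueManyConstrainedWorkers(context: Context, count: Int = 5000) {
    val constraints = Constraints.Builder()
        .setRequiredNetworkType(NetworkType.CONNECTED)
        .build()
    val workManager = WorkManager.getInstance(context)
    repeat(count) {
        val request = OneTimeWorkRequestBuilder<NoOpWorker>()
            .setConstraints(constraints)
            .build()
        workManager.enqueue(request)
    }
}
```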
se...@google.com <se...@google.com> #8
So far I've found one bug: because WorkSpec includes scheduleRequestedAt in its equals and hashCode, GreedyScheduler ends up with two copies of the same work, because on the first iteration it receives a WorkSpec with scheduleRequestedAt = -1 and on the second iteration it receives a WorkSpec with scheduleRequestedAt set to the time of the original schedule.
While I'll address this issue, I don't think it is enough to result in a big enough queue in SerialExecutor, so I need to look more.
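(A minimal sketch of that duplication, using a made-up FakeWorkSpec data class in place of the real WorkSpec: because the timestamp participates in equals/hashCode, the pass that delivers -1 and the pass that delivers the real schedule time produce two distinct entries in a hash-based collection.)

```kotlin
// Illustrative only: FakeWorkSpec stands in for the real WorkSpec. A Kotlin
// data class derives equals/hashCode from all constructor properties, so a
// field that changes between the two scheduling passes makes the "same" work
// look like two different entries.
data class FakeWorkSpec(val id: String, val scheduleRequestedAt: Long)

fun main() {
    val tracked = HashSet<FakeWorkSpec>()

    // First pass: the spec arrives with the sentinel value -1.
    tracked.add(FakeWorkSpec(id = "work-1", scheduleRequestedAt = -1L))

    // Second pass: the "same" work arrives with the real schedule timestamp.
    tracked.add(FakeWorkSpec(id = "work-1", scheduleRequestedAt = 1_660_000_000_000L))

    // Both instances survive because they are not equal, so the scheduler ends
    // up holding two copies of the same logical work.
    println(tracked.size) // prints 2
}
```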
se...@google.com <se...@google.com> #9
Spent more time with this:
I've added a button that schedules 5000 workers and checked the queue size in SerialExecutor; it grows up to 200k, which is a lot, but not anywhere close to the few millions seen in the reports.
se...@google.com <se...@google.com> #10
One interesting thing I noticed is that previously GreedyScheduler was fairly spammy with stop commands when constraints weren't met, even if the work hadn't actually been started. With some of the latest changes, it only sends stop commands for work that has actually been started.
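(A rough sketch of what that behavior change amounts to; the names below are hypothetical and this is not the actual patch: record which WorkSpec IDs were really started and only enqueue stop commands for those, so unmet constraints on never-started work no longer flood the SerialExecutor queue with stop commands.)

```kotlin
// Hypothetical sketch of the "only stop what was started" idea, not the actual
// patch: remember which WorkSpec IDs were really started and only issue stop
// commands for those, instead of for every constrained spec whose constraints
// flip while it has never run.
class StartStopTracker {
    private val startedIds = mutableSetOf<String>()

    fun onStarted(workSpecId: String) {
        startedIds += workSpecId
    }

    // Returns true only when a stop command should actually be enqueued.
    fun shouldIssueStop(workSpecId: String): Boolean = startedIds.remove(workSpecId)
}

fun main() {
    val tracker = StartStopTracker()
    tracker.onStarted("work-1")

    // Constraints change for both specs, but only the one that ran gets a stop.
    println(tracker.shouldIssueStop("work-1")) // true
    println(tracker.shouldIssueStop("work-2")) // false: never started, no stop spam
}
```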
sh...@pinterest.com <sh...@pinterest.com> #11
Has that change to only stop work that has been started been released yet? I did notice when testing that there seemed to be a lot of Stop commands in the queue.
sh...@pinterest.com <sh...@pinterest.com> #12
And in your test, are those 5000 workers constrained or not?
se...@google.com <se...@google.com> #13
> Has that change to only stop work that has been started been released yet?
Unfortunately it hasn't been released just yet.
> And in your test, are those 5000 workers constrained or not?
Yeah, they were constrained.
sh...@pinterest.com <sh...@pinterest.com> #14
Ok. Perhaps this issue will be helped whenever that stop work change gets released. I do think in the end that we're doing something wrong-ish in our app, but it also seems like there are opportunities to add some guardrails inside of WM.
em...@getkeepsafe.com <em...@getkeepsafe.com> #15
We are also experiencing this issue after updating WorkManager from `2.3.4` to `2.6.0`.
I wanted to ask if there is any update on releasing the fix, or whether there is a possible workaround for the issue.
The reason I ask is that downgrading to `2.3.4` is not a good long-term solution, since updating to target SDK 31+ will cause a different crash with WorkManager due to the changes in Android S around pending intent flags.
Thank you.
sh...@pinterest.com <sh...@pinterest.com> #16
#15 - This isn't a perfect fix as I do think the real fix is somewhere in the Google frameworks, but we did notice one job in our app that we were scheduling with enqueue when we could have used enqueueUniqueWork instead. Our A/B tests do show a small dip in OOMs, so you might check if you can make the same change in your app as well.
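(For others considering the same workaround, a minimal sketch of the switch from enqueue to unique work, assuming the androidx.work KTX request builder; the worker class and unique-work name are placeholders, not the actual Pinterest job.)

```kotlin
import android.content.Context
import androidx.work.ExistingWorkPolicy
import androidx.work.OneTimeWorkRequestBuilder
import androidx.work.WorkManager
import androidx.work.Worker
import androidx.work.WorkerParameters

// Placeholder worker; the class name and unique-work name below are illustrative.
class SyncWorker(context: Context, params: WorkerParameters) : Worker(context, params) {
    override fun doWork(): Result = Result.success()
}

// enqueueUniqueWork with KEEP means re-scheduling the same logical job does not
// pile up duplicate pending requests the way repeated enqueue() calls can.
fun scheduleSync(context: Context) {
    val request = OneTimeWorkRequestBuilder<SyncWorker>().build()
    WorkManager.getInstance(context).enqueueUniqueWork(
        "app-sync",              // unique name for this logical job
        ExistingWorkPolicy.KEEP, // keep the existing pending request, if any
        request
    )
}
```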
em...@getkeepsafe.com <em...@getkeepsafe.com> #17
se...@google.com <se...@google.com> #18
Unfortunately the fix that prevents the spam with stop commands can't be easily released separately. However, the next alpha version of WM should be available at the end of July, and it would be nice if anyone could try it out to see whether the OOM issue gets resolved.
em...@getkeepsafe.com <em...@getkeepsafe.com> #19
em...@getkeepsafe.com <em...@getkeepsafe.com> #20
I could not find this issue ID: 235259756 in any of the release notes.
Thank you.
ra...@google.com <ra...@google.com> #21
Yes, it was.
em...@getkeepsafe.com <em...@getkeepsafe.com> #22
Update:
I've confirmed that 2.8.0-alpha04 addresses the OOM issue, and it is no longer being reported in our app.
I filed a separate issue against 2.8.0-alpha04.
se...@google.com <se...@google.com> #23
Closing this bug; aosp/2052907 and aosp/2227317 have landed and been released in a beta. If the original cause was understood correctly, then the load on SerialExecutor.mTasks should have improved significantly. Please comment here or file a new issue if you still see the problem with 2.8.0-alpha04 or newer.
Description
Component used: WorkManager
Version used: v2.7.1
Devices/Android versions reproduced on: all

I have noticed that there are a couple of opportunities for collections in WorkManager to grow unbounded due to constrained workers. I don't know if this is by design or a bug.

See sample project here: https://github.com/shashachu/workmanageroom