Fixed
Status Update
Comments
ra...@google.com <ra...@google.com> #3
Thank you for the quick fix.
Is there an ETA for the next release?
pe...@gmail.com <pe...@gmail.com> #5
Project: platform/frameworks/support
Branch: androidx-master-dev
commit a1957df3709a06f4e6482fb0e4d39ded4f230a70
Author: Rahul Ravikumar <rahulrav@google.com>
Date: Mon Jul 29 09:48:05 2019
Workaround NPE in PersistableBundle.getExtras().
Test: Existing unit tests pass. Ran integration test app.
Fixes: b/138441699
Change-Id: I0b48e0009a7d83c343a3d26112b94c057470c281
M work/workmanager/src/main/java/androidx/work/impl/background/systemjob/SystemJobService.java
https://android-review.googlesource.com/1092870
https://goto.google.com/android-sha1/a1957df3709a06f4e6482fb0e4d39ded4f230a70
ra...@google.com <ra...@google.com> #6
pe...@gmail.com <pe...@gmail.com> #7
So, if I've understood correctly: because the worker crashed, in order to get it into a RUNNING state I need to submit new workers into the chain, right?
This would explain that on the first crash only after I submit the new workers everything starts again to correctly be processed.
But, if I force a second crash, then after reopening the app the chain is no longer processed at all, not even submitting new workers into the chain.
pe...@gmail.com <pe...@gmail.com> #8
OK, I misunderstood your final explanation. I will run a few more tests, waiting for the back-off period.
ra...@google.com <ra...@google.com> #9
No, you don't have to submit new Workers. The Worker will go from an ENQUEUED state to a RUNNING state eventually (after back-off). I just tested it using the same test app you attached before.
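For reference, a worker's back-off behavior can be configured when building its request. A minimal sketch follows; the `WorkerTest` class name is taken from the attached test app, while the `BackoffExample` wrapper and the 10-second initial delay are illustrative choices:

```java
import java.util.concurrent.TimeUnit;

import androidx.work.BackoffPolicy;
import androidx.work.OneTimeWorkRequest;
import androidx.work.WorkManager;

public final class BackoffExample {
    private BackoffExample() {}

    public static void enqueueWithBackoff() {
        OneTimeWorkRequest request =
                new OneTimeWorkRequest.Builder(WorkerTest.class)
                        // Retry failed or rescheduled runs with exponential
                        // back-off, starting at a 10-second delay.
                        .setBackoffCriteria(
                                BackoffPolicy.EXPONENTIAL, 10, TimeUnit.SECONDS)
                        .build();
        WorkManager.getInstance().enqueue(request);
    }
}
```

With `BackoffPolicy.EXPONENTIAL`, each retry roughly doubles the delay, which is why a rescheduled worker can sit in ENQUEUED for a while before transitioning to RUNNING.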
ra...@google.com <ra...@google.com> #10
If you turn on logging, you should see something like this:
02-21 14:47:15.264 8191 8207 D WM-ForceStopRunnable: Unfinished Workers exist, rescheduling.
02-21 14:47:15.414 8191 8191 D WM-SystemJobService: onStartJob for d7ce98ce-3d3d-401e-8812-a49430b26fc1
02-21 14:47:15.417 8191 8207 D WM-Processor: Processor: processing d7ce98ce-3d3d-401e-8812-a49430b26fc1
02-21 14:47:15.418 8191 8207 D WM-WorkerWrapper: Delaying execution for com.perracolabs.workmanagerbug.WorkerTest because it is being executed before schedule. [ Delay ]
02-21 14:47:15.583 8191 8191 D WM-Processor: Processor d7ce98ce-3d3d-401e-8812-a49430b26fc1 executed; reschedule = true
02-21 14:47:15.584 8191 8191 D WM-SystemJobService: d7ce98ce-3d3d-401e-8812-a49430b26fc1 executed on JobScheduler
02-21 14:47:25.647 8191 8191 D WM-SystemJobService: onStartJob for d7ce98ce-3d3d-401e-8812-a49430b26fc1
02-21 14:47:25.648 8191 8207 D WM-Processor: Processor: processing d7ce98ce-3d3d-401e-8812-a49430b26fc1
02-21 14:47:25.665 8191 8191 D WM-WorkerWrapper: Starting work for com.perracolabs.workmanagerbug.WorkerTest
02-21 14:47:25.671 8191 8207 D WM-WorkerWrapper: com.perracolabs.workmanagerbug.WorkerTest returned a Success {mOutputData=androidx.work.Data@0} result.
02-21 14:47:25.672 8191 8207 I WM-WorkerWrapper: Worker result SUCCESS for Work [ id=d7ce98ce-3d3d-401e-8812-a49430b26fc1, tags={ com.perracolabs.workmanagerbug.WorkerTest } ]
02-21 14:47:25.673 8191 8207 D WM-PackageManagerHelper: androidx.work.impl.background.systemalarm.RescheduleReceiver disabled
02-21 14:47:25.676 8191 8191 D WM-Processor: Processor d7ce98ce-3d3d-401e-8812-a49430b26fc1 executed; reschedule = false
02-21 14:47:25.676 8191 8191 D WM-SystemJobService: d7ce98ce-3d3d-401e-8812-a49430b26fc1 executed on JobScheduler
02-21 14:47:25.677 8191 8207 D WM-GreedyScheduler: Cancelling work ID d7ce98ce-3d3d-401e-8812-a49430b26fc1
02-21 14:47:25.679 8191 8207 D WM-Processor: Processor stopping d7ce98ce-3d3d-401e-8812-a49430b26fc1
02-21 14:47:25.679 8191 8207 D WM-Processor: WorkerWrapper could not be found for d7ce98ce-3d3d-401e-8812-a49430b26fc1
02-21 14:47:25.679 8191 8207 D WM-StopWorkRunnable: StopWorkRunnable for d7ce98ce-3d3d-401e-8812-a49430b26fc1; Processor.stopWork = false
pe...@gmail.com <pe...@gmail.com> #11
I'm still able to reproduce it, but only randomly. Most of the time it seems to honor the back-off criteria, but other times the workers get permanently stuck, with several in a BLOCKED state and only one in an ENQUEUED state.
When stuck, if I kill the app via the recent apps list, then after reopening it everything gets processed correctly.
I thought that, since my test sends several Workers, the back-off delays might be accumulating, so the total delay could grow to several minutes. But uninstalling and re-installing the app to start a clean test shows that even after several minutes they don't get processed.
I've also noticed that the issue is more likely to reproduce if, when the app crashes, I choose "Close app" instead of "Open app again", immediately reopen the app through the launcher, and send new workers before the currently stuck ones get processed.
I will update when I find a reliable pattern, unless I can prove to myself that I'm doing something wrong.
ra...@google.com <ra...@google.com> #12
You should enable logging in your app, and that will tell you a lot more about what might be going on. Look at how to initialize WorkManager with a custom Configuration.
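A minimal sketch of manual initialization with debug logging enabled; the `MyApplication` class name is illustrative, and this approach also assumes the default `WorkManagerInitializer` content provider has been removed from the merged manifest:

```java
import android.app.Application;
import android.util.Log;

import androidx.work.Configuration;
import androidx.work.WorkManager;

public class MyApplication extends Application {
    @Override
    public void onCreate() {
        super.onCreate();
        // Initialize WorkManager with a custom Configuration so that
        // the WM-* log tags shown above appear in logcat.
        WorkManager.initialize(
                this,
                new Configuration.Builder()
                        .setMinimumLoggingLevel(Log.DEBUG)
                        .build());
    }
}
```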
When I test this, the Workers always eventually transition to RUNNING.
Send us the logs, if you still find something odd.
pe...@gmail.com <pe...@gmail.com> #13
Please find attached a log file.
As you will see in the log, the first time the chain gets processed correctly; then, on the second crash, the workers got stuck. I waited for several minutes, yet they didn't get processed.
The last line of the log (shown next) may be related to the problem.
2019-02-22 01:36:39.069 766-766/? W/JobServiceContext: Time-out while trying to bind 82dcb4 #u0a29/979 com.perracolabs.workmanagerbug/androidx.work.impl.background.systemjob.SystemJobService, dropping.
su...@google.com <su...@google.com> #14
Where are you initializing your WorkManager?
pe...@gmail.com <pe...@gmail.com> #15
For this test, WorkManager.getInstance() is first called in the activity's onCreate() method, in order to attach the LiveData observers, so I assume that is when it gets initialized. WorkManager.getInstance() is also called every time the test execution button is pressed, in its onClick method.
ra...@google.com <ra...@google.com> #16
I tried to reproduce this problem again, and it seems that I can get into this state easily on Android P (relative to other versions of the platform).
After looking into this further, here is what is going on: after the app process crashes more than a few times, ActivityManager designates the app as "bad". JobScheduler then tries to bind to WorkManager's JobService via ActivityManager to start a job, and times out, because ActivityManager considers the app "bad" and will not allow binds to the JobService.
At this point JobScheduler drops the job (a bug), and hence the Worker remains ENQUEUED.
At some point in the future (after a subsequent process restart / more Workers being enqueued) this Worker will eventually execute, because WorkManager creates a new job when it discovers that a Worker does not have a backing job.
I am going to file a bug against JobScheduler to better handle this edge case. However, this is not something WorkManager can work around. Also, your work is not blocked indefinitely; it will eventually transition to a RUNNING state.
pe...@gmail.com <pe...@gmail.com> #17
This makes sense. I guess this won't be a problem for a production app, since an app that crashes constantly isn't a realistic scenario, and it's logical that forcing the crash in a test project can eventually reproduce the issue under unrealistic conditions.
That said, without forcing the crash scenario, the initial issue I reported seems to be resolved under normal use. Still, maybe some short notes could be added to the documentation stating that this can happen during development: I discovered the initial issue while debugging inside a Worker.doWork() method, killed the app process to restart debugging, and found that the chain wasn't getting processed.
Thanks for the update.
Said that, without forcing the crash scenario the initial issue I reported seems to be resolved following a normal use-case. Though maybe some minimal notes could be added to the documentation to state that such possibility may happen while developing, as I discovered the initial issue while debugging inside a Worker.doWork method, killed the app process to re-start debugging again and found that the chain wasn't getting processed.
Thanks for the update.
jo...@truecaller.com <jo...@truecaller.com> #18
Hi, I would like to add some more details about the crash we started to experience after we updated to WorkManager 1.0.0. I think this crash is related to the problem described in this thread, especially towards the end of it. We verified that it had a direct impact on our users, so we shipped another version where we had to downgrade WorkManager to 1.0.0-rc01.
Our app has many users (120M+ daily active users) running on different OS versions. After we released with WorkManager 1.0.0 we got a number of users (not many overall) generating a very high number of crashes.
Some stats from one of the crash groups caused by this bug: 7,251 users, but 2,770,397 reports in total.
Fatal Exception: java.lang.IllegalStateException: Apps may not schedule more than 100 distinct jobs
at android.os.Parcel.createException(Parcel.java:1958)
at android.os.Parcel.readException(Parcel.java:1918)
at android.os.Parcel.readException(Parcel.java:1868)
at android.app.job.IJobScheduler$Stub$Proxy.schedule(IJobScheduler.java:184)
at android.app.JobSchedulerImpl.schedule(JobSchedulerImpl.java:44)
at androidx.work.impl.background.systemjob.SystemJobScheduler.scheduleInternal(SystemJobScheduler.java:161)
at androidx.work.impl.background.systemjob.SystemJobScheduler.schedule(SystemJobScheduler.java:128)
at androidx.work.impl.Schedulers.schedule(Schedulers.java:91)
at androidx.work.impl.utils.ForceStopRunnable.run(ForceStopRunnable.java:100)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.lang.Thread.run(Thread.java:764)
Caused by android.os.RemoteException: Remote stack trace:
at com.android.server.job.JobSchedulerService.scheduleAsPackage(JobSchedulerService.java:871)
at com.android.server.job.JobSchedulerService$JobSchedulerStub.schedule(JobSchedulerService.java:2554)
at android.app.job.IJobScheduler$Stub.onTransact(IJobScheduler.java:60)
at android.os.Binder.execTransact(Binder.java:731)
We had to halt the ongoing release, downgrade WorkManager, and release a new version. We verified that the crashes went down. Please prioritize and fix. If needed, I can provide additional information: links to crash stacks, etc.
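When hitting the "100 distinct jobs" limit, it can help to check how many jobs the app actually has scheduled. A diagnostic sketch, assuming it runs on API 21+ where JobScheduler is available (the `JobDiagnostics` class and log tag are illustrative):

```java
import android.app.job.JobInfo;
import android.app.job.JobScheduler;
import android.content.Context;
import android.util.Log;

public final class JobDiagnostics {
    private JobDiagnostics() {}

    // Log how many jobs are currently scheduled, to see how close
    // the app is to the 100-job platform limit, and which services
    // own them (WorkManager's jobs target SystemJobService).
    public static void logPendingJobs(Context context) {
        JobScheduler scheduler = (JobScheduler)
                context.getSystemService(Context.JOB_SCHEDULER_SERVICE);
        if (scheduler == null) {
            return;
        }
        Log.d("JobDiagnostics",
                "Pending jobs: " + scheduler.getAllPendingJobs().size() + " / 100");
        for (JobInfo job : scheduler.getAllPendingJobs()) {
            Log.d("JobDiagnostics", "Job id=" + job.getId()
                    + " service=" + job.getService().flattenToShortString());
        }
    }
}
```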
su...@google.com <su...@google.com> #19
#18: Please file a new bug with the appropriate details. Please also try 1.0.1-rc01, which includes the fix for this issue, before filing.
ra...@google.com <ra...@google.com> #20
WorkManager's schedulers won't schedule the same job if JobScheduler is already aware of the job. This is because we know the jobId corresponding to a given WorkRequest id.
So the fact that more jobs are being scheduled is very interesting (because they should not be).
Can you please provide us with some more information?
* How are you initializing WorkManager?
* Are you creating any WorkRequests on app startup? If so, what kind of work requests are they?
* Are you scheduling work using JobScheduler on your own, apart from WorkManager?
* How many users (total), out of what total population, were affected? (Just to get an idea of the percentage of users affected.) What devices were they on? What versions of Android?
* Can you send us a sample app that reproduces the problem?
sa...@gmail.com <sa...@gmail.com> #21
I need some work.
Description
Version used: 1.0.0-rc01
Devices/Android versions reproduced on: Nexus 6, API 25 (7.1.1)
Description:
If an app crashes while a Worker is being executed, then newly submitted Workers aren't executed, even after the app is restarted. WorkManager will not resume any further work execution at any time, not even after waiting for several minutes.
So far, this has been seen only when sending OneTimeWorkRequest works with enqueueUniqueWork or beginUniqueWork.
The issue started to appear in version 1.0.0-rc01
Find attached a test project. Once run, the app shows the steps to reproduce the issue.
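For context, a minimal sketch of the kind of unique chain the description refers to; the worker body, the `enqueueChain` helper, and the unique name "test-chain" are illustrative (the actual repro is the attached test project):

```java
import android.content.Context;

import androidx.annotation.NonNull;
import androidx.work.ExistingWorkPolicy;
import androidx.work.OneTimeWorkRequest;
import androidx.work.WorkManager;
import androidx.work.Worker;
import androidx.work.WorkerParameters;

public class WorkerTest extends Worker {
    public WorkerTest(@NonNull Context context, @NonNull WorkerParameters params) {
        super(context, params);
    }

    @NonNull
    @Override
    public Result doWork() {
        // Real work would go here; crashing the process at this point
        // is what triggers the reported behavior.
        return Result.success();
    }

    // Enqueue a two-worker unique chain via beginUniqueWork.
    public static void enqueueChain() {
        WorkManager.getInstance()
                .beginUniqueWork(
                        "test-chain",
                        ExistingWorkPolicy.APPEND,
                        OneTimeWorkRequest.from(WorkerTest.class))
                .then(OneTimeWorkRequest.from(WorkerTest.class))
                .enqueue();
    }
}
```

With `ExistingWorkPolicy.APPEND`, newly submitted requests are chained after the existing unique work, which matches the reporter's description of submitting new workers into a stuck chain.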