Comments
el...@google.com <el...@google.com>
el...@google.com <el...@google.com> #2
qu...@quicinc.com <qu...@quicinc.com> #3
Please restart the device and check for this issue on the latest software build. If the issue is still reproducible, please share a bug report (captured immediately after reproducing the issue) and a screen recording of the issue.
Android bug report (to be captured after reproducing the issue)
For steps to capture a bug report, please refer:
Alternate method: Navigate to “Developer options”, ensure “USB debugging” is enabled, then enable “Bug report shortcut”. Capture a bug report by holding the power button and selecting the “Take bug report” option.
Screen Record of the Issue
Please capture a screen recording or video of the issue using the following steps:
adb shell screenrecord /sdcard/video.mp4
Then use the following command to pull the recorded file:
adb pull /sdcard/video.mp4
Attach the file to this issue.
el...@google.com <el...@google.com> #4
(Note: It is the build when sending this report. For exact build reference, please see the attached bugreport.)
This bug persists with the latest build
Debugging information
Google Play services
com.google.android.gms
Version 225014044 (22.50.14 (190400-499278674))
System App (Updated)
Android System WebView
com.google.android.webview
Version 541408544 (109.0.5414.85)
System App (Updated)
Network operator: Fido
SIM operator: Fido
Filed by Android Beta Feedback. Version (Updated): 2.31-betterbug.external_20221207_RC04
To learn more about our feedback process, please visit
qu...@quicinc.com <qu...@quicinc.com> #5
zh...@nxp.com <zh...@nxp.com> #6
Please restart the device and check whether the issue persists.
qu...@quicinc.com <qu...@quicinc.com> #7
zh...@nxp.com <zh...@nxp.com> #8
The issue has been fixed and the fix will be available in a future build. Please refer to the release notes link
qu...@quicinc.com <qu...@quicinc.com> #9
For the following comment in #6,
> Could you add print or dump_stack(); to confirm whether your code has reached the above code?
Where do you want to add this? I think the problem is fairly well understood, and I don't think there is a need for this.
> Judging from the linux suspend mechanism and my results, the process you describe will abort the suspend.
Right. From what I see, it's a loop. Please see my explanation below based on my understanding.
This one is the case I've looked into that led to adding CAP_BLOCK_SUSPEND for health HAL service.
1.a. power_supply_changed() -> power_supply_changed_work() -> kobject_uevent() ---> sends uevents to userspace
1.b. Health HAL's healthloop [1] has to process the power supply uevents from the battery/USB power supply in order to take action. E.g. when the charger adapter is unplugged in offmode charging (also known as charger mode), it should process that event and initiate a shutdown in 10 seconds. Without the capability to block suspend, health HAL does not handle these uevents, and as a result the android battery manager service does not take the action.
Now comes the cpufreq_offline() part.
2.a. When system suspend is initiated, cpufreq_offline() can be called based on the cpufreq governor policy (I'm not an expert to provide comments here but this is just my understanding).
2.b. cpufreq_offline() -> cpufreq_cooling_unregister() -> thermal_cooling_device_unregister() -> device_del() -> kobject_uevent() -> sends uevents to userspace
For this case, there is no need for health HAL to process any uevents as it is not from power supply subsystem. Hence, there is no need for android battery manager service to take any action either. When health HAL service doesn't have CAP_BLOCK_SUSPEND (i.e. before my fix), this path was working fine without any issues. However, for the previous use case (1.a, 1.b), we definitely need health HAL to block suspend so that it can process all the uevents from power supply subsystem. Of course, it might receive other uevents from other subsystems but those would be discarded.
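To make the "discarded" part concrete, here is a minimal sketch of subsystem filtering on a received uevent. It is not the health HAL's actual code. It assumes the usual kernel uevent datagram layout (an `ACTION@DEVPATH` header followed by NUL-separated `KEY=VALUE` fields), and the two sample payloads and their device paths are made up for illustration.

```python
def parse_uevent(raw: bytes) -> dict:
    """Parse one uevent datagram into a dict of its KEY=VALUE fields."""
    fields = {}
    for part in raw.split(b"\0"):
        # The leading "ACTION@DEVPATH" header has no "=" and is skipped.
        if b"=" in part:
            key, _, value = part.partition(b"=")
            fields[key.decode(errors="replace")] = value.decode(errors="replace")
    return fields


def is_power_supply_event(raw: bytes) -> bool:
    """True only for uevents that come from the power supply subsystem."""
    return parse_uevent(raw).get("SUBSYSTEM") == "power_supply"


# Hypothetical payloads: a battery change (case 1.a) and a cooling device
# removal (case 2.b). The device paths are invented for this example.
battery_event = (
    b"change@/devices/platform/battery\0"
    b"ACTION=change\0DEVPATH=/devices/platform/battery\0"
    b"SUBSYSTEM=power_supply\0POWER_SUPPLY_NAME=battery\0"
)
cooling_event = (
    b"remove@/devices/virtual/thermal/cooling_device0\0"
    b"ACTION=remove\0DEVPATH=/devices/virtual/thermal/cooling_device0\0"
    b"SUBSYSTEM=thermal\0"
)

for event in (battery_event, cooling_event):
    print(parse_uevent(event)["SUBSYSTEM"], is_power_supply_event(event))
```

Note that the filter runs only after the datagram has been received, so a listener still has to wake up for every uevent, including the ones it then drops.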
Here is the worst case I can see.
3.a System suspend is initiated by userspace
3.b cpufreq_offline() -> cpufreq_cooling_unregister() -> thermal_cooling_device_unregister() -> device_del() -> kobject_uevent() -> sends uevents to userspace
3.c Health HAL blocks the suspend until it has processed all the uevents, even those that are not from the power supply subsystem. Of course, any uevents other than those from the power supply subsystem are going to be discarded. Still, it has to receive all of them and process them if needed.
3.d Go to (3.a)
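The loop in 3.a to 3.d can be written down as a toy model. This is only a sketch of the reasoning, not kernel or HAL code; the function name and both flags are hypothetical. It assumes that a suspend attempt is aborted whenever the suspend path emits a uevent while a listener is allowed to block suspend.

```python
def simulate_suspend(max_attempts, listener_blocks_suspend, uevent_on_suspend=True):
    """Return the attempt number on which suspend succeeds, or None if it never does."""
    for attempt in range(1, max_attempts + 1):
        # 3.a: userspace initiates suspend.
        # 3.b: the suspend path emits a uevent (e.g. a cooling device is unregistered).
        # 3.c: the listener holds off suspend while it receives that uevent.
        suspend_blocked = uevent_on_suspend and listener_blocks_suspend
        if not suspend_blocked:
            return attempt
        # 3.d: this attempt is aborted and the loop starts over.
    return None


print(simulate_suspend(5, listener_blocks_suspend=True))
print(simulate_suspend(5, listener_blocks_suspend=False))
print(simulate_suspend(5, listener_blocks_suspend=True, uevent_on_suspend=False))
```

In this model the loop can be broken from either side: by the listener not blocking suspend, or by the suspend path not emitting the uevent in the first place.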
Removing CAP_BLOCK_SUSPEND from health HAL service just so that cpufreq_offline() can function as before doesn't seem correct to me. Rather a right solution should be to think on how cpufreq_offline() if invoked in the suspend path shouldn't be sending uevents again to the userspace as that would block the suspend forever.
[1]
Cheers,
Subbaraman
zh...@nxp.com <zh...@nxp.com> #10
First of all thank you for your reply.
I understand that this change is useful to you. The situation I encountered is that, because of thermal_cooling_device_unregister, a platform with two clusters can't enter deep suspend, which you also agree with.
For the following comment in #9,
> Rather a right solution should be to think on how cpufreq_offline() if invoked in the suspend path shouldn't be sending uevents again to the userspace as that would block the suspend forever.
I think that is one possible way to solve the problem.
But I have a question about your previous reply. For the following comment in #3, you said
> I don't know how this multi cluster CPU case is causing such a problem. My original change still looks good and we're in fact using it across different targets with multiple CPU clusters so far.
I want to confirm whether you encounter the same problem as I do when entering suspend on a platform with two clusters. If you do, and adding CAP_BLOCK_SUSPEND for the health HAL service is very useful to you, we can follow the method you described or find a better way to solve this problem.
BRS
Zhipeng
bb...@gmail.com <bb...@gmail.com> #11
Sometimes there is a weakness.
Description
If the CPU has one cluster, cpu0 is online when entering suspend.
The system does not call cpufreq_cooling_unregister when the last CPU goes offline.
If the CPU has two clusters, cpu0 is in the first cluster, and when the last CPU of the
second cluster (cpu5 on imx8qm) enters suspend, it calls cpufreq_cooling_unregister.
policy_is_inactive is false for cpu1-4 but true for cpu5.
If the system calls thermal_cooling_device_unregister during suspend, it
activates the wakeup source "NETLINK" as shown below, which prevents the system from entering suspend.
thermal_cooling_device_unregister -> device_del -> kobject_uevent_env -> netlink_broadcast_filtered
sock_def_readable -> __pm_stay_awake -> wakeup_source_report_event -> wakeup_count++
So if the CPU has two clusters, NETLINK blocks the system from entering suspend.
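One way to check which wakeup sources are active after a failed suspend is to look at the kernel's wakeup source table. The sketch below parses such a table and lists the sources whose `active_since` value is non-zero. It assumes the whitespace-separated column layout of `/sys/kernel/debug/wakeup_sources` (debugfs must be mounted and readable, which is not the case on every build); the sample rows and all their numbers are invented for illustration and are not taken from a real device.

```python
def active_wakeup_sources(table: str) -> list:
    """Return the names of wakeup sources that are currently active."""
    lines = [line for line in table.splitlines() if line.strip()]
    header = lines[0].split()
    name_col = header.index("name")
    active_col = header.index("active_since")
    active = []
    for line in lines[1:]:
        cols = line.split()
        # Skip rows that do not match the header (e.g. names containing spaces).
        if len(cols) != len(header):
            continue
        # active_since is non-zero only while the source is held active.
        if int(cols[active_col]) > 0:
            active.append(cols[name_col])
    return active


# Invented sample data in the assumed column layout.
SAMPLE = """\
name active_count event_count wakeup_count expire_count active_since total_time max_time last_change prevent_suspend_time
NETLINK 12 12 3 0 57 840 120 123456 310
battery 4 4 0 0 0 35 20 120000 0
"""

print(active_wakeup_sources(SAMPLE))
```

On a device, the table text could be obtained with something like `adb shell cat /sys/kernel/debug/wakeup_sources` and passed to the function in place of `SAMPLE`.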
I removed BLOCK_SUSPEND from the capabilities in android.hardware.health@2.1-service.rc,
so the kernel does not create a wakeup source for NETLINK. When the NETLINK mechanism is triggered,
it no longer blocks the system from entering suspend.
This problem was found on Android 12; the Android 11 code is the same as what I changed it to.