Platforms with two clusters can't enter deep suspend. [194314791]

Assigned

Bug

Status Update

No update yet.

Description

zh...@nxp.com

created issue #1

Jul 22, 2021 01:32AM

Platforms with two clusters can't enter deep suspend.

If the CPU has one cluster, cpu0 stays online when the system enters suspend, so the
system does not call cpufreq_cooling_unregister when the last CPU goes offline.

If the CPU has two clusters, cpu0 is in the first cluster. When the last CPU of the
second cluster (cpu5 on imx8qm) goes offline during suspend, the system calls
cpufreq_cooling_unregister. policy_is_inactive is false for cpu1-4 but true for cpu5.
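
The difference between the two layouts can be illustrated with a toy Python model. This is a sketch, not kernel code: the function name and the cluster layout (cpu0-3 in one policy, cpu4-5 in another, chosen to match the imx8qm description above) are assumptions for illustration. It only captures the idea that a policy becomes inactive once none of its CPUs remain online, while the boot CPU stays online.

```python
# Toy model of the CPU-offline step during suspend. Not kernel code.
# Assumption: cpu0-3 share one cpufreq policy and cpu4-5 share another.

def cooling_unregister_triggers(clusters, boot_cpu=0):
    """Return the CPUs whose offlining leaves their policy with no online CPU.

    Mirrors the idea behind policy_is_inactive(): the policy is inactive
    once its set of online CPUs is empty. The boot CPU is never offlined.
    """
    online = {name: set(cpus) for name, cpus in clusters.items()}
    triggers = []
    all_cpus = sorted(c for cpus in clusters.values() for c in cpus)
    for cpu in all_cpus:
        if cpu == boot_cpu:
            continue  # only non-boot CPUs are taken offline
        for cpus in online.values():
            if cpu in cpus:
                cpus.discard(cpu)
                if not cpus:
                    # Policy is now inactive: this is where the cooling
                    # device would be unregistered in the report above.
                    triggers.append(cpu)
    return triggers


single = cooling_unregister_triggers({"cluster0": [0, 1, 2, 3]})
dual = cooling_unregister_triggers({"cluster0": [0, 1, 2, 3], "cluster1": [4, 5]})
print(single)  # []
print(dual)    # [5]
```

With one cluster, cpu0 keeps the only policy active, so nothing is unregistered. With two clusters, offlining cpu5 empties the second policy.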

If the system calls thermal_cooling_device_unregister during suspend, it activates
the wakeup source "NETLINK" through the call chain below, which prevents the system
from entering suspend:
thermal_cooling_device_unregister -> device_del -> kobject_uevent_env -> netlink_broadcast_filtered
sock_def_readable -> __pm_stay_awake -> wakeup_source_report_event -> wakeup_count++

So on a CPU with two clusters, NETLINK blocks the system from entering suspend.
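
A minimal Python sketch of why an active wakeup source aborts the attempt is given here. It is a model under stated assumptions, not the kernel implementation: the WakeupSource class, try_suspend, and make_unregister_step are hypothetical names, and the only behavior modeled is "a suspend step fires an event, the listener's wakeup source becomes active, and the attempt is abandoned".

```python
# Toy model of a wakeup source aborting suspend. Not kernel code.

class WakeupSource:
    """Hypothetical stand-in for a kernel wakeup source."""

    def __init__(self, name):
        self.name = name
        self.active = False
        self.wakeup_count = 0

    def stay_awake(self):
        # Models the effect of __pm_stay_awake(): mark active, count event.
        self.active = True
        self.wakeup_count += 1


def try_suspend(sources, steps):
    """Run each suspend step; abort as soon as any wakeup source is active."""
    for step in steps:
        step()
        if any(ws.active for ws in sources):
            return "aborted"
    return "suspended"


def make_unregister_step(listener_ws):
    """Suspend step that emits a uevent, waking the listener if it has a source."""
    def step():
        if listener_ws is not None:
            listener_ws.stay_awake()
    return step


netlink_ws = WakeupSource("NETLINK")
with_ws = try_suspend([netlink_ws], [make_unregister_step(netlink_ws)])
without_ws = try_suspend([], [make_unregister_step(None)])
print(with_ws, netlink_ws.wakeup_count)  # aborted 1
print(without_ws)                        # suspended
```

The second call corresponds to the case where no wakeup source exists for the listener, so the same uevent does not stop the suspend.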

When I remove BLOCK_SUSPEND from the capabilities in android.hardware.health@2.1-service.rc,
the kernel does not create a wakeup source for NETLINK. When the NETLINK mechanism is triggered,
it no longer blocks the system from entering suspend.

This problem was found on Android 12; the Android 11 code is the same as my changed version.

Comments

el...@google.com <el...@google.com> Jul 22, 2021 01:38AM

Assigned to el...@google.com.

el...@google.com <el...@google.com> #2 Jul 22, 2021 09:02PM

Information redacted by Android Beta Feedback.

qu...@quicinc.com <qu...@quicinc.com> #3 Jul 23, 2021 09:09PM

Please restart the device and check for this issue on the latest software build. If the issue is still reproducible, please share a bug report (captured immediately after reproducing the issue) and a screen recording of the issue.

Android bug report (to be captured after reproducing the issue): For steps to capture a bug report, please refer to https://developer.android.com/studio/debug/bug-report#bugreportdevice

Alternate method: Navigate to “Developer options”, ensure “USB debugging” is enabled, then enable “Bug report shortcut”. Capture the bug report by holding the power button and selecting the “Take bug report” option.

Screen record of the issue: Please capture a screen recording or video of the issue using the following command:
adb shell screenrecord /sdcard/video.mp4
Then use the following command to pull the recorded file:
adb pull /sdcard/video.mp4
Attach the file to this issue.

el...@google.com <el...@google.com> #4 Jul 26, 2021 10:17PM

Reassigned to zh...@nxp.com.

- Build Number: google/cheetah/cheetah:13/T2B2.221216.006/9443277:user/release-keys
(Note: It is the build when sending this report. For exact build reference, please see the attached bugreport.)

This bug persists with the latest build

Debugging information
Google Play services
com.google.android.gms
Version 225014044 (22.50.14 (190400-499278674))
System App (Updated)

Android System WebView
com.google.android.webview
Version 541408544 (109.0.5414.85)
System App (Updated)

Network operator: Fido
SIM operator: Fido

Filed by Android Beta Feedback. Version (Updated): 2.31-betterbug.external_20221207_RC04
To learn more about our feedback process, please visit

https://developer.android.com/preview/feedback#feedback-app.

qu...@quicinc.com <qu...@quicinc.com> #5 Jul 29, 2021 06:10PM

Information redacted by Android Beta Feedback.

zh...@nxp.com <zh...@nxp.com> #6 Jul 30, 2021 01:21AM

Please restart the device and check whether the issue persists.

qu...@quicinc.com <qu...@quicinc.com> #7 Aug 5, 2021 03:54AM

Reboot doesn't help.

zh...@nxp.com <zh...@nxp.com> #8 Aug 6, 2021 07:06AM

The issue has been fixed, and the fix will be available in a future build. Please refer to the release notes: https://developer.android.com/about/versions/13/release-notes.

qu...@quicinc.com <qu...@quicinc.com> #9 Sep 4, 2021 03:30AM

Hi Zhipeng,

For the following comment in #6,

> Could you add print or dump_stack(); to confirm whether your code has reached the above code?

Where do you want to add this? I think the problem is fairly well understood, and I don't think there is a need for this.

> Judging from the linux suspend mechanism and my results, the process you describe will abort the suspend.

Right. From what I see, it's a loop. Please see my explanation below based on my understanding.

This is the case I looked into that led to adding CAP_BLOCK_SUSPEND to the health HAL service.

1.a. power_supply_changed() -> power_supply_changed_work() -> kobject_uevent() ---> sends uevents to userspace

1.b. The health HAL's healthloop [1] has to process the power supply uevents from the battery/USB power supply in order to take action. E.g., when the charger adapter is unplugged in off-mode charging (also known as charger mode), it should process that event and initiate a shutdown in 10 seconds. Without the capability to block suspend, the health HAL does not handle these uevents, and so the Android battery manager service does not take the action.

Now comes the cpufreq_offline() part.

2.a. When system suspend is initiated, cpufreq_offline() can be called based on the cpufreq governor policy (I'm not an expert here, so this is just my understanding).

2.b. cpufreq_offline() -> cpufreq_cooling_unregister() -> thermal_cooling_device_unregister() -> device_del() -> kobject_uevent() -> sends uevents to userspace

For this case, there is no need for the health HAL to process any uevents, as they are not from the power supply subsystem. Hence, there is no need for the Android battery manager service to take any action either. When the health HAL service doesn't have CAP_BLOCK_SUSPEND (i.e. before my fix), this path worked fine without any issues. However, for the previous use case (1.a, 1.b), we definitely need the health HAL to block suspend so that it can process all the uevents from the power supply subsystem. Of course, it might receive uevents from other subsystems, but those would be discarded.

Here is the worst case I can see.

3.a System suspend is initiated by userspace
3.b cpufreq_offline() -> cpufreq_cooling_unregister() -> thermal_cooling_device_unregister() -> device_del() -> kobject_uevent() -> sends uevents to userspace
3.c The health HAL blocks suspend until it has processed all the uevents, even those that are not from the power supply subsystem. Of course, any uevents from outside the power supply subsystem are going to be discarded. Still, it has to receive all of them and process them if needed.
3.d Go to (3.a)

Removing CAP_BLOCK_SUSPEND from the health HAL service just so that cpufreq_offline() can function as before doesn't seem correct to me. Rather, the right solution would be to work out how cpufreq_offline(), when invoked in the suspend path, can avoid sending uevents to userspace, as that would block suspend forever.
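
The loop in 3.a to 3.d, and the two ways out of it discussed in this thread, can be sketched as a small Python model. This is only an illustration under assumptions: attempts_until_suspend and its parameters are hypothetical names, retries are capped so the model terminates, and real suspend behavior depends on timing that is not modeled here.

```python
# Toy model of the suspend retry loop (steps 3.a to 3.d). Not kernel code.

def attempts_until_suspend(hal_blocks_suspend, uevent_on_offline, max_attempts=5):
    """Return the attempt number on which suspend succeeds, or None if it never does."""
    for attempt in range(1, max_attempts + 1):  # 3.a: userspace initiates suspend
        # 3.b: offlining the last CPU of a cluster may emit a uevent.
        uevent_sent = uevent_on_offline
        # 3.c: a listener that can block suspend holds the system awake
        # while it receives the event, which aborts this attempt.
        if uevent_sent and hal_blocks_suspend:
            continue  # 3.d: go back to 3.a
        return attempt
    return None


# Current situation: health HAL can block suspend and the uevent is sent.
stuck = attempts_until_suspend(hal_blocks_suspend=True, uevent_on_offline=True)
# Reporter's workaround: drop the capability from the health HAL.
no_cap = attempts_until_suspend(hal_blocks_suspend=False, uevent_on_offline=True)
# Direction suggested above: no uevent from the suspend path.
no_uevent = attempts_until_suspend(hal_blocks_suspend=True, uevent_on_offline=False)
print(stuck, no_cap, no_uevent)  # None 1 1
```

Both alternatives break the loop in this model, but only the last one keeps the health HAL able to block suspend for power supply uevents (case 1.a, 1.b).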

[1]

https://android.googlesource.com/platform/hardware/interfaces/+/refs/heads/master/health/utils/libhealthloop/HealthLoop.cpp#146

Cheers,
Subbaraman

zh...@nxp.com <zh...@nxp.com> #10 Sep 6, 2021 03:04AM

Hi Subbaraman,

First of all, thank you for your reply.
I understand that this change is useful to you. The situation I encountered is that, because of thermal_cooling_device_unregister, a platform with two clusters can't enter deep suspend, which you also agree with.

Regarding this comment in #9:

> Rather a right solution should be to think on how cpufreq_offline() if invoked in the suspend path shouldn't be sending uevents again to the userspace as that would block the suspend forever.

I think that is one way to solve the problem.

But I have a question about your previous reply. In #3, you said:

> I don't know how this multi cluster CPU case is causing such a problem. My original change still looks good and we're in fact using it across different targets with multiple CPU clusters so far.

I want to confirm whether you encounter the same problem as me when entering suspend on a platform with two clusters. If you do, and adding CAP_BLOCK_SUSPEND to the health HAL service is very useful to you, we can follow the method you described or find a better way to solve this problem.

BRS
Zhipeng

bb...@gmail.com <bb...@gmail.com> #11 Nov 3, 2021 02:09PM

Sometimes there is a weakness.

wi...@gmail.com <wi...@gmail.com> #12 Dec 31, 2021 02:02AM

i will download all this message and will take them to the police

da...@linaro.org <da...@linaro.org> #13 Feb 22, 2024 12:02PM

https://bugzilla.kernel.org/show_bug.cgi?id=218521