Status Update
Comments
gb...@google.com <gb...@google.com> #2
The watchdog issue is caused by vold stuck in mounting fuse user bind bounts
I only have the logs on Xiaomi devices, and Xiaomi's dual app feature is built based on the AOSP's app clone feature, only difference is that the user id of the Xiaomi's dual app is fixed to 999
// create the app clone
03-05 16:11:01.422 989 989 I vold : onUserAdded: 999
03-05 16:11:02.028 989 989 I vold : Bind mounted /mnt/user/0/emulated/0 on /mnt/user/999/emulated/0
03-05 16:11:02.028 989 989 I vold : Configuring read_ahead of /mnt/user/999/emulated fuse filesystem to 256kb
// Delete app clone
03-05 16:11:11.230 989 995 I vold : Killing all processes referencing /storage/emulated/999
03-05 16:11:11.785 989 995 I vold : fuse-bpf is disabled because of property ro.fuse.bpf.is_running
03-05 16:11:11.785 989 995 I vold : Unmounting /mnt/user/999/emulated/0
03-05 16:11:11.786 989 995 I vold : Unmounting /mnt/user/999/emulated/999/Android/data
03-05 16:11:11.787 989 995 I vold : Unmounted /mnt/user/999/emulated/999/Android/data
03-05 16:11:11.787 989 995 I vold : Unmounting /mnt/user/999/emulated/999/Android/obb
03-05 16:11:11.787 989 995 I vold : Unmounted /mnt/user/999/emulated/999/Android/obb
03-05 16:11:11.787 989 995 I vold : Unmounting fuse path /mnt/user/999/emulated
03-05 16:11:11.788 989 995 I vold : Unmounting pass_through_path /mnt/pass_through/999/emulated
// Re-create the app clone
03-05 16:12:28.393 989 995 I vold : onUserAdded: 999
03-05 16:12:28.626 989 995 I vold : Mounting emulated fuse volume
03-05 16:12:28.638 989 995 I vold : Bind mounting /data/media to /mnt/pass_through/999/emulated
03-05 16:12:28.644 2604 3669 I StorageSessionController: Creating and starting session with id: emulated;999
// Becuase the old fuse session for the the previously deleted app clone is not destroyed correctly, new session would not be created
03-05 16:12:28.656 5845 6204 W ExternalStorageServiceImpl: Session already started with id: emulated;999
// Stuck forever !
03-05 16:12:28.657 989 995 I vold : Bind mounting //data/media/999/Android/data on /mnt/user/999/emulated/999/Android/data
"watchdog.monitor" prio=5 tid=16 Native
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x138c1540 self=0xb400007b61654330
| sysTid=2753 nice=0 cgrp=foreground sched=0/0 handle=0x798c6766b0
| state=S schedstat=( 14406145 8462918 49 ) utm=0 stm=0 core=1 HZ=100
| stack=0x798c573000-0x798c575000 stackSize=1037KB
| held mutexes=
native: #00 pc 000c304c /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+12) (BuildId: da81583da99c8268aa14df4694ef9086)
native: #01 pc 00069c7c /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+156) (BuildId: da81583da99c8268aa14df4694ef9086)
native: #02 pc 0004bec0 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver+244) (BuildId: 44b748826f51f8fcc91290b7d22a013e)
native: #03 pc 0004d23c /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse+136) (BuildId: 44b748826f51f8fcc91290b7d22a013e)
native: #04 pc 0004cf48 /system/lib64/libbinder.so (android::IPCThreadState::transact+188) (BuildId: 44b748826f51f8fcc91290b7d22a013e)
native: #05 pc 00061e48 /system/lib64/libbinder.so (android::BpBinder::transact+200) (BuildId: 44b748826f51f8fcc91290b7d22a013e)
native: #06 pc 001b2b10 /system/lib64/libandroid_runtime.so (android_os_BinderProxy_transact+156) (BuildId: 462355a3dfd128463d8eb4e28bd5323f)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(BinderProxy.java:684)
at android.os.IVold$Stub$Proxy.monitor(IVold.java:1666)
at com.android.server.StorageManagerService.monitor(StorageManagerService.java:5093)
at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:410)
at android.os.Handler.handleCallback(Handler.java:959)
at android.os.Handler.dispatchMessage(Handler.java:100)
at android.os.Looper.loopOnce(Looper.java:249)
at android.os.Looper.loop(Looper.java:337)
at android.os.HandlerThread.run(HandlerThread.java:85)
at com.android.server.ServiceThread.run(ServiceThread.java:46)
DumpLatencyMs: 23.7132
"StorageManagerService" prio=5 tid=123 Native
| group="main" sCount=1 ucsCount=0 flags=1 obj=0xcf012b8 self=0xb400007b61768960
| sysTid=3669 nice=0 cgrp=foreground sched=0/0 handle=0x79093c36b0
| state=S schedstat=( 50018397 84219946 273 ) utm=2 stm=2 core=5 HZ=100
| stack=0x79092c0000-0x79092c2000 stackSize=1037KB
| held mutexes=
native: #00 pc 000c304c /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+12) (BuildId: da81583da99c8268aa14df4694ef9086)
native: #01 pc 00069c7c /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+156) (BuildId: da81583da99c8268aa14df4694ef9086)
native: #02 pc 0004bec0 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver+244) (BuildId: 44b748826f51f8fcc91290b7d22a013e)
native: #03 pc 0004d23c /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse+136) (BuildId: 44b748826f51f8fcc91290b7d22a013e)
native: #04 pc 0004cf48 /system/lib64/libbinder.so (android::IPCThreadState::transact+188) (BuildId: 44b748826f51f8fcc91290b7d22a013e)
native: #05 pc 00061e48 /system/lib64/libbinder.so (android::BpBinder::transact+200) (BuildId: 44b748826f51f8fcc91290b7d22a013e)
native: #06 pc 001b2b10 /system/lib64/libandroid_runtime.so (android_os_BinderProxy_transact+156) (BuildId: 462355a3dfd128463d8eb4e28bd5323f)
at android.os.BinderProxy.transactNative(Native method)
at android.os.BinderProxy.transact(BinderProxy.java:684)
at android.os.IVold$Stub$Proxy.mount(IVold.java:1866)
at com.android.server.StorageManagerService.mount(StorageManagerService.java:2591)
at com.android.server.StorageManagerService.-$$Nest$mmount(unavailable:0)
at com.android.server.StorageManagerService$StorageManagerServiceHandler.handleMessage(StorageManagerService.java:782)
at android.os.Handler.dispatchMessage(Handler.java:107)
at android.os.Looper.loopOnce(Looper.java:249)
at android.os.Looper.loop(Looper.java:337)
at android.os.HandlerThread.run(HandlerThread.java:85)
----- Waiting Channels: pid 989 at 2025-03-05 16:13:44.984401844+0800 -----
Cmd line: /system/bin/vold --blkid_context=u:r:blkid:s0 --blkid_untrusted_context=u:r:blkid_untrusted:s0 --fsck_context=u:r:fsck:s0 --fsck_untrusted_context=u:r:fsck_untrusted:s0
sysTid=989 state=S binder_wait_for_work
sysTid=995 state=D fuse_get_req
sysTid=996 state=S do_sys_poll
sysTid=997 state=S binder_wait_for_work
sysTid=1096 state=S binder_wait_for_work
sysTid=1280 state=S futex_wait_queue_me
sysTid=3677 state=S binder_wait_for_work
sysTid=11118 state=S futex_wait_queue_me
----- end 989 -----
<6>[ 244.766287][T12063] sysrq: Show Blocked State
<6>[ 244.766384][T12063] task:binder:989_1 state:D stack: 0 pid: 995 ppid: 1 flags:0x04000001
<6>[ 244.766389][T12063] Call trace:
<6>[ 244.766396][T12063] __switch_to+0x180/0x2dc
<6>[ 244.766401][T12063] __schedule+0x4dc/0x940
<6>[ 244.766403][T12063] schedule+0x80/0x100
<6>[ 244.766407][T12063] fuse_get_req+0xc4/0x2b4
<6>[ 244.766408][T12063] fuse_simple_request+0x3c/0x298
<6>[ 244.766410][T12063] fuse_lookup_name+0x110/0x240
<6>[ 244.766412][T12063] fuse_lookup+0x80/0x1e4
<6>[ 244.766415][T12063] __lookup_slow+0x13c/0x1ec
<6>[ 244.766417][T12063] walk_component+0x1d4/0x230
<6>[ 244.766418][T12063] link_path_walk+0x248/0x374
<6>[ 244.766420][T12063] path_lookupat+0x94/0x17c
<6>[ 244.766421][T12063] filename_lookup+0xcc/0x208
<6>[ 244.766424][T12063] __arm64_sys_umount+0x68/0xb4
<6>[ 244.766428][T12063] el0_svc_common.llvm.1779838892619036198+0xd0/0x1e0
<6>[ 244.766429][T12063] do_el0_svc+0x28/0x98
<6>[ 244.766432][T12063] el0_svc+0x14/0x24
<6>[ 244.766434][T12063] el0_sync_handler+0x88/0xec
<6>[ 244.766436][T12063] el0_sync+0x1b8/0x1c0
The exact code where vold stuck in is
// system/vold/model/EmulatedVolume.cpp
status_t EmulatedVolume::doMount() {
...
if (!IsFuseBpfEnabled()) {
// Only do the bind-mounts when we know for sure the FUSE daemon can resolve the path.
res = mountFuseBindMounts();
if (res != OK) {
return res;
}
}
// system/vold/Utils.cpp
status_t MountUserFuse(userid_t user_id, const std::string& absolute_lower_path,
const std::string& relative_upper_path, android::base::unique_fd* fuse_fd) {
std::string pre_fuse_path(StringPrintf("/mnt/user/%d", user_id));
std::string fuse_path(
StringPrintf("%s/%s", pre_fuse_path.c_str(), relative_upper_path.c_str()));
// Open fuse fd.
fuse_fd->reset(open("/dev/fuse", O_RDWR | O_CLOEXEC));
if (fuse_fd->get() == -1) {
PLOG(ERROR) << "Failed to open /dev/fuse";
return -1;
}
// Note: leaving out default_permissions since we don't want kernel to do lower filesystem
// permission checks before routing to FUSE daemon.
const auto opts = StringPrintf(
"fd=%i,"
"rootmode=40000,"
"allow_other,"
"user_id=0,group_id=0,",
fuse_fd->get());
result = TEMP_FAILURE_RETRY(mount("/dev/fuse", fuse_path.c_str(), "fuse",
MS_NOSUID | MS_NODEV | MS_NOEXEC | MS_NOATIME | MS_LAZYTIME,
opts.c_str()));
if (result != 0) {
PLOG(ERROR) << "Failed to mount " << fuse_path;
return -errno;
}
if (IsSdcardfsUsed()) {
...
} else {
LOG(INFO) << "Bind mounting " << absolute_lower_path << " to " << pass_through_path;
return BindMount(absolute_lower_path, pass_through_path); // <<<<<<<<<<<<<<<
}
}
status_t BindMount(const std::string& source, const std::string& target) {
// <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
// This will stuck since the fuse is initlized above, so this unmount request will go through the fuse driver,
// but the fuse daemon in the user space will never start
if (UnmountTree(target) < 0) {
return -errno;
}
if (TEMP_FAILURE_RETRY(mount(source.c_str(), target.c_str(), nullptr, MS_BIND, nullptr)) < 0) {
PLOG(ERROR) << "Failed to bind mount " << source << " to " << target;
return -errno;
}
return OK;
}
ap...@google.com <ap...@google.com> #3
We have shared this with our product and engineering team and will update this issue with more information as it becomes available.
ap...@google.com <ap...@google.com> #4
I think it's a bug of the app clone feature, could you please assign this to
ap...@google.com <ap...@google.com> #5
As discussed on the CL shared by you, here is the patch to try to see if it works on your side. It actually tries to make sure that fuse daemon thread exists when we unmount the volume for clone profile.
gb...@google.com <gb...@google.com> #6
Thanks for the patch, I've verified that it could address the previous vold hang issue when disable and re-enable the dual app feature
Description
This warning should be cleaned up, as part of b/296092419 . Since there're multiple hits for it throughout CrOS ( b/296092419#comment7 and b/296092419#comment8 ), I'm filing a separate bug to track it.