Status Update
Comments
vi...@google.com <vi...@google.com> #2
Android build
Which Android build are you using? (e.g. UQ1A.240205.002)
Device used
Which device did you use to reproduce this issue?
Screen Record of the Issue
Please capture a screen recording or video of the issue using the following steps:
adb shell screenrecord /sdcard/video.mp4
Then use the following command to pull the recorded file:
adb pull /sdcard/video.mp4
Attach the file to this issue.
Android bug report (to be captured after reproducing the issue)
For steps to capture a bug report, please refer:
Alternate method
Navigate to “Developer options”, ensure “USB debugging” is enabled, then enable “Bug report shortcut”. Capture bug report by holding the power button and selecting the “Take bug report” option.
Note: Please upload the bug report and screen recording to Google Drive, share the folder with android-bugreport@google.com, and then share the link here.
sr...@gmail.com <sr...@gmail.com> #3
sr...@gmail.com <sr...@gmail.com> #4
Please provide the requested information to proceed further. Unfortunately the issue will be closed within 7 days if there is no further update.
vi...@google.com <vi...@google.com> #5
We are closing this issue since we didn't receive a response. If you are still facing this problem, please open a new issue and add the relevant information along with reference to this issue.
en...@google.com <en...@google.com> #6
captured. Time flashes blank/stopwatch time at 1 hertz.
OK to close.
Corbet
On Thu, Nov 7, 2024, 11:59 PM <buganizer-system@google.com> wrote:
gu...@google.com <gu...@google.com> #7
Re #6, AMD is likely building Mesa using the out-of-tree NDK flow:
sr.muralidhar@ can likely confirm this.
The dependencies to build AMD drivers in AOSP Mesa using Soong are not there. There are plans in the works to build Mesa using Soong in the immediate future, maybe as a Q3 OKR:
en...@google.com <en...@google.com> #8
gurchetansingh: do you know where mesa runs? which process(es)? (so we can make the seccomp hole as small as possible.)
je...@google.com <je...@google.com> #9
Yeah, we need more information here. Where is mesa running? What pids is it calling kcmp() on? Provide the UID/GIDs/selinux context for both the caller and the PIDs it's being called on.
On further debugging, we found that the page fault is due to corruption of file descriptors. And when we enabled the kcmp feature, we no longer observed the page fault.
How does kcmp prevent corruption of file descriptors?
gu...@google.com <gu...@google.com> #10
Mesa provides GLES/Vulkan implementations: it runs with both platform apps (surfaceflinger, mediacodecs) and un-trusted apps (games).
AMDGPU tries to create one AMDGPU winsys per physical device, using kcmp:
Interestingly, Android only allows single GPU setups as of right now and that design is somewhat baked in.
So perhaps you can call DETECT_OS_ANDROID in Mesa, and say for Android two DRM file descriptors always refer to the same physical device? That would avoid the kcmp syscall and is upstreamable.
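For readers unfamiliar with the syscall, here is a minimal sketch of the kind of check being discussed: asking the kernel whether two file descriptors in the current process share one open file description. The helper name `same_description_kcmp` is hypothetical and this is not Mesa's actual code; it only assumes the standard Linux `kcmp(2)` interface with `KCMP_FILE`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/kcmp.h>   /* KCMP_FILE */
#include <sys/syscall.h>  /* SYS_kcmp */
#include <unistd.h>

/* Hypothetical helper, not Mesa's function.
 * Returns 1 if fd1 and fd2 share one open file description,
 * 0 if they do not, and -1 if kcmp could not be used
 * (errno is set, for example ENOSYS or EPERM). */
static int same_description_kcmp(int fd1, int fd2)
{
    pid_t pid = getpid();

    /* kcmp returns 0 when the two resources are equal, a positive
     * ordering value (1, 2, or 3) when they differ, and -1 on error. */
    long r = syscall(SYS_kcmp, pid, pid, KCMP_FILE, fd1, fd2);
    if (r < 0)
        return -1;
    return r == 0;
}
```

Note that under a seccomp filter that traps the syscall, as in the tombstone later in this thread (SIGSYS on x86_64 system call 312, which is kcmp), the process is killed before any error code can be returned, so the -1 path does not help there.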
en...@google.com <en...@google.com> #11
So perhaps you can call DETECT_OS_ANDROID in Mesa, and say for Android two DRM file descriptors always refer to the same physical device? That would avoid the kcmp syscall and is upstreamable.
that sounds like a much better solution all round!
en...@google.com <en...@google.com> #12
as another forward-looking alternative,
The new F_DUPFD_QUERY operation for fcntl() allows a process to check whether two file descriptors refer to the same underlying file. This functionality is also provided by kcmp(), but in a more restricted form that leaks less information from the kernel and, as a result, should be available even on systems where kcmp() is disabled.
but "So perhaps you can call DETECT_OS_ANDROID in Mesa, and say for Android two DRM file descriptors always refer to the same physical device?" from #10 sounds like an even simpler, cheaper, and more compatible option for Android. (but if upstream doesn't like it, we can always promise that we'll be able to switch to F_DUPFD_QUERY eventually, if they'll let us do this in the meantime.)
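As a rough illustration of the F_DUPFD_QUERY alternative, the sketch below wraps the `fcntl()` operation in a helper that reports "same", "different", or "unsupported". The helper name `same_description_fcntl` is hypothetical; the fallback `#define`s assume the Linux UAPI values (base 1024, offset 3) for libc headers that do not yet provide them.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Linux UAPI values; older libc headers may not define them yet. */
#ifndef F_LINUX_SPECIFIC_BASE
#define F_LINUX_SPECIFIC_BASE 1024
#endif
#ifndef F_DUPFD_QUERY
#define F_DUPFD_QUERY (F_LINUX_SPECIFIC_BASE + 3)
#endif

/* Hypothetical helper.
 * Returns 1 if fd1 and fd2 refer to the same open file description,
 * 0 if they do not, and -1 if the query failed. Kernels that predate
 * the operation reject it with EINVAL, which also lands in the -1 path. */
static int same_description_fcntl(int fd1, int fd2)
{
    int r = fcntl(fd1, F_DUPFD_QUERY, fd2);
    if (r < 0)
        return -1;
    return r != 0;
}
```

A caller would need to treat -1 as "unknown" rather than "different", since the kernel may simply be too old to support the query.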
en...@google.com <en...@google.com> #13
(the external submitter was dropped from the bug when the component was changed, which is why they haven't been responding...)
sr...@gmail.com <sr...@gmail.com> #14
Please find replies to a few of the queries above:
1. Yes, but also AOSP (the tablet configuration has the same problem).
2. We are currently using Mesa 23.2.1, but kcmp is also used in older Mesa versions and is present in the Mesa variant in AOSP mainline.
3. do you know where mesa runs? which process(es)? (so we can make the seccomp hole as small as possible.)
Media applications such as Gallery and the media player, which link to the OMX Mesa library, can hit this issue. In practice, almost any process calling android::egl_display_t::initialize(int*, int*)+323 on an AMD platform can hit it.
Call stack
----------------------
7639 09-27 03:18:41.124 0 0 I logd : logdr: UID=10063 GID=10063 PID=5495 n tail=0 logMask=1 pid=5380 start=0ns deadline=0ns
7640 09-27 03:18:41.105 5495 5495 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
7641 09-27 03:18:41.106 5495 5495 F DEBUG : Build fingerprint: 'celadon_x86_64/celadon_x86_64/celadon_x86_64:13/TQ1A.230105.002.A1/eng.kxue.20230926.083833:userdebug/dev-keys'
7642 09-27 03:18:41.106 5495 5495 F DEBUG : Revision: '0'
7643 09-27 03:18:41.106 5495 5495 F DEBUG : ABI: 'x86_64'
7644 09-27 03:18:41.106 5495 5495 F DEBUG : Timestamp: 2023-09-27 03:18:40.626487258+0000
7645 09-27 03:18:41.106 5495 5495 F DEBUG : Process uptime: 3s
7646 09-27 03:18:41.106 5495 5495 F DEBUG : Cmdline: com.android.systemui
7647 09-27 03:18:41.106 5495 5495 F DEBUG : pid: 5380, tid: 5416, name: ImageWallpaper >>> com.android.systemui <<<
7648 09-27 03:18:41.106 5495 5495 F DEBUG : uid: 10063
7649 09-27 03:18:41.106 5495 5495 F DEBUG : signal 31 (SIGSYS), code 1 (SYS_SECCOMP), fault addr --------
7650 09-27 03:18:41.106 5495 5495 F DEBUG : Cause: seccomp prevented call to disallowed x86_64 system call 312
7651 09-27 03:18:41.106 5495 5495 F DEBUG : rax 0000000000000138 rbx 0000000000000084 rcx 000078605e45fdd8 rdx 0000000000000000
7652 09-27 03:18:41.106 5495 5495 F DEBUG : r8 0000000000000084 r9 0000000000000086 r10 0000000000000086 r11 0000000000000246
7653 09-27 03:18:41.106 5495 5495 F DEBUG : r12 0000785edde24cd0 r13 0000000000000084 r14 0000000000000001 r15 0000785e8de1e450
7654 09-27 03:18:41.106 5495 5495 F DEBUG : rdi 0000000000001504 rsi 0000000000001504
7655 09-27 03:18:41.106 5495 5495 F DEBUG : rbp 0000000000000086 rsp 0000785d477b08d8 rip 000078605e45fdd8
7656 09-27 03:18:41.106 5495 5495 F DEBUG : backtrace:
7657 09-27 03:18:41.106 5495 5495 F DEBUG : #00 pc 000000000005bdd8 /apex/com.android.runtime/lib64/bionic/libc.so (syscall+24) (BuildId: f090904cc3ac285a6f190f8003c3eb0e)
7658 09-27 03:18:41.106 5495 5495 F DEBUG : #01 pc 00000000006c6829 /vendor/lib64/dri/libgallium_dri.so (BuildId: d1d36ff02a1e0bc9191e0a9816320bf5d44d6fff)
7659 09-27 03:18:41.106 5495 5495 F DEBUG : #02 pc 0000000001034590 /vendor/lib64/dri/libgallium_dri.so (amdgpu_winsys_create+784) (BuildId: d1d36ff02a1e0bc9191e0a9816320bf5d44d6fff)
7660 09-27 03:18:41.107 5495 5495 F DEBUG : #03 pc 0000000000ddfeab /vendor/lib64/dri/libgallium_dri.so (BuildId: d1d36ff02a1e0bc9191e0a9816320bf5d44d6fff)
7661 09-27 03:18:41.107 5495 5495 F DEBUG : #04 pc 00000000006b9a75 /vendor/lib64/dri/libgallium_dri.so (BuildId: d1d36ff02a1e0bc9191e0a9816320bf5d44d6fff)
7662 09-27 03:18:41.107 5495 5495 F DEBUG : #05 pc 0000000000ddbe42 /vendor/lib64/dri/libgallium_dri.so (BuildId: d1d36ff02a1e0bc9191e0a9816320bf5d44d6fff)
7663 09-27 03:18:41.107 5495 5495 F DEBUG : #06 pc 00000000006bbc9e /vendor/lib64/dri/libgallium_dri.so (BuildId: d1d36ff02a1e0bc9191e0a9816320bf5d44d6fff)
7664 09-27 03:18:41.107 5495 5495 F DEBUG : #07 pc 00000000006c022d /vendor/lib64/dri/libgallium_dri.so (BuildId: d1d36ff02a1e0bc9191e0a9816320bf5d44d6fff)
7665 09-27 03:18:41.107 5495 5495 F DEBUG : #08 pc 000000000002c123 /vendor/lib64/egl/libGLES_mesa.so
7666 09-27 03:18:41.107 5495 5495 F DEBUG : #09 pc 0000000000030bdf /vendor/lib64/egl/libGLES_mesa.so
7667 09-27 03:18:41.107 5495 5495 F DEBUG : #10 pc 00000000000307a4 /vendor/lib64/egl/libGLES_mesa.so
7668 09-27 03:18:41.107 5495 5495 F DEBUG : #11 pc 000000000002d23a /vendor/lib64/egl/libGLES_mesa.so
7669 09-27 03:18:41.107 5495 5495 F DEBUG : #12 pc 000000000001c9bc /vendor/lib64/egl/libGLES_mesa.so (eglInitialize+268)
7670 09-27 03:18:41.107 5495 5495 F DEBUG : #13 pc 0000000000017563 /system/lib64/libEGL.so (android::egl_display_t::initialize(int*, int*)+323) (BuildId: e5c73ac29d0dfaee79d97a04d44cf2f6)
4. AMDGPU tries to create one AMDGPU winsys per physical device, using kcmp
Yes, it is used for multi-device setups, but also for a single device where multiple processes can share the same resources, such as GPU buffers.
5. alternative approach - With
We will discuss internally and verify. However, the Mesa-based AMD Radeon driver is common across platforms (unified driver approach).
Can you please check whether kcmp can be enabled in seccomp, if there are no security concerns in Android?
However, can you
sr...@gmail.com <sr...@gmail.com> #15
I am still trying to reproduce the scenario where the fds are duplicates, which was the actual failure case for us previously in the Monkey test.
Can you please share whether kcmp can be enabled in seccomp, if there are no security concerns in Android?
Changes:
+ #define F_LINUX_SPECIFIC_BASE 1024
+ #define F_DUPFD_QUERY (F_LINUX_SPECIFIC_BASE + 3)
- return syscall(SYS_kcmp, pid, pid, KCMP_FILE, fd1, fd2);
+ flags = fcntl(fd1, F_DUPFD_QUERY, fd2);
+
+ //return syscall(SYS_kcmp, pid, pid, KCMP_FILE, fd1, fd2);
en...@google.com <en...@google.com> #16
note that F_DUPFD_QUERY is new in Linux 6.10; i'm guessing you tested on a kernel that's too old.
see #12 for our actual suggestion.
as...@gmail.com <as...@gmail.com> #17
However, sysctl has been deprecated since Linux 5.x, and we are using kernel 6.1.25.
Is there any other alternative approach, or can kcmp be mainlined if there are no security concerns?
en...@google.com <en...@google.com> #18
i'm starting to wonder whether the #10 you can see is the same as the #10 i see; maybe the numbering is different externally? here's the #10 i'm talking about:
Interestingly, Android only allows single GPU setups as of right now and that design is somewhat baked in.
So perhaps you can call DETECT_OS_ANDROID in Mesa, and say for Android two DRM file descriptors always refer to the same physical device? That would avoid the kcmp syscall and is upstreamable.
gu...@google.com <gu...@google.com> #19
I made a patch with the DETECT_OS_ANDROID suggestion.
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
index 7c6a8196d88..5a294b1ee1a 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
@@ -332,6 +332,16 @@ amdgpu_cs_set_pstate(struct radeon_cmdbuf *rcs, enum radeon_ctx_pstate pstate)
static bool
are_file_descriptions_equal(int fd1, int fd2)
{
+
+ /* os_same_file_description(..) always returns false in Android
+ * since the syscall is blocked on Android.
+ *
+ * Android is single GPU anyways, so we should be fine with this.
+ */
+#if DETECT_OS_ANDROID
+ return true;
+#endif
+
int r = os_same_file_description(fd1, fd2);
if (r == 0)
Try it out; if it helps with the page faults, it's definitely upstreamable.
Description
We are facing a GPU timeout issue caused by a page fault. When testing with kcmp enabled, the situation improves and the page fault is no longer observed.
To enable kcmp, we made changes in the bionic repository and in the device/oem folders.
Can you please let us know if we can enable kcmp and upstream the bionic changes?
Or are there any known issues already reported or observed with kcmp in Android?