[Deleted User] <[Deleted User]> #2
Sorry for the inconvenience if the bug really should have been left as Available.
For more details visit
di...@google.com <di...@google.com> #3
we...@google.com <we...@google.com> #4
AFAICT the buddy lockup detector will not help with debugging as it cannot produce the stack backtrace on the locked up core.
I think something like the "IPI as NMI" series [1] would help with this. The NMI IPI can be used to force a stack dump. I've lightly tested this on MT8195 (CrOS 5.10 Cherry ToT) and RK3399 (5.10 uprev). Some fixes are required [2][3] though.
Unfortunately the latest Cherry ToT simply panics during boot from a spinlock recursion if pseudo-NMIs are enabled.
[1]
[2]
[3]
di...@google.com <di...@google.com> #5
AFAICT the buddy lockup detector will not help with debugging as it cannot produce the stack backtrace on the locked up core.
It's still better than nothing (especially if you combine it with something like
Right that the NMI solution is better. That's
we...@google.com <we...@google.com> #6
If for some reason the PMU interrupts can't be used as NMIs, like if they are partitioned (seen on RK3399 and MT8195), or they aren't PPIs (not seen on ChromeOS platforms but on other platforms), then this won't work.
I'll open another bug to track "IPI as NMI for NMI backtraces" to complement the buddy lockup detector.
we...@google.com <we...@google.com> #7
Filed as
av...@google.com <av...@google.com>
di...@google.com <di...@google.com> #8
Not sure why this got closed; it's still relevant.
ap...@google.com <ap...@google.com> #9
Branch: chromeos-5.15
commit 399e5d4e576b8bfdc0939b6e62694bbe65b5a9f9
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 17 17:25:44 2023
FIXUP: CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus (all backtrace)
The buddy hard lockup detector should try backtracing on all
CPUs. Right now it doesn't. Copy that bit of logic from the normal
hardlockup detector.
NOTE: On arm64 (the current user of the buddy detector), this won't
(yet) do anything. Soon, hopefully.
UPSTREAM-TASK=b:172213097
BUG=b:278598383, b:278594093, b:197061987, b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Id225408348d8a45e68080d08139bc6d9e170000a
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
M kernel/watchdog_buddy_cpu.c
ap...@google.com <ap...@google.com> #10
Branch: chromeos-5.15
commit 71986679fe52d94286f9051f09b958ecf582c7fc
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 17 17:15:31 2023
FIXUP: CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus (sysctl)
The CHROMIUM patch accidentally didn't expose the hardlockup panic
sysctls based on the right config. Fix it.
NOTE: Only one of these two sysctls actually does something with the
current buddy detector. You can turn on/off the hard lockup detector
but it doesn't (yet) support tracing other CPUs.
UPSTREAM-TASK=b:172213097
BUG=b:278598383, b:278594093, b:197061987, b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Id88d1fb603308e7210c30e42bb6e4e6a4be65a0c
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
M kernel/sysctl.c
ap...@google.com <ap...@google.com> #11
Branch: chromeos-6.1
commit e21e2990b1d7fbb917a7b37541f91f41670d0d1d
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 17 17:25:44 2023
FIXUP: CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus (all backtrace)
The buddy hard lockup detector should try backtracing on all
CPUs. Right now it doesn't. Copy that bit of logic from the normal
hardlockup detector.
NOTE: On arm64 (the current user of the buddy detector), this won't
(yet) do anything. Soon, hopefully.
UPSTREAM-TASK=b:172213097
BUG=b:278598383, b:278594093, b:197061987, b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Id225408348d8a45e68080d08139bc6d9e170000a
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
(cherry picked from commit 399e5d4e576b8bfdc0939b6e62694bbe65b5a9f9)
Reviewed-on:
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
M kernel/watchdog_buddy_cpu.c
ap...@google.com <ap...@google.com> #12
Branch: chromeos-6.1
commit 0351e3dbd6dce50087d5b4cb4698c7ceacb93bfa
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 17 17:15:31 2023
FIXUP: CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus (sysctl)
The CHROMIUM patch accidentally didn't expose the hardlockup panic
sysctls based on the right config. Fix it.
NOTE: Only one of these two sysctls actually does something with the
current buddy detector. You can turn on/off the hard lockup detector
but it doesn't (yet) support tracing other CPUs.
UPSTREAM-TASK=b:172213097
BUG=b:278598383, b:278594093, b:197061987, b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Id88d1fb603308e7210c30e42bb6e4e6a4be65a0c
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-on:
M kernel/watchdog.c
di...@google.com <di...@google.com>
di...@google.com <di...@google.com> #17
Everything, including fixes for all reported issues, is now in Linus's tree. \o/
I've backported all of this to chromeos-6.1. The hope is that this will make the continuous rebase kernels go more smoothly: they'll all get the configs enabled properly and won't each have to work out separately which patches they need to drop. Anyone care to review? Chen-Yu, maybe? I've put a few notes about why I picked various patches in the attached notes.
Note that this also happens to bring in the arm64 perf lockup detector (see
The whole chain is:
https://crrev.com/c/4673849 - CHROMIUM: config: Enable the buddy hardlockup detector in the upstream way
https://crrev.com/c/4673848 - UPSTREAM: powerpc: Include asm/nmi.c in mobility.c for watchdog_hardlockup_set_timeout_pct()
https://crrev.com/c/4673847 - UPSTREAM: watchdog/hardlockup: fix typo in config HARDLOCKUP_DETECTOR_PREFER_BUDDY
https://crrev.com/c/4673846 - UPSTREAM: powerpc: move arch_trigger_cpumask_backtrace from nmi.h to irq.h
https://crrev.com/c/4673845 - UPSTREAM: watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH
https://crrev.com/c/4673844 - UPSTREAM: watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64
https://crrev.com/c/4673843 - BACKPORT: watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific
https://crrev.com/c/4673842 - UPSTREAM: watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h
https://crrev.com/c/4673841 - UPSTREAM: watchdog/hardlockup: make the config checks more straightforward
https://crrev.com/c/4673840 - UPSTREAM: watchdog/hardlockup: sort hardlockup detector related config values a logical way
https://crrev.com/c/4673839 - UPSTREAM: watchdog/hardlockup: move SMP barriers from common code to buddy code
https://crrev.com/c/4673838 - UPSTREAM: watchdog/buddy: simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY
https://crrev.com/c/4673837 - UPSTREAM: watchdog/buddy: don't copy the cpumask in watchdog_next_cpu()
https://crrev.com/c/4673836 - UPSTREAM: watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called
https://crrev.com/c/4673835 - UPSTREAM: watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()
https://crrev.com/c/4673834 - UPSTREAM: watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy()
https://crrev.com/c/4673833 - UPSTREAM: watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick()
https://crrev.com/c/4673832 - UPSTREAM: watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe()
https://crrev.com/c/4673831 - UPSTREAM: watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails
https://crrev.com/c/4673830 - UPSTREAM: arm64: enable perf events based hard lockup detector
https://crrev.com/c/4673829 - UPSTREAM: arm64: add hw_nmi_get_sample_period for preparation of lockup detector
https://crrev.com/c/4673828 - UPSTREAM: watchdog/perf: adapt the watchdog_perf interface for async model
https://crrev.com/c/4673827 - UPSTREAM: watchdog/perf: add a weak function for an arch to detect if perf can use NMIs
https://crrev.com/c/4673826 - UPSTREAM: watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs
https://crrev.com/c/4673825 - UPSTREAM: watchdog/hardlockup: have the perf hardlockup use __weak functions more cleanly
https://crrev.com/c/4673824 - UPSTREAM: watchdog/hardlockup: rename some "NMI watchdog" constants/function
https://crrev.com/c/4673823 - UPSTREAM: watchdog/hardlockup: move perf hardlockup watchdog petting to watchdog.c
https://crrev.com/c/4673822 - UPSTREAM: watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()
https://crrev.com/c/4673821 - UPSTREAM: watchdog/hardlockup: style changes to watchdog_hardlockup_check() / is_hardlockup()
https://crrev.com/c/4673520 - UPSTREAM: watchdog/hardlockup: move perf hardlockup checking/panic to common watchdog.c
https://crrev.com/c/4673519 - UPSTREAM: watchdog/perf: rename watchdog_hld.c to watchdog_perf.c
https://crrev.com/c/4673518 - UPSTREAM: watchdog/hardlockup: add comments to touch_nmi_watchdog()
https://crrev.com/c/4673517 - UPSTREAM: watchdog/perf: ensure CPU-bound context when creating hardlockup detector event
https://crrev.com/c/4673516 - UPSTREAM: watchdog/hardlockup: change watchdog_nmi_enable() to void
https://crrev.com/c/4673515 - UPSTREAM: watchdog: remove WATCHDOG_DEFAULT
https://crrev.com/c/4673514 - UPSTREAM: watchdog/perf: more properly prevent false positives with turbo modes
https://crrev.com/c/4673513 - UPSTREAM: watchdog/perf: define dummy watchdog_update_hrtimer_threshold() on correct config
https://crrev.com/c/4673512 - Revert "CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus"
https://crrev.com/c/4673511 - Revert "FIXUP: CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus (sysctl)"
https://crrev.com/c/4673510 - Revert "FIXUP: CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus (all backtrace)"
https://crrev.com/c/4673509 - UPSTREAM: arm64: perf: Move PMUv3 driver to drivers/perf
Once that lands, we could probably call this fixed. If needed, we could also backport to older kernels.
ap...@google.com <ap...@google.com> #18
Branch: chromeos-6.1
commit 4d218ff03e576500a617329c19c0ba1f1461bbce
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 19 10:18:31 2023
UPSTREAM: watchdog/perf: rename watchdog_hld.c to watchdog_perf.c
The code currently in "watchdog_hld.c" is for detecting hardlockups using
perf, as evidenced by the line in the Makefile that only compiles this
file if CONFIG_HARDLOCKUP_DETECTOR_PERF is defined. Rename the file to
prepare for the buddy hardlockup detector, which doesn't use perf.
It could be argued that the new name makes it less obvious that this is a
hardlockup detector. While true, it's not hard to remember that the
"perf" detector is always a hardlockup detector and it's nice not to have
names that are too convoluted.
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 6ea0d04211a7715a926f5dca4aa1065614fa64da)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Ice803cb078d0e15fb2cbf49132f096ee2bd4199d
Reviewed-on:
Tested-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
M kernel/Makefile
M kernel/watchdog_perf.c
ap...@google.com <ap...@google.com> #19
Branch: chromeos-6.1
commit 222ddfef1622e21c885c6b6d310547c8c1533d65
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 19 10:18:30 2023
UPSTREAM: watchdog/hardlockup: add comments to touch_nmi_watchdog()
In preparation for the buddy hardlockup detector, add comments to
touch_nmi_watchdog() to make it obvious that it touches the configured
hardlockup detector regardless of whether it's backed by an NMI. Also
note that arch_touch_nmi_watchdog() may not be architecture-specific.
Ideally, we'd like to rename these functions but that is a fairly
disruptive change touching a lot of drivers. After discussion [1] the
plan is to defer this until a good time.
[1]
[akpm@linux-foundation.org: comment changes, per Petr]
Link:
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8b5c59a92b5b37e5c65ec185810d67e3f30f5a2e)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I4e47cbfa1bb2ebbcdb5ca16817aa2887f15dc82c
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
M include/linux/nmi.h
ap...@google.com <ap...@google.com> #20
Branch: chromeos-6.1
commit 226f0cff12cabe1dc31895d20409d9e22654a76e
Author: Pingfan Liu <kernelfans@gmail.com>
Date: Fri May 19 10:18:29 2023
UPSTREAM: watchdog/perf: ensure CPU-bound context when creating hardlockup detector event
hardlockup_detector_event_create() should create perf_event on the current
CPU. Preemption could not get disabled because
perf_event_create_kernel_counter() allocates memory. Instead, the CPU
locality is achieved by processing the code in a per-CPU bound kthread.
Add a check to prevent mistakes when calling the code in another code
path.
Link:
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Co-developed-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 1fafaa7745eeeeffef7155ab5eeb8cc83d04874f)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I654063e53782b11d53e736a8ad4897ffd207406a
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
M kernel/watchdog_hld.c
ap...@google.com <ap...@google.com> #21
Branch: chromeos-6.1
commit 8e80d8f4420819b56cc60472256e266308b54a15
Author: Lecopzer Chen <lecopzer.chen@mediatek.com>
Date: Fri May 19 10:18:28 2023
UPSTREAM: watchdog/hardlockup: change watchdog_nmi_enable() to void
Nobody cares about the return value of watchdog_nmi_enable(), so change its
prototype to void.
Link:
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 730211182ed083898fa5feb4b28459ffac4c9615)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Ic3a19b592eb1ac4c6f6eade44ffd943e8637b6e5
Reviewed-on:
Tested-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
M arch/sparc/kernel/nmi.c
M include/linux/nmi.h
M kernel/watchdog.c
ap...@google.com <ap...@google.com> #22
Branch: chromeos-6.1
commit f3583f5bf1ce1445abef98e6cadfe33e487018aa
Author: Lecopzer Chen <lecopzer.chen@mediatek.com>
Date: Fri May 19 10:18:27 2023
UPSTREAM: watchdog: remove WATCHDOG_DEFAULT
Nothing references WATCHDOG_DEFAULT, so remove it.
Link:
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 810b560e8985725dbd57bbb3f188c231365eb5ae)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I6a729209a1320e0ad212176e250ff945b8f91b2a
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
M kernel/watchdog.c
ap...@google.com <ap...@google.com> #23
Branch: chromeos-6.1
commit a4af9617ba585314e0e08906a78bfab2c30be00d
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 19 10:18:26 2023
UPSTREAM: watchdog/perf: more properly prevent false positives with turbo modes
Currently, in the watchdog_overflow_callback() we first check to see if
the watchdog had been touched and _then_ we handle the workaround for
turbo mode. This order should be reversed.
Specifically, "touching" the hardlockup detector's watchdog should avoid
lockups being detected for one period that should be roughly the same
regardless of whether we're running turbo or not. That means that we
should do the extra accounting for turbo _before_ we look at (and clear)
the global indicating that we've been touched.
NOTE: this fix is made based on code inspection. I am not aware of any
reports where the old code would have generated false positives. That
being said, this order seems more correct and also makes it easier down
the line to share code with the "buddy" hardlockup detector.
Link:
Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 4379e59fe5665cfda737e45b8bf2f05321ef049c)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I843b0d1de3e096ba111a179f3adb16d576bef5c7
Reviewed-on:
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
M kernel/watchdog_hld.c
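The reordering described in the commit above can be illustrated with a small toy model. This is only a sketch in Python, not the kernel code: the class `PerfWatchdog`, its fields, and the method names are hypothetical, and the nanosecond values are arbitrary. It assumes the turbo workaround amounts to ignoring a perf sample that arrives too soon in real time, and it compares the old order (touch flag first) with the new order (turbo accounting first).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PerfWatchdog:
    """Toy model of one CPU's perf-based detector (all names illustrative)."""
    threshold_ns: int            # minimum real time between accepted samples
    last_timestamp_ns: int = 0   # time of the last accepted sample
    touched: bool = False        # set when the watchdog has been "touched"
    hrtimer_interrupts: int = 0  # timer interrupts serviced so far
    saved_interrupts: int = 0    # value seen at the previous accepted sample

    def sample_is_early(self, now_ns: int) -> bool:
        """Turbo workaround: the perf event fired too early in real time."""
        if now_ns - self.last_timestamp_ns < self.threshold_ns:
            return True
        self.last_timestamp_ns = now_ns
        return False

    def _check(self) -> str:
        """Report a lockup if no timer interrupt was seen since last sample."""
        if self.hrtimer_interrupts == self.saved_interrupts:
            return "hardlockup"
        self.saved_interrupts = self.hrtimer_interrupts
        return "ok"

    def overflow_old(self, now_ns: int) -> str:
        """Old order: consume the touch flag, then apply the turbo check."""
        if self.touched:
            self.touched = False
            return "skipped-touched"
        if self.sample_is_early(now_ns):
            return "ignored-early"
        return self._check()

    def overflow_new(self, now_ns: int) -> str:
        """New order: apply the turbo check, then consume the touch flag."""
        if self.sample_is_early(now_ns):
            return "ignored-early"
        if self.touched:
            self.touched = False
            return "skipped-touched"
        return self._check()
```

In this model, an early sample under the old order uses up the touch, so the touch protects for less than one full period; under the new order the early sample is ignored and the touch still covers the next real sample.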
ap...@google.com <ap...@google.com> #24
Branch: chromeos-6.1
commit be55001831db6391cbab179f3c9f2fb1f9757de2
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 19 10:18:25 2023
UPSTREAM: watchdog/perf: define dummy watchdog_update_hrtimer_threshold() on correct config
Patch series "watchdog/hardlockup: Add the buddy hardlockup detector", v5.
This patch series adds the "buddy" hardlockup detector. In brief, the
buddy hardlockup detector can detect hardlockups without arch-level
support by having CPUs check up on a "buddy" CPU periodically.
Given the new design of this patch series, testing all combinations is
fairly difficult. I've attempted to make sure that all combinations of
CONFIG_ options are good, but it wouldn't surprise me if I missed
something. I apologize in advance and I'll do my best to fix any
problems that are found.
This patch (of 18):
The real watchdog_update_hrtimer_threshold() is defined in
kernel/watchdog_hld.c. That file is included if
CONFIG_HARDLOCKUP_DETECTOR_PERF and the function is defined in that file
if CONFIG_HARDLOCKUP_CHECK_TIMESTAMP.
The dummy version of the function in "nmi.h" didn't get that quite right.
While this doesn't appear to be a huge deal, it's nice to make it
consistent.
It doesn't break builds because CHECK_TIMESTAMP is only defined by x86 so
others don't get a double definition, and x86 uses perf lockup detector,
so it gets the out of line version.
Link:
Link:
Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Cc: Colin Cross <ccross@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 5e008df11c55228a86a1bae692cc2002503572c9)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I8cbb2f4fa740528fcfade4f5439b6cdcdd059251
Reviewed-on:
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
M include/linux/nmi.h
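The buddy idea from the cover letter above can be sketched as a toy simulation. This is Python, not kernel code, and the names `Cpu`, `next_cpu()` and `tick()` are made up for illustration. It assumes each CPU counts its own timer interrupts and, from that same interrupt, compares its buddy's count with the value it saw last time; the real detector differs in details such as how often it performs the check.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cpu:
    """One simulated CPU (illustrative, not a kernel structure)."""
    hrtimer_interrupts: int = 0  # bumped on every timer tick the CPU services
    saved_buddy_count: int = -1  # buddy's count as seen at the previous check
    locked_up: bool = False      # True once the CPU stops servicing its timer


def next_cpu(cpu: int, online: list[int]) -> int:
    """Return the online CPU after `cpu`, wrapping around: its buddy."""
    later = [c for c in online if c > cpu]
    return later[0] if later else online[0]


def tick(cpus: dict[int, Cpu], online: list[int]) -> list[int]:
    """Run one timer period and return the CPUs reported as locked up."""
    # Every healthy CPU services its timer interrupt.
    for cpu in online:
        if not cpus[cpu].locked_up:
            cpus[cpu].hrtimer_interrupts += 1

    # From that interrupt, every healthy CPU checks up on its buddy.
    reported = []
    for cpu in online:
        if cpus[cpu].locked_up:
            continue  # a locked-up CPU cannot check anyone
        buddy = next_cpu(cpu, online)
        if buddy == cpu:
            continue  # only one CPU online: nobody to watch
        count = cpus[buddy].hrtimer_interrupts
        if count == cpus[cpu].saved_buddy_count:
            reported.append(buddy)  # buddy made no progress since last check
        cpus[cpu].saved_buddy_count = count
    return reported
```

With four CPUs, locking up CPU 2 makes CPU 1 report it on the next tick. The report is generated on CPU 1, not on CPU 2, which matches the point made earlier in this thread that the buddy detector by itself cannot produce the stack backtrace on the locked up core.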
ap...@google.com <ap...@google.com> #25
Branch: chromeos-6.1
commit 176585f4a79d2fa783276fdefb299410d97a9b9d
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri Jul 07 12:26:16 2023
Revert "CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus"
This reverts commit 3b0603c1a24696e0c8ec3892ed13862a0c71b200.
We're replacing the old CHROMIUM patches with what landed upstream.
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Ia1205a8a82fb0255c03dc1080b04d139b3374df7
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
M include/linux/nmi.h
M kernel/Makefile
M kernel/watchdog.c
D kernel/watchdog_buddy_cpu.c
M lib/Kconfig.debug
ap...@google.com <ap...@google.com> #26
Branch: chromeos-6.1
commit 283d8d26b36737ca2eb2d8b9ac70bb05e5fcb067
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri Jul 07 12:26:06 2023
Revert "FIXUP: CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus (sysctl)"
This reverts commit 0351e3dbd6dce50087d5b4cb4698c7ceacb93bfa.
We're replacing the old CHROMIUM patches with what landed upstream.
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Ie3c6b77f7aac3e7988916c58a364792ca9a73ad0
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
M kernel/watchdog.c
ap...@google.com <ap...@google.com> #27
Branch: chromeos-6.1
commit 8d75434d7bb36944d3fa1620765556fd48fbbe6d
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri Jul 07 12:26:03 2023
Revert "FIXUP: CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus (all backtrace)"
This reverts commit e21e2990b1d7fbb917a7b37541f91f41670d0d1d.
We're replacing the old CHROMIUM patches with what landed upstream.
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Cq-Depend: chromium:4673849
Change-Id: If682778037673b55818c43766c5b209d5f6ff855
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
M kernel/watchdog_buddy_cpu.c
ap...@google.com <ap...@google.com> #28
Branch: chromeos-6.1
commit 0c6d5223e82ae40fa11588c34ef2eb24008f29c7
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri Jul 07 13:04:26 2023
CHROMIUM: config: Enable the buddy hardlockup detector in the upstream way
Now that we've reverted the downstream buddy lockup detector and
picked the upstream buddy lockup detector, we need to enable it
differently. Now we just enable "CONFIG_HARDLOCKUP_DETECTOR" for
all platforms and the system realizes that this means the buddy lockup
detector for arm64.
NOTE: We'll have to be careful when we enable CONFIG_ARM64_PSEUDO_NMI
since suddenly the default will change to the perf lockup detector and
that's probably not what we want for arm64. We don't want the perf one
in most cases because just turning on CONFIG_ARM64_PSEUDO_NMI doesn't
actually guarantee that pseudo-NMI will be available and if pseudo-NMI
isn't available the buddy lockup detector is more functional than the
perf-based one. Pseudo-NMI might not be available either because we
haven't yet turned it on with "irqchip.gicv3_pseudo_nmi=1" or because
of the Mediatek errata (see
CONFIG_ARM64_PSEUDO_NMI=y we'll have to also add
CONFIG_HARDLOCKUP_DETECTOR_PREFER_BUDDY=y.
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I67e347bc4ccd9beba2a54a128c83d623b7d5b949
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
M chromeos/config/chromeos/arm64/common.config
M chromeos/config/chromeos/armel/common.config
M chromeos/config/chromeos/base.config
M chromeos/config/chromeos/x86_64/common.config
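The selection rules spelled out in the NOTE above can be condensed into a small decision function. This is a hedged Python sketch of what the commit message says for arm64, not a transcription of the Kconfig logic; the function and parameter names are hypothetical.

```python
def arm64_detector(hardlockup_detector: bool,
                   pseudo_nmi_config: bool,
                   prefer_buddy: bool) -> str:
    """Which detector gets built on arm64, per the commit message above.

    Parameters stand in for CONFIG_HARDLOCKUP_DETECTOR,
    CONFIG_ARM64_PSEUDO_NMI and CONFIG_HARDLOCKUP_DETECTOR_PREFER_BUDDY.
    """
    if not hardlockup_detector:
        return "none"
    if pseudo_nmi_config and not prefer_buddy:
        return "perf"   # default flips to perf once pseudo-NMI is configured
    return "buddy"


def pseudo_nmi_available(pseudo_nmi_config: bool,
                         enabled_on_cmdline: bool,
                         blocked_by_erratum: bool) -> bool:
    """Whether pseudo-NMI is actually usable at runtime.

    `enabled_on_cmdline` models booting with irqchip.gicv3_pseudo_nmi=1;
    `blocked_by_erratum` models the Mediatek errata mentioned above.
    """
    return pseudo_nmi_config and enabled_on_cmdline and not blocked_by_erratum
```

The case the NOTE warns about is `arm64_detector(True, True, False)` returning "perf" while `pseudo_nmi_available(...)` is false; setting the prefer-buddy option avoids ending up with the less functional detector in that situation.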
ap...@google.com <ap...@google.com> #29
Branch: chromeos-6.1
commit 812271a35b03ff519f6734d29e5cfc6f45530365
Author: Douglas Anderson <dianders@chromium.org>
Date: Thu Jun 29 12:45:06 2023
UPSTREAM: powerpc: Include asm/nmi.c in mobility.c for watchdog_hardlockup_set_timeout_pct()
The powerpc/platforms/pseries/mobility.c calls
watchdog_hardlockup_set_timeout_pct(), which is declared in
<asm/nmi.h>. We used to automatically get <asm/nmi.h> included, but
that changed as of commit 7ca8fe94aa92 ("watchdog/hardlockup: define
HARDLOCKUP_DETECTOR_ARCH"). Let's add the explicit include.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes:
Fixes: 7ca8fe94aa92 ("watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link:
(cherry picked from commit 6cb44bef35ac11724ef22c5ae4f1bc607e2ef3d8)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I55e2f4e7903d686c4484cb23c033c6a9e1a9d4c4
Reviewed-on:
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
M arch/powerpc/platforms/pseries/mobility.c
ap...@google.com <ap...@google.com> #30
Branch: chromeos-6.1
commit 8d51d01ab10615aff5e333eb36ef6e801c342e29
Author: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Date: Fri Jun 23 06:07:17 2023
UPSTREAM: watchdog/hardlockup: fix typo in config HARDLOCKUP_DETECTOR_PREFER_BUDDY
Commit a5fcc2367e22 ("watchdog/hardlockup: make HAVE_NMI_WATCHDOG
sparc64-specific") accidentally introduces a typo in one of the config
dependencies of HARDLOCKUP_DETECTOR_PREFER_BUDDY.
Fix this accidental typo.
Link:
Fixes: a5fcc2367e22 ("watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit a8992d8ad7775860594d3d981ef93fc423185fa4)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I38647ad1d3003e5273ad243a9c99b1a4cd2f40cf
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
M lib/Kconfig.debug
ap...@google.com <ap...@google.com> #31
Branch: chromeos-6.1
commit 84755174561f3cb611532e9978ad2e6a403c338c
Author: Douglas Anderson <dianders@chromium.org>
Date: Wed Jun 21 16:48:19 2023
UPSTREAM: powerpc: move arch_trigger_cpumask_backtrace from nmi.h to irq.h
The powerpc architecture was the only one that defined
arch_trigger_cpumask_backtrace() in asm/nmi.h instead of
asm/irq.h. Move it to be consistent.
This fixes compile time errors introduced by commit 7ca8fe94aa92
("watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH"). That commit
caused <asm/nmi.h> to stop being included if the hardlockup detector
wasn't enabled. The specific errors were:
error: implicit declaration of function `nmi_cpu_backtrace'
error: implicit declaration of function `nmi_trigger_cpumask_backtrace'
NOTE: when moving this into irq.h, we also change the guards from just
checking if "CONFIG_NMI_IPI" is defined to also checking if
"CONFIG_PPC_BOOK3S_64" is defined. This matches the code in
arch/powerpc/kernel/stacktrace.c. Previously this worked because
<asm/nmi.h> was included if "CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH" was
defined. For powerpc that's only selected if "CONFIG_PPC_BOOK3S_64" is
defined.
[dianders@chromium.org: change the guards to include CONFIG_PPC_BOOK3S_64]
Link:
Link:
Fixes: 7ca8fe94aa92 ("watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Closes:
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tom Rix <trix@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit df8b78e1630fa6cff82fdc33ce04000e3ed065f7)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Ice67126857506712559078e7de26d32d26e64631
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
M arch/powerpc/include/asm/irq.h
M arch/powerpc/include/asm/nmi.h
ap...@google.com <ap...@google.com> #32
Branch: chromeos-6.1
commit 5d37636bc9a19edf2cd1bb452aa57ab04f7c0fb8
Author: Petr Mladek <pmladek@suse.com>
Date: Fri Jun 16 17:06:18 2023
UPSTREAM: watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH
The HAVE_ prefix means that the code could be enabled. Add another
variable for HAVE_HARDLOCKUP_DETECTOR_ARCH without this prefix.
It will be set when it should be built. It will make it compatible
with the other hardlockup detectors.
The change allows cleaning up the dependencies of the PPC_WATCHDOG
and HAVE_HARDLOCKUP_DETECTOR_PERF definitions for powerpc.
As a result HAVE_HARDLOCKUP_DETECTOR_PERF has the same dependencies
on the arm, x86, and powerpc architectures.
Link:
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 7ca8fe94aa92d9adcd7dcdf64371fc78eb2da3f9)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Cq-Depend: chromium:4673848
Change-Id: I36072e8e644cd3b94f4c3161010f82a838374a2e
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
M arch/powerpc/Kconfig
M include/linux/nmi.h
M lib/Kconfig.debug
ap...@google.com <ap...@google.com> #33
Branch: chromeos-6.1
commit c5d08b1efb0c0ad1986b4625a49a2b38c93d3c9b
Author: Petr Mladek <pmladek@suse.com>
Date: Fri Jun 16 17:06:17 2023
UPSTREAM: watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64
The HAVE_ prefix means that the code could be enabled. Add another
variable for HAVE_HARDLOCKUP_DETECTOR_SPARC64 without this prefix.
It will be set when it should be built. It will make it compatible
with the other hardlockup detectors.
Before, it was far from obvious that the SPARC64 variant is actually used:
$> make ARCH=sparc64 defconfig
$> grep HARDLOCKUP_DETECTOR .config
CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y
After, it is more clear:
$> make ARCH=sparc64 defconfig
$> grep HARDLOCKUP_DETECTOR .config
CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y
CONFIG_HARDLOCKUP_DETECTOR_SPARC64=y
Link:
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 47f4cb433923a08d81f1e5c065cb680215109db9)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I4decda0c3c4385c8a8dbe2e687a23b8d0fa8126f
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
M arch/sparc/Kconfig.debug
M include/linux/nmi.h
M kernel/watchdog.c
M lib/Kconfig.debug
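The before/after grep output in the commit message above shows the point of the rename: HAVE_* symbols say a detector could be built, while the symbol without the prefix says it is built. A small Python sketch of that distinction, assuming .config text as input; the function name classify_detector_symbols is made up for illustration:

```python
def classify_detector_symbols(config_text):
    """Split enabled HARDLOCKUP_DETECTOR symbols into those that only say
    a detector is available (HAVE_*) and those that say it is built."""
    available, built = [], []
    for line in config_text.splitlines():
        line = line.strip()
        # Only consider "CONFIG_FOO=y" lines; "# CONFIG_FOO is not set" is skipped.
        if not line.startswith("CONFIG_") or not line.endswith("=y"):
            continue
        name = line[len("CONFIG_"):-len("=y")]
        if "HARDLOCKUP_DETECTOR" not in name:
            continue
        (available if name.startswith("HAVE_") else built).append(name)
    return available, built


# The two grep outputs quoted in the commit message.
BEFORE = (
    "CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y\n"
    "CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y\n"
)
AFTER = BEFORE + "CONFIG_HARDLOCKUP_DETECTOR_SPARC64=y\n"
```

On BEFORE the "built" list is empty even though the sparc64 detector is compiled in; on AFTER it names HARDLOCKUP_DETECTOR_SPARC64.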
ap...@google.com <ap...@google.com> #34
Branch: chromeos-6.1
commit f19fa1112fc39c078bc59867e9ad3cf18084903d
Author: Petr Mladek <pmladek@suse.com>
Date: Fri Jun 16 17:06:16 2023
BACKPORT: watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific
There are several hardlockup detector implementations and several Kconfig
values which allow selection and build of the preferred one.
CONFIG_HARDLOCKUP_DETECTOR was introduced by the commit 23637d477c1f53acb
("lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR") in v2.6.36.
It was a preparation step for introducing the new generic perf hardlockup
detector.
The existing arch-specific variants did not support the to-be-created
generic build configurations, sysctl interface, etc. This distinction
was made explicit by the commit 4a7863cc2eb5f98 ("x86, nmi_watchdog:
Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR")
in v2.6.38.
CONFIG_HAVE_NMI_WATCHDOG was introduced by the commit d314d74c695f967e105
("nmi watchdog: do not use cpp symbol in Kconfig") in v3.4-rc1. It replaced
the above mentioned ARCH_HAS_NMI_WATCHDOG. At that time, it was still used
by three architectures, namely blackfin, mn10300, and sparc.
The support for blackfin and mn10300 architectures has been completely
dropped some time ago. And sparc is the only architecture with the historic
NMI watchdog at the moment.
And the old sparc implementation is really special. It is always built on
sparc64. It used to be always enabled until the commit 7a5c8b57cec93196b
("sparc: implement watchdog_nmi_enable and watchdog_nmi_disable") added
in v4.10-rc1.
There are only a few locations where the sparc64 NMI watchdog interacts
with the generic hardlockup detectors code:
+ implements arch_touch_nmi_watchdog() which is called from the generic
touch_nmi_watchdog()
+ implements watchdog_hardlockup_enable()/disable() to support
/proc/sys/kernel/nmi_watchdog
+ is always preferred over other generic watchdogs, see
CONFIG_HARDLOCKUP_DETECTOR
+ includes asm/nmi.h into linux/nmi.h because some sparc-specific
functions are needed in sparc-specific code which includes
only linux/nmi.h.
The situation became more complicated after the commit 05a4a95279311c3
("kernel/watchdog: split up config options") and commit 2104180a53698df5
("powerpc/64s: implement arch-specific hardlockup watchdog") in v4.13-rc1.
They introduced HAVE_HARDLOCKUP_DETECTOR_ARCH. It was used for powerpc
specific hardlockup detector. It was compatible with the perf one
regarding the general boot, sysctl, and programming interfaces.
HAVE_HARDLOCKUP_DETECTOR_ARCH was defined as a superset of
HAVE_NMI_WATCHDOG. It made some sense because all arch-specific
detectors had some common requirements, namely:
+ implemented arch_touch_nmi_watchdog()
+ included asm/nmi.h into linux/nmi.h
+ defined the default value for /proc/sys/kernel/nmi_watchdog
But it actually has made things pretty complicated when the generic
buddy hardlockup detector was added. Before that, the generic perf detector
was never supported together with an arch-specific one. But the buddy
detector could work on any SMP system. It means that an architecture
could support both the arch-specific and buddy detector.
As a result, there are a few tricky dependencies. For example,
CONFIG_HARDLOCKUP_DETECTOR depends on:
((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH
The problem is that the very special sparc implementation is defined as:
HAVE_NMI_WATCHDOG && !HAVE_HARDLOCKUP_DETECTOR_ARCH
Another problem is that the meaning of HAVE_NMI_WATCHDOG is far from clear
without reading and understanding the history.
Make the logic less tricky and more self-explanatory by making
HAVE_NMI_WATCHDOG specific for the sparc64 implementation. And rename it to
HAVE_HARDLOCKUP_DETECTOR_SPARC64.
Note that HARDLOCKUP_DETECTOR_PREFER_BUDDY, HARDLOCKUP_DETECTOR_PERF,
and HARDLOCKUP_DETECTOR_BUDDY may conflict only with
HAVE_HARDLOCKUP_DETECTOR_ARCH. They depend on HARDLOCKUP_DETECTOR
and it is no longer enabled when HAVE_NMI_WATCHDOG is set.
Link:
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit a5fcc2367e223c45c78a882438c2b8e13fe0f580)
Conflicts:
arch/sparc/Kconfig
Trivial context conflict because we don't have commit fcbfe8121a45
("Kconfig: introduce HAS_IOPORT option and select it as
necessary"). It's not totally trivial to pick that so BACKPORT seems
better.
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Cq-Depend: chromium:4673847
Change-Id: I3b19937946bf3c16b8ed82689207fe4ba26e572d
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
M arch/Kconfig
M arch/sparc/Kconfig
M arch/sparc/Kconfig.debug
M include/linux/nmi.h
M kernel/watchdog.c
M lib/Kconfig.debug
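The two Kconfig expressions quoted in the commit message above can be checked against each other by brute force. A sketch in Python that treats each symbol as a plain boolean (a simplification; real Kconfig symbols are tristate and have further dependencies) and confirms that the sparc64-special case never satisfies the old "depends on" for CONFIG_HARDLOCKUP_DETECTOR:

```python
from itertools import product


def hardlockup_detector_allowed(have_perf, have_buddy, have_nmi_watchdog, have_arch):
    """The pre-cleanup 'depends on' expression quoted in the commit message."""
    return ((have_perf or have_buddy) and not have_nmi_watchdog) or have_arch


def is_sparc64_special(have_nmi_watchdog, have_arch):
    """How the old sparc64 implementation was identified before the rename."""
    return have_nmi_watchdog and not have_arch


def sparc64_never_allows_generic_switch():
    """Enumerate all 16 combinations and look for a counterexample."""
    for have_perf, have_buddy, nmi, arch in product([False, True], repeat=4):
        if is_sparc64_special(nmi, arch) and hardlockup_detector_allowed(
            have_perf, have_buddy, nmi, arch
        ):
            return False
    return True
```

This is only a truth-table check of the expressions as written in the message, not a replacement for running Kconfig.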
ap...@google.com <ap...@google.com> #35
Branch: chromeos-6.1
commit d8ff3afc17cb59b119cc814969a197753bbdbce9
Author: Petr Mladek <pmladek@suse.com>
Date: Fri Jun 16 17:06:15 2023
UPSTREAM: watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h
arch_touch_nmi_watchdog() needs a different implementation for various
hardlockup detector implementations. And it does nothing when
no hardlockup detector is built at all.
arch_touch_nmi_watchdog() is declared via linux/nmi.h. And it must be
defined as an empty function when there is no hardlockup detector.
It is done directly in this header file for the perf and buddy detectors.
And it is done in the included asm/nmi.h for arch specific detectors.
The reason probably is that the arch specific variants build the code
using other conditions. For example, powerpc64/sparc64 builds the code
when CONFIG_PPC_WATCHDOG is enabled.
Another reason might be that these architectures define more functions
in asm/nmi.h anyway.
However the generic code actually knows when the function will be
implemented. It happens when some full featured or the sparc64-specific
hardlockup detector is built.
In particular, CONFIG_HARDLOCKUP_DETECTOR can be enabled only when
a generic or arch-specific full featured hardlockup detector is available.
The only exception is sparc64 which can be built even when the global
HARDLOCKUP_DETECTOR switch is disabled.
The information about sparc64 is a bit complicated. The hardlockup
detector is built there when CONFIG_HAVE_NMI_WATCHDOG is set and
CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH is not set.
People might wonder whether this change really makes things easier.
The motivation is:
+ The current logic in linux/nmi.h is far from obvious.
For example, arch_touch_nmi_watchdog() is defined as {} when
neither CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER nor
CONFIG_HAVE_NMI_WATCHDOG is defined.
+ The change synchronizes the checks in lib/Kconfig.debug and
in the generic code.
+ It is a step that will help clean up HAVE_NMI_WATCHDOG-related
checks.
The change should not change the existing behavior.
Link:
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 0c68bda69665307bf835b0c433363e5073608c95)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I70a2531ec47a1975867960c6fe81c85a2b9e2f3f
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
M arch/powerpc/include/asm/nmi.h
M arch/sparc/include/asm/nmi.h
M include/linux/nmi.h
ap...@google.com <ap...@google.com> #36
Branch: chromeos-6.1
commit 654fc04d79f46b1810cdb4d0f66c1d860e4b8010
Author: Petr Mladek <pmladek@suse.com>
Date: Fri Jun 16 17:06:14 2023
UPSTREAM: watchdog/hardlockup: make the config checks more straightforward
There are four possible variants of hardlockup detectors:
+ buddy: available when SMP is set.
+ perf: available when HAVE_HARDLOCKUP_DETECTOR_PERF is set.
+ arch-specific: available when HAVE_HARDLOCKUP_DETECTOR_ARCH is set.
+ sparc64 special variant: available when HAVE_NMI_WATCHDOG is set
and HAVE_HARDLOCKUP_DETECTOR_ARCH is not set.
The check for the sparc64 variant is more complicated because
HAVE_NMI_WATCHDOG is used to #ifdef code used by both arch-specific
and sparc64 specific variant. Therefore it is automatically
selected with HAVE_HARDLOCKUP_DETECTOR_ARCH.
This complexity is partly hidden in HAVE_HARDLOCKUP_DETECTOR_NON_ARCH.
It reduces the size of some checks but it makes them harder to follow.
Finally, the other temporary variable HARDLOCKUP_DETECTOR_NON_ARCH
is used to re-compute HARDLOCKUP_DETECTOR_PERF/BUDDY when the global
HARDLOCKUP_DETECTOR switch is enabled/disabled.
Make the logic more straightforward by the following changes:
+ Better explain the role of HAVE_HARDLOCKUP_DETECTOR_ARCH and
HAVE_NMI_WATCHDOG in comments.
+ Add HAVE_HARDLOCKUP_DETECTOR_BUDDY so that there is separate
HAVE_* for all four hardlockup detector variants.
Use it in the other conditions instead of SMP. It makes it
clear that it is about the buddy detector.
+ Open code HAVE_HARDLOCKUP_DETECTOR_NON_ARCH in HARDLOCKUP_DETECTOR
and HARDLOCKUP_DETECTOR_PREFER_BUDDY. It helps to understand
the conditions between the four hardlockup detector variants.
+ Define the exact conditions when HARDLOCKUP_DETECTOR_PERF/BUDDY
can be enabled. It explains the dependency on the other
hardlockup detector variants.
It also allows removing HARDLOCKUP_DETECTOR_NON_ARCH by using "imply".
It triggers re-evaluating HARDLOCKUP_DETECTOR_PERF/BUDDY when
the global HARDLOCKUP_DETECTOR switch is changed.
+ Add dependency on HARDLOCKUP_DETECTOR so that the affected variables
disappear when the hardlockup detectors are disabled.
Another nice side effect is that HARDLOCKUP_DETECTOR_PREFER_BUDDY
value is not preserved when the global switch is disabled.
The user has to make the decision again when it gets re-enabled.
Link:
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 1356d0b966e7ed81832af35478b913495cf7792e)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I9060d61fd2eeb85d2c579c39ab4251a8a25d2778
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
M arch/Kconfig
M lib/Kconfig.debug
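The four availability rules listed at the top of the commit message above translate directly into code. A minimal Python sketch, with each Kconfig symbol reduced to a boolean argument; the function name and the variant labels are illustrative only:

```python
def available_variants(smp, have_perf, have_arch, have_nmi_watchdog):
    """Return the hardlockup detector variants that are available,
    following the four bullets in the commit message (before the
    HAVE_HARDLOCKUP_DETECTOR_BUDDY symbol replaces the SMP check)."""
    variants = []
    if smp:
        variants.append("buddy")
    if have_perf:
        variants.append("perf")
    if have_arch:
        variants.append("arch-specific")
    # HAVE_NMI_WATCHDOG is also selected by HAVE_HARDLOCKUP_DETECTOR_ARCH,
    # so the sparc64 variant needs the extra "and not have_arch" check.
    if have_nmi_watchdog and not have_arch:
        variants.append("sparc64")
    return variants
```

The last condition is the one the message calls "more complicated"; giving every variant its own HAVE_* symbol removes the need for that kind of negative check.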
ap...@google.com <ap...@google.com> #37
Branch: chromeos-6.1
commit cae44ebbf2c880980fe934a252725fe4ecc73683
Author: Petr Mladek <pmladek@suse.com>
Date: Fri Jun 16 17:06:13 2023
UPSTREAM: watchdog/hardlockup: sort hardlockup detector related config values a logical way
Patch series "watchdog/hardlockup: Cleanup configuration of hardlockup
detectors", v2.
Clean up watchdog Kconfig after introducing the buddy detector.
This patch (of 6):
There are four possible variants of hardlockup detectors:
+ buddy: available when SMP is set.
+ perf: available when HAVE_HARDLOCKUP_DETECTOR_PERF is set.
+ arch-specific: available when HAVE_HARDLOCKUP_DETECTOR_ARCH is set.
+ sparc64 special variant: available when HAVE_NMI_WATCHDOG is set
and HAVE_HARDLOCKUP_DETECTOR_ARCH is not set.
Only one hardlockup detector can be compiled in. The selection is done
using quite complex dependencies between several CONFIG variables.
The following patches will try to make it more straightforward.
As a first step, reorder the definitions of the various CONFIG variables.
The logical order is:
1. HAVE_* variables define available variants. They are typically
defined in the arch/ config files.
2. HARDLOCKUP_DETECTOR y/n variable defines whether the hardlockup
detector is enabled at all.
3. HARDLOCKUP_DETECTOR_PREFER_BUDDY y/n variable defines whether
the buddy detector should be preferred over the perf one.
Note that the arch specific variants are always preferred when
available.
4. HARDLOCKUP_DETECTOR_PERF/BUDDY variables define whether the given
detector is enabled in the end.
5. HAVE_HARDLOCKUP_DETECTOR_NON_ARCH and HARDLOCKUP_DETECTOR_NON_ARCH
are temporary variables that are going to be removed in
a followup patch.
This is a preparation step for further cleanup. It will change the logic
without shuffling the definitions.
This change temporarily breaks the C-like ordering where the variables are
declared or defined before they are used. It is not really needed for
Kconfig. Also the following patches will rework the logic so that
the ordering will be C-like in the end.
The patch just shuffles the definitions. It should not change the existing
behavior.
Link:
Link:
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 4917a25f83a8dc95eafd0107be87d4340f48d265)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I94afe8404ce86b320470ba554dddf0b7a81648bc
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
M lib/Kconfig.debug
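The logical order described in the commit message above (HAVE_* first, then the global switch, then the preference, then the final PERF/BUDDY value) can be read as a small decision procedure. A simplified Python model, assuming boolean inputs and leaving out the sparc64 special variant; it is a sketch of the description, not of the actual Kconfig rules:

```python
def select_detector(have_perf, have_buddy, have_arch, enabled, prefer_buddy=False):
    """Return which single detector ends up built, or None.

    Step 1: the have_* arguments say which variants are available.
    Step 2: `enabled` models the global HARDLOCKUP_DETECTOR switch.
    Step 3: `prefer_buddy` models HARDLOCKUP_DETECTOR_PREFER_BUDDY.
    Step 4: the return value models HARDLOCKUP_DETECTOR_PERF/BUDDY.
    """
    if not enabled:
        return None
    # Arch-specific variants are always preferred when available.
    if have_arch:
        return "arch"
    # The preference only matters when there is a real choice.
    if have_perf and have_buddy:
        return "buddy" if prefer_buddy else "perf"
    if have_perf:
        return "perf"
    if have_buddy:
        return "buddy"
    return None
```

For example, with both perf and buddy available the model picks perf unless prefer_buddy is set, which matches the concern about arm64 raised earlier in this thread.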
ap...@google.com <ap...@google.com> #38
Branch: chromeos-6.1
commit 4ab0e1bae15c927f8f119ab50a31bfd6626256af
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 26 18:41:39 2023
UPSTREAM: watchdog/hardlockup: move SMP barriers from common code to buddy code
It's been suggested that since the SMP barriers are only potentially
useful for the buddy hardlockup detector, not the perf hardlockup
detector, that the barriers belong in the buddy code. Let's move them and
add clearer comments about why they're needed.
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 28168eca3297d68faa8a9433ec93cb6acf06d2f4)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I5ab0a0eeb0bd52fb23f901d298c72fa5c396e22b
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
Tested-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
M kernel/watchdog.c
M kernel/watchdog_buddy.c
ap...@google.com <ap...@google.com> #39
Branch: chromeos-6.1
commit 3a8bd2f79511647673da87b86903a6d146106838
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 26 18:41:38 2023
UPSTREAM: watchdog/buddy: simplify the dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY
The dependency for HARDLOCKUP_DETECTOR_PREFER_BUDDY was more complicated
than it needed to be. If the "perf" detector is available and we have SMP
then we have a choice, so enable the config based on just those two config
items.
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 7ece48b7b4a22c1b2d59d7ab8ebcbacbfcaa7872)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I49d5b483336b65b8acb1e5066548a05260caf809
Reviewed-on:
Tested-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
M lib/Kconfig.debug
ap...@google.com <ap...@google.com> #40
Branch: chromeos-6.1
commit 73f191705cb528b05a170910583b0cf13427e39c
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 26 18:41:37 2023
UPSTREAM: watchdog/buddy: don't copy the cpumask in watchdog_next_cpu()
There's no reason to make a copy of the "watchdog_cpus" locally in
watchdog_next_cpu(). Making a copy wouldn't make things any more race
free and we're just reading the value so there's no need for a copy.
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 813efda23934edcad96343fc96727017378c3fe9)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: If466f9a2b50884cbf6a1d8ad05525a2c17069407
Reviewed-on:
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
M kernel/watchdog_buddy.c
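For readers unfamiliar with the buddy scheme, the lookup that watchdog_next_cpu() performs can be modelled in a few lines. This is a Python sketch of the behaviour, assuming the set of watched CPUs is a plain set of integers; it is not a translation of kernel/watchdog_buddy.c. As in the commit above, the set is only read, never copied:

```python
def watchdog_next_cpu(cpu, watchdog_cpus):
    """Return the next CPU after `cpu` in `watchdog_cpus`, wrapping around
    to the lowest-numbered one. Return None if there is no other CPU to
    watch. `watchdog_cpus` is read but never modified or copied."""
    later = [c for c in watchdog_cpus if c > cpu]
    if later:
        candidate = min(later)
    elif watchdog_cpus:
        candidate = min(watchdog_cpus)  # wrap around to the first CPU
    else:
        return None
    # A CPU cannot be its own buddy.
    return None if candidate == cpu else candidate
```

With CPUs {0, 1, 3} watched, CPU 1 checks on CPU 3 and CPU 3 wraps around to check on CPU 0.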
ap...@google.com <ap...@google.com> #41
Branch: chromeos-6.1
commit fbfda64bc4713885318d72771f3a6b770537603f
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 26 18:41:36 2023
UPSTREAM: watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called
In the patch ("watchdog/hardlockup: detect hard lockups using secondary
(buddy) CPUs"), we added a call from the common watchdog.c file into the
buddy. That call could be done more cleanly. Specifically:
1. If we move the call into watchdog_hardlockup_kick() then it keeps
watchdog_timer_fn() simpler.
2. We don't need to pass an "unsigned long" to the buddy for the timer
count. In the patch ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") the count was changed to "atomic_t"
which is backed by an int, so we should match types.
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit d3b62ace0f097f1d863fb6c41df3c61503e4ec9e)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I006c7d958a1ea5c4e1e4dc44a25596d9bb5fd3ba
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
M include/linux/nmi.h
M kernel/watchdog.c
M kernel/watchdog_buddy.c
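The call path cleaned up in the commit above is easier to follow with a toy model: each CPU's timer "kick" bumps its own interrupt count and then checks whether its buddy's count has moved since the last look. The Python sketch below is an assumption-laden illustration (class and attribute names are made up, and it checks on every kick, whereas a real detector would check less often to tolerate timer skew):

```python
class BuddyModel:
    """Toy model of buddy hard lockup detection on a fixed set of CPUs."""

    def __init__(self, cpus):
        self.cpus = sorted(cpus)
        # Per-CPU timer interrupt counts, plain ints as in the commit text.
        self.hrtimer_interrupts = {cpu: 0 for cpu in self.cpus}
        # Last count each CPU was seen at; None until first checked.
        self.saved = {cpu: None for cpu in self.cpus}

    def _buddy(self, cpu):
        i = self.cpus.index(cpu)
        return self.cpus[(i + 1) % len(self.cpus)]

    def kick(self, cpu):
        """Model one timer tick on `cpu`. Return the buddy CPU if it looks
        locked up (its count did not advance), otherwise None."""
        self.hrtimer_interrupts[cpu] += 1
        buddy = self._buddy(cpu)
        if buddy == cpu:
            return None  # single-CPU system: nothing to watch
        count = self.hrtimer_interrupts[buddy]
        if self.saved[buddy] is not None and count == self.saved[buddy]:
            return buddy
        self.saved[buddy] = count
        return None
```

Driving the model with ticks on CPU 0 and CPU 1 and then letting CPU 1 stop ticking makes the next kick on CPU 0 report CPU 1.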
ap...@google.com <ap...@google.com> #42
Branch: chromeos-6.1
commit fa34dc36de09e4f68637d93946f525eabefe4b65
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 26 18:41:35 2023
UPSTREAM: watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()
In the patch ("watchdog/hardlockup: add comments to touch_nmi_watchdog()")
we adjusted some comments for touch_nmi_watchdog(). The comment about the
softlockup had a typo and was also felt to be too obvious. Remove it.
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 05e7b558766114aa9c3d5d3af188a5c574809661)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Ia593afc9eb12082d55ea6681dc2c5a89677f20a8
Reviewed-on:
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
M include/linux/nmi.h
ap...@google.com <ap...@google.com> #43
Branch: chromeos-6.1
commit 60307aac1f45f92a9ad3e79e1c000049965a2f81
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 26 18:41:34 2023
UPSTREAM: watchdog/hardlockup: in watchdog_hardlockup_check() use cpumask_copy()
In the patch ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") we started using a cpumask to keep track of
which CPUs to backtrace. When setting up this cpumask, it's better to use
cpumask_copy() than to just copy the structure directly. Fix this.
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 7a71d8e650b06833095e7a0d4206585e8585c00f)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Iccee2d1ea19114dafb6553a854ea4d8ab2a3f25b
Reviewed-on:
Tested-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
M kernel/watchdog.c
ap...@google.com <ap...@google.com> #44
Branch: chromeos-6.1
commit 477909d775461df94828760f4300898ddd3eca16
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 26 18:41:33 2023
UPSTREAM: watchdog/hardlockup: don't use raw_cpu_ptr() in watchdog_hardlockup_kick()
In the patch ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") there was no reason to use raw_cpu_ptr().
Using this_cpu_ptr() works fine.
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 2711e4adef4fac2eeaee66e3c22a2f75ee86e7b3)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I660e103077dcc23bb29aaf2be09cb234e0495b2d
Reviewed-on:
Tested-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
M kernel/watchdog.c
ap...@google.com <ap...@google.com> #45
Branch: chromeos-6.1
commit 5b439e2dab042db4a7cad7f619dd0bd45a5f62e2
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 26 18:41:32 2023
UPSTREAM: watchdog/hardlockup: HAVE_NMI_WATCHDOG must implement watchdog_hardlockup_probe()
Right now there is one arch (sparc64) that selects HAVE_NMI_WATCHDOG
without selecting HAVE_HARDLOCKUP_DETECTOR_ARCH. Because of that one
architecture, we have some special case code in the watchdog core to
handle the fact that watchdog_hardlockup_probe() isn't implemented.
Let's implement watchdog_hardlockup_probe() for sparc64 and get rid of the
special case.
As a side effect of doing this, code inspection tells us that we could fix
a minor bug where the system won't properly realize that NMI watchdogs are
disabled. Specifically, on powerpc if CONFIG_PPC_WATCHDOG is turned off
the arch might still select CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH which
selects CONFIG_HAVE_NMI_WATCHDOG. Since CONFIG_PPC_WATCHDOG was off then
nothing will override the "weak" watchdog_hardlockup_probe() and we'll
fall back to looking at CONFIG_HAVE_NMI_WATCHDOG.
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Suggested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 6426e8d1f27417834ea37e75a9ead832d1cf7713)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Ic6ebbf307ca0efe91f08ce2c1eb4a037ba6b0700
Reviewed-on:
Reviewed-by: Sean Paul <sean@poorly.run>
Tested-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
M arch/Kconfig
M arch/sparc/kernel/nmi.c
M kernel/watchdog.c
ap...@google.com <ap...@google.com> #46
Branch: chromeos-6.1
commit e420c01f44cd9de3e7bdedcc44ec72c7a2253882
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 26 18:41:31 2023
UPSTREAM: watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails
Patch series "watchdog: Cleanup / fixes after buddy series v5 reviews".
This patch series attempts to finish resolving the feedback received
from Petr Mladek on the v5 series I posted.
Probably the only thing that wasn't fully as clean as Petr requested was
the Kconfig stuff. I couldn't find a better way to express it without a
more major overhaul. At the very least, I renamed "NON_ARCH" to
"PERF_OR_BUDDY" in the hopes that will make it marginally better.
Nothing in this series is terribly critical and even the bugfixes are
small. However, it does clean up a few things that were pointed out in
review.
This patch (of 10):
The permissions for the kernel.nmi_watchdog sysctl have always been set at
compile time despite the fact that a watchdog can fail to probe. Let's
fix this and set the permissions based on whether the hardlockup detector
actually probed.
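As a rough illustration of the idea (not the kernel's code), the sketch below starts the entry read-only and only makes it writable once a detector has probed. The struct, the function name, and the writable mode of 0644 are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a sysctl table entry. */
struct sysctl_entry {
	const char *name;
	unsigned int mode;
};

/* Read-only by default: nothing is known to have probed yet. */
static struct sysctl_entry nmi_watchdog_entry = {
	.name = "nmi_watchdog",
	.mode = 0444,
};

/* Called once the probe result is known; 0644 is assumed as the writable mode. */
static void fixup_nmi_watchdog_mode(struct sysctl_entry *entry, bool probed)
{
	assert(entry != NULL);
	if (probed)
		entry->mode = 0644;
}
```

With this shape, a failed probe leaves the entry at 0444, which matches the behavior named in the subject line.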
Link:
Link:
Fixes: a994a3147e4c ("watchdog/hardlockup/perf: Implement init time detection of perf")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reported-by: Petr Mladek <pmladek@suse.com>
Closes:
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 9ec272c586b07d1abf73438524bd12b1df9c5f9b)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I0d75971cc52a7283f495aac0bd5c3041aadc734e
Reviewed-on:
Reviewed-by: Sean Paul <sean@poorly.run>
Tested-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
M include/linux/nmi.h
M kernel/watchdog.c
ap...@google.com <ap...@google.com> #47
Branch: chromeos-6.1
commit b1406028f28961cb93b3ad6039f239748a3c9fa6
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 19 10:18:38 2023
UPSTREAM: watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs
Implement a hardlockup detector that doesn't need any extra
arch-specific support code to detect lockups. Instead of using something
arch-specific we will use the buddy system, where each CPU watches out for
another one. Specifically, each CPU will use its softlockup hrtimer to
check that the next CPU is processing hrtimer interrupts by verifying that
a counter is increasing.
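A minimal userspace sketch of the buddy check described above, assuming a fixed set of four always-online CPUs. The names watchdog_kick() and buddy_is_locked_up() and the plain arrays are hypothetical; the real detector has to cope with CPUs going offline and uses per-CPU data.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical small SMP system */

/* Bumped by each CPU from its own hrtimer callback. */
static int hrtimer_interrupts[NR_CPUS];
/* Last counter value observed for each CPU by the CPU watching it. */
static int hrtimer_interrupts_saved[NR_CPUS];

/* The buddy is simply the next CPU, wrapping around at the end. */
static unsigned int buddy_of(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return (cpu + 1) % NR_CPUS;
}

/* What a CPU does to show that it is still handling timer interrupts. */
static void watchdog_kick(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	hrtimer_interrupts[cpu]++;
}

/* Run on "cpu": true if its buddy's counter has not moved since last check. */
static bool buddy_is_locked_up(unsigned int cpu)
{
	unsigned int buddy = buddy_of(cpu);
	int now = hrtimer_interrupts[buddy];

	if (now == hrtimer_interrupts_saved[buddy])
		return true;

	hrtimer_interrupts_saved[buddy] = now;
	return false;
}
```

In this sketch a CPU that stops calling watchdog_kick() is reported by the CPU just before it in the ring on that CPU's next check.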
NOTE: unlike the other hard lockup detectors, the buddy one can't easily
show what's happening on the CPU that locked up just by doing a simple
backtrace. It relies on some other mechanism in the system to get
information about the locked up CPUs. This could be support for NMI
backtraces like [1], it could be a mechanism for printing the PC of locked
CPUs at panic time like [2] / [3], or it could be something else. Even
though that means we still rely on arch-specific code, this arch-specific
code seems to often be implemented even on architectures that don't have a
hardlockup detector.
This style of hardlockup detector originated in some downstream Android
trees and has been rebased on / carried in ChromeOS trees for quite a long
time for use on arm and arm64 boards. Historically on these boards we've
leveraged mechanism [2] / [3] to get information about hung CPUs, but we
could move to [1].
Although the original motivation for the buddy system was for use on
systems without an arch-specific hardlockup detector, it can still be
useful to use even on systems that _do_ have an arch-specific hardlockup
detector. On x86, for instance, there is a 24-part patch series [4] in
progress switching the arch-specific hard lockup detector from a scarce
perf counter to a less-scarce hardware resource. Potentially the buddy
system could be a simpler alternative to free up the perf counter but
still get hard lockup detection.
Overall, pros (+) and cons (-) of the buddy system compared to an
arch-specific hardlockup detector (which might be implemented using
perf):
+ The buddy system is usable on systems that don't have an
arch-specific hardlockup detector, like arm32 and arm64 (though it's
being worked on for arm64 [5]).
+ The buddy system may free up scarce hardware resources.
+ If a CPU totally goes out to lunch (can't process NMIs) the buddy
system could still detect the problem (though it would be unlikely
to be able to get a stack trace).
+ The buddy system uses the same timer function to pet the hardlockup
detector on the running CPU as it uses to detect hardlockups on
other CPUs. Compared to other hardlockup detectors, this means it
generates fewer interrupts and thus is likely better able to let
CPUs stay idle longer.
- If all CPUs are hard locked up at the same time the buddy system
can't detect it.
- If we don't have SMP we can't use the buddy system.
- The buddy system needs an arch-specific mechanism (possibly NMI
backtrace) to get info about the locked up CPU.
[1]
[2]
[3]
[4]
[5]
Link:
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Tzung-Bi Shih <tzungbi@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 1f423c905a6b43b493df1b259e6e6267e5624e62)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I6bf789d21d0c3d75d382e7e51a804a7a51315f2c
Reviewed-on:
Tested-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
M include/linux/nmi.h
M kernel/Makefile
M kernel/watchdog.c
A kernel/watchdog_buddy.c
M lib/Kconfig.debug
ap...@google.com <ap...@google.com> #48
Branch: chromeos-6.1
commit 07937693e8c6553095b4a5d5bdde9d3c334ee5dc
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 19 10:18:37 2023
UPSTREAM: watchdog/hardlockup: have the perf hardlockup use __weak functions more cleanly
The fact that watchdog_hardlockup_enable(),
watchdog_hardlockup_disable(), and watchdog_hardlockup_probe() are
declared __weak means that the configured hardlockup detector can define
non-weak versions of those functions if it needs to. Instead of doing
this, the perf hardlockup detector hooked itself into the default __weak
implementation, which was a bit awkward. Clean this up.
From comments, it looks as if the original design was done because the
__weak functions were expected to be implemented by the architecture and not
by the configured hardlockup detector. This got awkward when we tried to
add the buddy lockup detector which was not arch-specific but wanted to
hook into those same functions.
This is not expected to have any functional impact.
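For readers unfamiliar with the pattern, here is a small sketch of a weak default, assuming a GCC or Clang toolchain. The -ENODEV return value and the hardlockup_detector_available() helper are assumptions for illustration; only the function name watchdog_hardlockup_probe() comes from the message above.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Weak default: the linker uses this only when no other object file
 * provides a strong (non-weak) definition of the same symbol. A detector
 * would supply its own strong version in its own source file.
 */
__attribute__((weak)) int watchdog_hardlockup_probe(void)
{
	return -ENODEV;	/* assumed default: no detector available */
}

/* Hypothetical caller in common code; it never needs to know who won. */
static bool hardlockup_detector_available(void)
{
	return watchdog_hardlockup_probe() == 0;
}
```

Built on its own, this file reports that no detector is available. Linking in another file with a non-weak watchdog_hardlockup_probe() would replace the default without touching the caller.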
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit d9b3629ade8ebffb0075e311409796a56bac8282)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I847d9ec852449350997ba00401d2462a9cb4302b
Reviewed-on:
Tested-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
M include/linux/nmi.h
M kernel/watchdog.c
M kernel/watchdog_perf.c
ap...@google.com <ap...@google.com> #49
Branch: chromeos-6.1
commit 1a46e6ec597bd31668c7daf1201616195dc2f57e
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 19 10:18:36 2023
UPSTREAM: watchdog/hardlockup: rename some "NMI watchdog" constants/function
Do a search and replace of:
- NMI_WATCHDOG_ENABLED => WATCHDOG_HARDLOCKUP_ENABLED
- SOFT_WATCHDOG_ENABLED => WATCHDOG_SOFTOCKUP_ENABLED
- watchdog_nmi_ => watchdog_hardlockup_
- nmi_watchdog_available => watchdog_hardlockup_available
- nmi_watchdog_user_enabled => watchdog_hardlockup_user_enabled
- soft_watchdog_user_enabled => watchdog_softlockup_user_enabled
- NMI_WATCHDOG_DEFAULT => WATCHDOG_HARDLOCKUP_DEFAULT
Then update a few comments near where names were changed.
This is specifically to make it less confusing when we want to introduce
the buddy hardlockup detector, which isn't using NMIs. As part of this,
we sanitized a few names for consistency.
[trix@redhat.com: make variables static]
Link:
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Tom Rix <trix@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit df95d3085caa5b99a60eb033d7ad6c2ff2b43dbf)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I91f7277bab4bf8c0cb238732ed92e7ce7bbd71a6
Reviewed-on:
Reviewed-by: Sean Paul <sean@poorly.run>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
M arch/powerpc/include/asm/nmi.h
M arch/powerpc/kernel/watchdog.c
M arch/powerpc/platforms/pseries/mobility.c
M arch/sparc/kernel/nmi.c
M include/linux/nmi.h
M kernel/watchdog.c
M kernel/watchdog_perf.c
ap...@google.com <ap...@google.com> #50
Branch: chromeos-6.1
commit e24c5ac90a55e751828c0f0bd069c9d0dc5a6117
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 19 10:18:35 2023
UPSTREAM: watchdog/hardlockup: move perf hardlockup watchdog petting to watchdog.c
In preparation for the buddy hardlockup detector, which wants the same
petting logic as the current perf hardlockup detector, move the code to
watchdog.c. While doing this, rename the global variable to match others
nearby. As part of this change we have to change the code to account for
the fact that the CPU we're running on might be different than the one
we're checking.
Currently the code in watchdog.c is guarded by
CONFIG_HARDLOCKUP_DETECTOR_PERF, which makes this change seem silly.
However, a future patch will change this.
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit ed92e1ef52224c7c9c15fba559448396b059c2ee)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I00dfd6386ee00da25bf26d140559a41339b53e57
Reviewed-on:
Tested-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
M include/linux/nmi.h
M kernel/watchdog.c
M kernel/watchdog_perf.c
ap...@google.com <ap...@google.com> #51
Branch: chromeos-6.1
commit c627cf25556bc8a4fbdeb0d6b92f9294988adf94
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 19 10:18:34 2023
UPSTREAM: watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()
In preparation for the buddy hardlockup detector where the CPU checking
for lockup might not be the currently running CPU, add a "cpu" parameter
to watchdog_hardlockup_check().
As part of this change, make hrtimer_interrupts an atomic_t since now the
CPU incrementing the value and the CPU reading the value might be
different. Technially this could also be done with just READ_ONCE and
WRITE_ONCE, but atomic_t feels a little cleaner in this case.
While hrtimer_interrupts is made atomic_t, we change
hrtimer_interrupts_saved from "unsigned long" to "int". The "int" is
needed to match the data type backing atomic_t for hrtimer_interrupts.
Even if this changes us from 64-bits to 32-bits (which I don't think is
true for most compilers), it doesn't really matter. All we ever do is
increment it every few seconds and compare it to an old value so 32-bits
is fine (even 16-bits would be). The "signed" vs "unsigned" also doesn't
matter for simple equality comparisons.
hrtimer_interrupts_saved is _not_ switched to atomic_t nor even accessed
with READ_ONCE / WRITE_ONCE. The hrtimer_interrupts_saved is always
consistently accessed with the same CPU. NOTE: with the upcoming "buddy"
detector there is one special case. When a CPU goes offline/online then
we can change which CPU is the one to consistently access a given instance
of hrtimer_interrupts_saved. We still can't end up with a partially
updated hrtimer_interrupts_saved, however, because we end up petting all
affected CPUs to make sure the new and old CPU can't somehow end up
reading/writing hrtimer_interrupts_saved at the same time.
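The split between the two variables can be sketched in userspace with C11 atomics standing in for the kernel's atomic_t. The function names are hypothetical and relaxed ordering is an assumption of this sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Written by the watched CPU, read by a different CPU: must be atomic. */
static atomic_int hrtimer_interrupts;
/* Only ever touched by the watching CPU: a plain int is enough. */
static int hrtimer_interrupts_saved;

/* Runs on the watched CPU each time its timer fires. */
static void watched_cpu_timer(void)
{
	atomic_fetch_add_explicit(&hrtimer_interrupts, 1, memory_order_relaxed);
}

/* Runs on the watching CPU; only equality with the old value matters. */
static bool watcher_sees_lockup(void)
{
	int now = atomic_load_explicit(&hrtimer_interrupts, memory_order_relaxed);

	if (now == hrtimer_interrupts_saved)
		return true;

	hrtimer_interrupts_saved = now;
	return false;
}
```

Because the check is a plain equality test against the previous sample, neither the width nor the signedness of the counter affects the result.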
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 77c12fc95980d100fdc49e88a5727c242d0dfedc)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I3a7d4dd8c23ac30ee0b607d77feb6646b64825c0
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
M include/linux/nmi.h
M kernel/watchdog.c
M kernel/watchdog_perf.c
ap...@google.com <ap...@google.com> #52
Branch: chromeos-6.1
commit b828a2620013a42e676dc5ee1d838405384240ef
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 19 10:18:33 2023
UPSTREAM: watchdog/hardlockup: style changes to watchdog_hardlockup_check() / is_hardlockup()
These are tiny style changes:
- Add a blank line before a "return".
- Rename two globals to use the "watchdog_hardlockup" prefix.
- Store processor id in "unsigned int" rather than "int".
- Minor comment rewording.
- Use "else" rather than extra returns since it seemed more symmetric.
Link:
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 1610611aadc2241179e7090ba2f3fd0d763d9932)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I818492c326b632560b09f20d2608455ecf9d3650
Reviewed-on:
Tested-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
M kernel/watchdog.c
ap...@google.com <ap...@google.com> #53
Branch: chromeos-6.1
commit 12abb46c748583fe636612e02c3c62ac8a342ccf
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 19 10:18:32 2023
UPSTREAM: watchdog/hardlockup: move perf hardlockup checking/panic to common watchdog.c
The perf hardlockup detector works by looking at interrupt counts and
seeing if they change from run to run. The interrupt counts are managed
by the common watchdog code via its watchdog_timer_fn().
Currently the API between the perf detector and the common code is a
function: is_hardlockup(). When the hard lockup detector sees that
function return true then it handles printing out debug info and inducing
a panic if necessary.
Let's change the API a little bit in preparation for the buddy hardlockup
detector. The buddy hardlockup detector wants to print nearly the same
debug info and have nearly the same panic behavior. That means we want to
move all that code to the common file. For now, the code in the common
file will only be there if the perf hardlockup detector is enabled, but
eventually it will be selected by a common config.
Right now, this _just_ moves the code from the perf detector file to the
common file and changes the names. It doesn't make the changes that the
buddy hardlockup detector will need and doesn't do any style cleanups. A
future patch will do cleanup to make it more obvious what changed.
With the above, we no longer have any callers of is_hardlockup() outside
of the "watchdog.c" file, so we can remove it from the header, make it
static, and move it to the same "#ifdef" block as our new
watchdog_hardlockup_check(). While doing this, it can be noted that the
existing code still had the code for counting/checking
"hrtimer_interrupts" even if no hardlockup detectors were configured. We
didn't need to do that, so move all the "hrtimer_interrupts" counting to
only be there if the perf hardlockup detector is configured.
This change is expected to be a no-op.
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 81972551df9d168a8183b786ff4de06008469c2e)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Id4133d3183e798122dc3b6205e7852601f289071
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
M include/linux/nmi.h
M kernel/watchdog.c
M kernel/watchdog_perf.c
di...@google.com <di...@google.com> #54
I'm going to call this done. Woohoo! Special thanks to Guenter for giving the +2 and suffering the email spam. Also (of course) happy for all the help that folks upstream gave.
If someone has a burning desire to have the upstream version of the buddy detector pushed to an older kernel, let me know. I think our old downstream one ought to be fine, though. The upstream one might give slightly nicer output, but I don't think it's important enough to backport a ton of patches for.
ap...@google.com <ap...@google.com> #55
Branch: main
commit 80da8860a3c81aa40fde97018176474458bdaeee
Author: Douglas Anderson <dianders@chromium.org>
Date: Wed Apr 19 13:09:47 2023
qualcomm+mediatek_defconfig: Turn on the buddy hardlockup detector
Now that the buddy lockup detector is upstream, we can enable it in
our fallback config.
In the qualcomm defconfig, this also moves a few configs to the
location that they end up in when you run `savedefconfig` with the latest
upstream kernel.
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Ia16de1737106471339f0b160912130fb95075709
Reviewed-on:
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
M eclass/cros-kernel/mediatek_defconfig
M eclass/cros-kernel/qualcomm_defconfig
ap...@google.com <ap...@google.com> #56
Branch: chromeos-6.1
commit 5afbac24cd69b5b57cbab8e0fe856a04adc12ba7
Author: Linux Patches Robot <linux-patches-robot@chromeos-missing-patches.google.com.iam.gserviceaccount.com>
Date: Wed Aug 30 01:40:35 2023
UPSTREAM: watchdog/hardlockup: avoid large stack frames in watchdog_hardlockup_check()
After commit 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") we started storing a `struct cpumask` on the
stack in watchdog_hardlockup_check(). On systems with CONFIG_NR_CPUS set
to 8192 this takes up 1K on the stack. That triggers warnings with
`CONFIG_FRAME_WARN` set to 1024.
We'll use the new trigger_allbutcpu_cpu_backtrace() to avoid needing to
use a CPU mask at all.
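The 1K figure follows from one bit per possible CPU. The arithmetic can be checked with the userspace sketch below; the macros are redefined locally and cpumask_bytes() is a hypothetical helper, not a kernel function.

```c
#include <limits.h>
#include <stddef.h>

/* Local userspace equivalents; a bitmap is stored as an array of longs. */
#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n)	(((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Bytes needed for a bitmap holding one bit per possible CPU. */
static size_t cpumask_bytes(size_t nr_cpus)
{
	return BITS_TO_LONGS(nr_cpus) * sizeof(unsigned long);
}
```

For 8192 CPUs this gives 8192 / 8 = 1024 bytes, which on its own reaches a 1024-byte frame warning limit before any other local variables are counted.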
Link:
Fixes: 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reported-by: kernel test robot <lkp@intel.com>
Closes:
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 1f38c86bb29f4548b8df01b47a313518e6ed2dfe)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Signed-off-by: Linux Patches Robot <linux-patches-robot@chromeos-missing-patches.google.com.iam.gserviceaccount.com>
Change-Id: Id0da74a59a4537067c2dadcb8e758b8b8751fcda
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Tested-by: Guenter Roeck <groeck@chromium.org>
Commit-Queue: Guenter Roeck <groeck@chromium.org>
M kernel/watchdog.c
ap...@google.com <ap...@google.com> #57
Branch: chromeos-6.1
commit eca0384cbe6a8e771d7915818d71031831a13d83
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri Aug 04 07:00:42 2023
UPSTREAM: nmi_backtrace: allow excluding an arbitrary CPU
The APIs that allow backtracing across CPUs have always had a way to
exclude the current CPU. This convenience means callers didn't need to
find a place to allocate a CPU mask just to handle the common case.
Let's extend the API to take a CPU ID to exclude instead of just a
boolean. This isn't any more complex for the API to handle and allows the
hardlockup detector to exclude a different CPU (the one it already did a
trace for) without needing to find space for a CPU mask.
Arguably, this new API also encourages safer behavior. Specifically if
the caller wants to avoid tracing the current CPU (maybe because they
already traced the current CPU) this makes it more obvious to the caller
that they need to make sure that the current CPU ID can't change.
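A simplified sketch of the shape of such an API, assuming four CPUs. The function name, the boolean target array, and the use of -1 to mean "exclude nobody" are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical small system */

/*
 * Mark which CPUs should be asked for a backtrace. Passing a CPU ID
 * excludes exactly that CPU; -1 (assumed sentinel) excludes none.
 * No CPU mask has to be allocated by the caller.
 */
static void pick_backtrace_targets(int exclude_cpu, bool targets[NR_CPUS])
{
	assert(exclude_cpu >= -1 && exclude_cpu < NR_CPUS);

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		targets[cpu] = (cpu != exclude_cpu);
}
```

A hardlockup detector that has already dumped the stuck CPU could pass that CPU's ID here, while a caller that previously passed "exclude self" would pass its own ID explicitly.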
[akpm@linux-foundation.org: fix trigger_allbutcpu_cpu_backtrace() stub]
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8d539b84f1e3478436f978ceaf55a0b6cab497b5)
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Ia35521b91fc781368945161d7b28538f9996c182
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Tested-by: Guenter Roeck <groeck@chromium.org>
Commit-Queue: Guenter Roeck <groeck@chromium.org>
M arch/arm/include/asm/irq.h
M arch/arm/kernel/smp.c
M arch/loongarch/include/asm/irq.h
M arch/loongarch/kernel/process.c
M arch/mips/include/asm/irq.h
M arch/mips/kernel/process.c
M arch/powerpc/include/asm/irq.h
M arch/powerpc/kernel/stacktrace.c
M arch/powerpc/kernel/watchdog.c
M arch/sparc/include/asm/irq_64.h
M arch/sparc/kernel/process_64.c
M arch/x86/include/asm/irq.h
M arch/x86/kernel/apic/hw_nmi.c
M include/linux/nmi.h
M kernel/watchdog.c
M lib/nmi_backtrace.c
ap...@google.com <ap...@google.com> #58
Branch: chromeos-6.1
commit 4e357380c85696f5cf2ab2153c28ba6b9c7482e4
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Sep 18 14:24:39 2023
CHROMIUM: config: Switch x86 to the buddy hardlockup detector
As talked about in commit 1f423c905a6b ("watchdog/hardlockup: detect
hard lockups using secondary (buddy) CPUs"), the buddy hardlockup
detector has some advantages over the traditional perf-based
hardlockup detector. The most notable advantage should be slightly
fewer wakeup events since the same interrupt is now used to update our
counter as is used to check our buddy's counter.
Since all Chromebooks have at least two cores, the main downside of
the buddy lockup detector is eliminated so let's switch over to it.
BUG=b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: I064df1d0d470aaac549ca79b7e819d2e92c8f2d3
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Commit-Queue: Guenter Roeck <groeck@chromium.org>
M chromeos/config/chromeos/x86_64/common.config
ap...@google.com <ap...@google.com> #59
Branch: chromeos-5.15
commit 3350a3f5b75b51f3beeeb492319057acc736dc38
Author: Douglas Anderson <dianders@chromium.org>
Date: Thu Nov 02 13:45:54 2023
FIXUP: CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus (match upstream format)
The format for the output for the buddy lockup detector changed a bit
(for the better) when it landed upstream. The format that landed
upstream also matches the format of the "perf" hardlockup
detector. Though there's not a huge advantage of backporting all of
the cleanups so we can get the exact upstream code, let's at least get
the format changes in our downstream version.
NOTE: this version uses trigger_cpumask_backtrace(), which is more
like how upstream worked prior to commit 1f38c86bb29f
("watchdog/hardlockup: avoid large stack frames in
watchdog_hardlockup_check()"). Doing it the way upstream worked after that
commit also needs a bunch of extra backporting and the output here is
the same. In our case we use a static global to store the cpumask (to
avoid any large stackframe warnings) and that's not wasting too much
globals space because we never build kernels with a large
CONFIG_NR_CPUS.
DO NOT PICK THIS PATCH TO NEWER KERNELS DURING UPREVS. Newer kernels
will use the upstream code directly.
BUG=b:172213097
TEST=Format looks better
Change-Id: Ie6565d0b8f020e1b95bba4eb498ee5d45e533042
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
M kernel/watchdog_buddy_cpu.c
ap...@google.com <ap...@google.com> #60
Branch: chromeos-5.10
commit 1721529828fcbef5dcf6b8b0fe774516390016a2
Author: Douglas Anderson <dianders@chromium.org>
Date: Thu Nov 02 13:45:54 2023
FIXUP: CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus (match upstream format)
The format for the output for the buddy lockup detector changed a bit
(for the better) when it landed upstream. The format that landed
upstream also matches the format of the "perf" hardlockup
detector. Though there's not a huge advantage of backporting all of
the cleanups so we can get the exact upstream code, let's at least get
the format changes in our downstream version.
NOTE: this version uses trigger_cpumask_backtrace(), which is more
like how upstream worked prior to commit 1f38c86bb29f
("watchdog/hardlockup: avoid large stack frames in
watchdog_hardlockup_check()"). Doing it the way upstream worked after that
commit also needs a bunch of extra backporting and the output here is
the same. In our case we use a static global to store the cpumask (to
avoid any large stackframe warnings) and that's not wasting too much
globals space because we never build kernels with a large
CONFIG_NR_CPUS.
DO NOT PICK THIS PATCH TO NEWER KERNELS DURING UPREVS. Newer kernels
will use the upstream code directly.
BUG=b:172213097
TEST=Format looks better
Change-Id: Ie6565d0b8f020e1b95bba4eb498ee5d45e533042
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-on:
M kernel/watchdog_buddy_cpu.c
ap...@google.com <ap...@google.com> #61
Branch: chromeos-5.10
commit 2402f349c8523afddb44cf74f826c3f2a382731f
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 17 17:25:44 2023
FIXUP: CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus (all backtrace)
The buddy hard lockup detector should try backtracing on all
CPUs. Right now it doesn't. Copy that bit of logic from the normal
hardlockup detector.
NOTE: On arm64 (the current user of the buddy detector), this won't
(yet) do anything. Soon, hopefully.
UPSTREAM-TASK=b:172213097
BUG=b:278598383, b:278594093, b:197061987, b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Id225408348d8a45e68080d08139bc6d9e170000a
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
M kernel/watchdog_buddy_cpu.c
ap...@google.com <ap...@google.com> #62
Branch: chromeos-5.4
commit f5c5953de0fe06d1f6a2787f5470fa5abe6ffb02
Author: Douglas Anderson <dianders@chromium.org>
Date: Thu Nov 02 13:45:54 2023
FIXUP: CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus (match upstream format)
The format for the output for the buddy lockup detector changed a bit
(for the better) when it landed upstream. The format that landed
upstream also matches the format of the "perf" hardlockup
detector. Though there's not a huge advantage of backporting all of
the cleanups so we can get the exact upstream code, let's at least get
the format changes in our downstream version.
NOTE: this version uses trigger_cpumask_backtrace(), which is more
like how upstream worked prior to commit 1f38c86bb29f
("watchdog/hardlockup: avoid large stack frames in
watchdog_hardlockup_check()"). Doing it the way upstream worked after that
commit also needs a bunch of extra backporting and the output here is
the same. In our case we use a static global to store the cpumask (to
avoid any large stackframe warnings) and that's not wasting too much
globals space because we never build kernels with a large
CONFIG_NR_CPUS.
DO NOT PICK THIS PATCH TO NEWER KERNELS DURING UPREVS. Newer kernels
will use the upstream code directly.
BUG=b:172213097
TEST=Format looks better
Change-Id: Ie6565d0b8f020e1b95bba4eb498ee5d45e533042
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-on:
M kernel/watchdog_buddy_cpu.c
ap...@google.com <ap...@google.com> #63
Branch: chromeos-5.4
commit 28e15645f7f4a7be395d1ea547849792ea4a1339
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 17 17:25:44 2023
FIXUP: CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus (all backtrace)
The buddy hard lockup detector should try backtracing on all
CPUs. Right now it doesn't. Copy that bit of logic from the normal
hardlockup detector.
NOTE: On arm64 (the current user of the buddy detector), this won't
(yet) do anything. Soon, hopefully.
UPSTREAM-TASK=b:172213097
BUG=b:278598383, b:278594093, b:197061987, b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Id225408348d8a45e68080d08139bc6d9e170000a
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
M kernel/watchdog_buddy_cpu.c
ap...@google.com <ap...@google.com> #64
Branch: chromeos-5.4
commit 1b415cbdc4d2258cb37610c4a66039ebffdec9f6
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 17 17:15:31 2023
FIXUP: CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus (sysctl)
The CHROMIUM patch accidentally didn't expose the hardlockup panic
sysctls based on the right config. Fix it.
NOTE: Only one of these two sysctls actually does something with the
current buddy detector. You can turn on/off the hard lockup detector
but it doesn't (yet) support tracing other CPUs.
UPSTREAM-TASK=b:172213097
BUG=b:278598383, b:278594093, b:197061987, b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Id88d1fb603308e7210c30e42bb6e4e6a4be65a0c
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
(cherry picked from commit 71986679fe52d94286f9051f09b958ecf582c7fc)
Reviewed-on:
Commit-Queue: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
M kernel/sysctl.c
ap...@google.com <ap...@google.com> #65
Branch: chromeos-5.10
commit 9c9acd59e87ef7423808dfc5b4109732a05f78db
Author: Douglas Anderson <dianders@chromium.org>
Date: Mon Apr 17 17:15:31 2023
FIXUP: CHROMIUM: hardlockup: detect hard lockups without NMIs using secondary cpus (sysctl)
The CHROMIUM patch accidentally didn't expose the hardlockup panic
sysctls based on the right config. Fix it.
NOTE: Only one of these two sysctls actually does something with the
current buddy detector. You can turn on/off the hard lockup detector
but it doesn't (yet) support tracing other CPUs.
UPSTREAM-TASK=b:172213097
BUG=b:278598383, b:278594093, b:197061987, b:172213097
TEST=echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
Change-Id: Id88d1fb603308e7210c30e42bb6e4e6a4be65a0c
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-on:
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
(cherry picked from commit 71986679fe52d94286f9051f09b958ecf582c7fc)
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Commit-Queue: Guenter Roeck <groeck@chromium.org>
M kernel/sysctl.c
Description
Colin Cross previously attempted to upstream this, but the effort seems to have been dropped on the floor. It's still useful, though. Maybe someone wants to make another attempt?
For some details see:
*
*
...as per the linked bug, I think the buddy detector could still be useful even if we ever get a true NMI-based detector.
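For readers new to the idea, here is a rough userspace sketch of the buddy scheme described in the commit messages in this thread: each CPU's periodic timer interrupt bumps its own counter and then checks whether its buddy's counter has moved since the last check. This is not the kernel code. The names (ticks, buddy_seen, buddy_of, timer_tick, simulate), the 4-CPU count, the "next CPU, wrapping around" buddy choice, and the check-every-3rd-tick interval are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_CPUS     4	/* hypothetical CPU count for the simulation */
#define CHECK_EVERY  3	/* assumption: look at the buddy every 3rd tick */

/* Per-CPU counter, bumped only from that CPU's own timer interrupt. */
static unsigned long ticks[NUM_CPUS];
/* Value of the buddy's counter the last time each CPU checked it. */
static unsigned long buddy_seen[NUM_CPUS];

/* Assumption: each CPU watches the next one, wrapping around. */
static unsigned int buddy_of(unsigned int cpu)
{
	assert(cpu < NUM_CPUS);
	return (cpu + 1) % NUM_CPUS;
}

/*
 * Simulated timer interrupt on `cpu`. The same interrupt updates our own
 * counter and (every CHECK_EVERY ticks) checks the buddy's counter.
 * Returns true if the buddy made no progress since the previous check.
 */
static bool timer_tick(unsigned int cpu)
{
	unsigned int buddy = buddy_of(cpu);
	bool stuck;

	ticks[cpu]++;
	if (ticks[cpu] % CHECK_EVERY != 0)
		return false;

	stuck = (ticks[buddy] == buddy_seen[cpu]);
	buddy_seen[cpu] = ticks[buddy];
	return stuck;
}

/*
 * Run `rounds` rounds in which every CPU takes one timer tick, except
 * that `locked_cpu` stops taking ticks once more than `lock_after`
 * rounds have passed (pass -1 for no lockup). Returns the CPU that
 * first reports a stuck buddy, or -1 if nobody does.
 */
static int simulate(unsigned int rounds, int locked_cpu, unsigned int lock_after)
{
	memset(ticks, 0, sizeof(ticks));
	memset(buddy_seen, 0, sizeof(buddy_seen));

	for (unsigned int round = 1; round <= rounds; round++) {
		for (unsigned int cpu = 0; cpu < NUM_CPUS; cpu++) {
			if ((int)cpu == locked_cpu && round > lock_after)
				continue;	/* locked up: no timer interrupts */
			if (timer_tick(cpu))
				return (int)cpu;
		}
	}
	return -1;
}
```

In this toy model, calling simulate(12, 2, 6) locks up CPU 2 after round 6, and CPU 1 (whose buddy is CPU 2) is the one that notices. The detecting CPU can only dump its own state unless it has some way to interrupt the stuck core, which is the backtrace limitation discussed earlier in this thread.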