Status Update
Comments
di...@google.com <di...@google.com> #2
sw...@google.com <sw...@google.com> #3
[1]
[2]
[3]
so...@google.com <so...@google.com> #4
sw...@google.com <sw...@google.com> #5
sw...@google.com <sw...@google.com> #6
localhost ~ # cat /sys/kernel/debug/irq/irqs/39
handler: handle_fasteoi_irq
device: (null)
status: 0x00000000
istate: 0x00000000
ddepth: 1
wdepth: 0
dstate: 0x03030001
IRQ_TYPE_EDGE_RISING
IRQD_IRQ_DISABLED
IRQD_IRQ_MASKED
IRQD_SINGLE_TARGET
node: 0
affinity: 0-7
effectiv:
domain: :soc@0:interrupt-controller@17a00000-1
hwirq: 0x192
chip: GICv3
flags: 0x115
IRQCHIP_SET_TYPE_MASKED
IRQCHIP_MASK_ON_SUSPEND
IRQCHIP_SKIP_SET_WAKE
IRQCHIP_SUPPORTS_NMI
and we can see that the GIC irqchip advertises NMI support (IRQCHIP_SUPPORTS_NMI), so the code seems to work at least as far as probing for it. Now we need callers of request_nmi() or request_percpu_nmi() in order to implement a hard lockup detector. I don't see any patches on the list, but maybe someone is working on it?
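The "flags: 0x115" line in the dump above can be cross-checked by decoding the bitmask by hand. The following is a minimal sketch, assuming the bit positions match the IRQCHIP_* enum in include/linux/irq.h for the kernel in question; only the four flags printed in the dump are listed, and decode_chip_flags is a hypothetical helper name, not a kernel or tool API.

```python
# Bit positions are an assumption based on the IRQCHIP_* enum in
# include/linux/irq.h; verify them against the kernel actually running.
IRQCHIP_FLAGS = {
    0x001: "IRQCHIP_SET_TYPE_MASKED",
    0x004: "IRQCHIP_MASK_ON_SUSPEND",
    0x010: "IRQCHIP_SKIP_SET_WAKE",
    0x100: "IRQCHIP_SUPPORTS_NMI",
}


def decode_chip_flags(flags):
    """Return (names of known flags that are set, leftover unknown bits)."""
    names = [name for bit, name in sorted(IRQCHIP_FLAGS.items()) if flags & bit]
    unknown = flags & ~sum(IRQCHIP_FLAGS)  # summing the dict sums its keys
    return names, unknown
```

Calling decode_chip_flags(0x115) should list the same four names as the dump, with no unknown bits left over.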
sw...@google.com <sw...@google.com> #8
It sounds like there is a performance concern if we force NMIs to be usable by passing irqchip.gicv3_pseudo_nmi=1 on the kernel command line. We should run a few benchmarks to make sure things don't get worse.
The other problem is perf tool usage on the DUT: we'll need to document it and/or fix it so that the lockup detector is turned off while sampling with perf. The perf-based lockup detector appears to take one perf counter away from the system, so not every possible perf recording can be made while it is enabled. Maybe this doesn't matter if we're not stressing this scenario.
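When comparing benchmark runs, it helps to confirm which boots actually had the pseudo-NMI parameter set. A minimal sketch, assuming the command line text is read from somewhere like /proc/cmdline; pseudo_nmi_requested is a hypothetical helper, and it only recognises the literal "=1" form used in this thread, not other boolean spellings the kernel may accept.

```python
def pseudo_nmi_requested(cmdline):
    """True if the last irqchip.gicv3_pseudo_nmi= token on the line is set to 1."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == "irqchip.gicv3_pseudo_nmi":
            value = val  # a later occurrence overrides an earlier one
    return value == "1"

# Example use on a device (not run here):
#   pseudo_nmi_requested(open("/proc/cmdline").read())
```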
di...@google.com <di...@google.com> #9
gm...@google.com <gm...@google.com> #10
I am not sure how many event counters are available on a typical ARM64 system, but one can test whether "perf record -a -e cycles -c 1000003" still collects data while the lockup detector is on.
sw...@google.com <sw...@google.com> #11
hw perfevents: enabled with armv8_pmuv3 PMU driver, 7 counters available
and then with these patches it shows
NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
The perf record line suggested in
gm...@google.com <gm...@google.com> #12
Your experiment in #16 shows that losing one counter is not a problem for what we collect.
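The counter budget discussed above is simple arithmetic on the two kernel log lines quoted in this thread. A minimal sketch, assuming the log wording is exactly as quoted; counters_left_for_perf is a hypothetical helper, and the thread does not say which physical counter the watchdog takes.

```python
import re

WATCHDOG_LINE = "NMI watchdog: Enabled. Permanently consumes one hw-PMU counter."


def counters_left_for_perf(dmesg_text):
    """Counters reported by the PMU driver, minus one if the NMI watchdog is on.

    Returns None if the PMU driver line is not present in the text.
    """
    match = re.search(r"(\d+) counters available", dmesg_text)
    if match is None:
        return None
    total = int(match.group(1))
    return total - 1 if WATCHDOG_LINE in dmesg_text else total
```

With the two lines from this thread as input, the result is 6 of the 7 reported counters.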
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #13
commit e3fe1a74b1f8707dc41f909cfe18127e7113d3db
Author: Ionela Voinescu <ionela.voinescu@arm.com>
Date: Sat Sep 12 04:25:42 2020
UPSTREAM: cpufreq: add function to get the hardware max frequency
Add weak function to return the hardware maximum frequency of a CPU,
with the default implementation returning cpuinfo.max_freq, which is
the best information we can generically get from the cpufreq framework.
The default can be overwritten by a strong function in platforms
that want to provide an alternative implementation, with more accurate
information, obtained either from hardware or firmware.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit bbce8eaa603236bf958b0d24e6377b3f3b623991)
BUG=chromium:972103
TEST=trigger hardlockup on CPU, see all stacks
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Change-Id: Ie0f18b104a1f9d0a9236f726a28ec0185697156d
Reviewed-on:
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
[modify]
[modify]
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #14
commit 04f05c66182df55846183f71e02ca728e16cb665
Author: Andrew Murray <andrew.murray@arm.com>
Date: Sat Sep 12 04:25:44 2020
UPSTREAM: arm64: perf: Add support for ARMv8.5-PMU 64-bit counters
At present ARMv8 event counters are limited to 32-bits, though by
using the CHAIN event it's possible to combine adjacent counters to
achieve 64-bits. The perf config1:0 bit can be set to use such a
configuration.
With the introduction of ARMv8.5-PMU support, all event counters can
now be used as 64-bit counters.
Let's enable 64-bit event counters where support exists. Unless the
user sets config1:0 we will adjust the counter value such that it
overflows upon 32-bit overflow. This follows the same behaviour as
the cycle counter which has always been (and remains) 64-bits.
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
[Mark: fix ID field names, compare with 8.5 value]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 8673e02e58410e6c4cefa499efa846286e45a991)
BUG=chromium:972103
TEST=trigger hardlockup on CPU, see all stacks
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Change-Id: I4c5d9768a174de870b4ba5dd25ad0c00cd110683
Reviewed-on:
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
[modify]
[modify]
[modify]
[modify]
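The "overflows upon 32-bit overflow" behaviour described in the commit above can be illustrated with a toy model: pre-set the upper 32 bits of the 64-bit counter so that a carry out of bit 31 also wraps the whole register. This is a sketch of the idea only, not the driver's code; bias_counter and overflows_after are hypothetical names.

```python
MASK64 = (1 << 64) - 1
UPPER32 = 0xFFFFFFFF00000000


def bias_counter(value, long_event):
    """Value to program into a 64-bit hardware counter (toy model)."""
    if long_event:  # the user set config1:0, so use the full 64 bits
        return value & MASK64
    # Otherwise pre-set the top half so a 32-bit wrap also wraps bit 63.
    return (value & 0xFFFFFFFF) | UPPER32


def overflows_after(programmed, increments):
    """True if the 64-bit counter wraps within the given number of events."""
    return programmed + increments > MASK64
```

For a sampling period of 1000 events, the low half is programmed to 2^32 - 1000; with the bias applied the counter wraps after exactly 1000 events, while an unbiased 64-bit counter would not.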
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #15
commit 7c4b55d4b4603da9fb04655c610ad96b409f6143
Author: Alexandru Elisei <alexandru.elisei@arm.com>
Date: Fri Oct 02 00:36:23 2020
FROMGIT: arm64: perf: Add missing ISB in armv8pmu_enable_counter()
Writes to the PMXEVTYPER_EL0 register are not self-synchronising. In
armv8pmu_enable_event(), the PE can reorder configuring the event type
after we have enabled the counter and the interrupt. This can lead to an
interrupt being asserted because of the previous event type that we were
counting using the same counter, not the one that we've just configured.
The same rationale applies to writes to the PMINTENSET_EL1 register. The PE
can reorder enabling the interrupt at any point in the future after we have
enabled the event.
Prevent both situations from happening by adding an ISB just before we
enable the event counter.
Fixes: 030896885ade ("arm64: Performance counters support")
Reported-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link:
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 490d7b7c0845eacf5593db333fd2ae7715416e16)
BUG=chromium:972103
TEST=perf record -a -- sleep 60
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Change-Id: I6c4bb15d670b3219589af4a167b8dc1e81012043
Reviewed-on:
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
[modify]
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #16
commit 88aee879bb65cbee80dd2ae8d4ddd771af8380a6
Author: Mark Rutland <mark.rutland@arm.com>
Date: Fri Oct 02 00:36:25 2020
FROMGIT: arm64: perf: Avoid PMXEV* indirection
Currently we access the counter registers and their respective type
registers indirectly. This requires us to write to PMSELR, issue an ISB,
then access the relevant PMXEV* registers.
This is unfortunate, because:
* Under virtualization, accessing one register requires two traps to
the hypervisor, even though we could access the register directly with
a single trap.
* We have to issue an ISB which we could otherwise avoid the cost of.
* When we use NMIs, the NMI handler will have to save/restore the select
register in case the code it preempted was attempting to access a
counter or its type register.
We can avoid these issues by directly accessing the relevant registers.
This patch adds helpers to do so.
In armv8pmu_enable_event() we still need the ISB to prevent the PE from
reordering the write to PMINTENSET_EL1 register. If the interrupt is
enabled before we disable the counter and the new event is configured,
we might get an interrupt triggered by the previously programmed event
overflowing, but which we wrongly attribute to the event that we are
enabling. Execute an ISB after we disable the counter.
In the process, remove the comment that refers to the ARMv7 PMU.
[Julien T.: Don't inline read/write functions to avoid big code-size
increase, remove unused read_pmevtypern function,
fix counter index issue.]
[Alexandru E.: Removed comment, removed trailing semicolons in macros,
added ISB]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link:
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 0fdf1bb75953a67e63e5055a7709c629ab6d7692)
BUG=chromium:972103
TEST=perf record -a -- sleep 60
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Change-Id: Ic28530beca338873ba9fc58fe15d06d2ce18ce27
Reviewed-on:
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
[modify]
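The NMI hazard named in the commit above (a handler disturbing the select register between the two steps of an indirect access) can be shown with a toy model. This is a sketch under simplifying assumptions, not the PMU driver: ToyPMU, write_indirect, write_direct and careless_nmi are hypothetical names, and the "NMI" is just a callback invoked at the vulnerable point.

```python
class ToyPMU:
    """Toy model: one selector register plus a bank of event counters."""

    def __init__(self, num_counters):
        self.pmselr = 0
        self.counters = [0] * num_counters

    def write_indirect(self, idx, value, nmi=None):
        self.pmselr = idx                    # step 1: select the counter
        if nmi is not None:
            nmi(self)                        # an NMI lands between the two steps
        self.counters[self.pmselr] = value   # step 2: access via the selector

    def write_direct(self, idx, value, nmi=None):
        if nmi is not None:
            nmi(self)
        self.counters[idx] = value           # no shared selector involved


def careless_nmi(pmu):
    """NMI handler that uses the selector and does not restore it."""
    pmu.pmselr = 3
    pmu.counters[3] = 99
```

In the indirect case the interrupted write lands in counter 3 instead of counter 1, which is why an NMI handler would otherwise have to save and restore the selector; in the direct case both writes reach their intended counters.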
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #17
commit 2573b8a9af66c76f472d8aec19d6397da9beca74
Author: Julien Thierry <julien.thierry@arm.com>
Date: Fri Oct 02 00:36:26 2020
FROMGIT: arm64: perf: Remove PMU locking
The PMU is disabled and enabled, and the counters are programmed from
contexts where interrupts or preemption is disabled.
The functions to toggle the PMU and to program the PMU counters access the
registers directly and don't access data modified by the interrupt handler.
That, and the fact that they're always called from non-preemptible
contexts, means that we don't need to disable interrupts or use a spinlock.
[Alexandru E.: Explained why locking is not needed, removed WARN_ONs]
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link:
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 2a0e2a02e4b719174547d6f04c27410c6fe456f5)
BUG=chromium:972103
TEST=perf record -a -- sleep 60
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Change-Id: Ia2eaaa073a9071db6dd361989cee232cf760b921
Reviewed-on:
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
[modify]
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #18
commit 32e89f21dabc1ec80efbd9d623e61ff2114ec4f9
Author: Julien Thierry <julien.thierry@arm.com>
Date: Fri Oct 02 00:36:27 2020
FROMGIT: arm64: perf: Defer irq_work to IPI_IRQ_WORK
When handling events, armv8pmu_handle_irq() calls perf_event_overflow(),
and subsequently calls irq_work_run() to handle any work queued by
perf_event_overflow(). As perf_event_overflow() raises IPI_IRQ_WORK when
queuing the work, this isn't strictly necessary and the work could be
handled as part of the IPI_IRQ_WORK handler.
In the common case the IPI handler will run immediately after the PMU IRQ
handler, and where the PE is heavily loaded with interrupts other handlers
may run first, widening the window where some counters are disabled.
In practice this window is unlikely to be a significant issue, and removing
the call to irq_work_run() would make the PMU IRQ handler NMI safe in
addition to making it simpler, so let's do that.
[Alexandru E.: Reworded commit message]
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link:
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 05ab72813340d11205556c0d1bc08e6857a3856c)
BUG=chromium:972103
TEST=perf record -a -- sleep 60
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Change-Id: Ie17451aa18f797c51e9210f2016c1e64a72dd1eb
Reviewed-on:
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
[modify]
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #19
commit d8fd369ee22168e2ee14912f755e9823d9aa943d
Author: Julien Thierry <julien.thierry@arm.com>
Date: Fri Oct 02 00:36:28 2020
BACKPORT: FROMGIT: KVM: arm64: pmu: Make overflow handler NMI safe
kvm_vcpu_kick() is not NMI safe. When the overflow handler is called from
NMI context, defer waking the vcpu to an irq_work queue.
A vcpu can be freed while it's not running by kvm_destroy_vm(). Prevent
running the irq_work for a non-existent vcpu by calling irq_work_sync() on
the PMU destroy path.
[Alexandru E.: Added irq_work_sync()]
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: kvm@vger.kernel.org
Cc: kvmarm@lists.cs.columbia.edu
Link:
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 95e92e45a454a10a8114294d0f7aec930fb85891)
Conflicts:
arch/arm64/kvm/pmu-emul.c
We're missing commit 9ed24f4b712b ("KVM: arm64: Move virt/kvm/arm to
arch/arm64") from upstream so we have to backport changes to where
the file used to be.
BUG=chromium:972103
TEST=perf record -a -- sleep 60
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Change-Id: I2a7b79473891197b8c8846aba6bc6acc6c49dfdd
Reviewed-on:
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
[modify]
[modify]
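The pattern in the commit above, deferring an NMI-unsafe action to a queue and draining that queue before the object goes away, can be sketched with a toy model. This is an illustration only: ToyVcpu, kick, on_overflow, run_deferred and destroy_pmu are hypothetical names, the deque stands in for the irq_work queue, and draining the queue only loosely models irq_work_sync().

```python
from collections import deque


class ToyVcpu:
    def __init__(self):
        self.kicks = 0
        self.deferred = deque()  # stands in for the irq_work queue


def kick(vcpu):
    vcpu.kicks += 1  # stands in for kvm_vcpu_kick(), which is not NMI safe


def on_overflow(vcpu, in_nmi):
    if in_nmi:
        vcpu.deferred.append(kick)  # not safe to kick from here, so queue it
    else:
        kick(vcpu)


def run_deferred(vcpu):
    """Run queued work later, from a context where kicking is allowed."""
    while vcpu.deferred:
        vcpu.deferred.popleft()(vcpu)


def destroy_pmu(vcpu):
    # Make sure no queued work can run against a vcpu that is being freed.
    run_deferred(vcpu)
```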
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #20
commit d012e6f1d895631e1910f0ed372eff2f4b0a8569
Author: Julien Thierry <julien.thierry@arm.com>
Date: Fri Oct 02 00:36:29 2020
FROMGIT: arm_pmu: Introduce pmu_irq_ops
Currently the PMU interrupt can either be a normal irq or a percpu irq.
Supporting NMI will introduce two cases for each existing one. It becomes
a mess of 'if's when managing the interrupt.
Define sets of callbacks for operations commonly done on the interrupt. The
appropriate set of callbacks is selected at interrupt request time and
simplifies interrupt enabling/disabling and freeing.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link:
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit f76b130bdb8949eac002b8e0ddb85576ed137838)
BUG=chromium:972103
TEST=perf record -a -- sleep 60
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Change-Id: If553bd59f8bd2cccbd1ba3321e449da2fdd7bbb4
Reviewed-on:
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
[modify]
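The callback-set idea from the commit above can be sketched as a small lookup table: one set of operations per combination of (per-CPU or not, NMI or not), chosen once when the interrupt is requested. This is a toy illustration, not the kernel structure; PmuIrqOps, OPS and select_ops are hypothetical names and the callbacks only return strings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PmuIrqOps:
    name: str
    enable: Callable[[int], str]
    disable: Callable[[int], str]
    free: Callable[[int], str]


def _make_ops(name):
    # Each callback just reports what it would do for the given irq number.
    return PmuIrqOps(
        name=name,
        enable=lambda irq: f"{name}: enable {irq}",
        disable=lambda irq: f"{name}: disable {irq}",
        free=lambda irq: f"{name}: free {irq}",
    )


# Keyed by (percpu, nmi): the four cases the commit message describes.
OPS = {
    (False, False): _make_ops("irq"),
    (False, True): _make_ops("nmi"),
    (True, False): _make_ops("percpu_irq"),
    (True, True): _make_ops("percpu_nmi"),
}


def select_ops(percpu, nmi):
    """Pick the callback set once, at interrupt request time."""
    return OPS[(percpu, nmi)]
```

After selection, callers use ops.enable(irq) and friends without repeating the per-CPU/NMI checks, which is the simplification the commit is after.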
bu...@chops-service-accounts.iam.gserviceaccount.com <bu...@chops-service-accounts.iam.gserviceaccount.com> #21
commit 6c2bd3362ebd46c54de9e978f79628117561770e
Author: Julien Thierry <julien.thierry@arm.com>
Date: Fri Oct 02 00:36:30 2020
FROMGIT: arm_pmu: arm64: Use NMIs for PMU
Add required PMU interrupt operations for NMIs. Request interrupt lines as
NMIs when possible, otherwise fall back to normal interrupts.
NMIs are only supported on the arm64 architecture with a GICv3 irqchip.
[Alexandru E.: Added that NMIs only work on arm64 + GICv3, print message
when PMU is using NMIs]
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link:
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit d8f6267f7ce5dc7b8920910e7e75216f77e06d21)
BUG=chromium:972103
TEST=perf record -a -- sleep 60
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Change-Id: I30ec7f27e34ff665f8c29460ac20ed9e0f014b07
Reviewed-on:
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
[modify]
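The "request as NMI when possible, otherwise fall back" behaviour described in the commit above is a plain try-then-fallback pattern. A minimal sketch, assuming failure is signalled with an exception (the kernel uses error return codes); NmiUnsupported and request_pmu_irq are hypothetical names, and the two request functions are passed in as stand-ins.

```python
class NmiUnsupported(Exception):
    """Raised by the stand-in NMI request when the irqchip cannot deliver NMIs."""


def request_pmu_irq(irq, request_nmi, request_irq):
    """Try to request the line as an NMI; fall back to a normal interrupt.

    Returns "nmi" or "irq" to say which mode was obtained.
    """
    try:
        request_nmi(irq)
        return "nmi"
    except NmiUnsupported:
        request_irq(irq)
        return "irq"
```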
ap...@google.com <ap...@google.com> #22
Branch: chromeos-5.4
commit 83615913d98d403ae24326fabffb654e61611978
Author: Will Deacon <will@kernel.org>
Date: Fri Dec 04 09:19:35 2020
FROMGIT: arm64: Fix build failure when HARDLOCKUP_DETECTOR_PERF is enabled
If HARDLOCKUP_DETECTOR_PERF is selected but HW_PERF_EVENTS is not, then
the associated watchdog driver will fail to link:
| aarch64-linux-ld: Unexpected GOT/PLT entries detected!
| aarch64-linux-ld: Unexpected run-time procedure linkages detected!
| aarch64-linux-ld: kernel/watchdog_hld.o: in function `hardlockup_detector_event_create':
| >> watchdog_hld.c:(.text+0x68): undefined reference to `hw_nmi_get_sample_period'
Change the Kconfig dependencies so that HAVE_PERF_EVENTS_NMI requires
the hardware PMU driver to be enabled, ensuring that the required
symbols are present.
Cc: Sumit Garg <sumit.garg@linaro.org>
Reported-by: kernel test robot <lkp@intel.com>
Link:
Fixes: 367c820ef080 ("arm64: Enable perf events based hard lockup detector")
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit ce4b2c01781a24b0a04845f683feea44ade70503)
BUG=b:172228850
TEST=trigger hardlockup on CPU, see stacktrace
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Change-Id: I60644b3c4eea3318f305d6b278a429a1a0293534
Reviewed-on:
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
M arch/arm64/Kconfig
ap...@google.com <ap...@google.com> #23
Branch: chromeos-5.4
commit 21fe352407886afecc3698fa8fbf4288b142501b
Author: Sumit Garg <sumit.garg@linaro.org>
Date: Wed Oct 07 14:21:43 2020
FROMGIT: arm64: Enable perf events based hard lockup detector
With the recently added feature that lets perf events use pseudo NMIs
as interrupts on platforms which support GICv3 or later, it is now
possible to enable the hard lockup detector (or NMI watchdog) on arm64
platforms. So enable the corresponding support.
One thing to note here is that the lockup detector is normally initialized
just after the early initcalls, but the PMU on arm64 comes up much later, as
a device_initcall(). So we need to re-initialize lockup detection once
the PMU has been initialized.
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Acked-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link:
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 367c820ef08082e68df8a3bc12e62393af21e4b5)
BUG=b:172228850
TEST=trigger hardlockup on CPU, see stacktrace
Cq-Depend: chromium:2576119
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Change-Id: I207553d086f401cbcfcfe9bfa035e3b8e7282373
Reviewed-on:
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
M arch/arm64/Kconfig
M arch/arm64/kernel/perf_event.c
M drivers/perf/arm_pmu.c
M include/linux/perf/arm_pmu.h
di...@google.com <di...@google.com> #24
sw...@google.com <sw...@google.com> #26
was reverted upstream in v5.11-rc4,
We should make the kernel patches robot automatically pick back reverts of patches that we have landed in our kernels. Right now it searches for Fixes tags but not for reverts.
Things seem to have stalled somewhat though.
Agreed. It doesn't look like it's going to land upstream? I haven't been following closely.
ap...@google.com <ap...@google.com> #27
Branch: chromeos-5.4
commit 81dc15c630b5fc7ceda25a74b4a06400e0b73727
Author: Will Deacon <will@kernel.org>
Date: Tue Jan 12 22:18:55 2021
UPSTREAM: Revert "arm64: Enable perf events based hard lockup detector"
This reverts commit 367c820ef08082e68df8a3bc12e62393af21e4b5.
lockup_detector_init() makes heavy use of per-cpu variables and must be
called with preemption disabled. Usually, it's handled early during boot
in kernel_init_freeable(), before SMP has been initialised.
Since we do not know whether or not our PMU interrupt can be signalled
as an NMI until considerably later in the boot process, the Arm PMU
driver attempts to re-initialise the lockup detector off the back of a
device_initcall(). Unfortunately, this is called from preemptible
context and results in the following splat:
| BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
| caller is debug_smp_processor_id+0x20/0x2c
| CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #276
| Hardware name: linux,dummy-virt (DT)
| Call trace:
| dump_backtrace+0x0/0x3c0
| show_stack+0x20/0x6c
| dump_stack+0x2f0/0x42c
| check_preemption_disabled+0x1cc/0x1dc
| debug_smp_processor_id+0x20/0x2c
| hardlockup_detector_event_create+0x34/0x18c
| hardlockup_detector_perf_init+0x2c/0x134
| watchdog_nmi_probe+0x18/0x24
| lockup_detector_init+0x44/0xa8
| armv8_pmu_driver_init+0x54/0x78
| do_one_initcall+0x184/0x43c
| kernel_init_freeable+0x368/0x380
| kernel_init+0x1c/0x1cc
| ret_from_fork+0x10/0x30
Rather than bodge this with raw_smp_processor_id() or randomly disabling
preemption, simply revert the culprit for now until we figure out how to
do this properly.
Reported-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Link:
Link:
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit b90d72a6bfdb5e5c62cd223a8cdf4045bfbcb94d)
BUG=b:172228850
TEST=emerge-trogdor chromeos-kernel-5_4
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Change-Id: I09ac3cec8a8a09e39f3bda8fc1d734c0444df7d6
Reviewed-on:
Reviewed-by: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
M arch/arm64/Kconfig
M arch/arm64/kernel/perf_event.c
M drivers/perf/arm_pmu.c
M include/linux/perf/arm_pmu.h
we...@google.com <we...@google.com> #28
Seems like PPI partitioning prevents this from working.
sw...@google.com <sw...@google.com> #29
Re
we...@google.com <we...@google.com> #30
Sure. The Mediatek chips all seem to have PPI partitioning for their PMU interrupts [1][2].
With that, the PMU driver reports that NMI interrupts are not available. I believe this is because PPI partitioning creates a new irqchip and separate IRQ domains to stand in, and that irqchip doesn't handle NMIs and therefore doesn't signal support for them.
I tried adding that support, but it looks like, due to how IRQ chaining works, the first partition ends up enabling the interrupt, and when the second partition requests an NMI, the GICv3 driver denies it because the (same) interrupt on the GIC was already enabled. And there were some other issues I can't remember off the top of my head.
[1]
[2]
av...@google.com <av...@google.com>
di...@google.com <di...@google.com>
di...@google.com <di...@google.com> #33
To some extent I guess I'd consider this "done". This patch is now in Linus's tree:
d7a0fe9ef6d6 arm64: enable perf events based hard lockup detector
...and should enable this if anyone needs it. ...but... Even ignoring the Mediatek firmware bug (
...but then again we probably don't care because, as far as I can tell, there's no real advantage of using perf for the lockup detector compared to the buddy lockup detector (
we...@google.com <we...@google.com> #34
If you want to verify that it works, any of the Qualcomm platforms should work. They have proper, non-partitioned PPIs for PMUs.
ap...@google.com <ap...@google.com> #35
Branch: chromeos-6.1
commit 25ac675dfde557802641020944594c88444e9d1b
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 19 10:18:42 2023
UPSTREAM: arm64: enable perf events based hard lockup detector
With the recently added feature that lets perf events use pseudo NMIs as
interrupts on platforms which support GICv3 or later, it is now
possible to enable the hard lockup detector (or NMI watchdog) on arm64
platforms. So enable the corresponding support.
One thing to note here is that the lockup detector is normally initialized
just after the early initcalls, but the PMU on arm64 comes up much later, as
a device_initcall(). To cope with that, override
arch_perf_nmi_is_available() to let the watchdog framework know the PMU is not
ready, and inform the framework to re-initialize lockup detection once the PMU
has been initialized.
[dianders@chromium.org: only HAVE_HARDLOCKUP_DETECTOR_PERF if the PMU config is enabled]
Link:
Link:
Co-developed-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Co-developed-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit d7a0fe9ef6d6484fca4ba55c19091932337d4272)
BUG=b:172228850
TEST=Enable perf lockup detector and see it work
Change-Id: Ia44852044cdcb074f387e80df6b45e892965d4a1
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
M arch/arm64/Kconfig
M arch/arm64/kernel/perf_event.c
M arch/arm64/kernel/watchdog_hld.c
M drivers/perf/arm_pmu.c
M include/linux/perf/arm_pmu.h
ap...@google.com <ap...@google.com> #36
Branch: chromeos-6.1
commit 299c819a360571bb0c76e5a0587bd99b60da4249
Author: Lecopzer Chen <lecopzer.chen@mediatek.com>
Date: Fri May 19 10:18:41 2023
BACKPORT: arm64: add hw_nmi_get_sample_period for preparation of lockup detector
Set a safe maximum CPU frequency of 5 GHz in case a particular platform
doesn't implement a cpufreq driver. Although the architecture doesn't put any
restrictions on the maximum frequency, 5 GHz seems to be a safe maximum given
that the Arm CPUs available in the market are clocked much lower than 5
GHz.
On the other hand, we can't make it much higher as it would lead to a
large hard-lockup detection timeout on parts which run slower (e.g.
1 GHz on Developerbox) and don't have a cpufreq driver.
Link:
Co-developed-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Co-developed-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 94946f9eaac116f2943ec79ec3df1ec2fc92ae07)
Conflicts:
arch/arm64/kernel/Makefile
...due to lack of commit 7755cec63ade ("arm64: perf: Move PMUv3 driver
to drivers/perf"). That commit is easy to pick, but the "Fixes" of
that patch are a pain to pick. Since this is just a context conflict,
let's just do the BACKPORT.
BUG=b:172228850
TEST=Enable perf lockup detector and see it work
Change-Id: Ia9d02578e89c3f44d3cb12eec8b0176603c8ab2f
Reviewed-on:
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
M arch/arm64/kernel/Makefile
A arch/arm64/kernel/watchdog_hld.c
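The trade-off in the commit above can be made concrete with a little arithmetic. The sketch below assumes the sample period is counted in CPU cycles and computed as maximum frequency times the watchdog threshold in seconds, with cpufreq reporting kHz; that formula is a reading of the commit message rather than something it states, and sample_period_cycles and detection_time_s are hypothetical names.

```python
SAFE_MAX_CPU_FREQ_HZ = 5_000_000_000  # the 5 GHz fallback named in the commit message


def sample_period_cycles(watchdog_thresh_s, hw_max_freq_khz=0):
    """Cycles to count before the watchdog event fires (assumed formula).

    hw_max_freq_khz == 0 models a platform without a cpufreq driver.
    """
    max_hz = hw_max_freq_khz * 1000 or SAFE_MAX_CPU_FREQ_HZ
    return max_hz * watchdog_thresh_s


def detection_time_s(period_cycles, actual_freq_hz):
    """Wall-clock seconds until the period elapses at the real clock rate."""
    return period_cycles / actual_freq_hz
```

With a 10 second threshold and no cpufreq driver, a part that really runs at 1 GHz would take 50 seconds to count the period, which is the "large timeout" concern; with an accurate 1 GHz maximum from cpufreq the same part would take the intended 10 seconds.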
ap...@google.com <ap...@google.com> #37
Branch: chromeos-6.1
commit fc594ff52c2d9e6c7c29d277b1c0d7062ec59d4f
Author: Lecopzer Chen <lecopzer.chen@mediatek.com>
Date: Fri May 19 10:18:40 2023
UPSTREAM: watchdog/perf: adapt the watchdog_perf interface for async model
When lockup_detector_init() calls watchdog_hardlockup_probe(), the PMU may
not be ready yet. E.g. on arm64, the PMU is not ready until
device_initcall(armv8_pmu_driver_init), and it is deeply integrated with
the driver model and cpuhp. Hence it is hard to move this initialization
before smp_init().
But it is easy to take the opposite approach and try to initialize the
watchdog once again later. The delayed probe is called using workqueues.
It needs to allocate memory and must proceed in a normal context. The
delayed probe can be used if watchdog_hardlockup_probe() returns
non-zero, which is the return code when the PMU is not ready yet.
Provide an API, lockup_detector_retry_init(), for anyone who needs to
delay initialization of the lockup detector after it has failed in
lockup_detector_init().
The original assumption is that nobody should use the delayed probe after
lockup_detector_check(), which has the __init attribute. That is, anyone using
this API must call it between lockup_detector_init() and
lockup_detector_check(), and the caller must have the __init attribute.
Link:
Reviewed-by: Petr Mladek <pmladek@suse.com>
Co-developed-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Suggested-by: Petr Mladek <pmladek@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 930d8f8dbab97cb05dba30e67a2dfa0c6dbf4bc7)
BUG=b:172228850
TEST=Enable perf lockup detector and see it work
Change-Id: If4ad5dd5d09fb1309cebf8bcead4b6a5a7758ca7
Reviewed-on:
Reviewed-by: Sean Paul <sean@poorly.run>
Tested-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Commit-Queue: Douglas Anderson <dianders@chromium.org>
M include/linux/nmi.h
M kernel/watchdog.c
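For anyone skimming the thread, the retry flow described in this commit message can be modelled in plain user-space C. This is only a sketch of the ordering rules (early probe fails, retry is queued, final check closes the window). Every name prefixed with model_ is a hypothetical stand-in, the workqueue is reduced to a flag, and none of this is the code that landed in kernel/watchdog.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins for the flow in the commit message. */
static bool pmu_ready;        /* true once the modelled PMU driver has probed */
static bool detector_running; /* true once a probe has succeeded */
static bool retry_allowed;    /* set when the early probe fails, cleared by the final check */
static bool retry_queued;     /* stands in for work queued on a workqueue */

/* Modelled probe: non-zero means "PMU not ready yet". */
int model_hardlockup_probe(void)
{
	return pmu_ready ? 0 : -ENODEV;
}

/* Early init: try once and remember whether a retry makes sense. */
void model_lockup_detector_init(void)
{
	if (model_hardlockup_probe() == 0)
		detector_running = true;
	else
		retry_allowed = true;
}

/* Ask for a delayed probe; ignored outside the allowed window. */
void model_lockup_detector_retry_init(void)
{
	if (retry_allowed && !detector_running)
		retry_queued = true;
}

/* Body of the delayed work: runs later, in normal context. */
void model_run_delayed_probe(void)
{
	if (!retry_queued)
		return;
	retry_queued = false;
	/* A retry is only ever queued while no detector is running. */
	assert(!detector_running);
	if (model_hardlockup_probe() == 0)
		detector_running = true;
}

/* Final check: close the retry window and flush anything pending. */
void model_lockup_detector_check(void)
{
	retry_allowed = false;
	model_run_delayed_probe();
}
```

In this model a PMU driver would call model_lockup_detector_retry_init() once it has probed, and any call after model_lockup_detector_check() is silently ignored, which matches the "must call between init and check" rule stated above.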
ap...@google.com <ap...@google.com> #38
Branch: chromeos-6.1
commit b46ef70ae9468b3deba79c4a8339da8e9f8cb9f9
Author: Douglas Anderson <dianders@chromium.org>
Date: Fri May 19 10:18:39 2023
UPSTREAM: watchdog/perf: add a weak function for an arch to detect if perf can use NMIs
On arm64, NMI support needs to be detected at runtime. Add a weak
function to the perf hardlockup detector so that an architecture can
implement it to detect whether NMIs are available.
Link:
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit b17aa959330e8058452297049a0056ba4b9c72e8)
BUG=b:172228850
TEST=Enable perf lockup detector and see it work
Change-Id: Ic55cb6f90ef5967d8aaa2b503a4e67c753f64d3a
Reviewed-on:
Commit-Queue: Douglas Anderson <dianders@chromium.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Sean Paul <sean@poorly.run>
M include/linux/nmi.h
M kernel/watchdog_perf.c
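As a rough illustration of the weak-function pattern this commit describes, here is a minimal sketch in plain C using the GCC/clang weak attribute directly. The function names are illustrative and are not taken from the patch. The default says NMIs are available, and an architecture could override it by providing a non-weak definition with the same name in another object file.

```c
#include <stdbool.h>

/*
 * Default used when no architecture supplies its own version.
 * "example_arch_nmi_is_available" is an illustrative name, not the
 * name used by the patch.
 */
__attribute__((weak)) bool example_arch_nmi_is_available(void)
{
	return true;
}

/* Generic probe: bail out early when the architecture reports no NMI support. */
int example_perf_watchdog_probe(void)
{
	if (!example_arch_nmi_is_available())
		return -1; /* caller may fall back or retry later */
	return 0;
}
```

With only this file linked, the probe succeeds because the weak default is used. Linking in a strong definition that returns false would make the same probe fail without touching the generic code.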
we...@google.com <we...@google.com> #39
0-day bot reported:
FYI, the error/warning was bisected to this commit, please ignore it if it's irrelevant.
tree: https://chromium.googlesource.com/chromiumos/third_party/kernel chromeos-6.1
head: 71434b829327327088bdd3de43426e3f0437bcd2
commit: 1a46e6ec597bd31668c7daf1201616195dc2f57e [130/168] UPSTREAM: watchdog/hardlockup: rename some "NMI watchdog" constants/function
config: powerpc-allyesconfig (https://download.01.org/0day-ci/archive/20230719/202307191046.1AsuyX9R-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce: (https://download.01.org/0day-ci/archive/20230719/202307191046.1AsuyX9R-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307191046.1AsuyX9R-lkp@intel.com/
All errors (new ones prefixed by >>):
>> Cannot find symbol for section 4: .text.watchdog_hardlockup_enable.
kernel/watchdog.o: failed
di...@google.com <di...@google.com> #40
Hmmm. There were definitely some PPC breakages upstream but I thought I got all the fixes. Maybe there's an extra breakage downstream. Let me investigate.
di...@google.com <di...@google.com> #41
OK, so I started at ToT of chromeos-6.1:
b98aa1b68171 (HEAD, m/main, cros/chromeos-6.1) CHROMIUM: add configs to support audio legacy driver.
The zero-day report indicated problems with PPC and "allyesconfig", but instead of using the 0-day config/compiler I just used what I had to start with. My cross compiler came from the Debian package gcc-powerpc-linux-gnu. AKA I ran this outside the chroot:
CROSS_COMPILE=powerpc-linux-gnu- ARCH=powerpc make -j64 allyesconfig
CROSS_COMPILE=powerpc-linux-gnu- ARCH=powerpc make -j64
That didn't give me any failures. I tried again checking out the commit referenced:
commit: 1a46e6ec597bd31668c7daf1201616195dc2f57e [130/168] UPSTREAM: watchdog/hardlockup: rename some "NMI watchdog" constants/function
That also didn't give me any failures.
So I guess next I need to try the exact reproduction from 0-day. That gets a whole pile of compiler errors, including this one. Even turning off "WERROR" with echo 'CONFIG_WERROR=n' >> build_dir/.config didn't help.
This really smells like a compiler bug. Checking out the exact failing commit (1a46e6ec597b) and doing a git grep watchdog_hardlockup_enable:
$ git grep watchdog_hardlockup_enable
arch/sparc/kernel/nmi.c:void watchdog_hardlockup_enable(unsigned int cpu)
include/linux/nmi.h:void watchdog_hardlockup_enable(unsigned int cpu);
kernel/watchdog.c: * watchdog_hardlockup_enable/disable can be implemented to start and stop when
kernel/watchdog.c:void __weak watchdog_hardlockup_enable(unsigned int cpu)
kernel/watchdog.c: watchdog_hardlockup_enable(cpu);
I think we can ignore the "sparc" file since this is a powerpc test. Thus we see one call to the function and the (__weak) implementation in the same file.
...and, if I take out the __weak here then the error goes away.
I'm going to consider this to be a compiler bug in the 0-day compiler. Do you know the right person to report it to?
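If this does get reported, a reduced test case would probably have the same shape as kernel/watchdog.c: a weak default and its only caller in one file. The sketch below is that shape in standalone C. The names and the call counter are made up for illustration, and I have not checked whether this small file triggers the same failure outside the kernel build.

```c
/* Reduced shape of the failing file; all names are illustrative. */
static unsigned int example_enable_calls; /* counts calls into the weak default */

/* Weak default, defined in the same file as its only caller. */
__attribute__((weak)) void example_hardlockup_enable(unsigned int cpu)
{
	(void)cpu;
	example_enable_calls++;
}

/* The single call site, mirroring the one call seen in the git grep output. */
void example_watchdog_enable(unsigned int cpu)
{
	example_hardlockup_enable(cpu);
}
```

Dropping the weak attribute from the first function gives the variant that built cleanly in the experiment above.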
we...@google.com <we...@google.com> #42
I don't have any contacts in that direction. Maybe someone from Collabora working on Kernel CI might? Or you could just ask on IRC. I'm sure someone there would.
di...@google.com <di...@google.com> #43
I tried doing some digging.
- Using the 0-day compiler/config on mainline (v6.5-rc2-46-gccff6d117d8d) actually seemed to compile / not have this error.
- Using the 0-day compiler/config on pure upstream v6.1 didn't compile. It got assembler errors:
../arch/powerpc/mm/book3s32/hash_low.S:202:2: error: too few operands for instruction
cmpi 0,%r0,0
^
../arch/powerpc/mm/book3s32/hash_low.S:207:2: error: too few operands for instruction
cmpi 0,%r0,0
^
../arch/powerpc/mm/book3s32/hash_low.S:515:2: error: too few operands for instruction
cmpi 0,%r0,0
^
../arch/powerpc/mm/book3s32/hash_low.S:520:2: error: too few operands for instruction
cmpi 0,%r0,0
^
make[5]: *** [../scripts/Makefile.build:382: arch/powerpc/mm/book3s32/hash_low.o] Error 1
...and then it gets a pile of errors which are just like mine. A subset:
Cannot find symbol for section 3: .text.arch_setup_msi_irq.
drivers/pci/msi/legacy.o: failed
...
Cannot find symbol for section 3: .text.phys_mem_access_prot_allowed.
drivers/char/mem.o: failed
...
Cannot find symbol for section 6: .text.setup_kuep.
arch/powerpc/mm/init-common.o: failed
...
...
...
A very quick skim shows that these are just similar problems with __weak functions.
That would lead me to believe that before my patch (which just renamed things) the error already existed. ...but the rename made it look like a new error and so it got reported.
While I could try to bisect to see why ToT Linux works OK, it's probably not worth bothering. I'm going to re-close this.
Description
We are looking at picking the old buddy lockup detector to 4.19 in <
Compared to the buddy detector, the real detector would get stack crawls. Right now the only way that we get any real helpful data from the buddy detector is via something like <
NOTE: even if we got a real detector, it's _possible_ that the buddy detector and printing of DBGPCSR could be useful. If a CPU is hard locked up because of a CPU errata then presumably it will stop getting FIQs and fully stop executing instructions.