From e2fbc857d3c677ce96d9260577491e0d8f21b6f7 Mon Sep 17 00:00:00 2001 From: Namhyung Kim Date: Fri, 8 Dec 2023 17:52:11 -0800 Subject: x86/nmi: Rate limit unknown NMI messages On some AMD machines, unknown NMI messages were printed on the console continuously when using perf command with IBS. It was reported that it can slow down the kernel. Ratelimit the unknown NMI messages. Signed-off-by: Namhyung Kim Signed-off-by: Borislav Petkov (AMD) Acked-by: Ravi Bangoria Acked-by: Guilherme Amadio Acked-by: Thomas Gleixner Link: https://lore.kernel.org/r/20231209015211.357983-1-namhyung@kernel.org --- arch/x86/kernel/nmi.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'arch') diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c index 17e955ab69fe..d23867901186 100644 --- a/arch/x86/kernel/nmi.c +++ b/arch/x86/kernel/nmi.c @@ -303,13 +303,13 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs) __this_cpu_add(nmi_stats.unknown, 1); - pr_emerg("Uhhuh. NMI received for unknown reason %02x on CPU %d.\n", - reason, smp_processor_id()); + pr_emerg_ratelimited("Uhhuh. NMI received for unknown reason %02x on CPU %d.\n", + reason, smp_processor_id()); if (unknown_nmi_panic || panic_on_unrecovered_nmi) nmi_panic(regs, "NMI: Not continuing"); - pr_emerg("Dazed and confused, but trying to continue\n"); + pr_emerg_ratelimited("Dazed and confused, but trying to continue\n"); } NOKPROBE_SYMBOL(unknown_nmi_error); -- cgit v1.2.3 From d54e56f31a34fa38fcb5e91df609f9633419a79a Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Wed, 7 Feb 2024 08:52:35 -0800 Subject: x86/nmi: Fix the inverse "in NMI handler" check Commit 344da544f177 ("x86/nmi: Print reasons why backtrace NMIs are ignored") creates a super nice framework to diagnose NMIs. Every time nmi_exc() is called, it increments a per_cpu counter (nsp->idt_nmi_seq). At its exit, it also increments the same counter. By reading this counter it can be seen how many times that function was called (dividing by 2), and, if the function is still being executed, by checking the idt_nmi_seq's least significant bit. On the check side (nmi_backtrace_stall_check()), that variable is queried to check if the NMI is still being executed, but, there is a mistake in the bitwise operation. That code wants to check if the least significant bit of the idt_nmi_seq is set or not, but does the opposite, and checks for all the other bits, which will always be true after the first exc_nmi() executed successfully. This appends the misleading string to the dump "(CPU currently in NMI handler function)" Fix it by checking the least significant bit, and if it is set, append the string. Fixes: 344da544f177 ("x86/nmi: Print reasons why backtrace NMIs are ignored") Signed-off-by: Breno Leitao Signed-off-by: Thomas Gleixner Reviewed-by: Paul E. McKenney Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240207165237.1048837-1-leitao@debian.org --- arch/x86/kernel/nmi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c index d23867901186..c95dc1ba1d27 100644 --- a/arch/x86/kernel/nmi.c +++ b/arch/x86/kernel/nmi.c @@ -639,7 +639,7 @@ void nmi_backtrace_stall_check(const struct cpumask *btp) msgp = nmi_check_stall_msg[idx]; if (nsp->idt_ignored_snap != READ_ONCE(nsp->idt_ignored) && (idx & 0x1)) modp = ", but OK because ignore_nmis was set"; - if (nmi_seq & ~0x1) + if (nmi_seq & 0x1) msghp = " (CPU currently in NMI handler function)"; else if (nsp->idt_nmi_seq_snap + 1 == nmi_seq) msghp = " (CPU exited one NMI handler function)"; -- cgit v1.2.3