From d51507098ff91e863b6e0a8047507741d59b8175 Mon Sep 17 00:00:00 2001 From: Stephen Brennan Date: Wed, 2 Feb 2022 09:18:19 -0800 Subject: printk: disable optimistic spin during panic A CPU executing with console lock spinning enabled might be halted during a panic. Before the panicking CPU calls console_flush_on_panic(), it may call console_trylock(), which attempts to optimistically spin, deadlocking the panic CPU: CPU 0 (panic CPU) CPU 1 ----------------- ------ printk() { vprintk_func() { vprintk_default() { vprintk_emit() { console_unlock() { console_lock_spinning_enable(); ... printing to console ... panic() { crash_smp_send_stop() { NMI -------------------> HALT } atomic_notifier_call_chain() { printk() { ... console_trylock_spinnning() { // optimistic spin infinitely This hang during panic can be induced when a kdump kernel is loaded, and crash_kexec_post_notifiers=1 is present on the kernel command line. The following script which concurrently writes to /dev/kmsg, and triggers a panic, can result in this hang: #!/bin/bash date # 991 chars (based on log buffer size): chars="$(printf 'a%.0s' {1..991})" while :; do echo $chars > /dev/kmsg done & echo c > /proc/sysrq-trigger & date exit To avoid this deadlock, ensure that console_trylock_spinning() does not allow spinning once a panic has begun. Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes") Suggested-by: Petr Mladek Signed-off-by: Stephen Brennan Reviewed-by: Petr Mladek Reviewed-by: Sergey Senozhatsky Signed-off-by: Petr Mladek Link: https://lore.kernel.org/r/20220202171821.179394-3-stephen.s.brennan@oracle.com --- kernel/printk/printk.c | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'kernel/printk') diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index f04bbed0aa79..e83c12770104 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1847,6 +1847,16 @@ static int console_trylock_spinning(void) if (console_trylock()) return 1; + /* + * It's unsafe to spin once a panic has begun. If we are the + * panic CPU, we may have already halted the owner of the + * console_sem. If we are not the panic CPU, then we should + * avoid taking console_sem, so the panic CPU has a better + * chance of cleanly acquiring it later. + */ + if (panic_in_progress()) + return 0; + printk_safe_enter_irqsave(flags); raw_spin_lock(&console_owner_lock); -- cgit v1.2.3