From bc85746b44f3792b9c1144afbf9de26bef1af71d Mon Sep 17 00:00:00 2001
From: Adriana Kobylak
Date: Thu, 9 Mar 2023 13:53:19 -0600
Subject: meta-ibm: systemd: Enable hardware watchdog

The hardware watchdog config was disabled with commit [1] because the
fan-watchdog.bb recipe in meta-ibm already uses /dev/watchdog, which,
according to the kernel documentation [2], is the same device as
/dev/watchdog0.

```
[1]: https://gerrit.openbmc.org/c/openbmc/openbmc/+/60829
[2]: https://www.kernel.org/doc/Documentation/watchdog/watchdog-kernel-api.txt
```

Update the hardware watchdog config to use the currently unused
/dev/watchdog1 device so that the BMC can recover from systemd hangs.
Verified that all IBM and OpenPower device trees contain a wdt2 device.

Tested:
- With the change, the BMC reboots 2 minutes after a systemd error is
  injected:
  Mar 09 20:53:30 witherspoon systemd[1]: Caught from PID 552.
  Mar 09 20:53:30 witherspoon systemd-coredump[562]: Due to PID 1 having crashed coredump collection will now be turned off.
  Mar 09 20:54:25 witherspoon kernel: watchdog: watchdog1: watchdog did not stop!
  Mar 09 20:54:22 witherspoon systemd[1]: Freezing execution.
  Mar 09 20:55:57 witherspoon systemd-journald[132]: Failed to send WATCHDOG=1 notification message: Connection refused
  client_loop: send disconnect: Broken pipe

- Without the change, the BMC hangs, stops responding to ping, and never
  reboots:
  Mar 09 21:07:23 witherspoon systemd[1]: Caught from PID 433.
  Mar 09 21:07:24 witherspoon systemd-coredump[687]: Due to PID 1 having crashed coredump collection will now be turned off.
  Mar 09 21:08:07 witherspoon systemd[1]: Freezing execution.
  Mar 09 21:08:41 witherspoon systemd-journald[120]: Failed to send WATCHDOG=1 notification message: Connection refused
  Mar 09 21:10:11 witherspoon systemd-journald[120]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
  Mar 09 21:11:41 witherspoon systemd-journald[120]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected
  Mar 09 21:13:12 witherspoon systemd-journald[120]: Failed to send WATCHDOG=1 notification message: Transport endpoint is not connected

Change-Id: I7850d23805c1cb5c0b84cac4add28df16fe648f5
Signed-off-by: Adriana Kobylak
---
 meta-ibm/recipes-core/systemd/systemd/40-hardware-watchdog.conf | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/meta-ibm/recipes-core/systemd/systemd/40-hardware-watchdog.conf b/meta-ibm/recipes-core/systemd/systemd/40-hardware-watchdog.conf
index 42ca55e2bc..f7373588d4 100644
--- a/meta-ibm/recipes-core/systemd/systemd/40-hardware-watchdog.conf
+++ b/meta-ibm/recipes-core/systemd/systemd/40-hardware-watchdog.conf
@@ -1,3 +1,3 @@
 [Manager]
-#RuntimeWatchdogSec=120s
-#WatchdogDevice=/dev/watchdog
+RuntimeWatchdogSec=120s
+WatchdogDevice=/dev/watchdog1
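The recovery this patch relies on can be illustrated with a small timing model: systemd arms the hardware watchdog with the RuntimeWatchdogSec timeout and keeps pinging it while PID 1 is healthy; once PID 1 freezes, the pings stop and the watchdog resets the BMC. The sketch below is a toy model, not systemd or kernel code. The names HardwareWatchdog and simulate_reset_time are hypothetical, and the 60 s ping interval used in the example (half of RuntimeWatchdogSec) is an assumption about how often systemd pings, not something stated in this patch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HardwareWatchdog:
    """Toy model of a hardware watchdog device such as /dev/watchdog1.

    Illustration only: this is not a kernel driver or systemd code.
    """

    timeout_s: float          # armed timeout, e.g. RuntimeWatchdogSec=120s
    last_ping_s: float = 0.0  # assumption: armed (and first pinged) at t = 0

    def ping(self, now_s: float) -> None:
        # A keepalive pushes the expiry deadline out by one full timeout.
        self.last_ping_s = now_s

    def expires_at(self) -> float:
        return self.last_ping_s + self.timeout_s


def simulate_reset_time(
    timeout_s: float,
    ping_interval_s: float,
    hang_at_s: Optional[float],
    horizon_s: float,
) -> Optional[float]:
    """Return the time at which the watchdog resets the BMC, or None.

    PID 1 pings every ping_interval_s while healthy and stops pinging once
    it hangs at hang_at_s (None means it never hangs). Expiry is checked at
    each ping step up to horizon_s.
    """
    wdt = HardwareWatchdog(timeout_s=timeout_s)
    t = 0.0
    while t <= horizon_s:
        # Check for expiry first: a ping that arrives too late cannot help.
        if t >= wdt.expires_at():
            return wdt.expires_at()
        if hang_at_s is None or t < hang_at_s:
            wdt.ping(t)
        t += ping_interval_s
    return None
```

For example, simulate_reset_time(120.0, 60.0, hang_at_s=150.0, horizon_s=1000.0) returns 240.0: the last keepalive lands at t=120 s, so the reset follows 90 s after the hang. In this model the delay between PID 1 freezing and the reset never exceeds the configured timeout, while with hang_at_s=None the watchdog never fires.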