author     Borislav Petkov <bp@suse.de>  2021-12-20 23:43:28 +0300
committer  Borislav Petkov <bp@suse.de>  2022-02-23 13:09:25 +0300
commit     7f1b8e0d6360178e3527d4f14e6921c254a86035
tree       21fd1faf6a1cbe36343dc8330da52c750bee7a33 /Documentation
parent     8ca97812c3c830573f965a07bbd84223e8c5f5bd
x86/mce: Remove the tolerance level control

This is pretty much unused and not really useful. What is more, all
relevant MCA hardware has recoverable machine checks support so there's
no real need to tweak MCA tolerance levels in order to *maybe* extend
machine lifetime.

So rip it out.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/YcDq8PxvKtTENl/e@zn.tnic
Diffstat (limited to 'Documentation')
-rw-r--r--  Documentation/ABI/removed/sysfs-mce         37
-rw-r--r--  Documentation/ABI/testing/sysfs-mce         32
-rw-r--r--  Documentation/vm/hwpoison.rst                2
-rw-r--r--  Documentation/x86/x86_64/boot-options.rst    9
4 files changed, 38 insertions, 42 deletions
diff --git a/Documentation/ABI/removed/sysfs-mce b/Documentation/ABI/removed/sysfs-mce
new file mode 100644
index 000000000000..ef5dd2a80918
--- /dev/null
+++ b/Documentation/ABI/removed/sysfs-mce
@@ -0,0 +1,37 @@
+What: /sys/devices/system/machinecheck/machinecheckX/tolerant
+Contact: Borislav Petkov <bp@suse.de>
+Date: Dec, 2021
+Description:
+ Unused and obsolete after the advent of recoverable machine
+ checks (see last sentence below) and those are present since
+ 2010 (Nehalem).
+
+ Original description:
+
+ The entries appear for each CPU, but they are truly shared
+ between all CPUs.
+
+ Tolerance level. When a machine check exception occurs for a
+ non corrected machine check the kernel can take different
+ actions.
+
+ Since machine check exceptions can happen any time it is
+ sometimes risky for the kernel to kill a process because it
+ defies normal kernel locking rules. The tolerance level
+ configures how hard the kernel tries to recover even at some
+ risk of deadlock. Higher tolerant values trade potentially
+ better uptime with the risk of a crash or even corruption
+ (for tolerant >= 3).
+
+ == ===========================================================
+ 0 always panic on uncorrected errors, log corrected errors
+ 1 panic or SIGBUS on uncorrected errors, log corrected errors
+ 2 SIGBUS or log uncorrected errors, log corrected errors
+ 3 never panic or SIGBUS, log all errors (for testing only)
+ == ===========================================================
+
+ Default: 1
+
+ Note this only makes a difference if the CPU allows recovery
+ from a machine check exception. Current x86 CPUs generally
+ do not.
diff --git a/Documentation/ABI/testing/sysfs-mce b/Documentation/ABI/testing/sysfs-mce
index c8cd989034b4..83172f50e27c 100644
--- a/Documentation/ABI/testing/sysfs-mce
+++ b/Documentation/ABI/testing/sysfs-mce
@@ -53,38 +53,6 @@ Description:
(but some corrected errors might be still reported
in other ways)
-What: /sys/devices/system/machinecheck/machinecheckX/tolerant
-Contact: Andi Kleen <ak@linux.intel.com>
-Date: Feb, 2007
-Description:
- The entries appear for each CPU, but they are truly shared
- between all CPUs.
-
- Tolerance level. When a machine check exception occurs for a
- non corrected machine check the kernel can take different
- actions.
-
- Since machine check exceptions can happen any time it is
- sometimes risky for the kernel to kill a process because it
- defies normal kernel locking rules. The tolerance level
- configures how hard the kernel tries to recover even at some
- risk of deadlock. Higher tolerant values trade potentially
- better uptime with the risk of a crash or even corruption
- (for tolerant >= 3).
-
- == ===========================================================
- 0 always panic on uncorrected errors, log corrected errors
- 1 panic or SIGBUS on uncorrected errors, log corrected errors
- 2 SIGBUS or log uncorrected errors, log corrected errors
- 3 never panic or SIGBUS, log all errors (for testing only)
- == ===========================================================
-
- Default: 1
-
- Note this only makes a difference if the CPU allows recovery
- from a machine check exception. Current x86 CPUs generally
- do not.
-
What: /sys/devices/system/machinecheck/machinecheckX/trigger
Contact: Andi Kleen <ak@linux.intel.com>
Date: Feb, 2007
diff --git a/Documentation/vm/hwpoison.rst b/Documentation/vm/hwpoison.rst
index 89b5f7a52077..c742de1769d1 100644
--- a/Documentation/vm/hwpoison.rst
+++ b/Documentation/vm/hwpoison.rst
@@ -60,8 +60,6 @@ There are two (actually three) modes memory failure recovery can be in:
vm.memory_failure_recovery sysctl set to zero:
All memory failures cause a panic. Do not attempt recovery.
- (on x86 this can be also affected by the tolerant level of the
- MCE subsystem)
early kill
(can be controlled globally and per process)
diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst
index ccb7e86bf8d9..07aa0007f346 100644
--- a/Documentation/x86/x86_64/boot-options.rst
+++ b/Documentation/x86/x86_64/boot-options.rst
@@ -47,14 +47,7 @@ Please see Documentation/x86/x86_64/machinecheck.rst for sysfs runtime tunables.
in a reboot. On Intel systems it is enabled by default.
mce=nobootlog
Disable boot machine check logging.
- mce=tolerancelevel[,monarchtimeout] (number,number)
- tolerance levels:
- 0: always panic on uncorrected errors, log corrected errors
- 1: panic or SIGBUS on uncorrected errors, log corrected errors
- 2: SIGBUS or log uncorrected errors, log corrected errors
- 3: never panic or SIGBUS, log all errors (for testing only)
- Default is 1
- Can be also set using sysfs which is preferable.
+ mce=monarchtimeout (number)
monarchtimeout:
Sets the time in us to wait for other CPUs on machine checks. 0
to disable.
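
Note on the last hunk: the removed text documented the first number after mce= as a tolerance level, optionally followed by a monarch timeout, while the new text documents a single number as the monarch timeout. A boot configuration written for the old syntax may therefore no longer mean what its author intended. The following is a minimal sketch, not part of the patch, of how such options could be spotted in a command line string. The helper name classify_mce_options and its labels are hypothetical, the whitespace split is a simplification that ignores quoted arguments, and the code does not reproduce the kernel's own parser. How a given kernel treats the old two-number form is not stated in the documentation above.

```python
import re


def classify_mce_options(cmdline):
    """Classify every mce= option found in a kernel command line string.

    Returns a list of (value, kind) tuples. The kinds are hypothetical
    labels chosen for this sketch:
      "legacy-pair": two numbers, the removed tolerancelevel,monarchtimeout form
      "numeric":     one number, documented as monarchtimeout after this patch
                     (documented as a tolerance level before it)
      "other":       anything else, for example mce=nobootlog
    """
    results = []
    # Simplification: split on whitespace and ignore quoting.
    for token in cmdline.split():
        if not token.startswith("mce="):
            continue
        value = token[len("mce="):]
        if re.fullmatch(r"\d+,\d+", value):
            kind = "legacy-pair"
        elif re.fullmatch(r"\d+", value):
            kind = "numeric"
        else:
            kind = "other"
        results.append((value, kind))
    return results


# Example with a made-up command line; on a real system the string could
# be read from /proc/cmdline instead.
sample = "root=/dev/sda1 mce=2,1000 mce=nobootlog mce=500"
for value, kind in classify_mce_options(sample):
    print(f"mce={value}: {kind}")
```

Anything reported as "legacy-pair" was written for the removed syntax, and a "numeric" entry should be reviewed, because the same number has a different documented meaning before and after this change.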