summaryrefslogtreecommitdiff
path: root/Documentation/x86
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/x86')
-rw-r--r--Documentation/x86/boot.rst4
-rw-r--r--Documentation/x86/buslock.rst126
-rw-r--r--Documentation/x86/elf_auxvec.rst53
-rw-r--r--Documentation/x86/index.rst2
-rw-r--r--Documentation/x86/mtrr.rst2
-rw-r--r--Documentation/x86/x86_64/boot-options.rst42
-rw-r--r--Documentation/x86/x86_64/mm.rst4
7 files changed, 195 insertions, 38 deletions
diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst
index fc844913dece..894a19897005 100644
--- a/Documentation/x86/boot.rst
+++ b/Documentation/x86/boot.rst
@@ -1343,7 +1343,7 @@ follow::
In addition to read/modify/write the setup header of the struct
boot_params as that of 16-bit boot protocol, the boot loader should
also fill the additional fields of the struct boot_params as
-described in chapter :doc:`zero-page`.
+described in chapter Documentation/x86/zero-page.rst.
After setting up the struct boot_params, the boot loader can load the
32/64-bit kernel in the same way as that of 16-bit boot protocol.
@@ -1379,7 +1379,7 @@ can be calculated as follows::
In addition to read/modify/write the setup header of the struct
boot_params as that of 16-bit boot protocol, the boot loader should
also fill the additional fields of the struct boot_params as described
-in chapter :doc:`zero-page`.
+in chapter Documentation/x86/zero-page.rst.
After setting up the struct boot_params, the boot loader can load
64-bit kernel in the same way as that of 16-bit boot protocol, but
diff --git a/Documentation/x86/buslock.rst b/Documentation/x86/buslock.rst
new file mode 100644
index 000000000000..7c051e714943
--- /dev/null
+++ b/Documentation/x86/buslock.rst
@@ -0,0 +1,126 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: <isonum.txt>
+
+===============================
+Bus lock detection and handling
+===============================
+
+:Copyright: |copy| 2021 Intel Corporation
+:Authors: - Fenghua Yu <fenghua.yu@intel.com>
+ - Tony Luck <tony.luck@intel.com>
+
+Problem
+=======
+
+A split lock is any atomic operation whose operand crosses two cache lines.
+Since the operand spans two cache lines and the operation must be atomic,
+the system locks the bus while the CPU accesses the two cache lines.
+
+A bus lock is acquired through either split locked access to writeback (WB)
+memory or any locked access to non-WB memory. This is typically thousands of
+cycles slower than an atomic operation within a cache line. It also disrupts
+performance on other cores and brings the whole system to its knees.
+
+Detection
+=========
+
+Intel processors may support either or both of the following hardware
+mechanisms to detect split locks and bus locks.
+
+#AC exception for split lock detection
+--------------------------------------
+
+Beginning with the Tremont Atom CPU split lock operations may raise an
+Alignment Check (#AC) exception when a split lock operation is attemped.
+
+#DB exception for bus lock detection
+------------------------------------
+
+Some CPUs have the ability to notify the kernel by an #DB trap after a user
+instruction acquires a bus lock and is executed. This allows the kernel to
+terminate the application or to enforce throttling.
+
+Software handling
+=================
+
+The kernel #AC and #DB handlers handle bus lock based on the kernel
+parameter "split_lock_detect". Here is a summary of different options:
+
++------------------+----------------------------+-----------------------+
+|split_lock_detect=|#AC for split lock |#DB for bus lock |
++------------------+----------------------------+-----------------------+
+|off |Do nothing |Do nothing |
++------------------+----------------------------+-----------------------+
+|warn |Kernel OOPs |Warn once per task and |
+|(default) |Warn once per task and |and continues to run. |
+| |disable future checking | |
+| |When both features are | |
+| |supported, warn in #AC | |
++------------------+----------------------------+-----------------------+
+|fatal |Kernel OOPs |Send SIGBUS to user. |
+| |Send SIGBUS to user | |
+| |When both features are | |
+| |supported, fatal in #AC | |
++------------------+----------------------------+-----------------------+
+|ratelimit:N |Do nothing |Limit bus lock rate to |
+|(0 < N <= 1000) | |N bus locks per second |
+| | |system wide and warn on|
+| | |bus locks. |
++------------------+----------------------------+-----------------------+
+
+Usages
+======
+
+Detecting and handling bus lock may find usages in various areas:
+
+It is critical for real time system designers who build consolidated real
+time systems. These systems run hard real time code on some cores and run
+"untrusted" user processes on other cores. The hard real time cannot afford
+to have any bus lock from the untrusted processes to hurt real time
+performance. To date the designers have been unable to deploy these
+solutions as they have no way to prevent the "untrusted" user code from
+generating split lock and bus lock to block the hard real time code to
+access memory during bus locking.
+
+It's also useful for general computing to prevent guests or user
+applications from slowing down the overall system by executing instructions
+with bus lock.
+
+
+Guidance
+========
+off
+---
+
+Disable checking for split lock and bus lock. This option can be useful if
+there are legacy applications that trigger these events at a low rate so
+that mitigation is not needed.
+
+warn
+----
+
+A warning is emitted when a bus lock is detected which allows to identify
+the offending application. This is the default behavior.
+
+fatal
+-----
+
+In this case, the bus lock is not tolerated and the process is killed.
+
+ratelimit
+---------
+
+A system wide bus lock rate limit N is specified where 0 < N <= 1000. This
+allows a bus lock rate up to N bus locks per second. When the bus lock rate
+is exceeded then any task which is caught via the buslock #DB exception is
+throttled by enforced sleeps until the rate goes under the limit again.
+
+This is an effective mitigation in cases where a minimal impact can be
+tolerated, but an eventual Denial of Service attack has to be prevented. It
+allows to identify the offending processes and analyze whether they are
+malicious or just badly written.
+
+Selecting a rate limit of 1000 allows the bus to be locked for up to about
+seven million cycles each second (assuming 7000 cycles for each bus
+lock). On a 2 GHz processor that would be about 0.35% system slowdown.
diff --git a/Documentation/x86/elf_auxvec.rst b/Documentation/x86/elf_auxvec.rst
new file mode 100644
index 000000000000..18e4744717f9
--- /dev/null
+++ b/Documentation/x86/elf_auxvec.rst
@@ -0,0 +1,53 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+x86-specific ELF Auxiliary Vectors
+==================================
+
+This document describes the semantics of the x86 auxiliary vectors.
+
+Introduction
+============
+
+ELF Auxiliary vectors enable the kernel to efficiently provide
+configuration-specific parameters to userspace. In this example, a program
+allocates an alternate stack based on the kernel-provided size::
+
+ #include <sys/auxv.h>
+ #include <elf.h>
+ #include <signal.h>
+ #include <stdlib.h>
+ #include <assert.h>
+ #include <err.h>
+
+ #ifndef AT_MINSIGSTKSZ
+ #define AT_MINSIGSTKSZ 51
+ #endif
+
+ ....
+ stack_t ss;
+
+ ss.ss_sp = malloc(ss.ss_size);
+ assert(ss.ss_sp);
+
+ ss.ss_size = getauxval(AT_MINSIGSTKSZ) + SIGSTKSZ;
+ ss.ss_flags = 0;
+
+ if (sigaltstack(&ss, NULL))
+ err(1, "sigaltstack");
+
+
+The exposed auxiliary vectors
+=============================
+
+AT_SYSINFO is used for locating the vsyscall entry point. It is not
+exported on 64-bit mode.
+
+AT_SYSINFO_EHDR is the start address of the page containing the vDSO.
+
+AT_MINSIGSTKSZ denotes the minimum stack size required by the kernel to
+deliver a signal to user-space. AT_MINSIGSTKSZ comprehends the space
+consumed by the kernel to accommodate the user context for the current
+hardware configuration. It does not comprehend subsequent user-space stack
+consumption, which must be added by the user. (e.g. Above, user-space adds
+SIGSTKSZ to AT_MINSIGSTKSZ.)
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 4693e192b447..383048396336 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -29,9 +29,11 @@ x86-specific Documentation
microcode
resctrl
tsx_async_abort
+ buslock
usb-legacy-support
i386/index
x86_64/index
sva
sgx
features
+ elf_auxvec
diff --git a/Documentation/x86/mtrr.rst b/Documentation/x86/mtrr.rst
index c5b695d75349..9f0b1851771a 100644
--- a/Documentation/x86/mtrr.rst
+++ b/Documentation/x86/mtrr.rst
@@ -28,7 +28,7 @@ are aligned with platform MTRR setup. If MTRRs are only set up by the platform
firmware code though and the OS does not make any specific MTRR mapping
requests mtrr_type_lookup() should always return MTRR_TYPE_INVALID.
-For details refer to :doc:`pat`.
+For details refer to Documentation/x86/pat.rst.
.. tip::
On Intel P6 family processors (Pentium Pro, Pentium II and later)
diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst
index 324cefff92e7..ccb7e86bf8d9 100644
--- a/Documentation/x86/x86_64/boot-options.rst
+++ b/Documentation/x86/x86_64/boot-options.rst
@@ -126,7 +126,7 @@ Idle loop
Rebooting
=========
- reboot=b[ios] | t[riple] | k[bd] | a[cpi] | e[fi] [, [w]arm | [c]old]
+ reboot=b[ios] | t[riple] | k[bd] | a[cpi] | e[fi] | p[ci] [, [w]arm | [c]old]
bios
Use the CPU reboot vector for warm reset
warm
@@ -145,6 +145,8 @@ Rebooting
Use efi reset_system runtime service. If EFI is not configured or
the EFI reset does not work, the reboot path attempts the reset using
the keyboard controller.
+ pci
+ Use a write to the PCI config space register 0xcf9 to trigger reboot.
Using warm reset will be much faster especially on big memory
systems because the BIOS will not go through the memory check.
@@ -155,6 +157,13 @@ Rebooting
Don't stop other CPUs on reboot. This can make reboot more reliable
in some cases.
+ reboot=default
+ There are some built-in platform specific "quirks" - you may see:
+ "reboot: <name> series board detected. Selecting <type> for reboots."
+ In the case where you think the quirk is in error (e.g. you have
+ newer BIOS, or newer board) using this option will ignore the built-in
+ quirk table, and use the generic default reboot actions.
+
Non Executable Mappings
=======================
@@ -247,16 +256,11 @@ Multiple x86-64 PCI-DMA mapping implementations exist, for example:
Kernel boot message: "PCI-DMA: Using software bounce buffering
for IO (SWIOTLB)"
- 4. <arch/x86_64/pci-calgary.c> : IBM Calgary hardware IOMMU. Used in IBM
- pSeries and xSeries servers. This hardware IOMMU supports DMA address
- mapping with memory protection, etc.
- Kernel boot message: "PCI-DMA: Using Calgary IOMMU"
-
::
iommu=[<size>][,noagp][,off][,force][,noforce]
[,memaper[=<order>]][,merge][,fullflush][,nomerge]
- [,noaperture][,calgary]
+ [,noaperture]
General iommu options:
@@ -295,8 +299,6 @@ iommu options only relevant to the AMD GART hardware IOMMU:
Don't initialize the AGP driver and use full aperture.
panic
Always panic when IOMMU overflows.
- calgary
- Use the Calgary IOMMU if it is available
iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU
implementation:
@@ -307,28 +309,6 @@ implementation:
force
Force all IO through the software TLB.
-Settings for the IBM Calgary hardware IOMMU currently found in IBM
-pSeries and xSeries machines
-
- calgary=[64k,128k,256k,512k,1M,2M,4M,8M]
- Set the size of each PCI slot's translation table when using the
- Calgary IOMMU. This is the size of the translation table itself
- in main memory. The smallest table, 64k, covers an IO space of
- 32MB; the largest, 8MB table, can cover an IO space of 4GB.
- Normally the kernel will make the right choice by itself.
- calgary=[translate_empty_slots]
- Enable translation even on slots that have no devices attached to
- them, in case a device will be hotplugged in the future.
- calgary=[disable=<PCI bus number>]
- Disable translation on a given PHB. For
- example, the built-in graphics adapter resides on the first bridge
- (PCI bus number 0); if translation (isolation) is enabled on this
- bridge, X servers that access the hardware directly from user
- space might stop working. Use this option if you have devices that
- are accessed from userspace directly on some PCI host bridge.
- panic
- Always panic when IOMMU overflows
-
Miscellaneous
=============
diff --git a/Documentation/x86/x86_64/mm.rst b/Documentation/x86/x86_64/mm.rst
index ede1875719fb..9798676bb0bf 100644
--- a/Documentation/x86/x86_64/mm.rst
+++ b/Documentation/x86/x86_64/mm.rst
@@ -140,10 +140,6 @@ The direct mapping covers all memory in the system up to the highest
memory address (this means in some cases it can also include PCI memory
holes).
-vmalloc space is lazily synchronized into the different PML4/PML5 pages of
-the processes using the page fault handler, with init_top_pgt as
-reference.
-
We map EFI runtime services in the 'efi_pgd' PGD in a 64Gb large virtual
memory window (this size is arbitrary, it can be raised later if needed).
The mappings are not part of any other kernel PGD and are only available