summaryrefslogtreecommitdiff
path: root/Documentation/arm64
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/arm64')
-rw-r--r--Documentation/arm64/asymmetric-32bit.rst155
-rw-r--r--Documentation/arm64/booting.rst43
-rw-r--r--Documentation/arm64/index.rst1
-rw-r--r--Documentation/arm64/memory-tagging-extension.rst48
-rw-r--r--Documentation/arm64/tagged-address-abi.rst26
5 files changed, 254 insertions, 19 deletions
diff --git a/Documentation/arm64/asymmetric-32bit.rst b/Documentation/arm64/asymmetric-32bit.rst
new file mode 100644
index 000000000000..64a0b505da7d
--- /dev/null
+++ b/Documentation/arm64/asymmetric-32bit.rst
@@ -0,0 +1,155 @@
+======================
+Asymmetric 32-bit SoCs
+======================
+
+Author: Will Deacon <will@kernel.org>
+
+This document describes the impact of asymmetric 32-bit SoCs on the
+execution of 32-bit (``AArch32``) applications.
+
+Date: 2021-05-17
+
+Introduction
+============
+
+Some Armv9 SoCs suffer from a big.LITTLE misfeature where only a subset
+of the CPUs are capable of executing 32-bit user applications. On such
+a system, Linux by default treats the asymmetry as a "mismatch" and
+disables support for both the ``PER_LINUX32`` personality and
+``execve(2)`` of 32-bit ELF binaries, with the latter returning
+``-ENOEXEC``. If the mismatch is detected during late onlining of a
+64-bit-only CPU, then the onlining operation fails and the new CPU is
+unavailable for scheduling.
+
+Surprisingly, these SoCs have been produced with the intention of
+running legacy 32-bit binaries. Unsurprisingly, that doesn't work very
+well with the default behaviour of Linux.
+
+It seems inevitable that future SoCs will drop 32-bit support
+altogether, so if you're stuck in the unenviable position of needing to
+run 32-bit code on one of these transitionary platforms then you would
+be wise to consider alternatives such as recompilation, emulation or
+retirement. If neither of those options are practical, then read on.
+
+Enabling kernel support
+=======================
+
+Since the kernel support is not completely transparent to userspace,
+allowing 32-bit tasks to run on an asymmetric 32-bit system requires an
+explicit "opt-in" and can be enabled by passing the
+``allow_mismatched_32bit_el0`` parameter on the kernel command-line.
+
+For the remainder of this document we will refer to an *asymmetric
+system* to mean an asymmetric 32-bit SoC running Linux with this kernel
+command-line option enabled.
+
+Userspace impact
+================
+
+32-bit tasks running on an asymmetric system behave in mostly the same
+way as on a homogeneous system, with a few key differences relating to
+CPU affinity.
+
+sysfs
+-----
+
+The subset of CPUs capable of running 32-bit tasks is described in
+``/sys/devices/system/cpu/aarch32_el0`` and is documented further in
+``Documentation/ABI/testing/sysfs-devices-system-cpu``.
+
+**Note:** CPUs are advertised by this file as they are detected and so
+late-onlining of 32-bit-capable CPUs can result in the file contents
+being modified by the kernel at runtime. Once advertised, CPUs are never
+removed from the file.
+
+``execve(2)``
+-------------
+
+On a homogeneous system, the CPU affinity of a task is preserved across
+``execve(2)``. This is not always possible on an asymmetric system,
+specifically when the new program being executed is 32-bit yet the
+affinity mask contains 64-bit-only CPUs. In this situation, the kernel
+determines the new affinity mask as follows:
+
+ 1. If the 32-bit-capable subset of the affinity mask is not empty,
+ then the affinity is restricted to that subset and the old affinity
+ mask is saved. This saved mask is inherited over ``fork(2)`` and
+ preserved across ``execve(2)`` of 32-bit programs.
+
+ **Note:** This step does not apply to ``SCHED_DEADLINE`` tasks.
+ See `SCHED_DEADLINE`_.
+
+ 2. Otherwise, the cpuset hierarchy of the task is walked until an
+ ancestor is found containing at least one 32-bit-capable CPU. The
+ affinity of the task is then changed to match the 32-bit-capable
+ subset of the cpuset determined by the walk.
+
+ 3. On failure (i.e. out of memory), the affinity is changed to the set
+ of all 32-bit-capable CPUs of which the kernel is aware.
+
+A subsequent ``execve(2)`` of a 64-bit program by the 32-bit task will
+invalidate the affinity mask saved in (1) and attempt to restore the CPU
+affinity of the task using the saved mask if it was previously valid.
+This restoration may fail due to intervening changes to the deadline
+policy or cpuset hierarchy, in which case the ``execve(2)`` continues
+with the affinity unchanged.
+
+Calls to ``sched_setaffinity(2)`` for a 32-bit task will consider only
+the 32-bit-capable CPUs of the requested affinity mask. On success, the
+affinity for the task is updated and any saved mask from a prior
+``execve(2)`` is invalidated.
+
+``SCHED_DEADLINE``
+------------------
+
+Explicit admission of a 32-bit deadline task to the default root domain
+(e.g. by calling ``sched_setattr(2)``) is rejected on an asymmetric
+32-bit system unless admission control is disabled by writing -1 to
+``/proc/sys/kernel/sched_rt_runtime_us``.
+
+``execve(2)`` of a 32-bit program from a 64-bit deadline task will
+return ``-ENOEXEC`` if the root domain for the task contains any
+64-bit-only CPUs and admission control is enabled. Concurrent offlining
+of 32-bit-capable CPUs may still necessitate the procedure described in
+`execve(2)`_, in which case step (1) is skipped and a warning is
+emitted on the console.
+
+**Note:** It is recommended that a set of 32-bit-capable CPUs are placed
+into a separate root domain if ``SCHED_DEADLINE`` is to be used with
+32-bit tasks on an asymmetric system. Failure to do so is likely to
+result in missed deadlines.
+
+Cpusets
+-------
+
+The affinity of a 32-bit task on an asymmetric system may include CPUs
+that are not explicitly allowed by the cpuset to which it is attached.
+This can occur as a result of the following two situations:
+
+ - A 64-bit task attached to a cpuset which allows only 64-bit CPUs
+ executes a 32-bit program.
+
+ - All of the 32-bit-capable CPUs allowed by a cpuset containing a
+ 32-bit task are offlined.
+
+In both of these cases, the new affinity is calculated according to step
+(2) of the process described in `execve(2)`_ and the cpuset hierarchy is
+unchanged irrespective of the cgroup version.
+
+CPU hotplug
+-----------
+
+On an asymmetric system, the first detected 32-bit-capable CPU is
+prevented from being offlined by userspace and any such attempt will
+return ``-EPERM``. Note that suspend is still permitted even if the
+primary CPU (i.e. CPU 0) is 64-bit-only.
+
+KVM
+---
+
+Although KVM will not advertise 32-bit EL0 support to any vCPUs on an
+asymmetric system, a broken guest at EL1 could still attempt to execute
+32-bit code at EL0. In this case, an exit from a vCPU thread in 32-bit
+mode will return to host userspace with an ``exit_reason`` of
+``KVM_EXIT_FAIL_ENTRY`` and will remain non-runnable until successfully
+re-initialised by a subsequent ``KVM_ARM_VCPU_INIT`` operation.
diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
index 18b8cc1bf32c..3f9d86557c5e 100644
--- a/Documentation/arm64/booting.rst
+++ b/Documentation/arm64/booting.rst
@@ -207,10 +207,17 @@ Before jumping into the kernel, the following conditions must be met:
software at a higher exception level to prevent execution in an UNKNOWN
state.
- - SCR_EL3.FIQ must have the same value across all CPUs the kernel is
- executing on.
- - The value of SCR_EL3.FIQ must be the same as the one present at boot
- time whenever the kernel is executing.
+ For all systems:
+ - If EL3 is present:
+
+ - SCR_EL3.FIQ must have the same value across all CPUs the kernel is
+ executing on.
+ - The value of SCR_EL3.FIQ must be the same as the one present at boot
+ time whenever the kernel is executing.
+
+ - If EL3 is present and the kernel is entered at EL2:
+
+ - SCR_EL3.HCE (bit 8) must be initialised to 0b1.
For systems with a GICv3 interrupt controller to be used in v3 mode:
- If EL3 is present:
@@ -277,6 +284,12 @@ Before jumping into the kernel, the following conditions must be met:
- SCR_EL3.FGTEn (bit 27) must be initialised to 0b1.
+ For CPUs with support for HCRX_EL2 (FEAT_HCX) present:
+
+ - If EL3 is present and the kernel is entered at EL2:
+
+ - SCR_EL3.HXEn (bit 38) must be initialised to 0b1.
+
For CPUs with Advanced SIMD and floating point support:
- If EL3 is present:
@@ -305,6 +318,28 @@ Before jumping into the kernel, the following conditions must be met:
- ZCR_EL2.LEN must be initialised to the same value for all CPUs the
kernel will execute on.
+ For CPUs with the Scalable Matrix Extension (FEAT_SME):
+
+ - If EL3 is present:
+
+ - CPTR_EL3.ESM (bit 12) must be initialised to 0b1.
+
+ - SCR_EL3.EnTP2 (bit 41) must be initialised to 0b1.
+
+ - SMCR_EL3.LEN must be initialised to the same value for all CPUs the
+ kernel will execute on.
+
+ - If the kernel is entered at EL1 and EL2 is present:
+
+ - CPTR_EL2.TSM (bit 12) must be initialised to 0b0.
+
+ - CPTR_EL2.SMEN (bits 25:24) must be initialised to 0b11.
+
+ - SCTLR_EL2.EnTP2 (bit 60) must be initialised to 0b1.
+
+ - SMCR_EL2.LEN must be initialised to the same value for all CPUs the
+ kernel will execute on.
+
The requirements described above for CPU mode, caches, MMUs, architected
timers, coherency and system registers apply to all CPUs. All CPUs must
enter the kernel in the same exception level. Where the values documented
diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
index 97d65ba12a35..4f840bac083e 100644
--- a/Documentation/arm64/index.rst
+++ b/Documentation/arm64/index.rst
@@ -10,6 +10,7 @@ ARM64 Architecture
acpi_object_usage
amu
arm-acpi
+ asymmetric-32bit
booting
cpu-feature-registers
elf_hwcaps
diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst
index b540178a93f8..7b99c8f428eb 100644
--- a/Documentation/arm64/memory-tagging-extension.rst
+++ b/Documentation/arm64/memory-tagging-extension.rst
@@ -77,14 +77,20 @@ configurable behaviours:
address is unknown).
The user can select the above modes, per thread, using the
-``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
-``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
+``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags``
+contains any number of the following values in the ``PR_MTE_TCF_MASK``
bit-field:
-- ``PR_MTE_TCF_NONE`` - *Ignore* tag check faults
+- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
+ (ignored if combined with other options)
- ``PR_MTE_TCF_SYNC`` - *Synchronous* tag check fault mode
- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
+If no modes are specified, tag check faults are ignored. If a single
+mode is specified, the program will run in that mode. If multiple
+modes are specified, the mode is selected as described in the "Per-CPU
+preferred tag checking modes" section below.
+
The current tag check fault mode can be read using the
``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.
@@ -120,13 +126,39 @@ in the ``PR_MTE_TAG_MASK`` bit-field.
interface provides an include mask. An include mask of ``0`` (exclusion
mask ``0xffff``) results in the CPU always generating tag ``0``.
+Per-CPU preferred tag checking mode
+-----------------------------------
+
+On some CPUs the performance of MTE in stricter tag checking modes
+is similar to that of less strict tag checking modes. This makes it
+worthwhile to enable stricter checks on those CPUs when a less strict
+checking mode is requested, in order to gain the error detection
+benefits of the stricter checks without the performance downsides. To
+support this scenario, a privileged user may configure a stricter
+tag checking mode as the CPU's preferred tag checking mode.
+
+The preferred tag checking mode for each CPU is controlled by
+``/sys/devices/system/cpu/cpu<N>/mte_tcf_preferred``, to which a
+privileged user may write the value ``async`` or ``sync``. The default
+preferred mode for each CPU is ``async``.
+
+To allow a program to potentially run in the CPU's preferred tag
+checking mode, the user program may set multiple tag check fault mode
+bits in the ``flags`` argument to the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
+flags, 0, 0, 0)`` system call. If the CPU's preferred tag checking
+mode is in the task's set of provided tag checking modes (this will
+always be the case at present because the kernel only supports two
+tag checking modes, but future kernels may support more modes), that
+mode will be selected. Otherwise, one of the modes in the task's mode
+set will be selected in a currently unspecified manner.
+
Initial process state
---------------------
On ``execve()``, the new process has the following configuration:
- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
-- Tag checking mode set to ``PR_MTE_TCF_NONE``
+- No tag checking modes are selected (tag check faults ignored)
- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
- ``PSTATE.TCO`` set to 0
- ``PROT_MTE`` not set on any of the initial memory maps
@@ -251,11 +283,13 @@ Example of correct usage
return EXIT_FAILURE;
/*
- * Enable the tagged address ABI, synchronous MTE tag check faults and
- * allow all non-zero tags in the randomly generated set.
+ * Enable the tagged address ABI, synchronous or asynchronous MTE
+ * tag check faults (based on per-CPU preference) and allow all
+ * non-zero tags in the randomly generated set.
*/
if (prctl(PR_SET_TAGGED_ADDR_CTRL,
- PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | (0xfffe << PR_MTE_TAG_SHIFT),
+ PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC |
+ (0xfffe << PR_MTE_TAG_SHIFT),
0, 0, 0)) {
perror("prctl() failed");
return EXIT_FAILURE;
diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst
index 459e6b66ff68..0c9120ec58ae 100644
--- a/Documentation/arm64/tagged-address-abi.rst
+++ b/Documentation/arm64/tagged-address-abi.rst
@@ -45,14 +45,24 @@ how the user addresses are used by the kernel:
1. User addresses not accessed by the kernel but used for address space
management (e.g. ``mprotect()``, ``madvise()``). The use of valid
- tagged pointers in this context is allowed with the exception of
- ``brk()``, ``mmap()`` and the ``new_address`` argument to
- ``mremap()`` as these have the potential to alias with existing
- user addresses.
-
- NOTE: This behaviour changed in v5.6 and so some earlier kernels may
- incorrectly accept valid tagged pointers for the ``brk()``,
- ``mmap()`` and ``mremap()`` system calls.
+ tagged pointers in this context is allowed with these exceptions:
+
+ - ``brk()``, ``mmap()`` and the ``new_address`` argument to
+ ``mremap()`` as these have the potential to alias with existing
+ user addresses.
+
+ NOTE: This behaviour changed in v5.6 and so some earlier kernels may
+ incorrectly accept valid tagged pointers for the ``brk()``,
+ ``mmap()`` and ``mremap()`` system calls.
+
+ - The ``range.start``, ``start`` and ``dst`` arguments to the
+ ``UFFDIO_*`` ``ioctl()``s used on a file descriptor obtained from
+ ``userfaultfd()``, as fault addresses subsequently obtained by reading
+ the file descriptor will be untagged, which may otherwise confuse
+ tag-unaware programs.
+
+ NOTE: This behaviour changed in v5.14 and so some earlier kernels may
+ incorrectly accept valid tagged pointers for this system call.
2. User addresses accessed by the kernel (e.g. ``write()``). This ABI
relaxation is disabled by default and the application thread needs to