diff options
Diffstat (limited to 'Documentation/x86')
-rw-r--r-- | Documentation/x86/index.rst | 3 | ||||
-rw-r--r-- | Documentation/x86/resctrl.rst (renamed from Documentation/x86/resctrl_ui.rst) | 93 | ||||
-rw-r--r-- | Documentation/x86/sgx.rst | 211 | ||||
-rw-r--r-- | Documentation/x86/topology.rst | 9 |
4 files changed, 315 insertions, 1 deletions
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index b4fcb6f258b2..4693e192b447 100644 --- a/Documentation/x86/index.rst +++ b/Documentation/x86/index.rst @@ -27,10 +27,11 @@ x86-specific Documentation pti mds microcode - resctrl_ui + resctrl tsx_async_abort usb-legacy-support i386/index x86_64/index sva + sgx features diff --git a/Documentation/x86/resctrl_ui.rst b/Documentation/x86/resctrl.rst index e59b7b93a9b4..71a531061e4e 100644 --- a/Documentation/x86/resctrl_ui.rst +++ b/Documentation/x86/resctrl.rst @@ -1209,3 +1209,96 @@ View the llc occupancy snapshot:: # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy 11234000 + +Intel RDT Errata +================ + +Intel MBM Counters May Report System Memory Bandwidth Incorrectly +----------------------------------------------------------------- + +Errata SKX99 for Skylake server and BDF102 for Broadwell server. + +Problem: Intel Memory Bandwidth Monitoring (MBM) counters track metrics +according to the assigned Resource Monitor ID (RMID) for that logical +core. The IA32_QM_CTR register (MSR 0xC8E), used to report these +metrics, may report incorrect system bandwidth for certain RMID values. + +Implication: Due to the errata, system memory bandwidth may not match +what is reported. + +Workaround: MBM total and local readings are corrected according to the +following correction factor table: + ++---------------+---------------+---------------+-----------------+ +|core count |rmid count |rmid threshold |correction factor| ++---------------+---------------+---------------+-----------------+ +|1 |8 |0 |1.000000 | ++---------------+---------------+---------------+-----------------+ +|2 |16 |0 |1.000000 | ++---------------+---------------+---------------+-----------------+ +|3 |24 |15 |0.969650 | ++---------------+---------------+---------------+-----------------+ +|4 |32 |0 |1.000000 | ++---------------+---------------+---------------+-----------------+ +|6 |48 |31 |0.969650 | ++---------------+---------------+---------------+-----------------+ +|7 |56 |47 |1.142857 | ++---------------+---------------+---------------+-----------------+ +|8 |64 |0 |1.000000 | ++---------------+---------------+---------------+-----------------+ +|9 |72 |63 |1.185115 | ++---------------+---------------+---------------+-----------------+ +|10 |80 |63 |1.066553 | ++---------------+---------------+---------------+-----------------+ +|11 |88 |79 |1.454545 | ++---------------+---------------+---------------+-----------------+ +|12 |96 |0 |1.000000 | ++---------------+---------------+---------------+-----------------+ +|13 |104 |95 |1.230769 | ++---------------+---------------+---------------+-----------------+ +|14 |112 |95 |1.142857 | ++---------------+---------------+---------------+-----------------+ +|15 |120 |95 |1.066667 | ++---------------+---------------+---------------+-----------------+ +|16 |128 |0 |1.000000 | ++---------------+---------------+---------------+-----------------+ +|17 |136 |127 |1.254863 | ++---------------+---------------+---------------+-----------------+ +|18 |144 |127 |1.185255 | ++---------------+---------------+---------------+-----------------+ +|19 |152 |0 |1.000000 | ++---------------+---------------+---------------+-----------------+ +|20 |160 |127 |1.066667 | ++---------------+---------------+---------------+-----------------+ +|21 |168 |0 |1.000000 | ++---------------+---------------+---------------+-----------------+ +|22 |176 |159 |1.454334 | ++---------------+---------------+---------------+-----------------+ +|23 |184 |0 |1.000000 | ++---------------+---------------+---------------+-----------------+ +|24 |192 |127 |0.969744 | ++---------------+---------------+---------------+-----------------+ +|25 |200 |191 |1.280246 | ++---------------+---------------+---------------+-----------------+ +|26 |208 |191 |1.230921 | ++---------------+---------------+---------------+-----------------+ +|27 |216 |0 |1.000000 | ++---------------+---------------+---------------+-----------------+ +|28 |224 |191 |1.143118 | ++---------------+---------------+---------------+-----------------+ + +If rmid > rmid threshold, MBM total and local values should be multiplied +by the correction factor. + +See: + +1. Erratum SKX99 in Intel Xeon Processor Scalable Family Specification Update: +http://web.archive.org/web/20200716124958/https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-spec-update.html + +2. Erratum BDF102 in Intel Xeon E5-2600 v4 Processor Product Family Specification Update: +http://web.archive.org/web/20191125200531/https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v4-spec-update.pdf + +3. The errata in Intel Resource Director Technology (Intel RDT) on 2nd Generation Intel Xeon Scalable Processors Reference Manual: +https://software.intel.com/content/www/us/en/develop/articles/intel-resource-director-technology-rdt-reference-manual.html + +for further information. diff --git a/Documentation/x86/sgx.rst b/Documentation/x86/sgx.rst new file mode 100644 index 000000000000..eaee1368b4fd --- /dev/null +++ b/Documentation/x86/sgx.rst @@ -0,0 +1,211 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=============================== +Software Guard eXtensions (SGX) +=============================== + +Overview +======== + +Software Guard eXtensions (SGX) hardware enables for user space applications +to set aside private memory regions of code and data: + +* Privileged (ring-0) ENCLS functions orchestrate the construction of the. + regions. +* Unprivileged (ring-3) ENCLU functions allow an application to enter and + execute inside the regions. + +These memory regions are called enclaves. An enclave can be only entered at a +fixed set of entry points. Each entry point can hold a single hardware thread +at a time. While the enclave is loaded from a regular binary file by using +ENCLS functions, only the threads inside the enclave can access its memory. The +region is denied from outside access by the CPU, and encrypted before it leaves +from LLC. + +The support can be determined by + + ``grep sgx /proc/cpuinfo`` + +SGX must both be supported in the processor and enabled by the BIOS. If SGX +appears to be unsupported on a system which has hardware support, ensure +support is enabled in the BIOS. If a BIOS presents a choice between "Enabled" +and "Software Enabled" modes for SGX, choose "Enabled". + +Enclave Page Cache +================== + +SGX utilizes an *Enclave Page Cache (EPC)* to store pages that are associated +with an enclave. It is contained in a BIOS-reserved region of physical memory. +Unlike pages used for regular memory, pages can only be accessed from outside of +the enclave during enclave construction with special, limited SGX instructions. + +Only a CPU executing inside an enclave can directly access enclave memory. +However, a CPU executing inside an enclave may access normal memory outside the +enclave. + +The kernel manages enclave memory similar to how it treats device memory. + +Enclave Page Types +------------------ + +**SGX Enclave Control Structure (SECS)** + Enclave's address range, attributes and other global data are defined + by this structure. + +**Regular (REG)** + Regular EPC pages contain the code and data of an enclave. + +**Thread Control Structure (TCS)** + Thread Control Structure pages define the entry points to an enclave and + track the execution state of an enclave thread. + +**Version Array (VA)** + Version Array pages contain 512 slots, each of which can contain a version + number for a page evicted from the EPC. + +Enclave Page Cache Map +---------------------- + +The processor tracks EPC pages in a hardware metadata structure called the +*Enclave Page Cache Map (EPCM)*. The EPCM contains an entry for each EPC page +which describes the owning enclave, access rights and page type among the other +things. + +EPCM permissions are separate from the normal page tables. This prevents the +kernel from, for instance, allowing writes to data which an enclave wishes to +remain read-only. EPCM permissions may only impose additional restrictions on +top of normal x86 page permissions. + +For all intents and purposes, the SGX architecture allows the processor to +invalidate all EPCM entries at will. This requires that software be prepared to +handle an EPCM fault at any time. In practice, this can happen on events like +power transitions when the ephemeral key that encrypts enclave memory is lost. + +Application interface +===================== + +Enclave build functions +----------------------- + +In addition to the traditional compiler and linker build process, SGX has a +separate enclave “build” process. Enclaves must be built before they can be +executed (entered). The first step in building an enclave is opening the +**/dev/sgx_enclave** device. Since enclave memory is protected from direct +access, special privileged instructions are Then used to copy data into enclave +pages and establish enclave page permissions. + +.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c + :functions: sgx_ioc_enclave_create + sgx_ioc_enclave_add_pages + sgx_ioc_enclave_init + sgx_ioc_enclave_provision + +Enclave vDSO +------------ + +Entering an enclave can only be done through SGX-specific EENTER and ERESUME +functions, and is a non-trivial process. Because of the complexity of +transitioning to and from an enclave, enclaves typically utilize a library to +handle the actual transitions. This is roughly analogous to how glibc +implementations are used by most applications to wrap system calls. + +Another crucial characteristic of enclaves is that they can generate exceptions +as part of their normal operation that need to be handled in the enclave or are +unique to SGX. + +Instead of the traditional signal mechanism to handle these exceptions, SGX +can leverage special exception fixup provided by the vDSO. The kernel-provided +vDSO function wraps low-level transitions to/from the enclave like EENTER and +ERESUME. The vDSO function intercepts exceptions that would otherwise generate +a signal and return the fault information directly to its caller. This avoids +the need to juggle signal handlers. + +.. kernel-doc:: arch/x86/include/uapi/asm/sgx.h + :functions: vdso_sgx_enter_enclave_t + +ksgxd +===== + +SGX support includes a kernel thread called *ksgxwapd*. + +EPC sanitization +---------------- + +ksgxd is started when SGX initializes. Enclave memory is typically ready +For use when the processor powers on or resets. However, if SGX has been in +use since the reset, enclave pages may be in an inconsistent state. This might +occur after a crash and kexec() cycle, for instance. At boot, ksgxd +reinitializes all enclave pages so that they can be allocated and re-used. + +The sanitization is done by going through EPC address space and applying the +EREMOVE function to each physical page. Some enclave pages like SECS pages have +hardware dependencies on other pages which prevents EREMOVE from functioning. +Executing two EREMOVE passes removes the dependencies. + +Page reclaimer +-------------- + +Similar to the core kswapd, ksgxd, is responsible for managing the +overcommitment of enclave memory. If the system runs out of enclave memory, +*ksgxwapd* “swaps” enclave memory to normal memory. + +Launch Control +============== + +SGX provides a launch control mechanism. After all enclave pages have been +copied, kernel executes EINIT function, which initializes the enclave. Only after +this the CPU can execute inside the enclave. + +ENIT function takes an RSA-3072 signature of the enclave measurement. The function +checks that the measurement is correct and signature is signed with the key +hashed to the four **IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}** MSRs representing the +SHA256 of a public key. + +Those MSRs can be configured by the BIOS to be either readable or writable. +Linux supports only writable configuration in order to give full control to the +kernel on launch control policy. Before calling EINIT function, the driver sets +the MSRs to match the enclave's signing key. + +Encryption engines +================== + +In order to conceal the enclave data while it is out of the CPU package, the +memory controller has an encryption engine to transparently encrypt and decrypt +enclave memory. + +In CPUs prior to Ice Lake, the Memory Encryption Engine (MEE) is used to +encrypt pages leaving the CPU caches. MEE uses a n-ary Merkle tree with root in +SRAM to maintain integrity of the encrypted data. This provides integrity and +anti-replay protection but does not scale to large memory sizes because the time +required to update the Merkle tree grows logarithmically in relation to the +memory size. + +CPUs starting from Icelake use Total Memory Encryption (TME) in the place of +MEE. TME-based SGX implementations do not have an integrity Merkle tree, which +means integrity and replay-attacks are not mitigated. B, it includes +additional changes to prevent cipher text from being returned and SW memory +aliases from being Created. + +DMA to enclave memory is blocked by range registers on both MEE and TME systems +(SDM section 41.10). + +Usage Models +============ + +Shared Library +-------------- + +Sensitive data and the code that acts on it is partitioned from the application +into a separate library. The library is then linked as a DSO which can be loaded +into an enclave. The application can then make individual function calls into +the enclave through special SGX instructions. A run-time within the enclave is +configured to marshal function parameters into and out of the enclave and to +call the correct library function. + +Application Container +--------------------- + +An application may be loaded into a container enclave which is specially +configured with a library OS and run-time which permits the application to run. +The enclave run-time and library OS work together to execute the application +when a thread enters the enclave. diff --git a/Documentation/x86/topology.rst b/Documentation/x86/topology.rst index e29739904e37..7f58010ea86a 100644 --- a/Documentation/x86/topology.rst +++ b/Documentation/x86/topology.rst @@ -41,6 +41,8 @@ Package Packages contain a number of cores plus shared resources, e.g. DRAM controller, shared caches etc. +Modern systems may also use the term 'Die' for package. + AMD nomenclature for package is 'Node'. Package-related topology information in the kernel: @@ -53,11 +55,18 @@ Package-related topology information in the kernel: The number of dies in a package. This information is retrieved via CPUID. + - cpuinfo_x86.cpu_die_id: + + The physical ID of the die. This information is retrieved via CPUID. + - cpuinfo_x86.phys_proc_id: The physical ID of the package. This information is retrieved via CPUID and deduced from the APIC IDs of the cores in the package. + Modern systems use this value for the socket. There may be multiple + packages within a socket. This value may differ from cpu_die_id. + - cpuinfo_x86.logical_proc_id: The logical ID of the package. As we do not trust BIOSes to enumerate the |