From e4624435f38b34e7ce827070aa0f8b533a37c07e Mon Sep 17 00:00:00 2001 From: Jonathan Corbet Date: Mon, 12 Jun 2023 06:06:39 -0600 Subject: docs: arm64: Move arm64 documentation under Documentation/arch/ Architecture-specific documentation is being moved into Documentation/arch/ as a way of cleaning up the top-level documentation directory and making the docs hierarchy more closely match the source hierarchy. Move Documentation/arm64 into arch/ (along with the Chinese equvalent translations) and fix up documentation references. Cc: Will Deacon Cc: Alex Shi Cc: Hu Haowen Cc: Paolo Bonzini Acked-by: Catalin Marinas Reviewed-by: Yantengsi Signed-off-by: Jonathan Corbet --- Documentation/ABI/testing/sysfs-devices-system-cpu | 2 +- Documentation/admin-guide/kernel-parameters.txt | 2 +- Documentation/admin-guide/sysctl/kernel.rst | 2 +- Documentation/arch/arm64/acpi_object_usage.rst | 738 +++++++++++++++++++++ Documentation/arch/arm64/amu.rst | 119 ++++ Documentation/arch/arm64/arm-acpi.rst | 528 +++++++++++++++ Documentation/arch/arm64/asymmetric-32bit.rst | 155 +++++ Documentation/arch/arm64/booting.rst | 431 ++++++++++++ Documentation/arch/arm64/cpu-feature-registers.rst | 400 +++++++++++ Documentation/arch/arm64/elf_hwcaps.rst | 309 +++++++++ Documentation/arch/arm64/features.rst | 3 + Documentation/arch/arm64/hugetlbpage.rst | 43 ++ Documentation/arch/arm64/index.rst | 36 + Documentation/arch/arm64/kasan-offsets.sh | 26 + Documentation/arch/arm64/legacy_instructions.rst | 68 ++ .../arch/arm64/memory-tagging-extension.rst | 375 +++++++++++ Documentation/arch/arm64/memory.rst | 167 +++++ Documentation/arch/arm64/perf.rst | 166 +++++ .../arch/arm64/pointer-authentication.rst | 142 ++++ Documentation/arch/arm64/silicon-errata.rst | 216 ++++++ Documentation/arch/arm64/sme.rst | 468 +++++++++++++ Documentation/arch/arm64/sve.rst | 616 +++++++++++++++++ Documentation/arch/arm64/tagged-address-abi.rst | 179 +++++ Documentation/arch/arm64/tagged-pointers.rst | 88 +++ Documentation/arch/index.rst | 2 +- Documentation/arm64/acpi_object_usage.rst | 738 --------------------- Documentation/arm64/amu.rst | 119 ---- Documentation/arm64/arm-acpi.rst | 528 --------------- Documentation/arm64/asymmetric-32bit.rst | 155 ----- Documentation/arm64/booting.rst | 431 ------------ Documentation/arm64/cpu-feature-registers.rst | 400 ----------- Documentation/arm64/elf_hwcaps.rst | 309 --------- Documentation/arm64/features.rst | 3 - Documentation/arm64/hugetlbpage.rst | 43 -- Documentation/arm64/index.rst | 36 - Documentation/arm64/kasan-offsets.sh | 26 - Documentation/arm64/legacy_instructions.rst | 68 -- Documentation/arm64/memory-tagging-extension.rst | 375 ----------- Documentation/arm64/memory.rst | 167 ----- Documentation/arm64/perf.rst | 166 ----- Documentation/arm64/pointer-authentication.rst | 142 ---- Documentation/arm64/silicon-errata.rst | 216 ------ Documentation/arm64/sme.rst | 468 ------------- Documentation/arm64/sve.rst | 616 ----------------- Documentation/arm64/tagged-address-abi.rst | 179 ----- Documentation/arm64/tagged-pointers.rst | 88 --- .../translations/zh_CN/arch/arm64/amu.rst | 100 +++ .../translations/zh_CN/arch/arm64/booting.txt | 246 +++++++ .../translations/zh_CN/arch/arm64/elf_hwcaps.rst | 240 +++++++ .../translations/zh_CN/arch/arm64/hugetlbpage.rst | 45 ++ .../translations/zh_CN/arch/arm64/index.rst | 19 + .../zh_CN/arch/arm64/legacy_instructions.txt | 72 ++ .../translations/zh_CN/arch/arm64/memory.txt | 114 ++++ .../translations/zh_CN/arch/arm64/perf.rst | 86 +++ .../zh_CN/arch/arm64/silicon-errata.txt | 74 +++ .../zh_CN/arch/arm64/tagged-pointers.txt | 52 ++ Documentation/translations/zh_CN/arch/index.rst | 2 +- Documentation/translations/zh_CN/arm64/amu.rst | 100 --- Documentation/translations/zh_CN/arm64/booting.txt | 246 ------- .../translations/zh_CN/arm64/elf_hwcaps.rst | 240 ------- .../translations/zh_CN/arm64/hugetlbpage.rst | 45 -- Documentation/translations/zh_CN/arm64/index.rst | 19 - .../zh_CN/arm64/legacy_instructions.txt | 72 -- Documentation/translations/zh_CN/arm64/memory.txt | 114 ---- Documentation/translations/zh_CN/arm64/perf.rst | 86 --- .../translations/zh_CN/arm64/silicon-errata.txt | 74 --- .../translations/zh_CN/arm64/tagged-pointers.txt | 52 -- .../translations/zh_TW/arch/arm64/amu.rst | 104 +++ .../translations/zh_TW/arch/arm64/booting.txt | 251 +++++++ .../translations/zh_TW/arch/arm64/elf_hwcaps.rst | 244 +++++++ .../translations/zh_TW/arch/arm64/hugetlbpage.rst | 49 ++ .../translations/zh_TW/arch/arm64/index.rst | 23 + .../zh_TW/arch/arm64/legacy_instructions.txt | 77 +++ .../translations/zh_TW/arch/arm64/memory.txt | 119 ++++ .../translations/zh_TW/arch/arm64/perf.rst | 88 +++ .../zh_TW/arch/arm64/silicon-errata.txt | 79 +++ .../zh_TW/arch/arm64/tagged-pointers.txt | 57 ++ Documentation/translations/zh_TW/arm64/amu.rst | 104 --- Documentation/translations/zh_TW/arm64/booting.txt | 251 ------- .../translations/zh_TW/arm64/elf_hwcaps.rst | 244 ------- .../translations/zh_TW/arm64/hugetlbpage.rst | 49 -- Documentation/translations/zh_TW/arm64/index.rst | 23 - .../zh_TW/arm64/legacy_instructions.txt | 77 --- Documentation/translations/zh_TW/arm64/memory.txt | 119 ---- Documentation/translations/zh_TW/arm64/perf.rst | 88 --- .../translations/zh_TW/arm64/silicon-errata.txt | 79 --- .../translations/zh_TW/arm64/tagged-pointers.txt | 57 -- Documentation/translations/zh_TW/index.rst | 2 +- Documentation/virt/kvm/api.rst | 2 +- 89 files changed, 7419 insertions(+), 7419 deletions(-) create mode 100644 Documentation/arch/arm64/acpi_object_usage.rst create mode 100644 Documentation/arch/arm64/amu.rst create mode 100644 Documentation/arch/arm64/arm-acpi.rst create mode 100644 Documentation/arch/arm64/asymmetric-32bit.rst create mode 100644 Documentation/arch/arm64/booting.rst create mode 100644 Documentation/arch/arm64/cpu-feature-registers.rst create mode 100644 Documentation/arch/arm64/elf_hwcaps.rst create mode 100644 Documentation/arch/arm64/features.rst create mode 100644 Documentation/arch/arm64/hugetlbpage.rst create mode 100644 Documentation/arch/arm64/index.rst create mode 100644 Documentation/arch/arm64/kasan-offsets.sh create mode 100644 Documentation/arch/arm64/legacy_instructions.rst create mode 100644 Documentation/arch/arm64/memory-tagging-extension.rst create mode 100644 Documentation/arch/arm64/memory.rst create mode 100644 Documentation/arch/arm64/perf.rst create mode 100644 Documentation/arch/arm64/pointer-authentication.rst create mode 100644 Documentation/arch/arm64/silicon-errata.rst create mode 100644 Documentation/arch/arm64/sme.rst create mode 100644 Documentation/arch/arm64/sve.rst create mode 100644 Documentation/arch/arm64/tagged-address-abi.rst create mode 100644 Documentation/arch/arm64/tagged-pointers.rst delete mode 100644 Documentation/arm64/acpi_object_usage.rst delete mode 100644 Documentation/arm64/amu.rst delete mode 100644 Documentation/arm64/arm-acpi.rst delete mode 100644 Documentation/arm64/asymmetric-32bit.rst delete mode 100644 Documentation/arm64/booting.rst delete mode 100644 Documentation/arm64/cpu-feature-registers.rst delete mode 100644 Documentation/arm64/elf_hwcaps.rst delete mode 100644 Documentation/arm64/features.rst delete mode 100644 Documentation/arm64/hugetlbpage.rst delete mode 100644 Documentation/arm64/index.rst delete mode 100644 Documentation/arm64/kasan-offsets.sh delete mode 100644 Documentation/arm64/legacy_instructions.rst delete mode 100644 Documentation/arm64/memory-tagging-extension.rst delete mode 100644 Documentation/arm64/memory.rst delete mode 100644 Documentation/arm64/perf.rst delete mode 100644 Documentation/arm64/pointer-authentication.rst delete mode 100644 Documentation/arm64/silicon-errata.rst delete mode 100644 Documentation/arm64/sme.rst delete mode 100644 Documentation/arm64/sve.rst delete mode 100644 Documentation/arm64/tagged-address-abi.rst delete mode 100644 Documentation/arm64/tagged-pointers.rst create mode 100644 Documentation/translations/zh_CN/arch/arm64/amu.rst create mode 100644 Documentation/translations/zh_CN/arch/arm64/booting.txt create mode 100644 Documentation/translations/zh_CN/arch/arm64/elf_hwcaps.rst create mode 100644 Documentation/translations/zh_CN/arch/arm64/hugetlbpage.rst create mode 100644 Documentation/translations/zh_CN/arch/arm64/index.rst create mode 100644 Documentation/translations/zh_CN/arch/arm64/legacy_instructions.txt create mode 100644 Documentation/translations/zh_CN/arch/arm64/memory.txt create mode 100644 Documentation/translations/zh_CN/arch/arm64/perf.rst create mode 100644 Documentation/translations/zh_CN/arch/arm64/silicon-errata.txt create mode 100644 Documentation/translations/zh_CN/arch/arm64/tagged-pointers.txt delete mode 100644 Documentation/translations/zh_CN/arm64/amu.rst delete mode 100644 Documentation/translations/zh_CN/arm64/booting.txt delete mode 100644 Documentation/translations/zh_CN/arm64/elf_hwcaps.rst delete mode 100644 Documentation/translations/zh_CN/arm64/hugetlbpage.rst delete mode 100644 Documentation/translations/zh_CN/arm64/index.rst delete mode 100644 Documentation/translations/zh_CN/arm64/legacy_instructions.txt delete mode 100644 Documentation/translations/zh_CN/arm64/memory.txt delete mode 100644 Documentation/translations/zh_CN/arm64/perf.rst delete mode 100644 Documentation/translations/zh_CN/arm64/silicon-errata.txt delete mode 100644 Documentation/translations/zh_CN/arm64/tagged-pointers.txt create mode 100644 Documentation/translations/zh_TW/arch/arm64/amu.rst create mode 100644 Documentation/translations/zh_TW/arch/arm64/booting.txt create mode 100644 Documentation/translations/zh_TW/arch/arm64/elf_hwcaps.rst create mode 100644 Documentation/translations/zh_TW/arch/arm64/hugetlbpage.rst create mode 100644 Documentation/translations/zh_TW/arch/arm64/index.rst create mode 100644 Documentation/translations/zh_TW/arch/arm64/legacy_instructions.txt create mode 100644 Documentation/translations/zh_TW/arch/arm64/memory.txt create mode 100644 Documentation/translations/zh_TW/arch/arm64/perf.rst create mode 100644 Documentation/translations/zh_TW/arch/arm64/silicon-errata.txt create mode 100644 Documentation/translations/zh_TW/arch/arm64/tagged-pointers.txt delete mode 100644 Documentation/translations/zh_TW/arm64/amu.rst delete mode 100644 Documentation/translations/zh_TW/arm64/booting.txt delete mode 100644 Documentation/translations/zh_TW/arm64/elf_hwcaps.rst delete mode 100644 Documentation/translations/zh_TW/arm64/hugetlbpage.rst delete mode 100644 Documentation/translations/zh_TW/arm64/index.rst delete mode 100644 Documentation/translations/zh_TW/arm64/legacy_instructions.txt delete mode 100644 Documentation/translations/zh_TW/arm64/memory.txt delete mode 100644 Documentation/translations/zh_TW/arm64/perf.rst delete mode 100644 Documentation/translations/zh_TW/arm64/silicon-errata.txt delete mode 100644 Documentation/translations/zh_TW/arm64/tagged-pointers.txt (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index f54867cadb0f..ecd585ca2d50 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -670,7 +670,7 @@ Description: Preferred MTE tag checking mode "async" Prefer asynchronous mode ================ ============================================== - See also: Documentation/arm64/memory-tagging-extension.rst + See also: Documentation/arch/arm64/memory-tagging-extension.rst What: /sys/devices/system/cpu/nohz_full Date: Apr 2015 diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 9e5bab29685f..893b5a133041 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -304,7 +304,7 @@ EL0 is indicated by /sys/devices/system/cpu/aarch32_el0 and hot-unplug operations may be restricted. - See Documentation/arm64/asymmetric-32bit.rst for more + See Documentation/arch/arm64/asymmetric-32bit.rst for more information. amd_iommu= [HW,X86-64] diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index d85d90f5d000..3800fab1619b 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -949,7 +949,7 @@ user space can read performance monitor counter registers directly. The default value is 0 (access disabled). -See Documentation/arm64/perf.rst for more information. +See Documentation/arch/arm64/perf.rst for more information. pid_max diff --git a/Documentation/arch/arm64/acpi_object_usage.rst b/Documentation/arch/arm64/acpi_object_usage.rst new file mode 100644 index 000000000000..484ef9676653 --- /dev/null +++ b/Documentation/arch/arm64/acpi_object_usage.rst @@ -0,0 +1,738 @@ +=========== +ACPI Tables +=========== + +The expectations of individual ACPI tables are discussed in the list that +follows. + +If a section number is used, it refers to a section number in the ACPI +specification where the object is defined. If "Signature Reserved" is used, +the table signature (the first four bytes of the table) is the only portion +of the table recognized by the specification, and the actual table is defined +outside of the UEFI Forum (see Section 5.2.6 of the specification). + +For ACPI on arm64, tables also fall into the following categories: + + - Required: DSDT, FADT, GTDT, MADT, MCFG, RSDP, SPCR, XSDT + + - Recommended: BERT, EINJ, ERST, HEST, PCCT, SSDT + + - Optional: BGRT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT, IBFT, + IORT, MCHI, MPST, MSCT, NFIT, PMTT, RASF, SBST, SLIT, SPMI, SRAT, + STAO, TCPA, TPM2, UEFI, XENV + + - Not supported: BOOT, DBGP, DMAR, ETDT, HPET, IVRS, LPIT, MSDM, OEMx, + PSDT, RSDT, SLIC, WAET, WDAT, WDRT, WPBT + +====== ======================================================================== +Table Usage for ARMv8 Linux +====== ======================================================================== +BERT Section 18.3 (signature == "BERT") + + **Boot Error Record Table** + + Must be supplied if RAS support is provided by the platform. It + is recommended this table be supplied. + +BOOT Signature Reserved (signature == "BOOT") + + **simple BOOT flag table** + + Microsoft only table, will not be supported. + +BGRT Section 5.2.22 (signature == "BGRT") + + **Boot Graphics Resource Table** + + Optional, not currently supported, with no real use-case for an + ARM server. + +CPEP Section 5.2.18 (signature == "CPEP") + + **Corrected Platform Error Polling table** + + Optional, not currently supported, and not recommended until such + time as ARM-compatible hardware is available, and the specification + suitably modified. + +CSRT Signature Reserved (signature == "CSRT") + + **Core System Resources Table** + + Optional, not currently supported. + +DBG2 Signature Reserved (signature == "DBG2") + + **DeBuG port table 2** + + License has changed and should be usable. Optional if used instead + of earlycon= on the command line. + +DBGP Signature Reserved (signature == "DBGP") + + **DeBuG Port table** + + Microsoft only table, will not be supported. + +DSDT Section 5.2.11.1 (signature == "DSDT") + + **Differentiated System Description Table** + + A DSDT is required; see also SSDT. + + ACPI tables contain only one DSDT but can contain one or more SSDTs, + which are optional. Each SSDT can only add to the ACPI namespace, + but cannot modify or replace anything in the DSDT. + +DMAR Signature Reserved (signature == "DMAR") + + **DMA Remapping table** + + x86 only table, will not be supported. + +DRTM Signature Reserved (signature == "DRTM") + + **Dynamic Root of Trust for Measurement table** + + Optional, not currently supported. + +ECDT Section 5.2.16 (signature == "ECDT") + + **Embedded Controller Description Table** + + Optional, not currently supported, but could be used on ARM if and + only if one uses the GPE_BIT field to represent an IRQ number, since + there are no GPE blocks defined in hardware reduced mode. This would + need to be modified in the ACPI specification. + +EINJ Section 18.6 (signature == "EINJ") + + **Error Injection table** + + This table is very useful for testing platform response to error + conditions; it allows one to inject an error into the system as + if it had actually occurred. However, this table should not be + shipped with a production system; it should be dynamically loaded + and executed with the ACPICA tools only during testing. + +ERST Section 18.5 (signature == "ERST") + + **Error Record Serialization Table** + + On a platform supports RAS, this table must be supplied if it is not + UEFI-based; if it is UEFI-based, this table may be supplied. When this + table is not present, UEFI run time service will be utilized to save + and retrieve hardware error information to and from a persistent store. + +ETDT Signature Reserved (signature == "ETDT") + + **Event Timer Description Table** + + Obsolete table, will not be supported. + +FACS Section 5.2.10 (signature == "FACS") + + **Firmware ACPI Control Structure** + + It is unlikely that this table will be terribly useful. If it is + provided, the Global Lock will NOT be used since it is not part of + the hardware reduced profile, and only 64-bit address fields will + be considered valid. + +FADT Section 5.2.9 (signature == "FACP") + + **Fixed ACPI Description Table** + Required for arm64. + + + The HW_REDUCED_ACPI flag must be set. All of the fields that are + to be ignored when HW_REDUCED_ACPI is set are expected to be set to + zero. + + If an FACS table is provided, the X_FIRMWARE_CTRL field is to be + used, not FIRMWARE_CTRL. + + If PSCI is used (as is recommended), make sure that ARM_BOOT_ARCH is + filled in properly - that the PSCI_COMPLIANT flag is set and that + PSCI_USE_HVC is set or unset as needed (see table 5-37). + + For the DSDT that is also required, the X_DSDT field is to be used, + not the DSDT field. + +FPDT Section 5.2.23 (signature == "FPDT") + + **Firmware Performance Data Table** + + Optional, useful for boot performance profiling. + +GTDT Section 5.2.24 (signature == "GTDT") + + **Generic Timer Description Table** + + Required for arm64. + +HEST Section 18.3.2 (signature == "HEST") + + **Hardware Error Source Table** + + ARM-specific error sources have been defined; please use those or the + PCI types such as type 6 (AER Root Port), 7 (AER Endpoint), or 8 (AER + Bridge), or use type 9 (Generic Hardware Error Source). Firmware first + error handling is possible if and only if Trusted Firmware is being + used on arm64. + + Must be supplied if RAS support is provided by the platform. It + is recommended this table be supplied. + +HPET Signature Reserved (signature == "HPET") + + **High Precision Event timer Table** + + x86 only table, will not be supported. + +IBFT Signature Reserved (signature == "IBFT") + + **iSCSI Boot Firmware Table** + + Microsoft defined table, support TBD. + +IORT Signature Reserved (signature == "IORT") + + **Input Output Remapping Table** + + arm64 only table, required in order to describe IO topology, SMMUs, + and GIC ITSs, and how those various components are connected together, + such as identifying which components are behind which SMMUs/ITSs. + This table will only be required on certain SBSA platforms (e.g., + when using GICv3-ITS and an SMMU); on SBSA Level 0 platforms, it + remains optional. + +IVRS Signature Reserved (signature == "IVRS") + + **I/O Virtualization Reporting Structure** + + x86_64 (AMD) only table, will not be supported. + +LPIT Signature Reserved (signature == "LPIT") + + **Low Power Idle Table** + + x86 only table as of ACPI 5.1; starting with ACPI 6.0, processor + descriptions and power states on ARM platforms should use the DSDT + and define processor container devices (_HID ACPI0010, Section 8.4, + and more specifically 8.4.3 and 8.4.4). + +MADT Section 5.2.12 (signature == "APIC") + + **Multiple APIC Description Table** + + Required for arm64. Only the GIC interrupt controller structures + should be used (types 0xA - 0xF). + +MCFG Signature Reserved (signature == "MCFG") + + **Memory-mapped ConFiGuration space** + + If the platform supports PCI/PCIe, an MCFG table is required. + +MCHI Signature Reserved (signature == "MCHI") + + **Management Controller Host Interface table** + + Optional, not currently supported. + +MPST Section 5.2.21 (signature == "MPST") + + **Memory Power State Table** + + Optional, not currently supported. + +MSCT Section 5.2.19 (signature == "MSCT") + + **Maximum System Characteristic Table** + + Optional, not currently supported. + +MSDM Signature Reserved (signature == "MSDM") + + **Microsoft Data Management table** + + Microsoft only table, will not be supported. + +NFIT Section 5.2.25 (signature == "NFIT") + + **NVDIMM Firmware Interface Table** + + Optional, not currently supported. + +OEMx Signature of "OEMx" only + + **OEM Specific Tables** + + All tables starting with a signature of "OEM" are reserved for OEM + use. Since these are not meant to be of general use but are limited + to very specific end users, they are not recommended for use and are + not supported by the kernel for arm64. + +PCCT Section 14.1 (signature == "PCCT) + + **Platform Communications Channel Table** + + Recommend for use on arm64; use of PCC is recommended when using CPPC + to control performance and power for platform processors. + +PMTT Section 5.2.21.12 (signature == "PMTT") + + **Platform Memory Topology Table** + + Optional, not currently supported. + +PSDT Section 5.2.11.3 (signature == "PSDT") + + **Persistent System Description Table** + + Obsolete table, will not be supported. + +RASF Section 5.2.20 (signature == "RASF") + + **RAS Feature table** + + Optional, not currently supported. + +RSDP Section 5.2.5 (signature == "RSD PTR") + + **Root System Description PoinTeR** + + Required for arm64. + +RSDT Section 5.2.7 (signature == "RSDT") + + **Root System Description Table** + + Since this table can only provide 32-bit addresses, it is deprecated + on arm64, and will not be used. If provided, it will be ignored. + +SBST Section 5.2.14 (signature == "SBST") + + **Smart Battery Subsystem Table** + + Optional, not currently supported. + +SLIC Signature Reserved (signature == "SLIC") + + **Software LIcensing table** + + Microsoft only table, will not be supported. + +SLIT Section 5.2.17 (signature == "SLIT") + + **System Locality distance Information Table** + + Optional in general, but required for NUMA systems. + +SPCR Signature Reserved (signature == "SPCR") + + **Serial Port Console Redirection table** + + Required for arm64. + +SPMI Signature Reserved (signature == "SPMI") + + **Server Platform Management Interface table** + + Optional, not currently supported. + +SRAT Section 5.2.16 (signature == "SRAT") + + **System Resource Affinity Table** + + Optional, but if used, only the GICC Affinity structures are read. + To support arm64 NUMA, this table is required. + +SSDT Section 5.2.11.2 (signature == "SSDT") + + **Secondary System Description Table** + + These tables are a continuation of the DSDT; these are recommended + for use with devices that can be added to a running system, but can + also serve the purpose of dividing up device descriptions into more + manageable pieces. + + An SSDT can only ADD to the ACPI namespace. It cannot modify or + replace existing device descriptions already in the namespace. + + These tables are optional, however. ACPI tables should contain only + one DSDT but can contain many SSDTs. + +STAO Signature Reserved (signature == "STAO") + + **_STA Override table** + + Optional, but only necessary in virtualized environments in order to + hide devices from guest OSs. + +TCPA Signature Reserved (signature == "TCPA") + + **Trusted Computing Platform Alliance table** + + Optional, not currently supported, and may need changes to fully + interoperate with arm64. + +TPM2 Signature Reserved (signature == "TPM2") + + **Trusted Platform Module 2 table** + + Optional, not currently supported, and may need changes to fully + interoperate with arm64. + +UEFI Signature Reserved (signature == "UEFI") + + **UEFI ACPI data table** + + Optional, not currently supported. No known use case for arm64, + at present. + +WAET Signature Reserved (signature == "WAET") + + **Windows ACPI Emulated devices Table** + + Microsoft only table, will not be supported. + +WDAT Signature Reserved (signature == "WDAT") + + **Watch Dog Action Table** + + Microsoft only table, will not be supported. + +WDRT Signature Reserved (signature == "WDRT") + + **Watch Dog Resource Table** + + Microsoft only table, will not be supported. + +WPBT Signature Reserved (signature == "WPBT") + + **Windows Platform Binary Table** + + Microsoft only table, will not be supported. + +XENV Signature Reserved (signature == "XENV") + + **Xen project table** + + Optional, used only by Xen at present. + +XSDT Section 5.2.8 (signature == "XSDT") + + **eXtended System Description Table** + + Required for arm64. +====== ======================================================================== + +ACPI Objects +------------ +The expectations on individual ACPI objects that are likely to be used are +shown in the list that follows; any object not explicitly mentioned below +should be used as needed for a particular platform or particular subsystem, +such as power management or PCI. + +===== ================ ======================================================== +Name Section Usage for ARMv8 Linux +===== ================ ======================================================== +_CCA 6.2.17 This method must be defined for all bus masters + on arm64 - there are no assumptions made about + whether such devices are cache coherent or not. + The _CCA value is inherited by all descendants of + these devices so it does not need to be repeated. + Without _CCA on arm64, the kernel does not know what + to do about setting up DMA for the device. + + NB: this method provides default cache coherency + attributes; the presence of an SMMU can be used to + modify that, however. For example, a master could + default to non-coherent, but be made coherent with + the appropriate SMMU configuration (see Table 17 of + the IORT specification, ARM Document DEN 0049B). + +_CID 6.1.2 Use as needed, see also _HID. + +_CLS 6.1.3 Use as needed, see also _HID. + +_CPC 8.4.7.1 Use as needed, power management specific. CPPC is + recommended on arm64. + +_CRS 6.2.2 Required on arm64. + +_CSD 8.4.2.2 Use as needed, used only in conjunction with _CST. + +_CST 8.4.2.1 Low power idle states (8.4.4) are recommended instead + of C-states. + +_DDN 6.1.4 This field can be used for a device name. However, + it is meant for DOS device names (e.g., COM1), so be + careful of its use across OSes. + +_DSD 6.2.5 To be used with caution. If this object is used, try + to use it within the constraints already defined by the + Device Properties UUID. Only in rare circumstances + should it be necessary to create a new _DSD UUID. + + In either case, submit the _DSD definition along with + any driver patches for discussion, especially when + device properties are used. A driver will not be + considered complete without a corresponding _DSD + description. Once approved by kernel maintainers, + the UUID or device properties must then be registered + with the UEFI Forum; this may cause some iteration as + more than one OS will be registering entries. + +_DSM 9.1.1 Do not use this method. It is not standardized, the + return values are not well documented, and it is + currently a frequent source of error. + +\_GL 5.7.1 This object is not to be used in hardware reduced + mode, and therefore should not be used on arm64. + +_GLK 6.5.7 This object requires a global lock be defined; there + is no global lock on arm64 since it runs in hardware + reduced mode. Hence, do not use this object on arm64. + +\_GPE 5.3.1 This namespace is for x86 use only. Do not use it + on arm64. + +_HID 6.1.5 This is the primary object to use in device probing, + though _CID and _CLS may also be used. + +_INI 6.5.1 Not required, but can be useful in setting up devices + when UEFI leaves them in a state that may not be what + the driver expects before it starts probing. + +_LPI 8.4.4.3 Recommended for use with processor definitions (_HID + ACPI0010) on arm64. See also _RDI. + +_MLS 6.1.7 Highly recommended for use in internationalization. + +_OFF 7.2.2 It is recommended to define this method for any device + that can be turned on or off. + +_ON 7.2.3 It is recommended to define this method for any device + that can be turned on or off. + +\_OS 5.7.3 This method will return "Linux" by default (this is + the value of the macro ACPI_OS_NAME on Linux). The + command line parameter acpi_os= can be used + to set it to some other value. + +_OSC 6.2.11 This method can be a global method in ACPI (i.e., + \_SB._OSC), or it may be associated with a specific + device (e.g., \_SB.DEV0._OSC), or both. When used + as a global method, only capabilities published in + the ACPI specification are allowed. When used as + a device-specific method, the process described for + using _DSD MUST be used to create an _OSC definition; + out-of-process use of _OSC is not allowed. That is, + submit the device-specific _OSC usage description as + part of the kernel driver submission, get it approved + by the kernel community, then register it with the + UEFI Forum. + +\_OSI 5.7.2 Deprecated on ARM64. As far as ACPI firmware is + concerned, _OSI is not to be used to determine what + sort of system is being used or what functionality + is provided. The _OSC method is to be used instead. + +_PDC 8.4.1 Deprecated, do not use on arm64. + +\_PIC 5.8.1 The method should not be used. On arm64, the only + interrupt model available is GIC. + +\_PR 5.3.1 This namespace is for x86 use only on legacy systems. + Do not use it on arm64. + +_PRT 6.2.13 Required as part of the definition of all PCI root + devices. + +_PRx 7.3.8-11 Use as needed; power management specific. If _PR0 is + defined, _PR3 must also be defined. + +_PSx 7.3.2-5 Use as needed; power management specific. If _PS0 is + defined, _PS3 must also be defined. If clocks or + regulators need adjusting to be consistent with power + usage, change them in these methods. + +_RDI 8.4.4.4 Recommended for use with processor definitions (_HID + ACPI0010) on arm64. This should only be used in + conjunction with _LPI. + +\_REV 5.7.4 Always returns the latest version of ACPI supported. + +\_SB 5.3.1 Required on arm64; all devices must be defined in this + namespace. + +_SLI 6.2.15 Use is recommended when SLIT table is in use. + +_STA 6.3.7, It is recommended to define this method for any device + 7.2.4 that can be turned on or off. See also the STAO table + that provides overrides to hide devices in virtualized + environments. + +_SRS 6.2.16 Use as needed; see also _PRS. + +_STR 6.1.10 Recommended for conveying device names to end users; + this is preferred over using _DDN. + +_SUB 6.1.9 Use as needed; _HID or _CID are preferred. + +_SUN 6.1.11 Use as needed, but recommended. + +_SWS 7.4.3 Use as needed; power management specific; this may + require specification changes for use on arm64. + +_UID 6.1.12 Recommended for distinguishing devices of the same + class; define it if at all possible. +===== ================ ======================================================== + + + + +ACPI Event Model +---------------- +Do not use GPE block devices; these are not supported in the hardware reduced +profile used by arm64. Since there are no GPE blocks defined for use on ARM +platforms, ACPI events must be signaled differently. + +There are two options: GPIO-signaled interrupts (Section 5.6.5), and +interrupt-signaled events (Section 5.6.9). Interrupt-signaled events are a +new feature in the ACPI 6.1 specification. Either - or both - can be used +on a given platform, and which to use may be dependent of limitations in any +given SoC. If possible, interrupt-signaled events are recommended. + + +ACPI Processor Control +---------------------- +Section 8 of the ACPI specification changed significantly in version 6.0. +Processors should now be defined as Device objects with _HID ACPI0007; do +not use the deprecated Processor statement in ASL. All multiprocessor systems +should also define a hierarchy of processors, done with Processor Container +Devices (see Section 8.4.3.1, _HID ACPI0010); do not use processor aggregator +devices (Section 8.5) to describe processor topology. Section 8.4 of the +specification describes the semantics of these object definitions and how +they interrelate. + +Most importantly, the processor hierarchy defined also defines the low power +idle states that are available to the platform, along with the rules for +determining which processors can be turned on or off and the circumstances +that control that. Without this information, the processors will run in +whatever power state they were left in by UEFI. + +Note too, that the processor Device objects defined and the entries in the +MADT for GICs are expected to be in synchronization. The _UID of the Device +object must correspond to processor IDs used in the MADT. + +It is recommended that CPPC (8.4.5) be used as the primary model for processor +performance control on arm64. C-states and P-states may become available at +some point in the future, but most current design work appears to favor CPPC. + +Further, it is essential that the ARMv8 SoC provide a fully functional +implementation of PSCI; this will be the only mechanism supported by ACPI +to control CPU power state. Booting of secondary CPUs using the ACPI +parking protocol is possible, but discouraged, since only PSCI is supported +for ARM servers. + + +ACPI System Address Map Interfaces +---------------------------------- +In Section 15 of the ACPI specification, several methods are mentioned as +possible mechanisms for conveying memory resource information to the kernel. +For arm64, we will only support UEFI for booting with ACPI, hence the UEFI +GetMemoryMap() boot service is the only mechanism that will be used. + + +ACPI Platform Error Interfaces (APEI) +------------------------------------- +The APEI tables supported are described above. + +APEI requires the equivalent of an SCI and an NMI on ARMv8. The SCI is used +to notify the OSPM of errors that have occurred but can be corrected and the +system can continue correct operation, even if possibly degraded. The NMI is +used to indicate fatal errors that cannot be corrected, and require immediate +attention. + +Since there is no direct equivalent of the x86 SCI or NMI, arm64 handles +these slightly differently. The SCI is handled as a high priority interrupt; +given that these are corrected (or correctable) errors being reported, this +is sufficient. The NMI is emulated as the highest priority interrupt +possible. This implies some caution must be used since there could be +interrupts at higher privilege levels or even interrupts at the same priority +as the emulated NMI. In Linux, this should not be the case but one should +be aware it could happen. + + +ACPI Objects Not Supported on ARM64 +----------------------------------- +While this may change in the future, there are several classes of objects +that can be defined, but are not currently of general interest to ARM servers. +Some of these objects have x86 equivalents, and may actually make sense in ARM +servers. However, there is either no hardware available at present, or there +may not even be a non-ARM implementation yet. Hence, they are not currently +supported. + +The following classes of objects are not supported: + + - Section 9.2: ambient light sensor devices + + - Section 9.3: battery devices + + - Section 9.4: lids (e.g., laptop lids) + + - Section 9.8.2: IDE controllers + + - Section 9.9: floppy controllers + + - Section 9.10: GPE block devices + + - Section 9.15: PC/AT RTC/CMOS devices + + - Section 9.16: user presence detection devices + + - Section 9.17: I/O APIC devices; all GICs must be enumerable via MADT + + - Section 9.18: time and alarm devices (see 9.15) + + - Section 10: power source and power meter devices + + - Section 11: thermal management + + - Section 12: embedded controllers interface + + - Section 13: SMBus interfaces + + +This also means that there is no support for the following objects: + +==== =========================== ==== ========== +Name Section Name Section +==== =========================== ==== ========== +_ALC 9.3.4 _FDM 9.10.3 +_ALI 9.3.2 _FIX 6.2.7 +_ALP 9.3.6 _GAI 10.4.5 +_ALR 9.3.5 _GHL 10.4.7 +_ALT 9.3.3 _GTM 9.9.2.1.1 +_BCT 10.2.2.10 _LID 9.5.1 +_BDN 6.5.3 _PAI 10.4.4 +_BIF 10.2.2.1 _PCL 10.3.2 +_BIX 10.2.2.1 _PIF 10.3.3 +_BLT 9.2.3 _PMC 10.4.1 +_BMA 10.2.2.4 _PMD 10.4.8 +_BMC 10.2.2.12 _PMM 10.4.3 +_BMD 10.2.2.11 _PRL 10.3.4 +_BMS 10.2.2.5 _PSR 10.3.1 +_BST 10.2.2.6 _PTP 10.4.2 +_BTH 10.2.2.7 _SBS 10.1.3 +_BTM 10.2.2.9 _SHL 10.4.6 +_BTP 10.2.2.8 _STM 9.9.2.1.1 +_DCK 6.5.2 _UPD 9.16.1 +_EC 12.12 _UPP 9.16.2 +_FDE 9.10.1 _WPC 10.5.2 +_FDI 9.10.2 _WPP 10.5.3 +==== =========================== ==== ========== diff --git a/Documentation/arch/arm64/amu.rst b/Documentation/arch/arm64/amu.rst new file mode 100644 index 000000000000..01f2de2b0450 --- /dev/null +++ b/Documentation/arch/arm64/amu.rst @@ -0,0 +1,119 @@ +.. _amu_index: + +======================================================= +Activity Monitors Unit (AMU) extension in AArch64 Linux +======================================================= + +Author: Ionela Voinescu + +Date: 2019-09-10 + +This document briefly describes the provision of Activity Monitors Unit +support in AArch64 Linux. + + +Architecture overview +--------------------- + +The activity monitors extension is an optional extension introduced by the +ARMv8.4 CPU architecture. + +The activity monitors unit, implemented in each CPU, provides performance +counters intended for system management use. The AMU extension provides a +system register interface to the counter registers and also supports an +optional external memory-mapped interface. + +Version 1 of the Activity Monitors architecture implements a counter group +of four fixed and architecturally defined 64-bit event counters. + + - CPU cycle counter: increments at the frequency of the CPU. + - Constant counter: increments at the fixed frequency of the system + clock. + - Instructions retired: increments with every architecturally executed + instruction. + - Memory stall cycles: counts instruction dispatch stall cycles caused by + misses in the last level cache within the clock domain. + +When in WFI or WFE these counters do not increment. + +The Activity Monitors architecture provides space for up to 16 architected +event counters. Future versions of the architecture may use this space to +implement additional architected event counters. + +Additionally, version 1 implements a counter group of up to 16 auxiliary +64-bit event counters. + +On cold reset all counters reset to 0. + + +Basic support +------------- + +The kernel can safely run a mix of CPUs with and without support for the +activity monitors extension. Therefore, when CONFIG_ARM64_AMU_EXTN is +selected we unconditionally enable the capability to allow any late CPU +(secondary or hotplugged) to detect and use the feature. + +When the feature is detected on a CPU, we flag the availability of the +feature but this does not guarantee the correct functionality of the +counters, only the presence of the extension. + +Firmware (code running at higher exception levels, e.g. arm-tf) support is +needed to: + + - Enable access for lower exception levels (EL2 and EL1) to the AMU + registers. + - Enable the counters. If not enabled these will read as 0. + - Save/restore the counters before/after the CPU is being put/brought up + from the 'off' power state. + +When using kernels that have this feature enabled but boot with broken +firmware the user may experience panics or lockups when accessing the +counter registers. Even if these symptoms are not observed, the values +returned by the register reads might not correctly reflect reality. Most +commonly, the counters will read as 0, indicating that they are not +enabled. + +If proper support is not provided in firmware it's best to disable +CONFIG_ARM64_AMU_EXTN. To be noted that for security reasons, this does not +bypass the setting of AMUSERENR_EL0 to trap accesses from EL0 (userspace) to +EL1 (kernel). Therefore, firmware should still ensure accesses to AMU registers +are not trapped in EL2/EL3. + +The fixed counters of AMUv1 are accessible though the following system +register definitions: + + - SYS_AMEVCNTR0_CORE_EL0 + - SYS_AMEVCNTR0_CONST_EL0 + - SYS_AMEVCNTR0_INST_RET_EL0 + - SYS_AMEVCNTR0_MEM_STALL_EL0 + +Auxiliary platform specific counters can be accessed using +SYS_AMEVCNTR1_EL0(n), where n is a value between 0 and 15. + +Details can be found in: arch/arm64/include/asm/sysreg.h. + + +Userspace access +---------------- + +Currently, access from userspace to the AMU registers is disabled due to: + + - Security reasons: they might expose information about code executed in + secure mode. + - Purpose: AMU counters are intended for system management use. + +Also, the presence of the feature is not visible to userspace. + + +Virtualization +-------------- + +Currently, access from userspace (EL0) and kernelspace (EL1) on the KVM +guest side is disabled due to: + + - Security reasons: they might expose information about code executed + by other guests or the host. + +Any attempt to access the AMU registers will result in an UNDEFINED +exception being injected into the guest. diff --git a/Documentation/arch/arm64/arm-acpi.rst b/Documentation/arch/arm64/arm-acpi.rst new file mode 100644 index 000000000000..1636352756bb --- /dev/null +++ b/Documentation/arch/arm64/arm-acpi.rst @@ -0,0 +1,528 @@ +===================== +ACPI on ARMv8 Servers +===================== + +ACPI can be used for ARMv8 general purpose servers designed to follow +the ARM SBSA (Server Base System Architecture) [0] and SBBR (Server +Base Boot Requirements) [1] specifications. Please note that the SBBR +can be retrieved simply by visiting [1], but the SBSA is currently only +available to those with an ARM login due to ARM IP licensing concerns. + +The ARMv8 kernel implements the reduced hardware model of ACPI version +5.1 or later. Links to the specification and all external documents +it refers to are managed by the UEFI Forum. The specification is +available at http://www.uefi.org/specifications and documents referenced +by the specification can be found via http://www.uefi.org/acpi. + +If an ARMv8 system does not meet the requirements of the SBSA and SBBR, +or cannot be described using the mechanisms defined in the required ACPI +specifications, then ACPI may not be a good fit for the hardware. + +While the documents mentioned above set out the requirements for building +industry-standard ARMv8 servers, they also apply to more than one operating +system. The purpose of this document is to describe the interaction between +ACPI and Linux only, on an ARMv8 system -- that is, what Linux expects of +ACPI and what ACPI can expect of Linux. + + +Why ACPI on ARM? +---------------- +Before examining the details of the interface between ACPI and Linux, it is +useful to understand why ACPI is being used. Several technologies already +exist in Linux for describing non-enumerable hardware, after all. In this +section we summarize a blog post [2] from Grant Likely that outlines the +reasoning behind ACPI on ARMv8 servers. Actually, we snitch a good portion +of the summary text almost directly, to be honest. + +The short form of the rationale for ACPI on ARM is: + +- ACPI’s byte code (AML) allows the platform to encode hardware behavior, + while DT explicitly does not support this. For hardware vendors, being + able to encode behavior is a key tool used in supporting operating + system releases on new hardware. + +- ACPI’s OSPM defines a power management model that constrains what the + platform is allowed to do into a specific model, while still providing + flexibility in hardware design. + +- In the enterprise server environment, ACPI has established bindings (such + as for RAS) which are currently used in production systems. DT does not. + Such bindings could be defined in DT at some point, but doing so means ARM + and x86 would end up using completely different code paths in both firmware + and the kernel. + +- Choosing a single interface to describe the abstraction between a platform + and an OS is important. Hardware vendors would not be required to implement + both DT and ACPI if they want to support multiple operating systems. And, + agreeing on a single interface instead of being fragmented into per OS + interfaces makes for better interoperability overall. + +- The new ACPI governance process works well and Linux is now at the same + table as hardware vendors and other OS vendors. In fact, there is no + longer any reason to feel that ACPI only belongs to Windows or that + Linux is in any way secondary to Microsoft in this arena. The move of + ACPI governance into the UEFI forum has significantly opened up the + specification development process, and currently, a large portion of the + changes being made to ACPI are being driven by Linux. + +Key to the use of ACPI is the support model. For servers in general, the +responsibility for hardware behaviour cannot solely be the domain of the +kernel, but rather must be split between the platform and the kernel, in +order to allow for orderly change over time. ACPI frees the OS from needing +to understand all the minute details of the hardware so that the OS doesn’t +need to be ported to each and every device individually. It allows the +hardware vendors to take responsibility for power management behaviour without +depending on an OS release cycle which is not under their control. + +ACPI is also important because hardware and OS vendors have already worked +out the mechanisms for supporting a general purpose computing ecosystem. The +infrastructure is in place, the bindings are in place, and the processes are +in place. DT does exactly what Linux needs it to when working with vertically +integrated devices, but there are no good processes for supporting what the +server vendors need. Linux could potentially get there with DT, but doing so +really just duplicates something that already works. ACPI already does what +the hardware vendors need, Microsoft won’t collaborate on DT, and hardware +vendors would still end up providing two completely separate firmware +interfaces -- one for Linux and one for Windows. + + +Kernel Compatibility +-------------------- +One of the primary motivations for ACPI is standardization, and using that +to provide backward compatibility for Linux kernels. In the server market, +software and hardware are often used for long periods. ACPI allows the +kernel and firmware to agree on a consistent abstraction that can be +maintained over time, even as hardware or software change. As long as the +abstraction is supported, systems can be updated without necessarily having +to replace the kernel. + +When a Linux driver or subsystem is first implemented using ACPI, it by +definition ends up requiring a specific version of the ACPI specification +-- it's baseline. ACPI firmware must continue to work, even though it may +not be optimal, with the earliest kernel version that first provides support +for that baseline version of ACPI. There may be a need for additional drivers, +but adding new functionality (e.g., CPU power management) should not break +older kernel versions. Further, ACPI firmware must also work with the most +recent version of the kernel. + + +Relationship with Device Tree +----------------------------- +ACPI support in drivers and subsystems for ARMv8 should never be mutually +exclusive with DT support at compile time. + +At boot time the kernel will only use one description method depending on +parameters passed from the boot loader (including kernel bootargs). + +Regardless of whether DT or ACPI is used, the kernel must always be capable +of booting with either scheme (in kernels with both schemes enabled at compile +time). + + +Booting using ACPI tables +------------------------- +The only defined method for passing ACPI tables to the kernel on ARMv8 +is via the UEFI system configuration table. Just so it is explicit, this +means that ACPI is only supported on platforms that boot via UEFI. + +When an ARMv8 system boots, it can either have DT information, ACPI tables, +or in some very unusual cases, both. If no command line parameters are used, +the kernel will try to use DT for device enumeration; if there is no DT +present, the kernel will try to use ACPI tables, but only if they are present. +In neither is available, the kernel will not boot. If acpi=force is used +on the command line, the kernel will attempt to use ACPI tables first, but +fall back to DT if there are no ACPI tables present. The basic idea is that +the kernel will not fail to boot unless it absolutely has no other choice. + +Processing of ACPI tables may be disabled by passing acpi=off on the kernel +command line; this is the default behavior. + +In order for the kernel to load and use ACPI tables, the UEFI implementation +MUST set the ACPI_20_TABLE_GUID to point to the RSDP table (the table with +the ACPI signature "RSD PTR "). If this pointer is incorrect and acpi=force +is used, the kernel will disable ACPI and try to use DT to boot instead; the +kernel has, in effect, determined that ACPI tables are not present at that +point. + +If the pointer to the RSDP table is correct, the table will be mapped into +the kernel by the ACPI core, using the address provided by UEFI. + +The ACPI core will then locate and map in all other ACPI tables provided by +using the addresses in the RSDP table to find the XSDT (eXtended System +Description Table). The XSDT in turn provides the addresses to all other +ACPI tables provided by the system firmware; the ACPI core will then traverse +this table and map in the tables listed. + +The ACPI core will ignore any provided RSDT (Root System Description Table). +RSDTs have been deprecated and are ignored on arm64 since they only allow +for 32-bit addresses. + +Further, the ACPI core will only use the 64-bit address fields in the FADT +(Fixed ACPI Description Table). Any 32-bit address fields in the FADT will +be ignored on arm64. + +Hardware reduced mode (see Section 4.1 of the ACPI 6.1 specification) will +be enforced by the ACPI core on arm64. Doing so allows the ACPI core to +run less complex code since it no longer has to provide support for legacy +hardware from other architectures. Any fields that are not to be used for +hardware reduced mode must be set to zero. + +For the ACPI core to operate properly, and in turn provide the information +the kernel needs to configure devices, it expects to find the following +tables (all section numbers refer to the ACPI 6.1 specification): + + - RSDP (Root System Description Pointer), section 5.2.5 + + - XSDT (eXtended System Description Table), section 5.2.8 + + - FADT (Fixed ACPI Description Table), section 5.2.9 + + - DSDT (Differentiated System Description Table), section + 5.2.11.1 + + - MADT (Multiple APIC Description Table), section 5.2.12 + + - GTDT (Generic Timer Description Table), section 5.2.24 + + - If PCI is supported, the MCFG (Memory mapped ConFiGuration + Table), section 5.2.6, specifically Table 5-31. + + - If booting without a console= kernel parameter is + supported, the SPCR (Serial Port Console Redirection table), + section 5.2.6, specifically Table 5-31. + + - If necessary to describe the I/O topology, SMMUs and GIC ITSs, + the IORT (Input Output Remapping Table, section 5.2.6, specifically + Table 5-31). + + - If NUMA is supported, the SRAT (System Resource Affinity Table) + and SLIT (System Locality distance Information Table), sections + 5.2.16 and 5.2.17, respectively. + +If the above tables are not all present, the kernel may or may not be +able to boot properly since it may not be able to configure all of the +devices available. This list of tables is not meant to be all inclusive; +in some environments other tables may be needed (e.g., any of the APEI +tables from section 18) to support specific functionality. + + +ACPI Detection +-------------- +Drivers should determine their probe() type by checking for a null +value for ACPI_HANDLE, or checking .of_node, or other information in +the device structure. This is detailed further in the "Driver +Recommendations" section. + +In non-driver code, if the presence of ACPI needs to be detected at +run time, then check the value of acpi_disabled. If CONFIG_ACPI is not +set, acpi_disabled will always be 1. + + +Device Enumeration +------------------ +Device descriptions in ACPI should use standard recognized ACPI interfaces. +These may contain less information than is typically provided via a Device +Tree description for the same device. This is also one of the reasons that +ACPI can be useful -- the driver takes into account that it may have less +detailed information about the device and uses sensible defaults instead. +If done properly in the driver, the hardware can change and improve over +time without the driver having to change at all. + +Clocks provide an excellent example. In DT, clocks need to be specified +and the drivers need to take them into account. In ACPI, the assumption +is that UEFI will leave the device in a reasonable default state, including +any clock settings. If for some reason the driver needs to change a clock +value, this can be done in an ACPI method; all the driver needs to do is +invoke the method and not concern itself with what the method needs to do +to change the clock. Changing the hardware can then take place over time +by changing what the ACPI method does, and not the driver. + +In DT, the parameters needed by the driver to set up clocks as in the example +above are known as "bindings"; in ACPI, these are known as "Device Properties" +and provided to a driver via the _DSD object. + +ACPI tables are described with a formal language called ASL, the ACPI +Source Language (section 19 of the specification). This means that there +are always multiple ways to describe the same thing -- including device +properties. For example, device properties could use an ASL construct +that looks like this: Name(KEY0, "value0"). An ACPI device driver would +then retrieve the value of the property by evaluating the KEY0 object. +However, using Name() this way has multiple problems: (1) ACPI limits +names ("KEY0") to four characters unlike DT; (2) there is no industry +wide registry that maintains a list of names, minimizing re-use; (3) +there is also no registry for the definition of property values ("value0"), +again making re-use difficult; and (4) how does one maintain backward +compatibility as new hardware comes out? The _DSD method was created +to solve precisely these sorts of problems; Linux drivers should ALWAYS +use the _DSD method for device properties and nothing else. + +The _DSM object (ACPI Section 9.14.1) could also be used for conveying +device properties to a driver. Linux drivers should only expect it to +be used if _DSD cannot represent the data required, and there is no way +to create a new UUID for the _DSD object. Note that there is even less +regulation of the use of _DSM than there is of _DSD. Drivers that depend +on the contents of _DSM objects will be more difficult to maintain over +time because of this; as of this writing, the use of _DSM is the cause +of quite a few firmware problems and is not recommended. + +Drivers should look for device properties in the _DSD object ONLY; the _DSD +object is described in the ACPI specification section 6.2.5, but this only +describes how to define the structure of an object returned via _DSD, and +how specific data structures are defined by specific UUIDs. Linux should +only use the _DSD Device Properties UUID [5]: + + - UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301 + + - https://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf + +The UEFI Forum provides a mechanism for registering device properties [4] +so that they may be used across all operating systems supporting ACPI. +Device properties that have not been registered with the UEFI Forum should +not be used. + +Before creating new device properties, check to be sure that they have not +been defined before and either registered in the Linux kernel documentation +as DT bindings, or the UEFI Forum as device properties. While we do not want +to simply move all DT bindings into ACPI device properties, we can learn from +what has been previously defined. + +If it is necessary to define a new device property, or if it makes sense to +synthesize the definition of a binding so it can be used in any firmware, +both DT bindings and ACPI device properties for device drivers have review +processes. Use them both. When the driver itself is submitted for review +to the Linux mailing lists, the device property definitions needed must be +submitted at the same time. A driver that supports ACPI and uses device +properties will not be considered complete without their definitions. Once +the device property has been accepted by the Linux community, it must be +registered with the UEFI Forum [4], which will review it again for consistency +within the registry. This may require iteration. The UEFI Forum, though, +will always be the canonical site for device property definitions. + +It may make sense to provide notice to the UEFI Forum that there is the +intent to register a previously unused device property name as a means of +reserving the name for later use. Other operating system vendors will +also be submitting registration requests and this may help smooth the +process. + +Once registration and review have been completed, the kernel provides an +interface for looking up device properties in a manner independent of +whether DT or ACPI is being used. This API should be used [6]; it can +eliminate some duplication of code paths in driver probing functions and +discourage divergence between DT bindings and ACPI device properties. + + +Programmable Power Control Resources +------------------------------------ +Programmable power control resources include such resources as voltage/current +providers (regulators) and clock sources. + +With ACPI, the kernel clock and regulator framework is not expected to be used +at all. + +The kernel assumes that power control of these resources is represented with +Power Resource Objects (ACPI section 7.1). The ACPI core will then handle +correctly enabling and disabling resources as they are needed. In order to +get that to work, ACPI assumes each device has defined D-states and that these +can be controlled through the optional ACPI methods _PS0, _PS1, _PS2, and _PS3; +in ACPI, _PS0 is the method to invoke to turn a device full on, and _PS3 is for +turning a device full off. + +There are two options for using those Power Resources. They can: + + - be managed in a _PSx method which gets called on entry to power + state Dx. + + - be declared separately as power resources with their own _ON and _OFF + methods. They are then tied back to D-states for a particular device + via _PRx which specifies which power resources a device needs to be on + while in Dx. Kernel then tracks number of devices using a power resource + and calls _ON/_OFF as needed. + +The kernel ACPI code will also assume that the _PSx methods follow the normal +ACPI rules for such methods: + + - If either _PS0 or _PS3 is implemented, then the other method must also + be implemented. + + - If a device requires usage or setup of a power resource when on, the ASL + should organize that it is allocated/enabled using the _PS0 method. + + - Resources allocated or enabled in the _PS0 method should be disabled + or de-allocated in the _PS3 method. + + - Firmware will leave the resources in a reasonable state before handing + over control to the kernel. + +Such code in _PSx methods will of course be very platform specific. But, +this allows the driver to abstract out the interface for operating the device +and avoid having to read special non-standard values from ACPI tables. Further, +abstracting the use of these resources allows the hardware to change over time +without requiring updates to the driver. + + +Clocks +------ +ACPI makes the assumption that clocks are initialized by the firmware -- +UEFI, in this case -- to some working value before control is handed over +to the kernel. This has implications for devices such as UARTs, or SoC-driven +LCD displays, for example. + +When the kernel boots, the clocks are assumed to be set to reasonable +working values. If for some reason the frequency needs to change -- e.g., +throttling for power management -- the device driver should expect that +process to be abstracted out into some ACPI method that can be invoked +(please see the ACPI specification for further recommendations on standard +methods to be expected). The only exceptions to this are CPU clocks where +CPPC provides a much richer interface than ACPI methods. If the clocks +are not set, there is no direct way for Linux to control them. + +If an SoC vendor wants to provide fine-grained control of the system clocks, +they could do so by providing ACPI methods that could be invoked by Linux +drivers. However, this is NOT recommended and Linux drivers should NOT use +such methods, even if they are provided. Such methods are not currently +standardized in the ACPI specification, and using them could tie a kernel +to a very specific SoC, or tie an SoC to a very specific version of the +kernel, both of which we are trying to avoid. + + +Driver Recommendations +---------------------- +DO NOT remove any DT handling when adding ACPI support for a driver. The +same device may be used on many different systems. + +DO try to structure the driver so that it is data-driven. That is, set up +a struct containing internal per-device state based on defaults and whatever +else must be discovered by the driver probe function. Then, have the rest +of the driver operate off of the contents of that struct. Doing so should +allow most divergence between ACPI and DT functionality to be kept local to +the probe function instead of being scattered throughout the driver. For +example:: + + static int device_probe_dt(struct platform_device *pdev) + { + /* DT specific functionality */ + ... + } + + static int device_probe_acpi(struct platform_device *pdev) + { + /* ACPI specific functionality */ + ... + } + + static int device_probe(struct platform_device *pdev) + { + ... + struct device_node node = pdev->dev.of_node; + ... + + if (node) + ret = device_probe_dt(pdev); + else if (ACPI_HANDLE(&pdev->dev)) + ret = device_probe_acpi(pdev); + else + /* other initialization */ + ... + /* Continue with any generic probe operations */ + ... + } + +DO keep the MODULE_DEVICE_TABLE entries together in the driver to make it +clear the different names the driver is probed for, both from DT and from +ACPI:: + + static struct of_device_id virtio_mmio_match[] = { + { .compatible = "virtio,mmio", }, + { } + }; + MODULE_DEVICE_TABLE(of, virtio_mmio_match); + + static const struct acpi_device_id virtio_mmio_acpi_match[] = { + { "LNRO0005", }, + { } + }; + MODULE_DEVICE_TABLE(acpi, virtio_mmio_acpi_match); + + +ASWG +---- +The ACPI specification changes regularly. During the year 2014, for instance, +version 5.1 was released and version 6.0 substantially completed, with most of +the changes being driven by ARM-specific requirements. Proposed changes are +presented and discussed in the ASWG (ACPI Specification Working Group) which +is a part of the UEFI Forum. The current version of the ACPI specification +is 6.1 release in January 2016. + +Participation in this group is open to all UEFI members. Please see +http://www.uefi.org/workinggroup for details on group membership. + +It is the intent of the ARMv8 ACPI kernel code to follow the ACPI specification +as closely as possible, and to only implement functionality that complies with +the released standards from UEFI ASWG. As a practical matter, there will be +vendors that provide bad ACPI tables or violate the standards in some way. +If this is because of errors, quirks and fix-ups may be necessary, but will +be avoided if possible. If there are features missing from ACPI that preclude +it from being used on a platform, ECRs (Engineering Change Requests) should be +submitted to ASWG and go through the normal approval process; for those that +are not UEFI members, many other members of the Linux community are and would +likely be willing to assist in submitting ECRs. + + +Linux Code +---------- +Individual items specific to Linux on ARM, contained in the Linux +source code, are in the list that follows: + +ACPI_OS_NAME + This macro defines the string to be returned when + an ACPI method invokes the _OS method. On ARM64 + systems, this macro will be "Linux" by default. + The command line parameter acpi_os= + can be used to set it to some other value. The + default value for other architectures is "Microsoft + Windows NT", for example. + +ACPI Objects +------------ +Detailed expectations for ACPI tables and object are listed in the file +Documentation/arch/arm64/acpi_object_usage.rst. + + +References +---------- +[0] http://silver.arm.com + document ARM-DEN-0029, or newer: + "Server Base System Architecture", version 2.3, dated 27 Mar 2014 + +[1] http://infocenter.arm.com/help/topic/com.arm.doc.den0044a/Server_Base_Boot_Requirements.pdf + Document ARM-DEN-0044A, or newer: "Server Base Boot Requirements, System + Software on ARM Platforms", dated 16 Aug 2014 + +[2] http://www.secretlab.ca/archives/151, + 10 Jan 2015, Copyright (c) 2015, + Linaro Ltd., written by Grant Likely. + +[3] AMD ACPI for Seattle platform documentation + http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Seattle_ACPI_Guide.pdf + + +[4] http://www.uefi.org/acpi + please see the link for the "ACPI _DSD Device + Property Registry Instructions" + +[5] http://www.uefi.org/acpi + please see the link for the "_DSD (Device + Specific Data) Implementation Guide" + +[6] Kernel code for the unified device + property interface can be found in + include/linux/property.h and drivers/base/property.c. + + +Authors +------- +- Al Stone +- Graeme Gregory +- Hanjun Guo + +- Grant Likely , for the "Why ACPI on ARM?" section diff --git a/Documentation/arch/arm64/asymmetric-32bit.rst b/Documentation/arch/arm64/asymmetric-32bit.rst new file mode 100644 index 000000000000..64a0b505da7d --- /dev/null +++ b/Documentation/arch/arm64/asymmetric-32bit.rst @@ -0,0 +1,155 @@ +====================== +Asymmetric 32-bit SoCs +====================== + +Author: Will Deacon + +This document describes the impact of asymmetric 32-bit SoCs on the +execution of 32-bit (``AArch32``) applications. + +Date: 2021-05-17 + +Introduction +============ + +Some Armv9 SoCs suffer from a big.LITTLE misfeature where only a subset +of the CPUs are capable of executing 32-bit user applications. On such +a system, Linux by default treats the asymmetry as a "mismatch" and +disables support for both the ``PER_LINUX32`` personality and +``execve(2)`` of 32-bit ELF binaries, with the latter returning +``-ENOEXEC``. If the mismatch is detected during late onlining of a +64-bit-only CPU, then the onlining operation fails and the new CPU is +unavailable for scheduling. + +Surprisingly, these SoCs have been produced with the intention of +running legacy 32-bit binaries. Unsurprisingly, that doesn't work very +well with the default behaviour of Linux. + +It seems inevitable that future SoCs will drop 32-bit support +altogether, so if you're stuck in the unenviable position of needing to +run 32-bit code on one of these transitionary platforms then you would +be wise to consider alternatives such as recompilation, emulation or +retirement. If neither of those options are practical, then read on. + +Enabling kernel support +======================= + +Since the kernel support is not completely transparent to userspace, +allowing 32-bit tasks to run on an asymmetric 32-bit system requires an +explicit "opt-in" and can be enabled by passing the +``allow_mismatched_32bit_el0`` parameter on the kernel command-line. + +For the remainder of this document we will refer to an *asymmetric +system* to mean an asymmetric 32-bit SoC running Linux with this kernel +command-line option enabled. + +Userspace impact +================ + +32-bit tasks running on an asymmetric system behave in mostly the same +way as on a homogeneous system, with a few key differences relating to +CPU affinity. + +sysfs +----- + +The subset of CPUs capable of running 32-bit tasks is described in +``/sys/devices/system/cpu/aarch32_el0`` and is documented further in +``Documentation/ABI/testing/sysfs-devices-system-cpu``. + +**Note:** CPUs are advertised by this file as they are detected and so +late-onlining of 32-bit-capable CPUs can result in the file contents +being modified by the kernel at runtime. Once advertised, CPUs are never +removed from the file. + +``execve(2)`` +------------- + +On a homogeneous system, the CPU affinity of a task is preserved across +``execve(2)``. This is not always possible on an asymmetric system, +specifically when the new program being executed is 32-bit yet the +affinity mask contains 64-bit-only CPUs. In this situation, the kernel +determines the new affinity mask as follows: + + 1. If the 32-bit-capable subset of the affinity mask is not empty, + then the affinity is restricted to that subset and the old affinity + mask is saved. This saved mask is inherited over ``fork(2)`` and + preserved across ``execve(2)`` of 32-bit programs. + + **Note:** This step does not apply to ``SCHED_DEADLINE`` tasks. + See `SCHED_DEADLINE`_. + + 2. Otherwise, the cpuset hierarchy of the task is walked until an + ancestor is found containing at least one 32-bit-capable CPU. The + affinity of the task is then changed to match the 32-bit-capable + subset of the cpuset determined by the walk. + + 3. On failure (i.e. out of memory), the affinity is changed to the set + of all 32-bit-capable CPUs of which the kernel is aware. + +A subsequent ``execve(2)`` of a 64-bit program by the 32-bit task will +invalidate the affinity mask saved in (1) and attempt to restore the CPU +affinity of the task using the saved mask if it was previously valid. +This restoration may fail due to intervening changes to the deadline +policy or cpuset hierarchy, in which case the ``execve(2)`` continues +with the affinity unchanged. + +Calls to ``sched_setaffinity(2)`` for a 32-bit task will consider only +the 32-bit-capable CPUs of the requested affinity mask. On success, the +affinity for the task is updated and any saved mask from a prior +``execve(2)`` is invalidated. + +``SCHED_DEADLINE`` +------------------ + +Explicit admission of a 32-bit deadline task to the default root domain +(e.g. by calling ``sched_setattr(2)``) is rejected on an asymmetric +32-bit system unless admission control is disabled by writing -1 to +``/proc/sys/kernel/sched_rt_runtime_us``. + +``execve(2)`` of a 32-bit program from a 64-bit deadline task will +return ``-ENOEXEC`` if the root domain for the task contains any +64-bit-only CPUs and admission control is enabled. Concurrent offlining +of 32-bit-capable CPUs may still necessitate the procedure described in +`execve(2)`_, in which case step (1) is skipped and a warning is +emitted on the console. + +**Note:** It is recommended that a set of 32-bit-capable CPUs are placed +into a separate root domain if ``SCHED_DEADLINE`` is to be used with +32-bit tasks on an asymmetric system. Failure to do so is likely to +result in missed deadlines. + +Cpusets +------- + +The affinity of a 32-bit task on an asymmetric system may include CPUs +that are not explicitly allowed by the cpuset to which it is attached. +This can occur as a result of the following two situations: + + - A 64-bit task attached to a cpuset which allows only 64-bit CPUs + executes a 32-bit program. + + - All of the 32-bit-capable CPUs allowed by a cpuset containing a + 32-bit task are offlined. + +In both of these cases, the new affinity is calculated according to step +(2) of the process described in `execve(2)`_ and the cpuset hierarchy is +unchanged irrespective of the cgroup version. + +CPU hotplug +----------- + +On an asymmetric system, the first detected 32-bit-capable CPU is +prevented from being offlined by userspace and any such attempt will +return ``-EPERM``. Note that suspend is still permitted even if the +primary CPU (i.e. CPU 0) is 64-bit-only. + +KVM +--- + +Although KVM will not advertise 32-bit EL0 support to any vCPUs on an +asymmetric system, a broken guest at EL1 could still attempt to execute +32-bit code at EL0. In this case, an exit from a vCPU thread in 32-bit +mode will return to host userspace with an ``exit_reason`` of +``KVM_EXIT_FAIL_ENTRY`` and will remain non-runnable until successfully +re-initialised by a subsequent ``KVM_ARM_VCPU_INIT`` operation. diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst new file mode 100644 index 000000000000..ffeccdd6bdac --- /dev/null +++ b/Documentation/arch/arm64/booting.rst @@ -0,0 +1,431 @@ +===================== +Booting AArch64 Linux +===================== + +Author: Will Deacon + +Date : 07 September 2012 + +This document is based on the ARM booting document by Russell King and +is relevant to all public releases of the AArch64 Linux kernel. + +The AArch64 exception model is made up of a number of exception levels +(EL0 - EL3), with EL0, EL1 and EL2 having a secure and a non-secure +counterpart. EL2 is the hypervisor level, EL3 is the highest priority +level and exists only in secure mode. Both are architecturally optional. + +For the purposes of this document, we will use the term `boot loader` +simply to define all software that executes on the CPU(s) before control +is passed to the Linux kernel. This may include secure monitor and +hypervisor code, or it may just be a handful of instructions for +preparing a minimal boot environment. + +Essentially, the boot loader should provide (as a minimum) the +following: + +1. Setup and initialise the RAM +2. Setup the device tree +3. Decompress the kernel image +4. Call the kernel image + + +1. Setup and initialise RAM +--------------------------- + +Requirement: MANDATORY + +The boot loader is expected to find and initialise all RAM that the +kernel will use for volatile data storage in the system. It performs +this in a machine dependent manner. (It may use internal algorithms +to automatically locate and size all RAM, or it may use knowledge of +the RAM in the machine, or any other method the boot loader designer +sees fit.) + + +2. Setup the device tree +------------------------- + +Requirement: MANDATORY + +The device tree blob (dtb) must be placed on an 8-byte boundary and must +not exceed 2 megabytes in size. Since the dtb will be mapped cacheable +using blocks of up to 2 megabytes in size, it must not be placed within +any 2M region which must be mapped with any specific attributes. + +NOTE: versions prior to v4.2 also require that the DTB be placed within +the 512 MB region starting at text_offset bytes below the kernel Image. + +3. Decompress the kernel image +------------------------------ + +Requirement: OPTIONAL + +The AArch64 kernel does not currently provide a decompressor and +therefore requires decompression (gzip etc.) to be performed by the boot +loader if a compressed Image target (e.g. Image.gz) is used. For +bootloaders that do not implement this requirement, the uncompressed +Image target is available instead. + + +4. Call the kernel image +------------------------ + +Requirement: MANDATORY + +The decompressed kernel image contains a 64-byte header as follows:: + + u32 code0; /* Executable code */ + u32 code1; /* Executable code */ + u64 text_offset; /* Image load offset, little endian */ + u64 image_size; /* Effective Image size, little endian */ + u64 flags; /* kernel flags, little endian */ + u64 res2 = 0; /* reserved */ + u64 res3 = 0; /* reserved */ + u64 res4 = 0; /* reserved */ + u32 magic = 0x644d5241; /* Magic number, little endian, "ARM\x64" */ + u32 res5; /* reserved (used for PE COFF offset) */ + + +Header notes: + +- As of v3.17, all fields are little endian unless stated otherwise. + +- code0/code1 are responsible for branching to stext. + +- when booting through EFI, code0/code1 are initially skipped. + res5 is an offset to the PE header and the PE header has the EFI + entry point (efi_stub_entry). When the stub has done its work, it + jumps to code0 to resume the normal boot process. + +- Prior to v3.17, the endianness of text_offset was not specified. In + these cases image_size is zero and text_offset is 0x80000 in the + endianness of the kernel. Where image_size is non-zero image_size is + little-endian and must be respected. Where image_size is zero, + text_offset can be assumed to be 0x80000. + +- The flags field (introduced in v3.17) is a little-endian 64-bit field + composed as follows: + + ============= =============================================================== + Bit 0 Kernel endianness. 1 if BE, 0 if LE. + Bit 1-2 Kernel Page size. + + * 0 - Unspecified. + * 1 - 4K + * 2 - 16K + * 3 - 64K + Bit 3 Kernel physical placement + + 0 + 2MB aligned base should be as close as possible + to the base of DRAM, since memory below it is not + accessible via the linear mapping + 1 + 2MB aligned base such that all image_size bytes + counted from the start of the image are within + the 48-bit addressable range of physical memory + Bits 4-63 Reserved. + ============= =============================================================== + +- When image_size is zero, a bootloader should attempt to keep as much + memory as possible free for use by the kernel immediately after the + end of the kernel image. The amount of space required will vary + depending on selected features, and is effectively unbound. + +The Image must be placed text_offset bytes from a 2MB aligned base +address anywhere in usable system RAM and called there. The region +between the 2 MB aligned base address and the start of the image has no +special significance to the kernel, and may be used for other purposes. +At least image_size bytes from the start of the image must be free for +use by the kernel. +NOTE: versions prior to v4.6 cannot make use of memory below the +physical offset of the Image so it is recommended that the Image be +placed as close as possible to the start of system RAM. + +If an initrd/initramfs is passed to the kernel at boot, it must reside +entirely within a 1 GB aligned physical memory window of up to 32 GB in +size that fully covers the kernel Image as well. + +Any memory described to the kernel (even that below the start of the +image) which is not marked as reserved from the kernel (e.g., with a +memreserve region in the device tree) will be considered as available to +the kernel. + +Before jumping into the kernel, the following conditions must be met: + +- Quiesce all DMA capable devices so that memory does not get + corrupted by bogus network packets or disk data. This will save + you many hours of debug. + +- Primary CPU general-purpose register settings: + + - x0 = physical address of device tree blob (dtb) in system RAM. + - x1 = 0 (reserved for future use) + - x2 = 0 (reserved for future use) + - x3 = 0 (reserved for future use) + +- CPU mode + + All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError, + IRQ and FIQ). + The CPU must be in non-secure state, either in EL2 (RECOMMENDED in order + to have access to the virtualisation extensions), or in EL1. + +- Caches, MMUs + + The MMU must be off. + + The instruction cache may be on or off, and must not hold any stale + entries corresponding to the loaded kernel image. + + The address range corresponding to the loaded kernel image must be + cleaned to the PoC. In the presence of a system cache or other + coherent masters with caches enabled, this will typically require + cache maintenance by VA rather than set/way operations. + System caches which respect the architected cache maintenance by VA + operations must be configured and may be enabled. + System caches which do not respect architected cache maintenance by VA + operations (not recommended) must be configured and disabled. + +- Architected timers + + CNTFRQ must be programmed with the timer frequency and CNTVOFF must + be programmed with a consistent value on all CPUs. If entering the + kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0) set where + available. + +- Coherency + + All CPUs to be booted by the kernel must be part of the same coherency + domain on entry to the kernel. This may require IMPLEMENTATION DEFINED + initialisation to enable the receiving of maintenance operations on + each CPU. + +- System registers + + All writable architected system registers at or below the exception + level where the kernel image will be entered must be initialised by + software at a higher exception level to prevent execution in an UNKNOWN + state. + + For all systems: + - If EL3 is present: + + - SCR_EL3.FIQ must have the same value across all CPUs the kernel is + executing on. + - The value of SCR_EL3.FIQ must be the same as the one present at boot + time whenever the kernel is executing. + + - If EL3 is present and the kernel is entered at EL2: + + - SCR_EL3.HCE (bit 8) must be initialised to 0b1. + + For systems with a GICv3 interrupt controller to be used in v3 mode: + - If EL3 is present: + + - ICC_SRE_EL3.Enable (bit 3) must be initialised to 0b1. + - ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1. + - ICC_CTLR_EL3.PMHE (bit 6) must be set to the same value across + all CPUs the kernel is executing on, and must stay constant + for the lifetime of the kernel. + + - If the kernel is entered at EL1: + + - ICC.SRE_EL2.Enable (bit 3) must be initialised to 0b1 + - ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1. + + - The DT or ACPI tables must describe a GICv3 interrupt controller. + + For systems with a GICv3 interrupt controller to be used in + compatibility (v2) mode: + + - If EL3 is present: + + ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b0. + + - If the kernel is entered at EL1: + + ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b0. + + - The DT or ACPI tables must describe a GICv2 interrupt controller. + + For CPUs with pointer authentication functionality: + + - If EL3 is present: + + - SCR_EL3.APK (bit 16) must be initialised to 0b1 + - SCR_EL3.API (bit 17) must be initialised to 0b1 + + - If the kernel is entered at EL1: + + - HCR_EL2.APK (bit 40) must be initialised to 0b1 + - HCR_EL2.API (bit 41) must be initialised to 0b1 + + For CPUs with Activity Monitors Unit v1 (AMUv1) extension present: + + - If EL3 is present: + + - CPTR_EL3.TAM (bit 30) must be initialised to 0b0 + - CPTR_EL2.TAM (bit 30) must be initialised to 0b0 + - AMCNTENSET0_EL0 must be initialised to 0b1111 + - AMCNTENSET1_EL0 must be initialised to a platform specific value + having 0b1 set for the corresponding bit for each of the auxiliary + counters present. + + - If the kernel is entered at EL1: + + - AMCNTENSET0_EL0 must be initialised to 0b1111 + - AMCNTENSET1_EL0 must be initialised to a platform specific value + having 0b1 set for the corresponding bit for each of the auxiliary + counters present. + + For CPUs with the Fine Grained Traps (FEAT_FGT) extension present: + + - If EL3 is present and the kernel is entered at EL2: + + - SCR_EL3.FGTEn (bit 27) must be initialised to 0b1. + + For CPUs with support for HCRX_EL2 (FEAT_HCX) present: + + - If EL3 is present and the kernel is entered at EL2: + + - SCR_EL3.HXEn (bit 38) must be initialised to 0b1. + + For CPUs with Advanced SIMD and floating point support: + + - If EL3 is present: + + - CPTR_EL3.TFP (bit 10) must be initialised to 0b0. + + - If EL2 is present and the kernel is entered at EL1: + + - CPTR_EL2.TFP (bit 10) must be initialised to 0b0. + + For CPUs with the Scalable Vector Extension (FEAT_SVE) present: + + - if EL3 is present: + + - CPTR_EL3.EZ (bit 8) must be initialised to 0b1. + + - ZCR_EL3.LEN must be initialised to the same value for all CPUs the + kernel is executed on. + + - If the kernel is entered at EL1 and EL2 is present: + + - CPTR_EL2.TZ (bit 8) must be initialised to 0b0. + + - CPTR_EL2.ZEN (bits 17:16) must be initialised to 0b11. + + - ZCR_EL2.LEN must be initialised to the same value for all CPUs the + kernel will execute on. + + For CPUs with the Scalable Matrix Extension (FEAT_SME): + + - If EL3 is present: + + - CPTR_EL3.ESM (bit 12) must be initialised to 0b1. + + - SCR_EL3.EnTP2 (bit 41) must be initialised to 0b1. + + - SMCR_EL3.LEN must be initialised to the same value for all CPUs the + kernel will execute on. + + - If the kernel is entered at EL1 and EL2 is present: + + - CPTR_EL2.TSM (bit 12) must be initialised to 0b0. + + - CPTR_EL2.SMEN (bits 25:24) must be initialised to 0b11. + + - SCTLR_EL2.EnTP2 (bit 60) must be initialised to 0b1. + + - SMCR_EL2.LEN must be initialised to the same value for all CPUs the + kernel will execute on. + + - HWFGRTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01. + + - HWFGWTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01. + + - HWFGRTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01. + + - HWFGWTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01. + + For CPUs with the Scalable Matrix Extension FA64 feature (FEAT_SME_FA64): + + - If EL3 is present: + + - SMCR_EL3.FA64 (bit 31) must be initialised to 0b1. + + - If the kernel is entered at EL1 and EL2 is present: + + - SMCR_EL2.FA64 (bit 31) must be initialised to 0b1. + + For CPUs with the Memory Tagging Extension feature (FEAT_MTE2): + + - If EL3 is present: + + - SCR_EL3.ATA (bit 26) must be initialised to 0b1. + + - If the kernel is entered at EL1 and EL2 is present: + + - HCR_EL2.ATA (bit 56) must be initialised to 0b1. + + For CPUs with the Scalable Matrix Extension version 2 (FEAT_SME2): + + - If EL3 is present: + + - SMCR_EL3.EZT0 (bit 30) must be initialised to 0b1. + + - If the kernel is entered at EL1 and EL2 is present: + + - SMCR_EL2.EZT0 (bit 30) must be initialised to 0b1. + +The requirements described above for CPU mode, caches, MMUs, architected +timers, coherency and system registers apply to all CPUs. All CPUs must +enter the kernel in the same exception level. Where the values documented +disable traps it is permissible for these traps to be enabled so long as +those traps are handled transparently by higher exception levels as though +the values documented were set. + +The boot loader is expected to enter the kernel on each CPU in the +following manner: + +- The primary CPU must jump directly to the first instruction of the + kernel image. The device tree blob passed by this CPU must contain + an 'enable-method' property for each cpu node. The supported + enable-methods are described below. + + It is expected that the bootloader will generate these device tree + properties and insert them into the blob prior to kernel entry. + +- CPUs with a "spin-table" enable-method must have a 'cpu-release-addr' + property in their cpu node. This property identifies a + naturally-aligned 64-bit zero-initalised memory location. + + These CPUs should spin outside of the kernel in a reserved area of + memory (communicated to the kernel by a /memreserve/ region in the + device tree) polling their cpu-release-addr location, which must be + contained in the reserved region. A wfe instruction may be inserted + to reduce the overhead of the busy-loop and a sev will be issued by + the primary CPU. When a read of the location pointed to by the + cpu-release-addr returns a non-zero value, the CPU must jump to this + value. The value will be written as a single 64-bit little-endian + value, so CPUs must convert the read value to their native endianness + before jumping to it. + +- CPUs with a "psci" enable method should remain outside of + the kernel (i.e. outside of the regions of memory described to the + kernel in the memory node, or in a reserved area of memory described + to the kernel by a /memreserve/ region in the device tree). The + kernel will issue CPU_ON calls as described in ARM document number ARM + DEN 0022A ("Power State Coordination Interface System Software on ARM + processors") to bring CPUs into the kernel. + + The device tree should contain a 'psci' node, as described in + Documentation/devicetree/bindings/arm/psci.yaml. + +- Secondary CPU general-purpose register settings + + - x0 = 0 (reserved for future use) + - x1 = 0 (reserved for future use) + - x2 = 0 (reserved for future use) + - x3 = 0 (reserved for future use) diff --git a/Documentation/arch/arm64/cpu-feature-registers.rst b/Documentation/arch/arm64/cpu-feature-registers.rst new file mode 100644 index 000000000000..c7adc7897df6 --- /dev/null +++ b/Documentation/arch/arm64/cpu-feature-registers.rst @@ -0,0 +1,400 @@ +=========================== +ARM64 CPU Feature Registers +=========================== + +Author: Suzuki K Poulose + + +This file describes the ABI for exporting the AArch64 CPU ID/feature +registers to userspace. The availability of this ABI is advertised +via the HWCAP_CPUID in HWCAPs. + +1. Motivation +------------- + +The ARM architecture defines a set of feature registers, which describe +the capabilities of the CPU/system. Access to these system registers is +restricted from EL0 and there is no reliable way for an application to +extract this information to make better decisions at runtime. There is +limited information available to the application via HWCAPs, however +there are some issues with their usage. + + a) Any change to the HWCAPs requires an update to userspace (e.g libc) + to detect the new changes, which can take a long time to appear in + distributions. Exposing the registers allows applications to get the + information without requiring updates to the toolchains. + + b) Access to HWCAPs is sometimes limited (e.g prior to libc, or + when ld is initialised at startup time). + + c) HWCAPs cannot represent non-boolean information effectively. The + architecture defines a canonical format for representing features + in the ID registers; this is well defined and is capable of + representing all valid architecture variations. + + +2. Requirements +--------------- + + a) Safety: + + Applications should be able to use the information provided by the + infrastructure to run safely across the system. This has greater + implications on a system with heterogeneous CPUs. + The infrastructure exports a value that is safe across all the + available CPU on the system. + + e.g, If at least one CPU doesn't implement CRC32 instructions, while + others do, we should report that the CRC32 is not implemented. + Otherwise an application could crash when scheduled on the CPU + which doesn't support CRC32. + + b) Security: + + Applications should only be able to receive information that is + relevant to the normal operation in userspace. Hence, some of the + fields are masked out(i.e, made invisible) and their values are set to + indicate the feature is 'not supported'. See Section 4 for the list + of visible features. Also, the kernel may manipulate the fields + based on what it supports. e.g, If FP is not supported by the + kernel, the values could indicate that the FP is not available + (even when the CPU provides it). + + c) Implementation Defined Features + + The infrastructure doesn't expose any register which is + IMPLEMENTATION DEFINED as per ARMv8-A Architecture. + + d) CPU Identification: + + MIDR_EL1 is exposed to help identify the processor. On a + heterogeneous system, this could be racy (just like getcpu()). The + process could be migrated to another CPU by the time it uses the + register value, unless the CPU affinity is set. Hence, there is no + guarantee that the value reflects the processor that it is + currently executing on. The REVIDR is not exposed due to this + constraint, as REVIDR makes sense only in conjunction with the + MIDR. Alternately, MIDR_EL1 and REVIDR_EL1 are exposed via sysfs + at:: + + /sys/devices/system/cpu/cpu$ID/regs/identification/ + \- midr + \- revidr + +3. Implementation +-------------------- + +The infrastructure is built on the emulation of the 'MRS' instruction. +Accessing a restricted system register from an application generates an +exception and ends up in SIGILL being delivered to the process. +The infrastructure hooks into the exception handler and emulates the +operation if the source belongs to the supported system register space. + +The infrastructure emulates only the following system register space:: + + Op0=3, Op1=0, CRn=0, CRm=0,2,3,4,5,6,7 + +(See Table C5-6 'System instruction encodings for non-Debug System +register accesses' in ARMv8 ARM DDI 0487A.h, for the list of +registers). + +The following rules are applied to the value returned by the +infrastructure: + + a) The value of an 'IMPLEMENTATION DEFINED' field is set to 0. + b) The value of a reserved field is populated with the reserved + value as defined by the architecture. + c) The value of a 'visible' field holds the system wide safe value + for the particular feature (except for MIDR_EL1, see section 4). + d) All other fields (i.e, invisible fields) are set to indicate + the feature is missing (as defined by the architecture). + +4. List of registers with visible features +------------------------------------------- + + 1) ID_AA64ISAR0_EL1 - Instruction Set Attribute Register 0 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | RNDR | [63-60] | y | + +------------------------------+---------+---------+ + | TS | [55-52] | y | + +------------------------------+---------+---------+ + | FHM | [51-48] | y | + +------------------------------+---------+---------+ + | DP | [47-44] | y | + +------------------------------+---------+---------+ + | SM4 | [43-40] | y | + +------------------------------+---------+---------+ + | SM3 | [39-36] | y | + +------------------------------+---------+---------+ + | SHA3 | [35-32] | y | + +------------------------------+---------+---------+ + | RDM | [31-28] | y | + +------------------------------+---------+---------+ + | ATOMICS | [23-20] | y | + +------------------------------+---------+---------+ + | CRC32 | [19-16] | y | + +------------------------------+---------+---------+ + | SHA2 | [15-12] | y | + +------------------------------+---------+---------+ + | SHA1 | [11-8] | y | + +------------------------------+---------+---------+ + | AES | [7-4] | y | + +------------------------------+---------+---------+ + + + 2) ID_AA64PFR0_EL1 - Processor Feature Register 0 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | DIT | [51-48] | y | + +------------------------------+---------+---------+ + | SVE | [35-32] | y | + +------------------------------+---------+---------+ + | GIC | [27-24] | n | + +------------------------------+---------+---------+ + | AdvSIMD | [23-20] | y | + +------------------------------+---------+---------+ + | FP | [19-16] | y | + +------------------------------+---------+---------+ + | EL3 | [15-12] | n | + +------------------------------+---------+---------+ + | EL2 | [11-8] | n | + +------------------------------+---------+---------+ + | EL1 | [7-4] | n | + +------------------------------+---------+---------+ + | EL0 | [3-0] | n | + +------------------------------+---------+---------+ + + + 3) ID_AA64PFR1_EL1 - Processor Feature Register 1 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | MTE | [11-8] | y | + +------------------------------+---------+---------+ + | SSBS | [7-4] | y | + +------------------------------+---------+---------+ + | BT | [3-0] | y | + +------------------------------+---------+---------+ + + + 4) MIDR_EL1 - Main ID Register + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | Implementer | [31-24] | y | + +------------------------------+---------+---------+ + | Variant | [23-20] | y | + +------------------------------+---------+---------+ + | Architecture | [19-16] | y | + +------------------------------+---------+---------+ + | PartNum | [15-4] | y | + +------------------------------+---------+---------+ + | Revision | [3-0] | y | + +------------------------------+---------+---------+ + + NOTE: The 'visible' fields of MIDR_EL1 will contain the value + as available on the CPU where it is fetched and is not a system + wide safe value. + + 5) ID_AA64ISAR1_EL1 - Instruction set attribute register 1 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | I8MM | [55-52] | y | + +------------------------------+---------+---------+ + | DGH | [51-48] | y | + +------------------------------+---------+---------+ + | BF16 | [47-44] | y | + +------------------------------+---------+---------+ + | SB | [39-36] | y | + +------------------------------+---------+---------+ + | FRINTTS | [35-32] | y | + +------------------------------+---------+---------+ + | GPI | [31-28] | y | + +------------------------------+---------+---------+ + | GPA | [27-24] | y | + +------------------------------+---------+---------+ + | LRCPC | [23-20] | y | + +------------------------------+---------+---------+ + | FCMA | [19-16] | y | + +------------------------------+---------+---------+ + | JSCVT | [15-12] | y | + +------------------------------+---------+---------+ + | API | [11-8] | y | + +------------------------------+---------+---------+ + | APA | [7-4] | y | + +------------------------------+---------+---------+ + | DPB | [3-0] | y | + +------------------------------+---------+---------+ + + 6) ID_AA64MMFR0_EL1 - Memory model feature register 0 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | ECV | [63-60] | y | + +------------------------------+---------+---------+ + + 7) ID_AA64MMFR2_EL1 - Memory model feature register 2 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | AT | [35-32] | y | + +------------------------------+---------+---------+ + + 8) ID_AA64ZFR0_EL1 - SVE feature ID register 0 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | F64MM | [59-56] | y | + +------------------------------+---------+---------+ + | F32MM | [55-52] | y | + +------------------------------+---------+---------+ + | I8MM | [47-44] | y | + +------------------------------+---------+---------+ + | SM4 | [43-40] | y | + +------------------------------+---------+---------+ + | SHA3 | [35-32] | y | + +------------------------------+---------+---------+ + | BF16 | [23-20] | y | + +------------------------------+---------+---------+ + | BitPerm | [19-16] | y | + +------------------------------+---------+---------+ + | AES | [7-4] | y | + +------------------------------+---------+---------+ + | SVEVer | [3-0] | y | + +------------------------------+---------+---------+ + + 8) ID_AA64MMFR1_EL1 - Memory model feature register 1 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | AFP | [47-44] | y | + +------------------------------+---------+---------+ + + 9) ID_AA64ISAR2_EL1 - Instruction set attribute register 2 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | RPRES | [7-4] | y | + +------------------------------+---------+---------+ + | WFXT | [3-0] | y | + +------------------------------+---------+---------+ + + 10) MVFR0_EL1 - AArch32 Media and VFP Feature Register 0 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | FPDP | [11-8] | y | + +------------------------------+---------+---------+ + + 11) MVFR1_EL1 - AArch32 Media and VFP Feature Register 1 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | SIMDFMAC | [31-28] | y | + +------------------------------+---------+---------+ + | SIMDSP | [19-16] | y | + +------------------------------+---------+---------+ + | SIMDInt | [15-12] | y | + +------------------------------+---------+---------+ + | SIMDLS | [11-8] | y | + +------------------------------+---------+---------+ + + 12) ID_ISAR5_EL1 - AArch32 Instruction Set Attribute Register 5 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | CRC32 | [19-16] | y | + +------------------------------+---------+---------+ + | SHA2 | [15-12] | y | + +------------------------------+---------+---------+ + | SHA1 | [11-8] | y | + +------------------------------+---------+---------+ + | AES | [7-4] | y | + +------------------------------+---------+---------+ + + +Appendix I: Example +------------------- + +:: + + /* + * Sample program to demonstrate the MRS emulation ABI. + * + * Copyright (C) 2015-2016, ARM Ltd + * + * Author: Suzuki K Poulose + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + + #include + #include + #include + + #define get_cpu_ftr(id) ({ \ + unsigned long __val; \ + asm("mrs %0, "#id : "=r" (__val)); \ + printf("%-20s: 0x%016lx\n", #id, __val); \ + }) + + int main(void) + { + + if (!(getauxval(AT_HWCAP) & HWCAP_CPUID)) { + fputs("CPUID registers unavailable\n", stderr); + return 1; + } + + get_cpu_ftr(ID_AA64ISAR0_EL1); + get_cpu_ftr(ID_AA64ISAR1_EL1); + get_cpu_ftr(ID_AA64MMFR0_EL1); + get_cpu_ftr(ID_AA64MMFR1_EL1); + get_cpu_ftr(ID_AA64PFR0_EL1); + get_cpu_ftr(ID_AA64PFR1_EL1); + get_cpu_ftr(ID_AA64DFR0_EL1); + get_cpu_ftr(ID_AA64DFR1_EL1); + + get_cpu_ftr(MIDR_EL1); + get_cpu_ftr(MPIDR_EL1); + get_cpu_ftr(REVIDR_EL1); + + #if 0 + /* Unexposed register access causes SIGILL */ + get_cpu_ftr(ID_MMFR0_EL1); + #endif + + return 0; + } diff --git a/Documentation/arch/arm64/elf_hwcaps.rst b/Documentation/arch/arm64/elf_hwcaps.rst new file mode 100644 index 000000000000..58a86d532228 --- /dev/null +++ b/Documentation/arch/arm64/elf_hwcaps.rst @@ -0,0 +1,309 @@ +.. _elf_hwcaps_index: + +================ +ARM64 ELF hwcaps +================ + +This document describes the usage and semantics of the arm64 ELF hwcaps. + + +1. Introduction +--------------- + +Some hardware or software features are only available on some CPU +implementations, and/or with certain kernel configurations, but have no +architected discovery mechanism available to userspace code at EL0. The +kernel exposes the presence of these features to userspace through a set +of flags called hwcaps, exposed in the auxiliary vector. + +Userspace software can test for features by acquiring the AT_HWCAP or +AT_HWCAP2 entry of the auxiliary vector, and testing whether the relevant +flags are set, e.g.:: + + bool floating_point_is_present(void) + { + unsigned long hwcaps = getauxval(AT_HWCAP); + if (hwcaps & HWCAP_FP) + return true; + + return false; + } + +Where software relies on a feature described by a hwcap, it should check +the relevant hwcap flag to verify that the feature is present before +attempting to make use of the feature. + +Features cannot be probed reliably through other means. When a feature +is not available, attempting to use it may result in unpredictable +behaviour, and is not guaranteed to result in any reliable indication +that the feature is unavailable, such as a SIGILL. + + +2. Interpretation of hwcaps +--------------------------- + +The majority of hwcaps are intended to indicate the presence of features +which are described by architected ID registers inaccessible to +userspace code at EL0. These hwcaps are defined in terms of ID register +fields, and should be interpreted with reference to the definition of +these fields in the ARM Architecture Reference Manual (ARM ARM). + +Such hwcaps are described below in the form:: + + Functionality implied by idreg.field == val. + +Such hwcaps indicate the availability of functionality that the ARM ARM +defines as being present when idreg.field has value val, but do not +indicate that idreg.field is precisely equal to val, nor do they +indicate the absence of functionality implied by other values of +idreg.field. + +Other hwcaps may indicate the presence of features which cannot be +described by ID registers alone. These may be described without +reference to ID registers, and may refer to other documentation. + + +3. The hwcaps exposed in AT_HWCAP +--------------------------------- + +HWCAP_FP + Functionality implied by ID_AA64PFR0_EL1.FP == 0b0000. + +HWCAP_ASIMD + Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0000. + +HWCAP_EVTSTRM + The generic timer is configured to generate events at a frequency of + approximately 10KHz. + +HWCAP_AES + Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0001. + +HWCAP_PMULL + Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0010. + +HWCAP_SHA1 + Functionality implied by ID_AA64ISAR0_EL1.SHA1 == 0b0001. + +HWCAP_SHA2 + Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0001. + +HWCAP_CRC32 + Functionality implied by ID_AA64ISAR0_EL1.CRC32 == 0b0001. + +HWCAP_ATOMICS + Functionality implied by ID_AA64ISAR0_EL1.Atomic == 0b0010. + +HWCAP_FPHP + Functionality implied by ID_AA64PFR0_EL1.FP == 0b0001. + +HWCAP_ASIMDHP + Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0001. + +HWCAP_CPUID + EL0 access to certain ID registers is available, to the extent + described by Documentation/arch/arm64/cpu-feature-registers.rst. + + These ID registers may imply the availability of features. + +HWCAP_ASIMDRDM + Functionality implied by ID_AA64ISAR0_EL1.RDM == 0b0001. + +HWCAP_JSCVT + Functionality implied by ID_AA64ISAR1_EL1.JSCVT == 0b0001. + +HWCAP_FCMA + Functionality implied by ID_AA64ISAR1_EL1.FCMA == 0b0001. + +HWCAP_LRCPC + Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0001. + +HWCAP_DCPOP + Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0001. + +HWCAP_SHA3 + Functionality implied by ID_AA64ISAR0_EL1.SHA3 == 0b0001. + +HWCAP_SM3 + Functionality implied by ID_AA64ISAR0_EL1.SM3 == 0b0001. + +HWCAP_SM4 + Functionality implied by ID_AA64ISAR0_EL1.SM4 == 0b0001. + +HWCAP_ASIMDDP + Functionality implied by ID_AA64ISAR0_EL1.DP == 0b0001. + +HWCAP_SHA512 + Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0010. + +HWCAP_SVE + Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001. + +HWCAP_ASIMDFHM + Functionality implied by ID_AA64ISAR0_EL1.FHM == 0b0001. + +HWCAP_DIT + Functionality implied by ID_AA64PFR0_EL1.DIT == 0b0001. + +HWCAP_USCAT + Functionality implied by ID_AA64MMFR2_EL1.AT == 0b0001. + +HWCAP_ILRCPC + Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0010. + +HWCAP_FLAGM + Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0001. + +HWCAP_SSBS + Functionality implied by ID_AA64PFR1_EL1.SSBS == 0b0010. + +HWCAP_SB + Functionality implied by ID_AA64ISAR1_EL1.SB == 0b0001. + +HWCAP_PACA + Functionality implied by ID_AA64ISAR1_EL1.APA == 0b0001 or + ID_AA64ISAR1_EL1.API == 0b0001, as described by + Documentation/arch/arm64/pointer-authentication.rst. + +HWCAP_PACG + Functionality implied by ID_AA64ISAR1_EL1.GPA == 0b0001 or + ID_AA64ISAR1_EL1.GPI == 0b0001, as described by + Documentation/arch/arm64/pointer-authentication.rst. + +HWCAP2_DCPODP + Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0010. + +HWCAP2_SVE2 + Functionality implied by ID_AA64ZFR0_EL1.SVEVer == 0b0001. + +HWCAP2_SVEAES + Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0001. + +HWCAP2_SVEPMULL + Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0010. + +HWCAP2_SVEBITPERM + Functionality implied by ID_AA64ZFR0_EL1.BitPerm == 0b0001. + +HWCAP2_SVESHA3 + Functionality implied by ID_AA64ZFR0_EL1.SHA3 == 0b0001. + +HWCAP2_SVESM4 + Functionality implied by ID_AA64ZFR0_EL1.SM4 == 0b0001. + +HWCAP2_FLAGM2 + Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0010. + +HWCAP2_FRINT + Functionality implied by ID_AA64ISAR1_EL1.FRINTTS == 0b0001. + +HWCAP2_SVEI8MM + Functionality implied by ID_AA64ZFR0_EL1.I8MM == 0b0001. + +HWCAP2_SVEF32MM + Functionality implied by ID_AA64ZFR0_EL1.F32MM == 0b0001. + +HWCAP2_SVEF64MM + Functionality implied by ID_AA64ZFR0_EL1.F64MM == 0b0001. + +HWCAP2_SVEBF16 + Functionality implied by ID_AA64ZFR0_EL1.BF16 == 0b0001. + +HWCAP2_I8MM + Functionality implied by ID_AA64ISAR1_EL1.I8MM == 0b0001. + +HWCAP2_BF16 + Functionality implied by ID_AA64ISAR1_EL1.BF16 == 0b0001. + +HWCAP2_DGH + Functionality implied by ID_AA64ISAR1_EL1.DGH == 0b0001. + +HWCAP2_RNG + Functionality implied by ID_AA64ISAR0_EL1.RNDR == 0b0001. + +HWCAP2_BTI + Functionality implied by ID_AA64PFR0_EL1.BT == 0b0001. + +HWCAP2_MTE + Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0010, as described + by Documentation/arch/arm64/memory-tagging-extension.rst. + +HWCAP2_ECV + Functionality implied by ID_AA64MMFR0_EL1.ECV == 0b0001. + +HWCAP2_AFP + Functionality implied by ID_AA64MFR1_EL1.AFP == 0b0001. + +HWCAP2_RPRES + Functionality implied by ID_AA64ISAR2_EL1.RPRES == 0b0001. + +HWCAP2_MTE3 + Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0011, as described + by Documentation/arch/arm64/memory-tagging-extension.rst. + +HWCAP2_SME + Functionality implied by ID_AA64PFR1_EL1.SME == 0b0001, as described + by Documentation/arch/arm64/sme.rst. + +HWCAP2_SME_I16I64 + Functionality implied by ID_AA64SMFR0_EL1.I16I64 == 0b1111. + +HWCAP2_SME_F64F64 + Functionality implied by ID_AA64SMFR0_EL1.F64F64 == 0b1. + +HWCAP2_SME_I8I32 + Functionality implied by ID_AA64SMFR0_EL1.I8I32 == 0b1111. + +HWCAP2_SME_F16F32 + Functionality implied by ID_AA64SMFR0_EL1.F16F32 == 0b1. + +HWCAP2_SME_B16F32 + Functionality implied by ID_AA64SMFR0_EL1.B16F32 == 0b1. + +HWCAP2_SME_F32F32 + Functionality implied by ID_AA64SMFR0_EL1.F32F32 == 0b1. + +HWCAP2_SME_FA64 + Functionality implied by ID_AA64SMFR0_EL1.FA64 == 0b1. + +HWCAP2_WFXT + Functionality implied by ID_AA64ISAR2_EL1.WFXT == 0b0010. + +HWCAP2_EBF16 + Functionality implied by ID_AA64ISAR1_EL1.BF16 == 0b0010. + +HWCAP2_SVE_EBF16 + Functionality implied by ID_AA64ZFR0_EL1.BF16 == 0b0010. + +HWCAP2_CSSC + Functionality implied by ID_AA64ISAR2_EL1.CSSC == 0b0001. + +HWCAP2_RPRFM + Functionality implied by ID_AA64ISAR2_EL1.RPRFM == 0b0001. + +HWCAP2_SVE2P1 + Functionality implied by ID_AA64ZFR0_EL1.SVEver == 0b0010. + +HWCAP2_SME2 + Functionality implied by ID_AA64SMFR0_EL1.SMEver == 0b0001. + +HWCAP2_SME2P1 + Functionality implied by ID_AA64SMFR0_EL1.SMEver == 0b0010. + +HWCAP2_SMEI16I32 + Functionality implied by ID_AA64SMFR0_EL1.I16I32 == 0b0101 + +HWCAP2_SMEBI32I32 + Functionality implied by ID_AA64SMFR0_EL1.BI32I32 == 0b1 + +HWCAP2_SMEB16B16 + Functionality implied by ID_AA64SMFR0_EL1.B16B16 == 0b1 + +HWCAP2_SMEF16F16 + Functionality implied by ID_AA64SMFR0_EL1.F16F16 == 0b1 + +4. Unused AT_HWCAP bits +----------------------- + +For interoperation with userspace, the kernel guarantees that bits 62 +and 63 of AT_HWCAP will always be returned as 0. diff --git a/Documentation/arch/arm64/features.rst b/Documentation/arch/arm64/features.rst new file mode 100644 index 000000000000..dfa4cb3cd3ef --- /dev/null +++ b/Documentation/arch/arm64/features.rst @@ -0,0 +1,3 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. kernel-feat:: $srctree/Documentation/features arm64 diff --git a/Documentation/arch/arm64/hugetlbpage.rst b/Documentation/arch/arm64/hugetlbpage.rst new file mode 100644 index 000000000000..a110124c11e3 --- /dev/null +++ b/Documentation/arch/arm64/hugetlbpage.rst @@ -0,0 +1,43 @@ +.. _hugetlbpage_index: + +==================== +HugeTLBpage on ARM64 +==================== + +Hugepage relies on making efficient use of TLBs to improve performance of +address translations. The benefit depends on both - + + - the size of hugepages + - size of entries supported by the TLBs + +The ARM64 port supports two flavours of hugepages. + +1) Block mappings at the pud/pmd level +-------------------------------------- + +These are regular hugepages where a pmd or a pud page table entry points to a +block of memory. Regardless of the supported size of entries in TLB, block +mappings reduce the depth of page table walk needed to translate hugepage +addresses. + +2) Using the Contiguous bit +--------------------------- + +The architecture provides a contiguous bit in the translation table entries +(D4.5.3, ARM DDI 0487C.a) that hints to the MMU to indicate that it is one of a +contiguous set of entries that can be cached in a single TLB entry. + +The contiguous bit is used in Linux to increase the mapping size at the pmd and +pte (last) level. The number of supported contiguous entries varies by page size +and level of the page table. + + +The following hugepage sizes are supported - + + ====== ======== ==== ======== === + - CONT PTE PMD CONT PMD PUD + ====== ======== ==== ======== === + 4K: 64K 2M 32M 1G + 16K: 2M 32M 1G + 64K: 2M 512M 16G + ====== ======== ==== ======== === diff --git a/Documentation/arch/arm64/index.rst b/Documentation/arch/arm64/index.rst new file mode 100644 index 000000000000..ae21f8118830 --- /dev/null +++ b/Documentation/arch/arm64/index.rst @@ -0,0 +1,36 @@ +.. _arm64_index: + +================== +ARM64 Architecture +================== + +.. toctree:: + :maxdepth: 1 + + acpi_object_usage + amu + arm-acpi + asymmetric-32bit + booting + cpu-feature-registers + elf_hwcaps + hugetlbpage + legacy_instructions + memory + memory-tagging-extension + perf + pointer-authentication + silicon-errata + sme + sve + tagged-address-abi + tagged-pointers + + features + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/arch/arm64/kasan-offsets.sh b/Documentation/arch/arm64/kasan-offsets.sh new file mode 100644 index 000000000000..2dc5f9e18039 --- /dev/null +++ b/Documentation/arch/arm64/kasan-offsets.sh @@ -0,0 +1,26 @@ +#!/bin/sh + +# Print out the KASAN_SHADOW_OFFSETS required to place the KASAN SHADOW +# start address at the top of the linear region + +print_kasan_offset () { + printf "%02d\t" $1 + printf "0x%08x00000000\n" $(( (0xffffffff & (-1 << ($1 - 1 - 32))) \ + - (1 << (64 - 32 - $2)) )) +} + +echo KASAN_SHADOW_SCALE_SHIFT = 3 +printf "VABITS\tKASAN_SHADOW_OFFSET\n" +print_kasan_offset 48 3 +print_kasan_offset 47 3 +print_kasan_offset 42 3 +print_kasan_offset 39 3 +print_kasan_offset 36 3 +echo +echo KASAN_SHADOW_SCALE_SHIFT = 4 +printf "VABITS\tKASAN_SHADOW_OFFSET\n" +print_kasan_offset 48 4 +print_kasan_offset 47 4 +print_kasan_offset 42 4 +print_kasan_offset 39 4 +print_kasan_offset 36 4 diff --git a/Documentation/arch/arm64/legacy_instructions.rst b/Documentation/arch/arm64/legacy_instructions.rst new file mode 100644 index 000000000000..54401b22cb8f --- /dev/null +++ b/Documentation/arch/arm64/legacy_instructions.rst @@ -0,0 +1,68 @@ +=================== +Legacy instructions +=================== + +The arm64 port of the Linux kernel provides infrastructure to support +emulation of instructions which have been deprecated, or obsoleted in +the architecture. The infrastructure code uses undefined instruction +hooks to support emulation. Where available it also allows turning on +the instruction execution in hardware. + +The emulation mode can be controlled by writing to sysctl nodes +(/proc/sys/abi). The following explains the different execution +behaviours and the corresponding values of the sysctl nodes - + +* Undef + Value: 0 + + Generates undefined instruction abort. Default for instructions that + have been obsoleted in the architecture, e.g., SWP + +* Emulate + Value: 1 + + Uses software emulation. To aid migration of software, in this mode + usage of emulated instruction is traced as well as rate limited + warnings are issued. This is the default for deprecated + instructions, .e.g., CP15 barriers + +* Hardware Execution + Value: 2 + + Although marked as deprecated, some implementations may support the + enabling/disabling of hardware support for the execution of these + instructions. Using hardware execution generally provides better + performance, but at the loss of ability to gather runtime statistics + about the use of the deprecated instructions. + +The default mode depends on the status of the instruction in the +architecture. Deprecated instructions should default to emulation +while obsolete instructions must be undefined by default. + +Note: Instruction emulation may not be possible in all cases. See +individual instruction notes for further information. + +Supported legacy instructions +----------------------------- +* SWP{B} + +:Node: /proc/sys/abi/swp +:Status: Obsolete +:Default: Undef (0) + +* CP15 Barriers + +:Node: /proc/sys/abi/cp15_barrier +:Status: Deprecated +:Default: Emulate (1) + +* SETEND + +:Node: /proc/sys/abi/setend +:Status: Deprecated +:Default: Emulate (1)* + + Note: All the cpus on the system must have mixed endian support at EL0 + for this feature to be enabled. If a new CPU - which doesn't support mixed + endian - is hotplugged in after this feature has been enabled, there could + be unexpected results in the application. diff --git a/Documentation/arch/arm64/memory-tagging-extension.rst b/Documentation/arch/arm64/memory-tagging-extension.rst new file mode 100644 index 000000000000..679725030731 --- /dev/null +++ b/Documentation/arch/arm64/memory-tagging-extension.rst @@ -0,0 +1,375 @@ +=============================================== +Memory Tagging Extension (MTE) in AArch64 Linux +=============================================== + +Authors: Vincenzo Frascino + Catalin Marinas + +Date: 2020-02-25 + +This document describes the provision of the Memory Tagging Extension +functionality in AArch64 Linux. + +Introduction +============ + +ARMv8.5 based processors introduce the Memory Tagging Extension (MTE) +feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI +(Top Byte Ignore) feature and allows software to access a 4-bit +allocation tag for each 16-byte granule in the physical address space. +Such memory range must be mapped with the Normal-Tagged memory +attribute. A logical tag is derived from bits 59-56 of the virtual +address used for the memory access. A CPU with MTE enabled will compare +the logical tag against the allocation tag and potentially raise an +exception on mismatch, subject to system registers configuration. + +Userspace Support +================= + +When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is +supported by the hardware, the kernel advertises the feature to +userspace via ``HWCAP2_MTE``. + +PROT_MTE +-------- + +To access the allocation tags, a user process must enable the Tagged +memory attribute on an address range using a new ``prot`` flag for +``mmap()`` and ``mprotect()``: + +``PROT_MTE`` - Pages allow access to the MTE allocation tags. + +The allocation tag is set to 0 when such pages are first mapped in the +user address space and preserved on copy-on-write. ``MAP_SHARED`` is +supported and the allocation tags can be shared between processes. + +**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and +RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other +types of mapping will result in ``-EINVAL`` returned by these system +calls. + +**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot +be cleared by ``mprotect()``. + +**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and +``MADV_FREE`` may have the allocation tags cleared (set to 0) at any +point after the system call. + +Tag Check Faults +---------------- + +When ``PROT_MTE`` is enabled on an address range and a mismatch between +the logical and allocation tags occurs on access, there are three +configurable behaviours: + +- *Ignore* - This is the default mode. The CPU (and kernel) ignores the + tag check fault. + +- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with + ``.si_code = SEGV_MTESERR`` and ``.si_addr = ``. The + memory access is not performed. If ``SIGSEGV`` is ignored or blocked + by the offending thread, the containing process is terminated with a + ``coredump``. + +- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending + thread, asynchronously following one or multiple tag check faults, + with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting + address is unknown). + +- *Asymmetric* - Reads are handled as for synchronous mode while writes + are handled as for asynchronous mode. + +The user can select the above modes, per thread, using the +``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags`` +contains any number of the following values in the ``PR_MTE_TCF_MASK`` +bit-field: + +- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults + (ignored if combined with other options) +- ``PR_MTE_TCF_SYNC`` - *Synchronous* tag check fault mode +- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode + +If no modes are specified, tag check faults are ignored. If a single +mode is specified, the program will run in that mode. If multiple +modes are specified, the mode is selected as described in the "Per-CPU +preferred tag checking modes" section below. + +The current tag check fault configuration can be read using the +``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call. If +multiple modes were requested then all will be reported. + +Tag checking can also be disabled for a user thread by setting the +``PSTATE.TCO`` bit with ``MSR TCO, #1``. + +**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``, +irrespective of the interrupted context. ``PSTATE.TCO`` is restored on +``sigreturn()``. + +**Note**: There are no *match-all* logical tags available for user +applications. + +**Note**: Kernel accesses to the user address space (e.g. ``read()`` +system call) are not checked if the user thread tag checking mode is +``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is +``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user +address accesses, however it cannot always guarantee it. Kernel accesses +to user addresses are always performed with an effective ``PSTATE.TCO`` +value of zero, regardless of the user configuration. + +Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions +----------------------------------------------------------------- + +The architecture allows excluding certain tags to be randomly generated +via the ``GCR_EL1.Exclude`` register bit-field. By default, Linux +excludes all tags other than 0. A user thread can enable specific tags +in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL, +flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap +in the ``PR_MTE_TAG_MASK`` bit-field. + +**Note**: The hardware uses an exclude mask but the ``prctl()`` +interface provides an include mask. An include mask of ``0`` (exclusion +mask ``0xffff``) results in the CPU always generating tag ``0``. + +Per-CPU preferred tag checking mode +----------------------------------- + +On some CPUs the performance of MTE in stricter tag checking modes +is similar to that of less strict tag checking modes. This makes it +worthwhile to enable stricter checks on those CPUs when a less strict +checking mode is requested, in order to gain the error detection +benefits of the stricter checks without the performance downsides. To +support this scenario, a privileged user may configure a stricter +tag checking mode as the CPU's preferred tag checking mode. + +The preferred tag checking mode for each CPU is controlled by +``/sys/devices/system/cpu/cpu/mte_tcf_preferred``, to which a +privileged user may write the value ``async``, ``sync`` or ``asymm``. The +default preferred mode for each CPU is ``async``. + +To allow a program to potentially run in the CPU's preferred tag +checking mode, the user program may set multiple tag check fault mode +bits in the ``flags`` argument to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, +flags, 0, 0, 0)`` system call. If both synchronous and asynchronous +modes are requested then asymmetric mode may also be selected by the +kernel. If the CPU's preferred tag checking mode is in the task's set +of provided tag checking modes, that mode will be selected. Otherwise, +one of the modes in the task's mode will be selected by the kernel +from the task's mode set using the preference order: + + 1. Asynchronous + 2. Asymmetric + 3. Synchronous + +Note that there is no way for userspace to request multiple modes and +also disable asymmetric mode. + +Initial process state +--------------------- + +On ``execve()``, the new process has the following configuration: + +- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) +- No tag checking modes are selected (tag check faults ignored) +- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) +- ``PSTATE.TCO`` set to 0 +- ``PROT_MTE`` not set on any of the initial memory maps + +On ``fork()``, the new process inherits the parent's configuration and +memory map attributes with the exception of the ``madvise()`` ranges +with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set +to 0). + +The ``ptrace()`` interface +-------------------------- + +``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read +the tags from or set the tags to a tracee's address space. The +``ptrace()`` system call is invoked as ``ptrace(request, pid, addr, +data)`` where: + +- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``. +- ``pid`` - the tracee's PID. +- ``addr`` - address in the tracee's address space. +- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to + a buffer of ``iov_len`` length in the tracer's address space. + +The tags in the tracer's ``iov_base`` buffer are represented as one +4-bit tag per byte and correspond to a 16-byte MTE tag granule in the +tracee's address space. + +**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel +will use the corresponding aligned address. + +``ptrace()`` return value: + +- 0 - tags were copied, the tracer's ``iov_len`` was updated to the + number of tags transferred. This may be smaller than the requested + ``iov_len`` if the requested address range in the tracee's or the + tracer's space cannot be accessed or does not have valid tags. +- ``-EPERM`` - the specified process cannot be traced. +- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid + address) and no tags copied. ``iov_len`` not updated. +- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec`` + or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated. +- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never + mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated. + +**Note**: There are no transient errors for the requests above, so user +programs should not retry in case of a non-zero system call return. + +``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with ``addr == +``NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged +address ABI control and MTE configuration of a process as per the +``prctl()`` options described in +Documentation/arch/arm64/tagged-address-abi.rst and above. The corresponding +``regset`` is 1 element of 8 bytes (``sizeof(long))``). + +Core dump support +----------------- + +The allocation tags for user memory mapped with ``PROT_MTE`` are dumped +in the core file as additional ``PT_AARCH64_MEMTAG_MTE`` segments. The +program header for such segment is defined as: + +:``p_type``: ``PT_AARCH64_MEMTAG_MTE`` +:``p_flags``: 0 +:``p_offset``: segment file offset +:``p_vaddr``: segment virtual address, same as the corresponding + ``PT_LOAD`` segment +:``p_paddr``: 0 +:``p_filesz``: segment size in file, calculated as ``p_mem_sz / 32`` + (two 4-bit tags cover 32 bytes of memory) +:``p_memsz``: segment size in memory, same as the corresponding + ``PT_LOAD`` segment +:``p_align``: 0 + +The tags are stored in the core file at ``p_offset`` as two 4-bit tags +in a byte. With the tag granule of 16 bytes, a 4K page requires 128 +bytes in the core file. + +Example of correct usage +======================== + +*MTE Example code* + +.. code-block:: c + + /* + * To be compiled with -march=armv8.5-a+memtag + */ + #include + #include + #include + #include + #include + #include + #include + #include + + /* + * From arch/arm64/include/uapi/asm/hwcap.h + */ + #define HWCAP2_MTE (1 << 18) + + /* + * From arch/arm64/include/uapi/asm/mman.h + */ + #define PROT_MTE 0x20 + + /* + * From include/uapi/linux/prctl.h + */ + #define PR_SET_TAGGED_ADDR_CTRL 55 + #define PR_GET_TAGGED_ADDR_CTRL 56 + # define PR_TAGGED_ADDR_ENABLE (1UL << 0) + # define PR_MTE_TCF_SHIFT 1 + # define PR_MTE_TCF_NONE (0UL << PR_MTE_TCF_SHIFT) + # define PR_MTE_TCF_SYNC (1UL << PR_MTE_TCF_SHIFT) + # define PR_MTE_TCF_ASYNC (2UL << PR_MTE_TCF_SHIFT) + # define PR_MTE_TCF_MASK (3UL << PR_MTE_TCF_SHIFT) + # define PR_MTE_TAG_SHIFT 3 + # define PR_MTE_TAG_MASK (0xffffUL << PR_MTE_TAG_SHIFT) + + /* + * Insert a random logical tag into the given pointer. + */ + #define insert_random_tag(ptr) ({ \ + uint64_t __val; \ + asm("irg %0, %1" : "=r" (__val) : "r" (ptr)); \ + __val; \ + }) + + /* + * Set the allocation tag on the destination address. + */ + #define set_tag(tagged_addr) do { \ + asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \ + } while (0) + + int main() + { + unsigned char *a; + unsigned long page_sz = sysconf(_SC_PAGESIZE); + unsigned long hwcap2 = getauxval(AT_HWCAP2); + + /* check if MTE is present */ + if (!(hwcap2 & HWCAP2_MTE)) + return EXIT_FAILURE; + + /* + * Enable the tagged address ABI, synchronous or asynchronous MTE + * tag check faults (based on per-CPU preference) and allow all + * non-zero tags in the randomly generated set. + */ + if (prctl(PR_SET_TAGGED_ADDR_CTRL, + PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC | + (0xfffe << PR_MTE_TAG_SHIFT), + 0, 0, 0)) { + perror("prctl() failed"); + return EXIT_FAILURE; + } + + a = mmap(0, page_sz, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (a == MAP_FAILED) { + perror("mmap() failed"); + return EXIT_FAILURE; + } + + /* + * Enable MTE on the above anonymous mmap. The flag could be passed + * directly to mmap() and skip this step. + */ + if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) { + perror("mprotect() failed"); + return EXIT_FAILURE; + } + + /* access with the default tag (0) */ + a[0] = 1; + a[1] = 2; + + printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]); + + /* set the logical and allocation tags */ + a = (unsigned char *)insert_random_tag(a); + set_tag(a); + + printf("%p\n", a); + + /* non-zero tag access */ + a[0] = 3; + printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]); + + /* + * If MTE is enabled correctly the next instruction will generate an + * exception. + */ + printf("Expecting SIGSEGV...\n"); + a[16] = 0xdd; + + /* this should not be printed in the PR_MTE_TCF_SYNC mode */ + printf("...haven't got one\n"); + + return EXIT_FAILURE; + } diff --git a/Documentation/arch/arm64/memory.rst b/Documentation/arch/arm64/memory.rst new file mode 100644 index 000000000000..2a641ba7be3b --- /dev/null +++ b/Documentation/arch/arm64/memory.rst @@ -0,0 +1,167 @@ +============================== +Memory Layout on AArch64 Linux +============================== + +Author: Catalin Marinas + +This document describes the virtual memory layout used by the AArch64 +Linux kernel. The architecture allows up to 4 levels of translation +tables with a 4KB page size and up to 3 levels with a 64KB page size. + +AArch64 Linux uses either 3 levels or 4 levels of translation tables +with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit +(256TB) virtual addresses, respectively, for both user and kernel. With +64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB) +virtual address, are used but the memory layout is the same. + +ARMv8.2 adds optional support for Large Virtual Address space. This is +only available when running with a 64KB page size and expands the +number of descriptors in the first level of translation. + +User addresses have bits 63:48 set to 0 while the kernel addresses have +the same bits set to 1. TTBRx selection is given by bit 63 of the +virtual address. The swapper_pg_dir contains only kernel (global) +mappings while the user pgd contains only user (non-global) mappings. +The swapper_pg_dir address is written to TTBR1 and never written to +TTBR0. + + +AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit):: + + Start End Size Use + ----------------------------------------------------------------------- + 0000000000000000 0000ffffffffffff 256TB user + ffff000000000000 ffff7fffffffffff 128TB kernel logical memory map + [ffff600000000000 ffff7fffffffffff] 32TB [kasan shadow region] + ffff800000000000 ffff800007ffffff 128MB modules + ffff800008000000 fffffbffefffffff 124TB vmalloc + fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down) + fffffbfffe000000 fffffbfffe7fffff 8MB [guard region] + fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space + fffffbffff800000 fffffbffffffffff 8MB [guard region] + fffffc0000000000 fffffdffffffffff 2TB vmemmap + fffffe0000000000 ffffffffffffffff 2TB [guard region] + + +AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support):: + + Start End Size Use + ----------------------------------------------------------------------- + 0000000000000000 000fffffffffffff 4PB user + fff0000000000000 ffff7fffffffffff ~4PB kernel logical memory map + [fffd800000000000 ffff7fffffffffff] 512TB [kasan shadow region] + ffff800000000000 ffff800007ffffff 128MB modules + ffff800008000000 fffffbffefffffff 124TB vmalloc + fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down) + fffffbfffe000000 fffffbfffe7fffff 8MB [guard region] + fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space + fffffbffff800000 fffffbffffffffff 8MB [guard region] + fffffc0000000000 ffffffdfffffffff ~4TB vmemmap + ffffffe000000000 ffffffffffffffff 128GB [guard region] + + +Translation table lookup with 4KB pages:: + + +--------+--------+--------+--------+--------+--------+--------+--------+ + |63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| + +--------+--------+--------+--------+--------+--------+--------+--------+ + | | | | | | + | | | | | v + | | | | | [11:0] in-page offset + | | | | +-> [20:12] L3 index + | | | +-----------> [29:21] L2 index + | | +---------------------> [38:30] L1 index + | +-------------------------------> [47:39] L0 index + +-------------------------------------------------> [63] TTBR0/1 + + +Translation table lookup with 64KB pages:: + + +--------+--------+--------+--------+--------+--------+--------+--------+ + |63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| + +--------+--------+--------+--------+--------+--------+--------+--------+ + | | | | | + | | | | v + | | | | [15:0] in-page offset + | | | +----------> [28:16] L3 index + | | +--------------------------> [41:29] L2 index + | +-------------------------------> [47:42] L1 index (48-bit) + | [51:42] L1 index (52-bit) + +-------------------------------------------------> [63] TTBR0/1 + + +When using KVM without the Virtualization Host Extensions, the +hypervisor maps kernel pages in EL2 at a fixed (and potentially +random) offset from the linear mapping. See the kern_hyp_va macro and +kvm_update_va_mask function for more details. MMIO devices such as +GICv2 gets mapped next to the HYP idmap page, as do vectors when +ARM64_SPECTRE_V3A is enabled for particular CPUs. + +When using KVM with the Virtualization Host Extensions, no additional +mappings are created, since the host kernel runs directly in EL2. + +52-bit VA support in the kernel +------------------------------- +If the ARMv8.2-LVA optional feature is present, and we are running +with a 64KB page size; then it is possible to use 52-bits of address +space for both userspace and kernel addresses. However, any kernel +binary that supports 52-bit must also be able to fall back to 48-bit +at early boot time if the hardware feature is not present. + +This fallback mechanism necessitates the kernel .text to be in the +higher addresses such that they are invariant to 48/52-bit VAs. Due +to the kasan shadow being a fraction of the entire kernel VA space, +the end of the kasan shadow must also be in the higher half of the +kernel VA space for both 48/52-bit. (Switching from 48-bit to 52-bit, +the end of the kasan shadow is invariant and dependent on ~0UL, +whilst the start address will "grow" towards the lower addresses). + +In order to optimise phys_to_virt and virt_to_phys, the PAGE_OFFSET +is kept constant at 0xFFF0000000000000 (corresponding to 52-bit), +this obviates the need for an extra variable read. The physvirt +offset and vmemmap offsets are computed at early boot to enable +this logic. + +As a single binary will need to support both 48-bit and 52-bit VA +spaces, the VMEMMAP must be sized large enough for 52-bit VAs and +also must be sized large enough to accommodate a fixed PAGE_OFFSET. + +Most code in the kernel should not need to consider the VA_BITS, for +code that does need to know the VA size the variables are +defined as follows: + +VA_BITS constant the *maximum* VA space size + +VA_BITS_MIN constant the *minimum* VA space size + +vabits_actual variable the *actual* VA space size + + +Maximum and minimum sizes can be useful to ensure that buffers are +sized large enough or that addresses are positioned close enough for +the "worst" case. + +52-bit userspace VAs +-------------------- +To maintain compatibility with software that relies on the ARMv8.0 +VA space maximum size of 48-bits, the kernel will, by default, +return virtual addresses to userspace from a 48-bit range. + +Software can "opt-in" to receiving VAs from a 52-bit space by +specifying an mmap hint parameter that is larger than 48-bit. + +For example: + +.. code-block:: c + + maybe_high_address = mmap(~0UL, size, prot, flags,...); + +It is also possible to build a debug kernel that returns addresses +from a 52-bit space by enabling the following kernel config options: + +.. code-block:: sh + + CONFIG_EXPERT=y && CONFIG_ARM64_FORCE_52BIT=y + +Note that this option is only intended for debugging applications +and should not be used in production. diff --git a/Documentation/arch/arm64/perf.rst b/Documentation/arch/arm64/perf.rst new file mode 100644 index 000000000000..1f87b57c2332 --- /dev/null +++ b/Documentation/arch/arm64/perf.rst @@ -0,0 +1,166 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _perf_index: + +==== +Perf +==== + +Perf Event Attributes +===================== + +:Author: Andrew Murray +:Date: 2019-03-06 + +exclude_user +------------ + +This attribute excludes userspace. + +Userspace always runs at EL0 and thus this attribute will exclude EL0. + + +exclude_kernel +-------------- + +This attribute excludes the kernel. + +The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run +at EL1. + +For the host this attribute will exclude EL1 and additionally EL2 on a VHE +system. + +For the guest this attribute will exclude EL1. Please note that EL2 is +never counted within a guest. + + +exclude_hv +---------- + +This attribute excludes the hypervisor. + +For a VHE host this attribute is ignored as we consider the host kernel to +be the hypervisor. + +For a non-VHE host this attribute will exclude EL2 as we consider the +hypervisor to be any code that runs at EL2 which is predominantly used for +guest/host transitions. + +For the guest this attribute has no effect. Please note that EL2 is +never counted within a guest. + + +exclude_host / exclude_guest +---------------------------- + +These attributes exclude the KVM host and guest, respectively. + +The KVM host may run at EL0 (userspace), EL1 (non-VHE kernel) and EL2 (VHE +kernel or non-VHE hypervisor). + +The KVM guest may run at EL0 (userspace) and EL1 (kernel). + +Due to the overlapping exception levels between host and guests we cannot +exclusively rely on the PMU's hardware exception filtering - therefore we +must enable/disable counting on the entry and exit to the guest. This is +performed differently on VHE and non-VHE systems. + +For non-VHE systems we exclude EL2 for exclude_host - upon entering and +exiting the guest we disable/enable the event as appropriate based on the +exclude_host and exclude_guest attributes. + +For VHE systems we exclude EL1 for exclude_guest and exclude both EL0,EL2 +for exclude_host. Upon entering and exiting the guest we modify the event +to include/exclude EL0 as appropriate based on the exclude_host and +exclude_guest attributes. + +The statements above also apply when these attributes are used within a +non-VHE guest however please note that EL2 is never counted within a guest. + + +Accuracy +-------- + +On non-VHE hosts we enable/disable counters on the entry/exit of host/guest +transition at EL2 - however there is a period of time between +enabling/disabling the counters and entering/exiting the guest. We are +able to eliminate counters counting host events on the boundaries of guest +entry/exit when counting guest events by filtering out EL2 for +exclude_host. However when using !exclude_hv there is a small blackout +window at the guest entry/exit where host events are not captured. + +On VHE systems there are no blackout windows. + +Perf Userspace PMU Hardware Counter Access +========================================== + +Overview +-------- +The perf userspace tool relies on the PMU to monitor events. It offers an +abstraction layer over the hardware counters since the underlying +implementation is cpu-dependent. +Arm64 allows userspace tools to have access to the registers storing the +hardware counters' values directly. + +This targets specifically self-monitoring tasks in order to reduce the overhead +by directly accessing the registers without having to go through the kernel. + +How-to +------ +The focus is set on the armv8 PMUv3 which makes sure that the access to the pmu +registers is enabled and that the userspace has access to the relevant +information in order to use them. + +In order to have access to the hardware counters, the global sysctl +kernel/perf_user_access must first be enabled: + +.. code-block:: sh + + echo 1 > /proc/sys/kernel/perf_user_access + +It is necessary to open the event using the perf tool interface with config1:1 +attr bit set: the sys_perf_event_open syscall returns a fd which can +subsequently be used with the mmap syscall in order to retrieve a page of memory +containing information about the event. The PMU driver uses this page to expose +to the user the hardware counter's index and other necessary data. Using this +index enables the user to access the PMU registers using the `mrs` instruction. +Access to the PMU registers is only valid while the sequence lock is unchanged. +In particular, the PMSELR_EL0 register is zeroed each time the sequence lock is +changed. + +The userspace access is supported in libperf using the perf_evsel__mmap() +and perf_evsel__read() functions. See `tools/lib/perf/tests/test-evsel.c`_ for +an example. + +About heterogeneous systems +--------------------------- +On heterogeneous systems such as big.LITTLE, userspace PMU counter access can +only be enabled when the tasks are pinned to a homogeneous subset of cores and +the corresponding PMU instance is opened by specifying the 'type' attribute. +The use of generic event types is not supported in this case. + +Have a look at `tools/perf/arch/arm64/tests/user-events.c`_ for an example. It +can be run using the perf tool to check that the access to the registers works +correctly from userspace: + +.. code-block:: sh + + perf test -v user + +About chained events and counter sizes +-------------------------------------- +The user can request either a 32-bit (config1:0 == 0) or 64-bit (config1:0 == 1) +counter along with userspace access. The sys_perf_event_open syscall will fail +if a 64-bit counter is requested and the hardware doesn't support 64-bit +counters. Chained events are not supported in conjunction with userspace counter +access. If a 32-bit counter is requested on hardware with 64-bit counters, then +userspace must treat the upper 32-bits read from the counter as UNKNOWN. The +'pmc_width' field in the user page will indicate the valid width of the counter +and should be used to mask the upper bits as needed. + +.. Links +.. _tools/perf/arch/arm64/tests/user-events.c: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c +.. _tools/lib/perf/tests/test-evsel.c: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c diff --git a/Documentation/arch/arm64/pointer-authentication.rst b/Documentation/arch/arm64/pointer-authentication.rst new file mode 100644 index 000000000000..e5dad2e40aa8 --- /dev/null +++ b/Documentation/arch/arm64/pointer-authentication.rst @@ -0,0 +1,142 @@ +======================================= +Pointer authentication in AArch64 Linux +======================================= + +Author: Mark Rutland + +Date: 2017-07-19 + +This document briefly describes the provision of pointer authentication +functionality in AArch64 Linux. + + +Architecture overview +--------------------- + +The ARMv8.3 Pointer Authentication extension adds primitives that can be +used to mitigate certain classes of attack where an attacker can corrupt +the contents of some memory (e.g. the stack). + +The extension uses a Pointer Authentication Code (PAC) to determine +whether pointers have been modified unexpectedly. A PAC is derived from +a pointer, another value (such as the stack pointer), and a secret key +held in system registers. + +The extension adds instructions to insert a valid PAC into a pointer, +and to verify/remove the PAC from a pointer. The PAC occupies a number +of high-order bits of the pointer, which varies dependent on the +configured virtual address size and whether pointer tagging is in use. + +A subset of these instructions have been allocated from the HINT +encoding space. In the absence of the extension (or when disabled), +these instructions behave as NOPs. Applications and libraries using +these instructions operate correctly regardless of the presence of the +extension. + +The extension provides five separate keys to generate PACs - two for +instruction addresses (APIAKey, APIBKey), two for data addresses +(APDAKey, APDBKey), and one for generic authentication (APGAKey). + + +Basic support +------------- + +When CONFIG_ARM64_PTR_AUTH is selected, and relevant HW support is +present, the kernel will assign random key values to each process at +exec*() time. The keys are shared by all threads within the process, and +are preserved across fork(). + +Presence of address authentication functionality is advertised via +HWCAP_PACA, and generic authentication functionality via HWCAP_PACG. + +The number of bits that the PAC occupies in a pointer is 55 minus the +virtual address size configured by the kernel. For example, with a +virtual address size of 48, the PAC is 7 bits wide. + +When ARM64_PTR_AUTH_KERNEL is selected, the kernel will be compiled +with HINT space pointer authentication instructions protecting +function returns. Kernels built with this option will work on hardware +with or without pointer authentication support. + +In addition to exec(), keys can also be reinitialized to random values +using the PR_PAC_RESET_KEYS prctl. A bitmask of PR_PAC_APIAKEY, +PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY and PR_PAC_APGAKEY +specifies which keys are to be reinitialized; specifying 0 means "all +keys". + + +Debugging +--------- + +When CONFIG_ARM64_PTR_AUTH is selected, and HW support for address +authentication is present, the kernel will expose the position of TTBR0 +PAC bits in the NT_ARM_PAC_MASK regset (struct user_pac_mask), which +userspace can acquire via PTRACE_GETREGSET. + +The regset is exposed only when HWCAP_PACA is set. Separate masks are +exposed for data pointers and instruction pointers, as the set of PAC +bits can vary between the two. Note that the masks apply to TTBR0 +addresses, and are not valid to apply to TTBR1 addresses (e.g. kernel +pointers). + +Additionally, when CONFIG_CHECKPOINT_RESTORE is also set, the kernel +will expose the NT_ARM_PACA_KEYS and NT_ARM_PACG_KEYS regsets (struct +user_pac_address_keys and struct user_pac_generic_keys). These can be +used to get and set the keys for a thread. + + +Virtualization +-------------- + +Pointer authentication is enabled in KVM guest when each virtual cpu is +initialised by passing flags KVM_ARM_VCPU_PTRAUTH_[ADDRESS/GENERIC] and +requesting these two separate cpu features to be enabled. The current KVM +guest implementation works by enabling both features together, so both +these userspace flags are checked before enabling pointer authentication. +The separate userspace flag will allow to have no userspace ABI changes +if support is added in the future to allow these two features to be +enabled independently of one another. + +As Arm Architecture specifies that Pointer Authentication feature is +implemented along with the VHE feature so KVM arm64 ptrauth code relies +on VHE mode to be present. + +Additionally, when these vcpu feature flags are not set then KVM will +filter out the Pointer Authentication system key registers from +KVM_GET/SET_REG_* ioctls and mask those features from cpufeature ID +register. Any attempt to use the Pointer Authentication instructions will +result in an UNDEFINED exception being injected into the guest. + + +Enabling and disabling keys +--------------------------- + +The prctl PR_PAC_SET_ENABLED_KEYS allows the user program to control which +PAC keys are enabled in a particular task. It takes two arguments, the +first being a bitmask of PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY +and PR_PAC_APDBKEY specifying which keys shall be affected by this prctl, +and the second being a bitmask of the same bits specifying whether the key +should be enabled or disabled. For example:: + + prctl(PR_PAC_SET_ENABLED_KEYS, + PR_PAC_APIAKEY | PR_PAC_APIBKEY | PR_PAC_APDAKEY | PR_PAC_APDBKEY, + PR_PAC_APIBKEY, 0, 0); + +disables all keys except the IB key. + +The main reason why this is useful is to enable a userspace ABI that uses PAC +instructions to sign and authenticate function pointers and other pointers +exposed outside of the function, while still allowing binaries conforming to +the ABI to interoperate with legacy binaries that do not sign or authenticate +pointers. + +The idea is that a dynamic loader or early startup code would issue this +prctl very early after establishing that a process may load legacy binaries, +but before executing any PAC instructions. + +For compatibility with previous kernel versions, processes start up with IA, +IB, DA and DB enabled, and are reset to this state on exec(). Processes created +via fork() and clone() inherit the key enabled state from the calling process. + +It is recommended to avoid disabling the IA key, as this has higher performance +overhead than disabling any of the other keys. diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst new file mode 100644 index 000000000000..9e311bc43e05 --- /dev/null +++ b/Documentation/arch/arm64/silicon-errata.rst @@ -0,0 +1,216 @@ +======================================= +Silicon Errata and Software Workarounds +======================================= + +Author: Will Deacon + +Date : 27 November 2015 + +It is an unfortunate fact of life that hardware is often produced with +so-called "errata", which can cause it to deviate from the architecture +under specific circumstances. For hardware produced by ARM, these +errata are broadly classified into the following categories: + + ========== ======================================================== + Category A A critical error without a viable workaround. + Category B A significant or critical error with an acceptable + workaround. + Category C A minor error that is not expected to occur under normal + operation. + ========== ======================================================== + +For more information, consult one of the "Software Developers Errata +Notice" documents available on infocenter.arm.com (registration +required). + +As far as Linux is concerned, Category B errata may require some special +treatment in the operating system. For example, avoiding a particular +sequence of code, or configuring the processor in a particular way. A +less common situation may require similar actions in order to declassify +a Category A erratum into a Category C erratum. These are collectively +known as "software workarounds" and are only required in the minority of +cases (e.g. those cases that both require a non-secure workaround *and* +can be triggered by Linux). + +For software workarounds that may adversely impact systems unaffected by +the erratum in question, a Kconfig entry is added under "Kernel +Features" -> "ARM errata workarounds via the alternatives framework". +These are enabled by default and patched in at runtime when an affected +CPU is detected. For less-intrusive workarounds, a Kconfig option is not +available and the code is structured (preferably with a comment) in such +a way that the erratum will not be hit. + +This approach can make it slightly onerous to determine exactly which +errata are worked around in an arbitrary kernel source tree, so this +file acts as a registry of software workarounds in the Linux Kernel and +will be updated when new workarounds are committed and backported to +stable kernels. + ++----------------+-----------------+-----------------+-----------------------------+ +| Implementor | Component | Erratum ID | Kconfig | ++================+=================+=================+=============================+ +| Allwinner | A64/R18 | UNKNOWN1 | SUN50I_ERRATUM_UNKNOWN1 | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2457168 | ARM64_ERRATUM_2457168 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2064142 | ARM64_ERRATUM_2064142 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2038923 | ARM64_ERRATUM_2038923 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #1902691 | ARM64_ERRATUM_1902691 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A53 | #819472 | ARM64_ERRATUM_819472 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A53 | #845719 | ARM64_ERRATUM_845719 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A53 | #843419 | ARM64_ERRATUM_843419 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A55 | #1530923 | ARM64_ERRATUM_1530923 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A55 | #2441007 | ARM64_ERRATUM_2441007 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A57 | #852523 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A57 | #1319537 | ARM64_ERRATUM_1319367 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A57 | #1742098 | ARM64_ERRATUM_1742098 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A72 | #853709 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A72 | #1319367 | ARM64_ERRATUM_1319367 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A72 | #1655431 | ARM64_ERRATUM_1742098 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A76 | #1188873,1418040| ARM64_ERRATUM_1418040 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A76 | #1165522 | ARM64_ERRATUM_1165522 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A76 | #1286807 | ARM64_ERRATUM_1286807 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A76 | #1463225 | ARM64_ERRATUM_1463225 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A77 | #1508412 | ARM64_ERRATUM_1508412 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2051678 | ARM64_ERRATUM_2051678 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2077057 | ARM64_ERRATUM_2077057 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2441009 | ARM64_ERRATUM_2441009 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2658417 | ARM64_ERRATUM_2658417 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A710 | #2054223 | ARM64_ERRATUM_2054223 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A710 | #2224489 | ARM64_ERRATUM_2224489 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A715 | #2645198 | ARM64_ERRATUM_2645198 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-X2 | #2119858 | ARM64_ERRATUM_2119858 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-X2 | #2224489 | ARM64_ERRATUM_2224489 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N1 | #1349291 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N1 | #1542419 | ARM64_ERRATUM_1542419 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N2 | #2139208 | ARM64_ERRATUM_2139208 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N2 | #2067961 | ARM64_ERRATUM_2067961 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N2 | #2253138 | ARM64_ERRATUM_2253138 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | MMU-500 | #841119,826419 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_845719 | ++----------------+-----------------+-----------------+-----------------------------+ +| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_843419 | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX ITS | #22375,24313 | CAVIUM_ERRATUM_22375 | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX ITS | #23144 | CAVIUM_ERRATUM_23144 | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX GICv3 | #23154,38545 | CAVIUM_ERRATUM_23154 | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX GICv3 | #38539 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX Core | #30115 | CAVIUM_ERRATUM_30115 | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX SMMUv2 | #27704 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX2 SMMUv3| #74 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX2 SMMUv3| #126 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX2 Core | #219 | CAVIUM_TX2_ERRATUM_219 | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| Marvell | ARM-MMU-500 | #582743 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| NVIDIA | Carmel Core | N/A | NVIDIA_CARMEL_CNP_ERRATUM | ++----------------+-----------------+-----------------+-----------------------------+ +| NVIDIA | T241 GICv3/4.x | T241-FABRIC-4 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| Hisilicon | Hip0{5,6,7} | #161010101 | HISILICON_ERRATUM_161010101 | ++----------------+-----------------+-----------------+-----------------------------+ +| Hisilicon | Hip0{6,7} | #161010701 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| Hisilicon | Hip0{6,7} | #161010803 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ +| Hisilicon | Hip07 | #161600802 | HISILICON_ERRATUM_161600802 | ++----------------+-----------------+-----------------+-----------------------------+ +| Hisilicon | Hip08 SMMU PMCG | #162001800 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo/Falkor v1 | E1003 | QCOM_FALKOR_ERRATUM_1003 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo/Falkor v1 | E1009 | QCOM_FALKOR_ERRATUM_1009 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | QDF2400 ITS | E0065 | QCOM_QDF2400_ERRATUM_0065 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Falkor v{1,2} | E1041 | QCOM_FALKOR_ERRATUM_1041 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo4xx Gold | N/A | ARM64_ERRATUM_1463225 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo4xx Gold | N/A | ARM64_ERRATUM_1418040 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo4xx Silver | N/A | ARM64_ERRATUM_1530923 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo4xx Silver | N/A | ARM64_ERRATUM_1024718 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo4xx Gold | N/A | ARM64_ERRATUM_1286807 | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| Rockchip | RK3588 | #3588001 | ROCKCHIP_ERRATUM_3588001 | ++----------------+-----------------+-----------------+-----------------------------+ + ++----------------+-----------------+-----------------+-----------------------------+ +| Fujitsu | A64FX | E#010001 | FUJITSU_ERRATUM_010001 | ++----------------+-----------------+-----------------+-----------------------------+ diff --git a/Documentation/arch/arm64/sme.rst b/Documentation/arch/arm64/sme.rst new file mode 100644 index 000000000000..ba529a1dc606 --- /dev/null +++ b/Documentation/arch/arm64/sme.rst @@ -0,0 +1,468 @@ +=================================================== +Scalable Matrix Extension support for AArch64 Linux +=================================================== + +This document outlines briefly the interface provided to userspace by Linux in +order to support use of the ARM Scalable Matrix Extension (SME). + +This is an outline of the most important features and issues only and not +intended to be exhaustive. It should be read in conjunction with the SVE +documentation in sve.rst which provides details on the Streaming SVE mode +included in SME. + +This document does not aim to describe the SME architecture or programmer's +model. To aid understanding, a minimal description of relevant programmer's +model features for SME is included in Appendix A. + + +1. General +----------- + +* PSTATE.SM, PSTATE.ZA, the streaming mode vector length, the ZA and (when + present) ZTn register state and TPIDR2_EL0 are tracked per thread. + +* The presence of SME is reported to userspace via HWCAP2_SME in the aux vector + AT_HWCAP2 entry. Presence of this flag implies the presence of the SME + instructions and registers, and the Linux-specific system interfaces + described in this document. SME is reported in /proc/cpuinfo as "sme". + +* The presence of SME2 is reported to userspace via HWCAP2_SME2 in the + aux vector AT_HWCAP2 entry. Presence of this flag implies the presence of + the SME2 instructions and ZT0, and the Linux-specific system interfaces + described in this document. SME2 is reported in /proc/cpuinfo as "sme2". + +* Support for the execution of SME instructions in userspace can also be + detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS + instruction, and checking that the value of the SME field is nonzero. [3] + + It does not guarantee the presence of the system interfaces described in the + following sections: software that needs to verify that those interfaces are + present must check for HWCAP2_SME instead. + +* There are a number of optional SME features, presence of these is reported + through AT_HWCAP2 through: + + HWCAP2_SME_I16I64 + HWCAP2_SME_F64F64 + HWCAP2_SME_I8I32 + HWCAP2_SME_F16F32 + HWCAP2_SME_B16F32 + HWCAP2_SME_F32F32 + HWCAP2_SME_FA64 + HWCAP2_SME2 + + This list may be extended over time as the SME architecture evolves. + + These extensions are also reported via the CPU ID register ID_AA64SMFR0_EL1, + which userspace can read using an MRS instruction. See elf_hwcaps.txt and + cpu-feature-registers.txt for details. + +* Debuggers should restrict themselves to interacting with the target via the + NT_ARM_SVE, NT_ARM_SSVE, NT_ARM_ZA and NT_ARM_ZT regsets. The recommended + way of detecting support for these regsets is to connect to a target process + first and then attempt a + + ptrace(PTRACE_GETREGSET, pid, NT_ARM_, &iov). + +* Whenever ZA register values are exchanged in memory between userspace and + the kernel, the register value is encoded in memory as a series of horizontal + vectors from 0 to VL/8-1 stored in the same endianness invariant format as is + used for SVE vectors. + +* On thread creation TPIDR2_EL0 is preserved unless CLONE_SETTLS is specified, + in which case it is set to 0. + +2. Vector lengths +------------------ + +SME defines a second vector length similar to the SVE vector length which is +controls the size of the streaming mode SVE vectors and the ZA matrix array. +The ZA matrix is square with each side having as many bytes as a streaming +mode SVE vector. + + +3. Sharing of streaming and non-streaming mode SVE state +--------------------------------------------------------- + +It is implementation defined which if any parts of the SVE state are shared +between streaming and non-streaming modes. When switching between modes +via software interfaces such as ptrace if no register content is provided as +part of switching no state will be assumed to be shared and everything will +be zeroed. + + +4. System call behaviour +------------------------- + +* On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the + ZA matrix and ZTn (if present) are preserved. + +* On syscall PSTATE.SM will be cleared and the SVE registers will be handled + as per the standard SVE ABI. + +* None of the SVE registers, ZA or ZTn are used to pass arguments to + or receive results from any syscall. + +* On process creation (eg, clone()) the newly created process will have + PSTATE.SM cleared. + +* All other SME state of a thread, including the currently configured vector + length, the state of the PR_SME_VL_INHERIT flag, and the deferred vector + length (if any), is preserved across all syscalls, subject to the specific + exceptions for execve() described in section 6. + + +5. Signal handling +------------------- + +* Signal handlers are invoked with streaming mode and ZA disabled. + +* A new signal frame record TPIDR2_MAGIC is added formatted as a struct + tpidr2_context to allow access to TPIDR2_EL0 from signal handlers. + +* A new signal frame record za_context encodes the ZA register contents on + signal delivery. [1] + +* The signal frame record for ZA always contains basic metadata, in particular + the thread's vector length (in za_context.vl). + +* The ZA matrix may or may not be included in the record, depending on + the value of PSTATE.ZA. The registers are present if and only if: + za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl)) + in which case PSTATE.ZA == 1. + +* If matrix data is present, the remainder of the record has a vl-dependent + size and layout. Macros ZA_SIG_* are defined [1] to facilitate access to + them. + +* The matrix is stored as a series of horizontal vectors in the same format as + is used for SVE vectors. + +* If the ZA context is too big to fit in sigcontext.__reserved[], then extra + space is allocated on the stack, an extra_context record is written in + __reserved[] referencing this space. za_context is then written in the + extra space. Refer to [1] for further details about this mechanism. + +* If ZTn is supported and PSTATE.ZA==1 then a signal frame record for ZTn will + be generated. + +* The signal record for ZTn has magic ZT_MAGIC (0x5a544e01) and consists of a + standard signal frame header followed by a struct zt_context specifying + the number of ZTn registers supported by the system, then zt_context.nregs + blocks of 64 bytes of data per register. + + +5. Signal return +----------------- + +When returning from a signal handler: + +* If there is no za_context record in the signal frame, or if the record is + present but contains no register data as described in the previous section, + then ZA is disabled. + +* If za_context is present in the signal frame and contains matrix data then + PSTATE.ZA is set to 1 and ZA is populated with the specified data. + +* The vector length cannot be changed via signal return. If za_context.vl in + the signal frame does not match the current vector length, the signal return + attempt is treated as illegal, resulting in a forced SIGSEGV. + +* If ZTn is not supported or PSTATE.ZA==0 then it is illegal to have a + signal frame record for ZTn, resulting in a forced SIGSEGV. + + +6. prctl extensions +-------------------- + +Some new prctl() calls are added to allow programs to manage the SME vector +length: + +prctl(PR_SME_SET_VL, unsigned long arg) + + Sets the vector length of the calling thread and related flags, where + arg == vl | flags. Other threads of the calling process are unaffected. + + vl is the desired vector length, where sve_vl_valid(vl) must be true. + + flags: + + PR_SME_VL_INHERIT + + Inherit the current vector length across execve(). Otherwise, the + vector length is reset to the system default at execve(). (See + Section 9.) + + PR_SME_SET_VL_ONEXEC + + Defer the requested vector length change until the next execve() + performed by this thread. + + The effect is equivalent to implicit execution of the following + call immediately after the next execve() (if any) by the thread: + + prctl(PR_SME_SET_VL, arg & ~PR_SME_SET_VL_ONEXEC) + + This allows launching of a new program with a different vector + length, while avoiding runtime side effects in the caller. + + Without PR_SME_SET_VL_ONEXEC, the requested change takes effect + immediately. + + + Return value: a nonnegative on success, or a negative value on error: + EINVAL: SME not supported, invalid vector length requested, or + invalid flags. + + + On success: + + * Either the calling thread's vector length or the deferred vector length + to be applied at the next execve() by the thread (dependent on whether + PR_SME_SET_VL_ONEXEC is present in arg), is set to the largest value + supported by the system that is less than or equal to vl. If vl == + SVE_VL_MAX, the value set will be the largest value supported by the + system. + + * Any previously outstanding deferred vector length change in the calling + thread is cancelled. + + * The returned value describes the resulting configuration, encoded as for + PR_SME_GET_VL. The vector length reported in this value is the new + current vector length for this thread if PR_SME_SET_VL_ONEXEC was not + present in arg; otherwise, the reported vector length is the deferred + vector length that will be applied at the next execve() by the calling + thread. + + * Changing the vector length causes all of ZA, ZTn, P0..P15, FFR and all + bits of Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become + unspecified, including both streaming and non-streaming SVE state. + Calling PR_SME_SET_VL with vl equal to the thread's current vector + length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag, + does not constitute a change to the vector length for this purpose. + + * Changing the vector length causes PSTATE.ZA and PSTATE.SM to be cleared. + Calling PR_SME_SET_VL with vl equal to the thread's current vector + length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag, + does not constitute a change to the vector length for this purpose. + + +prctl(PR_SME_GET_VL) + + Gets the vector length of the calling thread. + + The following flag may be OR-ed into the result: + + PR_SME_VL_INHERIT + + Vector length will be inherited across execve(). + + There is no way to determine whether there is an outstanding deferred + vector length change (which would only normally be the case between a + fork() or vfork() and the corresponding execve() in typical use). + + To extract the vector length from the result, bitwise and it with + PR_SME_VL_LEN_MASK. + + Return value: a nonnegative value on success, or a negative value on error: + EINVAL: SME not supported. + + +7. ptrace extensions +--------------------- + +* A new regset NT_ARM_SSVE is defined for access to streaming mode SVE + state via PTRACE_GETREGSET and PTRACE_SETREGSET, this is documented in + sve.rst. + +* A new regset NT_ARM_ZA is defined for ZA state for access to ZA state via + PTRACE_GETREGSET and PTRACE_SETREGSET. + + Refer to [2] for definitions. + +The regset data starts with struct user_za_header, containing: + + size + + Size of the complete regset, in bytes. + This depends on vl and possibly on other things in the future. + + If a call to PTRACE_GETREGSET requests less data than the value of + size, the caller can allocate a larger buffer and retry in order to + read the complete regset. + + max_size + + Maximum size in bytes that the regset can grow to for the target + thread. The regset won't grow bigger than this even if the target + thread changes its vector length etc. + + vl + + Target thread's current streaming vector length, in bytes. + + max_vl + + Maximum possible streaming vector length for the target thread. + + flags + + Zero or more of the following flags, which have the same + meaning and behaviour as the corresponding PR_SET_VL_* flags: + + SME_PT_VL_INHERIT + + SME_PT_VL_ONEXEC (SETREGSET only). + +* The effects of changing the vector length and/or flags are equivalent to + those documented for PR_SME_SET_VL. + + The caller must make a further GETREGSET call if it needs to know what VL is + actually set by SETREGSET, unless is it known in advance that the requested + VL is supported. + +* The size and layout of the payload depends on the header fields. The + SME_PT_ZA_*() macros are provided to facilitate access to the data. + +* In either case, for SETREGSET it is permissible to omit the payload, in which + case the vector length and flags are changed and PSTATE.ZA is set to 0 + (along with any consequences of those changes). If a payload is provided + then PSTATE.ZA will be set to 1. + +* For SETREGSET, if the requested VL is not supported, the effect will be the + same as if the payload were omitted, except that an EIO error is reported. + No attempt is made to translate the payload data to the correct layout + for the vector length actually set. It is up to the caller to translate the + payload layout for the actual VL and retry. + +* The effect of writing a partial, incomplete payload is unspecified. + +* A new regset NT_ARM_ZT is defined for access to ZTn state via + PTRACE_GETREGSET and PTRACE_SETREGSET. + +* The NT_ARM_ZT regset consists of a single 512 bit register. + +* When PSTATE.ZA==0 reads of NT_ARM_ZT will report all bits of ZTn as 0. + +* Writes to NT_ARM_ZT will set PSTATE.ZA to 1. + + +8. ELF coredump extensions +--------------------------- + +* NT_ARM_SSVE notes will be added to each coredump for + each thread of the dumped process. The contents will be equivalent to the + data that would have been read if a PTRACE_GETREGSET of the corresponding + type were executed for each thread when the coredump was generated. + +* A NT_ARM_ZA note will be added to each coredump for each thread of the + dumped process. The contents will be equivalent to the data that would have + been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread + when the coredump was generated. + +* A NT_ARM_ZT note will be added to each coredump for each thread of the + dumped process. The contents will be equivalent to the data that would have + been read if a PTRACE_GETREGSET of NT_ARM_ZT were executed for each thread + when the coredump was generated. + +* The NT_ARM_TLS note will be extended to two registers, the second register + will contain TPIDR2_EL0 on systems that support SME and will be read as + zero with writes ignored otherwise. + +9. System runtime configuration +-------------------------------- + +* To mitigate the ABI impact of expansion of the signal frame, a policy + mechanism is provided for administrators, distro maintainers and developers + to set the default vector length for userspace processes: + +/proc/sys/abi/sme_default_vector_length + + Writing the text representation of an integer to this file sets the system + default vector length to the specified value, unless the value is greater + than the maximum vector length supported by the system in which case the + default vector length is set to that maximum. + + The result can be determined by reopening the file and reading its + contents. + + At boot, the default vector length is initially set to 32 or the maximum + supported vector length, whichever is smaller and supported. This + determines the initial vector length of the init process (PID 1). + + Reading this file returns the current system default vector length. + +* At every execve() call, the new vector length of the new process is set to + the system default vector length, unless + + * PR_SME_VL_INHERIT (or equivalently SME_PT_VL_INHERIT) is set for the + calling thread, or + + * a deferred vector length change is pending, established via the + PR_SME_SET_VL_ONEXEC flag (or SME_PT_VL_ONEXEC). + +* Modifying the system default vector length does not affect the vector length + of any existing process or thread that does not make an execve() call. + + +Appendix A. SME programmer's model (informative) +================================================= + +This section provides a minimal description of the additions made by SME to the +ARMv8-A programmer's model that are relevant to this document. + +Note: This section is for information only and not intended to be complete or +to replace any architectural specification. + +A.1. Registers +--------------- + +In A64 state, SME adds the following: + +* A new mode, streaming mode, in which a subset of the normal FPSIMD and SVE + features are available. When supported EL0 software may enter and leave + streaming mode at any time. + + For best system performance it is strongly encouraged for software to enable + streaming mode only when it is actively being used. + +* A new vector length controlling the size of ZA and the Z registers when in + streaming mode, separately to the vector length used for SVE when not in + streaming mode. There is no requirement that either the currently selected + vector length or the set of vector lengths supported for the two modes in + a given system have any relationship. The streaming mode vector length + is referred to as SVL. + +* A new ZA matrix register. This is a square matrix of SVLxSVL bits. Most + operations on ZA require that streaming mode be enabled but ZA can be + enabled without streaming mode in order to load, save and retain data. + + For best system performance it is strongly encouraged for software to enable + ZA only when it is actively being used. + +* A new ZT0 register is introduced when SME2 is present. This is a 512 bit + register which is accessible when PSTATE.ZA is set, as ZA itself is. + +* Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and + SMSTOP instructions or by access to the SVCR system register: + + * PSTATE.ZA, if this is 1 then the ZA matrix is accessible and has valid + data while if it is 0 then ZA can not be accessed. When PSTATE.ZA is + changed from 0 to 1 all bits in ZA are cleared. + + * PSTATE.SM, if this is 1 then the PE is in streaming mode. When the value + of PSTATE.SM is changed then it is implementation defined if the subset + of the floating point register bits valid in both modes may be retained. + Any other bits will be cleared. + + +References +========== + +[1] arch/arm64/include/uapi/asm/sigcontext.h + AArch64 Linux signal ABI definitions + +[2] arch/arm64/include/uapi/asm/ptrace.h + AArch64 Linux ptrace ABI definitions + +[3] Documentation/arch/arm64/cpu-feature-registers.rst diff --git a/Documentation/arch/arm64/sve.rst b/Documentation/arch/arm64/sve.rst new file mode 100644 index 000000000000..0d9a426e9f85 --- /dev/null +++ b/Documentation/arch/arm64/sve.rst @@ -0,0 +1,616 @@ +=================================================== +Scalable Vector Extension support for AArch64 Linux +=================================================== + +Author: Dave Martin + +Date: 4 August 2017 + +This document outlines briefly the interface provided to userspace by Linux in +order to support use of the ARM Scalable Vector Extension (SVE), including +interactions with Streaming SVE mode added by the Scalable Matrix Extension +(SME). + +This is an outline of the most important features and issues only and not +intended to be exhaustive. + +This document does not aim to describe the SVE architecture or programmer's +model. To aid understanding, a minimal description of relevant programmer's +model features for SVE is included in Appendix A. + + +1. General +----------- + +* SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are + tracked per-thread. + +* In streaming mode FFR is not accessible unless HWCAP2_SME_FA64 is present + in the system, when it is not supported and these interfaces are used to + access streaming mode FFR is read and written as zero. + +* The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector + AT_HWCAP entry. Presence of this flag implies the presence of the SVE + instructions and registers, and the Linux-specific system interfaces + described in this document. SVE is reported in /proc/cpuinfo as "sve". + +* Support for the execution of SVE instructions in userspace can also be + detected by reading the CPU ID register ID_AA64PFR0_EL1 using an MRS + instruction, and checking that the value of the SVE field is nonzero. [3] + + It does not guarantee the presence of the system interfaces described in the + following sections: software that needs to verify that those interfaces are + present must check for HWCAP_SVE instead. + +* On hardware that supports the SVE2 extensions, HWCAP2_SVE2 will also + be reported in the AT_HWCAP2 aux vector entry. In addition to this, + optional extensions to SVE2 may be reported by the presence of: + + HWCAP2_SVE2 + HWCAP2_SVEAES + HWCAP2_SVEPMULL + HWCAP2_SVEBITPERM + HWCAP2_SVESHA3 + HWCAP2_SVESM4 + HWCAP2_SVE2P1 + + This list may be extended over time as the SVE architecture evolves. + + These extensions are also reported via the CPU ID register ID_AA64ZFR0_EL1, + which userspace can read using an MRS instruction. See elf_hwcaps.txt and + cpu-feature-registers.txt for details. + +* On hardware that supports the SME extensions, HWCAP2_SME will also be + reported in the AT_HWCAP2 aux vector entry. Among other things SME adds + streaming mode which provides a subset of the SVE feature set using a + separate SME vector length and the same Z/V registers. See sme.rst + for more details. + +* Debuggers should restrict themselves to interacting with the target via the + NT_ARM_SVE regset. The recommended way of detecting support for this regset + is to connect to a target process first and then attempt a + ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov). Note that when SME is + present and streaming SVE mode is in use the FPSIMD subset of registers + will be read via NT_ARM_SVE and NT_ARM_SVE writes will exit streaming mode + in the target. + +* Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory + between userspace and the kernel, the register value is encoded in memory in + an endianness-invariant layout, with bits [(8 * i + 7) : (8 * i)] encoded at + byte offset i from the start of the memory representation. This affects for + example the signal frame (struct sve_context) and ptrace interface + (struct user_sve_header) and associated data. + + Beware that on big-endian systems this results in a different byte order than + for the FPSIMD V-registers, which are stored as single host-endian 128-bit + values, with bits [(127 - 8 * i) : (120 - 8 * i)] of the register encoded at + byte offset i. (struct fpsimd_context, struct user_fpsimd_state). + + +2. Vector length terminology +----------------------------- + +The size of an SVE vector (Z) register is referred to as the "vector length". + +To avoid confusion about the units used to express vector length, the kernel +adopts the following conventions: + +* Vector length (VL) = size of a Z-register in bytes + +* Vector quadwords (VQ) = size of a Z-register in units of 128 bits + +(So, VL = 16 * VQ.) + +The VQ convention is used where the underlying granularity is important, such +as in data structure definitions. In most other situations, the VL convention +is used. This is consistent with the meaning of the "VL" pseudo-register in +the SVE instruction set architecture. + + +3. System call behaviour +------------------------- + +* On syscall, V0..V31 are preserved (as without SVE). Thus, bits [127:0] of + Z0..Z31 are preserved. All other bits of Z0..Z31, and all of P0..P15 and FFR + become zero on return from a syscall. + +* The SVE registers are not used to pass arguments to or receive results from + any syscall. + +* In practice the affected registers/bits will be preserved or will be replaced + with zeros on return from a syscall, but userspace should not make + assumptions about this. The kernel behaviour may vary on a case-by-case + basis. + +* All other SVE state of a thread, including the currently configured vector + length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector + length (if any), is preserved across all syscalls, subject to the specific + exceptions for execve() described in section 6. + + In particular, on return from a fork() or clone(), the parent and new child + process or thread share identical SVE configuration, matching that of the + parent before the call. + + +4. Signal handling +------------------- + +* A new signal frame record sve_context encodes the SVE registers on signal + delivery. [1] + +* This record is supplementary to fpsimd_context. The FPSR and FPCR registers + are only present in fpsimd_context. For convenience, the content of V0..V31 + is duplicated between sve_context and fpsimd_context. + +* The record contains a flag field which includes a flag SVE_SIG_FLAG_SM which + if set indicates that the thread is in streaming mode and the vector length + and register data (if present) describe the streaming SVE data and vector + length. + +* The signal frame record for SVE always contains basic metadata, in particular + the thread's vector length (in sve_context.vl). + +* The SVE registers may or may not be included in the record, depending on + whether the registers are live for the thread. The registers are present if + and only if: + sve_context.head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)). + +* If the registers are present, the remainder of the record has a vl-dependent + size and layout. Macros SVE_SIG_* are defined [1] to facilitate access to + the members. + +* Each scalable register (Zn, Pn, FFR) is stored in an endianness-invariant + layout, with bits [(8 * i + 7) : (8 * i)] stored at byte offset i from the + start of the register's representation in memory. + +* If the SVE context is too big to fit in sigcontext.__reserved[], then extra + space is allocated on the stack, an extra_context record is written in + __reserved[] referencing this space. sve_context is then written in the + extra space. Refer to [1] for further details about this mechanism. + + +5. Signal return +----------------- + +When returning from a signal handler: + +* If there is no sve_context record in the signal frame, or if the record is + present but contains no register data as described in the previous section, + then the SVE registers/bits become non-live and take unspecified values. + +* If sve_context is present in the signal frame and contains full register + data, the SVE registers become live and are populated with the specified + data. However, for backward compatibility reasons, bits [127:0] of Z0..Z31 + are always restored from the corresponding members of fpsimd_context.vregs[] + and not from sve_context. The remaining bits are restored from sve_context. + +* Inclusion of fpsimd_context in the signal frame remains mandatory, + irrespective of whether sve_context is present or not. + +* The vector length cannot be changed via signal return. If sve_context.vl in + the signal frame does not match the current vector length, the signal return + attempt is treated as illegal, resulting in a forced SIGSEGV. + +* It is permitted to enter or leave streaming mode by setting or clearing + the SVE_SIG_FLAG_SM flag but applications should take care to ensure that + when doing so sve_context.vl and any register data are appropriate for the + vector length in the new mode. + + +6. prctl extensions +-------------------- + +Some new prctl() calls are added to allow programs to manage the SVE vector +length: + +prctl(PR_SVE_SET_VL, unsigned long arg) + + Sets the vector length of the calling thread and related flags, where + arg == vl | flags. Other threads of the calling process are unaffected. + + vl is the desired vector length, where sve_vl_valid(vl) must be true. + + flags: + + PR_SVE_VL_INHERIT + + Inherit the current vector length across execve(). Otherwise, the + vector length is reset to the system default at execve(). (See + Section 9.) + + PR_SVE_SET_VL_ONEXEC + + Defer the requested vector length change until the next execve() + performed by this thread. + + The effect is equivalent to implicit execution of the following + call immediately after the next execve() (if any) by the thread: + + prctl(PR_SVE_SET_VL, arg & ~PR_SVE_SET_VL_ONEXEC) + + This allows launching of a new program with a different vector + length, while avoiding runtime side effects in the caller. + + + Without PR_SVE_SET_VL_ONEXEC, the requested change takes effect + immediately. + + + Return value: a nonnegative on success, or a negative value on error: + EINVAL: SVE not supported, invalid vector length requested, or + invalid flags. + + + On success: + + * Either the calling thread's vector length or the deferred vector length + to be applied at the next execve() by the thread (dependent on whether + PR_SVE_SET_VL_ONEXEC is present in arg), is set to the largest value + supported by the system that is less than or equal to vl. If vl == + SVE_VL_MAX, the value set will be the largest value supported by the + system. + + * Any previously outstanding deferred vector length change in the calling + thread is cancelled. + + * The returned value describes the resulting configuration, encoded as for + PR_SVE_GET_VL. The vector length reported in this value is the new + current vector length for this thread if PR_SVE_SET_VL_ONEXEC was not + present in arg; otherwise, the reported vector length is the deferred + vector length that will be applied at the next execve() by the calling + thread. + + * Changing the vector length causes all of P0..P15, FFR and all bits of + Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become + unspecified. Calling PR_SVE_SET_VL with vl equal to the thread's current + vector length, or calling PR_SVE_SET_VL with the PR_SVE_SET_VL_ONEXEC + flag, does not constitute a change to the vector length for this purpose. + + +prctl(PR_SVE_GET_VL) + + Gets the vector length of the calling thread. + + The following flag may be OR-ed into the result: + + PR_SVE_VL_INHERIT + + Vector length will be inherited across execve(). + + There is no way to determine whether there is an outstanding deferred + vector length change (which would only normally be the case between a + fork() or vfork() and the corresponding execve() in typical use). + + To extract the vector length from the result, bitwise and it with + PR_SVE_VL_LEN_MASK. + + Return value: a nonnegative value on success, or a negative value on error: + EINVAL: SVE not supported. + + +7. ptrace extensions +--------------------- + +* New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with + PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the + streaming mode SVE registers and NT_ARM_SVE describes the + non-streaming mode SVE registers. + + In this description a register set is referred to as being "live" when + the target is in the appropriate streaming or non-streaming mode and is + using data beyond the subset shared with the FPSIMD Vn registers. + + Refer to [2] for definitions. + +The regset data starts with struct user_sve_header, containing: + + size + + Size of the complete regset, in bytes. + This depends on vl and possibly on other things in the future. + + If a call to PTRACE_GETREGSET requests less data than the value of + size, the caller can allocate a larger buffer and retry in order to + read the complete regset. + + max_size + + Maximum size in bytes that the regset can grow to for the target + thread. The regset won't grow bigger than this even if the target + thread changes its vector length etc. + + vl + + Target thread's current vector length, in bytes. + + max_vl + + Maximum possible vector length for the target thread. + + flags + + at most one of + + SVE_PT_REGS_FPSIMD + + SVE registers are not live (GETREGSET) or are to be made + non-live (SETREGSET). + + The payload is of type struct user_fpsimd_state, with the same + meaning as for NT_PRFPREG, starting at offset + SVE_PT_FPSIMD_OFFSET from the start of user_sve_header. + + Extra data might be appended in the future: the size of the + payload should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags). + + vq should be obtained using sve_vq_from_vl(vl). + + or + + SVE_PT_REGS_SVE + + SVE registers are live (GETREGSET) or are to be made live + (SETREGSET). + + The payload contains the SVE register data, starting at offset + SVE_PT_SVE_OFFSET from the start of user_sve_header, and with + size SVE_PT_SVE_SIZE(vq, flags); + + ... OR-ed with zero or more of the following flags, which have the same + meaning and behaviour as the corresponding PR_SET_VL_* flags: + + SVE_PT_VL_INHERIT + + SVE_PT_VL_ONEXEC (SETREGSET only). + + If neither FPSIMD nor SVE flags are provided then no register + payload is available, this is only possible when SME is implemented. + + +* The effects of changing the vector length and/or flags are equivalent to + those documented for PR_SVE_SET_VL. + + The caller must make a further GETREGSET call if it needs to know what VL is + actually set by SETREGSET, unless is it known in advance that the requested + VL is supported. + +* In the SVE_PT_REGS_SVE case, the size and layout of the payload depends on + the header fields. The SVE_PT_SVE_*() macros are provided to facilitate + access to the members. + +* In either case, for SETREGSET it is permissible to omit the payload, in which + case only the vector length and flags are changed (along with any + consequences of those changes). + +* In systems supporting SME when in streaming mode a GETREGSET for + NT_REG_SVE will return only the user_sve_header with no register data, + similarly a GETREGSET for NT_REG_SSVE will not return any register data + when not in streaming mode. + +* A GETREGSET for NT_ARM_SSVE will never return SVE_PT_REGS_FPSIMD. + +* For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the + requested VL is not supported, the effect will be the same as if the + payload were omitted, except that an EIO error is reported. No + attempt is made to translate the payload data to the correct layout + for the vector length actually set. The thread's FPSIMD state is + preserved, but the remaining bits of the SVE registers become + unspecified. It is up to the caller to translate the payload layout + for the actual VL and retry. + +* Where SME is implemented it is not possible to GETREGSET the register + state for normal SVE when in streaming mode, nor the streaming mode + register state when in normal mode, regardless of the implementation defined + behaviour of the hardware for sharing data between the two modes. + +* Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in + streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode + if the target was not in streaming mode. + +* The effect of writing a partial, incomplete payload is unspecified. + + +8. ELF coredump extensions +--------------------------- + +* NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for + each thread of the dumped process. The contents will be equivalent to the + data that would have been read if a PTRACE_GETREGSET of the corresponding + type were executed for each thread when the coredump was generated. + +9. System runtime configuration +-------------------------------- + +* To mitigate the ABI impact of expansion of the signal frame, a policy + mechanism is provided for administrators, distro maintainers and developers + to set the default vector length for userspace processes: + +/proc/sys/abi/sve_default_vector_length + + Writing the text representation of an integer to this file sets the system + default vector length to the specified value, unless the value is greater + than the maximum vector length supported by the system in which case the + default vector length is set to that maximum. + + The result can be determined by reopening the file and reading its + contents. + + At boot, the default vector length is initially set to 64 or the maximum + supported vector length, whichever is smaller. This determines the initial + vector length of the init process (PID 1). + + Reading this file returns the current system default vector length. + +* At every execve() call, the new vector length of the new process is set to + the system default vector length, unless + + * PR_SVE_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the + calling thread, or + + * a deferred vector length change is pending, established via the + PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_ONEXEC). + +* Modifying the system default vector length does not affect the vector length + of any existing process or thread that does not make an execve() call. + +10. Perf extensions +-------------------------------- + +* The arm64 specific DWARF standard [5] added the VG (Vector Granule) register + at index 46. This register is used for DWARF unwinding when variable length + SVE registers are pushed onto the stack. + +* Its value is equivalent to the current SVE vector length (VL) in bits divided + by 64. + +* The value is included in Perf samples in the regs[46] field if + PERF_SAMPLE_REGS_USER is set and the sample_regs_user mask has bit 46 set. + +* The value is the current value at the time the sample was taken, and it can + change over time. + +* If the system doesn't support SVE when perf_event_open is called with these + settings, the event will fail to open. + +Appendix A. SVE programmer's model (informative) +================================================= + +This section provides a minimal description of the additions made by SVE to the +ARMv8-A programmer's model that are relevant to this document. + +Note: This section is for information only and not intended to be complete or +to replace any architectural specification. + +A.1. Registers +--------------- + +In A64 state, SVE adds the following: + +* 32 8VL-bit vector registers Z0..Z31 + For each Zn, Zn bits [127:0] alias the ARMv8-A vector register Vn. + + A register write using a Vn register name zeros all bits of the corresponding + Zn except for bits [127:0]. + +* 16 VL-bit predicate registers P0..P15 + +* 1 VL-bit special-purpose predicate register FFR (the "first-fault register") + +* a VL "pseudo-register" that determines the size of each vector register + + The SVE instruction set architecture provides no way to write VL directly. + Instead, it can be modified only by EL1 and above, by writing appropriate + system registers. + +* The value of VL can be configured at runtime by EL1 and above: + 16 <= VL <= VLmax, where VL must be a multiple of 16. + +* The maximum vector length is determined by the hardware: + 16 <= VLmax <= 256. + + (The SVE architecture specifies 256, but permits future architecture + revisions to raise this limit.) + +* FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point + operations in a similar way to the way in which they interact with ARMv8 + floating-point operations:: + + 8VL-1 128 0 bit index + +---- //// -----------------+ + Z0 | : V0 | + : : + Z7 | : V7 | + Z8 | : * V8 | + : : : + Z15 | : *V15 | + Z16 | : V16 | + : : + Z31 | : V31 | + +---- //// -----------------+ + 31 0 + VL-1 0 +-------+ + +---- //// --+ FPSR | | + P0 | | +-------+ + : | | *FPCR | | + P15 | | +-------+ + +---- //// --+ + FFR | | +-----+ + +---- //// --+ VL | | + +-----+ + +(*) callee-save: + This only applies to bits [63:0] of Z-/V-registers. + FPCR contains callee-save and caller-save bits. See [4] for details. + + +A.2. Procedure call standard +----------------------------- + +The ARMv8-A base procedure call standard is extended as follows with respect to +the additional SVE register state: + +* All SVE register bits that are not shared with FP/SIMD are caller-save. + +* Z8 bits [63:0] .. Z15 bits [63:0] are callee-save. + + This follows from the way these bits are mapped to V8..V15, which are caller- + save in the base procedure call standard. + + +Appendix B. ARMv8-A FP/SIMD programmer's model +=============================================== + +Note: This section is for information only and not intended to be complete or +to replace any architectural specification. + +Refer to [4] for more information. + +ARMv8-A defines the following floating-point / SIMD register state: + +* 32 128-bit vector registers V0..V31 +* 2 32-bit status/control registers FPSR, FPCR + +:: + + 127 0 bit index + +---------------+ + V0 | | + : : : + V7 | | + * V8 | | + : : : : + *V15 | | + V16 | | + : : : + V31 | | + +---------------+ + + 31 0 + +-------+ + FPSR | | + +-------+ + *FPCR | | + +-------+ + +(*) callee-save: + This only applies to bits [63:0] of V-registers. + FPCR contains a mixture of callee-save and caller-save bits. + + +References +========== + +[1] arch/arm64/include/uapi/asm/sigcontext.h + AArch64 Linux signal ABI definitions + +[2] arch/arm64/include/uapi/asm/ptrace.h + AArch64 Linux ptrace ABI definitions + +[3] Documentation/arch/arm64/cpu-feature-registers.rst + +[4] ARM IHI0055C + http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf + http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html + Procedure Call Standard for the ARM 64-bit Architecture (AArch64) + +[5] https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst diff --git a/Documentation/arch/arm64/tagged-address-abi.rst b/Documentation/arch/arm64/tagged-address-abi.rst new file mode 100644 index 000000000000..fe24a3f158c5 --- /dev/null +++ b/Documentation/arch/arm64/tagged-address-abi.rst @@ -0,0 +1,179 @@ +========================== +AArch64 TAGGED ADDRESS ABI +========================== + +Authors: Vincenzo Frascino + Catalin Marinas + +Date: 21 August 2019 + +This document describes the usage and semantics of the Tagged Address +ABI on AArch64 Linux. + +1. Introduction +--------------- + +On AArch64 the ``TCR_EL1.TBI0`` bit is set by default, allowing +userspace (EL0) to perform memory accesses through 64-bit pointers with +a non-zero top byte. This document describes the relaxation of the +syscall ABI that allows userspace to pass certain tagged pointers to +kernel syscalls. + +2. AArch64 Tagged Address ABI +----------------------------- + +From the kernel syscall interface perspective and for the purposes of +this document, a "valid tagged pointer" is a pointer with a potentially +non-zero top-byte that references an address in the user process address +space obtained in one of the following ways: + +- ``mmap()`` syscall where either: + + - flags have the ``MAP_ANONYMOUS`` bit set or + - the file descriptor refers to a regular file (including those + returned by ``memfd_create()``) or ``/dev/zero`` + +- ``brk()`` syscall (i.e. the heap area between the initial location of + the program break at process creation and its current location). + +- any memory mapped by the kernel in the address space of the process + during creation and with the same restrictions as for ``mmap()`` above + (e.g. data, bss, stack). + +The AArch64 Tagged Address ABI has two stages of relaxation depending on +how the user addresses are used by the kernel: + +1. User addresses not accessed by the kernel but used for address space + management (e.g. ``mprotect()``, ``madvise()``). The use of valid + tagged pointers in this context is allowed with these exceptions: + + - ``brk()``, ``mmap()`` and the ``new_address`` argument to + ``mremap()`` as these have the potential to alias with existing + user addresses. + + NOTE: This behaviour changed in v5.6 and so some earlier kernels may + incorrectly accept valid tagged pointers for the ``brk()``, + ``mmap()`` and ``mremap()`` system calls. + + - The ``range.start``, ``start`` and ``dst`` arguments to the + ``UFFDIO_*`` ``ioctl()``s used on a file descriptor obtained from + ``userfaultfd()``, as fault addresses subsequently obtained by reading + the file descriptor will be untagged, which may otherwise confuse + tag-unaware programs. + + NOTE: This behaviour changed in v5.14 and so some earlier kernels may + incorrectly accept valid tagged pointers for this system call. + +2. User addresses accessed by the kernel (e.g. ``write()``). This ABI + relaxation is disabled by default and the application thread needs to + explicitly enable it via ``prctl()`` as follows: + + - ``PR_SET_TAGGED_ADDR_CTRL``: enable or disable the AArch64 Tagged + Address ABI for the calling thread. + + The ``(unsigned int) arg2`` argument is a bit mask describing the + control mode used: + + - ``PR_TAGGED_ADDR_ENABLE``: enable AArch64 Tagged Address ABI. + Default status is disabled. + + Arguments ``arg3``, ``arg4``, and ``arg5`` must be 0. + + - ``PR_GET_TAGGED_ADDR_CTRL``: get the status of the AArch64 Tagged + Address ABI for the calling thread. + + Arguments ``arg2``, ``arg3``, ``arg4``, and ``arg5`` must be 0. + + The ABI properties described above are thread-scoped, inherited on + clone() and fork() and cleared on exec(). + + Calling ``prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0)`` + returns ``-EINVAL`` if the AArch64 Tagged Address ABI is globally + disabled by ``sysctl abi.tagged_addr_disabled=1``. The default + ``sysctl abi.tagged_addr_disabled`` configuration is 0. + +When the AArch64 Tagged Address ABI is enabled for a thread, the +following behaviours are guaranteed: + +- All syscalls except the cases mentioned in section 3 can accept any + valid tagged pointer. + +- The syscall behaviour is undefined for invalid tagged pointers: it may + result in an error code being returned, a (fatal) signal being raised, + or other modes of failure. + +- The syscall behaviour for a valid tagged pointer is the same as for + the corresponding untagged pointer. + + +A definition of the meaning of tagged pointers on AArch64 can be found +in Documentation/arch/arm64/tagged-pointers.rst. + +3. AArch64 Tagged Address ABI Exceptions +----------------------------------------- + +The following system call parameters must be untagged regardless of the +ABI relaxation: + +- ``prctl()`` other than pointers to user data either passed directly or + indirectly as arguments to be accessed by the kernel. + +- ``ioctl()`` other than pointers to user data either passed directly or + indirectly as arguments to be accessed by the kernel. + +- ``shmat()`` and ``shmdt()``. + +- ``brk()`` (since kernel v5.6). + +- ``mmap()`` (since kernel v5.6). + +- ``mremap()``, the ``new_address`` argument (since kernel v5.6). + +Any attempt to use non-zero tagged pointers may result in an error code +being returned, a (fatal) signal being raised, or other modes of +failure. + +4. Example of correct usage +--------------------------- +.. code-block:: c + + #include + #include + #include + #include + #include + + #define PR_SET_TAGGED_ADDR_CTRL 55 + #define PR_TAGGED_ADDR_ENABLE (1UL << 0) + + #define TAG_SHIFT 56 + + int main(void) + { + int tbi_enabled = 0; + unsigned long tag = 0; + char *ptr; + + /* check/enable the tagged address ABI */ + if (!prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0)) + tbi_enabled = 1; + + /* memory allocation */ + ptr = mmap(NULL, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (ptr == MAP_FAILED) + return 1; + + /* set a non-zero tag if the ABI is available */ + if (tbi_enabled) + tag = rand() & 0xff; + ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT)); + + /* memory access to a tagged address */ + strcpy(ptr, "tagged pointer\n"); + + /* syscall with a tagged pointer */ + write(1, ptr, strlen(ptr)); + + return 0; + } diff --git a/Documentation/arch/arm64/tagged-pointers.rst b/Documentation/arch/arm64/tagged-pointers.rst new file mode 100644 index 000000000000..81b6c2a770dd --- /dev/null +++ b/Documentation/arch/arm64/tagged-pointers.rst @@ -0,0 +1,88 @@ +========================================= +Tagged virtual addresses in AArch64 Linux +========================================= + +Author: Will Deacon + +Date : 12 June 2013 + +This document briefly describes the provision of tagged virtual +addresses in the AArch64 translation system and their potential uses +in AArch64 Linux. + +The kernel configures the translation tables so that translations made +via TTBR0 (i.e. userspace mappings) have the top byte (bits 63:56) of +the virtual address ignored by the translation hardware. This frees up +this byte for application use. + + +Passing tagged addresses to the kernel +-------------------------------------- + +All interpretation of userspace memory addresses by the kernel assumes +an address tag of 0x00, unless the application enables the AArch64 +Tagged Address ABI explicitly +(Documentation/arch/arm64/tagged-address-abi.rst). + +This includes, but is not limited to, addresses found in: + + - pointer arguments to system calls, including pointers in structures + passed to system calls, + + - the stack pointer (sp), e.g. when interpreting it to deliver a + signal, + + - the frame pointer (x29) and frame records, e.g. when interpreting + them to generate a backtrace or call graph. + +Using non-zero address tags in any of these locations when the +userspace application did not enable the AArch64 Tagged Address ABI may +result in an error code being returned, a (fatal) signal being raised, +or other modes of failure. + +For these reasons, when the AArch64 Tagged Address ABI is disabled, +passing non-zero address tags to the kernel via system calls is +forbidden, and using a non-zero address tag for sp is strongly +discouraged. + +Programs maintaining a frame pointer and frame records that use non-zero +address tags may suffer impaired or inaccurate debug and profiling +visibility. + + +Preserving tags +--------------- + +When delivering signals, non-zero tags are not preserved in +siginfo.si_addr unless the flag SA_EXPOSE_TAGBITS was set in +sigaction.sa_flags when the signal handler was installed. This means +that signal handlers in applications making use of tags cannot rely +on the tag information for user virtual addresses being maintained +in these fields unless the flag was set. + +Due to architecture limitations, bits 63:60 of the fault address +are not preserved in response to synchronous tag check faults +(SEGV_MTESERR) even if SA_EXPOSE_TAGBITS was set. Applications should +treat the values of these bits as undefined in order to accommodate +future architecture revisions which may preserve the bits. + +For signals raised in response to watchpoint debug exceptions, the +tag information will be preserved regardless of the SA_EXPOSE_TAGBITS +flag setting. + +Non-zero tags are never preserved in sigcontext.fault_address +regardless of the SA_EXPOSE_TAGBITS flag setting. + +The architecture prevents the use of a tagged PC, so the upper byte will +be set to a sign-extension of bit 55 on exception return. + +This behaviour is maintained when the AArch64 Tagged Address ABI is +enabled. + + +Other considerations +-------------------- + +Special care should be taken when using tagged pointers, since it is +likely that C compilers will not hazard two virtual addresses differing +only in the upper byte. diff --git a/Documentation/arch/index.rst b/Documentation/arch/index.rst index 21e3d0b61004..8458b88e9b79 100644 --- a/Documentation/arch/index.rst +++ b/Documentation/arch/index.rst @@ -11,7 +11,7 @@ implementation. arc/index arm/index - ../arm64/index + arm64/index ia64/index ../loongarch/index m68k/index diff --git a/Documentation/arm64/acpi_object_usage.rst b/Documentation/arm64/acpi_object_usage.rst deleted file mode 100644 index 484ef9676653..000000000000 --- a/Documentation/arm64/acpi_object_usage.rst +++ /dev/null @@ -1,738 +0,0 @@ -=========== -ACPI Tables -=========== - -The expectations of individual ACPI tables are discussed in the list that -follows. - -If a section number is used, it refers to a section number in the ACPI -specification where the object is defined. If "Signature Reserved" is used, -the table signature (the first four bytes of the table) is the only portion -of the table recognized by the specification, and the actual table is defined -outside of the UEFI Forum (see Section 5.2.6 of the specification). - -For ACPI on arm64, tables also fall into the following categories: - - - Required: DSDT, FADT, GTDT, MADT, MCFG, RSDP, SPCR, XSDT - - - Recommended: BERT, EINJ, ERST, HEST, PCCT, SSDT - - - Optional: BGRT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT, IBFT, - IORT, MCHI, MPST, MSCT, NFIT, PMTT, RASF, SBST, SLIT, SPMI, SRAT, - STAO, TCPA, TPM2, UEFI, XENV - - - Not supported: BOOT, DBGP, DMAR, ETDT, HPET, IVRS, LPIT, MSDM, OEMx, - PSDT, RSDT, SLIC, WAET, WDAT, WDRT, WPBT - -====== ======================================================================== -Table Usage for ARMv8 Linux -====== ======================================================================== -BERT Section 18.3 (signature == "BERT") - - **Boot Error Record Table** - - Must be supplied if RAS support is provided by the platform. It - is recommended this table be supplied. - -BOOT Signature Reserved (signature == "BOOT") - - **simple BOOT flag table** - - Microsoft only table, will not be supported. - -BGRT Section 5.2.22 (signature == "BGRT") - - **Boot Graphics Resource Table** - - Optional, not currently supported, with no real use-case for an - ARM server. - -CPEP Section 5.2.18 (signature == "CPEP") - - **Corrected Platform Error Polling table** - - Optional, not currently supported, and not recommended until such - time as ARM-compatible hardware is available, and the specification - suitably modified. - -CSRT Signature Reserved (signature == "CSRT") - - **Core System Resources Table** - - Optional, not currently supported. - -DBG2 Signature Reserved (signature == "DBG2") - - **DeBuG port table 2** - - License has changed and should be usable. Optional if used instead - of earlycon= on the command line. - -DBGP Signature Reserved (signature == "DBGP") - - **DeBuG Port table** - - Microsoft only table, will not be supported. - -DSDT Section 5.2.11.1 (signature == "DSDT") - - **Differentiated System Description Table** - - A DSDT is required; see also SSDT. - - ACPI tables contain only one DSDT but can contain one or more SSDTs, - which are optional. Each SSDT can only add to the ACPI namespace, - but cannot modify or replace anything in the DSDT. - -DMAR Signature Reserved (signature == "DMAR") - - **DMA Remapping table** - - x86 only table, will not be supported. - -DRTM Signature Reserved (signature == "DRTM") - - **Dynamic Root of Trust for Measurement table** - - Optional, not currently supported. - -ECDT Section 5.2.16 (signature == "ECDT") - - **Embedded Controller Description Table** - - Optional, not currently supported, but could be used on ARM if and - only if one uses the GPE_BIT field to represent an IRQ number, since - there are no GPE blocks defined in hardware reduced mode. This would - need to be modified in the ACPI specification. - -EINJ Section 18.6 (signature == "EINJ") - - **Error Injection table** - - This table is very useful for testing platform response to error - conditions; it allows one to inject an error into the system as - if it had actually occurred. However, this table should not be - shipped with a production system; it should be dynamically loaded - and executed with the ACPICA tools only during testing. - -ERST Section 18.5 (signature == "ERST") - - **Error Record Serialization Table** - - On a platform supports RAS, this table must be supplied if it is not - UEFI-based; if it is UEFI-based, this table may be supplied. When this - table is not present, UEFI run time service will be utilized to save - and retrieve hardware error information to and from a persistent store. - -ETDT Signature Reserved (signature == "ETDT") - - **Event Timer Description Table** - - Obsolete table, will not be supported. - -FACS Section 5.2.10 (signature == "FACS") - - **Firmware ACPI Control Structure** - - It is unlikely that this table will be terribly useful. If it is - provided, the Global Lock will NOT be used since it is not part of - the hardware reduced profile, and only 64-bit address fields will - be considered valid. - -FADT Section 5.2.9 (signature == "FACP") - - **Fixed ACPI Description Table** - Required for arm64. - - - The HW_REDUCED_ACPI flag must be set. All of the fields that are - to be ignored when HW_REDUCED_ACPI is set are expected to be set to - zero. - - If an FACS table is provided, the X_FIRMWARE_CTRL field is to be - used, not FIRMWARE_CTRL. - - If PSCI is used (as is recommended), make sure that ARM_BOOT_ARCH is - filled in properly - that the PSCI_COMPLIANT flag is set and that - PSCI_USE_HVC is set or unset as needed (see table 5-37). - - For the DSDT that is also required, the X_DSDT field is to be used, - not the DSDT field. - -FPDT Section 5.2.23 (signature == "FPDT") - - **Firmware Performance Data Table** - - Optional, useful for boot performance profiling. - -GTDT Section 5.2.24 (signature == "GTDT") - - **Generic Timer Description Table** - - Required for arm64. - -HEST Section 18.3.2 (signature == "HEST") - - **Hardware Error Source Table** - - ARM-specific error sources have been defined; please use those or the - PCI types such as type 6 (AER Root Port), 7 (AER Endpoint), or 8 (AER - Bridge), or use type 9 (Generic Hardware Error Source). Firmware first - error handling is possible if and only if Trusted Firmware is being - used on arm64. - - Must be supplied if RAS support is provided by the platform. It - is recommended this table be supplied. - -HPET Signature Reserved (signature == "HPET") - - **High Precision Event timer Table** - - x86 only table, will not be supported. - -IBFT Signature Reserved (signature == "IBFT") - - **iSCSI Boot Firmware Table** - - Microsoft defined table, support TBD. - -IORT Signature Reserved (signature == "IORT") - - **Input Output Remapping Table** - - arm64 only table, required in order to describe IO topology, SMMUs, - and GIC ITSs, and how those various components are connected together, - such as identifying which components are behind which SMMUs/ITSs. - This table will only be required on certain SBSA platforms (e.g., - when using GICv3-ITS and an SMMU); on SBSA Level 0 platforms, it - remains optional. - -IVRS Signature Reserved (signature == "IVRS") - - **I/O Virtualization Reporting Structure** - - x86_64 (AMD) only table, will not be supported. - -LPIT Signature Reserved (signature == "LPIT") - - **Low Power Idle Table** - - x86 only table as of ACPI 5.1; starting with ACPI 6.0, processor - descriptions and power states on ARM platforms should use the DSDT - and define processor container devices (_HID ACPI0010, Section 8.4, - and more specifically 8.4.3 and 8.4.4). - -MADT Section 5.2.12 (signature == "APIC") - - **Multiple APIC Description Table** - - Required for arm64. Only the GIC interrupt controller structures - should be used (types 0xA - 0xF). - -MCFG Signature Reserved (signature == "MCFG") - - **Memory-mapped ConFiGuration space** - - If the platform supports PCI/PCIe, an MCFG table is required. - -MCHI Signature Reserved (signature == "MCHI") - - **Management Controller Host Interface table** - - Optional, not currently supported. - -MPST Section 5.2.21 (signature == "MPST") - - **Memory Power State Table** - - Optional, not currently supported. - -MSCT Section 5.2.19 (signature == "MSCT") - - **Maximum System Characteristic Table** - - Optional, not currently supported. - -MSDM Signature Reserved (signature == "MSDM") - - **Microsoft Data Management table** - - Microsoft only table, will not be supported. - -NFIT Section 5.2.25 (signature == "NFIT") - - **NVDIMM Firmware Interface Table** - - Optional, not currently supported. - -OEMx Signature of "OEMx" only - - **OEM Specific Tables** - - All tables starting with a signature of "OEM" are reserved for OEM - use. Since these are not meant to be of general use but are limited - to very specific end users, they are not recommended for use and are - not supported by the kernel for arm64. - -PCCT Section 14.1 (signature == "PCCT) - - **Platform Communications Channel Table** - - Recommend for use on arm64; use of PCC is recommended when using CPPC - to control performance and power for platform processors. - -PMTT Section 5.2.21.12 (signature == "PMTT") - - **Platform Memory Topology Table** - - Optional, not currently supported. - -PSDT Section 5.2.11.3 (signature == "PSDT") - - **Persistent System Description Table** - - Obsolete table, will not be supported. - -RASF Section 5.2.20 (signature == "RASF") - - **RAS Feature table** - - Optional, not currently supported. - -RSDP Section 5.2.5 (signature == "RSD PTR") - - **Root System Description PoinTeR** - - Required for arm64. - -RSDT Section 5.2.7 (signature == "RSDT") - - **Root System Description Table** - - Since this table can only provide 32-bit addresses, it is deprecated - on arm64, and will not be used. If provided, it will be ignored. - -SBST Section 5.2.14 (signature == "SBST") - - **Smart Battery Subsystem Table** - - Optional, not currently supported. - -SLIC Signature Reserved (signature == "SLIC") - - **Software LIcensing table** - - Microsoft only table, will not be supported. - -SLIT Section 5.2.17 (signature == "SLIT") - - **System Locality distance Information Table** - - Optional in general, but required for NUMA systems. - -SPCR Signature Reserved (signature == "SPCR") - - **Serial Port Console Redirection table** - - Required for arm64. - -SPMI Signature Reserved (signature == "SPMI") - - **Server Platform Management Interface table** - - Optional, not currently supported. - -SRAT Section 5.2.16 (signature == "SRAT") - - **System Resource Affinity Table** - - Optional, but if used, only the GICC Affinity structures are read. - To support arm64 NUMA, this table is required. - -SSDT Section 5.2.11.2 (signature == "SSDT") - - **Secondary System Description Table** - - These tables are a continuation of the DSDT; these are recommended - for use with devices that can be added to a running system, but can - also serve the purpose of dividing up device descriptions into more - manageable pieces. - - An SSDT can only ADD to the ACPI namespace. It cannot modify or - replace existing device descriptions already in the namespace. - - These tables are optional, however. ACPI tables should contain only - one DSDT but can contain many SSDTs. - -STAO Signature Reserved (signature == "STAO") - - **_STA Override table** - - Optional, but only necessary in virtualized environments in order to - hide devices from guest OSs. - -TCPA Signature Reserved (signature == "TCPA") - - **Trusted Computing Platform Alliance table** - - Optional, not currently supported, and may need changes to fully - interoperate with arm64. - -TPM2 Signature Reserved (signature == "TPM2") - - **Trusted Platform Module 2 table** - - Optional, not currently supported, and may need changes to fully - interoperate with arm64. - -UEFI Signature Reserved (signature == "UEFI") - - **UEFI ACPI data table** - - Optional, not currently supported. No known use case for arm64, - at present. - -WAET Signature Reserved (signature == "WAET") - - **Windows ACPI Emulated devices Table** - - Microsoft only table, will not be supported. - -WDAT Signature Reserved (signature == "WDAT") - - **Watch Dog Action Table** - - Microsoft only table, will not be supported. - -WDRT Signature Reserved (signature == "WDRT") - - **Watch Dog Resource Table** - - Microsoft only table, will not be supported. - -WPBT Signature Reserved (signature == "WPBT") - - **Windows Platform Binary Table** - - Microsoft only table, will not be supported. - -XENV Signature Reserved (signature == "XENV") - - **Xen project table** - - Optional, used only by Xen at present. - -XSDT Section 5.2.8 (signature == "XSDT") - - **eXtended System Description Table** - - Required for arm64. -====== ======================================================================== - -ACPI Objects ------------- -The expectations on individual ACPI objects that are likely to be used are -shown in the list that follows; any object not explicitly mentioned below -should be used as needed for a particular platform or particular subsystem, -such as power management or PCI. - -===== ================ ======================================================== -Name Section Usage for ARMv8 Linux -===== ================ ======================================================== -_CCA 6.2.17 This method must be defined for all bus masters - on arm64 - there are no assumptions made about - whether such devices are cache coherent or not. - The _CCA value is inherited by all descendants of - these devices so it does not need to be repeated. - Without _CCA on arm64, the kernel does not know what - to do about setting up DMA for the device. - - NB: this method provides default cache coherency - attributes; the presence of an SMMU can be used to - modify that, however. For example, a master could - default to non-coherent, but be made coherent with - the appropriate SMMU configuration (see Table 17 of - the IORT specification, ARM Document DEN 0049B). - -_CID 6.1.2 Use as needed, see also _HID. - -_CLS 6.1.3 Use as needed, see also _HID. - -_CPC 8.4.7.1 Use as needed, power management specific. CPPC is - recommended on arm64. - -_CRS 6.2.2 Required on arm64. - -_CSD 8.4.2.2 Use as needed, used only in conjunction with _CST. - -_CST 8.4.2.1 Low power idle states (8.4.4) are recommended instead - of C-states. - -_DDN 6.1.4 This field can be used for a device name. However, - it is meant for DOS device names (e.g., COM1), so be - careful of its use across OSes. - -_DSD 6.2.5 To be used with caution. If this object is used, try - to use it within the constraints already defined by the - Device Properties UUID. Only in rare circumstances - should it be necessary to create a new _DSD UUID. - - In either case, submit the _DSD definition along with - any driver patches for discussion, especially when - device properties are used. A driver will not be - considered complete without a corresponding _DSD - description. Once approved by kernel maintainers, - the UUID or device properties must then be registered - with the UEFI Forum; this may cause some iteration as - more than one OS will be registering entries. - -_DSM 9.1.1 Do not use this method. It is not standardized, the - return values are not well documented, and it is - currently a frequent source of error. - -\_GL 5.7.1 This object is not to be used in hardware reduced - mode, and therefore should not be used on arm64. - -_GLK 6.5.7 This object requires a global lock be defined; there - is no global lock on arm64 since it runs in hardware - reduced mode. Hence, do not use this object on arm64. - -\_GPE 5.3.1 This namespace is for x86 use only. Do not use it - on arm64. - -_HID 6.1.5 This is the primary object to use in device probing, - though _CID and _CLS may also be used. - -_INI 6.5.1 Not required, but can be useful in setting up devices - when UEFI leaves them in a state that may not be what - the driver expects before it starts probing. - -_LPI 8.4.4.3 Recommended for use with processor definitions (_HID - ACPI0010) on arm64. See also _RDI. - -_MLS 6.1.7 Highly recommended for use in internationalization. - -_OFF 7.2.2 It is recommended to define this method for any device - that can be turned on or off. - -_ON 7.2.3 It is recommended to define this method for any device - that can be turned on or off. - -\_OS 5.7.3 This method will return "Linux" by default (this is - the value of the macro ACPI_OS_NAME on Linux). The - command line parameter acpi_os= can be used - to set it to some other value. - -_OSC 6.2.11 This method can be a global method in ACPI (i.e., - \_SB._OSC), or it may be associated with a specific - device (e.g., \_SB.DEV0._OSC), or both. When used - as a global method, only capabilities published in - the ACPI specification are allowed. When used as - a device-specific method, the process described for - using _DSD MUST be used to create an _OSC definition; - out-of-process use of _OSC is not allowed. That is, - submit the device-specific _OSC usage description as - part of the kernel driver submission, get it approved - by the kernel community, then register it with the - UEFI Forum. - -\_OSI 5.7.2 Deprecated on ARM64. As far as ACPI firmware is - concerned, _OSI is not to be used to determine what - sort of system is being used or what functionality - is provided. The _OSC method is to be used instead. - -_PDC 8.4.1 Deprecated, do not use on arm64. - -\_PIC 5.8.1 The method should not be used. On arm64, the only - interrupt model available is GIC. - -\_PR 5.3.1 This namespace is for x86 use only on legacy systems. - Do not use it on arm64. - -_PRT 6.2.13 Required as part of the definition of all PCI root - devices. - -_PRx 7.3.8-11 Use as needed; power management specific. If _PR0 is - defined, _PR3 must also be defined. - -_PSx 7.3.2-5 Use as needed; power management specific. If _PS0 is - defined, _PS3 must also be defined. If clocks or - regulators need adjusting to be consistent with power - usage, change them in these methods. - -_RDI 8.4.4.4 Recommended for use with processor definitions (_HID - ACPI0010) on arm64. This should only be used in - conjunction with _LPI. - -\_REV 5.7.4 Always returns the latest version of ACPI supported. - -\_SB 5.3.1 Required on arm64; all devices must be defined in this - namespace. - -_SLI 6.2.15 Use is recommended when SLIT table is in use. - -_STA 6.3.7, It is recommended to define this method for any device - 7.2.4 that can be turned on or off. See also the STAO table - that provides overrides to hide devices in virtualized - environments. - -_SRS 6.2.16 Use as needed; see also _PRS. - -_STR 6.1.10 Recommended for conveying device names to end users; - this is preferred over using _DDN. - -_SUB 6.1.9 Use as needed; _HID or _CID are preferred. - -_SUN 6.1.11 Use as needed, but recommended. - -_SWS 7.4.3 Use as needed; power management specific; this may - require specification changes for use on arm64. - -_UID 6.1.12 Recommended for distinguishing devices of the same - class; define it if at all possible. -===== ================ ======================================================== - - - - -ACPI Event Model ----------------- -Do not use GPE block devices; these are not supported in the hardware reduced -profile used by arm64. Since there are no GPE blocks defined for use on ARM -platforms, ACPI events must be signaled differently. - -There are two options: GPIO-signaled interrupts (Section 5.6.5), and -interrupt-signaled events (Section 5.6.9). Interrupt-signaled events are a -new feature in the ACPI 6.1 specification. Either - or both - can be used -on a given platform, and which to use may be dependent of limitations in any -given SoC. If possible, interrupt-signaled events are recommended. - - -ACPI Processor Control ----------------------- -Section 8 of the ACPI specification changed significantly in version 6.0. -Processors should now be defined as Device objects with _HID ACPI0007; do -not use the deprecated Processor statement in ASL. All multiprocessor systems -should also define a hierarchy of processors, done with Processor Container -Devices (see Section 8.4.3.1, _HID ACPI0010); do not use processor aggregator -devices (Section 8.5) to describe processor topology. Section 8.4 of the -specification describes the semantics of these object definitions and how -they interrelate. - -Most importantly, the processor hierarchy defined also defines the low power -idle states that are available to the platform, along with the rules for -determining which processors can be turned on or off and the circumstances -that control that. Without this information, the processors will run in -whatever power state they were left in by UEFI. - -Note too, that the processor Device objects defined and the entries in the -MADT for GICs are expected to be in synchronization. The _UID of the Device -object must correspond to processor IDs used in the MADT. - -It is recommended that CPPC (8.4.5) be used as the primary model for processor -performance control on arm64. C-states and P-states may become available at -some point in the future, but most current design work appears to favor CPPC. - -Further, it is essential that the ARMv8 SoC provide a fully functional -implementation of PSCI; this will be the only mechanism supported by ACPI -to control CPU power state. Booting of secondary CPUs using the ACPI -parking protocol is possible, but discouraged, since only PSCI is supported -for ARM servers. - - -ACPI System Address Map Interfaces ----------------------------------- -In Section 15 of the ACPI specification, several methods are mentioned as -possible mechanisms for conveying memory resource information to the kernel. -For arm64, we will only support UEFI for booting with ACPI, hence the UEFI -GetMemoryMap() boot service is the only mechanism that will be used. - - -ACPI Platform Error Interfaces (APEI) -------------------------------------- -The APEI tables supported are described above. - -APEI requires the equivalent of an SCI and an NMI on ARMv8. The SCI is used -to notify the OSPM of errors that have occurred but can be corrected and the -system can continue correct operation, even if possibly degraded. The NMI is -used to indicate fatal errors that cannot be corrected, and require immediate -attention. - -Since there is no direct equivalent of the x86 SCI or NMI, arm64 handles -these slightly differently. The SCI is handled as a high priority interrupt; -given that these are corrected (or correctable) errors being reported, this -is sufficient. The NMI is emulated as the highest priority interrupt -possible. This implies some caution must be used since there could be -interrupts at higher privilege levels or even interrupts at the same priority -as the emulated NMI. In Linux, this should not be the case but one should -be aware it could happen. - - -ACPI Objects Not Supported on ARM64 ------------------------------------ -While this may change in the future, there are several classes of objects -that can be defined, but are not currently of general interest to ARM servers. -Some of these objects have x86 equivalents, and may actually make sense in ARM -servers. However, there is either no hardware available at present, or there -may not even be a non-ARM implementation yet. Hence, they are not currently -supported. - -The following classes of objects are not supported: - - - Section 9.2: ambient light sensor devices - - - Section 9.3: battery devices - - - Section 9.4: lids (e.g., laptop lids) - - - Section 9.8.2: IDE controllers - - - Section 9.9: floppy controllers - - - Section 9.10: GPE block devices - - - Section 9.15: PC/AT RTC/CMOS devices - - - Section 9.16: user presence detection devices - - - Section 9.17: I/O APIC devices; all GICs must be enumerable via MADT - - - Section 9.18: time and alarm devices (see 9.15) - - - Section 10: power source and power meter devices - - - Section 11: thermal management - - - Section 12: embedded controllers interface - - - Section 13: SMBus interfaces - - -This also means that there is no support for the following objects: - -==== =========================== ==== ========== -Name Section Name Section -==== =========================== ==== ========== -_ALC 9.3.4 _FDM 9.10.3 -_ALI 9.3.2 _FIX 6.2.7 -_ALP 9.3.6 _GAI 10.4.5 -_ALR 9.3.5 _GHL 10.4.7 -_ALT 9.3.3 _GTM 9.9.2.1.1 -_BCT 10.2.2.10 _LID 9.5.1 -_BDN 6.5.3 _PAI 10.4.4 -_BIF 10.2.2.1 _PCL 10.3.2 -_BIX 10.2.2.1 _PIF 10.3.3 -_BLT 9.2.3 _PMC 10.4.1 -_BMA 10.2.2.4 _PMD 10.4.8 -_BMC 10.2.2.12 _PMM 10.4.3 -_BMD 10.2.2.11 _PRL 10.3.4 -_BMS 10.2.2.5 _PSR 10.3.1 -_BST 10.2.2.6 _PTP 10.4.2 -_BTH 10.2.2.7 _SBS 10.1.3 -_BTM 10.2.2.9 _SHL 10.4.6 -_BTP 10.2.2.8 _STM 9.9.2.1.1 -_DCK 6.5.2 _UPD 9.16.1 -_EC 12.12 _UPP 9.16.2 -_FDE 9.10.1 _WPC 10.5.2 -_FDI 9.10.2 _WPP 10.5.3 -==== =========================== ==== ========== diff --git a/Documentation/arm64/amu.rst b/Documentation/arm64/amu.rst deleted file mode 100644 index 01f2de2b0450..000000000000 --- a/Documentation/arm64/amu.rst +++ /dev/null @@ -1,119 +0,0 @@ -.. _amu_index: - -======================================================= -Activity Monitors Unit (AMU) extension in AArch64 Linux -======================================================= - -Author: Ionela Voinescu - -Date: 2019-09-10 - -This document briefly describes the provision of Activity Monitors Unit -support in AArch64 Linux. - - -Architecture overview ---------------------- - -The activity monitors extension is an optional extension introduced by the -ARMv8.4 CPU architecture. - -The activity monitors unit, implemented in each CPU, provides performance -counters intended for system management use. The AMU extension provides a -system register interface to the counter registers and also supports an -optional external memory-mapped interface. - -Version 1 of the Activity Monitors architecture implements a counter group -of four fixed and architecturally defined 64-bit event counters. - - - CPU cycle counter: increments at the frequency of the CPU. - - Constant counter: increments at the fixed frequency of the system - clock. - - Instructions retired: increments with every architecturally executed - instruction. - - Memory stall cycles: counts instruction dispatch stall cycles caused by - misses in the last level cache within the clock domain. - -When in WFI or WFE these counters do not increment. - -The Activity Monitors architecture provides space for up to 16 architected -event counters. Future versions of the architecture may use this space to -implement additional architected event counters. - -Additionally, version 1 implements a counter group of up to 16 auxiliary -64-bit event counters. - -On cold reset all counters reset to 0. - - -Basic support -------------- - -The kernel can safely run a mix of CPUs with and without support for the -activity monitors extension. Therefore, when CONFIG_ARM64_AMU_EXTN is -selected we unconditionally enable the capability to allow any late CPU -(secondary or hotplugged) to detect and use the feature. - -When the feature is detected on a CPU, we flag the availability of the -feature but this does not guarantee the correct functionality of the -counters, only the presence of the extension. - -Firmware (code running at higher exception levels, e.g. arm-tf) support is -needed to: - - - Enable access for lower exception levels (EL2 and EL1) to the AMU - registers. - - Enable the counters. If not enabled these will read as 0. - - Save/restore the counters before/after the CPU is being put/brought up - from the 'off' power state. - -When using kernels that have this feature enabled but boot with broken -firmware the user may experience panics or lockups when accessing the -counter registers. Even if these symptoms are not observed, the values -returned by the register reads might not correctly reflect reality. Most -commonly, the counters will read as 0, indicating that they are not -enabled. - -If proper support is not provided in firmware it's best to disable -CONFIG_ARM64_AMU_EXTN. To be noted that for security reasons, this does not -bypass the setting of AMUSERENR_EL0 to trap accesses from EL0 (userspace) to -EL1 (kernel). Therefore, firmware should still ensure accesses to AMU registers -are not trapped in EL2/EL3. - -The fixed counters of AMUv1 are accessible though the following system -register definitions: - - - SYS_AMEVCNTR0_CORE_EL0 - - SYS_AMEVCNTR0_CONST_EL0 - - SYS_AMEVCNTR0_INST_RET_EL0 - - SYS_AMEVCNTR0_MEM_STALL_EL0 - -Auxiliary platform specific counters can be accessed using -SYS_AMEVCNTR1_EL0(n), where n is a value between 0 and 15. - -Details can be found in: arch/arm64/include/asm/sysreg.h. - - -Userspace access ----------------- - -Currently, access from userspace to the AMU registers is disabled due to: - - - Security reasons: they might expose information about code executed in - secure mode. - - Purpose: AMU counters are intended for system management use. - -Also, the presence of the feature is not visible to userspace. - - -Virtualization --------------- - -Currently, access from userspace (EL0) and kernelspace (EL1) on the KVM -guest side is disabled due to: - - - Security reasons: they might expose information about code executed - by other guests or the host. - -Any attempt to access the AMU registers will result in an UNDEFINED -exception being injected into the guest. diff --git a/Documentation/arm64/arm-acpi.rst b/Documentation/arm64/arm-acpi.rst deleted file mode 100644 index 47ecb9930dde..000000000000 --- a/Documentation/arm64/arm-acpi.rst +++ /dev/null @@ -1,528 +0,0 @@ -===================== -ACPI on ARMv8 Servers -===================== - -ACPI can be used for ARMv8 general purpose servers designed to follow -the ARM SBSA (Server Base System Architecture) [0] and SBBR (Server -Base Boot Requirements) [1] specifications. Please note that the SBBR -can be retrieved simply by visiting [1], but the SBSA is currently only -available to those with an ARM login due to ARM IP licensing concerns. - -The ARMv8 kernel implements the reduced hardware model of ACPI version -5.1 or later. Links to the specification and all external documents -it refers to are managed by the UEFI Forum. The specification is -available at http://www.uefi.org/specifications and documents referenced -by the specification can be found via http://www.uefi.org/acpi. - -If an ARMv8 system does not meet the requirements of the SBSA and SBBR, -or cannot be described using the mechanisms defined in the required ACPI -specifications, then ACPI may not be a good fit for the hardware. - -While the documents mentioned above set out the requirements for building -industry-standard ARMv8 servers, they also apply to more than one operating -system. The purpose of this document is to describe the interaction between -ACPI and Linux only, on an ARMv8 system -- that is, what Linux expects of -ACPI and what ACPI can expect of Linux. - - -Why ACPI on ARM? ----------------- -Before examining the details of the interface between ACPI and Linux, it is -useful to understand why ACPI is being used. Several technologies already -exist in Linux for describing non-enumerable hardware, after all. In this -section we summarize a blog post [2] from Grant Likely that outlines the -reasoning behind ACPI on ARMv8 servers. Actually, we snitch a good portion -of the summary text almost directly, to be honest. - -The short form of the rationale for ACPI on ARM is: - -- ACPI’s byte code (AML) allows the platform to encode hardware behavior, - while DT explicitly does not support this. For hardware vendors, being - able to encode behavior is a key tool used in supporting operating - system releases on new hardware. - -- ACPI’s OSPM defines a power management model that constrains what the - platform is allowed to do into a specific model, while still providing - flexibility in hardware design. - -- In the enterprise server environment, ACPI has established bindings (such - as for RAS) which are currently used in production systems. DT does not. - Such bindings could be defined in DT at some point, but doing so means ARM - and x86 would end up using completely different code paths in both firmware - and the kernel. - -- Choosing a single interface to describe the abstraction between a platform - and an OS is important. Hardware vendors would not be required to implement - both DT and ACPI if they want to support multiple operating systems. And, - agreeing on a single interface instead of being fragmented into per OS - interfaces makes for better interoperability overall. - -- The new ACPI governance process works well and Linux is now at the same - table as hardware vendors and other OS vendors. In fact, there is no - longer any reason to feel that ACPI only belongs to Windows or that - Linux is in any way secondary to Microsoft in this arena. The move of - ACPI governance into the UEFI forum has significantly opened up the - specification development process, and currently, a large portion of the - changes being made to ACPI are being driven by Linux. - -Key to the use of ACPI is the support model. For servers in general, the -responsibility for hardware behaviour cannot solely be the domain of the -kernel, but rather must be split between the platform and the kernel, in -order to allow for orderly change over time. ACPI frees the OS from needing -to understand all the minute details of the hardware so that the OS doesn’t -need to be ported to each and every device individually. It allows the -hardware vendors to take responsibility for power management behaviour without -depending on an OS release cycle which is not under their control. - -ACPI is also important because hardware and OS vendors have already worked -out the mechanisms for supporting a general purpose computing ecosystem. The -infrastructure is in place, the bindings are in place, and the processes are -in place. DT does exactly what Linux needs it to when working with vertically -integrated devices, but there are no good processes for supporting what the -server vendors need. Linux could potentially get there with DT, but doing so -really just duplicates something that already works. ACPI already does what -the hardware vendors need, Microsoft won’t collaborate on DT, and hardware -vendors would still end up providing two completely separate firmware -interfaces -- one for Linux and one for Windows. - - -Kernel Compatibility --------------------- -One of the primary motivations for ACPI is standardization, and using that -to provide backward compatibility for Linux kernels. In the server market, -software and hardware are often used for long periods. ACPI allows the -kernel and firmware to agree on a consistent abstraction that can be -maintained over time, even as hardware or software change. As long as the -abstraction is supported, systems can be updated without necessarily having -to replace the kernel. - -When a Linux driver or subsystem is first implemented using ACPI, it by -definition ends up requiring a specific version of the ACPI specification --- it's baseline. ACPI firmware must continue to work, even though it may -not be optimal, with the earliest kernel version that first provides support -for that baseline version of ACPI. There may be a need for additional drivers, -but adding new functionality (e.g., CPU power management) should not break -older kernel versions. Further, ACPI firmware must also work with the most -recent version of the kernel. - - -Relationship with Device Tree ------------------------------ -ACPI support in drivers and subsystems for ARMv8 should never be mutually -exclusive with DT support at compile time. - -At boot time the kernel will only use one description method depending on -parameters passed from the boot loader (including kernel bootargs). - -Regardless of whether DT or ACPI is used, the kernel must always be capable -of booting with either scheme (in kernels with both schemes enabled at compile -time). - - -Booting using ACPI tables -------------------------- -The only defined method for passing ACPI tables to the kernel on ARMv8 -is via the UEFI system configuration table. Just so it is explicit, this -means that ACPI is only supported on platforms that boot via UEFI. - -When an ARMv8 system boots, it can either have DT information, ACPI tables, -or in some very unusual cases, both. If no command line parameters are used, -the kernel will try to use DT for device enumeration; if there is no DT -present, the kernel will try to use ACPI tables, but only if they are present. -In neither is available, the kernel will not boot. If acpi=force is used -on the command line, the kernel will attempt to use ACPI tables first, but -fall back to DT if there are no ACPI tables present. The basic idea is that -the kernel will not fail to boot unless it absolutely has no other choice. - -Processing of ACPI tables may be disabled by passing acpi=off on the kernel -command line; this is the default behavior. - -In order for the kernel to load and use ACPI tables, the UEFI implementation -MUST set the ACPI_20_TABLE_GUID to point to the RSDP table (the table with -the ACPI signature "RSD PTR "). If this pointer is incorrect and acpi=force -is used, the kernel will disable ACPI and try to use DT to boot instead; the -kernel has, in effect, determined that ACPI tables are not present at that -point. - -If the pointer to the RSDP table is correct, the table will be mapped into -the kernel by the ACPI core, using the address provided by UEFI. - -The ACPI core will then locate and map in all other ACPI tables provided by -using the addresses in the RSDP table to find the XSDT (eXtended System -Description Table). The XSDT in turn provides the addresses to all other -ACPI tables provided by the system firmware; the ACPI core will then traverse -this table and map in the tables listed. - -The ACPI core will ignore any provided RSDT (Root System Description Table). -RSDTs have been deprecated and are ignored on arm64 since they only allow -for 32-bit addresses. - -Further, the ACPI core will only use the 64-bit address fields in the FADT -(Fixed ACPI Description Table). Any 32-bit address fields in the FADT will -be ignored on arm64. - -Hardware reduced mode (see Section 4.1 of the ACPI 6.1 specification) will -be enforced by the ACPI core on arm64. Doing so allows the ACPI core to -run less complex code since it no longer has to provide support for legacy -hardware from other architectures. Any fields that are not to be used for -hardware reduced mode must be set to zero. - -For the ACPI core to operate properly, and in turn provide the information -the kernel needs to configure devices, it expects to find the following -tables (all section numbers refer to the ACPI 6.1 specification): - - - RSDP (Root System Description Pointer), section 5.2.5 - - - XSDT (eXtended System Description Table), section 5.2.8 - - - FADT (Fixed ACPI Description Table), section 5.2.9 - - - DSDT (Differentiated System Description Table), section - 5.2.11.1 - - - MADT (Multiple APIC Description Table), section 5.2.12 - - - GTDT (Generic Timer Description Table), section 5.2.24 - - - If PCI is supported, the MCFG (Memory mapped ConFiGuration - Table), section 5.2.6, specifically Table 5-31. - - - If booting without a console= kernel parameter is - supported, the SPCR (Serial Port Console Redirection table), - section 5.2.6, specifically Table 5-31. - - - If necessary to describe the I/O topology, SMMUs and GIC ITSs, - the IORT (Input Output Remapping Table, section 5.2.6, specifically - Table 5-31). - - - If NUMA is supported, the SRAT (System Resource Affinity Table) - and SLIT (System Locality distance Information Table), sections - 5.2.16 and 5.2.17, respectively. - -If the above tables are not all present, the kernel may or may not be -able to boot properly since it may not be able to configure all of the -devices available. This list of tables is not meant to be all inclusive; -in some environments other tables may be needed (e.g., any of the APEI -tables from section 18) to support specific functionality. - - -ACPI Detection --------------- -Drivers should determine their probe() type by checking for a null -value for ACPI_HANDLE, or checking .of_node, or other information in -the device structure. This is detailed further in the "Driver -Recommendations" section. - -In non-driver code, if the presence of ACPI needs to be detected at -run time, then check the value of acpi_disabled. If CONFIG_ACPI is not -set, acpi_disabled will always be 1. - - -Device Enumeration ------------------- -Device descriptions in ACPI should use standard recognized ACPI interfaces. -These may contain less information than is typically provided via a Device -Tree description for the same device. This is also one of the reasons that -ACPI can be useful -- the driver takes into account that it may have less -detailed information about the device and uses sensible defaults instead. -If done properly in the driver, the hardware can change and improve over -time without the driver having to change at all. - -Clocks provide an excellent example. In DT, clocks need to be specified -and the drivers need to take them into account. In ACPI, the assumption -is that UEFI will leave the device in a reasonable default state, including -any clock settings. If for some reason the driver needs to change a clock -value, this can be done in an ACPI method; all the driver needs to do is -invoke the method and not concern itself with what the method needs to do -to change the clock. Changing the hardware can then take place over time -by changing what the ACPI method does, and not the driver. - -In DT, the parameters needed by the driver to set up clocks as in the example -above are known as "bindings"; in ACPI, these are known as "Device Properties" -and provided to a driver via the _DSD object. - -ACPI tables are described with a formal language called ASL, the ACPI -Source Language (section 19 of the specification). This means that there -are always multiple ways to describe the same thing -- including device -properties. For example, device properties could use an ASL construct -that looks like this: Name(KEY0, "value0"). An ACPI device driver would -then retrieve the value of the property by evaluating the KEY0 object. -However, using Name() this way has multiple problems: (1) ACPI limits -names ("KEY0") to four characters unlike DT; (2) there is no industry -wide registry that maintains a list of names, minimizing re-use; (3) -there is also no registry for the definition of property values ("value0"), -again making re-use difficult; and (4) how does one maintain backward -compatibility as new hardware comes out? The _DSD method was created -to solve precisely these sorts of problems; Linux drivers should ALWAYS -use the _DSD method for device properties and nothing else. - -The _DSM object (ACPI Section 9.14.1) could also be used for conveying -device properties to a driver. Linux drivers should only expect it to -be used if _DSD cannot represent the data required, and there is no way -to create a new UUID for the _DSD object. Note that there is even less -regulation of the use of _DSM than there is of _DSD. Drivers that depend -on the contents of _DSM objects will be more difficult to maintain over -time because of this; as of this writing, the use of _DSM is the cause -of quite a few firmware problems and is not recommended. - -Drivers should look for device properties in the _DSD object ONLY; the _DSD -object is described in the ACPI specification section 6.2.5, but this only -describes how to define the structure of an object returned via _DSD, and -how specific data structures are defined by specific UUIDs. Linux should -only use the _DSD Device Properties UUID [5]: - - - UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301 - - - https://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf - -The UEFI Forum provides a mechanism for registering device properties [4] -so that they may be used across all operating systems supporting ACPI. -Device properties that have not been registered with the UEFI Forum should -not be used. - -Before creating new device properties, check to be sure that they have not -been defined before and either registered in the Linux kernel documentation -as DT bindings, or the UEFI Forum as device properties. While we do not want -to simply move all DT bindings into ACPI device properties, we can learn from -what has been previously defined. - -If it is necessary to define a new device property, or if it makes sense to -synthesize the definition of a binding so it can be used in any firmware, -both DT bindings and ACPI device properties for device drivers have review -processes. Use them both. When the driver itself is submitted for review -to the Linux mailing lists, the device property definitions needed must be -submitted at the same time. A driver that supports ACPI and uses device -properties will not be considered complete without their definitions. Once -the device property has been accepted by the Linux community, it must be -registered with the UEFI Forum [4], which will review it again for consistency -within the registry. This may require iteration. The UEFI Forum, though, -will always be the canonical site for device property definitions. - -It may make sense to provide notice to the UEFI Forum that there is the -intent to register a previously unused device property name as a means of -reserving the name for later use. Other operating system vendors will -also be submitting registration requests and this may help smooth the -process. - -Once registration and review have been completed, the kernel provides an -interface for looking up device properties in a manner independent of -whether DT or ACPI is being used. This API should be used [6]; it can -eliminate some duplication of code paths in driver probing functions and -discourage divergence between DT bindings and ACPI device properties. - - -Programmable Power Control Resources ------------------------------------- -Programmable power control resources include such resources as voltage/current -providers (regulators) and clock sources. - -With ACPI, the kernel clock and regulator framework is not expected to be used -at all. - -The kernel assumes that power control of these resources is represented with -Power Resource Objects (ACPI section 7.1). The ACPI core will then handle -correctly enabling and disabling resources as they are needed. In order to -get that to work, ACPI assumes each device has defined D-states and that these -can be controlled through the optional ACPI methods _PS0, _PS1, _PS2, and _PS3; -in ACPI, _PS0 is the method to invoke to turn a device full on, and _PS3 is for -turning a device full off. - -There are two options for using those Power Resources. They can: - - - be managed in a _PSx method which gets called on entry to power - state Dx. - - - be declared separately as power resources with their own _ON and _OFF - methods. They are then tied back to D-states for a particular device - via _PRx which specifies which power resources a device needs to be on - while in Dx. Kernel then tracks number of devices using a power resource - and calls _ON/_OFF as needed. - -The kernel ACPI code will also assume that the _PSx methods follow the normal -ACPI rules for such methods: - - - If either _PS0 or _PS3 is implemented, then the other method must also - be implemented. - - - If a device requires usage or setup of a power resource when on, the ASL - should organize that it is allocated/enabled using the _PS0 method. - - - Resources allocated or enabled in the _PS0 method should be disabled - or de-allocated in the _PS3 method. - - - Firmware will leave the resources in a reasonable state before handing - over control to the kernel. - -Such code in _PSx methods will of course be very platform specific. But, -this allows the driver to abstract out the interface for operating the device -and avoid having to read special non-standard values from ACPI tables. Further, -abstracting the use of these resources allows the hardware to change over time -without requiring updates to the driver. - - -Clocks ------- -ACPI makes the assumption that clocks are initialized by the firmware -- -UEFI, in this case -- to some working value before control is handed over -to the kernel. This has implications for devices such as UARTs, or SoC-driven -LCD displays, for example. - -When the kernel boots, the clocks are assumed to be set to reasonable -working values. If for some reason the frequency needs to change -- e.g., -throttling for power management -- the device driver should expect that -process to be abstracted out into some ACPI method that can be invoked -(please see the ACPI specification for further recommendations on standard -methods to be expected). The only exceptions to this are CPU clocks where -CPPC provides a much richer interface than ACPI methods. If the clocks -are not set, there is no direct way for Linux to control them. - -If an SoC vendor wants to provide fine-grained control of the system clocks, -they could do so by providing ACPI methods that could be invoked by Linux -drivers. However, this is NOT recommended and Linux drivers should NOT use -such methods, even if they are provided. Such methods are not currently -standardized in the ACPI specification, and using them could tie a kernel -to a very specific SoC, or tie an SoC to a very specific version of the -kernel, both of which we are trying to avoid. - - -Driver Recommendations ----------------------- -DO NOT remove any DT handling when adding ACPI support for a driver. The -same device may be used on many different systems. - -DO try to structure the driver so that it is data-driven. That is, set up -a struct containing internal per-device state based on defaults and whatever -else must be discovered by the driver probe function. Then, have the rest -of the driver operate off of the contents of that struct. Doing so should -allow most divergence between ACPI and DT functionality to be kept local to -the probe function instead of being scattered throughout the driver. For -example:: - - static int device_probe_dt(struct platform_device *pdev) - { - /* DT specific functionality */ - ... - } - - static int device_probe_acpi(struct platform_device *pdev) - { - /* ACPI specific functionality */ - ... - } - - static int device_probe(struct platform_device *pdev) - { - ... - struct device_node node = pdev->dev.of_node; - ... - - if (node) - ret = device_probe_dt(pdev); - else if (ACPI_HANDLE(&pdev->dev)) - ret = device_probe_acpi(pdev); - else - /* other initialization */ - ... - /* Continue with any generic probe operations */ - ... - } - -DO keep the MODULE_DEVICE_TABLE entries together in the driver to make it -clear the different names the driver is probed for, both from DT and from -ACPI:: - - static struct of_device_id virtio_mmio_match[] = { - { .compatible = "virtio,mmio", }, - { } - }; - MODULE_DEVICE_TABLE(of, virtio_mmio_match); - - static const struct acpi_device_id virtio_mmio_acpi_match[] = { - { "LNRO0005", }, - { } - }; - MODULE_DEVICE_TABLE(acpi, virtio_mmio_acpi_match); - - -ASWG ----- -The ACPI specification changes regularly. During the year 2014, for instance, -version 5.1 was released and version 6.0 substantially completed, with most of -the changes being driven by ARM-specific requirements. Proposed changes are -presented and discussed in the ASWG (ACPI Specification Working Group) which -is a part of the UEFI Forum. The current version of the ACPI specification -is 6.1 release in January 2016. - -Participation in this group is open to all UEFI members. Please see -http://www.uefi.org/workinggroup for details on group membership. - -It is the intent of the ARMv8 ACPI kernel code to follow the ACPI specification -as closely as possible, and to only implement functionality that complies with -the released standards from UEFI ASWG. As a practical matter, there will be -vendors that provide bad ACPI tables or violate the standards in some way. -If this is because of errors, quirks and fix-ups may be necessary, but will -be avoided if possible. If there are features missing from ACPI that preclude -it from being used on a platform, ECRs (Engineering Change Requests) should be -submitted to ASWG and go through the normal approval process; for those that -are not UEFI members, many other members of the Linux community are and would -likely be willing to assist in submitting ECRs. - - -Linux Code ----------- -Individual items specific to Linux on ARM, contained in the Linux -source code, are in the list that follows: - -ACPI_OS_NAME - This macro defines the string to be returned when - an ACPI method invokes the _OS method. On ARM64 - systems, this macro will be "Linux" by default. - The command line parameter acpi_os= - can be used to set it to some other value. The - default value for other architectures is "Microsoft - Windows NT", for example. - -ACPI Objects ------------- -Detailed expectations for ACPI tables and object are listed in the file -Documentation/arm64/acpi_object_usage.rst. - - -References ----------- -[0] http://silver.arm.com - document ARM-DEN-0029, or newer: - "Server Base System Architecture", version 2.3, dated 27 Mar 2014 - -[1] http://infocenter.arm.com/help/topic/com.arm.doc.den0044a/Server_Base_Boot_Requirements.pdf - Document ARM-DEN-0044A, or newer: "Server Base Boot Requirements, System - Software on ARM Platforms", dated 16 Aug 2014 - -[2] http://www.secretlab.ca/archives/151, - 10 Jan 2015, Copyright (c) 2015, - Linaro Ltd., written by Grant Likely. - -[3] AMD ACPI for Seattle platform documentation - http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Seattle_ACPI_Guide.pdf - - -[4] http://www.uefi.org/acpi - please see the link for the "ACPI _DSD Device - Property Registry Instructions" - -[5] http://www.uefi.org/acpi - please see the link for the "_DSD (Device - Specific Data) Implementation Guide" - -[6] Kernel code for the unified device - property interface can be found in - include/linux/property.h and drivers/base/property.c. - - -Authors -------- -- Al Stone -- Graeme Gregory -- Hanjun Guo - -- Grant Likely , for the "Why ACPI on ARM?" section diff --git a/Documentation/arm64/asymmetric-32bit.rst b/Documentation/arm64/asymmetric-32bit.rst deleted file mode 100644 index 64a0b505da7d..000000000000 --- a/Documentation/arm64/asymmetric-32bit.rst +++ /dev/null @@ -1,155 +0,0 @@ -====================== -Asymmetric 32-bit SoCs -====================== - -Author: Will Deacon - -This document describes the impact of asymmetric 32-bit SoCs on the -execution of 32-bit (``AArch32``) applications. - -Date: 2021-05-17 - -Introduction -============ - -Some Armv9 SoCs suffer from a big.LITTLE misfeature where only a subset -of the CPUs are capable of executing 32-bit user applications. On such -a system, Linux by default treats the asymmetry as a "mismatch" and -disables support for both the ``PER_LINUX32`` personality and -``execve(2)`` of 32-bit ELF binaries, with the latter returning -``-ENOEXEC``. If the mismatch is detected during late onlining of a -64-bit-only CPU, then the onlining operation fails and the new CPU is -unavailable for scheduling. - -Surprisingly, these SoCs have been produced with the intention of -running legacy 32-bit binaries. Unsurprisingly, that doesn't work very -well with the default behaviour of Linux. - -It seems inevitable that future SoCs will drop 32-bit support -altogether, so if you're stuck in the unenviable position of needing to -run 32-bit code on one of these transitionary platforms then you would -be wise to consider alternatives such as recompilation, emulation or -retirement. If neither of those options are practical, then read on. - -Enabling kernel support -======================= - -Since the kernel support is not completely transparent to userspace, -allowing 32-bit tasks to run on an asymmetric 32-bit system requires an -explicit "opt-in" and can be enabled by passing the -``allow_mismatched_32bit_el0`` parameter on the kernel command-line. - -For the remainder of this document we will refer to an *asymmetric -system* to mean an asymmetric 32-bit SoC running Linux with this kernel -command-line option enabled. - -Userspace impact -================ - -32-bit tasks running on an asymmetric system behave in mostly the same -way as on a homogeneous system, with a few key differences relating to -CPU affinity. - -sysfs ------ - -The subset of CPUs capable of running 32-bit tasks is described in -``/sys/devices/system/cpu/aarch32_el0`` and is documented further in -``Documentation/ABI/testing/sysfs-devices-system-cpu``. - -**Note:** CPUs are advertised by this file as they are detected and so -late-onlining of 32-bit-capable CPUs can result in the file contents -being modified by the kernel at runtime. Once advertised, CPUs are never -removed from the file. - -``execve(2)`` -------------- - -On a homogeneous system, the CPU affinity of a task is preserved across -``execve(2)``. This is not always possible on an asymmetric system, -specifically when the new program being executed is 32-bit yet the -affinity mask contains 64-bit-only CPUs. In this situation, the kernel -determines the new affinity mask as follows: - - 1. If the 32-bit-capable subset of the affinity mask is not empty, - then the affinity is restricted to that subset and the old affinity - mask is saved. This saved mask is inherited over ``fork(2)`` and - preserved across ``execve(2)`` of 32-bit programs. - - **Note:** This step does not apply to ``SCHED_DEADLINE`` tasks. - See `SCHED_DEADLINE`_. - - 2. Otherwise, the cpuset hierarchy of the task is walked until an - ancestor is found containing at least one 32-bit-capable CPU. The - affinity of the task is then changed to match the 32-bit-capable - subset of the cpuset determined by the walk. - - 3. On failure (i.e. out of memory), the affinity is changed to the set - of all 32-bit-capable CPUs of which the kernel is aware. - -A subsequent ``execve(2)`` of a 64-bit program by the 32-bit task will -invalidate the affinity mask saved in (1) and attempt to restore the CPU -affinity of the task using the saved mask if it was previously valid. -This restoration may fail due to intervening changes to the deadline -policy or cpuset hierarchy, in which case the ``execve(2)`` continues -with the affinity unchanged. - -Calls to ``sched_setaffinity(2)`` for a 32-bit task will consider only -the 32-bit-capable CPUs of the requested affinity mask. On success, the -affinity for the task is updated and any saved mask from a prior -``execve(2)`` is invalidated. - -``SCHED_DEADLINE`` ------------------- - -Explicit admission of a 32-bit deadline task to the default root domain -(e.g. by calling ``sched_setattr(2)``) is rejected on an asymmetric -32-bit system unless admission control is disabled by writing -1 to -``/proc/sys/kernel/sched_rt_runtime_us``. - -``execve(2)`` of a 32-bit program from a 64-bit deadline task will -return ``-ENOEXEC`` if the root domain for the task contains any -64-bit-only CPUs and admission control is enabled. Concurrent offlining -of 32-bit-capable CPUs may still necessitate the procedure described in -`execve(2)`_, in which case step (1) is skipped and a warning is -emitted on the console. - -**Note:** It is recommended that a set of 32-bit-capable CPUs are placed -into a separate root domain if ``SCHED_DEADLINE`` is to be used with -32-bit tasks on an asymmetric system. Failure to do so is likely to -result in missed deadlines. - -Cpusets -------- - -The affinity of a 32-bit task on an asymmetric system may include CPUs -that are not explicitly allowed by the cpuset to which it is attached. -This can occur as a result of the following two situations: - - - A 64-bit task attached to a cpuset which allows only 64-bit CPUs - executes a 32-bit program. - - - All of the 32-bit-capable CPUs allowed by a cpuset containing a - 32-bit task are offlined. - -In both of these cases, the new affinity is calculated according to step -(2) of the process described in `execve(2)`_ and the cpuset hierarchy is -unchanged irrespective of the cgroup version. - -CPU hotplug ------------ - -On an asymmetric system, the first detected 32-bit-capable CPU is -prevented from being offlined by userspace and any such attempt will -return ``-EPERM``. Note that suspend is still permitted even if the -primary CPU (i.e. CPU 0) is 64-bit-only. - -KVM ---- - -Although KVM will not advertise 32-bit EL0 support to any vCPUs on an -asymmetric system, a broken guest at EL1 could still attempt to execute -32-bit code at EL0. In this case, an exit from a vCPU thread in 32-bit -mode will return to host userspace with an ``exit_reason`` of -``KVM_EXIT_FAIL_ENTRY`` and will remain non-runnable until successfully -re-initialised by a subsequent ``KVM_ARM_VCPU_INIT`` operation. diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst deleted file mode 100644 index ffeccdd6bdac..000000000000 --- a/Documentation/arm64/booting.rst +++ /dev/null @@ -1,431 +0,0 @@ -===================== -Booting AArch64 Linux -===================== - -Author: Will Deacon - -Date : 07 September 2012 - -This document is based on the ARM booting document by Russell King and -is relevant to all public releases of the AArch64 Linux kernel. - -The AArch64 exception model is made up of a number of exception levels -(EL0 - EL3), with EL0, EL1 and EL2 having a secure and a non-secure -counterpart. EL2 is the hypervisor level, EL3 is the highest priority -level and exists only in secure mode. Both are architecturally optional. - -For the purposes of this document, we will use the term `boot loader` -simply to define all software that executes on the CPU(s) before control -is passed to the Linux kernel. This may include secure monitor and -hypervisor code, or it may just be a handful of instructions for -preparing a minimal boot environment. - -Essentially, the boot loader should provide (as a minimum) the -following: - -1. Setup and initialise the RAM -2. Setup the device tree -3. Decompress the kernel image -4. Call the kernel image - - -1. Setup and initialise RAM ---------------------------- - -Requirement: MANDATORY - -The boot loader is expected to find and initialise all RAM that the -kernel will use for volatile data storage in the system. It performs -this in a machine dependent manner. (It may use internal algorithms -to automatically locate and size all RAM, or it may use knowledge of -the RAM in the machine, or any other method the boot loader designer -sees fit.) - - -2. Setup the device tree -------------------------- - -Requirement: MANDATORY - -The device tree blob (dtb) must be placed on an 8-byte boundary and must -not exceed 2 megabytes in size. Since the dtb will be mapped cacheable -using blocks of up to 2 megabytes in size, it must not be placed within -any 2M region which must be mapped with any specific attributes. - -NOTE: versions prior to v4.2 also require that the DTB be placed within -the 512 MB region starting at text_offset bytes below the kernel Image. - -3. Decompress the kernel image ------------------------------- - -Requirement: OPTIONAL - -The AArch64 kernel does not currently provide a decompressor and -therefore requires decompression (gzip etc.) to be performed by the boot -loader if a compressed Image target (e.g. Image.gz) is used. For -bootloaders that do not implement this requirement, the uncompressed -Image target is available instead. - - -4. Call the kernel image ------------------------- - -Requirement: MANDATORY - -The decompressed kernel image contains a 64-byte header as follows:: - - u32 code0; /* Executable code */ - u32 code1; /* Executable code */ - u64 text_offset; /* Image load offset, little endian */ - u64 image_size; /* Effective Image size, little endian */ - u64 flags; /* kernel flags, little endian */ - u64 res2 = 0; /* reserved */ - u64 res3 = 0; /* reserved */ - u64 res4 = 0; /* reserved */ - u32 magic = 0x644d5241; /* Magic number, little endian, "ARM\x64" */ - u32 res5; /* reserved (used for PE COFF offset) */ - - -Header notes: - -- As of v3.17, all fields are little endian unless stated otherwise. - -- code0/code1 are responsible for branching to stext. - -- when booting through EFI, code0/code1 are initially skipped. - res5 is an offset to the PE header and the PE header has the EFI - entry point (efi_stub_entry). When the stub has done its work, it - jumps to code0 to resume the normal boot process. - -- Prior to v3.17, the endianness of text_offset was not specified. In - these cases image_size is zero and text_offset is 0x80000 in the - endianness of the kernel. Where image_size is non-zero image_size is - little-endian and must be respected. Where image_size is zero, - text_offset can be assumed to be 0x80000. - -- The flags field (introduced in v3.17) is a little-endian 64-bit field - composed as follows: - - ============= =============================================================== - Bit 0 Kernel endianness. 1 if BE, 0 if LE. - Bit 1-2 Kernel Page size. - - * 0 - Unspecified. - * 1 - 4K - * 2 - 16K - * 3 - 64K - Bit 3 Kernel physical placement - - 0 - 2MB aligned base should be as close as possible - to the base of DRAM, since memory below it is not - accessible via the linear mapping - 1 - 2MB aligned base such that all image_size bytes - counted from the start of the image are within - the 48-bit addressable range of physical memory - Bits 4-63 Reserved. - ============= =============================================================== - -- When image_size is zero, a bootloader should attempt to keep as much - memory as possible free for use by the kernel immediately after the - end of the kernel image. The amount of space required will vary - depending on selected features, and is effectively unbound. - -The Image must be placed text_offset bytes from a 2MB aligned base -address anywhere in usable system RAM and called there. The region -between the 2 MB aligned base address and the start of the image has no -special significance to the kernel, and may be used for other purposes. -At least image_size bytes from the start of the image must be free for -use by the kernel. -NOTE: versions prior to v4.6 cannot make use of memory below the -physical offset of the Image so it is recommended that the Image be -placed as close as possible to the start of system RAM. - -If an initrd/initramfs is passed to the kernel at boot, it must reside -entirely within a 1 GB aligned physical memory window of up to 32 GB in -size that fully covers the kernel Image as well. - -Any memory described to the kernel (even that below the start of the -image) which is not marked as reserved from the kernel (e.g., with a -memreserve region in the device tree) will be considered as available to -the kernel. - -Before jumping into the kernel, the following conditions must be met: - -- Quiesce all DMA capable devices so that memory does not get - corrupted by bogus network packets or disk data. This will save - you many hours of debug. - -- Primary CPU general-purpose register settings: - - - x0 = physical address of device tree blob (dtb) in system RAM. - - x1 = 0 (reserved for future use) - - x2 = 0 (reserved for future use) - - x3 = 0 (reserved for future use) - -- CPU mode - - All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError, - IRQ and FIQ). - The CPU must be in non-secure state, either in EL2 (RECOMMENDED in order - to have access to the virtualisation extensions), or in EL1. - -- Caches, MMUs - - The MMU must be off. - - The instruction cache may be on or off, and must not hold any stale - entries corresponding to the loaded kernel image. - - The address range corresponding to the loaded kernel image must be - cleaned to the PoC. In the presence of a system cache or other - coherent masters with caches enabled, this will typically require - cache maintenance by VA rather than set/way operations. - System caches which respect the architected cache maintenance by VA - operations must be configured and may be enabled. - System caches which do not respect architected cache maintenance by VA - operations (not recommended) must be configured and disabled. - -- Architected timers - - CNTFRQ must be programmed with the timer frequency and CNTVOFF must - be programmed with a consistent value on all CPUs. If entering the - kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0) set where - available. - -- Coherency - - All CPUs to be booted by the kernel must be part of the same coherency - domain on entry to the kernel. This may require IMPLEMENTATION DEFINED - initialisation to enable the receiving of maintenance operations on - each CPU. - -- System registers - - All writable architected system registers at or below the exception - level where the kernel image will be entered must be initialised by - software at a higher exception level to prevent execution in an UNKNOWN - state. - - For all systems: - - If EL3 is present: - - - SCR_EL3.FIQ must have the same value across all CPUs the kernel is - executing on. - - The value of SCR_EL3.FIQ must be the same as the one present at boot - time whenever the kernel is executing. - - - If EL3 is present and the kernel is entered at EL2: - - - SCR_EL3.HCE (bit 8) must be initialised to 0b1. - - For systems with a GICv3 interrupt controller to be used in v3 mode: - - If EL3 is present: - - - ICC_SRE_EL3.Enable (bit 3) must be initialised to 0b1. - - ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1. - - ICC_CTLR_EL3.PMHE (bit 6) must be set to the same value across - all CPUs the kernel is executing on, and must stay constant - for the lifetime of the kernel. - - - If the kernel is entered at EL1: - - - ICC.SRE_EL2.Enable (bit 3) must be initialised to 0b1 - - ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1. - - - The DT or ACPI tables must describe a GICv3 interrupt controller. - - For systems with a GICv3 interrupt controller to be used in - compatibility (v2) mode: - - - If EL3 is present: - - ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b0. - - - If the kernel is entered at EL1: - - ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b0. - - - The DT or ACPI tables must describe a GICv2 interrupt controller. - - For CPUs with pointer authentication functionality: - - - If EL3 is present: - - - SCR_EL3.APK (bit 16) must be initialised to 0b1 - - SCR_EL3.API (bit 17) must be initialised to 0b1 - - - If the kernel is entered at EL1: - - - HCR_EL2.APK (bit 40) must be initialised to 0b1 - - HCR_EL2.API (bit 41) must be initialised to 0b1 - - For CPUs with Activity Monitors Unit v1 (AMUv1) extension present: - - - If EL3 is present: - - - CPTR_EL3.TAM (bit 30) must be initialised to 0b0 - - CPTR_EL2.TAM (bit 30) must be initialised to 0b0 - - AMCNTENSET0_EL0 must be initialised to 0b1111 - - AMCNTENSET1_EL0 must be initialised to a platform specific value - having 0b1 set for the corresponding bit for each of the auxiliary - counters present. - - - If the kernel is entered at EL1: - - - AMCNTENSET0_EL0 must be initialised to 0b1111 - - AMCNTENSET1_EL0 must be initialised to a platform specific value - having 0b1 set for the corresponding bit for each of the auxiliary - counters present. - - For CPUs with the Fine Grained Traps (FEAT_FGT) extension present: - - - If EL3 is present and the kernel is entered at EL2: - - - SCR_EL3.FGTEn (bit 27) must be initialised to 0b1. - - For CPUs with support for HCRX_EL2 (FEAT_HCX) present: - - - If EL3 is present and the kernel is entered at EL2: - - - SCR_EL3.HXEn (bit 38) must be initialised to 0b1. - - For CPUs with Advanced SIMD and floating point support: - - - If EL3 is present: - - - CPTR_EL3.TFP (bit 10) must be initialised to 0b0. - - - If EL2 is present and the kernel is entered at EL1: - - - CPTR_EL2.TFP (bit 10) must be initialised to 0b0. - - For CPUs with the Scalable Vector Extension (FEAT_SVE) present: - - - if EL3 is present: - - - CPTR_EL3.EZ (bit 8) must be initialised to 0b1. - - - ZCR_EL3.LEN must be initialised to the same value for all CPUs the - kernel is executed on. - - - If the kernel is entered at EL1 and EL2 is present: - - - CPTR_EL2.TZ (bit 8) must be initialised to 0b0. - - - CPTR_EL2.ZEN (bits 17:16) must be initialised to 0b11. - - - ZCR_EL2.LEN must be initialised to the same value for all CPUs the - kernel will execute on. - - For CPUs with the Scalable Matrix Extension (FEAT_SME): - - - If EL3 is present: - - - CPTR_EL3.ESM (bit 12) must be initialised to 0b1. - - - SCR_EL3.EnTP2 (bit 41) must be initialised to 0b1. - - - SMCR_EL3.LEN must be initialised to the same value for all CPUs the - kernel will execute on. - - - If the kernel is entered at EL1 and EL2 is present: - - - CPTR_EL2.TSM (bit 12) must be initialised to 0b0. - - - CPTR_EL2.SMEN (bits 25:24) must be initialised to 0b11. - - - SCTLR_EL2.EnTP2 (bit 60) must be initialised to 0b1. - - - SMCR_EL2.LEN must be initialised to the same value for all CPUs the - kernel will execute on. - - - HWFGRTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01. - - - HWFGWTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01. - - - HWFGRTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01. - - - HWFGWTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01. - - For CPUs with the Scalable Matrix Extension FA64 feature (FEAT_SME_FA64): - - - If EL3 is present: - - - SMCR_EL3.FA64 (bit 31) must be initialised to 0b1. - - - If the kernel is entered at EL1 and EL2 is present: - - - SMCR_EL2.FA64 (bit 31) must be initialised to 0b1. - - For CPUs with the Memory Tagging Extension feature (FEAT_MTE2): - - - If EL3 is present: - - - SCR_EL3.ATA (bit 26) must be initialised to 0b1. - - - If the kernel is entered at EL1 and EL2 is present: - - - HCR_EL2.ATA (bit 56) must be initialised to 0b1. - - For CPUs with the Scalable Matrix Extension version 2 (FEAT_SME2): - - - If EL3 is present: - - - SMCR_EL3.EZT0 (bit 30) must be initialised to 0b1. - - - If the kernel is entered at EL1 and EL2 is present: - - - SMCR_EL2.EZT0 (bit 30) must be initialised to 0b1. - -The requirements described above for CPU mode, caches, MMUs, architected -timers, coherency and system registers apply to all CPUs. All CPUs must -enter the kernel in the same exception level. Where the values documented -disable traps it is permissible for these traps to be enabled so long as -those traps are handled transparently by higher exception levels as though -the values documented were set. - -The boot loader is expected to enter the kernel on each CPU in the -following manner: - -- The primary CPU must jump directly to the first instruction of the - kernel image. The device tree blob passed by this CPU must contain - an 'enable-method' property for each cpu node. The supported - enable-methods are described below. - - It is expected that the bootloader will generate these device tree - properties and insert them into the blob prior to kernel entry. - -- CPUs with a "spin-table" enable-method must have a 'cpu-release-addr' - property in their cpu node. This property identifies a - naturally-aligned 64-bit zero-initalised memory location. - - These CPUs should spin outside of the kernel in a reserved area of - memory (communicated to the kernel by a /memreserve/ region in the - device tree) polling their cpu-release-addr location, which must be - contained in the reserved region. A wfe instruction may be inserted - to reduce the overhead of the busy-loop and a sev will be issued by - the primary CPU. When a read of the location pointed to by the - cpu-release-addr returns a non-zero value, the CPU must jump to this - value. The value will be written as a single 64-bit little-endian - value, so CPUs must convert the read value to their native endianness - before jumping to it. - -- CPUs with a "psci" enable method should remain outside of - the kernel (i.e. outside of the regions of memory described to the - kernel in the memory node, or in a reserved area of memory described - to the kernel by a /memreserve/ region in the device tree). The - kernel will issue CPU_ON calls as described in ARM document number ARM - DEN 0022A ("Power State Coordination Interface System Software on ARM - processors") to bring CPUs into the kernel. - - The device tree should contain a 'psci' node, as described in - Documentation/devicetree/bindings/arm/psci.yaml. - -- Secondary CPU general-purpose register settings - - - x0 = 0 (reserved for future use) - - x1 = 0 (reserved for future use) - - x2 = 0 (reserved for future use) - - x3 = 0 (reserved for future use) diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst deleted file mode 100644 index c7adc7897df6..000000000000 --- a/Documentation/arm64/cpu-feature-registers.rst +++ /dev/null @@ -1,400 +0,0 @@ -=========================== -ARM64 CPU Feature Registers -=========================== - -Author: Suzuki K Poulose - - -This file describes the ABI for exporting the AArch64 CPU ID/feature -registers to userspace. The availability of this ABI is advertised -via the HWCAP_CPUID in HWCAPs. - -1. Motivation -------------- - -The ARM architecture defines a set of feature registers, which describe -the capabilities of the CPU/system. Access to these system registers is -restricted from EL0 and there is no reliable way for an application to -extract this information to make better decisions at runtime. There is -limited information available to the application via HWCAPs, however -there are some issues with their usage. - - a) Any change to the HWCAPs requires an update to userspace (e.g libc) - to detect the new changes, which can take a long time to appear in - distributions. Exposing the registers allows applications to get the - information without requiring updates to the toolchains. - - b) Access to HWCAPs is sometimes limited (e.g prior to libc, or - when ld is initialised at startup time). - - c) HWCAPs cannot represent non-boolean information effectively. The - architecture defines a canonical format for representing features - in the ID registers; this is well defined and is capable of - representing all valid architecture variations. - - -2. Requirements ---------------- - - a) Safety: - - Applications should be able to use the information provided by the - infrastructure to run safely across the system. This has greater - implications on a system with heterogeneous CPUs. - The infrastructure exports a value that is safe across all the - available CPU on the system. - - e.g, If at least one CPU doesn't implement CRC32 instructions, while - others do, we should report that the CRC32 is not implemented. - Otherwise an application could crash when scheduled on the CPU - which doesn't support CRC32. - - b) Security: - - Applications should only be able to receive information that is - relevant to the normal operation in userspace. Hence, some of the - fields are masked out(i.e, made invisible) and their values are set to - indicate the feature is 'not supported'. See Section 4 for the list - of visible features. Also, the kernel may manipulate the fields - based on what it supports. e.g, If FP is not supported by the - kernel, the values could indicate that the FP is not available - (even when the CPU provides it). - - c) Implementation Defined Features - - The infrastructure doesn't expose any register which is - IMPLEMENTATION DEFINED as per ARMv8-A Architecture. - - d) CPU Identification: - - MIDR_EL1 is exposed to help identify the processor. On a - heterogeneous system, this could be racy (just like getcpu()). The - process could be migrated to another CPU by the time it uses the - register value, unless the CPU affinity is set. Hence, there is no - guarantee that the value reflects the processor that it is - currently executing on. The REVIDR is not exposed due to this - constraint, as REVIDR makes sense only in conjunction with the - MIDR. Alternately, MIDR_EL1 and REVIDR_EL1 are exposed via sysfs - at:: - - /sys/devices/system/cpu/cpu$ID/regs/identification/ - \- midr - \- revidr - -3. Implementation --------------------- - -The infrastructure is built on the emulation of the 'MRS' instruction. -Accessing a restricted system register from an application generates an -exception and ends up in SIGILL being delivered to the process. -The infrastructure hooks into the exception handler and emulates the -operation if the source belongs to the supported system register space. - -The infrastructure emulates only the following system register space:: - - Op0=3, Op1=0, CRn=0, CRm=0,2,3,4,5,6,7 - -(See Table C5-6 'System instruction encodings for non-Debug System -register accesses' in ARMv8 ARM DDI 0487A.h, for the list of -registers). - -The following rules are applied to the value returned by the -infrastructure: - - a) The value of an 'IMPLEMENTATION DEFINED' field is set to 0. - b) The value of a reserved field is populated with the reserved - value as defined by the architecture. - c) The value of a 'visible' field holds the system wide safe value - for the particular feature (except for MIDR_EL1, see section 4). - d) All other fields (i.e, invisible fields) are set to indicate - the feature is missing (as defined by the architecture). - -4. List of registers with visible features -------------------------------------------- - - 1) ID_AA64ISAR0_EL1 - Instruction Set Attribute Register 0 - - +------------------------------+---------+---------+ - | Name | bits | visible | - +------------------------------+---------+---------+ - | RNDR | [63-60] | y | - +------------------------------+---------+---------+ - | TS | [55-52] | y | - +------------------------------+---------+---------+ - | FHM | [51-48] | y | - +------------------------------+---------+---------+ - | DP | [47-44] | y | - +------------------------------+---------+---------+ - | SM4 | [43-40] | y | - +------------------------------+---------+---------+ - | SM3 | [39-36] | y | - +------------------------------+---------+---------+ - | SHA3 | [35-32] | y | - +------------------------------+---------+---------+ - | RDM | [31-28] | y | - +------------------------------+---------+---------+ - | ATOMICS | [23-20] | y | - +------------------------------+---------+---------+ - | CRC32 | [19-16] | y | - +------------------------------+---------+---------+ - | SHA2 | [15-12] | y | - +------------------------------+---------+---------+ - | SHA1 | [11-8] | y | - +------------------------------+---------+---------+ - | AES | [7-4] | y | - +------------------------------+---------+---------+ - - - 2) ID_AA64PFR0_EL1 - Processor Feature Register 0 - - +------------------------------+---------+---------+ - | Name | bits | visible | - +------------------------------+---------+---------+ - | DIT | [51-48] | y | - +------------------------------+---------+---------+ - | SVE | [35-32] | y | - +------------------------------+---------+---------+ - | GIC | [27-24] | n | - +------------------------------+---------+---------+ - | AdvSIMD | [23-20] | y | - +------------------------------+---------+---------+ - | FP | [19-16] | y | - +------------------------------+---------+---------+ - | EL3 | [15-12] | n | - +------------------------------+---------+---------+ - | EL2 | [11-8] | n | - +------------------------------+---------+---------+ - | EL1 | [7-4] | n | - +------------------------------+---------+---------+ - | EL0 | [3-0] | n | - +------------------------------+---------+---------+ - - - 3) ID_AA64PFR1_EL1 - Processor Feature Register 1 - - +------------------------------+---------+---------+ - | Name | bits | visible | - +------------------------------+---------+---------+ - | MTE | [11-8] | y | - +------------------------------+---------+---------+ - | SSBS | [7-4] | y | - +------------------------------+---------+---------+ - | BT | [3-0] | y | - +------------------------------+---------+---------+ - - - 4) MIDR_EL1 - Main ID Register - - +------------------------------+---------+---------+ - | Name | bits | visible | - +------------------------------+---------+---------+ - | Implementer | [31-24] | y | - +------------------------------+---------+---------+ - | Variant | [23-20] | y | - +------------------------------+---------+---------+ - | Architecture | [19-16] | y | - +------------------------------+---------+---------+ - | PartNum | [15-4] | y | - +------------------------------+---------+---------+ - | Revision | [3-0] | y | - +------------------------------+---------+---------+ - - NOTE: The 'visible' fields of MIDR_EL1 will contain the value - as available on the CPU where it is fetched and is not a system - wide safe value. - - 5) ID_AA64ISAR1_EL1 - Instruction set attribute register 1 - - +------------------------------+---------+---------+ - | Name | bits | visible | - +------------------------------+---------+---------+ - | I8MM | [55-52] | y | - +------------------------------+---------+---------+ - | DGH | [51-48] | y | - +------------------------------+---------+---------+ - | BF16 | [47-44] | y | - +------------------------------+---------+---------+ - | SB | [39-36] | y | - +------------------------------+---------+---------+ - | FRINTTS | [35-32] | y | - +------------------------------+---------+---------+ - | GPI | [31-28] | y | - +------------------------------+---------+---------+ - | GPA | [27-24] | y | - +------------------------------+---------+---------+ - | LRCPC | [23-20] | y | - +------------------------------+---------+---------+ - | FCMA | [19-16] | y | - +------------------------------+---------+---------+ - | JSCVT | [15-12] | y | - +------------------------------+---------+---------+ - | API | [11-8] | y | - +------------------------------+---------+---------+ - | APA | [7-4] | y | - +------------------------------+---------+---------+ - | DPB | [3-0] | y | - +------------------------------+---------+---------+ - - 6) ID_AA64MMFR0_EL1 - Memory model feature register 0 - - +------------------------------+---------+---------+ - | Name | bits | visible | - +------------------------------+---------+---------+ - | ECV | [63-60] | y | - +------------------------------+---------+---------+ - - 7) ID_AA64MMFR2_EL1 - Memory model feature register 2 - - +------------------------------+---------+---------+ - | Name | bits | visible | - +------------------------------+---------+---------+ - | AT | [35-32] | y | - +------------------------------+---------+---------+ - - 8) ID_AA64ZFR0_EL1 - SVE feature ID register 0 - - +------------------------------+---------+---------+ - | Name | bits | visible | - +------------------------------+---------+---------+ - | F64MM | [59-56] | y | - +------------------------------+---------+---------+ - | F32MM | [55-52] | y | - +------------------------------+---------+---------+ - | I8MM | [47-44] | y | - +------------------------------+---------+---------+ - | SM4 | [43-40] | y | - +------------------------------+---------+---------+ - | SHA3 | [35-32] | y | - +------------------------------+---------+---------+ - | BF16 | [23-20] | y | - +------------------------------+---------+---------+ - | BitPerm | [19-16] | y | - +------------------------------+---------+---------+ - | AES | [7-4] | y | - +------------------------------+---------+---------+ - | SVEVer | [3-0] | y | - +------------------------------+---------+---------+ - - 8) ID_AA64MMFR1_EL1 - Memory model feature register 1 - - +------------------------------+---------+---------+ - | Name | bits | visible | - +------------------------------+---------+---------+ - | AFP | [47-44] | y | - +------------------------------+---------+---------+ - - 9) ID_AA64ISAR2_EL1 - Instruction set attribute register 2 - - +------------------------------+---------+---------+ - | Name | bits | visible | - +------------------------------+---------+---------+ - | RPRES | [7-4] | y | - +------------------------------+---------+---------+ - | WFXT | [3-0] | y | - +------------------------------+---------+---------+ - - 10) MVFR0_EL1 - AArch32 Media and VFP Feature Register 0 - - +------------------------------+---------+---------+ - | Name | bits | visible | - +------------------------------+---------+---------+ - | FPDP | [11-8] | y | - +------------------------------+---------+---------+ - - 11) MVFR1_EL1 - AArch32 Media and VFP Feature Register 1 - - +------------------------------+---------+---------+ - | Name | bits | visible | - +------------------------------+---------+---------+ - | SIMDFMAC | [31-28] | y | - +------------------------------+---------+---------+ - | SIMDSP | [19-16] | y | - +------------------------------+---------+---------+ - | SIMDInt | [15-12] | y | - +------------------------------+---------+---------+ - | SIMDLS | [11-8] | y | - +------------------------------+---------+---------+ - - 12) ID_ISAR5_EL1 - AArch32 Instruction Set Attribute Register 5 - - +------------------------------+---------+---------+ - | Name | bits | visible | - +------------------------------+---------+---------+ - | CRC32 | [19-16] | y | - +------------------------------+---------+---------+ - | SHA2 | [15-12] | y | - +------------------------------+---------+---------+ - | SHA1 | [11-8] | y | - +------------------------------+---------+---------+ - | AES | [7-4] | y | - +------------------------------+---------+---------+ - - -Appendix I: Example -------------------- - -:: - - /* - * Sample program to demonstrate the MRS emulation ABI. - * - * Copyright (C) 2015-2016, ARM Ltd - * - * Author: Suzuki K Poulose - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - */ - - #include - #include - #include - - #define get_cpu_ftr(id) ({ \ - unsigned long __val; \ - asm("mrs %0, "#id : "=r" (__val)); \ - printf("%-20s: 0x%016lx\n", #id, __val); \ - }) - - int main(void) - { - - if (!(getauxval(AT_HWCAP) & HWCAP_CPUID)) { - fputs("CPUID registers unavailable\n", stderr); - return 1; - } - - get_cpu_ftr(ID_AA64ISAR0_EL1); - get_cpu_ftr(ID_AA64ISAR1_EL1); - get_cpu_ftr(ID_AA64MMFR0_EL1); - get_cpu_ftr(ID_AA64MMFR1_EL1); - get_cpu_ftr(ID_AA64PFR0_EL1); - get_cpu_ftr(ID_AA64PFR1_EL1); - get_cpu_ftr(ID_AA64DFR0_EL1); - get_cpu_ftr(ID_AA64DFR1_EL1); - - get_cpu_ftr(MIDR_EL1); - get_cpu_ftr(MPIDR_EL1); - get_cpu_ftr(REVIDR_EL1); - - #if 0 - /* Unexposed register access causes SIGILL */ - get_cpu_ftr(ID_MMFR0_EL1); - #endif - - return 0; - } diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst deleted file mode 100644 index 83e57e4d38e2..000000000000 --- a/Documentation/arm64/elf_hwcaps.rst +++ /dev/null @@ -1,309 +0,0 @@ -.. _elf_hwcaps_index: - -================ -ARM64 ELF hwcaps -================ - -This document describes the usage and semantics of the arm64 ELF hwcaps. - - -1. Introduction ---------------- - -Some hardware or software features are only available on some CPU -implementations, and/or with certain kernel configurations, but have no -architected discovery mechanism available to userspace code at EL0. The -kernel exposes the presence of these features to userspace through a set -of flags called hwcaps, exposed in the auxiliary vector. - -Userspace software can test for features by acquiring the AT_HWCAP or -AT_HWCAP2 entry of the auxiliary vector, and testing whether the relevant -flags are set, e.g.:: - - bool floating_point_is_present(void) - { - unsigned long hwcaps = getauxval(AT_HWCAP); - if (hwcaps & HWCAP_FP) - return true; - - return false; - } - -Where software relies on a feature described by a hwcap, it should check -the relevant hwcap flag to verify that the feature is present before -attempting to make use of the feature. - -Features cannot be probed reliably through other means. When a feature -is not available, attempting to use it may result in unpredictable -behaviour, and is not guaranteed to result in any reliable indication -that the feature is unavailable, such as a SIGILL. - - -2. Interpretation of hwcaps ---------------------------- - -The majority of hwcaps are intended to indicate the presence of features -which are described by architected ID registers inaccessible to -userspace code at EL0. These hwcaps are defined in terms of ID register -fields, and should be interpreted with reference to the definition of -these fields in the ARM Architecture Reference Manual (ARM ARM). - -Such hwcaps are described below in the form:: - - Functionality implied by idreg.field == val. - -Such hwcaps indicate the availability of functionality that the ARM ARM -defines as being present when idreg.field has value val, but do not -indicate that idreg.field is precisely equal to val, nor do they -indicate the absence of functionality implied by other values of -idreg.field. - -Other hwcaps may indicate the presence of features which cannot be -described by ID registers alone. These may be described without -reference to ID registers, and may refer to other documentation. - - -3. The hwcaps exposed in AT_HWCAP ---------------------------------- - -HWCAP_FP - Functionality implied by ID_AA64PFR0_EL1.FP == 0b0000. - -HWCAP_ASIMD - Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0000. - -HWCAP_EVTSTRM - The generic timer is configured to generate events at a frequency of - approximately 10KHz. - -HWCAP_AES - Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0001. - -HWCAP_PMULL - Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0010. - -HWCAP_SHA1 - Functionality implied by ID_AA64ISAR0_EL1.SHA1 == 0b0001. - -HWCAP_SHA2 - Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0001. - -HWCAP_CRC32 - Functionality implied by ID_AA64ISAR0_EL1.CRC32 == 0b0001. - -HWCAP_ATOMICS - Functionality implied by ID_AA64ISAR0_EL1.Atomic == 0b0010. - -HWCAP_FPHP - Functionality implied by ID_AA64PFR0_EL1.FP == 0b0001. - -HWCAP_ASIMDHP - Functionality implied by ID_AA64PFR0_EL1.AdvSIMD == 0b0001. - -HWCAP_CPUID - EL0 access to certain ID registers is available, to the extent - described by Documentation/arm64/cpu-feature-registers.rst. - - These ID registers may imply the availability of features. - -HWCAP_ASIMDRDM - Functionality implied by ID_AA64ISAR0_EL1.RDM == 0b0001. - -HWCAP_JSCVT - Functionality implied by ID_AA64ISAR1_EL1.JSCVT == 0b0001. - -HWCAP_FCMA - Functionality implied by ID_AA64ISAR1_EL1.FCMA == 0b0001. - -HWCAP_LRCPC - Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0001. - -HWCAP_DCPOP - Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0001. - -HWCAP_SHA3 - Functionality implied by ID_AA64ISAR0_EL1.SHA3 == 0b0001. - -HWCAP_SM3 - Functionality implied by ID_AA64ISAR0_EL1.SM3 == 0b0001. - -HWCAP_SM4 - Functionality implied by ID_AA64ISAR0_EL1.SM4 == 0b0001. - -HWCAP_ASIMDDP - Functionality implied by ID_AA64ISAR0_EL1.DP == 0b0001. - -HWCAP_SHA512 - Functionality implied by ID_AA64ISAR0_EL1.SHA2 == 0b0010. - -HWCAP_SVE - Functionality implied by ID_AA64PFR0_EL1.SVE == 0b0001. - -HWCAP_ASIMDFHM - Functionality implied by ID_AA64ISAR0_EL1.FHM == 0b0001. - -HWCAP_DIT - Functionality implied by ID_AA64PFR0_EL1.DIT == 0b0001. - -HWCAP_USCAT - Functionality implied by ID_AA64MMFR2_EL1.AT == 0b0001. - -HWCAP_ILRCPC - Functionality implied by ID_AA64ISAR1_EL1.LRCPC == 0b0010. - -HWCAP_FLAGM - Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0001. - -HWCAP_SSBS - Functionality implied by ID_AA64PFR1_EL1.SSBS == 0b0010. - -HWCAP_SB - Functionality implied by ID_AA64ISAR1_EL1.SB == 0b0001. - -HWCAP_PACA - Functionality implied by ID_AA64ISAR1_EL1.APA == 0b0001 or - ID_AA64ISAR1_EL1.API == 0b0001, as described by - Documentation/arm64/pointer-authentication.rst. - -HWCAP_PACG - Functionality implied by ID_AA64ISAR1_EL1.GPA == 0b0001 or - ID_AA64ISAR1_EL1.GPI == 0b0001, as described by - Documentation/arm64/pointer-authentication.rst. - -HWCAP2_DCPODP - Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0010. - -HWCAP2_SVE2 - Functionality implied by ID_AA64ZFR0_EL1.SVEVer == 0b0001. - -HWCAP2_SVEAES - Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0001. - -HWCAP2_SVEPMULL - Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0010. - -HWCAP2_SVEBITPERM - Functionality implied by ID_AA64ZFR0_EL1.BitPerm == 0b0001. - -HWCAP2_SVESHA3 - Functionality implied by ID_AA64ZFR0_EL1.SHA3 == 0b0001. - -HWCAP2_SVESM4 - Functionality implied by ID_AA64ZFR0_EL1.SM4 == 0b0001. - -HWCAP2_FLAGM2 - Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0010. - -HWCAP2_FRINT - Functionality implied by ID_AA64ISAR1_EL1.FRINTTS == 0b0001. - -HWCAP2_SVEI8MM - Functionality implied by ID_AA64ZFR0_EL1.I8MM == 0b0001. - -HWCAP2_SVEF32MM - Functionality implied by ID_AA64ZFR0_EL1.F32MM == 0b0001. - -HWCAP2_SVEF64MM - Functionality implied by ID_AA64ZFR0_EL1.F64MM == 0b0001. - -HWCAP2_SVEBF16 - Functionality implied by ID_AA64ZFR0_EL1.BF16 == 0b0001. - -HWCAP2_I8MM - Functionality implied by ID_AA64ISAR1_EL1.I8MM == 0b0001. - -HWCAP2_BF16 - Functionality implied by ID_AA64ISAR1_EL1.BF16 == 0b0001. - -HWCAP2_DGH - Functionality implied by ID_AA64ISAR1_EL1.DGH == 0b0001. - -HWCAP2_RNG - Functionality implied by ID_AA64ISAR0_EL1.RNDR == 0b0001. - -HWCAP2_BTI - Functionality implied by ID_AA64PFR0_EL1.BT == 0b0001. - -HWCAP2_MTE - Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0010, as described - by Documentation/arm64/memory-tagging-extension.rst. - -HWCAP2_ECV - Functionality implied by ID_AA64MMFR0_EL1.ECV == 0b0001. - -HWCAP2_AFP - Functionality implied by ID_AA64MFR1_EL1.AFP == 0b0001. - -HWCAP2_RPRES - Functionality implied by ID_AA64ISAR2_EL1.RPRES == 0b0001. - -HWCAP2_MTE3 - Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0011, as described - by Documentation/arm64/memory-tagging-extension.rst. - -HWCAP2_SME - Functionality implied by ID_AA64PFR1_EL1.SME == 0b0001, as described - by Documentation/arm64/sme.rst. - -HWCAP2_SME_I16I64 - Functionality implied by ID_AA64SMFR0_EL1.I16I64 == 0b1111. - -HWCAP2_SME_F64F64 - Functionality implied by ID_AA64SMFR0_EL1.F64F64 == 0b1. - -HWCAP2_SME_I8I32 - Functionality implied by ID_AA64SMFR0_EL1.I8I32 == 0b1111. - -HWCAP2_SME_F16F32 - Functionality implied by ID_AA64SMFR0_EL1.F16F32 == 0b1. - -HWCAP2_SME_B16F32 - Functionality implied by ID_AA64SMFR0_EL1.B16F32 == 0b1. - -HWCAP2_SME_F32F32 - Functionality implied by ID_AA64SMFR0_EL1.F32F32 == 0b1. - -HWCAP2_SME_FA64 - Functionality implied by ID_AA64SMFR0_EL1.FA64 == 0b1. - -HWCAP2_WFXT - Functionality implied by ID_AA64ISAR2_EL1.WFXT == 0b0010. - -HWCAP2_EBF16 - Functionality implied by ID_AA64ISAR1_EL1.BF16 == 0b0010. - -HWCAP2_SVE_EBF16 - Functionality implied by ID_AA64ZFR0_EL1.BF16 == 0b0010. - -HWCAP2_CSSC - Functionality implied by ID_AA64ISAR2_EL1.CSSC == 0b0001. - -HWCAP2_RPRFM - Functionality implied by ID_AA64ISAR2_EL1.RPRFM == 0b0001. - -HWCAP2_SVE2P1 - Functionality implied by ID_AA64ZFR0_EL1.SVEver == 0b0010. - -HWCAP2_SME2 - Functionality implied by ID_AA64SMFR0_EL1.SMEver == 0b0001. - -HWCAP2_SME2P1 - Functionality implied by ID_AA64SMFR0_EL1.SMEver == 0b0010. - -HWCAP2_SMEI16I32 - Functionality implied by ID_AA64SMFR0_EL1.I16I32 == 0b0101 - -HWCAP2_SMEBI32I32 - Functionality implied by ID_AA64SMFR0_EL1.BI32I32 == 0b1 - -HWCAP2_SMEB16B16 - Functionality implied by ID_AA64SMFR0_EL1.B16B16 == 0b1 - -HWCAP2_SMEF16F16 - Functionality implied by ID_AA64SMFR0_EL1.F16F16 == 0b1 - -4. Unused AT_HWCAP bits ------------------------ - -For interoperation with userspace, the kernel guarantees that bits 62 -and 63 of AT_HWCAP will always be returned as 0. diff --git a/Documentation/arm64/features.rst b/Documentation/arm64/features.rst deleted file mode 100644 index dfa4cb3cd3ef..000000000000 --- a/Documentation/arm64/features.rst +++ /dev/null @@ -1,3 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -.. kernel-feat:: $srctree/Documentation/features arm64 diff --git a/Documentation/arm64/hugetlbpage.rst b/Documentation/arm64/hugetlbpage.rst deleted file mode 100644 index a110124c11e3..000000000000 --- a/Documentation/arm64/hugetlbpage.rst +++ /dev/null @@ -1,43 +0,0 @@ -.. _hugetlbpage_index: - -==================== -HugeTLBpage on ARM64 -==================== - -Hugepage relies on making efficient use of TLBs to improve performance of -address translations. The benefit depends on both - - - - the size of hugepages - - size of entries supported by the TLBs - -The ARM64 port supports two flavours of hugepages. - -1) Block mappings at the pud/pmd level --------------------------------------- - -These are regular hugepages where a pmd or a pud page table entry points to a -block of memory. Regardless of the supported size of entries in TLB, block -mappings reduce the depth of page table walk needed to translate hugepage -addresses. - -2) Using the Contiguous bit ---------------------------- - -The architecture provides a contiguous bit in the translation table entries -(D4.5.3, ARM DDI 0487C.a) that hints to the MMU to indicate that it is one of a -contiguous set of entries that can be cached in a single TLB entry. - -The contiguous bit is used in Linux to increase the mapping size at the pmd and -pte (last) level. The number of supported contiguous entries varies by page size -and level of the page table. - - -The following hugepage sizes are supported - - - ====== ======== ==== ======== === - - CONT PTE PMD CONT PMD PUD - ====== ======== ==== ======== === - 4K: 64K 2M 32M 1G - 16K: 2M 32M 1G - 64K: 2M 512M 16G - ====== ======== ==== ======== === diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst deleted file mode 100644 index ae21f8118830..000000000000 --- a/Documentation/arm64/index.rst +++ /dev/null @@ -1,36 +0,0 @@ -.. _arm64_index: - -================== -ARM64 Architecture -================== - -.. toctree:: - :maxdepth: 1 - - acpi_object_usage - amu - arm-acpi - asymmetric-32bit - booting - cpu-feature-registers - elf_hwcaps - hugetlbpage - legacy_instructions - memory - memory-tagging-extension - perf - pointer-authentication - silicon-errata - sme - sve - tagged-address-abi - tagged-pointers - - features - -.. only:: subproject and html - - Indices - ======= - - * :ref:`genindex` diff --git a/Documentation/arm64/kasan-offsets.sh b/Documentation/arm64/kasan-offsets.sh deleted file mode 100644 index 2dc5f9e18039..000000000000 --- a/Documentation/arm64/kasan-offsets.sh +++ /dev/null @@ -1,26 +0,0 @@ -#!/bin/sh - -# Print out the KASAN_SHADOW_OFFSETS required to place the KASAN SHADOW -# start address at the top of the linear region - -print_kasan_offset () { - printf "%02d\t" $1 - printf "0x%08x00000000\n" $(( (0xffffffff & (-1 << ($1 - 1 - 32))) \ - - (1 << (64 - 32 - $2)) )) -} - -echo KASAN_SHADOW_SCALE_SHIFT = 3 -printf "VABITS\tKASAN_SHADOW_OFFSET\n" -print_kasan_offset 48 3 -print_kasan_offset 47 3 -print_kasan_offset 42 3 -print_kasan_offset 39 3 -print_kasan_offset 36 3 -echo -echo KASAN_SHADOW_SCALE_SHIFT = 4 -printf "VABITS\tKASAN_SHADOW_OFFSET\n" -print_kasan_offset 48 4 -print_kasan_offset 47 4 -print_kasan_offset 42 4 -print_kasan_offset 39 4 -print_kasan_offset 36 4 diff --git a/Documentation/arm64/legacy_instructions.rst b/Documentation/arm64/legacy_instructions.rst deleted file mode 100644 index 54401b22cb8f..000000000000 --- a/Documentation/arm64/legacy_instructions.rst +++ /dev/null @@ -1,68 +0,0 @@ -=================== -Legacy instructions -=================== - -The arm64 port of the Linux kernel provides infrastructure to support -emulation of instructions which have been deprecated, or obsoleted in -the architecture. The infrastructure code uses undefined instruction -hooks to support emulation. Where available it also allows turning on -the instruction execution in hardware. - -The emulation mode can be controlled by writing to sysctl nodes -(/proc/sys/abi). The following explains the different execution -behaviours and the corresponding values of the sysctl nodes - - -* Undef - Value: 0 - - Generates undefined instruction abort. Default for instructions that - have been obsoleted in the architecture, e.g., SWP - -* Emulate - Value: 1 - - Uses software emulation. To aid migration of software, in this mode - usage of emulated instruction is traced as well as rate limited - warnings are issued. This is the default for deprecated - instructions, .e.g., CP15 barriers - -* Hardware Execution - Value: 2 - - Although marked as deprecated, some implementations may support the - enabling/disabling of hardware support for the execution of these - instructions. Using hardware execution generally provides better - performance, but at the loss of ability to gather runtime statistics - about the use of the deprecated instructions. - -The default mode depends on the status of the instruction in the -architecture. Deprecated instructions should default to emulation -while obsolete instructions must be undefined by default. - -Note: Instruction emulation may not be possible in all cases. See -individual instruction notes for further information. - -Supported legacy instructions ------------------------------ -* SWP{B} - -:Node: /proc/sys/abi/swp -:Status: Obsolete -:Default: Undef (0) - -* CP15 Barriers - -:Node: /proc/sys/abi/cp15_barrier -:Status: Deprecated -:Default: Emulate (1) - -* SETEND - -:Node: /proc/sys/abi/setend -:Status: Deprecated -:Default: Emulate (1)* - - Note: All the cpus on the system must have mixed endian support at EL0 - for this feature to be enabled. If a new CPU - which doesn't support mixed - endian - is hotplugged in after this feature has been enabled, there could - be unexpected results in the application. diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst deleted file mode 100644 index dbae47bba25e..000000000000 --- a/Documentation/arm64/memory-tagging-extension.rst +++ /dev/null @@ -1,375 +0,0 @@ -=============================================== -Memory Tagging Extension (MTE) in AArch64 Linux -=============================================== - -Authors: Vincenzo Frascino - Catalin Marinas - -Date: 2020-02-25 - -This document describes the provision of the Memory Tagging Extension -functionality in AArch64 Linux. - -Introduction -============ - -ARMv8.5 based processors introduce the Memory Tagging Extension (MTE) -feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI -(Top Byte Ignore) feature and allows software to access a 4-bit -allocation tag for each 16-byte granule in the physical address space. -Such memory range must be mapped with the Normal-Tagged memory -attribute. A logical tag is derived from bits 59-56 of the virtual -address used for the memory access. A CPU with MTE enabled will compare -the logical tag against the allocation tag and potentially raise an -exception on mismatch, subject to system registers configuration. - -Userspace Support -================= - -When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is -supported by the hardware, the kernel advertises the feature to -userspace via ``HWCAP2_MTE``. - -PROT_MTE --------- - -To access the allocation tags, a user process must enable the Tagged -memory attribute on an address range using a new ``prot`` flag for -``mmap()`` and ``mprotect()``: - -``PROT_MTE`` - Pages allow access to the MTE allocation tags. - -The allocation tag is set to 0 when such pages are first mapped in the -user address space and preserved on copy-on-write. ``MAP_SHARED`` is -supported and the allocation tags can be shared between processes. - -**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and -RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other -types of mapping will result in ``-EINVAL`` returned by these system -calls. - -**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot -be cleared by ``mprotect()``. - -**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and -``MADV_FREE`` may have the allocation tags cleared (set to 0) at any -point after the system call. - -Tag Check Faults ----------------- - -When ``PROT_MTE`` is enabled on an address range and a mismatch between -the logical and allocation tags occurs on access, there are three -configurable behaviours: - -- *Ignore* - This is the default mode. The CPU (and kernel) ignores the - tag check fault. - -- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with - ``.si_code = SEGV_MTESERR`` and ``.si_addr = ``. The - memory access is not performed. If ``SIGSEGV`` is ignored or blocked - by the offending thread, the containing process is terminated with a - ``coredump``. - -- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending - thread, asynchronously following one or multiple tag check faults, - with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting - address is unknown). - -- *Asymmetric* - Reads are handled as for synchronous mode while writes - are handled as for asynchronous mode. - -The user can select the above modes, per thread, using the -``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags`` -contains any number of the following values in the ``PR_MTE_TCF_MASK`` -bit-field: - -- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults - (ignored if combined with other options) -- ``PR_MTE_TCF_SYNC`` - *Synchronous* tag check fault mode -- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode - -If no modes are specified, tag check faults are ignored. If a single -mode is specified, the program will run in that mode. If multiple -modes are specified, the mode is selected as described in the "Per-CPU -preferred tag checking modes" section below. - -The current tag check fault configuration can be read using the -``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call. If -multiple modes were requested then all will be reported. - -Tag checking can also be disabled for a user thread by setting the -``PSTATE.TCO`` bit with ``MSR TCO, #1``. - -**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``, -irrespective of the interrupted context. ``PSTATE.TCO`` is restored on -``sigreturn()``. - -**Note**: There are no *match-all* logical tags available for user -applications. - -**Note**: Kernel accesses to the user address space (e.g. ``read()`` -system call) are not checked if the user thread tag checking mode is -``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is -``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user -address accesses, however it cannot always guarantee it. Kernel accesses -to user addresses are always performed with an effective ``PSTATE.TCO`` -value of zero, regardless of the user configuration. - -Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions ------------------------------------------------------------------ - -The architecture allows excluding certain tags to be randomly generated -via the ``GCR_EL1.Exclude`` register bit-field. By default, Linux -excludes all tags other than 0. A user thread can enable specific tags -in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL, -flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap -in the ``PR_MTE_TAG_MASK`` bit-field. - -**Note**: The hardware uses an exclude mask but the ``prctl()`` -interface provides an include mask. An include mask of ``0`` (exclusion -mask ``0xffff``) results in the CPU always generating tag ``0``. - -Per-CPU preferred tag checking mode ------------------------------------ - -On some CPUs the performance of MTE in stricter tag checking modes -is similar to that of less strict tag checking modes. This makes it -worthwhile to enable stricter checks on those CPUs when a less strict -checking mode is requested, in order to gain the error detection -benefits of the stricter checks without the performance downsides. To -support this scenario, a privileged user may configure a stricter -tag checking mode as the CPU's preferred tag checking mode. - -The preferred tag checking mode for each CPU is controlled by -``/sys/devices/system/cpu/cpu/mte_tcf_preferred``, to which a -privileged user may write the value ``async``, ``sync`` or ``asymm``. The -default preferred mode for each CPU is ``async``. - -To allow a program to potentially run in the CPU's preferred tag -checking mode, the user program may set multiple tag check fault mode -bits in the ``flags`` argument to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, -flags, 0, 0, 0)`` system call. If both synchronous and asynchronous -modes are requested then asymmetric mode may also be selected by the -kernel. If the CPU's preferred tag checking mode is in the task's set -of provided tag checking modes, that mode will be selected. Otherwise, -one of the modes in the task's mode will be selected by the kernel -from the task's mode set using the preference order: - - 1. Asynchronous - 2. Asymmetric - 3. Synchronous - -Note that there is no way for userspace to request multiple modes and -also disable asymmetric mode. - -Initial process state ---------------------- - -On ``execve()``, the new process has the following configuration: - -- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) -- No tag checking modes are selected (tag check faults ignored) -- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) -- ``PSTATE.TCO`` set to 0 -- ``PROT_MTE`` not set on any of the initial memory maps - -On ``fork()``, the new process inherits the parent's configuration and -memory map attributes with the exception of the ``madvise()`` ranges -with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set -to 0). - -The ``ptrace()`` interface --------------------------- - -``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read -the tags from or set the tags to a tracee's address space. The -``ptrace()`` system call is invoked as ``ptrace(request, pid, addr, -data)`` where: - -- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``. -- ``pid`` - the tracee's PID. -- ``addr`` - address in the tracee's address space. -- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to - a buffer of ``iov_len`` length in the tracer's address space. - -The tags in the tracer's ``iov_base`` buffer are represented as one -4-bit tag per byte and correspond to a 16-byte MTE tag granule in the -tracee's address space. - -**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel -will use the corresponding aligned address. - -``ptrace()`` return value: - -- 0 - tags were copied, the tracer's ``iov_len`` was updated to the - number of tags transferred. This may be smaller than the requested - ``iov_len`` if the requested address range in the tracee's or the - tracer's space cannot be accessed or does not have valid tags. -- ``-EPERM`` - the specified process cannot be traced. -- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid - address) and no tags copied. ``iov_len`` not updated. -- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec`` - or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated. -- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never - mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated. - -**Note**: There are no transient errors for the requests above, so user -programs should not retry in case of a non-zero system call return. - -``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with ``addr == -``NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged -address ABI control and MTE configuration of a process as per the -``prctl()`` options described in -Documentation/arm64/tagged-address-abi.rst and above. The corresponding -``regset`` is 1 element of 8 bytes (``sizeof(long))``). - -Core dump support ------------------ - -The allocation tags for user memory mapped with ``PROT_MTE`` are dumped -in the core file as additional ``PT_AARCH64_MEMTAG_MTE`` segments. The -program header for such segment is defined as: - -:``p_type``: ``PT_AARCH64_MEMTAG_MTE`` -:``p_flags``: 0 -:``p_offset``: segment file offset -:``p_vaddr``: segment virtual address, same as the corresponding - ``PT_LOAD`` segment -:``p_paddr``: 0 -:``p_filesz``: segment size in file, calculated as ``p_mem_sz / 32`` - (two 4-bit tags cover 32 bytes of memory) -:``p_memsz``: segment size in memory, same as the corresponding - ``PT_LOAD`` segment -:``p_align``: 0 - -The tags are stored in the core file at ``p_offset`` as two 4-bit tags -in a byte. With the tag granule of 16 bytes, a 4K page requires 128 -bytes in the core file. - -Example of correct usage -======================== - -*MTE Example code* - -.. code-block:: c - - /* - * To be compiled with -march=armv8.5-a+memtag - */ - #include - #include - #include - #include - #include - #include - #include - #include - - /* - * From arch/arm64/include/uapi/asm/hwcap.h - */ - #define HWCAP2_MTE (1 << 18) - - /* - * From arch/arm64/include/uapi/asm/mman.h - */ - #define PROT_MTE 0x20 - - /* - * From include/uapi/linux/prctl.h - */ - #define PR_SET_TAGGED_ADDR_CTRL 55 - #define PR_GET_TAGGED_ADDR_CTRL 56 - # define PR_TAGGED_ADDR_ENABLE (1UL << 0) - # define PR_MTE_TCF_SHIFT 1 - # define PR_MTE_TCF_NONE (0UL << PR_MTE_TCF_SHIFT) - # define PR_MTE_TCF_SYNC (1UL << PR_MTE_TCF_SHIFT) - # define PR_MTE_TCF_ASYNC (2UL << PR_MTE_TCF_SHIFT) - # define PR_MTE_TCF_MASK (3UL << PR_MTE_TCF_SHIFT) - # define PR_MTE_TAG_SHIFT 3 - # define PR_MTE_TAG_MASK (0xffffUL << PR_MTE_TAG_SHIFT) - - /* - * Insert a random logical tag into the given pointer. - */ - #define insert_random_tag(ptr) ({ \ - uint64_t __val; \ - asm("irg %0, %1" : "=r" (__val) : "r" (ptr)); \ - __val; \ - }) - - /* - * Set the allocation tag on the destination address. - */ - #define set_tag(tagged_addr) do { \ - asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \ - } while (0) - - int main() - { - unsigned char *a; - unsigned long page_sz = sysconf(_SC_PAGESIZE); - unsigned long hwcap2 = getauxval(AT_HWCAP2); - - /* check if MTE is present */ - if (!(hwcap2 & HWCAP2_MTE)) - return EXIT_FAILURE; - - /* - * Enable the tagged address ABI, synchronous or asynchronous MTE - * tag check faults (based on per-CPU preference) and allow all - * non-zero tags in the randomly generated set. - */ - if (prctl(PR_SET_TAGGED_ADDR_CTRL, - PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC | - (0xfffe << PR_MTE_TAG_SHIFT), - 0, 0, 0)) { - perror("prctl() failed"); - return EXIT_FAILURE; - } - - a = mmap(0, page_sz, PROT_READ | PROT_WRITE, - MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); - if (a == MAP_FAILED) { - perror("mmap() failed"); - return EXIT_FAILURE; - } - - /* - * Enable MTE on the above anonymous mmap. The flag could be passed - * directly to mmap() and skip this step. - */ - if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) { - perror("mprotect() failed"); - return EXIT_FAILURE; - } - - /* access with the default tag (0) */ - a[0] = 1; - a[1] = 2; - - printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]); - - /* set the logical and allocation tags */ - a = (unsigned char *)insert_random_tag(a); - set_tag(a); - - printf("%p\n", a); - - /* non-zero tag access */ - a[0] = 3; - printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]); - - /* - * If MTE is enabled correctly the next instruction will generate an - * exception. - */ - printf("Expecting SIGSEGV...\n"); - a[16] = 0xdd; - - /* this should not be printed in the PR_MTE_TCF_SYNC mode */ - printf("...haven't got one\n"); - - return EXIT_FAILURE; - } diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst deleted file mode 100644 index 2a641ba7be3b..000000000000 --- a/Documentation/arm64/memory.rst +++ /dev/null @@ -1,167 +0,0 @@ -============================== -Memory Layout on AArch64 Linux -============================== - -Author: Catalin Marinas - -This document describes the virtual memory layout used by the AArch64 -Linux kernel. The architecture allows up to 4 levels of translation -tables with a 4KB page size and up to 3 levels with a 64KB page size. - -AArch64 Linux uses either 3 levels or 4 levels of translation tables -with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit -(256TB) virtual addresses, respectively, for both user and kernel. With -64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB) -virtual address, are used but the memory layout is the same. - -ARMv8.2 adds optional support for Large Virtual Address space. This is -only available when running with a 64KB page size and expands the -number of descriptors in the first level of translation. - -User addresses have bits 63:48 set to 0 while the kernel addresses have -the same bits set to 1. TTBRx selection is given by bit 63 of the -virtual address. The swapper_pg_dir contains only kernel (global) -mappings while the user pgd contains only user (non-global) mappings. -The swapper_pg_dir address is written to TTBR1 and never written to -TTBR0. - - -AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit):: - - Start End Size Use - ----------------------------------------------------------------------- - 0000000000000000 0000ffffffffffff 256TB user - ffff000000000000 ffff7fffffffffff 128TB kernel logical memory map - [ffff600000000000 ffff7fffffffffff] 32TB [kasan shadow region] - ffff800000000000 ffff800007ffffff 128MB modules - ffff800008000000 fffffbffefffffff 124TB vmalloc - fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down) - fffffbfffe000000 fffffbfffe7fffff 8MB [guard region] - fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space - fffffbffff800000 fffffbffffffffff 8MB [guard region] - fffffc0000000000 fffffdffffffffff 2TB vmemmap - fffffe0000000000 ffffffffffffffff 2TB [guard region] - - -AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support):: - - Start End Size Use - ----------------------------------------------------------------------- - 0000000000000000 000fffffffffffff 4PB user - fff0000000000000 ffff7fffffffffff ~4PB kernel logical memory map - [fffd800000000000 ffff7fffffffffff] 512TB [kasan shadow region] - ffff800000000000 ffff800007ffffff 128MB modules - ffff800008000000 fffffbffefffffff 124TB vmalloc - fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down) - fffffbfffe000000 fffffbfffe7fffff 8MB [guard region] - fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space - fffffbffff800000 fffffbffffffffff 8MB [guard region] - fffffc0000000000 ffffffdfffffffff ~4TB vmemmap - ffffffe000000000 ffffffffffffffff 128GB [guard region] - - -Translation table lookup with 4KB pages:: - - +--------+--------+--------+--------+--------+--------+--------+--------+ - |63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| - +--------+--------+--------+--------+--------+--------+--------+--------+ - | | | | | | - | | | | | v - | | | | | [11:0] in-page offset - | | | | +-> [20:12] L3 index - | | | +-----------> [29:21] L2 index - | | +---------------------> [38:30] L1 index - | +-------------------------------> [47:39] L0 index - +-------------------------------------------------> [63] TTBR0/1 - - -Translation table lookup with 64KB pages:: - - +--------+--------+--------+--------+--------+--------+--------+--------+ - |63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| - +--------+--------+--------+--------+--------+--------+--------+--------+ - | | | | | - | | | | v - | | | | [15:0] in-page offset - | | | +----------> [28:16] L3 index - | | +--------------------------> [41:29] L2 index - | +-------------------------------> [47:42] L1 index (48-bit) - | [51:42] L1 index (52-bit) - +-------------------------------------------------> [63] TTBR0/1 - - -When using KVM without the Virtualization Host Extensions, the -hypervisor maps kernel pages in EL2 at a fixed (and potentially -random) offset from the linear mapping. See the kern_hyp_va macro and -kvm_update_va_mask function for more details. MMIO devices such as -GICv2 gets mapped next to the HYP idmap page, as do vectors when -ARM64_SPECTRE_V3A is enabled for particular CPUs. - -When using KVM with the Virtualization Host Extensions, no additional -mappings are created, since the host kernel runs directly in EL2. - -52-bit VA support in the kernel -------------------------------- -If the ARMv8.2-LVA optional feature is present, and we are running -with a 64KB page size; then it is possible to use 52-bits of address -space for both userspace and kernel addresses. However, any kernel -binary that supports 52-bit must also be able to fall back to 48-bit -at early boot time if the hardware feature is not present. - -This fallback mechanism necessitates the kernel .text to be in the -higher addresses such that they are invariant to 48/52-bit VAs. Due -to the kasan shadow being a fraction of the entire kernel VA space, -the end of the kasan shadow must also be in the higher half of the -kernel VA space for both 48/52-bit. (Switching from 48-bit to 52-bit, -the end of the kasan shadow is invariant and dependent on ~0UL, -whilst the start address will "grow" towards the lower addresses). - -In order to optimise phys_to_virt and virt_to_phys, the PAGE_OFFSET -is kept constant at 0xFFF0000000000000 (corresponding to 52-bit), -this obviates the need for an extra variable read. The physvirt -offset and vmemmap offsets are computed at early boot to enable -this logic. - -As a single binary will need to support both 48-bit and 52-bit VA -spaces, the VMEMMAP must be sized large enough for 52-bit VAs and -also must be sized large enough to accommodate a fixed PAGE_OFFSET. - -Most code in the kernel should not need to consider the VA_BITS, for -code that does need to know the VA size the variables are -defined as follows: - -VA_BITS constant the *maximum* VA space size - -VA_BITS_MIN constant the *minimum* VA space size - -vabits_actual variable the *actual* VA space size - - -Maximum and minimum sizes can be useful to ensure that buffers are -sized large enough or that addresses are positioned close enough for -the "worst" case. - -52-bit userspace VAs --------------------- -To maintain compatibility with software that relies on the ARMv8.0 -VA space maximum size of 48-bits, the kernel will, by default, -return virtual addresses to userspace from a 48-bit range. - -Software can "opt-in" to receiving VAs from a 52-bit space by -specifying an mmap hint parameter that is larger than 48-bit. - -For example: - -.. code-block:: c - - maybe_high_address = mmap(~0UL, size, prot, flags,...); - -It is also possible to build a debug kernel that returns addresses -from a 52-bit space by enabling the following kernel config options: - -.. code-block:: sh - - CONFIG_EXPERT=y && CONFIG_ARM64_FORCE_52BIT=y - -Note that this option is only intended for debugging applications -and should not be used in production. diff --git a/Documentation/arm64/perf.rst b/Documentation/arm64/perf.rst deleted file mode 100644 index 1f87b57c2332..000000000000 --- a/Documentation/arm64/perf.rst +++ /dev/null @@ -1,166 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -.. _perf_index: - -==== -Perf -==== - -Perf Event Attributes -===================== - -:Author: Andrew Murray -:Date: 2019-03-06 - -exclude_user ------------- - -This attribute excludes userspace. - -Userspace always runs at EL0 and thus this attribute will exclude EL0. - - -exclude_kernel --------------- - -This attribute excludes the kernel. - -The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run -at EL1. - -For the host this attribute will exclude EL1 and additionally EL2 on a VHE -system. - -For the guest this attribute will exclude EL1. Please note that EL2 is -never counted within a guest. - - -exclude_hv ----------- - -This attribute excludes the hypervisor. - -For a VHE host this attribute is ignored as we consider the host kernel to -be the hypervisor. - -For a non-VHE host this attribute will exclude EL2 as we consider the -hypervisor to be any code that runs at EL2 which is predominantly used for -guest/host transitions. - -For the guest this attribute has no effect. Please note that EL2 is -never counted within a guest. - - -exclude_host / exclude_guest ----------------------------- - -These attributes exclude the KVM host and guest, respectively. - -The KVM host may run at EL0 (userspace), EL1 (non-VHE kernel) and EL2 (VHE -kernel or non-VHE hypervisor). - -The KVM guest may run at EL0 (userspace) and EL1 (kernel). - -Due to the overlapping exception levels between host and guests we cannot -exclusively rely on the PMU's hardware exception filtering - therefore we -must enable/disable counting on the entry and exit to the guest. This is -performed differently on VHE and non-VHE systems. - -For non-VHE systems we exclude EL2 for exclude_host - upon entering and -exiting the guest we disable/enable the event as appropriate based on the -exclude_host and exclude_guest attributes. - -For VHE systems we exclude EL1 for exclude_guest and exclude both EL0,EL2 -for exclude_host. Upon entering and exiting the guest we modify the event -to include/exclude EL0 as appropriate based on the exclude_host and -exclude_guest attributes. - -The statements above also apply when these attributes are used within a -non-VHE guest however please note that EL2 is never counted within a guest. - - -Accuracy --------- - -On non-VHE hosts we enable/disable counters on the entry/exit of host/guest -transition at EL2 - however there is a period of time between -enabling/disabling the counters and entering/exiting the guest. We are -able to eliminate counters counting host events on the boundaries of guest -entry/exit when counting guest events by filtering out EL2 for -exclude_host. However when using !exclude_hv there is a small blackout -window at the guest entry/exit where host events are not captured. - -On VHE systems there are no blackout windows. - -Perf Userspace PMU Hardware Counter Access -========================================== - -Overview --------- -The perf userspace tool relies on the PMU to monitor events. It offers an -abstraction layer over the hardware counters since the underlying -implementation is cpu-dependent. -Arm64 allows userspace tools to have access to the registers storing the -hardware counters' values directly. - -This targets specifically self-monitoring tasks in order to reduce the overhead -by directly accessing the registers without having to go through the kernel. - -How-to ------- -The focus is set on the armv8 PMUv3 which makes sure that the access to the pmu -registers is enabled and that the userspace has access to the relevant -information in order to use them. - -In order to have access to the hardware counters, the global sysctl -kernel/perf_user_access must first be enabled: - -.. code-block:: sh - - echo 1 > /proc/sys/kernel/perf_user_access - -It is necessary to open the event using the perf tool interface with config1:1 -attr bit set: the sys_perf_event_open syscall returns a fd which can -subsequently be used with the mmap syscall in order to retrieve a page of memory -containing information about the event. The PMU driver uses this page to expose -to the user the hardware counter's index and other necessary data. Using this -index enables the user to access the PMU registers using the `mrs` instruction. -Access to the PMU registers is only valid while the sequence lock is unchanged. -In particular, the PMSELR_EL0 register is zeroed each time the sequence lock is -changed. - -The userspace access is supported in libperf using the perf_evsel__mmap() -and perf_evsel__read() functions. See `tools/lib/perf/tests/test-evsel.c`_ for -an example. - -About heterogeneous systems ---------------------------- -On heterogeneous systems such as big.LITTLE, userspace PMU counter access can -only be enabled when the tasks are pinned to a homogeneous subset of cores and -the corresponding PMU instance is opened by specifying the 'type' attribute. -The use of generic event types is not supported in this case. - -Have a look at `tools/perf/arch/arm64/tests/user-events.c`_ for an example. It -can be run using the perf tool to check that the access to the registers works -correctly from userspace: - -.. code-block:: sh - - perf test -v user - -About chained events and counter sizes --------------------------------------- -The user can request either a 32-bit (config1:0 == 0) or 64-bit (config1:0 == 1) -counter along with userspace access. The sys_perf_event_open syscall will fail -if a 64-bit counter is requested and the hardware doesn't support 64-bit -counters. Chained events are not supported in conjunction with userspace counter -access. If a 32-bit counter is requested on hardware with 64-bit counters, then -userspace must treat the upper 32-bits read from the counter as UNKNOWN. The -'pmc_width' field in the user page will indicate the valid width of the counter -and should be used to mask the upper bits as needed. - -.. Links -.. _tools/perf/arch/arm64/tests/user-events.c: - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c -.. _tools/lib/perf/tests/test-evsel.c: - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c diff --git a/Documentation/arm64/pointer-authentication.rst b/Documentation/arm64/pointer-authentication.rst deleted file mode 100644 index e5dad2e40aa8..000000000000 --- a/Documentation/arm64/pointer-authentication.rst +++ /dev/null @@ -1,142 +0,0 @@ -======================================= -Pointer authentication in AArch64 Linux -======================================= - -Author: Mark Rutland - -Date: 2017-07-19 - -This document briefly describes the provision of pointer authentication -functionality in AArch64 Linux. - - -Architecture overview ---------------------- - -The ARMv8.3 Pointer Authentication extension adds primitives that can be -used to mitigate certain classes of attack where an attacker can corrupt -the contents of some memory (e.g. the stack). - -The extension uses a Pointer Authentication Code (PAC) to determine -whether pointers have been modified unexpectedly. A PAC is derived from -a pointer, another value (such as the stack pointer), and a secret key -held in system registers. - -The extension adds instructions to insert a valid PAC into a pointer, -and to verify/remove the PAC from a pointer. The PAC occupies a number -of high-order bits of the pointer, which varies dependent on the -configured virtual address size and whether pointer tagging is in use. - -A subset of these instructions have been allocated from the HINT -encoding space. In the absence of the extension (or when disabled), -these instructions behave as NOPs. Applications and libraries using -these instructions operate correctly regardless of the presence of the -extension. - -The extension provides five separate keys to generate PACs - two for -instruction addresses (APIAKey, APIBKey), two for data addresses -(APDAKey, APDBKey), and one for generic authentication (APGAKey). - - -Basic support -------------- - -When CONFIG_ARM64_PTR_AUTH is selected, and relevant HW support is -present, the kernel will assign random key values to each process at -exec*() time. The keys are shared by all threads within the process, and -are preserved across fork(). - -Presence of address authentication functionality is advertised via -HWCAP_PACA, and generic authentication functionality via HWCAP_PACG. - -The number of bits that the PAC occupies in a pointer is 55 minus the -virtual address size configured by the kernel. For example, with a -virtual address size of 48, the PAC is 7 bits wide. - -When ARM64_PTR_AUTH_KERNEL is selected, the kernel will be compiled -with HINT space pointer authentication instructions protecting -function returns. Kernels built with this option will work on hardware -with or without pointer authentication support. - -In addition to exec(), keys can also be reinitialized to random values -using the PR_PAC_RESET_KEYS prctl. A bitmask of PR_PAC_APIAKEY, -PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY and PR_PAC_APGAKEY -specifies which keys are to be reinitialized; specifying 0 means "all -keys". - - -Debugging ---------- - -When CONFIG_ARM64_PTR_AUTH is selected, and HW support for address -authentication is present, the kernel will expose the position of TTBR0 -PAC bits in the NT_ARM_PAC_MASK regset (struct user_pac_mask), which -userspace can acquire via PTRACE_GETREGSET. - -The regset is exposed only when HWCAP_PACA is set. Separate masks are -exposed for data pointers and instruction pointers, as the set of PAC -bits can vary between the two. Note that the masks apply to TTBR0 -addresses, and are not valid to apply to TTBR1 addresses (e.g. kernel -pointers). - -Additionally, when CONFIG_CHECKPOINT_RESTORE is also set, the kernel -will expose the NT_ARM_PACA_KEYS and NT_ARM_PACG_KEYS regsets (struct -user_pac_address_keys and struct user_pac_generic_keys). These can be -used to get and set the keys for a thread. - - -Virtualization --------------- - -Pointer authentication is enabled in KVM guest when each virtual cpu is -initialised by passing flags KVM_ARM_VCPU_PTRAUTH_[ADDRESS/GENERIC] and -requesting these two separate cpu features to be enabled. The current KVM -guest implementation works by enabling both features together, so both -these userspace flags are checked before enabling pointer authentication. -The separate userspace flag will allow to have no userspace ABI changes -if support is added in the future to allow these two features to be -enabled independently of one another. - -As Arm Architecture specifies that Pointer Authentication feature is -implemented along with the VHE feature so KVM arm64 ptrauth code relies -on VHE mode to be present. - -Additionally, when these vcpu feature flags are not set then KVM will -filter out the Pointer Authentication system key registers from -KVM_GET/SET_REG_* ioctls and mask those features from cpufeature ID -register. Any attempt to use the Pointer Authentication instructions will -result in an UNDEFINED exception being injected into the guest. - - -Enabling and disabling keys ---------------------------- - -The prctl PR_PAC_SET_ENABLED_KEYS allows the user program to control which -PAC keys are enabled in a particular task. It takes two arguments, the -first being a bitmask of PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY -and PR_PAC_APDBKEY specifying which keys shall be affected by this prctl, -and the second being a bitmask of the same bits specifying whether the key -should be enabled or disabled. For example:: - - prctl(PR_PAC_SET_ENABLED_KEYS, - PR_PAC_APIAKEY | PR_PAC_APIBKEY | PR_PAC_APDAKEY | PR_PAC_APDBKEY, - PR_PAC_APIBKEY, 0, 0); - -disables all keys except the IB key. - -The main reason why this is useful is to enable a userspace ABI that uses PAC -instructions to sign and authenticate function pointers and other pointers -exposed outside of the function, while still allowing binaries conforming to -the ABI to interoperate with legacy binaries that do not sign or authenticate -pointers. - -The idea is that a dynamic loader or early startup code would issue this -prctl very early after establishing that a process may load legacy binaries, -but before executing any PAC instructions. - -For compatibility with previous kernel versions, processes start up with IA, -IB, DA and DB enabled, and are reset to this state on exec(). Processes created -via fork() and clone() inherit the key enabled state from the calling process. - -It is recommended to avoid disabling the IA key, as this has higher performance -overhead than disabling any of the other keys. diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst deleted file mode 100644 index 9e311bc43e05..000000000000 --- a/Documentation/arm64/silicon-errata.rst +++ /dev/null @@ -1,216 +0,0 @@ -======================================= -Silicon Errata and Software Workarounds -======================================= - -Author: Will Deacon - -Date : 27 November 2015 - -It is an unfortunate fact of life that hardware is often produced with -so-called "errata", which can cause it to deviate from the architecture -under specific circumstances. For hardware produced by ARM, these -errata are broadly classified into the following categories: - - ========== ======================================================== - Category A A critical error without a viable workaround. - Category B A significant or critical error with an acceptable - workaround. - Category C A minor error that is not expected to occur under normal - operation. - ========== ======================================================== - -For more information, consult one of the "Software Developers Errata -Notice" documents available on infocenter.arm.com (registration -required). - -As far as Linux is concerned, Category B errata may require some special -treatment in the operating system. For example, avoiding a particular -sequence of code, or configuring the processor in a particular way. A -less common situation may require similar actions in order to declassify -a Category A erratum into a Category C erratum. These are collectively -known as "software workarounds" and are only required in the minority of -cases (e.g. those cases that both require a non-secure workaround *and* -can be triggered by Linux). - -For software workarounds that may adversely impact systems unaffected by -the erratum in question, a Kconfig entry is added under "Kernel -Features" -> "ARM errata workarounds via the alternatives framework". -These are enabled by default and patched in at runtime when an affected -CPU is detected. For less-intrusive workarounds, a Kconfig option is not -available and the code is structured (preferably with a comment) in such -a way that the erratum will not be hit. - -This approach can make it slightly onerous to determine exactly which -errata are worked around in an arbitrary kernel source tree, so this -file acts as a registry of software workarounds in the Linux Kernel and -will be updated when new workarounds are committed and backported to -stable kernels. - -+----------------+-----------------+-----------------+-----------------------------+ -| Implementor | Component | Erratum ID | Kconfig | -+================+=================+=================+=============================+ -| Allwinner | A64/R18 | UNKNOWN1 | SUN50I_ERRATUM_UNKNOWN1 | -+----------------+-----------------+-----------------+-----------------------------+ -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A510 | #2457168 | ARM64_ERRATUM_2457168 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A510 | #2064142 | ARM64_ERRATUM_2064142 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A510 | #2038923 | ARM64_ERRATUM_2038923 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A510 | #1902691 | ARM64_ERRATUM_1902691 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A53 | #819472 | ARM64_ERRATUM_819472 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A53 | #845719 | ARM64_ERRATUM_845719 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A53 | #843419 | ARM64_ERRATUM_843419 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A55 | #1530923 | ARM64_ERRATUM_1530923 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A55 | #2441007 | ARM64_ERRATUM_2441007 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A57 | #852523 | N/A | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A57 | #1319537 | ARM64_ERRATUM_1319367 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A57 | #1742098 | ARM64_ERRATUM_1742098 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A72 | #853709 | N/A | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A72 | #1319367 | ARM64_ERRATUM_1319367 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A72 | #1655431 | ARM64_ERRATUM_1742098 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A76 | #1188873,1418040| ARM64_ERRATUM_1418040 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A76 | #1165522 | ARM64_ERRATUM_1165522 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A76 | #1286807 | ARM64_ERRATUM_1286807 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A76 | #1463225 | ARM64_ERRATUM_1463225 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A77 | #1508412 | ARM64_ERRATUM_1508412 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A510 | #2051678 | ARM64_ERRATUM_2051678 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A510 | #2077057 | ARM64_ERRATUM_2077057 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A510 | #2441009 | ARM64_ERRATUM_2441009 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A510 | #2658417 | ARM64_ERRATUM_2658417 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A710 | #2054223 | ARM64_ERRATUM_2054223 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A710 | #2224489 | ARM64_ERRATUM_2224489 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A715 | #2645198 | ARM64_ERRATUM_2645198 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-X2 | #2119858 | ARM64_ERRATUM_2119858 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-X2 | #2224489 | ARM64_ERRATUM_2224489 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Neoverse-N1 | #1349291 | N/A | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Neoverse-N1 | #1542419 | ARM64_ERRATUM_1542419 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Neoverse-N2 | #2139208 | ARM64_ERRATUM_2139208 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Neoverse-N2 | #2067961 | ARM64_ERRATUM_2067961 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | Neoverse-N2 | #2253138 | ARM64_ERRATUM_2253138 | -+----------------+-----------------+-----------------+-----------------------------+ -| ARM | MMU-500 | #841119,826419 | N/A | -+----------------+-----------------+-----------------+-----------------------------+ -+----------------+-----------------+-----------------+-----------------------------+ -| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_845719 | -+----------------+-----------------+-----------------+-----------------------------+ -| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_843419 | -+----------------+-----------------+-----------------+-----------------------------+ -+----------------+-----------------+-----------------+-----------------------------+ -| Cavium | ThunderX ITS | #22375,24313 | CAVIUM_ERRATUM_22375 | -+----------------+-----------------+-----------------+-----------------------------+ -| Cavium | ThunderX ITS | #23144 | CAVIUM_ERRATUM_23144 | -+----------------+-----------------+-----------------+-----------------------------+ -| Cavium | ThunderX GICv3 | #23154,38545 | CAVIUM_ERRATUM_23154 | -+----------------+-----------------+-----------------+-----------------------------+ -| Cavium | ThunderX GICv3 | #38539 | N/A | -+----------------+-----------------+-----------------+-----------------------------+ -| Cavium | ThunderX Core | #27456 | CAVIUM_ERRATUM_27456 | -+----------------+-----------------+-----------------+-----------------------------+ -| Cavium | ThunderX Core | #30115 | CAVIUM_ERRATUM_30115 | -+----------------+-----------------+-----------------+-----------------------------+ -| Cavium | ThunderX SMMUv2 | #27704 | N/A | -+----------------+-----------------+-----------------+-----------------------------+ -| Cavium | ThunderX2 SMMUv3| #74 | N/A | -+----------------+-----------------+-----------------+-----------------------------+ -| Cavium | ThunderX2 SMMUv3| #126 | N/A | -+----------------+-----------------+-----------------+-----------------------------+ -| Cavium | ThunderX2 Core | #219 | CAVIUM_TX2_ERRATUM_219 | -+----------------+-----------------+-----------------+-----------------------------+ -+----------------+-----------------+-----------------+-----------------------------+ -| Marvell | ARM-MMU-500 | #582743 | N/A | -+----------------+-----------------+-----------------+-----------------------------+ -+----------------+-----------------+-----------------+-----------------------------+ -| NVIDIA | Carmel Core | N/A | NVIDIA_CARMEL_CNP_ERRATUM | -+----------------+-----------------+-----------------+-----------------------------+ -| NVIDIA | T241 GICv3/4.x | T241-FABRIC-4 | N/A | -+----------------+-----------------+-----------------+-----------------------------+ -+----------------+-----------------+-----------------+-----------------------------+ -| Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 | -+----------------+-----------------+-----------------+-----------------------------+ -+----------------+-----------------+-----------------+-----------------------------+ -| Hisilicon | Hip0{5,6,7} | #161010101 | HISILICON_ERRATUM_161010101 | -+----------------+-----------------+-----------------+-----------------------------+ -| Hisilicon | Hip0{6,7} | #161010701 | N/A | -+----------------+-----------------+-----------------+-----------------------------+ -| Hisilicon | Hip0{6,7} | #161010803 | N/A | -+----------------+-----------------+-----------------+-----------------------------+ -| Hisilicon | Hip07 | #161600802 | HISILICON_ERRATUM_161600802 | -+----------------+-----------------+-----------------+-----------------------------+ -| Hisilicon | Hip08 SMMU PMCG | #162001800 | N/A | -+----------------+-----------------+-----------------+-----------------------------+ -+----------------+-----------------+-----------------+-----------------------------+ -| Qualcomm Tech. | Kryo/Falkor v1 | E1003 | QCOM_FALKOR_ERRATUM_1003 | -+----------------+-----------------+-----------------+-----------------------------+ -| Qualcomm Tech. | Kryo/Falkor v1 | E1009 | QCOM_FALKOR_ERRATUM_1009 | -+----------------+-----------------+-----------------+-----------------------------+ -| Qualcomm Tech. | QDF2400 ITS | E0065 | QCOM_QDF2400_ERRATUM_0065 | -+----------------+-----------------+-----------------+-----------------------------+ -| Qualcomm Tech. | Falkor v{1,2} | E1041 | QCOM_FALKOR_ERRATUM_1041 | -+----------------+-----------------+-----------------+-----------------------------+ -| Qualcomm Tech. | Kryo4xx Gold | N/A | ARM64_ERRATUM_1463225 | -+----------------+-----------------+-----------------+-----------------------------+ -| Qualcomm Tech. | Kryo4xx Gold | N/A | ARM64_ERRATUM_1418040 | -+----------------+-----------------+-----------------+-----------------------------+ -| Qualcomm Tech. | Kryo4xx Silver | N/A | ARM64_ERRATUM_1530923 | -+----------------+-----------------+-----------------+-----------------------------+ -| Qualcomm Tech. | Kryo4xx Silver | N/A | ARM64_ERRATUM_1024718 | -+----------------+-----------------+-----------------+-----------------------------+ -| Qualcomm Tech. | Kryo4xx Gold | N/A | ARM64_ERRATUM_1286807 | -+----------------+-----------------+-----------------+-----------------------------+ -+----------------+-----------------+-----------------+-----------------------------+ -| Rockchip | RK3588 | #3588001 | ROCKCHIP_ERRATUM_3588001 | -+----------------+-----------------+-----------------+-----------------------------+ - -+----------------+-----------------+-----------------+-----------------------------+ -| Fujitsu | A64FX | E#010001 | FUJITSU_ERRATUM_010001 | -+----------------+-----------------+-----------------+-----------------------------+ diff --git a/Documentation/arm64/sme.rst b/Documentation/arm64/sme.rst deleted file mode 100644 index 1c43ea12eb4f..000000000000 --- a/Documentation/arm64/sme.rst +++ /dev/null @@ -1,468 +0,0 @@ -=================================================== -Scalable Matrix Extension support for AArch64 Linux -=================================================== - -This document outlines briefly the interface provided to userspace by Linux in -order to support use of the ARM Scalable Matrix Extension (SME). - -This is an outline of the most important features and issues only and not -intended to be exhaustive. It should be read in conjunction with the SVE -documentation in sve.rst which provides details on the Streaming SVE mode -included in SME. - -This document does not aim to describe the SME architecture or programmer's -model. To aid understanding, a minimal description of relevant programmer's -model features for SME is included in Appendix A. - - -1. General ------------ - -* PSTATE.SM, PSTATE.ZA, the streaming mode vector length, the ZA and (when - present) ZTn register state and TPIDR2_EL0 are tracked per thread. - -* The presence of SME is reported to userspace via HWCAP2_SME in the aux vector - AT_HWCAP2 entry. Presence of this flag implies the presence of the SME - instructions and registers, and the Linux-specific system interfaces - described in this document. SME is reported in /proc/cpuinfo as "sme". - -* The presence of SME2 is reported to userspace via HWCAP2_SME2 in the - aux vector AT_HWCAP2 entry. Presence of this flag implies the presence of - the SME2 instructions and ZT0, and the Linux-specific system interfaces - described in this document. SME2 is reported in /proc/cpuinfo as "sme2". - -* Support for the execution of SME instructions in userspace can also be - detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS - instruction, and checking that the value of the SME field is nonzero. [3] - - It does not guarantee the presence of the system interfaces described in the - following sections: software that needs to verify that those interfaces are - present must check for HWCAP2_SME instead. - -* There are a number of optional SME features, presence of these is reported - through AT_HWCAP2 through: - - HWCAP2_SME_I16I64 - HWCAP2_SME_F64F64 - HWCAP2_SME_I8I32 - HWCAP2_SME_F16F32 - HWCAP2_SME_B16F32 - HWCAP2_SME_F32F32 - HWCAP2_SME_FA64 - HWCAP2_SME2 - - This list may be extended over time as the SME architecture evolves. - - These extensions are also reported via the CPU ID register ID_AA64SMFR0_EL1, - which userspace can read using an MRS instruction. See elf_hwcaps.txt and - cpu-feature-registers.txt for details. - -* Debuggers should restrict themselves to interacting with the target via the - NT_ARM_SVE, NT_ARM_SSVE, NT_ARM_ZA and NT_ARM_ZT regsets. The recommended - way of detecting support for these regsets is to connect to a target process - first and then attempt a - - ptrace(PTRACE_GETREGSET, pid, NT_ARM_, &iov). - -* Whenever ZA register values are exchanged in memory between userspace and - the kernel, the register value is encoded in memory as a series of horizontal - vectors from 0 to VL/8-1 stored in the same endianness invariant format as is - used for SVE vectors. - -* On thread creation TPIDR2_EL0 is preserved unless CLONE_SETTLS is specified, - in which case it is set to 0. - -2. Vector lengths ------------------- - -SME defines a second vector length similar to the SVE vector length which is -controls the size of the streaming mode SVE vectors and the ZA matrix array. -The ZA matrix is square with each side having as many bytes as a streaming -mode SVE vector. - - -3. Sharing of streaming and non-streaming mode SVE state ---------------------------------------------------------- - -It is implementation defined which if any parts of the SVE state are shared -between streaming and non-streaming modes. When switching between modes -via software interfaces such as ptrace if no register content is provided as -part of switching no state will be assumed to be shared and everything will -be zeroed. - - -4. System call behaviour -------------------------- - -* On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the - ZA matrix and ZTn (if present) are preserved. - -* On syscall PSTATE.SM will be cleared and the SVE registers will be handled - as per the standard SVE ABI. - -* None of the SVE registers, ZA or ZTn are used to pass arguments to - or receive results from any syscall. - -* On process creation (eg, clone()) the newly created process will have - PSTATE.SM cleared. - -* All other SME state of a thread, including the currently configured vector - length, the state of the PR_SME_VL_INHERIT flag, and the deferred vector - length (if any), is preserved across all syscalls, subject to the specific - exceptions for execve() described in section 6. - - -5. Signal handling -------------------- - -* Signal handlers are invoked with streaming mode and ZA disabled. - -* A new signal frame record TPIDR2_MAGIC is added formatted as a struct - tpidr2_context to allow access to TPIDR2_EL0 from signal handlers. - -* A new signal frame record za_context encodes the ZA register contents on - signal delivery. [1] - -* The signal frame record for ZA always contains basic metadata, in particular - the thread's vector length (in za_context.vl). - -* The ZA matrix may or may not be included in the record, depending on - the value of PSTATE.ZA. The registers are present if and only if: - za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl)) - in which case PSTATE.ZA == 1. - -* If matrix data is present, the remainder of the record has a vl-dependent - size and layout. Macros ZA_SIG_* are defined [1] to facilitate access to - them. - -* The matrix is stored as a series of horizontal vectors in the same format as - is used for SVE vectors. - -* If the ZA context is too big to fit in sigcontext.__reserved[], then extra - space is allocated on the stack, an extra_context record is written in - __reserved[] referencing this space. za_context is then written in the - extra space. Refer to [1] for further details about this mechanism. - -* If ZTn is supported and PSTATE.ZA==1 then a signal frame record for ZTn will - be generated. - -* The signal record for ZTn has magic ZT_MAGIC (0x5a544e01) and consists of a - standard signal frame header followed by a struct zt_context specifying - the number of ZTn registers supported by the system, then zt_context.nregs - blocks of 64 bytes of data per register. - - -5. Signal return ------------------ - -When returning from a signal handler: - -* If there is no za_context record in the signal frame, or if the record is - present but contains no register data as described in the previous section, - then ZA is disabled. - -* If za_context is present in the signal frame and contains matrix data then - PSTATE.ZA is set to 1 and ZA is populated with the specified data. - -* The vector length cannot be changed via signal return. If za_context.vl in - the signal frame does not match the current vector length, the signal return - attempt is treated as illegal, resulting in a forced SIGSEGV. - -* If ZTn is not supported or PSTATE.ZA==0 then it is illegal to have a - signal frame record for ZTn, resulting in a forced SIGSEGV. - - -6. prctl extensions --------------------- - -Some new prctl() calls are added to allow programs to manage the SME vector -length: - -prctl(PR_SME_SET_VL, unsigned long arg) - - Sets the vector length of the calling thread and related flags, where - arg == vl | flags. Other threads of the calling process are unaffected. - - vl is the desired vector length, where sve_vl_valid(vl) must be true. - - flags: - - PR_SME_VL_INHERIT - - Inherit the current vector length across execve(). Otherwise, the - vector length is reset to the system default at execve(). (See - Section 9.) - - PR_SME_SET_VL_ONEXEC - - Defer the requested vector length change until the next execve() - performed by this thread. - - The effect is equivalent to implicit execution of the following - call immediately after the next execve() (if any) by the thread: - - prctl(PR_SME_SET_VL, arg & ~PR_SME_SET_VL_ONEXEC) - - This allows launching of a new program with a different vector - length, while avoiding runtime side effects in the caller. - - Without PR_SME_SET_VL_ONEXEC, the requested change takes effect - immediately. - - - Return value: a nonnegative on success, or a negative value on error: - EINVAL: SME not supported, invalid vector length requested, or - invalid flags. - - - On success: - - * Either the calling thread's vector length or the deferred vector length - to be applied at the next execve() by the thread (dependent on whether - PR_SME_SET_VL_ONEXEC is present in arg), is set to the largest value - supported by the system that is less than or equal to vl. If vl == - SVE_VL_MAX, the value set will be the largest value supported by the - system. - - * Any previously outstanding deferred vector length change in the calling - thread is cancelled. - - * The returned value describes the resulting configuration, encoded as for - PR_SME_GET_VL. The vector length reported in this value is the new - current vector length for this thread if PR_SME_SET_VL_ONEXEC was not - present in arg; otherwise, the reported vector length is the deferred - vector length that will be applied at the next execve() by the calling - thread. - - * Changing the vector length causes all of ZA, ZTn, P0..P15, FFR and all - bits of Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become - unspecified, including both streaming and non-streaming SVE state. - Calling PR_SME_SET_VL with vl equal to the thread's current vector - length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag, - does not constitute a change to the vector length for this purpose. - - * Changing the vector length causes PSTATE.ZA and PSTATE.SM to be cleared. - Calling PR_SME_SET_VL with vl equal to the thread's current vector - length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag, - does not constitute a change to the vector length for this purpose. - - -prctl(PR_SME_GET_VL) - - Gets the vector length of the calling thread. - - The following flag may be OR-ed into the result: - - PR_SME_VL_INHERIT - - Vector length will be inherited across execve(). - - There is no way to determine whether there is an outstanding deferred - vector length change (which would only normally be the case between a - fork() or vfork() and the corresponding execve() in typical use). - - To extract the vector length from the result, bitwise and it with - PR_SME_VL_LEN_MASK. - - Return value: a nonnegative value on success, or a negative value on error: - EINVAL: SME not supported. - - -7. ptrace extensions ---------------------- - -* A new regset NT_ARM_SSVE is defined for access to streaming mode SVE - state via PTRACE_GETREGSET and PTRACE_SETREGSET, this is documented in - sve.rst. - -* A new regset NT_ARM_ZA is defined for ZA state for access to ZA state via - PTRACE_GETREGSET and PTRACE_SETREGSET. - - Refer to [2] for definitions. - -The regset data starts with struct user_za_header, containing: - - size - - Size of the complete regset, in bytes. - This depends on vl and possibly on other things in the future. - - If a call to PTRACE_GETREGSET requests less data than the value of - size, the caller can allocate a larger buffer and retry in order to - read the complete regset. - - max_size - - Maximum size in bytes that the regset can grow to for the target - thread. The regset won't grow bigger than this even if the target - thread changes its vector length etc. - - vl - - Target thread's current streaming vector length, in bytes. - - max_vl - - Maximum possible streaming vector length for the target thread. - - flags - - Zero or more of the following flags, which have the same - meaning and behaviour as the corresponding PR_SET_VL_* flags: - - SME_PT_VL_INHERIT - - SME_PT_VL_ONEXEC (SETREGSET only). - -* The effects of changing the vector length and/or flags are equivalent to - those documented for PR_SME_SET_VL. - - The caller must make a further GETREGSET call if it needs to know what VL is - actually set by SETREGSET, unless is it known in advance that the requested - VL is supported. - -* The size and layout of the payload depends on the header fields. The - SME_PT_ZA_*() macros are provided to facilitate access to the data. - -* In either case, for SETREGSET it is permissible to omit the payload, in which - case the vector length and flags are changed and PSTATE.ZA is set to 0 - (along with any consequences of those changes). If a payload is provided - then PSTATE.ZA will be set to 1. - -* For SETREGSET, if the requested VL is not supported, the effect will be the - same as if the payload were omitted, except that an EIO error is reported. - No attempt is made to translate the payload data to the correct layout - for the vector length actually set. It is up to the caller to translate the - payload layout for the actual VL and retry. - -* The effect of writing a partial, incomplete payload is unspecified. - -* A new regset NT_ARM_ZT is defined for access to ZTn state via - PTRACE_GETREGSET and PTRACE_SETREGSET. - -* The NT_ARM_ZT regset consists of a single 512 bit register. - -* When PSTATE.ZA==0 reads of NT_ARM_ZT will report all bits of ZTn as 0. - -* Writes to NT_ARM_ZT will set PSTATE.ZA to 1. - - -8. ELF coredump extensions ---------------------------- - -* NT_ARM_SSVE notes will be added to each coredump for - each thread of the dumped process. The contents will be equivalent to the - data that would have been read if a PTRACE_GETREGSET of the corresponding - type were executed for each thread when the coredump was generated. - -* A NT_ARM_ZA note will be added to each coredump for each thread of the - dumped process. The contents will be equivalent to the data that would have - been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread - when the coredump was generated. - -* A NT_ARM_ZT note will be added to each coredump for each thread of the - dumped process. The contents will be equivalent to the data that would have - been read if a PTRACE_GETREGSET of NT_ARM_ZT were executed for each thread - when the coredump was generated. - -* The NT_ARM_TLS note will be extended to two registers, the second register - will contain TPIDR2_EL0 on systems that support SME and will be read as - zero with writes ignored otherwise. - -9. System runtime configuration --------------------------------- - -* To mitigate the ABI impact of expansion of the signal frame, a policy - mechanism is provided for administrators, distro maintainers and developers - to set the default vector length for userspace processes: - -/proc/sys/abi/sme_default_vector_length - - Writing the text representation of an integer to this file sets the system - default vector length to the specified value, unless the value is greater - than the maximum vector length supported by the system in which case the - default vector length is set to that maximum. - - The result can be determined by reopening the file and reading its - contents. - - At boot, the default vector length is initially set to 32 or the maximum - supported vector length, whichever is smaller and supported. This - determines the initial vector length of the init process (PID 1). - - Reading this file returns the current system default vector length. - -* At every execve() call, the new vector length of the new process is set to - the system default vector length, unless - - * PR_SME_VL_INHERIT (or equivalently SME_PT_VL_INHERIT) is set for the - calling thread, or - - * a deferred vector length change is pending, established via the - PR_SME_SET_VL_ONEXEC flag (or SME_PT_VL_ONEXEC). - -* Modifying the system default vector length does not affect the vector length - of any existing process or thread that does not make an execve() call. - - -Appendix A. SME programmer's model (informative) -================================================= - -This section provides a minimal description of the additions made by SME to the -ARMv8-A programmer's model that are relevant to this document. - -Note: This section is for information only and not intended to be complete or -to replace any architectural specification. - -A.1. Registers ---------------- - -In A64 state, SME adds the following: - -* A new mode, streaming mode, in which a subset of the normal FPSIMD and SVE - features are available. When supported EL0 software may enter and leave - streaming mode at any time. - - For best system performance it is strongly encouraged for software to enable - streaming mode only when it is actively being used. - -* A new vector length controlling the size of ZA and the Z registers when in - streaming mode, separately to the vector length used for SVE when not in - streaming mode. There is no requirement that either the currently selected - vector length or the set of vector lengths supported for the two modes in - a given system have any relationship. The streaming mode vector length - is referred to as SVL. - -* A new ZA matrix register. This is a square matrix of SVLxSVL bits. Most - operations on ZA require that streaming mode be enabled but ZA can be - enabled without streaming mode in order to load, save and retain data. - - For best system performance it is strongly encouraged for software to enable - ZA only when it is actively being used. - -* A new ZT0 register is introduced when SME2 is present. This is a 512 bit - register which is accessible when PSTATE.ZA is set, as ZA itself is. - -* Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and - SMSTOP instructions or by access to the SVCR system register: - - * PSTATE.ZA, if this is 1 then the ZA matrix is accessible and has valid - data while if it is 0 then ZA can not be accessed. When PSTATE.ZA is - changed from 0 to 1 all bits in ZA are cleared. - - * PSTATE.SM, if this is 1 then the PE is in streaming mode. When the value - of PSTATE.SM is changed then it is implementation defined if the subset - of the floating point register bits valid in both modes may be retained. - Any other bits will be cleared. - - -References -========== - -[1] arch/arm64/include/uapi/asm/sigcontext.h - AArch64 Linux signal ABI definitions - -[2] arch/arm64/include/uapi/asm/ptrace.h - AArch64 Linux ptrace ABI definitions - -[3] Documentation/arm64/cpu-feature-registers.rst diff --git a/Documentation/arm64/sve.rst b/Documentation/arm64/sve.rst deleted file mode 100644 index 1b90a30382ac..000000000000 --- a/Documentation/arm64/sve.rst +++ /dev/null @@ -1,616 +0,0 @@ -=================================================== -Scalable Vector Extension support for AArch64 Linux -=================================================== - -Author: Dave Martin - -Date: 4 August 2017 - -This document outlines briefly the interface provided to userspace by Linux in -order to support use of the ARM Scalable Vector Extension (SVE), including -interactions with Streaming SVE mode added by the Scalable Matrix Extension -(SME). - -This is an outline of the most important features and issues only and not -intended to be exhaustive. - -This document does not aim to describe the SVE architecture or programmer's -model. To aid understanding, a minimal description of relevant programmer's -model features for SVE is included in Appendix A. - - -1. General ------------ - -* SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are - tracked per-thread. - -* In streaming mode FFR is not accessible unless HWCAP2_SME_FA64 is present - in the system, when it is not supported and these interfaces are used to - access streaming mode FFR is read and written as zero. - -* The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector - AT_HWCAP entry. Presence of this flag implies the presence of the SVE - instructions and registers, and the Linux-specific system interfaces - described in this document. SVE is reported in /proc/cpuinfo as "sve". - -* Support for the execution of SVE instructions in userspace can also be - detected by reading the CPU ID register ID_AA64PFR0_EL1 using an MRS - instruction, and checking that the value of the SVE field is nonzero. [3] - - It does not guarantee the presence of the system interfaces described in the - following sections: software that needs to verify that those interfaces are - present must check for HWCAP_SVE instead. - -* On hardware that supports the SVE2 extensions, HWCAP2_SVE2 will also - be reported in the AT_HWCAP2 aux vector entry. In addition to this, - optional extensions to SVE2 may be reported by the presence of: - - HWCAP2_SVE2 - HWCAP2_SVEAES - HWCAP2_SVEPMULL - HWCAP2_SVEBITPERM - HWCAP2_SVESHA3 - HWCAP2_SVESM4 - HWCAP2_SVE2P1 - - This list may be extended over time as the SVE architecture evolves. - - These extensions are also reported via the CPU ID register ID_AA64ZFR0_EL1, - which userspace can read using an MRS instruction. See elf_hwcaps.txt and - cpu-feature-registers.txt for details. - -* On hardware that supports the SME extensions, HWCAP2_SME will also be - reported in the AT_HWCAP2 aux vector entry. Among other things SME adds - streaming mode which provides a subset of the SVE feature set using a - separate SME vector length and the same Z/V registers. See sme.rst - for more details. - -* Debuggers should restrict themselves to interacting with the target via the - NT_ARM_SVE regset. The recommended way of detecting support for this regset - is to connect to a target process first and then attempt a - ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov). Note that when SME is - present and streaming SVE mode is in use the FPSIMD subset of registers - will be read via NT_ARM_SVE and NT_ARM_SVE writes will exit streaming mode - in the target. - -* Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory - between userspace and the kernel, the register value is encoded in memory in - an endianness-invariant layout, with bits [(8 * i + 7) : (8 * i)] encoded at - byte offset i from the start of the memory representation. This affects for - example the signal frame (struct sve_context) and ptrace interface - (struct user_sve_header) and associated data. - - Beware that on big-endian systems this results in a different byte order than - for the FPSIMD V-registers, which are stored as single host-endian 128-bit - values, with bits [(127 - 8 * i) : (120 - 8 * i)] of the register encoded at - byte offset i. (struct fpsimd_context, struct user_fpsimd_state). - - -2. Vector length terminology ------------------------------ - -The size of an SVE vector (Z) register is referred to as the "vector length". - -To avoid confusion about the units used to express vector length, the kernel -adopts the following conventions: - -* Vector length (VL) = size of a Z-register in bytes - -* Vector quadwords (VQ) = size of a Z-register in units of 128 bits - -(So, VL = 16 * VQ.) - -The VQ convention is used where the underlying granularity is important, such -as in data structure definitions. In most other situations, the VL convention -is used. This is consistent with the meaning of the "VL" pseudo-register in -the SVE instruction set architecture. - - -3. System call behaviour -------------------------- - -* On syscall, V0..V31 are preserved (as without SVE). Thus, bits [127:0] of - Z0..Z31 are preserved. All other bits of Z0..Z31, and all of P0..P15 and FFR - become zero on return from a syscall. - -* The SVE registers are not used to pass arguments to or receive results from - any syscall. - -* In practice the affected registers/bits will be preserved or will be replaced - with zeros on return from a syscall, but userspace should not make - assumptions about this. The kernel behaviour may vary on a case-by-case - basis. - -* All other SVE state of a thread, including the currently configured vector - length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector - length (if any), is preserved across all syscalls, subject to the specific - exceptions for execve() described in section 6. - - In particular, on return from a fork() or clone(), the parent and new child - process or thread share identical SVE configuration, matching that of the - parent before the call. - - -4. Signal handling -------------------- - -* A new signal frame record sve_context encodes the SVE registers on signal - delivery. [1] - -* This record is supplementary to fpsimd_context. The FPSR and FPCR registers - are only present in fpsimd_context. For convenience, the content of V0..V31 - is duplicated between sve_context and fpsimd_context. - -* The record contains a flag field which includes a flag SVE_SIG_FLAG_SM which - if set indicates that the thread is in streaming mode and the vector length - and register data (if present) describe the streaming SVE data and vector - length. - -* The signal frame record for SVE always contains basic metadata, in particular - the thread's vector length (in sve_context.vl). - -* The SVE registers may or may not be included in the record, depending on - whether the registers are live for the thread. The registers are present if - and only if: - sve_context.head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)). - -* If the registers are present, the remainder of the record has a vl-dependent - size and layout. Macros SVE_SIG_* are defined [1] to facilitate access to - the members. - -* Each scalable register (Zn, Pn, FFR) is stored in an endianness-invariant - layout, with bits [(8 * i + 7) : (8 * i)] stored at byte offset i from the - start of the register's representation in memory. - -* If the SVE context is too big to fit in sigcontext.__reserved[], then extra - space is allocated on the stack, an extra_context record is written in - __reserved[] referencing this space. sve_context is then written in the - extra space. Refer to [1] for further details about this mechanism. - - -5. Signal return ------------------ - -When returning from a signal handler: - -* If there is no sve_context record in the signal frame, or if the record is - present but contains no register data as described in the previous section, - then the SVE registers/bits become non-live and take unspecified values. - -* If sve_context is present in the signal frame and contains full register - data, the SVE registers become live and are populated with the specified - data. However, for backward compatibility reasons, bits [127:0] of Z0..Z31 - are always restored from the corresponding members of fpsimd_context.vregs[] - and not from sve_context. The remaining bits are restored from sve_context. - -* Inclusion of fpsimd_context in the signal frame remains mandatory, - irrespective of whether sve_context is present or not. - -* The vector length cannot be changed via signal return. If sve_context.vl in - the signal frame does not match the current vector length, the signal return - attempt is treated as illegal, resulting in a forced SIGSEGV. - -* It is permitted to enter or leave streaming mode by setting or clearing - the SVE_SIG_FLAG_SM flag but applications should take care to ensure that - when doing so sve_context.vl and any register data are appropriate for the - vector length in the new mode. - - -6. prctl extensions --------------------- - -Some new prctl() calls are added to allow programs to manage the SVE vector -length: - -prctl(PR_SVE_SET_VL, unsigned long arg) - - Sets the vector length of the calling thread and related flags, where - arg == vl | flags. Other threads of the calling process are unaffected. - - vl is the desired vector length, where sve_vl_valid(vl) must be true. - - flags: - - PR_SVE_VL_INHERIT - - Inherit the current vector length across execve(). Otherwise, the - vector length is reset to the system default at execve(). (See - Section 9.) - - PR_SVE_SET_VL_ONEXEC - - Defer the requested vector length change until the next execve() - performed by this thread. - - The effect is equivalent to implicit execution of the following - call immediately after the next execve() (if any) by the thread: - - prctl(PR_SVE_SET_VL, arg & ~PR_SVE_SET_VL_ONEXEC) - - This allows launching of a new program with a different vector - length, while avoiding runtime side effects in the caller. - - - Without PR_SVE_SET_VL_ONEXEC, the requested change takes effect - immediately. - - - Return value: a nonnegative on success, or a negative value on error: - EINVAL: SVE not supported, invalid vector length requested, or - invalid flags. - - - On success: - - * Either the calling thread's vector length or the deferred vector length - to be applied at the next execve() by the thread (dependent on whether - PR_SVE_SET_VL_ONEXEC is present in arg), is set to the largest value - supported by the system that is less than or equal to vl. If vl == - SVE_VL_MAX, the value set will be the largest value supported by the - system. - - * Any previously outstanding deferred vector length change in the calling - thread is cancelled. - - * The returned value describes the resulting configuration, encoded as for - PR_SVE_GET_VL. The vector length reported in this value is the new - current vector length for this thread if PR_SVE_SET_VL_ONEXEC was not - present in arg; otherwise, the reported vector length is the deferred - vector length that will be applied at the next execve() by the calling - thread. - - * Changing the vector length causes all of P0..P15, FFR and all bits of - Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become - unspecified. Calling PR_SVE_SET_VL with vl equal to the thread's current - vector length, or calling PR_SVE_SET_VL with the PR_SVE_SET_VL_ONEXEC - flag, does not constitute a change to the vector length for this purpose. - - -prctl(PR_SVE_GET_VL) - - Gets the vector length of the calling thread. - - The following flag may be OR-ed into the result: - - PR_SVE_VL_INHERIT - - Vector length will be inherited across execve(). - - There is no way to determine whether there is an outstanding deferred - vector length change (which would only normally be the case between a - fork() or vfork() and the corresponding execve() in typical use). - - To extract the vector length from the result, bitwise and it with - PR_SVE_VL_LEN_MASK. - - Return value: a nonnegative value on success, or a negative value on error: - EINVAL: SVE not supported. - - -7. ptrace extensions ---------------------- - -* New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with - PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the - streaming mode SVE registers and NT_ARM_SVE describes the - non-streaming mode SVE registers. - - In this description a register set is referred to as being "live" when - the target is in the appropriate streaming or non-streaming mode and is - using data beyond the subset shared with the FPSIMD Vn registers. - - Refer to [2] for definitions. - -The regset data starts with struct user_sve_header, containing: - - size - - Size of the complete regset, in bytes. - This depends on vl and possibly on other things in the future. - - If a call to PTRACE_GETREGSET requests less data than the value of - size, the caller can allocate a larger buffer and retry in order to - read the complete regset. - - max_size - - Maximum size in bytes that the regset can grow to for the target - thread. The regset won't grow bigger than this even if the target - thread changes its vector length etc. - - vl - - Target thread's current vector length, in bytes. - - max_vl - - Maximum possible vector length for the target thread. - - flags - - at most one of - - SVE_PT_REGS_FPSIMD - - SVE registers are not live (GETREGSET) or are to be made - non-live (SETREGSET). - - The payload is of type struct user_fpsimd_state, with the same - meaning as for NT_PRFPREG, starting at offset - SVE_PT_FPSIMD_OFFSET from the start of user_sve_header. - - Extra data might be appended in the future: the size of the - payload should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags). - - vq should be obtained using sve_vq_from_vl(vl). - - or - - SVE_PT_REGS_SVE - - SVE registers are live (GETREGSET) or are to be made live - (SETREGSET). - - The payload contains the SVE register data, starting at offset - SVE_PT_SVE_OFFSET from the start of user_sve_header, and with - size SVE_PT_SVE_SIZE(vq, flags); - - ... OR-ed with zero or more of the following flags, which have the same - meaning and behaviour as the corresponding PR_SET_VL_* flags: - - SVE_PT_VL_INHERIT - - SVE_PT_VL_ONEXEC (SETREGSET only). - - If neither FPSIMD nor SVE flags are provided then no register - payload is available, this is only possible when SME is implemented. - - -* The effects of changing the vector length and/or flags are equivalent to - those documented for PR_SVE_SET_VL. - - The caller must make a further GETREGSET call if it needs to know what VL is - actually set by SETREGSET, unless is it known in advance that the requested - VL is supported. - -* In the SVE_PT_REGS_SVE case, the size and layout of the payload depends on - the header fields. The SVE_PT_SVE_*() macros are provided to facilitate - access to the members. - -* In either case, for SETREGSET it is permissible to omit the payload, in which - case only the vector length and flags are changed (along with any - consequences of those changes). - -* In systems supporting SME when in streaming mode a GETREGSET for - NT_REG_SVE will return only the user_sve_header with no register data, - similarly a GETREGSET for NT_REG_SSVE will not return any register data - when not in streaming mode. - -* A GETREGSET for NT_ARM_SSVE will never return SVE_PT_REGS_FPSIMD. - -* For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the - requested VL is not supported, the effect will be the same as if the - payload were omitted, except that an EIO error is reported. No - attempt is made to translate the payload data to the correct layout - for the vector length actually set. The thread's FPSIMD state is - preserved, but the remaining bits of the SVE registers become - unspecified. It is up to the caller to translate the payload layout - for the actual VL and retry. - -* Where SME is implemented it is not possible to GETREGSET the register - state for normal SVE when in streaming mode, nor the streaming mode - register state when in normal mode, regardless of the implementation defined - behaviour of the hardware for sharing data between the two modes. - -* Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in - streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode - if the target was not in streaming mode. - -* The effect of writing a partial, incomplete payload is unspecified. - - -8. ELF coredump extensions ---------------------------- - -* NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for - each thread of the dumped process. The contents will be equivalent to the - data that would have been read if a PTRACE_GETREGSET of the corresponding - type were executed for each thread when the coredump was generated. - -9. System runtime configuration --------------------------------- - -* To mitigate the ABI impact of expansion of the signal frame, a policy - mechanism is provided for administrators, distro maintainers and developers - to set the default vector length for userspace processes: - -/proc/sys/abi/sve_default_vector_length - - Writing the text representation of an integer to this file sets the system - default vector length to the specified value, unless the value is greater - than the maximum vector length supported by the system in which case the - default vector length is set to that maximum. - - The result can be determined by reopening the file and reading its - contents. - - At boot, the default vector length is initially set to 64 or the maximum - supported vector length, whichever is smaller. This determines the initial - vector length of the init process (PID 1). - - Reading this file returns the current system default vector length. - -* At every execve() call, the new vector length of the new process is set to - the system default vector length, unless - - * PR_SVE_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the - calling thread, or - - * a deferred vector length change is pending, established via the - PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_ONEXEC). - -* Modifying the system default vector length does not affect the vector length - of any existing process or thread that does not make an execve() call. - -10. Perf extensions --------------------------------- - -* The arm64 specific DWARF standard [5] added the VG (Vector Granule) register - at index 46. This register is used for DWARF unwinding when variable length - SVE registers are pushed onto the stack. - -* Its value is equivalent to the current SVE vector length (VL) in bits divided - by 64. - -* The value is included in Perf samples in the regs[46] field if - PERF_SAMPLE_REGS_USER is set and the sample_regs_user mask has bit 46 set. - -* The value is the current value at the time the sample was taken, and it can - change over time. - -* If the system doesn't support SVE when perf_event_open is called with these - settings, the event will fail to open. - -Appendix A. SVE programmer's model (informative) -================================================= - -This section provides a minimal description of the additions made by SVE to the -ARMv8-A programmer's model that are relevant to this document. - -Note: This section is for information only and not intended to be complete or -to replace any architectural specification. - -A.1. Registers ---------------- - -In A64 state, SVE adds the following: - -* 32 8VL-bit vector registers Z0..Z31 - For each Zn, Zn bits [127:0] alias the ARMv8-A vector register Vn. - - A register write using a Vn register name zeros all bits of the corresponding - Zn except for bits [127:0]. - -* 16 VL-bit predicate registers P0..P15 - -* 1 VL-bit special-purpose predicate register FFR (the "first-fault register") - -* a VL "pseudo-register" that determines the size of each vector register - - The SVE instruction set architecture provides no way to write VL directly. - Instead, it can be modified only by EL1 and above, by writing appropriate - system registers. - -* The value of VL can be configured at runtime by EL1 and above: - 16 <= VL <= VLmax, where VL must be a multiple of 16. - -* The maximum vector length is determined by the hardware: - 16 <= VLmax <= 256. - - (The SVE architecture specifies 256, but permits future architecture - revisions to raise this limit.) - -* FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point - operations in a similar way to the way in which they interact with ARMv8 - floating-point operations:: - - 8VL-1 128 0 bit index - +---- //// -----------------+ - Z0 | : V0 | - : : - Z7 | : V7 | - Z8 | : * V8 | - : : : - Z15 | : *V15 | - Z16 | : V16 | - : : - Z31 | : V31 | - +---- //// -----------------+ - 31 0 - VL-1 0 +-------+ - +---- //// --+ FPSR | | - P0 | | +-------+ - : | | *FPCR | | - P15 | | +-------+ - +---- //// --+ - FFR | | +-----+ - +---- //// --+ VL | | - +-----+ - -(*) callee-save: - This only applies to bits [63:0] of Z-/V-registers. - FPCR contains callee-save and caller-save bits. See [4] for details. - - -A.2. Procedure call standard ------------------------------ - -The ARMv8-A base procedure call standard is extended as follows with respect to -the additional SVE register state: - -* All SVE register bits that are not shared with FP/SIMD are caller-save. - -* Z8 bits [63:0] .. Z15 bits [63:0] are callee-save. - - This follows from the way these bits are mapped to V8..V15, which are caller- - save in the base procedure call standard. - - -Appendix B. ARMv8-A FP/SIMD programmer's model -=============================================== - -Note: This section is for information only and not intended to be complete or -to replace any architectural specification. - -Refer to [4] for more information. - -ARMv8-A defines the following floating-point / SIMD register state: - -* 32 128-bit vector registers V0..V31 -* 2 32-bit status/control registers FPSR, FPCR - -:: - - 127 0 bit index - +---------------+ - V0 | | - : : : - V7 | | - * V8 | | - : : : : - *V15 | | - V16 | | - : : : - V31 | | - +---------------+ - - 31 0 - +-------+ - FPSR | | - +-------+ - *FPCR | | - +-------+ - -(*) callee-save: - This only applies to bits [63:0] of V-registers. - FPCR contains a mixture of callee-save and caller-save bits. - - -References -========== - -[1] arch/arm64/include/uapi/asm/sigcontext.h - AArch64 Linux signal ABI definitions - -[2] arch/arm64/include/uapi/asm/ptrace.h - AArch64 Linux ptrace ABI definitions - -[3] Documentation/arm64/cpu-feature-registers.rst - -[4] ARM IHI0055C - http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf - http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html - Procedure Call Standard for the ARM 64-bit Architecture (AArch64) - -[5] https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst deleted file mode 100644 index 540a1d4fc6c9..000000000000 --- a/Documentation/arm64/tagged-address-abi.rst +++ /dev/null @@ -1,179 +0,0 @@ -========================== -AArch64 TAGGED ADDRESS ABI -========================== - -Authors: Vincenzo Frascino - Catalin Marinas - -Date: 21 August 2019 - -This document describes the usage and semantics of the Tagged Address -ABI on AArch64 Linux. - -1. Introduction ---------------- - -On AArch64 the ``TCR_EL1.TBI0`` bit is set by default, allowing -userspace (EL0) to perform memory accesses through 64-bit pointers with -a non-zero top byte. This document describes the relaxation of the -syscall ABI that allows userspace to pass certain tagged pointers to -kernel syscalls. - -2. AArch64 Tagged Address ABI ------------------------------ - -From the kernel syscall interface perspective and for the purposes of -this document, a "valid tagged pointer" is a pointer with a potentially -non-zero top-byte that references an address in the user process address -space obtained in one of the following ways: - -- ``mmap()`` syscall where either: - - - flags have the ``MAP_ANONYMOUS`` bit set or - - the file descriptor refers to a regular file (including those - returned by ``memfd_create()``) or ``/dev/zero`` - -- ``brk()`` syscall (i.e. the heap area between the initial location of - the program break at process creation and its current location). - -- any memory mapped by the kernel in the address space of the process - during creation and with the same restrictions as for ``mmap()`` above - (e.g. data, bss, stack). - -The AArch64 Tagged Address ABI has two stages of relaxation depending on -how the user addresses are used by the kernel: - -1. User addresses not accessed by the kernel but used for address space - management (e.g. ``mprotect()``, ``madvise()``). The use of valid - tagged pointers in this context is allowed with these exceptions: - - - ``brk()``, ``mmap()`` and the ``new_address`` argument to - ``mremap()`` as these have the potential to alias with existing - user addresses. - - NOTE: This behaviour changed in v5.6 and so some earlier kernels may - incorrectly accept valid tagged pointers for the ``brk()``, - ``mmap()`` and ``mremap()`` system calls. - - - The ``range.start``, ``start`` and ``dst`` arguments to the - ``UFFDIO_*`` ``ioctl()``s used on a file descriptor obtained from - ``userfaultfd()``, as fault addresses subsequently obtained by reading - the file descriptor will be untagged, which may otherwise confuse - tag-unaware programs. - - NOTE: This behaviour changed in v5.14 and so some earlier kernels may - incorrectly accept valid tagged pointers for this system call. - -2. User addresses accessed by the kernel (e.g. ``write()``). This ABI - relaxation is disabled by default and the application thread needs to - explicitly enable it via ``prctl()`` as follows: - - - ``PR_SET_TAGGED_ADDR_CTRL``: enable or disable the AArch64 Tagged - Address ABI for the calling thread. - - The ``(unsigned int) arg2`` argument is a bit mask describing the - control mode used: - - - ``PR_TAGGED_ADDR_ENABLE``: enable AArch64 Tagged Address ABI. - Default status is disabled. - - Arguments ``arg3``, ``arg4``, and ``arg5`` must be 0. - - - ``PR_GET_TAGGED_ADDR_CTRL``: get the status of the AArch64 Tagged - Address ABI for the calling thread. - - Arguments ``arg2``, ``arg3``, ``arg4``, and ``arg5`` must be 0. - - The ABI properties described above are thread-scoped, inherited on - clone() and fork() and cleared on exec(). - - Calling ``prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0)`` - returns ``-EINVAL`` if the AArch64 Tagged Address ABI is globally - disabled by ``sysctl abi.tagged_addr_disabled=1``. The default - ``sysctl abi.tagged_addr_disabled`` configuration is 0. - -When the AArch64 Tagged Address ABI is enabled for a thread, the -following behaviours are guaranteed: - -- All syscalls except the cases mentioned in section 3 can accept any - valid tagged pointer. - -- The syscall behaviour is undefined for invalid tagged pointers: it may - result in an error code being returned, a (fatal) signal being raised, - or other modes of failure. - -- The syscall behaviour for a valid tagged pointer is the same as for - the corresponding untagged pointer. - - -A definition of the meaning of tagged pointers on AArch64 can be found -in Documentation/arm64/tagged-pointers.rst. - -3. AArch64 Tagged Address ABI Exceptions ------------------------------------------ - -The following system call parameters must be untagged regardless of the -ABI relaxation: - -- ``prctl()`` other than pointers to user data either passed directly or - indirectly as arguments to be accessed by the kernel. - -- ``ioctl()`` other than pointers to user data either passed directly or - indirectly as arguments to be accessed by the kernel. - -- ``shmat()`` and ``shmdt()``. - -- ``brk()`` (since kernel v5.6). - -- ``mmap()`` (since kernel v5.6). - -- ``mremap()``, the ``new_address`` argument (since kernel v5.6). - -Any attempt to use non-zero tagged pointers may result in an error code -being returned, a (fatal) signal being raised, or other modes of -failure. - -4. Example of correct usage ---------------------------- -.. code-block:: c - - #include - #include - #include - #include - #include - - #define PR_SET_TAGGED_ADDR_CTRL 55 - #define PR_TAGGED_ADDR_ENABLE (1UL << 0) - - #define TAG_SHIFT 56 - - int main(void) - { - int tbi_enabled = 0; - unsigned long tag = 0; - char *ptr; - - /* check/enable the tagged address ABI */ - if (!prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0)) - tbi_enabled = 1; - - /* memory allocation */ - ptr = mmap(NULL, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE, - MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); - if (ptr == MAP_FAILED) - return 1; - - /* set a non-zero tag if the ABI is available */ - if (tbi_enabled) - tag = rand() & 0xff; - ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT)); - - /* memory access to a tagged address */ - strcpy(ptr, "tagged pointer\n"); - - /* syscall with a tagged pointer */ - write(1, ptr, strlen(ptr)); - - return 0; - } diff --git a/Documentation/arm64/tagged-pointers.rst b/Documentation/arm64/tagged-pointers.rst deleted file mode 100644 index 19d284b70384..000000000000 --- a/Documentation/arm64/tagged-pointers.rst +++ /dev/null @@ -1,88 +0,0 @@ -========================================= -Tagged virtual addresses in AArch64 Linux -========================================= - -Author: Will Deacon - -Date : 12 June 2013 - -This document briefly describes the provision of tagged virtual -addresses in the AArch64 translation system and their potential uses -in AArch64 Linux. - -The kernel configures the translation tables so that translations made -via TTBR0 (i.e. userspace mappings) have the top byte (bits 63:56) of -the virtual address ignored by the translation hardware. This frees up -this byte for application use. - - -Passing tagged addresses to the kernel --------------------------------------- - -All interpretation of userspace memory addresses by the kernel assumes -an address tag of 0x00, unless the application enables the AArch64 -Tagged Address ABI explicitly -(Documentation/arm64/tagged-address-abi.rst). - -This includes, but is not limited to, addresses found in: - - - pointer arguments to system calls, including pointers in structures - passed to system calls, - - - the stack pointer (sp), e.g. when interpreting it to deliver a - signal, - - - the frame pointer (x29) and frame records, e.g. when interpreting - them to generate a backtrace or call graph. - -Using non-zero address tags in any of these locations when the -userspace application did not enable the AArch64 Tagged Address ABI may -result in an error code being returned, a (fatal) signal being raised, -or other modes of failure. - -For these reasons, when the AArch64 Tagged Address ABI is disabled, -passing non-zero address tags to the kernel via system calls is -forbidden, and using a non-zero address tag for sp is strongly -discouraged. - -Programs maintaining a frame pointer and frame records that use non-zero -address tags may suffer impaired or inaccurate debug and profiling -visibility. - - -Preserving tags ---------------- - -When delivering signals, non-zero tags are not preserved in -siginfo.si_addr unless the flag SA_EXPOSE_TAGBITS was set in -sigaction.sa_flags when the signal handler was installed. This means -that signal handlers in applications making use of tags cannot rely -on the tag information for user virtual addresses being maintained -in these fields unless the flag was set. - -Due to architecture limitations, bits 63:60 of the fault address -are not preserved in response to synchronous tag check faults -(SEGV_MTESERR) even if SA_EXPOSE_TAGBITS was set. Applications should -treat the values of these bits as undefined in order to accommodate -future architecture revisions which may preserve the bits. - -For signals raised in response to watchpoint debug exceptions, the -tag information will be preserved regardless of the SA_EXPOSE_TAGBITS -flag setting. - -Non-zero tags are never preserved in sigcontext.fault_address -regardless of the SA_EXPOSE_TAGBITS flag setting. - -The architecture prevents the use of a tagged PC, so the upper byte will -be set to a sign-extension of bit 55 on exception return. - -This behaviour is maintained when the AArch64 Tagged Address ABI is -enabled. - - -Other considerations --------------------- - -Special care should be taken when using tagged pointers, since it is -likely that C compilers will not hazard two virtual addresses differing -only in the upper byte. diff --git a/Documentation/translations/zh_CN/arch/arm64/amu.rst b/Documentation/translations/zh_CN/arch/arm64/amu.rst new file mode 100644 index 000000000000..f8e09fd21ef5 --- /dev/null +++ b/Documentation/translations/zh_CN/arch/arm64/amu.rst @@ -0,0 +1,100 @@ +.. include:: ../../disclaimer-zh_CN.rst + +:Original: :ref:`Documentation/arch/arm64/amu.rst ` + +Translator: Bailu Lin + +================================== +AArch64 Linux 中扩展的活动监控单元 +================================== + +作者: Ionela Voinescu + +日期: 2019-09-10 + +本文档简要描述了 AArch64 Linux 支持的活动监控单元的规范。 + + +架构总述 +-------- + +活动监控是 ARMv8.4 CPU 架构引入的一个可选扩展特性。 + +活动监控单元(在每个 CPU 中实现)为系统管理提供了性能计数器。既可以通 +过系统寄存器的方式访问计数器,同时也支持外部内存映射的方式访问计数器。 + +AMUv1 架构实现了一个由4个固定的64位事件计数器组成的计数器组。 + + - CPU 周期计数器:同 CPU 的频率增长 + - 常量计数器:同固定的系统时钟频率增长 + - 淘汰指令计数器: 同每次架构指令执行增长 + - 内存停顿周期计数器:计算由在时钟域内的最后一级缓存中未命中而引起 + 的指令调度停顿周期数 + +当处于 WFI 或者 WFE 状态时,计数器不会增长。 + +AMU 架构提供了一个高达16位的事件计数器空间,未来新的 AMU 版本中可能 +用它来实现新增的事件计数器。 + +另外,AMUv1 实现了一个多达16个64位辅助事件计数器的计数器组。 + +冷复位时所有的计数器会清零。 + + +基本支持 +-------- + +内核可以安全地运行在支持 AMU 和不支持 AMU 的 CPU 组合中。 +因此,当配置 CONFIG_ARM64_AMU_EXTN 后我们无条件使能后续 +(secondary or hotplugged) CPU 检测和使用这个特性。 + +当在 CPU 上检测到该特性时,我们会标记为特性可用但是不能保证计数器的功能, +仅表明有扩展属性。 + +固件(代码运行在高异常级别,例如 arm-tf )需支持以下功能: + + - 提供低异常级别(EL2 和 EL1)访问 AMU 寄存器的能力。 + - 使能计数器。如果未使能,它的值应为 0。 + - 在从电源关闭状态启动 CPU 前或后保存或者恢复计数器。 + +当使用使能了该特性的内核启动但固件损坏时,访问计数器寄存器可能会遭遇 +panic 或者死锁。即使未发现这些症状,计数器寄存器返回的数据结果并不一 +定能反映真实情况。通常,计数器会返回 0,表明他们未被使能。 + +如果固件没有提供适当的支持最好关闭 CONFIG_ARM64_AMU_EXTN。 +值得注意的是,出于安全原因,不要绕过 AMUSERRENR_EL0 设置而捕获从 +EL0(用户空间) 访问 EL1(内核空间)。 因此,固件应该确保访问 AMU寄存器 +不会困在 EL2或EL3。 + +AMUv1 的固定计数器可以通过如下系统寄存器访问: + + - SYS_AMEVCNTR0_CORE_EL0 + - SYS_AMEVCNTR0_CONST_EL0 + - SYS_AMEVCNTR0_INST_RET_EL0 + - SYS_AMEVCNTR0_MEM_STALL_EL0 + +特定辅助计数器可以通过 SYS_AMEVCNTR1_EL0(n) 访问,其中n介于0到15。 + +详细信息定义在目录:arch/arm64/include/asm/sysreg.h。 + + +用户空间访问 +------------ + +由于以下原因,当前禁止从用户空间访问 AMU 的寄存器: + + - 安全因数:可能会暴露处于安全模式执行的代码信息。 + - 意愿:AMU 是用于系统管理的。 + +同样,该功能对用户空间不可见。 + + +虚拟化 +------ + +由于以下原因,当前禁止从 KVM 客户端的用户空间(EL0)和内核空间(EL1) +访问 AMU 的寄存器: + + - 安全因数:可能会暴露给其他客户端或主机端执行的代码信息。 + +任何试图访问 AMU 寄存器的行为都会触发一个注册在客户端的未定义异常。 diff --git a/Documentation/translations/zh_CN/arch/arm64/booting.txt b/Documentation/translations/zh_CN/arch/arm64/booting.txt new file mode 100644 index 000000000000..630eb32a8854 --- /dev/null +++ b/Documentation/translations/zh_CN/arch/arm64/booting.txt @@ -0,0 +1,246 @@ +Chinese translated version of Documentation/arch/arm64/booting.rst + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +M: Will Deacon +zh_CN: Fu Wei +C: 55f058e7574c3615dea4615573a19bdb258696c6 +--------------------------------------------------------------------- +Documentation/arch/arm64/booting.rst 的中文翻译 + +如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 +交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 +译存在问题,请联系中文版维护者。 + +英文版维护者: Will Deacon +中文版维护者: 傅炜 Fu Wei +中文版翻译者: 傅炜 Fu Wei +中文版校译者: 傅炜 Fu Wei +本文翻译提交时的 Git 检出点为: 55f058e7574c3615dea4615573a19bdb258696c6 + +以下为正文 +--------------------------------------------------------------------- + 启动 AArch64 Linux + ================== + +作者: Will Deacon +日期: 2012 年 09 月 07 日 + +本文档基于 Russell King 的 ARM 启动文档,且适用于所有公开发布的 +AArch64 Linux 内核代码。 + +AArch64 异常模型由多个异常级(EL0 - EL3)组成,对于 EL0 和 EL1 异常级 +有对应的安全和非安全模式。EL2 是系统管理级,且仅存在于非安全模式下。 +EL3 是最高特权级,且仅存在于安全模式下。 + +基于本文档的目的,我们将简单地使用‘引导装载程序’(‘boot loader’) +这个术语来定义在将控制权交给 Linux 内核前 CPU 上执行的所有软件。 +这可能包含安全监控和系统管理代码,或者它可能只是一些用于准备最小启动 +环境的指令。 + +基本上,引导装载程序(至少)应实现以下操作: + +1、设置和初始化 RAM +2、设置设备树数据 +3、解压内核映像 +4、调用内核映像 + + +1、设置和初始化 RAM +----------------- + +必要性: 强制 + +引导装载程序应该找到并初始化系统中所有内核用于保持系统变量数据的 RAM。 +这个操作的执行方式因设备而异。(它可能使用内部算法来自动定位和计算所有 +RAM,或可能使用对这个设备已知的 RAM 信息,还可能是引导装载程序设计者 +想到的任何合适的方法。) + + +2、设置设备树数据 +--------------- + +必要性: 强制 + +设备树数据块(dtb)必须 8 字节对齐,且大小不能超过 2MB。由于设备树 +数据块将在使能缓存的情况下以 2MB 粒度被映射,故其不能被置于必须以特定 +属性映射的2M区域内。 + +注: v4.2 之前的版本同时要求设备树数据块被置于从内核映像以下 +text_offset 字节处算起第一个 512MB 内。 + +3、解压内核映像 +------------- + +必要性: 可选 + +AArch64 内核当前没有提供自解压代码,因此如果使用了压缩内核映像文件 +(比如 Image.gz),则需要通过引导装载程序(使用 gzip 等)来进行解压。 +若引导装载程序没有实现这个功能,就要使用非压缩内核映像文件。 + + +4、调用内核映像 +------------- + +必要性: 强制 + +已解压的内核映像包含一个 64 字节的头,内容如下: + + u32 code0; /* 可执行代码 */ + u32 code1; /* 可执行代码 */ + u64 text_offset; /* 映像装载偏移,小端模式 */ + u64 image_size; /* 映像实际大小, 小端模式 */ + u64 flags; /* 内核旗标, 小端模式 * + u64 res2 = 0; /* 保留 */ + u64 res3 = 0; /* 保留 */ + u64 res4 = 0; /* 保留 */ + u32 magic = 0x644d5241; /* 魔数, 小端, "ARM\x64" */ + u32 res5; /* 保留 (用于 PE COFF 偏移) */ + + +映像头注释: + +- 自 v3.17 起,除非另有说明,所有域都是小端模式。 + +- code0/code1 负责跳转到 stext. + +- 当通过 EFI 启动时, 最初 code0/code1 被跳过。 + res5 是到 PE 文件头的偏移,而 PE 文件头含有 EFI 的启动入口点 + (efi_stub_entry)。当 stub 代码完成了它的使命,它会跳转到 code0 + 继续正常的启动流程。 + +- v3.17 之前,未明确指定 text_offset 的字节序。此时,image_size 为零, + 且 text_offset 依照内核字节序为 0x80000。 + 当 image_size 非零,text_offset 为小端模式且是有效值,应被引导加载 + 程序使用。当 image_size 为零,text_offset 可假定为 0x80000。 + +- flags 域 (v3.17 引入) 为 64 位小端模式,其编码如下: + 位 0: 内核字节序。 1 表示大端模式,0 表示小端模式。 + 位 1-2: 内核页大小。 + 0 - 未指定。 + 1 - 4K + 2 - 16K + 3 - 64K + 位 3: 内核物理位置 + 0 - 2MB 对齐基址应尽量靠近内存起始处,因为 + 其基址以下的内存无法通过线性映射访问 + 1 - 2MB 对齐基址可以在物理内存的任意位置 + 位 4-63: 保留。 + +- 当 image_size 为零时,引导装载程序应试图在内核映像末尾之后尽可能 + 多地保留空闲内存供内核直接使用。对内存空间的需求量因所选定的内核 + 特性而异, 并无实际限制。 + +内核映像必须被放置在任意一个可用系统内存 2MB 对齐基址的 text_offset +字节处,并从该处被调用。2MB 对齐基址和内核映像起始地址之间的区域对于 +内核来说没有特殊意义,且可能被用于其他目的。 +从映像起始地址算起,最少必须准备 image_size 字节的空闲内存供内核使用。 +注: v4.6 之前的版本无法使用内核映像物理偏移以下的内存,所以当时建议 +将映像尽量放置在靠近系统内存起始的地方。 + +任何提供给内核的内存(甚至在映像起始地址之前),若未从内核中标记为保留 +(如在设备树(dtb)的 memreserve 区域),都将被认为对内核是可用。 + +在跳转入内核前,必须符合以下状态: + +- 停止所有 DMA 设备,这样内存数据就不会因为虚假网络包或磁盘数据而 + 被破坏。这可能可以节省你许多的调试时间。 + +- 主 CPU 通用寄存器设置 + x0 = 系统 RAM 中设备树数据块(dtb)的物理地址。 + x1 = 0 (保留,将来可能使用) + x2 = 0 (保留,将来可能使用) + x3 = 0 (保留,将来可能使用) + +- CPU 模式 + 所有形式的中断必须在 PSTATE.DAIF 中被屏蔽(Debug、SError、IRQ + 和 FIQ)。 + CPU 必须处于 EL2(推荐,可访问虚拟化扩展)或非安全 EL1 模式下。 + +- 高速缓存、MMU + MMU 必须关闭。 + 指令缓存开启或关闭皆可。 + 已载入的内核映像的相应内存区必须被清理,以达到缓存一致性点(PoC)。 + 当存在系统缓存或其他使能缓存的一致性主控器时,通常需使用虚拟地址 + 维护其缓存,而非 set/way 操作。 + 遵从通过虚拟地址操作维护构架缓存的系统缓存必须被配置,并可以被使能。 + 而不通过虚拟地址操作维护构架缓存的系统缓存(不推荐),必须被配置且 + 禁用。 + + *译者注:对于 PoC 以及缓存相关内容,请参考 ARMv8 构架参考手册 + ARM DDI 0487A + +- 架构计时器 + CNTFRQ 必须设定为计时器的频率,且 CNTVOFF 必须设定为对所有 CPU + 都一致的值。如果在 EL1 模式下进入内核,则 CNTHCTL_EL2 中的 + EL1PCTEN (bit 0) 必须置位。 + +- 一致性 + 通过内核启动的所有 CPU 在内核入口地址上必须处于相同的一致性域中。 + 这可能要根据具体实现来定义初始化过程,以使能每个CPU上对维护操作的 + 接收。 + +- 系统寄存器 + 在进入内核映像的异常级中,所有构架中可写的系统寄存器必须通过软件 + 在一个更高的异常级别下初始化,以防止在 未知 状态下运行。 + + 对于拥有 GICv3 中断控制器并以 v3 模式运行的系统: + - 如果 EL3 存在: + ICC_SRE_EL3.Enable (位 3) 必须初始化为 0b1。 + ICC_SRE_EL3.SRE (位 0) 必须初始化为 0b1。 + - 若内核运行在 EL1: + ICC_SRE_EL2.Enable (位 3) 必须初始化为 0b1。 + ICC_SRE_EL2.SRE (位 0) 必须初始化为 0b1。 + - 设备树(DT)或 ACPI 表必须描述一个 GICv3 中断控制器。 + + 对于拥有 GICv3 中断控制器并以兼容(v2)模式运行的系统: + - 如果 EL3 存在: + ICC_SRE_EL3.SRE (位 0) 必须初始化为 0b0。 + - 若内核运行在 EL1: + ICC_SRE_EL2.SRE (位 0) 必须初始化为 0b0。 + - 设备树(DT)或 ACPI 表必须描述一个 GICv2 中断控制器。 + +以上对于 CPU 模式、高速缓存、MMU、架构计时器、一致性、系统寄存器的 +必要条件描述适用于所有 CPU。所有 CPU 必须在同一异常级别跳入内核。 + +引导装载程序必须在每个 CPU 处于以下状态时跳入内核入口: + +- 主 CPU 必须直接跳入内核映像的第一条指令。通过此 CPU 传递的设备树 + 数据块必须在每个 CPU 节点中包含一个 ‘enable-method’ 属性,所 + 支持的 enable-method 请见下文。 + + 引导装载程序必须生成这些设备树属性,并在跳入内核入口之前将其插入 + 数据块。 + +- enable-method 为 “spin-table” 的 CPU 必须在它们的 CPU + 节点中包含一个 ‘cpu-release-addr’ 属性。这个属性标识了一个 + 64 位自然对齐且初始化为零的内存位置。 + + 这些 CPU 必须在内存保留区(通过设备树中的 /memreserve/ 域传递 + 给内核)中自旋于内核之外,轮询它们的 cpu-release-addr 位置(必须 + 包含在保留区中)。可通过插入 wfe 指令来降低忙循环开销,而主 CPU 将 + 发出 sev 指令。当对 cpu-release-addr 所指位置的读取操作返回非零值 + 时,CPU 必须跳入此值所指向的地址。此值为一个单独的 64 位小端值, + 因此 CPU 须在跳转前将所读取的值转换为其本身的端模式。 + +- enable-method 为 “psci” 的 CPU 保持在内核外(比如,在 + memory 节点中描述为内核空间的内存区外,或在通过设备树 /memreserve/ + 域中描述为内核保留区的空间中)。内核将会发起在 ARM 文档(编号 + ARM DEN 0022A:用于 ARM 上的电源状态协调接口系统软件)中描述的 + CPU_ON 调用来将 CPU 带入内核。 + + *译者注: ARM DEN 0022A 已更新到 ARM DEN 0022C。 + + 设备树必须包含一个 ‘psci’ 节点,请参考以下文档: + Documentation/devicetree/bindings/arm/psci.yaml + + +- 辅助 CPU 通用寄存器设置 + x0 = 0 (保留,将来可能使用) + x1 = 0 (保留,将来可能使用) + x2 = 0 (保留,将来可能使用) + x3 = 0 (保留,将来可能使用) diff --git a/Documentation/translations/zh_CN/arch/arm64/elf_hwcaps.rst b/Documentation/translations/zh_CN/arch/arm64/elf_hwcaps.rst new file mode 100644 index 000000000000..f60ac1580d3e --- /dev/null +++ b/Documentation/translations/zh_CN/arch/arm64/elf_hwcaps.rst @@ -0,0 +1,240 @@ +.. include:: ../../disclaimer-zh_CN.rst + +:Original: :ref:`Documentation/arch/arm64/elf_hwcaps.rst ` + +Translator: Bailu Lin + +================ +ARM64 ELF hwcaps +================ + +这篇文档描述了 arm64 ELF hwcaps 的用法和语义。 + + +1. 简介 +------- + +有些硬件或软件功能仅在某些 CPU 实现上和/或在具体某个内核配置上可用,但 +对于处于 EL0 的用户空间代码没有可用的架构发现机制。内核通过在辅助向量表 +公开一组称为 hwcaps 的标志而把这些功能暴露给用户空间。 + +用户空间软件可以通过获取辅助向量的 AT_HWCAP 或 AT_HWCAP2 条目来测试功能, +并测试是否设置了相关标志,例如:: + + bool floating_point_is_present(void) + { + unsigned long hwcaps = getauxval(AT_HWCAP); + if (hwcaps & HWCAP_FP) + return true; + + return false; + } + +如果软件依赖于 hwcap 描述的功能,在尝试使用该功能前则应检查相关的 hwcap +标志以验证该功能是否存在。 + +不能通过其他方式探查这些功能。当一个功能不可用时,尝试使用它可能导致不可 +预测的行为,并且无法保证能确切的知道该功能不可用,例如 SIGILL。 + + +2. Hwcaps 的说明 +---------------- + +大多数 hwcaps 旨在说明通过架构 ID 寄存器(处于 EL0 的用户空间代码无法访问) +描述的功能的存在。这些 hwcap 通过 ID 寄存器字段定义,并且应根据 ARM 体系 +结构参考手册(ARM ARM)中定义的字段来解释说明。 + +这些 hwcaps 以下面的形式描述:: + + idreg.field == val 表示有某个功能。 + +当 idreg.field 中有 val 时,hwcaps 表示 ARM ARM 定义的功能是有效的,但是 +并不是说要完全和 val 相等,也不是说 idreg.field 描述的其他功能就是缺失的。 + +其他 hwcaps 可能表明无法仅由 ID 寄存器描述的功能的存在。这些 hwcaps 可能 +没有被 ID 寄存器描述,需要参考其他文档。 + + +3. AT_HWCAP 中揭示的 hwcaps +--------------------------- + +HWCAP_FP + ID_AA64PFR0_EL1.FP == 0b0000 表示有此功能。 + +HWCAP_ASIMD + ID_AA64PFR0_EL1.AdvSIMD == 0b0000 表示有此功能。 + +HWCAP_EVTSTRM + 通用计时器频率配置为大约100KHz以生成事件。 + +HWCAP_AES + ID_AA64ISAR0_EL1.AES == 0b0001 表示有此功能。 + +HWCAP_PMULL + ID_AA64ISAR0_EL1.AES == 0b0010 表示有此功能。 + +HWCAP_SHA1 + ID_AA64ISAR0_EL1.SHA1 == 0b0001 表示有此功能。 + +HWCAP_SHA2 + ID_AA64ISAR0_EL1.SHA2 == 0b0001 表示有此功能。 + +HWCAP_CRC32 + ID_AA64ISAR0_EL1.CRC32 == 0b0001 表示有此功能。 + +HWCAP_ATOMICS + ID_AA64ISAR0_EL1.Atomic == 0b0010 表示有此功能。 + +HWCAP_FPHP + ID_AA64PFR0_EL1.FP == 0b0001 表示有此功能。 + +HWCAP_ASIMDHP + ID_AA64PFR0_EL1.AdvSIMD == 0b0001 表示有此功能。 + +HWCAP_CPUID + 根据 Documentation/arch/arm64/cpu-feature-registers.rst 描述,EL0 可以访问 + 某些 ID 寄存器。 + + 这些 ID 寄存器可能表示功能的可用性。 + +HWCAP_ASIMDRDM + ID_AA64ISAR0_EL1.RDM == 0b0001 表示有此功能。 + +HWCAP_JSCVT + ID_AA64ISAR1_EL1.JSCVT == 0b0001 表示有此功能。 + +HWCAP_FCMA + ID_AA64ISAR1_EL1.FCMA == 0b0001 表示有此功能。 + +HWCAP_LRCPC + ID_AA64ISAR1_EL1.LRCPC == 0b0001 表示有此功能。 + +HWCAP_DCPOP + ID_AA64ISAR1_EL1.DPB == 0b0001 表示有此功能。 + +HWCAP_SHA3 + ID_AA64ISAR0_EL1.SHA3 == 0b0001 表示有此功能。 + +HWCAP_SM3 + ID_AA64ISAR0_EL1.SM3 == 0b0001 表示有此功能。 + +HWCAP_SM4 + ID_AA64ISAR0_EL1.SM4 == 0b0001 表示有此功能。 + +HWCAP_ASIMDDP + ID_AA64ISAR0_EL1.DP == 0b0001 表示有此功能。 + +HWCAP_SHA512 + ID_AA64ISAR0_EL1.SHA2 == 0b0010 表示有此功能。 + +HWCAP_SVE + ID_AA64PFR0_EL1.SVE == 0b0001 表示有此功能。 + +HWCAP_ASIMDFHM + ID_AA64ISAR0_EL1.FHM == 0b0001 表示有此功能。 + +HWCAP_DIT + ID_AA64PFR0_EL1.DIT == 0b0001 表示有此功能。 + +HWCAP_USCAT + ID_AA64MMFR2_EL1.AT == 0b0001 表示有此功能。 + +HWCAP_ILRCPC + ID_AA64ISAR1_EL1.LRCPC == 0b0010 表示有此功能。 + +HWCAP_FLAGM + ID_AA64ISAR0_EL1.TS == 0b0001 表示有此功能。 + +HWCAP_SSBS + ID_AA64PFR1_EL1.SSBS == 0b0010 表示有此功能。 + +HWCAP_SB + ID_AA64ISAR1_EL1.SB == 0b0001 表示有此功能。 + +HWCAP_PACA + 如 Documentation/arch/arm64/pointer-authentication.rst 所描述, + ID_AA64ISAR1_EL1.APA == 0b0001 或 ID_AA64ISAR1_EL1.API == 0b0001 + 表示有此功能。 + +HWCAP_PACG + 如 Documentation/arch/arm64/pointer-authentication.rst 所描述, + ID_AA64ISAR1_EL1.GPA == 0b0001 或 ID_AA64ISAR1_EL1.GPI == 0b0001 + 表示有此功能。 + +HWCAP2_DCPODP + + ID_AA64ISAR1_EL1.DPB == 0b0010 表示有此功能。 + +HWCAP2_SVE2 + + ID_AA64ZFR0_EL1.SVEVer == 0b0001 表示有此功能。 + +HWCAP2_SVEAES + + ID_AA64ZFR0_EL1.AES == 0b0001 表示有此功能。 + +HWCAP2_SVEPMULL + + ID_AA64ZFR0_EL1.AES == 0b0010 表示有此功能。 + +HWCAP2_SVEBITPERM + + ID_AA64ZFR0_EL1.BitPerm == 0b0001 表示有此功能。 + +HWCAP2_SVESHA3 + + ID_AA64ZFR0_EL1.SHA3 == 0b0001 表示有此功能。 + +HWCAP2_SVESM4 + + ID_AA64ZFR0_EL1.SM4 == 0b0001 表示有此功能。 + +HWCAP2_FLAGM2 + + ID_AA64ISAR0_EL1.TS == 0b0010 表示有此功能。 + +HWCAP2_FRINT + + ID_AA64ISAR1_EL1.FRINTTS == 0b0001 表示有此功能。 + +HWCAP2_SVEI8MM + + ID_AA64ZFR0_EL1.I8MM == 0b0001 表示有此功能。 + +HWCAP2_SVEF32MM + + ID_AA64ZFR0_EL1.F32MM == 0b0001 表示有此功能。 + +HWCAP2_SVEF64MM + + ID_AA64ZFR0_EL1.F64MM == 0b0001 表示有此功能。 + +HWCAP2_SVEBF16 + + ID_AA64ZFR0_EL1.BF16 == 0b0001 表示有此功能。 + +HWCAP2_I8MM + + ID_AA64ISAR1_EL1.I8MM == 0b0001 表示有此功能。 + +HWCAP2_BF16 + + ID_AA64ISAR1_EL1.BF16 == 0b0001 表示有此功能。 + +HWCAP2_DGH + + ID_AA64ISAR1_EL1.DGH == 0b0001 表示有此功能。 + +HWCAP2_RNG + + ID_AA64ISAR0_EL1.RNDR == 0b0001 表示有此功能。 + +HWCAP2_BTI + + ID_AA64PFR0_EL1.BT == 0b0001 表示有此功能。 + + +4. 未使用的 AT_HWCAP 位 +----------------------- + +为了与用户空间交互,内核保证 AT_HWCAP 的第62、63位将始终返回0。 diff --git a/Documentation/translations/zh_CN/arch/arm64/hugetlbpage.rst b/Documentation/translations/zh_CN/arch/arm64/hugetlbpage.rst new file mode 100644 index 000000000000..8079eadde29a --- /dev/null +++ b/Documentation/translations/zh_CN/arch/arm64/hugetlbpage.rst @@ -0,0 +1,45 @@ +.. include:: ../../disclaimer-zh_CN.rst + +:Original: :ref:`Documentation/arch/arm64/hugetlbpage.rst ` + +Translator: Bailu Lin + +===================== +ARM64中的 HugeTLBpage +===================== + +大页依靠有效利用 TLBs 来提高地址翻译的性能。这取决于以下 +两点 - + + - 大页的大小 + - TLBs 支持的条目大小 + +ARM64 接口支持2种大页方式。 + +1) pud/pmd 级别的块映射 +----------------------- + +这是常规大页,他们的 pmd 或 pud 页面表条目指向一个内存块。 +不管 TLB 中支持的条目大小如何,块映射可以减少翻译大页地址 +所需遍历的页表深度。 + +2) 使用连续位 +------------- + +架构中转换页表条目(D4.5.3, ARM DDI 0487C.a)中提供一个连续 +位告诉 MMU 这个条目是一个连续条目集的一员,它可以被缓存在单 +个 TLB 条目中。 + +在 Linux 中连续位用来增加 pmd 和 pte(最后一级)级别映射的大 +小。受支持的连续页表条目数量因页面大小和页表级别而异。 + + +支持以下大页尺寸配置 - + + ====== ======== ==== ======== === + - CONT PTE PMD CONT PMD PUD + ====== ======== ==== ======== === + 4K: 64K 2M 32M 1G + 16K: 2M 32M 1G + 64K: 2M 512M 16G + ====== ======== ==== ======== === diff --git a/Documentation/translations/zh_CN/arch/arm64/index.rst b/Documentation/translations/zh_CN/arch/arm64/index.rst new file mode 100644 index 000000000000..e12b9f6e5d6c --- /dev/null +++ b/Documentation/translations/zh_CN/arch/arm64/index.rst @@ -0,0 +1,19 @@ +.. include:: ../../disclaimer-zh_CN.rst + +:Original: :ref:`Documentation/arch/arm64/index.rst ` +:Translator: Bailu Lin + +.. _cn_arm64_index: + + +========== +ARM64 架构 +========== + +.. toctree:: + :maxdepth: 2 + + amu + hugetlbpage + perf + elf_hwcaps diff --git a/Documentation/translations/zh_CN/arch/arm64/legacy_instructions.txt b/Documentation/translations/zh_CN/arch/arm64/legacy_instructions.txt new file mode 100644 index 000000000000..e469fccbe356 --- /dev/null +++ b/Documentation/translations/zh_CN/arch/arm64/legacy_instructions.txt @@ -0,0 +1,72 @@ +Chinese translated version of Documentation/arch/arm64/legacy_instructions.rst + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +Maintainer: Punit Agrawal + Suzuki K. Poulose +Chinese maintainer: Fu Wei +--------------------------------------------------------------------- +Documentation/arch/arm64/legacy_instructions.rst 的中文翻译 + +如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 +交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 +译存在问题,请联系中文版维护者。 + +本文翻译提交时的 Git 检出点为: bc465aa9d045feb0e13b4a8f32cc33c1943f62d6 + +英文版维护者: Punit Agrawal + Suzuki K. Poulose +中文版维护者: 傅炜 Fu Wei +中文版翻译者: 傅炜 Fu Wei +中文版校译者: 傅炜 Fu Wei + +以下为正文 +--------------------------------------------------------------------- +Linux 内核在 arm64 上的移植提供了一个基础框架,以支持构架中正在被淘汰或已废弃指令的模拟执行。 +这个基础框架的代码使用未定义指令钩子(hooks)来支持模拟。如果指令存在,它也允许在硬件中启用该指令。 + +模拟模式可通过写 sysctl 节点(/proc/sys/abi)来控制。 +不同的执行方式及 sysctl 节点的相应值,解释如下: + +* Undef(未定义) + 值: 0 + 产生未定义指令终止异常。它是那些构架中已废弃的指令,如 SWP,的默认处理方式。 + +* Emulate(模拟) + 值: 1 + 使用软件模拟方式。为解决软件迁移问题,这种模拟指令模式的使用是被跟踪的,并会发出速率限制警告。 + 它是那些构架中正在被淘汰的指令,如 CP15 barriers(隔离指令),的默认处理方式。 + +* Hardware Execution(硬件执行) + 值: 2 + 虽然标记为正在被淘汰,但一些实现可能提供硬件执行这些指令的使能/禁用操作。 + 使用硬件执行一般会有更好的性能,但将无法收集运行时对正被淘汰指令的使用统计数据。 + +默认执行模式依赖于指令在构架中状态。正在被淘汰的指令应该以模拟(Emulate)作为默认模式, +而已废弃的指令必须默认使用未定义(Undef)模式 + +注意:指令模拟可能无法应对所有情况。更多详情请参考单独的指令注释。 + +受支持的遗留指令 +------------- +* SWP{B} +节点: /proc/sys/abi/swp +状态: 已废弃 +默认执行方式: Undef (0) + +* CP15 Barriers +节点: /proc/sys/abi/cp15_barrier +状态: 正被淘汰,不推荐使用 +默认执行方式: Emulate (1) + +* SETEND +节点: /proc/sys/abi/setend +状态: 正被淘汰,不推荐使用 +默认执行方式: Emulate (1)* +注:为了使能这个特性,系统中的所有 CPU 必须在 EL0 支持混合字节序。 +如果一个新的 CPU (不支持混合字节序) 在使能这个特性后被热插入系统, +在应用中可能会出现不可预期的结果。 diff --git a/Documentation/translations/zh_CN/arch/arm64/memory.txt b/Documentation/translations/zh_CN/arch/arm64/memory.txt new file mode 100644 index 000000000000..c6962e9cb9f8 --- /dev/null +++ b/Documentation/translations/zh_CN/arch/arm64/memory.txt @@ -0,0 +1,114 @@ +Chinese translated version of Documentation/arch/arm64/memory.rst + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +Maintainer: Catalin Marinas +Chinese maintainer: Fu Wei +--------------------------------------------------------------------- +Documentation/arch/arm64/memory.rst 的中文翻译 + +如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 +交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 +译存在问题,请联系中文版维护者。 + +本文翻译提交时的 Git 检出点为: bc465aa9d045feb0e13b4a8f32cc33c1943f62d6 + +英文版维护者: Catalin Marinas +中文版维护者: 傅炜 Fu Wei +中文版翻译者: 傅炜 Fu Wei +中文版校译者: 傅炜 Fu Wei + +以下为正文 +--------------------------------------------------------------------- + Linux 在 AArch64 中的内存布局 + =========================== + +作者: Catalin Marinas + +本文档描述 AArch64 Linux 内核所使用的虚拟内存布局。此构架可以实现 +页大小为 4KB 的 4 级转换表和页大小为 64KB 的 3 级转换表。 + +AArch64 Linux 使用 3 级或 4 级转换表,其页大小配置为 4KB,对于用户和内核 +分别都有 39-bit (512GB) 或 48-bit (256TB) 的虚拟地址空间。 +对于页大小为 64KB的配置,仅使用 2 级转换表,有 42-bit (4TB) 的虚拟地址空间,但内存布局相同。 + +用户地址空间的 63:48 位为 0,而内核地址空间的相应位为 1。TTBRx 的 +选择由虚拟地址的 63 位给出。swapper_pg_dir 仅包含内核(全局)映射, +而用户 pgd 仅包含用户(非全局)映射。swapper_pg_dir 地址被写入 +TTBR1 中,且从不写入 TTBR0。 + + +AArch64 Linux 在页大小为 4KB,并使用 3 级转换表时的内存布局: + +起始地址 结束地址 大小 用途 +----------------------------------------------------------------------- +0000000000000000 0000007fffffffff 512GB 用户空间 +ffffff8000000000 ffffffffffffffff 512GB 内核空间 + + +AArch64 Linux 在页大小为 4KB,并使用 4 级转换表时的内存布局: + +起始地址 结束地址 大小 用途 +----------------------------------------------------------------------- +0000000000000000 0000ffffffffffff 256TB 用户空间 +ffff000000000000 ffffffffffffffff 256TB 内核空间 + + +AArch64 Linux 在页大小为 64KB,并使用 2 级转换表时的内存布局: + +起始地址 结束地址 大小 用途 +----------------------------------------------------------------------- +0000000000000000 000003ffffffffff 4TB 用户空间 +fffffc0000000000 ffffffffffffffff 4TB 内核空间 + + +AArch64 Linux 在页大小为 64KB,并使用 3 级转换表时的内存布局: + +起始地址 结束地址 大小 用途 +----------------------------------------------------------------------- +0000000000000000 0000ffffffffffff 256TB 用户空间 +ffff000000000000 ffffffffffffffff 256TB 内核空间 + + +更详细的内核虚拟内存布局,请参阅内核启动信息。 + + +4KB 页大小的转换表查找: + ++--------+--------+--------+--------+--------+--------+--------+--------+ +|63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| ++--------+--------+--------+--------+--------+--------+--------+--------+ + | | | | | | + | | | | | v + | | | | | [11:0] 页内偏移 + | | | | +-> [20:12] L3 索引 + | | | +-----------> [29:21] L2 索引 + | | +---------------------> [38:30] L1 索引 + | +-------------------------------> [47:39] L0 索引 + +-------------------------------------------------> [63] TTBR0/1 + + +64KB 页大小的转换表查找: + ++--------+--------+--------+--------+--------+--------+--------+--------+ +|63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| ++--------+--------+--------+--------+--------+--------+--------+--------+ + | | | | | + | | | | v + | | | | [15:0] 页内偏移 + | | | +----------> [28:16] L3 索引 + | | +--------------------------> [41:29] L2 索引 + | +-------------------------------> [47:42] L1 索引 + +-------------------------------------------------> [63] TTBR0/1 + + +当使用 KVM 时, 管理程序(hypervisor)在 EL2 中通过相对内核虚拟地址的 +一个固定偏移来映射内核页(内核虚拟地址的高 24 位设为零): + +起始地址 结束地址 大小 用途 +----------------------------------------------------------------------- +0000004000000000 0000007fffffffff 256GB 在 HYP 中映射的内核对象 diff --git a/Documentation/translations/zh_CN/arch/arm64/perf.rst b/Documentation/translations/zh_CN/arch/arm64/perf.rst new file mode 100644 index 000000000000..6be72704e659 --- /dev/null +++ b/Documentation/translations/zh_CN/arch/arm64/perf.rst @@ -0,0 +1,86 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../../disclaimer-zh_CN.rst + +:Original: :ref:`Documentation/arch/arm64/perf.rst ` + +Translator: Bailu Lin + +============= +Perf 事件属性 +============= + +:作者: Andrew Murray +:日期: 2019-03-06 + +exclude_user +------------ + +该属性排除用户空间。 + +用户空间始终运行在 EL0,因此该属性将排除 EL0。 + + +exclude_kernel +-------------- + +该属性排除内核空间。 + +打开 VHE 时内核运行在 EL2,不打开 VHE 时内核运行在 EL1。客户机 +内核总是运行在 EL1。 + +对于宿主机,该属性排除 EL1 和 VHE 上的 EL2。 + +对于客户机,该属性排除 EL1。请注意客户机从来不会运行在 EL2。 + + +exclude_hv +---------- + +该属性排除虚拟机监控器。 + +对于 VHE 宿主机该属性将被忽略,此时我们认为宿主机内核是虚拟机监 +控器。 + +对于 non-VHE 宿主机该属性将排除 EL2,因为虚拟机监控器运行在 EL2 +的任何代码主要用于客户机和宿主机的切换。 + +对于客户机该属性无效。请注意客户机从来不会运行在 EL2。 + + +exclude_host / exclude_guest +---------------------------- + +这些属性分别排除了 KVM 宿主机和客户机。 + +KVM 宿主机可能运行在 EL0(用户空间),EL1(non-VHE 内核)和 +EL2(VHE 内核 或 non-VHE 虚拟机监控器)。 + +KVM 客户机可能运行在 EL0(用户空间)和 EL1(内核)。 + +由于宿主机和客户机之间重叠的异常级别,我们不能仅仅依靠 PMU 的硬件异 +常过滤机制-因此我们必须启用/禁用对于客户机进入和退出的计数。而这在 +VHE 和 non-VHE 系统上表现不同。 + +对于 non-VHE 系统的 exclude_host 属性排除 EL2 - 在进入和退出客户 +机时,我们会根据 exclude_host 和 exclude_guest 属性在适当的情况下 +禁用/启用该事件。 + +对于 VHE 系统的 exclude_guest 属性排除 EL1,而对其中的 exclude_host +属性同时排除 EL0,EL2。在进入和退出客户机时,我们会适当地根据 +exclude_host 和 exclude_guest 属性包括/排除 EL0。 + +以上声明也适用于在 not-VHE 客户机使用这些属性时,但是请注意客户机从 +来不会运行在 EL2。 + + +准确性 +------ + +在 non-VHE 宿主机上,我们在 EL2 进入/退出宿主机/客户机的切换时启用/ +关闭计数器 -但是在启用/禁用计数器和进入/退出客户机之间存在一段延时。 +对于 exclude_host, 我们可以通过过滤 EL2 消除在客户机进入/退出边界 +上用于计数客户机事件的宿主机事件计数器。但是当使用 !exclude_hv 时, +在客户机进入/退出有一个小的停电窗口无法捕获到宿主机的事件。 + +在 VHE 系统没有停电窗口。 diff --git a/Documentation/translations/zh_CN/arch/arm64/silicon-errata.txt b/Documentation/translations/zh_CN/arch/arm64/silicon-errata.txt new file mode 100644 index 000000000000..f4767ffdd61d --- /dev/null +++ b/Documentation/translations/zh_CN/arch/arm64/silicon-errata.txt @@ -0,0 +1,74 @@ +Chinese translated version of Documentation/arch/arm64/silicon-errata.rst + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +M: Will Deacon +zh_CN: Fu Wei +C: 1926e54f115725a9248d0c4c65c22acaf94de4c4 +--------------------------------------------------------------------- +Documentation/arch/arm64/silicon-errata.rst 的中文翻译 + +如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 +交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 +译存在问题,请联系中文版维护者。 + +英文版维护者: Will Deacon +中文版维护者: 傅炜 Fu Wei +中文版翻译者: 傅炜 Fu Wei +中文版校译者: 傅炜 Fu Wei +本文翻译提交时的 Git 检出点为: 1926e54f115725a9248d0c4c65c22acaf94de4c4 + +以下为正文 +--------------------------------------------------------------------- + 芯片勘误和软件补救措施 + ================== + +作者: Will Deacon +日期: 2015年11月27日 + +一个不幸的现实:硬件经常带有一些所谓的“瑕疵(errata)”,导致其在 +某些特定情况下会违背构架定义的行为。就基于 ARM 的硬件而言,这些瑕疵 +大体可分为以下几类: + + A 类:无可行补救措施的严重缺陷。 + B 类:有可接受的补救措施的重大或严重缺陷。 + C 类:在正常操作中不会显现的小瑕疵。 + +更多资讯,请在 infocenter.arm.com (需注册)中查阅“软件开发者勘误 +笔记”(“Software Developers Errata Notice”)文档。 + +对于 Linux 而言,B 类缺陷可能需要操作系统的某些特别处理。例如,避免 +一个特殊的代码序列,或是以一种特定的方式配置处理器。在某种不太常见的 +情况下,为将 A 类缺陷当作 C 类处理,可能需要用类似的手段。这些手段被 +统称为“软件补救措施”,且仅在少数情况需要(例如,那些需要一个运行在 +非安全异常级的补救措施 *并且* 能被 Linux 触发的情况)。 + +对于尚在讨论中的可能对未受瑕疵影响的系统产生干扰的软件补救措施,有一个 +相应的内核配置(Kconfig)选项被加在 “内核特性(Kernel Features)”-> +“基于可选方法框架的 ARM 瑕疵补救措施(ARM errata workarounds via +the alternatives framework)"。这些选项被默认开启,若探测到受影响的CPU, +补丁将在运行时被使用。至于对系统运行影响较小的补救措施,内核配置选项 +并不存在,且代码以某种规避瑕疵的方式被构造(带注释为宜)。 + +这种做法对于在任意内核源代码树中准确地判断出哪个瑕疵已被软件方法所补救 +稍微有点麻烦,所以在 Linux 内核中此文件作为软件补救措施的注册表, +并将在新的软件补救措施被提交和向后移植(backported)到稳定内核时被更新。 + +| 实现者 | 受影响的组件 | 勘误编号 | 内核配置 | ++----------------+-----------------+-----------------+-------------------------+ +| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 | +| ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 | +| ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 | +| ARM | Cortex-A53 | #819472 | ARM64_ERRATUM_819472 | +| ARM | Cortex-A53 | #845719 | ARM64_ERRATUM_845719 | +| ARM | Cortex-A53 | #843419 | ARM64_ERRATUM_843419 | +| ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 | +| ARM | Cortex-A57 | #852523 | N/A | +| ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 | +| | | | | +| Cavium | ThunderX ITS | #22375, #24313 | CAVIUM_ERRATUM_22375 | +| Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 | diff --git a/Documentation/translations/zh_CN/arch/arm64/tagged-pointers.txt b/Documentation/translations/zh_CN/arch/arm64/tagged-pointers.txt new file mode 100644 index 000000000000..27577c3c5e3f --- /dev/null +++ b/Documentation/translations/zh_CN/arch/arm64/tagged-pointers.txt @@ -0,0 +1,52 @@ +Chinese translated version of Documentation/arch/arm64/tagged-pointers.rst + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +Maintainer: Will Deacon +Chinese maintainer: Fu Wei +--------------------------------------------------------------------- +Documentation/arch/arm64/tagged-pointers.rst 的中文翻译 + +如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 +交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 +译存在问题,请联系中文版维护者。 + +英文版维护者: Will Deacon +中文版维护者: 傅炜 Fu Wei +中文版翻译者: 傅炜 Fu Wei +中文版校译者: 傅炜 Fu Wei + +以下为正文 +--------------------------------------------------------------------- + Linux 在 AArch64 中带标记的虚拟地址 + ================================= + +作者: Will Deacon +日期: 2013 年 06 月 12 日 + +本文档简述了在 AArch64 地址转换系统中提供的带标记的虚拟地址及其在 +AArch64 Linux 中的潜在用途。 + +内核提供的地址转换表配置使通过 TTBR0 完成的虚拟地址转换(即用户空间 +映射),其虚拟地址的最高 8 位(63:56)会被转换硬件所忽略。这种机制 +让这些位可供应用程序自由使用,其注意事项如下: + + (1) 内核要求所有传递到 EL1 的用户空间地址带有 0x00 标记。 + 这意味着任何携带用户空间虚拟地址的系统调用(syscall) + 参数 *必须* 在陷入内核前使它们的最高字节被清零。 + + (2) 非零标记在传递信号时不被保存。这意味着在应用程序中利用了 + 标记的信号处理函数无法依赖 siginfo_t 的用户空间虚拟 + 地址所携带的包含其内部域信息的标记。此规则的一个例外是 + 当信号是在调试观察点的异常处理程序中产生的,此时标记的 + 信息将被保存。 + + (3) 当使用带标记的指针时需特别留心,因为仅对两个虚拟地址 + 的高字节,C 编译器很可能无法判断它们是不同的。 + +此构架会阻止对带标记的 PC 指针的利用,因此在异常返回时,其高字节 +将被设置成一个为 “55” 的扩展符。 diff --git a/Documentation/translations/zh_CN/arch/index.rst b/Documentation/translations/zh_CN/arch/index.rst index 908ea131bb1c..6fa0cb671009 100644 --- a/Documentation/translations/zh_CN/arch/index.rst +++ b/Documentation/translations/zh_CN/arch/index.rst @@ -9,7 +9,7 @@ :maxdepth: 2 ../mips/index - ../arm64/index + arm64/index ../riscv/index openrisc/index parisc/index diff --git a/Documentation/translations/zh_CN/arm64/amu.rst b/Documentation/translations/zh_CN/arm64/amu.rst deleted file mode 100644 index ab7180f91394..000000000000 --- a/Documentation/translations/zh_CN/arm64/amu.rst +++ /dev/null @@ -1,100 +0,0 @@ -.. include:: ../disclaimer-zh_CN.rst - -:Original: :ref:`Documentation/arm64/amu.rst ` - -Translator: Bailu Lin - -================================== -AArch64 Linux 中扩展的活动监控单元 -================================== - -作者: Ionela Voinescu - -日期: 2019-09-10 - -本文档简要描述了 AArch64 Linux 支持的活动监控单元的规范。 - - -架构总述 --------- - -活动监控是 ARMv8.4 CPU 架构引入的一个可选扩展特性。 - -活动监控单元(在每个 CPU 中实现)为系统管理提供了性能计数器。既可以通 -过系统寄存器的方式访问计数器,同时也支持外部内存映射的方式访问计数器。 - -AMUv1 架构实现了一个由4个固定的64位事件计数器组成的计数器组。 - - - CPU 周期计数器:同 CPU 的频率增长 - - 常量计数器:同固定的系统时钟频率增长 - - 淘汰指令计数器: 同每次架构指令执行增长 - - 内存停顿周期计数器:计算由在时钟域内的最后一级缓存中未命中而引起 - 的指令调度停顿周期数 - -当处于 WFI 或者 WFE 状态时,计数器不会增长。 - -AMU 架构提供了一个高达16位的事件计数器空间,未来新的 AMU 版本中可能 -用它来实现新增的事件计数器。 - -另外,AMUv1 实现了一个多达16个64位辅助事件计数器的计数器组。 - -冷复位时所有的计数器会清零。 - - -基本支持 --------- - -内核可以安全地运行在支持 AMU 和不支持 AMU 的 CPU 组合中。 -因此,当配置 CONFIG_ARM64_AMU_EXTN 后我们无条件使能后续 -(secondary or hotplugged) CPU 检测和使用这个特性。 - -当在 CPU 上检测到该特性时,我们会标记为特性可用但是不能保证计数器的功能, -仅表明有扩展属性。 - -固件(代码运行在高异常级别,例如 arm-tf )需支持以下功能: - - - 提供低异常级别(EL2 和 EL1)访问 AMU 寄存器的能力。 - - 使能计数器。如果未使能,它的值应为 0。 - - 在从电源关闭状态启动 CPU 前或后保存或者恢复计数器。 - -当使用使能了该特性的内核启动但固件损坏时,访问计数器寄存器可能会遭遇 -panic 或者死锁。即使未发现这些症状,计数器寄存器返回的数据结果并不一 -定能反映真实情况。通常,计数器会返回 0,表明他们未被使能。 - -如果固件没有提供适当的支持最好关闭 CONFIG_ARM64_AMU_EXTN。 -值得注意的是,出于安全原因,不要绕过 AMUSERRENR_EL0 设置而捕获从 -EL0(用户空间) 访问 EL1(内核空间)。 因此,固件应该确保访问 AMU寄存器 -不会困在 EL2或EL3。 - -AMUv1 的固定计数器可以通过如下系统寄存器访问: - - - SYS_AMEVCNTR0_CORE_EL0 - - SYS_AMEVCNTR0_CONST_EL0 - - SYS_AMEVCNTR0_INST_RET_EL0 - - SYS_AMEVCNTR0_MEM_STALL_EL0 - -特定辅助计数器可以通过 SYS_AMEVCNTR1_EL0(n) 访问,其中n介于0到15。 - -详细信息定义在目录:arch/arm64/include/asm/sysreg.h。 - - -用户空间访问 ------------- - -由于以下原因,当前禁止从用户空间访问 AMU 的寄存器: - - - 安全因数:可能会暴露处于安全模式执行的代码信息。 - - 意愿:AMU 是用于系统管理的。 - -同样,该功能对用户空间不可见。 - - -虚拟化 ------- - -由于以下原因,当前禁止从 KVM 客户端的用户空间(EL0)和内核空间(EL1) -访问 AMU 的寄存器: - - - 安全因数:可能会暴露给其他客户端或主机端执行的代码信息。 - -任何试图访问 AMU 寄存器的行为都会触发一个注册在客户端的未定义异常。 diff --git a/Documentation/translations/zh_CN/arm64/booting.txt b/Documentation/translations/zh_CN/arm64/booting.txt deleted file mode 100644 index 5b0164132c71..000000000000 --- a/Documentation/translations/zh_CN/arm64/booting.txt +++ /dev/null @@ -1,246 +0,0 @@ -Chinese translated version of Documentation/arm64/booting.rst - -If you have any comment or update to the content, please contact the -original document maintainer directly. However, if you have a problem -communicating in English you can also ask the Chinese maintainer for -help. Contact the Chinese maintainer if this translation is outdated -or if there is a problem with the translation. - -M: Will Deacon -zh_CN: Fu Wei -C: 55f058e7574c3615dea4615573a19bdb258696c6 ---------------------------------------------------------------------- -Documentation/arm64/booting.rst 的中文翻译 - -如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 -交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 -译存在问题,请联系中文版维护者。 - -英文版维护者: Will Deacon -中文版维护者: 傅炜 Fu Wei -中文版翻译者: 傅炜 Fu Wei -中文版校译者: 傅炜 Fu Wei -本文翻译提交时的 Git 检出点为: 55f058e7574c3615dea4615573a19bdb258696c6 - -以下为正文 ---------------------------------------------------------------------- - 启动 AArch64 Linux - ================== - -作者: Will Deacon -日期: 2012 年 09 月 07 日 - -本文档基于 Russell King 的 ARM 启动文档,且适用于所有公开发布的 -AArch64 Linux 内核代码。 - -AArch64 异常模型由多个异常级(EL0 - EL3)组成,对于 EL0 和 EL1 异常级 -有对应的安全和非安全模式。EL2 是系统管理级,且仅存在于非安全模式下。 -EL3 是最高特权级,且仅存在于安全模式下。 - -基于本文档的目的,我们将简单地使用‘引导装载程序’(‘boot loader’) -这个术语来定义在将控制权交给 Linux 内核前 CPU 上执行的所有软件。 -这可能包含安全监控和系统管理代码,或者它可能只是一些用于准备最小启动 -环境的指令。 - -基本上,引导装载程序(至少)应实现以下操作: - -1、设置和初始化 RAM -2、设置设备树数据 -3、解压内核映像 -4、调用内核映像 - - -1、设置和初始化 RAM ------------------ - -必要性: 强制 - -引导装载程序应该找到并初始化系统中所有内核用于保持系统变量数据的 RAM。 -这个操作的执行方式因设备而异。(它可能使用内部算法来自动定位和计算所有 -RAM,或可能使用对这个设备已知的 RAM 信息,还可能是引导装载程序设计者 -想到的任何合适的方法。) - - -2、设置设备树数据 ---------------- - -必要性: 强制 - -设备树数据块(dtb)必须 8 字节对齐,且大小不能超过 2MB。由于设备树 -数据块将在使能缓存的情况下以 2MB 粒度被映射,故其不能被置于必须以特定 -属性映射的2M区域内。 - -注: v4.2 之前的版本同时要求设备树数据块被置于从内核映像以下 -text_offset 字节处算起第一个 512MB 内。 - -3、解压内核映像 -------------- - -必要性: 可选 - -AArch64 内核当前没有提供自解压代码,因此如果使用了压缩内核映像文件 -(比如 Image.gz),则需要通过引导装载程序(使用 gzip 等)来进行解压。 -若引导装载程序没有实现这个功能,就要使用非压缩内核映像文件。 - - -4、调用内核映像 -------------- - -必要性: 强制 - -已解压的内核映像包含一个 64 字节的头,内容如下: - - u32 code0; /* 可执行代码 */ - u32 code1; /* 可执行代码 */ - u64 text_offset; /* 映像装载偏移,小端模式 */ - u64 image_size; /* 映像实际大小, 小端模式 */ - u64 flags; /* 内核旗标, 小端模式 * - u64 res2 = 0; /* 保留 */ - u64 res3 = 0; /* 保留 */ - u64 res4 = 0; /* 保留 */ - u32 magic = 0x644d5241; /* 魔数, 小端, "ARM\x64" */ - u32 res5; /* 保留 (用于 PE COFF 偏移) */ - - -映像头注释: - -- 自 v3.17 起,除非另有说明,所有域都是小端模式。 - -- code0/code1 负责跳转到 stext. - -- 当通过 EFI 启动时, 最初 code0/code1 被跳过。 - res5 是到 PE 文件头的偏移,而 PE 文件头含有 EFI 的启动入口点 - (efi_stub_entry)。当 stub 代码完成了它的使命,它会跳转到 code0 - 继续正常的启动流程。 - -- v3.17 之前,未明确指定 text_offset 的字节序。此时,image_size 为零, - 且 text_offset 依照内核字节序为 0x80000。 - 当 image_size 非零,text_offset 为小端模式且是有效值,应被引导加载 - 程序使用。当 image_size 为零,text_offset 可假定为 0x80000。 - -- flags 域 (v3.17 引入) 为 64 位小端模式,其编码如下: - 位 0: 内核字节序。 1 表示大端模式,0 表示小端模式。 - 位 1-2: 内核页大小。 - 0 - 未指定。 - 1 - 4K - 2 - 16K - 3 - 64K - 位 3: 内核物理位置 - 0 - 2MB 对齐基址应尽量靠近内存起始处,因为 - 其基址以下的内存无法通过线性映射访问 - 1 - 2MB 对齐基址可以在物理内存的任意位置 - 位 4-63: 保留。 - -- 当 image_size 为零时,引导装载程序应试图在内核映像末尾之后尽可能 - 多地保留空闲内存供内核直接使用。对内存空间的需求量因所选定的内核 - 特性而异, 并无实际限制。 - -内核映像必须被放置在任意一个可用系统内存 2MB 对齐基址的 text_offset -字节处,并从该处被调用。2MB 对齐基址和内核映像起始地址之间的区域对于 -内核来说没有特殊意义,且可能被用于其他目的。 -从映像起始地址算起,最少必须准备 image_size 字节的空闲内存供内核使用。 -注: v4.6 之前的版本无法使用内核映像物理偏移以下的内存,所以当时建议 -将映像尽量放置在靠近系统内存起始的地方。 - -任何提供给内核的内存(甚至在映像起始地址之前),若未从内核中标记为保留 -(如在设备树(dtb)的 memreserve 区域),都将被认为对内核是可用。 - -在跳转入内核前,必须符合以下状态: - -- 停止所有 DMA 设备,这样内存数据就不会因为虚假网络包或磁盘数据而 - 被破坏。这可能可以节省你许多的调试时间。 - -- 主 CPU 通用寄存器设置 - x0 = 系统 RAM 中设备树数据块(dtb)的物理地址。 - x1 = 0 (保留,将来可能使用) - x2 = 0 (保留,将来可能使用) - x3 = 0 (保留,将来可能使用) - -- CPU 模式 - 所有形式的中断必须在 PSTATE.DAIF 中被屏蔽(Debug、SError、IRQ - 和 FIQ)。 - CPU 必须处于 EL2(推荐,可访问虚拟化扩展)或非安全 EL1 模式下。 - -- 高速缓存、MMU - MMU 必须关闭。 - 指令缓存开启或关闭皆可。 - 已载入的内核映像的相应内存区必须被清理,以达到缓存一致性点(PoC)。 - 当存在系统缓存或其他使能缓存的一致性主控器时,通常需使用虚拟地址 - 维护其缓存,而非 set/way 操作。 - 遵从通过虚拟地址操作维护构架缓存的系统缓存必须被配置,并可以被使能。 - 而不通过虚拟地址操作维护构架缓存的系统缓存(不推荐),必须被配置且 - 禁用。 - - *译者注:对于 PoC 以及缓存相关内容,请参考 ARMv8 构架参考手册 - ARM DDI 0487A - -- 架构计时器 - CNTFRQ 必须设定为计时器的频率,且 CNTVOFF 必须设定为对所有 CPU - 都一致的值。如果在 EL1 模式下进入内核,则 CNTHCTL_EL2 中的 - EL1PCTEN (bit 0) 必须置位。 - -- 一致性 - 通过内核启动的所有 CPU 在内核入口地址上必须处于相同的一致性域中。 - 这可能要根据具体实现来定义初始化过程,以使能每个CPU上对维护操作的 - 接收。 - -- 系统寄存器 - 在进入内核映像的异常级中,所有构架中可写的系统寄存器必须通过软件 - 在一个更高的异常级别下初始化,以防止在 未知 状态下运行。 - - 对于拥有 GICv3 中断控制器并以 v3 模式运行的系统: - - 如果 EL3 存在: - ICC_SRE_EL3.Enable (位 3) 必须初始化为 0b1。 - ICC_SRE_EL3.SRE (位 0) 必须初始化为 0b1。 - - 若内核运行在 EL1: - ICC_SRE_EL2.Enable (位 3) 必须初始化为 0b1。 - ICC_SRE_EL2.SRE (位 0) 必须初始化为 0b1。 - - 设备树(DT)或 ACPI 表必须描述一个 GICv3 中断控制器。 - - 对于拥有 GICv3 中断控制器并以兼容(v2)模式运行的系统: - - 如果 EL3 存在: - ICC_SRE_EL3.SRE (位 0) 必须初始化为 0b0。 - - 若内核运行在 EL1: - ICC_SRE_EL2.SRE (位 0) 必须初始化为 0b0。 - - 设备树(DT)或 ACPI 表必须描述一个 GICv2 中断控制器。 - -以上对于 CPU 模式、高速缓存、MMU、架构计时器、一致性、系统寄存器的 -必要条件描述适用于所有 CPU。所有 CPU 必须在同一异常级别跳入内核。 - -引导装载程序必须在每个 CPU 处于以下状态时跳入内核入口: - -- 主 CPU 必须直接跳入内核映像的第一条指令。通过此 CPU 传递的设备树 - 数据块必须在每个 CPU 节点中包含一个 ‘enable-method’ 属性,所 - 支持的 enable-method 请见下文。 - - 引导装载程序必须生成这些设备树属性,并在跳入内核入口之前将其插入 - 数据块。 - -- enable-method 为 “spin-table” 的 CPU 必须在它们的 CPU - 节点中包含一个 ‘cpu-release-addr’ 属性。这个属性标识了一个 - 64 位自然对齐且初始化为零的内存位置。 - - 这些 CPU 必须在内存保留区(通过设备树中的 /memreserve/ 域传递 - 给内核)中自旋于内核之外,轮询它们的 cpu-release-addr 位置(必须 - 包含在保留区中)。可通过插入 wfe 指令来降低忙循环开销,而主 CPU 将 - 发出 sev 指令。当对 cpu-release-addr 所指位置的读取操作返回非零值 - 时,CPU 必须跳入此值所指向的地址。此值为一个单独的 64 位小端值, - 因此 CPU 须在跳转前将所读取的值转换为其本身的端模式。 - -- enable-method 为 “psci” 的 CPU 保持在内核外(比如,在 - memory 节点中描述为内核空间的内存区外,或在通过设备树 /memreserve/ - 域中描述为内核保留区的空间中)。内核将会发起在 ARM 文档(编号 - ARM DEN 0022A:用于 ARM 上的电源状态协调接口系统软件)中描述的 - CPU_ON 调用来将 CPU 带入内核。 - - *译者注: ARM DEN 0022A 已更新到 ARM DEN 0022C。 - - 设备树必须包含一个 ‘psci’ 节点,请参考以下文档: - Documentation/devicetree/bindings/arm/psci.yaml - - -- 辅助 CPU 通用寄存器设置 - x0 = 0 (保留,将来可能使用) - x1 = 0 (保留,将来可能使用) - x2 = 0 (保留,将来可能使用) - x3 = 0 (保留,将来可能使用) diff --git a/Documentation/translations/zh_CN/arm64/elf_hwcaps.rst b/Documentation/translations/zh_CN/arm64/elf_hwcaps.rst deleted file mode 100644 index 9aa4637eac97..000000000000 --- a/Documentation/translations/zh_CN/arm64/elf_hwcaps.rst +++ /dev/null @@ -1,240 +0,0 @@ -.. include:: ../disclaimer-zh_CN.rst - -:Original: :ref:`Documentation/arm64/elf_hwcaps.rst ` - -Translator: Bailu Lin - -================ -ARM64 ELF hwcaps -================ - -这篇文档描述了 arm64 ELF hwcaps 的用法和语义。 - - -1. 简介 -------- - -有些硬件或软件功能仅在某些 CPU 实现上和/或在具体某个内核配置上可用,但 -对于处于 EL0 的用户空间代码没有可用的架构发现机制。内核通过在辅助向量表 -公开一组称为 hwcaps 的标志而把这些功能暴露给用户空间。 - -用户空间软件可以通过获取辅助向量的 AT_HWCAP 或 AT_HWCAP2 条目来测试功能, -并测试是否设置了相关标志,例如:: - - bool floating_point_is_present(void) - { - unsigned long hwcaps = getauxval(AT_HWCAP); - if (hwcaps & HWCAP_FP) - return true; - - return false; - } - -如果软件依赖于 hwcap 描述的功能,在尝试使用该功能前则应检查相关的 hwcap -标志以验证该功能是否存在。 - -不能通过其他方式探查这些功能。当一个功能不可用时,尝试使用它可能导致不可 -预测的行为,并且无法保证能确切的知道该功能不可用,例如 SIGILL。 - - -2. Hwcaps 的说明 ----------------- - -大多数 hwcaps 旨在说明通过架构 ID 寄存器(处于 EL0 的用户空间代码无法访问) -描述的功能的存在。这些 hwcap 通过 ID 寄存器字段定义,并且应根据 ARM 体系 -结构参考手册(ARM ARM)中定义的字段来解释说明。 - -这些 hwcaps 以下面的形式描述:: - - idreg.field == val 表示有某个功能。 - -当 idreg.field 中有 val 时,hwcaps 表示 ARM ARM 定义的功能是有效的,但是 -并不是说要完全和 val 相等,也不是说 idreg.field 描述的其他功能就是缺失的。 - -其他 hwcaps 可能表明无法仅由 ID 寄存器描述的功能的存在。这些 hwcaps 可能 -没有被 ID 寄存器描述,需要参考其他文档。 - - -3. AT_HWCAP 中揭示的 hwcaps ---------------------------- - -HWCAP_FP - ID_AA64PFR0_EL1.FP == 0b0000 表示有此功能。 - -HWCAP_ASIMD - ID_AA64PFR0_EL1.AdvSIMD == 0b0000 表示有此功能。 - -HWCAP_EVTSTRM - 通用计时器频率配置为大约100KHz以生成事件。 - -HWCAP_AES - ID_AA64ISAR0_EL1.AES == 0b0001 表示有此功能。 - -HWCAP_PMULL - ID_AA64ISAR0_EL1.AES == 0b0010 表示有此功能。 - -HWCAP_SHA1 - ID_AA64ISAR0_EL1.SHA1 == 0b0001 表示有此功能。 - -HWCAP_SHA2 - ID_AA64ISAR0_EL1.SHA2 == 0b0001 表示有此功能。 - -HWCAP_CRC32 - ID_AA64ISAR0_EL1.CRC32 == 0b0001 表示有此功能。 - -HWCAP_ATOMICS - ID_AA64ISAR0_EL1.Atomic == 0b0010 表示有此功能。 - -HWCAP_FPHP - ID_AA64PFR0_EL1.FP == 0b0001 表示有此功能。 - -HWCAP_ASIMDHP - ID_AA64PFR0_EL1.AdvSIMD == 0b0001 表示有此功能。 - -HWCAP_CPUID - 根据 Documentation/arm64/cpu-feature-registers.rst 描述,EL0 可以访问 - 某些 ID 寄存器。 - - 这些 ID 寄存器可能表示功能的可用性。 - -HWCAP_ASIMDRDM - ID_AA64ISAR0_EL1.RDM == 0b0001 表示有此功能。 - -HWCAP_JSCVT - ID_AA64ISAR1_EL1.JSCVT == 0b0001 表示有此功能。 - -HWCAP_FCMA - ID_AA64ISAR1_EL1.FCMA == 0b0001 表示有此功能。 - -HWCAP_LRCPC - ID_AA64ISAR1_EL1.LRCPC == 0b0001 表示有此功能。 - -HWCAP_DCPOP - ID_AA64ISAR1_EL1.DPB == 0b0001 表示有此功能。 - -HWCAP_SHA3 - ID_AA64ISAR0_EL1.SHA3 == 0b0001 表示有此功能。 - -HWCAP_SM3 - ID_AA64ISAR0_EL1.SM3 == 0b0001 表示有此功能。 - -HWCAP_SM4 - ID_AA64ISAR0_EL1.SM4 == 0b0001 表示有此功能。 - -HWCAP_ASIMDDP - ID_AA64ISAR0_EL1.DP == 0b0001 表示有此功能。 - -HWCAP_SHA512 - ID_AA64ISAR0_EL1.SHA2 == 0b0010 表示有此功能。 - -HWCAP_SVE - ID_AA64PFR0_EL1.SVE == 0b0001 表示有此功能。 - -HWCAP_ASIMDFHM - ID_AA64ISAR0_EL1.FHM == 0b0001 表示有此功能。 - -HWCAP_DIT - ID_AA64PFR0_EL1.DIT == 0b0001 表示有此功能。 - -HWCAP_USCAT - ID_AA64MMFR2_EL1.AT == 0b0001 表示有此功能。 - -HWCAP_ILRCPC - ID_AA64ISAR1_EL1.LRCPC == 0b0010 表示有此功能。 - -HWCAP_FLAGM - ID_AA64ISAR0_EL1.TS == 0b0001 表示有此功能。 - -HWCAP_SSBS - ID_AA64PFR1_EL1.SSBS == 0b0010 表示有此功能。 - -HWCAP_SB - ID_AA64ISAR1_EL1.SB == 0b0001 表示有此功能。 - -HWCAP_PACA - 如 Documentation/arm64/pointer-authentication.rst 所描述, - ID_AA64ISAR1_EL1.APA == 0b0001 或 ID_AA64ISAR1_EL1.API == 0b0001 - 表示有此功能。 - -HWCAP_PACG - 如 Documentation/arm64/pointer-authentication.rst 所描述, - ID_AA64ISAR1_EL1.GPA == 0b0001 或 ID_AA64ISAR1_EL1.GPI == 0b0001 - 表示有此功能。 - -HWCAP2_DCPODP - - ID_AA64ISAR1_EL1.DPB == 0b0010 表示有此功能。 - -HWCAP2_SVE2 - - ID_AA64ZFR0_EL1.SVEVer == 0b0001 表示有此功能。 - -HWCAP2_SVEAES - - ID_AA64ZFR0_EL1.AES == 0b0001 表示有此功能。 - -HWCAP2_SVEPMULL - - ID_AA64ZFR0_EL1.AES == 0b0010 表示有此功能。 - -HWCAP2_SVEBITPERM - - ID_AA64ZFR0_EL1.BitPerm == 0b0001 表示有此功能。 - -HWCAP2_SVESHA3 - - ID_AA64ZFR0_EL1.SHA3 == 0b0001 表示有此功能。 - -HWCAP2_SVESM4 - - ID_AA64ZFR0_EL1.SM4 == 0b0001 表示有此功能。 - -HWCAP2_FLAGM2 - - ID_AA64ISAR0_EL1.TS == 0b0010 表示有此功能。 - -HWCAP2_FRINT - - ID_AA64ISAR1_EL1.FRINTTS == 0b0001 表示有此功能。 - -HWCAP2_SVEI8MM - - ID_AA64ZFR0_EL1.I8MM == 0b0001 表示有此功能。 - -HWCAP2_SVEF32MM - - ID_AA64ZFR0_EL1.F32MM == 0b0001 表示有此功能。 - -HWCAP2_SVEF64MM - - ID_AA64ZFR0_EL1.F64MM == 0b0001 表示有此功能。 - -HWCAP2_SVEBF16 - - ID_AA64ZFR0_EL1.BF16 == 0b0001 表示有此功能。 - -HWCAP2_I8MM - - ID_AA64ISAR1_EL1.I8MM == 0b0001 表示有此功能。 - -HWCAP2_BF16 - - ID_AA64ISAR1_EL1.BF16 == 0b0001 表示有此功能。 - -HWCAP2_DGH - - ID_AA64ISAR1_EL1.DGH == 0b0001 表示有此功能。 - -HWCAP2_RNG - - ID_AA64ISAR0_EL1.RNDR == 0b0001 表示有此功能。 - -HWCAP2_BTI - - ID_AA64PFR0_EL1.BT == 0b0001 表示有此功能。 - - -4. 未使用的 AT_HWCAP 位 ------------------------ - -为了与用户空间交互,内核保证 AT_HWCAP 的第62、63位将始终返回0。 diff --git a/Documentation/translations/zh_CN/arm64/hugetlbpage.rst b/Documentation/translations/zh_CN/arm64/hugetlbpage.rst deleted file mode 100644 index 13304d269d0b..000000000000 --- a/Documentation/translations/zh_CN/arm64/hugetlbpage.rst +++ /dev/null @@ -1,45 +0,0 @@ -.. include:: ../disclaimer-zh_CN.rst - -:Original: :ref:`Documentation/arm64/hugetlbpage.rst ` - -Translator: Bailu Lin - -===================== -ARM64中的 HugeTLBpage -===================== - -大页依靠有效利用 TLBs 来提高地址翻译的性能。这取决于以下 -两点 - - - - 大页的大小 - - TLBs 支持的条目大小 - -ARM64 接口支持2种大页方式。 - -1) pud/pmd 级别的块映射 ------------------------ - -这是常规大页,他们的 pmd 或 pud 页面表条目指向一个内存块。 -不管 TLB 中支持的条目大小如何,块映射可以减少翻译大页地址 -所需遍历的页表深度。 - -2) 使用连续位 -------------- - -架构中转换页表条目(D4.5.3, ARM DDI 0487C.a)中提供一个连续 -位告诉 MMU 这个条目是一个连续条目集的一员,它可以被缓存在单 -个 TLB 条目中。 - -在 Linux 中连续位用来增加 pmd 和 pte(最后一级)级别映射的大 -小。受支持的连续页表条目数量因页面大小和页表级别而异。 - - -支持以下大页尺寸配置 - - - ====== ======== ==== ======== === - - CONT PTE PMD CONT PMD PUD - ====== ======== ==== ======== === - 4K: 64K 2M 32M 1G - 16K: 2M 32M 1G - 64K: 2M 512M 16G - ====== ======== ==== ======== === diff --git a/Documentation/translations/zh_CN/arm64/index.rst b/Documentation/translations/zh_CN/arm64/index.rst deleted file mode 100644 index 57dc5de5ccc5..000000000000 --- a/Documentation/translations/zh_CN/arm64/index.rst +++ /dev/null @@ -1,19 +0,0 @@ -.. include:: ../disclaimer-zh_CN.rst - -:Original: :ref:`Documentation/arm64/index.rst ` -:Translator: Bailu Lin - -.. _cn_arm64_index: - - -========== -ARM64 架构 -========== - -.. toctree:: - :maxdepth: 2 - - amu - hugetlbpage - perf - elf_hwcaps diff --git a/Documentation/translations/zh_CN/arm64/legacy_instructions.txt b/Documentation/translations/zh_CN/arm64/legacy_instructions.txt deleted file mode 100644 index e295cf75f606..000000000000 --- a/Documentation/translations/zh_CN/arm64/legacy_instructions.txt +++ /dev/null @@ -1,72 +0,0 @@ -Chinese translated version of Documentation/arm64/legacy_instructions.rst - -If you have any comment or update to the content, please contact the -original document maintainer directly. However, if you have a problem -communicating in English you can also ask the Chinese maintainer for -help. Contact the Chinese maintainer if this translation is outdated -or if there is a problem with the translation. - -Maintainer: Punit Agrawal - Suzuki K. Poulose -Chinese maintainer: Fu Wei ---------------------------------------------------------------------- -Documentation/arm64/legacy_instructions.rst 的中文翻译 - -如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 -交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 -译存在问题,请联系中文版维护者。 - -本文翻译提交时的 Git 检出点为: bc465aa9d045feb0e13b4a8f32cc33c1943f62d6 - -英文版维护者: Punit Agrawal - Suzuki K. Poulose -中文版维护者: 傅炜 Fu Wei -中文版翻译者: 傅炜 Fu Wei -中文版校译者: 傅炜 Fu Wei - -以下为正文 ---------------------------------------------------------------------- -Linux 内核在 arm64 上的移植提供了一个基础框架,以支持构架中正在被淘汰或已废弃指令的模拟执行。 -这个基础框架的代码使用未定义指令钩子(hooks)来支持模拟。如果指令存在,它也允许在硬件中启用该指令。 - -模拟模式可通过写 sysctl 节点(/proc/sys/abi)来控制。 -不同的执行方式及 sysctl 节点的相应值,解释如下: - -* Undef(未定义) - 值: 0 - 产生未定义指令终止异常。它是那些构架中已废弃的指令,如 SWP,的默认处理方式。 - -* Emulate(模拟) - 值: 1 - 使用软件模拟方式。为解决软件迁移问题,这种模拟指令模式的使用是被跟踪的,并会发出速率限制警告。 - 它是那些构架中正在被淘汰的指令,如 CP15 barriers(隔离指令),的默认处理方式。 - -* Hardware Execution(硬件执行) - 值: 2 - 虽然标记为正在被淘汰,但一些实现可能提供硬件执行这些指令的使能/禁用操作。 - 使用硬件执行一般会有更好的性能,但将无法收集运行时对正被淘汰指令的使用统计数据。 - -默认执行模式依赖于指令在构架中状态。正在被淘汰的指令应该以模拟(Emulate)作为默认模式, -而已废弃的指令必须默认使用未定义(Undef)模式 - -注意:指令模拟可能无法应对所有情况。更多详情请参考单独的指令注释。 - -受支持的遗留指令 -------------- -* SWP{B} -节点: /proc/sys/abi/swp -状态: 已废弃 -默认执行方式: Undef (0) - -* CP15 Barriers -节点: /proc/sys/abi/cp15_barrier -状态: 正被淘汰,不推荐使用 -默认执行方式: Emulate (1) - -* SETEND -节点: /proc/sys/abi/setend -状态: 正被淘汰,不推荐使用 -默认执行方式: Emulate (1)* -注:为了使能这个特性,系统中的所有 CPU 必须在 EL0 支持混合字节序。 -如果一个新的 CPU (不支持混合字节序) 在使能这个特性后被热插入系统, -在应用中可能会出现不可预期的结果。 diff --git a/Documentation/translations/zh_CN/arm64/memory.txt b/Documentation/translations/zh_CN/arm64/memory.txt deleted file mode 100644 index be20f8228b91..000000000000 --- a/Documentation/translations/zh_CN/arm64/memory.txt +++ /dev/null @@ -1,114 +0,0 @@ -Chinese translated version of Documentation/arm64/memory.rst - -If you have any comment or update to the content, please contact the -original document maintainer directly. However, if you have a problem -communicating in English you can also ask the Chinese maintainer for -help. Contact the Chinese maintainer if this translation is outdated -or if there is a problem with the translation. - -Maintainer: Catalin Marinas -Chinese maintainer: Fu Wei ---------------------------------------------------------------------- -Documentation/arm64/memory.rst 的中文翻译 - -如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 -交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 -译存在问题,请联系中文版维护者。 - -本文翻译提交时的 Git 检出点为: bc465aa9d045feb0e13b4a8f32cc33c1943f62d6 - -英文版维护者: Catalin Marinas -中文版维护者: 傅炜 Fu Wei -中文版翻译者: 傅炜 Fu Wei -中文版校译者: 傅炜 Fu Wei - -以下为正文 ---------------------------------------------------------------------- - Linux 在 AArch64 中的内存布局 - =========================== - -作者: Catalin Marinas - -本文档描述 AArch64 Linux 内核所使用的虚拟内存布局。此构架可以实现 -页大小为 4KB 的 4 级转换表和页大小为 64KB 的 3 级转换表。 - -AArch64 Linux 使用 3 级或 4 级转换表,其页大小配置为 4KB,对于用户和内核 -分别都有 39-bit (512GB) 或 48-bit (256TB) 的虚拟地址空间。 -对于页大小为 64KB的配置,仅使用 2 级转换表,有 42-bit (4TB) 的虚拟地址空间,但内存布局相同。 - -用户地址空间的 63:48 位为 0,而内核地址空间的相应位为 1。TTBRx 的 -选择由虚拟地址的 63 位给出。swapper_pg_dir 仅包含内核(全局)映射, -而用户 pgd 仅包含用户(非全局)映射。swapper_pg_dir 地址被写入 -TTBR1 中,且从不写入 TTBR0。 - - -AArch64 Linux 在页大小为 4KB,并使用 3 级转换表时的内存布局: - -起始地址 结束地址 大小 用途 ------------------------------------------------------------------------ -0000000000000000 0000007fffffffff 512GB 用户空间 -ffffff8000000000 ffffffffffffffff 512GB 内核空间 - - -AArch64 Linux 在页大小为 4KB,并使用 4 级转换表时的内存布局: - -起始地址 结束地址 大小 用途 ------------------------------------------------------------------------ -0000000000000000 0000ffffffffffff 256TB 用户空间 -ffff000000000000 ffffffffffffffff 256TB 内核空间 - - -AArch64 Linux 在页大小为 64KB,并使用 2 级转换表时的内存布局: - -起始地址 结束地址 大小 用途 ------------------------------------------------------------------------ -0000000000000000 000003ffffffffff 4TB 用户空间 -fffffc0000000000 ffffffffffffffff 4TB 内核空间 - - -AArch64 Linux 在页大小为 64KB,并使用 3 级转换表时的内存布局: - -起始地址 结束地址 大小 用途 ------------------------------------------------------------------------ -0000000000000000 0000ffffffffffff 256TB 用户空间 -ffff000000000000 ffffffffffffffff 256TB 内核空间 - - -更详细的内核虚拟内存布局,请参阅内核启动信息。 - - -4KB 页大小的转换表查找: - -+--------+--------+--------+--------+--------+--------+--------+--------+ -|63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| -+--------+--------+--------+--------+--------+--------+--------+--------+ - | | | | | | - | | | | | v - | | | | | [11:0] 页内偏移 - | | | | +-> [20:12] L3 索引 - | | | +-----------> [29:21] L2 索引 - | | +---------------------> [38:30] L1 索引 - | +-------------------------------> [47:39] L0 索引 - +-------------------------------------------------> [63] TTBR0/1 - - -64KB 页大小的转换表查找: - -+--------+--------+--------+--------+--------+--------+--------+--------+ -|63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| -+--------+--------+--------+--------+--------+--------+--------+--------+ - | | | | | - | | | | v - | | | | [15:0] 页内偏移 - | | | +----------> [28:16] L3 索引 - | | +--------------------------> [41:29] L2 索引 - | +-------------------------------> [47:42] L1 索引 - +-------------------------------------------------> [63] TTBR0/1 - - -当使用 KVM 时, 管理程序(hypervisor)在 EL2 中通过相对内核虚拟地址的 -一个固定偏移来映射内核页(内核虚拟地址的高 24 位设为零): - -起始地址 结束地址 大小 用途 ------------------------------------------------------------------------ -0000004000000000 0000007fffffffff 256GB 在 HYP 中映射的内核对象 diff --git a/Documentation/translations/zh_CN/arm64/perf.rst b/Documentation/translations/zh_CN/arm64/perf.rst deleted file mode 100644 index 9bf21d73f4d1..000000000000 --- a/Documentation/translations/zh_CN/arm64/perf.rst +++ /dev/null @@ -1,86 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -.. include:: ../disclaimer-zh_CN.rst - -:Original: :ref:`Documentation/arm64/perf.rst ` - -Translator: Bailu Lin - -============= -Perf 事件属性 -============= - -:作者: Andrew Murray -:日期: 2019-03-06 - -exclude_user ------------- - -该属性排除用户空间。 - -用户空间始终运行在 EL0,因此该属性将排除 EL0。 - - -exclude_kernel --------------- - -该属性排除内核空间。 - -打开 VHE 时内核运行在 EL2,不打开 VHE 时内核运行在 EL1。客户机 -内核总是运行在 EL1。 - -对于宿主机,该属性排除 EL1 和 VHE 上的 EL2。 - -对于客户机,该属性排除 EL1。请注意客户机从来不会运行在 EL2。 - - -exclude_hv ----------- - -该属性排除虚拟机监控器。 - -对于 VHE 宿主机该属性将被忽略,此时我们认为宿主机内核是虚拟机监 -控器。 - -对于 non-VHE 宿主机该属性将排除 EL2,因为虚拟机监控器运行在 EL2 -的任何代码主要用于客户机和宿主机的切换。 - -对于客户机该属性无效。请注意客户机从来不会运行在 EL2。 - - -exclude_host / exclude_guest ----------------------------- - -这些属性分别排除了 KVM 宿主机和客户机。 - -KVM 宿主机可能运行在 EL0(用户空间),EL1(non-VHE 内核)和 -EL2(VHE 内核 或 non-VHE 虚拟机监控器)。 - -KVM 客户机可能运行在 EL0(用户空间)和 EL1(内核)。 - -由于宿主机和客户机之间重叠的异常级别,我们不能仅仅依靠 PMU 的硬件异 -常过滤机制-因此我们必须启用/禁用对于客户机进入和退出的计数。而这在 -VHE 和 non-VHE 系统上表现不同。 - -对于 non-VHE 系统的 exclude_host 属性排除 EL2 - 在进入和退出客户 -机时,我们会根据 exclude_host 和 exclude_guest 属性在适当的情况下 -禁用/启用该事件。 - -对于 VHE 系统的 exclude_guest 属性排除 EL1,而对其中的 exclude_host -属性同时排除 EL0,EL2。在进入和退出客户机时,我们会适当地根据 -exclude_host 和 exclude_guest 属性包括/排除 EL0。 - -以上声明也适用于在 not-VHE 客户机使用这些属性时,但是请注意客户机从 -来不会运行在 EL2。 - - -准确性 ------- - -在 non-VHE 宿主机上,我们在 EL2 进入/退出宿主机/客户机的切换时启用/ -关闭计数器 -但是在启用/禁用计数器和进入/退出客户机之间存在一段延时。 -对于 exclude_host, 我们可以通过过滤 EL2 消除在客户机进入/退出边界 -上用于计数客户机事件的宿主机事件计数器。但是当使用 !exclude_hv 时, -在客户机进入/退出有一个小的停电窗口无法捕获到宿主机的事件。 - -在 VHE 系统没有停电窗口。 diff --git a/Documentation/translations/zh_CN/arm64/silicon-errata.txt b/Documentation/translations/zh_CN/arm64/silicon-errata.txt deleted file mode 100644 index 440c59ac7dce..000000000000 --- a/Documentation/translations/zh_CN/arm64/silicon-errata.txt +++ /dev/null @@ -1,74 +0,0 @@ -Chinese translated version of Documentation/arm64/silicon-errata.rst - -If you have any comment or update to the content, please contact the -original document maintainer directly. However, if you have a problem -communicating in English you can also ask the Chinese maintainer for -help. Contact the Chinese maintainer if this translation is outdated -or if there is a problem with the translation. - -M: Will Deacon -zh_CN: Fu Wei -C: 1926e54f115725a9248d0c4c65c22acaf94de4c4 ---------------------------------------------------------------------- -Documentation/arm64/silicon-errata.rst 的中文翻译 - -如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 -交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 -译存在问题,请联系中文版维护者。 - -英文版维护者: Will Deacon -中文版维护者: 傅炜 Fu Wei -中文版翻译者: 傅炜 Fu Wei -中文版校译者: 傅炜 Fu Wei -本文翻译提交时的 Git 检出点为: 1926e54f115725a9248d0c4c65c22acaf94de4c4 - -以下为正文 ---------------------------------------------------------------------- - 芯片勘误和软件补救措施 - ================== - -作者: Will Deacon -日期: 2015年11月27日 - -一个不幸的现实:硬件经常带有一些所谓的“瑕疵(errata)”,导致其在 -某些特定情况下会违背构架定义的行为。就基于 ARM 的硬件而言,这些瑕疵 -大体可分为以下几类: - - A 类:无可行补救措施的严重缺陷。 - B 类:有可接受的补救措施的重大或严重缺陷。 - C 类:在正常操作中不会显现的小瑕疵。 - -更多资讯,请在 infocenter.arm.com (需注册)中查阅“软件开发者勘误 -笔记”(“Software Developers Errata Notice”)文档。 - -对于 Linux 而言,B 类缺陷可能需要操作系统的某些特别处理。例如,避免 -一个特殊的代码序列,或是以一种特定的方式配置处理器。在某种不太常见的 -情况下,为将 A 类缺陷当作 C 类处理,可能需要用类似的手段。这些手段被 -统称为“软件补救措施”,且仅在少数情况需要(例如,那些需要一个运行在 -非安全异常级的补救措施 *并且* 能被 Linux 触发的情况)。 - -对于尚在讨论中的可能对未受瑕疵影响的系统产生干扰的软件补救措施,有一个 -相应的内核配置(Kconfig)选项被加在 “内核特性(Kernel Features)”-> -“基于可选方法框架的 ARM 瑕疵补救措施(ARM errata workarounds via -the alternatives framework)"。这些选项被默认开启,若探测到受影响的CPU, -补丁将在运行时被使用。至于对系统运行影响较小的补救措施,内核配置选项 -并不存在,且代码以某种规避瑕疵的方式被构造(带注释为宜)。 - -这种做法对于在任意内核源代码树中准确地判断出哪个瑕疵已被软件方法所补救 -稍微有点麻烦,所以在 Linux 内核中此文件作为软件补救措施的注册表, -并将在新的软件补救措施被提交和向后移植(backported)到稳定内核时被更新。 - -| 实现者 | 受影响的组件 | 勘误编号 | 内核配置 | -+----------------+-----------------+-----------------+-------------------------+ -| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 | -| ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 | -| ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 | -| ARM | Cortex-A53 | #819472 | ARM64_ERRATUM_819472 | -| ARM | Cortex-A53 | #845719 | ARM64_ERRATUM_845719 | -| ARM | Cortex-A53 | #843419 | ARM64_ERRATUM_843419 | -| ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 | -| ARM | Cortex-A57 | #852523 | N/A | -| ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 | -| | | | | -| Cavium | ThunderX ITS | #22375, #24313 | CAVIUM_ERRATUM_22375 | -| Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 | diff --git a/Documentation/translations/zh_CN/arm64/tagged-pointers.txt b/Documentation/translations/zh_CN/arm64/tagged-pointers.txt deleted file mode 100644 index 77ac3548a16d..000000000000 --- a/Documentation/translations/zh_CN/arm64/tagged-pointers.txt +++ /dev/null @@ -1,52 +0,0 @@ -Chinese translated version of Documentation/arm64/tagged-pointers.rst - -If you have any comment or update to the content, please contact the -original document maintainer directly. However, if you have a problem -communicating in English you can also ask the Chinese maintainer for -help. Contact the Chinese maintainer if this translation is outdated -or if there is a problem with the translation. - -Maintainer: Will Deacon -Chinese maintainer: Fu Wei ---------------------------------------------------------------------- -Documentation/arm64/tagged-pointers.rst 的中文翻译 - -如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文 -交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻 -译存在问题,请联系中文版维护者。 - -英文版维护者: Will Deacon -中文版维护者: 傅炜 Fu Wei -中文版翻译者: 傅炜 Fu Wei -中文版校译者: 傅炜 Fu Wei - -以下为正文 ---------------------------------------------------------------------- - Linux 在 AArch64 中带标记的虚拟地址 - ================================= - -作者: Will Deacon -日期: 2013 年 06 月 12 日 - -本文档简述了在 AArch64 地址转换系统中提供的带标记的虚拟地址及其在 -AArch64 Linux 中的潜在用途。 - -内核提供的地址转换表配置使通过 TTBR0 完成的虚拟地址转换(即用户空间 -映射),其虚拟地址的最高 8 位(63:56)会被转换硬件所忽略。这种机制 -让这些位可供应用程序自由使用,其注意事项如下: - - (1) 内核要求所有传递到 EL1 的用户空间地址带有 0x00 标记。 - 这意味着任何携带用户空间虚拟地址的系统调用(syscall) - 参数 *必须* 在陷入内核前使它们的最高字节被清零。 - - (2) 非零标记在传递信号时不被保存。这意味着在应用程序中利用了 - 标记的信号处理函数无法依赖 siginfo_t 的用户空间虚拟 - 地址所携带的包含其内部域信息的标记。此规则的一个例外是 - 当信号是在调试观察点的异常处理程序中产生的,此时标记的 - 信息将被保存。 - - (3) 当使用带标记的指针时需特别留心,因为仅对两个虚拟地址 - 的高字节,C 编译器很可能无法判断它们是不同的。 - -此构架会阻止对带标记的 PC 指针的利用,因此在异常返回时,其高字节 -将被设置成一个为 “55” 的扩展符。 diff --git a/Documentation/translations/zh_TW/arch/arm64/amu.rst b/Documentation/translations/zh_TW/arch/arm64/amu.rst new file mode 100644 index 000000000000..f947a6c7369f --- /dev/null +++ b/Documentation/translations/zh_TW/arch/arm64/amu.rst @@ -0,0 +1,104 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../../disclaimer-zh_TW.rst + +:Original: :ref:`Documentation/arch/arm64/amu.rst ` + +Translator: Bailu Lin + Hu Haowen + +================================== +AArch64 Linux 中擴展的活動監控單元 +================================== + +作者: Ionela Voinescu + +日期: 2019-09-10 + +本文檔簡要描述了 AArch64 Linux 支持的活動監控單元的規範。 + + +架構總述 +-------- + +活動監控是 ARMv8.4 CPU 架構引入的一個可選擴展特性。 + +活動監控單元(在每個 CPU 中實現)爲系統管理提供了性能計數器。既可以通 +過系統寄存器的方式訪問計數器,同時也支持外部內存映射的方式訪問計數器。 + +AMUv1 架構實現了一個由4個固定的64位事件計數器組成的計數器組。 + + - CPU 周期計數器:同 CPU 的頻率增長 + - 常量計數器:同固定的系統時鐘頻率增長 + - 淘汰指令計數器: 同每次架構指令執行增長 + - 內存停頓周期計數器:計算由在時鐘域內的最後一級緩存中未命中而引起 + 的指令調度停頓周期數 + +當處於 WFI 或者 WFE 狀態時,計數器不會增長。 + +AMU 架構提供了一個高達16位的事件計數器空間,未來新的 AMU 版本中可能 +用它來實現新增的事件計數器。 + +另外,AMUv1 實現了一個多達16個64位輔助事件計數器的計數器組。 + +冷復位時所有的計數器會清零。 + + +基本支持 +-------- + +內核可以安全地運行在支持 AMU 和不支持 AMU 的 CPU 組合中。 +因此,當配置 CONFIG_ARM64_AMU_EXTN 後我們無條件使能後續 +(secondary or hotplugged) CPU 檢測和使用這個特性。 + +當在 CPU 上檢測到該特性時,我們會標記爲特性可用但是不能保證計數器的功能, +僅表明有擴展屬性。 + +固件(代碼運行在高異常級別,例如 arm-tf )需支持以下功能: + + - 提供低異常級別(EL2 和 EL1)訪問 AMU 寄存器的能力。 + - 使能計數器。如果未使能,它的值應爲 0。 + - 在從電源關閉狀態啓動 CPU 前或後保存或者恢復計數器。 + +當使用使能了該特性的內核啓動但固件損壞時,訪問計數器寄存器可能會遭遇 +panic 或者死鎖。即使未發現這些症狀,計數器寄存器返回的數據結果並不一 +定能反映真實情況。通常,計數器會返回 0,表明他們未被使能。 + +如果固件沒有提供適當的支持最好關閉 CONFIG_ARM64_AMU_EXTN。 +值得注意的是,出於安全原因,不要繞過 AMUSERRENR_EL0 設置而捕獲從 +EL0(用戶空間) 訪問 EL1(內核空間)。 因此,固件應該確保訪問 AMU寄存器 +不會困在 EL2或EL3。 + +AMUv1 的固定計數器可以通過如下系統寄存器訪問: + + - SYS_AMEVCNTR0_CORE_EL0 + - SYS_AMEVCNTR0_CONST_EL0 + - SYS_AMEVCNTR0_INST_RET_EL0 + - SYS_AMEVCNTR0_MEM_STALL_EL0 + +特定輔助計數器可以通過 SYS_AMEVCNTR1_EL0(n) 訪問,其中n介於0到15。 + +詳細信息定義在目錄:arch/arm64/include/asm/sysreg.h。 + + +用戶空間訪問 +------------ + +由於以下原因,當前禁止從用戶空間訪問 AMU 的寄存器: + + - 安全因數:可能會暴露處於安全模式執行的代碼信息。 + - 意願:AMU 是用於系統管理的。 + +同樣,該功能對用戶空間不可見。 + + +虛擬化 +------ + +由於以下原因,當前禁止從 KVM 客戶端的用戶空間(EL0)和內核空間(EL1) +訪問 AMU 的寄存器: + + - 安全因數:可能會暴露給其他客戶端或主機端執行的代碼信息。 + +任何試圖訪問 AMU 寄存器的行爲都會觸發一個註冊在客戶端的未定義異常。 + diff --git a/Documentation/translations/zh_TW/arch/arm64/booting.txt b/Documentation/translations/zh_TW/arch/arm64/booting.txt new file mode 100644 index 000000000000..24817b8b70cd --- /dev/null +++ b/Documentation/translations/zh_TW/arch/arm64/booting.txt @@ -0,0 +1,251 @@ +SPDX-License-Identifier: GPL-2.0 + +Chinese translated version of Documentation/arch/arm64/booting.rst + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +M: Will Deacon +zh_CN: Fu Wei +zh_TW: Hu Haowen +C: 55f058e7574c3615dea4615573a19bdb258696c6 +--------------------------------------------------------------------- +Documentation/arch/arm64/booting.rst 的中文翻譯 + +如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文 +交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻 +譯存在問題,請聯繫中文版維護者。 + +英文版維護者: Will Deacon +中文版維護者: 傅煒 Fu Wei +中文版翻譯者: 傅煒 Fu Wei +中文版校譯者: 傅煒 Fu Wei +繁體中文版校譯者: 胡皓文 Hu Haowen +本文翻譯提交時的 Git 檢出點爲: 55f058e7574c3615dea4615573a19bdb258696c6 + +以下爲正文 +--------------------------------------------------------------------- + 啓動 AArch64 Linux + ================== + +作者: Will Deacon +日期: 2012 年 09 月 07 日 + +本文檔基於 Russell King 的 ARM 啓動文檔,且適用於所有公開發布的 +AArch64 Linux 內核代碼。 + +AArch64 異常模型由多個異常級(EL0 - EL3)組成,對於 EL0 和 EL1 異常級 +有對應的安全和非安全模式。EL2 是系統管理級,且僅存在於非安全模式下。 +EL3 是最高特權級,且僅存在於安全模式下。 + +基於本文檔的目的,我們將簡單地使用『引導裝載程序』(『boot loader』) +這個術語來定義在將控制權交給 Linux 內核前 CPU 上執行的所有軟體。 +這可能包含安全監控和系統管理代碼,或者它可能只是一些用於準備最小啓動 +環境的指令。 + +基本上,引導裝載程序(至少)應實現以下操作: + +1、設置和初始化 RAM +2、設置設備樹數據 +3、解壓內核映像 +4、調用內核映像 + + +1、設置和初始化 RAM +----------------- + +必要性: 強制 + +引導裝載程序應該找到並初始化系統中所有內核用於保持系統變量數據的 RAM。 +這個操作的執行方式因設備而異。(它可能使用內部算法來自動定位和計算所有 +RAM,或可能使用對這個設備已知的 RAM 信息,還可能是引導裝載程序設計者 +想到的任何合適的方法。) + + +2、設置設備樹數據 +--------------- + +必要性: 強制 + +設備樹數據塊(dtb)必須 8 字節對齊,且大小不能超過 2MB。由於設備樹 +數據塊將在使能緩存的情況下以 2MB 粒度被映射,故其不能被置於必須以特定 +屬性映射的2M區域內。 + +註: v4.2 之前的版本同時要求設備樹數據塊被置於從內核映像以下 +text_offset 字節處算起第一個 512MB 內。 + +3、解壓內核映像 +------------- + +必要性: 可選 + +AArch64 內核當前沒有提供自解壓代碼,因此如果使用了壓縮內核映像文件 +(比如 Image.gz),則需要通過引導裝載程序(使用 gzip 等)來進行解壓。 +若引導裝載程序沒有實現這個功能,就要使用非壓縮內核映像文件。 + + +4、調用內核映像 +------------- + +必要性: 強制 + +已解壓的內核映像包含一個 64 字節的頭,內容如下: + + u32 code0; /* 可執行代碼 */ + u32 code1; /* 可執行代碼 */ + u64 text_offset; /* 映像裝載偏移,小端模式 */ + u64 image_size; /* 映像實際大小, 小端模式 */ + u64 flags; /* 內核旗標, 小端模式 * + u64 res2 = 0; /* 保留 */ + u64 res3 = 0; /* 保留 */ + u64 res4 = 0; /* 保留 */ + u32 magic = 0x644d5241; /* 魔數, 小端, "ARM\x64" */ + u32 res5; /* 保留 (用於 PE COFF 偏移) */ + + +映像頭注釋: + +- 自 v3.17 起,除非另有說明,所有域都是小端模式。 + +- code0/code1 負責跳轉到 stext. + +- 當通過 EFI 啓動時, 最初 code0/code1 被跳過。 + res5 是到 PE 文件頭的偏移,而 PE 文件頭含有 EFI 的啓動入口點 + (efi_stub_entry)。當 stub 代碼完成了它的使命,它會跳轉到 code0 + 繼續正常的啓動流程。 + +- v3.17 之前,未明確指定 text_offset 的字節序。此時,image_size 爲零, + 且 text_offset 依照內核字節序爲 0x80000。 + 當 image_size 非零,text_offset 爲小端模式且是有效值,應被引導加載 + 程序使用。當 image_size 爲零,text_offset 可假定爲 0x80000。 + +- flags 域 (v3.17 引入) 爲 64 位小端模式,其編碼如下: + 位 0: 內核字節序。 1 表示大端模式,0 表示小端模式。 + 位 1-2: 內核頁大小。 + 0 - 未指定。 + 1 - 4K + 2 - 16K + 3 - 64K + 位 3: 內核物理位置 + 0 - 2MB 對齊基址應儘量靠近內存起始處,因爲 + 其基址以下的內存無法通過線性映射訪問 + 1 - 2MB 對齊基址可以在物理內存的任意位置 + 位 4-63: 保留。 + +- 當 image_size 爲零時,引導裝載程序應試圖在內核映像末尾之後儘可能 + 多地保留空閒內存供內核直接使用。對內存空間的需求量因所選定的內核 + 特性而異, 並無實際限制。 + +內核映像必須被放置在任意一個可用系統內存 2MB 對齊基址的 text_offset +字節處,並從該處被調用。2MB 對齊基址和內核映像起始地址之間的區域對於 +內核來說沒有特殊意義,且可能被用於其他目的。 +從映像起始地址算起,最少必須準備 image_size 字節的空閒內存供內核使用。 +註: v4.6 之前的版本無法使用內核映像物理偏移以下的內存,所以當時建議 +將映像儘量放置在靠近系統內存起始的地方。 + +任何提供給內核的內存(甚至在映像起始地址之前),若未從內核中標記爲保留 +(如在設備樹(dtb)的 memreserve 區域),都將被認爲對內核是可用。 + +在跳轉入內核前,必須符合以下狀態: + +- 停止所有 DMA 設備,這樣內存數據就不會因爲虛假網絡包或磁碟數據而 + 被破壞。這可能可以節省你許多的調試時間。 + +- 主 CPU 通用寄存器設置 + x0 = 系統 RAM 中設備樹數據塊(dtb)的物理地址。 + x1 = 0 (保留,將來可能使用) + x2 = 0 (保留,將來可能使用) + x3 = 0 (保留,將來可能使用) + +- CPU 模式 + 所有形式的中斷必須在 PSTATE.DAIF 中被屏蔽(Debug、SError、IRQ + 和 FIQ)。 + CPU 必須處於 EL2(推薦,可訪問虛擬化擴展)或非安全 EL1 模式下。 + +- 高速緩存、MMU + MMU 必須關閉。 + 指令緩存開啓或關閉皆可。 + 已載入的內核映像的相應內存區必須被清理,以達到緩存一致性點(PoC)。 + 當存在系統緩存或其他使能緩存的一致性主控器時,通常需使用虛擬地址 + 維護其緩存,而非 set/way 操作。 + 遵從通過虛擬地址操作維護構架緩存的系統緩存必須被配置,並可以被使能。 + 而不通過虛擬地址操作維護構架緩存的系統緩存(不推薦),必須被配置且 + 禁用。 + + *譯者註:對於 PoC 以及緩存相關內容,請參考 ARMv8 構架參考手冊 + ARM DDI 0487A + +- 架構計時器 + CNTFRQ 必須設定爲計時器的頻率,且 CNTVOFF 必須設定爲對所有 CPU + 都一致的值。如果在 EL1 模式下進入內核,則 CNTHCTL_EL2 中的 + EL1PCTEN (bit 0) 必須置位。 + +- 一致性 + 通過內核啓動的所有 CPU 在內核入口地址上必須處於相同的一致性域中。 + 這可能要根據具體實現來定義初始化過程,以使能每個CPU上對維護操作的 + 接收。 + +- 系統寄存器 + 在進入內核映像的異常級中,所有構架中可寫的系統寄存器必須通過軟體 + 在一個更高的異常級別下初始化,以防止在 未知 狀態下運行。 + + 對於擁有 GICv3 中斷控制器並以 v3 模式運行的系統: + - 如果 EL3 存在: + ICC_SRE_EL3.Enable (位 3) 必須初始化爲 0b1。 + ICC_SRE_EL3.SRE (位 0) 必須初始化爲 0b1。 + - 若內核運行在 EL1: + ICC_SRE_EL2.Enable (位 3) 必須初始化爲 0b1。 + ICC_SRE_EL2.SRE (位 0) 必須初始化爲 0b1。 + - 設備樹(DT)或 ACPI 表必須描述一個 GICv3 中斷控制器。 + + 對於擁有 GICv3 中斷控制器並以兼容(v2)模式運行的系統: + - 如果 EL3 存在: + ICC_SRE_EL3.SRE (位 0) 必須初始化爲 0b0。 + - 若內核運行在 EL1: + ICC_SRE_EL2.SRE (位 0) 必須初始化爲 0b0。 + - 設備樹(DT)或 ACPI 表必須描述一個 GICv2 中斷控制器。 + +以上對於 CPU 模式、高速緩存、MMU、架構計時器、一致性、系統寄存器的 +必要條件描述適用於所有 CPU。所有 CPU 必須在同一異常級別跳入內核。 + +引導裝載程序必須在每個 CPU 處於以下狀態時跳入內核入口: + +- 主 CPU 必須直接跳入內核映像的第一條指令。通過此 CPU 傳遞的設備樹 + 數據塊必須在每個 CPU 節點中包含一個 『enable-method』 屬性,所 + 支持的 enable-method 請見下文。 + + 引導裝載程序必須生成這些設備樹屬性,並在跳入內核入口之前將其插入 + 數據塊。 + +- enable-method 爲 「spin-table」 的 CPU 必須在它們的 CPU + 節點中包含一個 『cpu-release-addr』 屬性。這個屬性標識了一個 + 64 位自然對齊且初始化爲零的內存位置。 + + 這些 CPU 必須在內存保留區(通過設備樹中的 /memreserve/ 域傳遞 + 給內核)中自旋於內核之外,輪詢它們的 cpu-release-addr 位置(必須 + 包含在保留區中)。可通過插入 wfe 指令來降低忙循環開銷,而主 CPU 將 + 發出 sev 指令。當對 cpu-release-addr 所指位置的讀取操作返回非零值 + 時,CPU 必須跳入此值所指向的地址。此值爲一個單獨的 64 位小端值, + 因此 CPU 須在跳轉前將所讀取的值轉換爲其本身的端模式。 + +- enable-method 爲 「psci」 的 CPU 保持在內核外(比如,在 + memory 節點中描述爲內核空間的內存區外,或在通過設備樹 /memreserve/ + 域中描述爲內核保留區的空間中)。內核將會發起在 ARM 文檔(編號 + ARM DEN 0022A:用於 ARM 上的電源狀態協調接口系統軟體)中描述的 + CPU_ON 調用來將 CPU 帶入內核。 + + *譯者注: ARM DEN 0022A 已更新到 ARM DEN 0022C。 + + 設備樹必須包含一個 『psci』 節點,請參考以下文檔: + Documentation/devicetree/bindings/arm/psci.yaml + + +- 輔助 CPU 通用寄存器設置 + x0 = 0 (保留,將來可能使用) + x1 = 0 (保留,將來可能使用) + x2 = 0 (保留,將來可能使用) + x3 = 0 (保留,將來可能使用) + diff --git a/Documentation/translations/zh_TW/arch/arm64/elf_hwcaps.rst b/Documentation/translations/zh_TW/arch/arm64/elf_hwcaps.rst new file mode 100644 index 000000000000..fca3c6ff7b93 --- /dev/null +++ b/Documentation/translations/zh_TW/arch/arm64/elf_hwcaps.rst @@ -0,0 +1,244 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../../disclaimer-zh_TW.rst + +:Original: :ref:`Documentation/arch/arm64/elf_hwcaps.rst ` + +Translator: Bailu Lin + Hu Haowen + +================ +ARM64 ELF hwcaps +================ + +這篇文檔描述了 arm64 ELF hwcaps 的用法和語義。 + + +1. 簡介 +------- + +有些硬體或軟體功能僅在某些 CPU 實現上和/或在具體某個內核配置上可用,但 +對於處於 EL0 的用戶空間代碼沒有可用的架構發現機制。內核通過在輔助向量表 +公開一組稱爲 hwcaps 的標誌而把這些功能暴露給用戶空間。 + +用戶空間軟體可以通過獲取輔助向量的 AT_HWCAP 或 AT_HWCAP2 條目來測試功能, +並測試是否設置了相關標誌,例如:: + + bool floating_point_is_present(void) + { + unsigned long hwcaps = getauxval(AT_HWCAP); + if (hwcaps & HWCAP_FP) + return true; + + return false; + } + +如果軟體依賴於 hwcap 描述的功能,在嘗試使用該功能前則應檢查相關的 hwcap +標誌以驗證該功能是否存在。 + +不能通過其他方式探查這些功能。當一個功能不可用時,嘗試使用它可能導致不可 +預測的行爲,並且無法保證能確切的知道該功能不可用,例如 SIGILL。 + + +2. Hwcaps 的說明 +---------------- + +大多數 hwcaps 旨在說明通過架構 ID 寄存器(處於 EL0 的用戶空間代碼無法訪問) +描述的功能的存在。這些 hwcap 通過 ID 寄存器欄位定義,並且應根據 ARM 體系 +結構參考手冊(ARM ARM)中定義的欄位來解釋說明。 + +這些 hwcaps 以下面的形式描述:: + + idreg.field == val 表示有某個功能。 + +當 idreg.field 中有 val 時,hwcaps 表示 ARM ARM 定義的功能是有效的,但是 +並不是說要完全和 val 相等,也不是說 idreg.field 描述的其他功能就是缺失的。 + +其他 hwcaps 可能表明無法僅由 ID 寄存器描述的功能的存在。這些 hwcaps 可能 +沒有被 ID 寄存器描述,需要參考其他文檔。 + + +3. AT_HWCAP 中揭示的 hwcaps +--------------------------- + +HWCAP_FP + ID_AA64PFR0_EL1.FP == 0b0000 表示有此功能。 + +HWCAP_ASIMD + ID_AA64PFR0_EL1.AdvSIMD == 0b0000 表示有此功能。 + +HWCAP_EVTSTRM + 通用計時器頻率配置爲大約100KHz以生成事件。 + +HWCAP_AES + ID_AA64ISAR0_EL1.AES == 0b0001 表示有此功能。 + +HWCAP_PMULL + ID_AA64ISAR0_EL1.AES == 0b0010 表示有此功能。 + +HWCAP_SHA1 + ID_AA64ISAR0_EL1.SHA1 == 0b0001 表示有此功能。 + +HWCAP_SHA2 + ID_AA64ISAR0_EL1.SHA2 == 0b0001 表示有此功能。 + +HWCAP_CRC32 + ID_AA64ISAR0_EL1.CRC32 == 0b0001 表示有此功能。 + +HWCAP_ATOMICS + ID_AA64ISAR0_EL1.Atomic == 0b0010 表示有此功能。 + +HWCAP_FPHP + ID_AA64PFR0_EL1.FP == 0b0001 表示有此功能。 + +HWCAP_ASIMDHP + ID_AA64PFR0_EL1.AdvSIMD == 0b0001 表示有此功能。 + +HWCAP_CPUID + 根據 Documentation/arch/arm64/cpu-feature-registers.rst 描述,EL0 可以訪問 + 某些 ID 寄存器。 + + 這些 ID 寄存器可能表示功能的可用性。 + +HWCAP_ASIMDRDM + ID_AA64ISAR0_EL1.RDM == 0b0001 表示有此功能。 + +HWCAP_JSCVT + ID_AA64ISAR1_EL1.JSCVT == 0b0001 表示有此功能。 + +HWCAP_FCMA + ID_AA64ISAR1_EL1.FCMA == 0b0001 表示有此功能。 + +HWCAP_LRCPC + ID_AA64ISAR1_EL1.LRCPC == 0b0001 表示有此功能。 + +HWCAP_DCPOP + ID_AA64ISAR1_EL1.DPB == 0b0001 表示有此功能。 + +HWCAP_SHA3 + ID_AA64ISAR0_EL1.SHA3 == 0b0001 表示有此功能。 + +HWCAP_SM3 + ID_AA64ISAR0_EL1.SM3 == 0b0001 表示有此功能。 + +HWCAP_SM4 + ID_AA64ISAR0_EL1.SM4 == 0b0001 表示有此功能。 + +HWCAP_ASIMDDP + ID_AA64ISAR0_EL1.DP == 0b0001 表示有此功能。 + +HWCAP_SHA512 + ID_AA64ISAR0_EL1.SHA2 == 0b0010 表示有此功能。 + +HWCAP_SVE + ID_AA64PFR0_EL1.SVE == 0b0001 表示有此功能。 + +HWCAP_ASIMDFHM + ID_AA64ISAR0_EL1.FHM == 0b0001 表示有此功能。 + +HWCAP_DIT + ID_AA64PFR0_EL1.DIT == 0b0001 表示有此功能。 + +HWCAP_USCAT + ID_AA64MMFR2_EL1.AT == 0b0001 表示有此功能。 + +HWCAP_ILRCPC + ID_AA64ISAR1_EL1.LRCPC == 0b0010 表示有此功能。 + +HWCAP_FLAGM + ID_AA64ISAR0_EL1.TS == 0b0001 表示有此功能。 + +HWCAP_SSBS + ID_AA64PFR1_EL1.SSBS == 0b0010 表示有此功能。 + +HWCAP_SB + ID_AA64ISAR1_EL1.SB == 0b0001 表示有此功能。 + +HWCAP_PACA + 如 Documentation/arch/arm64/pointer-authentication.rst 所描述, + ID_AA64ISAR1_EL1.APA == 0b0001 或 ID_AA64ISAR1_EL1.API == 0b0001 + 表示有此功能。 + +HWCAP_PACG + 如 Documentation/arch/arm64/pointer-authentication.rst 所描述, + ID_AA64ISAR1_EL1.GPA == 0b0001 或 ID_AA64ISAR1_EL1.GPI == 0b0001 + 表示有此功能。 + +HWCAP2_DCPODP + + ID_AA64ISAR1_EL1.DPB == 0b0010 表示有此功能。 + +HWCAP2_SVE2 + + ID_AA64ZFR0_EL1.SVEVer == 0b0001 表示有此功能。 + +HWCAP2_SVEAES + + ID_AA64ZFR0_EL1.AES == 0b0001 表示有此功能。 + +HWCAP2_SVEPMULL + + ID_AA64ZFR0_EL1.AES == 0b0010 表示有此功能。 + +HWCAP2_SVEBITPERM + + ID_AA64ZFR0_EL1.BitPerm == 0b0001 表示有此功能。 + +HWCAP2_SVESHA3 + + ID_AA64ZFR0_EL1.SHA3 == 0b0001 表示有此功能。 + +HWCAP2_SVESM4 + + ID_AA64ZFR0_EL1.SM4 == 0b0001 表示有此功能。 + +HWCAP2_FLAGM2 + + ID_AA64ISAR0_EL1.TS == 0b0010 表示有此功能。 + +HWCAP2_FRINT + + ID_AA64ISAR1_EL1.FRINTTS == 0b0001 表示有此功能。 + +HWCAP2_SVEI8MM + + ID_AA64ZFR0_EL1.I8MM == 0b0001 表示有此功能。 + +HWCAP2_SVEF32MM + + ID_AA64ZFR0_EL1.F32MM == 0b0001 表示有此功能。 + +HWCAP2_SVEF64MM + + ID_AA64ZFR0_EL1.F64MM == 0b0001 表示有此功能。 + +HWCAP2_SVEBF16 + + ID_AA64ZFR0_EL1.BF16 == 0b0001 表示有此功能。 + +HWCAP2_I8MM + + ID_AA64ISAR1_EL1.I8MM == 0b0001 表示有此功能。 + +HWCAP2_BF16 + + ID_AA64ISAR1_EL1.BF16 == 0b0001 表示有此功能。 + +HWCAP2_DGH + + ID_AA64ISAR1_EL1.DGH == 0b0001 表示有此功能。 + +HWCAP2_RNG + + ID_AA64ISAR0_EL1.RNDR == 0b0001 表示有此功能。 + +HWCAP2_BTI + + ID_AA64PFR0_EL1.BT == 0b0001 表示有此功能。 + + +4. 未使用的 AT_HWCAP 位 +----------------------- + +爲了與用戶空間交互,內核保證 AT_HWCAP 的第62、63位將始終返回0。 + diff --git a/Documentation/translations/zh_TW/arch/arm64/hugetlbpage.rst b/Documentation/translations/zh_TW/arch/arm64/hugetlbpage.rst new file mode 100644 index 000000000000..10feb329dfb8 --- /dev/null +++ b/Documentation/translations/zh_TW/arch/arm64/hugetlbpage.rst @@ -0,0 +1,49 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../../disclaimer-zh_TW.rst + +:Original: :ref:`Documentation/arch/arm64/hugetlbpage.rst ` + +Translator: Bailu Lin + Hu Haowen + +===================== +ARM64中的 HugeTLBpage +===================== + +大頁依靠有效利用 TLBs 來提高地址翻譯的性能。這取決於以下 +兩點 - + + - 大頁的大小 + - TLBs 支持的條目大小 + +ARM64 接口支持2種大頁方式。 + +1) pud/pmd 級別的塊映射 +----------------------- + +這是常規大頁,他們的 pmd 或 pud 頁面表條目指向一個內存塊。 +不管 TLB 中支持的條目大小如何,塊映射可以減少翻譯大頁地址 +所需遍歷的頁表深度。 + +2) 使用連續位 +------------- + +架構中轉換頁表條目(D4.5.3, ARM DDI 0487C.a)中提供一個連續 +位告訴 MMU 這個條目是一個連續條目集的一員,它可以被緩存在單 +個 TLB 條目中。 + +在 Linux 中連續位用來增加 pmd 和 pte(最後一級)級別映射的大 +小。受支持的連續頁表條目數量因頁面大小和頁表級別而異。 + + +支持以下大頁尺寸配置 - + + ====== ======== ==== ======== === + - CONT PTE PMD CONT PMD PUD + ====== ======== ==== ======== === + 4K: 64K 2M 32M 1G + 16K: 2M 32M 1G + 64K: 2M 512M 16G + ====== ======== ==== ======== === + diff --git a/Documentation/translations/zh_TW/arch/arm64/index.rst b/Documentation/translations/zh_TW/arch/arm64/index.rst new file mode 100644 index 000000000000..68befee14b99 --- /dev/null +++ b/Documentation/translations/zh_TW/arch/arm64/index.rst @@ -0,0 +1,23 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../../disclaimer-zh_TW.rst + +:Original: :ref:`Documentation/arch/arm64/index.rst ` +:Translator: Bailu Lin + Hu Haowen + +.. _tw_arm64_index: + + +========== +ARM64 架構 +========== + +.. toctree:: + :maxdepth: 2 + + amu + hugetlbpage + perf + elf_hwcaps + diff --git a/Documentation/translations/zh_TW/arch/arm64/legacy_instructions.txt b/Documentation/translations/zh_TW/arch/arm64/legacy_instructions.txt new file mode 100644 index 000000000000..3c915df9836c --- /dev/null +++ b/Documentation/translations/zh_TW/arch/arm64/legacy_instructions.txt @@ -0,0 +1,77 @@ +SPDX-License-Identifier: GPL-2.0 + +Chinese translated version of Documentation/arch/arm64/legacy_instructions.rst + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +Maintainer: Punit Agrawal + Suzuki K. Poulose +Chinese maintainer: Fu Wei +Traditional Chinese maintainer: Hu Haowen +--------------------------------------------------------------------- +Documentation/arch/arm64/legacy_instructions.rst 的中文翻譯 + +如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文 +交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻 +譯存在問題,請聯繫中文版維護者。 + +本文翻譯提交時的 Git 檢出點爲: bc465aa9d045feb0e13b4a8f32cc33c1943f62d6 + +英文版維護者: Punit Agrawal + Suzuki K. Poulose +中文版維護者: 傅煒 Fu Wei +中文版翻譯者: 傅煒 Fu Wei +中文版校譯者: 傅煒 Fu Wei +繁體中文版校譯者:胡皓文 Hu Haowen + +以下爲正文 +--------------------------------------------------------------------- +Linux 內核在 arm64 上的移植提供了一個基礎框架,以支持構架中正在被淘汰或已廢棄指令的模擬執行。 +這個基礎框架的代碼使用未定義指令鉤子(hooks)來支持模擬。如果指令存在,它也允許在硬體中啓用該指令。 + +模擬模式可通過寫 sysctl 節點(/proc/sys/abi)來控制。 +不同的執行方式及 sysctl 節點的相應值,解釋如下: + +* Undef(未定義) + 值: 0 + 產生未定義指令終止異常。它是那些構架中已廢棄的指令,如 SWP,的默認處理方式。 + +* Emulate(模擬) + 值: 1 + 使用軟體模擬方式。爲解決軟體遷移問題,這種模擬指令模式的使用是被跟蹤的,並會發出速率限制警告。 + 它是那些構架中正在被淘汰的指令,如 CP15 barriers(隔離指令),的默認處理方式。 + +* Hardware Execution(硬體執行) + 值: 2 + 雖然標記爲正在被淘汰,但一些實現可能提供硬體執行這些指令的使能/禁用操作。 + 使用硬體執行一般會有更好的性能,但將無法收集運行時對正被淘汰指令的使用統計數據。 + +默認執行模式依賴於指令在構架中狀態。正在被淘汰的指令應該以模擬(Emulate)作爲默認模式, +而已廢棄的指令必須默認使用未定義(Undef)模式 + +注意:指令模擬可能無法應對所有情況。更多詳情請參考單獨的指令注釋。 + +受支持的遺留指令 +------------- +* SWP{B} +節點: /proc/sys/abi/swp +狀態: 已廢棄 +默認執行方式: Undef (0) + +* CP15 Barriers +節點: /proc/sys/abi/cp15_barrier +狀態: 正被淘汰,不推薦使用 +默認執行方式: Emulate (1) + +* SETEND +節點: /proc/sys/abi/setend +狀態: 正被淘汰,不推薦使用 +默認執行方式: Emulate (1)* +註:爲了使能這個特性,系統中的所有 CPU 必須在 EL0 支持混合字節序。 +如果一個新的 CPU (不支持混合字節序) 在使能這個特性後被熱插入系統, +在應用中可能會出現不可預期的結果。 + diff --git a/Documentation/translations/zh_TW/arch/arm64/memory.txt b/Documentation/translations/zh_TW/arch/arm64/memory.txt new file mode 100644 index 000000000000..2437380a26d8 --- /dev/null +++ b/Documentation/translations/zh_TW/arch/arm64/memory.txt @@ -0,0 +1,119 @@ +SPDX-License-Identifier: GPL-2.0 + +Chinese translated version of Documentation/arch/arm64/memory.rst + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +Maintainer: Catalin Marinas +Chinese maintainer: Fu Wei +Traditional Chinese maintainer: Hu Haowen +--------------------------------------------------------------------- +Documentation/arch/arm64/memory.rst 的中文翻譯 + +如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文 +交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻 +譯存在問題,請聯繫中文版維護者。 + +本文翻譯提交時的 Git 檢出點爲: bc465aa9d045feb0e13b4a8f32cc33c1943f62d6 + +英文版維護者: Catalin Marinas +中文版維護者: 傅煒 Fu Wei +中文版翻譯者: 傅煒 Fu Wei +中文版校譯者: 傅煒 Fu Wei +繁體中文版校譯者: 胡皓文 Hu Haowen + +以下爲正文 +--------------------------------------------------------------------- + Linux 在 AArch64 中的內存布局 + =========================== + +作者: Catalin Marinas + +本文檔描述 AArch64 Linux 內核所使用的虛擬內存布局。此構架可以實現 +頁大小爲 4KB 的 4 級轉換表和頁大小爲 64KB 的 3 級轉換表。 + +AArch64 Linux 使用 3 級或 4 級轉換表,其頁大小配置爲 4KB,對於用戶和內核 +分別都有 39-bit (512GB) 或 48-bit (256TB) 的虛擬地址空間。 +對於頁大小爲 64KB的配置,僅使用 2 級轉換表,有 42-bit (4TB) 的虛擬地址空間,但內存布局相同。 + +用戶地址空間的 63:48 位爲 0,而內核地址空間的相應位爲 1。TTBRx 的 +選擇由虛擬地址的 63 位給出。swapper_pg_dir 僅包含內核(全局)映射, +而用戶 pgd 僅包含用戶(非全局)映射。swapper_pg_dir 地址被寫入 +TTBR1 中,且從不寫入 TTBR0。 + + +AArch64 Linux 在頁大小爲 4KB,並使用 3 級轉換表時的內存布局: + +起始地址 結束地址 大小 用途 +----------------------------------------------------------------------- +0000000000000000 0000007fffffffff 512GB 用戶空間 +ffffff8000000000 ffffffffffffffff 512GB 內核空間 + + +AArch64 Linux 在頁大小爲 4KB,並使用 4 級轉換表時的內存布局: + +起始地址 結束地址 大小 用途 +----------------------------------------------------------------------- +0000000000000000 0000ffffffffffff 256TB 用戶空間 +ffff000000000000 ffffffffffffffff 256TB 內核空間 + + +AArch64 Linux 在頁大小爲 64KB,並使用 2 級轉換表時的內存布局: + +起始地址 結束地址 大小 用途 +----------------------------------------------------------------------- +0000000000000000 000003ffffffffff 4TB 用戶空間 +fffffc0000000000 ffffffffffffffff 4TB 內核空間 + + +AArch64 Linux 在頁大小爲 64KB,並使用 3 級轉換表時的內存布局: + +起始地址 結束地址 大小 用途 +----------------------------------------------------------------------- +0000000000000000 0000ffffffffffff 256TB 用戶空間 +ffff000000000000 ffffffffffffffff 256TB 內核空間 + + +更詳細的內核虛擬內存布局,請參閱內核啓動信息。 + + +4KB 頁大小的轉換表查找: + ++--------+--------+--------+--------+--------+--------+--------+--------+ +|63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| ++--------+--------+--------+--------+--------+--------+--------+--------+ + | | | | | | + | | | | | v + | | | | | [11:0] 頁內偏移 + | | | | +-> [20:12] L3 索引 + | | | +-----------> [29:21] L2 索引 + | | +---------------------> [38:30] L1 索引 + | +-------------------------------> [47:39] L0 索引 + +-------------------------------------------------> [63] TTBR0/1 + + +64KB 頁大小的轉換表查找: + ++--------+--------+--------+--------+--------+--------+--------+--------+ +|63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| ++--------+--------+--------+--------+--------+--------+--------+--------+ + | | | | | + | | | | v + | | | | [15:0] 頁內偏移 + | | | +----------> [28:16] L3 索引 + | | +--------------------------> [41:29] L2 索引 + | +-------------------------------> [47:42] L1 索引 + +-------------------------------------------------> [63] TTBR0/1 + + +當使用 KVM 時, 管理程序(hypervisor)在 EL2 中通過相對內核虛擬地址的 +一個固定偏移來映射內核頁(內核虛擬地址的高 24 位設爲零): + +起始地址 結束地址 大小 用途 +----------------------------------------------------------------------- +0000004000000000 0000007fffffffff 256GB 在 HYP 中映射的內核對象 + diff --git a/Documentation/translations/zh_TW/arch/arm64/perf.rst b/Documentation/translations/zh_TW/arch/arm64/perf.rst new file mode 100644 index 000000000000..3b39997a52eb --- /dev/null +++ b/Documentation/translations/zh_TW/arch/arm64/perf.rst @@ -0,0 +1,88 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../../disclaimer-zh_TW.rst + +:Original: :ref:`Documentation/arch/arm64/perf.rst ` + +Translator: Bailu Lin + Hu Haowen + +============= +Perf 事件屬性 +============= + +:作者: Andrew Murray +:日期: 2019-03-06 + +exclude_user +------------ + +該屬性排除用戶空間。 + +用戶空間始終運行在 EL0,因此該屬性將排除 EL0。 + + +exclude_kernel +-------------- + +該屬性排除內核空間。 + +打開 VHE 時內核運行在 EL2,不打開 VHE 時內核運行在 EL1。客戶機 +內核總是運行在 EL1。 + +對於宿主機,該屬性排除 EL1 和 VHE 上的 EL2。 + +對於客戶機,該屬性排除 EL1。請注意客戶機從來不會運行在 EL2。 + + +exclude_hv +---------- + +該屬性排除虛擬機監控器。 + +對於 VHE 宿主機該屬性將被忽略,此時我們認爲宿主機內核是虛擬機監 +控器。 + +對於 non-VHE 宿主機該屬性將排除 EL2,因爲虛擬機監控器運行在 EL2 +的任何代碼主要用於客戶機和宿主機的切換。 + +對於客戶機該屬性無效。請注意客戶機從來不會運行在 EL2。 + + +exclude_host / exclude_guest +---------------------------- + +這些屬性分別排除了 KVM 宿主機和客戶機。 + +KVM 宿主機可能運行在 EL0(用戶空間),EL1(non-VHE 內核)和 +EL2(VHE 內核 或 non-VHE 虛擬機監控器)。 + +KVM 客戶機可能運行在 EL0(用戶空間)和 EL1(內核)。 + +由於宿主機和客戶機之間重疊的異常級別,我們不能僅僅依靠 PMU 的硬體異 +常過濾機制-因此我們必須啓用/禁用對於客戶機進入和退出的計數。而這在 +VHE 和 non-VHE 系統上表現不同。 + +對於 non-VHE 系統的 exclude_host 屬性排除 EL2 - 在進入和退出客戶 +機時,我們會根據 exclude_host 和 exclude_guest 屬性在適當的情況下 +禁用/啓用該事件。 + +對於 VHE 系統的 exclude_guest 屬性排除 EL1,而對其中的 exclude_host +屬性同時排除 EL0,EL2。在進入和退出客戶機時,我們會適當地根據 +exclude_host 和 exclude_guest 屬性包括/排除 EL0。 + +以上聲明也適用於在 not-VHE 客戶機使用這些屬性時,但是請注意客戶機從 +來不會運行在 EL2。 + + +準確性 +------ + +在 non-VHE 宿主機上,我們在 EL2 進入/退出宿主機/客戶機的切換時啓用/ +關閉計數器 -但是在啓用/禁用計數器和進入/退出客戶機之間存在一段延時。 +對於 exclude_host, 我們可以通過過濾 EL2 消除在客戶機進入/退出邊界 +上用於計數客戶機事件的宿主機事件計數器。但是當使用 !exclude_hv 時, +在客戶機進入/退出有一個小的停電窗口無法捕獲到宿主機的事件。 + +在 VHE 系統沒有停電窗口。 + diff --git a/Documentation/translations/zh_TW/arch/arm64/silicon-errata.txt b/Documentation/translations/zh_TW/arch/arm64/silicon-errata.txt new file mode 100644 index 000000000000..66c3a3506458 --- /dev/null +++ b/Documentation/translations/zh_TW/arch/arm64/silicon-errata.txt @@ -0,0 +1,79 @@ +SPDX-License-Identifier: GPL-2.0 + +Chinese translated version of Documentation/arch/arm64/silicon-errata.rst + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +M: Will Deacon +zh_CN: Fu Wei +zh_TW: Hu Haowen +C: 1926e54f115725a9248d0c4c65c22acaf94de4c4 +--------------------------------------------------------------------- +Documentation/arch/arm64/silicon-errata.rst 的中文翻譯 + +如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文 +交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻 +譯存在問題,請聯繫中文版維護者。 + +英文版維護者: Will Deacon +中文版維護者: 傅煒 Fu Wei +中文版翻譯者: 傅煒 Fu Wei +中文版校譯者: 傅煒 Fu Wei +繁體中文版校譯者: 胡皓文 Hu Haowen +本文翻譯提交時的 Git 檢出點爲: 1926e54f115725a9248d0c4c65c22acaf94de4c4 + +以下爲正文 +--------------------------------------------------------------------- + 晶片勘誤和軟體補救措施 + ================== + +作者: Will Deacon +日期: 2015年11月27日 + +一個不幸的現實:硬體經常帶有一些所謂的「瑕疵(errata)」,導致其在 +某些特定情況下會違背構架定義的行爲。就基於 ARM 的硬體而言,這些瑕疵 +大體可分爲以下幾類: + + A 類:無可行補救措施的嚴重缺陷。 + B 類:有可接受的補救措施的重大或嚴重缺陷。 + C 類:在正常操作中不會顯現的小瑕疵。 + +更多資訊,請在 infocenter.arm.com (需註冊)中查閱「軟體開發者勘誤 +筆記」(「Software Developers Errata Notice」)文檔。 + +對於 Linux 而言,B 類缺陷可能需要作業系統的某些特別處理。例如,避免 +一個特殊的代碼序列,或是以一種特定的方式配置處理器。在某種不太常見的 +情況下,爲將 A 類缺陷當作 C 類處理,可能需要用類似的手段。這些手段被 +統稱爲「軟體補救措施」,且僅在少數情況需要(例如,那些需要一個運行在 +非安全異常級的補救措施 *並且* 能被 Linux 觸發的情況)。 + +對於尚在討論中的可能對未受瑕疵影響的系統產生干擾的軟體補救措施,有一個 +相應的內核配置(Kconfig)選項被加在 「內核特性(Kernel Features)」-> +「基於可選方法框架的 ARM 瑕疵補救措施(ARM errata workarounds via +the alternatives framework)"。這些選項被默認開啓,若探測到受影響的CPU, +補丁將在運行時被使用。至於對系統運行影響較小的補救措施,內核配置選項 +並不存在,且代碼以某種規避瑕疵的方式被構造(帶注釋爲宜)。 + +這種做法對於在任意內核原始碼樹中準確地判斷出哪個瑕疵已被軟體方法所補救 +稍微有點麻煩,所以在 Linux 內核中此文件作爲軟體補救措施的註冊表, +並將在新的軟體補救措施被提交和向後移植(backported)到穩定內核時被更新。 + +| 實現者 | 受影響的組件 | 勘誤編號 | 內核配置 | ++----------------+-----------------+-----------------+-------------------------+ +| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 | +| ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 | +| ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 | +| ARM | Cortex-A53 | #819472 | ARM64_ERRATUM_819472 | +| ARM | Cortex-A53 | #845719 | ARM64_ERRATUM_845719 | +| ARM | Cortex-A53 | #843419 | ARM64_ERRATUM_843419 | +| ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 | +| ARM | Cortex-A57 | #852523 | N/A | +| ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 | +| | | | | +| Cavium | ThunderX ITS | #22375, #24313 | CAVIUM_ERRATUM_22375 | +| Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 | + diff --git a/Documentation/translations/zh_TW/arch/arm64/tagged-pointers.txt b/Documentation/translations/zh_TW/arch/arm64/tagged-pointers.txt new file mode 100644 index 000000000000..b7f683f20ed1 --- /dev/null +++ b/Documentation/translations/zh_TW/arch/arm64/tagged-pointers.txt @@ -0,0 +1,57 @@ +SPDX-License-Identifier: GPL-2.0 + +Chinese translated version of Documentation/arch/arm64/tagged-pointers.rst + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +Maintainer: Will Deacon +Chinese maintainer: Fu Wei +Traditional Chinese maintainer: Hu Haowen +--------------------------------------------------------------------- +Documentation/arch/arm64/tagged-pointers.rst 的中文翻譯 + +如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文 +交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻 +譯存在問題,請聯繫中文版維護者。 + +英文版維護者: Will Deacon +中文版維護者: 傅煒 Fu Wei +中文版翻譯者: 傅煒 Fu Wei +中文版校譯者: 傅煒 Fu Wei +繁體中文版校譯者: 胡皓文 Hu Haowen + +以下爲正文 +--------------------------------------------------------------------- + Linux 在 AArch64 中帶標記的虛擬地址 + ================================= + +作者: Will Deacon +日期: 2013 年 06 月 12 日 + +本文檔簡述了在 AArch64 地址轉換系統中提供的帶標記的虛擬地址及其在 +AArch64 Linux 中的潛在用途。 + +內核提供的地址轉換表配置使通過 TTBR0 完成的虛擬地址轉換(即用戶空間 +映射),其虛擬地址的最高 8 位(63:56)會被轉換硬體所忽略。這種機制 +讓這些位可供應用程式自由使用,其注意事項如下: + + (1) 內核要求所有傳遞到 EL1 的用戶空間地址帶有 0x00 標記。 + 這意味著任何攜帶用戶空間虛擬地址的系統調用(syscall) + 參數 *必須* 在陷入內核前使它們的最高字節被清零。 + + (2) 非零標記在傳遞信號時不被保存。這意味著在應用程式中利用了 + 標記的信號處理函數無法依賴 siginfo_t 的用戶空間虛擬 + 地址所攜帶的包含其內部域信息的標記。此規則的一個例外是 + 當信號是在調試觀察點的異常處理程序中產生的,此時標記的 + 信息將被保存。 + + (3) 當使用帶標記的指針時需特別留心,因爲僅對兩個虛擬地址 + 的高字節,C 編譯器很可能無法判斷它們是不同的。 + +此構架會阻止對帶標記的 PC 指針的利用,因此在異常返回時,其高字節 +將被設置成一個爲 「55」 的擴展符。 + diff --git a/Documentation/translations/zh_TW/arm64/amu.rst b/Documentation/translations/zh_TW/arm64/amu.rst deleted file mode 100644 index ffdc466e0f62..000000000000 --- a/Documentation/translations/zh_TW/arm64/amu.rst +++ /dev/null @@ -1,104 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -.. include:: ../disclaimer-zh_TW.rst - -:Original: :ref:`Documentation/arm64/amu.rst ` - -Translator: Bailu Lin - Hu Haowen - -================================== -AArch64 Linux 中擴展的活動監控單元 -================================== - -作者: Ionela Voinescu - -日期: 2019-09-10 - -本文檔簡要描述了 AArch64 Linux 支持的活動監控單元的規範。 - - -架構總述 --------- - -活動監控是 ARMv8.4 CPU 架構引入的一個可選擴展特性。 - -活動監控單元(在每個 CPU 中實現)爲系統管理提供了性能計數器。既可以通 -過系統寄存器的方式訪問計數器,同時也支持外部內存映射的方式訪問計數器。 - -AMUv1 架構實現了一個由4個固定的64位事件計數器組成的計數器組。 - - - CPU 周期計數器:同 CPU 的頻率增長 - - 常量計數器:同固定的系統時鐘頻率增長 - - 淘汰指令計數器: 同每次架構指令執行增長 - - 內存停頓周期計數器:計算由在時鐘域內的最後一級緩存中未命中而引起 - 的指令調度停頓周期數 - -當處於 WFI 或者 WFE 狀態時,計數器不會增長。 - -AMU 架構提供了一個高達16位的事件計數器空間,未來新的 AMU 版本中可能 -用它來實現新增的事件計數器。 - -另外,AMUv1 實現了一個多達16個64位輔助事件計數器的計數器組。 - -冷復位時所有的計數器會清零。 - - -基本支持 --------- - -內核可以安全地運行在支持 AMU 和不支持 AMU 的 CPU 組合中。 -因此,當配置 CONFIG_ARM64_AMU_EXTN 後我們無條件使能後續 -(secondary or hotplugged) CPU 檢測和使用這個特性。 - -當在 CPU 上檢測到該特性時,我們會標記爲特性可用但是不能保證計數器的功能, -僅表明有擴展屬性。 - -固件(代碼運行在高異常級別,例如 arm-tf )需支持以下功能: - - - 提供低異常級別(EL2 和 EL1)訪問 AMU 寄存器的能力。 - - 使能計數器。如果未使能,它的值應爲 0。 - - 在從電源關閉狀態啓動 CPU 前或後保存或者恢復計數器。 - -當使用使能了該特性的內核啓動但固件損壞時,訪問計數器寄存器可能會遭遇 -panic 或者死鎖。即使未發現這些症狀,計數器寄存器返回的數據結果並不一 -定能反映真實情況。通常,計數器會返回 0,表明他們未被使能。 - -如果固件沒有提供適當的支持最好關閉 CONFIG_ARM64_AMU_EXTN。 -值得注意的是,出於安全原因,不要繞過 AMUSERRENR_EL0 設置而捕獲從 -EL0(用戶空間) 訪問 EL1(內核空間)。 因此,固件應該確保訪問 AMU寄存器 -不會困在 EL2或EL3。 - -AMUv1 的固定計數器可以通過如下系統寄存器訪問: - - - SYS_AMEVCNTR0_CORE_EL0 - - SYS_AMEVCNTR0_CONST_EL0 - - SYS_AMEVCNTR0_INST_RET_EL0 - - SYS_AMEVCNTR0_MEM_STALL_EL0 - -特定輔助計數器可以通過 SYS_AMEVCNTR1_EL0(n) 訪問,其中n介於0到15。 - -詳細信息定義在目錄:arch/arm64/include/asm/sysreg.h。 - - -用戶空間訪問 ------------- - -由於以下原因,當前禁止從用戶空間訪問 AMU 的寄存器: - - - 安全因數:可能會暴露處於安全模式執行的代碼信息。 - - 意願:AMU 是用於系統管理的。 - -同樣,該功能對用戶空間不可見。 - - -虛擬化 ------- - -由於以下原因,當前禁止從 KVM 客戶端的用戶空間(EL0)和內核空間(EL1) -訪問 AMU 的寄存器: - - - 安全因數:可能會暴露給其他客戶端或主機端執行的代碼信息。 - -任何試圖訪問 AMU 寄存器的行爲都會觸發一個註冊在客戶端的未定義異常。 - diff --git a/Documentation/translations/zh_TW/arm64/booting.txt b/Documentation/translations/zh_TW/arm64/booting.txt deleted file mode 100644 index b9439dd54012..000000000000 --- a/Documentation/translations/zh_TW/arm64/booting.txt +++ /dev/null @@ -1,251 +0,0 @@ -SPDX-License-Identifier: GPL-2.0 - -Chinese translated version of Documentation/arm64/booting.rst - -If you have any comment or update to the content, please contact the -original document maintainer directly. However, if you have a problem -communicating in English you can also ask the Chinese maintainer for -help. Contact the Chinese maintainer if this translation is outdated -or if there is a problem with the translation. - -M: Will Deacon -zh_CN: Fu Wei -zh_TW: Hu Haowen -C: 55f058e7574c3615dea4615573a19bdb258696c6 ---------------------------------------------------------------------- -Documentation/arm64/booting.rst 的中文翻譯 - -如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文 -交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻 -譯存在問題,請聯繫中文版維護者。 - -英文版維護者: Will Deacon -中文版維護者: 傅煒 Fu Wei -中文版翻譯者: 傅煒 Fu Wei -中文版校譯者: 傅煒 Fu Wei -繁體中文版校譯者: 胡皓文 Hu Haowen -本文翻譯提交時的 Git 檢出點爲: 55f058e7574c3615dea4615573a19bdb258696c6 - -以下爲正文 ---------------------------------------------------------------------- - 啓動 AArch64 Linux - ================== - -作者: Will Deacon -日期: 2012 年 09 月 07 日 - -本文檔基於 Russell King 的 ARM 啓動文檔,且適用於所有公開發布的 -AArch64 Linux 內核代碼。 - -AArch64 異常模型由多個異常級(EL0 - EL3)組成,對於 EL0 和 EL1 異常級 -有對應的安全和非安全模式。EL2 是系統管理級,且僅存在於非安全模式下。 -EL3 是最高特權級,且僅存在於安全模式下。 - -基於本文檔的目的,我們將簡單地使用『引導裝載程序』(『boot loader』) -這個術語來定義在將控制權交給 Linux 內核前 CPU 上執行的所有軟體。 -這可能包含安全監控和系統管理代碼,或者它可能只是一些用於準備最小啓動 -環境的指令。 - -基本上,引導裝載程序(至少)應實現以下操作: - -1、設置和初始化 RAM -2、設置設備樹數據 -3、解壓內核映像 -4、調用內核映像 - - -1、設置和初始化 RAM ------------------ - -必要性: 強制 - -引導裝載程序應該找到並初始化系統中所有內核用於保持系統變量數據的 RAM。 -這個操作的執行方式因設備而異。(它可能使用內部算法來自動定位和計算所有 -RAM,或可能使用對這個設備已知的 RAM 信息,還可能是引導裝載程序設計者 -想到的任何合適的方法。) - - -2、設置設備樹數據 ---------------- - -必要性: 強制 - -設備樹數據塊(dtb)必須 8 字節對齊,且大小不能超過 2MB。由於設備樹 -數據塊將在使能緩存的情況下以 2MB 粒度被映射,故其不能被置於必須以特定 -屬性映射的2M區域內。 - -註: v4.2 之前的版本同時要求設備樹數據塊被置於從內核映像以下 -text_offset 字節處算起第一個 512MB 內。 - -3、解壓內核映像 -------------- - -必要性: 可選 - -AArch64 內核當前沒有提供自解壓代碼,因此如果使用了壓縮內核映像文件 -(比如 Image.gz),則需要通過引導裝載程序(使用 gzip 等)來進行解壓。 -若引導裝載程序沒有實現這個功能,就要使用非壓縮內核映像文件。 - - -4、調用內核映像 -------------- - -必要性: 強制 - -已解壓的內核映像包含一個 64 字節的頭,內容如下: - - u32 code0; /* 可執行代碼 */ - u32 code1; /* 可執行代碼 */ - u64 text_offset; /* 映像裝載偏移,小端模式 */ - u64 image_size; /* 映像實際大小, 小端模式 */ - u64 flags; /* 內核旗標, 小端模式 * - u64 res2 = 0; /* 保留 */ - u64 res3 = 0; /* 保留 */ - u64 res4 = 0; /* 保留 */ - u32 magic = 0x644d5241; /* 魔數, 小端, "ARM\x64" */ - u32 res5; /* 保留 (用於 PE COFF 偏移) */ - - -映像頭注釋: - -- 自 v3.17 起,除非另有說明,所有域都是小端模式。 - -- code0/code1 負責跳轉到 stext. - -- 當通過 EFI 啓動時, 最初 code0/code1 被跳過。 - res5 是到 PE 文件頭的偏移,而 PE 文件頭含有 EFI 的啓動入口點 - (efi_stub_entry)。當 stub 代碼完成了它的使命,它會跳轉到 code0 - 繼續正常的啓動流程。 - -- v3.17 之前,未明確指定 text_offset 的字節序。此時,image_size 爲零, - 且 text_offset 依照內核字節序爲 0x80000。 - 當 image_size 非零,text_offset 爲小端模式且是有效值,應被引導加載 - 程序使用。當 image_size 爲零,text_offset 可假定爲 0x80000。 - -- flags 域 (v3.17 引入) 爲 64 位小端模式,其編碼如下: - 位 0: 內核字節序。 1 表示大端模式,0 表示小端模式。 - 位 1-2: 內核頁大小。 - 0 - 未指定。 - 1 - 4K - 2 - 16K - 3 - 64K - 位 3: 內核物理位置 - 0 - 2MB 對齊基址應儘量靠近內存起始處,因爲 - 其基址以下的內存無法通過線性映射訪問 - 1 - 2MB 對齊基址可以在物理內存的任意位置 - 位 4-63: 保留。 - -- 當 image_size 爲零時,引導裝載程序應試圖在內核映像末尾之後儘可能 - 多地保留空閒內存供內核直接使用。對內存空間的需求量因所選定的內核 - 特性而異, 並無實際限制。 - -內核映像必須被放置在任意一個可用系統內存 2MB 對齊基址的 text_offset -字節處,並從該處被調用。2MB 對齊基址和內核映像起始地址之間的區域對於 -內核來說沒有特殊意義,且可能被用於其他目的。 -從映像起始地址算起,最少必須準備 image_size 字節的空閒內存供內核使用。 -註: v4.6 之前的版本無法使用內核映像物理偏移以下的內存,所以當時建議 -將映像儘量放置在靠近系統內存起始的地方。 - -任何提供給內核的內存(甚至在映像起始地址之前),若未從內核中標記爲保留 -(如在設備樹(dtb)的 memreserve 區域),都將被認爲對內核是可用。 - -在跳轉入內核前,必須符合以下狀態: - -- 停止所有 DMA 設備,這樣內存數據就不會因爲虛假網絡包或磁碟數據而 - 被破壞。這可能可以節省你許多的調試時間。 - -- 主 CPU 通用寄存器設置 - x0 = 系統 RAM 中設備樹數據塊(dtb)的物理地址。 - x1 = 0 (保留,將來可能使用) - x2 = 0 (保留,將來可能使用) - x3 = 0 (保留,將來可能使用) - -- CPU 模式 - 所有形式的中斷必須在 PSTATE.DAIF 中被屏蔽(Debug、SError、IRQ - 和 FIQ)。 - CPU 必須處於 EL2(推薦,可訪問虛擬化擴展)或非安全 EL1 模式下。 - -- 高速緩存、MMU - MMU 必須關閉。 - 指令緩存開啓或關閉皆可。 - 已載入的內核映像的相應內存區必須被清理,以達到緩存一致性點(PoC)。 - 當存在系統緩存或其他使能緩存的一致性主控器時,通常需使用虛擬地址 - 維護其緩存,而非 set/way 操作。 - 遵從通過虛擬地址操作維護構架緩存的系統緩存必須被配置,並可以被使能。 - 而不通過虛擬地址操作維護構架緩存的系統緩存(不推薦),必須被配置且 - 禁用。 - - *譯者註:對於 PoC 以及緩存相關內容,請參考 ARMv8 構架參考手冊 - ARM DDI 0487A - -- 架構計時器 - CNTFRQ 必須設定爲計時器的頻率,且 CNTVOFF 必須設定爲對所有 CPU - 都一致的值。如果在 EL1 模式下進入內核,則 CNTHCTL_EL2 中的 - EL1PCTEN (bit 0) 必須置位。 - -- 一致性 - 通過內核啓動的所有 CPU 在內核入口地址上必須處於相同的一致性域中。 - 這可能要根據具體實現來定義初始化過程,以使能每個CPU上對維護操作的 - 接收。 - -- 系統寄存器 - 在進入內核映像的異常級中,所有構架中可寫的系統寄存器必須通過軟體 - 在一個更高的異常級別下初始化,以防止在 未知 狀態下運行。 - - 對於擁有 GICv3 中斷控制器並以 v3 模式運行的系統: - - 如果 EL3 存在: - ICC_SRE_EL3.Enable (位 3) 必須初始化爲 0b1。 - ICC_SRE_EL3.SRE (位 0) 必須初始化爲 0b1。 - - 若內核運行在 EL1: - ICC_SRE_EL2.Enable (位 3) 必須初始化爲 0b1。 - ICC_SRE_EL2.SRE (位 0) 必須初始化爲 0b1。 - - 設備樹(DT)或 ACPI 表必須描述一個 GICv3 中斷控制器。 - - 對於擁有 GICv3 中斷控制器並以兼容(v2)模式運行的系統: - - 如果 EL3 存在: - ICC_SRE_EL3.SRE (位 0) 必須初始化爲 0b0。 - - 若內核運行在 EL1: - ICC_SRE_EL2.SRE (位 0) 必須初始化爲 0b0。 - - 設備樹(DT)或 ACPI 表必須描述一個 GICv2 中斷控制器。 - -以上對於 CPU 模式、高速緩存、MMU、架構計時器、一致性、系統寄存器的 -必要條件描述適用於所有 CPU。所有 CPU 必須在同一異常級別跳入內核。 - -引導裝載程序必須在每個 CPU 處於以下狀態時跳入內核入口: - -- 主 CPU 必須直接跳入內核映像的第一條指令。通過此 CPU 傳遞的設備樹 - 數據塊必須在每個 CPU 節點中包含一個 『enable-method』 屬性,所 - 支持的 enable-method 請見下文。 - - 引導裝載程序必須生成這些設備樹屬性,並在跳入內核入口之前將其插入 - 數據塊。 - -- enable-method 爲 「spin-table」 的 CPU 必須在它們的 CPU - 節點中包含一個 『cpu-release-addr』 屬性。這個屬性標識了一個 - 64 位自然對齊且初始化爲零的內存位置。 - - 這些 CPU 必須在內存保留區(通過設備樹中的 /memreserve/ 域傳遞 - 給內核)中自旋於內核之外,輪詢它們的 cpu-release-addr 位置(必須 - 包含在保留區中)。可通過插入 wfe 指令來降低忙循環開銷,而主 CPU 將 - 發出 sev 指令。當對 cpu-release-addr 所指位置的讀取操作返回非零值 - 時,CPU 必須跳入此值所指向的地址。此值爲一個單獨的 64 位小端值, - 因此 CPU 須在跳轉前將所讀取的值轉換爲其本身的端模式。 - -- enable-method 爲 「psci」 的 CPU 保持在內核外(比如,在 - memory 節點中描述爲內核空間的內存區外,或在通過設備樹 /memreserve/ - 域中描述爲內核保留區的空間中)。內核將會發起在 ARM 文檔(編號 - ARM DEN 0022A:用於 ARM 上的電源狀態協調接口系統軟體)中描述的 - CPU_ON 調用來將 CPU 帶入內核。 - - *譯者注: ARM DEN 0022A 已更新到 ARM DEN 0022C。 - - 設備樹必須包含一個 『psci』 節點,請參考以下文檔: - Documentation/devicetree/bindings/arm/psci.yaml - - -- 輔助 CPU 通用寄存器設置 - x0 = 0 (保留,將來可能使用) - x1 = 0 (保留,將來可能使用) - x2 = 0 (保留,將來可能使用) - x3 = 0 (保留,將來可能使用) - diff --git a/Documentation/translations/zh_TW/arm64/elf_hwcaps.rst b/Documentation/translations/zh_TW/arm64/elf_hwcaps.rst deleted file mode 100644 index 3eb1c623ce31..000000000000 --- a/Documentation/translations/zh_TW/arm64/elf_hwcaps.rst +++ /dev/null @@ -1,244 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -.. include:: ../disclaimer-zh_TW.rst - -:Original: :ref:`Documentation/arm64/elf_hwcaps.rst ` - -Translator: Bailu Lin - Hu Haowen - -================ -ARM64 ELF hwcaps -================ - -這篇文檔描述了 arm64 ELF hwcaps 的用法和語義。 - - -1. 簡介 -------- - -有些硬體或軟體功能僅在某些 CPU 實現上和/或在具體某個內核配置上可用,但 -對於處於 EL0 的用戶空間代碼沒有可用的架構發現機制。內核通過在輔助向量表 -公開一組稱爲 hwcaps 的標誌而把這些功能暴露給用戶空間。 - -用戶空間軟體可以通過獲取輔助向量的 AT_HWCAP 或 AT_HWCAP2 條目來測試功能, -並測試是否設置了相關標誌,例如:: - - bool floating_point_is_present(void) - { - unsigned long hwcaps = getauxval(AT_HWCAP); - if (hwcaps & HWCAP_FP) - return true; - - return false; - } - -如果軟體依賴於 hwcap 描述的功能,在嘗試使用該功能前則應檢查相關的 hwcap -標誌以驗證該功能是否存在。 - -不能通過其他方式探查這些功能。當一個功能不可用時,嘗試使用它可能導致不可 -預測的行爲,並且無法保證能確切的知道該功能不可用,例如 SIGILL。 - - -2. Hwcaps 的說明 ----------------- - -大多數 hwcaps 旨在說明通過架構 ID 寄存器(處於 EL0 的用戶空間代碼無法訪問) -描述的功能的存在。這些 hwcap 通過 ID 寄存器欄位定義,並且應根據 ARM 體系 -結構參考手冊(ARM ARM)中定義的欄位來解釋說明。 - -這些 hwcaps 以下面的形式描述:: - - idreg.field == val 表示有某個功能。 - -當 idreg.field 中有 val 時,hwcaps 表示 ARM ARM 定義的功能是有效的,但是 -並不是說要完全和 val 相等,也不是說 idreg.field 描述的其他功能就是缺失的。 - -其他 hwcaps 可能表明無法僅由 ID 寄存器描述的功能的存在。這些 hwcaps 可能 -沒有被 ID 寄存器描述,需要參考其他文檔。 - - -3. AT_HWCAP 中揭示的 hwcaps ---------------------------- - -HWCAP_FP - ID_AA64PFR0_EL1.FP == 0b0000 表示有此功能。 - -HWCAP_ASIMD - ID_AA64PFR0_EL1.AdvSIMD == 0b0000 表示有此功能。 - -HWCAP_EVTSTRM - 通用計時器頻率配置爲大約100KHz以生成事件。 - -HWCAP_AES - ID_AA64ISAR0_EL1.AES == 0b0001 表示有此功能。 - -HWCAP_PMULL - ID_AA64ISAR0_EL1.AES == 0b0010 表示有此功能。 - -HWCAP_SHA1 - ID_AA64ISAR0_EL1.SHA1 == 0b0001 表示有此功能。 - -HWCAP_SHA2 - ID_AA64ISAR0_EL1.SHA2 == 0b0001 表示有此功能。 - -HWCAP_CRC32 - ID_AA64ISAR0_EL1.CRC32 == 0b0001 表示有此功能。 - -HWCAP_ATOMICS - ID_AA64ISAR0_EL1.Atomic == 0b0010 表示有此功能。 - -HWCAP_FPHP - ID_AA64PFR0_EL1.FP == 0b0001 表示有此功能。 - -HWCAP_ASIMDHP - ID_AA64PFR0_EL1.AdvSIMD == 0b0001 表示有此功能。 - -HWCAP_CPUID - 根據 Documentation/arm64/cpu-feature-registers.rst 描述,EL0 可以訪問 - 某些 ID 寄存器。 - - 這些 ID 寄存器可能表示功能的可用性。 - -HWCAP_ASIMDRDM - ID_AA64ISAR0_EL1.RDM == 0b0001 表示有此功能。 - -HWCAP_JSCVT - ID_AA64ISAR1_EL1.JSCVT == 0b0001 表示有此功能。 - -HWCAP_FCMA - ID_AA64ISAR1_EL1.FCMA == 0b0001 表示有此功能。 - -HWCAP_LRCPC - ID_AA64ISAR1_EL1.LRCPC == 0b0001 表示有此功能。 - -HWCAP_DCPOP - ID_AA64ISAR1_EL1.DPB == 0b0001 表示有此功能。 - -HWCAP_SHA3 - ID_AA64ISAR0_EL1.SHA3 == 0b0001 表示有此功能。 - -HWCAP_SM3 - ID_AA64ISAR0_EL1.SM3 == 0b0001 表示有此功能。 - -HWCAP_SM4 - ID_AA64ISAR0_EL1.SM4 == 0b0001 表示有此功能。 - -HWCAP_ASIMDDP - ID_AA64ISAR0_EL1.DP == 0b0001 表示有此功能。 - -HWCAP_SHA512 - ID_AA64ISAR0_EL1.SHA2 == 0b0010 表示有此功能。 - -HWCAP_SVE - ID_AA64PFR0_EL1.SVE == 0b0001 表示有此功能。 - -HWCAP_ASIMDFHM - ID_AA64ISAR0_EL1.FHM == 0b0001 表示有此功能。 - -HWCAP_DIT - ID_AA64PFR0_EL1.DIT == 0b0001 表示有此功能。 - -HWCAP_USCAT - ID_AA64MMFR2_EL1.AT == 0b0001 表示有此功能。 - -HWCAP_ILRCPC - ID_AA64ISAR1_EL1.LRCPC == 0b0010 表示有此功能。 - -HWCAP_FLAGM - ID_AA64ISAR0_EL1.TS == 0b0001 表示有此功能。 - -HWCAP_SSBS - ID_AA64PFR1_EL1.SSBS == 0b0010 表示有此功能。 - -HWCAP_SB - ID_AA64ISAR1_EL1.SB == 0b0001 表示有此功能。 - -HWCAP_PACA - 如 Documentation/arm64/pointer-authentication.rst 所描述, - ID_AA64ISAR1_EL1.APA == 0b0001 或 ID_AA64ISAR1_EL1.API == 0b0001 - 表示有此功能。 - -HWCAP_PACG - 如 Documentation/arm64/pointer-authentication.rst 所描述, - ID_AA64ISAR1_EL1.GPA == 0b0001 或 ID_AA64ISAR1_EL1.GPI == 0b0001 - 表示有此功能。 - -HWCAP2_DCPODP - - ID_AA64ISAR1_EL1.DPB == 0b0010 表示有此功能。 - -HWCAP2_SVE2 - - ID_AA64ZFR0_EL1.SVEVer == 0b0001 表示有此功能。 - -HWCAP2_SVEAES - - ID_AA64ZFR0_EL1.AES == 0b0001 表示有此功能。 - -HWCAP2_SVEPMULL - - ID_AA64ZFR0_EL1.AES == 0b0010 表示有此功能。 - -HWCAP2_SVEBITPERM - - ID_AA64ZFR0_EL1.BitPerm == 0b0001 表示有此功能。 - -HWCAP2_SVESHA3 - - ID_AA64ZFR0_EL1.SHA3 == 0b0001 表示有此功能。 - -HWCAP2_SVESM4 - - ID_AA64ZFR0_EL1.SM4 == 0b0001 表示有此功能。 - -HWCAP2_FLAGM2 - - ID_AA64ISAR0_EL1.TS == 0b0010 表示有此功能。 - -HWCAP2_FRINT - - ID_AA64ISAR1_EL1.FRINTTS == 0b0001 表示有此功能。 - -HWCAP2_SVEI8MM - - ID_AA64ZFR0_EL1.I8MM == 0b0001 表示有此功能。 - -HWCAP2_SVEF32MM - - ID_AA64ZFR0_EL1.F32MM == 0b0001 表示有此功能。 - -HWCAP2_SVEF64MM - - ID_AA64ZFR0_EL1.F64MM == 0b0001 表示有此功能。 - -HWCAP2_SVEBF16 - - ID_AA64ZFR0_EL1.BF16 == 0b0001 表示有此功能。 - -HWCAP2_I8MM - - ID_AA64ISAR1_EL1.I8MM == 0b0001 表示有此功能。 - -HWCAP2_BF16 - - ID_AA64ISAR1_EL1.BF16 == 0b0001 表示有此功能。 - -HWCAP2_DGH - - ID_AA64ISAR1_EL1.DGH == 0b0001 表示有此功能。 - -HWCAP2_RNG - - ID_AA64ISAR0_EL1.RNDR == 0b0001 表示有此功能。 - -HWCAP2_BTI - - ID_AA64PFR0_EL1.BT == 0b0001 表示有此功能。 - - -4. 未使用的 AT_HWCAP 位 ------------------------ - -爲了與用戶空間交互,內核保證 AT_HWCAP 的第62、63位將始終返回0。 - diff --git a/Documentation/translations/zh_TW/arm64/hugetlbpage.rst b/Documentation/translations/zh_TW/arm64/hugetlbpage.rst deleted file mode 100644 index 846b500dae97..000000000000 --- a/Documentation/translations/zh_TW/arm64/hugetlbpage.rst +++ /dev/null @@ -1,49 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -.. include:: ../disclaimer-zh_TW.rst - -:Original: :ref:`Documentation/arm64/hugetlbpage.rst ` - -Translator: Bailu Lin - Hu Haowen - -===================== -ARM64中的 HugeTLBpage -===================== - -大頁依靠有效利用 TLBs 來提高地址翻譯的性能。這取決於以下 -兩點 - - - - 大頁的大小 - - TLBs 支持的條目大小 - -ARM64 接口支持2種大頁方式。 - -1) pud/pmd 級別的塊映射 ------------------------ - -這是常規大頁,他們的 pmd 或 pud 頁面表條目指向一個內存塊。 -不管 TLB 中支持的條目大小如何,塊映射可以減少翻譯大頁地址 -所需遍歷的頁表深度。 - -2) 使用連續位 -------------- - -架構中轉換頁表條目(D4.5.3, ARM DDI 0487C.a)中提供一個連續 -位告訴 MMU 這個條目是一個連續條目集的一員,它可以被緩存在單 -個 TLB 條目中。 - -在 Linux 中連續位用來增加 pmd 和 pte(最後一級)級別映射的大 -小。受支持的連續頁表條目數量因頁面大小和頁表級別而異。 - - -支持以下大頁尺寸配置 - - - ====== ======== ==== ======== === - - CONT PTE PMD CONT PMD PUD - ====== ======== ==== ======== === - 4K: 64K 2M 32M 1G - 16K: 2M 32M 1G - 64K: 2M 512M 16G - ====== ======== ==== ======== === - diff --git a/Documentation/translations/zh_TW/arm64/index.rst b/Documentation/translations/zh_TW/arm64/index.rst deleted file mode 100644 index 2322783f3881..000000000000 --- a/Documentation/translations/zh_TW/arm64/index.rst +++ /dev/null @@ -1,23 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -.. include:: ../disclaimer-zh_TW.rst - -:Original: :ref:`Documentation/arm64/index.rst ` -:Translator: Bailu Lin - Hu Haowen - -.. _tw_arm64_index: - - -========== -ARM64 架構 -========== - -.. toctree:: - :maxdepth: 2 - - amu - hugetlbpage - perf - elf_hwcaps - diff --git a/Documentation/translations/zh_TW/arm64/legacy_instructions.txt b/Documentation/translations/zh_TW/arm64/legacy_instructions.txt deleted file mode 100644 index 6d4454f77b9e..000000000000 --- a/Documentation/translations/zh_TW/arm64/legacy_instructions.txt +++ /dev/null @@ -1,77 +0,0 @@ -SPDX-License-Identifier: GPL-2.0 - -Chinese translated version of Documentation/arm64/legacy_instructions.rst - -If you have any comment or update to the content, please contact the -original document maintainer directly. However, if you have a problem -communicating in English you can also ask the Chinese maintainer for -help. Contact the Chinese maintainer if this translation is outdated -or if there is a problem with the translation. - -Maintainer: Punit Agrawal - Suzuki K. Poulose -Chinese maintainer: Fu Wei -Traditional Chinese maintainer: Hu Haowen ---------------------------------------------------------------------- -Documentation/arm64/legacy_instructions.rst 的中文翻譯 - -如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文 -交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻 -譯存在問題,請聯繫中文版維護者。 - -本文翻譯提交時的 Git 檢出點爲: bc465aa9d045feb0e13b4a8f32cc33c1943f62d6 - -英文版維護者: Punit Agrawal - Suzuki K. Poulose -中文版維護者: 傅煒 Fu Wei -中文版翻譯者: 傅煒 Fu Wei -中文版校譯者: 傅煒 Fu Wei -繁體中文版校譯者:胡皓文 Hu Haowen - -以下爲正文 ---------------------------------------------------------------------- -Linux 內核在 arm64 上的移植提供了一個基礎框架,以支持構架中正在被淘汰或已廢棄指令的模擬執行。 -這個基礎框架的代碼使用未定義指令鉤子(hooks)來支持模擬。如果指令存在,它也允許在硬體中啓用該指令。 - -模擬模式可通過寫 sysctl 節點(/proc/sys/abi)來控制。 -不同的執行方式及 sysctl 節點的相應值,解釋如下: - -* Undef(未定義) - 值: 0 - 產生未定義指令終止異常。它是那些構架中已廢棄的指令,如 SWP,的默認處理方式。 - -* Emulate(模擬) - 值: 1 - 使用軟體模擬方式。爲解決軟體遷移問題,這種模擬指令模式的使用是被跟蹤的,並會發出速率限制警告。 - 它是那些構架中正在被淘汰的指令,如 CP15 barriers(隔離指令),的默認處理方式。 - -* Hardware Execution(硬體執行) - 值: 2 - 雖然標記爲正在被淘汰,但一些實現可能提供硬體執行這些指令的使能/禁用操作。 - 使用硬體執行一般會有更好的性能,但將無法收集運行時對正被淘汰指令的使用統計數據。 - -默認執行模式依賴於指令在構架中狀態。正在被淘汰的指令應該以模擬(Emulate)作爲默認模式, -而已廢棄的指令必須默認使用未定義(Undef)模式 - -注意:指令模擬可能無法應對所有情況。更多詳情請參考單獨的指令注釋。 - -受支持的遺留指令 -------------- -* SWP{B} -節點: /proc/sys/abi/swp -狀態: 已廢棄 -默認執行方式: Undef (0) - -* CP15 Barriers -節點: /proc/sys/abi/cp15_barrier -狀態: 正被淘汰,不推薦使用 -默認執行方式: Emulate (1) - -* SETEND -節點: /proc/sys/abi/setend -狀態: 正被淘汰,不推薦使用 -默認執行方式: Emulate (1)* -註:爲了使能這個特性,系統中的所有 CPU 必須在 EL0 支持混合字節序。 -如果一個新的 CPU (不支持混合字節序) 在使能這個特性後被熱插入系統, -在應用中可能會出現不可預期的結果。 - diff --git a/Documentation/translations/zh_TW/arm64/memory.txt b/Documentation/translations/zh_TW/arm64/memory.txt deleted file mode 100644 index 99c2b78b5674..000000000000 --- a/Documentation/translations/zh_TW/arm64/memory.txt +++ /dev/null @@ -1,119 +0,0 @@ -SPDX-License-Identifier: GPL-2.0 - -Chinese translated version of Documentation/arm64/memory.rst - -If you have any comment or update to the content, please contact the -original document maintainer directly. However, if you have a problem -communicating in English you can also ask the Chinese maintainer for -help. Contact the Chinese maintainer if this translation is outdated -or if there is a problem with the translation. - -Maintainer: Catalin Marinas -Chinese maintainer: Fu Wei -Traditional Chinese maintainer: Hu Haowen ---------------------------------------------------------------------- -Documentation/arm64/memory.rst 的中文翻譯 - -如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文 -交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻 -譯存在問題,請聯繫中文版維護者。 - -本文翻譯提交時的 Git 檢出點爲: bc465aa9d045feb0e13b4a8f32cc33c1943f62d6 - -英文版維護者: Catalin Marinas -中文版維護者: 傅煒 Fu Wei -中文版翻譯者: 傅煒 Fu Wei -中文版校譯者: 傅煒 Fu Wei -繁體中文版校譯者: 胡皓文 Hu Haowen - -以下爲正文 ---------------------------------------------------------------------- - Linux 在 AArch64 中的內存布局 - =========================== - -作者: Catalin Marinas - -本文檔描述 AArch64 Linux 內核所使用的虛擬內存布局。此構架可以實現 -頁大小爲 4KB 的 4 級轉換表和頁大小爲 64KB 的 3 級轉換表。 - -AArch64 Linux 使用 3 級或 4 級轉換表,其頁大小配置爲 4KB,對於用戶和內核 -分別都有 39-bit (512GB) 或 48-bit (256TB) 的虛擬地址空間。 -對於頁大小爲 64KB的配置,僅使用 2 級轉換表,有 42-bit (4TB) 的虛擬地址空間,但內存布局相同。 - -用戶地址空間的 63:48 位爲 0,而內核地址空間的相應位爲 1。TTBRx 的 -選擇由虛擬地址的 63 位給出。swapper_pg_dir 僅包含內核(全局)映射, -而用戶 pgd 僅包含用戶(非全局)映射。swapper_pg_dir 地址被寫入 -TTBR1 中,且從不寫入 TTBR0。 - - -AArch64 Linux 在頁大小爲 4KB,並使用 3 級轉換表時的內存布局: - -起始地址 結束地址 大小 用途 ------------------------------------------------------------------------ -0000000000000000 0000007fffffffff 512GB 用戶空間 -ffffff8000000000 ffffffffffffffff 512GB 內核空間 - - -AArch64 Linux 在頁大小爲 4KB,並使用 4 級轉換表時的內存布局: - -起始地址 結束地址 大小 用途 ------------------------------------------------------------------------ -0000000000000000 0000ffffffffffff 256TB 用戶空間 -ffff000000000000 ffffffffffffffff 256TB 內核空間 - - -AArch64 Linux 在頁大小爲 64KB,並使用 2 級轉換表時的內存布局: - -起始地址 結束地址 大小 用途 ------------------------------------------------------------------------ -0000000000000000 000003ffffffffff 4TB 用戶空間 -fffffc0000000000 ffffffffffffffff 4TB 內核空間 - - -AArch64 Linux 在頁大小爲 64KB,並使用 3 級轉換表時的內存布局: - -起始地址 結束地址 大小 用途 ------------------------------------------------------------------------ -0000000000000000 0000ffffffffffff 256TB 用戶空間 -ffff000000000000 ffffffffffffffff 256TB 內核空間 - - -更詳細的內核虛擬內存布局,請參閱內核啓動信息。 - - -4KB 頁大小的轉換表查找: - -+--------+--------+--------+--------+--------+--------+--------+--------+ -|63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| -+--------+--------+--------+--------+--------+--------+--------+--------+ - | | | | | | - | | | | | v - | | | | | [11:0] 頁內偏移 - | | | | +-> [20:12] L3 索引 - | | | +-----------> [29:21] L2 索引 - | | +---------------------> [38:30] L1 索引 - | +-------------------------------> [47:39] L0 索引 - +-------------------------------------------------> [63] TTBR0/1 - - -64KB 頁大小的轉換表查找: - -+--------+--------+--------+--------+--------+--------+--------+--------+ -|63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| -+--------+--------+--------+--------+--------+--------+--------+--------+ - | | | | | - | | | | v - | | | | [15:0] 頁內偏移 - | | | +----------> [28:16] L3 索引 - | | +--------------------------> [41:29] L2 索引 - | +-------------------------------> [47:42] L1 索引 - +-------------------------------------------------> [63] TTBR0/1 - - -當使用 KVM 時, 管理程序(hypervisor)在 EL2 中通過相對內核虛擬地址的 -一個固定偏移來映射內核頁(內核虛擬地址的高 24 位設爲零): - -起始地址 結束地址 大小 用途 ------------------------------------------------------------------------ -0000004000000000 0000007fffffffff 256GB 在 HYP 中映射的內核對象 - diff --git a/Documentation/translations/zh_TW/arm64/perf.rst b/Documentation/translations/zh_TW/arm64/perf.rst deleted file mode 100644 index f1ffd55dfe50..000000000000 --- a/Documentation/translations/zh_TW/arm64/perf.rst +++ /dev/null @@ -1,88 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -.. include:: ../disclaimer-zh_TW.rst - -:Original: :ref:`Documentation/arm64/perf.rst ` - -Translator: Bailu Lin - Hu Haowen - -============= -Perf 事件屬性 -============= - -:作者: Andrew Murray -:日期: 2019-03-06 - -exclude_user ------------- - -該屬性排除用戶空間。 - -用戶空間始終運行在 EL0,因此該屬性將排除 EL0。 - - -exclude_kernel --------------- - -該屬性排除內核空間。 - -打開 VHE 時內核運行在 EL2,不打開 VHE 時內核運行在 EL1。客戶機 -內核總是運行在 EL1。 - -對於宿主機,該屬性排除 EL1 和 VHE 上的 EL2。 - -對於客戶機,該屬性排除 EL1。請注意客戶機從來不會運行在 EL2。 - - -exclude_hv ----------- - -該屬性排除虛擬機監控器。 - -對於 VHE 宿主機該屬性將被忽略,此時我們認爲宿主機內核是虛擬機監 -控器。 - -對於 non-VHE 宿主機該屬性將排除 EL2,因爲虛擬機監控器運行在 EL2 -的任何代碼主要用於客戶機和宿主機的切換。 - -對於客戶機該屬性無效。請注意客戶機從來不會運行在 EL2。 - - -exclude_host / exclude_guest ----------------------------- - -這些屬性分別排除了 KVM 宿主機和客戶機。 - -KVM 宿主機可能運行在 EL0(用戶空間),EL1(non-VHE 內核)和 -EL2(VHE 內核 或 non-VHE 虛擬機監控器)。 - -KVM 客戶機可能運行在 EL0(用戶空間)和 EL1(內核)。 - -由於宿主機和客戶機之間重疊的異常級別,我們不能僅僅依靠 PMU 的硬體異 -常過濾機制-因此我們必須啓用/禁用對於客戶機進入和退出的計數。而這在 -VHE 和 non-VHE 系統上表現不同。 - -對於 non-VHE 系統的 exclude_host 屬性排除 EL2 - 在進入和退出客戶 -機時,我們會根據 exclude_host 和 exclude_guest 屬性在適當的情況下 -禁用/啓用該事件。 - -對於 VHE 系統的 exclude_guest 屬性排除 EL1,而對其中的 exclude_host -屬性同時排除 EL0,EL2。在進入和退出客戶機時,我們會適當地根據 -exclude_host 和 exclude_guest 屬性包括/排除 EL0。 - -以上聲明也適用於在 not-VHE 客戶機使用這些屬性時,但是請注意客戶機從 -來不會運行在 EL2。 - - -準確性 ------- - -在 non-VHE 宿主機上,我們在 EL2 進入/退出宿主機/客戶機的切換時啓用/ -關閉計數器 -但是在啓用/禁用計數器和進入/退出客戶機之間存在一段延時。 -對於 exclude_host, 我們可以通過過濾 EL2 消除在客戶機進入/退出邊界 -上用於計數客戶機事件的宿主機事件計數器。但是當使用 !exclude_hv 時, -在客戶機進入/退出有一個小的停電窗口無法捕獲到宿主機的事件。 - -在 VHE 系統沒有停電窗口。 - diff --git a/Documentation/translations/zh_TW/arm64/silicon-errata.txt b/Documentation/translations/zh_TW/arm64/silicon-errata.txt deleted file mode 100644 index bf2077197504..000000000000 --- a/Documentation/translations/zh_TW/arm64/silicon-errata.txt +++ /dev/null @@ -1,79 +0,0 @@ -SPDX-License-Identifier: GPL-2.0 - -Chinese translated version of Documentation/arm64/silicon-errata.rst - -If you have any comment or update to the content, please contact the -original document maintainer directly. However, if you have a problem -communicating in English you can also ask the Chinese maintainer for -help. Contact the Chinese maintainer if this translation is outdated -or if there is a problem with the translation. - -M: Will Deacon -zh_CN: Fu Wei -zh_TW: Hu Haowen -C: 1926e54f115725a9248d0c4c65c22acaf94de4c4 ---------------------------------------------------------------------- -Documentation/arm64/silicon-errata.rst 的中文翻譯 - -如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文 -交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻 -譯存在問題,請聯繫中文版維護者。 - -英文版維護者: Will Deacon -中文版維護者: 傅煒 Fu Wei -中文版翻譯者: 傅煒 Fu Wei -中文版校譯者: 傅煒 Fu Wei -繁體中文版校譯者: 胡皓文 Hu Haowen -本文翻譯提交時的 Git 檢出點爲: 1926e54f115725a9248d0c4c65c22acaf94de4c4 - -以下爲正文 ---------------------------------------------------------------------- - 晶片勘誤和軟體補救措施 - ================== - -作者: Will Deacon -日期: 2015年11月27日 - -一個不幸的現實:硬體經常帶有一些所謂的「瑕疵(errata)」,導致其在 -某些特定情況下會違背構架定義的行爲。就基於 ARM 的硬體而言,這些瑕疵 -大體可分爲以下幾類: - - A 類:無可行補救措施的嚴重缺陷。 - B 類:有可接受的補救措施的重大或嚴重缺陷。 - C 類:在正常操作中不會顯現的小瑕疵。 - -更多資訊,請在 infocenter.arm.com (需註冊)中查閱「軟體開發者勘誤 -筆記」(「Software Developers Errata Notice」)文檔。 - -對於 Linux 而言,B 類缺陷可能需要作業系統的某些特別處理。例如,避免 -一個特殊的代碼序列,或是以一種特定的方式配置處理器。在某種不太常見的 -情況下,爲將 A 類缺陷當作 C 類處理,可能需要用類似的手段。這些手段被 -統稱爲「軟體補救措施」,且僅在少數情況需要(例如,那些需要一個運行在 -非安全異常級的補救措施 *並且* 能被 Linux 觸發的情況)。 - -對於尚在討論中的可能對未受瑕疵影響的系統產生干擾的軟體補救措施,有一個 -相應的內核配置(Kconfig)選項被加在 「內核特性(Kernel Features)」-> -「基於可選方法框架的 ARM 瑕疵補救措施(ARM errata workarounds via -the alternatives framework)"。這些選項被默認開啓,若探測到受影響的CPU, -補丁將在運行時被使用。至於對系統運行影響較小的補救措施,內核配置選項 -並不存在,且代碼以某種規避瑕疵的方式被構造(帶注釋爲宜)。 - -這種做法對於在任意內核原始碼樹中準確地判斷出哪個瑕疵已被軟體方法所補救 -稍微有點麻煩,所以在 Linux 內核中此文件作爲軟體補救措施的註冊表, -並將在新的軟體補救措施被提交和向後移植(backported)到穩定內核時被更新。 - -| 實現者 | 受影響的組件 | 勘誤編號 | 內核配置 | -+----------------+-----------------+-----------------+-------------------------+ -| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 | -| ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 | -| ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 | -| ARM | Cortex-A53 | #819472 | ARM64_ERRATUM_819472 | -| ARM | Cortex-A53 | #845719 | ARM64_ERRATUM_845719 | -| ARM | Cortex-A53 | #843419 | ARM64_ERRATUM_843419 | -| ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 | -| ARM | Cortex-A57 | #852523 | N/A | -| ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 | -| | | | | -| Cavium | ThunderX ITS | #22375, #24313 | CAVIUM_ERRATUM_22375 | -| Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 | - diff --git a/Documentation/translations/zh_TW/arm64/tagged-pointers.txt b/Documentation/translations/zh_TW/arm64/tagged-pointers.txt deleted file mode 100644 index 87f88628401a..000000000000 --- a/Documentation/translations/zh_TW/arm64/tagged-pointers.txt +++ /dev/null @@ -1,57 +0,0 @@ -SPDX-License-Identifier: GPL-2.0 - -Chinese translated version of Documentation/arm64/tagged-pointers.rst - -If you have any comment or update to the content, please contact the -original document maintainer directly. However, if you have a problem -communicating in English you can also ask the Chinese maintainer for -help. Contact the Chinese maintainer if this translation is outdated -or if there is a problem with the translation. - -Maintainer: Will Deacon -Chinese maintainer: Fu Wei -Traditional Chinese maintainer: Hu Haowen ---------------------------------------------------------------------- -Documentation/arm64/tagged-pointers.rst 的中文翻譯 - -如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文 -交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻 -譯存在問題,請聯繫中文版維護者。 - -英文版維護者: Will Deacon -中文版維護者: 傅煒 Fu Wei -中文版翻譯者: 傅煒 Fu Wei -中文版校譯者: 傅煒 Fu Wei -繁體中文版校譯者: 胡皓文 Hu Haowen - -以下爲正文 ---------------------------------------------------------------------- - Linux 在 AArch64 中帶標記的虛擬地址 - ================================= - -作者: Will Deacon -日期: 2013 年 06 月 12 日 - -本文檔簡述了在 AArch64 地址轉換系統中提供的帶標記的虛擬地址及其在 -AArch64 Linux 中的潛在用途。 - -內核提供的地址轉換表配置使通過 TTBR0 完成的虛擬地址轉換(即用戶空間 -映射),其虛擬地址的最高 8 位(63:56)會被轉換硬體所忽略。這種機制 -讓這些位可供應用程式自由使用,其注意事項如下: - - (1) 內核要求所有傳遞到 EL1 的用戶空間地址帶有 0x00 標記。 - 這意味著任何攜帶用戶空間虛擬地址的系統調用(syscall) - 參數 *必須* 在陷入內核前使它們的最高字節被清零。 - - (2) 非零標記在傳遞信號時不被保存。這意味著在應用程式中利用了 - 標記的信號處理函數無法依賴 siginfo_t 的用戶空間虛擬 - 地址所攜帶的包含其內部域信息的標記。此規則的一個例外是 - 當信號是在調試觀察點的異常處理程序中產生的,此時標記的 - 信息將被保存。 - - (3) 當使用帶標記的指針時需特別留心,因爲僅對兩個虛擬地址 - 的高字節,C 編譯器很可能無法判斷它們是不同的。 - -此構架會阻止對帶標記的 PC 指針的利用,因此在異常返回時,其高字節 -將被設置成一個爲 「55」 的擴展符。 - diff --git a/Documentation/translations/zh_TW/index.rst b/Documentation/translations/zh_TW/index.rst index e97d7d578751..e7c83868e780 100644 --- a/Documentation/translations/zh_TW/index.rst +++ b/Documentation/translations/zh_TW/index.rst @@ -150,7 +150,7 @@ TODOList: .. toctree:: :maxdepth: 2 - arm64/index + arch/arm64/index TODOList: diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index add067793b90..96c4475539c2 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -2613,7 +2613,7 @@ follows:: this vcpu, and determines which register slices are visible through this ioctl interface. -(See Documentation/arm64/sve.rst for an explanation of the "vq" +(See Documentation/arch/arm64/sve.rst for an explanation of the "vq" nomenclature.) KVM_REG_ARM64_SVE_VLS is only accessible after KVM_ARM_VCPU_INIT. -- cgit v1.2.3