summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2024-03-18ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512Christoph Lameter (Ampere)1-1/+2
[ a.k.a. Revert "Revert "ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512""; originally reverted because of a bug in the cpufreq-dt code not using zalloc_cpumask_var() ] Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere Computing) are planning to ship systems with 512 CPUs. So that all CPUs on these systems can be used with defconfig, we'd like to bump NR_CPUS to 512. Therefore this patch increases the default NR_CPUS from 256 to 512. As increasing NR_CPUS will increase the size of cpumasks, there's a fear that this might have a significant impact on stack usage due to code which places cpumasks on the stack. To mitigate that concern, we can select CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with NR_CPUS=256, we only select this when NR_CPUS > 256. CPUMASK_OFFSTACK configures the cpumasks in the kernel to be dynamically allocated. This was used in the X86 architecture in the past to enable support for larger CPU configurations up to 8k cpus. With that is becomes possible to dynamically size the allocation of the cpu bitmaps depending on the quantity of processors detected on bootup. Memory used for cpumasks will increase if the kernel is run on a machine with more cores. Further increases may be needed if ARM processor vendors start supporting more processors. Given the current inflationary trends in core counts from multiple processor manufacturers this may occur. There are minor regressions for hackbench. The kernel data size for 512 cpus is smaller with offstack than with onstack. Benchmark results using hackbench average over 10 runs of hackbench -s 512 -l 2000 -g 15 -f 25 -P on Altra 80 Core Support for 256 CPUs on stack. Baseline 7.8564 sec Support for 512 CUs on stack. 7.8713 sec + 0.18% 512 CPUS offstack 7.8916 sec + 0.44% Kernel size comparison: text data filename Difference to onstack256 baseline 25755648 9589248 vmlinuz-6.8.0-rc4-onstack256 25755648 9607680 vmlinuz-6.8.0-rc4-onstack512 +0.19% 25755648 9603584 vmlinuz-6.8.0-rc4-offstack512 +0.14% Tested-by: Eric Mackay <eric.mackay@oracle.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Christoph Lameter (Ampere) <cl@linux.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/37099a57-b655-3b3a-56d0-5f7fbd49d7db@gentwo.org Link: https://lore.kernel.org/r/20240314125457.186678-1-m.szyprowski@samsung.com [catalin.marinas@arm.com: use 'select' instead of duplicating 'config CPUMASK_OFFSTACK'] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-13Revert "arm64: mm: add support for WXN memory translation attribute"Catalin Marinas7-116/+2
This reverts commit 50e3ed0f93f4f62ed2aa83de5db6cb84ecdd5707. The SCTLR_EL1.WXN control forces execute-never when a page has write permissions. While the idea of hardening such write/exec combinations is good, with permissions indirection enabled (FEAT_PIE) this control becomes RES0. FEAT_PIE introduces a slightly different form of WXN which only has an effect when the base permission is RWX and the write is toggled by the permission overlay (FEAT_POE, not yet supported by the arm64 kernel). Revert the patch for now. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/ZfGESD3a91lxH367@arm.com
2024-03-11Revert "ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512"Catalin Marinas1-2/+1
This reverts commit 0499a78369adacec1af29340b71ff8dd375b4697. Enabling CPUMASK_OFFSTACK on arm64 triggers a warning in the dev_pm_opp_set_config() function followed by a failure to set the regulators and cpufreq-dt probing error. There is no apparent reason why this happens, so revert this commit until further investigation. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/c1f2902d-cefc-4122-9b86-d1d32911f590@samsung.com
2024-03-07Merge branch 'for-next/stage1-lpa2' into for-next/coreCatalin Marinas53-1124/+1931
* for-next/stage1-lpa2: (48 commits) : Add support for LPA2 and WXN and stage 1 arm64/mm: Avoid ID mapping of kpti flag if it is no longer needed arm64/mm: Use generic __pud_free() helper in pud_free() implementation arm64: gitignore: ignore relacheck arm64: Use Signed/Unsigned enums for TGRAN{4,16,64} and VARange arm64: mm: Make PUD folding check in set_pud() a runtime check arm64: mm: add support for WXN memory translation attribute mm: add arch hook to validate mmap() prot flags arm64: defconfig: Enable LPA2 support arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs arm64: kvm: avoid CONFIG_PGTABLE_LEVELS for runtime levels arm64: ptdump: Deal with translation levels folded at runtime arm64: ptdump: Disregard unaddressable VA space arm64: mm: Add support for folding PUDs at runtime arm64: kasan: Reduce minimum shadow alignment and enable 5 level paging arm64: mm: Add 5 level paging support to fixmap and swapper handling arm64: Enable LPA2 at boot if supported by the system arm64: mm: add LPA2 and 5 level paging support to G-to-nG conversion arm64: mm: Add definitions to support 5 levels of paging arm64: mm: Add LPA2 support to phys<->pte conversion routines arm64: mm: Wire up TCR.DS bit to PTE shareability fields ...
2024-03-07Merge branches 'for-next/reorg-va-space', 'for-next/rust-for-arm64', ↵Catalin Marinas42-220/+466
'for-next/misc', 'for-next/daif-cleanup', 'for-next/kselftest', 'for-next/documentation', 'for-next/sysreg' and 'for-next/dpisa', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: (39 commits) docs: perf: Fix build warning of hisi-pcie-pmu.rst perf: starfive: Only allow COMPILE_TEST for 64-bit architectures MAINTAINERS: Add entry for StarFive StarLink PMU docs: perf: Add description for StarFive's StarLink PMU dt-bindings: perf: starfive: Add JH8100 StarLink PMU perf: starfive: Add StarLink PMU support docs: perf: Update usage for target filter of hisi-pcie-pmu drivers/perf: hisi_pcie: Merge find_related_event() and get_event_idx() drivers/perf: hisi_pcie: Relax the check on related events drivers/perf: hisi_pcie: Check the target filter properly drivers/perf: hisi_pcie: Add more events for counting TLP bandwidth drivers/perf: hisi_pcie: Fix incorrect counting under metric mode drivers/perf: hisi_pcie: Introduce hisi_pcie_pmu_get_event_ctrl_val() drivers/perf: hisi_pcie: Rename hisi_pcie_pmu_{config,clear}_filter() drivers/perf: hisi: Enable HiSilicon Erratum 162700402 quirk for HIP09 perf/arm_cspmu: Add devicetree support dt-bindings/perf: Add Arm CoreSight PMU perf/arm_cspmu: Simplify counter reset perf/arm_cspmu: Simplify attribute groups perf/arm_cspmu: Simplify initialisation ... * for-next/reorg-va-space: : Reorganise the arm64 kernel VA space in preparation for LPA2 support : (52-bit VA/PA). arm64: kaslr: Adjust randomization range dynamically arm64: mm: Reclaim unused vmemmap region for vmalloc use arm64: vmemmap: Avoid base2 order of struct page size to dimension region arm64: ptdump: Discover start of vmemmap region at runtime arm64: ptdump: Allow all region boundaries to be defined at boot time arm64: mm: Move fixmap region above vmemmap region arm64: mm: Move PCI I/O emulation region above the vmemmap region * for-next/rust-for-arm64: : Enable Rust support for arm64 arm64: rust: Enable Rust support for AArch64 rust: Refactor the build target to allow the use of builtin targets * for-next/misc: : Miscellaneous arm64 patches ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512 arm64: Remove enable_daif macro arm64/hw_breakpoint: Directly use ESR_ELx_WNR for an watchpoint exception arm64: cpufeatures: Clean up temporary variable to simplify code arm64: Update setup_arch() comment on interrupt masking arm64: remove unnecessary ifdefs around is_compat_task() arm64: ftrace: Don't forbid CALL_OPS+CC_OPTIMIZE_FOR_SIZE with Clang arm64/sme: Ensure that all fields in SMCR_EL1 are set to known values arm64/sve: Ensure that all fields in ZCR_EL1 are set to known values arm64/sve: Document that __SVE_VQ_MAX is much larger than needed arm64: make member of struct pt_regs and it's offset macro in the same order arm64: remove unneeded BUILD_BUG_ON assertion arm64: kretprobes: acquire the regs via a BRK exception arm64: io: permit offset addressing arm64: errata: Don't enable workarounds for "rare" errata by default * for-next/daif-cleanup: : Clean up DAIF handling for EL0 returns arm64: Unmask Debug + SError in do_notify_resume() arm64: Move do_notify_resume() to entry-common.c arm64: Simplify do_notify_resume() DAIF masking * for-next/kselftest: : Miscellaneous arm64 kselftest patches kselftest/arm64: Test that ptrace takes effect in the target process * for-next/documentation: : arm64 documentation patches arm64/sme: Remove spurious 'is' in SME documentation arm64/fp: Clarify effect of setting an unsupported system VL arm64/sme: Fix cut'n'paste in ABI document arm64/sve: Remove bitrotted comment about syscall behaviour * for-next/sysreg: : sysreg updates arm64/sysreg: Update ID_AA64DFR0_EL1 register arm64/sysreg: Update ID_DFR0_EL1 register fields arm64/sysreg: Add register fields for ID_AA64DFR1_EL1 * for-next/dpisa: : Support for 2023 dpISA extensions kselftest/arm64: Add 2023 DPISA hwcap test coverage kselftest/arm64: Add basic FPMR test kselftest/arm64: Handle FPMR context in generic signal frame parser arm64/hwcap: Define hwcaps for 2023 DPISA features arm64/ptrace: Expose FPMR via ptrace arm64/signal: Add FPMR signal handling arm64/fpsimd: Support FEAT_FPMR arm64/fpsimd: Enable host kernel access to FPMR arm64/cpufeature: Hook new identification registers up to cpufeature
2024-03-07ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512Christoph Lameter (Ampere)1-1/+2
Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere Computing) are planning to ship systems with 512 CPUs. So that all CPUs on these systems can be used with defconfig, we'd like to bump NR_CPUS to 512. Therefore this patch increases the default NR_CPUS from 256 to 512. As increasing NR_CPUS will increase the size of cpumasks, there's a fear that this might have a significant impact on stack usage due to code which places cpumasks on the stack. To mitigate that concern, we can select CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with NR_CPUS=256, we only select this when NR_CPUS > 256. CPUMASK_OFFSTACK configures the cpumasks in the kernel to be dynamically allocated. This was used in the X86 architecture in the past to enable support for larger CPU configurations up to 8k cpus. With that is becomes possible to dynamically size the allocation of the cpu bitmaps depending on the quantity of processors detected on bootup. Memory used for cpumasks will increase if the kernel is run on a machine with more cores. Further increases may be needed if ARM processor vendors start supporting more processors. Given the current inflationary trends in core counts from multiple processor manufacturers this may occur. There are minor regressions for hackbench. The kernel data size for 512 cpus is smaller with offstack than with onstack. Benchmark results using hackbench average over 10 runs of hackbench -s 512 -l 2000 -g 15 -f 25 -P on Altra 80 Core Support for 256 CPUs on stack. Baseline 7.8564 sec Support for 512 CUs on stack. 7.8713 sec + 0.18% 512 CPUS offstack 7.8916 sec + 0.44% Kernel size comparison: text data filename Difference to onstack256 baseline 25755648 9589248 vmlinuz-6.8.0-rc4-onstack256 25755648 9607680 vmlinuz-6.8.0-rc4-onstack512 +0.19% 25755648 9603584 vmlinuz-6.8.0-rc4-offstack512 +0.14% Tested-by: Eric Mackay <eric.mackay@oracle.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Christoph Lameter (Ampere) <cl@linux.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/37099a57-b655-3b3a-56d0-5f7fbd49d7db@gentwo.org [catalin.marinas@arm.com: use 'select' instead of duplicating 'config CPUMASK_OFFSTACK'] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-07arm64/hwcap: Define hwcaps for 2023 DPISA featuresMark Brown4-0/+80
The 2023 architecture extensions include a large number of floating point features, most of which simply add new instructions. Add hwcaps so that userspace can enumerate these features. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-6-c568edc8ed7f@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-07arm64/ptrace: Expose FPMR via ptraceMark Brown1-0/+42
Add a new regset to expose FPMR via ptrace. It is not added to the FPSIMD registers since that structure is exposed elsewhere without any allowance for extension we don't add there. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-5-c568edc8ed7f@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-07arm64/signal: Add FPMR signal handlingMark Brown2-0/+67
Expose FPMR in the signal context on systems where it is supported. The kernel validates the exact size of the FPSIMD registers so we can't readily add it to fpsimd_context without disruption. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-4-c568edc8ed7f@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-07arm64/fpsimd: Support FEAT_FPMRMark Brown8-0/+36
FEAT_FPMR defines a new EL0 accessible register FPMR use to configure the FP8 related features added to the architecture at the same time. Detect support for this register and context switch it for EL0 when present. Due to the sharing of responsibility for saving floating point state between the host kernel and KVM FP8 support is not yet implemented in KVM and a stub similar to that used for SVCR is provided for FPMR in order to avoid bisection issues. To make it easier to share host state with the hypervisor we store FPMR as a hardened usercopy field in uw (along with some padding). Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-3-c568edc8ed7f@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-07arm64/fpsimd: Enable host kernel access to FPMRMark Brown1-1/+1
FEAT_FPMR provides a new generally accessible architectural register FPMR. This is only accessible to EL0 and EL1 when HCRX_EL2.EnFPM is set to 1, do this when the host is running. The guest part will be done along with context switching the new register and exposing it via guest management. Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-2-c568edc8ed7f@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-07arm64/cpufeature: Hook new identification registers up to cpufeatureMark Brown3-0/+34
The 2023 architecture extensions have defined several new ID registers, hook them up to the cpufeature code so we can add feature checks and hwcaps based on their contents. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-1-c568edc8ed7f@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-01arm64: Remove enable_daif macroJinjie Ruan1-4/+0
Since commit bb8e93a287a5 ("arm64: entry: convert SError handlers to C"), the enable_daif assembler macro is no longer used anywhere, so remove it. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20240229132802.1682026-2-ruanjinjie@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-01arm64/hw_breakpoint: Directly use ESR_ELx_WNR for an watchpoint exceptionAnshuman Khandual2-2/+2
Let's use existing ISS encoding for an watchpoint exception i.e ESR_ELx_WNR This represents an instruction's either writing to or reading from a memory location during an watchpoint exception. While here this drops non-standard macro AARCH64_ESR_ACCESS_MASK. Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20240229083431.356578-1-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-01arm64: cpufeatures: Clean up temporary variable to simplify codeLiao Chang1-6/+2
Clean up one temporary variable to simplifiy code in capability detection. Signed-off-by: Liao Chang <liaochang1@huawei.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20240229105208.456704-1-liaochang1@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-01arm64/mm: Avoid ID mapping of kpti flag if it is no longer neededArd Biesheuvel1-1/+1
arm64_use_ng_mappings will be set to 'true' by the early boot code if it decides to use non-global (nG) attributes for all kernel mappings, typically when enabling KASLR on a system that does not implement E0PD. In this case, the G-to-nG update routines are never called, and so there is no reason to create the writable mapping of the associated status flag in the ID map. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240301104046.1234309-6-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-01arm64/mm: Use generic __pud_free() helper in pud_free() implementationArd Biesheuvel1-2/+1
Commit 0dd4f60a2c76 ("arm64: mm: Add support for folding PUDs at runtime") implements specialized PUD alloc/free helpers to allow the decision whether or not to fold PUDs to be made at runtime when the number of paging levels is 4 or higher. Its implementation of pud_free() is based on the generic version that existed when the patch was first written, but in the meantime, the freeing of a PUD has become a bit more involved, and so instead of simply freeing the page, we should invoke the generic __pud_free() that encapsulates whatever needs doing at this point. This fixes a reported warning emitted by the page flags self-diagnostics. Reported-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20240301104046.1234309-5-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-28arm64: Update setup_arch() comment on interrupt maskingRyo Takakura1-3/+2
DAIF_PROCCTX_NOIRQ contains the FIQ bit. Update the comment as only asynchronous aborts are unmasked and FIQ is still masked. Signed-off-by: Ryo Takakura <takakura@valinux.co.jp> Link: https://lore.kernel.org/r/20240228022836.1756-1-takakura@valinux.co.jp Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-28arm64: remove unnecessary ifdefs around is_compat_task()Leonardo Bras4-16/+9
Currently some parts of the codebase will test for CONFIG_COMPAT before testing is_compat_task(). is_compat_task() is a inlined function only present on CONFIG_COMPAT. On the other hand, for !CONFIG_COMPAT, we have in linux/compat.h: #define is_compat_task() (0) Since we have this define available in every usage of is_compat_task() for !CONFIG_COMPAT, it's unnecessary to keep the ifdefs, since the compiler is smart enough to optimize-out those snippets on CONFIG_COMPAT=n This requires some regset code as well as a few other defines to be made available on !CONFIG_COMPAT, so some symbols can get resolved before getting optimized-out. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240109034651.478462-2-leobras@redhat.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-28arm64: ftrace: Don't forbid CALL_OPS+CC_OPTIMIZE_FOR_SIZE with ClangStephen Boyd1-1/+1
Per commit b3f11af9b2ce ("arm64: ftrace: forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE"), GCC is silently ignoring `-falign-functions=N` when passed `-Os`, causing functions to be improperly aligned. This doesn't seem to be a problem with Clang though, where enabling CALL_OPS with CC_OPTIMIZE_FOR_SIZE doesn't spit out any warnings at boot about misaligned patch-sites. Only forbid CALL_OPS if GCC is used and we're optimizing for size so that CALL_OPS can be used with clang optimizing for size. Cc: Jason Ling <jasonling@chromium.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Justin Stitt <justinstitt@google.com> Cc: llvm@lists.linux.dev Fixes: b3f11af9b2ce ("arm64: ftrace: forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20240223064032.3463229-1-swboyd@chromium.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-23arm64: gitignore: ignore relacheckBartosz Golaszewski1-0/+3
Add the generated executable for relacheck to the list of ignored files. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Link: https://lore.kernel.org/r/20240222210441.33142-1-brgl@bgdev.pl Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-22arm64/sme: Ensure that all fields in SMCR_EL1 are set to known valuesMark Brown1-0/+3
At present nothing in our CPU initialisation code ever sets unknown fields in SMCR_EL1 to known values, all updates to SMCR_EL1 are read/modify/write sequences. All the unknown fields are RES0, explicitly initialise them as such to avoid future surprises. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240213-arm64-fp-init-vec-cr-v1-2-7e7c2d584f26@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-22arm64/sve: Ensure that all fields in ZCR_EL1 are set to known valuesMark Brown1-0/+2
At present nothing in our CPU initialisation code ever sets unknown fields in ZCR_EL1 to known values, all updates to ZCR_EL1 are read/modify/write sequences for LEN. All the unknown fields are RES0, explicitly initialise them as such to avoid future surprises. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240213-arm64-fp-init-vec-cr-v1-1-7e7c2d584f26@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-22arm64/sve: Document that __SVE_VQ_MAX is much larger than neededMark Brown1-0/+11
__SVE_VQ_MAX is defined without comment as 512 but the actual architectural maximum is 16, a substantial difference which might not be obvious to readers especially given the several different units used for specifying vector sizes in various contexts and the fact that it's often used via macros. In an effort to minimise surprises for users who might assume the value is the architectural maximum and use it to do things like size allocations add a comment noting the difference, and add a note for SVE_VQ_MAX to aid discoverability. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Dave Martin <Dave.Martin@arm.com> Link: https://lore.kernel.org/r/20240209-arm64-sve-vl-max-comment-v2-1-111b283469ee@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-22arm64: make member of struct pt_regs and it's offset macro in the same orderKemeng Shi1-1/+1
In struct pt_regs, member pstate is after member pc. Move offset macro of pstate after offset macro of pc to improve readability a little. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20240130175504.106364-1-shikemeng@huaweicloud.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-22arm64: remove unneeded BUILD_BUG_ON assertionDawei Li1-3/+0
Since commit c02433dd6de3 ("arm64: split thread_info from task stack"), CONFIG_THREAD_INFO_IN_TASK is enabled unconditionally for arm64. So remove this always-true assertion from arch_dup_task_struct. Signed-off-by: Dawei Li <dawei.li@shingroup.cn> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20240202040211.3118918-1-dawei.li@shingroup.cn Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-21arm64/sysreg: Update ID_AA64DFR0_EL1 registerAnshuman Khandual1-0/+2
This updates ID_AA64DFR0_EL1.PMSVer and ID_AA64DFR0_EL1.DebugVer register fields as per the definitions based on DDI0601 2023-12. Cc: Will Deacon <will@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240220034829.3098373-1-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-21arm64/sysreg: Update ID_DFR0_EL1 register fieldsAnshuman Khandual1-0/+2
This updates ID_DFR0_EL1.PerfMon and ID_DFR0_EL1.CopDbg register fields as per the definitions based on DDI0601 2023-12. Cc: Will Deacon <will@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240220025343.3093955-1-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-21arm64/sysreg: Add register fields for ID_AA64DFR1_EL1Anshuman Khandual1-1/+30
This adds register fields for ID_AA64DFR1_EL1 as per the definitions based on DDI0601 2023-12. Cc: Will Deacon <will@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240220023203.3091229-1-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-20arm64: kretprobes: acquire the regs via a BRK exceptionMark Rutland3-77/+24
On arm64, kprobes always take an exception and so create a struct pt_regs through the usual exception entry logic. Similarly kretprobes taskes and exception for function entry, but for function returns it uses a trampoline which attempts to create a struct pt_regs without taking an exception. This is problematic for a few reasons, including: 1) The kretprobes trampoline neither saves nor restores all of the portions of PSTATE. Before invoking the handler it saves a number of portions of PSTATE, and after returning from the handler it restores NZCV before returning to the original return address provided by the handler. 2) The kretprobe trampoline constructs the PSTATE value piecemeal from special purpose registers as it cannot read all of PSTATE atomically without taking an exception. This is somewhat fragile, and it's not possible to reliably recover PSTATE information which only exists on some physical CPUs (e.g. when SSBS support is mismatched). Today the kretprobes trampoline does not record: - BTYPE - SSBS - ALLINT - SS - PAN - UAO - DIT - TCO ... and this will only get worse with future architecture extensions which add more PSTATE bits. 3) The kretprobes trampoline doesn't store portions of struct pt_regs (e.g. the PMR value when using pseudo-NMIs). Due to this, helpers which operate on a struct pt_regs, such as interrupts_enabled(), may not work correctly. 4) The function entry and function exit handlers run in different contexts. The entry handler will always be run in a debug exception context (which is currently treated as an NMI), but the return will be treated as whatever context the instrumented function was executed in. The differences between these contexts are liable to cause problems (e.g. as the two can be differently interruptible or preemptible, adversely affecting synchronization between the handlers). 5) As the kretprobes trampoline runs in the same context as the code being probed, it is subject to the same single-stepping context, which may not be desirable if this is being driven by the kprobes handlers. Overall, this is fragile, painful to maintain, and gets in the way of supporting other things (e.g. RELIABLE_STACKTRACE, FEAT_NMI). This patch addresses these issues by replacing the kretprobes trampoline with a `BRK` instruction, and using an exception boundary to acquire and restore the regs, in the same way as the regular kprobes trampoline. Ive tested this atop v6.8-rc3: | KTAP version 1 | 1..1 | KTAP version 1 | # Subtest: kprobes_test | # module: test_kprobes | 1..7 | ok 1 test_kprobe | ok 2 test_kprobes | ok 3 test_kprobe_missed | ok 4 test_kretprobe | ok 5 test_kretprobes | ok 6 test_stacktrace_on_kretprobe | ok 7 test_stacktrace_on_nested_kretprobe | # kprobes_test: pass:7 fail:0 skip:0 total:7 | # Totals: pass:7 fail:0 skip:0 total:7 | ok 1 kprobes_test Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Florent Revest <revest@chromium.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240208145916.2004154-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-20arm64: Unmask Debug + SError in do_notify_resume()Mark Rutland1-3/+5
When returning to a user context, the arm64 entry code masks all DAIF exceptions before handling pending work in exit_to_user_mode_prepare() and do_notify_resume(), where it will transiently unmask all DAIF exceptions. This is a holdover from the old entry assembly, which conservatively masked all DAIF exceptions, and it's only necessary to mask interrupts at this point during the exception return path, so long as we subsequently mask all DAIF exceptions before the actual exception return. While most DAIF manipulation follows a save...restore sequence, the manipulation in do_notify_resume() is the other way around, unmasking all DAIF exceptions before masking them again. This is unfortunate as we unnecessarily mask Debug and SError exceptions, and it would be nice to remove this special case to make DAIF manipulation simpler and most consistent. This patch changes exit_to_user_mode_prepare() and do_notify_resume() to only mask interrupts while handling pending work, masking other DAIF exceptions after this has completed. This removes the unusual DAIF manipulation and allows Debug and SError exceptions to be taken for a slightly longer window during the exception return path. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240206123848.1696480-4-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Itaru Kitayama <itaru.kitayama@linux.dev>
2024-02-20arm64: Move do_notify_resume() to entry-common.cMark Rutland3-34/+35
Currently do_notify_resume() lives in arch/arm64/kernel/signal.c, but it would make more sense for it to live in entry-common.c as it handles more than signals, and is coupled with the rest of the return-to-userspace sequence (e.g. with unusual DAIF masking that matches the exception return requirements). Move do_notify_resume() to entry-common.c. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240206123848.1696480-3-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Itaru Kitayama <itaru.kitayama@linux.dev>
2024-02-20arm64: Simplify do_notify_resume() DAIF maskingMark Rutland1-19/+15
In do_notify_resume, we handle _TIF_NEED_RESCHED differently from all other flags, leaving IRQ+FIQ masked when calling into schedule(). This masking is a historical artifact, and it is not currently necessary to mask IRQ+FIQ when calling into schedule (as evidenced by the generic exit_to_user_mode_loop(), which unmasks IRQs before checking _TIF_NEED_RESCHED and calling schedule()). This patch removes the special case for _TIF_NEED_RESCHED, moving this check into the main loop such that schedule() will be called from a regular process context with IRQ+FIQ unmasked. This is a minor simplification to do_notify_resume() and brings it into line with the generic exit_to_user_mode_loop() logic. This will also aid subsequent rework of DAIF management. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240206123848.1696480-2-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Itaru Kitayama <itaru.kitayama@linux.dev>
2024-02-20arm64: io: permit offset addressingMark Rutland1-4/+8
Currently our IO accessors all use register addressing without offsets, but we could safely use offset addressing (without writeback) to simplify and optimize the generated code. To function correctly under a hypervisor which emulates IO accesses, we must ensure that any faulting/trapped IO access results in an ESR_ELx value with ESR_ELX.ISS.ISV=1 and with the tranfer register described in ESR_ELx.ISS.SRT. This means that we can only use loads/stores of a single general purpose register (or the zero register), and must avoid writeback addressing modes. However, we can use immediate offset addressing modes, as these still provide ESR_ELX.ISS.ISV=1 and a valid ESR_ELx.ISS.SRT when those accesses fault at Stage-2. Currently we only use register addressing without offsets. We use the "r" constraint to place the address into a register, and manually generate the register addressing by surrounding the resulting register operand with square braces, e.g. | static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr) | { | asm volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr)); | } Due to this, sequences of adjacent accesses need to generate addresses using separate instructions. For example, the following code: | void writeq_zero_8_times(void *ptr) | { | writeq_relaxed(0, ptr + 8 * 0); | writeq_relaxed(0, ptr + 8 * 1); | writeq_relaxed(0, ptr + 8 * 2); | writeq_relaxed(0, ptr + 8 * 3); | writeq_relaxed(0, ptr + 8 * 4); | writeq_relaxed(0, ptr + 8 * 5); | writeq_relaxed(0, ptr + 8 * 6); | writeq_relaxed(0, ptr + 8 * 7); | } ... is compiled to: | <writeq_zero_8_times>: | str xzr, [x0] | add x1, x0, #0x8 | str xzr, [x1] | add x1, x0, #0x10 | str xzr, [x1] | add x1, x0, #0x18 | str xzr, [x1] | add x1, x0, #0x20 | str xzr, [x1] | add x1, x0, #0x28 | str xzr, [x1] | add x1, x0, #0x30 | str xzr, [x1] | add x0, x0, #0x38 | str xzr, [x0] | ret As described above, we could safely use immediate offset addressing, which would allow the ADDs to be folded into the address generation for the STRs, resulting in simpler and smaller generated assembly. We can do this by using the "o" constraint to allow the compiler to generate offset addressing (without writeback) for a memory operand, e.g. | static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr) | { | volatile u64 __iomem *ptr = addr; | asm volatile("str %x0, %1" : : "rZ" (val), "o" (*ptr)); | } ... which results in the earlier code sequence being compiled to: | <writeq_zero_8_times>: | str xzr, [x0] | str xzr, [x0, #8] | str xzr, [x0, #16] | str xzr, [x0, #24] | str xzr, [x0, #32] | str xzr, [x0, #40] | str xzr, [x0, #48] | str xzr, [x0, #56] | ret As Will notes at: https://lore.kernel.org/linux-arm-kernel/20240117160528.GA3398@willie-the-truck/ ... some compilers struggle with a plain "o" constraint, so it's preferable to use "Qo", where the additional "Q" constraint permits using non-offset register addressing. This patch modifies our IO write accessors to use "Qo" constraints, resulting in the better code generation described above. The IO read accessors are left as-is because ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE requires that non-offset register addressing is used, as the LDAR instruction does not support offset addressing. When compiling v6.8-rc1 defconfig with GCC 13.2.0, this saves ~4KiB of text: | [mark@lakrids:~/src/linux]% ls -al vmlinux-* | -rwxr-xr-x 1 mark mark 153960576 Jan 23 12:01 vmlinux-after | -rwxr-xr-x 1 mark mark 153862192 Jan 23 11:57 vmlinux-before | | [mark@lakrids:~/src/linux]% size vmlinux-before vmlinux-after | text data bss dec hex filename | 26708921 16690350 622736 44022007 29fb8f7 vmlinux-before | 26704761 16690414 622736 44017911 29fa8f7 vmlinux-after ... though due to internal alignment of sections, this has no impact on the size of the resulting Image: | [mark@lakrids:~/src/linux]% ls -al Image-* | -rw-r--r-- 1 mark mark 43590144 Jan 23 12:01 Image-after | -rw-r--r-- 1 mark mark 43590144 Jan 23 11:57 Image-before Aside from the better code generation, there should be no functional change as a result of this patch. I have lightly tested this patch, including booting under KVM (where some devices such as PL011 are emulated). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240124111259.874975-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-20arm64: errata: Don't enable workarounds for "rare" errata by defaultWill Deacon1-14/+11
Arm classifies some of its CPU errata as "rare", indicating that the hardware error is unlikely to occur in practice. Given that the cost of errata workarounds can often be significant in terms of power and performance, don't enable workarounds for "rare" errata by default and update our documentation to reflect that. Cc: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20240209183916.25860-1-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-19arm64: Use Signed/Unsigned enums for TGRAN{4,16,64} and VARangeMarc Zyngier2-16/+7
Open-coding the feature matching parameters for LVA/LVA2 leads to issues with upcoming changes to the cpufeature code. By making TGRAN{4,16,64} and VARange signed/unsigned as per the architecture, we can use the existing macros, making the feature match robust against those changes. Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-19arm64: mm: Make PUD folding check in set_pud() a runtime checkArd Biesheuvel1-3/+3
When set_pud() is called on a 4-level paging build config that runs with 3 levels at runtime (which happens with 16k page size builds with support for LPA2), the updated entry is in fact a PGD in swapper_pg_dir[], and this is mapped read-only after boot. So in this case, the existing check needs to be performed as well, even though __PAGETABLE_PUD_FOLDED is not #define'd. So replace the #ifdef with a call to pgtable_l4_enabled(). Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240216235944.3677178-2-ardb+git@google.com Reviewed-by: Itaru Kitayama <itaru.kitayama@fujitsu.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16arm64: mm: add support for WXN memory translation attributeArd Biesheuvel7-2/+116
The AArch64 virtual memory system supports a global WXN control, which can be enabled to make all writable mappings implicitly no-exec. This is a useful hardening feature, as it prevents mistakes in managing page table permissions from being exploited to attack the system. When enabled at EL1, the restrictions apply to both EL1 and EL0. EL1 is completely under our control, and has been cleaned up to allow WXN to be enabled from boot onwards. EL0 is not under our control, but given that widely deployed security features such as selinux or PaX already limit the ability of user space to create mappings that are writable and executable at the same time, the impact of enabling this for EL0 is expected to be limited. (For this reason, common user space libraries that have a legitimate need for manipulating executable code already carry fallbacks such as [0].) If enabled at compile time, the feature can still be disabled at boot if needed, by passing arm64.nowxn on the kernel command line. [0] https://github.com/libffi/libffi/blob/master/src/closures.c#L440 Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240214122845.2033971-88-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16arm64: defconfig: Enable LPA2 supportArd Biesheuvel2-4/+1
We typically enable support in defconfig for all architectural features for which we can detect at runtime if the hardware actually supports them. Now that we have implemented support for LPA2 based 52-bit virtual addressing in a way that should not impact 48-bit operation on non-LPA2 CPU, we can do the same, and enable 52-bit virtual addressing by default. Catalin adds: Currently the "Virtual address space size" arch/arm64/Kconfig menu entry sets different defaults for each page size. However, all are overridden by the defconfig to 48 bits. Set the new default in Kconfig and remove the defconfig line. [ardb: squash follow-up fix from Catalin] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-86-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16arm64: Enable 52-bit virtual addressing for 4k and 16k granule configsArd Biesheuvel2-11/+28
Update Kconfig to permit 4k and 16k granule configurations to be built with 52-bit virtual addressing, now that all the prerequisites are in place. While at it, update the feature description so it matches on the appropriate feature bits depending on the page size. For simplicity, let's just keep ARM64_HAS_VA52 as the feature name. Note that LPA2 based 52-bit virtual addressing requires 52-bit physical addressing support to be enabled as well, as programming TCR.TxSZ to values below 16 is not allowed unless TCR.DS is set, which is what activates the 52-bit physical addressing support. While supporting the converse (52-bit physical addressing without 52-bit virtual addressing) would be possible in principle, let's keep things simple, by only allowing these features to be enabled at the same time. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-85-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16arm64: kvm: avoid CONFIG_PGTABLE_LEVELS for runtime levelsArd Biesheuvel1-1/+1
get_user_mapping_size() uses vabits_actual and CONFIG_PGTABLE_LEVELS to provide the starting point for a table walk. This is fine for LVA, as the number of translation levels is the same regardless of whether LVA is enabled. However, with LPA2, this will no longer be the case, so let's derive the number of levels from the number of VA bits directly. Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-84-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16arm64: ptdump: Deal with translation levels folded at runtimeArd Biesheuvel1-5/+12
Currently, the ptdump code deals with folded PMD or PUD levels at build time, by omitting those levels when invoking note_page. IOW, note_page() is never invoked with level == 1 if P4Ds are folded in the build configuration. With the introduction of LPA2 support, we will defer some of these folding decisions to runtime, so let's take care of this by overriding the 'level' argument when this condition triggers. Substituting the PUD or PMD strings for "PGD" when the level in question is folded at build time is no longer necessary, and so the conditional expressions can be simplified. This also makes the indirection of the 'name' field unnecessary, so change that into a char[] array, and make the whole thing __ro_after_init. Note that the mm_p?d_folded() functions currently ignore their mm pointer arguments, but let's wire them up correctly anyway. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-83-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16arm64: ptdump: Disregard unaddressable VA spaceArd Biesheuvel1-2/+2
Configurations built with support for 52-bit virtual addressing can also run on CPUs that only support 48 bits of VA space, in which case only that part of swapper_pg_dir that represents the 48-bit addressable region is relevant, and everything else is ignored by the hardware. Our software pagetable walker has little in the way of input address validation, and so it will happily start a walk from an address that is not representable by the number of paging levels that are actually active, resulting in lots of bogus output from the page table dumper unless we take care to start at a valid address. So define the start address at runtime based on vabits_actual. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-82-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16arm64: mm: Add support for folding PUDs at runtimeArd Biesheuvel6-13/+95
In order to support LPA2 on 16k pages in a way that permits non-LPA2 systems to run the same kernel image, we have to be able to fall back to at most 48 bits of virtual addressing. Falling back to 48 bits would result in a level 0 with only 2 entries, which is suboptimal in terms of TLB utilization. So instead, let's fall back to 47 bits in that case. This means we need to be able to fold PUDs dynamically, similar to how we fold P4Ds for 48 bit virtual addressing on LPA2 with 4k pages. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-81-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16arm64: kasan: Reduce minimum shadow alignment and enable 5 level pagingArd Biesheuvel2-20/+130
Allow the KASAN init code to deal with 5 levels of paging, and relax the requirement that the shadow region is aligned to the top level pgd_t size. This is necessary for LPA2 based 52-bit virtual addressing, where the KASAN shadow will never be aligned to the pgd_t size. Allowing this also enables the 16k/48-bit case for KASAN, which is a nice bonus. This involves some hackery to manipulate the root and next level page tables without having to distinguish all the various configurations, including 16k/48-bits (which has a two entry pgd_t level), and LPA2 configurations running with one translation level less on non-LPA2 hardware. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-80-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16arm64: mm: Add 5 level paging support to fixmap and swapper handlingArd Biesheuvel4-10/+85
Add support for using 5 levels of paging in the fixmap, as well as in the kernel page table handling code which uses fixmaps internally. This also handles the case where a 5 level build runs on hardware that only supports 4 levels of paging. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-79-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16arm64: Enable LPA2 at boot if supported by the systemArd Biesheuvel11-11/+124
Update the early kernel mapping code to take 52-bit virtual addressing into account based on the LPA2 feature. This is a bit more involved than LVA (which is supported with 64k pages only), given that some page table descriptor bits change meaning in this case. To keep the handling in asm to a minimum, the initial ID map is still created with 48-bit virtual addressing, which implies that the kernel image must be loaded into 48-bit addressable physical memory. This is currently required by the boot protocol, even though we happen to support placement outside of that for LVA/64k based configurations. Enabling LPA2 involves more than setting TCR.T1SZ to a lower value, there is also a DS bit in TCR that needs to be set, and which changes the meaning of bits [9:8] in all page table descriptors. Since we cannot enable DS and every live page table descriptor at the same time, let's pivot through another temporary mapping. This avoids the need to reintroduce manipulations of the page tables with the MMU and caches disabled. To permit the LPA2 feature to be overridden on the kernel command line, which may be necessary to work around silicon errata, or to deal with mismatched features on heterogeneous SoC designs, test for CPU feature overrides first, and only then enable LPA2. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-78-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16arm64: mm: add LPA2 and 5 level paging support to G-to-nG conversionArd Biesheuvel2-13/+66
Add support for 5 level paging in the G-to-nG routine that creates its own temporary page tables to traverse the swapper page tables. Also add support for running the 5 level configuration with the top level folded at runtime, to support CPUs that do not implement the LPA2 extension. While at it, wire up the level skipping logic so it will also trigger on 4 level configurations with LPA2 enabled at build time but not active at runtime, as we'll fall back to 3 level paging in that case. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-77-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16arm64: mm: Add definitions to support 5 levels of pagingArd Biesheuvel6-9/+188
Add the required types and descriptor accessors to support 5 levels of paging in the common code. This is one of the prerequisites for supporting 52-bit virtual addressing with 4k pages. Note that this does not cover the code that handles kernel mappings or the fixmap. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-76-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16arm64: mm: Add LPA2 support to phys<->pte conversion routinesArd Biesheuvel4-19/+20
In preparation for enabling LPA2 support, introduce the mask values for converting between physical addresses and their representations in a page table descriptor. While at it, move the pte_to_phys asm macro into its only user, so that we can freely modify it to use its input value register as a temp register. For LPA2, the PTE_ADDR_MASK contains two non-adjacent sequences of zero bits, which means it no longer fits into the immediate field of an ordinary ALU instruction. So let's redefine it to include the bits in between as well, and only use it when converting from physical address to PTE representation, where the distinction does not matter. Also update the name accordingly to emphasize this. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-75-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>