summaryrefslogtreecommitdiff
path: root/arch/arm64/mm/mmu.c
AgeCommit message (Collapse)AuthorFilesLines
2023-08-21arm64: convert various functions to use ptdescsVishal Moola (Oracle)1-3/+4
As part of the conversions to replace pgtable constructor/destructors with ptdesc equivalents, convert various page table functions to use ptdescs. Link: https://lkml.kernel.org/r/20230807230513.102486-19-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Guo Ren <guoren@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Matthew Wilcox <willy@infradead.org> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-15arm64: mm: fix VA-range sanity checkMark Rutland1-2/+2
Both create_mapping_noalloc() and update_mapping_prot() sanity-check their 'virt' parameter, but the check itself doesn't make much sense. The condition used today appears to be a historical accident. The sanity-check condition: if ((virt >= PAGE_END) && (virt < VMALLOC_START)) { [ ... warning here ... ] return; } ... can only be true for the KASAN shadow region or the module region, and there's no reason to exclude these specifically for creating and updateing mappings. When arm64 support was first upstreamed in commit: c1cc1552616d0f35 ("arm64: MMU initialisation") ... the condition was: if (virt < VMALLOC_START) { [ ... warning here ... ] return; } At the time, VMALLOC_START was the lowest kernel address, and this was checking whether 'virt' would be translated via TTBR1. Subsequently in commit: 14c127c957c1c607 ("arm64: mm: Flip kernel VA space") ... the condition was changed to: if ((virt >= VA_START) && (virt < VMALLOC_START)) { [ ... warning here ... ] return; } This appear to have been a thinko. The commit moved the linear map to the bottom of the kernel address space, with VMALLOC_START being at the halfway point. The old condition would warn for changes to the linear map below this, and at the time VA_START was the end of the linear map. Subsequently we cleaned up the naming of VA_START in commit: 77ad4ce69321abbe ("arm64: memory: rename VA_START to PAGE_END") ... keeping the erroneous condition as: if ((virt >= PAGE_END) && (virt < VMALLOC_START)) { [ ... warning here ... ] return; } Correct the condition to check against the start of the TTBR1 address space, which is currently PAGE_OFFSET. This simplifies the logic, and more clearly matches the "outside kernel range" message in the warning. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230615102628.1052103-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-06-14arm64: consolidate rox page protection logicRussell King1-2/+7
Consolidate the arm64 decision making for the page protections used for executable pages, used by both the trampoline code and the kernel text mapping code. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1q9T3v-00EDmW-BH@rmk-PC.armlinux.org.uk Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-04-20Merge branch 'for-next/mm' into for-next/coreWill Deacon1-194/+63
* for-next/mm: arm64: mm: always map fixmap at page granularity arm64: mm: move fixmap code to its own file arm64: add FIXADDR_TOT_{START,SIZE} Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"" arm: uaccess: Remove memcpy_page_flushcache() mm,kfence: decouple kfence from page granularity mapping judgement
2023-04-11arm64: kdump: do not map crashkernel region specificallyBaoquan He1-43/+0
After taking off the protection functions on crashkernel memory region, there's no need to map crashkernel region with page granularity during linear mapping. With this change, the system can make use of block or section mapping on linear region to largely improve perforcemence during system bootup and running. Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20230407011507.17572-3-bhe@redhat.com Signed-off-by: Will Deacon <will@kernel.org>
2023-04-11arm64: mm: move fixmap code to its own fileMark Rutland1-194/+3
Over time, arm64's mm/mmu.c has become increasingly large and painful to navigate. Move the fixmap code to its own file where it can be understood in isolation. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20230406152759.4164229-3-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-04-11arm64: add FIXADDR_TOT_{START,SIZE}Mark Rutland1-13/+13
Currently arm64's FIXADDR_{START,SIZE} definitions only cover the runtime fixmap slots (and not the boot-time fixmap slots), but the code for creating the fixmap assumes that these definitions cover the entire fixmap range. This means that the ptdump boundaries are reported in a misleading way, missing the VA region of the runtime slots. In theory this could also cause the fixmap creation to go wrong if the boot-time fixmap slots end up spilling into a separate PMD entry, though luckily this is not currently the case in any configuration. While it seems like we could extend FIXADDR_{START,SIZE} to cover the entire fixmap area, core code relies upon these *only* covering the runtime slots. For example, fix_to_virt() and virt_to_fix() try to reject manipulation of the boot-time slots based upon FIXADDR_{START,SIZE}, while __fix_to_virt() and __virt_to_fix() can handle any fixmap slot. This patch follows the lead of x86 in commit: 55f49fcb879fbeeb ("x86/mm: Fix overlap of i386 CPU_ENTRY_AREA with FIX_BTMAP") ... and add new FIXADDR_TOT_{START,SIZE} definitions which cover the entire fixmap area, using these for the fixmap creation and ptdump code. As the boot-time fixmap slots are now rejected by fix_to_virt(), the early_fixmap_init() code is changed to consistently use __fix_to_virt(), as it already does in a few cases. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20230406152759.4164229-2-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-03-27mm,kfence: decouple kfence from page granularity mapping judgementZhenhua Huang1-0/+61
Kfence only needs its pool to be mapped as page granularity, if it is inited early. Previous judgement was a bit over protected. From [1], Mark suggested to "just map the KFENCE region a page granularity". So I decouple it from judgement and do page granularity mapping for kfence pool only. Need to be noticed that late init of kfence pool still requires page granularity mapping. Page granularity mapping in theory cost more(2M per 1GB) memory on arm64 platform. Like what I've tested on QEMU(emulated 1GB RAM) with gki_defconfig, also turning off rodata protection: Before: [root@liebao ]# cat /proc/meminfo MemTotal: 999484 kB After: [root@liebao ]# cat /proc/meminfo MemTotal: 1001480 kB To implement this, also relocate the kfence pool allocation before the linear mapping setting up, arm64_kfence_alloc_pool is to allocate phys addr, __kfence_pool is to be set after linear mapping set up. LINK: [1] https://lore.kernel.org/linux-arm-kernel/Y+IsdrvDNILA59UN@FVFF77S0Q05N/ Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Marco Elver <elver@google.com> Link: https://lore.kernel.org/r/1679066974-690-1-git-send-email-quic_zhenhuah@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>
2023-02-22Merge tag 'arm64-upstream' of ↵Linus Torvalds1-2/+6
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit architectural register (ZT0, for the look-up table feature) that Linux needs to save/restore - Include TPIDR2 in the signal context and add the corresponding kselftests - Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG (ARM CMN) at probe time - Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64 - Permit EFI boot with MMU and caches on. Instead of cleaning the entire loaded kernel image to the PoC and disabling the MMU and caches before branching to the kernel bare metal entry point, leave the MMU and caches enabled and rely on EFI's cacheable 1:1 mapping of all of system RAM to populate the initial page tables - Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64 kernel (the arm32 kernel only defines the values) - Harden the arm64 shadow call stack pointer handling: stash the shadow stack pointer in the task struct on interrupt, load it directly from this structure - Signal handling cleanups to remove redundant validation of size information and avoid reading the same data from userspace twice - Refactor the hwcap macros to make use of the automatically generated ID registers. It should make new hwcaps writing less error prone - Further arm64 sysreg conversion and some fixes - arm64 kselftest fixes and improvements - Pointer authentication cleanups: don't sign leaf functions, unify asm-arch manipulation - Pseudo-NMI code generation optimisations - Minor fixes for SME and TPIDR2 handling - Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable, replace strtobool() to kstrtobool() in the cpufeature.c code, apply dynamic shadow call stack in two passes, intercept pfn changes in set_pte_at() without the required break-before-make sequence, attempt to dump all instructions on unhandled kernel faults * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (130 commits) arm64: fix .idmap.text assertion for large kernels kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests kselftest/arm64: Copy whole EXTRA context arm64: kprobes: Drop ID map text from kprobes blacklist perf: arm_spe: Print the version of SPE detected perf: arm_spe: Add support for SPEv1.2 inverted event filtering perf: Add perf_event_attr::config3 arm64/sme: Fix __finalise_el2 SMEver check drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable arm64/signal: Only read new data when parsing the ZT context arm64/signal: Only read new data when parsing the ZA context arm64/signal: Only read new data when parsing the SVE context arm64/signal: Avoid rereading context frame sizes arm64/signal: Make interface for restore_fpsimd_context() consistent arm64/signal: Remove redundant size validation from parse_user_sigframe() arm64/signal: Don't redundantly verify FPSIMD magic arm64/cpufeature: Use helper macros to specify hwcaps arm64/cpufeature: Always use symbolic name for feature value in hwcaps arm64/sysreg: Initial unsigned annotations for ID registers arm64/sysreg: Initial annotation of signed ID registers ...
2023-01-31arm64/mm: Intercept pfn changes in set_pte_at()Anshuman Khandual1-2/+6
Changing pfn on a user page table mapped entry, without first going through break-before-make (BBM) procedure is unsafe. This just updates set_pte_at() to intercept such changes, via an updated pgattr_change_is_safe(). This new check happens via __check_racy_pte_update(), which has now been renamed as __check_safe_pte_update(). Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20230130121457.1607675-1-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-01-06arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruptionAnshuman Khandual1-0/+21
If a Cortex-A715 cpu sees a page mapping permissions change from executable to non-executable, it may corrupt the ESR_ELx and FAR_ELx registers, on the next instruction abort caused by permission fault. Only user-space does executable to non-executable permission transition via mprotect() system call which calls ptep_modify_prot_start() and ptep_modify _prot_commit() helpers, while changing the page mapping. The platform code can override these helpers via __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION. Work around the problem via doing a break-before-make TLB invalidation, for all executable user space mappings, that go through mprotect() system call. This overrides ptep_modify_prot_start() and ptep_modify_prot_commit(), via defining HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION on the platform thus giving an opportunity to intercept user space exec mappings, and do the necessary TLB invalidation. Similar interceptions are also implemented for HugeTLB. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20230102061651.34745-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-12-16Merge tag 'arm64-fixes' of ↵Linus Torvalds1-21/+0
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Fix Kconfig dependencies to re-allow the enabling of function graph tracer and shadow call stacks at the same time. - Revert the workaround for CPU erratum #2645198 since the CONFIG_ guards were incorrect and the code has therefore not seen any real exposure in -next. * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: Revert "arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption" ftrace: Allow WITH_ARGS flavour of graph tracer with shadow call stack
2022-12-15Revert "arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption"Will Deacon1-21/+0
This reverts commit 44ecda71fd8a70185c270f5914ac563827fe1d4c. All versions of this patch on the mailing list, including the version that ended up getting merged, have portions of code guarded by the non-existent CONFIG_ARM64_WORKAROUND_2645198 option. Although Anshuman says he tested the code with some additional debug changes [1], I'm hesitant to fix the CONFIG option and light up a bunch of code right before I (and others) disappear for the end of year holidays, during which time we won't be around to deal with any fallout. So revert the change for now. We can bring back a fixed, tested version for a later -rc when folks are thinking about things other than trees and turkeys. [1] https://lore.kernel.org/r/b6f61241-e436-5db1-1053-3b441080b8d6@arm.com Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/r/20221215094811.23188-1-lukas.bulwahn@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2022-12-14Merge tag 'mm-stable-2022-12-13' of ↵Linus Torvalds1-87/+15
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - More userfaultfs work from Peter Xu - Several convert-to-folios series from Sidhartha Kumar and Huang Ying - Some filemap cleanups from Vishal Moola - David Hildenbrand added the ability to selftest anon memory COW handling - Some cpuset simplifications from Liu Shixin - Addition of vmalloc tracing support by Uladzislau Rezki - Some pagecache folioifications and simplifications from Matthew Wilcox - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it - Miguel Ojeda contributed some cleanups for our use of the __no_sanitize_thread__ gcc keyword. This series should have been in the non-MM tree, my bad - Naoya Horiguchi improved the interaction between memory poisoning and memory section removal for huge pages - DAMON cleanups and tuneups from SeongJae Park - Tony Luck fixed the handling of COW faults against poisoned pages - Peter Xu utilized the PTE marker code for handling swapin errors - Hugh Dickins reworked compound page mapcount handling, simplifying it and making it more efficient - Removal of the autonuma savedwrite infrastructure from Nadav Amit and David Hildenbrand - zram support for multiple compression streams from Sergey Senozhatsky - David Hildenbrand reworked the GUP code's R/O long-term pinning so that drivers no longer need to use the FOLL_FORCE workaround which didn't work very well anyway - Mel Gorman altered the page allocator so that local IRQs can remnain enabled during per-cpu page allocations - Vishal Moola removed the try_to_release_page() wrapper - Stefan Roesch added some per-BDI sysfs tunables which are used to prevent network block devices from dirtying excessive amounts of pagecache - David Hildenbrand did some cleanup and repair work on KSM COW breaking - Nhat Pham and Johannes Weiner have implemented writeback in zswap's zsmalloc backend - Brian Foster has fixed a longstanding corner-case oddity in file[map]_write_and_wait_range() - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang Chen - Shiyang Ruan has done some work on fsdax, to make its reflink mode work better under xfstests. Better, but still not perfect - Christoph Hellwig has removed the .writepage() method from several filesystems. They only need .writepages() - Yosry Ahmed wrote a series which fixes the memcg reclaim target beancounting - David Hildenbrand has fixed some of our MM selftests for 32-bit machines - Many singleton patches, as usual * tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits) mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio mm: mmu_gather: allow more than one batch of delayed rmaps mm: fix typo in struct pglist_data code comment kmsan: fix memcpy tests mm: add cond_resched() in swapin_walk_pmd_entry() mm: do not show fs mm pc for VM_LOCKONFAULT pages selftests/vm: ksm_functional_tests: fixes for 32bit selftests/vm: cow: fix compile warning on 32bit selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem mm,thp,rmap: fix races between updates of subpages_mapcount mm: memcg: fix swapcached stat accounting mm: add nodes= arg to memory.reclaim mm: disable top-tier fallback to reclaim on proactive reclaim selftests: cgroup: make sure reclaim target memcg is unprotected selftests: cgroup: refactor proactive reclaim code to reclaim_until() mm: memcg: fix stale protection of reclaim target memcg mm/mmap: properly unaccount memory on mas_preallocate() failure omfs: remove ->writepage jfs: remove ->writepage ...
2022-12-12mm/sparse-vmemmap: generalise vmemmap_populate_hugepages()Feiyang Chen1-40/+15
Generalise vmemmap_populate_hugepages() so ARM64 & X86 & LoongArch can share its implementation. Link: https://lkml.kernel.org/r/20221027125253.3458989-4-chenhuacai@loongson.cn Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Acked-by: Will Deacon <will@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Min Zhou <zhoumin@loongson.cn> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Philippe Mathieu-Daudé <philmd@linaro.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Xuefeng Li <lixuefeng@loongson.cn> Cc: Xuerui Wang <kernel@xen0n.name> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-06Merge branch 'for-next/trivial' into for-next/coreWill Deacon1-1/+1
* for-next/trivial: arm64: alternatives: add __init/__initconst to some functions/variables arm64/asm: Remove unused assembler DAIF save/restore macros arm64/kpti: Move DAIF masking to C code Revert "arm64/mm: Drop redundant BUG_ON(!pgtable_alloc)" arm64/mm: Drop unused restore_ttbr1 arm64: alternatives: make apply_alternatives_vdso() static arm64/mm: Drop idmap_pg_end[] declaration arm64/mm: Drop redundant BUG_ON(!pgtable_alloc) arm64: make is_ttbrX_addr() noinstr-safe arm64/signal: Document our convention for choosing magic numbers arm64: atomics: lse: remove stale dependency on JUMP_LABEL arm64: paravirt: remove conduit check in has_pv_steal_clock arm64: entry: Fix typo arm64/booting: Add missing colon to FA64 entry arm64/mm: Drop ARM64_KERNEL_USES_PMD_MAPS arm64/asm: Remove unused enable_da macro
2022-11-21Revert "arm64/mm: Drop redundant BUG_ON(!pgtable_alloc)"Will Deacon1-1/+3
This reverts commit 9ed2b4616d4e846ece2a04cb5007ce1d1bd9e3f3. Nathan reports early boot failures bisected to this change which look related to the kPTI nG repainting. In any case, consolidating the BUG_ON()s to a single location needs more thought, so revert the change until this is figured out properly. Link: https://lore.kernel.org/r/Y3pS5fdZ3MdLZ00t@dev-arch.thelio-3990X Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruptionAnshuman Khandual1-0/+21
If a Cortex-A715 cpu sees a page mapping permissions change from executable to non-executable, it may corrupt the ESR_ELx and FAR_ELx registers, on the next instruction abort caused by permission fault. Only user-space does executable to non-executable permission transition via mprotect() system call which calls ptep_modify_prot_start() and ptep_modify _prot_commit() helpers, while changing the page mapping. The platform code can override these helpers via __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION. Work around the problem via doing a break-before-make TLB invalidation, for all executable user space mappings, that go through mprotect() system call. This overrides ptep_modify_prot_start() and ptep_modify_prot_commit(), via defining HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION on the platform thus giving an opportunity to intercept user space exec mappings, and do the necessary TLB invalidation. Similar interceptions are also implemented for HugeTLB. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20221116140915.356601-3-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-18arm64/mm: Drop redundant BUG_ON(!pgtable_alloc)Anshuman Khandual1-3/+1
__create_pgd_mapping_locked() expects a page allocator used while mapping a virtual range. This page allocator function propagates down the call chain, while building intermediate levels in the page table. Passed page allocator is a necessary ingredient required to build the page table but its presence can be asserted just once in the very beginning rather than in all the down stream functions. This consolidates BUG_ON(!pgtable_alloc) checks just in a single place i.e __create_pgd_mapping_locked(). Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20221118053102.500216-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-11-09mm: remove kern_addr_valid() completelyKefeng Wang1-47/+0
Most architectures (except arm64/x86/sparc) simply return 1 for kern_addr_valid(), which is only used in read_kcore(), and it calls copy_from_kernel_nofault() which could check whether the address is a valid kernel address. So as there is no need for kern_addr_valid(), let's remove it. Link: https://lkml.kernel.org/r/20221018074014.185687-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390] Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Helge Deller <deller@gmx.de> [parisc] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Guo Ren <guoren@kernel.org> [csky] Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: <aou@eecs.berkeley.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Jonas Bonn <jonas@southpole.se> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Xuerui Wang <kernel@xen0n.name> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08arm64/mm: Drop ARM64_KERNEL_USES_PMD_MAPSAnshuman Khandual1-1/+1
Currently ARM64_KERNEL_USES_PMD_MAPS is an unnecessary abstraction. Kernel mapping at PMD (aka huge page aka block) level, is only applicable with 4K base page, which makes it 2MB aligned, a necessary requirement for linear mapping and physical memory start address. This can be easily achieved by directly checking against base page size itself. This drops off the macro ARM64_KERNE_USES_PMD_MAPS which is redundant. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20221108034406.2950071-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-10-06Merge tag 'arm64-upstream' of ↵Linus Torvalds1-16/+7
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - arm64 perf: DDR PMU driver for Alibaba's T-Head Yitian 710 SoC, SVE vector granule register added to the user regs together with SVE perf extensions documentation. - SVE updates: add HWCAP for SVE EBF16, update the SVE ABI documentation to match the actual kernel behaviour (zeroing the registers on syscall rather than "zeroed or preserved" previously). - More conversions to automatic system registers generation. - vDSO: use self-synchronising virtual counter access in gettimeofday() if the architecture supports it. - arm64 stacktrace cleanups and improvements. - arm64 atomics improvements: always inline assembly, remove LL/SC trampolines. - Improve the reporting of EL1 exceptions: rework BTI and FPAC exception handling, better EL1 undefs reporting. - Cortex-A510 erratum 2658417: remove BF16 support due to incorrect result. - arm64 defconfig updates: build CoreSight as a module, enable options necessary for docker, memory hotplug/hotremove, enable all PMUs provided by Arm. - arm64 ptrace() support for TPIDR2_EL0 (register provided with the SME extensions). - arm64 ftraces updates/fixes: fix module PLTs with mcount, remove unused function. - kselftest updates for arm64: simple HWCAP validation, FP stress test improvements, validation of ZA regs in signal handlers, include larger SVE and SME vector lengths in signal tests, various cleanups. - arm64 alternatives (code patching) improvements to robustness and consistency: replace cpucap static branches with equivalent alternatives, associate callback alternatives with a cpucap. - Miscellaneous updates: optimise kprobe performance of patching single-step slots, simplify uaccess_mask_ptr(), move MTE registers initialisation to C, support huge vmalloc() mappings, run softirqs on the per-CPU IRQ stack, compat (arm32) misalignment fixups for multiword accesses. * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (126 commits) arm64: alternatives: Use vdso/bits.h instead of linux/bits.h arm64/kprobe: Optimize the performance of patching single-step slot arm64: defconfig: Add Coresight as module kselftest/arm64: Handle EINTR while reading data from children kselftest/arm64: Flag fp-stress as exiting when we begin finishing up kselftest/arm64: Don't repeat termination handler for fp-stress ARM64: reloc_test: add __init/__exit annotations to module init/exit funcs arm64/mm: fold check for KFENCE into can_set_direct_map() arm64: ftrace: fix module PLTs with mcount arm64: module: Remove unused plt_entry_is_initialized() arm64: module: Make plt_equals_entry() static arm64: fix the build with binutils 2.27 kselftest/arm64: Don't enable v8.5 for MTE selftest builds arm64: uaccess: simplify uaccess_mask_ptr() arm64: asm/perf_regs.h: Avoid C++-style comment in UAPI header kselftest/arm64: Fix typo in hwcap check arm64: mte: move register initialization to C arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate() arm64: dma: Drop cache invalidation from arch_dma_prep_coherent() arm64/sve: Add Perf extensions documentation ...
2022-09-30Merge branch 'for-next/misc' into for-next/coreCatalin Marinas1-15/+6
* for-next/misc: : Miscellaneous patches arm64/kprobe: Optimize the performance of patching single-step slot ARM64: reloc_test: add __init/__exit annotations to module init/exit funcs arm64/mm: fold check for KFENCE into can_set_direct_map() arm64: uaccess: simplify uaccess_mask_ptr() arm64: mte: move register initialization to C arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate() arm64: dma: Drop cache invalidation from arch_dma_prep_coherent() arm64: support huge vmalloc mappings arm64: spectre: increase parameters that can be used to turn off bhb mitigation individually arm64: run softirqs on the per-CPU IRQ stack arm64: compat: Implement misalignment fixups for multiword loads
2022-09-29arm64/mm: fold check for KFENCE into can_set_direct_map()Mike Rapoport1-6/+2
KFENCE requires linear map to be mapped at page granularity, so that it is possible to protect/unprotect single pages, just like with rodata_full and DEBUG_PAGEALLOC. Instead of repating can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE) make can_set_direct_map() handle the KFENCE case. This also prevents potential false positives in kernel_page_present() that may return true for non-present page if CONFIG_KFENCE is enabled. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20220921074841.382615-1-rppt@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-22arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()Kefeng Wang1-9/+4
Directly check ARM64_SWAPPER_USES_SECTION_MAPS to choose base page or PMD level huge page mapping in vmemmap_populate() to simplify code a bit. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Link: https://lore.kernel.org/r/20220920014951.196191-1-wangkefeng.wang@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-09-22arm64: mm: don't acquire mutex when rewriting swapperMark Rutland1-14/+18
Since commit: 47546a1912fc4a03 ("arm64: mm: install KPTI nG mappings with MMU enabled)" ... when building with CONFIG_DEBUG_ATOMIC_SLEEP=y and booting under QEMU TCG with '-cpu max', there's a boot-time splat: | BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580 | in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 15, name: migration/0 | preempt_count: 1, expected: 0 | RCU nest depth: 0, expected: 0 | no locks held by migration/0/15. | irq event stamp: 28 | hardirqs last enabled at (27): [<ffff8000091ed180>] _raw_spin_unlock_irq+0x3c/0x7c | hardirqs last disabled at (28): [<ffff8000081b8d74>] multi_cpu_stop+0x150/0x18c | softirqs last enabled at (0): [<ffff80000809a314>] copy_process+0x594/0x1964 | softirqs last disabled at (0): [<0000000000000000>] 0x0 | CPU: 0 PID: 15 Comm: migration/0 Not tainted 6.0.0-rc3-00002-g419b42ff7eef #3 | Hardware name: linux,dummy-virt (DT) | Stopper: multi_cpu_stop+0x0/0x18c <- stop_cpus.constprop.0+0xa0/0xfc | Call trace: | dump_backtrace.part.0+0xd0/0xe0 | show_stack+0x1c/0x5c | dump_stack_lvl+0x88/0xb4 | dump_stack+0x1c/0x38 | __might_resched+0x180/0x230 | __might_sleep+0x4c/0xa0 | __mutex_lock+0x5c/0x450 | mutex_lock_nested+0x30/0x40 | create_kpti_ng_temp_pgd+0x4fc/0x6d0 | kpti_install_ng_mappings+0x2b8/0x3b0 | cpu_enable_non_boot_scope_capabilities+0x7c/0xd0 | multi_cpu_stop+0xa0/0x18c | cpu_stopper_thread+0x88/0x11c | smpboot_thread_fn+0x1ec/0x290 | kthread+0x118/0x120 | ret_from_fork+0x10/0x20 Since commit: ee017ee353506fce ("arm64/mm: avoid fixmap race condition when create pud mapping") ... once the kernel leave the SYSTEM_BOOTING state, the fixmap pagetable entries are protected by the fixmap_lock mutex. The new KPTI rewrite code uses __create_pgd_mapping() to create a temporary pagetable. This happens in atomic context, after secondary CPUs are brought up and the kernel has left the SYSTEM_BOOTING state. Hence we try to acquire a mutex in atomic context, which is generally unsound (though benign in this case as the mutex should be free and all other CPUs are quiescent). This patch avoids the issue by pulling the mutex out of alloc_init_pud() and calling it at a higher level in the pagetable manipulation code. This allows it to be used without locking where one CPU is known to be in exclusive control of the machine, even after having left the SYSTEM_BOOTING state. Fixes: 47546a1912fc ("arm64: mm: install KPTI nG mappings with MMU enabled") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220920134731.1625740-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-09-09arm64/sysreg: Add _EL1 into ID_AA64PFR1_EL1 constant namesMark Brown1-1/+1
Our standard is to include the _EL1 in the constant names for registers but we did not do that for ID_AA64PFR1_EL1, update to do so in preparation for conversion to automatic generation. No functional change. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Kristina Martsenko <kristina.martsenko@arm.com> Link: https://lore.kernel.org/r/20220905225425.1871461-8-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-08-23arm64: fix rodata=fullMark Rutland1-18/+0
On arm64, "rodata=full" has been suppored (but not documented) since commit: c55191e96caa9d78 ("arm64: mm: apply r/o permissions of VM areas to its linear alias as well") As it's necessary to determine the rodata configuration early during boot, arm64 has an early_param() handler for this, whereas init/main.c has a __setup() handler which is run later. Unfortunately, this split meant that since commit: f9a40b0890658330 ("init/main.c: return 1 from handled __setup() functions") ... passing "rodata=full" would result in a spurious warning from the __setup() handler (though RO permissions would be configured appropriately). Further, "rodata=full" has been broken since commit: 0d6ea3ac94ca77c5 ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()") ... which caused strtobool() to parse "full" as false (in addition to many other values not documented for the "rodata=" kernel parameter. This patch fixes this breakage by: * Moving the core parameter parser to an __early_param(), such that it is available early. * Adding an (optional) arch hook which arm64 can use to parse "full". * Updating the documentation to mention that "full" is valid for arm64. * Having the core parameter parser handle "on" and "off" explicitly, such that any undocumented values (e.g. typos such as "ful") are reported as errors rather than being silently accepted. Note that __setup() and early_param() have opposite conventions for their return values, where __setup() uses 1 to indicate a parameter was handled and early_param() uses 0 to indicate a parameter was handled. Fixes: f9a40b089065 ("init/main.c: return 1 from handled __setup() functions") Fixes: 0d6ea3ac94ca ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jagdish Gediya <jvgediya@linux.ibm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220817154022.3974645-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-07-25Merge branch 'for-next/boot' into for-next/coreWill Deacon1-4/+51
* for-next/boot: (34 commits) arm64: fix KASAN_INLINE arm64: Add an override for ID_AA64SMFR0_EL1.FA64 arm64: Add the arm64.nosve command line option arm64: Add the arm64.nosme command line option arm64: Expose a __check_override primitive for oddball features arm64: Allow the idreg override to deal with variable field width arm64: Factor out checking of a feature against the override into a macro arm64: Allow sticky E2H when entering EL1 arm64: Save state of HCR_EL2.E2H before switch to EL1 arm64: Rename the VHE switch to "finalise_el2" arm64: mm: fix booting with 52-bit address space arm64: head: remove __PHYS_OFFSET arm64: lds: use PROVIDE instead of conditional definitions arm64: setup: drop early FDT pointer helpers arm64: head: avoid relocating the kernel twice for KASLR arm64: kaslr: defer initialization to initcall where permitted arm64: head: record CPU boot mode after enabling the MMU arm64: head: populate kernel page tables with MMU and caches on arm64: head: factor out TTBR1 assignment into a macro arm64: idreg-override: use early FDT mapping in ID map ...
2022-07-25Merge branch 'for-next/misc' into for-next/coreWill Deacon1-4/+2
* for-next/misc: arm64/mm: use GENMASK_ULL for TTBR_BADDR_MASK_52 arm64: numa: Don't check node against MAX_NUMNODES arm64: mm: Remove assembly DMA cache maintenance wrappers arm64/mm: Define defer_reserve_crashkernel() arm64: fix oops in concurrently setting insn_emulation sysctls arm64: Do not forget syscall when starting a new thread. arm64: boot: add zstd support
2022-07-05arm64/mm: Define defer_reserve_crashkernel()Anshuman Khandual1-4/+2
Crash kernel memory reservation gets deferred, when either CONFIG_ZONE_DMA or CONFIG_ZONE_DMA32 config is enabled on the platform. This deferral also impacts overall linear mapping creation including the crash kernel itself. Just encapsulate this deferral check in a new helper for better clarity. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20220705062556.1845734-1-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24arm64: head: record CPU boot mode after enabling the MMUArd Biesheuvel1-0/+8
In order to avoid having to touch memory with the MMU and caches disabled, and therefore having to invalidate it from the caches explicitly, just defer storing the value until after the MMU has been turned on, unless we are giving up with an error. While at it, move the associated variable definitions into C code. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220624150651.1358849-19-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24arm64: head: cover entire kernel image in initial ID mapArd Biesheuvel1-1/+34
As a first step towards avoiding the need to create, tear down and recreate the kernel virtual mapping with MMU and caches disabled, start by expanding the ID map so it covers the page tables as well as all executable code. This will allow us to populate the page tables with the MMU and caches on, and call KASLR init code before setting up the virtual mapping. Since this ID map is only needed at boot, create it as a temporary set of page tables, and populate the permanent ID map after enabling the MMU and caches. While at it, switch to read-only attributes for the where possible, as writable permissions are only needed for the initial kernel page tables. Note that on 4k granule configurations, the permanent ID map will now be reduced to a single page rather than a 2M block mapping. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220624150651.1358849-13-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24arm64: mm: provide idmap pointer to cpu_replace_ttbr1()Ard Biesheuvel1-1/+1
In preparation for changing the way we initialize the permanent ID map, update cpu_replace_ttbr1() so we can use it with the initial ID map as well. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220624150651.1358849-11-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24arm64: head: drop idmap_ptrs_per_pgdArd Biesheuvel1-1/+0
The assignment of idmap_ptrs_per_pgd lacks any cache invalidation, even though it is updated with the MMU and caches disabled. However, we never bother to read the value again except in the very next instruction, and so we can just drop the variable entirely. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20220624150651.1358849-5-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24arm64: head: move assignment of idmap_t0sz to C codeArd Biesheuvel1-1/+3
Setting idmap_t0sz involves fiddling with the caches if done with the MMU off. Since we will be creating an initial ID map with the MMU and caches off, and the permanent ID map with the MMU and caches on, let's move this assignment of idmap_t0sz out of the startup code, and replace it with a macro that simply issues the three instructions needed to calculate the value wherever it is needed before the MMU is turned on. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220624150651.1358849-4-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24arm64: mm: make vabits_actual a build time constant if possibleArd Biesheuvel1-1/+3
Currently, we only support 52-bit virtual addressing on 64k pages configurations, and in all other cases, vabits_actual is guaranteed to equal VA_BITS (== VA_BITS_MIN). So get rid of the variable entirely in that case. While at it, move the assignment out of the asm entry code - it has no need to be there. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220624150651.1358849-3-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24arm64: head: move kimage_vaddr variable into C fileArd Biesheuvel1-0/+3
This variable definition does not need to be in head.S so move it out. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20220624150651.1358849-2-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-24arm64: entry: simplify trampoline data pageArd Biesheuvel1-7/+3
Get rid of some clunky open coded arithmetic on section addresses, by emitting the trampoline data variables into a separate, dedicated r/o data section, and putting it at the next page boundary. This way, we can access the literals via single LDR instruction. While at it, get rid of other, implicit literals, and use ADRP/ADD or MOVZ/MOVK sequences, as appropriate. Note that the latter are only supported for CONFIG_RELOCATABLE=n (which is usually the case if CONFIG_RANDOMIZE_BASE=n), so update the CPP conditionals to reflect this. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220622161010.3845775-1-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-06-23arm64: mm: install KPTI nG mappings with MMU enabledArd Biesheuvel1-0/+7
In cases where we unmap the kernel while running in user space, we rely on ASIDs to distinguish the minimal trampoline from the full kernel mapping, and this means we must use non-global attributes for those mappings, to ensure they are scoped by ASID and will not hit in the TLB inadvertently. We only do this when needed, as this is generally more costly in terms of TLB pressure, and so we boot without these non-global attributes, and apply them to all existing kernel mappings once all CPUs are up and we know whether or not the non-global attributes are needed. At this point, we cannot simply unmap and remap the entire address space, so we have to update all existing block and page descriptors in place. Currently, we go through a lot of trouble to perform these updates with the MMU and caches off, to avoid violating break before make (BBM) rules imposed by the architecture. Since we make changes to page tables that are not covered by the ID map, we gain access to those descriptors by disabling translations altogether. This means that the stores to memory are issued with device attributes, and require extra care in terms of coherency, which is costly. We also rely on the ID map to access a shared flag, which requires the ID map to be executable and writable at the same time, which is another thing we'd prefer to avoid. So let's switch to an approach where we replace the kernel mapping with a minimal mapping of a few pages that can be used for a minimal, ad-hoc fixmap that we can use to map each page table in turn as we traverse the hierarchy. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20220609174320.4035379-3-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2022-03-23Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds1-0/+1
Pull folio updates from Matthew Wilcox: - Rewrite how munlock works to massively reduce the contention on i_mmap_rwsem (Hugh Dickins): https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/ - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig): https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/ - Convert GUP to use folios and make pincount available for order-1 pages. (Matthew Wilcox) - Convert a few more truncation functions to use folios (Matthew Wilcox) - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox) - Convert rmap_walk to use folios (Matthew Wilcox) - Convert most of shrink_page_list() to use a folio (Matthew Wilcox) - Add support for creating large folios in readahead (Matthew Wilcox) * tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits) mm/damon: minor cleanup for damon_pa_young selftests/vm/transhuge-stress: Support file-backed PMD folios mm/filemap: Support VM_HUGEPAGE for file mappings mm/readahead: Switch to page_cache_ra_order mm/readahead: Align file mappings for non-DAX mm/readahead: Add large folio readahead mm: Support arbitrary THP sizes mm: Make large folios depend on THP mm: Fix READ_ONLY_THP warning mm/filemap: Allow large folios to be added to the page cache mm: Turn can_split_huge_page() into can_split_folio() mm/vmscan: Convert pageout() to take a folio mm/vmscan: Turn page_check_references() into folio_check_references() mm/vmscan: Account large folios correctly mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios mm/vmscan: Free non-shmem folios without splitting them mm/rmap: Constify the rmap_walk_control argument mm/rmap: Convert rmap_walk() to take a folio mm: Turn page_anon_vma() into folio_anon_vma() mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read() ...
2022-03-14Merge branch 'for-next/spectre-bhb' into for-next/coreWill Deacon1-3/+9
Merge in the latest Spectre mess to fix up conflicts with what was already queued for 5.18 when the embargo finally lifted. * for-next/spectre-bhb: (21 commits) arm64: Do not include __READ_ONCE() block in assembly files arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting arm64: Use the clearbhb instruction in mitigations KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated arm64: Mitigate spectre style branch history side channels arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2 arm64: Add percpu vectors for EL1 arm64: entry: Add macro for reading symbol addresses from the trampoline arm64: entry: Add vectors that have the bhb mitigation sequences arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations arm64: entry: Allow the trampoline text to occupy multiple pages arm64: entry: Make the kpti trampoline's kpti sequence optional arm64: entry: Move trampoline macros out of ifdef'd section arm64: entry: Don't assume tramp_vectors is the start of the vectors arm64: entry: Allow tramp_alias to access symbols after the 4K boundary arm64: entry: Move the trampoline data page before the text page arm64: entry: Free up another register on kpti's tramp_exit path arm64: entry: Make the trampoline cleanup optional KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit ...
2022-03-14Merge branch 'for-next/mm' into for-next/coreWill Deacon1-13/+11
* for-next/mm: Documentation: vmcoreinfo: Fix htmldocs warning arm64/mm: Drop use_1G_block() arm64: avoid flushing icache multiple times on contiguous HugeTLB arm64: crash_core: Export MODULES, VMALLOC, and VMEMMAP ranges arm64/hugetlb: Define __hugetlb_valid_size() arm64/mm: avoid fixmap race condition when create pud mapping arm64/mm: Consolidate TCR_EL1 fields
2022-03-08arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zonesVijay Balakrishna1-1/+31
The following patches resulted in deferring crash kernel reservation to mem_init(), mainly aimed at platforms with DMA memory zones (no IOMMU), in particular Raspberry Pi 4. commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32") commit 8424ecdde7df ("arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges") commit 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into mem_init()") commit 2687275a5843 ("arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required") Above changes introduced boot slowdown due to linear map creation for all the memory banks with NO_BLOCK_MAPPINGS, see discussion[1]. The proposed changes restore crash kernel reservation to earlier behavior thus avoids slow boot, particularly for platforms with IOMMU (no DMA memory zones). Tested changes to confirm no ~150ms boot slowdown on our SoC with IOMMU and 8GB memory. Also tested with ZONE_DMA and/or ZONE_DMA32 configs to confirm no regression to deferring scheme of crash kernel memory reservation. In both cases successfully collected kernel crash dump. [1] https://lore.kernel.org/all/9436d033-579b-55fa-9b00-6f4b661c2dd7@linux.microsoft.com/ Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com> Cc: stable@vger.kernel.org Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/r/1646242689-20744-1-git-send-email-vijayb@linux.microsoft.com [will: Add #ifdef CONFIG_KEXEC_CORE guards to fix 'crashk_res' references in allnoconfig build] Signed-off-by: Will Deacon <will@kernel.org>
2022-03-08arm64/mm: Drop use_1G_block()Anshuman Khandual1-13/+2
pud_sect_supported() already checks for PUD level block mapping support i.e on ARM64_4K_PAGES config. Hence pud_sect_supported(), along with some other required alignment checks can help completely drop use_1G_block(). Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/1644988012-25455-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2022-03-03mm: don't include <linux/memremap.h> in <linux/mm.h>Christoph Hellwig1-0/+1
Move the check for the actual pgmap types that need the free at refcount one behavior into the out of line helper, and thus avoid the need to pull memremap.h into mm.h. Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-02-15arm64: entry: Allow the trampoline text to occupy multiple pagesJames Morse1-3/+9
Adding a second set of vectors to .entry.tramp.text will make it larger than a single 4K page. Allow the trampoline text to occupy up to three pages by adding two more fixmap slots. Previous changes to tramp_valias allowed it to reach beyond a single page. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
2022-02-15arm64/mm: avoid fixmap race condition when create pud mappingJianyong Wu1-0/+9
The 'fixmap' is a global resource and is used recursively by create pud mapping(), leading to a potential race condition in the presence of a concurrent call to alloc_init_pud(): kernel_init thread virtio-mem workqueue thread ================== =========================== alloc_init_pud(...) alloc_init_pud(...) pudp = pud_set_fixmap_offset(...) pudp = pud_set_fixmap_offset(...) READ_ONCE(*pudp) pud_clear_fixmap(...) READ_ONCE(*pudp) // CRASH! As kernel may sleep during creating pud mapping, introduce a mutex lock to serialise use of the fixmap entries by alloc_init_pud(). However, there is no need for locking in early boot stage and it doesn't work well with KASLR enabled when early boot. So, enable lock when system_state doesn't equal to "SYSTEM_BOOTING". Signed-off-by: Jianyong Wu <jianyong.wu@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Fixes: f4710445458c ("arm64: mm: use fixmap when creating page tables") Link: https://lore.kernel.org/r/20220201114400.56885-1-jianyong.wu@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2021-11-10Merge tag 'arm64-fixes' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Fix double-evaluation of 'pte' macro argument when using 52-bit PAs - Fix signedness of some MTE prctl PR_* constants - Fix kmemleak memory usage by skipping early pgtable allocations - Fix printing of CPU feature register strings - Remove redundant -nostdlib linker flag for vDSO binaries * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: pgtable: make __pte_to_phys/__phys_to_pte_val inline functions arm64: Track no early_pgtable_alloc() for kmemleak arm64: mte: change PR_MTE_TCF_NONE back into an unsigned long arm64: vdso: remove -nostdlib compiler flag arm64: arm64_ftr_reg->name may not be a human-readable string
2021-11-08arm64: Track no early_pgtable_alloc() for kmemleakQian Cai1-1/+2
After switched page size from 64KB to 4KB on several arm64 servers here, kmemleak starts to run out of early memory pool due to a huge number of those early_pgtable_alloc() calls: kmemleak_alloc_phys() memblock_alloc_range_nid() memblock_phys_alloc_range() early_pgtable_alloc() init_pmd() alloc_init_pud() __create_pgd_mapping() __map_memblock() paging_init() setup_arch() start_kernel() Increased the default value of DEBUG_KMEMLEAK_MEM_POOL_SIZE by 4 times won't be enough for a server with 200GB+ memory. There isn't much interesting to check memory leaks for those early page tables and those early memory mappings should not reference to other memory. Hence, no kmemleak false positives, and we can safely skip tracking those early allocations from kmemleak like we did in the commit fed84c785270 ("mm/memblock.c: skip kmemleak for kasan_init()") without needing to introduce complications to automatically scale the value depends on the runtime memory size etc. After the patch, the default value of DEBUG_KMEMLEAK_MEM_POOL_SIZE becomes sufficient again. Signed-off-by: Qian Cai <quic_qiancai@quicinc.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/20211105150509.7826-1-quic_qiancai@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>