summaryrefslogtreecommitdiff
path: root/arch/riscv/mm
AgeCommit message (Collapse)AuthorFilesLines
2024-01-20Merge tag 'riscv-for-linus-6.8-mw4' of ↵Linus Torvalds3-21/+104
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - Support for tuning for systems with fast misaligned accesses. - Support for SBI-based suspend. - Support for the new SBI debug console extension. - The T-Head CMOs now use PA-based flushes. - Support for enabling the V extension in kernel code. - Optimized IP checksum routines. - Various ftrace improvements. - Support for archrandom, which depends on the Zkr extension. - The build is no longer broken under NET=n, KUNIT=y for ports that don't define their own ipv6 checksum. * tag 'riscv-for-linus-6.8-mw4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (56 commits) lib: checksum: Fix build with CONFIG_NET=n riscv: lib: Check if output in asm goto supported riscv: Fix build error on rv32 + XIP riscv: optimize ELF relocation function in riscv RISC-V: Implement archrandom when Zkr is available riscv: Optimize hweight API with Zbb extension riscv: add dependency among Image(.gz), loader(.bin), and vmlinuz.efi samples: ftrace: Add RISC-V support for SAMPLE_FTRACE_DIRECT[_MULTI] riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support riscv: ftrace: Make function graph use ftrace directly riscv: select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY lib/Kconfig.debug: Update AS_HAS_NON_CONST_LEB128 comment and name riscv: Restrict DWARF5 when building with LLVM to known working versions riscv: Hoist linker relaxation disabling logic into Kconfig kunit: Add tests for csum_ipv6_magic and ip_fast_csum riscv: Add checksum library riscv: Add checksum header riscv: Add static key for misaligned accesses asm-generic: Improve csum_fold RISC-V: selftests: cbo: Ensure asm operands match constraints ...
2024-01-19Merge tag 'iommu-updates-v6.8' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core changes: - Fix race conditions in device probe path - Retire IOMMU bus_ops - Support for passing custom allocators to page table drivers - Clean up Kconfig around IOMMU_SVA - Support for sharing SVA domains with all devices bound to a mm - Firmware data parsing cleanup - Tracing improvements for iommu-dma code - Some smaller fixes and cleanups ARM-SMMU drivers: - Device-tree binding updates: - Add additional compatible strings for Qualcomm SoCs - Document Adreno clocks for Qualcomm's SM8350 SoC - SMMUv2: - Implement support for the ->domain_alloc_paging() callback - Ensure Secure context is restored following suspend of Qualcomm SMMU implementation - SMMUv3: - Disable stalling mode for the "quiet" context descriptor - Minor refactoring and driver cleanups Intel VT-d driver: - Cleanup and refactoring AMD IOMMU driver: - Improve IO TLB invalidation logic - Small cleanups and improvements Rockchip IOMMU driver: - DT binding update to add Rockchip RK3588 Apple DART driver: - Apple M1 USB4/Thunderbolt DART support - Cleanups Virtio IOMMU driver: - Add support for iotlb_sync_map - Enable deferred IO TLB flushes" * tag 'iommu-updates-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (66 commits) iommu: Don't reserve 0-length IOVA region iommu/vt-d: Move inline helpers to header files iommu/vt-d: Remove unused vcmd interfaces iommu/vt-d: Remove unused parameter of intel_pasid_setup_pass_through() iommu/vt-d: Refactor device_to_iommu() to retrieve iommu directly iommu/sva: Fix memory leak in iommu_sva_bind_device() dt-bindings: iommu: rockchip: Add Rockchip RK3588 iommu/dma: Trace bounce buffer usage when mapping buffers iommu/arm-smmu: Convert to domain_alloc_paging() iommu/arm-smmu: Pass arm_smmu_domain to internal functions iommu/arm-smmu: Implement IOMMU_DOMAIN_BLOCKED iommu/arm-smmu: Convert to a global static identity domain iommu/arm-smmu: Reorganize arm_smmu_domain_add_master() iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED iommu/arm-smmu-v3: Master cannot be NULL in arm_smmu_write_strtab_ent() iommu/arm-smmu-v3: Add a type for the STE iommu/arm-smmu-v3: disable stall for quiet_cd iommu/qcom: restore IOMMU state if needed iommu/arm-smmu-qcom: Add QCM2290 MDSS compatible iommu/arm-smmu-qcom: Add missing GMU entry to match table ...
2024-01-19Merge tag 'percpu-for-6.8' of ↵Linus Torvalds2-0/+13
git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu Pull percpu updates from Dennis Zhou: "Enable percpu page allocator for RISC-V. There are RISC-V configurations with sparse NUMA configurations and small vmalloc space causing dynamic percpu allocations to fail as the backing chunk stride is too far apart" * tag 'percpu-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu: riscv: Enable pcpu page first chunk allocator mm: Introduce flush_cache_vmap_early()
2024-01-19riscv: Fix build error on rv32 + XIPAlexandre Ghiti1-0/+4
commit 66f1e6809397 ("riscv: Make XIP bootable again") restricted page offset to the sv39 page offset instead of the default sv57, which makes sense since probably the platforms that target XIP kernels do not support anything else than sv39 and we do not try to find out the largest address space supported on XIP kernels (ie set_satp_mode()). But PAGE_OFFSET_L3 is not defined for rv32, so fix the build error by restoring the previous behaviour which picks CONFIG_PAGE_OFFSET for rv32. Fixes: 66f1e6809397 ("riscv: Make XIP bootable again") Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/linux-riscv/344dca85-5c48-44e1-bc64-4fa7973edd12@infradead.org/T/#u Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/20240118212120.2087803-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-17Merge tag 'riscv-for-linus-6.8-mw1' of ↵Linus Torvalds7-69/+121
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for many new extensions in hwprobe, along with a handful of cleanups - Various cleanups to our page table handling code, so we alwayse use {READ,WRITE}_ONCE - Support for the which-cpus flavor of hwprobe - Support for XIP kernels has been resurrected * tag 'riscv-for-linus-6.8-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (52 commits) riscv: hwprobe: export Zicond extension riscv: hwprobe: export Zacas ISA extension riscv: add ISA extension parsing for Zacas dt-bindings: riscv: add Zacas ISA extension description riscv: hwprobe: export Ztso ISA extension riscv: add ISA extension parsing for Ztso use linux/export.h rather than asm-generic/export.h riscv: Remove SHADOW_OVERFLOW_STACK_SIZE macro riscv; fix __user annotation in save_v_state() riscv: fix __user annotation in traps_misaligned.c riscv: Select ARCH_WANTS_NO_INSTR riscv: Remove obsolete rv32_defconfig file riscv: Allow disabling of BUILTIN_DTB for XIP riscv: Fixed wrong register in XIP_FIXUP_FLASH_OFFSET macro riscv: Make XIP bootable again riscv: Fix set_direct_map_default_noflush() to reset _PAGE_EXEC riscv: Fix module_alloc() that did not reset the linear mapping permissions riscv: Fix wrong usage of lm_alias() when splitting a huge linear mapping riscv: Check if the code to patch lies in the exit section riscv: Use the same CPU operations for all CPUs ...
2024-01-11riscv: Add support for BATCHED_UNMAP_TLB_FLUSHAlexandre Ghiti1-20/+49
Allow to defer the flushing of the TLB when unmapping pages, which allows to reduce the numbers of IPI and the number of sfence.vma. The ubenchmarch used in commit 43b3dfdd0455 ("arm64: support batched/deferred tlb shootdown during page reclamation/migration") that was multithreaded to force the usage of IPI shows good performance improvement on all platforms: * Unmatched: ~34% * TH1520 : ~78% * Qemu : ~81% In addition, perf on qemu reports an important decrease in time spent dealing with IPIs: Before: 68.17% main [kernel.kallsyms] [k] __sbi_rfence_v02_call After : 8.64% main [kernel.kallsyms] [k] __sbi_rfence_v02_call * Benchmark: int stick_this_thread_to_core(int core_id) { int num_cores = sysconf(_SC_NPROCESSORS_ONLN); if (core_id < 0 || core_id >= num_cores) return EINVAL; cpu_set_t cpuset; CPU_ZERO(&cpuset); CPU_SET(core_id, &cpuset); pthread_t current_thread = pthread_self(); return pthread_setaffinity_np(current_thread, sizeof(cpu_set_t), &cpuset); } static void *fn_thread (void *p_data) { int ret; pthread_t thread; stick_this_thread_to_core((int)p_data); while (1) { sleep(1); } return NULL; } int main() { volatile unsigned char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0); pthread_t threads[4]; int ret; for (int i = 0; i < 4; ++i) { ret = pthread_create(&threads[i], NULL, fn_thread, (void *)i); if (ret) { printf("%s", strerror (ret)); } } memset(p, 0x88, SIZE); for (int k = 0; k < 10000; k++) { /* swap in */ for (int i = 0; i < SIZE; i += 4096) { (void)p[i]; } /* swap out */ madvise(p, SIZE, MADV_PAGEOUT); } for (int i = 0; i < 4; i++) { pthread_cancel(threads[i]); } for (int i = 0; i < 4; i++) { pthread_join(threads[i], NULL); } return 0; } Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Jisheng Zhang <jszhang@kernel.org> Tested-by: Jisheng Zhang <jszhang@kernel.org> # Tested on TH1520 Tested-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20240108193640.344929-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-11riscv: Use hugepage mappings for vmemmapAlexandre Ghiti1-1/+20
This will allow better TLB utilization and then should be more performant. Before: ---[ vmemmap start ]--- 0xffff8d8002000000-0xffff8d8012000000 0x000000046ec00000 256M PTE . .. .. D A G . . W R V ---[ vmemmap end ]--- After: ---[ vmemmap start ]--- 0xffff8d8002000000-0xffff8d8012000000 0x000000046ec00000 256M PMD . .. .. D A G . . W R V ---[ vmemmap end ]--- Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231214132935.212864-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-11Merge patch series "riscv: enable EFFICIENT_UNALIGNED_ACCESS and ↵Palmer Dabbelt1-0/+31
DCACHE_WORD_ACCESS" Jisheng Zhang <jszhang@kernel.org> says: Some riscv implementations such as T-HEAD's C906, C908, C910 and C920 support efficient unaligned access, for performance reason we want to enable HAVE_EFFICIENT_UNALIGNED_ACCESS on these platforms. To avoid performance regressions on non efficient unaligned access platforms, HAVE_EFFICIENT_UNALIGNED_ACCESS can't be globally selected. To solve this problem, runtime code patching based on the detected speed is a good solution. But that's not easy, it involves lots of work to modify vairous subsystems such as net, mm, lib and so on. This can be done step by step. So let's take an easier solution: add support to efficient unaligned access and hide the support under NONPORTABLE. patch1 introduces RISCV_EFFICIENT_UNALIGNED_ACCESS which depends on NONPORTABLE, if users know during config time that the kernel will be only run on those efficient unaligned access hw platforms, they can enable it. Obviously, generic unified kernel Image shouldn't enable it. patch2 adds support DCACHE_WORD_ACCESS when MMU and RISCV_EFFICIENT_UNALIGNED_ACCESS. Below test program and step shows how much performance can be improved: $ cat tt.c #include <sys/types.h> #include <sys/stat.h> #include <unistd.h> #define ITERATIONS 1000000 #define PATH "123456781234567812345678123456781" int main(void) { unsigned long i; struct stat buf; for (i = 0; i < ITERATIONS; i++) stat(PATH, &buf); return 0; } $ gcc -O2 tt.c $ touch 123456781234567812345678123456781 $ time ./a.out Per my test on T-HEAD C910 platforms, the above test performance is improved by about 7.5%. * b4-shazam-merge: riscv: select DCACHE_WORD_ACCESS for efficient unaligned access HW riscv: introduce RISCV_EFFICIENT_UNALIGNED_ACCESS Link: https://lore.kernel.org/r/20231225044207.3821-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10riscv: select DCACHE_WORD_ACCESS for efficient unaligned access HWJisheng Zhang1-0/+31
DCACHE_WORD_ACCESS uses the word-at-a-time API for optimised string comparisons in the vfs layer. This patch implements support for load_unaligned_zeropad in much the same way as has been done for arm64. Here is the test program and step: $ cat tt.c #include <sys/types.h> #include <sys/stat.h> #include <unistd.h> #define ITERATIONS 1000000 #define PATH "123456781234567812345678123456781" int main(void) { unsigned long i; struct stat buf; for (i = 0; i < ITERATIONS; i++) stat(PATH, &buf); return 0; } $ gcc -O2 tt.c $ touch 123456781234567812345678123456781 $ time ./a.out Per my test on T-HEAD C910 platforms, the above test performance is improved by about 7.5%. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Link: https://lore.kernel.org/r/20231225044207.3821-3-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10Merge patch series "Fix XIP boot and make XIP testable in QEMU"Palmer Dabbelt1-2/+6
Frederik Haxel <haxel@fzi.de> says: XIP boot seems to be broken for some time now. A likely reason why no one seems to have noticed this is that XIP is more difficult to test, as it is currently not easily testable with QEMU. These patches fix the XIP boot and allow an XIP build without BUILTIN_DTB, which in turn makes it easier to test an image with the QEMU virt machine. * b4-shazam-merge: riscv: Allow disabling of BUILTIN_DTB for XIP riscv: Fixed wrong register in XIP_FIXUP_FLASH_OFFSET macro riscv: Make XIP bootable again Link: https://lore.kernel.org/r/20231212130116.848530-1-haxel@fzi.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10Merge remote-tracking branch 'palmer/fixes' into for-nextPalmer Dabbelt1-3/+8
I don't usually merge these in, but I missed sending a PR due to the holidays. * palmer/fixes: riscv: Fix set_direct_map_default_noflush() to reset _PAGE_EXEC riscv: Fix module_alloc() that did not reset the linear mapping permissions riscv: Fix wrong usage of lm_alias() when splitting a huge linear mapping riscv: Check if the code to patch lies in the exit section riscv: errata: andes: Probe for IOCP only once in boot stage riscv: Fix SMP when shadow call stacks are enabled dt-bindings: perf: riscv,pmu: drop unneeded quotes riscv: fix misaligned access handling of C.SWSP and C.SDSP RISC-V: hwprobe: Always use u64 for extension bits Support rv32 ULEB128 test riscv: Correct type casting in module loading riscv: Safely remove entries from relocation list Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-10riscv: Make XIP bootable againFrederik Haxel1-2/+6
Currently, the XIP kernel seems to fail to boot due to missing XIP_FIXUP and a wrong page_offset value. A superfluous XIP_FIXUP has also been removed. Signed-off-by: Frederik Haxel <haxel@fzi.de> Link: https://lore.kernel.org/r/20231212130116.848530-2-haxel@fzi.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09riscv: Fix set_direct_map_default_noflush() to reset _PAGE_EXECAlexandre Ghiti1-1/+1
When resetting the linear mapping permissions, we must make sure that we clear the X bit so that do not end up with WX mappings (since we set PAGE_KERNEL). Fixes: 395a21ff859c ("riscv: add ARCH_HAS_SET_DIRECT_MAP support") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231213134027.155327-3-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-01-09riscv: Fix wrong usage of lm_alias() when splitting a huge linear mappingAlexandre Ghiti1-2/+7
lm_alias() can only be used on kernel mappings since it explicitly uses __pa_symbol(), so simply fix this by checking where the address belongs to before. Fixes: 311cd2f6e253 ("riscv: Fix set_memory_XX() and set_direct_map_XX() by splitting huge linear mappings") Reported-by: syzbot+afb726d49f84c8d95ee1@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-riscv/000000000000620dd0060c02c5e1@google.com/ Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20231212195400.128457-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-12-29arch/mm/fault: fix major fault accounting when retrying under per-VMA lockSuren Baghdasaryan1-0/+2
A test [1] in Android test suite started failing after [2] was merged. It turns out that after handling a major fault under per-VMA lock, the process major fault counter does not register that fault as major. Before [2] read faults would be done under mmap_lock, in which case FAULT_FLAG_TRIED flag is set before retrying. That in turn causes mm_account_fault() to account the fault as major once retry completes. With per-VMA locks we often retry because a fault can't be handled without locking the whole mm using mmap_lock. Therefore such retries do not set FAULT_FLAG_TRIED flag. This logic does not work after [2] because we can now handle read major faults under per-VMA lock and upon retry the fact there was a major fault gets lost. Fix this by setting FAULT_FLAG_TRIED after retrying under per-VMA lock if VM_FAULT_MAJOR was returned. Ideally we would use an additional VM_FAULT bit to indicate the reason for the retry (could not handle under per-VMA lock vs other reason) but this simpler solution seems to work, so keeping it simple. [1] https://cs.android.com/android/platform/superproject/+/master:test/vts-testcase/kernel/api/drop_caches_prop/drop_caches_test.cpp [2] https://lore.kernel.org/all/20231006195318.4087158-6-willy@infradead.org/ Link: https://lkml.kernel.org/r/20231226214610.109282-1-surenb@google.com Fixes: 12214eba1992 ("mm: handle read faults under the VMA lock") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-20riscv: Use accessors to page table entries instead of direct dereferenceAlexandre Ghiti5-62/+106
As very well explained in commit 20a004e7b017 ("arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables"), an architecture whose page table walker can modify the PTE in parallel must use READ_ONCE()/WRITE_ONCE() macro to avoid any compiler transformation. So apply that to riscv which is such architecture. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Acked-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20231213203001.179237-5-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-12-20riscv: mm: Only compile pgtable.c if MMUAlexandre Ghiti1-2/+1
All functions defined in there depend on MMU, so no need to compile it for !MMU configs. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231213203001.179237-4-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-12-14riscv: Enable pcpu page first chunk allocatorAlexandre Ghiti1-0/+8
As explained in commit 6ea529a2037c ("percpu: make embedding first chunk allocator check vmalloc space size"), the embedding first chunk allocator needs the vmalloc space to be larger than the maximum distance between units which are grouped into NUMA nodes. On a very sparse NUMA configurations and a small vmalloc area (for example, it is 64GB in sv39), the allocation of dynamic percpu data in the vmalloc area could fail. So provide the pcpu page allocator as a fallback in case we fall into such a sparse configuration (which happened in arm64 as shown by commit 09cea6195073 ("arm64: support page mapping percpu first chunk allocator")). Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Dennis Zhou <dennis@kernel.org>
2023-12-14mm: Introduce flush_cache_vmap_early()Alexandre Ghiti1-0/+5
The pcpu setup when using the page allocator sets up a new vmalloc mapping very early in the boot process, so early that it cannot use the flush_cache_vmap() function which may depend on structures not yet initialized (for example in riscv, we currently send an IPI to flush other cpus TLB). But on some architectures, we must call flush_cache_vmap(): for example, in riscv, some uarchs can cache invalid TLB entries so we need to flush the new established mapping to avoid taking an exception. So fix this by introducing a new function flush_cache_vmap_early() which is called right after setting the new page table entry and before accessing this new mapping. This new function implements a local flush tlb on riscv and is no-op for other architectures (same as today). Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dennis Zhou <dennis@kernel.org>
2023-12-12iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops()Jason Gunthorpe1-1/+1
This is not being used to pass ops, it is just a way to tell if an iommu driver was probed. These days this can be detected directly via device_iommu_mapped(). Call device_iommu_mapped() in the two places that need to check it and remove the iommu parameter everywhere. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Moritz Fischer <mdf@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Rob Herring <robh@kernel.org> Tested-by: Hector Martin <marcan@marcan.st> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v2-16e4def25ebb+820-iommu_fwspec_p1_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-10Merge tag 'riscv-for-linus-6.7-mw2' of ↵Linus Torvalds8-151/+410
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - Support for handling misaligned accesses in S-mode - Probing for misaligned access support is now properly cached and handled in parallel - PTDUMP now reflects the SW reserved bits, as well as the PBMT and NAPOT extensions - Performance improvements for TLB flushing - Support for many new relocations in the module loader - Various bug fixes and cleanups * tag 'riscv-for-linus-6.7-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits) riscv: Optimize bitops with Zbb extension riscv: Rearrange hwcap.h and cpufeature.h drivers: perf: Do not broadcast to other cpus when starting a counter drivers: perf: Check find_first_bit() return value of: property: Add fw_devlink support for msi-parent RISC-V: Don't fail in riscv_of_parent_hartid() for disabled HARTs riscv: Fix set_memory_XX() and set_direct_map_XX() by splitting huge linear mappings riscv: Don't use PGD entries for the linear mapping RISC-V: Probe misaligned access speed in parallel RISC-V: Remove __init on unaligned_emulation_finish() RISC-V: Show accurate per-hart isa in /proc/cpuinfo RISC-V: Don't rely on positional structure initialization riscv: Add tests for riscv module loading riscv: Add remaining module relocations riscv: Avoid unaligned access when relocating modules riscv: split cache ops out of dma-noncoherent.c riscv: Improve flush_tlb_kernel_range() riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb riscv: Improve flush_tlb_range() for hugetlb pages riscv: Improve tlb_flush() ...
2023-11-08Merge tag 'riscv-for-linus-6.7-rc1' of ↵Linus Torvalds3-6/+24
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for cbo.zero in userspace - Support for CBOs on ACPI-based systems - A handful of improvements for the T-Head cache flushing ops - Support for software shadow call stacks - Various cleanups and fixes * tag 'riscv-for-linus-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (31 commits) RISC-V: hwprobe: Fix vDSO SIGSEGV riscv: configs: defconfig: Enable configs required for RZ/Five SoC riscv: errata: prefix T-Head mnemonics with th. riscv: put interrupt entries into .irqentry.text riscv: mm: Update the comment of CONFIG_PAGE_OFFSET riscv: Using TOOLCHAIN_HAS_ZIHINTPAUSE marco replace zihintpause riscv/mm: Fix the comment for swap pte format RISC-V: clarify the QEMU workaround in ISA parser riscv: correct pt_level name via pgtable_l5/4_enabled RISC-V: Provide pgtable_l5_enabled on rv32 clocksource: timer-riscv: Increase rating of clock_event_device for Sstc clocksource: timer-riscv: Don't enable/disable timer interrupt lkdtm: Fix CFI_BACKWARD on RISC-V riscv: Use separate IRQ shadow call stacks riscv: Implement Shadow Call Stack riscv: Move global pointer loading to a macro riscv: Deduplicate IRQ stack switching riscv: VMAP_STACK overflow detection thread-safe RISC-V: cacheflush: Initialize CBO variables on ACPI systems RISC-V: ACPI: RHCT: Add function to get CBO block sizes ...
2023-11-08Merge patch series "riscv: Fix set_memory_XX() and set_direct_map_XX()"Palmer Dabbelt2-46/+236
Alexandre Ghiti <alexghiti@rivosinc.com> says: Those 2 patches fix the set_memory_XX() and set_direct_map_XX() APIs, which in turn fix STRICT_KERNEL_RWX and memfd_secret(). Those were broken since the permission changes were not applied to the linear mapping because the linear mapping is mapped using hugepages and walk_page_range_novma() does not split such mappings. To fix that, patch 1 disables PGD mappings in the linear mapping as it is hard to propagate changes at this level in *all* the page tables, this has the downside of disabling PMD mapping for sv32 and PUD (1GB) mapping for sv39 in the linear mapping (for specific kernels, we could add a Kconfig to enable ARCH_HAS_SET_DIRECT_MAP and STRICT_KERNEL_RWX if needed, I'm pretty sure we'll discuss that). patch 2 implements the split of the huge linear mappings so that walk_page_range_novma() can properly apply the permissions. The whole split is protected with mmap_sem in write mode, but I'm wondering if that's enough, any opinion on that is appreciated. * b4-shazam-merge: riscv: Fix set_memory_XX() and set_direct_map_XX() by splitting huge linear mappings riscv: Don't use PGD entries for the linear mapping Link: https://lore.kernel.org/r/20231108075930.7157-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-08riscv: Fix set_memory_XX() and set_direct_map_XX() by splitting huge linear ↵Alexandre Ghiti1-40/+230
mappings When STRICT_KERNEL_RWX is set, any change of permissions on any kernel mapping (vmalloc/modules/kernel text...etc) should be applied on its linear mapping alias. The problem is that the riscv kernel uses huge mappings for the linear mapping and walk_page_range_novma() does not split those huge mappings. So this patchset implements such split in order to apply fine-grained permissions on the linear mapping. Below is the difference before and after (the first PUD mapping is split into PTE/PMD mappings): Before: ---[ Linear mapping ]--- 0xffffaf8000080000-0xffffaf8000200000 0x0000000080080000 1536K PTE D A G . . W R V 0xffffaf8000200000-0xffffaf8077c00000 0x0000000080200000 1914M PMD D A G . . W R V 0xffffaf8077c00000-0xffffaf8078800000 0x00000000f7c00000 12M PMD D A G . . . R V 0xffffaf8078800000-0xffffaf8078c00000 0x00000000f8800000 4M PMD D A G . . W R V 0xffffaf8078c00000-0xffffaf8079200000 0x00000000f8c00000 6M PMD D A G . . . R V 0xffffaf8079200000-0xffffaf807e600000 0x00000000f9200000 84M PMD D A G . . W R V 0xffffaf807e600000-0xffffaf807e716000 0x00000000fe600000 1112K PTE D A G . . W R V 0xffffaf807e717000-0xffffaf807e71a000 0x00000000fe717000 12K PTE D A G . . W R V 0xffffaf807e71d000-0xffffaf807e71e000 0x00000000fe71d000 4K PTE D A G . . W R V 0xffffaf807e722000-0xffffaf807e800000 0x00000000fe722000 888K PTE D A G . . W R V 0xffffaf807e800000-0xffffaf807fe00000 0x00000000fe800000 22M PMD D A G . . W R V 0xffffaf807fe00000-0xffffaf807ff54000 0x00000000ffe00000 1360K PTE D A G . . W R V 0xffffaf807ff55000-0xffffaf8080000000 0x00000000fff55000 684K PTE D A G . . W R V 0xffffaf8080000000-0xffffaf8400000000 0x0000000100000000 14G PUD D A G . . W R V After: ---[ Linear mapping ]--- 0xffffaf8000080000-0xffffaf8000200000 0x0000000080080000 1536K PTE D A G . . W R V 0xffffaf8000200000-0xffffaf8077c00000 0x0000000080200000 1914M PMD D A G . . W R V 0xffffaf8077c00000-0xffffaf8078800000 0x00000000f7c00000 12M PMD D A G . . . R V 0xffffaf8078800000-0xffffaf8078a00000 0x00000000f8800000 2M PMD D A G . . W R V 0xffffaf8078a00000-0xffffaf8078c00000 0x00000000f8a00000 2M PTE D A G . . W R V 0xffffaf8078c00000-0xffffaf8079200000 0x00000000f8c00000 6M PMD D A G . . . R V 0xffffaf8079200000-0xffffaf807e600000 0x00000000f9200000 84M PMD D A G . . W R V 0xffffaf807e600000-0xffffaf807e716000 0x00000000fe600000 1112K PTE D A G . . W R V 0xffffaf807e717000-0xffffaf807e71a000 0x00000000fe717000 12K PTE D A G . . W R V 0xffffaf807e71d000-0xffffaf807e71e000 0x00000000fe71d000 4K PTE D A G . . W R V 0xffffaf807e722000-0xffffaf807e800000 0x00000000fe722000 888K PTE D A G . . W R V 0xffffaf807e800000-0xffffaf807fe00000 0x00000000fe800000 22M PMD D A G . . W R V 0xffffaf807fe00000-0xffffaf807ff54000 0x00000000ffe00000 1360K PTE D A G . . W R V 0xffffaf807ff55000-0xffffaf8080000000 0x00000000fff55000 684K PTE D A G . . W R V 0xffffaf8080000000-0xffffaf8080800000 0x0000000100000000 8M PMD D A G . . W R V 0xffffaf8080800000-0xffffaf8080af6000 0x0000000100800000 3032K PTE D A G . . W R V 0xffffaf8080af6000-0xffffaf8080af8000 0x0000000100af6000 8K PTE D A G . X . R V 0xffffaf8080af8000-0xffffaf8080c00000 0x0000000100af8000 1056K PTE D A G . . W R V 0xffffaf8080c00000-0xffffaf8081a00000 0x0000000100c00000 14M PMD D A G . . W R V 0xffffaf8081a00000-0xffffaf8081a40000 0x0000000101a00000 256K PTE D A G . . W R V 0xffffaf8081a40000-0xffffaf8081a44000 0x0000000101a40000 16K PTE D A G . X . R V 0xffffaf8081a44000-0xffffaf8081a52000 0x0000000101a44000 56K PTE D A G . . W R V 0xffffaf8081a52000-0xffffaf8081a54000 0x0000000101a52000 8K PTE D A G . X . R V ... 0xffffaf809e800000-0xffffaf80c0000000 0x000000011e800000 536M PMD D A G . . W R V 0xffffaf80c0000000-0xffffaf8400000000 0x0000000140000000 13G PUD D A G . . W R V Note that this also fixes memfd_secret() syscall which uses set_direct_map_invalid_noflush() and set_direct_map_default_noflush() to remove the pages from the linear mapping. Below is the kernel page table while a memfd_secret() syscall is running, you can see all the !valid page table entries in the linear mapping: ... 0xffffaf8082240000-0xffffaf8082241000 0x0000000102240000 4K PTE D A G . . W R . 0xffffaf8082241000-0xffffaf8082250000 0x0000000102241000 60K PTE D A G . . W R V 0xffffaf8082250000-0xffffaf8082252000 0x0000000102250000 8K PTE D A G . . W R . 0xffffaf8082252000-0xffffaf8082256000 0x0000000102252000 16K PTE D A G . . W R V 0xffffaf8082256000-0xffffaf8082257000 0x0000000102256000 4K PTE D A G . . W R . 0xffffaf8082257000-0xffffaf8082258000 0x0000000102257000 4K PTE D A G . . W R V 0xffffaf8082258000-0xffffaf8082259000 0x0000000102258000 4K PTE D A G . . W R . 0xffffaf8082259000-0xffffaf808225a000 0x0000000102259000 4K PTE D A G . . W R V 0xffffaf808225a000-0xffffaf808225c000 0x000000010225a000 8K PTE D A G . . W R . 0xffffaf808225c000-0xffffaf8082266000 0x000000010225c000 40K PTE D A G . . W R V 0xffffaf8082266000-0xffffaf8082268000 0x0000000102266000 8K PTE D A G . . W R . 0xffffaf8082268000-0xffffaf8082284000 0x0000000102268000 112K PTE D A G . . W R V 0xffffaf8082284000-0xffffaf8082288000 0x0000000102284000 16K PTE D A G . . W R . 0xffffaf8082288000-0xffffaf808229c000 0x0000000102288000 80K PTE D A G . . W R V 0xffffaf808229c000-0xffffaf80822a0000 0x000000010229c000 16K PTE D A G . . W R . 0xffffaf80822a0000-0xffffaf80822a5000 0x00000001022a0000 20K PTE D A G . . W R V 0xffffaf80822a5000-0xffffaf80822a6000 0x00000001022a5000 4K PTE D A G . . . R V 0xffffaf80822a6000-0xffffaf80822ab000 0x00000001022a6000 20K PTE D A G . . W R V ... And when the memfd_secret() fd is released, the linear mapping is correctly reset: ... 0xffffaf8082240000-0xffffaf80822a5000 0x0000000102240000 404K PTE D A G . . W R V 0xffffaf80822a5000-0xffffaf80822a6000 0x00000001022a5000 4K PTE D A G . . . R V 0xffffaf80822a6000-0xffffaf80822af000 0x00000001022a6000 36K PTE D A G . . W R V ... Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231108075930.7157-3-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-08riscv: Don't use PGD entries for the linear mappingAlexandre Ghiti1-6/+6
Propagating changes at this level is cumbersome as we need to go through all the page tables when that happens (either when changing the permissions or when splitting the mapping). Note that this prevents the use of 4MB mapping for sv32 and 1GB mapping for sv39 in the linear mapping. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231108075930.7157-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-07riscv: split cache ops out of dma-noncoherent.cChristoph Hellwig3-15/+18
The cache ops are also used by the pmem code which is unconditionally built into the kernel. Move them into a separate file that is built based on the correct config option. Fixes: fd962781270e ("riscv: RISCV_NONSTANDARD_CACHE_OPS shouldn't depend on RISCV_DMA_NONCOHERENT") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # Link: https://lore.kernel.org/r/20231028155101.1039049-1-hch@lst.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-06Merge patch series "riscv: tlb flush improvements"Palmer Dabbelt1-65/+116
Alexandre Ghiti <alexghiti@rivosinc.com> says: This series optimizes the tlb flushes on riscv which used to simply flush the whole tlb whatever the size of the range to flush or the size of the stride. Patch 3 introduces a threshold that is microarchitecture specific and will very likely be modified by vendors, not sure though which mechanism we'll use to do that (dt? alternatives? vendor initialization code?). * b4-shazam-merge: riscv: Improve flush_tlb_kernel_range() riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb riscv: Improve flush_tlb_range() for hugetlb pages riscv: Improve tlb_flush() Link: https://lore.kernel.org/r/20231030133027.19542-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-06riscv: Improve flush_tlb_kernel_range()Alexandre Ghiti1-10/+24
This function used to simply flush the whole tlb of all harts, be more subtile and try to only flush the range. The problem is that we can only use PAGE_SIZE as stride since we don't know the size of the underlying mapping and then this function will be improved only if the size of the region to flush is < threshold * PAGE_SIZE. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Tested-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20231030133027.19542-5-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-06riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlbAlexandre Ghiti1-56/+59
Currently, when the range to flush covers more than one page (a 4K page or a hugepage), __flush_tlb_range() flushes the whole tlb. Flushing the whole tlb comes with a greater cost than flushing a single entry so we should flush single entries up to a certain threshold so that: threshold * cost of flushing a single entry < cost of flushing the whole tlb. Co-developed-by: Mayuresh Chitale <mchitale@ventanamicro.com> Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Tested-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20231030133027.19542-4-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-06riscv: Improve flush_tlb_range() for hugetlb pagesAlexandre Ghiti1-1/+28
flush_tlb_range() uses a fixed stride of PAGE_SIZE and in its current form, when a hugetlb mapping needs to be flushed, flush_tlb_range() flushes the whole tlb: so set a stride of the size of the hugetlb mapping in order to only flush the hugetlb mapping. However, if the hugepage is a NAPOT region, all PTEs that constitute this mapping must be invalidated, so the stride size must actually be the size of the PTE. Note that THPs are directly handled by flush_pmd_tlb_range(). Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC Link: https://lore.kernel.org/r/20231030133027.19542-3-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-06riscv: Improve tlb_flush()Alexandre Ghiti1-0/+7
For now, tlb_flush() simply calls flush_tlb_mm() which results in a flush of the whole TLB. So let's use mmu_gather fields to provide a more fine-grained flush of the TLB. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC Link: https://lore.kernel.org/r/20231030133027.19542-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-05Merge patch series "Improve PTDUMP and introduce new fields"Palmer Dabbelt1-19/+34
Yu Chien Peter Lin <peterlin@andestech.com> says: This patchset enhances PTDUMP by providing additional information from pagetable entries. The first patch fixes the RSW field, while the second and third patches introduce the PBMT and NAPOT fields, respectively, for RV64 systems. * b4-shazam-merge: riscv: Introduce NAPOT field to PTDUMP riscv: Introduce PBMT field to PTDUMP riscv: Improve PTDUMP to show RSW with non-zero value Link: https://lore.kernel.org/r/20230921025022.3989723-1-peterlin@andestech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-05riscv: Introduce NAPOT field to PTDUMPYu Chien Peter Lin1-0/+4
This patch introduces the NAPOT field to PTDUMP, allowing it to display the letter "N" for pages that have the 63rd bit set. Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20230921025022.3989723-4-peterlin@andestech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-05riscv: Introduce PBMT field to PTDUMPYu Chien Peter Lin1-0/+16
This patch introduces the PBMT field to the PTDUMP, so it can display the memory attributes for NC or IO. Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20230921025022.3989723-3-peterlin@andestech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-05riscv: Improve PTDUMP to show RSW with non-zero valueYu Chien Peter Lin1-20/+15
RSW field can be used to encode 2 bits of software defined information. Currently, PTDUMP only prints "RSW" when its value is 1 or 3. To fix this issue and improve the debugging experience with PTDUMP, we redefine _PAGE_SPECIAL to its original value and use _PAGE_SOFT as the RSW mask, allow it to print the RSW with any non-zero value. This patch also removes the val from the struct prot_bits as it is no longer needed. Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20230921025022.3989723-2-peterlin@andestech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-05RISC-V: capitalise CMO op macrosConor Dooley2-6/+6
The CMO op macros initially used lower case, as the original iteration of the ALT_CMO_OP alternative stringified the first parameter to finalise the assembly for the standard variant. As a knock-on, the T-Head versions of these CMOs had to use mixed case defines. Commit dd23e9535889 ("RISC-V: replace cbom instructions with an insn-def") removed the asm construction with stringify, replacing it an insn-def macro, rending the lower-case surplus to requirements. As far as I can tell from a brief check, CBO_zero does not see similar use and didn't require the mixed case define in the first place. Replace the lower case characters now for consistency with other insn-def macros in the standard and T-Head forms, and adjust the callsites. Suggested-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20230915-aloe-dollar-994937477776@spud Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-03Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of ↵Linus Torvalds1-130/+11
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "As usual, lots of singleton and doubleton patches all over the tree and there's little I can say which isn't in the individual changelogs. The lengthier patch series are - 'kdump: use generic functions to simplify crashkernel reservation in arch', from Baoquan He. This is mainly cleanups and consolidation of the 'crashkernel=' kernel parameter handling - After much discussion, David Laight's 'minmax: Relax type checks in min() and max()' is here. Hopefully reduces some typecasting and the use of min_t() and max_t() - A group of patches from Oleg Nesterov which clean up and slightly fix our handling of reads from /proc/PID/task/... and which remove task_struct.thread_group" * tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits) scripts/gdb/vmalloc: disable on no-MMU scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n .mailmap: add address mapping for Tomeu Vizoso mailmap: update email address for Claudiu Beznea tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions .mailmap: map Benjamin Poirier's address scripts/gdb: add lx_current support for riscv ocfs2: fix a spelling typo in comment proc: test ProtectionKey in proc-empty-vm test proc: fix proc-empty-vm test with vsyscall fs/proc/base.c: remove unneeded semicolon do_io_accounting: use sig->stats_lock do_io_accounting: use __for_each_thread() ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error() ocfs2: fix a typo in a comment scripts/show_delta: add __main__ judgement before main code treewide: mark stuff as __ro_after_init fs: ocfs2: check status values proc: test /proc/${pid}/statm compiler.h: move __is_constexpr() to compiler.h ...
2023-11-01Merge patch series "RISC-V: ACPI improvements"Palmer Dabbelt1-6/+19
Sunil V L <sunilvl@ventanamicro.com> says: This series is a set of patches which were originally part of RFC v1 series [1] to add ACPI support in RISC-V interrupt controllers. Since these patches are independent of the interrupt controllers, creating this new series which helps to merge instead of waiting for big series. This set of patches primarily adds support below ECR [2] which is approved by the ASWG and adds below features. - Get CBO block sizes from RHCT on ACPI based systems. Additionally, the series contains a patch to improve acpi_os_ioremap(). [1] - https://lore.kernel.org/lkml/20230803175202.3173957-1-sunilvl@ventanamicro.com/ [2] - https://drive.google.com/file/d/1sKbOa8m1UZw1JkquZYe3F1zQBN1xXsaf/view?usp=sharing * b4-shazam-merge: RISC-V: cacheflush: Initialize CBO variables on ACPI systems RISC-V: ACPI: RHCT: Add function to get CBO block sizes RISC-V: ACPI: Update the return value of acpi_get_rhct() RISC-V: ACPI: Enhance acpi_os_ioremap with MMIO remapping Link: https://lore.kernel.org/r/20231018124007.1306159-1-sunilvl@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-01riscv: correct pt_level name via pgtable_l5/4_enabledSong Shuai1-0/+3
The pt_level uses CONFIG_PGTABLE_LEVELS to display page table names. But if page mode is downgraded from kernel cmdline or restricted by the hardware in 64BIT, it will give a wrong name. Like, using no4lvl for sv39, ptdump named the 1G-mapping as "PUD" that should be "PGD": 0xffffffd840000000-0xffffffd900000000 0x00000000c0000000 3G PUD D A G . . W R V So select "P4D/PUD" or "PGD" via pgtable_l5/4_enabled to correct it. Fixes: e8a62cc26ddf ("riscv: Implement sv48 support") Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Song Shuai <suagrfillet@gmail.com> Link: https://lore.kernel.org/r/20230712115740.943324-1-suagrfillet@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230830044129.11481-3-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-11-01RISC-V: Provide pgtable_l5_enabled on rv32Palmer Dabbelt1-0/+2
A few of the other page table level helpers are defined on rv32, but not pgtable_l5_enabled. This adds the definition as a constant and converts pgtable_l4_enabled to a constant as well. Link: https://lore.kernel.org/r/20230830044129.11481-2-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-26RISC-V: cacheflush: Initialize CBO variables on ACPI systemsSunil V L1-6/+19
Initialize the CBO variables on ACPI based systems using information in RHCT. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20231018124007.1306159-5-sunilvl@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-10-07riscv: fix set_huge_pte_at() for NAPOT mappings when a swap entry is setAlexandre Ghiti1-6/+13
We used to determine the number of page table entries to set for a NAPOT hugepage by using the pte value which actually fails when the pte to set is a swap entry. So take advantage of a recent fix for arm64 reported in [1] which introduces the size of the mapping as an argument of set_huge_pte_at(): we can then use this size to compute the number of page table entries to set for a NAPOT region. Link: https://lkml.kernel.org/r/20230928151846.8229-3-alexghiti@rivosinc.com Fixes: 82a1a1f3bfb6 ("riscv: mm: support Svnapot in hugetlb page") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reported-by: Ryan Roberts <ryan.roberts@arm.com> Closes: https://lore.kernel.org/linux-arm-kernel/20230922115804.2043771-1-ryan.roberts@arm.com/ [1] Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Qinglin Pan <panqinglin2020@iscas.ac.cn> Cc: Conor Dooley <conor@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-07riscv: handle VM_FAULT_[HWPOISON|HWPOISON_LARGE] faults instead of panickingAlexandre Ghiti1-1/+1
Patch series "Fix set_huge_pte_at()". A recent report [1] from Ryan for arm64 revealed that we do not handle swap entries when setting a hugepage backed by a NAPOT region (the contpte riscv equivalent). As explained in [1], the issue was discovered by a new test in kselftest which uses poison entries, but the symptoms are different from arm64 though: - the riscv kernel bugs because we do not handle VM_FAULT_HWPOISON*, this is fixed by patch 1, - after that, the test passes because the first pte_napot() fails (the poison entry does not have the N bit set), and then we only set the first page table entry covering the NAPOT hugepage, which is enough for hugetlb_fault() to correctly raise a VM_FAULT_HWPOISON wherever we write in this mapping since only this first page table entry is checked (see https://elixir.bootlin.com/linux/v6.6-rc3/source/mm/hugetlb.c#L6071). But this seems fragile so patch 2 sets all page table entries of a NAPOT mapping. [1]: https://lore.kernel.org/linux-arm-kernel/20230922115804.2043771-1-ryan.roberts@arm.com/ This patch (of 2): We used to panic when such faults were encountered but we should handle those faults gracefully for userspace by sending a SIGBUS to the process, like most architectures do. Link: https://lkml.kernel.org/r/20230928151846.8229-1-alexghiti@rivosinc.com Link: https://lkml.kernel.org/r/20230928151846.8229-2-alexghiti@rivosinc.com Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: Conor Dooley <conor@kernel.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Qinglin Pan <panqinglin2020@iscas.ac.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-04riscv: kdump: use generic interface to simplify crashkernel reservationBaoquan He1-130/+11
With the help of newly changed function parse_crashkernel() and generic reserve_crashkernel_generic(), crashkernel reservation can be simplified by steps: 1) Add a new header file <asm/crash_core.h>, and define CRASH_ALIGN, CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX and DEFAULT_CRASH_KERNEL_LOW_SIZE in <asm/crash_core.h>; 2) Add arch_reserve_crashkernel() to call parse_crashkernel() and reserve_crashkernel_generic(); 3) Add ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION Kconfig in arch/riscv/Kconfig. The old reserve_crashkernel_low() and reserve_crashkernel() can be removed. [chenjiahao16@huawei.com: fix crashkernel reserving problem on RISC-V] Link: https://lkml.kernel.org/r/20230925024333.730964-1-chenjiahao16@huawei.com Link: https://lkml.kernel.org/r/20230914033142.676708-9-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Jiahao <chenjiahao16@huawei.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-04crash_core: change the prototype of function parse_crashkernel()Baoquan He1-1/+1
Add two parameters 'low_size' and 'high' to function parse_crashkernel(), later crashkernel=,high|low parsing will be added. Make adjustments in all call sites of parse_crashkernel() in arch. Link: https://lkml.kernel.org/r/20230914033142.676708-3-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Jiahao <chenjiahao16@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-30mm: hugetlb: add huge page size param to set_huge_pte_at()Ryan Roberts1-1/+2
Patch series "Fix set_huge_pte_at() panic on arm64", v2. This series fixes a bug in arm64's implementation of set_huge_pte_at(), which can result in an unprivileged user causing a kernel panic. The problem was triggered when running the new uffd poison mm selftest for HUGETLB memory. This test (and the uffd poison feature) was merged for v6.5-rc7. Ideally, I'd like to get this fix in for v6.6 and I've cc'ed stable (correctly this time) to get it backported to v6.5, where the issue first showed up. Description of Bug ================== arm64's huge pte implementation supports multiple huge page sizes, some of which are implemented in the page table with multiple contiguous entries. So set_huge_pte_at() needs to work out how big the logical pte is, so that it can also work out how many physical ptes (or pmds) need to be written. It previously did this by grabbing the folio out of the pte and querying its size. However, there are cases when the pte being set is actually a swap entry. But this also used to work fine, because for huge ptes, we only ever saw migration entries and hwpoison entries. And both of these types of swap entries have a PFN embedded, so the code would grab that and everything still worked out. But over time, more calls to set_huge_pte_at() have been added that set swap entry types that do not embed a PFN. And this causes the code to go bang. The triggering case is for the uffd poison test, commit 99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") - added in v6.5-rc7. Although review shows that there are other call sites that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger on arm64 because arm64 doesn't support UFFD WP. If CONFIG_DEBUG_VM is enabled, we do at least get a BUG(), but otherwise, it will dereference a bad pointer in page_folio(): static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry) { VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry)); return page_folio(pfn_to_page(swp_offset_pfn(entry))); } Fix === The simplest fix would have been to revert the dodgy cleanup commit 18f3962953e4 ("mm: hugetlb: kill set_huge_swap_pte_at()"), but since things have moved on, this would have required an audit of all the new set_huge_pte_at() call sites to see if they should be converted to set_huge_swap_pte_at(). As per the original intent of the change, it would also leave us open to future bugs when people invariably get it wrong and call the wrong helper. So instead, I've added a huge page size parameter to set_huge_pte_at(). This means that the arm64 code has the size in all cases. It's a bigger change, due to needing to touch the arches that implement the function, but it is entirely mechanical, so in my view, low risk. I've compile-tested all touched arches; arm64, parisc, powerpc, riscv, s390, sparc (and additionally x86_64). I've additionally booted and run mm selftests against arm64, where I observe the uffd poison test is fixed, and there are no other regressions. This patch (of 2): In order to fix a bug, arm64 needs to be told the size of the huge page for which the pte is being set in set_huge_pte_at(). Provide for this by adding an `unsigned long sz` parameter to the function. This follows the same pattern as huge_pte_clear(). This commit makes the required interface modifications to the core mm as well as all arches that implement this function (arm64, parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed in a separate commit. No behavioral changes intended. Link: https://lkml.kernel.org/r/20230922115804.2043771-1-ryan.roberts@arm.com Link: https://lkml.kernel.org/r/20230922115804.2043771-2-ryan.roberts@arm.com Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [powerpc 8xx] Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> [vmalloc change] Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Xu <peterx@redhat.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> [6.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-08Merge patch series "riscv: Introduce KASLR"Palmer Dabbelt1-1/+35
Alexandre Ghiti <alexghiti@rivosinc.com> says: The following KASLR implementation allows to randomize the kernel mapping: - virtually: we expect the bootloader to provide a seed in the device-tree - physically: only implemented in the EFI stub, it relies on the firmware to provide a seed using EFI_RNG_PROTOCOL. arm64 has a similar implementation hence the patch 3 factorizes KASLR related functions for riscv to take advantage. The new virtual kernel location is limited by the early page table that only has one PUD and with the PMD alignment constraint, the kernel can only take < 512 positions. * b4-shazam-merge: riscv: libstub: Implement KASLR by using generic functions libstub: Fix compilation warning for rv32 arm64: libstub: Move KASLR handling functions to kaslr.c riscv: Dump out kernel offset information on panic riscv: Introduce virtual kernel mapping KASLR Link: https://lore.kernel.org/r/20230722123850.634544-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08Merge patch series "Add non-coherent DMA support for AX45MP"Palmer Dabbelt2-0/+56
Prabhakar <prabhakar.csengg@gmail.com> says: From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> non-coherent DMA support for AX45MP ==================================== On the Andes AX45MP core, cache coherency is a specification option so it may not be supported. In this case DMA will fail. To get around with this issue this patch series does the below: 1] Andes alternative ports is implemented as errata which checks if the IOCP is missing and only then applies to CMO errata. One vendor specific SBI EXT (ANDES_SBI_EXT_IOCP_SW_WORKAROUND) is implemented as part of errata. Below are the configs which Andes port provides (and are selected by RZ/Five): - ERRATA_ANDES - ERRATA_ANDES_CMO OpenSBI patch supporting ANDES_SBI_EXT_IOCP_SW_WORKAROUND SBI is now part v1.3 release. 2] Andes AX45MP core has a Programmable Physical Memory Attributes (PMA) block that allows dynamic adjustment of memory attributes in the runtime. It contains a configurable amount of PMA entries implemented as CSR registers to control the attributes of memory locations in interest. OpenSBI configures the PMA regions as required and creates a reserve memory node and propagates it to the higher boot stack. Currently OpenSBI (upstream) configures the required PMA region and passes this a shared DMA pool to Linux. reserved-memory { #address-cells = <2>; #size-cells = <2>; ranges; pma_resv0@58000000 { compatible = "shared-dma-pool"; reg = <0x0 0x58000000 0x0 0x08000000>; no-map; linux,dma-default; }; }; The above shared DMA pool gets appended to Linux DTB so the DMA memory requests go through this region. 3] We provide callbacks to synchronize specific content between memory and cache. 4] RZ/Five SoC selects the below configs - AX45MP_L2_CACHE - DMA_GLOBAL_POOL - ERRATA_ANDES - ERRATA_ANDES_CMO ----------x---------------------x--------------------x---------------x---- * b4-shazam-merge: soc: renesas: Kconfig: Select the required configs for RZ/Five SoC cache: Add L2 cache management for Andes AX45MP RISC-V core dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller riscv: mm: dma-noncoherent: nonstandard cache operations support riscv: errata: Add Andes alternative ports riscv: asm: vendorid_list: Add Andes Technology to the vendors list Link: https://lore.kernel.org/r/20230818135723.80612-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08Merge patch series "riscv: dma-mapping: unify support for cache flushes"Palmer Dabbelt1-9/+51
Prabhakar <prabhakar.csengg@gmail.com> says: From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> This patch series is a subset from Arnd's original series [0]. Ive just picked up the bits required for RISC-V unification of cache flushing. Remaining patches from the series [0] will be taken care by Arnd soon. * b4-shazam-merge: riscv: dma-mapping: switch over to generic implementation riscv: dma-mapping: skip invalidation before bidirectional DMA riscv: dma-mapping: only invalidate after DMA, not flush Link: https://lore.kernel.org/r/20230816232336.164413-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-06riscv: Introduce virtual kernel mapping KASLRAlexandre Ghiti1-1/+35
KASLR implementation relies on a relocatable kernel so that we can move the kernel mapping. The seed needed to virtually move the kernel is taken from the device tree, so we rely on the bootloader to provide a correct seed. Zkr could be used unconditionnally instead if implemented, but that's for another patch. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Song Shuai <songshuaishuai@tinylab.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20230722123850.634544-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>