summaryrefslogtreecommitdiff
path: root/arch/riscv/include/asm
AgeCommit message (Collapse)AuthorFilesLines
2024-06-25RISC-V: fix vector insn load/store width maskJesse Taube1-1/+1
RVFDQ_FL_FS_WIDTH_MASK should be 3 bits [14-12], shifted down by 12 bits. Replace GENMASK(3, 0) with GENMASK(2, 0). Fixes: cd054837243b ("riscv: Allocate user's vector context in the first-use trap") Signed-off-by: Jesse Taube <jesse@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240606182800.415831-1-jesse@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-30riscv: Fix fully ordered LR/SC xchg[8|16]() implementationsAlexandre Ghiti1-10/+12
The fully ordered versions of xchg[8|16]() using LR/SC lack the necessary memory barriers to guarantee the order. Fix this by matching what is already implemented in the fully ordered versions of cmpxchg() using LR/SC. Suggested-by: Andrea Parri <parri.andrea@gmail.com> Reported-by: Andrea Parri <parri.andrea@gmail.com> Closes: https://lore.kernel.org/linux-riscv/ZlYbupL5XgzgA0MX@andrea/T/#u Fixes: a8ed2b7a2c13 ("riscv/cmpxchg: Implement xchg for variables of size 1 and 2") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrea Parri <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20240530145546.394248-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-24Merge tag 'riscv-for-linus-6.10-mw2' of ↵Linus Torvalds5-11/+78
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - The compression format used for boot images is now configurable at build time, and these formats are shown in `make help` - access_ok() has been optimized - A pair of performance bugs have been fixed in the uaccess handlers - Various fixes and cleanups, including one for the IMSIC build failure and one for the early-boot ftrace illegal NOPs bug * tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix early ftrace nop patching irqchip: riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict riscv: selftests: Add signal handling vector tests riscv: mm: accelerate pagefault when badaccess riscv: uaccess: Relax the threshold for fast path riscv: uaccess: Allow the last potential unrolled copy riscv: typo in comment for get_f64_reg Use bool value in set_cpu_online() riscv: selftests: Add hwprobe binaries to .gitignore riscv: stacktrace: fixed walk_stackframe() ftrace: riscv: move from REGS to ARGS riscv: do not select MODULE_SECTIONS by default riscv: show help string for riscv-specific targets riscv: make image compression configurable riscv: cpufeature: Fix extension subset checking riscv: cpufeature: Fix thead vector hwcap removal riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled riscv: Define TASK_SIZE_MAX for __access_ok() riscv: Remove PGDIR_SIZE_L3 and TASK_SIZE_MIN
2024-05-23riscv: Fix early ftrace nop patchingAlexandre Ghiti1-0/+6
Commit c97bf629963e ("riscv: Fix text patching when IPI are used") converted ftrace_make_nop() to use patch_insn_write() which does not emit any icache flush relying entirely on __ftrace_modify_code() to do that. But we missed that ftrace_make_nop() was called very early directly when converting mcount calls into nops (actually on riscv it converts 2B nops emitted by the compiler into 4B nops). This caused crashes on multiple HW as reported by Conor and Björn since the booting core could have half-patched instructions in its icache which would trigger an illegal instruction trap: fix this by emitting a local flush icache when early patching nops. Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reported-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20240523115134.70380-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-23Merge tag 'mm-nonmm-stable-2024-05-22-17-30' of ↵Linus Torvalds1-0/+16
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more non-mm updates from Andrew Morton: - A series ("kbuild: enable more warnings by default") from Arnd Bergmann which enables a number of additional build-time warnings. We fixed all the fallout which we could find, there may still be a few stragglers. - Samuel Holland has developed the series "Unified cross-architecture kernel-mode FPU API". This does a lot of consolidation of per-architecture kernel-mode FPU usage and enables the use of newer AMD GPUs on RISC-V. - Tao Su has fixed some selftests build warnings in the series "Selftests: Fix compilation warnings due to missing _GNU_SOURCE definition". - This pull also includes a nilfs2 fixup from Ryusuke Konishi. * tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits) nilfs2: make block erasure safe in nilfs_finish_roll_forward() selftests/harness: use 1024 in place of LINE_MAX Revert "selftests/harness: remove use of LINE_MAX" selftests/fpu: allow building on other architectures selftests/fpu: move FP code to a separate translation unit drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT drm/amd/display: only use hard-float, not altivec on powerpc riscv: add support for kernel-mode FPU x86: implement ARCH_HAS_KERNEL_FPU_SUPPORT powerpc: implement ARCH_HAS_KERNEL_FPU_SUPPORT LoongArch: implement ARCH_HAS_KERNEL_FPU_SUPPORT lib/raid6: use CC_FLAGS_FPU for NEON CFLAGS arm64: crypto: use CC_FLAGS_FPU for NEON CFLAGS arm64: implement ARCH_HAS_KERNEL_FPU_SUPPORT ARM: crypto: use CC_FLAGS_FPU for NEON CFLAGS ARM: implement ARCH_HAS_KERNEL_FPU_SUPPORT arch: add ARCH_HAS_KERNEL_FPU_SUPPORT x86/fpu: fix asm/fpu/types.h include guard kbuild: enable -Wcast-function-type-strict unconditionally kbuild: enable -Wformat-truncation on clang ...
2024-05-23Merge patch series "riscv: Extension parsing fixes"Palmer Dabbelt1-0/+2
Charlie Jenkins <charlie@rivosinc.com> says: This series contains two minor fixes for the extension parsing in cpufeature.c. Some T-Head boards without vector 1.0 support report "v" in the isa string in their DT which will cause the kernel to run vector code. The code to blacklist "v" from these boards was doing so by using riscv_cached_mvendorid() which has not been populated at the time of extension parsing. This fix instead greedily reads the mvendorid CSR of the boot hart to determine if the cpu is from T-Head. The other fix is for an incorrect indexing bug. riscv extensions sometimes imply other extensions. When adding these "subset" extensions to the hardware capabilities array, they need to be checked if they are valid. The current code only checks if the extension that is including other extensions is valid and not the subset extensions. These patches were previously included in: https://lore.kernel.org/lkml/20240420-dev-charlie-support_thead_vector_6_9-v3-0-67cff4271d1d@rivosinc.com/ * b4-shazam-merge: riscv: cpufeature: Fix extension subset checking riscv: cpufeature: Fix thead vector hwcap removal Link: https://lore.kernel.org/r/20240502-cpufeature_fixes-v4-0-b3d1a088722d@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-23ftrace: riscv: move from REGS to ARGSPuranjay Mohan1-7/+69
This commit replaces riscv's support for FTRACE_WITH_REGS with support for FTRACE_WITH_ARGS. This is required for the ongoing effort to stop relying on stop_machine() for RISCV's implementation of ftrace. The main relevant benefit that this change will bring for the above use-case is that now we don't have separate ftrace_caller and ftrace_regs_caller trampolines. This will allow the callsite to call ftrace_caller by modifying a single instruction. Now the callsite can do something similar to: When not tracing: | When tracing: func: func: auipc t0, ftrace_caller_top auipc t0, ftrace_caller_top nop <=========<Enable/Disable>=========> jalr t0, ftrace_caller_bottom [...] [...] The above assumes that we are dropping the support of calling a direct trampoline from the callsite. We need to drop this as the callsite can't change the target address to call, it can only enable/disable a call to a preset target (ftrace_caller in the above diagram). We can later optimize this by calling an intermediate dispatcher trampoline before ftrace_caller. Currently, ftrace_regs_caller saves all CPU registers in the format of struct pt_regs and allows the tracer to modify them. We don't need to save all of the CPU registers because at function entry only a subset of pt_regs is live: |----------+----------+---------------------------------------------| | Register | ABI Name | Description | |----------+----------+---------------------------------------------| | x1 | ra | Return address for traced function | | x2 | sp | Stack pointer | | x5 | t0 | Return address for ftrace_caller trampoline | | x8 | s0/fp | Frame pointer | | x10-11 | a0-1 | Function arguments/return values | | x12-17 | a2-7 | Function arguments | |----------+----------+---------------------------------------------| See RISCV calling convention[1] for the above table. Saving just the live registers decreases the amount of stack space required from 288 Bytes to 112 Bytes. Basic testing was done with this on the VisionFive 2 development board. Note: - Moving from REGS to ARGS will mean that RISCV will stop supporting KPROBES_ON_FTRACE as it requires full pt_regs to be saved. - KPROBES_ON_FTRACE will be supplanted by FPROBES see [2]. [1] https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf [2] https://lore.kernel.org/all/170887410337.564249.6360118840946697039.stgit@devnote2/ Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20240405142453.4187-1-puranjay@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-23Merge patch series "riscv: access_ok() optimization"Palmer Dabbelt2-4/+1
Samuel Holland <samuel.holland@sifive.com> says: This series optimizes access_ok() by defining TASK_SIZE_MAX. At Alex's suggestion, I also tried making TASK_SIZE constant (specifically by making PGDIR_SHIFT a variable instead of a ternary expression, then replacing the load with an immediate using ALTERNATIVE). This appeared to slightly improve performance on some implementations (C906) but regressed it on others (FU740). So I am leaving further optimizations to a later series. * b4-shazam-merge: riscv: Define TASK_SIZE_MAX for __access_ok() riscv: Remove PGDIR_SIZE_L3 and TASK_SIZE_MIN Link: https://lore.kernel.org/r/20240327143858.711792-1-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-22Merge tag 'riscv-for-linus-6.10-mw1' of ↵Linus Torvalds16-447/+316
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Add byte/half-word compare-and-exchange, emulated via LR/SC loops - Support for Rust - Support for Zihintpause in hwprobe - Add PR_RISCV_SET_ICACHE_FLUSH_CTX prctl() - Support lockless lockrefs * tag 'riscv-for-linus-6.10-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits) riscv: defconfig: Enable CONFIG_CLK_SOPHGO_CV1800 riscv: select ARCH_HAS_FAST_MULTIPLIER riscv: mm: still create swiotlb buffer for kmalloc() bouncing if required riscv: Annotate pgtable_l{4,5}_enabled with __ro_after_init riscv: Remove redundant CONFIG_64BIT from pgtable_l{4,5}_enabled riscv: mm: Always use an ASID to flush mm contexts riscv: mm: Preserve global TLB entries when switching contexts riscv: mm: Make asid_bits a local variable riscv: mm: Use a fixed layout for the MM context ID riscv: mm: Introduce cntx2asid/cntx2version helper macros riscv: Avoid TLB flush loops when affected by SiFive CIP-1200 riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma riscv: mm: Combine the SMP and UP TLB flush code riscv: Only send remote fences when some other CPU is online riscv: mm: Broadcast kernel TLB flushes only when needed riscv: Use IPIs for remote cache/TLB flushes by default riscv: Factor out page table TLB synchronization riscv: Flush the instruction cache during SMP bringup riscv: hwprobe: export Zihintpause ISA extension riscv: misaligned: remove CONFIG_RISCV_M_MODE specific code ...
2024-05-22riscv: cpufeature: Fix thead vector hwcap removalCharlie Jenkins1-0/+2
The riscv_cpuinfo struct that contains mvendorid and marchid is not populated until all harts are booted which happens after the DT parsing. Use the mvendorid/marchid from the boot hart to determine if the DT contains an invalid V. Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs") Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20240502-cpufeature_fixes-v4-1-b3d1a088722d@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-20riscv: add support for kernel-mode FPUSamuel Holland1-0/+16
This is motivated by the amdgpu DRM driver, which needs floating-point code to support recent hardware. That code is not performance-critical, so only provide a minimal non-preemptible implementation for now. Support is limited to riscv64 because riscv32 requires runtime (libgcc) assistance to convert between doubles and 64-bit integers. Link: https://lkml.kernel.org/r/20240329072441.591471-12-samuel.holland@sifive.com Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: WANG Xuerui <git@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-19Merge tag 'mm-stable-2024-05-17-19-19' of ↵Linus Torvalds2-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: "The usual shower of singleton fixes and minor series all over MM, documented (hopefully adequately) in the respective changelogs. Notable series include: - Lucas Stach has provided some page-mapping cleanup/consolidation/ maintainability work in the series "mm/treewide: Remove pXd_huge() API". - In the series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one test. - In their series "Memory allocation profiling" Kent Overstreet and Suren Baghdasaryan have contributed a means of determining (via /proc/allocinfo) whereabouts in the kernel memory is being allocated: number of calls and amount of memory. - Matthew Wilcox has provided the series "Various significant MM patches" which does a number of rather unrelated things, but in largely similar code sites. - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes Weiner has fixed the page allocator's handling of migratetype requests, with resulting improvements in compaction efficiency. - In the series "make the hugetlb migration strategy consistent" Baolin Wang has fixed a hugetlb migration issue, which should improve hugetlb allocation reliability. - Liu Shixin has hit an I/O meltdown caused by readahead in a memory-tight memcg. Addressed in the series "Fix I/O high when memory almost met memcg limit". - In the series "mm/filemap: optimize folio adding and splitting" Kairui Song has optimized pagecache insertion, yielding ~10% performance improvement in one test. - Baoquan He has cleaned up and consolidated the early zone initialization code in the series "mm/mm_init.c: refactor free_area_init_core()". - Baoquan has also redone some MM initializatio code in the series "mm/init: minor clean up and improvement". - MM helper cleanups from Christoph Hellwig in his series "remove follow_pfn". - More cleanups from Matthew Wilcox in the series "Various page->flags cleanups". - Vlastimil Babka has contributed maintainability improvements in the series "memcg_kmem hooks refactoring". - More folio conversions and cleanups in Matthew Wilcox's series: "Convert huge_zero_page to huge_zero_folio" "khugepaged folio conversions" "Remove page_idle and page_young wrappers" "Use folio APIs in procfs" "Clean up __folio_put()" "Some cleanups for memory-failure" "Remove page_mapping()" "More folio compat code removal" - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb functions to work on folis". - Code consolidation and cleanup work related to GUP's handling of hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2". - Rick Edgecombe has developed some fixes to stack guard gaps in the series "Cover a guard gap corner case". - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series "mm/ksm: fix ksm exec support for prctl". - Baolin Wang has implemented NUMA balancing for multi-size THPs. This is a simple first-cut implementation for now. The series is "support multi-size THP numa balancing". - Cleanups to vma handling helper functions from Matthew Wilcox in the series "Unify vma_address and vma_pgoff_address". - Some selftests maintenance work from Dev Jain in the series "selftests/mm: mremap_test: Optimizations and style fixes". - Improvements to the swapping of multi-size THPs from Ryan Roberts in the series "Swap-out mTHP without splitting". - Kefeng Wang has significantly optimized the handling of arm64's permission page faults in the series "arch/mm/fault: accelerate pagefault when badaccess" "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS" - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it GUP-fast". - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to use struct vm_fault". - selftests build fixes from John Hubbard in the series "Fix selftests/mm build without requiring "make headers"". - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes the initialization code so that migration between different memory types works as intended. - David Hildenbrand has improved follow_pte() and fixed an errant driver in the series "mm: follow_pte() improvements and acrn follow_pte() fixes". - David also did some cleanup work on large folio mapcounts in his series "mm: mapcount for large folios + page_mapcount() cleanups". - Folio conversions in KSM in Alex Shi's series "transfer page to folio in KSM". - Barry Song has added some sysfs stats for monitoring multi-size THP's in the series "mm: add per-order mTHP alloc and swpout counters". - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled and limit checking cleanups". - Matthew Wilcox has been looking at buffer_head code and found the documentation to be lacking. The series is "Improve buffer head documentation". - Multi-size THPs get more work, this time from Lance Yang. His series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes the freeing of these things. - Kemeng Shi has added more userspace-visible writeback instrumentation in the series "Improve visibility of writeback". - Kemeng Shi then sent some maintenance work on top in the series "Fix and cleanups to page-writeback". - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the series "Improve anon_vma scalability for anon VMAs". Intel's test bot reported an improbable 3x improvement in one test. - SeongJae Park adds some DAMON feature work in the series "mm/damon: add a DAMOS filter type for page granularity access recheck" "selftests/damon: add DAMOS quota goal test" - Also some maintenance work in the series "mm/damon/paddr: simplify page level access re-check for pageout" "mm/damon: misc fixes and improvements" - David Hildenbrand has disabled some known-to-fail selftests ni the series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL". - memcg metadata storage optimizations from Shakeel Butt in "memcg: reduce memory consumption by memcg stats". - DAX fixes and maintenance work from Vishal Verma in the series "dax/bus.c: Fixups for dax-bus locking"" * tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits) memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault selftests: cgroup: add tests to verify the zswap writeback path mm: memcg: make alloc_mem_cgroup_per_node_info() return bool mm/damon/core: fix return value from damos_wmark_metric_value mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED selftests: cgroup: remove redundant enabling of memory controller Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT Docs/mm/damon/design: use a list for supported filters Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file selftests/damon: classify tests for functionalities and regressions selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None' selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts selftests/damon/_damon_sysfs: check errors from nr_schemes file reads mm/damon/core: initialize ->esz_bp from damos_quota_init_priv() selftests/damon: add a test for DAMOS quota goal ...
2024-05-16riscv: Define TASK_SIZE_MAX for __access_ok()Samuel Holland1-0/+1
TASK_SIZE_MAX should be set to a constant value, at least the largest valid userspace address under any runtime configuration. This optimizes the check in __access_ok(), which no longer needs to compute the runtime value of TASK_SIZE. The check does not need to be exact, as long as it accepts all valid userspace addresses and rejects all valid kernel addresses; well-behaved programs will never fail the access_ok() check. For RISC-V, which requires all virtual addresses to be sign extended, the optimal choice is LONG_MAX because it simplifies the limit comparison to a sign bit test. This removes about half of the references to pgtable_l[45]_enabled. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240327143858.711792-3-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-16riscv: Remove PGDIR_SIZE_L3 and TASK_SIZE_MINSamuel Holland2-4/+0
TASK_SIZE_MIN is unused since commit 085e2ff9aeb0 ("efi: libstub: Drop randomization of runtime memory map"). PGDIR_SIZE_L3 is only used in the definition of TASK_SIZE_MIN. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240327143858.711792-2-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-19/+61
Pull KVM updates from Paolo Bonzini: "ARM: - Move a lot of state that was previously stored on a per vcpu basis into a per-CPU area, because it is only pertinent to the host while the vcpu is loaded. This results in better state tracking, and a smaller vcpu structure. - Add full handling of the ERET/ERETAA/ERETAB instructions in nested virtualisation. The last two instructions also require emulating part of the pointer authentication extension. As a result, the trap handling of pointer authentication has been greatly simplified. - Turn the global (and not very scalable) LPI translation cache into a per-ITS, scalable cache, making non directly injected LPIs much cheaper to make visible to the vcpu. - A batch of pKVM patches, mostly fixes and cleanups, as the upstreaming process seems to be resuming. Fingers crossed! - Allocate PPIs and SGIs outside of the vcpu structure, allowing for smaller EL2 mapping and some flexibility in implementing more or less than 32 private IRQs. - Purge stale mpidr_data if a vcpu is created after the MPIDR map has been created. - Preserve vcpu-specific ID registers across a vcpu reset. - Various minor cleanups and improvements. LoongArch: - Add ParaVirt IPI support - Add software breakpoint support - Add mmio trace events support RISC-V: - Support guest breakpoints using ebreak - Introduce per-VCPU mp_state_lock and reset_cntx_lock - Virtualize SBI PMU snapshot and counter overflow interrupts - New selftests for SBI PMU and Guest ebreak - Some preparatory work for both TDX and SNP page fault handling. This also cleans up the page fault path, so that the priorities of various kinds of fauls (private page, no memory, write to read-only slot, etc.) are easier to follow. x86: - Minimize amount of time that shadow PTEs remain in the special REMOVED_SPTE state. This is a state where the mmu_lock is held for reading but concurrent accesses to the PTE have to spin; shortening its use allows other vCPUs to repopulate the zapped region while the zapper finishes tearing down the old, defunct page tables. - Advertise the max mappable GPA in the "guest MAXPHYADDR" CPUID field, which is defined by hardware but left for software use. This lets KVM communicate its inability to map GPAs that set bits 51:48 on hosts without 5-level nested page tables. Guest firmware is expected to use the information when mapping BARs; this avoids that they end up at a legal, but unmappable, GPA. - Fixed a bug where KVM would not reject accesses to MSR that aren't supposed to exist given the vCPU model and/or KVM configuration. - As usual, a bunch of code cleanups. x86 (AMD): - Implement a new and improved API to initialize SEV and SEV-ES VMs, which will also be extendable to SEV-SNP. The new API specifies the desired encryption in KVM_CREATE_VM and then separately initializes the VM. The new API also allows customizing the desired set of VMSA features; the features affect the measurement of the VM's initial state, and therefore enabling them cannot be done tout court by the hypervisor. While at it, the new API includes two bugfixes that couldn't be applied to the old one without a flag day in userspace or without affecting the initial measurement. When a SEV-ES VM is created with the new VM type, KVM_GET_REGS/KVM_SET_REGS and friends are rejected once the VMSA has been encrypted. Also, the FPU and AVX state will be synchronized and encrypted too. - Support for GHCB version 2 as applicable to SEV-ES guests. This, once more, is only accessible when using the new KVM_SEV_INIT2 flow for initialization of SEV-ES VMs. x86 (Intel): - An initial bunch of prerequisite patches for Intel TDX were merged. They generally don't do anything interesting. The only somewhat user visible change is a new debugging mode that checks that KVM's MMU never triggers a #VE virtualization exception in the guest. - Clear vmcs.EXIT_QUALIFICATION when synthesizing an EPT Misconfig VM-Exit to L1, as per the SDM. Generic: - Use vfree() instead of kvfree() for allocations that always use vcalloc() or __vcalloc(). - Remove .change_pte() MMU notifier - the changes to non-KVM code are small and Andrew Morton asked that I also take those through the KVM tree. The callback was only ever implemented by KVM (which was also the original user of MMU notifiers) but it had been nonfunctional ever since calls to set_pte_at_notify were wrapped with invalidate_range_start and invalidate_range_end... in 2012. Selftests: - Enhance the demand paging test to allow for better reporting and stressing of UFFD performance. - Convert the steal time test to generate TAP-friendly output. - Fix a flaky false positive in the xen_shinfo_test due to comparing elapsed time across two different clock domains. - Skip the MONITOR/MWAIT test if the host doesn't actually support MWAIT. - Avoid unnecessary use of "sudo" in the NX hugepage test wrapper shell script, to play nice with running in a minimal userspace environment. - Allow skipping the RSEQ test's sanity check that the vCPU was able to complete a reasonable number of KVM_RUNs, as the assert can fail on a completely valid setup. If the test is run on a large-ish system that is otherwise idle, and the test isn't affined to a low-ish number of CPUs, the vCPU task can be repeatedly migrated to CPUs that are in deep sleep states, which results in the vCPU having very little net runtime before the next migration due to high wakeup latencies. - Define _GNU_SOURCE for all selftests to fix a warning that was introduced by a change to kselftest_harness.h late in the 6.9 cycle, and because forcing every test to #define _GNU_SOURCE is painful. - Provide a global pseudo-RNG instance for all tests, so that library code can generate random, but determinstic numbers. - Use the global pRNG to randomly force emulation of select writes from guest code on x86, e.g. to help validate KVM's emulation of locked accesses. - Allocate and initialize x86's GDT, IDT, TSS, segments, and default exception handlers at VM creation, instead of forcing tests to manually trigger the related setup. Documentation: - Fix a goof in the KVM_CREATE_GUEST_MEMFD documentation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (225 commits) selftests/kvm: remove dead file KVM: selftests: arm64: Test vCPU-scoped feature ID registers KVM: selftests: arm64: Test that feature ID regs survive a reset KVM: selftests: arm64: Store expected register value in set_id_regs KVM: selftests: arm64: Rename helper in set_id_regs to imply VM scope KVM: arm64: Only reset vCPU-scoped feature ID regs once KVM: arm64: Reset VM feature ID regs from kvm_reset_sys_regs() KVM: arm64: Rename is_id_reg() to imply VM scope KVM: arm64: Destroy mpidr_data for 'late' vCPU creation KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support KVM: arm64: Fix hvhe/nvhe early alias parsing KVM: SEV: Allow per-guest configuration of GHCB protocol version KVM: SEV: Add GHCB handling for termination requests KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests KVM: SEV: Add support to handle AP reset MSR protocol KVM: x86: Explicitly zero kvm_caps during vendor module load KVM: x86: Fully re-initialize supported_mce_cap on vendor module load KVM: x86: Fully re-initialize supported_vm_types on vendor module load KVM: x86/mmu: Sanity check that __kvm_faultin_pfn() doesn't create noslot pfns KVM: x86/mmu: Initialize kvm_page_fault's pfn and hva to error values ...
2024-05-14riscv: extend execmem_params for generated code allocationsMike Rapoport (IBM)1-0/+3
The memory allocations for kprobes and BPF on RISC-V are not placed in the modules area and these custom allocations are implemented with overrides of alloc_insn_page() and bpf_jit_alloc_exec(). Define MODULES_VADDR and MODULES_END as VMALLOC_START and VMALLOC_END for 32 bit and slightly reorder execmem_params initialization to support both 32 and 64 bit variants, define EXECMEM_KPROBES and EXECMEM_BPF ranges in riscv::execmem_params and drop overrides of alloc_insn_page() and bpf_jit_alloc_exec(). Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2024-05-10Merge tag 'loongarch-kvm-6.10' of ↵Paolo Bonzini3-12/+12
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.10 1. Add ParaVirt IPI support. 2. Add software breakpoint support. 3. Add mmio trace events support.
2024-04-30Merge patch series "riscv: ASID-related and UP-related TLB flush enhancements"Palmer Dabbelt6-62/+56
Samuel Holland <samuel.holland@sifive.com> says: This series converts uniprocessor kernel builds to use the same TLB flushing code as SMP builds, to take advantage of batching and existing range- and ASID-based TLB flush optimizations. It optimizes out IPIs and SBI calls based on the online CPU count, which also covers the scenario where SMP was enabled at build time but only one CPU is present/online. A final optimization is to use single-ASID flushes wherever possible, to avoid unnecessary TLB misses for kernel mappings. This series has a semantic conflict with the AIA patches that are in linux-next due to the removal of the third parameter of riscv_ipi_set_virq_range(), which is called from imsic_ipi_domain_init() in drivers/irqchip/irq-riscv-imsic-early.c. The resolution is to remove the extra argument from the call site. Here are some numbers from D1 which show the performance impact: v6.9-rc1: System Benchmarks Partial Index BASELINE RESULT INDEX Execl Throughput 43.0 198.5 46.2 File Copy 1024 bufsize 2000 maxblocks 3960.0 73934.4 186.7 File Copy 256 bufsize 500 maxblocks 1655.0 20242.6 122.3 File Copy 4096 bufsize 8000 maxblocks 5800.0 197706.4 340.9 Pipe Throughput 12440.0 176974.2 142.3 Pipe-based Context Switching 4000.0 23626.8 59.1 Process Creation 126.0 449.9 35.7 Shell Scripts (1 concurrent) 42.4 544.4 128.4 Shell Scripts (16 concurrent) --- 35.3 --- Shell Scripts (8 concurrent) 6.0 71.6 119.3 System Call Overhead 15000.0 248072.6 165.4 ======== System Benchmarks Index Score (Partial Only) 110.6 v6.9-rc1 + this patch series: System Benchmarks Partial Index BASELINE RESULT INDEX Execl Throughput 43.0 196.8 45.8 File Copy 1024 bufsize 2000 maxblocks 3960.0 71782.2 181.3 File Copy 256 bufsize 500 maxblocks 1655.0 21269.4 128.5 File Copy 4096 bufsize 8000 maxblocks 5800.0 199424.0 343.8 Pipe Throughput 12440.0 196468.6 157.9 Pipe-based Context Switching 4000.0 24261.8 60.7 Process Creation 126.0 459.0 36.4 Shell Scripts (1 concurrent) 42.4 543.8 128.2 Shell Scripts (16 concurrent) --- 35.5 --- Shell Scripts (8 concurrent) 6.0 71.7 119.6 System Call Overhead 15000.0 259415.2 172.9 ======== System Benchmarks Index Score (Partial Only) 113.0 * b4-shazam-lts: riscv: mm: Always use an ASID to flush mm contexts riscv: mm: Preserve global TLB entries when switching contexts riscv: mm: Make asid_bits a local variable riscv: mm: Use a fixed layout for the MM context ID riscv: mm: Introduce cntx2asid/cntx2version helper macros riscv: Avoid TLB flush loops when affected by SiFive CIP-1200 riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma riscv: mm: Combine the SMP and UP TLB flush code riscv: Only send remote fences when some other CPU is online riscv: mm: Broadcast kernel TLB flushes only when needed riscv: Use IPIs for remote cache/TLB flushes by default riscv: Factor out page table TLB synchronization riscv: Flush the instruction cache during SMP bringup Link: https://lore.kernel.org/r/20240327045035.368512-1-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-30Merge patch series "riscv: enable lockless lockref implementation"Palmer Dabbelt1-0/+18
Jisheng Zhang <jszhang@kernel.org> says: This series selects ARCH_USE_CMPXCHG_LOCKREF to enable the cmpxchg-based lockless lockref implementation for riscv. Then, implement arch_cmpxchg64_{relaxed|acquire|release}. After patch1: Using Linus' test case[1] on TH1520 platform, I see a 11.2% improvement. On JH7110 platform, I see 12.0% improvement. After patch2: on both TH1520 and JH7110 platforms, I didn't see obvious performance improvement with Linus' test case [1]. IMHO, this may be related with the fence and lr.d/sc.d hw implementations. In theory, lr/sc without fence could give performance improvement over lr/sc plus fence, so add the code here to leave performance improvement room on newer HW platforms. * b4-shazam-merge: riscv: cmpxchg: implement arch_cmpxchg64_{relaxed|acquire|release} riscv: select ARCH_USE_CMPXCHG_LOCKREF Link: http://marc.info/?l=linux-fsdevel&m=137782380714721&w=4 [1] Link: https://lore.kernel.org/r/20240325111038.1700-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-30riscv: mm: still create swiotlb buffer for kmalloc() bouncing if requiredJisheng Zhang1-1/+1
After commit f51f7a0fc2f4 ("riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC for !dma_coherent"), for non-coherent platforms with less than 4GB memory, we rely on users to pass "swiotlb=mmnn,force" kernel parameters to enable DMA bouncing for unaligned kmalloc() buffers. Now let's go further: If no bouncing needed for ZONE_DMA, let kernel automatically allocate 1MB swiotlb buffer per 1GB of RAM for kmalloc() bouncing on non-coherent platforms, so that no need to pass "swiotlb=mmnn,force" any more. The math of "1MB swiotlb buffer per 1GB of RAM for kmalloc() bouncing" is taken from arm64. Users can still force smaller swiotlb buffer by passing "swiotlb=mmnn". Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240325110036.1564-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-30Merge patch series "riscv: Create and document PR_RISCV_SET_ICACHE_FLUSH_CTX ↵Palmer Dabbelt4-1/+35
prctl" Charlie Jenkins <charlie@rivosinc.com> says: Improve the performance of icache flushing by creating a new prctl flag PR_RISCV_SET_ICACHE_FLUSH_CTX. The interface is left generic to allow for future expansions such as with the proposed J extension [1]. Documentation is also provided to explain the use case. Patch sent to add PR_RISCV_SET_ICACHE_FLUSH_CTX to man-pages [2]. [1] https://github.com/riscv/riscv-j-extension [2] https://lore.kernel.org/linux-man/20240124-fencei_prctl-v1-1-0bddafcef331@rivosinc.com * b4-shazam-merge: cpumask: Add assign cpu documentation: Document PR_RISCV_SET_ICACHE_FLUSH_CTX prctl riscv: Include riscv_set_icache_flush_ctx prctl riscv: Remove unnecessary irqflags processor.h include Link: https://lore.kernel.org/r/20240312-fencei-v13-0-4b6bdc2bbf32@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-30Merge patch series "riscv: fix patching with IPI"Palmer Dabbelt1-0/+1
Alexandre Ghiti <alexghiti@rivosinc.com> says: patch 1 removes a useless memory barrier and patch 2 actually fixes the issue with IPI in the patching code. * b4-shazam-merge: riscv: Fix text patching when IPI are used riscv: Remove superfluous smp_mb() Link: https://lore.kernel.org/r/20240229121056.203419-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-29riscv: mm: Use a fixed layout for the MM context IDSamuel Holland2-4/+2
Currently, the size of the ASID field in the MM context ID dynamically depends on the number of hardware-supported ASID bits. This requires reading a global variable to extract either field from the context ID. Instead, allocate the maximum possible number of bits to the ASID field, so the layout of the context ID is known at compile-time. Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240327045035.368512-11-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-29riscv: mm: Introduce cntx2asid/cntx2version helper macrosSamuel Holland1-0/+3
When using the ASID allocator, the MM context ID contains two values: the ASID in the lower bits, and the allocator version number in the remaining bits. Use macros to make this separation more obvious. Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240327045035.368512-10-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-29riscv: Avoid TLB flush loops when affected by SiFive CIP-1200Samuel Holland1-0/+2
Implementations affected by SiFive errata CIP-1200 have a bug which forces the kernel to always use the global variant of the sfence.vma instruction. When affected by this errata, do not attempt to flush a range of addresses; each iteration of the loop would actually flush the whole TLB instead. Instead, minimize the overall number of sfence.vma instructions. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com> Link: https://lore.kernel.org/r/20240327045035.368512-9-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-29riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vmaSamuel Holland2-2/+29
commit 3f1e782998cd ("riscv: add ASID-based tlbflushing methods") added calls to the sfence.vma instruction with rs2 != x0. These single-ASID instruction variants are also affected by SiFive errata CIP-1200. Until now, the errata workaround was not needed for the single-ASID sfence.vma variants, because they were only used when the ASID allocator was enabled, and the affected SiFive platforms do not support multiple ASIDs. However, we are going to start using those sfence.vma variants regardless of ASID support, so now we need alternatives covering them. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240327045035.368512-8-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-29riscv: mm: Combine the SMP and UP TLB flush codeSamuel Holland1-28/+3
In SMP configurations, all TLB flushing narrower than flush_tlb_all() goes through __flush_tlb_range(). Do the same in UP configurations. This allows UP configurations to take advantage of recent improvements to the code in tlbflush.c, such as support for huge pages and flushing multiple-page ranges. Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com> Link: https://lore.kernel.org/r/20240327045035.368512-7-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-29riscv: Use IPIs for remote cache/TLB flushes by defaultSamuel Holland3-16/+10
An IPI backend is always required in an SMP configuration, but an SBI implementation is not. For example, SBI will be unavailable when the kernel runs in M mode. For this reason, consider IPI delivery of cache and TLB flushes to be the base case, and any other implementation (such as the SBI remote fence extension) to be an optimization. Generally, if IPIs can be delivered without firmware assistance, they are assumed to be faster than SBI calls due to the SBI context switch overhead. However, when SBI is used as the IPI backend, then the context switch cost must be paid anyway, and performing the cache/TLB flush directly in the SBI implementation is more efficient than injecting an interrupt to S-mode. This is the only existing scenario where riscv_ipi_set_virq_range() is called with use_for_rfence set to false. sbi_ipi_init() already checks riscv_ipi_have_virq_range(), so it only calls riscv_ipi_set_virq_range() when no other IPI device is available. This allows moving the static key and dropping the use_for_rfence parameter. This decouples the static key from the irqchip driver probe order. Furthermore, the static branch only makes sense when CONFIG_RISCV_SBI is enabled. Optherwise, IPIs must be used. Add a fallback definition of riscv_use_sbi_for_rfence() which handles this case and removes the need to check CONFIG_RISCV_SBI elsewhere, such as in cacheflush.c. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240327045035.368512-4-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-29riscv: Factor out page table TLB synchronizationSamuel Holland1-18/+13
The logic is the same for all page table levels. See commit 69be3fb111e7 ("riscv: enable MMU_GATHER_RCU_TABLE_FREE for SMP && MMU"). Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240327045035.368512-3-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-29riscv: Do not save the scratch CSR during suspendSamuel Holland1-1/+0
While the processor is executing kernel code, the value of the scratch CSR is always zero, so there is no need to save the value. Continue to write the CSR during the resume flow, so we do not rely on firmware to initialize it. Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240312195641.1830521-1-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-29Merge patch series "riscv: 64-bit NOMMU fixes and enhancements"Palmer Dabbelt2-2/+2
Samuel Holland <samuel.holland@sifive.com> says: This series aims to improve support for NOMMU, specifically by making it easier to test NOMMU kernels in QEMU and on various widely-available hardware (errata permitting). After all, everything supports Svbare... After applying this series, a NOMMU kernel based on defconfig (changing only the three options below*) boots to userspace on QEMU when passed as -kernel. # CONFIG_RISCV_M_MODE is not set # CONFIG_MMU is not set CONFIG_NONPORTABLE=y *if you are using LLD, you must also disable BPF_SYSCALL and KALLSYMS, because LLD bails on out-of-range references to undefined weak symbols. * b4-shazam-merge: riscv: Allow NOMMU kernels to run in S-mode riscv: Remove MMU dependency from Zbb and Zicboz riscv: Fix loading 64-bit NOMMU kernels past the start of RAM riscv: Fix TASK_SIZE on 64-bit NOMMU Link: https://lore.kernel.org/r/20240227003630.3634533-1-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-29Merge patch series "Rework & improve riscv cmpxchg.h and atomic.h"Palmer Dabbelt2-368/+200
Leonardo Bras <leobras@redhat.com> says: While studying riscv's cmpxchg.h file, I got really interested in understanding how RISCV asm implemented the different versions of {cmp,}xchg. When I understood the pattern, it made sense for me to remove the duplications and create macros to make it easier to understand what exactly changes between the versions: Instruction sufixes & barriers. Also, did the same kind of work on atomic.c. After that, I noted both cmpxchg and xchg only accept variables of size 4 and 8, compared to x86 and arm64 which do 1,2,4,8. Now that deduplication is done, it is quite direct to implement them for variable sizes 1 and 2, so I did it. Then Guo Ren already presented me some possible users :) I did compare the generated asm on a test.c that contained usage for every changed function, and could not detect any change on patches 1 + 2 + 3 compared with upstream. Pathes 4 & 5 were compiled-tested, merged with guoren/qspinlock_v11 and booted just fine with qemu -machine virt -append "qspinlock". (tree: https://gitlab.com/LeoBras/linux/-/commits/guo_qspinlock_v11) Latest tests happened based on this tree: https://github.com/guoren83/linux/tree/qspinlock_v12 * b4-shazam-lts: riscv/cmpxchg: Implement xchg for variables of size 1 and 2 riscv/cmpxchg: Implement cmpxchg for variables of size 1 and 2 riscv/atomic.h : Deduplicate arch_atomic.* riscv/cmpxchg: Deduplicate cmpxchg() asm and macros riscv/cmpxchg: Deduplicate xchg() asm functions Link: https://lore.kernel.org/r/20240103163203.72768-2-leobras@redhat.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-26Merge patch series "RISC-V: Test th.sxstatus.MAEE bit before enabling MAEE"Palmer Dabbelt1-10/+10
Christoph Müllner <christoph.muellner@vrull.eu> says: Currently, the Linux kernel suffers from a boot regression when running on the c906 QEMU emulation. Details have been reported here by Björn Töpel: https://lists.gnu.org/archive/html/qemu-devel/2024-01/msg04766.html The main issue is, that Linux enables XTheadMae for CPUs that have a T-Head mvendorid but QEMU maintainers don't want to emulate a CPU that uses reserved bits in PTEs. See also the following discussion for more context: https://lists.gnu.org/archive/html/qemu-devel/2024-02/msg00775.html This series renames "T-Head PBMT" to "MAE"/"XTheadMae" and only enables it if the th.sxstatus.MAEE bit is set. The th.sxstatus CSR is documented here: https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadsxstatus.adoc XTheadMae is documented here: https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadmae.adoc The QEMU patch to emulate th.sxstatus with the MAEE bit not set is here: https://lore.kernel.org/all/20240329120427.684677-1-christoph.muellner@vrull.eu/ After applying the referenced QEMU patch, this patchset allows to successfully boot a C906 QEMU system emulation ("-cpu thead-c906"). * b4-shazam-lts: riscv: T-Head: Test availability bit before enabling MAE errata riscv: thead: Rename T-Head PBMT to MAE Link: https://lore.kernel.org/r/20240407213236.2121592-1-christoph.muellner@vrull.eu Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-26RISC-V: KVM: Improve firmware counter read functionAtish Patra1-1/+1
Rename the function to indicate that it is meant for firmware counter read. While at it, add a range sanity check for it as well. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240420151741.962500-17-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-26RISC-V: KVM: Support 64 bit firmware counters on RV32Atish Patra1-1/+3
The SBI v2.0 introduced a fw_read_hi function to read 64 bit firmware counters for RV32 based systems. Add infrastructure to support that. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-16-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-26RISC-V: KVM: Add perf sampling support for guestsAtish Patra2-1/+5
KVM enables perf for guest via counter virtualization. However, the sampling can not be supported as there is no mechanism to enabled trap/emulate scountovf in ISA yet. Rely on the SBI PMU snapshot to provide the counter overflow data via the shared memory. In case of sampling event, the host first sets the guest's LCOFI interrupt and injects to the guest via irq filtering mechanism defined in AIA specification. Thus, ssaia must be enabled in the host in order to use perf sampling in the guest. No other AIA dependency w.r.t kernel is required. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-15-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-26RISC-V: KVM: Implement SBI PMU Snapshot featureAtish Patra1-0/+7
PMU Snapshot function allows to minimize the number of traps when the guest access configures/access the hpmcounters. If the snapshot feature is enabled, the hypervisor updates the shared memory with counter data and state of overflown counters. The guest can just read the shared memory instead of trap & emulate done by the hypervisor. This patch doesn't implement the counter overflow yet. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-14-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-26mm/arch: provide pud_pfn() fallbackPeter Xu1-0/+1
The comment in the code explains the reasons. We took a different approach comparing to pmd_pfn() by providing a fallback function. Another option is to provide some lower level config options (compare to HUGETLB_PAGE or THP) to identify which layer an arch can support for such huge mappings. However that can be an overkill. [peterx@redhat.com: fix loongson defconfig] Link: https://lkml.kernel.org/r/20240403013249.1418299-4-peterx@redhat.com Link: https://lkml.kernel.org/r/20240327152332.950956-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Jones <andrew.jones@linux.dev> Cc: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Christoph Hellwig <hch@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: James Houghton <jthoughton@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-26mm: convert arch_clear_hugepage_flags to take a folioMatthew Wilcox (Oracle)1-3/+3
All implementations that aren't no-ops just set a bit in the flags, and we want to use the folio flags rather than the page flags for that. Rename it to arch_clear_hugetlb_flags() while we're touching it so nobody thinks it's used for THP. [willy@infradead.org: fix arm64 build] Link: https://lkml.kernel.org/r/ZgQvNKGdlDkwhQEX@casper.infradead.org Link: https://lkml.kernel.org/r/20240326171045.410737-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-25riscv: thead: Rename T-Head PBMT to MAEChristoph Müllner1-10/+10
T-Head's vendor extension to set page attributes has the name MAE (memory attribute extension). Let's rename it, so it is clear what this referes to. Link: https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadmae.adoc Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Link: https://lore.kernel.org/r/20240407213236.2121592-2-christoph.muellner@vrull.eu Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-24riscv: cmpxchg: implement arch_cmpxchg64_{relaxed|acquire|release}Jisheng Zhang1-0/+18
After selecting ARCH_USE_CMPXCHG_LOCKREF, one straight futher optimization is implementing the arch_cmpxchg64_relaxed() because the lockref code does not need the cmpxchg to have barrier semantics. At the same time, implement arch_cmpxchg64_acquire and arch_cmpxchg64_release as well. However, on both TH1520 and JH7110 platforms, I didn't see obvious performance improvement with Linus' test case [1]. IMHO, this may be related with the fence and lr.d/sc.d hw implementations. In theory, lr/sc without fence could give performance improvement over lr/sc plus fence, so add the code here to leave performance improvement room on newer HW platforms. Link: http://marc.info/?l=linux-fsdevel&m=137782380714721&w=4 [1] Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Andrea Parri <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20240325111038.1700-3-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-04-22RISC-V: Use the minor version mask while computing sbi versionAtish Patra1-2/+2
As per the SBI specification, minor version is encoded in the lower 24 bits only. Make sure that the SBI version is computed with the appropriate mask. Currently, there is no minor version in use. Thus, it doesn't change anything functionality but it is good to be compliant with the specification. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-8-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-22RISC-V: KVM: Rename the SBI_STA_SHMEM_DISABLE to a generic nameAtish Patra1-1/+1
SBI_STA_SHMEM_DISABLE is a macro to invoke disable shared memory commands. As this can be invoked from other SBI extension context as well, rename it to more generic name as SBI_SHMEM_DISABLE. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-7-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-22RISC-V: Add SBI PMU snapshot definitionsAtish Patra1-0/+11
SBI PMU Snapshot function optimizes the number of traps to higher privilege mode by leveraging a shared memory between the S/VS-mode and the M/HS mode. Add the definitions for that extension and new error codes. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-6-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-22drivers/perf: riscv: Use BIT macro for shifting operationsAtish Patra1-10/+10
It is a good practice to use BIT() instead of (1 << x). Replace the current usages with BIT(). Take this opportunity to replace few (1UL << x) with BIT() as well for consistency. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-5-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-22RISC-V: Add FIRMWARE_READ_HI definitionAtish Patra1-0/+1
SBI v2.0 added another function to SBI PMU extension to read the upper bits of a counter with width larger than XLEN. Add the definition for that function. Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Clément Léger <cleger@rivosinc.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-3-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-22RISC-V: Fix the typo in Scountovf CSR nameAtish Patra1-1/+1
The counter overflow CSR name is "scountovf" not "sscountovf". Fix the csr name. Fixes: 4905ec2fb7e6 ("RISC-V: Add sscofpmf extension support") Reviewed-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240420151741.962500-2-atishp@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-22RISCV: KVM: Introduce vcpu->reset_cntx_lockYong-Xuan Wang1-0/+1
Originally, the use of kvm->lock in SBI_EXT_HSM_HART_START also avoids the simultaneous updates to the reset context of target VCPU. Since this lock has been replace with vcpu->mp_state_lock, and this new lock also protects the vcpu->mp_state. We have to add a separate lock for vcpu->reset_cntx. Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240417074528.16506-3-yongxuan.wang@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-22RISCV: KVM: Introduce mp_state_lock to avoid lock inversionYong-Xuan Wang1-2/+6
Documentation/virt/kvm/locking.rst advises that kvm->lock should be acquired outside vcpu->mutex and kvm->srcu. However, when KVM/RISC-V handling SBI_EXT_HSM_HART_START, the lock ordering is vcpu->mutex, kvm->srcu then kvm->lock. Although the lockdep checking no longer complains about this after commit f0f44752f5f6 ("rcu: Annotate SRCU's update-side lockdep dependencies"), it's necessary to replace kvm->lock with a new dedicated lock to ensure only one hart can execute the SBI_EXT_HSM_HART_START call for the target hart simultaneously. Additionally, this patch also rename "power_off" to "mp_state" with two possible values. The vcpu->mp_state_lock also protects the access of vcpu->mp_state. Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240417074528.16506-2-yongxuan.wang@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-04-18riscv: Include riscv_set_icache_flush_ctx prctlCharlie Jenkins3-0/+35
Support new prctl with key PR_RISCV_SET_ICACHE_FLUSH_CTX to enable optimization of cross modifying code. This prctl enables userspace code to use icache flushing instructions such as fence.i with the guarantee that the icache will continue to be clean after thread migration. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240312-fencei-v13-2-4b6bdc2bbf32@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>