summaryrefslogtreecommitdiff
path: root/arch/riscv
AgeCommit message (Collapse)AuthorFilesLines
2024-01-02Merge tag 'loongarch-kvm-6.8' of ↵Paolo Bonzini16-87/+112
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.8 1. Optimization for memslot hugepage checking. 2. Cleanup and fix some HW/SW timer issues. 3. Add LSX/LASX (128bit/256bit SIMD) support.
2023-12-30RISC-V: KVM: Implement SBI STA extensionAndrew Jones2-2/+95
Add a select SCHED_INFO to the KVM config in order to get run_delay info. Then implement SBI STA's set-steal-time-shmem function and kvm_riscv_vcpu_record_steal_time() to provide the steal-time info to guests. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30RISC-V: KVM: Add support for SBI STA registersAndrew Jones5-14/+97
KVM userspace needs to be able to save and restore the steal-time shared memory address. Provide the address through the get/set-one-reg interface with two ulong-sized SBI STA extension registers (lo and hi). 64-bit KVM userspace must not set the hi register to anything other than zero and is allowed to completely neglect saving/restoring it. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30RISC-V: KVM: Add support for SBI extension registersAndrew Jones4-4/+103
Some SBI extensions have state that needs to be saved / restored when migrating the VM. Provide a get/set-one-reg register type for SBI extension registers. Each SBI extension that uses this type will have its own subtype. There are currently no subtypes defined. The next patch introduces the first one. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30RISC-V: KVM: Add SBI STA info to vcpu_archAndrew Jones3-0/+19
KVM's implementation of SBI STA needs to track the address of each VCPU's steal-time shared memory region as well as the amount of stolen time. Add a structure to vcpu_arch to contain this state and make sure that the address is always set to INVALID_GPA on vcpu reset. And, of course, ensure KVM won't try to update steal- time when the shared memory address is invalid. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30RISC-V: KVM: Add steal-update vcpu requestAndrew Jones3-0/+12
Add a new vcpu request to inform a vcpu that it should record its steal-time information. The request is made each time it has been detected that the vcpu task was not assigned a cpu for some time, which is easy to do by making the request from vcpu-load. The record function is just a stub for now and will be filled in with the rest of the steal-time support functions in following patches. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30RISC-V: KVM: Add SBI STA extension skeletonAndrew Jones5-0/+54
Add the files and functions needed to support the SBI STA (steal-time accounting) extension. In the next patches we'll complete the functions to fully enable SBI STA support. Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30RISC-V: paravirt: Implement steal-time supportAndrew Jones2-3/+78
When the SBI STA extension exists we can use it to implement paravirt steal-time support. Fill in the empty pv-time functions with an SBI STA implementation and add the Kconfig knobs allowing it to be enabled. Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30RISC-V: Add SBI STA extension definitionsAndrew Jones1-0/+17
The SBI STA extension enables steal-time accounting. Add the definitions it specifies. Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-30RISC-V: paravirt: Add skeleton for pv-time supportAndrew Jones5-0/+112
Add the files and functions needed to support paravirt time on RISC-V. Also include the common code needed for the first application of pv-time, which is steal-time. In the next patches we'll complete the functions to fully enable steal-time support. Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29arch/mm/fault: fix major fault accounting when retrying under per-VMA lockSuren Baghdasaryan1-0/+2
A test [1] in Android test suite started failing after [2] was merged. It turns out that after handling a major fault under per-VMA lock, the process major fault counter does not register that fault as major. Before [2] read faults would be done under mmap_lock, in which case FAULT_FLAG_TRIED flag is set before retrying. That in turn causes mm_account_fault() to account the fault as major once retry completes. With per-VMA locks we often retry because a fault can't be handled without locking the whole mm using mmap_lock. Therefore such retries do not set FAULT_FLAG_TRIED flag. This logic does not work after [2] because we can now handle read major faults under per-VMA lock and upon retry the fact there was a major fault gets lost. Fix this by setting FAULT_FLAG_TRIED after retrying under per-VMA lock if VM_FAULT_MAJOR was returned. Ideally we would use an additional VM_FAULT bit to indicate the reason for the retry (could not handle under per-VMA lock vs other reason) but this simpler solution seems to work, so keeping it simple. [1] https://cs.android.com/android/platform/superproject/+/master:test/vts-testcase/kernel/api/drop_caches_prop/drop_caches_test.cpp [2] https://lore.kernel.org/all/20231006195318.4087158-6-willy@infradead.org/ Link: https://lkml.kernel.org/r/20231226214610.109282-1-surenb@google.com Fixes: 12214eba1992 ("mm: handle read faults under the VMA lock") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr()Anup Patel1-1/+1
The indentation of "break" in kvm_riscv_vcpu_set_reg_csr() is inconsistent hence let us fix it. Fixes: c04913f2b54e ("RISCV: KVM: Add sstateen0 to ONE_REG") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312190719.kBuYl6oJ-lkp@intel.com/ Signed-off-by: Anup Patel <apatel@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29RISC-V: KVM: add vector registers and CSRs in KVM_GET_REG_LISTDaniel Henrique Barboza1-0/+55
Add all vector registers and CSRs (vstart, vl, vtype, vcsr, vlenb) in get-reg-list. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29RISC-V: KVM: add 'vlenb' Vector CSRDaniel Henrique Barboza1-0/+15
Userspace requires 'vlenb' to be able to encode it in reg ID. Otherwise it is not possible to retrieve any vector reg since we're returning EINVAL if reg_size isn't vlenb (see kvm_riscv_vcpu_vreg_addr()). Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29RISC-V: KVM: set 'vlenb' in kvm_riscv_vcpu_alloc_vector_context()Daniel Henrique Barboza1-0/+1
'vlenb', added to riscv_v_ext_state by commit c35f3aa34509 ("RISC-V: vector: export VLENB csr in __sc_riscv_v_state"), isn't being initialized in guest_context. If we export 'vlenb' as a KVM CSR, something we want to do in the next patch, it'll always return 0. Set 'vlenb' to riscv_v_size/32. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29RISC-V: KVM: Make SBI uapi consistent with ISA uapiAndrew Jones4-45/+65
When an SBI extension cannot be enabled, that's a distinct state vs. enabled and disabled. Modify enum kvm_riscv_sbi_ext_status to accommodate it, which allows KVM userspace to tell the difference in state too, as the SBI extension register will disappear when it cannot be enabled, i.e. accesses to it return ENOENT. get-reg-list is updated as well to only add SBI extension registers to the list which may be enabled. Returning ENOENT for SBI extension registers which cannot be enabled makes them consistent with ISA extension registers. Any SBI extensions which were enabled by default are still enabled by default, if they can be enabled at all. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29RISC-V: KVM: Don't add SBI multi regs in get-reg-listAndrew Jones1-34/+2
The multi regs are derived from the single registers. Only list the single registers in get-reg-list. This also makes the SBI extension register listing consistent with the ISA extension register listing. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29RISC-V: KVM: remove a redundant condition in kvm_arch_vcpu_ioctl_run()Chao Du1-2/+1
The latest ret value is updated by kvm_riscv_vcpu_aia_update(), the loop will continue if the ret is less than or equal to zero. So the later condition will never hit. Thus remove it. Signed-off-by: Chao Du <duchao@eswincomputing.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29riscv: kvm: use ".L" local labels in assembly when applicableClément Léger1-2/+2
For the sake of coherency, use local labels in assembly when applicable. This also avoid kprobes being confused when applying a kprobe since the size of function is computed by checking where the next visible symbol is located. This might end up in computing some function size to be way shorter than expected and thus failing to apply kprobes to the specified offset. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-29riscv: kvm: Use SYM_*() assembly macros instead of deprecated onesClément Léger1-16/+12
ENTRY()/END()/WEAK() macros are deprecated and we should make use of the new SYM_*() macros [1] for better annotation of symbols. Replace the deprecated ones with the new ones and fix wrong usage of END()/ENDPROC() to correctly describe the symbols. [1] https://docs.kernel.org/core-api/asm-annotations.html Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-28Merge tag 'mm-hotfixes-stable-2023-12-27-15-00' of ↵Linus Torvalds1-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "11 hotfixes. 7 are cc:stable and the other 4 address post-6.6 issues or are not considered backporting material" * tag 'mm-hotfixes-stable-2023-12-27-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mailmap: add an old address for Naoya Horiguchi mm/memory-failure: cast index to loff_t before shifting it mm/memory-failure: check the mapcount of the precise page mm/memory-failure: pass the folio and the page to collect_procs() selftests: secretmem: floor the memory size to the multiple of page_size mm: migrate high-order folios in swap cache correctly maple_tree: do not preallocate nodes for slot stores mm/filemap: avoid buffered read/write race to read inconsistent data kunit: kasan_test: disable fortify string checker on kmalloc_oob_memset kexec: select CRYPTO from KEXEC_FILE instead of depending on it kexec: fix KEXEC_FILE dependencies
2023-12-23sched/topology: Add a new arch_scale_freq_ref() methodVincent Guittot1-0/+1
Create a new method to get a unique and fixed max frequency. Currently cpuinfo.max_freq or the highest (or last) state of performance domain are used as the max frequency when computing the frequency for a level of utilization, but: - cpuinfo_max_freq can change at runtime. boost is one example of such change. - cpuinfo.max_freq and last item of the PD can be different leading to different results between cpufreq and energy model. We need to save the reference frequency that has been used when computing the CPUs capacity and use this fixed and coherent value to convert between frequency and CPU's capacity. In fact, we already save the frequency that has been used when computing the capacity of each CPU. We extend the precision to save kHz instead of MHz currently and we modify the type to be aligned with other variables used when converting frequency to capacity and the other way. [ mingo: Minor edits. ] Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20231211104855.558096-2-vincent.guittot@linaro.org
2023-12-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+13
Pull kvm fixes from Paolo Bonzini: "RISC-V: - Fix a race condition in updating external interrupt for trap-n-emulated IMSIC swfile - Fix print_reg defaults in get-reg-list selftest ARM: - Ensure a vCPU's redistributor is unregistered from the MMIO bus if vCPU creation fails - Fix building KVM selftests for arm64 from the top-level Makefile x86: - Fix breakage for SEV-ES guests that use XSAVES Selftests: - Fix bad use of strcat(), by not using strcat() at all" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SEV: Do not intercept accesses to MSR_IA32_XSS for SEV-ES guests KVM: selftests: Fix dynamic generation of configuration names RISCV: KVM: update external interrupt atomically for IMSIC swfile KVM: riscv: selftests: Fix get-reg-list print_reg defaults KVM: selftests: Ensure sysreg-defs.h is generated at the expected path KVM: Convert comment into an assertion in kvm_io_bus_register_dev() KVM: arm64: vgic: Ensure that slots_lock is held in vgic_register_all_redist_iodevs() KVM: arm64: vgic: Force vcpu vgic teardown on vcpu destroy KVM: arm64: vgic: Add a non-locking primitive for kvm_vgic_vcpu_destroy() KVM: arm64: vgic: Simplify kvm_vgic_destroy()
2023-12-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni4-9/+6
Cross-merge networking fixes after downstream PR. Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c 23c93c3b6275 ("bnxt_en: do not map packet buffers twice") 6d1add95536b ("bnxt_en: Modify TX ring indexing logic.") tools/testing/selftests/net/Makefile 2258b666482d ("selftests: add vlan hw filter tests") a0bc96c0cd6e ("selftests: net: verify fq per-band packet limit") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21Merge tag 'riscv-dt-for-v6.8' of ↵Arnd Bergmann12-116/+560
https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into soc/dt RISC-V Devicetrees for v6.8 StarFive: Key peripheral support for the jh7100 that depended on the non-standard non-coherent DMA operations, namely mmc, sdcard and sdio wifi. This platform has long been supported out of tree by Emil and Ubuntu etc ship images for it, so having mainline support for a wider range of peripherals (at last) is great. Microchip: The flash used by Auto Update support and the corresponding QSPI controller are added. On publicly available Icicle kits this flash is not usable (engineering sample silicon issues) but in the future Icicle kits will be available that have production silicon. T-Head: Jisheng is busy with RL this cycle and hence T-Head appears here. The Lichee Pi and BeagleV both grow eMMC and uSD support. Sopgho: Support for the Huashan Pi and the cv1812h SoC it uses. The cv1812h is almost identical to the existing cv1800b SoC. These SoCs are intended for use in IP camera type systems but also appear on SBCs, with the last digit denoting the amount integrated DDR3 the device has. The difference between the cv1812h and the existing cv180x devices appears to be the addition of video output interfaces. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> * tag 'riscv-dt-for-v6.8' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux: riscv: dts: starfive: Enable SDIO wifi on JH7100 boards riscv: dts: starfive: Enable SD-card on JH7100 boards riscv: dts: starfive: Add JH7100 MMC nodes riscv: dts: starfive: Add pool for coherent DMA memory on JH7100 boards riscv: dts: starfive: Add JH7100 cache controller riscv: dts: starfive: Mark the JH7100 as having non-coherent DMAs riscv: dts: starfive: Group tuples in interrupt properties riscv: dts: thead: Enable LicheePi 4A eMMC and microSD riscv: dts: thead: Enable BeagleV Ahead eMMC and microSD riscv: dts: thead: Add TH1520 mmc controllers and sdhci clock riscv: dts: microchip: add the mpfs' system controller qspi & associated flash riscv: dts: sophgo: add Huashan Pi board device tree riscv: dts: sophgo: add initial CV1812H SoC device tree riscv: dts: sophgo: cv18xx: Add gpio devices riscv: dts: sophgo: Separate compatible specific for CV1800B soc dt-bindings: riscv: Add SOPHGO Huashan Pi board compatibles dt-bindings: timer: Add SOPHGO CV1812H clint dt-bindings: interrupt-controller: Add SOPHGO CV1812H plic Link: https://lore.kernel.org/r/20231221-skimmed-boxy-b78aed8afdc4@spud Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-12-21posix-timers: Get rid of [COMPAT_]SYS_NI() usesLinus Torvalds1-5/+0
Only the posix timer system calls use this (when the posix timer support is disabled, which does not actually happen in any normal case), because they had debug code to print out a warning about missing system calls. Get rid of that special case, and just use the standard COND_SYSCALL interface that creates weak system call stubs that return -ENOSYS for when the system call does not exist. This fixes a kCFI issue with the SYS_NI() hackery: CFI failure at int80_emulation+0x67/0xb0 (target: sys_ni_posix_timers+0x0/0x70; expected type: 0xb02b34d9) WARNING: CPU: 0 PID: 48 at int80_emulation+0x67/0xb0 Reported-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Sami Tolvanen <samitolvanen@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-21riscv, kexec: fix the ifdeffery for AFLAGS_kexec_relocate.oBaoquan He1-1/+1
This was introduced in commit fba8a8674f68 ("RISC-V: Add kexec support"). It should work on CONFIG_KEXEC_CORE, but not CONFIG_KEXEC only, since we could set CONFIG_KEXEC_FILE=y and CONFIG_KEXEC=N, or only set CONFIG_CRASH_DUMP=y and disable both CONFIG_KEXEC and CONFIG_KEXEC_FILE. In these cases, the AFLAGS won't take effect with the current ifdeffery for AFLAGS_kexec_relocate.o. So fix it now. Link: https://lkml.kernel.org/r/20231201062538.27240-1-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Changbin Du <changbin.du@intel.com> Cc: Nick Kossifidis <mick@ics.forth.gr> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-21kexec_file, riscv: print out debugging message if requiredBaoquan He2-31/+6
Then when specifying '-d' for kexec_file_load interface, loaded locations of kernel/initrd/cmdline etc can be printed out to help debug. Here replace pr_debug() with the newly added kexec_dprintk() in kexec_file loading related codes. And also replace pr_notice() with kexec_dprintk() in elf_kexec_load() because loaded location of purgatory and device tree are only printed out for debugging, it doesn't make sense to always print them out. And also remove kexec_image_info() because the content has been printed out in generic code. Link: https://lkml.kernel.org/r/20231213055747.61826-6-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Conor Dooley <conor@kernel.org> Cc: Joe Perches <joe@perches.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-21kexec: fix KEXEC_FILE dependenciesArnd Bergmann1-3/+1
The cleanup for the CONFIG_KEXEC Kconfig logic accidentally changed the 'depends on CRYPTO=y' dependency to a plain 'depends on CRYPTO', which causes a link failure when all the crypto support is in a loadable module and kexec_file support is built-in: x86_64-linux-ld: vmlinux.o: in function `__x64_sys_kexec_file_load': (.text+0x32e30a): undefined reference to `crypto_alloc_shash' x86_64-linux-ld: (.text+0x32e58e): undefined reference to `crypto_shash_update' x86_64-linux-ld: (.text+0x32e6ee): undefined reference to `crypto_shash_final' Both s390 and x86 have this problem, while ppc64 and riscv have the correct dependency already. On riscv, the dependency is only used for the purgatory, not for the kexec_file code itself, which may be a bit surprising as it means that with CONFIG_CRYPTO=m, it is possible to enable KEXEC_FILE but then the purgatory code is silently left out. Move this into the common Kconfig.kexec file in a way that is correct everywhere, using the dependency on CRYPTO_SHA256=y only when the purgatory code is available. This requires reversing the dependency between ARCH_SUPPORTS_KEXEC_PURGATORY and KEXEC_FILE, but the effect remains the same, other than making riscv behave like the other ones. On s390, there is an additional dependency on CRYPTO_SHA256_S390, which should technically not be required but gives better performance. Remove this dependency here, noting that it was not present in the initial Kconfig code but was brought in without an explanation in commit 71406883fd357 ("s390/kexec_file: Add kexec_file_load system call"). [arnd@arndb.de: fix riscv build] Link: https://lkml.kernel.org/r/67ddd260-d424-4229-a815-e3fcfb864a77@app.fastmail.com Link: https://lkml.kernel.org/r/20231023110308.1202042-1-arnd@kernel.org Fixes: 6af5138083005 ("x86/kexec: refactor for kernel/Kconfig.kexec") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Eric DeVolder <eric_devolder@yahoo.com> Tested-by: Eric DeVolder <eric_devolder@yahoo.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Conor Dooley <conor@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-20Merge patch series "riscv: Use READ_ONCE()/WRITE_ONCE() for pte accesses"Palmer Dabbelt11-120/+134
Alexandre Ghiti <alexghiti@rivosinc.com> says: This series is a follow-up for riscv of a recent series from Ryan [1] which converts all direct dereferences of pte_t into a ptet_get() access. The goal here for riscv is to use READ_ONCE()/WRITE_ONCE() for all page table entries accesses to avoid any compiler transformation when the hardware can concurrently modify the page tables entries (A/D bits for example). I went a bit further and added pud/p4d/pgd_get() helpers as such concurrent modifications can happen too at those levels. [1] https://lore.kernel.org/all/20230612151545.3317766-1-ryan.roberts@arm.com/ * b4-shazam-merge: riscv: Use accessors to page table entries instead of direct dereference riscv: mm: Only compile pgtable.c if MMU mm: Introduce pudp/p4dp/pgdp_get() functions riscv: Use WRITE_ONCE() when setting page table entries Link: https://lore.kernel.org/r/20231213203001.179237-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-12-20riscv: Use accessors to page table entries instead of direct dereferenceAlexandre Ghiti10-113/+128
As very well explained in commit 20a004e7b017 ("arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables"), an architecture whose page table walker can modify the PTE in parallel must use READ_ONCE()/WRITE_ONCE() macro to avoid any compiler transformation. So apply that to riscv which is such architecture. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Acked-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20231213203001.179237-5-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-12-20riscv: mm: Only compile pgtable.c if MMUAlexandre Ghiti1-2/+1
All functions defined in there depend on MMU, so no need to compile it for !MMU configs. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231213203001.179237-4-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-12-20riscv: Use WRITE_ONCE() when setting page table entriesAlexandre Ghiti2-5/+5
To avoid any compiler "weirdness" when accessing page table entries which are concurrently modified by the HW, let's use WRITE_ONCE() macro (commit 20a004e7b017 ("arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables") gives a great explanation with more details). Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231213203001.179237-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-12-19Merge tag 'for-netdev' of ↵Jakub Kicinski3-12/+18
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2023-12-18 This PR is larger than usual and contains changes in various parts of the kernel. The main changes are: 1) Fix kCFI bugs in BPF, from Peter Zijlstra. End result: all forms of indirect calls from BPF into kernel and from kernel into BPF work with CFI enabled. This allows BPF to work with CONFIG_FINEIBT=y. 2) Introduce BPF token object, from Andrii Nakryiko. It adds an ability to delegate a subset of BPF features from privileged daemon (e.g., systemd) through special mount options for userns-bound BPF FS to a trusted unprivileged application. The design accommodates suggestions from Christian Brauner and Paul Moore. Example: $ sudo mkdir -p /sys/fs/bpf/token $ sudo mount -t bpf bpffs /sys/fs/bpf/token \ -o delegate_cmds=prog_load:MAP_CREATE \ -o delegate_progs=kprobe \ -o delegate_attachs=xdp 3) Various verifier improvements and fixes, from Andrii Nakryiko, Andrei Matei. - Complete precision tracking support for register spills - Fix verification of possibly-zero-sized stack accesses - Fix access to uninit stack slots - Track aligned STACK_ZERO cases as imprecise spilled registers. It improves the verifier "instructions processed" metric from single digit to 50-60% for some programs. - Fix verifier retval logic 4) Support for VLAN tag in XDP hints, from Larysa Zaremba. 5) Allocate BPF trampoline via bpf_prog_pack mechanism, from Song Liu. End result: better memory utilization and lower I$ miss for calls to BPF via BPF trampoline. 6) Fix race between BPF prog accessing inner map and parallel delete, from Hou Tao. 7) Add bpf_xdp_get_xfrm_state() kfunc, from Daniel Xu. It allows BPF interact with IPSEC infra. The intent is to support software RSS (via XDP) for the upcoming ipsec pcpu work. Experiments on AWS demonstrate single tunnel pcpu ipsec reaching line rate on 100G ENA nics. 8) Expand bpf_cgrp_storage to support cgroup1 non-attach, from Yafang Shao. 9) BPF file verification via fsverity, from Song Liu. It allows BPF progs get fsverity digest. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (164 commits) bpf: Ensure precise is reset to false in __mark_reg_const_zero() selftests/bpf: Add more uprobe multi fail tests bpf: Fail uprobe multi link with negative offset selftests/bpf: Test the release of map btf s390/bpf: Fix indirect trampoline generation selftests/bpf: Temporarily disable dummy_struct_ops test on s390 x86/cfi,bpf: Fix bpf_exception_cb() signature bpf: Fix dtor CFI cfi: Add CFI_NOSEAL() x86/cfi,bpf: Fix bpf_struct_ops CFI x86/cfi,bpf: Fix bpf_callback_t CFI x86/cfi,bpf: Fix BPF JIT call cfi: Flip headers selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment bpf: Limit the number of kprobes when attaching program to multiple kprobes bpf: Limit the number of uprobes when attaching program to multiple uprobes bpf: xdp: Register generic_kfunc_set with XDP programs selftests/bpf: utilize string values for delegate_xxx mount options ... ==================== Link: https://lore.kernel.org/r/20231219000520.34178-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-17riscv: errata: Make ERRATA_STARFIVE_JH7100 depend on !DMA_DIRECT_REMAPEmil Renner Berthing1-1/+3
Similar to the Renesas RZ/Five[1] the JH7100 SoC needs the non-portable CONFIG_DMA_GLOBAL_POOL enabled which is incompatible with DMA_DIRECT_REMAP selected by RISCV_ISA_ZICBOM. [1]: commit 31b2daea0764 ("soc: renesas: Make RZ/Five depend on !DMA_DIRECT_REMAP") Link: https://lore.kernel.org/all/24942b4d-d16a-463f-b39a-f9dfcb89d742@infradead.org/ Fixes: 64fc984a8a54 ("riscv: errata: Add StarFive JH7100 errata") Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2023-12-16cfi: Flip headersPeter Zijlstra2-2/+3
Normal include order is that linux/foo.h should include asm/foo.h, CFI has it the wrong way around. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20231215092707.231038174@infradead.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-15Merge tag 'mm-hotfixes-stable-2023-12-15-07-11' of ↵Linus Torvalds3-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "17 hotfixes. 8 are cc:stable and the other 9 pertain to post-6.6 issues" * tag 'mm-hotfixes-stable-2023-12-15-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/mglru: reclaim offlined memcgs harder mm/mglru: respect min_ttl_ms with memcgs mm/mglru: try to stop at high watermarks mm/mglru: fix underprotected page cache mm/shmem: fix race in shmem_undo_range w/THP Revert "selftests: error out if kernel header files are not yet built" crash_core: fix the check for whether crashkernel is from high memory x86, kexec: fix the wrong ifdeffery CONFIG_KEXEC sh, kexec: fix the incorrect ifdeffery and dependency of CONFIG_KEXEC mips, kexec: fix the incorrect ifdeffery and dependency of CONFIG_KEXEC m68k, kexec: fix the incorrect ifdeffery and build dependency of CONFIG_KEXEC loongarch, kexec: change dependency of object files mm/damon/core: make damon_start() waits until kdamond_fn() starts selftests/mm: cow: print ksft header before printing anything else mm: fix VMA heap bounds checking riscv: fix VMALLOC_START definition kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMP
2023-12-14riscv: Enable pcpu page first chunk allocatorAlexandre Ghiti2-0/+10
As explained in commit 6ea529a2037c ("percpu: make embedding first chunk allocator check vmalloc space size"), the embedding first chunk allocator needs the vmalloc space to be larger than the maximum distance between units which are grouped into NUMA nodes. On a very sparse NUMA configurations and a small vmalloc area (for example, it is 64GB in sv39), the allocation of dynamic percpu data in the vmalloc area could fail. So provide the pcpu page allocator as a fallback in case we fall into such a sparse configuration (which happened in arm64 as shown by commit 09cea6195073 ("arm64: support page mapping percpu first chunk allocator")). Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Dennis Zhou <dennis@kernel.org>
2023-12-14mm: Introduce flush_cache_vmap_early()Alexandre Ghiti3-1/+8
The pcpu setup when using the page allocator sets up a new vmalloc mapping very early in the boot process, so early that it cannot use the flush_cache_vmap() function which may depend on structures not yet initialized (for example in riscv, we currently send an IPI to flush other cpus TLB). But on some architectures, we must call flush_cache_vmap(): for example, in riscv, some uarchs can cache invalid TLB entries so we need to flush the new established mapping to avoid taking an exception. So fix this by introducing a new function flush_cache_vmap_early() which is called right after setting the new page table entry and before accessing this new mapping. This new function implements a local flush tlb on riscv and is no-op for other architectures (same as today). Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dennis Zhou <dennis@kernel.org>
2023-12-13riscv: dts: starfive: Enable SDIO wifi on JH7100 boardsEmil Renner Berthing1-0/+60
Add pinctrl and MMC controller nodes for the Broadcom wifi controller on the BeagleV Starlight and StarFive VisionFive V1 boards. Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2023-12-13riscv: dts: starfive: Enable SD-card on JH7100 boardsEmil Renner Berthing1-0/+47
Add pinctrl and MMC device tree nodes for the SD-card on the BeagleV Starlight and StarFive VisionFive V1 boards. Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2023-12-13riscv: dts: starfive: Add JH7100 MMC nodesEmil Renner Berthing1-0/+26
Add device tree nodes for the Synopsis MMC controllers on the StarFive JH7100 SoC. Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2023-12-13riscv: dts: starfive: Add pool for coherent DMA memory on JH7100 boardsEmil Renner Berthing1-0/+24
The StarFive JH7100 SoC has non-coherent device DMAs, but most drivers expect to be able to allocate coherent memory for DMA descriptors and such. However on the JH7100 DDR memory appears twice in the physical memory map, once cached and once uncached: 0x00_8000_0000 - 0x08_7fff_ffff : Off chip DDR memory, cached 0x10_0000_0000 - 0x17_ffff_ffff : Off chip DDR memory, uncached To use this uncached region we create a global DMA memory pool there and reserve the corresponding area in the cached region. However the uncached region is fully above the 32bit address limit, so add a dma-ranges map so the DMA address used for peripherals is still in the regular cached region below the limit. Link: https://github.com/starfive-tech/JH7100_Docs/blob/main/JH7100%20Data%20Sheet%20V01.01.04-EN%20(4-21-2021).pdf Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2023-12-13riscv: dts: starfive: Add JH7100 cache controllerEmil Renner Berthing1-0/+13
The StarFive JH7100 SoC also features the SiFive L2 cache controller, so add the device tree nodes for it. Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2023-12-13riscv: dts: starfive: Mark the JH7100 as having non-coherent DMAsEmil Renner Berthing1-0/+1
The StarFive JH7100 SoC has non-coherent device DMAs, so mark the soc bus as such. Link: https://github.com/starfive-tech/JH7100_Docs/blob/main/JH7100%20Cache%20Coherence%20V1.0.pdf Signed-off-by: Emil Renner Berthing <kernel@esmil.dk> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2023-12-13riscv: dts: starfive: Group tuples in interrupt propertiesGeert Uytterhoeven1-4/+4
To improve human readability and enable automatic validation, the tuples in the various properties containing interrupt specifiers should be grouped. Fix this by grouping the tuples of "interrupts-extended" properties using angle brackets. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2023-12-13riscv: errata: Add StarFive JH7100 errataEmil Renner Berthing1-0/+17
This not really an errata, but since the JH7100 was made before the standard Zicbom extension it needs the DMA_GLOBAL_POOL and RISCV_NONSTANDARD_CACHE_OPS enabled to work correctly. Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2023-12-13RISCV: KVM: update external interrupt atomically for IMSIC swfileYong-Xuan Wang1-0/+13
The emulated IMSIC update the external interrupt pending depending on the value of eidelivery and topei. It might lose an interrupt when it is interrupted before setting the new value to the pending status. For example, when VCPU0 sends an IPI to VCPU1 via IMSIC: VCPU0 VCPU1 CSRSWAP topei = 0 The VCPU1 has claimed all the external interrupt in its interrupt handler. topei of VCPU1's IMSIC = 0 set pending in VCPU1's IMSIC topei of VCPU1' IMSIC = 1 set the external interrupt pending of VCPU1 clear the external interrupt pending of VCPU1 When the VCPU1 switches back to VS mode, it exits the interrupt handler because the result of CSRSWAP topei is 0. If there are no other external interrupts injected into the VCPU1's IMSIC, VCPU1 will never know this pending interrupt unless it initiative read the topei. If the interruption occurs between updating interrupt pending in IMSIC and updating external interrupt pending of VCPU, it will not cause a problem. Suppose that the VCPU1 clears the IPI pending in IMSIC right after VCPU0 sets the pending, the external interrupt pending of VCPU1 will not be set because the topei is 0. But when the VCPU1 goes back to VS mode, the pending IPI will be reported by the CSRSWAP topei, it will not lose this interrupt. So we only need to make the external interrupt updating procedure as a critical section to avoid the problem. Fixes: db8b7e97d613 ("RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC") Tested-by: Roy Lin <roy.lin@sifive.com> Tested-by: Wayling Chen <wayling.chen@sifive.com> Co-developed-by: Vincent Chen <vincent.chen@sifive.com> Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Signed-off-by: Anup Patel <anup@brainfault.org>
2023-12-13riscv: fix VMALLOC_START definitionBaoquan He1-1/+1
When below config items are set, compiler complained: -------------------- CONFIG_CRASH_CORE=y CONFIG_KEXEC_CORE=y CONFIG_CRASH_DUMP=y ...... ----------------------- ------------------------------------------------------------------- arch/riscv/kernel/crash_core.c: In function 'arch_crash_save_vmcoreinfo': arch/riscv/kernel/crash_core.c:11:58: warning: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'int' [-Wformat=] 11 | vmcoreinfo_append_str("NUMBER(VMALLOC_START)=0x%lx\n", VMALLOC_START); | ~~^ | | | long unsigned int | %x ---------------------------------------------------------------------- This is because on riscv macro VMALLOC_START has different type when CONFIG_MMU is set or unset. arch/riscv/include/asm/pgtable.h: -------------------------------------------------- Changing it to _AC(0, UL) in case CONFIG_MMU=n can fix the warning. Link: https://lkml.kernel.org/r/ZW7OsX4zQRA3mO4+@MiWiFi-R3L-srv Signed-off-by: Baoquan He <bhe@redhat.com> Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Cc: Eric DeVolder <eric_devolder@yahoo.com> Cc: Ignat Korchagin <ignat@cloudflare.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-13kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMPIgnat Korchagin2-3/+5
In commit f8ff23429c62 ("kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP") we tried to fix a config regression, where CONFIG_CRASH_DUMP required CONFIG_KEXEC. However, it was not enough at least for arm64 platforms. While further testing the patch with our arm64 config I noticed that CONFIG_CRASH_DUMP is unavailable in menuconfig. This is because CONFIG_CRASH_DUMP still depends on the new CONFIG_ARCH_SUPPORTS_KEXEC introduced in commit 91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec") and on arm64 CONFIG_ARCH_SUPPORTS_KEXEC requires CONFIG_PM_SLEEP_SMP=y, which in turn requires either CONFIG_SUSPEND=y or CONFIG_HIBERNATION=y neither of which are set in our config. Given that we already established that CONFIG_KEXEC (which is a switch for kexec system call itself) is not required for CONFIG_CRASH_DUMP drop CONFIG_ARCH_SUPPORTS_KEXEC dependency as well. The arm64 kernel builds just fine with CONFIG_CRASH_DUMP=y and with both CONFIG_KEXEC=n and CONFIG_KEXEC_FILE=n after f8ff23429c62 ("kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP") and this patch are applied given that the necessary shared bits are included via CONFIG_KEXEC_CORE dependency. [bhe@redhat.com: don't export some symbols when CONFIG_MMU=n] Link: https://lkml.kernel.org/r/ZW03ODUKGGhP1ZGU@MiWiFi-R3L-srv [bhe@redhat.com: riscv, kexec: fix dependency of two items] Link: https://lkml.kernel.org/r/ZW04G/SKnhbE5mnX@MiWiFi-R3L-srv Link: https://lkml.kernel.org/r/20231129220409.55006-1-ignat@cloudflare.com Fixes: 91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec") Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: <stable@vger.kernel.org> # 6.6+: f8ff234: kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP Cc: <stable@vger.kernel.org> # 6.6+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org>