summaryrefslogtreecommitdiff
path: root/arch/arm64/kvm
AgeCommit message (Collapse)AuthorFilesLines
2024-01-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds20-252/+511
Pull kvm updates from Paolo Bonzini: "Generic: - Use memdup_array_user() to harden against overflow. - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures. - Clean up Kconfigs that all KVM architectures were selecting - New functionality around "guest_memfd", a new userspace API that creates an anonymous file and returns a file descriptor that refers to it. guest_memfd files are bound to their owning virtual machine, cannot be mapped, read, or written by userspace, and cannot be resized. guest_memfd files do however support PUNCH_HOLE, which can be used to switch a memory area between guest_memfd and regular anonymous memory. - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify per-page attributes for a given page of guest memory; right now the only attribute is whether the guest expects to access memory via guest_memfd or not, which in Confidential SVMs backed by SEV-SNP, TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM). x86: - Support for "software-protected VMs" that can use the new guest_memfd and page attributes infrastructure. This is mostly useful for testing, since there is no pKVM-like infrastructure to provide a meaningfully reduced TCB. - Fix a relatively benign off-by-one error when splitting huge pages during CLEAR_DIRTY_LOG. - Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE. - Use more generic lockdep assertions in paths that don't actually care about whether the caller is a reader or a writer. - let Xen guests opt out of having PV clock reported as "based on a stable TSC", because some of them don't expect the "TSC stable" bit (added to the pvclock ABI by KVM, but never set by Xen) to be set. - Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL. - Advertise flush-by-ASID support for nSVM unconditionally, as KVM always flushes on nested transitions, i.e. always satisfies flush requests. This allows running bleeding edge versions of VMware Workstation on top of KVM. - Sanity check that the CPU supports flush-by-ASID when enabling SEV support. - On AMD machines with vNMI, always rely on hardware instead of intercepting IRET in some cases to detect unmasking of NMIs - Support for virtualizing Linear Address Masking (LAM) - Fix a variety of vPMU bugs where KVM fail to stop/reset counters and other state prior to refreshing the vPMU model. - Fix a double-overflow PMU bug by tracking emulated counter events using a dedicated field instead of snapshotting the "previous" counter. If the hardware PMC count triggers overflow that is recognized in the same VM-Exit that KVM manually bumps an event count, KVM would pend PMIs for both the hardware-triggered overflow and for KVM-triggered overflow. - Turn off KVM_WERROR by default for all configs so that it's not inadvertantly enabled by non-KVM developers, which can be problematic for subsystems that require no regressions for W=1 builds. - Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL "features". - Don't force a masterclock update when a vCPU synchronizes to the current TSC generation, as updating the masterclock can cause kvmclock's time to "jump" unexpectedly, e.g. when userspace hotplugs a pre-created vCPU. - Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths, partly as a super minor optimization, but mostly to make KVM play nice with position independent executable builds. - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on CONFIG_HYPERV as a minor optimization, and to self-document the code. - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation" at build time. ARM64: - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base granule sizes. Branch shared with the arm64 tree. - Large Fine-Grained Trap rework, bringing some sanity to the feature, although there is more to come. This comes with a prefix branch shared with the arm64 tree. - Some additional Nested Virtualization groundwork, mostly introducing the NV2 VNCR support and retargetting the NV support to that version of the architecture. - A small set of vgic fixes and associated cleanups. Loongarch: - Optimization for memslot hugepage checking - Cleanup and fix some HW/SW timer issues - Add LSX/LASX (128bit/256bit SIMD) support RISC-V: - KVM_GET_REG_LIST improvement for vector registers - Generate ISA extension reg_list using macros in get-reg-list selftest - Support for reporting steal time along with selftest s390: - Bugfixes Selftests: - Fix an annoying goof where the NX hugepage test prints out garbage instead of the magic token needed to run the test. - Fix build errors when a header is delete/moved due to a missing flag in the Makefile. - Detect if KVM bugged/killed a selftest's VM and print out a helpful message instead of complaining that a random ioctl() failed. - Annotate the guest printf/assert helpers with __printf(), and fix the various bugs that were lurking due to lack of said annotation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits) x86/kvm: Do not try to disable kvmclock if it was not enabled KVM: x86: add missing "depends on KVM" KVM: fix direction of dependency on MMU notifiers KVM: introduce CONFIG_KVM_COMMON KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache RISC-V: KVM: selftests: Add get-reg-list test for STA registers RISC-V: KVM: selftests: Add steal_time test support RISC-V: KVM: selftests: Add guest_sbi_probe_extension RISC-V: KVM: selftests: Move sbi_ecall to processor.c RISC-V: KVM: Implement SBI STA extension RISC-V: KVM: Add support for SBI STA registers RISC-V: KVM: Add support for SBI extension registers RISC-V: KVM: Add SBI STA info to vcpu_arch RISC-V: KVM: Add steal-update vcpu request RISC-V: KVM: Add SBI STA extension skeleton RISC-V: paravirt: Implement steal-time support RISC-V: Add SBI STA extension definitions RISC-V: paravirt: Add skeleton for pv-time support RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr() ...
2024-01-09Merge tag 'mm-stable-2024-01-08-15-31' of ↵Linus Torvalds2-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Peng Zhang has done some mapletree maintainance work in the series 'maple_tree: add mt_free_one() and mt_attr() helpers' 'Some cleanups of maple tree' - In the series 'mm: use memmap_on_memory semantics for dax/kmem' Vishal Verma has altered the interworking between memory-hotplug and dax/kmem so that newly added 'device memory' can more easily have its memmap placed within that newly added memory. - Matthew Wilcox continues folio-related work (including a few fixes) in the patch series 'Add folio_zero_tail() and folio_fill_tail()' 'Make folio_start_writeback return void' 'Fix fault handler's handling of poisoned tail pages' 'Convert aops->error_remove_page to ->error_remove_folio' 'Finish two folio conversions' 'More swap folio conversions' - Kefeng Wang has also contributed folio-related work in the series 'mm: cleanup and use more folio in page fault' - Jim Cromie has improved the kmemleak reporting output in the series 'tweak kmemleak report format'. - In the series 'stackdepot: allow evicting stack traces' Andrey Konovalov to permits clients (in this case KASAN) to cause eviction of no longer needed stack traces. - Charan Teja Kalla has fixed some accounting issues in the page allocator's atomic reserve calculations in the series 'mm: page_alloc: fixes for high atomic reserve caluculations'. - Dmitry Rokosov has added to the samples/ dorectory some sample code for a userspace memcg event listener application. See the series 'samples: introduce cgroup events listeners'. - Some mapletree maintanance work from Liam Howlett in the series 'maple_tree: iterator state changes'. - Nhat Pham has improved zswap's approach to writeback in the series 'workload-specific and memory pressure-driven zswap writeback'. - DAMON/DAMOS feature and maintenance work from SeongJae Park in the series 'mm/damon: let users feed and tame/auto-tune DAMOS' 'selftests/damon: add Python-written DAMON functionality tests' 'mm/damon: misc updates for 6.8' - Yosry Ahmed has improved memcg's stats flushing in the series 'mm: memcg: subtree stats flushing and thresholds'. - In the series 'Multi-size THP for anonymous memory' Ryan Roberts has added a runtime opt-in feature to transparent hugepages which improves performance by allocating larger chunks of memory during anonymous page faults. - Matthew Wilcox has also contributed some cleanup and maintenance work against eh buffer_head code int he series 'More buffer_head cleanups'. - Suren Baghdasaryan has done work on Andrea Arcangeli's series 'userfaultfd move option'. UFFDIO_MOVE permits userspace heap compaction algorithms to move userspace's pages around rather than UFFDIO_COPY'a alloc/copy/free. - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm: Add ksm advisor'. This is a governor which tunes KSM's scanning aggressiveness in response to userspace's current needs. - Chengming Zhou has optimized zswap's temporary working memory use in the series 'mm/zswap: dstmem reuse optimizations and cleanups'. - Matthew Wilcox has performed some maintenance work on the writeback code, both code and within filesystems. The series is 'Clean up the writeback paths'. - Andrey Konovalov has optimized KASAN's handling of alloc and free stack traces for secondary-level allocators, in the series 'kasan: save mempool stack traces'. - Andrey also performed some KASAN maintenance work in the series 'kasan: assorted clean-ups'. - David Hildenbrand has gone to town on the rmap code. Cleanups, more pte batching, folio conversions and more. See the series 'mm/rmap: interface overhaul'. - Kinsey Ho has contributed some maintenance work on the MGLRU code in the series 'mm/mglru: Kconfig cleanup'. - Matthew Wilcox has contributed lruvec page accounting code cleanups in the series 'Remove some lruvec page accounting functions'" * tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits) mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER mm, treewide: introduce NR_PAGE_ORDERS selftests/mm: add separate UFFDIO_MOVE test for PMD splitting selftests/mm: skip test if application doesn't has root privileges selftests/mm: conform test to TAP format output selftests: mm: hugepage-mmap: conform to TAP format output selftests/mm: gup_test: conform test to TAP format output mm/selftests: hugepage-mremap: conform test to TAP format output mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large mm/memcontrol: remove __mod_lruvec_page_state() mm/khugepaged: use a folio more in collapse_file() slub: use a folio in __kmalloc_large_node slub: use folio APIs in free_large_kmalloc() slub: use alloc_pages_node() in alloc_slab_page() mm: remove inc/dec lruvec page state functions mm: ratelimit stat flush from workingset shrinker kasan: stop leaking stack trace handles mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE mm/mglru: add dummy pmd_dirty() ...
2024-01-09Merge tag 'arm64-upstream' of ↵Linus Torvalds5-82/+6
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "CPU features: - Remove ARM64_HAS_NO_HW_PREFETCH copy_page() optimisation for ye olde Thunder-X machines - Avoid mapping KPTI trampoline when it is not required - Make CPU capability API more robust during early initialisation Early idreg overrides: - Remove dependencies on core kernel helpers from the early command-line parsing logic in preparation for moving this code before the kernel is mapped FPsimd: - Restore kernel-mode fpsimd context lazily, allowing us to run fpsimd code sequences in the kernel with pre-emption enabled KBuild: - Install 'vmlinuz.efi' when CONFIG_EFI_ZBOOT=y - Makefile cleanups LPA2 prep: - Preparatory work for enabling the 'LPA2' extension, which will introduce 52-bit virtual and physical addressing even with 4KiB pages (including for KVM guests). Misc: - Remove dead code and fix a typo MM: - Pass NUMA node information for IRQ stack allocations Perf: - Add perf support for the Synopsys DesignWare PCIe PMU - Add support for event counting thresholds (FEAT_PMUv3_TH) introduced in Armv8.8 - Add support for i.MX8DXL SoCs to the IMX DDR PMU driver. - Minor PMU driver fixes and optimisations RIP VPIPT: - Remove what support we had for the obsolete VPIPT I-cache policy Selftests: - Improvements to the SVE and SME selftests Stacktrace: - Refactor kernel unwind logic so that it can used by BPF unwinding and, eventually, reliable backtracing Sysregs: - Update a bunch of register definitions based on the latest XML drop from Arm" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (87 commits) kselftest/arm64: Don't probe the current VL for unsupported vector types efi/libstub: zboot: do not use $(shell ...) in cmd_copy_and_pad arm64: properly install vmlinuz.efi arm64/sysreg: Add missing system instruction definitions for FGT arm64/sysreg: Add missing system register definitions for FGT arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1 arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1 arm64: memory: remove duplicated include arm: perf: Fix ARCH=arm build with GCC arm64: Align boot cpucap handling with system cpucap handling arm64: Cleanup system cpucap handling MAINTAINERS: add maintainers for DesignWare PCIe PMU driver drivers/perf: add DesignWare PCIe PMU driver PCI: Move pci_clear_and_set_dword() helper to PCI header PCI: Add Alibaba Vendor ID to linux/pci_ids.h docs: perf: Add description for Synopsys DesignWare PCIe PMU driver arm64: irq: set the correct node for shadow call stack Revert "perf/arm_dmc620: Remove duplicate format attribute #defines" arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD arm64: fpsimd: Preserve/restore kernel mode NEON at context switch ...
2024-01-09mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDERKirill A. Shutemov1-1/+2
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has changed the definition of MAX_ORDER to be inclusive. This has caused issues with code that was not yet upstream and depended on the previous definition. To draw attention to the altered meaning of the define, rename MAX_ORDER to MAX_PAGE_ORDER. Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-09mm, treewide: introduce NR_PAGE_ORDERSKirill A. Shutemov1-1/+1
NR_PAGE_ORDERS defines the number of page orders supported by the page allocator, ranging from 0 to MAX_ORDER, MAX_ORDER + 1 in total. NR_PAGE_ORDERS assists in defining arrays of page orders and allows for more natural iteration over them. [kirill.shutemov@linux.intel.com: fixup for kerneldoc warning] Link: https://lkml.kernel.org/r/20240101111512.7empzyifq7kxtzk3@box Link: https://lkml.kernel.org/r/20231228144704.14033-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-08Merge tag 'kvm-x86-generic-6.8' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-1/+0
Common KVM changes for 6.8: - Use memdup_array_user() to harden against overflow. - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures.
2024-01-08Merge tag 'kvmarm-6.8' of ↵Paolo Bonzini19-246/+509
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 6.8 - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base granule sizes. Branch shared with the arm64 tree. - Large Fine-Grained Trap rework, bringing some sanity to the feature, although there is more to come. This comes with a prefix branch shared with the arm64 tree. - Some additional Nested Virtualization groundwork, mostly introducing the NV2 VNCR support and retargetting the NV support to that version of the architecture. - A small set of vgic fixes and associated cleanups.
2024-01-08KVM: introduce CONFIG_KVM_COMMONPaolo Bonzini1-2/+1
CONFIG_HAVE_KVM is currently used by some architectures to either enabled the KVM config proper, or to enable host-side code that is not part of the KVM module. However, CONFIG_KVM's "select" statement in virt/kvm/Kconfig corresponds to a third meaning, namely to enable common Kconfigs required by all architectures that support KVM. These three meanings can be replaced respectively by an architecture-specific Kconfig, by IS_ENABLED(CONFIG_KVM), or by a new Kconfig symbol that is in turn selected by the architecture-specific "config KVM". Start by introducing such a new Kconfig symbol, CONFIG_KVM_COMMON. Unlike CONFIG_HAVE_KVM, it is selected by CONFIG_KVM, not by architecture code, and it brings in all dependencies of common KVM code. In particular, INTERVAL_TREE was missing in loongarch and riscv, so that is another thing that is fixed. Fixes: 8132d887a702 ("KVM: remove CONFIG_HAVE_KVM_EVENTFD", 2023-12-08) Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/all/44907c6b-c5bd-4e4a-a921-e4d3825539d8@infradead.org/ Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-01-04KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgdWill Deacon1-0/+2
In commit f320bc742bc23 ("KVM: arm64: Prepare the creation of s1 mappings at EL2"), pKVM switches from a temporary host-provided page-table to its own page-table at EL2. Since there is only a single TTBR for the nVHE hypervisor, this involves disabling and re-enabling the MMU in __pkvm_init_switch_pgd(). Unfortunately, the memory barriers here are not quite correct. Specifically: - A DSB is required to complete the TLB invalidation executed while the MMU is disabled. - An ISB is required to make the new TTBR value visible to the page-table walker before the MMU is enabled in the SCTLR. An earlier version of the patch actually got this correct: https://lore.kernel.org/lkml/20210304184717.GB21795@willie-the-truck/ but thanks to some badly worded review comments from yours truly, these were dropped for the version that was eventually merged. Bring back the barriers and fix the potential issue (but note that this was found by code inspection). Cc: Quentin Perret <qperret@google.com> Fixes: f320bc742bc23 ("KVM: arm64: Prepare the creation of s1 mappings at EL2") Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240104164220.7968-1-will@kernel.org
2024-01-04Merge branch kvm-arm64/vgic-6.8 into kvmarm-master/nextMarc Zyngier3-81/+53
* kvm-arm64/vgic-6.8: : . : Fix for the GICv4.1 vSGI pending state being set/cleared from : userspace, and some cleanup to the MMIO and userspace accessors : for the pending state. : : Also a fix for a potential UAF in the ITS translation cache. : . KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache KVM: arm64: vgic-v3: Reinterpret user ISPENDR writes as I{C,S}PENDR KVM: arm64: vgic: Use common accessor for writes to ICPENDR KVM: arm64: vgic: Use common accessor for writes to ISPENDR KVM: arm64: vgic-v4: Restore pending state on host userspace write Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-01-04KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cacheOliver Upton1-0/+5
There is a potential UAF scenario in the case of an LPI translation cache hit racing with an operation that invalidates the cache, such as a DISCARD ITS command. The root of the problem is that vgic_its_check_cache() does not elevate the refcount on the vgic_irq before dropping the lock that serializes refcount changes. Have vgic_its_check_cache() raise the refcount on the returned vgic_irq and add the corresponding decrement after queueing the interrupt. Cc: stable@vger.kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240104183233.3560639-1-oliver.upton@linux.dev
2024-01-04Merge branch 'for-next/rip-vpipt' into for-next/coreWill Deacon3-75/+1
* for-next/rip-vpipt: arm64: Rename reserved values for CTR_EL0.L1Ip arm64: Kill detection of VPIPT i-cache policy KVM: arm64: Remove VPIPT I-cache handling
2024-01-02Merge tag 'kvm-riscv-6.8-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini4-20/+34
KVM/riscv changes for 6.8 part #1 - KVM_GET_REG_LIST improvement for vector registers - Generate ISA extension reg_list using macros in get-reg-list selftest - Steal time account support along with selftest
2024-01-02Merge tag 'loongarch-kvm-6.8' of ↵Paolo Bonzini1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.8 1. Optimization for memslot hugepage checking. 2. Cleanup and fix some HW/SW timer issues. 3. Add LSX/LASX (128bit/256bit SIMD) support.
2023-12-23Merge tag 'kvmarm-fixes-6.7-2' of ↵Paolo Bonzini4-20/+34
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm64 fixes for 6.7, part #2 - Ensure a vCPU's redistributor is unregistered from the MMIO bus if vCPU creation fails - Fix building KVM selftests for arm64 from the top-level Makefile
2023-12-22KVM: arm64: vgic-v3: Reinterpret user ISPENDR writes as I{C,S}PENDROliver Upton1-30/+5
User writes to ISPENDR for GICv3 are treated specially, as zeroes actually clear the pending state for interrupts (unlike HW). Reimplement it using the ISPENDR and ICPENDR user accessors. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231219065855.1019608-4-oliver.upton@linux.dev
2023-12-22KVM: arm64: vgic: Use common accessor for writes to ICPENDROliver Upton1-29/+22
Fold MMIO and user accessors into a common helper while maintaining the distinction between the two. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231219065855.1019608-3-oliver.upton@linux.dev
2023-12-22KVM: arm64: vgic: Use common accessor for writes to ISPENDROliver Upton1-29/+21
Perhaps unsurprisingly, there is a considerable amount of duplicate code between the MMIO and user accessors for ISPENDR. At the same time there are some important differences between user and guest MMIO, like how SGIs can only be made pending from userspace. Fold user and MMIO accessors into a common helper, maintaining the distinction between the two. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231219065855.1019608-2-oliver.upton@linux.dev
2023-12-22KVM: arm64: vgic-v4: Restore pending state on host userspace writeMarc Zyngier1-10/+17
When the VMM writes to ISPENDR0 to set the state pending state of an SGI, we fail to convey this to the HW if this SGI is already backed by a GICv4.1 vSGI. This is a bit of a corner case, as this would only occur if the vgic state is changed on an already running VM, but this can apparently happen across a guest reset driven by the VMM. Fix this by always writing out the pending_latch value to the HW, and reseting it to false. Reported-by: Kunkun Jiang <jiangkunkun@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Cc: stable@vger.kernel.org # 5.10+ Link: https://lore.kernel.org/r/7e7f2c0c-448b-10a9-8929-4b8f4f6e2a32@huawei.com
2023-12-19Merge branch kvm-arm64/nv-6.8-prefix into kvmarm-master/nextMarc Zyngier4-58/+209
* kvm-arm64/nv-6.8-prefix: : . : Nested Virtualization support update, focussing on the : NV2 support (VNCR mapping and such). : . KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg() KVM: arm64: nv: Map VNCR-capable registers to a separate page KVM: arm64: nv: Add EL2_REG_VNCR()/EL2_REG_REDIR() sysreg helpers KVM: arm64: Introduce a bad_trap() primitive for unexpected trap handling KVM: arm64: nv: Add include containing the VNCR_EL2 offsets KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers KVM: arm64: nv: Drop EL12 register traps that are redirected to VNCR KVM: arm64: nv: Compute NV view of idregs as a one-off KVM: arm64: nv: Hoist vcpu_has_nv() into is_hyp_ctxt() arm64: cpufeatures: Restrict NV support to FEAT_NV2 Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-12-19KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()Marc Zyngier1-5/+124
KVM internally uses accessor functions when reading or writing the guest's system registers. This takes care of accessing either the stored copy or using the "live" EL1 system registers when the host uses VHE. With the introduction of virtual EL2 we add a bunch of EL2 system registers, which now must also be taken care of: - If the guest is running in vEL2, and we access an EL1 sysreg, we must revert to the stored version of that, and not use the CPU's copy. - If the guest is running in vEL1, and we access an EL2 sysreg, we must also use the stored version, since the CPU carries the EL1 copy. - Some EL2 system registers are supposed to affect the current execution of the system, so we need to put them into their respective EL1 counterparts. For this we need to define a mapping between the two. - Some EL2 system registers have a different format than their EL1 counterpart, so we need to translate them before writing them to the CPU. This is done using an (optional) translate function in the map. All of these cases are now wrapped into the existing accessor functions, so KVM users wouldn't need to care whether they access EL2 or EL1 registers and also which state the guest is in. Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Co-developed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-12-19KVM: arm64: nv: Add EL2_REG_VNCR()/EL2_REG_REDIR() sysreg helpersMarc Zyngier1-18/+47
Add two helpers to deal with EL2 registers are are either redirected to the VNCR page, or that are redirected to their EL1 counterpart. In either cases, no trap is expected. THe relevant register descriptors are repainted accordingly. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-12-19KVM: arm64: Introduce a bad_trap() primitive for unexpected trap handlingMarc Zyngier1-8/+15
In order to ease the debugging of NV, it is helpful to have the kernel shout at you when an unexpected trap is handled. We already have this in a couple of cases. Make this a more generic infrastructure that we will make use of very shortly. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-12-19KVM: arm64: nv: Drop EL12 register traps that are redirected to VNCRMarc Zyngier1-15/+0
With FEAT_NV2, a bunch of system register writes are turned into memory writes. This is specially the fate of the EL12 registers that the guest hypervisor manipulates out of context. Remove the trap descriptors for those, as they are never going to be used again. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-12-19KVM: arm64: nv: Compute NV view of idregs as a one-offMarc Zyngier3-9/+21
Now that we have a full copy of the idregs for each VM, there is no point in repainting the sysregs on each access. Instead, we can simply perform the transmation as a one-off and be done with it. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-12-19KVM: arm64: nv: Hoist vcpu_has_nv() into is_hyp_ctxt()Marc Zyngier1-2/+1
A rather common idiom when writing NV code as part of KVM is to have things such has: if (vcpu_has_nv(vcpu) && is_hyp_ctxt(vcpu)) { [...] } to check that we are in a hyp-related context. The second part of the conjunction would be enough, but the first one contains a static key that allows the rest of the checkis to be elided when in a non-NV environment. Rewrite is_hyp_ctxt() to directly use vcpu_has_nv(). The result is the same, and the code easier to read. The one occurence of this that is already merged is rewritten in the process. In order to avoid nasty cirtular dependencies between kvm_emulate.h and kvm_nested.h, vcpu_has_feature() is itself hoisted into kvm_host.h, at the cost of some #deferry... Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-12-18Merge branch kvm-arm64/fgt-rework into kvmarm-master/nextMarc Zyngier5-34/+147
* kvm-arm64/fgt-rework: (30 commits) : . : Fine Grain Trapping update, courtesy of Fuad Tabba. : : From the cover letter: : : "This patch series has fixes, updates, and code for validating : fine grain trap register masks, as well as some fixes to feature : trapping in pKVM. : : New fine grain trap (FGT) bits have been defined in the latest : Arm Architecture System Registers xml specification (DDI0601 and : DDI0602 2023-09) [1], so the code is updated to reflect them. : Moreover, some of the already-defined masks overlap with RES0, : which this series fixes. : : It also adds FGT register masks that weren't defined earlier, : handling of HAFGRTR_EL2 in nested virt, as well as build time : validation that the bits of the various masks are all accounted : for and without overlap." : : This branch also drags the arm64/for-next/sysregs branch, : which is a dependency on this work. : . KVM: arm64: Trap external trace for protected VMs KVM: arm64: Mark PAuth as a restricted feature for protected VMs KVM: arm64: Fix which features are marked as allowed for protected VMs KVM: arm64: Macros for setting/clearing FGT bits KVM: arm64: Define FGT nMASK bits relative to other fields KVM: arm64: Use generated FGT RES0 bits instead of specifying them KVM: arm64: Add build validation for FGT trap mask values KVM: arm64: Update and fix FGT register masks KVM: arm64: Handle HAFGRTR_EL2 trapping in nested virt KVM: arm64: Add bit masks for HAFGRTR_EL2 KVM: arm64: Add missing HFGITR_EL2 FGT entries to nested virt KVM: arm64: Add missing HFGxTR_EL2 FGT entries to nested virt KVM: arm64: Explicitly trap unsupported HFGxTR_EL2 features arm64/sysreg: Add missing system instruction definitions for FGT arm64/sysreg: Add missing system register definitions for FGT arm64/sysreg: Add missing ExtTrcBuff field definition to ID_AA64DFR0_EL1 arm64/sysreg: Add missing Pauth_LR field definitions to ID_AA64ISAR1_EL1 arm64/sysreg: Add new system registers for GCS arm64/sysreg: Add definition for FPMR arm64/sysreg: Update HCRX_EL2 definition for DDI0601 2023-09 ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2023-12-18KVM: arm64: Trap external trace for protected VMsFuad Tabba1-0/+4
pKVM does not support external trace for protected VMs. Trap external trace, and add the ExtTrcBuff to make it possible to check for the feature. Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231214100158.2305400-18-tabba@google.com
2023-12-18KVM: arm64: Mark PAuth as a restricted feature for protected VMsFuad Tabba1-3/+11
Protected VMs will only support basic PAuth (FEAT_PAuth). Mark it as restricted to ensure that later versions aren't supported for protected guests. Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231214100158.2305400-17-tabba@google.com
2023-12-18KVM: arm64: Fix which features are marked as allowed for protected VMsFuad Tabba1-1/+7
Cache maintenance operations are not trapped for protected VMs, and shouldn't be. Mark them as allowed. Moreover, features advertised by ID_AA64PFR2 and ID_AA64MMFR3 are (already) not allowed, mark them as such. Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231214100158.2305400-16-tabba@google.com
2023-12-18KVM: arm64: Macros for setting/clearing FGT bitsFuad Tabba1-42/+27
There's a lot of boilerplate code for setting and clearing FGT bits when activating guest traps. Refactor it into macros. These macros will also be used in future patch series. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231214100158.2305400-15-tabba@google.com
2023-12-18KVM: arm64: Add build validation for FGT trap mask valuesFuad Tabba1-0/+18
These checks help ensure that all the bits are accounted for, that there hasn't been a transcribing error from the spec nor from the generated mask values, which will be used in subsequent patches. Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231214100158.2305400-12-tabba@google.com
2023-12-18KVM: arm64: Handle HAFGRTR_EL2 trapping in nested virtFuad Tabba3-0/+74
Add the encodings to fine grain trapping fields for HAFGRTR_EL2 and add the associated handling code in nested virt. Based on DDI0601 2023-09. Add the missing field definitions as well, both to generate the correct RES0 mask and to be able to toggle their FGT bits. Also add the code for handling FGT trapping, reading of the register, to nested virt. Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231214100158.2305400-10-tabba@google.com
2023-12-18KVM: arm64: Add missing HFGITR_EL2 FGT entries to nested virtFuad Tabba1-0/+5
Add the missing nested virt FGT table entries HFGITR_EL2. Based on DDI0601 and DDI0602 2023-09. Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231214100158.2305400-8-tabba@google.com
2023-12-18KVM: arm64: Add missing HFGxTR_EL2 FGT entries to nested virtFuad Tabba1-0/+10
Add the missing nested virt FGT table entries HFGxTR_EL2. Based on DDI0601 2023-09. Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231214100158.2305400-7-tabba@google.com
2023-12-18KVM: arm64: Explicitly trap unsupported HFGxTR_EL2 featuresFuad Tabba1-3/+6
Do not rely on the value of __HFGRTR_EL2_nMASK to trap unsupported features, since the nMASK can (and will) change as new traps are added and as its value is updated. Instead, explicitly specify the trap bits. Suggested-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231214100158.2305400-6-tabba@google.com
2023-12-12arm: perf/kvm: Use GENMASK for ARMV8_PMU_PMCR_NJames Clark2-7/+5
This is so that FIELD_GET and FIELD_PREP can be used and that the fields are in a consistent format to arm64/tools/sysreg Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20231211161331.1277825-3-james.clark@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-12-12KVM: arm64: vgic: Ensure that slots_lock is held in ↵Marc Zyngier1-0/+2
vgic_register_all_redist_iodevs() Although we implicitly depend on slots_lock being held when registering IO devices with the IO bus infrastructure, we don't enforce this requirement. Make it explicit. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-12-12KVM: arm64: vgic: Force vcpu vgic teardown on vcpu destroyMarc Zyngier4-3/+7
When failing to create a vcpu because (for example) it has a duplicate vcpu_id, we destroy the vcpu. Amusingly, this leaves the redistributor registered with the KVM_MMIO bus. This is no good, and we should properly clean the mess. Force a teardown of the vgic vcpu interface, including the RD device before returning to the caller. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-12-12KVM: arm64: vgic: Add a non-locking primitive for kvm_vgic_vcpu_destroy()Marc Zyngier1-2/+11
As we are going to need to call into kvm_vgic_vcpu_destroy() without prior holding of the slots_lock, introduce __kvm_vgic_vcpu_destroy() as a non-locking primitive of kvm_vgic_vcpu_destroy(). Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-12-12KVM: arm64: vgic: Simplify kvm_vgic_destroy()Marc Zyngier1-15/+14
When destroying a vgic, we have rather cumbersome rules about when slots_lock and config_lock are held, resulting in fun buglets. The first port of call is to simplify kvm_vgic_map_resources() so that there is only one call to kvm_vgic_destroy() instead of two, with the second only holding half of the locks. For that, we kill the non-locking primitive and move the call outside of the locking altogether. This doesn't change anything (we re-acquire the locks and teardown the whole vgic), and simplifies the code significantly. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-12-08KVM: remove CONFIG_HAVE_KVM_IRQFDPaolo Bonzini1-1/+0
All platforms with a kernel irqchip have support for irqfd. Unify the two configuration items so that userspace can expect to use irqfd to inject interrupts into the irqchip. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-08KVM: remove CONFIG_HAVE_KVM_EVENTFDPaolo Bonzini1-1/+0
virt/kvm/eventfd.c is compiled unconditionally, meaning that the ioeventfds member of struct kvm is accessed unconditionally. CONFIG_HAVE_KVM_EVENTFD therefore must be defined for KVM common code to compile successfully, remove it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-05KVM: arm64: Remove VPIPT I-cache handlingMarc Zyngier3-75/+1
We have some special handling for VPIPT I-cache in critical parts of the cache and TLB maintenance. Remove it. Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20231204143606.1806432-2-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-12-01KVM: move KVM_CAP_DEVICE_CTRL to the generic checkWei Wang1-1/+0
KVM_CAP_DEVICE_CTRL allows userspace to check if the kvm_device framework (e.g. KVM_CREATE_DEVICE) is supported by KVM. Move KVM_CAP_DEVICE_CTRL to the generic check for the two reasons: 1) it already supports arch agnostic usages (i.e. KVM_DEV_TYPE_VFIO). For example, userspace VFIO implementation may needs to create KVM_DEV_TYPE_VFIO on x86, riscv, or arm etc. It is simpler to have it checked at the generic code than at each arch's code. 2) KVM_CREATE_DEVICE has been added to the generic code. Link: https://lore.kernel.org/all/20221215115207.14784-1-wei.w.wang@intel.com Signed-off-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Acked-by: Anup Patel <anup@brainfault.org> (riscv) Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Link: https://lore.kernel.org/r/20230315101606.10636-1-wei.w.wang@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-11-30KVM: arm64: Use helpers to classify exception types reported via ESRArd Biesheuvel3-20/+19
Currently, we rely on the fact that exceptions can be trivially classified by applying a mask/value pair to the syndrome value reported via the ESR register, but this will no longer be true once we enable support for 5 level paging. So introduce a couple of helpers that encapsulate this mask/value pair matching, and wire them up in the code. No functional change intended, the actual handling of translation level -1 will be added in a subsequent patch. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> [maz: folded in changes suggested by Mark] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231128140400.3132145-2-ardb@google.com
2023-11-27KVM: arm64: Allow guests with >48-bit IPA size on FEAT_LPA2 systemsRyan Roberts1-5/+4
With all the page-table infrastructure in place, we can finally increase the maximum permisable IPA size to 52-bits on 4KB and 16KB page systems that have FEAT_LPA2. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231127111737.1897081-11-ryan.roberts@arm.com
2023-11-27KVM: arm64: Support up to 5 levels of translation in kvm_pgtableRyan Roberts1-0/+9
FEAT_LPA2 increases the maximum levels of translation from 4 to 5 for the 4KB page case, when IA is >48 bits. While we can still use 4 levels for stage2 translation in this case (due to stage2 allowing concatenated page tables for first level lookup), the same kvm_pgtable library is used for the hyp stage1 page tables and stage1 does not support concatenation. Therefore, modify the library to support up to 5 levels. Previous patches already laid the groundwork for this by refactoring code to work in terms of KVM_PGTABLE_FIRST_LEVEL and KVM_PGTABLE_LAST_LEVEL. So we just need to change these macros. The hardware sometimes encodes the new level differently from the others: One such place is when reading the level from the FSC field in the ESR_EL2 register. We never expect to see the lowest level (-1) here since the stage 2 page tables always use concatenated tables for first level lookup and therefore only use 4 levels of lookup. So we get away with just adding a comment to explain why we are not being careful about decoding level -1. For stage2 VTCR_EL2.SL2 is introduced to encode the new start level. However, since we always use concatenated page tables for first level look up at stage2 (and therefore we will never need the new extra level) we never touch this new field. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231127111737.1897081-10-ryan.roberts@arm.com
2023-11-27KVM: arm64: Convert translation level parameter to s8Ryan Roberts5-43/+51
With the introduction of FEAT_LPA2, the Arm ARM adds a new level of translation, level -1, so levels can now be in the range [-1;3]. 3 is always the last level and the first level is determined based on the number of VA bits in use. Convert level variables to use a signed type in preparation for supporting this new level -1. Since the last level is always anchored at 3, and the first level varies to suit the number of VA/IPA bits, take the opportunity to replace KVM_PGTABLE_MAX_LEVELS with the 2 macros KVM_PGTABLE_FIRST_LEVEL and KVM_PGTABLE_LAST_LEVEL. This removes the assumption from the code that levels run from 0 to KVM_PGTABLE_MAX_LEVELS - 1, which will soon no longer be true. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231127111737.1897081-9-ryan.roberts@arm.com
2023-11-27KVM: arm64: Use LPA2 page-tables for stage2 and hyp stage1Ryan Roberts3-7/+17
Implement a simple policy whereby if the HW supports FEAT_LPA2 for the page size we are using, always use LPA2-style page-tables for stage 2 and hyp stage 1 (assuming an nvhe hyp), regardless of the VMM-requested IPA size or HW-implemented PA size. When in use we can now support up to 52-bit IPA and PA sizes. We use the previously created cpu feature to track whether LPA2 is supported for deciding whether to use the LPA2 or classic pte format. Note that FEAT_LPA2 brings support for bigger block mappings (512GB with 4KB, 64GB with 16KB). We explicitly don't enable these in the library because stage2_apply_range() works on batch sizes of the largest used block mapping, and increasing the size of the batch would lead to soft lockups. See commit 5994bc9e05c2 ("KVM: arm64: Limit stage2_apply_range() batch size to largest block"). With the addition of LPA2 support in the hypervisor, the PA size supported by the HW must be capped with a runtime decision, rather than simply using a compile-time decision based on PA_BITS. For example, on a system that advertises 52 bit PA but does not support FEAT_LPA2, A 4KB or 16KB kernel compiled with LPA2 support must still limit the PA size to 48 bits. Therefore, move the insertion of the PS field into TCR_EL2 out of __kvm_hyp_init assembly code and instead do it in cpu_prepare_hyp_mode() where the rest of TCR_EL2 is prepared. This allows us to figure out PS with kvm_get_parange(), which has the appropriate logic to ensure the above requirement. (and the PS field of VTCR_EL2 is already populated this way). Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231127111737.1897081-8-ryan.roberts@arm.com