summaryrefslogtreecommitdiff
path: root/arch/arm64/kvm/sys_regs.c
AgeCommit message (Collapse)AuthorFilesLines
2024-07-14Merge branch kvm-arm64/nv-tcr2 into kvmarm/nextOliver Upton1-1/+16
* kvm-arm64/nv-tcr2: : Fixes to the handling of TCR_EL1, courtesy of Marc Zyngier : : Series addresses a couple gaps that are present in KVM (from cover : letter): : : - VM configuration: HCRX_EL2.TCR2En is forced to 1, and we blindly : save/restore stuff. : : - trap bit description and routing: none, obviously, since we make a : point in not trapping. KVM: arm64: Honor trap routing for TCR2_EL1 KVM: arm64: Make PIR{,E0}_EL1 save/restore conditional on FEAT_TCRX KVM: arm64: Make TCR2_EL1 save/restore dependent on the VM features KVM: arm64: Get rid of HCRX_GUEST_FLAGS KVM: arm64: Correctly honor the presence of FEAT_TCRX Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-07-14Merge branch kvm-arm64/nv-sve into kvmarm/nextOliver Upton1-0/+38
* kvm-arm64/nv-sve: : CPTR_EL2, FPSIMD/SVE support for nested : : This series brings support for honoring the guest hypervisor's CPTR_EL2 : trap configuration when running a nested guest, along with support for : FPSIMD/SVE usage at L1 and L2. KVM: arm64: Allow the use of SVE+NV KVM: arm64: nv: Add additional trap setup for CPTR_EL2 KVM: arm64: nv: Add trap description for CPTR_EL2 KVM: arm64: nv: Add TCPAC/TTA to CPTR->CPACR conversion helper KVM: arm64: nv: Honor guest hypervisor's FP/SVE traps in CPTR_EL2 KVM: arm64: nv: Load guest FP state for ZCR_EL2 trap KVM: arm64: nv: Handle CPACR_EL1 traps KVM: arm64: Spin off helper for programming CPTR traps KVM: arm64: nv: Ensure correct VL is loaded before saving SVE state KVM: arm64: nv: Use guest hypervisor's max VL when running nested guest KVM: arm64: nv: Save guest's ZCR_EL2 when in hyp context KVM: arm64: nv: Load guest hyp's ZCR into EL1 state KVM: arm64: nv: Handle ZCR_EL2 traps KVM: arm64: nv: Forward SVE traps to guest hypervisor KVM: arm64: nv: Forward FP/ASIMD traps to guest hypervisor Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-07-14Merge branch kvm-arm64/ctr-el0 into kvmarm/nextOliver Upton1-49/+91
* kvm-arm64/ctr-el0: : Support for user changes to CTR_EL0, courtesy of Sebastian Ott : : Allow userspace to change the guest-visible value of CTR_EL0 for a VM, : so long as the requested value represents a subset of features supported : by hardware. In other words, prevent the VMM from over-promising the : capabilities of hardware. : : Make this happen by fitting CTR_EL0 into the existing infrastructure for : feature ID registers. KVM: selftests: Assert that MPIDR_EL1 is unchanged across vCPU reset KVM: arm64: nv: Unfudge ID_AA64PFR0_EL1 masking KVM: selftests: arm64: Test writes to CTR_EL0 KVM: arm64: rename functions for invariant sys regs KVM: arm64: show writable masks for feature registers KVM: arm64: Treat CTR_EL0 as a VM feature ID register KVM: arm64: unify code to prepare traps KVM: arm64: nv: Use accessors for modifying ID registers KVM: arm64: Add helper for writing ID regs KVM: arm64: Use read-only helper for reading VM ID registers KVM: arm64: Make idregs debugfs iterator search sysreg table directly KVM: arm64: Get sys_reg encoding from descriptor in idregs_debug_show() Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-27KVM: arm64: Get rid of HCRX_GUEST_FLAGSMarc Zyngier1-1/+7
HCRX_GUEST_FLAGS gives random KVM hackers the impression that they can stuff bits in this macro and unconditionally enable features in the guest. In general, this is wrong (we have been there with FEAT_MOPS, and again with FEAT_TCRX). Document that HCRX_EL2.SMPME is an exception rather than the rule, and get rid of HCRX_GUEST_FLAGS. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20240625130042.259175-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-27KVM: arm64: Correctly honor the presence of FEAT_TCRXMarc Zyngier1-0/+9
We currently blindly enable TCR2_EL1 use in a guest, irrespective of the feature set. This is obviously wrong, and we should actually honor the guest configuration and handle the possible trap resulting from the guest being buggy. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20240625130042.259175-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: nv: Handle ZCR_EL2 trapsOliver Upton1-0/+38
Unlike other SVE-related registers, ZCR_EL2 takes a sysreg trap to EL2 when HCR_EL2.NV = 1. KVM still needs to honor the guest hypervisor's trap configuration, which expects an SVE trap (i.e. ESR_EL2.EC = 0x19) when CPTR traps are enabled for the vCPU's current context. Otherwise, if the guest hypervisor has traps disabled, emulate the access by mapping the requested VL into ZCR_EL1. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240620164653.1130714-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: rename functions for invariant sys regsSebastian Ott1-5/+5
Invariant system id registers are populated with host values at initialization time using their .reset function cb. These are currently called get_* which is usually used by the functions implementing the .get_user callback. Change their function names to reset_* to reflect what they are used for. Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-10-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: show writable masks for feature registersSebastian Ott1-14/+5
Instead of using ~0UL provide the actual writable mask for non-id feature registers in the output of the KVM_ARM_GET_REG_WRITABLE_MASKS ioctl. This changes the mask for the CTR_EL0 and CLIDR_EL1 registers. Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-9-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: Treat CTR_EL0 as a VM feature ID registerSebastian Ott1-10/+10
CTR_EL0 is currently handled as an invariant register, thus guests will be presented with the host value of that register. Add emulation for CTR_EL0 based on a per VM value. Userspace can switch off DIC and IDC bits and reduce DminLine and IminLine sizes. Naturally, ensure CTR_EL0 is trapped (HCR_EL2.TID2=1) any time that a VM's CTR_EL0 differs from hardware. Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-8-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: unify code to prepare trapsSebastian Ott1-2/+32
There are 2 functions to calculate traps via HCR_EL2: * kvm_init_sysreg() called via KVM_RUN (before the 1st run or when the pid changes) * vcpu_reset_hcr() called via KVM_ARM_VCPU_INIT To unify these 2 and to support traps that are dependent on the ID register configuration, move the code from vcpu_reset_hcr() to sys_regs.c and call it via kvm_init_sysreg(). We still have to keep the non-FWB handling stuff in vcpu_reset_hcr(). Also the initialization with HCR_GUEST_FLAGS is kept there but guarded by !vcpu_has_run_once() to ensure that previous calculated values don't get overwritten. While at it rename kvm_init_sysreg() to kvm_calculate_traps() to better reflect what it's doing. Signed-off-by: Sebastian Ott <sebott@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: Add helper for writing ID regsOliver Upton1-3/+14
Replace the remaining usage of IDREG() with a new helper for setting the value of a feature ID register, with the benefit of cramming in some extra sanity checks. Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: Use read-only helper for reading VM ID registersOliver Upton1-3/+3
IDREG() expands to the storage of a particular ID reg, which can be useful for handling both reads and writes. However, outside of a select few situations, the ID registers should be considered read only. Replace current readers with a new macro that expands to the value of the field rather than the field itself. Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: Make idregs debugfs iterator search sysreg table directlyOliver Upton1-13/+23
CTR_EL0 complicates the existing scheme for iterating feature ID registers, as it is not in the contiguous range that we presently support. Just search the sysreg table for the Nth feature ID register in anticipation of this. Yes, the debugfs interface has quadratic time completixy now. Boo hoo. Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-20KVM: arm64: Get sys_reg encoding from descriptor in idregs_debug_show()Oliver Upton1-1/+1
KVM is about to add support for more VM-scoped feature ID regs that live outside of the id_regs[] array, which means the index of the debugfs iterator may not actually be an index into the array. Prepare by getting the sys_reg encoding from the descriptor itself. Reviewed-by: Sebastian Ott <sebott@redhat.com> Link: https://lore.kernel.org/r/20240619174036.483943-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Add handling of NXS-flavoured TLBI operationsMarc Zyngier1-0/+73
Latest kid on the block: NXS (Non-eXtra-Slow) TLBI operations. Let's add those in bulk (NSH, ISH, OSH, both normal and range) as they directly map to their XS (the standard ones) counterparts. Not a lot to say about them, they are basically useless. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-17-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Add handling of range-based TLBI operationsMarc Zyngier1-0/+80
We already support some form of range operation by handling FEAT_TTL, but so far the "arbitrary" range operations are unsupported. Let's fix that. For EL2 S1, this is simple enough: we just map both NSH, ISH and OSH instructions onto the ISH version for EL1. For TLBI instructions affecting EL1 S1, we use the same model as their non-range counterpart to invalidate in the context of the correct VMID. For TLBI instructions affecting S2, we interpret the data passed by the guest to compute the range and use that to tear-down part of the shadow S2 range and invalidate the TLBs. Finally, we advertise FEAT_TLBIRANGE if the host supports it. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-16-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Add handling of outer-shareable TLBI operationsMarc Zyngier1-0/+15
Our handling of outer-shareable TLBIs is pretty basic: we just map them to the existing inner-shareable ones, because we really don't have anything else. The only significant change is that we can now advertise FEAT_TLBIOS support if the host supports it. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-15-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Handle FEAT_TTL hinted TLB operationsMarc Zyngier1-23/+1
Support guest-provided information information to size the range of required invalidation. This helps with reducing over-invalidation, provided that the guest actually provides accurate information. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-12-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Handle TLBI IPAS2E1{,IS} operationsMarc Zyngier1-0/+96
TLBI IPAS2E1* are the last class of TLBI instructions we need to handle. For each matching S2 MMU context, we invalidate a range corresponding to the largest possible mapping for that context. At this stage, we don't handle TTL, which means we are likely over-invalidating. Further patches will aim at making this a bit better. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-11-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Handle TLBI ALLE1{,IS} operationsMarc Zyngier1-0/+25
TLBI ALLE1* is a pretty big hammer that invalides all S1/S2 TLBs. This translates into the unmapping of all our shadow S2 PTs, itself resulting in the corresponding TLB invalidations. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-10-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Handle TLBI VMALLS12E1{,IS} operationsMarc Zyngier1-0/+51
Emulating TLBI VMALLS12E1* results in tearing down all the shadow S2 PTs that match the current VMID, since our shadow S2s are just some form of SW-managed TLBs. That teardown itself results in a full TLB invalidation for both S1 and S2. This can result in over-invalidation if two vcpus use the same VMID to tag private S2 PTs, but this is still correct from an architecture perspective. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-9-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-06-19KVM: arm64: nv: Handle TLB invalidation targeting L2 stage-1Marc Zyngier1-0/+80
While dealing with TLB invalidation targeting the guest hypervisor's own stage-1 was easy, doing the same thing for its own guests is a bit more involved. Since such an invalidation is scoped by VMID, it needs to apply to all s2_mmu contexts that have been tagged by that VMID, irrespective of the value of VTTBR_EL2.BADDR. So for each s2_mmu context matching that VMID, we invalidate the corresponding TLBs, each context having its own "physical" VMID. Co-developed-by: Jintack Lim <jintack.lim@linaro.org> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240614144552.2773592-8-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-05-09Merge branch kvm-arm64/mpidr-reset into kvmarm-master/nextMarc Zyngier1-26/+36
* kvm-arm64/mpidr-reset: : . : Fixes for CLIDR_EL1 and MPIDR_EL1 being accidentally mutable across : a vcpu reset, courtesy of Oliver. From the cover letter: : : "For VM-wide feature ID registers we ensure they get initialized once for : the lifetime of a VM. On the other hand, vCPU-local feature ID registers : get re-initialized on every vCPU reset, potentially clobbering the : values userspace set up. : : MPIDR_EL1 and CLIDR_EL1 are the only registers in this space that we : allow userspace to modify for now. Clobbering the value of MPIDR_EL1 has : some disastrous side effects as the compressed index used by the : MPIDR-to-vCPU lookup table assumes MPIDR_EL1 is immutable after KVM_RUN. : : Series + reproducer test case to address the problem of KVM wiping out : userspace changes to these registers. Note that there are still some : differences between VM and vCPU scoped feature ID registers from the : perspective of userspace. We do not allow the value of VM-scope : registers to change after KVM_RUN, but vCPU registers remain mutable." : . KVM: selftests: arm64: Test vCPU-scoped feature ID registers KVM: selftests: arm64: Test that feature ID regs survive a reset KVM: selftests: arm64: Store expected register value in set_id_regs KVM: selftests: arm64: Rename helper in set_id_regs to imply VM scope KVM: arm64: Only reset vCPU-scoped feature ID regs once KVM: arm64: Reset VM feature ID regs from kvm_reset_sys_regs() KVM: arm64: Rename is_id_reg() to imply VM scope Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-09KVM: arm64: Only reset vCPU-scoped feature ID regs onceOliver Upton1-8/+24
The general expecation with feature ID registers is that they're 'reset' exactly once by KVM for the lifetime of a vCPU/VM, such that any userspace changes to the CPU features / identity are honored after a vCPU gets reset (e.g. PSCI_ON). KVM handles what it calls VM-scoped feature ID registers correctly, but feature ID registers local to a vCPU (CLIDR_EL1, MPIDR_EL1) get wiped after every reset. What's especially concerning is that a potentially-changing MPIDR_EL1 breaks MPIDR compression for indexing mpidr_data, as the mask of useful bits to build the index could change. This is absolutely no good. Avoid resetting vCPU feature ID registers more than once. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240502233529.1958459-4-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-09KVM: arm64: Reset VM feature ID regs from kvm_reset_sys_regs()Oliver Upton1-17/+10
A subsequent change to KVM will expand the range of feature ID registers that get special treatment at reset. Fold the existing ones back in to kvm_reset_sys_regs() to avoid the need for an additional table walk. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240502233529.1958459-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-09KVM: arm64: Rename is_id_reg() to imply VM scopeOliver Upton1-5/+6
The naming of some of the feature ID checks is ambiguous. Rephrase the is_id_reg() helper to make its purpose slightly clearer. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240502233529.1958459-2-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01KVM: arm64: Remove duplicated AA64MMFR1_EL1 XNXRussell King1-1/+0
Commit d5a32b60dc18 ("KVM: arm64: Allow userspace to change ID_AA64MMFR{0-2}_EL1") made certain fields in these registers writable, but in doing so, ID_AA64MMFR1_EL1_XNX was listed twice. Remove the duplication. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Zenghui Yu <zenghui.yu@linux.dev> Link: https://lore.kernel.org/r/E1s2AxF-00AWLv-03@rmk-PC.armlinux.org.uk Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-04-14KVM: arm64: Improve out-of-order sysreg table diagnosticsMarc Zyngier1-2/+4
Adding new entries to our system register tables is a painful exercise, as we require them to be ordered by Op0,Op1,CRn,CRm,Op2. If an entry is misordered, we output an error that indicates the pointer to the entry and the number *of the last valid one*. That's not very helpful, and would be much better if we printed the number of the *offending* entry as well as its name (which is present in the vast majority of the cases). This makes debugging new additions to the tables much easier. Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20240410152503.3593890-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-03-16Revert "KVM: arm64: Snapshot all non-zero RES0/RES1 sysreg fields for later ↵Oliver Upton1-3/+0
checking" This reverts commits 99101dda29e3186b1356b0dc4dbb835c02c71ac9 and b80b701d5a67d07f4df4a21e09cb31f6bc1feeca. Linus reports that the sysreg reserved bit checks in KVM have led to build failures, arising from commit fdd867fe9b32 ("arm64/sysreg: Add register fields for ID_AA64DFR1_EL1") giving meaning to fields that were previously RES0. Of course, this is a genuine issue, since KVM's sysreg emulation depends heavily on the definition of reserved fields. But at this point the build breakage is far more offensive, and the right course of action is to revert and retry later. All of these build-time assertions were on by default before commit 99101dda29e3 ("KVM: arm64: Make build-time check of RES0/RES1 bits optional"), so deliberately revert it all atomically to avoid introducing further breakage of bisection. Link: https://lore.kernel.org/all/CAHk-=whCvkhc8BbFOUf1ddOsgSGgEjwoKv77=HEY1UiVCydGqw@mail.gmail.com/ Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-07Merge branch kvm-arm64/kerneldoc into kvmarm/nextOliver Upton1-2/+5
* kvm-arm64/kerneldoc: : kerneldoc warning fixes, courtesy of Randy Dunlap : : Fixes addressing the widespread misuse of kerneldoc-style comments : throughout KVM/arm64. KVM: arm64: vgic: fix a kernel-doc warning KVM: arm64: vgic-its: fix kernel-doc warnings KVM: arm64: vgic-init: fix a kernel-doc warning KVM: arm64: sys_regs: fix kernel-doc warnings KVM: arm64: PMU: fix kernel-doc warnings KVM: arm64: mmu: fix a kernel-doc warning KVM: arm64: vhe: fix a kernel-doc warning KVM: arm64: hyp/aarch32: fix kernel-doc warnings KVM: arm64: guest: fix kernel-doc warnings KVM: arm64: debug: fix kernel-doc warnings Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-27KVM: arm64: Don't initialize idreg debugfs w/ preemption disabledOliver Upton1-5/+8
Testing KVM with DEBUG_ATOMIC_SLEEP enabled doesn't get far before hitting the first splat: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 13062, name: vgic_lpi_stress preempt_count: 1, expected: 0 2 locks held by vgic_lpi_stress/13062: #0: ffff080084553240 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0xc0/0x13f0 #1: ffff800080485f08 (&kvm->arch.config_lock){+.+.}-{3:3}, at: kvm_arch_vcpu_ioctl+0xd60/0x1788 CPU: 19 PID: 13062 Comm: vgic_lpi_stress Tainted: G W O 6.8.0-dbg-DEV #1 Call trace: dump_backtrace+0xf8/0x148 show_stack+0x20/0x38 dump_stack_lvl+0xb4/0xf8 dump_stack+0x18/0x40 __might_resched+0x248/0x2a0 __might_sleep+0x50/0x88 down_write+0x30/0x150 start_creating+0x90/0x1a0 __debugfs_create_file+0x5c/0x1b0 debugfs_create_file+0x34/0x48 kvm_reset_sys_regs+0x120/0x1e8 kvm_reset_vcpu+0x148/0x270 kvm_arch_vcpu_ioctl+0xddc/0x1788 kvm_vcpu_ioctl+0xb6c/0x13f0 __arm64_sys_ioctl+0x98/0xd8 invoke_syscall+0x48/0x108 el0_svc_common+0xb4/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x54/0x128 el0t_64_sync_handler+0x68/0xc0 el0t_64_sync+0x1a8/0x1b0 kvm_reset_vcpu() disables preemption as it needs to unload vCPU state from the CPU to twiddle with it, which subsequently explodes when taking the parent inode's rwsem while creating the idreg debugfs file. Fix it by moving the initialization to kvm_arch_create_vm_debugfs(). Fixes: 891766581dea ("KVM: arm64: Add debugfs file for guest's ID registers") Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240227094115.1723330-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-27KVM: arm64: Fail the idreg iterator if idregs aren't initializedOliver Upton1-1/+2
Return an error to userspace if the VM's ID register values haven't been initialized in preparation for changing the debugfs file initialization order. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240227094115.1723330-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-19KVM: arm64: Add debugfs file for guest's ID registersMarc Zyngier1-0/+81
Debugging ID register setup can be a complicated affair. Give the kernel hacker a way to dump that state in an easy to parse way. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-27-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-19KVM: arm64: Snapshot all non-zero RES0/RES1 sysreg fields for later checkingMarc Zyngier1-0/+3
As KVM now strongly relies on accurately handling the RES0/RES1 bits on a number of paths, add a compile-time checker that will blow in the face of the innocent bystander, should they try to sneak in an update that changes any of these RES0/RES1 fields. It is expected that such an update will come with the relevant KVM update if needed. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-26-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-19KVM: arm64: Make FEAT_MOPS UNDEF if not advertised to the guestMarc Zyngier1-0/+7
We unconditionally enable FEAT_MOPS, which is obviously wrong. So let's only do that when it is advertised to the guest. Which means we need to rely on a per-vcpu HCRX_EL2 shadow register. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20240214131827.2856277-25-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-19KVM: arm64: Make AMU sysreg UNDEF if FEAT_AMU is not advertised to the guestMarc Zyngier1-0/+4
No AMU? No AMU! IF we see an AMU-related trap, let's turn it into an UNDEF! Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-24-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-19KVM: arm64: Make PIR{,E0}_EL1 UNDEF if S1PIE is not advertised to the guestMarc Zyngier1-0/+4
As part of the ongoing effort to honor the guest configuration, add the necessary checks to make PIR_EL1 and co UNDEF if not advertised to the guest, and avoid context switching them. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20240214131827.2856277-23-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-19KVM: arm64: Make TLBI OS/Range UNDEF if not advertised to the guestMarc Zyngier1-0/+34
Outer Shareable and Range TLBI instructions shouldn't be made available to the guest if they are not advertised. Use FGU to disable those, and set HCR_EL2.TLBIOS in the case the host doesn't have FGT. Note that in that later case, we cannot efficiently disable TLBI Range instructions, as this would require to trap all TLBIs. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20240214131827.2856277-22-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-19KVM: arm64: Move existing feature disabling over to FGU infrastructureMarc Zyngier1-0/+23
We already trap a bunch of existing features for the purpose of disabling them (MAIR2, POR, ACCDATA, SME...). Let's move them over to our brand new FGU infrastructure. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-20-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-19KVM: arm64: Rename __check_nv_sr_forward() to triage_sysreg_trap()Marc Zyngier1-1/+1
__check_nv_sr_forward() is not specific to NV anymore, and does a lot more. Rename it to triage_sysreg_trap(), making it plain that its role is to handle where an exception is to be handled. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-17-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-19KVM: arm64: Use the xarray as the primary sysreg/sysinsn walkerMarc Zyngier1-44/+19
Since we always start sysreg/sysinsn handling by searching the xarray, use it as the source of the index in the correct sys_reg_desc array. This allows some cleanup, such as moving the handling of unknown sysregs in a single location. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-16-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-19KVM: arm64: Register AArch64 system register entries with the sysreg xarrayMarc Zyngier1-1/+10
In order to reduce the number of lookups that we have to perform when handling a sysreg, register each AArch64 sysreg descriptor with the global xarray. The index of the descriptor is stored as a 10 bit field in the data word. Subsequent patches will retrieve and use the stored index. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-15-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-19KVM: arm64: Always populate the trap configuration xarrayMarc Zyngier1-4/+1
As we are going to rely more and more on the global xarray that contains the trap configuration, always populate it, even in the non-NV case. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-14-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-19KVM: arm64: nv: Move system instructions to their own sys_reg_desc arrayMarc Zyngier1-15/+44
As NV results in a bunch of system instructions being trapped, it makes sense to pull the system instructions into their own little array, where they will eventually be joined by AT, TLBI and a bunch of other CMOs. Based on an initial patch by Jintack Lim. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-13-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-19KVM: arm64: Add feature checking helpersMarc Zyngier1-4/+2
In order to make it easier to check whether a particular feature is exposed to a guest, add a new set of helpers, with kvm_has_feat() being the most useful. Let's start making use of them in the PMU code (courtesy of Oliver). Follow-up changes will introduce additional use patterns. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Co-developed--by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-08KVM: arm64: Force guest's HCR_EL2.E2H RES1 when NV1 is not implementedMarc Zyngier1-1/+11
If NV1 isn't supported on a system, make sure we always evaluate the guest's HCR_EL2.E2H as RES1, irrespective of what the guest may have written there. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240122181344.258974-10-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-08KVM: arm64: Expose ID_AA64MMFR4_EL1 to guestsMarc Zyngier1-1/+1
We can now expose ID_AA64MMFR4_EL1 to guests, and let NV guests understand that they cannot really switch HCR_EL2.E2H to 0 on some platforms. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240122181344.258974-9-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-08arm64: Add macro to compose a sysreg field valueMarc Zyngier1-1/+2
A common idiom is to compose a tupple (reg, field, val) into a symbol matching an autogenerated definition. Add a help performing the concatenation and replace it when open-coded implementations exist. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20240122181344.258974-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-02-01KVM: arm64: sys_regs: fix kernel-doc warningsRandy Dunlap1-2/+5
Drop the @run function parameter descriptions and add the actual ones for 2 functions to prevent kernel-doc warnings: arch/arm64/kvm/sys_regs.c:3167: warning: Excess function parameter 'run' description in 'kvm_handle_cp_64' arch/arm64/kvm/sys_regs.c:3335: warning: Excess function parameter 'run' description in 'kvm_handle_cp_32' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.linux.dev Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20240117230714.31025-8-rdunlap@infradead.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-01-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-48/+187
Pull kvm updates from Paolo Bonzini: "Generic: - Use memdup_array_user() to harden against overflow. - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures. - Clean up Kconfigs that all KVM architectures were selecting - New functionality around "guest_memfd", a new userspace API that creates an anonymous file and returns a file descriptor that refers to it. guest_memfd files are bound to their owning virtual machine, cannot be mapped, read, or written by userspace, and cannot be resized. guest_memfd files do however support PUNCH_HOLE, which can be used to switch a memory area between guest_memfd and regular anonymous memory. - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify per-page attributes for a given page of guest memory; right now the only attribute is whether the guest expects to access memory via guest_memfd or not, which in Confidential SVMs backed by SEV-SNP, TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM). x86: - Support for "software-protected VMs" that can use the new guest_memfd and page attributes infrastructure. This is mostly useful for testing, since there is no pKVM-like infrastructure to provide a meaningfully reduced TCB. - Fix a relatively benign off-by-one error when splitting huge pages during CLEAR_DIRTY_LOG. - Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE. - Use more generic lockdep assertions in paths that don't actually care about whether the caller is a reader or a writer. - let Xen guests opt out of having PV clock reported as "based on a stable TSC", because some of them don't expect the "TSC stable" bit (added to the pvclock ABI by KVM, but never set by Xen) to be set. - Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL. - Advertise flush-by-ASID support for nSVM unconditionally, as KVM always flushes on nested transitions, i.e. always satisfies flush requests. This allows running bleeding edge versions of VMware Workstation on top of KVM. - Sanity check that the CPU supports flush-by-ASID when enabling SEV support. - On AMD machines with vNMI, always rely on hardware instead of intercepting IRET in some cases to detect unmasking of NMIs - Support for virtualizing Linear Address Masking (LAM) - Fix a variety of vPMU bugs where KVM fail to stop/reset counters and other state prior to refreshing the vPMU model. - Fix a double-overflow PMU bug by tracking emulated counter events using a dedicated field instead of snapshotting the "previous" counter. If the hardware PMC count triggers overflow that is recognized in the same VM-Exit that KVM manually bumps an event count, KVM would pend PMIs for both the hardware-triggered overflow and for KVM-triggered overflow. - Turn off KVM_WERROR by default for all configs so that it's not inadvertantly enabled by non-KVM developers, which can be problematic for subsystems that require no regressions for W=1 builds. - Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL "features". - Don't force a masterclock update when a vCPU synchronizes to the current TSC generation, as updating the masterclock can cause kvmclock's time to "jump" unexpectedly, e.g. when userspace hotplugs a pre-created vCPU. - Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths, partly as a super minor optimization, but mostly to make KVM play nice with position independent executable builds. - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on CONFIG_HYPERV as a minor optimization, and to self-document the code. - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation" at build time. ARM64: - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base granule sizes. Branch shared with the arm64 tree. - Large Fine-Grained Trap rework, bringing some sanity to the feature, although there is more to come. This comes with a prefix branch shared with the arm64 tree. - Some additional Nested Virtualization groundwork, mostly introducing the NV2 VNCR support and retargetting the NV support to that version of the architecture. - A small set of vgic fixes and associated cleanups. Loongarch: - Optimization for memslot hugepage checking - Cleanup and fix some HW/SW timer issues - Add LSX/LASX (128bit/256bit SIMD) support RISC-V: - KVM_GET_REG_LIST improvement for vector registers - Generate ISA extension reg_list using macros in get-reg-list selftest - Support for reporting steal time along with selftest s390: - Bugfixes Selftests: - Fix an annoying goof where the NX hugepage test prints out garbage instead of the magic token needed to run the test. - Fix build errors when a header is delete/moved due to a missing flag in the Makefile. - Detect if KVM bugged/killed a selftest's VM and print out a helpful message instead of complaining that a random ioctl() failed. - Annotate the guest printf/assert helpers with __printf(), and fix the various bugs that were lurking due to lack of said annotation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits) x86/kvm: Do not try to disable kvmclock if it was not enabled KVM: x86: add missing "depends on KVM" KVM: fix direction of dependency on MMU notifiers KVM: introduce CONFIG_KVM_COMMON KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache RISC-V: KVM: selftests: Add get-reg-list test for STA registers RISC-V: KVM: selftests: Add steal_time test support RISC-V: KVM: selftests: Add guest_sbi_probe_extension RISC-V: KVM: selftests: Move sbi_ecall to processor.c RISC-V: KVM: Implement SBI STA extension RISC-V: KVM: Add support for SBI STA registers RISC-V: KVM: Add support for SBI extension registers RISC-V: KVM: Add SBI STA info to vcpu_arch RISC-V: KVM: Add steal-update vcpu request RISC-V: KVM: Add SBI STA extension skeleton RISC-V: paravirt: Implement steal-time support RISC-V: Add SBI STA extension definitions RISC-V: paravirt: Add skeleton for pv-time support RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr() ...