summaryrefslogtreecommitdiff
path: root/arch/arm64
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2023-02-25 22:30:21 +0300
committerLinus Torvalds <torvalds@linux-foundation.org>2023-02-25 22:30:21 +0300
commit49d575926890e6ada930bf6f06d62b2fde8fce95 (patch)
tree2071ea5d42156e65b8b934b60c9dfcd62b9d196c /arch/arm64
parent01687e7c935ef70eca69ea2d468020bc93e898dc (diff)
parent45dd9bc75d9adc9483f0c7d662ba6e73ed698a0b (diff)
downloadlinux-49d575926890e6ada930bf6f06d62b2fde8fce95.tar.xz
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini: "ARM: - Provide a virtual cache topology to the guest to avoid inconsistencies with migration on heterogenous systems. Non secure software has no practical need to traverse the caches by set/way in the first place - Add support for taking stage-2 access faults in parallel. This was an accidental omission in the original parallel faults implementation, but should provide a marginal improvement to machines w/o FEAT_HAFDBS (such as hardware from the fruit company) - A preamble to adding support for nested virtualization to KVM, including vEL2 register state, rudimentary nested exception handling and masking unsupported features for nested guests - Fixes to the PSCI relay that avoid an unexpected host SVE trap when resuming a CPU when running pKVM - VGIC maintenance interrupt support for the AIC - Improvements to the arch timer emulation, primarily aimed at reducing the trap overhead of running nested - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the interest of CI systems - Avoid VM-wide stop-the-world operations when a vCPU accesses its own redistributor - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions in the host - Aesthetic and comment/kerneldoc fixes - Drop the vestiges of the old Columbia mailing list and add [Oliver] as co-maintainer RISC-V: - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE - Correctly place the guest in S-mode after redirecting a trap to the guest - Redirect illegal instruction traps to guest - SBI PMU support for guest s390: - Sort out confusion between virtual and physical addresses, which currently are the same on s390 - A new ioctl that performs cmpxchg on guest memory - A few fixes x86: - Change tdp_mmu to a read-only parameter - Separate TDP and shadow MMU page fault paths - Enable Hyper-V invariant TSC control - Fix a variety of APICv and AVIC bugs, some of them real-world, some of them affecting architecurally legal but unlikely to happen in practice - Mark APIC timer as expired if its in one-shot mode and the count underflows while the vCPU task was being migrated - Advertise support for Intel's new fast REP string features - Fix a double-shootdown issue in the emergency reboot code - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM similar treatment to VMX - Update Xen's TSC info CPUID sub-leaves as appropriate - Add support for Hyper-V's extended hypercalls, where "support" at this point is just forwarding the hypercalls to userspace - Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and MSR filters - One-off fixes and cleanups - Fix and cleanup the range-based TLB flushing code, used when KVM is running on Hyper-V - Add support for filtering PMU events using a mask. If userspace wants to restrict heavily what events the guest can use, it can now do so without needing an absurd number of filter entries - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU support is disabled - Add PEBS support for Intel Sapphire Rapids - Fix a mostly benign overflow bug in SEV's send|receive_update_data() - Move several SVM-specific flags into vcpu_svm x86 Intel: - Handle NMI VM-Exits before leaving the noinstr region - A few trivial cleanups in the VM-Enter flows - Stop enabling VMFUNC for L1 purely to document that KVM doesn't support EPTP switching (or any other VM function) for L1 - Fix a crash when using eVMCS's enlighted MSR bitmaps Generic: - Clean up the hardware enable and initialization flow, which was scattered around multiple arch-specific hooks. Instead, just let the arch code call into generic code. Both x86 and ARM should benefit from not having to fight common KVM code's notion of how to do initialization - Account allocations in generic kvm_arch_alloc_vm() - Fix a memory leak if coalesced MMIO unregistration fails selftests: - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit the correct hypercall instruction instead of relying on KVM to patch in VMMCALL - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits) KVM: SVM: hyper-v: placate modpost section mismatch error KVM: x86/mmu: Make tdp_mmu_allowed static KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes KVM: arm64: nv: Filter out unsupported features from ID regs KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 KVM: arm64: nv: Allow a sysreg to be hidden from userspace only KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2 KVM: arm64: nv: Handle SMCs taken from virtual EL2 KVM: arm64: nv: Handle trapped ERET from virtual EL2 KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 KVM: arm64: nv: Support virtual EL2 exceptions KVM: arm64: nv: Handle HCR_EL2.NV system register traps KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state KVM: arm64: nv: Add EL2 system registers to vcpu context KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set KVM: arm64: nv: Introduce nested virtualization VCPU feature KVM: arm64: Use the S2 MMU context to iterate over S2 table ...
Diffstat (limited to 'arch/arm64')
-rw-r--r--arch/arm64/include/asm/cache.h9
-rw-r--r--arch/arm64/include/asm/el2_setup.h99
-rw-r--r--arch/arm64/include/asm/esr.h4
-rw-r--r--arch/arm64/include/asm/kvm_arm.h23
-rw-r--r--arch/arm64/include/asm/kvm_emulate.h70
-rw-r--r--arch/arm64/include/asm/kvm_host.h67
-rw-r--r--arch/arm64/include/asm/kvm_hyp.h1
-rw-r--r--arch/arm64/include/asm/kvm_mmu.h15
-rw-r--r--arch/arm64/include/asm/kvm_nested.h20
-rw-r--r--arch/arm64/include/asm/kvm_pgtable.h8
-rw-r--r--arch/arm64/include/asm/sysreg.h39
-rw-r--r--arch/arm64/include/uapi/asm/kvm.h1
-rw-r--r--arch/arm64/kernel/cacheinfo.c5
-rw-r--r--arch/arm64/kernel/cpufeature.c25
-rw-r--r--arch/arm64/kernel/hyp-stub.S86
-rw-r--r--arch/arm64/kvm/Kconfig1
-rw-r--r--arch/arm64/kvm/Makefile2
-rw-r--r--arch/arm64/kvm/arch_timer.c106
-rw-r--r--arch/arm64/kvm/arm.c109
-rw-r--r--arch/arm64/kvm/emulate-nested.c203
-rw-r--r--arch/arm64/kvm/fpsimd.c1
-rw-r--r--arch/arm64/kvm/guest.c6
-rw-r--r--arch/arm64/kvm/handle_exit.c47
-rw-r--r--arch/arm64/kvm/hyp/exception.c48
-rw-r--r--arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h21
-rw-r--r--arch/arm64/kvm/hyp/nvhe/hyp-init.S1
-rw-r--r--arch/arm64/kvm/hyp/nvhe/sys_regs.c1
-rw-r--r--arch/arm64/kvm/hyp/pgtable.c43
-rw-r--r--arch/arm64/kvm/hyp/vhe/switch.c26
-rw-r--r--arch/arm64/kvm/hypercalls.c2
-rw-r--r--arch/arm64/kvm/inject_fault.c61
-rw-r--r--arch/arm64/kvm/mmu.c46
-rw-r--r--arch/arm64/kvm/nested.c161
-rw-r--r--arch/arm64/kvm/pvtime.c8
-rw-r--r--arch/arm64/kvm/reset.c25
-rw-r--r--arch/arm64/kvm/sys_regs.c459
-rw-r--r--arch/arm64/kvm/sys_regs.h14
-rw-r--r--arch/arm64/kvm/trace_arm.h59
-rw-r--r--arch/arm64/kvm/vgic/vgic-init.c21
-rw-r--r--arch/arm64/kvm/vgic/vgic-mmio.c13
-rw-r--r--arch/arm64/kvm/vgic/vgic-v3.c9
-rw-r--r--arch/arm64/kvm/vmid.c6
-rw-r--r--arch/arm64/tools/cpucaps1
-rwxr-xr-xarch/arm64/tools/gen-sysreg.awk20
-rw-r--r--arch/arm64/tools/sysreg17
45 files changed, 1594 insertions, 415 deletions
diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index c0b178d1bb4f..a51e6e8f3171 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -16,6 +16,15 @@
#define CLIDR_LOC(clidr) (((clidr) >> CLIDR_LOC_SHIFT) & 0x7)
#define CLIDR_LOUIS(clidr) (((clidr) >> CLIDR_LOUIS_SHIFT) & 0x7)
+/* Ctypen, bits[3(n - 1) + 2 : 3(n - 1)], for n = 1 to 7 */
+#define CLIDR_CTYPE_SHIFT(level) (3 * (level - 1))
+#define CLIDR_CTYPE_MASK(level) (7 << CLIDR_CTYPE_SHIFT(level))
+#define CLIDR_CTYPE(clidr, level) \
+ (((clidr) & CLIDR_CTYPE_MASK(level)) >> CLIDR_CTYPE_SHIFT(level))
+
+/* Ttypen, bits [2(n - 1) + 34 : 2(n - 1) + 33], for n = 1 to 7 */
+#define CLIDR_TTYPE_SHIFT(level) (2 * ((level) - 1) + CLIDR_EL1_Ttypen_SHIFT)
+
/*
* Memory returned by kmalloc() may be used for DMA, so we must make
* sure that all such allocations are cache aligned. Otherwise,
diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
index 2cdd010f9524..037724b19c5c 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -196,4 +196,103 @@
__init_el2_nvhe_prepare_eret
.endm
+#ifndef __KVM_NVHE_HYPERVISOR__
+// This will clobber tmp1 and tmp2, and expect tmp1 to contain
+// the id register value as read from the HW
+.macro __check_override idreg, fld, width, pass, fail, tmp1, tmp2
+ ubfx \tmp1, \tmp1, #\fld, #\width
+ cbz \tmp1, \fail
+
+ adr_l \tmp1, \idreg\()_override
+ ldr \tmp2, [\tmp1, FTR_OVR_VAL_OFFSET]
+ ldr \tmp1, [\tmp1, FTR_OVR_MASK_OFFSET]
+ ubfx \tmp2, \tmp2, #\fld, #\width
+ ubfx \tmp1, \tmp1, #\fld, #\width
+ cmp \tmp1, xzr
+ and \tmp2, \tmp2, \tmp1
+ csinv \tmp2, \tmp2, xzr, ne
+ cbnz \tmp2, \pass
+ b \fail
+.endm
+
+// This will clobber tmp1 and tmp2
+.macro check_override idreg, fld, pass, fail, tmp1, tmp2
+ mrs \tmp1, \idreg\()_el1
+ __check_override \idreg \fld 4 \pass \fail \tmp1 \tmp2
+.endm
+#else
+// This will clobber tmp
+.macro __check_override idreg, fld, width, pass, fail, tmp, ignore
+ ldr_l \tmp, \idreg\()_el1_sys_val
+ ubfx \tmp, \tmp, #\fld, #\width
+ cbnz \tmp, \pass
+ b \fail
+.endm
+
+.macro check_override idreg, fld, pass, fail, tmp, ignore
+ __check_override \idreg \fld 4 \pass \fail \tmp \ignore
+.endm
+#endif
+
+.macro finalise_el2_state
+ check_override id_aa64pfr0, ID_AA64PFR0_EL1_SVE_SHIFT, .Linit_sve_\@, .Lskip_sve_\@, x1, x2
+
+.Linit_sve_\@: /* SVE register access */
+ mrs x0, cptr_el2 // Disable SVE traps
+ bic x0, x0, #CPTR_EL2_TZ
+ msr cptr_el2, x0
+ isb
+ mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
+ msr_s SYS_ZCR_EL2, x1 // length for EL1.
+
+.Lskip_sve_\@:
+ check_override id_aa64pfr1, ID_AA64PFR1_EL1_SME_SHIFT, .Linit_sme_\@, .Lskip_sme_\@, x1, x2
+
+.Linit_sme_\@: /* SME register access and priority mapping */
+ mrs x0, cptr_el2 // Disable SME traps
+ bic x0, x0, #CPTR_EL2_TSM
+ msr cptr_el2, x0
+ isb
+
+ mrs x1, sctlr_el2
+ orr x1, x1, #SCTLR_ELx_ENTP2 // Disable TPIDR2 traps
+ msr sctlr_el2, x1
+ isb
+
+ mov x0, #0 // SMCR controls
+
+ // Full FP in SM?
+ mrs_s x1, SYS_ID_AA64SMFR0_EL1
+ __check_override id_aa64smfr0, ID_AA64SMFR0_EL1_FA64_SHIFT, 1, .Linit_sme_fa64_\@, .Lskip_sme_fa64_\@, x1, x2
+
+.Linit_sme_fa64_\@:
+ orr x0, x0, SMCR_ELx_FA64_MASK
+.Lskip_sme_fa64_\@:
+
+ // ZT0 available?
+ mrs_s x1, SYS_ID_AA64SMFR0_EL1
+ __check_override id_aa64smfr0, ID_AA64SMFR0_EL1_SMEver_SHIFT, 4, .Linit_sme_zt0_\@, .Lskip_sme_zt0_\@, x1, x2
+.Linit_sme_zt0_\@:
+ orr x0, x0, SMCR_ELx_EZT0_MASK
+.Lskip_sme_zt0_\@:
+
+ orr x0, x0, #SMCR_ELx_LEN_MASK // Enable full SME vector
+ msr_s SYS_SMCR_EL2, x0 // length for EL1.
+
+ mrs_s x1, SYS_SMIDR_EL1 // Priority mapping supported?
+ ubfx x1, x1, #SMIDR_EL1_SMPS_SHIFT, #1
+ cbz x1, .Lskip_sme_\@
+
+ msr_s SYS_SMPRIMAP_EL2, xzr // Make all priorities equal
+
+ mrs x1, id_aa64mmfr1_el1 // HCRX_EL2 present?
+ ubfx x1, x1, #ID_AA64MMFR1_EL1_HCX_SHIFT, #4
+ cbz x1, .Lskip_sme_\@
+
+ mrs_s x1, SYS_HCRX_EL2
+ orr x1, x1, #HCRX_EL2_SMPME_MASK // Enable priority mapping
+ msr_s SYS_HCRX_EL2, x1
+.Lskip_sme_\@:
+.endm
+
#endif /* __ARM_KVM_INIT_H__ */
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index c9f15b9e3c71..8487aec9b658 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -272,6 +272,10 @@
(((e) & ESR_ELx_SYS64_ISS_OP2_MASK) >> \
ESR_ELx_SYS64_ISS_OP2_SHIFT))
+/* ISS field definitions for ERET/ERETAA/ERETAB trapping */
+#define ESR_ELx_ERET_ISS_ERET 0x2
+#define ESR_ELx_ERET_ISS_ERETA 0x1
+
/*
* ISS field definitions for floating-point exception traps
* (FP_EXC_32/FP_EXC_64).
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 26b0c97df986..baef29fcbeee 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -81,11 +81,12 @@
* SWIO: Turn set/way invalidates into set/way clean+invalidate
* PTW: Take a stage2 fault if a stage1 walk steps in device memory
* TID3: Trap EL1 reads of group 3 ID registers
+ * TID2: Trap CTR_EL0, CCSIDR2_EL1, CLIDR_EL1, and CSSELR_EL1
*/
#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
HCR_BSU_IS | HCR_FB | HCR_TACR | \
HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
- HCR_FMO | HCR_IMO | HCR_PTW | HCR_TID3 )
+ HCR_FMO | HCR_IMO | HCR_PTW | HCR_TID3 | HCR_TID2)
#define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
#define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
#define HCR_HOST_NVHE_PROTECTED_FLAGS (HCR_HOST_NVHE_FLAGS | HCR_TSC)
@@ -344,10 +345,26 @@
ECN(SP_ALIGN), ECN(FP_EXC32), ECN(FP_EXC64), ECN(SERROR), \
ECN(BREAKPT_LOW), ECN(BREAKPT_CUR), ECN(SOFTSTP_LOW), \
ECN(SOFTSTP_CUR), ECN(WATCHPT_LOW), ECN(WATCHPT_CUR), \
- ECN(BKPT32), ECN(VECTOR32), ECN(BRK64)
+ ECN(BKPT32), ECN(VECTOR32), ECN(BRK64), ECN(ERET)
-#define CPACR_EL1_TTA (1 << 28)
#define CPACR_EL1_DEFAULT (CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN |\
CPACR_EL1_ZEN_EL1EN)
+#define kvm_mode_names \
+ { PSR_MODE_EL0t, "EL0t" }, \
+ { PSR_MODE_EL1t, "EL1t" }, \
+ { PSR_MODE_EL1h, "EL1h" }, \
+ { PSR_MODE_EL2t, "EL2t" }, \
+ { PSR_MODE_EL2h, "EL2h" }, \
+ { PSR_MODE_EL3t, "EL3t" }, \
+ { PSR_MODE_EL3h, "EL3h" }, \
+ { PSR_AA32_MODE_USR, "32-bit USR" }, \
+ { PSR_AA32_MODE_FIQ, "32-bit FIQ" }, \
+ { PSR_AA32_MODE_IRQ, "32-bit IRQ" }, \
+ { PSR_AA32_MODE_SVC, "32-bit SVC" }, \
+ { PSR_AA32_MODE_ABT, "32-bit ABT" }, \
+ { PSR_AA32_MODE_HYP, "32-bit HYP" }, \
+ { PSR_AA32_MODE_UND, "32-bit UND" }, \
+ { PSR_AA32_MODE_SYS, "32-bit SYS" }
+
#endif /* __ARM64_KVM_ARM_H__ */
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 193583df2d9c..b31b32ecbe2d 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -33,6 +33,12 @@ enum exception_type {
except_type_serror = 0x180,
};
+#define kvm_exception_type_names \
+ { except_type_sync, "SYNC" }, \
+ { except_type_irq, "IRQ" }, \
+ { except_type_fiq, "FIQ" }, \
+ { except_type_serror, "SERROR" }
+
bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
void kvm_skip_instr32(struct kvm_vcpu *vcpu);
@@ -44,6 +50,10 @@ void kvm_inject_size_fault(struct kvm_vcpu *vcpu);
void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
+void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
+int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
+int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
+
#if defined(__KVM_VHE_HYPERVISOR__) || defined(__KVM_NVHE_HYPERVISOR__)
static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
{
@@ -88,10 +98,6 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
if (vcpu_el1_is_32bit(vcpu))
vcpu->arch.hcr_el2 &= ~HCR_RW;
- if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
- vcpu_el1_is_32bit(vcpu))
- vcpu->arch.hcr_el2 |= HCR_TID2;
-
if (kvm_has_mte(vcpu->kvm))
vcpu->arch.hcr_el2 |= HCR_ATA;
}
@@ -183,6 +189,62 @@ static __always_inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
vcpu_gp_regs(vcpu)->regs[reg_num] = val;
}
+static inline bool vcpu_is_el2_ctxt(const struct kvm_cpu_context *ctxt)
+{
+ switch (ctxt->regs.pstate & (PSR_MODE32_BIT | PSR_MODE_MASK)) {
+ case PSR_MODE_EL2h:
+ case PSR_MODE_EL2t:
+ return true;
+ default:
+ return false;
+ }
+}
+
+static inline bool vcpu_is_el2(const struct kvm_vcpu *vcpu)
+{
+ return vcpu_is_el2_ctxt(&vcpu->arch.ctxt);
+}
+
+static inline bool __vcpu_el2_e2h_is_set(const struct kvm_cpu_context *ctxt)
+{
+ return ctxt_sys_reg(ctxt, HCR_EL2) & HCR_E2H;
+}
+
+static inline bool vcpu_el2_e2h_is_set(const struct kvm_vcpu *vcpu)
+{
+ return __vcpu_el2_e2h_is_set(&vcpu->arch.ctxt);
+}
+
+static inline bool __vcpu_el2_tge_is_set(const struct kvm_cpu_context *ctxt)
+{
+ return ctxt_sys_reg(ctxt, HCR_EL2) & HCR_TGE;
+}
+
+static inline bool vcpu_el2_tge_is_set(const struct kvm_vcpu *vcpu)
+{
+ return __vcpu_el2_tge_is_set(&vcpu->arch.ctxt);
+}
+
+static inline bool __is_hyp_ctxt(const struct kvm_cpu_context *ctxt)
+{
+ /*
+ * We are in a hypervisor context if the vcpu mode is EL2 or
+ * E2H and TGE bits are set. The latter means we are in the user space
+ * of the VHE kernel. ARMv8.1 ARM describes this as 'InHost'
+ *
+ * Note that the HCR_EL2.{E2H,TGE}={0,1} isn't really handled in the
+ * rest of the KVM code, and will result in a misbehaving guest.
+ */
+ return vcpu_is_el2_ctxt(ctxt) ||
+ (__vcpu_el2_e2h_is_set(ctxt) && __vcpu_el2_tge_is_set(ctxt)) ||
+ __vcpu_el2_tge_is_set(ctxt);
+}
+
+static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
+{
+ return __is_hyp_ctxt(&vcpu->arch.ctxt);
+}
+
/*
* The layout of SPSR for an AArch32 state is different when observed from an
* AArch64 SPSR_ELx or an AArch32 SPSR_*. This function generates the AArch32
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 35a159d131b5..a1892a8f6032 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -60,14 +60,19 @@
enum kvm_mode {
KVM_MODE_DEFAULT,
KVM_MODE_PROTECTED,
+ KVM_MODE_NV,
KVM_MODE_NONE,
};
+#ifdef CONFIG_KVM
enum kvm_mode kvm_get_mode(void);
+#else
+static inline enum kvm_mode kvm_get_mode(void) { return KVM_MODE_NONE; };
+#endif
DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
-extern unsigned int kvm_sve_max_vl;
-int kvm_arm_init_sve(void);
+extern unsigned int __ro_after_init kvm_sve_max_vl;
+int __init kvm_arm_init_sve(void);
u32 __attribute_const__ kvm_target_cpu(void);
int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
@@ -252,6 +257,7 @@ struct kvm_vcpu_fault_info {
enum vcpu_sysreg {
__INVALID_SYSREG__, /* 0 is reserved as an invalid value */
MPIDR_EL1, /* MultiProcessor Affinity Register */
+ CLIDR_EL1, /* Cache Level ID Register */
CSSELR_EL1, /* Cache Size Selection Register */
SCTLR_EL1, /* System Control Register */
ACTLR_EL1, /* Auxiliary Control Register */
@@ -320,12 +326,43 @@ enum vcpu_sysreg {
TFSR_EL1, /* Tag Fault Status Register (EL1) */
TFSRE0_EL1, /* Tag Fault Status Register (EL0) */
- /* 32bit specific registers. Keep them at the end of the range */
+ /* 32bit specific registers. */
DACR32_EL2, /* Domain Access Control Register */
IFSR32_EL2, /* Instruction Fault Status Register */
FPEXC32_EL2, /* Floating-Point Exception Control Register */
DBGVCR32_EL2, /* Debug Vector Catch Register */
+ /* EL2 registers */
+ VPIDR_EL2, /* Virtualization Processor ID Register */
+ VMPIDR_EL2, /* Virtualization Multiprocessor ID Register */
+ SCTLR_EL2, /* System Control Register (EL2) */
+ ACTLR_EL2, /* Auxiliary Control Register (EL2) */
+ HCR_EL2, /* Hypervisor Configuration Register */
+ MDCR_EL2, /* Monitor Debug Configuration Register (EL2) */
+ CPTR_EL2, /* Architectural Feature Trap Register (EL2) */
+ HSTR_EL2, /* Hypervisor System Trap Register */
+ HACR_EL2, /* Hypervisor Auxiliary Control Register */
+ TTBR0_EL2, /* Translation Table Base Register 0 (EL2) */
+ TTBR1_EL2, /* Translation Table Base Register 1 (EL2) */
+ TCR_EL2, /* Translation Control Register (EL2) */
+ VTTBR_EL2, /* Virtualization Translation Table Base Register */
+ VTCR_EL2, /* Virtualization Translation Control Register */
+ SPSR_EL2, /* EL2 saved program status register */
+ ELR_EL2, /* EL2 exception link register */
+ AFSR0_EL2, /* Auxiliary Fault Status Register 0 (EL2) */
+ AFSR1_EL2, /* Auxiliary Fault Status Register 1 (EL2) */
+ ESR_EL2, /* Exception Syndrome Register (EL2) */
+ FAR_EL2, /* Fault Address Register (EL2) */
+ HPFAR_EL2, /* Hypervisor IPA Fault Address Register */
+ MAIR_EL2, /* Memory Attribute Indirection Register (EL2) */
+ AMAIR_EL2, /* Auxiliary Memory Attribute Indirection Register (EL2) */
+ VBAR_EL2, /* Vector Base Address Register (EL2) */
+ RVBAR_EL2, /* Reset Vector Base Address Register */
+ CONTEXTIDR_EL2, /* Context ID Register (EL2) */
+ TPIDR_EL2, /* EL2 Software Thread ID Register */
+ CNTHCTL_EL2, /* Counter-timer Hypervisor Control register */
+ SP_EL2, /* EL2 Stack Pointer */
+
NR_SYS_REGS /* Nothing after this line! */
};
@@ -501,6 +538,9 @@ struct kvm_vcpu_arch {
u64 last_steal;
gpa_t base;
} steal;
+
+ /* Per-vcpu CCSIDR override or NULL */
+ u32 *ccsidr;
};
/*
@@ -598,7 +638,7 @@ struct kvm_vcpu_arch {
#define EXCEPT_AA64_EL1_IRQ __vcpu_except_flags(1)
#define EXCEPT_AA64_EL1_FIQ __vcpu_except_flags(2)
#define EXCEPT_AA64_EL1_SERR __vcpu_except_flags(3)
-/* For AArch64 with NV (one day): */
+/* For AArch64 with NV: */
#define EXCEPT_AA64_EL2_SYNC __vcpu_except_flags(4)
#define EXCEPT_AA64_EL2_IRQ __vcpu_except_flags(5)
#define EXCEPT_AA64_EL2_FIQ __vcpu_except_flags(6)
@@ -609,6 +649,8 @@ struct kvm_vcpu_arch {
#define DEBUG_STATE_SAVE_SPE __vcpu_single_flag(iflags, BIT(5))
/* Save TRBE context if active */
#define DEBUG_STATE_SAVE_TRBE __vcpu_single_flag(iflags, BIT(6))
+/* vcpu running in HYP context */
+#define VCPU_HYP_CONTEXT __vcpu_single_flag(iflags, BIT(7))
/* SVE enabled for host EL0 */
#define HOST_SVE_ENABLED __vcpu_single_flag(sflags, BIT(0))
@@ -705,7 +747,6 @@ static inline bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
return false;
switch (reg) {
- case CSSELR_EL1: *val = read_sysreg_s(SYS_CSSELR_EL1); break;
case SCTLR_EL1: *val = read_sysreg_s(SYS_SCTLR_EL12); break;
case CPACR_EL1: *val = read_sysreg_s(SYS_CPACR_EL12); break;
case TTBR0_EL1: *val = read_sysreg_s(SYS_TTBR0_EL12); break;
@@ -750,7 +791,6 @@ static inline bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
return false;
switch (reg) {
- case CSSELR_EL1: write_sysreg_s(val, SYS_CSSELR_EL1); break;
case SCTLR_EL1: write_sysreg_s(val, SYS_SCTLR_EL12); break;
case CPACR_EL1: write_sysreg_s(val, SYS_CPACR_EL12); break;
case TTBR0_EL1: write_sysreg_s(val, SYS_TTBR0_EL12); break;
@@ -877,7 +917,7 @@ int kvm_handle_cp10_id(struct kvm_vcpu *vcpu);
void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
-int kvm_sys_reg_table_init(void);
+int __init kvm_sys_reg_table_init(void);
/* MMIO helpers */
void kvm_mmio_write_buf(void *buf, unsigned int len, unsigned long data);
@@ -908,20 +948,20 @@ int kvm_arm_pvtime_get_attr(struct kvm_vcpu *vcpu,
int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
-extern unsigned int kvm_arm_vmid_bits;
-int kvm_arm_vmid_alloc_init(void);
-void kvm_arm_vmid_alloc_free(void);
+extern unsigned int __ro_after_init kvm_arm_vmid_bits;
+int __init kvm_arm_vmid_alloc_init(void);
+void __init kvm_arm_vmid_alloc_free(void);
void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid);
void kvm_arm_vmid_clear_active(void);
static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch)
{
- vcpu_arch->steal.base = GPA_INVALID;
+ vcpu_arch->steal.base = INVALID_GPA;
}
static inline bool kvm_arm_is_pvtime_enabled(struct kvm_vcpu_arch *vcpu_arch)
{
- return (vcpu_arch->steal.base != GPA_INVALID);
+ return (vcpu_arch->steal.base != INVALID_GPA);
}
void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
@@ -943,7 +983,6 @@ static inline bool kvm_system_needs_idmapped_vectors(void)
void kvm_arm_vcpu_ptrauth_trap(struct kvm_vcpu *vcpu);
-static inline void kvm_arch_hardware_unsetup(void) {}
static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
@@ -994,7 +1033,7 @@ static inline void kvm_clr_pmu_events(u32 clr) {}
void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu);
void kvm_vcpu_put_sysregs_vhe(struct kvm_vcpu *vcpu);
-int kvm_set_ipa_limit(void);
+int __init kvm_set_ipa_limit(void);
#define __KVM_HAVE_ARCH_VM_ALLOC
struct kvm *kvm_arch_alloc_vm(void);
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 6797eafe7890..bdd9cf546d95 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -122,6 +122,7 @@ extern u64 kvm_nvhe_sym(id_aa64isar2_el1_sys_val);
extern u64 kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val);
extern u64 kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val);
extern u64 kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val);
+extern u64 kvm_nvhe_sym(id_aa64smfr0_el1_sys_val);
extern unsigned long kvm_nvhe_sym(__icache_flags);
extern unsigned int kvm_nvhe_sym(kvm_arm_vmid_bits);
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index e4a7e6369499..083cc47dca08 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -115,6 +115,7 @@ alternative_cb_end
#include <asm/cache.h>
#include <asm/cacheflush.h>
#include <asm/mmu_context.h>
+#include <asm/kvm_emulate.h>
#include <asm/kvm_host.h>
void kvm_update_va_mask(struct alt_instr *alt,
@@ -163,7 +164,7 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
void __iomem **haddr);
int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
void **haddr);
-void free_hyp_pgds(void);
+void __init free_hyp_pgds(void);
void stage2_unmap_vm(struct kvm *kvm);
int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type);
@@ -175,7 +176,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
phys_addr_t kvm_mmu_get_httbr(void);
phys_addr_t kvm_get_idmap_vector(void);
-int kvm_mmu_init(u32 *hyp_va_bits);
+int __init kvm_mmu_init(u32 *hyp_va_bits);
static inline void *__kvm_vector_slot2addr(void *base,
enum arm64_hyp_spectre_vector slot)
@@ -192,7 +193,15 @@ struct kvm;
static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
{
- return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
+ u64 cache_bits = SCTLR_ELx_M | SCTLR_ELx_C;
+ int reg;
+
+ if (vcpu_is_el2(vcpu))
+ reg = SCTLR_EL2;
+ else
+ reg = SCTLR_EL1;
+
+ return (vcpu_read_sys_reg(vcpu, reg) & cache_bits) == cache_bits;
}
static inline void __clean_dcache_guest_page(void *va, size_t size)
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
new file mode 100644
index 000000000000..8fb67f032fd1
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ARM64_KVM_NESTED_H
+#define __ARM64_KVM_NESTED_H
+
+#include <linux/kvm_host.h>
+
+static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
+{
+ return (!__is_defined(__KVM_NVHE_HYPERVISOR__) &&
+ cpus_have_final_cap(ARM64_HAS_NESTED_VIRT) &&
+ test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features));
+}
+
+struct sys_reg_params;
+struct sys_reg_desc;
+
+void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
+ const struct sys_reg_desc *r);
+
+#endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 63f81b27a4e3..4cd6762bda80 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -71,6 +71,11 @@ static inline kvm_pte_t kvm_phys_to_pte(u64 pa)
return pte;
}
+static inline kvm_pfn_t kvm_pte_to_pfn(kvm_pte_t pte)
+{
+ return __phys_to_pfn(kvm_pte_to_phys(pte));
+}
+
static inline u64 kvm_granule_shift(u32 level)
{
/* Assumes KVM_PGTABLE_MAX_LEVELS is 4 */
@@ -188,12 +193,15 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
* children.
* @KVM_PGTABLE_WALK_SHARED: Indicates the page-tables may be shared
* with other software walkers.
+ * @KVM_PGTABLE_WALK_HANDLE_FAULT: Indicates the page-table walk was
+ * invoked from a fault handler.
*/
enum kvm_pgtable_walk_flags {
KVM_PGTABLE_WALK_LEAF = BIT(0),
KVM_PGTABLE_WALK_TABLE_PRE = BIT(1),
KVM_PGTABLE_WALK_TABLE_POST = BIT(2),
KVM_PGTABLE_WALK_SHARED = BIT(3),
+ KVM_PGTABLE_WALK_HANDLE_FAULT = BIT(4),
};
struct kvm_pgtable_visit_ctx {
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 043ecc3405e7..9e3ecba3c4e6 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -325,7 +325,6 @@
#define SYS_CNTKCTL_EL1 sys_reg(3, 0, 14, 1, 0)
-#define SYS_CCSIDR_EL1 sys_reg(3, 1, 0, 0, 0)
#define SYS_AIDR_EL1 sys_reg(3, 1, 0, 0, 7)
#define SYS_RNDR_EL0 sys_reg(3, 3, 2, 4, 0)
@@ -411,23 +410,51 @@
#define SYS_PMCCFILTR_EL0 sys_reg(3, 3, 14, 15, 7)
+#define SYS_VPIDR_EL2 sys_reg(3, 4, 0, 0, 0)
+#define SYS_VMPIDR_EL2 sys_reg(3, 4, 0, 0, 5)
+
#define SYS_SCTLR_EL2 sys_reg(3, 4, 1, 0, 0)
+#define SYS_ACTLR_EL2 sys_reg(3, 4, 1, 0, 1)
+#define SYS_HCR_EL2 sys_reg(3, 4, 1, 1, 0)
+#define SYS_MDCR_EL2 sys_reg(3, 4, 1, 1, 1)
+#define SYS_CPTR_EL2 sys_reg(3, 4, 1, 1, 2)
+#define SYS_HSTR_EL2 sys_reg(3, 4, 1, 1, 3)
#define SYS_HFGRTR_EL2 sys_reg(3, 4, 1, 1, 4)
#define SYS_HFGWTR_EL2 sys_reg(3, 4, 1, 1, 5)
#define SYS_HFGITR_EL2 sys_reg(3, 4, 1, 1, 6)
+#define SYS_HACR_EL2 sys_reg(3, 4, 1, 1, 7)
+
+#define SYS_TTBR0_EL2 sys_reg(3, 4, 2, 0, 0)
+#define SYS_TTBR1_EL2 sys_reg(3, 4, 2, 0, 1)
+#define SYS_TCR_EL2 sys_reg(3, 4, 2, 0, 2)
+#define SYS_VTTBR_EL2 sys_reg(3, 4, 2, 1, 0)
+#define SYS_VTCR_EL2 sys_reg(3, 4, 2, 1, 2)
+
#define SYS_TRFCR_EL2 sys_reg(3, 4, 1, 2, 1)
#define SYS_HDFGRTR_EL2 sys_reg(3, 4, 3, 1, 4)
#define SYS_HDFGWTR_EL2 sys_reg(3, 4, 3, 1, 5)
#define SYS_HAFGRTR_EL2 sys_reg(3, 4, 3, 1, 6)
#define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0)
#define SYS_ELR_EL2 sys_reg(3, 4, 4, 0, 1)
+#define SYS_SP_EL1 sys_reg(3, 4, 4, 1, 0)
#define SYS_IFSR32_EL2 sys_reg(3, 4, 5, 0, 1)
+#define SYS_AFSR0_EL2 sys_reg(3, 4, 5, 1, 0)
+#define SYS_AFSR1_EL2 sys_reg(3, 4, 5, 1, 1)
#define SYS_ESR_EL2 sys_reg(3, 4, 5, 2, 0)
#define SYS_VSESR_EL2 sys_reg(3, 4, 5, 2, 3)
#define SYS_FPEXC32_EL2 sys_reg(3, 4, 5, 3, 0)
#define SYS_TFSR_EL2 sys_reg(3, 4, 5, 6, 0)
-#define SYS_VDISR_EL2 sys_reg(3, 4, 12, 1, 1)
+#define SYS_FAR_EL2 sys_reg(3, 4, 6, 0, 0)
+#define SYS_HPFAR_EL2 sys_reg(3, 4, 6, 0, 4)
+
+#define SYS_MAIR_EL2 sys_reg(3, 4, 10, 2, 0)
+#define SYS_AMAIR_EL2 sys_reg(3, 4, 10, 3, 0)
+
+#define SYS_VBAR_EL2 sys_reg(3, 4, 12, 0, 0)
+#define SYS_RVBAR_EL2 sys_reg(3, 4, 12, 0, 1)
+#define SYS_RMR_EL2 sys_reg(3, 4, 12, 0, 2)
+#define SYS_VDISR_EL2 sys_reg(3, 4, 12, 1, 1)
#define __SYS__AP0Rx_EL2(x) sys_reg(3, 4, 12, 8, x)
#define SYS_ICH_AP0R0_EL2 __SYS__AP0Rx_EL2(0)
#define SYS_ICH_AP0R1_EL2 __SYS__AP0Rx_EL2(1)
@@ -469,6 +496,12 @@
#define SYS_ICH_LR14_EL2 __SYS__LR8_EL2(6)
#define SYS_ICH_LR15_EL2 __SYS__LR8_EL2(7)
+#define SYS_CONTEXTIDR_EL2 sys_reg(3, 4, 13, 0, 1)
+#define SYS_TPIDR_EL2 sys_reg(3, 4, 13, 0, 2)
+
+#define SYS_CNTVOFF_EL2 sys_reg(3, 4, 14, 0, 3)
+#define SYS_CNTHCTL_EL2 sys_reg(3, 4, 14, 1, 0)
+
/* VHE encodings for architectural EL0/1 system registers */
#define SYS_SCTLR_EL12 sys_reg(3, 5, 1, 0, 0)
#define SYS_TTBR0_EL12 sys_reg(3, 5, 2, 0, 0)
@@ -491,6 +524,8 @@
#define SYS_CNTV_CTL_EL02 sys_reg(3, 5, 14, 3, 1)
#define SYS_CNTV_CVAL_EL02 sys_reg(3, 5, 14, 3, 2)
+#define SYS_SP_EL2 sys_reg(3, 6, 4, 1, 0)
+
/* Common SCTLR_ELx flags. */
#define SCTLR_ELx_ENTP2 (BIT(60))
#define SCTLR_ELx_DSSBS (BIT(44))
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index a7a857f1784d..f8129c624b07 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -109,6 +109,7 @@ struct kvm_regs {
#define KVM_ARM_VCPU_SVE 4 /* enable SVE for this CPU */
#define KVM_ARM_VCPU_PTRAUTH_ADDRESS 5 /* VCPU uses address authentication */
#define KVM_ARM_VCPU_PTRAUTH_GENERIC 6 /* VCPU uses generic authentication */
+#define KVM_ARM_VCPU_HAS_EL2 7 /* Support nested virtualization */
struct kvm_vcpu_init {
__u32 target;
diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
index 3ba70985e3a2..c307f69e9b55 100644
--- a/arch/arm64/kernel/cacheinfo.c
+++ b/arch/arm64/kernel/cacheinfo.c
@@ -11,11 +11,6 @@
#include <linux/of.h>
#define MAX_CACHE_LEVEL 7 /* Max 7 level supported */
-/* Ctypen, bits[3(n - 1) + 2 : 3(n - 1)], for n = 1 to 7 */
-#define CLIDR_CTYPE_SHIFT(level) (3 * (level - 1))
-#define CLIDR_CTYPE_MASK(level) (7 << CLIDR_CTYPE_SHIFT(level))
-#define CLIDR_CTYPE(clidr, level) \
- (((clidr) & CLIDR_CTYPE_MASK(level)) >> CLIDR_CTYPE_SHIFT(level))
int cache_line_size(void)
{
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 45a42cf2191c..87687e99fee3 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1967,6 +1967,20 @@ static void cpu_copy_el2regs(const struct arm64_cpu_capabilities *__unused)
write_sysreg(read_sysreg(tpidr_el1), tpidr_el2);
}
+static bool has_nested_virt_support(const struct arm64_cpu_capabilities *cap,
+ int scope)
+{
+ if (kvm_get_mode() != KVM_MODE_NV)
+ return false;
+
+ if (!has_cpuid_feature(cap, scope)) {
+ pr_warn("unavailable: %s\n", cap->desc);
+ return false;
+ }
+
+ return true;
+}
+
#ifdef CONFIG_ARM64_PAN
static void cpu_enable_pan(const struct arm64_cpu_capabilities *__unused)
{
@@ -2263,6 +2277,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.cpu_enable = cpu_copy_el2regs,
},
{
+ .desc = "Nested Virtualization Support",
+ .capability = ARM64_HAS_NESTED_VIRT,
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .matches = has_nested_virt_support,
+ .sys_reg = SYS_ID_AA64MMFR2_EL1,
+ .sign = FTR_UNSIGNED,
+ .field_pos = ID_AA64MMFR2_EL1_NV_SHIFT,
+ .field_width = 4,
+ .min_field_value = ID_AA64MMFR2_EL1_NV_IMP,
+ },
+ {
.capability = ARM64_HAS_32BIT_EL0_DO_NOT_USE,
.type = ARM64_CPUCAP_SYSTEM_FEATURE,
.matches = has_32bit_el0,
diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
index 111ff33d93ee..9439240c3fcf 100644
--- a/arch/arm64/kernel/hyp-stub.S
+++ b/arch/arm64/kernel/hyp-stub.S
@@ -16,30 +16,6 @@
#include <asm/ptrace.h>
#include <asm/virt.h>
-// Warning, hardcoded register allocation
-// This will clobber x1 and x2, and expect x1 to contain
-// the id register value as read from the HW
-.macro __check_override idreg, fld, width, pass, fail
- ubfx x1, x1, #\fld, #\width
- cbz x1, \fail
-
- adr_l x1, \idreg\()_override
- ldr x2, [x1, FTR_OVR_VAL_OFFSET]
- ldr x1, [x1, FTR_OVR_MASK_OFFSET]
- ubfx x2, x2, #\fld, #\width
- ubfx x1, x1, #\fld, #\width
- cmp x1, xzr
- and x2, x2, x1
- csinv x2, x2, xzr, ne
- cbnz x2, \pass
- b \fail
-.endm
-
-.macro check_override idreg, fld, pass, fail
- mrs x1, \idreg\()_el1
- __check_override \idreg \fld 4 \pass \fail
-.endm
-
.text
.pushsection .hyp.text, "ax"
@@ -98,65 +74,7 @@ SYM_CODE_START_LOCAL(elx_sync)
SYM_CODE_END(elx_sync)
SYM_CODE_START_LOCAL(__finalise_el2)
- check_override id_aa64pfr0 ID_AA64PFR0_EL1_SVE_SHIFT .Linit_sve .Lskip_sve
-
-.Linit_sve: /* SVE register access */
- mrs x0, cptr_el2 // Disable SVE traps
- bic x0, x0, #CPTR_EL2_TZ
- msr cptr_el2, x0
- isb
- mov x1, #ZCR_ELx_LEN_MASK // SVE: Enable full vector
- msr_s SYS_ZCR_EL2, x1 // length for EL1.
-
-.Lskip_sve:
- check_override id_aa64pfr1 ID_AA64PFR1_EL1_SME_SHIFT .Linit_sme .Lskip_sme
-
-.Linit_sme: /* SME register access and priority mapping */
- mrs x0, cptr_el2 // Disable SME traps
- bic x0, x0, #CPTR_EL2_TSM
- msr cptr_el2, x0
- isb
-
- mrs x1, sctlr_el2
- orr x1, x1, #SCTLR_ELx_ENTP2 // Disable TPIDR2 traps
- msr sctlr_el2, x1
- isb
-
- mov x0, #0 // SMCR controls
-
- // Full FP in SM?
- mrs_s x1, SYS_ID_AA64SMFR0_EL1
- __check_override id_aa64smfr0 ID_AA64SMFR0_EL1_FA64_SHIFT 1 .Linit_sme_fa64 .Lskip_sme_fa64
-
-.Linit_sme_fa64:
- orr x0, x0, SMCR_ELx_FA64_MASK
-.Lskip_sme_fa64:
-
- // ZT0 available?
- mrs_s x1, SYS_ID_AA64SMFR0_EL1
- __check_override id_aa64smfr0 ID_AA64SMFR0_EL1_SMEver_SHIFT 4 .Linit_sme_zt0 .Lskip_sme_zt0
-.Linit_sme_zt0:
- orr x0, x0, SMCR_ELx_EZT0_MASK
-.Lskip_sme_zt0:
-
- orr x0, x0, #SMCR_ELx_LEN_MASK // Enable full SME vector
- msr_s SYS_SMCR_EL2, x0 // length for EL1.
-
- mrs_s x1, SYS_SMIDR_EL1 // Priority mapping supported?
- ubfx x1, x1, #SMIDR_EL1_SMPS_SHIFT, #1
- cbz x1, .Lskip_sme
-
- msr_s SYS_SMPRIMAP_EL2, xzr // Make all priorities equal
-
- mrs x1, id_aa64mmfr1_el1 // HCRX_EL2 present?
- ubfx x1, x1, #ID_AA64MMFR1_EL1_HCX_SHIFT, #4
- cbz x1, .Lskip_sme
-
- mrs_s x1, SYS_HCRX_EL2
- orr x1, x1, #HCRX_EL2_SMPME_MASK // Enable priority mapping
- msr_s SYS_HCRX_EL2, x1
-
-.Lskip_sme:
+ finalise_el2_state
// nVHE? No way! Give me the real thing!
// Sanity check: MMU *must* be off
@@ -164,7 +82,7 @@ SYM_CODE_START_LOCAL(__finalise_el2)
tbnz x1, #0, 1f
// Needs to be VHE capable, obviously
- check_override id_aa64mmfr1 ID_AA64MMFR1_EL1_VH_SHIFT 2f 1f
+ check_override id_aa64mmfr1 ID_AA64MMFR1_EL1_VH_SHIFT 2f 1f x1 x2
1: mov_q x0, HVC_STUB_ERR
eret
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 05da3c8f7e88..ca6eadeb7d1a 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -21,6 +21,7 @@ if VIRTUALIZATION
menuconfig KVM
bool "Kernel-based Virtual Machine (KVM) support"
depends on HAVE_KVM
+ select KVM_GENERIC_HARDWARE_ENABLING
select MMU_NOTIFIER
select PREEMPT_NOTIFIERS
select HAVE_KVM_CPU_RELAX_INTERCEPT
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 5e33c2d4645a..c0c050e53157 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
inject_fault.o va_layout.o handle_exit.o \
guest.o debug.o reset.o sys_regs.o stacktrace.o \
vgic-sys-reg-v3.o fpsimd.o pkvm.o \
- arch_timer.o trng.o vmid.o \
+ arch_timer.o trng.o vmid.o emulate-nested.o nested.o \
vgic/vgic.o vgic/vgic-init.o \
vgic/vgic-irqfd.o vgic/vgic-v2.o \
vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index bb24a76b4224..00610477ec7b 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -428,14 +428,17 @@ static void timer_emulate(struct arch_timer_context *ctx)
* scheduled for the future. If the timer cannot fire at all,
* then we also don't need a soft timer.
*/
- if (!kvm_timer_irq_can_fire(ctx)) {
- soft_timer_cancel(&ctx->hrtimer);
+ if (should_fire || !kvm_timer_irq_can_fire(ctx))
return;
- }
soft_timer_start(&ctx->hrtimer, kvm_timer_compute_delta(ctx));
}
+static void set_cntvoff(u64 cntvoff)
+{
+ kvm_call_hyp(__kvm_timer_set_cntvoff, cntvoff);
+}
+
static void timer_save_state(struct arch_timer_context *ctx)
{
struct arch_timer_cpu *timer = vcpu_timer(ctx->vcpu);
@@ -459,6 +462,22 @@ static void timer_save_state(struct arch_timer_context *ctx)
write_sysreg_el0(0, SYS_CNTV_CTL);
isb();
+ /*
+ * The kernel may decide to run userspace after
+ * calling vcpu_put, so we reset cntvoff to 0 to
+ * ensure a consistent read between user accesses to
+ * the virtual counter and kernel access to the
+ * physical counter of non-VHE case.
+ *
+ * For VHE, the virtual counter uses a fixed virtual
+ * offset of zero, so no need to zero CNTVOFF_EL2
+ * register, but this is actually useful when switching
+ * between EL1/vEL2 with NV.
+ *
+ * Do it unconditionally, as this is either unavoidable
+ * or dirt cheap.
+ */
+ set_cntvoff(0);
break;
case TIMER_PTIMER:
timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTP_CTL));
@@ -532,6 +551,7 @@ static void timer_restore_state(struct arch_timer_context *ctx)
switch (index) {
case TIMER_VTIMER:
+ set_cntvoff(timer_get_offset(ctx));
write_sysreg_el0(timer_get_cval(ctx), SYS_CNTV_CVAL);
isb();
write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTV_CTL);
@@ -552,11 +572,6 @@ out:
local_irq_restore(flags);
}
-static void set_cntvoff(u64 cntvoff)
-{
- kvm_call_hyp(__kvm_timer_set_cntvoff, cntvoff);
-}
-
static inline void set_timer_irq_phys_active(struct arch_timer_context *ctx, bool active)
{
int r;
@@ -631,8 +646,6 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
kvm_timer_vcpu_load_nogic(vcpu);
}
- set_cntvoff(timer_get_offset(map.direct_vtimer));
-
kvm_timer_unblocking(vcpu);
timer_restore_state(map.direct_vtimer);
@@ -688,15 +701,6 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
if (kvm_vcpu_is_blocking(vcpu))
kvm_timer_blocking(vcpu);
-
- /*
- * The kernel may decide to run userspace after calling vcpu_put, so
- * we reset cntvoff to 0 to ensure a consistent read between user
- * accesses to the virtual counter and kernel access to the physical
- * counter of non-VHE case. For VHE, the virtual counter uses a fixed
- * virtual offset of zero, so no need to zero CNTVOFF_EL2 register.
- */
- set_cntvoff(0);
}
/*
@@ -811,10 +815,18 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
}
-static void kvm_timer_init_interrupt(void *info)
+void kvm_timer_cpu_up(void)
{
enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
- enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags);
+ if (host_ptimer_irq)
+ enable_percpu_irq(host_ptimer_irq, host_ptimer_irq_flags);
+}
+
+void kvm_timer_cpu_down(void)
+{
+ disable_percpu_irq(host_vtimer_irq);
+ if (host_ptimer_irq)
+ disable_percpu_irq(host_ptimer_irq);
}
int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
@@ -926,14 +938,22 @@ u64 kvm_arm_timer_read_sysreg(struct kvm_vcpu *vcpu,
enum kvm_arch_timers tmr,
enum kvm_arch_timer_regs treg)
{
+ struct arch_timer_context *timer;
+ struct timer_map map;
u64 val;
+ get_timer_map(vcpu, &map);
+ timer = vcpu_get_timer(vcpu, tmr);
+
+ if (timer == map.emul_ptimer)
+ return kvm_arm_timer_read(vcpu, timer, treg);
+
preempt_disable();
- kvm_timer_vcpu_put(vcpu);
+ timer_save_state(timer);
- val = kvm_arm_timer_read(vcpu, vcpu_get_timer(vcpu, tmr), treg);
+ val = kvm_arm_timer_read(vcpu, timer, treg);
- kvm_timer_vcpu_load(vcpu);
+ timer_restore_state(timer);
preempt_enable();
return val;
@@ -967,25 +987,22 @@ void kvm_arm_timer_write_sysreg(struct kvm_vcpu *vcpu,
enum kvm_arch_timer_regs treg,
u64 val)
{
- preempt_disable();
- kvm_timer_vcpu_put(vcpu);
-
- kvm_arm_timer_write(vcpu, vcpu_get_timer(vcpu, tmr), treg, val);
-
- kvm_timer_vcpu_load(vcpu);
- preempt_enable();
-}
-
-static int kvm_timer_starting_cpu(unsigned int cpu)
-{
- kvm_timer_init_interrupt(NULL);
- return 0;
-}
+ struct arch_timer_context *timer;
+ struct timer_map map;
-static int kvm_timer_dying_cpu(unsigned int cpu)
-{
- disable_percpu_irq(host_vtimer_irq);
- return 0;
+ get_timer_map(vcpu, &map);
+ timer = vcpu_get_timer(vcpu, tmr);
+ if (timer == map.emul_ptimer) {
+ soft_timer_cancel(&timer->hrtimer);
+ kvm_arm_timer_write(vcpu, timer, treg, val);
+ timer_emulate(timer);
+ } else {
+ preempt_disable();
+ timer_save_state(timer);
+ kvm_arm_timer_write(vcpu, timer, treg, val);
+ timer_restore_state(timer);
+ preempt_enable();
+ }
}
static int timer_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu)
@@ -1117,7 +1134,7 @@ static int kvm_irq_init(struct arch_timer_kvm_info *info)
return 0;
}
-int kvm_timer_hyp_init(bool has_gic)
+int __init kvm_timer_hyp_init(bool has_gic)
{
struct arch_timer_kvm_info *info;
int err;
@@ -1185,9 +1202,6 @@ int kvm_timer_hyp_init(bool has_gic)
goto out_free_irq;
}
- cpuhp_setup_state(CPUHP_AP_KVM_ARM_TIMER_STARTING,
- "kvm/arm/timer:starting", kvm_timer_starting_cpu,
- kvm_timer_dying_cpu);
return 0;
out_free_irq:
free_percpu_irq(host_vtimer_irq, kvm_get_running_vcpus());
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 9c5573bc4614..3bd732eaf087 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -63,16 +63,6 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
}
-int kvm_arch_hardware_setup(void *opaque)
-{
- return 0;
-}
-
-int kvm_arch_check_processor_compat(void *opaque)
-{
- return 0;
-}
-
int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
struct kvm_enable_cap *cap)
{
@@ -146,7 +136,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (ret)
goto err_unshare_kvm;
- if (!zalloc_cpumask_var(&kvm->arch.supported_cpus, GFP_KERNEL)) {
+ if (!zalloc_cpumask_var(&kvm->arch.supported_cpus, GFP_KERNEL_ACCOUNT)) {
ret = -ENOMEM;
goto err_unshare_kvm;
}
@@ -1539,7 +1529,7 @@ static int kvm_init_vector_slots(void)
return 0;
}
-static void cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
+static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
{
struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
unsigned long tcr;
@@ -1682,7 +1672,15 @@ static void _kvm_arch_hardware_enable(void *discard)
int kvm_arch_hardware_enable(void)
{
+ int was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
+
_kvm_arch_hardware_enable(NULL);
+
+ if (!was_enabled) {
+ kvm_vgic_cpu_up();
+ kvm_timer_cpu_up();
+ }
+
return 0;
}
@@ -1696,6 +1694,11 @@ static void _kvm_arch_hardware_disable(void *discard)
void kvm_arch_hardware_disable(void)
{
+ if (__this_cpu_read(kvm_arm_hardware_enabled)) {
+ kvm_timer_cpu_down();
+ kvm_vgic_cpu_down();
+ }
+
if (!is_protected_kvm_enabled())
_kvm_arch_hardware_disable(NULL);
}
@@ -1738,26 +1741,26 @@ static struct notifier_block hyp_init_cpu_pm_nb = {
.notifier_call = hyp_init_cpu_pm_notifier,
};
-static void hyp_cpu_pm_init(void)
+static void __init hyp_cpu_pm_init(void)
{
if (!is_protected_kvm_enabled())
cpu_pm_register_notifier(&hyp_init_cpu_pm_nb);
}
-static void hyp_cpu_pm_exit(void)
+static void __init hyp_cpu_pm_exit(void)
{
if (!is_protected_kvm_enabled())
cpu_pm_unregister_notifier(&hyp_init_cpu_pm_nb);
}
#else
-static inline void hyp_cpu_pm_init(void)
+static inline void __init hyp_cpu_pm_init(void)
{
}
-static inline void hyp_cpu_pm_exit(void)
+static inline void __init hyp_cpu_pm_exit(void)
{
}
#endif
-static void init_cpu_logical_map(void)
+static void __init init_cpu_logical_map(void)
{
unsigned int cpu;
@@ -1774,7 +1777,7 @@ static void init_cpu_logical_map(void)
#define init_psci_0_1_impl_state(config, what) \
config.psci_0_1_ ## what ## _implemented = psci_ops.what
-static bool init_psci_relay(void)
+static bool __init init_psci_relay(void)
{
/*
* If PSCI has not been initialized, protected KVM cannot install
@@ -1797,7 +1800,7 @@ static bool init_psci_relay(void)
return true;
}
-static int init_subsystems(void)
+static int __init init_subsystems(void)
{
int err = 0;
@@ -1838,13 +1841,22 @@ static int init_subsystems(void)
kvm_register_perf_callbacks(NULL);
out:
+ if (err)
+ hyp_cpu_pm_exit();
+
if (err || !is_protected_kvm_enabled())
on_each_cpu(_kvm_arch_hardware_disable, NULL, 1);
return err;
}
-static void teardown_hyp_mode(void)
+static void __init teardown_subsystems(void)
+{
+ kvm_unregister_perf_callbacks();
+ hyp_cpu_pm_exit();
+}
+
+static void __init teardown_hyp_mode(void)
{
int cpu;
@@ -1855,7 +1867,7 @@ static void teardown_hyp_mode(void)
}
}
-static int do_pkvm_init(u32 hyp_va_bits)
+static int __init do_pkvm_init(u32 hyp_va_bits)
{
void *per_cpu_base = kvm_ksym_ref(kvm_nvhe_sym(kvm_arm_hyp_percpu_base));
int ret;
@@ -1887,11 +1899,12 @@ static void kvm_hyp_init_symbols(void)
kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR2_EL1);
+ kvm_nvhe_sym(id_aa64smfr0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64SMFR0_EL1);
kvm_nvhe_sym(__icache_flags) = __icache_flags;
kvm_nvhe_sym(kvm_arm_vmid_bits) = kvm_arm_vmid_bits;
}
-static int kvm_hyp_init_protection(u32 hyp_va_bits)
+static int __init kvm_hyp_init_protection(u32 hyp_va_bits)
{
void *addr = phys_to_virt(hyp_mem_base);
int ret;
@@ -1909,10 +1922,8 @@ static int kvm_hyp_init_protection(u32 hyp_va_bits)
return 0;
}
-/**
- * Inits Hyp-mode on all online CPUs
- */
-static int init_hyp_mode(void)
+/* Inits Hyp-mode on all online CPUs */
+static int __init init_hyp_mode(void)
{
u32 hyp_va_bits;
int cpu;
@@ -2094,7 +2105,7 @@ out_err:
return err;
}
-static void _kvm_host_prot_finalize(void *arg)
+static void __init _kvm_host_prot_finalize(void *arg)
{
int *err = arg;
@@ -2102,7 +2113,7 @@ static void _kvm_host_prot_finalize(void *arg)
WRITE_ONCE(*err, -EINVAL);
}
-static int pkvm_drop_host_privileges(void)
+static int __init pkvm_drop_host_privileges(void)
{
int ret = 0;
@@ -2115,7 +2126,7 @@ static int pkvm_drop_host_privileges(void)
return ret;
}
-static int finalize_hyp_mode(void)
+static int __init finalize_hyp_mode(void)
{
if (!is_protected_kvm_enabled())
return 0;
@@ -2187,10 +2198,8 @@ void kvm_arch_irq_bypass_start(struct irq_bypass_consumer *cons)
kvm_arm_resume_guest(irqfd->kvm);
}
-/**
- * Initialize Hyp-mode and memory mappings on all CPUs.
- */
-int kvm_arch_init(void *opaque)
+/* Initialize Hyp-mode and memory mappings on all CPUs */
+static __init int kvm_arm_init(void)
{
int err;
bool in_hyp_mode;
@@ -2241,7 +2250,7 @@ int kvm_arch_init(void *opaque)
err = kvm_init_vector_slots();
if (err) {
kvm_err("Cannot initialise vector slots\n");
- goto out_err;
+ goto out_hyp;
}
err = init_subsystems();
@@ -2252,7 +2261,7 @@ int kvm_arch_init(void *opaque)
err = finalize_hyp_mode();
if (err) {
kvm_err("Failed to finalize Hyp protection\n");
- goto out_hyp;
+ goto out_subs;
}
}
@@ -2264,10 +2273,19 @@ int kvm_arch_init(void *opaque)
kvm_info("Hyp mode initialized successfully\n");
}
+ /*
+ * FIXME: Do something reasonable if kvm_init() fails after pKVM
+ * hypervisor protection is finalized.
+ */
+ err = kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+ if (err)
+ goto out_subs;
+
return 0;
+out_subs:
+ teardown_subsystems();
out_hyp:
- hyp_cpu_pm_exit();
if (!in_hyp_mode)
teardown_hyp_mode();
out_err:
@@ -2275,12 +2293,6 @@ out_err:
return err;
}
-/* NOP: Compiling as a module not supported */
-void kvm_arch_exit(void)
-{
- kvm_unregister_perf_callbacks();
-}
-
static int __init early_kvm_mode_cfg(char *arg)
{
if (!arg)
@@ -2310,6 +2322,11 @@ static int __init early_kvm_mode_cfg(char *arg)
return 0;
}
+ if (strcmp(arg, "nested") == 0 && !WARN_ON(!is_kernel_in_hyp_mode())) {
+ kvm_mode = KVM_MODE_NV;
+ return 0;
+ }
+
return -EINVAL;
}
early_param("kvm-arm.mode", early_kvm_mode_cfg);
@@ -2319,10 +2336,4 @@ enum kvm_mode kvm_get_mode(void)
return kvm_mode;
}
-static int arm_init(void)
-{
- int rc = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
- return rc;
-}
-
-module_init(arm_init);
+module_init(kvm_arm_init);
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
new file mode 100644
index 000000000000..b96662029fb1
--- /dev/null
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -0,0 +1,203 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2016 - Linaro and Columbia University
+ * Author: Jintack Lim <jintack.lim@linaro.org>
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
+
+#include "hyp/include/hyp/adjust_pc.h"
+
+#include "trace.h"
+
+static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
+{
+ u64 mode = spsr & PSR_MODE_MASK;
+
+ /*
+ * Possible causes for an Illegal Exception Return from EL2:
+ * - trying to return to EL3
+ * - trying to return to an illegal M value
+ * - trying to return to a 32bit EL
+ * - trying to return to EL1 with HCR_EL2.TGE set
+ */
+ if (mode == PSR_MODE_EL3t || mode == PSR_MODE_EL3h ||
+ mode == 0b00001 || (mode & BIT(1)) ||
+ (spsr & PSR_MODE32_BIT) ||
+ (vcpu_el2_tge_is_set(vcpu) && (mode == PSR_MODE_EL1t ||
+ mode == PSR_MODE_EL1h))) {
+ /*
+ * The guest is playing with our nerves. Preserve EL, SP,
+ * masks, flags from the existing PSTATE, and set IL.
+ * The HW will then generate an Illegal State Exception
+ * immediately after ERET.
+ */
+ spsr = *vcpu_cpsr(vcpu);
+
+ spsr &= (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT |
+ PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT | PSR_V_BIT |
+ PSR_MODE_MASK | PSR_MODE32_BIT);
+ spsr |= PSR_IL_BIT;
+ }
+
+ return spsr;
+}
+
+void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
+{
+ u64 spsr, elr, mode;
+ bool direct_eret;
+
+ /*
+ * Going through the whole put/load motions is a waste of time
+ * if this is a VHE guest hypervisor returning to its own
+ * userspace, or the hypervisor performing a local exception
+ * return. No need to save/restore registers, no need to
+ * switch S2 MMU. Just do the canonical ERET.
+ */
+ spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2);
+ spsr = kvm_check_illegal_exception_return(vcpu, spsr);
+
+ mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+ direct_eret = (mode == PSR_MODE_EL0t &&
+ vcpu_el2_e2h_is_set(vcpu) &&
+ vcpu_el2_tge_is_set(vcpu));
+ direct_eret |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
+
+ if (direct_eret) {
+ *vcpu_pc(vcpu) = vcpu_read_sys_reg(vcpu, ELR_EL2);
+ *vcpu_cpsr(vcpu) = spsr;
+ trace_kvm_nested_eret(vcpu, *vcpu_pc(vcpu), spsr);
+ return;
+ }
+
+ preempt_disable();
+ kvm_arch_vcpu_put(vcpu);
+
+ elr = __vcpu_sys_reg(vcpu, ELR_EL2);
+
+ trace_kvm_nested_eret(vcpu, elr, spsr);
+
+ /*
+ * Note that the current exception level is always the virtual EL2,
+ * since we set HCR_EL2.NV bit only when entering the virtual EL2.
+ */
+ *vcpu_pc(vcpu) = elr;
+ *vcpu_cpsr(vcpu) = spsr;
+
+ kvm_arch_vcpu_load(vcpu, smp_processor_id());
+ preempt_enable();
+}
+
+static void kvm_inject_el2_exception(struct kvm_vcpu *vcpu, u64 esr_el2,
+ enum exception_type type)
+{
+ trace_kvm_inject_nested_exception(vcpu, esr_el2, type);
+
+ switch (type) {
+ case except_type_sync:
+ kvm_pend_exception(vcpu, EXCEPT_AA64_EL2_SYNC);
+ vcpu_write_sys_reg(vcpu, esr_el2, ESR_EL2);
+ break;
+ case except_type_irq:
+ kvm_pend_exception(vcpu, EXCEPT_AA64_EL2_IRQ);
+ break;
+ default:
+ WARN_ONCE(1, "Unsupported EL2 exception injection %d\n", type);
+ }
+}
+
+/*
+ * Emulate taking an exception to EL2.
+ * See ARM ARM J8.1.2 AArch64.TakeException()
+ */
+static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
+ enum exception_type type)
+{
+ u64 pstate, mode;
+ bool direct_inject;
+
+ if (!vcpu_has_nv(vcpu)) {
+ kvm_err("Unexpected call to %s for the non-nesting configuration\n",
+ __func__);
+ return -EINVAL;
+ }
+
+ /*
+ * As for ERET, we can avoid doing too much on the injection path by
+ * checking that we either took the exception from a VHE host
+ * userspace or from vEL2. In these cases, there is no change in
+ * translation regime (or anything else), so let's do as little as
+ * possible.
+ */
+ pstate = *vcpu_cpsr(vcpu);
+ mode = pstate & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+ direct_inject = (mode == PSR_MODE_EL0t &&
+ vcpu_el2_e2h_is_set(vcpu) &&
+ vcpu_el2_tge_is_set(vcpu));
+ direct_inject |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
+
+ if (direct_inject) {
+ kvm_inject_el2_exception(vcpu, esr_el2, type);
+ return 1;
+ }
+
+ preempt_disable();
+
+ /*
+ * We may have an exception or PC update in the EL0/EL1 context.
+ * Commit it before entering EL2.
+ */
+ __kvm_adjust_pc(vcpu);
+
+ kvm_arch_vcpu_put(vcpu);
+
+ kvm_inject_el2_exception(vcpu, esr_el2, type);
+
+ /*
+ * A hard requirement is that a switch between EL1 and EL2
+ * contexts has to happen between a put/load, so that we can
+ * pick the correct timer and interrupt configuration, among
+ * other things.
+ *
+ * Make sure the exception actually took place before we load
+ * the new context.
+ */
+ __kvm_adjust_pc(vcpu);
+
+ kvm_arch_vcpu_load(vcpu, smp_processor_id());
+ preempt_enable();
+
+ return 1;
+}
+
+int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+ return kvm_inject_nested(vcpu, esr_el2, except_type_sync);
+}
+
+int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
+{
+ /*
+ * Do not inject an irq if the:
+ * - Current exception level is EL2, and
+ * - virtual HCR_EL2.TGE == 0
+ * - virtual HCR_EL2.IMO == 0
+ *
+ * See Table D1-17 "Physical interrupt target and masking when EL3 is
+ * not implemented and EL2 is implemented" in ARM DDI 0487C.a.
+ */
+
+ if (vcpu_is_el2(vcpu) && !vcpu_el2_tge_is_set(vcpu) &&
+ !(__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_IMO))
+ return 1;
+
+ /* esr_el2 value doesn't matter for exits due to irqs. */
+ return kvm_inject_nested(vcpu, 0, except_type_irq);
+}
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 235775d0c825..1279949599b5 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -184,6 +184,7 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
sysreg_clear_set(CPACR_EL1,
CPACR_EL1_SMEN_EL0EN,
CPACR_EL1_SMEN_EL1EN);
+ isb();
}
if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED) {
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index cf4c495a4321..07444fa22888 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -24,6 +24,7 @@
#include <asm/fpsimd.h>
#include <asm/kvm.h>
#include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
#include <asm/sigcontext.h>
#include "trace.h"
@@ -253,6 +254,11 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
if (!vcpu_el1_is_32bit(vcpu))
return -EINVAL;
break;
+ case PSR_MODE_EL2h:
+ case PSR_MODE_EL2t:
+ if (!vcpu_has_nv(vcpu))
+ return -EINVAL;
+ fallthrough;
case PSR_MODE_EL0t:
case PSR_MODE_EL1t:
case PSR_MODE_EL1h:
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index e778eefcf214..a798c0b4d717 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -16,6 +16,7 @@
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
#include <asm/debug-monitors.h>
#include <asm/stacktrace/nvhe.h>
#include <asm/traps.h>
@@ -41,6 +42,16 @@ static int handle_hvc(struct kvm_vcpu *vcpu)
kvm_vcpu_hvc_get_imm(vcpu));
vcpu->stat.hvc_exit_stat++;
+ /* Forward hvc instructions to the virtual EL2 if the guest has EL2. */
+ if (vcpu_has_nv(vcpu)) {
+ if (vcpu_read_sys_reg(vcpu, HCR_EL2) & HCR_HCD)
+ kvm_inject_undefined(vcpu);
+ else
+ kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+
+ return 1;
+ }
+
ret = kvm_hvc_call_handler(vcpu);
if (ret < 0) {
vcpu_set_reg(vcpu, 0, ~0UL);
@@ -52,6 +63,8 @@ static int handle_hvc(struct kvm_vcpu *vcpu)
static int handle_smc(struct kvm_vcpu *vcpu)
{
+ int ret;
+
/*
* "If an SMC instruction executed at Non-secure EL1 is
* trapped to EL2 because HCR_EL2.TSC is 1, the exception is a
@@ -59,10 +72,30 @@ static int handle_smc(struct kvm_vcpu *vcpu)
*
* We need to advance the PC after the trap, as it would
* otherwise return to the same address...
+ *
+ * Only handle SMCs from the virtual EL2 with an immediate of zero and
+ * skip it otherwise.
*/
- vcpu_set_reg(vcpu, 0, ~0UL);
+ if (!vcpu_is_el2(vcpu) || kvm_vcpu_hvc_get_imm(vcpu)) {
+ vcpu_set_reg(vcpu, 0, ~0UL);
+ kvm_incr_pc(vcpu);
+ return 1;
+ }
+
+ /*
+ * If imm is zero then it is likely an SMCCC call.
+ *
+ * Note that on ARMv8.3, even if EL3 is not implemented, SMC executed
+ * at Non-secure EL1 is trapped to EL2 if HCR_EL2.TSC==1, rather than
+ * being treated as UNDEFINED.
+ */
+ ret = kvm_hvc_call_handler(vcpu);
+ if (ret < 0)
+ vcpu_set_reg(vcpu, 0, ~0UL);
+
kvm_incr_pc(vcpu);
- return 1;
+
+ return ret;
}
/*
@@ -196,6 +229,15 @@ static int kvm_handle_ptrauth(struct kvm_vcpu *vcpu)
return 1;
}
+static int kvm_handle_eret(struct kvm_vcpu *vcpu)
+{
+ if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_ERET_ISS_ERET)
+ return kvm_handle_ptrauth(vcpu);
+
+ kvm_emulate_nested_eret(vcpu);
+ return 1;
+}
+
static exit_handle_fn arm_exit_handlers[] = {
[0 ... ESR_ELx_EC_MAX] = kvm_handle_unknown_ec,
[ESR_ELx_EC_WFx] = kvm_handle_wfx,
@@ -211,6 +253,7 @@ static exit_handle_fn arm_exit_handlers[] = {
[ESR_ELx_EC_SMC64] = handle_smc,
[ESR_ELx_EC_SYS64] = kvm_handle_sys_reg,
[ESR_ELx_EC_SVE] = handle_sve,
+ [ESR_ELx_EC_ERET] = kvm_handle_eret,
[ESR_ELx_EC_IABT_LOW] = kvm_handle_guest_abort,
[ESR_ELx_EC_DABT_LOW] = kvm_handle_guest_abort,
[ESR_ELx_EC_SOFTSTP_LOW]= kvm_handle_guest_debug,
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index 791d3de76771..424a5107cddb 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -14,6 +14,7 @@
#include <linux/kvm_host.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
#if !defined (__KVM_NVHE_HYPERVISOR__) && !defined (__KVM_VHE_HYPERVISOR__)
#error Hypervisor code only!
@@ -23,7 +24,9 @@ static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
{
u64 val;
- if (__vcpu_read_sys_reg_from_cpu(reg, &val))
+ if (unlikely(vcpu_has_nv(vcpu)))
+ return vcpu_read_sys_reg(vcpu, reg);
+ else if (__vcpu_read_sys_reg_from_cpu(reg, &val))
return val;
return __vcpu_sys_reg(vcpu, reg);
@@ -31,18 +34,25 @@ static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
static inline void __vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
{
- if (__vcpu_write_sys_reg_to_cpu(val, reg))
- return;
-
- __vcpu_sys_reg(vcpu, reg) = val;
+ if (unlikely(vcpu_has_nv(vcpu)))
+ vcpu_write_sys_reg(vcpu, val, reg);
+ else if (!__vcpu_write_sys_reg_to_cpu(val, reg))
+ __vcpu_sys_reg(vcpu, reg) = val;
}
-static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, u64 val)
+static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long target_mode,
+ u64 val)
{
- if (has_vhe())
+ if (unlikely(vcpu_has_nv(vcpu))) {
+ if (target_mode == PSR_MODE_EL1h)
+ vcpu_write_sys_reg(vcpu, val, SPSR_EL1);
+ else
+ vcpu_write_sys_reg(vcpu, val, SPSR_EL2);
+ } else if (has_vhe()) {
write_sysreg_el1(val, SYS_SPSR);
- else
+ } else {
__vcpu_sys_reg(vcpu, SPSR_EL1) = val;
+ }
}
static void __vcpu_write_spsr_abt(struct kvm_vcpu *vcpu, u64 val)
@@ -101,6 +111,11 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
break;
+ case PSR_MODE_EL2h:
+ vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL2);
+ sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL2);
+ __vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL2);
+ break;
default:
/* Don't do that */
BUG();
@@ -153,7 +168,7 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
new |= target_mode;
*vcpu_cpsr(vcpu) = new;
- __vcpu_write_spsr(vcpu, old);
+ __vcpu_write_spsr(vcpu, target_mode, old);
}
/*
@@ -323,11 +338,20 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
case unpack_vcpu_flag(EXCEPT_AA64_EL1_SYNC):
enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
break;
+
+ case unpack_vcpu_flag(EXCEPT_AA64_EL2_SYNC):
+ enter_exception64(vcpu, PSR_MODE_EL2h, except_type_sync);
+ break;
+
+ case unpack_vcpu_flag(EXCEPT_AA64_EL2_IRQ):
+ enter_exception64(vcpu, PSR_MODE_EL2h, except_type_irq);
+ break;
+
default:
/*
- * Only EL1_SYNC makes sense so far, EL2_{SYNC,IRQ}
- * will be implemented at some point. Everything
- * else gets silently ignored.
+ * Only EL1_SYNC and EL2_{SYNC,IRQ} makes
+ * sense so far. Everything else gets silently
+ * ignored.
*/
break;
}
diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index baa5b9b3dde5..699ea1f8d409 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -39,7 +39,6 @@ static inline bool ctxt_has_mte(struct kvm_cpu_context *ctxt)
static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
{
- ctxt_sys_reg(ctxt, CSSELR_EL1) = read_sysreg(csselr_el1);
ctxt_sys_reg(ctxt, SCTLR_EL1) = read_sysreg_el1(SYS_SCTLR);
ctxt_sys_reg(ctxt, CPACR_EL1) = read_sysreg_el1(SYS_CPACR);
ctxt_sys_reg(ctxt, TTBR0_EL1) = read_sysreg_el1(SYS_TTBR0);
@@ -95,7 +94,6 @@ static inline void __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
{
write_sysreg(ctxt_sys_reg(ctxt, MPIDR_EL1), vmpidr_el2);
- write_sysreg(ctxt_sys_reg(ctxt, CSSELR_EL1), csselr_el1);
if (has_vhe() ||
!cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
@@ -156,9 +154,26 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
write_sysreg_el1(ctxt_sys_reg(ctxt, SPSR_EL1), SYS_SPSR);
}
+/* Read the VCPU state's PSTATE, but translate (v)EL2 to EL1. */
+static inline u64 to_hw_pstate(const struct kvm_cpu_context *ctxt)
+{
+ u64 mode = ctxt->regs.pstate & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+ switch (mode) {
+ case PSR_MODE_EL2t:
+ mode = PSR_MODE_EL1t;
+ break;
+ case PSR_MODE_EL2h:
+ mode = PSR_MODE_EL1h;
+ break;
+ }
+
+ return (ctxt->regs.pstate & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode;
+}
+
static inline void __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt)
{
- u64 pstate = ctxt->regs.pstate;
+ u64 pstate = to_hw_pstate(ctxt);
u64 mode = pstate & PSR_AA32_MODE_MASK;
/*
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
index c953fb4b9a13..a6d67c2bb5ae 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
@@ -183,6 +183,7 @@ SYM_CODE_START_LOCAL(__kvm_hyp_init_cpu)
/* Initialize EL2 CPU state to sane values. */
init_el2_state // Clobbers x0..x2
+ finalise_el2_state
/* Enable MMU, set vectors and stack. */
mov x0, x28
diff --git a/arch/arm64/kvm/hyp/nvhe/sys_regs.c b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
index 0f9ac25afdf4..08d2b004f4b7 100644
--- a/arch/arm64/kvm/hyp/nvhe/sys_regs.c
+++ b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
@@ -26,6 +26,7 @@ u64 id_aa64isar2_el1_sys_val;
u64 id_aa64mmfr0_el1_sys_val;
u64 id_aa64mmfr1_el1_sys_val;
u64 id_aa64mmfr2_el1_sys_val;
+u64 id_aa64smfr0_el1_sys_val;
/*
* Inject an unknown/undefined exception to an AArch64 guest while most of its
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index b11cf2c618a6..3d61bd3e591d 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -168,6 +168,25 @@ static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
return walker->cb(ctx, visit);
}
+static bool kvm_pgtable_walk_continue(const struct kvm_pgtable_walker *walker,
+ int r)
+{
+ /*
+ * Visitor callbacks return EAGAIN when the conditions that led to a
+ * fault are no longer reflected in the page tables due to a race to
+ * update a PTE. In the context of a fault handler this is interpreted
+ * as a signal to retry guest execution.
+ *
+ * Ignore the return code altogether for walkers outside a fault handler
+ * (e.g. write protecting a range of memory) and chug along with the
+ * page table walk.
+ */
+ if (r == -EAGAIN)
+ return !(walker->flags & KVM_PGTABLE_WALK_HANDLE_FAULT);
+
+ return !r;
+}
+
static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
struct kvm_pgtable_mm_ops *mm_ops, kvm_pteref_t pgtable, u32 level);
@@ -200,7 +219,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
table = kvm_pte_table(ctx.old, level);
}
- if (ret)
+ if (!kvm_pgtable_walk_continue(data->walker, ret))
goto out;
if (!table) {
@@ -211,13 +230,16 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
childp = (kvm_pteref_t)kvm_pte_follow(ctx.old, mm_ops);
ret = __kvm_pgtable_walk(data, mm_ops, childp, level + 1);
- if (ret)
+ if (!kvm_pgtable_walk_continue(data->walker, ret))
goto out;
if (ctx.flags & KVM_PGTABLE_WALK_TABLE_POST)
ret = kvm_pgtable_visitor_cb(data, &ctx, KVM_PGTABLE_WALK_TABLE_POST);
out:
+ if (kvm_pgtable_walk_continue(data->walker, ret))
+ return 0;
+
return ret;
}
@@ -584,12 +606,14 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
lvls = 2;
vtcr |= VTCR_EL2_LVLS_TO_SL0(lvls);
+#ifdef CONFIG_ARM64_HW_AFDBM
/*
* Enable the Hardware Access Flag management, unconditionally
* on all CPUs. The features is RES0 on CPUs without the support
* and must be ignored by the CPUs.
*/
vtcr |= VTCR_EL2_HA;
+#endif /* CONFIG_ARM64_HW_AFDBM */
/* Set the vmid bits */
vtcr |= (get_vmid_bits(mmfr1) == 16) ?
@@ -1026,7 +1050,7 @@ static int stage2_attr_walker(const struct kvm_pgtable_visit_ctx *ctx,
struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
if (!kvm_pte_valid(ctx->old))
- return 0;
+ return -EAGAIN;
data->level = ctx->level;
data->pte = pte;
@@ -1094,9 +1118,15 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
{
kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0,
- &pte, NULL, 0);
- dsb(ishst);
+ int ret;
+
+ ret = stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0,
+ &pte, NULL,
+ KVM_PGTABLE_WALK_HANDLE_FAULT |
+ KVM_PGTABLE_WALK_SHARED);
+ if (!ret)
+ dsb(ishst);
+
return pte;
}
@@ -1141,6 +1171,7 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
clr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL, &level,
+ KVM_PGTABLE_WALK_HANDLE_FAULT |
KVM_PGTABLE_WALK_SHARED);
if (!ret)
kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, pgt->mmu, addr, level);
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 1a97391fedd2..cd3f3117bf16 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -40,7 +40,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
___activate_traps(vcpu);
val = read_sysreg(cpacr_el1);
- val |= CPACR_EL1_TTA;
+ val |= CPACR_ELx_TTA;
val &= ~(CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN |
CPACR_EL1_SMEN_EL0EN | CPACR_EL1_SMEN_EL1EN);
@@ -120,6 +120,25 @@ static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code)
{
+ /*
+ * If we were in HYP context on entry, adjust the PSTATE view
+ * so that the usual helpers work correctly.
+ */
+ if (unlikely(vcpu_get_flag(vcpu, VCPU_HYP_CONTEXT))) {
+ u64 mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+ switch (mode) {
+ case PSR_MODE_EL1t:
+ mode = PSR_MODE_EL2t;
+ break;
+ case PSR_MODE_EL1h:
+ mode = PSR_MODE_EL2h;
+ break;
+ }
+
+ *vcpu_cpsr(vcpu) &= ~(PSR_MODE_MASK | PSR_MODE32_BIT);
+ *vcpu_cpsr(vcpu) |= mode;
+ }
}
/* Switch to the guest for VHE systems running in EL2 */
@@ -154,6 +173,11 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
sysreg_restore_guest_state_vhe(guest_ctxt);
__debug_switch_to_guest(vcpu);
+ if (is_hyp_ctxt(vcpu))
+ vcpu_set_flag(vcpu, VCPU_HYP_CONTEXT);
+ else
+ vcpu_clear_flag(vcpu, VCPU_HYP_CONTEXT);
+
do {
/* Jump in the fire! */
exit_code = __guest_enter(vcpu);
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index c9f401fa01a9..64c086c02c60 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -198,7 +198,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
break;
case ARM_SMCCC_HV_PV_TIME_ST:
gpa = kvm_init_stolen_time(vcpu);
- if (gpa != GPA_INVALID)
+ if (gpa != INVALID_GPA)
val[0] = gpa;
break;
case ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID:
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index f32f4a2a347f..64c3aec0d937 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -12,17 +12,55 @@
#include <linux/kvm_host.h>
#include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
#include <asm/esr.h>
+static void pend_sync_exception(struct kvm_vcpu *vcpu)
+{
+ /* If not nesting, EL1 is the only possible exception target */
+ if (likely(!vcpu_has_nv(vcpu))) {
+ kvm_pend_exception(vcpu, EXCEPT_AA64_EL1_SYNC);
+ return;
+ }
+
+ /*
+ * With NV, we need to pick between EL1 and EL2. Note that we
+ * never deal with a nesting exception here, hence never
+ * changing context, and the exception itself can be delayed
+ * until the next entry.
+ */
+ switch(*vcpu_cpsr(vcpu) & PSR_MODE_MASK) {
+ case PSR_MODE_EL2h:
+ case PSR_MODE_EL2t:
+ kvm_pend_exception(vcpu, EXCEPT_AA64_EL2_SYNC);
+ break;
+ case PSR_MODE_EL1h:
+ case PSR_MODE_EL1t:
+ kvm_pend_exception(vcpu, EXCEPT_AA64_EL1_SYNC);
+ break;
+ case PSR_MODE_EL0t:
+ if (vcpu_el2_tge_is_set(vcpu))
+ kvm_pend_exception(vcpu, EXCEPT_AA64_EL2_SYNC);
+ else
+ kvm_pend_exception(vcpu, EXCEPT_AA64_EL1_SYNC);
+ break;
+ default:
+ BUG();
+ }
+}
+
+static bool match_target_el(struct kvm_vcpu *vcpu, unsigned long target)
+{
+ return (vcpu_get_flag(vcpu, EXCEPT_MASK) == target);
+}
+
static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr)
{
unsigned long cpsr = *vcpu_cpsr(vcpu);
bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
u64 esr = 0;
- kvm_pend_exception(vcpu, EXCEPT_AA64_EL1_SYNC);
-
- vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
+ pend_sync_exception(vcpu);
/*
* Build an {i,d}abort, depending on the level and the
@@ -43,14 +81,22 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
if (!is_iabt)
esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
- vcpu_write_sys_reg(vcpu, esr | ESR_ELx_FSC_EXTABT, ESR_EL1);
+ esr |= ESR_ELx_FSC_EXTABT;
+
+ if (match_target_el(vcpu, unpack_vcpu_flag(EXCEPT_AA64_EL1_SYNC))) {
+ vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
+ vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
+ } else {
+ vcpu_write_sys_reg(vcpu, addr, FAR_EL2);
+ vcpu_write_sys_reg(vcpu, esr, ESR_EL2);
+ }
}
static void inject_undef64(struct kvm_vcpu *vcpu)
{
u64 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
- kvm_pend_exception(vcpu, EXCEPT_AA64_EL1_SYNC);
+ pend_sync_exception(vcpu);
/*
* Build an unknown exception, depending on the instruction
@@ -59,7 +105,10 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
if (kvm_vcpu_trap_il_is32bit(vcpu))
esr |= ESR_ELx_IL;
- vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
+ if (match_target_el(vcpu, unpack_vcpu_flag(EXCEPT_AA64_EL1_SYNC)))
+ vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
+ else
+ vcpu_write_sys_reg(vcpu, esr, ESR_EL2);
}
#define DFSR_FSC_EXTABT_LPAE 0x10
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index a3ee3b605c9b..7113587222ff 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -25,11 +25,11 @@
static struct kvm_pgtable *hyp_pgtable;
static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
-static unsigned long hyp_idmap_start;
-static unsigned long hyp_idmap_end;
-static phys_addr_t hyp_idmap_vector;
+static unsigned long __ro_after_init hyp_idmap_start;
+static unsigned long __ro_after_init hyp_idmap_end;
+static phys_addr_t __ro_after_init hyp_idmap_vector;
-static unsigned long io_map_base;
+static unsigned long __ro_after_init io_map_base;
static phys_addr_t stage2_range_addr_end(phys_addr_t addr, phys_addr_t end)
{
@@ -46,16 +46,17 @@ static phys_addr_t stage2_range_addr_end(phys_addr_t addr, phys_addr_t end)
* long will also starve other vCPUs. We have to also make sure that the page
* tables are not freed while we released the lock.
*/
-static int stage2_apply_range(struct kvm *kvm, phys_addr_t addr,
+static int stage2_apply_range(struct kvm_s2_mmu *mmu, phys_addr_t addr,
phys_addr_t end,
int (*fn)(struct kvm_pgtable *, u64, u64),
bool resched)
{
+ struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
int ret;
u64 next;
do {
- struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
+ struct kvm_pgtable *pgt = mmu->pgt;
if (!pgt)
return -EINVAL;
@@ -71,8 +72,8 @@ static int stage2_apply_range(struct kvm *kvm, phys_addr_t addr,
return ret;
}
-#define stage2_apply_range_resched(kvm, addr, end, fn) \
- stage2_apply_range(kvm, addr, end, fn, true)
+#define stage2_apply_range_resched(mmu, addr, end, fn) \
+ stage2_apply_range(mmu, addr, end, fn, true)
static bool memslot_is_logging(struct kvm_memory_slot *memslot)
{
@@ -235,7 +236,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
lockdep_assert_held_write(&kvm->mmu_lock);
WARN_ON(size & ~PAGE_MASK);
- WARN_ON(stage2_apply_range(kvm, start, end, kvm_pgtable_stage2_unmap,
+ WARN_ON(stage2_apply_range(mmu, start, end, kvm_pgtable_stage2_unmap,
may_block));
}
@@ -250,7 +251,7 @@ static void stage2_flush_memslot(struct kvm *kvm,
phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
- stage2_apply_range_resched(kvm, addr, end, kvm_pgtable_stage2_flush);
+ stage2_apply_range_resched(&kvm->arch.mmu, addr, end, kvm_pgtable_stage2_flush);
}
/**
@@ -280,7 +281,7 @@ static void stage2_flush_vm(struct kvm *kvm)
/**
* free_hyp_pgds - free Hyp-mode page tables
*/
-void free_hyp_pgds(void)
+void __init free_hyp_pgds(void)
{
mutex_lock(&kvm_hyp_pgd_mutex);
if (hyp_pgtable) {
@@ -934,8 +935,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
*/
static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
{
- struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
- stage2_apply_range_resched(kvm, addr, end, kvm_pgtable_stage2_wrprotect);
+ stage2_apply_range_resched(mmu, addr, end, kvm_pgtable_stage2_wrprotect);
}
/**
@@ -1383,7 +1383,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
else
ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
__pfn_to_phys(pfn), prot,
- memcache, KVM_PGTABLE_WALK_SHARED);
+ memcache,
+ KVM_PGTABLE_WALK_HANDLE_FAULT |
+ KVM_PGTABLE_WALK_SHARED);
/* Mark the page dirty only if the fault is handled successfully */
if (writable && !ret) {
@@ -1401,20 +1403,18 @@ out_unlock:
/* Resolve the access fault by making the page young again. */
static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
{
- pte_t pte;
- kvm_pte_t kpte;
+ kvm_pte_t pte;
struct kvm_s2_mmu *mmu;
trace_kvm_access_fault(fault_ipa);
- write_lock(&vcpu->kvm->mmu_lock);
+ read_lock(&vcpu->kvm->mmu_lock);
mmu = vcpu->arch.hw_mmu;
- kpte = kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa);
- write_unlock(&vcpu->kvm->mmu_lock);
+ pte = kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa);
+ read_unlock(&vcpu->kvm->mmu_lock);
- pte = __pte(kpte);
- if (pte_valid(pte))
- kvm_set_pfn_accessed(pte_pfn(pte));
+ if (kvm_pte_valid(pte))
+ kvm_set_pfn_accessed(kvm_pte_to_pfn(pte));
}
/**
@@ -1668,7 +1668,7 @@ static struct kvm_pgtable_mm_ops kvm_hyp_mm_ops = {
.virt_to_phys = kvm_host_pa,
};
-int kvm_mmu_init(u32 *hyp_va_bits)
+int __init kvm_mmu_init(u32 *hyp_va_bits)
{
int err;
u32 idmap_bits;
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
new file mode 100644
index 000000000000..315354d27978
--- /dev/null
+++ b/arch/arm64/kvm/nested.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2017 - Columbia University and Linaro Ltd.
+ * Author: Jintack Lim <jintack.lim@linaro.org>
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
+#include <asm/sysreg.h>
+
+#include "sys_regs.h"
+
+/* Protection against the sysreg repainting madness... */
+#define NV_FTR(r, f) ID_AA64##r##_EL1_##f
+
+/*
+ * Our emulated CPU doesn't support all the possible features. For the
+ * sake of simplicity (and probably mental sanity), wipe out a number
+ * of feature bits we don't intend to support for the time being.
+ * This list should get updated as new features get added to the NV
+ * support, and new extension to the architecture.
+ */
+void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ u32 id = reg_to_encoding(r);
+ u64 val, tmp;
+
+ val = p->regval;
+
+ switch (id) {
+ case SYS_ID_AA64ISAR0_EL1:
+ /* Support everything but TME, O.S. and Range TLBIs */
+ val &= ~(NV_FTR(ISAR0, TLB) |
+ NV_FTR(ISAR0, TME));
+ break;
+
+ case SYS_ID_AA64ISAR1_EL1:
+ /* Support everything but PtrAuth and Spec Invalidation */
+ val &= ~(GENMASK_ULL(63, 56) |
+ NV_FTR(ISAR1, SPECRES) |
+ NV_FTR(ISAR1, GPI) |
+ NV_FTR(ISAR1, GPA) |
+ NV_FTR(ISAR1, API) |
+ NV_FTR(ISAR1, APA));
+ break;
+
+ case SYS_ID_AA64PFR0_EL1:
+ /* No AMU, MPAM, S-EL2, RAS or SVE */
+ val &= ~(GENMASK_ULL(55, 52) |
+ NV_FTR(PFR0, AMU) |
+ NV_FTR(PFR0, MPAM) |
+ NV_FTR(PFR0, SEL2) |
+ NV_FTR(PFR0, RAS) |
+ NV_FTR(PFR0, SVE) |
+ NV_FTR(PFR0, EL3) |
+ NV_FTR(PFR0, EL2) |
+ NV_FTR(PFR0, EL1));
+ /* 64bit EL1/EL2/EL3 only */
+ val |= FIELD_PREP(NV_FTR(PFR0, EL1), 0b0001);
+ val |= FIELD_PREP(NV_FTR(PFR0, EL2), 0b0001);
+ val |= FIELD_PREP(NV_FTR(PFR0, EL3), 0b0001);
+ break;
+
+ case SYS_ID_AA64PFR1_EL1:
+ /* Only support SSBS */
+ val &= NV_FTR(PFR1, SSBS);
+ break;
+
+ case SYS_ID_AA64MMFR0_EL1:
+ /* Hide ECV, FGT, ExS, Secure Memory */
+ val &= ~(GENMASK_ULL(63, 43) |
+ NV_FTR(MMFR0, TGRAN4_2) |
+ NV_FTR(MMFR0, TGRAN16_2) |
+ NV_FTR(MMFR0, TGRAN64_2) |
+ NV_FTR(MMFR0, SNSMEM));
+
+ /* Disallow unsupported S2 page sizes */
+ switch (PAGE_SIZE) {
+ case SZ_64K:
+ val |= FIELD_PREP(NV_FTR(MMFR0, TGRAN16_2), 0b0001);
+ fallthrough;
+ case SZ_16K:
+ val |= FIELD_PREP(NV_FTR(MMFR0, TGRAN4_2), 0b0001);
+ fallthrough;
+ case SZ_4K:
+ /* Support everything */
+ break;
+ }
+ /*
+ * Since we can't support a guest S2 page size smaller than
+ * the host's own page size (due to KVM only populating its
+ * own S2 using the kernel's page size), advertise the
+ * limitation using FEAT_GTG.
+ */
+ switch (PAGE_SIZE) {
+ case SZ_4K:
+ val |= FIELD_PREP(NV_FTR(MMFR0, TGRAN4_2), 0b0010);
+ fallthrough;
+ case SZ_16K:
+ val |= FIELD_PREP(NV_FTR(MMFR0, TGRAN16_2), 0b0010);
+ fallthrough;
+ case SZ_64K:
+ val |= FIELD_PREP(NV_FTR(MMFR0, TGRAN64_2), 0b0010);
+ break;
+ }
+ /* Cap PARange to 48bits */
+ tmp = FIELD_GET(NV_FTR(MMFR0, PARANGE), val);
+ if (tmp > 0b0101) {
+ val &= ~NV_FTR(MMFR0, PARANGE);
+ val |= FIELD_PREP(NV_FTR(MMFR0, PARANGE), 0b0101);
+ }
+ break;
+
+ case SYS_ID_AA64MMFR1_EL1:
+ val &= (NV_FTR(MMFR1, PAN) |
+ NV_FTR(MMFR1, LO) |
+ NV_FTR(MMFR1, HPDS) |
+ NV_FTR(MMFR1, VH) |
+ NV_FTR(MMFR1, VMIDBits));
+ break;
+
+ case SYS_ID_AA64MMFR2_EL1:
+ val &= ~(NV_FTR(MMFR2, EVT) |
+ NV_FTR(MMFR2, BBM) |
+ NV_FTR(MMFR2, TTL) |
+ GENMASK_ULL(47, 44) |
+ NV_FTR(MMFR2, ST) |
+ NV_FTR(MMFR2, CCIDX) |
+ NV_FTR(MMFR2, VARange));
+
+ /* Force TTL support */
+ val |= FIELD_PREP(NV_FTR(MMFR2, TTL), 0b0001);
+ break;
+
+ case SYS_ID_AA64DFR0_EL1:
+ /* Only limited support for PMU, Debug, BPs and WPs */
+ val &= (NV_FTR(DFR0, PMUVer) |
+ NV_FTR(DFR0, WRPs) |
+ NV_FTR(DFR0, BRPs) |
+ NV_FTR(DFR0, DebugVer));
+
+ /* Cap Debug to ARMv8.1 */
+ tmp = FIELD_GET(NV_FTR(DFR0, DebugVer), val);
+ if (tmp > 0b0111) {
+ val &= ~NV_FTR(DFR0, DebugVer);
+ val |= FIELD_PREP(NV_FTR(DFR0, DebugVer), 0b0111);
+ }
+ break;
+
+ default:
+ /* Unknown register, just wipe it clean */
+ val = 0;
+ break;
+ }
+
+ p->regval = val;
+}
diff --git a/arch/arm64/kvm/pvtime.c b/arch/arm64/kvm/pvtime.c
index 78a09f7a6637..4ceabaa4c30b 100644
--- a/arch/arm64/kvm/pvtime.c
+++ b/arch/arm64/kvm/pvtime.c
@@ -19,7 +19,7 @@ void kvm_update_stolen_time(struct kvm_vcpu *vcpu)
u64 steal = 0;
int idx;
- if (base == GPA_INVALID)
+ if (base == INVALID_GPA)
return;
idx = srcu_read_lock(&kvm->srcu);
@@ -40,7 +40,7 @@ long kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
switch (feature) {
case ARM_SMCCC_HV_PV_TIME_FEATURES:
case ARM_SMCCC_HV_PV_TIME_ST:
- if (vcpu->arch.steal.base != GPA_INVALID)
+ if (vcpu->arch.steal.base != INVALID_GPA)
val = SMCCC_RET_SUCCESS;
break;
}
@@ -54,7 +54,7 @@ gpa_t kvm_init_stolen_time(struct kvm_vcpu *vcpu)
struct kvm *kvm = vcpu->kvm;
u64 base = vcpu->arch.steal.base;
- if (base == GPA_INVALID)
+ if (base == INVALID_GPA)
return base;
/*
@@ -89,7 +89,7 @@ int kvm_arm_pvtime_set_attr(struct kvm_vcpu *vcpu,
return -EFAULT;
if (!IS_ALIGNED(ipa, 64))
return -EINVAL;
- if (vcpu->arch.steal.base != GPA_INVALID)
+ if (vcpu->arch.steal.base != INVALID_GPA)
return -EEXIST;
/* Check the address is in a valid memslot */
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index e0267f672b8a..49a3257dec46 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -27,10 +27,11 @@
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
#include <asm/virt.h>
/* Maximum phys_shift supported for any VM on this host */
-static u32 kvm_ipa_limit;
+static u32 __ro_after_init kvm_ipa_limit;
/*
* ARMv8 Reset Values
@@ -38,12 +39,15 @@ static u32 kvm_ipa_limit;
#define VCPU_RESET_PSTATE_EL1 (PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT | \
PSR_F_BIT | PSR_D_BIT)
+#define VCPU_RESET_PSTATE_EL2 (PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT | \
+ PSR_F_BIT | PSR_D_BIT)
+
#define VCPU_RESET_PSTATE_SVC (PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
PSR_AA32_I_BIT | PSR_AA32_F_BIT)
-unsigned int kvm_sve_max_vl;
+unsigned int __ro_after_init kvm_sve_max_vl;
-int kvm_arm_init_sve(void)
+int __init kvm_arm_init_sve(void)
{
if (system_supports_sve()) {
kvm_sve_max_vl = sve_max_virtualisable_vl();
@@ -157,6 +161,7 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
if (sve_state)
kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu));
kfree(sve_state);
+ kfree(vcpu->arch.ccsidr);
}
static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
@@ -220,6 +225,10 @@ static int kvm_set_vm_width(struct kvm_vcpu *vcpu)
if (kvm_has_mte(kvm) && is32bit)
return -EINVAL;
+ /* NV is incompatible with AArch32 */
+ if (vcpu_has_nv(vcpu) && is32bit)
+ return -EINVAL;
+
if (is32bit)
set_bit(KVM_ARCH_FLAG_EL1_32BIT, &kvm->arch.flags);
@@ -272,6 +281,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
if (loaded)
kvm_arch_vcpu_put(vcpu);
+ /* Disallow NV+SVE for the time being */
+ if (vcpu_has_nv(vcpu) && vcpu_has_feature(vcpu, KVM_ARM_VCPU_SVE)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
if (!kvm_arm_vcpu_sve_finalized(vcpu)) {
if (test_bit(KVM_ARM_VCPU_SVE, vcpu->arch.features)) {
ret = kvm_vcpu_enable_sve(vcpu);
@@ -294,6 +309,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
default:
if (vcpu_el1_is_32bit(vcpu)) {
pstate = VCPU_RESET_PSTATE_SVC;
+ } else if (vcpu_has_nv(vcpu)) {
+ pstate = VCPU_RESET_PSTATE_EL2;
} else {
pstate = VCPU_RESET_PSTATE_EL1;
}
@@ -352,7 +369,7 @@ u32 get_kvm_ipa_limit(void)
return kvm_ipa_limit;
}
-int kvm_set_ipa_limit(void)
+int __init kvm_set_ipa_limit(void)
{
unsigned int parange;
u64 mmfr0;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index c6cbfe6b854b..53749d3a0996 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -11,6 +11,7 @@
#include <linux/bitfield.h>
#include <linux/bsearch.h>
+#include <linux/cacheinfo.h>
#include <linux/kvm_host.h>
#include <linux/mm.h>
#include <linux/printk.h>
@@ -24,6 +25,7 @@
#include <asm/kvm_emulate.h>
#include <asm/kvm_hyp.h>
#include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
#include <asm/perf_event.h>
#include <asm/sysreg.h>
@@ -78,28 +80,112 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
__vcpu_write_sys_reg_to_cpu(val, reg))
return;
- __vcpu_sys_reg(vcpu, reg) = val;
+ __vcpu_sys_reg(vcpu, reg) = val;
}
-/* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
-static u32 cache_levels;
-
/* CSSELR values; used to index KVM_REG_ARM_DEMUX_ID_CCSIDR */
#define CSSELR_MAX 14
+/*
+ * Returns the minimum line size for the selected cache, expressed as
+ * Log2(bytes).
+ */
+static u8 get_min_cache_line_size(bool icache)
+{
+ u64 ctr = read_sanitised_ftr_reg(SYS_CTR_EL0);
+ u8 field;
+
+ if (icache)
+ field = SYS_FIELD_GET(CTR_EL0, IminLine, ctr);
+ else
+ field = SYS_FIELD_GET(CTR_EL0, DminLine, ctr);
+
+ /*
+ * Cache line size is represented as Log2(words) in CTR_EL0.
+ * Log2(bytes) can be derived with the following:
+ *
+ * Log2(words) + 2 = Log2(bytes / 4) + 2
+ * = Log2(bytes) - 2 + 2
+ * = Log2(bytes)
+ */
+ return field + 2;
+}
+
/* Which cache CCSIDR represents depends on CSSELR value. */
-static u32 get_ccsidr(u32 csselr)
+static u32 get_ccsidr(struct kvm_vcpu *vcpu, u32 csselr)
{
- u32 ccsidr;
+ u8 line_size;
- /* Make sure noone else changes CSSELR during this! */
- local_irq_disable();
- write_sysreg(csselr, csselr_el1);
- isb();
- ccsidr = read_sysreg(ccsidr_el1);
- local_irq_enable();
+ if (vcpu->arch.ccsidr)
+ return vcpu->arch.ccsidr[csselr];
- return ccsidr;
+ line_size = get_min_cache_line_size(csselr & CSSELR_EL1_InD);
+
+ /*
+ * Fabricate a CCSIDR value as the overriding value does not exist.
+ * The real CCSIDR value will not be used as it can vary by the
+ * physical CPU which the vcpu currently resides in.
+ *
+ * The line size is determined with get_min_cache_line_size(), which
+ * should be valid for all CPUs even if they have different cache
+ * configuration.
+ *
+ * The associativity bits are cleared, meaning the geometry of all data
+ * and unified caches (which are guaranteed to be PIPT and thus
+ * non-aliasing) are 1 set and 1 way.
+ * Guests should not be doing cache operations by set/way at all, and
+ * for this reason, we trap them and attempt to infer the intent, so
+ * that we can flush the entire guest's address space at the appropriate
+ * time. The exposed geometry minimizes the number of the traps.
+ * [If guests should attempt to infer aliasing properties from the
+ * geometry (which is not permitted by the architecture), they would
+ * only do so for virtually indexed caches.]
+ *
+ * We don't check if the cache level exists as it is allowed to return
+ * an UNKNOWN value if not.
+ */
+ return SYS_FIELD_PREP(CCSIDR_EL1, LineSize, line_size - 4);
+}
+
+static int set_ccsidr(struct kvm_vcpu *vcpu, u32 csselr, u32 val)
+{
+ u8 line_size = FIELD_GET(CCSIDR_EL1_LineSize, val) + 4;
+ u32 *ccsidr = vcpu->arch.ccsidr;
+ u32 i;
+
+ if ((val & CCSIDR_EL1_RES0) ||
+ line_size < get_min_cache_line_size(csselr & CSSELR_EL1_InD))
+ return -EINVAL;
+
+ if (!ccsidr) {
+ if (val == get_ccsidr(vcpu, csselr))
+ return 0;
+
+ ccsidr = kmalloc_array(CSSELR_MAX, sizeof(u32), GFP_KERNEL_ACCOUNT);
+ if (!ccsidr)
+ return -ENOMEM;
+
+ for (i = 0; i < CSSELR_MAX; i++)
+ ccsidr[i] = get_ccsidr(vcpu, i);
+
+ vcpu->arch.ccsidr = ccsidr;
+ }
+
+ ccsidr[csselr] = val;
+
+ return 0;
+}
+
+static bool access_rw(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ if (p->is_write)
+ vcpu_write_sys_reg(vcpu, p->regval, r->reg);
+ else
+ p->regval = vcpu_read_sys_reg(vcpu, r->reg);
+
+ return true;
}
/*
@@ -260,6 +346,14 @@ static bool trap_raz_wi(struct kvm_vcpu *vcpu,
return read_zero(vcpu, p);
}
+static bool trap_undef(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ kvm_inject_undefined(vcpu);
+ return false;
+}
+
/*
* ARMv8.1 mandates at least a trivial LORegion implementation, where all the
* RW registers are RES0 (which we can implement as RAZ/WI). On an ARMv8.0
@@ -370,12 +464,9 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
- if (p->is_write) {
- vcpu_write_sys_reg(vcpu, p->regval, r->reg);
+ access_rw(vcpu, p, r);
+ if (p->is_write)
vcpu_set_flag(vcpu, DEBUG_DIRTY);
- } else {
- p->regval = vcpu_read_sys_reg(vcpu, r->reg);
- }
trace_trap_reg(__func__, r->reg, p->is_write, p->regval);
@@ -1049,7 +1140,9 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
treg = TIMER_REG_CVAL;
break;
default:
- BUG();
+ print_sys_reg_msg(p, "%s", "Unhandled trapped timer register");
+ kvm_inject_undefined(vcpu);
+ return false;
}
if (p->is_write)
@@ -1155,6 +1248,12 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu, struct sys_reg_desc const *r
val |= FIELD_PREP(ARM64_FEATURE_MASK(ID_DFR0_EL1_PerfMon),
pmuver_to_perfmon(vcpu_pmuver(vcpu)));
break;
+ case SYS_ID_AA64MMFR2_EL1:
+ val &= ~ID_AA64MMFR2_EL1_CCIDX_MASK;
+ break;
+ case SYS_ID_MMFR4_EL1:
+ val &= ~ARM64_FEATURE_MASK(ID_MMFR4_EL1_CCIDX);
+ break;
}
return val;
@@ -1205,6 +1304,9 @@ static bool access_id_reg(struct kvm_vcpu *vcpu,
return write_to_read_only(vcpu, p, r);
p->regval = read_id_reg(vcpu, r);
+ if (vcpu_has_nv(vcpu))
+ access_nested_id_reg(vcpu, p, r);
+
return true;
}
@@ -1385,10 +1487,78 @@ static bool access_clidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
if (p->is_write)
return write_to_read_only(vcpu, p, r);
- p->regval = read_sysreg(clidr_el1);
+ p->regval = __vcpu_sys_reg(vcpu, r->reg);
return true;
}
+/*
+ * Fabricate a CLIDR_EL1 value instead of using the real value, which can vary
+ * by the physical CPU which the vcpu currently resides in.
+ */
+static void reset_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
+{
+ u64 ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
+ u64 clidr;
+ u8 loc;
+
+ if ((ctr_el0 & CTR_EL0_IDC)) {
+ /*
+ * Data cache clean to the PoU is not required so LoUU and LoUIS
+ * will not be set and a unified cache, which will be marked as
+ * LoC, will be added.
+ *
+ * If not DIC, let the unified cache L2 so that an instruction
+ * cache can be added as L1 later.
+ */
+ loc = (ctr_el0 & CTR_EL0_DIC) ? 1 : 2;
+ clidr = CACHE_TYPE_UNIFIED << CLIDR_CTYPE_SHIFT(loc);
+ } else {
+ /*
+ * Data cache clean to the PoU is required so let L1 have a data
+ * cache and mark it as LoUU and LoUIS. As L1 has a data cache,
+ * it can be marked as LoC too.
+ */
+ loc = 1;
+ clidr = 1 << CLIDR_LOUU_SHIFT;
+ clidr |= 1 << CLIDR_LOUIS_SHIFT;
+ clidr |= CACHE_TYPE_DATA << CLIDR_CTYPE_SHIFT(1);
+ }
+
+ /*
+ * Instruction cache invalidation to the PoU is required so let L1 have
+ * an instruction cache. If L1 already has a data cache, it will be
+ * CACHE_TYPE_SEPARATE.
+ */
+ if (!(ctr_el0 & CTR_EL0_DIC))
+ clidr |= CACHE_TYPE_INST << CLIDR_CTYPE_SHIFT(1);
+
+ clidr |= loc << CLIDR_LOC_SHIFT;
+
+ /*
+ * Add tag cache unified to data cache. Allocation tags and data are
+ * unified in a cache line so that it looks valid even if there is only
+ * one cache line.
+ */
+ if (kvm_has_mte(vcpu->kvm))
+ clidr |= 2 << CLIDR_TTYPE_SHIFT(loc);
+
+ __vcpu_sys_reg(vcpu, r->reg) = clidr;
+}
+
+static int set_clidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
+ u64 val)
+{
+ u64 ctr_el0 = read_sanitised_ftr_reg(SYS_CTR_EL0);
+ u64 idc = !CLIDR_LOC(val) || (!CLIDR_LOUIS(val) && !CLIDR_LOUU(val));
+
+ if ((val & CLIDR_EL1_RES0) || (!(ctr_el0 & CTR_EL0_IDC) && idc))
+ return -EINVAL;
+
+ __vcpu_sys_reg(vcpu, rd->reg) = val;
+
+ return 0;
+}
+
static bool access_csselr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
@@ -1410,22 +1580,10 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
return write_to_read_only(vcpu, p, r);
csselr = vcpu_read_sys_reg(vcpu, CSSELR_EL1);
- p->regval = get_ccsidr(csselr);
+ csselr &= CSSELR_EL1_Level | CSSELR_EL1_InD;
+ if (csselr < CSSELR_MAX)
+ p->regval = get_ccsidr(vcpu, csselr);
- /*
- * Guests should not be doing cache operations by set/way at all, and
- * for this reason, we trap them and attempt to infer the intent, so
- * that we can flush the entire guest's address space at the appropriate
- * time.
- * To prevent this trapping from causing performance problems, let's
- * expose the geometry of all data and unified caches (which are
- * guaranteed to be PIPT and thus non-aliasing) as 1 set and 1 way.
- * [If guests should attempt to infer aliasing properties from the
- * geometry (which is not permitted by the architecture), they would
- * only do so for virtually indexed caches.]
- */
- if (!(csselr & 1)) // data or unified cache
- p->regval &= ~GENMASK(27, 3);
return true;
}
@@ -1446,6 +1604,44 @@ static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
.visibility = mte_visibility, \
}
+static unsigned int el2_visibility(const struct kvm_vcpu *vcpu,
+ const struct sys_reg_desc *rd)
+{
+ if (vcpu_has_nv(vcpu))
+ return 0;
+
+ return REG_HIDDEN;
+}
+
+#define EL2_REG(name, acc, rst, v) { \
+ SYS_DESC(SYS_##name), \
+ .access = acc, \
+ .reset = rst, \
+ .reg = name, \
+ .visibility = el2_visibility, \
+ .val = v, \
+}
+
+/*
+ * EL{0,1}2 registers are the EL2 view on an EL0 or EL1 register when
+ * HCR_EL2.E2H==1, and only in the sysreg table for convenience of
+ * handling traps. Given that, they are always hidden from userspace.
+ */
+static unsigned int elx2_visibility(const struct kvm_vcpu *vcpu,
+ const struct sys_reg_desc *rd)
+{
+ return REG_HIDDEN_USER;
+}
+
+#define EL12_REG(name, acc, rst, v) { \
+ SYS_DESC(SYS_##name##_EL12), \
+ .access = acc, \
+ .reset = rst, \
+ .reg = name##_EL1, \
+ .val = v, \
+ .visibility = elx2_visibility, \
+}
+
/* sys_reg_desc initialiser for known cpufeature ID registers */
#define ID_SANITISED(name) { \
SYS_DESC(SYS_##name), \
@@ -1490,6 +1686,42 @@ static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
.visibility = raz_visibility, \
}
+static bool access_sp_el1(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ if (p->is_write)
+ __vcpu_sys_reg(vcpu, SP_EL1) = p->regval;
+ else
+ p->regval = __vcpu_sys_reg(vcpu, SP_EL1);
+
+ return true;
+}
+
+static bool access_elr(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ if (p->is_write)
+ vcpu_write_sys_reg(vcpu, p->regval, ELR_EL1);
+ else
+ p->regval = vcpu_read_sys_reg(vcpu, ELR_EL1);
+
+ return true;
+}
+
+static bool access_spsr(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ if (p->is_write)
+ __vcpu_sys_reg(vcpu, SPSR_EL1) = p->regval;
+ else
+ p->regval = __vcpu_sys_reg(vcpu, SPSR_EL1);
+
+ return true;
+}
+
/*
* Architected system registers.
* Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -1646,6 +1878,9 @@ static const struct sys_reg_desc sys_reg_descs[] = {
PTRAUTH_KEY(APDB),
PTRAUTH_KEY(APGA),
+ { SYS_DESC(SYS_SPSR_EL1), access_spsr},
+ { SYS_DESC(SYS_ELR_EL1), access_elr},
+
{ SYS_DESC(SYS_AFSR0_EL1), access_vm_reg, reset_unknown, AFSR0_EL1 },
{ SYS_DESC(SYS_AFSR1_EL1), access_vm_reg, reset_unknown, AFSR1_EL1 },
{ SYS_DESC(SYS_ESR_EL1), access_vm_reg, reset_unknown, ESR_EL1 },
@@ -1693,7 +1928,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_LORC_EL1), trap_loregion },
{ SYS_DESC(SYS_LORID_EL1), trap_loregion },
- { SYS_DESC(SYS_VBAR_EL1), NULL, reset_val, VBAR_EL1, 0 },
+ { SYS_DESC(SYS_VBAR_EL1), access_rw, reset_val, VBAR_EL1, 0 },
{ SYS_DESC(SYS_DISR_EL1), NULL, reset_val, DISR_EL1, 0 },
{ SYS_DESC(SYS_ICC_IAR0_EL1), write_to_read_only },
@@ -1717,7 +1952,9 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_CNTKCTL_EL1), NULL, reset_val, CNTKCTL_EL1, 0},
{ SYS_DESC(SYS_CCSIDR_EL1), access_ccsidr },
- { SYS_DESC(SYS_CLIDR_EL1), access_clidr },
+ { SYS_DESC(SYS_CLIDR_EL1), access_clidr, reset_clidr, CLIDR_EL1,
+ .set_user = set_clidr },
+ { SYS_DESC(SYS_CCSIDR2_EL1), undef_access },
{ SYS_DESC(SYS_SMIDR_EL1), undef_access },
{ SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 },
{ SYS_DESC(SYS_CTR_EL0), access_ctr },
@@ -1913,9 +2150,67 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ PMU_SYS_REG(SYS_PMCCFILTR_EL0), .access = access_pmu_evtyper,
.reset = reset_val, .reg = PMCCFILTR_EL0, .val = 0 },
+ EL2_REG(VPIDR_EL2, access_rw, reset_unknown, 0),
+ EL2_REG(VMPIDR_EL2, access_rw, reset_unknown, 0),
+ EL2_REG(SCTLR_EL2, access_rw, reset_val, SCTLR_EL2_RES1),
+ EL2_REG(ACTLR_EL2, access_rw, reset_val, 0),
+ EL2_REG(HCR_EL2, access_rw, reset_val, 0),
+ EL2_REG(MDCR_EL2, access_rw, reset_val, 0),
+ EL2_REG(CPTR_EL2, access_rw, reset_val, CPTR_EL2_DEFAULT ),
+ EL2_REG(HSTR_EL2, access_rw, reset_val, 0),
+ EL2_REG(HACR_EL2, access_rw, reset_val, 0),
+
+ EL2_REG(TTBR0_EL2, access_rw, reset_val, 0),
+ EL2_REG(TTBR1_EL2, access_rw, reset_val, 0),
+ EL2_REG(TCR_EL2, access_rw, reset_val, TCR_EL2_RES1),
+ EL2_REG(VTTBR_EL2, access_rw, reset_val, 0),
+ EL2_REG(VTCR_EL2, access_rw, reset_val, 0),
+
{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
+ EL2_REG(SPSR_EL2, access_rw, reset_val, 0),
+ EL2_REG(ELR_EL2, access_rw, reset_val, 0),
+ { SYS_DESC(SYS_SP_EL1), access_sp_el1},
+
{ SYS_DESC(SYS_IFSR32_EL2), NULL, reset_unknown, IFSR32_EL2 },
+ EL2_REG(AFSR0_EL2, access_rw, reset_val, 0),
+ EL2_REG(AFSR1_EL2, access_rw, reset_val, 0),
+ EL2_REG(ESR_EL2, access_rw, reset_val, 0),
{ SYS_DESC(SYS_FPEXC32_EL2), NULL, reset_val, FPEXC32_EL2, 0x700 },
+
+ EL2_REG(FAR_EL2, access_rw, reset_val, 0),
+ EL2_REG(HPFAR_EL2, access_rw, reset_val, 0),
+
+ EL2_REG(MAIR_EL2, access_rw, reset_val, 0),
+ EL2_REG(AMAIR_EL2, access_rw, reset_val, 0),
+
+ EL2_REG(VBAR_EL2, access_rw, reset_val, 0),
+ EL2_REG(RVBAR_EL2, access_rw, reset_val, 0),
+ { SYS_DESC(SYS_RMR_EL2), trap_undef },
+
+ EL2_REG(CONTEXTIDR_EL2, access_rw, reset_val, 0),
+ EL2_REG(TPIDR_EL2, access_rw, reset_val, 0),
+
+ EL2_REG(CNTVOFF_EL2, access_rw, reset_val, 0),
+ EL2_REG(CNTHCTL_EL2, access_rw, reset_val, 0),
+
+ EL12_REG(SCTLR, access_vm_reg, reset_val, 0x00C50078),
+ EL12_REG(CPACR, access_rw, reset_val, 0),
+ EL12_REG(TTBR0, access_vm_reg, reset_unknown, 0),
+ EL12_REG(TTBR1, access_vm_reg, reset_unknown, 0),
+ EL12_REG(TCR, access_vm_reg, reset_val, 0),
+ { SYS_DESC(SYS_SPSR_EL12), access_spsr},
+ { SYS_DESC(SYS_ELR_EL12), access_elr},
+ EL12_REG(AFSR0, access_vm_reg, reset_unknown, 0),
+ EL12_REG(AFSR1, access_vm_reg, reset_unknown, 0),
+ EL12_REG(ESR, access_vm_reg, reset_unknown, 0),
+ EL12_REG(FAR, access_vm_reg, reset_unknown, 0),
+ EL12_REG(MAIR, access_vm_reg, reset_unknown, 0),
+ EL12_REG(AMAIR, access_vm_reg, reset_amair_el1, 0),
+ EL12_REG(VBAR, access_rw, reset_val, 0),
+ EL12_REG(CONTEXTIDR, access_vm_reg, reset_val, 0),
+ EL12_REG(CNTKCTL, access_rw, reset_val, 0),
+
+ EL2_REG(SP_EL2, NULL, reset_unknown, 0),
};
static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
@@ -2219,6 +2514,10 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1(1), CRn( 0), CRm( 0), Op2(0), access_ccsidr },
{ Op1(1), CRn( 0), CRm( 0), Op2(1), access_clidr },
+
+ /* CCSIDR2 */
+ { Op1(1), CRn( 0), CRm( 0), Op2(2), undef_access },
+
{ Op1(2), CRn( 0), CRm( 0), Op2(0), access_csselr, NULL, CSSELR_EL1 },
};
@@ -2724,7 +3023,6 @@ id_to_sys_reg_desc(struct kvm_vcpu *vcpu, u64 id,
FUNCTION_INVARIANT(midr_el1)
FUNCTION_INVARIANT(revidr_el1)
-FUNCTION_INVARIANT(clidr_el1)
FUNCTION_INVARIANT(aidr_el1)
static void get_ctr_el0(struct kvm_vcpu *v, const struct sys_reg_desc *r)
@@ -2733,10 +3031,9 @@ static void get_ctr_el0(struct kvm_vcpu *v, const struct sys_reg_desc *r)
}
/* ->val is filled in by kvm_sys_reg_table_init() */
-static struct sys_reg_desc invariant_sys_regs[] = {
+static struct sys_reg_desc invariant_sys_regs[] __ro_after_init = {
{ SYS_DESC(SYS_MIDR_EL1), NULL, get_midr_el1 },
{ SYS_DESC(SYS_REVIDR_EL1), NULL, get_revidr_el1 },
- { SYS_DESC(SYS_CLIDR_EL1), NULL, get_clidr_el1 },
{ SYS_DESC(SYS_AIDR_EL1), NULL, get_aidr_el1 },
{ SYS_DESC(SYS_CTR_EL0), NULL, get_ctr_el0 },
};
@@ -2773,33 +3070,7 @@ static int set_invariant_sys_reg(u64 id, u64 __user *uaddr)
return 0;
}
-static bool is_valid_cache(u32 val)
-{
- u32 level, ctype;
-
- if (val >= CSSELR_MAX)
- return false;
-
- /* Bottom bit is Instruction or Data bit. Next 3 bits are level. */
- level = (val >> 1);
- ctype = (cache_levels >> (level * 3)) & 7;
-
- switch (ctype) {
- case 0: /* No cache */
- return false;
- case 1: /* Instruction cache only */
- return (val & 1);
- case 2: /* Data cache only */
- case 4: /* Unified cache */
- return !(val & 1);
- case 3: /* Separate instruction and data caches */
- return true;
- default: /* Reserved: we can't know instruction or data. */
- return false;
- }
-}
-
-static int demux_c15_get(u64 id, void __user *uaddr)
+static int demux_c15_get(struct kvm_vcpu *vcpu, u64 id, void __user *uaddr)
{
u32 val;
u32 __user *uval = uaddr;
@@ -2815,16 +3086,16 @@ static int demux_c15_get(u64 id, void __user *uaddr)
return -ENOENT;
val = (id & KVM_REG_ARM_DEMUX_VAL_MASK)
>> KVM_REG_ARM_DEMUX_VAL_SHIFT;
- if (!is_valid_cache(val))
+ if (val >= CSSELR_MAX)
return -ENOENT;
- return put_user(get_ccsidr(val), uval);
+ return put_user(get_ccsidr(vcpu, val), uval);
default:
return -ENOENT;
}
}
-static int demux_c15_set(u64 id, void __user *uaddr)
+static int demux_c15_set(struct kvm_vcpu *vcpu, u64 id, void __user *uaddr)
{
u32 val, newval;
u32 __user *uval = uaddr;
@@ -2840,16 +3111,13 @@ static int demux_c15_set(u64 id, void __user *uaddr)
return -ENOENT;
val = (id & KVM_REG_ARM_DEMUX_VAL_MASK)
>> KVM_REG_ARM_DEMUX_VAL_SHIFT;
- if (!is_valid_cache(val))
+ if (val >= CSSELR_MAX)
return -ENOENT;
if (get_user(newval, uval))
return -EFAULT;
- /* This is also invariant: you can't change it. */
- if (newval != get_ccsidr(val))
- return -EINVAL;
- return 0;
+ return set_ccsidr(vcpu, val, newval);
default:
return -ENOENT;
}
@@ -2864,7 +3132,7 @@ int kvm_sys_reg_get_user(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg,
int ret;
r = id_to_sys_reg_desc(vcpu, reg->id, table, num);
- if (!r)
+ if (!r || sysreg_hidden_user(vcpu, r))
return -ENOENT;
if (r->get_user) {
@@ -2886,7 +3154,7 @@ int kvm_arm_sys_reg_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
int err;
if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_DEMUX)
- return demux_c15_get(reg->id, uaddr);
+ return demux_c15_get(vcpu, reg->id, uaddr);
err = get_invariant_sys_reg(reg->id, uaddr);
if (err != -ENOENT)
@@ -2908,7 +3176,7 @@ int kvm_sys_reg_set_user(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg,
return -EFAULT;
r = id_to_sys_reg_desc(vcpu, reg->id, table, num);
- if (!r)
+ if (!r || sysreg_hidden_user(vcpu, r))
return -ENOENT;
if (sysreg_user_write_ignore(vcpu, r))
@@ -2930,7 +3198,7 @@ int kvm_arm_sys_reg_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
int err;
if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_DEMUX)
- return demux_c15_set(reg->id, uaddr);
+ return demux_c15_set(vcpu, reg->id, uaddr);
err = set_invariant_sys_reg(reg->id, uaddr);
if (err != -ENOENT)
@@ -2942,13 +3210,7 @@ int kvm_arm_sys_reg_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
static unsigned int num_demux_regs(void)
{
- unsigned int i, count = 0;
-
- for (i = 0; i < CSSELR_MAX; i++)
- if (is_valid_cache(i))
- count++;
-
- return count;
+ return CSSELR_MAX;
}
static int write_demux_regids(u64 __user *uindices)
@@ -2958,8 +3220,6 @@ static int write_demux_regids(u64 __user *uindices)
val |= KVM_REG_ARM_DEMUX_ID_CCSIDR;
for (i = 0; i < CSSELR_MAX; i++) {
- if (!is_valid_cache(i))
- continue;
if (put_user(val | i, uindices))
return -EFAULT;
uindices++;
@@ -3002,7 +3262,7 @@ static int walk_one_sys_reg(const struct kvm_vcpu *vcpu,
if (!(rd->reg || rd->get_user))
return 0;
- if (sysreg_hidden(vcpu, rd))
+ if (sysreg_hidden_user(vcpu, rd))
return 0;
if (!copy_reg_to_user(rd, uind))
@@ -3057,11 +3317,10 @@ int kvm_arm_copy_sys_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
return write_demux_regids(uindices);
}
-int kvm_sys_reg_table_init(void)
+int __init kvm_sys_reg_table_init(void)
{
bool valid = true;
unsigned int i;
- struct sys_reg_desc clidr;
/* Make sure tables are unique and in order. */
valid &= check_sysreg_table(sys_reg_descs, ARRAY_SIZE(sys_reg_descs), false);
@@ -3078,23 +3337,5 @@ int kvm_sys_reg_table_init(void)
for (i = 0; i < ARRAY_SIZE(invariant_sys_regs); i++)
invariant_sys_regs[i].reset(NULL, &invariant_sys_regs[i]);
- /*
- * CLIDR format is awkward, so clean it up. See ARM B4.1.20:
- *
- * If software reads the Cache Type fields from Ctype1
- * upwards, once it has seen a value of 0b000, no caches
- * exist at further-out levels of the hierarchy. So, for
- * example, if Ctype3 is the first Cache Type field with a
- * value of 0b000, the values of Ctype4 to Ctype7 must be
- * ignored.
- */
- get_clidr_el1(NULL, &clidr); /* Ugly... */
- cache_levels = clidr.val;
- for (i = 0; i < 7; i++)
- if (((cache_levels >> (i*3)) & 7) == 0)
- break;
- /* Clear all higher bits. */
- cache_levels &= (1 << (i*3))-1;
-
return 0;
}
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index e4ebb3a379fd..6b11f2cc7146 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -85,8 +85,9 @@ struct sys_reg_desc {
};
#define REG_HIDDEN (1 << 0) /* hidden from userspace and guest */
-#define REG_RAZ (1 << 1) /* RAZ from userspace and guest */
-#define REG_USER_WI (1 << 2) /* WI from userspace only */
+#define REG_HIDDEN_USER (1 << 1) /* hidden from userspace only */
+#define REG_RAZ (1 << 2) /* RAZ from userspace and guest */
+#define REG_USER_WI (1 << 3) /* WI from userspace only */
static __printf(2, 3)
inline void print_sys_reg_msg(const struct sys_reg_params *p,
@@ -152,6 +153,15 @@ static inline bool sysreg_hidden(const struct kvm_vcpu *vcpu,
return sysreg_visibility(vcpu, r) & REG_HIDDEN;
}
+static inline bool sysreg_hidden_user(const struct kvm_vcpu *vcpu,
+ const struct sys_reg_desc *r)
+{
+ if (likely(!r->visibility))
+ return false;
+
+ return r->visibility(vcpu, r) & (REG_HIDDEN | REG_HIDDEN_USER);
+}
+
static inline bool sysreg_visible_as_raz(const struct kvm_vcpu *vcpu,
const struct sys_reg_desc *r)
{
diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
index 33e4e7dd2719..f3e46a976125 100644
--- a/arch/arm64/kvm/trace_arm.h
+++ b/arch/arm64/kvm/trace_arm.h
@@ -2,6 +2,7 @@
#if !defined(_TRACE_ARM_ARM64_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_ARM_ARM64_KVM_H
+#include <asm/kvm_emulate.h>
#include <kvm/arm_arch_timer.h>
#include <linux/tracepoint.h>
@@ -301,6 +302,64 @@ TRACE_EVENT(kvm_timer_emulate,
__entry->timer_idx, __entry->should_fire)
);
+TRACE_EVENT(kvm_nested_eret,
+ TP_PROTO(struct kvm_vcpu *vcpu, unsigned long elr_el2,
+ unsigned long spsr_el2),
+ TP_ARGS(vcpu, elr_el2, spsr_el2),
+
+ TP_STRUCT__entry(
+ __field(struct kvm_vcpu *, vcpu)
+ __field(unsigned long, elr_el2)
+ __field(unsigned long, spsr_el2)
+ __field(unsigned long, target_mode)
+ __field(unsigned long, hcr_el2)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu = vcpu;
+ __entry->elr_el2 = elr_el2;
+ __entry->spsr_el2 = spsr_el2;
+ __entry->target_mode = spsr_el2 & (PSR_MODE_MASK | PSR_MODE32_BIT);
+ __entry->hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+ ),
+
+ TP_printk("elr_el2: 0x%lx spsr_el2: 0x%08lx (M: %s) hcr_el2: %lx",
+ __entry->elr_el2, __entry->spsr_el2,
+ __print_symbolic(__entry->target_mode, kvm_mode_names),
+ __entry->hcr_el2)
+);
+
+TRACE_EVENT(kvm_inject_nested_exception,
+ TP_PROTO(struct kvm_vcpu *vcpu, u64 esr_el2, int type),
+ TP_ARGS(vcpu, esr_el2, type),
+
+ TP_STRUCT__entry(
+ __field(struct kvm_vcpu *, vcpu)
+ __field(unsigned long, esr_el2)
+ __field(int, type)
+ __field(unsigned long, spsr_el2)
+ __field(unsigned long, pc)
+ __field(unsigned long, source_mode)
+ __field(unsigned long, hcr_el2)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu = vcpu;
+ __entry->esr_el2 = esr_el2;
+ __entry->type = type;
+ __entry->spsr_el2 = *vcpu_cpsr(vcpu);
+ __entry->pc = *vcpu_pc(vcpu);
+ __entry->source_mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
+ __entry->hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+ ),
+
+ TP_printk("%s: esr_el2 0x%lx elr_el2: 0x%lx spsr_el2: 0x%08lx (M: %s) hcr_el2: %lx",
+ __print_symbolic(__entry->type, kvm_exception_type_names),
+ __entry->esr_el2, __entry->pc, __entry->spsr_el2,
+ __print_symbolic(__entry->source_mode, kvm_mode_names),
+ __entry->hcr_el2)
+);
+
#endif /* _TRACE_ARM_ARM64_KVM_H */
#undef TRACE_INCLUDE_PATH
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index f6d4f4052555..cd134db41a57 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -465,17 +465,15 @@ out:
/* GENERIC PROBE */
-static int vgic_init_cpu_starting(unsigned int cpu)
+void kvm_vgic_cpu_up(void)
{
enable_percpu_irq(kvm_vgic_global_state.maint_irq, 0);
- return 0;
}
-static int vgic_init_cpu_dying(unsigned int cpu)
+void kvm_vgic_cpu_down(void)
{
disable_percpu_irq(kvm_vgic_global_state.maint_irq);
- return 0;
}
static irqreturn_t vgic_maintenance_handler(int irq, void *data)
@@ -572,7 +570,7 @@ int kvm_vgic_hyp_init(void)
if (ret)
return ret;
- if (!has_mask)
+ if (!has_mask && !kvm_vgic_global_state.maint_irq)
return 0;
ret = request_percpu_irq(kvm_vgic_global_state.maint_irq,
@@ -584,19 +582,6 @@ int kvm_vgic_hyp_init(void)
return ret;
}
- ret = cpuhp_setup_state(CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
- "kvm/arm/vgic:starting",
- vgic_init_cpu_starting, vgic_init_cpu_dying);
- if (ret) {
- kvm_err("Cannot register vgic CPU notifier\n");
- goto out_free_irq;
- }
-
kvm_info("vgic interrupt IRQ%d\n", kvm_vgic_global_state.maint_irq);
return 0;
-
-out_free_irq:
- free_percpu_irq(kvm_vgic_global_state.maint_irq,
- kvm_get_running_vcpus());
- return ret;
}
diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
index b32d434c1d4a..e67b3b2c8044 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio.c
@@ -473,9 +473,10 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
* active state can be overwritten when the VCPU's state is synced coming back
* from the guest.
*
- * For shared interrupts as well as GICv3 private interrupts, we have to
- * stop all the VCPUs because interrupts can be migrated while we don't hold
- * the IRQ locks and we don't want to be chasing moving targets.
+ * For shared interrupts as well as GICv3 private interrupts accessed from the
+ * non-owning CPU, we have to stop all the VCPUs because interrupts can be
+ * migrated while we don't hold the IRQ locks and we don't want to be chasing
+ * moving targets.
*
* For GICv2 private interrupts we don't have to do anything because
* userspace accesses to the VGIC state already require all VCPUs to be
@@ -484,7 +485,8 @@ int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu,
*/
static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
{
- if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
+ if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
+ vcpu != kvm_get_running_vcpu()) ||
intid >= VGIC_NR_PRIVATE_IRQS)
kvm_arm_halt_guest(vcpu->kvm);
}
@@ -492,7 +494,8 @@ static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
/* See vgic_access_active_prepare */
static void vgic_access_active_finish(struct kvm_vcpu *vcpu, u32 intid)
{
- if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 ||
+ if ((vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 &&
+ vcpu != kvm_get_running_vcpu()) ||
intid >= VGIC_NR_PRIVATE_IRQS)
kvm_arm_resume_guest(vcpu->kvm);
}
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 684bdfaad4a9..469d816f356f 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -3,6 +3,7 @@
#include <linux/irqchip/arm-gic-v3.h>
#include <linux/irq.h>
#include <linux/irqdomain.h>
+#include <linux/kstrtox.h>
#include <linux/kvm.h>
#include <linux/kvm_host.h>
#include <kvm/arm_vgic.h>
@@ -584,25 +585,25 @@ DEFINE_STATIC_KEY_FALSE(vgic_v3_cpuif_trap);
static int __init early_group0_trap_cfg(char *buf)
{
- return strtobool(buf, &group0_trap);
+ return kstrtobool(buf, &group0_trap);
}
early_param("kvm-arm.vgic_v3_group0_trap", early_group0_trap_cfg);
static int __init early_group1_trap_cfg(char *buf)
{
- return strtobool(buf, &group1_trap);
+ return kstrtobool(buf, &group1_trap);
}
early_param("kvm-arm.vgic_v3_group1_trap", early_group1_trap_cfg);
static int __init early_common_trap_cfg(char *buf)
{
- return strtobool(buf, &common_trap);
+ return kstrtobool(buf, &common_trap);
}
early_param("kvm-arm.vgic_v3_common_trap", early_common_trap_cfg);
static int __init early_gicv4_enable(char *buf)
{
- return strtobool(buf, &gicv4_enable);
+ return kstrtobool(buf, &gicv4_enable);
}
early_param("kvm-arm.vgic_v4_enable", early_gicv4_enable);
diff --git a/arch/arm64/kvm/vmid.c b/arch/arm64/kvm/vmid.c
index d78ae63d7c15..08978d0672e7 100644
--- a/arch/arm64/kvm/vmid.c
+++ b/arch/arm64/kvm/vmid.c
@@ -16,7 +16,7 @@
#include <asm/kvm_asm.h>
#include <asm/kvm_mmu.h>
-unsigned int kvm_arm_vmid_bits;
+unsigned int __ro_after_init kvm_arm_vmid_bits;
static DEFINE_RAW_SPINLOCK(cpu_vmid_lock);
static atomic64_t vmid_generation;
@@ -172,7 +172,7 @@ void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
/*
* Initialize the VMID allocator
*/
-int kvm_arm_vmid_alloc_init(void)
+int __init kvm_arm_vmid_alloc_init(void)
{
kvm_arm_vmid_bits = kvm_get_vmid_bits();
@@ -190,7 +190,7 @@ int kvm_arm_vmid_alloc_init(void)
return 0;
}
-void kvm_arm_vmid_alloc_free(void)
+void __init kvm_arm_vmid_alloc_free(void)
{
kfree(vmid_map);
}
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 10dcfa13390a..37b1340e9646 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -33,6 +33,7 @@ HAS_GIC_PRIO_MASKING
HAS_GIC_PRIO_RELAXED_SYNC
HAS_LDAPR
HAS_LSE_ATOMICS
+HAS_NESTED_VIRT
HAS_NO_FPSIMD
HAS_NO_HW_PREFETCH
HAS_PAN
diff --git a/arch/arm64/tools/gen-sysreg.awk b/arch/arm64/tools/gen-sysreg.awk
index 7f27d66a17e1..6fa0468caa00 100755
--- a/arch/arm64/tools/gen-sysreg.awk
+++ b/arch/arm64/tools/gen-sysreg.awk
@@ -103,6 +103,7 @@ END {
res0 = "UL(0)"
res1 = "UL(0)"
+ unkn = "UL(0)"
next_bit = 63
@@ -117,11 +118,13 @@ END {
define(reg "_RES0", "(" res0 ")")
define(reg "_RES1", "(" res1 ")")
+ define(reg "_UNKN", "(" unkn ")")
print ""
reg = null
res0 = null
res1 = null
+ unkn = null
next
}
@@ -139,6 +142,7 @@ END {
res0 = "UL(0)"
res1 = "UL(0)"
+ unkn = "UL(0)"
define("REG_" reg, "S" op0 "_" op1 "_C" crn "_C" crm "_" op2)
define("SYS_" reg, "sys_reg(" op0 ", " op1 ", " crn ", " crm ", " op2 ")")
@@ -166,7 +170,9 @@ END {
define(reg "_RES0", "(" res0 ")")
if (res1 != null)
define(reg "_RES1", "(" res1 ")")
- if (res0 != null || res1 != null)
+ if (unkn != null)
+ define(reg "_UNKN", "(" unkn ")")
+ if (res0 != null || res1 != null || unkn != null)
print ""
reg = null
@@ -177,6 +183,7 @@ END {
op2 = null
res0 = null
res1 = null
+ unkn = null
next
}
@@ -195,6 +202,7 @@ END {
next_bit = 0
res0 = null
res1 = null
+ unkn = null
next
}
@@ -220,6 +228,16 @@ END {
next
}
+/^Unkn/ && (block == "Sysreg" || block == "SysregFields") {
+ expect_fields(2)
+ parse_bitdef(reg, "UNKN", $2)
+ field = "UNKN_" msb "_" lsb
+
+ unkn = unkn " | GENMASK_ULL(" msb ", " lsb ")"
+
+ next
+}
+
/^Field/ && (block == "Sysreg" || block == "SysregFields") {
expect_fields(3)
field = $3
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 94d78acafb67..dd5a9c7e310f 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -15,6 +15,8 @@
# Res1 <msb>[:<lsb>]
+# Unkn <msb>[:<lsb>]
+
# Field <msb>[:<lsb>] <name>
# Enum <msb>[:<lsb>] <name>
@@ -1779,6 +1781,16 @@ Sysreg SCXTNUM_EL1 3 0 13 0 7
Field 63:0 SoftwareContextNumber
EndSysreg
+# The bit layout for CCSIDR_EL1 depends on whether FEAT_CCIDX is implemented.
+# The following is for case when FEAT_CCIDX is not implemented.
+Sysreg CCSIDR_EL1 3 1 0 0 0
+Res0 63:32
+Unkn 31:28
+Field 27:13 NumSets
+Field 12:3 Associativity
+Field 2:0 LineSize
+EndSysreg
+
Sysreg CLIDR_EL1 3 1 0 0 1
Res0 63:47
Field 46:33 Ttypen
@@ -1795,6 +1807,11 @@ Field 5:3 Ctype2
Field 2:0 Ctype1
EndSysreg
+Sysreg CCSIDR2_EL1 3 1 0 0 2
+Res0 63:24
+Field 23:0 NumSets
+EndSysreg
+
Sysreg GMID_EL1 3 1 0 0 4
Res0 63:4
Field 3:0 BS