From 87e28a15c42cc592009c32a8c20e5789059027c2 Mon Sep 17 00:00:00 2001 From: Pierre Morel Date: Mon, 7 Sep 2020 15:26:07 +0200 Subject: KVM: s390: diag9c (directed yield) forwarding When we intercept a DIAG_9C from the guest, we verify that the target real CPU associated with the virtual CPU designated by the guest is running; if it is not, we forward the DIAG_9C to the target real CPU. To avoid a diag9c storm we allow a maximal rate of diag9c forwarding. The rate is calculated as a count per second defined as a new parameter of the s390 kvm module: diag9c_forwarding_hz. The default value of 0 means diag9c is not forwarded. Signed-off-by: Pierre Morel Link: https://lore.kernel.org/r/1613997661-22525-2-git-send-email-pmorel@linux.ibm.com Reviewed-by: Cornelia Huck Signed-off-by: Christian Borntraeger --- Documentation/virt/kvm/s390-diag.rst | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/s390-diag.rst b/Documentation/virt/kvm/s390-diag.rst index eaac4864d3d6..ca85f030eb0b 100644 --- a/Documentation/virt/kvm/s390-diag.rst +++ b/Documentation/virt/kvm/s390-diag.rst @@ -84,3 +84,36 @@ If the function code specifies 0x501, breakpoint functions may be performed. This function code is handled by userspace. This diagnose function code has no subfunctions and uses no parameters. + + +DIAGNOSE function code 'X'9C - Voluntary Time Slice Yield +--------------------------------------------------------- + +General register 1 contains the target CPU address. + +In a guest of a hypervisor like LPAR, KVM or z/VM using shared host CPUs, +DIAGNOSE with function code 0x9c may improve system performance by +yielding the host CPU on which the guest CPU is running to be assigned +to another guest CPU, preferably the logical CPU containing the specified +target CPU. + + +DIAG 'X'9C forwarding ++++++++++++++++++++++ + +The guest may send a DIAGNOSE 0x9c in order to yield to a certain +other vcpu. An example is a Linux guest that tries to yield to the vcpu +that is currently holding a spinlock, but not running. + +However, on the host the real cpu backing the vcpu may itself not be +running. +Forwarding the DIAGNOSE 0x9c initially sent by the guest to yield to +the backing cpu will hopefully cause that cpu, and thus subsequently +the guest's vcpu, to be scheduled. + + +diag9c_forwarding_hz + KVM kernel parameter allowing to specify the maximum number of DIAGNOSE + 0x9c forwarding per second in the purpose of avoiding a DIAGNOSE 0x9c + forwarding storm. + A value of 0 turns the forwarding off. -- cgit v1.2.3 From 8a406c89532c91ee50688d4e728474dd09a11be3 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Thu, 25 Feb 2021 12:47:37 -0800 Subject: KVM: x86/mmu: Rename and document A/D scheme for TDP SPTEs Rename the various A/D status defines to explicitly associate them with TDP. There is a subtle dependency on the bits in question never being set when using PAE paging, as those bits are reserved, not available. I.e. using these bits outside of TDP (technically EPT) would cause explosions. No functional change intended.
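A note on the diag9c patch at the top of this series: the forwarding cap is easiest to picture as a one-second window counter. The sketch below is illustrative only, assuming a hypothetical struct diag9c_rate and helper; it is not the actual s390 KVM implementation.

#include <linux/jiffies.h>

/* Hypothetical bookkeeping for the per-second forwarding cap. */
struct diag9c_rate {
	unsigned long window_start;	/* jiffies when the current second began */
	unsigned int forwarded;		/* forwards already granted in this window */
};

/* Allow a forward only while fewer than 'hz' happened in the last second. */
static bool diag9c_forwarding_allowed(struct diag9c_rate *r, unsigned int hz)
{
	if (!hz)			/* 0 (the default) disables forwarding */
		return false;

	if (time_after(jiffies, r->window_start + HZ)) {
		r->window_start = jiffies;	/* open a new one-second window */
		r->forwarded = 0;
	}

	return r->forwarded++ < hz;
}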
Signed-off-by: Sean Christopherson Message-Id: <20210225204749.1512652-13-seanjc@google.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/locking.rst | 37 ++++++++++++++++++------------------- arch/x86/kvm/mmu/spte.c | 17 ++++++++++++----- arch/x86/kvm/mmu/spte.h | 34 ++++++++++++++++++++++++++-------- 3 files changed, 56 insertions(+), 32 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst index 0aa4817b466d..85876afe0441 100644 --- a/Documentation/virt/kvm/locking.rst +++ b/Documentation/virt/kvm/locking.rst @@ -38,12 +38,11 @@ the mmu-lock on x86. Currently, the page fault can be fast in one of the following two cases: 1. Access Tracking: The SPTE is not present, but it is marked for access - tracking i.e. the SPTE_SPECIAL_MASK is set. That means we need to - restore the saved R/X bits. This is described in more detail later below. + tracking. That means we need to restore the saved R/X bits. This is + described in more detail later below. -2. Write-Protection: The SPTE is present and the fault is - caused by write-protect. That means we just need to change the W bit of - the spte. +2. Write-Protection: The SPTE is present and the fault is caused by + write-protect. That means we just need to change the W bit of the spte. What we use to avoid all the race is the SPTE_HOST_WRITEABLE bit and SPTE_MMU_WRITEABLE bit on the spte: @@ -54,9 +53,9 @@ SPTE_MMU_WRITEABLE bit on the spte: page write-protection. On fast page fault path, we will use cmpxchg to atomically set the spte W -bit if spte.SPTE_HOST_WRITEABLE = 1 and spte.SPTE_WRITE_PROTECT = 1, or -restore the saved R/X bits if VMX_EPT_TRACK_ACCESS mask is set, or both. This -is safe because whenever changing these bits can be detected by cmpxchg. +bit if spte.SPTE_HOST_WRITEABLE = 1 and spte.SPTE_WRITE_PROTECT = 1, to +restore the saved R/X bits if for an access-traced spte, or both. This is +safe because whenever changing these bits can be detected by cmpxchg. But we need carefully check these cases: @@ -185,17 +184,17 @@ See the comments in spte_has_volatile_bits() and mmu_spte_update(). Lockless Access Tracking: This is used for Intel CPUs that are using EPT but do not support the EPT A/D -bits. In this case, when the KVM MMU notifier is called to track accesses to a -page (via kvm_mmu_notifier_clear_flush_young), it marks the PTE as not-present -by clearing the RWX bits in the PTE and storing the original R & X bits in -some unused/ignored bits. In addition, the SPTE_SPECIAL_MASK is also set on the -PTE (using the ignored bit 62). When the VM tries to access the page later on, -a fault is generated and the fast page fault mechanism described above is used -to atomically restore the PTE to a Present state. The W bit is not saved when -the PTE is marked for access tracking and during restoration to the Present -state, the W bit is set depending on whether or not it was a write access. If -it wasn't, then the W bit will remain clear until a write access happens, at -which time it will be set using the Dirty tracking mechanism described above. +bits. In this case, PTEs are tagged as A/D disabled (using ignored bits), and +when the KVM MMU notifier is called to track accesses to a page (via +kvm_mmu_notifier_clear_flush_young), it marks the PTE not-present in hardware +by clearing the RWX bits in the PTE and storing the original R & X bits in more +unused/ignored bits. 
When the VM tries to access the page later on, a fault is +generated and the fast page fault mechanism described above is used to +atomically restore the PTE to a Present state. The W bit is not saved when the +PTE is marked for access tracking and during restoration to the Present state, +the W bit is set depending on whether or not it was a write access. If it +wasn't, then the W bit will remain clear until a write access happens, at which +time it will be set using the Dirty tracking mechanism described above. 3. Reference ------------ diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 503dec3f8c7a..3eaf143b7d12 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -42,7 +42,7 @@ static u64 generation_mmio_spte_mask(u64 gen) u64 mask; WARN_ON(gen & ~MMIO_SPTE_GEN_MASK); - BUILD_BUG_ON((MMIO_SPTE_GEN_HIGH_MASK | MMIO_SPTE_GEN_LOW_MASK) & SPTE_SPECIAL_MASK); + BUILD_BUG_ON((MMIO_SPTE_GEN_HIGH_MASK | MMIO_SPTE_GEN_LOW_MASK) & SPTE_TDP_AD_MASK); mask = (gen << MMIO_SPTE_GEN_LOW_SHIFT) & MMIO_SPTE_GEN_LOW_MASK; mask |= (gen << MMIO_SPTE_GEN_HIGH_SHIFT) & MMIO_SPTE_GEN_HIGH_MASK; @@ -96,9 +96,16 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level, int ret = 0; if (ad_disabled) - spte |= SPTE_AD_DISABLED_MASK; + spte |= SPTE_TDP_AD_DISABLED_MASK; else if (kvm_vcpu_ad_need_write_protect(vcpu)) - spte |= SPTE_AD_WRPROT_ONLY_MASK; + spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK; + + /* + * Bits 62:52 of PAE SPTEs are reserved. WARN if said bits are set + * if PAE paging may be employed (shadow paging or any 32-bit KVM). + */ + WARN_ON_ONCE((!tdp_enabled || !IS_ENABLED(CONFIG_X86_64)) && + (spte & SPTE_TDP_AD_MASK)); /* * For the EPT case, shadow_present_mask is 0 if hardware @@ -180,7 +187,7 @@ u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled) shadow_user_mask | shadow_x_mask | shadow_me_mask; if (ad_disabled) - spte |= SPTE_AD_DISABLED_MASK; + spte |= SPTE_TDP_AD_DISABLED_MASK; else spte |= shadow_accessed_mask; @@ -288,7 +295,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask, { BUG_ON(!dirty_mask != !accessed_mask); BUG_ON(!accessed_mask && !acc_track_mask); - BUG_ON(acc_track_mask & SPTE_SPECIAL_MASK); + BUG_ON(acc_track_mask & SPTE_TDP_AD_MASK); shadow_user_mask = user_mask; shadow_accessed_mask = accessed_mask; diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 642a17b9964c..fd0a7911f098 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -8,11 +8,24 @@ #define PT_FIRST_AVAIL_BITS_SHIFT 10 #define PT64_SECOND_AVAIL_BITS_SHIFT 54 -/* The mask used to denote Access Tracking SPTEs. Note, val=3 is available. */ -#define SPTE_SPECIAL_MASK (3ULL << 52) -#define SPTE_AD_ENABLED_MASK (0ULL << 52) -#define SPTE_AD_DISABLED_MASK (1ULL << 52) -#define SPTE_AD_WRPROT_ONLY_MASK (2ULL << 52) +/* + * TDP SPTES (more specifically, EPT SPTEs) may not have A/D bits, and may also + * be restricted to using write-protection (for L2 when CPU dirty logging, i.e. + * PML, is enabled). Use bits 52 and 53 to hold the type of A/D tracking that + * is must be employed for a given TDP SPTE. + * + * Note, the "enabled" mask must be '0', as bits 62:52 are _reserved_ for PAE + * paging, including NPT PAE. This scheme works because legacy shadow paging + * is guaranteed to have A/D bits and write-protection is forced only for + * TDP with CPU dirty logging (PML). If NPT ever gains PML-like support, it + * must be restricted to 64-bit KVM. 
+ */ +#define SPTE_TDP_AD_SHIFT 52 +#define SPTE_TDP_AD_MASK (3ULL << SPTE_TDP_AD_SHIFT) +#define SPTE_TDP_AD_ENABLED_MASK (0ULL << SPTE_TDP_AD_SHIFT) +#define SPTE_TDP_AD_DISABLED_MASK (1ULL << SPTE_TDP_AD_SHIFT) +#define SPTE_TDP_AD_WRPROT_ONLY_MASK (2ULL << SPTE_TDP_AD_SHIFT) +static_assert(SPTE_TDP_AD_ENABLED_MASK == 0); #ifdef CONFIG_DYNAMIC_PHYSICAL_MASK #define PT64_BASE_ADDR_MASK (physical_mask & ~(u64)(PAGE_SIZE-1)) @@ -100,7 +113,7 @@ extern u64 __read_mostly shadow_present_mask; extern u64 __read_mostly shadow_me_mask; /* - * SPTEs used by MMUs without A/D bits are marked with SPTE_AD_DISABLED_MASK; + * SPTEs in MMUs without A/D bits are marked with SPTE_TDP_AD_DISABLED_MASK; * shadow_acc_track_mask is the set of bits to be cleared in non-accessed * pages. */ @@ -176,13 +189,18 @@ static inline bool sp_ad_disabled(struct kvm_mmu_page *sp) static inline bool spte_ad_enabled(u64 spte) { MMU_WARN_ON(is_mmio_spte(spte)); - return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_DISABLED_MASK; + return (spte & SPTE_TDP_AD_MASK) != SPTE_TDP_AD_DISABLED_MASK; } static inline bool spte_ad_need_write_protect(u64 spte) { MMU_WARN_ON(is_mmio_spte(spte)); - return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_ENABLED_MASK; + /* + * This is benign for non-TDP SPTEs as SPTE_TDP_AD_ENABLED_MASK is '0', + * and non-TDP SPTEs will never set these bits. Optimize for 64-bit + * TDP and do the A/D type check unconditionally. + */ + return (spte & SPTE_TDP_AD_MASK) != SPTE_TDP_AD_ENABLED_MASK; } static inline u64 spte_shadow_accessed_mask(u64 spte) -- cgit v1.2.3 From 5fc3424f8b854584f8f6fb6ea03f1419487fdc96 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Thu, 25 Feb 2021 12:47:43 -0800 Subject: KVM: x86/mmu: Make Host-writable and MMU-writable bit locations dynamic Make the location of the HOST_WRITABLE and MMU_WRITABLE configurable for a given KVM instance. This will allow EPT to use high available bits, which in turn will free up bit 11 for a constant MMU_PRESENT bit. No functional change intended. Signed-off-by: Sean Christopherson Message-Id: <20210225204749.1512652-19-seanjc@google.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/locking.rst | 18 +++++++++--------- arch/x86/kvm/mmu.h | 12 ++++++------ arch/x86/kvm/mmu/mmu.c | 8 ++++---- arch/x86/kvm/mmu/paging_tmpl.h | 2 +- arch/x86/kvm/mmu/spte.c | 13 +++++++++---- arch/x86/kvm/mmu/spte.h | 13 ++++++------- arch/x86/kvm/mmu/tdp_mmu.c | 6 +++--- 7 files changed, 38 insertions(+), 34 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst index 85876afe0441..1fc860c007a3 100644 --- a/Documentation/virt/kvm/locking.rst +++ b/Documentation/virt/kvm/locking.rst @@ -44,18 +44,18 @@ following two cases: 2. Write-Protection: The SPTE is present and the fault is caused by write-protect. That means we just need to change the W bit of the spte. -What we use to avoid all the race is the SPTE_HOST_WRITEABLE bit and -SPTE_MMU_WRITEABLE bit on the spte: +What we use to avoid all the race is the Host-writable bit and MMU-writable bit +on the spte: -- SPTE_HOST_WRITEABLE means the gfn is writable on host. -- SPTE_MMU_WRITEABLE means the gfn is writable on mmu. The bit is set when - the gfn is writable on guest mmu and it is not write-protected by shadow - page write-protection. +- Host-writable means the gfn is writable in the host kernel page tables and in + its KVM memslot. 
+- MMU-writable means the gfn is writable in the guest's mmu and it is not + write-protected by shadow page write-protection. On fast page fault path, we will use cmpxchg to atomically set the spte W -bit if spte.SPTE_HOST_WRITEABLE = 1 and spte.SPTE_WRITE_PROTECT = 1, to -restore the saved R/X bits if for an access-traced spte, or both. This is -safe because whenever changing these bits can be detected by cmpxchg. +bit if spte.HOST_WRITEABLE = 1 and spte.WRITE_PROTECT = 1, to restore the saved +R/X bits if for an access-traced spte, or both. This is safe because whenever +changing these bits can be detected by cmpxchg. But we need carefully check these cases: diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 63bf9aa5aefa..67e8c7c7a6ce 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -129,7 +129,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, * write-protects guest page to sync the guest modification, b) another one is * used to sync dirty bitmap when we do KVM_GET_DIRTY_LOG. The differences * between these two sorts are: - * 1) the first case clears SPTE_MMU_WRITEABLE bit. + * 1) the first case clears MMU-writable bit. * 2) the first case requires flushing tlb immediately avoiding corrupting * shadow page table between all vcpus so it should be in the protection of * mmu-lock. And the another case does not need to flush tlb until returning @@ -140,17 +140,17 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, * So, there is the problem: the first case can meet the corrupted tlb caused * by another case which write-protects pages but without flush tlb * immediately. In order to making the first case be aware this problem we let - * it flush tlb if we try to write-protect a spte whose SPTE_MMU_WRITEABLE bit - * is set, it works since another case never touches SPTE_MMU_WRITEABLE bit. + * it flush tlb if we try to write-protect a spte whose MMU-writable bit + * is set, it works since another case never touches MMU-writable bit. * * Anyway, whenever a spte is updated (only permission and status bits are - * changed) we need to check whether the spte with SPTE_MMU_WRITEABLE becomes + * changed) we need to check whether the spte with MMU-writable becomes * readonly, if that happens, we need to flush tlb. Fortunately, * mmu_spte_update() has already handled it perfectly. * - * The rules to use SPTE_MMU_WRITEABLE and PT_WRITABLE_MASK: + * The rules to use MMU-writable and PT_WRITABLE_MASK: * - if we want to see if it has writable tlb entry or if the spte can be - * writable on the mmu mapping, check SPTE_MMU_WRITEABLE, this is the most + * writable on the mmu mapping, check MMU-writable, this is the most * case, otherwise * - if we fix page fault on the spte or do write-protection by dirty logging, * check PT_WRITABLE_MASK. 
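The locking.rst hunks above describe the lockless fast-page-fault update; the pattern is a single compare-and-exchange on the SPTE, so any concurrent change is detected rather than lost. A minimal sketch of that pattern, with an illustrative function name rather than the actual KVM helper:

#include <linux/atomic.h>
#include <linux/types.h>

/*
 * Publish new_spte only if the SPTE still holds old_spte. If another vCPU
 * or the MMU notifier changed the SPTE meanwhile, the cmpxchg fails and
 * the fast page fault path retries or falls back to the slow path.
 */
static bool fast_pf_set_spte(u64 *sptep, u64 old_spte, u64 new_spte)
{
	return cmpxchg64(sptep, old_spte, new_spte) == old_spte;
}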
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 2e55f04de80a..f15b96c9ee2f 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -1107,7 +1107,7 @@ static bool spte_write_protect(u64 *sptep, bool pt_protect) rmap_printk("spte %p %llx\n", sptep, *sptep); if (pt_protect) - spte &= ~SPTE_MMU_WRITEABLE; + spte &= ~shadow_mmu_writable_mask; spte = spte & ~PT_WRITABLE_MASK; return mmu_spte_update(sptep, spte); @@ -5529,9 +5529,9 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, * spte from present to present (changing the spte from present * to nonpresent will flush all the TLBs immediately), in other * words, the only case we care is mmu_spte_update() where we - * have checked SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE - * instead of PT_WRITABLE_MASK, that means it does not depend - * on PT_WRITABLE_MASK anymore. + * have checked Host-writable | MMU-writable instead of + * PT_WRITABLE_MASK, that means it does not depend on PT_WRITABLE_MASK + * anymore. */ if (flush) kvm_arch_flush_remote_tlbs_memslot(kvm, memslot); diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index b9751a9f9c37..70b7e44e3035 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -1085,7 +1085,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) nr_present++; - host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE; + host_writable = sp->spt[i] & shadow_host_writable_mask; set_spte_ret |= set_spte(vcpu, &sp->spt[i], pte_access, PG_LEVEL_4K, diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index ac5ea6fda969..2329ba60c67a 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -21,6 +21,8 @@ static bool __read_mostly enable_mmio_caching = true; module_param_named(mmio_caching, enable_mmio_caching, bool, 0444); +u64 __read_mostly shadow_host_writable_mask; +u64 __read_mostly shadow_mmu_writable_mask; u64 __read_mostly shadow_nx_mask; u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */ u64 __read_mostly shadow_user_mask; @@ -137,7 +139,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level, kvm_is_mmio_pfn(pfn)); if (host_writable) - spte |= SPTE_HOST_WRITEABLE; + spte |= shadow_host_writable_mask; else pte_access &= ~ACC_WRITE_MASK; @@ -147,7 +149,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level, spte |= (u64)pfn << PAGE_SHIFT; if (pte_access & ACC_WRITE_MASK) { - spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE; + spte |= PT_WRITABLE_MASK | shadow_mmu_writable_mask; /* * Optimization: for pte sync, if spte was writable the hash @@ -163,7 +165,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level, __func__, gfn); ret |= SET_SPTE_WRITE_PROTECTED_PT; pte_access &= ~ACC_WRITE_MASK; - spte &= ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE); + spte &= ~(PT_WRITABLE_MASK | shadow_mmu_writable_mask); } } @@ -202,7 +204,7 @@ u64 kvm_mmu_changed_pte_notifier_make_spte(u64 old_spte, kvm_pfn_t new_pfn) new_spte |= (u64)new_pfn << PAGE_SHIFT; new_spte &= ~PT_WRITABLE_MASK; - new_spte &= ~SPTE_HOST_WRITEABLE; + new_spte &= ~shadow_host_writable_mask; new_spte = mark_spte_for_access_track(new_spte); @@ -342,6 +344,9 @@ void kvm_mmu_reset_all_pte_masks(void) shadow_acc_track_mask = 0; shadow_me_mask = sme_me_mask; + shadow_host_writable_mask = DEFAULT_SPTE_HOST_WRITEABLE; + shadow_mmu_writable_mask = DEFAULT_SPTE_MMU_WRITEABLE; + /* * Set a reserved PA bit in MMIO SPTEs to generate page faults with * PFEC.RSVD=1 on MMIO 
accesses. 64-bit PTEs (PAE, x86-64, and EPT diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index e918b8f0b21d..287540d211a9 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -5,8 +5,6 @@ #include "mmu_internal.h" -#define PT_FIRST_AVAIL_BITS_SHIFT 10 - /* * TDP SPTES (more specifically, EPT SPTEs) may not have A/D bits, and may also * be restricted to using write-protection (for L2 when CPU dirty logging, i.e. @@ -59,9 +57,8 @@ static_assert(SPTE_TDP_AD_ENABLED_MASK == 0); (((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1)) #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level) - -#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT) -#define SPTE_MMU_WRITEABLE (1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1)) +#define DEFAULT_SPTE_HOST_WRITEABLE BIT_ULL(10) +#define DEFAULT_SPTE_MMU_WRITEABLE BIT_ULL(11) /* * Due to limited space in PTEs, the MMIO generation is a 20 bit subset of @@ -100,6 +97,8 @@ static_assert(MMIO_SPTE_GEN_LOW_BITS == 9 && MMIO_SPTE_GEN_HIGH_BITS == 11); #define MMIO_SPTE_GEN_MASK GENMASK_ULL(MMIO_SPTE_GEN_LOW_BITS + MMIO_SPTE_GEN_HIGH_BITS - 1, 0) +extern u64 __read_mostly shadow_host_writable_mask; +extern u64 __read_mostly shadow_mmu_writable_mask; extern u64 __read_mostly shadow_nx_mask; extern u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */ extern u64 __read_mostly shadow_user_mask; @@ -264,8 +263,8 @@ static inline bool is_dirty_spte(u64 spte) static inline bool spte_can_locklessly_be_made_writable(u64 spte) { - return (spte & (SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE)) == - (SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE); + return (spte & shadow_host_writable_mask) && + (spte & shadow_mmu_writable_mask); } static inline u64 get_mmio_spte_generation(u64 spte) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 0f8a5dabbb23..d46fd18fe22e 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1335,7 +1335,7 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm, /* * Removes write access on the last level SPTE mapping this GFN and unsets the - * SPTE_MMU_WRITABLE bit to ensure future writes continue to be intercepted. + * MMU-writable bit to ensure future writes continue to be intercepted. * Returns true if an SPTE was set and a TLB flush is needed. */ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root, @@ -1352,7 +1352,7 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root, break; new_spte = iter.old_spte & - ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE); + ~(PT_WRITABLE_MASK | shadow_mmu_writable_mask); tdp_mmu_set_spte(kvm, &iter, new_spte); spte_set = true; @@ -1365,7 +1365,7 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root, /* * Removes write access on the last level SPTE mapping this GFN and unsets the - * SPTE_MMU_WRITABLE bit to ensure future writes continue to be intercepted. + * MMU-writable bit to ensure future writes continue to be intercepted. * Returns true if an SPTE was set and a TLB flush is needed. */ bool kvm_tdp_mmu_write_protect_gfn(struct kvm *kvm, -- cgit v1.2.3 From 8082d50f4817ff6a7e08f4b7e9b18e5f8bfa290d Mon Sep 17 00:00:00 2001 From: Shenming Lu Date: Mon, 22 Mar 2021 14:01:58 +0800 Subject: KVM: arm64: GICv4.1: Give a chance to save VLPI state MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Before GICv4.1, we don't have direct access to the VLPI state. 
So we simply failed the save operation early whenever a VLPI was encountered. But on GICv4.1 we no longer have to return -EACCES directly. Let's change the hard-coded check and give the VLPI state a chance to be saved (while preserving the UAPI). Signed-off-by: Shenming Lu Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20210322060158.1584-7-lushenming@huawei.com --- Documentation/virt/kvm/devices/arm-vgic-its.rst | 2 +- arch/arm64/kvm/vgic/vgic-its.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/devices/arm-vgic-its.rst b/Documentation/virt/kvm/devices/arm-vgic-its.rst index 6c304fd2b1b4..d257eddbae29 100644 --- a/Documentation/virt/kvm/devices/arm-vgic-its.rst +++ b/Documentation/virt/kvm/devices/arm-vgic-its.rst @@ -80,7 +80,7 @@ KVM_DEV_ARM_VGIC_GRP_CTRL -EFAULT Invalid guest ram access -EBUSY One or more VCPUS are running -EACCES The virtual ITS is backed by a physical GICv4 ITS, and the - state is not available + state is not available without GICv4.1 ======= ========================================================== KVM_DEV_ARM_VGIC_GRP_ITS_REGS diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c index 40cbaca81333..ec7543a9617c 100644 --- a/arch/arm64/kvm/vgic/vgic-its.c +++ b/arch/arm64/kvm/vgic/vgic-its.c @@ -2218,10 +2218,10 @@ static int vgic_its_save_itt(struct vgic_its *its, struct its_device *device) /* * If an LPI carries the HW bit, this means that this * interrupt is controlled by GICv4, and we do not - * have direct access to that state. Let's simply fail - * the save operation... + * have direct access to that state without GICv4.1. + * Let's simply fail the save operation... */ - if (ite->irq->hw) + if (ite->irq->hw && !kvm_vgic_global_state.has_gicv4_1) return -EACCES; ret = vgic_its_save_ite(its, device, ite, gpa, ite_esz); -- cgit v1.2.3 From 298c41b8fa1e02c5a35e2263d138583220ab6094 Mon Sep 17 00:00:00 2001 From: Eric Auger Date: Mon, 5 Apr 2021 18:39:37 +0200 Subject: docs: kvm: devices/arm-vgic-v3: enhance KVM_DEV_ARM_VGIC_CTRL_INIT doc kvm_arch_vcpu_precreate() returns -EBUSY if the vgic is already initialized. So let's document that KVM_DEV_ARM_VGIC_CTRL_INIT must be called after all vcpus have been created. Signed-off-by: Eric Auger Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20210405163941.510258-6-eric.auger@redhat.com --- Documentation/virt/kvm/devices/arm-vgic-v3.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/devices/arm-vgic-v3.rst b/Documentation/virt/kvm/devices/arm-vgic-v3.rst index 5dd3bff51978..51e5e5762571 100644 --- a/Documentation/virt/kvm/devices/arm-vgic-v3.rst +++ b/Documentation/virt/kvm/devices/arm-vgic-v3.rst @@ -228,7 +228,7 @@ Groups: KVM_DEV_ARM_VGIC_CTRL_INIT request the initialization of the VGIC, no additional parameter in - kvm_device_attr.addr. + kvm_device_attr.addr. Must be called after all VCPUs have been created. KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES save all LPI pending bits into guest RAM pending tables. -- cgit v1.2.3 From e7cc4f2303b0ce1ecb9d8d381a1763bfea15fea9 Mon Sep 17 00:00:00 2001 From: Suzuki K Poulose Date: Mon, 5 Apr 2021 17:43:01 +0100 Subject: dts: bindings: Document device tree bindings for ETE Document the device tree bindings for Embedded Trace Extensions. ETE can be connected to legacy coresight components and thus could optionally contain a connection graph as described by the CoreSight bindings.
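As a reading aid for the binding described above: a driver consuming it would resolve the required "cpu" phandle to a logical CPU id, roughly as sketched below. The OF calls are the standard kernel API; the function itself is illustrative, not the actual ETE driver.

#include <linux/errno.h>
#include <linux/of.h>

static int ete_get_cpu(struct device_node *ete_node)
{
	struct device_node *cpu_node;
	int cpu;

	cpu_node = of_parse_phandle(ete_node, "cpu", 0);
	if (!cpu_node)
		return -ENODEV;

	cpu = of_cpu_node_to_id(cpu_node);	/* logical CPU id, or -ENODEV */
	of_node_put(cpu_node);
	return cpu;
}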
Cc: devicetree@vger.kernel.org Cc: Mathieu Poirier Cc: Mike Leach Reviewed-by: Rob Herring Signed-off-by: Suzuki K Poulose Link: https://lore.kernel.org/r/20210405164307.1720226-15-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier --- Documentation/devicetree/bindings/arm/ete.yaml | 75 ++++++++++++++++++++++++++ MAINTAINERS | 1 + 2 files changed, 76 insertions(+) create mode 100644 Documentation/devicetree/bindings/arm/ete.yaml (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/arm/ete.yaml b/Documentation/devicetree/bindings/arm/ete.yaml new file mode 100644 index 000000000000..7f9b2d1e1147 --- /dev/null +++ b/Documentation/devicetree/bindings/arm/ete.yaml @@ -0,0 +1,75 @@ +# SPDX-License-Identifier: GPL-2.0-only or BSD-2-Clause +# Copyright 2021, Arm Ltd +%YAML 1.2 +--- +$id: "http://devicetree.org/schemas/arm/ete.yaml#" +$schema: "http://devicetree.org/meta-schemas/core.yaml#" + +title: ARM Embedded Trace Extensions + +maintainers: + - Suzuki K Poulose + - Mathieu Poirier + +description: | + Arm Embedded Trace Extension(ETE) is a per CPU trace component that + allows tracing the CPU execution. It overlaps with the CoreSight ETMv4 + architecture and has extended support for future architecture changes. + The trace generated by the ETE could be stored via legacy CoreSight + components (e.g, TMC-ETR) or other means (e.g, using a per CPU buffer + Arm Trace Buffer Extension (TRBE)). Since the ETE can be connected to + legacy CoreSight components, a node must be listed per instance, along + with any optional connection graph as per the coresight bindings. + See bindings/arm/coresight.txt. + +properties: + $nodename: + pattern: "^ete([0-9a-f]+)$" + compatible: + items: + - const: arm,embedded-trace-extension + + cpu: + description: | + Handle to the cpu this ETE is bound to. + $ref: /schemas/types.yaml#/definitions/phandle + + out-ports: + description: | + Output connections from the ETE to legacy CoreSight trace bus. + $ref: /schemas/graph.yaml#/properties/ports + properties: + port: + description: Output connection from the ETE to legacy CoreSight Trace bus. + $ref: /schemas/graph.yaml#/properties/port + +required: + - compatible + - cpu + +additionalProperties: false + +examples: + +# An ETE node without legacy CoreSight connections + - | + ete0 { + compatible = "arm,embedded-trace-extension"; + cpu = <&cpu_0>; + }; +# An ETE node with legacy CoreSight connections + - | + ete1 { + compatible = "arm,embedded-trace-extension"; + cpu = <&cpu_1>; + + out-ports { /* legacy coresight connection */ + port { + ete1_out_port: endpoint { + remote-endpoint = <&funnel_in_port0>; + }; + }; + }; + }; + +... diff --git a/MAINTAINERS b/MAINTAINERS index aa84121c5611..2a20a36c724a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1761,6 +1761,7 @@ F: Documentation/ABI/testing/sysfs-bus-coresight-devices-* F: Documentation/devicetree/bindings/arm/coresight-cpu-debug.txt F: Documentation/devicetree/bindings/arm/coresight-cti.yaml F: Documentation/devicetree/bindings/arm/coresight.txt +F: Documentation/devicetree/bindings/arm/ete.yaml F: Documentation/trace/coresight/* F: drivers/hwtracing/coresight/* F: include/dt-bindings/arm/coresight-cti-dt.h -- cgit v1.2.3 From b20f34aec776f4c735cd3a899e9bc3333463848a Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Mon, 5 Apr 2021 17:43:05 +0100 Subject: Documentation: coresight: trbe: Sysfs ABI description Add sysfs ABI documentation for the TRBE devices. 
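For readers unfamiliar with how such read-only sysfs attributes are typically exposed, a hedged sketch of an "align" attribute follows; struct trbe_cpudata and its field are assumptions for illustration, not the actual TRBE driver code.

#include <linux/device.h>
#include <linux/types.h>

/* Hypothetical per-CPU driver state; illustrative only. */
struct trbe_cpudata {
	u64 trbe_align;		/* write pointer alignment, read from TRBIDR */
};

static ssize_t align_show(struct device *dev, struct device_attribute *attr,
			  char *buf)
{
	struct trbe_cpudata *cpudata = dev_get_drvdata(dev);

	/* Report the alignment captured from TRBIDR at probe time. */
	return sprintf(buf, "%llx\n", cpudata->trbe_align);
}
static DEVICE_ATTR_RO(align);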
Cc: Mathieu Poirier Cc: Mike Leach Cc: Jonathan Corbet Cc: linux-doc@vger.kernel.org Signed-off-by: Anshuman Khandual [ Split from the TRBE driver patch ] Signed-off-by: Suzuki K Poulose Link: https://lore.kernel.org/r/20210405164307.1720226-19-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier --- Documentation/ABI/testing/sysfs-bus-coresight-devices-trbe | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-bus-coresight-devices-trbe (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-bus-coresight-devices-trbe b/Documentation/ABI/testing/sysfs-bus-coresight-devices-trbe new file mode 100644 index 000000000000..ad3bbc6fa751 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-bus-coresight-devices-trbe @@ -0,0 +1,14 @@ +What: /sys/bus/coresight/devices/trbe/align +Date: March 2021 +KernelVersion: 5.13 +Contact: Anshuman Khandual +Description: (Read) Shows the TRBE write pointer alignment. This value + is fetched from the TRBIDR register. + +What: /sys/bus/coresight/devices/trbe/flag +Date: March 2021 +KernelVersion: 5.13 +Contact: Anshuman Khandual +Description: (Read) Shows if TRBE updates in the memory are with access + and dirty flag updates as well. This value is fetched from + the TRBIDR register. -- cgit v1.2.3 From 4af432186122bb274b76e7ac549073122c41d2fb Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Mon, 5 Apr 2021 17:43:06 +0100 Subject: Documentation: trace: Add documentation for TRBE Add documentation for the TRBE under trace/coresight. Cc: Jonathan Corbet Cc: linux-doc@vger.kernel.org Signed-off-by: Anshuman Khandual [ Split from the TRBE driver patch ] Signed-off-by: Suzuki K Poulose Link: https://lore.kernel.org/r/20210405164307.1720226-20-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier --- Documentation/trace/coresight/coresight-trbe.rst | 38 ++++++++++++++++++++++++ 1 file changed, 38 insertions(+) create mode 100644 Documentation/trace/coresight/coresight-trbe.rst (limited to 'Documentation') diff --git a/Documentation/trace/coresight/coresight-trbe.rst b/Documentation/trace/coresight/coresight-trbe.rst new file mode 100644 index 000000000000..b9928ef148da --- /dev/null +++ b/Documentation/trace/coresight/coresight-trbe.rst @@ -0,0 +1,38 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================== +Trace Buffer Extension (TRBE). +============================== + + :Author: Anshuman Khandual + :Date: November 2020 + +Hardware Description +-------------------- + +Trace Buffer Extension (TRBE) is a percpu hardware which captures in system +memory, CPU traces generated from a corresponding percpu tracing unit. This +gets plugged in as a coresight sink device because the corresponding trace +generators (ETE), are plugged in as source device. + +The TRBE is not compliant to CoreSight architecture specifications, but is +driven via the CoreSight driver framework to support the ETE (which is +CoreSight compliant) integration. 
+ +Sysfs files and directories +--------------------------- + +The TRBE devices appear on the existing coresight bus alongside the other +coresight devices:: + + >$ ls /sys/bus/coresight/devices + trbe0 trbe1 trbe2 trbe3 + +The ``trbe`` named TRBEs are associated with a CPU.:: + + >$ ls /sys/bus/coresight/devices/trbe0/ + align flag + +*Key file items are:-* + * ``align``: TRBE write pointer alignment + * ``flag``: TRBE updates memory with access and dirty flags -- cgit v1.2.3 From 4fb13790417a7bf726f3867a5d2b9723efde488b Mon Sep 17 00:00:00 2001 From: Suzuki K Poulose Date: Mon, 5 Apr 2021 17:43:07 +0100 Subject: dts: bindings: Document device tree bindings for Arm TRBE Document the device tree bindings for Trace Buffer Extension (TRBE). Cc: Anshuman Khandual Cc: Mathieu Poirier Cc: Rob Herring Cc: devicetree@vger.kernel.org Reviewed-by: Rob Herring Signed-off-by: Suzuki K Poulose Link: https://lore.kernel.org/r/20210405164307.1720226-21-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier --- Documentation/devicetree/bindings/arm/trbe.yaml | 49 +++++++++++++++++++++++++ MAINTAINERS | 1 + 2 files changed, 50 insertions(+) create mode 100644 Documentation/devicetree/bindings/arm/trbe.yaml (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/arm/trbe.yaml b/Documentation/devicetree/bindings/arm/trbe.yaml new file mode 100644 index 000000000000..4402d7bfd1fc --- /dev/null +++ b/Documentation/devicetree/bindings/arm/trbe.yaml @@ -0,0 +1,49 @@ +# SPDX-License-Identifier: GPL-2.0-only or BSD-2-Clause +# Copyright 2021, Arm Ltd +%YAML 1.2 +--- +$id: "http://devicetree.org/schemas/arm/trbe.yaml#" +$schema: "http://devicetree.org/meta-schemas/core.yaml#" + +title: ARM Trace Buffer Extensions + +maintainers: + - Anshuman Khandual + +description: | + Arm Trace Buffer Extension (TRBE) is a per CPU component + for storing trace generated on the CPU to memory. It is + accessed via CPU system registers. The software can verify + if it is permitted to use the component by checking the + TRBIDR register. + +properties: + $nodename: + const: "trbe" + compatible: + items: + - const: arm,trace-buffer-extension + + interrupts: + description: | + Exactly 1 PPI must be listed. For heterogeneous systems where + TRBE is only supported on a subset of the CPUs, please consult + the arm,gic-v3 binding for details on describing a PPI partition. + maxItems: 1 + +required: + - compatible + - interrupts + +additionalProperties: false + +examples: + + - | + #include + + trbe { + compatible = "arm,trace-buffer-extension"; + interrupts = ; + }; +... diff --git a/MAINTAINERS b/MAINTAINERS index 2a20a36c724a..471d04bb2d5e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1762,6 +1762,7 @@ F: Documentation/devicetree/bindings/arm/coresight-cpu-debug.txt F: Documentation/devicetree/bindings/arm/coresight-cti.yaml F: Documentation/devicetree/bindings/arm/coresight.txt F: Documentation/devicetree/bindings/arm/ete.yaml +F: Documentation/devicetree/bindings/arm/trbe.yaml F: Documentation/trace/coresight/* F: drivers/hwtracing/coresight/* F: include/dt-bindings/arm/coresight-cti-dt.h -- cgit v1.2.3 From 3bf725699bf62494b3e179f1795f08c7d749f061 Mon Sep 17 00:00:00 2001 From: Jianyong Wu Date: Wed, 9 Dec 2020 14:09:29 +0800 Subject: KVM: arm64: Add support for the KVM PTP service Implement the hypervisor side of the KVM PTP interface. The service offers wall time and cycle count from host to guest. The caller must specify whether they want the host's view of either the virtual or physical counter. 
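A minimal guest-side sketch of invoking this service, using the SMCCC constants the patch adds to include/linux/arm-smccc.h; the wrapper name and error convention are assumptions, not the actual ptp_kvm driver:

#include <linux/arm-smccc.h>
#include <linux/errno.h>
#include <linux/types.h>

static int kvm_ptp_read(u64 *wall_ns, u64 *cycles)
{
	struct arm_smccc_res res;

	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID,
			     KVM_PTP_VIRT_COUNTER, &res);
	/* NOT_SUPPORTED(-1): the top bit of a valid r0 is never set. */
	if ((int)res.a0 < 0)
		return -EOPNOTSUPP;

	*wall_ns = (res.a0 << 32) | res.a1;	/* upper | lower wall clock */
	*cycles  = (res.a2 << 32) | res.a3;	/* upper | lower counter */
	return 0;
}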
Signed-off-by: Jianyong Wu Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20201209060932.212364-7-jianyong.wu@arm.com --- Documentation/virt/kvm/api.rst | 10 +++++++ Documentation/virt/kvm/arm/index.rst | 1 + Documentation/virt/kvm/arm/ptp_kvm.rst | 25 ++++++++++++++++ arch/arm64/kvm/arm.c | 1 + arch/arm64/kvm/hypercalls.c | 53 ++++++++++++++++++++++++++++++++++ include/linux/arm-smccc.h | 16 ++++++++++ include/uapi/linux/kvm.h | 1 + 7 files changed, 107 insertions(+) create mode 100644 Documentation/virt/kvm/arm/ptp_kvm.rst (limited to 'Documentation') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 38e327d4b479..987d99e39887 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6724,3 +6724,13 @@ vcpu_info is set. The KVM_XEN_HVM_CONFIG_RUNSTATE flag indicates that the runstate-related features KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR/_CURRENT/_DATA/_ADJUST are supported by the KVM_XEN_VCPU_SET_ATTR/KVM_XEN_VCPU_GET_ATTR ioctls. + +8.31 KVM_CAP_PTP_KVM +-------------------- + +:Architectures: arm64 + +This capability indicates that the KVM virtual PTP service is +supported in the host. A VMM can check whether the service is +available to the guest on migration. + diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst index 3e2b2aba90fc..78a9b670aafe 100644 --- a/Documentation/virt/kvm/arm/index.rst +++ b/Documentation/virt/kvm/arm/index.rst @@ -10,3 +10,4 @@ ARM hyp-abi psci pvtime + ptp_kvm diff --git a/Documentation/virt/kvm/arm/ptp_kvm.rst b/Documentation/virt/kvm/arm/ptp_kvm.rst new file mode 100644 index 000000000000..68cffb50d8bf --- /dev/null +++ b/Documentation/virt/kvm/arm/ptp_kvm.rst @@ -0,0 +1,25 @@ +.. SPDX-License-Identifier: GPL-2.0 + +PTP_KVM support for arm/arm64 +============================= + +PTP_KVM is used for high precision time sync between host and guests. +It relies on transferring the wall clock and counter value from the +host to the guest using a KVM-specific hypercall. + +* ARM_SMCCC_HYP_KVM_PTP_FUNC_ID: 0x86000001 + +This hypercall uses the SMC32/HVC32 calling convention: + +ARM_SMCCC_HYP_KVM_PTP_FUNC_ID + ============= ========== ========== + Function ID: (uint32) 0x86000001 + Arguments: (uint32) KVM_PTP_VIRT_COUNTER(0) + KVM_PTP_PHYS_COUNTER(1) + Return Values: (int32) NOT_SUPPORTED(-1) on error, or + (uint32) Upper 32 bits of wall clock time (r0) + (uint32) Lower 32 bits of wall clock time (r1) + (uint32) Upper 32 bits of counter (r2) + (uint32) Lower 32 bits of counter (r3) + Endianness: No Restrictions. + ============= ========== ========== diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 7f06ba76698d..46401798c644 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -206,6 +206,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_ARM_INJECT_EXT_DABT: case KVM_CAP_SET_GUEST_DEBUG: case KVM_CAP_VCPU_ATTRIBUTES: + case KVM_CAP_PTP_KVM: r = 1; break; case KVM_CAP_ARM_SET_DEVICE_ADDR: diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c index 78d32c34d49c..30da78f72b3b 100644 --- a/arch/arm64/kvm/hypercalls.c +++ b/arch/arm64/kvm/hypercalls.c @@ -9,6 +9,55 @@ #include #include +static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val) +{ + struct system_time_snapshot systime_snapshot; + u64 cycles = ~0UL; + u32 feature; + + /* + * system time and counter value must captured at the same + * time to keep consistency and precision. 
+ */ + ktime_get_snapshot(&systime_snapshot); + + /* + * This is only valid if the current clocksource is the + * architected counter, as this is the only one the guest + * can see. + */ + if (systime_snapshot.cs_id != CSID_ARM_ARCH_COUNTER) + return; + + /* + * The guest selects one of the two reference counters + * (virtual or physical) with the first argument of the SMCCC + * call. In case the identifier is not supported, error out. + */ + feature = smccc_get_arg1(vcpu); + switch (feature) { + case KVM_PTP_VIRT_COUNTER: + cycles = systime_snapshot.cycles - vcpu_read_sys_reg(vcpu, CNTVOFF_EL2); + break; + case KVM_PTP_PHYS_COUNTER: + cycles = systime_snapshot.cycles; + break; + default: + return; + } + + /* + * This relies on the top bit of val[0] never being set for + * valid values of system time, because that is *really* far + * in the future (about 292 years from 1970, and at that stage + * nobody will give a damn about it). + */ + val[0] = upper_32_bits(systime_snapshot.real); + val[1] = lower_32_bits(systime_snapshot.real); + val[2] = upper_32_bits(cycles); + val[3] = lower_32_bits(cycles); +} + int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) { u32 func_id = smccc_get_function(vcpu); @@ -79,6 +128,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) break; case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID: val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES); + val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP); + break; + case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID: + kvm_ptp_get_time(vcpu, val); break; case ARM_SMCCC_TRNG_VERSION: case ARM_SMCCC_TRNG_FEATURES: diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h index 1a27bd9493fe..6861489a1890 100644 --- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -103,6 +103,7 @@ /* KVM "vendor specific" services */ #define ARM_SMCCC_KVM_FUNC_FEATURES 0 +#define ARM_SMCCC_KVM_FUNC_PTP 1 #define ARM_SMCCC_KVM_FUNC_FEATURES_2 127 #define ARM_SMCCC_KVM_NUM_FUNCS 128 @@ -114,6 +115,21 @@ #define SMCCC_ARCH_WORKAROUND_RET_UNAFFECTED 1 +/* + * ptp_kvm is a feature used for time sync between vm and host. + * ptp_kvm module in guest kernel will get service from host using + * this hypercall ID. + */ +#define ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ + ARM_SMCCC_SMC_32, \ + ARM_SMCCC_OWNER_VENDOR_HYP, \ + ARM_SMCCC_KVM_FUNC_PTP) + +/* ptp_kvm counter type ID */ +#define KVM_PTP_VIRT_COUNTER 0 +#define KVM_PTP_PHYS_COUNTER 1 + /* Paravirtualised time calls (defined by ARM DEN0057A) */ #define ARM_SMCCC_HV_PV_TIME_FEATURES \ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index f6afee209620..0e0f70c0d0dc 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1078,6 +1078,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_DIRTY_LOG_RING 192 #define KVM_CAP_X86_BUS_LOCK_EXIT 193 #define KVM_CAP_PPC_DAWR1 194 +#define KVM_CAP_PTP_KVM 195 #ifdef KVM_CAP_IRQ_ROUTING -- cgit v1.2.3 From feb5dc3de03711d846f0b729cb12fc05cbe49ccb Mon Sep 17 00:00:00 2001 From: Alexandru Elisei Date: Wed, 7 Apr 2021 15:48:56 +0100 Subject: Documentation: KVM: Document KVM_GUESTDBG_USE_HW control flag for arm64 Commit 21b6f32f9471 ("KVM: arm64: guest debug, define API headers") added the arm64 KVM_GUESTDBG_USE_HW flag for the KVM_SET_GUEST_DEBUG ioctl and commit 834bf88726f0 ("KVM: arm64: enable KVM_CAP_SET_GUEST_DEBUG") documented and implemented the flag functionality. 
Since its introduction, at no point was the flag known by any name other than KVM_GUESTDBG_USE_HW for the arm64 architecture, so refer to it as such in the documentation. CC: Paolo Bonzini Signed-off-by: Alexandru Elisei Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20210407144857.199746-2-alexandru.elisei@arm.com --- Documentation/virt/kvm/api.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 38e327d4b479..62ffcb2dafee 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -3334,7 +3334,8 @@ The top 16 bits of the control field are architecture specific control flags which can include the following: - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64] - - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390, arm64] + - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390] + - KVM_GUESTDBG_USE_HW: using hardware debug events [arm64] - KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86] - KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86] - KVM_GUESTDBG_EXIT_PENDING: trigger an immediate guest exit [s390] -- cgit v1.2.3 From 127ce0b14133f48a5635faa9dac69a3a99f85146 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Thu, 8 Apr 2021 14:31:24 +0100 Subject: KVM: arm64: Fix table format for PTP documentation The documentation build legitimately screams about the PTP documentation table being misformatted. Fix it by adjusting the table width guides. Reported-by: Stephen Rothwell Signed-off-by: Marc Zyngier --- Documentation/virt/kvm/arm/ptp_kvm.rst | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/arm/ptp_kvm.rst b/Documentation/virt/kvm/arm/ptp_kvm.rst index 68cffb50d8bf..5d2bc4558b9d 100644 --- a/Documentation/virt/kvm/arm/ptp_kvm.rst +++ b/Documentation/virt/kvm/arm/ptp_kvm.rst @@ -12,14 +12,14 @@ host to the guest using a KVM-specific hypercall. This hypercall uses the SMC32/HVC32 calling convention: ARM_SMCCC_HYP_KVM_PTP_FUNC_ID - ============= ========== ========== - Function ID: (uint32) 0x86000001 - Arguments: (uint32) KVM_PTP_VIRT_COUNTER(0) - KVM_PTP_PHYS_COUNTER(1) - Return Values: (int32) NOT_SUPPORTED(-1) on error, or - (uint32) Upper 32 bits of wall clock time (r0) - (uint32) Lower 32 bits of wall clock time (r1) - (uint32) Upper 32 bits of counter (r2) - (uint32) Lower 32 bits of counter (r3) - Endianness: No Restrictions. - ============= ========== ========== + ============== ======== ===================================== + Function ID: (uint32) 0x86000001 + Arguments: (uint32) KVM_PTP_VIRT_COUNTER(0) + KVM_PTP_PHYS_COUNTER(1) + Return Values: (int32) NOT_SUPPORTED(-1) on error, or + (uint32) Upper 32 bits of wall clock time (r0) + (uint32) Lower 32 bits of wall clock time (r1) + (uint32) Upper 32 bits of counter (r2) + (uint32) Lower 32 bits of counter (r3) + Endianness: No Restrictions. + ============== ======== ===================================== -- cgit v1.2.3 From 5b32a53d6d057ab213abae33fc275be844051695 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 6 Apr 2021 13:46:42 +0100 Subject: KVM: arm64: Clarify vcpu reset behaviour Although the KVM_ARM_VCPU_INIT documentation mentions that the registers are reset to their "initial values", it doesn't describe what these values are. Describe this state explicitly.
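A hedged userspace sketch of the reset flow this patch documents: issuing KVM_ARM_VCPU_INIT (again) returns a vcpu to the initial values listed in the hunk below. The helper name is illustrative; KVM_ARM_PREFERRED_TARGET simply fills in a target suitable for the host.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int vcpu_reset(int vm_fd, int vcpu_fd)
{
	struct kvm_vcpu_init init;

	memset(&init, 0, sizeof(init));
	if (ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init) < 0)
		return -1;

	/* A repeated KVM_ARM_VCPU_INIT resets registers to the documented
	 * initial state (GPRs/PC/SP zeroed, EL1h or SVC mode, etc.). */
	return ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);
}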
Reviewed-by: Alexandru Elisei Acked-by: Will Deacon Signed-off-by: Marc Zyngier --- Documentation/virt/kvm/api.rst | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 38e327d4b479..fedfe7104105 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -3115,6 +3115,18 @@ optional features it should have.  This will cause a reset of the cpu registers to their initial values.  If this is not called, KVM_RUN will return ENOEXEC for that vcpu. +The initial values are defined as: + - Processor state: + * AArch64: EL1h, D, A, I and F bits set. All other bits + are cleared. + * AArch32: SVC, A, I and F bits set. All other bits are + cleared. + - General Purpose registers, including PC and SP: set to 0 + - FPSIMD/NEON registers: set to 0 + - SVE registers: set to 0 + - System registers: Reset to their architecturally defined + values as for a warm reset to EL1 (resp. SVC) + Note that because some registers reflect machine topology, all vcpus should be created before this ioctl is invoked. -- cgit v1.2.3 From 8b13c36493d8cb56fc3b386507873c5412b7108d Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Tue, 6 Apr 2021 10:24:07 -0400 Subject: KVM: introduce KVM_CAP_SET_GUEST_DEBUG2 This capability will allow the user to know which KVM_GUESTDBG_* bits are supported. Signed-off-by: Maxim Levitsky Message-Id: <20210401135451.1004564-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 3 +++ include/uapi/linux/kvm.h | 1 + 2 files changed, 4 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 38e327d4b479..9778b2434c03 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -3357,6 +3357,9 @@ indicating the number of supported registers. For ppc, the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability indicates whether the single-step debug event (KVM_GUESTDBG_SINGLESTEP) is supported. +Also when supported, KVM_CAP_SET_GUEST_DEBUG2 capability indicates the +supported KVM_GUESTDBG_* bits in the control field. + When debug events exit the main run loop with the reason KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run structure containing architecture specific debug information. diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index f6afee209620..727010788eff 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1078,6 +1078,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_DIRTY_LOG_RING 192 #define KVM_CAP_X86_BUS_LOCK_EXIT 193 #define KVM_CAP_PPC_DAWR1 194 +#define KVM_CAP_SET_GUEST_DEBUG2 195 #ifdef KVM_CAP_IRQ_ROUTING -- cgit v1.2.3 From 24e7475f931ad7090c1e63dbaf12f338aeb81eac Mon Sep 17 00:00:00 2001 From: Emanuele Giuseppe Esposito Date: Tue, 16 Mar 2021 18:08:14 +0100 Subject: doc/virt/kvm: move KVM_CAP_PPC_MULTITCE in section 8 KVM_CAP_PPC_MULTITCE is a capability, not an ioctl. Therefore move it from section 4.97 to the new 8.31 (other capabilities). To fill the gap, move KVM_X86_SET_MSR_FILTER (was 4.126) to 4.97, and shift the Xen-related ioctls (formerly 4.127 - 4.130) by one place (4.126 - 4.129). Also fix a minor typo in the KVM_GET_MSR_INDEX_LIST ioctl description (section 4.3).
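Since the relocated 4.97 section below documents KVM_X86_SET_MSR_FILTER, a hedged userspace sketch of programming a deny-by-default filter may help while reading it. The helper name and the choice of a single read-allowed MSR are illustrative; the struct fields follow the uapi shown in the moved section.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Deny all MSR accesses except reads of one chosen MSR. */
static int set_deny_all_but_one_read(int vm_fd, __u32 msr)
{
	__u8 allow_read = 0x01;		/* bit 0 -> first (only) MSR in range */
	struct kvm_msr_filter filter;

	memset(&filter, 0, sizeof(filter));
	filter.flags = KVM_MSR_FILTER_DEFAULT_DENY;
	filter.ranges[0].flags = KVM_MSR_FILTER_READ;
	filter.ranges[0].nmsrs = 1;
	filter.ranges[0].base = msr;
	filter.ranges[0].bitmap = &allow_read;

	return ioctl(vm_fd, KVM_X86_SET_MSR_FILTER, &filter);
}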
Signed-off-by: Emanuele Giuseppe Esposito Message-Id: <20210316170814.64286-1-eesposit@redhat.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 250 ++++++++++++++++++++--------------------- 1 file changed, 125 insertions(+), 125 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 9778b2434c03..1b94cced0105 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -204,7 +204,7 @@ Errors: ====== ============================================================ EFAULT the msr index list cannot be read from or written to - E2BIG the msr index list is to be to fit in the array specified by + E2BIG the msr index list is too big to fit in the array specified by the user. ====== ============================================================ @@ -3692,31 +3692,105 @@ which is the maximum number of possibly pending cpu-local interrupts. Queues an SMI on the thread's vcpu. -4.97 KVM_CAP_PPC_MULTITCE -------------------------- +4.97 KVM_X86_SET_MSR_FILTER +---------------------------- -:Capability: KVM_CAP_PPC_MULTITCE -:Architectures: ppc -:Type: vm +:Capability: KVM_X86_SET_MSR_FILTER +:Architectures: x86 +:Type: vm ioctl +:Parameters: struct kvm_msr_filter +:Returns: 0 on success, < 0 on error -This capability means the kernel is capable of handling hypercalls -H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user -space. This significantly accelerates DMA operations for PPC KVM guests. -User space should expect that its handlers for these hypercalls -are not going to be called if user space previously registered LIOBN -in KVM (via KVM_CREATE_SPAPR_TCE or similar calls). +:: -In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest, -user space might have to advertise it for the guest. For example, -IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is -present in the "ibm,hypertas-functions" device-tree property. + struct kvm_msr_filter_range { + #define KVM_MSR_FILTER_READ (1 << 0) + #define KVM_MSR_FILTER_WRITE (1 << 1) + __u32 flags; + __u32 nmsrs; /* number of msrs in bitmap */ + __u32 base; /* MSR index the bitmap starts at */ + __u8 *bitmap; /* a 1 bit allows the operations in flags, 0 denies */ + }; -The hypercalls mentioned above may or may not be processed successfully -in the kernel based fast path. If they can not be handled by the kernel, -they will get passed on to user space. So user space still has to have -an implementation for these despite the in kernel acceleration. + #define KVM_MSR_FILTER_MAX_RANGES 16 + struct kvm_msr_filter { + #define KVM_MSR_FILTER_DEFAULT_ALLOW (0 << 0) + #define KVM_MSR_FILTER_DEFAULT_DENY (1 << 0) + __u32 flags; + struct kvm_msr_filter_range ranges[KVM_MSR_FILTER_MAX_RANGES]; + }; -This capability is always enabled. +flags values for ``struct kvm_msr_filter_range``: + +``KVM_MSR_FILTER_READ`` + + Filter read accesses to MSRs using the given bitmap. A 0 in the bitmap + indicates that a read should immediately fail, while a 1 indicates that + a read for a particular MSR should be handled regardless of the default + filter action. + +``KVM_MSR_FILTER_WRITE`` + + Filter write accesses to MSRs using the given bitmap. A 0 in the bitmap + indicates that a write should immediately fail, while a 1 indicates that + a write for a particular MSR should be handled regardless of the default + filter action. 
+ +``KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE`` + + Filter both read and write accesses to MSRs using the given bitmap. A 0 + in the bitmap indicates that both reads and writes should immediately fail, + while a 1 indicates that reads and writes for a particular MSR are not + filtered by this range. + +flags values for ``struct kvm_msr_filter``: + +``KVM_MSR_FILTER_DEFAULT_ALLOW`` + + If no filter range matches an MSR index that is getting accessed, KVM will + fall back to allowing access to the MSR. + +``KVM_MSR_FILTER_DEFAULT_DENY`` + + If no filter range matches an MSR index that is getting accessed, KVM will + fall back to rejecting access to the MSR. In this mode, all MSRs that should + be processed by KVM need to explicitly be marked as allowed in the bitmaps. + +This ioctl allows user space to define up to 16 bitmaps of MSR ranges to +specify whether a certain MSR access should be explicitly filtered for or not. + +If this ioctl has never been invoked, MSR accesses are not guarded and the +default KVM in-kernel emulation behavior is fully preserved. + +Calling this ioctl with an empty set of ranges (all nmsrs == 0) disables MSR +filtering. In that mode, ``KVM_MSR_FILTER_DEFAULT_DENY`` is invalid and causes +an error. + +As soon as the filtering is in place, every MSR access is processed through +the filtering except for accesses to the x2APIC MSRs (from 0x800 to 0x8ff); +x2APIC MSRs are always allowed, independent of the ``default_allow`` setting, +and their behavior depends on the ``X2APIC_ENABLE`` bit of the APIC base +register. + +If a bit is within one of the defined ranges, read and write accesses are +guarded by the bitmap's value for the MSR index if the kind of access +is included in the ``struct kvm_msr_filter_range`` flags. If no range +cover this particular access, the behavior is determined by the flags +field in the kvm_msr_filter struct: ``KVM_MSR_FILTER_DEFAULT_ALLOW`` +and ``KVM_MSR_FILTER_DEFAULT_DENY``. + +Each bitmap range specifies a range of MSRs to potentially allow access on. +The range goes from MSR index [base .. base+nmsrs]. The flags field +indicates whether reads, writes or both reads and writes are filtered +by setting a 1 bit in the bitmap for the corresponding MSR index. + +If an MSR access is not permitted through the filtering, it generates a +#GP inside the guest. When combined with KVM_CAP_X86_USER_SPACE_MSR, that +allows user space to deflect and potentially handle various MSR accesses +into user space. + +If a vCPU is in running state while this ioctl is invoked, the vCPU may +experience inconsistent filtering behavior on MSR accesses. 4.98 KVM_CREATE_SPAPR_TCE_64 ---------------------------- @@ -4712,107 +4786,7 @@ KVM_PV_VM_VERIFY Verify the integrity of the unpacked image. Only if this succeeds, KVM is allowed to start protected VCPUs. 
-4.126 KVM_X86_SET_MSR_FILTER ----------------------------- - -:Capability: KVM_X86_SET_MSR_FILTER -:Architectures: x86 -:Type: vm ioctl -:Parameters: struct kvm_msr_filter -:Returns: 0 on success, < 0 on error - -:: - - struct kvm_msr_filter_range { - #define KVM_MSR_FILTER_READ (1 << 0) - #define KVM_MSR_FILTER_WRITE (1 << 1) - __u32 flags; - __u32 nmsrs; /* number of msrs in bitmap */ - __u32 base; /* MSR index the bitmap starts at */ - __u8 *bitmap; /* a 1 bit allows the operations in flags, 0 denies */ - }; - - #define KVM_MSR_FILTER_MAX_RANGES 16 - struct kvm_msr_filter { - #define KVM_MSR_FILTER_DEFAULT_ALLOW (0 << 0) - #define KVM_MSR_FILTER_DEFAULT_DENY (1 << 0) - __u32 flags; - struct kvm_msr_filter_range ranges[KVM_MSR_FILTER_MAX_RANGES]; - }; - -flags values for ``struct kvm_msr_filter_range``: - -``KVM_MSR_FILTER_READ`` - - Filter read accesses to MSRs using the given bitmap. A 0 in the bitmap - indicates that a read should immediately fail, while a 1 indicates that - a read for a particular MSR should be handled regardless of the default - filter action. - -``KVM_MSR_FILTER_WRITE`` - - Filter write accesses to MSRs using the given bitmap. A 0 in the bitmap - indicates that a write should immediately fail, while a 1 indicates that - a write for a particular MSR should be handled regardless of the default - filter action. - -``KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE`` - - Filter both read and write accesses to MSRs using the given bitmap. A 0 - in the bitmap indicates that both reads and writes should immediately fail, - while a 1 indicates that reads and writes for a particular MSR are not - filtered by this range. - -flags values for ``struct kvm_msr_filter``: - -``KVM_MSR_FILTER_DEFAULT_ALLOW`` - - If no filter range matches an MSR index that is getting accessed, KVM will - fall back to allowing access to the MSR. - -``KVM_MSR_FILTER_DEFAULT_DENY`` - - If no filter range matches an MSR index that is getting accessed, KVM will - fall back to rejecting access to the MSR. In this mode, all MSRs that should - be processed by KVM need to explicitly be marked as allowed in the bitmaps. - -This ioctl allows user space to define up to 16 bitmaps of MSR ranges to -specify whether a certain MSR access should be explicitly filtered for or not. - -If this ioctl has never been invoked, MSR accesses are not guarded and the -default KVM in-kernel emulation behavior is fully preserved. - -Calling this ioctl with an empty set of ranges (all nmsrs == 0) disables MSR -filtering. In that mode, ``KVM_MSR_FILTER_DEFAULT_DENY`` is invalid and causes -an error. - -As soon as the filtering is in place, every MSR access is processed through -the filtering except for accesses to the x2APIC MSRs (from 0x800 to 0x8ff); -x2APIC MSRs are always allowed, independent of the ``default_allow`` setting, -and their behavior depends on the ``X2APIC_ENABLE`` bit of the APIC base -register. - -If a bit is within one of the defined ranges, read and write accesses are -guarded by the bitmap's value for the MSR index if the kind of access -is included in the ``struct kvm_msr_filter_range`` flags. If no range -cover this particular access, the behavior is determined by the flags -field in the kvm_msr_filter struct: ``KVM_MSR_FILTER_DEFAULT_ALLOW`` -and ``KVM_MSR_FILTER_DEFAULT_DENY``. - -Each bitmap range specifies a range of MSRs to potentially allow access on. -The range goes from MSR index [base .. base+nmsrs]. 
The flags field -indicates whether reads, writes or both reads and writes are filtered -by setting a 1 bit in the bitmap for the corresponding MSR index. - -If an MSR access is not permitted through the filtering, it generates a -#GP inside the guest. When combined with KVM_CAP_X86_USER_SPACE_MSR, that -allows user space to deflect and potentially handle various MSR accesses -into user space. - -If a vCPU is in running state while this ioctl is invoked, the vCPU may -experience inconsistent filtering behavior on MSR accesses. - -4.127 KVM_XEN_HVM_SET_ATTR +4.126 KVM_XEN_HVM_SET_ATTR -------------------------- :Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO @@ -4855,7 +4829,7 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO KVM_XEN_ATTR_TYPE_UPCALL_VECTOR Sets the exception vector used to deliver Xen event channel upcalls. -4.128 KVM_XEN_HVM_GET_ATTR +4.127 KVM_XEN_HVM_GET_ATTR -------------------------- :Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO @@ -4867,7 +4841,7 @@ KVM_XEN_ATTR_TYPE_UPCALL_VECTOR Allows Xen VM attributes to be read. For the structure and types, see KVM_XEN_HVM_SET_ATTR above. -4.129 KVM_XEN_VCPU_SET_ATTR +4.128 KVM_XEN_VCPU_SET_ATTR --------------------------- :Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO @@ -4929,7 +4903,7 @@ KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST or RUNSTATE_offline) to set the current accounted state as of the adjusted state_entry_time. -4.130 KVM_XEN_VCPU_GET_ATTR +4.129 KVM_XEN_VCPU_GET_ATTR --------------------------- :Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO @@ -6727,3 +6701,29 @@ vcpu_info is set. The KVM_XEN_HVM_CONFIG_RUNSTATE flag indicates that the runstate-related features KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR/_CURRENT/_DATA/_ADJUST are supported by the KVM_XEN_VCPU_SET_ATTR/KVM_XEN_VCPU_GET_ATTR ioctls. + +8.31 KVM_CAP_PPC_MULTITCE +------------------------- + +:Capability: KVM_CAP_PPC_MULTITCE +:Architectures: ppc +:Type: vm + +This capability means the kernel is capable of handling hypercalls +H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user +space. This significantly accelerates DMA operations for PPC KVM guests. +User space should expect that its handlers for these hypercalls +are not going to be called if user space previously registered LIOBN +in KVM (via KVM_CREATE_SPAPR_TCE or similar calls). + +In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest, +user space might have to advertise it for the guest. For example, +IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is +present in the "ibm,hypertas-functions" device-tree property. + +The hypercalls mentioned above may or may not be processed successfully +in the kernel based fast path. If they can not be handled by the kernel, +they will get passed on to user space. So user space still has to have +an implementation for these despite the in kernel acceleration. + +This capability is always enabled. \ No newline at end of file -- cgit v1.2.3 From fe7e948837f312d87853b3fce743795d1ae3715a Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Mon, 12 Apr 2021 16:21:43 +1200 Subject: KVM: x86: Add capability to grant VM access to privileged SGX attribute Add a capability, KVM_CAP_SGX_ATTRIBUTE, that can be used by userspace to grant a VM access to a privileged attribute, with args[0] holding a file handle to a valid SGX attribute file. The SGX subsystem restricts access to a subset of enclave attributes to provide additional security for an uncompromised kernel, e.g.
to prevent malware from using the PROVISIONKEY to ensure its nodes are running inside a genuine SGX enclave and/or to obtain a stable fingerprint. To prevent userspace from circumventing such restrictions by running an enclave in a VM, KVM restricts guest access to privileged attributes by default. Cc: Andy Lutomirski Signed-off-by: Sean Christopherson Signed-off-by: Kai Huang Message-Id: <0b099d65e933e068e3ea934b0523bab070cb8cea.1618196135.git.kai.huang@intel.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++ arch/x86/kvm/cpuid.c | 2 +- arch/x86/kvm/x86.c | 21 +++++++++++++++++++++ include/uapi/linux/kvm.h | 1 + 4 files changed, 46 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 1b94cced0105..414b7fe1cf7b 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6207,6 +6207,29 @@ KVM_RUN_BUS_LOCK flag is used to distinguish between them. This capability can be used to check / enable 2nd DAWR feature provided by POWER10 processor. +7.25 KVM_CAP_SGX_ATTRIBUTE +---------------------- + +:Architectures: x86 +:Target: VM +:Parameters: args[0] is a file handle of a SGX attribute file in securityfs +:Returns: 0 on success, -EINVAL if the file handle is invalid or if a requested + attribute is not supported by KVM. + +KVM_CAP_SGX_ATTRIBUTE enables a userspace VMM to grant a VM access to one or +more privileged enclave attributes. args[0] must hold a file handle to a valid +SGX attribute file corresponding to an attribute that is supported/restricted +by KVM (currently only PROVISIONKEY). + +The SGX subsystem restricts access to a subset of enclave attributes to provide +additional security for an uncompromised kernel, e.g. use of the PROVISIONKEY +is restricted to deter malware from using the PROVISIONKEY to obtain a stable +system fingerprint. To prevent userspace from circumventing such restrictions +by running an enclave in a VM, KVM prevents access to privileged attributes by +default. + +See Documentation/x86/sgx/2.Kernel-internals.rst for more details. + 8. Other capabilities. ====================== diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 29365c11bf27..2ae061586677 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -849,7 +849,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function) * expected to derive it from supported XCR0.
*/ entry->eax &= SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT | - /* PROVISIONKEY | */ SGX_ATTR_EINITTOKENKEY | + SGX_ATTR_PROVISIONKEY | SGX_ATTR_EINITTOKENKEY | SGX_ATTR_KSS; entry->ebx &= 0; break; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d4063bec8ce3..d4cdab15238b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -75,6 +75,7 @@ #include #include #include +#include #include #define CREATE_TRACE_POINTS @@ -3804,6 +3805,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_X86_USER_SPACE_MSR: case KVM_CAP_X86_MSR_FILTER: case KVM_CAP_ENFORCE_PV_FEATURE_CPUID: +#ifdef CONFIG_X86_SGX_KVM + case KVM_CAP_SGX_ATTRIBUTE: +#endif r = 1; break; case KVM_CAP_SET_GUEST_DEBUG2: @@ -5392,6 +5396,23 @@ split_irqchip_unlock: kvm->arch.bus_lock_detection_enabled = true; r = 0; break; +#ifdef CONFIG_X86_SGX_KVM + case KVM_CAP_SGX_ATTRIBUTE: { + unsigned long allowed_attributes = 0; + + r = sgx_set_attribute(&allowed_attributes, cap->args[0]); + if (r) + break; + + /* KVM only supports the PROVISIONKEY privileged attribute. */ + if ((allowed_attributes & SGX_ATTR_PROVISIONKEY) && + !(allowed_attributes & ~SGX_ATTR_PROVISIONKEY)) + kvm->arch.sgx_provisioning_allowed = true; + else + r = -EINVAL; + break; + } +#endif default: r = -EINVAL; break; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 727010788eff..162b7e044c8b 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1079,6 +1079,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_X86_BUS_LOCK_EXIT 193 #define KVM_CAP_PPC_DAWR1 194 #define KVM_CAP_SET_GUEST_DEBUG2 195 +#define KVM_CAP_SGX_ATTRIBUTE 196 #ifdef KVM_CAP_IRQ_ROUTING -- cgit v1.2.3 From 182a71a3653c4324672fd87e4384fae2fbd63269 Mon Sep 17 00:00:00 2001 From: Zenghui Yu Date: Sat, 17 Apr 2021 19:38:04 +0800 Subject: KVM: arm64: Fix Function ID typo for PTP_KVM service Per include/linux/arm-smccc.h, the Function ID of PTP_KVM service is defined as ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID. Fix the typo in documentation to keep the git grep consistent. Signed-off-by: Zenghui Yu Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20210417113804.1992-1-yuzenghui@huawei.com --- Documentation/virt/kvm/arm/ptp_kvm.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/arm/ptp_kvm.rst b/Documentation/virt/kvm/arm/ptp_kvm.rst index 5d2bc4558b9d..aecdc80ddcd8 100644 --- a/Documentation/virt/kvm/arm/ptp_kvm.rst +++ b/Documentation/virt/kvm/arm/ptp_kvm.rst @@ -7,11 +7,11 @@ PTP_KVM is used for high precision time sync between host and guests. It relies on transferring the wall clock and counter value from the host to the guest using a KVM-specific hypercall. -* ARM_SMCCC_HYP_KVM_PTP_FUNC_ID: 0x86000001 +* ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID: 0x86000001 This hypercall uses the SMC32/HVC32 calling convention: -ARM_SMCCC_HYP_KVM_PTP_FUNC_ID +ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID ============== ======== ===================================== Function ID: (uint32) 0x86000001 Arguments: (uint32) KVM_PTP_VIRT_COUNTER(0) -- cgit v1.2.3 From 54526d1fd59338fd6a381dbd806b7ccbae3aa4aa Mon Sep 17 00:00:00 2001 From: Nathan Tempelman Date: Thu, 8 Apr 2021 22:32:14 +0000 Subject: KVM: x86: Support KVM VMs sharing SEV context Add a capability for userspace to mirror SEV encryption context from one vm to another. 
On our side, this is intended to support a Migration Helper vCPU, but it can also be used generically to support other in-guest workloads scheduled by the host. The intention is for the primary guest and the mirror to have nearly identical memslots. The primary benefits of this are that: 1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they can't accidentally clobber each other. 2) The VMs can have different memory-views, which is necessary for post-copy migration (the migration vCPUs on the target need to read and write to pages, when the primary guest would VMEXIT). This does not change the threat model for AMD SEV. Any memory involved is still owned by the primary guest and its initial state is still attested to through the normal SEV_LAUNCH_* flows. If userspace wanted to circumvent SEV, they could achieve the same effect by simply attaching a vCPU to the primary VM. This patch deliberately leaves userspace in charge of the memslots for the mirror, as it already has the power to mess with them in the primary guest. This patch does not support SEV-ES (much less SNP), as it does not handle handing off attested VMSAs to the mirror. For additional context, we need a Migration Helper because SEV PSP migration is far too slow for our live migration on its own. Using an in-guest migrator lets us speed this up significantly. Signed-off-by: Nathan Tempelman Message-Id: <20210408223214.2582277-1-natet@google.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 18 ++++++++- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/svm/sev.c | 90 +++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/svm/svm.c | 2 + arch/x86/kvm/svm/svm.h | 2 + arch/x86/kvm/x86.c | 7 +++- include/linux/kvm_host.h | 1 + include/uapi/linux/kvm.h | 1 + virt/kvm/kvm_main.c | 6 +++ 9 files changed, 126 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 414b7fe1cf7b..fd4a84911355 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6207,6 +6207,22 @@ KVM_RUN_BUS_LOCK flag is used to distinguish between them. This capability can be used to check / enable 2nd DAWR feature provided by POWER10 processor. +7.24 KVM_CAP_VM_COPY_ENC_CONTEXT_FROM +------------------------------------- + +Architectures: x86 SEV enabled +Type: vm +Parameters: args[0] is the fd of the source vm +Returns: 0 on success; ENOTTY on error + +This capability enables userspace to copy encryption context from the vm +indicated by the fd to the vm this is called on. + +This is intended to support in-guest workloads scheduled by the host. This +allows the in-guest workload to maintain its own NPTs and keeps the two vms +from accidentally clobbering each other with interrupts and the like (separate +APIC/MSRs/etc). + 7.25 KVM_CAP_SGX_ATTRIBUTE ---------------------- @@ -6749,4 +6765,4 @@ in the kernel based fast path. If they can not be handled by the kernel, they will get passed on to user space. So user space still has to have an implementation for these despite the in kernel acceleration. -This capability is always enabled. \ No newline at end of file +This capability is always enabled. 
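As a rough illustration of the new capability, a VMM would mirror the encryption context with a plain KVM_ENABLE_CAP call. This sketch assumes source_vm_fd and mirror_vm_fd are file descriptors returned by KVM_CREATE_VM and that the source VM has already been SEV-initialized. ::

    #include <linux/kvm.h>
    #include <string.h>
    #include <sys/ioctl.h>

    /* Give mirror_vm_fd the same SEV encryption context (ASID) as
     * source_vm_fd.  The mirror keeps its own memslots, APIC and MSR
     * state; only the encryption context is shared. */
    int mirror_sev_context(int mirror_vm_fd, int source_vm_fd)
    {
            struct kvm_enable_cap cap;

            memset(&cap, 0, sizeof(cap));
            cap.cap = KVM_CAP_VM_COPY_ENC_CONTEXT_FROM;
            cap.args[0] = source_vm_fd;     /* fd of the context owner */

            return ioctl(mirror_vm_fd, KVM_ENABLE_CAP, &cap);
    }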
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c5c96386cc76..6e195f7df4f0 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1349,6 +1349,7 @@ struct kvm_x86_ops { int (*mem_enc_op)(struct kvm *kvm, void __user *argp); int (*mem_enc_reg_region)(struct kvm *kvm, struct kvm_enc_region *argp); int (*mem_enc_unreg_region)(struct kvm *kvm, struct kvm_enc_region *argp); + int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd); int (*get_msr_feature)(struct kvm_msr_entry *entry); diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index bb4bf5ffb104..4bf79dbc3eeb 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -66,6 +66,11 @@ static int sev_flush_asids(void) return ret; } +static inline bool is_mirroring_enc_context(struct kvm *kvm) +{ + return !!to_kvm_svm(kvm)->sev_info.enc_context_owner; +} + /* Must be called with the sev_bitmap_lock held */ static bool __sev_recycle_asids(int min_asid, int max_asid) { @@ -1122,6 +1127,12 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp) mutex_lock(&kvm->lock); + /* enc_context_owner handles all memory enc operations */ + if (is_mirroring_enc_context(kvm)) { + r = -EINVAL; + goto out; + } + switch (sev_cmd.id) { case KVM_SEV_ES_INIT: if (!sev_es) { @@ -1185,6 +1196,10 @@ int svm_register_enc_region(struct kvm *kvm, if (!sev_guest(kvm)) return -ENOTTY; + /* If kvm is mirroring encryption context it isn't responsible for it */ + if (is_mirroring_enc_context(kvm)) + return -EINVAL; + if (range->addr > ULONG_MAX || range->size > ULONG_MAX) return -EINVAL; @@ -1251,6 +1266,10 @@ int svm_unregister_enc_region(struct kvm *kvm, struct enc_region *region; int ret; + /* If kvm is mirroring encryption context it isn't responsible for it */ + if (is_mirroring_enc_context(kvm)) + return -EINVAL; + mutex_lock(&kvm->lock); if (!sev_guest(kvm)) { @@ -1281,6 +1300,71 @@ failed: return ret; } +int svm_vm_copy_asid_from(struct kvm *kvm, unsigned int source_fd) +{ + struct file *source_kvm_file; + struct kvm *source_kvm; + struct kvm_sev_info *mirror_sev; + unsigned int asid; + int ret; + + source_kvm_file = fget(source_fd); + if (!file_is_kvm(source_kvm_file)) { + ret = -EBADF; + goto e_source_put; + } + + source_kvm = source_kvm_file->private_data; + mutex_lock(&source_kvm->lock); + + if (!sev_guest(source_kvm)) { + ret = -EINVAL; + goto e_source_unlock; + } + + /* Mirrors of mirrors should work, but let's not get silly */ + if (is_mirroring_enc_context(source_kvm) || source_kvm == kvm) { + ret = -EINVAL; + goto e_source_unlock; + } + + asid = to_kvm_svm(source_kvm)->sev_info.asid; + + /* + * The mirror kvm holds an enc_context_owner ref so its asid can't + * disappear until we're done with it + */ + kvm_get_kvm(source_kvm); + + fput(source_kvm_file); + mutex_unlock(&source_kvm->lock); + mutex_lock(&kvm->lock); + + if (sev_guest(kvm)) { + ret = -EINVAL; + goto e_mirror_unlock; + } + + /* Set enc_context_owner and copy its encryption context over */ + mirror_sev = &to_kvm_svm(kvm)->sev_info; + mirror_sev->enc_context_owner = source_kvm; + mirror_sev->asid = asid; + mirror_sev->active = true; + + mutex_unlock(&kvm->lock); + return 0; + +e_mirror_unlock: + mutex_unlock(&kvm->lock); + kvm_put_kvm(source_kvm); + return ret; +e_source_unlock: + mutex_unlock(&source_kvm->lock); +e_source_put: + fput(source_kvm_file); + return ret; +} + void sev_vm_destroy(struct kvm *kvm) { struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; @@ -1290,6 +1374,12 @@ void 
sev_vm_destroy(struct kvm *kvm) if (!sev_guest(kvm)) return; + /* If this is a mirror_kvm release the enc_context_owner and skip sev cleanup */ + if (is_mirroring_enc_context(kvm)) { + kvm_put_kvm(sev->enc_context_owner); + return; + } + mutex_lock(&kvm->lock); /* diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 86e42d42637e..cd8c333ed2dc 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4589,6 +4589,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = { .mem_enc_reg_region = svm_register_enc_region, .mem_enc_unreg_region = svm_unregister_enc_region, + .vm_copy_enc_context_from = svm_vm_copy_asid_from, + .can_emulate_instruction = svm_can_emulate_instruction, .apic_init_signal_blocked = svm_apic_init_signal_blocked, diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 7deb7a057004..454da1c1d9b7 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -68,6 +68,7 @@ struct kvm_sev_info { unsigned long pages_locked; /* Number of pages locked */ struct list_head regions_list; /* List of registered regions */ u64 ap_jump_table; /* SEV-ES AP Jump Table address */ + struct kvm *enc_context_owner; /* Owner of copied encryption context */ }; struct kvm_svm { @@ -580,6 +581,7 @@ int svm_register_enc_region(struct kvm *kvm, struct kvm_enc_region *range); int svm_unregister_enc_region(struct kvm *kvm, struct kvm_enc_region *range); +int svm_vm_copy_asid_from(struct kvm *kvm, unsigned int source_fd); void pre_sev_run(struct vcpu_svm *svm, int cpu); void __init sev_hardware_setup(void); void sev_hardware_teardown(void); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d4cdab15238b..a5eeca55810f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3808,6 +3808,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) #ifdef CONFIG_X86_SGX_KVM case KVM_CAP_SGX_ATTRIBUTE: #endif + case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM: r = 1; break; case KVM_CAP_SET_GUEST_DEBUG2: @@ -4714,7 +4715,6 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu, kvm_update_pv_runtime(vcpu); return 0; - default: return -EINVAL; } @@ -5413,6 +5413,11 @@ split_irqchip_unlock: break; } #endif + case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM: + r = -EINVAL; + if (kvm_x86_ops.vm_copy_enc_context_from) + r = kvm_x86_ops.vm_copy_enc_context_from(kvm, cap->args[0]); + return r; default: r = -EINVAL; break; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 82b066db37cb..c306b4c3674e 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -654,6 +654,7 @@ void kvm_exit(void); void kvm_get_kvm(struct kvm *kvm); void kvm_put_kvm(struct kvm *kvm); +bool file_is_kvm(struct file *file); void kvm_put_kvm_no_destroy(struct kvm *kvm); static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm, int as_id) diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 162b7e044c8b..37f0a329da6a 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1080,6 +1080,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_PPC_DAWR1 194 #define KVM_CAP_SET_GUEST_DEBUG2 195 #define KVM_CAP_SGX_ATTRIBUTE 196 +#define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 545a28387c33..48c1e4842908 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -4197,6 +4197,12 @@ static struct file_operations kvm_vm_fops = { KVM_COMPAT(kvm_vm_compat_ioctl), }; +bool file_is_kvm(struct file *file) +{ + return file && file->f_op == &kvm_vm_fops; 
+} +EXPORT_SYMBOL_GPL(file_is_kvm); + static int kvm_dev_ioctl_create_vm(unsigned long type) { int r; -- cgit v1.2.3 From c265878fcb2c96befe7424e984011ed0ce6d095d Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Tue, 20 Apr 2021 04:57:06 -0400 Subject: KVM: x86: document behavior of measurement ioctls with len==0 Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/amd-memory-encryption.rst | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst index 469a6308765b..34ce2d1fcb89 100644 --- a/Documentation/virt/kvm/amd-memory-encryption.rst +++ b/Documentation/virt/kvm/amd-memory-encryption.rst @@ -148,6 +148,9 @@ measurement. Since the guest owner knows the initial contents of the guest at boot, the measurement can be verified by comparing it to what the guest owner expects. +If len is zero on entry, the measurement blob length is written to len and +uaddr is unused. + Parameters (in): struct kvm_sev_launch_measure Returns: 0 on success, -negative on error @@ -271,6 +274,9 @@ report containing the SHA-256 digest of the guest memory and VMSA passed through commands and signed with the PEK. The digest returned by the command should match the digest used by the guest owner with the KVM_SEV_LAUNCH_MEASURE. +If len is zero on entry, the measurement blob length is written to len and +uaddr is unused. + Parameters (in): struct kvm_sev_attestation Returns: 0 on success, -negative on error -- cgit v1.2.3 From 4cfdd47d6d95aca4fb8d6cfbe73392472d353f82 Mon Sep 17 00:00:00 2001 From: Brijesh Singh Date: Thu, 15 Apr 2021 15:53:14 +0000 Subject: KVM: SVM: Add KVM_SEV SEND_START command The command is used to create an outgoing SEV guest encryption context. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Paolo Bonzini Cc: Joerg Roedel Cc: Borislav Petkov Cc: Tom Lendacky Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford Reviewed-by: Venu Busireddy Signed-off-by: Brijesh Singh Signed-off-by: Ashish Kalra Message-Id: <2f1686d0164e0f1b3d6a41d620408393e0a48376.1618498113.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/amd-memory-encryption.rst | 30 ++++++ arch/x86/kvm/svm/sev.c | 128 +++++++++++++++++++++++ include/linux/psp-sev.h | 8 +- include/uapi/linux/kvm.h | 12 +++ 4 files changed, 174 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst index 34ce2d1fcb89..db5c3fb2bab5 100644 --- a/Documentation/virt/kvm/amd-memory-encryption.rst +++ b/Documentation/virt/kvm/amd-memory-encryption.rst @@ -290,6 +290,36 @@ Returns: 0 on success, -negative on error __u32 len; }; +11. KVM_SEV_SEND_START +---------------------- + +The KVM_SEV_SEND_START command can be used by the hypervisor to create an +outgoing guest encryption context. + +If session_len is zero on entry, the length of the guest session information is +written to session_len and all other fields are not used. 
+ +Parameters (in): struct kvm_sev_send_start + +Returns: 0 on success, -negative on error + +:: + struct kvm_sev_send_start { + __u32 policy; /* guest policy */ + + __u64 pdh_cert_uaddr; /* platform Diffie-Hellman certificate */ + __u32 pdh_cert_len; + + __u64 plat_certs_uaddr; /* platform certificate chain */ + __u32 plat_certs_len; + + __u64 amd_certs_uaddr; /* AMD certificate */ + __u32 amd_certs_len; + + __u64 session_uaddr; /* Guest session information */ + __u32 session_len; + }; + References ========== diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 4bf79dbc3eeb..9f5312956d5e 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -1111,6 +1111,131 @@ e_free: return ret; } +/* Userspace wants to query session length. */ +static int +__sev_send_start_query_session_length(struct kvm *kvm, struct kvm_sev_cmd *argp, + struct kvm_sev_send_start *params) +{ + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; + struct sev_data_send_start *data; + int ret; + + data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT); + if (data == NULL) + return -ENOMEM; + + data->handle = sev->handle; + ret = sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error); + if (ret < 0) + goto out; + + params->session_len = data->session_len; + if (copy_to_user((void __user *)(uintptr_t)argp->data, params, + sizeof(struct kvm_sev_send_start))) + ret = -EFAULT; + +out: + kfree(data); + return ret; +} + +static int sev_send_start(struct kvm *kvm, struct kvm_sev_cmd *argp) +{ + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; + struct sev_data_send_start *data; + struct kvm_sev_send_start params; + void *amd_certs, *session_data; + void *pdh_cert, *plat_certs; + int ret; + + if (!sev_guest(kvm)) + return -ENOTTY; + + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, + sizeof(struct kvm_sev_send_start))) + return -EFAULT; + + /* if session_len is zero, userspace wants to query the session length */ + if (!params.session_len) + return __sev_send_start_query_session_length(kvm, argp, + &params); + + /* some sanity checks */ + if (!params.pdh_cert_uaddr || !params.pdh_cert_len || + !params.session_uaddr || params.session_len > SEV_FW_BLOB_MAX_SIZE) + return -EINVAL; + + /* allocate the memory to hold the session data blob */ + session_data = kmalloc(params.session_len, GFP_KERNEL_ACCOUNT); + if (!session_data) + return -ENOMEM; + + /* copy the certificate blobs from userspace */ + pdh_cert = psp_copy_user_blob(params.pdh_cert_uaddr, + params.pdh_cert_len); + if (IS_ERR(pdh_cert)) { + ret = PTR_ERR(pdh_cert); + goto e_free_session; + } + + plat_certs = psp_copy_user_blob(params.plat_certs_uaddr, + params.plat_certs_len); + if (IS_ERR(plat_certs)) { + ret = PTR_ERR(plat_certs); + goto e_free_pdh; + } + + amd_certs = psp_copy_user_blob(params.amd_certs_uaddr, + params.amd_certs_len); + if (IS_ERR(amd_certs)) { + ret = PTR_ERR(amd_certs); + goto e_free_plat_cert; + } + + data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT); + if (data == NULL) { + ret = -ENOMEM; + goto e_free_amd_cert; + } + + /* populate the FW SEND_START field with system physical address */ + data->pdh_cert_address = __psp_pa(pdh_cert); + data->pdh_cert_len = params.pdh_cert_len; + data->plat_certs_address = __psp_pa(plat_certs); + data->plat_certs_len = params.plat_certs_len; + data->amd_certs_address = __psp_pa(amd_certs); + data->amd_certs_len = params.amd_certs_len; + data->session_address = __psp_pa(session_data); + data->session_len = params.session_len; + data->handle = sev->handle; + + ret =
sev_issue_cmd(kvm, SEV_CMD_SEND_START, data, &argp->error); + + if (!ret && copy_to_user((void __user *)(uintptr_t)params.session_uaddr, + session_data, params.session_len)) { + ret = -EFAULT; + goto e_free; + } + + params.policy = data->policy; + params.session_len = data->session_len; + if (copy_to_user((void __user *)(uintptr_t)argp->data, &params, + sizeof(struct kvm_sev_send_start))) + ret = -EFAULT; + +e_free: + kfree(data); +e_free_amd_cert: + kfree(amd_certs); +e_free_plat_cert: + kfree(plat_certs); +e_free_pdh: + kfree(pdh_cert); +e_free_session: + kfree(session_data); + return ret; +} + int svm_mem_enc_op(struct kvm *kvm, void __user *argp) { struct kvm_sev_cmd sev_cmd; @@ -1173,6 +1298,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp) case KVM_SEV_GET_ATTESTATION_REPORT: r = sev_get_attestation_report(kvm, &sev_cmd); break; + case KVM_SEV_SEND_START: + r = sev_send_start(kvm, &sev_cmd); + break; default: r = -EINVAL; goto out; diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h index b801ead1e2bb..73da511b9423 100644 --- a/include/linux/psp-sev.h +++ b/include/linux/psp-sev.h @@ -326,11 +326,11 @@ struct sev_data_send_start { u64 pdh_cert_address; /* In */ u32 pdh_cert_len; /* In */ u32 reserved1; - u64 plat_cert_address; /* In */ - u32 plat_cert_len; /* In */ + u64 plat_certs_address; /* In */ + u32 plat_certs_len; /* In */ u32 reserved2; - u64 amd_cert_address; /* In */ - u32 amd_cert_len; /* In */ + u64 amd_certs_address; /* In */ + u32 amd_certs_len; /* In */ u32 reserved3; u64 session_address; /* In */ u32 session_len; /* In/Out */ diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 37f0a329da6a..11b3e528d35d 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1732,6 +1732,18 @@ struct kvm_sev_attestation_report { __u32 len; }; +struct kvm_sev_send_start { + __u32 policy; + __u64 pdh_cert_uaddr; + __u32 pdh_cert_len; + __u64 plat_certs_uaddr; + __u32 plat_certs_len; + __u64 amd_certs_uaddr; + __u32 amd_certs_len; + __u64 session_uaddr; + __u32 session_len; +}; + #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2) -- cgit v1.2.3 From d3d1af85e2c75bb57da51535a6e182c7c45eceb0 Mon Sep 17 00:00:00 2001 From: Brijesh Singh Date: Thu, 15 Apr 2021 15:53:55 +0000 Subject: KVM: SVM: Add KVM_SEV_SEND_UPDATE_DATA command The command is used for encrypting the guest memory region using the encryption context created with KVM_SEV_SEND_START. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Paolo Bonzini Cc: Joerg Roedel Cc: Borislav Petkov Cc: Tom Lendacky Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford Signed-off-by: Brijesh Singh Signed-off-by: Ashish Kalra Message-Id: Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/amd-memory-encryption.rst | 28 +++++ arch/x86/kvm/svm/sev.c | 125 +++++++++++++++++++++++ include/uapi/linux/kvm.h | 9 ++ 3 files changed, 162 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst index db5c3fb2bab5..79da88a14135 100644 --- a/Documentation/virt/kvm/amd-memory-encryption.rst +++ b/Documentation/virt/kvm/amd-memory-encryption.rst @@ -320,6 +320,34 @@ Returns: 0 on success, -negative on error __u32 session_len; }; +12.
KVM_SEV_SEND_UPDATE_DATA +---------------------------- + +The KVM_SEV_SEND_UPDATE_DATA command can be used by the hypervisor to encrypt the +outgoing guest memory region with the encryption context created using +KVM_SEV_SEND_START. + +If hdr_len or trans_len are zero on entry, the length of the packet header and +transport region are written to hdr_len and trans_len respectively, and all +other fields are not used. + +Parameters (in): struct kvm_sev_send_update_data + +Returns: 0 on success, -negative on error + +:: + + struct kvm_sev_send_update_data { + __u64 hdr_uaddr; /* userspace address containing the packet header */ + __u32 hdr_len; + + __u64 guest_uaddr; /* the source memory region to be encrypted */ + __u32 guest_len; + + __u64 trans_uaddr; /* the destination memory region */ + __u32 trans_len; + }; + References ========== diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 9f5312956d5e..a534ec9ae172 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -34,6 +34,7 @@ static DECLARE_RWSEM(sev_deactivate_lock); static DEFINE_MUTEX(sev_bitmap_lock); unsigned int max_sev_asid; static unsigned int min_sev_asid; +static unsigned long sev_me_mask; static unsigned long *sev_asid_bitmap; static unsigned long *sev_reclaim_asid_bitmap; @@ -1236,6 +1237,126 @@ e_free_session: return ret; } +/* Userspace wants to query either header or trans length. */ +static int +__sev_send_update_data_query_lengths(struct kvm *kvm, struct kvm_sev_cmd *argp, + struct kvm_sev_send_update_data *params) +{ + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; + struct sev_data_send_update_data *data; + int ret; + + data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT); + if (!data) + return -ENOMEM; + + data->handle = sev->handle; + ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, data, &argp->error); + if (ret < 0) + goto out; + + params->hdr_len = data->hdr_len; + params->trans_len = data->trans_len; + + if (copy_to_user((void __user *)(uintptr_t)argp->data, params, + sizeof(struct kvm_sev_send_update_data))) + ret = -EFAULT; + +out: + kfree(data); + return ret; +} + +static int sev_send_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp) +{ + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; + struct sev_data_send_update_data *data; + struct kvm_sev_send_update_data params; + void *hdr, *trans_data; + struct page **guest_page; + unsigned long n; + int ret, offset; + + if (!sev_guest(kvm)) + return -ENOTTY; + + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, + sizeof(struct kvm_sev_send_update_data))) + return -EFAULT; + + /* userspace wants to query either header or trans length */ + if (!params.trans_len || !params.hdr_len) + return __sev_send_update_data_query_lengths(kvm, argp, &params); + + if (!params.trans_uaddr || !params.guest_uaddr || + !params.guest_len || !params.hdr_uaddr) + return -EINVAL; + + /* Check if we are crossing the page boundary */ + offset = params.guest_uaddr & (PAGE_SIZE - 1); + if ((params.guest_len + offset > PAGE_SIZE)) + return -EINVAL; + + /* Pin guest memory */ + guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK, + PAGE_SIZE, &n, 0); + if (!guest_page) + return -EFAULT; + + /* allocate memory for header and transport buffer */ + ret = -ENOMEM; + hdr = kmalloc(params.hdr_len, GFP_KERNEL_ACCOUNT); + if (!hdr) + goto e_unpin; + + trans_data = kmalloc(params.trans_len, GFP_KERNEL_ACCOUNT); + if (!trans_data) + goto e_free_hdr; + + data = kzalloc(sizeof(*data), GFP_KERNEL); + if (!data) + goto
e_free_trans_data; + + data->hdr_address = __psp_pa(hdr); + data->hdr_len = params.hdr_len; + data->trans_address = __psp_pa(trans_data); + data->trans_len = params.trans_len; + + /* The SEND_UPDATE_DATA command requires C-bit to be always set. */ + data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) + + offset; + data->guest_address |= sev_me_mask; + data->guest_len = params.guest_len; + data->handle = sev->handle; + + ret = sev_issue_cmd(kvm, SEV_CMD_SEND_UPDATE_DATA, data, &argp->error); + + if (ret) + goto e_free; + + /* copy transport buffer to user space */ + if (copy_to_user((void __user *)(uintptr_t)params.trans_uaddr, + trans_data, params.trans_len)) { + ret = -EFAULT; + goto e_free; + } + + /* Copy packet header to userspace. */ + ret = copy_to_user((void __user *)(uintptr_t)params.hdr_uaddr, hdr, + params.hdr_len); + +e_free: + kfree(data); +e_free_trans_data: + kfree(trans_data); +e_free_hdr: + kfree(hdr); +e_unpin: + sev_unpin_memory(kvm, guest_page, n); + + return ret; +} + int svm_mem_enc_op(struct kvm *kvm, void __user *argp) { struct kvm_sev_cmd sev_cmd; @@ -1301,6 +1422,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp) case KVM_SEV_SEND_START: r = sev_send_start(kvm, &sev_cmd); break; + case KVM_SEV_SEND_UPDATE_DATA: + r = sev_send_update_data(kvm, &sev_cmd); + break; default: r = -EINVAL; goto out; @@ -1559,6 +1683,7 @@ void __init sev_hardware_setup(void) /* Minimum ASID value that should be used for SEV guest */ min_sev_asid = edx; + sev_me_mask = 1UL << (ebx & 0x3f); /* Initialize SEV ASID bitmaps */ sev_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL); diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 11b3e528d35d..d07be69f402f 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1744,6 +1744,15 @@ struct kvm_sev_send_start { __u32 session_len; }; +struct kvm_sev_send_update_data { + __u64 hdr_uaddr; + __u32 hdr_len; + __u64 guest_uaddr; + __u32 guest_len; + __u64 trans_uaddr; + __u32 trans_len; +}; + #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2) -- cgit v1.2.3 From fddecf6a237ee464db7a1771fad6507d8c180c03 Mon Sep 17 00:00:00 2001 From: Brijesh Singh Date: Thu, 15 Apr 2021 15:54:15 +0000 Subject: KVM: SVM: Add KVM_SEV_SEND_FINISH command The command is used to finalize the encryption context created with the KVM_SEV_SEND_START command. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Paolo Bonzini Cc: Joerg Roedel Cc: Borislav Petkov Cc: Tom Lendacky Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford Signed-off-by: Brijesh Singh Signed-off-by: Ashish Kalra Message-Id: <5082bd6a8539d24bc55a1dd63a1b341245bb168f.1618498113.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/amd-memory-encryption.rst | 8 ++++++++ arch/x86/kvm/svm/sev.c | 23 +++++++++++++++++++++ 2 files changed, 31 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst index 79da88a14135..03f2518cfbeb 100644 --- a/Documentation/virt/kvm/amd-memory-encryption.rst +++ b/Documentation/virt/kvm/amd-memory-encryption.rst @@ -348,6 +348,14 @@ Returns: 0 on success, -negative on error __u32 trans_len; }; +13.
KVM_SEV_SEND_FINISH +------------------------ + +After completion of the migration flow, the KVM_SEV_SEND_FINISH command can be +issued by the hypervisor to delete the encryption context. + +Returns: 0 on success, -negative on error + References ========== diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index a534ec9ae172..f25c52a2dc14 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -1357,6 +1357,26 @@ e_unpin: return ret; } +static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp) +{ + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; + struct sev_data_send_finish *data; + int ret; + + if (!sev_guest(kvm)) + return -ENOTTY; + + data = kzalloc(sizeof(*data), GFP_KERNEL); + if (!data) + return -ENOMEM; + + data->handle = sev->handle; + ret = sev_issue_cmd(kvm, SEV_CMD_SEND_FINISH, data, &argp->error); + + kfree(data); + return ret; +} + int svm_mem_enc_op(struct kvm *kvm, void __user *argp) { struct kvm_sev_cmd sev_cmd; @@ -1425,6 +1445,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp) case KVM_SEV_SEND_UPDATE_DATA: r = sev_send_update_data(kvm, &sev_cmd); break; + case KVM_SEV_SEND_FINISH: + r = sev_send_finish(kvm, &sev_cmd); + break; default: r = -EINVAL; goto out; -- cgit v1.2.3 From 5569e2e7a650dfffd4df7635662b2f92162d6501 Mon Sep 17 00:00:00 2001 From: Steve Rutherford Date: Tue, 20 Apr 2021 05:01:20 -0400 Subject: KVM: SVM: Add support for KVM_SEV_SEND_CANCEL command After completion of SEND_START, but before SEND_FINISH, the source VMM can issue the SEND_CANCEL command to stop a migration. This is necessary so that a cancelled migration can restart with a new target later. Reviewed-by: Nathan Tempelman Reviewed-by: Brijesh Singh Signed-off-by: Steve Rutherford Message-Id: <20210412194408.2458827-1-srutherford@google.com> Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/amd-memory-encryption.rst | 9 +++++++++ arch/x86/kvm/svm/sev.c | 23 +++++++++++++++++++++++ drivers/crypto/ccp/sev-dev.c | 1 + include/linux/psp-sev.h | 10 ++++++++++ include/uapi/linux/kvm.h | 2 ++ 5 files changed, 45 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst index 03f2518cfbeb..c36a12975763 100644 --- a/Documentation/virt/kvm/amd-memory-encryption.rst +++ b/Documentation/virt/kvm/amd-memory-encryption.rst @@ -356,6 +356,15 @@ issued by the hypervisor to delete the encryption context. Returns: 0 on success, -negative on error +14. KVM_SEV_SEND_CANCEL +------------------------ + +After completion of SEND_START, but before SEND_FINISH, the source VMM can issue the +SEND_CANCEL command to stop a migration. This is necessary so that a cancelled +migration can restart with a new target later. 
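Because SEND_FINISH and SEND_CANCEL carry no userspace parameters, issuing either of them reduces to a bare KVM_MEMORY_ENCRYPT_OP call, as in this hedged sketch. ::

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* SEND_FINISH and SEND_CANCEL only need the command id; KVM supplies
     * the guest handle internally. */
    int sev_simple_cmd(int vm_fd, int sev_fd, __u32 id)
    {
            struct kvm_sev_cmd cmd = { .id = id, .sev_fd = sev_fd };

            return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
    }

    /* e.g. sev_simple_cmd(vm_fd, sev_fd, KVM_SEV_SEND_CANCEL); after a
     * successful SEND_START, to abandon the outgoing migration. */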
+ +Returns: 0 on success, -negative on error + References ========== diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index f25c52a2dc14..552f47787143 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -1377,6 +1377,26 @@ static int sev_send_finish(struct kvm *kvm, struct kvm_sev_cmd *argp) return ret; } +static int sev_send_cancel(struct kvm *kvm, struct kvm_sev_cmd *argp) +{ + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; + struct sev_data_send_cancel *data; + int ret; + + if (!sev_guest(kvm)) + return -ENOTTY; + + data = kzalloc(sizeof(*data), GFP_KERNEL); + if (!data) + return -ENOMEM; + + data->handle = sev->handle; + ret = sev_issue_cmd(kvm, SEV_CMD_SEND_CANCEL, data, &argp->error); + + kfree(data); + return ret; +} + int svm_mem_enc_op(struct kvm *kvm, void __user *argp) { struct kvm_sev_cmd sev_cmd; @@ -1448,6 +1468,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp) case KVM_SEV_SEND_FINISH: r = sev_send_finish(kvm, &sev_cmd); break; + case KVM_SEV_SEND_CANCEL: + r = sev_send_cancel(kvm, &sev_cmd); + break; default: r = -EINVAL; goto out; diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c index cb9b4c4e371e..4172a1afa0db 100644 --- a/drivers/crypto/ccp/sev-dev.c +++ b/drivers/crypto/ccp/sev-dev.c @@ -129,6 +129,7 @@ static int sev_cmd_buffer_len(int cmd) case SEV_CMD_DOWNLOAD_FIRMWARE: return sizeof(struct sev_data_download_firmware); case SEV_CMD_GET_ID: return sizeof(struct sev_data_get_id); case SEV_CMD_ATTESTATION_REPORT: return sizeof(struct sev_data_attestation_report); + case SEV_CMD_SEND_CANCEL: return sizeof(struct sev_data_send_cancel); default: return 0; } diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h index 73da511b9423..d48a7192e881 100644 --- a/include/linux/psp-sev.h +++ b/include/linux/psp-sev.h @@ -73,6 +73,7 @@ enum sev_cmd { SEV_CMD_SEND_UPDATE_DATA = 0x041, SEV_CMD_SEND_UPDATE_VMSA = 0x042, SEV_CMD_SEND_FINISH = 0x043, + SEV_CMD_SEND_CANCEL = 0x044, /* Guest migration commands (incoming) */ SEV_CMD_RECEIVE_START = 0x050, @@ -392,6 +393,15 @@ struct sev_data_send_finish { u32 handle; /* In */ } __packed; +/** + * struct sev_data_send_cancel - SEND_CANCEL command parameters + * + * @handle: handle of the VM to process + */ +struct sev_data_send_cancel { + u32 handle; /* In */ +} __packed; + /** * struct sev_data_receive_start - RECEIVE_START command parameters * diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index d07be69f402f..5d1adeb13739 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1674,6 +1674,8 @@ enum sev_cmd_id { KVM_SEV_CERT_EXPORT, /* Attestation report */ KVM_SEV_GET_ATTESTATION_REPORT, + /* Guest Migration Extension */ + KVM_SEV_SEND_CANCEL, KVM_SEV_NR_MAX, }; -- cgit v1.2.3 From af43cbbf954b50ca97d5e7bb56c2edc6ffd209ef Mon Sep 17 00:00:00 2001 From: Brijesh Singh Date: Thu, 15 Apr 2021 15:54:50 +0000 Subject: KVM: SVM: Add support for KVM_SEV_RECEIVE_START command The command is used to create the encryption context for an incoming SEV guest. The encryption context can be later used by the hypervisor to import the incoming data into the SEV guest memory space. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. 
Peter Anvin" Cc: Paolo Bonzini Cc: Joerg Roedel Cc: Borislav Petkov Cc: Tom Lendacky Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford Signed-off-by: Brijesh Singh Signed-off-by: Ashish Kalra Message-Id: Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/amd-memory-encryption.rst | 29 +++++++++ arch/x86/kvm/svm/sev.c | 81 ++++++++++++++++++++++++ include/uapi/linux/kvm.h | 9 +++ 3 files changed, 119 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst index c36a12975763..86c9b36f4a57 100644 --- a/Documentation/virt/kvm/amd-memory-encryption.rst +++ b/Documentation/virt/kvm/amd-memory-encryption.rst @@ -365,6 +365,35 @@ migration can restart with a new target later. Returns: 0 on success, -negative on error +15. KVM_SEV_RECEIVE_START +------------------------ + +The KVM_SEV_RECEIVE_START command is used for creating the memory encryption +context for an incoming SEV guest. To create the encryption context, the user must +provide a guest policy, the platform public Diffie-Hellman (PDH) key and session +information. + +Parameters: struct kvm_sev_receive_start (in/out) + +Returns: 0 on success, -negative on error + +:: + + struct kvm_sev_receive_start { + __u32 handle; /* if zero then firmware creates a new handle */ + __u32 policy; /* guest's policy */ + + __u64 pdh_uaddr; /* userspace address pointing to the PDH key */ + __u32 pdh_len; + + __u64 session_uaddr; /* userspace address which points to the guest session information */ + __u32 session_len; + }; + +On success, the 'handle' field contains a new handle and on error, a negative value. + +For more details, see SEV spec Section 6.12. + References ========== diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 552f47787143..3e5064c69124 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -1397,6 +1397,84 @@ static int sev_send_cancel(struct kvm *kvm, struct kvm_sev_cmd *argp) return ret; } +static int sev_receive_start(struct kvm *kvm, struct kvm_sev_cmd *argp) +{ + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; + struct sev_data_receive_start *start; + struct kvm_sev_receive_start params; + int *error = &argp->error; + void *session_data; + void *pdh_data; + int ret; + + if (!sev_guest(kvm)) + return -ENOTTY; + + /* Get parameter from the userspace */ + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, + sizeof(struct kvm_sev_receive_start))) + return -EFAULT; + + /* some sanity checks */ + if (!params.pdh_uaddr || !params.pdh_len || + !params.session_uaddr || !params.session_len) + return -EINVAL; + + pdh_data = psp_copy_user_blob(params.pdh_uaddr, params.pdh_len); + if (IS_ERR(pdh_data)) + return PTR_ERR(pdh_data); + + session_data = psp_copy_user_blob(params.session_uaddr, + params.session_len); + if (IS_ERR(session_data)) { + ret = PTR_ERR(session_data); + goto e_free_pdh; + } + + ret = -ENOMEM; + start = kzalloc(sizeof(*start), GFP_KERNEL); + if (!start) + goto e_free_session; + + start->handle = params.handle; + start->policy = params.policy; + start->pdh_cert_address = __psp_pa(pdh_data); + start->pdh_cert_len = params.pdh_len; + start->session_address = __psp_pa(session_data); + start->session_len = params.session_len; + + /* create memory encryption context */ + ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_RECEIVE_START, start, + error); + if (ret) + goto e_free; + + /* Bind ASID to this guest */ + ret =
sev_bind_asid(kvm, start->handle, error); + if (ret) + goto e_free; + + params.handle = start->handle; + if (copy_to_user((void __user *)(uintptr_t)argp->data, + &params, sizeof(struct kvm_sev_receive_start))) { + ret = -EFAULT; + sev_unbind_asid(kvm, start->handle); + goto e_free; + } + + sev->handle = start->handle; + sev->fd = argp->sev_fd; + +e_free: + kfree(start); +e_free_session: + kfree(session_data); +e_free_pdh: + kfree(pdh_data); + + return ret; +} + int svm_mem_enc_op(struct kvm *kvm, void __user *argp) { struct kvm_sev_cmd sev_cmd; @@ -1471,6 +1549,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp) case KVM_SEV_SEND_CANCEL: r = sev_send_cancel(kvm, &sev_cmd); break; + case KVM_SEV_RECEIVE_START: + r = sev_receive_start(kvm, &sev_cmd); + break; default: r = -EINVAL; goto out; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 5d1adeb13739..248d1d266bc3 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1755,6 +1755,15 @@ struct kvm_sev_send_update_data { __u32 trans_len; }; +struct kvm_sev_receive_start { + __u32 handle; + __u32 policy; + __u64 pdh_uaddr; + __u32 pdh_len; + __u64 session_uaddr; + __u32 session_len; +}; + #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2) -- cgit v1.2.3 From 15fb7de1a7f5af0d5910ca4352b26f887543e26e Mon Sep 17 00:00:00 2001 From: Brijesh Singh Date: Thu, 15 Apr 2021 15:55:17 +0000 Subject: KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command The command is used for copying the incoming buffer into the SEV guest memory space. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Paolo Bonzini Cc: Joerg Roedel Cc: Borislav Petkov Cc: Tom Lendacky Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford Signed-off-by: Brijesh Singh Signed-off-by: Ashish Kalra Message-Id: Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/amd-memory-encryption.rst | 24 +++++++ arch/x86/kvm/svm/sev.c | 79 ++++++++++++++++++++++++ include/uapi/linux/kvm.h | 9 +++ 3 files changed, 112 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst index 86c9b36f4a57..bb94c6e3615f 100644 --- a/Documentation/virt/kvm/amd-memory-encryption.rst +++ b/Documentation/virt/kvm/amd-memory-encryption.rst @@ -394,6 +394,30 @@ On success, the 'handle' field contains a new handle and on error, a negative va For more details, see SEV spec Section 6.12. +16. KVM_SEV_RECEIVE_UPDATE_DATA +---------------------------- + +The KVM_SEV_RECEIVE_UPDATE_DATA command can be used by the hypervisor to copy +the incoming buffers into the guest memory region with the encryption context +created during KVM_SEV_RECEIVE_START.
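A receive-side sketch of this per-packet copy, mirroring the page-boundary rule enforced by the kernel code below, might look as follows; the helper is hypothetical and assumes 4KiB guest pages. ::

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    #define SEV_PAGE_SIZE 4096UL    /* assumption: 4KiB guest pages */

    /* Replay one transport packet produced by KVM_SEV_SEND_UPDATE_DATA on
     * the source.  guest_uaddr/guest_len must not cross a page boundary,
     * matching the kernel-side check. */
    int receive_one_packet(int vm_fd, int sev_fd,
                           __u64 guest_uaddr, __u32 guest_len,
                           void *hdr, __u32 hdr_len,
                           void *trans, __u32 trans_len)
    {
            struct kvm_sev_receive_update_data d = {
                    .hdr_uaddr   = (__u64)(unsigned long)hdr,
                    .hdr_len     = hdr_len,
                    .guest_uaddr = guest_uaddr,
                    .guest_len   = guest_len,
                    .trans_uaddr = (__u64)(unsigned long)trans,
                    .trans_len   = trans_len,
            };
            struct kvm_sev_cmd cmd = {
                    .id     = KVM_SEV_RECEIVE_UPDATE_DATA,
                    .data   = (__u64)(unsigned long)&d,
                    .sev_fd = sev_fd,
            };

            if ((guest_uaddr & (SEV_PAGE_SIZE - 1)) + guest_len > SEV_PAGE_SIZE)
                    return -1;      /* chunk would cross a page boundary */

            return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
    }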
+ +Parameters (in): struct kvm_sev_receive_update_data + +Returns: 0 on success, -negative on error + +:: + + struct kvm_sev_receive_update_data { + __u64 hdr_uaddr; /* userspace address containing the packet header */ + __u32 hdr_len; + + __u64 guest_uaddr; /* the destination guest memory region */ + __u32 guest_len; + + __u64 trans_uaddr; /* the incoming buffer memory region */ + __u32 trans_len; + }; + References ========== diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 3e5064c69124..ff6435527a67 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -1475,6 +1475,82 @@ e_free_pdh: return ret; } +static int sev_receive_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp) +{ + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; + struct kvm_sev_receive_update_data params; + struct sev_data_receive_update_data *data; + void *hdr = NULL, *trans = NULL; + struct page **guest_page; + unsigned long n; + int ret, offset; + + if (!sev_guest(kvm)) + return -EINVAL; + + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, + sizeof(struct kvm_sev_receive_update_data))) + return -EFAULT; + + if (!params.hdr_uaddr || !params.hdr_len || + !params.guest_uaddr || !params.guest_len || + !params.trans_uaddr || !params.trans_len) + return -EINVAL; + + /* Check if we are crossing the page boundary */ + offset = params.guest_uaddr & (PAGE_SIZE - 1); + if ((params.guest_len + offset > PAGE_SIZE)) + return -EINVAL; + + hdr = psp_copy_user_blob(params.hdr_uaddr, params.hdr_len); + if (IS_ERR(hdr)) + return PTR_ERR(hdr); + + trans = psp_copy_user_blob(params.trans_uaddr, params.trans_len); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + goto e_free_hdr; + } + + ret = -ENOMEM; + data = kzalloc(sizeof(*data), GFP_KERNEL); + if (!data) + goto e_free_trans; + + data->hdr_address = __psp_pa(hdr); + data->hdr_len = params.hdr_len; + data->trans_address = __psp_pa(trans); + data->trans_len = params.trans_len; + + /* Pin guest memory */ + ret = -EFAULT; + guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK, + PAGE_SIZE, &n, 0); + if (!guest_page) + goto e_free; + + /* The RECEIVE_UPDATE_DATA command requires C-bit to be always set.
*/ + data->guest_address = (page_to_pfn(guest_page[0]) << PAGE_SHIFT) + + offset; + data->guest_address |= sev_me_mask; + data->guest_len = params.guest_len; + data->handle = sev->handle; + + ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_UPDATE_DATA, data, + &argp->error); + + sev_unpin_memory(kvm, guest_page, n); + +e_free: + kfree(data); +e_free_trans: + kfree(trans); +e_free_hdr: + kfree(hdr); + + return ret; +} + int svm_mem_enc_op(struct kvm *kvm, void __user *argp) { struct kvm_sev_cmd sev_cmd; @@ -1552,6 +1628,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp) case KVM_SEV_RECEIVE_START: r = sev_receive_start(kvm, &sev_cmd); break; + case KVM_SEV_RECEIVE_UPDATE_DATA: + r = sev_receive_update_data(kvm, &sev_cmd); + break; default: r = -EINVAL; goto out; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 248d1d266bc3..d76533498543 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1764,6 +1764,15 @@ struct kvm_sev_receive_start { __u32 session_len; }; +struct kvm_sev_receive_update_data { + __u64 hdr_uaddr; + __u32 hdr_len; + __u64 guest_uaddr; + __u32 guest_len; + __u64 trans_uaddr; + __u32 trans_len; +}; + #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) #define KVM_DEV_ASSIGN_MASK_INTX (1 << 2) -- cgit v1.2.3 From 6a443def87d2698f4fa2d7b57e7f4e5f0f61671a Mon Sep 17 00:00:00 2001 From: Brijesh Singh Date: Thu, 15 Apr 2021 15:55:40 +0000 Subject: KVM: SVM: Add KVM_SEV_RECEIVE_FINISH command The command finalizes the guest receiving process and makes the SEV guest ready for execution. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Paolo Bonzini Cc: Joerg Roedel Cc: Borislav Petkov Cc: Tom Lendacky Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Steve Rutherford Signed-off-by: Brijesh Singh Signed-off-by: Ashish Kalra Message-Id: Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/amd-memory-encryption.rst | 8 ++++++++ arch/x86/kvm/svm/sev.c | 23 +++++++++++++++++++++ 2 files changed, 31 insertions(+) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst index bb94c6e3615f..907adfe60090 100644 --- a/Documentation/virt/kvm/amd-memory-encryption.rst +++ b/Documentation/virt/kvm/amd-memory-encryption.rst @@ -418,6 +418,14 @@ Returns: 0 on success, -negative on error __u32 trans_len; }; +17. KVM_SEV_RECEIVE_FINISH +------------------------ + +After completion of the migration flow, the KVM_SEV_RECEIVE_FINISH command can be +issued by the hypervisor to make the guest ready for execution.
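Stitching the three commands together, an incoming migration could be driven by something like the sketch below. The helper prototypes refer to the hypothetical wrappers from the earlier sketches, and the packet layout is an assumption of this illustration, not a kernel ABI. ::

    #include <linux/kvm.h>

    /* Hypothetical helpers defined in the earlier sketches. */
    int sev_cmd(int vm_fd, int sev_fd, __u32 id, void *data);
    int sev_simple_cmd(int vm_fd, int sev_fd, __u32 id);
    int receive_one_packet(int vm_fd, int sev_fd, __u64 guest_uaddr,
                           __u32 guest_len, void *hdr, __u32 hdr_len,
                           void *trans, __u32 trans_len);

    /* Assumed layout of one received migration packet; not a kernel ABI. */
    struct mig_packet {
            __u64 guest_uaddr;
            __u32 guest_len;
            void *hdr;
            __u32 hdr_len;
            void *trans;
            __u32 trans_len;
    };

    int receive_guest(int vm_fd, int sev_fd,
                      struct kvm_sev_receive_start *start,
                      struct mig_packet *pkts, int npkts)
    {
            int i;

            if (sev_cmd(vm_fd, sev_fd, KVM_SEV_RECEIVE_START, start))
                    return -1;

            for (i = 0; i < npkts; i++)
                    if (receive_one_packet(vm_fd, sev_fd,
                                           pkts[i].guest_uaddr,
                                           pkts[i].guest_len,
                                           pkts[i].hdr, pkts[i].hdr_len,
                                           pkts[i].trans, pkts[i].trans_len))
                            return -1;

            /* The guest only becomes runnable once RECEIVE_FINISH succeeds. */
            return sev_simple_cmd(vm_fd, sev_fd, KVM_SEV_RECEIVE_FINISH);
    }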
+ +Returns: 0 on success, -negative on error + References ========== diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index ff6435527a67..3f766da43708 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -1551,6 +1551,26 @@ e_free_hdr: return ret; } +static int sev_receive_finish(struct kvm *kvm, struct kvm_sev_cmd *argp) +{ + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; + struct sev_data_receive_finish *data; + int ret; + + if (!sev_guest(kvm)) + return -ENOTTY; + + data = kzalloc(sizeof(*data), GFP_KERNEL); + if (!data) + return -ENOMEM; + + data->handle = sev->handle; + ret = sev_issue_cmd(kvm, SEV_CMD_RECEIVE_FINISH, data, &argp->error); + + kfree(data); + return ret; +} + int svm_mem_enc_op(struct kvm *kvm, void __user *argp) { struct kvm_sev_cmd sev_cmd; @@ -1631,6 +1651,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp) case KVM_SEV_RECEIVE_UPDATE_DATA: r = sev_receive_update_data(kvm, &sev_cmd); break; + case KVM_SEV_RECEIVE_FINISH: + r = sev_receive_finish(kvm, &sev_cmd); + break; default: r = -EINVAL; goto out; -- cgit v1.2.3 From f82762fb6193513a852483cc6787ddc2d701d09c Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Thu, 22 Apr 2021 09:49:46 -0400 Subject: KVM: documentation: fix sphinx warnings Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/amd-memory-encryption.rst | 7 ++++--- Documentation/virt/kvm/api.rst | 2 +- 2 files changed, 5 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst b/Documentation/virt/kvm/amd-memory-encryption.rst index 907adfe60090..5ec8a1902e15 100644 --- a/Documentation/virt/kvm/amd-memory-encryption.rst +++ b/Documentation/virt/kvm/amd-memory-encryption.rst @@ -304,6 +304,7 @@ Parameters (in): struct kvm_sev_send_start Returns: 0 on success, -negative on error :: + struct kvm_sev_send_start { __u32 policy; /* guest policy */ @@ -366,7 +367,7 @@ migration can restart with a new target later. Returns: 0 on success, -negative on error 15. KVM_SEV_RECEIVE_START ------------------------- +------------------------- The KVM_SEV_RECEIVE_START command is used for creating the memory encryption context for an incoming SEV guest. To create the encryption context, the user must @@ -395,7 +396,7 @@ On success, the 'handle' field contains a new handle and on error, a negative va For more details, see SEV spec Section 6.12. 16. KVM_SEV_RECEIVE_UPDATE_DATA ----------------------------- +------------------------------- The KVM_SEV_RECEIVE_UPDATE_DATA command can be used by the hypervisor to copy the incoming buffers into the guest memory region with encryption context @@ -419,7 +420,7 @@ Returns: 0 on success, -negative on error }; 17. KVM_SEV_RECEIVE_FINISH ------------------------- +-------------------------- After completion of the migration flow, the KVM_SEV_RECEIVE_FINISH command can be issued by the hypervisor to make the guest ready for execution. diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 94804c2f45ac..37c520ddb7e8 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6340,7 +6340,7 @@ from accidentally clobbering each other with interrupts and the like (separate APIC/MSRs/etc). 7.25 KVM_CAP_SGX_ATTRIBUTE ----------------------- +-------------------------- :Architectures: x86 :Target: VM -- cgit v1.2.3