summaryrefslogtreecommitdiff
path: root/arch/x86/hyperv
AgeCommit message (Collapse)AuthorFilesLines
2023-09-04Merge tag 'hyperv-next-signed-20230902' of ↵Linus Torvalds3-22/+361
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Support for SEV-SNP guests on Hyper-V (Tianyu Lan) - Support for TDX guests on Hyper-V (Dexuan Cui) - Use SBRM API in Hyper-V balloon driver (Mitchell Levy) - Avoid dereferencing ACPI root object handle in VMBus driver (Maciej Szmigiero) - A few misecllaneous fixes (Jiapeng Chong, Nathan Chancellor, Saurabh Sengar) * tag 'hyperv-next-signed-20230902' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits) x86/hyperv: Remove duplicate include x86/hyperv: Move the code in ivm.c around to avoid unnecessary ifdef's x86/hyperv: Remove hv_isolation_type_en_snp x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the paravisor Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor x86/hyperv: Introduce a global variable hyperv_paravisor_present Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM x86/hyperv: Fix serial console interrupts for fully enlightened TDX guests Drivers: hv: vmbus: Support fully enlightened TDX guests x86/hyperv: Support hypercalls for fully enlightened TDX guests x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests x86/hyperv: Fix undefined reference to isolation_type_en_snp without CONFIG_HYPERV x86/hyperv: Add missing 'inline' to hv_snp_boot_ap() stub hv: hyperv.h: Replace one-element array with flexible-array member Drivers: hv: vmbus: Don't dereference ACPI root object handle x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES x86/hyperv: Add smp support for SEV-SNP guest clocksource: hyper-v: Mark hyperv tsc page unencrypted in sev-snp enlightened guest x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest drivers: hv: Mark percpu hvcall input arg page unencrypted in SEV-SNP enlightened guest ...
2023-08-30Merge tag 'x86_apic_for_6.6-rc1' of ↵Linus Torvalds4-16/+16
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Dave Hansen: "This includes a very thorough rework of the 'struct apic' handlers. Quite a variety of them popped up over the years, especially in the 32-bit days when odd apics were much more in vogue. The end result speaks for itself, which is a removal of a ton of code and static calls to replace indirect calls. If there's any breakage here, it's likely to be around the 32-bit museum pieces that get light to no testing these days. Summary: - Rework apic callbacks, getting rid of unnecessary ones and coalescing lots of silly duplicates. - Use static_calls() instead of indirect calls for apic->foo() - Tons of cleanups an crap removal along the way" * tag 'x86_apic_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) x86/apic: Turn on static calls x86/apic: Provide static call infrastructure for APIC callbacks x86/apic: Wrap IPI calls into helper functions x86/apic: Mark all hotpath APIC callback wrappers __always_inline x86/xen/apic: Mark apic __ro_after_init x86/apic: Convert other overrides to apic_update_callback() x86/apic: Replace acpi_wake_cpu_handler_update() and apic_set_eoi_cb() x86/apic: Provide apic_update_callback() x86/xen/apic: Use standard apic driver mechanism for Xen PV x86/apic: Provide common init infrastructure x86/apic: Wrap apic->native_eoi() into a helper x86/apic: Nuke ack_APIC_irq() x86/apic: Remove pointless arguments from [native_]eoi_write() x86/apic/noop: Tidy up the code x86/apic: Remove pointless NULL initializations x86/apic: Sanitize APIC ID range validation x86/apic: Prepare x2APIC for using apic::max_apic_id x86/apic: Simplify X2APIC ID validation x86/apic: Add max_apic_id member x86/apic: Wrap APIC ID validation into an inline ...
2023-08-25x86/hyperv: Remove duplicate includeJiapeng Chong1-2/+0
./arch/x86/hyperv/ivm.c: asm/sev.h is included more than once. ./arch/x86/hyperv/ivm.c: asm/coco.h is included more than once. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6212 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230824080352.98945-1-jiapeng.chong@linux.alibaba.com
2023-08-25x86/hyperv: Move the code in ivm.c around to avoid unnecessary ifdef'sDexuan Cui1-159/+150
Group the code this way so that we can avoid too many ifdef's: Data only used in an SNP VM with the paravisor; Functions only used in an SNP VM with the paravisor; Data only used in an SNP VM without the paravisor; Functions only used in an SNP VM without the paravisor; Functions only used in a TDX VM, with and without the paravisor; Functions used in an SNP or TDX VM, when the paravisor is present; Functions always used, even in a regular non-CoCo VM. No functional change. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230824080712.30327-11-decui@microsoft.com
2023-08-25x86/hyperv: Remove hv_isolation_type_en_snpDexuan Cui2-15/+5
In ms_hyperv_init_platform(), do not distinguish between a SNP VM with the paravisor and a SNP VM without the paravisor. Replace hv_isolation_type_en_snp() with !ms_hyperv.paravisor_present && hv_isolation_type_snp(). The hv_isolation_type_en_snp() in drivers/hv/hv.c and drivers/hv/hv_common.c can be changed to hv_isolation_type_snp() since we know !ms_hyperv.paravisor_present is true there. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230824080712.30327-10-decui@microsoft.com
2023-08-25x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the paravisorDexuan Cui2-8/+69
When the paravisor is present, a SNP VM must use GHCB to access some special MSRs, including HV_X64_MSR_GUEST_OS_ID and some SynIC MSRs. Similarly, when the paravisor is present, a TDX VM must use TDX GHCI to access the same MSRs. Implement hv_tdx_msr_write() and hv_tdx_msr_read(), and use the helper functions hv_ivm_msr_read() and hv_ivm_msr_write() to access the MSRs in a unified way for SNP/TDX VMs with the paravisor. Do not export hv_tdx_msr_write() and hv_tdx_msr_read(), because we never really used hv_ghcb_msr_write() and hv_ghcb_msr_read() in any module. Update arch/x86/include/asm/mshyperv.h so that the kernel can still build if CONFIG_AMD_MEM_ENCRYPT or CONFIG_INTEL_TDX_GUEST is not set, or neither is set. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230824080712.30327-9-decui@microsoft.com
2023-08-25Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisorDexuan Cui1-2/+18
The post_msg_page was removed in commit 9a6b1a170ca8 ("Drivers: hv: vmbus: Remove the per-CPU post_msg_page") However, it turns out that we need to bring it back, but only for a TDX VM with the paravisor: in such a VM, the hyperv_pcpu_input_arg is not decrypted, but the HVCALL_POST_MESSAGE in such a VM needs a decrypted page as the hypercall input page: see the comments in hyperv_init() for a detailed explanation. Except for HVCALL_POST_MESSAGE and HVCALL_SIGNAL_EVENT, the other hypercalls in a TDX VM with the paravisor still use hv_hypercall_pg and must use the hyperv_pcpu_input_arg (which is encrypted in such a VM), when a hypercall input page is used. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230824080712.30327-8-decui@microsoft.com
2023-08-25x86/hyperv: Introduce a global variable hyperv_paravisor_presentDexuan Cui2-5/+37
The new variable hyperv_paravisor_present is set only when the VM is a SNP/TDX VM with the paravisor running: see ms_hyperv_init_platform(). We introduce hyperv_paravisor_present because we can not use ms_hyperv.paravisor_present in arch/x86/include/asm/mshyperv.h: struct ms_hyperv_info is defined in include/asm-generic/mshyperv.h, which is included at the end of arch/x86/include/asm/mshyperv.h, but at the beginning of arch/x86/include/asm/mshyperv.h, we would already need to use struct ms_hyperv_info in hv_do_hypercall(). We use hyperv_paravisor_present only in include/asm-generic/mshyperv.h, and use ms_hyperv.paravisor_present elsewhere. In the future, we'll introduce a hypercall function structure for different VM types, and at boot time, the right function pointers would be written into the structure so that runtime testing of TDX vs. SNP vs. normal will be avoided and hyperv_paravisor_present will no longer be needed. Call hv_vtom_init() when it's a VBS VM or when ms_hyperv.paravisor_present is true, i.e. the VM is a SNP VM or TDX VM with the paravisor. Enhance hv_vtom_init() for a TDX VM with the paravisor. In hv_common_cpu_init(), don't decrypt the hyperv_pcpu_input_arg for a TDX VM with the paravisor, just like we don't decrypt the page for a SNP VM with the paravisor. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230824080712.30327-7-decui@microsoft.com
2023-08-25Drivers: hv: vmbus: Support fully enlightened TDX guestsDexuan Cui2-7/+27
Add Hyper-V specific code so that a fully enlightened TDX guest (i.e. without the paravisor) can run on Hyper-V: Don't use hv_vp_assist_page. Use GHCI instead. Don't try to use the unsupported HV_REGISTER_CRASH_CTL. Don't trust (use) Hyper-V's TLB-flushing hypercalls. Don't use lazy EOI. Share the SynIC Event/Message pages with the hypervisor. Don't use the Hyper-V TSC page for now, because non-trivial work is required to share the page with the hypervisor. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230824080712.30327-4-decui@microsoft.com
2023-08-25x86/hyperv: Support hypercalls for fully enlightened TDX guestsDexuan Cui2-0/+25
A fully enlightened TDX guest on Hyper-V (i.e. without the paravisor) only uses the GHCI call rather than hv_hypercall_pg. Do not initialize hypercall_pg for such a guest. In hv_common_cpu_init(), the hyperv_pcpu_input_arg page needs to be decrypted in such a guest. Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230824080712.30327-3-decui@microsoft.com
2023-08-25x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guestsDexuan Cui1-0/+9
No logic change to SNP/VBS guests. hv_isolation_type_tdx() will be used to instruct a TDX guest on Hyper-V to do some TDX-specific operations, e.g. for a fully enlightened TDX guest (i.e. without the paravisor), hv_do_hypercall() should use __tdx_hypercall() and such a guest on Hyper-V should handle the Hyper-V Event/Message/Monitor pages specially. Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230824080712.30327-2-decui@microsoft.com
2023-08-22x86/hyperv: Add smp support for SEV-SNP guestTianyu Lan1-0/+138
In the AMD SEV-SNP guest, AP needs to be started up via sev es save area and Hyper-V requires to call HVCALL_START_VP hypercall to pass the gpa of sev es save area with AP's vp index and VTL(Virtual trust level) parameters. Override wakeup_secondary_cpu_64 callback with hv_snp_boot_ap. Reviewed-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230818102919.1318039-8-ltykernel@gmail.com
2023-08-22x86/hyperv: Mark Hyper-V vp assist page unencrypted in SEV-SNP enlightened guestTianyu Lan1-1/+15
hv vp assist page needs to be shared between SEV-SNP guest and Hyper-V. So mark the page unencrypted in the SEV-SNP guest. Reviewed-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230818102919.1318039-4-ltykernel@gmail.com
2023-08-22x86/hyperv: Set Virtual Trust Level in VMBus init messageTianyu Lan1-0/+34
SEV-SNP guests on Hyper-V can run at multiple Virtual Trust Levels (VTL). During boot, get the VTL at which we're running using the GET_VP_REGISTERs hypercall, and save the value for future use. Then during VMBus initialization, set the VTL with the saved value as required in the VMBus init message. Reviewed-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230818102919.1318039-3-ltykernel@gmail.com
2023-08-22x86/hyperv: Add sev-snp enlightened guest static keyTianyu Lan1-0/+11
Introduce static key isolation_type_en_snp for enlightened sev-snp guest check. Reviewed-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Tianyu Lan <tiala@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/20230818102919.1318039-2-ltykernel@gmail.com
2023-08-09x86/apic: Wrap IPI calls into helper functionsDave Hansen1-1/+1
Move them to one place so the static call conversion gets simpler. No functional change. [ dhansen: merge against recent x86/apic changes ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
2023-08-09x86/apic: Convert other overrides to apic_update_callback()Thomas Gleixner1-10/+10
Convert all places which just assign a new function directly to the apic callback to use apic_update_callback() which prepares for using static calls. Mark snp_set_wakeup_secondary_cpu() and kvm_setup_pv_ipi() __init, as they are only invoked from init code and otherwise trigger a section mismatch as they are now invoking a __init function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
2023-08-09x86/apic: Replace acpi_wake_cpu_handler_update() and apic_set_eoi_cb()Thomas Gleixner2-2/+2
Switch them over to apic_update_callback() and remove the code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Wei Liu <wei.liu@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
2023-08-09x86/apic: Nuke ack_APIC_irq()Dave Hansen1-1/+1
Yet another wrapper of a wrapper gone along with the outdated comment that this compiles to a single instruction. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Wei Liu <wei.liu@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
2023-08-09x86/apic: Remove pointless arguments from [native_]eoi_write()Thomas Gleixner1-3/+3
Every callsite hands in the same constants which is a pointless exercise and cannot be optimized by the compiler due to the indirect calls. Use the constants in the eoi() callbacks and remove the arguments. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Wei Liu <wei.liu@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Juergen Gross <jgross@suse.com> # Xen PV (dom0 and unpriv. guest)
2023-08-05Merge tag 'hyperv-fixes-signed-20230804' of ↵Linus Torvalds6-26/+33
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Fix a bug in a python script for Hyper-V (Ani Sinha) - Workaround a bug in Hyper-V when IBT is enabled (Michael Kelley) - Fix an issue parsing MP table when Linux runs in VTL2 (Saurabh Sengar) - Several cleanup patches (Nischala Yelchuri, Kameron Carr, YueHaibing, ZhiHu) * tag 'hyperv-fixes-signed-20230804' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Drivers: hv: vmbus: Remove unused extern declaration vmbus_ontimer() x86/hyperv: add noop functions to x86_init mpparse functions vmbus_testing: fix wrong python syntax for integer value comparison x86/hyperv: fix a warning in mshyperv.h x86/hyperv: Disable IBT when hypercall page lacks ENDBR instruction x86/hyperv: Improve code for referencing hyperv_pcpu_input_arg Drivers: hv: Change hv_free_hyperv_page() to take void * argument
2023-08-03x86/hyperv: add noop functions to x86_init mpparse functionsSaurabh Sengar1-0/+4
Hyper-V can run VMs at different privilege "levels" known as Virtual Trust Levels (VTL). Sometimes, it chooses to run two different VMs at different levels but they share some of their address space. In such setups VTL2 (higher level VM) has visibility of all of the VTL0 (level 0) memory space. When the CONFIG_X86_MPPARSE is enabled for VTL2, the VTL2 kernel performs a search within the low memory to locate MP tables. However, in systems where VTL0 manages the low memory and may contain valid tables, this scanning can result in incorrect MP table information being provided to the VTL2 kernel, mistakenly considering VTL0's MP table as its own Add noop functions to avoid MP parse scan by VTL2. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/1687537688-5397-1-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-07-24x86/hyperv: Disable IBT when hypercall page lacks ENDBR instructionMichael Kelley1-0/+21
On hardware that supports Indirect Branch Tracking (IBT), Hyper-V VMs with ConfigVersion 9.3 or later support IBT in the guest. However, current versions of Hyper-V have a bug in that there's not an ENDBR64 instruction at the beginning of the hypercall page. Since hypercalls are made with an indirect call to the hypercall page, all hypercall attempts fail with an exception and Linux panics. A Hyper-V fix is in progress to add ENDBR64. But guard against the Linux panic by clearing X86_FEATURE_IBT if the hypercall page doesn't start with ENDBR. The VM will boot and run without IBT. If future Linux 32-bit kernels were to support IBT, additional hypercall page hackery would be needed to make IBT work for such kernels in a Hyper-V VM. Cc: stable@vger.kernel.org Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1690001476-98594-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-06-28x86/hyperv: Improve code for referencing hyperv_pcpu_input_argNischala Yelchuri4-26/+8
Several places in code for Hyper-V reference the per-CPU variable hyperv_pcpu_input_arg. Older code uses a multi-line sequence to reference the variable, and usually includes a cast. Newer code does a much simpler direct assignment. The latter is preferable as the complexity of the older code is unnecessary. Update older code to use the simpler direct assignment. Signed-off-by: Nischala Yelchuri <niyelchu@linux.microsoft.com> Link: https://lore.kernel.org/r/1687286438-9421-1-git-send-email-niyelchu@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-06-27Merge tag 'x86_sev_for_v6.5' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Some SEV and CC platform helpers cleanup and simplifications now that the usage patterns are becoming apparent [ I'm sure I'm the only one that has gets confused by all the TLAs, but in case there are others: here SEV is AMD's "Secure Encrypted Virtualization" and CC is generic "Confidential Computing". There's also Intel SGX (Software Guard Extensions) and TDX (Trust Domain Extensions), along with all the vendor memory encryption extensions (SME, TSME, TME, and WTF). And then we have arm64 with RMA and CCA, and I probably forgot another dozen or so related acronyms - Linus ] * tag 'x86_sev_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/coco: Get rid of accessor functions x86/sev: Get rid of special sev_es_enable_key x86/coco: Mark cc_platform_has() and descendants noinstr
2023-06-27Merge tag 'x86_mtrr_for_v6.5' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mtrr updates from Borislav Petkov: "A serious scrubbing of the MTRR code including adding a new map mechanism in order to look up the memory type of a region easily. Also address memory range lookup issues like returning an invalid memory type. Furthermore, this handles the decoupling of PAT from MTRR more naturally. All work by Juergen Gross" * tag 'x86_mtrr_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/xen: Set default memory type for PV guests to WB x86/mtrr: Unify debugging printing x86/mtrr: Remove unused code x86/mm: Only check uniform after calling mtrr_type_lookup() x86/mtrr: Don't let mtrr_type_lookup() return MTRR_TYPE_INVALID x86/mtrr: Use new cache_map in mtrr_type_lookup() x86/mtrr: Add mtrr=debug command line option x86/mtrr: Construct a memory map with cache modes x86/mtrr: Add get_effective_type() service function x86/mtrr: Allocate mtrr_value array dynamically x86/mtrr: Move 32-bit code from mtrr.c to legacy.c x86/mtrr: Have only one set_mtrr() variant x86/mtrr: Replace vendor tests in MTRR code x86/xen: Set MTRR state when running as Xen PV initial domain x86/hyperv: Set MTRR state when running as SEV-SNP Hyper-V guest x86/mtrr: Support setting MTRR state for software defined MTRRs x86/mtrr: Replace size_or_mask and size_and_mask with a much easier concept x86/mtrr: Remove physical address size calculation
2023-06-18x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offlineMichael Kelley1-1/+1
These commits a494aef23dfc ("PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg") 2c6ba4216844 ("PCI: hv: Enable PCI pass-thru devices in Confidential VMs") update the Hyper-V virtual PCI driver to use the hyperv_pcpu_input_arg because that memory will be correctly marked as decrypted or encrypted for all VM types (CoCo or normal). But problems ensue when CPUs in the VM go online or offline after virtual PCI devices have been configured. When a CPU is brought online, the hyperv_pcpu_input_arg for that CPU is initialized by hv_cpu_init() running under state CPUHP_AP_ONLINE_DYN. But this state occurs after state CPUHP_AP_IRQ_AFFINITY_ONLINE, which may call the virtual PCI driver and fault trying to use the as yet uninitialized hyperv_pcpu_input_arg. A similar problem occurs in a CoCo VM if the MMIO read and write hypercalls are used from state CPUHP_AP_IRQ_AFFINITY_ONLINE. When a CPU is taken offline, IRQs may be reassigned in state CPUHP_TEARDOWN_CPU. Again, the virtual PCI driver may fault trying to use the hyperv_pcpu_input_arg that has already been freed by a higher state. Fix the onlining problem by adding state CPUHP_AP_HYPERV_ONLINE immediately after CPUHP_AP_ONLINE_IDLE (similar to CPUHP_AP_KVM_ONLINE) and before CPUHP_AP_IRQ_AFFINITY_ONLINE. Use this new state for Hyper-V initialization so that hyperv_pcpu_input_arg is allocated early enough. Fix the offlining problem by not freeing hyperv_pcpu_input_arg when a CPU goes offline. Retain the allocated memory, and reuse it if the CPU comes back online later. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1684862062-51576-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-06-01x86/hyperv: Set MTRR state when running as SEV-SNP Hyper-V guestJuergen Gross1-0/+4
In order to avoid mappings using the UC- cache attribute, set the MTRR state to use WB caching as the default. This is needed in order to cope with the fact that PAT is enabled, while MTRRs are not supported by the hypervisor. Fixes: 90b926e68f50 ("x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case") Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20230502120931.20719-5-jgross@suse.com Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2023-05-09x86/coco: Get rid of accessor functionsBorislav Petkov (AMD)1-1/+1
cc_vendor is __ro_after_init and thus can be used directly. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230508121957.32341-1-bp@alien8.de
2023-05-08x86/hyperv/vtl: Add noop for realmode pointersSaurabh Sengar1-0/+2
Assign the realmode pointers to noop, instead of NULL to fix kernel panic. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1682331016-22561-1-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-29Merge tag 'objtool-core-2023-04-27' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: - Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did this inconsistently follow this new, common convention, and fix all the fallout that objtool can now detect statically - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code - Generate ORC data for __pfx code - Add more __noreturn annotations to various kernel startup/shutdown and panic functions - Misc improvements & fixes * tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) x86/hyperv: Mark hv_ghcb_terminate() as noreturn scsi: message: fusion: Mark mpt_halt_firmware() __noreturn x86/cpu: Mark {hlt,resume}_play_dead() __noreturn btrfs: Mark btrfs_assertfail() __noreturn objtool: Include weak functions in global_noreturns check cpu: Mark nmi_panic_self_stop() __noreturn cpu: Mark panic_smp_self_stop() __noreturn arm64/cpu: Mark cpu_park_loop() and friends __noreturn x86/head: Mark *_start_kernel() __noreturn init: Mark start_kernel() __noreturn init: Mark [arch_call_]rest_init() __noreturn objtool: Generate ORC data for __pfx code x86/linkage: Fix padding for typed functions objtool: Separate prefix code from stack validation code objtool: Remove superfluous dead_end_function() check objtool: Add symbol iteration helpers objtool: Add WARN_INSN() scripts/objdump-func: Support multiple functions context_tracking: Fix KCSAN noinstr violation objtool: Add stackleak instrumentation to uaccess safe list ...
2023-04-18x86/hyperv: VTL support for Hyper-VSaurabh Sengar2-0/+228
Virtual Trust Levels (VTL) helps enable Hyper-V Virtual Secure Mode (VSM) feature. VSM is a set of hypervisor capabilities and enlightenments offered to host and guest partitions which enable the creation and management of new security boundaries within operating system software. VSM achieves and maintains isolation through VTLs. Add early initialization for Virtual Trust Levels (VTL). This includes initializing the x86 platform for VTL and enabling boot support for secondary CPUs to start in targeted VTL context. For now, only enable the code for targeted VTL level as 2. When starting an AP at a VTL other than VTL0, the AP must start directly in 64-bit mode, bypassing the usual 16-bit -> 32-bit -> 64-bit mode transition sequence that occurs after waking up an AP with SIPI whose vector points to the 16-bit AP startup trampoline code. Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com> Link: https://lore.kernel.org/r/1681192532-15460-6-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17x86/hyperv: Exclude lazy TLB mode CPUs from enlightened TLB flushesMichael Kelley1-1/+10
In the case where page tables are not freed, native_flush_tlb_multi() does not do a remote TLB flush on CPUs in lazy TLB mode because the CPU will flush itself at the next context switch. By comparison, the Hyper-V enlightened TLB flush does not exclude CPUs in lazy TLB mode and so performs unnecessary flushes. If we're not freeing page tables, add logic to test for lazy TLB mode when adding CPUs to the input argument to the Hyper-V TLB flush hypercall. Exclude lazy TLB mode CPUs so the behavior matches native_flush_tlb_multi() and the unnecessary flushes are avoided. Handle both the <=64 vCPU case and the _ex case for >64 vCPUs. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1679922967-26582-3-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17x86/hyperv: Add callback filter to cpumask_to_vpset()Michael Kelley1-4/+8
When copying CPUs from a Linux cpumask to a Hyper-V VPset, cpumask_to_vpset() currently has a "_noself" variant that doesn't copy the current CPU to the VPset. Generalize this variant by replacing it with a "_skip" variant having a callback function that is invoked for each CPU to decide if that CPU should be copied. Update the one caller of cpumask_to_vpset_noself() to use the new "_skip" variant instead. No functional change. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1679922967-26582-2-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17Drivers: hv: Don't remap addresses that are above shared_gpa_boundaryMichael Kelley1-2/+5
With the vTOM bit now treated as a protection flag and not part of the physical address, avoid remapping physical addresses with vTOM set since technically such addresses aren't valid. Use ioremap_cache() instead of memremap() to ensure that the mapping provides decrypted access, which will correctly set the vTOM bit as a protection flag. While this change is not required for correctness with the current implementation of memremap(), for general code hygiene it's better to not depend on the mapping functions doing something reasonable with a physical address that is out-of-range. While here, fix typos in two error messages. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/1679838727-87310-12-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-17hv_netvsc: Remove second mapping of send and recv buffersMichael Kelley1-28/+0
With changes to how Hyper-V guest VMs flip memory between private (encrypted) and shared (decrypted), creating a second kernel virtual mapping for shared memory is no longer necessary. Everything needed for the transition to shared is handled by set_memory_decrypted(). As such, remove the code to create and manage the second mapping for the pre-allocated send and recv buffers. This mapping is the last user of hv_map_memory()/hv_unmap_memory(), so delete these functions as well. Finally, hv_map_memory() is the last user of vmap_pfn() in Hyper-V guest code, so remove the Kconfig selection of VMAP_PFN. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/1679838727-87310-11-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2023-04-14x86/hyperv: Mark hv_ghcb_terminate() as noreturnGuilherme G. Piccoli1-1/+1
Annotate the function prototype and definition as noreturn to prevent objtool warnings like: vmlinux.o: warning: objtool: hyperv_init+0x55c: unreachable instruction Also, as per Josh's suggestion, add it to the global_noreturns list. As a comparison, an objdump output without the annotation: [...] 1b63: mov $0x1,%esi 1b68: xor %edi,%edi 1b6a: callq ffffffff8102f680 <hv_ghcb_terminate> 1b6f: jmpq ffffffff82f217ec <hyperv_init+0x9c> # unreachable 1b74: cmpq $0xffffffffffffffff,-0x702a24(%rip) [...] Now, after adding the __noreturn to the function prototype: [...] 17df: callq ffffffff8102f6d0 <hv_ghcb_negotiate_protocol> 17e4: test %al,%al 17e6: je ffffffff82f21bb9 <hyperv_init+0x469> [...] <many insns> 1bb9: mov $0x1,%esi 1bbe: xor %edi,%edi 1bc0: callq ffffffff8102f680 <hv_ghcb_terminate> 1bc5: nopw %cs:0x0(%rax,%rax,1) # end of function Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/32453a703dfcf0d007b473c9acbf70718222b74b.1681342859.git.jpoimboe@kernel.org
2023-03-27x86/hyperv: Change vTOM handling to use standard coco mechanismsMichael Kelley2-22/+61
Hyper-V guests on AMD SEV-SNP hardware have the option of using the "virtual Top Of Memory" (vTOM) feature specified by the SEV-SNP architecture. With vTOM, shared vs. private memory accesses are controlled by splitting the guest physical address space into two halves. vTOM is the dividing line where the uppermost bit of the physical address space is set; e.g., with 47 bits of guest physical address space, vTOM is 0x400000000000 (bit 46 is set). Guest physical memory is accessible at two parallel physical addresses -- one below vTOM and one above vTOM. Accesses below vTOM are private (encrypted) while accesses above vTOM are shared (decrypted). In this sense, vTOM is like the GPA.SHARED bit in Intel TDX. Support for Hyper-V guests using vTOM was added to the Linux kernel in two patch sets[1][2]. This support treats the vTOM bit as part of the physical address. For accessing shared (decrypted) memory, these patch sets create a second kernel virtual mapping that maps to physical addresses above vTOM. A better approach is to treat the vTOM bit as a protection flag, not as part of the physical address. This new approach is like the approach for the GPA.SHARED bit in Intel TDX. Rather than creating a second kernel virtual mapping, the existing mapping is updated using recently added coco mechanisms. When memory is changed between private and shared using set_memory_decrypted() and set_memory_encrypted(), the PTEs for the existing kernel mapping are changed to add or remove the vTOM bit in the guest physical address, just as with TDX. The hypercalls to change the memory status on the host side are made using the existing callback mechanism. Everything just works, with a minor tweak to map the IO-APIC to use private accesses. To accomplish the switch in approach, the following must be done: * Update Hyper-V initialization to set the cc_mask based on vTOM and do other coco initialization. * Update physical_mask so the vTOM bit is no longer treated as part of the physical address * Remove CC_VENDOR_HYPERV and merge the associated vTOM functionality under CC_VENDOR_AMD. Update cc_mkenc() and cc_mkdec() to set/clear the vTOM bit as a protection flag. * Code already exists to make hypercalls to inform Hyper-V about pages changing between shared and private. Update this code to run as a callback from __set_memory_enc_pgtable(). * Remove the Hyper-V special case from __set_memory_enc_dec() * Remove the Hyper-V specific call to swiotlb_update_mem_attributes() since mem_encrypt_init() will now do it. * Add a Hyper-V specific implementation of the is_private_mmio() callback that returns true for the IO-APIC and vTPM MMIO addresses [1] https://lore.kernel.org/all/20211025122116.264793-1-ltykernel@gmail.com/ [2] https://lore.kernel.org/all/20211213071407.314309-1-ltykernel@gmail.com/ [ bp: Touchups. ] Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/1679838727-87310-7-git-send-email-mikelley@microsoft.com
2023-03-27Drivers: hv: Explicitly request decrypted in vmap_pfn() callsMichael Kelley1-1/+1
Update vmap_pfn() calls to explicitly request that the mapping be for decrypted access to the memory. There's no change in functionality since the PFNs passed to vmap_pfn() are above the shared_gpa_boundary, implicitly producing a decrypted mapping. But explicitly requesting "decrypted" allows the code to work before and after changes that cause vmap_pfn() to mask the PFNs to being below the shared_gpa_boundary. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/1679838727-87310-4-git-send-email-mikelley@microsoft.com
2023-03-27x86/hyperv: Reorder code to facilitate future workMichael Kelley1-34/+34
Reorder some code to facilitate future work. No functional change. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/1679838727-87310-3-git-send-email-mikelley@microsoft.com
2022-11-29x86/hyperv: Remove unregister syscore call from Hyper-V cleanupGaurav Kohli1-2/+0
Hyper-V cleanup code comes under panic path where preemption and irq is already disabled. So calling of unregister_syscore_ops might schedule out the thread even for the case where mutex lock is free. hyperv_cleanup unregister_syscore_ops mutex_lock(&syscore_ops_lock) might_sleep Here might_sleep might schedule out this thread, where voluntary preemption config is on and this thread will never comes back. And also this was added earlier to maintain the symmetry which is not required as this can comes during crash shutdown path only. To prevent the same, removing unregister_syscore_ops function call. Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1669443291-2575-1-git-send-email-gauravkohli@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-28clocksource: hyper-v: Add TSC page support for root partitionStanislav Kinsburskiy1-0/+2
Microsoft Hypervisor root partition has to map the TSC page specified by the hypervisor, instead of providing the page to the hypervisor like it's done in the guest partitions. However, it's too early to map the page when the clock is initialized, so, the actual mapping is happening later. Signed-off-by: Stanislav Kinsburskiy <stanislav.kinsburskiy@gmail.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Wei Liu <wei.liu@kernel.org> CC: Dexuan Cui <decui@microsoft.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: Borislav Petkov <bp@alien8.de> CC: Dave Hansen <dave.hansen@linux.intel.com> CC: x86@kernel.org CC: "H. Peter Anvin" <hpa@zytor.com> CC: Daniel Lezcano <daniel.lezcano@linaro.org> CC: linux-hyperv@vger.kernel.org CC: linux-kernel@vger.kernel.org Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Link: https://lore.kernel.org/r/166759443644.385891.15921594265843430260.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-12x86/hyperv: Restore VP assist page after cpu offlining/onliningVitaly Kuznetsov1-28/+26
Commit e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing to VP assist page MSR") moved 'wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE)' under 'if (*hvp)' condition. This works for root partition as hv_cpu_die() does memunmap() and sets 'hv_vp_assist_page[cpu]' to NULL but breaks non-root partitions as hv_cpu_die() doesn't free 'hv_vp_assist_page[cpu]' for them. This causes VP assist page to remain unset after CPU offline/online cycle: $ rdmsr -p 24 0x40000073 10212f001 $ echo 0 > /sys/devices/system/cpu/cpu24/online $ echo 1 > /sys/devices/system/cpu/cpu24/online $ rdmsr -p 24 0x40000073 0 Fix the issue by always writing to HV_X64_MSR_VP_ASSIST_PAGE in hv_cpu_init(). Note, checking 'if (!*hvp)', for root partition is pointless as hv_cpu_die() always sets 'hv_vp_assist_page[cpu]' to NULL (and it's also NULL initially). Note: the fact that 'hv_vp_assist_page[cpu]' is reset to NULL may present a (potential) issue for KVM. While Hyper-V uses CPUHP_AP_ONLINE_DYN stage in CPU hotplug, KVM uses CPUHP_AP_KVM_STARTING which comes earlier in CPU teardown sequence. It is theoretically possible that Enlightened VMCS is still in use. It is unclear if the issue is real and if using KVM with Hyper-V root partition is even possible. While on it, drop the unneeded smp_processor_id() call from hv_cpu_init(). Fixes: e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing to VP assist page MSR") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20221103190601.399343-1-vkuznets@redhat.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-03x86/hyperv: fix invalid writes to MSRs during root partition kexecAnirudh Rayabharam1-4/+7
hyperv_cleanup resets the hypercall page by setting the MSR to 0. However, the root partition is not allowed to write to the GPA bits of the MSR. Instead, it uses the hypercall page provided by the MSR. Similar is the case with the reference TSC MSR. Clear only the enable bit instead of zeroing the entire MSR to make the code valid for root partition too. Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20221027095729.1676394-3-anrayabh@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-10-27x86/hyperv: Remove BUG_ON() for kmap_local_page()Zhao Liu1-5/+3
The commit 154fb14df7a3c ("x86/hyperv: Replace kmap() with kmap_local_page()") keeps the BUG_ON() to check if kmap_local_page() fails. But in fact, kmap_local_page() always returns a valid kernel address and won't return NULL here. It will BUG on its own if it fails. [1] So directly use memcpy_to_page() which creates local mapping to copy. [1]: https://lore.kernel.org/lkml/YztFEyUA48et0yTt@iweiny-mobl/ Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Suggested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20221020083820.2341088-1-zhao1.liu@linux.intel.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-10-03x86/hyperv: Replace kmap() with kmap_local_page()Zhao Liu1-2/+2
kmap() is being deprecated in favor of kmap_local_page()[1]. There are two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap's pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and are still valid. In the fuction hyperv_init() of hyperv/hv_init.c, the mapping is used in a single thread and is short live. So, in this case, it's safe to simply use kmap_local_page() to create mapping, and this avoids the wasted cost of kmap() for global synchronization. In addtion, the fuction hyperv_init() checks if kmap() fails by BUG_ON(). From the original discussion[2], the BUG_ON() here is just used to explicitly panic NULL pointer. So still keep the BUG_ON() in place to check if kmap_local_page() fails. Based on this consideration, memcpy_to_page() is not selected here but only kmap_local_page() is used. Therefore, replace kmap() with kmap_local_page() in hyperv/hv_init.c. [1]: https://lore.kernel.org/all/20220813220034.806698-1-ira.weiny@intel.com [2]: https://lore.kernel.org/lkml/20200915103710.cqmdvzh5lys4wsqo@liuwe-devbox-debian-v2/ Suggested-by: Dave Hansen <dave.hansen@intel.com> Suggested-by: Ira Weiny <ira.weiny@intel.com> Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20220928095640.626350-1-zhao1.liu@linux.intel.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-09-28hyperv: simplify and rename generate_guest_idLi kunyu1-1/+1
The generate_guest_id function is more suitable for use after the following modifications. 1. The return value of the function is modified to u64. 2. Remove the d_info1 and d_info2 parameters from the function, keep the u64 type kernel_version parameter. 3. Rename the function to make it clearly a Hyper-V related function, and modify it to hv_generate_guest_id. Signed-off-by: Li kunyu <kunyu@nfschina.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20220928064046.3545-1-kunyu@nfschina.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-08-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+1
Pull kvm updates from Paolo Bonzini: "Quite a large pull request due to a selftest API overhaul and some patches that had come in too late for 5.19. ARM: - Unwinder implementations for both nVHE modes (classic and protected), complete with an overflow stack - Rework of the sysreg access from userspace, with a complete rewrite of the vgic-v3 view to allign with the rest of the infrastructure - Disagregation of the vcpu flags in separate sets to better track their use model. - A fix for the GICv2-on-v3 selftest - A small set of cosmetic fixes RISC-V: - Track ISA extensions used by Guest using bitmap - Added system instruction emulation framework - Added CSR emulation framework - Added gfp_custom flag in struct kvm_mmu_memory_cache - Added G-stage ioremap() and iounmap() functions - Added support for Svpbmt inside Guest s390: - add an interface to provide a hypervisor dump for secure guests - improve selftests to use TAP interface - enable interpretive execution of zPCI instructions (for PCI passthrough) - First part of deferred teardown - CPU Topology - PV attestation - Minor fixes x86: - Permit guests to ignore single-bit ECC errors - Intel IPI virtualization - Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS - PEBS virtualization - Simplify PMU emulation by just using PERF_TYPE_RAW events - More accurate event reinjection on SVM (avoid retrying instructions) - Allow getting/setting the state of the speaker port data bit - Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent - "Notify" VM exit (detect microarchitectural hangs) for Intel - Use try_cmpxchg64 instead of cmpxchg64 - Ignore benign host accesses to PMU MSRs when PMU is disabled - Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior - Allow NX huge page mitigation to be disabled on a per-vm basis - Port eager page splitting to shadow MMU as well - Enable CMCI capability by default and handle injected UCNA errors - Expose pid of vcpu threads in debugfs - x2AVIC support for AMD - cleanup PIO emulation - Fixes for LLDT/LTR emulation - Don't require refcounted "struct page" to create huge SPTEs - Miscellaneous cleanups: - MCE MSR emulation - Use separate namespaces for guest PTEs and shadow PTEs bitmasks - PIO emulation - Reorganize rmap API, mostly around rmap destruction - Do not workaround very old KVM bugs for L0 that runs with nesting enabled - new selftests API for CPUID Generic: - Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache - new selftests API using struct kvm_vcpu instead of a (vm, id) tuple" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (606 commits) selftests: kvm: set rax before vmcall selftests: KVM: Add exponent check for boolean stats selftests: KVM: Provide descriptive assertions in kvm_binary_stats_test selftests: KVM: Check stat name before other fields KVM: x86/mmu: remove unused variable RISC-V: KVM: Add support for Svpbmt inside Guest/VM RISC-V: KVM: Use PAGE_KERNEL_IO in kvm_riscv_gstage_ioremap() RISC-V: KVM: Add G-stage ioremap() and iounmap() functions KVM: Add gfp_custom flag in struct kvm_mmu_memory_cache RISC-V: KVM: Add extensible CSR emulation framework RISC-V: KVM: Add extensible system instruction emulation framework RISC-V: KVM: Factor-out instruction emulation into separate sources RISC-V: KVM: move preempt_disable() call in kvm_arch_vcpu_ioctl_run RISC-V: KVM: Make kvm_riscv_guest_timer_init a void function RISC-V: KVM: Fix variable spelling mistake RISC-V: KVM: Improve ISA extension by using a bitmap KVM, x86/mmu: Fix the comment around kvm_tdp_mmu_zap_leafs() KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog KVM: x86/mmu: Treat NX as a valid SPTE bit for NPT KVM: x86: Do not block APIC write for non ICR registers ...
2022-08-01Merge remote-tracking branch 'kvm/next' into kvm-next-5.20Paolo Bonzini1-1/+1
KVM/s390, KVM/x86 and common infrastructure changes for 5.20 x86: * Permit guests to ignore single-bit ECC errors * Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache * Intel IPI virtualization * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS * PEBS virtualization * Simplify PMU emulation by just using PERF_TYPE_RAW events * More accurate event reinjection on SVM (avoid retrying instructions) * Allow getting/setting the state of the speaker port data bit * Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent * "Notify" VM exit (detect microarchitectural hangs) for Intel * Cleanups for MCE MSR emulation s390: * add an interface to provide a hypervisor dump for secure guests * improve selftests to use TAP interface * enable interpretive execution of zPCI instructions (for PCI passthrough) * First part of deferred teardown * CPU Topology * PV attestation * Minor fixes Generic: * new selftests API using struct kvm_vcpu instead of a (vm, id) tuple x86: * Use try_cmpxchg64 instead of cmpxchg64 * Bugfixes * Ignore benign host accesses to PMU MSRs when PMU is disabled * Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior * x86/MMU: Allow NX huge pages to be disabled on a per-vm basis * Port eager page splitting to shadow MMU as well * Enable CMCI capability by default and handle injected UCNA errors * Expose pid of vcpu threads in debugfs * x2AVIC support for AMD * cleanup PIO emulation * Fixes for LLDT/LTR emulation * Don't require refcounted "struct page" to create huge SPTEs x86 cleanups: * Use separate namespaces for guest PTEs and shadow PTEs bitmasks * PIO emulation * Reorganize rmap API, mostly around rmap destruction * Do not workaround very old KVM bugs for L0 that runs with nesting enabled * new selftests API for CPUID
2022-07-07genirq: Return a const cpumask from irq_data_get_affinity_maskSamuel Holland1-1/+1
Now that the irq_data_update_affinity helper exists, enforce its use by returning a a const cpumask from irq_data_get_affinity_mask. Since the previous commit already updated places that needed to call irq_data_update_affinity, this commit updates the remaining code that either did not modify the cpumask or immediately passed the modified mask to irq_set_affinity. Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220701200056.46555-8-samuel@sholland.org