summaryrefslogtreecommitdiff
path: root/arch/x86/kvm/vmx/capabilities.h
AgeCommit message (Collapse)AuthorFilesLines
2023-05-26KVM: VMX: Treat UMIP as emulated if and only if the host doesn't have UMIPSean Christopherson1-2/+2
Advertise UMIP as emulated if and only if the host doesn't natively support UMIP, otherwise vmx_umip_emulated() is misleading when the host _does_ support UMIP. Of the four users of vmx_umip_emulated(), two already check for native support, and the logic in vmx_set_cpu_caps() is relevant if and only if UMIP isn't natively supported as UMIP is set in KVM's caps by kvm_set_cpu_caps() when UMIP is present in hardware. That leaves KVM's stuffing of X86_CR4_UMIP into the default cr4_fixed1 value enumerated for nested VMX. In that case, checking for (lack of) host support is actually a bug fix of sorts, as enumerating UMIP support based solely on descriptor table exiting works only because KVM doesn't sanity check MSR_IA32_VMX_CR4_FIXED1. E.g. if a (very theoretical) host supported UMIP in hardware but didn't allow UMIP+VMX, KVM would advertise UMIP but not actually emulate UMIP. Of course, KVM would explode long before it could run a nested VM on said theoretical CPU, as KVM doesn't modify host CR4 when enabling VMX, i.e. would load an "illegal" value into vmcs.HOST_CR4. Reported-by: Robert Hoo <robert.hu@intel.com> Link: https://lore.kernel.org/all/20230310125718.1442088-2-robert.hu@intel.com Link: https://lore.kernel.org/r/20230413231914.1482782-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-12-29KVM: VMX: Make VMCS configuration/capabilities structs read-only after initSean Christopherson1-2/+2
Tag vmcs_config and vmx_capability structs as __init, the canonical configuration is generated during hardware_setup() and must never be modified after that point. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-31-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: x86: Track supported PERF_CAPABILITIES in kvm_capsSean Christopherson1-25/+0
Track KVM's supported PERF_CAPABILITIES in kvm_caps instead of computing the supported capabilities on the fly every time. Using kvm_caps will also allow for future cleanups as the kvm_caps values can be used directly in common x86 code. Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Like Xu <likexu@tencent.com> Message-Id: <20221006000314.73240-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09perf/x86/core: Zero @lbr instead of returning -1 in x86_perf_get_lbr() stubSean Christopherson1-1/+2
Drop the return value from x86_perf_get_lbr() and have the stub zero out the @lbr structure instead of returning -1 to indicate "no LBR support". KVM doesn't actually check the return value, and instead subtly relies on zeroing the number of LBRs in intel_pmu_init(). Formalize "nr=0 means unsupported" so that KVM doesn't need to add a pointless check on the return value to fix KVM's benign bug. Note, the stub is necessary even though KVM x86 selects PERF_EVENTS and the caller exists only when CONFIG_KVM_INTEL=y. Despite the name, KVM_INTEL doesn't strictly require CPU_SUP_INTEL, it can be built with any of INTEL || CENTAUR || ZHAOXIN CPUs. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221006000314.73240-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-02KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()Sean Christopherson1-15/+0
Fold vmx_supported_debugctl() into vcpu_supported_debugctl(), its only caller. Setting bits only to clear them a few instructions later is rather silly, and splitting the logic makes things seem more complicated than they actually are. Opportunistically drop DEBUGCTLMSR_LBR_MASK now that there's a single reference to the pair of bits. The extra layer of indirection provides no meaningful value and makes it unnecessarily tedious to understand what KVM is doing. No functional change. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221006000314.73240-4-seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-02KVM: VMX: Advertise PMU LBRs if and only if perf supports LBRsSean Christopherson1-1/+3
Advertise LBR support to userspace via MSR_IA32_PERF_CAPABILITIES if and only if perf fully supports LBRs. Perf may disable LBRs (by zeroing the number of LBRs) even on platforms the allegedly support LBRs, e.g. if probing any LBR MSRs during setup fails. Fixes: be635e34c284 ("KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIES") Reported-by: Like Xu <like.xu.linux@gmail.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221006000314.73240-3-seanjc@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Cache MSR_IA32_VMX_MISC in vmcs_configVitaly Kuznetsov1-8/+3
Like other host VMX control MSRs, MSR_IA32_VMX_MISC can be cached in vmcs_config to avoid the need to re-read it later, e.g. from cpu_has_vmx_intel_pt() or cpu_has_vmx_shadow_vmcs(). No (real) functional change intended. Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-33-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-26KVM: VMX: Check CPU_BASED_{INTR,NMI}_WINDOW_EXITING in setup_vmcs_config()Vitaly Kuznetsov1-1/+2
CPU_BASED_{INTR,NMI}_WINDOW_EXITING controls are toggled dynamically by vmx_enable_{irq,nmi}_window, handle_interrupt_window(), handle_nmi_window() but setup_vmcs_config() doesn't check their existence. Add the check and filter the controls out in vmx_exec_control(). Note: KVM explicitly supports CPUs without VIRTUAL_NMIS and all these CPUs are supposedly lacking NMI_WINDOW_EXITING too. Adjust cpu_has_virtual_nmis() accordingly. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220830133737.1539624-21-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-01Merge remote-tracking branch 'kvm/next' into kvm-next-5.20Paolo Bonzini1-16/+41
KVM/s390, KVM/x86 and common infrastructure changes for 5.20 x86: * Permit guests to ignore single-bit ECC errors * Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache * Intel IPI virtualization * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS * PEBS virtualization * Simplify PMU emulation by just using PERF_TYPE_RAW events * More accurate event reinjection on SVM (avoid retrying instructions) * Allow getting/setting the state of the speaker port data bit * Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent * "Notify" VM exit (detect microarchitectural hangs) for Intel * Cleanups for MCE MSR emulation s390: * add an interface to provide a hypervisor dump for secure guests * improve selftests to use TAP interface * enable interpretive execution of zPCI instructions (for PCI passthrough) * First part of deferred teardown * CPU Topology * PV attestation * Minor fixes Generic: * new selftests API using struct kvm_vcpu instead of a (vm, id) tuple x86: * Use try_cmpxchg64 instead of cmpxchg64 * Bugfixes * Ignore benign host accesses to PMU MSRs when PMU is disabled * Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior * x86/MMU: Allow NX huge pages to be disabled on a per-vm basis * Port eager page splitting to shadow MMU as well * Enable CMCI capability by default and handle injected UCNA errors * Expose pid of vcpu threads in debugfs * x2AVIC support for AMD * cleanup PIO emulation * Fixes for LLDT/LTR emulation * Don't require refcounted "struct page" to create huge SPTEs x86 cleanups: * Use separate namespaces for guest PTEs and shadow PTEs bitmasks * PIO emulation * Reorganize rmap API, mostly around rmap destruction * Do not workaround very old KVM bugs for L0 that runs with nesting enabled * new selftests API for CPUID
2022-06-27KVM: VMX: Prevent RSB underflow before vmenterJosh Poimboeuf1-2/+2
On VMX, there are some balanced returns between the time the guest's SPEC_CTRL value is written, and the vmenter. Balanced returns (matched by a preceding call) are usually ok, but it's at least theoretically possible an NMI with a deep call stack could empty the RSB before one of the returns. For maximum paranoia, don't allow *any* returns (balanced or otherwise) between the SPEC_CTRL write and the vmenter. [ bp: Fix 32-bit build. ] Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-08KVM: VMX: Sanitize VM-Entry/VM-Exit control pairs at kvm_intel load timeSean Christopherson1-7/+3
Sanitize the VM-Entry/VM-Exit control pairs (load+load or load+clear) during setup instead of checking both controls in a pair at runtime. If only one control is supported, KVM will report the associated feature as not available, but will leave the supported control bit set in the VMCS config, which could lead to corruption of host state. E.g. if only the VM-Entry control is supported and the feature is not dynamically toggled, KVM will set the control in all VMCSes and load zeros without restoring host state. Note, while this is technically a bug fix, practically speaking no sane CPU or VMM would support only one control. KVM's behavior of checking both controls is mostly pedantry. Cc: Chenyi Qiang <chenyi.qiang@intel.com> Cc: Lei Wang <lei4.wang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220527170658.3571367-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86/pmu: Restrict advanced features based on module enable_pmuLike Xu1-0/+4
Once vPMU is disabled, the KVM would not expose features like: PEBS (via clear kvm_pmu_cap.pebs_ept), legacy LBR and ARCH_LBR, CPUID 0xA leaf, PDCM bit and MSR_IA32_PERF_CAPABILITIES, plus PT_MODE_HOST_GUEST mode. What this group of features has in common is that their use relies on the underlying PMU counter and the host perf_event as a back-end resource requester or sharing part of the irq delivery path. Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220601031925.59693-2-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: VMX: Enable Notify VM exitTao Xu1-0/+6
There are cases that malicious virtual machines can cause CPU stuck (due to event windows don't open up), e.g., infinite loop in microcode when nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and IRQ) can be delivered. It leads the CPU to be unavailable to host or other VMs. VMM can enable notify VM exit that a VM exit generated if no event window occurs in VM non-root mode for a specified amount of time (notify window). Feature enabling: - The new vmcs field SECONDARY_EXEC_NOTIFY_VM_EXITING is introduced to enable this feature. VMM can set NOTIFY_WINDOW vmcs field to adjust the expected notify window. - Add a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT so that user space can query and enable this feature in per-VM scope. The argument is a 64bit value: bits 63:32 are used for notify window, and bits 31:0 are for flags. Current supported flags: - KVM_X86_NOTIFY_VMEXIT_ENABLED: enable the feature with the notify window provided. - KVM_X86_NOTIFY_VMEXIT_USER: exit to userspace once the exits happen. - It's safe to even set notify window to zero since an internal hardware threshold is added to vmcs.notify_window. VM exit handling: - Introduce a vcpu state notify_window_exits to records the count of notify VM exits and expose it through the debugfs. - Notify VM exit can happen incident to delivery of a vector event. Allow it in KVM. - Exit to userspace unconditionally for handling when VM_CONTEXT_INVALID bit is set. Nested handling - Nested notify VM exits are not supported yet. Keep the same notify window control in vmcs02 as vmcs01, so that L1 can't escape the restriction of notify VM exits through launching L2 VM. Notify VM exit is defined in latest Intel Architecture Instruction Set Extensions Programming Reference, chapter 9.2. Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Co-developed-by: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20220524135624.22988-5-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64Like Xu1-11/+17
The CPUID features PDCM, DS and DTES64 are required for PEBS feature. KVM would expose CPUID feature PDCM, DS and DTES64 to guest when PEBS is supported in the KVM on the Ice Lake server platforms. Originally-by: Andi Kleen <ak@linux.intel.com> Co-developed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Co-developed-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220411101946.20262-18-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: VMX: enable IPI virtualizationChao Gao1-0/+6
With IPI virtualization enabled, the processor emulates writes to APIC registers that would send IPIs. The processor sets the bit corresponding to the vector in target vCPU's PIR and may send a notification (IPI) specified by NDST and NV fields in target vCPU's Posted-Interrupt Descriptor (PID). It is similar to what IOMMU engine does when dealing with posted interrupt from devices. A PID-pointer table is used by the processor to locate the PID of a vCPU with the vCPU's APIC ID. The table size depends on maximum APIC ID assigned for current VM session from userspace. Allocating memory for PID-pointer table is deferred to vCPU creation, because irqchip mode and VM-scope maximum APIC ID is settled at that point. KVM can skip PID-pointer table allocation if !irqchip_in_kernel(). Like VT-d PI, if a vCPU goes to blocked state, VMM needs to switch its notification vector to wakeup vector. This can ensure that when an IPI for blocked vCPUs arrives, VMM can get control and wake up blocked vCPUs. And if a VCPU is preempted, its posted interrupt notification is suppressed. Note that IPI virtualization can only virualize physical-addressing, flat mode, unicast IPIs. Sending other IPIs would still cause a trap-like APIC-write VM-exit and need to be handled by VMM. Signed-off-by: Chao Gao <chao.gao@intel.com> Signed-off-by: Zeng Guang <guang.zeng@intel.com> Message-Id: <20220419154510.11938-1-guang.zeng@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS configRobert Hoo1-0/+7
Check VMX features on tertiary execution control in VMCS config setup. Sub-features in tertiary execution control to be enabled are adjusted according to hardware capabilities although no sub-feature is enabled in this patch. EVMCSv1 doesn't support tertiary VM-execution control, so disable it when EVMCSv1 is in use. And define the auxiliary functions for Tertiary control field here, using the new BUILD_CONTROLS_SHADOW(). Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Signed-off-by: Zeng Guang <guang.zeng@intel.com> Message-Id: <20220419153400.11642-1-guang.zeng@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-26KVM: VMX: Remove vmcs_config.orderJim Mattson1-1/+0
The maximum size of a VMCS (or VMXON region) is 4096. By definition, these are order 0 allocations. Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20220125004359.147600-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-17KVM: x86: Making the module parameter of vPMU more commonLike Xu1-0/+4
The new module parameter to control PMU virtualization should apply to Intel as well as AMD, for situations where userspace is not trusted. If the module parameter allows PMU virtualization, there could be a new KVM_CAP or guest CPUID bits whereby userspace can enable/disable PMU virtualization on a per-VM basis. If the module parameter does not allow PMU virtualization, there should be no userspace override, since we have no precedent for authorizing that kind of override. If it's false, other counter-based profiling features (such as LBR including the associated CPUID bits if any) will not be exposed. Change its name from "pmu" to "enable_pmu" as we have temporary variables with the same name in our code like "struct kvm_pmu *pmu". Fixes: b1d66dad65dc ("KVM: x86/svm: Add module param to control PMU virtualization") Suggested-by : Jim Mattson <jmattson@google.com> Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220111073823.21885-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-08KVM: X86: Add parameter huge_page_level to kvm_init_shadow_ept_mmu()Lai Jiangshan1-0/+9
The level of supported large page on nEPT affects the rsvds_bits_mask. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211124122055.64424-8-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-17KVM: x86: Use common 'enable_apicv' variable for both APICv and AVICVitaly Kuznetsov1-1/+0
Unify VMX and SVM code by moving APICv/AVIC enablement tracking to common 'enable_apicv' variable. Note: unlike APICv, AVIC is disabled by default. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210609150911.1471882-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-24KVM: VMX: Drop unneeded CONFIG_X86_LOCAL_APIC checkVitaly Kuznetsov1-2/+1
CONFIG_X86_LOCAL_APIC is always on when CONFIG_KVM (on x86) since commit e42eef4ba388 ("KVM: add X86_LOCAL_APIC dependency"). Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210518144339.1987982-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com>
2021-05-07KVM: X86: Expose bus lock debug exception to guestPaolo Bonzini1-0/+3
Bus lock debug exception is an ability to notify the kernel by an #DB trap after the instruction acquires a bus lock and is executed when CPL>0. This allows the kernel to enforce user application throttling or mitigations. Existence of bus lock debug exception is enumerated via CPUID.(EAX=7,ECX=0).ECX[24]. Software can enable these exceptions by setting bit 2 of the MSR_IA32_DEBUGCTL. Expose the CPUID to guest and emulate the MSR handling when guest enables it. Support for this feature was originally developed by Xiaoyao Li and Chenyi Qiang, but code has since changed enough that this patch has nothing in common with theirs, except for this commit message. Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20210202090433.13441-4-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: vmx/pmu: Expose LBR_FMT in the MSR_IA32_PERF_CAPABILITIESLike Xu1-1/+8
Userspace could enable guest LBR feature when the exactly supported LBR format value is initialized to the MSR_IA32_PERF_CAPABILITIES and the LBR is also compatible with vPMU version and host cpu model. The LBR could be enabled on the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vcpu model is compatible with the host one. Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20210201051039.255478-11-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: vmx/pmu: Emulate legacy freezing LBRs on virtual PMILike Xu1-1/+3
The current vPMU only supports Architecture Version 2. According to Intel SDM "17.4.7 Freezing LBR and Performance Counters on PMI", if IA32_DEBUGCTL.Freeze_LBR_On_PMI = 1, the LBR is frozen on the virtual PMI and the KVM would emulate to clear the LBR bit (bit 0) in IA32_DEBUGCTL. Also, guest needs to re-enable IA32_DEBUGCTL.LBR to resume recording branches. Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Message-Id: <20210201051039.255478-9-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabledLike Xu1-1/+6
Usespace could set the bits [0, 5] of the IA32_PERF_CAPABILITIES MSR which tells about the record format stored in the LBR records. The LBR will be enabled on the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vcpu model is compatible with the host one. Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabledPaolo Bonzini1-0/+1
Usespace could set the bits [0, 5] of the IA32_PERF_CAPABILITIES MSR which tells about the record format stored in the LBR records. The LBR will be enabled on the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vcpu model is compatible with the host one. Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: VMX: read/write MSR_IA32_DEBUGCTLMSR from GUEST_IA32_DEBUGCTLLike Xu1-0/+5
SVM already has specific handlers of MSR_IA32_DEBUGCTLMSR in the svm_get/set_msr, so the x86 common part can be safely moved to VMX. This allows KVM to store the bits it supports in GUEST_IA32_DEBUGCTL. Add vmx_supported_debugctl() to refactor the throwing logic of #GP. Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Message-Id: <20210108013704.134985-2-like.xu@linux.intel.com> [Merge parts of Chenyi Qiang's "KVM: X86: Expose bus lock debug exception to guest". - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-04KVM: VMX: Enable bus lock VM exitChenyi Qiang1-0/+6
Virtual Machine can exploit bus locks to degrade the performance of system. Bus lock can be caused by split locked access to writeback(WB) memory or by using locks on uncacheable(UC) memory. The bus lock is typically >1000 cycles slower than an atomic operation within a cache line. It also disrupts performance on other cores (which must wait for the bus lock to be released before their memory operations can complete). To address the threat, bus lock VM exit is introduced to notify the VMM when a bus lock was acquired, allowing it to enforce throttling or other policy based mitigations. A VMM can enable VM exit due to bus locks by setting a new "Bus Lock Detection" VM-execution control(bit 30 of Secondary Processor-based VM execution controls). If delivery of this VM exit was preempted by a higher priority VM exit (e.g. EPT misconfiguration, EPT violation, APIC access VM exit, APIC write VM exit, exception bitmap exiting), bit 26 of exit reason in vmcs field is set to 1. In current implementation, the KVM exposes this capability through KVM_CAP_X86_BUS_LOCK_EXIT. The user can get the supported mode bitmap (i.e. off and exit) and enable it explicitly (disabled by default). If bus locks in guest are detected by KVM, exit to user space even when current exit reason is handled by KVM internally. Set a new field KVM_RUN_BUS_LOCK in vcpu->run->flags to inform the user space that there is a bus lock detected in guest. Document for Bus Lock VM exit is now available at the latest "Intel Architecture Instruction Set Extensions Programming Reference". Document Link: https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20201106090315.18606-4-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Rename RDTSCP secondary exec control name to insert "ENABLE"Sean Christopherson1-1/+1
Rename SECONDARY_EXEC_RDTSCP to SECONDARY_EXEC_ENABLE_RDTSCP in preparation for consolidating the logic for adjusting secondary exec controls based on the guest CPUID model. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923165048.20486-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28KVM: VMX: Rename vmx_*_supported() helpers to cpu_has_vmx_*()Sean Christopherson1-4/+4
Rename helpers for a few controls to conform to the more prevelant style of cpu_has_vmx_<feature>(). Consistent names will allow adding macros to consolidate the boilerplate code for adjusting secondary execution controls. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923165048.20486-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-01KVM: x86/pmu: Support full width countingLike Xu1-0/+11
Intel CPUs have a new alternative MSR range (starting from MSR_IA32_PMC0) for GP counters that allows writing the full counter width. Enable this range from a new capability bit (IA32_PERF_CAPABILITIES.FW_WRITE[bit 13]). The guest would query CPUID to get the counter width, and sign extends the counter values as needed. The traditional MSRs always limit to 32bit, even though the counter internally is larger (48 or 57 bits). When the new capability is set, use the alternative range which do not have these restrictions. This lowers the overhead of perf stat slightly because it has to do less interrupts to accumulate the counter value. Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20200529074347.124619-3-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Handle PKU CPUID adjustment in VMX codeSean Christopherson1-5/+0
Move the setting of the PKU CPUID bit into VMX to eliminate an instance of the undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in the common CPUID handling code. Drop ->pku_supported(), CPUID adjustment was the only user. Note, some AMD CPUs now support PKU, but SVM doesn't yet support exposing it to a guest. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: x86: Use supported_xcr0 to detect MPX supportSean Christopherson1-1/+1
Query supported_xcr0 when checking for MPX support instead of invoking ->mpx_supported() and drop ->mpx_supported() as kvm_mpx_supported() was its last user. Rename vmx_mpx_supported() to cpu_has_vmx_mpx() to better align with VMX/VMCS nomenclature. Modify VMX's adjustment of xcr0 to call cpus_has_vmx_mpx() (renamed from vmx_mpx_supported()) directly to avoid reading supported_xcr0 before it's fully configured. No functional change intended. Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> [Test that *all* bits are set. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16KVM: VMX: Add helpers to query Intel PT modeSean Christopherson1-0/+18
Add helpers to query which of the (two) supported PT modes is active. The primary motivation is to help document that there is a third PT mode (host-only) that's currently not supported by KVM. As is, it's not obvious that PT_MODE_SYSTEM != !PT_MODE_HOST_GUEST and vice versa, e.g. that "pt_mode == PT_MODE_SYSTEM" and "pt_mode != PT_MODE_HOST_GUEST" are two distinct checks. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21KVM: nVMX: clear PIN_BASED_POSTED_INTR from nested pinbased_ctls only when ↵Vitaly Kuznetsov1-0/+1
apicv is globally disabled When apicv is disabled on a vCPU (e.g. by enabling KVM_CAP_HYPERV_SYNIC*), nothing happens to VMX MSRs on the already existing vCPUs, however, all new ones are created with PIN_BASED_POSTED_INTR filtered out. This is very confusing and results in the following picture inside the guest: $ rdmsr -ax 0x48d ff00000016 7f00000016 7f00000016 7f00000016 This is observed with QEMU and 4-vCPU guest: QEMU creates vCPU0, does KVM_CAP_HYPERV_SYNIC2 and then creates the remaining three. L1 hypervisor may only check CPU0's controls to find out what features are available and it will be very confused later. Switch to setting PIN_BASED_POSTED_INTR control based on global 'enable_apicv' setting. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27kvm/svm: PKU not currently supportedJohn Allen1-0/+5
Current SVM implementation does not have support for handling PKU. Guests running on a host with future AMD cpus that support the feature will read garbage from the PKRU register and will hit segmentation faults on boot as memory is getting marked as protected that should not be. Ensure that cpuid from SVM does not advertise the feature. Signed-off-by: John Allen <john.allen@amd.com> Cc: stable@vger.kernel.org Fixes: 0556cbdc2fbc ("x86/pkeys: Don't check if PKRU is zero before writing it") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-24KVM: x86: Add support for user wait instructionsTao Xu1-0/+6
UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions. This patch adds support for user wait instructions in KVM. Availability of the user wait instructions is indicated by the presence of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5]. User wait instructions may be executed at any privilege level, and use 32bit IA32_UMWAIT_CONTROL MSR to set the maximum time. The behavior of user wait instructions in VMX non-root operation is determined first by the setting of the "enable user wait and pause" secondary processor-based VM-execution control bit 26. If the VM-execution control is 0, UMONITOR/UMWAIT/TPAUSE cause an invalid-opcode exception (#UD). If the VM-execution control is 1, treatment is based on the setting of the “RDTSC exiting†VM-execution control. Because KVM never enables RDTSC exiting, if the instruction causes a delay, the amount of time delayed is called here the physical delay. The physical delay is first computed by determining the virtual delay. If IA32_UMWAIT_CONTROL[31:2] is zero, the virtual delay is the value in EDX:EAX minus the value that RDTSC would return; if IA32_UMWAIT_CONTROL[31:2] is not zero, the virtual delay is the minimum of that difference and AND(IA32_UMWAIT_CONTROL,FFFFFFFCH). Because umwait and tpause can put a (psysical) CPU into a power saving state, by default we dont't expose it to kvm and enable it only when guest CPUID has it. Detailed information about user wait instructions can be found in the latest Intel 64 and IA-32 Architectures Software Developer's Manual. Co-developed-by: Jingqi Liu <jingqi.liu@intel.com> Signed-off-by: Jingqi Liu <jingqi.liu@intel.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-30KVM: VMX: Include architectural defs header in capabilities.hSean Christopherson1-0/+2
The capabilities header depends on asm/vmx.h but doesn't explicitly include said file. This currently doesn't cause problems as all users of capbilities.h first include asm/vmx.h, but the issue often results in build errors if someone starts moving things around the VMX files. Fixes: 3077c1910882 ("KVM: VMX: Move capabilities structs and helpers to dedicated file") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21KVM: x86: Add Intel PT virtualization work modeChao Peng1-0/+15
Intel Processor Trace virtualization can be work in one of 2 possible modes: a. System-Wide mode (default): When the host configures Intel PT to collect trace packets of the entire system, it can leave the relevant VMX controls clear to allow VMX-specific packets to provide information across VMX transitions. KVM guest will not aware this feature in this mode and both host and KVM guest trace will output to host buffer. b. Host-Guest mode: Host can configure trace-packet generation while in VMX non-root operation for guests and root operation for native executing normally. Intel PT will be exposed to KVM guest in this mode, and the trace output to respective buffer of host and guest. In this mode, tht status of PT will be saved and disabled before VM-entry and restored after VM-exit if trace a virtual machine. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14KVM: VMX: Expose various module param vars via capabilities.hSean Christopherson1-0/+7
Expose the variables associated with various module params that are needed by the nested VMX code. There is no ulterior logic for what variables are/aren't exposed, this is purely "what's needed by the nested code". Note that @nested is intentionally not exposed. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14KVM: VMX: Move capabilities structs and helpers to dedicated fileSean Christopherson1-0/+321
Defining a separate capabilities.h as opposed to putting this code in e.g. vmx.h avoids circular dependencies between (the yet-to-be-added) vmx.h and nested.h. The aforementioned circular dependencies are why struct nested_vmx_msrs also resides in capabilities instead of e.g. nested.h. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>