summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)AuthorFilesLines
2021-10-21KVM: x86: check for interrupts before deciding whether to exit the fast pathPaolo Bonzini1-5/+5
The kvm_x86_sync_pir_to_irr callback can sometimes set KVM_REQ_EVENT. If that happens exactly at the time that an exit is handled as EXIT_FASTPATH_REENTER_GUEST, vcpu_enter_guest will go incorrectly through the loop that calls kvm_x86_run, instead of processing the request promptly. Fixes: 379a3c8ee444 ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath") Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds5-17/+32
Pull kvm fixes from Paolo Bonzini: "Tools: - kvm_stat: do not show halt_wait_ns since it is not a cumulative statistic x86: - clean ups and fixes for bus lock vmexit and lazy allocation of rmaps - two fixes for SEV-ES (one more coming as soon as I get reviews) - fix for static_key underflow ARM: - Properly refcount pages used as a concatenated stage-2 PGD - Fix missing unlock when detecting the use of MTE+VM_SHARED" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SEV-ES: reduce ghcb_sa_len to 32 bits KVM: VMX: Remove redundant handling of bus lock vmexit KVM: kvm_stat: do not show halt_wait_ns KVM: x86: WARN if APIC HW/SW disable static keys are non-zero on unload Revert "KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET" KVM: SEV-ES: Set guest_state_protected after VMSA update KVM: X86: fix lazy allocation of rmaps KVM: SEV-ES: fix length of string I/O KVM: arm64: Release mmap_lock when using VM_SHARED with MTE KVM: arm64: Report corrupted refcount at EL2 KVM: arm64: Fix host stage-2 PGD refcount KVM: s390: Function documentation fixes
2021-10-18KVM: SEV-ES: reduce ghcb_sa_len to 32 bitsPaolo Bonzini1-1/+1
The size of the GHCB scratch area is limited to 16 KiB (GHCB_SCRATCH_AREA_LIMIT), so there is no need for it to be a u64. This fixes a build error on 32-bit systems: i686-linux-gnu-ld: arch/x86/kvm/svm/sev.o: in function `sev_es_string_io: sev.c:(.text+0x110f): undefined reference to `__udivdi3' Cc: stable@vger.kernel.org Fixes: 019057bd73d1 ("KVM: SEV-ES: fix length of string I/O") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18KVM: VMX: Remove redundant handling of bus lock vmexitHao Xiang1-6/+9
Hardware may or may not set exit_reason.bus_lock_detected on BUS_LOCK VM-Exits. Dealing with KVM_RUN_X86_BUS_LOCK in handle_bus_lock_vmexit could be redundant when exit_reason.basic is EXIT_REASON_BUS_LOCK. We can remove redundant handling of bus lock vmexit. Unconditionally Set exit_reason.bus_lock_detected in handle_bus_lock_vmexit(), and deal with KVM_RUN_X86_BUS_LOCK only in vmx_handle_exit(). Signed-off-by: Hao Xiang <hao.xiang@linux.alibaba.com> Message-Id: <1634299161-30101-1-git-send-email-hao.xiang@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18KVM: x86: WARN if APIC HW/SW disable static keys are non-zero on unloadSean Christopherson1-0/+2
WARN if the static keys used to track if any vCPU has disabled its APIC are left elevated at module exit. Unlike the underflow case, nothing in the static key infrastructure will complain if a key is left elevated, and because an elevated key only affects performance, nothing in KVM will fail if either key is improperly incremented. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211013003554.47705-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18Revert "KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU ↵Sean Christopherson1-7/+11
RESET" Revert a change to open code bits of kvm_lapic_set_base() when emulating APIC RESET to fix an apic_hw_disabled underflow bug due to arch.apic_base and apic_hw_disabled being unsyncrhonized when the APIC is created. If kvm_arch_vcpu_create() fails after creating the APIC, kvm_free_lapic() will see the initialized-to-zero vcpu->arch.apic_base and decrement apic_hw_disabled without KVM ever having incremented apic_hw_disabled. Using kvm_lapic_set_base() in kvm_lapic_reset() is also desirable for a potential future where KVM supports RESET outside of vCPU creation, in which case all the side effects of kvm_lapic_set_base() are needed, e.g. to handle the transition from x2APIC => xAPIC. Alternatively, KVM could temporarily increment apic_hw_disabled (and call kvm_lapic_set_base() at RESET), but that's a waste of cycles and would impact the performance of other vCPUs and VMs. The other subtle side effect is that updating the xAPIC ID needs to be done at RESET regardless of whether the APIC was previously enabled, i.e. kvm_lapic_reset() needs an explicit call to kvm_apic_set_xapic_id() regardless of whether or not kvm_lapic_set_base() also performs the update. That makes stuffing the enable bit at vCPU creation slightly more palatable, as doing so affects only the apic_hw_disabled key. Opportunistically tweak the comment to explicitly call out the connection between vcpu->arch.apic_base and apic_hw_disabled, and add a comment to call out the need to always do kvm_apic_set_xapic_id() at RESET. Underflow scenario: kvm_vm_ioctl() { kvm_vm_ioctl_create_vcpu() { kvm_arch_vcpu_create() { if (something_went_wrong) goto fail_free_lapic; /* vcpu->arch.apic_base is initialized when something_went_wrong is false. */ kvm_vcpu_reset() { kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event) { vcpu->arch.apic_base = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE; } } return 0; fail_free_lapic: kvm_free_lapic() { /* vcpu->arch.apic_base is not yet initialized when something_went_wrong is true. */ if (!(vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE)) static_branch_slow_dec_deferred(&apic_hw_disabled); // <= underflow bug. } return r; } } } This (mostly) reverts commit 421221234ada41b4a9f0beeb08e30b07388bd4bd. Fixes: 421221234ada ("KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET") Reported-by: syzbot+9fc046ab2b0cf295a063@syzkaller.appspotmail.com Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211013003554.47705-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18KVM: X86: fix lazy allocation of rmapsPaolo Bonzini1-1/+2
If allocation of rmaps fails, but some of the pointers have already been written, those pointers can be cleaned up when the memslot is freed, or even reused later for another attempt at allocating the rmaps. Therefore there is no need to WARN, as done for example in memslot_rmap_alloc, but the allocation *must* be skipped lest KVM will overwrite the previous pointer and will indeed leak memory. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18KVM: SEV-ES: Set guest_state_protected after VMSA updatePeter Gonda1-1/+6
The refactoring in commit bb18a6777465 ("KVM: SEV: Acquire vcpu mutex when updating VMSA") left behind the assignment to svm->vcpu.arch.guest_state_protected; add it back. Signed-off-by: Peter Gonda <pgonda@google.com> [Delta between v2 and v3 of Peter's patch, which had already been committed; the commit message is my own. - Paolo] Fixes: bb18a6777465 ("KVM: SEV: Acquire vcpu mutex when updating VMSA") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-18Merge tag 'perf_urgent_for_v5.15_rc6' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Borislav Petkov: - Add Sapphire Rapids to the list of CPUs supporting the SMI count MSR * tag 'perf_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/msr: Add Sapphire Rapids CPU support
2021-10-16x86/fpu: Mask out the invalid MXCSR bits properlyBorislav Petkov1-1/+1
This is a fix for the fix (yeah, /facepalm). The correct mask to use is not the negation of the MXCSR_MASK but the actual mask which contains the supported bits in the MXCSR register. Reported and debugged by Ville Syrjälä <ville.syrjala@linux.intel.com> Fixes: d298b03506d3 ("x86/fpu: Restore the masking out of reserved MXCSR bits") Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Ser Olmy <ser.olmy@protonmail.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/YWgYIYXLriayyezv@intel.com
2021-10-15perf/x86/msr: Add Sapphire Rapids CPU supportKan Liang1-0/+1
SMI_COUNT MSR is supported on Sapphire Rapids CPU. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1633551137-192083-1-git-send-email-kan.liang@linux.intel.com
2021-10-15KVM: SEV-ES: fix length of string I/OPaolo Bonzini1-1/+1
The size of the data in the scratch buffer is not divided by the size of each port I/O operation, so vcpu->arch.pio.count ends up being larger than it should be by a factor of size. Cc: stable@vger.kernel.org Fixes: 7ed9abfe8e9f ("KVM: SVM: Support string IO operations for an SEV-ES guest") Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-11x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT automaticallyBorislav Petkov1-1/+0
This Kconfig option was added initially so that memory encryption is enabled by default on machines which support it. However, devices which have DMA masks that are less than the bit position of the encryption bit, aka C-bit, require the use of an IOMMU or the use of SWIOTLB. If the IOMMU is disabled or in passthrough mode, the kernel would switch to SWIOTLB bounce-buffering for those transfers. In order to avoid that, 2cc13bb4f59f ("iommu: Disable passthrough mode when SME is active") disables the default IOMMU passthrough mode so that devices for which the default 256K DMA is insufficient, can use the IOMMU instead. However 2, there are cases where the IOMMU is disabled in the BIOS, etc. (think the usual hardware folk "oops, I dropped the ball there" cases) or a driver doesn't properly use the DMA APIs or a device has a firmware or hardware bug, e.g.: ea68573d408f ("drm/amdgpu: Fail to load on RAVEN if SME is active") However 3, in the above GPU use case, there are APIs like Vulkan and some OpenGL/OpenCL extensions which are under the assumption that user-allocated memory can be passed in to the kernel driver and both the GPU and CPU can do coherent and concurrent access to the same memory. That cannot work with SWIOTLB bounce buffers, of course. So, in order for those devices to function, drop the "default y" for the SME by default active option so that users who want to have SME enabled, will need to either enable it in their config or use "mem_encrypt=on" on the kernel command line. [ tlendacky: Generalize commit message. ] Fixes: 7744ccdbc16f ("x86/mm: Add Secure Memory Encryption (SME) support") Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/8bbacd0e-4580-3194-19d2-a0ecad7df09c@molgen.mpg.de
2021-10-10Merge tag 'x86_urgent_for_v5.15_rc5' of ↵Linus Torvalds9-14/+99
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - A FPU fix to properly handle invalid MXCSR values: 32-bit masks them out due to historical reasons and 64-bit kernels reject them - A fix to clear X86_FEATURE_SMAP when support for is not config-enabled - Three fixes correcting misspelled Kconfig symbols used in code - Two resctrl object cleanup fixes - Yet another attempt at fixing the neverending saga of botched x86 timers, this time because some incredibly smart hardware decides to turn off the HPET timer in a low power state - who cares if the OS is relying on it... - Check the full return value range of an SEV VMGEXIT call to determine whether it returned an error * tag 'x86_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Restore the masking out of reserved MXCSR bits x86/Kconfig: Correct reference to MWINCHIP3D x86/platform/olpc: Correct ifdef symbol to intended CONFIG_OLPC_XO15_SCI x86/entry: Clear X86_FEATURE_SMAP when CONFIG_X86_SMAP=n x86/entry: Correct reference to intended CONFIG_64_BIT x86/resctrl: Fix kfree() of the wrong type in domain_add_cpu() x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state() fails x86/hpet: Use another crystalball to evaluate HPET usability x86/sev: Return an error on a returned non-zero SW_EXITINFO1[31:0]
2021-10-08Merge tag 'for-linus-5.15b-rc5-tag' of ↵Linus Torvalds10-68/+97
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - fix two minor issues in the Xen privcmd driver plus a cleanup patch for that driver - fix multiple issues related to running as PVH guest and some related earlyprintk fixes for other Xen guest types - fix an issue introduced in 5.15 the Xen balloon driver * tag 'for-linus-5.15b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/balloon: fix cancelled balloon action xen/x86: adjust data placement x86/PVH: adjust function/data placement xen/x86: hook up xen_banner() also for PVH xen/x86: generalize preferred console model from PV to PVH Dom0 xen/x86: make "earlyprintk=xen" work for HVM/PVH DomU xen/x86: allow "earlyprintk=xen" to work for PV Dom0 xen/x86: make "earlyprintk=xen" work better for PVH Dom0 xen/x86: allow PVH Dom0 without XEN_PV=y xen/x86: prevent PVH type from getting clobbered xen/privcmd: drop "pages" parameter from xen_remap_pfn() xen/privcmd: fix error handling in mmap-resource processing xen/privcmd: replace kcalloc() by kvcalloc() when allocating empty pages
2021-10-08Merge tag 'asm-generic-fixes-5.15' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic fixes from Arnd Bergmann: "There is one build fix for Arm platforms that ended up impacting most architectures because of the way the drivers/firmware Kconfig file is wired up: The CONFIG_QCOM_SCM dependency have caused a number of randconfig regressions over time, and some still remain in v5.15-rc4. The fix we agreed on in the end is to make this symbol selected by any driver using it, and then building it even for non-Arm platforms with CONFIG_COMPILE_TEST. To make this work on all architectures, the drivers/firmware/Kconfig file needs to be included for all architectures to make the symbol itself visible. In a separate discussion, we found that a sound driver patch that is pending for v5.16 needs the same change to include this Kconfig file, so the easiest solution seems to have my Kconfig rework included in v5.15. Finally, the branch also includes a small unrelated build fix for NOMMU architectures" Link: https://lore.kernel.org/all/20210928153508.101208f8@canb.auug.org.au/ Link: https://lore.kernel.org/all/20210928075216.4193128-1-arnd@kernel.org/ Link: https://lore.kernel.org/all/20211007151010.333516-1-arnd@kernel.org/ * tag 'asm-generic-fixes-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic/io.h: give stub iounmap() on !MMU same prototype as elsewhere qcom_scm: hide Kconfig symbol firmware: include drivers/firmware/Kconfig unconditionally
2021-10-08x86/fpu: Restore the masking out of reserved MXCSR bitsBorislav Petkov1-3/+8
Ser Olmy reported a boot failure: init[1] bad frame in sigreturn frame:(ptrval) ip:b7c9fbe6 sp:bf933310 orax:ffffffff \ in libc-2.33.so[b7bed000+156000] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b CPU: 0 PID: 1 Comm: init Tainted: G W 5.14.9 #1 Hardware name: Hewlett-Packard HP PC/HP Board, BIOS JD.00.06 12/06/2001 Call Trace: dump_stack_lvl dump_stack panic do_exit.cold do_group_exit get_signal arch_do_signal_or_restart ? force_sig_info_to_task ? force_sig exit_to_user_mode_prepare syscall_exit_to_user_mode do_int80_syscall_32 entry_INT80_32 on an old 32-bit Intel CPU: vendor_id : GenuineIntel cpu family : 6 model : 6 model name : Celeron (Mendocino) stepping : 5 microcode : 0x3 Ser bisected the problem to the commit in Fixes. tglx suggested reverting the rejection of invalid MXCSR values which this commit introduced and replacing it with what the old code did - simply masking them out to zero. Further debugging confirmed his suggestion: fpu->state.fxsave.mxcsr: 0xb7be13b4, mxcsr_feature_mask: 0xffbf WARNING: CPU: 0 PID: 1 at arch/x86/kernel/fpu/signal.c:384 __fpu_restore_sig+0x51f/0x540 so restore the original behavior only for 32-bit kernels where you have ancient machines with buggy hardware. For 32-bit programs on 64-bit kernels, user space which supplies wrong MXCSR values is considered malicious so fail the sigframe restoration there. Fixes: 6f9866a166cd ("x86/fpu/signal: Let xrstor handle the features to init") Reported-by: Ser Olmy <ser.olmy@protonmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Ser Olmy <ser.olmy@protonmail.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/YVtA67jImg3KlBTw@zn.tnic
2021-10-07Merge tag 'hyperv-fixes-signed-20211007' of ↵Linus Torvalds1-5/+15
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Replace uuid.h with types.h in a header (Andy Shevchenko) - Avoid sleeping in atomic context in PCI driver (Long Li) - Avoid sending IPI to self when it shouldn't (Vitaly Kuznetsov) * tag 'hyperv-fixes-signed-20211007' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Avoid erroneously sending IPI to 'self' hyper-v: Replace uuid.h with types.h PCI: hv: Fix sleep while in non-sleep context when removing child devices from the bus
2021-10-07firmware: include drivers/firmware/Kconfig unconditionallyArnd Bergmann1-2/+0
Compile-testing drivers that require access to a firmware layer fails when that firmware symbol is unavailable. This happened twice this week: - My proposed to change to rework the QCOM_SCM firmware symbol broke on ppc64 and others. - The cs_dsp firmware patch added device specific firmware loader into drivers/firmware, which broke on the same set of architectures. We should probably do the same thing for other subsystems as well, but fix this one first as this is a dependency for other patches getting merged. Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Mark Brown <broonie@kernel.org> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Charles Keepax <ckeepax@opensource.cirrus.com> Cc: Simon Trimmer <simont@opensource.cirrus.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-10-06x86/Kconfig: Correct reference to MWINCHIP3DLukas Bulwahn1-1/+1
Commit in Fixes intended to exclude the Winchip series and referred to CONFIG_WINCHIP3D, but the config symbol is called CONFIG_MWINCHIP3D. Hence, scripts/checkkconfigsymbols.py warns: WINCHIP3D Referencing files: arch/x86/Kconfig Correct the reference to the intended config symbol. Fixes: 69b8d3fcabdc ("x86/Kconfig: Exclude i586-class CPUs lacking PAE support from the HIGHMEM64G Kconfig group") Suggested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20210803113531.30720-4-lukas.bulwahn@gmail.com
2021-10-06x86/platform/olpc: Correct ifdef symbol to intended CONFIG_OLPC_XO15_SCILukas Bulwahn1-1/+1
The refactoring in the commit in Fixes introduced an ifdef CONFIG_OLPC_XO1_5_SCI, however the config symbol is actually called "CONFIG_OLPC_XO15_SCI". Fortunately, ./scripts/checkkconfigsymbols.py warns: OLPC_XO1_5_SCI Referencing files: arch/x86/platform/olpc/olpc.c Correct this ifdef condition to the intended config symbol. Fixes: ec9964b48033 ("Platform: OLPC: Move EC-specific functionality out from x86") Suggested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20210803113531.30720-3-lukas.bulwahn@gmail.com
2021-10-06x86/entry: Clear X86_FEATURE_SMAP when CONFIG_X86_SMAP=nVegard Nossum1-0/+1
Commit 3c73b81a9164 ("x86/entry, selftests: Further improve user entry sanity checks") added a warning if AC is set when in the kernel. Commit 662a0221893a3d ("x86/entry: Fix AC assertion") changed the warning to only fire if the CPU supports SMAP. However, the warning can still trigger on a machine that supports SMAP but where it's disabled in the kernel config and when running the syscall_nt selftest, for example: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 49 at irqentry_enter_from_user_mode CPU: 0 PID: 49 Comm: init Tainted: G T 5.15.0-rc4+ #98 e6202628ee053b4f310759978284bd8bb0ce6905 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:irqentry_enter_from_user_mode ... Call Trace: ? irqentry_enter ? exc_general_protection ? asm_exc_general_protection ? asm_exc_general_protectio IS_ENABLED(CONFIG_X86_SMAP) could be added to the warning condition, but even this would not be enough in case SMAP is disabled at boot time with the "nosmap" parameter. To be consistent with "nosmap" behaviour, clear X86_FEATURE_SMAP when !CONFIG_X86_SMAP. Found using entry-fuzz + satrandconfig. [ bp: Massage commit message. ] Fixes: 3c73b81a9164 ("x86/entry, selftests: Further improve user entry sanity checks") Fixes: 662a0221893a ("x86/entry: Fix AC assertion") Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20211003223423.8666-1-vegard.nossum@oracle.com
2021-10-06x86/entry: Correct reference to intended CONFIG_64_BITLukas Bulwahn1-1/+1
Commit in Fixes adds a condition with IS_ENABLED(CONFIG_64_BIT), but the intended config item is called CONFIG_64BIT, as defined in arch/x86/Kconfig. Fortunately, scripts/checkkconfigsymbols.py warns: 64_BIT Referencing files: arch/x86/include/asm/entry-common.h Correct the reference to the intended config symbol. Fixes: 662a0221893a ("x86/entry: Fix AC assertion") Suggested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20210803113531.30720-2-lukas.bulwahn@gmail.com
2021-10-06x86/resctrl: Fix kfree() of the wrong type in domain_add_cpu()James Morse1-2/+2
Commit in Fixes separated the architecture specific and filesystem parts of the resctrl domain structures. This left the error paths in domain_add_cpu() kfree()ing the memory with the wrong type. This will cause a problem if someone adds a new member to struct rdt_hw_domain meaning d_resctrl is no longer the first member. Fixes: 792e0f6f789b ("x86/resctrl: Split struct rdt_domain") Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/20210917165924.28254-1-james.morse@arm.com
2021-10-06x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state() failsJames Morse1-0/+2
domain_add_cpu() is called whenever a CPU is brought online. The earlier call to domain_setup_ctrlval() allocates the control value arrays. If domain_setup_mon_state() fails, the control value arrays are not freed. Add the missing kfree() calls. Fixes: 1bd2a63b4f0de ("x86/intel_rdt/mba_sc: Add initialization support") Fixes: edf6fa1c4a951 ("x86/intel_rdt/cqm: Add RMID (Resource monitoring ID) management") Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Reinette Chatre <reinette.chatre@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20210917165958.28313-1-james.morse@arm.com
2021-10-06x86/hyperv: Avoid erroneously sending IPI to 'self'Vitaly Kuznetsov1-5/+15
__send_ipi_mask_ex() uses an optimization: when the target CPU mask is equal to 'cpu_present_mask' it uses 'HV_GENERIC_SET_ALL' format to avoid converting the specified cpumask to VP_SET. This case was overlooked when 'exclude_self' parameter was added. As the result, a spurious IPI to 'self' can be send. Reported-by: Thomas Gleixner <tglx@linutronix.de> Fixes: dfb5c1e12c28 ("x86/hyperv: remove on-stack cpumask from hv_send_ipi_mask_allbutself") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20211006125016.941616-1-vkuznets@redhat.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-10-05xen/x86: adjust data placementJan Beulich2-3/+5
Both xen_pvh and xen_start_flags get written just once early during init. Using the respective annotation then allows the open-coded placing in .data to go away. Additionally the former, like the latter, wants exporting, or else xen_pvh_domain() can't be used from modules. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/8155ed26-5a1d-c06f-42d8-596d26e75849@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-10-05x86/PVH: adjust function/data placementJan Beulich1-6/+6
Two of the variables can live in .init.data, allowing the open-coded placing in .data to go away. Another "variable" is used to communicate a size value only to very early assembly code, which hence can be both const and live in .init.*. Additionally two functions were lacking __init annotations. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/3b0bb22e-43f4-e459-c5cb-169f996b5669@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-10-05xen/x86: hook up xen_banner() also for PVHJan Beulich4-11/+16
This was effectively lost while dropping PVHv1 code. Move the function and arrange for it to be called the same way as done in PV mode. Clearly this then needs re-introducing the XENFEAT_mmu_pt_update_preserve_ad check that was recently removed, as that's a PV-only feature. Since the string pointed at by pv_info.name describes the mode, drop "paravirtualized" from the log message while moving the code. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/de03054d-a20d-2114-bb86-eec28e17b3b8@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-10-05xen/x86: generalize preferred console model from PV to PVH Dom0Jan Beulich4-7/+18
Without announcing hvc0 as preferred it won't get used as long as tty0 gets registered earlier. This is particularly problematic with there not being any screen output for PVH Dom0 when the screen is in graphics mode, as the necessary information doesn't get conveyed yet from the hypervisor. Follow PV's model, but be conservative and do this for Dom0 only for now. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/582328b6-c86c-37f3-d802-5539b7a86736@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-10-05xen/x86: allow "earlyprintk=xen" to work for PV Dom0Jan Beulich1-1/+1
With preferred consoles "tty" and "hvc" announced as preferred, registering "xenboot" early won't result in use of the console: It also needs to be registered as preferred. Generalize this from being DomU- only so far. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/d4a34540-a476-df2c-bca6-732d0d58c5f0@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-10-05xen/x86: allow PVH Dom0 without XEN_PV=yJan Beulich7-35/+47
Decouple XEN_DOM0 from XEN_PV, converting some existing uses of XEN_DOM0 to a new XEN_PV_DOM0. (I'm not convinced all are really / should really be PV-specific, but for starters I've tried to be conservative.) For PVH Dom0 the hypervisor populates MADT with only x2APIC entries, so without x2APIC support enabled in the kernel things aren't going to work very well. (As opposed, DomU-s would only ever see LAPIC entries in MADT as of now.) Note that this then requires PVH Dom0 to be 64-bit, as X86_X2APIC depends on X86_64. In the course of this xen_running_on_version_or_later() needs to be available more broadly. Move it from a PV-specific to a generic file, considering that what it does isn't really PV-specific at all anyway. Note that xen/interface/version.h cannot be included on its own; in enlighten.c, which uses SCHEDOP_* anyway, include xen/interface/sched.h first to resolve the apparently sole missing type (xen_ulong_t). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/983bb72f-53df-b6af-14bd-5e088bd06a08@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-10-05xen/x86: prevent PVH type from getting clobberedJan Beulich1-5/+4
Like xen_start_flags, xen_domain_type gets set before .bss gets cleared. Hence this variable also needs to be prevented from getting put in .bss, which is possible because XEN_NATIVE is an enumerator evaluating to zero. Any use prior to init_hvm_pv_info() setting the variable again would lead to wrong decisions; one such case is xenboot_console_setup() when called as a result of "earlyprintk=xen". Use __ro_after_init as more applicable than either __section(".data") or __read_mostly. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/d301677b-6f22-5ae6-bd36-458e1f323d0b@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-10-05xen/privcmd: drop "pages" parameter from xen_remap_pfn()Jan Beulich1-1/+1
The function doesn't use it and all of its callers say in a comment that their respective arguments are to be non-NULL only in auto-translated mode. Since xen_remap_domain_mfn_array() isn't supposed to be used by non-PV, drop the parameter there as well. It was bogusly passed as non- NULL (PRIV_VMA_LOCKED) by its only caller anyway. For xen_remap_domain_gfn_range(), otoh, it's not clear at all why this wouldn't want / might not need to gain auto-translated support down the road, so the parameter is retained there despite now remaining unused (and the only caller passing NULL); correct a respective comment as well. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/036ad8a2-46f9-ac3d-6219-bdc93ab9e10b@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-10-03kvm: fix objtool relocation warningLinus Torvalds1-1/+0
The recent change to make objtool aware of more symbol relocation types (commit 24ff65257375: "objtool: Teach get_alt_entry() about more relocation types") also added another check, and resulted in this objtool warning when building kvm on x86: arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception The reason seems to be that kvm_fastop_exception() is marked as a global symbol, which causes the relocation to ke kept around for objtool. And at the same time, the kvm_fastop_exception definition (which is done as an inline asm statement) doesn't actually set the type of the global, which then makes objtool unhappy. The minimal fix is to just not mark kvm_fastop_exception as being a global symbol. It's only used in that one compilation unit anyway, so it was always pointless. That's how all the other local exception table labels are done. I'm not entirely happy about the kinds of games that the kvm code plays with doing its own exception handling, and the fact that it confused objtool is most definitely a symptom of the code being a bit too subtle and ad-hoc. But at least this trivial one-liner makes objtool no longer upset about what is going on. Fixes: 24ff65257375 ("objtool: Teach get_alt_entry() about more relocation types") Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/ Cc: Borislav Petkov <bp@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-03Merge tag 'perf_urgent_for_v5.15_rc4' of ↵Linus Torvalds2-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Make sure the destroy callback is reset when a event initialization fails - Update the event constraints for Icelake - Make sure the active time of an event is updated even for inactive events * tag 'perf_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: fix userpage->time_enabled of inactive events perf/x86/intel: Update event constraints for ICX perf/x86: Reset destroy callback on event init failure
2021-10-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-14/+19
Pull more kvm fixes from Paolo Bonzini: "Small x86 fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: selftests: Ensure all migrations are performed when test is affined KVM: x86: Swap order of CPUID entry "index" vs. "significant flag" checks ptp: Fix ptp_kvm_getcrosststamp issue for x86 ptp_kvm x86/kvmclock: Move this_cpu_pvti into kvmclock.h selftests: KVM: Don't clobber XMM register when read KVM: VMX: Fix a TSX_CTRL_CPUID_CLEAR field mask issue
2021-10-01perf/x86/intel: Update event constraints for ICXKan Liang1-0/+1
According to the latest event list, the event encoding 0xEF is only available on the first 4 counters. Add it into the event constraints table. Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1632842343-25862-1-git-send-email-kan.liang@linux.intel.com
2021-10-01perf/x86: Reset destroy callback on event init failureAnand K Mistry1-0/+1
perf_init_event tries multiple init callbacks and does not reset the event state between tries. When x86_pmu_event_init runs, it unconditionally sets the destroy callback to hw_perf_event_destroy. On the next init attempt after x86_pmu_event_init, in perf_try_init_event, if the pmu's capabilities includes PERF_PMU_CAP_NO_EXCLUDE, the destroy callback will be run. However, if the next init didn't set the destroy callback, hw_perf_event_destroy will be run (since the callback wasn't reset). Looking at other pmu init functions, the common pattern is to only set the destroy callback on a successful init. Resetting the callback on failure tries to replicate that pattern. This was discovered after commit f11dd0d80555 ("perf/x86/amd/ibs: Extend PERF_PMU_CAP_NO_EXCLUDE to IBS Op") when the second (and only second) run of the perf tool after a reboot results in 0 samples being generated. The extra run of hw_perf_event_destroy results in active_events having an extra decrement on each perf run. The second run has active_events == 0 and every subsequent run has active_events < 0. When active_events == 0, the NMI handler will early-out and not record any samples. Signed-off-by: Anand K Mistry <amistry@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210929170405.1.I078b98ee7727f9ae9d6df8262bad7e325e40faf0@changeid
2021-10-01x86/hpet: Use another crystalball to evaluate HPET usabilityThomas Gleixner2-6/+81
On recent Intel systems the HPET stops working when the system reaches PC10 idle state. The approach of adding PCI ids to the early quirks to disable HPET on these systems is a whack a mole game which makes no sense. Check for PC10 instead and force disable HPET if supported. The check is overbroad as it does not take ACPI, intel_idle enablement and command line parameters into account. That's fine as long as there is at least PMTIMER available to calibrate the TSC frequency. The decision can be overruled by adding "hpet=force" on the kernel command line. Remove the related early PCI quirks for affected Ice Cake and Coffin Lake systems as they are not longer required. That should also cover all other systems, i.e. Tiger Rag and newer generations, which are most likely affected by this as well. Fixes: Yet another hardware trainwreck Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org> Cc: stable@vger.kernel.org Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: Bjorn Helgaas <bhelgaas@google.com>
2021-10-01x86/sev: Return an error on a returned non-zero SW_EXITINFO1[31:0]Tom Lendacky1-0/+2
After returning from a VMGEXIT NAE event, SW_EXITINFO1[31:0] is checked for a value of 1, which indicates an error and that SW_EXITINFO2 contains exception information. However, future versions of the GHCB specification may define new values for SW_EXITINFO1[31:0], so really any non-zero value should be treated as an error. Fixes: 597cfe48212a ("x86/boot/compressed/64: Setup a GHCB-based VC Exception handler") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 5.10+ Link: https://lkml.kernel.org/r/efc772af831e9e7f517f0439b13b41f56bad8784.1633063321.git.thomas.lendacky@amd.com
2021-10-01Merge tag 'net-5.15-rc4' of ↵Linus Torvalds1-18/+48
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from mac80211, netfilter and bpf. Current release - regressions: - bpf, cgroup: assign cgroup in cgroup_sk_alloc when called from interrupt - mdio: revert mechanical patches which broke handling of optional resources - dev_addr_list: prevent address duplication Previous releases - regressions: - sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb (NULL deref) - Revert "mac80211: do not use low data rates for data frames with no ack flag", fixing broadcast transmissions - mac80211: fix use-after-free in CCMP/GCMP RX - netfilter: include zone id in tuple hash again, minimize collisions - netfilter: nf_tables: unlink table before deleting it (race -> UAF) - netfilter: log: work around missing softdep backend module - mptcp: don't return sockets in foreign netns - sched: flower: protect fl_walk() with rcu (race -> UAF) - ixgbe: fix NULL pointer dereference in ixgbe_xdp_setup - smsc95xx: fix stalled rx after link change - enetc: fix the incorrect clearing of IF_MODE bits - ipv4: fix rtnexthop len when RTA_FLOW is present - dsa: mv88e6xxx: 6161: use correct MAX MTU config method for this SKU - e100: fix length calculation & buffer overrun in ethtool::get_regs Previous releases - always broken: - mac80211: fix using stale frag_tail skb pointer in A-MSDU tx - mac80211: drop frames from invalid MAC address in ad-hoc mode - af_unix: fix races in sk_peer_pid and sk_peer_cred accesses (race -> UAF) - bpf, x86: Fix bpf mapping of atomic fetch implementation - bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog - netfilter: ip6_tables: zero-initialize fragment offset - mhi: fix error path in mhi_net_newlink - af_unix: return errno instead of NULL in unix_create1() when over the fs.file-max limit Misc: - bpf: exempt CAP_BPF from checks against bpf_jit_limit - netfilter: conntrack: make max chain length random, prevent guessing buckets by attackers - netfilter: nf_nat_masquerade: make async masq_inet6_event handling generic, defer conntrack walk to work queue (prevent hogging RTNL lock)" * tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits) af_unix: fix races in sk_peer_pid and sk_peer_cred accesses net: stmmac: fix EEE init issue when paired with EEE capable PHYs net: dev_addr_list: handle first address in __hw_addr_add_ex net: sched: flower: protect fl_walk() with rcu net: introduce and use lock_sock_fast_nested() net: phy: bcm7xxx: Fixed indirect MMD operations net: hns3: disable firmware compatible features when uninstall PF net: hns3: fix always enable rx vlan filter problem after selftest net: hns3: PF enable promisc for VF when mac table is overflow net: hns3: fix show wrong state when add existing uc mac address net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE net: hns3: don't rollback when destroy mqprio fail net: hns3: remove tc enable checking net: hns3: do not allow call hns3_nic_net_open repeatedly ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup net: bridge: mcast: Associate the seqcount with its protecting lock. net: mdio-ipq4019: Fix the error for an optional regs resource net: hns3: fix hclge_dbg_dump_tm_pg() stack usage net: mdio: mscc-miim: Fix the mdio controller af_unix: Return errno instead of NULL in unix_create1(). ...
2021-09-30KVM: x86: Swap order of CPUID entry "index" vs. "significant flag" checksSean Christopherson1-2/+2
Check whether a CPUID entry's index is significant before checking for a matching index to hack-a-fix an undefined behavior bug due to consuming uninitialized data. RESET/INIT emulation uses kvm_cpuid() to retrieve CPUID.0x1, which does _not_ have a significant index, and fails to initialize the dummy variable that doubles as EBX/ECX/EDX output _and_ ECX, a.k.a. index, input. Practically speaking, it's _extremely_ unlikely any compiler will yield code that causes problems, as the compiler would need to inline the kvm_cpuid() call to detect the uninitialized data, and intentionally hose the kernel, e.g. insert ud2, instead of simply ignoring the result of the index comparison. Although the sketchy "dummy" pattern was introduced in SVM by commit 66f7b72e1171 ("KVM: x86: Make register state after reset conform to specification"), it wasn't actually broken until commit 7ff6c0350315 ("KVM: x86: Remove stateful CPUID handling") arbitrarily swapped the order of operations such that "index" was checked before the significant flag. Avoid consuming uninitialized data by reverting to checking the flag before the index purely so that the fix can be easily backported; the offending RESET/INIT code has been refactored, moved, and consolidated from vendor code to common x86 since the bug was introduced. A future patch will directly address the bad RESET/INIT behavior. The undefined behavior was detected by syzbot + KernelMemorySanitizer. BUG: KMSAN: uninit-value in cpuid_entry2_find arch/x86/kvm/cpuid.c:68 BUG: KMSAN: uninit-value in kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103 BUG: KMSAN: uninit-value in kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183 cpuid_entry2_find arch/x86/kvm/cpuid.c:68 [inline] kvm_find_cpuid_entry arch/x86/kvm/cpuid.c:1103 [inline] kvm_cpuid+0x456/0x28f0 arch/x86/kvm/cpuid.c:1183 kvm_vcpu_reset+0x13fb/0x1c20 arch/x86/kvm/x86.c:10885 kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923 vcpu_enter_guest+0xfd2/0x6d80 arch/x86/kvm/x86.c:9534 vcpu_run+0x7f5/0x18d0 arch/x86/kvm/x86.c:9788 kvm_arch_vcpu_ioctl_run+0x245b/0x2d10 arch/x86/kvm/x86.c:10020 Local variable ----dummy@kvm_vcpu_reset created at: kvm_vcpu_reset+0x1fb/0x1c20 arch/x86/kvm/x86.c:10812 kvm_apic_accept_events+0x58f/0x8c0 arch/x86/kvm/lapic.c:2923 Reported-by: syzbot+f3985126b746b3d59c9d@syzkaller.appspotmail.com Reported-by: Alexander Potapenko <glider@google.com> Fixes: 2a24be79b6b7 ("KVM: VMX: Set EDX at INIT with CPUID.0x1, Family-Model-Stepping") Fixes: 7ff6c0350315 ("KVM: x86: Remove stateful CPUID handling") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20210929222426.1855730-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30x86/kvmclock: Move this_cpu_pvti into kvmclock.hZelin Deng2-11/+16
There're other modules might use hv_clock_per_cpu variable like ptp_kvm, so move it into kvmclock.h and export the symbol to make it visiable to other modules. Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com> Cc: <stable@vger.kernel.org> Message-Id: <1632892429-101194-2-git-send-email-zelin.deng@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-29Merge branch 'linus' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This contains fixes for a resource leak in ccp as well as stack corruption in x86/sm4" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/sm4 - Fix frame pointer stack corruption crypto: ccp - fix resource leaks in ccp_run_aes_gcm_cmd()
2021-09-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-18/+48
Daniel Borkmann says: ==================== pull-request: bpf 2021-09-28 The following pull-request contains BPF updates for your *net* tree. We've added 10 non-merge commits during the last 14 day(s) which contain a total of 11 files changed, 139 insertions(+), 53 deletions(-). The main changes are: 1) Fix MIPS JIT jump code emission for too large offsets, from Piotr Krysiuk. 2) Fix x86 JIT atomic/fetch emission when dst reg maps to rax, from Johan Almbladh. 3) Fix cgroup_sk_alloc corner case when called from interrupt, from Daniel Borkmann. 4) Fix segfault in libbpf's linker for objects without BTF, from Kumar Kartikeya Dwivedi. 5) Fix bpf_jit_charge_modmem for applications with CAP_BPF, from Lorenz Bauer. 6) Fix return value handling for struct_ops BPF programs, from Hou Tao. 7) Various fixes to BPF selftests, from Jiri Benc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> ,
2021-09-28bpf, x86: Fix bpf mapping of atomic fetch implementationJohan Almbladh1-5/+8
Fix the case where the dst register maps to %rax as otherwise this produces an incorrect mapping with the implementation in 981f94c3e921 ("bpf: Add bitwise atomic instructions") as %rax is clobbered given it's part of the cmpxchg as operand. The issue is similar to b29dd96b905f ("bpf, x86: Fix BPF_FETCH atomic and/or/ xor with r0 as src") just that the case of dst register was missed. Before, dst=r0 (%rax) src=r2 (%rsi): [...] c5: mov %rax,%r10 c8: mov 0x0(%rax),%rax <---+ (broken) cc: mov %rax,%r11 | cf: and %rsi,%r11 | d2: lock cmpxchg %r11,0x0(%rax) <---+ d8: jne 0x00000000000000c8 | da: mov %rax,%rsi | dd: mov %r10,%rax | [...] | | After, dst=r0 (%rax) src=r2 (%rsi): | | [...] | da: mov %rax,%r10 | dd: mov 0x0(%r10),%rax <---+ (fixed) e1: mov %rax,%r11 | e4: and %rsi,%r11 | e7: lock cmpxchg %r11,0x0(%r10) <---+ ed: jne 0x00000000000000dd ef: mov %rax,%rsi f2: mov %r10,%rax [...] The remaining combinations were fine as-is though: After, dst=r9 (%r15) src=r0 (%rax): [...] dc: mov %rax,%r10 df: mov 0x0(%r15),%rax e3: mov %rax,%r11 e6: and %r10,%r11 e9: lock cmpxchg %r11,0x0(%r15) ef: jne 0x00000000000000df _ f1: mov %rax,%r10 | (unneeded, but f4: mov %r10,%rax _| not a problem) [...] After, dst=r9 (%r15) src=r4 (%rcx): [...] de: mov %rax,%r10 e1: mov 0x0(%r15),%rax e5: mov %rax,%r11 e8: and %rcx,%r11 eb: lock cmpxchg %r11,0x0(%r15) f1: jne 0x00000000000000e1 f3: mov %rax,%rcx f6: mov %r10,%rax [...] The case of dst == src register is rejected by the verifier and therefore not supported, but x86 JIT also handles this case just fine. After, dst=r0 (%rax) src=r0 (%rax): [...] eb: mov %rax,%r10 ee: mov 0x0(%r10),%rax f2: mov %rax,%r11 f5: and %r10,%r11 f8: lock cmpxchg %r11,0x0(%r10) fe: jne 0x00000000000000ee 100: mov %rax,%r10 103: mov %r10,%rax [...] Fixes: 981f94c3e921 ("bpf: Add bitwise atomic instructions") Reported-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Brendan Jackman <jackmanb@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-09-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds17-171/+267
Pull kvm fixes from Paolo Bonzini: "A bit late... I got sidetracked by back-from-vacation routines and conferences. But most of these patches are already a few weeks old and things look more calm on the mailing list than what this pull request would suggest. x86: - missing TLB flush - nested virtualization fixes for SMM (secure boot on nested hypervisor) and other nested SVM fixes - syscall fuzzing fixes - live migration fix for AMD SEV - mirror VMs now work for SEV-ES too - fixes for reset - possible out-of-bounds access in IOAPIC emulation - fix enlightened VMCS on Windows 2022 ARM: - Add missing FORCE target when building the EL2 object - Fix a PMU probe regression on some platforms Generic: - KCSAN fixes selftests: - random fixes, mostly for clang compilation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits) selftests: KVM: Explicitly use movq to read xmm registers selftests: KVM: Call ucall_init when setting up in rseq_test KVM: Remove tlbs_dirty KVM: X86: Synchronize the shadow pagetable before link it KVM: X86: Fix missed remote tlb flush in rmap_write_protect() KVM: x86: nSVM: don't copy virt_ext from vmcb12 KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround KVM: x86: selftests: test simultaneous uses of V_IRQ from L1 and L0 KVM: x86: nSVM: restore int_vector in svm_clear_vintr kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[] KVM: x86: nVMX: re-evaluate emulation_required on nested VM exit KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry KVM: x86: VMX: synthesize invalid VM exit when emulating invalid guest state KVM: x86: nSVM: refactor svm_leave_smm and smm_enter_smm KVM: x86: SVM: call KVM_REQ_GET_NESTED_STATE_PAGES on exit from SMM mode KVM: x86: reset pdptrs_from_userspace when exiting smm KVM: x86: nSVM: restore the L1 host state prior to resuming nested guest on SMM exit KVM: nVMX: Filter out all unsupported controls when eVMCS was activated KVM: KVM: Use cpumask_available() to check for NULL cpumask when kicking vCPUs KVM: Clean up benign vcpu->cpu data races when kicking vCPUs ...
2021-09-27KVM: VMX: Fix a TSX_CTRL_CPUID_CLEAR field mask issueZhenzhong Duan1-1/+1
When updating the host's mask for its MSR_IA32_TSX_CTRL user return entry, clear the mask in the found uret MSR instead of vmx->guest_uret_msrs[i]. Modifying guest_uret_msrs directly is completely broken as 'i' does not point at the MSR_IA32_TSX_CTRL entry. In fact, it's guaranteed to be an out-of-bounds accesses as is always set to kvm_nr_uret_msrs in a prior loop. By sheer dumb luck, the fallout is limited to "only" failing to preserve the host's TSX_CTRL_CPUID_CLEAR. The out-of-bounds access is benign as it's guaranteed to clear a bit in a guest MSR value, which are always zero at vCPU creation on both x86-64 and i386. Cc: stable@vger.kernel.org Fixes: 8ea8b8d6f869 ("KVM: VMX: Use common x86's uret MSR list as the one true list") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210926015545.281083-1-zhenzhong.duan@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-26Merge tag 'x86-urgent-2021-09-26' of ↵Linus Torvalds5-25/+35
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes for X86: - Prevent sending the wrong signal when protection keys are enabled and the kernel handles a fault in the vsyscall emulation. - Invoke early_reserve_memory() before invoking e820_memory_setup() which is required to make the Xen dom0 e820 hooks work correctly. - Use the correct data type for the SETZ operand in the EMQCMDS instruction wrapper. - Prevent undefined behaviour to the potential unaligned accesss in the instruction decoder library" * tag 'x86-urgent-2021-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/insn, tools/x86: Fix undefined behavior due to potential unaligned accesses x86/asm: Fix SETZ size enqcmds() build failure x86/setup: Call early_reserve_memory() earlier x86/fault: Fix wrong signal when vsyscall fails with pkey