summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)AuthorFilesLines
2023-06-19Backmerge tag 'v6.4-rc7' of ↵Dave Airlie21-46/+119
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next Linux 6.4-rc7 Need this to pull in the msm work. Signed-off-by: Dave Airlie <airlied@redhat.com>
2023-06-13Merge tag 'mm-hotfixes-stable-2023-06-12-12-22' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "19 hotfixes. 14 are cc:stable and the remainder address issues which were introduced during this development cycle or which were considered inappropriate for a backport" * tag 'mm-hotfixes-stable-2023-06-12-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: zswap: do not shrink if cgroup may not zswap page cache: fix page_cache_next/prev_miss off by one ocfs2: check new file size on fallocate call mailmap: add entry for John Keeping mm/damon/core: fix divide error in damon_nr_accesses_to_accesses_bp() epoll: ep_autoremove_wake_function should use list_del_init_careful mm/gup_test: fix ioctl fail for compat task nilfs2: reject devices with insufficient block count ocfs2: fix use-after-free when unmounting read-only filesystem lib/test_vmalloc.c: avoid garbage in page array nilfs2: fix possible out-of-bounds segment allocation in resize ioctl riscv/purgatory: remove PGO flags powerpc/purgatory: remove PGO flags x86/purgatory: remove PGO flags kexec: support purgatories with .text.hot sections mm/uffd: allow vma to merge as much as possible mm/uffd: fix vma operation where start addr cuts part of vma radix-tree: move declarations to header nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()
2023-06-12x86/purgatory: remove PGO flagsRicardo Ribalda1-0/+5
If profile-guided optimization is enabled, the purgatory ends up with multiple .text sections. This is not supported by kexec and crashes the system. Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-2-b05c520b7296@chromium.org Fixes: 930457057abe ("kernel/kexec_file.c: split up __kexec_load_puragory") Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Cc: <stable@vger.kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Philipp Rudo <prudo@redhat.com> Cc: Ross Zwisler <zwisler@google.com> Cc: Simon Horman <horms@kernel.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Rix <trix@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-11Merge tag 'x86_urgent_for_v6.4_rc6' of ↵Linus Torvalds1-9/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Borislav Petkov: - Set up the kernel CS earlier in the boot process in case EFI boots the kernel after bypassing the decompressor and the CS descriptor used ends up being the EFI one which is not mapped in the identity page table, leading to early SEV/SNP guest communication exceptions resulting in the guest crashing * tag 'x86_urgent_for_v6.4_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/head/64: Switch to KERNEL_CS as soon as new GDT is installed
2023-06-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-4/+26
Pull kvm fixes from Paolo Bonzini: "ARM: - Address some fallout of the locking rework, this time affecting the way the vgic is configured - Fix an issue where the page table walker frees a subtree and then proceeds with walking what it has just freed... - Check that a given PA donated to the guest is actually memory (only affecting pKVM) - Correctly handle MTE CMOs by Set/Way - Fix the reported address of a watchpoint forwarded to userspace - Fix the freeing of the root of stage-2 page tables - Stop creating spurious PMU events to perform detection of the default PMU and use the existing PMU list instead x86: - Fix a memslot lookup bug in the NX recovery thread that could theoretically let userspace bypass the NX hugepage mitigation - Fix a s/BLOCKING/PENDING bug in SVM's vNMI support - Account exit stats for fastpath VM-Exits that never leave the super tight run-loop - Fix an out-of-bounds bug in the optimized APIC map code, and add a regression test for the race" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: selftests: Add test for race in kvm_recalculate_apic_map() KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds KVM: x86: Account fastpath-only VM-Exits in vCPU stats KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK KVM: x86/mmu: Grab memslot for correct address space in NX recovery worker KVM: arm64: Document default vPMU behavior on heterogeneous systems KVM: arm64: Iterate arm_pmus list to probe for default PMU KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed() KVM: arm64: Populate fault info for watchpoint KVM: arm64: Reload PTE after invoking walker callback on preorder traversal KVM: arm64: Handle trap of tagged Set/Way CMOs arm64: Add missing Set/Way CMO encodings KVM: arm64: Prevent unconditional donation of unmapped regions from the host KVM: arm64: vgic: Fix a comment KVM: arm64: vgic: Fix locking comment KVM: arm64: vgic: Wrap vgic_its_create() with config_lock KVM: arm64: vgic: Fix a circular locking issue
2023-06-03KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-boundsSean Christopherson1-2/+18
Bail from kvm_recalculate_phys_map() and disable the optimized map if the target vCPU's x2APIC ID is out-of-bounds, i.e. if the vCPU was added and/or enabled its local APIC after the map was allocated. This fixes an out-of-bounds access bug in the !x2apic_format path where KVM would write beyond the end of phys_map. Check the x2APIC ID regardless of whether or not x2APIC is enabled, as KVM's hardcodes x2APIC ID to be the vCPU ID, i.e. it can't change, and the map allocation in kvm_recalculate_apic_map() doesn't check for x2APIC being enabled, i.e. the check won't get false postivies. Note, this also affects the x2apic_format path, which previously just ignored the "x2apic_id > new->max_apic_id" case. That too is arguably a bug fix, as ignoring the vCPU meant that KVM would not send interrupts to the vCPU until the next map recalculation. In practice, that "bug" is likely benign as a newly present vCPU/APIC would immediately trigger a recalc. But, there's no functional downside to disabling the map, and a future patch will gracefully handle the -E2BIG case by retrying instead of simply disabling the optimized map. Opportunistically add a sanity check on the xAPIC ID size, along with a comment explaining why the xAPIC ID is guaranteed to be "good". Reported-by: Michal Luczaj <mhal@rbox.co> Fixes: 5b84b0291702 ("KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230602233250.1014316-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-03x86/head/64: Switch to KERNEL_CS as soon as new GDT is installedTom Lendacky1-9/+9
The call to startup_64_setup_env() will install a new GDT but does not actually switch to using the KERNEL_CS entry until returning from the function call. Commit bcce82908333 ("x86/sev: Detect/setup SEV/SME features earlier in boot") moved the call to sme_enable() earlier in the boot process and in between the call to startup_64_setup_env() and the switch to KERNEL_CS. An SEV-ES or an SEV-SNP guest will trigger #VC exceptions during the call to sme_enable() and if the CS pushed on the stack as part of the exception and used by IRETQ is not mapped by the new GDT, then problems occur. Today, the current CS when entering startup_64 is the kernel CS value because it was set up by the decompressor code, so no issue is seen. However, a recent patchset that looked to avoid using the legacy decompressor during an EFI boot exposed this bug. At entry to startup_64, the CS value is that of EFI and is not mapped in the new kernel GDT. So when a #VC exception occurs, the CS value used by IRETQ is not valid and the guest boot crashes. Fix this issue by moving the block that switches to the KERNEL_CS value to be done immediately after returning from startup_64_setup_env(). Fixes: bcce82908333 ("x86/sev: Detect/setup SEV/SME features earlier in boot") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> Link: https://lore.kernel.org/all/6ff1f28af2829cc9aea357ebee285825f90a431f.1684340801.git.thomas.lendacky%40amd.com
2023-06-03KVM: x86: Account fastpath-only VM-Exits in vCPU statsSean Christopherson1-0/+3
Increment vcpu->stat.exits when handling a fastpath VM-Exit without going through any part of the "slow" path. Not bumping the exits stat can result in wildly misleading exit counts, e.g. if the primary reason the guest is exiting is to program the TSC deadline timer. Fixes: 404d5d7bff0d ("KVM: X86: Introduce more exit_fastpath_completion enum values") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230602011920.787844-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-03KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASKMaciej S. Szmigiero1-1/+1
While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware I noticed that with vCPU count large enough (> 16) they sometimes froze at boot. With vCPU count of 64 they never booted successfully - suggesting some kind of a race condition. Since adding "vnmi=0" module parameter made these guests boot successfully it was clear that the problem is most likely (v)NMI-related. Running kvm-unit-tests quickly showed failing NMI-related tests cases, like "multiple nmi" and "pending nmi" from apic-split, x2apic and xapic tests and the NMI parts of eventinj test. The issue was that once one NMI was being serviced no other NMI was allowed to be set pending (NMI limit = 0), which was traced to svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather than for the "NMI pending" flag. Fix this by testing for the right flag in svm_is_vnmi_pending(). Once this is done, the NMI-related kvm-unit-tests pass successfully and the Windows guest no longer freezes at boot. Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI") Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/be4ca192eb0c1e69a210db3009ca984e6a54ae69.1684495380.git.maciej.szmigiero@oracle.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-03KVM: x86/mmu: Grab memslot for correct address space in NX recovery workerSean Christopherson1-1/+4
Factor in the address space (non-SMM vs. SMM) of the target shadow page when recovering potential NX huge pages, otherwise KVM will retrieve the wrong memslot when zapping shadow pages that were created for SMM. The bug most visibly manifests as a WARN on the memslot being non-NULL, but the worst case scenario is that KVM could unaccount the shadow page without ensuring KVM won't install a huge page, i.e. if the non-SMM slot is being dirty logged, but the SMM slot is not. ------------[ cut here ]------------ WARNING: CPU: 1 PID: 3911 at arch/x86/kvm/mmu/mmu.c:7015 kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm] CPU: 1 PID: 3911 Comm: kvm-nx-lpage-re RIP: 0010:kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm] RSP: 0018:ffff99b284f0be68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff99b284edd000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff9271397024e0 R08: 0000000000000000 R09: ffff927139702450 R10: 0000000000000000 R11: 0000000000000001 R12: ffff99b284f0be98 R13: 0000000000000000 R14: ffff9270991fcd80 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff927f9f640000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0aacad3ae0 CR3: 000000088fc2c005 CR4: 00000000003726e0 Call Trace: <TASK> __pfx_kvm_nx_huge_page_recovery_worker+0x10/0x10 [kvm] kvm_vm_worker_thread+0x106/0x1c0 [kvm] kthread+0xd9/0x100 ret_from_fork+0x2c/0x50 </TASK> ---[ end trace 0000000000000000 ]--- This bug was exposed by commit edbdb43fc96b ("KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated"), which allowed KVM to retain SMM TDP MMU roots effectively indefinitely. Before commit edbdb43fc96b, KVM would zap all SMM TDP MMU roots and thus all SMM TDP MMU shadow pages once all vCPUs exited SMM, which made the window where this bug (recovering an SMM NX huge page) could be encountered quite tiny. To hit the bug, the NX recovery thread would have to run while at least one vCPU was in SMM. Most VMs typically only use SMM during boot, and so the problematic shadow pages were gone by the time the NX recovery thread ran. Now that KVM preserves TDP MMU roots until they are explicitly invalidated (e.g. by a memslot deletion), the window to trigger the bug is effectively never closed because most VMMs don't delete memslots after boot (except for a handful of special scenarios). Fixes: eb298605705a ("KVM: x86/mmu: Do not recover dirty-tracked NX Huge Pages") Reported-by: Fabio Coatti <fabio.coatti@gmail.com> Closes: https://lore.kernel.org/all/CADpTngX9LESCdHVu_2mQkNGena_Ng2CphWNwsRGSMxzDsTjU2A@mail.gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230602010137.784664-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-06-02fork, vhost: Use CLONE_THREAD to fix freezer/ps regressionMike Christie3-3/+3
When switching from kthreads to vhost_tasks two bugs were added: 1. The vhost worker tasks's now show up as processes so scripts doing ps or ps a would not incorrectly detect the vhost task as another process. 2. kthreads disabled freeze by setting PF_NOFREEZE, but vhost tasks's didn't disable or add support for them. To fix both bugs, this switches the vhost task to be thread in the process that does the VHOST_SET_OWNER ioctl, and has vhost_worker call get_signal to support SIGKILL/SIGSTOP and freeze signals. Note that SIGKILL/STOP support is required because CLONE_THREAD requires CLONE_SIGHAND which requires those 2 signals to be supported. This is a modified version of the patch written by Mike Christie <michael.christie@oracle.com> which was a modified version of patch originally written by Linus. Much of what depended upon PF_IO_WORKER now depends on PF_USER_WORKER. Including ignoring signals, setting up the register state, and having get_signal return instead of calling do_group_exit. Tidied up the vhost_task abstraction so that the definition of vhost_task only needs to be visible inside of vhost_task.c. Making it easier to review the code and tell what needs to be done where. As part of this the main loop has been moved from vhost_worker into vhost_task_fn. vhost_worker now returns true if work was done. The main loop has been updated to call get_signal which handles SIGSTOP, freezing, and collects the message that tells the thread to exit as part of process exit. This collection clears __fatal_signal_pending. This collection is not guaranteed to clear signal_pending() so clear that explicitly so the schedule() sleeps. For now the vhost thread continues to exist and run work until the last file descriptor is closed and the release function is called as part of freeing struct file. To avoid hangs in the coredump rendezvous and when killing threads in a multi-threaded exec. The coredump code and de_thread have been modified to ignore vhost threads. Remvoing the special case for exec appears to require teaching vhost_dev_flush how to directly complete transactions in case the vhost thread is no longer running. Removing the special case for coredump rendezvous requires either the above fix needed for exec or moving the coredump rendezvous into get_signal. Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads") Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Co-developed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-29Merge tag 'v6.4-p3' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "Fix an alignment crash in x86/aria" * tag 'v6.4-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/aria - Use 16 byte alignment for GFNI constant vectors
2023-05-28Merge tag 'x86-urgent-2023-05-28' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu fix from Thomas Gleixner: "A single fix for x86: - Prevent a bogus setting for the number of HT siblings, which is caused by the CPUID evaluation trainwreck of X86. That recomputes the value for each CPU, so the last CPU "wins". That can cause completely bogus sibling values" * tag 'x86-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms
2023-05-28Merge tag 'perf-urgent-2023-05-28' of ↵Linus Torvalds2-1/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A small set of perf fixes: - Make the MSR-readout based CHA discovery work around broken discovery tables in some SPR firmwares. - Prevent saving PEBS configuration which has software bits set that cause a crash when restored into the relevant MSR" * tag 'perf-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/uncore: Correct the number of CHAs on SPR perf/x86/intel: Save/restore cpuc->active_pebs_data_cfg when using guest PEBS
2023-05-28Merge tag 'objtool-urgent-2023-05-28' of ↵Linus Torvalds1-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull unwinder fixes from Thomas Gleixner: "A set of unwinder and tooling fixes: - Ensure that the stack pointer on x86 is aligned again so that the unwinder does not read past the end of the stack - Discard .note.gnu.property section which has a pointlessly different alignment than the other note sections. That confuses tooling of all sorts including readelf, libbpf and pahole" * tag 'objtool-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/show_trace_log_lvl: Ensure stack pointer is aligned, again vmlinux.lds.h: Discard .note.gnu.property section
2023-05-27Merge tag 'for-linus-6.4-rc4-tag' of ↵Linus Torvalds1-3/+5
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - a double free fix in the Xen pvcalls backend driver - a fix for a regression causing the MSI related sysfs entries to not being created in Xen PV guests - a fix in the Xen blkfront driver for handling insane input data better * tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/pci/xen: populate MSI sysfs entries xen/pvcalls-back: fix double frees with pvcalls_new_active_socket() xen/blkfront: Only check REQ_FUA for writes
2023-05-26x86: re-introduce support for ERMS copies for user space accessesLinus Torvalds1-1/+9
I tried to streamline our user memory copy code fairly aggressively in commit adfcf4231b8c ("x86: don't use REP_GOOD or ERMS for user memory copies"), in order to then be able to clean up the code and inline the modern FSRM case in commit 577e6a7fd50d ("x86: inline the 'rep movs' in user copies for the FSRM case"). We had reports [1] of that causing regressions earlier with blogbench, but that turned out to be a horrible benchmark for that case, and not a sufficient reason for re-instating "rep movsb" on older machines. However, now Eric Dumazet reported [2] a regression in performance that seems to be a rather more real benchmark, where due to the removal of "rep movs" a TCP stream over a 100Gbps network no longer reaches line speed. And it turns out that with the simplified the calling convention for the non-FSRM case in commit 427fda2c8a49 ("x86: improve on the non-rep 'copy_user' function"), re-introducing the ERMS case is actually fairly simple. Of course, that "fairly simple" is glossing over several missteps due to having to fight our assembler alternative code. This code really wanted to rewrite a conditional branch to have two different targets, but that made objtool sufficiently unhappy that this instead just ended up doing a choice between "jump to the unrolled loop, or use 'rep movsb' directly". Let's see if somebody finds a case where the kernel memory copies also care (see commit 68674f94ffc9: "x86: don't use REP_GOOD or ERMS for small memory copies"). But Eric does argue that the user copies are special because networking tries to copy up to 32KB at a time, if order-3 pages allocations are possible. In-kernel memory copies are typically small, unless they are the special "copy pages at a time" kind that still use "rep movs". Link: https://lore.kernel.org/lkml/202305041446.71d46724-yujie.liu@intel.com/ [1] Link: https://lore.kernel.org/lkml/CANn89iKUbyrJ=r2+_kK+sb2ZSSHifFZ7QkPLDpAtkJ8v4WUumA@mail.gmail.com/ [2] Reported-and-tested-by: Eric Dumazet <edumazet@google.com> Fixes: adfcf4231b8c ("x86: don't use REP_GOOD or ERMS for user memory copies") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-26Merge tag 'drm-misc-next-2023-05-24' of ↵Dave Airlie1-2/+0
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v6.5: UAPI Changes: Cross-subsystem Changes: * fbdev: Move framebuffer I/O helpers to <asm/fb.h>, fix naming * firmware: Init sysfb as early as possible Core Changes: * DRM scheduler: Rename interfaces * ttm: Store ttm_device_funcs in .rodata * Replace strlcpy() with strscpy() in various places * Cleanups Driver Changes: * bridge: analogix: Fix endless probe loop; samsung-dsim: Support swapping clock/data polarity; tc358767: Use devm_ Cleanups; * gma500: Fix I/O-memory access * panel: boe-tv101wum-nl6: Improve initialization; sharp-ls043t1le001: Mode fixes; simple: Add BOE EV121WXM-N10-1850 plus DT bindings; AddS6D7AA0 plus DT bindings; Cleanups * ssd1307x: Style fixes * sun4i: Release clocks * msm: Fix I/O-memory access * nouveau: Cleanups * shmobile: Support Renesas; Enable framebuffer console; Various fixes * vkms: Fix RGB565 conversion Signed-off-by: Dave Airlie <airlied@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAmRuBXEACgkQaA3BHVML # eiPLkwgAqCa7IuSDQhFMWVOI0EJpPPEHtHM8SCT1Pp8aniXk23Ru+E16c5zck53O # uf4tB+zoFrwD9npy60LIvX1OZmXS1KI4+ZO8itYFk6GSjxqbTWbjNFREBeWFdIpa # OG54nEqjFQZzEXY+gJYDpu5zqLy3xLN07ZgQkcMyfW3O/Krj4LLzfQTDl+jP5wkO # 7/v5Eu5CG5QjupMxIjb4e+ruUflp73pynur5bhZsfS1bPNGFTnxHlwg7NWnBXU7o # Hg23UYfCuZZWPmuO26EeUDlN33rCoaycmVgtpdZft2eznca5Mg74Loz1Qc3GQfjw # LLvKsAIlBcZvEIhElkzhtXitBoe7LQ== # =/9zV # -----END PGP SIGNATURE----- # gpg: Signature made Wed 24 May 2023 22:39:13 AEST # gpg: using RSA key 7217FBAC8CE9CF6344A168E5680DC11D530B7A23 # gpg: Can't check signature: No public key # Conflicts: # MAINTAINERS From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230524124237.GA25416@linux-uq9g
2023-05-25x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platformsZhang Rui1-2/+3
Traditionally, all CPUs in a system have identical numbers of SMT siblings. That changes with hybrid processors where some logical CPUs have a sibling and others have none. Today, the CPU boot code sets the global variable smp_num_siblings when every CPU thread is brought up. The last thread to boot will overwrite it with the number of siblings of *that* thread. That last thread to boot will "win". If the thread is a Pcore, smp_num_siblings == 2. If it is an Ecore, smp_num_siblings == 1. smp_num_siblings describes if the *system* supports SMT. It should specify the maximum number of SMT threads among all cores. Ensure that smp_num_siblings represents the system-wide maximum number of siblings by always increasing its value. Never allow it to decrease. On MeteorLake-P platform, this fixes a problem that the Ecore CPUs are not updated in any cpu sibling map because the system is treated as an UP system when probing Ecore CPUs. Below shows part of the CPU topology information before and after the fix, for both Pcore and Ecore CPU (cpu0 is Pcore, cpu 12 is Ecore). ... -/sys/devices/system/cpu/cpu0/topology/package_cpus:000fff -/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-11 +/sys/devices/system/cpu/cpu0/topology/package_cpus:3fffff +/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-21 ... -/sys/devices/system/cpu/cpu12/topology/package_cpus:001000 -/sys/devices/system/cpu/cpu12/topology/package_cpus_list:12 +/sys/devices/system/cpu/cpu12/topology/package_cpus:3fffff +/sys/devices/system/cpu/cpu12/topology/package_cpus_list:0-21 Notice that the "before" 'package_cpus_list' has only one CPU. This means that userspace tools like lscpu will see a little laptop like an 11-socket system: -Core(s) per socket: 1 -Socket(s): 11 +Core(s) per socket: 16 +Socket(s): 1 This is also expected to make the scheduler do rather wonky things too. [ dhansen: remove CPUID detail from changelog, add end user effects ] CC: stable@kernel.org Fixes: bbb65d2d365e ("x86: use cpuid vector 0xb when available for detecting cpu topology") Fixes: 95f3d39ccf7a ("x86/cpu/topology: Provide detect_extended_topology_early()") Suggested-by: Len Brown <len.brown@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/20230323015640.27906-1-rui.zhang%40intel.com
2023-05-24perf/x86/uncore: Correct the number of CHAs on SPRKan Liang1-0/+11
The number of CHAs from the discovery table on some SPR variants is incorrect, because of a firmware issue. An accurate number can be read from the MSR UNC_CBO_CONFIG. Fixes: 949b11381f81 ("perf/x86/intel/uncore: Add Sapphire Rapids server CHA support") Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Stephane Eranian <eranian@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230508140206.283708-1-kan.liang@linux.intel.com
2023-05-24x86/pci/xen: populate MSI sysfs entriesMaximilian Heyne1-3/+5
Commit bf5e758f02fc ("genirq/msi: Simplify sysfs handling") reworked the creation of sysfs entries for MSI IRQs. The creation used to be in msi_domain_alloc_irqs_descs_locked after calling ops->domain_alloc_irqs. Then it moved into __msi_domain_alloc_irqs which is an implementation of domain_alloc_irqs. However, Xen comes with the only other implementation of domain_alloc_irqs and hence doesn't run the sysfs population code anymore. Commit 6c796996ee70 ("x86/pci/xen: Fixup fallout from the PCI/MSI overhaul") set the flag MSI_FLAG_DEV_SYSFS for the xen msi_domain_info but that doesn't actually have an effect because Xen uses it's own domain_alloc_irqs implementation. Fix this by making use of the fallback functions for sysfs population. Fixes: bf5e758f02fc ("genirq/msi: Simplify sysfs handling") Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230503131656.15928-1-mheyne@amazon.de Signed-off-by: Juergen Gross <jgross@suse.com>
2023-05-24crypto: x86/aria - Use 16 byte alignment for GFNI constant vectorsArd Biesheuvel1-2/+0
The GFNI routines in the AVX version of the ARIA implementation now use explicit VMOVDQA instructions to load the constant input vectors, which means they must be 16 byte aligned. So ensure that this is the case, by dropping the section split and the incorrect .align 8 directive, and emitting the constants into the 16-byte aligned section instead. Note that the AVX2 version of this code deviates from this pattern, and does not require a similar fix, given that it loads these contants as 8-byte memory operands, for which AVX2 permits any alignment. Cc: Taehee Yoo <ap420073@gmail.com> Fixes: 8b84475318641c2b ("crypto: x86/aria-avx - Do not use avx2 instructions") Reported-by: syzbot+a6abcf08bad8b18fd198@syzkaller.appspotmail.com Tested-by: syzbot+a6abcf08bad8b18fd198@syzkaller.appspotmail.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-05-23perf/x86/intel: Save/restore cpuc->active_pebs_data_cfg when using guest PEBSLike Xu1-1/+1
After commit b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG"), the cpuc->pebs_data_cfg may save some bits that are not supported by real hardware, such as PEBS_UPDATE_DS_SW. This would cause the VMX hardware MSR switching mechanism to save/restore invalid values for PEBS_DATA_CFG MSR, thus crashing the host when PEBS is used for guest. Fix it by using the active host value from cpuc->active_pebs_data_cfg. Fixes: b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG") Signed-off-by: Like Xu <likexu@tencent.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230517133808.67885-1-likexu@tencent.com
2023-05-23Merge tag 'x86_urgent_for_6.4-rc4' of ↵Linus Torvalds1-0/+25
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Dave Hansen: "This works around and issue where the INVLPG instruction may miss invalidating kernel TLB entries in recent hybrid CPUs. I do expect an eventual microcode fix for this. When the microcode version numbers are known, we can circle back around and add them the model table to disable this workaround" * tag 'x86_urgent_for_6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Avoid incomplete Global INVLPG flushes
2023-05-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds4-19/+16
Pull kvm fixes from Paolo Bonzini: "ARM: - Plug a race in the stage-2 mapping code where the IPA and the PA would end up being out of sync - Make better use of the bitmap API (bitmap_zero, bitmap_zalloc...) - FP/SVE/SME documentation update, in the hope that this field becomes clearer... - Add workaround for Apple SEIS brokenness to a new SoC - Random comment fixes x86: - add MSR_IA32_TSX_CTRL into msrs_to_save - fixes for XCR0 handling in SGX enclaves Generic: - Fix vcpu_array[0] races - Fix race between starting a VM and 'reboot -f'" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: add MSR_IA32_TSX_CTRL into msrs_to_save KVM: x86: Don't adjust guest's CPUID.0x12.1 (allowed SGX enclave XFRM) KVM: VMX: Don't rely _only_ on CPUID to enforce XCR0 restrictions for ECREATE KVM: Fix vcpu_array[0] races KVM: VMX: Fix header file dependency of asm/vmx.h KVM: Don't enable hardware after a restart/shutdown is initiated KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown KVM: arm64: vgic: Add Apple M2 PRO/MAX cpus to the list of broken SEIS implementations KVM: arm64: Clarify host SME state management KVM: arm64: Restructure check for SVE support in FP trap handler KVM: arm64: Document check for TIF_FOREIGN_FPSTATE KVM: arm64: Fix repeated words in comments KVM: arm64: Constify start/end/phys fields of the pgtable walker data KVM: arm64: Infer PA offset from VA in hyp map walker KVM: arm64: Infer the PA offset from IPA in stage-2 map walker KVM: arm64: Use the bitmap API to allocate bitmaps KVM: arm64: Slightly optimize flush_context()
2023-05-21KVM: VMX: add MSR_IA32_TSX_CTRL into msrs_to_saveMingwei Zhang1-1/+5
Add MSR_IA32_TSX_CTRL into msrs_to_save[] to explicitly tell userspace to save/restore the register value during migration. Missing this may cause userspace that relies on KVM ioctl(KVM_GET_MSR_INDEX_LIST) fail to port the value to the target VM. In addition, there is no need to add MSR_IA32_TSX_CTRL when ARCH_CAP_TSX_CTRL_MSR is not supported in kvm_get_arch_capabilities(). So add the checking in kvm_probe_msr_to_save(). Fixes: c11f83e0626b ("KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality") Reported-by: Jim Mattson <jmattson@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20230509032348.1153070-1-mizhang@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-21KVM: x86: Don't adjust guest's CPUID.0x12.1 (allowed SGX enclave XFRM)Sean Christopherson1-16/+0
Drop KVM's manipulation of guest's CPUID.0x12.1 ECX and EDX, i.e. the allowed XFRM of SGX enclaves, now that KVM explicitly checks the guest's allowed XCR0 when emulating ECREATE. Note, this could theoretically break a setup where userspace advertises a "bad" XFRM and relies on KVM to provide a sane CPUID model, but QEMU is the only known user of KVM SGX, and QEMU explicitly sets the SGX CPUID XFRM subleaf based on the guest's XCR0. Reviewed-by: Kai Huang <kai.huang@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230503160838.3412617-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-21KVM: VMX: Don't rely _only_ on CPUID to enforce XCR0 restrictions for ECREATESean Christopherson1-2/+9
Explicitly check the vCPU's supported XCR0 when determining whether or not the XFRM for ECREATE is valid. Checking CPUID works because KVM updates guest CPUID.0x12.1 to restrict the leaf to a subset of the guest's allowed XCR0, but that is rather subtle and KVM should not modify guest CPUID except for modeling true runtime behavior (allowed XFRM is most definitely not "runtime" behavior). Reviewed-by: Kai Huang <kai.huang@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230503160838.3412617-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-19KVM: VMX: Fix header file dependency of asm/vmx.hJacob Xu1-0/+2
Include a definition of WARN_ON_ONCE() before using it. Fixes: bb1fcc70d98f ("KVM: nVMX: Allow L1 to use 5-level page walks for nested EPT") Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Jacob Xu <jacobhxu@google.com> [reworded commit message; changed <asm/bug.h> to <linux/bug.h>] Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220225012959.1554168-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-19Merge tag 'drm-misc-next-2023-05-11' of ↵Dave Airlie2-26/+25
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 6.5: UAPI Changes: Cross-subsystem Changes: - arch: Consolidate <asm/fb.h> Core Changes: - aperture: Ignore firmware framebuffers with non-primary devices - fbdev: Use fbdev's I/O helpers - sysfs: Expose DRM connector ID - tests: More tests for drm_rect Driver Changes: - armada: Implement fbdev emulation as a client - bridge: - fsl-ldb: Support i.MX6SX - lt9211: Remove blanking packets - lt9611: Remove blanking packets - tc358768: Implement input bus formats reporting, fix various timings and clocks settings - ti-sn65dsi86: Implement wait_hpd_asserted - nouveau: Improve NULL pointer checks before dereference - panel: - nt36523: Support Lenovo J606F - st7703: Support Anbernic RG353V-V2 - new panels: InnoLux G070ACE-L01 - sun4i: Fix MIPI-DSI dotclock - vc4: RGB Range toggle property, BT601 and BT2020 support for HDMI - vkms: Convert to drmm helpers, Add reflection and rotation support Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/2pxmxdzsk2ekjy6xvbpj67zrhtwvkkhfspuvdm5pfm5i54hed6@sooct7yq6z4w
2023-05-18Merge tag 'probes-fixes-v6.4-rc1' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - Initialize 'ret' local variables on fprobe_handler() to fix the smatch warning. With this, fprobe function exit handler is not working randomly. - Fix to use preempt_enable/disable_notrace for rethook handler to prevent recursive call of fprobe exit handler (which is based on rethook) - Fix recursive call issue on fprobe_kprobe_handler() - Fix to detect recursive call on fprobe_exit_handler() - Fix to make all arch-dependent rethook code notrace (the arch-independent code is already notrace)" * tag 'probes-fixes-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rethook, fprobe: do not trace rethook related functions fprobe: add recursion detection in fprobe_exit_handler fprobe: make fprobe_kprobe_handler recursion free rethook: use preempt_{disable, enable}_notrace in rethook_trampoline_handler tracing: fprobe: Initialize ret valiable to fix smatch error
2023-05-18fbdev: Include <linux/fb.h> instead of <asm/fb.h>Thomas Zimmermann1-2/+0
Replace include statements for <asm/fb.h> with <linux/fb.h>. Fixes the coding style: if a header is available in asm/ and linux/, it is preferable to include the header from linux/. This only affects a few source files, most of which already include <linux/fb.h>. Suggested-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Sui Jingfeng <suijingfeng@loongson.cn> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230512102444.5438-6-tzimmermann@suse.de
2023-05-18rethook, fprobe: do not trace rethook related functionsZe Gao1-0/+1
These functions are already marked as NOKPROBE to prevent recursion and we have the same reason to blacklist them if rethook is used with fprobe, since they are beyond the recursion-free region ftrace can guard. Link: https://lore.kernel.org/all/20230517034510.15639-5-zegao@tencent.com/ Fixes: f3a112c0c40d ("x86,rethook,kprobes: Replace kretprobe with rethook on x86") Signed-off-by: Ze Gao <zegao@tencent.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-05-17x86/mm: Avoid incomplete Global INVLPG flushesDave Hansen1-0/+25
The INVLPG instruction is used to invalidate TLB entries for a specified virtual address. When PCIDs are enabled, INVLPG is supposed to invalidate TLB entries for the specified address for both the current PCID *and* Global entries. (Note: Only kernel mappings set Global=1.) Unfortunately, some INVLPG implementations can leave Global translations unflushed when PCIDs are enabled. As a workaround, never enable PCIDs on affected processors. I expect there to eventually be microcode mitigations to replace this software workaround. However, the exact version numbers where that will happen are not known today. Once the version numbers are set in stone, the processor list can be tweaked to only disable PCIDs on affected processors with affected microcode. Note: if anyone wants a quick fix that doesn't require patching, just stick 'nopcid' on your kernel command-line. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
2023-05-16x86/show_trace_log_lvl: Ensure stack pointer is aligned, againVernon Lovejoy1-2/+5
The commit e335bb51cc15 ("x86/unwind: Ensure stack pointer is aligned") tried to align the stack pointer in show_trace_log_lvl(), otherwise the "stack < stack_info.end" check can't guarantee that the last read does not go past the end of the stack. However, we have the same problem with the initial value of the stack pointer, it can also be unaligned. So without this patch this trivial kernel module #include <linux/module.h> static int init(void) { asm volatile("sub $0x4,%rsp"); dump_stack(); asm volatile("add $0x4,%rsp"); return -EAGAIN; } module_init(init); MODULE_LICENSE("GPL"); crashes the kernel. Fixes: e335bb51cc15 ("x86/unwind: Ensure stack pointer is aligned") Signed-off-by: Vernon Lovejoy <vlovejoy@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20230512104232.GA10227@redhat.com Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-05-14Merge tag 'perf_urgent_for_v6.4_rc2' of ↵Linus Torvalds3-28/+37
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Make sure the PEBS buffer is flushed before reprogramming the hardware so that the correct record sizes are used - Update the sample size for AMD BRS events - Fix a confusion with using the same on-stack struct with different events in the event processing path * tag 'perf_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG perf/x86: Fix missing sample size update on AMD BRS perf/core: Fix perf_sample_data not properly initialized for different swevents in perf_tp_event()
2023-05-14Merge tag 'x86_urgent_for_v6.4_rc2' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Borislav Petkov: - Add the required PCI IDs so that the generic SMN accesses provided by amd_nb.c work for drivers which switch to them. Add a PCI device ID to k10temp's table so that latter is loaded on such systems too * tag 'x86_urgent_for_v6.4_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hwmon: (k10temp) Add PCI ID for family 19, model 78h x86/amd_nb: Add PCI ID for family 19h model 78h
2023-05-13x86/retbleed: Fix return thunk alignmentBorislav Petkov (AMD)1-2/+2
SYM_FUNC_START_LOCAL_NOALIGN() adds an endbr leading to this layout (leaving only the last 2 bytes of the address): 3bff <zen_untrain_ret>: 3bff: f3 0f 1e fa endbr64 3c03: f6 test $0xcc,%bl 3c04 <__x86_return_thunk>: 3c04: c3 ret 3c05: cc int3 3c06: 0f ae e8 lfence However, "the RET at __x86_return_thunk must be on a 64 byte boundary, for alignment within the BTB." Use SYM_START instead. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-09Merge drm/drm-next into drm-misc-nextMaxime Ripard201-3147/+4354
Start the 6.5 release cycle. Signed-off-by: Maxime Ripard <maxime@cerno.tech>
2023-05-08x86/amd_nb: Add PCI ID for family 19h model 78hMario Limonciello1-0/+2
Commit 310e782a99c7 ("platform/x86/amd: pmc: Utilize SMN index 0 for driver probe") switched to using amd_smn_read() which relies upon the misc PCI ID used by DF function 3 being included in a table. The ID for model 78h is missing in that table, so amd_smn_read() doesn't work. Add the missing ID into amd_nb, restoring s2idle on this system. [ bp: Simplify commit message. ] Fixes: 310e782a99c7 ("platform/x86/amd: pmc: Utilize SMN index 0 for driver probe") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h Acked-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20230427053338.16653-2-mario.limonciello@amd.com
2023-05-08perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFGKan Liang2-24/+35
Several similar kernel warnings can be triggered, [56605.607840] CPU0 PEBS record size 0, expected 32, config 0 cpuc->record_size=208 when the below commands are running in parallel for a while on SPR. while true; do perf record --no-buildid -a --intr-regs=AX \ -e cpu/event=0xd0,umask=0x81/pp \ -c 10003 -o /dev/null ./triad; done & while true; do perf record -o /tmp/out -W -d \ -e '{ld_blocks.store_forward:period=1000000, \ MEM_TRANS_RETIRED.LOAD_LATENCY:u:precise=2:ldlat=4}' \ -c 1037 ./triad; done The triad program is just the generation of loads/stores. The warnings are triggered when an unexpected PEBS record (with a different config and size) is found. A system-wide PEBS event with the large PEBS config may be enabled during a context switch. Some PEBS records for the system-wide PEBS may be generated while the old task is sched out but the new one hasn't been sched in yet. When the new task is sched in, the cpuc->pebs_record_size may be updated for the per-task PEBS events. So the existing system-wide PEBS records have a different size from the later PEBS records. The PEBS buffer should be flushed right before the hardware is reprogrammed. The new size and threshold should be updated after the old buffer has been flushed. Reported-by: Stephane Eranian <eranian@google.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230421184529.3320912-1-kan.liang@linux.intel.com
2023-05-08perf/x86: Fix missing sample size update on AMD BRSNamhyung Kim1-4/+2
It missed to convert a PERF_SAMPLE_BRANCH_STACK user to call the new perf_sample_save_brstack() helper in order to update the dyn_size. This affects AMD Zen3 machines with the branch-brs event. Fixes: eb55b455ef9c ("perf/core: Add perf_sample_save_brstack() helper") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20230427030527.580841-1-namhyung@kernel.org
2023-05-05Merge tag 'locking-core-2023-05-05' of ↵Linus Torvalds2-2/+17
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Introduce local{,64}_try_cmpxchg() - a slightly more optimal primitive, which will be used in perf events ring-buffer code - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation - Misc cleanups/fixes * tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/atomic: Correct (cmp)xchg() instrumentation locking/x86: Define arch_try_cmpxchg_local() locking/arch: Wire up local_try_cmpxchg() locking/generic: Wire up local{,64}_try_cmpxchg() locking/atomic: Add generic try_cmpxchg{,64}_local() support locking/rwbase: Mitigate indefinite writer starvation locking/arch: Rename all internal __xchg() names to __arch_xchg()
2023-05-05Merge branch 'x86-uaccess-cleanup': x86 uaccess header cleanupsLinus Torvalds4-94/+122
Merge my x86 uaccess updates branch. The LAM ("Linear Address Masking") updates in this release made me unhappy about how "access_ok()" was done, and it actually turned out to have a couple of small bugs in it too. This is my cleanup of the code: - use the sign bit of the __user pointer rather than masking the address and checking it against the TASK_SIZE range. We already did this part for the get/put_user() side, but 'access_ok()' did the naïve "mask and range check" thing, which not only generates nasty code, but also ended up meaning that __access_ok itself didn't do a good job, and so copy_from_user_nmi() didn't get the check right. - move all the code that is 64-bit only into the 64-bit version of the header file, so that we don't unnecessarily pollute the shared x86 code and make it look like LAM might work in 32-bit too. - fix a bug in the address masking (that doesn't end up mattering: in this case the fix was to just remove the buggy code entirely). - a couple of trivial cleanups and added commentary about the access_ok() rules. * x86-uaccess-cleanup: x86-64: mm: clarify the 'positive addresses' user address rules x86: mm: remove 'sign' games from LAM untagged_addr*() macros x86: uaccess: move 32-bit and 64-bit parts into proper <asm/uaccess_N.h> header x86: mm: remove architecture-specific 'access_ok()' define x86-64: make access_ok() independent of LAM
2023-05-05Merge tag 'kvm-x86-mmu-6.4-2' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini1-65/+56
Fix a long-standing flaw in x86's TDP MMU where unloading roots on a vCPU can result in the root being freed even though the root is completely valid and can be reused as-is (with a TLB flush).
2023-05-04Merge tag 'uml-for-linus-6.4-rc1' of ↵Linus Torvalds3-9/+9
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull uml updates from Richard Weinberger: - Make stub data pages configurable - Make it harder to mix user and kernel code by accident * tag 'uml-for-linus-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: make stub data pages size tweakable um: prevent user code in modules um: further clean up user_syms um: don't export printf() um: hostfs: define our own API boundary um: add __weak for exported functions
2023-05-03x86-64: mm: clarify the 'positive addresses' user address rulesLinus Torvalds2-15/+33
Dave Hansen found the "(long) addr >= 0" code in the x86-64 access_ok checks somewhat confusing, and suggested using a helper to clarify what the code is doing. So this does exactly that: clarifying what the sign bit check is all about, by adding a helper macro that makes it clear what it is testing. This also adds some explicit comments talking about how even with LAM enabled, any addresses with the sign bit will still GP-fault in the non-canonical region just above the sign bit. This is all what allows us to do the user address checks with just the sign bit, and furthermore be a bit cavalier about accesses that might be done with an additional offset even past that point. (And yes, this talks about 'positive' even though zero is also a valid user address and so technically we should call them 'non-negative'. But I don't think using 'non-negative' ends up being more understandable). Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03x86: mm: remove 'sign' games from LAM untagged_addr*() macrosLinus Torvalds1-15/+3
The intent of the sign games was to not modify kernel addresses when untagging them. However, that had two issues: (a) it didn't actually work as intended, since the mask was calculated as 'addr >> 63' on an _unsigned_ address. So instead of getting a mask of all ones for kernel addresses, you just got '1'. (b) untagging a kernel address isn't actually a valid operation anyway. Now, (a) had originally been true for both 'untagged_addr()' and the remote version of it, but had accidentally been fixed for the regular version of untagged_addr() by commit e0bddc19ba95 ("x86/mm: Reduce untagged_addr() overhead for systems without LAM"). That one rewrote the shift to be part of the alternative asm code, and in the process changed the unsigned shift into a signed 'sar' instruction. And while it is true that we don't want to turn what looks like a kernel address into a user address by masking off the high bit, that doesn't need these sign masking games - all it needs is that the mm context 'untag_mask' value has the high bit set. Which it always does. So simplify the code by just removing the superfluous (and in the case of untagged_addr_remote(), still buggy) sign bit games in the address masking. Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03x86: uaccess: move 32-bit and 64-bit parts into proper <asm/uaccess_N.h> headerLinus Torvalds3-85/+82
The x86 <asm/uaccess.h> file has grown features that are specific to x86-64 like LAM support and the related access_ok() changes. They really should be in the <asm/uaccess_64.h> file and not pollute the generic x86 header. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03x86: mm: remove architecture-specific 'access_ok()' defineLinus Torvalds1-34/+0
There's already a generic definition of 'access_ok()' in the asm-generic/access_ok.h header file, and the only difference bwteen that and the x86-specific one is the added check for WARN_ON_IN_IRQ(). And it turns out that the reason for that check is long gone: it used to use a "user_addr_max()" inline function that depended on the current thread, and caused problems in non-thread contexts. For details, see commits 7c4788950ba5 ("x86/uaccess, sched/preempt: Verify access_ok() context") and in particular commit ae31fe51a3cc ("perf/x86: Restore TASK_SIZE check on frame pointer") about how and why this came to be. But that "current task" issue was removed in the big set_fs() removal by Christoph Hellwig in commit 47058bb54b57 ("x86: remove address space overrides using set_fs()"). So the reason for the test and the architecture-specific access_ok() define no longer exists, and is actually harmful these days. For example, it led various 'copy_from_user_nmi()' games (eg using __range_not_ok() instead, and then later converted to __access_ok() when that became ok). And that in turn meant that LAM was broken for the frame following before this series, because __access_ok() used to not do the address untagging. Accessing user state still needs care in many contexts, but access_ok() is not the place for this test. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds torvalds@linux-foundation.org>