summaryrefslogtreecommitdiff
path: root/arch/x86/Kconfig
AgeCommit message (Collapse)AuthorFilesLines
2021-06-30Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+1
Merge misc updates from Andrew Morton: "191 patches. Subsystems affected by this patch series: kthread, ia64, scripts, ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab, slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap, mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization, pagealloc, and memory-failure)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (191 commits) mm,hwpoison: make get_hwpoison_page() call get_any_page() mm,hwpoison: send SIGBUS with error virutal address mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes mm/page_alloc: allow high-order pages to be stored on the per-cpu lists mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA docs: remove description of DISCONTIGMEM arch, mm: remove stale mentions of DISCONIGMEM mm: remove CONFIG_DISCONTIGMEM m68k: remove support for DISCONTIGMEM arc: remove support for DISCONTIGMEM arc: update comment about HIGHMEM implementation alpha: remove DISCONTIGMEM and NUMA mm/page_alloc: move free_the_page mm/page_alloc: fix counting of managed_pages mm/page_alloc: improve memmap_pages dbg msg mm: drop SECTION_SHIFT in code comments mm/page_alloc: introduce vm.percpu_pagelist_high_fraction mm/page_alloc: limit the number of pages on PCP lists when reclaim is active mm/page_alloc: scale the number of pages that are batch freed ...
2021-06-29mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMAMike Rapoport1-1/+1
After removal of DISCINTIGMEM the NEED_MULTIPLE_NODES and NUMA configuration options are equivalent. Drop CONFIG_NEED_MULTIPLE_NODES and use CONFIG_NUMA instead. Done with $ sed -i 's/CONFIG_NEED_MULTIPLE_NODES/CONFIG_NUMA/' \ $(git grep -wl CONFIG_NEED_MULTIPLE_NODES) $ sed -i 's/NEED_MULTIPLE_NODES/NUMA/' \ $(git grep -wl NEED_MULTIPLE_NODES) with manual tweaks afterwards. [rppt@linux.ibm.com: fix arm boot crash] Link: https://lkml.kernel.org/r/YMj9vHhHOiCVN4BF@linux.ibm.com Link: https://lkml.kernel.org/r/20210608091316.3622-9-rppt@kernel.org Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-07x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= optionsMike Rapoport1-29/+0
The CONFIG_X86_RESERVE_LOW build time and reservelow= command line option allowed to control the amount of memory under 1M that would be reserved at boot to avoid using memory that can be potentially clobbered by BIOS. Since the entire range under 1M is always reserved there is no need for these options anymore and they can be removed. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210601075354.5149-3-rppt@kernel.org
2021-05-05x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLEOscar Salvador1-0/+3
Enable x86_64 platform to use the MHP_MEMMAP_ON_MEMORY feature. Link: https://lkml.kernel.org/r/20210421102701.25051-8-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05mm: drop redundant ARCH_ENABLE_SPLIT_PMD_PTLOCKAnshuman Khandual1-4/+1
ARCH_ENABLE_SPLIT_PMD_PTLOCKS has duplicate definitions on platforms that subscribe it. Drop these redundant definitions and instead just select it on applicable platforms. Link: https://lkml.kernel.org/r/1617259448-22529-6-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390] Cc: Will Deacon <will@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05mm: drop redundant ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATIONAnshuman Khandual1-8/+2
ARCH_ENABLE_[HUGEPAGE|THP]_MIGRATION configs have duplicate definitions on platforms that subscribe them. Drop these reduntant definitions and instead just select them appropriately. [akpm@linux-foundation.org: s/x86_64/X86_64/, per Oscar] Link: https://lkml.kernel.org/r/1617259448-22529-5-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Will Deacon <will@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05mm: generalize ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE]Anshuman Khandual1-8/+2
ARCH_ENABLE_MEMORY_[HOTPLUG|HOTREMOVE] configs have duplicate definitions on platforms that subscribe them. Instead, just make them generic options which can be selected on applicable platforms. Link: https://lkml.kernel.org/r/1617259448-22529-4-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390] Cc: Will Deacon <will@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05mm: generalize ARCH_HAS_CACHE_LINE_SIZEAnshuman Khandual1-3/+1
Patch series "mm: some config cleanups", v2. This series contains config cleanup patches which reduces code duplication across platforms and also improves maintainability. There is no functional change intended with this series. This patch (of 6): ARCH_HAS_CACHE_LINE_SIZE config has duplicate definitions on platforms that subscribe it. Instead, just make it a generic option which can be selected on applicable platforms. This change reduces code duplication and makes it cleaner. Link: https://lkml.kernel.org/r/1617259448-22529-1-git-send-email-anshuman.khandual@arm.com Link: https://lkml.kernel.org/r/1617259448-22529-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc] Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05userfaultfd: add minor fault registration modeAxel Rasmussen1-0/+1
Patch series "userfaultfd: add minor fault handling", v9. Overview ======== This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS. When enabled (via the UFFDIO_API ioctl), this feature means that any hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also* get events for "minor" faults. By "minor" fault, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s) (shared memory). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea is, userspace resolves the fault by either a) doing nothing if the contents are already correct, or b) updating the underlying contents using the second, non-UFFD mapping (via memcpy/memset or similar, or something fancier like RDMA, or etc...). In either case, userspace issues UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". Use Case ======== Consider the use case of VM live migration (e.g. under QEMU/KVM): 1. While a VM is still running, we copy the contents of its memory to a target machine. The pages are populated on the target by writing to the non-UFFD mapping, using the setup described above. The VM is still running (and therefore its memory is likely changing), so this may be repeated several times, until we decide the target is "up to date enough". 2. We pause the VM on the source, and start executing on the target machine. During this gap, the VM's user(s) will *see* a pause, so it is desirable to minimize this window. 3. Between the last time any page was copied from the source to the target, and when the VM was paused, the contents of that page may have changed - and therefore the copy we have on the target machine is out of date. Although we can keep track of which pages are out of date, for VMs with large amounts of memory, it is "slow" to transfer this information to the target machine. We want to resume execution before such a transfer would complete. 4. So, the guest begins executing on the target machine. The first time it touches its memory (via the UFFD-registered mapping), userspace wants to intercept this fault. Userspace checks whether or not the page is up to date, and if not, copies the updated page from the source machine, via the non-UFFD mapping. Finally, whether a copy was performed or not, userspace issues a UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". We don't have to do all of the final updates on-demand. The userfaultfd manager can, in the background, also copy over updated pages once it receives the map of which pages are up-to-date or not. Interaction with Existing APIs ============================== Because this is a feature, a registered VMA could potentially receive both missing and minor faults. I spent some time thinking through how the existing API interacts with the new feature: UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault: - For non-shared memory or shmem, -EINVAL is returned. - For hugetlb, -EFAULT is returned. UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults. Without modifications, the existing codepath assumes a new page needs to be allocated. This is okay, since userspace must have a second non-UFFD-registered mapping anyway, thus there isn't much reason to want to use these in any case (just memcpy or memset or similar). - If UFFDIO_COPY is used on a minor fault, -EEXIST is returned. - If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case). - UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns -ENOENT in that case (regardless of the kind of fault). Future Work =========== This series only supports hugetlbfs. I have a second series in flight to support shmem as well, extending the functionality. This series is more mature than the shmem support at this point, and the functionality works fully on hugetlbfs, so this series can be merged first and then shmem support will follow. This patch (of 6): This feature allows userspace to intercept "minor" faults. By "minor" faults, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. This commit adds the new registration mode, and sets the relevant flag on the VMAs being registered. In the hugetlb fault path, if we find that we have huge_pte_none(), but find_lock_page() does indeed find an existing page, then we have a "minor" fault, and if the VMA has the userfaultfd registration flag, we call into userfaultfd to handle it. This is implemented as a new registration mode, instead of an API feature. This is because the alternative implementation has significant drawbacks [1]. However, doing it this was requires we allocate a VM_* flag for the new registration mode. On 32-bit systems, there are no unused bits, so this feature is only supported on architectures with CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in MINOR mode on 32-bit architectures, we return -EINVAL. [1] https://lore.kernel.org/patchwork/patch/1380226/ [peterx@redhat.com: fix minor fault page leak] Link: https://lkml.kernel.org/r/20210322175132.36659-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Steven Price <steven.price@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30mm/memtest: add ARCH_USE_MEMTESTAnshuman Khandual1-0/+1
early_memtest() does not get called from all architectures. Hence enabling CONFIG_MEMTEST and providing a valid memtest=[1..N] kernel command line option might not trigger the memory pattern tests as would be expected in normal circumstances. This situation is misleading. The change here prevents the above mentioned problem after introducing a new config option ARCH_USE_MEMTEST that should be subscribed on platforms that call early_memtest(), in order to enable the config CONFIG_MEMTEST. Conversely CONFIG_MEMTEST cannot be enabled on platforms where it would not be tested anyway. Link: https://lkml.kernel.org/r/1617269193-22294-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> (arm64) Reviewed-by: Max Filippov <jcmvbkbc@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-28Merge tag 'x86_core_for_v5.13' of ↵Linus Torvalds1-5/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 updates from Borislav Petkov: - Turn the stack canary into a normal __percpu variable on 32-bit which gets rid of the LAZY_GS stuff and a lot of code. - Add an insn_decode() API which all users of the instruction decoder should preferrably use. Its goal is to keep the details of the instruction decoder away from its users and simplify and streamline how one decodes insns in the kernel. Convert its users to it. - kprobes improvements and fixes - Set the maximum DIE per package variable on Hygon - Rip out the dynamic NOP selection and simplify all the machinery around selecting NOPs. Use the simplified NOPs in objtool now too. - Add Xeon Sapphire Rapids to list of CPUs that support PPIN - Simplify the retpolines by folding the entire thing into an alternative now that objtool can handle alternatives with stack ops. Then, have objtool rewrite the call to the retpoline with the alternative which then will get patched at boot time. - Document Intel uarch per models in intel-family.h - Make Sub-NUMA Clustering topology the default and Cluster-on-Die the exception on Intel. * tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) x86, sched: Treat Intel SNC topology as default, COD as exception x86/cpu: Comment Skylake server stepping too x86/cpu: Resort and comment Intel models objtool/x86: Rewrite retpoline thunk calls objtool: Skip magical retpoline .altinstr_replacement objtool: Cache instruction relocs objtool: Keep track of retpoline call sites objtool: Add elf_create_undef_symbol() objtool: Extract elf_symbol_add() objtool: Extract elf_strtab_concat() objtool: Create reloc sections implicitly objtool: Add elf_create_reloc() helper objtool: Rework the elf_rebuild_reloc_section() logic objtool: Fix static_call list generation objtool: Handle per arch retpoline naming objtool: Correctly handle retpoline thunk calls x86/retpoline: Simplify retpolines x86/alternatives: Optimize optimize_nops() x86: Add insn_decode_kernel() x86/kprobes: Move 'inline' to the beginning of the kprobe_is_ss() declaration ...
2021-04-26Merge tag 'x86-entry-2021-04-26' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull entry code update from Thomas Gleixner: "Provide support for randomized stack offsets per syscall to make stack-based attacks harder which rely on the deterministic stack layout. The feature is based on the original idea of PaX's RANDSTACK feature, but uses a significantly different implementation. The offset does not affect the pt_regs location on the task stack as this was agreed on to be of dubious value. The offset is applied before the actual syscall is invoked. The offset is stored per cpu and the randomization happens at the end of the syscall which is less predictable than on syscall entry. The mechanism to apply the offset is via alloca(), i.e. abusing the dispised VLAs. This comes with the drawback that stack-clash-protection has to be disabled for the affected compilation units and there is also a negative interaction with stack-protector. Those downsides are traded with the advantage that this approach does not require any intrusive changes to the low level assembly entry code, does not affect the unwinder and the correct stack alignment is handled automatically by the compiler. The feature is guarded with a static branch which avoids the overhead when disabled. Currently this is supported for X86 and ARM64" * tag 'x86-entry-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arm64: entry: Enable random_kstack_offset support lkdtm: Add REPORT_STACK for checking stack offsets x86/entry: Enable random_kstack_offset support stack: Optionally randomize kernel stack offset each syscall init_on_alloc: Optimize static branches jump_label: Provide CONFIG-driven build state defaults
2021-04-26Merge tag 'x86_platform_for_v5.13' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Borislav Petkov: "A bunch of SGI UV improvements, fixes and cleanups" * tag 'x86_platform_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/uv: Remove dead !CONFIG_KEXEC_CORE code x86/platform/uv: Fix !KEXEC build failure x86/platform/uv: Add more to secondary CPU kdump info x86/platform/uv: Use x2apic enabled bit as set by BIOS to indicate APIC mode x86/platform/uv: Set section block size for hubless architectures x86/platform/uv: Fix indentation warning in Documentation/ABI/testing/sysfs-firmware-sgi_uv
2021-04-26Merge tag 'x86_build_for_v5.13' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Borislav Petkov: "A bunch of clang build fixes and a Kconfig highmem selection fix for 486SX" * tag 'x86_build_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Disable HIGHMEM64G selection for M486SX efi/libstub: Add $(CLANG_FLAGS) to x86 flags x86/boot: Add $(CLANG_FLAGS) to compressed KBUILD_CFLAGS x86/build: Propagate $(CLANG_FLAGS) to $(REALMODE_FLAGS)
2021-04-26Merge tag 'x86_sgx_for_v5.13' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SGX updates from Borislav Petkov: "Add the guest side of SGX support in KVM guests. Work by Sean Christopherson, Kai Huang and Jarkko Sakkinen. Along with the usual fixes, cleanups and improvements" * tag 'x86_sgx_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86/sgx: Mark sgx_vepc_vm_ops static x86/sgx: Do not update sgx_nr_free_pages in sgx_setup_epc_section() x86/sgx: Move provisioning device creation out of SGX driver x86/sgx: Add helpers to expose ECREATE and EINIT to KVM x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs x86/sgx: Add encls_faulted() helper x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT) x86/sgx: Move ENCLS leaf definitions to sgx.h x86/sgx: Expose SGX architectural definitions to the kernel x86/sgx: Initialize virtual EPC driver even when SGX driver is disabled x86/cpu/intel: Allow SGX virtualization without Launch Control support x86/sgx: Introduce virtual EPC for use by KVM guests x86/sgx: Add SGX_CHILD_PRESENT hardware error code x86/sgx: Wipe out EREMOVE from sgx_free_epc_page() x86/cpufeatures: Add SGX1 and SGX2 sub-features x86/cpufeatures: Make SGX_LC feature bit depend on SGX bit x86/sgx: Remove unnecessary kmap() from sgx_ioc_enclave_init() selftests/sgx: Use getauxval() to simplify test code selftests/sgx: Improve error detection and messages x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page() ...
2021-04-26Merge tag 'x86_seves_for_v5.13' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 AMD secure virtualization (SEV-ES) updates from Borislav Petkov: "Add support for SEV-ES guests booting through the 32-bit boot path, along with cleanups, fixes and improvements" * tag 'x86_seves_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev-es: Optimize __sev_es_ist_enter() for better readability x86/sev-es: Replace open-coded hlt-loops with sev_es_terminate() x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path x86/boot/compressed/64: Add CPUID sanity check to 32-bit boot-path x86/boot/compressed/64: Add 32-bit boot #VC handler x86/boot/compressed/64: Setup IDT in startup_32 boot path x86/boot/compressed/64: Reload CS in startup_32 x86/sev: Do not require Hypervisor CPUID bit for SEV guests x86/boot/compressed/64: Cleanup exception handling before booting kernel x86/virtio: Have SEV guests enforce restricted virtio memory access x86/sev-es: Remove subtraction of res variable
2021-04-20x86/platform/uv: Fix !KEXEC build failureIngo Molnar1-0/+1
When KEXEC is disabled, the UV build fails: arch/x86/platform/uv/uv_nmi.c:875:14: error: ‘uv_nmi_kexec_failed’ undeclared (first use in this function) Since uv_nmi_kexec_failed is only defined in the KEXEC_CORE #ifdef branch, this code cannot ever have been build tested: if (main) pr_err("UV: NMI kdump: KEXEC not supported in this kernel\n"); atomic_set(&uv_nmi_kexec_failed, 1); Nor is this use possible in uv_handle_nmi(): atomic_set(&uv_nmi_kexec_failed, 0); These bugs were introduced in this commit: d0a9964e9873: ("x86/platform/uv: Implement simple dump failover if kdump fails") Which added the uv_nmi_kexec_failed assignments to !KEXEC code, while making the definition KEXEC-only - apparently without testing the !KEXEC case. Instead of complicating the #ifdef maze, simplify the code by requiring X86_UV to depend on KEXEC_CORE. This pattern is present in other architectures as well. ( We'll remove the untested, 7 years old !KEXEC complications from the file in a separate commit. ) Fixes: d0a9964e9873: ("x86/platform/uv: Implement simple dump failover if kdump fails") Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Mike Travis <travis@sgi.com> Cc: linux-kernel@vger.kernel.org
2021-04-19x86/build: Disable HIGHMEM64G selection for M486SXMaciej W. Rozycki1-1/+1
Fix a regression caused by making the 486SX separately selectable in Kconfig, for which the HIGHMEM64G setting has not been updated and therefore has become exposed as a user-selectable option for the M486SX configuration setting unlike with original M486 and all the other settings that choose non-PAE-enabled processors: High Memory Support > 1. off (NOHIGHMEM) 2. 4GB (HIGHMEM4G) 3. 64GB (HIGHMEM64G) choice[1-3?]: With the fix in place the setting is now correctly removed: High Memory Support > 1. off (NOHIGHMEM) 2. 4GB (HIGHMEM4G) choice[1-2?]: [ bp: Massage commit message. ] Fixes: 87d6021b8143 ("x86/math-emu: Limit MATH_EMULATION to 486SX compatibles") Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # v5.5+ Link: https://lkml.kernel.org/r/alpine.DEB.2.21.2104141221340.44318@angie.orcam.me.uk
2021-04-08x86/entry: Enable random_kstack_offset supportKees Cook1-0/+1
Allow for a randomized stack offset on a per-syscall basis, with roughly 5-6 bits of entropy, depending on compiler and word size. Since the method of offsetting uses macros, this cannot live in the common entry code (the stack offset needs to be retained for the life of the syscall, which means it needs to happen at the actual entry point). Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210401232347.2791257-5-keescook@chromium.org
2021-03-31Merge 'x86/alternatives'Borislav Petkov1-0/+1
Pick up dependent changes. Signed-off-by: Borislav Petkov <bp@suse.de>
2021-03-19x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()Jarkko Sakkinen1-0/+1
Background ========== SGX enclave memory is enumerated by the processor in contiguous physical ranges called Enclave Page Cache (EPC) sections. Currently, there is a free list per section, but allocations simply target the lowest-numbered sections. This is functional, but has no NUMA awareness. Fortunately, EPC sections are covered by entries in the ACPI SRAT table. These entries allow each EPC section to be associated with a NUMA node, just like normal RAM. Solution ======== Implement a NUMA-aware enclave page allocator. Mirror the buddy allocator and maintain a list of enclave pages for each NUMA node. Attempt to allocate enclave memory first from local nodes, then fall back to other nodes. Note that the fallback is not as sophisticated as the buddy allocator and is itself not aware of NUMA distances. When a node's free list is empty, it searches for the next-highest node with enclave pages (and will wrap if necessary). This could be improved in the future. Other ===== NUMA_KEEP_MEMINFO dependency is required for phys_to_target_node(). [ Kai Huang: Do not return NULL from __sgx_alloc_epc_page() because callers do not expect that and that leads to a NULL ptr deref. ] [ dhansen: Fix an uninitialized 'nid' variable in __sgx_alloc_epc_page() as Reported-by: kernel test robot <lkp@intel.com> to avoid any potential allocations from the wrong NUMA node or even premature allocation failures. ] Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/lkml/158188326978.894464.217282995221175417.stgit@dwillia2-desk3.amr.corp.intel.com/ Link: https://lkml.kernel.org/r/20210319040602.178558-1-kai.huang@intel.com Link: https://lkml.kernel.org/r/20210318214933.29341-1-dave.hansen@intel.com Link: https://lkml.kernel.org/r/20210317235332.362001-2-jarkko.sakkinen@intel.com
2021-03-15Merge 'x86/seves' into x86/coreBorislav Petkov1-0/+1
Pick up dependent changes. Signed-off-by: Borislav Petkov <bp@suse.de>
2021-03-11x86/paravirt: Switch time pvops functions to use static_call()Juergen Gross1-0/+1
The time pvops functions are the only ones left which might be used in 32-bit mode and which return a 64-bit value. Switch them to use the static_call() mechanism instead of pvops, as this allows quite some simplification of the pvops implementation. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210311142319.4723-5-jgross@suse.com
2021-03-08x86/virtio: Have SEV guests enforce restricted virtio memory accessTom Lendacky1-0/+1
An SEV guest requires that virtio devices use the DMA API to allow the hypervisor to successfully access guest memory as needed. The VIRTIO_F_VERSION_1 and VIRTIO_F_ACCESS_PLATFORM features tell virtio to use the DMA API. Add arch_has_restricted_virtio_memory_access() for x86, to fail the device probe if these features have not been set for the device when running as an SEV guest. [ bp: Fix -Wmissing-prototypes warning Reported-by: kernel test robot <lkp@intel.com> ] Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/b46e0211f77ca1831f11132f969d470a6ffc9267.1614897610.git.thomas.lendacky@amd.com
2021-03-08x86/stackprotector/32: Make the canary into a regular percpu variableAndy Lutomirski1-5/+2
On 32-bit kernels, the stackprotector canary is quite nasty -- it is stored at %gs:(20), which is nasty because 32-bit kernels use %fs for percpu storage. It's even nastier because it means that whether %gs contains userspace state or kernel state while running kernel code depends on whether stackprotector is enabled (this is CONFIG_X86_32_LAZY_GS), and this setting radically changes the way that segment selectors work. Supporting both variants is a maintenance and testing mess. Merely rearranging so that percpu and the stack canary share the same segment would be messy as the 32-bit percpu address layout isn't currently compatible with putting a variable at a fixed offset. Fortunately, GCC 8.1 added options that allow the stack canary to be accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary percpu variable. This lets us get rid of all of the code to manage the stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess. (That name is special. We could use any symbol we want for the %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any name other than __stack_chk_guard.) Forcibly disable stackprotector on older compilers that don't support the new options and turn the stack canary into a percpu variable. The "lazy GS" approach is now used for all 32-bit configurations. Also makes load_gs_index() work on 32-bit kernels. On 64-bit kernels, it loads the GS selector and updates the user GSBASE accordingly. (This is unchanged.) On 32-bit kernels, it loads the GS selector and updates GSBASE, which is now always the user base. This means that the overall effect is the same on 32-bit and 64-bit, which avoids some ifdeffery. [ bp: Massage commit message. ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/c0ff7dba14041c7e5d1cae5d4df052f03759bef3.1613243844.git.luto@kernel.org
2021-02-26x86, kfence: enable KFENCE for x86Alexander Potapenko1-0/+1
Add architecture specific implementation details for KFENCE and enable KFENCE for the x86 architecture. In particular, this implements the required interface in <asm/kfence.h> for setting up the pool and providing helper functions for protecting and unprotecting pages. For x86, we need to ensure that the pool uses 4K pages, which is done using the set_memory_4k() helper function. [elver@google.com: add missing copyright and description header] Link: https://lkml.kernel.org/r/20210118092159.145934-2-elver@google.com Link: https://lkml.kernel.org/r/20201103175841.3495947-3-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Co-developed-by: Marco Elver <elver@google.com> Reviewed-by: Jann Horn <jannh@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christopher Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hillf Danton <hdanton@sina.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joern Engel <joern@purestorage.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: SeongJae Park <sjpark@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-25Merge tag 'x86-entry-2021-02-24' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 irq entry updates from Thomas Gleixner: "The irq stack switching was moved out of the ASM entry code in course of the entry code consolidation. It ended up being suboptimal in various ways. This reworks the X86 irq stack handling: - Make the stack switching inline so the stackpointer manipulation is not longer at an easy to find place. - Get rid of the unnecessary indirect call. - Avoid the double stack switching in interrupt return and reuse the interrupt stack for softirq handling. - A objtool fix for CONFIG_FRAME_POINTER=y builds where it got confused about the stack pointer manipulation" * tag 'x86-entry-2021-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Fix stack-swizzle for FRAME_POINTER=y um: Enforce the usage of asm-generic/softirq_stack.h x86/softirq/64: Inline do_softirq_own_stack() softirq: Move do_softirq_own_stack() to generic asm header softirq: Move __ARCH_HAS_DO_SOFTIRQ to Kconfig x86: Select CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK x86/softirq: Remove indirection in do_softirq_own_stack() x86/entry: Use run_sysvec_on_irqstack_cond() for XEN upcall x86/entry: Convert device interrupts to inline stack switching x86/entry: Convert system vectors to irq stack macro x86/irq: Provide macro for inlining irq stack switching x86/apic: Split out spurious handling code x86/irq/64: Adjust the per CPU irq stack pointer by 8 x86/irq: Sanitize irq stack tracking x86/entry: Fix instrumentation annotation
2021-02-24Merge tag 'sfi-removal-5.12-rc1' of ↵Linus Torvalds1-17/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull Simple Firmware Interface (SFI) support removal from Rafael Wysocki: "Drop support for depercated platforms using SFI, drop the entire support for SFI that has been long deprecated too and make some janitorial changes on top of that (Andy Shevchenko)" * tag 'sfi-removal-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: x86/platform/intel-mid: Update Copyright year and drop file names x86/platform/intel-mid: Remove unused header inclusion in intel-mid.h x86/platform/intel-mid: Drop unused __intel_mid_cpu_chip and Co. x86/platform/intel-mid: Get rid of intel_scu_ipc_legacy.h x86/PCI: Describe @reg for type1_access_ok() x86/PCI: Get rid of custom x86 model comparison sfi: Remove framework for deprecated firmware cpufreq: sfi-cpufreq: Remove driver for deprecated firmware media: atomisp: Remove unused header mfd: intel_msic: Remove driver for deprecated platform x86/apb_timer: Remove driver for deprecated platform x86/platform/intel-mid: Remove unused leftovers (vRTC) x86/platform/intel-mid: Remove unused leftovers (msic) x86/platform/intel-mid: Remove unused leftovers (msic_thermal) x86/platform/intel-mid: Remove unused leftovers (msic_power_btn) x86/platform/intel-mid: Remove unused leftovers (msic_gpio) x86/platform/intel-mid: Remove unused leftovers (msic_battery) x86/platform/intel-mid: Remove unused leftovers (msic_ocd) x86/platform/intel-mid: Remove unused leftovers (msic_audio) platform/x86: intel_scu_wdt: Drop mistakenly added const
2021-02-24Merge tag 'clang-lto-v5.12-rc1-part2' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull more clang LTO updates from Kees Cook: "Clang LTO x86 enablement. Full disclosure: while this has _not_ been in linux-next (since it initially looked like the objtool dependencies weren't going to make v5.12), it has been under daily build and runtime testing by Sami for quite some time. These x86 portions have been discussed on lkml, with Peter, Josh, and others helping nail things down. The bulk of the changes are to get objtool working happily. The rest of the x86 enablement is very small. Summary: - Generate __mcount_loc in objtool (Peter Zijlstra) - Support running objtool against vmlinux.o (Sami Tolvanen) - Clang LTO enablement for x86 (Sami Tolvanen)" Link: https://lore.kernel.org/lkml/20201013003203.4168817-26-samitolvanen@google.com/ Link: https://lore.kernel.org/lkml/cover.1611263461.git.jpoimboe@redhat.com/ * tag 'clang-lto-v5.12-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: kbuild: lto: force rebuilds when switching CONFIG_LTO x86, build: allow LTO to be selected x86, cpu: disable LTO for cpu.c x86, vdso: disable LTO only for vDSO kbuild: lto: postpone objtool objtool: Split noinstr validation from --vmlinux x86, build: use objtool mcount tracing: add support for objtool mcount objtool: Don't autodetect vmlinux.o objtool: Fix __mcount_loc generation with Clang's assembler objtool: Add a pass for generating __mcount_loc
2021-02-23x86, build: allow LTO to be selectedSami Tolvanen1-0/+2
Pass code model and stack alignment to the linker as these are not stored in LLVM bitcode, and allow CONFIG_LTO_CLANG* to be enabled. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org>
2021-02-23x86, build: use objtool mcountSami Tolvanen1-0/+1
Select HAVE_OBJTOOL_MCOUNT if STACK_VALIDATION is selected to use objtool to generate __mcount_loc sections for dynamic ftrace with Clang and gcc <5 (later versions of gcc use -mrecord-mcount). Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org>
2021-02-21Merge tag 'sched-core-2021-02-17' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core scheduler updates: - Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the preempt=none/voluntary/full boot options (default: full), to allow distros to build a PREEMPT kernel but fall back to close to PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling behavior via a boot time selection. There's also the /debug/sched_debug switch to do this runtime. This feature is implemented via runtime patching (a new variant of static calls). The scope of the runtime patching can be best reviewed by looking at the sched_dynamic_update() function in kernel/sched/core.c. ( Note that the dynamic none/voluntary mode isn't 100% identical, for example preempt-RCU is available in all cases, plus the preempt count is maintained in all models, which has runtime overhead even with the code patching. ) The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast majority of distributions, are supposed to be unaffected. - Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that was found via rcutorture triggering a hang. The bug is that rcu_idle_enter() may wake up a NOCB kthread, but this happens after the last generic need_resched() check. Some cpuidle drivers fix it by chance but many others don't. In true 2020 fashion the original bug fix has grown into a 5-patch scheduler/RCU fix series plus another 16 RCU patches to address the underlying issue of missed preemption events. These are the initial fixes that should fix current incarnations of the bug. - Clean up rbtree usage in the scheduler, by providing & using the following consistent set of rbtree APIs: partial-order; less() based: - rb_add(): add a new entry to the rbtree - rb_add_cached(): like rb_add(), but for a rb_root_cached total-order; cmp() based: - rb_find(): find an entry in an rbtree - rb_find_add(): find an entry, and add if not found - rb_find_first(): find the first (leftmost) matching entry - rb_next_match(): continue from rb_find_first() - rb_for_each(): iterate a sub-tree using the previous two - Improve the SMP/NUMA load-balancer: scan for an idle sibling in a single pass. This is a 4-commit series where each commit improves one aspect of the idle sibling scan logic. - Improve the cpufreq cooling driver by getting the effective CPU utilization metrics from the scheduler - Improve the fair scheduler's active load-balancing logic by reducing the number of active LB attempts & lengthen the load-balancing interval. This improves stress-ng mmapfork performance. - Fix CFS's estimated utilization (util_est) calculation bug that can result in too high utilization values Misc updates & fixes: - Fix the HRTICK reprogramming & optimization feature - Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code - Reduce dl_add_task_root_domain() overhead - Fix uprobes refcount bug - Process pending softirqs in flush_smp_call_function_from_idle() - Clean up task priority related defines, remove *USER_*PRIO and USER_PRIO() - Simplify the sched_init_numa() deduplication sort - Documentation updates - Fix EAS bug in update_misfit_status(), which degraded the quality of energy-balancing - Smaller cleanups" * tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) sched,x86: Allow !PREEMPT_DYNAMIC entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point entry: Explicitly flush pending rcuog wakeup before last rescheduling point rcu/nocb: Trigger self-IPI on late deferred wake up before user resume rcu/nocb: Perform deferred wake up before last idle's need_resched() check rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers sched/features: Distinguish between NORMAL and DEADLINE hrtick sched/features: Fix hrtick reprogramming sched/deadline: Reduce rq lock contention in dl_add_task_root_domain() uprobes: (Re)add missing get_uprobe() in __find_uprobe() smp: Process pending softirqs in flush_smp_call_function_from_idle() sched: Harden PREEMPT_DYNAMIC static_call: Allow module use without exposing static_call_key sched: Add /debug/sched_preempt preempt/dynamic: Support dynamic preempt with preempt= boot option preempt/dynamic: Provide irqentry_exit_cond_resched() static call preempt/dynamic: Provide preempt_schedule[_notrace]() static calls preempt/dynamic: Provide cond_resched() and might_resched() static calls preempt: Introduce CONFIG_PREEMPT_DYNAMIC static_call: Provide DEFINE_STATIC_CALL_RET0() ...
2021-02-21Merge tag 'oprofile-removal-5.12' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux Pull oprofile and dcookies removal from Viresh Kumar: "Remove oprofile and dcookies support The 'oprofile' user-space tools don't use the kernel OPROFILE support any more, and haven't in a long time. User-space has been converted to the perf interfaces. The dcookies stuff is only used by the oprofile code. Now that oprofile's support is getting removed from the kernel, there is no need for dcookies as well. Remove kernel's old oprofile and dcookies support" * tag 'oprofile-removal-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux: fs: Remove dcookies support drivers: Remove CONFIG_OPROFILE support arch: xtensa: Remove CONFIG_OPROFILE support arch: x86: Remove CONFIG_OPROFILE support arch: sparc: Remove CONFIG_OPROFILE support arch: sh: Remove CONFIG_OPROFILE support arch: s390: Remove CONFIG_OPROFILE support arch: powerpc: Remove oprofile arch: powerpc: Stop building and using oprofile arch: parisc: Remove CONFIG_OPROFILE support arch: mips: Remove CONFIG_OPROFILE support arch: microblaze: Remove CONFIG_OPROFILE support arch: ia64: Remove rest of perfmon support arch: ia64: Remove CONFIG_OPROFILE support arch: hexagon: Don't select HAVE_OPROFILE arch: arc: Remove CONFIG_OPROFILE support arch: arm: Remove CONFIG_OPROFILE support arch: alpha: Remove CONFIG_OPROFILE support
2021-02-21Merge branch 'work.elf-compat' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull ELF compat updates from Al Viro: "Sanitizing ELF compat support, especially for triarch architectures: - X32 handling cleaned up - MIPS64 uses compat_binfmt_elf.c both for O32 and N32 now - Kconfig side of things regularized Eventually I hope to have compat_binfmt_elf.c killed, with both native and compat built from fs/binfmt_elf.c, with -DELF_BITS={64,32} passed by kbuild, but that's a separate story - not included here" * 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: get rid of COMPAT_ELF_EXEC_PAGESIZE compat_binfmt_elf: don't bother with undef of ELF_ARCH Kconfig: regularize selection of CONFIG_BINFMT_ELF mips compat: switch to compat_binfmt_elf.c mips: don't bother with ELF_CORE_EFLAGS mips compat: don't bother with ELF_ET_DYN_BASE mips: KVM_GUEST makes no sense for 64bit builds... mips: kill unused definitions in binfmt_elf[on]32.c mips binfmt_elf*32.c: use elfcore-compat.h x32: make X32, !IA32_EMULATION setups able to execute x32 binaries [amd64] clean PRSTATUS_SIZE/SET_PR_FPVALID up properly elf_prstatus: collect the common part (everything before pr_reg) into a struct binfmt_elf: partially sanitize PRSTATUS_SIZE and SET_PR_FPVALID
2021-02-21Merge tag 'x86_mm_for_v5.12' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm cleanups from Borislav Petkov: - PTRACE_GETREGS/PTRACE_PUTREGS regset selection cleanup - Another initial cleanup - more to follow - to the fault handling code. - Other minor cleanups and corrections. * tag 'x86_mm_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/{fault,efi}: Fix and rename efi_recover_from_page_fault() x86/fault: Don't run fixups for SMAP violations x86/fault: Don't look for extable entries for SMEP violations x86/fault: Rename no_context() to kernelmode_fixup_or_oops() x86/fault: Bypass no_context() for implicit kernel faults from usermode x86/fault: Split the OOPS code out from no_context() x86/fault: Improve kernel-executing-user-memory handling x86/fault: Correct a few user vs kernel checks wrt WRUSS x86/fault: Document the locking in the fault_signal_pending() path x86/fault/32: Move is_f00f_bug() to do_kern_addr_fault() x86/fault: Fold mm_fault_error() into do_user_addr_fault() x86/fault: Skip the AMD erratum #91 workaround on unaffected CPUs x86/fault: Fix AMD erratum #91 errata fixup for user code x86/Kconfig: Remove HPET_EMULATE_RTC depends on RTC x86/asm: Fixup TASK_SIZE_MAX comment x86/ptrace: Clean up PTRACE_GETREGS/PTRACE_PUTREGS regset selection x86/vm86/32: Remove VM86_SCREEN_BITMAP support x86: Remove definition of DEBUG x86/entry: Remove now unused do_IRQ() declaration x86/mm: Remove duplicate definition of _PAGE_PAT_LARGE ...
2021-02-21Merge tag 'ras_updates_for_v5.12' of ↵Linus Torvalds1-4/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - move therm_throt.c to the thermal framework, where it belongs. - identify CPUs which miss to enter the broadcast handler, as an additional debugging aid. * tag 'ras_updates_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: thermal: Move therm_throt there from x86/mce x86/mce: Get rid of mcheck_intel_therm_init() x86/mce: Make mce_timed_out() identify holdout CPUs
2021-02-17preempt: Introduce CONFIG_PREEMPT_DYNAMICMichal Hocko1-0/+1
Preemption mode selection is currently hardcoded on Kconfig choices. Introduce a dedicated option to tune preemption flavour at boot time, This will be only available on architectures efficiently supporting static calls in order not to tempt with the feature against additional overhead that might be prohibitive or undesirable. CONFIG_PREEMPT_DYNAMIC is automatically selected by CONFIG_PREEMPT if the architecture provides the necessary support (CONFIG_STATIC_CALL_INLINE, CONFIG_GENERIC_ENTRY, and provide with __preempt_schedule_function() / __preempt_schedule_notrace_function()). Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> [peterz: relax requirement to HAVE_STATIC_CALL] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20210118141223.123667-5-frederic@kernel.org
2021-02-15sfi: Remove framework for deprecated firmwareAndy Shevchenko1-5/+2
SFI-based platforms are gone. So does this framework. This removes mention of SFI through the drivers and other code as well. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-02-11softirq: Move __ARCH_HAS_DO_SOFTIRQ to KconfigThomas Gleixner1-0/+1
To prepare for inlining do_softirq_own_stack() replace __ARCH_HAS_DO_SOFTIRQ with a Kconfig switch and select it in the affected architectures. This allows in the next step to move the function prototype and the inline stub into a seperate asm-generic header file which is required to avoid include recursion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210210002513.181713427@linutronix.de
2021-02-11x86: Select CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACKThomas Gleixner1-0/+1
Now that all invocations of irq_exit_rcu() happen on the irq stack, turn on CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK which causes the core code to invoke __do_softirq() directly without going through do_softirq_own_stack(). That means do_softirq_own_stack() is only invoked from task context which means it can't be on the irq stack. Remove the conditional from run_softirq_on_irqstack_cond() and rename the function accordingly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210210002513.068033456@linutronix.de
2021-02-09x86/apb_timer: Remove driver for deprecated platformAndy Shevchenko1-12/+0
Intel Moorestown and Medfield are quite old Intel Atom based 32-bit platforms, which were in limited use in some Android phones, tablets and consumer electronics more than eight years ago. There are no bugs or problems ever reported outside from Intel for breaking any of that platforms for years. It seems no real users exists who run more or less fresh kernel on it. Commit 05f4434bc130 ("ASoC: Intel: remove mfld_machine") is also in align with this theory. Due to above and to reduce a burden of supporting outdated drivers, remove the support for outdated platforms completely. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-02-08thermal: Move therm_throt there from x86/mceBorislav Petkov1-4/+0
This functionality has nothing to do with MCE, move it to the thermal framework and untangle it from MCE. Requested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lkml.kernel.org/r/20210202121003.GD18075@zn.tnic
2021-02-05x86/Kconfig: Remove HPET_EMULATE_RTC depends on RTCAnand K Mistry1-1/+1
The RTC config option was removed in commit f52ef24be21a ("rtc/alpha: remove legacy rtc driver") Signed-off-by: Anand K Mistry <amistry@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20210204183205.1.If5c6ded53a00ecad6a02a1e974316291cc0239d1@changeid
2021-01-29arch: x86: Remove CONFIG_OPROFILE supportViresh Kumar1-1/+0
The "oprofile" user-space tools don't use the kernel OPROFILE support any more, and haven't in a long time. User-space has been converted to the perf interfaces. Remove the old oprofile's architecture specific support. Suggested-by: Christoph Hellwig <hch@infradead.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Robert Richter <rric@kernel.org> Acked-by: William Cohen <wcohen@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de>
2021-01-06Kconfig: regularize selection of CONFIG_BINFMT_ELFAl Viro1-1/+0
with mips converted to use of fs/config_binfmt_elf.c, there's no need to keep selects of that thing all over arch/* - we can simply turn into def_bool y if COMPAT && BINFMT_ELF (in fs/Kconfig.binfmt) and get rid of all selects. Several architectures got those selects wrong (e.g. you could end up with sparc64 sans BINFMT_ELF, with select violating dependencies, etc.) Randy Dunlap has spotted some of those; IMO this is simpler than his fix, but it depends upon the stuff that would need to be backported, so we might end up using his variant for -stable. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-06x32: make X32, !IA32_EMULATION setups able to execute x32 binariesAl Viro1-1/+1
It's really trivial - the only wrinkle is making sure that compiler knows that ia32-related side of COMPAT_ARCH_DLINFO is dead code on such configs (we don't get there without having passed compat_elf_check_arch(), and on such configs that'll fail for ia32 binary). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-01-06[amd64] clean PRSTATUS_SIZE/SET_PR_FPVALID up properlyAl Viro1-0/+1
To get rid of hardcoded size/offset in those macros we need to have a definition of i386 variant of struct elf_prstatus. However, we can't do that in asm/compat.h - the types needed for that are not there and adding an include of asm/user32.h into asm/compat.h would cause a lot of mess. That could be conveniently done in elfcore-compat.h, but currently there is nowhere to put arch-dependent parts of it - no asm/elfcore-compat.h. So we introduce a new file (asm/elfcore-compat.h, present on architectures that have CONFIG_ARCH_HAS_ELFCORE_COMPAT set, currently only on x86), have it pulled by linux/elfcore-compat.h and move the definitions there. As a side benefit, we don't need to worry about accidental inclusion of that file into binfmt_elf.c itself, so we don't need the dance with COMPAT_PRSTATUS_SIZE, etc. - only fs/compat_binfmt_elf.c will see that header. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-12-28fanotify: Fix sys_fanotify_mark() on native x86-32Brian Gerst1-0/+1
Commit 121b32a58a3a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments") converted native x86-32 which take 64-bit arguments to use the compat handlers to allow conversion to passing args via pt_regs. sys_fanotify_mark() was however missed, as it has a general compat handler. Add a config option that will use the syscall wrapper that takes the split args for native 32-bit. [ bp: Fix typo in Kconfig help text. ] Fixes: 121b32a58a3a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments") Reported-by: Paweł Jasiak <pawel@jasiak.xyz> Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20201130223059.101286-1-brgerst@gmail.com
2020-12-18Merge tag 'trace-v5.11' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The major update to this release is that there's a new arch config option called CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS. Currently, only x86_64 enables it. All the ftrace callbacks now take a struct ftrace_regs instead of a struct pt_regs. If the architecture has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then the ftrace_regs will have enough information to read the arguments of the function being traced, as well as access to the stack pointer. This way, if a user (like live kernel patching) only cares about the arguments, then it can avoid using the heavier weight "regs" callback, that puts in enough information in the struct ftrace_regs to simulate a breakpoint exception (needed for kprobes). A new config option that audits the timestamps of the ftrace ring buffer at most every event recorded. Ftrace recursion protection has been cleaned up to move the protection to the callback itself (this saves on an extra function call for those callbacks). Perf now handles its own RCU protection and does not depend on ftrace to do it for it (saving on that extra function call). New debug option to add "recursed_functions" file to tracefs that lists all the places that triggered the recursion protection of the function tracer. This will show where things need to be fixed as recursion slows down the function tracer. The eval enum mapping updates done at boot up are now offloaded to a work queue, as it caused a noticeable pause on slow embedded boards. Various clean ups and last minute fixes" * tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits) tracing: Offload eval map updates to a work queue Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS" ring-buffer: Add rb_check_bpage in __rb_allocate_pages ring-buffer: Fix two typos in comments tracing: Drop unneeded assignment in ring_buffer_resize() tracing: Disable ftrace selftests when any tracer is running seq_buf: Avoid type mismatch for seq_buf_init ring-buffer: Fix a typo in function description ring-buffer: Remove obsolete rb_event_is_commit() ring-buffer: Add test to validate the time stamp deltas ftrace/documentation: Fix RST C code blocks tracing: Clean up after filter logic rewriting tracing: Remove the useless value assignment in test_create_synth_event() livepatch: Use the default ftrace_ops instead of REGS when ARGS is available ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs MAINTAINERS: assign ./fs/tracefs to TRACING tracing: Fix some typos in comments ftrace: Remove unused varible 'ret' ring-buffer: Add recording of ring buffer recursion into recursed_functions ...
2020-12-16Merge tag 'asm-generic-timers-5.11' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic cross-architecture timer cleanup from Arnd Bergmann: "This cleans up two ancient timer features that were never completed in the past, CONFIG_GENERIC_CLOCKEVENTS and CONFIG_ARCH_USES_GETTIMEOFFSET. There was only one user left for the ARCH_USES_GETTIMEOFFSET variant of clocksource implementations, the ARM EBSA110 platform. Rather than changing to use modern timekeeping, we remove the platform entirely as Russell no longer uses his machine and nobody else seems to have one any more. The conditional code for using arch_gettimeoffset() is removed as a result. For CONFIG_GENERIC_CLOCKEVENTS, there are still a couple of platforms not using clockevent drivers: parisc, ia64, most of m68k, and one Arm platform. These all do timer ticks slighly differently, and this gets cleaned up to the point they at least all call the same helper function. Instead of most platforms using 'select GENERIC_CLOCKEVENTS' in Kconfig, the polarity is now reversed, with the few remaining ones selecting LEGACY_TIMER_TICK instead" * tag 'asm-generic-timers-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: timekeeping: default GENERIC_CLOCKEVENTS to enabled timekeeping: remove xtime_update m68k: remove timer_interrupt() function m68k: change remaining timers to legacy_timer_tick m68k: m68328: use legacy_timer_tick() m68k: sun3/sun3c: use legacy_timer_tick m68k: split heartbeat out of timer function m68k: coldfire: use legacy_timer_tick() parisc: use legacy_timer_tick ARM: rpc: use legacy_timer_tick ia64: convert to legacy_timer_tick timekeeping: add CONFIG_LEGACY_TIMER_TICK timekeeping: remove arch_gettimeoffset net: remove am79c961a driver ARM: remove ebsa110 platform