2023-02-03  mm: remove __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 31 files, -73/+0)
__HAVE_ARCH_PTE_SWP_EXCLUSIVE is now supported by all architectures that support swp PTEs, so let's drop it. Link: https://lkml.kernel.org/r/20230113171026.582290-27-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
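For context, the kind of generic fallback this removal makes unnecessary looks roughly like the sketch below (simplified, not the exact upstream diff): once every architecture provides the helpers, the feature-test block in the generic header and the per-arch #define __HAVE_ARCH_PTE_SWP_EXCLUSIVE lines can go.

	/* Simplified sketch of a generic fallback that becomes dead code
	 * once all architectures implement the helpers themselves. */
	#ifndef __HAVE_ARCH_PTE_SWP_EXCLUSIVE
	static inline pte_t pte_swp_mkexclusive(pte_t pte)
	{
		return pte;
	}

	static inline int pte_swp_exclusive(pte_t pte)
	{
		return 0;
	}

	static inline pte_t pte_swp_clear_exclusive(pte_t pte)
	{
		return pte;
	}
	#endif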
2023-02-03  xtensa/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 1 file, -5/+27)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using bit 1. This bit should be safe to use for our usecase. Most importantly, we can still distinguish swap PTEs from PAGE_NONE PTEs (see pte_present()) and don't use one of the two reserved attribute masks (1101 and 1111). Attribute mask 1100 and 1110 now identify swap PTEs. While at it, remove SWP_TYPE_BITS (not really helpful as it's not used in the actual swap macros) and mask the type in __swp_entry(). Link: https://lkml.kernel.org/r/20230113171026.582290-26-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
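The per-architecture changes in this series generally take the same shape: reserve a software bit that only has meaning for swap PTEs and wire up three helpers. A minimal sketch (bit value and macro naming are illustrative, not lifted from the xtensa header):

	/* Illustrative only: a software bit reserved for swap PTEs. */
	#define _PAGE_SWP_EXCLUSIVE	(1 << 1)

	static inline int pte_swp_exclusive(pte_t pte)
	{
		return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
	}

	static inline pte_t pte_swp_mkexclusive(pte_t pte)
	{
		pte_val(pte) |= _PAGE_SWP_EXCLUSIVE;
		return pte;
	}

	static inline pte_t pte_swp_clear_exclusive(pte_t pte)
	{
		pte_val(pte) &= ~_PAGE_SWP_EXCLUSIVE;
		return pte;
	}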
2023-02-03  x86/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE also on 32bit  (David Hildenbrand, 3 files, -10/+44)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE just like we already do on x86-64. After deciphering the PTE layout it becomes clear that there are still unused bits for 2-level and 3-level page tables that we should be able to use. Reusing a bit avoids stealing one bit from the swap offset. While at it, mask the type in __swp_entry(); use some helper definitions to make the macros easier to grasp. Link: https://lkml.kernel.org/r/20230113171026.582290-25-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  um/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 1 file, -2/+35)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using bit 10, which is yet unused for swap PTEs. The pte_mkuptodate() is a bit weird in __pte_to_swp_entry() for a swap PTE ... but it only messes with bit 1 and 2 and there is a comment in set_pte(), so leave these bits alone. While at it, mask the type in __swp_entry(). Link: https://lkml.kernel.org/r/20230113171026.582290-24-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 64bit  (David Hildenbrand, 1 file, -3/+35)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the type. Generic MM currently only uses 5 bits for the type (MAX_SWAPFILES_SHIFT), so the stolen bit was effectively unused. While at it, mask the type in __swp_entry(). Link: https://lkml.kernel.org/r/20230113171026.582290-23-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
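Several patches in the series rely on this same observation: generic MM needs only MAX_SWAPFILES_SHIFT (5) type bits, so the top bit of a wider arch type field is free. A hedged sketch of what masking the type in __swp_entry() looks like (the shift values are placeholders, not sparc's real constants):

	/* Placeholder shifts for illustration; the real ones are arch-specific. */
	#define SWP_TYPE_SHIFT		0
	#define SWP_OFFSET_SHIFT	8
	#define SWP_TYPE_MASK		0x1f	/* (1 << MAX_SWAPFILES_SHIFT) - 1 */

	#define __swp_type(entry)	(((entry).val >> SWP_TYPE_SHIFT) & SWP_TYPE_MASK)
	#define __swp_offset(entry)	((entry).val >> SWP_OFFSET_SHIFT)
	#define __swp_entry(type, offset)					\
		((swp_entry_t) {						\
			(((unsigned long)(type) & SWP_TYPE_MASK) << SWP_TYPE_SHIFT) | \
			((unsigned long)(offset) << SWP_OFFSET_SHIFT) })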
2023-02-03  sparc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit  (David Hildenbrand, 2 files, -12/+29)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by reusing the SRMMU_DIRTY bit as that seems to be safe to reuse inside a swap PTE. This avoids having to steal one bit from the swap offset. While at it, relocate the swap PTE layout documentation and use the same style now used for most other archs. Note that the old documentation was wrong: we use 20 bits for the offset, and there are 8 reserved bits, not the 7 shown in the ASCII art. Link: https://lkml.kernel.org/r/20230113171026.582290-22-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
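The relocated documentation follows the comment style these patches add across architectures; roughly (bit positions below are purely illustrative, not sparc's actual layout):

	/*
	 * Format of swap PTEs (illustrative layout, not real bit positions):
	 *
	 *   <---------------- offset ----------------> E <--- type ---> 0 0 0 0
	 *
	 *   E is the exclusive marker that is only valid for swap PTEs; the
	 *   bits that make pte_present() true must remain clear.
	 */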
2023-02-03  sh/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 1 file, -12/+42)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using bit 6 in the PTE, reducing the swap type in the !CONFIG_X2TLB case to 5 bits. Generic MM currently only uses 5 bits for the type (MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused. Interestingly, the swap type in the !CONFIG_X2TLB case could currently overlap with the _PAGE_PRESENT bit, because there is a sneaky shift by 1 in __pte_to_swp_entry() and __swp_entry_to_pte(). Bits 0-7 in the architecture-specific swap PTE would get shifted to bits 1-8 in the PTE. As generic MM uses 5 bits only, this didn't matter so far. While at it, mask the type in __swp_entry(). Link: https://lkml.kernel.org/r/20230113171026.582290-21-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
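The "sneaky shift" described above is visible in the conversion macros; a sketch of their shape (matching the description, not necessarily sh's exact definitions):

	/* Arch-specific swap bits 0-7 end up in PTE bits 1-8 because of the shift. */
	#define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val(pte) >> 1 })
	#define __swp_entry_to_pte(x)	((pte_t) { (x).val << 1 })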
2023-02-03  riscv/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 2 files, -5/+27)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the offset. This reduces the maximum swap space per file: on 32bit to 16 GiB (was 32 GiB). Note that this bit does not conflict with swap PMDs and could also be used in swap PMD context later. While at it, mask the type in __swp_entry(). Link: https://lkml.kernel.org/r/20230113171026.582290-20-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
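The size reduction follows directly from the width of the offset field; a quick sanity check (assuming 4 KiB pages):

	/*
	 * max swap size = 2^(offset bits) * PAGE_SIZE
	 *
	 *   before: 2^23 * 4 KiB = 32 GiB
	 *   after:  2^22 * 4 KiB = 16 GiB   (one offset bit now holds the marker)
	 */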
2023-02-03  powerpc/nohash/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 7 files, -28/+63)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit and 64bit. On 64bit, let's use MSB 56 (LSB 7), located right next to the page type. On 32bit, let's use LSB 2 to avoid stealing one bit from the swap offset. There seems to be no real reason why these bits cannot be used for swap PTEs. The important part is that _PAGE_PRESENT and _PAGE_HASHPTE remain 0. While at it, mask the type in __swp_entry() and remove _PAGE_BIT_SWAP_TYPE from pte-e500.h: while it was used in 64bit code it was ignored in 32bit code. Link: https://lkml.kernel.org/r/20230113171026.582290-19-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  powerpc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on 32bit book3s  (David Hildenbrand, 1 file, -5/+33)
We already implemented support for 64bit book3s in commit bff9beaa2e80 ("powerpc/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE for book3s"). Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE also in 32bit by reusing the yet-unused LSB 2 / MSB 29. There seems to be no real reason why that bit cannot be used, and reusing it avoids having to steal one bit from the swap offset. While at it, mask the type in __swp_entry(). Link: https://lkml.kernel.org/r/20230113171026.582290-18-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  parisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 1 file, -3/+38)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using the yet-unused _PAGE_ACCESSED location in the swap PTE. Looking at pte_present() and pte_none() checks, there seems to be no actual reason why we cannot use it: we only have to make sure we're not using _PAGE_PRESENT. Reusing this bit avoids having to steal one bit from the swap offset. Link: https://lkml.kernel.org/r/20230113171026.582290-17-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
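Reusing an existing software bit rather than shrinking the offset plausibly boils down to a one-line alias plus the usual helpers; a sketch under that assumption (not verified against the parisc header):

	/* Assumption for illustration: the accessed bit doubles as the
	 * exclusive marker in swap PTEs, which pte_present()/pte_none()
	 * never inspect. */
	#define _PAGE_SWP_EXCLUSIVE	_PAGE_ACCESSED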
2023-02-03  openrisc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 1 file, -5/+36)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the type. Generic MM currently only uses 5 bits for the type (MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused. While at it, mask the type in __swp_entry(). Link: https://lkml.kernel.org/r/20230113171026.582290-16-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Stafford Horne <shorne@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  nios2/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 2 files, -1/+24)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using the yet-unused bit 31. Link: https://lkml.kernel.org/r/20230113171026.582290-15-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  nios2/mm: refactor swap PTE layout  (David Hildenbrand, 1 file, -8/+10)
nios2 disables swap for a good reason: it doesn't even provide sufficient type bits as required by core MM. However, swap entries are nowadays also used for other purposes (migration entries, PTE markers, HWPoison, ...), and accidental use could be problematic. Let's properly use 5 bits for the swap type and document the layout. Bits 26-31 should get ignored by hardware completely, so they can be used. Link: https://lkml.kernel.org/r/20230113171026.582290-14-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mips/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 3 files, -16/+131)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE. On 64bit, steal one bit from the type. Generic MM currently only uses 5 bits for the type (MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused. On 32bit we're able to locate unused bits. As the PTE layout for 32 bit is very confusing, document it a bit better. While at it, mask the type in __swp_entry()/mk_swap_pte(). Link: https://lkml.kernel.org/r/20230113171026.582290-13-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  microblaze/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 2 files, -12/+37)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the type. Generic MM currently only uses 5 bits for the type (MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused. The shift by 2 when converting between PTE and arch-specific swap entry makes the swap PTE layout a little bit harder to decipher. While at it, drop the comment from paulus (a copy-and-paste leftover from powerpc, where we actually have _PAGE_HASHPTE) and mask the type in __swp_entry_to_pte() as well. Link: https://lkml.kernel.org/r/20230113171026.582290-12-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  m68k/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 3 files, -9/+104)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the type. Generic MM currently only uses 5 bits for the type (MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused. While at it, make sure for sun3 that the valid bit never gets set by properly masking it off and mask the type in __swp_entry(). Link: https://lkml.kernel.org/r/20230113171026.582290-11-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  m68k/mm: remove dummy __swp definitions for nommu  (David Hildenbrand, 1 file, -6/+0)
The definitions are not required, let's remove them. Link: https://lkml.kernel.org/r/20230113171026.582290-10-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  loongarch/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 2 files, -4/+39)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the type. Generic MM currently only uses 5 bits for the type (MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused. While at it, also mask the type in mk_swap_pte(). Note that this bit does not conflict with swap PMDs and could also be used in swap PMD context later. Link: https://lkml.kernel.org/r/20230113171026.582290-9-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  ia64/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 1 file, -3/+29)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the type. Generic MM currently only uses 5 bits for the type (MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused. While at it, also mask the type in __swp_entry(). Link: https://lkml.kernel.org/r/20230113171026.582290-8-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  hexagon/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 1 file, -6/+31)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the offset. This reduces the maximum swap space per file to 16 GiB (was 32 GiB). While at it, mask the type in __swp_entry(). Link: https://lkml.kernel.org/r/20230113171026.582290-7-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Brian Cain <bcain@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  csky/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 3 files, -11/+39)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the offset. This reduces the maximum swap space per file to 16 GiB (was 32 GiB). We might actually be able to reuse one of the other software bits (_PAGE_READ / _PAGE_WRITE) instead, because we only have to keep pte_present(), pte_none() and HW happy. For now, let's keep it simple because there might be something non-obvious. Link: https://lkml.kernel.org/r/20230113171026.582290-6-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Guo Ren <guoren@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  arm/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 3 files, -7/+34)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the offset. This reduces the maximum swap space per file to 64 GiB (was 128 GiB). While at it drop the PTE_TYPE_FAULT from __swp_entry_to_pte() which is defined to be 0 and is rather confusing because we should be dealing with "Linux PTEs" not "hardware PTEs". Also, properly mask the type in __swp_entry(). Link: https://lkml.kernel.org/r/20230113171026.582290-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
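The PTE_TYPE_FAULT cleanup is functionally a no-op since the constant is defined to 0; roughly (a sketch, not the exact diff):

	/* Previously (roughly): __pte((swp).val | PTE_TYPE_FAULT),
	 * where PTE_TYPE_FAULT == 0, so dropping it changes nothing. */
	#define __swp_entry_to_pte(swp)	__pte((swp).val)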
2023-02-03  arc/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 1 file, -3/+24)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by using bit 5, which is yet unused. The only important part seems to be not to use _PAGE_PRESENT (bit 9). Link: https://lkml.kernel.org/r/20230113171026.582290-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Vineet Gupta <vgupta@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  alpha/mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE  (David Hildenbrand, 1 file, -4/+37)
Let's support __HAVE_ARCH_PTE_SWP_EXCLUSIVE by stealing one bit from the type. Generic MM currently only uses 5 bits for the type (MAX_SWAPFILES_SHIFT), so the stolen bit is effectively unused. While at it, mask the type in mk_swap_pte() as well. Link: https://lkml.kernel.org/r/20230113171026.582290-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checks  (David Hildenbrand, 1 file, -1/+24)
Patch series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". This is the follow-up on [1]: [PATCH v2 0/8] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of anonymous pages After we implemented __HAVE_ARCH_PTE_SWP_EXCLUSIVE on most prominent enterprise architectures, implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all remaining architectures that support swap PTEs. This makes sure that exclusive anonymous pages will stay exclusive, even after they were swapped out -- for example, making GUP R/W FOLL_GET of anonymous pages reliable. Details can be found in [1]. This primarily fixes remaining known O_DIRECT memory corruptions that can happen on concurrent swapout, whereby we can lose DMA reads to a page (modifying the user page by writing to it). To verify, there are two test cases (requiring swap space, obviously): (1) The O_DIRECT+swapout test case [2] from Andrea. This test case tries triggering a race condition. (2) My vmsplice() test case [3] that tries to detect if the exclusive marker was lost during swapout, not relying on a race condition. For example, on 32bit x86 (with and without PAE), my test case fails without these patches: $ ./test_swp_exclusive FAIL: page was replaced during COW But succeeds with these patches: $ ./test_swp_exclusive PASS: page was not replaced during COW Why implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all architectures, even the ones where swap support might be in a questionable state? This is the first step towards removing "readable_exclusive" migration entries, and instead using pte_swp_exclusive() also with (readable) migration entries (as suggested by Peter). The only missing piece for that is supporting pmd_swp_exclusive() on relevant architectures with THP migration support. As all relevant architectures now implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE, we can drop __HAVE_ARCH_PTE_SWP_EXCLUSIVE in the last patch. I tried cross-compiling all relevant setups and tested on x86 and sparc64 so far. CCing arch maintainers only on this cover letter and on the respective patch(es). [1] https://lkml.kernel.org/r/20220329164329.208407-1-david@redhat.com [2] https://gitlab.com/aarcange/kernel-testcases-for-v5.11/-/blob/main/page_count_do_wp_page-swap.c [3] https://gitlab.com/davidhildenbrand/scratchspace/-/blob/main/test_swp_exclusive.c This patch (of 26): We want to implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures. Let's extend our sanity checks, especially testing that our PTE bit does not affect: * is_swap_pte() -> pte_present() and pte_none() * the swap entry + type * pte_swp_soft_dirty() Especially, the pfn_pte() is dodgy when the swap PTE layout differs heavily from ordinary PTEs. Let's properly construct a swap PTE from swap type+offset. [david@redhat.com: fix build] Link: https://lkml.kernel.org/r/6aaad548-cf48-77fa-9d6c-db83d724b2eb@redhat.com Link: https://lkml.kernel.org/r/20230113171026.582290-1-david@redhat.com Link: https://lkml.kernel.org/r/20230113171026.582290-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: <aou@eecs.berkeley.edu> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S.
Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: H. Peter Anvin (Intel) <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Nadav Amit <namit@vmware.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Xu <peterx@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuerui Wang <kernel@xen0n.name> Cc: Yang Shi <shy828301@gmail.com> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
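A simplified sketch of the kind of check being added (not the exact upstream test code): construct a swap PTE from type+offset and verify that the exclusive marker round-trips without disturbing the swap entry or the swap-PTE nature of the PTE.

	static void __init pte_swp_exclusive_sanity(void)
	{
		swp_entry_t entry = swp_entry(1, 100);	/* arbitrary type/offset */
		pte_t pte = swp_entry_to_pte(entry);

		WARN_ON(!is_swap_pte(pte));
		WARN_ON(pte_swp_exclusive(pte));

		pte = pte_swp_mkexclusive(pte);
		WARN_ON(!pte_swp_exclusive(pte));
		WARN_ON(!is_swap_pte(pte));
		WARN_ON(pte_to_swp_entry(pte).val != entry.val);

		pte = pte_swp_clear_exclusive(pte);
		WARN_ON(pte_swp_exclusive(pte));
	}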
2023-02-03  mm/khugepaged: convert release_pte_pages() to use folios  (Vishal Moola (Oracle), 1 file, -7/+7)
Converts release_pte_pages() to use folios instead of pages. Link: https://lkml.kernel.org/r/20230114001556.43795-2-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mm/khugepaged: introduce release_pte_folio() to replace release_pte_page()  (Vishal Moola (Oracle), 1 file, -5/+10)
release_pte_page() is converted to be a wrapper for release_pte_folio() to help facilitate the khugepaged conversion to folios. This replaces 3 calls to compound_head() with 1, and saves 85 bytes of kernel text. Link: https://lkml.kernel.org/r/20230114001556.43795-1-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
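The wrapper pattern is the usual one for these conversions; a sketch, simplified relative to the real khugepaged code:

	static void release_pte_folio(struct folio *folio)
	{
		node_stat_mod_folio(folio,
				NR_ISOLATED_ANON + folio_is_file_lru(folio),
				-folio_nr_pages(folio));
		folio_unlock(folio);
		folio_putback_lru(folio);
	}

	/* Kept temporarily so page-based callers keep working. */
	static void release_pte_page(struct page *page)
	{
		release_pte_folio(page_folio(page));
	}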
2023-02-03  kmsan: silence -Wmissing-prototypes warnings  (Alexander Potapenko, 1 file, -0/+23)
When building the kernel with W=1, the compiler reports numerous warnings about the missing prototypes for KMSAN instrumentation hooks. Because these functions are not supposed to be called explicitly by the kernel code (calls to them are emitted by the compiler), they do not have to be declared in the headers. Instead, we add forward declarations right before the definitions to silence the warnings produced by -Wmissing-prototypes. Link: https://lkml.kernel.org/r/20230112103147.382416-1-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Vlastimil Babka <vbabka@suse.cz> Suggested-by: Marco Elver <elver@google.com> Reviewed-by: Marco Elver <elver@google.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/202301020356.dFruA4I5-lkp@intel.com/T/ Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
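The fix amounts to a prototype immediately before each definition; a sketch with a made-up hook name (real KMSAN hooks have different names and signatures):

	/* Hypothetical hook for illustration; calls are emitted by the compiler,
	 * so no header needs to export this prototype. */
	void __msan_example_hook(void *addr, unsigned long size);

	void __msan_example_hook(void *addr, unsigned long size)
	{
		/* instrumentation work would go here */
	}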
2023-02-03  Documentation/mm: update references to __m[un]lock_page() to *_folio()  (Lorenzo Stoakes, 1 file, -15/+15)
We now pass folios to these functions, so update the documentation accordingly. Additionally, correct the outdated reference to __pagevec_lru_add_fn(), the referenced action occurs in __munlock_folio() directly now, replace reference to lru_cache_add_inactive_or_unevictable() with the modern folio equivalent folio_add_lru_vma() and reference folio flags by the flag name rather than accessor. Link: https://lkml.kernel.org/r/898c487169d98a7f09c1c1e57a7dfdc2b3f6bf0f.1673526881.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <brauner@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mm: mlock: update the interface to use folios  (Lorenzo Stoakes, 6 files, -45/+49)
Update the mlock interface to accept folios rather than pages, bringing the interface in line with the internal implementation. munlock_vma_page() still requires a page_folio() conversion, however this is consistent with the existent mlock_vma_page() implementation and a product of rmap still dealing in pages rather than folios. Link: https://lkml.kernel.org/r/cba12777c5544305014bc0cbec56bb4cc71477d8.1673526881.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <brauner@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
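The resulting shape of the interface, sketched with illustrative signatures (the real prototypes may carry extra arguments):

	void mlock_folio(struct folio *folio);
	void munlock_folio(struct folio *folio);
	void munlock_vma_folio(struct folio *folio, struct vm_area_struct *vma);

	/* rmap still hands us pages, so this one keeps a page_folio() conversion. */
	static inline void munlock_vma_page(struct page *page,
			struct vm_area_struct *vma)
	{
		munlock_vma_folio(page_folio(page), vma);
	}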
2023-02-03  m68k/mm/motorola: specify pmd_page() type  (Lorenzo Stoakes, 1 file, -1/+1)
Failing to specify a specific type here breaks anything that relies on the type being explicitly known, such as page_folio(). Make explicit the type of null pointer returned here. Link: https://lkml.kernel.org/r/ad6be2821bbd6af10966b3704568ff458b270d9c.1673526881.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <brauner@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mm: mlock: use folios and a folio batch internally  (Lorenzo Stoakes, 1 file, -122/+124)
This brings mlock in line with the folio batches declared in mm/swap.c and makes the code more consistent across the two. The existing mechanism for identifying which operation each folio in the batch is undergoing is maintained, i.e. using the lower 2 bits of the struct folio address (previously struct page address). This should continue to function correctly as folios remain at least system word-aligned. All invocations of mlock() pass either a non-compound page or the head of a THP-compound page and no tail pages need updating so this functionality works with struct folios being used internally rather than struct pages. In this patch the external interface is kept identical to before in order to maintain separation between patches in the series, using a rather awkward conversion from struct page to struct folio in relevant functions. However, this maintenance of the existing interface is intended to be temporary - the next patch in the series will update the interfaces to accept folios directly. Link: https://lkml.kernel.org/r/9f894d54d568773f4ed3cb0eef5f8932f62c95f4.1673526881.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <brauner@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
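Encoding the operation in the low bits of the folio pointer looks roughly like the sketch below; flag names, values, and the decoding helper are illustrative, not necessarily what mm/mlock.c uses.

	/* Folios are at least word-aligned, so bits 0-1 of the pointer are free. */
	#define LRU_FOLIO	0x1
	#define NEW_FOLIO	0x2
	#define FOLIO_FLAGS	(LRU_FOLIO | NEW_FOLIO)

	static inline struct folio *mlock_lru(struct folio *folio)
	{
		return (struct folio *)((unsigned long)folio + LRU_FOLIO);
	}

	static inline struct folio *mlock_new(struct folio *folio)
	{
		return (struct folio *)((unsigned long)folio + NEW_FOLIO);
	}

	/* Hypothetical decoding helper: strip the flag bits to recover the folio. */
	static inline struct folio *folio_from_batch_entry(struct folio *entry)
	{
		return (struct folio *)((unsigned long)entry & ~FOLIO_FLAGS);
	}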
2023-02-03  mm: pagevec: add folio_batch_reinit()  (Lorenzo Stoakes, 1 file, -0/+5)
Patch series "update mlock to use folios", v4. This series updates mlock to use folios, converting the internal interface to using folios exclusively and exposing the folio interface externally. As a product of this we move to using a folio batch rather than a pagevec for mlock folios, which brings it in line with the core folio batches contained in mm/swap.c. This patch (of 5): This performs the same task as pagevec_reinit(), only modifying a folio batch rather than a pagevec. Link: https://lkml.kernel.org/r/cover.1673526881.git.lstoakes@gmail.com Link: https://lkml.kernel.org/r/9018cecacb39e34c883540f997f9be8281153613.1673526881.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christian Brauner <brauner@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
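The helper itself is tiny, most likely just the folio counterpart of pagevec_reinit() (sketch):

	static inline void folio_batch_reinit(struct folio_batch *fbatch)
	{
		fbatch->nr = 0;
	}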
2023-02-03  mm: madvise: use vm_normal_folio() in madvise_free_pte_range()  (Kefeng Wang, 1 file, -4/+2)
There is already a vm_normal_folio(), use it to make madvise_free_pte_range() only use a folio. Link: https://lkml.kernel.org/r/20230112124028.16964-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  shmem: convert shmem_write_end() to use a folio  (Matthew Wilcox (Oracle), 1 file, -20/+10)
Use a folio internally to shmem_write_end() which saves a number of calls to compound_head() and lets us get rid of the custom code to zero out the rest of a THP and supports folios of arbitrary size. Link: https://lkml.kernel.org/r/20230112131031.1209553-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mm/memory-failure: convert unpoison_memory() to folios  (Sidhartha Kumar, 1 file, -11/+9)
Use a folio inside unpoison_memory which replaces a compound_head() call with a call to page_folio(). Link: https://lkml.kernel.org/r/20230112204608.80136-9-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mm/memory-failure: convert hugetlb_set_page_hwpoison() to folios  (Sidhartha Kumar, 1 file, -4/+3)
Change hugetlb_set_page_hwpoison() to folio_set_hugetlb_hwpoison() and use a folio internally. Link: https://lkml.kernel.org/r/20230112204608.80136-8-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mm/memory-failure: convert __free_raw_hwp_pages() to folios  (Sidhartha Kumar, 1 file, -4/+3)
Change __free_raw_hwp_pages() to __folio_free_raw_hwp() and modify its callers to pass in a folio. Link: https://lkml.kernel.org/r/20230112204608.80136-7-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mm/memory-failure: convert raw_hwp_list_head() to folios  (Sidhartha Kumar, 1 file, -7/+9)
Change raw_hwp_list_head() to take in a folio and modify its callers to pass in a folio. Also converts two users of hugetlb-specific page macros to their folio equivalents. Link: https://lkml.kernel.org/r/20230112204608.80136-6-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mm/memory-failure: convert free_raw_hwp_pages() to folios  (Sidhartha Kumar, 1 file, -10/+12)
Change free_raw_hwp_pages() to folio_free_raw_hwp(), converting two users of hugetlb-specific page macros to their folio equivalents. Link: https://lkml.kernel.org/r/20230112204608.80136-5-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mm/memory-failure: convert hugetlb_clear_page_hwpoison to folios  (Sidhartha Kumar, 3 files, -8/+8)
Change hugetlb_clear_page_hwpoison() to folio_clear_hugetlb_hwpoison() by changing the function to take in a folio. This converts one use of ClearPageHWPoison and HPageRawHwpUnreliable to their folio equivalents. Link: https://lkml.kernel.org/r/20230112204608.80136-4-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mm/memory-failure: convert try_memory_failure_hugetlb() to folios  (Sidhartha Kumar, 1 file, -13/+13)
Use a struct folio rather than a head page in try_memory_failure_hugetlb. This converts one user of SetHPageMigratable to the folio equivalent. Link: https://lkml.kernel.org/r/20230112204608.80136-3-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mm/memory-failure: convert __get_huge_page_for_hwpoison() to folios  (Sidhartha Kumar, 1 file, -10/+10)
Patch series "convert hugepage memory failure functions to folios". This series contains a 1:1 straightforward page to folio conversion for memory failure functions which deal with huge pages. I renamed a few functions to fit with how other folio operating functions are named. These include: hugetlb_clear_page_hwpoison -> folio_clear_hugetlb_hwpoison free_raw_hwp_pages -> folio_free_raw_hwp __free_raw_hwp_pages -> __folio_free_raw_hwp hugetlb_set_page_hwpoison -> folio_set_hugetlb_hwpoison The goal of this series was to reduce users of the hugetlb specific page flag macros which take in a page so users are protected by the compiler to make sure they are operating on a head page. This patch (of 8): Use a folio throughout the function rather than using a head page. This also reduces the users of the page version of hugetlb specific page flags. Link: https://lkml.kernel.org/r/20230112204608.80136-2-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
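The conversions in this series mostly follow one pattern: resolve the folio once with page_folio() and use folio flag helpers from then on. A condensed, hedged sketch (function name suffixed to make clear it is not the real implementation; error codes are illustrative):

	static int hwpoison_hugetlb_check_sketch(struct page *page)
	{
		struct folio *folio = page_folio(page);

		if (!folio_test_hugetlb(folio))
			return -EINVAL;		/* illustrative return values */
		if (folio_test_hwpoison(folio))
			return -EHWPOISON;
		return folio_try_get(folio) ? 0 : -EBUSY;
	}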
2023-02-03  Revert "x86: kmsan: sync metadata pages on page fault"  (Alexander Potapenko, 1 file, -22/+1)
This reverts commit 3f1e2c7a9099c1ed32c67f12cdf432ba782cf51f. As noticed by Qun-Wei Lin, arch_sync_kernel_mappings() in arch/x86/mm/fault.c is only used with CONFIG_X86_32, whereas KMSAN is only supported on x86_64, where this code is not compiled. The patch in question dates back to the downstream KMSAN branch based on v5.8-rc5; it sneaked into upstream unnoticed in v6.1. Link: https://lkml.kernel.org/r/20230111101806.3236991-1-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Qun-Wei Lin <qun-wei.lin@mediatek.com> Link: https://github.com/google/kmsan/issues/91 Cc: Andy Lutomirski <luto@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Marco Elver <elver@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mm/mmap: fix comment of unmapped_area{_topdown}  (Vernon Yang, 1 file, -5/+5)
The low_limit of the unmapped area information is inclusive and the high_limit is not, so make the symbol '[' instead of '('. Also fix the 'hight_limit' misspelling by replacing it with 'high_limit'. Link: https://lkml.kernel.org/r/20230111132036.801404-1-vernon2gm@gmail.com Fixes: 3499a13168da ("mm/mmap: use maple tree for unmapped_area{_topdown}") Signed-off-by: Vernon Yang <vernon2gm@gmail.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  maple_tree: fix comment of mte_destroy_walk  (Vernon Yang, 1 file, -2/+2)
The maple tree parameter is named mt, so make the comment say mt instead of mn, and make the separator between the parameter name and its description ':' instead of '-'. Link: https://lkml.kernel.org/r/20230111135348.803181-1-vernon2gm@gmail.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Vernon Yang <vernon2gm@gmail.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mm: remove the hugetlb field from struct page  (Sidhartha Kumar, 1 file, -12/+0)
Patch series "Get rid of tail page fields". Continue the shrinkage of the struct page definition by getting rid of the 'first tail page' and 'second tail page' fields. I originally did this patch set before Hugh's rewrite of the subpages_mapcount, so it needed substantial updates; hope I didn't miss anything. This patch (of 28): commit dad6a5eb5556 ("mm,hugetlb: use folio fields in second tail page") added a transitional hugetlb field to struct page and struct folio to make room for another int in the first tail of a compound page. Hugetlb folio conversions have changed all page users of this field to use the fields within the folio so struct page no longer needs this hugetlb specific field. Link: https://lkml.kernel.org/r/20230111142915.1001531-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230111142915.1001531-29-willy@infradead.org Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mm: convert deferred_split_huge_page() to deferred_split_folio()  (Matthew Wilcox (Oracle), 4 files, -8/+7)
Now that both callers use a folio, pass the folio in and save a call to compound_head(). Link: https://lkml.kernel.org/r/20230111142915.1001531-28-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03  mm/huge_memory: convert get_deferred_split_queue() to take a folio  (Matthew Wilcox (Oracle), 1 file, -8/+10)
Removes a few calls to compound_head(). Link: https://lkml.kernel.org/r/20230111142915.1001531-27-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>