summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-05-06mm/madvise: introduce clear_young_dirty_ptes() batch helperLance Yang3-31/+55
Patch series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free", v10. This patchset adds support for lazyfreeing multi-size THP (mTHP) without needing to first split the large folio via split_folio(). However, we still need to split a large folio that is not fully mapped within the target range. If a large folio is locked or shared, or if we fail to split it, we just leave it in place and advance to the next PTE in the range. But note that the behavior is changed; previously, any failure of this sort would cause the entire operation to give up. As large folios become more common, sticking to the old way could result in wasted opportunities. Performance Testing =================== On an Intel I5 CPU, lazyfreeing a 1GiB VMA backed by PTE-mapped folios of the same size results in the following runtimes for madvise(MADV_FREE) in seconds (shorter is better): Folio Size | Old | New | Change ------------------------------------------ 4KiB | 0.590251 | 0.590259 | 0% 16KiB | 2.990447 | 0.185655 | -94% 32KiB | 2.547831 | 0.104870 | -95% 64KiB | 2.457796 | 0.052812 | -97% 128KiB | 2.281034 | 0.032777 | -99% 256KiB | 2.230387 | 0.017496 | -99% 512KiB | 2.189106 | 0.010781 | -99% 1024KiB | 2.183949 | 0.007753 | -99% 2048KiB | 0.002799 | 0.002804 | 0% This patch (of 4): This commit introduces clear_young_dirty_ptes() to replace mkold_ptes(). By doing so, we can use the same function for both use cases (madvise_pageout and madvise_free), and it also provides the flexibility to only clear the dirty flag in the future if needed. Link: https://lkml.kernel.org/r/20240418134435.6092-1-ioworker0@gmail.com Link: https://lkml.kernel.org/r/20240418134435.6092-2-ioworker0@gmail.com Signed-off-by: Lance Yang <ioworker0@gmail.com> Suggested-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Barry Song <21cnbao@gmail.com> Cc: Jeff Xie <xiehuan09@gmail.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Peter Xu <peterx@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm: swapfile: check usable swap device in __folio_throttle_swaprate()Kefeng Wang1-3/+10
Skip blk_cgroup_congested() if there is no usable swap device since no swapin/out will occur, Thereby avoid taking swap_lock. The difference is shown below from perf date of CoW pagefault, perf report -g -i perf.data.swapon | egrep "blk_cgroup_congested|__folio_throttle_swaprate" 1.01% 0.16% page_fault2_pro [kernel.kallsyms] [k] __folio_throttle_swaprate 0.83% 0.80% page_fault2_pro [kernel.kallsyms] [k] blk_cgroup_congested perf report -g -i perf.data.swapoff | egrep "blk_cgroup_congested|__folio_throttle_swaprate" 0.15% 0.15% page_fault2_pro [kernel.kallsyms] [k] __folio_throttle_swaprate Link: https://lkml.kernel.org/r/20240418135644.2736748-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/huge_memory: improve split_huge_page_to_list_to_order() return value ↵David Hildenbrand1-3/+11
documentation The documentation is wrong and relying on it almost resulted in BUGs in new callers: ever since fd4a7ac32918 ("mm: migrate: try again if THP split is failed due to page refcnt") we return -EAGAIN on unexpected folio references, not -EBUSY. Let's fix that and also document which other return values we can currently see and why they could happen. [david@redhat.com: v2] Link: https://lkml.kernel.org/r/20240422194217.442933-1-david@redhat.com Link: https://lkml.kernel.org/r/20240418151834.216557-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/page_table_check: support userfault wr-protect entriesPeter Xu3-18/+39
Allow page_table_check hooks to check over userfaultfd wr-protect criteria upon pgtable updates. The rule is no co-existance allowed for any writable flag against userfault wr-protect flag. This should be better than c2da319c2e, where we used to only sanitize such issues during a pgtable walk, but when hitting such issue we don't have a good chance to know where does that writable bit came from [1], so that even the pgtable walk exposes a kernel bug (which is still helpful on triaging) but not easy to track and debug. Now we switch to track the source. It's much easier too with the recent introduction of page table check. There are some limitations with using the page table check here for userfaultfd wr-protect purpose: - It is only enabled with explicit enablement of page table check configs and/or boot parameters, but should be good enough to track at least syzbot issues, as syzbot should enable PAGE_TABLE_CHECK[_ENFORCED] for x86 [1]. We used to have DEBUG_VM but it's now off for most distros, while distros also normally not enable PAGE_TABLE_CHECK[_ENFORCED], which is similar. - It conditionally works with the ptep_modify_prot API. It will be bypassed when e.g. XEN PV is enabled, however still work for most of the rest scenarios, which should be the common cases so should be good enough. - Hugetlb check is a bit hairy, as the page table check cannot identify hugetlb pte or normal pte via trapping at set_pte_at(), because of the current design where hugetlb maps every layers to pte_t... For example, the default set_huge_pte_at() can invoke set_pte_at() directly and lose the hugetlb context, treating it the same as a normal pte_t. So far it's fine because we have huge_pte_uffd_wp() always equals to pte_uffd_wp() as long as supported (x86 only). It'll be a bigger problem when we'll define _PAGE_UFFD_WP differently at various pgtable levels, because then one huge_pte_uffd_wp() per-arch will stop making sense first.. as of now we can leave this for later too. This patch also removes commit c2da319c2e altogether, as we have something better now. [1] https://lore.kernel.org/all/000000000000dce0530615c89210@google.com/ Link: https://lkml.kernel.org/r/20240417212549.2766883-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/hugetlb: assert hugetlb_lock in __hugetlb_cgroup_commit_chargePeter Xu1-1/+1
This is similar to __hugetlb_cgroup_uncharge_folio() where it relies on holding hugetlb_lock. Add the similar assertion like the other one, since it looks like such things may help some day. Link: https://lkml.kernel.org/r/20240417211836.2742593-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06fs/proc/task_mmu: convert smaps_hugetlb_range() to work on foliosDavid Hildenbrand1-6/+7
Let's get rid of another page_mapcount() check and simply use folio_likely_mapped_shared(), which is precise for hugetlb folios. While at it, use huge_ptep_get() + pte_page() instead of ptep_get() + vm_normal_page(), just like we do in pagemap_hugetlb_range(). No functional change intended. Link: https://lkml.kernel.org/r/20240417092313.753919-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06fs/proc/task_mmu: convert pagemap_hugetlb_range() to work on foliosDavid Hildenbrand1-3/+4
Patch series "fs/proc/task_mmu: convert hugetlb functions to work on folis". Let's convert two more functions, getting rid of two more page_mapcount() calls. This patch (of 2): Let's get rid of another page_mapcount() check and simply use folio_likely_mapped_shared(), which is precise for hugetlb folios. While at it, also check for PMD table sharing, like we do in smaps_hugetlb_range(). No functional change intended, except that we would now detect hugetlb folios shared via PMD table sharing correctly. Link: https://lkml.kernel.org/r/20240417092313.753919-1-david@redhat.com Link: https://lkml.kernel.org/r/20240417092313.753919-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/sparse: guard the size of mem_section is power of 2Wei Yang1-0/+2
We usually have this check, while commit 2a3cb8baef71 ("mm/sparse: delete old sparse_init and enable new one") missed to take it. Link: https://lkml.kernel.org/r/20240416012559.4536-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06doc: split buffer.rst out of api-summary.rstMatthew Wilcox (Oracle)3-3/+13
Buffer heads are no longer a generic filesystem API but an optional filesystem support library. Make the documentation structure reflect that, and include the fine documentation kept in buffer_head.h. We could give a better overview of what buffer heads are all about, but my enthusiasm for documenting it is limited. [willy@infradead.org: fix kerneldoc warning] Link: https://lkml.kernel.org/r/20240417015933.453505-1-willy@infradead.org [akpm@linux-foundation.org: remove newline at EOF] Link: https://lkml.kernel.org/r/20240416031754.4076917-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06buffer: improve bdev_getblk documentationMatthew Wilcox (Oracle)1-0/+5
Add some more information about the state of the buffer_head returned. Link: https://lkml.kernel.org/r/20240416031754.4076917-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06buffer: add kernel-doc for bforget() and __bforget()Matthew Wilcox (Oracle)2-3/+16
Distinguish these functions from brelse() and __brelse(). Link: https://lkml.kernel.org/r/20240416031754.4076917-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06buffer: add kernel-doc for brelse() and __brelse()Matthew Wilcox (Oracle)2-9/+24
Move the documentation for __brelse() to brelse(), format it as kernel-doc and update it from talking about pages to folios. Link: https://lkml.kernel.org/r/20240416031754.4076917-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06buffer: fix __bread and __bread_gfp kernel-docMatthew Wilcox (Oracle)2-22/+35
The extra indentation confused the kernel-doc parser, so remove it. Fix some other wording while I'm here, and advise the user they need to call brelse() on this buffer. __bread_gfp() isn't used directly by filesystems, but the other wrappers for it don't have documentation, so document it accordingly. Link: https://lkml.kernel.org/r/20240416031754.4076917-5-willy@infradead.org Co-developed-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06buffer: add kernel-doc for try_to_free_buffers()Matthew Wilcox (Oracle)1-20/+24
The documentation for this function has become separated from it over time; move it to the right place and turn it into kernel-doc. Mild editing of the content to make it more about what the function does, and less about how it does it. Link: https://lkml.kernel.org/r/20240416031754.4076917-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06buffer: add kernel-doc for block_dirty_folio()Matthew Wilcox (Oracle)1-24/+31
Turn the excellent documentation for this function into kernel-doc. Replace 'page' with 'folio' and make a few other minor updates. Link: https://lkml.kernel.org/r/20240416031754.4076917-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06doc: improve the description of __folio_mark_dirtyMatthew Wilcox (Oracle)1-5/+9
Patch series "Improve buffer head documentation", v3. Turn buffer head documentation into its own document, and make many general improvements to the docs. Obviously there is much more that could be done. Tested with make htmldocs. This patch (of 8): I've learned why it's safe to call __folio_mark_dirty() from mark_buffer_dirty() without holding the folio lock, so update the description to explain why. Link: https://lkml.kernel.org/r/20240416031754.4076917-1-willy@infradead.org Link: https://lkml.kernel.org/r/20240416031754.4076917-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Pankaj Raghav <p.raghav@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06xarray: inline xas_descend to improve performanceLong Li1-1/+2
The commit 63b1898fffcd ("XArray: Disallow sibling entries of nodes") modified the xas_descend function in such a way that it was no longer being compiled as an inline function, because it increased the size of xas_descend(), and the compiler no longer optimizes it as inline. This had a negative impact on performance, xas_descend is called frequently to traverse downwards in the xarray tree, making it a hot function. Inlining xas_descend has been shown to significantly improve performance by approximately 4.95% in the iozone write test. Machine: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz #iozone i 0 -i 1 -s 64g -r 16m -f /test/tmptest Before this patch: kB reclen write rewrite read reread 67108864 16384 2230080 3637689 6315197 5496027 After this patch: kB reclen write rewrite read reread 67108864 16384 2340360 3666175 6272401 5460782 Percentage change: 4.95% 0.78% -0.68% -0.64% This patch introduces inlining to the xas_descend function. While this change increases the size of lib/xarray.o, the performance gains in critical workloads make this an acceptable trade-off. Size comparison before and after patch: .text .data .bss file 0x3502 0 0 lib/xarray.o.before 0x3602 0 0 lib/xarray.o.after Link: https://lkml.kernel.org/r/20240416061628.3768901-1-leo.lilong@huawei.com Signed-off-by: Long Li <leo.lilong@huawei.com> Cc: Hou Tao <houtao1@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: yangerkun <yangerkun@huawei.com> Cc: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/ksm: remove page_mapcount() usage in stable_tree_search()David Hildenbrand1-5/+8
We want to limit the use of page_mapcount() to the places where it is absolutely necessary. If our folio has a stable node, it is a (small) KSM folio -- see folio_stable_node(). Let's use folio_mapcount() in stable_tree_search() instead, which results in no functional change. The mapcount > 1 check is a bit confusing, because that's usually a check for page sharing. Looks like the reason is that we are guaranteed to not exceed ksm_max_page_sharing for the tree KSM folio when merging with that. Let's update the documentation to make that clearer. Link: https://lkml.kernel.org/r/20240416172533.663418-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alex Shi <alexs@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm: zswap: remove same_filled module paramsYosry Ahmed3-50/+0
These knobs offer more fine-grained control to userspace than needed and directly expose/influence kernel implementation; remove them. For disabling same_filled handling, there is no logical reason to refuse storing same-filled pages more efficiently and opt for compression. Scanning pages for patterns may be an argument, but the page contents will be read into the CPU cache anyway during compression. Also, removing the same_filled handling code does not move the needle significantly in terms of performance anyway [1]. For disabling non_same_filled handling, it was added when the compressed pages in zswap were not being properly charged to memcgs, as workloads could escape the accounting with compression [2]. This is no longer the case after commit f4840ccfca25 ("zswap: memcg accounting"), and using zswap without compression does not make much sense. [1]https://lore.kernel.org/lkml/CAJD7tkaySFP2hBQw4pnZHJJwe3bMdjJ1t9VC2VJd=khn1_TXvA@mail.gmail.com/ [2]https://lore.kernel.org/lkml/19d5cdee-2868-41bd-83d5-6da75d72e940@maciej.szmigiero.name/ [yosryahmed@google.com: remove same_filled_pages from docs] Link: https://lkml.kernel.org/r/ZhxFVggdyvCo79jc@google.com Link: https://lkml.kernel.org/r/20240413022407.785696-5-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Cc: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm: zswap: move more same-filled pages checks outside of zswap_store()Yosry Ahmed1-20/+25
Currently, zswap_store() checks zswap_same_filled_pages_enabled, kmaps the folio, then calls zswap_is_page_same_filled() to check the folio contents. Move this logic into zswap_is_page_same_filled() as well (and rename it to use 'folio' while we are at it). This makes zswap_store() cleaner, and makes following changes to that logic contained within the helper. While we are at it: - Rename the insert_entry label to store_entry to match xa_store(). - Add comment headers for same-filled functions and the main API functions (load, store, invalidate, swapon, swapoff). No functional change intended. Link: https://lkml.kernel.org/r/20240413022407.785696-4-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm: zswap: refactor limit checking from zswap_store()Yosry Ahmed1-16/+16
Refactor limit and acceptance threshold checking outside of zswap_store(). This code will be moved around in a following patch, so it would be cleaner to move a function call around. Link: https://lkml.kernel.org/r/20240413022407.785696-3-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm: zswap: always shrink in zswap_store() if zswap_pool_reached_fullYosry Ahmed1-6/+4
Patch series "zswap same-filled and limit checking cleanups", v3. Miscellaneous cleanups for limit checking and same-filled handling in the store path. This series was broken out of the "zswap: store zero-filled pages more efficiently" series [1]. It contains the cleanups and drops the main functional changes. [1]https://lore.kernel.org/lkml/20240325235018.2028408-1-yosryahmed@google.com/ This patch (of 4): The cleanup code in zswap_store() is not pretty, particularly the 'shrink' label at the bottom that ends up jumping between cleanup labels. Instead of having a dedicated label to shrink the pool, just use zswap_pool_reached_full directly to figure out if the pool needs shrinking. zswap_pool_reached_full should be true if and only if the pool needs shrinking. The only caveat is that the value of zswap_pool_reached_full may be changed by concurrent zswap_store() calls between checking the limit and testing zswap_pool_reached_full in the cleanup code. This is fine because: - If zswap_pool_reached_full was true during limit checking then became false during the cleanup code, then someone else already took care of shrinking the pool and there is no need to queue the worker. That would be a good change. - If zswap_pool_reached_full was false during limit checking then became true during the cleanup code, then someone else hit the limit meanwhile. In this case, both threads will try to queue the worker, but it never gets queued more than once anyway. Also, calling queue_work() multiple times when the limit is hit could already happen today, so this isn't a significant change in any way. Link: https://lkml.kernel.org/r/20240413022407.785696-1-yosryahmed@google.com Link: https://lkml.kernel.org/r/20240413022407.785696-2-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06userfaultfd: remove WRITE_ONCE when setting folio->index during UFFDIO_MOVESuren Baghdasaryan2-2/+2
When folio is moved with UFFDIO_MOVE it gets locked before the rmap and index are modified. Due to the folio lock being already held, WRITE_ONCE() is not needed when setting the folio index. Remove it. Link: https://lkml.kernel.org/r/20240415020821.1152951-1-surenb@google.com Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm: page_alloc: allowing mTHP compaction to capture the freed page directlyBaolin Wang1-2/+4
Currently, compaction_capture() does not allow lower-order allocations to directly capture the movable free pages, even though lower-order allocations might also be requesting movable pages, that can lead to more compaction scanning. And, with the enablement of mTHP, such situations will become more common. Thus allowing lower-order (mTHP) allocations of movable page types directly capture the movable free pages can avoid unnecessary compaction scanning, meanwhile that won't pollute the movable pageblock. With testing 1M mTHP compaction, it can be seen that compaction scanning is significantly reduced. mm-unstable patched Ops Compaction pages isolated 116598741.00 120946702.00 Ops Compaction migrate scanned 1764870054.00 1488621550.00 Ops Compaction free scanned 7707879039.00 4986299318.00 Ops Compact scan efficiency 22.90 29.85 Ops Compaction cost 73797.69 72933.48 Link: https://lkml.kernel.org/r/8118a5d66a034736a48433beddaca60ed78577c4.1712892329.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm: filemap: batch mm counter updating in filemap_map_pages()Kefeng Wang1-9/+12
Like copy_pte_range()/zap_pte_range(), make mm counter batch updating in filemap_map_pages(), since folios type are same(MM_SHMEMPAGES or MM_FILEPAGES) in filemap_map_pages(), only check the first folio type is enough, the 'lat_pagefault -P 1 file' test from lmbench shows 12% improvement, and the percpu_counter_add_batch() is gone from perf flame graph. Link: https://lkml.kernel.org/r/20240412064751.119015-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm: move mm counter updating out of set_pte_range()Kefeng Wang2-3/+9
Patch series "mm: batch mm counter updating in filemap_map_pages()", v3. Let's batch mm counter updating to accelerate filemap_map_pages(). This patch (of 2): In order to support batch mm counter updating in filemap_map_pages(), move mm counter updating out of set_pte_range(), the folios are file from filemap, and distinguish folios by vmf->flags and vma->vm_flags from another caller finish_fault(). Link: https://lkml.kernel.org/r/20240412064751.119015-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20240412064751.119015-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm: correct the docs for thp_fault_alloc and thp_fault_fallbackBarry Song1-2/+2
The documentation does not align with the code. In __do_huge_pmd_anonymous_page(), THP_FAULT_FALLBACK is incremented when mem_cgroup_charge() fails, despite the allocation succeeding, whereas THP_FAULT_ALLOC is only incremented after a successful charge. Link: https://lkml.kernel.org/r/20240412114858.407208-5-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Chris Li <chrisl@kernel.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm: add docs for per-order mTHP counters and transhuge_page ABIBarry Song2-0/+46
This patch includes documentation for mTHP counters and an ABI file for sys-kernel-mm-transparent-hugepage, which appears to have been missing for some time. [v-songbaohua@oppo.com: fix the name and unexpected indentation] Link: https://lkml.kernel.org/r/20240415054538.17071-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240412114858.407208-4-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Chris Li <chrisl@kernel.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm: add per-order mTHP anon_swpout and anon_swpout_fallback countersBarry Song4-0/+10
This helps to display the fragmentation situation of the swapfile, knowing the proportion of how much we haven't split large folios. So far, we only support non-split swapout for anon memory, with the possibility of expanding to shmem in the future. So, we add the "anon" prefix to the counter names. Link: https://lkml.kernel.org/r/20240412114858.407208-3-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Chris Li <chrisl@kernel.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback countersBarry Song3-0/+78
Patch series "mm: add per-order mTHP alloc and swpout counters", v6. The patchset introduces a framework to facilitate mTHP counters, starting with the allocation and swap-out counters. Currently, only four new nodes are appended to the stats directory for each mTHP size. /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats anon_fault_alloc anon_fault_fallback anon_fault_fallback_charge anon_swpout anon_swpout_fallback These nodes are crucial for us to monitor the fragmentation levels of both the buddy system and the swap partitions. In the future, we may consider adding additional nodes for further insights. This patch (of 4): Profiling a system blindly with mTHP has become challenging due to the lack of visibility into its operations. Presenting the success rate of mTHP allocations appears to be pressing need. Recently, I've been experiencing significant difficulty debugging performance improvements and regressions without these figures. It's crucial for us to understand the true effectiveness of mTHP in real-world scenarios, especially in systems with fragmented memory. This patch establishes the framework for per-order mTHP counters. It begins by introducing the anon_fault_alloc and anon_fault_fallback counters. Additionally, to maintain consistency with thp_fault_fallback_charge in /proc/vmstat, this patch also tracks anon_fault_fallback_charge when mem_cgroup_charge fails for mTHP. Incorporating additional counters should now be straightforward as well. Link: https://lkml.kernel.org/r/20240412114858.407208-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240412114858.407208-2-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Chris Li <chrisl@kernel.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/hugetlb: rename dissolve_free_huge_pages() to dissolve_free_hugetlb_folios()Sidhartha Kumar3-5/+5
dissolve_free_huge_pages() only uses folios internally, rename it to dissolve_free_hugetlb_folios() and change the comments which reference it. [akpm@linux-foundation.org: remove unneeded `extern'] Link: https://lkml.kernel.org/r/20240412182139.120871-2-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/hugetlb: convert dissolve_free_huge_pages() to foliosSidhartha Kumar3-15/+14
Allows us to rename dissolve_free_huge_pages() to dissolve_free_hugetlb_folio(). Convert one caller to pass in a folio directly and use page_folio() to convert the caller in mm/memory-failure. [sidhartha.kumar@oracle.com: remove unneeded `extern'] Link: https://lkml.kernel.org/r/71760ed4-e80d-493a-95ea-2545414b1aba@oracle.com [sidhartha.kumar@oracle.com: v2] Link: https://lkml.kernel.org/r/20240412182139.120871-1-sidhartha.kumar@oracle.com Link: https://lkml.kernel.org/r/20240411164756.261178-1-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/ksm: replace set_page_stable_node by folio_set_stable_nodeAlex Shi (tencent)1-9/+3
Only single page could be reached where we set stable node after write protect, so use folio converted func to replace page's. And remove the unused func set_page_stable_node(). Link: https://lkml.kernel.org/r/20240411061713.1847574-11-alexs@kernel.org Signed-off-by: Alex Shi (tencent) <alexs@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/ksm: rename get_ksm_page_flags to ksm_get_folio_flagsDavid Hildenbrand1-16/+16
As we are removing get_ksm_page_flags(), make the flags match the new function name. Link: https://lkml.kernel.org/r/20240411061713.1847574-10-alexs@kernel.org Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Alex Shi <alexs@kernel.org> Reviewed-by: Alex Shi <alexs@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Hugh Dickins <hughd@google.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/ksm: convert chain series funcs and replace get_ksm_pageAlex Shi (tencent)2-72/+66
In ksm stable tree all page are single, let's convert them to use and folios as well as stable_tree_insert/stable_tree_search funcs. And replace get_ksm_page() by ksm_get_folio() since there is no more needs. It could save a few compound_head calls. Link: https://lkml.kernel.org/r/20240411061713.1847574-9-alexs@kernel.org Signed-off-by: Alex Shi (tencent) <alexs@kernel.org> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/ksm: use folio in write_protect_pageAlex Shi (tencent)1-12/+13
Compound page is checked and skipped before write_protect_page() called, use folio to save a few compound_head checks. Link: https://lkml.kernel.org/r/20240411061713.1847574-8-alexs@kernel.org Signed-off-by: Alex Shi (tencent) <alexs@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/ksm: use ksm_get_folio in scan_get_next_rmap_itemAlex Shi (tencent)1-5/+5
Save a compound_head call. Link: https://lkml.kernel.org/r/20240411061713.1847574-7-alexs@kernel.org Signed-off-by: Alex Shi (tencent) <alexs@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/ksm: use folio in stable_node_dupAlex Shi (tencent)1-8/+8
Use ksm_get_folio() and save 2 compound_head calls. Link: https://lkml.kernel.org/r/20240411061713.1847574-6-alexs@kernel.org Signed-off-by: Alex Shi (tencent) <alexs@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/ksm: use folio in remove_stable_nodeAlex Shi (tencent)1-12/+12
Pages in stable tree are all single normal page, so uses ksm_get_folio() and folio_set_stable_node(), also saves 3 calls to compound_head(). Link: https://lkml.kernel.org/r/20240411061713.1847574-5-alexs@kernel.org Signed-off-by: Alex Shi (tencent) <alexs@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/ksm: add folio_set_stable_nodeAlex Shi (tencent)1-1/+7
Turn set_page_stable_node() into a wrapper folio_set_stable_node, and then use it to replace the former. we will merge them together after all place converted to folio. Link: https://lkml.kernel.org/r/20240411061713.1847574-4-alexs@kernel.org Signed-off-by: Alex Shi (tencent) <alexs@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/ksm: use folio in remove_rmap_item_from_treeAlex Shi (tencent)1-5/+5
To save 2 compound_head calls. Link: https://lkml.kernel.org/r/20240411061713.1847574-3-alexs@kernel.org Signed-off-by: Alex Shi (tencent) <alexs@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/ksm: add ksm_get_folioAlex Shi (tencent)1-17/+25
Patch series "transfer page to folio in KSM". This is the first part of page to folio transfer on KSM. Since only single page could be stored in KSM, we could safely transfer stable tree pages to folios. This patchset could reduce ksm.o 57kbytes from 2541776 bytes on latest akpm/mm-stable branch with CONFIG_DEBUG_VM enabled. It pass the KSM testing in LTP and kernel selftest. Thanks for Matthew Wilcox and David Hildenbrand's suggestions and comments! This patch (of 10): The ksm only contains single pages, so we could add a new func ksm_get_folio for get_ksm_page to use folio instead of pages to save a couple of compound_head calls. After all caller replaced, get_ksm_page will be removed. Link: https://lkml.kernel.org/r/20240411061713.1847574-1-alexs@kernel.org Link: https://lkml.kernel.org/r/20240411061713.1847574-2-alexs@kernel.org Signed-off-by: Alex Shi (tencent) <alexs@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06arm: mm: drop VM_FAULT_BADMAP/VM_FAULT_BADACCESSKefeng Wang1-15/+15
If bad map or access, directly set code to SEGV_MAPRR or SEGV_ACCERR, also set fault to 0 and goto error handling, which make us to drop the arch's special vm fault reason. [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/20240411130925.73281-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Aishwarya TCV <aishwarya.tcv@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Cristian Marussi <cristian.marussi@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06arm64: mm: drop VM_FAULT_BADMAP/VM_FAULT_BADACCESSKefeng Wang1-23/+20
Patch series "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS", v2. Directly set SEGV_MAPRR or SEGV_ACCERR for arm/arm64 to remove the last two arch's private vm_fault reasons. This patch (of 2): If bad map or access, directly set si_code to SEGV_MAPRR or SEGV_ACCERR, also set fault to 0 and goto error handling, which make us to drop the arch's special vm fault reason. Link: https://lkml.kernel.org/r/20240411130925.73281-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20240411130925.73281-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Aishwarya TCV <aishwarya.tcv@arm.com> Cc: Cristian Marussi <cristian.marussi@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06Documentation/admin-guide/cgroup-v1/memory.rst: don't reference page_mapcount()David Hildenbrand1-2/+2
Let's stop talking about page_mapcount(). Link: https://lkml.kernel.org/r/20240409192301.907377-19-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: Hugh Dickins <hughd@google.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Richard Chang <richardycc@google.com> Cc: Rich Felker <dalias@libc.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/debug: print only page mapcount (excluding folio entire mapcount) in ↵David Hildenbrand1-7/+2
__dump_folio() Let's simplify and only print the page mapcount: we already print the large folio mapcount and the entire folio mapcount for large folios separately; that should be sufficient to figure out what's happening. While at it, print the page mapcount also if it had an underflow, filtering out only typed pages. Link: https://lkml.kernel.org/r/20240409192301.907377-18-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: Hugh Dickins <hughd@google.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Richard Chang <richardycc@google.com> Cc: Rich Felker <dalias@libc.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06xtensa/mm: convert check_tlb_entry() to sanity check foliosDavid Hildenbrand1-5/+6
We want to limit the use of page_mapcount() to the places where it is absolutely necessary. So let's convert check_tlb_entry() to perform sanity checks on folios instead of pages. This essentially already happened: page_count() is mapped to folio_ref_count(), and page_mapped() to folio_mapped() internally. However, we would have printed the page_mapount(), which does not really match what page_mapped() would have checked. Let's simply print the folio mapcount to avoid using page_mapcount(). For small folios there is no change. Link: https://lkml.kernel.org/r/20240409192301.907377-17-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: Hugh Dickins <hughd@google.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Richard Chang <richardycc@google.com> Cc: Rich Felker <dalias@libc.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06trace/events/page_ref: trace the raw page mapcount valueDavid Hildenbrand1-2/+2
We want to limit the use of page_mapcount() to the places where it is absolutely necessary. We already trace raw page->refcount, raw page->flags and raw page->mapping, and don't involve any folios. Let's also trace the raw mapcount value that does not consider the entire mapcount of large folios, and we don't add "1" to it. When dealing with typed folios, this makes a lot more sense. ... and it's for debugging purposes only either way. Link: https://lkml.kernel.org/r/20240409192301.907377-16-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: Hugh Dickins <hughd@google.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Richard Chang <richardycc@google.com> Cc: Rich Felker <dalias@libc.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/migrate_device: use folio_mapcount() in migrate_vma_check_page()David Hildenbrand1-5/+7
We want to limit the use of page_mapcount() to the places where it is absolutely necessary. Let's convert migrate_vma_check_page() to work on a folio internally so we can remove the page_mapcount() usage. Note that we reject any large folios. There is a lot more folio conversion to be had, but that has to wait for another day. No functional change intended. Link: https://lkml.kernel.org/r/20240409192301.907377-15-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: Hugh Dickins <hughd@google.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Richard Chang <richardycc@google.com> Cc: Rich Felker <dalias@libc.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-06mm/filemap: use folio_mapcount() in filemap_unaccount_folio()David Hildenbrand1-1/+1
We want to limit the use of page_mapcount() to the places where it is absolutely necessary. Let's use folio_mapcount() instead of filemap_unaccount_folio(). No functional change intended, because we're only dealing with small folios. Link: https://lkml.kernel.org/r/20240409192301.907377-14-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: Hugh Dickins <hughd@google.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Richard Chang <richardycc@google.com> Cc: Rich Felker <dalias@libc.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>