summaryrefslogtreecommitdiff
path: root/mm
AgeCommit message (Collapse)AuthorFilesLines
2022-04-29mm: compaction: clean up comment about async compaction in isolate_migratepagesMiaohe Lin1-6/+6
Since commit 282722b0d258 ("mm, compaction: restrict async compaction to pageblocks of same migratetype"), async direct compaction is restricted to scan the pageblocks of same migratetype. Correct the comment accordingly. Link: https://lkml.kernel.org/r/20220418141253.24298-9-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Charan Teja Kalla <charante@codeaurora.org> Cc: David Hildenbrand <david@redhat.com> Cc: Pintu Kumar <pintu@codeaurora.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: compaction: use helper compound_nr in isolate_migratepages_blockMiaohe Lin1-1/+1
Use helper compound_nr to make use of compound_nr when CONFIG_64BIT and simplify the code a bit. Link: https://lkml.kernel.org/r/20220418141253.24298-8-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Charan Teja Kalla <charante@codeaurora.org> Cc: David Hildenbrand <david@redhat.com> Cc: Pintu Kumar <pintu@codeaurora.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: compaction: use COMPACT_CLUSTER_MAX in compaction.cMiaohe Lin1-4/+4
Always use COMPACT_CLUSTER_MAX here as we're doing the compaction. Minor improvements in readability. Link: https://lkml.kernel.org/r/20220418141253.24298-7-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Charan Teja Kalla <charante@codeaurora.org> Cc: David Hildenbrand <david@redhat.com> Cc: Pintu Kumar <pintu@codeaurora.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: compaction: clean up comment about suitable migration target recheckMiaohe Lin1-7/+1
checked_pageblock is already removed and suitable_migration_target is not rechecked under the zone lock since commit f8224aa5a0a4 ("mm, compaction: do not recheck suitable_migration_target under lock"). Correct the comment accordingly. Link: https://lkml.kernel.org/r/20220418141253.24298-6-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Charan Teja Kalla <charante@codeaurora.org> Cc: David Hildenbrand <david@redhat.com> Cc: Pintu Kumar <pintu@codeaurora.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: compaction: clean up comment for sched contentionMiaohe Lin2-8/+5
Since commit cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as contention"), async compaction won't abort when scheduling is needed. Correct the relevant comment accordingly. Link: https://lkml.kernel.org/r/20220418141253.24298-5-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Charan Teja Kalla <charante@codeaurora.org> Cc: David Hildenbrand <david@redhat.com> Cc: Pintu Kumar <pintu@codeaurora.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: compaction: remove unneeded assignment to isolate_start_pfnMiaohe Lin1-1/+1
isolate_start_pfn is unused when cc->nr_freepages ! = 0. Otherwise cc->free_pfn will overwrite it unconditionally. So we should remove this unneeded and somewhat misleading assignment. Link: https://lkml.kernel.org/r/20220418141253.24298-4-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Charan Teja Kalla <charante@codeaurora.org> Cc: David Hildenbrand <david@redhat.com> Cc: Pintu Kumar <pintu@codeaurora.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: compaction: remove unneeded pfn updateMiaohe Lin1-1/+0
pfn is unused in this do while loop. Remove the unneeded pfn update. Link: https://lkml.kernel.org/r/20220418141253.24298-3-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Charan Teja Kalla <charante@codeaurora.org> Cc: David Hildenbrand <david@redhat.com> Cc: Pintu Kumar <pintu@codeaurora.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: compaction: remove unneeded return value of kcompactd_runMiaohe Lin1-5/+2
Patch series "A few cleanup and fixup patches for compaction". This series contains a few patches to clean up some obsolete comment, remove unneeded return value and so on. Also we fix the possible NULL pointer dereference. More details can be found in the respective changelogs. This patch (of 12): The return value of kcompactd_run() is unused now. Clean it up. Link: https://lkml.kernel.org/r/20220418141253.24298-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220418141253.24298-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc; Mel Gorman <mgorman@techsingularity.net> Cc: David Hildenbrand <david@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Pintu Kumar <pintu@codeaurora.org> Cc: Charan Teja Kalla <charante@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/vmstat: add events for ksm cowYang Yang2-0/+7
Users may use ksm by calling madvise(, , MADV_MERGEABLE) when they want to save memory, it's a tradeoff by suffering delay on ksm cow. Users can get to know how much memory ksm saved by reading /sys/kernel/mm/ksm/pages_sharing, but they don't know what's the costs of ksm cow, and this is important of some delay sensitive tasks. So add ksm cow events to help users evaluate whether or how to use ksm. Also update Documentation/admin-guide/mm/ksm.rst with new added events. Link: https://lkml.kernel.org/r/20220331035616.2390805-1-yang.yang29@zte.com.cn Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: xu xin <xu.xin16@zte.com.cn> Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Saravanan D <saravanand@fb.com> Cc: Minchan Kim <minchan@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29ksm: count ksm merging pages for each processxu xin1-0/+8
Some applications or containers want to use KSM by calling madvise() to advise areas of address space to be MERGEABLE. But they may not know which applications are more likely to cause real merges in the deployment. If this patch is applied, it helps them know their corresponding number of merged pages, and then optimize their app code. As current KSM only counts the number of KSM merging pages(e.g. ksm_pages_sharing and ksm_pages_shared) of the whole system, we cannot see the more fine-grained KSM merging, for the upper application optimization, the merging area cannot be set easily according to the KSM page merging probability of each process. Therefore, it is necessary to add extra statistical means so that the upper level users can know the detailed KSM merging information of each process. We add a new proc file named as ksm_merging_pages under /proc/<pid>/ to indicate the involved ksm merging pages of this process. [akpm@linux-foundation.org: fix comment typo, remove BUG_ON()s] Link: https://lkml.kernel.org/r/20220325082318.2352853-1-xu.xin16@zte.com.cn Signed-off-by: xu xin <xu.xin16@zte.com.cn> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Reported-by: Zeal Robot <zealci@zte.com.cn> Cc: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Ohhoon Kwon <ohoono.kwon@samsung.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Feng Tang <feng.tang@intel.com> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn> Cc: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/page_alloc: reuse tail struct pages for compound devmapsJoao Martins1-1/+16
Currently memmap_init_zone_device() ends up initializing 32768 pages when it only needs to initialize 128 given tail page reuse. That number is worse with 1GB compound pages, 262144 instead of 128. Update memmap_init_zone_device() to skip redundant initialization, detailed below. When a pgmap @vmemmap_shift is set, all pages are mapped at a given huge page alignment and use compound pages to describe them as opposed to a struct per 4K. With @vmemmap_shift > 0 and when struct pages are stored in ram (!altmap) most tail pages are reused. Consequently, the amount of unique struct pages is a lot smaller than the total amount of struct pages being mapped. The altmap path is left alone since it does not support memory savings based on compound pages devmap. Link: https://lkml.kernel.org/r/20220420155310.9712-6-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/sparse-vmemmap: improve memory savings for compound devmapsJoao Martins2-10/+123
A compound devmap is a dev_pagemap with @vmemmap_shift > 0 and it means that pages are mapped at a given huge page alignment and utilize uses compound pages as opposed to order-0 pages. Take advantage of the fact that most tail pages look the same (except the first two) to minimize struct page overhead. Allocate a separate page for the vmemmap area which contains the head page and separate for the next 64 pages. The rest of the subsections then reuse this tail vmemmap page to initialize the rest of the tail pages. Sections are arch-dependent (e.g. on x86 it's 64M, 128M or 512M) and when initializing compound devmap with big enough @vmemmap_shift (e.g. 1G PUD) it may cross multiple sections. The vmemmap code needs to consult @pgmap so that multiple sections that all map the same tail data can refer back to the first copy of that data for a given gigantic page. On compound devmaps with 2M align, this mechanism lets 6 pages be saved out of the 8 necessary PFNs necessary to set the subsection's 512 struct pages being mapped. On a 1G compound devmap it saves 4094 pages. Altmap isn't supported yet, given various restrictions in altmap pfn allocator, thus fallback to the already in use vmemmap_populate(). It is worth noting that altmap for devmap mappings was there to relieve the pressure of inordinate amounts of memmap space to map terabytes of pmem. With compound pages the motivation for altmaps for pmem gets reduced. Link: https://lkml.kernel.org/r/20220420155310.9712-5-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/hugetlb_vmemmap: move comment block to Documentation/vmJoao Martins1-167/+1
In preparation for device-dax for using hugetlbfs compound page tail deduplication technique, move the comment block explanation into a common place in Documentation/vm. Link: https://lkml.kernel.org/r/20220420155310.9712-4-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Suggested-by: Dan Williams <dan.j.williams@intel.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/sparse-vmemmap: refactor core of vmemmap_populate_basepages() to helperJoao Martins1-17/+36
In preparation for describing a memmap with compound pages, move the actual pte population logic into a separate function vmemmap_populate_address() and have a new helper vmemmap_populate_range() walk through all base pages it needs to populate. While doing that, change the helper to use a pte_t* as return value, rather than an hardcoded errno of 0 or -ENOMEM. Link: https://lkml.kernel.org/r/20220420155310.9712-3-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/sparse-vmemmap: add a pgmap argument to section activationJoao Martins3-12/+20
Patch series "sparse-vmemmap: memory savings for compound devmaps (device-dax)", v9. This series minimizes 'struct page' overhead by pursuing a similar approach as Muchun Song series "Free some vmemmap pages of hugetlb page" (now merged since v5.14), but applied to devmap with @vmemmap_shift (device-dax). The vmemmap dedpulication original idea (already used in HugeTLB) is to reuse/deduplicate tail page vmemmap areas, particular the area which only describes tail pages. So a vmemmap page describes 64 struct pages, and the first page for a given ZONE_DEVICE vmemmap would contain the head page and 63 tail pages. The second vmemmap page would contain only tail pages, and that's what gets reused across the rest of the subsection/section. The bigger the page size, the bigger the savings (2M hpage -> save 6 vmemmap pages; 1G hpage -> save 4094 vmemmap pages). This is done for PMEM /specifically only/ on device-dax configured namespaces, not fsdax. In other words, a devmap with a @vmemmap_shift. In terms of savings, per 1Tb of memory, the struct page cost would go down with compound devmap: * with 2M pages we lose 4G instead of 16G (0.39% instead of 1.5% of total memory) * with 1G pages we lose 40MB instead of 16G (0.0014% instead of 1.5% of total memory) The series is mostly summed up by patch 4, and to summarize what the series does: Patches 1 - 3: Minor cleanups in preparation for patch 4. Move the very nice docs of hugetlb_vmemmap.c into a Documentation/vm/ entry. Patch 4: Patch 4 is the one that takes care of the struct page savings (also referred to here as tail-page/vmemmap deduplication). Much like Muchun series, we reuse the second PTE tail page vmemmap areas across a given @vmemmap_shift On important difference though, is that contrary to the hugetlbfs series, there's no vmemmap for the area because we are late-populating it as opposed to remapping a system-ram range. IOW no freeing of pages of already initialized vmemmap like the case for hugetlbfs, which greatly simplifies the logic (besides not being arch-specific). altmap case unchanged and still goes via the vmemmap_populate(). Also adjust the newly added docs to the device-dax case. [Note that device-dax is still a little behind HugeTLB in terms of savings. I have an additional simple patch that reuses the head vmemmap page too, as a follow-up. That will double the savings and namespaces initialization.] Patch 5: Initialize fewer struct pages depending on the page size with DRAM backed struct pages -- because fewer pages are unique and most tail pages (with bigger vmemmap_shift). NVDIMM namespace bootstrap improves from ~268-358 ms to ~80-110/<1ms on 128G NVDIMMs with 2M and 1G respectivally. And struct page needed capacity will be 3.8x / 1071x smaller for 2M and 1G respectivelly. Tested on x86 with 1.5Tb of pmem (including pinning, and RDMA registration/deregistration scalability with 2M MRs) This patch (of 5): In support of using compound pages for devmap mappings, plumb the pgmap down to the vmemmap_populate implementation. Note that while altmap is retrievable from pgmap the memory hotplug code passes altmap without pgmap[*], so both need to be independently plumbed. So in addition to @altmap, pass @pgmap to sparse section populate functions namely: sparse_add_section section_activate populate_section_memmap __populate_section_memmap Passing @pgmap allows __populate_section_memmap() to both fetch the vmemmap_shift in which memmap metadata is created for and also to let sparse-vmemmap fetch pgmap ranges to co-relate to a given section and pick whether to just reuse tail pages from past onlined sections. While at it, fix the kdoc for @altmap for sparse_add_section(). [*] https://lore.kernel.org/linux-mm/20210319092635.6214-1-osalvador@suse.de/ Link: https://lkml.kernel.org/r/20220420155310.9712-1-joao.m.martins@oracle.com Link: https://lkml.kernel.org/r/20220420155310.9712-2-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP*Muchun Song4-7/+7
The word of "free" is not expressive enough to express the feature of optimizing vmemmap pages associated with each HugeTLB, rename this keywork to "optimize". In this patch , cheanup configs to make code more expressive. Link: https://lkml.kernel.org/r/20220404074652.68024-4-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: hugetlb_vmemmap: cleanup hugetlb_free_vmemmap_enabled*Muchun Song2-6/+6
The word of "free" is not expressive enough to express the feature of optimizing vmemmap pages associated with each HugeTLB, rename this keywork to "optimize". In this patch , cheanup the static key and hugetlb_free_vmemmap_enabled() to make code more expressive. Link: https://lkml.kernel.org/r/20220404074652.68024-3-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: hugetlb_vmemmap: cleanup hugetlb_vmemmap related functionsMuchun Song3-37/+35
Patch series "cleanup hugetlb_vmemmap". The word of "free" is not expressive enough to express the feature of optimizing vmemmap pages associated with each HugeTLB, rename this keywork to "optimize" is more clear. In this series, cheanup related codes to make it more clear and expressive. This is suggested by David. This patch (of 3): The word of "free" is not expressive enough to express the feature of optimizing vmemmap pages associated with each HugeTLB, rename this keywork to "optimize". And some function names are prefixed with "huge_page" instead of "hugetlb", it is easily to be confused with THP. In this patch, cheanup related functions to make code more clear and expressive. Link: https://lkml.kernel.org/r/20220404074652.68024-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220404074652.68024-2-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/page_alloc.c: calc the right pfn if page size is not 4KMa Wupeng1-1/+1
Previous 0x100000 is used to check the 4G limit in find_zone_movable_pfns_for_nodes(). This is right in x86 because the page size can only be 4K. But 16K and 64K are available in arm64. So replace it with PHYS_PFN(SZ_4G). Link: https://lkml.kernel.org/r/20220414101314.1250667-8-mawupeng1@huawei.com Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/mremap: avoid unneeded do_munmap callMiaohe Lin1-2/+2
When old_len == new_len, do_munmap will return -EINVAL due to len == 0. This errno will be simply ignored because of old_len != new_len check. So it is unnecessary to call do_munmap when old_len == new_len because nothing is actually done. Link: https://lkml.kernel.org/r/20220401081023.37080-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/mremap: use helper mlock_future_check()Miaohe Lin1-8/+2
Use helper mlock_future_check() to check whether it's safe to resize the locked_vm to simplify the code. Minor readability improvement. Link: https://lkml.kernel.org/r/20220322112004.27380-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/mmap: drop arch_vm_get_page_pgprot()Anshuman Khandual1-3/+1
There are no platforms left which use arch_vm_get_page_prot(). Just drop generic arch_vm_get_page_prot(). Link: https://lkml.kernel.org/r/20220414062125.609297-8-anshuman.khandual@arm.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/mmap: drop arch_filter_pgprot()Anshuman Khandual2-14/+2
There are no platforms left which subscribe ARCH_HAS_FILTER_PGPROT. Hence drop generic arch_filter_pgprot() and also config ARCH_HAS_FILTER_PGPROT. Link: https://lkml.kernel.org/r/20220414062125.609297-7-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/mmap: add new config ARCH_HAS_VM_GET_PAGE_PROTAnshuman Khandual2-0/+5
Patch series "mm/mmap: Drop arch_vm_get_page_prot() and arch_filter_pgprot()", v7. protection_map[] is an array based construct that translates given vm_flags combination. This array contains page protection map, which is populated by the platform via [__S000 .. __S111] and [__P000 .. __P111] exported macros. Primary usage for protection_map[] is for vm_get_page_prot(), which is used to determine page protection value for a given vm_flags. vm_get_page_prot() implementation, could again call platform overrides arch_vm_get_page_prot() and arch_filter_pgprot(). Some platforms override protection_map[] that was originally built with __SXXX/__PXXX with different runtime values. Currently there are multiple layers of abstraction i.e __SXXX/__PXXX macros , protection_map[], arch_vm_get_page_prot() and arch_filter_pgprot() built between the platform and generic MM, finally defining vm_get_page_prot(). Hence this series proposes to drop later two abstraction levels and instead just move the responsibility of defining vm_get_page_prot() to the platform (still utilizing generic protection_map[] array) itself making it clean and simple. This first introduces ARCH_HAS_VM_GET_PAGE_PROT which enables the platforms to define custom vm_get_page_prot(). This starts converting platforms that define the overrides arch_filter_pgprot() or arch_vm_get_page_prot() which enables for those constructs to be dropped off completely. The series has been inspired from an earlier discuss with Christoph Hellwig https://lore.kernel.org/all/1632712920-8171-1-git-send-email-anshuman.khandual@arm.com/ This patch (of 7): Add a new config ARCH_HAS_VM_GET_PAGE_PROT, which when subscribed enables a given platform to define its own vm_get_page_prot() but still utilizing the generic protection_map[] array. Link: https://lkml.kernel.org/r/20220414062125.609297-1-anshuman.khandual@arm.com Link: https://lkml.kernel.org/r/20220414062125.609297-2-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/mmap.c: use helper mlock_future_check()Miaohe Lin1-9/+2
Use helper mlock_future_check() to check whether it's safe to enlarge the locked_vm to simplify the code. Minor readability improvement. Link: https://lkml.kernel.org/r/20220402032231.64974-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/mmap: clarify protection_map[] indicesAnshuman Khandual1-2/+16
protection_map[] maps vm_flags access combinations into page protection value as defined by the platform via __PXXX and __SXXX macros. The array indices in protection_map[], represents vm_flags access combinations but it's not very intuitive to derive. This makes it clear and explicit. Link: https://lkml.kernel.org/r/20220404031840.588321-3-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/debug_vm_pgtable: drop protection_map[] usageAnshuman Khandual1-12/+19
Patch series "mm: protection_map[] cleanups". This patch (of 2): Although protection_map[] contains the platform defined page protection map for a given vm_flags combination, vm_get_page_prot() is the right interface to use. This will also reduce dependency on protection_map[] which is going to be dropped off completely later on. Link: https://lkml.kernel.org/r/20220404031840.588321-1-anshuman.khandual@arm.com Link: https://lkml.kernel.org/r/20220404031840.588321-2-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/mmu_gather: limit free batch count and add schedule point in ↵Jianxing Wang1-2/+14
tlb_batch_pages_flush free a large list of pages maybe cause rcu_sched starved on non-preemptible kernels. howerver free_unref_page_list maybe can't cond_resched as it maybe called in interrupt or atomic context, especially can't detect atomic context in CONFIG_PREEMPTION=n. The issue is detected in guest with kvm cpu 200% overcommit, however I didn't see the warning in the host with the same application. I'm sure that the patch is needed for guest kernel, but no sure for host. To reproduce, set up two virtual machines in one host machine, per vm has the same number cpu and half memory of host. the run ltpstress.sh in per vm, then will see rcu stall warning.kernel is preempt disabled, append kernel command 'preempt=none' if enable dynamic preempt . It could detected in loongson machine(32 core, 128G mem) and ProLiant DL380 Gen9(x86 E5-2680, 28 core, 64G mem) tlb flush batch count depends on PAGE_SIZE, it's too large if PAGE_SIZE > 4K, here limit free batch count with 512. And add schedule point in tlb_batch_pages_flush. rcu: rcu_sched kthread starved for 5359 jiffies! g454793 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=19 [...] Call Trace: free_unref_page_list+0x19c/0x270 release_pages+0x3cc/0x498 tlb_flush_mmu_free+0x44/0x70 zap_pte_range+0x450/0x738 unmap_page_range+0x108/0x240 unmap_vmas+0x74/0xf0 unmap_region+0xb0/0x120 do_munmap+0x264/0x438 vm_munmap+0x58/0xa0 sys_munmap+0x10/0x20 syscall_common+0x24/0x38 Link: https://lkml.kernel.org/r/20220317072857.2635262-1-wangjianxing@loongson.cn Signed-off-by: Jianxing Wang <wangjianxing@loongson.cn> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/mmap.c: use mmap_assert_write_locked() instead of open coding itRolf Eike Beer1-2/+2
In case the lock is actually not held at this point. Link: https://lkml.kernel.org/r/5827758.TJ1SttVevJ@mobilepool36.emlix.com Signed-off-by: Rolf Eike Beer <eb@emlix.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: simplify follow_invalidate_pte()Muchun Song1-58/+23
The only user (DAX) of range and pmdpp parameters of follow_invalidate_pte() is gone, it is safe to remove them and make it static to simlify the code. This is revertant of the following commits: 097963959594 ("mm: add follow_pte_pmd()") a4d1a8852513 ("dax: update to new mmu_notifier semantic") There is only one caller of the follow_invalidate_pte(). So just fold it into follow_pte() and remove it. Link: https://lkml.kernel.org/r/20220403053957.10770-7-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: pvmw: add support for walking devmap pagesMuchun Song1-8/+9
The devmap pages can not use page_vma_mapped_walk() to check if a huge devmap page is mapped into a vma. Add support for walking huge devmap pages so that DAX can use it in the next patch. Link: https://lkml.kernel.org/r/20220403053957.10770-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: rmap: introduce pfn_mkclean_range() to cleans PTEsMuchun Song2-20/+71
The page_mkclean_one() is supposed to be used with the pfn that has a associated struct page, but not all the pfns (e.g. DAX) have a struct page. Introduce a new function pfn_mkclean_range() to cleans the PTEs (including PMDs) mapped with range of pfns which has no struct page associated with them. This helper will be used by DAX device in the next patch to make pfns clean. Link: https://lkml.kernel.org/r/20220403053957.10770-4-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: rmap: fix cache flush on THP pagesMuchun Song1-1/+2
Patch series "Fix some bugs related to ramp and dax", v7. Patch 1-2 fix a cache flush bug, because subsequent patches depend on those on those changes, there are placed in this series. Patch 3-4 are preparation for fixing a dax bug in patch 5. Patch 6 is code cleanup since the previous patch removes the usage of follow_invalidate_pte(). This patch (of 6): The flush_cache_page() only remove a PAGE_SIZE sized range from the cache. However, it does not cover the full pages in a THP except a head page. Replace it with flush_cache_range() to fix this issue. At least, no problems were found due to this. Maybe because the architectures that have virtual indexed caches is less. Link: https://lkml.kernel.org/r/20220403053957.10770-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220403053957.10770-2-songmuchun@bytedance.com Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alistair Popple <apopple@nvidia.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Hugh Dickins <hughd@google.com> Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/madvise: fix potential pte_unmap_unlock pte errorMiaohe Lin1-4/+4
We can't assume pte_offset_map_lock will return same orig_pte value. So it's necessary to reacquire the orig_pte or pte_unmap_unlock will unmap the stale pte. Link: https://lkml.kernel.org/r/20220416081416.23304-1-linmiaohe@huawei.com Fixes: 9c276cc65a58 ("mm: introduce MADV_COLD") Fixes: 854e9ed09ded ("mm: support madvise(MADV_FREE)") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: untangle config dependencies for demote-on-reclaimOscar Salvador2-7/+6
At the time demote-on-reclaim was introduced, it was tied to CONFIG_HOTPLUG_CPU + CONFIG_MIGRATE, but that is not really accurate. The only two things we need to depend on are CONFIG_NUMA + CONFIG_MIGRATE, so clean this up. Furthermore, we only register the hotplug memory notifier when the system has CONFIG_MEMORY_HOTPLUG. Link: https://lkml.kernel.org/r/20220322224016.4574-1-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Suggested-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Abhishek Goel <huntbag@linux.vnet.ibm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: migrate: simplify the refcount validation when migrating hugetlb mappingBaolin Wang1-5/+0
There is no need to validate the hugetlb page's refcount before trying to freeze the hugetlb page's expected refcount, instead we can just rely on the page_ref_freeze() to simplify the validation. Moreover we are always under the page lock when migrating the hugetlb page mapping, which means nowhere else can remove it from the page cache, so we can remove the xas_load() validation under the i_pages lock. Link: https://lkml.kernel.org/r/eb2fbbeaef2b1714097b9dec457426d682ee0635.1649676424.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/migration: fix possible do_pages_stat_array racing with memory offlineMiaohe Lin1-2/+7
When follow_page peeks a page, the page could be migrated and then be offlined while it's still being used by the do_pages_stat_array(). Use FOLL_GET to hold the page refcnt to fix this potential race. Link: https://lkml.kernel.org/r/20220318111709.60311-12-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/migration: fix potential invalid node access for reclaim-based migrationMiaohe Lin1-3/+3
If we failed to setup hotplug state callbacks for mm/demotion:online in some corner cases, node_demotion will be left uninitialized. Invalid node might be returned from the next_demotion_node() when doing reclaim-based migration. Use kcalloc to allocate node_demotion to fix the issue. Link: https://lkml.kernel.org/r/20220318111709.60311-11-linmiaohe@huawei.com Fixes: ac16ec835314 ("mm: migrate: support multiple target nodes demotion") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/migration: fix potential page refcounts leak in migrate_pagesMiaohe Lin1-0/+8
In -ENOMEM case, there might be some subpages of fail-to-migrate THPs left in thp_split_pages list. We should move them back to migration list so that they could be put back to the right list by the caller otherwise the page refcnt will be leaked here. Also adjust nr_failed and nr_thp_failed accordingly to make vm events account more accurate. Link: https://lkml.kernel.org/r/20220318111709.60311-10-linmiaohe@huawei.com Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Alistair Popple <apopple@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/migration: remove some duplicated codes in migrate_pagesMiaohe Lin1-24/+12
Remove the duplicated codes in migrate_pages to simplify the code. Minor readability improvement. No functional change intended. Link: https://lkml.kernel.org/r/20220318111709.60311-9-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/migration: avoid unneeded nodemask_t initializationMiaohe Lin1-2/+2
Avoid unneeded next_pass and this_pass initialization as they're always set before using to save possible cpu cycles when there are plenty of nodes in the system. Link: https://lkml.kernel.org/r/20220318111709.60311-8-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/migration: use helper macro min in do_pages_statMiaohe Lin1-6/+2
We could use helper macro min to help set the chunk_nr to simplify the code. Link: https://lkml.kernel.org/r/20220318111709.60311-7-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/migration: use helper function vma_lookup() in add_page_for_migrationMiaohe Lin1-2/+2
We could use helper function vma_lookup() to lookup the needed vma to simplify the code. Link: https://lkml.kernel.org/r/20220318111709.60311-6-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/migration: remove unneeded local variable page_lruMiaohe Lin1-3/+1
We can use page_is_file_lru() directly to help account the isolated pages to simplify the code a bit. Link: https://lkml.kernel.org/r/20220318111709.60311-4-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/migration: remove unneeded local variable mapping_lockedMiaohe Lin1-4/+2
Patch series "A few cleanup and fixup patches for migration", v2. This series contains a few patches to remove unneeded variables, jump label and use helper to simplify the code. Also we fix some bugs such as page refcounts leak , invalid node access and so on. More details can be found in the respective changelogs. This patch (of 11): When mapping_locked is true, TTU_RMAP_LOCKED is always set to ttu. We can check ttu instead so mapping_locked can be removed. And ttu is either 0 or TTU_RMAP_LOCKED now. Change '|=' to '=' to reflect this. Link: https://lkml.kernel.org/r/20220318111709.60311-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220318111709.60311-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Alistair Popple <apopple@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/mempolicy: clean up the code logic in queue_pages_pte_rangeMiaohe Lin1-9/+3
Since commit e5947d23edd8 ("mm: mempolicy: don't have to split pmd for huge zero page"), THP is never splited in queue_pages_pmd. Thus 2 is never returned now. We can remove such unnecessary ret != 2 check and clean up the relevant comment. Minor improvements in readability. Link: https://lkml.kernel.org/r/20220419122234.45083-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm: compaction: use helper isolation_suitable()Miaohe Lin1-1/+1
Use helper isolation_suitable() to check whether page is suitable to isolate to simplify the code. Minor readability improvement. Link: https://lkml.kernel.org/r/20220322110750.60311-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/z3fold: remove unneeded PAGE_HEADLESS check in free_handle()Miaohe Lin1-3/+0
The only caller z3fold_free() never calls free_handle() in PAGE_HEADLESS case. Remove this unneeded check. Link: https://lkml.kernel.org/r/20220308134311.59086-9-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/z3fold: remove redundant list_del_init of zhdr->buddy in z3fold_freeMiaohe Lin1-3/+0
do_compact_page() will do list_del_init(&zhdr->buddy) for us. Remove this extra one to save some possible cpu cycles. Link: https://lkml.kernel.org/r/20220308134311.59086-8-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/z3fold: move decrement of pool->pages_nr into __release_z3fold_page()Miaohe Lin1-29/+12
The z3fold will always do atomic64_dec(&pool->pages_nr) when the __release_z3fold_page() is called. Thus we can move decrement of pool->pages_nr into __release_z3fold_page() to simplify the code. Also we can reduce the size of z3fold.o ~1k. Without this patch: text data bss dec hex filename 15444 1376 8 16828 41bc mm/z3fold.o With this patch: text data bss dec hex filename 15044 1248 8 16300 3fac mm/z3fold.o Link: https://lkml.kernel.org/r/20220308134311.59086-7-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>