2022-09-26  Merge branch 'mm-hotfixes-stable' into mm-stable  (Andrew Morton, 23 files, -106/+192)

2022-09-26  x86/uaccess: avoid check_object_size() in copy_from_user_nmi()  (Kees Cook, 1 file, -1/+1)

The check_object_size() helper under CONFIG_HARDENED_USERCOPY is designed to skip any checks where the length is known at compile time, as a reasonable heuristic for "likely known-good" cases. However, it can only do this when the copy_*_user() helpers are themselves inline too.

When the destination is in vmap memory, check_object_size() can call find_vmap_area(), which requires taking a spinlock. If show_regs() is called in interrupt context, it will attempt a call to copy_from_user_nmi(), which may call check_object_size() and then find_vmap_area(). If something in normal context happens to be in the middle of calling find_vmap_area() (with the spinlock held), the interrupt handler will hang forever.

The copy_from_user_nmi() call is actually made with a fixed-size length, so check_object_size() should never have been called in the first place. Given the narrow constraints, just replace the __copy_from_user_inatomic() call with an open-coded version that calls only into the sanitizers and not check_object_size(), followed by a call to raw_copy_from_user().

[akpm@linux-foundation.org: no instrument_copy_from_user() in my tree...]
Link: https://lkml.kernel.org/r/20220919201648.2250764-1-keescook@chromium.org
Link: https://lore.kernel.org/all/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com
Fixes: 0aef499f3172 ("mm/usercopy: Detect vmalloc overruns")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Yu Zhao <yuzhao@google.com>
Reported-by: Florian Lehner <dev@der-flo.net>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Florian Lehner <dev@der-flo.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

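A minimal sketch of the open-coded replacement described above. The instrumentation hook names are an assumption (the [akpm ...] note above indicates the exact spelling differs between trees):

    /*
     * Open-coded __copy_from_user_inatomic(): call only the sanitizer
     * instrumentation hooks, never check_object_size(), then do the raw
     * copy. Hook names are an assumption, not the exact upstream diff.
     */
    instrument_copy_from_user_before(to, from, n);
    ret = raw_copy_from_user(to, from, n);
    instrument_copy_from_user_after(to, from, n, ret);

Because the length here is a compile-time constant, skipping check_object_size() loses no hardening coverage while keeping spinlocks out of NMI/interrupt context.
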
2022-09-26  mm/page_isolation: fix isolate_single_pageblock() isolation behavior  (Zi Yan, 1 file, -11/+14)

set_migratetype_isolate() does not allow isolating MIGRATE_CMA pageblocks unless it is used for CMA allocation. isolate_single_pageblock() did not have the same behavior when used together with set_migratetype_isolate() in start_isolate_page_range(). This allowed alloc_contig_range() with a migratetype other than MIGRATE_CMA, like MIGRATE_MOVABLE (used by alloc_contig_pages()), to isolate the first and last pageblocks but fail the rest. The failure changes the migratetype of the first and last pageblocks from MIGRATE_CMA to MIGRATE_MOVABLE, corrupting the CMA region. This can happen during gigantic page allocations.

As Doug said at https://lore.kernel.org/linux-mm/a3363a52-883b-dcd1-b77f-f2bb378d6f2d@gmail.com/T/#u, for gigantic page allocations the user would notice no difference, since allocation in the CMA region fails just as it did before. But it might hurt the performance of device drivers that use CMA, since the usable CMA region size decreases.

Fix it by passing the migratetype into isolate_single_pageblock(), so that the set_migratetype_isolate() it uses will prevent the isolation from happening.

Link: https://lkml.kernel.org/r/20220914023913.1855924-1-zi.yan@sent.com
Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reported-by: Doug Berger <opendmb@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Doug Berger <opendmb@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-26  mm,hwpoison: check mm when killing accessing process  (Shuai Xue, 1 file, -0/+3)

The GHES code calls memory_failure_queue() from IRQ context to queue work into a workqueue and schedule it on the current CPU. The work is then processed in memory_failure_work_func() by a kworker, which calls memory_failure(). When a page is already poisoned, commit a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address") makes memory_failure() call kill_accessing_process(), which:

 - holds the mmap lock of current->mm
 - walks the page tables to find the error virtual address
 - sends SIGBUS to the current process with the error info

However, the mm of a kworker is not valid, resulting in a null-pointer dereference. So check mm when killing the accessing process.

[akpm@linux-foundation.org: remove unrelated whitespace alteration]
Link: https://lkml.kernel.org/r/20220914064935.7851-1-xueshuai@linux.alibaba.com
Fixes: a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Bixuan Cui <cuibixuan@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-26  mm/hugetlb: correct demote page offset logic  (Doug Berger, 1 file, -6/+8)

With gigantic pages it may not be true that struct page structures are contiguous across the entire gigantic page. The nth_page macro is used here in place of direct pointer arithmetic to correct for this.

Mike said:

: This error could cause addressing exceptions.  However, this is only
: possible in configurations where CONFIG_SPARSEMEM &&
: !CONFIG_SPARSEMEM_VMEMMAP.  Such a configuration option is rare and
: unknown to be the default anywhere.

Link: https://lkml.kernel.org/r/20220914190917.3517663-1-opendmb@gmail.com
Fixes: 8531fc6f52f5 ("hugetlb: add hugetlb demote page support")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-26  mm: prevent page_frag_alloc() from corrupting the memory  (Maurizio Lombardi, 1 file, -0/+12)

A number of drivers call page_frag_alloc() with a fragment size larger than PAGE_SIZE. In low memory conditions, __page_frag_cache_refill() may fail the order-3 cache allocation and fall back to order 0; in this case, the cache will be smaller than the fragment, causing memory corruption.

Prevent this from happening by checking whether the newly allocated cache is large enough for the fragment; if not, the allocation fails and page_frag_alloc() returns NULL.

Link: https://lkml.kernel.org/r/20220715125013.247085-1-mlombard@redhat.com
Fixes: b63ae8ca096d ("mm/net: Rename and move page fragment handling from net/ to mm/")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Cc: Chen Lin <chen45464546@163.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

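A hedged sketch of the check described above, as it might look after the refill path in page_frag_alloc() (names follow struct page_frag_cache; this illustrates the idea, not necessarily the exact diff):

    /*
     * If __page_frag_cache_refill() fell back to a smaller order, the
     * cache may now be smaller than the requested fragment: fail the
     * allocation rather than hand out memory past the end of the page.
     */
    offset = size - fragsz;
    if (unlikely(offset < 0))
        return NULL;    /* fragment does not fit in the new cache */
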
2022-09-26  mm: bring back update_mmu_cache() to finish_fault()  (Sergei Antonov, 1 file, -4/+10)

Running this test program on ARMv4 a few times (sometimes just once) reproduces the bug (SIZE was left undefined in the original report; the value below is illustrative):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #define SIZE 0x1000

    int main()
    {
        unsigned i;
        char paragon[SIZE];
        void *ptr;

        memset(paragon, 0xAA, SIZE);
        ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_SHARED, -1, 0);
        if (ptr == MAP_FAILED)
            return 1;
        printf("ptr = %p\n", ptr);
        for (i = 0; i < 10000; i++) {
            memset(ptr, 0xAA, SIZE);
            if (memcmp(ptr, paragon, SIZE)) {
                printf("Unexpected bytes on iteration %u!!!\n", i);
                break;
            }
        }
        munmap(ptr, SIZE);
    }

In the "ptr" buffer there appear runs of zero bytes which are aligned to 16 and whose lengths are multiples of 16.

Linux v5.11 does not have the bug; "git bisect" finds the first bad commit: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths"). Before that commit, update_mmu_cache() was called during a call to filemap_map_pages() as well as in finish_fault(); after it, finish_fault() lacks the call.

Bring back update_mmu_cache() to finish_fault() to fix the bug. Also call update_mmu_tlb() only when returning VM_FAULT_NOPAGE, to more closely reproduce the code of the alloc_set_pte() function that existed before the commit.

On many platforms update_mmu_cache() is a no-op:

 - x86, see arch/x86/include/asm/pgtable
 - ARMv6+, see arch/arm/include/asm/tlbflush.h

So, it seems, few users ran into this bug.

Link: https://lkml.kernel.org/r/20220908204809.2012451-1-saproj@gmail.com
Fixes: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths")
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

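A rough sketch of the restored tail of finish_fault(), per the description above (a sketch under the assumption that the fault path re-checks the PTE under the PTE lock; not guaranteed to be the exact upstream diff):

    /* Re-check under the PTE lock before installing the page. */
    if (likely(pte_none(*vmf->pte))) {
        do_set_pte(vmf, page, vmf->address);
        /* no need to invalidate: a not-present page won't be cached */
        update_mmu_cache(vma, vmf->address, vmf->pte);
        ret = 0;
    } else {
        /* only here, mirroring old alloc_set_pte() behavior */
        update_mmu_tlb(vma, vmf->address, vmf->pte);
        ret = VM_FAULT_NOPAGE;
    }
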
2022-09-26  frontswap: don't call ->init if no ops are registered  (Christoph Hellwig, 1 file, -0/+3)

If no frontswap module (i.e. zswap) was registered, frontswap_ops will be NULL. In such a situation, swapon crashes with the following stack trace:

    Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000000
    Mem abort info:
      ESR = 0x0000000096000004
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      FSC = 0x04: level 0 translation fault
    Data abort info:
      ISV = 0, ISS = 0x00000004
      CM = 0, WnR = 0
    user pgtable: 4k pages, 48-bit VAs, pgdp=00000020a4fab000
    [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
    Internal error: Oops: 96000004 [#1] SMP
    Modules linked in: zram fsl_dpaa2_eth pcs_lynx phylink ahci_qoriq crct10dif_ce
      ghash_ce sbsa_gwdt fsl_mc_dpio nvme lm90 nvme_core at803x xhci_plat_hcd
      rtc_fsl_ftm_alarm xgmac_mdio ahci_platform i2c_imx ip6_tables ip_tables fuse
    Unloaded tainted modules: cppc_cpufreq():1
    CPU: 10 PID: 761 Comm: swapon Not tainted 6.0.0-rc2-00454-g22100432cf14 #1
    Hardware name: SolidRun Ltd. SolidRun CEX7 Platform, BIOS EDK II Jun 21 2022
    pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : frontswap_init+0x38/0x60
    lr : __do_sys_swapon+0x8a8/0x9f4
    sp : ffff80000969bcf0
    x29: ffff80000969bcf0 x28: ffff37bee0d8fc00 x27: ffff80000a7f5000
    x26: fffffcdefb971e80 x25: ffffaba797453b90 x24: 0000000000000064
    x23: ffff37c1f209d1a8 x22: ffff37bee880e000 x21: ffffaba797748560
    x20: ffff37bee0d8fce4 x19: ffffaba797748488 x18: 0000000000000014
    x17: 0000000030ec029a x16: ffffaba795a479b0 x15: 0000000000000000
    x14: 0000000000000000 x13: 0000000000000030 x12: 0000000000000001
    x11: ffff37c63c0aba18 x10: 0000000000000000 x9 : ffffaba7956b8c88
    x8 : ffff80000969bcd0 x7 : 0000000000000000 x6 : 0000000000000000
    x5 : 0000000000000001 x4 : 0000000000000000 x3 : ffffaba79730f000
    x2 : ffff37bee0d8fc00 x1 : 0000000000000000 x0 : 0000000000000000
    Call trace:
     frontswap_init+0x38/0x60
     __do_sys_swapon+0x8a8/0x9f4
     __arm64_sys_swapon+0x28/0x3c
     invoke_syscall+0x78/0x100
     el0_svc_common.constprop.0+0xd4/0xf4
     do_el0_svc+0x38/0x4c
     el0_svc+0x34/0x10c
     el0t_64_sync_handler+0x11c/0x150
     el0t_64_sync+0x190/0x194
    Code: d000e283 910003fd f9006c41 f946d461 (f9400021)
    ---[ end trace 0000000000000000 ]---

Link: https://lkml.kernel.org/r/20220909130829.3262926-1-hch@lst.de
Fixes: 1da0d94a3ec8 ("frontswap: remove support for multiple ops")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

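A minimal sketch of the guard this adds, assuming it lands in frontswap_init() on the swapon path (the exact placement is an assumption):

    /*
     * No backend (e.g. zswap) ever registered: frontswap_ops is NULL,
     * so there is no ->init to call for the new swap device.
     */
    if (!frontswap_ops)
        return;
    frontswap_ops->init(type);
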
2022-09-26  mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all()  (Naoya Horiguchi, 1 file, -4/+2)

A NULL pointer dereference is triggered when calling thp split via debugfs on a system with offlined memory blocks. With debug options enabled, the following kernel messages are printed:

    page:00000000467f4890 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x121c000
    flags: 0x17fffc00000000(node=0|zone=2|lastcpupid=0x1ffff)
    raw: 0017fffc00000000 0000000000000000 dead000000000122 0000000000000000
    raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
    page dumped because: unmovable page
    page:000000007d7ab72e is uninitialized and poisoned
    page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
    ------------[ cut here ]------------
    kernel BUG at include/linux/mm.h:1248!
    invalid opcode: 0000 [#1] PREEMPT SMP PTI
    CPU: 16 PID: 20964 Comm: bash Tainted: G            I        6.0.0-rc3-foll-numa+ #41
    ...
    RIP: 0010:split_huge_pages_write+0xcf4/0xe30

This shows that page_to_nid() in page_zone() is unexpectedly called for an offlined memmap. Use pfn_to_online_page() to get the struct page in the PFN walker.

Link: https://lkml.kernel.org/r/20220908041150.3430269-1-naoya.horiguchi@linux.dev
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319]
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org> [5.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

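A sketch of the PFN-walker pattern this switches to (variable names are illustrative):

    for (pfn = start_pfn; pfn < max_pfn; pfn++) {
        /*
         * pfn_to_online_page() returns NULL for offline sections,
         * whose memmap may be uninitialized/poisoned, instead of
         * blindly doing pfn_to_page().
         */
        struct page *page = pfn_to_online_page(pfn);

        if (!page)
            continue;
        /* ... consider this page for THP splitting ... */
    }
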
2022-09-26  mm: fix madivse_pageout mishandling on non-LRU page  (Minchan Kim, 1 file, -2/+5)

MADV_PAGEOUT tries to isolate non-LRU pages and gets the warning from isolate_lru_page() shown below. Fix it by checking PageLRU in advance.

    ------------[ cut here ]------------
    trying to isolate tail page
    WARNING: CPU: 0 PID: 6175 at mm/folio-compat.c:158 isolate_lru_page+0x130/0x140
    Modules linked in:
    CPU: 0 PID: 6175 Comm: syz-executor.0 Not tainted 5.18.12 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
    RIP: 0010:isolate_lru_page+0x130/0x140

Link: https://lore.kernel.org/linux-mm/485f8c33.2471b.182d5726afb.Coremail.hantianshuo@iie.ac.cn/
Link: https://lkml.kernel.org/r/20220908151204.762596-1-minchan@kernel.org
Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: 韩天ç`• <hantianshuo@iie.ac.cn>
Suggested-by: Yang Shi <shy828301@gmail.com>
Acked-by: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

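One plausible shape of the check on the MADV_PAGEOUT pte-walk path, per the description above (a sketch; the exact function and placement are assumptions):

    page = vm_normal_page(vma, addr, ptent);
    if (!page)
        continue;
    /*
     * Do not hand non-LRU pages (e.g. a THP tail page observed while
     * the mapping is changing) to isolate_lru_page(); it warns and
     * cannot isolate them anyway.
     */
    if (!PageLRU(page))
        continue;
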
2022-09-26  powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush  (Yang Shi, 1 file, -9/+0)

The IPI broadcast is used to serialize against fast-GUP, but fast-GUP will move to using RCU instead of disabling local interrupts. Using an IPI is the old-style way of serializing against fast-GUP, although it still works as expected now. And fast-GUP now fixes the potential race with THP collapse by checking whether the PMD has changed. So the IPI broadcast in the radix pmd collapse flush is no longer necessary. But it is still needed for the hash TLB.

Link: https://lkml.kernel.org/r/20220907180144.555485-2-shy828301@gmail.com
Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-26  mm: gup: fix the fast GUP race against THP collapse  (Yang Shi, 2 files, -10/+34)

Since general RCU GUP-fast was introduced in commit 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer sufficient to handle concurrent GUP-fast in all cases; it only handles traditional IPI-based GUP-fast correctly. On architectures that send an IPI broadcast on TLB flush, it works as expected. But on architectures that do not use an IPI to broadcast the TLB flush, the following race may occur:

         CPU A                                  CPU B
    THP collapse                           fast GUP
                                           gup_pmd_range() <-- see valid pmd
                                           gup_pte_range() <-- work on pte
    pmdp_collapse_flush() <-- clear pmd and flush
    __collapse_huge_page_isolate()
        check page pinned <-- before GUP bump refcount
                                           pin the page
                                           check PTE <-- no change
    __collapse_huge_page_copy()
        copy data to huge page
        ptep_clear()
    install huge pmd for the huge page
                                           return the stale page
    discard the stale page

The race can be fixed by checking whether the PMD has changed after taking the page pin in fast GUP, just like what is done for the PTE. If the PMD has changed, there may be a parallel THP collapse, so GUP should back off.

Also update the stale comment about serializing against fast GUP in khugepaged.

Link: https://lkml.kernel.org/r/20220907180144.555485-1-shy828301@gmail.com
Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()")
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

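A sketch of the re-check in gup_pte_range() after the pin is taken, per the description above (close to, though not guaranteed to be, the verbatim diff):

    /*
     * Re-read both the PMD and the PTE after pinning: if a parallel
     * THP collapse cleared the PMD underneath us, release the pin
     * and back off.
     */
    if (unlikely(pmd_val(pmd) != pmd_val(*pmdp)) ||
        unlikely(pte_val(pte) != pte_val(*ptep))) {
        gup_put_folio(folio, 1, flags);
        goto pte_unmap;
    }
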
2022-09-12  mm: fix PageAnonExclusive clearing racing with concurrent RCU GUP-fast  (David Hildenbrand, 6 files, -12/+85)

commit 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive") made sure that when PageAnonExclusive() has to be cleared during temporary unmapping of a page, the PTE is cleared/invalidated and the TLB is flushed.

What we want to achieve in all cases is that we cannot end up with a pin on an anonymous page that may be shared, because such pins would be unreliable and could result in memory corruptions when the mapped page and the pin go out of sync due to a write fault.

That TLB flush handling was inspired by an outdated comment in mm/ksm.c:write_protect_page(), which similarly required the TLB flush in the past to synchronize with GUP-fast. However, ever since general RCU GUP-fast was introduced in commit 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer sufficient to handle concurrent GUP-fast in all cases -- it only handles traditional IPI-based GUP-fast correctly. Peter Xu (thankfully) questioned whether that TLB flush is really required. On architectures that send an IPI broadcast on TLB flush, it works as expected. To synchronize with RCU GUP-fast properly, we're conceptually fine; however, we have to enforce a certain memory order and are missing memory barriers.

Let's document that, avoid the TLB flush where possible, and use proper explicit memory barriers where required. We shouldn't really care about the additional memory barriers here, as we're not on extremely hot paths -- and we're getting rid of some TLB flushes.

We use a smp_mb() pair for handling concurrent pinning and a smp_rmb()/smp_wmb() pair for handling the corner case of only temporary PTE changes but permanent PageAnonExclusive changes.

One extreme example, whereby GUP-fast takes an R/O pin and KSM wants to convert an exclusive anonymous page to a KSM page, and that page is already mapped write-protected (-> no PTE change), would be:

    Thread 0 (KSM)                 Thread 1 (GUP-fast)

                                   (B1) Read the PTE
                                   # (B2) skipped without FOLL_WRITE
    (A1) Clear PTE
    smp_mb()
    (A2) Check pinned
                                   (B3) Pin the mapped page
                                   smp_mb()
    (A3) Clear PageAnonExclusive
    smp_wmb()
    (A4) Restore PTE
                                   (B4) Check if the PTE changed
                                   smp_rmb()
                                   (B5) Check PageAnonExclusive

Thread 1 will properly detect that PageAnonExclusive was cleared and back off. Note that we don't need a memory barrier between checking if the page is pinned and clearing PageAnonExclusive, because stores are not speculated.

The possible issues due to reordering are of theoretical nature so far, and attempts to reproduce the race failed. Especially the "no PTE change" case isn't the common case, because we'd need an exclusive anonymous page that's mapped R/O and whose PTE is clean in KSM code -- and using KSM with page pinning isn't extremely common. Further, the clear+TLB flush we used until now implies a memory barrier. So the problematic missing part should be the missing memory barrier after pinning but before checking if the PTE changed.

Link: https://lkml.kernel.org/r/20220901083559.67446-1-david@redhat.com
Fixes: 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Christoph von Recklinghausen <crecklin@redhat.com>
Cc: Don Dutile <ddutile@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  mm/mremap_pages: save a few cycles in get_dev_pagemap()  (Christophe JAILLET, 1 file, -1/+1)

Use percpu_ref_tryget_live_rcu() instead of percpu_ref_tryget_live() to save a few cycles when it is known that the RCU lock is already held.

Link: https://lkml.kernel.org/r/9ef1562a1975371360f3e263856e9f1c5749b656.1662136782.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  mm: remove BUG_ON() in __isolate_free_page()  (Kefeng Wang, 1 file, -11/+3)

Drop an unneeded comment and blank line, adjust a variable, and, most importantly, delete the BUG_ON(). The page passed to __isolate_free_page() is always a buddy page, coming from compaction, page_isolation, or page_reporting, and the callers already check the return value. BUG_ON() is too drastic a measure, so remove it.

Link: https://lkml.kernel.org/r/20220901015043.189276-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  mm/kmemleak: make create_object return void  (Liu Shixin, 1 file, -12/+9)

No caller cares about the return value of create_object(), so make it return void.

Link: https://lkml.kernel.org/r/20220901023007.3471887-1-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  selftest: vm: remove deleted local_config.* from .gitignore  (Tarun Sahu, 1 file, -1/+0)

Commit d2d6cba5d6623 ("selftest: vm: remove orphaned references to local_config.{h,mk}") took care of removing the orphaned references. This commit removes local_config from .gitignore. The parent patch is commit 69007f156ba ("Kselftests: remove support of libhugetlbfs from kselftests").

Link: https://lkml.kernel.org/r/20220901092315.33619-1-tsahu@linux.ibm.com
Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  hugetlb: make hugetlb depends on SYSFS or SYSCTL  (Miaohe Lin, 1 file, -0/+1)

If CONFIG_SYSFS and CONFIG_SYSCTL are both undefined, hugetlb doesn't work, as there's no way to set the maximum number of huge pages. Make sure at least one of the above configs is defined so hugetlb works as expected.

Link: https://lkml.kernel.org/r/20220901120030.63318-11-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

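A sketch of the corresponding Kconfig change (attaching the dependency to the hugetlbfs option is an assumption about where the line lands):

    config HUGETLBFS
        bool "HugeTLB file system support"
        depends on SYSFS || SYSCTL
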
2022-09-12  hugetlb: remove meaningless BUG_ON(huge_pte_none())  (Miaohe Lin, 1 file, -1/+0)

By the time the code reaches this point, an invalid page would already have been accessed if the huge PTE were none. So this BUG_ON(huge_pte_none()) is meaningless; remove it.

Link: https://lkml.kernel.org/r/20220901120030.63318-10-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  hugetlb: add comment for subtle SetHPageVmemmapOptimized()  (Miaohe Lin, 1 file, -0/+4)

The SetHPageVmemmapOptimized() called here seems unnecessary, as the flag is assumed to be set when calling this function. But it is in fact cleared by the set_page_private(page, 0) above. Add a comment to avoid possible future confusion.

Link: https://lkml.kernel.org/r/20220901120030.63318-9-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  hugetlb: kill hugetlbfs_pagecache_page()  (Miaohe Lin, 1 file, -14/+1)

Fold hugetlbfs_pagecache_page() into its sole caller to remove some duplicated code. No functional change intended.

Link: https://lkml.kernel.org/r/20220901120030.63318-8-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  hugetlb: pass NULL to kobj_to_hstate() if nid is unused  (Miaohe Lin, 1 file, -4/+2)

We can pass NULL to kobj_to_hstate() directly when nid is unused, to simplify the code. No functional change intended.

Link: https://lkml.kernel.org/r/20220901120030.63318-7-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  hugetlb: use helper {huge_pte|pmd}_lock()  (Miaohe Lin, 1 file, -2/+1)

Use the helpers huge_pte_lock() and pmd_lock() to simplify the code. No functional change intended.

Link: https://lkml.kernel.org/r/20220901120030.63318-6-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  hugetlb: use sizeof() to get the array size  (Miaohe Lin, 1 file, -2/+2)

It's better to use sizeof() to get the array size instead of calculating it manually. Minor readability improvement.

Link: https://lkml.kernel.org/r/20220901120030.63318-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  hugetlb: use LIST_HEAD() to define a list head  (Miaohe Lin, 1 file, -5/+2)

Use LIST_HEAD() directly to define a list head, simplifying the code. No functional change intended.

Link: https://lkml.kernel.org/r/20220901120030.63318-4-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

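For illustration, the pattern this cleanup replaces (a sketch; the variable name is illustrative):

    /* Before: declare the head, then initialize it separately */
    struct list_head page_list;
    INIT_LIST_HEAD(&page_list);

    /* After: LIST_HEAD() declares and initializes in one statement */
    LIST_HEAD(page_list);
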
2022-09-12  hugetlb: Use helper macro SZ_1K  (Miaohe Lin, 1 file, -1/+1)

Use the helper macro SZ_1K for the size conversion to make the code more consistent in this file. Minor readability improvement.

Link: https://lkml.kernel.org/r/20220901120030.63318-3-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  hugetlb: make hugetlb_cma_check() static  (Miaohe Lin, 2 files, -5/+9)

Patch series "A few cleanup patches for hugetlb", v2.

This series contains a few cleanup patches that use helper functions to simplify the code, remove an unneeded nid parameter, and so on. More details can be found in the respective changelogs.

This patch (of 10):

Make hugetlb_cma_check() static, as it's only used inside mm/hugetlb.c.

Link: https://lkml.kernel.org/r/20220901120030.63318-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20220901120030.63318-2-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  fs/buffer: remove bh_submit_read() helper  (Zhang Yi, 2 files, -26/+0)

bh_submit_read() has no users anymore; just remove it.

Link: https://lkml.kernel.org/r/20220901133505.2510834-15-yi.zhang@huawei.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  ext2: replace bh_submit_read() helper with bh_read()  (Zhang Yi, 1 file, -3/+4)

bh_submit_read() and the uptodate-check logic in bh_uptodate_or_lock() have been integrated into the bh_read() helper, so switch to using it directly.

Link: https://lkml.kernel.org/r/20220901133505.2510834-14-yi.zhang@huawei.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  fs/buffer: remove ll_rw_block() helper  (Zhang Yi, 2 files, -60/+4)

Now that all ll_rw_block() users have been replaced with the new safe helpers, remove it.

Link: https://lkml.kernel.org/r/20220901133505.2510834-13-yi.zhang@huawei.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  ufs: replace ll_rw_block()  (Zhang Yi, 1 file, -8/+4)

ll_rw_block() is not safe for the sync read path because it cannot guarantee that read IO is submitted if the buffer is already locked. We could get a false positive EIO after wait_on_buffer() if the buffer has been locked by others. So stop using ll_rw_block() in ufs.

Link: https://lkml.kernel.org/r/20220901133505.2510834-12-yi.zhang@huawei.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  udf: replace ll_rw_block()  (Zhang Yi, 3 files, -9/+3)

ll_rw_block() is not safe for the sync read path because it cannot guarantee that read IO is submitted if the buffer is already locked. We could get a false positive EIO after wait_on_buffer() if the buffer has been locked by others. So stop using ll_rw_block(). We also switch to the new bh_readahead_batch() helper for the buffer array readahead path.

Link: https://lkml.kernel.org/r/20220901133505.2510834-11-yi.zhang@huawei.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  reiserfs: replace ll_rw_block()  (Zhang Yi, 3 files, -10/+9)

ll_rw_block() is not safe for the sync read/write path because it cannot guarantee that read/write IO is submitted if the buffer is already locked. We could get a false positive EIO after wait_on_buffer() in the read path if the buffer has been locked by others. So stop using ll_rw_block() in reiserfs. We also switch to the new bh_readahead_batch() helper for the buffer array readahead path.

Link: https://lkml.kernel.org/r/20220901133505.2510834-10-yi.zhang@huawei.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  ocfs2: replace ll_rw_block()  (Zhang Yi, 2 files, -4/+2)

ll_rw_block() is not safe for the sync read path because it cannot guarantee that read IO is submitted if the buffer is already locked. We could get a false positive EIO after wait_on_buffer() if the buffer has been locked by others. So stop using ll_rw_block() in ocfs2.

Link: https://lkml.kernel.org/r/20220901133505.2510834-9-yi.zhang@huawei.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  ntfs3: replace ll_rw_block()  (Zhang Yi, 1 file, -5/+2)

ll_rw_block() is not safe for the sync read path because it cannot guarantee that read IO is submitted if the buffer is already locked. We could get a false positive EIO after wait_on_buffer() if the buffer has been locked by others. So stop using ll_rw_block() in ntfs_get_block_vbo().

Link: https://lkml.kernel.org/r/20220901133505.2510834-8-yi.zhang@huawei.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  jbd2: replace ll_rw_block()  (Zhang Yi, 2 files, -15/+16)

ll_rw_block() is not safe for the sync read path because it cannot guarantee that read IO is submitted if the buffer is already locked. We could get a false positive EIO after wait_on_buffer() if the buffer has been locked by others. So stop using ll_rw_block() in journal_get_superblock(). We also switch to the new bh_readahead_batch() for the buffer array readahead path.

Link: https://lkml.kernel.org/r/20220901133505.2510834-7-yi.zhang@huawei.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  isofs: replace ll_rw_block()  (Zhang Yi, 1 file, -1/+1)

ll_rw_block() is not safe for the sync read path because it cannot guarantee that read IO is submitted if the buffer is already locked. We could get a false positive EIO return from zisofs_uncompress_block() if the buffer has been locked by others. So stop using ll_rw_block(); switch to the sync helper instead.

Link: https://lkml.kernel.org/r/20220901133505.2510834-6-yi.zhang@huawei.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  gfs2: replace ll_rw_block()  (Zhang Yi, 2 files, -11/+4)

ll_rw_block() is not safe for the sync read path because it cannot guarantee that read IO is always submitted if the buffer is locked, so stop using it. We also switch to the new bh_readahead() helper for the readahead path.

Link: https://lkml.kernel.org/r/20220901133505.2510834-5-yi.zhang@huawei.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  fs/buffer: replace ll_rw_block()  (Zhang Yi, 1 file, -7/+5)

ll_rw_block() is not safe for the sync IO path because it skips buffers that have been locked by others, which can lead to false positive EIO when submitting read IO. So stop using ll_rw_block(); switch to the new helpers, which guarantee that the buffer is locked and that IO is submitted if needed.

Link: https://lkml.kernel.org/r/20220901133505.2510834-4-yi.zhang@huawei.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  fs/buffer: add some new buffer read helpers  (Zhang Yi, 2 files, -0/+103)

The current ll_rw_block() helper is fragile because it assumes that a locked buffer means it is under IO submitted by whoever holds the lock, so it skips the buffer if it fails to get the lock; that is only safe on the readahead path. Unfortunately, many filesystems still mistakenly use this helper on the sync metadata read path. There is no guarantee that whoever holds the buffer lock always submits IO (e.g. buffer_migrate_folio_norefs(), after commit 88dbcbb3a484 ("blkdev: avoid migration stalls for blkdev pages")), which can lead to false positive -EIO when submitting read IO.

This patch adds some friendly buffer read helpers to prepare for replacing ll_rw_block() and similar calls. We can only call the bh_readahead_[] helpers for the readahead paths.

Link: https://lkml.kernel.org/r/20220901133505.2510834-3-yi.zhang@huawei.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

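A sketch of the shape of the new helpers, following the description above (simplified; the exact bodies and return conventions are approximate, not the verbatim upstream code):

    /*
     * Sync read: lock the buffer ourselves and submit the read if it is
     * not uptodate, so a buffer locked by a non-IO path can no longer
     * produce a false -EIO after wait_on_buffer().
     */
    static inline int bh_read(struct buffer_head *bh, blk_opf_t op_flags)
    {
        if (bh_uptodate_or_lock(bh))
            return 1;
        return __bh_read(bh, op_flags, true);   /* wait for completion */
    }

    /* Readahead: best effort only, skip buffers we cannot trylock. */
    static inline void bh_readahead(struct buffer_head *bh, blk_opf_t op_flags)
    {
        if (!buffer_uptodate(bh) && trylock_buffer(bh)) {
            if (buffer_uptodate(bh))
                unlock_buffer(bh);      /* raced: already read */
            else
                __bh_read(bh, op_flags, false); /* don't wait */
        }
    }
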
2022-09-12  fs/buffer: remove __breadahead_gfp()  (Zhang Yi, 2 files, -19/+0)

Patch series "fs/buffer: remove ll_rw_block()", v2.

ll_rw_block() will skip a locked buffer before submitting IO, assuming that a locked buffer is under IO. This assumption is not always true, because we cannot guarantee that every buffer-lock path submits IO. After commit 88dbcbb3a484 ("blkdev: avoid migration stalls for blkdev pages"), buffer_migrate_folio_norefs() became one exceptional case, and there may be others. So ll_rw_block() is not safe on the sync read path; we could get a false positive EIO return value when the filesystem is reading metadata. It seems it should only be used on the readahead path.

Unfortunately, many filesystems misuse ll_rw_block() on the sync read path. This patch set simply removes ll_rw_block() and adds new friendly helpers, which prevent false positive EIO on the read-metadata path. Thanks for the suggestion from Jan; the original discussion is at [1].

 patch 1: remove unused helpers in fs/buffer.c
 patch 2: add new bh_read_[*] helpers
 patch 3-11: remove all ll_rw_block() calls in filesystems
 patch 12-14: do some leftover cleanups

[1]. https://lore.kernel.org/linux-mm/20220825080146.2021641-1-chengzhihao1@huawei.com/

This patch (of 14):

No one uses __breadahead_gfp() and sb_breadahead_unmovable() any more; remove them.

Link: https://lkml.kernel.org/r/20220901133505.2510834-1-yi.zhang@huawei.com
Link: https://lkml.kernel.org/r/20220901133505.2510834-2-yi.zhang@huawei.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Heming Zhao <ocfs2-devel@oss.oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  mm/vmalloc: extend find_vmap_lowest_match_check with extra arguments  (Song Liu, 1 file, -6/+7)

find_vmap_lowest_match() is now able to handle different roots. With DEBUG_AUGMENT_LOWEST_MATCH_CHECK enabled as:

    : --- a/mm/vmalloc.c
    : +++ b/mm/vmalloc.c
    : @@ -713,7 +713,7 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
    :  /*** Global kva allocator ***/
    :
    : -#define DEBUG_AUGMENT_LOWEST_MATCH_CHECK 0
    : +#define DEBUG_AUGMENT_LOWEST_MATCH_CHECK 1

compilation failed as:

    mm/vmalloc.c: In function 'find_vmap_lowest_match_check':
    mm/vmalloc.c:1328:32: warning: passing argument 1 of 'find_vmap_lowest_match' makes pointer from integer without a cast [-Wint-conversion]
     1328 |   va_1 = find_vmap_lowest_match(size, align, vstart, false);
          |                                 ^~~~
          |                                 |
          |                                 long unsigned int
    mm/vmalloc.c:1236:40: note: expected 'struct rb_root *' but argument is of type 'long unsigned int'
     1236 | find_vmap_lowest_match(struct rb_root *root, unsigned long size,
          |                        ~~~~~~~~~~~~~~~~^~~~
    mm/vmalloc.c:1328:9: error: too few arguments to function 'find_vmap_lowest_match'
     1328 |   va_1 = find_vmap_lowest_match(size, align, vstart, false);
          |          ^~~~~~~~~~~~~~~~~~~~~~
    mm/vmalloc.c:1236:1: note: declared here
     1236 | find_vmap_lowest_match(struct rb_root *root, unsigned long size,
          | ^~~~~~~~~~~~~~~~~~~~~~

Extend find_vmap_lowest_match_check() and find_vmap_lowest_linear_match() with extra arguments to fix this.

Link: https://lkml.kernel.org/r/20220906060548.1127396-1-song@kernel.org
Link: https://lkml.kernel.org/r/20220831052734.3423079-1-song@kernel.org
Fixes: f9863be49312 ("mm/vmalloc: extend __alloc_vmap_area() with extra arguments")
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  mm/migrate_device.c: fix a misleading and outdated comment  (Alistair Popple, 1 file, -3/+10)

Commit ab09243aa95a ("mm/migrate.c: remove MIGRATE_PFN_LOCKED") changed the way trylock_page() in migrate_vma_collect_pmd() works without updating the comment. Reword the comment to be less misleading and a better reflection of what happens.

Link: https://lkml.kernel.org/r/20220830020138.497063-1-apopple@nvidia.com
Fixes: ab09243aa95a ("mm/migrate.c: remove MIGRATE_PFN_LOCKED")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reported-by: Peter Xu <peterx@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  mm/page_alloc.c: delete a redundant parameter of rmqueue_pcplist  (zezuo, 1 file, -3/+2)

The gfp_flags parameter is not used in rmqueue_pcplist(), so delete it.

Link: https://lkml.kernel.org/r/20220831013404.3360714-1-zuoze1@huawei.com
Signed-off-by: zezuo <zuoze1@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  mm/damon: get the hotness from damon_hot_score() in damon_pageout_score()  (Kaixu Xia, 1 file, -40/+6)

We can get the hotness value directly from damon_hot_score() in the damon_pageout_score() function, improving code readability.

Link: https://lkml.kernel.org/r/1661766366-20998-1-git-send-email-kaixuxia@tencent.com
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  mm/thp: remove redundant CONFIG_TRANSPARENT_HUGEPAGE  (Liu Shixin, 1 file, -2/+1)

Simplify the code by removing a redundant CONFIG_TRANSPARENT_HUGEPAGE check. No functional change.

Link: https://lkml.kernel.org/r/20220829095125.3284567-1-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  mm/thp: simplify has_transparent_hugepage by using IS_BUILTIN  (Liu Shixin, 1 file, -5/+1)

Simplify the definition of has_transparent_hugepage() by using IS_BUILTIN. No functional change.

Link: https://lkml.kernel.org/r/20220829095709.3287462-1-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

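The likely shape of the change (a sketch, assuming the old definition was the usual open-coded #ifdef):

    /* Before */
    #ifdef CONFIG_TRANSPARENT_HUGEPAGE
    #define has_transparent_hugepage() 1
    #else
    #define has_transparent_hugepage() 0
    #endif

    /* After: IS_BUILTIN() expands to 1 only when the option is =y */
    #define has_transparent_hugepage() IS_BUILTIN(CONFIG_TRANSPARENT_HUGEPAGE)
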
2022-09-12  mm/damon/vaddr: remove comparison between mm and last_mm when checking region accesses  (Kaixu Xia, 1 file, -5/+6)

The damon regions that belong to the same damon target have the same 'struct mm_struct *mm', so it's unnecessary to compare the mm and last_mm objects among the damon regions in one damon target when checking accesses. But the check is necessary when the target changes in '__damon_va_check_accesses()', so we can simplify the whole operation by using a bool 'same_target' to indicate whether the target has changed.

Link: https://lkml.kernel.org/r/1661590971-20893-3-git-send-email-kaixuxia@tencent.com
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  mm/damon: simplify the parameter passing for 'check_accesses'  (Kaixu Xia, 2 files, -6/+5)

Patch series "mm/damon: Simplify the damon regions access check", v2.

This patchset simplifies the operations performed when checking damon region accesses.

This patch (of 2):

The parameter 'struct damon_ctx *ctx' isn't used in the functions __damon_{p,v}a_check_access(), so we can remove it and simplify the parameter passing.

Link: https://lkml.kernel.org/r/1661590971-20893-1-git-send-email-kaixuxia@tencent.com
Link: https://lkml.kernel.org/r/1661590971-20893-2-git-send-email-kaixuxia@tencent.com
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2022-09-12  mm: fix null-ptr-deref in kswapd_is_running()  (Kefeng Wang, 6 files, -15/+55)

kswapd_run/stop() will set pgdat->kswapd to NULL, which could race with kswapd_is_running() in kcompactd():

    kswapd_run/stop()                       kcompactd()
                                              kswapd_is_running()
      pgdat->kswapd // error or normal ptr      verify pgdat->kswapd
                                                // load non-NULL pgdat->kswapd
      pgdat->kswapd = NULL
                                                task_is_running(pgdat->kswapd)
                                                // NULL pointer dereference

KASAN reports the null-ptr-deref shown below:

    vmscan: Failed to start kswapd on node 0
    ...
    BUG: KASAN: null-ptr-deref in kcompactd+0x440/0x504
    Read of size 8 at addr 0000000000000024 by task kcompactd0/37

    CPU: 0 PID: 37 Comm: kcompactd0 Kdump: loaded Tainted: G           OE     5.10.60 #1
    Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
    Call trace:
     dump_backtrace+0x0/0x394
     show_stack+0x34/0x4c
     dump_stack+0x158/0x1e4
     __kasan_report+0x138/0x140
     kasan_report+0x44/0xdc
     __asan_load8+0x94/0xd0
     kcompactd+0x440/0x504
     kthread+0x1a4/0x1f0
     ret_from_fork+0x10/0x18

At present, kswapd/kcompactd_run() and kswapd/kcompactd_stop() are protected by mem_hotplug_begin/done(), but kcompactd() is not. There is no need to involve the memory hotplug lock in kcompactd(), so let's add a new mutex to protect pgdat->kswapd accesses.

Also, because the kcompactd task checks the state of the kswapd task, it's better to call kcompactd_stop() before kswapd_stop() to reduce lock conflicts.

[akpm@linux-foundation.org: add comments]
Link: https://lkml.kernel.org/r/20220827111959.186838-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

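A sketch of the locking this introduces (helper and field names follow the description and may not match the final upstream spelling):

    static inline void pgdat_kswapd_lock(pg_data_t *pgdat)
    {
        mutex_lock(&pgdat->kswapd_lock);
    }

    static inline void pgdat_kswapd_unlock(pg_data_t *pgdat)
    {
        mutex_unlock(&pgdat->kswapd_lock);
    }

    /* kcompactd's check then becomes, roughly: */
    static bool kswapd_is_running(pg_data_t *pgdat)
    {
        bool running;

        pgdat_kswapd_lock(pgdat);
        running = pgdat->kswapd && task_is_running(pgdat->kswapd);
        pgdat_kswapd_unlock(pgdat);
        return running;
    }

With the NULL check and the task_is_running() call made atomic with respect to kswapd_run/stop(), kcompactd can no longer observe a half-torn-down pgdat->kswapd pointer.
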