From b67bf49ce7aae72f63739abee6ac25f64bf20081 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Mon, 14 Feb 2022 18:21:52 -0800 Subject: mm/munlock: delete FOLL_MLOCK and FOLL_POPULATE If counting page mlocks, we must not double-count: follow_page_pte() can tell if a page has already been Mlocked or not, but cannot tell if a pte has already been counted or not: that will have to be done when the pte is mapped in (which lru_cache_add_inactive_or_unevictable() already tracks for new anon pages, but there's no such tracking yet for others). Delete all the FOLL_MLOCK code - faulting in the missing pages will do all that is necessary, without special mlock_vma_page() calls from here. But then FOLL_POPULATE turns out to serve no purpose - it was there so that its absence would tell faultin_page() not to faultin page when setting up VM_LOCKONFAULT areas; but if there's no special work needed here for mlock, then there's no work at all here for VM_LOCKONFAULT. Have I got that right? I've not looked into the history, but see that FOLL_POPULATE goes back before VM_LOCKONFAULT: did it serve a different purpose before? Ah, yes, it was used to skip the old stack guard page. And is it intentional that COW is not broken on existing pages when setting up a VM_LOCKONFAULT area? I can see that being argued either way, and have no reason to disagree with current behaviour. Signed-off-by: Hugh Dickins Acked-by: Vlastimil Babka Signed-off-by: Matthew Wilcox (Oracle) --- mm/huge_memory.c | 33 --------------------------------- 1 file changed, 33 deletions(-) (limited to 'mm/huge_memory.c') diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 406a3c28c026..9a34b85ebcf8 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1380,39 +1380,6 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, if (flags & FOLL_TOUCH) touch_pmd(vma, addr, pmd, flags); - if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) { - /* - * We don't mlock() pte-mapped THPs. This way we can avoid - * leaking mlocked pages into non-VM_LOCKED VMAs. - * - * For anon THP: - * - * In most cases the pmd is the only mapping of the page as we - * break COW for the mlock() -- see gup_flags |= FOLL_WRITE for - * writable private mappings in populate_vma_page_range(). - * - * The only scenario when we have the page shared here is if we - * mlocking read-only mapping shared over fork(). We skip - * mlocking such pages. - * - * For file THP: - * - * We can expect PageDoubleMap() to be stable under page lock: - * for file pages we set it in page_add_file_rmap(), which - * requires page to be locked. - */ - - if (PageAnon(page) && compound_mapcount(page) != 1) - goto skip_mlock; - if (PageDoubleMap(page) || !page->mapping) - goto skip_mlock; - if (!trylock_page(page)) - goto skip_mlock; - if (page->mapping && !PageDoubleMap(page)) - mlock_vma_page(page); - unlock_page(page); - } -skip_mlock: page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT; VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page); -- cgit v1.2.3 From cea86fe246b694a191804b47378eb9d77aefabec Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Mon, 14 Feb 2022 18:26:39 -0800 Subject: mm/munlock: rmap call mlock_vma_page() munlock_vma_page() Add vma argument to mlock_vma_page() and munlock_vma_page(), make them inline functions which check (vma->vm_flags & VM_LOCKED) before calling mlock_page() and munlock_page() in mm/mlock.c. 
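(For reference, a condensed view of the wrapper shape just described, taken from the mm/internal.h hunk further down; munlock_vma_page() is symmetrical and calls munlock_page():)

	static inline void mlock_vma_page(struct page *page,
			struct vm_area_struct *vma, bool compound)
	{
		/* Only count pages in VM_LOCKED vmas; skip pte maps of THPs */
		if (unlikely(vma->vm_flags & VM_LOCKED) &&
		    (compound || !PageTransCompound(page)))
			mlock_page(page);
	}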
Add bool compound to mlock_vma_page() and munlock_vma_page(): this is because we have understandable difficulty in accounting pte maps of THPs, and if passed a PageHead page, mlock_page() and munlock_page() cannot tell whether it's a pmd map to be counted or a pte map to be ignored. Add vma arg to page_add_file_rmap() and page_remove_rmap(), like the others, and use that to call mlock_vma_page() at the end of the page adds, and munlock_vma_page() at the end of page_remove_rmap() (end or beginning? unimportant, but end was easier for assertions in testing). No page lock is required (although almost all adds happen to hold it): delete the "Serialize with page migration" BUG_ON(!PageLocked(page))s. Certainly page lock did serialize with page migration, but I'm having difficulty explaining why that was ever important. Mlock accounting on THPs has been hard to define, differed between anon and file, involved PageDoubleMap in some places and not others, required clear_page_mlock() at some points. Keep it simple now: just count the pmds and ignore the ptes, there is no reason for ptes to undo pmd mlocks. page_add_new_anon_rmap() callers unchanged: they have long been calling lru_cache_add_inactive_or_unevictable(), which does its own VM_LOCKED handling (it also checks for not VM_SPECIAL: I think that's overcautious, and inconsistent with other checks, that mmap_region() already prevents VM_LOCKED on VM_SPECIAL; but haven't quite convinced myself to change it). Signed-off-by: Hugh Dickins Acked-by: Vlastimil Babka Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/rmap.h | 17 ++++++++------- kernel/events/uprobes.c | 7 ++----- mm/huge_memory.c | 17 +++++++-------- mm/hugetlb.c | 4 ++-- mm/internal.h | 36 ++++++++++++++++++++++++++----- mm/khugepaged.c | 4 ++-- mm/ksm.c | 12 +---------- mm/memory.c | 45 +++++++++++++-------------------------- mm/migrate.c | 9 ++------ mm/mlock.c | 21 +++++++------------ mm/rmap.c | 56 +++++++++++++++++++++++-------------------------- mm/userfaultfd.c | 14 +++++++------ 12 files changed, 113 insertions(+), 129 deletions(-) (limited to 'mm/huge_memory.c') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index dc48aa8c2c94..ac29b076082b 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -167,18 +167,19 @@ struct anon_vma *page_get_anon_vma(struct page *page); */ void page_move_anon_rmap(struct page *, struct vm_area_struct *); void page_add_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long, bool); + unsigned long address, bool compound); void do_page_add_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long, int); + unsigned long address, int flags); void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long, bool); -void page_add_file_rmap(struct page *, bool); -void page_remove_rmap(struct page *, bool); - + unsigned long address, bool compound); +void page_add_file_rmap(struct page *, struct vm_area_struct *, + bool compound); +void page_remove_rmap(struct page *, struct vm_area_struct *, + bool compound); void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long); + unsigned long address); void hugepage_add_new_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long); + unsigned long address); static inline void page_dup_rmap(struct page *page, bool compound) { diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 6357c3580d07..eed2f7437d96 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -173,7 +173,7 @@ static 
int __replace_page(struct vm_area_struct *vma, unsigned long addr, return err; } - /* For try_to_free_swap() and munlock_vma_page() below */ + /* For try_to_free_swap() below */ lock_page(old_page); mmu_notifier_invalidate_range_start(&range); @@ -201,13 +201,10 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr, set_pte_at_notify(mm, addr, pvmw.pte, mk_pte(new_page, vma->vm_page_prot)); - page_remove_rmap(old_page, false); + page_remove_rmap(old_page, vma, false); if (!page_mapped(old_page)) try_to_free_swap(old_page); page_vma_mapped_walk_done(&pvmw); - - if ((vma->vm_flags & VM_LOCKED) && !PageCompound(old_page)) - munlock_vma_page(old_page); put_page(old_page); err = 0; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 9a34b85ebcf8..d6477f48a27e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1577,7 +1577,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, if (pmd_present(orig_pmd)) { page = pmd_page(orig_pmd); - page_remove_rmap(page, true); + page_remove_rmap(page, vma, true); VM_BUG_ON_PAGE(page_mapcount(page) < 0, page); VM_BUG_ON_PAGE(!PageHead(page), page); } else if (thp_migration_supported()) { @@ -1962,7 +1962,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, set_page_dirty(page); if (!PageReferenced(page) && pmd_young(old_pmd)) SetPageReferenced(page); - page_remove_rmap(page, true); + page_remove_rmap(page, vma, true); put_page(page); } add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR); @@ -2096,6 +2096,9 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, } } unlock_page_memcg(page); + + /* Above is effectively page_remove_rmap(page, vma, true) */ + munlock_vma_page(page, vma, true); } smp_wmb(); /* make pte visible before pmd */ @@ -2103,7 +2106,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, if (freeze) { for (i = 0; i < HPAGE_PMD_NR; i++) { - page_remove_rmap(page + i, false); + page_remove_rmap(page + i, vma, false); put_page(page + i); } } @@ -2163,8 +2166,6 @@ repeat: do_unlock_page = true; } } - if (PageMlocked(page)) - clear_page_mlock(page); } else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd))) goto out; __split_huge_pmd_locked(vma, pmd, range.start, freeze); @@ -3138,7 +3139,7 @@ void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, if (pmd_soft_dirty(pmdval)) pmdswp = pmd_swp_mksoft_dirty(pmdswp); set_pmd_at(mm, address, pvmw->pmd, pmdswp); - page_remove_rmap(page, true); + page_remove_rmap(page, vma, true); put_page(page); } @@ -3168,10 +3169,8 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) if (PageAnon(new)) page_add_anon_rmap(new, vma, mmun_start, true); else - page_add_file_rmap(new, true); + page_add_file_rmap(new, vma, true); set_pmd_at(mm, mmun_start, pvmw->pmd, pmde); - if ((vma->vm_flags & VM_LOCKED) && !PageDoubleMap(new)) - mlock_vma_page(new); update_mmu_cache_pmd(vma, address, pvmw->pmd); } #endif diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 61895cc01d09..43fb3155298e 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5014,7 +5014,7 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct set_page_dirty(page); hugetlb_count_sub(pages_per_huge_page(h), mm); - page_remove_rmap(page, true); + page_remove_rmap(page, vma, true); spin_unlock(ptl); tlb_remove_page_size(tlb, page, huge_page_size(h)); @@ -5259,7 +5259,7 @@ retry_avoidcopy: /* Break COW */ huge_ptep_clear_flush(vma, haddr, ptep); 
mmu_notifier_invalidate_range(mm, range.start, range.end); - page_remove_rmap(old_page, true); + page_remove_rmap(old_page, vma, true); hugepage_add_new_anon_rmap(new_page, vma, haddr); set_huge_pte_at(mm, haddr, ptep, make_huge_pte(vma, new_page, 1)); diff --git a/mm/internal.h b/mm/internal.h index f235aa92e564..3d7dfc8bc471 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -395,12 +395,35 @@ extern long faultin_vma_page_range(struct vm_area_struct *vma, bool write, int *locked); extern int mlock_future_check(struct mm_struct *mm, unsigned long flags, unsigned long len); - /* - * must be called with vma's mmap_lock held for read or write, and page locked. + * mlock_vma_page() and munlock_vma_page(): + * should be called with vma's mmap_lock held for read or write, + * under page table lock for the pte/pmd being added or removed. + * + * mlock is usually called at the end of page_add_*_rmap(), + * munlock at the end of page_remove_rmap(); but new anon + * pages are managed in lru_cache_add_inactive_or_unevictable(). + * + * @compound is used to include pmd mappings of THPs, but filter out + * pte mappings of THPs, which cannot be consistently counted: a pte + * mapping of the THP head cannot be distinguished by the page alone. */ -extern void mlock_vma_page(struct page *page); -extern void munlock_vma_page(struct page *page); +void mlock_page(struct page *page); +static inline void mlock_vma_page(struct page *page, + struct vm_area_struct *vma, bool compound) +{ + if (unlikely(vma->vm_flags & VM_LOCKED) && + (compound || !PageTransCompound(page))) + mlock_page(page); +} +void munlock_page(struct page *page); +static inline void munlock_vma_page(struct page *page, + struct vm_area_struct *vma, bool compound) +{ + if (unlikely(vma->vm_flags & VM_LOCKED) && + (compound || !PageTransCompound(page))) + munlock_page(page); +} /* * Clear the page's PageMlocked(). 
This can be useful in a situation where @@ -487,7 +510,10 @@ static inline struct file *maybe_unlock_mmap_for_io(struct vm_fault *vmf, #else /* !CONFIG_MMU */ static inline void unmap_mapping_folio(struct folio *folio) { } static inline void clear_page_mlock(struct page *page) { } -static inline void mlock_vma_page(struct page *page) { } +static inline void mlock_vma_page(struct page *page, + struct vm_area_struct *vma, bool compound) { } +static inline void munlock_vma_page(struct page *page, + struct vm_area_struct *vma, bool compound) { } static inline void vunmap_range_noflush(unsigned long start, unsigned long end) { } diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 131492fd1148..52add1825525 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -774,7 +774,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page, */ spin_lock(ptl); ptep_clear(vma->vm_mm, address, _pte); - page_remove_rmap(src_page, false); + page_remove_rmap(src_page, vma, false); spin_unlock(ptl); free_page_and_swap_cache(src_page); } @@ -1513,7 +1513,7 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) if (pte_none(*pte)) continue; page = vm_normal_page(vma, addr, *pte); - page_remove_rmap(page, false); + page_remove_rmap(page, vma, false); } pte_unmap_unlock(start_pte, ptl); diff --git a/mm/ksm.c b/mm/ksm.c index c20bd4d9a0d9..c5a4403b5dc9 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -1177,7 +1177,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page, ptep_clear_flush(vma, addr, ptep); set_pte_at_notify(mm, addr, ptep, newpte); - page_remove_rmap(page, false); + page_remove_rmap(page, vma, false); if (!page_mapped(page)) try_to_free_swap(page); put_page(page); @@ -1252,16 +1252,6 @@ static int try_to_merge_one_page(struct vm_area_struct *vma, err = replace_page(vma, page, kpage, orig_pte); } - if ((vma->vm_flags & VM_LOCKED) && kpage && !err) { - munlock_vma_page(page); - if (!PageMlocked(kpage)) { - unlock_page(page); - lock_page(kpage); - mlock_vma_page(kpage); - page = kpage; /* for final unlock */ - } - } - out_unlock: unlock_page(page); out: diff --git a/mm/memory.c b/mm/memory.c index c125c4969913..53bd9e5f2e33 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -735,9 +735,6 @@ static void restore_exclusive_pte(struct vm_area_struct *vma, set_pte_at(vma->vm_mm, address, ptep, pte); - if (vma->vm_flags & VM_LOCKED) - mlock_vma_page(page); - /* * No need to invalidate - it was non-present before. However * secondary CPUs may have mappings that need invalidating. @@ -1377,7 +1374,7 @@ again: mark_page_accessed(page); } rss[mm_counter(page)]--; - page_remove_rmap(page, false); + page_remove_rmap(page, vma, false); if (unlikely(page_mapcount(page) < 0)) print_bad_pte(vma, addr, ptent, page); if (unlikely(__tlb_remove_page(tlb, page))) { @@ -1397,10 +1394,8 @@ again: continue; pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); rss[mm_counter(page)]--; - if (is_device_private_entry(entry)) - page_remove_rmap(page, false); - + page_remove_rmap(page, vma, false); put_page(page); continue; } @@ -1753,16 +1748,16 @@ static int validate_page_before_insert(struct page *page) return 0; } -static int insert_page_into_pte_locked(struct mm_struct *mm, pte_t *pte, +static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte, unsigned long addr, struct page *page, pgprot_t prot) { if (!pte_none(*pte)) return -EBUSY; /* Ok, finally just insert the thing.. 
*/ get_page(page); - inc_mm_counter_fast(mm, mm_counter_file(page)); - page_add_file_rmap(page, false); - set_pte_at(mm, addr, pte, mk_pte(page, prot)); + inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page)); + page_add_file_rmap(page, vma, false); + set_pte_at(vma->vm_mm, addr, pte, mk_pte(page, prot)); return 0; } @@ -1776,7 +1771,6 @@ static int insert_page_into_pte_locked(struct mm_struct *mm, pte_t *pte, static int insert_page(struct vm_area_struct *vma, unsigned long addr, struct page *page, pgprot_t prot) { - struct mm_struct *mm = vma->vm_mm; int retval; pte_t *pte; spinlock_t *ptl; @@ -1785,17 +1779,17 @@ static int insert_page(struct vm_area_struct *vma, unsigned long addr, if (retval) goto out; retval = -ENOMEM; - pte = get_locked_pte(mm, addr, &ptl); + pte = get_locked_pte(vma->vm_mm, addr, &ptl); if (!pte) goto out; - retval = insert_page_into_pte_locked(mm, pte, addr, page, prot); + retval = insert_page_into_pte_locked(vma, pte, addr, page, prot); pte_unmap_unlock(pte, ptl); out: return retval; } #ifdef pte_index -static int insert_page_in_batch_locked(struct mm_struct *mm, pte_t *pte, +static int insert_page_in_batch_locked(struct vm_area_struct *vma, pte_t *pte, unsigned long addr, struct page *page, pgprot_t prot) { int err; @@ -1805,7 +1799,7 @@ static int insert_page_in_batch_locked(struct mm_struct *mm, pte_t *pte, err = validate_page_before_insert(page); if (err) return err; - return insert_page_into_pte_locked(mm, pte, addr, page, prot); + return insert_page_into_pte_locked(vma, pte, addr, page, prot); } /* insert_pages() amortizes the cost of spinlock operations @@ -1842,7 +1836,7 @@ more: start_pte = pte_offset_map_lock(mm, pmd, addr, &pte_lock); for (pte = start_pte; pte_idx < batch_size; ++pte, ++pte_idx) { - int err = insert_page_in_batch_locked(mm, pte, + int err = insert_page_in_batch_locked(vma, pte, addr, pages[curr_page_idx], prot); if (unlikely(err)) { pte_unmap_unlock(start_pte, pte_lock); @@ -3098,7 +3092,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) * mapcount is visible. So transitively, TLBs to * old page will be flushed before it can be reused. */ - page_remove_rmap(old_page, false); + page_remove_rmap(old_page, vma, false); } /* Free the old page.. */ @@ -3118,16 +3112,6 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) */ mmu_notifier_invalidate_range_only_end(&range); if (old_page) { - /* - * Don't let another task, with possibly unlocked vma, - * keep the mlocked page. 
- */ - if (page_copied && (vma->vm_flags & VM_LOCKED)) { - lock_page(old_page); /* LRU manipulation */ - if (PageMlocked(old_page)) - munlock_vma_page(old_page); - unlock_page(old_page); - } if (page_copied) free_swap_cache(old_page); put_page(old_page); @@ -3947,7 +3931,8 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page) entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); add_mm_counter(vma->vm_mm, mm_counter_file(page), HPAGE_PMD_NR); - page_add_file_rmap(page, true); + page_add_file_rmap(page, vma, true); + /* * deposit and withdraw with pmd lock held */ @@ -3996,7 +3981,7 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr) lru_cache_add_inactive_or_unevictable(page, vma); } else { inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page)); - page_add_file_rmap(page, false); + page_add_file_rmap(page, vma, false); } set_pte_at(vma->vm_mm, addr, vmf->pte, entry); } diff --git a/mm/migrate.c b/mm/migrate.c index c7da064b4781..7c4223ce2500 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -248,14 +248,9 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma, if (PageAnon(new)) page_add_anon_rmap(new, vma, pvmw.address, false); else - page_add_file_rmap(new, false); + page_add_file_rmap(new, vma, false); set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte); } - if (vma->vm_flags & VM_LOCKED && !PageTransCompound(new)) - mlock_vma_page(new); - - if (PageTransHuge(page) && PageMlocked(page)) - clear_page_mlock(page); /* No need to invalidate - it was non-present before */ update_mmu_cache(vma, pvmw.address, pvmw.pte); @@ -2331,7 +2326,7 @@ again: * drop page refcount. Page won't be freed, as we took * a reference just above. */ - page_remove_rmap(page, false); + page_remove_rmap(page, vma, false); put_page(page); if (pte_present(pte)) diff --git a/mm/mlock.c b/mm/mlock.c index 5d7ced8303be..92f28258b4ae 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -78,17 +78,13 @@ void clear_page_mlock(struct page *page) } } -/* - * Mark page as mlocked if not already. - * If page on LRU, isolate and putback to move to unevictable list. +/** + * mlock_page - mlock a page + * @page: page to be mlocked, either a normal page or a THP head. */ -void mlock_vma_page(struct page *page) +void mlock_page(struct page *page) { - /* Serialize with page migration */ - BUG_ON(!PageLocked(page)); - VM_BUG_ON_PAGE(PageTail(page), page); - VM_BUG_ON_PAGE(PageCompound(page) && PageDoubleMap(page), page); if (!TestSetPageMlocked(page)) { int nr_pages = thp_nr_pages(page); @@ -101,14 +97,11 @@ void mlock_vma_page(struct page *page) } /** - * munlock_vma_page - munlock a vma page - * @page: page to be unlocked, either a normal page or THP page head + * munlock_page - munlock a page + * @page: page to be munlocked, either a normal page or a THP head. 
*/ -void munlock_vma_page(struct page *page) +void munlock_page(struct page *page) { - /* Serialize with page migration */ - BUG_ON(!PageLocked(page)); - VM_BUG_ON_PAGE(PageTail(page), page); if (TestClearPageMlocked(page)) { diff --git a/mm/rmap.c b/mm/rmap.c index 7ce7f1946cff..6cc8bf129f18 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1181,17 +1181,17 @@ void do_page_add_anon_rmap(struct page *page, __mod_lruvec_page_state(page, NR_ANON_MAPPED, nr); } - if (unlikely(PageKsm(page))) { + if (unlikely(PageKsm(page))) unlock_page_memcg(page); - return; - } /* address might be in next vma when migration races vma_adjust */ - if (first) + else if (first) __page_set_anon_rmap(page, vma, address, flags & RMAP_EXCLUSIVE); else __page_check_anon_rmap(page, vma, address); + + mlock_vma_page(page, vma, compound); } /** @@ -1232,12 +1232,14 @@ void page_add_new_anon_rmap(struct page *page, /** * page_add_file_rmap - add pte mapping to a file page - * @page: the page to add the mapping to - * @compound: charge the page as compound or small page + * @page: the page to add the mapping to + * @vma: the vm area in which the mapping is added + * @compound: charge the page as compound or small page * * The caller needs to hold the pte lock. */ -void page_add_file_rmap(struct page *page, bool compound) +void page_add_file_rmap(struct page *page, + struct vm_area_struct *vma, bool compound) { int i, nr = 1; @@ -1260,13 +1262,8 @@ void page_add_file_rmap(struct page *page, bool compound) nr_pages); } else { if (PageTransCompound(page) && page_mapping(page)) { - struct page *head = compound_head(page); - VM_WARN_ON_ONCE(!PageLocked(page)); - - SetPageDoubleMap(head); - if (PageMlocked(page)) - clear_page_mlock(head); + SetPageDoubleMap(compound_head(page)); } if (!atomic_inc_and_test(&page->_mapcount)) goto out; @@ -1274,6 +1271,8 @@ void page_add_file_rmap(struct page *page, bool compound) __mod_lruvec_page_state(page, NR_FILE_MAPPED, nr); out: unlock_page_memcg(page); + + mlock_vma_page(page, vma, compound); } static void page_remove_file_rmap(struct page *page, bool compound) @@ -1368,11 +1367,13 @@ static void page_remove_anon_compound_rmap(struct page *page) /** * page_remove_rmap - take down pte mapping from a page * @page: page to remove mapping from + * @vma: the vm area from which the mapping is removed * @compound: uncharge the page as compound or small page * * The caller needs to hold the pte lock. */ -void page_remove_rmap(struct page *page, bool compound) +void page_remove_rmap(struct page *page, + struct vm_area_struct *vma, bool compound) { lock_page_memcg(page); @@ -1414,6 +1415,8 @@ void page_remove_rmap(struct page *page, bool compound) */ out: unlock_page_memcg(page); + + munlock_vma_page(page, vma, compound); } /* @@ -1469,28 +1472,21 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, mmu_notifier_invalidate_range_start(&range); while (page_vma_mapped_walk(&pvmw)) { + /* Unexpected PMD-mapped THP? */ + VM_BUG_ON_PAGE(!pvmw.pte, page); + /* - * If the page is mlock()d, we cannot swap it out. + * If the page is in an mlock()d vma, we must not swap it out. */ if (!(flags & TTU_IGNORE_MLOCK) && (vma->vm_flags & VM_LOCKED)) { - /* - * PTE-mapped THP are never marked as mlocked: so do - * not set it on a DoubleMap THP, nor on an Anon THP - * (which may still be PTE-mapped after DoubleMap was - * cleared). But stop unmapping even in those cases. 
- */ - if (!PageTransCompound(page) || (PageHead(page) && - !PageDoubleMap(page) && !PageAnon(page))) - mlock_vma_page(page); + /* Restore the mlock which got missed */ + mlock_vma_page(page, vma, false); page_vma_mapped_walk_done(&pvmw); ret = false; break; } - /* Unexpected PMD-mapped THP? */ - VM_BUG_ON_PAGE(!pvmw.pte, page); - subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte); address = pvmw.address; @@ -1668,7 +1664,7 @@ discard: * * See Documentation/vm/mmu_notifier.rst */ - page_remove_rmap(subpage, PageHuge(page)); + page_remove_rmap(subpage, vma, PageHuge(page)); put_page(page); } @@ -1942,7 +1938,7 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma, * * See Documentation/vm/mmu_notifier.rst */ - page_remove_rmap(subpage, PageHuge(page)); + page_remove_rmap(subpage, vma, PageHuge(page)); put_page(page); } @@ -2078,7 +2074,7 @@ static bool page_make_device_exclusive_one(struct page *page, * There is a reference on the page for the swap entry which has * been removed, so shouldn't take another. */ - page_remove_rmap(subpage, false); + page_remove_rmap(subpage, vma, false); } mmu_notifier_invalidate_range_end(&range); diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 0780c2a57ff1..15d3e97a6e04 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -95,10 +95,15 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, if (!pte_none(*dst_pte)) goto out_unlock; - if (page_in_cache) - page_add_file_rmap(page, false); - else + if (page_in_cache) { + /* Usually, cache pages are already added to LRU */ + if (newly_allocated) + lru_cache_add(page); + page_add_file_rmap(page, dst_vma, false); + } else { page_add_new_anon_rmap(page, dst_vma, dst_addr, false); + lru_cache_add_inactive_or_unevictable(page, dst_vma); + } /* * Must happen after rmap, as mm_counter() checks mapping (via @@ -106,9 +111,6 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, */ inc_mm_counter(dst_mm, mm_counter(page)); - if (newly_allocated) - lru_cache_add_inactive_or_unevictable(page, dst_vma); - set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte); /* No need to invalidate - it was non-present before */ -- cgit v1.2.3 From 07ca760673088f262da57ff42c15558688565aa2 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Mon, 14 Feb 2022 18:29:54 -0800 Subject: mm/munlock: maintain page->mlock_count while unevictable Previous patches have been preparatory: now implement page->mlock_count. The ordering of the "Unevictable LRU" is of no significance, and there is no point holding unevictable pages on a list: place page->mlock_count to overlay page->lru.prev (since page->lru.next is overlaid by compound_head, which needs to be even so as not to satisfy PageTail - though 2 could be added instead of 1 for each mlock, if that's ever an improvement). But it's only safe to rely on or modify page->mlock_count while lruvec lock is held and page is on unevictable "LRU" - we can save lots of edits by continuing to pretend that there's an imaginary LRU here (there is an unevictable count which still needs to be maintained, but not a list). The mlock_count technique suffers from an unreliability much like with page_mlock(): while someone else has the page off LRU, not much can be done. As before, err on the safe side (behave as if mlock_count 0), and let try_to_unlock_one() move the page to unevictable if reclaim finds out later on - a few misplaced pages don't matter, what we want to avoid is imbalancing reclaim by flooding evictable lists with unevictable pages. 
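(For reference, the lru.prev overlay just described, condensed from the include/linux/mm_types.h hunk below; struct folio gains the same union:)

	union {
		struct list_head lru;
		/* Or, for the Unevictable "LRU list" slot */
		struct {
			/* Always even, to negate PageTail */
			void *__filler;
			/* Count page's or folio's mlocks */
			unsigned int mlock_count;
		};
	};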
I am not a fan of "if (!isolate_lru_page(page)) putback_lru_page(page);": if we have taken lruvec lock to get the page off its present list, then we save everyone trouble (and however many extra atomic ops) by putting it on its destination list immediately. Signed-off-by: Hugh Dickins Acked-by: Vlastimil Babka Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/mm_inline.h | 11 +++++--- include/linux/mm_types.h | 19 +++++++++++-- mm/huge_memory.c | 5 +++- mm/memcontrol.c | 3 +-- mm/mlock.c | 68 +++++++++++++++++++++++++++++++++++++---------- mm/mmzone.c | 7 +++++ mm/swap.c | 1 + 7 files changed, 92 insertions(+), 22 deletions(-) (limited to 'mm/huge_memory.c') diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index b725839dfe71..884d6f6af05b 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -99,7 +99,8 @@ void lruvec_add_folio(struct lruvec *lruvec, struct folio *folio) update_lru_size(lruvec, lru, folio_zonenum(folio), folio_nr_pages(folio)); - list_add(&folio->lru, &lruvec->lists[lru]); + if (lru != LRU_UNEVICTABLE) + list_add(&folio->lru, &lruvec->lists[lru]); } static __always_inline void add_page_to_lru_list(struct page *page, @@ -115,6 +116,7 @@ void lruvec_add_folio_tail(struct lruvec *lruvec, struct folio *folio) update_lru_size(lruvec, lru, folio_zonenum(folio), folio_nr_pages(folio)); + /* This is not expected to be used on LRU_UNEVICTABLE */ list_add_tail(&folio->lru, &lruvec->lists[lru]); } @@ -127,8 +129,11 @@ static __always_inline void add_page_to_lru_list_tail(struct page *page, static __always_inline void lruvec_del_folio(struct lruvec *lruvec, struct folio *folio) { - list_del(&folio->lru); - update_lru_size(lruvec, folio_lru_list(folio), folio_zonenum(folio), + enum lru_list lru = folio_lru_list(folio); + + if (lru != LRU_UNEVICTABLE) + list_del(&folio->lru); + update_lru_size(lruvec, lru, folio_zonenum(folio), -folio_nr_pages(folio)); } diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5140e5feb486..475bdb282769 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -85,7 +85,16 @@ struct page { * lruvec->lru_lock. Sometimes used as a generic list * by the page owner. */ - struct list_head lru; + union { + struct list_head lru; + /* Or, for the Unevictable "LRU list" slot */ + struct { + /* Always even, to negate PageTail */ + void *__filler; + /* Count page's or folio's mlocks */ + unsigned int mlock_count; + }; + }; /* See page-flags.h for PAGE_MAPPING_FLAGS */ struct address_space *mapping; pgoff_t index; /* Our offset within mapping. 
*/ @@ -241,7 +250,13 @@ struct folio { struct { /* public: */ unsigned long flags; - struct list_head lru; + union { + struct list_head lru; + struct { + void *__filler; + unsigned int mlock_count; + }; + }; struct address_space *mapping; pgoff_t index; void *private; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index d6477f48a27e..9afca0122723 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2300,8 +2300,11 @@ static void lru_add_page_tail(struct page *head, struct page *tail, } else { /* head is still on lru (and we have it frozen) */ VM_WARN_ON(!PageLRU(head)); + if (PageUnevictable(tail)) + tail->mlock_count = 0; + else + list_add_tail(&tail->lru, &head->lru); SetPageLRU(tail); - list_add_tail(&tail->lru, &head->lru); } } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 36e9f38c919d..c78b9d3b9c04 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1257,8 +1257,7 @@ struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio, * @nr_pages: positive when adding or negative when removing * * This function must be called under lru_lock, just before a page is added - * to or just after a page is removed from an lru list (that ordering being - * so as to allow it to check that lru_size 0 is consistent with list_empty). + * to or just after a page is removed from an lru list. */ void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru, int zid, int nr_pages) diff --git a/mm/mlock.c b/mm/mlock.c index 3c26473050a3..f8a3a54687dd 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -54,16 +54,35 @@ EXPORT_SYMBOL(can_do_mlock); */ void mlock_page(struct page *page) { + struct lruvec *lruvec; + int nr_pages = thp_nr_pages(page); + VM_BUG_ON_PAGE(PageTail(page), page); if (!TestSetPageMlocked(page)) { - int nr_pages = thp_nr_pages(page); - mod_zone_page_state(page_zone(page), NR_MLOCK, nr_pages); - count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages); - if (!isolate_lru_page(page)) - putback_lru_page(page); + __count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages); + } + + /* There is nothing more we can do while it's off LRU */ + if (!TestClearPageLRU(page)) + return; + + lruvec = folio_lruvec_lock_irq(page_folio(page)); + if (PageUnevictable(page)) { + page->mlock_count++; + goto out; } + + del_page_from_lru_list(page, lruvec); + ClearPageActive(page); + SetPageUnevictable(page); + page->mlock_count = 1; + add_page_to_lru_list(page, lruvec); + __count_vm_events(UNEVICTABLE_PGCULLED, nr_pages); +out: + SetPageLRU(page); + unlock_page_lruvec_irq(lruvec); } /** @@ -72,19 +91,40 @@ void mlock_page(struct page *page) */ void munlock_page(struct page *page) { + struct lruvec *lruvec; + int nr_pages = thp_nr_pages(page); + VM_BUG_ON_PAGE(PageTail(page), page); + lock_page_memcg(page); + lruvec = folio_lruvec_lock_irq(page_folio(page)); + if (PageLRU(page) && PageUnevictable(page)) { + /* Then mlock_count is maintained, but might undercount */ + if (page->mlock_count) + page->mlock_count--; + if (page->mlock_count) + goto out; + } + /* else assume that was the last mlock: reclaim will fix it if not */ + if (TestClearPageMlocked(page)) { - int nr_pages = thp_nr_pages(page); - - mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages); - if (!isolate_lru_page(page)) { - putback_lru_page(page); - count_vm_events(UNEVICTABLE_PGMUNLOCKED, nr_pages); - } else if (PageUnevictable(page)) { - count_vm_events(UNEVICTABLE_PGSTRANDED, nr_pages); - } + __mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages); + if (PageLRU(page) || !PageUnevictable(page)) + 
__count_vm_events(UNEVICTABLE_PGMUNLOCKED, nr_pages); + else + __count_vm_events(UNEVICTABLE_PGSTRANDED, nr_pages); + } + + /* page_evictable() has to be checked *after* clearing Mlocked */ + if (PageLRU(page) && PageUnevictable(page) && page_evictable(page)) { + del_page_from_lru_list(page, lruvec); + ClearPageUnevictable(page); + add_page_to_lru_list(page, lruvec); + __count_vm_events(UNEVICTABLE_PGRESCUED, nr_pages); } +out: + unlock_page_lruvec_irq(lruvec); + unlock_page_memcg(page); } /* diff --git a/mm/mmzone.c b/mm/mmzone.c index eb89d6e018e2..40e1d9428300 100644 --- a/mm/mmzone.c +++ b/mm/mmzone.c @@ -81,6 +81,13 @@ void lruvec_init(struct lruvec *lruvec) for_each_lru(lru) INIT_LIST_HEAD(&lruvec->lists[lru]); + /* + * The "Unevictable LRU" is imaginary: though its size is maintained, + * it is never scanned, and unevictable pages are not threaded on it + * (so that their lru fields can be reused to hold mlock_count). + * Poison its list head, so that any operations on it would crash. + */ + list_del(&lruvec->lists[LRU_UNEVICTABLE]); } #if defined(CONFIG_NUMA_BALANCING) && !defined(LAST_CPUPID_NOT_IN_PAGE_FLAGS) diff --git a/mm/swap.c b/mm/swap.c index ff4810e4a4bc..682a03301a2c 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -1062,6 +1062,7 @@ static void __pagevec_lru_add_fn(struct folio *folio, struct lruvec *lruvec) } else { folio_clear_active(folio); folio_set_unevictable(folio); + folio->mlock_count = !!folio_test_mlocked(folio); if (!was_unevictable) __count_vm_events(UNEVICTABLE_PGCULLED, nr_pages); } -- cgit v1.2.3 From 4ba1119cd53166d853050ff1a9d76079cd8f8e06 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 17 Jan 2022 16:33:26 -0500 Subject: mm: Add folio_mapcount() This implements the same algorithm as total_mapcount(), which is transformed into a wrapper function. 
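(For reference, the wrapper mentioned above, condensed from the include/linux/mm.h hunk below:)

	static inline int total_mapcount(struct page *page)
	{
		return folio_mapcount(page_folio(page));
	}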
Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/mm.h | 8 +++++++- mm/huge_memory.c | 24 ------------------------ mm/util.c | 33 +++++++++++++++++++++++++++++++++ 3 files changed, 40 insertions(+), 25 deletions(-) (limited to 'mm/huge_memory.c') diff --git a/include/linux/mm.h b/include/linux/mm.h index 70f0ca217962..0d380dc26847 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -825,8 +825,14 @@ static inline int page_mapcount(struct page *page) return atomic_read(&page->_mapcount) + 1; } +int folio_mapcount(struct folio *folio); + #ifdef CONFIG_TRANSPARENT_HUGEPAGE -int total_mapcount(struct page *page); +static inline int total_mapcount(struct page *page) +{ + return folio_mapcount(page_folio(page)); +} + int page_trans_huge_mapcount(struct page *page); #else static inline int total_mapcount(struct page *page) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 9afca0122723..beebe4105659 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2465,30 +2465,6 @@ static void __split_huge_page(struct page *page, struct list_head *list, } } -int total_mapcount(struct page *page) -{ - int i, compound, nr, ret; - - VM_BUG_ON_PAGE(PageTail(page), page); - - if (likely(!PageCompound(page))) - return atomic_read(&page->_mapcount) + 1; - - compound = compound_mapcount(page); - nr = compound_nr(page); - if (PageHuge(page)) - return compound; - ret = compound; - for (i = 0; i < nr; i++) - ret += atomic_read(&page[i]._mapcount) + 1; - /* File pages has compound_mapcount included in _mapcount */ - if (!PageAnon(page)) - return ret - compound * nr; - if (PageDoubleMap(page)) - ret -= nr; - return ret; -} - /* * This calculates accurately how many mappings a transparent hugepage * has (unlike page_mapcount() which isn't fully accurate). This full diff --git a/mm/util.c b/mm/util.c index 7e43369064c8..b614f423aaa4 100644 --- a/mm/util.c +++ b/mm/util.c @@ -740,6 +740,39 @@ int __page_mapcount(struct page *page) } EXPORT_SYMBOL_GPL(__page_mapcount); +/** + * folio_mapcount() - Calculate the number of mappings of this folio. + * @folio: The folio. + * + * A large folio tracks both how many times the entire folio is mapped, + * and how many times each individual page in the folio is mapped. + * This function calculates the total number of times the folio is + * mapped. + * + * Return: The number of times this folio is mapped. + */ +int folio_mapcount(struct folio *folio) +{ + int i, compound, nr, ret; + + if (likely(!folio_test_large(folio))) + return atomic_read(&folio->_mapcount) + 1; + + compound = folio_entire_mapcount(folio); + nr = folio_nr_pages(folio); + if (folio_test_hugetlb(folio)) + return compound; + ret = compound; + for (i = 0; i < nr; i++) + ret += atomic_read(&folio_page(folio, i)->_mapcount) + 1; + /* File pages has compound_mapcount included in _mapcount */ + if (!folio_test_anon(folio)) + return ret - compound * nr; + if (folio_test_double_map(folio)) + ret -= nr; + return ret; +} + /** * folio_copy - Copy the contents of one folio to another. * @dst: Folio to copy to. -- cgit v1.2.3 From af28a988b313a601c12c410a42e485ca46adcfee Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 21 Jan 2022 10:44:52 -0500 Subject: mm/huge_memory: Convert __split_huge_pmd() to take a folio Convert split_huge_pmd_address() at the same time since it only passes the folio through, and its two callers already have a folio on hand. Removes numerous calls to compound_head() and removes an assumption that a page cannot be larger than a PMD. 
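(For reference, the call-site pattern this conversion relies on, condensed from the mm/rmap.c hunks below: a caller that still holds a page obtains the folio once and passes it down:)

	struct folio *folio = page_folio(page);
	...
	if (flags & TTU_SPLIT_HUGE_PMD)
		split_huge_pmd_address(vma, address, false, folio);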
Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/huge_mm.h | 8 ++++---- mm/huge_memory.c | 46 +++++++++++++++++++++++----------------------- mm/rmap.c | 6 ++++-- 3 files changed, 31 insertions(+), 29 deletions(-) (limited to 'mm/huge_memory.c') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 71c073d411ac..4368b314d9c8 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -194,7 +194,7 @@ static inline int split_huge_page(struct page *page) void deferred_split_huge_page(struct page *page); void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, bool freeze, struct page *page); + unsigned long address, bool freeze, struct folio *folio); #define split_huge_pmd(__vma, __pmd, __address) \ do { \ @@ -207,7 +207,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, - bool freeze, struct page *page); + bool freeze, struct folio *folio); void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, unsigned long address); @@ -406,9 +406,9 @@ static inline void deferred_split_huge_page(struct page *page) {} do { } while (0) static inline void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, bool freeze, struct page *page) {} + unsigned long address, bool freeze, struct folio *folio) {} static inline void split_huge_pmd_address(struct vm_area_struct *vma, - unsigned long address, bool freeze, struct page *page) {} + unsigned long address, bool freeze, struct folio *folio) {} #define split_huge_pud(__vma, __pmd, __address) \ do { } while (0) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index beebe4105659..583b735a079b 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2113,11 +2113,11 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, } void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, bool freeze, struct page *page) + unsigned long address, bool freeze, struct folio *folio) { spinlock_t *ptl; struct mmu_notifier_range range; - bool do_unlock_page = false; + bool do_unlock_folio = false; pmd_t _pmd; mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm, @@ -2127,20 +2127,20 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, ptl = pmd_lock(vma->vm_mm, pmd); /* - * If caller asks to setup a migration entries, we need a page to check - * pmd against. Otherwise we can end up replacing wrong page. + * If caller asks to setup a migration entry, we need a folio to check + * pmd against. Otherwise we can end up replacing wrong folio. */ - VM_BUG_ON(freeze && !page); - if (page) { - VM_WARN_ON_ONCE(!PageLocked(page)); - if (page != pmd_page(*pmd)) + VM_BUG_ON(freeze && !folio); + if (folio) { + VM_WARN_ON_ONCE(!folio_test_locked(folio)); + if (folio != page_folio(pmd_page(*pmd))) goto out; } repeat: if (pmd_trans_huge(*pmd)) { - if (!page) { - page = pmd_page(*pmd); + if (!folio) { + folio = page_folio(pmd_page(*pmd)); /* * An anonymous page must be locked, to ensure that a * concurrent reuse_swap_page() sees stable mapcount; @@ -2148,22 +2148,22 @@ repeat: * and page lock must not be taken when zap_pmd_range() * calls __split_huge_pmd() while i_mmap_lock is held. 
*/ - if (PageAnon(page)) { - if (unlikely(!trylock_page(page))) { - get_page(page); + if (folio_test_anon(folio)) { + if (unlikely(!folio_trylock(folio))) { + folio_get(folio); _pmd = *pmd; spin_unlock(ptl); - lock_page(page); + folio_lock(folio); spin_lock(ptl); if (unlikely(!pmd_same(*pmd, _pmd))) { - unlock_page(page); - put_page(page); - page = NULL; + folio_unlock(folio); + folio_put(folio); + folio = NULL; goto repeat; } - put_page(page); + folio_put(folio); } - do_unlock_page = true; + do_unlock_folio = true; } } } else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd))) @@ -2171,8 +2171,8 @@ repeat: __split_huge_pmd_locked(vma, pmd, range.start, freeze); out: spin_unlock(ptl); - if (do_unlock_page) - unlock_page(page); + if (do_unlock_folio) + folio_unlock(folio); /* * No need to double call mmu_notifier->invalidate_range() callback. * They are 3 cases to consider inside __split_huge_pmd_locked(): @@ -2190,7 +2190,7 @@ out: } void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, - bool freeze, struct page *page) + bool freeze, struct folio *folio) { pgd_t *pgd; p4d_t *p4d; @@ -2211,7 +2211,7 @@ void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, pmd = pmd_offset(pud, address); - __split_huge_pmd(vma, pmd, address, freeze, page); + __split_huge_pmd(vma, pmd, address, freeze, folio); } static inline void split_huge_pmd_if_needed(struct vm_area_struct *vma, unsigned long address) diff --git a/mm/rmap.c b/mm/rmap.c index 36eef54d05d8..8b3d44e56e30 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1410,6 +1410,7 @@ out: static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, unsigned long address, void *arg) { + struct folio *folio = page_folio(page); struct mm_struct *mm = vma->vm_mm; DEFINE_PAGE_VMA_WALK(pvmw, page, vma, address, 0); pte_t pteval; @@ -1428,7 +1429,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, pvmw.flags = PVMW_SYNC; if (flags & TTU_SPLIT_HUGE_PMD) - split_huge_pmd_address(vma, address, false, page); + split_huge_pmd_address(vma, address, false, folio); /* * For THP, we have to assume the worse case ie pmd for invalidation. @@ -1700,6 +1701,7 @@ void try_to_unmap(struct page *page, enum ttu_flags flags) static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma, unsigned long address, void *arg) { + struct folio *folio = page_folio(page); struct mm_struct *mm = vma->vm_mm; DEFINE_PAGE_VMA_WALK(pvmw, page, vma, address, 0); pte_t pteval; @@ -1722,7 +1724,7 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma, * TTU_SPLIT_HUGE_PMD and it wants to freeze. */ if (flags & TTU_SPLIT_HUGE_PMD) - split_huge_pmd_address(vma, address, true, page); + split_huge_pmd_address(vma, address, true, folio); /* * For THP, we have to assume the worse case ie pmd for invalidation. -- cgit v1.2.3 From 869f7ee6f6477341f859c8b0949ae81caf9ca7f3 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 15 Feb 2022 09:28:49 -0500 Subject: mm/rmap: Convert try_to_unmap() to take a folio Change all three callers and the worker function try_to_unmap_one(). 
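(For reference, the caller-side change, condensed from the mm/khugepaged.c hunk below; callers that still hold a page convert with page_folio():)

	if (page_mapped(page))
		try_to_unmap(page_folio(page),
				TTU_IGNORE_MLOCK | TTU_BATCH_FLUSH);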
Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/rmap.h | 4 +-- mm/huge_memory.c | 3 +- mm/khugepaged.c | 3 +- mm/memory-failure.c | 7 +++-- mm/memory_hotplug.c | 13 ++++---- mm/rmap.c | 83 +++++++++++++++++++++++++++------------------------- mm/vmscan.c | 2 +- 7 files changed, 62 insertions(+), 53 deletions(-) (limited to 'mm/huge_memory.c') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 69a1664216de..a0c5c38c733f 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -194,7 +194,7 @@ int folio_referenced(struct folio *, int is_locked, struct mem_cgroup *memcg, unsigned long *vm_flags); void try_to_migrate(struct page *page, enum ttu_flags flags); -void try_to_unmap(struct page *, enum ttu_flags flags); +void try_to_unmap(struct folio *, enum ttu_flags flags); int make_device_exclusive_range(struct mm_struct *mm, unsigned long start, unsigned long end, struct page **pages, @@ -309,7 +309,7 @@ static inline int folio_referenced(struct folio *folio, int is_locked, return 0; } -static inline void try_to_unmap(struct page *page, enum ttu_flags flags) +static inline void try_to_unmap(struct folio *folio, enum ttu_flags flags) { } diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 583b735a079b..de684427f79c 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2251,6 +2251,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma, static void unmap_page(struct page *page) { + struct folio *folio = page_folio(page); enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD | TTU_SYNC; @@ -2264,7 +2265,7 @@ static void unmap_page(struct page *page) if (PageAnon(page)) try_to_migrate(page, ttu_flags); else - try_to_unmap(page, ttu_flags | TTU_IGNORE_MLOCK); + try_to_unmap(folio, ttu_flags | TTU_IGNORE_MLOCK); VM_WARN_ON_ONCE_PAGE(page_mapped(page), page); } diff --git a/mm/khugepaged.c b/mm/khugepaged.c index fa05e6d39783..1cdf7c38b9e5 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1834,7 +1834,8 @@ static void collapse_file(struct mm_struct *mm, } if (page_mapped(page)) - try_to_unmap(page, TTU_IGNORE_MLOCK | TTU_BATCH_FLUSH); + try_to_unmap(page_folio(page), + TTU_IGNORE_MLOCK | TTU_BATCH_FLUSH); xas_lock_irq(&xas); xas_set(&xas, index); diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 0b72a936b8dd..258913d5e036 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1347,6 +1347,7 @@ static int get_hwpoison_page(struct page *p, unsigned long flags) static bool hwpoison_user_mappings(struct page *p, unsigned long pfn, int flags, struct page *hpage) { + struct folio *folio = page_folio(hpage); enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC; struct address_space *mapping; LIST_HEAD(tokill); @@ -1412,7 +1413,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn, collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED); if (!PageHuge(hpage)) { - try_to_unmap(hpage, ttu); + try_to_unmap(folio, ttu); } else { if (!PageAnon(hpage)) { /* @@ -1424,12 +1425,12 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn, */ mapping = hugetlb_page_mapping_lock_write(hpage); if (mapping) { - try_to_unmap(hpage, ttu|TTU_RMAP_LOCKED); + try_to_unmap(folio, ttu|TTU_RMAP_LOCKED); i_mmap_unlock_write(mapping); } else pr_info("Memory failure: %#lx: could not lock mapping for mapped huge page\n", pfn); } else { - try_to_unmap(hpage, ttu); + try_to_unmap(folio, ttu); } } diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 2a9627dc784c..914057da53c7 100644 --- a/mm/memory_hotplug.c +++ 
b/mm/memory_hotplug.c @@ -1690,10 +1690,13 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) DEFAULT_RATELIMIT_BURST); for (pfn = start_pfn; pfn < end_pfn; pfn++) { + struct folio *folio; + if (!pfn_valid(pfn)) continue; page = pfn_to_page(pfn); - head = compound_head(page); + folio = page_folio(page); + head = &folio->page; if (PageHuge(page)) { pfn = page_to_pfn(head) + compound_nr(head) - 1; @@ -1710,10 +1713,10 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) * the unmap as the catch all safety net). */ if (PageHWPoison(page)) { - if (WARN_ON(PageLRU(page))) - isolate_lru_page(page); - if (page_mapped(page)) - try_to_unmap(page, TTU_IGNORE_MLOCK); + if (WARN_ON(folio_test_lru(folio))) + folio_isolate_lru(folio); + if (folio_mapped(folio)) + try_to_unmap(folio, TTU_IGNORE_MLOCK); continue; } diff --git a/mm/rmap.c b/mm/rmap.c index 8b3d44e56e30..cf6e3de9d2f7 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1412,7 +1412,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, { struct folio *folio = page_folio(page); struct mm_struct *mm = vma->vm_mm; - DEFINE_PAGE_VMA_WALK(pvmw, page, vma, address, 0); + DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); pte_t pteval; struct page *subpage; bool ret = true; @@ -1436,13 +1436,13 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, * For hugetlb, it could be much worse if we need to do pud * invalidation in the case of pmd sharing. * - * Note that the page can not be free in this function as call of - * try_to_unmap() must hold a reference on the page. + * Note that the folio can not be freed in this function as call of + * try_to_unmap() must hold a reference on the folio. */ range.end = vma_address_end(&pvmw); mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm, address, range.end); - if (PageHuge(page)) { + if (folio_test_hugetlb(folio)) { /* * If sharing is possible, start and end will be adjusted * accordingly. @@ -1454,24 +1454,25 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, while (page_vma_mapped_walk(&pvmw)) { /* Unexpected PMD-mapped THP? */ - VM_BUG_ON_PAGE(!pvmw.pte, page); + VM_BUG_ON_FOLIO(!pvmw.pte, folio); /* - * If the page is in an mlock()d vma, we must not swap it out. + * If the folio is in an mlock()d vma, we must not swap it out. */ if (!(flags & TTU_IGNORE_MLOCK) && (vma->vm_flags & VM_LOCKED)) { /* Restore the mlock which got missed */ - mlock_vma_page(page, vma, false); + mlock_vma_folio(folio, vma, false); page_vma_mapped_walk_done(&pvmw); ret = false; break; } - subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte); + subpage = folio_page(folio, + pte_pfn(*pvmw.pte) - folio_pfn(folio)); address = pvmw.address; - if (PageHuge(page) && !PageAnon(page)) { + if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) { /* * To call huge_pmd_unshare, i_mmap_rwsem must be * held in write mode. Caller needs to explicitly @@ -1510,7 +1511,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, if (should_defer_flush(mm, flags)) { /* * We clear the PTE but do not flush so potentially - * a remote CPU could still be writing to the page. + * a remote CPU could still be writing to the folio. 
* If the entry was previously clean then the * architecture must guarantee that a clear->dirty * transition on a cached TLB entry is written through @@ -1523,22 +1524,22 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, pteval = ptep_clear_flush(vma, address, pvmw.pte); } - /* Move the dirty bit to the page. Now the pte is gone. */ + /* Set the dirty flag on the folio now the pte is gone. */ if (pte_dirty(pteval)) - set_page_dirty(page); + folio_mark_dirty(folio); /* Update high watermark before we lower rss */ update_hiwater_rss(mm); - if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) { + if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) { pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); - if (PageHuge(page)) { - hugetlb_count_sub(compound_nr(page), mm); + if (folio_test_hugetlb(folio)) { + hugetlb_count_sub(folio_nr_pages(folio), mm); set_huge_swap_pte_at(mm, address, pvmw.pte, pteval, vma_mmu_pagesize(vma)); } else { - dec_mm_counter(mm, mm_counter(page)); + dec_mm_counter(mm, mm_counter(&folio->page)); set_pte_at(mm, address, pvmw.pte, pteval); } @@ -1553,18 +1554,19 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, * migration) will not expect userfaults on already * copied pages. */ - dec_mm_counter(mm, mm_counter(page)); + dec_mm_counter(mm, mm_counter(&folio->page)); /* We have to invalidate as we cleared the pte */ mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE); - } else if (PageAnon(page)) { + } else if (folio_test_anon(folio)) { swp_entry_t entry = { .val = page_private(subpage) }; pte_t swp_pte; /* * Store the swap location in the pte. * See handle_pte_fault() ... */ - if (unlikely(PageSwapBacked(page) != PageSwapCache(page))) { + if (unlikely(folio_test_swapbacked(folio) != + folio_test_swapcache(folio))) { WARN_ON_ONCE(1); ret = false; /* We have to invalidate as we cleared the pte */ @@ -1575,8 +1577,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, } /* MADV_FREE page check */ - if (!PageSwapBacked(page)) { - if (!PageDirty(page)) { + if (!folio_test_swapbacked(folio)) { + if (!folio_test_dirty(folio)) { /* Invalidate as we cleared the pte */ mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE); @@ -1585,11 +1587,11 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, } /* - * If the page was redirtied, it cannot be + * If the folio was redirtied, it cannot be * discarded. Remap the page to page table. */ set_pte_at(mm, address, pvmw.pte, pteval); - SetPageSwapBacked(page); + folio_set_swapbacked(folio); ret = false; page_vma_mapped_walk_done(&pvmw); break; @@ -1626,16 +1628,17 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, address + PAGE_SIZE); } else { /* - * This is a locked file-backed page, thus it cannot - * be removed from the page cache and replaced by a new - * page before mmu_notifier_invalidate_range_end, so no - * concurrent thread might update its page table to - * point at new page while a device still is using this - * page. + * This is a locked file-backed folio, + * so it cannot be removed from the page + * cache and replaced by a new folio before + * mmu_notifier_invalidate_range_end, so no + * concurrent thread might update its page table + * to point at a new folio while a device is + * still using this folio. 
* * See Documentation/vm/mmu_notifier.rst */ - dec_mm_counter(mm, mm_counter_file(page)); + dec_mm_counter(mm, mm_counter_file(&folio->page)); } discard: /* @@ -1645,10 +1648,10 @@ discard: * * See Documentation/vm/mmu_notifier.rst */ - page_remove_rmap(subpage, vma, PageHuge(page)); + page_remove_rmap(subpage, vma, folio_test_hugetlb(folio)); if (vma->vm_flags & VM_LOCKED) mlock_page_drain(smp_processor_id()); - put_page(page); + folio_put(folio); } mmu_notifier_invalidate_range_end(&range); @@ -1667,17 +1670,17 @@ static int page_not_mapped(struct page *page) } /** - * try_to_unmap - try to remove all page table mappings to a page - * @page: the page to get unmapped + * try_to_unmap - Try to remove all page table mappings to a folio. + * @folio: The folio to unmap. * @flags: action and flags * * Tries to remove all the page table entries which are mapping this - * page, used in the pageout path. Caller must hold the page lock. + * folio. It is the caller's responsibility to check if the folio is + * still mapped if needed (use TTU_SYNC to prevent accounting races). * - * It is the caller's responsibility to check if the page is still - * mapped when needed (use TTU_SYNC to prevent accounting races). + * Context: Caller must hold the folio lock. */ -void try_to_unmap(struct page *page, enum ttu_flags flags) +void try_to_unmap(struct folio *folio, enum ttu_flags flags) { struct rmap_walk_control rwc = { .rmap_one = try_to_unmap_one, @@ -1687,9 +1690,9 @@ void try_to_unmap(struct page *page, enum ttu_flags flags) }; if (flags & TTU_RMAP_LOCKED) - rmap_walk_locked(page, &rwc); + rmap_walk_locked(&folio->page, &rwc); else - rmap_walk(page, &rwc); + rmap_walk(&folio->page, &rwc); } /* diff --git a/mm/vmscan.c b/mm/vmscan.c index 38f124c41bcd..a57eb747f08d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1768,7 +1768,7 @@ retry: if (unlikely(PageTransHuge(page))) flags |= TTU_SPLIT_HUGE_PMD; - try_to_unmap(page, flags); + try_to_unmap(folio, flags); if (page_mapped(page)) { stat->nr_unmap_fail += nr_pages; if (!was_swapbacked && PageSwapBacked(page)) -- cgit v1.2.3 From 4b8554c527f3cfa183f6c06d231a9387873205a0 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 28 Jan 2022 14:29:43 -0500 Subject: mm/rmap: Convert try_to_migrate() to folios Convert the callers to pass a folio and the try_to_migrate_one() worker to use a folio throughout. Fixes an assumption that a folio must be <= PMD size. 
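(For reference, how the PMD-size assumption is dropped, condensed from the try_to_migrate_one() hunk below: the PMD-mapped case now derives the subpage from the pfn offset into the folio rather than assuming the folio is exactly one PMD:)

	if (!pvmw.pte) {
		subpage = folio_page(folio,
				pmd_pfn(*pvmw.pmd) - folio_pfn(folio));
		VM_BUG_ON_FOLIO(folio_test_hugetlb(folio) ||
				!folio_test_pmd_mappable(folio), folio);
		set_pmd_migration_entry(&pvmw, subpage);
		continue;
	}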
Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/rmap.h | 2 +- mm/huge_memory.c | 4 ++-- mm/migrate.c | 6 ++++-- mm/migrate_device.c | 6 ++++-- mm/rmap.c | 59 +++++++++++++++++++++++++++------------------------- 5 files changed, 42 insertions(+), 35 deletions(-) (limited to 'mm/huge_memory.c') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index a0c5c38c733f..e6e935c81414 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -193,7 +193,7 @@ static inline void page_dup_rmap(struct page *page, bool compound) int folio_referenced(struct folio *, int is_locked, struct mem_cgroup *memcg, unsigned long *vm_flags); -void try_to_migrate(struct page *page, enum ttu_flags flags); +void try_to_migrate(struct folio *folio, enum ttu_flags flags); void try_to_unmap(struct folio *, enum ttu_flags flags); int make_device_exclusive_range(struct mm_struct *mm, unsigned long start, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index de684427f79c..7df1934d6528 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2262,8 +2262,8 @@ static void unmap_page(struct page *page) * pages can simply be left unmapped, then faulted back on demand. * If that is ever changed (perhaps for mlock), update remap_page(). */ - if (PageAnon(page)) - try_to_migrate(page, ttu_flags); + if (folio_test_anon(folio)) + try_to_migrate(folio, ttu_flags); else try_to_unmap(folio, ttu_flags | TTU_IGNORE_MLOCK); diff --git a/mm/migrate.c b/mm/migrate.c index 358bc311caaa..6ed85a5d1be5 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -912,6 +912,7 @@ out: static int __unmap_and_move(struct page *page, struct page *newpage, int force, enum migrate_mode mode) { + struct folio *folio = page_folio(page); int rc = -EAGAIN; bool page_was_mapped = false; struct anon_vma *anon_vma = NULL; @@ -1015,7 +1016,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage, /* Establish migration ptes */ VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma, page); - try_to_migrate(page, 0); + try_to_migrate(folio, 0); page_was_mapped = true; } @@ -1165,6 +1166,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, enum migrate_mode mode, int reason, struct list_head *ret) { + struct folio *src = page_folio(hpage); int rc = -EAGAIN; int page_was_mapped = 0; struct page *new_hpage; @@ -1241,7 +1243,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, ttu |= TTU_RMAP_LOCKED; } - try_to_migrate(hpage, ttu); + try_to_migrate(src, ttu); page_was_mapped = 1; if (mapping_locked) diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 0326b901d2fd..b2c611d4bdb2 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -333,6 +333,7 @@ static void migrate_vma_unmap(struct migrate_vma *migrate) for (i = 0; i < npages; i++) { struct page *page = migrate_pfn_to_page(migrate->src[i]); + struct folio *folio; if (!page) continue; @@ -356,8 +357,9 @@ static void migrate_vma_unmap(struct migrate_vma *migrate) put_page(page); } - if (page_mapped(page)) - try_to_migrate(page, 0); + folio = page_folio(page); + if (folio_mapped(folio)) + try_to_migrate(folio, 0); if (page_mapped(page) || !migrate_vma_check_page(page)) { if (!is_zone_device_page(page)) { diff --git a/mm/rmap.c b/mm/rmap.c index cf6e3de9d2f7..8497da29193c 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1706,7 +1706,7 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma, { struct folio *folio = page_folio(page); struct mm_struct *mm = vma->vm_mm; - DEFINE_PAGE_VMA_WALK(pvmw, page, vma, address, 0); + 
DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); pte_t pteval; struct page *subpage; bool ret = true; @@ -1740,7 +1740,7 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma, range.end = vma_address_end(&pvmw); mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm, address, range.end); - if (PageHuge(page)) { + if (folio_test_hugetlb(folio)) { /* * If sharing is possible, start and end will be adjusted * accordingly. @@ -1754,21 +1754,24 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma, #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION /* PMD-mapped THP migration entry */ if (!pvmw.pte) { - VM_BUG_ON_PAGE(PageHuge(page) || - !PageTransCompound(page), page); + subpage = folio_page(folio, + pmd_pfn(*pvmw.pmd) - folio_pfn(folio)); + VM_BUG_ON_FOLIO(folio_test_hugetlb(folio) || + !folio_test_pmd_mappable(folio), folio); - set_pmd_migration_entry(&pvmw, page); + set_pmd_migration_entry(&pvmw, subpage); continue; } #endif /* Unexpected PMD-mapped THP? */ - VM_BUG_ON_PAGE(!pvmw.pte, page); + VM_BUG_ON_FOLIO(!pvmw.pte, folio); - subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte); + subpage = folio_page(folio, + pte_pfn(*pvmw.pte) - folio_pfn(folio)); address = pvmw.address; - if (PageHuge(page) && !PageAnon(page)) { + if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) { /* * To call huge_pmd_unshare, i_mmap_rwsem must be * held in write mode. Caller needs to explicitly @@ -1806,15 +1809,15 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma, flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); pteval = ptep_clear_flush(vma, address, pvmw.pte); - /* Move the dirty bit to the page. Now the pte is gone. */ + /* Set the dirty flag on the folio now the pte is gone. */ if (pte_dirty(pteval)) - set_page_dirty(page); + folio_mark_dirty(folio); /* Update high watermark before we lower rss */ update_hiwater_rss(mm); - if (is_zone_device_page(page)) { - unsigned long pfn = page_to_pfn(page); + if (folio_is_zone_device(folio)) { + unsigned long pfn = folio_pfn(folio); swp_entry_t entry; pte_t swp_pte; @@ -1850,16 +1853,16 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma, * changed when hugepage migrations to device private * memory are supported. */ - subpage = page; - } else if (PageHWPoison(page)) { + subpage = &folio->page; + } else if (PageHWPoison(subpage)) { pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); - if (PageHuge(page)) { - hugetlb_count_sub(compound_nr(page), mm); + if (folio_test_hugetlb(folio)) { + hugetlb_count_sub(folio_nr_pages(folio), mm); set_huge_swap_pte_at(mm, address, pvmw.pte, pteval, vma_mmu_pagesize(vma)); } else { - dec_mm_counter(mm, mm_counter(page)); + dec_mm_counter(mm, mm_counter(&folio->page)); set_pte_at(mm, address, pvmw.pte, pteval); } @@ -1874,7 +1877,7 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma, * migration) will not expect userfaults on already * copied pages. 
*/ - dec_mm_counter(mm, mm_counter(page)); + dec_mm_counter(mm, mm_counter(&folio->page)); /* We have to invalidate as we cleared the pte */ mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE); @@ -1920,10 +1923,10 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma, * * See Documentation/vm/mmu_notifier.rst */ - page_remove_rmap(subpage, vma, PageHuge(page)); + page_remove_rmap(subpage, vma, folio_test_hugetlb(folio)); if (vma->vm_flags & VM_LOCKED) mlock_page_drain(smp_processor_id()); - put_page(page); + folio_put(folio); } mmu_notifier_invalidate_range_end(&range); @@ -1933,13 +1936,13 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma, /** * try_to_migrate - try to replace all page table mappings with swap entries - * @page: the page to replace page table entries for + * @folio: the folio to replace page table entries for * @flags: action and flags * - * Tries to remove all the page table entries which are mapping this page and - * replace them with special swap entries. Caller must hold the page lock. + * Tries to remove all the page table entries which are mapping this folio and + * replace them with special swap entries. Caller must hold the folio lock. */ -void try_to_migrate(struct page *page, enum ttu_flags flags) +void try_to_migrate(struct folio *folio, enum ttu_flags flags) { struct rmap_walk_control rwc = { .rmap_one = try_to_migrate_one, @@ -1956,7 +1959,7 @@ void try_to_migrate(struct page *page, enum ttu_flags flags) TTU_SYNC))) return; - if (is_zone_device_page(page) && !is_device_private_page(page)) + if (folio_is_zone_device(folio) && !folio_is_device_private(folio)) return; /* @@ -1967,13 +1970,13 @@ void try_to_migrate(struct page *page, enum ttu_flags flags) * locking requirements of exec(), migration skips * temporary VMAs until after exec() completes. */ - if (!PageKsm(page) && PageAnon(page)) + if (!folio_test_ksm(folio) && folio_test_anon(folio)) rwc.invalid_vma = invalid_migration_vma; if (flags & TTU_RMAP_LOCKED) - rmap_walk_locked(page, &rwc); + rmap_walk_locked(&folio->page, &rwc); else - rmap_walk(page, &rwc); + rmap_walk(&folio->page, &rwc); } #ifdef CONFIG_DEVICE_PRIVATE -- cgit v1.2.3 From 4eecb8b9163df82c87c91764a02fff228ef25f6d Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 28 Jan 2022 23:32:59 -0500 Subject: mm/migrate: Convert remove_migration_ptes() to folios Convert the implementation and all callers. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/rmap.h | 2 +- mm/huge_memory.c | 24 ++++++++++++----------- mm/migrate.c | 55 ++++++++++++++++++++++++++++------------------------ mm/migrate_device.c | 15 +++++++++----- 4 files changed, 54 insertions(+), 42 deletions(-) (limited to 'mm/huge_memory.c') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index e6e935c81414..21af80d5b711 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -261,7 +261,7 @@ unsigned long page_address_in_vma(struct page *, struct vm_area_struct *); */ int folio_mkclean(struct folio *); -void remove_migration_ptes(struct page *old, struct page *new, bool locked); +void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked); /* * Called by memory-failure.c to kill processes. 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 7df1934d6528..d55b25f1ceba 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2270,18 +2270,19 @@ static void unmap_page(struct page *page) VM_WARN_ON_ONCE_PAGE(page_mapped(page), page); } -static void remap_page(struct page *page, unsigned int nr) +static void remap_page(struct folio *folio, unsigned long nr) { - int i; + int i = 0; /* If unmap_page() uses try_to_migrate() on file, remove this check */ - if (!PageAnon(page)) + if (!folio_test_anon(folio)) return; - if (PageTransHuge(page)) { - remove_migration_ptes(page, page, true); - } else { - for (i = 0; i < nr; i++) - remove_migration_ptes(page + i, page + i, true); + for (;;) { + remove_migration_ptes(folio, folio, true); + i += folio_nr_pages(folio); + if (i >= nr) + break; + folio = folio_next(folio); } } @@ -2441,7 +2442,7 @@ static void __split_huge_page(struct page *page, struct list_head *list, } local_irq_enable(); - remap_page(head, nr); + remap_page(folio, nr); if (PageSwapCache(head)) { swp_entry_t entry = { .val = page_private(head) }; @@ -2550,7 +2551,8 @@ bool can_split_huge_page(struct page *page, int *pextra_pins) */ int split_huge_page_to_list(struct page *page, struct list_head *list) { - struct page *head = compound_head(page); + struct folio *folio = page_folio(page); + struct page *head = &folio->page; struct deferred_split *ds_queue = get_deferred_split_queue(head); XA_STATE(xas, &head->mapping->i_pages, head->index); struct anon_vma *anon_vma = NULL; @@ -2667,7 +2669,7 @@ fail: if (mapping) xas_unlock(&xas); local_irq_enable(); - remap_page(head, thp_nr_pages(head)); + remap_page(folio, folio_nr_pages(folio)); ret = -EBUSY; } diff --git a/mm/migrate.c b/mm/migrate.c index 6ed85a5d1be5..eba3cd5376e3 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -174,30 +174,32 @@ void putback_movable_pages(struct list_head *l) static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma, unsigned long addr, void *old) { - DEFINE_PAGE_VMA_WALK(pvmw, (struct page *)old, vma, addr, - PVMW_SYNC | PVMW_MIGRATION); - struct page *new; - pte_t pte; - swp_entry_t entry; + struct folio *folio = page_folio(page); + DEFINE_FOLIO_VMA_WALK(pvmw, old, vma, addr, PVMW_SYNC | PVMW_MIGRATION); VM_BUG_ON_PAGE(PageTail(page), page); while (page_vma_mapped_walk(&pvmw)) { - if (PageKsm(page)) - new = page; - else - new = page - pvmw.pgoff + - linear_page_index(vma, pvmw.address); + pte_t pte; + swp_entry_t entry; + struct page *new; + unsigned long idx = 0; + + /* pgoff is invalid for ksm pages, but they are never large */ + if (folio_test_large(folio) && !folio_test_hugetlb(folio)) + idx = linear_page_index(vma, pvmw.address) - pvmw.pgoff; + new = folio_page(folio, idx); #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION /* PMD-mapped THP migration entry */ if (!pvmw.pte) { - VM_BUG_ON_PAGE(PageHuge(page) || !PageTransCompound(page), page); + VM_BUG_ON_FOLIO(folio_test_hugetlb(folio) || + !folio_test_pmd_mappable(folio), folio); remove_migration_pmd(&pvmw, new); continue; } #endif - get_page(new); + folio_get(folio); pte = pte_mkold(mk_pte(new, READ_ONCE(vma->vm_page_prot))); if (pte_swp_soft_dirty(*pvmw.pte)) pte = pte_mksoft_dirty(pte); @@ -226,12 +228,12 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma, } #ifdef CONFIG_HUGETLB_PAGE - if (PageHuge(new)) { + if (folio_test_hugetlb(folio)) { unsigned int shift = huge_page_shift(hstate_vma(vma)); pte = pte_mkhuge(pte); pte = arch_make_huge_pte(pte, shift, vma->vm_flags); - if (PageAnon(new)) + if 
(folio_test_anon(folio)) hugepage_add_anon_rmap(new, vma, pvmw.address); else page_dup_rmap(new, true); @@ -239,7 +241,7 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma, } else #endif { - if (PageAnon(new)) + if (folio_test_anon(folio)) page_add_anon_rmap(new, vma, pvmw.address, false); else page_add_file_rmap(new, vma, false); @@ -259,17 +261,17 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma, * Get rid of all migration entries and replace them by * references to the indicated page. */ -void remove_migration_ptes(struct page *old, struct page *new, bool locked) +void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked) { struct rmap_walk_control rwc = { .rmap_one = remove_migration_pte, - .arg = old, + .arg = src, }; if (locked) - rmap_walk_locked(new, &rwc); + rmap_walk_locked(&dst->page, &rwc); else - rmap_walk(new, &rwc); + rmap_walk(&dst->page, &rwc); } /* @@ -756,6 +758,7 @@ int buffer_migrate_page_norefs(struct address_space *mapping, */ static int writeout(struct address_space *mapping, struct page *page) { + struct folio *folio = page_folio(page); struct writeback_control wbc = { .sync_mode = WB_SYNC_NONE, .nr_to_write = 1, @@ -781,7 +784,7 @@ static int writeout(struct address_space *mapping, struct page *page) * At this point we know that the migration attempt cannot * be successful. */ - remove_migration_ptes(page, page, false); + remove_migration_ptes(folio, folio, false); rc = mapping->a_ops->writepage(page, &wbc); @@ -913,6 +916,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage, int force, enum migrate_mode mode) { struct folio *folio = page_folio(page); + struct folio *dst = page_folio(newpage); int rc = -EAGAIN; bool page_was_mapped = false; struct anon_vma *anon_vma = NULL; @@ -1039,8 +1043,8 @@ static int __unmap_and_move(struct page *page, struct page *newpage, } if (page_was_mapped) - remove_migration_ptes(page, - rc == MIGRATEPAGE_SUCCESS ? newpage : page, false); + remove_migration_ptes(folio, + rc == MIGRATEPAGE_SUCCESS ? dst : folio, false); out_unlock_both: unlock_page(newpage); @@ -1166,7 +1170,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, enum migrate_mode mode, int reason, struct list_head *ret) { - struct folio *src = page_folio(hpage); + struct folio *dst, *src = page_folio(hpage); int rc = -EAGAIN; int page_was_mapped = 0; struct page *new_hpage; @@ -1194,6 +1198,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, new_hpage = get_new_page(hpage, private); if (!new_hpage) return -ENOMEM; + dst = page_folio(new_hpage); if (!trylock_page(hpage)) { if (!force) @@ -1254,8 +1259,8 @@ static int unmap_and_move_huge_page(new_page_t get_new_page, rc = move_to_new_page(new_hpage, hpage, mode); if (page_was_mapped) - remove_migration_ptes(hpage, - rc == MIGRATEPAGE_SUCCESS ? new_hpage : hpage, false); + remove_migration_ptes(src, + rc == MIGRATEPAGE_SUCCESS ? 
dst : src, false); unlock_put_anon: unlock_page(new_hpage); diff --git a/mm/migrate_device.c b/mm/migrate_device.c index b2c611d4bdb2..70c7dc05bbfc 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -376,15 +376,17 @@ static void migrate_vma_unmap(struct migrate_vma *migrate) for (i = 0; i < npages && restore; i++) { struct page *page = migrate_pfn_to_page(migrate->src[i]); + struct folio *folio; if (!page || (migrate->src[i] & MIGRATE_PFN_MIGRATE)) continue; - remove_migration_ptes(page, page, false); + folio = page_folio(page); + remove_migration_ptes(folio, folio, false); migrate->src[i] = 0; - unlock_page(page); - put_page(page); + folio_unlock(folio); + folio_put(folio); restore--; } } @@ -729,6 +731,7 @@ void migrate_vma_finalize(struct migrate_vma *migrate) unsigned long i; for (i = 0; i < npages; i++) { + struct folio *dst, *src; struct page *newpage = migrate_pfn_to_page(migrate->dst[i]); struct page *page = migrate_pfn_to_page(migrate->src[i]); @@ -748,8 +751,10 @@ void migrate_vma_finalize(struct migrate_vma *migrate) newpage = page; } - remove_migration_ptes(page, newpage, false); - unlock_page(page); + src = page_folio(page); + dst = page_folio(newpage); + remove_migration_ptes(src, dst, false); + folio_unlock(src); if (is_zone_device_page(page)) put_page(page); -- cgit v1.2.3 From 2f031c6f042cb8a9b221a8b6b80e69de5170f830 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 29 Jan 2022 16:06:53 -0500 Subject: mm/rmap: Convert rmap_walk() to take a folio This ripples all the way through to every calling and called function from rmap. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/ksm.h | 4 +- include/linux/rmap.h | 11 +++-- mm/damon/paddr.c | 15 ++++--- mm/folio-compat.c | 7 ---- mm/huge_memory.c | 2 +- mm/ksm.c | 12 +++--- mm/migrate.c | 10 ++--- mm/page_idle.c | 7 ++-- mm/rmap.c | 111 ++++++++++++++++++++++++--------------------------- 9 files changed, 80 insertions(+), 99 deletions(-) (limited to 'mm/huge_memory.c') diff --git a/include/linux/ksm.h b/include/linux/ksm.h index a38a5bca1ba5..0b4f17418f64 100644 --- a/include/linux/ksm.h +++ b/include/linux/ksm.h @@ -51,7 +51,7 @@ static inline void ksm_exit(struct mm_struct *mm) struct page *ksm_might_need_to_copy(struct page *page, struct vm_area_struct *vma, unsigned long address); -void rmap_walk_ksm(struct page *page, struct rmap_walk_control *rwc); +void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc); void folio_migrate_ksm(struct folio *newfolio, struct folio *folio); #else /* !CONFIG_KSM */ @@ -78,7 +78,7 @@ static inline struct page *ksm_might_need_to_copy(struct page *page, return page; } -static inline void rmap_walk_ksm(struct page *page, +static inline void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc) { } diff --git a/include/linux/rmap.h b/include/linux/rmap.h index be020d38b0a5..a74d811c5b3f 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -266,7 +266,6 @@ void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked); /* * Called by memory-failure.c to kill processes. */ -struct anon_vma *page_lock_anon_vma_read(struct page *page); struct anon_vma *folio_lock_anon_vma_read(struct folio *folio); void page_unlock_anon_vma_read(struct anon_vma *anon_vma); int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma); @@ -286,15 +285,15 @@ struct rmap_walk_control { * Return false if page table scanning in rmap_walk should be stopped. * Otherwise, return true. 
*/ - bool (*rmap_one)(struct page *page, struct vm_area_struct *vma, + bool (*rmap_one)(struct folio *folio, struct vm_area_struct *vma, unsigned long addr, void *arg); - int (*done)(struct page *page); - struct anon_vma *(*anon_lock)(struct page *page); + int (*done)(struct folio *folio); + struct anon_vma *(*anon_lock)(struct folio *folio); bool (*invalid_vma)(struct vm_area_struct *vma, void *arg); }; -void rmap_walk(struct page *page, struct rmap_walk_control *rwc); -void rmap_walk_locked(struct page *page, struct rmap_walk_control *rwc); +void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc); +void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc); #else /* !CONFIG_MMU */ diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c index eee8a3395d71..ae24549921e2 100644 --- a/mm/damon/paddr.c +++ b/mm/damon/paddr.c @@ -16,10 +16,10 @@ #include "../internal.h" #include "prmtv-common.h" -static bool __damon_pa_mkold(struct page *page, struct vm_area_struct *vma, +static bool __damon_pa_mkold(struct folio *folio, struct vm_area_struct *vma, unsigned long addr, void *arg) { - DEFINE_PAGE_VMA_WALK(pvmw, page, vma, addr, 0); + DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, addr, 0); while (page_vma_mapped_walk(&pvmw)) { addr = pvmw.address; @@ -37,7 +37,7 @@ static void damon_pa_mkold(unsigned long paddr) struct page *page = damon_get_page(PHYS_PFN(paddr)); struct rmap_walk_control rwc = { .rmap_one = __damon_pa_mkold, - .anon_lock = page_lock_anon_vma_read, + .anon_lock = folio_lock_anon_vma_read, }; bool need_lock; @@ -54,7 +54,7 @@ static void damon_pa_mkold(unsigned long paddr) if (need_lock && !folio_trylock(folio)) goto out; - rmap_walk(&folio->page, &rwc); + rmap_walk(folio, &rwc); if (need_lock) folio_unlock(folio); @@ -87,10 +87,9 @@ struct damon_pa_access_chk_result { bool accessed; }; -static bool __damon_pa_young(struct page *page, struct vm_area_struct *vma, +static bool __damon_pa_young(struct folio *folio, struct vm_area_struct *vma, unsigned long addr, void *arg) { - struct folio *folio = page_folio(page); struct damon_pa_access_chk_result *result = arg; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, addr, 0); @@ -133,7 +132,7 @@ static bool damon_pa_young(unsigned long paddr, unsigned long *page_sz) struct rmap_walk_control rwc = { .arg = &result, .rmap_one = __damon_pa_young, - .anon_lock = page_lock_anon_vma_read, + .anon_lock = folio_lock_anon_vma_read, }; bool need_lock; @@ -156,7 +155,7 @@ static bool damon_pa_young(unsigned long paddr, unsigned long *page_sz) return NULL; } - rmap_walk(&folio->page, &rwc); + rmap_walk(folio, &rwc); if (need_lock) folio_unlock(folio); diff --git a/mm/folio-compat.c b/mm/folio-compat.c index 968ad97bbffa..46fa179e32fb 100644 --- a/mm/folio-compat.c +++ b/mm/folio-compat.c @@ -164,10 +164,3 @@ void putback_lru_page(struct page *page) { folio_putback_lru(page_folio(page)); } - -#ifdef CONFIG_MMU -struct anon_vma *page_lock_anon_vma_read(struct page *page) -{ - return folio_lock_anon_vma_read(page_folio(page)); -} -#endif diff --git a/mm/huge_memory.c b/mm/huge_memory.c index d55b25f1ceba..d874d50e703b 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2572,7 +2572,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) * The caller does not necessarily hold an mmap_lock that would * prevent the anon_vma disappearing so we first we take a * reference to it and then lock the anon_vma for write. 
This - * is similar to page_lock_anon_vma_read except the write lock + * is similar to folio_lock_anon_vma_read except the write lock * is taken to serialise against parallel split or collapse * operations. */ diff --git a/mm/ksm.c b/mm/ksm.c index b25d545e0cd1..dd737f925c04 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -2588,21 +2588,21 @@ struct page *ksm_might_need_to_copy(struct page *page, return new_page; } -void rmap_walk_ksm(struct page *page, struct rmap_walk_control *rwc) +void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc) { struct stable_node *stable_node; struct rmap_item *rmap_item; int search_new_forks = 0; - VM_BUG_ON_PAGE(!PageKsm(page), page); + VM_BUG_ON_FOLIO(!folio_test_ksm(folio), folio); /* * Rely on the page lock to protect against concurrent modifications * to that page's node of the stable tree. */ - VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); - stable_node = page_stable_node(page); + stable_node = folio_stable_node(folio); if (!stable_node) return; again: @@ -2637,11 +2637,11 @@ again: if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg)) continue; - if (!rwc->rmap_one(page, vma, addr, rwc->arg)) { + if (!rwc->rmap_one(folio, vma, addr, rwc->arg)) { anon_vma_unlock_read(anon_vma); return; } - if (rwc->done && rwc->done(page)) { + if (rwc->done && rwc->done(folio)) { anon_vma_unlock_read(anon_vma); return; } diff --git a/mm/migrate.c b/mm/migrate.c index eba3cd5376e3..2defe9aa4d0e 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -171,13 +171,11 @@ void putback_movable_pages(struct list_head *l) /* * Restore a potential migration pte to a working pte entry */ -static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma, - unsigned long addr, void *old) +static bool remove_migration_pte(struct folio *folio, + struct vm_area_struct *vma, unsigned long addr, void *old) { - struct folio *folio = page_folio(page); DEFINE_FOLIO_VMA_WALK(pvmw, old, vma, addr, PVMW_SYNC | PVMW_MIGRATION); - VM_BUG_ON_PAGE(PageTail(page), page); while (page_vma_mapped_walk(&pvmw)) { pte_t pte; swp_entry_t entry; @@ -269,9 +267,9 @@ void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked) }; if (locked) - rmap_walk_locked(&dst->page, &rwc); + rmap_walk_locked(dst, &rwc); else - rmap_walk(&dst->page, &rwc); + rmap_walk(dst, &rwc); } /* diff --git a/mm/page_idle.c b/mm/page_idle.c index 5c73a9b578da..e34ba04e22e2 100644 --- a/mm/page_idle.c +++ b/mm/page_idle.c @@ -46,11 +46,10 @@ static struct page *page_idle_get_page(unsigned long pfn) return page; } -static bool page_idle_clear_pte_refs_one(struct page *page, +static bool page_idle_clear_pte_refs_one(struct folio *folio, struct vm_area_struct *vma, unsigned long addr, void *arg) { - struct folio *folio = page_folio(page); DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, addr, 0); bool referenced = false; @@ -93,7 +92,7 @@ static void page_idle_clear_pte_refs(struct page *page) */ static const struct rmap_walk_control rwc = { .rmap_one = page_idle_clear_pte_refs_one, - .anon_lock = page_lock_anon_vma_read, + .anon_lock = folio_lock_anon_vma_read, }; bool need_lock; @@ -104,7 +103,7 @@ static void page_idle_clear_pte_refs(struct page *page) if (need_lock && !folio_trylock(folio)) return; - rmap_walk(&folio->page, (struct rmap_walk_control *)&rwc); + rmap_walk(folio, (struct rmap_walk_control *)&rwc); if (need_lock) folio_unlock(folio); diff --git a/mm/rmap.c b/mm/rmap.c index 09301aecf2fc..2cc97c64901e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ 
-107,15 +107,15 @@ static inline void anon_vma_free(struct anon_vma *anon_vma) VM_BUG_ON(atomic_read(&anon_vma->refcount)); /* - * Synchronize against page_lock_anon_vma_read() such that + * Synchronize against folio_lock_anon_vma_read() such that * we can safely hold the lock without the anon_vma getting * freed. * * Relies on the full mb implied by the atomic_dec_and_test() from * put_anon_vma() against the acquire barrier implied by - * down_read_trylock() from page_lock_anon_vma_read(). This orders: + * down_read_trylock() from folio_lock_anon_vma_read(). This orders: * - * page_lock_anon_vma_read() VS put_anon_vma() + * folio_lock_anon_vma_read() VS put_anon_vma() * down_read_trylock() atomic_dec_and_test() * LOCK MB * atomic_read() rwsem_is_locked() @@ -168,7 +168,7 @@ static void anon_vma_chain_link(struct vm_area_struct *vma, * allocate a new one. * * Anon-vma allocations are very subtle, because we may have - * optimistically looked up an anon_vma in page_lock_anon_vma_read() + * optimistically looked up an anon_vma in folio_lock_anon_vma_read() * and that may actually touch the rwsem even in the newly * allocated vma (it depends on RCU to make sure that the * anon_vma isn't actually destroyed). @@ -799,10 +799,9 @@ struct folio_referenced_arg { /* * arg: folio_referenced_arg will be passed */ -static bool folio_referenced_one(struct page *page, struct vm_area_struct *vma, - unsigned long address, void *arg) +static bool folio_referenced_one(struct folio *folio, + struct vm_area_struct *vma, unsigned long address, void *arg) { - struct folio *folio = page_folio(page); struct folio_referenced_arg *pra = arg; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); int referenced = 0; @@ -894,7 +893,7 @@ int folio_referenced(struct folio *folio, int is_locked, struct rmap_walk_control rwc = { .rmap_one = folio_referenced_one, .arg = (void *)&pra, - .anon_lock = page_lock_anon_vma_read, + .anon_lock = folio_lock_anon_vma_read, }; *vm_flags = 0; @@ -919,7 +918,7 @@ int folio_referenced(struct folio *folio, int is_locked, rwc.invalid_vma = invalid_folio_referenced_vma; } - rmap_walk(&folio->page, &rwc); + rmap_walk(folio, &rwc); *vm_flags = pra.vm_flags; if (we_locked) @@ -928,10 +927,9 @@ int folio_referenced(struct folio *folio, int is_locked, return pra.referenced; } -static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma, +static bool page_mkclean_one(struct folio *folio, struct vm_area_struct *vma, unsigned long address, void *arg) { - struct folio *folio = page_folio(page); DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, PVMW_SYNC); struct mmu_notifier_range range; int *cleaned = arg; @@ -1025,7 +1023,7 @@ int folio_mkclean(struct folio *folio) if (!mapping) return 0; - rmap_walk(&folio->page, &rwc); + rmap_walk(folio, &rwc); return cleaned; } @@ -1410,10 +1408,9 @@ out: /* * @arg: enum ttu_flags will be passed to this argument */ -static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma, +static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, unsigned long address, void *arg) { - struct folio *folio = page_folio(page); struct mm_struct *mm = vma->vm_mm; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); pte_t pteval; @@ -1667,9 +1664,9 @@ static bool invalid_migration_vma(struct vm_area_struct *vma, void *arg) return vma_is_temporary_stack(vma); } -static int page_not_mapped(struct page *page) +static int page_not_mapped(struct folio *folio) { - return !page_mapped(page); + return !folio_mapped(folio); } /** @@ -1689,13 
+1686,13 @@ void try_to_unmap(struct folio *folio, enum ttu_flags flags) .rmap_one = try_to_unmap_one, .arg = (void *)flags, .done = page_not_mapped, - .anon_lock = page_lock_anon_vma_read, + .anon_lock = folio_lock_anon_vma_read, }; if (flags & TTU_RMAP_LOCKED) - rmap_walk_locked(&folio->page, &rwc); + rmap_walk_locked(folio, &rwc); else - rmap_walk(&folio->page, &rwc); + rmap_walk(folio, &rwc); } /* @@ -1704,10 +1701,9 @@ void try_to_unmap(struct folio *folio, enum ttu_flags flags) * If TTU_SPLIT_HUGE_PMD is specified any PMD mappings will be split into PTEs * containing migration entries. */ -static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma, +static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, unsigned long address, void *arg) { - struct folio *folio = page_folio(page); struct mm_struct *mm = vma->vm_mm; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); pte_t pteval; @@ -1951,7 +1947,7 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags) .rmap_one = try_to_migrate_one, .arg = (void *)flags, .done = page_not_mapped, - .anon_lock = page_lock_anon_vma_read, + .anon_lock = folio_lock_anon_vma_read, }; /* @@ -1977,9 +1973,9 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags) rwc.invalid_vma = invalid_migration_vma; if (flags & TTU_RMAP_LOCKED) - rmap_walk_locked(&folio->page, &rwc); + rmap_walk_locked(folio, &rwc); else - rmap_walk(&folio->page, &rwc); + rmap_walk(folio, &rwc); } #ifdef CONFIG_DEVICE_PRIVATE @@ -1990,10 +1986,9 @@ struct make_exclusive_args { bool valid; }; -static bool page_make_device_exclusive_one(struct page *page, +static bool page_make_device_exclusive_one(struct folio *folio, struct vm_area_struct *vma, unsigned long address, void *priv) { - struct folio *folio = page_folio(page); struct mm_struct *mm = vma->vm_mm; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); struct make_exclusive_args *args = priv; @@ -2098,7 +2093,7 @@ static bool folio_make_device_exclusive(struct folio *folio, struct rmap_walk_control rwc = { .rmap_one = page_make_device_exclusive_one, .done = page_not_mapped, - .anon_lock = page_lock_anon_vma_read, + .anon_lock = folio_lock_anon_vma_read, .arg = &args, }; @@ -2109,7 +2104,7 @@ static bool folio_make_device_exclusive(struct folio *folio, if (!folio_test_anon(folio)) return false; - rmap_walk(&folio->page, &rwc); + rmap_walk(folio, &rwc); return args.valid && !folio_mapcount(folio); } @@ -2177,17 +2172,16 @@ void __put_anon_vma(struct anon_vma *anon_vma) anon_vma_free(root); } -static struct anon_vma *rmap_walk_anon_lock(struct page *page, +static struct anon_vma *rmap_walk_anon_lock(struct folio *folio, struct rmap_walk_control *rwc) { - struct folio *folio = page_folio(page); struct anon_vma *anon_vma; if (rwc->anon_lock) - return rwc->anon_lock(page); + return rwc->anon_lock(folio); /* - * Note: remove_migration_ptes() cannot use page_lock_anon_vma_read() + * Note: remove_migration_ptes() cannot use folio_lock_anon_vma_read() * because that depends on page_mapped(); but not all its usages * are holding mmap_lock. Users without mmap_lock are required to * take a reference count to prevent the anon_vma disappearing @@ -2209,10 +2203,9 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page, * Find all the mappings of a page using the mapping pointer and the vma chains * contained in the anon_vma struct it points to. 
*/ -static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc, +static void rmap_walk_anon(struct folio *folio, struct rmap_walk_control *rwc, bool locked) { - struct folio *folio = page_folio(page); struct anon_vma *anon_vma; pgoff_t pgoff_start, pgoff_end; struct anon_vma_chain *avc; @@ -2222,17 +2215,17 @@ static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc, /* anon_vma disappear under us? */ VM_BUG_ON_FOLIO(!anon_vma, folio); } else { - anon_vma = rmap_walk_anon_lock(page, rwc); + anon_vma = rmap_walk_anon_lock(folio, rwc); } if (!anon_vma) return; - pgoff_start = page_to_pgoff(page); - pgoff_end = pgoff_start + thp_nr_pages(page) - 1; + pgoff_start = folio_pgoff(folio); + pgoff_end = pgoff_start + folio_nr_pages(folio) - 1; anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff_start, pgoff_end) { struct vm_area_struct *vma = avc->vma; - unsigned long address = vma_address(page, vma); + unsigned long address = vma_address(&folio->page, vma); VM_BUG_ON_VMA(address == -EFAULT, vma); cond_resched(); @@ -2240,9 +2233,9 @@ static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc, if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg)) continue; - if (!rwc->rmap_one(page, vma, address, rwc->arg)) + if (!rwc->rmap_one(folio, vma, address, rwc->arg)) break; - if (rwc->done && rwc->done(page)) + if (rwc->done && rwc->done(folio)) break; } @@ -2258,10 +2251,10 @@ static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc, * Find all the mappings of a page using the mapping pointer and the vma chains * contained in the address_space struct it points to. */ -static void rmap_walk_file(struct page *page, struct rmap_walk_control *rwc, +static void rmap_walk_file(struct folio *folio, struct rmap_walk_control *rwc, bool locked) { - struct address_space *mapping = page_mapping(page); + struct address_space *mapping = folio_mapping(folio); pgoff_t pgoff_start, pgoff_end; struct vm_area_struct *vma; @@ -2271,18 +2264,18 @@ static void rmap_walk_file(struct page *page, struct rmap_walk_control *rwc, * structure at mapping cannot be freed and reused yet, * so we can safely take mapping->i_mmap_rwsem. 
*/ - VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); if (!mapping) return; - pgoff_start = page_to_pgoff(page); - pgoff_end = pgoff_start + thp_nr_pages(page) - 1; + pgoff_start = folio_pgoff(folio); + pgoff_end = pgoff_start + folio_nr_pages(folio) - 1; if (!locked) i_mmap_lock_read(mapping); vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff_start, pgoff_end) { - unsigned long address = vma_address(page, vma); + unsigned long address = vma_address(&folio->page, vma); VM_BUG_ON_VMA(address == -EFAULT, vma); cond_resched(); @@ -2290,9 +2283,9 @@ static void rmap_walk_file(struct page *page, struct rmap_walk_control *rwc, if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg)) continue; - if (!rwc->rmap_one(page, vma, address, rwc->arg)) + if (!rwc->rmap_one(folio, vma, address, rwc->arg)) goto done; - if (rwc->done && rwc->done(page)) + if (rwc->done && rwc->done(folio)) goto done; } @@ -2301,25 +2294,25 @@ done: i_mmap_unlock_read(mapping); } -void rmap_walk(struct page *page, struct rmap_walk_control *rwc) +void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc) { - if (unlikely(PageKsm(page))) - rmap_walk_ksm(page, rwc); - else if (PageAnon(page)) - rmap_walk_anon(page, rwc, false); + if (unlikely(folio_test_ksm(folio))) + rmap_walk_ksm(folio, rwc); + else if (folio_test_anon(folio)) + rmap_walk_anon(folio, rwc, false); else - rmap_walk_file(page, rwc, false); + rmap_walk_file(folio, rwc, false); } /* Like rmap_walk, but caller holds relevant rmap lock */ -void rmap_walk_locked(struct page *page, struct rmap_walk_control *rwc) +void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc) { /* no ksm support for now */ - VM_BUG_ON_PAGE(PageKsm(page), page); - if (PageAnon(page)) - rmap_walk_anon(page, rwc, true); + VM_BUG_ON_FOLIO(folio_test_ksm(folio), folio); + if (folio_test_anon(folio)) + rmap_walk_anon(folio, rwc, true); else - rmap_walk_file(page, rwc, true); + rmap_walk_file(folio, rwc, true); } #ifdef CONFIG_HUGETLB_PAGE -- cgit v1.2.3 From d4b4084ac3154c51ff5fa71f669264cc44429be2 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 4 Feb 2022 14:13:31 -0500 Subject: mm: Turn can_split_huge_page() into can_split_folio() This function already required a head page to be passed, so this just adds type-safety and removes a few implicit calls to compound_head(). 
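For illustration, the converted check reads as below (a sketch
mirroring the mm/huge_memory.c hunk that follows): a folio is
considered splittable only when every reference is accounted for by
its mappings, the expected page-cache or swap-cache pins, and the
caller's own reference:

    /* Racy by design: the counts may change under us. */
    if (folio_test_anon(folio))
        extra_pins = folio_test_swapcache(folio) ?
                folio_nr_pages(folio) : 0;  /* swap cache pins */
    else
        extra_pins = folio_nr_pages(folio); /* page cache pins */

    return folio_mapcount(folio) == folio_ref_count(folio) - extra_pins - 1;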
Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/huge_mm.h | 4 ++-- mm/huge_memory.c | 15 ++++++++------- mm/vmscan.c | 6 +++--- 3 files changed, 13 insertions(+), 12 deletions(-) (limited to 'mm/huge_memory.c') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 4368b314d9c8..e0348bca3d66 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -185,7 +185,7 @@ void prep_transhuge_page(struct page *page); void free_transhuge_page(struct page *page); bool is_transparent_hugepage(struct page *page); -bool can_split_huge_page(struct page *page, int *pextra_pins); +bool can_split_folio(struct folio *folio, int *pextra_pins); int split_huge_page_to_list(struct page *page, struct list_head *list); static inline int split_huge_page(struct page *page) { @@ -387,7 +387,7 @@ static inline bool is_transparent_hugepage(struct page *page) #define thp_get_unmapped_area NULL static inline bool -can_split_huge_page(struct page *page, int *pextra_pins) +can_split_folio(struct folio *folio, int *pextra_pins) { BUILD_BUG(); return false; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index d874d50e703b..38e233a7d977 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2516,18 +2516,19 @@ int page_trans_huge_mapcount(struct page *page) } /* Racy check whether the huge page can be split */ -bool can_split_huge_page(struct page *page, int *pextra_pins) +bool can_split_folio(struct folio *folio, int *pextra_pins) { int extra_pins; /* Additional pins from page cache */ - if (PageAnon(page)) - extra_pins = PageSwapCache(page) ? thp_nr_pages(page) : 0; + if (folio_test_anon(folio)) + extra_pins = folio_test_swapcache(folio) ? + folio_nr_pages(folio) : 0; else - extra_pins = thp_nr_pages(page); + extra_pins = folio_nr_pages(folio); if (pextra_pins) *pextra_pins = extra_pins; - return total_mapcount(page) == page_count(page) - extra_pins - 1; + return folio_mapcount(folio) == folio_ref_count(folio) - extra_pins - 1; } /* @@ -2619,7 +2620,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) * Racy check if we can split the page, before unmap_page() will * split PMDs */ - if (!can_split_huge_page(head, &extra_pins)) { + if (!can_split_folio(folio, &extra_pins)) { ret = -EBUSY; goto out_unlock; } @@ -2928,7 +2929,7 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start, goto next; total++; - if (!can_split_huge_page(compound_head(page), NULL)) + if (!can_split_folio(page_folio(page), NULL)) goto next; if (!trylock_page(page)) diff --git a/mm/vmscan.c b/mm/vmscan.c index 32473e069f68..7db5d0237333 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1703,18 +1703,18 @@ retry: if (!PageSwapCache(page)) { if (!(sc->gfp_mask & __GFP_IO)) goto keep_locked; - if (page_maybe_dma_pinned(page)) + if (folio_maybe_dma_pinned(folio)) goto keep_locked; if (PageTransHuge(page)) { /* cannot split THP, skip it */ - if (!can_split_huge_page(page, NULL)) + if (!can_split_folio(folio, NULL)) goto activate_locked; /* * Split pages without a PMD map right * away. Chances are some or all of the * tail pages can be freed without IO. */ - if (!compound_mapcount(page) && + if (!folio_entire_mapcount(folio) && split_folio_to_list(folio, page_list)) goto activate_locked; -- cgit v1.2.3 From 1854bc6e2420472676c5c90d3d6b15f6cd640e40 Mon Sep 17 00:00:00 2001 From: William Kucharski Date: Sun, 22 Sep 2019 08:43:15 -0400 Subject: mm/readahead: Align file mappings for non-DAX When we have the opportunity to use PMDs to map a file, we want to follow the same rules as DAX. 
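For illustration, the resulting flow in thp_get_unmapped_area() is
simply (a sketch mirroring the hunk that follows):

    /* Try for a PMD-aligned area first, DAX or not ... */
    ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE);
    if (ret)
        return ret;

    /* ... and fall back to the mm's default placement. */
    return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);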
Signed-off-by: William Kucharski Signed-off-by: Matthew Wilcox (Oracle) --- mm/huge_memory.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'mm/huge_memory.c') diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 38e233a7d977..f85b04b31bd1 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -582,13 +582,10 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long ret; loff_t off = (loff_t)pgoff << PAGE_SHIFT; - if (!IS_DAX(filp->f_mapping->host) || !IS_ENABLED(CONFIG_FS_DAX_PMD)) - goto out; - ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE); if (ret) return ret; -out: + return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags); } EXPORT_SYMBOL_GPL(thp_get_unmapped_area); -- cgit v1.2.3
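A minimal caller-side sketch of the folio-based migration API after
this series (hypothetical and condensed from __unmap_and_move() above;
it assumes "folio" is locked with a reference held, and "dst" is a
hypothetical destination folio):

    /* Replace the ptes mapping this folio with migration entries. */
    try_to_migrate(folio, 0);
    if (folio_mapped(folio))
        return -EAGAIN;     /* some mapping could not be converted */

    /* ... copy contents to dst, then rewrite the migration entries
     * to point at dst (or back at folio if the copy failed):
     */
    remove_migration_ptes(folio, dst, false);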