summaryrefslogtreecommitdiff
path: root/mm/mremap.c
AgeCommit message (Collapse)AuthorFilesLines
2023-02-10mm: replace vma->vm_flags direct modifications with modifier callsSuren Baghdasaryan1-3/+3
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-10mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASKSuren Baghdasaryan1-1/+1
To simplify the usage of VM_LOCKED_CLEAR_MASK in vm_flags_clear(), replace it with VM_LOCKED_MASK bitmask and convert all users. Link: https://lkml.kernel.org/r/20230126193752.297968-4-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-10mm/mremap: convert vma_adjust() to vma_expand()Liam R. Howlett1-2/+2
Stop using vma_adjust() in preparation for removing the function. Export vma_expand() to use instead. Link: https://lkml.kernel.org/r/20230120162650.984577-45-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-10mm: don't use __vma_adjust() in __split_vma()Liam R. Howlett1-1/+1
Use the abstracted locking and maple tree operations. Since __split_vma() is the only user of the __vma_adjust() function to use the insert argument, drop that argument. Remove the NULL passed through from fs/exec's shift_arg_pages() and mremap() at the same time. Link: https://lkml.kernel.org/r/20230120162650.984577-44-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-10mm: change munmap splitting order and move_vma()Liam R. Howlett1-11/+17
Splitting can be more efficient when the order is not of concern. Change do_vmi_align_munmap() to reduce walking of the tree during split operations. move_vma() must also be altered to remove the dependency of keeping the original VMA as the active part of the split. Transition to using vma iterator to look up the prev and/or next vma after munmap. [Liam.Howlett@oracle.com: fix vma iterator initialization] Link: https://lkml.kernel.org/r/20230126212011.980350-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20230120162650.984577-39-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-10mm: add vma iterator to vma_adjust() argumentsLiam R. Howlett1-2/+2
Change the vma_adjust() function definition to accept the vma iterator and pass it through to __vma_adjust(). Update fs/exec to use the new vma_adjust() function parameters. Update mm/mremap to use the new vma_adjust() function parameters. Revert the __split_vma() calls back from __vma_adjust() to vma_adjust() and pass through the vma iterator. Link: https://lkml.kernel.org/r/20230120162650.984577-37-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-10mm: switch vma_merge(), split_vma(), and __split_vma to vma iteratorLiam R. Howlett1-6/+4
Drop the vmi_* functions and transition all users to use the vma iterator directly. Link: https://lkml.kernel.org/r/20230120162650.984577-30-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-10mm/mremap: use vmi version of vma_merge()Liam R. Howlett1-4/+7
Use the vma iterator so that the iterator can be invalidated or updated to avoid each caller doing so. Link: https://lkml.kernel.org/r/20230120162650.984577-27-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-10mmap: change do_mas_munmap and do_mas_aligned_munmap() to use vma iteratorLiam R. Howlett1-3/+3
Start passing the vma iterator through the mm code. This will allow for reuse of the state and cleaner invalidation if necessary. Link: https://lkml.kernel.org/r/20230120162650.984577-13-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03mm/mmu_notifier: remove unused mmu_notifier_range_update_to_read_only exportAlistair Popple1-1/+1
mmu_notifier_range_update_to_read_only() was originally introduced in commit c6d23413f81b ("mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper") as an optimisation for device drivers that know a range has only been mapped read-only. However there are no users of this feature so remove it. As it is the only user of the struct mmu_notifier_range.vma field remove that also. Link: https://lkml.kernel.org/r/20230110025722.600912-1-apopple@nvidia.com Signed-off-by: Alistair Popple <apopple@nvidia.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-01mm, mremap: fix mremap() expanding for vma's with vm_ops->close()Vlastimil Babka1-6/+19
Fabian has reported another regression in 6.1 due to ca3d76b0aa80 ("mm: add merging after mremap resize"). The problem is that vma_merge() can fail when vma has a vm_ops->close() method, causing is_mergeable_vma() test to be negative. This was happening for vma mapping a file from fuse-overlayfs, which does have the method. But when we are simply expanding the vma, we never remove it due to the "merge" with the added area, so the test should not prevent the expansion. As a quick fix, check for such vmas and expand them using vma_adjust() directly as was done before commit ca3d76b0aa80. For a more robust long term solution we should try to limit the check for vma_ops->close only to cases that actually result in vma removal, so that no merge would be prevented unnecessarily. [akpm@linux-foundation.org: fix indenting whitespace, reflow comment] Link: https://lkml.kernel.org/r/20230117101939.9753-1-vbabka@suse.cz Fixes: ca3d76b0aa80 ("mm: add merging after mremap resize") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Fabian Vogt <fvogt@suse.com> Link: https://bugzilla.suse.com/show_bug.cgi?id=1206359#c35 Tested-by: Fabian Vogt <fvogt@suse.com> Cc: Jakub Matěna <matenajakub@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-22mm, mremap: fix mremap() expanding vma with addr inside vmaVlastimil Babka1-1/+2
Since 6.1 we have noticed random rpm install failures that were tracked to mremap() returning -ENOMEM and to commit ca3d76b0aa80 ("mm: add merging after mremap resize"). The problem occurs when mremap() expands a VMA in place, but using an starting address that's not vma->vm_start, but somewhere in the middle. The extension_pgoff calculation introduced by the commit is wrong in that case, so vma_merge() fails due to pgoffs not being compatible. Fix the calculation. By the way it seems that the situations, where rpm now expands a vma from the middle, were made possible also due to that commit, thanks to the improved vma merging. Yet it should work just fine, except for the buggy calculation. Link: https://lkml.kernel.org/r/20221216163227.24648-1-vbabka@suse.cz Reported-by: Jiri Slaby <jirislaby@kernel.org> Link: https://bugzilla.suse.com/show_bug.cgi?id=1206359 Fixes: ca3d76b0aa80 ("mm: add merging after mremap resize") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jakub Matěna <matenajakub@gmail.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-27mm: add merging after mremap resizeJakub Matěna1-2/+17
When mremap call results in expansion, it might be possible to merge the VMA with the next VMA which might become adjacent. This patch adds vma_merge call after the expansion is done to try and merge. [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/20220603145719.1012094-3-matenajakub@gmail.com Signed-off-by: Jakub Matěna <matenajakub@gmail.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-27mm/mremap: use vma_find_intersection() instead of vma linked listLiam R. Howlett1-2/+3
Using the vma_find_intersection() call allows for cleaner code and removes linked list users in preparation of the linked list removal. Also remove one user of the linked list at the same time in favour of find_vma(). Link: https://lkml.kernel.org/r/20220906194824.2110408-60-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-27mm/mmap: reorganize munmap to use maple statesLiam R. Howlett1-7/+10
Remove __do_munmap() in favour of do_munmap(), do_mas_munmap(), and do_mas_align_munmap(). do_munmap() is a wrapper to create a maple state for any callers that have not been converted to the maple tree. do_mas_munmap() takes a maple state to mumap a range. This is just a small function which checks for error conditions and aligns the end of the range. do_mas_align_munmap() uses the aligned range to mumap a range. do_mas_align_munmap() starts with the first VMA in the range, then finds the last VMA in the range. Both start and end are split if necessary. Then the VMAs are removed from the linked list and the mm mlock count is updated at the same time. Followed by a single tree operation of overwriting the area in with a NULL. Finally, the detached list is unmapped and freed. By reorganizing the munmap calls as outlined, it is now possible to avoid extra work of aligning pre-aligned callers which are known to be safe, avoid extra VMA lookups or tree walks for modifications. detach_vmas_to_be_unmapped() is no longer used, so drop this code. vm_brk_flags() can just call the do_mas_munmap() as it checks for intersecting VMAs directly. Link: https://lkml.kernel.org/r/20220906194824.2110408-29-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Tested-by: Yu Zhao <yuzhao@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: SeongJae Park <sj@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-26Merge tag 'mm-stable-2022-05-25' of ↵Linus Torvalds1-11/+5
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Almost all of MM here. A few things are still getting finished off, reviewed, etc. - Yang Shi has improved the behaviour of khugepaged collapsing of readonly file-backed transparent hugepages. - Johannes Weiner has arranged for zswap memory use to be tracked and managed on a per-cgroup basis. - Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for runtime enablement of the recent huge page vmemmap optimization feature. - Baolin Wang contributes a series to fix some issues around hugetlb pagetable invalidation. - Zhenwei Pi has fixed some interactions between hwpoisoned pages and virtualization. - Tong Tiangen has enabled the use of the presently x86-only page_table_check debugging feature on arm64 and riscv. - David Vernet has done some fixup work on the memcg selftests. - Peter Xu has taught userfaultfd to handle write protection faults against shmem- and hugetlbfs-backed files. - More DAMON development from SeongJae Park - adding online tuning of the feature and support for monitoring of fixed virtual address ranges. Also easier discovery of which monitoring operations are available. - Nadav Amit has done some optimization of TLB flushing during mprotect(). - Neil Brown continues to labor away at improving our swap-over-NFS support. - David Hildenbrand has some fixes to anon page COWing versus get_user_pages(). - Peng Liu fixed some errors in the core hugetlb code. - Joao Martins has reduced the amount of memory consumed by device-dax's compound devmaps. - Some cleanups of the arch-specific pagemap code from Anshuman Khandual. - Muchun Song has found and fixed some errors in the TLB flushing of transparent hugepages. - Roman Gushchin has done more work on the memcg selftests. ... and, of course, many smaller fixes and cleanups. Notably, the customary million cleanup serieses from Miaohe Lin" * tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits) mm: kfence: use PAGE_ALIGNED helper selftests: vm: add the "settings" file with timeout variable selftests: vm: add "test_hmm.sh" to TEST_FILES selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests selftests: vm: add migration to the .gitignore selftests/vm/pkeys: fix typo in comment ksm: fix typo in comment selftests: vm: add process_mrelease tests Revert "mm/vmscan: never demote for memcg reclaim" mm/kfence: print disabling or re-enabling message include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace" include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion" mm: fix a potential infinite loop in start_isolate_page_range() MAINTAINERS: add Muchun as co-maintainer for HugeTLB zram: fix Kconfig dependency warning mm/shmem: fix shmem folio swapoff hang cgroup: fix an error handling path in alloc_pagecache_max_30M() mm: damon: use HPAGE_PMD_SIZE tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate nodemask.h: fix compilation error with GCC12 ...
2022-05-13mm: hugetlb: considering PMD sharing when flushing cache/TLBsBaolin Wang1-1/+1
This patchset fixes some cache flushing issues if PMD sharing is possible for hugetlb pages, which were found by code inspection. Meanwhile Mike found the flush_cache_page() can not cover the whole size of a hugetlb page on some architectures [1], so I added a new patch 3 to fix this issue, since I found only try_to_unmap_one() and try_to_migrate_one() need to fix after some investigation. [1] https://lore.kernel.org/linux-mm/064da3bb-5b4b-7332-a722-c5a541128705@oracle.com/ This patch (of 3): When moving hugetlb page tables, the cache flushing is called in move_page_tables() without considering the shared PMDs, which may be cause cache issues on some architectures. Thus we should move the hugetlb cache flushing into move_hugetlb_page_tables() with considering the shared PMDs ranges, calculated by adjust_range_if_pmd_sharing_possible(). Meanwhile also expanding the TLBs flushing range in case of shared PMDs. Note this is discovered via code inspection, and did not meet a real problem in practice so far. Link: https://lkml.kernel.org/r/cover.1651056365.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/0443c8cf20db554d3ff4b439b30e0ff26c0181dd.1651056365.git.baolin.wang@linux.alibaba.com Fixes: 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed vma") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Mina Almasry <almasrymina@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-10mm: mremap: fix sign for EFAULT error return valueNiels Dossche1-1/+1
The mremap syscall is supposed to return a pointer to the new virtual memory area on success, and a negative value of the error code in case of failure. Currently, EFAULT is returned when the VMA is not found, instead of -EFAULT. The users of this syscall will therefore believe the syscall succeeded in case the VMA didn't exist, as it returns a pointer to address 0xe (0xe being the value of EFAULT). Fix the sign of the error value. Link: https://lkml.kernel.org/r/20220427224439.23828-2-dossche.niels@gmail.com Fixes: 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed vma") Signed-off-by: Niels Dossche <dossche.niels@gmail.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/mremap: avoid unneeded do_munmap callMiaohe Lin1-2/+2
When old_len == new_len, do_munmap will return -EINVAL due to len == 0. This errno will be simply ignored because of old_len != new_len check. So it is unnecessary to call do_munmap when old_len == new_len because nothing is actually done. Link: https://lkml.kernel.org/r/20220401081023.37080-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29mm/mremap: use helper mlock_future_check()Miaohe Lin1-8/+2
Use helper mlock_future_check() to check whether it's safe to resize the locked_vm to simplify the code. Minor readability improvement. Link: https://lkml.kernel.org/r/20220322112004.27380-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-09mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0)Paolo Bonzini1-0/+3
If an mremap() syscall with old_size=0 ends up in move_page_tables(), it will call invalidate_range_start()/invalidate_range_end() unnecessarily, i.e. with an empty range. This causes a WARN in KVM's mmu_notifier. In the past, empty ranges have been diagnosed to be off-by-one bugs, hence the WARNing. Given the low (so far) number of unique reports, the benefits of detecting more buggy callers seem to outweigh the cost of having to fix cases such as this one, where userspace is doing something silly. In this particular case, an early return from move_page_tables() is enough to fix the issue. Link: https://lkml.kernel.org/r/20220329173155.172439-1-pbonzini@redhat.com Reported-by: syzbot+6bde52d89cfdf9f61425@syzkaller.appspotmail.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23mm/mremap:: use vma_lookup() instead of find_vma()Miaohe Lin1-2/+2
Using vma_lookup() verifies the address is contained in the found vma. This results in easier to read code. Link: https://lkml.kernel.org/r/20220312083118.48284-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06mm, hugepages: add mremap() support for hugepage backed vmaMina Almasry1-3/+33
Support mremap() for hugepage backed vma segment by simply repositioning page table entries. The page table entries are repositioned to the new virtual address on mremap(). Hugetlb mremap() support is of course generic; my motivating use case is a library (hugepage_text), which reloads the ELF text of executables in hugepages. This significantly increases the execution performance of said executables. Restrict the mremap operation on hugepages to up to the size of the original mapping as the underlying hugetlb reservation is not yet capable of handling remapping to a larger size. During the mremap() operation we detect pmd_share'd mappings and we unshare those during the mremap(). On access and fault the sharing is established again. Link: https://lkml.kernel.org/r/20211013195825.3058275-1-almasrymina@google.com Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Ken Chen <kenchen@google.com> Cc: Chris Kennelly <ckennelly@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Kirill Shutemov <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06mm/mremap: don't account pages in vma_to_resize()Dmitry Safonov1-28/+22
All this vm_unacct_memory(charged) dance seems to complicate the life without a good reason. Furthermore, it seems not always done right on error-pathes in mremap_to(). And worse than that: this `charged' difference is sometimes double-accounted for growing MREMAP_DONTUNMAP mremap()s in move_vma(): if (security_vm_enough_memory_mm(mm, new_len >> PAGE_SHIFT)) Let's not do this. Account memory in mremap() fast-path for growing VMAs or in move_vma() for actually moving things. The same simpler way as it's done by vm_stat_account(), but with a difference to call security_vm_enough_memory_mm() before copying/adjusting VMA. Originally noticed by Chen Wandun: https://lkml.kernel.org/r/20210717101942.120607-1-chenwandun@huawei.com Link: https://lkml.kernel.org/r/20210721131320.522061-1-dima@arista.com Fixes: e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()") Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: Brian Geffon <bgeffon@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Wandun <chenwandun@huawei.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03mm/mremap: fix memory account on do_munmap() failureChen Wandun1-1/+1
mremap will account the delta between new_len and old_len in vma_to_resize, and then call move_vma when expanding an existing memory mapping. In function move_vma, there are two scenarios when calling do_munmap: 1. move_page_tables from old_addr to new_addr success 2. move_page_tables from old_addr to new_addr fail In first scenario, it should account old_len if do_munmap fail, because the delta has already been accounted. In second scenario, new_addr/new_len will assign to old_addr/old_len if move_page_table fail, so do_munmap is try to unmap new_addr actually, if do_munmap fail, it should account the new_len, because error code will be return from move_vma, and delta will be unaccounted. What'more, because of new_len == old_len, so account old_len also is OK. In summary, account old_len will be correct if do_munmap fail. Link: https://lkml.kernel.org/r/20210717101942.120607-1-chenwandun@huawei.com Fixes: 51df7bcb6151 ("mm/mremap: account memory on do_munmap() failure") Signed-off-by: Chen Wandun <chenwandun@huawei.com> Acked-by: Dmitry Safonov <dima@arista.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08mm/mremap: allow arch runtime overrideAneesh Kumar K.V1-1/+14
Patch series "Speedup mremap on ppc64", v8. This patchset enables MOVE_PMD/MOVE_PUD support on power. This requires the platform to support updating higher-level page tables without updating page table entries. This also needs to invalidate the Page Walk Cache on architecture supporting the same. This patch (of 3): Architectures like ppc64 support faster mremap only with radix translation. Hence allow a runtime check w.r.t support for fast mremap. Link: https://lkml.kernel.org/r/20210616045735.374532-1-aneesh.kumar@linux.ibm.com Link: https://lkml.kernel.org/r/20210616045735.374532-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08mm/mremap: hold the rmap lock in write mode when moving page table entries.Aneesh Kumar K.V1-2/+2
To avoid a race between rmap walk and mremap, mremap does take_rmap_locks(). The lock was taken to ensure that rmap walk don't miss a page table entry due to PTE moves via move_pagetables(). The kernel does further optimization of this lock such that if we are going to find the newly added vma after the old vma, the rmap lock is not taken. This is because rmap walk would find the vmas in the same order and if we don't find the page table attached to older vma we would find it with the new vma which we would iterate later. As explained in commit eb66ae030829 ("mremap: properly flush TLB before releasing the page") mremap is special in that it doesn't take ownership of the page. The optimized version for PUD/PMD aligned mremap also doesn't hold the ptl lock. This can result in stale TLB entries as show below. This patch updates the rmap locking requirement in mremap to handle the race condition explained below with optimized mremap:: Optmized PMD move CPU 1 CPU 2 CPU 3 mremap(old_addr, new_addr) page_shrinker/try_to_unmap_one mmap_write_lock_killable() addr = old_addr lock(pte_ptl) lock(pmd_ptl) pmd = *old_pmd pmd_clear(old_pmd) flush_tlb_range(old_addr) *new_pmd = pmd *new_addr = 10; and fills TLB with new addr and old pfn unlock(pmd_ptl) ptep_clear_flush() old pfn is free. Stale TLB entry Optimized PUD move also suffers from a similar race. Both the above race condition can be fixed if we force mremap path to take rmap lock. Link: https://lkml.kernel.org/r/20210616045239.370802-7-aneesh.kumar@linux.ibm.com Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions") Fixes: c49dd3401802 ("mm: speedup mremap on 1GB or larger regions") Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08mm/mremap: use pmd/pud_poplulate to update page table entriesAneesh Kumar K.V1-4/+3
pmd/pud_populate is the right interface to be used to set the respective page table entries. Some architectures like ppc64 do assume that set_pmd/pud_at can only be used to set a hugepage PTE. Since we are not setting up a hugepage PTE here, use the pmd/pud_populate interface. Link: https://lkml.kernel.org/r/20210616045239.370802-6-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08mm/mremap: don't enable optimized PUD move if page table levels is 2Aneesh Kumar K.V1-1/+1
With two level page table don't enable move_normal_pud. Link: https://lkml.kernel.org/r/20210616045239.370802-5-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08mm/mremap: convert huge PUD move to separate helperAneesh Kumar K.V1-7/+73
With TRANSPARENT_HUGEPAGE_PUD enabled the kernel can find huge PUD entries. Add a helper to move huge PUD entries on mremap(). This will be used by a later patch to optimize mremap of PUD_SIZE aligned level 4 PTE mapped address This also make sure we support mremap on huge PUD entries even with CONFIG_HAVE_MOVE_PUD disabled. [aneesh.kumar@linux.ibm.com: fix build failure with clang-10] Link: https://lore.kernel.org/lkml/YMuOSnJsL9qkxweY@archlinux-ax161 Link: https://lkml.kernel.org/r/20210619134310.89098-1-aneesh.kumar@linux.ibm.com Link: https://lkml.kernel.org/r/20210616045239.370802-4-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29mm/mremap: use vma_lookup() in vma_to_resize()Liam Howlett1-2/+3
Use vma_lookup() to find the VMA at a specific address. As vma_lookup() will return NULL if the address is not within any VMA, the start address no longer needs to be validated. Link: https://lkml.kernel.org/r/20210521174745.2219620-21-Liam.Howlett@Oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Davidlohr Bueso <dbueso@suse.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07mm: fix typos in commentsIngo Molnar1-1/+1
Fix ~94 single-word typos in locking code comments, plus a few very obvious grammar mistakes. Link: https://lkml.kernel.org/r/20210322212624.GA1963421@gmail.com Link: https://lore.kernel.org/r/20210322205203.GB1959563@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio"Brian Geffon1-1/+1
This reverts commit cd544fd1dc9293c6702fab6effa63dac1cc67e99. As discussed in [1] this commit was a no-op because the mapping type was checked in vma_to_resize before move_vma is ever called. This meant that vm_ops->mremap() would never be called on such mappings. Furthermore, we've since expanded support of MREMAP_DONTUNMAP to non-anonymous mappings, and these special mappings are still protected by the existing check of !VM_DONTEXPAND and !VM_PFNMAP which will result in a -EINVAL. 1. https://lkml.org/lkml/2020/12/28/2340 Link: https://lkml.kernel.org/r/20210323182520.2712101-2-bgeffon@google.com Signed-off-by: Brian Geffon <bgeffon@google.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Alejandro Colomar <alx.manpages@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Michael S . Tsirkin" <mst@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30mm: extend MREMAP_DONTUNMAP to non-anonymous mappingsBrian Geffon1-2/+2
Patch series "mm: Extend MREMAP_DONTUNMAP to non-anonymous mappings", v5. This patch (of 3): Currently MREMAP_DONTUNMAP only accepts private anonymous mappings. This restriction was placed initially for simplicity and not because there exists a technical reason to do so. This change will widen the support to include any mappings which are not VM_DONTEXPAND or VM_PFNMAP. The primary use case is to support MREMAP_DONTUNMAP on mappings which may have been created from a memfd. This change will result in mremap(MREMAP_DONTUNMAP) returning -EINVAL if VM_DONTEXPAND or VM_PFNMAP mappings are specified. Lokesh Gidra who works on the Android JVM, provided an explanation of how such a feature will improve Android JVM garbage collection: "Android is developing a new garbage collector (GC), based on userfaultfd. The garbage collector will use userfaultfd (uffd) on the java heap during compaction. On accessing any uncompacted page, the application threads will find it missing, at which point the thread will create the compacted page and then use UFFDIO_COPY ioctl to get it mapped and then resume execution. Before starting this compaction, in a stop-the-world pause the heap will be mremap(MREMAP_DONTUNMAP) so that the java heap is ready to receive UFFD_EVENT_PAGEFAULT events after resuming execution. To speedup mremap operations, pagetable movement was optimized by moving PUD entries instead of PTE entries [1]. It was necessary as mremap of even modest sized memory ranges also took several milliseconds, and stopping the application for that long isn't acceptable in response-time sensitive cases. With UFFDIO_CONTINUE feature [2], it will be even more efficient to implement this GC, particularly the 'non-moveable' portions of the heap. It will also help in reducing the need to copy (UFFDIO_COPY) the pages. However, for this to work, the java heap has to be on a 'shared' vma. Currently MREMAP_DONTUNMAP only supports private anonymous mappings, this patch will enable using UFFDIO_CONTINUE for the new userfaultfd-based heap compaction." [1] https://lore.kernel.org/linux-mm/20201215030730.NC3CU98e4%25akpm@linux-foundation.org/ [2] https://lore.kernel.org/linux-mm/20210302000133.272579-1-axelrasmussen@google.com/ Link: https://lkml.kernel.org/r/20210323182520.2712101-1-bgeffon@google.com Signed-off-by: Brian Geffon <bgeffon@google.com> Acked-by: Hugh Dickins <hughd@google.com> Tested-by: Lokesh Gidra <lokeshgidra@google.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Alejandro Colomar <alx.manpages@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Michael S . Tsirkin" <mst@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-25mm: mremap: unlink anon_vmas when mremap with MREMAP_DONTUNMAP successLi Xinhai1-0/+8
mremap with MREMAP_DONTUNMAP can move all page table entries to new vma, which means all pages allocated for the old vma are not relevant to it anymore, and the relevant anon_vma links needs to be unlinked, in nature the old vma is much like been freshly created and have no pages been fault in. But we should not do unlink, if the new vma has effectively merged with the old one. [lixinhai.lxh@gmail.com: v2] Link: https://lkml.kernel.org/r/20210127083917.309264-2-lixinhai.lxh@gmail.com Link: https://lkml.kernel.org/r/20210119075126.3513154-2-lixinhai.lxh@gmail.com Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-22Merge tag 'for-linux-5.12-rc1' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - Many cleanups and fixes for our virtio code - Add support for a pseudo RTC - Fix for a possible jailbreak - Minor fixes (spelling, header files) * tag 'for-linux-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: irq.h: include <asm-generic/irq.h> um: io.h: include <linux/types.h> um: add a pseudo RTC um: remove process stub VMA um: rework userspace stubs to not hard-code stub location um: separate child and parent errors in clone stub um: defer killing userspace on page table update failures um: mm: check more comprehensively for stub changes um: print register names in wait_for_stub um: hostfs: use a kmem cache for inodes mm: Remove arch_remap() and mm-arch-hooks.h um: fix spelling mistake in Kconfig "privleges" -> "privileges" um: virtio: allow devices to be configured for wakeup um: time-travel: rework interrupt handling in ext mode um: virtio: disable VQs during suspend um: virtio: fix handling of messages without payload um: virtio: clean up a comment
2021-02-12mm: Remove arch_remap() and mm-arch-hooks.hChristophe Leroy1-3/+0
powerpc was the last provider of arch_remap() and the last user of mm-arch-hooks.h. Since commit 526a9c4a7234 ("powerpc/vdso: Provide vdso_remap()"), arch_remap() hence mm-arch-hooks.h are not used anymore. Remove them. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Richard Weinberger <richard@nod.at>
2021-02-10mm/mremap: fix BUILD_BUG_ON() error in get_extentArnd Bergmann1-2/+3
clang can't evaluate this function argument at compile time when the function is not inlined, which leads to a link time failure: ld.lld: error: undefined symbol: __compiletime_assert_414 >>> referenced by mremap.c >>> mremap.o:(get_extent) in archive mm/built-in.a Mark the function as __always_inline to avoid it. Link: https://lkml.kernel.org/r/20201230154104.522605-1-arnd@kernel.org Fixes: 9ad9718bfa41 ("mm/mremap: calculate extent in one place") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-30mm/mremap.c: fix extent calculationKalesh Singh1-1/+3
When `next < old_addr`, `next - old_addr` arithmetic underflows causing `extent` to be incorrect. Make `extent` the smaller of `next - old_addr` or `old_end - old_addr`. Link: https://lkml.kernel.org/r/20201219170433.2418867-1-kaleshsingh@google.com Fixes: c49dd34018026 ("mm: speedup mremap on 1GB or larger regions") Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Helge Deller <deller@gmx.de> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mremap: check if it's possible to split original vmaDmitry Safonov1-1/+10
If original VMA can't be split at the desired address, do_munmap() will fail and leave both new-copied VMA and old VMA. De-facto it's MREMAP_DONTUNMAP behaviour, which is unexpected. Currently, it may fail such way for hugetlbfs and dax device mappings. Minimize such unpleasant situations to OOM by checking .may_split() before attempting to create a VMA copy. Link: https://lkml.kernel.org/r/20201013013416.390574-6-dima@arista.com Signed-off-by: Dmitry Safonov <dima@arista.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aioDmitry Safonov1-1/+1
As kernel expect to see only one of such mappings, any further operations on the VMA-copy may be unexpected by the kernel. Maybe it's being on the safe side, but there doesn't seem to be any expected use-case for this, so restrict it now. Link: https://lkml.kernel.org/r/20201013013416.390574-4-dima@arista.com Fixes: commit e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()") Signed-off-by: Dmitry Safonov <dima@arista.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/mremap: for MREMAP_DONTUNMAP check security_vm_enough_memory_mm()Dmitry Safonov1-23/+13
Currently memory is accounted post-mremap() with MREMAP_DONTUNMAP, which may break overcommit policy. So, check if there's enough memory before doing actual VMA copy. Don't unset VM_ACCOUNT on MREMAP_DONTUNMAP. By semantics, such mremap() is actually a memory allocation. That also simplifies the error-path a little. Also, as it's memory allocation on success don't reset hiwater_vm value. Link: https://lkml.kernel.org/r/20201013013416.390574-3-dima@arista.com Fixes: commit e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()") Signed-off-by: Dmitry Safonov <dima@arista.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm/mremap: account memory on do_munmap() failureDmitry Safonov1-1/+2
Patch series "mremap: move_vma() fixes". This patch (of 6): move_vma() copies VMA without adding it to account, then unmaps old part of VMA. On failure it unmaps the new VMA. With hacks accounting in munmap is disabled as it's a copy of existing VMA. Account the memory on munmap() failure which was previously copied into a new VMA. Link: https://lkml.kernel.org/r/20201013013416.390574-1-dima@arista.com Link: https://lkml.kernel.org/r/20201013013416.390574-2-dima@arista.com Fixes: commit e2ea83742133 ("[PATCH] mremap: move_vma fixes and cleanup") Signed-off-by: Dmitry Safonov <dima@arista.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15mm: speedup mremap on 1GB or larger regionsKalesh Singh1-40/+190
Android needs to move large memory regions for garbage collection. The GC requires moving physical pages of multi-gigabyte heap using mremap. During this move, the application threads have to be paused for correctness. It is critical to keep this pause as short as possible to avoid jitters during user interaction. Optimize mremap for >= 1GB-sized regions by moving at the PUD/PGD level if the source and destination addresses are PUD-aligned. For CONFIG_PGTABLE_LEVELS == 3, moving at the PUD level in effect moves PGD entries, since the PUD entry is “folded back” onto the PGD entry. Add HAVE_MOVE_PUD so that architectures where moving at the PUD level isn't supported/tested can turn this off by not selecting the config. Link: https://lkml.kernel.org/r/20201014005320.2233162-4-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: kernel test robot <lkp@intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Geffon <bgeffon@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Gavin Shan <gshan@redhat.com> Cc: Hassan Naveed <hnaveed@wavecomp.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kees Cook <keescook@chromium.org> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Minchan Kim <minchan@google.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sandipan Das <sandipan@linux.ibm.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/mremap: start addresses are properly alignedWei Yang1-3/+0
After previous cleanup, extent is the minimal step for both source and destination. This means when extent is HPAGE_PMD_SIZE or PMD_SIZE, old_addr and new_addr are properly aligned too. Since these two functions are only invoked in move_page_tables, it is safe to remove the check now. Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <yang.shi@linux.alibaba.com> Link: http://lkml.kernel.org/r/20200708095028.41706-4-richard.weiyang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/mremap: calculate extent in one placeWei Yang1-3/+3
Page tables is moved on the base of PMD. This requires both source and destination range should meet the requirement. Current code works well since move_huge_pmd() and move_normal_pmd() would check old_addr and new_addr again. And then return to move_ptes() if the either of them is not aligned. Instead of calculating the extent separately, it is better to calculate in one place, so we know it is not necessary to try move pmd. By doing so, the logic seems a little clear. Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <yang.shi@linux.alibaba.com> Link: http://lkml.kernel.org/r/20200708095028.41706-3-richard.weiyang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07mm/mremap: it is sure to have enough space when extent meets requirementWei Yang1-6/+4
Patch series "mm/mremap: cleanup move_page_tables() a little", v5. move_page_tables() tries to move page table by PMD or PTE. The root reason is if it tries to move PMD, both old and new range should be PMD aligned. But current code calculate old range and new range separately. This leads to some redundant check and calculation. This cleanup tries to consolidate the range check in one place to reduce some extra range handling. This patch (of 3): old_end is passed to these two functions to check whether there is enough space to do the move, while this check is done before invoking these functions. These two functions only would be invoked when extent meets the requirement and there is one check before invoking these functions: if (extent > old_end - old_addr) extent = old_end - old_addr; This implies (old_end - old_addr) won't fail the check in these two functions. Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Peter Xu <peterx@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Thomas Hellstrom <thellstrom@vmware.com> Link: http://lkml.kernel.org/r/20200710092835.56368-1-richard.weiyang@linux.alibaba.com Link: http://lkml.kernel.org/r/20200710092835.56368-2-richard.weiyang@linux.alibaba.com Link: http://lkml.kernel.org/r/20200708095028.41706-1-richard.weiyang@linux.alibaba.com Link: http://lkml.kernel.org/r/20200708095028.41706-2-richard.weiyang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-13mm: document warning in move_normal_pmd() and make it warn only onceLinus Torvalds1-2/+21
Naresh Kamboju reported that the LTP tests can cause warnings on i386 going back all the way to v5.0, and bisected it to commit 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions"). The warning in move_normal_pmd() is actually mostly correct, but we have a very unusual special case at process creation time, when we may move the stack down with an overlapping mode (kind of like a "memmove()" except using the page tables). And when you have just the right condition of "move a large initial stack by the right alignment in the end, but with the early part of the move being only page-aligned", we'll be in a situation where we're trying to move a normal PMD entry on top of an already existing - but now empty - PMD entry. The warning is still worth having, in case it ever triggers other cases, and perhaps as a reminder that we could do the stack move case more efficiently (although it's clearly rare enough that it probably doesn't matter). But make it do WARN_ON_ONCE(), so that you can't flood the logs with it. And add a *big* comment above it to explain and remind us what's going on, because it took some figuring out to see how this could trigger. Kudos to Joel Fernandes for debugging this. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Debugged-and-acked-by: Joel Fernandes <joel@joelfernandes.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mmap locking API: convert mmap_sem commentsMichel Lespinasse1-4/+4
Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mmap locking API: use coccinelle to convert mmap_sem rwsem call sitesMichel Lespinasse1-3/+3
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>