Age  Commit message  Author  Files  Lines
2023-06-10Docs/mm/damon/design: add a section for the modules layerSeongJae Park1-0/+61
Add a section covering the DAMON modules layer to the design document. Link: https://lkml.kernel.org/r/20230525214314.5204-11-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10Docs/mm/damon/design: add a section for DAMON core APISeongJae Park1-0/+12
Add a section covering the API of the DAMON core layer to the design document. Link: https://lkml.kernel.org/r/20230525214314.5204-10-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10Docs/mm/damon/design: add sections for advanced features of DAMOSSeongJae Park1-0/+86
Add sections covering the advanced features of DAMOS, including quotas, prioritization, watermarks, and filters, to the design document. Link: https://lkml.kernel.org/r/20230525214314.5204-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10Docs/mm/damon/design: add sections for basic parts of DAMOSSeongJae Park1-0/+70
DAMOS is an important part of DAMON, but the design doc does not cover it. Add sections covering the basic parts of DAMOS. Link: https://lkml.kernel.org/r/20230525214314.5204-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10Docs/mm/damon/design: add a section for the relation between Core and Modules layerSeongJae Park1-0/+10
Add an overall description of the interface and the relation between the Core and the Modules layers under the 'Overall Architecture' section. Link: https://lkml.kernel.org/r/20230525214314.5204-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10Docs/mm/damon/design: rewrite configurable layersSeongJae Park1-15/+14
The 'Configurable Operations Set' section is a little bit outdated. Update the text. Link: https://lkml.kernel.org/r/20230525214314.5204-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10Docs/mm/damon/design: update the layout based on the layersSeongJae Park1-10/+14
The DAMON design document describes only the operations set layer and the monitoring part of the core logic. Update the layout based on DAMON's layers, so that more parts of DAMON, including the DAMOS core logic and the DAMON modules, can easily be added. Link: https://lkml.kernel.org/r/20230525214314.5204-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10Docs/mm/damon/design: add a section for overall architectureSeongJae Park1-0/+15
The design doc is missing an overall picture of DAMON. Add a section for the overall architecture and layers. Link: https://lkml.kernel.org/r/20230525214314.5204-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10Docs/mm/damon/maintainer-profile: fix typos and grammar errorsSeongJae Park1-2/+2
Fix a few typos and grammar errors in the DAMON Maintainer Profile document. Link: https://lkml.kernel.org/r/20230525214314.5204-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10Docs/mm/damon/faq: remove old questionsSeongJae Park1-23/+0
Patch series "Docs/mm/damon: Minor fixes and design doc update". Some of the DAMON documents are outdated, or having minor typos or grammar erros. Especially, the design doc has not updated for DAMOS, which is an important part of DAMON. Fix the minor issues and update documents. This patch (of 10): The first two questions of DAMON faqs have raised when DAMON patches were first submitted. More than one year has passed since DAMON patches get merged in the mainline, and that kind of questions are not asked nowadays. Remove the questions. Link: https://lkml.kernel.org/r/20230525214314.5204-1-sj@kernel.org Link: https://lkml.kernel.org/r/20230525214314.5204-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10Multi-gen LRU: fix workingset accountingKalesh Singh2-4/+7
On Android app cycle workloads, MGLRU showed a significant reduction in workingset refaults, although pgpgin/pswpin remained relatively unchanged. This indicated that MGLRU may be undercounting workingset refaults. This has an impact on userspace programs, like Android's LMKD, that monitor workingset refault statistics to detect thrashing. It was found that refaults were only accounted if the MGLRU shadow entry was for a recently evicted folio. However, recently evicted folios should be accounted as workingset activation, and refaults should be accounted regardless of recency. Fix MGLRU's workingset refault and activation accounting to more closely match that of the conventional active/inactive LRU. Link: https://lkml.kernel.org/r/20230523205922.3852731-1-kaleshsingh@google.com Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation") Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Reported-by: Charan Teja Kalla <quic_charante@quicinc.com> Acked-by: Yu Zhao <yuzhao@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Cc: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
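An illustrative sketch of the corrected accounting flow described above (not the exact kernel code; the "recent" flag and the placement of the calls are assumptions, only the counter and helper names are taken from the existing workingset statistics):

    /*
     * Sketch: count a refault for every shadow-entry hit, and additionally
     * count an activation only when the entry is recent, mirroring the
     * conventional active/inactive LRU behaviour.
     */
    mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + type, delta);
    if (recent) {
            folio_set_workingset(folio);
            mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + type, delta);
    }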
2023-06-10maple_tree: relocate the declaration of mas_empty_area_rev().Peng Zhang1-6/+6
Relocate the declaration of mas_empty_area_rev() so that mas_empty_area() and mas_empty_area_rev() are together. Link: https://lkml.kernel.org/r/20230524031247.65949-11-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10maple_tree: simplify and clean up mas_wr_node_store()Peng Zhang1-61/+26
Simplify and clean up mas_wr_node_store(), remove unnecessary code. Link: https://lkml.kernel.org/r/20230524031247.65949-10-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10maple_tree: rework mas_wr_slot_store() to be cleaner and more efficient.Peng Zhang1-34/+19
Check whether the two gaps to be overwritten are empty, to avoid calling mas_update_gap() every time. Also clean up the code and add comments. Link: https://lkml.kernel.org/r/20230524031247.65949-9-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10maple_tree: add comments and some minor cleanups to mas_wr_append()Peng Zhang1-24/+23
Add comment for mas_wr_append(), move mas_update_gap() into mas_wr_append(), and other cleanups to make mas_wr_modify() cleaner. Link: https://lkml.kernel.org/r/20230524031247.65949-8-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10maple_tree: add mas_wr_new_end() to calculate new_end accuratelyPeng Zhang1-11/+23
The previous new_end calculation is inaccurate because it assumes that two new pivots must always be added, so it sometimes misses the fast path and falls into the slow path. Add mas_wr_new_end() to calculate new_end accurately and make the conditions for entering the fast path more precise. Link: https://lkml.kernel.org/r/20230524031247.65949-7-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
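A rough sketch of what such a calculation can look like (field names are assumptions for illustration, not necessarily the merged code): start from the current node end, reserve room for two new pivots, then give one slot back for each write boundary that already coincides with an existing pivot.

    static unsigned char mas_wr_new_end(struct ma_wr_state *wr_mas)
    {
            struct ma_state *mas = wr_mas->mas;
            unsigned char new_end = wr_mas->node_end + 2;   /* assume two new pivots */

            /* slots between offset and offset_end are overwritten */
            new_end -= wr_mas->offset_end - mas->offset;
            if (wr_mas->r_min == mas->index)        /* left edge lines up: one pivot fewer */
                    new_end--;
            if (wr_mas->end_piv == mas->last)       /* right edge lines up: one pivot fewer */
                    new_end--;

            return new_end;
    }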
2023-06-10maple_tree: make the code symmetrical in mas_wr_extend_null()Peng Zhang1-12/+14
Just make the code symmetrical to improve readability. Link: https://lkml.kernel.org/r/20230524031247.65949-6-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10maple_tree: simplify mas_is_span_wr()Peng Zhang1-23/+11
Make the code for detecting spanning writes more concise. Link: https://lkml.kernel.org/r/20230524031247.65949-5-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10maple_tree: fix the arguments to __must_hold()Peng Zhang1-3/+3
Fix the arguments to __must_hold() to make sparse work. Link: https://lkml.kernel.org/r/20230524031247.65949-4-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10maple_tree: drop mas_{rev_}alloc() and mas_fill_gap()Peng Zhang1-108/+0
mas_{rev_}alloc() and mas_fill_gap() are no longer used, delete them. Link: https://lkml.kernel.org/r/20230524031247.65949-3-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10maple_tree: rework mtree_alloc_{range,rrange}()Peng Zhang1-25/+32
Patch series "Clean ups for maple tree", v4. Some clean ups, mainly to make the code of maple tree more concise. This patchset has passed the self-test. This patch (of 10): Use mas_empty_area{_rev}() to refactor mtree_alloc_{range,rrange}() Link: https://lkml.kernel.org/r/20230524031247.65949-2-zhangpeng.00@bytedance.com Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10mm/memcontrol: export memcg.swap watermark via sysfs for v2 memcgLars R. Damerow2-0/+20
This patch is similar to commit 8e20d4b33266 ("mm/memcontrol: export memcg->watermark via sysfs for v2 memcg"), but exports the swap counter's watermark. We allocate jobs to our compute farm using heuristics determined by memory and swap usage from previous jobs. Tracking the peak swap usage for new jobs is important for determining when jobs are exceeding their expected bounds, or when our baseline metrics are getting outdated. Our toolset was written to use the "memory.memsw.max_usage_in_bytes" file in cgroups v1, and altering it to poll cgroups v2's "memory.swap.current" would give less accurate results as well as add complication to the code. Having this watermark exposed in sysfs is much preferred. Link: https://lkml.kernel.org/r/20230524181734.125696-1-lars@pixar.com Signed-off-by: Lars R. Damerow <lars@pixar.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
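A minimal sketch of what the read handler for such a swap watermark file could look like, modeled on the earlier memory.peak change (the function name and file wiring here are assumptions, not necessarily the merged implementation):

    static u64 swap_peak_read(struct cgroup_subsys_state *css, struct cftype *cft)
    {
            struct mem_cgroup *memcg = mem_cgroup_from_css(css);

            /* page_counter tracks the high-water mark in pages */
            return (u64)memcg->swap.watermark * PAGE_SIZE;
    }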
2023-06-10mm: shmem: fix UAF bug in shmem_show_options()Tu Jinjiang1-1/+4
shmem_show_options() uses sbinfo->mpol without adding it's refcnt. This may lead to race with replacement of the mpol by remount. The execution sequence is as follows. CPU0 CPU1 shmem_show_options() shmem_reconfigure() shmem_show_mpol(seq, sbinfo->mpol) mpol = sbinfo->mpol mpol_put(mpol) mpol->mode The KASAN report is as follows. BUG: KASAN: slab-use-after-free in shmem_show_options+0x21b/0x340 Read of size 2 at addr ffff888124324004 by task mount/2388 CPU: 2 PID: 2388 Comm: mount Not tainted 6.4.0-rc3-00017-g9d646009f65d-dirty #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x37/0x50 print_report+0xd0/0x620 ? shmem_show_options+0x21b/0x340 ? __virt_addr_valid+0xf4/0x180 ? shmem_show_options+0x21b/0x340 kasan_report+0xb8/0xe0 ? shmem_show_options+0x21b/0x340 shmem_show_options+0x21b/0x340 ? __pfx_shmem_show_options+0x10/0x10 ? strchr+0x2c/0x50 ? strlen+0x23/0x40 ? seq_puts+0x7d/0x90 show_vfsmnt+0x1e6/0x260 ? __pfx_show_vfsmnt+0x10/0x10 ? __kasan_kmalloc+0x7f/0x90 seq_read_iter+0x57a/0x740 vfs_read+0x2e2/0x4a0 ? __pfx_vfs_read+0x10/0x10 ? down_write_killable+0xb8/0x140 ? __pfx_down_write_killable+0x10/0x10 ? __fget_light+0xa9/0x1e0 ? up_write+0x3f/0x80 ksys_read+0xb8/0x150 ? __pfx_ksys_read+0x10/0x10 ? fpregs_assert_state_consistent+0x55/0x60 ? exit_to_user_mode_prepare+0x2d/0x120 do_syscall_64+0x3c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc </TASK> Allocated by task 2387: kasan_save_stack+0x22/0x50 kasan_set_track+0x25/0x30 __kasan_slab_alloc+0x59/0x70 kmem_cache_alloc+0xdd/0x220 mpol_new+0x83/0x150 mpol_parse_str+0x280/0x4a0 shmem_parse_one+0x364/0x520 vfs_parse_fs_param+0xf8/0x1a0 vfs_parse_fs_string+0xc9/0x130 shmem_parse_options+0xb2/0x110 path_mount+0x597/0xdf0 do_mount+0xcd/0xf0 __x64_sys_mount+0xbd/0x100 do_syscall_64+0x3c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Freed by task 2389: kasan_save_stack+0x22/0x50 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2e/0x50 __kasan_slab_free+0x10e/0x1a0 kmem_cache_free+0x9c/0x350 shmem_reconfigure+0x278/0x370 reconfigure_super+0x383/0x450 path_mount+0xcc5/0xdf0 do_mount+0xcd/0xf0 __x64_sys_mount+0xbd/0x100 do_syscall_64+0x3c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc The buggy address belongs to the object at ffff888124324000 which belongs to the cache numa_policy of size 32 The buggy address is located 4 bytes inside of freed 32-byte region [ffff888124324000, ffff888124324020) ================================================================== To fix the bug, shmem_get_sbmpol() / mpol_put() needs to be called before / after shmem_show_mpol() call. Link: https://lkml.kernel.org/r/20230525031640.593733-1-tujinjiang@huawei.com Signed-off-by: Tu Jinjiang <tujinjiang@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10mm: compaction: skip fast freepages isolation if enough freepages are isolatedBaolin Wang1-0/+4
I've observed that fast isolation often isolates more pages than cc->migratepages, and the excess freepages will be released back to the buddy system. So skip fast freepages isolation if enough freepages are isolated to save some CPU cycles. Link: https://lkml.kernel.org/r/f39c2c07f2dba2732fd9c0843572e5bef96f7f67.1685018752.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
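The change amounts to an early bail-out along these lines (a sketch; the exact placement inside fast_isolate_freepages() is assumed):

    /* inside the fast_isolate_freepages() scan loop */
    if (cc->nr_freepages >= cc->nr_migratepages)
            break;  /* enough free pages isolated already, stop scanning */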
2023-06-10mm: compaction: add trace event for fast freepages isolationBaolin Wang2-1/+16
fast_isolate_freepages() can also isolate freepages, but we cannot see how efficient that fast isolation is, which makes the fast isolation pressure hard to understand. Add a trace event that reports some numbers to help understand the efficiency of fast freepages isolation. Link: https://lkml.kernel.org/r/78d2932d0160d122c15372aceb3f2c45460a17fc.1685018752.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10mm: compaction: only set skip flag if cc->no_set_skip_hint is falseBaolin Wang1-1/+1
To keep the same logic as test_and_set_skip(), only set the skip flag if cc->no_set_skip_hint is false, which makes code more reasonable. Link: https://lkml.kernel.org/r/0eb2cd2407ffb259ae6e3071e10f70f2d41d0f3e.1685018752.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10mm: compaction: skip more fully scanned pageblockBaolin Wang1-1/+1
fast_isolate_around() assumes the pageblock is fully scanned if cc->nr_freepages < cc->nr_migratepages after trying to isolate some free pages, and sets the skip flag to avoid scanning it again. However, this can miss setting the skip flag for a fully scanned pageblock (the returned 'start_pfn' equals 'end_pfn') when cc->nr_freepages is larger than cc->nr_migratepages. Using the 'start_pfn' returned by isolate_freepages_block() together with 'end_pfn' to decide whether a pageblock is fully scanned therefore makes more sense. It also covers the case where cc->nr_freepages < cc->nr_migratepages, in which 'start_pfn' is usually equal to 'end_pfn' unless some uncommon fatal error occurs after non-strict mode isolation. Link: https://lkml.kernel.org/r/f4efd2fa08735794a6d809da3249b6715ba6ad38.1685018752.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
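In other words, the skip decision changes roughly as follows (an illustrative sketch of the before/after logic, not the exact diff):

    /* before: misses fully scanned pageblocks when enough freepages were found */
    if (cc->nr_freepages < cc->nr_migratepages)
            set_pageblock_skip(page);

    /* after: rely on the scan range returned by isolate_freepages_block() */
    if (start_pfn == end_pfn && !cc->no_set_skip_hint)
            set_pageblock_skip(page);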
2023-06-10mm: compaction: change fast_isolate_freepages() to void typeBaolin Wang1-5/+3
No caller cares about the return value of fast_isolate_freepages(), so make it void. Link: https://lkml.kernel.org/r/759fca20b22ebf4c81afa30496837b9e0fb2e53b.1685018752.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10mm: compaction: drop the redundant page validation in update_pageblock_skip()Baolin Wang1-3/+0
Patch series "Misc cleanups and improvements for compaction". This series cantains some cleanups and improvements for compaction. This patch (of 6): The caller has validated the page before calling update_pageblock_skip(), thus drop the redundant page validation in update_pageblock_skip(). Link: https://lkml.kernel.org/r/5142e15b9295fe8c447dbb39b7907a20177a1413.1685018752.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10mm/vmalloc: dont purge usable blocks unnecessarilyThomas Gleixner1-8/+20
Purging fragmented blocks is done unconditionally in several contexts: 1) From drain_vmap_area_work(), when the number of lazy to be freed vmap_areas reached the threshold 2) Reclaiming vmalloc address space from pcpu_get_vm_areas() 3) _vm_unmap_aliases() #1 There is no reason to zap fragmented vmap blocks unconditionally, simply because reclaiming all lazy areas drains at least 32MB * fls(num_online_cpus()) per invocation which is plenty. #2 Reclaiming when running out of space or due to memory pressure makes a lot of sense #3 _unmap_aliases() requires to touch everything because the caller has no clue which vmap_area used a particular page last and the vmap_area lost that information too. Except for the vfree + VM_FLUSH_RESET_PERMS case, which removes the vmap area first and then cares about the flush. That in turn requires a full walk of _all_ vmap areas including the one which was just added to the purge list. But as this has to be flushed anyway this is an opportunity to combine outstanding TLB flushes and do the housekeeping of purging freed areas, but like #1 there is no real good reason to zap usable vmap blocks unconditionally. Add a @force_purge argument to the newly split out block purge function and if not true only purge fragmented blocks which have less than 1/4 of their capacity left. Rename purge_vmap_area_lazy() to reclaim_and_purge_vmap_areas() to make it clear what the function does. [lstoakes@gmail.com: correct VMAP_PURGE_THRESHOLD check] Link: https://lkml.kernel.org/r/3e92ef61-b910-4576-88e7-cf43211fd4e7@lucifer.local Link: https://lkml.kernel.org/r/20230525124504.864005691@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
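A sketch of the purge criterion this describes (VMAP_PURGE_THRESHOLD is the name referenced in the fixup note; the simplified signature and surrounding structure are assumptions):

    /* purge only when forced, or when less than 1/4 of the block's capacity is left */
    #define VMAP_PURGE_THRESHOLD    (VMAP_BBMAP_BITS / 4)

    static bool purge_fragmented_block(struct vmap_block *vb, bool force_purge)
    {
            if (vb->free + vb->dirty != VMAP_BBMAP_BITS)
                    return false;   /* block still has active mappings */
            if (!force_purge && vb->free >= VMAP_PURGE_THRESHOLD)
                    return false;   /* plenty of usable space left, keep it */

            /* ... move the block to the local purge list ... */
            return true;
    }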
2023-06-10mm/vmalloc: add missing READ/WRITE_ONCE() annotationsThomas Gleixner1-5/+8
purge_fragmented_blocks() accesses vmap_block::free and vmap_block::dirty lockless for a quick check. Add the missing READ/WRITE_ONCE() annotations. Link: https://lkml.kernel.org/r/20230525124504.807356682@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Baoquan He <bhe@redhat.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10mm/vmalloc: check free space in vmap_block locklessThomas Gleixner1-1/+4
vb_alloc() unconditionally locks a vmap_block on the free list to check the free space. This can be done locklessly because vmap_block::free never increases, it's only decreased on allocations. Check the free space lockless and only if that succeeds, recheck under the lock. Link: https://lkml.kernel.org/r/20230525124504.750481992@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
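A sketch of the resulting fast path (simplified; the real code iterates the per-CPU free list):

    /* cheap lockless pre-check: vb->free only ever decreases */
    if (READ_ONCE(vb->free) < (1UL << order))
            continue;       /* cannot satisfy this allocation, try the next block */

    spin_lock(&vb->lock);
    if (vb->free < (1UL << order)) {        /* recheck under the lock */
            spin_unlock(&vb->lock);
            continue;
    }
    /* ... carve the allocation out of the block ... */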
2023-06-10mm/vmalloc: prevent flushing dirty space over and overThomas Gleixner1-2/+6
vmap blocks which have active mappings cannot be purged. Allocations which have been freed are accounted for in vmap_block::dirty_min/max, so that they can be detected in _vm_unmap_aliases() as potentially stale TLBs. If there are several invocations of _vm_unmap_aliases() then each of them will flush the dirty range. That's pointless and just increases the probability of full TLB flushes. Avoid that by resetting the flush range after accounting for it. That's safe versus other invocations of _vm_unmap_aliases() because this is all serialized with vmap_purge_lock. Link: https://lkml.kernel.org/r/20230525124504.692056496@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Baoquan He <bhe@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
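Conceptually, the per-block accounting changes along these lines (a sketch with assumed field usage, following the dirty_min/dirty_max description above):

    if (vb->dirty) {
            unsigned long va_start = vb->va->va_start;

            start = min(start, va_start + (vb->dirty_min << PAGE_SHIFT));
            end   = max(end,   va_start + (vb->dirty_max << PAGE_SHIFT));

            /* reset the range so a later _vm_unmap_aliases() doesn't flush it again */
            vb->dirty_min = VMAP_BBMAP_BITS;
            vb->dirty_max = 0;

            flush = true;
    }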
2023-06-10mm/vmalloc: avoid iterating over per CPU vmap blocks twiceThomas Gleixner1-24/+46
_vm_unmap_aliases() walks the per CPU xarrays to find partially unmapped blocks and then walks the per CPU free lists to purge fragmented blocks. Arguably that's a waste of CPU cycles and cache lines, as the full xarray walk already touches every block. Avoid this double iteration: - Split out the code to purge one block and the code to free the local purge list into helper functions. - Try to purge the fragmented blocks in the xarray walk before looking at their dirty space. Link: https://lkml.kernel.org/r/20230525124504.633469722@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Baoquan He <bhe@redhat.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10mm/vmalloc: prevent stale TLBs in fully utilized blocksThomas Gleixner1-1/+2
Patch series "mm/vmalloc: Assorted fixes and improvements", v2. this series addresses the following issues: 1) Prevent the stale TLB problem related to fully utilized vmap blocks 2) Avoid the double per CPU list walk in _vm_unmap_aliases() 3) Avoid flushing dirty space over and over 4) Add a lockless quickcheck in vb_alloc() and add missing READ/WRITE_ONCE() annotations 5) Prevent overeager purging of usable vmap_blocks if not under memory/address space pressure. This patch (of 6): _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a page are left in the system. This is required due to the lazy TLB flush mechanism in vmalloc. This is tried to achieve by walking the per CPU free lists, but those do not contain fully utilized vmap blocks because they are removed from the free list once the blocks free space became zero. When the block is not fully unmapped then it is not on the purge list either. So neither the per CPU list iteration nor the purge list walk find the block and if the page was mapped via such a block and the TLB has not yet been flushed, the guarantee of _vm_unmap_aliases() that there are no stale TLBs after returning is broken: x = vb_alloc() // Removes vmap_block from free list because vb->free became 0 vb_free(x) // Unmaps page and marks in dirty_min/max range // Block has still mappings and is not put on purge list // Page is reused vm_unmap_aliases() // Can't find vmap block with the dirty space -> FAIL So instead of walking the per CPU free lists, walk the per CPU xarrays which hold pointers to _all_ active blocks in the system including those removed from the free lists. Link: https://lkml.kernel.org/r/20230525122342.109672430@linutronix.de Link: https://lkml.kernel.org/r/20230525124504.573987880@linutronix.de Fixes: db64fe02258f ("mm: rewrite vmap layer") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10kmemleak-test: drop __init to get better backtraceJim Cromie1-1/+1
Drop the __init on kmemleak_test_init(). With it, the storage is reclaimed, but then the symbol isn't available for "%pS" rendering, and the backtrace gets a bare pointer where the actual leak happened. unreferenced object 0xffff88800a2b0800 (size 1024): comm "modprobe", pid 413, jiffies 4294953430 hex dump (first 32 bytes): 73 02 00 00 75 01 00 68 02 00 00 01 00 00 00 04 s...u..h........ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000fabad728>] kmalloc_trace+0x26/0x90 [<00000000ef738764>] 0xffffffffc02350a2 [<00000000004e5795>] do_one_initcall+0x43/0x210 [<00000000d768905e>] do_init_module+0x4a/0x210 [<0000000087135ab5>] __do_sys_finit_module+0x93/0xf0 [<000000004fcb1fa2>] do_syscall_64+0x34/0x80 [<00000000c73c8d9d>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 with __init gone, that trace entry renders like: [<00000000ef738764>] kmemleak_test_init+<offset>/<size> Link: https://lkml.kernel.org/r/20230525174356.69711-1-jim.cromie@gmail.com Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10mm: multi-gen LRU: cleanup lru_gen_test_recent()T.J. Alumbaugh1-30/+16
Avoid passing memcg* and pglist_data* to lru_gen_test_recent() since we only use the lruvec anyway. Link: https://lkml.kernel.org/r/20230522112058.2965866-4-talumbau@google.com Signed-off-by: T.J. Alumbaugh <talumbau@google.com> Reviewed-by: Yuanchu Xie <yuanchu@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10mm: multi-gen LRU: add helpers in page table walksT.J. Alumbaugh1-5/+15
Add helpers to page table walking code: - Clarifies intent via name "should_walk_mmu" and "should_clear_pmd_young" - Avoids repeating same logic in two places Link: https://lkml.kernel.org/r/20230522112058.2965866-3-talumbau@google.com Signed-off-by: T.J. Alumbaugh <talumbau@google.com> Reviewed-by: Yuanchu Xie <yuanchu@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
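A plausible shape for the two helpers (a sketch; the capability checks are assumptions based on the existing MGLRU feature flags):

    static bool should_walk_mmu(void)
    {
            return arch_has_hw_pte_young() && get_cap(LRU_GEN_MM_WALK);
    }

    static bool should_clear_pmd_young(void)
    {
            return arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG);
    }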
2023-06-10mm: multi-gen LRU: cleanup lru_gen_soft_reclaim()T.J. Alumbaugh3-4/+6
lru_gen_soft_reclaim() gets the lruvec from the memcg and node ID to keep a cleaner interface on the caller side. Link: https://lkml.kernel.org/r/20230522112058.2965866-2-talumbau@google.com Signed-off-by: T.J. Alumbaugh <talumbau@google.com> Reviewed-by: Yuanchu Xie <yuanchu@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10mm: multi-gen LRU: use macro for bitmapT.J. Alumbaugh1-1/+1
Use DECLARE_BITMAP macro when possible. Link: https://lkml.kernel.org/r/20230522112058.2965866-1-talumbau@google.com Signed-off-by: T.J. Alumbaugh <talumbau@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Yuanchu Xie <yuanchu@google.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
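Illustrative before/after (the exact field lives in the MGLRU walk state; MIN_LRU_BATCH is the assumed size here):

    unsigned long bitmap[BITS_TO_LONGS(MIN_LRU_BATCH)];     /* before */
    DECLARE_BITMAP(bitmap, MIN_LRU_BATCH);                  /* after, equivalent storage */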
2023-06-10selftests: cgroup: fix unexpected failure on test_memcg_lowHaifeng Xu1-1/+3
Since commit f079a020ba95 ("selftests: memcg: factor out common parts of memory.{low,min} tests"), the value used in the second alloc_anon has changed from 148M to 170M. Because memory.low allows reclaiming page cache in child cgroups, memory.current is close to 30M instead of 50M. Therefore, adjust the expected value of the parent cgroup. Link: https://lkml.kernel.org/r/20230522095233.4246-2-haifeng.xu@shopee.com Fixes: f079a020ba95 ("selftests: memcg: factor out common parts of memory.{low,min} tests") Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10mm/memcontrol: fix typo in commentHaifeng Xu1-1/+1
Replace 'then' with 'than'. Link: https://lkml.kernel.org/r/20230522095233.4246-1-haifeng.xu@shopee.com Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10mm/mlock: rename mlock_future_check() to mlock_future_ok()Andrew Morton4-7/+7
It is felt that the name mlock_future_check() is vague - it doesn't particularly convey the function's operation. mlock_future_ok() is a clearer name for a predicate function. Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10mm/mmap: refactor mlock_future_check()Lorenzo Stoakes4-20/+21
In all but one instance, mlock_future_check() is treated as a boolean function despite returning an error code. In one instance, this error code is ignored and replaced with -ENOMEM. This is confusing, and the inversion of true -> failure, false -> success is not warranted. Convert the function to a bool, lightly refactor and return true if the check passes, false if not. Link: https://lkml.kernel.org/r/20230522082412.56685-1-lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
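A sketch of the converted predicate, following the description above (body simplified; the rlimit handling mirrors the long-standing mlock accounting and is an assumption here):

    static bool mlock_future_ok(struct mm_struct *mm, unsigned long flags,
                                unsigned long bytes)
    {
            unsigned long locked_pages, limit_pages;

            if (!(flags & VM_LOCKED) || capable(CAP_IPC_LOCK))
                    return true;    /* nothing to check */

            locked_pages = bytes >> PAGE_SHIFT;
            locked_pages += mm->locked_vm;
            limit_pages = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;

            return locked_pages <= limit_pages;     /* true == check passes */
    }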
2023-06-10selftests/mm: gup_longterm: add liburing testsDavid Hildenbrand1-0/+73
Similar to the COW selftests, also use io_uring fixed buffers to test if long-term page pinning works as expected. Link: https://lkml.kernel.org/r/20230519102723.185721-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10selftests/mm: gup_longterm: new functional test for FOLL_LONGTERMDavid Hildenbrand4-1/+393
Let's add a new test for checking whether GUP long-term page pinning works as expected (R/O vs. R/W, MAP_PRIVATE vs. MAP_SHARED, GUP vs. GUP-fast). Note that COW handling with long-term R/O pinning in private mappings, and pinning of anonymous memory in general, is tested by the COW selftest. This test, therefore, focuses on page pinning in file mappings. The most interesting case is probably the "local tmpfile" case, as that will likely end up on a "real" filesystem such as ext4 or xfs, not on a virtual one like tmpfs or hugetlb where any long-term page pinning is always expected to succeed. For now, only add tests that use the "/sys/kernel/debug/gup_test" interface. We'll add tests based on liburing separately next. [akpm@linux-foundation.org: update .gitignore for gup_longterm, per Peter] Link: https://lkml.kernel.org/r/20230519102723.185721-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10selftests/mm: factor out detection of hugetlb page sizes into vm_utilDavid Hildenbrand3-27/+30
Patch series "selftests/mm: new test for FOLL_LONGTERM on file mappings". Let's add some selftests to make sure that: * R/O long-term pinning always works of file mappings * R/W long-term pinning always works in MAP_PRIVATE file mappings * R/W long-term pinning only works in MAP_SHARED mappings with special filesystems (shmem, hugetlb) and fails with other filesystems (ext4, btrfs, xfs). The tests make use of the gup_test kernel module to trigger ordinary GUP and GUP-fast, and liburing (similar to our COW selftests). Test with memfd, memfd hugetlb, tmpfile() and mkstemp(). The latter usually gives us a "real" filesystem (ext4, btrfs, xfs) where long-term pinning is expected to fail. Note that these selftests don't contain any actual reproducers for data corruptions in case R/W long-term pinning on problematic filesystems "would" work. Maybe we can later come up with a racy !FOLL_LONGTERM reproducer that can reuse an existing interface to trigger short-term pinning (I'll look into that next). On current mm/mm-unstable: # ./gup_longterm # [INFO] detected hugetlb page size: 2048 KiB # [INFO] detected hugetlb page size: 1048576 KiB TAP version 13 1..50 # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd ok 1 Should have worked # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with tmpfile ok 2 Should have worked # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with local tmpfile ok 3 Should have failed # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB) ok 4 Should have worked # [RUN] R/W longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB) ok 5 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd ok 6 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with tmpfile ok 7 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with local tmpfile ok 8 Should have failed # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB) ok 9 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB) ok 10 Should have worked # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd ok 11 Should have worked # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with tmpfile ok 12 Should have worked # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with local tmpfile ok 13 Should have worked # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB) ok 14 Should have worked # [RUN] R/O longterm GUP pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB) ok 15 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd ok 16 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with tmpfile ok 17 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with local tmpfile ok 18 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (2048 kB) ok 19 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB) ok 20 Should have worked # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd ok 21 Should have worked # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... 
with tmpfile ok 22 Should have worked # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with local tmpfile ok 23 Should have worked # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB) ok 24 Should have worked # [RUN] R/W longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB) ok 25 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd ok 26 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with tmpfile ok 27 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with local tmpfile ok 28 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB) ok 29 Should have worked # [RUN] R/W longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB) ok 30 Should have worked # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd ok 31 Should have worked # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with tmpfile ok 32 Should have worked # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with local tmpfile ok 33 Should have worked # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB) ok 34 Should have worked # [RUN] R/O longterm GUP pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB) ok 35 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd ok 36 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with tmpfile ok 37 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with local tmpfile ok 38 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB) ok 39 Should have worked # [RUN] R/O longterm GUP-fast pin in MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB) ok 40 Should have worked # [RUN] io_uring fixed buffer with MAP_SHARED file mapping ... with memfd ok 41 Should have worked # [RUN] io_uring fixed buffer with MAP_SHARED file mapping ... with tmpfile ok 42 Should have worked # [RUN] io_uring fixed buffer with MAP_SHARED file mapping ... with local tmpfile ok 43 Should have failed # [RUN] io_uring fixed buffer with MAP_SHARED file mapping ... with memfd hugetlb (2048 kB) ok 44 Should have worked # [RUN] io_uring fixed buffer with MAP_SHARED file mapping ... with memfd hugetlb (1048576 kB) ok 45 Should have worked # [RUN] io_uring fixed buffer with MAP_PRIVATE file mapping ... with memfd ok 46 Should have worked # [RUN] io_uring fixed buffer with MAP_PRIVATE file mapping ... with tmpfile ok 47 Should have worked # [RUN] io_uring fixed buffer with MAP_PRIVATE file mapping ... with local tmpfile ok 48 Should have worked # [RUN] io_uring fixed buffer with MAP_PRIVATE file mapping ... with memfd hugetlb (2048 kB) ok 49 Should have worked # [RUN] io_uring fixed buffer with MAP_PRIVATE file mapping ... with memfd hugetlb (1048576 kB) ok 50 Should have worked # Totals: pass:50 fail:0 xfail:0 xpass:0 skip:0 error:0 This patch (of 3): Let's factor detection out into vm_util, to be reused by a new test. 
Link: https://lkml.kernel.org/r/20230519102723.185721-1-david@redhat.com Link: https://lkml.kernel.org/r/20230519102723.185721-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10mm: compaction: avoid GFP_NOFS ABBA deadlockJohannes Weiner1-2/+14
During stress testing with higher-order allocations, a deadlock scenario was observed in compaction: One GFP_NOFS allocation was sleeping on mm/compaction.c::too_many_isolated(), while all CPUs in the system were busy with compactors spinning on buffer locks held by the sleeping GFP_NOFS allocation. Reclaim is susceptible to this same deadlock; we fixed it by granting GFP_NOFS allocations additional LRU isolation headroom, to ensure it makes forward progress while holding fs locks that other reclaimers might acquire. Do the same here. This code has been like this since compaction was initially merged, and I only managed to trigger this with out-of-tree patches that dramatically increase the contexts that do GFP_NOFS compaction. While the issue is real, it seems theoretical in nature given existing allocation sites. Worth fixing now, but no Fixes tag or stable CC. Link: https://lkml.kernel.org/r/20230519111359.40475-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
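The headroom grant is conceptually a small tweak in compaction's too_many_isolated() (a sketch modeled on the reclaim-side precedent; the exact shift amount is an assumption):

    /*
     * GFP_NOFS contexts may hold fs locks that other compactors need to
     * make progress, so let them isolate past the regular limit rather
     * than sleep waiting on everyone else.
     */
    if (cc->gfp_mask & __GFP_FS) {
            inactive >>= 3;
            active >>= 3;
    }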
2023-06-10mm: compaction: have compaction_suitable() return boolJohannes Weiner3-40/+36
Since it only returns COMPACT_CONTINUE or COMPACT_SKIPPED now, a bool return value simplifies the callsites. Link: https://lkml.kernel.org/r/20230602151204.GD161817@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-10mm: compaction: drop redundant watermark check in compaction_zonelist_suitable()Johannes Weiner1-7/+0
The watermark check in compaction_zonelist_suitable(), called from should_compact_retry(), is sandwiched between two watermark checks already: before, there are freelist attempts as part of direct reclaim and direct compaction; after, there is a last-minute freelist attempt in __alloc_pages_may_oom(). The check in compaction_zonelist_suitable() isn't necessary. Kill it. Link: https://lkml.kernel.org/r/20230519123959.77335-6-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>