summaryrefslogtreecommitdiff
path: root/mm/internal.h
diff options
context:
space:
mode:
authorVlastimil Babka <vbabka@suse.cz>2020-12-15 06:10:59 +0300
committerLinus Torvalds <torvalds@linux-foundation.org>2020-12-15 23:13:43 +0300
commitec6e8c7e03147c65380e6c04c4cf4290e96280b6 (patch)
tree6bac0857cc5fccd8d67116f9e3bcb87867accf80 /mm/internal.h
parent7612921f2376d51d020ae2f06ffb7da40422b75b (diff)
downloadlinux-ec6e8c7e03147c65380e6c04c4cf4290e96280b6.tar.xz
mm, page_alloc: disable pcplists during memory offline
Memory offlining relies on page isolation to guarantee a forward progress because pages cannot be reused while they are isolated. But the page isolation itself doesn't prevent from races while freed pages are stored on pcp lists and thus can be reused. This can be worked around by repeated draining of pcplists, as done by commit 968318261221 ("mm/memory_hotplug: drain per-cpu pages again during memory offline"). David and Michal would prefer that this race was closed in a way that callers of page isolation who need stronger guarantees don't need to repeatedly drain. David suggested disabling pcplists usage completely during page isolation, instead of repeatedly draining them. To achieve this without adding special cases in alloc/free fastpath, we can use the same approach as boot pagesets - when pcp->high is 0, any pcplist addition will be immediately flushed. The race can thus be closed by setting pcp->high to 0 and draining pcplists once, before calling start_isolate_page_range(). The draining will serialize after processes that already disabled interrupts and read the old value of pcp->high in free_unref_page_commit(), and processes that have not yet disabled interrupts, will observe pcp->high == 0 when they are rescheduled, and skip pcplists. This guarantees no stray pages on pcplists in zones where isolation happens. This patch thus adds zone_pcp_disable() and zone_pcp_enable() functions that page isolation users can call before start_isolate_page_range() and after unisolating (or offlining) the isolated pages. Also, drain_all_pages() is optimized to only execute on cpus where pcplists are not empty. The check can however race with a free to pcplist that has not yet increased the pcp->count from 0 to 1. Thus make the drain optionally skip the racy check and drain on all cpus, and use this option in zone_pcp_disable(). As we have to avoid external updates to high and batch while pcplists are disabled, we take pcp_batch_high_lock in zone_pcp_disable() and release it in zone_pcp_enable(). This also synchronizes multiple users of zone_pcp_disable()/enable(). Currently the only user of this functionality is offline_pages(). [vbabka@suse.cz: add comment, per David] Link: https://lkml.kernel.org/r/527480ef-ed72-e1c1-52a0-1c5b0113df45@suse.cz Link: https://lkml.kernel.org/r/20201111092812.11329-8-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Suggested-by: David Hildenbrand <david@redhat.com> Suggested-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'mm/internal.h')
-rw-r--r--mm/internal.h2
1 files changed, 2 insertions, 0 deletions
diff --git a/mm/internal.h b/mm/internal.h
index fd6734be9c85..25d2b2439f19 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -204,6 +204,8 @@ extern void free_unref_page_list(struct list_head *list);
extern void zone_pcp_update(struct zone *zone);
extern void zone_pcp_reset(struct zone *zone);
+extern void zone_pcp_disable(struct zone *zone);
+extern void zone_pcp_enable(struct zone *zone);
#if defined CONFIG_COMPACTION || defined CONFIG_CMA