diff options
author | Huang Ying <ying.huang@intel.com> | 2023-10-16 08:29:56 +0300 |
---|---|---|
committer | Andrew Morton <akpm@linux-foundation.org> | 2023-10-26 02:47:10 +0300 |
commit | 362d37a106dd3f6431b2fdd91d9208b0d023b50d (patch) | |
tree | f51d1db4a765d1594fd34b62a96ca872d8e639da /include/linux | |
parent | 94a3bfe4073cd88b05f7fb201ea7bf9dfa2cf5d5 (diff) | |
download | linux-362d37a106dd3f6431b2fdd91d9208b0d023b50d.tar.xz |
mm, pcp: reduce lock contention for draining high-order pages
In commit f26b3fa04611 ("mm/page_alloc: limit number of high-order pages
on PCP during bulk free"), the PCP (Per-CPU Pageset) will be drained when
PCP is mostly used for high-order pages freeing to improve the cache-hot
pages reusing between page allocating and freeing CPUs.
On system with small per-CPU data cache slice, pages shouldn't be cached
before draining to guarantee cache-hot. But on a system with large
per-CPU data cache slice, some pages can be cached before draining to
reduce zone lock contention.
So, in this patch, instead of draining without any caching, "pcp->batch"
pages will be cached in PCP before draining if the size of the per-CPU
data cache slice is more than "3 * batch".
In theory, if the size of per-CPU data cache slice is more than "2 *
batch", we can reuse cache-hot pages between CPUs. But considering the
other usage of cache (code, other data accessing, etc.), "3 * batch" is
used.
Note: "3 * batch" is chosen to make sure the optimization works on recent
x86_64 server CPUs. If you want to increase it, please check whether it
breaks the optimization.
On a 2-socket Intel server with 128 logical CPU, with the patch, the
network bandwidth of the UNIX (AF_UNIX) test case of lmbench test suite
with 16-pair processes increase 70.5%. The cycles% of the spinlock
contention (mostly for zone lock) decreases from 46.1% to 21.3%. The
number of PCP draining for high order pages freeing (free_high) decreases
89.9%. The cache miss rate keeps 0.2%.
Link: https://lkml.kernel.org/r/20231016053002.756205-4-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'include/linux')
-rw-r--r-- | include/linux/gfp.h | 1 | ||||
-rw-r--r-- | include/linux/mmzone.h | 6 |
2 files changed, 7 insertions, 0 deletions
diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 665f06675c83..665edc11fb9f 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -325,6 +325,7 @@ void drain_all_pages(struct zone *zone); void drain_local_pages(struct zone *zone); void page_alloc_init_late(void); +void setup_pcp_cacheinfo(void); /* * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index de313f1c15f9..efe72b3f7872 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -680,8 +680,14 @@ enum zone_watermarks { * PCPF_PREV_FREE_HIGH_ORDER: a high-order page is freed in the * previous page freeing. To avoid to drain PCP for an accident * high-order page freeing. + * + * PCPF_FREE_HIGH_BATCH: preserve "pcp->batch" pages in PCP before + * draining PCP for consecutive high-order pages freeing without + * allocation if data cache slice of CPU is large enough. To reduce + * zone lock contention and keep cache-hot pages reusing. */ #define PCPF_PREV_FREE_HIGH_ORDER BIT(0) +#define PCPF_FREE_HIGH_BATCH BIT(1) struct per_cpu_pages { spinlock_t lock; /* Protects lists field */ |