From d0bf7d5759c1d89fb013aa41cca5832e00b9632a Mon Sep 17 00:00:00 2001
From: Jesper Dangaard Brouer
Date: Tue, 17 Jan 2023 14:40:00 +0100
Subject: mm/slab: introduce kmem_cache flag SLAB_NO_MERGE

Allow API users of kmem_cache_create to specify that they don't want
any slab merge or aliasing (with similar sized objects). Use this in
kfence_test.

The SKB (sk_buff) kmem_cache slab is critical for network performance.
The network stack uses the kmem_cache_{alloc,free}_bulk APIs to gain
performance by amortising the alloc/free cost. For the bulk API to
perform efficiently, slab fragmentation needs to be low. Especially for
the SLUB allocator, the efficiency of the bulk free API depends on
objects belonging to the same slab (page).

When running different network performance microbenchmarks, I started
to notice that performance was reduced (slightly) when machines had
longer uptimes. I believe the cause was that 'skbuff_head_cache' got
aliased/merged into the general slub cache for 256-byte sized objects
(with my kernel config, without CONFIG_HARDENED_USERCOPY).

For the SKB kmem_cache, the network stack has reasons for not merging,
but they vary depending on kernel config (e.g.
CONFIG_HARDENED_USERCOPY). We want to explicitly set SLAB_NO_MERGE for
this kmem_cache.

Another use case for the flag has been described by David Sterba [1]:

> This can be used for more fine grained control over the caches or for
> debugging builds where separate slabs can verify that no objects leak.
>
> The slab_nomerge boot option is too coarse and would need to be
> enabled on all testing hosts. There are some other ways to disable
> merging, e.g. a slab constructor, but that disables poisoning and adds
> additional overhead besides. Other flags are internal and may have
> other semantics.
>
> A concrete example of what motivates the flag: during 'btrfs balance',
> slabtop reported a huge increase in caches like
>
>  1330095 1330095 100%    0.10K  34105  39    136420K Acpi-ParseExt
>  1734684 1734684 100%    0.14K  61953  28    247812K pid_namespace
>  8244036 6873075  83%    0.11K 229001  36    916004K khugepaged_mm_slot
>
> which was confusing, and slab merging was not the first idea for the
> cause. After rebooting with slab_nomerge, all the caches were from the
> btrfs_ namespace as expected.

[1] https://lore.kernel.org/all/20230524101748.30714-1-dsterba@suse.com/

[ vbabka@suse.cz: rename to SLAB_NO_MERGE, change the flag value to the
  one proposed by David so it does not collide with internal SLAB/SLUB
  flags, write a comment for the flag, expand changelog, drop the skbuff
  part to be handled separately ]

Link: https://lore.kernel.org/all/167396280045.539803.7540459812377220500.stgit@firesoul/
Reported-by: David Sterba
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: Vlastimil Babka
Acked-by: Jesper Dangaard Brouer
Acked-by: Roman Gushchin
---
 include/linux/slab.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)
(limited to 'include')

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 6b3e155b70bf..72bc906d8bc7 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -106,6 +106,18 @@
 /* Avoid kmemleak tracing */
 #define SLAB_NOLEAKTRACE	((slab_flags_t __force)0x00800000U)
 
+/*
+ * Prevent merging with compatible kmem caches. This flag should be used
+ * cautiously. Valid use cases:
+ *
+ * - caches created for self-tests (e.g. kunit)
+ * - general caches created and used by a subsystem, only when a
+ *   (subsystem-specific) debug option is enabled
+ * - performance critical caches, should be very rare and consulted with slab
+ *   maintainers, and not used together with CONFIG_SLUB_TINY
+ */
+#define SLAB_NO_MERGE		((slab_flags_t __force)0x01000000U)
+
 /* Fault injection mark */
 #ifdef CONFIG_FAILSLAB
 # define SLAB_FAILSLAB	((slab_flags_t __force)0x02000000U)
--
cgit v1.2.3
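
For illustration, a minimal sketch of how a subsystem might request an
unmerged cache with the new flag (this is not part of the patch). The
"foo_cache" name and struct foo_obj are hypothetical stand-ins;
kmem_cache_create() and SLAB_NO_MERGE are the interfaces shown above:

    #include <linux/slab.h>

    /* Hypothetical object type; any fixed-size object works. */
    struct foo_obj {
            unsigned long key;
            void *payload;
    };

    static struct kmem_cache *foo_cache;

    static int __init foo_cache_init(void)
    {
            /*
             * SLAB_NO_MERGE keeps this cache from being merged/aliased
             * with other caches of compatible size and flags, so it keeps
             * its own slabs (good for bulk-free locality) and its own
             * /proc/slabinfo entry.
             */
            foo_cache = kmem_cache_create("foo_cache",
                                          sizeof(struct foo_obj), 0,
                                          SLAB_NO_MERGE, NULL);
            return foo_cache ? 0 : -ENOMEM;
    }

Without the flag, a compatible cache may be merged into a general
kmalloc cache, which is exactly what the changelog above describes
happening to skbuff_head_cache.
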
kunit) + * - general caches created and used by a subsystem, only when a + * (subsystem-specific) debug option is enabled + * - performance critical caches, should be very rare and consulted with slab + * maintainers, and not used together with CONFIG_SLUB_TINY + */ +#define SLAB_NO_MERGE ((slab_flags_t __force)0x01000000U) + /* Fault injection mark */ #ifdef CONFIG_FAILSLAB # define SLAB_FAILSLAB ((slab_flags_t __force)0x02000000U) -- cgit v1.2.3 From 9ca73f2645706230249c4ec2a2b0cab9515987c8 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Mon, 17 Apr 2023 19:04:49 +0000 Subject: mm/slab: add a missing semicolon on SLAB_TYPESAFE_BY_RCU example code An example code snippet for SLAB_TYPESAFE_BY_RCU is missing a semicolon. Add it. Signed-off-by: SeongJae Park Reviewed-by: Paul E. McKenney Signed-off-by: Vlastimil Babka --- include/linux/slab.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/slab.h b/include/linux/slab.h index 6b3e155b70bf..5eeedbfffcd2 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -53,7 +53,7 @@ * stays valid, the trick to using this is relying on an independent * object validation pass. Something like: * - * rcu_read_lock() + * rcu_read_lock(); * again: * obj = lockless_lookup(key); * if (obj) { -- cgit v1.2.3 From 1143c9d9d7602f20ba7bb3cef0d07b10f23cbef7 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Mon, 17 Apr 2023 19:04:50 +0000 Subject: mm/slab: break up RCU readers on SLAB_TYPESAFE_BY_RCU example code The SLAB_TYPESAFE_BY_RCU example code snippet uses a single RCU read-side critical section for retries. 'Documentation/RCU/rculist_nulls.rst' has similar example code snippet, and commit da82af04352b ("doc: Update and wordsmith rculist_nulls.rst") broke it up. Apply the change to SLAB_TYPESAFE_BY_RCU example code snippet, too. Signed-off-by: SeongJae Park Reviewed-by: Paul E. McKenney Signed-off-by: Vlastimil Babka --- include/linux/slab.h | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'include') diff --git a/include/linux/slab.h b/include/linux/slab.h index 5eeedbfffcd2..c6bc05765bdb 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -53,16 +53,18 @@ * stays valid, the trick to using this is relying on an independent * object validation pass. Something like: * + * begin: * rcu_read_lock(); - * again: * obj = lockless_lookup(key); * if (obj) { * if (!try_get_ref(obj)) // might fail for free objects - * goto again; + * rcu_read_unlock(); + * goto begin; * * if (obj->key != key) { // not the object we expected * put_ref(obj); - * goto again; + * rcu_read_unlock(); + * goto begin; * } * } * rcu_read_unlock(); -- cgit v1.2.3