inet: frags: batch fqdir destroy works

On a few of our systems, I found that frequent 'unshare(CLONE_NEWNET)' calls make the number of active slab objects, including those of the 'sock_inode_cache' type, increase rapidly and continuously. As a result, memory pressure occurs.

In more detail, I made an artificial reproducer that resembles the workload in which we found the problem and reproduces the problem faster. It merely repeats 'unshare(CLONE_NEWNET)' 50,000 times in a loop, which takes about 2 minutes. On a machine with 40 CPU cores and 70GB of DRAM, the available memory decreased continuously and quickly (about 120MB per second, 15GB in total within the 2 minutes). Note that the issue doesn't reproduce on every machine. On my 6 CPU core machine, the problem didn't reproduce.

'cleanup_net()' and 'fqdir_work_fn()' are the functions that deallocate the relevant memory objects. They are invoked asynchronously by work queues and internally use 'rcu_barrier()' to ensure safe destruction. 'cleanup_net()' works in a batched manner in a single-threaded worker, while 'fqdir_work_fn()' runs once for each 'fqdir_exit()' call in the 'system_wq'. Therefore, 'fqdir_work_fn()' was called frequently under the workload and caused high contention on 'rcu_barrier()'. In more detail, the global mutex 'rcu_state.barrier_mutex' became the bottleneck.

This commit avoids such contention by doing the 'rcu_barrier()' and the subsequent lightweight work in a batched manner, similar to 'cleanup_net()'. The fqdir hashtable destruction, which is done before the 'rcu_barrier()', is still allowed to run in parallel for fast processing, but this commit makes it use a dedicated work queue instead of the 'system_wq', to make sure that the number of threads is bounded.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201211112405.31158-1-sjpark@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
author: SeongJae Park <sjpark@amazon.de> 2020-12-11 14:24:05 +0300
committer: Jakub Kicinski <kuba@kernel.org> 2020-12-13 02:08:54 +0300
commit: 0b9b241406818a871c6d25390aa487dba966d548 (patch)
tree: 8d1b4990106eed407ef7704833fe892584c37221 /include/net/inet_frag.h
parent: e0a64d1dffca048a99546993322bd1fb5c728ee8 (diff)
download: linux-0b9b241406818a871c6d25390aa487dba966d548.tar.xz
1 files changed, 1 insertions, 0 deletions
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index bac79e817776..48cc5795ceda 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -21,6 +21,7 @@ struct fqdir {
 	/* Keep atomic mem on separate cachelines in structs that include it */
 	atomic_long_t		mem ____cacheline_aligned_in_smp;
 	struct work_struct	destroy_work;
+	struct llist_node	free_list;
 };
 
 /**
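
The batching idea from the commit message can be illustrated with a small userspace sketch. This is not the kernel code from the patch: 'struct fake_fqdir', 'push_pending()', 'take_all()', 'fake_rcu_barrier()' and 'free_batch()' are hypothetical names, and C11 atomics stand in for the kernel's lock-free 'llist'. It only shows, under those assumptions, why queuing each object on a shared list lets a single barrier cover a whole batch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct llist_node. */
struct node {
	struct node *next;
};

/* Hypothetical stand-in for struct fqdir; only the list linkage matters here. */
struct fake_fqdir {
	int id;
	bool freed;
	struct node free_list;	/* plays the role of the field added by this patch */
};

static _Atomic(struct node *) pending;	/* shared list of objects awaiting the final free */
static int barrier_calls;		/* counts simulated rcu_barrier() invocations */

/* Lock-free push. Returns true if the list was empty before the push,
 * which is the moment a caller would schedule the batched free work. */
static bool push_pending(struct node *n)
{
	struct node *first = atomic_load(&pending);

	assert(n != NULL);
	do {
		n->next = first;
	} while (!atomic_compare_exchange_weak(&pending, &first, n));
	return first == NULL;
}

/* Atomically detach the whole list so it can be processed privately. */
static struct node *take_all(void)
{
	return atomic_exchange(&pending, NULL);
}

/* Simulated expensive barrier; here it only counts how often it runs. */
static void fake_rcu_barrier(void)
{
	barrier_calls++;
}

/* Batched free: one barrier for every object queued so far.
 * Returns the number of objects released in this batch. */
static int free_batch(void)
{
	struct node *n = take_all();
	int freed = 0;

	if (!n)
		return 0;	/* nothing queued, so skip the barrier entirely */

	fake_rcu_barrier();
	while (n) {
		struct node *next = n->next;
		/* Recover the containing object from its embedded list node. */
		struct fake_fqdir *f = (struct fake_fqdir *)
			((char *)n - offsetof(struct fake_fqdir, free_list));

		f->freed = true;
		freed++;
		n = next;
	}
	return freed;
}
```

In this sketch, a caller would invoke 'push_pending()' once per destroyed object and schedule 'free_batch()' only when the push reports that the list was previously empty. However many objects accumulate before the batch runs, the barrier is paid once per batch rather than once per object.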