summaryrefslogtreecommitdiff
path: root/include
diff options
context:
space:
mode:
authorJohn Fastabend <john.fastabend@gmail.com>2023-05-23 05:56:08 +0300
committerDaniel Borkmann <daniel@iogearbox.net>2023-05-23 17:10:11 +0300
commit405df89dd52cbcd69a3cd7d9a10d64de38f854b2 (patch)
treef71363cbf838188c795fded16e7f6cad145245de /include
parentbce22552f92ea7c577f49839b8e8f7d29afaf880 (diff)
downloadlinux-405df89dd52cbcd69a3cd7d9a10d64de38f854b2.tar.xz
bpf, sockmap: Improved check for empty queue
We noticed some rare sk_buffs were stepping past the queue when system was under memory pressure. The general theory is to skip enqueueing sk_buffs when its not necessary which is the normal case with a system that is properly provisioned for the task, no memory pressure and enough cpu assigned. But, if we can't allocate memory due to an ENOMEM error when enqueueing the sk_buff into the sockmap receive queue we push it onto a delayed workqueue to retry later. When a new sk_buff is received we then check if that queue is empty. However, there is a problem with simply checking the queue length. When a sk_buff is being processed from the ingress queue but not yet on the sockmap msg receive queue its possible to also recv a sk_buff through normal path. It will check the ingress queue which is zero and then skip ahead of the pkt being processed. Previously we used sock lock from both contexts which made the problem harder to hit, but not impossible. To fix instead of popping the skb from the queue entirely we peek the skb from the queue and do the copy there. This ensures checks to the queue length are non-zero while skb is being processed. Then finally when the entire skb has been copied to user space queue or another socket we pop it off the queue. This way the queue length check allows bypassing the queue only after the list has been completely processed. To reproduce issue we run NGINX compliance test with sockmap running and observe some flakes in our testing that we attributed to this issue. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: William Findlay <will@isovalent.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-5-john.fastabend@gmail.com
Diffstat (limited to 'include')
-rw-r--r--include/linux/skmsg.h1
1 files changed, 0 insertions, 1 deletions
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 904ff9a32ad6..054d7911bfc9 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -71,7 +71,6 @@ struct sk_psock_link {
};
struct sk_psock_work_state {
- struct sk_buff *skb;
u32 len;
u32 off;
};