summaryrefslogtreecommitdiff
path: root/net/netfilter
AgeCommit message (Collapse)AuthorFilesLines
2023-12-06netfilter: xt_owner: Fix for unsafe access of sk->sk_socketPhil Sutter1-4/+12
A concurrently running sock_orphan() may NULL the sk_socket pointer in between check and deref. Follow other users (like nft_meta.c for instance) and acquire sk_callback_lock before dereferencing sk_socket. Fixes: 0265ab44bacc ("[NETFILTER]: merge ipt_owner/ip6t_owner in xt_owner") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06netfilter: nf_tables: validate family when identifying table via handlePablo Neira Ayuso1-2/+3
Validate table family when looking up for it via NFTA_TABLE_HANDLE. Fixes: 3ecbfd65f50e ("netfilter: nf_tables: allocate handle and delete objects via handle") Reported-by: Xingyuan Mo <hdthky0@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06netfilter: nf_tables: bail out on mismatching dynset and set expressionsPablo Neira Ayuso1-4/+9
If dynset expressions provided by userspace is larger than the declared set expressions, then bail out. Fixes: 48b0ae046ee9 ("netfilter: nftables: netlink support for several set element expressions") Reported-by: Xingyuan Mo <hdthky0@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06netfilter: nf_tables: fix 'exist' matching on bigendian archesFlorian Westphal2-4/+8
Maze reports "tcp option fastopen exists" fails to match on OpenWrt 22.03.5, r20134-5f15225c1e (5.10.176) router. "tcp option fastopen exists" translates to: inet [ exthdr load tcpopt 1b @ 34 + 0 present => reg 1 ] [ cmp eq reg 1 0x00000001 ] .. but existing nft userspace generates a 1-byte compare. On LSB (x86), "*reg32 = 1" is identical to nft_reg_store8(reg32, 1), but not on MSB, which will place the 1 last. IOW, on bigendian aches the cmp8 is awalys false. Make sure we store this in a consistent fashion, so existing userspace will also work on MSB (bigendian). Regardless of this patch we can also change nft userspace to generate 'reg32 == 0' and 'reg32 != 0' instead of u8 == 0 // u8 == 1 when adding 'option x missing/exists' expressions as well. Fixes: 3c1fece8819e ("netfilter: nft_exthdr: Allow checking TCP option presence, too") Fixes: b9f9a485fb0e ("netfilter: nft_exthdr: add boolean DCCP option matching") Fixes: 055c4b34b94f ("netfilter: nft_fib: Support existence check") Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com> Closes: https://lore.kernel.org/netfilter-devel/CAHo-OozyEqHUjL2-ntATzeZOiuftLWZ_HU6TOM_js4qLfDEAJg@mail.gmail.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06netfilter: nft_set_pipapo: skip inactive elements during set walkFlorian Westphal1-0/+3
Otherwise set elements can be deactivated twice which will cause a crash. Reported-by: Xingyuan Mo <hdthky0@gmail.com> Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-06netfilter: bpf: fix bad registration on nf_defragD. Wythe1-5/+5
We should pass a pointer to global_hook to the get_proto_defrag_hook() instead of its value, since the passed value won't be updated even if the request module was loaded successfully. Log: [ 54.915713] nf_defrag_ipv4 has bad registration [ 54.915779] WARNING: CPU: 3 PID: 6323 at net/netfilter/nf_bpf_link.c:62 get_proto_defrag_hook+0x137/0x160 [ 54.915835] CPU: 3 PID: 6323 Comm: fentry Kdump: loaded Tainted: G E 6.7.0-rc2+ #35 [ 54.915839] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 [ 54.915841] RIP: 0010:get_proto_defrag_hook+0x137/0x160 [ 54.915844] Code: 4f 8c e8 2c cf 68 ff 80 3d db 83 9a 01 00 0f 85 74 ff ff ff 48 89 ee 48 c7 c7 8f 12 4f 8c c6 05 c4 83 9a 01 01 e8 09 ee 5f ff <0f> 0b e9 57 ff ff ff 49 8b 3c 24 4c 63 e5 e8 36 28 6c ff 4c 89 e0 [ 54.915849] RSP: 0018:ffffb676003fbdb0 EFLAGS: 00010286 [ 54.915852] RAX: 0000000000000023 RBX: ffff9596503d5600 RCX: ffff95996fce08c8 [ 54.915854] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff95996fce08c0 [ 54.915855] RBP: ffffffff8c4f12de R08: 0000000000000000 R09: 00000000fffeffff [ 54.915859] R10: ffffb676003fbc70 R11: ffffffff8d363ae8 R12: 0000000000000000 [ 54.915861] R13: ffffffff8e1f75c0 R14: ffffb676003c9000 R15: 00007ffd15e78ef0 [ 54.915864] FS: 00007fb6e9cab740(0000) GS:ffff95996fcc0000(0000) knlGS:0000000000000000 [ 54.915867] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 54.915868] CR2: 00007ffd15e75c40 CR3: 0000000101e62006 CR4: 0000000000360ef0 [ 54.915870] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 54.915871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 54.915873] Call Trace: [ 54.915891] <TASK> [ 54.915894] ? __warn+0x84/0x140 [ 54.915905] ? get_proto_defrag_hook+0x137/0x160 [ 54.915908] ? __report_bug+0xea/0x100 [ 54.915925] ? report_bug+0x2b/0x80 [ 54.915928] ? handle_bug+0x3c/0x70 [ 54.915939] ? exc_invalid_op+0x18/0x70 [ 54.915942] ? asm_exc_invalid_op+0x1a/0x20 [ 54.915948] ? get_proto_defrag_hook+0x137/0x160 [ 54.915950] bpf_nf_link_attach+0x1eb/0x240 [ 54.915953] link_create+0x173/0x290 [ 54.915969] __sys_bpf+0x588/0x8f0 [ 54.915974] __x64_sys_bpf+0x20/0x30 [ 54.915977] do_syscall_64+0x45/0xf0 [ 54.915989] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 54.915998] RIP: 0033:0x7fb6e9daa51d [ 54.916001] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2b 89 0c 00 f7 d8 64 89 01 48 [ 54.916003] RSP: 002b:00007ffd15e78ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 [ 54.916006] RAX: ffffffffffffffda RBX: 00007ffd15e78fc0 RCX: 00007fb6e9daa51d [ 54.916007] RDX: 0000000000000040 RSI: 00007ffd15e78ef0 RDI: 000000000000001c [ 54.916009] RBP: 000000000000002d R08: 00007fb6e9e73a60 R09: 0000000000000001 [ 54.916010] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000006 [ 54.916012] R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000 [ 54.916014] </TASK> [ 54.916015] ---[ end trace 0000000000000000 ]--- Fixes: 91721c2d02d3 ("netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Acked-by: Daniel Xu <dxu@dxuuu.xyz> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-11-14netfilter: nf_tables: split async and sync catchall in two functionsPablo Neira Ayuso1-25/+30
list_for_each_entry_safe() does not work for the async case which runs under RCU, therefore, split GC logic for catchall in two functions instead, one for each of the sync and async GC variants. The catchall sync GC variant never sees a _DEAD bit set on ever, thus, this handling is removed in such case, moreover, allocate GC sync batch via GFP_KERNEL. Fixes: 93995bf4af2c ("netfilter: nf_tables: remove catchall element in GC sync path") Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-11-14netfilter: ipset: fix race condition between swap/destroy and kernel side ↵Jozsef Kadlecsik1-7/+7
add/del/test Linkui Xiao reported that there's a race condition when ipset swap and destroy is called, which can lead to crash in add/del/test element operations. Swap then destroy are usual operations to replace a set with another one in a production system. The issue can in some cases be reproduced with the script: ipset create hash_ip1 hash:net family inet hashsize 1024 maxelem 1048576 ipset add hash_ip1 172.20.0.0/16 ipset add hash_ip1 192.168.0.0/16 iptables -A INPUT -m set --match-set hash_ip1 src -j ACCEPT while [ 1 ] do # ... Ongoing traffic... ipset create hash_ip2 hash:net family inet hashsize 1024 maxelem 1048576 ipset add hash_ip2 172.20.0.0/16 ipset swap hash_ip1 hash_ip2 ipset destroy hash_ip2 sleep 0.05 done In the race case the possible order of the operations are CPU0 CPU1 ip_set_test ipset swap hash_ip1 hash_ip2 ipset destroy hash_ip2 hash_net_kadt Swap replaces hash_ip1 with hash_ip2 and then destroy removes hash_ip2 which is the original hash_ip1. ip_set_test was called on hash_ip1 and because destroy removed it, hash_net_kadt crashes. The fix is to force ip_set_swap() to wait for all readers to finish accessing the old set pointers by calling synchronize_rcu(). The first version of the patch was written by Linkui Xiao <xiaolinkui@kylinos.cn>. v2: synchronize_rcu() is moved into ip_set_swap() in order not to burden ip_set_destroy() unnecessarily when all sets are destroyed. v3: Florian Westphal pointed out that all netfilter hooks run with rcu_read_lock() held and em_ipset.c wraps the entire ip_set_test() in rcu read lock/unlock pair. So there's no need to extend the rcu read locked area in ipset itself. Closes: https://lore.kernel.org/all/69e7963b-e7f8-3ad0-210-7b86eebf7f78@netfilter.org/ Reported by: Linkui Xiao <xiaolinkui@kylinos.cn> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-11-14netfilter: nf_tables: bogus ENOENT when destroying element which does not existPablo Neira Ayuso1-2/+3
destroy element command bogusly reports ENOENT in case a set element does not exist. ENOENT errors are skipped, however, err is still set and propagated to userspace. # nft destroy element ip raw BLACKLIST { 1.2.3.4 } Error: Could not process rule: No such file or directory destroy element ip raw BLACKLIST { 1.2.3.4 } ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Fixes: f80a612dd77c ("netfilter: nf_tables: add support to destroy operation") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-11-14netfilter: nf_tables: fix pointer math issue in nft_byteorder_eval()Dan Carpenter2-3/+4
The problem is in nft_byteorder_eval() where we are iterating through a loop and writing to dst[0], dst[1], dst[2] and so on... On each iteration we are writing 8 bytes. But dst[] is an array of u32 so each element only has space for 4 bytes. That means that every iteration overwrites part of the previous element. I spotted this bug while reviewing commit caf3ef7468f7 ("netfilter: nf_tables: prevent OOB access in nft_byteorder_eval") which is a related issue. I think that the reason we have not detected this bug in testing is that most of time we only write one element. Fixes: ce1e7989d989 ("netfilter: nft_byteorder: provide 64bit le/be conversion") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-11-14netfilter: nft_set_rbtree: Remove unused variable nft_netYang Li1-2/+0
The code that uses nft_net has been removed, and the nft_pernet function is merely obtaining a reference to shared data through the net pointer. The content of the net pointer is not modified or changed, so both of them should be removed. silence the warning: net/netfilter/nft_set_rbtree.c:627:26: warning: variable ‘nft_net’ set but not used Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7103 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-11-09Merge tag 'nf-23-11-08' of ↵Jakub Kicinski27-7/+69
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Add missing netfilter modules description to fix W=1, from Florian Westphal. 2) Fix catch-all element GC with timeout when use with the pipapo set backend, this remained broken since I tried to fix it this summer, then another attempt to fix it recently. 3) Add missing IPVS modules descriptions to fix W=1, also from Florian. 4) xt_recent allocated a too small buffer to store an IPv4-mapped IPv6 address which can be parsed by in6_pton(), from Maciej Zenczykowski. Broken for many releases. 5) Skip IPv4-mapped IPv6, IPv4-compat IPv6, site/link local scoped IPv6 addressses to set up IPv6 NAT redirect, also from Florian. This is broken since 2012. * tag 'nf-23-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses netfilter: xt_recent: fix (increase) ipv6 literal buffer length ipvs: add missing module descriptions netfilter: nf_tables: remove catchall element in GC sync path netfilter: add missing module descriptions ==================== Link: https://lore.kernel.org/r/20231108155802.84617-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-08netfilter: nat: fix ipv6 nat redirect with mapped and scoped addressesFlorian Westphal1-1/+26
The ipv6 redirect target was derived from the ipv4 one, i.e. its identical to a 'dnat' with the first (primary) address assigned to the network interface. The code has been moved around to make it usable from nf_tables too, but its still the same as it was back when this was added in 2012. IPv6, however, has different types of addresses, if the 'wrong' address comes first the redirection does not work. In Daniels case, the addresses are: inet6 ::ffff:192 ... inet6 2a01: ... ... so the function attempts to redirect to the mapped address. Add more checks before the address is deemed correct: 1. If the packets' daddr is scoped, search for a scoped address too 2. skip tentative addresses 3. skip mapped addresses Use the first address that appears to match our needs. Reported-by: Daniel Huhardeaux <tech@tootai.net> Closes: https://lore.kernel.org/netfilter/71be06b8-6aa0-4cf9-9e0b-e2839b01b22f@tootai.net/ Fixes: 115e23ac78f8 ("netfilter: ip6tables: add REDIRECT target") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-11-08netfilter: xt_recent: fix (increase) ipv6 literal buffer lengthMaciej Żenczykowski1-1/+1
in6_pton() supports 'low-32-bit dot-decimal representation' (this is useful with DNS64/NAT64 networks for example): # echo +aaaa:bbbb:cccc:dddd:eeee:ffff:1.2.3.4 > /proc/self/net/xt_recent/DEFAULT # cat /proc/self/net/xt_recent/DEFAULT src=aaaa:bbbb:cccc:dddd:eeee:ffff:0102:0304 ttl: 0 last_seen: 9733848829 oldest_pkt: 1 9733848829 but the provided buffer is too short: # echo +aaaa:bbbb:cccc:dddd:eeee:ffff:255.255.255.255 > /proc/self/net/xt_recent/DEFAULT -bash: echo: write error: Invalid argument Fixes: 079aa88fe717 ("netfilter: xt_recent: IPv6 support") Signed-off-by: Maciej Żenczykowski <zenczykowski@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-11-08ipvs: add missing module descriptionsFlorian Westphal16-0/+16
W=1 builds warn on missing MODULE_DESCRIPTION, add them. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-11-08netfilter: nf_tables: remove catchall element in GC sync pathPablo Neira Ayuso1-5/+17
The expired catchall element is not deactivated and removed from GC sync path. This path holds mutex so just call nft_setelem_data_deactivate() and nft_setelem_catchall_remove() before queueing the GC work. Fixes: 4a9e12ea7e70 ("netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC") Reported-by: lonial con <kongln9170@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-11-08netfilter: add missing module descriptionsFlorian Westphal9-0/+9
W=1 builds warn on missing MODULE_DESCRIPTION, add them. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-11-02bpf: Add __bpf_kfunc_{start,end}_defs macrosDave Marchevsky2-8/+4
BPF kfuncs are meant to be called from BPF programs. Accordingly, most kfuncs are not called from anywhere in the kernel, which the -Wmissing-prototypes warning is unhappy about. We've peppered __diag_ignore_all("-Wmissing-prototypes", ... everywhere kfuncs are defined in the codebase to suppress this warning. This patch adds two macros meant to bound one or many kfunc definitions. All existing kfunc definitions which use these __diag calls to suppress -Wmissing-prototypes are migrated to use the newly-introduced macros. A new __diag_ignore_all - for "-Wmissing-declarations" - is added to the __bpf_kfunc_start_defs macro based on feedback from Andrii on an earlier version of this patch [0] and another recent mailing list thread [1]. In the future we might need to ignore different warnings or do other kfunc-specific things. This change will make it easier to make such modifications for all kfunc defs. [0]: https://lore.kernel.org/bpf/CAEf4BzaE5dRWtK6RPLnjTW-MW9sx9K3Fn6uwqCTChK2Dcb1Xig@mail.gmail.com/ [1]: https://lore.kernel.org/bpf/ZT+2qCc%2FaXep0%2FLf@krava/ Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Suggested-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Cc: Jiri Olsa <olsajiri@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: David Vernet <void@manifault.com> Acked-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20231031215625.2343848-1-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-31Merge tag 'ipsec-next-2023-10-28' of ↵Jakub Kicinski1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2023-10-28 1) Remove unused function declarations of xfrm4_extract_input and xfrm6_extract_input. From Yue Haibing. 2) Annotate struct xfrm_sec_ctx with __counted_by. From Kees Cook. 3) Support GRO decapsulation for ESP in UDP encapsulation. From Antony Antony et all. 4) Replace the xfrm session decode with flow dissector. From Florian Westphal. 5) Fix a use after free in __xfrm6_udp_encap_rcv. 6) Fix the layer 4 flowi decoding. From Florian Westphal. * tag 'ipsec-next-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: policy: fix layer 4 flowi decoding xfrm Fix use after free in __xfrm6_udp_encap_rcv. xfrm: policy: replace session decode with flow dissector xfrm: move mark and oif flowi decode into common code xfrm: pass struct net to xfrm_decode_session wrappers xfrm: Support GRO for IPv6 ESP in UDP encapsulation xfrm: Support GRO for IPv4 ESP in UDP encapsulation xfrm: Use the XFRM_GRO to indicate a GRO call on input xfrm: Annotate struct xfrm_sec_ctx with __counted_by xfrm: Remove unused function declarations ==================== Link: https://lore.kernel.org/r/20231028084328.3119236-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-7/+7
Cross-merge networking fixes after downstream PR. Conflicts: net/mac80211/rx.c 91535613b609 ("wifi: mac80211: don't drop all unprotected public action frames") 6c02fab72429 ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value") Adjacent changes: drivers/net/ethernet/apm/xgene/xgene_enet_main.c 61471264c018 ("net: ethernet: apm: Convert to platform remove callback returning void") d2ca43f30611 ("net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OF") net/vmw_vsock/virtio_transport.c 64c99d2d6ada ("vsock/virtio: support to send non-linear skb") 53b08c498515 ("vsock/virtio: initialize the_virtio_vsock before using VQs") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26Merge tag 'nf-next-23-10-25' of ↵Paolo Abeni8-438/+486
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next. Mostly nf_tables updates with two patches for connlabel and br_netfilter. 1) Rename function name to perform on-demand GC for rbtree elements, and replace async GC in rbtree by sync GC. Patches from Florian Westphal. 2) Use commit_mutex for NFT_MSG_GETRULE_RESET to ensure that two concurrent threads invoking this command do not underrun stateful objects. Patches from Phil Sutter. 3) Use single hook to deal with IP and ARP packets in br_netfilter. Patch from Florian Westphal. 4) Use atomic_t in netns->connlabel use counter instead of using a spinlock, also patch from Florian. 5) Cleanups for stateful objects infrastructure in nf_tables. Patches from Phil Sutter. 6) Flush path uses opaque set element offered by the iterator, instead of calling pipapo_deactivate() which looks up for it again. 7) Set backend .flush interface always succeeds, make it return void instead. 8) Add struct nft_elem_priv placeholder structure and use it by replacing void * to pass opaque set element representation from backend to frontend which defeats compiler type checks. 9) Shrink memory consumption of set element transactions, by reducing struct nft_trans_elem object size and reducing stack memory usage. 10) Use struct nft_elem_priv also for set backend .insert operation too. 11) Carry reset flag in nft_set_dump_ctx structure, instead of passing it as a function argument, from Phil Sutter. netfilter pull request 23-10-25 * tag 'nf-next-23-10-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: Carry reset boolean in nft_set_dump_ctx netfilter: nf_tables: set->ops->insert returns opaque set element in case of EEXIST netfilter: nf_tables: shrink memory consumption of set elements netfilter: nf_tables: expose opaque set element as struct nft_elem_priv netfilter: nf_tables: set backend .flush always succeeds netfilter: nft_set_pipapo: no need to call pipapo_deactivate() from flush netfilter: nf_tables: Carry reset boolean in nft_obj_dump_ctx netfilter: nf_tables: nft_obj_filter fits into cb->ctx netfilter: nf_tables: Carry s_idx in nft_obj_dump_ctx netfilter: nf_tables: A better name for nft_obj_filter netfilter: nf_tables: Unconditionally allocate nft_obj_filter netfilter: nf_tables: Drop pointless memset in nf_tables_dump_obj netfilter: conntrack: switch connlabels to atomic_t br_netfilter: use single forward hook for ip and arp netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests netfilter: nf_tables: Introduce nf_tables_getrule_single() netfilter: nf_tables: Open-code audit log call in nf_tables_getrule() netfilter: nft_set_rbtree: prefer sync gc to async worker netfilter: nft_set_rbtree: rename gc deactivate+erase function ==================== Link: https://lore.kernel.org/r/20231025212555.132775-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-25netfilter: flowtable: GC pushes back packets to classic pathPablo Neira Ayuso1-7/+7
Since 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple"), flowtable GC pushes back flows with IPS_SEEN_REPLY back to classic path in every run, ie. every second. This is because of a new check for NF_FLOW_HW_ESTABLISHED which is specific of sched/act_ct. In Netfilter's flowtable case, NF_FLOW_HW_ESTABLISHED never gets set on and IPS_SEEN_REPLY is unreliable since users decide when to offload the flow before, such bit might be set on at a later stage. Fix it by adding a custom .gc handler that sched/act_ct can use to deal with its NF_FLOW_HW_ESTABLISHED bit. Fixes: 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple") Reported-by: Vladimir Smelhaus <vl.sm@email.cz> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: Carry reset boolean in nft_set_dump_ctxPhil Sutter1-10/+8
Relieve the dump callback from having to check nlmsg_type upon each call. Prep work for set element reset locking. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: set->ops->insert returns opaque set element in case of ↵Pablo Neira Ayuso5-23/+26
EEXIST Return struct nft_elem_priv instead of struct nft_set_ext for consistency with ("netfilter: nf_tables: expose opaque set element as struct nft_elem_priv") and to prepare the introduction of element timeout updates from control path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: shrink memory consumption of set elementsPablo Neira Ayuso5-151/+107
Instead of copying struct nft_set_elem into struct nft_trans_elem, store the pointer to the opaque set element object in the transaction. Adapt set backend API (and set backend implementations) to take the pointer to opaque set element representation whenever required. This patch deconstifies .remove() and .activate() set backend API since these modify the set element opaque object. And it also constify nft_set_elem_ext() this provides access to the nft_set_ext struct without updating the object. According to pahole on x86_64, this patch shrinks struct nft_trans_elem size from 216 to 24 bytes. This patch also reduces stack memory consumption by removing the template struct nft_set_elem object, using the opaque set element object instead such as from the set iterator API, catchall elements and the get element command. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: expose opaque set element as struct nft_elem_privPablo Neira Ayuso7-108/+148
Add placeholder structure and place it at the beginning of each struct nft_*_elem for each existing set backend, instead of exposing elements as void type to the frontend which defeats compiler type checks. Use this pointer to this new type to replace void *. This patch updates the following set backend API to use this new struct nft_elem_priv placeholder structure: - update - deactivate - flush - get as well as the following helper functions: - nft_set_elem_ext() - nft_set_elem_init() - nft_set_elem_destroy() - nf_tables_set_elem_destroy() This patch adds nft_elem_priv_cast() to cast struct nft_elem_priv to native element representation from the corresponding set backend. BUILD_BUG_ON() makes sure this .priv placeholder is always at the top of the opaque set element representation. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: set backend .flush always succeedsPablo Neira Ayuso5-22/+6
.flush is always successful since this results from iterating over the set elements to toggle mark the element as inactive in the next generation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nft_set_pipapo: no need to call pipapo_deactivate() from flushPablo Neira Ayuso1-2/+3
Use the element object that is already offered instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: Carry reset boolean in nft_obj_dump_ctxPhil Sutter1-6/+6
Relieve the dump callback from having to inspect nlmsg_type upon each call, just do it once at start of the dump. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: nft_obj_filter fits into cb->ctxPhil Sutter1-11/+5
No need to allocate it if one may just use struct netlink_callback's scratch area for it. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: Carry s_idx in nft_obj_dump_ctxPhil Sutter1-4/+5
Prep work for moving the context into struct netlink_callback scratch area. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: A better name for nft_obj_filterPhil Sutter1-16/+16
Name it for what it is supposed to become, a real nft_obj_dump_ctx. No functional change intended. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: Unconditionally allocate nft_obj_filterPhil Sutter1-21/+15
Prep work for moving the filter into struct netlink_callback's scratch area. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: Drop pointless memset in nf_tables_dump_objPhil Sutter1-3/+0
The code does not make use of cb->args fields past the first one, no need to zero them. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: conntrack: switch connlabels to atomic_tFlorian Westphal1-9/+8
The spinlock is back from the day when connabels did not have a fixed size and reallocation had to be supported. Remove it. This change also allows to call the helpers from softirq or timers without deadlocks. Also add WARN()s to catch refcounting imbalances. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requestsPhil Sutter1-13/+64
Rule reset is not concurrency-safe per-se, so multiple CPUs may reset the same rule at the same time. At least counter and quota expressions will suffer from value underruns in this case. Prevent this by introducing dedicated locking callbacks for nfnetlink and the asynchronous dump handling to serialize access. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: Introduce nf_tables_getrule_single()Phil Sutter1-31/+43
Outsource the reply skb preparation for non-dump getrule requests into a distinct function. Prep work for rule reset locking. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nf_tables: Open-code audit log call in nf_tables_getrule()Phil Sutter1-4/+15
The table lookup will be dropped from that function, so remove that dependency from audit logging code. Using whatever is in nla[NFTA_RULE_TABLE] is sufficient as long as the previous rule info filling succeded. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nft_set_rbtree: prefer sync gc to async workerFlorian Westphal1-59/+65
There is no need for asynchronous garbage collection, rbtree inserts can only happen from the netlink control plane. We already perform on-demand gc on insertion, in the area of the tree where the insertion takes place, but we don't do a full tree walk there for performance reasons. Do a full gc walk at the end of the transaction instead and remove the async worker. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-24netfilter: nft_set_rbtree: rename gc deactivate+erase functionFlorian Westphal1-5/+6
Next patch adds a cllaer that doesn't hold the priv->write lock and will need a similar function. Rename the existing function to make it clear that it can only be used for opportunistic gc during insertion. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-10-23tcp: introduce tcp_clock_ms()Eric Dumazet1-1/+1
It delivers current TCP time stamp in ms unit, and is used in place of confusing tcp_time_stamp_raw() It is the same family than tcp_clock_ns() and tcp_clock_ms(). tcp_time_stamp_raw() will be replaced later for TSval contexts with a more descriptive name. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski6-39/+40
Cross-merge networking fixes after downstream PR. net/mac80211/key.c 02e0e426a2fb ("wifi: mac80211: fix error path key leak") 2a8b665e6bcc ("wifi: mac80211: remove key_mtx") 7d6904bf26b9 ("Merge wireless into wireless-next") https://lore.kernel.org/all/20231012113648.46eea5ec@canb.auug.org.au/ Adjacent changes: drivers/net/ethernet/ti/Kconfig a602ee3176a8 ("net: ethernet: ti: Fix mixed module-builtin object") 98bdeae9502b ("net: cpmac: remove driver to prepare for platform removal") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18netfilter: nf_tables: revert do not remove elements if set backend ↵Pablo Neira Ayuso1-4/+1
implements .abort nf_tables_abort_release() path calls nft_set_elem_destroy() for NFT_MSG_NEWSETELEM which releases the element, however, a reference to the element still remains in the working copy. Fixes: ebd032fa8818 ("netfilter: nf_tables: do not remove elements if set backend implements .abort") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18netfilter: nft_set_rbtree: .deactivate fails if element has expiredPablo Neira Ayuso1-0/+2
This allows to remove an expired element which is not possible in other existing set backends, this is more noticeable if gc-interval is high so expired elements remain in the tree. On-demand gc also does not help in this case, because this is delete element path. Return NULL if element has expired. Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18netfilter: nf_tables: audit log object reset once per tablePhil Sutter1-22/+28
When resetting multiple objects at once (via dump request), emit a log message per table (or filled skb) and resurrect the 'entries' parameter to contain the number of objects being logged for. To test the skb exhaustion path, perform some bulk counter and quota adds in the kselftest. Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> (Audit) Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18netfilter: nf_tables: de-constify set commit ops function argumentFlorian Westphal1-4/+3
The set backend using this already has to work around this via ugly cast, don't spread this pattern. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18netfilter: make nftables drops visible in net dropmonitorFlorian Westphal2-4/+8
net_dropmonitor blames core.c:nf_hook_slow. Add NF_DROP_REASON() helper and use it in nft_do_chain(). The helper releases the skb, so exact drop location becomes available. Calling code will observe the NF_STOLEN verdict instead. Adjust nf_hook_slow so we can embed an erro value wih NF_STOLEN verdicts, just like we do for NF_DROP. After this, drop in nftables can be pinpointed to a drop due to a rule or the chain policy. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18netfilter: nf_nat: mask out non-verdict bits when checking return valueFlorian Westphal1-2/+3
Same as previous change: we need to mask out the non-verdict bits, as upcoming patches may embed an errno value in NF_STOLEN verdicts too. NF_DROP could already do this, but not all called functions do this. Checks that only test ret vs NF_ACCEPT are fine, the 'errno parts' are always 0 for those. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18netfilter: conntrack: convert nf_conntrack_update to netfilter verdictsFlorian Westphal2-31/+42
This function calls helpers that can return nf-verdicts, but then those get converted to -1/0 as thats what the caller expects. Theoretically NF_DROP could have an errno number set in the upper 24 bits of the return value. Or any of those helpers could return NF_STOLEN, which would result in use-after-free. This is fine as-is, the called functions don't do this yet. But its better to avoid possible future problems if the upcoming patchset to add NF_DROP_REASON() support gains further users, so remove the 0/-1 translation from the picture and pass the verdicts down to the caller. Signed-off-by: Florian Westphal <fw@strlen.de>
2023-10-18netfilter: nf_tables: mask out non-verdict bits when checking return valueFlorian Westphal2-3/+7
nftables trace infra must mask out the non-verdict bit parts of the return value, else followup changes that 'return errno << 8 | NF_STOLEN' will cause breakage. Signed-off-by: Florian Westphal <fw@strlen.de>