summaryrefslogtreecommitdiff
path: root/net/ipv6
AgeCommit message (Collapse)AuthorFilesLines
2024-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni1-13/+13
Merge in late fixes to prepare for the 6.8 net-next PR No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-09tcp: Revert no longer abort SYN_SENT when receiving some ICMPShachar Kagan1-6/+3
This reverts commit 0a8de364ff7a14558e9676f424283148110384d6. Shachar reported that Vagrant (https://www.vagrantup.com/), which is very popular tool to manage fleet of VMs stopped to work after commit citied in Fixes line. The issue appears while using Vagrant to manage nested VMs. The steps are: * create vagrant file * vagrant up * vagrant halt (VM is created but shut down) * vagrant up - fail Vagrant up stdout: Bringing machine 'player1' up with 'libvirt' provider... ==> player1: Creating shared folders metadata... ==> player1: Starting domain. ==> player1: Domain launching with graphics connection settings... ==> player1: -- Graphics Port: 5900 ==> player1: -- Graphics IP: 127.0.0.1 ==> player1: -- Graphics Password: Not defined ==> player1: -- Graphics Websocket: 5700 ==> player1: Waiting for domain to get an IP address... ==> player1: Waiting for machine to boot. This may take a few minutes... player1: SSH address: 192.168.123.61:22 player1: SSH username: vagrant player1: SSH auth method: private key ==> player1: Attempting graceful shutdown of VM... ==> player1: Attempting graceful shutdown of VM... ==> player1: Attempting graceful shutdown of VM... player1: Guest communication could not be established! This is usually because player1: SSH is not running, the authentication information was changed, player1: or some other networking issue. Vagrant will force halt, if player1: capable. ==> player1: Attempting direct shutdown of domain... Fixes: 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving some ICMP") Closes: https://lore.kernel.org/all/MN2PR12MB44863139E562A59329E89DBEB982A@MN2PR12MB4486.namprd12.prod.outlook.com Signed-off-by: Shachar Kagan <skagan@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/14459261ea9f9c7d7dfb28eb004ce8734fa83ade.1704185904.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-07ip6_tunnel: fix NEXTHDR_FRAGMENT handling in ip6_tnl_parse_tlv_enc_lim()Eric Dumazet1-13/+13
syzbot pointed out [1] that NEXTHDR_FRAGMENT handling is broken. Reading frag_off can only be done if we pulled enough bytes to skb->head. Currently we might access garbage. [1] BUG: KMSAN: uninit-value in ip6_tnl_parse_tlv_enc_lim+0x94f/0xbb0 ip6_tnl_parse_tlv_enc_lim+0x94f/0xbb0 ipxip6_tnl_xmit net/ipv6/ip6_tunnel.c:1326 [inline] ip6_tnl_start_xmit+0xab2/0x1a70 net/ipv6/ip6_tunnel.c:1432 __netdev_start_xmit include/linux/netdevice.h:4940 [inline] netdev_start_xmit include/linux/netdevice.h:4954 [inline] xmit_one net/core/dev.c:3548 [inline] dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3564 __dev_queue_xmit+0x33b8/0x5130 net/core/dev.c:4349 dev_queue_xmit include/linux/netdevice.h:3134 [inline] neigh_connected_output+0x569/0x660 net/core/neighbour.c:1592 neigh_output include/net/neighbour.h:542 [inline] ip6_finish_output2+0x23a9/0x2b30 net/ipv6/ip6_output.c:137 ip6_finish_output+0x855/0x12b0 net/ipv6/ip6_output.c:222 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip6_output+0x323/0x610 net/ipv6/ip6_output.c:243 dst_output include/net/dst.h:451 [inline] ip6_local_out+0xe9/0x140 net/ipv6/output_core.c:155 ip6_send_skb net/ipv6/ip6_output.c:1952 [inline] ip6_push_pending_frames+0x1f9/0x560 net/ipv6/ip6_output.c:1972 rawv6_push_pending_frames+0xbe8/0xdf0 net/ipv6/raw.c:582 rawv6_sendmsg+0x2b66/0x2e70 net/ipv6/raw.c:920 inet_sendmsg+0x105/0x190 net/ipv4/af_inet.c:847 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638 __sys_sendmsg net/socket.c:2667 [inline] __do_sys_sendmsg net/socket.c:2676 [inline] __se_sys_sendmsg net/socket.c:2674 [inline] __x64_sys_sendmsg+0x307/0x490 net/socket.c:2674 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b Uninit was created at: slab_post_alloc_hook+0x129/0xa70 mm/slab.h:768 slab_alloc_node mm/slub.c:3478 [inline] __kmem_cache_alloc_node+0x5c9/0x970 mm/slub.c:3517 __do_kmalloc_node mm/slab_common.c:1006 [inline] __kmalloc_node_track_caller+0x118/0x3c0 mm/slab_common.c:1027 kmalloc_reserve+0x249/0x4a0 net/core/skbuff.c:582 pskb_expand_head+0x226/0x1a00 net/core/skbuff.c:2098 __pskb_pull_tail+0x13b/0x2310 net/core/skbuff.c:2655 pskb_may_pull_reason include/linux/skbuff.h:2673 [inline] pskb_may_pull include/linux/skbuff.h:2681 [inline] ip6_tnl_parse_tlv_enc_lim+0x901/0xbb0 net/ipv6/ip6_tunnel.c:408 ipxip6_tnl_xmit net/ipv6/ip6_tunnel.c:1326 [inline] ip6_tnl_start_xmit+0xab2/0x1a70 net/ipv6/ip6_tunnel.c:1432 __netdev_start_xmit include/linux/netdevice.h:4940 [inline] netdev_start_xmit include/linux/netdevice.h:4954 [inline] xmit_one net/core/dev.c:3548 [inline] dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3564 __dev_queue_xmit+0x33b8/0x5130 net/core/dev.c:4349 dev_queue_xmit include/linux/netdevice.h:3134 [inline] neigh_connected_output+0x569/0x660 net/core/neighbour.c:1592 neigh_output include/net/neighbour.h:542 [inline] ip6_finish_output2+0x23a9/0x2b30 net/ipv6/ip6_output.c:137 ip6_finish_output+0x855/0x12b0 net/ipv6/ip6_output.c:222 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip6_output+0x323/0x610 net/ipv6/ip6_output.c:243 dst_output include/net/dst.h:451 [inline] ip6_local_out+0xe9/0x140 net/ipv6/output_core.c:155 ip6_send_skb net/ipv6/ip6_output.c:1952 [inline] ip6_push_pending_frames+0x1f9/0x560 net/ipv6/ip6_output.c:1972 rawv6_push_pending_frames+0xbe8/0xdf0 net/ipv6/raw.c:582 rawv6_sendmsg+0x2b66/0x2e70 net/ipv6/raw.c:920 inet_sendmsg+0x105/0x190 net/ipv4/af_inet.c:847 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638 __sys_sendmsg net/socket.c:2667 [inline] __do_sys_sendmsg net/socket.c:2676 [inline] __se_sys_sendmsg net/socket.c:2674 [inline] __x64_sys_sendmsg+0x307/0x490 net/socket.c:2674 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b CPU: 0 PID: 7345 Comm: syz-executor.3 Not tainted 6.7.0-rc8-syzkaller-00024-gac865f00af29 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 Fixes: fbfa743a9d2a ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-05net: gro: parse ipv6 ext headers without frag0 invalidationRichard Gobert1-10/+41
The existing code always pulls the IPv6 header and sets the transport offset initially. Then optionally again pulls any extension headers in ipv6_gso_pull_exthdrs and sets the transport offset again on return from that call. skb->data is set at the start of the first extension header before calling ipv6_gso_pull_exthdrs, and must disable the frag0 optimization because that function uses pskb_may_pull/pskb_pull instead of skb_gro_ helpers. It sets the GRO offset to the TCP header with skb_gro_pull and sets the transport header. Then returns skb->data to its position before this block. This commit introduces a new helper function - ipv6_gro_pull_exthdrs - which is used in ipv6_gro_receive to pull ipv6 ext headers instead of ipv6_gso_pull_exthdrs. Thus, there is no modification of skb->data, all operations use skb_gro_* helpers, and the frag0 fast path can be taken for IPv6 packets with ext headers. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/504130f6-b56c-4dcc-882c-97942c59f5b7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05net: gso: add HBH extension header offload supportRichard Gobert2-14/+22
This commit adds net_offload to IPv6 Hop-by-Hop extension headers (as it is done for routing and dstopts) since it is supported in GSO and GRO. This allows to remove specific HBH conditionals in GSO and GRO when pulling and parsing an incoming packet. Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/d4f8825a-1d55-4b12-9d67-a254dbbfa6ae@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-13/+5
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt.c e009b2efb7a8 ("bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()") 0f2b21477988 ("bnxt_en: Fix compile error without CONFIG_RFS_ACCEL") https://lore.kernel.org/all/20240105115509.225aa8a2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04fib: remove unnecessary input parameters in fib_default_rule_addZhengchao Shao2-3/+3
When fib_default_rule_add is invoked, the value of the input parameter 'flags' is always 0. Rules uses kzalloc to allocate memory, so 'flags' has been initialized to 0. Therefore, remove the input parameter 'flags' in fib_default_rule_add. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240102071519.3781384-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03Revert "net: ipv6/addrconf: clamp preferred_lft to the minimum required"Alex Henrie1-13/+5
The commit had a bug and might not have been the right approach anyway. Fixes: 629df6701c8a ("net: ipv6/addrconf: clamp preferred_lft to the minimum required") Fixes: ec575f885e3e ("Documentation: networking: explain what happens if temp_prefered_lft is too small or too large") Reported-by: Dan Moulding <dan@danm.net> Closes: https://lore.kernel.org/netdev/20231221231115.12402-1-dan@danm.net/ Link: https://lore.kernel.org/netdev/CAMMLpeTdYhd=7hhPi2Y7pwdPCgnnW5JYh-bu3hSc7im39uxnEA@mail.gmail.com/ Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231230043252.10530-1-alexhenrie24@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni2-52/+9
Cross-merge networking fixes after downstream PR. Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c 23c93c3b6275 ("bnxt_en: do not map packet buffers twice") 6d1add95536b ("bnxt_en: Modify TX ring indexing logic.") tools/testing/selftests/net/Makefile 2258b666482d ("selftests: add vlan hw filter tests") a0bc96c0cd6e ("selftests: net: verify fq per-band packet limit") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21net/ipv6: Revert remove expired routes with a separated list of routesDavid Ahern2-52/+9
This reverts commit 3dec89b14d37ee635e772636dad3f09f78f1ab87. The commit has some race conditions given how expires is managed on a fib6_info in relation to gc start, adding the entry to the gc list and setting the timer value leading to UAF. Revert the commit and try again in a later release. Fixes: 3dec89b14d37 ("net/ipv6: Remove expired routes with a separated list of routes") Cc: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231219030243.25687-1-dsahern@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-15net: Namespace-ify sysctl_optmem_maxEric Dumazet1-2/+2
optmem_max being used in tx zerocopy, we want to be able to control it on a netns basis. Following patch changes two tests. Tested: oqq130:~# cat /proc/sys/net/core/optmem_max 131072 oqq130:~# echo 1000000 >/proc/sys/net/core/optmem_max oqq130:~# cat /proc/sys/net/core/optmem_max 1000000 oqq130:~# unshare -n oqq130:~# cat /proc/sys/net/core/optmem_max 131072 oqq130:~# exit logout oqq130:~# cat /proc/sys/net/core/optmem_max 1000000 Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-5/+1
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/intel/iavf/iavf_ethtool.c 3a0b5a2929fd ("iavf: Introduce new state machines for flow director") 95260816b489 ("iavf: use iavf_schedule_aq_request() helper") https://lore.kernel.org/all/84e12519-04dc-bd80-bc34-8cf50d7898ce@intel.com/ drivers/net/ethernet/broadcom/bnxt/bnxt.c c13e268c0768 ("bnxt_en: Fix HWTSTAMP_FILTER_ALL packet timestamp logic") c2f8063309da ("bnxt_en: Refactor RX VLAN acceleration logic.") a7445d69809f ("bnxt_en: Add support for new RX and TPA_START completion types for P7") 1c7fd6ee2fe4 ("bnxt_en: Rename some macros for the P5 chips") https://lore.kernel.org/all/20231211110022.27926ad9@canb.auug.org.au/ drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c bd6781c18cb5 ("bnxt_en: Fix wrong return value check in bnxt_close_nic()") 84793a499578 ("bnxt_en: Skip nic close/open when configuring tstamp filters") https://lore.kernel.org/all/20231214113041.3a0c003c@canb.auug.org.au/ drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c 3d7a3f2612d7 ("net/mlx5: Nack sync reset request when HotPlug is enabled") cecf44ea1a1f ("net/mlx5: Allow sync reset flow when BF MGT interface device is present") https://lore.kernel.org/all/20231211110328.76c925af@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-11ipv6: annotate data-races around np->ucast_oifEric Dumazet6-39/+33
np->ucast_oif is read locklessly in some contexts. Make all accesses to this field lockless, adding appropriate annotations. This also makes setsockopt( IPV6_UNICAST_IF ) lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-11ipv6: annotate data-races around np->mcast_oifEric Dumazet7-45/+47
np->mcast_oif is read locklessly in some contexts. Make all accesses to this field lockless, adding appropriate annotations. This also makes setsockopt( IPV6_MULTICAST_IF ) lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-08net: ipv6: support reporting otherwise unknown prefix flags in RTM_NEWPREFIXMaciej Żenczykowski1-5/+1
Lorenzo points out that we effectively clear all unknown flags from PIO when copying them to userspace in the netlink RTM_NEWPREFIX notification. We could fix this one at a time as new flags are defined, or in one fell swoop - I choose the latter. We could either define 6 new reserved flags (reserved1..6) and handle them individually (and rename them as new flags are defined), or we could simply copy the entire unmodified byte over - I choose the latter. This unfortunately requires some anonymous union/struct magic, so we add a static assert on the struct size for a little extra safety. Cc: David Ahern <dsahern@kernel.org> Cc: Lorenzo Colitti <lorenzo@google.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-6/+2
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/stmicro/stmmac/dwmac5.c drivers/net/ethernet/stmicro/stmmac/dwmac5.h drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c drivers/net/ethernet/stmicro/stmmac/hwif.h 37e4b8df27bc ("net: stmmac: fix FPE events losing") c3f3b97238f6 ("net: stmmac: Refactor EST implementation") https://lore.kernel.org/all/20231206110306.01e91114@canb.auug.org.au/ Adjacent changes: net/ipv4/tcp_ao.c 9396c4ee93f9 ("net/tcp: Don't store TCP-AO maclen on reqsk") 7b0f570f879a ("tcp: Move TCP-AO bits from cookie_v[46]_check() to tcp_ao_syncookie().") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-06net/tcp: Consistently align TCP-AO option in the headerDmitry Safonov1-1/+1
Currently functions that pre-calculate TCP header options length use unaligned TCP-AO header + MAC-length for skb reservation. And the functions that actually write TCP-AO options into skb do align the header. Nothing good can come out of this for ((maclen % 4) != 0). Provide tcp_ao_len_aligned() helper and use it everywhere for TCP header options space calculations. Fixes: 1e03d32bea8e ("net/tcp: Add TCP-AO sign to outgoing packets") Signed-off-by: Dmitry Safonov <dima@arista.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-01ipv6: fix potential NULL deref in fib6_add()Eric Dumazet1-5/+1
If fib6_find_prefix() returns NULL, we should silently fallback using fib6_null_entry regardless of RT6_DEBUG value. syzbot reported: WARNING: CPU: 0 PID: 5477 at net/ipv6/ip6_fib.c:1516 fib6_add+0x310d/0x3fa0 net/ipv6/ip6_fib.c:1516 Modules linked in: CPU: 0 PID: 5477 Comm: syz-executor.0 Not tainted 6.7.0-rc2-syzkaller-00029-g9b6de136b5f0 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023 RIP: 0010:fib6_add+0x310d/0x3fa0 net/ipv6/ip6_fib.c:1516 Code: 00 48 8b 54 24 68 e8 42 22 00 00 48 85 c0 74 14 49 89 c6 e8 d5 d3 c2 f7 eb 5d e8 ce d3 c2 f7 e9 ca 00 00 00 e8 c4 d3 c2 f7 90 <0f> 0b 90 48 b8 00 00 00 00 00 fc ff df 48 8b 4c 24 38 80 3c 01 00 RSP: 0018:ffffc90005067740 EFLAGS: 00010293 RAX: ffffffff89cba5bc RBX: ffffc90005067ab0 RCX: ffff88801a2e9dc0 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: ffffc90005067980 R08: ffffffff89cbca85 R09: 1ffff110040d4b85 R10: dffffc0000000000 R11: ffffed10040d4b86 R12: 00000000ffffffff R13: 1ffff110051c3904 R14: ffff8880206a5c00 R15: ffff888028e1c820 FS: 00007f763783c6c0(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f763783bff8 CR3: 000000007f74d000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __ip6_ins_rt net/ipv6/route.c:1303 [inline] ip6_route_add+0x88/0x120 net/ipv6/route.c:3847 ipv6_route_ioctl+0x525/0x7b0 net/ipv6/route.c:4467 inet6_ioctl+0x21a/0x270 net/ipv6/af_inet6.c:575 sock_do_ioctl+0x152/0x460 net/socket.c:1220 sock_ioctl+0x615/0x8c0 net/socket.c:1339 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl+0xf8/0x170 fs/ioctl.c:857 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x45/0x110 arch/x86/entry/common.c:82 Fixes: 7bbfe00e0252 ("ipv6: fix general protection fault in fib6_add()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231129160630.3509216-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-30tcp: Factorise cookie-dependent fields initialisation in cookie_v[46]_check()Kuniyuki Iwashima1-29/+32
We will support arbitrary SYN Cookie with BPF, and then kfunc at TC will preallocate reqsk and initialise some fields that should not be overwritten later by cookie_v[46]_check(). To simplify the flow in cookie_v[46]_check(), we move such fields' initialisation to cookie_tcp_reqsk_alloc() and factorise non-BPF SYN Cookie handling into cookie_tcp_check(), where we validate the cookie and allocate reqsk, as done by kfunc later. Note that we set ireq->ecn_ok in two steps, the latter of which will be shared by the BPF case. As cookie_ecn_ok() is one-liner, now it's inlined. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231129022924.96156-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-30tcp: Factorise cookie-independent fields initialisation in cookie_v[46]_check().Kuniyuki Iwashima1-14/+0
We will support arbitrary SYN Cookie with BPF, and then some reqsk fields are initialised in kfunc, and others are done in cookie_v[46]_check(). This patch factorises the common part as cookie_tcp_reqsk_init() and calls it in cookie_tcp_reqsk_alloc() to minimise the discrepancy between cookie_v[46]_check(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231129022924.96156-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-30tcp: Move TCP-AO bits from cookie_v[46]_check() to tcp_ao_syncookie().Kuniyuki Iwashima1-5/+2
We initialise treq->af_specific in cookie_tcp_reqsk_alloc() so that we can look up a key later in tcp_create_openreq_child(). Initially, that change was added for MD5 by commit ba5a4fdd63ae ("tcp: make sure treq->af_specific is initialized"), but it has not been used since commit d0f2b7a9ca0a ("tcp: Disable header prediction for MD5 flow."). Now, treq->af_specific is used only by TCP-AO, so, we can move that initialisation into tcp_ao_syncookie(). In addition to that, l3index in cookie_v[46]_check() is only used for tcp_ao_syncookie(), so let's move it as well. While at it, we move down tcp_ao_syncookie() in cookie_v4_check() so that it will be called after security_inet_conn_request() to make functions order consistent with cookie_v6_check(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231129022924.96156-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-30tcp: Don't initialise tp->tsoffset in tcp_get_cookie_sock().Kuniyuki Iwashima1-2/+2
When we create a full socket from SYN Cookie, we initialise tcp_sk(sk)->tsoffset redundantly in tcp_get_cookie_sock() as the field is inherited from tcp_rsk(req)->ts_off. cookie_v[46]_check |- treq->ts_off = 0 `- tcp_get_cookie_sock |- tcp_v[46]_syn_recv_sock | `- tcp_create_openreq_child | `- newtp->tsoffset = treq->ts_off `- tcp_sk(child)->tsoffset = tsoff Let's initialise tcp_rsk(req)->ts_off with the correct offset and remove the second initialisation of tcp_sk(sk)->tsoffset. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231129022924.96156-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-30tcp: Don't pass cookie to __cookie_v[46]_check().Kuniyuki Iwashima1-7/+8
tcp_hdr(skb) and SYN Cookie are passed to __cookie_v[46]_check(), but none of the callers passes cookie other than ntohl(th->ack_seq) - 1. Let's fetch it in __cookie_v[46]_check() instead of passing the cookie over and over. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231129022924.96156-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-30tcp: Clean up goto labels in cookie_v[46]_check().Kuniyuki Iwashima1-2/+2
We will support arbitrary SYN Cookie with BPF, and then reqsk will be preallocated before cookie_v[46]_check(). Depending on how validation fails, we send RST or just drop skb. To make the error handling easier, let's clean up goto labels. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231129022924.96156-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-30tcp: Cache sock_net(sk) in cookie_v[46]_check().Kuniyuki Iwashima1-9/+10
sock_net(sk) is used repeatedly in cookie_v[46]_check(). Let's cache it in a variable. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231129022924.96156-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-30tcp: Clean up reverse xmas tree in cookie_v[46]_check().Kuniyuki Iwashima1-6/+6
We will grow and cut the xmas tree in cookie_v[46]_check(). This patch cleans it up to make later patches tidy. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231129022924.96156-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-17tcp: no longer abort SYN_SENT when receiving some ICMPEric Dumazet1-3/+6
Currently, non fatal ICMP messages received on behalf of SYN_SENT sockets do call tcp_ld_RTO_revert() to implement RFC 6069, but immediately call tcp_done(), thus aborting the connect() attempt. This violates RFC 1122 following requirement: 4.2.3.9 ICMP Messages ... o Destination Unreachable -- codes 0, 1, 5 Since these Unreachable messages indicate soft error conditions, TCP MUST NOT abort the connection, and it SHOULD make the information available to the application. This patch makes sure non 'fatal' ICMP[v6] messages do not abort the connection attempt. It enables RFC 6069 for SYN_SENT sockets as a result. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Morley <morleyd@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-10Merge tag 'net-6.7-rc1' of ↵Linus Torvalds5-3/+8
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter and bpf. Current release - regressions: - sched: fix SKB_NOT_DROPPED_YET splat under debug config Current release - new code bugs: - tcp: - fix usec timestamps with TCP fastopen - fix possible out-of-bounds reads in tcp_hash_fail() - fix SYN option room calculation for TCP-AO - tcp_sigpool: fix some off by one bugs - bpf: fix compilation error without CGROUPS - ptp: - ptp_read() should not release queue - fix tsevqs corruption Previous releases - regressions: - llc: verify mac len before reading mac header Previous releases - always broken: - bpf: - fix check_stack_write_fixed_off() to correctly spill imm - fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END - check map->usercnt after timer->timer is assigned - dsa: lan9303: consequently nested-lock physical MDIO - dccp/tcp: call security_inet_conn_request() after setting IP addr - tg3: fix the TX ring stall due to incorrect full ring handling - phylink: initialize carrier state at creation - ice: fix direction of VF rules in switchdev mode Misc: - fill in a bunch of missing MODULE_DESCRIPTION()s, more to come" * tag 'net-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits) net: ti: icss-iep: fix setting counter value ptp: fix corrupted list in ptp_open ptp: ptp_read should not release queue net_sched: sch_fq: better validate TCA_FQ_WEIGHTS and TCA_FQ_PRIOMAP net: kcm: fill in MODULE_DESCRIPTION() net/sched: act_ct: Always fill offloading tuple iifidx netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses netfilter: xt_recent: fix (increase) ipv6 literal buffer length ipvs: add missing module descriptions netfilter: nf_tables: remove catchall element in GC sync path netfilter: add missing module descriptions drivers/net/ppp: use standard array-copy-function net: enetc: shorten enetc_setup_xdp_prog() error message to fit NETLINK_MAX_FMTMSG_LEN virtio/vsock: Fix uninit-value in virtio_transport_recv_pkt() r8169: respect userspace disabling IFF_MULTICAST selftests/bpf: get trusted cgrp from bpf_iter__cgroup directly bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg net: phylink: initialize carrier state at creation test/vsock: add dobule bind connect test test/vsock: refactor vsock_accept ...
2023-11-08netfilter: add missing module descriptionsFlorian Westphal4-0/+4
W=1 builds warn on missing MODULE_DESCRIPTION, add them. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-11-03Merge tag 'v6.7-p1' of ↵Linus Torvalds1-11/+8
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Add virtual-address based lskcipher interface - Optimise ahash/shash performance in light of costly indirect calls - Remove ahash alignmask attribute Algorithms: - Improve AES/XTS performance of 6-way unrolling for ppc - Remove some uses of obsolete algorithms (md4, md5, sha1) - Add FIPS 202 SHA-3 support in pkcs1pad - Add fast path for single-page messages in adiantum - Remove zlib-deflate Drivers: - Add support for S4 in meson RNG driver - Add STM32MP13x support in stm32 - Add hwrng interface support in qcom-rng - Add support for deflate algorithm in hisilicon/zip" * tag 'v6.7-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (283 commits) crypto: adiantum - flush destination page before unmapping crypto: testmgr - move pkcs1pad(rsa,sha3-*) to correct place Documentation/module-signing.txt: bring up to date module: enable automatic module signing with FIPS 202 SHA-3 crypto: asymmetric_keys - allow FIPS 202 SHA-3 signatures crypto: rsa-pkcs1pad - Add FIPS 202 SHA-3 support crypto: FIPS 202 SHA-3 register in hash info for IMA x509: Add OIDs for FIPS 202 SHA-3 hash and signatures crypto: ahash - optimize performance when wrapping shash crypto: ahash - check for shash type instead of not ahash type crypto: hash - move "ahash wrapping shash" functions to ahash.c crypto: talitos - stop using crypto_ahash::init crypto: chelsio - stop using crypto_ahash::init crypto: ahash - improve file comment crypto: ahash - remove struct ahash_request_priv crypto: ahash - remove crypto_ahash_alignmask crypto: gcm - stop using alignmask of ahash crypto: chacha20poly1305 - stop using alignmask of ahash crypto: ccm - stop using alignmask of ahash net: ipv6: stop checking crypto_ahash_alignmask ...
2023-11-02dccp/tcp: Call security_inet_conn_request() after setting IPv6 addresses.Kuniyuki Iwashima1-3/+4
Initially, commit 4237c75c0a35 ("[MLSXFRM]: Auto-labeling of child sockets") introduced security_inet_conn_request() in some functions where reqsk is allocated. The hook is added just after the allocation, so reqsk's IPv6 remote address was not initialised then. However, SELinux/Smack started to read it in netlbl_req_setattr() after commit e1adea927080 ("calipso: Allow request sockets to be relabelled by the lsm."). Commit 284904aa7946 ("lsm: Relocate the IPv4 security_inet_conn_request() hooks") fixed that kind of issue only in TCPv4 because IPv6 labeling was not supported at that time. Finally, the same issue was introduced again in IPv6. Let's apply the same fix on DCCPv6 and TCPv6. Fixes: e1adea927080 ("calipso: Allow request sockets to be relabelled by the lsm.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-31Merge tag 'ipsec-next-2023-10-28' of ↵Jakub Kicinski6-27/+95
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2023-10-28 1) Remove unused function declarations of xfrm4_extract_input and xfrm6_extract_input. From Yue Haibing. 2) Annotate struct xfrm_sec_ctx with __counted_by. From Kees Cook. 3) Support GRO decapsulation for ESP in UDP encapsulation. From Antony Antony et all. 4) Replace the xfrm session decode with flow dissector. From Florian Westphal. 5) Fix a use after free in __xfrm6_udp_encap_rcv. 6) Fix the layer 4 flowi decoding. From Florian Westphal. * tag 'ipsec-next-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: policy: fix layer 4 flowi decoding xfrm Fix use after free in __xfrm6_udp_encap_rcv. xfrm: policy: replace session decode with flow dissector xfrm: move mark and oif flowi decode into common code xfrm: pass struct net to xfrm_decode_session wrappers xfrm: Support GRO for IPv6 ESP in UDP encapsulation xfrm: Support GRO for IPv4 ESP in UDP encapsulation xfrm: Use the XFRM_GRO to indicate a GRO call on input xfrm: Annotate struct xfrm_sec_ctx with __counted_by xfrm: Remove unused function declarations ==================== Link: https://lore.kernel.org/r/20231028084328.3119236-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-27net: ipv6: stop checking crypto_ahash_alignmaskEric Biggers1-10/+7
Now that the alignmask for ahash and shash algorithms is always 0, crypto_ahash_alignmask() always returns 0 and will be removed. In preparation for this, stop checking crypto_ahash_alignmask() in ah6.c. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-27net/tcp: Wire up l3index to TCP-AODmitry Safonov3-16/+25
Similarly how TCP_MD5SIG_FLAG_IFINDEX works for TCP-MD5, TCP_AO_KEYF_IFINDEX is an AO-key flag that binds that MKT to a specified by L3 ifinndex. Similarly, without this flag the key will work in the default VRF l3index = 0 for connections. To prevent AO-keys from overlapping, it's restricted to add key B for a socket that has key A, which have the same sndid/rcvid and one of the following is true: - !(A.keyflags & TCP_AO_KEYF_IFINDEX) or !(B.keyflags & TCP_AO_KEYF_IFINDEX) so that any key is non-bound to a VRF - A.l3index == B.l3index both want to work for the same VRF Additionally, it's restricted to match TCP-MD5 keys for the same peer the following way: |--------------|--------------------|----------------|---------------| | | MD5 key without | MD5 key | MD5 key | | | l3index | l3index=0 | l3index=N | |--------------|--------------------|----------------|---------------| | TCP-AO key | | | | | without | reject | reject | reject | | l3index | | | | |--------------|--------------------|----------------|---------------| | TCP-AO key | | | | | l3index=0 | reject | reject | allow | |--------------|--------------------|----------------|---------------| | TCP-AO key | | | | | l3index=N | reject | allow | reject | |--------------|--------------------|----------------|---------------| This is done with the help of tcp_md5_do_lookup_any_l3index() to reject adding AO key without TCP_AO_KEYF_IFINDEX if there's TCP-MD5 in any VRF. This is important for case where sysctl_tcp_l3mdev_accept = 1 Similarly, for TCP-AO lookups tcp_ao_do_lookup() may be used with l3index < 0, so that __tcp_ao_key_cmp() will match TCP-AO key in any VRF. Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Add static_key for TCP-AODmitry Safonov1-11/+14
Similarly to TCP-MD5, add a static key to TCP-AO that is patched out when there are no keys on a machine and dynamically enabled with the first setsockopt(TCP_AO) adds a key on any socket. The static key is as well dynamically disabled later when the socket is destructed. The lifetime of enabled static key here is the same as ao_info: it is enabled on allocation, passed over from full socket to twsk and destructed when ao_info is scheduled for destruction. Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Ignore specific ICMPs for TCP-AO connectionsDmitry Safonov1-0/+7
Similarly to IPsec, RFC5925 prescribes: ">> A TCP-AO implementation MUST default to ignore incoming ICMPv4 messages of Type 3 (destination unreachable), Codes 2-4 (protocol unreachable, port unreachable, and fragmentation needed -- ’hard errors’), and ICMPv6 Type 1 (destination unreachable), Code 1 (administratively prohibited) and Code 4 (port unreachable) intended for connections in synchronized states (ESTABLISHED, FIN-WAIT-1, FIN- WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT) that match MKTs." A selftest (later in patch series) verifies that this attack is not possible in this TCP-AO implementation. Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Add TCP-AO SNE supportDmitry Safonov1-1/+2
Add Sequence Number Extension (SNE) for TCP-AO. This is needed to protect long-living TCP-AO connections from replaying attacks after sequence number roll-over, see RFC5925 (6.2). Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Add TCP-AO segments countersDmitry Safonov1-2/+2
Introduce segment counters that are useful for troubleshooting/debugging as well as for writing tests. Now there are global snmp counters as well as per-socket and per-key. Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Verify inbound TCP-AO signed segmentsDmitry Safonov2-9/+11
Now there is a common function to verify signature on TCP segments: tcp_inbound_hash(). It has checks for all possible cross-interactions with MD5 signs as well as with unsigned segments. The rules from RFC5925 are: (1) Any TCP segment can have at max only one signature. (2) TCP connections can't switch between using TCP-MD5 and TCP-AO. (3) TCP-AO connections can't stop using AO, as well as unsigned connections can't suddenly start using AO. Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Sign SYN-ACK segments with TCP-AODmitry Safonov2-0/+23
Similarly to RST segments, wire SYN-ACKs to TCP-AO. tcp_rsk_used_ao() is handy here to check if the request socket used AO and needs a signature on the outgoing segments. Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Wire TCP-AO to request socketsDmitry Safonov3-15/+100
Now when the new request socket is created from the listening socket, it's recorded what MKT was used by the peer. tcp_rsk_used_ao() is a new helper for checking if TCP-AO option was used to create the request socket. tcp_ao_copy_all_matching() will copy all keys that match the peer on the request socket, as well as preparing them for the usage (creating traffic keys). Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Add TCP-AO sign to twskDmitry Safonov1-19/+53
Add support for sockets in time-wait state. ao_info as well as all keys are inherited on transition to time-wait socket. The lifetime of ao_info is now protected by ref counter, so that tcp_ao_destroy_sock() will destruct it only when the last user is gone. Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Add AO sign to RST packetsDmitry Safonov1-27/+71
Wire up sending resets to TCP-AO hashing. Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Add tcp_parse_auth_options()Dmitry Safonov1-4/+7
Introduce a helper that: (1) shares the common code with TCP-MD5 header options parsing (2) looks for hash signature only once for both TCP-MD5 and TCP-AO (3) fails with -EEXIST if any TCP sign option is present twice, see RFC5925 (2.2): ">> A single TCP segment MUST NOT have more than one TCP-AO in its options sequence. When multiple TCP-AOs appear, TCP MUST discard the segment." Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Add TCP-AO sign to outgoing packetsDmitry Safonov2-0/+30
Using precalculated traffic keys, sign TCP segments as prescribed by RFC5925. Per RFC, TCP header options are included in sign calculation: "The TCP header, by default including options, and where the TCP checksum and TCP-AO MAC fields are set to zero, all in network- byte order." (5.1.3) tcp_ao_hash_header() has exclude_options parameter to optionally exclude TCP header from hash calculation, as described in RFC5925 (9.1), this is needed for interaction with middleboxes that may change "some TCP options". This is wired up to AO key flags and setsockopt() later. Similarly to TCP-MD5 hash TCP segment fragments. From this moment a user can start sending TCP-AO signed segments with one of crypto ahash algorithms from supported by Linux kernel. It can have a user-specified MAC length, to either save TCP option header space or provide higher protection using a longer signature. The inbound segments are not yet verified, TCP-AO option is ignored and they are accepted. Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Calculate TCP-AO traffic keysDmitry Safonov2-0/+51
Add traffic key calculation the way it's described in RFC5926. Wire it up to tcp_finish_connect() and cache the new keys straight away on already established TCP connections. Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Prevent TCP-MD5 with TCP-AO being setDmitry Safonov2-4/+39
Be as conservative as possible: if there is TCP-MD5 key for a given peer regardless of L3 interface - don't allow setting TCP-AO key for the same peer. According to RFC5925, TCP-AO is supposed to replace TCP-MD5 and there can't be any switch between both on any connected tuple. Later it can be relaxed, if there's a use, but in the beginning restrict any intersection. Note: it's still should be possible to set both TCP-MD5 and TCP-AO keys on a listening socket for *different* peers. Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Introduce TCP_AO setsockopt()sDmitry Safonov3-15/+44
Add 3 setsockopt()s: 1. TCP_AO_ADD_KEY to add a new Master Key Tuple (MKT) on a socket 2. TCP_AO_DEL_KEY to delete present MKT from a socket 3. TCP_AO_INFO to change flags, Current_key/RNext_key on a TCP-AO sk Userspace has to introduce keys on every socket it wants to use TCP-AO option on, similarly to TCP_MD5SIG/TCP_MD5SIG_EXT. RFC5925 prohibits definition of MKTs that would match the same peer, so do sanity checks on the data provided by userspace. Be as conservative as possible, including refusal of defining MKT on an established connection with no AO, removing the key in-use and etc. (1) and (2) are to be used by userspace key manager to add/remove keys. (3) main purpose is to set RNext_key, which (as prescribed by RFC5925) is the KeyID that will be requested in TCP-AO header from the peer to sign their segments with. At this moment the life of ao_info ends in tcp_v4_destroy_sock(). Co-developed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Co-developed-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Salam Noureddine <noureddine@arista.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27net/tcp: Prepare tcp_md5sig_pool for TCP-AODmitry Safonov1-33/+27
TCP-AO, similarly to TCP-MD5, needs to allocate tfms on a slow-path, which is setsockopt() and use crypto ahash requests on fast paths, which are RX/TX softirqs. Also, it needs a temporary/scratch buffer for preparing the hash. Rework tcp_md5sig_pool in order to support other hashing algorithms than MD5. It will make it possible to share pre-allocated crypto_ahash descriptors and scratch area between all TCP hash users. Internally tcp_sigpool calls crypto_clone_ahash() API over pre-allocated crypto ahash tfm. Kudos to Herbert, who provided this new crypto API. I was a little concerned over GFP_ATOMIC allocations of ahash and crypto_request in RX/TX (see tcp_sigpool_start()), so I benchmarked both "backends" with different algorithms, using patched version of iperf3[2]. On my laptop with i7-7600U @ 2.80GHz: clone-tfm per-CPU-requests TCP-MD5 2.25 Gbits/sec 2.30 Gbits/sec TCP-AO(hmac(sha1)) 2.53 Gbits/sec 2.54 Gbits/sec TCP-AO(hmac(sha512)) 1.67 Gbits/sec 1.64 Gbits/sec TCP-AO(hmac(sha384)) 1.77 Gbits/sec 1.80 Gbits/sec TCP-AO(hmac(sha224)) 1.29 Gbits/sec 1.30 Gbits/sec TCP-AO(hmac(sha3-512)) 481 Mbits/sec 480 Mbits/sec TCP-AO(hmac(md5)) 2.07 Gbits/sec 2.12 Gbits/sec TCP-AO(hmac(rmd160)) 1.01 Gbits/sec 995 Mbits/sec TCP-AO(cmac(aes128)) [not supporetd yet] 2.11 Gbits/sec So, it seems that my concerns don't have strong grounds and per-CPU crypto_request allocation can be dropped/removed from tcp_sigpool once ciphers get crypto_clone_ahash() support. [1]: https://lore.kernel.org/all/ZDefxOq6Ax0JeTRH@gondor.apana.org.au/T/#u [2]: https://github.com/0x7f454c46/iperf/tree/tcp-md5-ao Signed-off-by: Dmitry Safonov <dima@arista.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-27netlink: make range pointers in policies constJakub Kicinski1-1/+1
struct nla_policy is usually constant itself, but unless we make the ranges inside constant we won't be able to make range structs const. The ranges are not modified by the core. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231025162204.132528-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>