summaryrefslogtreecommitdiff
path: root/net/xfrm
AgeCommit message (Collapse)AuthorFilesLines
3 daysmove asm/unaligned.h to linux/unaligned.hAl Viro1-1/+1
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-09-11Merge tag 'ipsec-next-2024-09-10' of ↵Jakub Kicinski2-127/+101
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-09-10 1) Remove an unneeded WARN_ON on packet offload. From Patrisious Haddad. 2) Add a copy from skb_seq_state to buffer function. This is needed for the upcomming IPTFS patchset. From Christian Hopps. 3) Spelling fix in xfrm.h. From Simon Horman. 4) Speed up xfrm policy insertions. From Florian Westphal. 5) Add and revert a patch to support xfrm interfaces for packet offload. This patch was just half cooked. 6) Extend usage of the new xfrm_policy_is_dead_or_sk helper. From Florian Westphal. 7) Update comments on sdb and xfrm_policy. From Florian Westphal. 8) Fix a null pointer dereference in the new policy insertion code From Florian Westphal. 9) Fix an uninitialized variable in the new policy insertion code. From Nathan Chancellor. * tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: policy: Restore dir assignments in xfrm_hash_rebuild() xfrm: policy: fix null dereference Revert "xfrm: add SA information to the offloaded packet" xfrm: minor update to sdb and xfrm_policy comments xfrm: policy: use recently added helper in more places xfrm: add SA information to the offloaded packet xfrm: policy: remove remaining use of inexact list xfrm: switch migrate to xfrm_policy_lookup_bytype xfrm: policy: don't iterate inexact policies twice at insert time selftests: add xfrm policy insertion speed test script xfrm: Correct spelling in xfrm.h net: add copy from skb_seq_state to buffer function xfrm: Remove documentation WARN_ON to limit return values for offloaded SA ==================== Link: https://patch.msgid.link/20240910065507.2436394-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09xfrm: policy: Restore dir assignments in xfrm_hash_rebuild()Nathan Chancellor1-0/+2
Clang warns (or errors with CONFIG_WERROR): net/xfrm/xfrm_policy.c:1286:8: error: variable 'dir' is uninitialized when used here [-Werror,-Wuninitialized] 1286 | if ((dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT) { | ^~~ net/xfrm/xfrm_policy.c:1257:9: note: initialize the variable 'dir' to silence this warning 1257 | int dir; | ^ | = 0 1 error generated. A recent refactoring removed some assignments to dir because xfrm_policy_is_dead_or_sk() has a dir assignment in it. However, dir is used elsewhere in xfrm_hash_rebuild(), including within loops where it needs to be reloaded for each policy. Restore the assignments before the first use of dir to fix the warning and ensure dir is properly initialized throughout the function. Fixes: 08c2182cf0b4 ("xfrm: policy: use recently added helper in more places") Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-09xfrm: policy: fix null dereferenceFlorian Westphal1-2/+2
Julian Wiedmann says: > + if (!xfrm_pol_hold_rcu(ret)) Coverity spotted that ^^^ needs a s/ret/pol fix-up: > CID 1599386: Null pointer dereferences (FORWARD_NULL) > Passing null pointer "ret" to "xfrm_pol_hold_rcu", which dereferences it. Ditch the bogus 'ret' variable. Fixes: 563d5ca93e88 ("xfrm: switch migrate to xfrm_policy_lookup_bytype") Reported-by: Julian Wiedmann <jwiedmann.dev@gmail.com> Closes: https://lore.kernel.org/netdev/06dc2499-c095-4bd4-aee3-a1d0e3ec87c4@gmail.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-09Revert "xfrm: add SA information to the offloaded packet"Steffen Klassert1-21/+0
This reverts commit e7cd191f83fd899c233dfbe7dc6d96ef703dcbbd. While supporting xfrm interfaces in the packet offload API is needed, this patch does not do the right thing. There are more things to do to really support xfrm interfaces, so revert it for now. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-03netdev_features: convert NETIF_F_LLTX to dev->lltxAlexander Lobakin1-1/+1
NETIF_F_LLTX can't be changed via Ethtool and is not a feature, rather an attribute, very similar to IFF_NO_QUEUE (and hot). Free one netdev_features_t bit and make it a "hot" private flag. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-31xfrm: Unmask upper DSCP bits in xfrm_get_tos()Ido Schimmel1-1/+2
The function returns a value that is used to initialize 'flowi4_tos' before being passed to the FIB lookup API in the following call chain: xfrm_bundle_create() tos = xfrm_get_tos(fl, family) xfrm_dst_lookup(..., tos, ...) __xfrm_dst_lookup(..., tos, ...) xfrm4_dst_lookup(..., tos, ...) __xfrm4_dst_lookup(..., tos, ...) fl4->flowi4_tos = tos __ip_route_output_key(net, fl4) Unmask the upper DSCP bits so that in the future the output route lookup could be performed according to the full DSCP value. Remove IPTOS_RT_MASK since it is no longer used. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-28xfrm: minor update to sdb and xfrm_policy commentsFlorian Westphal1-1/+5
The spd is no longer maintained as a linear list. We also haven't been caching bundles in the xfrm_policy struct since 2010. While at it, add kdoc style comments for the xfrm_policy structure and extend the description of the current rbtree based search to mention why it needs to search the candidate set. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-28xfrm: policy: use recently added helper in more placesFlorian Westphal1-11/+2
No logical change intended. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-27xfrm: add SA information to the offloaded packetwangfe1-0/+21
In packet offload mode, append Security Association (SA) information to each packet, replicating the crypto offload implementation. The XFRM_XMIT flag is set to enable packet to be returned immediately from the validate_xmit_xfrm function, thus aligning with the existing code path for packet offload mode. This SA info helps HW offload match packets to their correct security policies. The XFRM interface ID is included, which is crucial in setups with multiple XFRM interfaces where source/destination addresses alone can't pinpoint the right policy. Signed-off-by: wangfe <wangfe@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-24xfrm: policy: remove remaining use of inexact listFlorian Westphal1-38/+0
No consumers anymore, remove it. After this, insertion of policies no longer require list walk of all inexact policies but only those that are reachable via the candidate sets. This gives almost linear insertion speeds provided the inserted policies are for non-overlapping networks. Before: Inserted 1000 policies in 70 ms Inserted 10000 policies in 1155 ms Inserted 100000 policies in 216848 ms After: Inserted 1000 policies in 56 ms Inserted 10000 policies in 478 ms Inserted 100000 policies in 4580 ms Insertion of 1m entries takes about ~40s after this change on my test vm. Cc: Noel Kuntze <noel@familie-kuntze.de> Cc: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-24xfrm: switch migrate to xfrm_policy_lookup_bytypeFlorian Westphal1-67/+39
XFRM_MIGRATE still uses the old lookup method: first check the bydst hash table, then search the list of all the other policies. Switch MIGRATE to use the same lookup function as the packetpath. This is done to remove the last remaining users of the pernet xfrm.policy_inexact lists with the intent of removing this list. After this patch, policies are still added to the list on insertion and they are rehashed as-needed but no single API makes use of these anymore. This change is compile tested only. Cc: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-24xfrm: policy: don't iterate inexact policies twice at insert timeFlorian Westphal1-6/+53
Since commit 6be3b0db6db8 ("xfrm: policy: add inexact policy search tree infrastructure") policy lookup no longer walks a list but has a set of candidate lists. This set has to be searched for the best match. In case there are several matches, the priority wins. If the priority is also the same, then the historic behaviour with a single list was to return the first match (first-in-list). With introduction of serval lists, this doesn't work and a new 'pos' member was added that reflects the xfrm_policy structs position in the list. This value is not exported to userspace and it does not need to be the 'position in the list', it just needs to make sure that a->pos < b->pos means that a was added to the lists more recently than b. This re-walk is expensive when many inexact policies are in use. Speed this up: when appending the policy to the end of the walker list, then just take the ->pos value of the last entry made and add 1. Add a slowpath version to prevent overflow, if we'd assign UINT_MAX then iterate the entire list and fix the ordering. While this speeds up insertion considerably finding the insertion spot in the inexact list still requires a partial list walk. This is addressed in followup patches. Before: ./xfrm_policy_add_speed.sh Inserted 1000 policies in 72 ms Inserted 10000 policies in 1540 ms Inserted 100000 policies in 334780 ms After: Inserted 1000 policies in 68 ms Inserted 10000 policies in 1137 ms Inserted 100000 policies in 157307 ms Reported-by: Noel Kuntze <noel@familie-kuntze.de> Cc: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-16xfrm: Remove documentation WARN_ON to limit return values for offloaded SAPatrisious Haddad1-5/+1
The original idea to put WARN_ON() on return value from driver code was to make sure that packet offload doesn't have silent fallback to SW implementation, like crypto offload has. In reality, this is not needed as all *swan implementations followed this request and used explicit configuration style to make sure that "users will get what they ask". So instead of forcing drivers to make sure that even their internal flows don't return -EOPNOTSUPP, let's remove this WARN_ON. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-13/+66
Merge in late fixes to prepare for the 6.11 net-next PR. Conflicts: 93c3a96c301f ("net: pse-pd: Do not return EOPNOSUPP if config is null") 4cddb0f15ea9 ("net: ethtool: pse-pd: Fix possible null-deref") 30d7b6727724 ("net: ethtool: Add new power limit get and set features") https://lore.kernel.org/20240715123204.623520bb@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14Merge tag 'ipsec-next-2024-07-13' of ↵Jakub Kicinski8-8/+347
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-07-13 1) Support sending NAT keepalives in ESP in UDP states. Userspace IKE daemon had to do this before, but the kernel can better keep track of it. From Eyal Birger. 2) Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP data paths. Currently, IPsec crypto offload is enabled for GRO code path only. This patchset support UDP encapsulation for the non GRO path. From Mike Yu. * tag 'ipsec-next-2024-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: Support crypto offload for outbound IPv4 UDP-encapsulated ESP packet xfrm: Support crypto offload for inbound IPv4 UDP-encapsulated ESP packet xfrm: Allow UDP encapsulation in crypto offload control path xfrm: Support crypto offload for inbound IPv6 ESP packets not in GRO path xfrm: support sending NAT keepalives in ESP in UDP states ==================== Link: https://patch.msgid.link/20240713102416.3272997-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14Merge tag 'ipsec-2024-07-11' of ↵Jakub Kicinski4-13/+66
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2024-07-11 1) Fix esp_output_tail_tcp() on unsupported ESPINTCP. From Hagar Hemdan. 2) Fix two bugs in the recently introduced SA direction separation. From Antony Antony. 3) Fix unregister netdevice hang on hardware offload. We had to add another list where skbs linked to that are unlinked from the lists (deleted) but not yet freed. 4) Fix netdev reference count imbalance in xfrm_state_find. From Jianbo Liu. 5) Call xfrm_dev_policy_delete when killingi them on offloaded policies. Jianbo Liu. * tag 'ipsec-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: call xfrm_dev_policy_delete when kill policy xfrm: fix netdev reference count imbalance xfrm: Export symbol xfrm_dev_state_delete. xfrm: Fix unregister netdevice hang on hardware offload. xfrm: Log input direction mismatch error in one place xfrm: Fix input error path memory access net: esp: cleanup esp_output_tail_tcp() in case of unsupported ESPINTCP ==================== Link: https://patch.msgid.link/20240711100025.1949454-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-12xfrm: Support crypto offload for inbound IPv4 UDP-encapsulated ESP packetMike Yu1-1/+2
If xfrm_input() is called with UDP_ENCAP_ESPINUDP, the packet is already processed in UDP layer that removes the UDP header. Therefore, there should be no much difference to treat it as an ESP packet in the XFRM stack. Test: Enabled dir=in IPsec crypto offload, and verified IPv4 UDP-encapsulated ESP packets on both wifi/cellular network Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-12xfrm: Allow UDP encapsulation in crypto offload control pathMike Yu1-3/+3
Unblock this limitation so that SAs with encapsulation specified can be passed to HW drivers. HW drivers can still reject the SA in their implementation of xdo_dev_state_add if the encapsulation is not supported. Test: Verified on Android device Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-12xfrm: Support crypto offload for inbound IPv6 ESP packets not in GRO pathMike Yu2-2/+5
IPsec crypt offload supports outbound IPv6 ESP packets, but it doesn't support inbound IPv6 ESP packets. This change enables the crypto offload for inbound IPv6 ESP packets that are not handled through GRO code path. If HW drivers add the offload information to the skb, the packet will be handled in the crypto offload rx code path. Apart from the change in crypto offload rx code path, the change in xfrm_policy_check is also needed. Exampe of RX data path: +-----------+ +-------+ | HW Driver |-->| wlan0 |--------+ +-----------+ +-------+ | v +---------------+ +------+ +------>| Network Stack |-->| Apps | | +---------------+ +------+ | | | v +--------+ +------------+ | ipsec1 |<--| XFRM Stack | +--------+ +------------+ Test: Enabled both in/out IPsec crypto offload, and verified IPv6 ESP packets on Android device on both wifi/cellular network Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-08xfrm: call xfrm_dev_policy_delete when kill policyJianbo Liu2-4/+2
xfrm_policy_kill() is called at different places to delete xfrm policy. It will call xfrm_pol_put(). But xfrm_dev_policy_delete() is not called to free the policy offloaded to hardware. The three commits cited here are to handle this issue by calling xfrm_dev_policy_delete() outside xfrm_get_policy(). But they didn't cover all the cases. An example, which is not handled for now, is xfrm_policy_insert(). It is called when XFRM_MSG_UPDPOLICY request is received. Old policy is replaced by new one, but the offloaded policy is not deleted, so driver doesn't have the chance to release hardware resources. To resolve this issue for all cases, move xfrm_dev_policy_delete() into xfrm_policy_kill(), so the offloaded policy can be deleted from hardware when it is called, which avoids hardware resources leakage. Fixes: 919e43fad516 ("xfrm: add an interface to offload policy") Fixes: bf06fcf4be0f ("xfrm: add missed call to delete offloaded policies") Fixes: 982c3aca8bac ("xfrm: delete offloaded policy") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-08xfrm: fix netdev reference count imbalanceJianbo Liu1-2/+1
In cited commit, netdev_tracker_alloc() is called for the newly allocated xfrm state, but dev_hold() is missed, which causes netdev reference count imbalance, because netdev_put() is called when the state is freed in xfrm_dev_state_free(). Fix the issue by replacing netdev_tracker_alloc() with netdev_hold(). Fixes: f8a70afafc17 ("xfrm: add TX datapath support for IPsec packet offload mode") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-01xfrm: Export symbol xfrm_dev_state_delete.Steffen Klassert1-0/+1
This fixes a build failure if xfrm_user is build as a module. Fixes: 07b87f9eea0c ("xfrm: Fix unregister netdevice hang on hardware offload.") Reported-by: Mark Brown <broonie@kernel.org> Tested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-06-26xfrm: support sending NAT keepalives in ESP in UDP statesEyal Birger6-3/+338
Add the ability to send out RFC-3948 NAT keepalives from the xfrm stack. To use, Userspace sets an XFRM_NAT_KEEPALIVE_INTERVAL integer property when creating XFRM outbound states which denotes the number of seconds between keepalive messages. Keepalive messages are sent from a per net delayed work which iterates over the xfrm states. The logic is guarded by the xfrm state spinlock due to the xfrm state walk iterator. Possible future enhancements: - Adding counters to keep track of sent keepalives. - deduplicate NAT keepalives between states sharing the same nat keepalive parameters. - provisioning hardware offloads for devices capable of implementing this. - revise xfrm state list to use an rcu list in order to avoid running this under spinlock. Suggested-by: Paul Wouters <paul.wouters@aiven.io> Tested-by: Paul Wouters <paul.wouters@aiven.io> Tested-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-06-25xfrm: Fix unregister netdevice hang on hardware offload.Steffen Klassert1-2/+59
When offloading xfrm states to hardware, the offloading device is attached to the skbs secpath. If a skb is free is deferred, an unregister netdevice hangs because the netdevice is still refcounted. Fix this by removing the netdevice from the xfrm states when the netdevice is unregistered. To find all xfrm states that need to be cleared we add another list where skbs linked to that are unlinked from the lists (deleted) but not yet freed. Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-06-17xfrm: Log input direction mismatch error in one placeAntony Antony1-5/+0
Previously, the offload data path decrypted the packet before checking the direction, leading to error logging and packet dropping. However, dropped packets wouldn't be visible in tcpdump or audit log. With this fix, the offload path, upon noticing SA direction mismatch, will pass the packet to the stack without decrypting it. The L3 layer will then log the error, audit, and drop ESP without decrypting or decapsulating it. This also ensures that the slow path records the error and audit log, making dropped packets visible in tcpdump. Fixes: 304b44f0d5a4 ("xfrm: Add dir validation to "in" data path lookup") Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-06-17xfrm: Fix input error path memory accessAntony Antony1-0/+3
When there is a misconfiguration of input state slow path KASAN report error. Fix this error. west login: [ 52.987278] eth1: renamed from veth11 [ 53.078814] eth1: renamed from veth21 [ 53.181355] eth1: renamed from veth31 [ 54.921702] ================================================================== [ 54.922602] BUG: KASAN: wild-memory-access in xfrmi_rcv_cb+0x2d/0x295 [ 54.923393] Read of size 8 at addr 6b6b6b6b00000000 by task ping/512 [ 54.924169] [ 54.924386] CPU: 0 PID: 512 Comm: ping Not tainted 6.9.0-08574-gcd29a4313a1b #25 [ 54.925290] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 54.926401] Call Trace: [ 54.926731] <IRQ> [ 54.927009] dump_stack_lvl+0x2a/0x3b [ 54.927478] kasan_report+0x84/0xa6 [ 54.927930] ? xfrmi_rcv_cb+0x2d/0x295 [ 54.928410] xfrmi_rcv_cb+0x2d/0x295 [ 54.928872] ? xfrm4_rcv_cb+0x3d/0x5e [ 54.929354] xfrm4_rcv_cb+0x46/0x5e [ 54.929804] xfrm_rcv_cb+0x7e/0xa1 [ 54.930240] xfrm_input+0x1b3a/0x1b96 [ 54.930715] ? xfrm_offload+0x41/0x41 [ 54.931182] ? raw_rcv+0x292/0x292 [ 54.931617] ? nf_conntrack_confirm+0xa2/0xa2 [ 54.932158] ? skb_sec_path+0xd/0x3f [ 54.932610] ? xfrmi_input+0x90/0xce [ 54.933066] xfrm4_esp_rcv+0x33/0x54 [ 54.933521] ip_protocol_deliver_rcu+0xd7/0x1b2 [ 54.934089] ip_local_deliver_finish+0x110/0x120 [ 54.934659] ? ip_protocol_deliver_rcu+0x1b2/0x1b2 [ 54.935248] NF_HOOK.constprop.0+0xf8/0x138 [ 54.935767] ? ip_sublist_rcv_finish+0x68/0x68 [ 54.936317] ? secure_tcpv6_ts_off+0x23/0x168 [ 54.936859] ? ip_protocol_deliver_rcu+0x1b2/0x1b2 [ 54.937454] ? __xfrm_policy_check2.constprop.0+0x18d/0x18d [ 54.938135] NF_HOOK.constprop.0+0xf8/0x138 [ 54.938663] ? ip_sublist_rcv_finish+0x68/0x68 [ 54.939220] ? __xfrm_policy_check2.constprop.0+0x18d/0x18d [ 54.939904] ? ip_local_deliver_finish+0x120/0x120 [ 54.940497] __netif_receive_skb_one_core+0xc9/0x107 [ 54.941121] ? __netif_receive_skb_list_core+0x1c2/0x1c2 [ 54.941771] ? blk_mq_start_stopped_hw_queues+0xc7/0xf9 [ 54.942413] ? blk_mq_start_stopped_hw_queue+0x38/0x38 [ 54.943044] ? virtqueue_get_buf_ctx+0x295/0x46b [ 54.943618] process_backlog+0xb3/0x187 [ 54.944102] __napi_poll.constprop.0+0x57/0x1a7 [ 54.944669] net_rx_action+0x1cb/0x380 [ 54.945150] ? __napi_poll.constprop.0+0x1a7/0x1a7 [ 54.945744] ? vring_new_virtqueue+0x17a/0x17a [ 54.946300] ? note_interrupt+0x2cd/0x367 [ 54.946805] handle_softirqs+0x13c/0x2c9 [ 54.947300] do_softirq+0x5f/0x7d [ 54.947727] </IRQ> [ 54.948014] <TASK> [ 54.948300] __local_bh_enable_ip+0x48/0x62 [ 54.948832] __neigh_event_send+0x3fd/0x4ca [ 54.949361] neigh_resolve_output+0x1e/0x210 [ 54.949896] ip_finish_output2+0x4bf/0x4f0 [ 54.950410] ? __ip_finish_output+0x171/0x1b8 [ 54.950956] ip_send_skb+0x25/0x57 [ 54.951390] raw_sendmsg+0xf95/0x10c0 [ 54.951850] ? check_new_pages+0x45/0x71 [ 54.952343] ? raw_hash_sk+0x21b/0x21b [ 54.952815] ? kernel_init_pages+0x42/0x51 [ 54.953337] ? prep_new_page+0x44/0x51 [ 54.953811] ? get_page_from_freelist+0x72b/0x915 [ 54.954390] ? signal_pending_state+0x77/0x77 [ 54.954936] ? preempt_count_sub+0x14/0xb3 [ 54.955450] ? __might_resched+0x8a/0x240 [ 54.955951] ? __might_sleep+0x25/0xa0 [ 54.956424] ? first_zones_zonelist+0x2c/0x43 [ 54.956977] ? __rcu_read_lock+0x2d/0x3a [ 54.957476] ? __pte_offset_map+0x32/0xa4 [ 54.957980] ? __might_resched+0x8a/0x240 [ 54.958483] ? __might_sleep+0x25/0xa0 [ 54.958963] ? inet_send_prepare+0x54/0x54 [ 54.959478] ? sock_sendmsg_nosec+0x42/0x6c [ 54.960000] sock_sendmsg_nosec+0x42/0x6c [ 54.960502] __sys_sendto+0x15d/0x1cc [ 54.960966] ? __x64_sys_getpeername+0x44/0x44 [ 54.961522] ? __handle_mm_fault+0x679/0xae4 [ 54.962068] ? find_vma+0x6b/0x8b [ 54.962497] ? find_vma_intersection+0x8a/0x8a [ 54.963052] ? handle_mm_fault+0x38/0x154 [ 54.963556] ? handle_mm_fault+0xeb/0x154 [ 54.964059] ? preempt_latency_start+0x29/0x34 [ 54.964613] ? preempt_count_sub+0x14/0xb3 [ 54.965141] ? up_read+0x4b/0x5c [ 54.965557] __x64_sys_sendto+0x76/0x82 [ 54.966041] do_syscall_64+0x69/0xd5 [ 54.966497] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 54.967119] RIP: 0033:0x7f2d2fec9a73 [ 54.967572] Code: 8b 15 a9 83 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 80 3d 71 0b 0d 00 00 41 89 ca 74 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 55 48 83 ec 30 44 89 4c 24 [ 54.969747] RSP: 002b:00007ffe85756418 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 54.970655] RAX: ffffffffffffffda RBX: 0000558bebad1340 RCX: 00007f2d2fec9a73 [ 54.971511] RDX: 0000000000000040 RSI: 0000558bebad73c0 RDI: 0000000000000003 [ 54.972366] RBP: 0000558bebad73c0 R08: 0000558bebad35c0 R09: 0000000000000010 [ 54.973234] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000040 [ 54.974091] R13: 00007ffe85757b00 R14: 0000001d00000001 R15: 0000558bebad4680 [ 54.974951] </TASK> [ 54.975244] ================================================================== [ 54.976133] Disabling lock debugging due to kernel taint [ 54.976784] Oops: stack segment: 0000 [#1] PREEMPT DEBUG_PAGEALLOC KASAN [ 54.977603] CPU: 0 PID: 512 Comm: ping Tainted: G B 6.9.0-08574-gcd29a4313a1b #25 [ 54.978654] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 54.979750] RIP: 0010:xfrmi_rcv_cb+0x2d/0x295 [ 54.980293] Code: 00 00 41 57 41 56 41 89 f6 41 55 41 54 55 53 48 89 fb 51 85 f6 75 31 48 89 df e8 d7 e8 ff ff 48 89 c5 48 89 c7 e8 8b a4 4f ff <48> 8b 7d 00 48 89 ee e8 eb f3 ff ff 49 89 c5 b8 01 00 00 00 4d 85 [ 54.982462] RSP: 0018:ffffc90000007990 EFLAGS: 00010282 [ 54.983099] RAX: 0000000000000001 RBX: ffff8881126e9900 RCX: fffffbfff07b77cd [ 54.983948] RDX: fffffbfff07b77cd RSI: fffffbfff07b77cd RDI: ffffffff83dbbe60 [ 54.984794] RBP: 6b6b6b6b00000000 R08: 0000000000000008 R09: 0000000000000001 [ 54.985647] R10: ffffffff83dbbe67 R11: fffffbfff07b77cc R12: 00000000ffffffff [ 54.986512] R13: 00000000ffffffff R14: 00000000ffffffff R15: 0000000000000002 [ 54.987365] FS: 00007f2d2fc0dc40(0000) GS:ffffffff82eb2000(0000) knlGS:0000000000000000 [ 54.988329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 54.989026] CR2: 00007ffe85755ff8 CR3: 0000000109941000 CR4: 0000000000350ef0 [ 54.989897] Call Trace: [ 54.990223] <IRQ> [ 54.990500] ? __die_body+0x1a/0x56 [ 54.990950] ? die+0x30/0x49 [ 54.991326] ? do_trap+0x9b/0x132 [ 54.991751] ? do_error_trap+0x7d/0xaf [ 54.992223] ? exc_stack_segment+0x35/0x45 [ 54.992734] ? asm_exc_stack_segment+0x22/0x30 [ 54.993294] ? xfrmi_rcv_cb+0x2d/0x295 [ 54.993764] ? xfrm4_rcv_cb+0x3d/0x5e [ 54.994228] xfrm4_rcv_cb+0x46/0x5e [ 54.994670] xfrm_rcv_cb+0x7e/0xa1 [ 54.995106] xfrm_input+0x1b3a/0x1b96 [ 54.995572] ? xfrm_offload+0x41/0x41 [ 54.996038] ? raw_rcv+0x292/0x292 [ 54.996472] ? nf_conntrack_confirm+0xa2/0xa2 [ 54.997011] ? skb_sec_path+0xd/0x3f [ 54.997466] ? xfrmi_input+0x90/0xce [ 54.997925] xfrm4_esp_rcv+0x33/0x54 [ 54.998378] ip_protocol_deliver_rcu+0xd7/0x1b2 [ 54.998944] ip_local_deliver_finish+0x110/0x120 [ 54.999520] ? ip_protocol_deliver_rcu+0x1b2/0x1b2 [ 55.000111] NF_HOOK.constprop.0+0xf8/0x138 [ 55.000630] ? ip_sublist_rcv_finish+0x68/0x68 [ 55.001195] ? secure_tcpv6_ts_off+0x23/0x168 [ 55.001743] ? ip_protocol_deliver_rcu+0x1b2/0x1b2 [ 55.002331] ? __xfrm_policy_check2.constprop.0+0x18d/0x18d [ 55.003008] NF_HOOK.constprop.0+0xf8/0x138 [ 55.003527] ? ip_sublist_rcv_finish+0x68/0x68 [ 55.004078] ? __xfrm_policy_check2.constprop.0+0x18d/0x18d [ 55.004755] ? ip_local_deliver_finish+0x120/0x120 [ 55.005351] __netif_receive_skb_one_core+0xc9/0x107 [ 55.005972] ? __netif_receive_skb_list_core+0x1c2/0x1c2 [ 55.006626] ? blk_mq_start_stopped_hw_queues+0xc7/0xf9 [ 55.007266] ? blk_mq_start_stopped_hw_queue+0x38/0x38 [ 55.007899] ? virtqueue_get_buf_ctx+0x295/0x46b [ 55.008476] process_backlog+0xb3/0x187 [ 55.008961] __napi_poll.constprop.0+0x57/0x1a7 [ 55.009540] net_rx_action+0x1cb/0x380 [ 55.010020] ? __napi_poll.constprop.0+0x1a7/0x1a7 [ 55.010610] ? vring_new_virtqueue+0x17a/0x17a [ 55.011173] ? note_interrupt+0x2cd/0x367 [ 55.011675] handle_softirqs+0x13c/0x2c9 [ 55.012169] do_softirq+0x5f/0x7d [ 55.012597] </IRQ> [ 55.012882] <TASK> [ 55.013179] __local_bh_enable_ip+0x48/0x62 [ 55.013704] __neigh_event_send+0x3fd/0x4ca [ 55.014227] neigh_resolve_output+0x1e/0x210 [ 55.014761] ip_finish_output2+0x4bf/0x4f0 [ 55.015278] ? __ip_finish_output+0x171/0x1b8 [ 55.015823] ip_send_skb+0x25/0x57 [ 55.016261] raw_sendmsg+0xf95/0x10c0 [ 55.016729] ? check_new_pages+0x45/0x71 [ 55.017229] ? raw_hash_sk+0x21b/0x21b [ 55.017708] ? kernel_init_pages+0x42/0x51 [ 55.018225] ? prep_new_page+0x44/0x51 [ 55.018704] ? get_page_from_freelist+0x72b/0x915 [ 55.019292] ? signal_pending_state+0x77/0x77 [ 55.019840] ? preempt_count_sub+0x14/0xb3 [ 55.020357] ? __might_resched+0x8a/0x240 [ 55.020860] ? __might_sleep+0x25/0xa0 [ 55.021345] ? first_zones_zonelist+0x2c/0x43 [ 55.021896] ? __rcu_read_lock+0x2d/0x3a [ 55.022396] ? __pte_offset_map+0x32/0xa4 [ 55.022901] ? __might_resched+0x8a/0x240 [ 55.023404] ? __might_sleep+0x25/0xa0 [ 55.023879] ? inet_send_prepare+0x54/0x54 [ 55.024391] ? sock_sendmsg_nosec+0x42/0x6c [ 55.024918] sock_sendmsg_nosec+0x42/0x6c [ 55.025428] __sys_sendto+0x15d/0x1cc [ 55.025892] ? __x64_sys_getpeername+0x44/0x44 [ 55.026441] ? __handle_mm_fault+0x679/0xae4 [ 55.026988] ? find_vma+0x6b/0x8b [ 55.027414] ? find_vma_intersection+0x8a/0x8a [ 55.027966] ? handle_mm_fault+0x38/0x154 [ 55.028470] ? handle_mm_fault+0xeb/0x154 [ 55.028972] ? preempt_latency_start+0x29/0x34 [ 55.029532] ? preempt_count_sub+0x14/0xb3 [ 55.030047] ? up_read+0x4b/0x5c [ 55.030463] __x64_sys_sendto+0x76/0x82 [ 55.030949] do_syscall_64+0x69/0xd5 [ 55.031406] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 55.032028] RIP: 0033:0x7f2d2fec9a73 [ 55.032481] Code: 8b 15 a9 83 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 80 3d 71 0b 0d 00 00 41 89 ca 74 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 55 48 83 ec 30 44 89 4c 24 [ 55.034660] RSP: 002b:00007ffe85756418 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 55.035567] RAX: ffffffffffffffda RBX: 0000558bebad1340 RCX: 00007f2d2fec9a73 [ 55.036424] RDX: 0000000000000040 RSI: 0000558bebad73c0 RDI: 0000000000000003 [ 55.037293] RBP: 0000558bebad73c0 R08: 0000558bebad35c0 R09: 0000000000000010 [ 55.038153] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000040 [ 55.039012] R13: 00007ffe85757b00 R14: 0000001d00000001 R15: 0000558bebad4680 [ 55.039871] </TASK> [ 55.040167] Modules linked in: [ 55.040585] ---[ end trace 0000000000000000 ]--- [ 55.041164] RIP: 0010:xfrmi_rcv_cb+0x2d/0x295 [ 55.041714] Code: 00 00 41 57 41 56 41 89 f6 41 55 41 54 55 53 48 89 fb 51 85 f6 75 31 48 89 df e8 d7 e8 ff ff 48 89 c5 48 89 c7 e8 8b a4 4f ff <48> 8b 7d 00 48 89 ee e8 eb f3 ff ff 49 89 c5 b8 01 00 00 00 4d 85 [ 55.043889] RSP: 0018:ffffc90000007990 EFLAGS: 00010282 [ 55.044528] RAX: 0000000000000001 RBX: ffff8881126e9900 RCX: fffffbfff07b77cd [ 55.045386] RDX: fffffbfff07b77cd RSI: fffffbfff07b77cd RDI: ffffffff83dbbe60 [ 55.046250] RBP: 6b6b6b6b00000000 R08: 0000000000000008 R09: 0000000000000001 [ 55.047104] R10: ffffffff83dbbe67 R11: fffffbfff07b77cc R12: 00000000ffffffff [ 55.047960] R13: 00000000ffffffff R14: 00000000ffffffff R15: 0000000000000002 [ 55.048820] FS: 00007f2d2fc0dc40(0000) GS:ffffffff82eb2000(0000) knlGS:0000000000000000 [ 55.049805] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.050507] CR2: 00007ffe85755ff8 CR3: 0000000109941000 CR4: 0000000000350ef0 [ 55.051366] Kernel panic - not syncing: Fatal exception in interrupt [ 55.052136] Kernel Offset: disabled [ 55.052577] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes: 304b44f0d5a4 ("xfrm: Add dir validation to "in" data path lookup") Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-05-30net: fix __dst_negative_advice() raceEric Dumazet1-8/+3
__dst_negative_advice() does not enforce proper RCU rules when sk->dst_cache must be cleared, leading to possible UAF. RCU rules are that we must first clear sk->sk_dst_cache, then call dst_release(old_dst). Note that sk_dst_reset(sk) is implementing this protocol correctly, while __dst_negative_advice() uses the wrong order. Given that ip6_negative_advice() has special logic against RTF_CACHE, this means each of the three ->negative_advice() existing methods must perform the sk_dst_reset() themselves. Note the check against NULL dst is centralized in __dst_negative_advice(), there is no need to duplicate it in various callbacks. Many thanks to Clement Lecigne for tracking this issue. This old bug became visible after the blamed commit, using UDP sockets. Fixes: a87cb3e48ee8 ("net: Facility to report route quality of connected sockets") Reported-by: Clement Lecigne <clecigne@google.com> Diagnosed-by: Clement Lecigne <clecigne@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240528114353.1794151-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-0/+10
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 35d92abfbad8 ("net: hns3: fix kernel crash when devlink reload during initialization") 2a1a1a7b5fd7 ("net: hns3: add command queue trace for hns3") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protectionEric Dumazet1-1/+1
We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future. All rtnl_link_ops->get_link_net() methods already using dev_net() are ready. I added READ_ONCE() annotations on others. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-07Merge tag 'ipsec-next-2024-05-03' of ↵Jakub Kicinski8-9/+196
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-05-03 1) Remove Obsolete UDP_ENCAP_ESPINUDP_NON_IKE Support. This was defined by an early version of an IETF draft that did not make it to a standard. 2) Introduce direction attribute for xfrm states. xfrm states have a direction, a stsate can be used either for input or output packet processing. Add a direction to xfrm states to make it clear for what a xfrm state is used. * tag 'ipsec-next-2024-05-03' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: Restrict SA direction attribute to specific netlink message types xfrm: Add dir validation to "in" data path lookup xfrm: Add dir validation to "out" data path lookup xfrm: Add Direction to the SA in or out udpencap: Remove Obsolete UDP_ENCAP_ESPINUDP_NON_IKE Support ==================== Link: https://lore.kernel.org/r/20240503082732.2835810-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-03net: Remove ctl_table sentinel elements from several networking subsystemsJoel Granados1-4/+1
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) To avoid lots of small commits, this commit brings together network changes from (as they appear in MAINTAINERS) LLC, MPTCP, NETROM NETWORK LAYER, PHONET PROTOCOL, ROSE NETWORK LAYER, RXRPC SOCKETS, SCTP PROTOCOL, SHARED MEMORY COMMUNICATIONS (SMC), TIPC NETWORK LAYER and NETWORKING [IPSEC] * Remove sentinel element from ctl_table structs. * Replace empty array registration with the register_net_sysctl_sz call in llc_sysctl_init * Replace the for loop stop condition that tests for procname == NULL with one that depends on array size in sctp_sysctl_net_register * Remove instances where an array element is zeroed out to make it look like a sentinel in xfrm_sysctl_init. This is not longer needed and is safe after commit c899710fe7f9 ("networking: Update to register_net_sysctl_sz") added the array size to the ctl_table registration * Use a table_size variable to keep the value of ARRAY_SIZE Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-01xfrm: Restrict SA direction attribute to specific netlink message typesAntony Antony1-0/+24
Reject the usage of the SA_DIR attribute in xfrm netlink messages when it's not applicable. This ensures that SA_DIR is only accepted for certain message types (NEWSA, UPDSA, and ALLOCSPI) Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-05-01xfrm: Add dir validation to "in" data path lookupAntony Antony2-0/+12
Introduces validation for the x->dir attribute within the XFRM input data lookup path. If the configured direction does not match the expected direction, input, increment the XfrmInStateDirError counter and drop the packet to ensure data integrity and correct flow handling. grep -vw 0 /proc/net/xfrm_stat XfrmInStateDirError 1 Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-05-01xfrm: Add dir validation to "out" data path lookupAntony Antony2-0/+7
Introduces validation for the x->dir attribute within the XFRM output data lookup path. If the configured direction does not match the expected direction, output, increment the XfrmOutStateDirError counter and drop the packet to ensure data integrity and correct flow handling. grep -vw 0 /proc/net/xfrm_stat XfrmOutPolError 1 XfrmOutStateDirError 1 Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-05-01xfrm: Add Direction to the SA in or outAntony Antony5-9/+153
This patch introduces the 'dir' attribute, 'in' or 'out', to the xfrm_state, SA, enhancing usability by delineating the scope of values based on direction. An input SA will restrict values pertinent to input, effectively segregating them from output-related values. And an output SA will restrict attributes for output. This change aims to streamline the configuration process and improve the overall consistency of SA attributes during configuration. This feature sets the groundwork for future patches, including the upcoming IP-TFS patch. Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-04-29ipv6: introduce dst_rt6_info() helperEric Dumazet1-2/+1
Instead of (struct rt6_info *)dst casts, we can use : #define dst_rt6_info(_ptr) \ container_of_const(_ptr, struct rt6_info, dst) Some places needed missing const qualifiers : ip6_confirm_neigh(), ipv6_anycast_destination(), ipv6_unicast_destination(), has_gateway() v2: added missing parts (David Ahern) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-26xfrm: Preserve vlan tags for transport mode software GROPaul Davey1-0/+8
The software GRO path for esp transport mode uses skb_mac_header_rebuild prior to re-injecting the packet via the xfrm_napi_dev. This only copies skb->mac_len bytes of header which may not be sufficient if the packet contains 802.1Q tags or other VLAN tags. Worse copying only the initial header will leave a packet marked as being VLAN tagged but without the corresponding tag leading to mangling when it is later untagged. The VLAN tags are important when receiving the decrypted esp transport mode packet after GRO processing to ensure it is received on the correct interface. Therefore record the full mac header length in xfrm*_transport_input for later use in corresponding xfrm*_transport_finish to copy the entire mac header when rebuilding the mac header for GRO. The skb->data pointer is left pointing skb->mac_header bytes after the start of the mac header as is expected by the network stack and network and transport header offsets reset to this location. Fixes: 7785bba299a8 ("esp: Add a software GRO codepath") Signed-off-by: Paul Davey <paul.davey@alliedtelesis.co.nz> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-04-22sysctl: treewide: constify ctl_table_header::ctl_table_argThomas Weißschuh1-1/+1
To be able to constify instances of struct ctl_tables it is necessary to remove ways through which non-const versions are exposed from the sysctl core. One of these is the ctl_table_arg member of struct ctl_table_header. Constify this reference as a prerequisite for the full constification of struct ctl_table instances. No functional change. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-11xfrm: fix possible derferencing in error pathAntony Antony1-0/+2
Fix derferencing pointer when xfrm_policy_lookup_bytype returns an error. Fixes: 63b21caba17e ("xfrm: introduce forwarding of ICMP Error messages") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/kernel-janitors/f6ef0d0d-96de-4e01-9dc3-c1b3a6338653@moroto.mountain/ Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-03-18xfrm: Allow UDP encapsulation only in offload modesLeon Romanovsky1-1/+2
The missing check of x->encap caused to the situation where GSO packets were created with UDP encapsulation. As a solution return the encap check for non-offloaded SA. Fixes: 983a73da1f99 ("xfrm: Pass UDP encapsulation in TX packet offload") Closes: https://lore.kernel.org/all/a650221ae500f0c7cf496c61c96c1b103dcb6f67.camel@redhat.com Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-03-08Merge tag 'ipsec-next-2024-03-06' of ↵David S. Miller2-14/+143
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Introduce forwarding of ICMP Error messages. That is specified in RFC 4301 but was never implemented. From Antony Antony. 2) Use KMEM_CACHE instead of kmem_cache_create in xfrm6_tunnel_init() and xfrm_policy_init(). From Kunwu Chan. 3) Do not allocate stats in the xfrm interface driver, this can be done on net core now. From Breno Leitao. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: move netdev_max_backlog to net_hotdataEric Dumazet2-2/+5
netdev_max_backlog is used in rx fat path. Move it to net_hodata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-4/+13
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/page_pool_user.c 0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors") 429679dcf7d9 ("page_pool: fix netlink dump stop/resume") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07Merge tag 'ipsec-2024-03-06' of ↵Jakub Kicinski4-4/+13
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2024-03-06 1) Clear the ECN bits flowi4_tos in decode_session4(). This was already fixed but the bug was reintroduced when decode_session4() switched to us the flow dissector. From Guillaume Nault. 2) Fix UDP encapsulation in the TX path with packet offload mode. From Leon Romanovsky, 3) Avoid clang fortify warning in copy_to_user_tmpl(). From Nathan Chancellor. 4) Fix inter address family tunnel in packet offload mode. From Mike Yu. * tag 'ipsec-2024-03-06' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: set skb control buffer based on packet offload as well xfrm: fix xfrm child route lookup for packet offload xfrm: Avoid clang fortify warning in copy_to_user_tmpl() xfrm: Pass UDP encapsulation in TX packet offload xfrm: Clear low order bits of ->flowi4_tos in decode_session4(). ==================== Link: https://lore.kernel.org/r/20240306100438.3953516-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-05xfrm: set skb control buffer based on packet offload as wellMike Yu1-1/+5
In packet offload, packets are not encrypted in XFRM stack, so the next network layer which the packets will be forwarded to should depend on where the packet came from (either xfrm4_output or xfrm6_output) rather than the matched SA's family type. Test: verified IPv6-in-IPv4 packets on Android device with IPsec packet offload enabled Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-03-05xfrm: fix xfrm child route lookup for packet offloadMike Yu1-1/+3
In current code, xfrm_bundle_create() always uses the matched SA's family type to look up a xfrm child route for the skb. The route returned by xfrm_dst_lookup() will eventually be used in xfrm_output_resume() (skb_dst(skb)->ops->local_out()). If packet offload is used, the above behavior can lead to calling ip_local_out() for an IPv6 packet or calling ip6_local_out() for an IPv4 packet, which is likely to fail. This change fixes the behavior by checking if the matched SA has packet offload enabled. If not, keep the same behavior; if yes, use the matched SP's family type for the lookup. Test: verified IPv6-in-IPv4 packets on Android device with IPsec packet offload enabled Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-03-03Merge tag 'for-netdev' of ↵Jakub Kicinski2-4/+4
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-02-29 We've added 119 non-merge commits during the last 32 day(s) which contain a total of 150 files changed, 3589 insertions(+), 995 deletions(-). The main changes are: 1) Extend the BPF verifier to enable static subprog calls in spin lock critical sections, from Kumar Kartikeya Dwivedi. 2) Fix confusing and incorrect inference of PTR_TO_CTX argument type in BPF global subprogs, from Andrii Nakryiko. 3) Larger batch of riscv BPF JIT improvements and enabling inlining of the bpf_kptr_xchg() for RV64, from Pu Lehui. 4) Allow skeleton users to change the values of the fields in struct_ops maps at runtime, from Kui-Feng Lee. 5) Extend the verifier's capabilities of tracking scalars when they are spilled to stack, especially when the spill or fill is narrowing, from Maxim Mikityanskiy & Eduard Zingerman. 6) Various BPF selftest improvements to fix errors under gcc BPF backend, from Jose E. Marchesi. 7) Avoid module loading failure when the module trying to register a struct_ops has its BTF section stripped, from Geliang Tang. 8) Annotate all kfuncs in .BTF_ids section which eventually allows for automatic kfunc prototype generation from bpftool, from Daniel Xu. 9) Several updates to the instruction-set.rst IETF standardization document, from Dave Thaler. 10) Shrink the size of struct bpf_map resp. bpf_array, from Alexei Starovoitov. 11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer, from Benjamin Tissoires. 12) Fix bpftool to be more portable to musl libc by using POSIX's basename(), from Arnaldo Carvalho de Melo. 13) Add libbpf support to gcc in CORE macro definitions, from Cupertino Miranda. 14) Remove a duplicate type check in perf_event_bpf_event, from Florian Lehner. 15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them with notrace correctly, from Yonghong Song. 16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible array to fix build warnings, from Kees Cook. 17) Fix resolve_btfids cross-compilation to non host-native endianness, from Viktor Malik. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits) selftests/bpf: Test if shadow types work correctly. bpftool: Add an example for struct_ops map and shadow type. bpftool: Generated shadow variables for struct_ops maps. libbpf: Convert st_ops->data to shadow type. libbpf: Set btf_value_type_id of struct bpf_map for struct_ops. bpf: Replace bpf_lpm_trie_key 0-length array with flexible array bpf, arm64: use bpf_prog_pack for memory management arm64: patching: implement text_poke API bpf, arm64: support exceptions arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT bpf: add is_async_callback_calling_insn() helper bpf: introduce in_sleepable() helper bpf: allow more maps in sleepable bpf programs selftests/bpf: Test case for lacking CFI stub functions. bpf: Check cfi_stubs before registering a struct_ops type. bpf: Clarify batch lookup/lookup_and_delete semantics bpf, docs: specify which BPF_ABS and BPF_IND fields were zero bpf, docs: Fix typos in instruction-set.rst selftests/bpf: update tcp_custom_syncookie to use scalar packet offset bpf: Shrink size of struct bpf_map/bpf_array. ... ==================== Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-27xfrm: Do not allocate stats in the driverBreno Leitao1-8/+2
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Remove the allocation in the xfrm driver and leverage the network core allocation. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-02-26rtnetlink: prepare nla_put_iflink() to run under RCUEric Dumazet1-1/+1
We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future. This patch prepares dev_get_iflink() and nla_put_iflink() to run either with RTNL or RCU held. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>