summaryrefslogtreecommitdiff
path: root/include/net/xfrm.h
AgeCommit message (Collapse)AuthorFilesLines
2022-07-25Merge branch 'master' of ↵David S. Miller1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2022-07-20 1) Don't set DST_NOPOLICY in IPv4, a recent patch made this superfluous. From Eyal Birger. 2) Convert alg_key to flexible array member to avoid an iproute2 compile warning when built with gcc-12. From Stephen Hemminger. 3) xfrm_register_km and xfrm_unregister_km do always return 0 so change the type to void. From Zhengchao Shao. 4) Fix spelling mistake in esp6.c From Zhang Jiaming. 5) Improve the wording of comment above XFRM_OFFLOAD flags. From Petr Vaněk. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-11net: Find dst with sk's xfrm policy not ctl_sksewookseo1-0/+2
If we set XFRM security policy by calling setsockopt with option IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock' struct. However tcp_v6_send_response doesn't look up dst_entry with the actual socket but looks up with tcp control socket. This may cause a problem that a RST packet is sent without ESP encryption & peer's TCP socket can't receive it. This patch will make the function look up dest_entry with actual socket, if the socket has XFRM policy(sock_policy), so that the TCP response packet via this function can be encrypted, & aligned on the encrypted TCP socket. Tested: We encountered this problem when a TCP socket which is encrypted in ESP transport mode encryption, receives challenge ACK at SYN_SENT state. After receiving challenge ACK, TCP needs to send RST to establish the socket at next SYN try. But the RST was not encrypted & peer TCP socket still remains on ESTABLISHED state. So we verified this with test step as below. [Test step] 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED). 2. Client tries a new connection on the same TCP ports(src & dst). 3. Server will return challenge ACK instead of SYN,ACK. 4. Client will send RST to server to clear the SOCKET. 5. Client will retransmit SYN to server on the same TCP ports. [Expected result] The TCP connection should be established. Cc: Maciej Żenczykowski <maze@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Sehee Lee <seheele@google.com> Signed-off-by: Sewook Seo <sewookseo@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-24xfrm: change the type of xfrm_register_km and xfrm_unregister_kmZhengchao Shao1-2/+2
Functions xfrm_register_km and xfrm_unregister_km do always return 0, change the type of functions to void. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-06-10net: rename reference+tracking helpersJakub Kicinski1-1/+1
Netdev reference helpers have a dev_ prefix for historic reasons. Renaming the old helpers would be too much churn but we can rename the tracking ones which are relatively recent and should be the default for new code. Rename: dev_hold_track() -> netdev_hold() dev_put_track() -> netdev_put() dev_replace_track() -> netdev_ref_replace() Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+13
drivers/net/ethernet/mellanox/mlx5/core/main.c b33886971dbc ("net/mlx5: Initialize flow steering during driver probe") 40379a0084c2 ("net/mlx5_fpga: Drop INNOVA TLS support") f2b41b32cde8 ("net/mlx5: Remove ipsec_ops function table") https://lore.kernel.org/all/20220519040345.6yrjromcdistu7vh@sx1/ 16d42d313350 ("net/mlx5: Drain fw_reset when removing device") 8324a02c342a ("net/mlx5: Add exit route when waiting for FW") https://lore.kernel.org/all/20220519114119.060ce014@canb.auug.org.au/ tools/testing/selftests/net/mptcp/mptcp_join.sh e274f7154008 ("selftests: mptcp: add subflow limits test-cases") b6e074e171bc ("selftests: mptcp: add infinite map testcase") 5ac1d2d63451 ("selftests: mptcp: Add tests for userspace PM type") https://lore.kernel.org/all/20220516111918.366d747f@canb.auug.org.au/ net/mptcp/options.c ba2c89e0ea74 ("mptcp: fix checksum byte order") 1e39e5a32ad7 ("mptcp: infinite mapping sending") ea66758c1795 ("tcp: allow MPTCP to update the announced window") https://lore.kernel.org/all/20220519115146.751c3a37@canb.auug.org.au/ net/mptcp/pm.c 95d686517884 ("mptcp: fix subflow accounting on close") 4d25247d3ae4 ("mptcp: bypass in-kernel PM restrictions for non-kernel PMs") https://lore.kernel.org/all/20220516111435.72f35dca@canb.auug.org.au/ net/mptcp/subflow.c ae66fb2ba6c3 ("mptcp: Do TCP fallback on early DSS checksum failure") 0348c690ed37 ("mptcp: add the fallback check") f8d4bcacff3b ("mptcp: infinite mapping receiving") https://lore.kernel.org/all/20220519115837.380bb8d4@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-16xfrm: fix "disable_policy" flag use when arriving from different devicesEyal Birger1-1/+13
In IPv4 setting the "disable_policy" flag on a device means no policy should be enforced for traffic originating from the device. This was implemented by seting the DST_NOPOLICY flag in the dst based on the originating device. However, dsts are cached in nexthops regardless of the originating devices, in which case, the DST_NOPOLICY flag value may be incorrect. Consider the following setup: +------------------------------+ | ROUTER | +-------------+ | +-----------------+ | | ipsec src |----|-|ipsec0 | | +-------------+ | |disable_policy=0 | +----+ | | +-----------------+ |eth1|-|----- +-------------+ | +-----------------+ +----+ | | noipsec src |----|-|eth0 | | +-------------+ | |disable_policy=1 | | | +-----------------+ | +------------------------------+ Where ROUTER has a default route towards eth1. dst entries for traffic arriving from eth0 would have DST_NOPOLICY and would be cached and therefore can be reused by traffic originating from ipsec0, skipping policy check. Fix by setting a IPSKB_NOPOLICY flag in IPCB and observing it instead of the DST in IN/FWD IPv4 policy checks. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-05-06xfrm: drop not needed flags variable in XFRM offload structLeon Romanovsky1-1/+0
After drivers were converted to rely on direction, the flags is not used anymore and can be removed. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-05-06xfrm: store and rely on direction to construct offload flagsLeon Romanovsky1-0/+6
XFRM state doesn't need anything from flags except to understand direction, so store it separately. For future patches, such change will allow us to reuse xfrm_dev_offload for policy offload too, which has three possible directions instead of two. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-05-06xfrm: rename xfrm_state_offload struct to allow reuseLeon Romanovsky1-5/+5
The struct xfrm_state_offload has all fields needed to hold information for offloaded policies too. In order to do not create new struct with same fields, let's rename existing one and reuse it later. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-05-06xfrm: delete not used number of external headersLeon Romanovsky1-1/+0
num_exthdrs is set but never used, so delete it. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-05-06xfrm: free not used XFRM_ESP_NO_TRAILER flagLeon Romanovsky1-1/+1
After removal of Innova IPsec support from mlx5 driver, the last user of this XFRM_ESP_NO_TRAILER was gone too. This means that we can safely remove it as no other hardware is capable (or need) to remove ESP trailer. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-19Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/David S. Miller1-30/+18
ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2022-03-19 1) Delete duplicated functions that calls same xfrm_api_check. From Leon Romanovsky. 2) Align userland API of the default policy structure to the internal structures. From Nicolas Dichtel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-18xfrm: rework default policy structureNicolas Dichtel1-30/+18
This is a follow up of commit f8d858e607b2 ("xfrm: make user policy API complete"). The goal is to align userland API to the internal structures. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-01-27Revert "xfrm: xfrm_state_mtu should return at least 1280 for ipv6"Jiri Bohac1-1/+0
This reverts commit b515d2637276a3810d6595e10ab02c13bfd0b63a. Commit b515d2637276a3810d6595e10ab02c13bfd0b63a ("xfrm: xfrm_state_mtu should return at least 1280 for ipv6") in v5.14 breaks the TCP MSS calculation in ipsec transport mode, resulting complete stalls of TCP connections. This happens when the (P)MTU is 1280 or slighly larger. The desired formula for the MSS is: MSS = (MTU - ESP_overhead) - IP header - TCP header However, the above commit clamps the (MTU - ESP_overhead) to a minimum of 1280, turning the formula into MSS = max(MTU - ESP overhead, 1280) - IP header - TCP header With the (P)MTU near 1280, the calculated MSS is too large and the resulting TCP packets never make it to the destination because they are over the actual PMTU. The above commit also causes suboptimal double fragmentation in xfrm tunnel mode, as described in https://lore.kernel.org/netdev/20210429202529.codhwpc7w6kbudug@dwarf.suse.cz/ The original problem the above commit was trying to fix is now fixed by commit 6596a0229541270fb8d38d989f91b78838e5e9da ("xfrm: fix MTU regression"). Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-01-26xfrm: Check if_id in xfrm_migrateYan Yan1-2/+3
This patch enables distinguishing SAs and SPs based on if_id during the xfrm_migrate flow. This ensures support for xfrm interfaces throughout the SA/SP lifecycle. When there are multiple existing SPs with the same direction, the same xfrm_selector and different endpoint addresses, xfrm_migrate might fail with ENODATA. Specifically, the code path for performing xfrm_migrate is: Stage 1: find policy to migrate with xfrm_migrate_policy_find(sel, dir, type, net) Stage 2: find and update state(s) with xfrm_migrate_state_find(mp, net) Stage 3: update endpoint address(es) of template(s) with xfrm_policy_migrate(pol, m, num_migrate) Currently "Stage 1" always returns the first xfrm_policy that matches, and "Stage 3" looks for the xfrm_tmpl that matches the old endpoint address. Thus if there are multiple xfrm_policy with same selector, direction, type and net, "Stage 1" might rertun a wrong xfrm_policy and "Stage 3" will fail with ENODATA because it cannot find a xfrm_tmpl with the matching endpoint address. The fix is to allow userspace to pass an if_id and add if_id to the matching rule in Stage 1 and Stage 2 since if_id is a unique ID for xfrm_policy and xfrm_state. For compatibility, if_id will only be checked if the attribute is set. Tested with additions to Android's kernel unit test suite: https://android-review.googlesource.com/c/kernel/tests/+/1668886 Signed-off-by: Yan Yan <evitayan@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-01-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
Merge in fixes directly in prep for the 5.17 merge window. No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06Merge branch 'master' of ↵David S. Miller1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2022-01-06 1) Fix some clang_analyzer warnings about never read variables. From luo penghao. 2) Check for pols[0] only once in xfrm_expand_policies(). From Jean Sacren. 3) The SA curlft.use_time was updated only on SA cration time. Update whenever the SA is used. From Antony Antony 4) Add support for SM3 secure hash. From Xu Jia. 5) Add support for SM4 symmetric cipher algorithm. From Xu Jia. 6) Add a rate limit for SA mapping change messages. From Antony Antony. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-23xfrm: rate limit SA mapping change message to user spaceAntony Antony1-0/+5
Kernel generates mapping change message, XFRM_MSG_MAPPING, when a source port chage is detected on a input state with UDP encapsulation set. Kernel generates a message for each IPsec packet with new source port. For a high speed flow per packet mapping change message can be excessive, and can overload the user space listener. Introduce rate limiting for XFRM_MSG_MAPPING message to the user space. The rate limiting is configurable via netlink, when adding a new SA or updating it. Use the new attribute XFRMA_MTIMER_THRESH in seconds. v1->v2 change: update xfrm_sa_len() v2->v3 changes: use u32 insted unsigned long to reduce size of struct xfrm_state fix xfrm_ompat size Reported-by: kernel test robot <lkp@intel.com> accept XFRM_MSG_MAPPING only when XFRMA_ENCAP is present Co-developed-by: Thomas Egerer <thomas.egerer@secunet.com> Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com> Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-12-10xfrm: add net device refcount tracker to struct xfrm_state_offloadEric Dumazet1-1/+2
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Link: https://lore.kernel.org/r/20211209154451.4184050-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-23xfrm: fix dflt policy check when there is no policy configuredNicolas Dichtel1-1/+1
When there is no policy configured on the system, the default policy is checked in xfrm_route_forward. However, it was done with the wrong direction (XFRM_POLICY_FWD instead of XFRM_POLICY_OUT). The default policy for XFRM_POLICY_FWD was checked just before, with a call to xfrm[46]_policy_check(). CC: stable@vger.kernel.org Fixes: 2d151d39073a ("xfrm: Add possibility to set the default to block if we have no policy") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-07-21xfrm: Add possibility to set the default to block if we have no policySteffen Klassert1-6/+30
As the default we assume the traffic to pass, if we have no matching IPsec policy. With this patch, we have a possibility to change this default from allow to block. It can be configured via netlink. Each direction (input/output/forward) can be configured separately. With the default to block configuered, we need allow policies for all packet flows we accept. We do not use default policy lookup for the loopback device. v1->v2 - fix compiling when XFRM is disabled - Reported-by: kernel test robot <lkp@intel.com> Co-developed-by: Christian Langrock <christian.langrock@secunet.com> Signed-off-by: Christian Langrock <christian.langrock@secunet.com> Co-developed-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
Trivial conflict in net/netfilter/nf_tables_api.c. Duplicate fix in tools/testing/selftests/net/devlink_port_split.py - take the net-next version. skmsg, and L4 bpf - keep the bpf code but remove the flags and err params. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-28Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/gitDavid S. Miller1-22/+15
/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2021-06-28 1) Remove an unneeded error assignment in esp4_gro_receive(). From Yang Li. 2) Add a new byseq state hashtable to find acquire states faster. From Sabrina Dubroca. 3) Remove some unnecessary variables in pfkey_create(). From zuoqilin. 4) Remove the unused description from xfrm_type struct. From Florian Westphal. 5) Fix a spelling mistake in the comment of xfrm_state_ok(). From gushengxian. 6) Replace hdr_off indirections by a small helper function. From Florian Westphal. 7) Remove xfrm4_output_finish and xfrm6_output_finish declarations, they are not used anymore.From Antony Antony. 8) Remove xfrm replay indirections. From Florian Westphal. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-23net/xfrm: Add inner_ipproto into sec_pathHuy Nguyen1-0/+1
The inner_ipproto saves the inner IP protocol of the plain text packet. This allows vendor's IPsec feature making offload decision at skb's features_check and configuring hardware at ndo_start_xmit. For example, ConnectX6-DX IPsec device needs the plaintext's IP protocol to support partial checksum offload on VXLAN/GENEVE packet over IPsec transport mode tunnel. Signed-off-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Huy Nguyen <huyn@nvidia.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-21xfrm: replay: remove last replay indirectionFlorian Westphal1-7/+1
This replaces the overflow indirection with the new xfrm_replay_overflow helper. After this, the 'repl' pointer in xfrm_state is no longer needed and can be removed as well. xfrm_replay_overflow() is added in two incarnations, one is used when the kernel is compiled with xfrm hardware offload support enabled, the other when its disabled. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-21xfrm: replay: avoid replay indirectionFlorian Westphal1-3/+1
Add and use xfrm_replay_check helper instead of indirection. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-21xfrm: replay: remove recheck indirectionFlorian Westphal1-3/+1
Adds new xfrm_replay_recheck() helper and calls it from xfrm input path instead of the indirection. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-21xfrm: replay: remove advance indirectionFlorian Westphal1-1/+1
Similar to other patches: add a new helper to avoid an indirection. v2: fix 'net/xfrm/xfrm_replay.c:519:13: warning: 'seq' may be used uninitialized in this function' warning. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-21xfrm: replay: avoid xfrm replay notify indirectionFlorian Westphal1-1/+10
replay protection is implemented using a callback structure and then called via x->repl->notify(), x->repl->recheck(), and so on. all the differect functions are always built-in, so this could be direct calls instead. This first patch prepares for removal of the x->repl structure. Add an enum with the three available replay modes to the xfrm_state structure and then replace all x->repl->notify() calls by the new xfrm_replay_notify() helper. The helper checks the enum internally to adapt behaviour as needed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-16xfrm: delete xfrm4_output_finish xfrm6_output_finish declarationsAntony Antony1-2/+0
These function declarations are not needed any more. The definitions were deleted. Fixes: 2ab6096db2f1 ("xfrm: remove output_finish indirection from xfrm_state_afinfo") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-11xfrm: remove hdr_offset indirectionFlorian Westphal1-3/+0
After previous patches all remaining users set the function pointer to the same function: xfrm6_find_1stfragopt. So remove this function pointer and call ip6_find_1stfragopt directly. Reduces size of xfrm_type to 64 bytes on 64bit platforms. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-09xfrm: remove description from xfrm_type structFlorian Westphal1-2/+0
Its set but never read. Reduces size of xfrm_type to 64 bytes on 64bit. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-06-01xfrm: Remove the repeated declarationShaokun Zhang1-1/+0
Function 'xfrm_parse_spi' is declared twice, so remove the repeated declaration. Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-05-14xfrm: add state hashtable keyed by seqSabrina Dubroca1-0/+1
When creating new states with seq set in xfrm_usersa_info, we walk through all the states already installed in that netns to find a matching ACQUIRE state (__xfrm_find_acq_byseq, called from xfrm_state_add). This causes severe slowdowns on systems with a large number of states. This patch introduces a hashtable using x->km.seq as key, so that the corresponding state can be found in a reasonable time. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-04-19xfrm: xfrm_state_mtu should return at least 1280 for ipv6Sabrina Dubroca1-0/+1
Jianwen reported that IPv6 Interoperability tests are failing in an IPsec case where one of the links between the IPsec peers has an MTU of 1280. The peer generates a packet larger than this MTU, the router replies with a "Packet too big" message indicating an MTU of 1280. When the peer tries to send another large packet, xfrm_state_mtu returns 1280 - ipsec_overhead, which causes ip6_setup_cork to fail with EINVAL. We can fix this by forcing xfrm_state_mtu to return IPV6_MIN_MTU when IPv6 is used. After going through IPsec, the packet will then be fragmented to obey the actual network's PMTU, just before leaving the host. Currently, TFC padding is capped to PMTU - overhead to avoid fragementation: after padding and encapsulation, we still fit within the PMTU. That behavior is preserved in this patch. Fixes: 91657eafb64b ("xfrm: take net hdr len into account for esp payload size calculation") Reported-by: Jianwen Ji <jiji@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-03-24xfrm: Fix NULL pointer dereference on policy lookupSteffen Klassert1-1/+1
When xfrm interfaces are used in combination with namespaces and ESP offload, we get a dst_entry NULL pointer dereference. This is because we don't have a dst_entry attached in the ESP offloading case and we need to do a policy lookup before the namespace transition. Fix this by expicit checking of skb_dst(skb) before accessing it. Fixes: f203b76d78092 ("xfrm: Add virtual xfrm interfaces") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-03-03xfrm: Use actual socket sk instead of skb socket for xfrm_output_resumeEvan Nimmo1-1/+1
A situation can occur where the interface bound to the sk is different to the interface bound to the sk attached to the skb. The interface bound to the sk is the correct one however this information is lost inside xfrm_output2 and instead the sk on the skb is used in xfrm_output_resume instead. This assumes that the sk bound interface and the bound interface attached to the sk within the skb are the same which can lead to lookup failures inside ip_route_me_harder resulting in the packet being dropped. We have an l2tp v3 tunnel with ipsec protection. The tunnel is in the global VRF however we have an encapsulated dot1q tunnel interface that is within a different VRF. We also have a mangle rule that marks the packets causing them to be processed inside ip_route_me_harder. Prior to commit 31c70d5956fc ("l2tp: keep original skb ownership") this worked fine as the sk attached to the skb was changed from the dot1q encapsulated interface to the sk for the tunnel which meant the interface bound to the sk and the interface bound to the skb were identical. Commit 46d6c5ae953c ("netfilter: use actual socket sk rather than skb sk when routing harder") fixed some of these issues however a similar problem existed in the xfrm code. Fixes: 31c70d5956fc ("l2tp: keep original skb ownership") Signed-off-by: Evan Nimmo <evan.nimmo@alliedtelesis.co.nz> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-10-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-10/+6
Rejecting non-native endian BTF overlapped with the addition of support for it. The rest were more simple overlapping changes, except the renesas ravb binding update, which had to follow a file move as well as a YAML conversion. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24xfrm/compat: Translate 32-bit user_policy from sockptrDmitry Safonov1-0/+3
Provide compat_xfrm_userpolicy_info translation for xfrm setsocketopt(). Reallocate buffer and put the missing padding for 64-bit message. Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-09-24xfrm/compat: Add 32=>64-bit messages translatorDmitry Safonov1-0/+6
Provide the user-to-kernel translator under XFRM_USER_COMPAT, that creates for 32-bit xfrm-user message a 64-bit translation. The translation is afterwards reused by xfrm_user code just as if userspace had sent 64-bit message. Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-09-24xfrm/compat: Add 64=>32-bit messages translatorDmitry Safonov1-0/+5
Provide the kernel-to-user translator under XFRM_USER_COMPAT, that creates for 64-bit xfrm-user message a 32-bit translation and puts it in skb's frag_list. net/compat.c layer provides MSG_CMSG_COMPAT to decide if the message should be taken from skb or frag_list. (used by wext-core which has also an ABI difference) Kernel sends 64-bit xfrm messages to the userspace for: - multicast (monitor events) - netlink dumps Wire up the translator to xfrm_nlmsg_multicast(). Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-09-24xfrm: Provide API to register translator moduleDmitry Safonov1-0/+19
Add a skeleton for xfrm_compat module and provide API to register it in xfrm_state.ko. struct xfrm_translator will have function pointers to translate messages received from 32-bit userspace or to be sent to it from 64-bit kernel. module_get()/module_put() are used instead of rcu_read_lock() as the module will vmalloc() memory for translation. The new API is registered with xfrm_state module, not with xfrm_user as the former needs translator for user_policy set by setsockopt() and xfrm_user already uses functions from xfrm_state. Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-09-07xfrm: clone XFRMA_REPLAY_ESN_VAL in xfrm_do_migrateAntony Antony1-10/+6
XFRMA_REPLAY_ESN_VAL was not cloned completely from the old to the new. Migrate this attribute during XFRMA_MSG_MIGRATE v1->v2: - move curleft cloning to a separate patch Fixes: af2f464e326e ("xfrm: Assign esn pointers when cloning a state") Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-08-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-6/+9
Resolved kernel/bpf/btf.c using instructions from merge commit 69138b34a7248d2396ab85c8652e20c0c39beaba Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-31Merge branch 'master' of ↵David S. Miller1-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2020-07-30 Please note that I did the first time now --no-ff merges of my testing branch into the master branch to include the [PATCH 0/n] message of a patchset. Please let me know if this is desirable, or if I should do it any different. 1) Introduce a oseq-may-wrap flag to disable anti-replay protection for manually distributed ICVs as suggested in RFC 4303. From Petr Vaněk. 2) Patchset to fully support IPCOMP for vti4, vti6 and xfrm interfaces. From Xin Long. 3) Switch from a linear list to a hash list for xfrm interface lookups. From Eyal Birger. 4) Fixes to not register one xfrm(6)_tunnel object twice. From Xin Long. 5) Fix two compile errors that were introduced with the IPCOMP support for vti and xfrm interfaces. Also from Xin Long. 6) Make the policy hold queue work with VTI. This was forgotten when VTI was implemented. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-25net/xfrm: switch xfrm_user_policy to sockptr_tChristoph Hellwig1-3/+5
Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21xfrm: Fix crash when the hold queue is used.Steffen Klassert1-2/+2
The commits "xfrm: Move dst->path into struct xfrm_dst" and "net: Create and use new helper xfrm_dst_child()." changed xfrm bundle handling under the assumption that xdst->path and dst->child are not a NULL pointer only if dst->xfrm is not a NULL pointer. That is true with one exception. If the xfrm hold queue is used to wait until a SA is installed by the key manager, we create a dummy bundle without a valid dst->xfrm pointer. The current xfrm bundle handling crashes in that case. Fix this by extending the NULL check of dst->xfrm with a test of the DST_XFRM_QUEUE flag. Fixes: 0f6c480f23f4 ("xfrm: Move dst->path into struct xfrm_dst") Fixes: b92cf4aab8e6 ("net: Create and use new helper xfrm_dst_child().") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-07-09tunnel6: add tunnel6_input_afinfo for ipip and ipv6 tunnelsXin Long1-0/+1
This patch is to register a callback function tunnel6_rcv_cb with is_ipip set in a xfrm_input_afinfo object for tunnel6 and tunnel46. It will be called by xfrm_rcv_cb() from xfrm_input() when family is AF_INET6 and proto is IPPROTO_IPIP or IPPROTO_IPV6. v1->v2: - Fix a sparse warning caused by the missing "__rcu", as Jakub noticed. - Handle the err returned by xfrm_input_register_afinfo() in tunnel6_init/fini(), as Sabrina noticed. v2->v3: - Add "#if IS_ENABLED(CONFIG_INET6_XFRM_TUNNEL)" to fix the build error when xfrm is disabled, reported by kbuild test robot Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-07-09tunnel4: add cb_handler to struct xfrm_tunnelXin Long1-0/+1
This patch is to register a callback function tunnel4_rcv_cb with is_ipip set in a xfrm_input_afinfo object for tunnel4 and tunnel64. It will be called by xfrm_rcv_cb() from xfrm_input() when family is AF_INET and proto is IPPROTO_IPIP or IPPROTO_IPV6. v1->v2: - Fix a sparse warning caused by the missing "__rcu", as Jakub noticed. - Handle the err returned by xfrm_input_register_afinfo() in tunnel4_init/fini(), as Sabrina noticed. v2->v3: - Add "#if IS_ENABLED(CONFIG_INET_XFRM_TUNNEL)" to fix the build error when xfrm is disabled, reported by kbuild test robot. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-07-09xfrm: add is_ipip to struct xfrm_input_afinfoXin Long1-1/+2
This patch is to add a new member is_ipip to struct xfrm_input_afinfo, to allow another group family of callback functions to be registered with is_ipip set. This will be used for doing a callback for struct xfrm(6)_tunnel of ipip/ipv6 tunnels in xfrm_input() by calling xfrm_rcv_cb(), which is needed by ipip/ipv6 tunnels' support in ip(6)_vti and xfrm interface in the next patches. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>