summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2022-04-13nfc: nci: add flush_workqueue to prevent uafLin Ma1-0/+4
Our detector found a concurrent use-after-free bug when detaching an NCI device. The main reason for this bug is the unexpected scheduling between the used delayed mechanism (timer and workqueue). The race can be demonstrated below: Thread-1 Thread-2 | nci_dev_up() | nci_open_device() | __nci_request(nci_reset_req) | nci_send_cmd | queue_work(cmd_work) nci_unregister_device() | nci_close_device() | ... del_timer_sync(cmd_timer)[1] | ... | Worker nci_free_device() | nci_cmd_work() kfree(ndev)[3] | mod_timer(cmd_timer)[2] In short, the cleanup routine thought that the cmd_timer has already been detached by [1] but the mod_timer can re-attach the timer [2], even it is already released [3], resulting in UAF. This UAF is easy to trigger, crash trace by POC is like below [ 66.703713] ================================================================== [ 66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490 [ 66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33 [ 66.703974] [ 66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5 [ 66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work [ 66.703974] Call Trace: [ 66.703974] <TASK> [ 66.703974] dump_stack_lvl+0x57/0x7d [ 66.703974] print_report.cold+0x5e/0x5db [ 66.703974] ? enqueue_timer+0x448/0x490 [ 66.703974] kasan_report+0xbe/0x1c0 [ 66.703974] ? enqueue_timer+0x448/0x490 [ 66.703974] enqueue_timer+0x448/0x490 [ 66.703974] __mod_timer+0x5e6/0xb80 [ 66.703974] ? mark_held_locks+0x9e/0xe0 [ 66.703974] ? try_to_del_timer_sync+0xf0/0xf0 [ 66.703974] ? lockdep_hardirqs_on_prepare+0x17b/0x410 [ 66.703974] ? queue_work_on+0x61/0x80 [ 66.703974] ? lockdep_hardirqs_on+0xbf/0x130 [ 66.703974] process_one_work+0x8bb/0x1510 [ 66.703974] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 66.703974] ? pwq_dec_nr_in_flight+0x230/0x230 [ 66.703974] ? rwlock_bug.part.0+0x90/0x90 [ 66.703974] ? _raw_spin_lock_irq+0x41/0x50 [ 66.703974] worker_thread+0x575/0x1190 [ 66.703974] ? process_one_work+0x1510/0x1510 [ 66.703974] kthread+0x2a0/0x340 [ 66.703974] ? kthread_complete_and_exit+0x20/0x20 [ 66.703974] ret_from_fork+0x22/0x30 [ 66.703974] </TASK> [ 66.703974] [ 66.703974] Allocated by task 267: [ 66.703974] kasan_save_stack+0x1e/0x40 [ 66.703974] __kasan_kmalloc+0x81/0xa0 [ 66.703974] nci_allocate_device+0xd3/0x390 [ 66.703974] nfcmrvl_nci_register_dev+0x183/0x2c0 [ 66.703974] nfcmrvl_nci_uart_open+0xf2/0x1dd [ 66.703974] nci_uart_tty_ioctl+0x2c3/0x4a0 [ 66.703974] tty_ioctl+0x764/0x1310 [ 66.703974] __x64_sys_ioctl+0x122/0x190 [ 66.703974] do_syscall_64+0x3b/0x90 [ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 66.703974] [ 66.703974] Freed by task 406: [ 66.703974] kasan_save_stack+0x1e/0x40 [ 66.703974] kasan_set_track+0x21/0x30 [ 66.703974] kasan_set_free_info+0x20/0x30 [ 66.703974] __kasan_slab_free+0x108/0x170 [ 66.703974] kfree+0xb0/0x330 [ 66.703974] nfcmrvl_nci_unregister_dev+0x90/0xd0 [ 66.703974] nci_uart_tty_close+0xdf/0x180 [ 66.703974] tty_ldisc_kill+0x73/0x110 [ 66.703974] tty_ldisc_hangup+0x281/0x5b0 [ 66.703974] __tty_hangup.part.0+0x431/0x890 [ 66.703974] tty_release+0x3a8/0xc80 [ 66.703974] __fput+0x1f0/0x8c0 [ 66.703974] task_work_run+0xc9/0x170 [ 66.703974] exit_to_user_mode_prepare+0x194/0x1a0 [ 66.703974] syscall_exit_to_user_mode+0x19/0x50 [ 66.703974] do_syscall_64+0x48/0x90 [ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae To fix the UAF, this patch adds flush_workqueue() to ensure the nci_cmd_work is finished before the following del_timer_sync. This combination will promise the timer is actually detached. Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13Merge tag 'wireless-2022-04-13' of ↵David S. Miller3-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v5.18 First set of fixes for v5.18. Maintainers file updates, two compilation warning fixes, one revert for ath11k and smaller fixes to drivers and stack. All the usual stuff. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-13Revert "net: dsa: setup master before ports"Vladimir Oltean1-13/+10
This reverts commit 11fd667dac315ea3f2469961f6d2869271a46cae. dsa_slave_change_mtu() updates the MTU of the DSA master and of the associated CPU port, but only if it detects a change to the master MTU. The blamed commit in the Fixes: tag below addressed a regression where dsa_slave_change_mtu() would return early and not do anything due to ds->ops->port_change_mtu() not being implemented. However, that commit also had the effect that the master MTU got set up to the correct value by dsa_master_setup(), but the associated CPU port's MTU did not get updated. This causes breakage for drivers that rely on the ->port_change_mtu() DSA call to account for the tagging overhead on the CPU port, and don't set up the initial MTU during the setup phase. Things actually worked before because they were in a fragile equilibrium where dsa_slave_change_mtu() was called before dsa_master_setup() was. So dsa_slave_change_mtu() could actually detect a change and update the CPU port MTU too. Restore the code to the way things used to work by reverting the reorder of dsa_tree_setup_master() and dsa_tree_setup_ports(). That change did not have a concrete motivation going for it anyway, it just looked better. Fixes: 066dfc429040 ("Revert "net: dsa: stop updating master MTU from master.c"") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski2-5/+4
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix cgroupv2 from the input path, from Florian Westphal. 2) Fix incorrect return value of nft_parse_register(), from Antoine Tenart. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: nft_parse_register can return a negative value netfilter: nft_socket: make cgroup match work in input too ==================== Link: https://lore.kernel.org/r/20220412094246.448055-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12netfilter: nf_tables: nft_parse_register can return a negative valueAntoine Tenart1-1/+1
Since commit 6e1acfa387b9 ("netfilter: nf_tables: validate registers coming from userspace.") nft_parse_register can return a negative value, but the function prototype is still returning an unsigned int. Fixes: 6e1acfa387b9 ("netfilter: nf_tables: validate registers coming from userspace.") Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-12sctp: Initialize daddr on peeled off socketPetr Malat1-1/+1
Function sctp_do_peeloff() wrongly initializes daddr of the original socket instead of the peeled off socket, which makes getpeername() return zeroes instead of the primary address. Initialize the new socket instead. Fixes: d570ee490fb1 ("[SCTP]: Correctly set daddr for IPv6 sockets during peeloff") Signed-off-by: Petr Malat <oss@malat.biz> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/20220409063611.673193-1-oss@malat.biz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12net/smc: Fix af_ops of child socket pointing to released memoryKarsten Graul1-2/+12
Child sockets may inherit the af_ops from the parent listen socket. When the listen socket is released then the af_ops of the child socket points to released memory. Solve that by restoring the original af_ops for child sockets which inherited the parent af_ops. And clear any inherited user_data of the parent socket. Fixes: 8270d9c21041 ("net/smc: Limit backlog connections") Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12net/smc: Fix NULL pointer dereference in smc_pnet_find_ib()Karsten Graul1-2/+3
dev_name() was called with dev.parent as argument but without to NULL-check it before. Solve this by checking the pointer before the call to dev_name(). Fixes: af5f60c7e3d5 ("net/smc: allow PCI IDs as ib device names in the pnet table") Reported-by: syzbot+03e3e228510223dabd34@syzkaller.appspotmail.com Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-12net/smc: use memcpy instead of snprintf to avoid out of bounds readKarsten Graul1-2/+4
Using snprintf() to convert not null-terminated strings to null terminated strings may cause out of bounds read in the source string. Therefore use memcpy() and terminate the target string with a null afterwards. Fixes: fa0866625543 ("net/smc: add support for user defined EIDs") Fixes: 3c572145c24e ("net/smc: add generic netlink support for system EID") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11ipv6: fix panic when forwarding a pkt with no in6 devNicolas Dichtel1-1/+1
kongweibin reported a kernel panic in ip6_forward() when input interface has no in6 dev associated. The following tc commands were used to reproduce this panic: tc qdisc del dev vxlan100 root tc qdisc add dev vxlan100 root netem corrupt 5% CC: stable@vger.kernel.org Fixes: ccd27f05ae7b ("ipv6: fix 'disable_policy' for fwd packets") Reported-by: kongweibin <kongweibin2@huawei.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-11netfilter: nft_socket: make cgroup match work in input tooFlorian Westphal1-4/+3
cgroupv2 helper function ignores the already-looked up sk and uses skb->sk instead. Just pass sk from the calling function instead; this will make cgroup matching work for udp and tcp in input even when edemux did not set skb->sk already. Fixes: e0bb96db96f8 ("netfilter: nft_socket: add support for cgroupsv2") Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Topi Miettinen <toiwoton@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-11mac80211: fix ht_capa printout in debugfsBen Greear1-1/+1
Don't use sizeof(pointer) when calculating scnprintf offset. Fixes: 01f84f0ed3b4 ("mac80211: reduce stack usage in debugfs") Signed-off-by: Ben Greear <greearb@candelatech.com> Link: https://lore.kernel.org/r/20220406175659.20611-1-greearb@candelatech.com [correct the Fixes tag] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-04-11cfg80211: hold bss_lock while updating nontrans_listRameshkumar Sundaram1-0/+2
Synchronize additions to nontrans_list of transmitting BSS with bss_lock to avoid races. Also when cfg80211_add_nontrans_list() fails __cfg80211_unlink_bss() needs bss_lock to be held (has lockdep assert on bss_lock). So protect the whole block with bss_lock to avoid races and warnings. Found during code review. Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning") Signed-off-by: Rameshkumar Sundaram <quic_ramess@quicinc.com> Link: https://lore.kernel.org/r/1649668071-9370-1-git-send-email-quic_ramess@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-04-11nl80211: correctly check NL80211_ATTR_REG_ALPHA2 sizeJohannes Berg1-1/+2
We need this to be at least two bytes, so we can access alpha2[0] and alpha2[1]. It may be three in case some userspace used NUL-termination since it was NLA_STRING (and we also push it out with NUL-termination). Cc: stable@vger.kernel.org Reported-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20220411114201.fd4a31f06541.Ie7ff4be2cf348d8cc28ed0d626fc54becf7ea799@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-04-11net/sched: taprio: Check if socket flags are validBenedikt Spranger1-1/+2
A user may set the SO_TXTIME socket option to ensure a packet is send at a given time. The taprio scheduler has to confirm, that it is allowed to send a packet at that given time, by a check against the packet time schedule. The scheduler drop the packet, if the gates are closed at the given send time. The check, if SO_TXTIME is set, may fail since sk_flags are part of an union and the union is used otherwise. This happen, if a socket is not a full socket, like a request socket for example. Add a check to verify, if the union is used for sk_flags. Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode") Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-09net/sched: fix initialization order when updating chain 0 headMarcelo Ricardo Leitner1-1/+1
Currently, when inserting a new filter that needs to sit at the head of chain 0, it will first update the heads pointer on all devices using the (shared) block, and only then complete the initialization of the new element so that it has a "next" element. This can lead to a situation that the chain 0 head is propagated to another CPU before the "next" initialization is done. When this race condition is triggered, packets being matched on that CPU will simply miss all other filters, and will flow through the stack as if there were no other filters installed. If the system is using OVS + TC, such packets will get handled by vswitchd via upcall, which results in much higher latency and reordering. For other applications it may result in packet drops. This is reproducible with a tc only setup, but it varies from system to system. It could be reproduced with a shared block amongst 10 veth tunnels, and an ingress filter mirroring packets to another veth. That's because using the last added veth tunnel to the shared block to do the actual traffic, it makes the race window bigger and easier to trigger. The fix is rather simple, to just initialize the next pointer of the new filter instance (tp) before propagating the head change. The fixes tag is pointing to the original code though this issue should only be observed when using it unlocked. Fixes: 2190d1d0944f ("net: sched: introduce helpers to work with filter chains") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/b97d5f4eaffeeb9d058155bcab63347527261abf.1649341369.git.marcelo.leitner@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-09sctp: use the correct skb for security_sctp_assoc_requestXin Long1-3/+3
Yi Chen reported an unexpected sctp connection abort, and it occurred when COOKIE_ECHO is bundled with DATA Fragment by SCTP HW GSO. As the IP header is included in chunk->head_skb instead of chunk->skb, it failed to check IP header version in security_sctp_assoc_request(). According to Ondrej, SELinux only looks at IP header (address and IPsec options) and XFRM state data, and these are all included in head_skb for SCTP HW GSO packets. So fix it by using head_skb when calling security_sctp_assoc_request() in processing COOKIE_ECHO. v1->v2: - As Ondrej noticed, chunk->head_skb should also be used for security_sctp_assoc_established() in sctp_sf_do_5_1E_ca(). Fixes: e215dab1c490 ("security: call security_sctp_assoc_request in sctp_sf_do_5_1D_ce") Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/71becb489e51284edf0c11fc15246f4ed4cef5b6.1649337862.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-08flow_dissector: fix false-positive __read_overflow2_field() warningJakub Kicinski1-1/+1
Bounds checking is unhappy that we try to copy both Ethernet addresses but pass pointer to the first one. Luckily destination address is the first field so pass the pointer to the entire header, whatever. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08net/sched: flower: fix parsing of ethertype following VLAN headerVlad Buslov2-5/+14
A tc flower filter matching TCA_FLOWER_KEY_VLAN_ETH_TYPE is expected to match the L2 ethertype following the first VLAN header, as confirmed by linked discussion with the maintainer. However, such rule also matches packets that have additional second VLAN header, even though filter has both eth_type and vlan_ethtype set to "ipv4". Looking at the code this seems to be mostly an artifact of the way flower uses flow dissector. First, even though looking at the uAPI eth_type and vlan_ethtype appear like a distinct fields, in flower they are all mapped to the same key->basic.n_proto. Second, flow dissector skips following VLAN header as no keys for FLOW_DISSECTOR_KEY_CVLAN are set and eventually assigns the value of n_proto to last parsed header. With these, such filters ignore any headers present between first VLAN header and first "non magic" header (ipv4 in this case) that doesn't result FLOW_DISSECT_RET_PROTO_AGAIN. Fix the issue by extending flow dissector VLAN key structure with new 'vlan_eth_type' field that matches first ethertype following previously parsed VLAN header. Modify flower classifier to set the new flow_dissector_key_vlan->vlan_eth_type with value obtained from TCA_FLOWER_KEY_VLAN_ETH_TYPE/TCA_FLOWER_KEY_CVLAN_ETH_TYPE uAPIs. Link: https://lore.kernel.org/all/Yjhgi48BpTGh6dig@nanopsycho/ Fixes: 9399ae9a6cb2 ("net_sched: flower: Add vlan support") Fixes: d64efd0926ba ("net/sched: flower: Add supprt for matching on QinQ vlan headers") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-07Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski1-4/+13
Alexei Starovoitov says: ==================== pull-request: bpf 2022-04-06 We've added 8 non-merge commits during the last 8 day(s) which contain a total of 9 files changed, 139 insertions(+), 36 deletions(-). The main changes are: 1) rethook related fixes, from Jiri and Masami. 2) Fix the case when tracing bpf prog is attached to struct_ops, from Martin. 3) Support dual-stack sockets in bpf_tcp_check_syncookie, from Maxim. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets bpf: Support dual-stack sockets in bpf_tcp_check_syncookie bpf: selftests: Test fentry tracing a struct_ops program bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT rethook: Fix to use WRITE_ONCE() for rethook:: Handler selftests/bpf: Fix warning comparing pointer to 0 bpf: Fix sparse warnings in kprobe_multi_resolve_syms bpftool: Explicit errno handling in skeletons ==================== Link: https://lore.kernel.org/r/20220407031245.73026-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-06bpf: Support dual-stack sockets in bpf_tcp_check_syncookieMaxim Mikityanskiy1-4/+13
bpf_tcp_gen_syncookie looks at the IP version in the IP header and validates the address family of the socket. It supports IPv4 packets in AF_INET6 dual-stack sockets. On the other hand, bpf_tcp_check_syncookie looks only at the address family of the socket, ignoring the real IP version in headers, and validates only the packet size. This implementation has some drawbacks: 1. Packets are not validated properly, allowing a BPF program to trick bpf_tcp_check_syncookie into handling an IPv6 packet on an IPv4 socket. 2. Dual-stack sockets fail the checks on IPv4 packets. IPv4 clients end up receiving a SYNACK with the cookie, but the following ACK gets dropped. This patch fixes these issues by changing the checks in bpf_tcp_check_syncookie to match the ones in bpf_tcp_gen_syncookie. IP version from the header is taken into account, and it is validated properly with address family. Fixes: 399040847084 ("bpf: add helper to check for a valid SYN cookie") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Arthur Fabre <afabre@cloudflare.com> Link: https://lore.kernel.org/bpf/20220406124113.2795730-1-maximmi@nvidia.com
2022-04-06net: ipv6mr: fix unused variable warning with CONFIG_IPV6_PIMSM_V2=nFlorian Westphal1-1/+1
net/ipv6/ip6mr.c:1656:14: warning: unused variable 'do_wrmifwhole' Move it to the CONFIG_IPV6_PIMSM_V2 scope where its used. Fixes: 4b340a5a726d ("net: ip6mr: add support for passing full packet on wrong mif") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06rxrpc: fix a race in rxrpc_exit_net()Eric Dumazet1-1/+1
Current code can lead to the following race: CPU0 CPU1 rxrpc_exit_net() rxrpc_peer_keepalive_worker() if (rxnet->live) rxnet->live = false; del_timer_sync(&rxnet->peer_keepalive_timer); timer_reduce(&rxnet->peer_keepalive_timer, jiffies + delay); cancel_work_sync(&rxnet->peer_keepalive_work); rxrpc_exit_net() exits while peer_keepalive_timer is still armed, leading to use-after-free. syzbot report was: ODEBUG: free active (active state 0) object type: timer_list hint: rxrpc_peer_keepalive_timeout+0x0/0xb0 WARNING: CPU: 0 PID: 3660 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Modules linked in: CPU: 0 PID: 3660 Comm: kworker/u4:6 Not tainted 5.17.0-syzkaller-13993-g88e6c0207623 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: netns cleanup_net RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 00 1c 26 8a 4c 89 ee 48 c7 c7 00 10 26 8a e8 b1 e7 28 05 <0f> 0b 83 05 15 eb c5 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3 RSP: 0018:ffffc9000353fb00 EFLAGS: 00010082 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000 RDX: ffff888029196140 RSI: ffffffff815efad8 RDI: fffff520006a7f52 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815ea4ae R11: 0000000000000000 R12: ffffffff89ce23e0 R13: ffffffff8a2614e0 R14: ffffffff816628c0 R15: dffffc0000000000 FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe1f2908924 CR3: 0000000043720000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __debug_check_no_obj_freed lib/debugobjects.c:992 [inline] debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1023 kfree+0xd6/0x310 mm/slab.c:3809 ops_free_list.part.0+0x119/0x370 net/core/net_namespace.c:176 ops_free_list net/core/net_namespace.c:174 [inline] cleanup_net+0x591/0xb00 net/core/net_namespace.c:598 process_one_work+0x996/0x1610 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e9/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298 </TASK> Fixes: ace45bec6d77 ("rxrpc: Fix firewall route keepalive") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: linux-afs@lists.infradead.org Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06net: openvswitch: fix leak of nested actionsIlya Maximets1-5/+90
While parsing user-provided actions, openvswitch module may dynamically allocate memory and store pointers in the internal copy of the actions. So this memory has to be freed while destroying the actions. Currently there are only two such actions: ct() and set(). However, there are many actions that can hold nested lists of actions and ovs_nla_free_flow_actions() just jumps over them leaking the memory. For example, removal of the flow with the following actions will lead to a leak of the memory allocated by nf_ct_tmpl_alloc(): actions:clone(ct(commit),0) Non-freed set() action may also leak the 'dst' structure for the tunnel info including device references. Under certain conditions with a high rate of flow rotation that may cause significant memory leak problem (2MB per second in reporter's case). The problem is also hard to mitigate, because the user doesn't have direct control over the datapath flows generated by OVS. Fix that by iterating over all the nested actions and freeing everything that needs to be freed recursively. New build time assertion should protect us from this problem if new actions will be added in the future. Unfortunately, openvswitch module doesn't use NLA_F_NESTED, so all attributes has to be explicitly checked. sample() and clone() actions are mixing extra attributes into the user-provided action list. That prevents some code generalization too. Fixes: 34ae932a4036 ("openvswitch: Make tunnel set action attach a metadata dst") Link: https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392922.html Reported-by: Stéphane Graber <stgraber@ubuntu.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06net: openvswitch: don't send internal clone attribute to the userspace.Ilya Maximets2-2/+4
'OVS_CLONE_ATTR_EXEC' is an internal attribute that is used for performance optimization inside the kernel. It's added by the kernel while parsing user-provided actions and should not be sent during the flow dump as it's not part of the uAPI. The issue doesn't cause any significant problems to the ovs-vswitchd process, because reported actions are not really used in the application lifecycle and only supposed to be shown to a human via ovs-dpctl flow dump. However, the action list is still incorrect and causes the following error if the user wants to look at the datapath flows: # ovs-dpctl add-dp system@ovs-system # ovs-dpctl add-flow "<flow match>" "clone(ct(commit),0)" # ovs-dpctl dump-flows <flow match>, packets:0, bytes:0, used:never, actions:clone(bad length 4, expected -1 for: action0(01 00 00 00), ct(commit),0) With the fix: # ovs-dpctl dump-flows <flow match>, packets:0, bytes:0, used:never, actions:clone(ct(commit),0) Additionally fixed an incorrect attribute name in the comment. Fixes: b233504033db ("openvswitch: kernel datapath clone action") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/20220404104150.2865736-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski7-8/+8
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Incorrect comparison in bitmask .reduce, from Jeremy Sowden. 2) Missing GFP_KERNEL_ACCOUNT for dynamically allocated objects, from Vasily Averin. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: memcg accounting for dynamically allocated objects netfilter: bitwise: fix reduce comparisons ==================== Link: https://lore.kernel.org/r/20220405100923.7231-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-05ipv6: Fix stats accounting in ip6_pkt_dropDavid Ahern1-1/+1
VRF devices are the loopbacks for VRFs, and a loopback can not be assigned to a VRF. Accordingly, the condition in ip6_pkt_drop should be '||' not '&&'. Fixes: 1d3fd8a10bed ("vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach") Reported-by: Pudak, Filip <Filip.Pudak@windriver.com> Reported-by: Xiao, Jiguang <Jiguang.Xiao@windriver.com> Signed-off-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220404150908.2937-1-dsahern@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-05netfilter: nf_tables: memcg accounting for dynamically allocated objectsVasily Averin6-6/+6
nft_*.c files whose NFT_EXPR_STATEFUL flag is set on need to use __GFP_ACCOUNT flag for objects that are dynamically allocated from the packet path. Such objects are allocated inside nft_expr_ops->init() callbacks executed in task context while processing netlink messages. In addition, this patch adds accounting to nft_set_elem_expr_clone() used for the same purposes. Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-05sctp: count singleton chunks in assoc user statsJamie Bainbridge1-1/+5
Singleton chunks (INIT, HEARTBEAT PMTU probes, and SHUTDOWN- COMPLETE) are not counted in SCTP_GET_ASOC_STATS "sas_octrlchunks" counter available to the assoc owner. These are all control chunks so they should be counted as such. Add counting of singleton chunks so they are properly accounted for. Fixes: 196d67593439 ("sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call") Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/c9ba8785789880cf07923b8a5051e174442ea9ee.1649029663.git.jamie.bainbridge@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-01net: ipv4: fix route with nexthop object delete warningNikolay Aleksandrov1-1/+6
FRR folks have hit a kernel warning[1] while deleting routes[2] which is caused by trying to delete a route pointing to a nexthop id without specifying nhid but matching on an interface. That is, a route is found but we hit a warning while matching it. The warning is from fib_info_nh() in include/net/nexthop.h because we run it on a fib_info with nexthop object. The call chain is: inet_rtm_delroute -> fib_table_delete -> fib_nh_match (called with a nexthop fib_info and also with fc_oif set thus calling fib_info_nh on the fib_info and triggering the warning). The fix is to not do any matching in that branch if the fi has a nexthop object because those are managed separately. I.e. we should match when deleting without nh spec and should fail when deleting a nexthop route with old-style nh spec because nexthop objects are managed separately, e.g.: $ ip r show 1.2.3.4/32 1.2.3.4 nhid 12 via 192.168.11.2 dev dummy0 $ ip r del 1.2.3.4/32 $ ip r del 1.2.3.4/32 nhid 12 <both should work> $ ip r del 1.2.3.4/32 dev dummy0 <should fail with ESRCH> [1] [ 523.462226] ------------[ cut here ]------------ [ 523.462230] WARNING: CPU: 14 PID: 22893 at include/net/nexthop.h:468 fib_nh_match+0x210/0x460 [ 523.462236] Modules linked in: dummy rpcsec_gss_krb5 xt_socket nf_socket_ipv4 nf_socket_ipv6 ip6table_raw iptable_raw bpf_preload xt_statistic ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_mark nf_tables xt_nat veth nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay dm_crypt nfsv3 nfs fscache netfs vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack 8021q garp mrp ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc rfcomm snd_seq_dummy snd_hrtimer rpcrdma rdma_cm iw_cm ib_cm ib_core ip6table_filter xt_comment ip6_tables vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) qrtr bnep binfmt_misc xfs vfat fat squashfs loop nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE) intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb btrtl iwlmvm uvcvideo btbcm snd_hda_intel edac_mce_amd [ 523.462274] videobuf2_vmalloc videobuf2_memops btintel snd_intel_dspcfg videobuf2_v4l2 snd_intel_sdw_acpi bluetooth snd_usb_audio snd_hda_codec mac80211 snd_usbmidi_lib joydev snd_hda_core videobuf2_common kvm_amd snd_rawmidi snd_hwdep snd_seq videodev ccp snd_seq_device libarc4 ecdh_generic mc snd_pcm kvm iwlwifi snd_timer drm_kms_helper snd cfg80211 cec soundcore irqbypass rapl wmi_bmof i2c_piix4 rfkill k10temp pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc drm zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme sp5100_tco r8169 nvme_core wmi ipmi_devintf ipmi_msghandler fuse [ 523.462300] CPU: 14 PID: 22893 Comm: ip Tainted: P OE 5.16.18-200.fc35.x86_64 #1 [ 523.462302] Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING EDGE WIFI (MS-7C37), BIOS 1.C0 10/29/2020 [ 523.462303] RIP: 0010:fib_nh_match+0x210/0x460 [ 523.462304] Code: 7c 24 20 48 8b b5 90 00 00 00 e8 bb ee f4 ff 48 8b 7c 24 20 41 89 c4 e8 ee eb f4 ff 45 85 e4 0f 85 2e fe ff ff e9 4c ff ff ff <0f> 0b e9 17 ff ff ff 3c 0a 0f 85 61 fe ff ff 48 8b b5 98 00 00 00 [ 523.462306] RSP: 0018:ffffaa53d4d87928 EFLAGS: 00010286 [ 523.462307] RAX: 0000000000000000 RBX: ffffaa53d4d87a90 RCX: ffffaa53d4d87bb0 [ 523.462308] RDX: ffff9e3d2ee6be80 RSI: ffffaa53d4d87a90 RDI: ffffffff920ed380 [ 523.462309] RBP: ffff9e3d2ee6be80 R08: 0000000000000064 R09: 0000000000000000 [ 523.462310] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000031 [ 523.462310] R13: 0000000000000020 R14: 0000000000000000 R15: ffff9e3d331054e0 [ 523.462311] FS: 00007f245517c1c0(0000) GS:ffff9e492ed80000(0000) knlGS:0000000000000000 [ 523.462313] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 523.462313] CR2: 000055e5dfdd8268 CR3: 00000003ef488000 CR4: 0000000000350ee0 [ 523.462315] Call Trace: [ 523.462316] <TASK> [ 523.462320] fib_table_delete+0x1a9/0x310 [ 523.462323] inet_rtm_delroute+0x93/0x110 [ 523.462325] rtnetlink_rcv_msg+0x133/0x370 [ 523.462327] ? _copy_to_iter+0xb5/0x6f0 [ 523.462330] ? rtnl_calcit.isra.0+0x110/0x110 [ 523.462331] netlink_rcv_skb+0x50/0xf0 [ 523.462334] netlink_unicast+0x211/0x330 [ 523.462336] netlink_sendmsg+0x23f/0x480 [ 523.462338] sock_sendmsg+0x5e/0x60 [ 523.462340] ____sys_sendmsg+0x22c/0x270 [ 523.462341] ? import_iovec+0x17/0x20 [ 523.462343] ? sendmsg_copy_msghdr+0x59/0x90 [ 523.462344] ? __mod_lruvec_page_state+0x85/0x110 [ 523.462348] ___sys_sendmsg+0x81/0xc0 [ 523.462350] ? netlink_seq_start+0x70/0x70 [ 523.462352] ? __dentry_kill+0x13a/0x180 [ 523.462354] ? __fput+0xff/0x250 [ 523.462356] __sys_sendmsg+0x49/0x80 [ 523.462358] do_syscall_64+0x3b/0x90 [ 523.462361] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 523.462364] RIP: 0033:0x7f24552aa337 [ 523.462365] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 523.462366] RSP: 002b:00007fff7f05a838 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 523.462368] RAX: ffffffffffffffda RBX: 000000006245bf91 RCX: 00007f24552aa337 [ 523.462368] RDX: 0000000000000000 RSI: 00007fff7f05a8a0 RDI: 0000000000000003 [ 523.462369] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [ 523.462370] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000001 [ 523.462370] R13: 00007fff7f05ce08 R14: 0000000000000000 R15: 000055e5dfdd1040 [ 523.462373] </TASK> [ 523.462374] ---[ end trace ba537bc16f6bf4ed ]--- [2] https://github.com/FRRouting/frr/issues/6412 Fixes: 4c7e8084fd46 ("ipv4: Plumb support for nexthop object in a fib_info") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-01mctp: Use output netdev to allocate skb headroomMatt Johnston2-16/+44
Previously the skb was allocated with headroom MCTP_HEADER_MAXLEN, but that isn't sufficient if we are using devs that are not MCTP specific. This also adds a check that the smctp_halen provided to sendmsg for extended addressing is the correct size for the netdev. Fixes: 833ef3b91de6 ("mctp: Populate socket implementation") Reported-by: Matthew Rinaldi <mjrinal@g.clemson.edu> Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-01mctp: Fix check for dev_hard_header() resultMatt Johnston1-1/+1
dev_hard_header() returns the length of the header, so we need to test for negative errors rather than non-zero. Fixes: 889b7da23abf ("mctp: Add initial routing framework") Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-01Revert "net: dsa: stop updating master MTU from master.c"Vladimir Oltean1-1/+24
This reverts commit a1ff94c2973c43bc1e2677ac63ebb15b1d1ff846. Switch drivers that don't implement ->port_change_mtu() will cause the DSA master to remain with an MTU of 1500, since we've deleted the other code path. In turn, this causes a regression for those systems, where MTU-sized traffic can no longer be terminated. Revert the change taking into account the fact that rtnl_lock() is now taken top-level from the callers of dsa_master_setup() and dsa_master_teardown(). Also add a comment in order for it to be absolutely clear why it is still needed. Fixes: a1ff94c2973c ("net: dsa: stop updating master MTU from master.c") Reported-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-01skbuff: fix coalescing for page_pool fragment recyclingJean-Philippe Brucker1-4/+11
Fix a use-after-free when using page_pool with page fragments. We encountered this problem during normal RX in the hns3 driver: (1) Initially we have three descriptors in the RX queue. The first one allocates PAGE1 through page_pool, and the other two allocate one half of PAGE2 each. Page references look like this: RX_BD1 _______ PAGE1 RX_BD2 _______ PAGE2 RX_BD3 _________/ (2) Handle RX on the first descriptor. Allocate SKB1, eventually added to the receive queue by tcp_queue_rcv(). (3) Handle RX on the second descriptor. Allocate SKB2 and pass it to netif_receive_skb(): netif_receive_skb(SKB2) ip_rcv(SKB2) SKB3 = skb_clone(SKB2) SKB2 and SKB3 share a reference to PAGE2 through skb_shinfo()->dataref. The other ref to PAGE2 is still held by RX_BD3: SKB2 ---+- PAGE2 SKB3 __/ / RX_BD3 _________/ (3b) Now while handling TCP, coalesce SKB3 with SKB1: tcp_v4_rcv(SKB3) tcp_try_coalesce(to=SKB1, from=SKB3) // succeeds kfree_skb_partial(SKB3) skb_release_data(SKB3) // drops one dataref SKB1 _____ PAGE1 \____ SKB2 _____ PAGE2 / RX_BD3 _________/ In skb_try_coalesce(), __skb_frag_ref() takes a page reference to PAGE2, where it should instead have increased the page_pool frag reference, pp_frag_count. Without coalescing, when releasing both SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now when releasing SKB1 and SKB2, two references to PAGE2 will be dropped, resulting in underflow. (3c) Drop SKB2: af_packet_rcv(SKB2) consume_skb(SKB2) skb_release_data(SKB2) // drops second dataref page_pool_return_skb_page(PAGE2) // drops one pp_frag_count SKB1 _____ PAGE1 \____ PAGE2 / RX_BD3 _________/ (4) Userspace calls recvmsg() Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we release the SKB3 page as well: tcp_eat_recv_skb(SKB1) skb_release_data(SKB1) page_pool_return_skb_page(PAGE1) page_pool_return_skb_page(PAGE2) // drops second pp_frag_count (5) PAGE2 is freed, but the third RX descriptor was still using it! In our case this causes IOMMU faults, but it would silently corrupt memory if the IOMMU was disabled. Change the logic that checks whether pp_recycle SKBs can be coalesced. We still reject differing pp_recycle between 'from' and 'to' SKBs, but in order to avoid the situation described above, we also reject coalescing when both 'from' and 'to' are pp_recycled and 'from' is cloned. The new logic allows coalescing a cloned pp_recycle SKB into a page refcounted one, because in this case the release (4) will drop the right reference, the one taken by skb_try_coalesce(). Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool") Suggested-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-01net/tls: fix slab-out-of-bounds bug in decrypt_internalZiyang Xuan1-1/+1
The memory size of tls_ctx->rx.iv for AES128-CCM is 12 setting in tls_set_sw_offload(). The return value of crypto_aead_ivsize() for "ccm(aes)" is 16. So memcpy() require 16 bytes from 12 bytes memory space will trigger slab-out-of-bounds bug as following: ================================================================== BUG: KASAN: slab-out-of-bounds in decrypt_internal+0x385/0xc40 [tls] Read of size 16 at addr ffff888114e84e60 by task tls/10911 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x5e/0x5db ? decrypt_internal+0x385/0xc40 [tls] kasan_report+0xab/0x120 ? decrypt_internal+0x385/0xc40 [tls] kasan_check_range+0xf9/0x1e0 memcpy+0x20/0x60 decrypt_internal+0x385/0xc40 [tls] ? tls_get_rec+0x2e0/0x2e0 [tls] ? process_rx_list+0x1a5/0x420 [tls] ? tls_setup_from_iter.constprop.0+0x2e0/0x2e0 [tls] decrypt_skb_update+0x9d/0x400 [tls] tls_sw_recvmsg+0x3c8/0xb50 [tls] Allocated by task 10911: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 tls_set_sw_offload+0x2eb/0xa20 [tls] tls_setsockopt+0x68c/0x700 [tls] __sys_setsockopt+0xfe/0x1b0 Replace the crypto_aead_ivsize() with prot->iv_size + prot->salt_size when memcpy() iv value in TLS_1_3_VERSION scenario. Fixes: f295b3ae9f59 ("net/tls: Add support of AES128-CCM based ciphers") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-31Merge tag 'net-5.18-rc1' of ↵Linus Torvalds9-27/+68
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull more networking updates from Jakub Kicinski: "Networking fixes and rethook patches. Features: - kprobes: rethook: x86: replace kretprobe trampoline with rethook Current release - regressions: - sfc: avoid null-deref on systems without NUMA awareness in the new queue sizing code Current release - new code bugs: - vxlan: do not feed vxlan_vnifilter_dump_dev with non-vxlan devices - eth: lan966x: fix null-deref on PHY pointer in timestamp ioctl when interface is down Previous releases - always broken: - openvswitch: correct neighbor discovery target mask field in the flow dump - wireguard: ignore v6 endpoints when ipv6 is disabled and fix a leak - rxrpc: fix call timer start racing with call destruction - rxrpc: fix null-deref when security type is rxrpc_no_security - can: fix UAF bugs around echo skbs in multiple drivers Misc: - docs: move netdev-FAQ to the 'process' section of the documentation" * tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits) vxlan: do not feed vxlan_vnifilter_dump_dev with non vxlan devices openvswitch: Add recirc_id to recirc warning rxrpc: fix some null-ptr-deref bugs in server_key.c rxrpc: Fix call timer start racing with call destruction net: hns3: fix software vlan talbe of vlan 0 inconsistent with hardware net: hns3: fix the concurrency between functions reading debugfs docs: netdev: move the netdev-FAQ to the process pages docs: netdev: broaden the new vs old code formatting guidelines docs: netdev: call out the merge window in tag checking docs: netdev: add missing back ticks docs: netdev: make the testing requirement more stringent docs: netdev: add a question about re-posting frequency docs: netdev: rephrase the 'should I update patchwork' question docs: netdev: rephrase the 'Under review' question docs: netdev: shorten the name and mention msgid for patch status docs: netdev: note that RFC postings are allowed any time docs: netdev: turn the net-next closed into a Warning docs: netdev: move the patch marking section up docs: netdev: minor reword docs: netdev: replace references to old archives ...
2022-03-31openvswitch: Add recirc_id to recirc warningStéphane Graber1-2/+2
When hitting the recirculation limit, the kernel would currently log something like this: [ 58.586597] openvswitch: ovs-system: deferred action limit reached, drop recirc action Which isn't all that useful to debug as we only have the interface name to go on but can't track it down to a specific flow. With this change, we now instead get: [ 58.586597] openvswitch: ovs-system: deferred action limit reached, drop recirc action (recirc_id=0x9e) Which can now be correlated with the flow entries from OVS. Suggested-by: Frode Nordahl <frode.nordahl@canonical.com> Signed-off-by: Stéphane Graber <stgraber@ubuntu.com> Tested-by: Stephane Graber <stgraber@ubuntu.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/20220330194244.3476544-1-stgraber@ubuntu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-31Merge tag 'linux-can-fixes-for-5.18-20220331' of ↵Jakub Kicinski1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2022-03-31 The first patch is by Oliver Hartkopp and fixes MSG_PEEK feature in the CAN ISOTP protocol (broken in net-next for v5.18 only). Tom Rix's patch for the mcp251xfd driver fixes the propagation of an error value in case of an error. A patch by me for the m_can driver fixes a use-after-free in the xmit handler for m_can IP cores v3.0.x. Hangyu Hua contributes 3 patches fixing the same double free in the error path of the xmit handler in the ems_usb, usb_8dev and mcba_usb USB CAN driver. Pavel Skripkin contributes a patch for the mcba_usb driver to properly check the endpoint type. The last patch is by me and fixes a mem leak in the gs_usb, which was introduced in net-next for v5.18. * tag 'linux-can-fixes-for-5.18-20220331' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: gs_usb: gs_make_candev(): fix memory leak for devices with extended bit timing configuration can: mcba_usb: properly check endpoint type can: mcba_usb: mcba_usb_start_xmit(): fix double dev_kfree_skb in error path can: usb_8dev: usb_8dev_start_xmit(): fix double dev_kfree_skb() in error path can: ems_usb: ems_usb_start_xmit(): fix double dev_kfree_skb() in error path can: m_can: m_can_tx_handler(): fix use after free of skb can: mcp251xfd: mcp251xfd_register_get_dev_id(): fix return of error value can: isotp: restore accidentally removed MSG_PEEK feature ==================== Link: https://lore.kernel.org/r/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-31rxrpc: fix some null-ptr-deref bugs in server_key.cXiaolong Huang1-2/+5
Some function calls are not implemented in rxrpc_no_security, there are preparse_server_key, free_preparse_server_key and destroy_server_key. When rxrpc security type is rxrpc_no_security, user can easily trigger a null-ptr-deref bug via ioctl. So judgment should be added to prevent it The crash log: user@syzkaller:~$ ./rxrpc_preparse_s [ 37.956878][T15626] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 37.957645][T15626] #PF: supervisor instruction fetch in kernel mode [ 37.958229][T15626] #PF: error_code(0x0010) - not-present page [ 37.958762][T15626] PGD 4aadf067 P4D 4aadf067 PUD 4aade067 PMD 0 [ 37.959321][T15626] Oops: 0010 [#1] PREEMPT SMP [ 37.959739][T15626] CPU: 0 PID: 15626 Comm: rxrpc_preparse_ Not tainted 5.17.0-01442-gb47d5a4f6b8d #43 [ 37.960588][T15626] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 [ 37.961474][T15626] RIP: 0010:0x0 [ 37.961787][T15626] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. [ 37.962480][T15626] RSP: 0018:ffffc9000d9abdc0 EFLAGS: 00010286 [ 37.963018][T15626] RAX: ffffffff84335200 RBX: ffff888012a1ce80 RCX: 0000000000000000 [ 37.963727][T15626] RDX: 0000000000000000 RSI: ffffffff84a736dc RDI: ffffc9000d9abe48 [ 37.964425][T15626] RBP: ffffc9000d9abe48 R08: 0000000000000000 R09: 0000000000000002 [ 37.965118][T15626] R10: 000000000000000a R11: f000000000000000 R12: ffff888013145680 [ 37.965836][T15626] R13: 0000000000000000 R14: ffffffffffffffec R15: ffff8880432aba80 [ 37.966441][T15626] FS: 00007f2177907700(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000 [ 37.966979][T15626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 37.967384][T15626] CR2: ffffffffffffffd6 CR3: 000000004aaf1000 CR4: 00000000000006f0 [ 37.967864][T15626] Call Trace: [ 37.968062][T15626] <TASK> [ 37.968240][T15626] rxrpc_preparse_s+0x59/0x90 [ 37.968541][T15626] key_create_or_update+0x174/0x510 [ 37.968863][T15626] __x64_sys_add_key+0x139/0x1d0 [ 37.969165][T15626] do_syscall_64+0x35/0xb0 [ 37.969451][T15626] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 37.969824][T15626] RIP: 0033:0x43a1f9 Signed-off-by: Xiaolong Huang <butterflyhuangxx@gmail.com> Tested-by: Xiaolong Huang <butterflyhuangxx@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2022-March/005069.html Fixes: 12da59fcab5a ("rxrpc: Hand server key parsing off to the security class") Link: https://lore.kernel.org/r/164865013439.2941502.8966285221215590921.stgit@warthog.procyon.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-31rxrpc: Fix call timer start racing with call destructionDavid Howells3-14/+43
The rxrpc_call struct has a timer used to handle various timed events relating to a call. This timer can get started from the packet input routines that are run in softirq mode with just the RCU read lock held. Unfortunately, because only the RCU read lock is held - and neither ref or other lock is taken - the call can start getting destroyed at the same time a packet comes in addressed to that call. This causes the timer - which was already stopped - to get restarted. Later, the timer dispatch code may then oops if the timer got deallocated first. Fix this by trying to take a ref on the rxrpc_call struct and, if successful, passing that ref along to the timer. If the timer was already running, the ref is discarded. The timer completion routine can then pass the ref along to the call's work item when it queues it. If the timer or work item where already queued/running, the extra ref is discarded. Fixes: a158bdd3247b ("rxrpc: Fix call timeouts") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2022-March/005073.html Link: https://lore.kernel.org/r/164865115696.2943015.11097991776647323586.stgit@warthog.procyon.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-31can: isotp: restore accidentally removed MSG_PEEK featureOliver Hartkopp1-1/+1
In commit 42bf50a1795a ("can: isotp: support MSG_TRUNC flag when reading from socket") a new check for recvmsg flags has been introduced that only checked for the flags that are handled in isotp_recvmsg() itself. This accidentally removed the MSG_PEEK feature flag which is processed later in the call chain in __skb_try_recv_from_queue(). Add MSG_PEEK to the set of valid flags to restore the feature. Fixes: 42bf50a1795a ("can: isotp: support MSG_TRUNC flag when reading from socket") Link: https://github.com/linux-can/can-utils/issues/347#issuecomment-1079554254 Link: https://lore.kernel.org/all/20220328113611.3691-1-socketcan@hartkopp.net Reported-by: Derek Will <derekrobertwill@gmail.com> Suggested-by: Derek Will <derekrobertwill@gmail.com> Tested-by: Derek Will <derekrobertwill@gmail.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-03-30Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski1-2/+6
Alexei Starovoitov says: ==================== pull-request: bpf 2022-03-29 We've added 16 non-merge commits during the last 1 day(s) which contain a total of 24 files changed, 354 insertions(+), 187 deletions(-). The main changes are: 1) x86 specific bits of fprobe/rethook, from Masami and Peter. 2) ice/xsk fixes, from Maciej and Magnus. 3) Various small fixes, from Andrii, Yonghong, Geliang and others. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Fix clang compilation errors ice: xsk: Fix indexing in ice_tx_xsk_pool() ice: xsk: Stop Rx processing when ntc catches ntu ice: xsk: Eliminate unnecessary loop iteration xsk: Do not write NULL in SW ring at allocation failure x86,kprobes: Fix optprobe trampoline to generate complete pt_regs x86,rethook: Fix arch_rethook_trampoline() to generate a complete pt_regs x86,rethook,kprobes: Replace kretprobe with rethook on x86 kprobes: Use rethook for kretprobe if possible bpftool: Fix generated code in codegen_asserts selftests/bpf: fix selftest after random: Urandom_read tracepoint removal bpf: Fix maximum permitted number of arguments check bpf: Sync comments for bpf_get_stack fprobe: Fix sparse warning for acccessing __rcu ftrace_hash fprobe: Fix smatch type mismatch warning bpf/bpftool: Add unprivileged_bpf_disabled check against value of 2 ==================== Link: https://lore.kernel.org/r/20220329234924.39053-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-30Merge tag 'nfs-for-5.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds18-203/+273
Pull NFS client updates from Trond Myklebust: "Highlights include: Features: - Switch NFS to use readahead instead of the obsolete readpages. - Readdir fixes to improve cacheability of large directories when there are multiple readers and writers. - Readdir performance improvements when doing a seekdir() immediately after opening the directory (common when re-exporting NFS). - NFS swap improvements from Neil Brown. - Loosen up memory allocation to permit direct reclaim and write back in cases where there is no danger of deadlocking the writeback code or NFS swap. - Avoid sillyrename when the NFSv4 server claims to support the necessary features to recover the unlinked but open file after reboot. Bugfixes: - Patch from Olga to add a mount option to control NFSv4.1 session trunking discovery, and default it to being off. - Fix a lockup in nfs_do_recoalesce(). - Two fixes for list iterator variables being used when pointing to the list head. - Fix a kernel memory scribble when reading from a non-socket transport in /sys/kernel/sunrpc. - Fix a race where reconnecting to a server could leave the TCP socket stuck forever in the connecting state. - Patch from Neil to fix a shutdown race which can leave the SUNRPC transport timer primed after we free the struct xprt itself. - Patch from Xin Xiong to fix reference count leaks in the NFSv4.2 copy offload. - Sunrpc patch from Olga to avoid resending a task on an offlined transport. Cleanups: - Patches from Dave Wysochanski to clean up the fscache code" * tag 'nfs-for-5.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (91 commits) NFSv4/pNFS: Fix another issue with a list iterator pointing to the head NFS: Don't loop forever in nfs_do_recoalesce() SUNRPC: Don't return error values in sysfs read of closed files SUNRPC: Do not dereference non-socket transports in sysfs NFSv4.1: don't retry BIND_CONN_TO_SESSION on session error SUNRPC don't resend a task on an offlined transport NFS: replace usage of found with dedicated list iterator variable SUNRPC: avoid race between mod_timer() and del_timer_sync() pNFS/files: Ensure pNFS allocation modes are consistent with nfsiod pNFS/flexfiles: Ensure pNFS allocation modes are consistent with nfsiod NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod NFS: Avoid writeback threads getting stuck in mempool_alloc() NFS: nfsiod should not block forever in mempool_alloc() SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent SUNRPC: Fix unx_lookup_cred() allocation NFS: Fix memory allocation in rpc_alloc_task() NFS: Fix memory allocation in rpc_malloc() SUNRPC: Improve accuracy of socket ENOBUFS determination SUNRPC: Replace internal use of SOCKWQ_ASYNC_NOSPACE SUNRPC: Fix socket waits for write buffer space ...
2022-03-30netfilter: bitwise: fix reduce comparisonsJeremy Sowden1-2/+2
The `nft_bitwise_reduce` and `nft_bitwise_fast_reduce` functions should compare the bitwise operation in `expr` with the tracked operation associated with the destination register of `expr`. However, instead of being called on `expr` and `track->regs[priv->dreg].selector`, `nft_expr_priv` is called on `expr` twice, so both reduce functions return true even when the operations differ. Fixes: be5650f8f47e ("netfilter: nft_bitwise: track register operations") Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-29ax25: Fix UAF bugs in ax25 timersDuoming Zhou1-0/+5
There are race conditions that may lead to UAF bugs in ax25_heartbeat_expiry(), ax25_t1timer_expiry(), ax25_t2timer_expiry(), ax25_t3timer_expiry() and ax25_idletimer_expiry(), when we call ax25_release() to deallocate ax25_dev. One of the UAF bugs caused by ax25_release() is shown below: (Thread 1) | (Thread 2) ax25_dev_device_up() //(1) | ... | ax25_kill_by_device() ax25_bind() //(2) | ax25_connect() | ... ax25_std_establish_data_link() | ax25_start_t1timer() | ax25_dev_device_down() //(3) mod_timer(&ax25->t1timer,..) | | ax25_release() (wait a time) | ... | ax25_dev_put(ax25_dev) //(4)FREE ax25_t1timer_expiry() | ax25->ax25_dev->values[..] //USE| ... ... | We increase the refcount of ax25_dev in position (1) and (2), and decrease the refcount of ax25_dev in position (3) and (4). The ax25_dev will be freed in position (4) and be used in ax25_t1timer_expiry(). The fail log is shown below: ============================================================== [ 106.116942] BUG: KASAN: use-after-free in ax25_t1timer_expiry+0x1c/0x60 [ 106.116942] Read of size 8 at addr ffff88800bda9028 by task swapper/0/0 [ 106.116942] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-06123-g0905eec574 [ 106.116942] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-14 [ 106.116942] Call Trace: ... [ 106.116942] ax25_t1timer_expiry+0x1c/0x60 [ 106.116942] call_timer_fn+0x122/0x3d0 [ 106.116942] __run_timers.part.0+0x3f6/0x520 [ 106.116942] run_timer_softirq+0x4f/0xb0 [ 106.116942] __do_softirq+0x1c2/0x651 ... This patch adds del_timer_sync() in ax25_release(), which could ensure that all timers stop before we deallocate ax25_dev. Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-29ax25: fix UAF bug in ax25_send_control()Duoming Zhou1-4/+4
There are UAF bugs in ax25_send_control(), when we call ax25_release() to deallocate ax25_dev. The possible race condition is shown below: (Thread 1) | (Thread 2) ax25_dev_device_up() //(1) | | ax25_kill_by_device() ax25_bind() //(2) | ax25_connect() | ... ax25->state = AX25_STATE_1 | ... | ax25_dev_device_down() //(3) (Thread 3) ax25_release() | ax25_dev_put() //(4) FREE | case AX25_STATE_1: | ax25_send_control() | alloc_skb() //USE | The refcount of ax25_dev increases in position (1) and (2), and decreases in position (3) and (4). The ax25_dev will be freed before dereference sites in ax25_send_control(). The following is part of the report: [ 102.297448] BUG: KASAN: use-after-free in ax25_send_control+0x33/0x210 [ 102.297448] Read of size 8 at addr ffff888009e6e408 by task ax25_close/602 [ 102.297448] Call Trace: [ 102.303751] ax25_send_control+0x33/0x210 [ 102.303751] ax25_release+0x356/0x450 [ 102.305431] __sock_release+0x6d/0x120 [ 102.305431] sock_close+0xf/0x20 [ 102.305431] __fput+0x11f/0x420 [ 102.305431] task_work_run+0x86/0xd0 [ 102.307130] get_signal+0x1075/0x1220 [ 102.308253] arch_do_signal_or_restart+0x1df/0xc00 [ 102.308253] exit_to_user_mode_prepare+0x150/0x1e0 [ 102.308253] syscall_exit_to_user_mode+0x19/0x50 [ 102.308253] do_syscall_64+0x48/0x90 [ 102.308253] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 102.308253] RIP: 0033:0x405ae7 This patch defers the free operation of ax25_dev and net_device after all corresponding dereference sites in ax25_release() to avoid UAF. Fixes: 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-29openvswitch: Fixed nd target mask field in the flow dump.Martin Varghese1-2/+2
IPv6 nd target mask was not getting populated in flow dump. In the function __ovs_nla_put_key the icmp code mask field was checked instead of icmp code key field to classify the flow as neighbour discovery. ufid:bdfbe3e5-60c2-43b0-a5ff-dfcac1c37328, recirc_id(0),dp_hash(0/0), skb_priority(0/0),in_port(ovs-nm1),skb_mark(0/0),ct_state(0/0), ct_zone(0/0),ct_mark(0/0),ct_label(0/0), eth(src=00:00:00:00:00:00/00:00:00:00:00:00, dst=00:00:00:00:00:00/00:00:00:00:00:00), eth_type(0x86dd), ipv6(src=::/::,dst=::/::,label=0/0,proto=58,tclass=0/0,hlimit=0/0,frag=no), icmpv6(type=135,code=0), nd(target=2001::2/::, sll=00:00:00:00:00:00/00:00:00:00:00:00, tll=00:00:00:00:00:00/00:00:00:00:00:00), packets:10, bytes:860, used:0.504s, dp:ovs, actions:ovs-nm2 Fixes: e64457191a25 (openvswitch: Restructure datapath.c and flow.c) Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Link: https://lore.kernel.org/r/20220328054148.3057-1-martinvarghesenokia@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-29xsk: Do not write NULL in SW ring at allocation failureMagnus Karlsson1-2/+6
For the case when xp_alloc_batch() is used but the batched allocation cannot be used, there is a slow path that uses the non-batched xp_alloc(). When it fails to allocate an entry, it returns NULL. The current code wrote this NULL into the entry of the provided results array (pointer to the driver SW ring usually) and returned. This might not be what the driver expects and to make things simpler, just write successfully allocated xdp_buffs into the SW ring,. The driver might have information in there that is still important after an allocation failure. Note that at this point in time, there are no drivers using xp_alloc_batch() that could trigger this slow path. But one might get added. Fixes: 47e4075df300 ("xsk: Batched buffer allocation for the pool") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-2-maciej.fijalkowski@intel.com
2022-03-29Merge tag 'net-5.18-rc0' of ↵Linus Torvalds10-65/+112
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter. Current release - regressions: - llc: only change llc->dev when bind() succeeds, fix null-deref Current release - new code bugs: - smc: fix a memory leak in smc_sysctl_net_exit() - dsa: realtek: make interface drivers depend on OF Previous releases - regressions: - sched: act_ct: fix ref leak when switching zones Previous releases - always broken: - netfilter: egress: report interface as outgoing - vsock/virtio: enable VQs early on probe and finish the setup before using them Misc: - memcg: enable accounting for nft objects" * tag 'net-5.18-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (39 commits) Revert "selftests: net: Add tls config dependency for tls selftests" net/smc: Send out the remaining data in sndbuf before close net: move net_unlink_todo() out of the header net: dsa: bcm_sf2_cfp: fix an incorrect NULL check on list iterator net: bnxt_ptp: fix compilation error selftests: net: Add tls config dependency for tls selftests memcg: enable accounting for nft objects net/sched: act_ct: fix ref leak when switching zones net/smc: fix a memory leak in smc_sysctl_net_exit() selftests: tls: skip cmsg_to_pipe tests with TLS=n octeontx2-af: initialize action variable net: sparx5: switchdev: fix possible NULL pointer dereference net/x25: Fix null-ptr-deref caused by x25_disconnect qlcnic: dcb: default to returning -EOPNOTSUPP net: sparx5: depends on PTP_1588_CLOCK_OPTIONAL net: hns3: fix phy can not link up when autoneg off and reset net: hns3: add NULL pointer check for hns3_set/get_ringparam() net: hns3: add netdev reset check for hns3_set_tunable() net: hns3: clean residual vf config after disable sriov net: hns3: add max order judgement for tx spare buffer ...
2022-03-29net/smc: Send out the remaining data in sndbuf before closeWen Gu1-0/+3
The current autocork algorithms will delay the data transmission in BH context to smc_release_cb() when sock_lock is hold by user. So there is a possibility that when connection is being actively closed (sock_lock is hold by user now), some corked data still remains in sndbuf, waiting to be sent by smc_release_cb(). This will cause: - smc_close_stream_wait(), which is called under the sock_lock, has a high probability of timeout because data transmission is delayed until sock_lock is released. - Unexpected data sends may happen after connction closed and use the rtoken which has been deleted by remote peer through LLC_DELETE_RKEY messages. So this patch will try to send out the remaining corked data in sndbuf before active close process, to ensure data integrity and avoid unexpected data transmission after close. Reported-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Fixes: 6b88af839d20 ("net/smc: don't send in the BH context if sock_owned_by_user") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/1648447836-111521-1-git-send-email-guwen@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>