summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2023-02-11dccp/tcp: Avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions.Kuniyuki Iwashima2-12/+5
Eric Dumazet pointed out [0] that when we call skb_set_owner_r() for ipv6_pinfo.pktoptions, sk_rmem_schedule() has not been called, resulting in a negative sk_forward_alloc. We add a new helper which clones a skb and sets its owner only when sk_rmem_schedule() succeeds. Note that we move skb_set_owner_r() forward in (dccp|tcp)_v6_do_rcv() because tcp_send_synack() can make sk_forward_alloc negative before ipv6_opt_accepted() in the crossed SYN-ACK or self-connect() cases. [0]: https://lore.kernel.org/netdev/CANn89iK9oc20Jdi_41jb9URdF210r7d1Y-+uypbMSbOfY6jqrg@mail.gmail.com/ Fixes: 323fbd0edf3f ("net: dccp: Add handling of IPV6_PKTOPTIONS to dccp_v6_do_rcv()") Fixes: 3df80d9320bc ("[DCCP]: Introduce DCCPv6") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-11net/sched: tcindex: update imperfect hash filters respecting rcuPedro Tammela1-4/+30
The imperfect hash area can be updated while packets are traversing, which will cause a use-after-free when 'tcf_exts_exec()' is called with the destroyed tcf_ext. CPU 0: CPU 1: tcindex_set_parms tcindex_classify tcindex_lookup tcindex_lookup tcf_exts_change tcf_exts_exec [UAF] Stop operating on the shared area directly, by using a local copy, and update the filter with 'rcu_replace_pointer()'. Delete the old filter version only after a rcu grace period elapsed. Fixes: 9b0d4446b569 ("net: sched: avoid atomic swap in tcf_exts_change") Reported-by: valis <sec@valis.email> Suggested-by: valis <sec@valis.email> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Link: https://lore.kernel.org/r/20230209143739.279867-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-11sctp: sctp_sock_filter(): avoid list_entry() on possibly empty listPietro Borrello1-3/+1
Use list_is_first() to check whether tsp->asoc matches the first element of ep->asocs, as the list is not guaranteed to have an entry. Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20230208-sctp-filter-v2-1-6e1f4017f326@diag.uniroma1.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: initialize net->notrefcnt_tracker earlierEric Dumazet1-1/+9
syzbot was able to trigger a warning [1] from net_free() calling ref_tracker_dir_exit(&net->notrefcnt_tracker) while the corresponding ref_tracker_dir_init() has not been done yet. copy_net_ns() can indeed bypass the call to setup_net() in some error conditions. Note: We might factorize/move more code in preinit_net() in the future. [1] INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. CPU: 0 PID: 5817 Comm: syz-executor.3 Not tainted 6.2.0-rc7-next-20230208-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106 assign_lock_key kernel/locking/lockdep.c:982 [inline] register_lock_class+0xdb6/0x1120 kernel/locking/lockdep.c:1295 __lock_acquire+0x10a/0x5df0 kernel/locking/lockdep.c:4951 lock_acquire.part.0+0x11c/0x370 kernel/locking/lockdep.c:5691 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 ref_tracker_dir_exit+0x52/0x600 lib/ref_tracker.c:24 net_free net/core/net_namespace.c:442 [inline] net_free+0x98/0xd0 net/core/net_namespace.c:436 copy_net_ns+0x4f3/0x6b0 net/core/net_namespace.c:493 create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0xc1/0x1f0 kernel/nsproxy.c:228 ksys_unshare+0x449/0x920 kernel/fork.c:3205 __do_sys_unshare kernel/fork.c:3276 [inline] __se_sys_unshare kernel/fork.c:3274 [inline] __x64_sys_unshare+0x31/0x40 kernel/fork.c:3274 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 Fixes: 0cafd77dcd03 ("net: add a refcount tracker for kernel sockets") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230208182123.3821604-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10ipv6: Fix tcp socket connection with DSCP.Guillaume Nault1-0/+1
Take into account the IPV6_TCLASS socket option (DSCP) in tcp_v6_connect(). Otherwise fib6_rule_match() can't properly match the DSCP value, resulting in invalid route lookup. For example: ip route add unreachable table main 2001:db8::10/124 ip route add table 100 2001:db8::10/124 dev eth0 ip -6 rule add dsfield 0x04 table 100 echo test | socat - TCP6:[2001:db8::11]:54321,ipv6-tclass=0x04 Without this patch, socat fails at connect() time ("No route to host") because the fib-rule doesn't jump to table 100 and the lookup ends up being done in the main table. Fixes: 2cc67cc731d9 ("[IPV6] ROUTE: Routing by Traffic Class.") Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10ipv6: Fix datagram socket connection with DSCP.Guillaume Nault1-1/+1
Take into account the IPV6_TCLASS socket option (DSCP) in ip6_datagram_flow_key_init(). Otherwise fib6_rule_match() can't properly match the DSCP value, resulting in invalid route lookup. For example: ip route add unreachable table main 2001:db8::10/124 ip route add table 100 2001:db8::10/124 dev eth0 ip -6 rule add dsfield 0x04 table 100 echo test | socat - UDP6:[2001:db8::11]:54321,ipv6-tclass=0x04 Without this patch, socat fails at connect() time ("No route to host") because the fib-rule doesn't jump to table 100 and the lookup ends up being done in the main table. Fixes: 2cc67cc731d9 ("[IPV6] ROUTE: Routing by Traffic Class.") Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09rds: rds_rm_zerocopy_callback() use list_first_entry()Pietro Borrello1-3/+3
rds_rm_zerocopy_callback() uses list_entry() on the head of a list causing a type confusion. Use list_first_entry() to actually access the first element of the rs_zcookie_queue list. Fixes: 9426bbc6de99 ("rds: use list structure to track information for zerocopy completion notification") Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Link: https://lore.kernel.org/r/20230202-rds-zerocopy-v3-1-83b0df974f9a@diag.uniroma1.it Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-09Merge tag 'ipsec-2023-02-08' of ↵Jakub Kicinski5-20/+73
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== ipsec 2023-02-08 1) Fix policy checks for nested IPsec tunnels when using xfrm interfaces. From Benedict Wong. 2) Fix netlink message expression on 32=>64-bit messages translators. From Anastasia Belova. 3) Prevent potential spectre v1 gadget in xfrm_xlate32_attr. From Eric Dumazet. 4) Always consistently use time64_t in xfrm_timer_handler. From Eric Dumazet. 5) Fix KCSAN reported bug: Multiple cpus can update use_time at the same time. From Eric Dumazet. 6) Fix SCP copy from IPv4 to IPv6 on interfamily tunnel. From Christian Hopps. * tag 'ipsec-2023-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: fix bug with DSCP copy to v6 from v4 tunnel xfrm: annotate data-race around use_time xfrm: consistently use time64_t in xfrm_timer_handler() xfrm/compat: prevent potential spectre v1 gadget in xfrm_xlate32_attr() xfrm: compat: change expression for switch in xfrm_xlate64 Fix XFRM-I support for nested ESP tunnels ==================== Link: https://lore.kernel.org/r/20230208114322.266510-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-08mptcp: be careful on subflow status propagation on errorsPaolo Abeni1-1/+9
Currently the subflow error report callback unconditionally propagates the fallback subflow status to the owning msk. If the msk is already orphaned, the above prevents the code from correctly tracking the msk moving to the TCP_CLOSE state and doing the appropriate cleanup. All the above causes increasing memory usage over time and sporadic self-tests failures. There is a great deal of infrastructure trying to propagate correctly the fallback subflow status to the owning mptcp socket, e.g. via mptcp_subflow_eof() and subflow_sched_work_if_closed(): in the error propagation path we need only to cope with unorphaned sockets. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/339 Fixes: 15cc10453398 ("mptcp: deliver ssk errors to msk") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08mptcp: fix locking for in-kernel listener creationPaolo Abeni2-5/+7
For consistency, in mptcp_pm_nl_create_listen_socket(), we need to call the __mptcp_nmpc_socket() under the msk socket lock. Note that as a side effect, mptcp_subflow_create_socket() needs a 'nested' lockdep annotation, as it will acquire the subflow (kernel) socket lock under the in-kernel listener msk socket lock. The current lack of locking is almost harmless, because the relevant socket is not exposed to the user space, but in future we will add more complexity to the mentioned helper, let's play safe. Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08mptcp: fix locking for setsockopt corner-casePaolo Abeni1-2/+9
We need to call the __mptcp_nmpc_socket(), and later subflow socket access under the msk socket lock, or e.g. a racing connect() could change the socket status under the hood, with unexpected results. Fixes: 54635bd04701 ("mptcp: add TCP_FASTOPEN_CONNECT socket option") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08mptcp: do not wait for bare sockets' timeoutPaolo Abeni1-0/+9
If the peer closes all the existing subflows for a given mptcp socket and later the application closes it, the current implementation let it survive until the timewait timeout expires. While the above is allowed by the protocol specification it consumes resources for almost no reason and additionally causes sporadic self-tests failures. Let's move the mptcp socket to the TCP_CLOSE state when there are no alive subflows at close time, so that the allocated resources will be freed immediately. Fixes: e16163b6e2b7 ("mptcp: refactor shutdown and close") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08txhash: fix sk->sk_txrehash defaultKevin Yang4-4/+4
This code fix a bug that sk->sk_txrehash gets its default enable value from sysctl_txrehash only when the socket is a TCP listener. We should have sysctl_txrehash to set the default sk->sk_txrehash, no matter TCP, nor listerner/connector. Tested by following packetdrill: 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 socket(..., SOCK_DGRAM, IPPROTO_UDP) = 4 // SO_TXREHASH == 74, default to sysctl_txrehash == 1 +0 getsockopt(3, SOL_SOCKET, 74, [1], [4]) = 0 +0 getsockopt(4, SOL_SOCKET, 74, [1], [4]) = 0 Fixes: 26859240e4ee ("txhash: Add socket option to control TX hash rethink behavior") Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08net: sched: sch: Fix off by one in htb_activate_prios()Dan Carpenter1-1/+1
The > needs be >= to prevent an out of bounds access. Fixes: de5ca4c3852f ("net: sched: sch: Bounds check priority") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/Y+D+KN18FQI2DKLq@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-08Merge tag 'linux-can-fixes-for-6.2-20230207' of ↵Jakub Kicinski1-0/+40
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== can 2023-02-07 The patch is from Devid Antonio Filoni and fixes an address claiming problem in the J1939 CAN protocol. * tag 'linux-can-fixes-for-6.2-20230207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: j1939: do not wait 250 ms if the same addr was already claimed ==================== Link: https://lore.kernel.org/r/20230207140514.2885065-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-07can: j1939: do not wait 250 ms if the same addr was already claimedDevid Antonio Filoni1-0/+40
The ISO 11783-5 standard, in "4.5.2 - Address claim requirements", states: d) No CF shall begin, or resume, transmission on the network until 250 ms after it has successfully claimed an address except when responding to a request for address-claimed. But "Figure 6" and "Figure 7" in "4.5.4.2 - Address-claim prioritization" show that the CF begins the transmission after 250 ms from the first AC (address-claimed) message even if it sends another AC message during that time window to resolve the address contention with another CF. As stated in "4.4.2.3 - Address-claimed message": In order to successfully claim an address, the CF sending an address claimed message shall not receive a contending claim from another CF for at least 250 ms. As stated in "4.4.3.2 - NAME management (NM) message": 1) A commanding CF can d) request that a CF with a specified NAME transmit the address- claimed message with its current NAME. 2) A target CF shall d) send an address-claimed message in response to a request for a matching NAME Taking the above arguments into account, the 250 ms wait is requested only during network initialization. Do not restart the timer on AC message if both the NAME and the address match and so if the address has already been claimed (timer has expired) or the AC message has been sent to resolve the contention with another CF (timer is still running). Signed-off-by: Devid Antonio Filoni <devid.filoni@egluetechnologies.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/all/20221125170418.34575-1-devid.filoni@egluetechnologies.com Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-07devlink: change port event netdev notifier from per-net to globalJiri Pirko1-3/+6
Currently only the network namespace of devlink instance is monitored for port events. If netdev is moved to a different namespace and then unregistered, NETDEV_PRE_UNINIT is missed which leads to trigger following WARN_ON in devl_port_unregister(). WARN_ON(devlink_port->type != DEVLINK_PORT_TYPE_NOTSET); Fix this by changing the netdev notifier from per-net to global so no event is missed. Fixes: 02a68a47eade ("net: devlink: track netdev with devlink_port assigned") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20230206094151.2557264-1-jiri@resnulli.us Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-06neigh: make sure used and confirmed times are validJulian Anastasov1-3/+15
Entries can linger in cache without timer for days, thanks to the gc_thresh1 limit. As result, without traffic, the confirmed time can be outdated and to appear to be in the future. Later, on traffic, NUD_STALE entries can switch to NUD_DELAY and start the timer which can see the invalid confirmed time and wrongly switch to NUD_REACHABLE state instead of NUD_PROBE. As result, timer is set many days in the future. This is more visible on 32-bit platforms, with higher HZ value. Why this is a problem? While we expect unused entries to expire, such entries stay in REACHABLE state for too long, locked in cache. They are not expired normally, only when cache is full. Problem and the wrong state change reported by Zhang Changzhong: 172.16.1.18 dev bond0 lladdr 0a:0e:0f:01:12:01 ref 1 used 350521/15994171/350520 probes 4 REACHABLE 350520 seconds have elapsed since this entry was last updated, but it is still in the REACHABLE state (base_reachable_time_ms is 30000), preventing lladdr from being updated through probe. Fix it by ensuring timer is started with valid used/confirmed times. Considering the valid time range is LONG_MAX jiffies, we try not to go too much in the past while we are in DELAY/PROBE state. There are also places that need used/updated times to be validated while timer is not running. Reported-by: Zhang Changzhong <zhangchangzhong@huawei.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Tested-by: Zhang Changzhong <zhangchangzhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-02Merge tag 'linux-can-fixes-for-6.2-20230202' of ↵Jakub Kicinski3-56/+64
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== can 2023-02-02 The first patch is by Ziyang Xuan and removes a errant WARN_ON_ONCE() in the CAN J1939 protocol. The next 3 patches are by Oliver Hartkopp. The first 2 target the CAN ISO-TP protocol and fix the state machine with respect to signals and a regression found by the syzbot. The last patch is by me an missing assignment during the ethtool ring configuration callback. * tag 'linux-can-fixes-for-6.2-20230202' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: mcp251xfd: mcp251xfd_ring_set_ringparam(): assign missing tx_obj_num_coalesce_irq can: isotp: split tx timer into transmission and timeout can: isotp: handle wait_event_interruptible() return values can: raw: fix CAN FD frame transmissions over CAN XL devices can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate ==================== Link: https://lore.kernel.org/r/20230202094135.2293939-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02net: openvswitch: fix flow memory leak in ovs_flow_cmd_newFedor Pchelkin1-6/+6
Syzkaller reports a memory leak of new_flow in ovs_flow_cmd_new() as it is not freed when an allocation of a key fails. BUG: memory leak unreferenced object 0xffff888116668000 (size 632): comm "syz-executor231", pid 1090, jiffies 4294844701 (age 18.871s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000defa3494>] kmem_cache_zalloc include/linux/slab.h:654 [inline] [<00000000defa3494>] ovs_flow_alloc+0x19/0x180 net/openvswitch/flow_table.c:77 [<00000000c67d8873>] ovs_flow_cmd_new+0x1de/0xd40 net/openvswitch/datapath.c:957 [<0000000010a539a8>] genl_family_rcv_msg_doit+0x22d/0x330 net/netlink/genetlink.c:739 [<00000000dff3302d>] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline] [<00000000dff3302d>] genl_rcv_msg+0x328/0x590 net/netlink/genetlink.c:800 [<000000000286dd87>] netlink_rcv_skb+0x153/0x430 net/netlink/af_netlink.c:2515 [<0000000061fed410>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:811 [<000000009dc0f111>] netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline] [<000000009dc0f111>] netlink_unicast+0x545/0x7f0 net/netlink/af_netlink.c:1339 [<000000004a5ee816>] netlink_sendmsg+0x8e7/0xde0 net/netlink/af_netlink.c:1934 [<00000000482b476f>] sock_sendmsg_nosec net/socket.c:651 [inline] [<00000000482b476f>] sock_sendmsg+0x152/0x190 net/socket.c:671 [<00000000698574ba>] ____sys_sendmsg+0x70a/0x870 net/socket.c:2356 [<00000000d28d9e11>] ___sys_sendmsg+0xf3/0x170 net/socket.c:2410 [<0000000083ba9120>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439 [<00000000c00628f8>] do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46 [<000000004abfdcf4>] entry_SYSCALL_64_after_hwframe+0x61/0xc6 To fix this the patch rearranges the goto labels to reflect the order of object allocations and adds appropriate goto statements on the error paths. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 68bb10101e6b ("openvswitch: Fix flow lookup to use unmasked key") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230201210218.361970-1-pchelkin@ispras.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02can: isotp: split tx timer into transmission and timeoutOliver Hartkopp1-36/+29
The timer for the transmission of isotp PDUs formerly had two functions: 1. send two consecutive frames with a given time gap 2. monitor the timeouts for flow control frames and the echo frames This led to larger txstate checks and potentially to a problem discovered by syzbot which enabled the panic_on_warn feature while testing. The former 'txtimer' function is split into 'txfrtimer' and 'txtimer' to handle the two above functionalities with separate timer callbacks. The two simplified timers now run in one-shot mode and make the state transitions (especially with isotp_rcv_echo) better understandable. Fixes: 866337865f37 ("can: isotp: fix tx state handling for echo tx processing") Reported-by: syzbot+5aed6c3aaba661f5b917@syzkaller.appspotmail.com Cc: stable@vger.kernel.org # >= v6.0 Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20230104145701.2422-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-02can: isotp: handle wait_event_interruptible() return valuesOliver Hartkopp1-0/+4
When wait_event_interruptible() has been interrupted by a signal the tx.state value might not be ISOTP_IDLE. Force the state machines into idle state to inhibit the timer handlers to continue working. Fixes: 866337865f37 ("can: isotp: fix tx state handling for echo tx processing") Cc: stable@vger.kernel.org Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20230112192347.1944-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-02can: raw: fix CAN FD frame transmissions over CAN XL devicesOliver Hartkopp1-16/+31
A CAN XL device is always capable to process CAN FD frames. The former check when sending CAN FD frames relied on the existence of a CAN FD device and did not check for a CAN XL device that would be correct too. With this patch the CAN FD feature is enabled automatically when CAN XL is switched on - and CAN FD cannot be switch off while CAN XL is enabled. This precondition also leads to a clean up and reduction of checks in the hot path in raw_rcv() and raw_sendmsg(). Some conditions are reordered to handle simple checks first. changes since v1: https://lore.kernel.org/all/20230131091012.50553-1-socketcan@hartkopp.net - fixed typo: devive -> device changes since v2: https://lore.kernel.org/all/20230131091824.51026-1-socketcan@hartkopp.net/ - reorder checks in if statements to handle simple checks first Fixes: 626332696d75 ("can: raw: add CAN XL support") Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20230131105613.55228-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-02can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivateZiyang Xuan1-4/+0
The conclusion "j1939_session_deactivate() should be called with a session ref-count of at least 2" is incorrect. In some concurrent scenarios, j1939_session_deactivate can be called with the session ref-count less than 2. But there is not any problem because it will check the session active state before session putting in j1939_session_deactivate_locked(). Here is the concurrent scenario of the problem reported by syzbot and my reproduction log. cpu0 cpu1 j1939_xtp_rx_eoma j1939_xtp_rx_abort_one j1939_session_get_by_addr [kref == 2] j1939_session_get_by_addr [kref == 3] j1939_session_deactivate [kref == 2] j1939_session_put [kref == 1] j1939_session_completed j1939_session_deactivate WARN_ON_ONCE(kref < 2) ===================================================== WARNING: CPU: 1 PID: 21 at net/can/j1939/transport.c:1088 j1939_session_deactivate+0x5f/0x70 CPU: 1 PID: 21 Comm: ksoftirqd/1 Not tainted 5.14.0-rc7+ #32 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014 RIP: 0010:j1939_session_deactivate+0x5f/0x70 Call Trace: j1939_session_deactivate_activate_next+0x11/0x28 j1939_xtp_rx_eoma+0x12a/0x180 j1939_tp_recv+0x4a2/0x510 j1939_can_recv+0x226/0x380 can_rcv_filter+0xf8/0x220 can_receive+0x102/0x220 ? process_backlog+0xf0/0x2c0 can_rcv+0x53/0xf0 __netif_receive_skb_one_core+0x67/0x90 ? process_backlog+0x97/0x2c0 __netif_receive_skb+0x22/0x80 Fixes: 0c71437dd50d ("can: j1939: j1939_session_deactivate(): clarify lifetime of session object") Reported-by: syzbot+9981a614060dcee6eeca@syzkaller.appspotmail.com Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/all/20210906094200.95868-1-william.xuanziyang@huawei.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-02ip/ip6_gre: Fix non-point-to-point tunnel not generating IPv6 link local addressThomas Winter1-5/+5
We recently found that our non-point-to-point tunnels were not generating any IPv6 link local address and instead generating an IPv6 compat address, breaking IPv6 communication on the tunnel. Previously, addrconf_gre_config always would call addrconf_addr_gen and generate a EUI64 link local address for the tunnel. Then commit e5dd729460ca changed the code path so that add_v4_addrs is called but this only generates a compat IPv6 address for non-point-to-point tunnels. I assume the compat address is specifically for SIT tunnels so have kept that only for SIT - GRE tunnels now always generate link local addresses. Fixes: e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02ip/ip6_gre: Fix changing addr gen mode not generating IPv6 link local addressThomas Winter1-22/+27
For our point-to-point GRE tunnels, they have IN6_ADDR_GEN_MODE_NONE when they are created then we set IN6_ADDR_GEN_MODE_EUI64 when they come up to generate the IPv6 link local address for the interface. Recently we found that they were no longer generating IPv6 addresses. This issue would also have affected SIT tunnels. Commit e5dd729460ca changed the code path so that GRE tunnels generate an IPv6 address based on the tunnel source address. It also changed the code path so GRE tunnels don't call addrconf_addr_gen in addrconf_dev_config which is called by addrconf_sysctl_addr_gen_mode when the IN6_ADDR_GEN_MODE is changed. This patch aims to fix this issue by moving the code in addrconf_notify which calls the addr gen for GRE and SIT into a separate function and calling it in the places that expect the IPv6 address to be generated. The previous addrconf_dev_config is renamed to addrconf_eth_config since it only expected eth type interfaces and follows the addrconf_gre/sit_config format. A part of this changes means that the loopback address will be attempted to be configured when changing addr_gen_mode for lo. This should not be a problem because the address should exist anyway and if does already exist then no error is produced. Fixes: e5dd729460ca ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address") Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski2-2/+4
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Release bridge info once packet escapes the br_netfilter path, from Florian Westphal. 2) Revert incorrect fix for the SCTP connection tracking chunk iterator, also from Florian. First path fixes a long standing issue, the second path addresses a mistake in the previous pull request for net. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: Revert "netfilter: conntrack: fix bug in for_each_sctp_chunk" netfilter: br_netfilter: disable sabotage_in hook after first suppression ==================== Link: https://lore.kernel.org/r/20230131133158.4052-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01net: fix NULL pointer in skb_segment_listYan Zhai1-3/+2
Commit 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") introduced UDP listifyed GRO. The segmentation relies on frag_list being untouched when passing through the network stack. This assumption can be broken sometimes, where frag_list itself gets pulled into linear area, leaving frag_list being NULL. When this happens it can trigger following NULL pointer dereference, and panic the kernel. Reverse the test condition should fix it. [19185.577801][ C1] BUG: kernel NULL pointer dereference, address: ... [19185.663775][ C1] RIP: 0010:skb_segment_list+0x1cc/0x390 ... [19185.834644][ C1] Call Trace: [19185.841730][ C1] <TASK> [19185.848563][ C1] __udp_gso_segment+0x33e/0x510 [19185.857370][ C1] inet_gso_segment+0x15b/0x3e0 [19185.866059][ C1] skb_mac_gso_segment+0x97/0x110 [19185.874939][ C1] __skb_gso_segment+0xb2/0x160 [19185.883646][ C1] udp_queue_rcv_skb+0xc3/0x1d0 [19185.892319][ C1] udp_unicast_rcv_skb+0x75/0x90 [19185.900979][ C1] ip_protocol_deliver_rcu+0xd2/0x200 [19185.910003][ C1] ip_local_deliver_finish+0x44/0x60 [19185.918757][ C1] __netif_receive_skb_one_core+0x8b/0xa0 [19185.927834][ C1] process_backlog+0x88/0x130 [19185.935840][ C1] __napi_poll+0x27/0x150 [19185.943447][ C1] net_rx_action+0x27e/0x5f0 [19185.951331][ C1] ? mlx5_cq_tasklet_cb+0x70/0x160 [mlx5_core] [19185.960848][ C1] __do_softirq+0xbc/0x25d [19185.968607][ C1] irq_exit_rcu+0x83/0xb0 [19185.976247][ C1] common_interrupt+0x43/0xa0 [19185.984235][ C1] asm_common_interrupt+0x22/0x40 ... [19186.094106][ C1] </TASK> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/Y9gt5EUizK1UImEP@debian Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01sctp: do not check hb_timer.expires when resetting hb_timerXin Long1-3/+1
It tries to avoid the frequently hb_timer refresh in commit ba6f5e33bdbb ("sctp: avoid refreshing heartbeat timer too often"), and it only allows mod_timer when the new expires is after hb_timer.expires. It means even a much shorter interval for hb timer gets applied, it will have to wait until the current hb timer to time out. In sctp_do_8_2_transport_strike(), when a transport enters PF state, it expects to update the hb timer to resend a heartbeat every rto after calling sctp_transport_reset_hb_timer(), which will not work as the change mentioned above. The frequently hb_timer refresh was caused by sctp_transport_reset_timers() called in sctp_outq_flush() and it was already removed in the commit above. So we don't have to check hb_timer.expires when resetting hb_timer as it is now not called very often. Fixes: ba6f5e33bdbb ("sctp: avoid refreshing heartbeat timer too often") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/d958c06985713ec84049a2d5664879802710179a.1675095933.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-31Revert "netfilter: conntrack: fix bug in for_each_sctp_chunk"Florian Westphal1-2/+3
There is no bug. If sch->length == 0, this would result in an infinite loop, but first caller, do_basic_checks(), errors out in this case. After this change, packets with bogus zero-length chunks are no longer detected as invalid, so revert & add comment wrt. 0 length check. Fixes: 98ee00774525 ("netfilter: conntrack: fix bug in for_each_sctp_chunk") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-31netfilter: br_netfilter: disable sabotage_in hook after first suppressionFlorian Westphal1-0/+1
When using a xfrm interface in a bridged setup (the outgoing device is bridged), the incoming packets in the xfrm interface are only tracked in the outgoing direction. $ brctl show bridge name interfaces br_eth1 eth1 $ conntrack -L tcp 115 SYN_SENT src=192... dst=192... [UNREPLIED] ... If br_netfilter is enabled, the first (encrypted) packet is received onR eth1, conntrack hooks are called from br_netfilter emulation which allocates nf_bridge info for this skb. If the packet is for local machine, skb gets passed up the ip stack. The skb passes through ip prerouting a second time. br_netfilter ip_sabotage_in supresses the re-invocation of the hooks. After this, skb gets decrypted in xfrm layer and appears in network stack a second time (after decryption). Then, ip_sabotage_in is called again and suppresses netfilter hook invocation, even though the bridge layer never called them for the plaintext incarnation of the packet. Free the bridge info after the first suppression to avoid this. I was unable to figure out where the regression comes from, as far as i can see br_netfilter always had this problem; i did not expect that skb is looped again with different headers. Fixes: c4b0e771f906 ("netfilter: avoid using skb->nf_bridge directly") Reported-and-tested-by: Wolfgang Nothdurft <wolfgang@linogate.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-31net: sched: sch: Bounds check priorityKees Cook1-1/+4
Nothing was explicitly bounds checking the priority index used to access clpriop[]. WARN and bail out early if it's pathological. Seen with GCC 13: ../net/sched/sch_htb.c: In function 'htb_activate_prios': ../net/sched/sch_htb.c:437:44: warning: array subscript [0, 31] is outside array bounds of 'struct htb_prio[8]' [-Warray-bounds=] 437 | if (p->inner.clprio[prio].feed.rb_node) | ~~~~~~~~~~~~~~~^~~~~~ ../net/sched/sch_htb.c:131:41: note: while referencing 'clprio' 131 | struct htb_prio clprio[TC_HTB_NUMPRIO]; | ^~~~~~ Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20230127224036.never.561-kees@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-31Merge tag 'ieee802154-for-net-2023-01-30' of ↵Jakub Kicinski1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan Stefan Schmidt says: ==================== ieee802154 for net 2023-01-30 Only one fix this time around. Miquel Raynal fixed a potential double free spotted by Dan Carpenter. * tag 'ieee802154-for-net-2023-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan: mac802154: Fix possible double free upon parsing error ==================== Link: https://lore.kernel.org/r/20230130095646.301448-1-stefan@datenfreihafen.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-31net/tls: tls_is_tx_ready() checked list_entryPietro Borrello1-1/+1
tls_is_tx_ready() checks that list_first_entry() does not return NULL. This condition can never happen. For empty lists, list_first_entry() returns the list_entry() of the head, which is a type confusion. Use list_first_entry_or_null() which returns NULL in case of empty lists. Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it> Link: https://lore.kernel.org/r/20230128-list-entry-null-check-tls-v1-1-525bbfe6f0d0@diag.uniroma1.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-30xfrm: fix bug with DSCP copy to v6 from v4 tunnelChristian Hopps1-2/+1
When copying the DSCP bits for decap-dscp into IPv6 don't assume the outer encap is always IPv6. Instead, as with the inner IPv4 case, copy the DSCP bits from the correctly saved "tos" value in the control block. Fixes: 227620e29509 ("[IPSEC]: Separate inner/outer mode processing on input") Signed-off-by: Christian Hopps <chopps@chopps.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-01-30netrom: Fix use-after-free caused by accept on already connected socketHyunwoo Kim1-0/+5
If you call listen() and accept() on an already connect()ed AF_NETROM socket, accept() can successfully connect. This is because when the peer socket sends data to sendmsg, the skb with its own sk stored in the connected socket's sk->sk_receive_queue is connected, and nr_accept() dequeues the skb waiting in the sk->sk_receive_queue. As a result, nr_accept() allocates and returns a sock with the sk of the parent AF_NETROM socket. And here use-after-free can happen through complex race conditions: ``` cpu0 cpu1 1. socket_2 = socket(AF_NETROM) . . listen(socket_2) accepted_socket = accept(socket_2) 2. socket_1 = socket(AF_NETROM) nr_create() // sk refcount : 1 connect(socket_1) 3. write(accepted_socket) nr_sendmsg() nr_output() nr_kick() nr_send_iframe() nr_transmit_buffer() nr_route_frame() nr_loopback_queue() nr_loopback_timer() nr_rx_frame() nr_process_rx_frame(sk, skb); // sk : socket_1's sk nr_state3_machine() nr_queue_rx_frame() sock_queue_rcv_skb() sock_queue_rcv_skb_reason() __sock_queue_rcv_skb() __skb_queue_tail(list, skb); // list : socket_1's sk->sk_receive_queue 4. listen(socket_1) nr_listen() uaf_socket = accept(socket_1) nr_accept() skb_dequeue(&sk->sk_receive_queue); 5. close(accepted_socket) nr_release() nr_write_internal(sk, NR_DISCREQ) nr_transmit_buffer() // NR_DISCREQ nr_route_frame() nr_loopback_queue() nr_loopback_timer() nr_rx_frame() // sk : socket_1's sk nr_process_rx_frame() // NR_STATE_3 nr_state3_machine() // NR_DISCREQ nr_disconnect() nr_sk(sk)->state = NR_STATE_0; 6. close(socket_1) // sk refcount : 3 nr_release() // NR_STATE_0 sock_put(sk); // sk refcount : 0 sk_free(sk); close(uaf_socket) nr_release() sock_hold(sk); // UAF ``` KASAN report by syzbot: ``` BUG: KASAN: use-after-free in nr_release+0x66/0x460 net/netrom/af_netrom.c:520 Write of size 4 at addr ffff8880235d8080 by task syz-executor564/5128 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:306 [inline] print_report+0x15e/0x461 mm/kasan/report.c:417 kasan_report+0xbf/0x1f0 mm/kasan/report.c:517 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x141/0x190 mm/kasan/generic.c:189 instrument_atomic_read_write include/linux/instrumented.h:102 [inline] atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:116 [inline] __refcount_add include/linux/refcount.h:193 [inline] __refcount_inc include/linux/refcount.h:250 [inline] refcount_inc include/linux/refcount.h:267 [inline] sock_hold include/net/sock.h:775 [inline] nr_release+0x66/0x460 net/netrom/af_netrom.c:520 __sock_release+0xcd/0x280 net/socket.c:650 sock_close+0x1c/0x20 net/socket.c:1365 __fput+0x27c/0xa90 fs/file_table.c:320 task_work_run+0x16f/0x270 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xaa8/0x2950 kernel/exit.c:867 do_group_exit+0xd4/0x2a0 kernel/exit.c:1012 get_signal+0x21c3/0x2450 kernel/signal.c:2859 arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306 exit_to_user_mode_loop kernel/entry/common.c:168 [inline] exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296 do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f6c19e3c9b9 Code: Unable to access opcode bytes at 0x7f6c19e3c98f. RSP: 002b:00007fffd4ba2ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: 0000000000000116 RBX: 0000000000000003 RCX: 00007f6c19e3c9b9 RDX: 0000000000000318 RSI: 00000000200bd000 RDI: 0000000000000006 RBP: 0000000000000003 R08: 000000000000000d R09: 000000000000000d R10: 0000000000000000 R11: 0000000000000246 R12: 000055555566a2c0 R13: 0000000000000011 R14: 0000000000000000 R15: 0000000000000000 </TASK> Allocated by task 5128: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:371 [inline] ____kasan_kmalloc mm/kasan/common.c:330 [inline] __kasan_kmalloc+0xa3/0xb0 mm/kasan/common.c:380 kasan_kmalloc include/linux/kasan.h:211 [inline] __do_kmalloc_node mm/slab_common.c:968 [inline] __kmalloc+0x5a/0xd0 mm/slab_common.c:981 kmalloc include/linux/slab.h:584 [inline] sk_prot_alloc+0x140/0x290 net/core/sock.c:2038 sk_alloc+0x3a/0x7a0 net/core/sock.c:2091 nr_create+0xb6/0x5f0 net/netrom/af_netrom.c:433 __sock_create+0x359/0x790 net/socket.c:1515 sock_create net/socket.c:1566 [inline] __sys_socket_create net/socket.c:1603 [inline] __sys_socket_create net/socket.c:1588 [inline] __sys_socket+0x133/0x250 net/socket.c:1636 __do_sys_socket net/socket.c:1649 [inline] __se_sys_socket net/socket.c:1647 [inline] __x64_sys_socket+0x73/0xb0 net/socket.c:1647 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 5128: kasan_save_stack+0x22/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:518 ____kasan_slab_free mm/kasan/common.c:236 [inline] ____kasan_slab_free+0x13b/0x1a0 mm/kasan/common.c:200 kasan_slab_free include/linux/kasan.h:177 [inline] __cache_free mm/slab.c:3394 [inline] __do_kmem_cache_free mm/slab.c:3580 [inline] __kmem_cache_free+0xcd/0x3b0 mm/slab.c:3587 sk_prot_free net/core/sock.c:2074 [inline] __sk_destruct+0x5df/0x750 net/core/sock.c:2166 sk_destruct net/core/sock.c:2181 [inline] __sk_free+0x175/0x460 net/core/sock.c:2192 sk_free+0x7c/0xa0 net/core/sock.c:2203 sock_put include/net/sock.h:1991 [inline] nr_release+0x39e/0x460 net/netrom/af_netrom.c:554 __sock_release+0xcd/0x280 net/socket.c:650 sock_close+0x1c/0x20 net/socket.c:1365 __fput+0x27c/0xa90 fs/file_table.c:320 task_work_run+0x16f/0x270 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xaa8/0x2950 kernel/exit.c:867 do_group_exit+0xd4/0x2a0 kernel/exit.c:1012 get_signal+0x21c3/0x2450 kernel/signal.c:2859 arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306 exit_to_user_mode_loop kernel/entry/common.c:168 [inline] exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296 do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd ``` To fix this issue, nr_listen() returns -EINVAL for sockets that successfully nr_connect(). Reported-by: syzbot+caa188bdfc1eeafeb418@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-28net: mctp: purge receive queues on sk destructionJeremy Kerr1-0/+6
We may have pending skbs in the receive queue when the sk is being destroyed; add a destructor to purge the queue. MCTP doesn't use the error queue, so only the receive_queue is purged. Fixes: 833ef3b91de6 ("mctp: Populate socket implementation") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20230126064551.464468-1-jk@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-28net: qrtr: free memory on error path in radix_tree_insert()Natalia Petrova1-1/+4
Function radix_tree_insert() returns errors if the node hasn't been initialized and added to the tree. "kfree(node)" and return value "NULL" of node_get() help to avoid using unclear node in other calls. Found by Linux Verification Center (linuxtesting.org) with SVACE. Cc: <stable@vger.kernel.org> # 5.7 Fixes: 0c2204a4ad71 ("net: qrtr: Migrate nameservice to kernel from userspace") Signed-off-by: Natalia Petrova <n.petrova@fintech.ru> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://lore.kernel.org/r/20230125134831.8090-1-n.petrova@fintech.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-28net/rose: Fix to not accept on connected socketHyunwoo Kim1-0/+8
If you call listen() and accept() on an already connect()ed rose socket, accept() can successfully connect. This is because when the peer socket sends data to sendmsg, the skb with its own sk stored in the connected socket's sk->sk_receive_queue is connected, and rose_accept() dequeues the skb waiting in the sk->sk_receive_queue. This creates a child socket with the sk of the parent rose socket, which can cause confusion. Fix rose_listen() to return -EINVAL if the socket has already been successfully connected, and add lock_sock to prevent this issue. Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230125105944.GA133314@ubuntu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-28Merge tag 'for-netdev' of ↵Jakub Kicinski2-29/+36
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== bpf 2023-01-27 We've added 10 non-merge commits during the last 9 day(s) which contain a total of 10 files changed, 170 insertions(+), 59 deletions(-). The main changes are: 1) Fix preservation of register's parent/live fields when copying range-info, from Eduard Zingerman. 2) Fix an off-by-one bug in bpf_mem_cache_idx() to select the right cache, from Hou Tao. 3) Fix stack overflow from infinite recursion in sock_map_close(), from Jakub Sitnicki. 4) Fix missing btf_put() in register_btf_id_dtor_kfuncs()'s error path, from Jiri Olsa. 5) Fix a splat from bpf_setsockopt() via lsm_cgroup/socket_sock_rcv_skb, from Kui-Feng Lee. 6) Fix bpf_send_signal[_thread]() helpers to hold a reference on the task, from Yonghong Song. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix the kernel crash caused by bpf_setsockopt(). selftests/bpf: Cover listener cloning with progs attached to sockmap selftests/bpf: Pass BPF skeleton to sockmap_listen ops tests bpf, sockmap: Check for any of tcp_bpf_prots when cloning a listener bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself bpf: Add missing btf_put to register_btf_id_dtor_kfuncs selftests/bpf: Verify copy_register_state() preserves parent/live fields bpf: Fix to preserve reg parent/live fields when copying range info bpf: Fix a possible task gone issue with bpf_send_signal[_thread]() helpers bpf: Fix off-by-one error in bpf_mem_cache_idx() ==================== Link: https://lore.kernel.org/r/20230127215820.4993-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-28skb: Do mix page pool and page referenced frags in GROAlexander Duyck1-0/+9
GSO should not merge page pool recycled frames with standard reference counted frames. Traditionally this didn't occur, at least not often. However as we start looking at adding support for wireless adapters there becomes the potential to mix the two due to A-MSDU repartitioning frames in the receive path. There are possibly other places where this may have occurred however I suspect they must be few and far between as we have not seen this issue until now. Fixes: 53e0961da1c7 ("page_pool: add frag page recycling support in page pool") Reported-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/167475990764.1934330.11960904198087757911.stgit@localhost.localdomain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-27xfrm: annotate data-race around use_timeEric Dumazet2-9/+12
KCSAN reported multiple cpus can update use_time at the same time. Adds READ_ONCE()/WRITE_ONCE() annotations. Note that 32bit arches are not fully protected, but they will probably no longer be supported/used in 2106. BUG: KCSAN: data-race in __xfrm_policy_check / __xfrm_policy_check write to 0xffff88813e7ec108 of 8 bytes by interrupt on cpu 0: __xfrm_policy_check+0x6ae/0x17f0 net/xfrm/xfrm_policy.c:3664 __xfrm_policy_check2 include/net/xfrm.h:1174 [inline] xfrm_policy_check include/net/xfrm.h:1179 [inline] xfrm6_policy_check+0x2e9/0x320 include/net/xfrm.h:1189 udpv6_queue_rcv_one_skb+0x48/0xa30 net/ipv6/udp.c:703 udpv6_queue_rcv_skb+0x2d6/0x310 net/ipv6/udp.c:792 udp6_unicast_rcv_skb+0x16b/0x190 net/ipv6/udp.c:935 __udp6_lib_rcv+0x84b/0x9b0 net/ipv6/udp.c:1020 udpv6_rcv+0x4b/0x50 net/ipv6/udp.c:1133 ip6_protocol_deliver_rcu+0x99e/0x1020 net/ipv6/ip6_input.c:439 ip6_input_finish net/ipv6/ip6_input.c:484 [inline] NF_HOOK include/linux/netfilter.h:302 [inline] ip6_input+0xca/0x180 net/ipv6/ip6_input.c:493 dst_input include/net/dst.h:454 [inline] ip6_rcv_finish+0x1e9/0x2d0 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:302 [inline] ipv6_rcv+0x85/0x140 net/ipv6/ip6_input.c:309 __netif_receive_skb_one_core net/core/dev.c:5482 [inline] __netif_receive_skb+0x8b/0x1b0 net/core/dev.c:5596 process_backlog+0x23f/0x3b0 net/core/dev.c:5924 __napi_poll+0x65/0x390 net/core/dev.c:6485 napi_poll net/core/dev.c:6552 [inline] net_rx_action+0x37e/0x730 net/core/dev.c:6663 __do_softirq+0xf2/0x2c7 kernel/softirq.c:571 do_softirq+0xb1/0xf0 kernel/softirq.c:472 __local_bh_enable_ip+0x6f/0x80 kernel/softirq.c:396 __raw_read_unlock_bh include/linux/rwlock_api_smp.h:257 [inline] _raw_read_unlock_bh+0x17/0x20 kernel/locking/spinlock.c:284 wg_socket_send_skb_to_peer+0x107/0x120 drivers/net/wireguard/socket.c:184 wg_packet_create_data_done drivers/net/wireguard/send.c:251 [inline] wg_packet_tx_worker+0x142/0x360 drivers/net/wireguard/send.c:276 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289 worker_thread+0x618/0xa70 kernel/workqueue.c:2436 kthread+0x1a9/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 write to 0xffff88813e7ec108 of 8 bytes by interrupt on cpu 1: __xfrm_policy_check+0x6ae/0x17f0 net/xfrm/xfrm_policy.c:3664 __xfrm_policy_check2 include/net/xfrm.h:1174 [inline] xfrm_policy_check include/net/xfrm.h:1179 [inline] xfrm6_policy_check+0x2e9/0x320 include/net/xfrm.h:1189 udpv6_queue_rcv_one_skb+0x48/0xa30 net/ipv6/udp.c:703 udpv6_queue_rcv_skb+0x2d6/0x310 net/ipv6/udp.c:792 udp6_unicast_rcv_skb+0x16b/0x190 net/ipv6/udp.c:935 __udp6_lib_rcv+0x84b/0x9b0 net/ipv6/udp.c:1020 udpv6_rcv+0x4b/0x50 net/ipv6/udp.c:1133 ip6_protocol_deliver_rcu+0x99e/0x1020 net/ipv6/ip6_input.c:439 ip6_input_finish net/ipv6/ip6_input.c:484 [inline] NF_HOOK include/linux/netfilter.h:302 [inline] ip6_input+0xca/0x180 net/ipv6/ip6_input.c:493 dst_input include/net/dst.h:454 [inline] ip6_rcv_finish+0x1e9/0x2d0 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:302 [inline] ipv6_rcv+0x85/0x140 net/ipv6/ip6_input.c:309 __netif_receive_skb_one_core net/core/dev.c:5482 [inline] __netif_receive_skb+0x8b/0x1b0 net/core/dev.c:5596 process_backlog+0x23f/0x3b0 net/core/dev.c:5924 __napi_poll+0x65/0x390 net/core/dev.c:6485 napi_poll net/core/dev.c:6552 [inline] net_rx_action+0x37e/0x730 net/core/dev.c:6663 __do_softirq+0xf2/0x2c7 kernel/softirq.c:571 do_softirq+0xb1/0xf0 kernel/softirq.c:472 __local_bh_enable_ip+0x6f/0x80 kernel/softirq.c:396 __raw_read_unlock_bh include/linux/rwlock_api_smp.h:257 [inline] _raw_read_unlock_bh+0x17/0x20 kernel/locking/spinlock.c:284 wg_socket_send_skb_to_peer+0x107/0x120 drivers/net/wireguard/socket.c:184 wg_packet_create_data_done drivers/net/wireguard/send.c:251 [inline] wg_packet_tx_worker+0x142/0x360 drivers/net/wireguard/send.c:276 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289 worker_thread+0x618/0xa70 kernel/workqueue.c:2436 kthread+0x1a9/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 value changed: 0x0000000063c62d6f -> 0x0000000063c62d70 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 4185 Comm: kworker/1:2 Tainted: G W 6.2.0-rc4-syzkaller-00009-gd532dd102151-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Workqueue: wg-crypt-wg0 wg_packet_tx_worker Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-01-27xfrm: consistently use time64_t in xfrm_timer_handler()Eric Dumazet1-4/+4
For some reason, blamed commit did the right thing in xfrm_policy_timer() but did not in xfrm_timer_handler() Fixes: 386c5680e2e8 ("xfrm: use time64_t for in-kernel timestamps") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-01-25net: mctp: mark socks as dead on unhash, prevent re-addJeremy Kerr2-0/+7
Once a socket has been unhashed, we want to prevent it from being re-used in a sk_key entry as part of a routing operation. This change marks the sk as SOCK_DEAD on unhash, which prevents addition into the net's key list. We need to do this during the key add path, rather than key lookup, as we release the net keys_lock between those operations. Fixes: 4a992bbd3650 ("mctp: Implement message fragmentation & reassembly") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net: mctp: hold key reference when looking up a general keyPaolo Abeni1-7/+7
Currently, we have a race where we look up a sock through a "general" (ie, not directly associated with the (src,dest,tag) tuple) key, then drop the key reference while still holding the key's sock. This change expands the key reference until we've finished using the sock, and hence the sock reference too. Commit message changes from Jeremy Kerr <jk@codeconstruct.com.au>. Reported-by: Noam Rathaus <noamr@ssd-disclosure.com> Fixes: 73c618456dc5 ("mctp: locking, lifetime and validity changes for sk_keys") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net: mctp: move expiry timer delete to unhashJeremy Kerr1-3/+6
Currently, we delete the key expiry timer (in sk->close) before unhashing the sk. This means that another thread may find the sk through its presence on the key list, and re-queue the timer. This change moves the timer deletion to the unhash, after we have made the key no longer observable, so the timer cannot be re-queued. Fixes: 7b14e15ae6f4 ("mctp: Implement a timeout for tags") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net: mctp: add an explicit reference from a mctp_sk_key to sockJeremy Kerr1-6/+8
Currently, we correlate the mctp_sk_key lifetime to the sock lifetime through the sock hash/unhash operations, but this is pretty tenuous, and there are cases where we may have a temporary reference to an unhashed sk. This change makes the reference more explicit, by adding a hold on the sock when it's associated with a mctp_sk_key, released on final key unref. Fixes: 73c618456dc5 ("mctp: locking, lifetime and validity changes for sk_keys") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net/x25: Fix to not accept on connected socketHyunwoo Kim1-0/+6
When listen() and accept() are called on an x25 socket that connect() succeeds, accept() succeeds immediately. This is because x25_connect() queues the skb to sk->sk_receive_queue, and x25_accept() dequeues it. This creates a child socket with the sk of the parent x25 socket, which can cause confusion. Fix x25_listen() to return -EINVAL if the socket has already been successfully connect()ed to avoid this issue. Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25bpf, sockmap: Check for any of tcp_bpf_prots when cloning a listenerJakub Sitnicki1-2/+2
A listening socket linked to a sockmap has its sk_prot overridden. It points to one of the struct proto variants in tcp_bpf_prots. The variant depends on the socket's family and which sockmap programs are attached. A child socket cloned from a TCP listener initially inherits their sk_prot. But before cloning is finished, we restore the child's proto to the listener's original non-tcp_bpf_prots one. This happens in tcp_create_openreq_child -> tcp_bpf_clone. Today, in tcp_bpf_clone we detect if the child's proto should be restored by checking only for the TCP_BPF_BASE proto variant. This is not correct. The sk_prot of listening socket linked to a sockmap can point to to any variant in tcp_bpf_prots. If the listeners sk_prot happens to be not the TCP_BPF_BASE variant, then the child socket unintentionally is left if the inherited sk_prot by tcp_bpf_clone. This leads to issues like infinite recursion on close [1], because the child state is otherwise not set up for use with tcp_bpf_prot operations. Adjust the check in tcp_bpf_clone to detect all of tcp_bpf_prots variants. Note that it wouldn't be sufficient to check the socket state when overriding the sk_prot in tcp_bpf_update_proto in order to always use the TCP_BPF_BASE variant for listening sockets. Since commit b8b8315e39ff ("bpf, sockmap: Remove unhash handler for BPF sockmap usage") it is possible for a socket to transition to TCP_LISTEN state while already linked to a sockmap, e.g. connect() -> insert into map -> connect(AF_UNSPEC) -> listen(). [1]: https://lore.kernel.org/all/00000000000073b14905ef2e7401@google.com/ Fixes: e80251555f0b ("tcp_bpf: Don't let child socket inherit parent protocol ops on copy") Reported-by: syzbot+04c21ed96d861dccc5cd@syzkaller.appspotmail.com Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20230113-sockmap-fix-v2-2-1e0ee7ac2f90@cloudflare.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itselfJakub Sitnicki1-27/+34
sock_map proto callbacks should never call themselves by design. Protect against bugs like [1] and break out of the recursive loop to avoid a stack overflow in favor of a resource leak. [1] https://lore.kernel.org/all/00000000000073b14905ef2e7401@google.com/ Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20230113-sockmap-fix-v2-1-1e0ee7ac2f90@cloudflare.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>