summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-05-09net: hns3: release PTP resources if pf initialization failedPeiyang Wang1-1/+3
During the PF initialization process, hclge_update_port_info may return an error code for some reason. At this point, the ptp initialization has been completed. To void memory leaks, the resources that are applied by ptp should be released. Therefore, when hclge_update_port_info returns an error code, hclge_ptp_uninit is called to release the corresponding resources. Fixes: eaf83ae59e18 ("net: hns3: add querying fec ability from firmware") Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-09net: hns3: change type of numa_node_mask as nodemask_tPeiyang Wang5-8/+11
It provides nodemask_t to describe the numa node mask in kernel. To improve transportability, change the type of numa_node_mask as nodemask_t. Fixes: 38caee9d3ee8 ("net: hns3: Add support of the HNAE3 framework") Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-09net: hns3: direct return when receive a unknown mailbox messageJian Shen1-3/+4
Currently, the driver didn't return when receive a unknown mailbox message, and continue checking whether need to generate a response. It's unnecessary and may be incorrect. Fixes: bb5790b71bad ("net: hns3: refactor mailbox response scheme between PF and VF") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-09net: hns3: using user configure after hardware resetPeiyang Wang2-6/+12
When a reset occurring, it's supposed to recover user's configuration. Currently, the port info(speed, duplex and autoneg) is stored in hclge_mac and will be scheduled updated. Consider the case that reset was happened consecutively. During the first reset, the port info is configured with a temporary value cause the PHY is reset and looking for best link config. Second reset start and use pervious configuration which is not the user's. The specific process is as follows: +------+ +----+ +----+ | USER | | PF | | HW | +---+--+ +-+--+ +-+--+ | ethtool --reset | | +------------------->| reset command | | ethtool --reset +-------------------->| +------------------->| +---+ | +---+ | | | | |reset currently | | HW RESET | | |and wait to do | | | |<--+ | | | | send pervious cfg |<--+ | | (1000M FULL AN_ON) | | +-------------------->| | | read cfg(time task) | | | (10M HALF AN_OFF) +---+ | |<--------------------+ | cfg take effect | | reset command |<--+ | +-------------------->| | | +---+ | | send pervious cfg | | HW RESET | | (10M HALF AN_OFF) |<--+ | +-------------------->| | | read cfg(time task) | | | (10M HALF AN_OFF) +---+ | |<--------------------+ | cfg take effect | | | | | | read cfg(time task) |<--+ | | (10M HALF AN_OFF) | | |<--------------------+ | | | v v v To avoid aboved situation, this patch introduced req_speed, req_duplex, req_autoneg to store user's configuration and it only be used after hardware reset and to recover user's configuration Fixes: f5f2b3e4dcc0 ("net: hns3: add support for imp-controlled PHYs") Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-09net/smc: fix neighbour and rtable leak in smc_ib_find_route()Wen Gu1-7/+12
In smc_ib_find_route(), the neighbour found by neigh_lookup() and rtable resolved by ip_route_output_flow() are not released or put before return. It may cause the refcount leak, so fix it. Link: https://lore.kernel.org/r/20240506015439.108739-1-guwen@linux.alibaba.com Fixes: e5c4744cfb59 ("net/smc: add SMC-Rv2 connection establishment") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Link: https://lore.kernel.org/r/20240507125331.2808-1-guwen@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-09ipv6: prevent NULL dereference in ip6_output()Eric Dumazet1-1/+1
According to syzbot, there is a chance that ip6_dst_idev() returns NULL in ip6_output(). Most places in IPv6 stack deal with a NULL idev just fine, but not here. syzbot reported: general protection fault, probably for non-canonical address 0xdffffc00000000bc: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x00000000000005e0-0x00000000000005e7] CPU: 0 PID: 9775 Comm: syz-executor.4 Not tainted 6.9.0-rc5-syzkaller-00157-g6a30653b604a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 RIP: 0010:ip6_output+0x231/0x3f0 net/ipv6/ip6_output.c:237 Code: 3c 1e 00 49 89 df 74 08 4c 89 ef e8 19 58 db f7 48 8b 44 24 20 49 89 45 00 49 89 c5 48 8d 9d e0 05 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 38 84 c0 4c 8b 74 24 28 0f 85 61 01 00 00 8b 1b 31 ff RSP: 0018:ffffc9000927f0d8 EFLAGS: 00010202 RAX: 00000000000000bc RBX: 00000000000005e0 RCX: 0000000000040000 RDX: ffffc900131f9000 RSI: 0000000000004f47 RDI: 0000000000004f48 RBP: 0000000000000000 R08: ffffffff8a1f0b9a R09: 1ffffffff1f51fad R10: dffffc0000000000 R11: fffffbfff1f51fae R12: ffff8880293ec8c0 R13: ffff88805d7fc000 R14: 1ffff1100527d91a R15: dffffc0000000000 FS: 00007f135c6856c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000080 CR3: 0000000064096000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> NF_HOOK include/linux/netfilter.h:314 [inline] ip6_xmit+0xefe/0x17f0 net/ipv6/ip6_output.c:358 sctp_v6_xmit+0x9f2/0x13f0 net/sctp/ipv6.c:248 sctp_packet_transmit+0x26ad/0x2ca0 net/sctp/output.c:653 sctp_packet_singleton+0x22c/0x320 net/sctp/outqueue.c:783 sctp_outq_flush_ctrl net/sctp/outqueue.c:914 [inline] sctp_outq_flush+0x6d5/0x3e20 net/sctp/outqueue.c:1212 sctp_side_effects net/sctp/sm_sideeffect.c:1198 [inline] sctp_do_sm+0x59cc/0x60c0 net/sctp/sm_sideeffect.c:1169 sctp_primitive_ASSOCIATE+0x95/0xc0 net/sctp/primitive.c:73 __sctp_connect+0x9cd/0xe30 net/sctp/socket.c:1234 sctp_connect net/sctp/socket.c:4819 [inline] sctp_inet_connect+0x149/0x1f0 net/sctp/socket.c:4834 __sys_connect_file net/socket.c:2048 [inline] __sys_connect+0x2df/0x310 net/socket.c:2065 __do_sys_connect net/socket.c:2075 [inline] __se_sys_connect net/socket.c:2072 [inline] __x64_sys_connect+0x7a/0x90 net/socket.c:2072 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 778d80be5269 ("ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific interface.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/20240507161842.773961-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-09hsr: Simplify code for announcing HSR nodes timer setupLukasz Majewski1-15/+12
Up till now the code to start HSR announce timer, which triggers sending supervisory frames, was assuming that hsr_netdev_notify() would be called at least twice for hsrX interface. This was required to have different values for old and current values of network device's operstate. This is problematic for a case where hsrX interface is already in the operational state when hsr_netdev_notify() is called, so timer is not configured to trigger and as a result the hsrX is not sending supervisory frames to HSR ring. This error has been discovered when hsr_ping.sh script was run. To be more specific - for the hsr1 and hsr2 the hsr_netdev_notify() was called at least twice with different IF_OPER_{LOWERDOWN|DOWN|UP} states assigned in hsr_check_carrier_and_operstate(hsr). As a result there was no issue with sending supervisory frames. However, with hsr3, the notify function was called only once with operstate set to IF_OPER_UP and timer responsible for triggering supervisory frames was not fired. The solution is to use netif_oper_up() and netif_running() helper functions to assess if network hsrX device is up. Only then, when the timer is not already pending, it is started. Otherwise it is deactivated. Fixes: f421436a591d ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240507111214.3519800-1-lukma@denx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-09ipv6: fib6_rules: avoid possible NULL dereference in fib6_rule_action()Eric Dumazet1-1/+5
syzbot is able to trigger the following crash [1], caused by unsafe ip6_dst_idev() use. Indeed ip6_dst_idev() can return NULL, and must always be checked. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 PID: 31648 Comm: syz-executor.0 Not tainted 6.9.0-rc4-next-20240417-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 RIP: 0010:__fib6_rule_action net/ipv6/fib6_rules.c:237 [inline] RIP: 0010:fib6_rule_action+0x241/0x7b0 net/ipv6/fib6_rules.c:267 Code: 02 00 00 49 8d 9f d8 00 00 00 48 89 d8 48 c1 e8 03 42 80 3c 20 00 74 08 48 89 df e8 f9 32 bf f7 48 8b 1b 48 89 d8 48 c1 e8 03 <42> 80 3c 20 00 74 08 48 89 df e8 e0 32 bf f7 4c 8b 03 48 89 ef 4c RSP: 0018:ffffc9000fc1f2f0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 1a772f98c8186700 RDX: 0000000000000003 RSI: ffffffff8bcac4e0 RDI: ffffffff8c1f9760 RBP: ffff8880673fb980 R08: ffffffff8fac15ef R09: 1ffffffff1f582bd R10: dffffc0000000000 R11: fffffbfff1f582be R12: dffffc0000000000 R13: 0000000000000080 R14: ffff888076509000 R15: ffff88807a029a00 FS: 00007f55e82ca6c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b31d23000 CR3: 0000000022b66000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> fib_rules_lookup+0x62c/0xdb0 net/core/fib_rules.c:317 fib6_rule_lookup+0x1fd/0x790 net/ipv6/fib6_rules.c:108 ip6_route_output_flags_noref net/ipv6/route.c:2637 [inline] ip6_route_output_flags+0x38e/0x610 net/ipv6/route.c:2649 ip6_route_output include/net/ip6_route.h:93 [inline] ip6_dst_lookup_tail+0x189/0x11a0 net/ipv6/ip6_output.c:1120 ip6_dst_lookup_flow+0xb9/0x180 net/ipv6/ip6_output.c:1250 sctp_v6_get_dst+0x792/0x1e20 net/sctp/ipv6.c:326 sctp_transport_route+0x12c/0x2e0 net/sctp/transport.c:455 sctp_assoc_add_peer+0x614/0x15c0 net/sctp/associola.c:662 sctp_connect_new_asoc+0x31d/0x6c0 net/sctp/socket.c:1099 __sctp_connect+0x66d/0xe30 net/sctp/socket.c:1197 sctp_connect net/sctp/socket.c:4819 [inline] sctp_inet_connect+0x149/0x1f0 net/sctp/socket.c:4834 __sys_connect_file net/socket.c:2048 [inline] __sys_connect+0x2df/0x310 net/socket.c:2065 __do_sys_connect net/socket.c:2075 [inline] __se_sys_connect net/socket.c:2072 [inline] __x64_sys_connect+0x7a/0x90 net/socket.c:2072 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 5e5f3f0f8013 ("[IPV6] ADDRCONF: Convert ipv6_get_saddr() to ipv6_dev_get_saddr().") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240507163145.835254-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-09dt-bindings: net: mediatek: remove wrongly added clocks and SerDesDaniel Golle1-20/+2
Several clocks as well as both sgmiisys phandles were added by mistake to the Ethernet bindings for MT7988. Also, the total number of clocks didn't match with the actual number of items listed. This happened because the vendor driver which served as a reference uses a high number of syscon phandles to access various parts of the SoC which wasn't acceptable upstream. Hence several parts which have never previously been supported (such SerDes PHY and USXGMII PCS) are going to be implemented by separate drivers. As a result the device tree will look much more sane. Quickly align the bindings with the upcoming reality of the drivers actually adding support for the remaining Ethernet-related features of the MT7988 SoC. Fixes: c94a9aabec36 ("dt-bindings: net: mediatek,net: add mt7988-eth binding") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/1569290b21cc787a424469ed74456a7e976b102d.1715084326.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08Merge branch 'rxrpc-miscellaneous-fixes'Jakub Kicinski3-21/+37
David Howells says: ==================== rxrpc: Miscellaneous fixes (part) Here some miscellaneous fixes for AF_RXRPC: (1) Fix the congestion control algorithm to start cwnd at 4 and to not cut ssthresh when the peer cuts its rwind size. (2) Only transmit a single ACK for all the DATA packets glued together into a jumbo packet to reduce the number of ACKs being generated. ==================== Link: https://lore.kernel.org/r/20240503150749.1001323-1-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08rxrpc: Only transmit one ACK per jumbo packet receivedDavid Howells1-11/+35
Only generate one ACK packet for all the subpackets in a jumbo packet. If we would like to generate more than one ACK, we prioritise them base on their reason code, in the order, highest first: OutOfSeq > NoSpace > ExceedsWin > Duplicate > Requested > Delay > Idle For the first four, we reference the lowest offending subpacket; for the last three, the highest. This reduces the number of ACKs we end up transmitting to one per UDP packet transmitted to reduce network loading and packet parsing. Fixes: 5d7edbc9231e ("rxrpc: Get rid of the Rx ring") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Reviewed-by: Jeffrey Altman <jaltman@auristor.com <mailto:jaltman@auristor.com>> Link: https://lore.kernel.org/r/20240503150749.1001323-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08rxrpc: Fix congestion control algorithmDavid Howells3-10/+2
Make the following fixes to the congestion control algorithm: (1) Don't vary the cwnd starting value by the size of RXRPC_TX_SMSS since that's currently held constant - set to the size of a jumbo subpacket payload so that we can create jumbo packets on the fly. The current code invariably picks 3 as the starting value. Further, the starting cwnd needs to be an even number because we ack every other packet, so set it to 4. (2) Don't cut ssthresh when we see an ACK come from the peer with a receive window (rwind) less than ssthresh. ssthresh keeps track of characteristics of the connection whereas rwind may be reduced by the peer for any reason - and may be reduced to 0. Fixes: 1fc4fa2ac93d ("rxrpc: Fix congestion management") Fixes: 0851115090a3 ("rxrpc: Reduce ssthresh to peer's receive window") Signed-off-by: David Howells <dhowells@redhat.com> Suggested-by: Simon Wilkinson <sxw@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Reviewed-by: Jeffrey Altman <jaltman@auristor.com <mailto:jaltman@auristor.com>> Link: https://lore.kernel.org/r/20240503150749.1001323-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08selftests: test_bridge_neigh_suppress.sh: Fix failures due to duplicate MACIdo Schimmel1-11/+3
When creating the topology for the test, three veth pairs are created in the initial network namespace before being moved to one of the network namespaces created by the test. On systems where systemd-udev uses MACAddressPolicy=persistent (default since systemd version 242), this will result in some net devices having the same MAC address since they were created with the same name in the initial network namespace. In turn, this leads to arping / ndisc6 failing since packets are dropped by the bridge's loopback filter. Fix by creating each net device in the correct network namespace instead of moving it there from the initial network namespace. Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20240426074015.251854d4@kernel.org/ Fixes: 7648ac72dcd7 ("selftests: net: Add bridge neighbor suppression test") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20240507113033.1732534-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08ipv6: Fix potential uninit-value access in __ip6_make_skb()Shigeru Yoshida1-1/+1
As it was done in commit fc1092f51567 ("ipv4: Fix uninit-value access in __ip_make_skb()") for IPv4, check FLOWI_FLAG_KNOWN_NH on fl6->flowi6_flags instead of testing HDRINCL on the socket to avoid a race condition which causes uninit-value access. Fixes: ea30388baebc ("ipv6: Fix an uninit variable access bug in __ip6_make_skb()") Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08net: phy: marvell-88q2xxx: add support for Rev B1 and B2Gregor Herburger1-16/+103
Different revisions of the Marvell 88q2xxx phy needs different init sequences. Add init sequence for Rev B1 and Rev B2. Rev B2 init sequence skips one register write. Tested-by: Dimitri Fedrau <dima.fedrau@gmail.com> Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08appletalk: Improve handling of broadcast packetsVincent Duvert1-3/+16
When a broadcast AppleTalk packet is received, prefer queuing it on the socket whose address matches the address of the interface that received the packet (and is listening on the correct port). Userspace applications that handle such packets will usually send a response on the same socket that received the packet; this fix allows the response to be sent on the correct interface. If a socket matching the interface's address is not found, an arbitrary socket listening on the correct port will be used, if any. This matches the implementation's previous behavior. Fixes atalkd's responses to network information requests when multiple network interfaces are configured to use AppleTalk. Link: https://lore.kernel.org/netdev/20200722113752.1218-2-vincent.ldev@duvert.net/ Link: https://gist.github.com/VinDuv/4db433b6dce39d51a5b7847ee749b2a4 Signed-off-by: Vincent Duvert <vincent.ldev@duvert.net> Signed-off-by: Doug Brown <doug@schmorgal.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08net: bridge: fix corrupted ethernet header on multicast-to-unicastFelix Fietkau1-2/+7
The change from skb_copy to pskb_copy unfortunately changed the data copying to omit the ethernet header, since it was pulled before reaching this point. Fix this by calling __skb_push/pull around pskb_copy. Fixes: 59c878cbcdd8 ("net: bridge: fix multicast-to-unicast with fraglist GSO") Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08mptcp: only allow set existing scheduler for net.mptcp.schedulerGregory Detal1-1/+38
The current behavior is to accept any strings as inputs, this results in an inconsistent result where an unexisting scheduler can be set: # sysctl -w net.mptcp.scheduler=notdefault net.mptcp.scheduler = notdefault This patch changes this behavior by checking for existing scheduler before accepting the input. Fixes: e3b2870b6d22 ("mptcp: add a new sysctl scheduler") Cc: stable@vger.kernel.org Signed-off-by: Gregory Detal <gregory.detal@gmail.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Tested-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240506-upstream-net-20240506-mptcp-sched-exist-v1-1-2ed1529e521e@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08nfc: nci: Fix kcov check in nci_rx_work()Tetsuo Handa1-0/+1
Commit 7e8cdc97148c ("nfc: Add KCOV annotations") added kcov_remote_start_common()/kcov_remote_stop() pair into nci_rx_work(), with an assumption that kcov_remote_stop() is called upon continue of the for loop. But commit d24b03535e5e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet") forgot to call kcov_remote_stop() before break of the for loop. Reported-by: syzbot <syzbot+0438378d6f157baae1a2@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=0438378d6f157baae1a2 Fixes: d24b03535e5e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet") Suggested-by: Andrey Konovalov <andreyknvl@gmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/6d10f829-5a0c-405a-b39a-d7266f3a1a0b@I-love.SAKURA.ne.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07netlink: specs: Add missing bridge linkinfo attrsDonald Hunter1-0/+6
Attributes for FDB learned entries were added to the if_link netlink api for bridge linkinfo but are missing from the rt_link.yaml spec. Add the missing attributes to the spec. Fixes: ddd1ad68826d ("net: bridge: Add netlink knobs for number / max learned FDB entries") Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240503164304.87427-1-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07phonet: fix rtm_phonet_notify() skb allocationEric Dumazet1-1/+1
fill_route() stores three components in the skb: - struct rtmsg - RTA_DST (u8) - RTA_OIF (u32) Therefore, rtm_phonet_notify() should use NLMSG_ALIGN(sizeof(struct rtmsg)) + nla_total_size(1) + nla_total_size(4) Fixes: f062f41d0657 ("Phonet: routing table Netlink interface") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Rémi Denis-Courmont <courmisch@gmail.com> Link: https://lore.kernel.org/r/20240502161700.1804476-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-06net: fix out-of-bounds access in ops_initThadeu Lima de Souza Cascardo1-3/+10
net_alloc_generic is called by net_alloc, which is called without any locking. It reads max_gen_ptrs, which is changed under pernet_ops_rwsem. It is read twice, first to allocate an array, then to set s.len, which is later used to limit the bounds of the array access. It is possible that the array is allocated and another thread is registering a new pernet ops, increments max_gen_ptrs, which is then used to set s.len with a larger than allocated length for the variable array. Fix it by reading max_gen_ptrs only once in net_alloc_generic. If max_gen_ptrs is later incremented, it will be caught in net_assign_generic. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Fixes: 073862ba5d24 ("netns: fix net_alloc_generic()") Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240502132006.3430840-1-cascardo@igalia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-04MAINTAINERS: update cxgb4 and cxgb3 network drivers maintainerPotnuri Bharat Teja1-3/+3
Add myself(Bharat) as maintainer for cxgb4 and cxgb3 network drivers. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Link: https://lore.kernel.org/r/20240502184209.2723379-1-bharat@chelsio.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-04rtnetlink: Correct nested IFLA_VF_VLAN_LIST attribute validationRoded Zats1-1/+1
Each attribute inside a nested IFLA_VF_VLAN_LIST is assumed to be a struct ifla_vf_vlan_info so the size of such attribute needs to be at least of sizeof(struct ifla_vf_vlan_info) which is 14 bytes. The current size validation in do_setvfinfo is against NLA_HDRLEN (4 bytes) which is less than sizeof(struct ifla_vf_vlan_info) so this validation is not enough and a too small attribute might be cast to a struct ifla_vf_vlan_info, this might result in an out of bands read access when accessing the saved (casted) entry in ivvl. Fixes: 79aab093a0b5 ("net: Update API for VF vlan protocol 802.1ad support") Signed-off-by: Roded Zats <rzats@paloaltonetworks.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240502155751.75705-1-rzats@paloaltonetworks.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-04Merge tag 'ipsec-2024-05-02' of ↵Jakub Kicinski7-3/+39
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2024-05-02 1) Fix an error pointer dereference in xfrm_in_fwd_icmp. From Antony Antony. 2) Preserve vlan tags for ESP transport mode software GRO. From Paul Davey. 3) Fix a spelling mistake in an uapi xfrm.h comment. From Anotny Antony. * tag 'ipsec-2024-05-02' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: Correct spelling mistake in xfrm.h comment xfrm: Preserve vlan tags for transport mode software GRO xfrm: fix possible derferencing in error path ==================== Link: https://lore.kernel.org/r/20240502084838.2269355-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-04Merge tag 'for-net-2024-05-03' of ↵Jakub Kicinski9-36/+119
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - mediatek: mt8183-pico6: Fix bluetooth node - sco: Fix use-after-free bugs caused by sco_sock_timeout - l2cap: fix null-ptr-deref in l2cap_chan_timeout - qca: Various fixes - l2cap: Fix slab-use-after-free in l2cap_connect() - msft: fix slab-use-after-free in msft_do_close() - HCI: Fix potential null-ptr-deref * tag 'for-net-2024-05-03' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: qca: fix firmware check error path Bluetooth: l2cap: fix null-ptr-deref in l2cap_chan_timeout Bluetooth: HCI: Fix potential null-ptr-deref arm64: dts: mediatek: mt8183-pico6: Fix bluetooth node Bluetooth: qca: fix info leak when fetching board id Bluetooth: qca: fix info leak when fetching fw build id Bluetooth: qca: generalise device address check Bluetooth: qca: fix NVM configuration parsing Bluetooth: qca: add missing firmware sanity checks Bluetooth: msft: fix slab-use-after-free in msft_do_close() Bluetooth: L2CAP: Fix slab-use-after-free in l2cap_connect() Bluetooth: qca: fix wcn3991 device address check Bluetooth: Fix use-after-free bugs caused by sco_sock_timeout ==================== Link: https://lore.kernel.org/r/20240503171933.3851244-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-04net: ks8851: Queue RX packets in IRQ handler instead of disabling BHsMarek Vasut1-6/+10
Currently the driver uses local_bh_disable()/local_bh_enable() in its IRQ handler to avoid triggering net_rx_action() softirq on exit from netif_rx(). The net_rx_action() could trigger this driver .start_xmit callback, which is protected by the same lock as the IRQ handler, so calling the .start_xmit from netif_rx() from the IRQ handler critical section protected by the lock could lead to an attempt to claim the already claimed lock, and a hang. The local_bh_disable()/local_bh_enable() approach works only in case the IRQ handler is protected by a spinlock, but does not work if the IRQ handler is protected by mutex, i.e. this works for KS8851 with Parallel bus interface, but not for KS8851 with SPI bus interface. Remove the BH manipulation and instead of calling netif_rx() inside the IRQ handler code protected by the lock, queue all the received SKBs in the IRQ handler into a queue first, and once the IRQ handler exits the critical section protected by the lock, dequeue all the queued SKBs and push them all into netif_rx(). At this point, it is safe to trigger the net_rx_action() softirq, since the netif_rx() call is outside of the lock that protects the IRQ handler. Fixes: be0384bf599c ("net: ks8851: Handle softirqs at the end of IRQ thread to fix hang") Tested-by: Ronald Wahl <ronald.wahl@raritan.com> # KS8851 SPI Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240502183436.117117-1-marex@denx.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-03Bluetooth: qca: fix firmware check error pathJohan Hovold1-1/+1
A recent commit fixed the code that parses the firmware files before downloading them to the controller but introduced a memory leak in case the sanity checks ever fail. Make sure to free the firmware buffer before returning on errors. Fixes: f905ae0be4b7 ("Bluetooth: qca: add missing firmware sanity checks") Cc: stable@vger.kernel.org # 4.19 Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-03Bluetooth: l2cap: fix null-ptr-deref in l2cap_chan_timeoutDuoming Zhou1-0/+3
There is a race condition between l2cap_chan_timeout() and l2cap_chan_del(). When we use l2cap_chan_del() to delete the channel, the chan->conn will be set to null. But the conn could be dereferenced again in the mutex_lock() of l2cap_chan_timeout(). As a result the null pointer dereference bug will happen. The KASAN report triggered by POC is shown below: [ 472.074580] ================================================================== [ 472.075284] BUG: KASAN: null-ptr-deref in mutex_lock+0x68/0xc0 [ 472.075308] Write of size 8 at addr 0000000000000158 by task kworker/0:0/7 [ 472.075308] [ 472.075308] CPU: 0 PID: 7 Comm: kworker/0:0 Not tainted 6.9.0-rc5-00356-g78c0094a146b #36 [ 472.075308] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu4 [ 472.075308] Workqueue: events l2cap_chan_timeout [ 472.075308] Call Trace: [ 472.075308] <TASK> [ 472.075308] dump_stack_lvl+0x137/0x1a0 [ 472.075308] print_report+0x101/0x250 [ 472.075308] ? __virt_addr_valid+0x77/0x160 [ 472.075308] ? mutex_lock+0x68/0xc0 [ 472.075308] kasan_report+0x139/0x170 [ 472.075308] ? mutex_lock+0x68/0xc0 [ 472.075308] kasan_check_range+0x2c3/0x2e0 [ 472.075308] mutex_lock+0x68/0xc0 [ 472.075308] l2cap_chan_timeout+0x181/0x300 [ 472.075308] process_one_work+0x5d2/0xe00 [ 472.075308] worker_thread+0xe1d/0x1660 [ 472.075308] ? pr_cont_work+0x5e0/0x5e0 [ 472.075308] kthread+0x2b7/0x350 [ 472.075308] ? pr_cont_work+0x5e0/0x5e0 [ 472.075308] ? kthread_blkcg+0xd0/0xd0 [ 472.075308] ret_from_fork+0x4d/0x80 [ 472.075308] ? kthread_blkcg+0xd0/0xd0 [ 472.075308] ret_from_fork_asm+0x11/0x20 [ 472.075308] </TASK> [ 472.075308] ================================================================== [ 472.094860] Disabling lock debugging due to kernel taint [ 472.096136] BUG: kernel NULL pointer dereference, address: 0000000000000158 [ 472.096136] #PF: supervisor write access in kernel mode [ 472.096136] #PF: error_code(0x0002) - not-present page [ 472.096136] PGD 0 P4D 0 [ 472.096136] Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI [ 472.096136] CPU: 0 PID: 7 Comm: kworker/0:0 Tainted: G B 6.9.0-rc5-00356-g78c0094a146b #36 [ 472.096136] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu4 [ 472.096136] Workqueue: events l2cap_chan_timeout [ 472.096136] RIP: 0010:mutex_lock+0x88/0xc0 [ 472.096136] Code: be 08 00 00 00 e8 f8 23 1f fd 4c 89 f7 be 08 00 00 00 e8 eb 23 1f fd 42 80 3c 23 00 74 08 48 88 [ 472.096136] RSP: 0018:ffff88800744fc78 EFLAGS: 00000246 [ 472.096136] RAX: 0000000000000000 RBX: 1ffff11000e89f8f RCX: ffffffff8457c865 [ 472.096136] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88800744fc78 [ 472.096136] RBP: 0000000000000158 R08: ffff88800744fc7f R09: 1ffff11000e89f8f [ 472.096136] R10: dffffc0000000000 R11: ffffed1000e89f90 R12: dffffc0000000000 [ 472.096136] R13: 0000000000000158 R14: ffff88800744fc78 R15: ffff888007405a00 [ 472.096136] FS: 0000000000000000(0000) GS:ffff88806d200000(0000) knlGS:0000000000000000 [ 472.096136] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 472.096136] CR2: 0000000000000158 CR3: 000000000da32000 CR4: 00000000000006f0 [ 472.096136] Call Trace: [ 472.096136] <TASK> [ 472.096136] ? __die_body+0x8d/0xe0 [ 472.096136] ? page_fault_oops+0x6b8/0x9a0 [ 472.096136] ? kernelmode_fixup_or_oops+0x20c/0x2a0 [ 472.096136] ? do_user_addr_fault+0x1027/0x1340 [ 472.096136] ? _printk+0x7a/0xa0 [ 472.096136] ? mutex_lock+0x68/0xc0 [ 472.096136] ? add_taint+0x42/0xd0 [ 472.096136] ? exc_page_fault+0x6a/0x1b0 [ 472.096136] ? asm_exc_page_fault+0x26/0x30 [ 472.096136] ? mutex_lock+0x75/0xc0 [ 472.096136] ? mutex_lock+0x88/0xc0 [ 472.096136] ? mutex_lock+0x75/0xc0 [ 472.096136] l2cap_chan_timeout+0x181/0x300 [ 472.096136] process_one_work+0x5d2/0xe00 [ 472.096136] worker_thread+0xe1d/0x1660 [ 472.096136] ? pr_cont_work+0x5e0/0x5e0 [ 472.096136] kthread+0x2b7/0x350 [ 472.096136] ? pr_cont_work+0x5e0/0x5e0 [ 472.096136] ? kthread_blkcg+0xd0/0xd0 [ 472.096136] ret_from_fork+0x4d/0x80 [ 472.096136] ? kthread_blkcg+0xd0/0xd0 [ 472.096136] ret_from_fork_asm+0x11/0x20 [ 472.096136] </TASK> [ 472.096136] Modules linked in: [ 472.096136] CR2: 0000000000000158 [ 472.096136] ---[ end trace 0000000000000000 ]--- [ 472.096136] RIP: 0010:mutex_lock+0x88/0xc0 [ 472.096136] Code: be 08 00 00 00 e8 f8 23 1f fd 4c 89 f7 be 08 00 00 00 e8 eb 23 1f fd 42 80 3c 23 00 74 08 48 88 [ 472.096136] RSP: 0018:ffff88800744fc78 EFLAGS: 00000246 [ 472.096136] RAX: 0000000000000000 RBX: 1ffff11000e89f8f RCX: ffffffff8457c865 [ 472.096136] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88800744fc78 [ 472.096136] RBP: 0000000000000158 R08: ffff88800744fc7f R09: 1ffff11000e89f8f [ 472.132932] R10: dffffc0000000000 R11: ffffed1000e89f90 R12: dffffc0000000000 [ 472.132932] R13: 0000000000000158 R14: ffff88800744fc78 R15: ffff888007405a00 [ 472.132932] FS: 0000000000000000(0000) GS:ffff88806d200000(0000) knlGS:0000000000000000 [ 472.132932] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 472.132932] CR2: 0000000000000158 CR3: 000000000da32000 CR4: 00000000000006f0 [ 472.132932] Kernel panic - not syncing: Fatal exception [ 472.132932] Kernel Offset: disabled [ 472.132932] ---[ end Kernel panic - not syncing: Fatal exception ]--- Add a check to judge whether the conn is null in l2cap_chan_timeout() in order to mitigate the bug. Fixes: 3df91ea20e74 ("Bluetooth: Revert to mutexes from RCU list") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-03Bluetooth: HCI: Fix potential null-ptr-derefSungwoo Kim1-0/+2
Fix potential null-ptr-deref in hci_le_big_sync_established_evt(). Fixes: f777d8827817 (Bluetooth: ISO: Notify user space about failed bis connections) Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-03arm64: dts: mediatek: mt8183-pico6: Fix bluetooth nodeChen-Yu Tsai1-1/+2
Bluetooth is not a random device connected to the MMC/SD controller. It is function 2 of the SDIO device. Fix the address of the bluetooth node. Also fix the node name and drop the label. Fixes: 055ef10ccdd4 ("arm64: dts: mt8183: Add jacuzzi pico/pico6 board") Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-03Bluetooth: qca: fix info leak when fetching board idJohan Hovold1-0/+5
Add the missing sanity check when fetching the board id to avoid leaking slab data when later requesting the firmware. Fixes: a7f8dedb4be2 ("Bluetooth: qca: add support for QCA2066") Cc: stable@vger.kernel.org # 6.7 Cc: Tim Jiang <quic_tjiang@quicinc.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-03Bluetooth: qca: fix info leak when fetching fw build idJohan Hovold2-5/+21
Add the missing sanity checks and move the 255-byte build-id buffer off the stack to avoid leaking stack data through debugfs in case the build-info reply is malformed. Fixes: c0187b0bd3e9 ("Bluetooth: btqca: Add support to read FW build version for WCN3991 BTSoC") Cc: stable@vger.kernel.org # 5.12 Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-03Bluetooth: qca: generalise device address checkJohan Hovold2-9/+14
The default device address apparently comes from the NVM configuration file and can differ quite a bit between controllers. Store the default address when parsing the configuration file and use it to determine whether the controller has been provisioned with an address. This makes sure that devices without a unique address start as unconfigured unless a valid address has been provided in the devicetree. Fixes: 32868e126c78 ("Bluetooth: qca: fix invalid device address check") Cc: stable@vger.kernel.org # 6.5 Cc: Doug Anderson <dianders@chromium.org> Cc: Janaki Ramaiah Thota <quic_janathot@quicinc.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Tested-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-03Bluetooth: qca: fix NVM configuration parsingJohan Hovold1-2/+22
The NVM configuration files used by WCN3988 and WCN3990/1/8 have two sets of configuration tags that are enclosed by a type-length header of type four which the current parser fails to account for. Instead the driver happily parses random data as if it were valid tags, something which can lead to the configuration data being corrupted if it ever encounters the words 0x0011 or 0x001b. As is clear from commit b63882549b2b ("Bluetooth: btqca: Fix the NVM baudrate tag offcet for wcn3991") the intention has always been to process the configuration data also for WCN3991 and WCN3998 which encodes the baud rate at a different offset. Fix the parser so that it can handle the WCN3xxx configuration files, which has an enclosing type-length header of type four and two sets of TLV tags enclosed by a type-length header of type two and three, respectively. Note that only the first set, which contains the tags the driver is currently looking for, will be parsed for now. With the parser fixed, the software in-band sleep bit will now be set for WCN3991 and WCN3998 (as it is for later controllers) and the default baud rate 3200000 may be updated by the driver also for WCN3xxx controllers. Notably the deep-sleep feature bit is already set by default in all configuration files in linux-firmware. Fixes: 4219d4686875 ("Bluetooth: btqca: Add wcn3990 firmware download support.") Cc: stable@vger.kernel.org # 4.19 Cc: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-03Bluetooth: qca: add missing firmware sanity checksJohan Hovold1-6/+32
Add the missing sanity checks when parsing the firmware files before downloading them to avoid accessing and corrupting memory beyond the vmalloced buffer. Fixes: 83e81961ff7e ("Bluetooth: btqca: Introduce generic QCA ROME support") Cc: stable@vger.kernel.org # 4.10 Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-03Bluetooth: msft: fix slab-use-after-free in msft_do_close()Sungwoo Kim3-5/+4
Tying the msft->data lifetime to hdev by freeing it in hci_release_dev() to fix the following case: [use] msft_do_close() msft = hdev->msft_data; if (!msft) ...(1) <- passed. return; mutex_lock(&msft->filter_lock); ...(4) <- used after freed. [free] msft_unregister() msft = hdev->msft_data; hdev->msft_data = NULL; ...(2) kfree(msft); ...(3) <- msft is freed. ================================================================== BUG: KASAN: slab-use-after-free in __mutex_lock_common kernel/locking/mutex.c:587 [inline] BUG: KASAN: slab-use-after-free in __mutex_lock+0x8f/0xc30 kernel/locking/mutex.c:752 Read of size 8 at addr ffff888106cbbca8 by task kworker/u5:2/309 Fixes: bf6a4e30ffbd ("Bluetooth: disable advertisement filters during suspend") Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-03Bluetooth: L2CAP: Fix slab-use-after-free in l2cap_connect()Sungwoo Kim1-11/+10
Extend a critical section to prevent chan from early freeing. Also make the l2cap_connect() return type void. Nothing is using the returned value but it is ugly to return a potentially freed pointer. Making it void will help with backports because earlier kernels did use the return value. Now the compile will break for kernels where this patch is not a complete fix. Call stack summary: [use] l2cap_bredr_sig_cmd l2cap_connect ┌ mutex_lock(&conn->chan_lock); │ chan = pchan->ops->new_connection(pchan); <- alloc chan │ __l2cap_chan_add(conn, chan); │ l2cap_chan_hold(chan); │ list_add(&chan->list, &conn->chan_l); ... (1) └ mutex_unlock(&conn->chan_lock); chan->conf_state ... (4) <- use after free [free] l2cap_conn_del ┌ mutex_lock(&conn->chan_lock); │ foreach chan in conn->chan_l: ... (2) │ l2cap_chan_put(chan); │ l2cap_chan_destroy │ kfree(chan) ... (3) <- chan freed └ mutex_unlock(&conn->chan_lock); ================================================================== BUG: KASAN: slab-use-after-free in instrument_atomic_read include/linux/instrumented.h:68 [inline] BUG: KASAN: slab-use-after-free in _test_bit include/asm-generic/bitops/instrumented-non-atomic.h:141 [inline] BUG: KASAN: slab-use-after-free in l2cap_connect+0xa67/0x11a0 net/bluetooth/l2cap_core.c:4260 Read of size 8 at addr ffff88810bf040a0 by task kworker/u3:1/311 Fixes: 73ffa904b782 ("Bluetooth: Move conf_{req,rsp} stuff to struct l2cap_chan") Signed-off-by: Sungwoo Kim <iam@sung-woo.kim> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-03Bluetooth: qca: fix wcn3991 device address checkJohan Hovold1-1/+4
Qualcomm Bluetooth controllers may not have been provisioned with a valid device address and instead end up using the default address 00:00:00:00:5a:ad. This address is now used to determine if a controller has a valid address or if one needs to be provided through devicetree or by user space before the controller can be used. It turns out that the WCN3991 controllers used in Chromium Trogdor machines use a different default address, 39:98:00:00:5a:ad, which also needs to be marked as invalid so that the correct address is fetched from the devicetree. Qualcomm has unfortunately not yet provided any answers as to whether the 39:98 encodes a hardware id and if there are other variants of the default address that needs to be handled by the driver. For now, add the Trogdor WCN3991 default address to the device address check to avoid having these controllers start with the default address instead of their assigned addresses. Fixes: 32868e126c78 ("Bluetooth: qca: fix invalid device address check") Cc: stable@vger.kernel.org # 6.5 Cc: Doug Anderson <dianders@chromium.org> Cc: Janaki Ramaiah Thota <quic_janathot@quicinc.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Tested-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-03Bluetooth: Fix use-after-free bugs caused by sco_sock_timeoutDuoming Zhou1-0/+4
When the sco connection is established and then, the sco socket is releasing, timeout_work will be scheduled to judge whether the sco disconnection is timeout. The sock will be deallocated later, but it is dereferenced again in sco_sock_timeout. As a result, the use-after-free bugs will happen. The root cause is shown below: Cleanup Thread | Worker Thread sco_sock_release | sco_sock_close | __sco_sock_close | sco_sock_set_timer | schedule_delayed_work | sco_sock_kill | (wait a time) sock_put(sk) //FREE | sco_sock_timeout | sock_hold(sk) //USE The KASAN report triggered by POC is shown below: [ 95.890016] ================================================================== [ 95.890496] BUG: KASAN: slab-use-after-free in sco_sock_timeout+0x5e/0x1c0 [ 95.890755] Write of size 4 at addr ffff88800c388080 by task kworker/0:0/7 ... [ 95.890755] Workqueue: events sco_sock_timeout [ 95.890755] Call Trace: [ 95.890755] <TASK> [ 95.890755] dump_stack_lvl+0x45/0x110 [ 95.890755] print_address_description+0x78/0x390 [ 95.890755] print_report+0x11b/0x250 [ 95.890755] ? __virt_addr_valid+0xbe/0xf0 [ 95.890755] ? sco_sock_timeout+0x5e/0x1c0 [ 95.890755] kasan_report+0x139/0x170 [ 95.890755] ? update_load_avg+0xe5/0x9f0 [ 95.890755] ? sco_sock_timeout+0x5e/0x1c0 [ 95.890755] kasan_check_range+0x2c3/0x2e0 [ 95.890755] sco_sock_timeout+0x5e/0x1c0 [ 95.890755] process_one_work+0x561/0xc50 [ 95.890755] worker_thread+0xab2/0x13c0 [ 95.890755] ? pr_cont_work+0x490/0x490 [ 95.890755] kthread+0x279/0x300 [ 95.890755] ? pr_cont_work+0x490/0x490 [ 95.890755] ? kthread_blkcg+0xa0/0xa0 [ 95.890755] ret_from_fork+0x34/0x60 [ 95.890755] ? kthread_blkcg+0xa0/0xa0 [ 95.890755] ret_from_fork_asm+0x11/0x20 [ 95.890755] </TASK> [ 95.890755] [ 95.890755] Allocated by task 506: [ 95.890755] kasan_save_track+0x3f/0x70 [ 95.890755] __kasan_kmalloc+0x86/0x90 [ 95.890755] __kmalloc+0x17f/0x360 [ 95.890755] sk_prot_alloc+0xe1/0x1a0 [ 95.890755] sk_alloc+0x31/0x4e0 [ 95.890755] bt_sock_alloc+0x2b/0x2a0 [ 95.890755] sco_sock_create+0xad/0x320 [ 95.890755] bt_sock_create+0x145/0x320 [ 95.890755] __sock_create+0x2e1/0x650 [ 95.890755] __sys_socket+0xd0/0x280 [ 95.890755] __x64_sys_socket+0x75/0x80 [ 95.890755] do_syscall_64+0xc4/0x1b0 [ 95.890755] entry_SYSCALL_64_after_hwframe+0x67/0x6f [ 95.890755] [ 95.890755] Freed by task 506: [ 95.890755] kasan_save_track+0x3f/0x70 [ 95.890755] kasan_save_free_info+0x40/0x50 [ 95.890755] poison_slab_object+0x118/0x180 [ 95.890755] __kasan_slab_free+0x12/0x30 [ 95.890755] kfree+0xb2/0x240 [ 95.890755] __sk_destruct+0x317/0x410 [ 95.890755] sco_sock_release+0x232/0x280 [ 95.890755] sock_close+0xb2/0x210 [ 95.890755] __fput+0x37f/0x770 [ 95.890755] task_work_run+0x1ae/0x210 [ 95.890755] get_signal+0xe17/0xf70 [ 95.890755] arch_do_signal_or_restart+0x3f/0x520 [ 95.890755] syscall_exit_to_user_mode+0x55/0x120 [ 95.890755] do_syscall_64+0xd1/0x1b0 [ 95.890755] entry_SYSCALL_64_after_hwframe+0x67/0x6f [ 95.890755] [ 95.890755] The buggy address belongs to the object at ffff88800c388000 [ 95.890755] which belongs to the cache kmalloc-1k of size 1024 [ 95.890755] The buggy address is located 128 bytes inside of [ 95.890755] freed 1024-byte region [ffff88800c388000, ffff88800c388400) [ 95.890755] [ 95.890755] The buggy address belongs to the physical page: [ 95.890755] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88800c38a800 pfn:0xc388 [ 95.890755] head: order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 95.890755] anon flags: 0x100000000000840(slab|head|node=0|zone=1) [ 95.890755] page_type: 0xffffffff() [ 95.890755] raw: 0100000000000840 ffff888006842dc0 0000000000000000 0000000000000001 [ 95.890755] raw: ffff88800c38a800 000000000010000a 00000001ffffffff 0000000000000000 [ 95.890755] head: 0100000000000840 ffff888006842dc0 0000000000000000 0000000000000001 [ 95.890755] head: ffff88800c38a800 000000000010000a 00000001ffffffff 0000000000000000 [ 95.890755] head: 0100000000000003 ffffea000030e201 ffffea000030e248 00000000ffffffff [ 95.890755] head: 0000000800000000 0000000000000000 00000000ffffffff 0000000000000000 [ 95.890755] page dumped because: kasan: bad access detected [ 95.890755] [ 95.890755] Memory state around the buggy address: [ 95.890755] ffff88800c387f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 95.890755] ffff88800c388000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 95.890755] >ffff88800c388080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 95.890755] ^ [ 95.890755] ffff88800c388100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 95.890755] ffff88800c388180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 95.890755] ================================================================== Fix this problem by adding a check protected by sco_conn_lock to judget whether the conn->hcon is null. Because the conn->hcon will be set to null, when the sock is releasing. Fixes: ba316be1b6a0 ("Bluetooth: schedule SCO timeouts with delayed_work") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-05-03tcp: Use refcount_inc_not_zero() in tcp_twsk_unique().Kuniyuki Iwashima1-1/+7
Anderson Nascimento reported a use-after-free splat in tcp_twsk_unique() with nice analysis. Since commit ec94c2696f0b ("tcp/dccp: avoid one atomic operation for timewait hashdance"), inet_twsk_hashdance() sets TIME-WAIT socket's sk_refcnt after putting it into ehash and releasing the bucket lock. Thus, there is a small race window where other threads could try to reuse the port during connect() and call sock_hold() in tcp_twsk_unique() for the TIME-WAIT socket with zero refcnt. If that happens, the refcnt taken by tcp_twsk_unique() is overwritten and sock_put() will cause underflow, triggering a real use-after-free somewhere else. To avoid the use-after-free, we need to use refcount_inc_not_zero() in tcp_twsk_unique() and give up on reusing the port if it returns false. [0]: refcount_t: addition on 0; use-after-free. WARNING: CPU: 0 PID: 1039313 at lib/refcount.c:25 refcount_warn_saturate+0xe5/0x110 CPU: 0 PID: 1039313 Comm: trigger Not tainted 6.8.6-200.fc39.x86_64 #1 Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023 RIP: 0010:refcount_warn_saturate+0xe5/0x110 Code: 42 8e ff 0f 0b c3 cc cc cc cc 80 3d aa 13 ea 01 00 0f 85 5e ff ff ff 48 c7 c7 f8 8e b7 82 c6 05 96 13 ea 01 01 e8 7b 42 8e ff <0f> 0b c3 cc cc cc cc 48 c7 c7 50 8f b7 82 c6 05 7a 13 ea 01 01 e8 RSP: 0018:ffffc90006b43b60 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff888009bb3ef0 RCX: 0000000000000027 RDX: ffff88807be218c8 RSI: 0000000000000001 RDI: ffff88807be218c0 RBP: 0000000000069d70 R08: 0000000000000000 R09: ffffc90006b439f0 R10: ffffc90006b439e8 R11: 0000000000000003 R12: ffff8880029ede84 R13: 0000000000004e20 R14: ffffffff84356dc0 R15: ffff888009bb3ef0 FS: 00007f62c10926c0(0000) GS:ffff88807be00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020ccb000 CR3: 000000004628c005 CR4: 0000000000f70ef0 PKRU: 55555554 Call Trace: <TASK> ? refcount_warn_saturate+0xe5/0x110 ? __warn+0x81/0x130 ? refcount_warn_saturate+0xe5/0x110 ? report_bug+0x171/0x1a0 ? refcount_warn_saturate+0xe5/0x110 ? handle_bug+0x3c/0x80 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? refcount_warn_saturate+0xe5/0x110 tcp_twsk_unique+0x186/0x190 __inet_check_established+0x176/0x2d0 __inet_hash_connect+0x74/0x7d0 ? __pfx___inet_check_established+0x10/0x10 tcp_v4_connect+0x278/0x530 __inet_stream_connect+0x10f/0x3d0 inet_stream_connect+0x3a/0x60 __sys_connect+0xa8/0xd0 __x64_sys_connect+0x18/0x20 do_syscall_64+0x83/0x170 entry_SYSCALL_64_after_hwframe+0x78/0x80 RIP: 0033:0x7f62c11a885d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a3 45 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007f62c1091e58 EFLAGS: 00000296 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 0000000020ccb004 RCX: 00007f62c11a885d RDX: 0000000000000010 RSI: 0000000020ccb000 RDI: 0000000000000003 RBP: 00007f62c1091e90 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000296 R12: 00007f62c10926c0 R13: ffffffffffffff88 R14: 0000000000000000 R15: 00007ffe237885b0 </TASK> Fixes: ec94c2696f0b ("tcp/dccp: avoid one atomic operation for timewait hashdance") Reported-by: Anderson Nascimento <anderson@allelesecurity.com> Closes: https://lore.kernel.org/netdev/37a477a6-d39e-486b-9577-3463f655a6b7@allelesecurity.com/ Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240501213145.62261-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-03tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV socketsEric Dumazet3-3/+7
TCP_SYN_RECV state is really special, it is only used by cross-syn connections, mostly used by fuzzers. In the following crash [1], syzbot managed to trigger a divide by zero in tcp_rcv_space_adjust() A socket makes the following state transitions, without ever calling tcp_init_transfer(), meaning tcp_init_buffer_space() is also not called. TCP_CLOSE connect() TCP_SYN_SENT TCP_SYN_RECV shutdown() -> tcp_shutdown(sk, SEND_SHUTDOWN) TCP_FIN_WAIT1 To fix this issue, change tcp_shutdown() to not perform a TCP_SYN_RECV -> TCP_FIN_WAIT1 transition, which makes no sense anyway. When tcp_rcv_state_process() later changes socket state from TCP_SYN_RECV to TCP_ESTABLISH, then look at sk->sk_shutdown to finally enter TCP_FIN_WAIT1 state, and send a FIN packet from a sane socket state. This means tcp_send_fin() can now be called from BH context, and must use GFP_ATOMIC allocations. [1] divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 1 PID: 5084 Comm: syz-executor358 Not tainted 6.9.0-rc6-syzkaller-00022-g98369dccd2f8 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 RIP: 0010:tcp_rcv_space_adjust+0x2df/0x890 net/ipv4/tcp_input.c:767 Code: e3 04 4c 01 eb 48 8b 44 24 38 0f b6 04 10 84 c0 49 89 d5 0f 85 a5 03 00 00 41 8b 8e c8 09 00 00 89 e8 29 c8 48 0f af c3 31 d2 <48> f7 f1 48 8d 1c 43 49 8d 96 76 08 00 00 48 89 d0 48 c1 e8 03 48 RSP: 0018:ffffc900031ef3f0 EFLAGS: 00010246 RAX: 0c677a10441f8f42 RBX: 000000004fb95e7e RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000027d4b11f R08: ffffffff89e535a4 R09: 1ffffffff25e6ab7 R10: dffffc0000000000 R11: ffffffff8135e920 R12: ffff88802a9f8d30 R13: dffffc0000000000 R14: ffff88802a9f8d00 R15: 1ffff1100553f2da FS: 00005555775c0380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1155bf2304 CR3: 000000002b9f2000 CR4: 0000000000350ef0 Call Trace: <TASK> tcp_recvmsg_locked+0x106d/0x25a0 net/ipv4/tcp.c:2513 tcp_recvmsg+0x25d/0x920 net/ipv4/tcp.c:2578 inet6_recvmsg+0x16a/0x730 net/ipv6/af_inet6.c:680 sock_recvmsg_nosec net/socket.c:1046 [inline] sock_recvmsg+0x109/0x280 net/socket.c:1068 ____sys_recvmsg+0x1db/0x470 net/socket.c:2803 ___sys_recvmsg net/socket.c:2845 [inline] do_recvmmsg+0x474/0xae0 net/socket.c:2939 __sys_recvmmsg net/socket.c:3018 [inline] __do_sys_recvmmsg net/socket.c:3041 [inline] __se_sys_recvmmsg net/socket.c:3034 [inline] __x64_sys_recvmmsg+0x199/0x250 net/socket.c:3034 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7faeb6363db9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffcc1997168 EFLAGS: 00000246 ORIG_RAX: 000000000000012b RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007faeb6363db9 RDX: 0000000000000001 RSI: 0000000020000bc0 RDI: 0000000000000005 RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000001c R10: 0000000000000122 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20240501125448.896529-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-02Merge tag 'net-6.9-rc7' of ↵Linus Torvalds49-193/+378
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf. Relatively calm week, likely due to public holiday in most places. No known outstanding regressions. Current release - regressions: - rxrpc: fix wrong alignmask in __page_frag_alloc_align() - eth: e1000e: change usleep_range to udelay in PHY mdic access Previous releases - regressions: - gro: fix udp bad offset in socket lookup - bpf: fix incorrect runtime stat for arm64 - tipc: fix UAF in error path - netfs: fix a potential infinite loop in extract_user_to_sg() - eth: ice: ensure the copied buf is NUL terminated - eth: qeth: fix kernel panic after setting hsuid Previous releases - always broken: - bpf: - verifier: prevent userspace memory access - xdp: use flags field to disambiguate broadcast redirect - bridge: fix multicast-to-unicast with fraglist GSO - mptcp: ensure snd_nxt is properly initialized on connect - nsh: fix outer header access in nsh_gso_segment(). - eth: bcmgenet: fix racing registers access - eth: vxlan: fix stats counters. Misc: - a bunch of MAINTAINERS file updates" * tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) MAINTAINERS: mark MYRICOM MYRI-10G as Orphan MAINTAINERS: remove Ariel Elior net: gro: add flush check in udp_gro_receive_segment net: gro: fix udp bad offset in socket lookup by adding {inner_}network_offset to napi_gro_cb ipv4: Fix uninit-value access in __ip_make_skb() s390/qeth: Fix kernel panic after setting hsuid vxlan: Pull inner IP header in vxlan_rcv(). tipc: fix a possible memleak in tipc_buf_append tipc: fix UAF in error path rxrpc: Clients must accept conn from any address net: core: reject skb_copy(_expand) for fraglist GSO skbs net: bridge: fix multicast-to-unicast with fraglist GSO mptcp: ensure snd_nxt is properly initialized on connect e1000e: change usleep_range to udelay in PHY mdic access net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341 cxgb4: Properly lock TX queue for the selftest. rxrpc: Fix using alignmask being zero for __page_frag_alloc_align() vxlan: Add missing VNI filter counter update in arp_reduce(). vxlan: Fix racy device stats updates. net: qede: use return from qede_parse_actions() ...
2024-05-02MAINTAINERS: mark MYRICOM MYRI-10G as OrphanJakub Kicinski1-2/+1
Chris's email address bounces and lore hasn't seen an email from anyone with his name for almost a decade. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240430233532.1356982-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-02MAINTAINERS: remove Ariel EliorJakub Kicinski1-3/+0
aelior@marvell.com bounces, we haven't seen Ariel on lore since March 2022. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240430233305.1356105-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-02Merge branch 'net-gro-add-flush-flush_id-checks-and-fix-wrong-offset-in-udp'Paolo Abeni9-5/+33
Richard Gobert says: ==================== net: gro: add flush/flush_id checks and fix wrong offset in udp This series fixes a bug in the complete phase of UDP in GRO, in which socket lookup fails due to using network_header when parsing encapsulated packets. The fix is to add network_offset and inner_network_offset to napi_gro_cb and use these offsets for socket lookup. In addition p->flush/flush_id should be checked in all UDP flows. The same logic from tcp_gro_receive is applied for all flows in udp_gro_receive_segment. This prevents packets with mismatching network headers (flush/flush_id turned on) from merging in UDP GRO. The original series includes a change to vxlan test which adds the local parameter to prevent similar future bugs. I plan to submit it separately to net-next. This series is part of a previously submitted series to net-next: https://lore.kernel.org/all/20240408141720.98832-1-richardbgobert@gmail.com/ v3 -> v4: - Store network offsets, and use them only in udp_gro_complete flows - Correct commit hash used in Fixes tag - v3: https://lore.kernel.org/netdev/20240424163045.123528-1-richardbgobert@gmail.com/ v2 -> v3: - Add network_offsets and fix udp bug in a single commit to make backporting easier - Write to inner_network_offset in {inet,ipv6}_gro_receive - Use network_offsets union in tcp[46]_gro_complete as well - v2: https://lore.kernel.org/netdev/20240419153542.121087-1-richardbgobert@gmail.com/ v1 -> v2: - Use network_offsets instead of p_poff param as suggested by Willem - Check flush before postpull, and for all UDP GRO flows - v1: https://lore.kernel.org/netdev/20240412152120.115067-1-richardbgobert@gmail.com/ ==================== Link: https://lore.kernel.org/r/20240430143555.126083-1-richardbgobert@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-02net: gro: add flush check in udp_gro_receive_segmentRichard Gobert1-1/+11
GRO-GSO path is supposed to be transparent and as such L3 flush checks are relevant to all UDP flows merging in GRO. This patch uses the same logic and code from tcp_gro_receive, terminating merge if flush is non zero. Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-02net: gro: fix udp bad offset in socket lookup by adding ↵Richard Gobert9-4/+22
{inner_}network_offset to napi_gro_cb Commits a602456 ("udp: Add GRO functions to UDP socket") and 57c67ff ("udp: additional GRO support") introduce incorrect usage of {ip,ipv6}_hdr in the complete phase of gro. The functions always return skb->network_header, which in the case of encapsulated packets at the gro complete phase, is always set to the innermost L3 of the packet. That means that calling {ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in gro_list and *_gro_complete) when parsing an encapsulated packet's _outer_ L3/L4 may return an unexpected value. This incorrect usage leads to a bug in GRO's UDP socket lookup. udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively. These *_hdr functions return network_header which will point to the innermost L3, resulting in the wrong offset being used in __udp{4,6}_lib_lookup with encapsulated packets. This patch adds network_offset and inner_network_offset to napi_gro_cb, and makes sure both are set correctly. To fix the issue, network_offsets union is used inside napi_gro_cb, in which both the outer and the inner network offsets are saved. Reproduction example: Endpoint configuration example (fou + local address bind) # ip fou add port 6666 ipproto 4 # ip link add name tun1 type ipip remote 2.2.2.1 local 2.2.2.2 encap fou encap-dport 5555 encap-sport 6666 mode ipip # ip link set tun1 up # ip a add 1.1.1.2/24 dev tun1 Netperf TCP_STREAM result on net-next before patch is applied: net-next main, GRO enabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.28 2.37 net-next main, GRO disabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.01 2745.06 patch applied, GRO enabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.01 2877.38 Fixes: a6024562ffd7 ("udp: Add GRO functions to UDP socket") Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-02ipv4: Fix uninit-value access in __ip_make_skb()Shigeru Yoshida2-1/+4
KMSAN reported uninit-value access in __ip_make_skb() [1]. __ip_make_skb() tests HDRINCL to know if the skb has icmphdr. However, HDRINCL can cause a race condition. If calling setsockopt(2) with IP_HDRINCL changes HDRINCL while __ip_make_skb() is running, the function will access icmphdr in the skb even if it is not included. This causes the issue reported by KMSAN. Check FLOWI_FLAG_KNOWN_NH on fl4->flowi4_flags instead of testing HDRINCL on the socket. Also, fl4->fl4_icmp_type and fl4->fl4_icmp_code are not initialized. These are union in struct flowi4 and are implicitly initialized by flowi4_init_output(), but we should not rely on specific union layout. Initialize these explicitly in raw_sendmsg(). [1] BUG: KMSAN: uninit-value in __ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481 __ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481 ip_finish_skb include/net/ip.h:243 [inline] ip_push_pending_frames+0x4c/0x5c0 net/ipv4/ip_output.c:1508 raw_sendmsg+0x2381/0x2690 net/ipv4/raw.c:654 inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x274/0x3c0 net/socket.c:745 __sys_sendto+0x62c/0x7b0 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x130/0x200 net/socket.c:2199 do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6d/0x75 Uninit was created at: slab_post_alloc_hook mm/slub.c:3804 [inline] slab_alloc_node mm/slub.c:3845 [inline] kmem_cache_alloc_node+0x5f6/0xc50 mm/slub.c:3888 kmalloc_reserve+0x13c/0x4a0 net/core/skbuff.c:577 __alloc_skb+0x35a/0x7c0 net/core/skbuff.c:668 alloc_skb include/linux/skbuff.h:1318 [inline] __ip_append_data+0x49ab/0x68c0 net/ipv4/ip_output.c:1128 ip_append_data+0x1e7/0x260 net/ipv4/ip_output.c:1365 raw_sendmsg+0x22b1/0x2690 net/ipv4/raw.c:648 inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x274/0x3c0 net/socket.c:745 __sys_sendto+0x62c/0x7b0 net/socket.c:2191 __do_sys_sendto net/socket.c:2203 [inline] __se_sys_sendto net/socket.c:2199 [inline] __x64_sys_sendto+0x130/0x200 net/socket.c:2199 do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6d/0x75 CPU: 1 PID: 15709 Comm: syz-executor.7 Not tainted 6.8.0-11567-gb3603fcb79b1 #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014 Fixes: 99e5acae193e ("ipv4: Fix potential uninit variable access bug in __ip_make_skb()") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Link: https://lore.kernel.org/r/20240430123945.2057348-1-syoshida@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-02s390/qeth: Fix kernel panic after setting hsuidAlexandra Winter1-34/+27
Symptom: When the hsuid attribute is set for the first time on an IQD Layer3 device while the corresponding network interface is already UP, the kernel will try to execute a napi function pointer that is NULL. Example: --------------------------------------------------------------------------- [ 2057.572696] illegal operation: 0001 ilc:1 [#1] SMP [ 2057.572702] Modules linked in: af_iucv qeth_l3 zfcp scsi_transport_fc sunrpc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink ghash_s390 prng xts aes_s390 des_s390 de s_generic sha3_512_s390 sha3_256_s390 sha512_s390 vfio_ccw vfio_mdev mdev vfio_iommu_type1 eadm_sch vfio ext4 mbcache jbd2 qeth_l2 bridge stp llc dasd_eckd_mod qeth dasd_mod qdio ccwgroup pkey zcrypt [ 2057.572739] CPU: 6 PID: 60182 Comm: stress_client Kdump: loaded Not tainted 4.18.0-541.el8.s390x #1 [ 2057.572742] Hardware name: IBM 3931 A01 704 (LPAR) [ 2057.572744] Krnl PSW : 0704f00180000000 0000000000000002 (0x2) [ 2057.572748] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 RI:0 EA:3 [ 2057.572751] Krnl GPRS: 0000000000000004 0000000000000000 00000000a3b008d8 0000000000000000 [ 2057.572754] 00000000a3b008d8 cb923a29c779abc5 0000000000000000 00000000814cfd80 [ 2057.572756] 000000000000012c 0000000000000000 00000000a3b008d8 00000000a3b008d8 [ 2057.572758] 00000000bab6d500 00000000814cfd80 0000000091317e46 00000000814cfc68 [ 2057.572762] Krnl Code:#0000000000000000: 0000 illegal >0000000000000002: 0000 illegal 0000000000000004: 0000 illegal 0000000000000006: 0000 illegal 0000000000000008: 0000 illegal 000000000000000a: 0000 illegal 000000000000000c: 0000 illegal 000000000000000e: 0000 illegal [ 2057.572800] Call Trace: [ 2057.572801] ([<00000000ec639700>] 0xec639700) [ 2057.572803] [<00000000913183e2>] net_rx_action+0x2ba/0x398 [ 2057.572809] [<0000000091515f76>] __do_softirq+0x11e/0x3a0 [ 2057.572813] [<0000000090ce160c>] do_softirq_own_stack+0x3c/0x58 [ 2057.572817] ([<0000000090d2cbd6>] do_softirq.part.1+0x56/0x60) [ 2057.572822] [<0000000090d2cc60>] __local_bh_enable_ip+0x80/0x98 [ 2057.572825] [<0000000091314706>] __dev_queue_xmit+0x2be/0xd70 [ 2057.572827] [<000003ff803dd6d6>] afiucv_hs_send+0x24e/0x300 [af_iucv] [ 2057.572830] [<000003ff803dd88a>] iucv_send_ctrl+0x102/0x138 [af_iucv] [ 2057.572833] [<000003ff803de72a>] iucv_sock_connect+0x37a/0x468 [af_iucv] [ 2057.572835] [<00000000912e7e90>] __sys_connect+0xa0/0xd8 [ 2057.572839] [<00000000912e9580>] sys_socketcall+0x228/0x348 [ 2057.572841] [<0000000091514e1a>] system_call+0x2a6/0x2c8 [ 2057.572843] Last Breaking-Event-Address: [ 2057.572844] [<0000000091317e44>] __napi_poll+0x4c/0x1d8 [ 2057.572846] [ 2057.572847] Kernel panic - not syncing: Fatal exception in interrupt ------------------------------------------------------------------------------------------- Analysis: There is one napi structure per out_q: card->qdio.out_qs[i].napi The napi.poll functions are set during qeth_open(). Since commit 1cfef80d4c2b ("s390/qeth: Don't call dev_close/dev_open (DOWN/UP)") qeth_set_offline()/qeth_set_online() no longer call dev_close()/ dev_open(). So if qeth_free_qdio_queues() cleared card->qdio.out_qs[i].napi.poll while the network interface was UP and the card was offline, they are not set again. Reproduction: chzdev -e $devno layer2=0 ip link set dev $network_interface up echo 0 > /sys/bus/ccwgroup/devices/0.0.$devno/online echo foo > /sys/bus/ccwgroup/devices/0.0.$devno/hsuid echo 1 > /sys/bus/ccwgroup/devices/0.0.$devno/online -> Crash (can be enforced e.g. by af_iucv connect(), ip link down/up, ...) Note that a Completion Queue (CQ) is only enabled or disabled, when hsuid is set for the first time or when it is removed. Workarounds: - Set hsuid before setting the device online for the first time or - Use chzdev -d $devno; chzdev $devno hsuid=xxx; chzdev -e $devno; to set hsuid on an existing device. (this will remove and recreate the network interface) Fix: There is no need to free the output queues when a completion queue is added or removed. card->qdio.state now indicates whether the inbound buffer pool and the outbound queues are allocated. card->qdio.c_q indicates whether a CQ is allocated. Fixes: 1cfef80d4c2b ("s390/qeth: Don't call dev_close/dev_open (DOWN/UP)") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240430091004.2265683-1-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>