From 95ce158b6c93b28842b54b42ad1cb221b9844062 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Thu, 13 Jul 2023 00:34:05 +0200 Subject: dsa: mv88e6xxx: Do a final check before timing out I get sporadic timeouts from the driver when using the MV88E6352. Reading the status again after the loop fixes the problem: the operation is successful but goes undetected. Some added prints show things like this: [ 58.356209] mv88e6085 mdio_mux-0.1:00: Timeout while waiting for switch, addr 1b reg 0b, mask 8000, val 0000, data c000 [ 58.367487] mv88e6085 mdio_mux-0.1:00: Timeout waiting for ATU op 4000, fid 0001 (...) [ 61.826293] mv88e6085 mdio_mux-0.1:00: Timeout while waiting for switch, addr 1c reg 18, mask 8000, val 0000, data 9860 [ 61.837560] mv88e6085 mdio_mux-0.1:00: Timeout waiting for PHY command 1860 to complete The reason is probably not the commands: I think those are mostly fine with the 50+50ms timeout, but the problem appears when OpenWrt brings up several interfaces in parallel on a system with 7 populated ports: if one of them take more than 50 ms and waits one or more of the others can get stuck on the mutex for the switch and then this can easily multiply. As we sleep and wait, the function loop needs a final check after exiting the loop if we were successful. Suggested-by: Andrew Lunn Cc: Tobias Waldekranz Fixes: 35da1dfd9484 ("net: dsa: mv88e6xxx: Improve performance of busy bit polling") Signed-off-by: Linus Walleij Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/20230712223405.861899-1-linus.walleij@linaro.org Signed-off-by: Jakub Kicinski --- drivers/net/dsa/mv88e6xxx/chip.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 8b51756bd805..c7d51a539451 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -109,6 +109,13 @@ int mv88e6xxx_wait_mask(struct mv88e6xxx_chip *chip, int addr, int reg, usleep_range(1000, 2000); } + err = mv88e6xxx_read(chip, addr, reg, &data); + if (err) + return err; + + if ((data & mask) == val) + return 0; + dev_err(chip->dev, "Timeout while waiting for switch\n"); return -ETIMEDOUT; } -- cgit v1.2.3 From 5e1627cb43ddf1b24b92eb26f8d958a3f5676ccb Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Wed, 12 Jul 2023 10:15:10 -0400 Subject: net: usbnet: Fix WARNING in usbnet_start_xmit/usb_submit_urb The syzbot fuzzer identified a problem in the usbnet driver: usb 1-1: BOGUS urb xfer, pipe 3 != type 1 WARNING: CPU: 0 PID: 754 at drivers/usb/core/urb.c:504 usb_submit_urb+0xed6/0x1880 drivers/usb/core/urb.c:504 Modules linked in: CPU: 0 PID: 754 Comm: kworker/0:2 Not tainted 6.4.0-rc7-syzkaller-00014-g692b7dc87ca6 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 Workqueue: mld mld_ifc_work RIP: 0010:usb_submit_urb+0xed6/0x1880 drivers/usb/core/urb.c:504 Code: 7c 24 18 e8 2c b4 5b fb 48 8b 7c 24 18 e8 42 07 f0 fe 41 89 d8 44 89 e1 4c 89 ea 48 89 c6 48 c7 c7 a0 c9 fc 8a e8 5a 6f 23 fb <0f> 0b e9 58 f8 ff ff e8 fe b3 5b fb 48 81 c5 c0 05 00 00 e9 84 f7 RSP: 0018:ffffc9000463f568 EFLAGS: 00010086 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: ffff88801eb28000 RSI: ffffffff814c03b7 RDI: 0000000000000001 RBP: ffff8881443b7190 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000003 R13: ffff88802a77cb18 R14: 0000000000000003 R15: ffff888018262500 FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556a99c15a18 CR3: 0000000028c71000 CR4: 0000000000350ef0 Call Trace: usbnet_start_xmit+0xfe5/0x2190 drivers/net/usb/usbnet.c:1453 __netdev_start_xmit include/linux/netdevice.h:4918 [inline] netdev_start_xmit include/linux/netdevice.h:4932 [inline] xmit_one net/core/dev.c:3578 [inline] dev_hard_start_xmit+0x187/0x700 net/core/dev.c:3594 ... This bug is caused by the fact that usbnet trusts the bulk endpoint addresses its probe routine receives in the driver_info structure, and it does not check to see that these endpoints actually exist and have the expected type and directions. The fix is simply to add such a check. Reported-and-tested-by: syzbot+63ee658b9a100ffadbe2@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/000000000000a56e9105d0cec021@google.com/ Signed-off-by: Alan Stern CC: Oliver Neukum Link: https://lore.kernel.org/r/ea152b6d-44df-4f8a-95c6-4db51143dcc1@rowland.harvard.edu Signed-off-by: Jakub Kicinski --- drivers/net/usb/usbnet.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c index 283ffddda821..2d14b0d78541 100644 --- a/drivers/net/usb/usbnet.c +++ b/drivers/net/usb/usbnet.c @@ -1775,6 +1775,10 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod) } else if (!info->in || !info->out) status = usbnet_get_endpoints (dev, udev); else { + u8 ep_addrs[3] = { + info->in + USB_DIR_IN, info->out + USB_DIR_OUT, 0 + }; + dev->in = usb_rcvbulkpipe (xdev, info->in); dev->out = usb_sndbulkpipe (xdev, info->out); if (!(info->flags & FLAG_NO_SETINT)) @@ -1784,6 +1788,8 @@ usbnet_probe (struct usb_interface *udev, const struct usb_device_id *prod) else status = 0; + if (status == 0 && !usb_check_bulk_endpoints(udev, ep_addrs)) + status = -EINVAL; } if (status >= 0 && dev->status) status = init_status (dev, udev); -- cgit v1.2.3 From 9845217d60d01d151b45842ef2017a65e8f39f5a Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Wed, 12 Jul 2023 12:16:16 +0100 Subject: net: dsa: ar9331: Use explict flags for regmap single read/write The at9331 is only able to read or write a single register at once. The driver has a custom regmap bus and chooses to tell the regmap core about this by reporting the maximum transfer sizes rather than the explicit flags that exist at the regmap level. Since there are a number of problems with the raw transfer limits and the regmap level flags are better integrated anyway convert the driver to use the flags. No functional change. Signed-off-by: Mark Brown Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- drivers/net/dsa/qca/ar9331.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/dsa/qca/ar9331.c b/drivers/net/dsa/qca/ar9331.c index b2bf78ac485e..3b0937031499 100644 --- a/drivers/net/dsa/qca/ar9331.c +++ b/drivers/net/dsa/qca/ar9331.c @@ -1002,6 +1002,8 @@ static const struct regmap_config ar9331_mdio_regmap_config = { .val_bits = 32, .reg_stride = 4, .max_register = AR9331_SW_REG_PAGE, + .use_single_read = true, + .use_single_write = true, .ranges = ar9331_regmap_range, .num_ranges = ARRAY_SIZE(ar9331_regmap_range), @@ -1018,8 +1020,6 @@ static struct regmap_bus ar9331_sw_bus = { .val_format_endian_default = REGMAP_ENDIAN_NATIVE, .read = ar9331_mdio_read, .write = ar9331_sw_bus_write, - .max_raw_read = 4, - .max_raw_write = 4, }; static int ar9331_sw_probe(struct mdio_device *mdiodev) -- cgit v1.2.3 From b685f1a58956fa36cc01123f253351b25bfacfda Mon Sep 17 00:00:00 2001 From: Tanmay Patil Date: Wed, 12 Jul 2023 16:36:57 +0530 Subject: net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field() CPSW ALE has 75 bit ALE entries which are stored within three 32 bit words. The cpsw_ale_get_field() and cpsw_ale_set_field() functions assume that the field will be strictly contained within one word. However, this is not guaranteed to be the case and it is possible for ALE field entries to span across up to two words at the most. Fix the methods to handle getting/setting fields spanning up to two words. Fixes: db82173f23c5 ("netdev: driver: ethernet: add cpsw address lookup engine support") Signed-off-by: Tanmay Patil [s-vadapalli@ti.com: rephrased commit message and added Fixes tag] Signed-off-by: Siddharth Vadapalli Signed-off-by: David S. Miller --- drivers/net/ethernet/ti/cpsw_ale.c | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c index 0c5e783e574c..64bf22cd860c 100644 --- a/drivers/net/ethernet/ti/cpsw_ale.c +++ b/drivers/net/ethernet/ti/cpsw_ale.c @@ -106,23 +106,37 @@ struct cpsw_ale_dev_id { static inline int cpsw_ale_get_field(u32 *ale_entry, u32 start, u32 bits) { - int idx; + int idx, idx2; + u32 hi_val = 0; idx = start / 32; + idx2 = (start + bits - 1) / 32; + /* Check if bits to be fetched exceed a word */ + if (idx != idx2) { + idx2 = 2 - idx2; /* flip */ + hi_val = ale_entry[idx2] << ((idx2 * 32) - start); + } start -= idx * 32; idx = 2 - idx; /* flip */ - return (ale_entry[idx] >> start) & BITMASK(bits); + return (hi_val + (ale_entry[idx] >> start)) & BITMASK(bits); } static inline void cpsw_ale_set_field(u32 *ale_entry, u32 start, u32 bits, u32 value) { - int idx; + int idx, idx2; value &= BITMASK(bits); - idx = start / 32; + idx = start / 32; + idx2 = (start + bits - 1) / 32; + /* Check if bits to be set exceed a word */ + if (idx != idx2) { + idx2 = 2 - idx2; /* flip */ + ale_entry[idx2] &= ~(BITMASK(bits + start - (idx2 * 32))); + ale_entry[idx2] |= (value >> ((idx2 * 32) - start)); + } start -= idx * 32; - idx = 2 - idx; /* flip */ + idx = 2 - idx; /* flip */ ale_entry[idx] &= ~(BITMASK(bits) << start); ale_entry[idx] |= (value << start); } -- cgit v1.2.3 From 56a16035bb6effb37177867cea94c13a8382f745 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 12 Jul 2023 08:44:49 -0700 Subject: bridge: Add extack warning when enabling STP in netns. When we create an L2 loop on a bridge in netns, we will see packets storm even if STP is enabled. # unshare -n # ip link add br0 type bridge # ip link add veth0 type veth peer name veth1 # ip link set veth0 master br0 up # ip link set veth1 master br0 up # ip link set br0 type bridge stp_state 1 # ip link set br0 up # sleep 30 # ip -s link show br0 2: br0: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether b6:61:98:1c:1c:b5 brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped missed mcast 956553768 12861249 0 0 0 12861249 <-. Keep TX: bytes packets errors dropped carrier collsns | increasing 1027834 11951 0 0 0 0 <-' rapidly This is because llc_rcv() drops all packets in non-root netns and BPDU is dropped. Let's add extack warning when enabling STP in netns. # unshare -n # ip link add br0 type bridge # ip link set br0 type bridge stp_state 1 Warning: bridge: STP does not work in non-root netns. Note this commit will be reverted later when we namespacify the whole LLC infra. Fixes: e730c15519d0 ("[NET]: Make packet reception network namespace safe") Suggested-by: Harry Coin Link: https://lore.kernel.org/netdev/0f531295-e289-022d-5add-5ceffa0df9bc@quietfountain.com/ Suggested-by: Ido Schimmel Signed-off-by: Kuniyuki Iwashima Acked-by: Nikolay Aleksandrov Reviewed-by: Ido Schimmel Signed-off-by: David S. Miller --- net/bridge/br_stp_if.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c index 75204d36d7f9..b65962682771 100644 --- a/net/bridge/br_stp_if.c +++ b/net/bridge/br_stp_if.c @@ -201,6 +201,9 @@ int br_stp_set_enabled(struct net_bridge *br, unsigned long val, { ASSERT_RTNL(); + if (!net_eq(dev_net(br->dev), &init_net)) + NL_SET_ERR_MSG_MOD(extack, "STP does not work in non-root netns"); + if (br_mrp_enabled(br)) { NL_SET_ERR_MSG_MOD(extack, "STP can't be enabled if MRP is already enabled"); -- cgit v1.2.3 From 1d6d537dc55d1f42d16290f00157ac387985b95b Mon Sep 17 00:00:00 2001 From: Daniel Golle Date: Thu, 13 Jul 2023 03:42:29 +0100 Subject: net: ethernet: mtk_eth_soc: handle probe deferral Move the call to of_get_ethdev_address to mtk_add_mac which is part of the probe function and can hence itself return -EPROBE_DEFER should of_get_ethdev_address return -EPROBE_DEFER. This allows us to entirely get rid of the mtk_init function. The problem of of_get_ethdev_address returning -EPROBE_DEFER surfaced in situations in which the NVMEM provider holding the MAC address has not yet be loaded at the time mtk_eth_soc is initially probed. In this case probing of mtk_eth_soc should be deferred instead of falling back to use a random MAC address, so once the NVMEM provider becomes available probing can be repeated. Fixes: 656e705243fd ("net-next: mediatek: add support for MT7623 ethernet") Signed-off-by: Daniel Golle Signed-off-by: David S. Miller --- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 29 +++++++++++------------------ 1 file changed, 11 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index 834c644b67db..2d15342c260a 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -3846,23 +3846,6 @@ static int mtk_hw_deinit(struct mtk_eth *eth) return 0; } -static int __init mtk_init(struct net_device *dev) -{ - struct mtk_mac *mac = netdev_priv(dev); - struct mtk_eth *eth = mac->hw; - int ret; - - ret = of_get_ethdev_address(mac->of_node, dev); - if (ret) { - /* If the mac address is invalid, use random mac address */ - eth_hw_addr_random(dev); - dev_err(eth->dev, "generated random MAC address %pM\n", - dev->dev_addr); - } - - return 0; -} - static void mtk_uninit(struct net_device *dev) { struct mtk_mac *mac = netdev_priv(dev); @@ -4278,7 +4261,6 @@ static const struct ethtool_ops mtk_ethtool_ops = { }; static const struct net_device_ops mtk_netdev_ops = { - .ndo_init = mtk_init, .ndo_uninit = mtk_uninit, .ndo_open = mtk_open, .ndo_stop = mtk_stop, @@ -4340,6 +4322,17 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np) mac->hw = eth; mac->of_node = np; + err = of_get_ethdev_address(mac->of_node, eth->netdev[id]); + if (err == -EPROBE_DEFER) + return err; + + if (err) { + /* If the mac address is invalid, use random mac address */ + eth_hw_addr_random(eth->netdev[id]); + dev_err(eth->dev, "generated random MAC address %pM\n", + eth->netdev[id]->dev_addr); + } + memset(mac->hwlro_ip, 0, sizeof(mac->hwlro_ip)); mac->hwlro_ip_cnt = 0; -- cgit v1.2.3 From 4ad23d2368ccce6da74edc74e69300a4d75a2379 Mon Sep 17 00:00:00 2001 From: Wang Ming Date: Thu, 13 Jul 2023 17:51:06 +0800 Subject: bna: Remove error checking for debugfs_create_dir() It is expected that most callers should _ignore_ the errors return by debugfs_create_dir() in bnad_debugfs_init(). Signed-off-by: Wang Ming Signed-off-by: David S. Miller --- drivers/net/ethernet/brocade/bna/bnad_debugfs.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/drivers/net/ethernet/brocade/bna/bnad_debugfs.c b/drivers/net/ethernet/brocade/bna/bnad_debugfs.c index 04ad0f2b9677..7246e13dd559 100644 --- a/drivers/net/ethernet/brocade/bna/bnad_debugfs.c +++ b/drivers/net/ethernet/brocade/bna/bnad_debugfs.c @@ -512,11 +512,6 @@ bnad_debugfs_init(struct bnad *bnad) if (!bnad->port_debugfs_root) { bnad->port_debugfs_root = debugfs_create_dir(name, bna_debugfs_root); - if (!bnad->port_debugfs_root) { - netdev_warn(bnad->netdev, - "debugfs root dir creation failed\n"); - return; - } atomic_inc(&bna_debugfs_port_count); -- cgit v1.2.3 From a822551c51f0dc063305ca16b9dd0dec7fe1f259 Mon Sep 17 00:00:00 2001 From: Wang Ming Date: Thu, 13 Jul 2023 20:16:03 +0800 Subject: net: ethernet: Remove repeating expression Identify issues that arise by using the tests/doublebitand.cocci semantic patch. Need to remove duplicate expression in if statement. Signed-off-by: Wang Ming Reviewed-by: Jiawen Wu Signed-off-by: David S. Miller --- drivers/net/ethernet/wangxun/libwx/wx_hw.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/ethernet/wangxun/libwx/wx_hw.c b/drivers/net/ethernet/wangxun/libwx/wx_hw.c index 39a9aeee7aab..6321178fc814 100644 --- a/drivers/net/ethernet/wangxun/libwx/wx_hw.c +++ b/drivers/net/ethernet/wangxun/libwx/wx_hw.c @@ -1511,7 +1511,6 @@ static void wx_configure_rx(struct wx *wx) psrtype = WX_RDB_PL_CFG_L4HDR | WX_RDB_PL_CFG_L3HDR | WX_RDB_PL_CFG_L2HDR | - WX_RDB_PL_CFG_TUN_TUNHDR | WX_RDB_PL_CFG_TUN_TUNHDR; wr32(wx, WX_RDB_PL_CFG(0), psrtype); -- cgit v1.2.3 From 9840036786d90cea11a90d1f30b6dc003b34ee67 Mon Sep 17 00:00:00 2001 From: Yan Zhai Date: Thu, 13 Jul 2023 10:28:00 -0700 Subject: gso: fix dodgy bit handling for GSO_UDP_L4 Commit 1fd54773c267 ("udp: allow header check for dodgy GSO_UDP_L4 packets.") checks DODGY bit for UDP, but for packets that can be fed directly to the device after gso_segs reset, it actually falls through to fragmentation: https://lore.kernel.org/all/CAJPywTKDdjtwkLVUW6LRA2FU912qcDmQOQGt2WaDo28KzYDg+A@mail.gmail.com/ This change restores the expected behavior of GSO_UDP_L4 packets. Fixes: 1fd54773c267 ("udp: allow header check for dodgy GSO_UDP_L4 packets.") Suggested-by: Willem de Bruijn Signed-off-by: Yan Zhai Reviewed-by: Willem de Bruijn Acked-by: Jason Wang Signed-off-by: David S. Miller --- net/ipv4/udp_offload.c | 16 +++++++++++----- net/ipv6/udp_offload.c | 3 +-- 2 files changed, 12 insertions(+), 7 deletions(-) diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index 75aa4de5b731..f402946da344 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -274,13 +274,20 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb, __sum16 check; __be16 newlen; - if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST) - return __udp_gso_segment_list(gso_skb, features, is_ipv6); - mss = skb_shinfo(gso_skb)->gso_size; if (gso_skb->len <= sizeof(*uh) + mss) return ERR_PTR(-EINVAL); + if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) { + /* Packet is from an untrusted source, reset gso_segs. */ + skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh), + mss); + return NULL; + } + + if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST) + return __udp_gso_segment_list(gso_skb, features, is_ipv6); + skb_pull(gso_skb, sizeof(*uh)); /* clear destructor to avoid skb_segment assigning it to tail */ @@ -388,8 +395,7 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, if (!pskb_may_pull(skb, sizeof(struct udphdr))) goto out; - if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 && - !skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) + if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) return __udp_gso_segment(skb, features, false); mss = skb_shinfo(skb)->gso_size; diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c index ad3b8726873e..09fa7a42cb93 100644 --- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -43,8 +43,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, if (!pskb_may_pull(skb, sizeof(struct udphdr))) goto out; - if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 && - !skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) + if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) return __udp_gso_segment(skb, features, true); mss = skb_shinfo(skb)->gso_size; -- cgit v1.2.3 From 24a3298ac9e6bd8de838ab79f7868207170d556d Mon Sep 17 00:00:00 2001 From: Petr Oros Date: Mon, 19 Jun 2023 12:58:13 +0200 Subject: ice: Unregister netdev and devlink_port only once Since commit 6624e780a577fc ("ice: split ice_vsi_setup into smaller functions") ice_vsi_release does things twice. There is unregister netdev which is unregistered in ice_deinit_eth also. It also unregisters the devlink_port twice which is also unregistered in ice_deinit_eth(). This double deregistration is hidden because devl_port_unregister ignores the return value of xa_erase. [ 68.642167] Call Trace: [ 68.650385] ice_devlink_destroy_pf_port+0xe/0x20 [ice] [ 68.655656] ice_vsi_release+0x445/0x690 [ice] [ 68.660147] ice_deinit+0x99/0x280 [ice] [ 68.664117] ice_remove+0x1b6/0x5c0 [ice] [ 171.103841] Call Trace: [ 171.109607] ice_devlink_destroy_pf_port+0xf/0x20 [ice] [ 171.114841] ice_remove+0x158/0x270 [ice] [ 171.118854] pci_device_remove+0x3b/0xc0 [ 171.122779] device_release_driver_internal+0xc7/0x170 [ 171.127912] driver_detach+0x54/0x8c [ 171.131491] bus_remove_driver+0x77/0xd1 [ 171.135406] pci_unregister_driver+0x2d/0xb0 [ 171.139670] ice_module_exit+0xc/0x55f [ice] Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Signed-off-by: Petr Oros Reviewed-by: Maciej Fijalkowski Tested-by: Pucha Himasekhar Reddy (A Contingent worker at Intel) Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_lib.c | 27 --------------------------- 1 file changed, 27 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 00e3afd507a4..0054d7e64ec3 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -2972,39 +2972,12 @@ int ice_vsi_release(struct ice_vsi *vsi) return -ENODEV; pf = vsi->back; - /* do not unregister while driver is in the reset recovery pending - * state. Since reset/rebuild happens through PF service task workqueue, - * it's not a good idea to unregister netdev that is associated to the - * PF that is running the work queue items currently. This is done to - * avoid check_flush_dependency() warning on this wq - */ - if (vsi->netdev && !ice_is_reset_in_progress(pf->state) && - (test_bit(ICE_VSI_NETDEV_REGISTERED, vsi->state))) { - unregister_netdev(vsi->netdev); - clear_bit(ICE_VSI_NETDEV_REGISTERED, vsi->state); - } - - if (vsi->type == ICE_VSI_PF) - ice_devlink_destroy_pf_port(pf); - if (test_bit(ICE_FLAG_RSS_ENA, pf->flags)) ice_rss_clean(vsi); ice_vsi_close(vsi); ice_vsi_decfg(vsi); - if (vsi->netdev) { - if (test_bit(ICE_VSI_NETDEV_REGISTERED, vsi->state)) { - unregister_netdev(vsi->netdev); - clear_bit(ICE_VSI_NETDEV_REGISTERED, vsi->state); - } - if (test_bit(ICE_VSI_NETDEV_ALLOCD, vsi->state)) { - free_netdev(vsi->netdev); - vsi->netdev = NULL; - clear_bit(ICE_VSI_NETDEV_ALLOCD, vsi->state); - } - } - /* retain SW VSI data structure since it is needed to unregister and * free VSI netdev when PF is not in reset recovery pending state,\ * for ex: during rmmod. -- cgit v1.2.3 From b3e7b3a6ee92ab927f750a6b19615ce88ece808f Mon Sep 17 00:00:00 2001 From: Michal Swiatkowski Date: Thu, 6 Jul 2023 08:25:51 +0200 Subject: ice: prevent NULL pointer deref during reload Calling ethtool during reload can lead to call trace, because VSI isn't configured for some time, but netdev is alive. To fix it add rtnl lock for VSI deconfig and config. Set ::num_q_vectors to 0 after freeing and add a check for ::tx/rx_rings in ring related ethtool ops. Add proper unroll of filters in ice_start_eth(). Reproduction: $watch -n 0.1 -d 'ethtool -g enp24s0f0np0' $devlink dev reload pci/0000:18:00.0 action driver_reinit Call trace before fix: [66303.926205] BUG: kernel NULL pointer dereference, address: 0000000000000000 [66303.926259] #PF: supervisor read access in kernel mode [66303.926286] #PF: error_code(0x0000) - not-present page [66303.926311] PGD 0 P4D 0 [66303.926332] Oops: 0000 [#1] PREEMPT SMP PTI [66303.926358] CPU: 4 PID: 933821 Comm: ethtool Kdump: loaded Tainted: G OE 6.4.0-rc5+ #1 [66303.926400] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.00.01.0014.070920180847 07/09/2018 [66303.926446] RIP: 0010:ice_get_ringparam+0x22/0x50 [ice] [66303.926649] Code: 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 87 c0 09 00 00 c7 46 04 e0 1f 00 00 c7 46 10 e0 1f 00 00 48 8b 50 20 <48> 8b 12 0f b7 52 3a 89 56 14 48 8b 40 28 48 8b 00 0f b7 40 58 48 [66303.926722] RSP: 0018:ffffad40472f39c8 EFLAGS: 00010246 [66303.926749] RAX: ffff98a8ada05828 RBX: ffff98a8c46dd060 RCX: ffffad40472f3b48 [66303.926781] RDX: 0000000000000000 RSI: ffff98a8c46dd068 RDI: ffff98a8b23c4000 [66303.926811] RBP: ffffad40472f3b48 R08: 00000000000337b0 R09: 0000000000000000 [66303.926843] R10: 0000000000000001 R11: 0000000000000100 R12: ffff98a8b23c4000 [66303.926874] R13: ffff98a8c46dd060 R14: 000000000000000f R15: ffffad40472f3a50 [66303.926906] FS: 00007f6397966740(0000) GS:ffff98b390900000(0000) knlGS:0000000000000000 [66303.926941] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [66303.926967] CR2: 0000000000000000 CR3: 000000011ac20002 CR4: 00000000007706e0 [66303.926999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [66303.927029] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [66303.927060] PKRU: 55555554 [66303.927075] Call Trace: [66303.927094] [66303.927111] ? __die+0x23/0x70 [66303.927140] ? page_fault_oops+0x171/0x4e0 [66303.927176] ? exc_page_fault+0x7f/0x180 [66303.927209] ? asm_exc_page_fault+0x26/0x30 [66303.927244] ? ice_get_ringparam+0x22/0x50 [ice] [66303.927433] rings_prepare_data+0x62/0x80 [66303.927469] ethnl_default_doit+0xe2/0x350 [66303.927501] genl_family_rcv_msg_doit.isra.0+0xe3/0x140 [66303.927538] genl_rcv_msg+0x1b1/0x2c0 [66303.927561] ? __pfx_ethnl_default_doit+0x10/0x10 [66303.927590] ? __pfx_genl_rcv_msg+0x10/0x10 [66303.927615] netlink_rcv_skb+0x58/0x110 [66303.927644] genl_rcv+0x28/0x40 [66303.927665] netlink_unicast+0x19e/0x290 [66303.927691] netlink_sendmsg+0x254/0x4d0 [66303.927717] sock_sendmsg+0x93/0xa0 [66303.927743] __sys_sendto+0x126/0x170 [66303.927780] __x64_sys_sendto+0x24/0x30 [66303.928593] do_syscall_64+0x5d/0x90 [66303.929370] ? __count_memcg_events+0x60/0xa0 [66303.930146] ? count_memcg_events.constprop.0+0x1a/0x30 [66303.930920] ? handle_mm_fault+0x9e/0x350 [66303.931688] ? do_user_addr_fault+0x258/0x740 [66303.932452] ? exc_page_fault+0x7f/0x180 [66303.933193] entry_SYSCALL_64_after_hwframe+0x72/0xdc Fixes: 5b246e533d01 ("ice: split probe into smaller functions") Reviewed-by: Przemek Kitszel Signed-off-by: Michal Swiatkowski Reviewed-by: Simon Horman Tested-by: Pucha Himasekhar Reddy (A Contingent worker at Intel) Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/ice/ice_base.c | 2 ++ drivers/net/ethernet/intel/ice/ice_ethtool.c | 13 +++++++++++-- drivers/net/ethernet/intel/ice/ice_main.c | 10 ++++++++-- 3 files changed, 21 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index 4a12316f7b46..b678bdf96f3a 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -800,6 +800,8 @@ void ice_vsi_free_q_vectors(struct ice_vsi *vsi) ice_for_each_q_vector(vsi, v_idx) ice_free_q_vector(vsi, v_idx); + + vsi->num_q_vectors = 0; } /** diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c index 8d5cbbd0b3d5..ad4d4702129f 100644 --- a/drivers/net/ethernet/intel/ice/ice_ethtool.c +++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c @@ -2681,8 +2681,13 @@ ice_get_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring, ring->rx_max_pending = ICE_MAX_NUM_DESC; ring->tx_max_pending = ICE_MAX_NUM_DESC; - ring->rx_pending = vsi->rx_rings[0]->count; - ring->tx_pending = vsi->tx_rings[0]->count; + if (vsi->tx_rings && vsi->rx_rings) { + ring->rx_pending = vsi->rx_rings[0]->count; + ring->tx_pending = vsi->tx_rings[0]->count; + } else { + ring->rx_pending = 0; + ring->tx_pending = 0; + } /* Rx mini and jumbo rings are not supported */ ring->rx_mini_max_pending = 0; @@ -2716,6 +2721,10 @@ ice_set_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring, return -EINVAL; } + /* Return if there is no rings (device is reloading) */ + if (!vsi->tx_rings || !vsi->rx_rings) + return -EBUSY; + new_tx_cnt = ALIGN(ring->tx_pending, ICE_REQ_DESC_MULTIPLE); if (new_tx_cnt != ring->tx_pending) netdev_info(netdev, "Requested Tx descriptor count rounded up to %d\n", diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 19a5e7f3a075..f02d44455772 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -4430,9 +4430,9 @@ static int ice_start_eth(struct ice_vsi *vsi) if (err) return err; - rtnl_lock(); err = ice_vsi_open(vsi); - rtnl_unlock(); + if (err) + ice_fltr_remove_all(vsi); return err; } @@ -4895,6 +4895,7 @@ int ice_load(struct ice_pf *pf) params = ice_vsi_to_params(vsi); params.flags = ICE_VSI_FLAG_INIT; + rtnl_lock(); err = ice_vsi_cfg(vsi, ¶ms); if (err) goto err_vsi_cfg; @@ -4902,6 +4903,7 @@ int ice_load(struct ice_pf *pf) err = ice_start_eth(ice_get_main_vsi(pf)); if (err) goto err_start_eth; + rtnl_unlock(); err = ice_init_rdma(pf); if (err) @@ -4916,9 +4918,11 @@ int ice_load(struct ice_pf *pf) err_init_rdma: ice_vsi_close(ice_get_main_vsi(pf)); + rtnl_lock(); err_start_eth: ice_vsi_decfg(ice_get_main_vsi(pf)); err_vsi_cfg: + rtnl_unlock(); ice_deinit_dev(pf); return err; } @@ -4931,8 +4935,10 @@ void ice_unload(struct ice_pf *pf) { ice_deinit_features(pf); ice_deinit_rdma(pf); + rtnl_lock(); ice_stop_eth(ice_get_main_vsi(pf)); ice_vsi_decfg(ice_get_main_vsi(pf)); + rtnl_unlock(); ice_deinit_dev(pf); } -- cgit v1.2.3 From a66557c790208610bdc21c0c788e7b9524b1a356 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 13 Jul 2023 21:51:19 -0700 Subject: net: bonding: remove kernel-doc comment marker Change an errant kernel-doc comment marker (/**) to a regular comment to prevent a kernel-doc warning. bonding.h:282: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Returns NULL if the net_device does not belong to any of the bond's slaves Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Randy Dunlap Cc: Jay Vosburgh Cc: Andy Gospodarek Link: https://lore.kernel.org/r/20230714045127.18752-2-rdunlap@infradead.org Signed-off-by: Jakub Kicinski --- include/net/bonding.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/net/bonding.h b/include/net/bonding.h index b57bec6e737e..30ac427cf0c6 100644 --- a/include/net/bonding.h +++ b/include/net/bonding.h @@ -277,7 +277,7 @@ struct bond_vlan_tag { unsigned short vlan_id; }; -/** +/* * Returns NULL if the net_device does not belong to any of the bond's slaves * * Caller must hold bond lock for read -- cgit v1.2.3 From a63e40444e1bdb2ccdcd7c2e4afa51fdbc6c8589 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 13 Jul 2023 21:51:20 -0700 Subject: net: cfg802154: fix kernel-doc notation warnings Add an enum heading to the kernel-doc comments to prevent kernel-doc warnings. cfg802154.h:174: warning: Cannot understand * @WPAN_PHY_FLAG_TRANSMIT_POWER: Indicates that transceiver will support on line 174 - I thought it was a doc line cfg802154.h:192: warning: Enum value 'WPAN_PHY_FLAG_TXPOWER' not described in enum 'wpan_phy_flags' cfg802154.h:192: warning: Excess enum value 'WPAN_PHY_FLAG_TRANSMIT_POWER' description in 'wpan_phy_flags' Fixes: edea8f7c75ec ("cfg802154: introduce wpan phy flags") Signed-off-by: Randy Dunlap Cc: Alexander Aring Cc: Stefan Schmidt Cc: Marcel Holtmann Acked-by: Miquel Raynal Link: https://lore.kernel.org/r/20230714045127.18752-3-rdunlap@infradead.org Signed-off-by: Jakub Kicinski --- include/net/cfg802154.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h index e00057984489..f79ce133e51a 100644 --- a/include/net/cfg802154.h +++ b/include/net/cfg802154.h @@ -170,7 +170,8 @@ wpan_phy_cca_cmp(const struct wpan_phy_cca *a, const struct wpan_phy_cca *b) } /** - * @WPAN_PHY_FLAG_TRANSMIT_POWER: Indicates that transceiver will support + * enum wpan_phy_flags - WPAN PHY state flags + * @WPAN_PHY_FLAG_TXPOWER: Indicates that transceiver will support * transmit power setting. * @WPAN_PHY_FLAG_CCA_ED_LEVEL: Indicates that transceiver will support cca ed * level setting. -- cgit v1.2.3 From cfe57122bba5dc16ad64b162d97e42923fc7587e Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 13 Jul 2023 21:51:21 -0700 Subject: codel: fix kernel-doc notation warnings Use '@' before the struct member names in kernel-doc notation to prevent kernel-doc warnings. codel.h:158: warning: Function parameter or member 'ecn_mark' not described in 'codel_stats' codel.h:158: warning: Function parameter or member 'ce_mark' not described in 'codel_stats' Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM") Signed-off-by: Randy Dunlap Cc: Jamal Hadi Salim Cc: Cong Wang Cc: Jiri Pirko Cc: Dave Taht Link: https://lore.kernel.org/r/20230714045127.18752-4-rdunlap@infradead.org Signed-off-by: Jakub Kicinski --- include/net/codel.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/net/codel.h b/include/net/codel.h index 5fed2f16cb8d..aa80f744826c 100644 --- a/include/net/codel.h +++ b/include/net/codel.h @@ -145,8 +145,8 @@ struct codel_vars { * @maxpacket: largest packet we've seen so far * @drop_count: temp count of dropped packets in dequeue() * @drop_len: bytes of dropped packets in dequeue() - * ecn_mark: number of packets we ECN marked instead of dropping - * ce_mark: number of packets CE marked because sojourn time was above ce_threshold + * @ecn_mark: number of packets we ECN marked instead of dropping + * @ce_mark: number of packets CE marked because sojourn time was above ce_threshold */ struct codel_stats { u32 maxpacket; -- cgit v1.2.3 From 839f55c5ebdfc2c0825bc7e711a7dedaca11f84e Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 13 Jul 2023 21:51:22 -0700 Subject: devlink: fix kernel-doc notation warnings Spell function or struct member names correctly. Use ':' instead of '-' for struct member entries. Mark one field as private in kernel-doc. Add a few entries that were missing. Fix a typo. These changes prevent kernel-doc warnings: devlink.h:252: warning: Function parameter or member 'field_id' not described in 'devlink_dpipe_match' devlink.h:267: warning: Function parameter or member 'field_id' not described in 'devlink_dpipe_action' devlink.h:310: warning: Function parameter or member 'match_values_count' not described in 'devlink_dpipe_entry' devlink.h:355: warning: Function parameter or member 'list' not described in 'devlink_dpipe_table' devlink.h:374: warning: Function parameter or member 'actions_dump' not described in 'devlink_dpipe_table_ops' devlink.h:374: warning: Function parameter or member 'matches_dump' not described in 'devlink_dpipe_table_ops' devlink.h:374: warning: Function parameter or member 'entries_dump' not described in 'devlink_dpipe_table_ops' devlink.h:374: warning: Function parameter or member 'counters_set_update' not described in 'devlink_dpipe_table_ops' devlink.h:374: warning: Function parameter or member 'size_get' not described in 'devlink_dpipe_table_ops' devlink.h:384: warning: Function parameter or member 'headers' not described in 'devlink_dpipe_headers' devlink.h:384: warning: Function parameter or member 'headers_count' not described in 'devlink_dpipe_headers' devlink.h:398: warning: Function parameter or member 'unit' not described in 'devlink_resource_size_params' devlink.h:487: warning: Function parameter or member 'id' not described in 'devlink_param' devlink.h:645: warning: Function parameter or member 'overwrite_mask' not described in 'devlink_flash_update_params' Fixes: 1555d204e743 ("devlink: Support for pipeline debug (dpipe)") Fixes: d9f9b9a4d05f ("devlink: Add support for resource abstraction") Fixes: eabaef1896bc ("devlink: Add devlink_param register and unregister") Fixes: 5d5b4128c4ca ("devlink: introduce flash update overwrite mask") Signed-off-by: Randy Dunlap Cc: Jiri Pirko Cc: Moshe Shemesh Cc: Jacob Keller Link: https://lore.kernel.org/r/20230714045127.18752-5-rdunlap@infradead.org Signed-off-by: Jakub Kicinski --- include/net/devlink.h | 28 ++++++++++++++++------------ 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/include/net/devlink.h b/include/net/devlink.h index 9a3c51aa6e81..0cdb4b16e5b5 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -221,7 +221,7 @@ struct devlink_dpipe_field { /** * struct devlink_dpipe_header - dpipe header object * @name: header name - * @id: index, global/local detrmined by global bit + * @id: index, global/local determined by global bit * @fields: fields * @fields_count: number of fields * @global: indicates if header is shared like most protocol header @@ -241,7 +241,7 @@ struct devlink_dpipe_header { * @header_index: header index (packets can have several headers of same * type like in case of tunnels) * @header: header - * @fieled_id: field index + * @field_id: field index */ struct devlink_dpipe_match { enum devlink_dpipe_match_type type; @@ -256,7 +256,7 @@ struct devlink_dpipe_match { * @header_index: header index (packets can have several headers of same * type like in case of tunnels) * @header: header - * @fieled_id: field index + * @field_id: field index */ struct devlink_dpipe_action { enum devlink_dpipe_action_type type; @@ -292,7 +292,7 @@ struct devlink_dpipe_value { * struct devlink_dpipe_entry - table entry object * @index: index of the entry in the table * @match_values: match values - * @matche_values_count: count of matches tuples + * @match_values_count: count of matches tuples * @action_values: actions values * @action_values_count: count of actions values * @counter: value of counter @@ -342,7 +342,9 @@ struct devlink_dpipe_table_ops; */ struct devlink_dpipe_table { void *priv; + /* private: */ struct list_head list; + /* public: */ const char *name; bool counters_enabled; bool counter_control_extern; @@ -355,13 +357,13 @@ struct devlink_dpipe_table { /** * struct devlink_dpipe_table_ops - dpipe_table ops - * @actions_dump - dumps all tables actions - * @matches_dump - dumps all tables matches - * @entries_dump - dumps all active entries in the table - * @counters_set_update - when changing the counter status hardware sync + * @actions_dump: dumps all tables actions + * @matches_dump: dumps all tables matches + * @entries_dump: dumps all active entries in the table + * @counters_set_update: when changing the counter status hardware sync * maybe needed to allocate/free counter related * resources - * @size_get - get size + * @size_get: get size */ struct devlink_dpipe_table_ops { int (*actions_dump)(void *priv, struct sk_buff *skb); @@ -374,8 +376,8 @@ struct devlink_dpipe_table_ops { /** * struct devlink_dpipe_headers - dpipe headers - * @headers - header array can be shared (global bit) or driver specific - * @headers_count - count of headers + * @headers: header array can be shared (global bit) or driver specific + * @headers_count: count of headers */ struct devlink_dpipe_headers { struct devlink_dpipe_header **headers; @@ -387,7 +389,7 @@ struct devlink_dpipe_headers { * @size_min: minimum size which can be set * @size_max: maximum size which can be set * @size_granularity: size granularity - * @size_unit: resource's basic unit + * @unit: resource's basic unit */ struct devlink_resource_size_params { u64 size_min; @@ -457,6 +459,7 @@ struct devlink_flash_notify { /** * struct devlink_param - devlink configuration parameter data + * @id: devlink parameter id number * @name: name of the parameter * @generic: indicates if the parameter is generic or driver specific * @type: parameter type @@ -632,6 +635,7 @@ enum devlink_param_generic_id { * struct devlink_flash_update_params - Flash Update parameters * @fw: pointer to the firmware data to update from * @component: the flash component to update + * @overwrite_mask: which types of flash update are supported (may be %0) * * With the exception of fw, drivers must opt-in to parameters by * setting the appropriate bit in the supported_flash_update_params field in -- cgit v1.2.3 From d20909a0689f585fb6443da2d8797f7ad549f931 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 13 Jul 2023 21:51:23 -0700 Subject: inet: frags: eliminate kernel-doc warning Modify the anonymous enum kernel-doc content so that it doesn't cause a kernel-doc warning. inet_frag.h:33: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Fixes: 1ab1934ed80a ("inet: frags: enum the flag definitions and add descriptions") Signed-off-by: Randy Dunlap Cc: Nikolay Aleksandrov Link: https://lore.kernel.org/r/20230714045127.18752-6-rdunlap@infradead.org Signed-off-by: Jakub Kicinski --- include/net/inet_frag.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h index 325ad893f624..153960663ce4 100644 --- a/include/net/inet_frag.h +++ b/include/net/inet_frag.h @@ -29,7 +29,7 @@ struct fqdir { }; /** - * fragment queue flags + * enum: fragment queue flags * * @INET_FRAG_FIRST_IN: first fragment has arrived * @INET_FRAG_LAST_IN: final fragment has arrived -- cgit v1.2.3 From 201a08830d8c4698f97ce65329ba92e36b7f402a Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 13 Jul 2023 21:51:24 -0700 Subject: net: llc: fix kernel-doc notation warnings Use the corrent function parameter name or format to prevent kernel-doc warnings. Add 2 function parameter descriptions to prevent kernel-doc warnings. llc_pdu.h:278: warning: Function parameter or member 'da' not described in 'llc_pdu_decode_da' llc_pdu.h:278: warning: Excess function parameter 'sa' description in 'llc_pdu_decode_da' llc_pdu.h:330: warning: Function parameter or member 'skb' not described in 'llc_pdu_init_as_test_cmd' llc_pdu.h:379: warning: Function parameter or member 'svcs_supported' not described in 'llc_pdu_init_as_xid_cmd' llc_pdu.h:379: warning: Function parameter or member 'rx_window' not described in 'llc_pdu_init_as_xid_cmd' Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Randy Dunlap Link: https://lore.kernel.org/r/20230714045127.18752-7-rdunlap@infradead.org Signed-off-by: Jakub Kicinski --- include/net/llc_pdu.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/include/net/llc_pdu.h b/include/net/llc_pdu.h index 49aa79c7b278..7e73f8e5e497 100644 --- a/include/net/llc_pdu.h +++ b/include/net/llc_pdu.h @@ -269,7 +269,7 @@ static inline void llc_pdu_decode_sa(struct sk_buff *skb, u8 *sa) /** * llc_pdu_decode_da - extracts dest address of input frame * @skb: input skb that destination address must be extracted from it - * @sa: pointer to destination address (6 byte array). + * @da: pointer to destination address (6 byte array). * * This function extracts destination address(MAC) of input frame. */ @@ -321,7 +321,7 @@ static inline void llc_pdu_init_as_ui_cmd(struct sk_buff *skb) /** * llc_pdu_init_as_test_cmd - sets PDU as TEST - * @skb - Address of the skb to build + * @skb: Address of the skb to build * * Sets a PDU as TEST */ @@ -369,6 +369,8 @@ struct llc_xid_info { /** * llc_pdu_init_as_xid_cmd - sets bytes 3, 4 & 5 of LLC header as XID * @skb: input skb that header must be set into it. + * @svcs_supported: The class of the LLC (I or II) + * @rx_window: The size of the receive window of the LLC * * This function sets third,fourth,fifth and sixth bytes of LLC header as * a XID PDU. -- cgit v1.2.3 From d1533d726aa1efca3a7ae8f40c94ccb149d22e6a Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 13 Jul 2023 21:51:25 -0700 Subject: net: NSH: fix kernel-doc notation warning Use the struct member's name and the correct format to prevent a kernel-doc warning. nsh.h:200: warning: Function parameter or member 'context' not described in 'nsh_md1_ctx' Fixes: 1f0b7744c505 ("net: add NSH header structures and helpers") Signed-off-by: Randy Dunlap Cc: Jiri Benc Link: https://lore.kernel.org/r/20230714045127.18752-8-rdunlap@infradead.org Signed-off-by: Jakub Kicinski --- include/net/nsh.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/net/nsh.h b/include/net/nsh.h index 350b1ad11c7f..16a751093896 100644 --- a/include/net/nsh.h +++ b/include/net/nsh.h @@ -192,7 +192,7 @@ /** * struct nsh_md1_ctx - Keeps track of NSH context data - * @nshc<1-4>: NSH Contexts. + * @context: NSH Contexts. */ struct nsh_md1_ctx { __be32 context[4]; -- cgit v1.2.3 From d1cca974548d76e0fa8b6761d25f7b47376a3780 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 13 Jul 2023 21:51:26 -0700 Subject: pie: fix kernel-doc notation warning Spell a struct member's name correctly to prevent a kernel-doc warning. pie.h:38: warning: Function parameter or member 'tupdate' not described in 'pie_params' Fixes: b42a3d7c7cff ("pie: improve comments and commenting style") Signed-off-by: Randy Dunlap Cc: Leslie Monis Cc: "Mohit P. Tahiliani" Cc: Gautam Ramakrishnan Cc: Jamal Hadi Salim Cc: Cong Wang Cc: Jiri Pirko Link: https://lore.kernel.org/r/20230714045127.18752-9-rdunlap@infradead.org Signed-off-by: Jakub Kicinski --- include/net/pie.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/net/pie.h b/include/net/pie.h index 3fe2361e03b4..01cbc66825a4 100644 --- a/include/net/pie.h +++ b/include/net/pie.h @@ -17,7 +17,7 @@ /** * struct pie_params - contains pie parameters * @target: target delay in pschedtime - * @tudpate: interval at which drop probability is calculated + * @tupdate: interval at which drop probability is calculated * @limit: total number of packets that can be in the queue * @alpha: parameter to control drop probability * @beta: parameter to control drop probability -- cgit v1.2.3 From 04be3c95da8266a099c8446b6d1205ccf8a62e66 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Thu, 13 Jul 2023 21:51:27 -0700 Subject: rsi: remove kernel-doc comment marker Change an errant kernel-doc comment marker (/**) to a regular comment to prevent a kernel-doc warning. rsi_91x.h:3: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Copyright (c) 2017 Redpine Signals Inc. Fixes: 4c10d56a76bb ("rsi: add header file rsi_91x") Signed-off-by: Randy Dunlap Cc: Prameela Rani Garnepudi Cc: Siva Rebbagondla Acked-by: Kalle Valo Link: https://lore.kernel.org/r/20230714045127.18752-10-rdunlap@infradead.org Signed-off-by: Jakub Kicinski --- include/net/rsi_91x.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/net/rsi_91x.h b/include/net/rsi_91x.h index 040f07b47f1f..b2283762e446 100644 --- a/include/net/rsi_91x.h +++ b/include/net/rsi_91x.h @@ -1,4 +1,4 @@ -/** +/* * Copyright (c) 2017 Redpine Signals Inc. * * Permission to use, copy, modify, and/or distribute this software for any -- cgit v1.2.3 From b3d0e0489430735e2e7626aa37e6462cdd136e9d Mon Sep 17 00:00:00 2001 From: Victor Nogueira Date: Thu, 13 Jul 2023 15:05:10 -0300 Subject: net: sched: cls_matchall: Undo tcf_bind_filter in case of failure after mall_set_parms In case an error occurred after mall_set_parms executed successfully, we must undo the tcf_bind_filter call it issues. Fix that by calling tcf_unbind_filter in err_replace_hw_filter label. Fixes: ec2507d2a306 ("net/sched: cls_matchall: Fix error path") Signed-off-by: Victor Nogueira Acked-by: Jamal Hadi Salim Reviewed-by: Pedro Tammela Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- net/sched/cls_matchall.c | 35 ++++++++++++----------------------- 1 file changed, 12 insertions(+), 23 deletions(-) diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c index fa3bbd187eb9..c4ed11df6254 100644 --- a/net/sched/cls_matchall.c +++ b/net/sched/cls_matchall.c @@ -159,26 +159,6 @@ static const struct nla_policy mall_policy[TCA_MATCHALL_MAX + 1] = { [TCA_MATCHALL_FLAGS] = { .type = NLA_U32 }, }; -static int mall_set_parms(struct net *net, struct tcf_proto *tp, - struct cls_mall_head *head, - unsigned long base, struct nlattr **tb, - struct nlattr *est, u32 flags, u32 fl_flags, - struct netlink_ext_ack *extack) -{ - int err; - - err = tcf_exts_validate_ex(net, tp, tb, est, &head->exts, flags, - fl_flags, extack); - if (err < 0) - return err; - - if (tb[TCA_MATCHALL_CLASSID]) { - head->res.classid = nla_get_u32(tb[TCA_MATCHALL_CLASSID]); - tcf_bind_filter(tp, &head->res, base); - } - return 0; -} - static int mall_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, struct nlattr **tca, @@ -187,6 +167,7 @@ static int mall_change(struct net *net, struct sk_buff *in_skb, { struct cls_mall_head *head = rtnl_dereference(tp->root); struct nlattr *tb[TCA_MATCHALL_MAX + 1]; + bool bound_to_filter = false; struct cls_mall_head *new; u32 userflags = 0; int err; @@ -226,11 +207,17 @@ static int mall_change(struct net *net, struct sk_buff *in_skb, goto err_alloc_percpu; } - err = mall_set_parms(net, tp, new, base, tb, tca[TCA_RATE], - flags, new->flags, extack); - if (err) + err = tcf_exts_validate_ex(net, tp, tb, tca[TCA_RATE], + &new->exts, flags, new->flags, extack); + if (err < 0) goto err_set_parms; + if (tb[TCA_MATCHALL_CLASSID]) { + new->res.classid = nla_get_u32(tb[TCA_MATCHALL_CLASSID]); + tcf_bind_filter(tp, &new->res, base); + bound_to_filter = true; + } + if (!tc_skip_hw(new->flags)) { err = mall_replace_hw_filter(tp, new, (unsigned long)new, extack); @@ -246,6 +233,8 @@ static int mall_change(struct net *net, struct sk_buff *in_skb, return 0; err_replace_hw_filter: + if (bound_to_filter) + tcf_unbind_filter(tp, &new->res); err_set_parms: free_percpu(new->pf); err_alloc_percpu: -- cgit v1.2.3 From 9cb36faedeafb9720ac236aeae2ea57091d90a09 Mon Sep 17 00:00:00 2001 From: Victor Nogueira Date: Thu, 13 Jul 2023 15:05:11 -0300 Subject: net: sched: cls_u32: Undo tcf_bind_filter if u32_replace_hw_knode When u32_replace_hw_knode fails, we need to undo the tcf_bind_filter operation done at u32_set_parms. Fixes: d34e3e181395 ("net: cls_u32: Add support for skip-sw flag to tc u32 classifier.") Signed-off-by: Victor Nogueira Acked-by: Jamal Hadi Salim Reviewed-by: Pedro Tammela Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- net/sched/cls_u32.c | 41 ++++++++++++++++++++++++++++++----------- 1 file changed, 30 insertions(+), 11 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index d15d50de7980..ed358466d042 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -712,8 +712,23 @@ static const struct nla_policy u32_policy[TCA_U32_MAX + 1] = { [TCA_U32_FLAGS] = { .type = NLA_U32 }, }; +static void u32_unbind_filter(struct tcf_proto *tp, struct tc_u_knode *n, + struct nlattr **tb) +{ + if (tb[TCA_U32_CLASSID]) + tcf_unbind_filter(tp, &n->res); +} + +static void u32_bind_filter(struct tcf_proto *tp, struct tc_u_knode *n, + unsigned long base, struct nlattr **tb) +{ + if (tb[TCA_U32_CLASSID]) { + n->res.classid = nla_get_u32(tb[TCA_U32_CLASSID]); + tcf_bind_filter(tp, &n->res, base); + } +} + static int u32_set_parms(struct net *net, struct tcf_proto *tp, - unsigned long base, struct tc_u_knode *n, struct nlattr **tb, struct nlattr *est, u32 flags, u32 fl_flags, struct netlink_ext_ack *extack) @@ -760,10 +775,6 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp, if (ht_old) ht_old->refcnt--; } - if (tb[TCA_U32_CLASSID]) { - n->res.classid = nla_get_u32(tb[TCA_U32_CLASSID]); - tcf_bind_filter(tp, &n->res, base); - } if (ifindex >= 0) n->ifindex = ifindex; @@ -903,17 +914,20 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, if (!new) return -ENOMEM; - err = u32_set_parms(net, tp, base, new, tb, - tca[TCA_RATE], flags, new->flags, - extack); + err = u32_set_parms(net, tp, new, tb, tca[TCA_RATE], + flags, new->flags, extack); if (err) { __u32_destroy_key(new); return err; } + u32_bind_filter(tp, new, base, tb); + err = u32_replace_hw_knode(tp, new, flags, extack); if (err) { + u32_unbind_filter(tp, new, tb); + __u32_destroy_key(new); return err; } @@ -1074,15 +1088,18 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, } #endif - err = u32_set_parms(net, tp, base, n, tb, tca[TCA_RATE], + err = u32_set_parms(net, tp, n, tb, tca[TCA_RATE], flags, n->flags, extack); + + u32_bind_filter(tp, n, base, tb); + if (err == 0) { struct tc_u_knode __rcu **ins; struct tc_u_knode *pins; err = u32_replace_hw_knode(tp, n, flags, extack); if (err) - goto errhw; + goto errunbind; if (!tc_in_hw(n->flags)) n->flags |= TCA_CLS_FLAGS_NOT_IN_HW; @@ -1100,7 +1117,9 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, return 0; } -errhw: +errunbind: + u32_unbind_filter(tp, n, tb); + #ifdef CONFIG_CLS_U32_MARK free_percpu(n->pcpu_success); #endif -- cgit v1.2.3 From e8d3d78c19be0264a5692bed477c303523aead31 Mon Sep 17 00:00:00 2001 From: Victor Nogueira Date: Thu, 13 Jul 2023 15:05:12 -0300 Subject: net: sched: cls_u32: Undo refcount decrement in case update failed In the case of an update, when TCA_U32_LINK is set, u32_set_parms will decrement the refcount of the ht_down (struct tc_u_hnode) pointer present in the older u32 filter which we are replacing. However, if u32_replace_hw_knode errors out, the update command fails and that ht_down pointer continues decremented. To fix that, when u32_replace_hw_knode fails, check if ht_down's refcount was decremented and undo the decrement. Fixes: d34e3e181395 ("net: cls_u32: Add support for skip-sw flag to tc u32 classifier.") Signed-off-by: Victor Nogueira Acked-by: Jamal Hadi Salim Reviewed-by: Pedro Tammela Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- net/sched/cls_u32.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index ed358466d042..5abf31e432ca 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -928,6 +928,13 @@ static int u32_change(struct net *net, struct sk_buff *in_skb, if (err) { u32_unbind_filter(tp, new, tb); + if (tb[TCA_U32_LINK]) { + struct tc_u_hnode *ht_old; + + ht_old = rtnl_dereference(n->ht_down); + if (ht_old) + ht_old->refcnt++; + } __u32_destroy_key(new); return err; } -- cgit v1.2.3 From 26a22194927e8521e304ed75c2f38d8068d55fc7 Mon Sep 17 00:00:00 2001 From: Victor Nogueira Date: Thu, 13 Jul 2023 15:05:13 -0300 Subject: net: sched: cls_bpf: Undo tcf_bind_filter in case of an error If cls_bpf_offload errors out, we must also undo tcf_bind_filter that was done before the error. Fix that by calling tcf_unbind_filter in errout_parms. Fixes: eadb41489fd2 ("net: cls_bpf: add support for marking filters as hardware-only") Signed-off-by: Victor Nogueira Acked-by: Jamal Hadi Salim Reviewed-by: Pedro Tammela Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- net/sched/cls_bpf.c | 99 +++++++++++++++++++++++++---------------------------- 1 file changed, 47 insertions(+), 52 deletions(-) diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c index 466c26df853a..382c7a71f81f 100644 --- a/net/sched/cls_bpf.c +++ b/net/sched/cls_bpf.c @@ -406,56 +406,6 @@ static int cls_bpf_prog_from_efd(struct nlattr **tb, struct cls_bpf_prog *prog, return 0; } -static int cls_bpf_set_parms(struct net *net, struct tcf_proto *tp, - struct cls_bpf_prog *prog, unsigned long base, - struct nlattr **tb, struct nlattr *est, u32 flags, - struct netlink_ext_ack *extack) -{ - bool is_bpf, is_ebpf, have_exts = false; - u32 gen_flags = 0; - int ret; - - is_bpf = tb[TCA_BPF_OPS_LEN] && tb[TCA_BPF_OPS]; - is_ebpf = tb[TCA_BPF_FD]; - if ((!is_bpf && !is_ebpf) || (is_bpf && is_ebpf)) - return -EINVAL; - - ret = tcf_exts_validate(net, tp, tb, est, &prog->exts, flags, - extack); - if (ret < 0) - return ret; - - if (tb[TCA_BPF_FLAGS]) { - u32 bpf_flags = nla_get_u32(tb[TCA_BPF_FLAGS]); - - if (bpf_flags & ~TCA_BPF_FLAG_ACT_DIRECT) - return -EINVAL; - - have_exts = bpf_flags & TCA_BPF_FLAG_ACT_DIRECT; - } - if (tb[TCA_BPF_FLAGS_GEN]) { - gen_flags = nla_get_u32(tb[TCA_BPF_FLAGS_GEN]); - if (gen_flags & ~CLS_BPF_SUPPORTED_GEN_FLAGS || - !tc_flags_valid(gen_flags)) - return -EINVAL; - } - - prog->exts_integrated = have_exts; - prog->gen_flags = gen_flags; - - ret = is_bpf ? cls_bpf_prog_from_ops(tb, prog) : - cls_bpf_prog_from_efd(tb, prog, gen_flags, tp); - if (ret < 0) - return ret; - - if (tb[TCA_BPF_CLASSID]) { - prog->res.classid = nla_get_u32(tb[TCA_BPF_CLASSID]); - tcf_bind_filter(tp, &prog->res, base); - } - - return 0; -} - static int cls_bpf_change(struct net *net, struct sk_buff *in_skb, struct tcf_proto *tp, unsigned long base, u32 handle, struct nlattr **tca, @@ -463,9 +413,12 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb, struct netlink_ext_ack *extack) { struct cls_bpf_head *head = rtnl_dereference(tp->root); + bool is_bpf, is_ebpf, have_exts = false; struct cls_bpf_prog *oldprog = *arg; struct nlattr *tb[TCA_BPF_MAX + 1]; + bool bound_to_filter = false; struct cls_bpf_prog *prog; + u32 gen_flags = 0; int ret; if (tca[TCA_OPTIONS] == NULL) @@ -504,11 +457,51 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb, goto errout; prog->handle = handle; - ret = cls_bpf_set_parms(net, tp, prog, base, tb, tca[TCA_RATE], flags, - extack); + is_bpf = tb[TCA_BPF_OPS_LEN] && tb[TCA_BPF_OPS]; + is_ebpf = tb[TCA_BPF_FD]; + if ((!is_bpf && !is_ebpf) || (is_bpf && is_ebpf)) { + ret = -EINVAL; + goto errout_idr; + } + + ret = tcf_exts_validate(net, tp, tb, tca[TCA_RATE], &prog->exts, + flags, extack); + if (ret < 0) + goto errout_idr; + + if (tb[TCA_BPF_FLAGS]) { + u32 bpf_flags = nla_get_u32(tb[TCA_BPF_FLAGS]); + + if (bpf_flags & ~TCA_BPF_FLAG_ACT_DIRECT) { + ret = -EINVAL; + goto errout_idr; + } + + have_exts = bpf_flags & TCA_BPF_FLAG_ACT_DIRECT; + } + if (tb[TCA_BPF_FLAGS_GEN]) { + gen_flags = nla_get_u32(tb[TCA_BPF_FLAGS_GEN]); + if (gen_flags & ~CLS_BPF_SUPPORTED_GEN_FLAGS || + !tc_flags_valid(gen_flags)) { + ret = -EINVAL; + goto errout_idr; + } + } + + prog->exts_integrated = have_exts; + prog->gen_flags = gen_flags; + + ret = is_bpf ? cls_bpf_prog_from_ops(tb, prog) : + cls_bpf_prog_from_efd(tb, prog, gen_flags, tp); if (ret < 0) goto errout_idr; + if (tb[TCA_BPF_CLASSID]) { + prog->res.classid = nla_get_u32(tb[TCA_BPF_CLASSID]); + tcf_bind_filter(tp, &prog->res, base); + bound_to_filter = true; + } + ret = cls_bpf_offload(tp, prog, oldprog, extack); if (ret) goto errout_parms; @@ -530,6 +523,8 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb, return 0; errout_parms: + if (bound_to_filter) + tcf_unbind_filter(tp, &prog->res); cls_bpf_free_parms(prog); errout_idr: if (!oldprog) -- cgit v1.2.3 From ac177a330077f264664f56259038e121bb214bec Mon Sep 17 00:00:00 2001 From: Victor Nogueira Date: Thu, 13 Jul 2023 15:05:14 -0300 Subject: net: sched: cls_flower: Undo tcf_bind_filter in case of an error If TCA_FLOWER_CLASSID is specified in the netlink message, the code will call tcf_bind_filter. However, if any error occurs after that, the code should undo this by calling tcf_unbind_filter. Fixes: 77b9900ef53a ("tc: introduce Flower classifier") Signed-off-by: Victor Nogueira Acked-by: Jamal Hadi Salim Reviewed-by: Pedro Tammela Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- net/sched/cls_flower.c | 99 ++++++++++++++++++++++++-------------------------- 1 file changed, 47 insertions(+), 52 deletions(-) diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index f2b0bc4142fe..8da9d039d964 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -2173,53 +2173,6 @@ static bool fl_needs_tc_skb_ext(const struct fl_flow_key *mask) return mask->meta.l2_miss; } -static int fl_set_parms(struct net *net, struct tcf_proto *tp, - struct cls_fl_filter *f, struct fl_flow_mask *mask, - unsigned long base, struct nlattr **tb, - struct nlattr *est, - struct fl_flow_tmplt *tmplt, - u32 flags, u32 fl_flags, - struct netlink_ext_ack *extack) -{ - int err; - - err = tcf_exts_validate_ex(net, tp, tb, est, &f->exts, flags, - fl_flags, extack); - if (err < 0) - return err; - - if (tb[TCA_FLOWER_CLASSID]) { - f->res.classid = nla_get_u32(tb[TCA_FLOWER_CLASSID]); - if (flags & TCA_ACT_FLAGS_NO_RTNL) - rtnl_lock(); - tcf_bind_filter(tp, &f->res, base); - if (flags & TCA_ACT_FLAGS_NO_RTNL) - rtnl_unlock(); - } - - err = fl_set_key(net, tb, &f->key, &mask->key, extack); - if (err) - return err; - - fl_mask_update_range(mask); - fl_set_masked_key(&f->mkey, &f->key, mask); - - if (!fl_mask_fits_tmplt(tmplt, mask)) { - NL_SET_ERR_MSG_MOD(extack, "Mask does not fit the template"); - return -EINVAL; - } - - /* Enable tc skb extension if filter matches on data extracted from - * this extension. - */ - if (fl_needs_tc_skb_ext(&mask->key)) { - f->needs_tc_skb_ext = 1; - tc_skb_ext_tc_enable(); - } - - return 0; -} - static int fl_ht_insert_unique(struct cls_fl_filter *fnew, struct cls_fl_filter *fold, bool *in_ht) @@ -2251,6 +2204,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb, struct cls_fl_head *head = fl_head_dereference(tp); bool rtnl_held = !(flags & TCA_ACT_FLAGS_NO_RTNL); struct cls_fl_filter *fold = *arg; + bool bound_to_filter = false; struct cls_fl_filter *fnew; struct fl_flow_mask *mask; struct nlattr **tb; @@ -2335,15 +2289,46 @@ static int fl_change(struct net *net, struct sk_buff *in_skb, if (err < 0) goto errout_idr; - err = fl_set_parms(net, tp, fnew, mask, base, tb, tca[TCA_RATE], - tp->chain->tmplt_priv, flags, fnew->flags, - extack); - if (err) + err = tcf_exts_validate_ex(net, tp, tb, tca[TCA_RATE], + &fnew->exts, flags, fnew->flags, + extack); + if (err < 0) goto errout_idr; + if (tb[TCA_FLOWER_CLASSID]) { + fnew->res.classid = nla_get_u32(tb[TCA_FLOWER_CLASSID]); + if (flags & TCA_ACT_FLAGS_NO_RTNL) + rtnl_lock(); + tcf_bind_filter(tp, &fnew->res, base); + if (flags & TCA_ACT_FLAGS_NO_RTNL) + rtnl_unlock(); + bound_to_filter = true; + } + + err = fl_set_key(net, tb, &fnew->key, &mask->key, extack); + if (err) + goto unbind_filter; + + fl_mask_update_range(mask); + fl_set_masked_key(&fnew->mkey, &fnew->key, mask); + + if (!fl_mask_fits_tmplt(tp->chain->tmplt_priv, mask)) { + NL_SET_ERR_MSG_MOD(extack, "Mask does not fit the template"); + err = -EINVAL; + goto unbind_filter; + } + + /* Enable tc skb extension if filter matches on data extracted from + * this extension. + */ + if (fl_needs_tc_skb_ext(&mask->key)) { + fnew->needs_tc_skb_ext = 1; + tc_skb_ext_tc_enable(); + } + err = fl_check_assign_mask(head, fnew, fold, mask); if (err) - goto errout_idr; + goto unbind_filter; err = fl_ht_insert_unique(fnew, fold, &in_ht); if (err) @@ -2434,6 +2419,16 @@ errout_hw: fnew->mask->filter_ht_params); errout_mask: fl_mask_put(head, fnew->mask); + +unbind_filter: + if (bound_to_filter) { + if (flags & TCA_ACT_FLAGS_NO_RTNL) + rtnl_lock(); + tcf_unbind_filter(tp, &fnew->res); + if (flags & TCA_ACT_FLAGS_NO_RTNL) + rtnl_unlock(); + } + errout_idr: if (!fold) idr_remove(&head->handle_idr, fnew->handle); -- cgit v1.2.3 From 4bdf79d686b49ac49373b36466acfb93972c7d7c Mon Sep 17 00:00:00 2001 From: Tristram Ha Date: Thu, 13 Jul 2023 17:46:22 -0700 Subject: net: dsa: microchip: correct KSZ8795 static MAC table access The KSZ8795 driver code was modified to use on KSZ8863/73, which has different register definitions. Some of the new KSZ8795 register information are wrong compared to previous code. KSZ8795 also behaves differently in that the STATIC_MAC_TABLE_USE_FID and STATIC_MAC_TABLE_FID bits are off by 1 when doing MAC table reading than writing. To compensate that a special code was added to shift the register value by 1 before applying those bits. This is wrong when the code is running on KSZ8863, so this special code is only executed when KSZ8795 is detected. Fixes: 4b20a07e103f ("net: dsa: microchip: ksz8795: add support for ksz88xx chips") Signed-off-by: Tristram Ha Reviewed-by: Horatiu Vultur Reviewed-by: Simon Horman Signed-off-by: David S. Miller --- drivers/net/dsa/microchip/ksz8795.c | 8 +++++++- drivers/net/dsa/microchip/ksz_common.c | 8 ++++---- drivers/net/dsa/microchip/ksz_common.h | 7 +++++++ 3 files changed, 18 insertions(+), 5 deletions(-) diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c index 84d502589f8e..91aba470fb2f 100644 --- a/drivers/net/dsa/microchip/ksz8795.c +++ b/drivers/net/dsa/microchip/ksz8795.c @@ -506,7 +506,13 @@ static int ksz8_r_sta_mac_table(struct ksz_device *dev, u16 addr, (data_hi & masks[STATIC_MAC_TABLE_FWD_PORTS]) >> shifts[STATIC_MAC_FWD_PORTS]; alu->is_override = (data_hi & masks[STATIC_MAC_TABLE_OVERRIDE]) ? 1 : 0; - data_hi >>= 1; + + /* KSZ8795 family switches have STATIC_MAC_TABLE_USE_FID and + * STATIC_MAC_TABLE_FID definitions off by 1 when doing read on the + * static MAC table compared to doing write. + */ + if (ksz_is_ksz87xx(dev)) + data_hi >>= 1; alu->is_static = true; alu->is_use_fid = (data_hi & masks[STATIC_MAC_TABLE_USE_FID]) ? 1 : 0; alu->fid = (data_hi & masks[STATIC_MAC_TABLE_FID]) >> diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c index 813b91a816bb..b18cd170ec06 100644 --- a/drivers/net/dsa/microchip/ksz_common.c +++ b/drivers/net/dsa/microchip/ksz_common.c @@ -331,13 +331,13 @@ static const u32 ksz8795_masks[] = { [STATIC_MAC_TABLE_VALID] = BIT(21), [STATIC_MAC_TABLE_USE_FID] = BIT(23), [STATIC_MAC_TABLE_FID] = GENMASK(30, 24), - [STATIC_MAC_TABLE_OVERRIDE] = BIT(26), - [STATIC_MAC_TABLE_FWD_PORTS] = GENMASK(24, 20), + [STATIC_MAC_TABLE_OVERRIDE] = BIT(22), + [STATIC_MAC_TABLE_FWD_PORTS] = GENMASK(20, 16), [DYNAMIC_MAC_TABLE_ENTRIES_H] = GENMASK(6, 0), - [DYNAMIC_MAC_TABLE_MAC_EMPTY] = BIT(8), + [DYNAMIC_MAC_TABLE_MAC_EMPTY] = BIT(7), [DYNAMIC_MAC_TABLE_NOT_READY] = BIT(7), [DYNAMIC_MAC_TABLE_ENTRIES] = GENMASK(31, 29), - [DYNAMIC_MAC_TABLE_FID] = GENMASK(26, 20), + [DYNAMIC_MAC_TABLE_FID] = GENMASK(22, 16), [DYNAMIC_MAC_TABLE_SRC_PORT] = GENMASK(26, 24), [DYNAMIC_MAC_TABLE_TIMESTAMP] = GENMASK(28, 27), [P_MII_TX_FLOW_CTRL] = BIT(5), diff --git a/drivers/net/dsa/microchip/ksz_common.h b/drivers/net/dsa/microchip/ksz_common.h index 28444e5924f9..a4de58847dea 100644 --- a/drivers/net/dsa/microchip/ksz_common.h +++ b/drivers/net/dsa/microchip/ksz_common.h @@ -601,6 +601,13 @@ static inline void ksz_regmap_unlock(void *__mtx) mutex_unlock(mtx); } +static inline bool ksz_is_ksz87xx(struct ksz_device *dev) +{ + return dev->chip_id == KSZ8795_CHIP_ID || + dev->chip_id == KSZ8794_CHIP_ID || + dev->chip_id == KSZ8765_CHIP_ID; +} + static inline bool ksz_is_ksz88x3(struct ksz_device *dev) { return dev->chip_id == KSZ8830_CHIP_ID; -- cgit v1.2.3 From 162d626f3013215b82b6514ca14f20932c7ccce5 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Fri, 14 Jul 2023 07:39:36 +0200 Subject: r8169: fix ASPM-related problem for chip version 42 and 43 Referenced commit missed that for chip versions 42 and 43 ASPM remained disabled in the respective rtl_hw_start_...() routines. This resulted in problems as described in the referenced bug ticket. Therefore re-instantiate the previous logic. Fixes: 5fc3f6c90cca ("r8169: consolidate disabling ASPM before EPHY access") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217635 Signed-off-by: Heiner Kallweit Signed-off-by: David S. Miller --- drivers/net/ethernet/realtek/r8169_main.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 9445f04f8d48..fce4a2b908c2 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -2747,6 +2747,13 @@ static void rtl_hw_aspm_clkreq_enable(struct rtl8169_private *tp, bool enable) return; if (enable) { + /* On these chip versions ASPM can even harm + * bus communication of other PCI devices. + */ + if (tp->mac_version == RTL_GIGA_MAC_VER_42 || + tp->mac_version == RTL_GIGA_MAC_VER_43) + return; + rtl_mod_config5(tp, 0, ASPM_en); rtl_mod_config2(tp, 0, ClkReqEn); -- cgit v1.2.3 From ee8b94c8510ce64afe0b87ef548d23e00915fb10 Mon Sep 17 00:00:00 2001 From: Ziyang Xuan Date: Tue, 11 Jul 2023 09:17:37 +0800 Subject: can: raw: fix receiver memory leak Got kmemleak errors with the following ltp can_filter testcase: for ((i=1; i<=100; i++)) do ./can_filter & sleep 0.1 done ============================================================== [<00000000db4a4943>] can_rx_register+0x147/0x360 [can] [<00000000a289549d>] raw_setsockopt+0x5ef/0x853 [can_raw] [<000000006d3d9ebd>] __sys_setsockopt+0x173/0x2c0 [<00000000407dbfec>] __x64_sys_setsockopt+0x61/0x70 [<00000000fd468496>] do_syscall_64+0x33/0x40 [<00000000b7e47d51>] entry_SYSCALL_64_after_hwframe+0x61/0xc6 It's a bug in the concurrent scenario of unregister_netdevice_many() and raw_release() as following: cpu0 cpu1 unregister_netdevice_many(can_dev) unlist_netdevice(can_dev) // dev_get_by_index() return NULL after this net_set_todo(can_dev) raw_release(can_socket) dev = dev_get_by_index(, ro->ifindex); // dev == NULL if (dev) { // receivers in dev_rcv_lists not free because dev is NULL raw_disable_allfilters(, dev, ); dev_put(dev); } ... ro->bound = 0; ... call_netdevice_notifiers(NETDEV_UNREGISTER, ) raw_notify(, NETDEV_UNREGISTER, ) if (ro->bound) // invalid because ro->bound has been set 0 raw_disable_allfilters(, dev, ); // receivers in dev_rcv_lists will never be freed Add a net_device pointer member in struct raw_sock to record bound can_dev, and use rtnl_lock to serialize raw_socket members between raw_bind(), raw_release(), raw_setsockopt() and raw_notify(). Use ro->dev to decide whether to free receivers in dev_rcv_lists. Fixes: 8d0caedb7596 ("can: bcm/raw/isotp: use per module netdevice notifier") Reviewed-by: Oliver Hartkopp Acked-by: Oliver Hartkopp Signed-off-by: Ziyang Xuan Link: https://lore.kernel.org/all/20230711011737.1969582-1-william.xuanziyang@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde --- net/can/raw.c | 57 ++++++++++++++++++++++++--------------------------------- 1 file changed, 24 insertions(+), 33 deletions(-) diff --git a/net/can/raw.c b/net/can/raw.c index 15c79b079184..2302e4882967 100644 --- a/net/can/raw.c +++ b/net/can/raw.c @@ -84,6 +84,7 @@ struct raw_sock { struct sock sk; int bound; int ifindex; + struct net_device *dev; struct list_head notifier; int loopback; int recv_own_msgs; @@ -277,7 +278,7 @@ static void raw_notify(struct raw_sock *ro, unsigned long msg, if (!net_eq(dev_net(dev), sock_net(sk))) return; - if (ro->ifindex != dev->ifindex) + if (ro->dev != dev) return; switch (msg) { @@ -292,6 +293,7 @@ static void raw_notify(struct raw_sock *ro, unsigned long msg, ro->ifindex = 0; ro->bound = 0; + ro->dev = NULL; ro->count = 0; release_sock(sk); @@ -337,6 +339,7 @@ static int raw_init(struct sock *sk) ro->bound = 0; ro->ifindex = 0; + ro->dev = NULL; /* set default filter to single entry dfilter */ ro->dfilter.can_id = 0; @@ -385,19 +388,13 @@ static int raw_release(struct socket *sock) lock_sock(sk); + rtnl_lock(); /* remove current filters & unregister */ if (ro->bound) { - if (ro->ifindex) { - struct net_device *dev; - - dev = dev_get_by_index(sock_net(sk), ro->ifindex); - if (dev) { - raw_disable_allfilters(dev_net(dev), dev, sk); - dev_put(dev); - } - } else { + if (ro->dev) + raw_disable_allfilters(dev_net(ro->dev), ro->dev, sk); + else raw_disable_allfilters(sock_net(sk), NULL, sk); - } } if (ro->count > 1) @@ -405,8 +402,10 @@ static int raw_release(struct socket *sock) ro->ifindex = 0; ro->bound = 0; + ro->dev = NULL; ro->count = 0; free_percpu(ro->uniq); + rtnl_unlock(); sock_orphan(sk); sock->sk = NULL; @@ -422,6 +421,7 @@ static int raw_bind(struct socket *sock, struct sockaddr *uaddr, int len) struct sockaddr_can *addr = (struct sockaddr_can *)uaddr; struct sock *sk = sock->sk; struct raw_sock *ro = raw_sk(sk); + struct net_device *dev = NULL; int ifindex; int err = 0; int notify_enetdown = 0; @@ -431,14 +431,13 @@ static int raw_bind(struct socket *sock, struct sockaddr *uaddr, int len) if (addr->can_family != AF_CAN) return -EINVAL; + rtnl_lock(); lock_sock(sk); if (ro->bound && addr->can_ifindex == ro->ifindex) goto out; if (addr->can_ifindex) { - struct net_device *dev; - dev = dev_get_by_index(sock_net(sk), addr->can_ifindex); if (!dev) { err = -ENODEV; @@ -467,26 +466,20 @@ static int raw_bind(struct socket *sock, struct sockaddr *uaddr, int len) if (!err) { if (ro->bound) { /* unregister old filters */ - if (ro->ifindex) { - struct net_device *dev; - - dev = dev_get_by_index(sock_net(sk), - ro->ifindex); - if (dev) { - raw_disable_allfilters(dev_net(dev), - dev, sk); - dev_put(dev); - } - } else { + if (ro->dev) + raw_disable_allfilters(dev_net(ro->dev), + ro->dev, sk); + else raw_disable_allfilters(sock_net(sk), NULL, sk); - } } ro->ifindex = ifindex; ro->bound = 1; + ro->dev = dev; } out: release_sock(sk); + rtnl_unlock(); if (notify_enetdown) { sk->sk_err = ENETDOWN; @@ -553,9 +546,9 @@ static int raw_setsockopt(struct socket *sock, int level, int optname, rtnl_lock(); lock_sock(sk); - if (ro->bound && ro->ifindex) { - dev = dev_get_by_index(sock_net(sk), ro->ifindex); - if (!dev) { + dev = ro->dev; + if (ro->bound && dev) { + if (dev->reg_state != NETREG_REGISTERED) { if (count > 1) kfree(filter); err = -ENODEV; @@ -596,7 +589,6 @@ static int raw_setsockopt(struct socket *sock, int level, int optname, ro->count = count; out_fil: - dev_put(dev); release_sock(sk); rtnl_unlock(); @@ -614,9 +606,9 @@ static int raw_setsockopt(struct socket *sock, int level, int optname, rtnl_lock(); lock_sock(sk); - if (ro->bound && ro->ifindex) { - dev = dev_get_by_index(sock_net(sk), ro->ifindex); - if (!dev) { + dev = ro->dev; + if (ro->bound && dev) { + if (dev->reg_state != NETREG_REGISTERED) { err = -ENODEV; goto out_err; } @@ -640,7 +632,6 @@ static int raw_setsockopt(struct socket *sock, int level, int optname, ro->err_mask = err_mask; out_err: - dev_put(dev); release_sock(sk); rtnl_unlock(); -- cgit v1.2.3 From 55c3b96074f3f9b0aee19bf93cd71af7516582bb Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Sat, 15 Jul 2023 17:25:43 +0800 Subject: can: bcm: Fix UAF in bcm_proc_show() BUG: KASAN: slab-use-after-free in bcm_proc_show+0x969/0xa80 Read of size 8 at addr ffff888155846230 by task cat/7862 CPU: 1 PID: 7862 Comm: cat Not tainted 6.5.0-rc1-00153-gc8746099c197 #230 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: dump_stack_lvl+0xd5/0x150 print_report+0xc1/0x5e0 kasan_report+0xba/0xf0 bcm_proc_show+0x969/0xa80 seq_read_iter+0x4f6/0x1260 seq_read+0x165/0x210 proc_reg_read+0x227/0x300 vfs_read+0x1d5/0x8d0 ksys_read+0x11e/0x240 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd Allocated by task 7846: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x9e/0xa0 bcm_sendmsg+0x264b/0x44e0 sock_sendmsg+0xda/0x180 ____sys_sendmsg+0x735/0x920 ___sys_sendmsg+0x11d/0x1b0 __sys_sendmsg+0xfa/0x1d0 do_syscall_64+0x35/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 7846: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x27/0x40 ____kasan_slab_free+0x161/0x1c0 slab_free_freelist_hook+0x119/0x220 __kmem_cache_free+0xb4/0x2e0 rcu_core+0x809/0x1bd0 bcm_op is freed before procfs entry be removed in bcm_release(), this lead to bcm_proc_show() may read the freed bcm_op. Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol") Signed-off-by: YueHaibing Reviewed-by: Oliver Hartkopp Acked-by: Oliver Hartkopp Link: https://lore.kernel.org/all/20230715092543.15548-1-yuehaibing@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde --- net/can/bcm.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/net/can/bcm.c b/net/can/bcm.c index 9ba35685b043..9168114fc87f 100644 --- a/net/can/bcm.c +++ b/net/can/bcm.c @@ -1526,6 +1526,12 @@ static int bcm_release(struct socket *sock) lock_sock(sk); +#if IS_ENABLED(CONFIG_PROC_FS) + /* remove procfs entry */ + if (net->can.bcmproc_dir && bo->bcm_proc_read) + remove_proc_entry(bo->procname, net->can.bcmproc_dir); +#endif /* CONFIG_PROC_FS */ + list_for_each_entry_safe(op, next, &bo->tx_ops, list) bcm_remove_op(op); @@ -1561,12 +1567,6 @@ static int bcm_release(struct socket *sock) list_for_each_entry_safe(op, next, &bo->rx_ops, list) bcm_remove_op(op); -#if IS_ENABLED(CONFIG_PROC_FS) - /* remove procfs entry */ - if (net->can.bcmproc_dir && bo->bcm_proc_read) - remove_proc_entry(bo->procname, net->can.bcmproc_dir); -#endif /* CONFIG_PROC_FS */ - /* remove device reference */ if (bo->bound) { bo->bound = 0; -- cgit v1.2.3 From 2603be9e8167ddc7bea95dcfab9ffc33414215aa Mon Sep 17 00:00:00 2001 From: Marc Kleine-Budde Date: Fri, 7 Jul 2023 13:43:10 +0200 Subject: can: gs_usb: gs_can_open(): improve error handling The gs_usb driver handles USB devices with more than 1 CAN channel. The RX path for all channels share the same bulk endpoint (the transmitted bulk data encodes the channel number). These per-device resources are allocated and submitted by the first opened channel. During this allocation, the resources are either released immediately in case of a failure or the URBs are anchored. All anchored URBs are finally killed with gs_usb_disconnect(). Currently, gs_can_open() returns with an error if the allocation of a URB or a buffer fails. However, if usb_submit_urb() fails, the driver continues with the URBs submitted so far, even if no URBs were successfully submitted. Treat every error as fatal and free all allocated resources immediately. Switch to goto-style error handling, to prepare the driver for more per-device resource allocation. Cc: stable@vger.kernel.org Cc: John Whittington Link: https://lore.kernel.org/all/20230716-gs_usb-fix-time-stamp-counter-v1-1-9017cefcd9d5@pengutronix.de Signed-off-by: Marc Kleine-Budde --- drivers/net/can/usb/gs_usb.c | 31 ++++++++++++++++++++++--------- 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c index d476c2884008..85b7b59c8426 100644 --- a/drivers/net/can/usb/gs_usb.c +++ b/drivers/net/can/usb/gs_usb.c @@ -833,6 +833,7 @@ static int gs_can_open(struct net_device *netdev) .mode = cpu_to_le32(GS_CAN_MODE_START), }; struct gs_host_frame *hf; + struct urb *urb = NULL; u32 ctrlmode; u32 flags = 0; int rc, i; @@ -856,13 +857,14 @@ static int gs_can_open(struct net_device *netdev) if (!parent->active_channels) { for (i = 0; i < GS_MAX_RX_URBS; i++) { - struct urb *urb; u8 *buf; /* alloc rx urb */ urb = usb_alloc_urb(0, GFP_KERNEL); - if (!urb) - return -ENOMEM; + if (!urb) { + rc = -ENOMEM; + goto out_usb_kill_anchored_urbs; + } /* alloc rx buffer */ buf = kmalloc(dev->parent->hf_size_rx, @@ -870,8 +872,8 @@ static int gs_can_open(struct net_device *netdev) if (!buf) { netdev_err(netdev, "No memory left for USB buffer\n"); - usb_free_urb(urb); - return -ENOMEM; + rc = -ENOMEM; + goto out_usb_free_urb; } /* fill, anchor, and submit rx urb */ @@ -894,9 +896,7 @@ static int gs_can_open(struct net_device *netdev) netdev_err(netdev, "usb_submit failed (err=%d)\n", rc); - usb_unanchor_urb(urb); - usb_free_urb(urb); - break; + goto out_usb_unanchor_urb; } /* Drop reference, @@ -945,7 +945,8 @@ static int gs_can_open(struct net_device *netdev) if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) gs_usb_timestamp_stop(dev); dev->can.state = CAN_STATE_STOPPED; - return rc; + + goto out_usb_kill_anchored_urbs; } parent->active_channels++; @@ -953,6 +954,18 @@ static int gs_can_open(struct net_device *netdev) netif_start_queue(netdev); return 0; + +out_usb_unanchor_urb: + usb_unanchor_urb(urb); +out_usb_free_urb: + usb_free_urb(urb); +out_usb_kill_anchored_urbs: + if (!parent->active_channels) + usb_kill_anchored_urbs(&dev->tx_submitted); + + close_candev(netdev); + + return rc; } static int gs_usb_get_state(const struct net_device *netdev, -- cgit v1.2.3 From 5886e4d5ecec3e22844efed90b2dd383ef804b3a Mon Sep 17 00:00:00 2001 From: Marc Kleine-Budde Date: Fri, 7 Jul 2023 18:44:23 +0200 Subject: can: gs_usb: fix time stamp counter initialization If the gs_usb device driver is unloaded (or unbound) before the interface is shut down, the USB stack first calls the struct usb_driver::disconnect and then the struct net_device_ops::ndo_stop callback. In gs_usb_disconnect() all pending bulk URBs are killed, i.e. no more RX'ed CAN frames are send from the USB device to the host. Later in gs_can_close() a reset control message is send to each CAN channel to remove the controller from the CAN bus. In this race window the USB device can still receive CAN frames from the bus and internally queue them to be send to the host. At least in the current version of the candlelight firmware, the queue of received CAN frames is not emptied during the reset command. After loading (or binding) the gs_usb driver, new URBs are submitted during the struct net_device_ops::ndo_open callback and the candlelight firmware starts sending its already queued CAN frames to the host. However, this scenario was not considered when implementing the hardware timestamp function. The cycle counter/time counter infrastructure is set up (gs_usb_timestamp_init()) after the USBs are submitted, resulting in a NULL pointer dereference if timecounter_cyc2time() (via the call chain: gs_usb_receive_bulk_callback() -> gs_usb_set_timestamp() -> gs_usb_skb_set_timestamp()) is called too early. Move the gs_usb_timestamp_init() function before the URBs are submitted to fix this problem. For a comprehensive solution, we need to consider gs_usb devices with more than 1 channel. The cycle counter/time counter infrastructure is setup per channel, but the RX URBs are per device. Once gs_can_open() of _a_ channel has been called, and URBs have been submitted, the gs_usb_receive_bulk_callback() can be called for _all_ available channels, even for channels that are not running, yet. As cycle counter/time counter has not set up, this will again lead to a NULL pointer dereference. Convert the cycle counter/time counter from a "per channel" to a "per device" functionality. Also set it up, before submitting any URBs to the device. Further in gs_usb_receive_bulk_callback(), don't process any URBs for not started CAN channels, only resubmit the URB. Fixes: 45dfa45f52e6 ("can: gs_usb: add RX and TX hardware timestamp support") Closes: https://github.com/candle-usb/candleLight_fw/issues/137#issuecomment-1623532076 Cc: stable@vger.kernel.org Cc: John Whittington Link: https://lore.kernel.org/all/20230716-gs_usb-fix-time-stamp-counter-v1-2-9017cefcd9d5@pengutronix.de Signed-off-by: Marc Kleine-Budde --- drivers/net/can/usb/gs_usb.c | 101 +++++++++++++++++++++++-------------------- 1 file changed, 53 insertions(+), 48 deletions(-) diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c index 85b7b59c8426..f418066569fc 100644 --- a/drivers/net/can/usb/gs_usb.c +++ b/drivers/net/can/usb/gs_usb.c @@ -303,12 +303,6 @@ struct gs_can { struct can_bittiming_const bt_const, data_bt_const; unsigned int channel; /* channel number */ - /* time counter for hardware timestamps */ - struct cyclecounter cc; - struct timecounter tc; - spinlock_t tc_lock; /* spinlock to guard access tc->cycle_last */ - struct delayed_work timestamp; - u32 feature; unsigned int hf_size_tx; @@ -325,6 +319,13 @@ struct gs_usb { struct gs_can *canch[GS_MAX_INTF]; struct usb_anchor rx_submitted; struct usb_device *udev; + + /* time counter for hardware timestamps */ + struct cyclecounter cc; + struct timecounter tc; + spinlock_t tc_lock; /* spinlock to guard access tc->cycle_last */ + struct delayed_work timestamp; + unsigned int hf_size_rx; u8 active_channels; }; @@ -388,15 +389,15 @@ static int gs_cmd_reset(struct gs_can *dev) GFP_KERNEL); } -static inline int gs_usb_get_timestamp(const struct gs_can *dev, +static inline int gs_usb_get_timestamp(const struct gs_usb *parent, u32 *timestamp_p) { __le32 timestamp; int rc; - rc = usb_control_msg_recv(dev->udev, 0, GS_USB_BREQ_TIMESTAMP, + rc = usb_control_msg_recv(parent->udev, 0, GS_USB_BREQ_TIMESTAMP, USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_INTERFACE, - dev->channel, 0, + 0, 0, ×tamp, sizeof(timestamp), USB_CTRL_GET_TIMEOUT, GFP_KERNEL); @@ -410,20 +411,20 @@ static inline int gs_usb_get_timestamp(const struct gs_can *dev, static u64 gs_usb_timestamp_read(const struct cyclecounter *cc) __must_hold(&dev->tc_lock) { - struct gs_can *dev = container_of(cc, struct gs_can, cc); + struct gs_usb *parent = container_of(cc, struct gs_usb, cc); u32 timestamp = 0; int err; - lockdep_assert_held(&dev->tc_lock); + lockdep_assert_held(&parent->tc_lock); /* drop lock for synchronous USB transfer */ - spin_unlock_bh(&dev->tc_lock); - err = gs_usb_get_timestamp(dev, ×tamp); - spin_lock_bh(&dev->tc_lock); + spin_unlock_bh(&parent->tc_lock); + err = gs_usb_get_timestamp(parent, ×tamp); + spin_lock_bh(&parent->tc_lock); if (err) - netdev_err(dev->netdev, - "Error %d while reading timestamp. HW timestamps may be inaccurate.", - err); + dev_err(&parent->udev->dev, + "Error %d while reading timestamp. HW timestamps may be inaccurate.", + err); return timestamp; } @@ -431,14 +432,14 @@ static u64 gs_usb_timestamp_read(const struct cyclecounter *cc) __must_hold(&dev static void gs_usb_timestamp_work(struct work_struct *work) { struct delayed_work *delayed_work = to_delayed_work(work); - struct gs_can *dev; + struct gs_usb *parent; - dev = container_of(delayed_work, struct gs_can, timestamp); - spin_lock_bh(&dev->tc_lock); - timecounter_read(&dev->tc); - spin_unlock_bh(&dev->tc_lock); + parent = container_of(delayed_work, struct gs_usb, timestamp); + spin_lock_bh(&parent->tc_lock); + timecounter_read(&parent->tc); + spin_unlock_bh(&parent->tc_lock); - schedule_delayed_work(&dev->timestamp, + schedule_delayed_work(&parent->timestamp, GS_USB_TIMESTAMP_WORK_DELAY_SEC * HZ); } @@ -446,37 +447,38 @@ static void gs_usb_skb_set_timestamp(struct gs_can *dev, struct sk_buff *skb, u32 timestamp) { struct skb_shared_hwtstamps *hwtstamps = skb_hwtstamps(skb); + struct gs_usb *parent = dev->parent; u64 ns; - spin_lock_bh(&dev->tc_lock); - ns = timecounter_cyc2time(&dev->tc, timestamp); - spin_unlock_bh(&dev->tc_lock); + spin_lock_bh(&parent->tc_lock); + ns = timecounter_cyc2time(&parent->tc, timestamp); + spin_unlock_bh(&parent->tc_lock); hwtstamps->hwtstamp = ns_to_ktime(ns); } -static void gs_usb_timestamp_init(struct gs_can *dev) +static void gs_usb_timestamp_init(struct gs_usb *parent) { - struct cyclecounter *cc = &dev->cc; + struct cyclecounter *cc = &parent->cc; cc->read = gs_usb_timestamp_read; cc->mask = CYCLECOUNTER_MASK(32); cc->shift = 32 - bits_per(NSEC_PER_SEC / GS_USB_TIMESTAMP_TIMER_HZ); cc->mult = clocksource_hz2mult(GS_USB_TIMESTAMP_TIMER_HZ, cc->shift); - spin_lock_init(&dev->tc_lock); - spin_lock_bh(&dev->tc_lock); - timecounter_init(&dev->tc, &dev->cc, ktime_get_real_ns()); - spin_unlock_bh(&dev->tc_lock); + spin_lock_init(&parent->tc_lock); + spin_lock_bh(&parent->tc_lock); + timecounter_init(&parent->tc, &parent->cc, ktime_get_real_ns()); + spin_unlock_bh(&parent->tc_lock); - INIT_DELAYED_WORK(&dev->timestamp, gs_usb_timestamp_work); - schedule_delayed_work(&dev->timestamp, + INIT_DELAYED_WORK(&parent->timestamp, gs_usb_timestamp_work); + schedule_delayed_work(&parent->timestamp, GS_USB_TIMESTAMP_WORK_DELAY_SEC * HZ); } -static void gs_usb_timestamp_stop(struct gs_can *dev) +static void gs_usb_timestamp_stop(struct gs_usb *parent) { - cancel_delayed_work_sync(&dev->timestamp); + cancel_delayed_work_sync(&parent->timestamp); } static void gs_update_state(struct gs_can *dev, struct can_frame *cf) @@ -560,6 +562,9 @@ static void gs_usb_receive_bulk_callback(struct urb *urb) if (!netif_device_present(netdev)) return; + if (!netif_running(netdev)) + goto resubmit_urb; + if (hf->echo_id == -1) { /* normal rx */ if (hf->flags & GS_CAN_FLAG_FD) { skb = alloc_canfd_skb(dev->netdev, &cfd); @@ -856,6 +861,9 @@ static int gs_can_open(struct net_device *netdev) } if (!parent->active_channels) { + if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) + gs_usb_timestamp_init(parent); + for (i = 0; i < GS_MAX_RX_URBS; i++) { u8 *buf; @@ -926,13 +934,9 @@ static int gs_can_open(struct net_device *netdev) flags |= GS_CAN_MODE_FD; /* if hardware supports timestamps, enable it */ - if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) { + if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) flags |= GS_CAN_MODE_HW_TIMESTAMP; - /* start polling timestamp */ - gs_usb_timestamp_init(dev); - } - /* finally start device */ dev->can.state = CAN_STATE_ERROR_ACTIVE; dm.flags = cpu_to_le32(flags); @@ -942,8 +946,6 @@ static int gs_can_open(struct net_device *netdev) GFP_KERNEL); if (rc) { netdev_err(netdev, "Couldn't start device (err=%d)\n", rc); - if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) - gs_usb_timestamp_stop(dev); dev->can.state = CAN_STATE_STOPPED; goto out_usb_kill_anchored_urbs; @@ -960,9 +962,13 @@ out_usb_unanchor_urb: out_usb_free_urb: usb_free_urb(urb); out_usb_kill_anchored_urbs: - if (!parent->active_channels) + if (!parent->active_channels) { usb_kill_anchored_urbs(&dev->tx_submitted); + if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) + gs_usb_timestamp_stop(parent); + } + close_candev(netdev); return rc; @@ -1011,14 +1017,13 @@ static int gs_can_close(struct net_device *netdev) netif_stop_queue(netdev); - /* stop polling timestamp */ - if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) - gs_usb_timestamp_stop(dev); - /* Stop polling */ parent->active_channels--; if (!parent->active_channels) { usb_kill_anchored_urbs(&parent->rx_submitted); + + if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) + gs_usb_timestamp_stop(parent); } /* Stop sending URBs */ -- cgit v1.2.3 From 5f4fa1672d98fe99d2297b03add35346f1685d6b Mon Sep 17 00:00:00 2001 From: Ding Hui Date: Tue, 9 May 2023 19:11:47 +0800 Subject: iavf: Fix use-after-free in free_netdev We do netif_napi_add() for all allocated q_vectors[], but potentially do netif_napi_del() for part of them, then kfree q_vectors and leave invalid pointers at dev->napi_list. Reproducer: [root@host ~]# cat repro.sh #!/bin/bash pf_dbsf="0000:41:00.0" vf0_dbsf="0000:41:02.0" g_pids=() function do_set_numvf() { echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs sleep $((RANDOM%3+1)) echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs sleep $((RANDOM%3+1)) } function do_set_channel() { local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/) [ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; } ifconfig $nic 192.168.18.5 netmask 255.255.255.0 ifconfig $nic up ethtool -L $nic combined 1 ethtool -L $nic combined 4 sleep $((RANDOM%3)) } function on_exit() { local pid for pid in "${g_pids[@]}"; do kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null done g_pids=() } trap "on_exit; exit" EXIT while :; do do_set_numvf ; done & g_pids+=($!) while :; do do_set_channel ; done & g_pids+=($!) wait Result: [ 4093.900222] ================================================================== [ 4093.900230] BUG: KASAN: use-after-free in free_netdev+0x308/0x390 [ 4093.900232] Read of size 8 at addr ffff88b4dc145640 by task repro.sh/6699 [ 4093.900233] [ 4093.900236] CPU: 10 PID: 6699 Comm: repro.sh Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1 [ 4093.900238] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021 [ 4093.900239] Call Trace: [ 4093.900244] dump_stack+0x71/0xab [ 4093.900249] print_address_description+0x6b/0x290 [ 4093.900251] ? free_netdev+0x308/0x390 [ 4093.900252] kasan_report+0x14a/0x2b0 [ 4093.900254] free_netdev+0x308/0x390 [ 4093.900261] iavf_remove+0x825/0xd20 [iavf] [ 4093.900265] pci_device_remove+0xa8/0x1f0 [ 4093.900268] device_release_driver_internal+0x1c6/0x460 [ 4093.900271] pci_stop_bus_device+0x101/0x150 [ 4093.900273] pci_stop_and_remove_bus_device+0xe/0x20 [ 4093.900275] pci_iov_remove_virtfn+0x187/0x420 [ 4093.900277] ? pci_iov_add_virtfn+0xe10/0xe10 [ 4093.900278] ? pci_get_subsys+0x90/0x90 [ 4093.900280] sriov_disable+0xed/0x3e0 [ 4093.900282] ? bus_find_device+0x12d/0x1a0 [ 4093.900290] i40e_free_vfs+0x754/0x1210 [i40e] [ 4093.900298] ? i40e_reset_all_vfs+0x880/0x880 [i40e] [ 4093.900299] ? pci_get_device+0x7c/0x90 [ 4093.900300] ? pci_get_subsys+0x90/0x90 [ 4093.900306] ? pci_vfs_assigned.part.7+0x144/0x210 [ 4093.900309] ? __mutex_lock_slowpath+0x10/0x10 [ 4093.900315] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e] [ 4093.900318] sriov_numvfs_store+0x214/0x290 [ 4093.900320] ? sriov_totalvfs_show+0x30/0x30 [ 4093.900321] ? __mutex_lock_slowpath+0x10/0x10 [ 4093.900323] ? __check_object_size+0x15a/0x350 [ 4093.900326] kernfs_fop_write+0x280/0x3f0 [ 4093.900329] vfs_write+0x145/0x440 [ 4093.900330] ksys_write+0xab/0x160 [ 4093.900332] ? __ia32_sys_read+0xb0/0xb0 [ 4093.900334] ? fput_many+0x1a/0x120 [ 4093.900335] ? filp_close+0xf0/0x130 [ 4093.900338] do_syscall_64+0xa0/0x370 [ 4093.900339] ? page_fault+0x8/0x30 [ 4093.900341] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 4093.900357] RIP: 0033:0x7f16ad4d22c0 [ 4093.900359] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24 [ 4093.900360] RSP: 002b:00007ffd6491b7f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 4093.900362] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f16ad4d22c0 [ 4093.900363] RDX: 0000000000000002 RSI: 0000000001a41408 RDI: 0000000000000001 [ 4093.900364] RBP: 0000000001a41408 R08: 00007f16ad7a1780 R09: 00007f16ae1f2700 [ 4093.900364] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ 4093.900365] R13: 0000000000000001 R14: 00007f16ad7a0620 R15: 0000000000000001 [ 4093.900367] [ 4093.900368] Allocated by task 820: [ 4093.900371] kasan_kmalloc+0xa6/0xd0 [ 4093.900373] __kmalloc+0xfb/0x200 [ 4093.900376] iavf_init_interrupt_scheme+0x63b/0x1320 [iavf] [ 4093.900380] iavf_watchdog_task+0x3d51/0x52c0 [iavf] [ 4093.900382] process_one_work+0x56a/0x11f0 [ 4093.900383] worker_thread+0x8f/0xf40 [ 4093.900384] kthread+0x2a0/0x390 [ 4093.900385] ret_from_fork+0x1f/0x40 [ 4093.900387] 0xffffffffffffffff [ 4093.900387] [ 4093.900388] Freed by task 6699: [ 4093.900390] __kasan_slab_free+0x137/0x190 [ 4093.900391] kfree+0x8b/0x1b0 [ 4093.900394] iavf_free_q_vectors+0x11d/0x1a0 [iavf] [ 4093.900397] iavf_remove+0x35a/0xd20 [iavf] [ 4093.900399] pci_device_remove+0xa8/0x1f0 [ 4093.900400] device_release_driver_internal+0x1c6/0x460 [ 4093.900401] pci_stop_bus_device+0x101/0x150 [ 4093.900402] pci_stop_and_remove_bus_device+0xe/0x20 [ 4093.900403] pci_iov_remove_virtfn+0x187/0x420 [ 4093.900404] sriov_disable+0xed/0x3e0 [ 4093.900409] i40e_free_vfs+0x754/0x1210 [i40e] [ 4093.900415] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e] [ 4093.900416] sriov_numvfs_store+0x214/0x290 [ 4093.900417] kernfs_fop_write+0x280/0x3f0 [ 4093.900418] vfs_write+0x145/0x440 [ 4093.900419] ksys_write+0xab/0x160 [ 4093.900420] do_syscall_64+0xa0/0x370 [ 4093.900421] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 4093.900422] 0xffffffffffffffff [ 4093.900422] [ 4093.900424] The buggy address belongs to the object at ffff88b4dc144200 which belongs to the cache kmalloc-8k of size 8192 [ 4093.900425] The buggy address is located 5184 bytes inside of 8192-byte region [ffff88b4dc144200, ffff88b4dc146200) [ 4093.900425] The buggy address belongs to the page: [ 4093.900427] page:ffffea00d3705000 refcount:1 mapcount:0 mapping:ffff88bf04415c80 index:0x0 compound_mapcount: 0 [ 4093.900430] flags: 0x10000000008100(slab|head) [ 4093.900433] raw: 0010000000008100 dead000000000100 dead000000000200 ffff88bf04415c80 [ 4093.900434] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000 [ 4093.900434] page dumped because: kasan: bad access detected [ 4093.900435] [ 4093.900435] Memory state around the buggy address: [ 4093.900436] ffff88b4dc145500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900437] ffff88b4dc145580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900438] >ffff88b4dc145600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900438] ^ [ 4093.900439] ffff88b4dc145680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900440] ffff88b4dc145700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 4093.900440] ================================================================== Although the patch #2 (of 2) can avoid the issue triggered by this repro.sh, there still are other potential risks that if num_active_queues is changed to less than allocated q_vectors[] by unexpected, the mismatched netif_napi_add/del() can also cause UAF. Since we actually call netif_napi_add() for all allocated q_vectors unconditionally in iavf_alloc_q_vectors(), so we should fix it by letting netif_napi_del() match to netif_napi_add(). Fixes: 5eae00c57f5e ("i40evf: main driver core") Signed-off-by: Ding Hui Cc: Donglin Peng Cc: Huang Cun Reviewed-by: Simon Horman Reviewed-by: Madhu Chittim Reviewed-by: Leon Romanovsky Tested-by: Rafal Romanowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/iavf/iavf_main.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index a483eb185c99..89d88c3a7add 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -1828,19 +1828,16 @@ static int iavf_alloc_q_vectors(struct iavf_adapter *adapter) static void iavf_free_q_vectors(struct iavf_adapter *adapter) { int q_idx, num_q_vectors; - int napi_vectors; if (!adapter->q_vectors) return; num_q_vectors = adapter->num_msix_vectors - NONQ_VECS; - napi_vectors = adapter->num_active_queues; for (q_idx = 0; q_idx < num_q_vectors; q_idx++) { struct iavf_q_vector *q_vector = &adapter->q_vectors[q_idx]; - if (q_idx < napi_vectors) - netif_napi_del(&q_vector->napi); + netif_napi_del(&q_vector->napi); } kfree(adapter->q_vectors); adapter->q_vectors = NULL; -- cgit v1.2.3 From 7c4bced3caa749ce468b0c5de711c98476b23a52 Mon Sep 17 00:00:00 2001 From: Ding Hui Date: Tue, 9 May 2023 19:11:48 +0800 Subject: iavf: Fix out-of-bounds when setting channels on remove If we set channels greater during iavf_remove(), and waiting reset done would be timeout, then returned with error but changed num_active_queues directly, that will lead to OOB like the following logs. Because the num_active_queues is greater than tx/rx_rings[] allocated actually. Reproducer: [root@host ~]# cat repro.sh #!/bin/bash pf_dbsf="0000:41:00.0" vf0_dbsf="0000:41:02.0" g_pids=() function do_set_numvf() { echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs sleep $((RANDOM%3+1)) echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs sleep $((RANDOM%3+1)) } function do_set_channel() { local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/) [ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; } ifconfig $nic 192.168.18.5 netmask 255.255.255.0 ifconfig $nic up ethtool -L $nic combined 1 ethtool -L $nic combined 4 sleep $((RANDOM%3)) } function on_exit() { local pid for pid in "${g_pids[@]}"; do kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null done g_pids=() } trap "on_exit; exit" EXIT while :; do do_set_numvf ; done & g_pids+=($!) while :; do do_set_channel ; done & g_pids+=($!) wait Result: [ 3506.152887] iavf 0000:41:02.0: Removing device [ 3510.400799] ================================================================== [ 3510.400820] BUG: KASAN: slab-out-of-bounds in iavf_free_all_tx_resources+0x156/0x160 [iavf] [ 3510.400823] Read of size 8 at addr ffff88b6f9311008 by task repro.sh/55536 [ 3510.400823] [ 3510.400830] CPU: 101 PID: 55536 Comm: repro.sh Kdump: loaded Tainted: G O --------- -t - 4.18.0 #1 [ 3510.400832] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021 [ 3510.400835] Call Trace: [ 3510.400851] dump_stack+0x71/0xab [ 3510.400860] print_address_description+0x6b/0x290 [ 3510.400865] ? iavf_free_all_tx_resources+0x156/0x160 [iavf] [ 3510.400868] kasan_report+0x14a/0x2b0 [ 3510.400873] iavf_free_all_tx_resources+0x156/0x160 [iavf] [ 3510.400880] iavf_remove+0x2b6/0xc70 [iavf] [ 3510.400884] ? iavf_free_all_rx_resources+0x160/0x160 [iavf] [ 3510.400891] ? wait_woken+0x1d0/0x1d0 [ 3510.400895] ? notifier_call_chain+0xc1/0x130 [ 3510.400903] pci_device_remove+0xa8/0x1f0 [ 3510.400910] device_release_driver_internal+0x1c6/0x460 [ 3510.400916] pci_stop_bus_device+0x101/0x150 [ 3510.400919] pci_stop_and_remove_bus_device+0xe/0x20 [ 3510.400924] pci_iov_remove_virtfn+0x187/0x420 [ 3510.400927] ? pci_iov_add_virtfn+0xe10/0xe10 [ 3510.400929] ? pci_get_subsys+0x90/0x90 [ 3510.400932] sriov_disable+0xed/0x3e0 [ 3510.400936] ? bus_find_device+0x12d/0x1a0 [ 3510.400953] i40e_free_vfs+0x754/0x1210 [i40e] [ 3510.400966] ? i40e_reset_all_vfs+0x880/0x880 [i40e] [ 3510.400968] ? pci_get_device+0x7c/0x90 [ 3510.400970] ? pci_get_subsys+0x90/0x90 [ 3510.400982] ? pci_vfs_assigned.part.7+0x144/0x210 [ 3510.400987] ? __mutex_lock_slowpath+0x10/0x10 [ 3510.400996] i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e] [ 3510.401001] sriov_numvfs_store+0x214/0x290 [ 3510.401005] ? sriov_totalvfs_show+0x30/0x30 [ 3510.401007] ? __mutex_lock_slowpath+0x10/0x10 [ 3510.401011] ? __check_object_size+0x15a/0x350 [ 3510.401018] kernfs_fop_write+0x280/0x3f0 [ 3510.401022] vfs_write+0x145/0x440 [ 3510.401025] ksys_write+0xab/0x160 [ 3510.401028] ? __ia32_sys_read+0xb0/0xb0 [ 3510.401031] ? fput_many+0x1a/0x120 [ 3510.401032] ? filp_close+0xf0/0x130 [ 3510.401038] do_syscall_64+0xa0/0x370 [ 3510.401041] ? page_fault+0x8/0x30 [ 3510.401043] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 3510.401073] RIP: 0033:0x7f3a9bb842c0 [ 3510.401079] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24 [ 3510.401080] RSP: 002b:00007ffc05f1fe18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 3510.401083] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f3a9bb842c0 [ 3510.401085] RDX: 0000000000000002 RSI: 0000000002327408 RDI: 0000000000000001 [ 3510.401086] RBP: 0000000002327408 R08: 00007f3a9be53780 R09: 00007f3a9c8a4700 [ 3510.401086] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002 [ 3510.401087] R13: 0000000000000001 R14: 00007f3a9be52620 R15: 0000000000000001 [ 3510.401090] [ 3510.401093] Allocated by task 76795: [ 3510.401098] kasan_kmalloc+0xa6/0xd0 [ 3510.401099] __kmalloc+0xfb/0x200 [ 3510.401104] iavf_init_interrupt_scheme+0x26f/0x1310 [iavf] [ 3510.401108] iavf_watchdog_task+0x1d58/0x4050 [iavf] [ 3510.401114] process_one_work+0x56a/0x11f0 [ 3510.401115] worker_thread+0x8f/0xf40 [ 3510.401117] kthread+0x2a0/0x390 [ 3510.401119] ret_from_fork+0x1f/0x40 [ 3510.401122] 0xffffffffffffffff [ 3510.401123] In timeout handling, we should keep the original num_active_queues and reset num_req_queues to 0. Fixes: 4e5e6b5d9d13 ("iavf: Fix return of set the new channel count") Signed-off-by: Ding Hui Cc: Donglin Peng Cc: Huang Cun Reviewed-by: Leon Romanovsky Tested-by: Rafal Romanowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/iavf/iavf_ethtool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c index 6f171d1d85b7..92443f8e9fbd 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c +++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c @@ -1863,7 +1863,7 @@ static int iavf_set_channels(struct net_device *netdev, } if (i == IAVF_RESET_WAIT_COMPLETE_COUNT) { adapter->flags &= ~IAVF_FLAG_REINIT_ITR_NEEDED; - adapter->num_active_queues = num_req; + adapter->num_req_queues = 0; return -EOPNOTSUPP; } -- cgit v1.2.3 From a77ed5c5b768e9649be240a2d864e5cd9c6a2015 Mon Sep 17 00:00:00 2001 From: Ahmed Zaki Date: Fri, 19 May 2023 15:46:02 -0600 Subject: iavf: use internal state to free traffic IRQs If the system tries to close the netdev while iavf_reset_task() is running, __LINK_STATE_START will be cleared and netif_running() will return false in iavf_reinit_interrupt_scheme(). This will result in iavf_free_traffic_irqs() not being called and a leak as follows: [7632.489326] remove_proc_entry: removing non-empty directory 'irq/999', leaking at least 'iavf-enp24s0f0v0-TxRx-0' [7632.490214] WARNING: CPU: 0 PID: 10 at fs/proc/generic.c:718 remove_proc_entry+0x19b/0x1b0 is shown when pci_disable_msix() is later called. Fix by using the internal adapter state. The traffic IRQs will always exist if state == __IAVF_RUNNING. Fixes: 5b36e8d04b44 ("i40evf: Enable VF to request an alternate queue allocation") Signed-off-by: Ahmed Zaki Tested-by: Rafal Romanowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/iavf/iavf_main.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index 89d88c3a7add..058e19c6f94a 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -1929,15 +1929,16 @@ static void iavf_free_rss(struct iavf_adapter *adapter) /** * iavf_reinit_interrupt_scheme - Reallocate queues and vectors * @adapter: board private structure + * @running: true if adapter->state == __IAVF_RUNNING * * Returns 0 on success, negative on failure **/ -static int iavf_reinit_interrupt_scheme(struct iavf_adapter *adapter) +static int iavf_reinit_interrupt_scheme(struct iavf_adapter *adapter, bool running) { struct net_device *netdev = adapter->netdev; int err; - if (netif_running(netdev)) + if (running) iavf_free_traffic_irqs(adapter); iavf_free_misc_irq(adapter); iavf_reset_interrupt_capability(adapter); @@ -3053,7 +3054,7 @@ continue_reset: if ((adapter->flags & IAVF_FLAG_REINIT_MSIX_NEEDED) || (adapter->flags & IAVF_FLAG_REINIT_ITR_NEEDED)) { - err = iavf_reinit_interrupt_scheme(adapter); + err = iavf_reinit_interrupt_scheme(adapter, running); if (err) goto reset_err; } -- cgit v1.2.3 From c2ed2403f12c74a74a0091ed5d830e72c58406e8 Mon Sep 17 00:00:00 2001 From: Marcin Szycik Date: Mon, 5 Jun 2023 10:52:22 -0400 Subject: iavf: Wait for reset in callbacks which trigger it There was a fail when trying to add the interface to bonding right after changing the MTU on the interface. It was caused by bonding interface unable to open the interface due to interface being in __RESETTING state because of MTU change. Add new reset_waitqueue to indicate that reset has finished. Add waiting for reset to finish in callbacks which trigger hw reset: iavf_set_priv_flags(), iavf_change_mtu() and iavf_set_ringparam(). We use a 5000ms timeout period because on Hyper-V based systems, this operation takes around 3000-4000ms. In normal circumstances, it doesn't take more than 500ms to complete. Add a function iavf_wait_for_reset() to reuse waiting for reset code and use it also in iavf_set_channels(), which already waits for reset. We don't use error handling in iavf_set_channels() as this could cause the device to be in incorrect state if the reset was scheduled but hit timeout or the waitng function was interrupted by a signal. Fixes: 4e5e6b5d9d13 ("iavf: Fix return of set the new channel count") Signed-off-by: Marcin Szycik Co-developed-by: Dawid Wesierski Signed-off-by: Dawid Wesierski Signed-off-by: Sylwester Dziedziuch Signed-off-by: Kamil Maziarz Signed-off-by: Mateusz Palczewski Tested-by: Rafal Romanowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/iavf/iavf.h | 2 + drivers/net/ethernet/intel/iavf/iavf_ethtool.c | 31 ++++++++------- drivers/net/ethernet/intel/iavf/iavf_main.c | 51 ++++++++++++++++++++++++- drivers/net/ethernet/intel/iavf/iavf_virtchnl.c | 1 + 4 files changed, 68 insertions(+), 17 deletions(-) diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h index f80f2735e688..a5cab19eb6a8 100644 --- a/drivers/net/ethernet/intel/iavf/iavf.h +++ b/drivers/net/ethernet/intel/iavf/iavf.h @@ -257,6 +257,7 @@ struct iavf_adapter { struct work_struct adminq_task; struct delayed_work client_task; wait_queue_head_t down_waitqueue; + wait_queue_head_t reset_waitqueue; wait_queue_head_t vc_waitqueue; struct iavf_q_vector *q_vectors; struct list_head vlan_filter_list; @@ -582,4 +583,5 @@ void iavf_add_adv_rss_cfg(struct iavf_adapter *adapter); void iavf_del_adv_rss_cfg(struct iavf_adapter *adapter); struct iavf_mac_filter *iavf_add_filter(struct iavf_adapter *adapter, const u8 *macaddr); +int iavf_wait_for_reset(struct iavf_adapter *adapter); #endif /* _IAVF_H_ */ diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c index 92443f8e9fbd..b7141c2a941d 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c +++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c @@ -484,6 +484,7 @@ static int iavf_set_priv_flags(struct net_device *netdev, u32 flags) { struct iavf_adapter *adapter = netdev_priv(netdev); u32 orig_flags, new_flags, changed_flags; + int ret = 0; u32 i; orig_flags = READ_ONCE(adapter->flags); @@ -533,10 +534,13 @@ static int iavf_set_priv_flags(struct net_device *netdev, u32 flags) if (netif_running(netdev)) { adapter->flags |= IAVF_FLAG_RESET_NEEDED; queue_work(adapter->wq, &adapter->reset_task); + ret = iavf_wait_for_reset(adapter); + if (ret) + netdev_warn(netdev, "Changing private flags timeout or interrupted waiting for reset"); } } - return 0; + return ret; } /** @@ -627,6 +631,7 @@ static int iavf_set_ringparam(struct net_device *netdev, { struct iavf_adapter *adapter = netdev_priv(netdev); u32 new_rx_count, new_tx_count; + int ret = 0; if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending)) return -EINVAL; @@ -673,9 +678,12 @@ static int iavf_set_ringparam(struct net_device *netdev, if (netif_running(netdev)) { adapter->flags |= IAVF_FLAG_RESET_NEEDED; queue_work(adapter->wq, &adapter->reset_task); + ret = iavf_wait_for_reset(adapter); + if (ret) + netdev_warn(netdev, "Changing ring parameters timeout or interrupted waiting for reset"); } - return 0; + return ret; } /** @@ -1830,7 +1838,7 @@ static int iavf_set_channels(struct net_device *netdev, { struct iavf_adapter *adapter = netdev_priv(netdev); u32 num_req = ch->combined_count; - int i; + int ret = 0; if ((adapter->vf_res->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_ADQ) && adapter->num_tc) { @@ -1854,20 +1862,11 @@ static int iavf_set_channels(struct net_device *netdev, adapter->flags |= IAVF_FLAG_REINIT_ITR_NEEDED; iavf_schedule_reset(adapter); - /* wait for the reset is done */ - for (i = 0; i < IAVF_RESET_WAIT_COMPLETE_COUNT; i++) { - msleep(IAVF_RESET_WAIT_MS); - if (adapter->flags & IAVF_FLAG_RESET_PENDING) - continue; - break; - } - if (i == IAVF_RESET_WAIT_COMPLETE_COUNT) { - adapter->flags &= ~IAVF_FLAG_REINIT_ITR_NEEDED; - adapter->num_req_queues = 0; - return -EOPNOTSUPP; - } + ret = iavf_wait_for_reset(adapter); + if (ret) + netdev_warn(netdev, "Changing channel count timeout or interrupted waiting for reset"); - return 0; + return ret; } /** diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index 058e19c6f94a..b89933aa5bfe 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -166,6 +166,45 @@ static struct iavf_adapter *iavf_pdev_to_adapter(struct pci_dev *pdev) return netdev_priv(pci_get_drvdata(pdev)); } +/** + * iavf_is_reset_in_progress - Check if a reset is in progress + * @adapter: board private structure + */ +static bool iavf_is_reset_in_progress(struct iavf_adapter *adapter) +{ + if (adapter->state == __IAVF_RESETTING || + adapter->flags & (IAVF_FLAG_RESET_PENDING | + IAVF_FLAG_RESET_NEEDED)) + return true; + + return false; +} + +/** + * iavf_wait_for_reset - Wait for reset to finish. + * @adapter: board private structure + * + * Returns 0 if reset finished successfully, negative on timeout or interrupt. + */ +int iavf_wait_for_reset(struct iavf_adapter *adapter) +{ + int ret = wait_event_interruptible_timeout(adapter->reset_waitqueue, + !iavf_is_reset_in_progress(adapter), + msecs_to_jiffies(5000)); + + /* If ret < 0 then it means wait was interrupted. + * If ret == 0 then it means we got a timeout while waiting + * for reset to finish. + * If ret > 0 it means reset has finished. + */ + if (ret > 0) + return 0; + else if (ret < 0) + return -EINTR; + else + return -EBUSY; +} + /** * iavf_allocate_dma_mem_d - OS specific memory alloc for shared code * @hw: pointer to the HW structure @@ -3149,6 +3188,7 @@ continue_reset: adapter->flags &= ~IAVF_FLAG_REINIT_ITR_NEEDED; + wake_up(&adapter->reset_waitqueue); mutex_unlock(&adapter->client_lock); mutex_unlock(&adapter->crit_lock); @@ -4313,6 +4353,7 @@ static int iavf_close(struct net_device *netdev) static int iavf_change_mtu(struct net_device *netdev, int new_mtu) { struct iavf_adapter *adapter = netdev_priv(netdev); + int ret = 0; netdev_dbg(netdev, "changing MTU from %d to %d\n", netdev->mtu, new_mtu); @@ -4325,9 +4366,14 @@ static int iavf_change_mtu(struct net_device *netdev, int new_mtu) if (netif_running(netdev)) { adapter->flags |= IAVF_FLAG_RESET_NEEDED; queue_work(adapter->wq, &adapter->reset_task); + ret = iavf_wait_for_reset(adapter); + if (ret < 0) + netdev_warn(netdev, "MTU change interrupted waiting for reset"); + else if (ret) + netdev_warn(netdev, "MTU change timed out waiting for reset"); } - return 0; + return ret; } #define NETIF_VLAN_OFFLOAD_FEATURES (NETIF_F_HW_VLAN_CTAG_RX | \ @@ -4928,6 +4974,9 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) /* Setup the wait queue for indicating transition to down status */ init_waitqueue_head(&adapter->down_waitqueue); + /* Setup the wait queue for indicating transition to running state */ + init_waitqueue_head(&adapter->reset_waitqueue); + /* Setup the wait queue for indicating virtchannel events */ init_waitqueue_head(&adapter->vc_waitqueue); diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c index 7c0578b5457b..1bab896aaf40 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c +++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c @@ -2285,6 +2285,7 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter, case VIRTCHNL_OP_ENABLE_QUEUES: /* enable transmits */ iavf_irq_enable(adapter, true); + wake_up(&adapter->reset_waitqueue); adapter->flags &= ~IAVF_FLAG_QUEUES_DISABLED; break; case VIRTCHNL_OP_DISABLE_QUEUES: -- cgit v1.2.3 From d2806d960e8387945b53edf7ed9d71ab3ab5f073 Mon Sep 17 00:00:00 2001 From: Marcin Szycik Date: Mon, 5 Jun 2023 10:52:23 -0400 Subject: Revert "iavf: Detach device during reset task" This reverts commit aa626da947e9cd30c4cf727493903e1adbb2c0a0. Detaching device during reset was not fully fixing the rtnl locking issue, as there could be a situation where callback was already in progress before detaching netdev. Furthermore, detaching netdevice causes TX timeouts if traffic is running. To reproduce: ip netns exec ns1 iperf3 -c $PEER_IP -t 600 --logfile /dev/null & while :; do for i in 200 7000 400 5000 300 3000 ; do ip netns exec ns1 ip link set $VF1 mtu $i sleep 2 done sleep 10 done Currently, callbacks such as iavf_change_mtu() wait for the reset. If the reset fails to acquire the rtnl_lock, they schedule the netdev update for later while continuing the reset flow. Operations like MTU changes are performed under the rtnl_lock. Therefore, when the operation finishes, another callback that uses rtnl_lock can start. Signed-off-by: Dawid Wesierski Signed-off-by: Marcin Szycik Signed-off-by: Mateusz Palczewski Tested-by: Rafal Romanowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/iavf/iavf_main.c | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index b89933aa5bfe..957ad6e63f6f 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -2977,11 +2977,6 @@ static void iavf_reset_task(struct work_struct *work) int i = 0, err; bool running; - /* Detach interface to avoid subsequent NDO callbacks */ - rtnl_lock(); - netif_device_detach(netdev); - rtnl_unlock(); - /* When device is being removed it doesn't make sense to run the reset * task, just return in such a case. */ @@ -2989,7 +2984,7 @@ static void iavf_reset_task(struct work_struct *work) if (adapter->state != __IAVF_REMOVE) queue_work(adapter->wq, &adapter->reset_task); - goto reset_finish; + return; } while (!mutex_trylock(&adapter->client_lock)) @@ -3192,7 +3187,7 @@ continue_reset: mutex_unlock(&adapter->client_lock); mutex_unlock(&adapter->crit_lock); - goto reset_finish; + return; reset_err: if (running) { set_bit(__IAVF_VSI_DOWN, adapter->vsi.state); @@ -3213,10 +3208,6 @@ reset_err: } dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n"); -reset_finish: - rtnl_lock(); - netif_device_attach(netdev); - rtnl_unlock(); } /** -- cgit v1.2.3 From d916d273041b0b5652434ff27aec716ff90992ac Mon Sep 17 00:00:00 2001 From: Marcin Szycik Date: Mon, 5 Jun 2023 10:52:24 -0400 Subject: Revert "iavf: Do not restart Tx queues after reset task failure" This reverts commit 08f1c147b7265245d67321585c68a27e990e0c4b. Netdev is no longer being detached during reset, so this fix can be reverted. We leave the removal of "hacky" IFF_UP flag update. Signed-off-by: Marcin Szycik Signed-off-by: Mateusz Palczewski Tested-by: Rafal Romanowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/iavf/iavf_main.c | 15 --------------- 1 file changed, 15 deletions(-) diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index 957ad6e63f6f..0c12a752429c 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -3042,11 +3042,6 @@ static void iavf_reset_task(struct work_struct *work) iavf_disable_vf(adapter); mutex_unlock(&adapter->client_lock); mutex_unlock(&adapter->crit_lock); - if (netif_running(netdev)) { - rtnl_lock(); - dev_close(netdev); - rtnl_unlock(); - } return; /* Do not attempt to reinit. It's dead, Jim. */ } @@ -3197,16 +3192,6 @@ reset_err: mutex_unlock(&adapter->client_lock); mutex_unlock(&adapter->crit_lock); - - if (netif_running(netdev)) { - /* Close device to ensure that Tx queues will not be started - * during netif_device_attach() at the end of the reset task. - */ - rtnl_lock(); - dev_close(netdev); - rtnl_unlock(); - } - dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n"); } -- cgit v1.2.3 From d1639a17319ba78a018280cd2df6577a7e5d9fab Mon Sep 17 00:00:00 2001 From: Ahmed Zaki Date: Mon, 5 Jun 2023 10:52:25 -0400 Subject: iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies A driver's lock (crit_lock) is used to serialize all the driver's tasks. Lockdep, however, shows a circular dependency between rtnl and crit_lock. This happens when an ndo that already holds the rtnl requests the driver to reset, since the reset task (in some paths) tries to grab rtnl to either change real number of queues of update netdev features. [566.241851] ====================================================== [566.241893] WARNING: possible circular locking dependency detected [566.241936] 6.2.14-100.fc36.x86_64+debug #1 Tainted: G OE [566.241984] ------------------------------------------------------ [566.242025] repro.sh/2604 is trying to acquire lock: [566.242061] ffff9280fc5ceee8 (&adapter->crit_lock){+.+.}-{3:3}, at: iavf_close+0x3c/0x240 [iavf] [566.242167] but task is already holding lock: [566.242209] ffffffff9976d350 (rtnl_mutex){+.+.}-{3:3}, at: iavf_remove+0x6b5/0x730 [iavf] [566.242300] which lock already depends on the new lock. [566.242353] the existing dependency chain (in reverse order) is: [566.242401] -> #1 (rtnl_mutex){+.+.}-{3:3}: [566.242451] __mutex_lock+0xc1/0xbb0 [566.242489] iavf_init_interrupt_scheme+0x179/0x440 [iavf] [566.242560] iavf_watchdog_task+0x80b/0x1400 [iavf] [566.242627] process_one_work+0x2b3/0x560 [566.242663] worker_thread+0x4f/0x3a0 [566.242696] kthread+0xf2/0x120 [566.242730] ret_from_fork+0x29/0x50 [566.242763] -> #0 (&adapter->crit_lock){+.+.}-{3:3}: [566.242815] __lock_acquire+0x15ff/0x22b0 [566.242869] lock_acquire+0xd2/0x2c0 [566.242901] __mutex_lock+0xc1/0xbb0 [566.242934] iavf_close+0x3c/0x240 [iavf] [566.242997] __dev_close_many+0xac/0x120 [566.243036] dev_close_many+0x8b/0x140 [566.243071] unregister_netdevice_many_notify+0x165/0x7c0 [566.243116] unregister_netdevice_queue+0xd3/0x110 [566.243157] iavf_remove+0x6c1/0x730 [iavf] [566.243217] pci_device_remove+0x33/0xa0 [566.243257] device_release_driver_internal+0x1bc/0x240 [566.243299] pci_stop_bus_device+0x6c/0x90 [566.243338] pci_stop_and_remove_bus_device+0xe/0x20 [566.243380] pci_iov_remove_virtfn+0xd1/0x130 [566.243417] sriov_disable+0x34/0xe0 [566.243448] ice_free_vfs+0x2da/0x330 [ice] [566.244383] ice_sriov_configure+0x88/0xad0 [ice] [566.245353] sriov_numvfs_store+0xde/0x1d0 [566.246156] kernfs_fop_write_iter+0x15e/0x210 [566.246921] vfs_write+0x288/0x530 [566.247671] ksys_write+0x74/0xf0 [566.248408] do_syscall_64+0x58/0x80 [566.249145] entry_SYSCALL_64_after_hwframe+0x72/0xdc [566.249886] other info that might help us debug this: [566.252014] Possible unsafe locking scenario: [566.253432] CPU0 CPU1 [566.254118] ---- ---- [566.254800] lock(rtnl_mutex); [566.255514] lock(&adapter->crit_lock); [566.256233] lock(rtnl_mutex); [566.256897] lock(&adapter->crit_lock); [566.257388] *** DEADLOCK *** The deadlock can be triggered by a script that is continuously resetting the VF adapter while doing other operations requiring RTNL, e.g: while :; do ip link set $VF up ethtool --set-channels $VF combined 2 ip link set $VF down ip link set $VF up ethtool --set-channels $VF combined 4 ip link set $VF down done Any operation that triggers a reset can substitute "ethtool --set-channles" As a fix, add a new task "finish_config" that do all the work which needs rtnl lock. With the exception of iavf_remove(), all work that require rtnl should be called from this task. As for iavf_remove(), at the point where we need to call unregister_netdevice() (and grab rtnl_lock), we make sure the finish_config task is not running (cancel_work_sync()) to safely grab rtnl. Subsequent finish_config work cannot restart after that since the task is guarded by the __IAVF_IN_REMOVE_TASK bit in iavf_schedule_finish_config(). Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections") Signed-off-by: Ahmed Zaki Signed-off-by: Mateusz Palczewski Tested-by: Rafal Romanowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/iavf/iavf.h | 2 + drivers/net/ethernet/intel/iavf/iavf_main.c | 114 +++++++++++++++++------- drivers/net/ethernet/intel/iavf/iavf_virtchnl.c | 1 + 3 files changed, 85 insertions(+), 32 deletions(-) diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h index a5cab19eb6a8..bf5e3c8e97e0 100644 --- a/drivers/net/ethernet/intel/iavf/iavf.h +++ b/drivers/net/ethernet/intel/iavf/iavf.h @@ -255,6 +255,7 @@ struct iavf_adapter { struct workqueue_struct *wq; struct work_struct reset_task; struct work_struct adminq_task; + struct work_struct finish_config; struct delayed_work client_task; wait_queue_head_t down_waitqueue; wait_queue_head_t reset_waitqueue; @@ -521,6 +522,7 @@ int iavf_process_config(struct iavf_adapter *adapter); int iavf_parse_vf_resource_msg(struct iavf_adapter *adapter); void iavf_schedule_reset(struct iavf_adapter *adapter); void iavf_schedule_request_stats(struct iavf_adapter *adapter); +void iavf_schedule_finish_config(struct iavf_adapter *adapter); void iavf_reset(struct iavf_adapter *adapter); void iavf_set_ethtool_ops(struct net_device *netdev); void iavf_update_stats(struct iavf_adapter *adapter); diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index 0c12a752429c..2a2ae3c4337b 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -1690,10 +1690,10 @@ static int iavf_set_interrupt_capability(struct iavf_adapter *adapter) adapter->msix_entries[vector].entry = vector; err = iavf_acquire_msix_vectors(adapter, v_budget); + if (!err) + iavf_schedule_finish_config(adapter); out: - netif_set_real_num_rx_queues(adapter->netdev, pairs); - netif_set_real_num_tx_queues(adapter->netdev, pairs); return err; } @@ -1913,9 +1913,7 @@ static int iavf_init_interrupt_scheme(struct iavf_adapter *adapter) goto err_alloc_queues; } - rtnl_lock(); err = iavf_set_interrupt_capability(adapter); - rtnl_unlock(); if (err) { dev_err(&adapter->pdev->dev, "Unable to setup interrupt capabilities\n"); @@ -2001,6 +1999,78 @@ err: return err; } +/** + * iavf_finish_config - do all netdev work that needs RTNL + * @work: our work_struct + * + * Do work that needs both RTNL and crit_lock. + **/ +static void iavf_finish_config(struct work_struct *work) +{ + struct iavf_adapter *adapter; + int pairs, err; + + adapter = container_of(work, struct iavf_adapter, finish_config); + + /* Always take RTNL first to prevent circular lock dependency */ + rtnl_lock(); + mutex_lock(&adapter->crit_lock); + + if ((adapter->flags & IAVF_FLAG_SETUP_NETDEV_FEATURES) && + adapter->netdev_registered && + !test_bit(__IAVF_IN_REMOVE_TASK, &adapter->crit_section)) { + netdev_update_features(adapter->netdev); + adapter->flags &= ~IAVF_FLAG_SETUP_NETDEV_FEATURES; + } + + switch (adapter->state) { + case __IAVF_DOWN: + if (!adapter->netdev_registered) { + err = register_netdevice(adapter->netdev); + if (err) { + dev_err(&adapter->pdev->dev, "Unable to register netdev (%d)\n", + err); + + /* go back and try again.*/ + iavf_free_rss(adapter); + iavf_free_misc_irq(adapter); + iavf_reset_interrupt_capability(adapter); + iavf_change_state(adapter, + __IAVF_INIT_CONFIG_ADAPTER); + goto out; + } + adapter->netdev_registered = true; + } + + /* Set the real number of queues when reset occurs while + * state == __IAVF_DOWN + */ + fallthrough; + case __IAVF_RUNNING: + pairs = adapter->num_active_queues; + netif_set_real_num_rx_queues(adapter->netdev, pairs); + netif_set_real_num_tx_queues(adapter->netdev, pairs); + break; + + default: + break; + } + +out: + mutex_unlock(&adapter->crit_lock); + rtnl_unlock(); +} + +/** + * iavf_schedule_finish_config - Set the flags and schedule a reset event + * @adapter: board private structure + **/ +void iavf_schedule_finish_config(struct iavf_adapter *adapter) +{ + if (!test_bit(__IAVF_IN_REMOVE_TASK, &adapter->crit_section)) + queue_work(adapter->wq, &adapter->finish_config); +} + /** * iavf_process_aq_command - process aq_required flags * and sends aq command @@ -2638,22 +2708,8 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter) netif_carrier_off(netdev); adapter->link_up = false; - - /* set the semaphore to prevent any callbacks after device registration - * up to time when state of driver will be set to __IAVF_DOWN - */ - rtnl_lock(); - if (!adapter->netdev_registered) { - err = register_netdevice(netdev); - if (err) { - rtnl_unlock(); - goto err_register; - } - } - - adapter->netdev_registered = true; - netif_tx_stop_all_queues(netdev); + if (CLIENT_ALLOWED(adapter)) { err = iavf_lan_add_device(adapter); if (err) @@ -2666,7 +2722,6 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter) iavf_change_state(adapter, __IAVF_DOWN); set_bit(__IAVF_VSI_DOWN, adapter->vsi.state); - rtnl_unlock(); iavf_misc_irq_enable(adapter); wake_up(&adapter->down_waitqueue); @@ -2686,10 +2741,11 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter) /* request initial VLAN offload settings */ iavf_set_vlan_offload_features(adapter, 0, netdev->features); + iavf_schedule_finish_config(adapter); return; + err_mem: iavf_free_rss(adapter); -err_register: iavf_free_misc_irq(adapter); err_sw_init: iavf_reset_interrupt_capability(adapter); @@ -2716,15 +2772,6 @@ static void iavf_watchdog_task(struct work_struct *work) goto restart_watchdog; } - if ((adapter->flags & IAVF_FLAG_SETUP_NETDEV_FEATURES) && - adapter->netdev_registered && - !test_bit(__IAVF_IN_REMOVE_TASK, &adapter->crit_section) && - rtnl_trylock()) { - netdev_update_features(adapter->netdev); - rtnl_unlock(); - adapter->flags &= ~IAVF_FLAG_SETUP_NETDEV_FEATURES; - } - if (adapter->flags & IAVF_FLAG_PF_COMMS_FAILED) iavf_change_state(adapter, __IAVF_COMM_FAILED); @@ -4942,6 +4989,7 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent) INIT_WORK(&adapter->reset_task, iavf_reset_task); INIT_WORK(&adapter->adminq_task, iavf_adminq_task); + INIT_WORK(&adapter->finish_config, iavf_finish_config); INIT_DELAYED_WORK(&adapter->watchdog_task, iavf_watchdog_task); INIT_DELAYED_WORK(&adapter->client_task, iavf_client_task); queue_delayed_work(adapter->wq, &adapter->watchdog_task, @@ -5084,13 +5132,15 @@ static void iavf_remove(struct pci_dev *pdev) usleep_range(500, 1000); } cancel_delayed_work_sync(&adapter->watchdog_task); + cancel_work_sync(&adapter->finish_config); + rtnl_lock(); if (adapter->netdev_registered) { - rtnl_lock(); unregister_netdevice(netdev); adapter->netdev_registered = false; - rtnl_unlock(); } + rtnl_unlock(); + if (CLIENT_ALLOWED(adapter)) { err = iavf_lan_del_device(adapter); if (err) diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c index 1bab896aaf40..073ac29ed84c 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c +++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c @@ -2237,6 +2237,7 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter, iavf_process_config(adapter); adapter->flags |= IAVF_FLAG_SETUP_NETDEV_FEATURES; + iavf_schedule_finish_config(adapter); iavf_set_queue_vlan_tag_loc(adapter); -- cgit v1.2.3 From c34743daca0eb1dc855831a5210f0800a850088e Mon Sep 17 00:00:00 2001 From: Ahmed Zaki Date: Mon, 5 Jun 2023 10:52:26 -0400 Subject: iavf: fix reset task race with iavf_remove() The reset task is currently scheduled from the watchdog or adminq tasks. First, all direct calls to schedule the reset task are replaced with the iavf_schedule_reset(), which is modified to accept the flag showing the type of reset. To prevent the reset task from starting once iavf_remove() starts, we need to check the __IAVF_IN_REMOVE_TASK bit before we schedule it. This is now easily added to iavf_schedule_reset(). Finally, remove the check for IAVF_FLAG_RESET_NEEDED in the watchdog task. It is redundant since all callers who set the flag immediately schedules the reset task. Fixes: 3ccd54ef44eb ("iavf: Fix init state closure on remove") Fixes: 14756b2ae265 ("iavf: Fix __IAVF_RESETTING state usage") Signed-off-by: Ahmed Zaki Signed-off-by: Mateusz Palczewski Tested-by: Rafal Romanowski Signed-off-by: Tony Nguyen --- drivers/net/ethernet/intel/iavf/iavf.h | 2 +- drivers/net/ethernet/intel/iavf/iavf_ethtool.c | 8 +++---- drivers/net/ethernet/intel/iavf/iavf_main.c | 32 +++++++++---------------- drivers/net/ethernet/intel/iavf/iavf_virtchnl.c | 3 +-- 4 files changed, 16 insertions(+), 29 deletions(-) diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h index bf5e3c8e97e0..8cbdebc5b698 100644 --- a/drivers/net/ethernet/intel/iavf/iavf.h +++ b/drivers/net/ethernet/intel/iavf/iavf.h @@ -520,7 +520,7 @@ int iavf_up(struct iavf_adapter *adapter); void iavf_down(struct iavf_adapter *adapter); int iavf_process_config(struct iavf_adapter *adapter); int iavf_parse_vf_resource_msg(struct iavf_adapter *adapter); -void iavf_schedule_reset(struct iavf_adapter *adapter); +void iavf_schedule_reset(struct iavf_adapter *adapter, u64 flags); void iavf_schedule_request_stats(struct iavf_adapter *adapter); void iavf_schedule_finish_config(struct iavf_adapter *adapter); void iavf_reset(struct iavf_adapter *adapter); diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c index b7141c2a941d..2f47cfa7f06e 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c +++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c @@ -532,8 +532,7 @@ static int iavf_set_priv_flags(struct net_device *netdev, u32 flags) /* issue a reset to force legacy-rx change to take effect */ if (changed_flags & IAVF_FLAG_LEGACY_RX) { if (netif_running(netdev)) { - adapter->flags |= IAVF_FLAG_RESET_NEEDED; - queue_work(adapter->wq, &adapter->reset_task); + iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED); ret = iavf_wait_for_reset(adapter); if (ret) netdev_warn(netdev, "Changing private flags timeout or interrupted waiting for reset"); @@ -676,8 +675,7 @@ static int iavf_set_ringparam(struct net_device *netdev, } if (netif_running(netdev)) { - adapter->flags |= IAVF_FLAG_RESET_NEEDED; - queue_work(adapter->wq, &adapter->reset_task); + iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED); ret = iavf_wait_for_reset(adapter); if (ret) netdev_warn(netdev, "Changing ring parameters timeout or interrupted waiting for reset"); @@ -1860,7 +1858,7 @@ static int iavf_set_channels(struct net_device *netdev, adapter->num_req_queues = num_req; adapter->flags |= IAVF_FLAG_REINIT_ITR_NEEDED; - iavf_schedule_reset(adapter); + iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED); ret = iavf_wait_for_reset(adapter); if (ret) diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index 2a2ae3c4337b..3a88d413ddee 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -301,12 +301,14 @@ static int iavf_lock_timeout(struct mutex *lock, unsigned int msecs) /** * iavf_schedule_reset - Set the flags and schedule a reset event * @adapter: board private structure + * @flags: IAVF_FLAG_RESET_PENDING or IAVF_FLAG_RESET_NEEDED **/ -void iavf_schedule_reset(struct iavf_adapter *adapter) +void iavf_schedule_reset(struct iavf_adapter *adapter, u64 flags) { - if (!(adapter->flags & - (IAVF_FLAG_RESET_PENDING | IAVF_FLAG_RESET_NEEDED))) { - adapter->flags |= IAVF_FLAG_RESET_NEEDED; + if (!test_bit(__IAVF_IN_REMOVE_TASK, &adapter->crit_section) && + !(adapter->flags & + (IAVF_FLAG_RESET_PENDING | IAVF_FLAG_RESET_NEEDED))) { + adapter->flags |= flags; queue_work(adapter->wq, &adapter->reset_task); } } @@ -334,7 +336,7 @@ static void iavf_tx_timeout(struct net_device *netdev, unsigned int txqueue) struct iavf_adapter *adapter = netdev_priv(netdev); adapter->tx_timeout_count++; - iavf_schedule_reset(adapter); + iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED); } /** @@ -2478,7 +2480,7 @@ int iavf_parse_vf_resource_msg(struct iavf_adapter *adapter) adapter->vsi_res->num_queue_pairs); adapter->flags |= IAVF_FLAG_REINIT_MSIX_NEEDED; adapter->num_req_queues = adapter->vsi_res->num_queue_pairs; - iavf_schedule_reset(adapter); + iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED); return -EAGAIN; } @@ -2775,14 +2777,6 @@ static void iavf_watchdog_task(struct work_struct *work) if (adapter->flags & IAVF_FLAG_PF_COMMS_FAILED) iavf_change_state(adapter, __IAVF_COMM_FAILED); - if (adapter->flags & IAVF_FLAG_RESET_NEEDED) { - adapter->aq_required = 0; - adapter->current_op = VIRTCHNL_OP_UNKNOWN; - mutex_unlock(&adapter->crit_lock); - queue_work(adapter->wq, &adapter->reset_task); - return; - } - switch (adapter->state) { case __IAVF_STARTUP: iavf_startup(adapter); @@ -2910,11 +2904,10 @@ static void iavf_watchdog_task(struct work_struct *work) /* check for hw reset */ reg_val = rd32(hw, IAVF_VF_ARQLEN1) & IAVF_VF_ARQLEN1_ARQENABLE_MASK; if (!reg_val) { - adapter->flags |= IAVF_FLAG_RESET_PENDING; adapter->aq_required = 0; adapter->current_op = VIRTCHNL_OP_UNKNOWN; dev_err(&adapter->pdev->dev, "Hardware reset detected\n"); - queue_work(adapter->wq, &adapter->reset_task); + iavf_schedule_reset(adapter, IAVF_FLAG_RESET_PENDING); mutex_unlock(&adapter->crit_lock); queue_delayed_work(adapter->wq, &adapter->watchdog_task, HZ * 2); @@ -3288,9 +3281,7 @@ static void iavf_adminq_task(struct work_struct *work) } while (pending); mutex_unlock(&adapter->crit_lock); - if ((adapter->flags & - (IAVF_FLAG_RESET_PENDING | IAVF_FLAG_RESET_NEEDED)) || - adapter->state == __IAVF_RESETTING) + if (iavf_is_reset_in_progress(adapter)) goto freedom; /* check for error indications */ @@ -4387,8 +4378,7 @@ static int iavf_change_mtu(struct net_device *netdev, int new_mtu) } if (netif_running(netdev)) { - adapter->flags |= IAVF_FLAG_RESET_NEEDED; - queue_work(adapter->wq, &adapter->reset_task); + iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED); ret = iavf_wait_for_reset(adapter); if (ret < 0) netdev_warn(netdev, "MTU change interrupted waiting for reset"); diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c index 073ac29ed84c..be3c007ce90a 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c +++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c @@ -1961,9 +1961,8 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter, case VIRTCHNL_EVENT_RESET_IMPENDING: dev_info(&adapter->pdev->dev, "Reset indication received from the PF\n"); if (!(adapter->flags & IAVF_FLAG_RESET_PENDING)) { - adapter->flags |= IAVF_FLAG_RESET_PENDING; dev_info(&adapter->pdev->dev, "Scheduling reset task\n"); - queue_work(adapter->wq, &adapter->reset_task); + iavf_schedule_reset(adapter, IAVF_FLAG_RESET_PENDING); } break; default: -- cgit v1.2.3 From 9efa1a5407e81265ea502cab83be4de503decc49 Mon Sep 17 00:00:00 2001 From: Fedor Ross Date: Thu, 4 May 2023 21:50:59 +0200 Subject: can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeout The mcp251xfd controller needs an idle bus to enter 'Normal CAN 2.0 mode' or . The maximum length of a CAN frame is 736 bits (64 data bytes, CAN-FD, EFF mode, worst case bit stuffing and interframe spacing). For low bit rates like 10 kbit/s the arbitrarily chosen MCP251XFD_POLL_TIMEOUT_US of 1 ms is too small. Otherwise during polling for the CAN controller to enter 'Normal CAN 2.0 mode' the timeout limit is exceeded and the configuration fails with: | $ ip link set dev can1 up type can bitrate 10000 | [ 731.911072] mcp251xfd spi2.1 can1: Controller failed to enter mode CAN 2.0 Mode (6) and stays in Configuration Mode (4) (con=0x068b0760, osc=0x00000468). | [ 731.927192] mcp251xfd spi2.1 can1: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying. | [ 731.938101] A link change request failed with some changes committed already. Interface can1 may have been left with an inconsistent configuration, please check. | RTNETLINK answers: Connection timed out Make MCP251XFD_POLL_TIMEOUT_US timeout calculation dynamic. Use maximum of 1ms and bit time of 1 full 64 data bytes CAN-FD frame in EFF mode, worst case bit stuffing and interframe spacing at the current bit rate. For easier backporting define the macro MCP251XFD_FRAME_LEN_MAX_BITS that holds the max frame length in bits, which is 736. This can be replaced by can_frame_bits(true, true, true, true, CANFD_MAX_DLEN) in a cleanup patch later. Fixes: 55e5b97f003e8 ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN") Signed-off-by: Fedor Ross Signed-off-by: Marek Vasut Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20230717-mcp251xfd-fix-increase-poll-timeout-v5-1-06600f34c684@pengutronix.de Signed-off-by: Marc Kleine-Budde --- drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c | 10 ++++++++-- drivers/net/can/spi/mcp251xfd/mcp251xfd.h | 1 + 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c index 68df6d4641b5..eebf967f4711 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c @@ -227,6 +227,8 @@ static int __mcp251xfd_chip_set_mode(const struct mcp251xfd_priv *priv, const u8 mode_req, bool nowait) { + const struct can_bittiming *bt = &priv->can.bittiming; + unsigned long timeout_us = MCP251XFD_POLL_TIMEOUT_US; u32 con = 0, con_reqop, osc = 0; u8 mode; int err; @@ -246,12 +248,16 @@ __mcp251xfd_chip_set_mode(const struct mcp251xfd_priv *priv, if (mode_req == MCP251XFD_REG_CON_MODE_SLEEP || nowait) return 0; + if (bt->bitrate) + timeout_us = max_t(unsigned long, timeout_us, + MCP251XFD_FRAME_LEN_MAX_BITS * USEC_PER_SEC / + bt->bitrate); + err = regmap_read_poll_timeout(priv->map_reg, MCP251XFD_REG_CON, con, !mcp251xfd_reg_invalid(con) && FIELD_GET(MCP251XFD_REG_CON_OPMOD_MASK, con) == mode_req, - MCP251XFD_POLL_SLEEP_US, - MCP251XFD_POLL_TIMEOUT_US); + MCP251XFD_POLL_SLEEP_US, timeout_us); if (err != -ETIMEDOUT && err != -EBADMSG) return err; diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h index 7024ff0cc2c0..24510b3b8020 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h @@ -387,6 +387,7 @@ static_assert(MCP251XFD_TIMESTAMP_WORK_DELAY_SEC < #define MCP251XFD_OSC_STAB_TIMEOUT_US (10 * MCP251XFD_OSC_STAB_SLEEP_US) #define MCP251XFD_POLL_SLEEP_US (10) #define MCP251XFD_POLL_TIMEOUT_US (USEC_PER_MSEC) +#define MCP251XFD_FRAME_LEN_MAX_BITS (736) /* Misc */ #define MCP251XFD_NAPI_WEIGHT 32 -- cgit v1.2.3 From 2033ab90380d46e0e9f0520fd6776a73d107fd95 Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Sat, 15 Jul 2023 18:36:05 +0300 Subject: vrf: Fix lockdep splat in output path Cited commit converted the neighbour code to use the standard RCU variant instead of the RCU-bh variant, but the VRF code still uses rcu_read_lock_bh() / rcu_read_unlock_bh() around the neighbour lookup code in its IPv4 and IPv6 output paths, resulting in lockdep splats [1][2]. Can be reproduced using [3]. Fix by switching to rcu_read_lock() / rcu_read_unlock(). [1] ============================= WARNING: suspicious RCU usage 6.5.0-rc1-custom-g9c099e6dbf98 #403 Not tainted ----------------------------- include/net/neighbour.h:302 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by ping/183: #0: ffff888105ea1d80 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0xc6c/0x33c0 #1: ffffffff85b46820 (rcu_read_lock_bh){....}-{1:2}, at: vrf_output+0x2e3/0x2030 stack backtrace: CPU: 0 PID: 183 Comm: ping Not tainted 6.5.0-rc1-custom-g9c099e6dbf98 #403 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014 Call Trace: dump_stack_lvl+0xc1/0xf0 lockdep_rcu_suspicious+0x211/0x3b0 vrf_output+0x1380/0x2030 ip_push_pending_frames+0x125/0x2a0 raw_sendmsg+0x200d/0x33c0 inet_sendmsg+0xa2/0xe0 __sys_sendto+0x2aa/0x420 __x64_sys_sendto+0xe5/0x1c0 do_syscall_64+0x38/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [2] ============================= WARNING: suspicious RCU usage 6.5.0-rc1-custom-g9c099e6dbf98 #403 Not tainted ----------------------------- include/net/neighbour.h:302 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by ping6/182: #0: ffff888114b63000 (sk_lock-AF_INET6){+.+.}-{0:0}, at: rawv6_sendmsg+0x1602/0x3e50 #1: ffffffff85b46820 (rcu_read_lock_bh){....}-{1:2}, at: vrf_output6+0xe9/0x1310 stack backtrace: CPU: 0 PID: 182 Comm: ping6 Not tainted 6.5.0-rc1-custom-g9c099e6dbf98 #403 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014 Call Trace: dump_stack_lvl+0xc1/0xf0 lockdep_rcu_suspicious+0x211/0x3b0 vrf_output6+0xd32/0x1310 ip6_local_out+0xb4/0x1a0 ip6_send_skb+0xbc/0x340 ip6_push_pending_frames+0xe5/0x110 rawv6_sendmsg+0x2e6e/0x3e50 inet_sendmsg+0xa2/0xe0 __sys_sendto+0x2aa/0x420 __x64_sys_sendto+0xe5/0x1c0 do_syscall_64+0x38/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [3] #!/bin/bash ip link add name vrf-red up numtxqueues 2 type vrf table 10 ip link add name swp1 up master vrf-red type dummy ip address add 192.0.2.1/24 dev swp1 ip address add 2001:db8:1::1/64 dev swp1 ip neigh add 192.0.2.2 lladdr 00:11:22:33:44:55 nud perm dev swp1 ip neigh add 2001:db8:1::2 lladdr 00:11:22:33:44:55 nud perm dev swp1 ip vrf exec vrf-red ping 192.0.2.2 -c 1 &> /dev/null ip vrf exec vrf-red ping6 2001:db8:1::2 -c 1 &> /dev/null Fixes: 09eed1192cec ("neighbour: switch to standard rcu, instead of rcu_bh") Reported-by: Naresh Kamboju Link: https://lore.kernel.org/netdev/CA+G9fYtEr-=GbcXNDYo3XOkwR+uYgehVoDjsP0pFLUpZ_AZcyg@mail.gmail.com/ Signed-off-by: Ido Schimmel Reviewed-by: David Ahern Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/20230715153605.4068066-1-idosch@nvidia.com Signed-off-by: Paolo Abeni --- drivers/net/vrf.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index bdb3a76a352e..6043e63b42f9 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -664,7 +664,7 @@ static int vrf_finish_output6(struct net *net, struct sock *sk, skb->protocol = htons(ETH_P_IPV6); skb->dev = dev; - rcu_read_lock_bh(); + rcu_read_lock(); nexthop = rt6_nexthop((struct rt6_info *)dst, &ipv6_hdr(skb)->daddr); neigh = __ipv6_neigh_lookup_noref(dst->dev, nexthop); if (unlikely(!neigh)) @@ -672,10 +672,10 @@ static int vrf_finish_output6(struct net *net, struct sock *sk, if (!IS_ERR(neigh)) { sock_confirm_neigh(skb, neigh); ret = neigh_output(neigh, skb, false); - rcu_read_unlock_bh(); + rcu_read_unlock(); return ret; } - rcu_read_unlock_bh(); + rcu_read_unlock(); IP6_INC_STATS(dev_net(dst->dev), ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES); @@ -889,7 +889,7 @@ static int vrf_finish_output(struct net *net, struct sock *sk, struct sk_buff *s } } - rcu_read_lock_bh(); + rcu_read_lock(); neigh = ip_neigh_for_gw(rt, skb, &is_v6gw); if (!IS_ERR(neigh)) { @@ -898,11 +898,11 @@ static int vrf_finish_output(struct net *net, struct sock *sk, struct sk_buff *s sock_confirm_neigh(skb, neigh); /* if crossing protocols, can not use the cached header */ ret = neigh_output(neigh, skb, is_v6gw); - rcu_read_unlock_bh(); + rcu_read_unlock(); return ret; } - rcu_read_unlock_bh(); + rcu_read_unlock(); vrf_tx_error(skb->dev, skb); return -EINVAL; } -- cgit v1.2.3 From 8fcd7c7b3a38ab5e452f542fda8f7940e77e479a Mon Sep 17 00:00:00 2001 From: Geetha sowjanya Date: Sun, 16 Jul 2023 15:07:41 +0530 Subject: octeontx2-pf: Dont allocate BPIDs for LBK interfaces Current driver enables backpressure for LBK interfaces. But these interfaces do not support this feature. Hence, this patch fixes the issue by skipping the backpressure configuration for these interfaces. Fixes: 75f36270990c ("octeontx2-pf: Support to enable/disable pause frames via ethtool"). Signed-off-by: Geetha sowjanya Signed-off-by: Sunil Goutham Link: https://lore.kernel.org/r/20230716093741.28063-1-gakula@marvell.com Signed-off-by: Paolo Abeni --- drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c index fe8ea4e531b7..9551b422622a 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c @@ -1454,8 +1454,9 @@ static int otx2_init_hw_resources(struct otx2_nic *pf) if (err) goto err_free_npa_lf; - /* Enable backpressure */ - otx2_nix_config_bp(pf, true); + /* Enable backpressure for CGX mapped PF/VFs */ + if (!is_otx2_lbkvf(pf->pdev)) + otx2_nix_config_bp(pf, true); /* Init Auras and pools used by NIX RQ, for free buffer ptrs */ err = otx2_rq_aura_pool_init(pf); -- cgit v1.2.3 From ba7b3e7d5f9014be65879ede8fd599cb222901c9 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Mon, 17 Jul 2023 21:45:28 +0530 Subject: bpf: Fix subprog idx logic in check_max_stack_depth The assignment to idx in check_max_stack_depth happens once we see a bpf_pseudo_call or bpf_pseudo_func. This is not an issue as the rest of the code performs a few checks and then pushes the frame to the frame stack, except the case of async callbacks. If the async callback case causes the loop iteration to be skipped, the idx assignment will be incorrect on the next iteration of the loop. The value stored in the frame stack (as the subprogno of the current subprog) will be incorrect. This leads to incorrect checks and incorrect tail_call_reachable marking. Save the target subprog in a new variable and only assign to idx once we are done with the is_async_cb check which may skip pushing of frame to the frame stack and subsequent stack depth checks and tail call markings. Fixes: 7ddc80a476c2 ("bpf: Teach stack depth check about async callbacks.") Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20230717161530.1238-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- kernel/bpf/verifier.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 930b5555cfd3..e682056dd144 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -5621,7 +5621,7 @@ process_func: continue_func: subprog_end = subprog[idx + 1].start; for (; i < subprog_end; i++) { - int next_insn; + int next_insn, sidx; if (!bpf_pseudo_call(insn + i) && !bpf_pseudo_func(insn + i)) continue; @@ -5631,14 +5631,14 @@ continue_func: /* find the callee */ next_insn = i + insn[i].imm + 1; - idx = find_subprog(env, next_insn); - if (idx < 0) { + sidx = find_subprog(env, next_insn); + if (sidx < 0) { WARN_ONCE(1, "verifier bug. No program starts at insn %d\n", next_insn); return -EFAULT; } - if (subprog[idx].is_async_cb) { - if (subprog[idx].has_tail_call) { + if (subprog[sidx].is_async_cb) { + if (subprog[sidx].has_tail_call) { verbose(env, "verifier bug. subprog has tail_call and async cb\n"); return -EFAULT; } @@ -5647,6 +5647,7 @@ continue_func: continue; } i = next_insn; + idx = sidx; if (subprog[idx].has_tail_call) tail_call_reachable = true; -- cgit v1.2.3 From b5e9ad522c4ccd32d322877515cff8d47ed731b9 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Mon, 17 Jul 2023 21:45:29 +0530 Subject: bpf: Repeat check_max_stack_depth for async callbacks While the check_max_stack_depth function explores call chains emanating from the main prog, which is typically enough to cover all possible call chains, it doesn't explore those rooted at async callbacks unless the async callback will have been directly called, since unlike non-async callbacks it skips their instruction exploration as they don't contribute to stack depth. It could be the case that the async callback leads to a callchain which exceeds the stack depth, but this is never reachable while only exploring the entry point from main subprog. Hence, repeat the check for the main subprog *and* all async callbacks marked by the symbolic execution pass of the verifier, as execution of the program may begin at any of them. Consider functions with following stack depths: main: 256 async: 256 foo: 256 main: rX = async bpf_timer_set_callback(...) async: foo() Here, async is not descended as it does not contribute to stack depth of main (since it is referenced using bpf_pseudo_func and not bpf_pseudo_call). However, when async is invoked asynchronously, it will end up breaching the MAX_BPF_STACK limit by calling foo. Hence, in addition to main, we also need to explore call chains beginning at all async callback subprogs in a program. Fixes: 7ddc80a476c2 ("bpf: Teach stack depth check about async callbacks.") Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20230717161530.1238-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- kernel/bpf/verifier.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index e682056dd144..02a021c524ab 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -5573,16 +5573,17 @@ static int update_stack_depth(struct bpf_verifier_env *env, * Since recursion is prevented by check_cfg() this algorithm * only needs a local stack of MAX_CALL_FRAMES to remember callsites */ -static int check_max_stack_depth(struct bpf_verifier_env *env) +static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx) { - int depth = 0, frame = 0, idx = 0, i = 0, subprog_end; struct bpf_subprog_info *subprog = env->subprog_info; struct bpf_insn *insn = env->prog->insnsi; + int depth = 0, frame = 0, i, subprog_end; bool tail_call_reachable = false; int ret_insn[MAX_CALL_FRAMES]; int ret_prog[MAX_CALL_FRAMES]; int j; + i = subprog[idx].start; process_func: /* protect against potential stack overflow that might happen when * bpf2bpf calls get combined with tailcalls. Limit the caller's stack @@ -5683,6 +5684,22 @@ continue_func: goto continue_func; } +static int check_max_stack_depth(struct bpf_verifier_env *env) +{ + struct bpf_subprog_info *si = env->subprog_info; + int ret; + + for (int i = 0; i < env->subprog_cnt; i++) { + if (!i || si[i].is_async_cb) { + ret = check_max_stack_depth_subprog(env, i); + if (ret < 0) + return ret; + } + continue; + } + return 0; +} + #ifndef CONFIG_BPF_JIT_ALWAYS_ON static int get_callee_stack_depth(struct bpf_verifier_env *env, const struct bpf_insn *insn, int idx) -- cgit v1.2.3 From 824adae4530b4db1d06987d8dd31a0adef37044f Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Mon, 17 Jul 2023 21:45:30 +0530 Subject: selftests/bpf: Add more tests for check_max_stack_depth bug Another test which now exercies the path of the verifier where it will explore call chains rooted at the async callback. Without the prior fixes, this program loads successfully, which is incorrect. Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20230717161530.1238-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- .../selftests/bpf/progs/async_stack_depth.c | 25 ++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/async_stack_depth.c b/tools/testing/selftests/bpf/progs/async_stack_depth.c index 477ba950bb43..3517c0e01206 100644 --- a/tools/testing/selftests/bpf/progs/async_stack_depth.c +++ b/tools/testing/selftests/bpf/progs/async_stack_depth.c @@ -22,9 +22,16 @@ static int timer_cb(void *map, int *key, struct bpf_timer *timer) return buf[69]; } +__attribute__((noinline)) +static int bad_timer_cb(void *map, int *key, struct bpf_timer *timer) +{ + volatile char buf[300] = {}; + return buf[255] + timer_cb(NULL, NULL, NULL); +} + SEC("tc") -__failure __msg("combined stack size of 2 calls") -int prog(struct __sk_buff *ctx) +__failure __msg("combined stack size of 2 calls is 576. Too large") +int pseudo_call_check(struct __sk_buff *ctx) { struct hmap_elem *elem; volatile char buf[256] = {}; @@ -37,4 +44,18 @@ int prog(struct __sk_buff *ctx) return bpf_timer_set_callback(&elem->timer, timer_cb) + buf[0]; } +SEC("tc") +__failure __msg("combined stack size of 2 calls is 608. Too large") +int async_call_root_check(struct __sk_buff *ctx) +{ + struct hmap_elem *elem; + volatile char buf[256] = {}; + + elem = bpf_map_lookup_elem(&hmap, &(int){0}); + if (!elem) + return 0; + + return bpf_timer_set_callback(&elem->timer, bad_timer_cb) + buf[0]; +} + char _license[] SEC("license") = "GPL"; -- cgit v1.2.3 From a3f25d614bc73b45e8f02adc6769876dfd16ca84 Mon Sep 17 00:00:00 2001 From: Alexander Duyck Date: Thu, 13 Jul 2023 09:49:31 -0700 Subject: bpf, arm64: Fix BTI type used for freplace attached functions When running an freplace attached bpf program on an arm64 system w were seeing the following issue: Unhandled 64-bit el1h sync exception on CPU47, ESR 0x0000000036000003 -- BTI After a bit of work to track it down I determined that what appeared to be happening is that the 'bti c' at the start of the program was somehow being reached after a 'br' instruction. Further digging pointed me toward the fact that the function was attached via freplace. This in turn led me to build_plt which I believe is invoking the long jump which is triggering this error. To resolve it we can replace the 'bti c' with 'bti jc' and add a comment explaining why this has to be modified as such. Fixes: b2ad54e1533e ("bpf, arm64: Implement bpf_arch_text_poke() for arm64") Signed-off-by: Alexander Duyck Acked-by: Xu Kuohai Link: https://lore.kernel.org/r/168926677665.316237.9953845318337455525.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Alexei Starovoitov --- arch/arm64/net/bpf_jit_comp.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index 145b540ec34f..ec2174838f2a 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -322,7 +322,13 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf) * */ - emit_bti(A64_BTI_C, ctx); + /* bpf function may be invoked by 3 instruction types: + * 1. bl, attached via freplace to bpf prog via short jump + * 2. br, attached via freplace to bpf prog via long jump + * 3. blr, working as a function pointer, used by emit_call. + * So BTI_JC should used here to support both br and blr. + */ + emit_bti(A64_BTI_JC, ctx); emit(A64_MOV(1, A64_R(9), A64_LR), ctx); emit(A64_NOP, ctx); -- cgit v1.2.3 From fda05798c22a354efde09a76bdfc276b2d591829 Mon Sep 17 00:00:00 2001 From: Matthieu Baerts Date: Thu, 13 Jul 2023 23:16:44 +0200 Subject: selftests: tc: set timeout to 15 minutes When looking for something else in LKFT reports [1], I noticed that the TC selftest ended with a timeout error: not ok 1 selftests: tc-testing: tdc.sh # TIMEOUT 45 seconds The timeout had been introduced 3 years ago, see the Fixes commit below. This timeout is only in place when executing the selftests via the kselftests runner scripts. I guess this is not what most TC devs are using and nobody noticed the issue before. The new timeout is set to 15 minutes as suggested by Pedro [2]. It looks like it is plenty more time than what it takes in "normal" conditions. Fixes: 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per test") Cc: stable@vger.kernel.org Link: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230711/testrun/18267241/suite/kselftest-tc-testing/test/tc-testing_tdc_sh/log [1] Link: https://lore.kernel.org/netdev/0e061d4a-9a23-9f58-3b35-d8919de332d7@tessares.net/T/ [2] Suggested-by: Pedro Tammela Signed-off-by: Matthieu Baerts Reviewed-by: Zhengchao Shao Link: https://lore.kernel.org/r/20230713-tc-selftests-lkft-v1-1-1eb4fd3a96e7@tessares.net Acked-by: Jamal Hadi Salim Signed-off-by: Jakub Kicinski --- tools/testing/selftests/tc-testing/settings | 1 + 1 file changed, 1 insertion(+) create mode 100644 tools/testing/selftests/tc-testing/settings diff --git a/tools/testing/selftests/tc-testing/settings b/tools/testing/selftests/tc-testing/settings new file mode 100644 index 000000000000..e2206265f67c --- /dev/null +++ b/tools/testing/selftests/tc-testing/settings @@ -0,0 +1 @@ +timeout=900 -- cgit v1.2.3 From 719b4774a8cb1a501e2d22a5a4a3a0a870e427d5 Mon Sep 17 00:00:00 2001 From: Matthieu Baerts Date: Thu, 13 Jul 2023 23:16:45 +0200 Subject: selftests: tc: add 'ct' action kconfig dep When looking for something else in LKFT reports [1], I noticed most of the tests were skipped because the "teardown stage" did not complete successfully. Pedro found out this is due to the fact CONFIG_NF_FLOW_TABLE is required but not listed in the 'config' file. Adding it to the list fixes the issues on LKFT side. CONFIG_NET_ACT_CT is now set to 'm' in the final kconfig. Fixes: c34b961a2492 ("net/sched: act_ct: Create nf flow table per zone") Cc: stable@vger.kernel.org Link: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230711/testrun/18267241/suite/kselftest-tc-testing/test/tc-testing_tdc_sh/log [1] Link: https://lore.kernel.org/netdev/0e061d4a-9a23-9f58-3b35-d8919de332d7@tessares.net/T/ [2] Suggested-by: Pedro Tammela Signed-off-by: Matthieu Baerts Tested-by: Zhengchao Shao Link: https://lore.kernel.org/r/20230713-tc-selftests-lkft-v1-2-1eb4fd3a96e7@tessares.net Acked-by: Jamal Hadi Salim Signed-off-by: Jakub Kicinski --- tools/testing/selftests/tc-testing/config | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/tc-testing/config b/tools/testing/selftests/tc-testing/config index 6e73b09c20c8..d1ad29040c02 100644 --- a/tools/testing/selftests/tc-testing/config +++ b/tools/testing/selftests/tc-testing/config @@ -5,6 +5,7 @@ CONFIG_NF_CONNTRACK=m CONFIG_NF_CONNTRACK_MARK=y CONFIG_NF_CONNTRACK_ZONES=y CONFIG_NF_CONNTRACK_LABELS=y +CONFIG_NF_FLOW_TABLE=m CONFIG_NF_NAT=m CONFIG_NETFILTER_XT_TARGET_LOG=m -- cgit v1.2.3 From 031c99e71fedcce93b6785d38b7d287bf59e3952 Mon Sep 17 00:00:00 2001 From: Matthieu Baerts Date: Thu, 13 Jul 2023 23:16:46 +0200 Subject: selftests: tc: add ConnTrack procfs kconfig When looking at the TC selftest reports, I noticed one test was failing because /proc/net/nf_conntrack was not available. not ok 373 3992 - Add ct action triggering DNAT tuple conflict Could not match regex pattern. Verify command output: cat: /proc/net/nf_conntrack: No such file or directory It is only available if NF_CONNTRACK_PROCFS kconfig is set. So the issue can be fixed simply by adding it to the list of required kconfig. Fixes: e46905641316 ("tc-testing: add test for ct DNAT tuple collision") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/netdev/0e061d4a-9a23-9f58-3b35-d8919de332d7@tessares.net/T/ [1] Signed-off-by: Matthieu Baerts Tested-by: Zhengchao Shao Link: https://lore.kernel.org/r/20230713-tc-selftests-lkft-v1-3-1eb4fd3a96e7@tessares.net Acked-by: Jamal Hadi Salim Signed-off-by: Jakub Kicinski --- tools/testing/selftests/tc-testing/config | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/tc-testing/config b/tools/testing/selftests/tc-testing/config index d1ad29040c02..71706197ba0f 100644 --- a/tools/testing/selftests/tc-testing/config +++ b/tools/testing/selftests/tc-testing/config @@ -5,6 +5,7 @@ CONFIG_NF_CONNTRACK=m CONFIG_NF_CONNTRACK_MARK=y CONFIG_NF_CONNTRACK_ZONES=y CONFIG_NF_CONNTRACK_LABELS=y +CONFIG_NF_CONNTRACK_PROCFS=y CONFIG_NF_FLOW_TABLE=m CONFIG_NF_NAT=m CONFIG_NETFILTER_XT_TARGET_LOG=m -- cgit v1.2.3 From d1998e505a995f5305a6add46f3d806fa38dae06 Mon Sep 17 00:00:00 2001 From: Shannon Nelson Date: Mon, 17 Jul 2023 12:32:42 -0700 Subject: mailmap: add entries for past lives Update old emails for my current work email. Signed-off-by: Shannon Nelson Link: https://lore.kernel.org/r/20230717193242.43670-1-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski --- .mailmap | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.mailmap b/.mailmap index 1bce47a7f2ce..3af557de8e73 100644 --- a/.mailmap +++ b/.mailmap @@ -453,6 +453,8 @@ Sebastian Reichel Sedat Dilek Seth Forshee Shannon Nelson +Shannon Nelson +Shannon Nelson Shiraz Hashim Shuah Khan Shuah Khan -- cgit v1.2.3 From 195e903b342a73c08c3249cec55b07bf3f23200a Mon Sep 17 00:00:00 2001 From: John Fastabend Date: Mon, 17 Jul 2023 10:33:06 -0700 Subject: mailmap: Add entry for old intel email Fix old email to avoid bouncing email from net/drivers and older netdev work. Anyways my @intel email hasn't been active for years. Signed-off-by: John Fastabend Link: https://lore.kernel.org/r/20230717173306.38407-1-john.fastabend@gmail.com Signed-off-by: Jakub Kicinski --- .mailmap | 1 + 1 file changed, 1 insertion(+) diff --git a/.mailmap b/.mailmap index 3af557de8e73..bcabc8a35122 100644 --- a/.mailmap +++ b/.mailmap @@ -241,6 +241,7 @@ Jisheng Zhang Johan Hovold Johan Hovold John Crispin +John Fastabend John Keeping John Paul Adrian Glaubitz John Stultz -- cgit v1.2.3 From 78adb4bcf99effbb960c5f9091e2e062509d1030 Mon Sep 17 00:00:00 2001 From: Florian Kauer Date: Mon, 17 Jul 2023 10:54:44 -0700 Subject: igc: Prevent garbled TX queue with XDP ZEROCOPY In normal operation, each populated queue item has next_to_watch pointing to the last TX desc of the packet, while each cleaned item has it set to 0. In particular, next_to_use that points to the next (necessarily clean) item to use has next_to_watch set to 0. When the TX queue is used both by an application using AF_XDP with ZEROCOPY as well as a second non-XDP application generating high traffic, the queue pointers can get in an invalid state where next_to_use points to an item where next_to_watch is NOT set to 0. However, the implementation assumes at several places that this is never the case, so if it does hold, bad things happen. In particular, within the loop inside of igc_clean_tx_irq(), next_to_clean can overtake next_to_use. Finally, this prevents any further transmission via this queue and it never gets unblocked or signaled. Secondly, if the queue is in this garbled state, the inner loop of igc_clean_tx_ring() will never terminate, completely hogging a CPU core. The reason is that igc_xdp_xmit_zc() reads next_to_use before acquiring the lock, and writing it back (potentially unmodified) later. If it got modified before locking, the outdated next_to_use is written pointing to an item that was already used elsewhere (and thus next_to_watch got written). Fixes: 9acf59a752d4 ("igc: Enable TX via AF_XDP zero-copy") Signed-off-by: Florian Kauer Reviewed-by: Kurt Kanzenbach Tested-by: Kurt Kanzenbach Acked-by: Vinicius Costa Gomes Reviewed-by: Simon Horman Tested-by: Naama Meir Signed-off-by: Tony Nguyen Link: https://lore.kernel.org/r/20230717175444.3217831-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/intel/igc/igc_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index 9f93f0f4f752..f36bc2a1849a 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -2828,9 +2828,8 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) struct netdev_queue *nq = txring_txq(ring); union igc_adv_tx_desc *tx_desc = NULL; int cpu = smp_processor_id(); - u16 ntu = ring->next_to_use; struct xdp_desc xdp_desc; - u16 budget; + u16 budget, ntu; if (!netif_carrier_ok(ring->netdev)) return; @@ -2840,6 +2839,7 @@ static void igc_xdp_xmit_zc(struct igc_ring *ring) /* Avoid transmit queue timeout since we share it with the slow path */ txq_trans_cond_update(nq); + ntu = ring->next_to_use; budget = igc_desc_unused(ring); while (xsk_tx_peek_desc(pool, &xdp_desc) && budget--) { -- cgit v1.2.3 From e7002b3b20c58bce4a88c15aca8e6cc894e3a7ed Mon Sep 17 00:00:00 2001 From: Subbaraya Sundeep Date: Mon, 17 Jul 2023 11:46:43 +0530 Subject: octeontx2-pf: mcs: Generate hash key using ecb(aes) Hardware generated encryption and ICV tags are found to be wrong when tested with IEEE MACSEC test vectors. This is because as per the HRM, the hash key (derived by AES-ECB block encryption of an all 0s block with the SAK) has to be programmed by the software in MCSX_RS_MCS_CPM_TX_SLAVE_SA_PLCY_MEM_4X register. Hence fix this by generating hash key in software and configuring in hardware. Fixes: c54ffc73601c ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Subbaraya Sundeep Reviewed-by: Kalesh AP Link: https://lore.kernel.org/r/1689574603-28093-1-git-send-email-sbhatta@marvell.com Signed-off-by: Jakub Kicinski --- .../ethernet/marvell/octeontx2/nic/cn10k_macsec.c | 137 +++++++++++++++------ 1 file changed, 100 insertions(+), 37 deletions(-) diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c index 6e2fb24be8c1..59b138214af2 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn10k_macsec.c @@ -4,6 +4,7 @@ * Copyright (C) 2022 Marvell. */ +#include #include #include #include "otx2_common.h" @@ -42,6 +43,56 @@ #define MCS_TCI_E 0x08 /* encryption */ #define MCS_TCI_C 0x04 /* changed text */ +#define CN10K_MAX_HASH_LEN 16 +#define CN10K_MAX_SAK_LEN 32 + +static int cn10k_ecb_aes_encrypt(struct otx2_nic *pfvf, u8 *sak, + u16 sak_len, u8 *hash) +{ + u8 data[CN10K_MAX_HASH_LEN] = { 0 }; + struct skcipher_request *req = NULL; + struct scatterlist sg_src, sg_dst; + struct crypto_skcipher *tfm; + DECLARE_CRYPTO_WAIT(wait); + int err; + + tfm = crypto_alloc_skcipher("ecb(aes)", 0, 0); + if (IS_ERR(tfm)) { + dev_err(pfvf->dev, "failed to allocate transform for ecb-aes\n"); + return PTR_ERR(tfm); + } + + req = skcipher_request_alloc(tfm, GFP_KERNEL); + if (!req) { + dev_err(pfvf->dev, "failed to allocate request for skcipher\n"); + err = -ENOMEM; + goto free_tfm; + } + + err = crypto_skcipher_setkey(tfm, sak, sak_len); + if (err) { + dev_err(pfvf->dev, "failed to set key for skcipher\n"); + goto free_req; + } + + /* build sg list */ + sg_init_one(&sg_src, data, CN10K_MAX_HASH_LEN); + sg_init_one(&sg_dst, hash, CN10K_MAX_HASH_LEN); + + skcipher_request_set_callback(req, 0, crypto_req_done, &wait); + skcipher_request_set_crypt(req, &sg_src, &sg_dst, + CN10K_MAX_HASH_LEN, NULL); + + err = crypto_skcipher_encrypt(req); + err = crypto_wait_req(err, &wait); + +free_req: + skcipher_request_free(req); +free_tfm: + crypto_free_skcipher(tfm); + return err; +} + static struct cn10k_mcs_txsc *cn10k_mcs_get_txsc(struct cn10k_mcs_cfg *cfg, struct macsec_secy *secy) { @@ -330,19 +381,53 @@ fail: return ret; } +static int cn10k_mcs_write_keys(struct otx2_nic *pfvf, + struct macsec_secy *secy, + struct mcs_sa_plcy_write_req *req, + u8 *sak, u8 *salt, ssci_t ssci) +{ + u8 hash_rev[CN10K_MAX_HASH_LEN]; + u8 sak_rev[CN10K_MAX_SAK_LEN]; + u8 salt_rev[MACSEC_SALT_LEN]; + u8 hash[CN10K_MAX_HASH_LEN]; + u32 ssci_63_32; + int err, i; + + err = cn10k_ecb_aes_encrypt(pfvf, sak, secy->key_len, hash); + if (err) { + dev_err(pfvf->dev, "Generating hash using ECB(AES) failed\n"); + return err; + } + + for (i = 0; i < secy->key_len; i++) + sak_rev[i] = sak[secy->key_len - 1 - i]; + + for (i = 0; i < CN10K_MAX_HASH_LEN; i++) + hash_rev[i] = hash[CN10K_MAX_HASH_LEN - 1 - i]; + + for (i = 0; i < MACSEC_SALT_LEN; i++) + salt_rev[i] = salt[MACSEC_SALT_LEN - 1 - i]; + + ssci_63_32 = (__force u32)cpu_to_be32((__force u32)ssci); + + memcpy(&req->plcy[0][0], sak_rev, secy->key_len); + memcpy(&req->plcy[0][4], hash_rev, CN10K_MAX_HASH_LEN); + memcpy(&req->plcy[0][6], salt_rev, MACSEC_SALT_LEN); + req->plcy[0][7] |= (u64)ssci_63_32 << 32; + + return 0; +} + static int cn10k_mcs_write_rx_sa_plcy(struct otx2_nic *pfvf, struct macsec_secy *secy, struct cn10k_mcs_rxsc *rxsc, u8 assoc_num, bool sa_in_use) { - unsigned char *src = rxsc->sa_key[assoc_num]; struct mcs_sa_plcy_write_req *plcy_req; - u8 *salt_p = rxsc->salt[assoc_num]; + u8 *sak = rxsc->sa_key[assoc_num]; + u8 *salt = rxsc->salt[assoc_num]; struct mcs_rx_sc_sa_map *map_req; struct mbox *mbox = &pfvf->mbox; - u64 ssci_salt_95_64 = 0; - u8 reg, key_len; - u64 salt_63_0; int ret; mutex_lock(&mbox->lock); @@ -360,20 +445,10 @@ static int cn10k_mcs_write_rx_sa_plcy(struct otx2_nic *pfvf, goto fail; } - for (reg = 0, key_len = 0; key_len < secy->key_len; key_len += 8) { - memcpy((u8 *)&plcy_req->plcy[0][reg], - (src + reg * 8), 8); - reg++; - } - - if (secy->xpn) { - memcpy((u8 *)&salt_63_0, salt_p, 8); - memcpy((u8 *)&ssci_salt_95_64, salt_p + 8, 4); - ssci_salt_95_64 |= (__force u64)rxsc->ssci[assoc_num] << 32; - - plcy_req->plcy[0][6] = salt_63_0; - plcy_req->plcy[0][7] = ssci_salt_95_64; - } + ret = cn10k_mcs_write_keys(pfvf, secy, plcy_req, sak, + salt, rxsc->ssci[assoc_num]); + if (ret) + goto fail; plcy_req->sa_index[0] = rxsc->hw_sa_id[assoc_num]; plcy_req->sa_cnt = 1; @@ -586,13 +661,10 @@ static int cn10k_mcs_write_tx_sa_plcy(struct otx2_nic *pfvf, struct cn10k_mcs_txsc *txsc, u8 assoc_num) { - unsigned char *src = txsc->sa_key[assoc_num]; struct mcs_sa_plcy_write_req *plcy_req; - u8 *salt_p = txsc->salt[assoc_num]; + u8 *sak = txsc->sa_key[assoc_num]; + u8 *salt = txsc->salt[assoc_num]; struct mbox *mbox = &pfvf->mbox; - u64 ssci_salt_95_64 = 0; - u8 reg, key_len; - u64 salt_63_0; int ret; mutex_lock(&mbox->lock); @@ -603,19 +675,10 @@ static int cn10k_mcs_write_tx_sa_plcy(struct otx2_nic *pfvf, goto fail; } - for (reg = 0, key_len = 0; key_len < secy->key_len; key_len += 8) { - memcpy((u8 *)&plcy_req->plcy[0][reg], (src + reg * 8), 8); - reg++; - } - - if (secy->xpn) { - memcpy((u8 *)&salt_63_0, salt_p, 8); - memcpy((u8 *)&ssci_salt_95_64, salt_p + 8, 4); - ssci_salt_95_64 |= (__force u64)txsc->ssci[assoc_num] << 32; - - plcy_req->plcy[0][6] = salt_63_0; - plcy_req->plcy[0][7] = ssci_salt_95_64; - } + ret = cn10k_mcs_write_keys(pfvf, secy, plcy_req, sak, + salt, txsc->ssci[assoc_num]); + if (ret) + goto fail; plcy_req->plcy[0][8] = assoc_num; plcy_req->sa_index[0] = txsc->hw_sa_id[assoc_num]; -- cgit v1.2.3 From 5e5265522a9a7f91d1b0bd411d634bdaf16c80cd Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 17 Jul 2023 14:44:44 +0000 Subject: tcp: annotate data-races around tcp_rsk(req)->txhash TCP request sockets are lockless, some of their fields can change while being read by another cpu as syzbot noticed. This is usually harmless, but we should annotate the known races. This patch takes care of tcp_rsk(req)->txhash, a separate one is needed for tcp_rsk(req)->ts_recent. BUG: KCSAN: data-race in tcp_make_synack / tcp_rtx_synack write to 0xffff8881362304bc of 4 bytes by task 32083 on cpu 1: tcp_rtx_synack+0x9d/0x2a0 net/ipv4/tcp_output.c:4213 inet_rtx_syn_ack+0x38/0x80 net/ipv4/inet_connection_sock.c:880 tcp_check_req+0x379/0xc70 net/ipv4/tcp_minisocks.c:665 tcp_v6_rcv+0x125b/0x1b20 net/ipv6/tcp_ipv6.c:1673 ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437 ip6_input_finish net/ipv6/ip6_input.c:482 [inline] NF_HOOK include/linux/netfilter.h:303 [inline] ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491 dst_input include/net/dst.h:468 [inline] ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:303 [inline] ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309 __netif_receive_skb_one_core net/core/dev.c:5452 [inline] __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566 netif_receive_skb_internal net/core/dev.c:5652 [inline] netif_receive_skb+0x4a/0x310 net/core/dev.c:5711 tun_rx_batched+0x3bf/0x400 tun_get_user+0x1d24/0x22b0 drivers/net/tun.c:1997 tun_chr_write_iter+0x18e/0x240 drivers/net/tun.c:2043 call_write_iter include/linux/fs.h:1871 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x4ab/0x7d0 fs/read_write.c:584 ksys_write+0xeb/0x1a0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x42/0x50 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff8881362304bc of 4 bytes by task 32078 on cpu 0: tcp_make_synack+0x367/0xb40 net/ipv4/tcp_output.c:3663 tcp_v6_send_synack+0x72/0x420 net/ipv6/tcp_ipv6.c:544 tcp_conn_request+0x11a8/0x1560 net/ipv4/tcp_input.c:7059 tcp_v6_conn_request+0x13f/0x180 net/ipv6/tcp_ipv6.c:1175 tcp_rcv_state_process+0x156/0x1de0 net/ipv4/tcp_input.c:6494 tcp_v6_do_rcv+0x98a/0xb70 net/ipv6/tcp_ipv6.c:1509 tcp_v6_rcv+0x17b8/0x1b20 net/ipv6/tcp_ipv6.c:1735 ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437 ip6_input_finish net/ipv6/ip6_input.c:482 [inline] NF_HOOK include/linux/netfilter.h:303 [inline] ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491 dst_input include/net/dst.h:468 [inline] ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:303 [inline] ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309 __netif_receive_skb_one_core net/core/dev.c:5452 [inline] __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566 netif_receive_skb_internal net/core/dev.c:5652 [inline] netif_receive_skb+0x4a/0x310 net/core/dev.c:5711 tun_rx_batched+0x3bf/0x400 tun_get_user+0x1d24/0x22b0 drivers/net/tun.c:1997 tun_chr_write_iter+0x18e/0x240 drivers/net/tun.c:2043 call_write_iter include/linux/fs.h:1871 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x4ab/0x7d0 fs/read_write.c:584 ksys_write+0xeb/0x1a0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x42/0x50 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x91d25731 -> 0xe79325cd Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 32078 Comm: syz-executor.4 Not tainted 6.5.0-rc1-syzkaller-00033-geb26cbb1a754 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023 Fixes: 58d607d3e52f ("tcp: provide skb->hash to synack packets") Signed-off-by: Eric Dumazet Reported-by: syzbot Reviewed-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20230717144445.653164-2-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/ipv4/tcp_ipv4.c | 3 ++- net/ipv4/tcp_minisocks.c | 2 +- net/ipv4/tcp_output.c | 4 ++-- net/ipv6/tcp_ipv6.c | 2 +- 4 files changed, 6 insertions(+), 5 deletions(-) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index fd365de4d5ff..fa04ff49100b 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -992,7 +992,8 @@ static void tcp_v4_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, 0, tcp_md5_do_lookup(sk, l3index, addr, AF_INET), inet_rsk(req)->no_srccheck ? IP_REPLY_ARG_NOSRCCHECK : 0, - ip_hdr(skb)->tos, tcp_rsk(req)->txhash); + ip_hdr(skb)->tos, + READ_ONCE(tcp_rsk(req)->txhash)); } /* diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index 04fc328727e6..ec05f277ce2e 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -528,7 +528,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk, newicsk->icsk_ack.lrcvtime = tcp_jiffies32; newtp->lsndtime = tcp_jiffies32; - newsk->sk_txhash = treq->txhash; + newsk->sk_txhash = READ_ONCE(treq->txhash); newtp->total_retrans = req->num_retrans; tcp_init_xmit_timers(newsk); diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 2cb39b6dad02..3b09cd13e2db 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3660,7 +3660,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst, rcu_read_lock(); md5 = tcp_rsk(req)->af_specific->req_md5_lookup(sk, req_to_sk(req)); #endif - skb_set_hash(skb, tcp_rsk(req)->txhash, PKT_HASH_TYPE_L4); + skb_set_hash(skb, READ_ONCE(tcp_rsk(req)->txhash), PKT_HASH_TYPE_L4); /* bpf program will be interested in the tcp_flags */ TCP_SKB_CB(skb)->tcp_flags = TCPHDR_SYN | TCPHDR_ACK; tcp_header_size = tcp_synack_options(sk, req, mss, skb, &opts, md5, @@ -4210,7 +4210,7 @@ int tcp_rtx_synack(const struct sock *sk, struct request_sock *req) /* Paired with WRITE_ONCE() in sock_setsockopt() */ if (READ_ONCE(sk->sk_txrehash) == SOCK_TXREHASH_ENABLED) - tcp_rsk(req)->txhash = net_tx_rndhash(); + WRITE_ONCE(tcp_rsk(req)->txhash, net_tx_rndhash()); res = af_ops->send_synack(sk, NULL, &fl, req, NULL, TCP_SYNACK_NORMAL, NULL); if (!res) { diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 40dd92a2f480..eb96a8010414 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1129,7 +1129,7 @@ static void tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, req->ts_recent, sk->sk_bound_dev_if, tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->saddr, l3index), ipv6_get_dsfield(ipv6_hdr(skb)), 0, sk->sk_priority, - tcp_rsk(req)->txhash); + READ_ONCE(tcp_rsk(req)->txhash)); } -- cgit v1.2.3 From eba20811f32652bc1a52d5e7cc403859b86390d9 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 17 Jul 2023 14:44:45 +0000 Subject: tcp: annotate data-races around tcp_rsk(req)->ts_recent TCP request sockets are lockless, tcp_rsk(req)->ts_recent can change while being read by another cpu as syzbot noticed. This is harmless, but we should annotate the known races. Note that tcp_check_req() changes req->ts_recent a bit early, we might change this in the future. BUG: KCSAN: data-race in tcp_check_req / tcp_check_req write to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 1: tcp_check_req+0x694/0xc70 net/ipv4/tcp_minisocks.c:762 tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071 ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:303 [inline] ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:468 [inline] ip_rcv_finish net/ipv4/ip_input.c:449 [inline] NF_HOOK include/linux/netfilter.h:303 [inline] ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core net/core/dev.c:5493 [inline] __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607 process_backlog+0x21f/0x380 net/core/dev.c:5935 __napi_poll+0x60/0x3b0 net/core/dev.c:6498 napi_poll net/core/dev.c:6565 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6698 __do_softirq+0xc1/0x265 kernel/softirq.c:571 do_softirq+0x7e/0xb0 kernel/softirq.c:472 __local_bh_enable_ip+0x64/0x70 kernel/softirq.c:396 local_bh_enable+0x1f/0x20 include/linux/bottom_half.h:33 rcu_read_unlock_bh include/linux/rcupdate.h:843 [inline] __dev_queue_xmit+0xabb/0x1d10 net/core/dev.c:4271 dev_queue_xmit include/linux/netdevice.h:3088 [inline] neigh_hh_output include/net/neighbour.h:528 [inline] neigh_output include/net/neighbour.h:542 [inline] ip_finish_output2+0x700/0x840 net/ipv4/ip_output.c:229 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:317 NF_HOOK_COND include/linux/netfilter.h:292 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:431 dst_output include/net/dst.h:458 [inline] ip_local_out net/ipv4/ip_output.c:126 [inline] __ip_queue_xmit+0xa4d/0xa70 net/ipv4/ip_output.c:533 ip_queue_xmit+0x38/0x40 net/ipv4/ip_output.c:547 __tcp_transmit_skb+0x1194/0x16e0 net/ipv4/tcp_output.c:1399 tcp_transmit_skb net/ipv4/tcp_output.c:1417 [inline] tcp_write_xmit+0x13ff/0x2fd0 net/ipv4/tcp_output.c:2693 __tcp_push_pending_frames+0x6a/0x1a0 net/ipv4/tcp_output.c:2877 tcp_push_pending_frames include/net/tcp.h:1952 [inline] __tcp_sock_set_cork net/ipv4/tcp.c:3336 [inline] tcp_sock_set_cork+0xe8/0x100 net/ipv4/tcp.c:3343 rds_tcp_xmit_path_complete+0x3b/0x40 net/rds/tcp_send.c:52 rds_send_xmit+0xf8d/0x1420 net/rds/send.c:422 rds_send_worker+0x42/0x1d0 net/rds/threads.c:200 process_one_work+0x3e6/0x750 kernel/workqueue.c:2408 worker_thread+0x5f2/0xa10 kernel/workqueue.c:2555 kthread+0x1d7/0x210 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 read to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 0: tcp_check_req+0x32a/0xc70 net/ipv4/tcp_minisocks.c:622 tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071 ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:303 [inline] ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:468 [inline] ip_rcv_finish net/ipv4/ip_input.c:449 [inline] NF_HOOK include/linux/netfilter.h:303 [inline] ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core net/core/dev.c:5493 [inline] __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607 process_backlog+0x21f/0x380 net/core/dev.c:5935 __napi_poll+0x60/0x3b0 net/core/dev.c:6498 napi_poll net/core/dev.c:6565 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6698 __do_softirq+0xc1/0x265 kernel/softirq.c:571 run_ksoftirqd+0x17/0x20 kernel/softirq.c:939 smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164 kthread+0x1d7/0x210 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 value changed: 0x1cd237f1 -> 0x1cd237f2 Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet Reported-by: syzbot Reviewed-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20230717144445.653164-3-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/ipv4/tcp_ipv4.c | 2 +- net/ipv4/tcp_minisocks.c | 9 ++++++--- net/ipv4/tcp_output.c | 2 +- net/ipv6/tcp_ipv6.c | 2 +- 4 files changed, 9 insertions(+), 6 deletions(-) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index fa04ff49100b..b5c81cf5b86f 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -988,7 +988,7 @@ static void tcp_v4_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, tcp_rsk(req)->rcv_nxt, req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale, tcp_time_stamp_raw() + tcp_rsk(req)->ts_off, - req->ts_recent, + READ_ONCE(req->ts_recent), 0, tcp_md5_do_lookup(sk, l3index, addr, AF_INET), inet_rsk(req)->no_srccheck ? IP_REPLY_ARG_NOSRCCHECK : 0, diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index ec05f277ce2e..c8f2aa003387 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -555,7 +555,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk, newtp->max_window = newtp->snd_wnd; if (newtp->rx_opt.tstamp_ok) { - newtp->rx_opt.ts_recent = req->ts_recent; + newtp->rx_opt.ts_recent = READ_ONCE(req->ts_recent); newtp->rx_opt.ts_recent_stamp = ktime_get_seconds(); newtp->tcp_header_len = sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED; } else { @@ -619,7 +619,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb, tcp_parse_options(sock_net(sk), skb, &tmp_opt, 0, NULL); if (tmp_opt.saw_tstamp) { - tmp_opt.ts_recent = req->ts_recent; + tmp_opt.ts_recent = READ_ONCE(req->ts_recent); if (tmp_opt.rcv_tsecr) tmp_opt.rcv_tsecr -= tcp_rsk(req)->ts_off; /* We do not store true stamp, but it is not required, @@ -758,8 +758,11 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb, /* In sequence, PAWS is OK. */ + /* TODO: We probably should defer ts_recent change once + * we take ownership of @req. + */ if (tmp_opt.saw_tstamp && !after(TCP_SKB_CB(skb)->seq, tcp_rsk(req)->rcv_nxt)) - req->ts_recent = tmp_opt.rcv_tsval; + WRITE_ONCE(req->ts_recent, tmp_opt.rcv_tsval); if (TCP_SKB_CB(skb)->seq == tcp_rsk(req)->rcv_isn) { /* Truncate SYN, it is out of window starting diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 3b09cd13e2db..51d8638d4b4c 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -878,7 +878,7 @@ static unsigned int tcp_synack_options(const struct sock *sk, if (likely(ireq->tstamp_ok)) { opts->options |= OPTION_TS; opts->tsval = tcp_skb_timestamp(skb) + tcp_rsk(req)->ts_off; - opts->tsecr = req->ts_recent; + opts->tsecr = READ_ONCE(req->ts_recent); remaining -= TCPOLEN_TSTAMP_ALIGNED; } if (likely(ireq->sack_ok)) { diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index eb96a8010414..4714eb695913 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1126,7 +1126,7 @@ static void tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, tcp_rsk(req)->rcv_nxt, req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale, tcp_time_stamp_raw() + tcp_rsk(req)->ts_off, - req->ts_recent, sk->sk_bound_dev_if, + READ_ONCE(req->ts_recent), sk->sk_bound_dev_if, tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->saddr, l3index), ipv6_get_dsfield(ipv6_hdr(skb)), 0, sk->sk_priority, READ_ONCE(tcp_rsk(req)->txhash)); -- cgit v1.2.3 From daa751444fd9d4184270b1479d8af49aaf1a1ee6 Mon Sep 17 00:00:00 2001 From: Wang Ming Date: Mon, 17 Jul 2023 17:59:19 +0800 Subject: net: ipv4: Use kfree_sensitive instead of kfree key might contain private part of the key, so better use kfree_sensitive to free it. Fixes: 38320c70d282 ("[IPSEC]: Use crypto_aead and authenc in ESP") Signed-off-by: Wang Ming Reviewed-by: Tariq Toukan Reviewed-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/esp4.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c index ba06ed42e428..2be2d4922557 100644 --- a/net/ipv4/esp4.c +++ b/net/ipv4/esp4.c @@ -1132,7 +1132,7 @@ static int esp_init_authenc(struct xfrm_state *x, err = crypto_aead_setkey(aead, key, keylen); free_key: - kfree(key); + kfree_sensitive(key); error: return err; -- cgit v1.2.3 From 4258faa130be4ea43e5e2d839467da421b8ff274 Mon Sep 17 00:00:00 2001 From: Yuanjun Gong Date: Mon, 17 Jul 2023 22:45:19 +0800 Subject: net:ipv6: check return value of pskb_trim() goto tx_err if an unexpected result is returned by pskb_tirm() in ip6erspan_tunnel_xmit(). Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Yuanjun Gong Reviewed-by: David Ahern Reviewed-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv6/ip6_gre.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index da80974ad23a..070d87abf7c0 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -955,7 +955,8 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb, goto tx_err; if (skb->len > dev->mtu + dev->hard_header_len) { - pskb_trim(skb, dev->mtu + dev->hard_header_len); + if (pskb_trim(skb, dev->mtu + dev->hard_header_len)) + goto tx_err; truncate = true; } -- cgit v1.2.3 From 78a93c31003cc53aca5d67b1bbe2d5b9fc37cc4d Mon Sep 17 00:00:00 2001 From: Yuanjun Gong Date: Mon, 17 Jul 2023 22:46:21 +0800 Subject: drivers: net: fix return value check in emac_tso_csum() in emac_tso_csum(), return an error code if an unexpected value is returned by pskb_trim(). Signed-off-by: Yuanjun Gong Reviewed-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- drivers/net/ethernet/qualcomm/emac/emac-mac.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/emac/emac-mac.c b/drivers/net/ethernet/qualcomm/emac/emac-mac.c index 0d80447d4d3b..d5c688a8d7be 100644 --- a/drivers/net/ethernet/qualcomm/emac/emac-mac.c +++ b/drivers/net/ethernet/qualcomm/emac/emac-mac.c @@ -1260,8 +1260,11 @@ static int emac_tso_csum(struct emac_adapter *adpt, if (skb->protocol == htons(ETH_P_IP)) { u32 pkt_len = ((unsigned char *)ip_hdr(skb) - skb->data) + ntohs(ip_hdr(skb)->tot_len); - if (skb->len > pkt_len) - pskb_trim(skb, pkt_len); + if (skb->len > pkt_len) { + ret = pskb_trim(skb, pkt_len); + if (unlikely(ret)) + return ret; + } } hdr_len = skb_tcp_all_headers(skb); -- cgit v1.2.3 From bce5603365d8184734ba7e6b22e74bd2c90a7167 Mon Sep 17 00:00:00 2001 From: Yuanjun Gong Date: Mon, 17 Jul 2023 22:46:52 +0800 Subject: drivers:net: fix return value check in ocelot_fdma_receive_skb ocelot_fdma_receive_skb should return false if an unexpected value is returned by pskb_trim. Signed-off-by: Yuanjun Gong Reviewed-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- drivers/net/ethernet/mscc/ocelot_fdma.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.c b/drivers/net/ethernet/mscc/ocelot_fdma.c index 8e3894cf5f7c..83a3ce0c568e 100644 --- a/drivers/net/ethernet/mscc/ocelot_fdma.c +++ b/drivers/net/ethernet/mscc/ocelot_fdma.c @@ -368,7 +368,8 @@ static bool ocelot_fdma_receive_skb(struct ocelot *ocelot, struct sk_buff *skb) if (unlikely(!ndev)) return false; - pskb_trim(skb, skb->len - ETH_FCS_LEN); + if (pskb_trim(skb, skb->len - ETH_FCS_LEN)) + return false; skb->dev = ndev; skb->protocol = eth_type_trans(skb, skb->dev); -- cgit v1.2.3 From 02d84f3eb53a5be982b17c88410fa6c58806356b Mon Sep 17 00:00:00 2001 From: Yuanjun Gong Date: Mon, 17 Jul 2023 22:49:02 +0800 Subject: ipv4: ip_gre: fix return value check in erspan_fb_xmit() goto err_free_skb if an unexpected result is returned by pskb_tirm() in erspan_fb_xmit(). Signed-off-by: Yuanjun Gong Reviewed-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/ip_gre.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 81a1cce1a7d1..914cc941af55 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -548,7 +548,8 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev) goto err_free_skb; if (skb->len > dev->mtu + dev->hard_header_len) { - pskb_trim(skb, dev->mtu + dev->hard_header_len); + if (pskb_trim(skb, dev->mtu + dev->hard_header_len)) + goto err_free_skb; truncate = true; } -- cgit v1.2.3 From aa7cb3789b429d4fdfbe767e0e0cf8c769299d7a Mon Sep 17 00:00:00 2001 From: Yuanjun Gong Date: Mon, 17 Jul 2023 22:49:18 +0800 Subject: ipv4: ip_gre: fix return value check in erspan_xmit() goto free_skb if an unexpected result is returned by pskb_tirm() in erspan_xmit(). Signed-off-by: Yuanjun Gong Reviewed-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- net/ipv4/ip_gre.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 914cc941af55..22a26d1d29a0 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -690,7 +690,8 @@ static netdev_tx_t erspan_xmit(struct sk_buff *skb, goto free_skb; if (skb->len > dev->mtu + dev->hard_header_len) { - pskb_trim(skb, dev->mtu + dev->hard_header_len); + if (pskb_trim(skb, dev->mtu + dev->hard_header_len)) + goto free_skb; truncate = true; } -- cgit v1.2.3 From 81b3ade5d2b98ad6e0a473b0e1e420a801275592 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Mon, 17 Jul 2023 14:59:18 -0700 Subject: Revert "tcp: avoid the lookup process failing to get sk in ehash table" This reverts commit 3f4ca5fafc08881d7a57daa20449d171f2887043. Commit 3f4ca5fafc08 ("tcp: avoid the lookup process failing to get sk in ehash table") reversed the order in how a socket is inserted into ehash to fix an issue that ehash-lookup could fail when reqsk/full sk/twsk are swapped. However, it introduced another lookup failure. The full socket in ehash is allocated from a slab with SLAB_TYPESAFE_BY_RCU and does not have SOCK_RCU_FREE, so the socket could be reused even while it is being referenced on another CPU doing RCU lookup. Let's say a socket is reused and inserted into the same hash bucket during lookup. After the blamed commit, a new socket is inserted at the end of the list. If that happens, we will skip sockets placed after the previous position of the reused socket, resulting in ehash lookup failure. As described in Documentation/RCU/rculist_nulls.rst, we should insert a new socket at the head of the list to avoid such an issue. This issue, the swap-lookup-failure, and another variant reported in [0] can all be handled properly by adding a locked ehash lookup suggested by Eric Dumazet [1]. However, this issue could occur for every packet, thus more likely than the other two races, so let's revert the change for now. Link: https://lore.kernel.org/netdev/20230606064306.9192-1-duanmuquan@baidu.com/ [0] Link: https://lore.kernel.org/netdev/CANn89iK8snOz8TYOhhwfimC7ykYA78GA3Nyv8x06SZYa1nKdyA@mail.gmail.com/ [1] Fixes: 3f4ca5fafc08 ("tcp: avoid the lookup process failing to get sk in ehash table") Signed-off-by: Kuniyuki Iwashima Link: https://lore.kernel.org/r/20230717215918.15723-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski --- net/ipv4/inet_hashtables.c | 17 ++--------------- net/ipv4/inet_timewait_sock.c | 8 ++++---- 2 files changed, 6 insertions(+), 19 deletions(-) diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index e7391bf310a7..0819d6001b9a 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -650,20 +650,8 @@ bool inet_ehash_insert(struct sock *sk, struct sock *osk, bool *found_dup_sk) spin_lock(lock); if (osk) { WARN_ON_ONCE(sk->sk_hash != osk->sk_hash); - ret = sk_hashed(osk); - if (ret) { - /* Before deleting the node, we insert a new one to make - * sure that the look-up-sk process would not miss either - * of them and that at least one node would exist in ehash - * table all the time. Otherwise there's a tiny chance - * that lookup process could find nothing in ehash table. - */ - __sk_nulls_add_node_tail_rcu(sk, list); - sk_nulls_del_node_init_rcu(osk); - } - goto unlock; - } - if (found_dup_sk) { + ret = sk_nulls_del_node_init_rcu(osk); + } else if (found_dup_sk) { *found_dup_sk = inet_ehash_lookup_by_sk(sk, list); if (*found_dup_sk) ret = false; @@ -672,7 +660,6 @@ bool inet_ehash_insert(struct sock *sk, struct sock *osk, bool *found_dup_sk) if (ret) __sk_nulls_add_node_rcu(sk, list); -unlock: spin_unlock(lock); return ret; diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c index 40052414c7c7..2c1b245dba8e 100644 --- a/net/ipv4/inet_timewait_sock.c +++ b/net/ipv4/inet_timewait_sock.c @@ -88,10 +88,10 @@ void inet_twsk_put(struct inet_timewait_sock *tw) } EXPORT_SYMBOL_GPL(inet_twsk_put); -static void inet_twsk_add_node_tail_rcu(struct inet_timewait_sock *tw, - struct hlist_nulls_head *list) +static void inet_twsk_add_node_rcu(struct inet_timewait_sock *tw, + struct hlist_nulls_head *list) { - hlist_nulls_add_tail_rcu(&tw->tw_node, list); + hlist_nulls_add_head_rcu(&tw->tw_node, list); } static void inet_twsk_add_bind_node(struct inet_timewait_sock *tw, @@ -144,7 +144,7 @@ void inet_twsk_hashdance(struct inet_timewait_sock *tw, struct sock *sk, spin_lock(lock); - inet_twsk_add_node_tail_rcu(tw, &ehead->chain); + inet_twsk_add_node_rcu(tw, &ehead->chain); /* Step 3: Remove SK from hash chain */ if (__sk_nulls_del_node_init_rcu(sk)) -- cgit v1.2.3 From cf2ffdea0839398cb0551762af7f5efb0a6e0fea Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Tue, 18 Jul 2023 13:11:31 +0200 Subject: r8169: revert 2ab19de62d67 ("r8169: remove ASPM restrictions now that ASPM is disabled during NAPI poll") There have been reports that on a number of systems this change breaks network connectivity. Therefore effectively revert it. Mainly affected seem to be systems where BIOS denies ASPM access to OS. Due to later changes we can't do a direct revert. Fixes: 2ab19de62d67 ("r8169: remove ASPM restrictions now that ASPM is disabled during NAPI poll") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/netdev/e47bac0d-e802-65e1-b311-6acb26d5cf10@freenet.de/T/ Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217596 Signed-off-by: Heiner Kallweit Link: https://lore.kernel.org/r/57f13ec0-b216-d5d8-363d-5b05528ec5fb@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/realtek/r8169_main.c | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index fce4a2b908c2..8a8b7d8a5c3f 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -623,6 +623,7 @@ struct rtl8169_private { int cfg9346_usage_count; unsigned supports_gmii:1; + unsigned aspm_manageable:1; dma_addr_t counters_phys_addr; struct rtl8169_counters *counters; struct rtl8169_tc_offsets tc_offset; @@ -2746,7 +2747,8 @@ static void rtl_hw_aspm_clkreq_enable(struct rtl8169_private *tp, bool enable) if (tp->mac_version < RTL_GIGA_MAC_VER_32) return; - if (enable) { + /* Don't enable ASPM in the chip if OS can't control ASPM */ + if (enable && tp->aspm_manageable) { /* On these chip versions ASPM can even harm * bus communication of other PCI devices. */ @@ -5165,6 +5167,16 @@ done: rtl_rar_set(tp, mac_addr); } +/* register is set if system vendor successfully tested ASPM 1.2 */ +static bool rtl_aspm_is_safe(struct rtl8169_private *tp) +{ + if (tp->mac_version >= RTL_GIGA_MAC_VER_61 && + r8168_mac_ocp_read(tp, 0xc0b2) & 0xf) + return true; + + return false; +} + static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) { struct rtl8169_private *tp; @@ -5234,6 +5246,19 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) xid); tp->mac_version = chipset; + /* Disable ASPM L1 as that cause random device stop working + * problems as well as full system hangs for some PCIe devices users. + * Chips from RTL8168h partially have issues with L1.2, but seem + * to work fine with L1 and L1.1. + */ + if (rtl_aspm_is_safe(tp)) + rc = 0; + else if (tp->mac_version >= RTL_GIGA_MAC_VER_46) + rc = pci_disable_link_state(pdev, PCIE_LINK_STATE_L1_2); + else + rc = pci_disable_link_state(pdev, PCIE_LINK_STATE_L1); + tp->aspm_manageable = !rc; + tp->dash_type = rtl_check_dash(tp); tp->cp_cmd = RTL_R16(tp, CPlusCmd) & CPCMD_MASK; -- cgit v1.2.3 From e31a9fedc7d8d80722b19628e66fcb5a36981780 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Tue, 18 Jul 2023 13:12:32 +0200 Subject: Revert "r8169: disable ASPM during NAPI poll" This reverts commit e1ed3e4d91112027b90c7ee61479141b3f948e6a. Turned out the change causes a performance regression. Link: https://lore.kernel.org/netdev/20230713124914.GA12924@green245/T/ Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit Link: https://lore.kernel.org/r/055c6bc2-74fa-8c67-9897-3f658abb5ae7@gmail.com Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/realtek/r8169_main.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 8a8b7d8a5c3f..5eb50b265c0b 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -4523,10 +4523,6 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance) } if (napi_schedule_prep(&tp->napi)) { - rtl_unlock_config_regs(tp); - rtl_hw_aspm_clkreq_enable(tp, false); - rtl_lock_config_regs(tp); - rtl_irq_disable(tp); __napi_schedule(&tp->napi); } @@ -4586,14 +4582,9 @@ static int rtl8169_poll(struct napi_struct *napi, int budget) work_done = rtl_rx(dev, tp, budget); - if (work_done < budget && napi_complete_done(napi, work_done)) { + if (work_done < budget && napi_complete_done(napi, work_done)) rtl_irq_enable(tp); - rtl_unlock_config_regs(tp); - rtl_hw_aspm_clkreq_enable(tp, true); - rtl_lock_config_regs(tp); - } - return work_done; } -- cgit v1.2.3 From 9f9d4c1a2e82174a4e799ec405284a2b0de32b6a Mon Sep 17 00:00:00 2001 From: Daniel Golle Date: Wed, 19 Jul 2023 01:39:36 +0100 Subject: net: ethernet: mtk_eth_soc: always mtk_get_ib1_pkt_type entries and bind debugfs files would display wrong data on NETSYS_V2 and later because instead of using mtk_get_ib1_pkt_type the driver would use MTK_FOE_IB1_PACKET_TYPE which corresponds to NETSYS_V1(.x) SoCs. Use mtk_get_ib1_pkt_type so entries and bind records display correctly. Fixes: 03a3180e5c09e ("net: ethernet: mtk_eth_soc: introduce flow offloading support for mt7986") Signed-off-by: Daniel Golle Acked-by: Lorenzo Bianconi Link: https://lore.kernel.org/r/c0ae03d0182f4d27b874cbdf0059bc972c317f3c.1689727134.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski --- drivers/net/ethernet/mediatek/mtk_ppe_debugfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mediatek/mtk_ppe_debugfs.c b/drivers/net/ethernet/mediatek/mtk_ppe_debugfs.c index 316fe2e70fea..1a97feca77f2 100644 --- a/drivers/net/ethernet/mediatek/mtk_ppe_debugfs.c +++ b/drivers/net/ethernet/mediatek/mtk_ppe_debugfs.c @@ -98,7 +98,7 @@ mtk_ppe_debugfs_foe_show(struct seq_file *m, void *private, bool bind) acct = mtk_foe_entry_get_mib(ppe, i, NULL); - type = FIELD_GET(MTK_FOE_IB1_PACKET_TYPE, entry->ib1); + type = mtk_get_ib1_pkt_type(ppe->eth, entry->ib1); seq_printf(m, "%05x %s %7s", i, mtk_foe_entry_state_str(state), mtk_foe_pkt_type_str(type)); -- cgit v1.2.3 From 9b64e93e83c2145a750e780198b41d612e3dfa5d Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 18 Jul 2023 10:41:49 -0700 Subject: llc: Check netns in llc_dgram_match(). We will remove this restriction in llc_rcv() soon, which means that the protocol handler must be aware of netns. if (!net_eq(dev_net(dev), &init_net)) goto drop; llc_rcv() fetches llc_type_handlers[llc_pdu_type(skb) - 1] and calls it if not NULL. If the PDU type is LLC_DEST_SAP, llc_sap_handler() is called to pass skb to corresponding sockets. Then, we must look up a proper socket in the same netns with skb->dev. If the destination is a multicast address, llc_sap_handler() calls llc_sap_mcast(). It calculates a hash based on DSAP and skb->dev->ifindex, iterates on a socket list, and calls llc_mcast_match() to check if the socket is the correct destination. Then, llc_mcast_match() checks if skb->dev matches with llc_sk(sk)->dev. So, we need not check netns here. OTOH, if the destination is a unicast address, llc_sap_handler() calls llc_lookup_dgram() to look up a socket, but it does not check the netns. Therefore, we need to add netns check in llc_lookup_dgram(). Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni --- net/llc/llc_sap.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/net/llc/llc_sap.c b/net/llc/llc_sap.c index 6805ce43a055..116c0e479183 100644 --- a/net/llc/llc_sap.c +++ b/net/llc/llc_sap.c @@ -294,25 +294,29 @@ static void llc_sap_rcv(struct llc_sap *sap, struct sk_buff *skb, static inline bool llc_dgram_match(const struct llc_sap *sap, const struct llc_addr *laddr, - const struct sock *sk) + const struct sock *sk, + const struct net *net) { struct llc_sock *llc = llc_sk(sk); return sk->sk_type == SOCK_DGRAM && - llc->laddr.lsap == laddr->lsap && - ether_addr_equal(llc->laddr.mac, laddr->mac); + net_eq(sock_net(sk), net) && + llc->laddr.lsap == laddr->lsap && + ether_addr_equal(llc->laddr.mac, laddr->mac); } /** * llc_lookup_dgram - Finds dgram socket for the local sap/mac * @sap: SAP * @laddr: address of local LLC (MAC + SAP) + * @net: netns to look up a socket in * * Search socket list of the SAP and finds connection using the local * mac, and local sap. Returns pointer for socket found, %NULL otherwise. */ static struct sock *llc_lookup_dgram(struct llc_sap *sap, - const struct llc_addr *laddr) + const struct llc_addr *laddr, + const struct net *net) { struct sock *rc; struct hlist_nulls_node *node; @@ -322,12 +326,12 @@ static struct sock *llc_lookup_dgram(struct llc_sap *sap, rcu_read_lock_bh(); again: sk_nulls_for_each_rcu(rc, node, laddr_hb) { - if (llc_dgram_match(sap, laddr, rc)) { + if (llc_dgram_match(sap, laddr, rc, net)) { /* Extra checks required by SLAB_TYPESAFE_BY_RCU */ if (unlikely(!refcount_inc_not_zero(&rc->sk_refcnt))) goto again; if (unlikely(llc_sk(rc)->sap != sap || - !llc_dgram_match(sap, laddr, rc))) { + !llc_dgram_match(sap, laddr, rc, net))) { sock_put(rc); continue; } @@ -429,7 +433,7 @@ void llc_sap_handler(struct llc_sap *sap, struct sk_buff *skb) llc_sap_mcast(sap, &laddr, skb); kfree_skb(skb); } else { - struct sock *sk = llc_lookup_dgram(sap, &laddr); + struct sock *sk = llc_lookup_dgram(sap, &laddr, dev_net(skb->dev)); if (sk) { llc_sap_rcv(sap, skb, sk); sock_put(sk); -- cgit v1.2.3 From 97b1d320f48c21e40cc42b4ac033f2520f9ecc5c Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 18 Jul 2023 10:41:50 -0700 Subject: llc: Check netns in llc_estab_match() and llc_listener_match(). We will remove this restriction in llc_rcv() in the following patch, which means that the protocol handler must be aware of netns. if (!net_eq(dev_net(dev), &init_net)) goto drop; llc_rcv() fetches llc_type_handlers[llc_pdu_type(skb) - 1] and calls it if not NULL. If the PDU type is LLC_DEST_CONN, llc_conn_handler() is called to pass skb to corresponding sockets. Then, we must look up a proper socket in the same netns with skb->dev. llc_conn_handler() calls __llc_lookup() to look up a established or litening socket by __llc_lookup_established() and llc_lookup_listener(). Both functions iterate on a list and call llc_estab_match() or llc_listener_match() to check if the socket is the correct destination. However, these functions do not check netns. Also, bind() and connect() call llc_establish_connection(), which finally calls __llc_lookup_established(), to check if there is a conflicting socket. Let's test netns in llc_estab_match() and llc_listener_match(). Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni --- include/net/llc_conn.h | 2 +- net/llc/af_llc.c | 2 +- net/llc/llc_conn.c | 49 ++++++++++++++++++++++++++++++------------------- net/llc/llc_if.c | 2 +- 4 files changed, 33 insertions(+), 22 deletions(-) diff --git a/include/net/llc_conn.h b/include/net/llc_conn.h index 2c1ea3414640..374411b3066c 100644 --- a/include/net/llc_conn.h +++ b/include/net/llc_conn.h @@ -111,7 +111,7 @@ void llc_conn_resend_i_pdu_as_cmd(struct sock *sk, u8 nr, u8 first_p_bit); void llc_conn_resend_i_pdu_as_rsp(struct sock *sk, u8 nr, u8 first_f_bit); int llc_conn_remove_acked_pdus(struct sock *conn, u8 nr, u16 *how_many_unacked); struct sock *llc_lookup_established(struct llc_sap *sap, struct llc_addr *daddr, - struct llc_addr *laddr); + struct llc_addr *laddr, const struct net *net); void llc_sap_add_socket(struct llc_sap *sap, struct sock *sk); void llc_sap_remove_socket(struct llc_sap *sap, struct sock *sk); diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c index 57c35c960b2c..9b06c380866b 100644 --- a/net/llc/af_llc.c +++ b/net/llc/af_llc.c @@ -402,7 +402,7 @@ static int llc_ui_bind(struct socket *sock, struct sockaddr *uaddr, int addrlen) memcpy(laddr.mac, addr->sllc_mac, IFHWADDRLEN); laddr.lsap = addr->sllc_sap; rc = -EADDRINUSE; /* mac + sap clash. */ - ask = llc_lookup_established(sap, &daddr, &laddr); + ask = llc_lookup_established(sap, &daddr, &laddr, &init_net); if (ask) { sock_put(ask); goto out_put; diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c index 912aa9bd5e29..d037009ee10f 100644 --- a/net/llc/llc_conn.c +++ b/net/llc/llc_conn.c @@ -453,11 +453,13 @@ static int llc_exec_conn_trans_actions(struct sock *sk, static inline bool llc_estab_match(const struct llc_sap *sap, const struct llc_addr *daddr, const struct llc_addr *laddr, - const struct sock *sk) + const struct sock *sk, + const struct net *net) { struct llc_sock *llc = llc_sk(sk); - return llc->laddr.lsap == laddr->lsap && + return net_eq(sock_net(sk), net) && + llc->laddr.lsap == laddr->lsap && llc->daddr.lsap == daddr->lsap && ether_addr_equal(llc->laddr.mac, laddr->mac) && ether_addr_equal(llc->daddr.mac, daddr->mac); @@ -468,6 +470,7 @@ static inline bool llc_estab_match(const struct llc_sap *sap, * @sap: SAP * @daddr: address of remote LLC (MAC + SAP) * @laddr: address of local LLC (MAC + SAP) + * @net: netns to look up a socket in * * Search connection list of the SAP and finds connection using the remote * mac, remote sap, local mac, and local sap. Returns pointer for @@ -476,7 +479,8 @@ static inline bool llc_estab_match(const struct llc_sap *sap, */ static struct sock *__llc_lookup_established(struct llc_sap *sap, struct llc_addr *daddr, - struct llc_addr *laddr) + struct llc_addr *laddr, + const struct net *net) { struct sock *rc; struct hlist_nulls_node *node; @@ -486,12 +490,12 @@ static struct sock *__llc_lookup_established(struct llc_sap *sap, rcu_read_lock(); again: sk_nulls_for_each_rcu(rc, node, laddr_hb) { - if (llc_estab_match(sap, daddr, laddr, rc)) { + if (llc_estab_match(sap, daddr, laddr, rc, net)) { /* Extra checks required by SLAB_TYPESAFE_BY_RCU */ if (unlikely(!refcount_inc_not_zero(&rc->sk_refcnt))) goto again; if (unlikely(llc_sk(rc)->sap != sap || - !llc_estab_match(sap, daddr, laddr, rc))) { + !llc_estab_match(sap, daddr, laddr, rc, net))) { sock_put(rc); continue; } @@ -513,29 +517,33 @@ found: struct sock *llc_lookup_established(struct llc_sap *sap, struct llc_addr *daddr, - struct llc_addr *laddr) + struct llc_addr *laddr, + const struct net *net) { struct sock *sk; local_bh_disable(); - sk = __llc_lookup_established(sap, daddr, laddr); + sk = __llc_lookup_established(sap, daddr, laddr, net); local_bh_enable(); return sk; } static inline bool llc_listener_match(const struct llc_sap *sap, const struct llc_addr *laddr, - const struct sock *sk) + const struct sock *sk, + const struct net *net) { struct llc_sock *llc = llc_sk(sk); - return sk->sk_type == SOCK_STREAM && sk->sk_state == TCP_LISTEN && + return net_eq(sock_net(sk), net) && + sk->sk_type == SOCK_STREAM && sk->sk_state == TCP_LISTEN && llc->laddr.lsap == laddr->lsap && ether_addr_equal(llc->laddr.mac, laddr->mac); } static struct sock *__llc_lookup_listener(struct llc_sap *sap, - struct llc_addr *laddr) + struct llc_addr *laddr, + const struct net *net) { struct sock *rc; struct hlist_nulls_node *node; @@ -545,12 +553,12 @@ static struct sock *__llc_lookup_listener(struct llc_sap *sap, rcu_read_lock(); again: sk_nulls_for_each_rcu(rc, node, laddr_hb) { - if (llc_listener_match(sap, laddr, rc)) { + if (llc_listener_match(sap, laddr, rc, net)) { /* Extra checks required by SLAB_TYPESAFE_BY_RCU */ if (unlikely(!refcount_inc_not_zero(&rc->sk_refcnt))) goto again; if (unlikely(llc_sk(rc)->sap != sap || - !llc_listener_match(sap, laddr, rc))) { + !llc_listener_match(sap, laddr, rc, net))) { sock_put(rc); continue; } @@ -574,6 +582,7 @@ found: * llc_lookup_listener - Finds listener for local MAC + SAP * @sap: SAP * @laddr: address of local LLC (MAC + SAP) + * @net: netns to look up a socket in * * Search connection list of the SAP and finds connection listening on * local mac, and local sap. Returns pointer for parent socket found, @@ -581,24 +590,26 @@ found: * Caller has to make sure local_bh is disabled. */ static struct sock *llc_lookup_listener(struct llc_sap *sap, - struct llc_addr *laddr) + struct llc_addr *laddr, + const struct net *net) { + struct sock *rc = __llc_lookup_listener(sap, laddr, net); static struct llc_addr null_addr; - struct sock *rc = __llc_lookup_listener(sap, laddr); if (!rc) - rc = __llc_lookup_listener(sap, &null_addr); + rc = __llc_lookup_listener(sap, &null_addr, net); return rc; } static struct sock *__llc_lookup(struct llc_sap *sap, struct llc_addr *daddr, - struct llc_addr *laddr) + struct llc_addr *laddr, + const struct net *net) { - struct sock *sk = __llc_lookup_established(sap, daddr, laddr); + struct sock *sk = __llc_lookup_established(sap, daddr, laddr, net); - return sk ? : llc_lookup_listener(sap, laddr); + return sk ? : llc_lookup_listener(sap, laddr, net); } /** @@ -776,7 +787,7 @@ void llc_conn_handler(struct llc_sap *sap, struct sk_buff *skb) llc_pdu_decode_da(skb, daddr.mac); llc_pdu_decode_dsap(skb, &daddr.lsap); - sk = __llc_lookup(sap, &saddr, &daddr); + sk = __llc_lookup(sap, &saddr, &daddr, dev_net(skb->dev)); if (!sk) goto drop; diff --git a/net/llc/llc_if.c b/net/llc/llc_if.c index dde9bf08a593..58a5f419adc6 100644 --- a/net/llc/llc_if.c +++ b/net/llc/llc_if.c @@ -92,7 +92,7 @@ int llc_establish_connection(struct sock *sk, const u8 *lmac, u8 *dmac, u8 dsap) daddr.lsap = dsap; memcpy(daddr.mac, dmac, sizeof(daddr.mac)); memcpy(laddr.mac, lmac, sizeof(laddr.mac)); - existing = llc_lookup_established(llc->sap, &daddr, &laddr); + existing = llc_lookup_established(llc->sap, &daddr, &laddr, sock_net(sk)); if (existing) { if (existing->sk_state == TCP_ESTABLISHED) { sk = existing; -- cgit v1.2.3 From 6631463b6e6673916d2481f692938f393148aa82 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 18 Jul 2023 10:41:51 -0700 Subject: llc: Don't drop packet from non-root netns. Now these upper layer protocol handlers can be called from llc_rcv() as sap->rcv_func(), which is registered by llc_sap_open(). * function which is passed to register_8022_client() -> no in-kernel user calls register_8022_client(). * snap_rcv() `- proto->rcvfunc() : registered by register_snap_client() -> aarp_rcv() and atalk_rcv() drop packets from non-root netns * stp_pdu_rcv() `- garp_protos[]->rcv() : registered by stp_proto_register() -> garp_pdu_rcv() and br_stp_rcv() are netns-aware So, we can safely remove the netns restriction in llc_rcv(). Fixes: e730c15519d0 ("[NET]: Make packet reception network namespace safe") Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni --- net/llc/llc_input.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/net/llc/llc_input.c b/net/llc/llc_input.c index c309b72a5877..7cac441862e2 100644 --- a/net/llc/llc_input.c +++ b/net/llc/llc_input.c @@ -163,9 +163,6 @@ int llc_rcv(struct sk_buff *skb, struct net_device *dev, void (*sta_handler)(struct sk_buff *skb); void (*sap_handler)(struct llc_sap *sap, struct sk_buff *skb); - if (!net_eq(dev_net(dev), &init_net)) - goto drop; - /* * When the interface is in promisc. mode, drop all the crap that it * receives, do not try to analyse it. -- cgit v1.2.3 From 7ebd00a5a20c48e6020d49a3b2afb3cdfd2da8b7 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 18 Jul 2023 10:41:52 -0700 Subject: Revert "bridge: Add extack warning when enabling STP in netns." This reverts commit 56a16035bb6effb37177867cea94c13a8382f745. Since the previous commit, STP works on bridge in netns. # unshare -n # ip link add br0 type bridge # ip link add veth0 type veth peer name veth1 # ip link set veth0 master br0 up [ 50.558135] br0: port 1(veth0) entered blocking state [ 50.558366] br0: port 1(veth0) entered disabled state [ 50.558798] veth0: entered allmulticast mode [ 50.564401] veth0: entered promiscuous mode # ip link set veth1 master br0 up [ 54.215487] br0: port 2(veth1) entered blocking state [ 54.215657] br0: port 2(veth1) entered disabled state [ 54.215848] veth1: entered allmulticast mode [ 54.219577] veth1: entered promiscuous mode # ip link set br0 type bridge stp_state 1 # ip link set br0 up [ 61.960726] br0: port 2(veth1) entered blocking state [ 61.961097] br0: port 2(veth1) entered listening state [ 61.961495] br0: port 1(veth0) entered blocking state [ 61.961653] br0: port 1(veth0) entered listening state [ 63.998835] br0: port 2(veth1) entered blocking state [ 77.437113] br0: port 1(veth0) entered learning state [ 86.653501] br0: received packet on veth0 with own address as source address (addr:6e:0f:e7:6f:5f:5f, vlan:0) [ 92.797095] br0: port 1(veth0) entered forwarding state [ 92.797398] br0: topology change detected, propagating Let's remove the warning. Signed-off-by: Kuniyuki Iwashima Signed-off-by: Paolo Abeni --- net/bridge/br_stp_if.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c index b65962682771..75204d36d7f9 100644 --- a/net/bridge/br_stp_if.c +++ b/net/bridge/br_stp_if.c @@ -201,9 +201,6 @@ int br_stp_set_enabled(struct net_bridge *br, unsigned long val, { ASSERT_RTNL(); - if (!net_eq(dev_net(br->dev), &init_net)) - NL_SET_ERR_MSG_MOD(extack, "STP does not work in non-root netns"); - if (br_mrp_enabled(br)) { NL_SET_ERR_MSG_MOD(extack, "STP can't be enabled if MRP is already enabled"); -- cgit v1.2.3 From ddbd8be68941985f166f5107109a90ce13147c44 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Thu, 20 Jul 2023 00:29:58 +0200 Subject: netfilter: nf_tables: fix spurious set element insertion failure On some platforms there is a padding hole in the nft_verdict structure, between the verdict code and the chain pointer. On element insertion, if the new element clashes with an existing one and NLM_F_EXCL flag isn't set, we want to ignore the -EEXIST error as long as the data associated with duplicated element is the same as the existing one. The data equality check uses memcmp. For normal data (NFT_DATA_VALUE) this works fine, but for NFT_DATA_VERDICT padding area leads to spurious failure even if the verdict data is the same. This then makes the insertion fail with 'already exists' error, even though the new "key : data" matches an existing entry and userspace told the kernel that it doesn't want to receive an error indication. Fixes: c016c7e45ddf ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion") Signed-off-by: Florian Westphal --- net/netfilter/nf_tables_api.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 237f739da3ca..79c7eee33dcd 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -10517,6 +10517,9 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data, if (!tb[NFTA_VERDICT_CODE]) return -EINVAL; + + /* zero padding hole for memcmp */ + memset(data, 0, sizeof(*data)); data->verdict.code = ntohl(nla_get_be32(tb[NFTA_VERDICT_CODE])); switch (data->verdict.code) { -- cgit v1.2.3 From 314c82841602a111c04a7210c21dc77e0d560242 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Tue, 18 Jul 2023 01:30:33 +0200 Subject: netfilter: nf_tables: can't schedule in nft_chain_validate Can be called via nft set element list iteration, which may acquire rcu and/or bh read lock (depends on set type). BUG: sleeping function called from invalid context at net/netfilter/nf_tables_api.c:3353 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1232, name: nft preempt_count: 0, expected: 0 RCU nest depth: 1, expected: 0 2 locks held by nft/1232: #0: ffff8881180e3ea8 (&nft_net->commit_mutex){+.+.}-{3:3}, at: nf_tables_valid_genid #1: ffffffff83f5f540 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire Call Trace: nft_chain_validate nft_lookup_validate_setelem nft_pipapo_walk nft_lookup_validate nft_chain_validate nft_immediate_validate nft_chain_validate nf_tables_validate nf_tables_abort No choice but to move it to nf_tables_validate(). Fixes: 81ea01066741 ("netfilter: nf_tables: add rescheduling points during loop detection walks") Signed-off-by: Florian Westphal --- net/netfilter/nf_tables_api.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 79c7eee33dcd..41e7d21d4429 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3685,8 +3685,6 @@ int nft_chain_validate(const struct nft_ctx *ctx, const struct nft_chain *chain) if (err < 0) return err; } - - cond_resched(); } return 0; @@ -3710,6 +3708,8 @@ static int nft_table_validate(struct net *net, const struct nft_table *table) err = nft_chain_validate(&ctx, chain); if (err < 0) return err; + + cond_resched(); } return 0; -- cgit v1.2.3 From 87b5a5c209405cb6b57424cdfa226a6dbd349232 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Wed, 19 Jul 2023 21:08:21 +0200 Subject: netfilter: nft_set_pipapo: fix improper element removal end key should be equal to start unless NFT_SET_EXT_KEY_END is present. Its possible to add elements that only have a start key ("{ 1.0.0.0 . 2.0.0.0 }") without an internval end. Insertion treats this via: if (nft_set_ext_exists(ext, NFT_SET_EXT_KEY_END)) end = (const u8 *)nft_set_ext_key_end(ext)->data; else end = start; but removal side always uses nft_set_ext_key_end(). This is wrong and leads to garbage remaining in the set after removal next lookup/insert attempt will give: BUG: KASAN: slab-use-after-free in pipapo_get+0x8eb/0xb90 Read of size 1 at addr ffff888100d50586 by task nft-pipapo_uaf_/1399 Call Trace: kasan_report+0x105/0x140 pipapo_get+0x8eb/0xb90 nft_pipapo_insert+0x1dc/0x1710 nf_tables_newsetelem+0x31f5/0x4e00 .. Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Reported-by: lonial con Reviewed-by: Stefano Brivio Signed-off-by: Florian Westphal --- net/netfilter/nft_set_pipapo.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c index db526cb7a485..49915a2a58eb 100644 --- a/net/netfilter/nft_set_pipapo.c +++ b/net/netfilter/nft_set_pipapo.c @@ -1929,7 +1929,11 @@ static void nft_pipapo_remove(const struct net *net, const struct nft_set *set, int i, start, rules_fx; match_start = data; - match_end = (const u8 *)nft_set_ext_key_end(&e->ext)->data; + + if (nft_set_ext_exists(&e->ext, NFT_SET_EXT_KEY_END)) + match_end = (const u8 *)nft_set_ext_key_end(&e->ext)->data; + else + match_end = data; start = first_rule; rules_fx = rules_f0; -- cgit v1.2.3 From 751d460ccff3137212f47d876221534bf0490996 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Wed, 19 Jul 2023 20:19:43 +0200 Subject: netfilter: nf_tables: skip bound chain in netns release path Skip bound chain from netns release path, the rule that owns this chain releases these objects. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso Signed-off-by: Florian Westphal --- net/netfilter/nf_tables_api.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 41e7d21d4429..40bff6e2ea5d 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -10802,6 +10802,9 @@ static void __nft_release_table(struct net *net, struct nft_table *table) ctx.family = table->family; ctx.table = table; list_for_each_entry(chain, &table->chains, list) { + if (nft_chain_is_bound(chain)) + continue; + ctx.chain = chain; list_for_each_entry_safe(rule, nr, &chain->rules, list) { list_del(&rule->list); -- cgit v1.2.3 From 6eaf41e87a223ae6f8e7a28d6e78384ad7e407f8 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Thu, 20 Jul 2023 09:17:21 +0200 Subject: netfilter: nf_tables: skip bound chain on rule flush Skip bound chain when flushing table rules, the rule that owns this chain releases these objects. Otherwise, the following warning is triggered: WARNING: CPU: 2 PID: 1217 at net/netfilter/nf_tables_api.c:2013 nf_tables_chain_destroy+0x1f7/0x210 [nf_tables] CPU: 2 PID: 1217 Comm: chain-flush Not tainted 6.1.39 #1 RIP: 0010:nf_tables_chain_destroy+0x1f7/0x210 [nf_tables] Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Reported-by: Kevin Rich Signed-off-by: Pablo Neira Ayuso Signed-off-by: Florian Westphal --- net/netfilter/nf_tables_api.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 40bff6e2ea5d..b9a4d3fd1d34 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -4087,6 +4087,8 @@ static int nf_tables_delrule(struct sk_buff *skb, const struct nfnl_info *info, list_for_each_entry(chain, &table->chains, list) { if (!nft_is_active_next(net, chain)) continue; + if (nft_chain_is_bound(chain)) + continue; ctx.chain = chain; err = nft_delrule_by_chain(&ctx); -- cgit v1.2.3 From 195ef75e19287b4bc413da3e3e3722b030ac881e Mon Sep 17 00:00:00 2001 From: Pauli Virtanen Date: Mon, 19 Jun 2023 01:04:31 +0300 Subject: Bluetooth: use RCU for hci_conn_params and iterate safely in hci_sync hci_update_accept_list_sync iterates over hdev->pend_le_conns and hdev->pend_le_reports, and waits for controller events in the loop body, without holding hdev lock. Meanwhile, these lists and the items may be modified e.g. by le_scan_cleanup. This can invalidate the list cursor or any other item in the list, resulting to invalid behavior (eg use-after-free). Use RCU for the hci_conn_params action lists. Since the loop bodies in hci_sync block and we cannot use RCU or hdev->lock for the whole loop, copy list items first and then iterate on the copy. Only the flags field is written from elsewhere, so READ_ONCE/WRITE_ONCE should guarantee we read valid values. Free params everywhere with hci_conn_params_free so the cleanup is guaranteed to be done properly. This fixes the following, which can be triggered e.g. by BlueZ new mgmt-tester case "Add + Remove Device Nowait - Success", or by changing hci_le_set_cig_params to always return false, and running iso-tester: ================================================================== BUG: KASAN: slab-use-after-free in hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841) Read of size 8 at addr ffff888001265018 by task kworker/u3:0/32 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 Workqueue: hci0 hci_cmd_sync_work Call Trace: dump_stack_lvl (./arch/x86/include/asm/irqflags.h:134 lib/dump_stack.c:107) print_report (mm/kasan/report.c:320 mm/kasan/report.c:430) ? __virt_addr_valid (./include/linux/mmzone.h:1915 ./include/linux/mmzone.h:2011 arch/x86/mm/physaddr.c:65) ? hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841) kasan_report (mm/kasan/report.c:538) ? hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841) hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841) ? __pfx_hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2780) ? mutex_lock (kernel/locking/mutex.c:282) ? __pfx_mutex_lock (kernel/locking/mutex.c:282) ? __pfx_mutex_unlock (kernel/locking/mutex.c:538) ? __pfx_update_passive_scan_sync (net/bluetooth/hci_sync.c:2861) hci_cmd_sync_work (net/bluetooth/hci_sync.c:306) process_one_work (./arch/x86/include/asm/preempt.h:27 kernel/workqueue.c:2399) worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2538) ? __pfx_worker_thread (kernel/workqueue.c:2480) kthread (kernel/kthread.c:376) ? __pfx_kthread (kernel/kthread.c:331) ret_from_fork (arch/x86/entry/entry_64.S:314) Allocated by task 31: kasan_save_stack (mm/kasan/common.c:46) kasan_set_track (mm/kasan/common.c:52) __kasan_kmalloc (mm/kasan/common.c:374 mm/kasan/common.c:383) hci_conn_params_add (./include/linux/slab.h:580 ./include/linux/slab.h:720 net/bluetooth/hci_core.c:2277) hci_connect_le_scan (net/bluetooth/hci_conn.c:1419 net/bluetooth/hci_conn.c:1589) hci_connect_cis (net/bluetooth/hci_conn.c:2266) iso_connect_cis (net/bluetooth/iso.c:390) iso_sock_connect (net/bluetooth/iso.c:899) __sys_connect (net/socket.c:2003 net/socket.c:2020) __x64_sys_connect (net/socket.c:2027) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) Freed by task 15: kasan_save_stack (mm/kasan/common.c:46) kasan_set_track (mm/kasan/common.c:52) kasan_save_free_info (mm/kasan/generic.c:523) __kasan_slab_free (mm/kasan/common.c:238 mm/kasan/common.c:200 mm/kasan/common.c:244) __kmem_cache_free (mm/slub.c:1807 mm/slub.c:3787 mm/slub.c:3800) hci_conn_params_del (net/bluetooth/hci_core.c:2323) le_scan_cleanup (net/bluetooth/hci_conn.c:202) process_one_work (./arch/x86/include/asm/preempt.h:27 kernel/workqueue.c:2399) worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2538) kthread (kernel/kthread.c:376) ret_from_fork (arch/x86/entry/entry_64.S:314) ================================================================== Fixes: e8907f76544f ("Bluetooth: hci_sync: Make use of hci_cmd_sync_queue set 3") Signed-off-by: Pauli Virtanen Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/hci_core.h | 5 ++ net/bluetooth/hci_conn.c | 10 ++-- net/bluetooth/hci_core.c | 38 ++++++++++--- net/bluetooth/hci_event.c | 12 ++-- net/bluetooth/hci_sync.c | 117 +++++++++++++++++++++++++++++++++++---- net/bluetooth/mgmt.c | 26 ++++----- 6 files changed, 164 insertions(+), 44 deletions(-) diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index 9654567cfae3..870b6d3c5146 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -822,6 +822,7 @@ struct hci_conn_params { struct hci_conn *conn; bool explicit_connect; + /* Accessed without hdev->lock: */ hci_conn_flags_t flags; u8 privacy_mode; }; @@ -1573,7 +1574,11 @@ struct hci_conn_params *hci_conn_params_add(struct hci_dev *hdev, bdaddr_t *addr, u8 addr_type); void hci_conn_params_del(struct hci_dev *hdev, bdaddr_t *addr, u8 addr_type); void hci_conn_params_clear_disabled(struct hci_dev *hdev); +void hci_conn_params_free(struct hci_conn_params *param); +void hci_pend_le_list_del_init(struct hci_conn_params *param); +void hci_pend_le_list_add(struct hci_conn_params *param, + struct list_head *list); struct hci_conn_params *hci_pend_le_action_lookup(struct list_head *list, bdaddr_t *addr, u8 addr_type); diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c index 056f9516e46d..db598942cab4 100644 --- a/net/bluetooth/hci_conn.c +++ b/net/bluetooth/hci_conn.c @@ -118,7 +118,7 @@ static void hci_connect_le_scan_cleanup(struct hci_conn *conn, u8 status) */ params->explicit_connect = false; - list_del_init(¶ms->action); + hci_pend_le_list_del_init(params); switch (params->auto_connect) { case HCI_AUTO_CONN_EXPLICIT: @@ -127,10 +127,10 @@ static void hci_connect_le_scan_cleanup(struct hci_conn *conn, u8 status) return; case HCI_AUTO_CONN_DIRECT: case HCI_AUTO_CONN_ALWAYS: - list_add(¶ms->action, &hdev->pend_le_conns); + hci_pend_le_list_add(params, &hdev->pend_le_conns); break; case HCI_AUTO_CONN_REPORT: - list_add(¶ms->action, &hdev->pend_le_reports); + hci_pend_le_list_add(params, &hdev->pend_le_reports); break; default: break; @@ -1426,8 +1426,8 @@ static int hci_explicit_conn_params_set(struct hci_dev *hdev, if (params->auto_connect == HCI_AUTO_CONN_DISABLED || params->auto_connect == HCI_AUTO_CONN_REPORT || params->auto_connect == HCI_AUTO_CONN_EXPLICIT) { - list_del_init(¶ms->action); - list_add(¶ms->action, &hdev->pend_le_conns); + hci_pend_le_list_del_init(params); + hci_pend_le_list_add(params, &hdev->pend_le_conns); } params->explicit_connect = true; diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 48917c68358d..b421e196f60c 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -2249,21 +2249,45 @@ struct hci_conn_params *hci_conn_params_lookup(struct hci_dev *hdev, return NULL; } -/* This function requires the caller holds hdev->lock */ +/* This function requires the caller holds hdev->lock or rcu_read_lock */ struct hci_conn_params *hci_pend_le_action_lookup(struct list_head *list, bdaddr_t *addr, u8 addr_type) { struct hci_conn_params *param; - list_for_each_entry(param, list, action) { + rcu_read_lock(); + + list_for_each_entry_rcu(param, list, action) { if (bacmp(¶m->addr, addr) == 0 && - param->addr_type == addr_type) + param->addr_type == addr_type) { + rcu_read_unlock(); return param; + } } + rcu_read_unlock(); + return NULL; } +/* This function requires the caller holds hdev->lock */ +void hci_pend_le_list_del_init(struct hci_conn_params *param) +{ + if (list_empty(¶m->action)) + return; + + list_del_rcu(¶m->action); + synchronize_rcu(); + INIT_LIST_HEAD(¶m->action); +} + +/* This function requires the caller holds hdev->lock */ +void hci_pend_le_list_add(struct hci_conn_params *param, + struct list_head *list) +{ + list_add_rcu(¶m->action, list); +} + /* This function requires the caller holds hdev->lock */ struct hci_conn_params *hci_conn_params_add(struct hci_dev *hdev, bdaddr_t *addr, u8 addr_type) @@ -2297,14 +2321,15 @@ struct hci_conn_params *hci_conn_params_add(struct hci_dev *hdev, return params; } -static void hci_conn_params_free(struct hci_conn_params *params) +void hci_conn_params_free(struct hci_conn_params *params) { + hci_pend_le_list_del_init(params); + if (params->conn) { hci_conn_drop(params->conn); hci_conn_put(params->conn); } - list_del(¶ms->action); list_del(¶ms->list); kfree(params); } @@ -2342,8 +2367,7 @@ void hci_conn_params_clear_disabled(struct hci_dev *hdev) continue; } - list_del(¶ms->list); - kfree(params); + hci_conn_params_free(params); } BT_DBG("All LE disabled connection parameters were removed"); diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index 95816a938cea..fbc5cb8548f2 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -1564,7 +1564,7 @@ static u8 hci_cc_le_set_privacy_mode(struct hci_dev *hdev, void *data, params = hci_conn_params_lookup(hdev, &cp->bdaddr, cp->bdaddr_type); if (params) - params->privacy_mode = cp->mode; + WRITE_ONCE(params->privacy_mode, cp->mode); hci_dev_unlock(hdev); @@ -2804,8 +2804,8 @@ static void hci_cs_disconnect(struct hci_dev *hdev, u8 status) case HCI_AUTO_CONN_DIRECT: case HCI_AUTO_CONN_ALWAYS: - list_del_init(¶ms->action); - list_add(¶ms->action, &hdev->pend_le_conns); + hci_pend_le_list_del_init(params); + hci_pend_le_list_add(params, &hdev->pend_le_conns); break; default: @@ -3423,8 +3423,8 @@ static void hci_disconn_complete_evt(struct hci_dev *hdev, void *data, case HCI_AUTO_CONN_DIRECT: case HCI_AUTO_CONN_ALWAYS: - list_del_init(¶ms->action); - list_add(¶ms->action, &hdev->pend_le_conns); + hci_pend_le_list_del_init(params); + hci_pend_le_list_add(params, &hdev->pend_le_conns); hci_update_passive_scan(hdev); break; @@ -5962,7 +5962,7 @@ static void le_conn_complete_evt(struct hci_dev *hdev, u8 status, params = hci_pend_le_action_lookup(&hdev->pend_le_conns, &conn->dst, conn->dst_type); if (params) { - list_del_init(¶ms->action); + hci_pend_le_list_del_init(params); if (params->conn) { hci_conn_drop(params->conn); hci_conn_put(params->conn); diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c index 8561616abbe5..4d1e32bb6a9c 100644 --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -2160,15 +2160,23 @@ static int hci_le_del_accept_list_sync(struct hci_dev *hdev, return 0; } +struct conn_params { + bdaddr_t addr; + u8 addr_type; + hci_conn_flags_t flags; + u8 privacy_mode; +}; + /* Adds connection to resolve list if needed. * Setting params to NULL programs local hdev->irk */ static int hci_le_add_resolve_list_sync(struct hci_dev *hdev, - struct hci_conn_params *params) + struct conn_params *params) { struct hci_cp_le_add_to_resolv_list cp; struct smp_irk *irk; struct bdaddr_list_with_irk *entry; + struct hci_conn_params *p; if (!use_ll_privacy(hdev)) return 0; @@ -2203,6 +2211,16 @@ static int hci_le_add_resolve_list_sync(struct hci_dev *hdev, /* Default privacy mode is always Network */ params->privacy_mode = HCI_NETWORK_PRIVACY; + rcu_read_lock(); + p = hci_pend_le_action_lookup(&hdev->pend_le_conns, + ¶ms->addr, params->addr_type); + if (!p) + p = hci_pend_le_action_lookup(&hdev->pend_le_reports, + ¶ms->addr, params->addr_type); + if (p) + WRITE_ONCE(p->privacy_mode, HCI_NETWORK_PRIVACY); + rcu_read_unlock(); + done: if (hci_dev_test_flag(hdev, HCI_PRIVACY)) memcpy(cp.local_irk, hdev->irk, 16); @@ -2215,7 +2233,7 @@ done: /* Set Device Privacy Mode. */ static int hci_le_set_privacy_mode_sync(struct hci_dev *hdev, - struct hci_conn_params *params) + struct conn_params *params) { struct hci_cp_le_set_privacy_mode cp; struct smp_irk *irk; @@ -2240,6 +2258,8 @@ static int hci_le_set_privacy_mode_sync(struct hci_dev *hdev, bacpy(&cp.bdaddr, &irk->bdaddr); cp.mode = HCI_DEVICE_PRIVACY; + /* Note: params->privacy_mode is not updated since it is a copy */ + return __hci_cmd_sync_status(hdev, HCI_OP_LE_SET_PRIVACY_MODE, sizeof(cp), &cp, HCI_CMD_TIMEOUT); } @@ -2249,7 +2269,7 @@ static int hci_le_set_privacy_mode_sync(struct hci_dev *hdev, * properly set the privacy mode. */ static int hci_le_add_accept_list_sync(struct hci_dev *hdev, - struct hci_conn_params *params, + struct conn_params *params, u8 *num_entries) { struct hci_cp_le_add_to_accept_list cp; @@ -2447,6 +2467,52 @@ struct sk_buff *hci_read_local_oob_data_sync(struct hci_dev *hdev, return __hci_cmd_sync_sk(hdev, opcode, 0, NULL, 0, HCI_CMD_TIMEOUT, sk); } +static struct conn_params *conn_params_copy(struct list_head *list, size_t *n) +{ + struct hci_conn_params *params; + struct conn_params *p; + size_t i; + + rcu_read_lock(); + + i = 0; + list_for_each_entry_rcu(params, list, action) + ++i; + *n = i; + + rcu_read_unlock(); + + p = kvcalloc(*n, sizeof(struct conn_params), GFP_KERNEL); + if (!p) + return NULL; + + rcu_read_lock(); + + i = 0; + list_for_each_entry_rcu(params, list, action) { + /* Racing adds are handled in next scan update */ + if (i >= *n) + break; + + /* No hdev->lock, but: addr, addr_type are immutable. + * privacy_mode is only written by us or in + * hci_cc_le_set_privacy_mode that we wait for. + * We should be idempotent so MGMT updating flags + * while we are processing is OK. + */ + bacpy(&p[i].addr, ¶ms->addr); + p[i].addr_type = params->addr_type; + p[i].flags = READ_ONCE(params->flags); + p[i].privacy_mode = READ_ONCE(params->privacy_mode); + ++i; + } + + rcu_read_unlock(); + + *n = i; + return p; +} + /* Device must not be scanning when updating the accept list. * * Update is done using the following sequence: @@ -2466,11 +2532,12 @@ struct sk_buff *hci_read_local_oob_data_sync(struct hci_dev *hdev, */ static u8 hci_update_accept_list_sync(struct hci_dev *hdev) { - struct hci_conn_params *params; + struct conn_params *params; struct bdaddr_list *b, *t; u8 num_entries = 0; bool pend_conn, pend_report; u8 filter_policy; + size_t i, n; int err; /* Pause advertising if resolving list can be used as controllers @@ -2504,6 +2571,7 @@ static u8 hci_update_accept_list_sync(struct hci_dev *hdev) if (hci_conn_hash_lookup_le(hdev, &b->bdaddr, b->bdaddr_type)) continue; + /* Pointers not dereferenced, no locks needed */ pend_conn = hci_pend_le_action_lookup(&hdev->pend_le_conns, &b->bdaddr, b->bdaddr_type); @@ -2532,23 +2600,50 @@ static u8 hci_update_accept_list_sync(struct hci_dev *hdev) * available accept list entries in the controller, then * just abort and return filer policy value to not use the * accept list. + * + * The list and params may be mutated while we wait for events, + * so make a copy and iterate it. */ - list_for_each_entry(params, &hdev->pend_le_conns, action) { - err = hci_le_add_accept_list_sync(hdev, params, &num_entries); - if (err) + + params = conn_params_copy(&hdev->pend_le_conns, &n); + if (!params) { + err = -ENOMEM; + goto done; + } + + for (i = 0; i < n; ++i) { + err = hci_le_add_accept_list_sync(hdev, ¶ms[i], + &num_entries); + if (err) { + kvfree(params); goto done; + } } + kvfree(params); + /* After adding all new pending connections, walk through * the list of pending reports and also add these to the * accept list if there is still space. Abort if space runs out. */ - list_for_each_entry(params, &hdev->pend_le_reports, action) { - err = hci_le_add_accept_list_sync(hdev, params, &num_entries); - if (err) + + params = conn_params_copy(&hdev->pend_le_reports, &n); + if (!params) { + err = -ENOMEM; + goto done; + } + + for (i = 0; i < n; ++i) { + err = hci_le_add_accept_list_sync(hdev, ¶ms[i], + &num_entries); + if (err) { + kvfree(params); goto done; + } } + kvfree(params); + /* Use the allowlist unless the following conditions are all true: * - We are not currently suspending * - There are 1 or more ADV monitors registered and it's not offloaded @@ -4837,12 +4932,12 @@ static void hci_pend_le_actions_clear(struct hci_dev *hdev) struct hci_conn_params *p; list_for_each_entry(p, &hdev->le_conn_params, list) { + hci_pend_le_list_del_init(p); if (p->conn) { hci_conn_drop(p->conn); hci_conn_put(p->conn); p->conn = NULL; } - list_del_init(&p->action); } BT_DBG("All LE pending actions cleared"); diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c index f7b2d0971f24..1e07d0f28972 100644 --- a/net/bluetooth/mgmt.c +++ b/net/bluetooth/mgmt.c @@ -1297,15 +1297,15 @@ static void restart_le_actions(struct hci_dev *hdev) /* Needed for AUTO_OFF case where might not "really" * have been powered off. */ - list_del_init(&p->action); + hci_pend_le_list_del_init(p); switch (p->auto_connect) { case HCI_AUTO_CONN_DIRECT: case HCI_AUTO_CONN_ALWAYS: - list_add(&p->action, &hdev->pend_le_conns); + hci_pend_le_list_add(p, &hdev->pend_le_conns); break; case HCI_AUTO_CONN_REPORT: - list_add(&p->action, &hdev->pend_le_reports); + hci_pend_le_list_add(p, &hdev->pend_le_reports); break; default: break; @@ -5169,7 +5169,7 @@ static int set_device_flags(struct sock *sk, struct hci_dev *hdev, void *data, goto unlock; } - params->flags = current_flags; + WRITE_ONCE(params->flags, current_flags); status = MGMT_STATUS_SUCCESS; /* Update passive scan if HCI_CONN_FLAG_DEVICE_PRIVACY @@ -7580,7 +7580,7 @@ static int hci_conn_params_set(struct hci_dev *hdev, bdaddr_t *addr, if (params->auto_connect == auto_connect) return 0; - list_del_init(¶ms->action); + hci_pend_le_list_del_init(params); switch (auto_connect) { case HCI_AUTO_CONN_DISABLED: @@ -7589,18 +7589,18 @@ static int hci_conn_params_set(struct hci_dev *hdev, bdaddr_t *addr, * connect to device, keep connecting. */ if (params->explicit_connect) - list_add(¶ms->action, &hdev->pend_le_conns); + hci_pend_le_list_add(params, &hdev->pend_le_conns); break; case HCI_AUTO_CONN_REPORT: if (params->explicit_connect) - list_add(¶ms->action, &hdev->pend_le_conns); + hci_pend_le_list_add(params, &hdev->pend_le_conns); else - list_add(¶ms->action, &hdev->pend_le_reports); + hci_pend_le_list_add(params, &hdev->pend_le_reports); break; case HCI_AUTO_CONN_DIRECT: case HCI_AUTO_CONN_ALWAYS: if (!is_connected(hdev, addr, addr_type)) - list_add(¶ms->action, &hdev->pend_le_conns); + hci_pend_le_list_add(params, &hdev->pend_le_conns); break; } @@ -7823,9 +7823,7 @@ static int remove_device(struct sock *sk, struct hci_dev *hdev, goto unlock; } - list_del(¶ms->action); - list_del(¶ms->list); - kfree(params); + hci_conn_params_free(params); device_removed(sk, hdev, &cp->addr.bdaddr, cp->addr.type); } else { @@ -7856,9 +7854,7 @@ static int remove_device(struct sock *sk, struct hci_dev *hdev, p->auto_connect = HCI_AUTO_CONN_EXPLICIT; continue; } - list_del(&p->action); - list_del(&p->list); - kfree(p); + hci_conn_params_free(p); } bt_dev_dbg(hdev, "All LE connection parameters were removed"); -- cgit v1.2.3 From 7f7cfcb6f0825652973b780f248603e23f16ee90 Mon Sep 17 00:00:00 2001 From: Pauli Virtanen Date: Mon, 19 Jun 2023 01:04:32 +0300 Subject: Bluetooth: hci_event: call disconnect callback before deleting conn In hci_cs_disconnect, we do hci_conn_del even if disconnection failed. ISO, L2CAP and SCO connections refer to the hci_conn without hci_conn_get, so disconn_cfm must be called so they can clean up their conn, otherwise use-after-free occurs. ISO: ========================================================== iso_sock_connect:880: sk 00000000eabd6557 iso_connect_cis:356: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7e:da ... iso_conn_add:140: hcon 000000001696f1fd conn 00000000b6251073 hci_dev_put:1487: hci0 orig refcnt 17 __iso_chan_add:214: conn 00000000b6251073 iso_sock_clear_timer:117: sock 00000000eabd6557 state 3 ... hci_rx_work:4085: hci0 Event packet hci_event_packet:7601: hci0: event 0x0f hci_cmd_status_evt:4346: hci0: opcode 0x0406 hci_cs_disconnect:2760: hci0: status 0x0c hci_sent_cmd_data:3107: hci0 opcode 0x0406 hci_conn_del:1151: hci0 hcon 000000001696f1fd handle 2560 hci_conn_unlink:1102: hci0: hcon 000000001696f1fd hci_conn_drop:1451: hcon 00000000d8521aaf orig refcnt 2 hci_chan_list_flush:2780: hcon 000000001696f1fd hci_dev_put:1487: hci0 orig refcnt 21 hci_dev_put:1487: hci0 orig refcnt 20 hci_req_cmd_complete:3978: opcode 0x0406 status 0x0c ... ... iso_sock_sendmsg:1098: sock 00000000dea5e2e0, sk 00000000eabd6557 BUG: kernel NULL pointer dereference, address: 0000000000000668 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 RIP: 0010:iso_sock_sendmsg (net/bluetooth/iso.c:1112) bluetooth ========================================================== L2CAP: ================================================================== hci_cmd_status_evt:4359: hci0: opcode 0x0406 hci_cs_disconnect:2760: hci0: status 0x0c hci_sent_cmd_data:3085: hci0 opcode 0x0406 hci_conn_del:1151: hci0 hcon ffff88800c999000 handle 3585 hci_conn_unlink:1102: hci0: hcon ffff88800c999000 hci_chan_list_flush:2780: hcon ffff88800c999000 hci_chan_del:2761: hci0 hcon ffff88800c999000 chan ffff888018ddd280 ... BUG: KASAN: slab-use-after-free in hci_send_acl+0x2d/0x540 [bluetooth] Read of size 8 at addr ffff888018ddd298 by task bluetoothd/1175 CPU: 0 PID: 1175 Comm: bluetoothd Tainted: G E 6.4.0-rc4+ #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 Call Trace: dump_stack_lvl+0x5b/0x90 print_report+0xcf/0x670 ? __virt_addr_valid+0xf8/0x180 ? hci_send_acl+0x2d/0x540 [bluetooth] kasan_report+0xa8/0xe0 ? hci_send_acl+0x2d/0x540 [bluetooth] hci_send_acl+0x2d/0x540 [bluetooth] ? __pfx___lock_acquire+0x10/0x10 l2cap_chan_send+0x1fd/0x1300 [bluetooth] ? l2cap_sock_sendmsg+0xf2/0x170 [bluetooth] ? __pfx_l2cap_chan_send+0x10/0x10 [bluetooth] ? lock_release+0x1d5/0x3c0 ? mark_held_locks+0x1a/0x90 l2cap_sock_sendmsg+0x100/0x170 [bluetooth] sock_write_iter+0x275/0x280 ? __pfx_sock_write_iter+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 do_iter_readv_writev+0x176/0x220 ? __pfx_do_iter_readv_writev+0x10/0x10 ? find_held_lock+0x83/0xa0 ? selinux_file_permission+0x13e/0x210 do_iter_write+0xda/0x340 vfs_writev+0x1b4/0x400 ? __pfx_vfs_writev+0x10/0x10 ? __seccomp_filter+0x112/0x750 ? populate_seccomp_data+0x182/0x220 ? __fget_light+0xdf/0x100 ? do_writev+0x19d/0x210 do_writev+0x19d/0x210 ? __pfx_do_writev+0x10/0x10 ? mark_held_locks+0x1a/0x90 do_syscall_64+0x60/0x90 ? lockdep_hardirqs_on_prepare+0x149/0x210 ? do_syscall_64+0x6c/0x90 ? lockdep_hardirqs_on_prepare+0x149/0x210 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7ff45cb23e64 Code: 15 d1 1f 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 80 3d 9d a7 0d 00 00 74 13 b8 14 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89 RSP: 002b:00007fff21ae09b8 EFLAGS: 00000202 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007ff45cb23e64 RDX: 0000000000000001 RSI: 00007fff21ae0aa0 RDI: 0000000000000017 RBP: 00007fff21ae0aa0 R08: 000000000095a8a0 R09: 0000607000053f40 R10: 0000000000000001 R11: 0000000000000202 R12: 00007fff21ae0ac0 R13: 00000fffe435c150 R14: 00007fff21ae0a80 R15: 000060f000000040 Allocated by task 771: kasan_save_stack+0x33/0x60 kasan_set_track+0x25/0x30 __kasan_kmalloc+0xaa/0xb0 hci_chan_create+0x67/0x1b0 [bluetooth] l2cap_conn_add.part.0+0x17/0x590 [bluetooth] l2cap_connect_cfm+0x266/0x6b0 [bluetooth] hci_le_remote_feat_complete_evt+0x167/0x310 [bluetooth] hci_event_packet+0x38d/0x800 [bluetooth] hci_rx_work+0x287/0xb20 [bluetooth] process_one_work+0x4f7/0x970 worker_thread+0x8f/0x620 kthread+0x17f/0x1c0 ret_from_fork+0x2c/0x50 Freed by task 771: kasan_save_stack+0x33/0x60 kasan_set_track+0x25/0x30 kasan_save_free_info+0x2e/0x50 ____kasan_slab_free+0x169/0x1c0 slab_free_freelist_hook+0x9e/0x1c0 __kmem_cache_free+0xc0/0x310 hci_chan_list_flush+0x46/0x90 [bluetooth] hci_conn_cleanup+0x7d/0x330 [bluetooth] hci_cs_disconnect+0x35d/0x530 [bluetooth] hci_cmd_status_evt+0xef/0x2b0 [bluetooth] hci_event_packet+0x38d/0x800 [bluetooth] hci_rx_work+0x287/0xb20 [bluetooth] process_one_work+0x4f7/0x970 worker_thread+0x8f/0x620 kthread+0x17f/0x1c0 ret_from_fork+0x2c/0x50 ================================================================== Fixes: b8d290525e39 ("Bluetooth: clean up connection in hci_cs_disconnect") Signed-off-by: Pauli Virtanen Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/hci_event.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index fbc5cb8548f2..31ca320ce38d 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -2784,6 +2784,9 @@ static void hci_cs_disconnect(struct hci_dev *hdev, u8 status) hci_enable_advertising(hdev); } + /* Inform sockets conn is gone before we delete it */ + hci_disconn_cfm(conn, HCI_ERROR_UNSPECIFIED); + goto done; } -- cgit v1.2.3 From d40ae85ee62e3666f45bc61864b22121346f88ef Mon Sep 17 00:00:00 2001 From: Pauli Virtanen Date: Mon, 19 Jun 2023 01:04:33 +0300 Subject: Bluetooth: ISO: fix iso_conn related locking and validity issues sk->sk_state indicates whether iso_pi(sk)->conn is valid. Operations that check/update sk_state and access conn should hold lock_sock, otherwise they can race. The order of taking locks is hci_dev_lock > lock_sock > iso_conn_lock, which is how it is in connect/disconnect_cfm -> iso_conn_del -> iso_chan_del. Fix locking in iso_connect_cis/bis and sendmsg/recvmsg to take lock_sock around updating sk_state and conn. iso_conn_del must not occur during iso_connect_cis/bis, as it frees the iso_conn. Hold hdev->lock longer to prevent that. This should not reintroduce the issue fixed in commit 241f51931c35 ("Bluetooth: ISO: Avoid circular locking dependency"), since the we acquire locks in order. We retain the fix in iso_sock_connect to release lock_sock before iso_connect_* acquires hdev->lock. Similarly for commit 6a5ad251b7cd ("Bluetooth: ISO: Fix possible circular locking dependency"). We retain the fix in iso_conn_ready to not acquire iso_conn_lock before lock_sock. iso_conn_add shall return iso_conn with valid hcon. Make it so also when reusing an old CIS connection waiting for disconnect timeout (see __iso_sock_close where conn->hcon is set to NULL). Trace with iso_conn_del after iso_chan_add in iso_connect_cis: =============================================================== iso_sock_create:771: sock 00000000be9b69b7 iso_sock_init:693: sk 000000004dff667e iso_sock_bind:827: sk 000000004dff667e 70:1a:b8:98:ff:a2 type 1 iso_sock_setsockopt:1289: sk 000000004dff667e iso_sock_setsockopt:1289: sk 000000004dff667e iso_sock_setsockopt:1289: sk 000000004dff667e iso_sock_connect:875: sk 000000004dff667e iso_connect_cis:353: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7e:da hci_get_route:1199: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7e:da hci_conn_add:1005: hci0 dst 28:3d:c2:4a:7e:da iso_conn_add:140: hcon 000000007b65d182 conn 00000000daf8625e __iso_chan_add:214: conn 00000000daf8625e iso_connect_cfm:1700: hcon 000000007b65d182 bdaddr 28:3d:c2:4a:7e:da status 12 iso_conn_del:187: hcon 000000007b65d182 conn 00000000daf8625e, err 16 iso_sock_clear_timer:117: sock 000000004dff667e state 3 iso_chan_del:153: sk 000000004dff667e, conn 00000000daf8625e, err 16 hci_conn_del:1151: hci0 hcon 000000007b65d182 handle 65535 hci_conn_unlink:1102: hci0: hcon 000000007b65d182 hci_chan_list_flush:2780: hcon 000000007b65d182 iso_sock_getsockopt:1376: sk 000000004dff667e iso_sock_getname:1070: sock 00000000be9b69b7, sk 000000004dff667e iso_sock_getname:1070: sock 00000000be9b69b7, sk 000000004dff667e iso_sock_getsockopt:1376: sk 000000004dff667e iso_sock_getname:1070: sock 00000000be9b69b7, sk 000000004dff667e iso_sock_getname:1070: sock 00000000be9b69b7, sk 000000004dff667e iso_sock_shutdown:1434: sock 00000000be9b69b7, sk 000000004dff667e, how 1 __iso_sock_close:632: sk 000000004dff667e state 5 socket 00000000be9b69b7 BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 8000000006467067 P4D 8000000006467067 PUD 3f5f067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 RIP: 0010:__iso_sock_close (net/bluetooth/iso.c:664) bluetooth =============================================================== Trace with iso_conn_del before iso_chan_add in iso_connect_cis: =============================================================== iso_connect_cis:356: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7e:da ... iso_conn_add:140: hcon 0000000093bc551f conn 00000000768ae504 hci_dev_put:1487: hci0 orig refcnt 21 hci_event_packet:7607: hci0: event 0x0e hci_cmd_complete_evt:4231: hci0: opcode 0x2062 hci_cc_le_set_cig_params:3846: hci0: status 0x07 hci_sent_cmd_data:3107: hci0 opcode 0x2062 iso_connect_cfm:1703: hcon 0000000093bc551f bdaddr 28:3d:c2:4a:7e:da status 7 iso_conn_del:187: hcon 0000000093bc551f conn 00000000768ae504, err 12 hci_conn_del:1151: hci0 hcon 0000000093bc551f handle 65535 hci_conn_unlink:1102: hci0: hcon 0000000093bc551f hci_chan_list_flush:2780: hcon 0000000093bc551f __iso_chan_add:214: conn 00000000768ae504 iso_sock_clear_timer:117: sock 0000000098323f95 state 3 general protection fault, probably for non-canonical address 0x30b29c630930aec8: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 1920 Comm: bluetoothd Tainted: G E 6.3.0-rc7+ #4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 RIP: 0010:detach_if_pending+0x28/0xd0 Code: 90 90 0f 1f 44 00 00 48 8b 47 08 48 85 c0 0f 84 ad 00 00 00 55 89 d5 53 48 83 3f 00 48 89 fb 74 7d 66 90 48 8b 03 48 8b 53 08 <> RSP: 0018:ffffb90841a67d08 EFLAGS: 00010007 RAX: 0000000000000000 RBX: ffff9141bd5061b8 RCX: 0000000000000000 RDX: 30b29c630930aec8 RSI: ffff9141fdd21e80 RDI: ffff9141bd5061b8 RBP: 0000000000000001 R08: 0000000000000000 R09: ffffb90841a67b88 R10: 0000000000000003 R11: ffffffff8613f558 R12: ffff9141fdd21e80 R13: 0000000000000000 R14: ffff9141b5976010 R15: ffff914185755338 FS: 00007f45768bd840(0000) GS:ffff9141fdd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000619000424074 CR3: 0000000009f5e005 CR4: 0000000000170ee0 Call Trace: timer_delete+0x48/0x80 try_to_grab_pending+0xdf/0x170 __cancel_work+0x37/0xb0 iso_connect_cis+0x141/0x400 [bluetooth] =============================================================== Trace with NULL conn->hcon in state BT_CONNECT: =============================================================== __iso_sock_close:619: sk 00000000f7c71fc5 state 1 socket 00000000d90c5fe5 ... __iso_sock_close:619: sk 00000000f7c71fc5 state 8 socket 00000000d90c5fe5 iso_chan_del:153: sk 00000000f7c71fc5, conn 0000000022c03a7e, err 104 ... iso_sock_connect:862: sk 00000000129b56c3 iso_connect_cis:348: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7d:2a hci_get_route:1199: 70:1a:b8:98:ff:a2 -> 28:3d:c2:4a:7d:2a hci_dev_hold:1495: hci0 orig refcnt 19 __iso_chan_add:214: conn 0000000022c03a7e iso_sock_clear_timer:117: sock 00000000129b56c3 state 3 ... iso_sock_ready:1485: sk 00000000129b56c3 ... iso_sock_sendmsg:1077: sock 00000000e5013966, sk 00000000129b56c3 BUG: kernel NULL pointer dereference, address: 00000000000006a8 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 1403 Comm: wireplumber Tainted: G E 6.3.0-rc7+ #4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014 RIP: 0010:iso_sock_sendmsg+0x63/0x2a0 [bluetooth] =============================================================== Fixes: 241f51931c35 ("Bluetooth: ISO: Avoid circular locking dependency") Fixes: 6a5ad251b7cd ("Bluetooth: ISO: Fix possible circular locking dependency") Signed-off-by: Pauli Virtanen Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/iso.c | 53 +++++++++++++++++++++++++++++++---------------------- 1 file changed, 31 insertions(+), 22 deletions(-) diff --git a/net/bluetooth/iso.c b/net/bluetooth/iso.c index 0e6cc57b3911..505d62247268 100644 --- a/net/bluetooth/iso.c +++ b/net/bluetooth/iso.c @@ -123,8 +123,11 @@ static struct iso_conn *iso_conn_add(struct hci_conn *hcon) { struct iso_conn *conn = hcon->iso_data; - if (conn) + if (conn) { + if (!conn->hcon) + conn->hcon = hcon; return conn; + } conn = kzalloc(sizeof(*conn), GFP_KERNEL); if (!conn) @@ -300,14 +303,13 @@ static int iso_connect_bis(struct sock *sk) goto unlock; } - hci_dev_unlock(hdev); - hci_dev_put(hdev); + lock_sock(sk); err = iso_chan_add(conn, sk, NULL); - if (err) - return err; - - lock_sock(sk); + if (err) { + release_sock(sk); + goto unlock; + } /* Update source addr of the socket */ bacpy(&iso_pi(sk)->src, &hcon->src); @@ -321,7 +323,6 @@ static int iso_connect_bis(struct sock *sk) } release_sock(sk); - return err; unlock: hci_dev_unlock(hdev); @@ -389,14 +390,13 @@ static int iso_connect_cis(struct sock *sk) goto unlock; } - hci_dev_unlock(hdev); - hci_dev_put(hdev); + lock_sock(sk); err = iso_chan_add(conn, sk, NULL); - if (err) - return err; - - lock_sock(sk); + if (err) { + release_sock(sk); + goto unlock; + } /* Update source addr of the socket */ bacpy(&iso_pi(sk)->src, &hcon->src); @@ -413,7 +413,6 @@ static int iso_connect_cis(struct sock *sk) } release_sock(sk); - return err; unlock: hci_dev_unlock(hdev); @@ -1072,8 +1071,8 @@ static int iso_sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) { struct sock *sk = sock->sk; - struct iso_conn *conn = iso_pi(sk)->conn; struct sk_buff *skb, **frag; + size_t mtu; int err; BT_DBG("sock %p, sk %p", sock, sk); @@ -1085,11 +1084,18 @@ static int iso_sock_sendmsg(struct socket *sock, struct msghdr *msg, if (msg->msg_flags & MSG_OOB) return -EOPNOTSUPP; - if (sk->sk_state != BT_CONNECTED) + lock_sock(sk); + + if (sk->sk_state != BT_CONNECTED) { + release_sock(sk); return -ENOTCONN; + } + + mtu = iso_pi(sk)->conn->hcon->hdev->iso_mtu; + + release_sock(sk); - skb = bt_skb_sendmsg(sk, msg, len, conn->hcon->hdev->iso_mtu, - HCI_ISO_DATA_HDR_SIZE, 0); + skb = bt_skb_sendmsg(sk, msg, len, mtu, HCI_ISO_DATA_HDR_SIZE, 0); if (IS_ERR(skb)) return PTR_ERR(skb); @@ -1102,8 +1108,7 @@ static int iso_sock_sendmsg(struct socket *sock, struct msghdr *msg, while (len) { struct sk_buff *tmp; - tmp = bt_skb_sendmsg(sk, msg, len, conn->hcon->hdev->iso_mtu, - 0, 0); + tmp = bt_skb_sendmsg(sk, msg, len, mtu, 0, 0); if (IS_ERR(tmp)) { kfree_skb(skb); return PTR_ERR(tmp); @@ -1158,15 +1163,19 @@ static int iso_sock_recvmsg(struct socket *sock, struct msghdr *msg, BT_DBG("sk %p", sk); if (test_and_clear_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags)) { + lock_sock(sk); switch (sk->sk_state) { case BT_CONNECT2: - lock_sock(sk); iso_conn_defer_accept(pi->conn->hcon); sk->sk_state = BT_CONFIG; release_sock(sk); return 0; case BT_CONNECT: + release_sock(sk); return iso_connect_cis(sk); + default: + release_sock(sk); + break; } } -- cgit v1.2.3 From 6910e2eb39254d279bce5bc0f8eb6af45b59357c Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 3 Jul 2023 13:30:48 +0200 Subject: Bluetooth: coredump: fix building with coredump disabled The btmtk driver uses an IS_ENABLED() check to conditionally compile the coredump support, but this fails to build because the hdev->dump member is in an #ifdef: drivers/bluetooth/btmtk.c: In function 'btmtk_process_coredump': drivers/bluetooth/btmtk.c:386:30: error: 'struct hci_dev' has no member named 'dump' 386 | schedule_delayed_work(&hdev->dump.dump_timeout, | ^~ The struct member doesn't really make a huge difference in the total size, so just remove the #ifdef around it to avoid adding similar checks around each user. Fixes: 872f8c253cb9e ("Bluetooth: btusb: mediatek: add MediaTek devcoredump support") Fixes: 9695ef876fd12 ("Bluetooth: Add support for hci devcoredump") Signed-off-by: Arnd Bergmann Signed-off-by: Luiz Augusto von Dentz --- include/net/bluetooth/hci_core.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index 870b6d3c5146..e01d52cb668c 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -593,9 +593,7 @@ struct hci_dev { const char *fw_info; struct dentry *debugfs; -#ifdef CONFIG_DEV_COREDUMP struct hci_devcoredump dump; -#endif struct device dev; -- cgit v1.2.3 From de6dfcefd107667ce2dbedf4d9337f5ed557a4a1 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Fri, 30 Jun 2023 15:33:14 -0700 Subject: Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor() KASAN reports that there's a use-after-free in hci_remove_adv_monitor(). Trawling through the disassembly, you can see that the complaint is from the access in bt_dev_dbg() under the HCI_ADV_MONITOR_EXT_MSFT case. The problem case happens because msft_remove_monitor() can end up freeing the monitor structure. Specifically: hci_remove_adv_monitor() -> msft_remove_monitor() -> msft_remove_monitor_sync() -> msft_le_cancel_monitor_advertisement_cb() -> hci_free_adv_monitor() Let's fix the problem by just stashing the relevant data when it's still valid. Fixes: 7cf5c2978f23 ("Bluetooth: hci_sync: Refactor remove Adv Monitor") Signed-off-by: Douglas Anderson Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/hci_core.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index b421e196f60c..1ec83985f1ab 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -1972,6 +1972,7 @@ static int hci_remove_adv_monitor(struct hci_dev *hdev, struct adv_monitor *monitor) { int status = 0; + int handle; switch (hci_get_adv_monitor_offload_ext(hdev)) { case HCI_ADV_MONITOR_EXT_NONE: /* also goes here when powered off */ @@ -1980,9 +1981,10 @@ static int hci_remove_adv_monitor(struct hci_dev *hdev, goto free_monitor; case HCI_ADV_MONITOR_EXT_MSFT: + handle = monitor->handle; status = msft_remove_monitor(hdev, monitor); bt_dev_dbg(hdev, "%s remove monitor %d msft status %d", - hdev->name, monitor->handle, status); + hdev->name, handle, status); break; } -- cgit v1.2.3 From b4066eb04bb67e7ff66e5aaab0db4a753f37eaad Mon Sep 17 00:00:00 2001 From: Siddh Raman Pant Date: Tue, 11 Jul 2023 18:43:53 +0530 Subject: Bluetooth: hci_conn: return ERR_PTR instead of NULL when there is no link hci_connect_sco currently returns NULL when there is no link (i.e. when hci_conn_link() returns NULL). sco_connect() expects an ERR_PTR in case of any error (see line 266 in sco.c). Thus, hcon set as NULL passes through to sco_conn_add(), which tries to get hcon->hdev, resulting in dereferencing a NULL pointer as reported by syzkaller. The same issue exists for iso_connect_cis() calling hci_connect_cis(). Thus, make hci_connect_sco() and hci_connect_cis() return ERR_PTR instead of NULL. Reported-and-tested-by: syzbot+37acd5d80d00d609d233@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=37acd5d80d00d609d233 Fixes: 06149746e720 ("Bluetooth: hci_conn: Add support for linking multiple hcon") Signed-off-by: Siddh Raman Pant Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/hci_conn.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c index db598942cab4..76222565e2df 100644 --- a/net/bluetooth/hci_conn.c +++ b/net/bluetooth/hci_conn.c @@ -1684,7 +1684,7 @@ struct hci_conn *hci_connect_sco(struct hci_dev *hdev, int type, bdaddr_t *dst, if (!link) { hci_conn_drop(acl); hci_conn_drop(sco); - return NULL; + return ERR_PTR(-ENOLINK); } sco->setting = setting; @@ -2254,7 +2254,7 @@ struct hci_conn *hci_connect_cis(struct hci_dev *hdev, bdaddr_t *dst, if (!link) { hci_conn_drop(le); hci_conn_drop(cis); - return NULL; + return ERR_PTR(-ENOLINK); } /* If LE is already connected and CIS handle is already set proceed to -- cgit v1.2.3 From 3dcaa192ac2159193bc6ab57bc5369dcb84edd8e Mon Sep 17 00:00:00 2001 From: Pauli Virtanen Date: Mon, 10 Jul 2023 19:48:19 +0300 Subject: Bluetooth: SCO: fix sco_conn related locking and validity issues Operations that check/update sk_state and access conn should hold lock_sock, otherwise they can race. The order of taking locks is hci_dev_lock > lock_sock > sco_conn_lock, which is how it is in connect/disconnect_cfm -> sco_conn_del -> sco_chan_del. Fix locking in sco_connect to take lock_sock around updating sk_state and conn. sco_conn_del must not occur during sco_connect, as it frees the sco_conn. Hold hdev->lock longer to prevent that. sco_conn_add shall return sco_conn with valid hcon. Make it so also when reusing an old SCO connection waiting for disconnect timeout (see __sco_sock_close where conn->hcon is set to NULL). This should not reintroduce the issue fixed in the earlier commit 9a8ec9e8ebb5 ("Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm"), the relevant fix of releasing lock_sock in sco_sock_connect before acquiring hdev->lock is retained. These changes mirror similar fixes earlier in ISO sockets. Fixes: 9a8ec9e8ebb5 ("Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm") Signed-off-by: Pauli Virtanen Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/sco.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c index cd1a27ac555d..7762604ddfc0 100644 --- a/net/bluetooth/sco.c +++ b/net/bluetooth/sco.c @@ -126,8 +126,11 @@ static struct sco_conn *sco_conn_add(struct hci_conn *hcon) struct hci_dev *hdev = hcon->hdev; struct sco_conn *conn = hcon->sco_data; - if (conn) + if (conn) { + if (!conn->hcon) + conn->hcon = hcon; return conn; + } conn = kzalloc(sizeof(struct sco_conn), GFP_KERNEL); if (!conn) @@ -268,21 +271,21 @@ static int sco_connect(struct sock *sk) goto unlock; } - hci_dev_unlock(hdev); - hci_dev_put(hdev); - conn = sco_conn_add(hcon); if (!conn) { hci_conn_drop(hcon); - return -ENOMEM; + err = -ENOMEM; + goto unlock; } - err = sco_chan_add(conn, sk, NULL); - if (err) - return err; - lock_sock(sk); + err = sco_chan_add(conn, sk, NULL); + if (err) { + release_sock(sk); + goto unlock; + } + /* Update source addr of the socket */ bacpy(&sco_pi(sk)->src, &hcon->src); @@ -296,8 +299,6 @@ static int sco_connect(struct sock *sk) release_sock(sk); - return err; - unlock: hci_dev_unlock(hdev); hci_dev_put(hdev); -- cgit v1.2.3 From 95b7015433053cd5f648ad2a7b8f43b2c99c949a Mon Sep 17 00:00:00 2001 From: Tomasz Moń Date: Thu, 13 Jul 2023 12:25:14 +0200 Subject: Bluetooth: btusb: Fix bluetooth on Intel Macbook 2014 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit c13380a55522 ("Bluetooth: btusb: Do not require hardcoded interface numbers") inadvertedly broke bluetooth on Intel Macbook 2014. The intention was to keep behavior intact when BTUSB_IFNUM_2 is set and otherwise allow any interface numbers. The problem is that the new logic condition omits the case where bInterfaceNumber is 0. Fix BTUSB_IFNUM_2 handling by allowing both interface number 0 and 2 when the flag is set. Fixes: c13380a55522 ("Bluetooth: btusb: Do not require hardcoded interface numbers") Reported-by: John Holland Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217651 Signed-off-by: Tomasz Moń Tested-by: John Holland Signed-off-by: Luiz Augusto von Dentz --- drivers/bluetooth/btusb.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index 5ec4ad0a5c86..764d176e9735 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -4104,6 +4104,7 @@ static int btusb_probe(struct usb_interface *intf, BT_DBG("intf %p id %p", intf, id); if ((id->driver_info & BTUSB_IFNUM_2) && + (intf->cur_altsetting->desc.bInterfaceNumber != 0) && (intf->cur_altsetting->desc.bInterfaceNumber != 2)) return -ENODEV; -- cgit v1.2.3 From d1f0a9816f5fbc1316355ec1aa4ddfb9b624cca5 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 17 Jul 2023 12:32:14 +0300 Subject: Bluetooth: MGMT: Use correct address for memcpy() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In function ‘fortify_memcpy_chk’, inlined from ‘get_conn_info_complete’ at net/bluetooth/mgmt.c:7281:2: include/linux/fortify-string.h:592:25: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning] 592 | __read_overflow2_field(q_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors This is due to the wrong member is used for memcpy(). Use correct one. Signed-off-by: Andy Shevchenko Signed-off-by: Luiz Augusto von Dentz --- net/bluetooth/mgmt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c index 1e07d0f28972..d4498037fadc 100644 --- a/net/bluetooth/mgmt.c +++ b/net/bluetooth/mgmt.c @@ -7285,7 +7285,7 @@ static void get_conn_info_complete(struct hci_dev *hdev, void *data, int err) bt_dev_dbg(hdev, "err %d", err); - memcpy(&rp.addr, &cp->addr.bdaddr, sizeof(rp.addr)); + memcpy(&rp.addr, &cp->addr, sizeof(rp.addr)); status = mgmt_status(err); if (status == MGMT_STATUS_SUCCESS) { -- cgit v1.2.3 From 348b81b68b13ebd489a3e6a46aa1c384c731c919 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 19 Jul 2023 21:28:47 +0000 Subject: tcp: annotate data-races around tp->tcp_tx_delay do_tcp_getsockopt() reads tp->tcp_tx_delay while another cpu might change its value. Fixes: a842fe1425cb ("tcp: add optional per socket transmit delay") Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20230719212857.3943972-2-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/ipv4/tcp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index e03e08745308..bd6400e1ae9f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3674,7 +3674,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname, case TCP_TX_DELAY: if (val) tcp_enable_tx_delay(); - tp->tcp_tx_delay = val; + WRITE_ONCE(tp->tcp_tx_delay, val); break; default: err = -ENOPROTOOPT; @@ -4154,7 +4154,7 @@ int do_tcp_getsockopt(struct sock *sk, int level, break; case TCP_TX_DELAY: - val = tp->tcp_tx_delay; + val = READ_ONCE(tp->tcp_tx_delay); break; case TCP_TIMESTAMP: -- cgit v1.2.3 From dd23c9f1e8d5c1d2e3d29393412385ccb9c7a948 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 19 Jul 2023 21:28:48 +0000 Subject: tcp: annotate data-races around tp->tsoffset do_tcp_getsockopt() reads tp->tsoffset while another cpu might change its value. Fixes: 93be6ce0e91b ("tcp: set and get per-socket timestamp") Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20230719212857.3943972-3-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/ipv4/tcp.c | 4 ++-- net/ipv4/tcp_ipv4.c | 5 +++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index bd6400e1ae9f..03ae6554c78d 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3656,7 +3656,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname, if (!tp->repair) err = -EPERM; else - tp->tsoffset = val - tcp_time_stamp_raw(); + WRITE_ONCE(tp->tsoffset, val - tcp_time_stamp_raw()); break; case TCP_REPAIR_WINDOW: err = tcp_repair_set_window(tp, optval, optlen); @@ -4158,7 +4158,7 @@ int do_tcp_getsockopt(struct sock *sk, int level, break; case TCP_TIMESTAMP: - val = tcp_time_stamp_raw() + tp->tsoffset; + val = tcp_time_stamp_raw() + READ_ONCE(tp->tsoffset); break; case TCP_NOTSENT_LOWAT: val = tp->notsent_lowat; diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index b5c81cf5b86f..069642014636 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -307,8 +307,9 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) inet->inet_daddr, inet->inet_sport, usin->sin_port)); - tp->tsoffset = secure_tcp_ts_off(net, inet->inet_saddr, - inet->inet_daddr); + WRITE_ONCE(tp->tsoffset, + secure_tcp_ts_off(net, inet->inet_saddr, + inet->inet_daddr)); } inet->inet_id = get_random_u16(); -- cgit v1.2.3 From 4164245c76ff906c9086758e1c3f87082a7f5ef5 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 19 Jul 2023 21:28:49 +0000 Subject: tcp: annotate data-races around tp->keepalive_time do_tcp_getsockopt() reads tp->keepalive_time while another cpu might change its value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20230719212857.3943972-4-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/net/tcp.h | 7 +++++-- net/ipv4/tcp.c | 3 ++- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 226bce6d1e8c..9e1c3337cdfe 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1517,9 +1517,12 @@ static inline int keepalive_intvl_when(const struct tcp_sock *tp) static inline int keepalive_time_when(const struct tcp_sock *tp) { struct net *net = sock_net((struct sock *)tp); + int val; - return tp->keepalive_time ? : - READ_ONCE(net->ipv4.sysctl_tcp_keepalive_time); + /* Paired with WRITE_ONCE() in tcp_sock_set_keepidle_locked() */ + val = READ_ONCE(tp->keepalive_time); + + return val ? : READ_ONCE(net->ipv4.sysctl_tcp_keepalive_time); } static inline int keepalive_probes(const struct tcp_sock *tp) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 03ae6554c78d..b4f7856dfb16 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3312,7 +3312,8 @@ int tcp_sock_set_keepidle_locked(struct sock *sk, int val) if (val < 1 || val > MAX_TCP_KEEPIDLE) return -EINVAL; - tp->keepalive_time = val * HZ; + /* Paired with WRITE_ONCE() in keepalive_time_when() */ + WRITE_ONCE(tp->keepalive_time, val * HZ); if (sock_flag(sk, SOCK_KEEPOPEN) && !((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN))) { u32 elapsed = keepalive_time_elapsed(tp); -- cgit v1.2.3 From 5ecf9d4f52ff2f1d4d44c9b68bc75688e82f13b4 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 19 Jul 2023 21:28:50 +0000 Subject: tcp: annotate data-races around tp->keepalive_intvl do_tcp_getsockopt() reads tp->keepalive_intvl while another cpu might change its value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20230719212857.3943972-5-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/net/tcp.h | 9 +++++++-- net/ipv4/tcp.c | 4 ++-- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 9e1c3337cdfe..2d8d58dec652 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1509,9 +1509,14 @@ void tcp_leave_memory_pressure(struct sock *sk); static inline int keepalive_intvl_when(const struct tcp_sock *tp) { struct net *net = sock_net((struct sock *)tp); + int val; + + /* Paired with WRITE_ONCE() in tcp_sock_set_keepintvl() + * and do_tcp_setsockopt(). + */ + val = READ_ONCE(tp->keepalive_intvl); - return tp->keepalive_intvl ? : - READ_ONCE(net->ipv4.sysctl_tcp_keepalive_intvl); + return val ? : READ_ONCE(net->ipv4.sysctl_tcp_keepalive_intvl); } static inline int keepalive_time_when(const struct tcp_sock *tp) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index b4f7856dfb16..d55fe014e7c9 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3345,7 +3345,7 @@ int tcp_sock_set_keepintvl(struct sock *sk, int val) return -EINVAL; lock_sock(sk); - tcp_sk(sk)->keepalive_intvl = val * HZ; + WRITE_ONCE(tcp_sk(sk)->keepalive_intvl, val * HZ); release_sock(sk); return 0; } @@ -3559,7 +3559,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname, if (val < 1 || val > MAX_TCP_KEEPINTVL) err = -EINVAL; else - tp->keepalive_intvl = val * HZ; + WRITE_ONCE(tp->keepalive_intvl, val * HZ); break; case TCP_KEEPCNT: if (val < 1 || val > MAX_TCP_KEEPCNT) -- cgit v1.2.3 From 6e5e1de616bf5f3df1769abc9292191dfad9110a Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 19 Jul 2023 21:28:51 +0000 Subject: tcp: annotate data-races around tp->keepalive_probes do_tcp_getsockopt() reads tp->keepalive_probes while another cpu might change its value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20230719212857.3943972-6-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/net/tcp.h | 9 +++++++-- net/ipv4/tcp.c | 5 +++-- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 2d8d58dec652..a8b93b2875ea 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1533,9 +1533,14 @@ static inline int keepalive_time_when(const struct tcp_sock *tp) static inline int keepalive_probes(const struct tcp_sock *tp) { struct net *net = sock_net((struct sock *)tp); + int val; + + /* Paired with WRITE_ONCE() in tcp_sock_set_keepcnt() + * and do_tcp_setsockopt(). + */ + val = READ_ONCE(tp->keepalive_probes); - return tp->keepalive_probes ? : - READ_ONCE(net->ipv4.sysctl_tcp_keepalive_probes); + return val ? : READ_ONCE(net->ipv4.sysctl_tcp_keepalive_probes); } static inline u32 keepalive_time_elapsed(const struct tcp_sock *tp) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index d55fe014e7c9..574fd0da1673 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3357,7 +3357,8 @@ int tcp_sock_set_keepcnt(struct sock *sk, int val) return -EINVAL; lock_sock(sk); - tcp_sk(sk)->keepalive_probes = val; + /* Paired with READ_ONCE() in keepalive_probes() */ + WRITE_ONCE(tcp_sk(sk)->keepalive_probes, val); release_sock(sk); return 0; } @@ -3565,7 +3566,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname, if (val < 1 || val > MAX_TCP_KEEPCNT) err = -EINVAL; else - tp->keepalive_probes = val; + WRITE_ONCE(tp->keepalive_probes, val); break; case TCP_SYNCNT: if (val < 1 || val > MAX_TCP_SYNCNT) -- cgit v1.2.3 From 3a037f0f3c4bfe44518f2fbb478aa2f99a9cd8bb Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 19 Jul 2023 21:28:52 +0000 Subject: tcp: annotate data-races around icsk->icsk_syn_retries do_tcp_getsockopt() and reqsk_timer_handler() read icsk->icsk_syn_retries while another cpu might change its value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20230719212857.3943972-7-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/ipv4/inet_connection_sock.c | 2 +- net/ipv4/tcp.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 0cc19cfbb673..aeebe8816689 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -1019,7 +1019,7 @@ static void reqsk_timer_handler(struct timer_list *t) icsk = inet_csk(sk_listener); net = sock_net(sk_listener); - max_syn_ack_retries = icsk->icsk_syn_retries ? : + max_syn_ack_retries = READ_ONCE(icsk->icsk_syn_retries) ? : READ_ONCE(net->ipv4.sysctl_tcp_synack_retries); /* Normally all the openreqs are young and become mature * (i.e. converted to established socket) for first timeout. diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 574fd0da1673..9f74ac16f1c1 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3291,7 +3291,7 @@ int tcp_sock_set_syncnt(struct sock *sk, int val) return -EINVAL; lock_sock(sk); - inet_csk(sk)->icsk_syn_retries = val; + WRITE_ONCE(inet_csk(sk)->icsk_syn_retries, val); release_sock(sk); return 0; } @@ -3572,7 +3572,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname, if (val < 1 || val > MAX_TCP_SYNCNT) err = -EINVAL; else - icsk->icsk_syn_retries = val; + WRITE_ONCE(icsk->icsk_syn_retries, val); break; case TCP_SAVE_SYN: @@ -3993,7 +3993,7 @@ int do_tcp_getsockopt(struct sock *sk, int level, val = keepalive_probes(tp); break; case TCP_SYNCNT: - val = icsk->icsk_syn_retries ? : + val = READ_ONCE(icsk->icsk_syn_retries) ? : READ_ONCE(net->ipv4.sysctl_tcp_syn_retries); break; case TCP_LINGER2: -- cgit v1.2.3 From 9df5335ca974e688389c875546e5819778a80d59 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 19 Jul 2023 21:28:53 +0000 Subject: tcp: annotate data-races around tp->linger2 do_tcp_getsockopt() reads tp->linger2 while another cpu might change its value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20230719212857.3943972-8-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/ipv4/tcp.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 9f74ac16f1c1..2cf129a0c00b 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3585,11 +3585,11 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname, case TCP_LINGER2: if (val < 0) - tp->linger2 = -1; + WRITE_ONCE(tp->linger2, -1); else if (val > TCP_FIN_TIMEOUT_MAX / HZ) - tp->linger2 = TCP_FIN_TIMEOUT_MAX; + WRITE_ONCE(tp->linger2, TCP_FIN_TIMEOUT_MAX); else - tp->linger2 = val * HZ; + WRITE_ONCE(tp->linger2, val * HZ); break; case TCP_DEFER_ACCEPT: @@ -3997,7 +3997,7 @@ int do_tcp_getsockopt(struct sock *sk, int level, READ_ONCE(net->ipv4.sysctl_tcp_syn_retries); break; case TCP_LINGER2: - val = tp->linger2; + val = READ_ONCE(tp->linger2); if (val >= 0) val = (val ? : READ_ONCE(net->ipv4.sysctl_tcp_fin_timeout)) / HZ; break; -- cgit v1.2.3 From ae488c74422fb1dcd807c0201804b3b5e8a322a3 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 19 Jul 2023 21:28:54 +0000 Subject: tcp: annotate data-races around rskq_defer_accept do_tcp_getsockopt() reads rskq_defer_accept while another cpu might change its value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20230719212857.3943972-9-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/ipv4/tcp.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 2cf129a0c00b..5beec71a5c41 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3594,9 +3594,9 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname, case TCP_DEFER_ACCEPT: /* Translate value in seconds to number of retransmits */ - icsk->icsk_accept_queue.rskq_defer_accept = - secs_to_retrans(val, TCP_TIMEOUT_INIT / HZ, - TCP_RTO_MAX / HZ); + WRITE_ONCE(icsk->icsk_accept_queue.rskq_defer_accept, + secs_to_retrans(val, TCP_TIMEOUT_INIT / HZ, + TCP_RTO_MAX / HZ)); break; case TCP_WINDOW_CLAMP: @@ -4002,8 +4002,9 @@ int do_tcp_getsockopt(struct sock *sk, int level, val = (val ? : READ_ONCE(net->ipv4.sysctl_tcp_fin_timeout)) / HZ; break; case TCP_DEFER_ACCEPT: - val = retrans_to_secs(icsk->icsk_accept_queue.rskq_defer_accept, - TCP_TIMEOUT_INIT / HZ, TCP_RTO_MAX / HZ); + val = READ_ONCE(icsk->icsk_accept_queue.rskq_defer_accept); + val = retrans_to_secs(val, TCP_TIMEOUT_INIT / HZ, + TCP_RTO_MAX / HZ); break; case TCP_WINDOW_CLAMP: val = tp->window_clamp; -- cgit v1.2.3 From 1aeb87bc1440c5447a7fa2d6e3c2cca52cbd206b Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 19 Jul 2023 21:28:55 +0000 Subject: tcp: annotate data-races around tp->notsent_lowat tp->notsent_lowat can be read locklessly from do_tcp_getsockopt() and tcp_poll(). Fixes: c9bee3b7fdec ("tcp: TCP_NOTSENT_LOWAT socket option") Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20230719212857.3943972-10-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/net/tcp.h | 6 +++++- net/ipv4/tcp.c | 4 ++-- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index a8b93b2875ea..0ca972ebd3dd 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2061,7 +2061,11 @@ void __tcp_v4_send_check(struct sk_buff *skb, __be32 saddr, __be32 daddr); static inline u32 tcp_notsent_lowat(const struct tcp_sock *tp) { struct net *net = sock_net((struct sock *)tp); - return tp->notsent_lowat ?: READ_ONCE(net->ipv4.sysctl_tcp_notsent_lowat); + u32 val; + + val = READ_ONCE(tp->notsent_lowat); + + return val ?: READ_ONCE(net->ipv4.sysctl_tcp_notsent_lowat); } bool tcp_stream_memory_free(const struct sock *sk, int wake); diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 5beec71a5c41..2b2241e9b492 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3664,7 +3664,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname, err = tcp_repair_set_window(tp, optval, optlen); break; case TCP_NOTSENT_LOWAT: - tp->notsent_lowat = val; + WRITE_ONCE(tp->notsent_lowat, val); sk->sk_write_space(sk); break; case TCP_INQ: @@ -4164,7 +4164,7 @@ int do_tcp_getsockopt(struct sock *sk, int level, val = tcp_time_stamp_raw() + READ_ONCE(tp->tsoffset); break; case TCP_NOTSENT_LOWAT: - val = tp->notsent_lowat; + val = READ_ONCE(tp->notsent_lowat); break; case TCP_INQ: val = tp->recvmsg_inq; -- cgit v1.2.3 From 26023e91e12c68669db416b97234328a03d8e499 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 19 Jul 2023 21:28:56 +0000 Subject: tcp: annotate data-races around icsk->icsk_user_timeout This field can be read locklessly from do_tcp_getsockopt() Fixes: dca43c75e7e5 ("tcp: Add TCP_USER_TIMEOUT socket option.") Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20230719212857.3943972-11-edumazet@google.com Signed-off-by: Jakub Kicinski --- net/ipv4/tcp.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 2b2241e9b492..3e137e9a18f5 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3300,7 +3300,7 @@ EXPORT_SYMBOL(tcp_sock_set_syncnt); void tcp_sock_set_user_timeout(struct sock *sk, u32 val) { lock_sock(sk); - inet_csk(sk)->icsk_user_timeout = val; + WRITE_ONCE(inet_csk(sk)->icsk_user_timeout, val); release_sock(sk); } EXPORT_SYMBOL(tcp_sock_set_user_timeout); @@ -3620,7 +3620,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname, if (val < 0) err = -EINVAL; else - icsk->icsk_user_timeout = val; + WRITE_ONCE(icsk->icsk_user_timeout, val); break; case TCP_FASTOPEN: @@ -4141,7 +4141,7 @@ int do_tcp_getsockopt(struct sock *sk, int level, break; case TCP_USER_TIMEOUT: - val = icsk->icsk_user_timeout; + val = READ_ONCE(icsk->icsk_user_timeout); break; case TCP_FASTOPEN: -- cgit v1.2.3 From 70f360dd7042cb843635ece9d28335a4addff9eb Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 19 Jul 2023 21:28:57 +0000 Subject: tcp: annotate data-races around fastopenq.max_qlen This field can be read locklessly. Fixes: 1536e2857bd3 ("tcp: Add a TCP_FASTOPEN socket option to get a max backlog on its listner") Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20230719212857.3943972-12-edumazet@google.com Signed-off-by: Jakub Kicinski --- include/linux/tcp.h | 2 +- net/ipv4/tcp.c | 2 +- net/ipv4/tcp_fastopen.c | 6 ++++-- 3 files changed, 6 insertions(+), 4 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index b4c08ac86983..91a37c99ba66 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -513,7 +513,7 @@ static inline void fastopen_queue_tune(struct sock *sk, int backlog) struct request_sock_queue *queue = &inet_csk(sk)->icsk_accept_queue; int somaxconn = READ_ONCE(sock_net(sk)->core.sysctl_somaxconn); - queue->fastopenq.max_qlen = min_t(unsigned int, backlog, somaxconn); + WRITE_ONCE(queue->fastopenq.max_qlen, min_t(unsigned int, backlog, somaxconn)); } static inline void tcp_move_syn(struct tcp_sock *tp, diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 3e137e9a18f5..8ed52e1e3c99 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -4145,7 +4145,7 @@ int do_tcp_getsockopt(struct sock *sk, int level, break; case TCP_FASTOPEN: - val = icsk->icsk_accept_queue.fastopenq.max_qlen; + val = READ_ONCE(icsk->icsk_accept_queue.fastopenq.max_qlen); break; case TCP_FASTOPEN_CONNECT: diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c index 45cc7f1ca296..85e4953f1182 100644 --- a/net/ipv4/tcp_fastopen.c +++ b/net/ipv4/tcp_fastopen.c @@ -296,6 +296,7 @@ static struct sock *tcp_fastopen_create_child(struct sock *sk, static bool tcp_fastopen_queue_check(struct sock *sk) { struct fastopen_queue *fastopenq; + int max_qlen; /* Make sure the listener has enabled fastopen, and we don't * exceed the max # of pending TFO requests allowed before trying @@ -308,10 +309,11 @@ static bool tcp_fastopen_queue_check(struct sock *sk) * temporarily vs a server not supporting Fast Open at all. */ fastopenq = &inet_csk(sk)->icsk_accept_queue.fastopenq; - if (fastopenq->max_qlen == 0) + max_qlen = READ_ONCE(fastopenq->max_qlen); + if (max_qlen == 0) return false; - if (fastopenq->qlen >= fastopenq->max_qlen) { + if (fastopenq->qlen >= max_qlen) { struct request_sock *req1; spin_lock(&fastopenq->lock); req1 = fastopenq->rskq_rst_head; -- cgit v1.2.3 From 1c613beaf877c0c0d755853dc62687e2013e55c4 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Thu, 20 Jul 2023 03:02:31 +0300 Subject: net: phy: prevent stale pointer dereference in phy_init() mdio_bus_init() and phy_driver_register() both have error paths, and if those are ever hit, ethtool will have a stale pointer to the phy_ethtool_phy_ops stub structure, which references memory from a module that failed to load (phylib). It is probably hard to force an error in this code path even manually, but the error teardown path of phy_init() should be the same as phy_exit(), which is now simply not the case. Fixes: 55d8f053ce1b ("net: phy: Register ethtool PHY operations") Link: https://lore.kernel.org/netdev/ZLaiJ4G6TaJYGJyU@shell.armlinux.org.uk/ Suggested-by: Russell King (Oracle) Signed-off-by: Vladimir Oltean Link: https://lore.kernel.org/r/20230720000231.1939689-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski --- drivers/net/phy/phy_device.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 0c2014accba7..61921d4dbb13 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -3451,23 +3451,30 @@ static int __init phy_init(void) { int rc; + ethtool_set_ethtool_phy_ops(&phy_ethtool_phy_ops); + rc = mdio_bus_init(); if (rc) - return rc; + goto err_ethtool_phy_ops; - ethtool_set_ethtool_phy_ops(&phy_ethtool_phy_ops); features_init(); rc = phy_driver_register(&genphy_c45_driver, THIS_MODULE); if (rc) - goto err_c45; + goto err_mdio_bus; rc = phy_driver_register(&genphy_driver, THIS_MODULE); - if (rc) { - phy_driver_unregister(&genphy_c45_driver); + if (rc) + goto err_c45; + + return 0; + err_c45: - mdio_bus_exit(); - } + phy_driver_unregister(&genphy_c45_driver); +err_mdio_bus: + mdio_bus_exit(); +err_ethtool_phy_ops: + ethtool_set_ethtool_phy_ops(NULL); return rc; } -- cgit v1.2.3