summaryrefslogtreecommitdiff
path: root/drivers/net
AgeCommit message (Collapse)AuthorFilesLines
2023-02-09net: mscc: ocelot: fix all IPv6 getting trapped to CPU when PTP timestamping ↵Vladimir Oltean1-4/+4
is used While running this selftest which usually passes: ~/selftests/drivers/net/dsa# ./local_termination.sh eno0 swp0 TEST: swp0: Unicast IPv4 to primary MAC address [ OK ] TEST: swp0: Unicast IPv4 to macvlan MAC address [ OK ] TEST: swp0: Unicast IPv4 to unknown MAC address [ OK ] TEST: swp0: Unicast IPv4 to unknown MAC address, promisc [ OK ] TEST: swp0: Unicast IPv4 to unknown MAC address, allmulti [ OK ] TEST: swp0: Multicast IPv4 to joined group [ OK ] TEST: swp0: Multicast IPv4 to unknown group [ OK ] TEST: swp0: Multicast IPv4 to unknown group, promisc [ OK ] TEST: swp0: Multicast IPv4 to unknown group, allmulti [ OK ] TEST: swp0: Multicast IPv6 to joined group [ OK ] TEST: swp0: Multicast IPv6 to unknown group [ OK ] TEST: swp0: Multicast IPv6 to unknown group, promisc [ OK ] TEST: swp0: Multicast IPv6 to unknown group, allmulti [ OK ] if I start PTP timestamping then run it again (debug prints added by me), the unknown IPv6 MC traffic is seen by the CPU port even when it should have been dropped: ~/selftests/drivers/net/dsa# ptp4l -i swp0 -2 -P -m ptp4l[225.410]: selected /dev/ptp1 as PTP clock [ 225.445746] mscc_felix 0000:00:00.5: ocelot_l2_ptp_trap_add: port 0 adding L2 PTP trap [ 225.453815] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_add: port 0 adding IPv4 PTP event trap [ 225.462703] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_add: port 0 adding IPv4 PTP general trap [ 225.471768] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_add: port 0 adding IPv6 PTP event trap [ 225.480651] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_add: port 0 adding IPv6 PTP general trap ptp4l[225.488]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE ptp4l[225.488]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE ^C ~/selftests/drivers/net/dsa# ./local_termination.sh eno0 swp0 TEST: swp0: Unicast IPv4 to primary MAC address [ OK ] TEST: swp0: Unicast IPv4 to macvlan MAC address [ OK ] TEST: swp0: Unicast IPv4 to unknown MAC address [ OK ] TEST: swp0: Unicast IPv4 to unknown MAC address, promisc [ OK ] TEST: swp0: Unicast IPv4 to unknown MAC address, allmulti [ OK ] TEST: swp0: Multicast IPv4 to joined group [ OK ] TEST: swp0: Multicast IPv4 to unknown group [ OK ] TEST: swp0: Multicast IPv4 to unknown group, promisc [ OK ] TEST: swp0: Multicast IPv4 to unknown group, allmulti [ OK ] TEST: swp0: Multicast IPv6 to joined group [ OK ] TEST: swp0: Multicast IPv6 to unknown group [FAIL] reception succeeded, but should have failed TEST: swp0: Multicast IPv6 to unknown group, promisc [ OK ] TEST: swp0: Multicast IPv6 to unknown group, allmulti [ OK ] The PGID_MCIPV6 is configured correctly to not flood to the CPU, I checked that. Furthermore, when I disable back PTP RX timestamping (ptp4l doesn't do that when it exists), packets are RX filtered again as they should be: ~/selftests/drivers/net/dsa# hwstamp_ctl -i swp0 -r 0 [ 218.202854] mscc_felix 0000:00:00.5: ocelot_l2_ptp_trap_del: port 0 removing L2 PTP trap [ 218.212656] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_del: port 0 removing IPv4 PTP event trap [ 218.222975] mscc_felix 0000:00:00.5: ocelot_ipv4_ptp_trap_del: port 0 removing IPv4 PTP general trap [ 218.233133] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_del: port 0 removing IPv6 PTP event trap [ 218.242251] mscc_felix 0000:00:00.5: ocelot_ipv6_ptp_trap_del: port 0 removing IPv6 PTP general trap current settings: tx_type 1 rx_filter 12 new settings: tx_type 1 rx_filter 0 ~/selftests/drivers/net/dsa# ./local_termination.sh eno0 swp0 TEST: swp0: Unicast IPv4 to primary MAC address [ OK ] TEST: swp0: Unicast IPv4 to macvlan MAC address [ OK ] TEST: swp0: Unicast IPv4 to unknown MAC address [ OK ] TEST: swp0: Unicast IPv4 to unknown MAC address, promisc [ OK ] TEST: swp0: Unicast IPv4 to unknown MAC address, allmulti [ OK ] TEST: swp0: Multicast IPv4 to joined group [ OK ] TEST: swp0: Multicast IPv4 to unknown group [ OK ] TEST: swp0: Multicast IPv4 to unknown group, promisc [ OK ] TEST: swp0: Multicast IPv4 to unknown group, allmulti [ OK ] TEST: swp0: Multicast IPv6 to joined group [ OK ] TEST: swp0: Multicast IPv6 to unknown group [ OK ] TEST: swp0: Multicast IPv6 to unknown group, promisc [ OK ] TEST: swp0: Multicast IPv6 to unknown group, allmulti [ OK ] So it's clear that something in the PTP RX trapping logic went wrong. Looking a bit at the code, I can see that there are 4 typos, which populate "ipv4" VCAP IS2 key filter fields for IPv6 keys. VCAP IS2 keys of type OCELOT_VCAP_KEY_IPV4 and OCELOT_VCAP_KEY_IPV6 are handled by is2_entry_set(). OCELOT_VCAP_KEY_IPV4 looks at &filter->key.ipv4, and OCELOT_VCAP_KEY_IPV6 at &filter->key.ipv6. Simply put, when we populate the wrong key field, &filter->key.ipv6 fields "proto.mask" and "proto.value" remain all zeroes (or "don't care"). So is2_entry_set() will enter the "else" of this "if" condition: if (msk == 0xff && (val == IPPROTO_TCP || val == IPPROTO_UDP)) and proceed to ignore the "proto" field. The resulting rule will match on all IPv6 traffic, trapping it to the CPU. This is the reason why the local_termination.sh selftest sees it, because control traps are stronger than the PGID_MCIPV6 used for flooding (from the forwarding data path). But the problem is in fact much deeper. We trap all IPv6 traffic to the CPU, but if we're bridged, we set skb->offload_fwd_mark = 1, so software forwarding will not take place and IPv6 traffic will never reach its destination. The fix is simple - correct the typos. I was intentionally inaccurate in the commit message about the breakage occurring when any PTP timestamping is enabled. In fact it only happens when L4 timestamping is requested (HWTSTAMP_FILTER_PTP_V2_EVENT or HWTSTAMP_FILTER_PTP_V2_L4_EVENT). But ptp4l requests a larger RX timestamping filter than it needs for "-2": HWTSTAMP_FILTER_PTP_V2_EVENT. I wanted people skimming through git logs to not think that the bug doesn't affect them because they only use ptp4l in L2 mode. Fixes: 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230207183117.1745754-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-09Merge tag 'mlx5-fixes-2023-02-07' of ↵Jakub Kicinski12-123/+76
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2023-02-07 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2023-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Serialize module cleanup with reload and remove net/mlx5: fw_tracer, Zero consumer index when reloading the tracer net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffers net/mlx5: Expose SF firmware pages counter net/mlx5: Store page counters in a single array net/mlx5e: IPoIB, Show unknown speed instead of error net/mlx5e: Fix crash unsetting rx-vlan-filter in switchdev mode net/mlx5: Bridge, fix ageing of peer FDB entries net/mlx5: DR, Fix potential race in dr_rule_create_rule_nic net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change ==================== Link: https://lore.kernel.org/r/20230208030302.95378-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-08net: ethernet: mtk_eth_soc: fix DSA TX tag hwaccel for switch port 0Vladimir Oltean1-10/+14
Arınç reports that on his MT7621AT Unielec U7621-06 board and MT7623NI Bananapi BPI-R2, packets received by the CPU over mt7530 switch port 0 (of which this driver acts as the DSA master) are not processed correctly by software. More precisely, they arrive without a DSA tag (in packet or in the hwaccel area - skb_metadata_dst()), so DSA cannot demux them towards the switch's interface for port 0. Traffic from other ports receives a skb_metadata_dst() with the correct port and is demuxed properly. Looking at mtk_poll_rx(), it becomes apparent that this driver uses the skb vlan hwaccel area: union { u32 vlan_all; struct { __be16 vlan_proto; __u16 vlan_tci; }; }; as a temporary storage for the VLAN hwaccel tag, or the DSA hwaccel tag. If this is a DSA master it's a DSA hwaccel tag, and finally clears up the skb VLAN hwaccel header. I'm guessing that the problem is the (mis)use of API. skb_vlan_tag_present() looks like this: #define skb_vlan_tag_present(__skb) (!!(__skb)->vlan_all) So if both vlan_proto and vlan_tci are zeroes, skb_vlan_tag_present() returns precisely false. I don't know for sure what is the format of the DSA hwaccel tag, but I surely know that lowermost 3 bits of vlan_proto are 0 when receiving from port 0: unsigned int port = vlan_proto & GENMASK(2, 0); If the RX descriptor has no other bits set to non-zero values in RX_DMA_VTAG, then the call to __vlan_hwaccel_put_tag() will not, in fact, make the subsequent skb_vlan_tag_present() return true, because it's implemented like this: static inline void __vlan_hwaccel_put_tag(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci) { skb->vlan_proto = vlan_proto; skb->vlan_tci = vlan_tci; } What we need to do to fix this problem (assuming this is the problem) is to stop using skb->vlan_all as temporary storage for driver affairs, and just create some local variables that serve the same purpose, but hopefully better. Instead of calling skb_vlan_tag_present(), let's look at a boolean has_hwaccel_tag which we set to true when the RX DMA descriptors have something. Disambiguate based on netdev_uses_dsa() whether this is a VLAN or DSA hwaccel tag, and only call __vlan_hwaccel_put_tag() if we're certain it's a VLAN tag. Arınç confirms that the treatment works, so this validates the assumption. Link: https://lore.kernel.org/netdev/704f3a72-fc9e-714a-db54-272e17612637@arinc9.com/ Fixes: 2d7605a72906 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging") Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com> Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08nfp: ethtool: fix the bug of setting unsupported port speedYu Xiao2-36/+170
Unsupported port speed can be set and cause error. Now fixing it and return an error if setting unsupported speed. This fix depends on the following, which was included in v6.2-rc1: commit a61474c41e8c ("nfp: ethtool: support reporting link modes"). Fixes: 7c698737270f ("nfp: add support for .set_link_ksettings()") Signed-off-by: Yu Xiao <yu.xiao@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08net: ethernet: mtk_eth_soc: fix wrong parameters order in __xdp_rxq_info_reg()Tariq Toukan1-2/+2
Parameters 'queue_index' and 'napi_id' are passed in a swapped order. Fix it here. Fixes: 23233e577ef9 ("net: ethernet: mtk_eth_soc: rely on page_pool for single page buffers") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08net: ethernet: mtk_eth_soc: enable special tag when any MAC uses DSAArınç ÜNAL1-5/+3
The special tag is only enabled when the first MAC uses DSA. However, it must be enabled when any MAC uses DSA. Change the check accordingly. This fixes hardware DSA untagging not working on the second MAC of the MT7621 and MT7623 SoCs, and likely other SoCs too. Therefore, remove the check that disables hardware DSA untagging for the second MAC of the MT7621 and MT7623 SoCs. Fixes: a1f47752fd62 ("net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MAC") Co-developed-by: Richard van Schagen <richard@routerhints.com> Signed-off-by: Richard van Schagen <richard@routerhints.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-08Merge branch '100GbE' of ↵Jakub Kicinski6-22/+30
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-02-06 (ice) This series contains updates to ice driver only. Ani removes WQ_MEM_RECLAIM flag from workqueue to resolve check_flush_dependency warning. Michal fixes KASAN out-of-bounds warning. Brett corrects behaviour for port VLAN Rx filters to prevent receiving of unintended traffic. Dan Carpenter fixes possible off by one issue. Zhang Changzhong adjusts error path for switch recipe to prevent memory leak. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: switch: fix potential memleak in ice_add_adv_recipe() ice: Fix off by one in ice_tc_forward_to_queue() ice: Fix disabling Rx VLAN filtering with port VLAN enabled ice: fix out-of-bounds KASAN warning in virtchnl ice: Do not use WQ_MEM_RECLAIM flag for workqueue ==================== Link: https://lore.kernel.org/r/20230206232934.634298-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-08igc: Add ndo_tx_timeout supportSasha Neftin1-2/+23
On some platforms, 100/1000/2500 speeds seem to have sometimes problems reporting false positive tx unit hang during stressful UDP traffic. Likely other Intel drivers introduce responses to a tx hang. Update the 'tx hang' comparator with the comparison of the head and tail of ring pointers and restore the tx_timeout_factor to the previous value (one). This can be test by using netperf or iperf3 applications. Example: iperf3 -s -p 5001 iperf3 -c 192.168.0.2 --udp -p 5001 --time 600 -b 0 netserver -p 16604 netperf -H 192.168.0.2 -l 600 -p 16604 -t UDP_STREAM -- -m 64000 Fixes: b27b8dc77b5e ("igc: Increase timeout value for Speed 100/1000/2500") Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230206235818.662384-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-08net: mana: Fix accessing freed irq affinity_hintHaiyang Zhang1-26/+11
After calling irq_set_affinity_and_hint(), the cpumask pointer is saved in desc->affinity_hint, and will be used later when reading /proc/irq/<num>/affinity_hint. So the cpumask variable needs to be persistent. Otherwise, we are accessing freed memory when reading the affinity_hint file. Also, need to clear affinity_hint before free_irq(), otherwise there is a one-time warning and stack trace during module unloading: [ 243.948687] WARNING: CPU: 10 PID: 1589 at kernel/irq/manage.c:1913 free_irq+0x318/0x360 ... [ 243.948753] Call Trace: [ 243.948754] <TASK> [ 243.948760] mana_gd_remove_irqs+0x78/0xc0 [mana] [ 243.948767] mana_gd_remove+0x3e/0x80 [mana] [ 243.948773] pci_device_remove+0x3d/0xb0 [ 243.948778] device_remove+0x46/0x70 [ 243.948782] device_release_driver_internal+0x1fe/0x280 [ 243.948785] driver_detach+0x4e/0xa0 [ 243.948787] bus_remove_driver+0x70/0xf0 [ 243.948789] driver_unregister+0x35/0x60 [ 243.948792] pci_unregister_driver+0x44/0x90 [ 243.948794] mana_driver_exit+0x14/0x3fe [mana] [ 243.948800] __do_sys_delete_module.constprop.0+0x185/0x2f0 To fix the bug, use the persistent mask, cpumask_of(cpu#), and set affinity_hint to NULL before freeing the IRQ, as required by free_irq(). Cc: stable@vger.kernel.org Fixes: 71fa6887eeca ("net: mana: Assign interrupts to CPUs based on NUMA nodes") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/1675718929-19565-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-08hv_netvsc: Allocate memory in netvsc_dma_map() with GFP_ATOMICMichael Kelley1-1/+1
Memory allocations in the network transmit path must use GFP_ATOMIC so they won't sleep. Reported-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/lkml/8a4d08f94d3e6fe8b6da68440eaa89a088ad84f9.camel@redhat.com/ Fixes: 846da38de0e8 ("net: netvsc: Add Isolation VM support for netvsc driver") Cc: stable@vger.kernel.org Signed-off-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1675714317-48577-1-git-send-email-mikelley@microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-08net/mlx5: Serialize module cleanup with reload and removeShay Drory1-7/+7
Currently, remove and reload flows can run in parallel to module cleanup. This design is error prone. For example: aux_drivers callbacks are called from both cleanup and remove flows with different lockings, which can cause a deadlock[1]. Hence, serialize module cleanup with reload and remove. [1] cleanup remove ------- ------ auxiliary_driver_unregister(); devl_lock() auxiliary_device_delete(mlx5e_aux) device_lock(mlx5e_aux) devl_lock() device_lock(mlx5e_aux) Fixes: 912cebf420c2 ("net/mlx5e: Connect ethernet part to auxiliary bus") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-08net/mlx5: fw_tracer, Zero consumer index when reloading the tracerShay Drory1-1/+1
When tracer is reloaded, the device will log the traces at the beginning of the log buffer. Also, driver is reading the log buffer in chunks in accordance to the consumer index. Hence, zero consumer index when reloading the tracer. Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-08net/mlx5: fw_tracer, Clear load bit when freeing string DBs buffersShay Drory1-0/+1
Whenever the driver is reading the string DBs into buffers, the driver is setting the load bit, but the driver never clears this bit. As a result, in case load bit is on and the driver query the device for new string DBs, the driver won't read again the string DBs. Fix it by clearing the load bit when query the device for new string DBs. Fixes: 2d69356752ff ("net/mlx5: Add support for fw live patch event") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-08net/mlx5: Expose SF firmware pages counterMaher Sanalla2-1/+2
Currently, each core device has VF pages counter which stores number of fw pages used by its VFs and SFs. The current design led to a hang when performing firmware reset on DPU, where the DPU PFs stalled in sriov unload flow due to waiting on release of SFs pages instead of waiting on only VFs pages. Thus, Add a separate counter for SF firmware pages, which will prevent the stall scenario described above. Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-08net/mlx5: Store page counters in a single arrayMaher Sanalla4-20/+25
Currently, an independent page counter is used for tracking memory usage for each function type such as VF, PF and host PF (DPU). For better code-readibilty, use a single array that stores the number of allocated memory pages for each function type. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-08net/mlx5e: IPoIB, Show unknown speed instead of errorDragos Tatulea1-8/+5
ethtool is returning an error for unknown speeds for the IPoIB interface: $ ethtool ib0 netlink error: failed to retrieve link settings netlink error: Invalid argument netlink error: failed to retrieve link settings netlink error: Invalid argument Settings for ib0: Link detected: no After this change, ethtool will return success and show "unknown speed": $ ethtool ib0 Settings for ib0: Supported ports: [ ] Supported link modes: Not reported Supported pause frame use: No Supports auto-negotiation: No Supported FEC modes: Not reported Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Advertised FEC modes: Not reported Speed: Unknown! Duplex: Full Auto-negotiation: off Port: Other PHYAD: 0 Transceiver: internal Link detected: no Fixes: eb234ee9d541 ("net/mlx5e: IPoIB, Add support for get_link_ksettings in ethtool") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-08net/mlx5e: Fix crash unsetting rx-vlan-filter in switchdev modeAmir Tzin2-1/+5
Moving to switchdev mode with rx-vlan-filter on and then setting it off causes the kernel to crash since fs->vlan is freed during nic profile cleanup flow. RX VLAN filtering is not supported in switchdev mode so unset it when changing to switchdev and restore its value when switching back to legacy. trace: [] RIP: 0010:mlx5e_disable_cvlan_filter+0x43/0x70 [] set_feature_cvlan_filter+0x37/0x40 [mlx5_core] [] mlx5e_handle_feature+0x3a/0x60 [mlx5_core] [] mlx5e_set_features+0x6d/0x160 [mlx5_core] [] __netdev_update_features+0x288/0xa70 [] ethnl_set_features+0x309/0x380 [] ? __nla_parse+0x21/0x30 [] genl_family_rcv_msg_doit.isra.17+0x110/0x150 [] genl_rcv_msg+0x112/0x260 [] ? features_reply_size+0xe0/0xe0 [] ? genl_family_rcv_msg_doit.isra.17+0x150/0x150 [] netlink_rcv_skb+0x4e/0x100 [] genl_rcv+0x24/0x40 [] netlink_unicast+0x1ab/0x290 [] netlink_sendmsg+0x257/0x4f0 [] sock_sendmsg+0x5c/0x70 Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors") Signed-off-by: Amir Tzin <amirtz@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-08net/mlx5: Bridge, fix ageing of peer FDB entriesVlad Buslov2-5/+1
SWITCHDEV_FDB_ADD_TO_BRIDGE event handler that updates FDB entry 'lastuse' field is only executed for eswitch that owns the entry. However, if peer entry processed packets at least once it will have hardware counter 'used' value greater than entry 'lastuse' from that point on, which will cause FDB entry not being aged out. Process the event on all eswitch instances. Fixes: ff9b7521468b ("net/mlx5: Bridge, support LAG") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-08net/mlx5: DR, Fix potential race in dr_rule_create_rule_nicYevgeny Kliteynik1-10/+15
Selecting builder should be protected by the lock to prevent the case where a new rule sets a builder in the nic_matcher while the previous rule is still using the nic_matcher. Fixing this issue and cleaning the error flow. Fixes: b9b81e1e9382 ("net/mlx5: DR, For short chains of STEs, avoid allocating ste_arr dynamically") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-08net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag changeAdham Faris1-71/+15
rq->hw_mtu is used in function en_rx.c/mlx5e_skb_from_cqe_mpwrq_linear() to catch oversized packets. If FCS is concatenated to the end of the packet then the check should be updated accordingly. Rx rings initialization (mlx5e_init_rxq_rq()) invoked for every new set of channels, as part of mlx5e_safe_switch_params(), unknowingly if it runs with default configuration or not. Current rq->hw_mtu initialization assumes default configuration and ignores params->scatter_fcs_en flag state. Fix this, by accounting for params->scatter_fcs_en flag state during rq->hw_mtu initialization. In addition, updating rq->hw_mtu value during ingress traffic might lead to packets drop and oversize_pkts_sw_drop counter increase with no good reason. Hence we remove this optimization and switch the set of channels with a new one, to make sure we don't get false positives on the oversize_pkts_sw_drop counter. Fixes: 102722fc6832 ("net/mlx5e: Add support for RXFCS feature flag") Signed-off-by: Adham Faris <afaris@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-07net: mscc: ocelot: fix VCAP filters not matching on MAC with "protocol 802.1Q"Vladimir Oltean1-12/+12
Alternative short title: don't instruct the hardware to match on EtherType with "protocol 802.1Q" flower filters. It doesn't work for the reasons detailed below. With a command such as the following: tc filter add dev $swp1 ingress chain $(IS1 2) pref 3 \ protocol 802.1Q flower skip_sw vlan_id 200 src_mac $h1_mac \ action vlan modify id 300 \ action goto chain $(IS2 0 0) the created filter is set by ocelot_flower_parse_key() to be of type OCELOT_VCAP_KEY_ETYPE, and etype is set to {value=0x8100, mask=0xffff}. This gets propagated all the way to is1_entry_set() which commits it to hardware (the VCAP_IS1_HK_ETYPE field of the key). Compare this to the case where src_mac isn't specified - the key type is OCELOT_VCAP_KEY_ANY, and is1_entry_set() doesn't populate VCAP_IS1_HK_ETYPE. The problem is that for VLAN-tagged frames, the hardware interprets the ETYPE field as holding the encapsulated VLAN protocol. So the above filter will only match those packets which have an encapsulated protocol of 0x8100, rather than all packets with VLAN ID 200 and the given src_mac. The reason why this is allowed to occur is because, although we have a block of code in ocelot_flower_parse_key() which sets "match_protocol" to false when VLAN keys are present, that code executes too late. There is another block of code, which executes for Ethernet addresses, and has a "goto finished_key_parsing" and skips the VLAN header parsing. By skipping it, "match_protocol" remains with the value it was initialized with, i.e. "true", and "proto" is set to f->common.protocol, or 0x8100. The concept of ignoring some keys rather than erroring out when they are present but can't be offloaded is dubious in itself, but is present since the initial commit fe3490e6107e ("net: mscc: ocelot: Hardware ofload for tc flower filter"), and it's outside of the scope of this patch to change that. The problem was introduced when the driver started to interpret the flower filter's protocol, and populate the VCAP filter's ETYPE field based on it. To fix this, it is sufficient to move the code that parses the VLAN keys earlier than the "goto finished_key_parsing" instruction. This will ensure that if we have a flower filter with both VLAN and Ethernet address keys, it won't match on ETYPE 0x8100, because the VLAN key parsing sets "match_protocol = false". Fixes: 86b956de119c ("net: mscc: ocelot: support matching on EtherType") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230205192409.1796428-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-07net: dsa: mt7530: don't change PVC_EG_TAG when CPU port becomes VLAN-awareVladimir Oltean1-7/+19
Frank reports that in a mt7530 setup where some ports are standalone and some are in a VLAN-aware bridge, 8021q uppers of the standalone ports lose their VLAN tag on xmit, as seen by the link partner. This seems to occur because once the other ports join the VLAN-aware bridge, mt7530_port_vlan_filtering() also calls mt7530_port_set_vlan_aware(ds, cpu_dp->index), and this affects the way that the switch processes the traffic of the standalone port. Relevant is the PVC_EG_TAG bit. The MT7530 documentation says about it: EG_TAG: Incoming Port Egress Tag VLAN Attribution 0: disabled (system default) 1: consistent (keep the original ingress tag attribute) My interpretation is that this setting applies on the ingress port, and "disabled" is basically the normal behavior, where the egress tag format of the packet (tagged or untagged) is decided by the VLAN table (MT7530_VLAN_EGRESS_UNTAG or MT7530_VLAN_EGRESS_TAG). But there is also an option of overriding the system default behavior, and for the egress tagging format of packets to be decided not by the VLAN table, but simply by copying the ingress tag format (if ingress was tagged, egress is tagged; if ingress was untagged, egress is untagged; aka "consistent). This is useful in 2 scenarios: - VLAN-unaware bridge ports will always encounter a miss in the VLAN table. They should forward a packet as-is, though. So we use "consistent" there. See commit e045124e9399 ("net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode"). - Traffic injected from the CPU port. The operating system is in god mode; if it wants a packet to exit as VLAN-tagged, it sends it as VLAN-tagged. Otherwise it sends it as VLAN-untagged*. *This is true only if we don't consider the bridge TX forwarding offload feature, which mt7530 doesn't support. So for now, make the CPU port always stay in "consistent" mode to allow software VLANs to be forwarded to their egress ports with the VLAN tag intact, and not stripped. Link: https://lore.kernel.org/netdev/trinity-e6294d28-636c-4c40-bb8b-b523521b00be-1674233135062@3c-app-gmx-bs36/ Fixes: e045124e9399 ("net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode") Reported-by: Frank Wunderlich <frank-w@public-files.de> Tested-by: Frank Wunderlich <frank-w@public-files.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230205140713.1609281-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-07ice: switch: fix potential memleak in ice_add_adv_recipe()Zhang Changzhong1-1/+1
When ice_add_special_words() fails, the 'rm' is not released, which will lead to a memory leak. Fix this up by going to 'err_unroll' label. Compile tested only. Fixes: 8b032a55c1bd ("ice: low level support for tunnels") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-07ice: Fix off by one in ice_tc_forward_to_queue()Dan Carpenter1-1/+1
The > comparison should be >= to prevent reading one element beyond the end of the array. The "vsi->num_rxq" is not strictly speaking the number of elements in the vsi->rxq_map[] array. The array has "vsi->alloc_rxq" elements and "vsi->num_rxq" is less than or equal to the number of elements in the array. The array is allocated in ice_vsi_alloc_arrays(). It's still an off by one but it might not access outside the end of the array. Fixes: 143b86f346c7 ("ice: Enable RX queue selection using skbedit action") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Amritha Nambiar <amritha.nambiar@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-07ice: Fix disabling Rx VLAN filtering with port VLAN enabledBrett Creeley1-1/+15
If the user turns on the vf-true-promiscuous-support flag, then Rx VLAN filtering will be disabled if the VF requests to enable promiscuous mode. When the VF is in a port VLAN, this is the incorrect behavior because it will allow the VF to receive traffic outside of its port VLAN domain. Fortunately this only resulted in the VF(s) receiving broadcast traffic outside of the VLAN domain because all of the VLAN promiscuous rules are based on the port VLAN ID. Fix this by setting the .disable_rx_filtering VLAN op to a no-op when a port VLAN is enabled on the VF. Also, make sure to make this fix for both Single VLAN Mode and Double VLAN Mode enabled devices. Fixes: c31af68a1b94 ("ice: Add outer_vlan_ops and VSI specific VLAN ops implementations") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-02-07ice: fix out-of-bounds KASAN warning in virtchnlMichal Swiatkowski2-18/+12
KASAN reported: [ 9793.708867] BUG: KASAN: global-out-of-bounds in ice_get_link_speed+0x16/0x30 [ice] [ 9793.709205] Read of size 4 at addr ffffffffc1271b1c by task kworker/6:1/402 [ 9793.709222] CPU: 6 PID: 402 Comm: kworker/6:1 Kdump: loaded Tainted: G B OE 6.1.0+ #3 [ 9793.709235] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.00.01.0014.070920180847 07/09/2018 [ 9793.709245] Workqueue: ice ice_service_task [ice] [ 9793.709575] Call Trace: [ 9793.709582] <TASK> [ 9793.709588] dump_stack_lvl+0x44/0x5c [ 9793.709613] print_report+0x17f/0x47b [ 9793.709632] ? __cpuidle_text_end+0x5/0x5 [ 9793.709653] ? ice_get_link_speed+0x16/0x30 [ice] [ 9793.709986] ? ice_get_link_speed+0x16/0x30 [ice] [ 9793.710317] kasan_report+0xb7/0x140 [ 9793.710335] ? ice_get_link_speed+0x16/0x30 [ice] [ 9793.710673] ice_get_link_speed+0x16/0x30 [ice] [ 9793.711006] ice_vc_notify_vf_link_state+0x14c/0x160 [ice] [ 9793.711351] ? ice_vc_repr_cfg_promiscuous_mode+0x120/0x120 [ice] [ 9793.711698] ice_vc_process_vf_msg+0x7a7/0xc00 [ice] [ 9793.712074] __ice_clean_ctrlq+0x98f/0xd20 [ice] [ 9793.712534] ? ice_bridge_setlink+0x410/0x410 [ice] [ 9793.712979] ? __request_module+0x320/0x520 [ 9793.713014] ? ice_process_vflr_event+0x27/0x130 [ice] [ 9793.713489] ice_service_task+0x11cf/0x1950 [ice] [ 9793.713948] ? io_schedule_timeout+0xb0/0xb0 [ 9793.713972] process_one_work+0x3d0/0x6a0 [ 9793.714003] worker_thread+0x8a/0x610 [ 9793.714031] ? process_one_work+0x6a0/0x6a0 [ 9793.714049] kthread+0x164/0x1a0 [ 9793.714071] ? kthread_complete_and_exit+0x20/0x20 [ 9793.714100] ret_from_fork+0x1f/0x30 [ 9793.714137] </TASK> [ 9793.714151] The buggy address belongs to the variable: [ 9793.714158] ice_aq_to_link_speed+0x3c/0xffffffffffff3520 [ice] [ 9793.714632] Memory state around the buggy address: [ 9793.714642] ffffffffc1271a00: f9 f9 f9 f9 00 00 05 f9 f9 f9 f9 f9 00 00 02 f9 [ 9793.714656] ffffffffc1271a80: f9 f9 f9 f9 00 00 04 f9 f9 f9 f9 f9 00 00 00 00 [ 9793.714670] >ffffffffc1271b00: 00 00 00 04 f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9 [ 9793.714680] ^ [ 9793.714690] ffffffffc1271b80: 00 00 00 00 00 04 f9 f9 f9 f9 f9 f9 00 00 00 00 [ 9793.714704] ffffffffc1271c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 The ICE_AQ_LINK_SPEED_UNKNOWN define is BIT(15). The value is bigger than both legacy and normal link speed tables. Add one element (0 - unknown) to both tables. There is no need to explicitly set table size, leave it empty. Fixes: 1d0e28a9be1f ("ice: Remove and replace ice speed defines with ethtool.h versions") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-07ice: Do not use WQ_MEM_RECLAIM flag for workqueueAnirudh Venkataramanan1-1/+1
When both ice and the irdma driver are loaded, a warning in check_flush_dependency is being triggered. This is due to ice driver workqueue being allocated with the WQ_MEM_RECLAIM flag and the irdma one is not. According to kernel documentation, this flag should be set if the workqueue will be involved in the kernel's memory reclamation flow. Since it is not, there is no need for the ice driver's WQ to have this flag set so remove it. Example trace: [ +0.000004] workqueue: WQ_MEM_RECLAIM ice:ice_service_task [ice] is flushing !WQ_MEM_RECLAIM infiniband:0x0 [ +0.000139] WARNING: CPU: 0 PID: 728 at kernel/workqueue.c:2632 check_flush_dependency+0x178/0x1a0 [ +0.000011] Modules linked in: bonding tls xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_cha in_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rfkill vfat fat intel_rapl_msr intel _rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct1 0dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_ core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_cm iw_cm iTCO_wdt iTCO_vendor_support ipmi_ssif irdma mei_me ib_uverbs ib_core intel_uncore joydev pcspkr i2c_i801 acpi_ipmi mei lpc_ich i2c_smbus intel_pch_thermal ioatdma ipmi_si acpi_power_meter acpi_pad xfs libcrc32c sd_mod t10_pi crc64_rocksoft crc64 sg ahci ixgbe libahci ice i40e igb crc32c_intel mdio i2c_algo_bit liba ta dca wmi dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse [ +0.000161] [last unloaded: bonding] [ +0.000006] CPU: 0 PID: 728 Comm: kworker/0:2 Tainted: G S 6.2.0-rc2_next-queue-13jan-00458-gc20aabd57164 #1 [ +0.000006] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020 [ +0.000003] Workqueue: ice ice_service_task [ice] [ +0.000127] RIP: 0010:check_flush_dependency+0x178/0x1a0 [ +0.000005] Code: 89 8e 02 01 e8 49 3d 40 00 49 8b 55 18 48 8d 8d d0 00 00 00 48 8d b3 d0 00 00 00 4d 89 e0 48 c7 c7 e0 3b 08 9f e8 bb d3 07 01 <0f> 0b e9 be fe ff ff 80 3d 24 89 8e 02 00 0f 85 6b ff ff ff e9 06 [ +0.000004] RSP: 0018:ffff88810a39f990 EFLAGS: 00010282 [ +0.000005] RAX: 0000000000000000 RBX: ffff888141bc2400 RCX: 0000000000000000 [ +0.000004] RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffa1213a80 [ +0.000003] RBP: ffff888194bf3400 R08: ffffed117b306112 R09: ffffed117b306112 [ +0.000003] R10: ffff888bd983088b R11: ffffed117b306111 R12: 0000000000000000 [ +0.000003] R13: ffff888111f84d00 R14: ffff88810a3943ac R15: ffff888194bf3400 [ +0.000004] FS: 0000000000000000(0000) GS:ffff888bd9800000(0000) knlGS:0000000000000000 [ +0.000003] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000003] CR2: 000056035b208b60 CR3: 000000017795e005 CR4: 00000000007706f0 [ +0.000003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ +0.000003] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ +0.000002] PKRU: 55555554 [ +0.000003] Call Trace: [ +0.000002] <TASK> [ +0.000003] __flush_workqueue+0x203/0x840 [ +0.000006] ? mutex_unlock+0x84/0xd0 [ +0.000008] ? __pfx_mutex_unlock+0x10/0x10 [ +0.000004] ? __pfx___flush_workqueue+0x10/0x10 [ +0.000006] ? mutex_lock+0xa3/0xf0 [ +0.000005] ib_cache_cleanup_one+0x39/0x190 [ib_core] [ +0.000174] __ib_unregister_device+0x84/0xf0 [ib_core] [ +0.000094] ib_unregister_device+0x25/0x30 [ib_core] [ +0.000093] irdma_ib_unregister_device+0x97/0xc0 [irdma] [ +0.000064] ? __pfx_irdma_ib_unregister_device+0x10/0x10 [irdma] [ +0.000059] ? up_write+0x5c/0x90 [ +0.000005] irdma_remove+0x36/0x90 [irdma] [ +0.000062] auxiliary_bus_remove+0x32/0x50 [ +0.000007] device_release_driver_internal+0xfa/0x1c0 [ +0.000005] bus_remove_device+0x18a/0x260 [ +0.000007] device_del+0x2e5/0x650 [ +0.000005] ? __pfx_device_del+0x10/0x10 [ +0.000003] ? mutex_unlock+0x84/0xd0 [ +0.000004] ? __pfx_mutex_unlock+0x10/0x10 [ +0.000004] ? _raw_spin_unlock+0x18/0x40 [ +0.000005] ice_unplug_aux_dev+0x52/0x70 [ice] [ +0.000160] ice_service_task+0x1309/0x14f0 [ice] [ +0.000134] ? __pfx___schedule+0x10/0x10 [ +0.000006] process_one_work+0x3b1/0x6c0 [ +0.000008] worker_thread+0x69/0x670 [ +0.000005] ? __kthread_parkme+0xec/0x110 [ +0.000007] ? __pfx_worker_thread+0x10/0x10 [ +0.000005] kthread+0x17f/0x1b0 [ +0.000005] ? __pfx_kthread+0x10/0x10 [ +0.000004] ret_from_fork+0x29/0x50 [ +0.000009] </TASK> Fixes: 940b61af02f4 ("ice: Initialize PF and setup miscellaneous interrupt") Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Jakub Andrysiak <jakub.andrysiak@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
2023-02-06net: USB: Fix wrong-direction WARNING in plusb.cAlan Stern1-3/+1
The syzbot fuzzer detected a bug in the plusb network driver: A zero-length control-OUT transfer was treated as a read instead of a write. In modern kernels this error provokes a WARNING: usb 1-1: BOGUS control dir, pipe 80000280 doesn't match bRequestType c0 WARNING: CPU: 0 PID: 4645 at drivers/usb/core/urb.c:411 usb_submit_urb+0x14a7/0x1880 drivers/usb/core/urb.c:411 Modules linked in: CPU: 1 PID: 4645 Comm: dhcpcd Not tainted 6.2.0-rc6-syzkaller-00050-g9f266ccaa2f5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023 RIP: 0010:usb_submit_urb+0x14a7/0x1880 drivers/usb/core/urb.c:411 ... Call Trace: <TASK> usb_start_wait_urb+0x101/0x4b0 drivers/usb/core/message.c:58 usb_internal_control_msg drivers/usb/core/message.c:102 [inline] usb_control_msg+0x320/0x4a0 drivers/usb/core/message.c:153 __usbnet_read_cmd+0xb9/0x390 drivers/net/usb/usbnet.c:2010 usbnet_read_cmd+0x96/0xf0 drivers/net/usb/usbnet.c:2068 pl_vendor_req drivers/net/usb/plusb.c:60 [inline] pl_set_QuickLink_features drivers/net/usb/plusb.c:75 [inline] pl_reset+0x2f/0xf0 drivers/net/usb/plusb.c:85 usbnet_open+0xcc/0x5d0 drivers/net/usb/usbnet.c:889 __dev_open+0x297/0x4d0 net/core/dev.c:1417 __dev_change_flags+0x587/0x750 net/core/dev.c:8530 dev_change_flags+0x97/0x170 net/core/dev.c:8602 devinet_ioctl+0x15a2/0x1d70 net/ipv4/devinet.c:1147 inet_ioctl+0x33f/0x380 net/ipv4/af_inet.c:979 sock_do_ioctl+0xcc/0x230 net/socket.c:1169 sock_ioctl+0x1f8/0x680 net/socket.c:1286 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The fix is to call usbnet_write_cmd() instead of usbnet_read_cmd() and remove the USB_DIR_IN flag. Reported-and-tested-by: syzbot+2a0e7abd24f1eb90ce25@syzkaller.appspotmail.com Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Fixes: 090ffa9d0e90 ("[PATCH] USB: usbnet (9/9) module for pl2301/2302 cables") CC: stable@vger.kernel.org Link: https://lore.kernel.org/r/00000000000052099f05f3b3e298@google.com/ Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-06net: microchip: sparx5: fix PTP init/deinit not checking all portsCasper Andersson1-2/+2
Check all ports instead of just port_count ports. PTP init was only checking ports 0 to port_count. If the hardware ports are not mapped starting from 0 then they would be missed, e.g. if only ports 20-30 were mapped it would attempt to init ports 0-10, resulting in NULL pointers when attempting to timestamp. Now it will init all mapped ports. Fixes: 70dfe25cd866 ("net: sparx5: Update extraction/injection for timestamping") Signed-off-by: Casper Andersson <casper.casan@gmail.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-04ionic: missed doorbell workaroundAllen Hubbe6-4/+176
In one version of the HW there is a remote possibility that it will miss the doorbell ring. This adds a bit of protection to be sure we don't stall a queue from a missed doorbell. Fixes: 0f3154e6bcb3 ("ionic: Add Tx and Rx handling") Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-04ionic: clear up notifyq alloc commentaryShannon Nelson1-5/+7
Make sure the q+cq alloc for NotifyQ is clearly documented and don't bother with unnecessary local variables. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-04ionic: clean interrupt before enabling queue to avoid credit raceNeel Patel1-3/+12
Clear the interrupt credits before enabling the queue rather than after to be sure that the enabled queue starts at 0 and that we don't wipe away possible credits after enabling the queue. Fixes: 0f3154e6bcb3 ("ionic: Add Tx and Rx handling") Signed-off-by: Neel Patel <neel.patel@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-04net: phy: meson-gxl: use MMD access dummy stubs for GXL, internal PHYHeiner Kallweit1-0/+2
Jerome provided the information that also the GXL internal PHY doesn't support MMD register access and EEE. MMD reads return 0xffff, what results in e.g. completely wrong ethtool --show-eee output. Therefore use the MMD dummy stubs. Fixes: d853d145ea3e ("net: phy: add an option to disable EEE advertisement") Suggested-by: Jerome Brunet <jbrunet@baylibre.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/84432fe4-0be4-bc82-4e5c-557206b40f56@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-04net: macb: Perform zynqmp dynamic configuration only for SGMII interfaceRadhey Shyam Pandey1-15/+16
In zynqmp platforms where firmware supports dynamic SGMII configuration but has other non-SGMII ethernet devices, it fails them with no packets received at the RX interface. To fix this behaviour perform SGMII dynamic configuration only for the SGMII phy interface. Fixes: 32cee7818111 ("net: macb: Add zynqmp SGMII dynamic configuration support") Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reported-by: Michal Simek <michal.simek@amd.com> Tested-by: Michal Simek <michal.simek@amd.com> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Link: https://lore.kernel.org/r/1675340779-27499-1-git-send-email-radhey.shyam.pandey@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-04bonding: fix error checking in bond_debug_reregister()Qi Zheng1-1/+1
Since commit ff9fb72bc077 ("debugfs: return error values, not NULL") changed return value of debugfs_rename() in error cases from %NULL to %ERR_PTR(-ERROR), we should also check error values instead of NULL. Fixes: ff9fb72bc077 ("debugfs: return error values, not NULL") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20230202093256.32458-1-zhengqi.arch@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-03net: phylink: move phy_device_free() to correctly release phy deviceClément Léger1-3/+2
After calling fwnode_phy_find_device(), the phy device refcount is incremented. Then, when the phy device is attached to a netdev with phy_attach_direct(), the refcount is also incremented but only decremented in the caller if phy_attach_direct() fails. Move phy_device_free() before the "if" to always release it correctly. Indeed, either phy_attach_direct() failed and we don't want to keep a reference to the phydev or it succeeded and a reference has been taken internally. Fixes: 25396f680dd6 ("net: phylink: introduce phylink_fwnode_phy_connect()") Signed-off-by: Clément Léger <clement.leger@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-02mtk_sgmii: enable PCS polling to allow SFP workAlexander Couzens1-0/+1
Currently there is no IRQ handling (even the SGMII supports it). Enable polling to support SFP ports. Fixes: 14a44ab0330d ("net: mtk_eth_soc: partially convert to phylink_pcs") Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Alexander Couzens <lynxis@fe80.eu> [ bmork: changed "1" => "true" ] Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02net: mediatek: sgmii: fix duplex configurationBjørn Mork2-4/+4
The logic of the duplex bit is inverted. Setting it means half duplex, not full duplex. Fix and rename macro to avoid confusion. Fixes: 7e538372694b ("net: ethernet: mediatek: Re-add support SGMII") Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02net: mediatek: sgmii: ensure the SGMII PHY is powered down on configurationAlexander Couzens2-11/+30
The code expect the PHY to be in power down which is only true after reset. Allow changes of the SGMII parameters more than once. Only power down when reconfiguring to avoid bouncing the link when there's no reason to - based on code from Russell King. There are cases when the SGMII_PHYA_PWD register contains 0x9 which prevents SGMII from working. The SGMII still shows link but no traffic can flow. Writing 0x0 to the PHYA_PWD register fix the issue. 0x0 was taken from a good working state of the SGMII interface. Fixes: 42c03844e93d ("net-next: mediatek: add support for MediaTek MT7622 SoC") Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk> Signed-off-by: Alexander Couzens <lynxis@fe80.eu> [ bmork: rebased and squashed into one patch ] Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02Merge tag 'linux-can-fixes-for-6.2-20230202' of ↵Jakub Kicinski1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== can 2023-02-02 The first patch is by Ziyang Xuan and removes a errant WARN_ON_ONCE() in the CAN J1939 protocol. The next 3 patches are by Oliver Hartkopp. The first 2 target the CAN ISO-TP protocol and fix the state machine with respect to signals and a regression found by the syzbot. The last patch is by me an missing assignment during the ethtool ring configuration callback. * tag 'linux-can-fixes-for-6.2-20230202' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: mcp251xfd: mcp251xfd_ring_set_ringparam(): assign missing tx_obj_num_coalesce_irq can: isotp: split tx timer into transmission and timeout can: isotp: handle wait_event_interruptible() return values can: raw: fix CAN FD frame transmissions over CAN XL devices can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate ==================== Link: https://lore.kernel.org/r/20230202094135.2293939-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02net: ethernet: mtk_eth_soc: disable hardware DSA untagging for second MACArınç ÜNAL1-2/+4
According to my tests on MT7621AT and MT7623NI SoCs, hardware DSA untagging won't work on the second MAC. Therefore, disable this feature when the second MAC of the MT7621 and MT7623 SoCs is being used. Fixes: 2d7605a72906 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging") Link: https://lore.kernel.org/netdev/6249fc14-b38a-c770-36b4-5af6d41c21d3@arinc9.com/ Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Link: https://lore.kernel.org/r/20230128094232.2451947-1-arinc.unal@arinc9.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02virtio-net: Keep stop() to follow mirror sequence of open()Parav Pandit1-1/+1
Cited commit in fixes tag frees rxq xdp info while RQ NAPI is still enabled and packet processing may be ongoing. Follow the mirror sequence of open() in the stop() callback. This ensures that when rxq info is unregistered, no rx packet processing is ongoing. Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info") Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Link: https://lore.kernel.org/r/20230202163516.12559-1-parav@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02can: mcp251xfd: mcp251xfd_ring_set_ringparam(): assign missing ↵Marc Kleine-Budde1-0/+1
tx_obj_num_coalesce_irq If the a new ring layout is set, the max coalesced frames for RX and TX are re-calculated, too. Add the missing assignment of the newly calculated TX max coalesced frames. Fixes: 656fc12ddaf8 ("can: mcp251xfd: add TX IRQ coalescing ethtool support") Link: https://lore.kernel.org/all/20230130154334.1578518-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2023-02-02hv_netvsc: Fix missed pagebuf entries in netvsc_dma_map/unmap()Michael Kelley1-7/+2
netvsc_dma_map() and netvsc_dma_unmap() currently check the cp_partial flag and adjust the page_count so that pagebuf entries for the RNDIS portion of the message are skipped when it has already been copied into a send buffer. But this adjustment has already been made by code in netvsc_send(). The duplicate adjustment causes some pagebuf entries to not be mapped. In a normal VM, this doesn't break anything because the mapping doesn’t change the PFN. But in a Confidential VM, dma_map_single() does bounce buffering and provides a different PFN. Failing to do the mapping causes the wrong PFN to be passed to Hyper-V, and various errors ensue. Fix this by removing the duplicate adjustment in netvsc_dma_map() and netvsc_dma_unmap(). Fixes: 846da38de0e8 ("net: netvsc: Add Isolation VM support for netvsc driver") Cc: stable@vger.kernel.org Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://lore.kernel.org/r/1675135986-254490-1-git-send-email-mikelley@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-02octeontx2-af: Fix devlink unregisterRatheesh Kannoth1-8/+27
Exact match feature is only available in CN10K-B. Unregister exact match devlink entry only for this silicon variant. Fixes: 87e4ea29b030 ("octeontx2-af: Debugsfs support for exact match.") Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230131061659.1025137-1-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02igc: return an error if the mac type is unknown in igc_ptp_systim_to_hwtstamp()Tom Rix1-5/+9
clang static analysis reports drivers/net/ethernet/intel/igc/igc_ptp.c:673:3: warning: The left operand of '+' is a garbage value [core.UndefinedBinaryOperatorResult] ktime_add_ns(shhwtstamps.hwtstamp, adjust); ^ ~~~~~~~~~~~~~~~~~~~~ igc_ptp_systim_to_hwtstamp() silently returns without setting the hwtstamp if the mac type is unknown. This should be treated as an error. Fixes: 81b055205e8b ("igc: Add support for RX timestamping") Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230131215437.1528994-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-02nfp: flower: avoid taking mutex in atomic contextYanguo Li1-1/+7
A mutex may sleep, which is not permitted in atomic context. Avoid a case where this may arise by moving the to nfp_flower_lag_get_info_from_netdev() in nfp_tun_write_neigh() spinlock. Fixes: abc210952af7 ("nfp: flower: tunnel neigh support bond offload") Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Yanguo Li <yanguo.li@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230131080313.2076060-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01net: phy: meson-gxl: Add generic dummy stubs for MMD register accessChris Healy1-0/+2
The Meson G12A Internal PHY does not support standard IEEE MMD extended register access, therefore add generic dummy stubs to fail the read and write MMD calls. This is necessary to prevent the core PHY code from erroneously believing that EEE is supported by this PHY even though this PHY does not support EEE, as MMD register access returns all FFFFs. Fixes: 5c3407abb338 ("net: phy: meson-gxl: add g12a support") Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Chris Healy <healych@amazon.com> Reviewed-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20230130231402.471493-1-cphealy@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01net: fman: memac: free mdio device if lynx_pcs_create() failsVladimir Oltean1-0/+3
When memory allocation fails in lynx_pcs_create() and it returns NULL, there remains a dangling reference to the mdiodev returned by of_mdio_find_device() which is leaked as soon as memac_pcs_create() returns empty-handed. Fixes: a7c2a32e7f22 ("net: fman: memac: Use lynx pcs driver") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Sean Anderson <sean.anderson@seco.com> Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Link: https://lore.kernel.org/r/20230130193051.563315-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-31net: ethernet: mtk_eth_soc: Avoid truncating allocationKees Cook2-3/+1
There doesn't appear to be a reason to truncate the allocation used for flow_info, so do a full allocation and remove the unused empty struct. GCC does not like having a reference to an object that has been partially allocated, as bounds checking may become impossible when such an object is passed to other code. Seen with GCC 13: ../drivers/net/ethernet/mediatek/mtk_ppe.c: In function 'mtk_foe_entry_commit_subflow': ../drivers/net/ethernet/mediatek/mtk_ppe.c:623:18: warning: array subscript 'struct mtk_flow_entry[0]' is partly outside array bounds of 'unsigned char[48]' [-Warray-bounds=] 623 | flow_info->l2_data.base_flow = entry; | ^~ Cc: Felix Fietkau <nbd@nbd.name> Cc: John Crispin <john@phrozen.org> Cc: Sean Wang <sean.wang@mediatek.com> Cc: Mark Lee <Mark-MC.Lee@mediatek.com> Cc: Lorenzo Bianconi <lorenzo@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: netdev@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mediatek@lists.infradead.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230127223853.never.014-kees@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>