summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)AuthorFilesLines
2023-06-01net/mlx5: Remove rmap also in case dynamic MSIX not supportedShay Drory1-1/+1
mlx5 add IRQs to rmap upon MSIX request, and mlx5 remove rmap from MSIX only if msi_map.index is populated. However, msi_map.index is populated only when dynamic MSIX is supported. This results in freeing IRQs without removing them from rmap, which triggers the bellow WARN_ON[1]. rmap is a feature which have no relation to dynamic MSIX. Hence, remove the check of msi_map.index when removing IRQ from rmap. [1] [ 200.307160 ] WARNING: CPU: 20 PID: 1702 at kernel/irq/manage.c:2034 free_irq+0x2ac/0x358 [ 200.316990 ] CPU: 20 PID: 1702 Comm: modprobe Not tainted 6.4.0-rc3_for_upstream_min_debug_2023_05_24_14_02 #1 [ 200.318939 ] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [ 200.321659 ] pc : free_irq+0x2ac/0x358 [ 200.322400 ] lr : free_irq+0x20/0x358 [ 200.337865 ] Call trace: [ 200.338360 ] free_irq+0x2ac/0x358 [ 200.339029 ] irq_release+0x58/0xd0 [mlx5_core] [ 200.340093 ] mlx5_irqs_release_vectors+0x80/0xb0 [mlx5_core] [ 200.341344 ] destroy_comp_eqs+0x120/0x170 [mlx5_core] [ 200.342469 ] mlx5_eq_table_destroy+0x1c/0x38 [mlx5_core] [ 200.343645 ] mlx5_unload+0x8c/0xc8 [mlx5_core] [ 200.344652 ] mlx5_uninit_one+0x78/0x118 [mlx5_core] [ 200.345745 ] remove_one+0x80/0x108 [mlx5_core] [ 200.346752 ] pci_device_remove+0x40/0xd8 [ 200.347554 ] device_remove+0x50/0x88 [ 200.348272 ] device_release_driver_internal+0x1c4/0x228 [ 200.349312 ] driver_detach+0x54/0xa0 [ 200.350030 ] bus_remove_driver+0x74/0x100 [ 200.350833 ] driver_unregister+0x34/0x68 [ 200.351619 ] pci_unregister_driver+0x28/0xa0 [ 200.352476 ] mlx5_cleanup+0x14/0x2210 [mlx5_core] [ 200.353536 ] __arm64_sys_delete_module+0x190/0x2e8 [ 200.354495 ] el0_svc_common.constprop.0+0x6c/0x1d0 [ 200.355455 ] do_el0_svc+0x38/0x98 [ 200.356122 ] el0_svc+0x1c/0x80 [ 200.356739 ] el0t_64_sync_handler+0xb4/0x130 [ 200.357604 ] el0t_64_sync+0x174/0x178 [ 200.358345 ] ---[ end trace 0000000000000000 ]--- Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-30net: mana: Fix perf regression: remove rx_cqes, tx_cqes countersHaiyang Zhang2-12/+0
The apc->eth_stats.rx_cqes is one per NIC (vport), and it's on the frequent and parallel code path of all queues. So, r/w into this single shared variable by many threads on different CPUs creates a lot caching and memory overhead, hence perf regression. And, it's not accurate due to the high volume concurrent r/w. For example, a workload is iperf with 128 threads, and with RPS enabled. We saw perf regression of 25% with the previous patch adding the counters. And this patch eliminates the regression. Since the error path of mana_poll_rx_cq() already has warnings, so keeping the counter and convert it to a per-queue variable is not necessary. So, just remove this counter from this high frequency code path. Also, remove the tx_cqes counter for the same reason. We have warnings & other counters for errors on that path, and don't need to count every normal cqe processing. Cc: stable@vger.kernel.org Fixes: bd7fc6e1957c ("net: mana: Add new MANA VF performance counters for easier troubleshooting") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/1685115537-31675-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-30net: usb: qmi_wwan: Set DTR quirk for BroadMobi BM818Sebastian Krzyszkowiak1-1/+1
BM818 is based on Qualcomm MDM9607 chipset. Fixes: 9a07406b00cd ("net: usb: qmi_wwan: Add the BroadMobi BM818 card") Cc: stable@vger.kernel.org Signed-off-by: Sebastian Krzyszkowiak <sebastian.krzyszkowiak@puri.sm> Acked-by: Bjørn Mork <bjorn@mork.no> Link: https://lore.kernel.org/r/20230526-bm818-dtr-v1-1-64bbfa6ba8af@puri.sm Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-26nfcsim.c: Fix error checking for debugfs_create_dirOsama Muhammad1-4/+0
This patch fixes the error checking in nfcsim.c. The DebugFS kernel API is developed in a way that the caller can safely ignore the errors that occur during the creation of DebugFS nodes. Signed-off-by: Osama Muhammad <osmtendev@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-26amd-xgbe: fix the false linkup in xgbe_phy_statusRaju Rangoju1-3/+9
In the event of a change in XGBE mode, the current auto-negotiation needs to be reset and the AN cycle needs to be re-triggerred. However, the current code ignores the return value of xgbe_set_mode(), leading to false information as the link is declared without checking the status register. Fix this by propagating the mode switch status information to xgbe_phy_status(). Fixes: e57f7a3feaef ("amd-xgbe: Prepare for working with more than one type of phy") Co-developed-by: Sudheesh Mavila <sudheesh.mavila@amd.com> Signed-off-by: Sudheesh Mavila <sudheesh.mavila@amd.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-26Merge tag 'mlx5-fixes-2023-05-24' of ↵Jakub Kicinski17-167/+241
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2023-05-24 This series includes bug fixes for the mlx5 driver. * tag 'mlx5-fixes-2023-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: Documentation: net/mlx5: Wrap notes in admonition blocks Documentation: net/mlx5: Add blank line separator before numbered lists Documentation: net/mlx5: Use bullet and definition lists for vnic counters description Documentation: net/mlx5: Wrap vnic reporter devlink commands in code blocks net/mlx5: Fix check for allocation failure in comp_irqs_request_pci() net/mlx5: DR, Add missing mutex init/destroy in pattern manager net/mlx5e: Move Ethernet driver debugfs to profile init callback net/mlx5e: Don't attach netdev profile while handling internal error net/mlx5: Fix post parse infra to only parse every action once net/mlx5e: Use query_special_contexts cmd only once per mdev net/mlx5: fw_tracer, Fix event handling net/mlx5: SF, Drain health before removing device net/mlx5: Drain health before unregistering devlink net/mlx5e: Do not update SBCM when prio2buffer command is invalid net/mlx5e: Consider internal buffers size in port buffer calculations net/mlx5e: Prevent encap offload when neigh update is running net/mlx5e: Extract remaining tunnel encap code to dedicated file ==================== Link: https://lore.kernel.org/r/20230525034847.99268-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-26net: stmmac: fix call trace when stmmac_xdp_xmit() is invokedWei Fang2-2/+7
We encountered a kernel call trace issue which was related to ndo_xdp_xmit callback on our i.MX8MP platform. The reproduce steps show as follows. 1. The FEC port (eth0) connects to a PC port, and the PC uses pktgen_sample03_burst_single_flow.sh to generate packets and send these packets to the FEC port. Notice that the script must be executed before step 2. 2. Run the "./xdp_redirect eth0 eth1" command on i.MX8MP, the eth1 interface is the dwmac. Then there will be a call trace issue soon. Please see the log for more details. The root cause is that the NETDEV_XDP_ACT_NDO_XMIT feature is enabled by default, so when the step 2 command is exexcuted and packets have already been sent to eth0, the stmmac_xdp_xmit() starts running before the stmmac_xdp_set_prog() finishes. To resolve this issue, we disable the NETDEV_XDP_ACT_NDO_XMIT feature by default and turn on/off this feature when the bpf program is installed/uninstalled which just like the other ethernet drivers. Call Trace log: [ 306.311271] ------------[ cut here ]------------ [ 306.315910] WARNING: CPU: 0 PID: 15 at lib/timerqueue.c:55 timerqueue_del+0x68/0x70 [ 306.323590] Modules linked in: [ 306.326654] CPU: 0 PID: 15 Comm: ksoftirqd/0 Not tainted 6.4.0-rc1+ #37 [ 306.333277] Hardware name: NXP i.MX8MPlus EVK board (DT) [ 306.338591] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 306.345561] pc : timerqueue_del+0x68/0x70 [ 306.349577] lr : __remove_hrtimer+0x5c/0xa0 [ 306.353777] sp : ffff80000b7c3920 [ 306.357094] x29: ffff80000b7c3920 x28: 0000000000000000 x27: 0000000000000001 [ 306.364244] x26: ffff80000a763a40 x25: ffff0000d0285a00 x24: 0000000000000001 [ 306.371390] x23: 0000000000000001 x22: ffff000179389a40 x21: 0000000000000000 [ 306.378537] x20: ffff000179389aa0 x19: ffff0000d2951308 x18: 0000000000001000 [ 306.385686] x17: f1d3000000000000 x16: 00000000c39c1000 x15: 55e99bbe00001a00 [ 306.392835] x14: 09000900120aa8c0 x13: e49af1d300000000 x12: 000000000000c39c [ 306.399987] x11: 100055e99bbe0000 x10: ffff8000090b1048 x9 : ffff8000081603fc [ 306.407133] x8 : 000000000000003c x7 : 000000000000003c x6 : 0000000000000001 [ 306.414284] x5 : ffff0000d2950980 x4 : 0000000000000000 x3 : 0000000000000000 [ 306.421432] x2 : 0000000000000001 x1 : ffff0000d2951308 x0 : ffff0000d2951308 [ 306.428585] Call trace: [ 306.431035] timerqueue_del+0x68/0x70 [ 306.434706] __remove_hrtimer+0x5c/0xa0 [ 306.438549] hrtimer_start_range_ns+0x2bc/0x370 [ 306.443089] stmmac_xdp_xmit+0x174/0x1b0 [ 306.447021] bq_xmit_all+0x194/0x4b0 [ 306.450612] __dev_flush+0x4c/0x98 [ 306.454024] xdp_do_flush+0x18/0x38 [ 306.457522] fec_enet_rx_napi+0x6c8/0xc68 [ 306.461539] __napi_poll+0x40/0x220 [ 306.465038] net_rx_action+0xf8/0x240 [ 306.468707] __do_softirq+0x128/0x3a8 [ 306.472378] run_ksoftirqd+0x40/0x58 [ 306.475961] smpboot_thread_fn+0x1c4/0x288 [ 306.480068] kthread+0x124/0x138 [ 306.483305] ret_from_fork+0x10/0x20 [ 306.486889] ---[ end trace 0000000000000000 ]--- Fixes: 66c0e13ad236 ("drivers: net: turn on XDP features") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230524125714.357337-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-26net: mellanox: mlxbf_gige: Fix skb_panic splat under memory pressureThomas Bogendoerfer1-6/+7
Do skb_put() after a new skb has been successfully allocated otherwise the reused skb leads to skb_panics or incorrect packet sizes. Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230524194908.147145-1-tbogendoerfer@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-25Merge tag 'net-6.4-rc4' of ↵Linus Torvalds31-196/+370
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth and bpf. Current release - regressions: - net: fix skb leak in __skb_tstamp_tx() - eth: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs Current release - new code bugs: - handshake: - fix sock->file allocation - fix handshake_dup() ref counting - bluetooth: - fix potential double free caused by hci_conn_unlink - fix UAF in hci_conn_hash_flush Previous releases - regressions: - core: fix stack overflow when LRO is disabled for virtual interfaces - tls: fix strparser rx issues - bpf: - fix many sockmap/TCP related issues - fix a memory leak in the LRU and LRU_PERCPU hash maps - init the offload table earlier - eth: mlx5e: - do as little as possible in napi poll when budget is 0 - fix using eswitch mapping in nic mode - fix deadlock in tc route query code Previous releases - always broken: - udplite: fix NULL pointer dereference in __sk_mem_raise_allocated() - raw: fix output xfrm lookup wrt protocol - smc: reset connection when trying to use SMCRv2 fails - phy: mscc: enable VSC8501/2 RGMII RX clock - eth: octeontx2-pf: fix TSOv6 offload - eth: cdc_ncm: deal with too low values of dwNtbOutMaxSize" * tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits) udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated(). net: phy: mscc: enable VSC8501/2 RGMII RX clock net: phy: mscc: remove unnecessary phydev locking net: phy: mscc: add support for VSC8501 net: phy: mscc: add VSC8502 to MODULE_DEVICE_TABLE net/handshake: Enable the SNI extension to work properly net/handshake: Unpin sock->file if a handshake is cancelled net/handshake: handshake_genl_notify() shouldn't ignore @flags net/handshake: Fix uninitialized local variable net/handshake: Fix handshake_dup() ref counting net/handshake: Remove unneeded check from handshake_dup() ipv6: Fix out-of-bounds access in ipv6_find_tlv() net: ethernet: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs docs: netdev: document the existence of the mail bot net: fix skb leak in __skb_tstamp_tx() r8169: Use a raw_spinlock_t for the register locks. page_pool: fix inconsistency for page_pool_ring_[un]lock() bpf, sockmap: Test progs verifier error with latest clang bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer ...
2023-05-25Merge tag 'for-v6.4-rc' of ↵Linus Torvalds14-112/+132
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply fixes from Sebastian Reichel: - Fix power_supply_get_battery_info for devices without parent devices resulting in NULL pointer dereference - Fix desktop systems reporting to run on battery once a power-supply device with device scope appears (e.g. a HID keyboard with a battery) - Ratelimit debug print about driver not providing data - Fix race condition related to external_power_changed in multiple drivers (ab8500, axp288, bq25890, sc27xx, bq27xxx) - Fix LED trigger switching from blinking to solid-on when charging finishes - Fix multiple races in bq27xxx battery driver - mt6360: handle potential ENOMEM from devm_work_autocancel - sbs-charger: Fix SBS_CHARGER_STATUS_CHARGE_INHIBITED bit - rt9467: avoid passing 0 to dev_err_probe * tag 'for-v6.4-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (21 commits) power: supply: Fix logic checking if system is running from battery power: supply: mt6360: add a check of devm_work_autocancel in mt6360_charger_probe power: supply: sbs-charger: Fix INHIBITED bit for Status reg power: supply: rt9467: Fix passing zero to 'dev_err_probe' power: supply: Ratelimit no data debug output power: supply: Fix power_supply_get_battery_info() if parent is NULL power: supply: bq24190: Call power_supply_changed() after updating input current power: supply: bq25890: Call power_supply_changed() after updating input current or voltage power: supply: bq27xxx: Use mod_delayed_work() instead of cancel() + schedule() power: supply: bq27xxx: After charger plug in/out wait 0.5s for things to stabilize power: supply: bq27xxx: Ensure power_supply_changed() is called on current sign changes power: supply: bq27xxx: Move bq27xxx_battery_update() down power: supply: bq27xxx: Add cache parameter to bq27xxx_battery_current_and_status() power: supply: bq27xxx: Fix poll_interval handling and races on remove power: supply: bq27xxx: Fix I2C IRQ race on remove power: supply: bq27xxx: Fix bq27xxx_battery_update() race condition power: supply: leds: Fix blink to LED on transition power: supply: sc27xx: Fix external_power_changed race power: supply: bq25890: Fix external_power_changed race power: supply: axp288_fuel_gauge: Fix external_power_changed race ...
2023-05-25Merge tag 'platform-drivers-x86-v6.4-3' of ↵Linus Torvalds5-18/+35
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: "Nothing special to report just a few small fixes" * tag 'platform-drivers-x86-v6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86/intel/ifs: Annotate work queue on stack so object debug does not complain platform/x86: ISST: Remove 8 socket limit platform/mellanox: mlxbf-pmc: fix sscanf() error checking platform/x86/amd/pmf: Fix CnQF and auto-mode after resume platform/x86: asus-wmi: Ignore WMI events with codes 0x7B, 0xC0
2023-05-25net: phy: mscc: enable VSC8501/2 RGMII RX clockDavid Epping2-26/+29
By default the VSC8501 and VSC8502 RGMII/GMII/MII RX_CLK output is disabled. To allow packet forwarding towards the MAC it needs to be enabled. For other PHYs supported by this driver the clock output is enabled by default. Fixes: d3169863310d ("net: phy: mscc: add support for VSC8502") Signed-off-by: David Epping <david.epping@missinglinkelectronics.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-25net: phy: mscc: remove unnecessary phydev lockingDavid Epping1-4/+0
Holding the struct phy_device (phydev) lock is unnecessary when accessing phydev->interface in the PHY driver .config_init method, which is the only place that vsc85xx_rgmii_set_skews() is called from. The phy_modify_paged() function implements required MDIO bus level locking, which can not be achieved by a phydev lock. Signed-off-by: David Epping <david.epping@missinglinkelectronics.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-25net: phy: mscc: add support for VSC8501David Epping2-0/+26
The VSC8501 PHY can use the same driver implementation as the VSC8502. Adding the PHY ID and copying the handler functions of VSC8502 is sufficient to operate it. Signed-off-by: David Epping <david.epping@missinglinkelectronics.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-25net: phy: mscc: add VSC8502 to MODULE_DEVICE_TABLEDavid Epping1-0/+1
The mscc driver implements support for VSC8502, so its ID should be in the MODULE_DEVICE_TABLE for automatic loading. Signed-off-by: David Epping <david.epping@missinglinkelectronics.com> Fixes: d3169863310d ("net: phy: mscc: add support for VSC8502") Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-25net/mlx5: Fix check for allocation failure in comp_irqs_request_pci()Dan Carpenter1-1/+1
This function accidentally dereferences "cpus" instead of returning directly. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202305200354.KV3jU94w-lkp@intel.com/ Fixes: b48a0f72bc3e ("net/mlx5: Refactor completion irq request/release code") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-25net/mlx5: DR, Add missing mutex init/destroy in pattern managerYevgeny Kliteynik1-0/+3
Add missing mutex init/destroy as caught by the lock's debug warning: DEBUG_LOCKS_WARN_ON(lock->magic != lock) Fixes: da5d0027d666 ("net/mlx5: DR, Add cache for modify header pattern") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-25net/mlx5e: Move Ethernet driver debugfs to profile init callbackJianbo Liu2-5/+11
As priv->dfs_root is cleared, and therefore missed, when change eswitch mode, move the creation of the root debugfs to the init callback of mlx5e_nic_profile and mlx5e_uplink_rep_profile, and the destruction to the cleanup callback for symmeter. Fixes: 288eca60cc31 ("net/mlx5e: Add Ethernet driver debugfs") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-25net/mlx5e: Don't attach netdev profile while handling internal errorDmytro Linkin1-4/+31
As part of switchdev mode disablement, driver changes port netdevice profile from uplink to nic. If this process is triggered by health recovery flow (PCI reset, for ex.) profile attach would fail because all fw commands aborted when internal error flag is set. As a result, nic netdevice profile is not attached and driver fails to rollback to uplink profile, which leave driver in broken state and cause crash later. To handle broken state do netdevice profile initialization only instead of full attachment and release mdev resources on driver suspend as expected. Actual netdevice attachment is done during driver load. Fixes: c4d7eb57687f ("net/mxl5e: Add change profile method") Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-25net/mlx5: Fix post parse infra to only parse every action onceVlad Buslov3-5/+12
Caller of mlx5e_tc_act_post_parse() needs it to parse only the subset of actions starting after previous split and ending at the current action. However, that range is not provided as arguments and mlx5e_tc_act_post_parse() uses generic flow_action_for_each() that iterates over all flow actions. Not only this is redundant, it also causes a bug when mlx5e_tc_act->post_parse() callback is not idempotent since it will be called for every split. For example, ct action tc_act_post_parse_ct() callback obtains a reference to mlx5_ct_ft instance and calling it several times during parsing stage will cause reference counter imbalance. Fix the issue by providing a proper action range of the current split subset to mlx5e_tc_act_post_parse() and only calling mlx5e_tc_act->post_parse() for actions inside the subset range. Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-25net/mlx5e: Use query_special_contexts cmd only once per mdevDragos Tatulea3-21/+25
Don't query the firmware so many times (num rqs * num wqes * wqe frags) because it slows down linearly the interface creation time when the product is larger. Do it only once per mdev and store the result in mlx5e_param. Due to helper function being called from different files, move it to an appropriate location. Rename the function with a proper prefix and add a small cleanup. This fix applies only for legacy rq. Fixes: 1b1e4868836a ("net/mlx5e: Use query_special_contexts for mkeys") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-25net/mlx5: fw_tracer, Fix event handlingShay Drory1-1/+1
mlx5 driver needs to parse traces with event_id inside the range of first_string_trace and num_string_trace. However, mlx5 is parsing all events with event_id >= first_string_trace. Fix it by checking for the correct range. Fixes: c71ad41ccb0c ("net/mlx5: FW tracer, events handling") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-25net/mlx5: SF, Drain health before removing deviceShay Drory1-0/+1
There is no point in recovery during device removal. Also, if health work started need to wait for it to avoid races and NULL pointer access. Hence, drain health WQ before removing device. Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-25net/mlx5: Drain health before unregistering devlinkShay Drory1-3/+4
mlx5 health mechanism is using devlink APIs, which are using devlink notify APIs. After the cited patch, using devlink notify APIs after devlink is unregistered triggers a WARN_ON(). Hence, drain health WQ before devlink is unregistered. Fixes: cf530217408e ("devlink: Notify users when objects are accessible") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-25net/mlx5e: Do not update SBCM when prio2buffer command is invalidMaher Sanalla1-2/+2
The shared buffer pools configuration which are stored in the SBCM register are updated when the user changes the prio2buffer mapping. However, in case the user desired prio2buffer change is invalid, which can occur due to mapping a lossless priority to a not large enough buffer, the SBCM update should not be performed, as the user command is failed. Thus, Perform the SBCM update only after xoff threshold calculation is performed and the user prio2buffer mapping is validated. Fixes: a440030d8946 ("net/mlx5e: Update shared buffer along with device buffer changes") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-25net/mlx5e: Consider internal buffers size in port buffer calculationsMaher Sanalla3-21/+36
Currently, when a user triggers a change in port buffer headroom (buffers 0-7), the driver checks that the requested headroom does not exceed the total port buffer size. However, this check does not take into account the internal buffers (buffers 8-9), which are also part of the total port buffer. This can result in treating invalid port buffer change requests as valid, causing unintended changes to the shared buffer. To address this, include the internal buffers size in the calculation of available port buffer space which ensures that port buffer requests do not exceed the correct limit. Furthermore, remove internal buffers (8-9) size from the total_size calculation as these buffers are reserved for internal use and are not exposed to the user. While at it, add verbosity to the debug prints in mlx5e_port_query_buffer() function to ease future debugging. Fixes: ecdf2dadee8e ("net/mlx5e: Receive buffer support for DCBX") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-25net/mlx5e: Prevent encap offload when neigh update is runningChris Mi1-17/+20
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad7999b ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-25net/mlx5e: Extract remaining tunnel encap code to dedicated fileChris Mi3-87/+94
Move set_encap_dests() and clean_encap_dests() to the tunnel encap dedicated file. And rename them to mlx5e_tc_tun_encap_dests_set() and mlx5e_tc_tun_encap_dests_unset(). No functional change in this patch. It is needed in the next patch. Signed-off-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-24Merge tag 'spi-fix-v6.4-rc3' of ↵Linus Torvalds3-64/+51
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A collection of fixes that came in since the merge window, plus an update to MAINTAINERS. The Cadence fixes are coming from the addition of device mode support, they required a couple of incremental updates in order to get something that works robustly for both device and controller modes" * tag 'spi-fix-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: spi-cadence: Interleave write of TX and read of RX FIFO spi: dw: Replace spi->chip_select references with function calls spi: MAINTAINERS: drop Krzysztof Kozlowski from Samsung SPI spi: spi-cadence: Only overlap FIFO transactions in slave mode spi: spi-cadence: Avoid read of RX FIFO before its ready spi: spi-geni-qcom: Select FIFO mode for chip select
2023-05-24Merge tag 'regulator-fix-v6.4-rc3' of ↵Linus Torvalds3-6/+9
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "Some fixes that came in since the merge window, nothing terribly exciting - a couple of driver specific fixes and a fix for the error handling when setting up the debugfs for the devices" * tag 'regulator-fix-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: mt6359: add read check for PMIC MT6359 regulator: Fix error checking for debugfs_create_dir regulator: pca9450: Fix BUCK2 enable_mask
2023-05-24Merge tag 'mmc-v6.4-rc1' of ↵Linus Torvalds3-11/+20
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - Fix error propagation for the non-block-device I/O paths MMC host: - sdhci-cadence: Fix an error path during probe - sdhci-esdhc-imx: Fix support for the 'no-mmc-hs400' DT property" * tag 'mmc-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-esdhc-imx: make "no-mmc-hs400" works mmc: sdhci-cadence: Fix an error handling path in sdhci_cdns_probe() mmc: block: ensure error propagation for non-blk
2023-05-24Merge tag 'mlx5-fixes-2023-05-22' of ↵David S. Miller16-58/+173
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-fixes-2023-05-22 This series provides bug fixes for the mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24net: ethernet: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCsArınç ÜNAL1-6/+2
The commit c6d96df9fa2c ("net: ethernet: mtk_eth_soc: drop generic vlan rx offload, only use DSA untagging") makes VLAN RX offloading to be only used on the SoCs without the MTK_NETSYS_V2 ability (which are not just MT7621 and MT7622). The commit disables the proper handling of special tagged (DSA) frames, added with commit 87e3df4961f4 ("net-next: ethernet: mediatek: add CDM able to recognize the tag for DSA"), for non MTK_NETSYS_V2 SoCs when it finds a MAC that does not use DSA. So if the other MAC uses DSA, the CDMQ component transmits DSA tagged frames to the CPU improperly. This issue can be observed on frames with TCP, for example, a TCP speed test using iperf3 won't work. The commit disables the proper handling of special tagged (DSA) frames because it assumes that these SoCs don't use more than one MAC, which is wrong. Although I made Frank address this false assumption on the patch log when they sent the patch on behalf of Felix, the code still made changes with this assumption. Therefore, the proper handling of special tagged (DSA) frames must be kept enabled in all circumstances as it doesn't affect non DSA tagged frames. Hardware DSA untagging, introduced with the commit 2d7605a72906 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging"), and VLAN RX offloading are operations on the two CDM components of the frame engine, CDMP and CDMQ, which connect to Packet DMA (PDMA) and QoS DMA (QDMA) and are between the MACs and the CPU. These operations apply to all MACs of the SoC so if one MAC uses DSA and the other doesn't, the hardware DSA untagging operation will cause the CDMP component to transmit non DSA tagged frames to the CPU improperly. Since the VLAN RX offloading feature configuration was dropped, VLAN RX offloading can only be used along with hardware DSA untagging. So, for the case above, we need to disable both features and leave it to the CPU, therefore software, to untag the DSA and VLAN tags. So the correct way to handle this is: For all SoCs: Enable the proper handling of special tagged (DSA) frames (MTK_CDMQ_IG_CTRL). For non MTK_NETSYS_V2 SoCs: Enable hardware DSA untagging (MTK_CDMP_IG_CTRL). Enable VLAN RX offloading (MTK_CDMP_EG_CTRL). When a non MTK_NETSYS_V2 SoC MAC does not use DSA: Disable hardware DSA untagging (MTK_CDMP_IG_CTRL). Disable VLAN RX offloading (MTK_CDMP_EG_CTRL). Fixes: c6d96df9fa2c ("net: ethernet: mtk_eth_soc: drop generic vlan rx offload, only use DSA untagging") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-05-24r8169: Use a raw_spinlock_t for the register locks.Sebastian Andrzej Siewior1-22/+22
The driver's interrupt service routine is requested with the IRQF_NO_THREAD if MSI is available. This means that the routine is invoked in hardirq context even on PREEMPT_RT. The routine itself is relatively short and schedules a worker, performs register access and schedules NAPI. On PREEMPT_RT, scheduling NAPI from hardirq results in waking ksoftirqd for further processing so using NAPI threads with this driver is highly recommended since it NULL routes the threaded-IRQ efforts. Adding rtl_hw_aspm_clkreq_enable() to the ISR is problematic on PREEMPT_RT because the function uses spinlock_t locks which become sleeping locks on PREEMPT_RT. The locks are only used to protect register access and don't nest into other functions or locks. They are also not used for unbounded period of time. Therefore it looks okay to convert them to raw_spinlock_t. Convert the three locks which are used from the interrupt service routine to raw_spinlock_t. Fixes: e1ed3e4d9111 ("r8169: disable ASPM during NAPI poll") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/20230522134121.uxjax0F5@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24tpm: tpm_tis: Disable interrupts for AEON UPX-i11Peter Ujfalusi1-0/+7
Interrupts got recently enabled for tpm_tis. The interrupts initially works on the device but they will stop arriving after circa ~200 interrupts. On system reboot/shutdown this will cause a long wait (120000 jiffies). [jarkko@kernel.org: fix a merge conflict and adjust the commit message] Fixes: e644b2f498d2 ("tpm, tpm_tis: Enable interrupt test") Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-05-23lan966x: Fix unloading/loading of the driverHoratiu Vultur1-0/+10
It was noticing that after a while when unloading/loading the driver and sending traffic through the switch, it would stop working. It would stop forwarding any traffic and the only way to get out of this was to do a power cycle of the board. The root cause seems to be that the switch core is initialized twice. Apparently initializing twice the switch core disturbs the pointers in the queue systems in the HW, so after a while it would stop sending the traffic. Unfortunetly, it is not possible to use a reset of the switch here, because the reset line is connected to multiple devices like MDIO, SGPIO, FAN, etc. So then all the devices will get reseted when the network driver will be loaded. So the fix is to check if the core is initialized already and if that is the case don't initialize it again. Fixes: db8bcaad5393 ("net: lan966x: add the basic lan966x driver") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230522120038.3749026-1-horatiu.vultur@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-23platform/x86/intel/ifs: Annotate work queue on stack so object debug does ↵David Arcari1-1/+1
not complain Object Debug results in the following warning while attempting to load ifs firmware: [ 220.007422] ODEBUG: object 000000003bf952db is on stack 00000000e843994b, but NOT annotated. [ 220.007459] ------------[ cut here ]------------ [ 220.007461] WARNING: CPU: 0 PID: 11774 at lib/debugobjects.c:548 __debug_object_init.cold+0x22e/0x2d5 [ 220.137476] RIP: 0010:__debug_object_init.cold+0x22e/0x2d5 [ 220.254774] Call Trace: [ 220.257641] <TASK> [ 220.265606] scan_chunks_sanity_check+0x368/0x5f0 [intel_ifs] [ 220.288292] ifs_load_firmware+0x2a3/0x400 [intel_ifs] [ 220.332793] current_batch_store+0xea/0x160 [intel_ifs] [ 220.357947] kernfs_fop_write_iter+0x355/0x530 [ 220.363048] new_sync_write+0x28e/0x4a0 [ 220.381226] vfs_write+0x62a/0x920 [ 220.385160] ksys_write+0xf9/0x1d0 [ 220.399421] do_syscall_64+0x59/0x90 [ 220.440635] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 220.566845] ---[ end trace 3a01b299db142b41 ]--- Correct this by calling INIT_WORK_ONSTACK instead of INIT_WORK. Fixes: 684ec215706d ("platform/x86/intel/ifs: Authenticate and copy to secured memory") Signed-off-by: David Arcari <darcari@redhat.com> Cc: Jithu Joseph <jithu.joseph@intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Mark Gross <markgross@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230523105400.674152-1-darcari@redhat.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-05-23platform/x86: ISST: Remove 8 socket limitSteve Wahl1-7/+5
Stop restricting the PCI search to a range of PCI domains fed to pci_get_domain_bus_and_slot(). Instead, use for_each_pci_dev() and look at all PCI domains in one pass. On systems with more than 8 sockets, this avoids error messages like "Information: Invalid level, Can't get TDP control information at specified levels on cpu 480" from the intel speed select utility. Fixes: aa2ddd242572 ("platform/x86: ISST: Use numa node id for cpu pci dev mapping") Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20230519160420.2588475-1-steve.wahl@hpe.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-05-23net/mlx5: Fix indexing of mlx5_irqShay Drory1-4/+5
After the cited patch, mlx5_irq xarray index can be different then mlx5_irq MSIX table index. Fix it by storing both mlx5_irq xarray index and MSIX table index. Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-23net/mlx5: Fix irq affinity managementShay Drory1-1/+1
The cited patch deny the user of changing the affinity of mlx5 irqs, which break backward compatibility. Hence, allow the user to change the affinity of mlx5 irqs. Fixes: bbac70c74183 ("net/mlx5: Use newer affinity descriptor") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-23net/mlx5: Free irqs only on shutdown callbackShay Drory3-1/+31
Whenever a shutdown is invoked, free irqs only and keep mlx5_irq synthetic wrapper intact in order to avoid use-after-free on system shutdown. for example: ================================================================== BUG: KASAN: use-after-free in _find_first_bit+0x66/0x80 Read of size 8 at addr ffff88823fc0d318 by task kworker/u192:0/13608 CPU: 25 PID: 13608 Comm: kworker/u192:0 Tainted: G B W O 6.1.21-cloudflare-kasan-2023.3.21 #1 Hardware name: GIGABYTE R162-R2-GEN0/MZ12-HD2-CD, BIOS R14 05/03/2021 Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x34/0x48 print_report+0x170/0x473 ? _find_first_bit+0x66/0x80 kasan_report+0xad/0x130 ? _find_first_bit+0x66/0x80 _find_first_bit+0x66/0x80 mlx5e_open_channels+0x3c5/0x3a10 [mlx5_core] ? console_unlock+0x2fa/0x430 ? _raw_spin_lock_irqsave+0x8d/0xf0 ? _raw_spin_unlock_irqrestore+0x42/0x80 ? preempt_count_add+0x7d/0x150 ? __wake_up_klogd.part.0+0x7d/0xc0 ? vprintk_emit+0xfe/0x2c0 ? mlx5e_trigger_napi_sched+0x40/0x40 [mlx5_core] ? dev_attr_show.cold+0x35/0x35 ? devlink_health_do_dump.part.0+0x174/0x340 ? devlink_health_report+0x504/0x810 ? mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core] ? mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core] ? process_one_work+0x680/0x1050 mlx5e_safe_switch_params+0x156/0x220 [mlx5_core] ? mlx5e_switch_priv_channels+0x310/0x310 [mlx5_core] ? mlx5_eq_poll_irq_disabled+0xb6/0x100 [mlx5_core] mlx5e_tx_reporter_timeout_recover+0x123/0x240 [mlx5_core] ? __mutex_unlock_slowpath.constprop.0+0x2b0/0x2b0 devlink_health_reporter_recover+0xa6/0x1f0 devlink_health_report+0x2f7/0x810 ? vsnprintf+0x854/0x15e0 mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core] ? mlx5e_reporter_tx_err_cqe+0x1a0/0x1a0 [mlx5_core] ? mlx5e_tx_reporter_timeout_dump+0x50/0x50 [mlx5_core] ? mlx5e_tx_reporter_dump_sq+0x260/0x260 [mlx5_core] ? newidle_balance+0x9b7/0xe30 ? psi_group_change+0x6a7/0xb80 ? mutex_lock+0x96/0xf0 ? __mutex_lock_slowpath+0x10/0x10 mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core] process_one_work+0x680/0x1050 worker_thread+0x5a0/0xeb0 ? process_one_work+0x1050/0x1050 kthread+0x2a2/0x340 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Freed by task 1: kasan_save_stack+0x23/0x50 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 ____kasan_slab_free+0x169/0x1d0 slab_free_freelist_hook+0xd2/0x190 __kmem_cache_free+0x1a1/0x2f0 irq_pool_free+0x138/0x200 [mlx5_core] mlx5_irq_table_destroy+0xf6/0x170 [mlx5_core] mlx5_core_eq_free_irqs+0x74/0xf0 [mlx5_core] shutdown+0x194/0x1aa [mlx5_core] pci_device_shutdown+0x75/0x120 device_shutdown+0x35c/0x620 kernel_restart+0x60/0xa0 __do_sys_reboot+0x1cb/0x2c0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x4b/0xb5 The buggy address belongs to the object at ffff88823fc0d300 which belongs to the cache kmalloc-192 of size 192 The buggy address is located 24 bytes inside of 192-byte region [ffff88823fc0d300, ffff88823fc0d3c0) The buggy address belongs to the physical page: page:0000000010139587 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x23fc0c head:0000000010139587 order:1 compound_mapcount:0 compound_pincount:0 flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004ca00 raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88823fc0d200: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88823fc0d280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >ffff88823fc0d300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88823fc0d380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff88823fc0d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== general protection fault, probably for non-canonical address 0xdffffc005c40d7ac: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: probably user-memory-access in range [0x00000002e206bd60-0x00000002e206bd67] CPU: 25 PID: 13608 Comm: kworker/u192:0 Tainted: G B W O 6.1.21-cloudflare-kasan-2023.3.21 #1 Hardware name: GIGABYTE R162-R2-GEN0/MZ12-HD2-CD, BIOS R14 05/03/2021 Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core] RIP: 0010:__alloc_pages+0x141/0x5c0 Call Trace: <TASK> ? sysvec_apic_timer_interrupt+0xa0/0xc0 ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? __alloc_pages_slowpath.constprop.0+0x1ec0/0x1ec0 ? _raw_spin_unlock_irqrestore+0x3d/0x80 __kmalloc_large_node+0x80/0x120 ? kvmalloc_node+0x4e/0x170 __kmalloc_node+0xd4/0x150 kvmalloc_node+0x4e/0x170 mlx5e_open_channels+0x631/0x3a10 [mlx5_core] ? console_unlock+0x2fa/0x430 ? _raw_spin_lock_irqsave+0x8d/0xf0 ? _raw_spin_unlock_irqrestore+0x42/0x80 ? preempt_count_add+0x7d/0x150 ? __wake_up_klogd.part.0+0x7d/0xc0 ? vprintk_emit+0xfe/0x2c0 ? mlx5e_trigger_napi_sched+0x40/0x40 [mlx5_core] ? dev_attr_show.cold+0x35/0x35 ? devlink_health_do_dump.part.0+0x174/0x340 ? devlink_health_report+0x504/0x810 ? mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core] ? mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core] ? process_one_work+0x680/0x1050 mlx5e_safe_switch_params+0x156/0x220 [mlx5_core] ? mlx5e_switch_priv_channels+0x310/0x310 [mlx5_core] ? mlx5_eq_poll_irq_disabled+0xb6/0x100 [mlx5_core] mlx5e_tx_reporter_timeout_recover+0x123/0x240 [mlx5_core] ? __mutex_unlock_slowpath.constprop.0+0x2b0/0x2b0 devlink_health_reporter_recover+0xa6/0x1f0 devlink_health_report+0x2f7/0x810 ? vsnprintf+0x854/0x15e0 mlx5e_reporter_tx_timeout+0x29d/0x3a0 [mlx5_core] ? mlx5e_reporter_tx_err_cqe+0x1a0/0x1a0 [mlx5_core] ? mlx5e_tx_reporter_timeout_dump+0x50/0x50 [mlx5_core] ? mlx5e_tx_reporter_dump_sq+0x260/0x260 [mlx5_core] ? newidle_balance+0x9b7/0xe30 ? psi_group_change+0x6a7/0xb80 ? mutex_lock+0x96/0xf0 ? __mutex_lock_slowpath+0x10/0x10 mlx5e_tx_timeout_work+0x17c/0x230 [mlx5_core] process_one_work+0x680/0x1050 worker_thread+0x5a0/0xeb0 ? process_one_work+0x1050/0x1050 kthread+0x2a2/0x340 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> ---[ end trace 0000000000000000 ]--- RIP: 0010:__alloc_pages+0x141/0x5c0 Code: e0 39 a3 96 89 e9 b8 22 01 32 01 83 e1 0f 48 89 fa 01 c9 48 c1 ea 03 d3 f8 83 e0 03 89 44 24 6c 48 b8 00 00 00 00 00 fc ff df <80> 3c 02 00 0f 85 fc 03 00 00 89 e8 4a 8b 14 f5 e0 39 a3 96 4c 89 RSP: 0018:ffff888251f0f438 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 1ffff1104a3e1e8b RCX: 0000000000000000 RDX: 000000005c40d7ac RSI: 0000000000000003 RDI: 00000002e206bd60 RBP: 0000000000052dc0 R08: ffff8882b0044218 R09: ffff8882b0045e8a R10: fffffbfff300fefc R11: ffff888167af4000 R12: 0000000000000003 R13: 0000000000000000 R14: 00000000696c7070 R15: ffff8882373f4380 FS: 0000000000000000(0000) GS:ffff88bf2be80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005641d031eee8 CR3: 0000002e7ca14000 CR4: 0000000000350ee0 Kernel panic - not syncing: Fatal exception Kernel Offset: 0x11000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception ]---] Reported-by: Frederick Lawler <fred@cloudflare.com> Link: https://lore.kernel.org/netdev/be5b9271-7507-19c5-ded1-fa78f1980e69@cloudflare.com Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-23net/mlx5: Devcom, serialize devcom registrationShay Drory1-5/+14
From one hand, mlx5 driver is allowing to probe PFs in parallel. From the other hand, devcom, which is a share resource between PFs, is registered without any lock. This might resulted in memory problems. Hence, use the global mlx5_dev_list_lock in order to serialize devcom registration. Fixes: fadd59fc50d0 ("net/mlx5: Introduce inter-device communication mechanism") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-23net/mlx5: Devcom, fix error flow in mlx5_devcom_register_deviceShay Drory1-1/+2
In case devcom allocation is failed, mlx5 is always freeing the priv. However, this priv might have been allocated by a different thread, and freeing it might lead to use-after-free bugs. Fix it by freeing the priv only in case it was allocated by the running thread. Fixes: fadd59fc50d0 ("net/mlx5: Introduce inter-device communication mechanism") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-23net/mlx5: E-switch, Devcom, sync devcom events and devcom comp registerShay Drory2-1/+9
devcom events are sent to all registered component. Following the cited patch, it is possible for two components, e.g.: two eswitches, to send devcom events, while both components are registered. This means eswitch layer will do double un/pairing, which is double allocation and free of resources, even though only one un/pairing is needed. flow example: cpu0 cpu1 ---- ---- mlx5_devlink_eswitch_mode_set(dev0) esw_offloads_devcom_init() mlx5_devcom_register_component(esw0) mlx5_devlink_eswitch_mode_set(dev1) esw_offloads_devcom_init() mlx5_devcom_register_component(esw1) mlx5_devcom_send_event() mlx5_devcom_send_event() Hence, check whether the eswitches are already un/paired before free/allocation of resources. Fixes: 09b278462f16 ("net: devlink: enable parallel ops on netlink interface") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-23net/mlx5e: TC, Fix using eswitch mapping in nic modePaul Blakey1-7/+27
Cited patch is using the eswitch object mapping pool while in nic mode where it isn't initialized. This results in the trace below [0]. Fix that by using either nic or eswitch object mapping pool depending if eswitch is enabled or not. [0]: [ 826.446057] ================================================================== [ 826.446729] BUG: KASAN: slab-use-after-free in mlx5_add_flow_rules+0x30/0x490 [mlx5_core] [ 826.447515] Read of size 8 at addr ffff888194485830 by task tc/6233 [ 826.448243] CPU: 16 PID: 6233 Comm: tc Tainted: G W 6.3.0-rc6+ #1 [ 826.448890] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 826.449785] Call Trace: [ 826.450052] <TASK> [ 826.450302] dump_stack_lvl+0x33/0x50 [ 826.450650] print_report+0xc2/0x610 [ 826.450998] ? __virt_addr_valid+0xb1/0x130 [ 826.451385] ? mlx5_add_flow_rules+0x30/0x490 [mlx5_core] [ 826.451935] kasan_report+0xae/0xe0 [ 826.452276] ? mlx5_add_flow_rules+0x30/0x490 [mlx5_core] [ 826.452829] mlx5_add_flow_rules+0x30/0x490 [mlx5_core] [ 826.453368] ? __kmalloc_node+0x5a/0x120 [ 826.453733] esw_add_restore_rule+0x20f/0x270 [mlx5_core] [ 826.454288] ? mlx5_eswitch_add_send_to_vport_meta_rule+0x260/0x260 [mlx5_core] [ 826.455011] ? mutex_unlock+0x80/0xd0 [ 826.455361] ? __mutex_unlock_slowpath.constprop.0+0x210/0x210 [ 826.455862] ? mapping_add+0x2cb/0x440 [mlx5_core] [ 826.456425] mlx5e_tc_action_miss_mapping_get+0x139/0x180 [mlx5_core] [ 826.457058] ? mlx5e_tc_update_skb_nic+0xb0/0xb0 [mlx5_core] [ 826.457636] ? __kasan_kmalloc+0x77/0x90 [ 826.458000] ? __kmalloc+0x57/0x120 [ 826.458336] mlx5_tc_ct_flow_offload+0x325/0xe40 [mlx5_core] [ 826.458916] ? ct_kernel_enter.constprop.0+0x48/0xa0 [ 826.459360] ? mlx5_tc_ct_parse_action+0xf0/0xf0 [mlx5_core] [ 826.459933] ? mlx5e_mod_hdr_attach+0x491/0x520 [mlx5_core] [ 826.460507] ? mlx5e_mod_hdr_get+0x12/0x20 [mlx5_core] [ 826.461046] ? mlx5e_tc_attach_mod_hdr+0x154/0x170 [mlx5_core] [ 826.461635] mlx5e_configure_flower+0x969/0x2110 [mlx5_core] [ 826.462217] ? _raw_spin_lock_bh+0x85/0xe0 [ 826.462597] ? __mlx5e_add_fdb_flow+0x750/0x750 [mlx5_core] [ 826.463163] ? kasan_save_stack+0x2e/0x40 [ 826.463534] ? down_read+0x115/0x1b0 [ 826.463878] ? down_write_killable+0x110/0x110 [ 826.464288] ? tc_setup_action.part.0+0x9f/0x3b0 [ 826.464701] ? mlx5e_is_uplink_rep+0x4c/0x90 [mlx5_core] [ 826.465253] ? mlx5e_tc_reoffload_flows_work+0x130/0x130 [mlx5_core] [ 826.465878] tc_setup_cb_add+0x112/0x250 [ 826.466247] fl_hw_replace_filter+0x230/0x310 [cls_flower] [ 826.466724] ? fl_hw_destroy_filter+0x1a0/0x1a0 [cls_flower] [ 826.467212] fl_change+0x14e1/0x2030 [cls_flower] [ 826.467636] ? sock_def_readable+0x89/0x120 [ 826.468019] ? fl_tmplt_create+0x2d0/0x2d0 [cls_flower] [ 826.468509] ? kasan_unpoison+0x23/0x50 [ 826.468873] ? get_random_u16+0x180/0x180 [ 826.469244] ? __radix_tree_lookup+0x2b/0x130 [ 826.469640] ? fl_get+0x7b/0x140 [cls_flower] [ 826.470042] ? fl_mask_put+0x200/0x200 [cls_flower] [ 826.470478] ? __mutex_unlock_slowpath.constprop.0+0x210/0x210 [ 826.470973] ? fl_tmplt_create+0x2d0/0x2d0 [cls_flower] [ 826.471427] tc_new_tfilter+0x644/0x1050 [ 826.471795] ? tc_get_tfilter+0x860/0x860 [ 826.472170] ? __thaw_task+0x130/0x130 [ 826.472525] ? arch_stack_walk+0x98/0xf0 [ 826.472892] ? cap_capable+0x9f/0xd0 [ 826.473235] ? security_capable+0x47/0x60 [ 826.473608] rtnetlink_rcv_msg+0x1d5/0x550 [ 826.473985] ? rtnl_calcit.isra.0+0x1f0/0x1f0 [ 826.474383] ? __stack_depot_save+0x35/0x4c0 [ 826.474779] ? kasan_save_stack+0x2e/0x40 [ 826.475149] ? kasan_save_stack+0x1e/0x40 [ 826.475518] ? __kasan_record_aux_stack+0x9f/0xb0 [ 826.475939] ? task_work_add+0x77/0x1c0 [ 826.476305] netlink_rcv_skb+0xe0/0x210 [ 826.476661] ? rtnl_calcit.isra.0+0x1f0/0x1f0 [ 826.477057] ? netlink_ack+0x7c0/0x7c0 [ 826.477412] ? rhashtable_jhash2+0xef/0x150 [ 826.477796] ? _copy_from_iter+0x105/0x770 [ 826.484386] netlink_unicast+0x346/0x490 [ 826.484755] ? netlink_attachskb+0x400/0x400 [ 826.485145] ? kernel_text_address+0xc2/0xd0 [ 826.485535] netlink_sendmsg+0x3b0/0x6c0 [ 826.485902] ? kernel_text_address+0xc2/0xd0 [ 826.486296] ? netlink_unicast+0x490/0x490 [ 826.486671] ? iovec_from_user.part.0+0x7a/0x1a0 [ 826.487083] ? netlink_unicast+0x490/0x490 [ 826.487461] sock_sendmsg+0x73/0xc0 [ 826.487803] ____sys_sendmsg+0x364/0x380 [ 826.488186] ? import_iovec+0x7/0x10 [ 826.488531] ? kernel_sendmsg+0x30/0x30 [ 826.488893] ? __copy_msghdr+0x180/0x180 [ 826.489258] ? kasan_save_stack+0x2e/0x40 [ 826.489629] ? kasan_save_stack+0x1e/0x40 [ 826.490002] ? __kasan_record_aux_stack+0x9f/0xb0 [ 826.490424] ? __call_rcu_common.constprop.0+0x46/0x580 [ 826.490876] ___sys_sendmsg+0xdf/0x140 [ 826.491231] ? copy_msghdr_from_user+0x110/0x110 [ 826.491649] ? fget_raw+0x120/0x120 [ 826.491988] ? ___sys_recvmsg+0xd9/0x130 [ 826.492355] ? folio_batch_add_and_move+0x80/0xa0 [ 826.492776] ? _raw_spin_lock+0x7a/0xd0 [ 826.493137] ? _raw_spin_lock+0x7a/0xd0 [ 826.493500] ? _raw_read_lock_irq+0x30/0x30 [ 826.493880] ? kasan_set_track+0x21/0x30 [ 826.494249] ? kasan_save_free_info+0x2a/0x40 [ 826.494650] ? do_sys_openat2+0xff/0x270 [ 826.495016] ? __fget_light+0x1b5/0x200 [ 826.495377] ? __virt_addr_valid+0xb1/0x130 [ 826.495763] __sys_sendmsg+0xb2/0x130 [ 826.496118] ? __sys_sendmsg_sock+0x20/0x20 [ 826.496501] ? __x64_sys_rseq+0x2e0/0x2e0 [ 826.496874] ? do_user_addr_fault+0x276/0x820 [ 826.497273] ? fpregs_assert_state_consistent+0x52/0x60 [ 826.497727] ? exit_to_user_mode_prepare+0x30/0x120 [ 826.498158] do_syscall_64+0x3d/0x90 [ 826.498502] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 826.498949] RIP: 0033:0x7f9b67f4f887 [ 826.499294] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 826.500742] RSP: 002b:00007fff5d1a5498 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 826.501395] RAX: ffffffffffffffda RBX: 0000000064413ce6 RCX: 00007f9b67f4f887 [ 826.501975] RDX: 0000000000000000 RSI: 00007fff5d1a5500 RDI: 0000000000000003 [ 826.502556] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001 [ 826.503135] R10: 00007f9b67e08708 R11: 0000000000000246 R12: 0000000000000001 [ 826.503714] R13: 0000000000000001 R14: 00007fff5d1a9800 R15: 0000000000485400 [ 826.504304] </TASK> [ 826.504753] Allocated by task 3764: [ 826.505090] kasan_save_stack+0x1e/0x40 [ 826.505453] kasan_set_track+0x21/0x30 [ 826.505810] __kasan_kmalloc+0x77/0x90 [ 826.506164] __mlx5_create_flow_table+0x16d/0xbb0 [mlx5_core] [ 826.506742] esw_offloads_enable+0x60d/0xfb0 [mlx5_core] [ 826.507292] mlx5_eswitch_enable_locked+0x4d3/0x680 [mlx5_core] [ 826.507885] mlx5_devlink_eswitch_mode_set+0x2a3/0x580 [mlx5_core] [ 826.508513] devlink_nl_cmd_eswitch_set_doit+0xdf/0x1f0 [ 826.508969] genl_family_rcv_msg_doit.isra.0+0x146/0x1c0 [ 826.509427] genl_rcv_msg+0x28d/0x3e0 [ 826.509772] netlink_rcv_skb+0xe0/0x210 [ 826.510133] genl_rcv+0x24/0x40 [ 826.510448] netlink_unicast+0x346/0x490 [ 826.510810] netlink_sendmsg+0x3b0/0x6c0 [ 826.511179] sock_sendmsg+0x73/0xc0 [ 826.511519] __sys_sendto+0x18d/0x220 [ 826.511867] __x64_sys_sendto+0x72/0x80 [ 826.512232] do_syscall_64+0x3d/0x90 [ 826.512576] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 826.513220] Freed by task 5674: [ 826.513535] kasan_save_stack+0x1e/0x40 [ 826.513893] kasan_set_track+0x21/0x30 [ 826.514245] kasan_save_free_info+0x2a/0x40 [ 826.514629] ____kasan_slab_free+0x11a/0x1b0 [ 826.515021] __kmem_cache_free+0x14d/0x280 [ 826.515399] tree_put_node+0x109/0x1c0 [mlx5_core] [ 826.515907] mlx5_destroy_flow_table+0x119/0x630 [mlx5_core] [ 826.516481] esw_offloads_steering_cleanup+0xe7/0x150 [mlx5_core] [ 826.517084] esw_offloads_disable+0xe0/0x160 [mlx5_core] [ 826.517632] mlx5_eswitch_disable_locked+0x26c/0x290 [mlx5_core] [ 826.518225] mlx5_devlink_eswitch_mode_set+0x128/0x580 [mlx5_core] [ 826.518834] devlink_nl_cmd_eswitch_set_doit+0xdf/0x1f0 [ 826.519286] genl_family_rcv_msg_doit.isra.0+0x146/0x1c0 [ 826.519748] genl_rcv_msg+0x28d/0x3e0 [ 826.520101] netlink_rcv_skb+0xe0/0x210 [ 826.520458] genl_rcv+0x24/0x40 [ 826.520771] netlink_unicast+0x346/0x490 [ 826.521137] netlink_sendmsg+0x3b0/0x6c0 [ 826.521505] sock_sendmsg+0x73/0xc0 [ 826.521842] __sys_sendto+0x18d/0x220 [ 826.522191] __x64_sys_sendto+0x72/0x80 [ 826.522554] do_syscall_64+0x3d/0x90 [ 826.522894] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 826.523540] Last potentially related work creation: [ 826.523969] kasan_save_stack+0x1e/0x40 [ 826.524331] __kasan_record_aux_stack+0x9f/0xb0 [ 826.524739] insert_work+0x30/0x130 [ 826.525078] __queue_work+0x34b/0x690 [ 826.525426] queue_work_on+0x48/0x50 [ 826.525766] __rhashtable_remove_fast_one+0x4af/0x4d0 [mlx5_core] [ 826.526365] del_sw_flow_group+0x1b5/0x270 [mlx5_core] [ 826.526898] tree_put_node+0x109/0x1c0 [mlx5_core] [ 826.527407] esw_offloads_steering_cleanup+0xd3/0x150 [mlx5_core] [ 826.528009] esw_offloads_disable+0xe0/0x160 [mlx5_core] [ 826.528616] mlx5_eswitch_disable_locked+0x26c/0x290 [mlx5_core] [ 826.529218] mlx5_devlink_eswitch_mode_set+0x128/0x580 [mlx5_core] [ 826.529823] devlink_nl_cmd_eswitch_set_doit+0xdf/0x1f0 [ 826.530276] genl_family_rcv_msg_doit.isra.0+0x146/0x1c0 [ 826.530733] genl_rcv_msg+0x28d/0x3e0 [ 826.531079] netlink_rcv_skb+0xe0/0x210 [ 826.531439] genl_rcv+0x24/0x40 [ 826.531755] netlink_unicast+0x346/0x490 [ 826.532123] netlink_sendmsg+0x3b0/0x6c0 [ 826.532487] sock_sendmsg+0x73/0xc0 [ 826.532825] __sys_sendto+0x18d/0x220 [ 826.533175] __x64_sys_sendto+0x72/0x80 [ 826.533533] do_syscall_64+0x3d/0x90 [ 826.533877] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 826.534521] The buggy address belongs to the object at ffff888194485800 which belongs to the cache kmalloc-512 of size 512 [ 826.535506] The buggy address is located 48 bytes inside of freed 512-byte region [ffff888194485800, ffff888194485a00) [ 826.536666] The buggy address belongs to the physical page: [ 826.537138] page:00000000d75841dd refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x194480 [ 826.537915] head:00000000d75841dd order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 [ 826.538595] flags: 0x200000000010200(slab|head|node=0|zone=2) [ 826.539089] raw: 0200000000010200 ffff888100042c80 ffffea0004523800 dead000000000002 [ 826.539755] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 [ 826.540417] page dumped because: kasan: bad access detected [ 826.541095] Memory state around the buggy address: [ 826.541519] ffff888194485700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 826.542149] ffff888194485780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 826.542773] >ffff888194485800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 826.543400] ^ [ 826.543822] ffff888194485880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 826.544452] ffff888194485900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 826.545079] ================================================================== Fixes: 6702782845a5 ("net/mlx5e: TC, Set CT miss to the specific ct action instance") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-23net/mlx5e: Fix SQ wake logic in ptp napi_poll contextRahul Rameshbabu3-7/+16
Check in the mlx5e_ptp_poll_ts_cq context if the ptp tx sq should be woken up. Before change, the ptp tx sq may never wake up if the ptp tx ts skb fifo is full when mlx5e_poll_tx_cq checks if the queue should be woken up. Fixes: 1880bc4e4a96 ("net/mlx5e: Add TX port timestamp support") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-23net/mlx5e: Fix deadlock in tc route query codeVlad Buslov3-20/+48
Cited commit causes ABBA deadlock[0] when peer flows are created while holding the devcom rw semaphore. Due to peer flows offload implementation the lock is taken much higher up the call chain and there is no obvious way to easily fix the deadlock. Instead, since tc route query code needs the peer eswitch structure only to perform a lookup in xarray and doesn't perform any sleeping operations with it, refactor the code for lockless execution in following ways: - RCUify the devcom 'data' pointer. When resetting the pointer synchronously wait for RCU grace period before returning. This is fine since devcom is currently only used for synchronization of pairing/unpairing of eswitches which is rare and already expensive as-is. - Wrap all usages of 'paired' boolean in {READ|WRITE}_ONCE(). The flag has already been used in some unlocked contexts without proper annotations (e.g. users of mlx5_devcom_is_paired() function), but it wasn't an issue since all relevant code paths checked it again after obtaining the devcom semaphore. Now it is also used by mlx5_devcom_get_peer_data_rcu() as "best effort" check to return NULL when devcom is being unpaired. Note that while RCU read lock doesn't prevent the unpaired flag from being changed concurrently it still guarantees that reader can continue to use 'data'. - Refactor mlx5e_tc_query_route_vport() function to use new mlx5_devcom_get_peer_data_rcu() API which fixes the deadlock. [0]: [ 164.599612] ====================================================== [ 164.600142] WARNING: possible circular locking dependency detected [ 164.600667] 6.3.0-rc3+ #1 Not tainted [ 164.601021] ------------------------------------------------------ [ 164.601557] handler1/3456 is trying to acquire lock: [ 164.601998] ffff88811f1714b0 (&esw->offloads.encap_tbl_lock){+.+.}-{3:3}, at: mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core] [ 164.603078] but task is already holding lock: [ 164.603617] ffff88810137fc98 (&comp->sem){++++}-{3:3}, at: mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core] [ 164.604459] which lock already depends on the new lock. [ 164.605190] the existing dependency chain (in reverse order) is: [ 164.605848] -> #1 (&comp->sem){++++}-{3:3}: [ 164.606380] down_read+0x39/0x50 [ 164.606772] mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core] [ 164.607336] mlx5e_tc_query_route_vport+0x86/0xc0 [mlx5_core] [ 164.607914] mlx5e_tc_tun_route_lookup+0x1a4/0x1d0 [mlx5_core] [ 164.608495] mlx5e_attach_decap_route+0xc6/0x1e0 [mlx5_core] [ 164.609063] mlx5e_tc_add_fdb_flow+0x1ea/0x360 [mlx5_core] [ 164.609627] __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core] [ 164.610175] mlx5e_configure_flower+0x952/0x1a20 [mlx5_core] [ 164.610741] tc_setup_cb_add+0xd4/0x200 [ 164.611146] fl_hw_replace_filter+0x14c/0x1f0 [cls_flower] [ 164.611661] fl_change+0xc95/0x18a0 [cls_flower] [ 164.612116] tc_new_tfilter+0x3fc/0xd20 [ 164.612516] rtnetlink_rcv_msg+0x418/0x5b0 [ 164.612936] netlink_rcv_skb+0x54/0x100 [ 164.613339] netlink_unicast+0x190/0x250 [ 164.613746] netlink_sendmsg+0x245/0x4a0 [ 164.614150] sock_sendmsg+0x38/0x60 [ 164.614522] ____sys_sendmsg+0x1d0/0x1e0 [ 164.614934] ___sys_sendmsg+0x80/0xc0 [ 164.615320] __sys_sendmsg+0x51/0x90 [ 164.615701] do_syscall_64+0x3d/0x90 [ 164.616083] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 164.616568] -> #0 (&esw->offloads.encap_tbl_lock){+.+.}-{3:3}: [ 164.617210] __lock_acquire+0x159e/0x26e0 [ 164.617638] lock_acquire+0xc2/0x2a0 [ 164.618018] __mutex_lock+0x92/0xcd0 [ 164.618401] mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core] [ 164.618943] post_process_attr+0x153/0x2d0 [mlx5_core] [ 164.619471] mlx5e_tc_add_fdb_flow+0x164/0x360 [mlx5_core] [ 164.620021] __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core] [ 164.620564] mlx5e_configure_flower+0xe33/0x1a20 [mlx5_core] [ 164.621125] tc_setup_cb_add+0xd4/0x200 [ 164.621531] fl_hw_replace_filter+0x14c/0x1f0 [cls_flower] [ 164.622047] fl_change+0xc95/0x18a0 [cls_flower] [ 164.622500] tc_new_tfilter+0x3fc/0xd20 [ 164.622906] rtnetlink_rcv_msg+0x418/0x5b0 [ 164.623324] netlink_rcv_skb+0x54/0x100 [ 164.623727] netlink_unicast+0x190/0x250 [ 164.624138] netlink_sendmsg+0x245/0x4a0 [ 164.624544] sock_sendmsg+0x38/0x60 [ 164.624919] ____sys_sendmsg+0x1d0/0x1e0 [ 164.625340] ___sys_sendmsg+0x80/0xc0 [ 164.625731] __sys_sendmsg+0x51/0x90 [ 164.626117] do_syscall_64+0x3d/0x90 [ 164.626502] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 164.626995] other info that might help us debug this: [ 164.627725] Possible unsafe locking scenario: [ 164.628268] CPU0 CPU1 [ 164.628683] ---- ---- [ 164.629098] lock(&comp->sem); [ 164.629421] lock(&esw->offloads.encap_tbl_lock); [ 164.630066] lock(&comp->sem); [ 164.630555] lock(&esw->offloads.encap_tbl_lock); [ 164.630993] *** DEADLOCK *** [ 164.631575] 3 locks held by handler1/3456: [ 164.631962] #0: ffff888124b75130 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200 [ 164.632703] #1: ffff888116e512b8 (&esw->mode_lock){++++}-{3:3}, at: mlx5_esw_hold+0x39/0x50 [mlx5_core] [ 164.633552] #2: ffff88810137fc98 (&comp->sem){++++}-{3:3}, at: mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core] [ 164.634435] stack backtrace: [ 164.634883] CPU: 17 PID: 3456 Comm: handler1 Not tainted 6.3.0-rc3+ #1 [ 164.635431] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 164.636340] Call Trace: [ 164.636616] <TASK> [ 164.636863] dump_stack_lvl+0x47/0x70 [ 164.637217] check_noncircular+0xfe/0x110 [ 164.637601] __lock_acquire+0x159e/0x26e0 [ 164.637977] ? mlx5_cmd_set_fte+0x5b0/0x830 [mlx5_core] [ 164.638472] lock_acquire+0xc2/0x2a0 [ 164.638828] ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core] [ 164.639339] ? lock_is_held_type+0x98/0x110 [ 164.639728] __mutex_lock+0x92/0xcd0 [ 164.640074] ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core] [ 164.640576] ? __lock_acquire+0x382/0x26e0 [ 164.640958] ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core] [ 164.641468] ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core] [ 164.641965] mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core] [ 164.642454] ? lock_release+0xbf/0x240 [ 164.642819] post_process_attr+0x153/0x2d0 [mlx5_core] [ 164.643318] mlx5e_tc_add_fdb_flow+0x164/0x360 [mlx5_core] [ 164.643835] __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core] [ 164.644340] mlx5e_configure_flower+0xe33/0x1a20 [mlx5_core] [ 164.644862] ? lock_acquire+0xc2/0x2a0 [ 164.645219] tc_setup_cb_add+0xd4/0x200 [ 164.645588] fl_hw_replace_filter+0x14c/0x1f0 [cls_flower] [ 164.646067] fl_change+0xc95/0x18a0 [cls_flower] [ 164.646488] tc_new_tfilter+0x3fc/0xd20 [ 164.646861] ? tc_del_tfilter+0x810/0x810 [ 164.647236] rtnetlink_rcv_msg+0x418/0x5b0 [ 164.647621] ? rtnl_setlink+0x160/0x160 [ 164.647982] netlink_rcv_skb+0x54/0x100 [ 164.648348] netlink_unicast+0x190/0x250 [ 164.648722] netlink_sendmsg+0x245/0x4a0 [ 164.649090] sock_sendmsg+0x38/0x60 [ 164.649434] ____sys_sendmsg+0x1d0/0x1e0 [ 164.649804] ? copy_msghdr_from_user+0x6d/0xa0 [ 164.650213] ___sys_sendmsg+0x80/0xc0 [ 164.650563] ? lock_acquire+0xc2/0x2a0 [ 164.650926] ? lock_acquire+0xc2/0x2a0 [ 164.651286] ? __fget_files+0x5/0x190 [ 164.651644] ? find_held_lock+0x2b/0x80 [ 164.652006] ? __fget_files+0xb9/0x190 [ 164.652365] ? lock_release+0xbf/0x240 [ 164.652723] ? __fget_files+0xd3/0x190 [ 164.653079] __sys_sendmsg+0x51/0x90 [ 164.653435] do_syscall_64+0x3d/0x90 [ 164.653784] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 164.654229] RIP: 0033:0x7f378054f8bd [ 164.654577] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 6a c3 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 be c3 f4 ff 48 [ 164.656041] RSP: 002b:00007f377fa114b0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [ 164.656701] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f378054f8bd [ 164.657297] RDX: 0000000000000000 RSI: 00007f377fa11540 RDI: 0000000000000014 [ 164.657885] RBP: 00007f377fa12278 R08: 0000000000000000 R09: 000000000000015c [ 164.658472] R10: 00007f377fa123d0 R11: 0000000000000293 R12: 0000560962d99bd0 [ 164.665317] R13: 0000000000000000 R14: 0000560962d99bd0 R15: 00007f377fa11540 Fixes: f9d196bd632b ("net/mlx5e: Use correct eswitch for stack devices with lag") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-23net/mlx5: Fix error message when failing to allocate device memoryRoi Dayan1-1/+1
Fix spacing for the error and also the correct error code pointer. Fixes: c9b9dcb430b3 ("net/mlx5: Move device memory management to mlx5_core") Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-23net/mlx5e: Use correct encap attribute during invalidationVlad Buslov1-1/+3
With introduction of post action infrastructure most of the users of encap attribute had been modified in order to obtain the correct attribute by calling mlx5e_tc_get_encap_attr() helper instead of assuming encap action is always on default attribute. However, the cited commit didn't modify mlx5e_invalidate_encap() which prevents it from destroying correct modify header action which leads to a warning [0]. Fix the issue by using correct attribute. [0]: Feb 21 09:47:35 c-237-177-40-045 kernel: WARNING: CPU: 17 PID: 654 at drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:684 mlx5e_tc_attach_mod_hdr+0x1cc/0x230 [mlx5_core] Feb 21 09:47:35 c-237-177-40-045 kernel: RIP: 0010:mlx5e_tc_attach_mod_hdr+0x1cc/0x230 [mlx5_core] Feb 21 09:47:35 c-237-177-40-045 kernel: Call Trace: Feb 21 09:47:35 c-237-177-40-045 kernel: <TASK> Feb 21 09:47:35 c-237-177-40-045 kernel: mlx5e_tc_fib_event_work+0x8e3/0x1f60 [mlx5_core] Feb 21 09:47:35 c-237-177-40-045 kernel: ? mlx5e_take_all_encap_flows+0xe0/0xe0 [mlx5_core] Feb 21 09:47:35 c-237-177-40-045 kernel: ? lock_downgrade+0x6d0/0x6d0 Feb 21 09:47:35 c-237-177-40-045 kernel: ? lockdep_hardirqs_on_prepare+0x273/0x3f0 Feb 21 09:47:35 c-237-177-40-045 kernel: ? lockdep_hardirqs_on_prepare+0x273/0x3f0 Feb 21 09:47:35 c-237-177-40-045 kernel: process_one_work+0x7c2/0x1310 Feb 21 09:47:35 c-237-177-40-045 kernel: ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0 Feb 21 09:47:35 c-237-177-40-045 kernel: ? pwq_dec_nr_in_flight+0x230/0x230 Feb 21 09:47:35 c-237-177-40-045 kernel: ? rwlock_bug.part.0+0x90/0x90 Feb 21 09:47:35 c-237-177-40-045 kernel: worker_thread+0x59d/0xec0 Feb 21 09:47:35 c-237-177-40-045 kernel: ? __kthread_parkme+0xd9/0x1d0 Fixes: 8300f225268b ("net/mlx5e: Create new flow attr for multi table actions") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-23net/mlx5: DR, Check force-loopback RC QP capability independently from RoCEYevgeny Kliteynik1-1/+3
SW Steering uses RC QP for writing STEs to ICM. This writingis done in LB (loopback), and FL (force-loopback) QP is preferred for performance. FL is available when RoCE is enabled or disabled based on RoCE caps. This patch adds reading of FL capability from HCA caps in addition to the existing reading from RoCE caps, thus fixing the case where we didn't have loopback enabled when RoCE was disabled. Fixes: 7304d603a57a ("net/mlx5: DR, Add support for force-loopback QP") Signed-off-by: Itamar Gozlan <igozlan@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>