summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
AgeCommit message (Collapse)AuthorFilesLines
2023-09-17net/mlx5: Lift reload limitation when SFs are presentJiri Pirko1-11/+0
Historically, the shared devlink_mutex prevented devlink instances from being registered/unregistered during another devlink instance reload operation. However, devlink_muxex is gone for some time now, this limitation is no longer needed. Lift it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-15net/mlx5: Check with FW that sync reset completed successfullyMoshe Shemesh1-0/+3
Even if the PF driver had no error on his part of the sync reset flow, the firmware can see wider picture as it syncs all the PFs in the flow. So add at end of sync reset flow check with firmware by reading MFRL register and initialization segment that the flow had no issue from firmware point of view too. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-10net/mlx5: Light probe local SFsShay Drory1-4/+16
In case user wants to configure the SFs, for example: to use only vdpa functionality, he needs to fully probe a SF, configure what he wants, and afterward reload the SF. In order to save the time of the reload, local SFs will probe without any auxiliary sub-device, so that the SFs can be configured prior to its full probe. The defaults of the enable_* devlink params of these SFs are set to false. Usage example: Create SF: $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11 $ devlink port function set pci/0000:08:00.0/32768 \ hw_addr 00:00:00:00:00:11 state active Enable ETH auxiliary device: $ devlink dev param set auxiliary/mlx5_core.sf.1 \ name enable_eth value true cmode driverinit Now, in order to fully probe the SF, use devlink reload: $ devlink dev reload auxiliary/mlx5_core.sf.1 At this point the user have SF devlink instance with auxiliary device for the Ethernet functionality only. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-10net/mlx5: Move esw multiport devlink param to eswitch codeShay Drory1-34/+0
Move the param registration and handling code into the eswitch code as they are related to each other. No point in having the devlink param registration done in separate file. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-30devlink: move port_del() to devlink_port_opsJiri Pirko1-1/+0
Move port_del() from devlink_ops into newly introduced devlink_port_ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-30devlink: move port_fn_state_get/set() to devlink_port_opsJiri Pirko1-2/+0
Move port_fn_state_get/set() from devlink_ops into newly introduced devlink_port_ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-30devlink: move port_fn_migratable_get/set() to devlink_port_opsJiri Pirko1-2/+0
Move port_fn_migratable_get/set() from devlink_ops into newly introduced devlink_port_ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-30devlink: move port_fn_roce_get/set() to devlink_port_opsJiri Pirko1-2/+0
Move port_fn_roce_get/set() from devlink_ops into newly introduced devlink_port_ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-30devlink: move port_fn_hw_addr_get/set() to devlink_port_opsJiri Pirko1-2/+0
Move port_fn_hw_addr_get/set() from devlink_ops into newly introduced devlink_port_ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-19net/mlx5: devlink, Only show PF related devlink warning when neededRoi Dayan1-2/+1
Limit the PF related warning to show if device is actually a PF. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-19net/mlx5: Remove redundant esw multiport validate functionRoi Dayan1-22/+1
The function didn't validate the value and doesn't require value validation as it will always be valid true or false values. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni1-1/+1
No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-21net/mlx5: Use recovery timeout on sync reset flowMoshe Shemesh1-1/+1
Use the same timeout for sync reset flow and health recovery flow, since the former involves driver's recovery from firmware reset, which is similar to health recovery. Otherwise, in some cases, such as a firmware upgrade on the DPU, the firmware pre-init bit may not be ready within current timeout and the driver will abort loading back after reset. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Fixes: 37ca95e62ee2 ("net/mlx5: Increase FW pre-init timeout for health recovery") Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
Conflicts: drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 6e9d51b1a5cb ("net/mlx5e: Initialize link speed to zero") 1bffcea42926 ("net/mlx5e: Add devlink hairpin queues parameters") https://lore.kernel.org/all/20230324120623.4ebbc66f@canb.auug.org.au/ https://lore.kernel.org/all/20230321211135.47711-1-saeed@kernel.org/ Adjacent changes: drivers/net/phy/phy.c 323fe43cf9ae ("net: phy: Improved PHY error reporting in state machine") 4203d84032e2 ("net: phy: Ensure state transitions are processed from phy_stop()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16net/mlx5e: Add devlink hairpin queues parametersGal Pressman1-0/+66
We refer to a TC NIC rule that involves forwarding as "hairpin". Hairpin queues are mlx5 hardware specific implementation for hardware forwarding of such packets. Per the discussion in [1], move the hairpin queues control (number and size) from debugfs to devlink. Expose two devlink params: - hairpin_num_queues: control the number of hairpin queues - hairpin_queue_size: control the size (in packets) of the hairpin queues [1] https://lore.kernel.org/all/20230111194608.7f15b9a1@kernel.org/ Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-12-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16net/mlx5: Add comment to mlx5_devlink_params_register()Jiri Pirko1-0/+5
Add comment to mlx5_devlink_params_register() functions so it is clear that only driver init params should be registered here. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-4-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15net/mlx5: Suspend auxiliary devices only in case of PCI device suspendJiri Pirko1-2/+2
The original behavior introduced by commit c6acd629eec7 ("net/mlx5e: Add support for devlink-port in non-representors mode") correctly re-instantiated uplink devlink port and related netdevice during devlink reload. However with migration to auxiliary devices, this behaviour changed. Restore the original behaviour and tear down auxiliary devices completely during devlink reload. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-15net/mlx5e: TC, Add peer flow in mpesw modeRoi Dayan1-1/+1
While at it rename mlx5_lag_mpesw_is_activated() to mlx5_lag_is_mpesw() to be consistent with checking if other lag modes are activated. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-15net/mlx5: Lag, Control MultiPort E-Switch single FDB modeRoi Dayan1-0/+54
MultiPort E-Switch builds on newer hardware's capabilities and introduces a mode where a single E-Switch is used and all the vports and physical ports on the NIC are connected to it. The new mode will allow in the future a decrease in the memory used by the driver and advanced features that aren't possible today. This represents a big change in the current E-Switch implantation in mlx5. Currently, by default, each E-Switch manager manages its E-Switch. Steering rules in each E-Switch can only forward traffic to the native physical port associated with that E-Switch. While there are ways to target non-native physical ports, for example using a bond or via special TC rules. None of the ways allows a user to configure the driver to operate by default in such a mode nor can the driver decide to move to this mode by default as it's user configuration-driven right now. While MultiPort E-Switch single FDB mode is the preferred mode, older generations of ConnectX hardware couldn't support this mode so it was never implemented. Now that there is capable hardware present, start the transition to having this mode by default. Introduce a devlink parameter to control MultiPort E-Switch single FDB mode. This will allow users to select this mode on their system right now and in the future will allow the driver to move to this mode by default. Example: $ devlink dev param set pci/0000:00:0b.0 name esw_multiport value 1 \ cmode runtime Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-02-09Merge tag 'mlx5-next-netdev-deadlock' of ↵Jakub Kicinski1-12/+12
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-next-netdev-deadlock This series from Jiri solves a deadlock when removing a network namespace with mlx5 devlink instance being in it. The deadlock is between: 1) mlx5_ib->unregister_netdevice_notifier() AND 2) mlx5_core->devlink_reload->cleanup_net() To slove this introduced mlx5 netdev added/removed events to track uplink netdev to be used for register_netdevice_notifier_dev_net() purposes. * tag 'mlx5-next-netdev-deadlock' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister net/mlx5e: Propagate an internal event in case uplink netdev changes net/mlx5e: Fix trap event handling net/mlx5: Introduce CQE error syndrome ==================== Link: https://lore.kernel.org/r/20230208005626.72930-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09net/mlx5e: Fix trap event handlingJiri Pirko1-12/+12
Current code does not return correct return value from event handler. Fix it by returning NOTIFY_* and propagate err over newly introduce ctx structure. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-01-30devlink: remove devlink featuresJiri Pirko1-4/+5
Devlink features were introduced to disallow devlink reload calls of userspace before the devlink was fully initialized. The reason for this workaround was the fact that devlink reload was originally called without devlink instance lock held. However, with recent changes that converted devlink reload to be performed under devlink instance lock, this is redundant so remove devlink features entirely. Note that mlx5 used this to enable devlink reload conditionally only when device didn't act as multi port slave. Move the multi port check into mlx5_devlink_reload_down() callback alongside with the other checks preventing the device from reload in certain states. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-27net/mlx5: Move eswitch port metadata devlink param to flow eswitch codeJiri Pirko1-49/+0
Move the param registration and handling code into the eswitch offloads code as they are related to each other. No point in having the devlink param registration done in separate file. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-27net/mlx5: Move flow steering devlink param to flow steering codeJiri Pirko1-69/+0
Move the param registration and handling code into the flow steering code as they are related to each other. No point in having the devlink param registration done in separate file. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-27net/mlx5: Move fw reset devlink param to fw reset codeJiri Pirko1-21/+0
Move the param registration and handling code into the fw reset code as they are related to each other. No point in having the devlink param registration done in separate file. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-27devlink: protect devlink param list by instance lockJiri Pirko1-46/+46
Commit 1d18bb1a4ddd ("devlink: allow registering parameters after the instance") as the subject implies introduced possibility to register devlink params even for already registered devlink instance. This is a bit problematic, as the consistency or params list was originally secured by the fact it is static during devlink lifetime. So in order to protect the params list, take devlink instance lock during the params operations. Introduce unlocked function variants and use them in drivers in locked context. Put lock assertions to appropriate places. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-27net/mlx5: Covert devlink params registration to use ↵Jiri Pirko1-34/+46
devlink_params_register/unregister() Since mlx5 is the only user of devlink API to register/unregister a single param, convert it to use array registration function allowing to simplify the devlink API by removing the single param registration functions. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-27net/mlx5: Change devlink param register/unregister function namesJiri Pirko1-2/+2
The functions are registering and unregistering devlink params, so change the names accordingly. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-28net/mlx5: Fix RoCE setting at HCA levelShay Drory1-1/+1
mlx5 PF can disable RoCE for its VFs and SFs. In such case RoCE is marked as unsupported on those VFs/SFs. The cited patch added an option for disable (and enable) RoCE at HCA level. However, that commit didn't check whether RoCE is supported on the HCA and enabled user to try and set RoCE to on. Fix it by checking whether the HCA supports RoCE. Fixes: fbfa97b4d79f ("net/mlx5: Disable roce at HCA level") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-28net/mlx5: Fix io_eq_size and event_eq_size params validationShay Drory1-1/+1
io_eq_size and event_eq_size params are of param type DEVLINK_PARAM_TYPE_U32. But, the validation callback is addressing them as DEVLINK_PARAM_TYPE_U16. This cause mismatch in validation in big-endian systems, in which values in range were rejected while 268500991 was accepted. Fix it by checking the U32 value in the validation callback. Fixes: 0844fa5f7b89 ("net/mlx5: Let user configure io_eq_size param") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-12-08net/mlx5: E-Switch, Implement devlink port function cmds to control migratableShay Drory1-0/+2
Implement devlink port function commands to enable / disable migratable. This is used to control the migratable capability of the device. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08net/mlx5: E-Switch, Implement devlink port function cmds to control RoCEYishai Hadas1-0/+2
Implement devlink port function commands to enable / disable RoCE. This is used to control the RoCE device capabilities. This patch implement infrastructure which will be used by downstream patches that will add additional capabilities. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-01net: devlink: let the core report the driver name instead of the driversVincent Mailhol1-4/+0
The driver name is available in device_driver::name. Right now, drivers still have to report this piece of information themselves in their devlink_ops::info_get callback function. In order to factorize code, make devlink_nl_info_fill() add the driver name attribute. Now that the core sets the driver name attribute, drivers are not supposed to call devlink_info_driver_name_put() anymore. Remove devlink_info_driver_name_put() and clean-up all the drivers using this function in their callback. Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Tested-by: Ido Schimmel <idosch@nvidia.com> # mlxsw Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-12net/mlx5: Unregister traps on driver unload flowMoshe Shemesh1-9/+2
Before this patch, devlink traps are registered only on full driver probe and unregistered on driver removal. As devlink traps are not usable once driver functionality is unloaded, it should be unrgeistered also on flows that unload the driver and then registered when loaded back, e.g. devlink reload flow. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-01net: devlink: convert reload command to take implicit devlink->lockJiri Pirko1-4/+0
Convert reload command to behave the same way as the rest of the commands and let if be called with devlink->lock held. Remove the temporary devl_lock taking from drivers. As the DEVLINK_NL_FLAG_NO_LOCK flag is no longer used, remove it alongside. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-29net/mlx5: Lock mlx5 devlink reload callbacksMoshe Shemesh1-19/+31
Change devlink instance locks in mlx5 driver to have devlink reload callbacks locked, while keeping all driver paths which lead to devl_ API functions called by the driver locked. Add mlx5_load_one_devl_locked() and mlx5_unload_one_devl_locked() which are used by the paths which are already locked such as devlink reload callbacks. This patch makes the driver use devl_ API also for traps register as these functions are called from the driver paths parallel to reload that requires locking now. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-29net/mlx5: Move fw reset unload to mlx5_fw_reset_complete_reloadMoshe Shemesh1-1/+10
Refactor fw reset code to have the unload driver part done on mlx5_fw_reset_complete_reload(), so if it was called by the PF which initiated the reload fw activate flow, the unload part will be handled by the mlx5_devlink_reload_fw_activate() callback itself and not by the reset event work. This will be used by the downstream patch to invoke devlink reload callbacks with devlink lock held. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10net/mlx5: Increase FW pre-init timeout for health recoveryGavin Li1-2/+2
Currently, health recovery will reload driver to recover it from fatal errors. During the driver's load process, it would wait for FW to set the pre-init bit for up to 120 seconds, beyond this threshold it would abort the load process. In some cases, such as a FW upgrade on the DPU, this timeout period is insufficient, and the user has no way to recover the host device. To solve this issue, introduce a new FW pre-init timeout for health recovery, which is set to 2 hours. The timeout for devlink reload and probe will use the original one because they are user triggered flows, and therefore should not have a significantly long timeout, during which the user command would hang. Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Delete redundant default assignment of runtime devlink paramsShay Drory1-20/+0
Runtime devlink params always read their values from the get() callbacks. Also, it is an error to set driverinit_value for params which don't support driverinit cmode. Delete such assignments. In addition, move the set of default matching mode inside eswitch code. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-02-24net/mlx5: Add clarification on sync reset failureMoshe Shemesh1-7/+3
In case devlink reload action fw_activate failed in sync reset stage, use the new MFRL field reset_state to find why it failed and share this clarification with the user. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22net/mlx5: Let user configure max_macs generic paramShay Drory1-0/+67
Currently, max_macs is taking 70Kbytes of memory per function. This size is not needed in all use cases, and is critical with large scale. Hence, allow user to configure the number of max_macs. For example, to reduce the number of max_macs to 1, execute:: $ devlink dev param set pci/0000:00:0b.0 name max_macs value 1 \ cmode driverinit $ devlink dev reload pci/0000:00:0b.0 Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22net/mlx5: Let user configure event_eq_size paramShay Drory1-0/+7
Event EQ is an EQ which received the notification of almost all the events generated by the NIC. Currently, each event EQ is taking 512KB of memory. This size is not needed in most use cases, and is critical with large scale. Hence, allow user to configure the size of the event EQ. For example to reduce event EQ size to 64, execute:: $ devlink dev param set pci/0000:00:0b.0 name event_eq_size value 64 \ cmode driverinit $ devlink dev reload pci/0000:00:0b.0 Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22net/mlx5: Let user configure io_eq_size paramShay Drory1-0/+14
Currently, each I/O EQ is taking 128KB of memory. This size is not needed in all use cases, and is critical with large scale. Hence, allow user to configure the size of I/O EQs. For example, to reduce I/O EQ size to 64, execute: $ devlink dev param set pci/0000:00:0b.0 name io_eq_size value 64 \ cmode driverinit $ devlink dev reload pci/0000:00:0b.0 Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-26net/mlx5: remove the recent devlink paramsJakub Kicinski1-69/+0
revert commit 46ae40b94d88 ("net/mlx5: Let user configure io_eq_size param") revert commit a6cb08daa3b4 ("net/mlx5: Let user configure event_eq_size param") revert commit 554604061979 ("net/mlx5: Let user configure max_macs param") The EQE parameters are applicable to more drivers, they should be configured via standard API, probably ethtool. Example of another driver needing something similar: https://lore.kernel.org/all/1633454136-14679-3-git-send-email-sbhatta@marvell.com/ The last param for "max_macs" is probably fine but the documentation is severely lacking. The meaning and implications for changing the param need to be stated. Link: https://lore.kernel.org/r/20211026152939.3125950-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-25net/mlx5: Let user configure max_macs paramShay Drory1-0/+69
Currently, max_macs is taking 70Kbytes of memory per function. This size is not needed in all use cases, and is critical with large scale. Hence, allow user to configure the number of max_macs. For example, to reduce the number of max_macs to 1, execute:: $ devlink dev param set pci/0000:00:0b.0 name max_macs value 1 \ cmode driverinit $ devlink dev reload pci/0000:00:0b.0 Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-16net/mlx5: Disable roce at HCA levelShay Drory1-1/+2
Currently, when a user disables roce via the devlink param, this change isn't passed down to the device. If device allows disabling RoCE at device level, make use of it. This instructs the device to skip memory allocations related to RoCE functionality which otherwise is done by the device. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-13net/mlx5: Set devlink reload feature bit for supported devices onlyLeon Romanovsky1-1/+4
Mulitport slave device doesn't support devlink reload, so instead of complicating initialization flow with devlink_reload_enable() which will be removed in next patch, don't set DEVLINK_F_RELOAD feature bit for such devices. This fixes an error when reload counters exposed (and equal zero) for the mode that is not supported at all. Fixes: d89ddaae1766 ("net/mlx5: Disable devlink reload for multi port slave device") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-13devlink: Allow control devlink ops behavior through feature maskLeon Romanovsky1-0/+1
Introduce new devlink call to set feature mask to control devlink behavior during device initialization phase after devlink_alloc() is already called. This allows us to set reload ops based on device property which is not known at the beginning of driver initialization. For the sake of simplicity, this API lacks any type of locking and needs to be called before devlink_register() to make sure that no parallel access to the ops is possible at this stage. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-01net/mlx5: Warn for devlink reload when there are VFs aliveLama Kayal1-0/+5
When performing PF reload, VF can't communicate with FW until it recovers and reloads as well. Add a warning message when performing devlink reload while VFs are still present. Thus, giving a notice of an unfavorable behavior that might occur as a result of a consequential reloads and cause interruption of VF recovery. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-27net/mlx5: Accept devlink user input after driver initialization completeLeon Romanovsky1-7/+2
The change of devlink_alloc() to accept device makes sure that device is fully initialized and device_register() does nothing except allowing users to use that devlink instance. Such change ensures that no user input will be usable till that point and it eliminates the need to worry about internal locking as long as devlink_register is called last since all accesses to the devlink are during initialization. This change fixes the following lockdep warning. ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc2+ #27 Not tainted ------------------------------------------------------ devlink/265 is trying to acquire lock: ffff8880133c2bc0 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_unload_one+0x1e/0xa0 [mlx5_core] but task is already holding lock: ffffffff8362b468 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x8d0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (devlink_mutex){+.+.}-{3:3}: __mutex_lock+0x149/0x1310 devlink_register+0xe7/0x280 mlx5_devlink_register+0x118/0x480 [mlx5_core] mlx5_init_one+0x34b/0x440 [mlx5_core] probe_one+0x480/0x6e0 [mlx5_core] pci_device_probe+0x2a0/0x4a0 really_probe+0x1cb/0xba0 __driver_probe_device+0x18f/0x470 driver_probe_device+0x49/0x120 __driver_attach+0x1ce/0x400 bus_for_each_dev+0x11e/0x1a0 bus_add_driver+0x309/0x570 driver_register+0x20f/0x390 0xffffffffa04a0062 do_one_initcall+0xd5/0x400 do_init_module+0x1c8/0x760 load_module+0x7d9d/0xa4b0 __do_sys_finit_module+0x118/0x1a0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}: __lock_acquire+0x2999/0x5a40 lock_acquire+0x1a9/0x4a0 __mutex_lock+0x149/0x1310 mlx5_unload_one+0x1e/0xa0 [mlx5_core] mlx5_devlink_reload_down+0x185/0x2b0 [mlx5_core] devlink_reload+0x1f2/0x640 devlink_nl_cmd_reload+0x6c3/0x10d0 genl_family_rcv_msg_doit+0x1e9/0x2f0 genl_rcv_msg+0x27f/0x4a0 netlink_rcv_skb+0x11e/0x340 genl_rcv+0x24/0x40 netlink_unicast+0x433/0x700 netlink_sendmsg+0x6fb/0xbe0 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x192/0x240 __x64_sys_sendto+0xdc/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(devlink_mutex); lock(&dev->intf_state_mutex); lock(devlink_mutex); lock(&dev->intf_state_mutex); *** DEADLOCK *** 3 locks held by devlink/265: #0: ffffffff836371d0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 #1: ffffffff83637288 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x31a/0x4a0 #2: ffffffff8362b468 (devlink_mutex){+.+.}-{3:3}, at: devlink_nl_pre_doit+0x2b/0x8d0 stack backtrace: CPU: 0 PID: 265 Comm: devlink Not tainted 5.14.0-rc2+ #27 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x45/0x59 check_noncircular+0x268/0x310 ? print_circular_bug+0x460/0x460 ? __kernel_text_address+0xe/0x30 ? alloc_chain_hlocks+0x1e6/0x5a0 __lock_acquire+0x2999/0x5a40 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0 ? add_lock_to_list.constprop.0+0x6c/0x530 lock_acquire+0x1a9/0x4a0 ? mlx5_unload_one+0x1e/0xa0 [mlx5_core] ? lock_release+0x6c0/0x6c0 ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0 ? lock_is_held_type+0x98/0x110 __mutex_lock+0x149/0x1310 ? mlx5_unload_one+0x1e/0xa0 [mlx5_core] ? lock_is_held_type+0x98/0x110 ? mlx5_unload_one+0x1e/0xa0 [mlx5_core] ? find_held_lock+0x2d/0x110 ? mutex_lock_io_nested+0x1160/0x1160 ? mlx5_lag_is_active+0x72/0x90 [mlx5_core] ? lock_downgrade+0x6d0/0x6d0 ? do_raw_spin_lock+0x12e/0x270 ? rwlock_bug.part.0+0x90/0x90 ? mlx5_unload_one+0x1e/0xa0 [mlx5_core] mlx5_unload_one+0x1e/0xa0 [mlx5_core] mlx5_devlink_reload_down+0x185/0x2b0 [mlx5_core] ? netlink_broadcast_filtered+0x308/0xac0 ? mlx5_devlink_info_get+0x1f0/0x1f0 [mlx5_core] ? __build_skb_around+0x110/0x2b0 ? __alloc_skb+0x113/0x2b0 devlink_reload+0x1f2/0x640 ? devlink_unregister+0x1e0/0x1e0 ? security_capable+0x51/0x90 devlink_nl_cmd_reload+0x6c3/0x10d0 ? devlink_nl_cmd_get_doit+0x1e0/0x1e0 ? devlink_nl_pre_doit+0x72/0x8d0 genl_family_rcv_msg_doit+0x1e9/0x2f0 ? __lock_acquire+0x15e2/0x5a40 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240 ? mutex_lock_io_nested+0x1160/0x1160 ? security_capable+0x51/0x90 genl_rcv_msg+0x27f/0x4a0 ? genl_get_cmd+0x3c0/0x3c0 ? lock_acquire+0x1a9/0x4a0 ? devlink_nl_cmd_get_doit+0x1e0/0x1e0 ? lock_release+0x6c0/0x6c0 netlink_rcv_skb+0x11e/0x340 ? genl_get_cmd+0x3c0/0x3c0 ? netlink_ack+0x930/0x930 genl_rcv+0x24/0x40 netlink_unicast+0x433/0x700 ? netlink_attachskb+0x750/0x750 ? __alloc_skb+0x113/0x2b0 netlink_sendmsg+0x6fb/0xbe0 ? netlink_unicast+0x700/0x700 ? netlink_unicast+0x700/0x700 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x192/0x240 ? __x64_sys_getpeername+0xb0/0xb0 ? do_sys_openat2+0x10a/0x370 ? down_write_nested+0x150/0x150 ? do_user_addr_fault+0x215/0xd50 ? __x64_sys_openat+0x11f/0x1d0 ? __x64_sys_open+0x1a0/0x1a0 __x64_sys_sendto+0xdc/0x1b0 ? syscall_enter_from_user_mode+0x1d/0x50 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f50b50b6b3a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c RSP: 002b:00007fff6c0d3f38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f50b50b6b3a RDX: 0000000000000038 RSI: 000055763ac08440 RDI: 0000000000000003 RBP: 000055763ac08410 R08: 00007f50b5192200 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 000055763ac08410 R15: 000055763ac08440 mlx5_core 0000:00:09.0: firmware version: 4.8.9999 mlx5_core 0000:00:09.0: 0.000 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x255 link) mlx5_core 0000:00:09.0 eth1: Link up Fixes: a6f3b62386a0 ("net/mlx5: Move devlink registration before interfaces load") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>