summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/broadcom
AgeCommit message (Collapse)AuthorFilesLines
2024-05-11bnxt_en: silence clang build warningVadim Fedorenko1-1/+1
Clang build brings a warning: ../drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:133:12: warning: comparison of distinct pointer types ('typeof (tmo_us) *' (aka 'unsigned int *') and 'typeof (65535) *' (aka 'int *')) [-Wcompare-distinct-pointer-types] 133 | tmo_us = min(tmo_us, BNXT_PTP_QTS_MAX_TMO_US); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix it by specifying proper type for BNXT_PTP_QTS_MAX_TMO_US. Fixes: 7de3c2218eed ("bnxt_en: Add a timeout parameter to bnxt_hwrm_port_ts_query()") Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240509151833.12579-1-vadim.fedorenko@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08net: annotate writes on dev->mtu from ndo_change_mtu()Eric Dumazet6-7/+7
Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted in commit 501a90c94510 ("inet: protect against too small mtu values.") We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time to take care of ndo_change_mtu() methods to use corresponding WRITE_ONCE() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-04bnxt: fix bnxt_get_avail_msix() returning negative valuesDavid Wei2-13/+5
Current net-next/main does not boot for older chipsets e.g. Stratus. Sample dmesg: [ 11.368315] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): Able to reserve only 0 out of 9 requested RX rings [ 11.390181] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): Unable to reserve tx rings [ 11.438780] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): 2nd rings reservation failed. [ 11.487559] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): Not enough rings available. [ 11.506012] bnxt_en 0000:02:00.0: probe with driver bnxt_en failed with error -12 This is caused by bnxt_get_avail_msix() returning a negative value for these chipsets not using the new resource manager i.e. !BNXT_NEW_RM. This in turn causes hwr.cp in __bnxt_reserve_rings() to be set to 0. In the current call stack, __bnxt_reserve_rings() is called from bnxt_set_dflt_rings() before bnxt_init_int_mode(). Therefore, bp->total_irqs is always 0 and for !BNXT_NEW_RM bnxt_get_avail_msix() always returns a negative number. Historically, MSIX vectors were requested by the RoCE driver during run-time and bnxt_get_avail_msix() was used for this purpose. Today, RoCE MSIX vectors are statically allocated. bnxt_get_avail_msix() should only be called for the BNXT_NEW_RM() case to reserve the MSIX ahead of time for RoCE use. bnxt_get_avail_msix() is also be simplified to handle the BNXT_NEW_RM() case only. Fixes: d630624ebd70 ("bnxt_en: Utilize ulp client resources if RoCE is not registered") Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240502203757.3761827-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-5/+29
Cross-merge networking fixes after downstream PR. Conflicts: include/linux/filter.h kernel/bpf/core.c 66e13b615a0c ("bpf: verifier: prevent userspace memory access") d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-02bnxt_en: Add VF PCI ID for 5760X (P7) chipsAjit Khaparde2-1/+4
No driver logic changes are required to support the VFs, so just add the VF PCI ID. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-02bnxt_en: Optimize recovery path ULP locking in the driverKalesh AP3-26/+44
In the error recovery path (AER, firmware recovery, etc), the driver notifies the RoCE driver via ULP_STOP before the reset and via ULP_START after the reset, all under RTNL_LOCK. The RoCE driver can take a long time if there are a lot of QPs to destroy, so it is not ideal to hold the global RTNL lock. Rely on the new en_dev_lock mutex instead for ULP_STOP and ULP_START. For the most part, we move the ULP_STOP call before we take the RTNL lock and move the ULP_START after RTNL unlock. Note that SRIOV re-enablement must be done after ULP_START or RoCE on the VFs will not resume properly after reset. The one scenario in bnxt_hwrm_if_change() where the RTNL lock is already taken in the .ndo_open() context requires the ULP restart to be deferred to the bnxt_sp_task() workqueue. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-02bnxt_en: Add a mutex to synchronize ULP operationsKalesh AP2-1/+22
The current scheme relies heavily on the RTNL lock for all ULP operations between the L2 and the RoCE driver. Add a new en_dev_lock mutex so that the asynchronous ULP_STOP and ULP_START operations can be serialized with bnxt_register_dev() and bnxt_unregister_dev() calls without relying on the RTNL lock. The next patch will remove the RTNL lock from the ULP_STOP and ULP_START calls. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-02bnxt_en: Don't call ULP_STOP/ULP_START during L2 resetMichael Chan1-11/+2
There is no need to call ULP_STOP and ULP_START before and after the L2 reset in bnxt_reset_task(). This L2 reset is done after detecting TX timeout, RX ring errors, or VF config changes. The L2 reset does not affect RoCE since the firmware is not reset and the backing store is left alone. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-02bnxt_en: Don't support offline self test when RoCE driver is loadedKalesh AP1-3/+8
Offline self test is a very disruptive operation for RoCE and requires all active QPs to be destroyed. With a large number of QPs, it can take a long time to destroy all the QPs and can timeout. Do not allow ethtool offline self test if the RoCE driver is registered on the device. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-02bnxt_en: share NQ ring sw_stats memory with subringsEdwin Peer3-19/+27
On P5_PLUS chips and later, the NQ rings have subrings for RX and TX completions respectively. These subrings are passed to the poll function instead of the base NQ, but each ring carries its own copy of the software ring statistics. For stats to be conveniently accessible in __bnxt_poll_work(), the statistics memory should either be shared between the NQ and its subrings or the subrings need to be included in the ethtool stats aggregation logic. This patch opts for the former, because it's more efficient and less confusing having the software statistics for a ring exist in a single place. Before this patch, the counter will not be displayed if the "wrong" cpr->sw_stats was used to increment a counter. Link: https://lore.kernel.org/netdev/CACKFLikEhVAJA+osD7UjQNotdGte+fth7zOy7yDdLkTyFk9Pyw@mail.gmail.com/ Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-29net: bcmgenet: synchronize UMAC_CMD accessDoug Berger4-3/+23
The UMAC_CMD register is written from different execution contexts and has insufficient synchronization protections to prevent possible corruption. Of particular concern are the acceses from the phy_device delayed work context used by the adjust_link call and the BH context that may be used by the ndo_set_rx_mode call. A spinlock is added to the driver to protect contended register accesses (i.e. reg_lock) and it is used to synchronize accesses to UMAC_CMD. Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Cc: stable@vger.kernel.org Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-29net: bcmgenet: synchronize use of bcmgenet_set_rx_mode()Doug Berger1-1/+3
The ndo_set_rx_mode function is synchronized with the netif_addr_lock spinlock and BHs disabled. Since this function is also invoked directly from the driver the same synchronization should be applied. Fixes: 72f96347628e ("net: bcmgenet: set Rx mode before starting netif") Cc: stable@vger.kernel.org Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-29net: bcmgenet: synchronize EXT_RGMII_OOB_CTRL accessDoug Berger1-1/+3
The EXT_RGMII_OOB_CTRL register can be written from different contexts. It is predominantly written from the adjust_link handler which is synchronized by the phydev->lock, but can also be written from a different context when configuring the mii in bcmgenet_mii_config(). The chances of contention are quite low, but it is conceivable that adjust_link could occur during resume when WoL is enabled so use the phydev->lock synchronizer in bcmgenet_mii_config() to be sure. Fixes: afe3f907d20f ("net: bcmgenet: power on MII block for all MII modes") Cc: stable@vger.kernel.org Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-49/+68
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/ti/icssg/icssg_prueth.c net/mac80211/chan.c 89884459a0b9 ("wifi: mac80211: fix idle calculation with multi-link") 87f5500285fb ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()") https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/ net/unix/garbage.c 1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().") 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") drivers/net/ethernet/ti/icssg/icssg_prueth.c drivers/net/ethernet/ti/icssg/icssg_common.c 4dcd0e83ea1d ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()") e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-25net: b44: set pause params only when interface is upPeter Münster1-6/+8
b44_free_rings() accesses b44::rx_buffers (and ::tx_buffers) unconditionally, but b44::rx_buffers is only valid when the device is up (they get allocated in b44_open(), and deallocated again in b44_close()), any other time these are just a NULL pointers. So if you try to change the pause params while the network interface is disabled/administratively down, everything explodes (which likely netifd tries to do). Link: https://github.com/openwrt/openwrt/issues/13789 Fixes: 1da177e4c3f4 (Linux-2.6.12-rc2) Cc: stable@vger.kernel.org Reported-by: Peter Münster <pm@a16n.net> Suggested-by: Jonas Gorski <jonas.gorski@gmail.com> Signed-off-by: Vaclav Svoboda <svoboda@neng.cz> Tested-by: Peter Münster <pm@a16n.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Peter Münster <pm@a16n.net> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/87y192oolj.fsf@a16n.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-25eth: bnxt: fix counting packets discarded due to OOM and netpollJakub Kicinski1-26/+18
I added OOM and netpoll discard counters, naively assuming that the cpr pointer is pointing to a common completion ring. Turns out that is usually *a* completion ring but not *the* completion ring which bnapi->cp_ring points to. bnapi->cp_ring is where the stats are read from, so we end up reporting 0 thru ethtool -S and qstat even though the drop events have happened. Make 100% sure we're recording statistics in the correct structure. Fixes: 907fd4a294db ("bnxt: count discards due to memory allocation errors") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240424002148.3937059-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-25Merge branch '100GbE' of ↵Jakub Kicinski1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: Support 5 layer Tx scheduler topology Mateusz Polchlopek says: For performance reasons there is a need to have support for selectable Tx scheduler topology. Currently firmware supports only the default 9-layer and 5-layer topology. This patch series enables switch from default to 5-layer topology, if user decides to opt-in. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Document tx_scheduling_layers parameter ice: Add tx_scheduling_layers devlink param ice: Enable switching default Tx scheduler topology ice: Adjust the VSI/Aggregator layers ice: Support 5 layer topology devlink: extend devlink_param *set pointer ==================== Link: https://lore.kernel.org/r/20240422203913.225151-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-25bnxt_en: flower: validate control flagsAsbjørn Sloth Tønnesen1-0/+4
This driver currently doesn't support any control flags. Use flow_rule_match_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_match_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Tested-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Link: https://lore.kernel.org/r/20240422152626.175569-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22devlink: extend devlink_param *set pointerMateusz Polchlopek1-2/+4
Extend devlink_param *set function pointer to take extack as a param. Sometimes it is needed to pass information to the end user from set function. It is more proper to use for that netlink instead of passing message to dmesg. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-22bnxt_en: Fix error recovery for 5760X (P7) chipsMichael Chan1-1/+1
During error recovery, such as AER fatal error slot reset, we call bnxt_try_map_fw_health_reg() to try to get access to the health register to determine the firmware state. Fix bnxt_try_map_fw_health_reg() to recognize the P7 chip correctly and set up the health register. This fixes this type of AER slot reset failure: bnxt_en 0000:04:00.0: AER: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Inaccessible, (Unregistered Agent ID) bnxt_en 0000:04:00.0 enp4s0f0np0: PCI I/O error detected bnxt_en 0000:04:00.0 bnxt_re0: Handle device suspend call bnxt_en 0000:04:00.1 enp4s0f1np1: PCI I/O error detected bnxt_en 0000:04:00.1 bnxt_re1: Handle device suspend call pcieport 0000:00:02.0: AER: Root Port link has been reset (0) bnxt_en 0000:04:00.0 enp4s0f0np0: PCI Slot Reset bnxt_en 0000:04:00.0: enabling device (0000 -> 0002) bnxt_en 0000:04:00.0: Firmware not ready bnxt_en 0000:04:00.1 enp4s0f1np1: PCI Slot Reset bnxt_en 0000:04:00.1: enabling device (0000 -> 0002) bnxt_en 0000:04:00.1: Firmware not ready pcieport 0000:00:02.0: AER: device recovery failed Fixes: a432a45bdba4 ("bnxt_en: Define basic P7 macros") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-22bnxt_en: Fix the PCI-AER routinesVikas Gupta1-3/+16
We do not support two simultaneous recoveries so check for reset flag, BNXT_STATE_IN_FW_RESET, and do not proceed with AER further. When the pci channel state is pci_channel_io_frozen, the PCIe link can not be trusted so we disable the traffic immediately and stop BAR access by calling bnxt_fw_fatal_close(). BAR access after AER fatal error can cause an NMI. Fixes: f75d9a0aa967 ("bnxt_en: Re-write PCI BARs after PCI fatal error.") Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-22bnxt_en: refactor reset close codeVikas Gupta1-6/+11
Introduce bnxt_fw_fatal_close() API which can be used to stop data path and disable device when firmware is in fatal state. Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-20net: bcmasp: fix memory leak when bringing down interfaceJustin Chen1-7/+14
When bringing down the TX rings we flush the rings but forget to reclaimed the flushed packets. This leads to a memory leak since we do not free the dma mapped buffers. This also leads to tx control block corruption when bringing down the interface for power management. Fixes: 490cb412007d ("net: bcmasp: Add support for ASP2.0 Ethernet controller") Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240418180541.2271719-1-justin.chen@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-1/+7
Cross-merge networking fixes after downstream PR. Conflicts: net/unix/garbage.c 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()") 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.c faa12ca24558 ("bnxt_en: Reset PTP tx_avail after possible firmware reset") b3d0083caf9a ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()") drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c 7ac10c7d728d ("bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()") 194fad5b2781 ("bnxt_en: Refactor bnxt_rdma_aux_device_init/uninit functions") drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 958f56e48385 ("net/mlx5e: Un-expose functions in en.h") 49e6c9387051 ("net/mlx5e: RSS, Block XOR hash with over 128 channels") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11bnxt_en: Update MODULE_DESCRIPTIONMichael Chan1-1/+1
Update MODULE_DESCRIPTION to the more generic adapter family name. The old name only includes the first generation of supported adapters. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-8-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11bnxt_en: Utilize ulp client resources if RoCE is not registeredVikas Gupta3-9/+48
If the RoCE driver is not registered for a RoCE capable device, add flexibility to use the RoCE resources (MSIX/NQs) for L2 purposes, such as additional rings configured by the user or for XDP. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11bnxt_en: Change MSIX/NQs allocation policyVikas Gupta4-19/+92
The existing scheme sets aside a number of MSIX/NQs for the RoCE driver whether the RoCE driver is registered or not. This scheme is not flexible and limits the resources available for the L2 rings if RoCE is never used. Modify the scheme so that the RoCE MSIX/NQs can be used by the L2 driver if they are not used for RoCE. The MSIX/NQs are now represented by 3 fields. bp->ulp_num_msix_want contains the desired default value, edev->ulp_num_msix_vec contains the available value (but not necessarily in use), and ulp_tbl->msix_requested contains the actual value in use by RoCE. The L2 driver can dip into edev->ulp_num_msix_vec if necessary. We need to add rtnl_lock() back in bnxt_register_dev() and bnxt_unregister_dev() to synchronize the MSIX usage between L2 and RoCE. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11bnxt_en: Refactor bnxt_rdma_aux_device_init/uninit functionsVikas Gupta3-10/+35
In its current form, bnxt_rdma_aux_device_init() not only initializes the necessary data structures of the newly created aux device but also adds the aux device into the aux bus subsytem. Refactor the logic into separate functions, first function to initialize the aux device along with the required resources and second, to actually add the device to the aux bus subsytem. This separation helps to create bnxt_en_dev much earlier and save its resources separately. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11bnxt_en: Remove unneeded MSIX base structure fields and codeVikas Gupta3-45/+8
Ever since commit: 303432211324 ("bnxt_en: Remove runtime interrupt vector allocation") The MSIX base vector is effectively always 0. Remove all unneeded structure fields and code referencing the MSIX base. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11bnxt_en: Remove a redundant NULL check in bnxt_register_dev()Kalesh AP1-3/+0
The memory for "edev->ulp_tbl" is allocated inside the bnxt_rdma_aux_device_init() function. If it fails, the driver will not create the auxiliary device for RoCE. Hence the NULL check inside bnxt_register_dev() is unnecessary. Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-11bnxt_en: Skip ethtool RSS context configuration in ifdown statePavan Chebbi1-0/+5
The current implementation requires the ifstate to be up when configuring the RSS contexts. It will try to fetch the RX ring IDs and will crash if it is in ifdown state. Return error if !netif_running() to prevent the crash. An improved implementation is in the works to allow RSS contexts to be changed while in ifdown state. Fixes: b3d0083caf9a ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-08bnxt_en: Reset PTP tx_avail after possible firmware resetPavan Chebbi1-0/+2
It is possible that during error recovery and firmware reset, there is a pending TX PTP packet waiting for the timestamp. We need to reset this condition so that after recovery, the tx_avail count for PTP is reset back to the initial value. Otherwise, we may not accept any PTP TX timestamps after recovery. Fixes: 118612d519d8 ("bnxt_en: Add PTP clock APIs, ioctls, and ethtool methods") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-08bnxt_en: Fix error recovery for RoCE ulp clientVikas Gupta1-0/+3
Since runtime MSIXs vector allocation/free has been removed, the L2 driver needs to repopulate the MSIX entries for the ulp client as the irq table may change during the recovery process. Fixes: 303432211324 ("bnxt_en: Remove runtime interrupt vector allocation") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-08bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()Vikas Gupta1-1/+2
If ulp = kzalloc() fails, the allocated edev will leak because it is not properly assigned and the cleanup path will not be able to free it. Fix it by assigning it properly immediately after allocation. Fixes: 303432211324 ("bnxt_en: Remove runtime interrupt vector allocation") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-08ipv4: Set scope explicitly in ip_route_output().Guillaume Nault1-1/+2
Add a "scope" parameter to ip_route_output() so that callers don't have to override the tos parameter with the RTO_ONLINK flag if they want a local scope. This will allow converting flowi4_tos to dscp_t in the future, thus allowing static analysers to flag invalid interactions between "tos" (the DSCP bits) and ECN. Only three users ask for local scope (bonding, arp and atm). The others continue to use RT_SCOPE_UNIVERSE. While there, add a comment to warn users about the limitations of ip_route_output(). Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> # infiniband Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-06bnxt_en: Fix PTP firmware timeout parameterMichael Chan1-1/+1
Use the correct tmo_us microsecond parameter for the PTP firmware timeout parameter. Fixes: 7de3c2218eed ("bnxt_en: Add a timeout parameter to bnxt_hwrm_port_ts_query()") Reported-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240404195500.171071-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-4/+12
Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/ip_gre.c 17af420545a7 ("erspan: make sure erspan_base_hdr is present in skb->head") 5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps") https://lore.kernel.org/all/20240402103253.3b54a1cf@canb.auug.org.au/ Adjacent changes: net/ipv6/ip6_fib.c d21d40605bca ("ipv6: Fix infinite recursion in fib6_dump_done().") 5fc68320c1fb ("ipv6: remove RTNL protection from inet6_dump_fib()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04bnxt_en: Add warning message about disallowed speed changeSreekanth Reddy1-0/+3
Some chips may not allow changing default speed when dual rate transceivers modules are used. Firmware on those chips will indicate the same to the driver. Add a warning message when speed change is not supported because a dual rate transceiver is detected by the NIC. Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240402093753.331120-8-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04bnxt_en: Update firmware interface to 1.10.3.39Pavan Chebbi1-47/+137
This updated interface supports backing store APIs to configure host FW trace buffers, updates transceivers ID types, updates to TrueFlow features and other changes for the basic L2 features. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240402093753.331120-7-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04bnxt_en: Add XDP Metadata supportSomnath Kotur2-5/+41
- Change the last arg to xdp_prepare_buff to true from false. - Ensure that when XDP_PASS is returned the xdp->data_meta area is copied to the skb->data area. Account for the meta data size on skb allocation and do a pull after to move it to the "reserved" zone. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240402093753.331120-6-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04bnxt_en: Change bnxt_rx_xdp function prototypeSomnath Kotur3-16/+16
Change bnxt_rx_xdp() to take a pointer to xdp instead of stack variable. This is in prepartion for the XDP metadata patch change where the BPF program can change the value of the xdp.meta_data. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240402093753.331120-5-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04bnxt_en: Allocate page pool per numa nodeSomnath Kotur1-4/+11
Driver's Page Pool allocation code looks at the node local to the PCIe device to determine where to allocate memory. In scenarios where the core count per NUMA node is low (< default rings) it makes sense to exhaust page pool allocations on Node 0 first and then moving on to allocating page pools for the remaining rings from Node 1. With this patch, and the following configuration on the NIC $ ethtool -L ens1f0np0 combined 16 (core count/node = 12, first 12 rings on node#0, last 4 rings node#1) and traffic redirected to a ring on node#1 , we see a performance improvement of ~20% Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240402093753.331120-4-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04bnxt_en: Enable XPS by default on driver loadSomnath Kotur1-1/+45
Enable XPS on default during NIC open. The choice of Tx queue is based on the CPU executing the thread that submits the Tx request. The pool of Tx queues will be spread evenly across both device-attached NUMA nodes(local) and remote NUMA nodes. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240402093753.331120-3-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04bnxt_en: Add delay to handle Downstream Port Containment (DPC) AERVikas Gupta1-0/+4
In case of DPC, after issuing the hot reset, the kernel waits for 100ms for the device to complete the reset. However on some older chips, the firmware may take up to 1 second to complete the reset, only after which the driver can restart the card. Introduce delay of 900ms to handle this scenario on the older chipsets. Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240402093753.331120-2-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04Revert "tg3: Remove residual error handling in tg3_suspend"Paolo Abeni1-4/+26
This reverts commit 9ab4ad295622a3481818856762471c1f8c830e18. I went out of coffee and applied it to the wrong tree. Blame on me. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-04tg3: Remove residual error handling in tg3_suspendNikita Kiryushin1-26/+4
As of now, tg3_power_down_prepare always ends with success, but the error handling code from former tg3_set_power_state call is still here. This code became unreachable in commit c866b7eac073 ("tg3: Do not use legacy PCI power management"). Remove (now unreachable) error handling code for simplification and change tg3_power_down_prepare to a void function as its result is no more checked. Signed-off-by: Nikita Kiryushin <kiryushin@ancud.ru> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240401191418.361747-1-kiryushin@ancud.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-04tg3: Remove residual error handling in tg3_suspendNikita Kiryushin1-26/+4
As of now, tg3_power_down_prepare always ends with success, but the error handling code from former tg3_set_power_state call is still here. This code became unreachable in commit c866b7eac073 ("tg3: Do not use legacy PCI power management"). Remove (now unreachable) error handling code for simplification and change tg3_power_down_prepare to a void function as its result is no more checked. Signed-off-by: Nikita Kiryushin <kiryushin@ancud.ru> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240401191418.361747-1-kiryushin@ancud.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-03net: bcmgenet: Reset RBUF on first openPhil Elwell1-4/+12
If the RBUF logic is not reset when the kernel starts then there may be some data left over from any network boot loader. If the 64-byte packet headers are enabled then this can be fatal. Extend bcmgenet_dma_disable to do perform the reset, but not when called from bcmgenet_resume in order to preserve a wake packet. N.B. This different handling of resume is just based on a hunch - why else wouldn't one reset the RBUF as well as the TBUF? If this isn't the case then it's easy to change the patch to make the RBUF reset unconditional. See: https://github.com/raspberrypi/linux/issues/3850 See: https://github.com/raspberrypi/firmware/issues/1882 Signed-off-by: Phil Elwell <phil@raspberrypi.com> Signed-off-by: Maarten Vanraes <maarten@rmail.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-30netlink: introduce type-checking attribute iterationJohannes Berg1-4/+1
There are, especially with multi-attr arrays, many cases of needing to iterate all attributes of a specific type in a netlink message or a nested attribute. Add specific macros to support that case. Also convert many instances using this spatch: @@ iterator nla_for_each_attr; iterator name nla_for_each_attr_type; identifier nla; expression head, len, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_attr(nla, head, len, rem) +nla_for_each_attr_type(nla, ATTR, head, len, rem) { <... T x; ...> -if (nla_type(nla) == ATTR) { ... -} } @@ identifier nla; iterator nla_for_each_nested; iterator name nla_for_each_nested_type; expression attr, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_nested(nla, attr, rem) +nla_for_each_nested_type(nla, ATTR, attr, rem) { <... T x; ...> -if (nla_type(nla) == ATTR) { ... -} } @@ iterator nla_for_each_attr; iterator name nla_for_each_attr_type; identifier nla; expression head, len, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_attr(nla, head, len, rem) +nla_for_each_attr_type(nla, ATTR, head, len, rem) { <... T x; ...> -if (nla_type(nla) != ATTR) continue; ... } @@ identifier nla; iterator nla_for_each_nested; iterator name nla_for_each_nested_type; expression attr, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_nested(nla, attr, rem) +nla_for_each_nested_type(nla, ATTR, attr, rem) { <... T x; ...> -if (nla_type(nla) != ATTR) continue; ... } Although I had to undo one bad change this made, and I also adjusted some other code for whitespace and to use direct variable initialization now. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20240328203144.b5a6c895fb80.I1869b44767379f204998ff44dd239803f39c23e0@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29bnxt_en: Support adding ntuple rules on RSS contextsPavan Chebbi3-9/+51
When the user wants to add an ntuple filter to an RSS context, select the appropriate VNIC belonging to the selected RSS context and add the VNIC destination rule. Make the necessary changes to bnxt_add_ntuple_cls_rule(). Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240325222902.220712-13-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>