summaryrefslogtreecommitdiff
path: root/include/linux/mlx5
AgeCommit message (Collapse)AuthorFilesLines
2023-06-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-10/+0
Pull rdma updates from Jason Gunthorpe: "This cycle saw a focus on rxe and bnxt_re drivers: - Code cleanups for irdma, rxe, rtrs, hns, vmw_pvrdma - rxe uses workqueues instead of tasklets - rxe has better compliance around access checks for MRs and rereg_mr - mana supportst he 'v2' FW interface for RX coalescing - hfi1 bug fix for stale cache entries in its MR cache - mlx5 buf fix to handle FW failures when destroying QPs - erdma HW has a new doorbell allocation mechanism for uverbs that is secure - Lots of small cleanups and rework in bnxt_re: - Use the common mmap functions - Support disassociation - Improve FW command flow - support for 'low latency push'" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (71 commits) RDMA/bnxt_re: Fix an IS_ERR() vs NULL check RDMA/bnxt_re: Fix spelling mistake "priviledged" -> "privileged" RDMA/bnxt_re: Remove duplicated include in bnxt_re/main.c RDMA/bnxt_re: Refactor code around bnxt_qplib_map_rc() RDMA/bnxt_re: Remove incorrect return check from slow path RDMA/bnxt_re: Enable low latency push RDMA/bnxt_re: Reorg the bar mapping RDMA/bnxt_re: Move the interface version to chip context structure RDMA/bnxt_re: Query function capabilities from firmware RDMA/bnxt_re: Optimize the bnxt_re_init_hwrm_hdr usage RDMA/bnxt_re: Add disassociate ucontext support RDMA/bnxt_re: Use the common mmap helper functions RDMA/bnxt_re: Initialize opcode while sending message RDMA/cma: Remove NULL check before dev_{put, hold} RDMA/rxe: Simplify cq->notify code RDMA/rxe: Fixes mr access supported list RDMA/bnxt_re: optimize the parameters passed to helper functions RDMA/bnxt_re: remove redundant cmdq_bitmap RDMA/bnxt_re: use firmware provided max request timeout RDMA/bnxt_re: cancel all control path command waiters upon error ...
2023-06-27Merge tag 'v6.4' into rdma.git for-nextJason Gunthorpe2-1/+16
Linux 6.4 Resolve conflicts between rdma rc and next in rxe_cq matching linux-next: drivers/infiniband/sw/rxe/rxe_cq.c: https://lore.kernel.org/r/20230622115246.365d30ad@canb.auug.org.au Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-23net/mlx5: Fix reserved at offset in hca_cap registerLama Kayal1-2/+2
A member of struct mlx5_ifc_cmd_hca_cap_bits has been mistakenly assigned the wrong reserved_at offset value. Correct it to align to the right value, thus avoid future miscalculation. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5: Expose bits for local loopback counterOr Har-Toov1-2/+4
Add needed HW bits for querying local loopback counter and the HCA capability for it. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5: Handle sync reset unload eventMoshe Shemesh2-1/+3
Added a new event handler to firmware sync reset, which is used to support firmware sync reset flow on smart NIC. Adding this new stage to the flow enables the firmware to ensure host PFs unload before ECPFs unload, to avoid race of PFs recovery. If firmware sends sync_reset_unload event to driver the driver should unload and close all HW resources of the function. Once the driver finishes unloading part, it can't get any more events from firmware as event queues are closed, so it polls the reset state field to know when to continue to next stage of the sync reset flow. Added capability bit for supporting sync_reset_unload event. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5: Expose timeout for sync reset unload stageMoshe Shemesh1-1/+3
Expose new timoueout in Default Timeouts Register to be used on sync reset flow running on smart NIC. In this flow the driver should know how much time to wait from getting unload request till firmware will ask the PF to continue to next stage of the flow. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+12
Cross-merge networking fixes after downstream PR. Conflicts: include/linux/mlx5/driver.h 617f5db1a626 ("RDMA/mlx5: Fix affinity assignment") dc13180824b7 ("net/mlx5: Enable devlink port for embedded cpu VF vports") https://lore.kernel.org/all/20230613125939.595e50b8@canb.auug.org.au/ tools/testing/selftests/net/mptcp/mptcp_join.sh 47867f0a7e83 ("selftests: mptcp: join: skip check if MIB counter not supported") 425ba803124b ("selftests: mptcp: join: support RM_ADDR for used endpoints or not") 45b1a1227a7a ("mptcp: introduces more address related mibs") 0639fa230a21 ("selftests: mptcp: add explicit check for new mibs") https://lore.kernel.org/netdev/20230609-upstream-net-20230610-mptcp-selftests-support-old-kernels-part-3-v1-0-2896fe2ee8a3@tessares.net/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-0/+12
Pull rdma fixes from Jason Gunthorpe: "This is an unusually large bunch of bug fixes for the later rc cycle, rxe and mlx5 both dumped a lot of things at once. rxe continues to fix itself, and mlx5 is fixing a bunch of "queue counters" related bugs. There is one highly notable bug fix regarding the qkey. This small security check was missed in the original 2005 implementation and it allows some significant issues. Summary: - Two rtrs bug fixes for error unwind bugs - Several rxe bug fixes: * Incorrect Rx packet validation * Using memory without a refcount * Syzkaller found use before initialization * Regression fix for missing locking with the tasklet conversion from this merge window - Have bnxt report the correct link properties to userspace, this was a regression in v6.3 - Several mlx5 bug fixes: * Kernel crash triggerable by userspace for the RAW ethernet profile * Defend against steering refcounting issues created by userspace * Incorrect change of QP port affinity parameters in some LAG configurations - Fix mlx5 Q counters: * Do not over allocate Q counters to allow userspace to use the full port capacity * Kernel crash triggered by eswitch due to mis-use of Q counters * Incorrect mlx5_device for Q counters in some LAG configurations - Properly implement the IBA spec restricting privileged qkeys to root - Always an error when reading from a disassociated device's event queue - isert bug fixes: * Avoid a deadlock with the CM handler and CM ID destruction * Correct list corruption due to incorrect locking * Fix a use after free around connection tear down" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/rxe: Fix rxe_cq_post IB/isert: Fix incorrect release of isert connection IB/isert: Fix possible list corruption in CMA handler IB/isert: Fix dead lock in ib_isert RDMA/mlx5: Fix affinity assignment IB/uverbs: Fix to consider event queue closing also upon non-blocking mode RDMA/uverbs: Restrict usage of privileged QKEYs RDMA/cma: Always set static rate to 0 for RoCE RDMA/mlx5: Fix Q-counters query in LAG mode RDMA/mlx5: Remove vport Q-counters dependency on normal Q-counters RDMA/mlx5: Fix Q-counters per vport allocation RDMA/mlx5: Create an indirect flow table for steering anchor RDMA/mlx5: Initiate dropless RQ for RAW Ethernet functions RDMA/rxe: Fix the use-before-initialization error of resp_pkts RDMA/bnxt_re: Fix reporting active_{speed,width} attributes RDMA/rxe: Fix ref count error in check_rkey() RDMA/rxe: Fix packet length checks RDMA/rtrs: Fix rxe_dealloc_pd warning RDMA/rtrs: Fix the last iu->buf leak in err path
2023-06-11RDMA/mlx5: Fix affinity assignmentMark Bloch1-0/+12
The cited commit aimed to ensure that Virtual Functions (VFs) assign a queue affinity to a Queue Pair (QP) to distribute traffic when the LAG master creates a hardware LAG. If the affinity was set while the hardware was not in LAG, the firmware would ignore the affinity value. However, this commit unintentionally assigned an affinity to QPs on the LAG master's VPORT even if the RDMA device was not marked as LAG-enabled. In most cases, this was not an issue because when the hardware entered hardware LAG configuration, the RDMA device of the LAG master would be destroyed and a new one would be created, marked as LAG-enabled. The problem arises when a user configures Equal-Cost Multipath (ECMP). In ECMP mode, traffic can be directed to different physical ports based on the queue affinity, which is intended for use by VPORTS other than the E-Switch manager. ECMP mode is supported only if both E-Switch managers are in switchdev mode and the appropriate route is configured via IP. In this configuration, the RDMA device is not destroyed, and we retain the RDMA device that is not marked as LAG-enabled. To ensure correct behavior, Send Queues (SQs) opened by the E-Switch manager through verbs should be assigned strict affinity. This means they will only be able to communicate through the native physical port associated with the E-Switch manager. This will prevent the firmware from assigning affinity and will not allow the SQs to be remapped in case of failover. Fixes: 802dcc7fc5ec ("RDMA/mlx5: Support TX port affinity for VF drivers in LAG mode") Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/425b05f4da840bc684b0f7e8ebf61aeb5cef09b0.1685960567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-06-11RDMA/mlx5: Handle DCT QP logic separately from low level QP interfacePatrisious Haddad1-1/+0
Previously when destroying a DCT, if the firmware function for the destruction failed, the common resource would have been destroyed either way, since it was destroyed before the firmware object. Which leads to kernel warning "refcount_t: underflow" which indicates possible use-after-free. Which is triggered when we try to destroy the common resource for the second time and execute refcount_dec_and_test(&common->refcount). So, let's fix the destruction order by factoring out the DCT QP logic to be in separate XArray database. refcount_t: underflow; use-after-free. WARNING: CPU: 8 PID: 1002 at lib/refcount.c:28 refcount_warn_saturate+0xd8/0xe0 Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core overlay mlx5_core fuse CPU: 8 PID: 1002 Comm: python3 Not tainted 5.16.0-rc5+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:refcount_warn_saturate+0xd8/0xe0 Code: ff 48 c7 c7 18 f5 23 82 c6 05 60 70 ff 00 01 e8 d0 0a 45 00 0f 0b c3 48 c7 c7 c0 f4 23 82 c6 05 4c 70 ff 00 01 e8 ba 0a 45 00 <0f> 0b c3 0f 1f 44 00 00 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13 RSP: 0018:ffff8881221d3aa8 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff8881313e8d40 RCX: ffff88852cc1b5c8 RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff88852cc1b5c0 RBP: ffff888100f70000 R08: ffff88853ffd1ba8 R09: 0000000000000003 R10: 00000000fffff000 R11: 3fffffffffffffff R12: 0000000000000246 R13: ffff888100f71fa0 R14: ffff8881221d3c68 R15: 0000000000000020 FS: 00007efebbb13740(0000) GS:ffff88852cc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005611aac29f80 CR3: 00000001313de004 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> destroy_resource_common+0x6e/0x95 [mlx5_ib] mlx5_core_destroy_rq_tracked+0x38/0xbe [mlx5_ib] mlx5_ib_destroy_wq+0x22/0x80 [mlx5_ib] ib_destroy_wq_user+0x1f/0x40 [ib_core] uverbs_free_wq+0x19/0x40 [ib_uverbs] destroy_hw_idr_uobject+0x18/0x50 [ib_uverbs] uverbs_destroy_uobject+0x2f/0x190 [ib_uverbs] uobj_destroy+0x3c/0x80 [ib_uverbs] ib_uverbs_cmd_verbs+0x3e4/0xb80 [ib_uverbs] ? uverbs_free_wq+0x40/0x40 [ib_uverbs] ? ip_list_rcv+0xf7/0x120 ? netif_receive_skb_list_internal+0x1b6/0x2d0 ? task_tick_fair+0xbf/0x450 ? __handle_mm_fault+0x11fc/0x1450 ib_uverbs_ioctl+0xa4/0x110 [ib_uverbs] __x64_sys_ioctl+0x3e4/0x8e0 ? handle_mm_fault+0xb9/0x210 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7efebc0be17b Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe71813e78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffe71813fb8 RCX: 00007efebc0be17b RDX: 00007ffe71813fa0 RSI: 00000000c0181b01 RDI: 0000000000000005 RBP: 00007ffe71813f80 R08: 00005611aae96020 R09: 000000000000004f R10: 00007efebbf9ffa0 R11: 0000000000000246 R12: 00007ffe71813f80 R13: 00007ffe71813f4c R14: 00005611aae2eca0 R15: 00007efeae6c89d0 </TASK> Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4470888466c8a898edc9833286967529cc5f3c0d.1685953497.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-06-11RDMA/mlx5: Reduce QP table exposureLeon Romanovsky1-9/+0
driver.h is common header to whole mlx5 code base, but struct mlx5_qp_table is used in mlx5_ib driver only. So move that struct to be under sole responsibility of mlx5_ib. Link: https://lore.kernel.org/r/bec0dc1158e795813b135d1143147977f26bf668.1685953497.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-06-10net/mlx5: Add new page type for EC VF pagesDaniel Jurgens1-0/+1
When the embedded cpu supports SRIOV it can be enabled and disabled independently from the host SRIOV. Track the pages separately so we can properly wait for returned VF pages. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-10net/mlx5: Update vport caps query/set for EC VFsDaniel Jurgens1-1/+1
These functions are for query/set by vport, there was an underlying assumption that vport was equal to function ID. That's not the case for EC VF functions. Set the ec_vf_function bit accordingly. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-10net/mlx5: Enable devlink port for embedded cpu VF vportsDaniel Jurgens1-0/+6
Enable creation of a devlink port for EC VF vports. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-10net/mlx5: mlx5_ifc updates for embedded CPU SRIOVDaniel Jurgens1-3/+8
Add ec_vf_vport_base to HCA Capabilities 2. This indicates the base vport of embedded CPU virtual functions that are connected to the eswitch. Add ec_vf_function to query/set_hca_caps. If set this indicates accessing a virtual function on the embedded CPU by function ID. This should only be used with other_function set to 1. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-08net/mlx5e: Expose catastrophic steering error countersLama Kayal1-2/+10
Add generated_pkt_steering_fail and handled_pkt_steering_fail to devlink heatlth reporter. generated_pkt_steering_fail indicates the number of packets dropped due to illegal steering operation within the vport steering domain. handled_pkt_steering_fail indicates the number of packets dropped due to illegal steering operation, originated by the vport. Also, update devlink reporter functionality documentation with the newly exposed counters. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-08{net/RDMA}/mlx5: introduce lag_for_each_peerShay Drory1-1/+7
Introduce a generic APIs to iterate over all the devices which are part of the LAG. This API replace mlx5_lag_get_peer_mdev() which retrieve only a single peer device from the lag. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-25net/mlx5e: Use query_special_contexts cmd only once per mdevDragos Tatulea1-0/+1
Don't query the firmware so many times (num rqs * num wqes * wqe frags) because it slows down linearly the interface creation time when the product is larger. Do it only once per mdev and store the result in mlx5e_param. Due to helper function being called from different files, move it to an appropriate location. Rename the function with a proper prefix and add a small cleanup. This fix applies only for legacy rq. Fixes: 1b1e4868836a ("net/mlx5e: Use query_special_contexts for mkeys") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-23net/mlx5: DR, Check force-loopback RC QP capability independently from RoCEYevgeny Kliteynik1-1/+3
SW Steering uses RC QP for writing STEs to ICM. This writingis done in LB (loopback), and FL (force-loopback) QP is preferred for performance. FL is available when RoCE is enabled or disabled based on RoCE caps. This patch adds reading of FL capability from HCA caps in addition to the existing reading from RoCE caps, thus fixing the case where we didn't have loopback enabled when RoCE was disabled. Fixes: 7304d603a57a ("net/mlx5: DR, Add support for force-loopback QP") Signed-off-by: Itamar Gozlan <igozlan@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-9/+22
Pull rdma updates from Jason Gunthorpe: "Usual wide collection of unrelated items in drivers: - Driver bug fixes and treewide cleanups in hfi1, siw, qib, mlx5, rxe, usnic, usnic, bnxt_re, ocrdma, iser: - remove unnecessary NULL checks - kmap obsolescence - pci_enable_pcie_error_reporting() obsolescence - unused variables and macros - trace event related warnings - casting warnings - Code cleanups for irdm and erdma - EFA reporting of 128 byte PCIe TLP support - mlx5 more agressively uses the out of order HW feature - Big rework of how state machines and tasks work in rxe - Fix a syzkaller found crash netdev refcount leak in siw - bnxt_re revises their HW description header - Congestion control for bnxt_re - Use mmu_notifiers more safely in hfi1 - mlx5 gets better support for PCIe relaxed ordering inside VMs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (81 commits) RDMA/efa: Add rdma write capability to device caps RDMA/mlx5: Use correct device num_ports when modify DC RDMA/irdma: Drop spurious WQ_UNBOUND from alloc_ordered_workqueue() call RDMA/rxe: Fix spinlock recursion deadlock on requester RDMA/mlx5: Fix flow counter query via DEVX RDMA/rxe: Protect QP state with qp->state_lock RDMA/rxe: Move code to check if drained to subroutine RDMA/rxe: Remove qp->req.state RDMA/rxe: Remove qp->comp.state RDMA/rxe: Remove qp->resp.state RDMA/mlx5: Allow relaxed ordering read in VFs and VMs net/mlx5: Update relaxed ordering read HCA capabilities RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write RDMA: Add ib_virt_dma_to_page() RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task" RDMA/irdma: Slightly optimize irdma_form_ah_cm_frame() RDMA/rxe: Fix incorrect TASKLET_STATE_SCHED check in rxe_task.c IB/hfi1: Place struct mmu_rb_handler on cache line start IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests ...
2023-04-22Merge tag 'mlx5-updates-2023-04-20' of ↵Jakub Kicinski2-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-04-20 1) Dragos Improves RX page pool, and provides some fixes to his previous series: 1.1) Fix releasing page_pool for striding RQ and legacy RQ nonlinear case 1.2) Hook NAPIs to page pools to gain more performance. 2) From Roi, Some cleanups to TC and eswitch modules. 3) Maher migrates vnic diagnostic counters reporting from debugfs to a dedicated devlink health reporter Maher Says: =========== net/mlx5: Expose vnic diagnostic counters using devlink Currently, vnic diagnostic counters are exposed through the following debugfs: $ ls /sys/kernel/debug/mlx5/0000:08:00.0/esw/vf_0/vnic_diag/ cq_overrun quota_exceeded_command total_q_under_processor_handle invalid_command send_queue_priority_update_flow nic_receive_steering_discard The current design does not allow the hypervisor to view the diagnostic counters of its VFs, in case the VFs get bound to a VM. In other words, the counters are not exposed for representor interfaces. Furthermore, the debugfs design is inconvenient future-wise, in case more counters need to be reported by the driver in the future. As these counters pertain to vNIC health, it is more appropriate to utilize the devlink health reporter to expose them. Thus, this patchest includes the following changes: * Drop the current vnic diagnostic counters debugfs interface. * Add a vnic devlink health reporter for PFs/VFs core devices, which when diagnosed will dump vnic diagnostic counter values that are queried from FW. * Add a vnic devlink health reporter for the representor interface, which serves the same purpose listed in the previous point, in addition to allowing the hypervisor to view its VFs diagnostic counters, even when the VFs are bounded to external VMs. Example of devlink health reporter usage is: $devlink health diagnose pci/0000:08:00.0 reporter vnic vNIC env counters: total_error_queues: 0 send_queue_priority_update_flow: 0 comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0 invalid_command: 0 quota_exceeded_command: 0 nic_receive_steering_discard: 0 =========== 4) SW steering fixes and improvements Yevgeny Kliteynik Says: ======================= These short patch series are just small fixes / improvements for SW steering: - Patch 1: Fix dumping of legacy modify_hdr in debug dump to align to what is expected by parser - Patch 2: Have separate threshold for ICM sync per ICM type - Patch 3: Add more info to the steering debug dump - Linux version and device name - Patch 4: Keep track of number of buddies that are currently in use per domain per buddy type ======================= * tag 'mlx5-updates-2023-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Update op_mode to op_mod for port selection net/mlx5: E-Switch, Remove unused mlx5_esw_offloads_vport_metadata_set() net/mlx5: E-Switch, Remove redundant dev arg from mlx5_esw_vport_alloc() net/mlx5: Include linux/pci.h for pci_msix_can_alloc_dyn() net/mlx5e: RX, Hook NAPIs to page pools net/mlx5e: RX, Fix XDP_TX page release for legacy rq nonlinear case net/mlx5e: RX, Fix releasing page_pool pages twice for striding RQ net/mlx5e: Add vnic devlink health reporter to representors net/mlx5: Add vnic devlink health reporter to PFs/VFs Revert "net/mlx5: Expose vnic diagnostic counters for eswitch managed vports" Revert "net/mlx5: Expose steering dropped packets counter" net/mlx5: DR, Add memory statistics for domain object net/mlx5: DR, Add more info in domain dbg dump net/mlx5: DR, Calculate sync threshold of each pool according to its type net/mlx5: DR, Fix dumping of legacy modify_hdr in debug dump ==================== Link: https://lore.kernel.org/r/20230421013850.349646-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-21net/mlx5: Update op_mode to op_mod for port selectionRoi Dayan1-1/+1
To be consistent with the other enum keys use OP_MOD instead of OP_MODE. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-21net/mlx5: Add vnic devlink health reporter to PFs/VFsMaher Sanalla1-0/+1
Create a vnic devlink health reporter for PFs/VFs interfaces. The reporter's diagnose callback displays the values of vNIC/vport transport debug counters of PFs/VFs, as follows: $ devlink health diagnose pci/0000:08:00.0 reporter vnic vNIC env counters: total_error_queues: 0 send_queue_priority_update_flow: 0 comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0 invalid_command: 0 quota_exceeded_command: 0 nic_receive_steering_discard: 0 Moreover, add documentation on the reporter functionality and the counters description. While at it, expose the vNIC counters diagnose function to be used by the downstream patch, which will reveal the counters for representor interfaces. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-5/+0
Adjacent changes: net/mptcp/protocol.h 63740448a32e ("mptcp: fix accept vs worker race") 2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close") ddb1a072f858 ("mptcp: move first subflow allocation at mpc access time") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20Revert "net/mlx5: Enable management PF initialization"Jakub Kicinski1-5/+0
This reverts commit fe998a3c77b9f989a30a2a01fb00d3729a6d53a4. Paul reports that it causes a regression with IB on CX4 and FW 12.18.1000. In addition I think that the concept of "management PF" is not fully accepted and requires a discussion. Fixes: fe998a3c77b9 ("net/mlx5: Enable management PF initialization") Reported-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/all/CAHC9VhQ7A4+msL38WpbOMYjAqLp0EtOjeLh4Dc6SQtD6OUvCQg@mail.gmail.com/ Link: https://lore.kernel.org/r/20230413222547.56901-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-18RDMA/mlx5: Fix flow counter query via DEVXMark Bloch1-1/+2
Commit cited in "fixes" tag added bulk support for flow counters but it didn't account that's also possible to query a counter using a non-base id if the counter was allocated as bulk. When a user performs a query, validate the flow counter id given in the mailbox is inside the valid range taking bulk value into account. Fixes: 208d70f562e5 ("IB/mlx5: Support flow counters offset for bulk counters") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Link: https://lore.kernel.org/r/79d7fbe291690128e44672418934256254d93115.1681377114.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-18net/mlx5e: Add IPsec packet offload tunnel bitsLeon Romanovsky1-2/+6
Extend packet reformat types and flow table capabilities with IPsec packet offload tunnel bits. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-16net/mlx5: Update relaxed ordering read HCA capabilitiesAvihai Horon1-2/+3
Rename existing HCA capability relaxed_ordering_read to relaxed_ordering_read_pci_enabled. This is in accordance with recent PRM change to better describe the capability, as it's set only if both the device supports relaxed ordering (RO) read and RO is enabled in PCI config space. In addition, add new HCA capability relaxed_ordering_read which is set if the device supports RO read, regardless of RO in PCI config space. This will be used in the following patch to allow RO in VFs and VMs. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Link: https://lore.kernel.org/r/caa0002fd8135086357dfcc368e2f5cc73b08480.1681131553.git.leon@kernel.org Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-04-14Merge tag 'mlx5-updates-2023-04-11' of ↵Jakub Kicinski4-2/+46
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-04-11 1) Vlad adds the support for linux bridge multicast offload support Patches #1 through #9 Synopsis Vlad Says: ============== Implement support of bridge multicast offload in mlx5. Handle port object attribute SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED notification to toggle multicast offload and bridge snooping support on bridge. Handle port object SWITCHDEV_OBJ_ID_PORT_MDB notification to attach a bridge port to MDB. Steering architecture Existing offload infrastructure relies on two levels of flow tables - bridge ingress and egress. For multicast offload the architecture is extended with additional layer of per-port multicast replication tables. Such tables filter loopback traffic (so packets are not replicated to their source port) and pop VLAN headers for "untagged" VLANs. The tables are referenced by the MDB rules in egress table. MDB egress rule can point to multiple per-port multicast tables, which causes matching multicast traffic to be replicated to all of them, and, consecutively, to several bridge ports: +--------+--+ +---------------------------------------> Port 1 | | | +-^------+--+ | | | | +-----------------------------------------+ | +---------------------------+ | | EGRESS table | | +--> PORT 1 multicast table | | +----------------------------------+ +-----------------------------------------+ | | +---------------------------+ | | INGRESS table | | | | | | | | +----------------------------------+ | dst_mac=P1,vlan=X -> pop vlan, goto P1 +--+ | | FG0: | | | | | dst_mac=P1,vlan=Y -> pop vlan, goto P1 | | | src_port=dst_port -> drop | | | src_mac=M1,vlan=X -> goto egress +---> dst_mac=P2,vlan=X -> pop vlan, goto P2 +--+ | | FG1: | | | ... | | dst_mac=P2,vlan=Y -> goto P2 | | | | VLAN X -> pop, goto port | | | | | dst_mac=MDB1,vlan=Y -> goto mcast P1,P2 +-----+ | ... | | +----------------------------------+ | | | | | VLAN Y -> pop, goto port +-------+ +-----------------------------------------+ | | | FG3: | | | | matchall -> goto port | | | | | | | +---------------------------+ | | | | | | +--------+--+ +---------------------------------------> Port 2 | | | +-^------+--+ | | | | | +---------------------------+ | +--> PORT 2 multicast table | | +---------------------------+ | | | | | FG0: | | | src_port=dst_port -> drop | | | FG1: | | | VLAN X -> pop, goto port | | | ... | | | | | | FG3: | | | matchall -> goto port +-------+ | | +---------------------------+ Patches overview: - Patch 1 adds hardware definition bits for capabilities required to replicate multicast packets to multiple per-port tables. These bits are used by following patches to only attempt multicast offload if firmware and hardware provide necessary support. - Pathces 2-4 patches are preparations and refactoring. - Patch 5 implements necessary infrastructure to toggle multicast offload via SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED port object attribute notification. This also enabled IGMP and MLD snooping. - Patch 6 implements per-port multicast replication tables. It only supports filtering of loopback packets. - Patch 7 extends per-port multicast tables with VLAN pop support for 'untagged' VLANs. - Patch 8 handles SWITCHDEV_OBJ_ID_PORT_MDB port object notifications. It creates MDB replication rules in egress table that can replicate packets to multiple per-port multicast tables. - Patch 9 adds tracepoints for MDB events. ============== 2) Parav Create a new allocation profile for SFs, to save on memory 3) Yevgeny provides some initial patches for upcoming software steering support new pattern/arguments type of modify_header actions. Starting with ConnectX-6 DX, we use a new design of modify_header FW object. The current modify_header object allows for having only limited number of these FW objects, which means that we are limited in the number of offloaded flows that require modify_header action. As a preparation Yevgeny provides the following 4 patches: - Patch 1: Add required mlx5_ifc HW bits - Patch 2, 3: Add new WQE type and opcode that is required for pattern/arg support and adds appropriate support in dr_send.c - Patch 4: Add ICM pool for modify-header-pattern objects and implement patterns cache, allowing patterns reuse for different flows * tag 'mlx5-updates-2023-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: DR, Add modify-header-pattern ICM pool net/mlx5: DR, Prepare sending new WQE type net/mlx5: Add new WQE for updating flow table net/mlx5: Add mlx5_ifc bits for modify header argument net/mlx5: DR, Set counter ID on the last STE for STEv1 TX net/mlx5: Create a new profile for SFs net/mlx5: Bridge, add tracepoints for multicast net/mlx5: Bridge, implement mdb offload net/mlx5: Bridge, support multicast VLAN pop net/mlx5: Bridge, add per-port multicast replication tables net/mlx5: Bridge, snoop igmp/mld packets net/mlx5: Bridge, extract code to lookup parent bridge of port net/mlx5: Bridge, move additional data structures to priv header net/mlx5: Bridge, increase bridge tables sizes net/mlx5: Add mlx5_ifc definitions for bridge multicast support ==================== Link: https://lore.kernel.org/r/20230412040752.14220-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-2/+12
Conflicts: tools/testing/selftests/net/config 62199e3f1658 ("selftests: net: Add VXLAN MDB test") 3a0385be133e ("selftests: add the missing CONFIG_IP_SCTP in net config") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash typeJesper Dangaard Brouer1-2/+12
Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type via mapping table. The mlx5 hardware can also identify and RSS hash IPSEC. This indicate hash includes SPI (Security Parameters Index) as part of IPSEC hash. Extend xdp core enum xdp_rss_hash_type with IPSEC hash type. Fixes: bc8d405b1ba9 ("net/mlx5e: Support RX XDP metadata") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/168132892548.340624.11185734579430124869.stgit@firesoul Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-12net/mlx5: Add new WQE for updating flow tableYevgeny Kliteynik2-0/+12
Add new WQE type: FLOW_TBL_ACCESS, which will be used for writing modify header arguments. This type has specific control segment and special data segment. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-12net/mlx5: Add mlx5_ifc bits for modify header argumentYevgeny Kliteynik1-1/+27
Add enum value for modify-header argument object and mlx5_bits for the related capabilities. Signed-off-by: Muhammad Sammar <muhammads@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-12net/mlx5: Create a new profile for SFsParav Pandit1-0/+1
Create a new profile for SFs in order to disable the command cache. Each function command cache consumes ~500KB of memory, when using a large number of SFs this savings is notable on memory constarined systems. Use a new profile to provide for future differences between SFs and PFs. The mr_cache not used for non-PF functions, so it is excluded from the new profile. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-12net/mlx5: Add mlx5_ifc definitions for bridge multicast supportVlad Buslov1-1/+6
Add the required hardware definitions to mlx5_ifc: fdb_uplink_hairpin, fdb_multi_path_any_table_limit_regc, fdb_multi_path_any_table. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-29net/mlx5: Introduce other vport query for Q-countersPatrisious Haddad1-3/+10
These new fields in QUERY_Q_COUNTER command allow us to access another vport counters during the query command, which is specially useful to query representor vports. In addition also add the required caps to check if this capability is actually supported. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Link: https://lore.kernel.org/r/75c73a4a0e60f18c37b35a4a11ca2e2415e4a6f3.1679566038.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-29Merge tag 'mlx5-updates-2023-03-20' of ↵Jakub Kicinski1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-03-20 mlx5 dynamic msix This patch series adds support for dynamic msix vectors allocation in mlx5. Eli Cohen Says: ================ The following series of patches modifies mlx5_core to work with the dynamic MSIX API. Currently, mlx5_core allocates all the interrupt vectors it needs and distributes them amongst the consumers. With the introduction of dynamic MSIX support, which allows for allocation of interrupts more than once, we now allocate vectors as we need them. This allows other drivers running on top of mlx5_core to allocate interrupt vectors for their own use. An example for this is mlx5_vdpa, which uses these vectors to propagate interrupts directly from the hardware to the vCPU [1]. As a preparation for using this series, a use after free issue is fixed in lib/cpu_rmap.c and the allocator for rmap entries has been modified. A complementary API for irq_cpu_rmap_add() has also been introduced. [1] https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/patch/?id=0f2bf1fcae96a83b8c5581854713c9fc3407556e ================ * tag 'mlx5-updates-2023-03-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Provide external API for allocating vectors net/mlx5: Use one completion vector if eth is disabled net/mlx5: Refactor calculation of required completion vectors net/mlx5: Move devlink registration before mlx5_load net/mlx5: Use dynamic msix vectors allocation net/mlx5: Refactor completion irq request/release code net/mlx5: Improve naming of pci function vectors net/mlx5: Use newer affinity descriptor net/mlx5: Modify struct mlx5_irq to use struct msi_map net/mlx5: Fix wrong comment net/mlx5e: Coding style fix, add empty line lib: cpu_rmap: Add irq_cpu_rmap_remove to complement irq_cpu_rmap_add lib: cpu_rmap: Use allocator for rmap entries lib: cpu_rmap: Avoid use after free on rmap->obj array entries ==================== Link: https://lore.kernel.org/r/20230324231341.29808-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-25net/mlx5: Provide external API for allocating vectorsEli Cohen1-0/+6
Provide external API to be used by other drivers relying on mlx5_core, for allocating MSIX vectors. An example for such a driver would be mlx5_vdpa. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
2023-03-23net/mlx5: Expose bits for enabling out-of-order by defaultOr Har-Toov1-3/+7
Add needed HW bits for enabling out-of-order by default and use go_back_n when out-of-order is not needed. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/75d6dfe263989a05c08c43406132b336ea12d00a.1679230449.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-18net/mlx5e: TC, Add support for VxLAN GBP encap/decap flows offloadGavin Li2-2/+17
Add HW offloading support for TC flows with VxLAN GBP encap/decap. Example of encap rule: tc filter add dev eth0 protocol ip ingress flower \ action tunnel_key set id 42 vxlan_opts 512 \ action mirred egress redirect dev vxlan1 Example of decap rule: tc filter add dev vxlan1 protocol ip ingress flower \ enc_key_id 42 enc_dst_port 4789 vxlan_opts 1024 \ action tunnel_key unset action mirred egress redirect dev eth0 Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16net/mlx5: Move needed PTYS functions to core layerGal Pressman1-0/+16
Downstream patches require devlink params to access the PTYS register, move the needed functions from mlx5e to the core layer. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-11-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-16net/mlx5: Implement thermal zoneSandipan Patra2-0/+29
Implement thermal zone support for mlx5 based HW. The NIC uses temperature sensor provided by ASIC to report current temperature to thermal core. Signed-off-by: Sandipan Patra <spatra@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230314054234.267365-5-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds3-4/+21
Pull rdma updates from Jason Gunthorpe: "Quite a small cycle this time, even with the rc8. I suppose everyone went to sleep over xmas. - Minor driver updates for hfi1, cxgb4, erdma, hns, irdma, mlx5, siw, mana - inline CQE support for hns - Have mlx5 display device error codes - Pinned DMABUF support for irdma - Continued rxe cleanups, particularly converting the MRs to use xarray - Improvements to what can be cached in the mlx5 mkey cache" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (61 commits) IB/mlx5: Extend debug control for CC parameters IB/hfi1: Fix sdma.h tx->num_descs off-by-one errors IB/hfi1: Fix math bugs in hfi1_can_pin_pages() RDMA/irdma: Add support for dmabuf pin memory regions RDMA/mlx5: Use query_special_contexts for mkeys net/mlx5e: Use query_special_contexts for mkeys net/mlx5: Change define name for 0x100 lkey value net/mlx5: Expose bits for querying special mkeys RDMA/rxe: Fix missing memory barriers in rxe_queue.h RDMA/mana_ib: Fix a bug when the PF indicates more entries for registering memory on first packet RDMA/rxe: Remove rxe_alloc() RDMA/cma: Distinguish between sockaddr_in and sockaddr_in6 by size Subject: RDMA/rxe: Handle zero length rdma iw_cxgb4: Fix potential NULL dereference in c4iw_fill_res_cm_id_entry() RDMA/mlx5: Use rdma_umem_for_each_dma_block() RDMA/umem: Remove unused 'work' member from struct ib_umem RDMA/irdma: Cap MSIX used to online CPUs + 1 RDMA/mlx5: Check reg_create() create for errors RDMA/restrack: Correct spelling RDMA/cxgb4: Fix potential null-ptr-deref in pass_establish() ...
2023-02-19IB/mlx5: Extend debug control for CC parametersEdward Srouji1-0/+12
This patch adds rtt_resp_dscp to the current debug controllability of congestion control (CC) parameters. rtt_resp_dscp can be read or written through debugfs. If set, its value overwrites the DSCP of the generated RTT response. Signed-off-by: Edward Srouji <edwards@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Link: https://lore.kernel.org/r/1dcc3440ee53c688f19f579a051ded81a2aaa70a.1676538714.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-17Merge mlx5-next into rdma.git for-nextJason Gunthorpe5-5/+28
Synchronize the shared mlx5 branch with net: - From Jiri: fixe a deadlock in mlx5_ib's netdev notifier unregister. - From Mark and Patrisious: add IPsec RoCEv2 support. - From Or: Rely on firmware to get special mkeys * branch mlx5-next: RDMA/mlx5: Use query_special_contexts for mkeys net/mlx5e: Use query_special_contexts for mkeys net/mlx5: Change define name for 0x100 lkey value net/mlx5: Expose bits for querying special mkeys net/mlx5: Configure IPsec steering for egress RoCEv2 traffic net/mlx5: Configure IPsec steering for ingress RoCEv2 traffic net/mlx5: Add IPSec priorities in RDMA namespaces net/mlx5: Implement new destination type TABLE_TYPE net/mlx5: Introduce new destination type TABLE_TYPE RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister net/mlx5e: Propagate an internal event in case uplink netdev changes net/mlx5e: Fix trap event handling Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17net/mlx5: Change define name for 0x100 lkey valueOr Har-Toov1-1/+1
Change define of 0x100 lkey value from MLX5_INVALID_LKEY to be MLX5_TERMINATE_SCATTER_LIST_LKEY as 0x100 is the value of terminate_scatter_list_mkey. Link: https://lore.kernel.org/r/3a116dc3fbae4cb6b76a63d27d418830b06ade0c.1673960981.git.leon@kernel.org Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17net/mlx5: Expose bits for querying special mkeysOr Har-Toov1-2/+8
Add needed HW bits to query the values of all special mkeys. Link: https://lore.kernel.org/r/080ebb563a9717c15b1ea75d669aede676df386b.1673960981.git.leon@kernel.org Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-16Merge branch 'mlx5-next' of ↵Jakub Kicinski2-2/+13
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Leon Romanovsky says: ==================== mlx5-next changes Following previous conversations [1] and our clear commitment to do the TC work [2], please pull mlx5-next shared branch, which includes low-level steering logic to allow RoCEv2 traffic to be encrypted/ decrypted through IPsec. [1] https://lore.kernel.org/all/20230126230815.224239-1-saeed@kernel.org/ [2] https://lore.kernel.org/all/Y+Z7lVVWqnRBiPh2@nvidia.com/ * 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Configure IPsec steering for egress RoCEv2 traffic net/mlx5: Configure IPsec steering for ingress RoCEv2 traffic net/mlx5: Add IPSec priorities in RDMA namespaces net/mlx5: Implement new destination type TABLE_TYPE net/mlx5: Introduce new destination type TABLE_TYPE ==================== Link: https://lore.kernel.org/r/20230215095624.1365200-1-leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15net/mlx5: Add IPSec priorities in RDMA namespacesMark Zhang1-0/+2
Add IPSec flow steering priorities in RDMA namespaces. This allows adding tables/rules to forward RoCEv2 traffic to the IPSec crypto tables in NIC_TX domain, and accept RoCEv2 traffic from NIC_RX domain. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-02-15net/mlx5: Implement new destination type TABLE_TYPEMark Zhang1-0/+1
Implement new destination type to support flow transition between different table types. e.g. from NIC_RX to RDMA_RX or from RDMA_TX to NIC_TX. The new destination is described in the tracepoint as follows: "mlx5_fs_add_rule: rule=00000000d53cd0ed fte=0000000048a8a6ed index=0 sw_action=<> [dst] flow_table_type=7 id:262152" Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>