path: root/drivers/infiniband/sw/rxe
2024-06-09  RDMA/rxe: Fix responder length checking for UD request packets  [Honggang LI, 1 file, +13/-0]
According to the IBA specification, if a UD request packet is detected with an invalid length, the request is invalid and shall be silently dropped by the responder, which then waits for a new request packet. commit 689c5421bfe0 ("RDMA/rxe: Fix incorrect responder length checking") deferred the responder length check for UD QPs to the function `copy_data`, but this introduced a regression: when the packet payload is too large to fit in the receive buffer, `copy_data` returns -EINVAL, `send_data_in` then returns RESPST_ERR_MALFORMED_WQE, and the UD QP transitions into the ERROR state instead of silently dropping the packet. Fixes: 689c5421bfe0 ("RDMA/rxe: Fix incorrect responder length checking") Signed-off-by: Honggang LI <honggangli@163.com> Link: https://lore.kernel.org/r/20240523094617.141148-1-honggangli@163.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
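A hedged sketch of the behavior the IBA requires (the check site, payload_room() helper, and exact RESPST_* flow are assumptions, not the literal upstream diff): for UD QPs the responder validates the length before copying and drops silently instead of reporting a malformed WQE.

    /* Sketch only: illustrates the UD silent-drop rule. */
    static enum resp_states send_data_in(struct rxe_qp *qp, void *data_addr,
                                         int data_len)
    {
            int err;

            /* For UD, an invalid length is not a WQE error: drop silently */
            if (qp_type(qp) == IB_QPT_UD || qp_type(qp) == IB_QPT_GSI) {
                    if (data_len > payload_room(qp))    /* hypothetical helper */
                            return RESPST_CLEANUP;      /* drop, await next request */
            }

            err = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE, &qp->resp.wqe->dma,
                            data_addr, data_len, RXE_TO_MR_OBJ);
            if (err)
                    return RESPST_ERR_MALFORMED_WQE;    /* real error for RC QPs */

            return RESPST_NONE;
    }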
2024-05-30  RDMA/rxe: Fix data copy for IB_SEND_INLINE  [Honggang LI, 1 file, +1/-1]
For RDMA Send and Write with IB_SEND_INLINE, the memory buffers specified in the sge list are placed inline in the Send Request, so the data must be copied by the CPU from the virtual addresses that correspond to the DMA addresses in the sge list. Cc: stable@kernel.org Fixes: 8d7c7c0eeb74 ("RDMA: Add ib_virt_dma_to_page()") Signed-off-by: Honggang LI <honggangli@163.com> Link: https://lore.kernel.org/r/20240516095052.542767-1-honggangli@163.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
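A hedged sketch of the copy in question (the loop shape is illustrative; the point is the ib_virt_dma_to_ptr() translation): for kernel users rxe stores a kernel virtual address in sge->addr, so the inline copy must map it back to a pointer, not to a page.

    #include <rdma/ib_verbs.h>

    /* Sketch: copy each inline sge into the wqe's inline buffer. */
    static void copy_inline_sges(u8 *dst, const struct ib_sge *sge, int num_sge)
    {
            int i;

            for (i = 0; i < num_sge; i++, sge++) {
                    /* sge->addr is a virtual address disguised as a dma addr */
                    memcpy(dst, ib_virt_dma_to_ptr(sge->addr), sge->length);
                    dst += sge->length;
            }
    }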
2024-05-18  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma  [Linus Torvalds, 11 files, +113/-165]
Pull rdma updates from Jason Gunthorpe: "Aside from the usual things this has an arch update for __iowrite64_copy() used by the RDMA drivers. This API was intended to generate large 64 byte MemWr TLPs on PCI. These days most processors have done this by just repeating writel() in a loop; S390 and some new ARM64 designs require a special helper to get this to generate.

 - Small improvements and fixes for erdma, efa, hfi1, bnxt_re
 - Fix a UAF crash after module unload on leaking restrack entry
 - Continue adding full RDMA support in mana with support for EQs, GIDs and CQs
 - Improvements to the mkey cache in mlx5
 - DSCP traffic class support in hns and several bug fixes
 - Cap the maximum number of MADs in the receive queue to avoid OOM
 - Another batch of rxe bug fixes from large scale testing
 - __iowrite64_copy() optimizations for write combining MMIO memory
 - Remove NULL checks before dev_put/hold()
 - EFA support for receive with immediate
 - Fix a recent memleaking regression in a cma error path"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (70 commits)
  RDMA/cma: Fix kmemleak in rdma_core observed during blktests nvme/rdma use siw
  RDMA/IPoIB: Fix format truncation compilation errors
  bnxt_re: avoid shift undefined behavior in bnxt_qplib_alloc_init_hwq
  RDMA/efa: Support QP with unsolicited write w/ imm. receive
  IB/hfi1: Remove generic .ndo_get_stats64
  IB/hfi1: Do not use custom stat allocator
  RDMA/hfi1: Use RMW accessors for changing LNKCTL2
  RDMA/mana_ib: implement uapi for creation of rnic cq
  RDMA/mana_ib: boundary check before installing cq callbacks
  RDMA/mana_ib: introduce a helper to remove cq callbacks
  RDMA/mana_ib: create and destroy RNIC cqs
  RDMA/mana_ib: create EQs for RNIC CQs
  RDMA/core: Remove NULL check before dev_{put, hold}
  RDMA/ipoib: Remove NULL check before dev_{put, hold}
  RDMA/mlx5: Remove NULL check before dev_{put, hold}
  RDMA/mlx5: Track DCT, DCI and REG_UMR QPs as diver_detail resources.
  RDMA/core: Add an option to display driver-specific QPs in the rdmatool
  RDMA/efa: Add shutdown notifier
  RDMA/mana_ib: Fix missing ret value
  IB/mlx5: Use __iowrite64_copy() for write combining stores
  ...
2024-04-22  RDMA/rxe: Let destroy qp succeed with stuck packet  [Bob Pearson, 2 files, +32/-12]
In some situations a sent packet may get queued in the NIC longer than the timeout of a ULP. Currently if this happens the ULP may try to reset the link by destroying the qp and setting up an alternate connection, but this will fail because the rxe driver is waiting for the packet to finish being sent and reach the skb destructor function, where the qp reference holding things up is dropped. This patch changes the way the qp is passed to the destructor: pass the qp index rather than a qp pointer. The destructor then attempts to look up the qp from its index and exits early if the lookup fails. This requires taking a reference on the struct sock rather than the qp, allowing the qp to be destroyed while the sk is still around waiting for the packet to finish. Link: https://lore.kernel.org/r/20240329145513.35381-15-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
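A hedged sketch of the destructor scheme described above (the cb field, helper names, and task name are assumptions): the skb carries the qp index, so the qp itself may already be gone by the time the packet is finally freed.

    /* Sketch only, not the upstream diff. */
    static void rxe_skb_tx_dtor(struct sk_buff *skb)
    {
            struct rxe_dev *rxe = rxe_from_skb(skb);        /* hypothetical */
            u32 index = RXE_SKB_CB(skb)->qp_index;          /* hypothetical */
            struct rxe_qp *qp;

            qp = rxe_pool_get_index(&rxe->qp_pool, index);
            if (!qp)
                    goto out;       /* qp already destroyed; nothing to account */

            /* wake the requester if it was throttled on outstanding skbs */
            if (atomic_dec_return(&qp->skb_out) < RXE_INFLIGHT_SKBS_PER_QP_LOW &&
                qp->need_req_skb)
                    rxe_sched_task(&qp->send_task);

            rxe_put(qp);
    out:
            sock_put(skb->sk);      /* sock ref taken at send time, not a qp ref */
    }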
2024-04-22  RDMA/rxe: Get rid of pkt resend on err  [Bob Pearson, 2 files, +3/-18]
Currently the rxe driver detects packets dropped by ip_local_out(), which occur before the packet is sent on the wire, and attempts to resend them. This is redundant with the usual retry mechanism, which covers packets that get dropped in transit to or from the remote node. The way this is implemented is not robust: it sets need_req_skb and waits for the number of local skbs outstanding for this qp to drop below a low water mark. This is racy since the skb may reach the destructor before the requester can set the need_req_skb flag, which will cause a deadlock in the send path for that qp. This patch removes this mechanism since the normal retry path will correct the error and resend the packet; it makes no difference whether the packet is dropped locally or later. Link: https://lore.kernel.org/r/20240329145513.35381-14-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22  RDMA/rxe: Make rxe_loopback match rxe_send behavior  [Bob Pearson, 1 file, +6/-0]
The rxe send path currently counts the number of skbs outstanding between the rxe driver and the ethernet driver to prevent too many packets from accumulating while waiting to send. This patch makes the local loopback path behave the same way. The loopback path forwards the packets to the receive path, which will eventually call kfree_skb on all packets and drop the qp references. This makes the loopback path more useful for software testing. Link: https://lore.kernel.org/r/20240329145513.35381-13-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22  RDMA/rxe: Fix incorrect rxe_put in error path  [Bob Pearson, 1 file, +2/-10]
In rxe_send() a ref is taken on the qp to keep it alive until the kfree_skb() has a chance to call the skb destructor rxe_skb_tx_dtor() which drops the reference. If the packet has an incorrect protocol the error path just calls kfree_skb() which will call the destructor which will drop the ref. Currently the driver also calls rxe_put() which is incorrect. Additionally since the packets sent to rxe_send() are under the control of the driver and it only ever produces IPV4 or IPV6 packets the simplest fix is to remove all the code in this block. Link: https://lore.kernel.org/r/20240329145513.35381-12-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Fixes: 9eb7f8e44d13 ("IB/rxe: Move refcounting earlier in rxe_send()") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22  RDMA/rxe: Don't call direct between tasks  [Bob Pearson, 3 files, +4/-23]
Replace calls to rxe_run_task() with rxe_sched_task(). This prevents the tasks from all running on the same cpu. This change slightly reduces performance for single qp send and write benchmarks in loopback mode but greatly improves the performance with multiple qps, because if rxe_run_task() is used all the work tends to be performed on one cpu. For actual on-the-wire benchmarks there is no noticeable performance change. Link: https://lore.kernel.org/r/20240329145513.35381-11-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22  RDMA/rxe: Don't call rxe_requester from rxe_completer  [Bob Pearson, 2 files, +11/-7]
Instead of rescheduling rxe_requester from rxe_completer() just extend the duration of rxe_sender() by one pass. Setting run_requester_again forces rxe_completer() to return 0 which will cause rxe_sender() to be called at least one more time. Link: https://lore.kernel.org/r/20240329145513.35381-10-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
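A hedged sketch of the combined send task implied by this series (function names follow the series description; treat it as illustrative): the task framework keeps calling a work function until it returns nonzero, so a completer that wants the requester to run again simply makes the pass return 0.

    /* Sketch: one pass of the merged send task. */
    static int rxe_sender(struct rxe_qp *qp)
    {
            int req_ret;
            int comp_ret;

            /* process one pass of the send queue */
            req_ret = rxe_requester(qp);

            /* process one pass of the response packet queue */
            comp_ret = rxe_completer(qp);

            /* run again unless both requester and completer are done;
             * run_requester_again forces rxe_completer() to return 0,
             * which guarantees at least one more pass here
             */
            return (req_ret && comp_ret) ? -EAGAIN : 0;
    }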
2024-04-22  RDMA/rxe: Don't schedule rxe_completer from rxe_requester  [Bob Pearson, 2 files, +2/-13]
Now that rxe_completer() is always called serially after rxe_requester() there is no reason to schedule rxe_completer() from rxe_requester(). Link: https://lore.kernel.org/r/20240329145513.35381-9-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22  RDMA/rxe: Remove save/rollback_state in rxe_requester  [Bob Pearson, 1 file, +2/-38]
Now that req.task and comp.task are merged it is no longer necessary to call save_state() before calling rxe_xmit_pkt() and rollback_state() if rxe_xmit_pkt() fails. This was done originally to prevent races between rxe_completer() and rxe_requester() which now cannot happen. Link: https://lore.kernel.org/r/20240329145513.35381-8-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22  RDMA/rxe: Merge request and complete tasks  [Bob Pearson, 10 files, +63/-55]
Currently the rxe driver has three work queue tasks per qp. These are the req.task, comp.task and resp.task which call rxe_requester(), rxe_completer() and rxe_responder() respectively directly or on work queues. Each of these subroutines checks to see if there is work to be performed on the send queue or on the response packet queue or the request packet queue and will run until there is no work remaining or yield the cpu and reschedule itself until there is no work remaining. This commit combines the req.task and comp.task into a single send.task and renames the resp.task to the recv.task. The combined send.task calls rxe_requester() and rxe_completer() serially and continues until all work on both the send queue and the response packet queue are done. In various benchmarks the performance is either improved or left the same. At high scale there is a significant reduction in the load on the cpu. This is the first step in combining these two tasks. Once they are serialized cross rescheduling of req.task and comp.task can be more efficiently handled by just letting the send.task continue to run. This will be done in the next several patches. Link: https://lore.kernel.org/r/20240329145513.35381-7-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22  RDMA/rxe: Remove redundant scheduling of rxe_completer  [Bob Pearson, 1 file, +0/-5]
In rxe_post_send_kernel() if the qp is in the error state after posting the work requests the rxe_completer() task is scheduled. But, the only way to move the qp into the error state is to call rxe_qp_error() which also schedules the rxe_completer() task to drain the queues. Calling it a second time has no effect. This commit removes the redundant call. Link: https://lore.kernel.org/r/20240329145513.35381-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22  RDMA/rxe: Allow good work requests to be executed  [Bob Pearson, 1 file, +5/-1]
A previous commit incorrectly added an 'if(!err)' before scheduling the requester task in rxe_post_send_kernel(). But if there were send wrs successfully added to the send queue before a bad wr they might never get executed. This commit fixes this by scheduling the requester task if any wqes were successfully posted in rxe_post_send_kernel() in rxe_verbs.c. Link: https://lore.kernel.org/r/20240329145513.35381-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Fixes: 5bf944f24129 ("RDMA/rxe: Add error messages") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
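A hedged sketch of the posting loop after the fix (post_one_send() stands in for the per-wr helper, and the task field name is assumed): the requester is kicked whenever at least one wqe made it onto the send queue, even if a later wr in the list failed.

    /* Sketch only, not the upstream diff. */
    static int rxe_post_send_kernel(struct rxe_qp *qp,
                                    const struct ib_send_wr *wr,
                                    const struct ib_send_wr **bad_wr)
    {
            int err = 0;
            int posted = 0;

            while (wr) {
                    err = post_one_send(qp, wr);    /* assumed helper */
                    if (err) {
                            *bad_wr = wr;
                            break;
                    }
                    posted++;
                    wr = wr->next;
            }

            /* kick the requester for the wqes that did get queued */
            if (posted)
                    rxe_sched_task(&qp->req.task);

            return err;
    }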
2024-04-22  RDMA/rxe: Fix seg fault in rxe_comp_queue_pkt  [Bob Pearson, 1 file, +3/-3]
In rxe_comp_queue_pkt() an incoming response packet skb is enqueued to the resp_pkts queue, then a decision is made whether to run the completer task inline or schedule it, and finally the skb is dereferenced to bump a 'hw' performance counter. This is wrong because if the completer task is already running in a separate thread it may have already processed and freed the skb, which can cause a seg fault. This has been observed infrequently in testing at high scale. This patch fixes this by deferring enqueuing the packet until after the counter is accessed. Link: https://lore.kernel.org/r/20240329145513.35381-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Fixes: 0b1e5b99a48b ("IB/rxe: Add port protocol stats") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
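A hedged sketch of the reordering (the counter name and inline-vs-schedule policy are assumptions): everything that touches the skb happens before skb_queue_tail(); once the skb is queued, the completer on another cpu may free it at any moment.

    /* Sketch only, not the upstream diff. */
    static void rxe_comp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
    {
            struct rxe_dev *rxe = SKB_TO_PKT(skb)->rxe;
            int must_sched;

            /* touch the skb while we still own it */
            must_sched = skb_queue_len(&qp->resp_pkts) > 0;
            if (must_sched)
                    rxe_counter_inc(rxe, RXE_CNT_COMPLETER_SCHED);

            /* from here on the skb belongs to the completer */
            skb_queue_tail(&qp->resp_pkts, skb);

            if (must_sched)
                    rxe_sched_task(&qp->comp.task);
            else
                    rxe_run_task(&qp->comp.task);
    }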
2024-04-11  RDMA/rxe: Return the correct errno  [Zhu Yanjun, 1 file, +2/-2]
In the function __rxe_add_to_pool, the function xa_alloc_cyclic is called. The documented return values of xa_alloc_cyclic are:

  "Return: 0 if the allocation succeeded without wrapping. 1 if the allocation succeeded after wrapping, -ENOMEM if memory could not be allocated or -EBUSY if there are no free entries in @limit."

But the function __rxe_add_to_pool currently only returns -EINVAL. The actual error value should be returned to the caller. Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20240408142142.792413-1-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>
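A hedged sketch of the fix (pool field names assumed): propagate whatever xa_alloc_cyclic() returns instead of flattening everything to -EINVAL, while remembering that a positive return (wrapped allocation) is still success.

    /* Sketch: index allocation step of __rxe_add_to_pool(). */
    static int rxe_alloc_index(struct rxe_pool *pool, struct rxe_pool_elem *elem)
    {
            int err;

            err = xa_alloc_cyclic(&pool->xa, &elem->index, NULL, pool->limit,
                                  &pool->next, GFP_KERNEL);
            /* 0 and 1 both mean success; only negative values are errors */
            if (err < 0)
                    return err;     /* pass -ENOMEM/-EBUSY to the caller */

            return 0;
    }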
2024-04-01  RDMA/rxe: Fix the problem "mutex_destroy missing"  [Yanjun.Zhu, 1 file, +2/-0]
When a mutex lock is not used any more, the function mutex_destroy should be called to mark the mutex lock uninitialized. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yanjun.Zhu <yanjun.zhu@linux.dev> Link: https://lore.kernel.org/r/20240314065140.27468-1-yanjun.zhu@linux.dev Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
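A hedged sketch of the pairing this enforces (rxe's usdev_lock is used as the example; the exact lock touched by the patch is an assumption): the cleanup path mirrors the init path.

    static void rxe_dev_init(struct rxe_dev *rxe)
    {
            mutex_init(&rxe->usdev_lock);
    }

    static void rxe_dev_dealloc(struct rxe_dev *rxe)
    {
            /* mark the lock uninitialized; debug kernels flag later use */
            mutex_destroy(&rxe->usdev_lock);
    }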
2024-02-04  RDMA/rxe: Remove unused 'iova' parameter from rxe_mr_init_user  [Guoqing Jiang, 3 files, +3/-3]
This one is not needed since commit 954afc5a8fd8 ("RDMA/rxe: Use members of generic struct in rxe_mr"). Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20240202124144.16033-1-guoqing.jiang@linux.dev Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-01-25  RDMA/rxe: Remove rxe_info from rxe_set_mtu  [Li Zhijian, 1 file, +0/-2]
commit 9ac01f434a1e ("RDMA/rxe: Extend dbg log messages to err and info") added this info message. But it only shows a null device name while rdma_rxe is being loaded, because dev_name(rxe->ib_dev->dev) has not yet been assigned at that point: "(null): rxe_set_mtu: Set mtu to 1024" Remove it to silence this message; check the MTU of the backend link instead if needed. CC: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://lore.kernel.org/r/20240109083253.3629967-2-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-01-25  RDMA/rxe: Improve newline in printing messages  [Li Zhijian, 10 files, +139/-139]
Previously the rxe_{dbg,info,err}() macros appended a built-in newline, but some users added a redundant newline of their own. So remove the built-in newline from these macros. The rxe_{dbg,info,err}_xxx() macros, on the other hand, have no built-in newline, so append a newline when using them. CC: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://lore.kernel.org/r/20240109083253.3629967-1-lizhijian@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-02  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma  [Linus Torvalds, 7 files, +177/-102]
Pull rdma updates from Jason Gunthorpe: "Many small changes across the subsystem, some highlights:

 - Usual driver cleanups in qedr, siw, erdma, hfi1, mlx4/5, irdma, mthca, hns, and bnxt_re
 - siw now works over tunnel and other netdevs with a MAC address by removing assumptions about a MAC/GID from the connection manager
 - "Doorbell Pacing" for bnxt_re - this is a best effort scheme to allow userspace to slow down the doorbell rings if the HW gets full
 - irdma egress VLAN priority, better QP/WQ sizing
 - rxe bug fixes in queue draining and srq resizing
 - Support more ethernet speed options in the core layer
 - DMABUF support for bnxt_re
 - Multi-stage MTT support for erdma to allow much bigger MR registrations
 - A irdma fix with a CVE that came in too late to go to -rc, missing bounds checking for 0 length MRs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (87 commits)
  IB/hfi1: Reduce printing of errors during driver shut down
  RDMA/hfi1: Move user SDMA system memory pinning code to its own file
  RDMA/hfi1: Use list_for_each_entry() helper
  RDMA/mlx5: Fix trailing */ formatting in block comment
  RDMA/rxe: Fix redundant break statement in switch-case.
  RDMA/efa: Fix wrong resources deallocation order
  RDMA/siw: Call llist_reverse_order in siw_run_sq
  RDMA/siw: Correct wrong debug message
  RDMA/siw: Balance the reference of cep->kref in the error path
  Revert "IB/isert: Fix incorrect release of isert connection"
  RDMA/bnxt_re: Fix kernel doc errors
  RDMA/irdma: Prevent zero-length STAG registration
  RDMA/erdma: Implement hierarchical MTT
  RDMA/erdma: Refactor the storage structure of MTT entries
  RDMA/erdma: Renaming variable names and field names of struct erdma_mem
  RDMA/hns: Support hns HW stats
  RDMA/hns: Dump whole QP/CQ/MR resource in raw
  RDMA/irdma: Add missing kernel-doc in irdma_setup_umode_qp()
  RDMA/mlx4: Copy union directly
  RDMA/irdma: Drop unused kernel push code
  ...
2023-08-22  RDMA/rxe: Fix redundant break statement in switch-case.  [Rohit Chavan, 1 file, +0/-1]
Removed unreachable break statement after return. Signed-off-by: Rohit Chavan <roheetchavan@gmail.com> Link: https://lore.kernel.org/r/20230822091304.7312-1-roheetchavan@gmail.com Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-07-31  RDMA/rxe: Fix incomplete state save in rxe_requester  [Bob Pearson, 1 file, +25/-20]
If a send packet is dropped by the IP layer in rxe_requester() the call to rxe_xmit_packet() can fail with err == -EAGAIN. To recover, the state of the wqe is restored to the state before the packet was sent so it can be resent. However, the routines that save and restore the state miss a significant part of the variable state in the wqe: the dma struct, which is used to step through the sge table. Moreover, the state is not saved before the packet is built, which modifies the dma struct. Under heavy stress testing with many QPs on a fast node sending large messages to a slow node, dropped packets are observed and the resent packets are corrupted because the dma struct was not restored. This patch fixes this behavior and allows the test cases to succeed. Fixes: 3050b9985024 ("IB/rxe: Fix race condition between requester and completer") Link: https://lore.kernel.org/r/20230721200748.4604-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-31  RDMA/rxe: Fix rxe_modify_srq  [Bob Pearson, 2 files, +36/-30]
This patch corrects an error in rxe_modify_srq where, if the caller changes the srq size, the actual new value is not returned to the caller even though it may be larger than what was requested. Additionally it open codes the subroutine rcv_wqe_size(), which adds very little value, and makes some whitespace changes. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20230620140142.9452-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-31  RDMA/rxe: Fix unsafe drain work queue code  [Bob Pearson, 2 files, +8/-0]
If create_qp does not fully succeed it is possible for qp cleanup code to attempt to drain the send or recv work queues before the queues have been created causing a seg fault. This patch checks to see if the queues exist before attempting to drain them. Link: https://lore.kernel.org/r/20230620135519.9365-3-rpearsonhpe@gmail.com Reported-by: syzbot+2da1965168e7dbcba136@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-rdma/00000000000012d89205fe7cfe00@google.com/raw Fixes: 49dc9c1f0c7e ("RDMA/rxe: Cleanup reset state handling in rxe_resp.c") Fixes: fbdeb828a21f ("RDMA/rxe: Cleanup error state handling in rxe_comp.c") Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
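A hedged sketch of the guard (the drain helper names are assumptions): qp cleanup simply skips queues that were never created.

    /* Sketch: only drain queues that actually exist. */
    static void rxe_qp_drain_queues(struct rxe_qp *qp)
    {
            if (qp->sq.queue)
                    drain_send_queue(qp);   /* hypothetical drain helper */

            if (qp->rq.queue)
                    drain_recv_queue(qp);   /* hypothetical drain helper */
    }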
2023-07-31  RDMA/rxe: Move work queue code to subroutines  [Bob Pearson, 1 file, +108/-51]
This patch:
 - Moves code to initialize a qp send work queue to a subroutine named rxe_init_sq.
 - Moves code to initialize a qp recv work queue to a subroutine named rxe_init_rq.
 - Moves initialization of qp request and response packet queues ahead of work queue initialization so that cleanup of a qp if it is not fully completed can successfully attempt to drain the packet queues without a seg fault.
 - Makes minor whitespace cleanups.
Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20230620135519.9365-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-07-18  RDMA/rxe: Fix an error handling path in rxe_bind_mw()  [Christophe JAILLET, 1 file, +2/-1]
All errors go to the error handling path, except this one. Be consistent and also branch to it. Fixes: 02ed253770fb ("RDMA/rxe: Introduce rxe access supported flags") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/43698d8a3ed4e720899eadac887427f73d7ec2eb.1689623735.git.christophe.jaillet@wanadoo.fr Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-06-30  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma  [Linus Torvalds, 15 files, +198/-84]
Pull rdma updates from Jason Gunthorpe: "This cycle saw a focus on rxe and bnxt_re drivers:

 - Code cleanups for irdma, rxe, rtrs, hns, vmw_pvrdma
 - rxe uses workqueues instead of tasklets
 - rxe has better compliance around access checks for MRs and rereg_mr
 - mana supports the 'v2' FW interface for RX coalescing
 - hfi1 bug fix for stale cache entries in its MR cache
 - mlx5 bug fix to handle FW failures when destroying QPs
 - erdma HW has a new doorbell allocation mechanism for uverbs that is secure
 - Lots of small cleanups and rework in bnxt_re:
    - Use the common mmap functions
    - Support disassociation
    - Improve FW command flow
    - support for 'low latency push'"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (71 commits)
  RDMA/bnxt_re: Fix an IS_ERR() vs NULL check
  RDMA/bnxt_re: Fix spelling mistake "priviledged" -> "privileged"
  RDMA/bnxt_re: Remove duplicated include in bnxt_re/main.c
  RDMA/bnxt_re: Refactor code around bnxt_qplib_map_rc()
  RDMA/bnxt_re: Remove incorrect return check from slow path
  RDMA/bnxt_re: Enable low latency push
  RDMA/bnxt_re: Reorg the bar mapping
  RDMA/bnxt_re: Move the interface version to chip context structure
  RDMA/bnxt_re: Query function capabilities from firmware
  RDMA/bnxt_re: Optimize the bnxt_re_init_hwrm_hdr usage
  RDMA/bnxt_re: Add disassociate ucontext support
  RDMA/bnxt_re: Use the common mmap helper functions
  RDMA/bnxt_re: Initialize opcode while sending message
  RDMA/cma: Remove NULL check before dev_{put, hold}
  RDMA/rxe: Simplify cq->notify code
  RDMA/rxe: Fixes mr access supported list
  RDMA/bnxt_re: optimize the parameters passed to helper functions
  RDMA/bnxt_re: remove redundant cmdq_bitmap
  RDMA/bnxt_re: use firmware provided max request timeout
  RDMA/bnxt_re: cancel all control path command waiters upon error
  ...
2023-06-27  Merge tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu  [Linus Torvalds, 1 file, +1/-1]
Pull RCU updates from Paul McKenney: "Documentation updates

 Miscellaneous fixes, perhaps most notably:

 - Remove RCU_NONIDLE(). The new visibility of most of the idle loop to RCU has obsoleted this API.
 - Make the RCU_SOFTIRQ callback-invocation time limit also apply to the rcuc kthreads that invoke callbacks for CONFIG_PREEMPT_RT.
 - Add a jiffies-based callback-invocation time limit to handle long-running callbacks. (The local_clock() function is only invoked once per 32 callbacks due to its high overhead.)
 - Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs, which fixes a bug that can occur on systems with non-contiguous CPU numbering.

 kvfree_rcu updates:

 - Eliminate the single-argument variant of k[v]free_rcu() now that all uses have been converted to k[v]free_rcu_mightsleep().
 - Add WARN_ON_ONCE() checks for k[v]free_rcu*() freeing callbacks too soon. Yes, this is closing the barn door after the horse has escaped, but Murphy says that there will be more horses.

 Callback-offloading updates:

 - Fix a number of bugs involving the shrinker and lazy callbacks.

 Tasks RCU updates

 Torture-test updates"

* tag 'rcu.2023.06.22a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (32 commits)
  torture: Remove duplicated argument -enable-kvm for ppc64
  doc/rcutorture: Add description of rcutorture.stall_cpu_block
  rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale
  rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup()
  rcutorture: Correct name of use_softirq module parameter
  locktorture: Add long_hold to adjust lock-hold delays
  rcu/nocb: Make shrinker iterate only over NOCB CPUs
  rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs
  rcu: Make rcu_cpu_starting() rely on interrupts being disabled
  rcu: Mark rcu_cpu_kthread() accesses to ->rcu_cpu_has_work
  rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp
  rcu: Employ jiffies-based backstop to callback time limit
  rcu: Check callback-invocation time limit for rcuc kthreads
  rcu: Remove RCU_NONIDLE()
  rcu: Add more RCU files to kernel-api.rst
  rcu-tasks: Clarify the cblist_init_generic() function's pr_info() output
  rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic()
  rcu/nocb: Recheck lazy callbacks under the ->nocb_lock from shrinker
  rcu/nocb: Fix shrinker race against callback enqueuer
  rcu/nocb: Protect lazy shrinker against concurrent (de-)offloading
  ...
2023-06-27  Merge tag 'v6.4' into rdma.git for-next  [Jason Gunthorpe, 8 files, +100/-68]
Linux 6.4 Resolve conflicts between rdma rc and next in rxe_cq matching linux-next: drivers/infiniband/sw/rxe/rxe_cq.c: https://lore.kernel.org/r/20230622115246.365d30ad@canb.auug.org.au Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-20  RDMA/rxe: Simplify cq->notify code  [Bob Pearson, 2 files, +3/-6]
The flags parameter to the request notify verb is a bitmask, but the rxe driver treats cq->notify as an int. If someone ever set both the IB_CQ_SOLICITED and the IB_CQ_NEXT_COMP bits, rxe_cq_post could fail to generate a completion event. This patch treats the notify flags as a bit mask consistently and handles the above case correctly. Link: https://lore.kernel.org/r/20230612162244.20038-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-20  RDMA/rxe: Fixes mr access supported list  [Bob Pearson, 1 file, +2/-1]
A recent patch incorrectly did not include IB_ACCESS_RELAXED_ORDERING in the list of supported access flags for the rxe driver. The driver actually does nothing related to relaxed ordering, but it causes no problems to accept the flag as supported with no effect. The omission caused ib_send_bw and friends to not run correctly. The correct approach is for the driver to allow any of the optional access flags and otherwise ignore them. This patch adds IB_ACCESS_OPTIONAL to the list of rxe supported flags. Fixes: 02ed253770fb ("RDMA/rxe: Introduce rxe access supported flags") Link: https://lore.kernel.org/r/20230613171654.19334-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-14  RDMA/rxe: Fix rxe_cq_post  [Bob Pearson, 1 file, +2/-2]
A recent patch replaced a tasklet execution of cq->comp_handler by a direct call. While this made sense, it left changes to cq->notify state unprotected and assumed that the cq completion machinery and the ulp done callbacks were reentrant. The result is that in some cases completion events can be lost. This patch moves the cq->comp_handler call inside of the spinlock in rxe_cq_post, which solves both issues. This is compatible with the matching code in the request notify verb. Fixes: 78b26a335310 ("RDMA/rxe: Remove tasklet call from rxe_cq.c") Link: https://lore.kernel.org/r/20230612155032.17036-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
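A hedged sketch of the resulting rxe_cq_post() shape (queue_cqe() is a stand-in for the real queue-producer code): the comp_handler call sits inside cq->cq_lock, and the notify check from the entry above tests bits rather than comparing the whole word.

    /* Sketch only, not the upstream diff. */
    int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
    {
            unsigned long flags;

            spin_lock_irqsave(&cq->cq_lock, flags);

            queue_cqe(cq, cqe);     /* hypothetical: produce the completion */

            /* cq->notify is a bitmask; both bits may be set at once */
            if ((cq->notify & IB_CQ_NEXT_COMP) ||
                ((cq->notify & IB_CQ_SOLICITED) && solicited)) {
                    cq->notify = 0;
                    cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);
            }

            spin_unlock_irqrestore(&cq->cq_lock, flags);

            return 0;
    }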
2023-06-09  RDMA/rxe: Send last wqe reached event on qp cleanup  [Bob Pearson, 1 file, +10/-1]
The IBA requires: o11-5.2.5: If the HCA supports SRQ, for RC and UD service, the CI shall generate a Last WQE Reached Affiliated Asynchronous Event on a QP that is in the Error State and is associated with an SRQ when either:
 - a CQE is generated for the last WQE, or
 - the QP gets in the Error State and there are no more WQEs on the RQ.
This patch implements this behavior in flush_recv_queue(), which is called as a result of rxe_qp_error() being invoked whenever the qp is put into the error state. The rxe responder executes SRQ WQEs directly from the SRQ, so there are never more WQEs on the RQ. Link: https://lore.kernel.org/r/20230602164229.9277-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
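A hedged sketch of raising the affiliated asynchronous event (the wrapper name is an assumption; the ib_event plumbing is the standard verbs mechanism):

    /* Sketch: raise IB_EVENT_QP_LAST_WQE_REACHED for an SRQ-associated qp. */
    static void rxe_send_last_wqe_event(struct rxe_qp *qp)
    {
            struct ib_event ev = {
                    .device     = qp->ibqp.device,
                    .element.qp = &qp->ibqp,
                    .event      = IB_EVENT_QP_LAST_WQE_REACHED,
            };

            if (qp->ibqp.event_handler)
                    qp->ibqp.event_handler(&ev, qp->ibqp.qp_context);
    }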
2023-06-09  RDMA/rxe: Fix the use-before-initialization error of resp_pkts  [Zhu Yanjun, 1 file, +3/-4]
The following use-before-initialization problem was observed:

Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
 assign_lock_key kernel/locking/lockdep.c:982 [inline]
 register_lock_class+0xdb6/0x1120 kernel/locking/lockdep.c:1295
 __lock_acquire+0x10a/0x5df0 kernel/locking/lockdep.c:4951
 lock_acquire kernel/locking/lockdep.c:5691 [inline]
 lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
 skb_dequeue+0x20/0x180 net/core/skbuff.c:3639
 drain_resp_pkts drivers/infiniband/sw/rxe/rxe_comp.c:555 [inline]
 rxe_completer+0x250d/0x3cc0 drivers/infiniband/sw/rxe/rxe_comp.c:652
 rxe_qp_do_cleanup+0x1be/0x820 drivers/infiniband/sw/rxe/rxe_qp.c:761
 execute_in_process_context+0x3b/0x150 kernel/workqueue.c:3473
 __rxe_cleanup+0x21e/0x370 drivers/infiniband/sw/rxe/rxe_pool.c:233
 rxe_create_qp+0x3f6/0x5f0 drivers/infiniband/sw/rxe/rxe_verbs.c:583

It happens because rxe_qp_do_cleanup is called during error unwind before the struct has been fully initialized. Move the initialization of the skb queues earlier. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20230602035408.741534-1-yanjun.zhu@intel.com Reported-by: syzbot+eba589d8f49c73d356da@syzkaller.appspotmail.com Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-09  RDMA/rxe: Implement rereg_user_mr  [Bob Pearson, 2 files, +40/-0]
Implement the two easy cases of ib_rereg_user_mr. Link: https://lore.kernel.org/r/20230530221334.89432-7-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-09  RDMA/rxe: Let rkey == lkey for local access  [Bob Pearson, 1 file, +8/-7]
In order to conform to other drivers stop using rkey == 0 as an indication that there are no remote access flags set. Set rkey == lkey by default for all MRs. Link: https://lore.kernel.org/r/20230530221334.89432-6-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-09  RDMA/rxe: Introduce rxe access supported flags  [Bob Pearson, 4 files, +30/-3]
Introduce supported bit masks for setting the access attributes of MWs, MRs, and QPs. Check these when attributes are set. Link: https://lore.kernel.org/r/20230530221334.89432-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
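A hedged sketch of the idea (the mask name follows the series subjects, but its exact contents and the check helper are assumptions): each object type advertises the access bits it supports, and setting any other bit is rejected.

    /* Sketch: per-object supported-access mask and check. */
    enum {
            RXE_ACCESS_SUPPORTED_MR = IB_ACCESS_LOCAL_WRITE |
                                      IB_ACCESS_REMOTE_READ |
                                      IB_ACCESS_REMOTE_WRITE |
                                      IB_ACCESS_REMOTE_ATOMIC |
                                      IB_ACCESS_MW_BIND |
                                      IB_ACCESS_OPTIONAL,
    };

    static int rxe_check_access(int access, int supported)
    {
            /* reject any bit the object type does not support */
            return (access & ~supported) ? -EOPNOTSUPP : 0;
    }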
2023-06-09  RDMA/rxe: Fix access checks in rxe_check_bind_mw  [Bob Pearson, 1 file, +9/-8]
The subroutine rxe_check_bind_mw() in rxe_mw.c performs checks on the mw access flags before they are set so they always succeed. This patch instead checks the access flags passed in the send wqe. Fixes: 32a577b4c3a9 ("RDMA/rxe: Add support for bind MW work requests") Link: https://lore.kernel.org/r/20230530221334.89432-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-09  RDMA/rxe: Optimize send path in rxe_resp.c  [Bob Pearson, 2 files, +13/-2]
Bypass calling check_rkey() in rxe_resp.c for non-rdma messages. Link: https://lore.kernel.org/r/20230530221334.89432-3-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-09  RDMA/rxe: Rename IB_ACCESS_REMOTE  [Bob Pearson, 2 files, +9/-7]
Rename IB_ACCESS_REMOTE to RXE_ACCESS_REMOTE and move to rxe_verbs.h as an enum instead of a #define. Shouldn't use IB_xxx for rxe symbols. Link: https://lore.kernel.org/r/20230530221334.89432-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-01  RDMA/rxe: Fix ref count error in check_rkey()  [Bob Pearson, 1 file, +2/-1]
There is a reference count error in error path code and a potential race in check_rkey() in rxe_resp.c. When looking up the rkey for a memory window, the reference to the mw from rxe_lookup_mw() is dropped before a reference is taken on the mr referenced by the mw. If the mr is destroyed immediately after the call to rxe_put(mw), the mr pointer is unprotected and may end up pointing at freed memory. The rxe_get(mr) call should take place before the rxe_put(mw) call. Additionally, every error path in check_rkey() calls rxe_put(mw) when mw is not NULL, but at that point rxe_put(mw) has already been called once; the mw pointer should be set to NULL after the rxe_put(mw) call to prevent a double put. Fixes: cdd0b85675ae ("RDMA/rxe: Implement memory access through MWs") Link: https://lore.kernel.org/r/20230517211509.1819998-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
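A hedged sketch of the corrected ordering in check_rkey()'s memory-window branch (the helper is hypothetical; the caller would then NULL its mw pointer so shared error paths cannot put it twice):

    /* Sketch: resolve a memory window to its underlying mr safely. */
    static struct rxe_mr *rxe_mw_to_mr(struct rxe_mw *mw)
    {
            struct rxe_mr *mr = mw->mr;

            rxe_get(mr);    /* take the mr ref while the mw ref still protects it */
            rxe_put(mw);    /* only now is it safe to drop the mw ref */

            return mr;
    }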
2023-06-01  RDMA/rxe: Fix packet length checks  [Bob Pearson, 1 file, +6/-0]
In rxe_net.c a received packet, from udp or loopback, is passed to rxe_rcv() in rxe_recv.c as a udp packet. I.e. skb->data is pointing at the udp header. But rxe_rcv() makes length checks to verify the packet is long enough to hold the roce headers as if it were a roce packet. I.e. skb->data pointing at the bth header. A runt packet would appear to have 8 more bytes than it actually does which may lead to incorrect behavior. This patch calls skb_pull() to adjust the skb to point at the bth header before calling rxe_rcv() which fixes this error. Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20230517172242.1806340-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
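A hedged, stripped-down sketch of the fix in the udp receive path (the real function performs encapsulation checks omitted here):

    static int rxe_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
    {
            /* skb->data currently points at the udp header; advance it
             * so rxe_rcv()'s length checks see the bth as they expect
             */
            skb_pull(skb, sizeof(struct udphdr));

            rxe_rcv(skb);

            return 0;
    }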
2023-06-01  RDMA/rxe: Remove dangling declaration of rxe_cq_disable()  [Nicolas Morey, 1 file, +0/-2]
rxe_cq_disable() has been removed but not its declaration. Fixes: 78b26a335310 ("RDMA/rxe: Remove tasklet call from rxe_cq.c") Link: https://lore.kernel.org/r/4f20ffc5-b2c4-0c11-2883-a835caf01a94@suse.com Signed-off-by: Nicolas Morey <nmorey@suse.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-05-19  RDMA/rxe: Fix comments about removed tasklets  [Daisuke Matsuda, 4 files, +4/-4]
The commit 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe tasks") removed tasklets and replaced them with a workqueue, but relevant comments are still remaining in the source code. Fixes: 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe tasks") Link: https://lore.kernel.org/r/20230518070027.942715-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-05-17  RDMA/rxe: Add workqueue support for rxe tasks  [Bob Pearson, 3 files, +76/-49]
Replace tasklets with work queues for the three main rxe tasks: rxe_requester, rxe_completer and rxe_responder. Work queues are a more modern way to process work from an IRQ and provide more control over how that work is run for future patches. Link: https://lore.kernel.org/r/20230428171321.5774-1-rpearsonhpe@gmail.com Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Tested-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-05-17  RDMA/rxe: Convert spin_{lock_bh,unlock_bh} to spin_{lock_irqsave,unlock_irqrestore}  [Guoqing Jiang, 7 files, +86/-61]
We need to call spin_lock_irqsave()/spin_unlock_irqrestore() for state_lock in rxe, otherwise the callchain:

  ib_post_send_mad
    -> spin_lock_irqsave
    -> ib_post_send -> rxe_post_send
      -> spin_lock_bh
      -> spin_unlock_bh
    -> spin_unlock_irqrestore

causes the traces below when running the blktests nvmeof-mp/001 test, due to mismatched spinlock nesting:

WARNING: CPU: 0 PID: 94794 at kernel/softirq.c:376 __local_bh_enable_ip+0xc2/0x140
[ ... ]
CPU: 0 PID: 94794 Comm: kworker/u4:1 Tainted: G E 6.4.0-rc1 #9
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
Workqueue: rdma_cm cma_work_handler [rdma_cm]
RIP: 0010:__local_bh_enable_ip+0xc2/0x140
[ register and code dump omitted as in the original elision ]
Call Trace:
 <TASK>
 _raw_spin_unlock_bh+0x31/0x40
 rxe_post_send+0x59/0x8b0 [rdma_rxe]
 ib_send_mad+0x26b/0x470 [ib_core]
 ib_post_send_mad+0x150/0xb40 [ib_core]
 ? cm_form_tid+0x5b/0x90 [ib_cm]
 ib_send_cm_req+0x7c8/0xb70 [ib_cm]
 rdma_connect_locked+0x433/0x940 [rdma_cm]
 nvme_rdma_cm_handler+0x5d7/0x9c0 [nvme_rdma]
 cma_cm_event_handler+0x4f/0x170 [rdma_cm]
 cma_work_handler+0x6a/0xe0 [rdma_cm]
 process_one_work+0x2a9/0x580
 worker_thread+0x52/0x3f0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x109/0x140
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>

raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 0 PID: 94794 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x37/0x60
[ ... ]
CPU: 0 PID: 94794 Comm: kworker/u4:1 Tainted: G W E 6.4.0-rc1 #9
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
Workqueue: rdma_cm cma_work_handler [rdma_cm]
RIP: 0010:warn_bogus_irq_restore+0x37/0x60
[ register and code dump omitted as in the original elision ]
Call Trace:
 <TASK>
 _raw_spin_unlock_irqrestore+0x91/0xa0
 ib_send_mad+0x1e3/0x470 [ib_core]
 ib_post_send_mad+0x150/0xb40 [ib_core]
 ? cm_form_tid+0x5b/0x90 [ib_cm]
 ib_send_cm_req+0x7c8/0xb70 [ib_cm]
 rdma_connect_locked+0x433/0x940 [rdma_cm]
 nvme_rdma_cm_handler+0x5d7/0x9c0 [nvme_rdma]
 cma_cm_event_handler+0x4f/0x170 [rdma_cm]
 cma_work_handler+0x6a/0xe0 [rdma_cm]
 process_one_work+0x2a9/0x580
 worker_thread+0x52/0x3f0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x109/0x140
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>

Fixes: f605f26ea196 ("RDMA/rxe: Protect QP state with qp->state_lock") Link: https://lore.kernel.org/r/20230510035056.881196-1-guoqing.jiang@linux.dev Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
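A hedged sketch of the conversion pattern applied across those seven files (the helper is hypothetical; only the lock-API swap is the point):

    /* Sketch: qp state accessors must be irq-safe, since the same verbs
     * can be entered with interrupts already disabled (ib_post_send_mad).
     */
    static bool rxe_qp_is_error(struct rxe_qp *qp)
    {
            unsigned long flags;
            bool err_state;

            spin_lock_irqsave(&qp->state_lock, flags);      /* was spin_lock_bh() */
            err_state = qp->attr.qp_state == IB_QPS_ERR;
            spin_unlock_irqrestore(&qp->state_lock, flags); /* was spin_unlock_bh() */

            return err_state;
    }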
2023-05-16  RDMA/rxe: Fix double unlock in rxe_qp.c  [Bob Pearson, 1 file, +2/-1]
A recent patch can cause a double spin_unlock_bh() in rxe_qp_to_attr() at line 715 in rxe_qp.c. Move the 2nd unlock into the if statement. Fixes: f605f26ea196 ("RDMA/rxe: Protect QP state with qp->state_lock") Link: https://lore.kernel.org/r/20230515201056.1591140-1-rpearsonhpe@gmail.com Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/27773078-40ce-414f-8b97-781954da9f25@kili.mountain Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-21  RDMA/rxe: Fix spinlock recursion deadlock on requester  [Daisuke Matsuda, 1 file, +3/-3]
The following deadlock is observed:

Call Trace:
 <IRQ>
 _raw_spin_lock_bh+0x29/0x30
 check_type_state.constprop.0+0x4e/0xc0 [rdma_rxe]
 rxe_rcv+0x173/0x3d0 [rdma_rxe]
 rxe_udp_encap_recv+0x69/0xd0 [rdma_rxe]
 ? __pfx_rxe_udp_encap_recv+0x10/0x10 [rdma_rxe]
 udp_queue_rcv_one_skb+0x258/0x520
 udp_unicast_rcv_skb+0x75/0x90
 __udp4_lib_rcv+0x364/0x5c0
 ip_protocol_deliver_rcu+0xa7/0x160
 ip_local_deliver_finish+0x73/0xa0
 ip_sublist_rcv_finish+0x80/0x90
 ip_sublist_rcv+0x191/0x220
 ip_list_rcv+0x132/0x160
 __netif_receive_skb_list_core+0x297/0x2c0
 netif_receive_skb_list_internal+0x1c5/0x300
 napi_complete_done+0x6f/0x1b0
 virtnet_poll+0x1f4/0x2d0 [virtio_net]
 __napi_poll+0x2c/0x1b0
 net_rx_action+0x293/0x350
 ? __napi_schedule+0x79/0x90
 __do_softirq+0xcb/0x2ab
 __irq_exit_rcu+0xb9/0xf0
 common_interrupt+0x80/0xa0
 </IRQ>
 <TASK>
 asm_common_interrupt+0x22/0x40
 RIP: 0010:_raw_spin_lock+0x17/0x30
 rxe_requester+0xe4/0x8f0 [rdma_rxe]
 ? xas_load+0x9/0xa0
 ? xa_load+0x70/0xb0
 do_task+0x64/0x1f0 [rdma_rxe]
 rxe_post_send+0x54/0x110 [rdma_rxe]
 ib_uverbs_post_send+0x5f8/0x680 [ib_uverbs]
 ? netif_receive_skb_list_internal+0x1e3/0x300
 ib_uverbs_write+0x3c8/0x500 [ib_uverbs]
 vfs_write+0xc5/0x3b0
 ksys_write+0xab/0xe0
 ? syscall_trace_enter.constprop.0+0x126/0x1a0
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
 </TASK>

The deadlock is easily reproducible with perftest. Fix it by disabling softirq when acquiring the lock in process context. Fixes: f605f26ea196 ("RDMA/rxe: Protect QP state with qp->state_lock") Link: https://lore.kernel.org/r/20230418090642.1849358-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17  RDMA/rxe: Protect QP state with qp->state_lock  [Bob Pearson, 7 files, +317/-218]
Currently the rxe driver makes little effort to make the changes to qp state (which includes qp->attr.qp_state, qp->attr.sq_draining and qp->valid) atomic between different client threads and IO threads. In particular a common template is for an RDMA application to call ib_modify_qp() to move a qp to ERR state and then wait until all the packet and work queues have drained before calling ib_destroy_qp(). None of these state changes are protected by locks to assure that the changes are executed atomically and that memory barriers are included. This has been observed to lead to incorrect behavior around qp cleanup. This patch continues the work of the previous patches in this series and adds locking code around qp state changes and lookups. Link: https://lore.kernel.org/r/20230405042611.6467-5-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>