summaryrefslogtreecommitdiff
path: root/drivers/vdpa/mlx5
AgeCommit message (Collapse)AuthorFilesLines
2023-02-21vdpa/mlx5: support device features provisioningSi-Wei Liu1-8/+45
This patch implements features provisioning for mlx5_vdpa. 1) Validate the provisioned features are a subset of the parent features. 2) Clearing features that are not wanted by userspace. For example: # vdpa mgmtdev show pci/0000:41:04.2: supported_classes net max_supported_vqs 65 dev_features CSUM GUEST_CSUM MTU MAC HOST_TSO4 HOST_TSO6 STATUS CTRL_VQ CTRL_VLAN MQ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM 1) Provision vDPA device with all features derived from the parent # vdpa dev add name vdpa1 mgmtdev pci/0000:41:04.2 # vdpa dev config show vdpa1: mac e4:11:c6:d3:45:f0 link up link_announce false max_vq_pairs 1 mtu 1500 negotiated_features CSUM GUEST_CSUM MTU HOST_TSO4 HOST_TSO6 STATUS CTRL_VQ CTRL_VLAN MQ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM 2) Provision vDPA device with a subset of parent features # vdpa dev add name vdpa1 mgmtdev pci/0000:41:04.2 device_features 0x300020000 # vdpa dev config show vdpa1: negotiated_features CTRL_VQ VERSION_1 ACCESS_PLATFORM Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Message-Id: <1675725124-7375-7-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-21vdpa/mlx5: make MTU/STATUS presence conditional on feature bitsSi-Wei Liu1-8/+14
The spec says: mtu only exists if VIRTIO_NET_F_MTU is set status only exists if VIRTIO_NET_F_STATUS is set We should only present MTU and STATUS conditionally depending on the feature bits. Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Message-Id: <1675725124-7375-6-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-21vdpa/mlx5: Initialize CVQ iotlb spinlockEli Cohen1-0/+1
Initialize itolb spinlock. Fixes: 5262912ef3cf ("vdpa/mlx5: Add support for control VQ and MAC setting") Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230206122016.1149373-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-21vdpa/mlx5: Don't clear mr struct on destroy MREli Cohen1-1/+0
Clearing the mr struct erases the lock owner and causes warnings to be emitted. It is not required to clear the mr so remove the memset call. Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code") Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230206121956.1149356-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-21vdpa/mlx5: Directly assign memory keyEli Cohen1-1/+1
When creating a memory key, the key value should be assigned to the passed pointer and not or'ed to. No functional issue was observed due to this bug. Fixes: 29064bfdabd5 ("vdpa/mlx5: Add support library for mlx5 VDPA implementation") Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230205072906.1108194-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-21vdpa: mlx5: support per virtqueue dma deviceJason Wang1-0/+11
This patch implements per virtqueue dma device for mlx5_vdpa. This is needed for virtio_vdpa to work for CVQ which is backed by vringh but not DMA. We simply advertise the vDPA device itself as the DMA device for CVQ then DMA API can simply use PA so the identical mapping for CVQ can still be used. Otherwise the identical (1:1) mapping won't work when platform IOMMU is enabled since the IOVA is allocated on demand which is not necessarily the PA. This fixes the following crash when mlx5 vDPA device is bound to virtio-vdpa with platform IOMMU enabled but not in passthrough mode: BUG: unable to handle page fault for address: ff2fb3063deb1002 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 1393001067 P4D 1393002067 PUD 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 55 PID: 8923 Comm: kworker/u112:3 Kdump: loaded Not tainted 6.1.0+ #7 Hardware name: Dell Inc. PowerEdge R750/0PJ80M, BIOS 1.5.4 12/17/2021 Workqueue: mlx5_vdpa_wq mlx5_cvq_kick_handler [mlx5_vdpa] RIP: 0010:vringh_getdesc_iotlb+0x93/0x1d0 [vringh] Code: 14 25 40 ef 01 00 83 82 c0 0a 00 00 01 48 2b 05 93 5a 1b ea 8b 4c 24 14 48 c1 f8 06 48 c1 e0 0c 48 03 05 90 5a 1b ea 48 01 c8 <0f> b7 00 83 aa c0 0a 00 00 01 65 ff 0d bc e4 41 3f 0f 84 05 01 00 RSP: 0018:ff46821ba664fdf8 EFLAGS: 00010282 RAX: ff2fb3063deb1002 RBX: 0000000000000a20 RCX: 0000000000000002 RDX: ff2fb318d2f94380 RSI: 0000000000000002 RDI: 0000000000000001 RBP: ff2fb3065e832410 R08: ff46821ba664fe00 R09: 0000000000000001 R10: 0000000000000000 R11: 000000000000000d R12: ff2fb3065e832488 R13: ff2fb3065e8324a8 R14: ff2fb3065e8324c8 R15: ff2fb3065e8324a8 FS: 0000000000000000(0000) GS:ff2fb3257fac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ff2fb3063deb1002 CR3: 0000001392010006 CR4: 0000000000771ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> mlx5_cvq_kick_handler+0x89/0x2b0 [mlx5_vdpa] process_one_work+0x1e2/0x3b0 ? rescuer_thread+0x390/0x390 worker_thread+0x50/0x3a0 ? rescuer_thread+0x390/0x390 kthread+0xd6/0x100 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Reviewed-by: Eli Cohen <elic@nvidia.com> Tested-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230119061525.75068-6-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-21vdpa/mlx5: Add RX counters to debugfsEli Cohen3-30/+207
For each interface, either VLAN tagged or untagged, add two hardware counters: one for unicast and another for multicast. The counters count RX packets and bytes and can be read through debugfs: $ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/mcast/packets $ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/ucast/bytes This feature is controlled via the config option MLX5_VDPA_STEERING_DEBUG. It is off by default as it may have some impact on performance. includes a fixup By Yang Yingliang <yangyingliang@huawei.com>: vdpa/mlx5: fix check wrong pointer in mlx5_vdpa_add_mac_vlan_rules() The local variable 'rule' is not used anymore, fix return value check after calling mlx5_add_flow_rules(). Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20221114131759.57883-9-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Message-Id: <20230104074418.1737510-1-yangyingliang@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eli Cohen <elic@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-21vdpa/mlx5: Add debugfs subtreeEli Cohen4-1/+87
Add debugfs subtree and expose flow table ID and TIR number. This information can be used by external tools to do extended troubleshooting. The information can be retrieved like so: $ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/table_id $ cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/tirn Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20221114131759.57883-8-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-21vdpa/mlx5: Move some definitions to a new header fileEli Cohen2-44/+56
Move some definitions from mlx5_vnet.c to newly added header file mlx5_vnet.h. We need these definitions for the following patches that add debugfs tree to expose information vital for debug. Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20221114131759.57883-7-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-12-28RDMA/mlx5: remove variable iColin Ian King1-2/+0
Variable i is just being incremented and it's never used anywhere else. The variable and the increment are redundant so remove it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Message-Id: <20221024133756.2158497-1-colin.i.king@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-12-28vdpa/mlx5: Avoid overwriting CVQ iotlbEli Cohen3-59/+39
When qemu uses different address spaces for data and control virtqueues, the current code would overwrite the control virtqueue iotlb through the dup_iotlb call. Fix this by referring to the address space identifier and the group to asid mapping to determine which mapping needs to be updated. We also move the address space logic from mlx5 net to core directory. Reported-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20221114131759.57883-6-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2022-12-28vdpa/mlx5: Avoid using reslock in event_handlerEli Cohen1-12/+4
event_handler runs under atomic context and may not acquire reslock. We can still guarantee that the handler won't be called after suspend by clearing nb_registered, unregistering the handler and flushing the workqueue. Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20221114131759.57883-5-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-12-28vdpa/mlx5: Fix wrong mac address deletionEli Cohen1-1/+1
Delete the old MAC from the table and not the new one which is not there yet. Fixes: baf2ad3f6a98 ("vdpa/mlx5: Add RX MAC VLAN filter support") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20221114131759.57883-4-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-12-28vdpa/mlx5: Return error on vlan ctrl commands if not supportedEli Cohen1-0/+3
Check if VIRTIO_NET_F_CTRL_VLAN is negotiated and return error if control VQ command is received. Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20221114131759.57883-3-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2022-12-28vdpa/mlx5: Fix rule forwarding VLAN to TIREli Cohen1-3/+5
Set the VLAN id to the header values field instead of overwriting the headers criteria field. Before this fix, VLAN filtering would not really work and tagged packets would be forwarded unfiltered to the TIR. Fixes: baf2ad3f6a98 ("vdpa/mlx5: Add RX MAC VLAN filter support") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20221114131759.57883-2-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-09-28vdpa/mlx5: Fix MQ to support non power of two num queuesEli Cohen1-7/+10
RQT objects require that a power of two value be configured for both rqt_max_size and rqt_actual size. For create_rqt, make sure to round up to the power of two the value of given by the user who created the vdpa device and given by ndev->rqt_size. The actual size is also rounded up to the power of two using the current number of VQs given by ndev->cur_num_vqs. Same goes with modify_rqt where we need to make sure act size is power of two based on the new number of QPs. Without this patch, attempt to create a device with non power of two QPs would result in error from firmware. Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support") Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20220912125019.833708-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vdpa/mlx5: Fix possible uninitialized return valueEli Cohen1-1/+1
Initialize err local variable to return -EAGAIN if the asid cannot be found thus avoiding returning uninitialized value. Fixes: 8fcd20c30704 ("vdpa/mlx5: Support different address spaces for control and data") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20220811134010.952291-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vdpa/mlx5: Support different address spaces for control and dataEli Cohen2-11/+88
Partition virtqueues to two different address spaces: one for control virtqueue which is implemented in software, and one for data virtqueues. Based-on: <20220526124338.36247-1-eperezma@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20220714113927.85729-3-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vdpa/mlx5: Implement susupend virtqueue callbackEli Cohen1-3/+80
Implement the suspend callback allowing to suspend the virtqueues so they stop processing descriptors. This is required to allow to query a consistent state of the virtqueue while live migration is taking place. Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20220714113927.85729-2-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vdpa/mlx5: Use eth_broadcast_addr() to assign broadcast addressXu Qiang1-1/+1
Using eth_broadcast_addr() to assign broadcast address instead of memset(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xu Qiang <xuqiang36@huawei.com> Message-Id: <20220704021405.64545-1-xuqiang36@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-06-24vdpa/mlx5: Initialize CVQ vringh only onceEli Cohen1-11/+20
Currently, CVQ vringh is initialized inside setup_virtqueues() which is called every time a memory update is done. This is undesirable since it resets all the context of the vring, including the available and used indices. Move the initialization to mlx5_vdpa_set_status() when VIRTIO_CONFIG_S_DRIVER_OK is set. Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20220613075958.511064-2-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2022-06-24vdpa/mlx5: Update Control VQ callback informationEli Cohen1-0/+2
The control VQ specific information is stored in the dedicated struct mlx5_control_vq. When the callback is updated through mlx5_vdpa_set_vq_cb(), make sure to update the control VQ struct. Fixes: 5262912ef3cf ("vdpa/mlx5: Add support for control VQ and MAC setting") Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20220613075958.511064-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com)
2022-06-08vdpa/mlx5: clean up indenting in handle_ctrl_vlan()Dan Carpenter1-3/+3
These lines were supposed to be indented. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Message-Id: <Yp71IYMP+QfuCJ8t@kili> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eli Cohen <elic@nvidia.com> Acked-by: Si-Wei Liu <si-wei.liu@oracle.com>
2022-06-08vdpa/mlx5: fix error code for deleting vlanDan Carpenter1-0/+1
Return success if we were able to delete a vlan. The current code always returns failure. Fixes: baf2ad3f6a98 ("vdpa/mlx5: Add RX MAC VLAN filter support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Message-Id: <Yp709f1g9NcMBCHg@kili> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eli Cohen <elic@nvidia.com> Acked-by: Si-Wei Liu <si-wei.liu@oracle.com>
2022-06-08vdpa/mlx5: Fix syntax errors in commentsXiang wangx1-1/+1
Delete the redundant word 'is'. Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Message-Id: <20220604143858.16073-1-wangxiang@cdjrlc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-06-01vdpa/mlx5: Add RX MAC VLAN filter supportEli Cohen1-58/+216
Support HW offloaded filtering of MAC/VLAN packets. To allow that, we add a handler to handle VLAN configurations coming through the control VQ. Two operations are supported. 1. Adding VLAN - in this case, an entry will be added to the RX flow table that will allow the combination of the MAC/VLAN to be forwarded to the TIR. 2. Removing VLAN - will remove the entry from the flow table, effectively blocking such packets from going through. Currently the control VQ does not propagate changes to the MAC of the VLAN device so we always use the MAC of the parent device. Examples: 1. Create vlan device: $ ip link add link ens1 name ens1.8 type vlan id 8 Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20220411122942.225717-4-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-06-01vdpa/mlx5: Remove flow counter from steeringEli Cohen1-18/+6
The flow counter has been introduced in early versions of the driver to aid in debugging. It is no longer needed and can harm performance. Remove it. Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20220411122942.225717-2-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vdpa: multiple address spaces supportGautam Dawar1-2/+3
This patches introduces the multiple address spaces support for vDPA device. This idea is to identify a specific address space via an dedicated identifier - ASID. During vDPA device allocation, vDPA device driver needs to report the number of address spaces supported by the device then the DMA mapping ops of the vDPA device needs to be extended to support ASID. This helps to isolate the environments for the virtqueue that will not be assigned directly. E.g in the case of virtio-net, the control virtqueue will not be assigned directly to guest. As a start, simply claim 1 virtqueue groups and 1 address spaces for all vDPA devices. And vhost-vDPA will simply reject the device with more than 1 virtqueue groups or address spaces. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-7-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vdpa: introduce virtqueue groupsGautam Dawar1-1/+7
This patch introduces virtqueue groups to vDPA device. The virtqueue group is the minimal set of virtqueues that must share an address space. And the address space identifier could only be attached to a specific virtqueue group. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-6-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vdpa/mlx5: Use readers/writers semaphore instead of mutexEli Cohen1-22/+19
Reading statistics could be done intensively and by several processes concurrently. Reader's lock is sufficient in this case. Change reslock from mutex to a rwsem. Suggested-by: Si-Wei Liu <si-wei.liu@oracle.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20220518133804.1075129-7-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vdpa/mlx5: Add support for reading descriptor statisticsEli Cohen2-0/+151
Implement the get_vq_stats calback of vdpa_config_ops to return the statistics for a virtqueue. The statistics are provided as vendor specific statistics where the driver provides a pair of attribute name and attribute value. Currently supported are received descriptors and completed descriptors. Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20220518133804.1075129-6-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-18vdpa/mlx5: Use consistent RQT sizeEli Cohen1-40/+21
The current code evaluates RQT size based on the configured number of virtqueues. This can raise an issue in the following scenario: Assume MQ was negotiated. 1. mlx5_vdpa_set_map() gets called. 2. handle_ctrl_mq() is called setting cur_num_vqs to some value, lower than the configured max VQs. 3. A second set_map gets called, but now a smaller number of VQs is used to evaluate the size of the RQT. 4. handle_ctrl_mq() is called with a value larger than what the RQT can hold. This will emit errors and the driver state is compromised. To fix this, we use a new field in struct mlx5_vdpa_net to hold the required number of entries in the RQT. This value is evaluated in mlx5_vdpa_set_driver_features() where we have the negotiated features all set up. In addition to that, we take into consideration the max capability of RQT entries early when the device is added so we don't need to take consider it when creating the RQT. Last, we remove the use of mlx5_vdpa_max_qps() which just returns the max_vas / 2 and make the code clearer. Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-30vdpa: mlx5: synchronize driver status with CVQJason Wang1-14/+37
Currently, CVQ doesn't have any synchronization with the driver status. Then CVQ emulation code run in the middle of: 1) device reset 2) device status changed 3) map updating The will lead several unexpected issue like trying to execute CVQ command after the driver has been teared down. Fixing this by using reslock to synchronize CVQ emulation code with the driver status changing: - protect the whole device reset, status changing and set_map() updating with reslock - protect the CVQ handler with the reslock and check VIRTIO_CONFIG_S_DRIVER_OK in the CVQ handler This will guarantee that: 1) CVQ handler won't work if VIRTIO_CONFIG_S_DRIVER_OK is not set 2) CVQ handler will see a consistent state of the driver instead of the partial one when it is running in the middle of the teardown_driver() or setup_driver(). Cc: 5262912ef3cfc ("vdpa/mlx5: Add support for control VQ and MAC setting") Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220329042109.4029-2-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eli Cohen <elic@nvidia.com>
2022-03-30vdpa: mlx5: prevent cvq work from hogging CPUJason Wang1-12/+9
A userspace triggerable infinite loop could happen in mlx5_cvq_kick_handler() if userspace keeps sending a huge amount of cvq requests. Fixing this by introducing a quota and re-queue the work if we're out of the budget (currently the implicit budget is one) . While at it, using a per device work struct to avoid on demand memory allocation for cvq. Fixes: 5262912ef3cfc ("vdpa/mlx5: Add support for control VQ and MAC setting") Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220329042109.4029-1-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eli Cohen <elic@nvidia.com>
2022-03-28vdpa/mlx5: Avoid processing works if workqueue was destroyedEli Cohen1-2/+5
If mlx5_vdpa gets unloaded while a VM is running, the workqueue will be destroyed. However, vhost might still have reference to the kick function and might attempt to push new works. This could lead to null pointer dereference. To fix this, set mvdev->wq to NULL just before destroying and verify that the workqueue is not NULL in mlx5_vdpa_kick_vq before attempting to push a new work. Fixes: 5262912ef3cf ("vdpa/mlx5: Add support for control VQ and MAC setting") Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220321141303.9586-1-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-28vdpa/mlx5: re-create forwarding rules after mac modifiedMichael Qiu1-1/+44
When MAC Address has been modified in guest, we only re-add the Mac to mpfs, it is not enough, because the guest network will not work correctly: the reply package from outside will go straight away to the host VF net interface. This patch recreate the flow rules, and make it work correctly. Signed-off-by: Michael Qiu <qiudayu@archeros.com> Link: https://lore.kernel.org/r/1648446492-17614-1-git-send-email-08005325@163.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eli Cohen <elic@nvidia.com>
2022-03-28net/mlx5: Add support for configuring max device MTUEli Cohen1-1/+31
Allow an admin creating a vdpa device to specify the max MTU for the net device. For example, to create a device with max MTU of 1000, the following command can be used: $ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 mtu 1000 This configuration mechanism assumes that vdpa is the sole real user of the function. mlx5_core could theoretically change the mtu of the function using the ip command on the mlx5_core net device but this should not be done. Reviewed-by: Si-Wei Liu<si-wei.liu@oracle.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220221121927.194728-1-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-03-04vdpa/mlx5: add validation for VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET commandSi-Wei Liu1-0/+16
When control vq receives a VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET command request from the driver, presently there is no validation against the number of queue pairs to configure, or even if multiqueue had been negotiated or not is unverified. This may lead to kernel panic due to uninitialized resource for the queues were there any bogus request sent down by untrusted driver. Tie up the loose ends there. Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Link: https://lore.kernel.org/r/1642206481-30721-4-git-send-email-si-wei.liu@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-03-04vdpa/mlx5: should verify CTRL_VQ feature exists for MQSi-Wei Liu1-2/+16
Per VIRTIO v1.1 specification, section 5.1.3.1 Feature bit requirements: "VIRTIO_NET_F_MQ Requires VIRTIO_NET_F_CTRL_VQ". There's assumption in the mlx5_vdpa multiqueue code that MQ must come together with CTRL_VQ. However, there's nowhere in the upper layer to guarantee this assumption would hold. Were there an untrusted driver sending down MQ without CTRL_VQ, it would compromise various spots for e.g. is_index_valid() and is_ctrl_vq_idx(). Although this doesn't end up with immediate panic or security loophole as of today's code, the chance for this to be taken advantage of due to future code change is not zero. Harden the crispy assumption by failing the set_driver_features() call when seeing (MQ && !CTRL_VQ). For that end, verify_min_features() is renamed to verify_driver_features() to reflect the fact that it now does more than just validate the minimum features. verify_driver_features() is now used to accommodate various checks against the driver features for set_driver_features(). Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Link: https://lore.kernel.org/r/1642206481-30721-3-git-send-email-si-wei.liu@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-18Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-59/+97
Pull virtio updates from Michael Tsirkin: "virtio,vdpa,qemu_fw_cfg: features, cleanups, and fixes. - partial support for < MAX_ORDER - 1 granularity for virtio-mem - driver_override for vdpa - sysfs ABI documentation for vdpa - multiqueue config support for mlx5 vdpa - and misc fixes, cleanups" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (42 commits) vdpa/mlx5: Fix tracking of current number of VQs vdpa/mlx5: Fix is_index_valid() to refer to features vdpa: Protect vdpa reset with cf_mutex vdpa: Avoid taking cf_mutex lock on get status vdpa/vdpa_sim_net: Report max device capabilities vdpa: Use BIT_ULL for bit operations vdpa/vdpa_sim: Configure max supported virtqueues vdpa/mlx5: Report max device capabilities vdpa: Support reporting max device capabilities vdpa/mlx5: Restore cur_num_vqs in case of failure in change_num_qps() vdpa: Add support for returning device configuration information vdpa/mlx5: Support configuring max data virtqueue vdpa/mlx5: Fix config_attr_mask assignment vdpa: Allow to configure max data virtqueues vdpa: Read device configuration only if FEATURES_OK vdpa: Sync calls set/get config/status with cf_mutex vdpa/mlx5: Distribute RX virtqueues in RQT object vdpa: Provide interface to read driver features vdpa: clean up get_config_size ret value handling virtio_ring: mark ring unused on error ...
2022-01-15vdpa/mlx5: Fix tracking of current number of VQsEli Cohen1-4/+8
Modify the code such that ndev->cur_num_vqs better reflects the actual number of data virtqueues. The value can be accurately realized after features have been negotiated. This is to prevent possible failures when modifying the RQT object if the cur_num_vqs bears invalid value. No issue was actually encountered but this also makes the code more readable. Fixes: c5a5cd3d3217 ("vdpa/mlx5: Support configuring max data virtqueue") Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220111183400.38418-5-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu<si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-15vdpa/mlx5: Fix is_index_valid() to refer to featuresEli Cohen1-3/+7
Make sure the decision whether an index received through a callback is valid or not consults the negotiated features. The motivation for this was due to a case encountered where I shut down the VM. After the reset operation was called features were already clear, I got get_vq_state() call which caused out array bounds access since is_index_valid() reported the index value. So this is more of not hit a bug since the call shouldn't have been made first place. Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220111183400.38418-4-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu<si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-15vdpa/mlx5: Report max device capabilitiesEli Cohen1-12/+23
Configure max supported virtqueues and features on the management device. This info can be retrieved using: $ vdpa mgmtdev show auxiliary/mlx5_core.sf.1: supported_classes net max_supported_vqs 257 dev_features CSUM GUEST_CSUM MTU HOST_TSO4 HOST_TSO6 STATUS CTRL_VQ MQ \ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-12-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu<si-wei.liu@oracle.com>
2022-01-15vdpa/mlx5: Restore cur_num_vqs in case of failure in change_num_qps()Eli Cohen1-1/+3
Restore ndev->cur_num_vqs to the original value in case change_num_qps() fails. Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support") Reviewed-by: Si-Wei Liu<si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-10-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-15vdpa/mlx5: Support configuring max data virtqueueEli Cohen1-14/+40
Check whether the max number of data virtqueue pairs was provided when a adding a new device and verify the new value does not exceed device capabilities. In addition, change the arrays holding virtqueue and callback contexts to be dynamically allocated. Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-8-elic@nvidia.com Includes fixup: vdpa/mlx5: fix error handling in mlx5_vdpa_dev_add() Clang build fails with mlx5_vnet.c:2574:6: error: variable 'mvdev' is used uninitialized whenever 'if' condition is true if (!ndev->vqs || !ndev->event_cbs) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mlx5_vnet.c:2660:14: note: uninitialized use occurs here put_device(&mvdev->vdev.dev); ^~~~~ This because mvdev is set after trying to allocate ndev->vqs,event_cbs. So move the allocation to after mvdev is set but before the arrays are used in init_mvqs() Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20220107211352.3940570-1-trix@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Includes fixup: vdpa/mlx5: fix endian-ness for max vqs sparse warnings: (new ones prefixed by >>) >> drivers/vdpa/mlx5/net/mlx5_vnet.c:1247:23: sparse: sparse: cast to restricted __le16 >> drivers/vdpa/mlx5/net/mlx5_vnet.c:1247:23: sparse: sparse: cast from restricted __virtio16 > 1247 num = le16_to_cpu(ndev->config.max_virtqueue_pairs); Address this using the appropriate wrapper. Cc: "Eli Cohen" <elic@nvidia.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eli Cohen <elic@nvidia.com>
2022-01-15vdpa/mlx5: Fix config_attr_mask assignmentEli Cohen1-1/+1
Fix VDPA_ATTR_DEV_NET_CFG_MACADDR assignment to be explicit 64 bit assignment. No issue was seen since the value is well below 64 bit max value. Nevertheless it needs to be fixed. Fixes: a007d940040c ("vdpa/mlx5: Support configuration of MAC") Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-7-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-15vdpa/mlx5: Distribute RX virtqueues in RQT objectEli Cohen1-23/+7
Distribute the available rx virtqueues amongst the available RQT entries. RQTs require to have a power of two entries. When creating or modifying the RQT, use the lowest number of power of two entries that is not less than the number of rx virtqueues. Distribute them in the available entries such that some virtqueus may be referenced twice. This allows to configure any number of virtqueue pairs when multiqueue is used. Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-3-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-15vdpa: Provide interface to read driver featuresEli Cohen1-4/+12
Provide an interface to read the negotiated features. This is needed when building the netlink message in vdpa_dev_net_config_fill(). Also fix the implementation of vdpa_dev_net_config_fill() to use the negotiated features instead of the device features. To make APIs clearer, make the following name changes to struct vdpa_config_ops so they better describe their operations: get_features -> get_device_features set_features -> set_driver_features Finally, add get_driver_features to return the negotiated features and add implementation to all the upstream drivers. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-2-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-15vdpa/mlx5: Fix wrong configuration of virtio_version_1_0Eli Cohen1-2/+0
Remove overriding of virtio_version_1_0 which forced the virtqueue object to version 1. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20211230142024.142979-1-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
2022-01-15net/mlx5_vdpa: Offer VIRTIO_NET_F_MTU when setting MTUEli Cohen1-0/+1
Make sure to offer VIRTIO_NET_F_MTU since we configure the MTU based on what was queried from the device. This allows the virtio driver to allocate large enough buffers based on the reported MTU. Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20211124170949.51725-1-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>