summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-05-22vhost-vdpa: Remove usage of the deprecated ida_simple_xx() APIChristophe JAILLET1-3/+3
ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). Note that the upper limit of ida_simple_get() is exclusive, but the one of ida_alloc_max() is inclusive. So a -1 has been added when needed. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Simon Horman <horms@kernel.org> Message-Id: <67c2edf49788c27d5f7a49fc701520b9fcf739b5.1713088999.git.christophe.jaillet@wanadoo.fr> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2024-05-22vp_vdpa: don't allocate unused msix vectorsYuxue Liu1-6/+16
When there is a ctlq and it doesn't require interrupt callbacks,the original method of calculating vectors wastes hardware msi or msix resources as well as system IRQ resources. When conducting performance testing using testpmd in the guest os, it was found that the performance was lower compared to directly using vfio-pci to passthrough the device In scenarios where the virtio device in the guest os does not utilize interrupts, the vdpa driver still configures the hardware's msix vector. Therefore, the hardware still sends interrupts to the host os. Because of this unnecessary action by the hardware, hardware performance decreases, and it also affects the performance of the host os. Before modification:(interrupt mode) 32: 0 0 0 0 PCI-MSI 32768-edge vp-vdpa[0000:00:02.0]-0 33: 0 0 0 0 PCI-MSI 32769-edge vp-vdpa[0000:00:02.0]-1 34: 0 0 0 0 PCI-MSI 32770-edge vp-vdpa[0000:00:02.0]-2 35: 0 0 0 0 PCI-MSI 32771-edge vp-vdpa[0000:00:02.0]-config After modification:(interrupt mode) 32: 0 0 1 7 PCI-MSI 32768-edge vp-vdpa[0000:00:02.0]-0 33: 36 0 3 0 PCI-MSI 32769-edge vp-vdpa[0000:00:02.0]-1 34: 0 0 0 0 PCI-MSI 32770-edge vp-vdpa[0000:00:02.0]-config Before modification:(virtio pmd mode for guest os) 32: 0 0 0 0 PCI-MSI 32768-edge vp-vdpa[0000:00:02.0]-0 33: 0 0 0 0 PCI-MSI 32769-edge vp-vdpa[0000:00:02.0]-1 34: 0 0 0 0 PCI-MSI 32770-edge vp-vdpa[0000:00:02.0]-2 35: 0 0 0 0 PCI-MSI 32771-edge vp-vdpa[0000:00:02.0]-config After modification:(virtio pmd mode for guest os) 32: 0 0 0 0 PCI-MSI 32768-edge vp-vdpa[0000:00:02.0]-config To verify the use of the virtio PMD mode in the guest operating system, the following patch needs to be applied to QEMU: https://lore.kernel.org/all/20240408073311.2049-1-yuxue.liu@jaguarmicro.com Signed-off-by: Yuxue Liu <yuxue.liu@jaguarmicro.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Heng Qi <hengqi@linux.alibaba.com> Message-Id: <20240410033020.1310-1-yuxue.liu@jaguarmicro.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22sound: virtio: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-25-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Anton Yakovlev <anton.yakovlev@opensynergy.com>
2024-05-22fuse: virtio: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-24-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-05-22scsi: virtio: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-23-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-05-22rpmsg: virtio: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-22-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22nvdimm: virtio_pmem: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-21-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
2024-05-22wifi: mac80211_hwsim: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-20-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vsock/virtio: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-19-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-05-22net: 9p: virtio: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-18-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22net: virtio: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-17-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22net: caif: virtio: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-16-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22misc: nsm: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-15-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Alexander Graf <graf@amazon.com>
2024-05-22iommu: virtio: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-14-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22drm/virtio: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-13-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22gpio: virtio: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-12-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Linus Walleij <linus.walleij@linaro.org>
2024-05-22firmware: arm_scmi: virtio: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-11-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com>
2024-05-22crypto: virtio - drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-10-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-22virtio_console: drop owner assignmentKrzysztof Kozlowski1-2/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-9-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22hwrng: virtio: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-8-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22bluetooth: virtio: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-7-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22virtio_blk: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-6-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-05-22um: virt-pci: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-5-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22virtio: mem: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-4-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22virtio: input: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-3-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22virtio: balloon: drop owner assignmentKrzysztof Kozlowski1-1/+0
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-2-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22virtio_balloon: Treat stats requests as wakeup eventsDavid Stevens1-29/+46
Treat stats requests as wakeup events to ensure that the driver responds to device requests in a timely manner. Signed-off-by: David Stevens <stevensd@chromium.org> Acked-by: David Hildenbrand <david@redhat.com> Message-Id: <20240321012445.1593685-3-stevensd@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22virtio_balloon: Give the balloon its own wakeup sourceDavid Stevens1-2/+11
Wakeup sources don't support nesting multiple events, so sharing a single object between multiple drivers can result in one driver overriding the wakeup event processing period specified by another driver. Have the virtio balloon driver use the wakeup source of the device it is bound to rather than the wakeup source of the parent device, to avoid conflicts with the transport layer. Note that although the virtio balloon's virtio_device itself isn't what actually wakes up the device, it is responsible for processing wakeup events. In the same way that EPOLLWAKEUP uses a dedicated wakeup_source to prevent suspend when userspace is processing wakeup events, a dedicated wakeup_source is necessary when processing wakeup events in a higher layer in the kernel. Fixes: b12fbc3f787e ("virtio_balloon: stay awake while adjusting balloon") Signed-off-by: David Stevens <stevensd@chromium.org> Acked-by: David Hildenbrand <david@redhat.com> Message-Id: <20240321012445.1593685-2-stevensd@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22virtio-mem: support suspend+resumeDavid Hildenbrand2-6/+64
With virtio-mem, primarily hibernation is problematic: as the machine shuts down, the virtio-mem device loses its state. Powering the machine back up is like losing a bunch of DIMMs. While there would be ways to add limited support, suspend+resume is more commonly used for VMs and "easier" to support cleanly. s2idle can be supported without any device dependencies. Similarly, one would expect suspend-to-ram (i.e., S3) to work out of the box. However, QEMU currently unplugs all device memory when resuming the VM, using a cold reset on the "wakeup" path. In order to support S3, we need a feature flag for the device to tell us if memory remains plugged when waking up. In the future, QEMU will implement this feature. So let's always support s2idle and support S3 with plugged memory only if the device indicates support. Block hibernation early using the PM notifier. Trying to hibernate now fails early: # echo disk > /sys/power/state [ 26.455369] PM: hibernation: hibernation entry [ 26.458271] virtio_mem virtio0: hibernation is not supported. [ 26.462498] PM: hibernation: hibernation exit -bash: echo: write error: Operation not permitted s2idle works even without the new feature bit: # echo s2idle > /sys/power/mem_sleep # echo mem > /sys/power/state [ 52.083725] PM: suspend entry (s2idle) [ 52.095950] Filesystems sync: 0.010 seconds [ 52.101493] Freezing user space processes [ 52.104213] Freezing user space processes completed (elapsed 0.001 seconds) [ 52.106520] OOM killer disabled. [ 52.107655] Freezing remaining freezable tasks [ 52.110880] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 52.113296] printk: Suspending console(s) (use no_console_suspend to debug) S3 does not work without the feature bit when memory is plugged: # echo deep > /sys/power/mem_sleep # echo mem > /sys/power/state [ 32.788281] PM: suspend entry (deep) [ 32.816630] Filesystems sync: 0.027 seconds [ 32.820029] Freezing user space processes [ 32.823870] Freezing user space processes completed (elapsed 0.001 seconds) [ 32.827756] OOM killer disabled. [ 32.829608] Freezing remaining freezable tasks [ 32.833842] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 32.837953] printk: Suspending console(s) (use no_console_suspend to debug) [ 32.916172] virtio_mem virtio0: suspend+resume with plugged memory is not supported [ 32.916181] virtio-pci 0000:00:02.0: PM: pci_pm_suspend(): virtio_pci_freeze+0x0/0x50 returns -1 [ 32.916197] virtio-pci 0000:00:02.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -1 [ 32.916210] virtio-pci 0000:00:02.0: PM: failed to suspend async: error -1 But S3 works with the new feature bit when memory is plugged (patched QEMU): # echo deep > /sys/power/mem_sleep # echo mem > /sys/power/state [ 33.983694] PM: suspend entry (deep) [ 34.009828] Filesystems sync: 0.024 seconds [ 34.013589] Freezing user space processes [ 34.016722] Freezing user space processes completed (elapsed 0.001 seconds) [ 34.019092] OOM killer disabled. [ 34.020291] Freezing remaining freezable tasks [ 34.023549] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 34.026090] printk: Suspending console(s) (use no_console_suspend to debug) Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20240318120645.105664-1-david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22kernel: Remove signal hacks for vhost_tasksMike Christie3-10/+3
This removes the signal/coredump hacks added for vhost_tasks in: Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression") When that patch was added vhost_tasks did not handle SIGKILL and would try to ignore/clear the signal and continue on until the device's close function was called. In the previous patches vhost_tasks and the vhost drivers were converted to support SIGKILL by cleaning themselves up and exiting. The hacks are no longer needed so this removes them. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-10-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost_task: Handle SIGKILL by flushing work and exitingMike Christie4-24/+88
Instead of lingering until the device is closed, this has us handle SIGKILL by: 1. marking the worker as killed so we no longer try to use it with new virtqueues and new flush operations. 2. setting the virtqueue to worker mapping so no new works are queued. 3. running all the exiting works. Suggested-by: Edward Adam Davis <eadavis@qq.com> Reported-and-tested-by: syzbot+98edc2df894917b3431f@syzkaller.appspotmail.com Message-Id: <tencent_546DA49414E876EEBECF2C78D26D242EE50A@qq.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-9-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost: Release worker mutex during flushesMike Christie1-14/+30
In the next patches where the worker can be killed while in use, we need to be able to take the worker mutex and kill queued works for new IO and flushes, and set some new flags to prevent new __vhost_vq_attach_worker calls from swapping in/out killed workers. If we are holding the worker mutex during a flush and the flush's work is still in the queue, the worker code that will handle the SIGKILL cleanup won't be able to take the mutex and perform it's cleanup. So this patch has us drop the worker mutex while waiting for the flush to complete. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-8-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost: Use virtqueue mutex for swapping workerMike Christie1-6/+12
__vhost_vq_attach_worker uses the vhost_dev mutex to serialize the swapping of a virtqueue's worker. This was done for simplicity because we are already holding that mutex. In the next patches where the worker can be killed while in use, we need finer grained locking because some drivers will hold the vhost_dev mutex while flushing. However in the SIGKILL handler in the next patches, we will need to be able to swap workers (set current one to NULL), kill queued works and stop new flushes while flushes are in progress. To prepare us, this has us use the virtqueue mutex for swapping workers instead of the vhost_dev one. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-7-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost_scsi: Handle vhost_vq_work_queue failures for TMFsMike Christie1-1/+2
vhost_vq_work_queue will never fail when queueing the TMF's response handling because a guest can only send us TMFs when the device is fully setup so there is always a worker at that time. In the next patches we will modify the worker code so it handles SIGKILL by exiting before outstanding commands/TMFs have sent their responses. In that case vhost_vq_work_queue can fail when we try to send a response. This has us just free the TMF's resources since at this time the guest won't be able to get a response even if we could send it. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-6-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost: Remove vhost_vq_flushMike Christie2-13/+0
vhost_vq_flush is no longer used so remove it. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-5-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost-scsi: Use system wq to flush dev for TMFsMike Christie1-20/+19
We flush all the workers that are not also used by the ctl vq to make sure that responses queued by LIO before the TMF response are sent before the TMF response. This requires a special vhost_vq_flush function which, in the next patches where we handle SIGKILL killing workers while in use, will require extra locking/complexity. To avoid that, this patch has us flush the entire device from the system work queue, then queue up sending the response from there. This is a little less optimal since we now flush all workers but this will be ok since commands have already timed out and perf is not a concern. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-4-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost-scsi: Handle vhost_vq_work_queue failures for cmdsMike Christie1-1/+12
In the next patches we will support the vhost_task being killed while in use. The problem for vhost-scsi is that we can't free some structs until we get responses for commands we have submitted to the target layer and we currently process the responses from the vhost_task. This has just drop the responses and free the command's resources. When all commands have completed then operations like flush will be woken up and we can complete device release and endpoint cleanup. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-3-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost-scsi: Handle vhost_vq_work_queue failures for eventsMike Christie1-5/+12
Currently, we can try to queue an event's work before the vhost_task is created. When this happens we just drop it in vhost_scsi_do_plug before even calling vhost_vq_work_queue. During a device shutdown we do the same thing after vhost_scsi_clear_endpoint has cleared the backends. In the next patches we will be able to kill the vhost_task before we have cleared the endpoint. In that case, vhost_vq_work_queue can fail and we will leak the event's memory. This has handle the failure by just freeing the event. This is safe to do, because vhost_vq_work_queue will only return failure for us when the vhost_task is killed and so userspace will not be able to handle events if we sent them. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-2-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vdpa: Convert sprintf/snprintf to sysfs_emitLi Zhijian1-1/+1
Per filesystems/sysfs.rst, show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. coccinelle complains that there are still a couple of functions that use snprintf(). Convert them to sysfs_emit(). sprintf() will be converted as weel if they have. Generally, this patch is generated by make coccicheck M=<path/to/file> MODE=patch \ COCCI=scripts/coccinelle/api/device_attr_show.cocci No functional change intended CC: "Michael S. Tsirkin" <mst@redhat.com> CC: Jason Wang <jasowang@redhat.com> CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com> CC: virtualization@lists.linux.dev Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Message-Id: <20240314095853.1326111-1-lizhijian@fujitsu.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vp_vdpa: Fix return value check vp_vdpa_request_irqYuxue Liu1-1/+4
In the vp_vdpa_set_status function, when setting the device status to VIRTIO_CONFIG_S_DRIVER_OK, the vp_vdpa_request_irq function may fail. In such cases, the device status should not be set to DRIVER_OK. Add exception printing to remind the user. Signed-off-by: Yuxue Liu <yuxue.liu@jaguarmicro.com> Message-Id: <20240325105448.235-1-gavin.liu@jaguarmicro.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22arm64/fpsimd: Avoid erroneous elide of user state reloadArd Biesheuvel1-22/+22
TIF_FOREIGN_FPSTATE is a 'convenience' flag that should reflect whether the current CPU holds the most recent user mode FP/SIMD state of the current task. It combines two conditions: - whether the current CPU's FP/SIMD state belongs to the task; - whether that state is the most recent associated with the task (as a task may have executed on other CPUs as well). When a task is scheduled in and TIF_KERNEL_FPSTATE is set, it means the task was in a kernel mode NEON section when it was scheduled out, and so the kernel mode FP/SIMD state is restored. Since this implies that the current CPU is *not* holding the most recent user mode FP/SIMD state of the current task, the TIF_FOREIGN_FPSTATE flag is set too, so that the user mode FP/SIMD state is reloaded from memory when returning to userland. However, the task may be scheduled out after completing the kernel mode NEON section, but before returning to userland. When this happens, the TIF_FOREIGN_FPSTATE flag will not be preserved, but will be set as usual the next time the task is scheduled in, and will be based on the above conditions. This means that, rather than setting TIF_FOREIGN_FPSTATE when scheduling in a task with TIF_KERNEL_FPSTATE set, the underlying state should be updated so that TIF_FOREIGN_FPSTATE will assume the expected value as a result. So instead, call fpsimd_flush_cpu_state(), which takes care of this. Closes: https://lore.kernel.org/all/cb8822182231850108fa43e0446a4c7f@kernel.org Reported-by: Johannes Nixdorf <mixi@shadowice.org> Fixes: aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch") Cc: Mark Brown <broonie@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Janne Grunau <j@jannau.net> Cc: stable@vger.kernel.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Janne Grunau <j@jannau.net> Tested-by: Johannes Nixdorf <mixi@shadowice.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240522091335.335346-2-ardb+git@google.com Signed-off-by: Will Deacon <will@kernel.org>
2024-05-22Reapply "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"Will Deacon2-0/+19
This reverts commit b8995a18417088bb53f87c49d200ec72a9dd4ec1. Ard managed to reproduce the dm-crypt corruption problem and got to the bottom of it, so re-apply the problematic patch in preparation for fixing things properly. Cc: stable@vger.kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-05-22net: mana: Fix the extra HZ in mana_hwc_send_requestSouradeep Chakrabarti1-1/+1
Commit 62c1bff593b7 added an extra HZ along with msecs_to_jiffies. This patch fixes that. Cc: stable@vger.kernel.org Fixes: 62c1bff593b7 ("net: mana: Configure hwc timeout from hardware") Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1716185104-31658-1-git-send-email-schakrabarti@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-22net: lan966x: Remove ptp traps in case the ptp is not enabled.Horatiu Vultur1-3/+3
Lan966x is adding ptp traps to redirect the ptp frames to the CPU such that the HW will not forward these frames anywhere. The issue is that in case ptp is not enabled and the timestamping source is et to HWTSTAMP_SOURCE_NETDEV then these traps would not be removed on the error path. Fix this by removing the traps in this case as they are not needed. Fixes: 54e1ed69c40a ("net: lan966x: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()") Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20240517135808.3025435-1-horatiu.vultur@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-22rv: Update rv_en(dis)able_monitor doc to match kernel-docYang Li1-0/+2
The patch updates the function documentation comment for rv_en(dis)able_monitor to adhere to the kernel-doc specification. Link: https://lore.kernel.org/linux-trace-kernel/20240520054239.61784-1-yang.lee@linux.alibaba.com Fixes: 102227b970a15 ("rv: Add Runtime Verification (RV) interface") Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-22tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_testJeff Johnson1-0/+1
Fix the 'make W=1' warning: WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/trace/preemptirq_delay_test.o Link: https://lore.kernel.org/linux-trace-kernel/20240518-md-preemptirq_delay_test-v1-1-387d11b30d85@quicinc.com Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: f96e8577da10 ("lib: Add module for testing preemptoff/irqsoff latency tracers") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-22ring-buffer: Fix a race between readers and resize checksPetr Pavlu1-0/+9
The reader code in rb_get_reader_page() swaps a new reader page into the ring buffer by doing cmpxchg on old->list.prev->next to point it to the new page. Following that, if the operation is successful, old->list.next->prev gets updated too. This means the underlying doubly-linked list is temporarily inconsistent, page->prev->next or page->next->prev might not be equal back to page for some page in the ring buffer. The resize operation in ring_buffer_resize() can be invoked in parallel. It calls rb_check_pages() which can detect the described inconsistency and stop further tracing: [ 190.271762] ------------[ cut here ]------------ [ 190.271771] WARNING: CPU: 1 PID: 6186 at kernel/trace/ring_buffer.c:1467 rb_check_pages.isra.0+0x6a/0xa0 [ 190.271789] Modules linked in: [...] [ 190.271991] Unloaded tainted modules: intel_uncore_frequency(E):1 skx_edac(E):1 [ 190.272002] CPU: 1 PID: 6186 Comm: cmd.sh Kdump: loaded Tainted: G E 6.9.0-rc6-default #5 158d3e1e6d0b091c34c3b96bfd99a1c58306d79f [ 190.272011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014 [ 190.272015] RIP: 0010:rb_check_pages.isra.0+0x6a/0xa0 [ 190.272023] Code: [...] [ 190.272028] RSP: 0018:ffff9c37463abb70 EFLAGS: 00010206 [ 190.272034] RAX: ffff8eba04b6cb80 RBX: 0000000000000007 RCX: ffff8eba01f13d80 [ 190.272038] RDX: ffff8eba01f130c0 RSI: ffff8eba04b6cd00 RDI: ffff8eba0004c700 [ 190.272042] RBP: ffff8eba0004c700 R08: 0000000000010002 R09: 0000000000000000 [ 190.272045] R10: 00000000ffff7f52 R11: ffff8eba7f600000 R12: ffff8eba0004c720 [ 190.272049] R13: ffff8eba00223a00 R14: 0000000000000008 R15: ffff8eba067a8000 [ 190.272053] FS: 00007f1bd64752c0(0000) GS:ffff8eba7f680000(0000) knlGS:0000000000000000 [ 190.272057] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 190.272061] CR2: 00007f1bd6662590 CR3: 000000010291e001 CR4: 0000000000370ef0 [ 190.272070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 190.272073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 190.272077] Call Trace: [ 190.272098] <TASK> [ 190.272189] ring_buffer_resize+0x2ab/0x460 [ 190.272199] __tracing_resize_ring_buffer.part.0+0x23/0xa0 [ 190.272206] tracing_resize_ring_buffer+0x65/0x90 [ 190.272216] tracing_entries_write+0x74/0xc0 [ 190.272225] vfs_write+0xf5/0x420 [ 190.272248] ksys_write+0x67/0xe0 [ 190.272256] do_syscall_64+0x82/0x170 [ 190.272363] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 190.272373] RIP: 0033:0x7f1bd657d263 [ 190.272381] Code: [...] [ 190.272385] RSP: 002b:00007ffe72b643f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 190.272391] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1bd657d263 [ 190.272395] RDX: 0000000000000002 RSI: 0000555a6eb538e0 RDI: 0000000000000001 [ 190.272398] RBP: 0000555a6eb538e0 R08: 000000000000000a R09: 0000000000000000 [ 190.272401] R10: 0000555a6eb55190 R11: 0000000000000246 R12: 00007f1bd6662500 [ 190.272404] R13: 0000000000000002 R14: 00007f1bd6667c00 R15: 0000000000000002 [ 190.272412] </TASK> [ 190.272414] ---[ end trace 0000000000000000 ]--- Note that ring_buffer_resize() calls rb_check_pages() only if the parent trace_buffer has recording disabled. Recent commit d78ab792705c ("tracing: Stop current tracer when resizing buffer") causes that it is now always the case which makes it more likely to experience this issue. The window to hit this race is nonetheless very small. To help reproducing it, one can add a delay loop in rb_get_reader_page(): ret = rb_head_page_replace(reader, cpu_buffer->reader_page); if (!ret) goto spin; for (unsigned i = 0; i < 1U << 26; i++) /* inserted delay loop */ __asm__ __volatile__ ("" : : : "memory"); rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list; .. and then run the following commands on the target system: echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable while true; do echo 16 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1 echo 8 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1 done & while true; do for i in /sys/kernel/tracing/per_cpu/*; do timeout 0.1 cat $i/trace_pipe; sleep 0.2 done done To fix the problem, make sure ring_buffer_resize() doesn't invoke rb_check_pages() concurrently with a reader operating on the same ring_buffer_per_cpu by taking its cpu_buffer->reader_lock. Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-3-petr.pavlu@suse.com Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: 659f451ff213 ("ring-buffer: Add integrity check at end of iter read") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> [ Fixed whitespace ] Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-22ring-buffer: Correct stale comments related to non-consuming readersPetr Pavlu1-13/+3
Adjust the following code documentation: * Kernel-doc comments for ring_buffer_read_prepare() and ring_buffer_read_finish() mention that recording to the ring buffer is disabled when the read is active. Remove mention of this restriction because it was already lifted in commit 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator"). * Function ring_buffer_read_finish() performs a self-check of the ring-buffer by locking cpu_buffer->reader_lock and then calling rb_check_pages(). The preceding comment explains that the lock is needed because rb_check_pages() clears the HEAD flag required by readers which might be running in parallel. Remove this explanation because commit 8843e06f67b1 ("ring-buffer: Handle race between rb_move_tail and rb_check_pages") simplified the function so it no longer resets the mentioned flag. Nonetheless, the lock is still needed because a reader swapping a page into the ring buffer can make the underlying doubly-linked list temporarily inconsistent. This is a non-functional change. Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-2-petr.pavlu@suse.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-22Merge tag 'perf-tools-for-v6.10-1-2024-05-21' of ↵Linus Torvalds304-6202/+15138
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Arnaldo Carvalho de Melo: "General: - Integrate the shellcheck utility with the build of perf to allow catching shell problems early in areas such as 'perf test', 'perf trace' scrape scripts, etc - Add 'uretprobe' variant in the 'perf bench uprobe' tool - Add script to run instances of 'perf script' in parallel - Allow parsing tracepoint names that start with digits, such as 9p/9p_client_req, etc. Make sure 'perf test' tests it even on systems where those tracepoints aren't available - Add Kan Liang to MAINTAINERS as a perf tools reviewer - Add support for using the 'capstone' disassembler library in various tools, such as 'perf script' and 'perf annotate'. This is an alternative for the use of the 'xed' and 'objdump' disassemblers Data-type profiling improvements: - Resolve types for a->b->c by backtracking the assignments until it finds DWARF info for one of those members - Support for global variables, keeping a cache to speed up lookups - Handle the 'call' instruction, dealing with effects on registers and handling its return when tracking register data types - Handle x86's segment based addressing like %gs:0x28, to support things like per CPU variables, the stack canary, etc - Data-type profiling got big speedups when using capstone for disassembling. The objdump outoput parsing method is left as a fallback when capstone fails or isn't available. There are patches posted for 6.11 that to use a LLVM disassembler - Support event group display in the TUI when annotating types with --data-type, for instance to show memory load and store events for the data type fields - Optimize the 'perf annotate' data structures, reducing memory usage - Add a initial 'perf test' for 'perf annotate', checking that a target symbol appears on the output, specifying objdump via the command line, etc Vendor Events: - Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand Ridge, Ice Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra Forest, Sky Lake X, Sky Lake and Snow Ridge X. Remove info metrics erroneously in TopdownL1 - Add AMD's Zen 5 core and uncore events and metrics. Those come from the "Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh Processors" document, with events that capture information on op dispatch, execution and retirement, branch prediction, L1 and L2 cache activity, TLB activity, etc - Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/ AmpereOneX Miscellaneous: - Sync header copies with the kernel sources - Move some header copies used only for generating translation string tables for ioctl cmds and other syscall integer arguments to a new directory under tools/perf/beauty/, to separate from copies in tools/include/ that are used to build the tools - Introduce scrape script for several syscall 'flags'/'mask' arguments - Improve cpumap utilization, fixing up pairing of refcounts, using the right iterators (perf_cpu_map__for_each_cpu), etc - Give more details about raw event encodings in 'perf list', show tracepoint encoding in the detailed output - Refactor the DSOs handling code, reducing memory usage - Document the BPF event modifier and add a 'perf test' for it - Improve the event parser, better error messages and add further 'perf test's for it - Add reference count checking to 'struct comm_str' and 'struct mem_info' - Make ARM64's 'perf test' entries for the Neoverse N1 more robust - Tweak the ARM64's Coresight 'perf test's - Improve ARM64's CoreSight ETM version detection and error reporting - Fix handling of symbols when using kcore - Fix PAI (Processor Activity Instrumentation) counter names for s390 virtual machines in 'perf report' - Fix -g/--call-graph option failure in 'perf sched timehist' - Add LIBTRACEEVENT_DIR build option to allow building with libtraceevent installed in non-standard directories, such as when doing cross builds - Various 'perf test' and 'perf bench' fixes - Improve 'perf probe' error message for long C++ probe names" * tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (260 commits) tools lib subcmd: Show parent options in help perf pmu: Count sys and cpuid JSON events separately perf stat: Don't display metric header for non-leader uncore events perf annotate-data: Ensure the number of type histograms perf annotate: Fix segfault on sample histogram perf daemon: Fix file leak in daemon_session__control libsubcmd: Fix parse-options memory leak perf lock: Avoid memory leaks from strdup() perf sched: Rename 'switches' column header to 'count' and add usage description, options for latency perf tools: Ignore deleted cgroups perf parse: Allow tracepoint names to start with digits perf parse-events: Add new 'fake_tp' parameter for tests perf parse-events: pass parse_state to add_tracepoint perf symbols: Fix ownership of string in dso__load_vmlinux() perf symbols: Update kcore map before merging in remaining symbols perf maps: Re-use __maps__free_maps_by_name() perf symbols: Remove map from list before updating addresses perf tracepoint: Don't scan all tracepoints to test if one exists perf dwarf-aux: Fix build with HAVE_DWARF_CFI_SUPPORT perf thread: Fixes to thread__new() related to initializing comm ...
2024-05-22Merge tag 'bitmap-for-6.10v2' of https://github.com/norov/linuxLinus Torvalds12-27/+88
Pull bitmap updates from Yury Norov: - topology_span_sane() optimization from Kyle Meyer - fns() rework from Kuan-Wei Chiu (used in cpumask_local_spread() and other places) - headers cleanup from Andy - add a MAINTAINERS record for bitops API * tag 'bitmap-for-6.10v2' of https://github.com/norov/linux: usercopy: Don't use "proxy" headers bitops: Move aligned_byte_mask() to wordpart.h MAINTAINERS: add BITOPS API record bitmap: relax find_nth_bit() limitation on return value lib: make test_bitops compilable into the kernel image bitops: Optimize fns() for improved performance lib/test_bitops: Add benchmark test for fns() Compiler Attributes: Add __always_used macro sched/topology: Optimize topology_span_sane() cpumask: Add for_each_cpu_from()