summaryrefslogtreecommitdiff
path: root/drivers/s390/crypto/vfio_ap_ops.c
AgeCommit message (Collapse)AuthorFilesLines
2024-04-22s390/vfio-ap: Add write support to sysfs attr ap_configJason J. Herne1-13/+177
Allow writing a complete set of masks to ap_config. Doing so will cause the vfio-ap driver to replace the vfio-ap mediated device's matrix masks with the given set of masks. If the given state cannot be set, then no changes are made to the vfio-ap mediated device. The format of the data written to ap_config is as follows: {amask},{dmask},{cmask}\n \n is a newline character. amask, dmask, and cmask are masks identifying which adapters, domains, and control domains should be assigned to the mediated device. The format of a mask is as follows: 0xNN..NN Where NN..NN is 64 hexadecimal characters representing a 256-bit value. The leftmost (highest order) bit represents adapter/domain 0. For an example set of masks that represent your mdev's current configuration, simply cat ap_config. This attribute is intended to be used by an mdevctl callout script supporting the mdev type vfio_ap-passthrough to atomically update a vfio-ap mediated device's state. Signed-off-by: "Jason J. Herne" <jjherne@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20240415152555.13152-5-jjherne@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-22s390/vfio-ap: Ignore duplicate link requests in vfio_ap_mdev_link_queueJason J. Herne1-4/+5
vfio_ap_mdev_link_queue is changed to detect if a matrix_mdev has already linked the given queue. If so, it bails out. Signed-off-by: "Jason J. Herne" <jjherne@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Link: https://lore.kernel.org/r/20240415152555.13152-4-jjherne@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-22s390/vfio-ap: Add sysfs attr, ap_config, to export mdev stateJason J. Herne1-0/+27
Add ap_config sysfs attribute. This will provide the means for setting or displaying the adapters, domains and control domains assigned to the vfio-ap mediated device in a single operation. This sysfs attribute is comprised of three masks: One for adapters, one for domains, and one for control domains. This attribute is intended to be used by mdevctl to query a vfio-ap mediated device's state. Signed-off-by: "Jason J. Herne" <jjherne@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20240415152555.13152-3-jjherne@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-03-07s390/vfio-ap: handle hardware checkstop state on queue reset operationJason J. Herne1-17/+18
Update vfio_ap_mdev_reset_queue() to handle an unexpected checkstop (hardware error) the same as the deconfigured case. This prevents unexpected and unhelpful warnings in the event of a hardware error. We also stop lying about a queue's reset response code. This was originally done so we could force vfio_ap_mdev_filter_matrix to pass a deconfigured device through to the guest for the hotplug scenario. vfio_ap_mdev_filter_matrix is instead modified to allow passthrough for all queues with reset state normal, deconfigured, or checkstopped. In the checkstopped case we choose to pass the device through and let the error state be reflected at the guest level. Signed-off-by: "Jason J. Herne" <jjherne@linux.ibm.com> Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com> Link: https://lore.kernel.org/r/20240215153144.14747-1-jjherne@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-01-17s390/vfio-ap: do not reset queue removed from host configTony Krowiak1-4/+12
When a queue is unbound from the vfio_ap device driver, it is reset to ensure its crypto data is not leaked when it is bound to another device driver. If the queue is unbound due to the fact that the adapter or domain was removed from the host's AP configuration, then attempting to reset it will fail with response code 01 (APID not valid) getting returned from the reset command. Let's ensure that the queue is assigned to the host's configuration before resetting it. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: "Jason J. Herne" <jjherne@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Fixes: eeb386aeb5b7 ("s390/vfio-ap: handle config changed and scan complete notification") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115185441.31526-7-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-17s390/vfio-ap: reset queues associated with adapter for queue unbound from driverTony Krowiak1-35/+41
When a queue is unbound from the vfio_ap device driver, if that queue is assigned to a guest's AP configuration, its associated adapter is removed because queues are defined to a guest via a matrix of adapters and domains; so, it is not possible to remove a single queue. If an adapter is removed from the guest's AP configuration, all associated queues must be reset to prevent leaking crypto data should any of them be assigned to a different guest or device driver. The one caveat is that if the queue is being removed because the adapter or domain has been removed from the host's AP configuration, then an attempt to reset the queue will fail with response code 01, AP-queue number not valid; so resetting these queues should be skipped. Acked-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Fixes: 09d31ff78793 ("s390/vfio-ap: hot plug/unplug of AP devices when probed/removed") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115185441.31526-6-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-17s390/vfio-ap: reset queues filtered from the guest's AP configTony Krowiak1-45/+126
When filtering the adapters from the configuration profile for a guest to create or update a guest's AP configuration, if the APID of an adapter and the APQI of a domain identify a queue device that is not bound to the vfio_ap device driver, the APID of the adapter will be filtered because an individual APQN can not be filtered due to the fact the APQNs are assigned to an AP configuration as a matrix of APIDs and APQIs. Consequently, a guest will not have access to all of the queues associated with the filtered adapter. If the queues are subsequently made available again to the guest, they should re-appear in a reset state; so, let's make sure all queues associated with an adapter unplugged from the guest are reset. In order to identify the set of queues that need to be reset, let's allow a vfio_ap_queue object to be simultaneously stored in both a hashtable and a list: A hashtable used to store all of the queues assigned to a matrix mdev; and/or, a list used to store a subset of the queues that need to be reset. For example, when an adapter is hot unplugged from a guest, all guest queues associated with that adapter must be reset. Since that may be a subset of those assigned to the matrix mdev, they can be stored in a list that can be passed to the vfio_ap_mdev_reset_queues function. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115185441.31526-5-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-17s390/vfio-ap: let on_scan_complete() callback filter matrix and update ↵Tony Krowiak1-0/+13
guest's APCB When adapters and/or domains are added to the host's AP configuration, this may result in multiple queue devices getting created and probed by the vfio_ap device driver. For each queue device probed, the matrix of adapters and domains assigned to a matrix mdev will be filtered to update the guest's APCB. If any adapters or domains get added to or removed from the APCB, the guest's AP configuration will be dynamically updated (i.e., hot plug/unplug). To dynamically update the guest's configuration, its VCPUs must be taken out of SIE for the period of time it takes to make the update. This is disruptive to the guest's operation and if there are many queues probed due to a change in the host's AP configuration, this could be troublesome. The problem is exacerbated by the fact that the 'on_scan_complete' callback also filters the mdev's matrix and updates the guest's AP configuration. In order to reduce the potential amount of disruption to the guest that may result from a change to the host's AP configuration, let's bypass the filtering of the matrix and updating of the guest's AP configuration in the probe callback - if due to a host config change - and defer it until the 'on_scan_complete' callback is invoked after the AP bus finishes its device scan operation. This way the filtering and updating will be performed only once regardless of the number of queues added. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115185441.31526-4-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-17s390/vfio-ap: loop over the shadow APCB when filtering guest's AP configurationTony Krowiak1-2/+3
While filtering the mdev matrix, it doesn't make sense - and will have unexpected results - to filter an APID from the matrix if the APID or one of the associated APQIs is not in the host's AP configuration. There are two reasons for this: 1. An adapter or domain that is not in the host's AP configuration can be assigned to the matrix; this is known as over-provisioning. Queue devices, however, are only created for adapters and domains in the host's AP configuration, so there will be no queues associated with an over-provisioned adapter or domain to filter. 2. The adapter or domain may have been externally removed from the host's configuration via an SE or HMC attached to a DPM enabled LPAR. In this case, the vfio_ap device driver would have been notified by the AP bus via the on_config_changed callback and the adapter or domain would have already been filtered. Since the matrix_mdev->shadow_apcb.apm and matrix_mdev->shadow_apcb.aqm are copied from the mdev matrix sans the APIDs and APQIs not in the host's AP configuration, let's loop over those bitmaps instead of those assigned to the matrix. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115185441.31526-3-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-17s390/vfio-ap: always filter entire AP matrixTony Krowiak1-40/+17
The vfio_ap_mdev_filter_matrix function is called whenever a new adapter or domain is assigned to the mdev. The purpose of the function is to update the guest's AP configuration by filtering the matrix of adapters and domains assigned to the mdev. When an adapter or domain is assigned, only the APQNs associated with the APID of the new adapter or APQI of the new domain are inspected. If an APQN does not reference a queue device bound to the vfio_ap device driver, then it's APID will be filtered from the mdev's matrix when updating the guest's AP configuration. Inspecting only the APID of the new adapter or APQI of the new domain will result in passing AP queues through to a guest that are not bound to the vfio_ap device driver under certain circumstances. Consider the following: guest's AP configuration (all also assigned to the mdev's matrix): 14.0004 14.0005 14.0006 16.0004 16.0005 16.0006 unassign domain 4 unbind queue 16.0005 assign domain 4 When domain 4 is re-assigned, since only domain 4 will be inspected, the APQNs that will be examined will be: 14.0004 16.0004 Since both of those APQNs reference queue devices that are bound to the vfio_ap device driver, nothing will get filtered from the mdev's matrix when updating the guest's AP configuration. Consequently, queue 16.0005 will get passed through despite not being bound to the driver. This violates the linux device model requirement that a guest shall only be given access to devices bound to the device driver facilitating their pass-through. To resolve this problem, every adapter and domain assigned to the mdev will be inspected when filtering the mdev's matrix. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115185441.31526-2-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-11Merge tag 's390-6.8-1' of ↵Linus Torvalds1-6/+24
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Alexander Gordeev: - Add machine variable capacity information to /proc/sysinfo. - Limit the waste of page tables and always align vmalloc area size and base address on segment boundary. - Fix a memory leak when an attempt to register interruption sub class (ISC) for the adjunct-processor (AP) guest failed. - Reset response code AP_RESPONSE_INVALID_GISA to understandable by guest AP_RESPONSE_INVALID_ADDRESS in response to a failed interruption sub class (ISC) registration attempt. - Improve reaction to adjunct-processor (AP) AP_RESPONSE_OTHERWISE_CHANGED response code when enabling interrupts on behalf of a guest. - Fix incorrect sysfs 'status' attribute of adjunct-processor (AP) queue device bound to the vfio_ap device driver when the mediated device is attached to a guest, but the queue device is not passed through. - Rework struct ap_card to hold the whole adjunct-processor (AP) card hardware information. As result, all the ugly bit checks are replaced by simple evaluations of the required bit fields. - Improve handling of some weird scenarios between service element (SE) host and SE guest with adjunct-processor (AP) pass-through support. - Change local_ctl_set_bit() and local_ctl_clear_bit() so they return the previous value of the to be changed control register. This is useful if a bit is only changed temporarily and the previous content needs to be restored. - The kernel starts with machine checks disabled and is expected to enable it once trap_init() is called. However the implementation allows machine checks early. Consistently enable it in trap_init() only. - local_mcck_disable() and local_mcck_enable() assume that machine checks are always enabled. Instead implement and use local_mcck_save() and local_mcck_restore() to disable machine checks and restore the previous state. - Modification of floating point control (FPC) register of a traced process using ptrace interface may lead to corruption of the FPC register of the tracing process. Fix this. - kvm_arch_vcpu_ioctl_set_fpu() allows to set the floating point control (FPC) register in vCPU, but may lead to corruption of the FPC register of the host process. Fix this. - Use READ_ONCE() to read a vCPU floating point register value from the memory mapped area. This avoids that, depending on code generation, a different value is tested for validity than the one that is used. - Get rid of test_fp_ctl(), since it is quite subtle to use it correctly. Instead copy a new floating point control register value into its save area and test the validity of the new value when loading it. - Remove superfluous save_fpu_regs() call. - Remove s390 support for ARCH_WANTS_DYNAMIC_TASK_STRUCT. All machines provide the vector facility since many years and the need to make the task structure size dependent on the vector facility does not exist. - Remove the "novx" kernel command line option, as the vector code runs without any problems since many years. - Add the vector facility to the z13 architecture level set (ALS). All hypervisors support the vector facility since many years. This allows compile time optimizations of the kernel. - Get rid of MACHINE_HAS_VX and replace it with cpu_has_vx(). As result, the compiled code will have less runtime checks and less code. - Convert pgste_get_lock() and pgste_set_unlock() ASM inlines to C. - Convert the struct subchannel spinlock from pointer to member. * tag 's390-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (24 commits) Revert "s390: update defconfigs" s390/cio: make sch->lock spinlock pointer a member s390: update defconfigs s390/mm: convert pgste locking functions to C s390/fpu: get rid of MACHINE_HAS_VX s390/als: add vector facility to z13 architecture level set s390/fpu: remove "novx" option s390/fpu: remove ARCH_WANTS_DYNAMIC_TASK_STRUCT support KVM: s390: remove superfluous save_fpu_regs() call s390/fpu: get rid of test_fp_ctl() KVM: s390: use READ_ONCE() to read fpc register value KVM: s390: fix setting of fpc register s390/ptrace: handle setting of fpc register correctly s390/nmi: implement and use local_mcck_save() / local_mcck_restore() s390/nmi: consistently enable machine checks in trap_init() s390/ctlreg: return old register contents when changing bits s390/ap: handle outband SE bind state change s390/ap: store TAPQ hwinfo in struct ap_card s390/vfio-ap: fix sysfs status attribute for AP queue devices s390/vfio-ap: improve reaction to response code 07 from PQAP(AQIC) command ...
2023-11-30s390/ap: store TAPQ hwinfo in struct ap_cardHarald Freudenberger1-1/+1
As of now the AP card struct held only part of the queue's hwinfo (that is the GR2 register content returned with an TAPQ invocation). This patch reworks struct ap_card to hold the whole hwinfo now. As there is a nice bit field union on top of this ap_tapq_hwinfo struct, all the ugly bit checkings can now get replaced by simple evaluations of the required bit field. Suggested-by: Ingo Franzki <ifranzki@linux.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-11-30s390/vfio-ap: fix sysfs status attribute for AP queue devicesTony Krowiak1-1/+15
The 'status' attribute for AP queue devices bound to the vfio_ap device driver displays incorrect status when the mediated device is attached to a guest, but the queue device is not passed through. In the current implementation, the status displayed is 'in_use' which is not correct; it should be 'assigned'. This can happen if one of the queue devices associated with a given adapter is not bound to the vfio_ap device driver. For example: Queues listed in /sys/bus/ap/drivers/vfio_ap: 14.0005 14.0006 14.000d 16.0006 16.000d Queues listed in /sys/devices/vfio_ap/matrix/$UUID/matrix 14.0005 14.0006 14.000d 16.0005 16.0006 16.000d Queues listed in /sys/devices/vfio_ap/matrix/$UUID/guest_matrix 14.0005 14.0006 14.000d The reason no queues for adapter 0x16 are listed in the guest_matrix is because queue 16.0005 is not bound to the vfio_ap device driver, so no queue associated with the adapter is passed through to the guest; therefore, each queue device for adapter 0x16 should display 'assigned' instead of 'in_use', because those queues are not in use by a guest, but only assigned to the mediated device. Let's check the AP configuration for the guest to determine whether a queue device is passed through before displaying a status of 'in_use'. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Acked-by: Harald Freudenberger <freude@linux.ibm.com> Link: https://lore.kernel.org/r/20231108201135.351419-1-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-11-30s390/vfio-ap: improve reaction to response code 07 from PQAP(AQIC) commandTony Krowiak1-1/+4
Let's improve the vfio_ap driver's reaction to reception of response code 07 from the PQAP(AQIC) command when enabling interrupts on behalf of a guest: * Unregister the guest's ISC before the pages containing the notification indicator bytes are unpinned. * Capture the return code from the kvm_s390_gisc_unregister function and log a DBF warning if it fails. Suggested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20231109164427.460493-4-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-11-30s390/vfio-ap: set status response code to 06 on gisc registration failureAnthony Krowiak1-3/+3
The interception handler for the PQAP(AQIC) command calls the kvm_s390_gisc_register function to register the guest ISC with the channel subsystem. If that call fails, the status response code 08 - indicating Invalid ZONE/GISA designation - is returned to the guest. This response code is not valid because setting the ZONE/GISA values is the responsibility of the hypervisor controlling the guest and there is nothing that can be done from the guest perspective to correct that problem. The likelihood of GISC registration failure is nil and there is no status response code to indicate an invalid ISC value, so let's set the response code to 06 indicating 'Invalid address of AP-queue notification byte'. While this is not entirely accurate, it is better than setting a response code which makes no sense for the guest. Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com> Suggested-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Link: https://lore.kernel.org/r/20231109164427.460493-3-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-11-30s390/vfio-ap: unpin pages on gisc registration failureAnthony Krowiak1-0/+1
In the vfio_ap_irq_enable function, after the page containing the notification indicator byte (NIB) is pinned, the function attempts to register the guest ISC. If registration fails, the function sets the status response code and returns without unpinning the page containing the NIB. In order to avoid a memory leak, the NIB should be unpinned before returning from the vfio_ap_irq_enable function. Co-developed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Fixes: 783f0a3ccd79 ("s390/vfio-ap: add s390dbf logging to the vfio_ap_irq_enable function") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20231109164427.460493-2-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-11-28eventfd: simplify eventfd_signal()Christian Brauner1-1/+1
Ever since the eventfd type was introduced back in 2007 in commit e1ad7468c77d ("signal/timer/event: eventfd core") the eventfd_signal() function only ever passed 1 as a value for @n. There's no point in keeping that additional argument. Link: https://lore.kernel.org/r/20231122-vfs-eventfd-signal-v2-2-bd549b14ce0c@kernel.org Acked-by: Xu Yilun <yilun.xu@intel.com> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # ocxl Acked-by: Eric Farman <farman@linux.ibm.com> # s390 Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-31Merge tag 'vfio-v6.6-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds1-0/+1
Pull VFIO updates from Alex Williamson: - VFIO direct character device (cdev) interface support. This extracts the vfio device fd from the container and group model, and is intended to be the native uAPI for use with IOMMUFD (Yi Liu) - Enhancements to the PCI hot reset interface in support of cdev usage (Yi Liu) - Fix a potential race between registering and unregistering vfio files in the kvm-vfio interface and extend use of a lock to avoid extra drop and acquires (Dmitry Torokhov) - A new vfio-pci variant driver for the AMD/Pensando Distributed Services Card (PDS) Ethernet device, supporting live migration (Brett Creeley) - Cleanups to remove redundant owner setup in cdx and fsl bus drivers, and simplify driver init/exit in fsl code (Li Zetao) - Fix uninitialized hole in data structure and pad capability structures for alignment (Stefan Hajnoczi) * tag 'vfio-v6.6-rc1' of https://github.com/awilliam/linux-vfio: (53 commits) vfio/pds: Send type for SUSPEND_STATUS command vfio/pds: fix return value in pds_vfio_get_lm_file() pds_core: Fix function header descriptions vfio: align capability structures vfio/type1: fix cap_migration information leak vfio/fsl-mc: Use module_fsl_mc_driver macro to simplify the code vfio/cdx: Remove redundant initialization owner in vfio_cdx_driver vfio/pds: Add Kconfig and documentation vfio/pds: Add support for firmware recovery vfio/pds: Add support for dirty page tracking vfio/pds: Add VFIO live migration support vfio/pds: register with the pds_core PF pds_core: Require callers of register/unregister to pass PF drvdata vfio/pds: Initial support for pds VFIO driver vfio: Commonize combine_ranges for use in other VFIO drivers kvm/vfio: avoid bouncing the mutex when adding and deleting groups kvm/vfio: ensure kvg instance stays around in kvm_vfio_group_add() docs: vfio: Add vfio device cdev description vfio: Compile vfio_group infrastructure optionally vfio: Move the IOMMU_CAP_CACHE_COHERENCY check in __vfio_register_dev() ...
2023-08-18s390/vfio-ap: make sure nib is sharedTony Krowiak1-0/+30
Since the NIB is visible by HW, KVM and the (PV) guest it needs to be in non-secure or secure but shared storage. Return code 6 is used to indicate to a PV guest that its NIB would be on secure, unshared storage and therefore the NIB address is invalid. Unfortunately we have no easy way to check if a page is unshared after vfio_pin_pages() since it will automatically export an unshared page if the UV pin shared call did not succeed due to a page being in unshared state. Therefore we use the fact that UV pinning it a second time is a nop but trying to pin an exported page is an error (0x102). If we encounter this error, we do a vfio unpin and import the page again, since vfio_pin_pages() exported it. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com> Link: https://lore.kernel.org/r/20230815184333.6554-13-akrowiak@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18s390/vfio-ap: check for TAPQ response codes 0x35 and 0x36Tony Krowiak1-1/+12
Check for response codes 0x35 and 0x36 which are asynchronous return codes indicating a failure of the guest to associate a secret with a queue. Since there can be no interaction with this queue from the guest (i.e., the vcpus are out of SIE for hot unplug, the guest is being shut down or an emulated subsystem reset of the guest is taking place), let's go ahead and re-issue the ZAPQ to reset and zeroize the queue. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com> Link: https://lore.kernel.org/r/20230815184333.6554-10-akrowiak@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18s390/vfio-ap: handle queue state change in progress on resetTony Krowiak1-1/+3
A new APQSW response code (0xA) indicating the designated queue is in the process of being bound or associated to a configuration may be returned from the PQAP(ZAPQ) command. This patch introduces code that will verify when the PQAP(ZAPQ) command can be re-issued after receiving response code 0xA. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com> Link: https://lore.kernel.org/r/20230815184333.6554-9-akrowiak@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18s390/vfio-ap: use work struct to verify queue resetTony Krowiak1-25/+23
Instead of waiting to verify that a queue is reset in the vfio_ap_mdev_reset_queue function, let's use a wait queue to check the the state of the reset. This way, when resetting all of the queues assigned to a matrix mdev, we don't have to wait for each queue to be reset before initiating a reset on the next queue to be reset. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Suggested-by: Halil Pasic <pasic@linux.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com> Link: https://lore.kernel.org/r/20230815184333.6554-8-akrowiak@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18s390/vfio-ap: store entire AP queue status word with the queue objectTony Krowiak1-12/+15
Store the entire AP queue status word returned from the ZAPQ command with the struct vfio_ap_queue object instead of just the response code field. The other information contained in the status word is need by the apq_reset_check function to display a proper message to indicate that the vfio_ap driver is waiting for the ZAPQ to complete because the queue is not empty or IRQs are still enabled. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com> Link: https://lore.kernel.org/r/20230815184333.6554-7-akrowiak@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18s390/vfio-ap: remove upper limit on wait for queue reset to completeTony Krowiak1-29/+35
The architecture does not define an upper limit on how long a queue reset (RAPQ/ZAPQ) can take to complete. In order to ensure both the security requirements and prevent resource leakage and corruption in the hypervisor, it is necessary to remove the upper limit (200ms) the vfio_ap driver currently waits for a reset to complete. This, of course, may result in a hang which is a less than desirable user experience, but until a firmware solution is provided, this is a necessary evil. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com> Link: https://lore.kernel.org/r/20230815184333.6554-6-akrowiak@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18s390/vfio-ap: allow deconfigured queue to be passed through to a guestTony Krowiak1-2/+4
When a queue is reset, the status response code returned from the reset operation is stored in the reset_rc field of the vfio_ap_queue structure representing the queue being reset. This field is later used to decide whether the queue should be passed through to a guest. If the reset_rc field is a non-zero value, the queue will be filtered from the list of queues passed through. When an adapter is deconfigured, all queues associated with that adapter are reset. That being the case, it is not necessary to filter those queues; so, if the status response code returned from a reset operation indicates the queue is deconfigured, the reset_rc field of the vfio_ap_queue structure will be set to zero so it will be passed through (i.e., not filtered). Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com> Link: https://lore.kernel.org/r/20230815184333.6554-5-akrowiak@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18s390/vfio-ap: wait for response code 05 to clear on queue resetTony Krowiak1-0/+2
Response code 05, AP busy, is a valid response code for a ZAPQ or TAPQ. Instead of returning error -EIO when a ZAPQ fails with response code 05, let's wait until the queue is no longer busy and try the ZAPQ again. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com> Link: https://lore.kernel.org/r/20230815184333.6554-4-akrowiak@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18s390/vfio-ap: clean up irq resources if possibleTony Krowiak1-4/+7
The architecture does not specify whether interrupts are disabled as part of the asynchronous reset or upon return from the PQAP/ZAPQ instruction. If, however, PQAP/ZAPQ completes with APQSW response code 0 and the interrupt bit in the status word is also 0, we know the interrupts are disabled and we can go ahead and clean up the corresponding resources; otherwise, we must wait until the asynchronous reset has completed. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Suggested-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com> Link: https://lore.kernel.org/r/20230815184333.6554-3-akrowiak@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-08-18s390/vfio-ap: no need to check the 'E' and 'I' bits in APQSW after TAPQTony Krowiak1-11/+2
After a ZAPQ is executed to reset a queue, if the queue is not empty or interrupts are still enabled, the vfio_ap driver will wait for the reset operation to complete by repeatedly executing the TAPQ instruction and checking the 'E' and 'I' bits in the APQSW to verify that the queue is empty and interrupts are disabled. This is unnecessary because it is sufficient to check only the response code in the APQSW. If the reset is still in progress, the response code will be 02; however, if the reset has completed successfully, the response code will be 00. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com> Link: https://lore.kernel.org/r/20230815184333.6554-2-akrowiak@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-07-25vfio-iommufd: Add detach_ioas support for emulated VFIO devicesYi Liu1-0/+1
This prepares for adding DETACH ioctl for emulated VFIO devices. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yanting Jiang <yanting.jiang@intel.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-16-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-07-03s390: fix various typosHeiko Carstens1-2/+2
Fix various typos found with codespell. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-06s390/vfio-ap: wire in the vfio_device_ops request callbackTony Krowiak1-0/+21
The mdev device is being removed, so pass the request to userspace to ask for a graceful cleanup. This should free up the thread that would otherwise loop waiting for the device to be fully released. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20230530223538.279198-4-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-06s390/vfio-ap: realize the VFIO_DEVICE_SET_IRQS ioctlTony Krowiak1-0/+83
Realize the VFIO_DEVICE_SET_IRQS ioctl to set an eventfd file descriptor to be used by the vfio_ap device driver to signal a device request to userspace. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20230530223538.279198-3-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-06-06s390/vfio-ap: realize the VFIO_DEVICE_GET_IRQ_INFO ioctlTony Krowiak1-1/+29
Realize the VFIO_DEVICE_GET_IRQ_INFO ioctl to retrieve the information for the VFIO device request IRQ. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20230530223538.279198-2-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-03-20s390/ap: provide F bit parameter for ap_rapq() and ap_zapq()Harald Freudenberger1-1/+1
Extent the ap inline functions ap_rapq() (calls PQAP(RAPQ)) and ap_zapq() (calls PQAP(ZAPQ)) with a new parameter to enable the new architectured F bit which forces an unassociate and/or unbind on a secure execution associated and/or bound queue. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20s390/ap: make tapq gr2 response a structHarald Freudenberger1-5/+3
This patch introduces a new struct ap_tapq_gr2 which covers the response in GR2 on TAPQ invocation. This makes it much easier and less error-prone for the calling functions to access the right field without shifting and masking. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-03-20s390/ap: exploit new B bit from QCI config infoHarald Freudenberger1-3/+3
This patch introduces an update to the ap_config_info struct which is filled with the QCI subfunction. There is a new bit apsb (short 'B') showing if the AP secure bind facility is available. The patch also includes a simple function ap_sb_available() wrapping this bit test. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-27s390/ap,zcrypt,vfio: introduce and use ap_queue_status_reg unionHarald Freudenberger1-2/+2
Introduce a new ap queue status register wrapper union to access register wide values. So the inline assembler only sees register wide values but the surrounding code may use a more structured view of the same value and a reader of the code (and the compiler) gets a clear understanding about the mapping between fields and register values. All the changes to access the ap queue status are local to the inline functions within ap.h. However, the struct ap_qirq_ctrl has been replaces by a union for same reason and this needed slight adaptions in the calling code. Suggested-by: Halil Pasic <pasic@linux.ibm.com> Suggested-by: Andreas Arnez <arnez@linux.ibm.com> Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-02-10s390: vfio-ap: tighten the NIB validity checkHalil Pasic1-0/+2
The NIB is architecturally invalid if the address designates a storage location that is not installed or if it is zero. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Reported-by: Janosch Frank <frankja@linux.ibm.com> Fixes: ec89b55e3bce ("s390: ap: implement PAPQ AQIC interception in kernel") Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-22s390/vfio_ap: increase max wait time for reset verificationTony Krowiak1-3/+7
Increase the maximum time to wait for verification of a queue reset operation to 200ms. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Link: https://lore.kernel.org/r/20230118203111.529766-7-akrowiak@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-22s390/vfio_ap: fix handling of error response codesTony Krowiak1-10/+7
Some response codes returned from the queue reset function are not being handled correctly; this patch fixes them: 1. Response code 3, AP queue deconfigured: Deconfiguring an AP adapter resets all of its queues, so this is handled by indicating the reset verification completed successfully. 2. For all response codes other than 0 (normal reset completion), 2 (queue reset in progress) and 3 (AP deconfigured), the -EIO error will be returned from the vfio_ap_mdev_reset_queue() function. In all cases, all fields of the status word other than the response code will be set to zero, so it makes no sense to check status bits. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Link: https://lore.kernel.org/r/20230118203111.529766-6-akrowiak@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-22s390/vfio_ap: verify ZAPQ completion after return of response code zeroTony Krowiak1-4/+3
Verification that the asynchronous ZAPQ function has completed only needs to be done when the response code indicates the function was successfully initiated; so, let's call the apq_reset_check function immediately after the response code zero is returned from the ZAPQ. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Link: https://lore.kernel.org/r/20230118203111.529766-5-akrowiak@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-22s390/vfio_ap: use TAPQ to verify reset in progress completesTony Krowiak1-11/+13
To eliminate the repeated calls to the PQAP(ZAPQ) function to verify that a reset in progress completed successfully and ensure that error response codes get appropriately logged, let's call the apq_reset_check() function when the ZAPQ response code indicates that a reset that is already in progress. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Link: https://lore.kernel.org/r/20230118203111.529766-4-akrowiak@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-22s390/vfio_ap: check TAPQ response code when waiting for queue resetTony Krowiak1-5/+31
The vfio_ap_mdev_reset_queue() function does not check the status response code returned form the PQAP(TAPQ) function when verifying the queue's status; consequently, there is no way of knowing whether verification failed because the wait time was exceeded, or because the PQAP(TAPQ) failed. This patch adds a function to check the status response code from the PQAP(TAPQ) instruction and logs an appropriate message if it fails. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Link: https://lore.kernel.org/r/20230118203111.529766-3-akrowiak@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-22s390/vfio-ap: verify reset complete in separate functionTony Krowiak1-9/+21
The vfio_ap_mdev_reset_queue() function contains a loop to verify that the reset successfully completes within 40ms. This patch moves that loop into a separate function. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Link: https://lore.kernel.org/r/20230118203111.529766-2-akrowiak@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2023-01-13s390/vfio-ap: fix an error handling path in vfio_ap_mdev_probe_queue()Christophe JAILLET1-2/+8
The commit in Fixes: has switch the order of a sysfs_create_group() and a kzalloc(). It correctly removed the now useless kfree() but forgot to add a sysfs_remove_group() in case of (unlikely) memory allocation failure. Add it now. Fixes: 260f3ea14138 ("s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/r/d0c0a35eec4fa87cb7f3910d8ac4dc0f7dc9008a.1659283738.git.christophe.jaillet@wanadoo.fr Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2022-12-16Merge tag 'vfio-v6.2-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds1-6/+0
Pull VFIO updates from Alex Williamson: - Replace deprecated git://github.com link in MAINTAINERS (Palmer Dabbelt) - Simplify vfio/mlx5 with module_pci_driver() helper (Shang XiaoJing) - Drop unnecessary buffer from ACPI call (Rafael Mendonca) - Correct latent missing include issue in iova-bitmap and fix support for unaligned bitmaps. Follow-up with better fix through refactor (Joao Martins) - Rework ccw mdev driver to split private data from parent structure, better aligning with the mdev lifecycle and allowing us to remove a temporary workaround (Eric Farman) - Add an interface to get an estimated migration data size for a device, allowing userspace to make informed decisions, ex. more accurately predicting VM downtime (Yishai Hadas) - Fix minor typo in vfio/mlx5 array declaration (Yishai Hadas) - Simplify module and Kconfig through consolidating SPAPR/EEH code and config options and folding virqfd module into main vfio module (Jason Gunthorpe) - Fix error path from device_register() across all vfio mdev and sample drivers (Alex Williamson) - Define migration pre-copy interface and implement for vfio/mlx5 devices, allowing portions of the device state to be saved while the device continues operation, towards reducing the stop-copy state size (Jason Gunthorpe, Yishai Hadas, Shay Drory) - Implement pre-copy for hisi_acc devices (Shameer Kolothum) - Fixes to mdpy mdev driver remove path and error path on probe (Shang XiaoJing) - vfio/mlx5 fixes for incorrect return after copy_to_user() fault and incorrect buffer freeing (Dan Carpenter) * tag 'vfio-v6.2-rc1' of https://github.com/awilliam/linux-vfio: (42 commits) vfio/mlx5: error pointer dereference in error handling vfio/mlx5: fix error code in mlx5vf_precopy_ioctl() samples: vfio-mdev: Fix missing pci_disable_device() in mdpy_fb_probe() hisi_acc_vfio_pci: Enable PRE_COPY flag hisi_acc_vfio_pci: Move the dev compatibility tests for early check hisi_acc_vfio_pci: Introduce support for PRE_COPY state transitions hisi_acc_vfio_pci: Add support for precopy IOCTL vfio/mlx5: Enable MIGRATION_PRE_COPY flag vfio/mlx5: Fallback to STOP_COPY upon specific PRE_COPY error vfio/mlx5: Introduce multiple loads vfio/mlx5: Consider temporary end of stream as part of PRE_COPY vfio/mlx5: Introduce vfio precopy ioctl implementation vfio/mlx5: Introduce SW headers for migration states vfio/mlx5: Introduce device transitions of PRE_COPY vfio/mlx5: Refactor to use queue based data chunks vfio/mlx5: Refactor migration file state vfio/mlx5: Refactor MKEY usage vfio/mlx5: Refactor PD usage vfio/mlx5: Enforce a single SAVE command at a time vfio: Extend the device migration protocol with PRE_COPY ...
2022-12-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+1
Pull kvm updates from Paolo Bonzini: "ARM64: - Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. - Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on (see merge commit 382b5b87a97d: "Fix a number of issues with MTE, such as races on the tags being initialised vs the PG_mte_tagged flag as well as the lack of support for VM_SHARED when KVM is involved. Patches from Catalin Marinas and Peter Collingbourne"). - Merge the pKVM shadow vcpu state tracking that allows the hypervisor to have its own view of a vcpu, keeping that state private. - Add support for the PMUv3p5 architecture revision, bringing support for 64bit counters on systems that support it, and fix the no-quite-compliant CHAIN-ed counter support for the machines that actually exist out there. - Fix a handful of minor issues around 52bit VA/PA support (64kB pages only) as a prefix of the oncoming support for 4kB and 16kB pages. - Pick a small set of documentation and spelling fixes, because no good merge window would be complete without those. s390: - Second batch of the lazy destroy patches - First batch of KVM changes for kernel virtual != physical address support - Removal of a unused function x86: - Allow compiling out SMM support - Cleanup and documentation of SMM state save area format - Preserve interrupt shadow in SMM state save area - Respond to generic signals during slow page faults - Fixes and optimizations for the non-executable huge page errata fix. - Reprogram all performance counters on PMU filter change - Cleanups to Hyper-V emulation and tests - Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest running on top of a L1 Hyper-V hypervisor) - Advertise several new Intel features - x86 Xen-for-KVM: - Allow the Xen runstate information to cross a page boundary - Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured - Add support for 32-bit guests in SCHEDOP_poll - Notable x86 fixes and cleanups: - One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0). - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few years back when eliminating unnecessary barriers when switching between vmcs01 and vmcs02. - Clean up vmread_error_trampoline() to make it more obvious that params must be passed on the stack, even for x86-64. - Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective of the current guest CPUID. - Fudge around a race with TSC refinement that results in KVM incorrectly thinking a guest needs TSC scaling when running on a CPU with a constant TSC, but no hardware-enumerated TSC frequency. - Advertise (on AMD) that the SMM_CTL MSR is not supported - Remove unnecessary exports Generic: - Support for responding to signals during page faults; introduces new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks Selftests: - Fix an inverted check in the access tracking perf test, and restore support for asserting that there aren't too many idle pages when running on bare metal. - Fix build errors that occur in certain setups (unsure exactly what is unique about the problematic setup) due to glibc overriding static_assert() to a variant that requires a custom message. - Introduce actual atomics for clear/set_bit() in selftests - Add support for pinning vCPUs in dirty_log_perf_test. - Rename the so called "perf_util" framework to "memstress". - Add a lightweight psuedo RNG for guest use, and use it to randomize the access pattern and write vs. read percentage in the memstress tests. - Add a common ucall implementation; code dedup and pre-work for running SEV (and beyond) guests in selftests. - Provide a common constructor and arch hook, which will eventually be used by x86 to automatically select the right hypercall (AMD vs. Intel). - A bunch of added/enabled/fixed selftests for ARM64, covering memslots, breakpoints, stage-2 faults and access tracking. - x86-specific selftest changes: - Clean up x86's page table management. - Clean up and enhance the "smaller maxphyaddr" test, and add a related test to cover generic emulation failure. - Clean up the nEPT support checks. - Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values. - Fix an ordering issue in the AMX test introduced by recent conversions to use kvm_cpu_has(), and harden the code to guard against similar bugs in the future. Anything that tiggers caching of KVM's supported CPUID, kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if the caching occurs before the test opts in via prctl(). Documentation: - Remove deleted ioctls from documentation - Clean up the docs for the x86 MSR filter. - Various fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits) KVM: x86: Add proper ReST tables for userspace MSR exits/flags KVM: selftests: Allocate ucall pool from MEM_REGION_DATA KVM: arm64: selftests: Align VA space allocator with TTBR0 KVM: arm64: Fix benign bug with incorrect use of VA_BITS KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow KVM: x86: Advertise that the SMM_CTL MSR is not supported KVM: x86: remove unnecessary exports KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic" tools: KVM: selftests: Convert clear/set_bit() to actual atomics tools: Drop "atomic_" prefix from atomic test_and_set_bit() tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests perf tools: Use dedicated non-atomic clear/set bit helpers tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers KVM: arm64: selftests: Enable single-step without a "full" ucall() KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself KVM: Remove stale comment about KVM_REQ_UNHALT KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR KVM: Reference to kvm_userspace_memory_region in doc and comments KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl ...
2022-12-02vfio-iommufd: Support iommufd for emulated VFIO devicesJason Gunthorpe1-0/+3
Emulated VFIO devices are calling vfio_register_emulated_iommu_dev() and consist of all the mdev drivers. Like the physical drivers, support for iommufd is provided by the driver supplying the correct standard ops. Provide ops from the core that duplicate what vfio_register_emulated_iommu_dev() does. Emulated drivers are where it is more likely to see variation in the iommfd support ops. For instance IDXD will probably need to setup both a iommfd_device context linked to a PASID and an iommufd_access context to support all their mdev operations. Link: https://lore.kernel.org/r/7-v4-42cd2eb0e3eb+335a-vfio_iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02vfio/ap: Validate iova during dma_unmap and trigger irq disableMatthew Rosato1-1/+17
Currently, each mapped iova is stashed in its associated vfio_ap_queue; when we get an unmap request, validate that it matches with one or more of these stashed values before attempting unpins. Each stashed iova represents IRQ that was enabled for a queue. Therefore, if a match is found, trigger IRQ disable for this queue to ensure that underlying firmware will no longer try to use the associated pfn after the page is unpinned. IRQ disable will also handle the associated unpin. Link: https://lore.kernel.org/r/20221202135402.756470-3-yi.l.liu@intel.com Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-23s390/vfio-ap: GISA: sort out physical vs virtual pointers usageNico Boehr1-1/+1
Fix virtual vs physical address confusion (which currently are the same) for the GISA when enabling the IRQ. Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20221118100429.70453-1-nrb@linux.ibm.com Message-Id: <20221118100429.70453-1-nrb@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>