summaryrefslogtreecommitdiff
path: root/drivers/vfio/vfio.c
AgeCommit message (Collapse)AuthorFilesLines
2021-11-30vfio: remove all kernel-doc notationRandy Dunlap1-14/+14
vfio.c abuses (misuses) "/**", which indicates the beginning of kernel-doc notation in the kernel tree. This causes a bunch of kernel-doc complaints about this source file, so quieten all of them by changing all "/**" to "/*". vfio.c:236: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * IOMMU driver registration vfio.c:236: warning: missing initial short description on line: * IOMMU driver registration vfio.c:295: warning: expecting prototype for Container objects(). Prototype was for vfio_container_get() instead vfio.c:317: warning: expecting prototype for Group objects(). Prototype was for __vfio_group_get_from_iommu() instead vfio.c:496: warning: Function parameter or member 'device' not described in 'vfio_device_put' vfio.c:496: warning: expecting prototype for Device objects(). Prototype was for vfio_device_put() instead vfio.c:599: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Async device support vfio.c:599: warning: missing initial short description on line: * Async device support vfio.c:693: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * VFIO driver API vfio.c:693: warning: missing initial short description on line: * VFIO driver API vfio.c:835: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Get a reference to the vfio_device for a device. Even if the vfio.c:835: warning: missing initial short description on line: * Get a reference to the vfio_device for a device. Even if the vfio.c:969: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * VFIO base fd, /dev/vfio/vfio vfio.c:969: warning: missing initial short description on line: * VFIO base fd, /dev/vfio/vfio vfio.c:1187: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * VFIO Group fd, /dev/vfio/$GROUP vfio.c:1187: warning: missing initial short description on line: * VFIO Group fd, /dev/vfio/$GROUP vfio.c:1540: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * VFIO Device fd vfio.c:1540: warning: missing initial short description on line: * VFIO Device fd vfio.c:1615: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * External user API, exported by symbols to be linked dynamically. vfio.c:1615: warning: missing initial short description on line: * External user API, exported by symbols to be linked dynamically. vfio.c:1663: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * External user API, exported by symbols to be linked dynamically. vfio.c:1663: warning: missing initial short description on line: * External user API, exported by symbols to be linked dynamically. vfio.c:1742: warning: Function parameter or member 'caps' not described in 'vfio_info_cap_add' vfio.c:1742: warning: Function parameter or member 'size' not described in 'vfio_info_cap_add' vfio.c:1742: warning: Function parameter or member 'id' not described in 'vfio_info_cap_add' vfio.c:1742: warning: Function parameter or member 'version' not described in 'vfio_info_cap_add' vfio.c:1742: warning: expecting prototype for Sub(). Prototype was for vfio_info_cap_add() instead vfio.c:2276: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Module/class support vfio.c:2276: warning: missing initial short description on line: * Module/class support Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: kvm@vger.kernel.org Link: https://lore.kernel.org/r/38a9cb92-a473-40bf-b8f9-85cc5cfc2da4@infradead.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-15vfio: Use cdev_device_add() instead of device_create()Jason Gunthorpe1-101/+92
Modernize how vfio is creating the group char dev and sysfs presence. These days drivers with state should use cdev_device_add() and cdev_device_del() to manage the cdev and sysfs lifetime. This API requires the driver to put the struct device and struct cdev inside its state struct (vfio_group), and then use the usual device_initialize()/cdev_device_add()/cdev_device_del() sequence. Split the code to make this possible: - vfio_group_alloc()/vfio_group_release() are pair'd functions to alloc/free the vfio_group. release is done under the struct device kref. - vfio_create_group()/vfio_group_put() are pairs that manage the sysfs/cdev lifetime. Once the uses count is zero the vfio group's userspace presence is destroyed. - The IDR is replaced with an IDA. container_of(inode->i_cdev) is used to get back to the vfio_group during fops open. The IDA assigns unique minor numbers. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-15vfio: Use a refcount_t instead of a kref in the vfio_groupJason Gunthorpe1-12/+9
The next patch adds a struct device to the struct vfio_group, and it is confusing/bad practice to have two krefs in the same struct. This kref is controlling the period when the vfio_group is registered in sysfs, and visible in the internal lookup. Switch it to a refcount_t instead. The refcount_dec_and_mutex_lock() is still required because we need atomicity of the list searches and sysfs presence. Reviewed-by: Liu Yi L <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-15vfio: Don't leak a group reference if the group already existsJason Gunthorpe1-14/+8
If vfio_create_group() searches the group list and returns an already existing group it does not put back the iommu_group reference that the caller passed in. Change the semantic of vfio_create_group() to not move the reference in from the caller, but instead obtain a new reference inside and leave the caller's reference alone. The two callers must now call iommu_group_put(). This is an unlikely race as the only caller that could hit it has already searched the group list before attempting to create the group. Fixes: cba3345cc494 ("vfio: VFIO core") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-15vfio: Do not open code the group list search in vfio_create_group()Jason Gunthorpe1-25/+30
Split vfio_group_get_from_iommu() into __vfio_group_get_from_iommu() so that vfio_create_group() can call it to consolidate this duplicated code. Reviewed-by: Liu Yi L <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-15vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()Jason Gunthorpe1-85/+15
iommu_group_register_notifier()/iommu_group_unregister_notifier() are built using a blocking_notifier_chain which integrates a rwsem. The notifier function cannot be running outside its registration. When considering how the notifier function interacts with create/destroy of the group there are two fringe cases, the notifier starts before list_add(&vfio.group_list) and the notifier runs after the kref becomes 0. Prior to vfio_create_group() unlocking and returning we have container_users == 0 device_list == empty And this cannot change until the mutex is unlocked. After the kref goes to zero we must also have container_users == 0 device_list == empty Both are required because they are balanced operations and a 0 kref means some caller became unbalanced. Add the missing assertion that container_users must be zero as well. These two facts are important because when checking each operation we see: - IOMMU_GROUP_NOTIFY_ADD_DEVICE Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev() 0 container_users ends the call - IOMMU_GROUP_NOTIFY_BOUND_DRIVER 0 container_users ends the call Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes items from the unbound list. During creation this list is empty, during kref == 0 nothing can read this list, and it will be freed soon. Since the vfio_group_release() doesn't hold the appropriate lock to manipulate the unbound_list and could race with the notifier, move the cleanup to directly before the kfree. This allows deleting all of the deferred group put code. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Liu Yi L <yi.l.liu@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: clean up the check for mediated device in vfio_iommu_type1Christoph Hellwig1-27/+5
Pass the group flags to ->attach_group and remove the messy check for the bus type. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-12-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: move the vfio_iommu_driver_ops interface out of <linux/vfio.h>Christoph Hellwig1-0/+1
Create a new private drivers/vfio/vfio.h header for the interface between the VFIO core and the iommu drivers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-10-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: remove unused method from vfio_iommu_driver_opsChristoph Hellwig1-50/+0
The read, write and mmap methods are never implemented, so remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-9-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: simplify iommu group allocation for mediated devicesChristoph Hellwig1-26/+66
Reuse the logic in vfio_noiommu_group_alloc to allocate a fake single-device iommu group for mediated devices by factoring out a common function, and replacing the noiommu boolean field in struct vfio_group with an enum to distinguish the three different kinds of groups. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-8-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: remove the iommudata hack for noiommu groupsChristoph Hellwig1-13/+8
Just pass a noiommu argument to vfio_create_group and set up the ->noiommu flag directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-7-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: refactor noiommu group creationChristoph Hellwig1-52/+52
Split the actual noiommu group creation from vfio_iommu_group_get into a new helper, and open code the rest of vfio_iommu_group_get in its only caller. This creates an entirely separate and clear code path for the noiommu group creation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-6-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: factor out a vfio_group_find_or_alloc helperChristoph Hellwig1-25/+35
Factor out a helper to find or allocate the vfio_group to reduce the spagetthi code in vfio_register_group_dev a little. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-5-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: remove the iommudata check in vfio_noiommu_attach_groupChristoph Hellwig1-1/+1
vfio_noiommu_attach_group has two callers: 1) __vfio_container_attach_groups is called by vfio_ioctl_set_iommu, which just called vfio_iommu_driver_allowed 2) vfio_group_set_container requires already checks ->noiommu on the vfio_group, which is propagated from the iommudata in vfio_create_group so this check is entirely superflous and can be removed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-4-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: factor out a vfio_iommu_driver_allowed helperChristoph Hellwig1-14/+19
Factor out a little helper to make the checks for the noiommu driver less ugly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-3-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-09-30vfio: Move vfio_iommu_group_get() to vfio_register_group_dev()Jason Gunthorpe1-22/+14
We don't need to hold a reference to the group in the driver as well as obtain a reference to the same group as the first thing vfio_register_group_dev() does. Since the drivers never use the group move this all into the core code. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20210924155705.4258-2-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11vfio: Remove struct vfio_device_ops open/releaseJason Gunthorpe1-13/+1
Nothing uses this anymore, delete it. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/14-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11vfio: Provide better generic support for open/release vfio_device_opsJason Gunthorpe1-22/+127
Currently the driver ops have an open/release pair that is called once each time a device FD is opened or closed. Add an additional set of open/close_device() ops which are called when the device FD is opened for the first time and closed for the last time. An analysis shows that all of the drivers require this semantic. Some are open coding it as part of their reflck implementation, and some are just buggy and miss it completely. To retain the current semantics PCI and FSL depend on, introduce the idea of a "device set" which is a grouping of vfio_device's that share the same lock around opening. The device set is established by providing a 'set_id' pointer. All vfio_device's that provide the same pointer will be joined to the same singleton memory and lock across the whole set. This effectively replaces the oddly named reflck. After conversion the set_id will be sourced from: - A struct device from a fsl_mc_device (fsl) - A struct pci_slot (pci) - A struct pci_bus (pci) - The struct vfio_device (everything) The design ensures that the above pointers are live as long as the vfio_device is registered, so they form reliable unique keys to group vfio_devices into sets. This implementation uses xarray instead of searching through the driver core structures, which simplifies the somewhat tricky locking in this area. Following patches convert all the drivers. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-11vfio: Introduce a vfio_uninit_group_dev() API callMax Gurtovoy1-0/+5
This pairs with vfio_init_group_dev() and allows undoing any state that is stored in the vfio_device unrelated to registration. Add appropriately placed calls to all the drivers. The following patch will use this to add pre-registration state for the device set. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-06-15vfio: centralize module refcount in subsystem layerMax Gurtovoy1-0/+10
Remove code duplication and move module refcounting to the subsystem module. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20210518192133.59195-2-mgurtovoy@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio: Remove device_data from the vfio bus driver APIJason Gunthorpe1-11/+1
There are no longer any users, so it can go away. Everything is using container_of now. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <14-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio: Make vfio_device_ops pass a 'struct vfio_device *' instead of 'void *'Jason Gunthorpe1-10/+10
This is the standard kernel pattern, the ops associated with a struct get the struct pointer in for typesafety. The expected design is to use container_of to cleanly go from the subsystem level type to the driver level type without having any type erasure in a void *. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <12-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/mdev: Use vfio_init/register/unregister_group_devJason Gunthorpe1-37/+2
mdev gets little benefit because it doesn't actually do anything, however it is the last user, so move the vfio_init/register/unregister_group_dev() code here for now. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Liu Yi L <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <10-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio: Split creation of a vfio_device into init and register opsJason Gunthorpe1-60/+65
This makes the struct vfio_device part of the public interface so it can be used with container_of and so forth, as is typical for a Linux subystem. This is the first step to bring some type-safety to the vfio interface by allowing the replacement of 'void *' and 'struct device *' inputs with a simple and clear 'struct vfio_device *' For now the self-allocating vfio_add_group_dev() interface is kept so each user can be updated as a separate patch. The expected usage pattern is driver core probe() function: my_device = kzalloc(sizeof(*mydevice)); vfio_init_group_dev(&my_device->vdev, dev, ops, mydevice); /* other driver specific prep */ vfio_register_group_dev(&my_device->vdev); dev_set_drvdata(dev, my_device); driver core remove() function: my_device = dev_get_drvdata(dev); vfio_unregister_group_dev(&my_device->vdev); /* other driver specific tear down */ kfree(my_device); Allowing the driver to be able to use the drvdata and vfio_device to go to/from its own data. The pattern also makes it clear that vfio_register_group_dev() must be last in the sequence, as once it is called the core code can immediately start calling ops. The init/register gap is provided to allow for the driver to do setup before ops can be called and thus avoid races. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Liu Yi L <yi.l.liu@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <3-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio: Simplify the lifetime logic for vfio_deviceJason Gunthorpe1-54/+25
The vfio_device is using a 'sleep until all refs go to zero' pattern for its lifetime, but it is indirectly coded by repeatedly scanning the group list waiting for the device to be removed on its own. Switch this around to be a direct representation, use a refcount to count the number of places that are blocking destruction and sleep directly on a completion until that counter goes to zero. kfree the device after other accesses have been excluded in vfio_del_group_dev(). This is a fairly common Linux idiom. Due to this we can now remove kref_put_mutex(), which is very rarely used in the kernel. Here it is being used to prevent a zero ref device from being seen in the group list. Instead allow the zero ref device to continue to exist in the device_list and use refcount_inc_not_zero() to exclude it once refs go to zero. This patch is organized so the next patch will be able to alter the API to allow drivers to provide the kfree. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <2-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio: Remove extra put/gets around vfio_device->groupJason Gunthorpe1-19/+2
The vfio_device->group value has a get obtained during vfio_add_group_dev() which gets moved from the stack to vfio_device->group in vfio_group_create_device(). The reference remains until we reach the end of vfio_del_group_dev() when it is put back. Thus anything that already has a kref on the vfio_device is guaranteed a valid group pointer. Remove all the extra reference traffic. It is tricky to see, but the get at the start of vfio_del_group_dev() is actually pairing with the put hidden inside vfio_device_put() a few lines below. A later patch merges vfio_group_create_device() into vfio_add_group_dev() which makes the ownership and error flow on the create side easier to follow. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <1-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-01vfio: iommu driver notify callbackSteve Sistare1-0/+5
Define a vfio_iommu_driver_ops notify callback, for sending events to the driver. Drivers are not required to provide the callback, and may ignore any events. The handling of events is driver specific. Define the CONTAINER_CLOSE event, called when the container's file descriptor is closed. This event signifies that no further state changes will occur via container ioctl's. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-12-11vfio/type1: Add vfio_group_iommu_domain()Lu Baolu1-0/+18
Add the API for getting the domain from a vfio group. This could be used by the physical device drivers which rely on the vfio/mdev framework for mediated device user level access. The typical use case like below: unsigned int pasid; struct vfio_group *vfio_group; struct iommu_domain *iommu_domain; struct device *dev = mdev_dev(mdev); struct device *iommu_device = mdev_get_iommu_device(dev); if (!iommu_device || !iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX)) return -EINVAL; vfio_group = vfio_group_get_external_user_from_dev(dev); if (IS_ERR_OR_NULL(vfio_group)) return -EFAULT; iommu_domain = vfio_group_iommu_domain(vfio_group); if (IS_ERR_OR_NULL(iommu_domain)) { vfio_group_put_external_user(vfio_group); return -EFAULT; } pasid = iommu_aux_get_pasid(iommu_domain, iommu_device); if (pasid < 0) { vfio_group_put_external_user(vfio_group); return -EFAULT; } /* Program device context with pasid value. */ ... Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-22vfio: fix a missed vfio group put in vfio_pin_pagesYan Zhao1-2/+4
When error occurs, need to put vfio group after a successful get. Fixes: 95fc87b44104 ("vfio: Selective dirty page tracking if IOMMU backed device pins pages") Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-21vfio: add a singleton check for vfio_group_pin_pagesYan Zhao1-0/+3
Page pinning is used both to translate and pin device mappings for DMA purpose, as well as to indicate to the IOMMU backend to limit the dirty page scope to those pages that have been pinned, in the case of an IOMMU backed device. To support this, the vfio_pin_pages() interface limits itself to only singleton groups such that the IOMMU backend can consider dirty page scope only at the group level. Implement the same requirement for the vfio_group_pin_pages() interface. Fixes: 95fc87b44104 ("vfio: Selective dirty page tracking if IOMMU backed device pins pages") Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27vfio: Cleanup allowed driver namingAlex Williamson1-6/+7
No functional change, avoid non-inclusive naming schemes. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-29vfio: Selective dirty page tracking if IOMMU backed device pins pagesKirti Wankhede1-3/+10
Added a check such that only singleton IOMMU groups can pin pages. >From the point when vendor driver pins any pages, consider IOMMU group dirty page scope to be limited to pinned pages. To optimize to avoid walking list often, added flag pinned_page_dirty_scope to indicate if all of the vfio_groups for each vfio_domain in the domain_list dirty page scope is limited to pinned pages. This flag is updated on first pinned pages request for that IOMMU group and on attaching/detaching group. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24Merge branches 'v5.7/vfio/alex-sriov-v3' and 'v5.7/vfio/yan-dma-rw-v4' into ↵Alex Williamson1-4/+194
v5.7/vfio/next
2020-03-24vfio: Include optional device match in vfio_device_ops callbacksAlex Williamson1-4/+16
Allow bus drivers to provide their own callback to match a device to the user provided string. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24vfio: avoid inefficient operations on VFIO group in vfio_pin/unpin_pagesYan Zhao1-0/+91
vfio_group_pin_pages() and vfio_group_unpin_pages() are introduced to avoid inefficient search/check/ref/deref opertions associated with VFIO group as those in each calling into vfio_pin_pages() and vfio_unpin_pages(). VFIO group is taken as arg directly. The callers combine search/check/ref/deref operations associated with VFIO group by calling vfio_group_get_external_user()/vfio_group_get_external_user_from_dev() beforehand, and vfio_group_put_external_user() afterwards. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24vfio: introduce vfio_dma_rw to read/write a range of IOVAsYan Zhao1-0/+49
vfio_dma_rw will read/write a range of user space memory pointed to by IOVA into/from a kernel buffer without enforcing pinning the user space memory. TODO: mark the IOVAs to user space memory dirty if they are written in vfio_dma_rw(). Cc: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-03-24vfio: allow external user to get vfio group from deviceYan Zhao1-0/+38
external user calls vfio_group_get_external_user_from_dev() with a device pointer to get the VFIO group associated with this device. The VFIO group is checked to be vialbe and have IOMMU set. Then container user counter is increased and VFIO group reference is hold to prevent the VFIO group from disposal before external user exits. when the external user finishes using of the VFIO group, it calls vfio_group_put_external_user() to dereference the VFIO group and the container user counter. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-10-23compat_ioctl: move drivers to compat_ptr_ioctlArnd Bergmann1-36/+3
Each of these drivers has a copy of the same trivial helper function to convert the pointer argument and then call the native ioctl handler. We now have a generic implementation of that, so use it. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner1-4/+1
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-23vfio: Fix WARNING "do not call blocking ops when !TASK_RUNNING"Farhan Ali1-20/+10
vfio_dev_present() which is the condition to wait_event_interruptible_timeout(), will call vfio_group_get_device and try to acquire the mutex group->device_lock. wait_event_interruptible_timeout() will set the state of the current task to TASK_INTERRUPTIBLE, before doing the condition check. This means that we will try to acquire the mutex while already in a sleeping state. The scheduler warns us by giving the following warning: [ 4050.264464] ------------[ cut here ]------------ [ 4050.264508] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b33c00e2>] prepare_to_wait_event+0x14a/0x188 [ 4050.264529] WARNING: CPU: 12 PID: 35924 at kernel/sched/core.c:6112 __might_sleep+0x76/0x90 .... 4050.264756] Call Trace: [ 4050.264765] ([<000000000017bbaa>] __might_sleep+0x72/0x90) [ 4050.264774] [<0000000000b97edc>] __mutex_lock+0x44/0x8c0 [ 4050.264782] [<0000000000b9878a>] mutex_lock_nested+0x32/0x40 [ 4050.264793] [<000003ff800d7abe>] vfio_group_get_device+0x36/0xa8 [vfio] [ 4050.264803] [<000003ff800d87c0>] vfio_del_group_dev+0x238/0x378 [vfio] [ 4050.264813] [<000003ff8015f67c>] mdev_remove+0x3c/0x68 [mdev] [ 4050.264825] [<00000000008e01b0>] device_release_driver_internal+0x168/0x268 [ 4050.264834] [<00000000008de692>] bus_remove_device+0x162/0x190 [ 4050.264843] [<00000000008daf42>] device_del+0x1e2/0x368 [ 4050.264851] [<00000000008db12c>] device_unregister+0x64/0x88 [ 4050.264862] [<000003ff8015ed84>] mdev_device_remove+0xec/0x130 [mdev] [ 4050.264872] [<000003ff8015f074>] remove_store+0x6c/0xa8 [mdev] [ 4050.264881] [<000000000046f494>] kernfs_fop_write+0x14c/0x1f8 [ 4050.264890] [<00000000003c1530>] __vfs_write+0x38/0x1a8 [ 4050.264899] [<00000000003c187c>] vfs_write+0xb4/0x198 [ 4050.264908] [<00000000003c1af2>] ksys_write+0x5a/0xb0 [ 4050.264916] [<0000000000b9e270>] system_call+0xdc/0x2d8 [ 4050.264925] 4 locks held by sh/35924: [ 4050.264933] #0: 000000001ef90325 (sb_writers#4){.+.+}, at: vfs_write+0x9e/0x198 [ 4050.264948] #1: 000000005c1ab0b3 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cc/0x1f8 [ 4050.264963] #2: 0000000034831ab8 (kn->count#297){++++}, at: kernfs_remove_self+0x12e/0x150 [ 4050.264979] #3: 00000000e152484f (&dev->mutex){....}, at: device_release_driver_internal+0x5c/0x268 [ 4050.264993] Last Breaking-Event-Address: [ 4050.265002] [<000000000017bbaa>] __might_sleep+0x72/0x90 [ 4050.265010] irq event stamp: 7039 [ 4050.265020] hardirqs last enabled at (7047): [<00000000001cee7a>] console_unlock+0x6d2/0x740 [ 4050.265029] hardirqs last disabled at (7054): [<00000000001ce87e>] console_unlock+0xd6/0x740 [ 4050.265040] softirqs last enabled at (6416): [<0000000000b8fe26>] __udelay+0xb6/0x100 [ 4050.265049] softirqs last disabled at (6415): [<0000000000b8fe06>] __udelay+0x96/0x100 [ 4050.265057] ---[ end trace d04a07d39d99a9f9 ]--- Let's fix this as described in the article https://lwn.net/Articles/628628/. Signed-off-by: Farhan Ali <alifm@linux.ibm.com> [remove now redundant vfio_dev_present()] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-04-22vfio: Use dev_printk() when possibleBjorn Helgaas1-16/+13
Use dev_printk() when possible to make messages consistent with other device-related messages. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-12vfio: expand minor range when registering chrdev regionChengguang Xu1-4/+4
Actually, total amount of available minor number for a single major is MINORMARK + 1. So expand minor range when registering chrdev region. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-08vfio: use match_string() helperYisheng Xie1-8/+3
match_string() returns the index of an array for a matching string, which can be used intead of open coded variant. Cc: Alex Williamson <alex.williamson@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-12-20vfio: Simplify capability helperAlex Williamson1-48/+4
The vfio_info_add_capability() helper requires the caller to pass a capability ID, which it then uses to fill in header fields, assuming hard coded versions. This makes for an awkward and rigid interface. The only thing we want this helper to do is allocate sufficient space in the caps buffer and chain this capability into the list. Reduce it to that simple task. Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-10-25locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns ↵Mark Rutland1-1/+1
to READ_ONCE()/WRITE_ONCE() Please do not apply this to mainline directly, instead please re-run the coccinelle script shown below and apply its output. For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't harmful, and changing them results in churn. However, for some features, the read/write distinction is critical to correct operation. To distinguish these cases, separate read/write accessors must be used. This patch migrates (most) remaining ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following coccinelle script: ---- // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and // WRITE_ONCE() // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch virtual patch @ depends on patch @ expression E1, E2; @@ - ACCESS_ONCE(E1) = E2 + WRITE_ONCE(E1, E2) @ depends on patch @ expression E; @@ - ACCESS_ONCE(E) + READ_ONCE(E) ---- Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: linux-arch@vger.kernel.org Cc: mpe@ellerman.id.au Cc: shuah@kernel.org Cc: snitzer@redhat.com Cc: thor.thayer@linux.intel.com Cc: tj@kernel.org Cc: viro@zeniv.linux.org.uk Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-30vfio: Stall vfio_del_group_dev() for container group detachAlex Williamson1-0/+20
When the user unbinds the last device of a group from a vfio bus driver, the devices within that group should be available for other purposes. We currently have a race that makes this generally, but not always true. The device can be unbound from the vfio bus driver, but remaining IOMMU context of the group attached to the container can result in errors as the next driver configures DMA for the device. Wait for the group to be detached from the IOMMU backend before allowing the bus driver remove callback to complete. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-08-30vfio: fix noiommu vfio_iommu_group_get reference countEric Auger1-2/+3
In vfio_iommu_group_get() we want to increase the reference count of the iommu group. In noiommu case, the group does not exist and is allocated. iommu_group_add_device() increases the group ref count. However we then call iommu_group_put() which decrements it. This leads to a "refcount_t: underflow WARN_ON". Only decrement the ref count in case of iommu_group_add_device failure. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-07-08vfio: Remove unnecessary uses of vfio_container.group_lockAlex Williamson1-38/+0
The original intent of vfio_container.group_lock is to protect vfio_container.group_list, however over time it's become a crutch to prevent changes in container composition any time we call into the iommu driver backend. This introduces problems when we start to have more complex interactions, for example when a user's DMA unmap request triggers a notification to an mdev vendor driver, who responds by attempting to unpin mappings within that request, re-entering the iommu backend. We incorrectly assume that the use of read-locks here allow for this nested locking behavior, but a poorly timed write-lock could in fact trigger a deadlock. The current use of group_lock seems to fall into the trap of locking code, not data. Correct that by removing uses of group_lock that are not directly related to group_list. Note that the vfio type1 iommu backend has its own mutex, vfio_iommu.lock, which it uses to protect itself for each of these interfaces anyway. The group_lock appears to be a redundancy for these interfaces and type1 even goes so far as to release its mutex to allow for exactly the re-entrant code path above. Reported-by: Chuanxiao Dong <chuanxiao.dong@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: stable@vger.kernel.org # v4.10+
2017-06-28vfio: New external user group/file matchAlex Williamson1-0/+9
At the point where the kvm-vfio pseudo device wants to release its vfio group reference, we can't always acquire a new reference to make that happen. The group can be in a state where we wouldn't allow a new reference to be added. This new helper function allows a caller to match a file to a group to facilitate this. Given a file and group, report if they match. Thus the caller needs to already have a group reference to match to the file. This allows the deletion of a group without acquiring a new reference. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Cc: stable@vger.kernel.org
2017-06-28vfio: Fix group release deadlockAlex Williamson1-1/+36
If vfio_iommu_group_notifier() acquires a group reference and that reference becomes the last reference to the group, then vfio_group_put introduces a deadlock code path where we're trying to unregister from the iommu notifier chain from within a callout of that chain. Use a work_struct to release this reference asynchronously. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Cc: stable@vger.kernel.org