summaryrefslogtreecommitdiff
path: root/drivers/iommu/iommufd/Makefile
AgeCommit message (Collapse)AuthorFilesLines
2023-10-24vfio: Move iova_bitmap into iommufdJoao Martins1-0/+1
Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking the user bitmaps, so move to the common dependency into IOMMUFD. In doing so, create the symbol IOMMUFD_DRIVER which designates the builtin code that will be used by drivers when selected. Today this means MLX5_VFIO_PCI and PDS_VFIO_PCI. IOMMU drivers will do the same (in future patches) when supporting dirty tracking and select IOMMUFD_DRIVER accordingly. Given that the symbol maybe be disabled, add header definitions in iova_bitmap.h for when IOMMUFD_DRIVER=n Link: https://lore.kernel.org/r/20231024135109.73787-3-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01iommufd: Add kernel support for testing iommufdJason Gunthorpe1-0/+2
Provide a mock kernel module for the iommu_domain that allows it to run without any HW and the mocking provides a way to directly validate that the PFNs loaded into the iommu_domain are correct. This exposes the access kAPI toward userspace to allow userspace to explore the functionality of pages.c and io_pagetable.c The mock also simulates the rare case of PAGE_SIZE > iommu page size as the mock will operate at a 2K iommu page size. This allows exercising all of the calculations to support this mismatch. This is also intended to support syzkaller exploring the same space. However, it is an unusually invasive config option to enable all of this. The config option should not be enabled in a production kernel. Link: https://lore.kernel.org/r/16-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390 Tested-by: Eric Auger <eric.auger@redhat.com> # aarch64 Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01iommufd: vfio container FD ioctl compatibilityJason Gunthorpe1-1/+2
iommufd can directly implement the /dev/vfio/vfio container IOCTLs by mapping them into io_pagetable operations. A userspace application can test against iommufd and confirm compatibility then simply make a small change to open /dev/iommu instead of /dev/vfio/vfio. For testing purposes /dev/vfio/vfio can be symlinked to /dev/iommu and then all applications will use the compatibility path with no code changes. A later series allows /dev/vfio/vfio to be directly provided by iommufd, which allows the rlimit mode to work the same as well. This series just provides the iommufd side of compatibility. Actually linking this to VFIO_SET_CONTAINER is a followup series, with a link in the cover letter. Internally the compatibility API uses a normal IOAS object that, like vfio, is automatically allocated when the first device is attached. Userspace can also query or set this IOAS object directly using the IOMMU_VFIO_IOAS ioctl. This allows mixing and matching new iommufd only features while still using the VFIO style map/unmap ioctls. While this is enough to operate qemu, it has a few differences: - Resource limits rely on memory cgroups to bound what userspace can do instead of the module parameter dma_entry_limit. - VFIO P2P is not implemented. The DMABUF patches for vfio are a start at a solution where iommufd would import a special DMABUF. This is to avoid further propogating the follow_pfn() security problem. - A full audit for pedantic compatibility details (eg errnos, etc) has not yet been done - powerpc SPAPR is left out, as it is not connected to the iommu_domain framework. It seems interest in SPAPR is minimal as it is currently non-working in v6.1-rc1. They will have to convert to the iommu subsystem framework to enjoy iommfd. The following are not going to be implemented and we expect to remove them from VFIO type1: - SW access 'dirty tracking'. As discussed in the cover letter this will be done in VFIO. - VFIO_TYPE1_NESTING_IOMMU https://lore.kernel.org/all/0-v1-0093c9b0e345+19-vfio_no_nesting_jgg@nvidia.com/ - VFIO_DMA_MAP_FLAG_VADDR https://lore.kernel.org/all/Yz777bJZjTyLrHEQ@nvidia.com/ Link: https://lore.kernel.org/r/15-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01iommufd: Add kAPI toward external drivers for physical devicesJason Gunthorpe1-0/+1
Add the four functions external drivers need to connect physical DMA to the IOMMUFD: iommufd_device_bind() / iommufd_device_unbind() Register the device with iommufd and establish security isolation. iommufd_device_attach() / iommufd_device_detach() Connect a bound device to a page table Binding a device creates a device object ID in the uAPI, however the generic API does not yet provide any IOCTLs to manipulate them. Link: https://lore.kernel.org/r/13-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01iommufd: Add a HW pagetable objectJason Gunthorpe1-0/+1
The hw_pagetable object exposes the internal struct iommu_domain's to userspace. An iommu_domain is required when any DMA device attaches to an IOAS to control the io page table through the iommu driver. For compatibility with VFIO the hw_pagetable is automatically created when a DMA device is attached to the IOAS. If a compatible iommu_domain already exists then the hw_pagetable associated with it is used for the attachment. In the initial series there is no iommufd uAPI for the hw_pagetable object. The next patch provides driver facing APIs for IO page table attachment that allows drivers to accept either an IOAS or a hw_pagetable ID and for the driver to return the hw_pagetable ID that was auto-selected from an IOAS. The expectation is the driver will provide uAPI through its own FD for attaching its device to iommufd. This allows userspace to learn the mapping of devices to iommu_domains and to override the automatic attachment. The future HW specific interface will allow userspace to create hw_pagetable objects using iommu_domains with IOMMU driver specific parameters. This infrastructure will allow linking those domains to IOAS's and devices. Link: https://lore.kernel.org/r/12-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01iommufd: IOCTLs for the io_pagetableJason Gunthorpe1-0/+1
Connect the IOAS to its IOCTL interface. This exposes most of the functionality in the io_pagetable to userspace. This is intended to be the core of the generic interface that IOMMUFD will provide. Every IOMMU driver should be able to implement an iommu_domain that is compatible with this generic mechanism. It is also designed to be easy to use for simple non virtual machine monitor users, like DPDK: - Universal simple support for all IOMMUs (no PPC special path) - An IOVA allocator that considers the aperture and the allowed/reserved ranges - io_pagetable allows any number of iommu_domains to be connected to the IOAS - Automatic allocation and re-use of iommu_domains Along with room in the design to add non-generic features to cater to specific HW functionality. Link: https://lore.kernel.org/r/11-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01iommufd: Data structure to provide IOVA to PFN mappingJason Gunthorpe1-0/+1
This is the remainder of the IOAS data structure. Provide an object called an io_pagetable that is composed of iopt_areas pointing at iopt_pages, along with a list of iommu_domains that mirror the IOVA to PFN map. At the top this is a simple interval tree of iopt_areas indicating the map of IOVA to iopt_pages. An xarray keeps track of a list of domains. Based on the attached domains there is a minimum alignment for areas (which may be smaller than PAGE_SIZE), an interval tree of reserved IOVA that can't be mapped and an IOVA of allowed IOVA that can always be mappable. The concept of an 'access' refers to something like a VFIO mdev that is accessing the IOVA and using a 'struct page *' for CPU based access. Externally an API is provided that matches the requirements of the IOCTL interface for map/unmap and domain attachment. The API provides a 'copy' primitive to establish a new IOVA map in a different IOAS from an existing mapping by re-using the iopt_pages. This is the basic mechanism to provide single pinning. This is designed to support a pre-registration flow where userspace would setup an dummy IOAS with no domains, map in memory and then establish an access to pin all PFNs into the xarray. Copy can then be used to create new IOVA mappings in a different IOAS, with iommu_domains attached. Upon copy the PFNs will be read out of the xarray and mapped into the iommu_domains, avoiding any pin_user_pages() overheads. Link: https://lore.kernel.org/r/10-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01iommufd: PFN handling for iopt_pagesJason Gunthorpe1-1/+2
The top of the data structure provides an IO Address Space (IOAS) that is similar to a VFIO container. The IOAS allows map/unmap of memory into ranges of IOVA called iopt_areas. Multiple IOMMU domains (IO page tables) and in-kernel accesses (like VFIO mdevs) can be attached to the IOAS to access the PFNs that those IOVA areas cover. The IO Address Space (IOAS) datastructure is composed of: - struct io_pagetable holding the IOVA map - struct iopt_areas representing populated portions of IOVA - struct iopt_pages representing the storage of PFNs - struct iommu_domain representing each IO page table in the system IOMMU - struct iopt_pages_access representing in-kernel accesses of PFNs (ie VFIO mdevs) - struct xarray pinned_pfns holding a list of pages pinned by in-kernel accesses This patch introduces the lowest part of the datastructure - the movement of PFNs in a tiered storage scheme: 1) iopt_pages::pinned_pfns xarray 2) Multiple iommu_domains 3) The origin of the PFNs, i.e. the userspace pointer PFN have to be copied between all combinations of tiers, depending on the configuration. The interface is an iterator called a 'pfn_reader' which determines which tier each PFN is stored and loads it into a list of PFNs held in a struct pfn_batch. Each step of the iterator will fill up the pfn_batch, then the caller can use the pfn_batch to send the PFNs to the required destination. Repeating this loop will read all the PFNs in an IOVA range. The pfn_reader and pfn_batch also keep track of the pinned page accounting. While PFNs are always stored and accessed as full PAGE_SIZE units the iommu_domain tier can store with a sub-page offset/length to support IOMMUs with a smaller IOPTE size than PAGE_SIZE. Link: https://lore.kernel.org/r/8-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-01iommufd: File descriptor, context, kconfig and makefilesJason Gunthorpe1-0/+5
This is the basic infrastructure of a new miscdevice to hold the iommufd IOCTL API. It provides: - A miscdevice to create file descriptors to run the IOCTL interface over - A table based ioctl dispatch and centralized extendable pre-validation step - An xarray mapping userspace ID's to kernel objects. The design has multiple inter-related objects held within in a single IOMMUFD fd - A simple usage count to build a graph of object relations and protect against hostile userspace racing ioctls The only IOCTL provided in this patch is the generic 'destroy any object by handle' operation. Link: https://lore.kernel.org/r/6-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>