path: root/drivers/vfio
Age  Commit message  (Author; files changed, lines -/+)
2020-10-12  vfio/fsl-mc: trigger an interrupt via eventfd  (Diana Craciun; 3 files, -2/+194)
This patch allows setting an eventfd for fsl-mc device interrupts and also triggering the interrupt eventfd from userspace for testing. All fsl-mc device interrupts are MSIs. The MSIs are allocated from the MSI domain only once per DPRC and used by all the DPAA2 objects. The interrupts are managed by the DPRC in a pool of interrupts; each device requests interrupts from this pool. The pool is allocated when the first virtual device sets up its interrupts, and access to the pool is protected by a lock. The DPRC has an interrupt of its own which indicates whether the DPRC contents have changed. However, the contents of a DPRC assigned to the guest currently cannot be changed at runtime, so this interrupt is not configured. Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com> Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
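For illustration only (not part of the patch), a minimal userspace sketch of attaching an eventfd to one of these interrupts through the standard VFIO_DEVICE_SET_IRQS ioctl; the device fd and interrupt index here are assumptions:

  #include <stdint.h>
  #include <string.h>
  #include <sys/eventfd.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  /* device_fd: an already-opened VFIO device fd; index: IRQ index to wire up. */
  static int set_irq_eventfd(int device_fd, unsigned int index)
  {
          struct vfio_irq_set *irq_set;
          char buf[sizeof(*irq_set) + sizeof(int32_t)];
          int efd = eventfd(0, EFD_CLOEXEC);

          if (efd < 0)
                  return -1;

          irq_set = (struct vfio_irq_set *)buf;
          irq_set->argsz = sizeof(buf);
          irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
          irq_set->index = index;
          irq_set->start = 0;
          irq_set->count = 1;
          memcpy(irq_set->data, &efd, sizeof(int32_t));

          /* The kernel signals efd when the interrupt fires; read(efd, ...) consumes it. */
          return ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set);
  }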
2020-10-12  vfio/fsl-mc: Add irq infrastructure for fsl-mc devices  (Diana Craciun; 4 files, -3/+91)
This patch adds the skeleton for interrupt support for fsl-mc devices. The interrupts are not yet functional; the functionality will be added by subsequent patches. Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com> Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12  vfio/fsl-mc: Added lock support in preparation for interrupt handling  (Diana Craciun; 2 files, -9/+90)
Only the DPRC object allocates interrupts from the MSI interrupt domain. The interrupts are managed by the DPRC in a pool of interrupts. The access to this pool of interrupts has to be protected with a lock. This patch extends the current lock implementation to have a lock per DPRC. Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12  vfio/fsl-mc: Allow userspace to MMAP fsl-mc device MMIO regions  (Diana Craciun; 1 file, -2/+66)
Allow userspace to mmap device regions for direct access of fsl-mc devices. Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com> Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
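A hedged sketch of how userspace might map one of these regions once VFIO_DEVICE_GET_REGION_INFO has reported its offset and size (the helper name is illustrative, not part of the patch):

  #include <sys/mman.h>
  #include <linux/vfio.h>

  /* reg must have been filled in by VFIO_DEVICE_GET_REGION_INFO. */
  static void *map_region(int device_fd, const struct vfio_region_info *reg)
  {
          if (!(reg->flags & VFIO_REGION_INFO_FLAG_MMAP))
                  return MAP_FAILED;      /* region only supports read()/write() access */

          return mmap(NULL, reg->size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, device_fd, reg->offset);
  }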
2020-10-12  vfio/fsl-mc: Implement VFIO_DEVICE_GET_REGION_INFO ioctl call  (Diana Craciun; 2 files, -1/+96)
Expose to userspace information about the memory regions. Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com> Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
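A hedged userspace sketch of querying one region (query_region is an illustrative name):

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  /* index runs from 0 to vfio_device_info.num_regions - 1. */
  static int query_region(int device_fd, unsigned int index,
                          struct vfio_region_info *reg)
  {
          memset(reg, 0, sizeof(*reg));
          reg->argsz = sizeof(*reg);
          reg->index = index;

          /* On success, reg->size, reg->offset and reg->flags describe the region. */
          return ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, reg);
  }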
2020-10-12  vfio/fsl-mc: Implement VFIO_DEVICE_GET_INFO ioctl  (Diana Craciun; 1 file, -1/+20)
Allow userspace to get fsl-mc device info (number of regions and irqs). Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com> Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-12  vfio/fsl-mc: Scan DPRC objects on vfio-fsl-mc driver bind  (Diana Craciun; 2 files, -0/+92)
The DPRC (Data Path Resource Container) device is a bus device and has child devices attached to it. When the vfio-fsl-mc driver is probed the DPRC is scanned and the child devices discovered and initialized. Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com> Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-07  vfio/fsl-mc: Add VFIO framework skeleton for fsl-mc devices  (Bharat Bhushan; 6 files, -0/+186)
DPAA2 (Data Path Acceleration Architecture) consists of mechanisms for processing Ethernet packets, queue management, accelerators, etc. The Management Complex (MC) is a hardware entity that manages the DPAA2 hardware resources. It provides an object-based abstraction for software drivers to use the DPAA2 hardware. The MC mediates operations such as create, discover and destroy of DPAA2 objects. The MC provides memory-mapped I/O command interfaces (MC portals) which DPAA2 software drivers use to operate on DPAA2 objects. A DPRC is a container object that holds other types of DPAA2 objects. Each object in the DPRC is a Linux device and is bound to a driver. The MC-bus driver is a platform driver (different from PCI or platform bus). The DPRC driver does runtime management of a bus instance: it performs the initial scan of the DPRC and handles changes in the DPRC configuration (adding/removing objects). All objects inside a container share the same hardware isolation context, meaning that only an entire DPRC can be assigned to a virtual machine. When a container is assigned to a virtual machine, all the objects within that container are assigned to that virtual machine. The guest is not allowed to change the contents (add/remove objects) of a DPRC container assigned to it; the restriction is set by the host and enforced by the MC hardware. The DPAA2 objects can be directly assigned to the guest. However, the MC portals (the memory-mapped command interface to the MC) need to be emulated, because there are commands that configure the interrupts and the isolation IDs, which are virtual in the guest. Example:
  echo vfio-fsl-mc > /sys/bus/fsl-mc/devices/dprc.2/driver_override
  echo dprc.2 > /sys/bus/fsl-mc/drivers/vfio-fsl-mc/bind
dprc.2 is bound to the VFIO driver and all the objects within dprc.2 are going to be bound to the VFIO driver. This patch adds the infrastructure for VFIO support for fsl-mc devices. Subsequent patches will add support for binding and securely assigning these devices using VFIO. More details about the DPAA2 objects can be found here: Documentation/networking/device_drivers/freescale/dpaa2/overview.rst Signed-off-by: Bharat Bhushan <Bharat.Bhushan@nxp.com> Signed-off-by: Diana Craciun <diana.craciun@oss.nxp.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-22  Merge branches 'v5.10/vfio/bardirty', 'v5.10/vfio/dma_avail', 'v5.10/vfio/misc', 'v5.10/vfio/no-cmd-mem' and 'v5.10/vfio/yan_zhao_fixes' into v5.10/vfio/next  (Alex Williamson; 4 files, -14/+40)
2020-09-22  vfio/type1: fix dirty bitmap calculation in vfio_dma_rw  (Yan Zhao; 1 file, -1/+2)
The count of dirtied pages is not only determined by count of copied pages, but also by the start offset. e.g. if offset = PAGE_SIZE - 1, and *copied=2, the dirty pages count is 2, instead of 1 or 0. Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages tracking") Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
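A hedged helper expressing that calculation with kernel-style macros (illustrative, not the literal patch):

  /* Pages dirtied by copying 'copied' bytes starting at 'offset': with
   * offset = PAGE_SIZE - 1 and copied = 2 the copy straddles a page
   * boundary, so two pages are dirtied even though fewer than PAGE_SIZE
   * bytes were written.
   */
  static unsigned long dirtied_pages(unsigned long offset, size_t copied)
  {
          return DIV_ROUND_UP((offset & ~PAGE_MASK) + copied, PAGE_SIZE);
  }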
2020-09-22  vfio: fix a missed vfio group put in vfio_pin_pages  (Yan Zhao; 1 file, -2/+4)
When an error occurs, we need to put the vfio group after a successful get. Fixes: 95fc87b44104 ("vfio: Selective dirty page tracking if IOMMU backed device pins pages") Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-22  vfio/pci: Decouple PCI_COMMAND_MEMORY bit checks from is_virtfn  (Matthew Rosato; 1 file, -10/+14)
While it is true that devices with is_virtfn=1 will have a Memory Space Enable bit that is hard-wired to 0, this is not the only case where we see this behavior -- For example some bare-metal hypervisors lack Memory Space Enable bit emulation for devices not setting is_virtfn (s390). Fix this by instead checking for the newly-added no_command_memory bit which directly denotes the need for PCI_COMMAND_MEMORY emulation in vfio. Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-21  vfio iommu: Add dma available capability  (Matthew Rosato; 1 file, -0/+17)
Commit 492855939bdb ("vfio/type1: Limit DMA mappings per container") added the ability to limit the number of memory backed DMA mappings. However on s390x, when lazy mapping is in use, we use a very large number of concurrent mappings. Let's provide the current allowable number of DMA mappings to userspace via the IOMMU info chain so that userspace can take appropriate mitigation. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
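A hedged sketch of how userspace might walk the VFIO_IOMMU_GET_INFO capability chain for this value (buffer allocation and ioctl handling elided; the helper name is an assumption):

  #include <linux/vfio.h>

  /* info: the full buffer returned by VFIO_IOMMU_GET_INFO, already sized to argsz. */
  static unsigned int dma_mappings_available(struct vfio_iommu_type1_info *info)
  {
          struct vfio_info_cap_header *hdr;
          __u32 off;

          if (!(info->flags & VFIO_IOMMU_INFO_CAPS))
                  return 0;

          for (off = info->cap_offset; off; off = hdr->next) {
                  hdr = (struct vfio_info_cap_header *)((char *)info + off);
                  if (hdr->id == VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL)
                          return ((struct vfio_iommu_type1_info_dma_avail *)hdr)->avail;
          }
          return 0;       /* capability not offered by this kernel */
  }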
2020-09-21  vfio: add a singleton check for vfio_group_pin_pages  (Yan Zhao; 1 file, -0/+3)
Page pinning is used both to translate and pin device mappings for DMA purpose, as well as to indicate to the IOMMU backend to limit the dirty page scope to those pages that have been pinned, in the case of an IOMMU backed device. To support this, the vfio_pin_pages() interface limits itself to only singleton groups such that the IOMMU backend can consider dirty page scope only at the group level. Implement the same requirement for the vfio_group_pin_pages() interface. Fixes: 95fc87b44104 ("vfio: Selective dirty page tracking if IOMMU backed device pins pages") Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-21  vfio/pci: Don't regenerate vconfig for all BARs if !bardirty  (Zenghui Yu; 1 file, -0/+3)
We currently regenerate vconfig for all the BARs via vfio_bar_fixup() every time any offset of any of them is read. Though BARs aren't re-read regularly, the regeneration can be avoided if no BARs have been written since they were last read, in which case vdev->bardirty is false. Let's return immediately in vfio_bar_fixup() if bardirty is false. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-21  vfio/pci: Remove redundant declaration of vfio_pci_driver  (Zenghui Yu; 1 file, -1/+0)
It was added by commit 137e5531351d ("vfio/pci: Add sriov_configure support") but duplicates a forward declaration earlier in the file. Remove it. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-04  iommu: Rename iommu_tlb_* functions to iommu_iotlb_*  (Tom Murphy; 1 file, -1/+1)
To keep naming consistent we should stick with *iotlb*. This patch renames a few remaining functions. Signed-off-by: Tom Murphy <murphyt7@tcd.ie> Link: https://lore.kernel.org/r/20200817210051.13546-1-murphyt7@tcd.ie Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-08-24  treewide: Use fallthrough pseudo-keyword  (Gustavo A. R. Silva; 2 files, -2/+2)
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
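The shape of the conversion, as a small self-contained sketch (the function is hypothetical):

  /* The bare comment becomes the pseudo-keyword, which expands to
   * __attribute__((fallthrough)) on compilers that support it, so the
   * compiler can verify the intent.
   */
  static int accumulate(int id)
  {
          int level = 0;

          switch (id) {
          case 2:
                  level++;
                  fallthrough;    /* was: a "fall through" comment */
          case 1:
                  level++;
                  break;
          default:
                  break;
          }
          return level;
  }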
2020-08-17  vfio/type1: Add proper error unwind for vfio_iommu_replay()  (Alex Williamson; 1 file, -5/+66)
The vfio_iommu_replay() function does not currently unwind on error, yet it does pin pages, perform IOMMU mapping, and modify the vfio_dma structure to indicate IOMMU mapping. The IOMMU mappings are torn down when the domain is destroyed, but the other actions go on to cause trouble later. For example, the iommu->domain_list can be empty if we only have a non-IOMMU backed mdev attached. We don't currently check if the list is empty before getting the first entry in the list, which leads to a bogus domain pointer. If a vfio_dma entry is erroneously marked as iommu_mapped, we'll attempt to use that bogus pointer to retrieve the existing physical page addresses. This is the scenario that uncovered this issue, attempting to hot-add a vfio-pci device to a container with an existing mdev device and DMA mappings, one of which could not be pinned, causing a failure adding the new group to the existing container and setting the conditions for a subsequent attempt to explode. To resolve this, we can first check if the domain_list is empty so that we can reject replay of a bogus domain, should we ever encounter this inconsistent state again in the future. The real fix though is to add the necessary unwind support, which means cleaning up the current pinning if an IOMMU mapping fails, then walking back through the r-b tree of DMA entries, reading from the IOMMU which ranges are mapped, and unmapping and unpinning those ranges. To be able to do this, we also defer marking the DMA entry as IOMMU mapped until all entries are processed, in order to allow the unwind to know the disposition of each entry. Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices") Reported-by: Zhiyi Guo <zhguo@redhat.com> Tested-by: Zhiyi Guo <zhguo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-08-17  vfio-pci: Avoid recursive read-lock usage  (Alex Williamson; 2 files, -24/+98)
A down_read on memory_lock is held when performing read/write accesses to MMIO BAR space, including across the copy_to/from_user() callouts which may fault. If the user buffer for these copies resides in an mmap of device MMIO space, the mmap fault handler will acquire a recursive read-lock on memory_lock. Avoid this by reducing the lock granularity. Sequential accesses requiring multiple ioread/iowrite cycles are expected to be rare, therefore typical accesses should not see additional overhead. VGA MMIO accesses are expected to be non-fatal regardless of the PCI memory enable bit to allow legacy probing, this behavior remains with a comment added. ioeventfds are now included in memory access testing, with writes dropped while memory space is disabled. Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") Reported-by: Zhiyi Guo <zhguo@redhat.com> Tested-by: Zhiyi Guo <zhguo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
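A hedged sketch of the reduced lock granularity described above; the names follow the driver loosely and this is not the literal diff:

  /* Hold memory_lock only around the MMIO cycle itself, never across the
   * user copy, which may fault into our own mmap handler and try to take
   * memory_lock again.
   */
  static int read_mmio_dword(struct vfio_pci_device *vdev,
                             void __iomem *io, u32 __user *ubuf)
  {
          u32 val;

          down_read(&vdev->memory_lock);
          if (!__vfio_pci_memory_enabled(vdev)) {
                  up_read(&vdev->memory_lock);
                  return -EIO;
          }
          val = ioread32(io);
          up_read(&vdev->memory_lock);

          /* The user copy happens outside the lock. */
          return copy_to_user(ubuf, &val, sizeof(val)) ? -EFAULT : 0;
  }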
2020-08-12  Merge tag 'vfio-v5.9-rc1' of git://github.com/awilliam/linux-vfio  (Linus Torvalds; 4 files, -193/+276)
Pull VFIO updates from Alex Williamson: - Inclusive naming updates (Alex Williamson) - Intel X550 INTx quirk (Alex Williamson) - Error path resched between unmaps (Xiang Zheng) - SPAPR IOMMU pin_user_pages() conversion (John Hubbard) - Trivial mutex simplification (Alex Williamson) - QAT device denylist (Giovanni Cabiddu) - type1 IOMMU ioctl refactor (Liu Yi L) * tag 'vfio-v5.9-rc1' of git://github.com/awilliam/linux-vfio: vfio/type1: Refactor vfio_iommu_type1_ioctl() vfio/pci: Add QAT devices to denylist vfio/pci: Add device denylist PCI: Add Intel QuickAssist device IDs vfio/pci: Hold igate across releasing eventfd contexts vfio/spapr_tce: convert get_user_pages() --> pin_user_pages() vfio/type1: Add conditional rescheduling after iommu map failed vfio/pci: Add Intel X550 to hidden INTx devices vfio: Cleanup allowed driver naming
2020-08-12  mm/gup: remove task_struct pointer for all gup code  (Peter Xu; 1 file, -2/+2)
After the cleanup of page fault accounting, gup does not need to pass task_struct around any more. Remove that parameter in the whole gup stack. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-27  vfio/type1: Refactor vfio_iommu_type1_ioctl()  (Liu Yi L; 1 file, -181/+213)
This patch refactors vfio_iommu_type1_ioctl() to use a switch instead of an if-else chain, and gives each command its own helper function. Cc: Kevin Tian <kevin.tian@intel.com> CC: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
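The resulting dispatch has roughly this shape (simplified; helper names are inferred from the description, not guaranteed verbatim):

  static long vfio_iommu_type1_ioctl(void *iommu_data,
                                     unsigned int cmd, unsigned long arg)
  {
          struct vfio_iommu *iommu = iommu_data;

          switch (cmd) {
          case VFIO_CHECK_EXTENSION:
                  return vfio_iommu_type1_check_extension(iommu, arg);
          case VFIO_IOMMU_GET_INFO:
                  return vfio_iommu_type1_get_info(iommu, arg);
          case VFIO_IOMMU_MAP_DMA:
                  return vfio_iommu_type1_map_dma(iommu, arg);
          case VFIO_IOMMU_UNMAP_DMA:
                  return vfio_iommu_type1_unmap_dma(iommu, arg);
          default:
                  return -ENOTTY;
          }
  }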
2020-07-27  vfio/pci: Add QAT devices to denylist  (Giovanni Cabiddu; 1 file, -0/+15)
The current generation of Intel® QuickAssist Technology devices are not designed to run in an untrusted environment because of the following issues reported in the document "Intel® QuickAssist Technology (Intel® QAT) Software for Linux" (document number 336211-014): QATE-39220 - GEN - Intel® QAT API submissions with bad addresses that trigger DMA to invalid or unmapped addresses can cause a platform hang QATE-7495 - GEN - An incorrectly formatted request to Intel® QAT can hang the entire Intel® QAT Endpoint The document is downloadable from https://01.org/intel-quickassist-technology at the following link: https://01.org/sites/default/files/downloads/336211-014-qatforlinux-releasenotes-hwv1.7_0.pdf This patch adds the following QAT devices to the denylist: DH895XCC, C3XXX and C62X. Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27  vfio/pci: Add device denylist  (Giovanni Cabiddu; 1 file, -0/+33)
Add a denylist of devices that by default are not probed by vfio-pci. Devices in this list may be susceptible to untrusted applications, even if the IOMMU is enabled. To access them via vfio-pci, the user has to explicitly disable the denylist. The denylist can be disabled via the module parameter disable_denylist. Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Fiona Trahe <fiona.trahe@intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
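A hedged sketch of the mechanism; the device ID below is a placeholder and the real patch may structure the lookup differently:

  static bool disable_denylist;
  module_param(disable_denylist, bool, 0444);
  MODULE_PARM_DESC(disable_denylist, "Disable use of the device denylist");

  static const struct pci_device_id vfio_pci_denylist[] = {
          { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x0435) },    /* placeholder device ID */
          { 0 }
  };

  static bool vfio_pci_is_denylisted(struct pci_dev *pdev)
  {
          if (disable_denylist)
                  return false;   /* user explicitly opted out of the denylist */

          return pci_match_id(vfio_pci_denylist, pdev) != NULL;
  }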
2020-07-27  vfio/pci: Hold igate across releasing eventfd contexts  (Alex Williamson; 1 file, -3/+1)
No need to release and immediately re-acquire igate while clearing out the eventfd ctxs. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27  vfio/spapr_tce: convert get_user_pages() --> pin_user_pages()  (John Hubbard; 1 file, -2/+2)
This code was using get_user_pages*(), in a "Case 2" scenario (DMA/RDMA), using the categorization from [1]. That means that it's time to convert the get_user_pages*() + put_page() calls to pin_user_pages*() + unpin_user_pages() calls. There is some helpful background in [2]: basically, this is a small part of fixing a long-standing disconnect between pinning pages, and file systems' use of those pages. [1] Documentation/core-api/pin_user_pages.rst [2] "Explicit pinning of user-space pages": https://lwn.net/Articles/807108/ Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
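The shape of the conversion, sketched with simplified helpers (the spapr_tce code differs in detail):

  /* Long-term DMA pins now pair pin_user_pages() with unpin_user_pages(). */
  static long pin_range(unsigned long ua, unsigned long npages, struct page **pages)
  {
          return pin_user_pages(ua, npages, FOLL_WRITE | FOLL_LONGTERM, pages, NULL);
  }

  static void unpin_range(struct page **pages, unsigned long npinned)
  {
          unpin_user_pages(pages, npinned);       /* was: put_page() in a loop */
  }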
2020-07-27  vfio/type1: Add conditional rescheduling after iommu map failed  (Xiang Zheng; 1 file, -1/+3)
Commit c5e6688752c2 ("vfio/type1: Add conditional rescheduling") missed a "cond_resched()" in vfio_iommu_map if iommu map failed. This is a very tiny optimization and the case can hardly happen. Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27  vfio/pci: Add Intel X550 to hidden INTx devices  (Alex Williamson; 1 file, -0/+2)
Intel document 333717-008, "Intel® Ethernet Controller X550 Specification Update", version 2.7, dated June 2020, includes errata #22, added in version 2.1, May 2016, indicating X550 NICs suffer from the same implementation deficiency as the 700-series NICs: "The Interrupt Status bit in the Status register of the PCIe configuration space is not implemented and is not set as described in the PCIe specification." Without the interrupt status bit, vfio-pci cannot determine when these devices signal INTx. They are therefore added to the nointx quirk. Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-27  vfio: Cleanup allowed driver naming  (Alex Williamson; 1 file, -6/+7)
No functional change, avoid non-inclusive naming schemes. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-07-17  vfio/pci: fix racy on error and request eventfd ctx  (Zeng Tao; 1 file, -0/+5)
The vfio_pci_release call will free and clear the error and request eventfd ctx while these ctx could be in use at the same time in the function like vfio_pci_request, and it's expected to protect them under the vdev->igate mutex, which is missing in vfio_pci_release. This issue is introduced since commit 1518ac272e78 ("vfio/pci: fix memory leaks of eventfd ctx"),and since commit 5c5866c593bb ("vfio/pci: Clear error and request eventfd ctx after releasing"), it's very easily to trigger the kernel panic like this: [ 9513.904346] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [ 9513.913091] Mem abort info: [ 9513.915871] ESR = 0x96000006 [ 9513.918912] EC = 0x25: DABT (current EL), IL = 32 bits [ 9513.924198] SET = 0, FnV = 0 [ 9513.927238] EA = 0, S1PTW = 0 [ 9513.930364] Data abort info: [ 9513.933231] ISV = 0, ISS = 0x00000006 [ 9513.937048] CM = 0, WnR = 0 [ 9513.940003] user pgtable: 4k pages, 48-bit VAs, pgdp=0000007ec7d12000 [ 9513.946414] [0000000000000008] pgd=0000007ec7d13003, p4d=0000007ec7d13003, pud=0000007ec728c003, pmd=0000000000000000 [ 9513.956975] Internal error: Oops: 96000006 [#1] PREEMPT SMP [ 9513.962521] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio hclge hns3 hnae3 [last unloaded: vfio_pci] [ 9513.972998] CPU: 4 PID: 1327 Comm: bash Tainted: G W 5.8.0-rc4+ #3 [ 9513.980443] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B270.01 05/08/2020 [ 9513.989274] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--) [ 9513.994827] pc : _raw_spin_lock_irqsave+0x48/0x88 [ 9513.999515] lr : eventfd_signal+0x6c/0x1b0 [ 9514.003591] sp : ffff800038a0b960 [ 9514.006889] x29: ffff800038a0b960 x28: ffff007ef7f4da10 [ 9514.012175] x27: ffff207eefbbfc80 x26: ffffbb7903457000 [ 9514.017462] x25: ffffbb7912191000 x24: ffff007ef7f4d400 [ 9514.022747] x23: ffff20be6e0e4c00 x22: 0000000000000008 [ 9514.028033] x21: 0000000000000000 x20: 0000000000000000 [ 9514.033321] x19: 0000000000000008 x18: 0000000000000000 [ 9514.038606] x17: 0000000000000000 x16: ffffbb7910029328 [ 9514.043893] x15: 0000000000000000 x14: 0000000000000001 [ 9514.049179] x13: 0000000000000000 x12: 0000000000000002 [ 9514.054466] x11: 0000000000000000 x10: 0000000000000a00 [ 9514.059752] x9 : ffff800038a0b840 x8 : ffff007ef7f4de60 [ 9514.065038] x7 : ffff007fffc96690 x6 : fffffe01faffb748 [ 9514.070324] x5 : 0000000000000000 x4 : 0000000000000000 [ 9514.075609] x3 : 0000000000000000 x2 : 0000000000000001 [ 9514.080895] x1 : ffff007ef7f4d400 x0 : 0000000000000000 [ 9514.086181] Call trace: [ 9514.088618] _raw_spin_lock_irqsave+0x48/0x88 [ 9514.092954] eventfd_signal+0x6c/0x1b0 [ 9514.096691] vfio_pci_request+0x84/0xd0 [vfio_pci] [ 9514.101464] vfio_del_group_dev+0x150/0x290 [vfio] [ 9514.106234] vfio_pci_remove+0x30/0x128 [vfio_pci] [ 9514.111007] pci_device_remove+0x48/0x108 [ 9514.115001] device_release_driver_internal+0x100/0x1b8 [ 9514.120200] device_release_driver+0x28/0x38 [ 9514.124452] pci_stop_bus_device+0x68/0xa8 [ 9514.128528] pci_stop_and_remove_bus_device+0x20/0x38 [ 9514.133557] pci_iov_remove_virtfn+0xb4/0x128 [ 9514.137893] sriov_disable+0x3c/0x108 [ 9514.141538] pci_disable_sriov+0x28/0x38 [ 9514.145445] hns3_pci_sriov_configure+0x48/0xb8 [hns3] [ 9514.150558] sriov_numvfs_store+0x110/0x198 [ 9514.154724] dev_attr_store+0x44/0x60 [ 9514.158373] sysfs_kf_write+0x5c/0x78 [ 9514.162018] kernfs_fop_write+0x104/0x210 [ 9514.166010] __vfs_write+0x48/0x90 [ 9514.169395] vfs_write+0xbc/0x1c0 [ 9514.172694] ksys_write+0x74/0x100 [ 9514.176079] 
__arm64_sys_write+0x24/0x30 [ 9514.179987] el0_svc_common.constprop.4+0x110/0x200 [ 9514.184842] do_el0_svc+0x34/0x98 [ 9514.188144] el0_svc+0x14/0x40 [ 9514.191185] el0_sync_handler+0xb0/0x2d0 [ 9514.195088] el0_sync+0x140/0x180 [ 9514.198389] Code: b9001020 d2800000 52800022 f9800271 (885ffe61) [ 9514.204455] ---[ end trace 648de00c8406465f ]--- [ 9514.212308] note: bash[1327] exited with preempt_count 1 Cc: Qian Cai <cai@lca.pw> Cc: Alex Williamson <alex.williamson@redhat.com> Fixes: 1518ac272e78 ("vfio/pci: fix memory leaks of eventfd ctx") Signed-off-by: Zeng Tao <prime.zeng@hisilicon.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-25  vfio/pci: Fix SR-IOV VF handling with MMIO blocking  (Alex Williamson; 1 file, -1/+16)
SR-IOV VFs do not implement the memory enable bit of the command register, therefore this bit is not set in config space after pci_enable_device(). This leads to an unintended difference between PF and VF in hand-off state to the user. We can correct this by setting the initial value of the memory enable bit in our virtualized config space. There's really no need, however, to ever fault a user on a VF, as this would only indicate an error in the user's management of the enable bit, versus a PF where the same access could trigger hardware faults. Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-18  vfio/pci: Clear error and request eventfd ctx after releasing  (Alex Williamson; 1 file, -2/+6)
The next use of the device will generate an underflow from the stale reference. Cc: Qian Cai <cai@lca.pw> Fixes: 1518ac272e78 ("vfio/pci: fix memory leaks of eventfd ctx") Reported-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Tested-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-11  kernel: better document the use_mm/unuse_mm API contract  (Christoph Hellwig; 1 file, -2/+2)
Switch the function documentation to kerneldoc comments, and add WARN_ON_ONCE asserts that the calling thread is a kernel thread and does not have ->mm set (or has ->mm set in the case of unuse_mm). Also give the functions a kthread_ prefix to better document the use case. [hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio] Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de [sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename] Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [usb] Acked-by: Haren Myneni <haren@linux.ibm.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Felipe Balbi <balbi@kernel.org> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-11  kernel: move use_mm/unuse_mm to kthread.c  (Christoph Hellwig; 1 file, -1/+1)
cover the newly merged use_mm/unuse_mm caller in vfio Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Felipe Balbi <balbi@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Link: http://lkml.kernel.org/r/20200416053158.586887-2-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09  mmap locking API: convert mmap_sem comments  (Michel Lespinasse; 1 file, -7/+7)
Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09  mmap locking API: convert mmap_sem call sites missed by coccinelle  (Michel Lespinasse; 1 file, -4/+4)
Convert the last few remaining mmap_sem rwsem calls to use the new mmap locking API. These were missed by coccinelle for some reason (I think coccinelle does not support some of the preprocessor constructs in these files ?) [akpm@linux-foundation.org: convert linux-next leftovers] [akpm@linux-foundation.org: more linux-next leftovers] [akpm@linux-foundation.org: more linux-next leftovers] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-6-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09  mmap locking API: use coccinelle to convert mmap_sem rwsem call sites  (Michel Lespinasse; 1 file, -4/+4)
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-05  Merge tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio  (Linus Torvalds; 8 files, -101/+979)
Pull VFIO updates from Alex Williamson: - Block accesses to disabled MMIO space (Alex Williamson) - VFIO device migration API (Kirti Wankhede) - type1 IOMMU dirty bitmap API and implementation (Kirti Wankhede) - PCI NULL capability masking (Alex Williamson) - Memory leak fixes (Qian Cai) - Reference leak fix (Qiushi Wu) * tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio: vfio iommu: typecast corrections vfio iommu: Use shift operation for 64-bit integer division vfio/mdev: Fix reference count leak in add_mdev_supported_type vfio: Selective dirty page tracking if IOMMU backed device pins pages vfio iommu: Add migration capability to report supported features vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap vfio iommu: Implementation of ioctl for dirty pages tracking vfio iommu: Add ioctl definition for dirty pages tracking vfio iommu: Cache pgsize_bitmap in struct vfio_iommu vfio iommu: Remove atomicity of ref_count of pinned pages vfio: UAPI for migration interface for device state vfio/pci: fix memory leaks of eventfd ctx vfio/pci: fix memory leaks in alloc_perm_bits() vfio-pci: Mask cap zero vfio-pci: Invalidate mmaps and block MMIO access on disabled memory vfio-pci: Fault mmaps to enable vma tracking vfio/type1: Support faulting PFNMAP vmas
2020-06-05  Merge tag 'powerpc-5.8-1' of ↵  (Linus Torvalds; 1 file, -1/+1)
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Support for userspace to send requests directly to the on-chip GZIP accelerator on Power9. - Rework of our lockless page table walking (__find_linux_pte()) to make it safe against parallel page table manipulations without relying on an IPI for serialisation. - A series of fixes & enhancements to make our machine check handling more robust. - Lots of plumbing to add support for "prefixed" (64-bit) instructions on Power10. - Support for using huge pages for the linear mapping on 8xx (32-bit). - Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound driver. - Removal of some obsolete 40x platforms and associated cruft. - Initial support for booting on Power10. - Lots of other small features, cleanups & fixes. Thanks to: Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Andrey Abramov, Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent Abali, Cédric Le Goater, Chen Zhou, Christian Zigotzky, Christophe JAILLET, Christophe Leroy, Dmitry Torokhov, Emmanuel Nicolet, Erhard F., Gautham R. Shenoy, Geoff Levand, George Spelvin, Greg Kurz, Gustavo A. R. Silva, Gustavo Walbon, Haren Myneni, Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Kees Cook, Leonardo Bras, Madhavan Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Michal Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram Pai, Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler, Wolfram Sang, Xiongfeng Wang. * tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (299 commits) powerpc/pseries: Make vio and ibmebus initcalls pseries specific cxl: Remove dead Kconfig options powerpc: Add POWER10 architected mode powerpc/dt_cpu_ftrs: Add MMA feature powerpc/dt_cpu_ftrs: Enable Prefixed Instructions powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected powerpc: Add support for ISA v3.1 powerpc: Add new HWCAP bits powerpc/64s: Don't set FSCR bits in INIT_THREAD powerpc/64s: Save FSCR to init_task.thread.fscr after feature init powerpc/64s: Don't let DT CPU features set FSCR_DSCR powerpc/64s: Don't init FSCR_DSCR in __init_FSCR() powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG powerpc/module_64: Use special stub for _mcount() with -mprofile-kernel powerpc/module_64: Simplify check for -mprofile-kernel ftrace relocations powerpc/module_64: Consolidate ftrace code powerpc/32: Disable KASAN with pages bigger than 16k powerpc/uaccess: Don't set KUEP by default on book3s/32 powerpc/uaccess: Don't set KUAP by default on book3s/32 powerpc/8xx: Reduce time spent in allow_user_access() and friends ...
2020-06-02  Merge branch 'v5.8/vfio/kirti-migration-fixes' into v5.8/vfio/next  (Alex Williamson; 1 file, -3/+4)
2020-06-02  vfio iommu: typecast corrections  (Kirti Wankhede; 1 file, -2/+2)
Fixes sparse warnings by adding '__user' in typecast for copy_[from,to]_user() Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages tracking") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-02  vfio iommu: Use shift operation for 64-bit integer division  (Kirti Wankhede; 1 file, -1/+2)
Fixes compilation error with ARCH=i386. Error fixed by this commit: ld: drivers/vfio/vfio_iommu_type1.o: in function `vfio_dma_populate_bitmap': >> vfio_iommu_type1.c:(.text+0x666): undefined reference to `__udivdi3' Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages tracking") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-30  Merge branch 'qiushi-wu-mdev-ref-v1' into v5.8/vfio/next  (Alex Williamson; 1 file, -1/+1)
2020-05-30  vfio/mdev: Fix reference count leak in add_mdev_supported_type  (Qiushi Wu; 1 file, -1/+1)
kobject_init_and_add() takes reference even when it fails. If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. Thus, replace kfree() by kobject_put() to fix this issue. Previous commit "b8eb718348b8" fixed a similar problem. Fixes: 7b96953bc640 ("vfio: Mediated device Core driver") Signed-off-by: Qiushi Wu <wu000273@umn.edu> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
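The corrected error path looks roughly like this (a fragment; field and variable names are approximations of the mdev code):

  ret = kobject_init_and_add(&type->kobj, &mdev_type_ktype, parent_kobj,
                             "%s", name);
  if (ret) {
          /* kobject_init_and_add() took a reference even though it failed;
           * kobject_put() drops it and frees the object via its release
           * callback, whereas kfree() would leak that reference.
           */
          kobject_put(&type->kobj);
          return ERR_PTR(ret);
  }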
2020-05-29  vfio: Selective dirty page tracking if IOMMU backed device pins pages  (Kirti Wankhede; 2 files, -10/+106)
Added a check such that only singleton IOMMU groups can pin pages. From the point when a vendor driver pins any pages, consider the IOMMU group's dirty page scope to be limited to pinned pages. To avoid walking the list often, added a flag pinned_page_dirty_scope to indicate whether the dirty page scope of all the vfio_groups, for each vfio_domain in the domain_list, is limited to pinned pages. This flag is updated on the first pinned pages request for that IOMMU group and on attaching/detaching a group. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-29  vfio iommu: Add migration capability to report supported features  (Kirti Wankhede; 1 file, -1/+22)
Added migration capability in IOMMU info chain. User application should check IOMMU info chain for migration capability to use dirty page tracking feature provided by kernel module. User application must check page sizes supported and maximum dirty bitmap size returned by this capability structure for ioctls used to get dirty bitmap. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-29  vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap  (Kirti Wankhede; 1 file, -11/+50)
DMA mapped pages, including those pinned by mdev vendor drivers, might get unpinned and unmapped while migration is active and device is still running. For example, in pre-copy phase while guest driver could access those pages, host device or vendor driver can dirty these mapped pages. Such pages should be marked dirty so as to maintain memory consistency for a user making use of dirty page tracking. To get bitmap during unmap, user should allocate memory for bitmap, set it all zeros, set size of allocated memory, set page size to be considered for bitmap and set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-29  vfio iommu: Implementation of ioctl for dirty pages tracking  (Kirti Wankhede; 1 file, -6/+308)
The VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations: start dirty pages tracking while migration is active, stop dirty pages tracking, and get the dirty pages bitmap. It is the user space application's responsibility to copy the content of dirty pages from source to destination during migration. To prevent a DoS attack, memory for the bitmap is allocated per vfio_dma structure. The bitmap size is calculated considering the smallest supported page size. The bitmap is allocated for all vfio_dmas when dirty logging is enabled. When the bitmap is allocated for a vfio_dma, it is populated for already pinned pages, using the smallest supported page size. The bitmap is updated from the pinning functions when tracking is enabled. When a user application queries the bitmap, check whether the requested page size is the same as the page size used to populate the bitmap; if it is equal, copy the bitmap, otherwise return an error. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Fixed an error reported by the build bot by changing the pgsize type from uint64_t to size_t. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
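A hedged userspace sketch of driving the ioctl's start/stop operations on a type1 container fd (the helper name is illustrative):

  #include <stdbool.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  static int set_dirty_tracking(int container_fd, bool start)
  {
          struct vfio_iommu_type1_dirty_bitmap dirty = {
                  .argsz = sizeof(dirty),
                  .flags = start ? VFIO_IOMMU_DIRTY_PAGES_FLAG_START
                                 : VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP,
          };

          /* Fetching the bitmap uses the same ioctl with
           * VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP and a range plus
           * vfio_bitmap argument appended in dirty.data[].
           */
          return ioctl(container_fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
  }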
2020-05-29  vfio iommu: Cache pgsize_bitmap in struct vfio_iommu  (Kirti Wankhede; 1 file, -39/+49)
Calculate and cache pgsize_bitmap when iommu->domain_list is updated and iommu->external_domain is set for mdev device. Add iommu->lock protection when cached pgsize_bitmap is accessed. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>