path: root/drivers/iommu
2024-03-02  Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd  [Linus Torvalds, 2 files changed, -24/+54]

Pull iommufd fixes from Jason Gunthorpe:
 "Four syzkaller found bugs:

  - Corruption during error unwind in iommufd_access_change_ioas()

  - Overlapping IDs in the test suite due to out of order destruction

  - Missing locking for access->ioas in the test suite

  - False failures in the test suite validation logic with huge pages"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd/selftest: Don't check map/unmap pairing with HUGE_PAGES
  iommufd: Fix protection fault in iommufd_test_syz_conv_iova
  iommufd/selftest: Fix mock_dev_num bug
  iommufd: Fix iopt_access_list_id overwrite bug

2024-03-01  iommu/sva: Fix SVA handle sharing in multi device case  [Zhangfei Gao, 1 file changed, -2/+2]

iommu_sva_bind_device() will directly goto out in the multi-device case
when an existing domain is found, skipping the list_add of the handle.
This causes the handle to fail to be shared.

Fixes: 65d4418c5002 ("iommu/sva: Restore SVA handle sharing")
Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240227064821.128-1-zhangfei.gao@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>

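For context, handle sharing hinges on every successful bind leaving its
handle on the per-mm list so that later binds of the same (mm, device)
pair can find it. A minimal sketch of such a lookup, with field names
that are assumptions rather than the literal upstream code:

    /* Hedged sketch: find an existing handle for this (mm, device)
     * pair; the list and field names here are illustrative. */
    static struct iommu_sva *sva_find_handle(struct iommu_mm_data *iommu_mm,
                                             struct device *dev)
    {
            struct iommu_sva *handle;

            list_for_each_entry(handle, &iommu_mm->sva_handles, handle_item)
                    if (handle->dev == dev)
                            return handle;
            return NULL;
    }

The bug above was that the early "goto out" bypassed the list_add, so a
lookup like this could never succeed for subsequent binds.
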
2024-02-26  iommufd/selftest: Don't check map/unmap pairing with HUGE_PAGES  [Jason Gunthorpe, 1 file changed, -11/+18]

Since MOCK_HUGE_PAGE_SIZE was introduced it allows the core code to
invoke mock with large page sizes. This confuses the validation logic
that checks that map/unmap are paired.

This is because the page size computed for map is based on the physical
address and in many cases will always be the base page size, however
the entire range generated by iommufd will be passed to map.

Randomly iommufd can see small groups of physically contiguous pages
(say 8k, unaligned and grouped together), but that group crosses a huge
page boundary. The map side will observe this as a contiguous run and
mark it accordingly, but there is a chance the unmap side will end up
terminating interior huge pages in the middle of that group and trigger
a validation failure. Meaning the validation only works if the core
code passes the iova/length directly from iommufd to mock.

syzkaller randomly hits this with failures like:

 WARNING: CPU: 0 PID: 11568 at drivers/iommu/iommufd/selftest.c:461 mock_domain_unmap_pages+0x1c0/0x250
 Modules linked in:
 CPU: 0 PID: 11568 Comm: syz-executor.0 Not tainted 6.8.0-rc3+ #4
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
 RIP: 0010:mock_domain_unmap_pages+0x1c0/0x250
 Code: 2b e8 94 37 0f ff 48 d1 eb 31 ff 48 b8 00 00 00 00 00 00 20 00 48 21 c3 48 89 de e8 aa 32 0f ff 48 85 db 75 07 e8 70 37 0f ff <0f> 0b e8 69 37 0f ff 31 f6 31 ff e8 90 32 0f ff e8 5b 37 0f ff 4c
 RSP: 0018:ffff88800e707490 EFLAGS: 00010293
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff822dfae6
 RDX: ffff88800cf86400 RSI: ffffffff822dfaf0 RDI: 0000000000000007
 RBP: ffff88800e7074d8 R08: 0000000000000000 R09: ffffed1001167c90
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000001500000
 R13: 0000000000083000 R14: 0000000000000001 R15: 0000000000000800
 FS:  0000555556048480(0000) GS:ffff88806d400000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000001b2dc23000 CR3: 0000000008cbb000 CR4: 0000000000350eb0
 Call Trace:
  <TASK>
  __iommu_unmap+0x281/0x520
  iommu_unmap+0xc9/0x180
  iopt_area_unmap_domain_range+0x1b1/0x290
  iopt_area_unpin_domain+0x590/0x800
  __iopt_area_unfill_domain+0x22e/0x650
  iopt_area_unfill_domain+0x47/0x60
  iopt_unfill_domain+0x187/0x590
  iopt_table_remove_domain+0x267/0x2d0
  iommufd_hwpt_paging_destroy+0x1f1/0x370
  iommufd_object_remove+0x2a3/0x490
  iommufd_device_detach+0x23a/0x2c0
  iommufd_selftest_destroy+0x7a/0xf0
  iommufd_fops_release+0x1d3/0x340
  __fput+0x272/0xb50
  __fput_sync+0x4b/0x60
  __x64_sys_close+0x8b/0x110
  do_syscall_64+0x71/0x140
  entry_SYSCALL_64_after_hwframe+0x46/0x4e

Do the simple thing and just disable the validation when the huge page
tests are being run.

Fixes: 7db521e23fe9 ("iommufd/selftest: Hugepage mock domain support")
Link: https://lore.kernel.org/r/0-v1-1e17e60a5c8a+103fb-iommufd_mock_hugepg_jgg@nvidia.com
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2024-02-26  iommufd: Fix protection fault in iommufd_test_syz_conv_iova  [Nicolin Chen, 1 file changed, -6/+21]

Syzkaller reported the following bug:

 general protection fault, probably for non-canonical address 0xdffffc0000000038: 0000 [#1] SMP KASAN
 KASAN: null-ptr-deref in range [0x00000000000001c0-0x00000000000001c7]
 Call Trace:
  lock_acquire
  lock_acquire+0x1ce/0x4f0
  down_read+0x93/0x4a0
  iommufd_test_syz_conv_iova+0x56/0x1f0
  iommufd_test_access_rw.isra.0+0x2ec/0x390
  iommufd_test+0x1058/0x1e30
  iommufd_fops_ioctl+0x381/0x510
  vfs_ioctl
  __do_sys_ioctl
  __se_sys_ioctl
  __x64_sys_ioctl+0x170/0x1e0
  do_syscall_x64
  do_syscall_64+0x71/0x140

This is because the new iommufd_access_change_ioas() sets access->ioas
to NULL during its process, so the lock might be gone in a concurrent
racing context.

Fix this by doing the same access->ioas sanity check that the
iommufd_access_rw() and iommufd_access_pin_pages() functions do.

Cc: stable@vger.kernel.org
Fixes: 9227da7816dd ("iommufd: Add iommufd_access_change_ioas(_id) helpers")
Link: https://lore.kernel.org/r/3f1932acaf1dd494d404c04364d73ce8f57f3e5e.1708636627.git.nicolinc@nvidia.com
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

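A minimal sketch of the sanity pattern referenced above, assuming the
ioas/ioas_lock fields behave as described in the commit text (this is
illustrative, not the literal patch):

    /* Hedged sketch: serialize against iommufd_access_change_ioas()
     * and bail out if the IOAS was detached concurrently. */
    mutex_lock(&access->ioas_lock);
    if (!access->ioas) {
            mutex_unlock(&access->ioas_lock);
            return -ENOENT;
    }
    /* ... access->ioas is safe to dereference here ... */
    mutex_unlock(&access->ioas_lock);
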
2024-02-26  iommufd/selftest: Fix mock_dev_num bug  [Nicolin Chen, 1 file changed, -4/+9]

Syzkaller reported the following bug:

 sysfs: cannot create duplicate filename '/devices/iommufd_mock4'
 Call Trace:
  sysfs_warn_dup+0x71/0x90
  sysfs_create_dir_ns+0x1ee/0x260
  ? sysfs_create_mount_point+0x80/0x80
  ? spin_bug+0x1d0/0x1d0
  ? do_raw_spin_unlock+0x54/0x220
  kobject_add_internal+0x221/0x970
  kobject_add+0x11c/0x1e0
  ? lockdep_hardirqs_on_prepare+0x273/0x3e0
  ? kset_create_and_add+0x160/0x160
  ? kobject_put+0x5d/0x390
  ? bus_get_dev_root+0x4a/0x60
  ? kobject_put+0x5d/0x390
  device_add+0x1d5/0x1550
  ? __fw_devlink_link_to_consumers.isra.0+0x1f0/0x1f0
  ? __init_waitqueue_head+0xcb/0x150
  iommufd_test+0x462/0x3b60
  ? lock_release+0x1fe/0x640
  ? __might_fault+0x117/0x170
  ? reacquire_held_locks+0x4b0/0x4b0
  ? iommufd_selftest_destroy+0xd0/0xd0
  ? __might_fault+0xbe/0x170
  iommufd_fops_ioctl+0x256/0x350
  ? iommufd_option+0x180/0x180
  ? __lock_acquire+0x1755/0x45f0
  __x64_sys_ioctl+0xa13/0x1640

The bug is triggered when Syzkaller creates multiple mock devices but
does not destroy them in the same sequence, messing up the mock_dev_num
counter. Replace the atomic counter with an IDA, mock_dev_ida.

Cc: stable@vger.kernel.org
Fixes: 23a1b46f15d5 ("iommufd/selftest: Make the mock iommu driver into a real driver")
Link: https://lore.kernel.org/r/5af41d5af6d5c013cc51de01427abb8141b3587e.1708636627.git.nicolinc@nvidia.com
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

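An IDA hands out the lowest free ID and tolerates out-of-order frees,
which is exactly what the broken counter could not do. A hedged sketch
of the pattern (the surrounding mock-device code is assumed;
ida_alloc()/ida_free() are the real kernel API):

    static DEFINE_IDA(mock_dev_ida);

    static int mock_dev_alloc_id(void)
    {
            /* returns the lowest free ID, or a negative errno */
            return ida_alloc(&mock_dev_ida, GFP_KERNEL);
    }

    static void mock_dev_free_id(int id)
    {
            ida_free(&mock_dev_ida, id);    /* id becomes reusable */
    }

With this, a name like "iommufd_mock%d" always reuses the lowest free
number regardless of destruction order, avoiding the duplicate sysfs
filename collision shown in the trace.
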
2024-02-26  iommufd: Fix iopt_access_list_id overwrite bug  [Nicolin Chen, 1 file changed, -3/+6]

Syzkaller reported the following WARN_ON:

 WARNING: CPU: 1 PID: 4738 at drivers/iommu/iommufd/io_pagetable.c:1360
 Call Trace:
  iommufd_access_change_ioas+0x2fe/0x4e0
  iommufd_access_destroy_object+0x50/0xb0
  iommufd_object_remove+0x2a3/0x490
  iommufd_object_destroy_user
  iommufd_access_destroy+0x71/0xb0
  iommufd_test_staccess_release+0x89/0xd0
  __fput+0x272/0xb50
  __fput_sync+0x4b/0x60
  __do_sys_close
  __se_sys_close
  __x64_sys_close+0x8b/0x110
  do_syscall_x64

The mismatch between the access pointer in the list and the passed-in
pointer results from an overwrite of access->iopt_access_list_id in
iopt_add_access(). It happens when iommufd_access_change_ioas() calls
iopt_add_access() and xa_alloc() succeeds but
iopt_calculate_iova_alignment() fails.

Use a local new_id in iopt_add_access() and only update
iopt_access_list_id when returning successfully.

Cc: stable@vger.kernel.org
Fixes: 9227da7816dd ("iommufd: Add iommufd_access_change_ioas(_id) helpers")
Link: https://lore.kernel.org/r/2dda7acb25b8562ec5f1310de828ef5da9ef509c.1708636627.git.nicolinc@nvidia.com
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

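The "commit only on success" shape described above looks roughly like
this sketch; xa_alloc()/xa_erase() are the real XArray API, while the
surrounding names are taken from the commit text:

    u32 new_id;
    int rc;

    rc = xa_alloc(&iopt->access_list, &new_id, access,
                  xa_limit_16b, GFP_KERNEL_ACCOUNT);
    if (rc)
            return rc;

    rc = iopt_calculate_iova_alignment(iopt);
    if (rc) {
            xa_erase(&iopt->access_list, new_id);
            return rc;      /* access->iopt_access_list_id untouched */
    }
    access->iopt_access_list_id = new_id;   /* commit only on success */
    return 0;
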
2024-02-25  Merge tag 'iommu-fixes-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu  [Linus Torvalds, 8 files changed, -93/+222]

Pull iommu fixes from Joerg Roedel:

 - Intel VT-d fixes for nested domain handling:
     - Cache invalidation for changes in a parent domain
     - Dirty tracking setting for parent and nested domains
     - Fix a constant-out-of-range warning

 - ARM SMMU fixes:
     - Fix CD allocation from atomic context when using SVA with SMMUv3
     - Revert the conversion of SMMUv2 to domain_alloc_paging(), as it
       breaks the boot for Qualcomm MSM8996 devices

 - Restore SVA handle sharing in core code as it turned out there are
   still drivers relying on it

* tag 'iommu-fixes-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/sva: Restore SVA handle sharing
  iommu/arm-smmu-v3: Do not use GFP_KERNEL under a spinlock
  iommu/vt-d: Fix constant-out-of-range warning
  iommu/vt-d: Set SSADE when attaching to a parent with dirty tracking
  iommu/vt-d: Add missing dirty tracking set for parent domain
  iommu/vt-d: Wrap the dirty tracking loop to be a helper
  iommu/vt-d: Remove domain parameter for intel_pasid_setup_dirty_tracking()
  iommu/vt-d: Add missing device iotlb flush for parent domain
  iommu/vt-d: Update iotlb in nested domain attach
  iommu/vt-d: Add missing iotlb flush for parent domain
  iommu/vt-d: Add __iommu_flush_iotlb_psi()
  iommu/vt-d: Track nested domains in parent
  Revert "iommu/arm-smmu: Convert to domain_alloc_paging()"

2024-02-23  iommu/sva: Restore SVA handle sharing  [Jason Gunthorpe, 1 file changed, -0/+17]

Prior to commit 092edaddb660 ("iommu: Support mm PASID 1:n with sva
domains") the code allowed a SVA handle to be bound multiple times to
the same (mm, device) pair. This was alluded to in the kdoc comment,
but we had understood this to be more a remark about allowing multiple
devices, not a literal same-driver re-opening the same SVA. It turns
out uacce and idxd were both relying on the core code to handle
reference counting for same-device same-mm scenarios. As this looks
hard to resolve in the drivers, bring it back to the core code.

The new design has changed the meaning of the domain->users refcount to
refer to the number of devices that are sharing that domain for the
same mm. This is part of the design to lift the SVA domain
de-duplication out of the drivers.

Return the old behavior by explicitly de-duplicating the struct
iommu_sva handle. The same (mm, device) will return the same handle
pointer and the core code will handle tracking this. The last unbind of
the handle will destroy it.

Fixes: 092edaddb660 ("iommu: Support mm PASID 1:n with sva domains")
Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Closes: https://lore.kernel.org/all/20240221110658.529-1-zhangfei.gao@linaro.org/
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/0-v1-9455fc497a6f+3b4-iommu_sva_sharing_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

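The de-duplication described here reduces to a lookup plus a reference
count on the handle. A hedged sketch of the bind-side flow, with field
names that are assumptions consistent with the lookup sketched earlier
in this log:

    mutex_lock(&iommu_sva_lock);
    handle = sva_find_handle(mm->iommu_mm, dev);    /* earlier sketch */
    if (handle) {
            /* same (mm, device): share the existing handle */
            refcount_inc(&handle->users);
            mutex_unlock(&iommu_sva_lock);
            return handle;
    }
    /* ... otherwise allocate a new handle, bind the domain, and
     * list_add() it so the next bind finds it; the last
     * iommu_sva_unbind_device() drops users to zero and frees it ... */
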
2024-02-23  Merge tag 'arm-smmu-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into iommu/fixes  [Joerg Roedel, 2 files changed, -37/+18]

Arm SMMU fixes for 6.8

 - Fix CD allocation from atomic context when using SVA with SMMUv3

 - Revert the conversion of SMMUv2 to domain_alloc_paging(), as it
   breaks the boot for Qualcomm MSM8996 devices

2024-02-22  iommu/arm-smmu-v3: Do not use GFP_KERNEL under a spinlock  [Jason Gunthorpe, 1 file changed, -26/+12]

If the SMMU is configured to use a two level CD table then
arm_smmu_write_ctx_desc() allocates a CD table leaf internally using
GFP_KERNEL. Due to recent changes this is being done under a spinlock
to iterate over the device list - thus it will trigger a sleeping while
atomic warning:

 arm_smmu_sva_set_dev_pasid()
   mutex_lock(&sva_lock);
   __arm_smmu_sva_bind()
    arm_smmu_mmu_notifier_get()
     spin_lock_irqsave()
     arm_smmu_write_ctx_desc()
      arm_smmu_get_cd_ptr()
       arm_smmu_alloc_cd_leaf_table()
        dmam_alloc_coherent(GFP_KERNEL)

This is a 64K high order allocation and really should not be done
atomically.

At the moment the rework of the SVA to follow the new API is half
finished. Recently the CD table memory was moved from the domain to the
master, however we have the confusing situation where the SVA code is
wrongly using the RID domain's device list to track which CD tables the
SVA is installed in.

Remove the logic to replicate the CD across all the domain's masters
during attach. We know which master and which CD table the PASID should
be installed in.

Right now SVA only works when dma-iommu.c is in control of the RID
translation, which means we have a single iommu_domain shared across
the entire group and that iommu_domain is not shared outside the group.
Critically this means that the iommu_group->devices list and the RID's
smmu_domain->devices list describe the same set of masters.

For PCI cases the core code also insists on singleton groups, so there
is only one entry in the smmu_domain->devices list and it is equal to
the master being passed in to arm_smmu_sva_set_dev_pasid().

Only non-PCI cases may have multi-device groups. However, the core code
will repeat the calls to arm_smmu_sva_set_dev_pasid() across the entire
iommu_group->devices list.

Instead of having arm_smmu_mmu_notifier_get() indirectly loop over all
the devices in the group via the RID's smmu_domain, rely on
__arm_smmu_sva_bind() being called for each device in the group and
install the repeated CD entry that way. This avoids taking the spinlock
to access the devices list and permits arm_smmu_write_ctx_desc() to use
a sleeping allocation. Leave arm_smmu_mm_release() as a confusing
situation; fixing it requires tracking attached masters inside the SVA
domain.

Removing the loop allows arm_smmu_write_ctx_desc() to be called outside
the spinlock and it is thus safe to use GFP_KERNEL.

Move the clearing of the CD into arm_smmu_sva_remove_dev_pasid() so
that arm_smmu_mmu_notifier_get()/put() remain paired functions.

Fixes: 24503148c545 ("iommu/arm-smmu-v3: Refactor write_ctx_desc")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/4e25d161-0cf8-4050-9aa3-dfa21cd63e56@moroto.mountain/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Michael Shavit <mshavit@google.com>
Link: https://lore.kernel.org/r/0-v3-11978fc67151+112-smmu_cd_atomic_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>

2024-02-21  iommufd: Reject non-zero data_type if no data_len is provided  [Jason Gunthorpe, 1 file changed, -1/+2]

The current design doesn't forward the data_type to the driver to check
unless there is a data_len/uptr for a driver-specific struct, so we
should check and ensure that data_type is 0 if data_len is 0. Otherwise
any value is permitted.

Fixes: bd529dbb661d ("iommufd: Add a nested HW pagetable object")
Link: https://lore.kernel.org/r/0-v1-9b1ea6869554+110c60-iommufd_ck_data_type_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

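As a sketch, the check amounts to a single guard in the allocation
path; IOMMU_HWPT_DATA_NONE is the zero value of the uAPI data-type
enum, and the surrounding command struct is assumed:

    /* Reject a driver-specific data_type when no data was supplied,
     * since nothing would ever validate it. */
    if (!cmd->data_len && cmd->data_type != IOMMU_HWPT_DATA_NONE)
            return -EINVAL;
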
2024-02-21  iommu/vt-d: Fix constant-out-of-range warning  [Arnd Bergmann, 1 file changed, -1/+1]

On 32-bit builds, the vt-d driver causes a warning with clang:

 drivers/iommu/intel/nested.c:112:13: error: result of comparison of
 constant 18446744073709551615 with expression of type 'unsigned long'
 is always false [-Werror,-Wtautological-constant-out-of-range-compare]
   112 |         if (npages == U64_MAX)
       |             ~~~~~~ ^  ~~~~~~~

Make the variable a 64-bit type, which matches both the caller and the
use anyway.

Fixes: f6f3721244a8 ("iommu/vt-d: Add iotlb flush for nested domain")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20240213095832.455245-1-arnd@kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

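The warning is easy to reproduce outside the driver: on a 32-bit
target, unsigned long is 32 bits wide, so the first comparison below
can never be true. A minimal illustration (hypothetical helpers, not
the driver code):

    /* Broken: on 32-bit, unsigned long cannot hold U64_MAX. */
    unsigned long npages_ul = get_npages();     /* hypothetical caller */
    if (npages_ul == U64_MAX)   /* always false when BITS_PER_LONG == 32 */
            flush_all();        /* hypothetical sentinel handling */

    /* The fix: use the caller's 64-bit type so the sentinel survives. */
    u64 npages = get_npages();
    if (npages == U64_MAX)      /* now meaningful on all builds */
            flush_all();
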
2024-02-21  iommu/vt-d: Set SSADE when attaching to a parent with dirty tracking  [Yi Liu, 1 file changed, -0/+2]

Set the SSADE (Second Stage Access/Dirty bit Enable) bit of the PASID
entry when attaching a device to a nested domain whose parent has
already enabled dirty tracking.

Fixes: 111bf85c68f6 ("iommu/vt-d: Add helper to setup pasid nested translation")
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20240208091414.28133-1-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

2024-02-21  iommu/vt-d: Add missing dirty tracking set for parent domain  [Yi Liu, 1 file changed, -0/+35]

Setting dirty tracking for an s2 domain requires looping over all the
related devices and setting the dirty tracking enable bit in the PASID
table entry. This includes the devices that are attached to the nested
domains of an s2 domain when that s2 domain is used as a parent.
However, the existing dirty tracking set only loops over the s2
domain's own devices, so it will miss dirty page logs in the parent
domain.

Now that the parent domain tracks its nested domains, it can loop over
the nested domains and the devices attached to them to ensure dirty
tracking on the parent is set completely, as sketched below.

Fixes: b41e38e22539 ("iommu/vt-d: Add nested domain allocation")
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-9-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

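A hedged sketch of that extra loop, reusing the per-parent s1_domains
list from the "Track nested domains in parent" patch and the helper
from the "Wrap the dirty tracking loop" patch further down this log
(the list, lock, and link names are assumptions):

    int ret = device_set_dirty_tracking(&dmar_domain->devices, enable);
    if (ret)
            return ret;

    /* Also cover devices attached to nested (s1) children. */
    spin_lock_irqsave(&dmar_domain->s1_lock, flags);
    list_for_each_entry(s1_domain, &dmar_domain->s1_domains, s2_link) {
            ret = device_set_dirty_tracking(&s1_domain->devices, enable);
            if (ret)
                    break;
    }
    spin_unlock_irqrestore(&dmar_domain->s1_lock, flags);
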
2024-02-21  iommu/vt-d: Wrap the dirty tracking loop to be a helper  [Yi Liu, 1 file changed, -11/+24]

Add device_set_dirty_tracking() to loop over all the devices in a list
and set dirty tracking per the @enable parameter.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Link: https://lore.kernel.org/r/20240208082307.15759-8-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

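The helper's likely shape, as a sketch; the device_domain_info fields
and the exact intel_pasid_setup_dirty_tracking() argument list are
assumptions based on this series, not copied from the patch:

    static int device_set_dirty_tracking(struct list_head *devices,
                                         bool enable)
    {
            struct device_domain_info *info;
            int ret = 0;

            list_for_each_entry(info, devices, link) {
                    ret = intel_pasid_setup_dirty_tracking(info->iommu,
                                                           info->dev,
                                                           IOMMU_NO_PASID,
                                                           enable);
                    if (ret)
                            break;
            }
            return ret;
    }
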
2024-02-21  iommu/vt-d: Remove domain parameter for intel_pasid_setup_dirty_tracking()  [Yi Liu, 3 files changed, -7/+4]

The only use of the input @domain is to get the domain id (DID) to
flush caches after setting dirty tracking. However, the DID can be
obtained from the PASID entry, so there is no need to pass in the
domain. This makes the helper cleaner when adding the missing dirty
tracking for the parent domain, which needs to use the DID of the
nested domain.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-7-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

2024-02-21  iommu/vt-d: Add missing device iotlb flush for parent domain  [Yi Liu, 1 file changed, -0/+18]

ATS-capable devices cache the result of nested translation. This result
relies on the mappings in the s2 domain (a.k.a. parent). When there are
modifications in the s2 domain, the related nested translation caches
on the devices should be flushed. This includes the devices that are
attached to the s1 domain. However, the existing code ignores this fact
and only loops over its own devices.

As there is no easy way to identify the exact set of nested
translations affected by a change of the s2 domain, this just flushes
the entire device iotlb on each device: the driver loops over the s2
domain's s1_domains list and over the device list of each s1_domain,
flushing the entire device iotlb on each of those devices.

Fixes: b41e38e22539 ("iommu/vt-d: Add nested domain allocation")
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-6-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

2024-02-21  iommu/vt-d: Update iotlb in nested domain attach  [Yi Liu, 3 files changed, -3/+4]

Call domain_update_iotlb() to update the has_iotlb_device flag of the
domain after attaching a device to a nested domain. Without it, this
flag is not set properly, which results in missing device TLB flushes.

Fixes: 9838f2bb6b6b ("iommu/vt-d: Set the nested domain to a device")
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-5-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

2024-02-21  iommu/vt-d: Add missing iotlb flush for parent domain  [Yi Liu, 1 file changed, -0/+31]

If a domain is used as the parent in nested translation, its mappings
might be cached using the DID of the nested domain. But the existing
code ignores this fact and only invalidates the iotlb entries tagged by
the domain's own DID.

Loop over the s1_domains list, if any, to invalidate all iotlb entries
related to the target s2 address range. According to the VT-d spec
there is no need for software to explicitly flush the affected s1
cache; that is implicitly done by HW when the s2 cache is invalidated.

Fixes: b41e38e22539 ("iommu/vt-d: Add nested domain allocation")
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-4-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

2024-02-21  iommu/vt-d: Add __iommu_flush_iotlb_psi()  [Yi Liu, 1 file changed, -35/+43]

Add __iommu_flush_iotlb_psi() to do the psi iotlb flush with a DID
input rather than calculating it within the helper. This is useful when
flushing the cache for a parent domain, which reuses the DIDs of its
nested domains.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-3-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

2024-02-21  iommu/vt-d: Track nested domains in parent  [Yi Liu, 3 files changed, -5/+31]

Today the parent domain (s2_domain) is unaware of which DIDs are used
by, and which devices are attached to, the nested domains (s1_domain)
nested on it. This leads to a problem: some operations on the parent
domain (flush iotlb/devtlb and enable dirty tracking) only apply to the
DIDs and devices directly tracked in the parent domain, and hence are
incomplete.

Track the nested domains in a list in the parent domain. With this,
operations on the parent domain can loop over the nested domains and
refer to their devices and iommu_array to ensure the operations take
effect on all the affected devices and iommus.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240208082307.15759-2-yi.l.liu@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>

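A hedged sketch of the bookkeeping: link the nested domain into its
parent on creation/attach and unlink it on free. The field and lock
names are assumptions consistent with the loops sketched earlier:

    /* on nested (s1) domain creation/attach */
    spin_lock_irqsave(&s2_domain->s1_lock, flags);
    list_add(&domain->s2_link, &s2_domain->s1_domains);
    spin_unlock_irqrestore(&s2_domain->s1_lock, flags);

    /* on nested domain free */
    spin_lock_irqsave(&s2_domain->s1_lock, flags);
    list_del(&domain->s2_link);
    spin_unlock_irqrestore(&s2_domain->s1_lock, flags);
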
2024-02-13  Revert "iommu/arm-smmu: Convert to domain_alloc_paging()"  [Dmitry Baryshkov, 1 file changed, -11/+6]

This reverts commit 9b3febc3a3da ("iommu/arm-smmu: Convert to
domain_alloc_paging()"). It breaks the Qualcomm MSM8996 platform:
calling arm_smmu_write_context_bank() from the new codepath results in
the platform being reset because of the unclocked hardware access.

Fixes: 9b3febc3a3da ("iommu/arm-smmu: Convert to domain_alloc_paging()")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20240213-iommu-revert-domain-alloc-v1-1-325ff55dece4@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>

2024-02-06  iommufd/iova_bitmap: Consider page offset for the pages to be pinned  [Joao Martins, 1 file changed, -6/+7]

For small bitmaps that aren't PAGE_SIZE aligned *and* that are less
than 512 pages in bitmap length, use an extra page to be able to cover
the entire range, e.g. [1M..3G], which would then be iterated more
efficiently in a single iteration rather than two.

Fixes: b058ea3ab5af ("vfio/iova_bitmap: refactor iova_bitmap_set() to better handle page boundaries")
Link: https://lore.kernel.org/r/20240202133415.23819-10-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2024-02-06  iommufd/selftest: Hugepage mock domain support  [Joao Martins, 2 files changed, -2/+14]

Add support to mock iommu hugepages of 1M (for a 2K mock io page size).
To avoid breaking test suite defaults, this is done by explicitly
creating an iommu mock device which has hugepage support (i.e. through
MOCK_FLAGS_DEVICE_HUGE_IOVA).

The same scheme of mock base page index tracking in the XArray is
maintained, except that an extra bit is added to mark an entry as a
hugepage. One subpage containing the dirty bit means that the whole
hugepage is dirty (similar to AMD IOMMU non-standard page sizes). For
clearing, the same thing applies: all dirty subpages must be cleared.

This is in preparation for dirty tracking to mark mock hugepages as
dirty, to exercise all the iova-bitmap fixes.

Link: https://lore.kernel.org/r/20240202133415.23819-8-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2024-02-06  iommufd/selftest: Refactor mock_domain_read_and_clear_dirty()  [Joao Martins, 1 file changed, -19/+45]

Move the clearing of the dirty bit of the mock domain into a
mock_domain_test_and_clear_dirty() helper, simplifying the caller
function. Additionally, rework the mock_domain_read_and_clear_dirty()
loop to iterate over a potentially variable IO page size. No functional
change intended with the loop refactor.

This is in preparation for dirty tracking support for IOMMU hugepage
mock domains.

Link: https://lore.kernel.org/r/20240202133415.23819-7-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2024-02-06  iommufd/iova_bitmap: Handle recording beyond the mapped pages  [Joao Martins, 1 file changed, -0/+43]

IOVA bitmap is a zero-copy scheme for recording dirty bits that
iterates over the different bitmap user pages at chunks of a maximum of
PAGE_SIZE/sizeof(struct page *) pages.

When the iterations are split up into 64G chunks, the end of the range
may be broken up in a way that's aligned with a non-base-page PTE size.
This leads to only part of the huge page being recorded in the bitmap.
Note that in practice this is only a problem for IOMMU dirty tracking,
i.e. when the backing PTEs are in IOMMU hugepages and the bitmap is in
base page granularity. So far this is not something that affects VF
dirty trackers (which report and record at the same granularity).

To fix that, if there is a remainder of bits left to set which the
current IOVA bitmap doesn't cover, make a copy of the bitmap structure
and iterate-and-set the remaining bits. Finally, when advancing the
iterator, skip all the bits that were set ahead.

Link: https://lore.kernel.org/r/20240202133415.23819-5-joao.m.martins@oracle.com
Reported-by: Avihai Horon <avihaih@nvidia.com>
Fixes: f35f22cc760e ("iommu/vt-d: Access/Dirty bit support for SS domains")
Fixes: 421a511a293f ("iommu/amd: Access/Dirty bit support in IOPTEs")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2024-02-06  iommufd/iova_bitmap: Switch iova_bitmap::bitmap to an u8 array  [Joao Martins, 1 file changed, -4/+4]

iova_bitmap_mapped_length() doesn't deal correctly with small bitmaps
(< 2M bitmaps) when the starting address isn't u64 aligned, leading to
skipping a tiny part of the IOVA range. This materializes as not
marking data dirty that should otherwise have been.

Fix that by using a u8 * for the internal state of the IOVA bitmap.
Most of the data structures use the type of the bitmap to adjust its
indexes, so changing the type of the bitmap decreases the granularity
of the bitmap indexes.

Fixes: b058ea3ab5af ("vfio/iova_bitmap: refactor iova_bitmap_set() to better handle page boundaries")
Link: https://lore.kernel.org/r/20240202133415.23819-3-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2024-02-06  iommufd/iova_bitmap: Bounds check mapped::pages access  [Joao Martins, 1 file changed, -0/+4]

Dirty IOMMU hugepages reported at base-page granularity can lead to an
attempt to set dirty pages in the bitmap beyond the limits that are
pinned.

Bounds-check that the page index of the array we are trying to access
is within the limits before we kmap(), and return otherwise.

While it is a defensive check, this is also in preparation for
deferring the setting of bits (outside the mapped range) to the next
iteration(s) when the pages become available.

Fixes: b058ea3ab5af ("vfio/iova_bitmap: refactor iova_bitmap_set() to better handle page boundaries")
Link: https://lore.kernel.org/r/20240202133415.23819-2-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

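A minimal sketch of the defensive check described above; the
mapped.pages/mapped.npages field names are assumptions, while
kmap_local_page()/kunmap_local() are the real kernel API:

    unsigned long page_idx = offset / PAGE_SIZE;

    /* The bit falls beyond the pinned window; skip it for now, a
     * later iteration will cover it once the pages are mapped. */
    if (unlikely(page_idx >= bitmap->mapped.npages))
            return;

    kaddr = kmap_local_page(bitmap->mapped.pages[page_idx]);
    /* ... set the dirty bits within this page ... */
    kunmap_local(kaddr);
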
2024-02-01  iommu: Allow ops->default_domain to work when !CONFIG_IOMMU_DMA  [Jason Gunthorpe, 1 file changed, -5/+13]

The ops->default_domain flow used a 0 req_type to select the default
domain, and this was enforced by iommu_group_alloc_default_domain().

When !CONFIG_IOMMU_DMA started forcing the old ARM32 drivers into
IDENTITY, it also overrode the 0 req_type of the ops->default_domain
drivers to IDENTITY, which ends up causing failures during device
probe.

Make iommu_group_alloc_default_domain() accept a req_type that matches
the ops->default_domain, and have iommu_group_alloc_default_domain()
generate a req_type that matches the default_domain. This way the
req_type always describes what kind of domain should be attached, and
ops->default_domain overrides all other mechanisms to choose the
default domain.

Fixes: 2ad56efa80db ("powerpc/iommu: Setup a default domain and remove set_platform_dma_ops")
Fixes: 0f6a90436a57 ("iommu: Do not use IOMMU_DOMAIN_DMA if CONFIG_IOMMU_DMA is not enabled")
Reported-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Closes: https://lore.kernel.org/linux-iommu/20240123165829.630276-1-ovidiu.panait@windriver.com/
Reported-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Closes: https://lore.kernel.org/linux-iommu/170618452753.3805.4425669653666211728.stgit@ltcd48-lp2.aus.stglab.ibm.com/
Tested-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/0-v1-755bd21c4a64+525b8-iommu_def_dom_fix_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

2024-01-19  Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd  [Linus Torvalds, 6 files changed, -13/+256]

Pull iommufd updates from Jason Gunthorpe:
 "This brings the first of three planned user IO page table invalidation
  operations:

  - IOMMU_HWPT_INVALIDATE allows invalidating the IOTLB integrated into
    the iommu itself. The Intel implementation will also generate an
    ATC invalidation to flush the device IOTLB as it unambiguously
    knows the device, but other HW will not.

  It goes along with the prior PR to implement userspace IO page tables
  (aka nested translation for VMs) to allow Intel to have full
  functionality for simple cases. An Intel implementation of the
  operation is provided.

  Also fix a small bug in the selftest mock iommu driver probe"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
  iommufd/selftest: Check the bus type during probe
  iommu/vt-d: Add iotlb flush for nested domain
  iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
  iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
  iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
  iommufd/selftest: Add mock_domain_cache_invalidate_user support
  iommu: Add iommu_copy_struct_from_user_array helper
  iommufd: Add IOMMU_HWPT_INVALIDATE
  iommu: Add cache_invalidate_user op

2024-01-19  Merge tag 'iommu-updates-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu  [Linus Torvalds, 33 files changed, -952/+1036]

Pull iommu updates from Joerg Roedel:

 "Core changes:
   - Fix race conditions in device probe path
   - Retire IOMMU bus_ops
   - Support for passing custom allocators to page table drivers
   - Clean up Kconfig around IOMMU_SVA
   - Support for sharing SVA domains with all devices bound to a mm
   - Firmware data parsing cleanup
   - Tracing improvements for iommu-dma code
   - Some smaller fixes and cleanups

  ARM-SMMU drivers:
   - Device-tree binding updates:
      - Add additional compatible strings for Qualcomm SoCs
      - Document Adreno clocks for Qualcomm's SM8350 SoC
   - SMMUv2:
      - Implement support for the ->domain_alloc_paging() callback
      - Ensure Secure context is restored following suspend of Qualcomm
        SMMU implementation
   - SMMUv3:
      - Disable stalling mode for the "quiet" context descriptor
      - Minor refactoring and driver cleanups

  Intel VT-d driver:
   - Cleanup and refactoring

  AMD IOMMU driver:
   - Improve IO TLB invalidation logic
   - Small cleanups and improvements

  Rockchip IOMMU driver:
   - DT binding update to add Rockchip RK3588

  Apple DART driver:
   - Apple M1 USB4/Thunderbolt DART support
   - Cleanups

  Virtio IOMMU driver:
   - Add support for iotlb_sync_map
   - Enable deferred IO TLB flushes"

* tag 'iommu-updates-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (66 commits)
  iommu: Don't reserve 0-length IOVA region
  iommu/vt-d: Move inline helpers to header files
  iommu/vt-d: Remove unused vcmd interfaces
  iommu/vt-d: Remove unused parameter of intel_pasid_setup_pass_through()
  iommu/vt-d: Refactor device_to_iommu() to retrieve iommu directly
  iommu/sva: Fix memory leak in iommu_sva_bind_device()
  dt-bindings: iommu: rockchip: Add Rockchip RK3588
  iommu/dma: Trace bounce buffer usage when mapping buffers
  iommu/arm-smmu: Convert to domain_alloc_paging()
  iommu/arm-smmu: Pass arm_smmu_domain to internal functions
  iommu/arm-smmu: Implement IOMMU_DOMAIN_BLOCKED
  iommu/arm-smmu: Convert to a global static identity domain
  iommu/arm-smmu: Reorganize arm_smmu_domain_add_master()
  iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED
  iommu/arm-smmu-v3: Master cannot be NULL in arm_smmu_write_strtab_ent()
  iommu/arm-smmu-v3: Add a type for the STE
  iommu/arm-smmu-v3: disable stall for quiet_cd
  iommu/qcom: restore IOMMU state if needed
  iommu/arm-smmu-qcom: Add QCM2290 MDSS compatible
  iommu/arm-smmu-qcom: Add missing GMU entry to match table
  ...

2024-01-11  iommufd/selftest: Check the bus type during probe  [Jason Gunthorpe, 1 file changed, -13/+15]

This relied on the probe function only being invoked by the bus type
the mock was registered on. The removal of the bus ops broke this
assumption and the probe could be called on non-mock bus types like
PCI.

Check the bus type directly in probe.

Fixes: 17de3f5fdd35 ("iommu: Retire bus ops")
Link: https://lore.kernel.org/r/0-v1-82d59f7eab8c+40c-iommufd_mock_bus_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2024-01-11  iommu/vt-d: Add iotlb flush for nested domain  [Lu Baolu, 1 file changed, -0/+88]

This implements the .cache_invalidate_user() callback to support iotlb
flush for nested domains.

Link: https://lore.kernel.org/r/20240111041015.47920-9-yi.l.liu@intel.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Co-developed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2024-01-11  iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op  [Nicolin Chen, 2 files changed, -0/+31]

Allow testing whether the IOTLB has been invalidated or not.

Link: https://lore.kernel.org/r/20240111041015.47920-6-yi.l.liu@intel.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2024-01-11  iommufd/selftest: Add mock_domain_cache_invalidate_user support  [Nicolin Chen, 2 files changed, -0/+68]

Add mock_domain_cache_invalidate_user() and its data structure so that
a user space selftest program can cover the user cache invalidation
pathway.

Link: https://lore.kernel.org/r/20240111041015.47920-5-yi.l.liu@intel.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Co-developed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2024-01-11  iommufd: Add IOMMU_HWPT_INVALIDATE  [Yi Liu, 3 files changed, -0/+54]

In nested translation, the stage-1 page table is user-managed but
cached by the IOMMU hardware, so an update to present page table
entries in the stage-1 page table should be followed by a cache
invalidation.

Add an IOMMU_HWPT_INVALIDATE ioctl to support such a cache
invalidation. It takes hwpt_id to specify the iommu_domain, and a
multi-entry array to support multiple invalidation data in one ioctl.
enum iommu_hwpt_invalidate_data_type is defined to tag the data type of
the entries in the multi-entry array.

Link: https://lore.kernel.org/r/20240111041015.47920-3-yi.l.liu@intel.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

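From userspace, the new ioctl is driven roughly as below. The field
names follow the description above, but treat the exact layout of
struct iommu_hwpt_invalidate as an assumption to be checked against the
uAPI headers:

    struct iommu_hwpt_invalidate cmd = {
            .size = sizeof(cmd),
            .hwpt_id = hwpt_id,             /* the iommu_domain to flush */
            .data_type = IOMMU_HWPT_INVALIDATE_DATA_VTD_S1,
            .data_uptr = (uintptr_t)reqs,   /* multi-entry array */
            .entry_len = sizeof(reqs[0]),
            .entry_num = nreqs,
    };

    if (ioctl(iommufd, IOMMU_HWPT_INVALIDATE, &cmd))
            err(1, "IOMMU_HWPT_INVALIDATE");
    /* on return, cmd.entry_num reports how many entries were handled */
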
2024-01-09  Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm  [Linus Torvalds, 2 files changed, -2/+2]

Pull MM updates from Andrew Morton:
 "Many singleton patches against the MM code. The patch series which are
  included in this merge do the following:

  - Peng Zhang has done some mapletree maintenance work in the series
    'maple_tree: add mt_free_one() and mt_attr() helpers' and 'Some
    cleanups of maple tree'

  - In the series 'mm: use memmap_on_memory semantics for dax/kmem'
    Vishal Verma has altered the interworking between memory-hotplug
    and dax/kmem so that newly added 'device memory' can more easily
    have its memmap placed within that newly added memory.

  - Matthew Wilcox continues folio-related work (including a few fixes)
    in the patch series 'Add folio_zero_tail() and folio_fill_tail()',
    'Make folio_start_writeback return void', 'Fix fault handler's
    handling of poisoned tail pages', 'Convert aops->error_remove_page
    to ->error_remove_folio', 'Finish two folio conversions' and 'More
    swap folio conversions'

  - Kefeng Wang has also contributed folio-related work in the series
    'mm: cleanup and use more folio in page fault'

  - Jim Cromie has improved the kmemleak reporting output in the series
    'tweak kmemleak report format'.

  - In the series 'stackdepot: allow evicting stack traces' Andrey
    Konovalov permits clients (in this case KASAN) to cause eviction of
    no longer needed stack traces.

  - Charan Teja Kalla has fixed some accounting issues in the page
    allocator's atomic reserve calculations in the series 'mm:
    page_alloc: fixes for high atomic reserve calculations'.

  - Dmitry Rokosov has added to the samples/ directory some sample code
    for a userspace memcg event listener application. See the series
    'samples: introduce cgroup events listeners'.

  - Some mapletree maintenance work from Liam Howlett in the series
    'maple_tree: iterator state changes'.

  - Nhat Pham has improved zswap's approach to writeback in the series
    'workload-specific and memory pressure-driven zswap writeback'.

  - DAMON/DAMOS feature and maintenance work from SeongJae Park in the
    series 'mm/damon: let users feed and tame/auto-tune DAMOS',
    'selftests/damon: add Python-written DAMON functionality tests' and
    'mm/damon: misc updates for 6.8'

  - Yosry Ahmed has improved memcg's stats flushing in the series 'mm:
    memcg: subtree stats flushing and thresholds'.

  - In the series 'Multi-size THP for anonymous memory' Ryan Roberts
    has added a runtime opt-in feature to transparent hugepages which
    improves performance by allocating larger chunks of memory during
    anonymous page faults.

  - Matthew Wilcox has also contributed some cleanup and maintenance
    work against the buffer_head code in the series 'More buffer_head
    cleanups'.

  - Suren Baghdasaryan has done work on Andrea Arcangeli's series
    'userfaultfd move option'. UFFDIO_MOVE permits userspace heap
    compaction algorithms to move userspace's pages around rather than
    UFFDIO_COPY's alloc/copy/free.

  - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm:
    Add ksm advisor'. This is a governor which tunes KSM's scanning
    aggressiveness in response to userspace's current needs.

  - Chengming Zhou has optimized zswap's temporary working memory use
    in the series 'mm/zswap: dstmem reuse optimizations and cleanups'.

  - Matthew Wilcox has performed some maintenance work on the writeback
    code, both code and within filesystems. The series is 'Clean up the
    writeback paths'.

  - Andrey Konovalov has optimized KASAN's handling of alloc and free
    stack traces for secondary-level allocators, in the series 'kasan:
    save mempool stack traces'.

  - Andrey also performed some KASAN maintenance work in the series
    'kasan: assorted clean-ups'.

  - David Hildenbrand has gone to town on the rmap code. Cleanups, more
    pte batching, folio conversions and more. See the series 'mm/rmap:
    interface overhaul'.

  - Kinsey Ho has contributed some maintenance work on the MGLRU code
    in the series 'mm/mglru: Kconfig cleanup'.

  - Matthew Wilcox has contributed lruvec page accounting code cleanups
    in the series 'Remove some lruvec page accounting functions'"

* tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits)
  mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER
  mm, treewide: introduce NR_PAGE_ORDERS
  selftests/mm: add separate UFFDIO_MOVE test for PMD splitting
  selftests/mm: skip test if application doesn't has root privileges
  selftests/mm: conform test to TAP format output
  selftests: mm: hugepage-mmap: conform to TAP format output
  selftests/mm: gup_test: conform test to TAP format output
  mm/selftests: hugepage-mremap: conform test to TAP format output
  mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING
  mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large
  mm/memcontrol: remove __mod_lruvec_page_state()
  mm/khugepaged: use a folio more in collapse_file()
  slub: use a folio in __kmalloc_large_node
  slub: use folio APIs in free_large_kmalloc()
  slub: use alloc_pages_node() in alloc_slab_page()
  mm: remove inc/dec lruvec page state functions
  mm: ratelimit stat flush from workingset shrinker
  kasan: stop leaking stack trace handles
  mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE
  mm/mglru: add dummy pmd_dirty()
  ...

2024-01-09  Merge tag 'x86-apic-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds, 2 files changed, -3/+3]

Pull x86 apic updates from Ingo Molnar:

 - Clean up 'struct apic':
    - Drop ::delivery_mode
    - Drop 'enum apic_delivery_modes'
    - Drop 'struct local_apic'

 - Fix comments

* tag 'x86-apic-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioapic: Remove unfinished sentence from comment
  x86/apic: Drop struct local_apic
  x86/apic: Drop enum apic_delivery_modes
  x86/apic: Drop apic::delivery_mode

2024-01-09  mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER  [Kirill A. Shutemov, 2 files changed, -2/+2]

commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has
changed the definition of MAX_ORDER to be inclusive. This has caused
issues with code that was not yet upstream and depended on the previous
definition.

To draw attention to the altered meaning of the define, rename
MAX_ORDER to MAX_PAGE_ORDER.

Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

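To illustrate the semantic shift that motivated the rename: valid page
orders are now 0..MAX_PAGE_ORDER inclusive, with NR_PAGE_ORDERS
(introduced in the same merge) for array sizing. A sketch of the
idiomatic usage after the rename:

    /* NR_PAGE_ORDERS == MAX_PAGE_ORDER + 1, so arrays and loops read: */
    unsigned long counts[NR_PAGE_ORDERS];
    int order;

    for (order = 0; order <= MAX_PAGE_ORDER; order++)  /* inclusive bound */
            counts[order] = 0;
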
2024-01-03  Merge branches 'apple/dart', 'arm/rockchip', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next  [Joerg Roedel, 33 files changed, -952/+1036]

2023-12-19  iommu: Don't reserve 0-length IOVA region  [Ashish Mhetre, 1 file changed, -0/+4]

When the bootloader/firmware doesn't set up the framebuffers, their
address and size are 0 in the "iommu-addresses" property. If an IOVA
region is reserved with 0 length, then it ends up corrupting the IOVA
rbtree with an entry which has pfn_hi < pfn_lo.

If we intend to use the display driver in the kernel without a
framebuffer, this causes the display IOMMU mappings to fail, as the
entire valid IOVA space is reserved when address and length are passed
as 0.

An ideal solution would be the firmware removing the "iommu-addresses"
property and the corresponding "memory-region" if display is not
present. But the kernel should be able to handle this by checking the
size of the IOVA region and skipping the IOVA reservation if the size
is 0. Also, add a warning if firmware requests a 0-length IOVA region
reservation.

Fixes: a5bf3cfce8cb ("iommu: Implement of_iommu_get_resv_regions()")
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20231205065656.9544-1-amhetre@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

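A minimal sketch of the guard described above, inside the loop that
walks "iommu-addresses" entries (the surrounding variable names and the
warning text are assumptions):

    /* Skip and warn on degenerate firmware-provided regions. */
    if (!length) {
            dev_warn(dev, "ignoring 0-length reserved IOVA region at %pa\n",
                     &start);
            continue;
    }
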
2023-12-19  iommu/vt-d: Move inline helpers to header files  [Lu Baolu, 4 files changed, -405/+400]

Move inline helpers to header files so that other files can use them
without duplicating the code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20231116015048.29675-5-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

2023-12-19  iommu/vt-d: Remove unused vcmd interfaces  [Lu Baolu, 4 files changed, -75/+0]

Commit 99b5726b4423 ("iommu: Remove ioasid infrastructure") has removed
ioasid allocation interfaces from the iommu subsystem. As a result,
these vcmd interfaces have become obsolete. Remove them to avoid dead
code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20231116015048.29675-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

2023-12-19  iommu/vt-d: Remove unused parameter of intel_pasid_setup_pass_through()  [Lu Baolu, 3 files changed, -5/+2]

The domain parameter of this helper is unused and can be deleted to
avoid dead code.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20231116015048.29675-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

2023-12-19  iommu/vt-d: Refactor device_to_iommu() to retrieve iommu directly  [Lu Baolu, 3 files changed, -38/+13]

The device_to_iommu() helper was originally designed to look up the
DMAR ACPI table to retrieve the iommu device and the request ID for a
given device. However, it was also being used in other places where
there was no need to look up the ACPI table at all.

Retrieve the iommu device directly from the per-device iommu private
data in functions called after the device is probed. Rename the
original device_to_iommu() function to a more meaningful name,
device_lookup_iommu(), to avoid misusing it.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20231116015048.29675-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

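The direct retrieval amounts to reading cached per-device data instead
of re-walking ACPI tables. A sketch of the post-probe pattern;
dev_iommu_priv_get() is the real core API, while treating the struct
layout as the VT-d driver's is an assumption:

    /* After probe, the driver has stashed its per-device state: */
    struct device_domain_info *info = dev_iommu_priv_get(dev);
    struct intel_iommu *iommu = info->iommu;  /* no ACPI lookup needed */
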
2023-12-15  iommu/sva: Fix memory leak in iommu_sva_bind_device()  [Harshit Mogalapalli, 1 file changed, -1/+2]

Free the handle when the domain allocation fails, before unlocking and
returning.

Fixes: 092edaddb660 ("iommu: Support mm PASID 1:n with sva domains")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20231213111450.2487861-1-harshit.m.mogalapalli@oracle.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

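A hedged sketch of the corrected error path, simplified from the
description above (the label name and exact control flow are
assumptions):

    handle = kzalloc(sizeof(*handle), GFP_KERNEL);
    if (!handle)
            return ERR_PTR(-ENOMEM);

    mutex_lock(&iommu_sva_lock);
    domain = iommu_sva_domain_alloc(dev, mm);
    if (!domain) {
            ret = -ENOMEM;
            goto out_free_handle;   /* previously leaked the handle */
    }
    /* ... */
out_free_handle:
    kfree(handle);                  /* free before unlocking */
    mutex_unlock(&iommu_sva_lock);
    return ERR_PTR(ret);
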
2023-12-15  iommu/dma: Trace bounce buffer usage when mapping buffers  [Isaac J. Manjarres, 1 file changed, -0/+3]

When commit 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce
buffers") was introduced, it did not add the logic for tracing the
bounce buffer usage from iommu_dma_map_page().

All of the users of swiotlb_tbl_map_single() trace their bounce buffer
usage, except iommu_dma_map_page(). This makes it difficult to track
SWIOTLB usage from that function.

Thus, trace bounce buffer usage from iommu_dma_map_page().

Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Cc: stable@vger.kernel.org # v5.15+
Cc: Tom Murphy <murphyt7@tcd.ie>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Saravana Kannan <saravanak@google.com>
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Link: https://lore.kernel.org/r/20231208234141.2356157-1-isaacmanjarres@google.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

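The change is essentially one tracepoint call before bouncing. A
sketch, assuming trace_swiotlb_bounced() takes the device, address, and
size as the other swiotlb_tbl_map_single() callers use it (arguments
elided where uncertain):

    /* in iommu_dma_map_page(), just before bouncing through SWIOTLB: */
    trace_swiotlb_bounced(dev, phys, size);
    phys = swiotlb_tbl_map_single(dev, phys, size, /* ...remaining args... */);
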
2023-12-13  iommu/arm-smmu: Convert to domain_alloc_paging()  [Jason Gunthorpe, 1 file changed, -6/+11]

Now that the BLOCKED and IDENTITY behaviors are managed with their own
domains, change to the domain_alloc_paging() op.

The check for using_legacy_binding is now redundant:
arm_smmu_def_domain_type() always returns IOMMU_DOMAIN_IDENTITY for
this mode, so the core code will never attempt to create a DMA domain
in the first place.

Since commit a4fdd9762272 ("iommu: Use flush queue capability") the
core code only passes in IDENTITY/BLOCKED/UNMANAGED/DMA domain types.
It will not pass in IDENTITY or BLOCKED if the global statics exist, so
the test for DMA is also redundant now.

Call arm_smmu_init_domain_context() early if a dev is available.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v2-c86cc8c2230e+160bb-smmu_newapi_jgg@nvidia.com
[will: Simplify arm_smmu_domain_alloc_paging() since 'cfg' cannot be NULL]
Signed-off-by: Will Deacon <will@kernel.org>

2023-12-13  iommu/arm-smmu: Pass arm_smmu_domain to internal functions  [Jason Gunthorpe, 1 file changed, -12/+10]

Keep the types consistent: all the callers of these functions have
already obtained a struct arm_smmu_domain, so don't needlessly go
to/from an iommu_domain through the internal call chains.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v2-c86cc8c2230e+160bb-smmu_newapi_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>

2023-12-13  iommu/arm-smmu: Implement IOMMU_DOMAIN_BLOCKED  [Jason Gunthorpe, 1 file changed, -3/+25]

Using the same design as IDENTITY, set up an S2CR_TYPE_FAULT s2cr for
the device.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v2-c86cc8c2230e+160bb-smmu_newapi_jgg@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>