path: root/drivers/iommu
Age | Commit message | Author | Files | Lines
2024-05-17 | iommu: mtk: fix module autoloading | Krzysztof Kozlowski | 2 | -0/+2
[ Upstream commit 7537e31df80cb58c27f3b6fef702534ea87a5957 ]

Add MODULE_DEVICE_TABLE(), so that the modules can be properly autoloaded based on the alias from the of_device_id table.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20240410164109.233308-1-krzk@kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-17 | iommu/vt-d: Allocate local memory for page request queue | Jacob Pan | 1 | -1/+1
[ Upstream commit a34f3e20ddff02c4f12df2c0635367394e64c63d ]

The page request queue is per IOMMU, so its allocation should be NUMA-aware for performance reasons.

Fixes: a222a7f0bb6c ("iommu/vt-d: Implement page request handling")
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240403214007.985600-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-03 | iommu/dma: Force swiotlb_max_mapping_size on an untrusted device | Nicolin Chen | 1 | -0/+9
[ Upstream commit afc5aa46ed560f01ceda897c053c6a40c77ce5c4 ]

The swiotlb does not support a mapping size > swiotlb_max_mapping_size(). On the other hand, with a 64KB PAGE_SIZE configuration, an NVMe device has been observed to map sizes between 300KB and 512KB, which makes the swiotlb mappings fail even though the default swiotlb pool has many free slots:

systemd[1]: Started Journal Service.
=> nvme 0000:00:01.0: swiotlb buffer is full (sz: 327680 bytes), total 32768 (slots), used 32 (slots)
note: journal-offline[392] exited with irqs disabled
note: journal-offline[392] exited with preempt_count 1
Call trace:
[ 3.099918] swiotlb_tbl_map_single+0x214/0x240
[ 3.099921] iommu_dma_map_page+0x218/0x328
[ 3.099928] dma_map_page_attrs+0x2e8/0x3a0
[ 3.101985] nvme_prep_rq.part.0+0x408/0x878 [nvme]
[ 3.102308] nvme_queue_rqs+0xc0/0x300 [nvme]
[ 3.102313] blk_mq_flush_plug_list.part.0+0x57c/0x600
[ 3.102321] blk_add_rq_to_plug+0x180/0x2a0
[ 3.102323] blk_mq_submit_bio+0x4c8/0x6b8
[ 3.103463] __submit_bio+0x44/0x220
[ 3.103468] submit_bio_noacct_nocheck+0x2b8/0x360
[ 3.103470] submit_bio_noacct+0x180/0x6c8
[ 3.103471] submit_bio+0x34/0x130
[ 3.103473] ext4_bio_write_folio+0x5a4/0x8c8
[ 3.104766] mpage_submit_folio+0xa0/0x100
[ 3.104769] mpage_map_and_submit_buffers+0x1a4/0x400
[ 3.104771] ext4_do_writepages+0x6a0/0xd78
[ 3.105615] ext4_writepages+0x80/0x118
[ 3.105616] do_writepages+0x90/0x1e8
[ 3.105619] filemap_fdatawrite_wbc+0x94/0xe0
[ 3.105622] __filemap_fdatawrite_range+0x68/0xb8
[ 3.106656] file_write_and_wait_range+0x84/0x120
[ 3.106658] ext4_sync_file+0x7c/0x4c0
[ 3.106660] vfs_fsync_range+0x3c/0xa8
[ 3.106663] do_fsync+0x44/0xc0

Since untrusted devices might go down the swiotlb pathway with dma-iommu, these devices should not map a size larger than swiotlb_max_mapping_size().

To fix this bug, add iommu_dma_max_mapping_size() for untrusted devices to take into account swiotlb_max_mapping_size() vs. iova_rcache_range() from the iommu_dma_opt_mapping_size().

Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Link: https://lore.kernel.org/r/ee51a3a5c32cf885b18f6416171802669f4a718a.1707851466.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
[will: Drop redundant is_swiotlb_active(dev) check]
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
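
The size limit described in this commit can be modelled in userspace. The following is a minimal sketch, not the kernel code: the struct, the sketch_* names and the 256 KiB limit (128 slots of 2 KiB) are assumptions made for illustration. It only shows why the 327680-byte request from the log is rejected for an untrusted device while a trusted device sees no extra cap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bounce-buffer limit for illustration: 128 slots of 2 KiB each. */
#define SKETCH_SWIOTLB_MAX_MAPPING ((size_t)128 * 2048)

/* Hypothetical stand-in for struct device; only the trust flag matters here. */
struct sketch_device {
	bool untrusted;
};

/* Untrusted devices may be bounced through swiotlb, so cap their mappings. */
static size_t sketch_max_mapping_size(const struct sketch_device *dev)
{
	assert(dev != NULL);
	if (dev->untrusted)
		return SKETCH_SWIOTLB_MAX_MAPPING;
	return SIZE_MAX; /* no extra limit for trusted devices */
}

/* A request fits if it does not exceed the per-device limit. */
static bool sketch_mapping_fits(const struct sketch_device *dev, size_t size)
{
	return size <= sketch_max_mapping_size(dev);
}
```

A block driver that consults such a limit before building requests would split large I/O into chunks that fit, instead of failing at map time.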
2024-04-03 | iommu: Avoid races around default domain allocations | Charan Teja Kalla | 1 | -0/+3
This fix is applicable to the LTS kernel, 6.1.y. In the latest kernels, this race is fixed by the patch series [1] and [2]. The right thing to do here would have been to propagate these changes from the latest kernel to the stable branch, 6.1.y. However, these changes seem too intrusive to be picked for stable branches. Hence, the fix proposed here can be taken as an alternative to backporting the patch series.

[1] https://lore.kernel.org/all/0-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com/
[2] https://lore.kernel.org/all/0-v5-1b99ae392328+44574-iommu_err_unwind_jgg@nvidia.com/

Issue:
A race condition is observed when arm_smmu_device_probe and modprobe of client devices happen in parallel. This results in the allocation of a new default domain for the iommu group even though one was previously allocated and the respective iova domain (iovad) was initialized. However, for this newly allocated default domain, iovad will not be initialized. As a result, devices requesting dma allocations will use this uninitialized iovad, causing a NULL pointer dereference.

Flow:
- During arm_smmu_device_probe, bus_iommu_probe() will be called as part of iommu_device_register(). This results in the device probe, __iommu_probe_device().
- When the modprobe of the client device happens in parallel, it sets up the DMA configuration for the device using of_dma_configure_id(), which in turn calls iommu_probe_device(). Later, the default domain is allocated and attached using iommu_alloc_default_domain() and __iommu_attach_device() respectively. It then ends up initializing a mapping domain (IOVA domain) and rcaches for the device via arch_setup_dma_ops()->iommu_setup_dma_ops().
- Now, in the bus_iommu_probe() path, it again tries to allocate a default domain via probe_alloc_default_domain(). This results in allocating a new default domain (along with an IOVA domain) via __iommu_domain_alloc(). However, this newly allocated IOVA domain will not be initialized.
- Now, when the same client device tries dma allocations via iommu_dma_alloc(), it ends up accessing the rcaches of the newly allocated IOVA domain, which is not initialized. This results in a NULL pointer dereference.

Fix this issue by adding a check in probe_alloc_default_domain() to see if the iommu_group already has a default domain allocated and initialized.

Cc: <stable@vger.kernel.org> # see patch description, fix applicable only for 6.1.y
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Co-developed-by: Nikhil V <quic_nprakash@quicinc.com>
Signed-off-by: Nikhil V <quic_nprakash@quicinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
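
The effect of the added check can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: the sketch_* types and the iovad_initialized flag are hypothetical stand-ins for the iommu group, its default domain and the IOVA domain state. The point is only that a second allocation attempt must leave an existing, already initialized default domain untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an iommu domain with its IOVA state. */
struct sketch_domain {
	bool iovad_initialized;
};

/* Hypothetical stand-in for an iommu group. */
struct sketch_group {
	struct sketch_domain storage;         /* backing memory for the model */
	struct sketch_domain *default_domain; /* NULL until allocated */
	int alloc_count;                      /* how often we really allocated */
};

/* Models the fixed allocation path: return early if a domain exists. */
static int sketch_probe_alloc_default_domain(struct sketch_group *group)
{
	assert(group != NULL);
	if (group->default_domain)
		return 0; /* already allocated (and possibly initialized) */

	group->storage.iovad_initialized = false; /* fresh, uninitialized */
	group->default_domain = &group->storage;
	group->alloc_count++;
	return 0;
}
```

Without the early return, the second caller would replace the domain with a fresh one whose IOVA state was never set up, which is the NULL dereference described above.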
2024-03-27 | iommu: Fix compilation without CONFIG_IOMMU_INTEL | Bert Karwatzki | 3 | -2/+5
[ Upstream commit 70bad345e622c23bb530016925c936ab04a646ac ]

When the kernel is compiled with CONFIG_IRQ_REMAP=y but without CONFIG_IOMMU_INTEL, compilation fails since commit def054b01a8678 with an undefined reference to device_rbtree_find(). This patch makes sure that Intel-specific code is only compiled with CONFIG_IOMMU_INTEL=y.

Signed-off-by: Bert Karwatzki <spasswolf@web.de>
Fixes: 80a9b50c0b9e ("iommu/vt-d: Improve ITE fault handling if target device isn't present")
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20240307194419.15801-1-spasswolf@web.de
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27 | iommu/vt-d: Retrieve IOMMU perfmon capability information | Kan Liang | 6 | -1/+273
[ Upstream commit a6a5006dad572a53b5df3f47e1471d207ae9ba49 ]

The performance monitoring infrastructure, perfmon, is to support collection of information about key events occurring during operation of the remapping hardware, to aid performance tuning and debug. Each remapping hardware unit has capability registers that indicate support for performance monitoring features and enumerate the capabilities.

Add alloc_iommu_pmu() to retrieve IOMMU perfmon capability information for each iommu unit. The information is stored in the iommu->pmu data structure. Capability registers are read-only, so it's safe to prefetch and store them in the pmu structure. This could avoid unnecessary VMEXIT when this code is running in the virtualization environment.

Add free_iommu_pmu() to free the saved capability information when freeing the iommu unit.

Add a kernel config option for the IOMMU perfmon feature. Unless a user explicitly uses the perf tool to monitor the IOMMU perfmon event, there isn't any impact for the existing IOMMU. Enable it by default.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20230128200428.1459118-3-kan.liang@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Stable-dep-of: 70bad345e622 ("iommu: Fix compilation without CONFIG_IOMMU_INTEL")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27 | iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected | Ethan Zhao | 1 | -0/+3
[ Upstream commit 4fc82cd907ac075648789cc3a00877778aa1838b ]

For endpoint devices connected to the system via hotplug-capable ports, users can request a hot reset of the device by flapping the device's link through the slot's link control register. In response to the DLLSC interrupt, pciehp_ist() unloads the device driver and then powers the device off. This causes an IOMMU device-TLB invalidation request (Intel VT-d spec; ATS Invalidation in PCIe spec r6.1) to be sent to a target device that no longer exists, and the request is retried in an endless loop after the ITE fault is triggered in interrupt context.

This causes the following continuous hard lockup warning and a system hang:

[ 4211.433662] pcieport 0000:17:01.0: pciehp: Slot(108): Link Down
[ 4211.433664] pcieport 0000:17:01.0: pciehp: Slot(108): Card not present
[ 4223.822591] NMI watchdog: Watchdog detected hard LOCKUP on cpu 144
[ 4223.822622] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S OE kernel version xxxx
[ 4223.822623] Hardware name: vendorname xxxx 666-106, BIOS 01.01.02.03.01 05/15/2023
[ 4223.822623] RIP: 0010:qi_submit_sync+0x2c0/0x490
[ 4223.822624] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 c1 48 8b 57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> f6 c6 10 74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39
[ 4223.822624] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093
[ 4223.822625] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 0000000000000005
[ 4223.822625] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: ffff9f38401a8340
[ 4223.822625] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 0000000000000000
[ 4223.822626] R10: 0000000000000010 R11: 0000000000000018 R12: ffff9f384005e200
[ 4223.822626] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000004
[ 4223.822626] FS: 0000000000000000(0000) GS:ffffa237ae400000(0000) knlGS:0000000000000000
[ 4223.822627] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4223.822627] CR2: 00007ffe86515d80 CR3: 000002fd3000a001 CR4: 0000000000770ee0
[ 4223.822627] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4223.822628] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 4223.822628] PKRU: 55555554
[ 4223.822628] Call Trace:
[ 4223.822628] qi_flush_dev_iotlb+0xb1/0xd0
[ 4223.822628] __dmar_remove_one_dev_info+0x224/0x250
[ 4223.822629] dmar_remove_one_dev_info+0x3e/0x50
[ 4223.822629] intel_iommu_release_device+0x1f/0x30
[ 4223.822629] iommu_release_device+0x33/0x60
[ 4223.822629] iommu_bus_notifier+0x7f/0x90
[ 4223.822630] blocking_notifier_call_chain+0x60/0x90
[ 4223.822630] device_del+0x2e5/0x420
[ 4223.822630] pci_remove_bus_device+0x70/0x110
[ 4223.822630] pciehp_unconfigure_device+0x7c/0x130
[ 4223.822631] pciehp_disable_slot+0x6b/0x100
[ 4223.822631] pciehp_handle_presence_or_link_change+0xd8/0x320
[ 4223.822631] pciehp_ist+0x176/0x180
[ 4223.822631] ? irq_finalize_oneshot.part.50+0x110/0x110
[ 4223.822632] irq_thread_fn+0x19/0x50
[ 4223.822632] irq_thread+0x104/0x190
[ 4223.822632] ? irq_forced_thread_fn+0x90/0x90
[ 4223.822632] ? irq_thread_check_affinity+0xe0/0xe0
[ 4223.822633] kthread+0x114/0x130
[ 4223.822633] ? __kthread_cancel_work+0x40/0x40
[ 4223.822633] ret_from_fork+0x1f/0x30
[ 4223.822633] Kernel panic - not syncing: Hard LOCKUP
[ 4223.822634] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S OE kernel version xxxx
[ 4223.822634] Hardware name: vendorname xxxx 666-106, BIOS 01.01.02.03.01 05/15/2023
[ 4223.822634] Call Trace:
[ 4223.822634] <NMI>
[ 4223.822635] dump_stack+0x6d/0x88
[ 4223.822635] panic+0x101/0x2d0
[ 4223.822635] ? ret_from_fork+0x11/0x30
[ 4223.822635] nmi_panic.cold.14+0xc/0xc
[ 4223.822636] watchdog_overflow_callback.cold.8+0x6d/0x81
[ 4223.822636] __perf_event_overflow+0x4f/0xf0
[ 4223.822636] handle_pmi_common+0x1ef/0x290
[ 4223.822636] ? __set_pte_vaddr+0x28/0x40
[ 4223.822637] ? flush_tlb_one_kernel+0xa/0x20
[ 4223.822637] ? __native_set_fixmap+0x24/0x30
[ 4223.822637] ? ghes_copy_tofrom_phys+0x70/0x100
[ 4223.822637] ? __ghes_peek_estatus.isra.16+0x49/0xa0
[ 4223.822637] intel_pmu_handle_irq+0xba/0x2b0
[ 4223.822638] perf_event_nmi_handler+0x24/0x40
[ 4223.822638] nmi_handle+0x4d/0xf0
[ 4223.822638] default_do_nmi+0x49/0x100
[ 4223.822638] exc_nmi+0x134/0x180
[ 4223.822639] end_repeat_nmi+0x16/0x67
[ 4223.822639] RIP: 0010:qi_submit_sync+0x2c0/0x490
[ 4223.822639] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 c1 48 8b 57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> f6 c6 10 74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39
[ 4223.822640] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093
[ 4223.822640] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 0000000000000005
[ 4223.822640] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: ffff9f38401a8340
[ 4223.822641] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 0000000000000000
[ 4223.822641] R10: 0000000000000010 R11: 0000000000000018 R12: ffff9f384005e200
[ 4223.822641] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000004
[ 4223.822641] ? qi_submit_sync+0x2c0/0x490
[ 4223.822642] ? qi_submit_sync+0x2c0/0x490
[ 4223.822642] </NMI>
[ 4223.822642] qi_flush_dev_iotlb+0xb1/0xd0
[ 4223.822642] __dmar_remove_one_dev_info+0x224/0x250
[ 4223.822643] dmar_remove_one_dev_info+0x3e/0x50
[ 4223.822643] intel_iommu_release_device+0x1f/0x30
[ 4223.822643] iommu_release_device+0x33/0x60
[ 4223.822643] iommu_bus_notifier+0x7f/0x90
[ 4223.822644] blocking_notifier_call_chain+0x60/0x90
[ 4223.822644] device_del+0x2e5/0x420
[ 4223.822644] pci_remove_bus_device+0x70/0x110
[ 4223.822644] pciehp_unconfigure_device+0x7c/0x130
[ 4223.822644] pciehp_disable_slot+0x6b/0x100
[ 4223.822645] pciehp_handle_presence_or_link_change+0xd8/0x320
[ 4223.822645] pciehp_ist+0x176/0x180
[ 4223.822645] ? irq_finalize_oneshot.part.50+0x110/0x110
[ 4223.822645] irq_thread_fn+0x19/0x50
[ 4223.822646] irq_thread+0x104/0x190
[ 4223.822646] ? irq_forced_thread_fn+0x90/0x90
[ 4223.822646] ? irq_thread_check_affinity+0xe0/0xe0
[ 4223.822646] kthread+0x114/0x130
[ 4223.822647] ? __kthread_cancel_work+0x40/0x40
[ 4223.822647] ret_from_fork+0x1f/0x30
[ 4223.822647] Kernel Offset: 0x6400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Such an issue can be triggered by any regular surprise-removal hotplug operation, for example:
1. Pull the EP (endpoint device) out directly.
2. Turn off the EP's power.
3. Bring the link down.

This patch aims to cover both regular safe removal and surprise removal. The hot-unplug handling can fix the ATS Invalidation hang by calling pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() to check the target device state, which avoids sending a meaningless ATS Invalidation request to the IOMMU when the device is gone (see the IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1).

For safe removal, the device is not removed until the whole software handling process is done, so it does not trigger the hard lockup caused by an overly long ATS Invalidation timeout wait. In the safe removal path, pciehp_unconfigure_device() checks the 'presence' parameter and does not set the device state to pci_channel_io_perm_failure, so pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() returns false and the function proceeds as usual.

For surprise removal, pciehp_unconfigure_device() sets the device state to pci_channel_io_perm_failure, meaning the device is already gone (disconnected). pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() then returns true and the function bails out instead of blindly sending an ATS Invalidation request to the disconnected device. This avoids triggering a further ITE fault, which would block all invalidation requests from being handled; retrying the timed-out request could in turn trigger a hard lockup.

safe removal (present) & surprise removal (not present)

pciehp_ist()
   pciehp_handle_presence_or_link_change()
     pciehp_disable_slot()
       remove_board()
         pciehp_unconfigure_device(presence) {
           if (!presence)
                pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
           }

This patch works for regular safe removal and surprise removal of ATS-capable endpoints on PCIe switch downstream ports.

Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface")
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Tested-by: Haorong Ye <yehaorong@bytedance.com>
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Link: https://lore.kernel.org/r/20240301080727.3529832-3-haifeng.zhao@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
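
The guard this commit describes can be sketched as a userspace model. Everything named sketch_* below is hypothetical and stands in for kernel state; in particular the disconnected flag models a device whose channel state is pci_channel_io_perm_failure. The sketch only shows the ordering of the checks: no invalidation request is queued once the device is marked disconnected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PCI device state relevant to this fix. */
struct sketch_pci_dev {
	bool ats_enabled;        /* device uses a device-TLB */
	bool disconnected;       /* models pci_channel_io_perm_failure */
	int invalidations_sent;  /* requests actually queued */
};

/* Models the presence-dependent marking done on surprise removal. */
static void sketch_unconfigure_device(struct sketch_pci_dev *pdev, bool presence)
{
	assert(pdev != NULL);
	if (!presence)
		pdev->disconnected = true;
}

/* Returns true if a device-TLB invalidation was queued. */
static bool sketch_devtlb_invalidate(struct sketch_pci_dev *pdev)
{
	assert(pdev != NULL);
	if (!pdev->ats_enabled)
		return false;    /* nothing cached on the device side */
	if (pdev->disconnected)
		return false;    /* device is gone: skip the request */
	pdev->invalidations_sent++;
	return true;
}
```

In the safe removal case the device is still present while teardown runs, so the request is sent as before; only the surprise removal path takes the early exit.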
2024-03-27 | iommu/amd: Mark interrupt as managed | Mario Limonciello | 1 | -0/+3
[ Upstream commit 0feda94c868d396fac3b3cb14089d2d989a07c72 ]

On many systems that have an AMD IOMMU the following sequence of warnings is observed during bootup.

```
pci 0000:00:00.2 can't derive routing for PCI INT A
pci 0000:00:00.2: PCI INT A: not connected
```

This series of events happens because of the IOMMU initialization sequence order and the lack of _PRT entries for the IOMMU.

During initialization the IOMMU driver first enables the PCI device using pci_enable_device(). This will call acpi_pci_irq_enable() which will check if the interrupt is declared in a PCI routing table (_PRT) entry. According to the PCI spec [1] these routing entries are only required under PCI root bridges:

	The _PRT object is required under all PCI root bridges

The IOMMU is directly connected to the root complex, so there is no parent bridge to look for a _PRT entry. The first warning is emitted since no entry could be found in the hierarchy. The second warning is then emitted because the interrupt hasn't yet been configured to any value. The pin was configured in pci_read_irq() but the byte in PCI_INTERRUPT_LINE returns 0xff, which means "Unknown".

After that sequence of events pci_enable_msi() is called and this will allocate an interrupt.

That is, both of these warnings are totally harmless because the IOMMU uses MSI for interrupts. To avoid even trying to probe for a _PRT entry, mark the IOMMU as IRQ managed. This avoids both warnings.

Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/06_Device_Configuration/Device_Configuration.html?highlight=_prt#prt-pci-routing-table [1]
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Fixes: cffe0a2b5a34 ("x86, irq: Keep balance of IOAPIC pin reference count")
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20240122233400.1802-1-mario.limonciello@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-06 | iommu/arm-smmu-qcom: Limit the SMR groups to 128 | Manivannan Sadhasivam | 1 | -1/+15
[ Upstream commit 12261134732689b7e30c59db9978f81230965181 ]

Some platforms support more stream matching groups than the 128 defined by the ARM SMMU architecture specification. But for unknown reasons, those additional groups don't exhibit the same behavior as the architecture-supported ones.

For instance, the additional groups will not detect the quirky behavior of some firmware versions intercepting writes to the S2CR register, thus skipping the quirk implemented in the driver and causing a boot crash.

So let's limit the groups to 128 for now, until the issue with those groups is fixed, and issue a notice to users in that case.

Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20230327080029.11584-1-manivannan.sadhasivam@linaro.org
[will: Reworded the comment slightly]
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
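
The clamp described here is simple enough to show as a userspace sketch. The function and macro names are hypothetical; only the limit of 128 comes from the commit message. The out-parameter models the "issue a notice" part by telling the caller whether the hardware-reported count was reduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Limit taken from the commit message; the macro name is hypothetical. */
#define SKETCH_ARCH_MAX_SMR_GROUPS 128u

/* Clamp the reported group count and flag whether a notice is warranted. */
static unsigned int sketch_limit_smr_groups(unsigned int reported, bool *limited)
{
	assert(limited != NULL);
	*limited = reported > SKETCH_ARCH_MAX_SMR_GROUPS;
	return *limited ? SKETCH_ARCH_MAX_SMR_GROUPS : reported;
}
```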
2024-03-06 | iommu/arm-smmu-v3: Acknowledge pri/event queue overflow if any | Tomas Krcka | 1 | -5/+14
[ Upstream commit 67ea0b7ce41844eae7c10bb04dfe66a23318c224 ]

When an overflow occurs in the PRI queue, the SMMU toggles the overflow flag in the PROD register. To exit the overflow condition, the PRI thread is supposed to acknowledge it by toggling this flag in the CONS register. Unacknowledged overflow causes the queue to stop adding anything new.

Currently, the priq thread always writes the CONS register back to the SMMU after clearing the queue.

The writeback is not necessary if the OVFLG in the PROD register has not changed, i.e. no overflow has occurred.

This commit checks the difference of the overflow flag between the CONS and PROD registers. If it differs, toggle the OVACKFLG flag in the CONS register and write it to the SMMU.

The situation is similar for the event queue. The acknowledge register is also toggled after clearing the event queue but never propagated to the hardware. This would only be done the next time the evtq thread executes.

Unacknowledged event queue overflow doesn't affect the event queue, because the SMMU still adds elements to that queue when the overflow condition is active. But it feels nicer to keep the SMMU in sync when possible, so use the same approach here as well.

Signed-off-by: Tomas Krcka <krckatom@amazon.de>
Link: https://lore.kernel.org/r/20230329123420.34641-1-tomas.krcka@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
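
The acknowledge logic above amounts to comparing one flag bit in two register values. The following is a userspace sketch under stated assumptions: the registers are plain 32-bit integers, the overflow flag is assumed to sit in bit 31, and the sketch_* names are hypothetical. It returns whether a write-back of CONS would be needed, which is the condition the commit introduces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed position of the overflow flag in both PROD and CONS. */
#define SKETCH_OVF_FLAG (UINT32_C(1) << 31)

/*
 * If the overflow flag differs between prod and cons, copy the flag from
 * prod into cons (the acknowledgement) and report that cons must be
 * written back to the hardware. Otherwise leave cons alone.
 */
static bool sketch_ack_overflow(uint32_t prod, uint32_t *cons)
{
	assert(cons != NULL);
	if ((prod & SKETCH_OVF_FLAG) == (*cons & SKETCH_OVF_FLAG))
		return false; /* no unacknowledged overflow */

	*cons = (prod & SKETCH_OVF_FLAG) | (*cons & ~SKETCH_OVF_FLAG);
	return true;
}
```

Because the flag toggles rather than latches, the acknowledgement works in both directions: a set flag in PROD sets it in CONS, and a later cleared flag in PROD clears it again.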
2024-03-06 | iommu/sprd: Release dma buffer to avoid memory leak | Chunyan Zhang | 1 | -7/+22
[ Upstream commit 9afea57384d4ae7b2034593eac7fa76c7122762a ]

When attaching to a domain, the driver allocates a DMA buffer which is used to store the address mapping table, and it needs to be released when the IOMMU domain is freed.

Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com>
Link: https://lore.kernel.org/r/20230331033124.864691-2-zhang.lyra@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26 | iommu/dma: Trace bounce buffer usage when mapping buffers | Isaac J. Manjarres | 1 | -0/+3
commit a63c357b9fd56ad5fe64616f5b22835252c6a76a upstream.

When commit 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers") was introduced, it did not add the logic for tracing the bounce buffer usage from iommu_dma_map_page().

All of the users of swiotlb_tbl_map_single() trace their bounce buffer usage, except iommu_dma_map_page(). This makes it difficult to track SWIOTLB usage from that function. Thus, trace bounce buffer usage from iommu_dma_map_page().

Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Cc: stable@vger.kernel.org # v5.15+
Cc: Tom Murphy <murphyt7@tcd.ie>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Saravana Kannan <saravanak@google.com>
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Link: https://lore.kernel.org/r/20231208234141.2356157-1-isaacmanjarres@google.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-26 | iommu/arm-smmu-qcom: Add missing GMU entry to match table | Rob Clark | 1 | -0/+1
commit afc95681c3068956fed1241a1ff1612c066c75ac upstream.

In some cases the firmware expects cbndx 1 to be assigned to the GMU, so we also want the default domain for the GMU to be an identity domain. This way it does not get a context bank assigned. Without this, both of_dma_configure() and drm/msm's iommu_domain_attach() will trigger allocating and configuring a context bank. So GMU ends up attached to both cbndx 1 and later cbndx 2. This arrangement seemingly confounds and surprises the firmware if the GPU later triggers a translation fault, resulting (on sc8280xp / lenovo x13s, at least) in the SMMU getting wedged and the GPU stuck without memory access.

Cc: stable@vger.kernel.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20231210180655.75542-1-robdclark@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-13 | iommu: Avoid more races around device probe | Robin Murphy | 2 | -13/+18
commit a2e7e59a94269484a83386972ca07c22fd188854 upstream.

It turns out there are more subtle races beyond just the main part of __iommu_probe_device() itself running in parallel - the dev_iommu_free() on the way out of an unsuccessful probe can still manage to trip up concurrent accesses to a device's fwspec. Thus, extend the scope of iommu_probe_device_lock() to also serialise fwspec creation and initial retrieval.

Reported-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Link: https://lore.kernel.org/linux-iommu/e2e20e1c-6450-4ac5-9804-b0000acdf7de@quicinc.com/
Fixes: 01657bc14a39 ("iommu: Avoid races around device probe")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: André Draszik <andre.draszik@linaro.org>
Tested-by: André Draszik <andre.draszik@linaro.org>
Link: https://lore.kernel.org/r/16f433658661d7cadfea51e7c65da95826112a2b.1700071477.git.robin.murphy@arm.com
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08 | iommu/vt-d: Make context clearing consistent with context mapping | Lu Baolu | 1 | -2/+2
[ Upstream commit 9a16ab9d640274b20813d2d17475e18d3e99d834 ]

In the iommu probe_device path, domain_context_mapping() allows setting up the context entry for a non-PCI device. However, in the iommu release_device path, domain_context_clear() only clears context entries for PCI devices.

Make domain_context_clear() behave consistently with domain_context_mapping() by clearing context entries for both PCI and non-PCI devices.

Fixes: 579305f75d34 ("iommu/vt-d: Update to use PCI DMA aliases")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20231114011036.70142-4-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-08 | iommu/vt-d: Disable PCI ATS in legacy passthrough mode | Lu Baolu | 1 | -1/+2
[ Upstream commit da37dddcf4caf015c400a930301d2ee27a7a15fb ]

When IOMMU hardware operates in legacy mode, the TT field of the context entry determines the translation type, with three supported types (Section 9.3 Context Entry):

- DMA translation without device TLB support
- DMA translation with device TLB support
- Passthrough mode with translated and translation requests blocked

Device TLB support is absent when hardware is configured in passthrough mode.

Disable the PCI ATS feature when IOMMU is configured for passthrough translation type in legacy (non-scalable) mode.

Fixes: 0faa19a1515f ("iommu/vt-d: Decouple PASID & PRI enabling from SVA")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20231114011036.70142-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-08 | iommu/vt-d: Add device_block_translation() helper | Lu Baolu | 1 | -6/+38
[ Upstream commit c7be17c2903d4acbf9aa372bfb6e2a418387fce0 ]

If domain attaching to device fails, the IOMMU driver should bring the device to blocking DMA state. The upper layer is expected to recover it by attaching a new domain. Use device_block_translation() in the error path of dev_attach to make the behavior specific.

The difference between device_block_translation() and the previous dmar_remove_one_dev_info() is that, in the scalable mode, it is the RID2PASID entry instead of context entry being cleared. As a result, enabling PCI capabilities is moved up.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Stable-dep-of: da37dddcf4ca ("iommu/vt-d: Disable PCI ATS in legacy passthrough mode")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-08 | iommu/vt-d: Allocate pasid table in device probe path | Lu Baolu | 1 | -8/+12
[ Upstream commit ec62b4424174f41bdcedd08d12d7bed80088453d ]

Whether or not a domain is attached to the device, the pasid table should always be valid as long as it has been probed. This moves the pasid table allocation from the domain attaching device path to device probe path and frees it in the device release path.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20221118132451.114406-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Stable-dep-of: da37dddcf4ca ("iommu/vt-d: Disable PCI ATS in legacy passthrough mode")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-08 | iommu/vt-d: Omit devTLB invalidation requests when TES=0 | Lu Baolu | 1 | -0/+18
[ Upstream commit 0f5432a9b839847dcfe9fa369d72e3d646102ddf ]

The latest VT-d spec indicates that when remapping hardware is disabled (TES=0 in Global Status Register), upstream ATS Invalidation Completion requests are treated as UR (Unsupported Request).

Consequently, the spec recommends in section 4.3 Handling of Device-TLB Invalidations that software refrain from submitting any Device-TLB invalidation requests when address remapping hardware is disabled.

Verify address remapping hardware is enabled prior to submitting Device-TLB invalidation requests.

Fixes: 792fb43ce2c9 ("iommu/vt-d: Enable Intel IOMMU scalable mode by default")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20231114011036.70142-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
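
The check this commit adds can be modelled as a test of one status bit before submitting a request. This is a userspace sketch, not the driver code: the Global Status Register is a plain integer, the TES bit is assumed to be bit 31, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed position of TES (Translation Enable Status) in the status word. */
#define SKETCH_GSTS_TES (UINT32_C(1) << 31)

/* True if the modelled remapping hardware reports translation enabled. */
static bool sketch_translation_enabled(uint32_t gsts)
{
	return (gsts & SKETCH_GSTS_TES) != 0;
}

/*
 * Submit a device-TLB invalidation only while translation is enabled.
 * The counter stands in for the invalidation queue.
 */
static bool sketch_submit_devtlb_inv(uint32_t gsts, unsigned int *submitted)
{
	assert(submitted != NULL);
	if (!sketch_translation_enabled(gsts))
		return false; /* TES=0: the completion would be treated as UR */
	(*submitted)++;
	return true;
}
```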
2023-12-08 | iommu/vt-d: Add MTL to quirk list to skip TE disabling | Abdul Halim, Mohd Syazwan | 1 | -1/+1
commit 85b80fdffa867d75dfb9084a839e7949e29064e8 upstream.

The VT-d spec requires (10.4.4 Global Command Register, TE field) that:

Hardware implementations supporting DMA draining must drain any in-flight DMA read/write requests queued within the Root-Complex before switching address translation on or off and reflecting the status of the command through the TES field in the Global Status register.

Unfortunately, some integrated graphic devices fail to do so after some kind of power state transition. As a result, the system might get stuck in iommu_disable_translation(), waiting for the completion of the TE transition.

Add MTL to the quirk list for those devices and skip TE disabling if the quirk hits.

Fixes: b1012ca8dc4f ("iommu/vt-d: Skip TE disabling on quirky gfx dedicated iommu")
Cc: stable@vger.kernel.org
Signed-off-by: Abdul Halim, Mohd Syazwan <mohd.syazwan.abdul.halim@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20231116022324.30120-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-10 | iommu/mediatek: Fix share pgtable for iova over 4GB | Yong Wu | 1 | -5/+4
[ Upstream commit b07eba71a512eb196cbcc29765c29c8c29b11b59 ] In mt8192/mt8186, there is only one MM IOMMU that supports 16GB iova space, which is shared by display, vcodec and camera. These two SoCs use one pgtable but do not have the SHARE_PGTABLE flag, so the pgtable should also be shared in this case. In mtk_iommu_domain_finalise, the MM IOMMU always shares the pgtable, so remove the SHARE_PGTABLE flag check. The infra IOMMU always uses an independent pgtable. Fixes: cf69ef46dbd9 ("iommu/mediatek: Fix two IOMMU share pagetable issue") Reported-by: Laura Nao <laura.nao@collabora.com> Closes: https://lore.kernel.org/linux-iommu/20230818154156.314742-1-laura.nao@collabora.com/ Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Tested-by: Laura Nao <laura.nao@collabora.com> Link: https://lore.kernel.org/r/20230819081443.8333-1-yong.wu@mediatek.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10 | iommu/vt-d: Avoid memory allocation in iommu_suspend() | Zhang Rui | 2 | -17/+1
commit 59df44bfb0ca4c3ee1f1c3c5d0ee8e314844799e upstream. The iommu_suspend() syscore suspend callback is invoked with IRQs disabled. Allocating memory with the GFP_KERNEL flag may re-enable IRQs during the suspend callback, which can cause intermittent suspend/hibernation problems with the following kernel traces: Calling iommu_suspend+0x0/0x1d0 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 15 at kernel/time/timekeeping.c:868 ktime_get+0x9b/0xb0 ... CPU: 0 PID: 15 Comm: rcu_preempt Tainted: G U E 6.3-intel #r1 RIP: 0010:ktime_get+0x9b/0xb0 ... Call Trace: <IRQ> tick_sched_timer+0x22/0x90 ? __pfx_tick_sched_timer+0x10/0x10 __hrtimer_run_queues+0x111/0x2b0 hrtimer_interrupt+0xfa/0x230 __sysvec_apic_timer_interrupt+0x63/0x140 sysvec_apic_timer_interrupt+0x7b/0xa0 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1f/0x30 ... ------------[ cut here ]------------ Interrupts enabled after iommu_suspend+0x0/0x1d0 WARNING: CPU: 0 PID: 27420 at drivers/base/syscore.c:68 syscore_suspend+0x147/0x270 CPU: 0 PID: 27420 Comm: rtcwake Tainted: G U W E 6.3-intel #r1 RIP: 0010:syscore_suspend+0x147/0x270 ... Call Trace: <TASK> hibernation_snapshot+0x25b/0x670 hibernate+0xcd/0x390 state_store+0xcf/0xe0 kobj_attr_store+0x13/0x30 sysfs_kf_write+0x3f/0x50 kernfs_fop_write_iter+0x128/0x200 vfs_write+0x1fd/0x3c0 ksys_write+0x6f/0xf0 __x64_sys_write+0x1d/0x30 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Given that only 4 words of memory are needed, avoid the memory allocation in iommu_suspend().
CC: stable@kernel.org Fixes: 33e07157105e ("iommu/vt-d: Avoid GFP_ATOMIC where it is not needed") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Ooi, Chin Hao <chin.hao.ooi@intel.com> Link: https://lore.kernel.org/r/20230921093956.234692-1-rui.zhang@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230925120417.55977-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
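The idea of replacing a suspend-time allocation with storage that already exists can be sketched as follows. This is a userspace model under stated assumptions: `struct fake_iommu`, `fake_suspend` and `fake_resume` are hypothetical names, and the register array is a stand-in for real hardware registers. Only the count of four words comes from the commit message.

```c
#include <stdint.h>
#include <string.h>

#define SAVED_REGS 4  /* the commit message says four words are needed */

struct fake_iommu {
    uint32_t regs[SAVED_REGS];         /* stand-in for hardware registers */
    uint32_t saved_state[SAVED_REGS];  /* embedded buffer: nothing to allocate at suspend */
};

/* Runs with interrupts disabled in the real callback, so it must not allocate. */
static void fake_suspend(struct fake_iommu *iommu)
{
    memcpy(iommu->saved_state, iommu->regs, sizeof(iommu->saved_state));
}

static void fake_resume(struct fake_iommu *iommu)
{
    memcpy(iommu->regs, iommu->saved_state, sizeof(iommu->regs));
}
```

Because the buffer lives inside the structure, the suspend path has no failure mode and never enters the allocator.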
2023-10-10 | iommu/arm-smmu-v3: Avoid constructing invalid range commands | Robin Murphy | 1 | -5/+10
[ Upstream commit eb6c97647be227822c7ce23655482b05e348fba5 ] Although io-pgtable's non-leaf invalidations are always for full tables, I missed that SVA also uses non-leaf invalidations, while being at the mercy of whatever range the MMU notifier throws at it. This means it definitely wants the previous TTL fix as well, since it also doesn't know exactly which leaf level(s) may need invalidating, but it can also give us less-aligned ranges wherein certain corners may lead to building an invalid command where TTL, Num and Scale are all 0. It should be fine to handle this by over-invalidating an extra page, since falling back to a non-range command opens up a whole can of errata-flavoured worms. Fixes: 6833b8f2e199 ("iommu/arm-smmu-v3: Set TTL invalidation hint better") Reported-by: Rui Zhu <zhurui3@huawei.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/b99cfe71af2bd93a8a2930f20967fb2a4f7748dd.1694432734.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10 | iommu/arm-smmu-v3: Set TTL invalidation hint better | Robin Murphy | 1 | -2/+7
[ Upstream commit 6833b8f2e19945a41e4d5efd8c6d9f4cae9a5b7d ] When io-pgtable unmaps a whole table, rather than waste time walking it to find the leaf entries to invalidate exactly, it simply expects .tlb_flush_walk with nominal last-level granularity to invalidate any leaf entries at higher intermediate levels as well. This works fine with page-based invalidation, but with range commands we need to be careful with the TTL hint - unconditionally setting it based on the given level 3 granule means that an invalidation for a level 1 table would strictly not be required to affect level 2 block entries. It's easy to comply with the expected behaviour by simply not setting the TTL hint for non-leaf invalidations, so let's do that. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/b409d9a17c52dc0db51faee91d92737bb7975f5b.1685637456.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-06 | iommu/arm-smmu-v3: Fix soft lockup triggered by arm_smmu_mm_invalidate_range | Nicolin Chen | 1 | -3/+24
commit d5afb4b47e13161b3f33904d45110f9e6463bad6 upstream. When running an SVA case, the following soft lockup is triggered: -------------------------------------------------------------------- watchdog: BUG: soft lockup - CPU#244 stuck for 26s! pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50 lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50 sp : ffff8000d83ef290 x29: ffff8000d83ef290 x28: 000000003b9aca00 x27: 0000000000000000 x26: ffff8000d83ef3c0 x25: da86c0812194a0e8 x24: 0000000000000000 x23: 0000000000000040 x22: ffff8000d83ef340 x21: ffff0000c63980c0 x20: 0000000000000001 x19: ffff0000c6398080 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: ffff3000b4a3bbb0 x14: ffff3000b4a30888 x13: ffff3000b4a3cf60 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc08120e4d6bc x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000048cfa x5 : 0000000000000000 x4 : 0000000000000001 x3 : 000000000000000a x2 : 0000000080000000 x1 : 0000000000000000 x0 : 0000000000000001 Call trace: arm_smmu_cmdq_issue_cmdlist+0x178/0xa50 __arm_smmu_tlb_inv_range+0x118/0x254 arm_smmu_tlb_inv_range_asid+0x6c/0x130 arm_smmu_mm_invalidate_range+0xa0/0xa4 __mmu_notifier_invalidate_range_end+0x88/0x120 unmap_vmas+0x194/0x1e0 unmap_region+0xb4/0x144 do_mas_align_munmap+0x290/0x490 do_mas_munmap+0xbc/0x124 __vm_munmap+0xa8/0x19c __arm64_sys_munmap+0x28/0x50 invoke_syscall+0x78/0x11c el0_svc_common.constprop.0+0x58/0x1c0 do_el0_svc+0x34/0x60 el0_svc+0x2c/0xd4 el0t_64_sync_handler+0x114/0x140 el0t_64_sync+0x1a4/0x1a8 -------------------------------------------------------------------- The commit 06ff87bae8d3 ("arm64: mm: remove unused functions and variable protoypes") fixed a similar lockup on the CPU MMU side. Yet, it can occur to SMMU too since arm_smmu_mm_invalidate_range() is typically called next to MMU tlb flush function, e.g. 
tlb_flush_mmu_tlbonly { tlb_flush { __flush_tlb_range { // check MAX_TLBI_OPS } } mmu_notifier_invalidate_range { arm_smmu_mm_invalidate_range { // does not check MAX_TLBI_OPS } } } Clone a CMDQ_MAX_TLBI_OPS from the MAX_TLBI_OPS in tlbflush.h, since in an SVA case SMMU uses the CPU page table, so it makes sense to align with the tlbflush code. Then, replace per-page TLBI commands with a single per-asid TLBI command, if the request size hits this threshold. Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20230920052257.8615-1-nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
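The threshold decision described above can be sketched in userspace C. The `SKETCH_*` names and the helper are hypothetical, and both the 4 KiB page size and the threshold value (entries per page-table page) are assumptions made for illustration; the commit message only states that the threshold is cloned from MAX_TLBI_OPS.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)
/* Assumed threshold: number of 8-byte entries in one page-table page. */
#define SKETCH_MAX_TLBI_OPS ((size_t)1 << (SKETCH_PAGE_SHIFT - 3))

/*
 * Returns true when a request is large enough that issuing one per-ASID
 * invalidation is preferable to a long run of per-page commands.
 */
static bool use_per_asid_invalidation(size_t size)
{
    return size >= SKETCH_MAX_TLBI_OPS * SKETCH_PAGE_SIZE;
}
```

With these assumed values the cut-over point is 2 MiB: smaller ranges keep per-page commands, larger ones collapse into a single command.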
2023-09-13 | iommu/vt-d: Fix to flush cache of PASID directory table | Yanfei Xu | 1 | -1/+1
[ Upstream commit 8a3b8e63f8371c1247b7aa24ff9c5312f1a6948b ] Even if a PCI device doesn't support the PASID capability, a PASID table is mandatory for a PCI device in scalable mode. However, the cache of the PASID directory table is not flushed for these devices after the PASID table is allocated, because the "size" of the table is zero. Fix it by calculating the size from the page order. Found this when reading the code; no real problem has been encountered so far. Fixes: 194b3348bdbb ("iommu/vt-d: Fix PASID directory pointer coherency") Suggested-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yanfei Xu <yanfei.xu@intel.com> Link: https://lore.kernel.org/r/20230616081045.721873-1-yanfei.xu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 | iommu/qcom: Disable and reset context bank before programming | AngeloGioacchino Del Regno | 1 | -0/+7
[ Upstream commit 9f3fef23d9b5a858a6e6d5f478bb1b6b76265e76 ] Writing the new TTBRs, TCRs and MAIRs on a previously enabled context bank may trigger a context fault, resulting in firmware driven AP resets: change the domain initialization programming sequence to disable the context bank(s) and to also clear the related fault address (CB_FAR) and fault status (CB_FSR) registers before writing new values to TTBR0/1, TCR/TCR2, MAIR0/1. Fixes: 0ae349a0f33f ("iommu/qcom: Add qcom_iommu") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20230622092742.74819-4-angelogioacchino.delregno@collabora.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 | iommu/sprd: Add missing force_aperture | Jason Gunthorpe | 1 | -0/+1
[ Upstream commit d48a51286c698f7fe8efc688f23a532f4fe9a904 ] force_aperture was intended to be false only for GART drivers that have an identity translation outside the aperture. This does not describe sprd, so add the missing 'force_aperture = true'. Fixes: b23e4fc4e3fa ("iommu: add Unisoc IOMMU basic driver") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Chunyan Zhang <zhang.lyra@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 | iommu/mediatek: Fix two IOMMU share pagetable issue | Chengci.Xu | 1 | -8/+14
[ Upstream commit cf69ef46dbd980a0b1c956d668e066a73e0acd0f ] Prepare for mt8188 by fixing an issue where two IOMMU HWs share a pagetable. mt8188 has two MM IOMMU HWs: one is VPP-IOMMU, the other is VDO-IOMMU. Sharing the pagetable between the two MM IOMMU HWs does not work in this case: a) VPP-IOMMU probes first. b) VDO-IOMMU probes. c) The master for VDO-IOMMU probes (which means frstdata is vpp-iommu). d) A master in another domain probes, no matter whether it is vdo or vpp. A new pagetable is then still created in step d). The problem is that "frstdata->bank[0]->m4u_dom" was not initialized, so when d) is entered, a new one is still created. This patch adds a new variable "share_dom" for the shared pgtable case, which should help readability, and puts all the shared pgtable logic in mtk_iommu_domain_finalise. In mt8195, the master of VPP-IOMMU probes before VDO-IOMMU because of its dtsi node sequence, so the issue is not seen there. Fixes: 645b87c190c9 ("iommu/mediatek: Fix 2 HW sharing pgtable issue") Signed-off-by: Chengci.Xu <chengci.xu@mediatek.com> Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com> Link: https://lore.kernel.org/r/20230602090227.7264-3-yong.wu@mediatek.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 | iommu/mediatek: Remove unused "mapping" member from mtk_iommu_data | Yong Wu | 1 | -3/+0
[ Upstream commit 9ff894edd542618dad2fef538f8272c620a501db ] Remove an unused variable that is only for mtk_iommu_v1. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/20221018024258.19073-7-yong.wu@mediatek.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Stable-dep-of: cf69ef46dbd9 ("iommu/mediatek: Fix two IOMMU share pagetable issue") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 | iommu: rockchip: Fix directory table address encoding | Jonas Karlman | 1 | -38/+5
[ Upstream commit 6df63b7ebdaf5fcd75dceedf6967d0761e56eca1 ] The physical address of the directory table is currently encoded using the following bit layout for IOMMU v2: 31:12 - Address bit 31:0; 11:4 - Address bit 39:32. This is also the bit layout used by the vendor kernel. However, testing has shown that addresses to the directory/page tables and memory pages are all encoded using the same bit layout. IOMMU v1: 31:12 - Address bit 31:0. IOMMU v2: 31:12 - Address bit 31:0; 11:8 - Address bit 35:32; 7:4 - Address bit 39:36. Change to use the mk_dtentries ops to encode the directory table address correctly. The value written to DTE_ADDR may include the valid bit set, a bit that is ignored, and the DTE_ADDR reg reads it back as 0. This also updates the bit layout comment for the page address and the number of nybbles that are read back in the DTE_ADDR comment. These changes render the dte_addr_phys and dma_addr_dte ops unused, so they are removed. Fixes: 227014b33f62 ("iommu: rockchip: Add internal ops to handle variants") Fixes: c55356c534aa ("iommu: rockchip: Add support for iommu v2") Fixes: c987b65a574f ("iommu/rockchip: Fix physical address decoding") Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20230617182540.3091374-2-jonas@kwiboo.se Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
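The IOMMU v2 layout listed above can be illustrated with a small encode/decode pair. This is a sketch under assumptions: the function names `rk_v2_encode` and `rk_v2_decode` are hypothetical, the input is taken to be a 4 KiB-aligned 40-bit physical address, and field 31:12 is taken to carry address bits 31:12.

```c
#include <stdint.h>

/* Pack a 4 KiB-aligned, 40-bit physical address into the 32-bit v2 layout. */
static uint32_t rk_v2_encode(uint64_t paddr)
{
    uint32_t lo  = (uint32_t)(paddr & 0xFFFFF000u);        /* address bits 31:12 stay in place */
    uint32_t mid = (uint32_t)((paddr >> 32) & 0xF) << 8;   /* address bits 35:32 go to field 11:8 */
    uint32_t hi  = (uint32_t)((paddr >> 36) & 0xF) << 4;   /* address bits 39:36 go to field 7:4 */

    return lo | mid | hi;
}

/* Inverse of rk_v2_encode(): rebuild the 40-bit physical address. */
static uint64_t rk_v2_decode(uint32_t enc)
{
    uint64_t lo  = enc & 0xFFFFF000u;
    uint64_t mid = (uint64_t)((enc >> 8) & 0xF) << 32;
    uint64_t hi  = (uint64_t)((enc >> 4) & 0xF) << 36;

    return lo | mid | hi;
}
```

Note how the two upper nibbles are swapped relative to a naive "bits 39:32 in field 11:4" reading, which is the difference the commit message describes.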
2023-09-13 | iommu/amd/iommu_v2: Fix pasid_state refcount dec hit 0 warning on pasid unbind | Daniel Marcovitch | 1 | -2/+2
[ Upstream commit 534103bcd52ca9c1fecbc70e717b4a538dc4ded8 ] When unbinding pasid - a race condition exists vs outstanding page faults. To prevent this, the pasid_state object contains a refcount. * set to 1 on pasid bind * incremented on each ppr notification start * decremented on each ppr notification done * decremented on pasid unbind Since refcount_dec assumes that refcount will never reach 0: the current implementation causes the following to be invoked on pasid unbind: REFCOUNT_WARN("decrement hit 0; leaking memory") Fix this issue by changing refcount_dec to refcount_dec_and_test to explicitly handle refcount=1. Fixes: 8bc54824da4e ("iommu/amd: Convert from atomic_t to refcount_t on pasid_state->count") Signed-off-by: Daniel Marcovitch <dmarcovitch@nvidia.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230609105146.7773-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
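The difference between a plain decrement and a decrement-and-test can be modelled in userspace with C11 atomics. This is only a sketch: `struct pasid_state_sketch`, `sketch_dec_and_test` and `sketch_put` are hypothetical names, and the `freed` flag is a stand-in for whatever the driver does once the last reference is gone.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct pasid_state_sketch {
    atomic_int count;  /* 1 after bind, +1 per notification in flight */
    bool freed;        /* stand-in for releasing the object */
};

/* Model of dec-and-test: true when this call dropped the last reference. */
static bool sketch_dec_and_test(atomic_int *count)
{
    return atomic_fetch_sub(count, 1) == 1;
}

/* Drop one reference and handle the "reached zero" case explicitly. */
static void sketch_put(struct pasid_state_sketch *s)
{
    if (sketch_dec_and_test(&s->count))
        s->freed = true;
}
```

The point is that reaching zero becomes an expected, handled outcome of the call instead of a condition the decrement helper warns about.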
2023-08-23 | iommu/amd: Introduce Disable IRTE Caching Support | Suravee Suthikulpanit | 2 | -0/+40
[ Upstream commit 66419036f68a838c00cbccacd6cb2e99da6e5710 ] An Interrupt Remapping Table (IRT) stores interrupt remapping configuration for each device. In normal operation, the AMD IOMMU caches the table to optimize subsequent data accesses. This requires the IOMMU driver to invalidate the IRT whenever it updates the table. The invalidation process includes issuing an INVALIDATE_INTERRUPT_TABLE command followed by a COMPLETION_WAIT command. However, there are cases in which the IRT is updated at a high rate. For example, for IOMMU AVIC, the IRTE[IsRun] bit is updated on every vcpu scheduling (i.e. amd_iommu_update_ga()). On systems with a large number of vcpus and VFIO PCI pass-through devices, the invalidation process could potentially become a performance bottleneck. Introduce a new kernel boot option: amd_iommu=irtcachedis which disables IRTE caching by setting the IRTCachedis bit in each IOMMU Control register, and bypasses the IRT invalidation process. Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Co-developed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230530141137.14376-4-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11 | iommu/arm-smmu-v3: Document nesting-related errata | Robin Murphy | 1 | -0/+5
commit 0bfbfc526c70606bf0fad302e4821087cbecfaf4 upstream Both MMU-600 and MMU-700 have similar errata around TLB invalidation while both stages of translation are active, which will need some consideration once nesting support is implemented. For now, though, it's very easy to make our implicit lack of nesting support explicit for those cases, so they're less likely to be missed in future. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/696da78d32bb4491f898f11b0bb4d850a8aa7c6a.1683731256.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11 | iommu/arm-smmu-v3: Add explicit feature for nesting | Robin Murphy | 2 | -0/+5
commit 1d9777b9f3d55b4b6faf186ba4f1d6fb560c0523 upstream In certain cases we may want to refuse to allow nested translation even when both stages are implemented, so let's add an explicit feature for nesting support which we can control in its own right. For now this merely serves as documentation, but it means a nice convenient check will be ready and waiting for the future nesting code. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/136c3f4a3a84cc14a5a1978ace57dfd3ed67b688.1683731256.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11 | iommu/arm-smmu-v3: Document MMU-700 erratum 2812531 | Robin Murphy | 2 | -0/+13
commit 309a15cb16bb075da1c99d46fb457db6a1a2669e upstream To work around MMU-700 erratum 2812531 we need to ensure that certain sequences of commands cannot be issued without an intervening sync. In practice this falls out of our current command-batching machinery anyway - each batch only contains a single type of invalidation command, and ends with a sync. The only exception is when a batch is sufficiently large to need issuing across multiple command queue slots, wherein the earlier slots will not contain a sync and thus may in theory interleave with another batch being issued in parallel to create an affected sequence across the slot boundary. Since MMU-700 supports range invalidate commands and thus we will prefer to use them (which also happens to avoid conditions for other errata), I'm not entirely sure it's even possible for a single high-level invalidate call to generate a batch of more than 63 commands, but for the sake of robustness and documentation, wire up an option to enforce that a sync is always inserted for every slot issued. The other aspect is that the relative order of DVM commands cannot be controlled, so DVM cannot be used. Again that is already the status quo, but since we have at least defined ARM_SMMU_FEAT_BTM, we can explicitly disable it for documentation purposes even if it's not wired up anywhere yet. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/330221cdfd0003cd51b6c04e7ff3566741ad8374.1683731256.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11 | iommu/arm-smmu-v3: Work around MMU-600 erratum 1076982 | Robin Murphy | 2 | -0/+35
commit f322e8af35c7f23a8c08b595c38d6c855b2d836f upstream MMU-600 versions prior to r1p0 fail to correctly generate a WFE wakeup event when the command queue transitions from full to non-full. We can easily work around this by simply hiding the SEV capability such that we fall back to polling for space in the queue - since MMU-600 implements MSIs we wouldn't expect to need SEV for sync completion either, so this should have little to no impact. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/08adbe3d01024d8382a478325f73b56851f76e49.1683731256.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19 | iommu/virtio: Return size mapped for a detached domain | Jean-Philippe Brucker | 1 | -16/+17
[ Upstream commit 7061b6af34686e7e2364b7240cfb061293218f2d ] When map() is called on a detached domain, the domain does not exist in the device so we do not send a MAP request, but we do update the internal mapping tree, to be replayed on the next attach. Since this constitutes a successful iommu_map() call, return *mapped in this case too. Fixes: 7e62edd7a33a ("iommu/virtio: Add map/unmap_pages() callbacks implementation") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230515113946.1017624-3-jean-philippe@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
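The behaviour described above, where a map on a detached domain still counts as success and still reports the mapped size, can be modelled as follows. All names here (`struct vdom_sketch`, `map_pages_sketch`) are hypothetical, and the counters are stand-ins for the driver's mapping tree and its MAP requests.

```c
#include <stdbool.h>
#include <stddef.h>

struct vdom_sketch {
    bool attached;          /* does the domain exist in the device yet? */
    size_t tree_bytes;      /* stand-in for the internal mapping tree */
    int map_requests_sent;  /* stand-in for MAP requests sent to the device */
};

static int map_pages_sketch(struct vdom_sketch *d, size_t size, size_t *mapped)
{
    d->tree_bytes += size;       /* always recorded, to be replayed on attach */

    if (d->attached)
        d->map_requests_sent++;  /* only an attached domain reaches the device */

    if (mapped)
        *mapped = size;          /* success is reported in both cases */

    return 0;
}
```

A caller that checks the returned size therefore sees the same result whether or not the domain is currently attached.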
2023-07-19 | iommu/virtio: Detach domain on endpoint release | Jean-Philippe Brucker | 1 | -0/+24
[ Upstream commit 809d0810e3520da669d231303608cdf5fe5c1a70 ] When an endpoint is released, for example a PCIe VF being destroyed or a function hot-unplugged, it should be detached from its domain. Send a DETACH request. Fixes: edcd69ab9a32 ("iommu: Add virtio-iommu driver") Reported-by: Akihiko Odaki <akihiko.odaki@daynix.com> Link: https://lore.kernel.org/all/15bf1b00-3aa0-973a-3a86-3fa5c4d41d2c@daynix.com/ Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Tested-by: Akihiko Odaki <akihiko.odaki@daynix.com> Link: https://lore.kernel.org/r/20230515113946.1017624-2-jean-philippe@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-01 | mm: always expand the stack with the mmap write lock held | Linus Torvalds | 2 | -3/+3
commit 8d7071af890768438c14db6172cc8f9f4d04e184 upstream This finishes the job of always holding the mmap write lock when extending the user stack vma, and removes the 'write_locked' argument from the vm helper functions again. For some cases, we just avoid expanding the stack at all: drivers and page pinning really shouldn't be extending any stacks. Let's see if any strange users really wanted that. It's worth noting that architectures that weren't converted to the new lock_mm_and_find_vma() helper function are left using the legacy "expand_stack()" function, but it has been changed to drop the mmap_lock and take it for writing while expanding the vma. This makes it fairly straightforward to convert the remaining architectures. As a result of dropping and re-taking the lock, the calling conventions for this function have also changed, since the old vma may no longer be valid. So it will now return the new vma if successful, and NULL - and the lock dropped - if the area could not be extended. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [6.1: Patch drivers/iommu/io-pgfault.c instead] Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-09 | iommu/amd/pgtbl_v2: Fix domain max address | Vasant Hegde | 1 | -1/+10
commit 11c439a19466e7feaccdbce148a75372fddaf4e9 upstream. IOMMU v2 page table supports 4 level (47 bit) or 5 level (56 bit) virtual address space. Current code assumes it can support 64bit IOVA address space. If IOVA allocator allocates virtual address > 47/56 bit (depending on page table level) then it will do wrong mapping and cause invalid translation. Hence adjust aperture size to use max address supported by the page table. Reported-by: Jerry Snitselaar <jsnitsel@redhat.com> Fixes: aaac38f61487 ("iommu/amd: Initial support for AMD IOMMU v2 page table") Cc: <Stable@vger.kernel.org> # v6.0+ Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230518054351.9626-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de> [ Modified to work with "V2 with 4 level page table" only - Vasant ] Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
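The aperture limit implied by a given address width can be computed as in the sketch below. The helper name `max_iova_for_bits` is hypothetical; the 47-bit and 56-bit widths are the ones quoted in the commit message.

```c
#include <stdint.h>

/* Highest IOVA reachable with the given number of virtual-address bits. */
static uint64_t max_iova_for_bits(unsigned int va_bits)
{
    /* Shifting a 64-bit value by 64 is undefined in C, so handle it apart. */
    if (va_bits >= 64)
        return UINT64_MAX;

    return (UINT64_C(1) << va_bits) - 1;
}
```

Clamping the domain's aperture end to this value keeps the IOVA allocator from handing out addresses the page table cannot translate.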
2023-06-09 | iommu/amd: Fix domain flush size when syncing iotlb | Jon Pan-Doh | 1 | -1/+1
commit 2212fc2acf3f6ee690ea36506fb882a19d1bfcab upstream. When running on an AMD vIOMMU, we observed multiple invalidations (of decreasing power of 2 aligned sizes) when unmapping a single page. Domain flush takes gather bounds (end-start) as size param. However, gather->end is defined as the last inclusive address (start + size - 1). This leads to an off by 1 error. With this patch, verified that 1 invalidation occurs when unmapping a single page. Fixes: a270be1b3fdf ("iommu/amd: Use only natural aligned flushes in a VM") Cc: stable@vger.kernel.org # >= 5.15 Signed-off-by: Jon Pan-Doh <pandoh@google.com> Tested-by: Sudheer Dantuluri <dantuluris@google.com> Suggested-by: Gary Zibrat <gzibrat@google.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Acked-by: Nadav Amit <namit@vmware.com> Link: https://lore.kernel.org/r/20230426203256.237116-1-pandoh@google.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
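The off-by-one follows directly from the inclusive end address. A minimal sketch, assuming a hypothetical `struct gather_sketch` that mirrors only the two fields mentioned in the commit message:

```c
#include <stddef.h>

struct gather_sketch {
    unsigned long start;
    unsigned long end;   /* inclusive: start + size - 1 */
};

/* Size of the gathered range; the +1 accounts for the inclusive end. */
static size_t gather_size(const struct gather_sketch *g)
{
    return (size_t)(g->end - g->start + 1);
}
```

For a single 4 KiB page starting at 0x1000, `end` is 0x1fff, so `end - start` alone would give 0xfff and force the flush to be split into several smaller power-of-two pieces.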
2023-06-09 | iommu/mediatek: Flush IOTLB completely only if domain has been attached | Chen-Yu Tsai | 1 | -1/+2
[ Upstream commit b3fc95709c54ffbe80f16801e0a792a4d2b3d55e ] If an IOMMU domain was never attached, it lacks any linkage to the actual IOMMU hardware. Attempting to do flush_iotlb_all() on it will result in a NULL pointer dereference. This seems to happen after the recent IOMMU core rework in v6.4-rc1. Unable to handle kernel read from unreadable memory at virtual address 0000000000000018 Call trace: mtk_iommu_flush_iotlb_all+0x20/0x80 iommu_create_device_direct_mappings.part.0+0x13c/0x230 iommu_setup_default_domain+0x29c/0x4d0 iommu_probe_device+0x12c/0x190 of_iommu_configure+0x140/0x208 of_dma_configure_id+0x19c/0x3c0 platform_dma_configure+0x38/0x88 really_probe+0x78/0x2c0 Check if the "bank" field has been filled in before actually attempting the IOTLB flush to avoid it. The IOTLB is also flushed when the device comes out of runtime suspend, so it should have a clean initial state. Fixes: 08500c43d4f7 ("iommu/mediatek: Adjust the structure") Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20230526085402.394239-1-wenst@chromium.org Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
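The guard described above amounts to checking a pointer that is only filled in on first attach. A userspace sketch with hypothetical types (`struct bank_sketch`, `struct domain_sketch`) and a counter standing in for the hardware flush:

```c
#include <stdbool.h>
#include <stddef.h>

struct bank_sketch {
    int flush_count;             /* stand-in for the hardware flush */
};

struct domain_sketch {
    struct bank_sketch *bank;    /* NULL until the domain is first attached */
};

/* Returns true if a flush was issued, false if there was nothing to flush. */
static bool flush_all_if_attached(struct domain_sketch *dom)
{
    if (!dom->bank)
        return false;            /* never attached: no hardware linkage yet */

    dom->bank->flush_count++;
    return true;
}
```

Skipping the flush is safe in this model because an unattached domain has no hardware state that could be stale.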
2023-06-09 | iommu/amd: Fix up merge conflict resolution | Jerry Snitselaar | 1 | -3/+0
[ Upstream commit 8ec4e2befef10c7679cd59251956a428e783c0b5 ] Merge commit e17c6debd4b2 ("Merge branches 'arm/mediatek', 'arm/msm', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'x86/vt-d' and 'x86/amd' into next") added amd_iommu_init_devices, amd_iommu_uninit_devices, and amd_iommu_init_notifier back to drivers/iommu/amd/amd_iommu.h. The only references to them are here, so clean them up. Fixes: e17c6debd4b2 ("Merge branches 'arm/mediatek', 'arm/msm', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'x86/vt-d' and 'x86/amd' into next") Cc: Joerg Roedel <joro@8bytes.org> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230420192013.733331-1-jsnitsel@redhat.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09 | iommu/amd: Handle GALog overflows | Joao Martins | 3 | -1/+33
[ Upstream commit af47b0a24058e56e983881993752f88288ca6511 ] GALog exists to propagate interrupts into all vCPUs in the system when interrupts are marked as non running (e.g. when vCPUs aren't running). A GALog overflow happens when there is no space in the log to record the GATag of the interrupt. So when the GALOverflow condition happens, the GALog queue is processed and the GALog is restarted, as the IOMMU manual indicates in section "2.7.4 Guest Virtual APIC Log Restart Procedure": | * Wait until MMIO Offset 2020h[GALogRun]=0b so that all request | entries are completed as circumstances allow. GALogRun must be 0b to | modify the guest virtual APIC log registers safely. | * Write MMIO Offset 0018h[GALogEn]=0b. | * As necessary, change the following values (e.g., to relocate or | resize the guest virtual APIC event log): | - the Guest Virtual APIC Log Base Address Register | [MMIO Offset 00E0h], | - the Guest Virtual APIC Log Head Pointer Register | [MMIO Offset 2040h][GALogHead], and | - the Guest Virtual APIC Log Tail Pointer Register | [MMIO Offset 2048h][GALogTail]. | * Write MMIO Offset 2020h[GALOverflow] = 1b to clear the bit (W1C). | * Write MMIO Offset 0018h[GALogEn] = 1b, and either set | MMIO Offset 0018h[GAIntEn] to enable the GA log interrupt or clear | the bit to disable it. Failing to handle the GALog overflow means that none of the VFs (in any guest) will work with IOMMU AVIC, forcing the user to power cycle the host. When handling the event it resumes the GALog without resizing, much like how it is done in the event handler overflow. The [MMIO Offset 2020h][GALOverflow] bit might be set in the status register without the [MMIO Offset 2020h][GAInt] bit, so when deciding to poll for GA events (to clear space in the galog), also check the overflow bit.
[suravee: Check for GAOverflow without GAInt, toggle CONTROL_GAINT_EN] Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20230419201154.83880-3-joao.m.martins@oracle.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Stable-dep-of: 8ec4e2befef1 ("iommu/amd: Fix up merge conflict resolution") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09 | iommu/amd: Don't block updates to GATag if guest mode is on | Joao Martins | 1 | -2/+1
[ Upstream commit ed8a2f4ddef2eaaf864ab1efbbca9788187036ab ] On KVM GSI routing table updates, especially those where they have vIOMMUs with interrupt remapping enabled (to boot >255vcpus setups without relying on KVM_FEATURE_MSI_EXT_DEST_ID), a VMM may update the backing VF MSIs with a new VCPU affinity. On AMD with AVIC enabled, the new vcpu affinity info is updated via: avic_pi_update_irte() irq_set_vcpu_affinity() amd_ir_set_vcpu_affinity() amd_iommu_{de}activate_guest_mode() Where the IRTE[GATag] is updated with the new vcpu affinity. The GATag contains VM ID and VCPU ID, and is used by IOMMU hardware to signal KVM (via GALog) when an interrupt cannot be delivered because the vCPU is in a blocking state. The issue is that amd_iommu_activate_guest_mode() will essentially only change IRTE fields on transitions from non-guest-mode to guest-mode and otherwise returns *with no changes to IRTE* on already configured guest-mode interrupts. To the guest this means that the VF interrupts remain affined to the vCPU they were first configured with, and the guest will be unable to issue VF interrupts and will receive messages like this from spurious interrupts (e.g. from waking the wrong vCPU in GALog): [ 167.759472] __common_interrupt: 3.34 No irq handler for vector [ 230.680927] mlx5_core 0000:00:02.0: mlx5_cmd_eq_recover:247:(pid 3122): Recovered 1 EQEs on cmd_eq [ 230.681799] mlx5_core 0000:00:02.0: wait_func_handle_exec_timeout:1113:(pid 3122): cmd[0]: CREATE_CQ(0x400) recovered after timeout [ 230.683266] __common_interrupt: 3.34 No irq handler for vector Given the fact that amd_ir_set_vcpu_affinity() uses amd_iommu_activate_guest_mode() underneath, it essentially means that VCPU affinity changes of IRTEs are nops. Fix it by dropping the check for guest-mode at amd_iommu_activate_guest_mode().
Same thing is applicable to amd_iommu_deactivate_guest_mode() although, even if the IRTE doesn't change underlying DestID on the host, the VFIO IRQ handler will still be able to poke at the right guest-vCPU. Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230419201154.83880-2-joao.m.martins@oracle.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
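The effect of the early return described in this commit can be illustrated with a toy model. This is only a sketch and not the driver code: `struct model_irte`, `model_activate_old()` and `model_activate_new()` are hypothetical names, and the real IRTE carries many more fields than the two modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for a guest-mode IRTE. */
struct model_irte {
	bool guest_mode;  /* true once the entry targets a vCPU */
	uint32_t ga_tag;  /* encodes VM ID and vCPU ID */
};

/* Behaviour before the fix, as the commit describes it:
 * an entry already in guest mode is left untouched. */
int model_activate_old(struct model_irte *e, uint32_t new_tag)
{
	if (e->guest_mode)
		return 0;  /* the affinity update is silently dropped */
	e->guest_mode = true;
	e->ga_tag = new_tag;
	return 0;
}

/* Behaviour after the fix: the guest-mode check is gone,
 * so the tag is rewritten on every call. */
int model_activate_new(struct model_irte *e, uint32_t new_tag)
{
	e->guest_mode = true;
	e->ga_tag = new_tag;
	return 0;
}
```

Calling `model_activate_old()` on an entry that is already in guest mode leaves `ga_tag` at its previous value, which corresponds to the "affinity changes are nops" symptom; `model_activate_new()` always stores the new tag.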
2023-06-09  iommu/rockchip: Fix unwind goto issue  Chao Wang  1  -6/+8
[ Upstream commit ec014683c564fb74fc68e8f5e84691d3b3839d24 ]

Smatch complains that:

drivers/iommu/rockchip-iommu.c:1306 rk_iommu_probe() warn: missing unwind goto?

The rk_iommu_probe function, after obtaining the irq value through platform_get_irq, directly returns an error if the returned value is negative, without releasing any resources.

Fix this by adding a new error handling label "err_pm_disable" and use a goto statement to redirect to the error handling process. In order to preserve the original semantics, set err to the value of irq.

Fixes: 1aa55ca9b14a ("iommu/rockchip: Move irq request past pm_runtime_enable")
Signed-off-by: Chao Wang <D202280639@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20230417030421.2777-1-D202280639@hust.edu.cn
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
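The unwind pattern this commit describes can be sketched in isolation. The code below is not the rockchip driver: `struct model_dev` and the `model_*()` helpers are hypothetical stand-ins for the device, runtime-PM enable/disable, and the IRQ lookup. Only the label name `err_pm_disable` and the "set err to the value of irq" step come from the commit message.

```c
#include <stdbool.h>

/* Hypothetical device state for the sketch. */
struct model_dev {
	bool pm_enabled;  /* tracks whether runtime PM is currently enabled */
	int irq_result;   /* value the fake IRQ lookup will return */
};

void model_pm_enable(struct model_dev *d)  { d->pm_enabled = true; }
void model_pm_disable(struct model_dev *d) { d->pm_enabled = false; }
int model_get_irq(const struct model_dev *d) { return d->irq_result; }

int model_probe(struct model_dev *d)
{
	int err, irq;

	model_pm_enable(d);

	irq = model_get_irq(d);
	if (irq < 0) {
		err = irq;            /* keep the original error value */
		goto err_pm_disable;  /* unwind instead of returning directly */
	}

	return 0;

err_pm_disable:
	model_pm_disable(d);
	return err;
}
```

With a negative `irq_result`, `model_probe()` returns that same negative value and leaves `pm_enabled` false, whereas a bare `return irq;` at that point would leave the earlier enable step undone.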
2023-05-11  iommu/amd: Set page size bitmap during V2 domain allocation  Jerry Snitselaar  2  -6/+9
[ Upstream commit 8f880d19e6ad645a4b8066d5ff091c980b3231e7 ]

With the addition of the V2 page table support, the domain page size bitmap needs to be set prior to iommu core setting up direct mappings for reserved regions. When reserved regions are mapped, if this is not done, it will be looking at the V1 page size bitmap when determining the page size to use in iommu_pgsize(). When it gets into the actual amd mapping code, a check to see if the page size is supported can fail, because at that point it is checking it against the V2 page size bitmap, which only supports 4K, 2M, and 1G.

Add a check to __iommu_domain_alloc() to not override the bitmap if it was already set by the iommu ops domain_alloc() code path.

Cc: Vasant Hegde <vasant.hegde@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Joerg Roedel <joro@8bytes.org>
Fixes: 4db6c41f0946 ("iommu/amd: Add support for using AMD IOMMU v2 page table for DMA-API")
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20230404072742.1895252-1-jsnitsel@redhat.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
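The "do not override the bitmap if it was already set" rule can be sketched as follows. This is a simplified model rather than the code in __iommu_domain_alloc(): `struct model_domain`, `model_finish_domain_alloc()` and the `MODEL_*` constants are hypothetical, and the V1-style default value is an illustrative assumption. The 4K, 2M and 1G sizes for V2 are taken from the commit message.

```c
#include <stdint.h>

#define MODEL_SZ_4K (1ULL << 12)
#define MODEL_SZ_2M (1ULL << 21)
#define MODEL_SZ_1G (1ULL << 30)

/* V2 page table sizes named in the commit message. */
#define MODEL_PGSIZES_V2 (MODEL_SZ_4K | MODEL_SZ_2M | MODEL_SZ_1G)

/* Assumed V1-style default: every power-of-two size from 4K upwards. */
#define MODEL_PGSIZES_DEFAULT (~0xFFFULL)

/* Hypothetical, minimal domain: only the page size bitmap is modelled. */
struct model_domain {
	uint64_t pgsize_bitmap;
};

/* Core-side step after the driver's allocation callback has run:
 * fall back to the default only if the driver left the bitmap empty. */
void model_finish_domain_alloc(struct model_domain *dom, uint64_t default_bitmap)
{
	if (!dom->pgsize_bitmap)
		dom->pgsize_bitmap = default_bitmap;
}
```

A domain whose allocation path already stored `MODEL_PGSIZES_V2` keeps that value, so a later page size choice cannot pick, for example, 16K, which the default bitmap would allow but the V2 bitmap does not.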
2023-05-11  iommu/mediatek: Set dma_mask for PGTABLE_PA_35_EN  Yong Wu  1  -0/+8
[ Upstream commit f045e9df6537175d02565f21616ac1a9dd59b61c ]

When we enable PGTABLE_PA_35_EN, the PA for pgtable may be 35 bits. Thus add dma_mask for it.

Fixes: 301c3ca12576 ("iommu/mediatek: Allow page table PA up to 35bit")
Signed-off-by: Chengci.Xu <chengci.xu@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20230316101445.12443-1-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
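Why a 35-bit physical address needs a matching mask can be shown with a little arithmetic. The sketch below reproduces the arithmetic of the kernel's DMA_BIT_MASK() macro under the hypothetical name `MODEL_DMA_BIT_MASK`; `model_addr_fits()` is also hypothetical, and the 32-bit mask used for comparison is an assumption, since the commit message does not say which mask applied before the change.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as the kernel's DMA_BIT_MASK(n): the low n bits set. */
#define MODEL_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* True if every set bit of the address lies inside the mask. */
bool model_addr_fits(uint64_t pa, uint64_t mask)
{
	return (pa & ~mask) == 0;
}
```

For a page table located at physical address `1ULL << 34`, `model_addr_fits()` is false against `MODEL_DMA_BIT_MASK(32)` and true against `MODEL_DMA_BIT_MASK(35)`.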
2023-05-11  iommu/amd: Fix "Guest Virtual APIC Table Root Pointer" configuration in IRTE  Kishon Vijay Abraham I  1  -2/+2
commit ccc62b827775915a9b82db42a29813d04f92df7a upstream.

Commit b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code"), while refactoring guest virtual APIC activation/de-activation code, stored information for activate/de-activate in "struct amd_ir_data". It used a 32-bit integer data type for storing the "Guest Virtual APIC Table Root Pointer" (ga_root_ptr), though the "ga_root_ptr" is actually a 40-bit field in the IRTE (Interrupt Remapping Table Entry).

This causes interrupts from PCIe devices to not reach the guest in the case of PCIe passthrough with SME (Secure Memory Encryption) enabled, as the _SME_ bit in the "ga_root_ptr" is lost before writing it to the IRTE.

Fix it by using a 64-bit data type for storing the "ga_root_ptr". While at it, also change the data type of "ga_tag" to u32 in order to match the IOMMU spec.

Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code")
Cc: stable@vger.kernel.org # v5.4+
Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com>
Link: https://lore.kernel.org/r/20230405130317.9351-1-kvijayab@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
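The truncation this commit describes can be reproduced with plain integer arithmetic. The sketch below rests on several assumptions that are not in the commit message: that the field stores a 4K-aligned address shifted right by 12, and that the encryption bit sits at physical address bit 47 (the real position is platform-specific). All `model_*` and `MODEL_*` names are hypothetical.

```c
#include <stdint.h>

/* Assumed position of the memory-encryption bit in a physical address. */
#define MODEL_ENC_BIT (1ULL << 47)

/* Mask for a 40-bit field, the width the commit message gives for ga_root_ptr. */
#define MODEL_FIELD_MASK ((1ULL << 40) - 1)

/* Pre-fix storage: the shifted address is squeezed into 32 bits,
 * so any address bit at or above bit 44 is discarded. */
uint32_t model_root_ptr_u32(uint64_t pa)
{
	return (uint32_t)(pa >> 12);
}

/* Post-fix storage: 64-bit intermediate, all 40 field bits survive. */
uint64_t model_root_ptr_u64(uint64_t pa)
{
	return (pa >> 12) & MODEL_FIELD_MASK;
}
```

For an address such as `0x12345000 | MODEL_ENC_BIT`, shifting the 32-bit result back left by 12 yields `0x12345000` with the encryption bit gone, while the 64-bit variant round-trips the full address.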