summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
AgeCommit message (Collapse)AuthorFilesLines
2024-03-06drm/amdgpu: remove unused codeJesse Zhang1-45/+15
Remove the unused function - amdgpu_vm_pt_is_root_clean and remove the impossible condition v1: entries == 0 is not possible any more, so this condition could probably be removed (Felix) Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Suggested-by:Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-04drm/amdgpu: change vm->task_info handlingShashank Sharma1-1/+1
This patch changes the handling and lifecycle of vm->task_info object. The major changes are: - vm->task_info is a dynamically allocated ptr now, and its uasge is reference counted. - introducing two new helper funcs for task_info lifecycle management - amdgpu_vm_get_task_info: reference counts up task_info before returning this info - amdgpu_vm_put_task_info: reference counts down task_info - last put to task_info() frees task_info from the vm. This patch also does logistical changes required for existing usage of vm->task_info. V2: Do not block all the prints when task_info not found (Felix) V3: Fixed review comments from Felix - Fix wrong indentation - No debug message for -ENOMEM - Add NULL check for task_info - Do not duplicate the debug messages (ti vs no ti) - Get first reference of task_info in vm_init(), put last in vm_fini() V4: Fixed review comments from Felix - fix double reference increment in create_task_info - change amdgpu_vm_get_task_info_pasid - additional changes in amdgpu_gem.c while porting Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-04Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for ↵Jesse Zhang1-55/+1
Raven fix the issue: "amdgpu: Failed to create process VM object". [Why]when amdgpu initialized, seq64 do mampping and update bo mapping in vm page table. But when clifo run. It also initializes a vm for a process device through the function kfd_process_device_init_vm and ensure the root PD is clean through the function amdgpu_vm_pt_is_root_clean. So they have a conflict, and clinfo always failed. v1: - remove all the pte_supports_ats stuff from the amdgpu_vm code (Felix) Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-14drm/amdgpu: fix tear down order in amdgpu_vm_pt_freeChristian König1-1/+2
When freeing PD/PT with shadows it can happen that the shadow destruction races with detaching the PD/PT from the VM causing a NULL pointer dereference in the invalidation code. Fix this by detaching the the PD/PT from the VM first and then freeing the shadow instead. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: https://gitlab.freedesktop.org/drm/amd/-/issues/2867 Cc: <stable@vger.kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-27drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systemsDavid Francis1-1/+1
On gfx943 APU, EXT_COHERENT should give MTYPE_CC for local and MTYPE_UC for nonlocal memory. On NUMA systems, local memory gets the local mtype, set by an override callback. If EXT_COHERENT is set, memory will be set as MTYPE_UC by default, with local memory MTYPE_CC. Add an option in the override function for this case, and add a check to ensure it is not used on UNCACHED memory. V2: Combined APU and NUMA code into one patch V3: Fixed a potential nullptr in amdgpu_vm_bo_update Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-05drm/amdgpu: ratelimited override pte flags messagesPhilip Yang1-8/+2
Use ratelimited version of dev_dbg to avoid flooding dmesg log. No functional change. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-21drm/amdgpu: Fix one kernel-doc commentYang Li1-1/+1
Use colon to separate parameter name from their specific meaning. silence the warning: drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c:793: warning: Function parameter or member 'adev' not described in 'amdgpu_vm_pte_update_noretry_flags' Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amdgpu/vm: use the same xcp_id from root PDGuchun Chen1-1/+2
Other PDs/PTs allocation should just use the same xcp_id as that stored in root PD. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-18drm/amdgpu: fix slab-out-of-bounds issue in amdgpu_vm_pt_createGuchun Chen1-5/+6
Recent code set xcp_id stored from file private data when opening device to amdgpu bo for accounting memory usage etc, but not all VMs are attached to this fpriv structure like the vm cases in amdgpu_mes_self_test, otherwise, KASAN will complain below out of bound access. And more importantly, VM code should not touch fpriv structure, so drop fpriv code handling from amdgpu_vm_pt. [ 77.292314] BUG: KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.293845] Read of size 4 at addr ffff888102c48a48 by task modprobe/1069 [ 77.294146] Call Trace: [ 77.294178] <TASK> [ 77.294208] dump_stack_lvl+0x49/0x63 [ 77.294260] print_report+0x16f/0x4a6 [ 77.294307] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.295979] ? kasan_complete_mode_report_info+0x3c/0x200 [ 77.296057] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.297556] kasan_report+0xb4/0x130 [ 77.297609] ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.299202] __asan_load4+0x6f/0x90 [ 77.299272] amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu] [ 77.300796] ? amdgpu_init+0x6e/0x1000 [amdgpu] [ 77.302222] ? amdgpu_vm_pt_clear+0x750/0x750 [amdgpu] [ 77.303721] ? preempt_count_sub+0x18/0xc0 [ 77.303786] amdgpu_vm_init+0x39e/0x870 [amdgpu] [ 77.305186] ? amdgpu_vm_wait_idle+0x90/0x90 [amdgpu] [ 77.306683] ? kasan_set_track+0x25/0x30 [ 77.306737] ? kasan_save_alloc_info+0x1b/0x30 [ 77.306795] ? __kasan_kmalloc+0x87/0xa0 [ 77.306852] amdgpu_mes_self_test+0x169/0x620 [amdgpu] v2: without specifying xcp partition for PD/PT bo, the xcp id is -1. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2686 Fixes: 3ebfd221c1a8 ("drm/amdkfd: Store xcp partition id to amdgpu bo") Signed-off-by: Guchun Chen <guchun.chen@amd.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07drm/amdgpu: have bos for PDs/PTS cpu accessible when kfd uses cpu to update vmXiaogang Chen1-0/+28
When kfd uses cpu to update vm iterates all current PDs/PTs bos, adds AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag and kmap them to kernel virtual address space before kfd updates the vm that was created by gfx. Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-07drm/amdgpu: Update invalid PTE flag settingMukul Joshi1-0/+31
Update the invalid PTE flag setting with TF enabled. This is to ensure, in addition to transitioning the retry fault to a no-retry fault, it also causes the wavefront to enter the trap handler. With the current setting, the fault only transitions to a no-retry fault. Additionally, have 2 sets of invalid PTE settings, one for TF enabled, the other for TF disabled. The setting with TF disabled, doesn't work with TF enabled. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: fix Null pointer dereference error in amdgpu_device_recover_vramHoratio Zhang1-1/+0
Use the function of amdgpu_bo_vm_destroy to handle the resource release of shadow bo. During the amdgpu_mes_self_test, shadow bo released, but vmbo->shadow_list was not, which caused a null pointer reference error in amdgpu_device_recover_vram when GPU reset. Fixes: 6c032c37ac3e ("drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2)") Signed-off-by: xinhui pan <xinhui.pan@amd.com> Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Acked-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdkfd: Store xcp partition id to amdgpu boPhilip Yang1-2/+3
For memory accounting per compute partition and export drm amdgpu bo and then import to KFD, we need the xcp id to account the memory usage or find the KFD node of the original amdgpu bo to create the KFD bo on the correct adev KFD node. Set xcp_id_plus1 of amdgpu_bo_param to create bo and store xcp_id to amddgpu bo. Add helper macro to get the mem_id from adev and xcp_id. v2: squash in fix ("drm/amdgpu: Fix BO creation failure on GFX 9.4.3 dGPU") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Alloc page table on correct memory partitionPhilip Yang1-0/+3
Alloc kernel mode page table bo uses the amdgpu_vm->mem_id + 1 as bp mem_id_plus1 parameter. For APU mode, select the correct TTM pool to alloc page from the corresponding memory partition, this will be the closest NUMA node. For dGPU mode, select the correct address range for vram manager. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Override MTYPE per page on GFXv9.4.3 APUsFelix Kuehling1-3/+19
On GFXv9.4.3 NUMA APUs, system memory locality must be determined per page to choose the correct MTYPE. This patch adds a GMC callback that can provide this per-page override and implements it for native mode. Carve-out mode is not yet supported and will use the safe default (remote) MTYPE for system memory. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Reviewed-and-tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Handle VRAM dependencies on GFXIP9.4.3Rajneesh Bhardwaj1-1/+6
[For 1P NPS1 mode driver bringup] Changes required to initialize the amdgpu driver with frontdoor firmware loading and discovery=2 with the native mode SBIOS that enables CPU GPU unified interleaved memory. sudo modprobe amdgpu discovery=2 Once PSP TMR region is reported via the ACPI interface, the dependency on the ip_discovery.bin will be removed. Choice of where to allocate driver table is given to each IP version. In general, both GTT and VRAM domains will be considered. If one of the tables has a strict restriction for VRAM domain, then only VRAM domain is considered. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> (lijo: Modified the handling for SMU Tables) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-22drm/amd/amdgpu/amdgpu_vm_pt: Supply description for ↵Lee Jones1-0/+1
amdgpu_vm_pt_free_dfs()'s unlocked param Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c:683: warning: Function parameter or member 'unlocked' not described in 'amdgpu_vm_pt_free_dfs' Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-09drm/amdgpu: Use the TGID for trace_amdgpu_vm_update_ptesFriedrich Vock1-1/+1
The pid field corresponds to the result of gettid() in userspace. However, userspace cannot reliably attribute PTE events to processes with just the thread id. This patch allows userspace to easily attribute PTE update events to specific processes by comparing this field with the result of getpid(). For attributing events to specific threads, the thread id is also contained in the common fields of each trace event. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-11-10drm/amdgpu: Drop eviction lock when allocating PT BOPhilip Yang1-0/+2
Re-take the eviction lock immediately again after the allocation is completed, to fix circular locking warning with drm_buddy allocator. Move amdgpu_vm_eviction_lock/unlock/trylock to amdgpu_vm.h as they are called from multiple files. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21drm/amdgpu: Fix amdgpu_vm_pt_free warningPhilip Yang1-3/+38
Free page table BO from vm resv unlocked context generate below warnings. Add a pt_free_work in vm to free page table BO from vm->pt_freed list. pass vm resv unlock status from page table update caller, and add vm_bo entry to vm->pt_freed list and schedule the pt_free_work if calling with vm resv unlocked. WARNING: CPU: 12 PID: 3238 at drivers/gpu/drm/ttm/ttm_bo.c:106 ttm_bo_set_bulk_move+0xa1/0xc0 Call Trace: amdgpu_vm_pt_free+0x42/0xd0 [amdgpu] amdgpu_vm_pt_free_dfs+0xb3/0xf0 [amdgpu] amdgpu_vm_ptes_update+0x52d/0x850 [amdgpu] amdgpu_vm_update_range+0x2a6/0x640 [amdgpu] svm_range_unmap_from_gpus+0x110/0x300 [amdgpu] svm_range_cpu_invalidate_pagetables+0x535/0x600 [amdgpu] __mmu_notifier_invalidate_range_start+0x1cd/0x230 unmap_vmas+0x9d/0x140 unmap_region+0xa8/0x110 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-21drm/amdgpu: Use vm status_lock to protect pt freePhilip Yang1-0/+3
Use vm_status_lock to protect all vm_status state transitions to allow them to happen without a reservation lock in unlocked page table updates. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-03drm/amdgpu: fix drm-next merge falloutChristian König1-1/+5
That hunk somehow got missing while solving the conflict between the TTM and AMDGPU changes for drm-next. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220503063613.46925-1-christian.koenig@amd.com
2022-04-28Merge tag 'amd-drm-next-5.19-2022-04-15' of ↵Dave Airlie1-0/+977
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-5.19-2022-04-15: amdgpu: - USB-C updates - GPUVM updates - TMZ fixes for RV - DCN 3.1 pstate fixes - Display z state fixes - RAS fixes - Misc code cleanups and spelling fixes - More DC FP rework - GPUVM TLB handling rework - Power management sysfs code cleanup - Add RAS support for VCN - Backlight fix - Add unique id support for more asics - Misc display updates - SR-IOV fixes - Extend CG and PG flags to 64 bits - Enable VCN clk sysfs nodes for navi12 amdkfd: - Fix IO link cleanup during device removal - RAS fixes - Retry fault fixes - Asynchronously free events - SVM fixes radeon: - Drop some dead code - Misc code cleanups From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220415135144.5700-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>
2022-04-01drm/amdgpu: fix some kerneldoc in the VM code v2Christian König1-1/+1
Fix two incorrect kerneldocs for the recent VM code changes. v2: fix one more typo Signed-off-by: Christian König <christian.koenig@amd.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-25drm/amdgpu: separate VM PT handling into amdgpu_vm_pt.cChristian König1-0/+979
Separate the VM page table backend operations from the state machine since the amdgpu_vm.c file is becoming to complex. The allocating, freeing and updating page tables and page directories can easily be moved into a separate file. While at it cleanup everything checkpatch.pl reported and rename the functions a bit to make more clear that they belong together. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>