summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2023-11-30drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointerLu Yao1-0/+6
For 'AMDGPU_FAMILY_SI' family cards, in 'si_common_early_init' func, init 'didt_rreg' and 'didt_wreg' to 'NULL'. But in func 'amdgpu_debugfs_regs_didt_read/write', using 'RREG32_DIDT' 'WREG32_DIDT' lacks of relevant judgment. And other 'amdgpu_ip_block_version' that use these two definitions won't be added for 'AMDGPU_FAMILY_SI'. So, add null pointer judgment before calling. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lu Yao <yaolu@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-30drm/amd: Enable PCIe PME from D3Mario Limonciello1-0/+2
When dGPU is put into BOCO it may be in D3cold but still able send PME on display hotplug event. For this to work it must be enabled as wake source from D3. When runpm is enabled use pci_wake_from_d3() to mark wakeup as enabled by default. Cc: stable@vger.kernel.org # 6.1+ Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-30drm/amdgpu: fix AGP addressing when GART is not at 0Alex Deucher3-6/+11
This worked by luck if the GART aperture ended up at 0. When we ended up moving GART on some chips, the GART aperture ended up offsetting the AGP address since the resource->start is a GART offset, not an MC address. Fix this by moving the AGP address setup into amdgpu_bo_gpu_offset_no_check(). v2: check mem_type before checking agp v3: check if the ttm bo has a ttm_tt allocated yet Fixes: 67318cb84341 ("drm/amdgpu/gmc11: set gart placement GC11") Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reported-by: Jesse Zhang <Jesse.Zhang@amd.com> Reported-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: christian.koenig@amd.com Cc: mario.limonciello@amd.com
2023-11-30drm/amdgpu: correct the amdgpu runtime dereference usage countPrike Liang1-6/+3
Fix the amdgpu runpm dereference usage count. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-11-30drm/amdgpu: fix memory overflow in the IB testTim Huang4-7/+7
Fix a memory overflow issue in the gfx IB test for some ASICs. At least 20 bytes are needed for the IB test packet. v2: correct code indentation errors. (Christian) Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-11-30drm/amdgpu: add init_registers for nbio v7.11Li Ma1-9/+9
enable init_registers callback func for nbio v7.11. Signed-off-by: Li Ma <li.ma@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-30drm/amdgpu: Force order between a read and write to the same addressAlex Sierra1-0/+8
Setting register to force ordering to prevent read/write or write/read hazards for un-cached modes. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-11-30drm/amdgpu: Do not issue gpu reset from nbio v7_9 bif interruptHawking Zhang1-5/+0
In nbio v7_9, host driver should not issu gpu reset Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-30drm/amdgpu: optimize RLC powerdown notification on VangoghPerry Yuan1-0/+4
The smu needs to get the rlc power down message to sync the rlc state with smu, the rlc state updating message need to be sent at while smu begin suspend sequence , otherwise SMU will crash while RLC state is not notified by driver, and rlc state probally changed after that notification, so it needs to notify rlc state to smu at the end of the suspend sequence in amdgpu_device_suspend() that can make sure the rlc state is correctly set to SMU. [ 101.000590] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000 [ 101.000598] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff! [ 110.838026] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000 [ 110.838035] amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features. [ 110.838039] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features! [ 110.838040] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62 [ 110.884394] PM: suspend of devices aborted after 21213.620 msecs [ 110.884402] PM: start suspend of devices aborted after 21213.882 msecs [ 110.884405] PM: Some devices failed to suspend, or early wake event detected Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Perry Yuan <perry.yuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-30drm/amdgpu: update xgmi num links info post gc9.4.2Jonathan Kim1-1/+1
GC IP 9.4.2 and up support TA reporting of the number of xGMI links between peers. Tested-by: Vignesh Chander <vignesh.chander@amd.com> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17drm/amdgpu/gmc9: disable AGP apertureAlex Deucher1-1/+1
We've had misc reports of random IOMMU page faults when this is used. It's just a rarely used optimization anyway, so let's just disable it. It can still be toggled via the module parameter for testing. v2: leave it configurable via module parameter Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1) Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17drm/amdgpu/gmc10: disable AGP apertureAlex Deucher1-1/+1
We've had misc reports of random IOMMU page faults when this is used. It's just a rarely used optimization anyway, so let's just disable it. It can still be toggled via the module parameter for testing. v2: leave it configurable via module parameter Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1) Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17drm/amdgpu/gmc11: disable AGP apertureAlex Deucher1-1/+1
We've had misc reports of random IOMMU page faults when this is used. It's just a rarely used optimization anyway, so let's just disable it. It can still be toggled via the module parameter for testing. v2: leave it configurable via module parameter Fixes: 67318cb84341 ("drm/amdgpu/gmc11: set gart placement GC11") Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1) Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17drm/amdgpu: add a module parameter to control the AGP apertureAlex Deucher5-3/+15
Add a module parameter to control the AGP aperture. The AGP aperture is an aperture in the GPU's internal address space which provides direct non-paged access to the platform address space. This access is non-snooped so only uncached memory can be accessed. Add a knob so that we can toggle this for debugging. Fixes: 67318cb84341 ("drm/amdgpu/gmc11: set gart placement GC11") Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17drm/amdgpu/gmc11: fix logic typo in AGP checkAlex Deucher1-1/+1
Should be && rather than ||. Fixes: b2e1cbe6281f ("drm/amdgpu/gmc11: disable AGP on GC 11.5") Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17drm/amdgpu: add and populate the port num into xgmi topology infoShiwu Zhang2-0/+6
The port num info is firstly introduced with 20.00.01.13 xgmi ta and make them as part of topology info. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17drm/amdgpu: fix ras err_data null pointer issue in amdgpu_ras.cYang Wang1-1/+1
fix ras err_data null pointer issue in amdgpu_ras.c Fixes: 8cc0f5669eb6 ("drm/amdgpu: Support multiple error query modes") Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17drm/amdgpu: correct chunk_ptr to a pointer to chunk.YuanShang1-1/+1
The variable "chunk_ptr" should be a pointer pointing to a struct drm_amdgpu_cs_chunk instead of to a pointer of that. Signed-off-by: YuanShang <YuanShang.Mao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17drm/amdgpu: Address member 'ring' not described in 'amdgpu_ vce, ↵Srinivasan Shanmugam2-0/+2
uvd_entity_init()' Fixes the following: drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:237: warning: Function parameter or member 'ring' not described in 'amdgpu_vce_entity_init' drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:405: warning: Function parameter or member 'ring' not described in 'amdgpu_uvd_entity_init' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17drm/amdgpu: finalizing mem_partitions at the end of GMC v9 sw_finiLe Ma1-2/+3
The valid num_mem_partitions is required during ttm pool fini, thus move the cleanup at the end of the function. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17drm/amdgpu: Do not program VF copy regs in mmhub v1.8 under SRIOV (v2)Victor Lu1-3/+3
MC_VM_AGP_* registers should not be programmed by guest driver. v2: move early return outside of loop Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Samir Dhume <samir.dhume@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-11Merge tag 'drm-next-2023-11-10' of git://anongit.freedesktop.org/drm/drmLinus Torvalds45-223/+877
Pull drm fixes from Daniel Vetter: "Dave's VPN to the big machine died, so it's on me to do fixes pr this and next week while everyone else is at plumbers. - big pile of amd fixes, but mostly for hw support newly added in 6.7 - i915 fixes, mostly minor things - qxl memory leak fix - vc4 uaf fix in mock helpers - syncobj fix for DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE" * tag 'drm-next-2023-11-10' of git://anongit.freedesktop.org/drm/drm: (78 commits) drm/amdgpu: fix error handling in amdgpu_vm_init drm/amdgpu: Fix possible null pointer dereference drm/amdgpu: move UVD and VCE sched entity init after sched init drm/amdgpu: move kfd_resume before the ip late init drm/amd: Explicitly check for GFXOFF to be enabled for s0ix drm/amdgpu: Change WREG32_RLC to WREG32_SOC15_RLC where inst != 0 (v2) drm/amdgpu: Use correct KIQ MEC engine for gfx9.4.3 (v5) drm/amdgpu: add smu v13.0.6 pcs xgmi ras error query support drm/amdgpu: fix software pci_unplug on some chips drm/amd/display: remove duplicated argument drm/amdgpu: correct mca debugfs dump reg list drm/amdgpu: correct acclerator check architecutre dump drm/amdgpu: add pcs xgmi v6.4.0 ras support drm/amdgpu: Change extended-scope MTYPE on GC 9.4.3 drm/amdgpu: disable smu v13.0.6 mca debug mode by default drm/amdgpu: Support multiple error query modes drm/amdgpu: refine smu v13.0.6 mca dump driver drm/amdgpu: Do not program PF-only regs in hdp_v4_0.c under SRIOV (v2) drm/amdgpu: Skip PCTL0_MMHUB_DEEPSLEEP_IB write in jpegv4.0.3 under SRIOV drm: amd: Resolve Sphinx unexpected indentation warning ...
2023-11-10Merge tag 'amd-drm-next-6.7-2023-11-10' of ↵Daniel Vetter45-223/+877
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.7-2023-11-10: amdgpu: - SR-IOV fixes - DMCUB fixes - DCN3.5 fixes - DP2 fixes - SubVP fixes - SMU14 fixes - SDMA4.x fixes - Suspend/resume fixes - AGP regression fix - UAF fixes for some error cases - SMU 13.0.6 fixes - Documentation fixes - RAS fixes - Hotplug fixes - Scheduling entity ordering fix - GPUVM fixes Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231110190703.4741-1-alexander.deucher@amd.com
2023-11-10drm/amdgpu: fix error handling in amdgpu_vm_initChristian König1-15/+16
When clearing the root PD fails we need to properly release it again. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-11-10drm/amdgpu: Fix possible null pointer dereferenceFelix Kuehling1-2/+2
mem = bo->tbo.resource may be NULL in amdgpu_vm_bo_update. Fixes: 180253782038 ("drm/ttm: stop allocating dummy resources during BO creation") Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-11-10drm/amdgpu: move UVD and VCE sched entity init after sched initAlex Deucher13-46/+37
We need kernel scheduling entities to deal with handle clean up if apps are not cleaned up properly. With commit 56e449603f0ac5 ("drm/sched: Convert the GPU scheduler to variable number of run-queues") the scheduler entities have to be created after scheduler init, so change the ordering to fix this. v2: Leave logic in UVD and VCE code Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to variable number of run-queues") Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: ltuikov89@gmail.com
2023-11-10drm/amdgpu: move kfd_resume before the ip late initTim Huang1-7/+6
The kfd_resume needs to touch GC registers to enable the interrupts, it needs to be done before GFXOFF is enabled to ensure that the GFX is not off and GC registers can be touched. So move kfd_resume before the amdgpu_device_ip_late_init which enables the CGPG/GFXOFF. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amd: Explicitly check for GFXOFF to be enabled for s0ixMario Limonciello1-0/+3
If a user has disabled GFXOFF this may cause problems for the suspend sequence. Ensure that it is enabled in amdgpu_acpi_is_s0ix_active(). The system won't reach the deepest state but it also won't hang. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10Merge tag 'drm-misc-fixes-2023-11-08' of ↵Daniel Vetter3-9/+11
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-fixes for v6.7-rc1: qxl: - qxl memory leak fix. syncobj: - Fix waiting for DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE vc4: - Fix UAF in mock helpers Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> [sima: Stitch together both changelogs from Maarten. Also because of branch history this contains a few more bugfixes which are already in v6.6, but I didn't feel like this justifies some backmerge since there wasn't any real conflict.] Link: https://patchwork.freedesktop.org/patch/msgid/bc8598ee-d427-4616-8ebd-64107ab9a2d8@linux.intel.com
2023-11-10drm/amdgpu: Change WREG32_RLC to WREG32_SOC15_RLC where inst != 0 (v2)Victor Lu3-43/+37
W/RREG32_RLC is hardedcoded to use instance 0. W/RREG32_SOC15_RLC should be used instead when inst != 0. v2: rebase Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: Use correct KIQ MEC engine for gfx9.4.3 (v5)Victor Lu9-18/+116
amdgpu_kiq_wreg/rreg is hardcoded to use MEC engine 0. Add an xcc_id parameter to amdgpu_kiq_wreg/rreg, define W/RREG32_XCC and amdgpu_device_xcc_wreg/rreg to use the new xcc_id parameter. Using amdgpu_sriov_runtime to determine whether to access via kiq or RLC is sufficient for now. v5: add condition in amdgpu_device_xcc_w/rreg, remove trace func call v4: avoid using amdgpu_sriov_w/rreg v3: use W/RREG32_XCC to handle non-kiq case v2: define amdgpu_device_xcc_wreg/rreg instead of changing parameters of amdgpu_device_wreg/rreg Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: add smu v13.0.6 pcs xgmi ras error query supportYang Wang1-0/+1
add pcs xgmi ras error query support for smu v13.0.6. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: fix software pci_unplug on some chipsVitaly Prosyak1-3/+6
When software 'pci unplug' using IGT is executed we got a sysfs directory entry is NULL for differant ras blocks like hdp, umc, etc. Before call 'sysfs_remove_file_from_group' and 'sysfs_remove_group' check that 'sd' is not NULL. [ +0.000001] RIP: 0010:sysfs_remove_group+0x83/0x90 [ +0.000002] Code: 31 c0 31 d2 31 f6 31 ff e9 9a a8 b4 00 4c 89 e7 e8 f2 a2 ff ff eb c2 49 8b 55 00 48 8b 33 48 c7 c7 80 65 94 82 e8 cd 82 bb ff <0f> 0b eb cc 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 [ +0.000001] RSP: 0018:ffffc90002067c90 EFLAGS: 00010246 [ +0.000002] RAX: 0000000000000000 RBX: ffffffff824ea180 RCX: 0000000000000000 [ +0.000001] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ +0.000001] RBP: ffffc90002067ca8 R08: 0000000000000000 R09: 0000000000000000 [ +0.000001] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ +0.000001] R13: ffff88810a395f48 R14: ffff888101aab0d0 R15: 0000000000000000 [ +0.000001] FS: 00007f5ddaa43a00(0000) GS:ffff88841e800000(0000) knlGS:0000000000000000 [ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000001] CR2: 00007f8ffa61ba50 CR3: 0000000106432000 CR4: 0000000000350ef0 [ +0.000001] Call Trace: [ +0.000001] <TASK> [ +0.000001] ? show_regs+0x72/0x90 [ +0.000002] ? sysfs_remove_group+0x83/0x90 [ +0.000002] ? __warn+0x8d/0x160 [ +0.000001] ? sysfs_remove_group+0x83/0x90 [ +0.000001] ? report_bug+0x1bb/0x1d0 [ +0.000003] ? handle_bug+0x46/0x90 [ +0.000001] ? exc_invalid_op+0x19/0x80 [ +0.000002] ? asm_exc_invalid_op+0x1b/0x20 [ +0.000003] ? sysfs_remove_group+0x83/0x90 [ +0.000001] dpm_sysfs_remove+0x61/0x70 [ +0.000002] device_del+0xa3/0x3d0 [ +0.000002] ? ktime_get_mono_fast_ns+0x46/0xb0 [ +0.000002] device_unregister+0x18/0x70 [ +0.000001] i2c_del_adapter+0x26d/0x330 [ +0.000002] arcturus_i2c_control_fini+0x25/0x50 [amdgpu] [ +0.000236] smu_sw_fini+0x38/0x260 [amdgpu] [ +0.000241] amdgpu_device_fini_sw+0x116/0x670 [amdgpu] [ +0.000186] ? mutex_lock+0x13/0x50 [ +0.000003] amdgpu_driver_release_kms+0x16/0x40 [amdgpu] [ +0.000192] drm_minor_release+0x4f/0x80 [drm] [ +0.000025] drm_release+0xfe/0x150 [drm] [ +0.000027] __fput+0x9f/0x290 [ +0.000002] ____fput+0xe/0x20 [ +0.000002] task_work_run+0x61/0xa0 [ +0.000002] exit_to_user_mode_prepare+0x150/0x170 [ +0.000002] syscall_exit_to_user_mode+0x2a/0x50 Cc: Hawking Zhang <hawking.zhang@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: correct mca debugfs dump reg listYang Wang1-2/+9
avoid driver to touch invalid mca reg. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: correct acclerator check architecutre dumpHawking Zhang2-6/+11
So driver doesn't touch invalid aca entries. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: add pcs xgmi v6.4.0 ras supportYang Wang2-3/+161
add pcs xgmi v6.4.0 ras support Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: Change extended-scope MTYPE on GC 9.4.3David Yat Sin1-2/+5
Change local memory type to MTYPE_UC on revision id 0 Signed-off-by: David Yat Sin <David.YatSin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: Support multiple error query modesHawking Zhang2-23/+78
Direct error query mode and firmware error query mode are supported for now. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: refine smu v13.0.6 mca dump driverYang Wang2-10/+215
refine smu mca driver to support query ras error from pmfw path. - correct gfx smu bank hwid (from mp5 to smu bank) - retire unused callback function in amdgpu_mca_smu_funcs{} - add new mca_bank_set{} structure to collect mca bank - move enum mca_reg_idx into amdgpu_mca.h header - add mca status register field decode macro Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: Do not program PF-only regs in hdp_v4_0.c under SRIOV (v2)Victor Lu1-0/+4
The following regs can only be programmed by the PF: HDP_MISC_CNTL HDP_NONSURFACE_BASE HDP_NONSURFACE_BASE_HI v2: update commit message Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Samir Dhume <samir.dhume@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: Skip PCTL0_MMHUB_DEEPSLEEP_IB write in jpegv4.0.3 under SRIOVVictor Lu1-6/+10
PCTL0_MMHUB_DEEPSLEEP_IB is blocked for VF access Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Samir Dhume <samir.dhume@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: correct smu v13.0.6 umc ras error checkYang Wang2-2/+5
correct smu v13.0.0 umc ras error check Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: Add xcc param to SRIOV kiq write and WREG32_SOC15_IP_NO_KIQ (v4)Victor Lu6-19/+25
WREG32/RREG32_SOC15_IP_NO_KIQ and amdgpu_virt_kiq_reg_write_reg_wait are not using the correct rlcg interface or mec engine, respectively. Add xcc instance parameter to them. v4: Use GET_INST and squash commit with: "drm/amdgpu: Add xcc_inst param to amdgpu_virt_kiq_reg_write_reg_wait" v3: xcc not needed for MMMHUB v2: rebase Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: Add flag to enable indirect RLCG access for gfx v9.4.3Victor Lu1-0/+1
The "rlcg_reg_access_supported" flag is missing. Add it back in. Signed-off-by: Victor Lu <victorchengchi.lu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: correct amdgpu ip block rev infoYang Wang2-2/+2
correct following amdgpu ip block version information: - gfx_v9_4_3 - sdma_v4_4_2 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: Don't warn for unsupported set_xgmi_plpd_modeTao Zhou1-6/+8
set_xgmi_plpd_mode may be unsupported and this isn't error, no need to print warning for it. v2: add ret2 to save the status of psp_ras_trigger_error. Suggested-by: lijo.lazar@amd.com Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-10drm/amdgpu: lower CS errors to debug severityChristian König1-1/+1
Otherwise userspace can spam the logs by using incorrect input values. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-11-10drm/amdgpu: fix error handling in amdgpu_bo_list_get()Christian König1-0/+1
We should not leak the pointer where we couldn't grab the reference on to the caller because it can be that the error handling still tries to put the reference then. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-11-10drm/amdgpu: fix AGP init orderAlex Deucher7-3/+7
The default AGP settings were overwriting the IP selected ones since the default was getting set after the IP ones were selected. Fixes: de59b69932e6 ("drm/amdgpu/gmc: set a default disable value for AGP") Link: https://lists.freedesktop.org/archives/amd-gfx/2023-November/100966.html Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
2023-11-08Merge tag 'drm-next-2023-11-07' of git://anongit.freedesktop.org/drm/drmLinus Torvalds22-92/+328
Pull more drm updates from Dave Airlie: "Geert pointed out I missed the renesas reworks in my main pull, so this pull contains the renesas next work for atomic conversion and DT support. It also contains a bunch of amdgpu and some small ssd13xx fixes. renesas: - atomic conversion - DT support ssd13xx: - dt binding fix for ssd132x - Initialize ssd130x crtc_state to NULL. amdgpu: - Fix RAS support check - RAS fixes - MES fixes - SMU13 fixes - Contiguous memory allocation fix - BACO fixes - GPU reset fixes - Min power limit fixes - GFX11 fixes - USB4/TB hotplug fixes - ARM regression fix - GFX9.4.3 fixes - KASAN/KCSAN stack size check fixes - SR-IOV fixes - SMU14 fixes - PSP13 fixes - Display blend fixes - Flexible array size fixes amdkfd: - GPUVM fix radeon: - Flexible array size fixes" * tag 'drm-next-2023-11-07' of git://anongit.freedesktop.org/drm/drm: (83 commits) drm/amd/display: Enable fast update on blendTF change drm/amd/display: Fix blend LUT programming drm/amd/display: Program plane color setting correctly drm/amdgpu: Query and report boot status drm/amdgpu: Add psp v13 function to query boot status drm/amd/swsmu: remove fw version check in sw_init. drm/amd/swsmu: update smu v14_0_0 driver if and metrics table drm/amdgpu: Add C2PMSG_109/126 reg field shift/masks drm/amdgpu: Optimize the asic type fix code drm/amdgpu: fix GRBM read timeout when do mes_self_test drm/amdgpu: check recovery status of xgmi hive in ras_reset_error_count drm/amd/pm: only check sriov vf flag once when creating hwmon sysfs drm/amdgpu: Attach eviction fence on alloc drm/amdkfd: Improve amdgpu_vm_handle_moved drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml2 drm/amd/display: Avoid NULL dereference of timing generator drm/amdkfd: Update cache info for GFX 9.4.3 drm/amdkfd: Populate cache info for GFX 9.4.3 drm/amdgpu: don't put MQDs in VRAM on ARM | ARM64 drm/amdgpu/smu13: drop compute workload workaround ...