summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c
AgeCommit message (Collapse)AuthorFilesLines
2022-03-25drm/amdgpu: add UTCL2 RAS poison query for Aldebaran (v2)Tao Zhou1-0/+14
Add help functions to query and reset RAS UTCL2 poison status. v2: implement it on amdgpu side and kfd only calls it. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-19drm/amdgpu: Fix the code style warnings in gfxyipechai1-1/+1
Fix the code style warnings in gfx: 1. WARNING: suspect code indent for conditional statements. 2. ERROR: spaces required around that '=' (ctx:WxV). Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-15drm/amdgpu: Modify gfx block to fit for the unified ras block data and opsyipechai1-11/+14
1.Modify gfx block to fit for the unified ras block data and ops. 2.Change amdgpu_gfx_ras_funcs to amdgpu_gfx_ras, and the corresponding variable name remove _funcs suffix. 3.Remove the const flag of gfx ras variable so that gfx ras block can be able to be inserted into amdgpu device ras block link list. 4.Invoke amdgpu_ras_register_ras_block function to register gfx ras block into amdgpu device ras block link list. 5.Remove the redundant code about gfx in amdgpu_ras.c after using the unified ras block. 6.Fill unified ras block .name .block .ras_late_init and .ras_fini for all of gfx versions. If .ras_late_init and .ras_fini had been defined by the selected gfx version, the defined functions will take effect; if not defined, default fill with amdgpu_gfx_ras_late_init and amdgpu_gfx_ras_fini. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28drm/amdgpu: remove GPRs init for ALDEBARAN in gpu reset (v3)Tao Zhou1-3/+3
Remove GPRs init for ALDEBARAN in gpu reset temporarily, will add the init once the algorithm is stable. v2: Only remove GPRs init in gpu reset. v3: Suspend needs it, only skip it in gpu reset. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28drm/amdgpu: skip GPRs init for some CU settings on ALDEBARANTao Zhou1-0/+5
Skip GPRs init in specific condition since current GPRs init algorithm only works for some CU settings. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-09drm/amdgpu: fix kernel-doc warnings on non-kernel-doc commentsRandy Dunlap1-3/+3
Don't use "begin kernel-doc notation" (/**) for comments that are not kernel-doc. This eliminates warnings reported by the 0day bot. drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:89: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * This shader is used to clear VGPRS and LDS, and also write the input drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:209: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * The below shaders are used to clear SGPRS, and also write the input drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:301: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * This shader is used to clear the uninitiated sgprs after the above Fixes: 0e0036c7d13b ("drm/amdgpu: fix no full coverage issue for gprs initialization") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Dennis Li <Dennis.Li@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-06-19drm/amdgpu: PWRBRK sequence changes for AldebaranAshish Pawar1-5/+0
Modify power brake enablement sequence on Aldebaran Signed-off-by: Ashish Pawar <ashish.pawar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-27drm/amdgpu: Correctly clear GCEA error statusMukul Joshi1-3/+7
While clearing GCEA error status, do not clear the bits set by RAS TA. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-21drm/amd/amdgpu/gfx_v9_4_2: Mark functions called by reference as staticLee Jones1-7/+7
Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:1008:5: warning: no previous prototype for ‘gfx_v9_4_2_query_ras_error_count’ [-Wmissing-prototypes] drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:1054:6: warning: no previous prototype for ‘gfx_v9_4_2_reset_ras_error_count’ [-Wmissing-prototypes] drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:1063:5: warning: no previous prototype for ‘gfx_v9_4_2_ras_error_inject’ [-Wmissing-prototypes] drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:1133:6: warning: no previous prototype for ‘gfx_v9_4_2_query_ras_error_status’ [-Wmissing-prototypes] drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:1143:6: warning: no previous prototype for ‘gfx_v9_4_2_reset_ras_error_status’ [-Wmissing-prototypes] drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c:1153:6: warning: no previous prototype for ‘gfx_v9_4_2_enable_watchdog_timer’ [-Wmissing-prototypes] Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-20drm/amdgpu: add synchronization among waves in the same threadgroupDennis Li1-134/+142
It is possible that the previous waves have exited before others are created, so the other waves maybe reuse pyhsical resouces left by previous ones. Therefore add barrier instruction to synchronize waves within the same threadgroup. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: correct the funtion to clear GCEA error statusDennis Li1-1/+4
The bit 11 of GCEA_ERR_STATUS register is used to clear GCEA error status. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-11drm/amdgpu: update the shader to clear specific SGPRsDennis Li1-44/+47
Add shader codes to explicitly clear specific SGPRs, such as flat_scratch_lo, flat_scratch_hi and so on. And also correct the allocation size of SGPRs in PGM_RSRC1. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-29drm/amdgpu: fix no full coverage issue for gprs initializationDennis Li1-264/+515
The wave's number per simd in aldebaran is changed to 8, so it is impossible to use old algorithm to initiate all sgprs with one threadgroup. The new algorithm firstly use three threadgroups to initiate most sgprs simultaneously and then use another threadgroup with 4 waves to cover other uninitiated sgprs. v2: Add more description about the new algorithm to clear sgprs and add some comment for shader binaries Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-24drm/amdgpu: refine gprs init shaders to check coverageDennis Li1-2/+375
Add codes to check whether all SIMDs are covered, make sure that all GPRs are initialized. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-21drm/amdgpu: resolve erroneous gfx_v9_4_2 printsJohn Clements1-1/+1
resolve bug on aldebaran where gfx error counts will print on driver load when there are no errors present Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-21drm/amdgpu: only harvest gcea/mmea error status in aldebaranHawking Zhang1-9/+12
In aldebaran, driver only needs to harvest SDP RdRspStatus, WrRspStatus and first parity error on RdRsp data. Check error type before harvest error information. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-15drm/amdgpu: update gfx 9.4.2 ras error reportingJohn Clements1-6/+7
only output ras error status if an error bit is set or error counter is set Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: split gfx callbacks into ras and non-ras onesHawking Zhang1-1/+12
gfx ras is only available in cerntain ip generations. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: harvest edc status when connected to host via xGMIDennis Li1-6/+48
When connected to a host via xGMI, system fatal errors may trigger warm reset, driver has no change to query edc status before reset. Therefore in this case, driver should harvest previous error loging registers during boot, instead of only resetting them. v2: 1. IP's ras_manager object is created when its ras feature is enabled, so change to query edc status after amdgpu_ras_late_init called 2. change to enable watchdog timer after finishing gfx edc init Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reivewed-by: Hawking Zhang <hawking.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: update default timeout of Aldebaran SQ watchdogHarish Kasiviswanathan1-0/+7
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reivewed-by: Hawking Zhang <hawking.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: enable watchdog feature for SQ of aldebaranDennis Li1-0/+106
SQ's watchdog timer monitors forward progress, a mask of which waves caused the watchdog timeout is recorded into ras status registers and then trigger a system fatal error event. v2: 1. change *query_timeout_status to *query_sq_timeout_status. 2. move query_sq_timeout_status into amdgpu_ras_do_recovery. 3. add module parameters to enable/disable fatal error event and modify the watchdog timer. v3: 1. remove unused parameters of *enable_watchdog_timer Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: refine ras codes for GC utc of aldebaranDennis Li1-173/+96
The bank number of both VML2 and ATCL2 are changed to 8, so refine related codes to avoid defining long name arrays. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: add ras support for gfx of aldebaranDennis Li1-0/+1078
add edc counter/status reset and query functions for gfx block of aldebaran. v2: change to clear edc counter explicitly aldebaran hardware will not clear edc counter after driver reading them, so driver should clear them explicitly. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: add gc powerbrake support (v2)Kevin Wang1-0/+26
add GC power brake feature support for Aldebaran. v2: squash in fixes (Alex) Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: update TCP_CHAN_STEER_1 golden value for aldebaranHawking Zhang1-1/+1
The golden setting was changed recently. update to the latest one Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: add common gc golden settings for aldebaranHawking Zhang1-3/+6
golden settings that should be applied Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: apply gc v9_4_2 golden settings for aldebaranHawking Zhang1-0/+51
Those registers should be programmed as one-time initialization Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Kevin Wang <kevin1.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-24drm/amdgpu: restore aldebaran save ttmp and trap config on init (v2)Jonathan Kim1-0/+50
Initialization of TRAP_DATA0/1 is still required for the debugger to detect new waves on Aldebaran. Also, per-vmid global trap enablement may be required outside of debugger scope so move to init phase. v2: just add the gfx 9.4.2 changes (Alex) Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>