summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
AgeCommit message (Collapse)AuthorFilesLines
2023-02-14drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10Kenneth Feng1-0/+7
implement mode2 reset on smu_v13_0_10 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-19Revert "drm/amdgpu: let mode2 reset fallback to default when failure"Victor Zhao1-6/+0
This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957. This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with the original design of reset handler. Will redesign it. Fixes: dac6b80818ac23 ("drm/amdgpu: let mode2 reset fallback to default when failure") Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-19Revert "drm/amdgpu: add debugfs amdgpu_reset_level"Victor Zhao1-8/+0
This reverts commit 5bd8d53f6fa53eab5433698d1362dae2aa53c1cc. This commit breaks the reset logic for aldebaran, revert it for now. Will move the mask inside the reset handler. Fixes: 5bd8d53f6fa53e ("drm/amdgpu: add debugfs amdgpu_reset_level") Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-17drm/amdgpu: add debugfs amdgpu_reset_levelVictor Zhao1-0/+8
Introduce amdgpu_reset_level debugfs in order to help debug and test specific type of reset. Also helps blocking unwanted type of resets. By default, mode2 reset will not be enabled v2: make this debugfs in adev and use debugfs_create_u32 Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-17drm/amdgpu: let mode2 reset fallback to default when failureVictor Zhao1-0/+6
- introduce AMDGPU_SKIP_MODE2_RESET flag - let mode2 reset fallback to default reset method if failed v2: move this part out from the asic specific part Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-17drm/amdgpu: add mode2 reset for sienna_cichlidVictor Zhao1-0/+7
To meet the requirement for multi container usecase which needs a quicker reset and not causing VRAM lost, adding the Mode2 reset handler for sienna_cichlid. v2: move skip mode2 flag part separately v3: remove the use of asic_reset_res Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-10drm/amdgpu: Cache result of last reset at reset domain level.Andrey Grodzovsky1-0/+1
Will be read by executors of async reset like debugfs. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-04drm/amdgpu: Refactor mode2 reset logic for v13.0.2Lijo Lazar1-4/+4
Use IP version and refactor reset logic to apply to a list of devices. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-09drm/amdgpu: Revert 'drm/amdgpu: annotate a false positive recursive locking'Andrey Grodzovsky1-8/+2
Since we have a single instance of reset semaphore which we lock only once even for XGMI hive we don't need the nested locking hint anymore. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74120.html
2022-02-09drm/amdgpu: Rework amdgpu_device_lock_adevAndrey Grodzovsky1-0/+19
This functions needs to be split into 2 parts where one is called only once for locking single instance of reset_domain's sem and reset flag and the other part which handles MP1 states should still be called for each device in XGMI hive. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74118.html
2022-02-09drm/amdgpu: Move in_gpu_reset into reset_domainAndrey Grodzovsky1-0/+1
We should have a single instance per entrire reset domain. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74116.html
2022-02-09drm/amdgpu: Move reset sem into reset_domainAndrey Grodzovsky1-0/+2
We want single instance of reset sem across all reset clients because in case of XGMI we should stop access cross device MMIO because any of them could be in a reset in the moment. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74117.html
2022-02-09drm/amdgpu: Rework reset domain to be refcounted.Andrey Grodzovsky1-0/+40
The reset domain contains register access semaphor now and so needs to be present as long as each device in a hive needs it and so it cannot be binded to XGMI hive life cycle. Adress this by making reset domain refcounted and pointed by each member of the hive and the hive itself. v4: Fix crash on boot witrh XGMI hive by adding type to reset_domain. XGMI will only create a new reset_domain if prevoius was of single device type meaning it's first boot. Otherwsie it will take a refocunt to exsiting reset_domain from the amdgou device. Add a wrapper around reset_domain->refcount get/put and a wrapper around send to reset wq (Lijo) Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74121.html
2021-04-09drm/amdgpu: Add mode2 reset support for aldebaranLijo Lazar1-0/+16
v1: Aldebaran uses reset control to support mode2 reset. The sequences to reset and restore hardware context are specific to a particular configuration. v2: Clear bus mastering before reset. Fix coding style issues, drop unwanted variables and info log. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: Add reset control to amdgpu_deviceLijo Lazar1-0/+82
v1: Add generic amdgpu_reset_control to handle different types of resets. It may be added at device, hive or ip level. Each reset control has a list of handlers associated with it to handle different types of reset. Reset control is responsible for choosing the right handler given a particular reset context. Handler objects may implement a set of functions on how to handle a particular type of reset. prepare_env = Prepare environment/software context (not used currently). prepare_hwcontext = Prepare hardware context for the reset. perform_reset = Perform the type of reset. restore_hwcontext = Restore the hw context after reset. restore_env = Restore the environment after reset (not used currently). Reset context carries the context of reset, as of now this is based on the parameters used for current set of resets. v2: Fix coding style Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>