summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
AgeCommit message (Collapse)AuthorFilesLines
2022-10-19Revert "drm/amdgpu: let mode2 reset fallback to default when failure"Victor Zhao1-2/+1
This reverts commit dac6b80818ac2353631c5a33d140d8d5508e2957. This commit reverted the AMDGPU_SKIP_MODE2_RESET as it conflicts with the original design of reset handler. Will redesign it. Fixes: dac6b80818ac23 ("drm/amdgpu: let mode2 reset fallback to default when failure") Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29drm/amdgpu: Skip put_reset_domain if it doesn't existVignesh Chander1-1/+2
For xgmi sriov, the reset is handled by host driver and hive->reset_domain is not initialized so need to check if it exists before doing a put. Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com> Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19drm/amdgpu: Adjust removal control flow for smu v13_0_2YiPeng Chai1-0/+1
Adjust removal control flow for smu v13_0_2: During amdgpu uninstallation, when removing the first device, the kernel needs to first send a mode1reset message to all gpu devices. Otherwise, smu initialization will fail the next time amdgpu is installed. V2: 1. Update commit comments. 2. Remove the global variable amdgpu_device_remove_cnt and add a variable to the structure amdgpu_hive_info. 3. Use hive to detect the first removed device instead of a global variable. V3: 1. Update commit comments. 2. Split a patch into multiple patches. 3. The current patch does: a. Add a work mode of AMDGPU_RESET_FOR_DEVICE_REMOVE into the existing gpu recover path, which make all devices in hive list only have HW reset but no resume (except the base IP). b. Call AMDGPU_RESET_FOR_DEVICE_REMOVE and AMDGPU_NEED_FULL_RESET mode of amdgpu_device_gpu_recover in amdgpu_pci_remove when removing the first device in hive list. c. When removing the first device, the IP blocks keyword function call sequence is as follows: .suspend->mode1reset->.resume(basic ip)->.hw_fini->.early_fini->.sw_fini. ^ | |-<----------<---------<----| The first three sequences are because of a call to amdgpu_device_gpu_recover. The three sequences will be executed in a loop until all devices in the hive list are iterated. The sequences starting from .hw_fini only apply to the first device. Since .suspend has been called before, except the resumed phase1 basic ip blocks, all other ip blocks .hw_fini of current device will do nothing. d. When removing other devices, the calling sequences is the same as legacy: .hw_fini -> .early_fini -> .sw_fini. Since .suspend has been called when removing the first device, except the resumed phase1 basic ip blocks, all of other ip blocks .hw_fini of current device will do nothing. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-17drm/amdgpu: let mode2 reset fallback to default when failureVictor Zhao1-0/+1
- introduce AMDGPU_SKIP_MODE2_RESET flag - let mode2 reset fallback to default reset method if failed v2: move this part out from the asic specific part Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-10drm/amdgpu: Avoid another list of reset devicesLijo Lazar1-0/+1
A list of devices to be reset is already created in amdgpu_device_gpu_recover function. Creating another list with the same nodes is incorrect and not supported in list_head. Instead, pass the device list as part of reset context. Fixes: 9e08564727fc (drm/amdgpu: Refactor mode2 reset logic for v13.0.2) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-10drm/amdgpu: Cache result of last reset at reset domain level.Andrey Grodzovsky1-0/+1
Will be read by executors of async reset like debugfs. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-10drm/amdgpu: Fix compile error.Andrey Grodzovsky1-2/+1
Seems I forgot to add this to the relevant commit when submitting. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220210031724.440943-1-andrey.grodzovsky@amd.com
2022-02-09drm/amdgpu: Rework amdgpu_device_lock_adevAndrey Grodzovsky1-0/+4
This functions needs to be split into 2 parts where one is called only once for locking single instance of reset_domain's sem and reset flag and the other part which handles MP1 states should still be called for each device in XGMI hive. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74118.html
2022-02-09drm/amdgpu: Move in_gpu_reset into reset_domainAndrey Grodzovsky1-0/+1
We should have a single instance per entrire reset domain. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74116.html
2022-02-09drm/amdgpu: Move reset sem into reset_domainAndrey Grodzovsky1-0/+1
We want single instance of reset sem across all reset clients because in case of XGMI we should stop access cross device MMIO because any of them could be in a reset in the moment. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74117.html
2022-02-09drm/amdgpu: Rework reset domain to be refcounted.Andrey Grodzovsky1-0/+35
The reset domain contains register access semaphor now and so needs to be present as long as each device in a hive needs it and so it cannot be binded to XGMI hive life cycle. Adress this by making reset domain refcounted and pointed by each member of the hive and the hive itself. v4: Fix crash on boot witrh XGMI hive by adding type to reset_domain. XGMI will only create a new reset_domain if prevoius was of single device type meaning it's first boot. Otherwsie it will take a refocunt to exsiting reset_domain from the amdgou device. Add a wrapper around reset_domain->refcount get/put and a wrapper around send to reset wq (Lijo) Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74121.html
2021-04-09drm/amdgpu: Fix build warningsLijo Lazar1-1/+1
Fix header guard and make internal functions static. Fixes the below warnings: drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgpu_reset.h:24:9: warning: '__AMDUGPU_RESET_H__' is used as a header guard here, followed by #define of a different macro [-Wheader-guard] drivers/gpu/drm/amd/amdgpu/aldebaran.c:110:6: warning: no previous prototype for function 'aldebaran_async_reset' [-Wmissing-prototypes] drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/aldebaran_ppt.c:1435:5: warning: no previous prototype for function 'aldebaran_mode2_reset' [-Wmissing-prototypes] Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: Add reset control to amdgpu_deviceLijo Lazar1-0/+85
v1: Add generic amdgpu_reset_control to handle different types of resets. It may be added at device, hive or ip level. Each reset control has a list of handlers associated with it to handle different types of reset. Reset control is responsible for choosing the right handler given a particular reset context. Handler objects may implement a set of functions on how to handle a particular type of reset. prepare_env = Prepare environment/software context (not used currently). prepare_hwcontext = Prepare hardware context for the reset. perform_reset = Perform the type of reset. restore_hwcontext = Restore the hw context after reset. restore_env = Restore the environment after reset (not used currently). Reset context carries the context of reset, as of now this is based on the parameters used for current set of resets. v2: Fix coding style Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>