path: root/drivers/scsi/hisi_sas
Age | Commit message | Author | Files | Lines
2023-06-08 | scsi: hisi_sas: Convert to platform remove callback returning void | Uwe Kleine-König | 4 | -15/+4
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is (mostly) ignored and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new() which already returns void. hisi_sas_remove() returned zero unconditionally so this was changed to return void. Then it has the right prototype to be used directly as remove callback for the two hisi_sas drivers. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20230518202043.261739-1-u.kleine-koenig@pengutronix.de Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
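The conversion described above can be sketched in userspace C. This is only an illustration: `struct sketch_pdev`, `struct sketch_pdrv`, `sketch_remove()` and `sketch_unbind()` are hypothetical stand-ins, not kernel types or functions. It shows why an int return is misleading: the caller unbinds the device regardless of the value.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct platform_device / platform_driver. */
struct sketch_pdev { int resources_held; };

struct sketch_pdrv {
	int  (*remove)(struct sketch_pdev *);     /* legacy: result is ignored */
	void (*remove_new)(struct sketch_pdev *); /* converted: nothing to return */
};

/* Shared teardown helper, modelled on hisi_sas_remove() returning void. */
static void sketch_remove(struct sketch_pdev *pdev)
{
	pdev->resources_held = 0;
}

/* Models the caller: the device goes away whatever remove() returns. */
static void sketch_unbind(const struct sketch_pdrv *drv, struct sketch_pdev *pdev)
{
	if (drv->remove_new)
		drv->remove_new(pdev);
	else if (drv->remove)
		(void)drv->remove(pdev); /* an error code would be dropped here */
}

/* With a void helper, the function can be used directly as the callback. */
static const struct sketch_pdrv sketch_driver = { .remove_new = sketch_remove };
```

Because the helper already has the void prototype, no wrapper function is needed in the driver.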
2023-05-17 | scsi: hisi_sas: Fix warnings detected by sparse | Xingui Yang | 1 | -3/+4
This patch fixes the following warning: drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:2168:43: sparse: sparse: restricted __le32 degrades to integer Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202304161254.NztCVZIO-lkp@intel.com/ Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1684118481-95908-4-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
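The sparse message "restricted __le32 degrades to integer" generally means a little-endian field was used as a native integer without conversion. The following userspace sketch shows the usual pattern of converting before use. It is not the actual change from this patch; `sketch_le32`, `sketch_le32_to_cpu()` and `sketch_bit_set()` are hypothetical names standing in for the kernel's `__le32` and `le32_to_cpu()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's __le32: raw little-endian bytes. */
typedef struct { uint8_t b[4]; } sketch_le32;

/* Portable equivalent of le32_to_cpu(): decodes on any host endianness. */
static uint32_t sketch_le32_to_cpu(sketch_le32 v)
{
	return (uint32_t)v.b[0] | ((uint32_t)v.b[1] << 8) |
	       ((uint32_t)v.b[2] << 16) | ((uint32_t)v.b[3] << 24);
}

/* Test a flag bit (0..31) in a descriptor word only after converting it. */
static int sketch_bit_set(sketch_le32 dw, unsigned int bit)
{
	return (int)((sketch_le32_to_cpu(dw) >> bit) & 1u);
}
```

Wrapping the bytes in a struct makes accidental arithmetic on the unconverted value a compile error, which is roughly what sparse enforces for `__le32`.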
2023-05-17 | scsi: hisi_sas: Change DMA setup lock timeout to 2.5s | Xingui Yang | 1 | -0/+4
DMA setup lock timeout protection is added when DMA setup frames are received. It's a function outside the protocol and used to prevent SATA disk I/Os from being delivered for a long time. The default value of 100ms is too strict, and the timeout is easily triggered when the disk is overloaded or faulty. Based on the average I/O latency of 300 disks, adjust the value to 2.5s.
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1684118481-95908-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-17 | scsi: hisi_sas: Configure initial value of some registers according to HBA model | Yihang Li | 1 | -5/+12
For SAS HBAs of version 920 and earlier, init_reg_v3_hw() sets some registers that depend on the hardware board. For SAS HBAs of version 920B and later, those hardware registers are set by firmware. The different HBA models are distinguished through pci_dev->revision.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1684118481-95908-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
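A minimal sketch of revision-gated register setup follows. The cutoff value `SKETCH_REV_FW_MANAGED` and the `struct sketch_hba` type are assumptions made for illustration; the commit message does not state which revision number separates the 920 from the 920B.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cutoff: revisions below it count as "920 and earlier". */
#define SKETCH_REV_FW_MANAGED 0x30

struct sketch_hba {
	uint8_t revision;        /* models pci_dev->revision */
	bool board_regs_written; /* set when the driver programs board registers */
};

static void sketch_init_reg(struct sketch_hba *hba)
{
	/* Registers common to every model would be programmed here. */

	if (hba->revision < SKETCH_REV_FW_MANAGED)
		hba->board_regs_written = true; /* driver owns these registers */
	/* Otherwise firmware has configured them; leave them untouched. */
}
```

Keeping the check inside the init function means callers do not need to know which models are firmware-managed.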
2023-04-12 | scsi: hisi_sas: Work around build failure in suspend function | Arnd Bergmann | 1 | -0/+4
The suspend/resume functions in this driver seem to have multiple problems, the latest one just got introduced by a bugfix:

drivers/scsi/hisi_sas/hisi_sas_v3_hw.c: In function '_suspend_v3_hw':
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:5142:39: error: 'struct dev_pm_info' has no member named 'usage_count'
 5142 | if (atomic_read(&device->power.usage_count)) {

As far as I can tell, the 'usage_count' is not meant to be accessed by device drivers at all, though I don't know what the driver is supposed to do instead.

Another problem is the use of the deprecated UNIVERSAL_DEV_PM_OPS(), and marking functions as __maybe_unused to avoid warnings about unused functions. This should probably be changed to using DEFINE_RUNTIME_DEV_PM_OPS().

Both changes require actually understanding what the driver needs to do, and being able to test this, so instead here is the simplest patch to make it pass the randconfig builds instead.

Fixes: e368d38cb952 ("scsi: hisi_sas: Exit suspend state when usage count is greater than 0")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230405083611.3376739-1-arnd@kernel.org
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-03 | Merge patch series "scsi: hisi_sas: Some misc changes" | Martin K. Petersen | 5 | -35/+124
chenxiang <chenxiang66@hisilicon.com> says:

This series contains some fixes including:
- Grab sas_dev lock when traversing sas_dev list to avoid NULL pointer
- Handle NCQ error when IPTT is valid
- Ensure all enabled PHYs up during controller reset
- Exit suspend state when usage count of runtime PM is greater than 0

https://lore.kernel.org/r/1679283265-115066-1-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-03 | scsi: hisi_sas: Exit suspend state when usage count is greater than 0 | Yihang Li | 1 | -17/+56
When the current status of the host controller is suspended, enabling a local PHY just after disabling all local PHYs in expander environment, a hang as follows occurs: [ 486.854655] INFO: task kworker/u256:1:899 blocked for more than 120 seconds. [ 486.862207] Not tainted 6.1.0-rc4+ #1 [ 486.870545] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 486.878893] task:kworker/u256:1 state:D stack:0 pid:899 ppid:2 flags:0x00000008 [ 486.887745] Workqueue: 0000:74:02.0_disco_q sas_discover_domain [libsas] [ 486.894704] Call trace: [ 486.897400] __switch_to+0xf0/0x170 [ 486.901146] __schedule+0x3e4/0x1160 [ 486.904970] schedule+0x64/0x104 [ 486.908442] rpm_resume+0x158/0x6a0 [ 486.912163] __pm_runtime_resume+0x5c/0x84 [ 486.916489] smp_execute_task_sg+0x1f8/0x264 [libsas] [ 486.921773] sas_discover_expander.part.0+0xbc/0x720 [libsas] [ 486.927750] sas_discover_root_expander+0x90/0x154 [libsas] [ 486.933552] sas_discover_domain+0x444/0x6d0 [libsas] [ 486.938826] process_one_work+0x1e0/0x450 [ 486.943057] worker_thread+0x150/0x44c [ 486.947015] kthread+0x114/0x120 [ 486.950447] ret_from_fork+0x10/0x20 [ 486.954292] INFO: task kworker/u256:2:1780 blocked for more than 120 seconds. [ 486.961637] Not tainted 6.1.0-rc4+ #1 [ 486.966087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 486.974356] task:kworker/u256:2 state:D stack:0 pid:1780 ppid:2 flags:0x00000208 [ 486.983141] Workqueue: 0000:74:02.0_event_q sas_port_event_worker [libsas] [ 486.990252] Call trace: [ 486.992930] __switch_to+0xf0/0x170 [ 486.996645] __schedule+0x3e4/0x1160 [ 487.000439] schedule+0x64/0x104 [ 487.003886] schedule_timeout+0x17c/0x1c0 [ 487.008102] wait_for_completion+0x7c/0x160 [ 487.012488] __flush_workqueue+0x104/0x3e0 [ 487.016782] sas_porte_bytes_dmaed+0x414/0x454 [libsas] [ 487.022203] sas_port_event_worker+0x38/0x60 [libsas] [ 487.027449] process_one_work+0x1e0/0x450 [ 487.031645] worker_thread+0x150/0x44c [ 487.035594] kthread+0x114/0x120 [ 487.039017] ret_from_fork+0x10/0x20 [ 487.042828] INFO: task bash:11488 blocked for more than 121 seconds. [ 487.049366] Not tainted 6.1.0-rc4+ #1 [ 487.053746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 487.061953] task:bash state:D stack:0 pid:11488 ppid:10977 flags:0x00000204 [ 487.070698] Call trace: [ 487.073355] __switch_to+0xf0/0x170 [ 487.077050] __schedule+0x3e4/0x1160 [ 487.080833] schedule+0x64/0x104 [ 487.084270] schedule_timeout+0x17c/0x1c0 [ 487.088474] wait_for_completion+0x7c/0x160 [ 487.092851] __flush_workqueue+0x104/0x3e0 [ 487.097137] drain_workqueue+0xb8/0x160 [ 487.101159] __sas_drain_work+0x50/0x90 [libsas] [ 487.105963] sas_suspend_ha+0x64/0xd4 [libsas] [ 487.110590] suspend_v3_hw+0x198/0x1e8 [hisi_sas_v3_hw] [ 487.115989] pci_pm_runtime_suspend+0x5c/0x1d0 [ 487.120606] __rpm_callback+0x50/0x150 [ 487.124535] rpm_callback+0x74/0x80 [ 487.128204] rpm_suspend+0x110/0x640 [ 487.131955] rpm_idle+0x1f4/0x2d0 [ 487.135447] __pm_runtime_idle+0x58/0x94 [ 487.139538] queue_phy_enable+0xcc/0xf0 [libsas] [ 487.144330] store_sas_phy_enable+0x74/0x100 [ 487.148770] dev_attr_store+0x20/0x34 [ 487.152606] sysfs_kf_write+0x4c/0x5c [ 487.156437] kernfs_fop_write_iter+0x120/0x1b0 [ 487.161049] vfs_write+0x2d0/0x36c [ 487.164625] ksys_write+0x70/0x100 [ 487.168194] 
__arm64_sys_write+0x24/0x30 [ 487.172280] invoke_syscall+0x50/0x120 [ 487.176186] el0_svc_common.constprop.0+0x168/0x190 [ 487.181214] do_el0_svc+0x34/0xc0 [ 487.184680] el0_svc+0x2c/0xb4 [ 487.187879] el0t_64_sync_handler+0xb8/0xbc [ 487.192205] el0t_64_sync+0x19c/0x1a0

We find that when all local PHYs are disabled, all the devices are removed, and the ->runtime_suspend() callback suspend_v3_hw() executes directly since the controller usage count drops to 0. On the other side, the first local PHY is enabled through the sysfs interface, and suspend_v3_hw() -> interrupt_disable_v3_hw() ensures that phy_up_v3_hw() has completed. In the expander scenario, sas_discover_root_expander() is executed in event work DISCE_DISCOVER_DOMAIN, which increases the controller usage count, carries out a resume and sends SMP I/O; it cannot complete because the runtime PM status of the controller is RPM_SUSPENDING. At the same time, the ->runtime_suspend() callback suspend_v3_hw() cannot complete either, because it drains the libsas event queue in sas_suspend_ha(), so a hang occurs.

(thread 1)                                | (thread 2)
...                                       |
rpm_idle()                                |
...                                       |
__update_runtime_status(RPM_SUSPENDING)   |
...                                       | ...
suspend_v3_hw()                           | smp_execute_task_sg()
...                                       | ...
interrupt_disable_v3_hw()                 | pm_runtime_get_sync()
                                          | ...
...                                       | rpm_resume() //RPM_SUSPENDING
__sas_drain_work()                        |

To fix this, check after interrupt_disable_v3_hw() whether the current runtime PM status of the controller still allows the suspend to continue, and return immediately if not.

Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hislicon.com>
Link: https://lore.kernel.org/r/1679283265-115066-5-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
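The bail-out described in this fix can be modelled in a few lines of userspace C. This is a sketch under assumptions, not the driver's code: `struct sketch_ctrl` and `sketch_suspend()` are hypothetical, the real usage counter belongs to the PM core, and re-enabling interrupts on the early return is an assumed undo step that the commit message does not spell out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the controller state relevant to the fix. */
struct sketch_ctrl {
	int usage_count;     /* models the runtime PM usage counter */
	bool irq_enabled;
	bool events_drained;
};

static int sketch_suspend(struct sketch_ctrl *c)
{
	c->irq_enabled = false; /* models interrupt_disable_v3_hw() */

	/*
	 * A concurrent resume request holds a reference: back out instead
	 * of draining the event queue, which would wait on that request.
	 */
	if (c->usage_count > 0) {
		c->irq_enabled = true; /* assumed undo step */
		return -EBUSY;
	}

	c->events_drained = true; /* models sas_suspend_ha() draining work */
	return 0;
}
```

The ordering matters: the check sits between disabling interrupts and draining the queue, which is the window in which the two threads in the diagram wait on each other.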
2023-04-03 | scsi: hisi_sas: Ensure all enabled PHYs up during controller reset | Yihang Li | 1 | -2/+30
For the controller reset operation, hisi_sas_phy_enable() is executed for each enabled local PHY, and the port id of each device is refreshed based on the latest hisi_sas_phy->port_id after a 1 second sleep. hisi_sas_phy->port_id is configured in the interrupt processing function phy_up_v3_hw(). However, in the directly attached scenario, phyup for some SATA disks sometimes takes more than 1s. In this case, an incorrect port id may be configured in hisi_sas_refresh_port_id(). As a result, all the internal I/Os fail and the disk is lost, as follows:

[10717.666565] hisi_sas_v3_hw 0000:74:02.0: phyup: phy1 link_rate=10(sata)
[10718.826813] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=63 task=00000000c1ab1c2b dev id=200 addr=5000000000000501 CQ hdr: 0x8000007 0xc8003f 0x0 0x0 Error info: 0x0 0x0 0x0 0x0
[10718.843428] sas: TMF task open reject failed 5000000000000501
[10718.849242] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=64 task=00000000c1ab1c2b dev id=200 addr=5000000000000501 CQ hdr: 0x8000007 0xc80040 0x0 0x0 Error info: 0x0 0x0 0x0 0x0
[10718.865856] sas: TMF task open reject failed 5000000000000501
[10718.871670] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=65 task=00000000c1ab1c2b dev id=200 addr=5000000000000501 CQ hdr: 0x8000007 0xc80041 0x0 0x0 Error info: 0x0 0x0 0x0 0x0
[10718.888284] sas: TMF task open reject failed 5000000000000501
[10718.894093] sas: executing TMF for 5000000000000501 failed after 3 attempts!
[10718.901114] hisi_sas_v3_hw 0000:74:02.0: ata disk 5000000000000501 reset failed
[10718.908410] hisi_sas_v3_hw 0000:74:02.0: controller reset complete
.....
[10773.298633] ata216.00: revalidation failed (errno=-19)
[10773.303753] ata216.00: disable device

So waiting 1s for the PHYs to come up may not be enough. To solve the issue, run hisi_sas_phy_enable() in parallel through async operations and use wait_for_completion_timeout() to wait for the PHYs to come up instead of sleeping for 1 second.
Signed-off-by: Yihang Li <liyihang9@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1679283265-115066-4-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
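The difference between a fixed sleep and a bounded wait can be illustrated with a small, single-threaded time model. It does not use the kernel completion API or real threads; the function names, the PHY link-up times and the 2500 ms bound used in the usage note are all hypothetical values chosen for illustration.

```c
#include <assert.h>

#define SKETCH_N_PHY 3

/* Fixed sleep: count PHYs whose link came up within sleep_ms. */
static int sketch_up_after_sleep(const int up_ms[SKETCH_N_PHY], int sleep_ms)
{
	int up = 0;

	for (int i = 0; i < SKETCH_N_PHY; i++)
		if (up_ms[i] <= sleep_ms)
			up++;
	return up;
}

/*
 * Bounded wait: with all PHYs enabled in parallel, the caller waits for
 * the slowest PHY, but never longer than timeout_ms.
 */
static int sketch_bounded_wait_ms(const int up_ms[SKETCH_N_PHY], int timeout_ms)
{
	int longest = 0;

	for (int i = 0; i < SKETCH_N_PHY; i++) {
		int w = up_ms[i] < timeout_ms ? up_ms[i] : timeout_ms;

		if (w > longest)
			longest = w;
	}
	return longest;
}
```

With link-up times of 200, 1400 and 600 ms, a 1000 ms sleep sees only two PHYs up, while a wait bounded at 2500 ms returns after 1400 ms with all three up.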
2023-04-03 | scsi: hisi_sas: Handle NCQ error when IPTT is valid | Xingui Yang | 3 | -3/+15
If an NCQ error occurs when the IPTT is valid and the slot->abort flag is set in the completion path, sas_task_abort() is currently called to abort only one NCQ command, and the host is set to the SHOST_RECOVERY state. But this may not kick off EH immediately; EH may not start until the other outstanding QCs time out. As a result, the host may remain in the SHOST_RECOVERY state for up to 30 seconds, as follows:

[7972317.645234] hisi_sas_v3_hw 0000:74:04.0: erroneous completion iptt=3264 task=00000000466116b8 dev id=2 sas_addr=0x5000000000000502 CQ hdr: 0x1883 0x20cc0 0x40000 0x20420000 Error info: 0x0 0x0 0x200000 0x0
[7972341.508264] sas: Enter sas_scsi_recover_host busy: 32 failed: 32
[7972341.984731] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 32 tries: 1

All NCQ commands that are in the queue should be aborted when an NCQ error occurs in this scenario.

Fixes: 05d91b557af9 ("scsi: hisi_sas: Directly trigger SCSI error handling for completion errors")
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1679283265-115066-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-03 | scsi: hisi_sas: Grab sas_dev lock when traversing the members of sas_dev.list | Xingui Yang | 5 | -13/+23
When freeing slots in function slot_complete_v3_hw(), it is possible that sas_dev.list is being traversed elsewhere, and it may trigger a NULL pointer exception, such as follows: ==>cq thread ==>scsi_eh_6 ==>scsi_error_handler() ==>sas_eh_handle_sas_errors() ==>sas_scsi_find_task() ==>lldd_abort_task() ==>slot_complete_v3_hw() ==>hisi_sas_abort_task() ==>hisi_sas_slot_task_free() ==>dereg_device_v3_hw() ==>list_del_init() ==>list_for_each_entry_safe() [ 7165.434918] sas: Enter sas_scsi_recover_host busy: 32 failed: 32 [ 7165.434926] sas: trying to find task 0x00000000769b5ba5 [ 7165.434927] sas: sas_scsi_find_task: aborting task 0x00000000769b5ba5 [ 7165.434940] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(00000000769b5ba5) aborted [ 7165.434964] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(00000000c9f7aa07) ignored [ 7165.434965] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(00000000e2a1cf01) ignored [ 7165.434968] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 7165.434972] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(0000000022d52d93) ignored [ 7165.434975] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(0000000066a7516c) ignored [ 7165.434976] Mem abort info: [ 7165.434982] ESR = 0x96000004 [ 7165.434991] Exception class = DABT (current EL), IL = 32 bits [ 7165.434992] SET = 0, FnV = 0 [ 7165.434993] EA = 0, S1PTW = 0 [ 7165.434994] Data abort info: [ 7165.434994] ISV = 0, ISS = 0x00000004 [ 7165.434995] CM = 0, WnR = 0 [ 7165.434997] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000f29543f2 [ 7165.434998] [0000000000000000] pgd=0000000000000000 [ 7165.435003] Internal error: Oops: 96000004 [#1] SMP [ 7165.439863] Process scsi_eh_6 (pid: 4109, stack limit = 0x00000000c43818d5) [ 7165.468862] pstate: 00c00009 (nzcv daif +PAN +UAO) [ 7165.473637] pc : dereg_device_v3_hw+0x68/0xa8 [hisi_sas_v3_hw] [ 7165.479443] lr : dereg_device_v3_hw+0x2c/0xa8 [hisi_sas_v3_hw] [ 7165.485247] sp : ffff00001d623bc0 [ 
7165.488546] x29: ffff00001d623bc0 x28: ffffa027d03b9508 [ 7165.493835] x27: ffff80278ed50af0 x26: ffffa027dd31e0a8 [ 7165.499123] x25: ffffa027d9b27f88 x24: ffffa027d9b209f8 [ 7165.504411] x23: ffffa027c45b0d60 x22: ffff80278ec07c00 [ 7165.509700] x21: 0000000000000008 x20: ffffa027d9b209f8 [ 7165.514988] x19: ffffa027d9b27f88 x18: ffffffffffffffff [ 7165.520276] x17: 0000000000000000 x16: 0000000000000000 [ 7165.525564] x15: ffff0000091d9708 x14: ffff0000093b7dc8 [ 7165.530852] x13: ffff0000093b7a23 x12: 6e7265746e692067 [ 7165.536140] x11: 0000000000000000 x10: 0000000000000bb0 [ 7165.541429] x9 : ffff00001d6238f0 x8 : ffffa027d877af00 [ 7165.546718] x7 : ffffa027d6329600 x6 : ffff7e809f58ca00 [ 7165.552006] x5 : 0000000000001f8a x4 : 000000000000088e [ 7165.557295] x3 : ffffa027d9b27fa8 x2 : 0000000000000000 [ 7165.562583] x1 : 0000000000000000 x0 : 000000003000188e [ 7165.567872] Call trace: [ 7165.570309] dereg_device_v3_hw+0x68/0xa8 [hisi_sas_v3_hw] [ 7165.575775] hisi_sas_abort_task+0x248/0x358 [hisi_sas_main] [ 7165.581415] sas_eh_handle_sas_errors+0x258/0x8e0 [libsas] [ 7165.586876] sas_scsi_recover_host+0x134/0x458 [libsas] [ 7165.592082] scsi_error_handler+0xb4/0x488 [ 7165.596163] kthread+0x134/0x138 [ 7165.599380] ret_from_fork+0x10/0x18 [ 7165.602940] Code: d5033e9f b9000040 aa0103e2 eb03003f (f9400021) [ 7165.609004] kernel fault(0x1) notification starting on CPU 75 [ 7165.700728] ---[ end trace fc042cbbea224efc ]--- [ 7165.705326] Kernel panic - not syncing: Fatal exception To fix the issue, grab sas_dev lock when traversing the members of sas_dev.list in dereg_device_v3_hw() and hisi_sas_release_tasks() to avoid concurrency of adding and deleting member. When function hisi_sas_release_tasks() calls hisi_sas_do_release_task() to free slot, the lock cannot be grabbed again in hisi_sas_slot_task_free(), then a bool parameter need_lock is added. 
Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1679283265-115066-2-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
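The need_lock parameter exists because the lock is not recursive: a caller that already holds it must not take it again in the free path. The sketch below models that with a toy lock that counts would-be deadlocks. All names (`sketch_lock`, `sketch_slot_free()`, `sketch_release_all()`) are hypothetical, and the list is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: a second acquire by the holder would deadlock. */
struct sketch_lock { bool held; int deadlocks; };

static void sketch_lock_acquire(struct sketch_lock *l)
{
	if (l->held)
		l->deadlocks++; /* a real spinlock would spin forever here */
	l->held = true;
}

static void sketch_lock_release(struct sketch_lock *l)
{
	l->held = false;
}

struct sketch_dev { struct sketch_lock lock; int nr_slots; };

/* Models hisi_sas_slot_task_free(): unlink one slot from the device list. */
static void sketch_slot_free(struct sketch_dev *d, bool need_lock)
{
	if (need_lock)
		sketch_lock_acquire(&d->lock);
	d->nr_slots--;
	if (need_lock)
		sketch_lock_release(&d->lock);
}

/* Models hisi_sas_release_tasks(): traverse the list with the lock held. */
static void sketch_release_all(struct sketch_dev *d)
{
	sketch_lock_acquire(&d->lock);
	while (d->nr_slots > 0)
		sketch_slot_free(d, false); /* lock already held by the caller */
	sketch_lock_release(&d->lock);
}
```

Passing `true` from inside `sketch_release_all()` would increment the deadlock counter, which is the situation the extra parameter avoids.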
2023-04-01 | Merge branch '6.3/scsi-fixes' into 6.4/scsi-staging | Martin K. Petersen | 1 | -2/+1
Pull in the fixes branch to resolve an mpi3mr conflict reported by sfr. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-25 | Merge patch series "Constify most SCSI host templates" | Martin K. Petersen | 4 | -4/+4
Bart Van Assche <bvanassche@acm.org> says: It helps humans and the compiler if it is made explicit that SCSI host templates are not modified. Hence this patch series that constifies most SCSI host templates. Please consider this patch series for the next merge window. Link: https://lore.kernel.org/r/20230322195515.1267197-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-25 | scsi: hisi_sas: Declare SCSI host template const | Bart Van Assche | 4 | -4/+4
Make it explicit that the SCSI host template is not modified. Acked-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230322195515.1267197-42-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-10 | scsi: hisi_sas: Add device attribute experimental_iopoll_q_cnt for v3 hw | Xiang Chen | 1 | -0/+13
Add device attribute experimental_iopoll_q_cnt to indicate how many iopoll queues are used for v3 hw. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1678169355-76215-5-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-10 | scsi: hisi_sas: Sync complete queue for poll queue | Xiang Chen | 3 | -22/+61
Currently we sync the IRQ to avoid freeing a task before it is used in I/O completion. After adding io_uring support, we need to do something similar for poll queues. As the processing of CQ entries on a poll queue is protected by the spinlock cq->lock, we can use spin_lock() + spin_unlock() on cq->lock to make sure that CQ entries are processed to completion, after which the completion queue is synced.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1678169355-76215-4-git-send-email-chenxiang66@hisilicon.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-10 | scsi: hisi_sas: Add poll support for v3 hw | Xiang Chen | 3 | -10/+82
Add a module parameter to set how many queues are used for iopoll. Also fill the interface mq_poll. For internal I/Os from libsas and libata we use non-iopoll queue (queue 0) to deliver and complete them. But for internal abort I/Os, just don't send them for poll queues. There is still a risk associated as this sends internal abort commands to non-iopoll queues which actually requires sending an internal abort command to every queue. As a result, make the module parameter as "experimental" for now. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1678169355-76215-3-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-10 | scsi: hisi_sas: Add function complete_v3_hw() | Xiang Chen | 1 | -5/+12
Put the work of processing cq slots in a separate function, complete_v3_hw(), which can then be used by cq_thread_v3_hw() and other functions when adding poll support. Co-developed-by: John Garry <john.garry@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1678169355-76215-2-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-07 | scsi: hisi_sas: Check devm_add_action() return value | Kang Chen | 1 | -2/+1
In case devm_add_action() fails, check it in the caller of interrupt_preinit_v3_hw(). Link: https://lore.kernel.org/r/20230227031030.893324-1-void0red@gmail.com Signed-off-by: Kang Chen <void0red@gmail.com> Acked-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-01-12 | scsi: hisi_sas: Set a port invalid only if there are no devices attached when refreshing port id | Yihang Li | 1 | -1/+1
Currently the driver sets the port invalid if one phy in the port is not enabled, which may cause issues in the expander scenario. In the directly attached scenario, if phy up doesn't occur in time when refreshing the port id, the port is incorrectly set to invalid, which also causes the disk to be lost. Therefore set a port invalid only if there are no devices attached to the port.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1672805000-141102-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-01-12 | scsi: hisi_sas: Use abort task set to reset SAS disks when discovered | Xingui Yang | 1 | -1/+1
Currently clear task set is used to abort all commands remaining in the disk when the SAS disk is discovered, and if the disk is discovered by two initiators, other I_T nexuses are also affected. So use abort task set instead and take effect only on the specified I_T nexus. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1672805000-141102-2-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-01-07 | scsi: hisi_sas: Fix tag freeing for reserved tags | Jason Yan | 1 | -1/+1
The reserved tags were put in the lower region of the tagset in commit f7d190a94e35 ("scsi: hisi_sas: Put reserved tags in lower region of tagset"). However, only the allocate function was changed, freeing was not handled. This resulted in a failure to boot: [ 33.467345] hisi_sas_v3_hw 0000:b4:02.0: task exec: failed[-132]! [ 33.473413] sas: Executing internal abort failed 5000000000000603 (-132) [ 33.480088] hisi_sas_v3_hw 0000:b4:02.0: I_T nexus reset: internal abort (-132) [ 33.657336] hisi_sas_v3_hw 0000:b4:02.0: task exec: failed[-132]! [ 33.663403] ata7.00: failed to IDENTIFY (I/O error, err_mask=0x40) [ 35.787344] hisi_sas_v3_hw 0000:b4:04.0: task exec: failed[-132]! [ 35.793411] sas: Executing internal abort failed 5000000000000703 (-132) [ 35.800084] hisi_sas_v3_hw 0000:b4:04.0: I_T nexus reset: internal abort (-132) [ 35.977335] hisi_sas_v3_hw 0000:b4:04.0: task exec: failed[-132]! [ 35.983403] ata10.00: failed to IDENTIFY (I/O error, err_mask=0x40) [ 35.989643] ata10.00: revalidation failed (errno=-5) Fixes: f7d190a94e35 ("scsi: hisi_sas: Put reserved tags in lower region of tagset") Cc: John Garry <john.g.garry@oracle.com> Cc: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Acked-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
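The bug class here is an allocator and a free routine disagreeing about which index range is driver-managed. A minimal sketch, assuming a hypothetical layout of 4 reserved tags at the bottom of a 16-tag space (`SKETCH_RESERVED`, `SKETCH_TOTAL` and the function names are invented for illustration, and the -1 failure value is not the driver's error code):

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_RESERVED 4  /* hypothetical number of reserved tags */
#define SKETCH_TOTAL   16  /* hypothetical size of the whole tag space */

struct sketch_tags { bool used[SKETCH_TOTAL]; };

/* Reserved tags live in [0, SKETCH_RESERVED) and are driver-managed. */
static int sketch_alloc_reserved(struct sketch_tags *t)
{
	for (int i = 0; i < SKETCH_RESERVED; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			return i;
		}
	}
	return -1; /* reserved region exhausted */
}

/*
 * Free must apply the same range test as alloc. Tags outside the
 * reserved region are assumed to be owned by the block layer and are
 * not tracked in this bitmap.
 */
static void sketch_free(struct sketch_tags *t, int idx)
{
	if (idx >= 0 && idx < SKETCH_RESERVED)
		t->used[idx] = false;
}
```

If the free routine still tested for the upper region, reserved tags would never be released and allocation would fail once the region filled up, which matches the repeated "task exec: failed" messages in the log above.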
2022-11-26 | scsi: hisi_sas: Fix SATA devices missing issue during I_T nexus reset | Jie Zhan | 1 | -3/+5
SATA devices on an expander may be removed and not be found again when I_T nexus reset and revalidation are processed simultaneously. The issue comes from:

- Revalidation can remove SATA devices in link reset, e.g. in hisi_sas_clear_nexus_ha().
- However, hisi_sas_debug_I_T_nexus_reset() polls the state of a SATA device on an expander after sending link_reset, where it calls:

    hisi_sas_debug_I_T_nexus_reset
      sas_ata_wait_after_reset
        ata_wait_after_reset
          ata_wait_ready
            smp_ata_check_ready
              sas_ex_phy_discover
                sas_ex_phy_discover_helper
                  sas_set_ex_phy

The ex_phy's change count is updated in sas_set_ex_phy(), so SATA devices after a link reset may not be found later through revalidation.

A similar issue was reported in:
commit 0f3fce5cc77e ("[SCSI] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready")
commit 87c8331fcf72 ("[SCSI] libsas: prevent domain rediscovery competing with ata error handling").

To address this issue, in hisi_sas_debug_I_T_nexus_reset(), we now call smp_ata_check_ready_type() that only polls the device type while not updating the ex_phy's data of libsas.

Fixes: 71453bd9d1bf ("scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus reset")
Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Link: https://lore.kernel.org/r/20221118083714.4034612-5-zhanjie9@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-11-26 | scsi: Revert "scsi: hisi_sas: Don't send bcast events from HW during nexus HA reset" | Jie Zhan | 1 | -12/+4
This reverts commit f5f2a2716055ad8c0c4ff83e51d667646c6c5d8a.

This is now unnecessary to solve the SATA devices missing issue in hisi_sas_clear_nexus_ha(). Hence, we should not ignore bcast events during sas_eh_handle_sas_errors() in case of missing bcast events, unless a justified need is found and a mechanism to defer (but not ignore) bcast events in sas_eh_handle_sas_errors() is provided.

Also, in hisi_sas_clear_nexus_ha(), there is nothing further to handle in "out: " other than return, so that part can be reverted.

Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
Link: https://lore.kernel.org/r/20221118083714.4034612-3-zhanjie9@hisilicon.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-11-26 | scsi: Revert "scsi: hisi_sas: Drain bcast events in hisi_sas_rescan_topology()" | Jie Zhan | 1 | -7/+0
This reverts commit 11ff0c98fca35df16c84d4eee52008faecaf10a6. Draining or flushing events in hisi_sas_rescan_topology() can hang the driver, typically with phy up or phy down events being processed, i.e. sas_porte_bytes_dmaed() or sas_phye_loss_of_signal(). Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com> Link: https://lore.kernel.org/r/20221118083714.4034612-2-zhanjie9@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-10-22 | scsi: hisi_sas: Put reserved tags in lower region of tagset | John Garry | 1 | -7/+7
To be consistent with blk-mq, put the reserved tags in the lower region of the tagset. Eventually we hope to get rid of all this reserved tag management. Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1666091763-11023-4-git-send-email-john.garry@huawei.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-10-22 | scsi: hisi_sas: Use sas_task_find_rq() | John Garry | 1 | -18/+8
Use sas_task_find_rq() to lookup the request per task for its driver tag. Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1666091763-11023-3-git-send-email-john.garry@huawei.com Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-10-18 | scsi: hisi_sas: Use sas_find_attathed_phy_id() instead of open coding it | Jason Yan | 1 | -11/+3
The attached phy finding is open coded. Replace it with sas_find_attached_phy_id(). To keep things consistent, the return value of hisi_sas_dev_found() is also changed to -ENODEV when the call to sas_find_attached_phy_id() fails.
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Link: https://lore.kernel.org/r/20220928070130.3657183-6-yanaijie@huawei.com
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-10-18 | scsi: hisi_sas: Modify v3 HW SATA disk error state completion processing | Xingui Yang | 1 | -1/+4
When an NCQ error occurs, the controller will abnormally complete the I/Os that are newly delivered to disk, and bit8 in CQ dw3 will be set which indicates that the SATA disk is in error state. The current processing flow is to set ts->stat to SAS_OPEN_REJECT and then sas_ata_task_done() will set FIS stat to ATA_ERR. After ata_eh_analyze_tf() analyzes the I/O, err_mask is set to AC_ERR_HSM. If a media error occurs four times within 10 minutes and the chip rejects new I/Os four times, NCQ will be disabled due to excessive errors, which is undesirable.

Therefore, use sas_task_abort() to handle abnormally completed I/Os when the SATA disk is in error state, as these abnormally completed I/Os are already processed by sas_ata_device_link_abort() and qc->flag is set to ATA_QCFLAG_FAILED. If sas_task_abort() is used, qc->err_mask will not be modified in EH. Unlike the current process flow, it will not increase the count of ECAT_TOUT_HSM and will not turn off NCQ. Like other I/Os on the disk that do not have an error but do not return after the NCQ error, they are retried after the EH.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1665998435-199946-5-git-send-email-john.garry@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-10-18scsi: hisi_sas: Add SATA_DISK_ERR bit handling for v3 hwXingui Yang3-3/+64
When CQ header dw3 SATA_DISK_ERR is set it means this SATA disk is in error state and the current IPTT is invalid. An invalid IPTT does not correspond to any slot. In this scenario, new I/Os that delivered to disk will be rejected by the controller and all I/Os remaining in the disk should be aborted, which we add here with the sas_ata_device_link_abort() call. In hisi_sas_abort_task() we don't want to issue a soft reset as it may cause info to be lost in the target disk for the ATA EH autopsy. In this case, just release resources - the disk won't return other I/Os normally after NCQ Error, so this is safe. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1665998435-199946-4-git-send-email-john.garry@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-10-18 | scsi: hisi_sas: Move slot variable definition in hisi_sas_abort_task() | Xingui Yang | 1 | -5/+3
Each branch currently defines a slot variable independently, and it is neater to move it to the function head. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1665998435-199946-3-git-send-email-john.garry@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-10-07 | Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi | Linus Torvalds | 5 | -25/+38
Pull SCSI updates from James Bottomley: "Updates to the usual drivers (qla2xxx, lpfc, ufs, hisi_sas, mpi3mr, mpt3sas, target). The biggest change (from my biased viewpoint) being that the mpi3mr now attached to the SAS transport class, making it the first fusion type device to do so. Beyond the usual bug fixing and security class reworks, there aren't a huge number of core changes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (141 commits) scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername() scsi: mpi3mr: Remove unnecessary cast scsi: stex: Properly zero out the passthrough command structure scsi: mpi3mr: Update driver version to 8.2.0.3.0 scsi: mpi3mr: Fix scheduling while atomic type bug scsi: mpi3mr: Scan the devices during resume time scsi: mpi3mr: Free enclosure objects during driver unload scsi: mpi3mr: Handle 0xF003 Fault Code scsi: mpi3mr: Graceful handling of surprise removal of PCIe HBA scsi: mpi3mr: Schedule IRQ kthreads only on non-RT kernels scsi: mpi3mr: Support new power management framework scsi: mpi3mr: Update mpi3 header files scsi: mpt3sas: Revert "scsi: mpt3sas: Fix ioc->base_readl() use" scsi: mpt3sas: Revert "scsi: mpt3sas: Fix writel() use" scsi: wd33c93: Remove dead code related to the long-gone config WD33C93_PIO scsi: core: Add I/O timeout count for SCSI device scsi: qedf: Populate sysfs attributes for vport scsi: pm8001: Replace one-element array with flexible-array member scsi: 3w-xxxx: Replace one-element array with flexible-array member scsi: hptiop: Replace one-element array with flexible-array member in struct hpt_iop_request_ioctl_command() ...
2022-09-07 | scsi: hisi_sas: Don't send bcast events from HW during nexus HA reset | John Garry | 1 | -4/+12
Remote devices may go missing in the per-device nexus reset part of the HA nexus reset, i.e. after the controller reset. This is because libsas may find the devices to be gone as the phy may be temporarily down when processing the bcast event generated from the nexus reset. Filter out bcast events during this time to stop the devices being lost. Link: https://lore.kernel.org/r/1662378529-101489-6-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
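One way to picture the filtering described here is a flag that is set for the duration of the reset and checked before a broadcast event is forwarded. The following is a minimal sketch under that assumption; the struct, field, and function names are hypothetical and do not reflect the driver's actual data structures.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the host controller state. */
struct hba_sketch {
    atomic_bool resetting;        /* true while the HA nexus reset runs */
    unsigned int bcast_forwarded; /* events passed on to the upper layer */
};

/*
 * Forward a broadcast event unless a reset is in progress.
 * Returns true if the event was forwarded, false if it was filtered out.
 */
static bool bcast_event_sketch(struct hba_sketch *hba)
{
    assert(hba != NULL);
    if (atomic_load(&hba->resetting))
        return false; /* phy may be temporarily down: drop the event */
    hba->bcast_forwarded++;
    return true;
}
```

Dropping the event is only safe here because, per the message, the events in question are side effects of the reset itself rather than genuine topology changes.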
2022-09-07 | scsi: hisi_sas: Add helper to process bcast events | John Garry | 5 | -13/+18
Add a helper for bcast processing to reduce duplication. Link: https://lore.kernel.org/r/1662378529-101489-5-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-07 | scsi: hisi_sas: Drain bcast events in hisi_sas_rescan_topology() | John Garry | 1 | -0/+7
In resetting the controller, SATA devices may be lost. The issue is that after we insert the bcast events to rescan the topology in hisi_sas_rescan_topology() and then nexus reset the SATA devices in hisi_sas_async_I_T_nexus_reset(), there is a small timing window in which the remote phy is down while we process the bcast event (meaning that libsas judges that the disk is lost). Ensure that all bcast events are processed prior to the nexus reset to close this window. Link: https://lore.kernel.org/r/1662378529-101489-4-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-07 | scsi: hisi_sas: Clear HISI_SAS_HW_FAULT_BIT earlier | John Garry | 1 | -1/+1
Once the controller HW has been reset then we can unset flag HISI_SAS_HW_FAULT_BIT. In clearing this flag earlier we can now successfully execute commands in hisi_sas_controller_reset_done(), like bcast processing. Link: https://lore.kernel.org/r/1662378529-101489-3-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-07 | scsi: hisi_sas: Revert change to limit max hw sectors for v3 HW | John Garry | 1 | -7/+0
Now that libsas and the SCSI core code limit the default sectors from commit 4cbfca5f7750 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit") and commit 608128d391fa ("scsi: sd: allow max_sectors be capped at DMA optimal size limit"), there is no need for the hack to limit the max HW sectors. Link: https://lore.kernel.org/r/1662378529-101489-2-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-08-22 | block: Change the return type of blk_mq_map_queues() into void | Bart Van Assche | 2 | -7/+3
Since blk_mq_map_queues() and the .map_queues() callbacks always return 0, change their return type into void. Most callers ignore the returned value anyway. Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Wang <jasowang@redhat.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Doug Gilbert <dgilbert@interlog.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.garry@huawei.com> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220815170043.19489-3-bvanassche@acm.org [axboe: fold in fix from Bart] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-19 | scsi: hisi_sas: Modify v3 HW SATA completion error processing | Xingui Yang | 1 | -1/+8
If the I/O completion response frame returned by the target device has been written to host memory and the ERR bit in the status field of the received FIS is 1, ts->stat should be set to SAS_PROTO_RESPONSE, which lets EH analyze and further determine the cause of failure. Link: https://lore.kernel.org/r/1657823002-139010-5-git-send-email-john.garry@huawei.com Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
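The condition in this message amounts to a two-part check: the response frame was written to host memory, and the ERR bit of the FIS status byte is set. A rough sketch follows, assuming a Register Device-to-Host FIS where the status byte is at offset 2 and ERR is bit 0 of the ATA status register. The enum and function names are hypothetical, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ATA_STATUS_ERR   0x01u /* ERR is bit 0 of the ATA status register */
#define D2H_FIS_STATUS   2     /* status byte offset in a Register D2H FIS */

/* Hypothetical stand-ins for the task status values named in the message. */
enum stat_sketch { STAT_UNCHANGED, STAT_PROTO_RESPONSE };

/*
 * rsp_written: the completion says the response frame reached host memory.
 * d2h_fis:     the received Register Device-to-Host FIS.
 */
static enum stat_sketch sata_completion_stat(bool rsp_written,
                                             const uint8_t *d2h_fis)
{
    assert(d2h_fis != NULL);
    if (rsp_written && (d2h_fis[D2H_FIS_STATUS] & ATA_STATUS_ERR))
        return STAT_PROTO_RESPONSE; /* let EH inspect the returned FIS */
    return STAT_UNCHANGED;
}
```

Reporting a protocol response rather than a generic failure matters because it tells the upper layer that a valid FIS is available for error analysis.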
2022-07-19 | scsi: hisi_sas: Relocate DMA unmap of SMP task | Xiang Chen | 4 | -7/+5
Currently SMP tasks are DMA unmapped only when the CQ entry of the SMP I/O is returned normally. If the CQ entry of the SMP I/O is returned with an exception, the SMP task is never unmapped. Relocate the DMA unmap of the SMP task to fix the issue. Link: https://lore.kernel.org/r/1657823002-139010-4-git-send-email-john.garry@huawei.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-19 | scsi: hisi_sas: Remove unnecessary variable to hold DMA map elements | Xiang Chen | 1 | -25/+18
Use slot->n_elem to store the return value of dma_map_sg() for SSP and SMP IOs, and remove unnecessary variable n_elem_req. Link: https://lore.kernel.org/r/1657823002-139010-3-git-send-email-john.garry@huawei.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-19 | scsi: hisi_sas: Call hisi_sas_slave_configure() from slave_configure_v3_hw() | John Garry | 1 | -4/+1
There is duplicated code between slave_configure_v3_hw() and hisi_sas_slave_configure(), so call common function hisi_sas_slave_configure() from slave_configure_v3_hw(). Link: https://lore.kernel.org/r/1657823002-139010-2-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-08 | Merge branch '5.19/scsi-fixes' into 5.20/scsi-staging | Martin K. Petersen | 1 | -0/+7
Bring in fixes to resolve a merge conflict in the lpfc driver update. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-28 | scsi: hisi_sas: Limit max hw sectors for v3 HW | John Garry | 1 | -0/+7
If the controller is behind an IOMMU then the IOMMU IOVA caching range can affect performance, as discussed in [0]. Limit the max HW sectors to not exceed this limit. We need to hardcode the value until a proper DMA mapping API is available. [0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leizhen@huawei.com/ Link: https://lore.kernel.org/r/1655988119-223714-1-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-06-22 | scsi: hisi_sas: Align comments | Jiang Jian | 1 | -2/+2
Properly align comment lines in slot_index_alloc_quirk_v2_hw(). Link: https://lore.kernel.org/r/20220621072405.34394-1-jiangjian@cdjrlc.com Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-20 | scsi: hisi_sas: Fix memory ordering in hisi_sas_task_deliver() | John Garry | 1 | -0/+2
The memories for the slot should be observed to be written prior to observing the slot as ready. Prior to commit 26fc0ea74fcb ("scsi: libsas: Drop SAS_TASK_AT_INITIATOR"), we had a spin_lock() + spin_unlock() immediately before marking the slot as ready. The spin_unlock() - with release semantics - caused the slot memory to be observed to be written. Now that the spin_lock() + spin_unlock() is gone, use a smp_wmb(). Link: https://lore.kernel.org/r/1652774661-12935-1-git-send-email-john.garry@huawei.com Fixes: 26fc0ea74fcb ("scsi: libsas: Drop SAS_TASK_AT_INITIATOR") Reported-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Yihang Li <liyihang6@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
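The ordering requirement can be illustrated in portable C11, with a release fence as a rough userspace analogue of the kernel's smp_wmb(). The two are not identical: a release fence orders more than stores. The slot layout and function names below are hypothetical, and this is a sketch of the publish/consume pattern rather than the driver code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_WORDS 4

/* Hypothetical slot: payload words plus a "ready" marker. */
struct slot_sketch {
    uint32_t cmd_hdr[HDR_WORDS]; /* stands in for the slot memories */
    atomic_int ready;            /* 0 = not ready, 1 = ready */
};

/* Writer: fill the slot, then publish it. */
static void slot_publish(struct slot_sketch *s, const uint32_t hdr[HDR_WORDS])
{
    assert(s != NULL && hdr != NULL);
    for (size_t i = 0; i < HDR_WORDS; i++)
        s->cmd_hdr[i] = hdr[i];
    /* Order the payload stores before the ready store (analogue of smp_wmb()). */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->ready, 1, memory_order_relaxed);
}

/* Reader: returns 1 and copies the payload only if the slot is ready. */
static int slot_consume(struct slot_sketch *s, uint32_t out[HDR_WORDS])
{
    assert(s != NULL && out != NULL);
    if (!atomic_load_explicit(&s->ready, memory_order_acquire))
        return 0;
    for (size_t i = 0; i < HDR_WORDS; i++)
        out[i] = s->cmd_hdr[i];
    return 1;
}
```

Without the fence, a reader on another CPU could observe the ready marker before the payload stores, which is the hazard the message says the earlier spin_unlock() used to prevent.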
2022-05-20 | scsi: hisi_sas: Fix rescan after deleting a disk | John Garry | 1 | -29/+18
Removing an ATA device via sysfs means that the device may not be found through re-scanning:

    root@ubuntu:/home/john# lsscsi
    [0:0:0:0] disk SanDisk LT0200MO P404 /dev/sda
    [0:0:1:0] disk ATA HGST HUS724040AL A8B0 /dev/sdb
    [0:0:8:0] enclosu 12G SAS Expander RevB -
    root@ubuntu:/home/john# echo 1 > /sys/block/sdb/device/delete
    root@ubuntu:/home/john# echo "- - -" > /sys/class/scsi_host/host0/scan
    root@ubuntu:/home/john# lsscsi
    [0:0:0:0] disk SanDisk LT0200MO P404 /dev/sda
    [0:0:8:0] enclosu 12G SAS Expander RevB -
    root@ubuntu:/home/john#

The problem is that the rescan of the device may conflict with the device being re-initialized, as follows:

- In the rescan we call hisi_sas_slave_alloc() in store_scan() -> sas_user_scan() -> [__]scsi_scan_target() -> scsi_probe_and_add_lun() -> scsi_alloc_sdev() -> hisi_sas_slave_alloc() -> hisi_sas_init_device(). In hisi_sas_init_device() we issue an IT nexus reset for ATA devices
- That IT nexus reset causes the remote PHY to go down and this triggers a bcast event
- In parallel libsas processes the bcast event, finds that the phy is down and marks the device as gone

The hard reset issued in hisi_sas_init_device() is unnecessary - as described in the code comment - so remove it. Also set the dev status to HISI_SAS_DEV_NORMAL in the hisi_sas_init_device() call.

Link: https://lore.kernel.org/r/1652354134-171343-4-git-send-email-john.garry@huawei.com
Fixes: 36c6b7613ef1 ("scsi: hisi_sas: Initialise devices in .slave_alloc callback")
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-20 | scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus reset | John Garry | 1 | -7/+12
We have seen errors like this when a SATA device is probed: [524.566298] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=4096 ... [524.582827] sas: TMF task open reject failed 500e004aaaaaaaa00 Since commit 21c7e972475e ("scsi: hisi_sas: Disable SATA disk phy for severe I_T nexus reset failure"), we issue an ATA softreset to disks after a phy reset to ensure that they are in sound working order. If the softreset is issued before the remote phy has come back up then the softreset will fail (errors as above). Remedy this by waiting for the phy to come back up after the reset. Link: https://lore.kernel.org/r/1652354134-171343-3-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
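The remedy is a wait-until-ready step between the phy reset and the softreset. The sketch below shows the general shape of such a bounded wait. It is not the libsas sas_ata_wait_after_reset() implementation, and every name in it, including the fake phy used to exercise it, is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical readiness probe: returns true once the phy is back up. */
typedef bool (*phy_up_fn)(void *ctx);

/*
 * Poll the phy up to max_polls times.
 * Returns 0 once the phy is up, -1 if it never came up in time.
 */
static int wait_for_phy_up(phy_up_fn phy_is_up, void *ctx,
                           unsigned int max_polls)
{
    assert(phy_is_up != NULL);
    for (unsigned int i = 0; i < max_polls; i++) {
        if (phy_is_up(ctx))
            return 0;
        /* A real driver would sleep between polls instead of spinning. */
    }
    return -1;
}

/* Fake phy for illustration: reports "down" for a fixed number of polls. */
struct fake_phy {
    unsigned int polls_until_up;
};

static bool fake_phy_is_up(void *ctx)
{
    struct fake_phy *phy = ctx;

    if (phy->polls_until_up == 0)
        return true;
    phy->polls_until_up--;
    return false;
}
```

Only when the wait returns success would the caller go on to issue the softreset; on timeout it would report the reset as failed.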
2022-05-11 | scsi: hisi_sas: Undo RPM resume for failed notify phy event for v3 HW | Xiang Chen | 1 | -2/+8
If we fail to notify the phy up event then undo the RPM resume, as the phy up notify event handling pairs with that RPM resume. Link: https://lore.kernel.org/r/1651839939-101188-1-git-send-email-john.garry@huawei.com Reported-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Yihang Li <liyihang6@hisilicon.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
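The pairing described here follows the usual acquire-then-undo-on-failure pattern for reference counts. Below is a minimal sketch with a plain counter standing in for the runtime PM usage count. All names are hypothetical, and the real driver uses the kernel's runtime PM helpers rather than anything shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with a usage count standing in for runtime PM state. */
struct dev_sketch {
    int usage_count;
};

static void rpm_get_sketch(struct dev_sketch *d)
{
    d->usage_count++;
}

static void rpm_put_sketch(struct dev_sketch *d)
{
    assert(d->usage_count > 0);
    d->usage_count--;
}

/*
 * Handle a phy-up interrupt. On success the reference is kept and is
 * dropped later by the phy-up event handler. On failure nobody will run
 * that handler, so the reference must be dropped here.
 */
static int phy_up_sketch(struct dev_sketch *d, bool notify_ok)
{
    assert(d != NULL);
    rpm_get_sketch(d);
    if (!notify_ok) {
        rpm_put_sketch(d); /* undo: the paired handler will never run */
        return -1;
    }
    return 0;
}
```

Without the undo, each failed notification would leak one reference and the device could never runtime suspend again.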
2022-03-30 | scsi: hisi_sas: Remove stray fallthrough annotation | Dan Carpenter | 1 | -1/+0
This case statement doesn't fall through any more so remove the fallthrough annotation. Link: https://lore.kernel.org/r/20220317075214.GC25237@kili Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-15 | scsi: hisi_sas: Use libsas internal abort support | John Garry | 4 | -319/+171
Use the common libsas internal abort functionality. In addition, this driver has special handling for internal abort timeouts - specifically whether to reset the controller in that instance - so extend the API for that. The timeout is now increased to 20 * HZ from 6 * HZ. We also retry on failure now, but this should not make a difference. Link: https://lore.kernel.org/r/1647001432-239276-5-git-send-email-john.garry@huawei.com Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>