summaryrefslogtreecommitdiff
path: root/drivers/scsi
AgeCommit message (Collapse)AuthorFilesLines
2024-04-03scsi: libsas: Add a helper sas_get_sas_addr_and_dev_type()Xingui Yang1-7/+12
commit a57345279fd311ba679b8083feb0eec5272c7729 upstream. Add a helper to get attached_sas_addr and device type from disc_resp. Suggested-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Xingui Yang <yangxingui@huawei.com> Link: https://lore.kernel.org/r/20240307141413.48049-2-yangxingui@huawei.com Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03scsi: lpfc: Correct size for wqe for memset()Muhammad Usama Anjum1-1/+1
commit 28d41991182c210ec1654f8af2e140ef4cc73f20 upstream. The wqe is of type lpfc_wqe128. It should be memset with the same type. Fixes: 6c621a2229b0 ("scsi: lpfc: Separate NVMET RQ buffer posting from IO resources SGL/iocbq/context") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240304090649.833953-1-usama.anjum@collabora.com Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Justin Tee <justintee8345@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03scsi: lpfc: Correct size for cmdwqe/rspwqe for memset()Muhammad Usama Anjum1-2/+2
commit 16cc2ba71b9f6440805aef7f92ba0f031f79b765 upstream. The cmdwqe and rspwqe are of type lpfc_wqe128. They should be memset() with the same type. Fixes: 61910d6a5243 ("scsi: lpfc: SLI path split: Refactor CT paths") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240304091119.847060-1-usama.anjum@collabora.com Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03scsi: qla2xxx: Delay I/O Abort on PCI errorQuinn Tran1-2/+12
commit 591c1fdf2016d118b8fbde427b796fac13f3f070 upstream. Currently when PCI error is detected, I/O is aborted manually through the ABORT IOCB mechanism which is not guaranteed to succeed. Instead, wait for the OS or system to notify driver to wind down I/O through the pci_error_handlers api. Set eeh_busy flag to pause all traffic and wait for I/O to drain. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-11-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03scsi: qla2xxx: Change debug message during driver unloadSaurav Kashyap1-1/+1
commit b5a30840727a3e41d12a336d19f6c0716b299161 upstream. Upon driver unload, purge_mbox flag is set and the heartbeat monitor thread detects this flag and does not send the mailbox command down to FW with a debug message "Error detected: purge[1] eeh[0] cmd=0x0, Exiting". This being not a real error, change the debug message. Cc: stable@vger.kernel.org Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-10-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03scsi: qla2xxx: Fix double free of fcportSaurav Kashyap1-2/+3
commit 82f522ae0d97119a43da53e0f729275691b9c525 upstream. The server was crashing after LOGO because fcport was getting freed twice. -----------[ cut here ]----------- kernel BUG at mm/slub.c:371! invalid opcode: 0000 1 SMP PTI CPU: 35 PID: 4610 Comm: bash Kdump: loaded Tainted: G OE --------- - - 4.18.0-425.3.1.el8.x86_64 #1 Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 09/03/2021 RIP: 0010:set_freepointer.part.57+0x0/0x10 RSP: 0018:ffffb07107027d90 EFLAGS: 00010246 RAX: ffff9cb7e3150000 RBX: ffff9cb7e332b9c0 RCX: ffff9cb7e3150400 RDX: 0000000000001f37 RSI: 0000000000000000 RDI: ffff9cb7c0005500 RBP: fffff693448c5400 R08: 0000000080000000 R09: 0000000000000009 R10: 0000000000000000 R11: 0000000000132af0 R12: ffff9cb7c0005500 R13: ffff9cb7e3150000 R14: ffffffffc06990e0 R15: ffff9cb7ea85ea58 FS: 00007ff6b79c2740(0000) GS:ffff9cb8f7ec0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055b426b7d700 CR3: 0000000169c18002 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: kfree+0x238/0x250 qla2x00_els_dcmd_sp_free+0x20/0x230 [qla2xxx] ? qla24xx_els_dcmd_iocb+0x607/0x690 [qla2xxx] qla2x00_issue_logo+0x28c/0x2a0 [qla2xxx] ? qla2x00_issue_logo+0x28c/0x2a0 [qla2xxx] ? kernfs_fop_write+0x11e/0x1a0 Remove one of the free calls and add check for valid fcport. Also use function qla2x00_free_fcport() instead of kfree(). Cc: stable@vger.kernel.org Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-9-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03scsi: qla2xxx: Fix double free of the ha->vp_map pointerSaurav Kashyap1-0/+1
commit e288285d47784fdcf7c81be56df7d65c6f10c58b upstream. Coverity scan reported potential risk of double free of the pointer ha->vp_map. ha->vp_map was freed in qla2x00_mem_alloc(), and again freed in function qla2x00_mem_free(ha). Assign NULL to vp_map and kfree take care of NULL. Cc: stable@vger.kernel.org Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-8-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03scsi: qla2xxx: Fix command flush on cable pullQuinn Tran1-0/+10
commit a27d4d0e7de305def8a5098a614053be208d1aa1 upstream. System crash due to command failed to flush back to SCSI layer. BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 27 PID: 793455 Comm: kworker/u130:6 Kdump: loaded Tainted: G OE --------- - - 4.18.0-372.9.1.el8.x86_64 #1 Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 09/03/2021 Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc] RIP: 0010:__wake_up_common+0x4c/0x190 Code: 24 10 4d 85 c9 74 0a 41 f6 01 04 0f 85 9d 00 00 00 48 8b 43 08 48 83 c3 08 4c 8d 48 e8 49 8d 41 18 48 39 c3 0f 84 f0 00 00 00 <49> 8b 41 18 89 54 24 08 31 ed 4c 8d 70 e8 45 8b 29 41 f6 c5 04 75 RSP: 0018:ffff95f3e0cb7cd0 EFLAGS: 00010086 RAX: 0000000000000000 RBX: ffff8b08d3b26328 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8b08d3b26320 RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffffffffffe8 R10: 0000000000000000 R11: ffff95f3e0cb7a60 R12: ffff95f3e0cb7d20 R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8b2fdf6c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000002f1e410002 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: __wake_up_common_lock+0x7c/0xc0 qla_nvme_ls_req+0x355/0x4c0 [qla2xxx] qla2xxx [0000:12:00.1]-f084:3: qlt_free_session_done: se_sess 0000000000000000 / sess ffff8ae1407ca000 from port 21:32:00:02:ac:07:ee:b8 loop_id 0x02 s_id 01:02:00 logout 1 keep 0 els_logo 0 ? __nvme_fc_send_ls_req+0x260/0x380 [nvme_fc] qla2xxx [0000:12:00.1]-207d:3: FCPort 21:32:00:02:ac:07:ee:b8 state transitioned from ONLINE to LOST - portid=010200. ? nvme_fc_send_ls_req.constprop.42+0x1a/0x45 [nvme_fc] qla2xxx [0000:12:00.1]-2109:3: qla2x00_schedule_rport_del 21320002ac07eeb8. rport ffff8ae598122000 roles 1 ? nvme_fc_connect_ctrl_work.cold.63+0x1e3/0xa7d [nvme_fc] qla2xxx [0000:12:00.1]-f084:3: qlt_free_session_done: se_sess 0000000000000000 / sess ffff8ae14801e000 from port 21:32:01:02:ad:f7:ee:b8 loop_id 0x04 s_id 01:02:01 logout 1 keep 0 els_logo 0 ? __switch_to+0x10c/0x450 ? process_one_work+0x1a7/0x360 qla2xxx [0000:12:00.1]-207d:3: FCPort 21:32:01:02:ad:f7:ee:b8 state transitioned from ONLINE to LOST - portid=010201. ? worker_thread+0x1ce/0x390 ? create_worker+0x1a0/0x1a0 qla2xxx [0000:12:00.1]-2109:3: qla2x00_schedule_rport_del 21320102adf7eeb8. rport ffff8ae3b2312800 roles 70 ? kthread+0x10a/0x120 qla2xxx [0000:12:00.1]-2112:3: qla_nvme_unregister_remote_port: unregister remoteport on ffff8ae14801e000 21320102adf7eeb8 ? set_kthread_struct+0x40/0x40 qla2xxx [0000:12:00.1]-2110:3: remoteport_delete of ffff8ae14801e000 21320102adf7eeb8 completed. ? ret_from_fork+0x1f/0x40 qla2xxx [0000:12:00.1]-f086:3: qlt_free_session_done: waiting for sess ffff8ae14801e000 logout The system was under memory stress where driver was not able to allocate an SRB to carry out error recovery of cable pull. The failure to flush causes upper layer to start modifying scsi_cmnd. When the system frees up some memory, the subsequent cable pull trigger another command flush. At this point the driver access a null pointer when attempting to DMA unmap the SGL. Add a check to make sure commands are flush back on session tear down to prevent the null pointer access. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-7-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03scsi: qla2xxx: NVME|FCP prefer flag not being honoredQuinn Tran1-0/+18
commit 69aecdd410106dc3a8f543a4f7ec6379b995b8d0 upstream. Changing of [FCP|NVME] prefer flag in flash has no effect on driver. For device that supports both FCP + NVMe over the same connection, driver continues to connect to this device using the previous successful login mode. On completion of flash update, adapter will be reset. Driver will reset the prefer flag based on setting from flash. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-6-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03scsi: qla2xxx: Update manufacturer detailBikash Hazarika1-1/+1
commit 688fa069fda6fce24d243cddfe0c7024428acb74 upstream. Update manufacturer detail from "Marvell Semiconductor, Inc." to "Marvell". Cc: stable@vger.kernel.org Signed-off-by: Bikash Hazarika <bhazarika@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-5-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03scsi: qla2xxx: Split FCE|EFT trace controlQuinn Tran1-61/+41
commit 76a192e1a566e15365704b9f8fb3b70825f85064 upstream. Current code combines the allocation of FCE|EFT trace buffers and enables the features all in 1 step. Split this step into separate steps in preparation for follow-on patch to allow user to have a choice to enable / disable FCE trace feature. Cc: stable@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-4-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03scsi: qla2xxx: Fix N2N stuck connectionQuinn Tran3-23/+13
commit 881eb861ca3877300570db10abbf11494e48548d upstream. Disk failed to rediscover after chip reset error injection. The chip reset happens at the time when a PLOGI is being sent. This causes a flag to be left on which blocks the retry. Clear the blocking flag. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-3-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03scsi: qla2xxx: Prevent command send on chip resetQuinn Tran2-4/+37
commit 4895009c4bb72f71f2e682f1e7d2c2d96e482087 upstream. Currently IOCBs are allowed to push through while chip reset could be in progress. During chip reset the outstanding_cmds array is cleared twice. Once when any command on this array is returned as failed and secondly when the array is initialize to zero. If a command is inserted on to the array between these intervals, then the command will be lost. Check for chip reset before sending IOCB. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20240227164127.36465-2-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03scsi: core: Fix unremoved procfs host directory regressionGuilherme G. Piccoli1-3/+4
commit f23a4d6e07570826fe95023ca1aa96a011fa9f84 upstream. Commit fc663711b944 ("scsi: core: Remove the /proc/scsi/${proc_name} directory earlier") fixed a bug related to modules loading/unloading, by adding a call to scsi_proc_hostdir_rm() on scsi_remove_host(). But that led to a potential duplicate call to the hostdir_rm() routine, since it's also called from scsi_host_dev_release(). That triggered a regression report, which was then fixed by commit be03df3d4bfe ("scsi: core: Fix a procfs host directory removal regression"). The fix just dropped the hostdir_rm() call from dev_release(). But it happens that this proc directory is created on scsi_host_alloc(), and that function "pairs" with scsi_host_dev_release(), while scsi_remove_host() pairs with scsi_add_host(). In other words, it seems the reason for removing the proc directory on dev_release() was meant to cover cases in which a SCSI host structure was allocated, but the call to scsi_add_host() didn't happen. And that pattern happens to exist in some error paths, for example. Syzkaller causes that by using USB raw gadget device, error'ing on usb-storage driver, at usb_stor_probe2(). By checking that path, we can see that the BadDevice label leads to a scsi_host_put() after a SCSI host allocation, but there's no call to scsi_add_host() in such path. That leads to messages like this in dmesg (and a leak of the SCSI host proc structure): usb-storage 4-1:87.51: USB Mass Storage device detected proc_dir_entry 'scsi/usb-storage' already registered WARNING: CPU: 1 PID: 3519 at fs/proc/generic.c:377 proc_register+0x347/0x4e0 fs/proc/generic.c:376 The proper fix seems to still call scsi_proc_hostdir_rm() on dev_release(), but guard that with the state check for SHOST_CREATED; there is even a comment in scsi_host_dev_release() detailing that: such conditional is meant for cases where the SCSI host was allocated but there was no calls to {add,remove}_host(), like the usb-storage case. This is what we propose here and with that, the error path of usb-storage does not trigger the warning anymore. Reported-by: syzbot+c645abf505ed21f931b5@syzkaller.appspotmail.com Fixes: be03df3d4bfe ("scsi: core: Fix a procfs host directory removal regression") Cc: stable@vger.kernel.org Cc: Bart Van Assche <bvanassche@acm.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Link: https://lore.kernel.org/r/20240313113006.2834799-1-gpiccoli@igalia.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-03scsi: sd: Fix TCG OPAL unlock on system resumeDamien Le Moal2-4/+53
commit 0c76106cb97548810214def8ee22700bbbb90543 upstream. Commit 3cc2ffe5c16d ("scsi: sd: Differentiate system and runtime start/stop management") introduced the manage_system_start_stop scsi_device flag to allow libata to indicate to the SCSI disk driver that nothing should be done when resuming a disk on system resume. This change turned the execution of sd_resume() into a no-op for ATA devices on system resume. While this solved deadlock issues during device resume, this change also wrongly removed the execution of opal_unlock_from_suspend(). As a result, devices with TCG OPAL locking enabled remain locked and inaccessible after a system resume from sleep. To fix this issue, introduce the SCSI driver resume method and implement it with the sd_resume() function calling opal_unlock_from_suspend(). The former sd_resume() function is renamed to sd_resume_common() and modified to call the new sd_resume() function. For non-ATA devices, this result in no functional changes. In order for libata to explicitly execute sd_resume() when a device is resumed during system restart, the function scsi_resume_device() is introduced. libata calls this function from the revalidation work executed on devie resume, a state that is indicated with the new device flag ATA_DFLAG_RESUMING. Doing so, locked TCG OPAL enabled devices are unlocked on resume, allowing normal operation. Fixes: 3cc2ffe5c16d ("scsi: sd: Differentiate system and runtime start/stop management") Link: https://bugzilla.kernel.org/show_bug.cgi?id=218538 Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240319071209.1179257-1-dlemoal@kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-27scsi: bfa: Fix function pointer type mismatch for hcb_qe->cbfnArnd Bergmann4-13/+19
[ Upstream commit b69600231f751304db914c63b937f7098ed2895c ] Some callback functions used here take a boolean argument, others take a status argument. This breaks KCFI type checking, so clang now warns about the function pointer cast: drivers/scsi/bfa/bfad_bsg.c:2138:29: error: cast from 'void (*)(void *, enum bfa_status)' to 'bfa_cb_cbfn_t' (aka 'void (*)(void *, enum bfa_boolean)') converts to incompatible function type [-Werror,-Wcast-function-type-strict] Assuming the code is actually correct here and the callers always match the argument types of the callee, rework this to replace the explicit cast with a union of the two pointer types. This does not change the behavior of the code, so if something is actually broken here, a larger rework may be necessary. Fixes: 37ea0558b87a ("[SCSI] bfa: Added support to collect and reset fcport stats") Fixes: 3ec4f2c8bff2 ("[SCSI] bfa: Added support to configure QOS and collect stats.") Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240222124433.2046570-1-arnd@kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27scsi: csiostor: Avoid function pointer castsArnd Bergmann3-19/+20
[ Upstream commit 9f3dbcb5632d6876226031d552ef6163bb3ad215 ] csiostor uses function pointer casts to keep the csio_ln_ev state machine hidden, but this causes warnings about control flow integrity (KCFI) violations in clang-16 and higher: drivers/scsi/csiostor/csio_lnode.c:1098:33: error: cast from 'void (*)(struct csio_lnode *, enum csio_ln_ev)' to 'csio_sm_state_t' (aka 'void (*)(void *, unsigned int)') converts to incompatible function type [-Werror,-Wcast-function-type-strict] 1098 | return (csio_get_state(ln) == ((csio_sm_state_t)csio_lns_ready)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/scsi/csiostor/csio_lnode.c:1369:29: error: cast from 'void (*)(struct csio_lnode *, enum csio_ln_ev)' to 'csio_sm_state_t' (aka 'void (*)(void *, unsigned int)') converts to incompatible function type [-Werror,-Wcast-function-type-strict] 1369 | if (csio_get_state(ln) == ((csio_sm_state_t)csio_lns_uninit)) { | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/scsi/csiostor/csio_lnode.c:1373:29: error: cast from 'void (*)(struct csio_lnode *, enum csio_ln_ev)' to 'csio_sm_state_t' (aka 'void (*)(void *, unsigned int)') converts to incompatible function type [-Werror,-Wcast-function-type-strict] 1373 | if (csio_get_state(ln) == ((csio_sm_state_t)csio_lns_ready)) { | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/scsi/csiostor/csio_lnode.c:1377:29: error: cast from 'void (*)(struct csio_lnode *, enum csio_ln_ev)' to 'csio_sm_state_t' (aka 'void (*)(void *, unsigned int)') converts to incompatible function type [-Werror,-Wcast-function-type-strict] 1377 | if (csio_get_state(ln) == ((csio_sm_state_t)csio_lns_offline)) { | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Move the enum into a shared header so the correct types can be used without the need for casts. Fixes: a3667aaed569 ("[SCSI] csiostor: Chelsio FCoE offload driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240213100518.457623-1-arnd@kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-27scsi: mpt3sas: Prevent sending diag_reset when the controller is readyRanjan Kumar1-1/+3
[ Upstream commit ee0017c3ed8a8abfa4d40e42f908fb38c31e7515 ] If the driver detects that the controller is not ready before sending the first IOC facts command, it will wait for a maximum of 10 seconds for it to become ready. However, even if the controller becomes ready within 10 seconds, the driver will still issue a diagnostic reset. Modify the driver to avoid sending a diag reset if the controller becomes ready within the 10-second wait time. Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20240221071724.14986-1-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01scsi: jazz_esp: Only build if SCSI core is builtinRandy Dunlap1-1/+1
[ Upstream commit 9ddf190a7df77b77817f955fdb9c2ae9d1c9c9a3 ] JAZZ_ESP is a bool kconfig symbol that selects SCSI_SPI_ATTRS. When CONFIG_SCSI=m, this results in SCSI_SPI_ATTRS=m while JAZZ_ESP=y, which causes many undefined symbol linker errors. Fix this by only offering to build this driver when CONFIG_SCSI=y. [mkp: JAZZ_ESP is unique in that it does not support being compiled as a module unlike the remaining SPI SCSI HBA drivers] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240214055953.9612-1-rdunlap@infradead.org Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: linux-mips@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org Cc: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402112222.Gl0udKyU-lkp@intel.com/ Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01scsi: smartpqi: Fix disable_managed_interruptsDon Brace1-1/+4
[ Upstream commit 5761eb9761d2d5fe8248a9b719efc4d8baf1f24a ] Correct blk-mq registration issue with module parameter disable_managed_interrupts enabled. When we turn off the default PCI_IRQ_AFFINITY flag, the driver needs to register with blk-mq using blk_mq_map_queues(). The driver is currently calling blk_mq_pci_map_queues() which results in a stack trace and possibly undefined behavior. Stack Trace: [ 7.860089] scsi host2: smartpqi [ 7.871934] WARNING: CPU: 0 PID: 238 at block/blk-mq-pci.c:52 blk_mq_pci_map_queues+0xca/0xd0 [ 7.889231] Modules linked in: sd_mod t10_pi sg uas smartpqi(+) crc32c_intel scsi_transport_sas usb_storage dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse [ 7.924755] CPU: 0 PID: 238 Comm: kworker/0:3 Not tainted 4.18.0-372.88.1.el8_6_smartpqi_test.x86_64 #1 [ 7.944336] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 03/08/2022 [ 7.963026] Workqueue: events work_for_cpu_fn [ 7.978275] RIP: 0010:blk_mq_pci_map_queues+0xca/0xd0 [ 7.978278] Code: 48 89 de 89 c7 e8 f6 0f 4f 00 3b 05 c4 b7 8e 01 72 e1 5b 31 c0 5d 41 5c 41 5d 41 5e 41 5f e9 7d df 73 00 31 c0 e9 76 df 73 00 <0f> 0b eb bc 90 90 0f 1f 44 00 00 41 57 49 89 ff 41 56 41 55 41 54 [ 7.978280] RSP: 0018:ffffa95fc3707d50 EFLAGS: 00010216 [ 7.978283] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000010 [ 7.978284] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffff9190c32d4310 [ 7.978286] RBP: 0000000000000000 R08: ffffa95fc3707d38 R09: ffff91929b81ac00 [ 7.978287] R10: 0000000000000001 R11: ffffa95fc3707ac0 R12: 0000000000000000 [ 7.978288] R13: ffff9190c32d4000 R14: 00000000ffffffff R15: ffff9190c4c950a8 [ 7.978290] FS: 0000000000000000(0000) GS:ffff9193efc00000(0000) knlGS:0000000000000000 [ 7.978292] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8.172814] CR2: 000055d11166c000 CR3: 00000002dae10002 CR4: 00000000007706f0 [ 8.172816] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8.172817] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 8.172818] PKRU: 55555554 [ 8.172819] Call Trace: [ 8.172823] blk_mq_alloc_tag_set+0x12e/0x310 [ 8.264339] scsi_add_host_with_dma.cold.9+0x30/0x245 [ 8.279302] pqi_ctrl_init+0xacf/0xc8e [smartpqi] [ 8.294085] ? pqi_pci_probe+0x480/0x4c8 [smartpqi] [ 8.309015] pqi_pci_probe+0x480/0x4c8 [smartpqi] [ 8.323286] local_pci_probe+0x42/0x80 [ 8.337855] work_for_cpu_fn+0x16/0x20 [ 8.351193] process_one_work+0x1a7/0x360 [ 8.364462] ? create_worker+0x1a0/0x1a0 [ 8.379252] worker_thread+0x1ce/0x390 [ 8.392623] ? create_worker+0x1a0/0x1a0 [ 8.406295] kthread+0x10a/0x120 [ 8.418428] ? set_kthread_struct+0x50/0x50 [ 8.431532] ret_from_fork+0x1f/0x40 [ 8.444137] ---[ end trace 1bf0173d39354506 ]--- Fixes: cf15c3e734e8 ("scsi: smartpqi: Add module param to disable managed ints") Tested-by: Yogesh Chandra Pandey <YogeshChandra.Pandey@microchip.com> Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20240213162200.1875970-2-don.brace@microchip.com Reviewed-by: Tomas Henzl <thenzl@redhat.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01scsi: core: Consult supported VPD page list prior to fetching pageMartin K. Petersen1-2/+20
commit b5fc07a5fb56216a49e6c1d0b172d5464d99a89b upstream. Commit c92a6b5d6335 ("scsi: core: Query VPD size before getting full page") removed the logic which checks whether a VPD page is present on the supported pages list before asking for the page itself. That was done because SPC helpfully states "The Supported VPD Pages VPD page list may or may not include all the VPD pages that are able to be returned by the device server". Testing had revealed a few devices that supported some of the 0xBn pages but didn't actually list them in page 0. Julian Sikorski bisected a problem with his drive resetting during discovery to the commit above. As it turns out, this particular drive firmware will crash if we attempt to fetch page 0xB9. Various approaches were attempted to work around this. In the end, reinstating the logic that consults VPD page 0 before fetching any other page was the path of least resistance. A firmware update for the devices which originally compelled us to remove the check has since been released. Link: https://lore.kernel.org/r/20240214221411.2888112-1-martin.petersen@oracle.com Fixes: c92a6b5d6335 ("scsi: core: Query VPD size before getting full page") Cc: stable@vger.kernel.org Cc: Bart Van Assche <bvanassche@acm.org> Reported-by: Julian Sikorski <belegdol@gmail.com> Tested-by: Julian Sikorski <belegdol@gmail.com> Reviewed-by: Lee Duncan <lee.duncan@suse.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01scsi: sd: usb_storage: uas: Access media prior to querying device propertiesMartin K. Petersen1-1/+25
commit 321da3dc1f3c92a12e3c5da934090d2992a8814c upstream. It has been observed that some USB/UAS devices return generic properties hardcoded in firmware for mode pages for a period of time after a device has been discovered. The reported properties are either garbage or they do not accurately reflect the characteristics of the physical storage device attached in the case of a bridge. Prior to commit 1e029397d12f ("scsi: sd: Reorganize DIF/DIX code to avoid calling revalidate twice") we would call revalidate several times during device discovery. As a result, incorrect values would eventually get replaced with ones accurately describing the attached storage. When we did away with the redundant revalidate pass, several cases were reported where devices reported nonsensical values or would end up in write-protected state. An initial attempt at addressing this issue involved introducing a delayed second revalidate invocation. However, this approach still left some devices reporting incorrect characteristics. Tasos Sahanidis debugged the problem further and identified that introducing a READ operation prior to MODE SENSE fixed the problem and that it wasn't a timing issue. Issuing a READ appears to cause the devices to update their state to reflect the actual properties of the storage media. Device properties like vendor, model, and storage capacity appear to be correctly reported from the get-go. It is unclear why these devices defer populating the remaining characteristics. Match the behavior of a well known commercial operating system and trigger a READ operation prior to querying device characteristics to force the device to populate the mode pages. The additional READ is triggered by a flag set in the USB storage and UAS drivers. We avoid issuing the READ for other transport classes since some storage devices identify Linux through our particular discovery command sequence. Link: https://lore.kernel.org/r/20240213143306.2194237-1-martin.petersen@oracle.com Fixes: 1e029397d12f ("scsi: sd: Reorganize DIF/DIX code to avoid calling revalidate twice") Cc: stable@vger.kernel.org Reported-by: Tasos Sahanidis <tasos@tasossah.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Tasos Sahanidis <tasos@tasossah.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01scsi: lpfc: Use unsigned type for num_sgeHannes Reinecke1-6/+6
[ Upstream commit d6c1b19153f92e95e5e1801d540e98771053afae ] LUNs going into "failed ready running" state observed on >1T and on even numbers of size (2T, 4T, 6T, 8T and 10T). The issue occurs when DIF is enabled at the host. The kernel logs: Cannot setup S/G List for HBAIO segs 1/1 SGL 512 SCSI 256: 3 0 The host lpfc driver is failing to setup scatter/gather list (protection data) for the I/Os. The return type lpfc_bg_setup_sgl()/lpfc_bg_setup_sgl_prot() causes the compiler to remove the most significant bit. Use an unsigned type instead. Signed-off-by: Hannes Reinecke <hare@suse.de> [dwagner: added commit message] Signed-off-by: Daniel Wagner <dwagner@suse.de> Link: https://lore.kernel.org/r/20231220162658.12392-1-dwagner@suse.de Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01scsi: smartpqi: Fix logical volume rescan race conditionMahesh Rajashekhara2-8/+36
[ Upstream commit fb4cece17b4583f55b34a8538e27a4adc833c9d4 ] Correct rescan flag race condition. Multiple conditions are being evaluated before notifying OS to do a rescan. Driver will skip rescanning the device if any one of the following conditions are met: - Devices that have not yet been added to the OS or devices that have been removed. - Devices which are already marked for removal or in the phase of removal. Under very rare conditions, after logical volume size expansion, the OS still sees the size of the logical volume which was before expansion. The rescan flag in the driver is used to signal the need for a logical volume rescan. A race condition can occur in the driver, and it leads to one thread overwriting the flag inadvertently. As a result, driver is not notifying the OS SML to rescan the logical volume. Move device->rescan update into new function pqi_mark_volumes_for_rescan() and protect with a spin lock. Move check for device->rescan into new function pqi_volume_rescan_needed() and protect function call with a spin_lock. Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com> Co-developed-by: Murthy Bhat <Murthy.Bhat@microchip.com> Signed-off-by: Murthy Bhat <Murthy.Bhat@microchip.com> Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20231219193653.277553-3-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01scsi: smartpqi: Add new controller PCI IDsDavid Strahan1-0/+40
[ Upstream commit c6d5aa44eaf6d119f9ceb3bfc7d22405ac04232a ] All PCI ID entries in Hex. Add PCI IDs for Cisco controllers: VID / DID / SVID / SDID ---- ---- ---- ---- Cisco 24G TriMode M1 RAID 4GB FBWC 32D 9005 / 028f / 1137 / 02f8 Cisco 24G TriMode M1 RAID 4GB FBWC 16D 9005 / 028f / 1137 / 02f9 Cisco 24G TriMode M1 HBA 16D 9005 / 028f / 1137 / 02fa Add PCI IDs for CloudNine controllers: VID / DID / SVID / SDID ---- ---- ---- ---- SmartRAID P7604N-16i 9005 / 028f / 1f51 / 100e SmartRAID P7604N-8i 9005 / 028f / 1f51 / 100f SmartRAID P7504N-16i 9005 / 028f / 1f51 / 1010 SmartRAID P7504N-8i 9005 / 028f / 1f51 / 1011 SmartRAID P7504N-8i 9005 / 028f / 1f51 / 1043 SmartHBA P6500-8i 9005 / 028f / 1f51 / 1044 SmartRAID P7504-8i 9005 / 028f / 1f51 / 1045 Reviewed-by: Murthy Bhat <Murthy.Bhat@microchip.com> Reviewed-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com> Signed-off-by: David Strahan <david.strahan@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20231219193653.277553-2-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23scsi: Revert "scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock"Lee Duncan1-12/+8
commit 977fe773dcc7098d8eaf4ee6382cb51e13e784cb upstream. This reverts commit 1a1975551943f681772720f639ff42fbaa746212. This commit causes interrupts to be lost for FCoE devices, since it changed sping locks from "bh" to "irqsave". Instead, a work queue should be used, and will be addressed in a separate commit. Fixes: 1a1975551943 ("scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock") Signed-off-by: Lee Duncan <lduncan@suse.com> Link: https://lore.kernel.org/r/c578cdcd46b60470535c4c4a953e6a1feca0dffd.1707500786.git.lduncan@suse.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23scsi: storvsc: Fix ring buffer size calculationMichael Kelley1-5/+7
commit f4469f3858352ad1197434557150b1f7086762a0 upstream. Current code uses the specified ring buffer size (either the default of 128 Kbytes or a module parameter specified value) to encompass the one page ring buffer header plus the actual ring itself. When the page size is 4K, carving off one page for the header isn't significant. But when the page size is 64K on ARM64, only half of the default 128 Kbytes is left for the actual ring. While this doesn't break anything, the smaller ring size could be a performance bottleneck. Fix this by applying the VMBUS_RING_SIZE macro to the specified ring buffer size. This macro adds a page for the header, and rounds up the size to a page boundary, using the page size for which the kernel is built. Use this new size for subsequent ring buffer calculations. For example, on ARM64 with 64K page size and the default ring size, this results in the actual ring being 128 Kbytes, which is intended. Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20240122170956.496436-1-mhklinux@outlook.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-16scsi: core: Move scsi_host_busy() out of host lock if it is for per-commandMing Lei2-2/+5
[ Upstream commit 4e6c9011990726f4d175e2cdfebe5b0b8cce4839 ] Commit 4373534a9850 ("scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler") intended to fix a hard lockup issue triggered by EH. The core idea was to move scsi_host_busy() out of the host lock when processing individual commands for EH. However, a suggested style change inadvertently caused scsi_host_busy() to remain under the host lock. Fix this by calling scsi_host_busy() outside the lock. Fixes: 4373534a9850 ("scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler") Cc: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240203024521.2006455-1-ming.lei@redhat.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05scsi: core: Move scsi_host_busy() out of host lock for waking up EH handlerMing Lei3-6/+6
[ Upstream commit 4373534a9850627a2695317944898eb1283a2db0 ] Inside scsi_eh_wakeup(), scsi_host_busy() is called & checked with host lock every time for deciding if error handler kthread needs to be waken up. This can be too heavy in case of recovery, such as: - N hardware queues - queue depth is M for each hardware queue - each scsi_host_busy() iterates over (N * M) tag/requests If recovery is triggered in case that all requests are in-flight, each scsi_eh_wakeup() is strictly serialized, when scsi_eh_wakeup() is called for the last in-flight request, scsi_host_busy() has been run for (N * M - 1) times, and request has been iterated for (N*M - 1) * (N * M) times. If both N and M are big enough, hard lockup can be triggered on acquiring host lock, and it is observed on mpi3mr(128 hw queues, queue depth 8169). Fix the issue by calling scsi_host_busy() outside the host lock. We don't need the host lock for getting busy count because host the lock never covers that. [mkp: Drop unnecessary 'busy' variables pointed out by Bart] Cc: Ewan Milne <emilne@redhat.com> Fixes: 6eb045e092ef ("scsi: core: avoid host-wide host_busy counter for scsi_mq") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240112070000.4161982-1-ming.lei@redhat.com Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com> Tested-by: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05scsi: isci: Fix an error code problem in isci_io_request_build()Su Hui1-1/+1
[ Upstream commit 658365c6b0857e6a306436e315a8633937e3af42 ] Clang static complains that Value stored to 'status' is never read. Return 'status' rather than 'SCI_SUCCESS'. Fixes: f1f52e75939b ("isci: uplevel request infrastructure") Signed-off-by: Su Hui <suhui@nfschina.com> Link: https://lore.kernel.org/r/20240112041926.3924315-1-suhui@nfschina.com Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05scsi: hisi_sas: Set .phy_attached before notifing phyup event ↵Yihang Li1-5/+5
HISI_PHYE_PHY_UP_PM [ Upstream commit ce26497c745d0541aec930d5211b431a1c26af97 ] Currently in directly attached scenario, the phyup event HISI_PHYE_PHY_UP_PM is notified before .phy_attached is set - this may cause the phyup work hisi_sas_bytes_dmaed() execution failed and the attached device will not be found. To fix it, set .phy_attached before notifing phyup event. Signed-off-by: Yihang Li <liyihang9@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1702525516-51258-2-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05scsi: lpfc: Move determination of vmid_flag after VMID reinitialization ↵Justin Tee2-3/+4
completes [ Upstream commit aba0fb0ef607a71511d23a07406f11130c1a54c5 ] If priority tagging is set in the service parameters of a FLOGI cmpl, then we update the vmid_flag. In the current logic, if a follow up FLOGI cmpl updates its service parameters such that priority tagging is no longer set, then the vmid_flag ends up keeping stale data. Fix by ensuring we clear the vmid_flag member during lpfc_reinit_vmid, and check the priority tagging service parameter after reinitialization of the vmid data structures. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20231207224039.35466-4-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05scsi: lpfc: Reinitialize an NPIV's VMID data structures after FDISCJustin Tee1-0/+8
[ Upstream commit 8dc8eb89f4df74593ba4bf30c3d31a0fc6d3ea47 ] After a follow up FDISC cmpl, an NPIV's VMID data structures are not updated. Fix by calling lpfc_reinit_vmid and copying the physical port's vmid_flag to the NPIV's vmid_flag in the NPIV registration cmpl code path. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20231207224039.35466-3-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05scsi: libfc: Fix up timeout error in fc_fcp_rec_error()Hannes Reinecke1-1/+1
[ Upstream commit 53122a49f49796beb2c4a1bb702303b66347e29f ] We should set the status to FC_TIMED_OUT when a timeout error is passed to fc_fcp_rec_error(). Signed-off-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20231129165832.224100-3-hare@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05scsi: libfc: Don't schedule abort twiceHannes Reinecke1-5/+11
[ Upstream commit b57c4db5d23b9df0118a25e2441c9288edd73710 ] The current FC error recovery is sending up to three REC (recovery) frames in 10 second intervals, and as a final step sending an ABTS after 30 seconds for the command itself. Unfortunately sending an ABTS is also the action for the SCSI abort handler, and the default timeout for SCSI commands is also 30 seconds. This causes two ABTS to be scheduled, with the libfc one slightly earlier. The ABTS scheduled by SCSI EH then sees the command to be already aborted, and will always return with a 'GOOD' status irrespective on the actual result from the first ABTS. This causes the SCSI EH abort handler to always succeed, and SCSI EH never to be engaged. Fix this by not issuing an ABTS when a SCSI command is present for the exchange, but rather wait for the abort scheduled from SCSI EH. And warn if an abort is already scheduled to avoid similar errors in the future. Signed-off-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20231129165832.224100-2-hare@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05scsi: arcmsr: Support new PCI device IDs 1883 and 1886ching Huang2-0/+10
[ Upstream commit 41c8a1a1e90fa4721f856bf3cf71211fd16d6434 ] Add support for Areca RAID controllers with PCI device IDs 1883 and 1886. Signed-off-by: ching Huang <ching2048@areca.com.tw> Link: https://lore.kernel.org/r/7732e743eaad57681b1552eec9c6a86c76dbe459.camel@areca.com.tw Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05scsi: mpi3mr: Add PCI checks where SAS5116 diverges from SAS4116Sumit Saxena2-2/+6
[ Upstream commit c9260ff28ee561fca5f96425c9328a9698e8427b ] Add PCI IDs checks for the cases where SAS5116 diverges from SAS4116 in behavior. Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Link: https://lore.kernel.org/r/20231123160132.4155-3-sumit.saxena@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05scsi: mpi3mr: Add support for SAS5116 PCI IDsSumit Saxena1-0/+8
[ Upstream commit 6fa21eab82be57a3ad2470fac27b982793805336 ] Add support for Broadcom's SAS5116 IO/RAID controllers PCI IDs. Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Link: https://lore.kernel.org/r/20231123160132.4155-2-sumit.saxena@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05scsi: lpfc: Fix possible file string name overflow when updating firmwareJustin Tee2-2/+3
[ Upstream commit f5779b529240b715f0e358489ad0ed933bf77c97 ] Because file_name and phba->ModelName are both declared a size 80 bytes, the extra ".grp" file extension could cause an overflow into file_name. Define a ELX_FW_NAME_SIZE macro with value 84. 84 incorporates the 4 extra characters from ".grp". file_name is changed to be declared as a char and initialized to zeros i.e. null chars. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20231031191224.150862-3-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-01scsi: core: Kick the requeue list after inserting when flushingNiklas Cassel1-3/+6
[ Upstream commit 6df0e077d76bd144c533b61d6182676aae6b0a85 ] When libata calls ata_link_abort() to abort all ata queued commands, it calls blk_abort_request() on the SCSI command representing each QC. This causes scsi_timeout() to be called, which calls scsi_eh_scmd_add() for each SCSI command. scsi_eh_scmd_add() sets the SCSI host to state recovery, and then adds the command to shost->eh_cmd_q. This will wake up the SCSI EH, and eventually the libata EH strategy handler will be called, which calls scsi_eh_flush_done_q() to either flush retry or flush finish each failed command. The commands that are flush retried by scsi_eh_flush_done_q() are done so using scsi_queue_insert(). Before commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary"), __scsi_queue_insert() called blk_mq_requeue_request() with the second argument set to true, indicating that it should always kick/run the requeue list after inserting. After commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary"), __scsi_queue_insert() does not kick/run the requeue list after inserting, if the current SCSI host state is recovery (which is the case in the libata example above). This optimization is probably fine in most cases, as I can only assume that most often someone will eventually kick/run the queues. However, that is not the case for scsi_eh_flush_done_q(), where we can see that the request gets inserted to the requeue list, but the queue is never started after the request has been inserted, leading to the block layer waiting for the completion of command that never gets to run. Since scsi_eh_flush_done_q() is called by SCSI EH context, the SCSI host state is most likely always in recovery when this function is called. Thus, let scsi_eh_flush_done_q() explicitly kick the requeue list after inserting a flush retry command, so that scsi_eh_flush_done_q() keeps the same behavior as before commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary"). Simple reproducer for the libata example above: $ hdparm -Y /dev/sda $ echo 1 > /sys/class/scsi_device/0\:0\:0\:0/device/delete Fixes: 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary") Reported-by: Kevin Locke <kevin@kevinlocke.name> Closes: https://lore.kernel.org/linux-scsi/ZZw3Th70wUUvCiCY@kevinlocke.name/ Signed-off-by: Niklas Cassel <cassel@kernel.org> Link: https://lore.kernel.org/r/20240111120533.3612509-1-cassel@kernel.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26scsi: mpi3mr: Block PEL Enable Command on Controller Reset and Unrecoverable ↵Chandrakanth patil1-0/+16
State commit f8fb3f39148e8010479e4b2003ba4728818ec661 upstream. If a controller reset is underway or the controller is in an unrecoverable state, the PEL enable management command will be returned as EAGAIN or EFAULT. Cc: <stable@vger.kernel.org> # v6.1+ Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Chandrakanth patil <chandrakanth.patil@broadcom.com> Link: https://lore.kernel.org/r/20231126053134.10133-4-chandrakanth.patil@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-26scsi: mpi3mr: Clean up block devices post controller resetChandrakanth patil1-2/+7
commit c01d515687e358b22aa8414d6dac60d7defa6eb9 upstream. After a controller reset, if the firmware changes the state of devices to "hide", then remove those devices from the OS. Cc: <stable@vger.kernel.org> # v6.6+ Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Chandrakanth patil <chandrakanth.patil@broadcom.com> Link: https://lore.kernel.org/r/20231126053134.10133-3-chandrakanth.patil@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-26scsi: mpi3mr: Refresh sdev queue depth after controller resetChandrakanth patil1-2/+8
commit e5aab848dfdf7996d20ece4d28d2733c732c5e5a upstream. After a controller reset, the firmware may modify the device queue depth. Therefore, update the device queue depth accordingly. Cc: <stable@vger.kernel.org> # v5.15+ Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Chandrakanth patil <chandrakanth.patil@broadcom.com> Link: https://lore.kernel.org/r/20231126053134.10133-2-chandrakanth.patil@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-26scsi: hisi_sas: Correct the number of global debugfs registersYihang Li1-1/+1
[ Upstream commit 73e33f969ef05328766b23a99b2c07bfff765009 ] In function debugfs_debugfs_snapshot_global_reg_v3_hw() it uses debugfs_axi_reg.count (which is the number of axi debugfs registers) to acquire the number of global debugfs registers. Use debugfs_global_reg.count to acquire the number of global debugfs registers instead. Fixes: 623a4b6d5c2a ("scsi: hisi_sas: Move debugfs code to v3 hw driver") Signed-off-by: Yihang Li <liyihang9@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1702525516-51258-6-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26scsi: hisi_sas: Rollback some operations if FLR failedYihang Li1-0/+5
[ Upstream commit 7ea3e7763c50b20a8bd25cf524ea0c6463de69be ] We obtain the semaphore and set HISI_SAS_RESETTING_BIT in hisi_sas_reset_prepare_v3_hw(), block the scsi host and set HISI_SAS_REJECT_CMD_BIT in hisi_sas_controller_reset_prepare(), released them in hisi_sas_controller_reset_done(). However, if the HW reset failure in FLR results in early return, the semaphore and flag bits will not be release. Rollback some operations including clearing flags / releasing semaphore when FLR is failed. Fixes: e5ea48014adc ("scsi: hisi_sas: Implement handlers of PCIe FLR for v3 hw") Signed-off-by: Yihang Li <liyihang9@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1702525516-51258-5-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26scsi: hisi_sas: Check before using pointer variablesYihang Li1-2/+5
[ Upstream commit 8dd10296be8562a45c6c6794dd492a2b7dccede8 ] In commit 4b329abc9180 ("scsi: hisi_sas: Move slot variable definition in hisi_sas_abort_task()"), we move the variables slot to the function head. However, the variable slot may be NULL, we should check it in each branch. Fixes: 4b329abc9180 ("scsi: hisi_sas: Move slot variable definition in hisi_sas_abort_task()") Signed-off-by: Yihang Li <liyihang9@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1702525516-51258-4-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26scsi: hisi_sas: Replace with standard error code return valueYihang Li2-3/+3
[ Upstream commit d34ee535705eb43885bc0f561c63046f697355ad ] In function hisi_sas_controller_prereset(), -ENOSYS (Function not implemented) should be returned if the driver does not support .soft_reset. Returns -EPERM (Operation not permitted) if HISI_SAS_RESETTING_BIT is already be set. In function _suspend_v3_hw(), returns -EPERM (Operation not permitted) if HISI_SAS_RESETTING_BIT is already be set. Fixes: 4522204ab218 ("scsi: hisi_sas: tidy host controller reset function a bit") Signed-off-by: Yihang Li <liyihang9@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1702525516-51258-3-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26scsi: fnic: Return error if vmalloc() failedArtem Chernyshev1-1/+2
[ Upstream commit f5f27a332a14f43463aa0075efa3a0c662c0f4a8 ] In fnic_init_module() exists redundant check for return value from fnic_debugfs_init(), because at moment it only can return zero. It make sense to process theoretical vmalloc() failure. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 9730ddfb123d ("scsi: fnic: remove redundant assignment of variable rc") Signed-off-by: Artem Chernyshev <artem.chernyshev@red-soft.ru> Link: https://lore.kernel.org/r/20231128111008.2280507-1-artem.chernyshev@red-soft.ru Reviewed-by: Karan Tilak Kumar <kartilak@cisco.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-26scsi: bfa: Use the proper data type for BLIST flagsBart Van Assche1-1/+1
[ Upstream commit 0349be31e4ffc79723e46e2e373569567b06347b ] Fix the following sparse warning: drivers/scsi/bfa/bfad_bsg.c:2553:50: sparse: sparse: incorrect type in initializer (different base types) Fixes: 2e5a6c3baccd ("scsi: bfa: Convert bfad_reset_sdev_bflags() from a macro into a function") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311031255.lmSPisIk-lkp@intel.com/ Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20231115193338.2261972-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-01scsi: core: Always send batch on reset or error handling commandAlexander Atanasov1-0/+2
commit 066c5b46b6eaf2f13f80c19500dbb3b84baabb33 upstream. In commit 8930a6c20791 ("scsi: core: add support for request batching") the block layer bd->last flag was mapped to SCMD_LAST and used as an indicator to send the batch for the drivers that implement this feature. However, the error handling code was not updated accordingly. scsi_send_eh_cmnd() is used to send error handling commands and request sense. The problem is that request sense comes as a single command that gets into the batch queue and times out. As a result the device goes offline after several failed resets. This was observed on virtio_scsi during a device resize operation. [ 496.316946] sd 0:0:4:0: [sdd] tag#117 scsi_eh_0: requesting sense [ 506.786356] sd 0:0:4:0: [sdd] tag#117 scsi_send_eh_cmnd timeleft: 0 [ 506.787981] sd 0:0:4:0: [sdd] tag#117 abort To fix this always set SCMD_LAST flag in scsi_send_eh_cmnd() and scsi_reset_ioctl(). Fixes: 8930a6c20791 ("scsi: core: add support for request batching") Cc: <stable@vger.kernel.org> Signed-off-by: Alexander Atanasov <alexander.atanasov@virtuozzo.com> Link: https://lore.kernel.org/r/20231215121008.2881653-1-alexander.atanasov@virtuozzo.com Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>