summaryrefslogtreecommitdiff
path: root/drivers/scsi/lpfc/lpfc_hbadisc.c
AgeCommit message (Collapse)AuthorFilesLines
2024-08-03scsi: lpfc: Fix unsolicited FLOGI kref imbalance when in direct attached ↵Justin Tee1-2/+9
topology In direct attached topology, certain target vendors that are quick to issue FLOGI followed by a cable pull for more than dev_loss_tmo may result in a kref imbalance for the remote port ndlp object. Add an nlp_get when the defer_flogi_acc flag is set. This is expected to balance the nlp_put in the defer_flogi_acc clause in the lpfc_issue_els_flogi() routine. Because we need to retain the ndlp ptr, reorganize all of the defer_flogi_acc information into one lpfc_defer_flogi_acc struct. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240726231512.92867-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-08-03scsi: lpfc: Validate hdwq pointers before dereferencing in reset/errata pathsJustin Tee1-1/+2
When the HBA is undergoing a reset or is handling an errata event, NULL ptr dereference crashes may occur in routines such as lpfc_sli_flush_io_rings(), lpfc_dev_loss_tmo_callbk(), or lpfc_abort_handler(). Add NULL ptr checks before dereferencing hdwq pointers that may have been freed due to operations colliding with a reset or errata event handler. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240726231512.92867-4-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-05scsi: lpfc: Fix handling of fully recovered fabric node in dev_loss callbkJustin Tee1-0/+8
In rare cases when a fabric node is recovered after a link bounce and before dev_loss_tmo callbk is reached, the driver may leave the fabric node in an inconsistent state with the NLP_IN_DEV_LOSS flag perpetually set. In lpfc_dev_loss_tmo_callbk, a check is added for a recovered fabric node. If the node is recovered, then don't queue the lpfc_dev_loss_tmo_handler work. In lpfc_dev_loss_tmo_handler, the path taken for the recovered fabric nodes is updated to clear the NLP_IN_DEV_LOSS flag. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-5-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-07-05scsi: lpfc: Allow DEVICE_RECOVERY mode after RSCN receipt if in PRLI_ISSUE stateJustin Tee1-1/+1
Certain vendor specific targets initially register with the fabric as an initiator function first and then re-register as a target function afterwards. The timing of the target function re-registration can cause a race condition such that the driver is stuck assuming the remote port as an initiator function and never discovers the target's hosted LUNs. Expand the nlp_state qualifier to also include NLP_STE_PRLI_ISSUE because the state means that PRLI was issued but we have not quite reached MAPPED_NODE state yet. If we received an RSCN in the PRLI_ISSUE state, then we should restart discovery again by going into DEVICE_RECOVERY. Fixes: dded1dc31aa4 ("scsi: lpfc: Modify when a node should be put in device recovery mode during RSCN") Cc: <stable@vger.kernel.org> # v6.6+ Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240628172011.25921-3-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-05-07scsi: lpfc: Change lpfc_hba hba_flag member into a bitmaskJustin Tee1-72/+61
In attempt to reduce the amount of unnecessary phba->hbalock acquisitions in the lpfc driver, change hba_flag into an unsigned long bitmask and use clear_bit/test_bit bitwise atomic APIs instead of reliance on phba->hbalock for synchronization. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240429221547.6842-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-03-11scsi: lpfc: Define lpfc_dmabuf type for ctx_buf ptrJustin Tee1-5/+5
In LPFC_MBOXQ_t, the ctx_buf ptr shouldn't be defined as a generic void *ptr. It is named ctx_buf and it should only be used as an lpfc_dmabuf *ptr. Due to the void* declaration, there have been abuses of ctx_buf for things not related to lpfc_dmabuf. So, set the ptr type for *ctx_buf as lpfc_dmabuf. Remove all type casts on ctx_buf because it is no longer a void *ptr. Convert the abuse of ctx_buf for something not related to lpfc_dmabuf to use the void *context3 ptr. A particular abuse of the ctx_buf warranted a new void *ext_buf ptr. However, the usage of this new void *ext_buf is not generic. It is intended to only hold virtual addresses for extended mailbox commands. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-10-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-03-11scsi: lpfc: Define lpfc_nodelist type for ctx_ndlp ptrJustin Tee1-9/+9
In LPFC_MBOXQ_t data structure, the ctx_ndlp ptr shouldn't be defined as a generic void *ptr. It is named ctx_ndlp and it should only be used as an lpfc_nodelist *ptr. Due to the void* declaration, there have been abuses of ctx_ndlp for things not related to ndlp. So, set the ptr type for *ctx_ndlp as lpfc_nodelist. Remove all type casts on ctx_ndlp because it is no longer a void *ptr. Convert the abuse of ctx_ndlp for things not related to ndlps to use the void *context3 ptr. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-9-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-03-11scsi: lpfc: Release hbalock before calling lpfc_worker_wake_up()Justin Tee1-3/+2
lpfc_worker_wake_up() calls the lpfc_work_done() routine, which takes the hbalock. Thus, lpfc_worker_wake_up() should not be called while holding the hbalock to avoid potential deadlock. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240305200503.57317-7-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-06scsi: lpfc: Copyright updates for 14.4.0.0 patchesJustin Tee1-1/+1
Update copyrights to 2024 for files modified in the 14.4.0.0 patch set. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-18-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-06scsi: lpfc: Change lpfc_vport load_flag member into a bitmaskJustin Tee1-13/+13
In attempt to reduce the amount of unnecessary shost_lock acquisitions in the lpfc driver, change load_flag into an unsigned long bitmask and use clear_bit/test_bit bitwise atomic APIs instead of reliance on shost_lock for synchronization. Also, correct the test for FC_UNLOADING in lpfc_ct_handle_mibreq, which incorrectly tests vport->fc_flag rather than vport->load_flag. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-16-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-06scsi: lpfc: Change lpfc_vport fc_flag member into a bitmaskJustin Tee1-133/+85
In attempt to reduce the amount of unnecessary shost_lock acquisitions in the lpfc driver, change fc_flag into an unsigned long bitmask and use clear_bit/test_bit bitwise atomic APIs instead of reliance on shost_lock for synchronization. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-15-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-06scsi: lpfc: Protect vport fc_nodes list with an explicit spin lockJustin Tee1-25/+25
In attempt to reduce the amount of unnecessary shost_lock acquisitions in the lpfc driver, replace shost_lock with an explicit fc_nodes_list_lock spinlock when accessing vport->fc_nodes lists. Although vport memory region is owned by shost->hostdata, it is driver private memory and an explicit fc_nodes list lock for fc_nodes list mutations is more appropriate than locking the entire shost. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-14-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-06scsi: lpfc: Change nlp state statistic counters into atomic_tJustin Tee1-23/+23
There is no reason to use the shost_lock to synchronize an LLDD statistics counter. Convert all the nlp state statistic counters into atomic_t. Corresponding zeroing, increments, and reads are converted to atomic versions. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-13-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-02-06scsi: lpfc: Remove NLP_RCV_PLOGI early return during RSCN processing for ndlpsJustin Tee1-8/+0
Upon first RSCN receipt of a target server's remote port that is initially acting as an initiator function, the driver marks the ndlp->nlp_type as an initiator role. Then later, when processing an RSCN for a target function role switch, that ndlp remote port is permanently stuck as an initiator role and can never transition to be discovered as an updated target role function. Remove the NLP_RCV_PLOGI early return if statement clause so that the NLP_NPR_2B_DISC flag gets set. This allows for role change detections. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20240131185112.149731-7-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-11-15scsi: lpfc: Eliminate unnecessary relocking in lpfc_check_nlp_post_devloss()Justin Tee1-1/+1
In lpfc_check_nlp_post_devloss(), retaking of the ndlp lock in the if statement is useless because the very next line unlocks. Simply return to avoid relocking. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20231031191224.150862-5-justintee8345@gmail.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-10-14Merge patch series "lpfc: Update lpfc to revision 14.2.0.15"Martin K. Petersen1-4/+4
Justin Tee <justintee8345@gmail.com> says: Update lpfc to revision 14.2.0.15 This patch set contains error handling fixes, ELS bug fixes, and logging improvements. The patches were cut against Martin's 6.7/scsi-queue tree. Link: https://lore.kernel.org/r/20231009161812.97232-1-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-10-13scsi: lpfc: Introduce LOG_NODE_VERBOSE messaging flagJustin Tee1-4/+4
The preexisting LOG_NODE message flag frequently spams a subset of the same log messages during normal FC driver operations. When analyzing driver logs, this sometimes leads to difficulty in troubleshooting. Because LOG_IP log message flag is unused, convert it to a new LOG_NODE_VERBOSE flag. The LOG_NODE_VERBOSE shall specifically be used for diagnosing issues that require precise ndlp tracking detail. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20231009161812.97232-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-14scsi: lpfc: Prevent use-after-free during rmmod with mapped NVMe rportsJustin Tee1-1/+2
During rmmod, when dev_loss_tmo callback is called, an ndlp kref count is decremented twice. Once for SCSI transport registration and second to remove the initial node allocation kref. If there is also an NVMe transport registration, another reference count decrement is expected in lpfc_nvme_unregister_port(). Race conditions between the NVMe transport remoteport_delete and dev_loss_tmo callbacks sometimes results in premature ndlp object release resulting in use-after-free issues. Fix by not dropping the ndlp object in dev_loss_tmo callback with an outstanding NVMe transport registration. Inversely, mark the final NLP_DROPPED flag in lpfc_nvme_unregister_port when rmmod flag is set. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230908211923.37603-1-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-14scsi: lpfc: Early return after marking final NLP_DROPPED flag in dev_loss_tmoJustin Tee1-1/+1
When a dev_loss_tmo event occurs, an ndlp lock is taken before checking nlp_flag for NLP_DROPPED. There is an attempt to restore the ndlp lock when exiting the if statement, but the nlp_put kref could be the final decrement causing a use-after-free memory access on a released ndlp object. Instead of trying to reacquire the ndlp lock after checking nlp_flag, just return after calling nlp_put. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230908211852.37576-1-justintee8345@gmail.com Reviewed-by: "Ewan D. Milne" <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-09-02Merge branch 'fixes' into miscJames Bottomley1-0/+2
2023-08-08scsi: lpfc: Modify when a node should be put in device recovery mode during RSCNJustin Tee1-1/+1
Only nodes whose state is at least past a PLOGI issue and strictly less than a PRLI issue should be put into device recovery mode upon RSCN receipt. Previously, the allowance of LOGO and PRLI completion states did not make sense because those nodes should be allowed to flow through and marked as NPort dissappeared as is normally done. A follow up RSCN GID_FT would recover those nodes in such cases. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230804195546.157839-1-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23scsi: lpfc: Revise ndlp kref handling for dev_loss_tmo_callbk and lpfc_drop_nodeJustin Tee1-23/+47
The ndlp kref count implementation in lpfc_dev_loss_tmo_callbk() removes the initial node reference when a vport is unloading. When lpfc_cleanup() sends a DEVICE_RM event and is in NPR state, the driver calls lpfc_drop_node(). Subsequently, lpfc_drop_node() also removes an ndlp kref thinking it is the initial reference. This unintentionally introduces an extra kref decrement on the ndlp object. Fix by using the NLP_DROPPED node flag in lpfc_dev_loss_tmo_callbk() and lpfc_drop_node() to coordinate the removal of the initial node reference. In lpfc_dev_loss_tmo_callbk(), remove the SCSI transport reference provided the node is registered in the dev_loss context because the driver cannot call the SCSI transport in dev_loss context or afterwards. And, have lpfc_drop_node() not remove a reference if another thread is acting or has already acted on it. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230712180522.112722-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-23scsi: lpfc: Qualify ndlp discovery state when processing RSCNJustin Tee1-2/+5
Conditionalize when to put an ndlp into recovery mode when processing RSCNs. As long as an ndlp state is beyond a PLOGI issue and has been mapped to a transport layer before, the ndlp qualifies to be put into recovery mode. Otherwise, treat the ndlp rport normally through the discovery engine. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230712180522.112722-5-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-06scsi: lpfc: Fix a possible data race in lpfc_unregister_fcf_rescan()Tuo Li1-0/+2
The variable phba->fcf.fcf_flag is often protected by the lock phba->hbalock() when is accessed. Here is an example in lpfc_unregister_fcf_rescan(): spin_lock_irq(&phba->hbalock); phba->fcf.fcf_flag |= FCF_INIT_DISC; spin_unlock_irq(&phba->hbalock); However, in the same function, phba->fcf.fcf_flag is assigned with 0 without holding the lock, and thus can cause a data race: phba->fcf.fcf_flag = 0; To fix this possible data race, a lock and unlock pair is added when accessing the variable phba->fcf.fcf_flag. Reported-by: BassCheck <bass@buaa.edu.cn> Signed-off-by: Tuo Li <islituo@gmail.com> Link: https://lore.kernel.org/r/20230630024748.1035993-1-islituo@gmail.com Reviewed-by: Justin Tee <justin.tee@broadcom.com> Reviewed-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-06-01Merge patch series "lpfc: Update lpfc to revision 14.2.0.13"Martin K. Petersen1-16/+17
Justin Tee <justintee8345@gmail.com> says: Update lpfc to revision 14.2.0.13 This patch set contains discovery bug fixes, firmware logging improvements, clean up of CQ handling, and statistics collection enhancements. The patches were cut against Martin's 6.5/scsi-queue tree. Link: https://lore.kernel.org/r/20230523183206.7728-1-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-06-01scsi: lpfc: Account for fabric domain ctlr device loss recoveryJustin Tee1-5/+14
Pre-existing device loss recovery logic via the NLP_IN_RECOV_POST_DEV_LOSS flag only handled Fabric Port Login, Fabric Controller, Management, and Name Server addresses. Fabric domain controllers fall under the same category for usage of the NLP_IN_RECOV_POST_DEV_LOSS flag. Add a default case statement to mark an ndlp for device loss recovery. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230523183206.7728-4-justintee8345@gmail.com Acked-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-06-01scsi: lpfc: Clear NLP_IN_DEV_LOSS flag if already in rediscoveryJustin Tee1-3/+3
In dev_loss_tmo callback routine, we early return if the ndlp is in a state of rediscovery. This occurs when a target proactively PLOGIs or PRLIs after an RSCN before the dev_loss_tmo callback routine is scheduled to run. Move clear of the NLP_IN_DEV_LOSS flag before the ndlp state check in such cases. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230523183206.7728-3-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-06-01scsi: lpfc: Fix use-after-free rport memory access in ↵Justin Tee1-8/+0
lpfc_register_remote_port() Due to a target port D_ID swap, it is possible for the lpfc_register_remote_port() routine to touch post mortem fc_rport memory when trying to access fc_rport->dd_data. The D_ID swap causes a simultaneous call to lpfc_unregister_remote_port(), where fc_remote_port_delete() reclaims fc_rport memory. Remove the fc_rport->dd_data->pnode NULL assignment because the following line reassigns ndlp->rport with an fc_rport object from fc_remote_port_add() anyways. The pnode nullification is superfluous. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230523183206.7728-2-justintee8345@gmail.com Acked-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-06-01scsi: lpfc: Replace all non-returning strlcpy() with strscpy()Azeem Shaikh1-1/+1
strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Link: https://lore.kernel.org/r/20230530155745.343032-1-azeemshaikh38@gmail.com Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-08scsi: lpfc: Fix double free in lpfc_cmpl_els_logo_acc() caused by ↵Justin Tee1-21/+3
lpfc_nlp_not_used() Smatch detected a double free path because lpfc_nlp_not_used() releases an ndlp object before reaching lpfc_nlp_put() at the end of lpfc_cmpl_els_logo_acc(). Remove the outdated lpfc_nlp_not_used() routine. In lpfc_mbx_cmpl_ns_reg_login(), replace the call with lpfc_nlp_put(). In lpfc_cmpl_els_logo_acc(), replace the call with lpfc_unreg_rpi() and keep the lpfc_nlp_put() at the end of the routine. If ndlp's rpi was registered, then lpfc_unreg_rpi()'s completion routine performs the final ndlp clean up after lpfc_nlp_put() is called from lpfc_cmpl_els_logo_acc(). Otherwise if ndlp has no rpi registered, the lpfc_nlp_put() at the end of lpfc_cmpl_els_logo_acc() is the final ndlp clean up. Fixes: 4430f7fd09ec ("scsi: lpfc: Rework locations of ndlp reference taking") Cc: <stable@vger.kernel.org> # v5.11+ Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/all/Y3OefhyyJNKH%2Fiaf@kili/ Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230417191558.83100-3-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-10scsi: lpfc: Revise lpfc_error_lost_link() reason code evaluation logicJustin Tee1-0/+35
Extended status reason code errors should mask off the IOERR_PARAM_MASK before checking strict equalities for IOERR values. Update the lpfc_error_lost_link() routine as such. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230301231626.9621-9-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-10scsi: lpfc: Defer issuing new PLOGI if received RSCN before completing REG_LOGINJustin Tee1-2/+2
When mapped to a target with multiple virtual ports, a link bounce sometimes results in unsuccessful rediscovery of all of the target's virtual ports. This is because a succession of repeat RSCNs for the virtual target ports leaves ndlps in the REG_LOGIN state with the NLP_REG_LOGIN_SEND flag set. With NLP_REG_LOGIN_SEND set, during the next PLOGI, the driver will UNREG_RPI. When UNREG_RPI is processed, the driver can be in the middle of PRLI_ISSUE or MAPPED state resulting in an illegal state transition by the discovery engine and stalling. Fix by calling the discovery state machine with DEVICE_RECOVERY event during RSCN processing. This will set the NLP_IGNR_REG_CMPL bit and prevent the old REG_LOGIN state from advancing. Then for the new PLOGI issue, add the check for the NLP_IGNR_REG_CMPL bit to delay issuing the new PLOGI until the queued REG_LOGIN and UNREG_LOGIN have been processed. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20230301231626.9621-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-02-22scsi: lpfc: Fix double word in commentsBo Liu1-1/+1
Remove the repeated word "the" in comments. [mkp: fixed additional typos in the changed lines] Link: https://lore.kernel.org/r/20230217083046.4090-1-liubo03@inspur.com Signed-off-by: Bo Liu <liubo03@inspur.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-01-12scsi: lpfc: Copyright updates for 14.2.0.10 patchesJustin Tee1-1/+1
Update copyrights to 2023 for files modified in the 14.2.0.10 patch set. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-01-12scsi: lpfc: Remove duplicate ndlp kref decrement in lpfc_cleanup_rpis()Justin Tee1-15/+0
With faulty cables in PT2PT topology, an unintentional ndlp double kref decrement can occur. If a FLOGI request is outstanding before the link goes down, the missing FLOGI_ACC causes an F_Port ndlp to remain in the UNUSED state. During link down, lpfc_cleanup_rpis() is called and decrements an ndlp kref. Additionally, when the driver later decides to abort the FLOGI, the FLOGI completion handler decrements the ndlp kref a second time. Remove duplicate clean up logic in lpfc_cleanup_rpis() because the updated FLOGI completion handler already handles the ndlp kref decrement. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-11-17scsi: lpfc: Fix crash involving race between FLOGI timeout and devloss handlerJustin Tee1-10/+26
When a FLOGI completes with a sequence timeout error, a freed kref ptr dereference crash can occur due to a timing race involving ndlp referencing in lpfc_dev_loss_tmo_callbk. Fix by ensuring the driver accounts for an outstanding FLOGI when dev_loss is active. Also, don't remove the HBA_FLOGI_OUTSTANDING flag when the FLOGI is retried to allow the driver to handle the reference counts correctly in lpfc_dev_loss_tmo_handler. Reported-by: Dietmar Hahn <dietmar.hahn@fujitsu.com> Tested-by: Dietmar Hahn <dietmar.hahn@fujitsu.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20221116011921.105995-5-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-10-12treewide: use get_random_{u8,u16}() when possible, part 1Jason A. Donenfeld1-3/+3
Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value, simply use the get_random_{u8,u16}() functions, which are faster than wasting the additional bytes from a 32-bit value. This was done mechanically with this coccinelle script: @@ expression E; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u16; typedef __be16; typedef __le16; typedef u8; @@ ( - (get_random_u32() & 0xffff) + get_random_u16() | - (get_random_u32() & 0xff) + get_random_u8() | - (get_random_u32() % 65536) + get_random_u16() | - (get_random_u32() % 256) + get_random_u8() | - (get_random_u32() >> 16) + get_random_u16() | - (get_random_u32() >> 24) + get_random_u8() | - (u16)get_random_u32() + get_random_u16() | - (u8)get_random_u32() + get_random_u8() | - (__be16)get_random_u32() + (__be16)get_random_u16() | - (__le16)get_random_u32() + (__le16)get_random_u16() | - prandom_u32_max(65536) + get_random_u16() | - prandom_u32_max(256) + get_random_u8() | - E->inet_id = get_random_u32() + E->inet_id = get_random_u16() ) @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u16; identifier v; @@ - u16 v = get_random_u32(); + u16 v = get_random_u16(); @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u8; identifier v; @@ - u8 v = get_random_u32(); + u8 v = get_random_u8(); @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u16; u16 v; @@ - v = get_random_u32(); + v = get_random_u16(); @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u8; u8 v; @@ - v = get_random_u32(); + v = get_random_u8(); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Examine limits @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value < 256: coccinelle.RESULT = cocci.make_ident("get_random_u8") elif value < 65536: coccinelle.RESULT = cocci.make_ident("get_random_u16") else: print("Skipping large mask of %s" % (literal)) cocci.include_match(False) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; identifier add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + (RESULT() & LITERAL) Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-09-16scsi: lpfc: Fix various issues reported by toolsJames Smart1-2/+2
This patch fixes below Smatch reported issues: 1. lpfc_hbadisc.c:3020 lpfc_mbx_cmpl_fcf_rr_read_fcf_rec() error: uninitialized symbol 'vlan_id'. 2. lpfc_hbadisc.c:3121 lpfc_mbx_cmpl_read_fcf_rec() error: uninitialized symbol 'vlan_id'. 3. lpfc_init.c:335 lpfc_dump_wakeup_param_cmpl() warn: always true condition '(prg->dist < 4) => (0-3 < 4)' 4. lpfc_init.c:2419 lpfc_parse_vpd() warn: inconsistent indenting. 5. lpfc_init.c:13248 lpfc_sli4_enable_msi() warn: 'phba->pcidev->irq' 2147483648 can't fit into 65535 'eqhdl->irq' 6. lpfc_debugfs.c:5300 lpfc_idiag_extacc_avail_get() error: uninitialized symbol 'ext_cnt' 7. lpfc_debugfs.c:5300 lpfc_idiag_extacc_avail_get() error: uninitialized symbol 'ext_size' 8. lpfc_vmid.c:248 lpfc_vmid_get_appid() warn: sleeping in atomic context. 9. lpfc_init.c:8342 lpfc_sli4_driver_resource_setup() warn: missing error code 'rc'. 10. lpfc_init.c:13573 lpfc_sli4_hba_unset() warn: variable dereferenced before check 'phba->pport' (see line 13546) 11. lpfc_auth.c:1923 lpfc_auth_handle_dhchap_reply() error: double free of 'hash_value' Fixes: 1. Initialize vlan_id to LPFC_FCOE_NULL_VID. 2. Initialize vlan_id to LPFC_FCOE_NULL_VID. 3. prg->dist is a 2 bit field. Its value can only be between 0-3. Remove redundent check 'if (prg->dist < 4)'. 4. Fix inconsistent indenting. Moved logic into helper function lpfc_fill_vpd(). 5. Define 'eqhdl->irq' as int value as pci_irq_vector() returns int. Also, check for return value of pci_irq_vector() and log message in case of failure. 6. Initialize 'ext_cnt' to 0. 7. Initialize 'ext_size' to 0. 8. Use alloc_percpu_gfp() with GFP_ATOMIC flag. 9. 'rc' was not updated when dma_pool_create() fails. Update 'rc = -ENOMEM' when dma_pool_create() fails before calling goto statement. 10. Add check for 'phba->pport' in lpfc_cpuhp_remove(). 11. Initialize 'hash_value' to NULL, same like 'aug_chal' variable. Link: https://lore.kernel.org/r/20220911221505.117655-13-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-16scsi: lpfc: Add reporting capability for Link Degrade SignalingJames Smart1-1/+9
Firmware reports link degrade signaling via ACQES. Handlers and new additions to the SET_FEATURES mbox command are implemented so that link degrade parameters for 64GB capable links are reported through EDC ELS frames. Link: https://lore.kernel.org/r/20220911221505.117655-12-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-16scsi: lpfc: Fix FLOGI ACC with wrong SID in PT2PT topologyJames Smart1-2/+7
When a FLOGI is received before we have issued our FLOGI, the ACC response to the received FLOGI is issued with SID 2 instead of the expected fabric controller SID. Certain target vendors ignore the malformed ACC with SID 2 and wait for a properly filled ACC with a fabric controller SID. The lpfc_sli_prep_wqe() routine depends on the FC_PT2PT flag to fill in the fabric controller SID when in PT2PT mode, but due to a previous commit the flag was getting cleared. Fix by adding a check for the defer_flogi_acc flag to know whether or not to clear the FC_PT2PT flag on link up. Link: https://lore.kernel.org/r/20220911221505.117655-3-jsmart2021@gmail.com Fixes: 439b93293ff2 ("scsi: lpfc: Fix unsolicited FLOGI receive handling during PT2PT discovery") Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-01scsi: lpfc: Remove SANDiags related codeJames Smart1-17/+0
The SANDiags feature is unused, and related code is removed. Link: https://lore.kernel.org/r/20220819011736.14141-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-09-01scsi: lpfc: Fix unsolicited FLOGI receive handling during PT2PT discoveryJames Smart1-1/+0
During a stress offline/online test in PT2PT topology, target rediscovery can fail with a specific target vendor array. When the HBA transitions to online mode it is possible to receive an unsolicited FLOGI before processing the Link Up event. The received FLOGI will set the defer_flogi_acc_flag, which instructs the driver to wait until it transmits its own FLOGI before ACKing the received FLOGI. In this failure scenario, the link up processing clears the set defer_flogi_acc_flag before we have sent out the FLOGI. As the target has the higher WWPN and is responsible for sending the PLOGI, the target is stuck waiting for its FLOGI_ACC that the driver will never send. Remove the clear of defer_flogi_acc_flag from Link Up event processing. In this stress test case, the defer_flogi_acc_flag is cleared during the Link Down event processing anyways. Link: https://lore.kernel.org/r/20220819011736.14141-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-08scsi: lpfc: Remove Menlo/Hornet related codeJames Smart1-51/+6
The Menlo/Hornet adapter was never released to the field. As such, driver code specific to the adapter is unnecessary and should be removed. Link: https://lore.kernel.org/r/20220701211425.2708-11-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-07-08scsi: lpfc: Revert RSCN_MEMENTO workaround for misbehaved configurationJames Smart1-2/+1
The RSCN_MEMENTO logic was to workaround a target that does not register both FCP and NVMe FC4 types at the same time. This caused the configuration to not produce a second RSCN for the NVMe FC4 type registration in a timely manner. The intention of the RSCN_MEMENTO flag was to always signal to try NVMe PRLI. However, there are other FCP-only target arrays in correctly behaved configurations that reject the NVMe PRLI followed by a LOGO leading to never rediscovering the target after an issue_lip (as LOGO causes a repeat of PLOGI/PRLIs). Revert the RSCN_MEMENTO patch as it is causing correctly behaved configs to fail while it exists only to succeed on a misbehaved config. Link: https://lore.kernel.org/r/20220701211425.2708-9-jsmart2021@gmail.com Fixes: 1045592fc968 ("scsi: lpfc: Introduce FC_RSCN_MEMENTO flag for tracking post RSCN completion") Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-11scsi: lpfc: Rework FDMI initialization after link upJames Smart1-2/+3
After a link up, it's possible for the switch to change FDMI support (e.g. FDMI1 vs FDMI2 vs SmartSAN). If the switch reverts to FDMI1, then the revert is currently not detected. Additionally, when NPIV is configured, it's possible the physical port's RHBA is unprocessed by the switch before reciept of an NPIV port issued RPRT. This causes some switches vendors to reject the NPIV's RPRT. Fix by reinitializing base FDMI mode on link up, and defer FDMI vport RPRT submission until after confirming physical port's RHBA is completed. Link: https://lore.kernel.org/r/20220506035519.50908-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-11scsi: lpfc: Decrement outstanding gidft_inp counter if lpfc_err_lost_link()James Smart1-2/+3
During large NPIV port testing, it was sometimes seen that not all vports would log back in to the target device. There are instances when the fabric is slow to respond to a spam of GID_PT requests and as a result the SLI PORT may abort the GID_PT request because the fabric takes so long. lpfc_cmpl_ct_cmd_gid_pt() would enter the lpfc_err_lost_link() logic and attempt to lpfc_els_flush_rscn(), which is fine, but forgets to decrement the gidft_inp counter. This results in a vport->gidft_inp never reaching 0 and never restarting discovery again. Decrement vport->gidft_inp if lpfc_err_lost_link() is true for both lpfc_cmpl_ct_cmd_gid_pt() and lpfc_cmpl_ct_cmd_gid_ft(). Increase logging info during RSCN timeout and lpfc_err_lost_link() events. Link: https://lore.kernel.org/r/20220506035519.50908-8-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-11scsi: lpfc: Inhibit aborts if external loopback plug is insertedJames Smart1-0/+3
After running a short external loopback test, when the external loopback is removed and a normal cable inserted that is directly connected to a target device, the system oops in the llpfc_set_rrq_active() routine. When the loopback was inserted an FLOGI was transmit. As we're looped back, we receive the FLOGI request. The FLOGI is ABTS'd as we recognize the same wppn thus understand it's a loopback. However, as the ABTS sends address information the port is not set to (fffffe), the ABTS is dropped on the wire. A short 1 frame loopback test is run and completes before the ABTS times out. The looback is unplugged and the new cable plugged in, and the an FLOGI to the new device occurs and completes. Due to a mixup in ref counting the completion of the new FLOGI releases the fabric ndlp. Then the original ABTS completes and references the released ndlp generating the oops. Correct by no-op'ing the ABTS when in loopback mode (it will be dropped anyway). Added a flag to track the mode to recognize when it should be no-op'd. Link: https://lore.kernel.org/r/20220506035519.50908-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-11scsi: lpfc: Fix ndlp put following a LOGO completionJames Smart1-1/+7
During testing with repeated asynchronous resets of the target, an issue was found when the driver issues a LOGO to disconnect its login and recover all exchanges. The LOGO command takes a node reference but neglects to remove it, keeping the node reference count artifically high. Add a call to lpfc_nlp_put() to lpfc_nlp_logo_unreg() and move the mempool free call to the routine exit along with the needed put. This is always safe as this will not be the last reference removed as lpfc_unreg_rpi() ensures there is an additional reference on the ndlp. Link: https://lore.kernel.org/r/20220506035519.50908-4-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-19scsi: lpfc: Change FA-PWWN detection methodologyJames Smart1-0/+8
Do not rely on vendor version field of the CSPs to determine if we are in a FA-PWWN environment. Instead, use the following procedure: First, during HBA initialization, driver does a READ_CONFIG to determine if FA-PWWN is configured on the HBA. A LPFC_FAWWPN_CONFIG hba_flag is set accordingly. Next, when the link comes up before the driver gets a link up event, the firmware logs into the fabric with FA-PWWN. If the fabric port does not support FA-PWWN, the driver will get a Misconfigured FA-WWN async event before the link up. A LPFC_FAWWPN_FABRIC hba_flag will be set accordingly. Finally, if the fabric supports FA-PWWN, the firmware will replace its CSPs WWN with the Fabric Assigned ones. Then after link up, the driver will retrieve the Fabric Assigned WWN when it does a READ_SPARAM mbox command. Link: https://lore.kernel.org/r/20220412222008.126521-23-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-04-19scsi: lpfc: Refactor cleanup of mailbox commandsJames Smart1-91/+33
The intention of this patch is to refactor mailbox memory allocation and cleanup steps in one routine respectively to prevent memory leaks or memory errors related to mailbox commands. There are trivial localized fixes as well. Provide lpfc_mbox_rsrc_prep() - this routine allocates the dmabuf and the mbuf associated with it. It also catches allocation errors and returns status. Provide lpfc_mbox_rsrc_cleanup() - this routine verifies a dmabuf exists and if so releases the associated mbuf and the dmabuf memory. It then sets the ctx_buf to NULL and releases the mailbox memory to the mailbox pool. Link: https://lore.kernel.org/r/20220412222008.126521-22-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>