summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel
AgeCommit message (Collapse)AuthorFilesLines
2022-03-23iavf: Fix hang during reboot/shutdownIvan Vecera1-0/+7
[ Upstream commit b04683ff8f0823b869c219c78ba0d974bddea0b5 ] Recent commit 974578017fc1 ("iavf: Add waiting so the port is initialized in remove") adds a wait-loop at the beginning of iavf_remove() to ensure that port initialization is finished prior unregistering net device. This causes a regression in reboot/shutdown scenario because in this case callback iavf_shutdown() is called and this callback detaches the device, makes it down if it is running and sets its state to __IAVF_REMOVE. Later shutdown callback of associated PF driver (e.g. ice_shutdown) is called. That callback calls among other things sriov_disable() that calls indirectly iavf_remove() (see stack trace below). As the adapter state is already __IAVF_REMOVE then the mentioned loop is end-less and shutdown process hangs. The patch fixes this by checking adapter's state at the beginning of iavf_remove() and skips the rest of the function if the adapter is already in remove state (shutdown is in progress). Reproducer: 1. Create VF on PF driven by ice or i40e driver 2. Ensure that the VF is bound to iavf driver 3. Reboot [52625.981294] sysrq: SysRq : Show Blocked State [52625.988377] task:reboot state:D stack: 0 pid:17359 ppid: 1 f2 [52625.996732] Call Trace: [52625.999187] __schedule+0x2d1/0x830 [52626.007400] schedule+0x35/0xa0 [52626.010545] schedule_hrtimeout_range_clock+0x83/0x100 [52626.020046] usleep_range+0x5b/0x80 [52626.023540] iavf_remove+0x63/0x5b0 [iavf] [52626.027645] pci_device_remove+0x3b/0xc0 [52626.031572] device_release_driver_internal+0x103/0x1f0 [52626.036805] pci_stop_bus_device+0x72/0xa0 [52626.040904] pci_stop_and_remove_bus_device+0xe/0x20 [52626.045870] pci_iov_remove_virtfn+0xba/0x120 [52626.050232] sriov_disable+0x2f/0xe0 [52626.053813] ice_free_vfs+0x7c/0x340 [ice] [52626.057946] ice_remove+0x220/0x240 [ice] [52626.061967] ice_shutdown+0x16/0x50 [ice] [52626.065987] pci_device_shutdown+0x34/0x60 [52626.070086] device_shutdown+0x165/0x1c5 [52626.074011] kernel_restart+0xe/0x30 [52626.077593] __do_sys_reboot+0x1d2/0x210 [52626.093815] do_syscall_64+0x5b/0x1a0 [52626.097483] entry_SYSCALL_64_after_hwframe+0x65/0xca Fixes: 974578017fc1 ("iavf: Add waiting so the port is initialized in remove") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://lore.kernel.org/r/20220317104524.2802848-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-23iavf: Fix double free in iavf_reset_taskPrzemyslaw Patynowski1-1/+7
[ Upstream commit 16b2dd8cdf6f4e0597c34899de74b4d012b78188 ] Fix double free possibility in iavf_disable_vf, as crit_lock is freed in caller, iavf_reset_task. Add kernel-doc for iavf_disable_vf. Remove mutex_unlock in iavf_disable_vf. Without this patch there is double free scenario, when calling iavf_reset_task. Fixes: e85ff9c631e1 ("iavf: Fix deadlock in iavf_reset_task") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-19ice: Fix race condition during interface enslaveIvan Vecera2-2/+21
commit 5cb1ebdbc4342b1c2ce89516e19808d64417bdbc upstream. Commit 5dbbbd01cbba83 ("ice: Avoid RTNL lock when re-creating auxiliary device") changes a process of re-creation of aux device so ice_plug_aux_dev() is called from ice_service_task() context. This unfortunately opens a race window that can result in dead-lock when interface has left LAG and immediately enters LAG again. Reproducer: ``` #!/bin/sh ip link add lag0 type bond mode 1 miimon 100 ip link set lag0 for n in {1..10}; do echo Cycle: $n ip link set ens7f0 master lag0 sleep 1 ip link set ens7f0 nomaster done ``` This results in: [20976.208697] Workqueue: ice ice_service_task [ice] [20976.213422] Call Trace: [20976.215871] __schedule+0x2d1/0x830 [20976.219364] schedule+0x35/0xa0 [20976.222510] schedule_preempt_disabled+0xa/0x10 [20976.227043] __mutex_lock.isra.7+0x310/0x420 [20976.235071] enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core] [20976.251215] ib_enum_roce_netdev+0xa4/0xe0 [ib_core] [20976.256192] ib_cache_setup_one+0x33/0xa0 [ib_core] [20976.261079] ib_register_device+0x40d/0x580 [ib_core] [20976.266139] irdma_ib_register_device+0x129/0x250 [irdma] [20976.281409] irdma_probe+0x2c1/0x360 [irdma] [20976.285691] auxiliary_bus_probe+0x45/0x70 [20976.289790] really_probe+0x1f2/0x480 [20976.298509] driver_probe_device+0x49/0xc0 [20976.302609] bus_for_each_drv+0x79/0xc0 [20976.306448] __device_attach+0xdc/0x160 [20976.310286] bus_probe_device+0x9d/0xb0 [20976.314128] device_add+0x43c/0x890 [20976.321287] __auxiliary_device_add+0x43/0x60 [20976.325644] ice_plug_aux_dev+0xb2/0x100 [ice] [20976.330109] ice_service_task+0xd0c/0xed0 [ice] [20976.342591] process_one_work+0x1a7/0x360 [20976.350536] worker_thread+0x30/0x390 [20976.358128] kthread+0x10a/0x120 [20976.365547] ret_from_fork+0x1f/0x40 ... [20976.438030] task:ip state:D stack: 0 pid:213658 ppid:213627 flags:0x00004084 [20976.446469] Call Trace: [20976.448921] __schedule+0x2d1/0x830 [20976.452414] schedule+0x35/0xa0 [20976.455559] schedule_preempt_disabled+0xa/0x10 [20976.460090] __mutex_lock.isra.7+0x310/0x420 [20976.464364] device_del+0x36/0x3c0 [20976.467772] ice_unplug_aux_dev+0x1a/0x40 [ice] [20976.472313] ice_lag_event_handler+0x2a2/0x520 [ice] [20976.477288] notifier_call_chain+0x47/0x70 [20976.481386] __netdev_upper_dev_link+0x18b/0x280 [20976.489845] bond_enslave+0xe05/0x1790 [bonding] [20976.494475] do_setlink+0x336/0xf50 [20976.502517] __rtnl_newlink+0x529/0x8b0 [20976.543441] rtnl_newlink+0x43/0x60 [20976.546934] rtnetlink_rcv_msg+0x2b1/0x360 [20976.559238] netlink_rcv_skb+0x4c/0x120 [20976.563079] netlink_unicast+0x196/0x230 [20976.567005] netlink_sendmsg+0x204/0x3d0 [20976.570930] sock_sendmsg+0x4c/0x50 [20976.574423] ____sys_sendmsg+0x1eb/0x250 [20976.586807] ___sys_sendmsg+0x7c/0xc0 [20976.606353] __sys_sendmsg+0x57/0xa0 [20976.609930] do_syscall_64+0x5b/0x1a0 [20976.613598] entry_SYSCALL_64_after_hwframe+0x65/0xca 1. Command 'ip link ... set nomaster' causes that ice_plug_aux_dev() is called from ice_service_task() context, aux device is created and associated device->lock is taken. 2. Command 'ip link ... set master...' calls ice's notifier under RTNL lock and that notifier calls ice_unplug_aux_dev(). That function tries to take aux device->lock but this is already taken by ice_plug_aux_dev() in step 1 3. Later ice_plug_aux_dev() tries to take RTNL lock but this is already taken in step 2 4. Dead-lock The patch fixes this issue by following changes: - Bit ICE_FLAG_PLUG_AUX_DEV is kept to be set during ice_plug_aux_dev() call in ice_service_task() - The bit is checked in ice_clear_rdma_cap() and only if it is not set then ice_unplug_aux_dev() is called. If it is set (in other words plugging of aux device was requested and ice_plug_aux_dev() is potentially running) then the function only clears the bit - Once ice_plug_aux_dev() call (in ice_service_task) is finished the bit ICE_FLAG_PLUG_AUX_DEV is cleared but it is also checked whether it was already cleared by ice_clear_rdma_cap(). If so then aux device is unplugged. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Co-developed-by: Petr Oros <poros@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-16ice: Fix curr_link_speed advertised speedJedrzej Jagielski1-1/+1
[ Upstream commit ad35ffa252af67d4cc7c744b9377a2b577748e3f ] Change curr_link_speed advertised speed, due to link_info.link_speed is not equal phy.curr_user_speed_req. Without this patch it is impossible to set advertised speed to same as link_speed. Testing Hints: Try to set advertised speed to 25G only with 25G default link (use ethtool -s 0x80000000) Fixes: 48cb27f2fd18 ("ice: Implement handlers for ethtool PHY/link operations") Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-16ice: Don't use GFP_KERNEL in atomic contextChristophe JAILLET1-1/+1
[ Upstream commit 3d97f1afd8d831e0c0dc1157418f94b8faa97b54 ] ice_misc_intr() is an irq handler. It should not sleep. Use GFP_ATOMIC instead of GFP_KERNEL when allocating some memory. Fixes: 348048e724a0 ("ice: Implement iidc operations") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-16ice: Fix error with handling of bonding MTUDave Ertman2-15/+15
[ Upstream commit 97b0129146b1544bbb0773585327896da3bb4e0a ] When a bonded interface is destroyed, .ndo_change_mtu can be called during the tear-down process while the RTNL lock is held. This is a problem since the auxiliary driver linked to the LAN driver needs to be notified of the MTU change, and this requires grabbing a device_lock on the auxiliary_device's dev. Currently this is being attempted in the same execution context as the call to .ndo_change_mtu which is causing a dead-lock. Move the notification of the changed MTU to a separate execution context (watchdog service task) and eliminate the "before" notification. Fixes: 348048e724a0e ("ice: Implement iidc operations") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Jonathan Toppins <jtoppins@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-16ice: stop disabling VFs due to PF error responsesJacob Keller2-21/+0
[ Upstream commit 79498d5af8e458102242d1667cf44df1f1564e63 ] The ice_vc_send_msg_to_vf function has logic to detect "failure" responses being sent to a VF. If a VF is sent more than ICE_DFLT_NUM_INVAL_MSGS_ALLOWED then the VF is marked as disabled. Almost identical logic also existed in the i40e driver. This logic was added to the ice driver in commit 1071a8358a28 ("ice: Implement virtchnl commands for AVF support") which itself copied from the i40e implementation in commit 5c3c48ac6bf5 ("i40e: implement virtual device interface"). Neither commit provides a proper explanation or justification of the check. In fact, later commits to i40e changed the logic to allow bypassing the check in some specific instances. The "logic" for this seems to be that error responses somehow indicate a malicious VF. This is not really true. The PF might be sending an error for any number of reasons such as lack of resources, etc. Additionally, this causes the PF to log an info message for every failed VF response which may confuse users, and can spam the kernel log. This behavior is not documented as part of any requirement for our products and other operating system drivers such as the FreeBSD implementation of our drivers do not include this type of check. In fact, the change from dev_err to dev_info in i40e commit 18b7af57d9c1 ("i40e: Lower some message levels") explains that these messages typically don't actually indicate a real issue. It is quite likely that a user who hits this in practice will be very confused as the VF will be disabled without an obvious way to recover. We already have robust malicious driver detection logic using actual hardware detection mechanisms that detect and prevent invalid device usage. Remove the logic since its not a documented requirement and the behavior is not intuitive. Fixes: 1071a8358a28 ("ice: Implement virtchnl commands for AVF support") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-16i40e: stop disabling VFs due to PF error responsesJacob Keller3-59/+9
[ Upstream commit 5710ab79166504013f7c0ae6a57e7d2fd26e5c43 ] The i40e_vc_send_msg_to_vf_ex (and its wrapper i40e_vc_send_msg_to_vf) function has logic to detect "failure" responses sent to the VF. If a VF is sent more than I40E_DEFAULT_NUM_INVALID_MSGS_ALLOWED, then the VF is marked as disabled. In either case, a dev_info message is printed stating that a VF opcode failed. This logic originates from the early implementation of VF support in commit 5c3c48ac6bf5 ("i40e: implement virtual device interface"). That commit did not go far enough. The "logic" for this behavior seems to be that error responses somehow indicate a malicious VF. This is not really true. The PF might be sending an error for any number of reasons such as lacking resources, an unsupported operation, etc. This does not indicate a malicious VF. We already have a separate robust malicious VF detection which relies on hardware logic to detect and prevent a variety of behaviors. There is no justification for this behavior in the original implementation. In fact, a later commit 18b7af57d9c1 ("i40e: Lower some message levels") reduced the opcode failure message from a dev_err to a dev_info. In addition, recent commit 01cbf50877e6 ("i40e: Fix to not show opcode msg on unsuccessful VF MAC change") changed the logic to allow quieting it for expected failures. That commit prevented this logic from kicking in for specific circumstances. This change did not go far enough. The behavior is not documented nor is it part of any requirement for our products. Other operating systems such as the FreeBSD implementation of our driver do not include this logic. It is clear this check does not make sense, and causes problems which led to ugly workarounds. Fix this by just removing the entire logic and the need for the i40e_vc_send_msg_to_vf_ex function. Fixes: 01cbf50877e6 ("i40e: Fix to not show opcode msg on unsuccessful VF MAC change") Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-16iavf: Fix handling of vlan strip virtual channel messagesMichal Maloszewski1-0/+40
[ Upstream commit 2cf29e55894886965722e6625f6a03630b4db31d ] Modify netdev->features for vlan stripping based on virtual channel messages received from the PF. Change is needed to synchronize vlan strip status between PF sysfs and iavf ethtool. Fixes: 5951a2b9812d ("iavf: Fix VLAN feature flags after VFR") Signed-off-by: Norbert Ciosek <norbertx.ciosek@intel.com> Signed-off-by: Michal Maloszewski <michal.maloszewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: missing unlocks in iavf_watchdog_task()Dan Carpenter1-2/+2
commit bc2f39a6252ee40d9bfc2743d4437d420aec5f6e upstream. This code was re-organized and there some unlocks missing now. Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08iavf: do not override the adapter state in the watchdog task (again)Stefan Assmann1-1/+0
commit fe523d7c9a8332855376ad5eb1aa301091129ba4 upstream. The watchdog task incorrectly changes the state to __IAVF_RESETTING, instead of letting the reset task take care of that. This was already resolved by commit 22c8fd71d3a5 ("iavf: do not override the adapter state in the watchdog task") but the problem was reintroduced by the recent code refactoring in commit 45eebd62999d ("iavf: Refactor iavf state machine tracking"). Fixes: 45eebd62999d ("iavf: Refactor iavf state machine tracking") Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08e1000e: Fix possible HW unit hang after an s0ix exitSasha Neftin4-0/+32
[ Upstream commit 1866aa0d0d6492bc2f8d22d0df49abaccf50cddd ] Disable the OEM bit/Gig Disable/restart AN impact and disable the PHY LAN connected device (LCD) reset during power management flows. This fixes possible HW unit hangs on the s0ix exit on some corporate ADL platforms. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821 Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix") Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Suggested-by: Nir Efrati <nir.efrati@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Fix __IAVF_RESETTING state usageSlawomir Laba1-7/+6
[ Upstream commit 14756b2ae265d526b8356e86729090b01778fdf6 ] The setup of __IAVF_RESETTING state in watchdog task had no effect and could lead to slow resets in the driver as the task for __IAVF_RESETTING state only requeues watchdog. Till now the __IAVF_RESETTING was interpreted by reset task as running state which could lead to errors with allocating and resources disposal. Make watchdog_task queue the reset task when it's necessary. Do not update the state to __IAVF_RESETTING so the reset task knows exactly what is the current state of the adapter. Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Fix race in init stateSlawomir Laba1-1/+2
[ Upstream commit a472eb5cbaebb5774672c565e024336c039e9128 ] When iavf_init_version_check sends VIRTCHNL_OP_GET_VF_RESOURCES message, the driver will wait for the response after requeueing the watchdog task in iavf_init_get_resources call stack. The logic is implemented this way that iavf_init_get_resources has to be called in order to allocate adapter->vf_res. It is polling for the AQ response in iavf_get_vf_config function. Expect a call trace from kernel when adminq_task worker handles this message first. adapter->vf_res will be NULL in iavf_virtchnl_completion. Make the watchdog task not queue the adminq_task if the init process is not finished yet. Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Fix locking for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPSSlawomir Laba3-14/+17
[ Upstream commit 0579fafd37fb7efe091f0e6c8ccf968864f40f3e ] iavf_virtchnl_completion is called under crit_lock but when the code for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS is called, this lock is released in order to obtain rtnl_lock to avoid ABBA deadlock with unregister_netdev. Along with the new way iavf_remove behaves, there exist many risks related to the lock release and attmepts to regrab it. The driver faces crashes related to races between unregister_netdev and netdev_update_features. Yet another risk is that the driver could already obtain the crit_lock in order to destroy it and iavf_virtchnl_completion could crash or block forever. Make iavf_virtchnl_completion never relock crit_lock in it's call paths. Extract rtnl_lock locking logic to the driver for unregister_netdev in order to set the netdev_registered flag inside the lock. Introduce a new flag that will inform adminq_task to perform the code from VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS right after it finishes processing messages. Guard this code with remove flags so it's never called when the driver is in remove state. Fixes: 5951a2b9812d ("iavf: Fix VLAN feature flags after VFR") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Fix init state closure on removeSlawomir Laba2-1/+27
[ Upstream commit 3ccd54ef44ebfa0792c5441b6d9c86618f3378d1 ] When init states of the adapter work, the errors like lack of communication with the PF might hop in. If such events occur the driver restores previous states in order to retry initialization in a proper way. When remove task kicks in, this situation could lead to races with unregistering the netdevice as well as resources cleanup. With the commit introducing the waiting in remove for init to complete, this problem turns into an endless waiting if init never recovers from errors. Introduce __IAVF_IN_REMOVE_TASK bit to indicate that the remove thread has started. Make __IAVF_COMM_FAILED adapter state respect the __IAVF_IN_REMOVE_TASK bit and set the __IAVF_INIT_FAILED state and return without any action instead of trying to recover. Make __IAVF_INIT_FAILED adapter state respect the __IAVF_IN_REMOVE_TASK bit and return without any further actions. Make the loop in the remove handler break when adapter has __IAVF_INIT_FAILED state set. Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Add waiting so the port is initialized in removeSlawomir Laba1-11/+16
[ Upstream commit 974578017fc1fdd06cea8afb9dfa32602e8529ed ] There exist races when port is being configured and remove is triggered. unregister_netdev is not and can't be called under crit_lock mutex since it is calling ndo_stop -> iavf_close which requires this lock. Depending on init state the netdev could be still unregistered so unregister_netdev never cleans up, when shortly after that the device could become registered. Make iavf_remove wait until port finishes initialization. All critical state changes are atomic (under crit_lock). Crashes that come from iavf_reset_interrupt_capability and iavf_free_traffic_irqs should now be solved in a graceful manner. Fixes: 605ca7c5c6707 ("iavf: Fix kernel BUG in free_msi_irqs") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Fix kernel BUG in free_msi_irqsPrzemyslaw Patynowski2-0/+56
[ Upstream commit 605ca7c5c670762e36ccb475cfa089d7ad0698e0 ] Fix driver not freeing VF's traffic irqs, prior to calling pci_disable_msix in iavf_remove. There were possible 2 erroneous states in which, iavf_close would not be called. One erroneous state is fixed by allowing netdev to register, when state is already running. It was possible for VF adapter to enter state loop from running to resetting, where iavf_open would subsequently fail. If user would then unload driver/remove VF pci, iavf_close would not be called, as the netdev was not registered, leaving traffic pcis still allocated. Fixed this by breaking loop, allowing netdev to open device when adapter state is __IAVF_RUNNING and it is not explicitily downed. Other possiblity is entering to iavf_remove from __IAVF_RESETTING state, where iavf_close would not free irqs, but just return 0. Fixed this by checking for last adapter state and then removing irqs. Kernel panic: [ 2773.628585] kernel BUG at drivers/pci/msi.c:375! ... [ 2773.631567] RIP: 0010:free_msi_irqs+0x180/0x1b0 ... [ 2773.640939] Call Trace: [ 2773.641572] pci_disable_msix+0xf7/0x120 [ 2773.642224] iavf_reset_interrupt_capability.part.41+0x15/0x30 [iavf] [ 2773.642897] iavf_remove+0x12e/0x500 [iavf] [ 2773.643578] pci_device_remove+0x3b/0xc0 [ 2773.644266] device_release_driver_internal+0x103/0x1f0 [ 2773.644948] pci_stop_bus_device+0x69/0x90 [ 2773.645576] pci_stop_and_remove_bus_device+0xe/0x20 [ 2773.646215] pci_iov_remove_virtfn+0xba/0x120 [ 2773.646862] sriov_disable+0x2f/0xe0 [ 2773.647531] ice_free_vfs+0x2f8/0x350 [ice] [ 2773.648207] ice_sriov_configure+0x94/0x960 [ice] [ 2773.648883] ? _kstrtoull+0x3b/0x90 [ 2773.649560] sriov_numvfs_store+0x10a/0x190 [ 2773.650249] kernfs_fop_write+0x116/0x190 [ 2773.650948] vfs_write+0xa5/0x1a0 [ 2773.651651] ksys_write+0x4f/0xb0 [ 2773.652358] do_syscall_64+0x5b/0x1a0 [ 2773.653075] entry_SYSCALL_64_after_hwframe+0x65/0xca Fixes: 22ead37f8af8 ("i40evf: Add longer wait after remove module") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Add helper function to go from pci_dev to adapterKaren Sornek1-7/+17
[ Upstream commit 247aa001b72b6c8a89df9d108a2ec6f274a6b64d ] Add helper function to go from pci_dev to adapter to make work simple - to go from a pci_dev to the adapter structure and make netdev assignment instead of having to go to the net_device then the adapter. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Karen Sornek <karen.sornek@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Rework mutexes for better synchronisationSlawomir Laba2-31/+36
[ Upstream commit fc2e6b3b132a907378f6af08356b105a4139c4fb ] The driver used to crash in multiple spots when put to stress testing of the init, reset and remove paths. The user would experience call traces or hangs when creating, resetting, removing VFs. Depending on the machines, the call traces are happening in random spots, like reset restoring resources racing with driver remove. Make adapter->crit_lock mutex a mandatory lock for guarding the operations performed on all workqueues and functions dealing with resource allocation and disposal. Make __IAVF_REMOVE a final state of the driver respected by workqueues that shall not requeue, when they fail to obtain the crit_lock. Make the IRQ handler not to queue the new work for adminq_task when the __IAVF_REMOVE state is set. Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Add trace while removing deviceJedrzej Jagielski1-0/+1
[ Upstream commit bdb9e5c7aec73a7b8b5acab37587b6de1203e68d ] Add kernel trace that device was removed. Currently there is no such information. I.e. Host admin removes a PCI device from a VM, than on VM shall be info about the event. This patch adds info log to iavf_remove function. Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Combine init and watchdog state machinesMateusz Palczewski2-80/+57
[ Upstream commit 898ef1cb1cb24040c3e89263e02c605af70c776a ] Use single state machine for driver initialization and for service initialized driver. The init state machine implemented in init_task() is merged into the watchdog_task(). The init_task() function is removed. Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Add __IAVF_INIT_FAILED stateMateusz Palczewski2-15/+21
[ Upstream commit 59756ad6948be91d66867ce458083b820c59b8ba ] This commit adds a new state, __IAVF_INIT_FAILED to the state machine. From now on initialization functions report errors not by returning an error value, but by changing the state to indicate that something went wrong. Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08iavf: Refactor iavf state machine trackingMateusz Palczewski3-18/+31
[ Upstream commit 45eebd62999d37d13568723524b99d828e0ce22c ] Replace state changes of iavf state machine with a method that also tracks the previous state the machine was on. This change is required for further work with refactoring init and watchdog state machines. Tracking of previous state would help us recover iavf after failure has occurred. Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08igc: igc_write_phy_reg_gpy: drop premature returnSasha Neftin1-2/+0
commit c4208653a327a09da1e9e7b10299709b6d9b17bf upstream. Similar to "igc_read_phy_reg_gpy: drop premature return" patch. igc_write_phy_reg_gpy checks the return value from igc_write_phy_reg_mdic and if it's not 0, returns immediately. By doing this, it leaves the HW semaphore in the acquired state. Drop this premature return statement, the function returns after releasing the semaphore immediately anyway. Fixes: 5586838fe9ce ("igc: Add code for PHY support") Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Reported-by: Corinna Vinschen <vinschen@redhat.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08igc: igc_read_phy_reg_gpy: drop premature returnCorinna Vinschen1-2/+0
commit fda2635466cd26ad237e1bc5d3f6a60f97ad09b6 upstream. igc_read_phy_reg_gpy checks the return value from igc_read_phy_reg_mdic and if it's not 0, returns immediately. By doing this, it leaves the HW semaphore in the acquired state. Drop this premature return statement, the function returns after releasing the semaphore immediately anyway. Fixes: 5586838fe9ce ("igc: Add code for PHY support") Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08iavf: Fix deadlock in iavf_reset_taskSlawomir Laba1-0/+1
commit e85ff9c631e1bf109ce8428848dfc8e8b0041f48 upstream. There exists a missing mutex_unlock call on crit_lock in iavf_reset_task call path. Unlock the crit_lock before returning from reset task. Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08ixgbe: xsk: change !netif_carrier_ok() handling in ixgbe_xmit_zc()Maciej Fijalkowski1-2/+4
commit 6c7273a266759d9d36f7c862149f248bcdeddc0f upstream. Commit c685c69fba71 ("ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK") addressed the ring transient state when MEM_TYPE_XSK_BUFF_POOL was being configured which in turn caused the interface to through down/up. Maurice reported that when carrier is not ok and xsk_pool is present on ring pair, ksoftirqd will consume 100% CPU cycles due to the constant NAPI rescheduling as ixgbe_poll() states that there is still some work to be done. To fix this, do not set work_done to false for a !netif_carrier_ok(). Fixes: c685c69fba71 ("ixgbe: don't do any AF_XDP zero-copy transmit if netif is not OK") Reported-by: Maurice Baijens <maurice.baijens@ellips.com> Tested-by: Maurice Baijens <maurice.baijens@ellips.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08iavf: Fix missing check for running netdevSlawomir Laba1-2/+5
commit d2c0f45fcceb0995f208c441d9c9a453623f9ccf upstream. The driver was queueing reset_task regardless of the netdev state. Do not queue the reset task in iavf_change_mtu if netdev is not running. Fixes: fdd4044ffdc8 ("iavf: Remove timer for work triggering, use delaying work instead") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-08e1000e: Correct NVM checksum verification flowSasha Neftin1-2/+2
commit ffd24fa2fcc76ecb2e61e7a4ef8588177bcb42a6 upstream. Update MAC type check e1000_pch_tgp because for e1000_pch_cnp, NVM checksum update is still possible. Emit a more detailed warning message. Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1191663 Fixes: 4051f68318ca ("e1000e: Do not take care about recovery NVM checksum") Reported-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02ice: fix concurrent reset and removal of VFsJacob Keller3-18/+27
commit fadead80fe4c033b5e514fcbadd20b55c4494112 upstream. Commit c503e63200c6 ("ice: Stop processing VF messages during teardown") introduced a driver state flag, ICE_VF_DEINIT_IN_PROGRESS, which is intended to prevent some issues with concurrently handling messages from VFs while tearing down the VFs. This change was motivated by crashes caused while tearing down and bringing up VFs in rapid succession. It turns out that the fix actually introduces issues with the VF driver caused because the PF no longer responds to any messages sent by the VF during its .remove routine. This results in the VF potentially removing its DMA memory before the PF has shut down the device queues. Additionally, the fix doesn't actually resolve concurrency issues within the ice driver. It is possible for a VF to initiate a reset just prior to the ice driver removing VFs. This can result in the remove task concurrently operating while the VF is being reset. This results in similar memory corruption and panics purportedly fixed by that commit. Fix this concurrency at its root by protecting both the reset and removal flows using the existing VF cfg_lock. This ensures that we cannot remove the VF while any outstanding critical tasks such as a virtchnl message or a reset are occurring. This locking change also fixes the root cause originally fixed by commit c503e63200c6 ("ice: Stop processing VF messages during teardown"), so we can simply revert it. Note that I kept these two changes together because simply reverting the original commit alone would leave the driver vulnerable to worse race conditions. Fixes: c503e63200c6 ("ice: Stop processing VF messages during teardown") Cc: <stable@vger.kernel.org> # e6ba5273d4ed: ice: Fix race conditions between virtchnl handling and VF ndo ops Cc: <stable@vger.kernel.org> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02ice: Fix race conditions between virtchnl handling and VF ndo opsBrett Creeley2-0/+30
commit e6ba5273d4ede03d075d7a116b8edad1f6115f4d upstream. The VF can be configured via the PF's ndo ops at the same time the PF is receiving/handling virtchnl messages. This has many issues, with one of them being the ndo op could be actively resetting a VF (i.e. resetting it to the default state and deleting/re-adding the VF's VSI) while a virtchnl message is being handled. The following error was seen because a VF ndo op was used to change a VF's trust setting while the VIRTCHNL_OP_CONFIG_VSI_QUEUES was ongoing: [35274.192484] ice 0000:88:00.0: Failed to set LAN Tx queue context, error: ICE_ERR_PARAM [35274.193074] ice 0000:88:00.0: VF 0 failed opcode 6, retval: -5 [35274.193640] iavf 0000:88:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6 Fix this by making sure the virtchnl handling and VF ndo ops that trigger VF resets cannot run concurrently. This is done by adding a struct mutex cfg_lock to each VF structure. For VF ndo ops, the mutex will be locked around the critical operations and VFR. Since the ndo ops will trigger a VFR, the virtchnl thread will use mutex_trylock(). This is done because if any other thread (i.e. VF ndo op) has the mutex, then that means the current VF message being handled is no longer valid, so just ignore it. This issue can be seen using the following commands: for i in {0..50}; do rmmod ice modprobe ice sleep 1 echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs ip link set ens785f1 vf 0 trust on ip link set ens785f0 vf 0 trust on sleep 2 echo 0 > /sys/class/net/ens785f0/device/sriov_numvfs echo 0 > /sys/class/net/ens785f1/device/sriov_numvfs sleep 1 echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs echo 1 > /sys/class/net/ens785f1/device/sriov_numvfs ip link set ens785f1 vf 0 trust on ip link set ens785f0 vf 0 trust on done Fixes: 7c710869d64e ("ice: Add handlers for VF netdevice operations") Cc: <stable@vger.kernel.org> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> [I had to fix the cherry-pick manually as the patch added a line around some context that was missing.] Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02ice: initialize local variable 'tlv'Tom Rix1-1/+1
commit 5950bdc88dd1d158f2845fdff8fb1de86476806c upstream. Clang static analysis reports this issues ice_common.c:5008:21: warning: The left expression of the compound assignment is an uninitialized value. The computed value will also be garbage ldo->phy_type_low |= ((u64)buf << (i * 16)); ~~~~~~~~~~~~~~~~~ ^ When called from ice_cfg_phy_fec() ldo is the uninitialized local variable tlv. So initialize. Fixes: ea78ce4dab05 ("ice: add link lenient and default override support") Signed-off-by: Tom Rix <trix@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02ice: check the return of ice_ptp_gettimex64Tom Rix1-1/+4
commit ed22d9c8d128293fc7b0b086c7d3654bcb99a8dd upstream. Clang static analysis reports this issue time64.h:69:50: warning: The left operand of '+' is a garbage value set_normalized_timespec64(&ts_delta, lhs.tv_sec + rhs.tv_sec, ~~~~~~~~~~ ^ In ice_ptp_adjtime_nonatomic(), the timespec64 variable 'now' is set by ice_ptp_gettimex64(). This function can fail with -EBUSY, so 'now' can have a gargbage value. So check the return. Fixes: 06c16d89d2cb ("ice: register 1588 PTP clock device object for E810 devices") Signed-off-by: Tom Rix <trix@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"Mateusz Palczewski1-11/+1
commit fe20371578ef640069e6ae9fa8038f60e7908565 upstream. Revert of a patch that instead of fixing a AQ error when trying to reset BW limit introduced several regressions related to creation and managing TC. Currently there are errors when creating a TC on both PF and VF. Error log: [17428.783095] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14 [17428.783107] i40e 0000:3b:00.1: Failed configuring TC map 0 for VSI 391 [17428.783254] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14 [17428.783259] i40e 0000:3b:00.1: Unable to configure TC map 0 for VSI 391 This reverts commit 3d2504663c41104b4359a15f35670cfa82de1bbf. Fixes: 3d2504663c41 (i40e: Fix reset bw limit when DCB enabled with 1 TC) Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220223175347.1690692-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23ice: enable parsing IPSEC SPI headers for RSSJesse Brandeburg1-0/+6
commit 86006f996346e8a5a1ea80637ec949ceeea4ecbc upstream. The COMMS package can enable the hardware parser to recognize IPSEC frames with ESP header and SPI identifier. If this package is available and configured for loading in /lib/firmware, then the driver will succeed in enabling this protocol type for RSS. This in turn allows the hardware to hash over the SPI and use it to pick a consistent receive queue for the same secure flow. Without this all traffic is steered to the same queue for multiple traffic threads from the same IP address. For that reason this is marked as a fix, as the driver supports the model, but it wasn't enabled. If the package is not available, adding this type will fail, but the failure is ignored on purpose as it has no negative affect. Fixes: c90ed40cefe1 ("ice: Enable writing hardware filtering tables") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-16ice: Avoid RTNL lock when re-creating auxiliary deviceDave Ertman2-1/+5
[ Upstream commit 5dbbbd01cbba831233c6ea9a3e6bfa133606d3c0 ] If a call to re-create the auxiliary device happens in a context that has already taken the RTNL lock, then the call flow that recreates auxiliary device can hang if there is another attempt to claim the RTNL lock by the auxiliary driver. To avoid this, any call to re-create auxiliary devices that comes from an source that is holding the RTNL lock (e.g. netdev notifier when interface exits a bond) should execute in a separate thread. To accomplish this, add a flag to the PF that will be evaluated in the service task and dealt with there. Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Reviewed-by: Jonathan Toppins <jtoppins@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16ice: Fix KASAN error in LAG NETDEV_UNREGISTER handlerDave Ertman1-6/+28
[ Upstream commit bea1898f65b9b7096cb4e73e97c83b94718f1fa1 ] Currently, the same handler is called for both a NETDEV_BONDING_INFO LAG unlink notification as for a NETDEV_UNREGISTER call. This is causing a problem though, since the netdev_notifier_info passed has a different structure depending on which event is passed. The problem manifests as a call trace from a BUG: KASAN stack-out-of-bounds error. Fix this by creating a handler specific to NETDEV_UNREGISTER that only is passed valid elements in the netdev_notifier_info struct for the NETDEV_UNREGISTER event. Also included is the removal of an unbalanced dev_put on the peer_netdev and related braces. Fixes: 6a8b357278f5 ("ice: Respond to a NETDEV_UNREGISTER event for LAG") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16ice: fix IPIP and SIT TSO offloadJesse Brandeburg2-8/+18
[ Upstream commit 46b699c50c0304cdbd725d7740073a7f9d5edb10 ] The driver was avoiding offload for IPIP (at least) frames due to parsing the inner header offsets incorrectly when trying to check lengths. This length check works for VXLAN frames but fails on IPIP frames because skb_transport_offset points to the inner header in IPIP frames, which meant the subtraction of transport_header from inner_network_header returns a negative value (-20). With the code before this patch, everything continued to work, but GSO was being used to segment, causing throughputs of 1.5Gb/s per thread. After this patch, throughput is more like 10Gb/s per thread for IPIP traffic. Fixes: e94d44786693 ("ice: Implement filter sync, NDO operations and bump version") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16ice: fix an error code in ice_cfg_phy_fec()Dan Carpenter1-1/+2
[ Upstream commit 21338d58736ef70eaae5fd75d567a358ff7902f9 ] Propagate the error code from ice_get_link_default_override() instead of returning success. Fixes: ea78ce4dab05 ("ice: add link lenient and default override support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-16ixgbevf: Require large buffers for build_skb on 82599VFSamuel Mendoza-Jonas1-6/+7
[ Upstream commit fe68195daf34d5dddacd3f93dd3eafc4beca3a0e ] From 4.17 onwards the ixgbevf driver uses build_skb() to build an skb around new data in the page buffer shared with the ixgbe PF. This uses either a 2K or 3K buffer, and offsets the DMA mapping by NET_SKB_PAD + NET_IP_ALIGN. When using a smaller buffer RXDCTL is set to ensure the PF does not write a full 2K bytes into the buffer, which is actually 2K minus the offset. However on the 82599 virtual function, the RXDCTL mechanism is not available. The driver attempts to work around this by using the SET_LPE mailbox method to lower the maximm frame size, but the ixgbe PF driver ignores this in order to keep the PF and all VFs in sync[0]. This means the PF will write up to the full 2K set in SRRCTL, causing it to write NET_SKB_PAD + NET_IP_ALIGN bytes past the end of the buffer. With 4K pages split into two buffers, this means it either writes NET_SKB_PAD + NET_IP_ALIGN bytes past the first buffer (and into the second), or NET_SKB_PAD + NET_IP_ALIGN bytes past the end of the DMA mapping. Avoid this by only enabling build_skb when using "large" buffers (3K). These are placed in each half of an order-1 page, preventing the PF from writing past the end of the mapping. [0]: Technically it only ever raises the max frame size, see ixgbe_set_vf_lpe() in ixgbe_sriov.c Fixes: f15c5ba5b6cd ("ixgbevf: add support for using order 1 pages to receive large frames") Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-08e1000e: Separate ADP board type from TGPSasha Neftin3-17/+40
commit 68defd528f94ed1cf11f49a75cc1875dccd781fa upstream. We have the same LAN controller on different PCH's. Separate ADP board type from a TGP which will allow for specific fixes to be applied for ADP platforms. Suggested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-05e1000e: Handshake with CSME starts from ADL platformsSasha Neftin1-2/+4
commit cad014b7b5a6897d8c4fad13e2888978bfb7a53f upstream. Handshake with CSME/AMT on none provisioned platforms during S0ix flow is not supported on TGL platform and can cause to HW unit hang. Update the handshake with CSME flow to start from the ADL platform. Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix") Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-05i40e: Fix reset path while removing the driverKaren Sornek2-1/+19
commit 6533e558c6505e94c3e0ed4281ed5e31ec985f4d upstream. Fix the crash in kernel while dereferencing the NULL pointer, when the driver is unloaded and simultaneously the VSI rings are being stopped. The hardware requires 50msec in order to finish RX queues disable. For this purpose the driver spins in mdelay function for the operation to be completed. For example changing number of queues which requires reset would fail in the following call stack: 1) i40e_prep_for_reset 2) i40e_pf_quiesce_all_vsi 3) i40e_quiesce_vsi 4) i40e_vsi_close 5) i40e_down 6) i40e_vsi_stop_rings 7) i40e_vsi_control_rx -> disable requires the delay of 50msecs 8) continue back in i40e_down function where i40e_clean_tx_ring(vsi->tx_rings[i]) is going to crash When the driver was spinning vsi_release called i40e_vsi_free_arrays where the vsi->tx_rings resources were freed and the pointer was set to NULL. Fixes: 5b6d4a7f20b0 ("i40e: Fix crash during removing i40e driver") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Karen Sornek <karen.sornek@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-05i40e: Fix reset bw limit when DCB enabled with 1 TCJedrzej Jagielski1-1/+11
commit 3d2504663c41104b4359a15f35670cfa82de1bbf upstream. There was an AQ error I40E_AQ_RC_EINVAL when trying to reset bw limit as part of bw allocation setup. This was caused by trying to reset bw limit with DCB enabled. Bw limit should not be reset when DCB is enabled. The code was relying on the pf->flags to check if DCB is enabled but if only 1 TC is available this flag will not be set even though DCB is enabled. Add a check for number of TC and if it is 1 don't try to reset bw limit even if pf->flags shows DCB as disabled. Fixes: fa38e30ac73f ("i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled") Suggested-by: Alexander Lobakin <alexandr.lobakin@intel.com> # Flatten the condition Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Imam Hassan Reza Biswas <imam.hassan.reza.biswas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-01i40e: fix unsigned stat widthsJoe Damato3-7/+7
commit 3b8428b84539c78fdc8006c17ebd25afd4722d51 upstream. Change i40e_update_vsi_stats and struct i40e_vsi to use u64 fields to match the width of the stats counters in struct i40e_rx_queue_stats. Update debugfs code to use the correct format specifier for u64. Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Joe Damato <jdamato@fastly.com> Reported-by: kernel test robot <lkp@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-01i40e: Fix for failed to init adminq while VF resetKaren Sornek3-2/+46
commit 0f344c8129a5337dae50e31b817dd50a60ff238c upstream. Fix for failed to init adminq: -53 while VF is resetting via MAC address changing procedure. Added sync module to avoid reading deadbeef value in reinit adminq during software reset. Without this patch it is possible to trigger VF reset procedure during reinit adminq. This resulted in an incorrect reading of value from the AQP registers and generated the -53 error. Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface") Signed-off-by: Grzegorz Szczurek <grzegorzx.szczurek@intel.com> Signed-off-by: Karen Sornek <karen.sornek@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-01i40e: Fix queues reservation for XDPSylwester Dziedziuch1-0/+14
commit 92947844b8beee988c0ce17082b705c2f75f0742 upstream. When XDP was configured on a system with large number of CPUs and X722 NIC there was a call trace with NULL pointer dereference. i40e 0000:87:00.0: failed to get tracking for 256 queues for VSI 0 err -12 i40e 0000:87:00.0: setup of MAIN VSI failed BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:i40e_xdp+0xea/0x1b0 [i40e] Call Trace: ? i40e_reconfig_rss_queues+0x130/0x130 [i40e] dev_xdp_install+0x61/0xe0 dev_xdp_attach+0x18a/0x4c0 dev_change_xdp_fd+0x1e6/0x220 do_setlink+0x616/0x1030 ? ahci_port_stop+0x80/0x80 ? ata_qc_issue+0x107/0x1e0 ? lock_timer_base+0x61/0x80 ? __mod_timer+0x202/0x380 rtnl_setlink+0xe5/0x170 ? bpf_lsm_binder_transaction+0x10/0x10 ? security_capable+0x36/0x50 rtnetlink_rcv_msg+0x121/0x350 ? rtnl_calcit.isra.0+0x100/0x100 netlink_rcv_skb+0x50/0xf0 netlink_unicast+0x1d3/0x2a0 netlink_sendmsg+0x22a/0x440 sock_sendmsg+0x5e/0x60 __sys_sendto+0xf0/0x160 ? __sys_getsockname+0x7e/0xc0 ? _copy_from_user+0x3c/0x80 ? __sys_setsockopt+0xc8/0x1a0 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f83fa7a39e0 This was caused by PF queue pile fragmentation due to flow director VSI queue being placed right after main VSI. Because of this main VSI was not able to resize its queue allocation for XDP resulting in no queues allocated for main VSI when XDP was turned on. Fix this by always allocating last queue in PF queue pile for a flow director VSI. Fixes: 41c445ff0f48 ("i40e: main driver core") Fixes: 74608d17fe29 ("i40e: add support for XDP_TX action") Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-01i40e: Fix issue when maximum queues is exceededJedrzej Jagielski3-13/+61
commit d701658a50a471591094b3eb3961b4926cc8f104 upstream. Before this patch VF interface vanished when maximum queue number was exceeded. Driver tried to add next queues even if there was not enough space. PF sent incorrect number of queues to the VF when there were not enough of them. Add an additional condition introduced to check available space in 'qp_pile' before proceeding. This condition makes it impossible to add queues if they number is greater than the number resulting from available space. Also add the search for free space in PF queue pair piles. Without this patch VF interfaces are not seen when available space for queues has been exceeded and following logs appears permanently in dmesg: "Unable to get VF config (-32)". "VF 62 failed opcode 3, retval: -5" "Unable to get VF config due to PF error condition, not retrying" Fixes: 7daa6bf3294e ("i40e: driver core headers") Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Jaroslaw Gawin <jaroslawx.gawin@intel.com> Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-01i40e: Increase delay to 1 s after global EMP resetJedrzej Jagielski1-9/+3
commit 9b13bd53134c9ddd544a790125199fdbdb505e67 upstream. Recently simplified i40e_rebuild causes that FW sometimes is not ready after NVM update, the ping does not return. Increase the delay in case of EMP reset. Old delay of 300 ms was introduced for specific cards for 710 series. Now it works for all the cards and delay was increased. Fixes: 1fa51a650e1d ("i40e: Add delay after EMP reset for firmware to recover") Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>