summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/i40e
AgeCommit message (Collapse)AuthorFilesLines
2024-07-12i40e: fix: remove needless retries of NVM updateAleksandr Loktionov1-4/+0
Remove wrong EIO to EGAIN conversion and pass all errors as is. After commit 230f3d53a547 ("i40e: remove i40e_status"), which should only replace F/W specific error codes with Linux kernel generic, all EIO errors suddenly started to be converted into EAGAIN which leads nvmupdate to retry until it timeouts and sometimes fails after more than 20 minutes in the middle of NVM update, so NVM becomes corrupted. The bug affects users only at the time when they try to update NVM, and only F/W versions that generate errors while nvmupdate. For example, X710DA2 with 0x8000ECB7 F/W is affected, but there are probably more... Command for reproduction is just NVM update: ./nvmupdate64 In the log instead of: i40e_nvmupd_exec_aq err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_ENOMEM) appears: i40e_nvmupd_exec_aq err -EIO aq_err I40E_AQ_RC_ENOMEM i40e: eeprom check failed (-5), Tx/Rx traffic disabled The problematic code did silently convert EIO into EAGAIN which forced nvmupdate to ignore EAGAIN error and retry the same operation until timeout. That's why NVM update takes 20+ minutes to finish with the fail in the end. Fixes: 230f3d53a547 ("i40e: remove i40e_status") Co-developed-by: Kelvin Kang <kelvin.kang@intel.com> Signed-off-by: Kelvin Kang <kelvin.kang@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20240710224455.188502-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-10i40e: Fix XDP program unloading while removing the driverMichal Kubiak1-5/+4
The commit 6533e558c650 ("i40e: Fix reset path while removing the driver") introduced a new PF state "__I40E_IN_REMOVE" to block modifying the XDP program while the driver is being removed. Unfortunately, such a change is useful only if the ".ndo_bpf()" callback was called out of the rmmod context because unloading the existing XDP program is also a part of driver removing procedure. In other words, from the rmmod context the driver is expected to unload the XDP program without reporting any errors. Otherwise, the kernel warning with callstack is printed out to dmesg. Example failing scenario: 1. Load the i40e driver. 2. Load the XDP program. 3. Unload the i40e driver (using "rmmod" command). The example kernel warning log: [ +0.004646] WARNING: CPU: 94 PID: 10395 at net/core/dev.c:9290 unregister_netdevice_many_notify+0x7a9/0x870 [...] [ +0.010959] RIP: 0010:unregister_netdevice_many_notify+0x7a9/0x870 [...] [ +0.002726] Call Trace: [ +0.002457] <TASK> [ +0.002119] ? __warn+0x80/0x120 [ +0.003245] ? unregister_netdevice_many_notify+0x7a9/0x870 [ +0.005586] ? report_bug+0x164/0x190 [ +0.003678] ? handle_bug+0x3c/0x80 [ +0.003503] ? exc_invalid_op+0x17/0x70 [ +0.003846] ? asm_exc_invalid_op+0x1a/0x20 [ +0.004200] ? unregister_netdevice_many_notify+0x7a9/0x870 [ +0.005579] ? unregister_netdevice_many_notify+0x3cc/0x870 [ +0.005586] unregister_netdevice_queue+0xf7/0x140 [ +0.004806] unregister_netdev+0x1c/0x30 [ +0.003933] i40e_vsi_release+0x87/0x2f0 [i40e] [ +0.004604] i40e_remove+0x1a1/0x420 [i40e] [ +0.004220] pci_device_remove+0x3f/0xb0 [ +0.003943] device_release_driver_internal+0x19f/0x200 [ +0.005243] driver_detach+0x48/0x90 [ +0.003586] bus_remove_driver+0x6d/0xf0 [ +0.003939] pci_unregister_driver+0x2e/0xb0 [ +0.004278] i40e_exit_module+0x10/0x5f0 [i40e] [ +0.004570] __do_sys_delete_module.isra.0+0x197/0x310 [ +0.005153] do_syscall_64+0x85/0x170 [ +0.003684] ? syscall_exit_to_user_mode+0x69/0x220 [ +0.004886] ? do_syscall_64+0x95/0x170 [ +0.003851] ? exc_page_fault+0x7e/0x180 [ +0.003932] entry_SYSCALL_64_after_hwframe+0x71/0x79 [ +0.005064] RIP: 0033:0x7f59dc9347cb [ +0.003648] Code: 73 01 c3 48 8b 0d 65 16 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 16 0c 00 f7 d8 64 89 01 48 [ +0.018753] RSP: 002b:00007ffffac99048 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ +0.007577] RAX: ffffffffffffffda RBX: 0000559b9bb2f6e0 RCX: 00007f59dc9347cb [ +0.007140] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000559b9bb2f748 [ +0.007146] RBP: 00007ffffac99070 R08: 1999999999999999 R09: 0000000000000000 [ +0.007133] R10: 00007f59dc9a5ac0 R11: 0000000000000206 R12: 0000000000000000 [ +0.007141] R13: 00007ffffac992d8 R14: 0000559b9bb2f6e0 R15: 0000000000000000 [ +0.007151] </TASK> [ +0.002204] ---[ end trace 0000000000000000 ]--- Fix this by checking if the XDP program is being loaded or unloaded. Then, block only loading a new program while "__I40E_IN_REMOVE" is set. Also, move testing "__I40E_IN_REMOVE" flag to the beginning of XDP_SETUP callback to avoid unnecessary operations and checks. Fixes: 6533e558c650 ("i40e: Fix reset path while removing the driver") Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20240708230750.625986-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-30i40e: Fully suspend and resume IO operations in EEH caseThinh Tran1-3/+6
When EEH events occurs, the callback functions in the i40e, which are managed by the EEH driver, will completely suspend and resume all IO operations. - In the PCI error detected callback, replaced i40e_prep_for_reset() with i40e_io_suspend(). The change is to fully suspend all I/O operations - In the PCI error slot reset callback, replaced pci_enable_device_mem() with pci_enable_device(). This change enables both I/O and memory of the device. - In the PCI error resume callback, replaced i40e_handle_reset_warning() with i40e_io_resume(). This change allows the system to resume I/O operations Fixes: a5f3d2c17b07 ("powerpc/pseries/pci: Add MSI domains") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Robert Thomas <rob.thomas@ibm.com> Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-3-dc8593d2bbc6@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-30i40e: factoring out i40e_suspend/i40e_resumeThinh Tran1-114/+135
Two new functions, i40e_io_suspend() and i40e_io_resume(), have been introduced. These functions were factored out from the existing i40e_suspend() and i40e_resume() respectively. This factoring was done due to concerns about the logic of the I40E_SUSPENSED state, which caused the device to be unable to recover. The functions are now used in the EEH handling for device suspend/resume callbacks. The function i40e_enable_mc_magic_wake() has been moved ahead of i40e_io_suspend() to ensure it is declared before being used. Tested-by: Robert Thomas <rob.thomas@ibm.com> Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-2-dc8593d2bbc6@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-23tracing/treewide: Remove second parameter of __assign_str()Steven Rostedt (Google)1-5/+5
With the rework of how the __string() handles dynamic strings where it saves off the source string in field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with: __string(field, mystring) Which use to be assigned with __assign_str(field, mystring), no longer needs the second parameter and it is unused. With this, __assign_str() will now only get a single parameter. There's over 700 users of __assign_str() and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str | while read a ; do sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() that did not end with ';' as those were multi line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>
2024-05-20Merge tag 'dma-mapping-6.10-2024-05-20' of ↵Linus Torvalds1-1/+1
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - optimize DMA sync calls when they are no-ops (Alexander Lobakin) - fix swiotlb padding for untrusted devices (Michael Kelley) - add documentation for swiotb (Michael Kelley) * tag 'dma-mapping-6.10-2024-05-20' of git://git.infradead.org/users/hch/dma-mapping: dma: fix DMA sync for drivers not calling dma_set_mask*() xsk: use generic DMA sync shortcut instead of a custom one page_pool: check for DMA sync shortcut earlier page_pool: don't use driver-set flags field directly page_pool: make sure frag API fields don't span between cachelines iommu/dma: avoid expensive indirect calls for sync operations dma: avoid redundant calls for sync operations dma: compile-out DMA sync op calls when not used iommu/dma: fix zeroing of bounce buffer padding used by untrusted devices swiotlb: remove alloc_size argument to swiotlb_tbl_map_single() Documentation/core-api: add swiotlb documentation
2024-05-08i40e: flower: validate control flagsAsbjørn Sloth Tønnesen1-0/+4
This driver currently doesn't support any control flags. Use flow_rule_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-05-08xsk: use generic DMA sync shortcut instead of a custom oneAlexander Lobakin1-1/+1
XSk infra's been using its own DMA sync shortcut to try avoiding redundant function calls. Now that there is a generic one, remove the custom implementation and rely on the generic helpers. xsk_buff_dma_sync_for_cpu() doesn't need the second argument anymore, remove it. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-05-08net: annotate writes on dev->mtu from ndo_change_mtu()Eric Dumazet1-1/+1
Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted in commit 501a90c94510 ("inet: protect against too small mtu values.") We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time to take care of ndo_change_mtu() methods to use corresponding WRITE_ONCE() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30i40e: Add and use helper to reconfigure TC for given VSIIvan Vecera1-8/+24
Add helper i40e_vsi_reconfig_tc(vsi) that configures TC for given VSI using previously stored TC bitmap. Effectively replaces open-coded patterns: enabled_tc = vsi->tc_config.enabled_tc; vsi->tc_config.enabled_tc = 0; i40e_vsi_config_tc(vsi, enabled_tc); Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-30i40e: Add helper to access main VEBIvan Vecera3-16/+31
Add a helper to access main VEB: i40e_pf_get_main_veb(pf) replaces 'pf->veb[pf->lan_veb]' Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-30i40e: Consolidate checks whether given VSI is mainIvan Vecera3-17/+16
In the driver code there are 3 types of checks whether given VSI is main or not: 1. vsi->type ==/!= I40E_VSI_MAIN 2. vsi ==/!= pf->vsi[pf->lan_vsi] 3. vsi->seid ==/!= pf->vsi[pf->lan_vsi]->seid All of them are equivalent and can be consolidated. Convert cases 2 and 3 to case 1. Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-30i40e: Add helper to access main VSIIvan Vecera9-83/+116
Add simple helper i40e_pf_get_main_vsi(pf) to access main VSI that replaces pattern 'pf->vsi[pf->lan_vsi]' Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-30i40e: Refactor argument of i40e_detect_recover_hung()Ivan Vecera3-6/+8
Commit 07d44190a389 ("i40e/i40evf: Detect and recover hung queue scenario") changes i40e_detect_recover_hung() argument type from i40e_pf* to i40e_vsi* to be shareable by both i40e and i40evf. Because the i40evf does not exist anymore and the function is exclusively used by i40e we can revert this change. Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-30i40e: Refactor argument of several client notification functionsIvan Vecera3-19/+17
Commit 0ef2d5afb12d ("i40e: KISS the client interface") simplified the client interface so in practice it supports only one client per i40e netdev. But we have still 2 notification functions that uses as parameter a pointer to VSI of netdevice associated with the client. After the mentioned commit only possible and used VSI is the main (LAN) VSI. So refactor these functions so they are called with PF pointer argument and the associated VSI (LAN) is taken inside them. Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-30i40e: Remove flags field from i40e_vebIvan Vecera3-11/+7
The field is initialized always to zero and it is never read. Remove it. Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-26Merge branch '40GbE' of ↵Jakub Kicinski5-403/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== net: intel: start The Great Code Dedup + Page Pool for iavf Alexander Lobakin says: Here's a two-shot: introduce {,Intel} Ethernet common library (libeth and libie) and switch iavf to Page Pool. Details are in the commit messages; here's a summary: Not a secret there's a ton of code duplication between two and more Intel ethernet modules. Before introducing new changes, which would need to be copied over again, start decoupling the already existing duplicate functionality into a new module, which will be shared between several Intel Ethernet drivers. The first name that came to my mind was "libie" -- "Intel Ethernet common library". Also this sounds like "lovelie" (-> one word, no "lib I E" pls) and can be expanded as "lib Internet Explorer" :P The "generic", pure-software part is placed separately, so that it can be easily reused in any driver by any vendor without linking to the Intel pre-200G guts. In a few words, it's something any modern driver does the same way, but nobody moved it level up (yet). The series is only the beginning. From now on, adding every new feature or doing any good driver refactoring will remove much more lines than add for quite some time. There's a basic roadmap with some deduplications planned already, not speaking of that touching every line now asks: "can I share this?". The final destination is very ambitious: have only one unified driver for at least i40e, ice, iavf, and idpf with a struct ops for each generation. That's never gonna happen, right? But you still can at least try. PP conversion for iavf lands within the same series as these two are tied closely. libie will support Page Pool model only, so that a driver can't use much of the lib until it's converted. iavf is only the example, the rest will eventually be converted soon on a per-driver basis. That is when it gets really interesting. Stay tech. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: MAINTAINERS: add entry for libeth and libie iavf: switch to Page Pool iavf: pack iavf_ring more efficiently libeth: add Rx buffer management page_pool: add DMA-sync-for-CPU inline helper page_pool: constify some read-only function arguments slab: introduce kvmalloc_array_node() and kvcalloc_node() iavf: drop page splitting and recycling iavf: kill "legacy-rx" for good net: intel: introduce {, Intel} Ethernet common library ==================== Link: https://lore.kernel.org/r/20240424203559.3420468-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-3/+3
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/ti/icssg/icssg_prueth.c net/mac80211/chan.c 89884459a0b9 ("wifi: mac80211: fix idle calculation with multi-link") 87f5500285fb ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()") https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/ net/unix/garbage.c 1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().") 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") drivers/net/ethernet/ti/icssg/icssg_prueth.c drivers/net/ethernet/ti/icssg/icssg_common.c 4dcd0e83ea1d ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()") e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-25i40e: Report MFS in decimal base instead of hexErwan Velu1-2/+2
If the MFS is set below the default (0x2600), a warning message is reported like the following : MFS for port 1 has been set below the default: 600 This message is a bit confusing as the number shown here (600) is in fact an hexa number: 0x600 = 1536 Without any explicit "0x" prefix, this message is read like the MFS is set to 600 bytes. MFS, as per MTUs, are usually expressed in decimal base. This commit reports both current and default MFS values in decimal so it's less confusing for end-users. A typical warning message looks like the following : MFS for port 1 (1536) has been set below the default (9728) Signed-off-by: Erwan Velu <e.velu@criteo.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Fixes: 3a2c6ced90e1 ("i40e: Add a check to see if MFS is set") Link: https://lore.kernel.org/r/20240423182723.740401-3-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-25i40e: Do not use WQ_MEM_RECLAIM flag for workqueueSindhu Devale1-1/+1
Issue reported by customer during SRIOV testing, call trace: When both i40e and the i40iw driver are loaded, a warning in check_flush_dependency is being triggered. This seems to be because of the i40e driver workqueue is allocated with the WQ_MEM_RECLAIM flag, and the i40iw one is not. Similar error was encountered on ice too and it was fixed by removing the flag. Do the same for i40e too. [Feb 9 09:08] ------------[ cut here ]------------ [ +0.000004] workqueue: WQ_MEM_RECLAIM i40e:i40e_service_task [i40e] is flushing !WQ_MEM_RECLAIM infiniband:0x0 [ +0.000060] WARNING: CPU: 0 PID: 937 at kernel/workqueue.c:2966 check_flush_dependency+0x10b/0x120 [ +0.000007] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_timer snd_seq_device snd soundcore nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm cifs_md4 dns_resolver netfs qrtr rfkill sunrpc vfat fat intel_rapl_msr intel_rapl_common irdma intel_uncore_frequency intel_uncore_frequency_common ice ipmi_ssif isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp gnss coretemp ib_uverbs rapl intel_cstate ib_core iTCO_wdt iTCO_vendor_support acpi_ipmi mei_me ipmi_si intel_uncore ioatdma i2c_i801 joydev pcspkr mei ipmi_devintf lpc_ich intel_pch_thermal i2c_smbus ipmi_msghandler acpi_power_meter acpi_pad xfs libcrc32c ast sd_mod drm_shmem_helper t10_pi drm_kms_helper sg ixgbe drm i40e ahci crct10dif_pclmul libahci crc32_pclmul igb crc32c_intel libata ghash_clmulni_intel i2c_algo_bit mdio dca wmi dm_mirror dm_region_hash dm_log dm_mod fuse [ +0.000050] CPU: 0 PID: 937 Comm: kworker/0:3 Kdump: loaded Not tainted 6.8.0-rc2-Feb-net_dev-Qiueue-00279-gbd43c5687e05 #1 [ +0.000003] Hardware name: Intel Corporation S2600BPB/S2600BPB, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020 [ +0.000001] Workqueue: i40e i40e_service_task [i40e] [ +0.000024] RIP: 0010:check_flush_dependency+0x10b/0x120 [ +0.000003] Code: ff 49 8b 54 24 18 48 8d 8b b0 00 00 00 49 89 e8 48 81 c6 b0 00 00 00 48 c7 c7 b0 97 fa 9f c6 05 8a cc 1f 02 01 e8 35 b3 fd ff <0f> 0b e9 10 ff ff ff 80 3d 78 cc 1f 02 00 75 94 e9 46 ff ff ff 90 [ +0.000002] RSP: 0018:ffffbd294976bcf8 EFLAGS: 00010282 [ +0.000002] RAX: 0000000000000000 RBX: ffff94d4c483c000 RCX: 0000000000000027 [ +0.000001] RDX: ffff94d47f620bc8 RSI: 0000000000000001 RDI: ffff94d47f620bc0 [ +0.000001] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffff7fff [ +0.000001] R10: ffffbd294976bb98 R11: ffffffffa0be65e8 R12: ffff94c5451ea180 [ +0.000001] R13: ffff94c5ab5e8000 R14: ffff94c5c20b6e05 R15: ffff94c5f1330ab0 [ +0.000001] FS: 0000000000000000(0000) GS:ffff94d47f600000(0000) knlGS:0000000000000000 [ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000001] CR2: 00007f9e6f1fca70 CR3: 0000000038e20004 CR4: 00000000007706f0 [ +0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ +0.000001] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ +0.000001] PKRU: 55555554 [ +0.000001] Call Trace: [ +0.000001] <TASK> [ +0.000002] ? __warn+0x80/0x130 [ +0.000003] ? check_flush_dependency+0x10b/0x120 [ +0.000002] ? report_bug+0x195/0x1a0 [ +0.000005] ? handle_bug+0x3c/0x70 [ +0.000003] ? exc_invalid_op+0x14/0x70 [ +0.000002] ? asm_exc_invalid_op+0x16/0x20 [ +0.000006] ? check_flush_dependency+0x10b/0x120 [ +0.000002] ? check_flush_dependency+0x10b/0x120 [ +0.000002] __flush_workqueue+0x126/0x3f0 [ +0.000015] ib_cache_cleanup_one+0x1c/0xe0 [ib_core] [ +0.000056] __ib_unregister_device+0x6a/0xb0 [ib_core] [ +0.000023] ib_unregister_device_and_put+0x34/0x50 [ib_core] [ +0.000020] i40iw_close+0x4b/0x90 [irdma] [ +0.000022] i40e_notify_client_of_netdev_close+0x54/0xc0 [i40e] [ +0.000035] i40e_service_task+0x126/0x190 [i40e] [ +0.000024] process_one_work+0x174/0x340 [ +0.000003] worker_thread+0x27e/0x390 [ +0.000001] ? __pfx_worker_thread+0x10/0x10 [ +0.000002] kthread+0xdf/0x110 [ +0.000002] ? __pfx_kthread+0x10/0x10 [ +0.000002] ret_from_fork+0x2d/0x50 [ +0.000003] ? __pfx_kthread+0x10/0x10 [ +0.000001] ret_from_fork_asm+0x1b/0x30 [ +0.000004] </TASK> [ +0.000001] ---[ end trace 0000000000000000 ]--- Fixes: 4d5957cbdecd ("i40e: remove WQ_UNBOUND and the task limit of our workqueue") Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Robert Ganzynkowicz <robert.ganzynkowicz@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240423182723.740401-2-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24net: intel: introduce {, Intel} Ethernet common libraryAlexander Lobakin5-403/+18
Not a secret there's a ton of code duplication between two and more Intel ethernet modules. Before introducing new changes, which would need to be copied over again, start decoupling the already existing duplicate functionality into a new module, which will be shared between several Intel Ethernet drivers. Add the lookup table which converts 8/10-bit hardware packet type into a parsed bitfield structure for easy checking packet format parameters, such as payload level, IP version, etc. This is currently used by i40e, ice and iavf and it's all the same in all three drivers. The only difference introduced in this implementation is that instead of defining a 256 (or 1024 in case of ice) element array, add unlikely() condition to limit the input to 154 (current maximum non-reserved packet type). There's no reason to waste 600 (or even 3600) bytes only to not hurt very unlikely exception packets. The hash computation function now takes payload level directly as a pkt_hash_type. There's a couple cases when non-IP ptypes are marked as L3 payload and in the previous versions their hash level would be 2, not 3. But skb_set_hash() only sees difference between L4 and non-L4, thus this won't change anything at all. The module is behind the hidden Kconfig symbol, which the drivers will select when needed. The exports are behind 'LIBIE' namespace to limit the scope of the functions. Not that non-HW-specific symbols will live in yet another module, libeth. This is done to easily distinguish pretty generic code ready for reusing by any other vendor and/or for moving the layer up from the code useful in Intel's 1-100G drivers only. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski6-46/+99
Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/ip_gre.c 17af420545a7 ("erspan: make sure erspan_base_hdr is present in skb->head") 5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps") https://lore.kernel.org/all/20240402103253.3b54a1cf@canb.auug.org.au/ Adjacent changes: net/ipv6/ip6_fib.c d21d40605bca ("ipv6: Fix infinite recursion in fib6_dump_done().") 5fc68320c1fb ("ipv6: remove RTNL protection from inet6_dump_fib()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02Merge branch '1GbE' of ↵Jakub Kicinski2-527/+493
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-03-29 (net: intel) This series contains updates to most Intel drivers. Jesse moves declaration of pci_driver struct to remove need for forward declarations in igb and converts Intel drivers to user newer power management ops. Sasha reworks power management flow on igc to avoid using rtnl_lock() during those flows. Maciej reorganizes i40e_nvm file to avoid forward declarations. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: i40e: avoid forward declarations in i40e_nvm.c igc: Refactor runtime power management flow net: intel: implement modern PM ops declarations igb: simplify pci ops declaration ==================== Link: https://lore.kernel.org/r/20240329175632.211340-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02i40e: Fix VF MAC filter removalIvan Vecera1-5/+6
Commit 73d9629e1c8c ("i40e: Do not allow untrusted VF to remove administratively set MAC") fixed an issue where untrusted VF was allowed to remove its own MAC address although this was assigned administratively from PF. Unfortunately the introduced check is wrong because it causes that MAC filters for other MAC addresses including multi-cast ones are not removed. <snip> if (ether_addr_equal(addr, vf->default_lan_addr.addr) && i40e_can_vf_change_mac(vf)) was_unimac_deleted = true; else continue; if (i40e_del_mac_filter(vsi, al->list[i].addr)) { ... </snip> The else path with `continue` effectively skips any MAC filter removal except one for primary MAC addr when VF is allowed to do so. Fix the check condition so the `continue` is only done for primary MAC address. Fixes: 73d9629e1c8c ("i40e: Do not allow untrusted VF to remove administratively set MAC") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240329180638.211412-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-30netlink: introduce type-checking attribute iterationJohannes Berg1-6/+2
There are, especially with multi-attr arrays, many cases of needing to iterate all attributes of a specific type in a netlink message or a nested attribute. Add specific macros to support that case. Also convert many instances using this spatch: @@ iterator nla_for_each_attr; iterator name nla_for_each_attr_type; identifier nla; expression head, len, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_attr(nla, head, len, rem) +nla_for_each_attr_type(nla, ATTR, head, len, rem) { <... T x; ...> -if (nla_type(nla) == ATTR) { ... -} } @@ identifier nla; iterator nla_for_each_nested; iterator name nla_for_each_nested_type; expression attr, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_nested(nla, attr, rem) +nla_for_each_nested_type(nla, ATTR, attr, rem) { <... T x; ...> -if (nla_type(nla) == ATTR) { ... -} } @@ iterator nla_for_each_attr; iterator name nla_for_each_attr_type; identifier nla; expression head, len, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_attr(nla, head, len, rem) +nla_for_each_attr_type(nla, ATTR, head, len, rem) { <... T x; ...> -if (nla_type(nla) != ATTR) continue; ... } @@ identifier nla; iterator nla_for_each_nested; iterator name nla_for_each_nested_type; expression attr, rem; expression ATTR; type T; identifier x; @@ -nla_for_each_nested(nla, attr, rem) +nla_for_each_nested_type(nla, ATTR, attr, rem) { <... T x; ...> -if (nla_type(nla) != ATTR) continue; ... } Although I had to undo one bad change this made, and I also adjusted some other code for whitespace and to use direct variable initialization now. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20240328203144.b5a6c895fb80.I1869b44767379f204998ff44dd239803f39c23e0@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29i40e: avoid forward declarations in i40e_nvm.cMaciej Fijalkowski1-521/+489
Move code around to get rid of forward declarations. No functional changes. After a plain code juggling, checkpatch reported: total: 0 errors, 7 warnings, 12 checks, 1581 lines checked so while at it let's address old issues as well. Should we ever address the remaining unnecessary forward declarations within drivers/net/ethernet/intel/, consider this change as a starting point/reference. As reported in [0], there would be a lot more of work to do...if we care. [0]: https://lore.kernel.org/intel-wired-lan/Zeh8qadiTGf413YU@boxer/T/#u Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-29net: intel: implement modern PM ops declarationsJesse Brandeburg1-6/+4
Switch the Intel networking drivers to use the new power management ops declaration formats and macros, which allows us to drop __maybe_unused, as well as a bunch of ifdef checking CONFIG_PM. This is safe to do because the compiler drops the unused functions, verified by checking for any of the power management function symbols being present in System.map for a build without CONFIG_PM. If a driver has runtime PM, define the ops with pm_ptr(), and if the driver has Simple PM, use pm_sleep_ptr(), as well as the new versions of the macros for declaring the members of the pm_ops structs. Checked with network-enabled allnoconfig, allyesconfig, allmodconfig on x64_64. Reviewed-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-29net: remove gfp_mask from napi_alloc_skb()Jakub Kicinski2-5/+2
__napi_alloc_skb() is napi_alloc_skb() with the added flexibility of choosing gfp_mask. This is a NAPI function, so GFP_ATOMIC is implied. The only practical choice the caller has is whether to set __GFP_NOWARN. But that's a false choice, too, allocation failures in atomic context will happen, and printing warnings in logs, effectively for a packet drop, is both too much and very likely non-actionable. This leads me to a conclusion that most uses of napi_alloc_skb() are simply misguided, and should use __GFP_NOWARN in the first place. We also have a "standard" way of reporting allocation failures via the queue stat API (qstats::rx-alloc-fail). The direct motivation for this patch is that one of the drivers used at Meta calls napi_alloc_skb() (so prior to this patch without __GFP_NOWARN), and the resulting OOM warning is the top networking warning in our fleet. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240327040213.3153864-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-26i40e: fix vf may be used uninitialized in this function warningAleksandr Loktionov1-18/+16
To fix the regression introduced by commit 52424f974bc5, which causes servers hang in very hard to reproduce conditions with resets races. Using two sources for the information is the root cause. In this function before the fix bumping v didn't mean bumping vf pointer. But the code used this variables interchangeably, so stale vf could point to different/not intended vf. Remove redundant "v" variable and iterate via single VF pointer across whole function instead to guarantee VF pointer validity. Fixes: 52424f974bc5 ("i40e: Fix VF hang when reset is triggered on another VF") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-26i40e: fix i40e_count_filters() to count only active/new filtersAleksandr Loktionov1-2/+5
The bug usually affects untrusted VFs, because they are limited to 18 MACs, it affects them badly, not letting to create MAC all filters. Not stable to reproduce, it happens when VF user creates MAC filters when other MACVLAN operations are happened in parallel. But consequence is that VF can't receive desired traffic. Fix counter to be bumped only for new or active filters. Fixes: 621650cabee5 ("i40e: Refactoring VF MAC filters counting to make more reliable") Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-26i40e: Enforce software interrupt during busy-poll exitIvan Vecera5-21/+72
As for ice bug fixed by commit b7306b42beaf ("ice: manage interrupts during poll exit") followed by commit 23be7075b318 ("ice: fix software generating extra interrupts") I'm seeing the similar issue also with i40e driver. In certain situation when busy-loop is enabled together with adaptive coalescing, the driver occasionally misses that there are outstanding descriptors to clean when exiting busy poll. Try to catch the remaining work by triggering a software interrupt when exiting busy poll. No extra interrupts will be generated when busy polling is not used. The issue was found when running sockperf ping-pong tcp test with adaptive coalescing and busy poll enabled (50 as value busy_pool and busy_read sysctl knobs) and results in huge latency spikes with more than 100000us. The fix is inspired from the ice driver and do the following: 1) During napi poll exit in case of busy-poll (napo_complete_done() returns false) this is recorded to q_vector that we were in busy loop. 2) Extends i40e_buildreg_itr() to be able to add an enforced software interrupt into built value 2) In i40e_update_enable_itr() enforces a software interrupt trigger if we are exiting busy poll to catch any pending clean-ups 3) Reuses unused 3rd ITR (interrupt throttle) index and set it to 20K interrupts per second to limit the number of these sw interrupts. Test results ============ Prior: [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120 sockperf: == version #3.10-no.git == sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s) [ 0] IP = 10.9.9.1 PORT = 11111 # TCP sockperf: Warmup stage (sending a few dummy messages)... sockperf: Starting test... sockperf: Test end (interrupted by timer) sockperf: Test ended sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2438563; ReceivedMessages=2438562 sockperf: ========= Printing statistics for Server No: 0 sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2429473; ReceivedMessages=2429473 sockperf: ====> avg-latency=24.571 (std-dev=93.297, mean-ad=4.904, median-ad=1.510, siqr=1.063, cv=3.797, std-error=0.060, 99.0% ci=[24.417, 24.725]) sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0 sockperf: Summary: Latency is 24.571 usec sockperf: Total 2429473 observations; each percentile contains 24294.73 observations sockperf: ---> <MAX> observation = 103294.331 sockperf: ---> percentile 99.999 = 45.633 sockperf: ---> percentile 99.990 = 37.013 sockperf: ---> percentile 99.900 = 35.910 sockperf: ---> percentile 99.000 = 33.390 sockperf: ---> percentile 90.000 = 28.626 sockperf: ---> percentile 75.000 = 27.741 sockperf: ---> percentile 50.000 = 26.743 sockperf: ---> percentile 25.000 = 25.614 sockperf: ---> <MIN> observation = 12.220 After: [root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120 sockperf: == version #3.10-no.git == sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s) [ 0] IP = 10.9.9.1 PORT = 11111 # TCP sockperf: Warmup stage (sending a few dummy messages)... sockperf: Starting test... sockperf: Test end (interrupted by timer) sockperf: Test ended sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2400055; ReceivedMessages=2400054 sockperf: ========= Printing statistics for Server No: 0 sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2391186; ReceivedMessages=2391186 sockperf: ====> avg-latency=24.965 (std-dev=5.934, mean-ad=4.642, median-ad=1.485, siqr=1.067, cv=0.238, std-error=0.004, 99.0% ci=[24.955, 24.975]) sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0 sockperf: Summary: Latency is 24.965 usec sockperf: Total 2391186 observations; each percentile contains 23911.86 observations sockperf: ---> <MAX> observation = 195.841 sockperf: ---> percentile 99.999 = 45.026 sockperf: ---> percentile 99.990 = 39.009 sockperf: ---> percentile 99.900 = 35.922 sockperf: ---> percentile 99.000 = 33.482 sockperf: ---> percentile 90.000 = 28.902 sockperf: ---> percentile 75.000 = 27.821 sockperf: ---> percentile 50.000 = 26.860 sockperf: ---> percentile 25.000 = 25.685 sockperf: ---> <MIN> observation = 12.277 Fixes: 0bcd952feec7 ("ethernet/intel: consolidate NAPI and NAPI exit") Reported-by: Hugo Ferreira <hferreir@redhat.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-08Merge branch '40GbE' of ↵David S. Miller2-8/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-03-06 (iavf, i40e, ixgbe) This series contains updates to iavf, i40e, and ixgbe drivers. Alexey Kodanev removes duplicate calls related to cloud filters on iavf and unnecessary null checks on i40e. Maciej adds helper functions for common code relating to updating statistics for ixgbe. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-3/+2
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/page_pool_user.c 0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors") 429679dcf7d9 ("page_pool: fix netlink dump stop/resume") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-06i40e: remove unnecessary qv_info ptr NULL checksAlexey Kodanev2-8/+0
The "qv_info" ptr cannot be NULL when it gets the address of an element of the flexible array "qvlist_info->qv_info". Detected using the static analysis tool - Svace. Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-05i40e: Fix firmware version comparison functionIvan Vecera1-2/+1
Helper i40e_is_fw_ver_eq() compares incorrectly given firmware version as it returns true when the major version of running firmware is greater than the given major version that is wrong and results in failure during getting of DCB configuration where this helper is used. Fix the check and return true only if the running FW version is exactly equals to the given version. Reproducer: 1. Load i40e driver 2. Check dmesg output [root@host ~]# modprobe i40e [root@host ~]# dmesg | grep 'i40e.*DCB' [ 74.750642] i40e 0000:02:00.0: Query for DCB configuration failed, err -EIO aq_err I40E_AQ_RC_EINVAL [ 74.759770] i40e 0000:02:00.0: DCB init failed -5, disabled [ 74.966550] i40e 0000:02:00.1: Query for DCB configuration failed, err -EIO aq_err I40E_AQ_RC_EINVAL [ 74.975683] i40e 0000:02:00.1: DCB init failed -5, disabled Fixes: cf488e13221f ("i40e: Add other helpers to check version of running firmware and AQ API") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-04net: adopt skb_network_header_len() more broadlyEric Dumazet1-1/+1
(skb_transport_header(skb) - skb_network_header(skb)) can be replaced by skb_network_header_len(skb) Add a DEBUG_NET_WARN_ON_ONCE() in skb_network_header_len() to catch cases were the transport_header was not set. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-04net: adopt skb_network_offset() and similar helpersEric Dumazet1-1/+1
This is a cleanup patch, making code a bit more concise. 1) Use skb_network_offset(skb) in place of (skb_network_header(skb) - skb->data) 2) Use -skb_network_offset(skb) in place of (skb->data - skb_network_header(skb)) 3) Use skb_transport_offset(skb) in place of (skb_transport_header(skb) - skb->data) 4) Use skb_inner_transport_offset(skb) in place of (skb_inner_transport_header(skb) - skb->data) Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> # for sfc Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01i40e: disable NAPI right after disabling irqs when handling xsk_poolMaciej Fijalkowski1-1/+1
Disable NAPI before shutting down queues that this particular NAPI contains so that the order of actions in i40e_queue_pair_disable() mirrors what we do in i40e_queue_pair_enable(). Fixes: 123cecd427b6 ("i40e: added queue pair disable/enable functions") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-02-28net: intel: i40e/igc: Remove setting Autoneg in EEE capabilitiesAndrew Lunn1-6/+1
Energy Efficient Ethernet should always be negotiated with the link peer. Don't include SUPPORTED_Autoneg in the results of get_eee() for supported, advertised or lp_advertised, since it is assumed. Additionally, ethtool(1) ignores the set bit, and no other driver sets this. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-16i40e: Remove VEB recursionIvan Vecera3-109/+76
The VEB (virtual embedded switch) as a switch element can be connected according datasheet though its uplink to: - Physical port - Port Virtualizer (not used directly by i40e driver but can be present in MFP mode where the physical port is shared between PFs) - No uplink (aka floating VEB) But VEB uplink cannot be connected to another VEB and any attempt to do so results in: "i40e 0000:02:00.0: couldn't add VEB, err -EIO aq_err I40E_AQ_RC_ENOENT" that indicates "the uplink SEID does not point to valid element". Remove this logic from the driver code this way: 1) For debugfs only allow to build floating VEB (uplink_seid == 0) or main VEB (uplink_seid == mac_seid) 2) Do not recurse in i40e_veb_link_event() as no VEB cannot have sub-VEBs 3) Ditto for i40e_veb_rebuild() + simplify the function as we know that the VEB for rebuild can be only the main LAN VEB or some of the floating VEBs 4) In i40e_rebuild() there is no need to check veb->uplink_seid as the possible ones are 0 and MAC SEID 5) In i40e_vsi_release() do not take into account VEBs whose uplink is another VEB as this is not possible 6) Remove veb_idx field from i40e_veb as a VEB cannot have sub-VEBs Tested using i40e debugfs interface: 1) Initial state [root@cnb-03 net-next]# CMD="/sys/kernel/debug/i40e/0000:02:00.0/command" [root@cnb-03 net-next]# echo dump switch > $CMD [root@cnb-03 net-next]# dmesg -c [ 98.440641] i40e 0000:02:00.0: header: 3 reported 3 total [ 98.446053] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 98.452593] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 98.458856] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 2) Add floating VEB [root@cnb-03 net-next]# echo add relay > $CMD [root@cnb-03 net-next]# dmesg -c [ 122.745630] i40e 0000:02:00.0: added relay 162 [root@cnb-03 net-next]# echo dump switch > $CMD [root@cnb-03 net-next]# dmesg -c [ 136.650049] i40e 0000:02:00.0: header: 4 reported 4 total [ 136.655466] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 136.661994] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 136.668264] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 [ 136.674787] i40e 0000:02:00.0: type=17 seid=162 uplink=0 downlink=0 3) Add VMDQ2 VSI to this new VEB [root@cnb-03 net-next]# dmesg -c [ 168.351763] i40e 0000:02:00.0: added VSI 394 to relay 162 [ 168.374652] enp2s0f0np0v0: NIC Link is Up, 40 Gbps Full Duplex, Flow Control: None [root@cnb-03 net-next]# echo dump switch > $CMD [root@cnb-03 net-next]# dmesg -c [ 195.683204] i40e 0000:02:00.0: header: 5 reported 5 total [ 195.688611] i40e 0000:02:00.0: type=19 seid=394 uplink=162 downlink=16 [ 195.695143] i40e 0000:02:00.0: type=17 seid=162 uplink=0 downlink=0 [ 195.701410] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 195.707935] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 195.714201] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 4) Try to delete the VEB [root@cnb-03 net-next]# echo del relay 162 > $CMD [root@cnb-03 net-next]# dmesg -c [ 239.260901] i40e 0000:02:00.0: deleting relay 162 [ 239.265621] i40e 0000:02:00.0: can't remove VEB 162 with 1 VSIs left 5) Do PF reset and check switch status after rebuild [root@cnb-03 net-next]# echo pfr > $CMD [root@cnb-03 net-next]# echo dump switch > $CMD [root@cnb-03 net-next]# dmesg -c ... [ 272.333655] i40e 0000:02:00.0: header: 5 reported 5 total [ 272.339066] i40e 0000:02:00.0: type=19 seid=394 uplink=162 downlink=16 [ 272.345599] i40e 0000:02:00.0: type=17 seid=162 uplink=0 downlink=0 [ 272.351862] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 272.358387] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 272.364654] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 6) Delete VSI and delete VEB [ 297.199116] i40e 0000:02:00.0: deleting VSI 394 [ 299.807580] i40e 0000:02:00.0: deleting relay 162 [ 309.767905] i40e 0000:02:00.0: header: 3 reported 3 total [ 309.773318] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 309.779845] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 309.786111] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-02-16i40e: Fix broken support for floating VEBsIvan Vecera2-49/+67
Although the i40e supports so-called floating VEB (VEB without an uplink connection to external network), this support is broken. This functionality is currently unused (except debugfs) but it will be used by subsequent series for switchdev mode slow-path. Fix this by following: 1) Handle correctly floating VEB (VEB with uplink_seid == 0) in i40e_reconstitute_veb() and look for owner VSI and create it only for non-floating VEBs and also set bridge mode only for such VEBs as the floating ones are using always VEB mode. 2) Handle correctly floating VEB in i40e_veb_release() and disallow its release when there are some VSIs. This is different from regular VEB that have owner VSI that is connected to VEB's uplink after VEB deletion by FW. 3) Fix i40e_add_veb() to handle 'vsi' that is NULL for floating VEBs. For floating VEB use 0 for downlink SEID and 'true' for 'default_port' parameters as per datasheet. 4) Fix 'add relay' command in i40e_dbg_command_write() to allow to create floating VEB by 'add relay 0 0' or 'add relay' Tested using debugfs: 1) Initial state [root@host net-next]# echo dump switch > $CMD [root@host net-next]# dmesg -c [ 173.701286] i40e 0000:02:00.0: header: 3 reported 3 total [ 173.706701] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 173.713241] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 173.719507] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 2) Add floating VEB [root@host net-next]# CMD="/sys/kernel/debug/i40e/0000:02:00.0/command" [root@host net-next]# echo add relay > $CMD [root@host net-next]# dmesg -c [ 245.551720] i40e 0000:02:00.0: added relay 162 [root@host net-next]# echo dump switch > $CMD [root@host net-next]# dmesg -c [ 276.984371] i40e 0000:02:00.0: header: 4 reported 4 total [ 276.989779] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 276.996302] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 277.002569] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 [ 277.009091] i40e 0000:02:00.0: type=17 seid=162 uplink=0 downlink=0 3) Add VMDQ2 VSI to this new VEB [root@host net-next]# echo add vsi 162 > $CMD [root@host net-next]# dmesg -c [ 332.314030] i40e 0000:02:00.0: added VSI 394 to relay 162 [ 332.337486] enp2s0f0np0v0: NIC Link is Up, 40 Gbps Full Duplex, Flow Control: None [root@host net-next]# echo dump switch > $CMD [root@host net-next]# dmesg -c [ 387.284490] i40e 0000:02:00.0: header: 5 reported 5 total [ 387.289904] i40e 0000:02:00.0: type=19 seid=394 uplink=162 downlink=16 [ 387.296446] i40e 0000:02:00.0: type=17 seid=162 uplink=0 downlink=0 [ 387.302708] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 387.309234] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 387.315500] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 4) Try to delete the VEB [root@host net-next]# echo del relay 162 > $CMD [root@host net-next]# dmesg -c [ 428.749297] i40e 0000:02:00.0: deleting relay 162 [ 428.754011] i40e 0000:02:00.0: can't remove VEB 162 with 1 VSIs left 5) Do PF reset and check switch status after rebuild [root@host net-next]# echo pfr > $CMD [root@host net-next]# echo dump switch > $CMD [root@host net-next]# dmesg -c [ 738.056172] i40e 0000:02:00.0: header: 5 reported 5 total [ 738.061577] i40e 0000:02:00.0: type=19 seid=394 uplink=162 downlink=16 [ 738.068104] i40e 0000:02:00.0: type=17 seid=162 uplink=0 downlink=0 [ 738.074367] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 738.080892] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 738.087160] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 6) Delete VSI and delete VEB [root@host net-next]# echo del vsi 394 > $CMD [root@host net-next]# echo del relay 162 > $CMD [root@host net-next]# echo dump switch > $CMD [root@host net-next]# dmesg -c [ 1233.081126] i40e 0000:02:00.0: deleting VSI 394 [ 1239.345139] i40e 0000:02:00.0: deleting relay 162 [ 1244.886920] i40e 0000:02:00.0: header: 3 reported 3 total [ 1244.892328] i40e 0000:02:00.0: type=19 seid=392 uplink=160 downlink=16 [ 1244.898853] i40e 0000:02:00.0: type=17 seid=160 uplink=2 downlink=0 [ 1244.905119] i40e 0000:02:00.0: type=19 seid=390 uplink=160 downlink=16 Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-02-16i40e: Add helpers to find VSI and VEB by SEID and use themIvan Vecera3-86/+68
Add two helpers i40e_(veb|vsi)_get_by_seid() to find corresponding VEB or VSI by their SEID value and use these helpers to replace existing open-coded loops. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-02-16i40e: Introduce and use macros for iterating VSIs and VEBsIvan Vecera4-245/+264
Introduce i40e_for_each_vsi() and i40e_for_each_veb() helper macros and use them to iterate relevant arrays. Replace pattern: for (i = 0; i < pf->num_alloc_vsi; i++) by: i40e_for_each_vsi(pf, i, vsi) and pattern: for (i = 0; i < I40E_MAX_VEB; i++) by i40e_for_each_veb(pf, i, veb) These macros also check if array item pf->vsi[i] or pf->veb[i] are not NULL and skip such items so we can remove redundant checks from loop bodies. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-02-16i40e: Use existing helper to find flow director VSIIvan Vecera1-7/+4
Use existing i40e_find_vsi_by_type() to find a VSI associated with flow director. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-02-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-20/+44
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/dev.c 9f30831390ed ("net: add rcu safety to rtnl_prop_list_size()") 723de3ebef03 ("net: free altname using an RCU callback") net/unix/garbage.c 11498715f266 ("af_unix: Remove io_uring code for GC.") 25236c91b5ab ("af_unix: Fix task hung while purging oob_skb in GC.") drivers/net/ethernet/renesas/ravb_main.c ed4adc07207d ("net: ravb: Count packets instead of descriptors in GbEth RX path" ) c2da9408579d ("ravb: Add Rx checksum offload support for GbEth") net/mptcp/protocol.c bdd70eb68913 ("mptcp: drop the push_pending field") 28e5c1380506 ("mptcp: annotate lockless accesses around read-mostly fields") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-14Merge branch '40GbE' of ↵David S. Miller2-15/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-02-12 (i40e) This series contains updates to i40e driver only. Ivan Vecera corrects the looping value used while waiting for queues to be disabled as well as an incorrect mask being used for DCB configuration. Maciej resolves an issue related to XDP traffic; removing a double call to i40e_pf_rxq_wait() and accounting for XDP rings when stopping rings. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-13i40e: take into account XDP Tx queues when stopping ringsMaciej Fijalkowski1-6/+8
Seth reported that on his side XDP traffic can not survive a round of down/up against i40e interface. Dmesg output was telling us that we were not able to disable the very first XDP ring. That was due to the fact that in i40e_vsi_stop_rings() in a pre-work that is done before calling i40e_vsi_wait_queues_disabled(), XDP Tx queues were not taken into the account. To fix this, let us distinguish between Rx and Tx queue boundaries and take into the account XDP queues for Tx side. Reported-by: Seth Forshee <sforshee@kernel.org> Closes: https://lore.kernel.org/netdev/ZbkE7Ep1N1Ou17sA@do-x1extreme/ Fixes: 65662a8dcdd0 ("i40e: Fix logic of disabling queues") Tested-by: Seth Forshee <sforshee@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-02-13i40e: avoid double calling i40e_pf_rxq_wait()Maciej Fijalkowski1-9/+3
Currently, when interface is being brought down and i40e_vsi_stop_rings() is called, i40e_pf_rxq_wait() is called two times, which is wrong. To showcase this scenario, simplified call stack looks as follows: i40e_vsi_stop_rings() i40e_control wait rx_q() i40e_control_rx_q() i40e_pf_rxq_wait() i40e_vsi_wait_queues_disabled() i40e_pf_rxq_wait() // redundant call To fix this, let us s/i40e_control_wait_rx_q/i40e_control_rx_q within i40e_vsi_stop_rings(). Fixes: 65662a8dcdd0 ("i40e: Fix logic of disabling queues") Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-02-13i40e: Fix wrong mask used during DCB configIvan Vecera1-1/+1
Mask used for clearing PRTDCB_RETSTCC register in function i40e_dcb_hw_rx_ets_bw_config() is incorrect as there is used define I40E_PRTDCB_RETSTCC_ETSTC_SHIFT instead of define I40E_PRTDCB_RETSTCC_ETSTC_MASK. The PRTDCB_RETSTCC register is used to configure whether ETS or strict priority is used as TSA in Rx for particular TC. In practice it means that once the register is set to use ETS as TSA then it is not possible to switch back to strict priority without CoreR reset. Fix the value in the clearing mask. Fixes: 90bc8e003be2 ("i40e: Add hardware configuration for software based DCB") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-02-13i40e: Fix waiting for queues of all VSIs to be disabledIvan Vecera1-1/+1
The function i40e_pf_wait_queues_disabled() iterates all PF's VSIs up to 'pf->hw.func_caps.num_vsis' but this is incorrect because the real number of VSIs can be up to 'pf->num_alloc_vsi' that can be higher. Fix this loop. Fixes: 69129dc39fac ("i40e: Modify Tx disable wait flow in case of DCB reconfiguration") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>