summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-11-22bnxt_en: Add bnxt_setup_ctxm_pg_tbls() helper functionMichael Chan1-74/+59
In bnxt_alloc_ctx_mem(), the logic to set up the context memory entries and to allocate the context memory tables is done repetitively. Add a helper function to simplify the code. The setup of the Fast Path TQM entries relies on some information from the Slow Path TQM entries. Copy the SP_TQM entries to the FP_TQM entries to simplify the logic. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231120234405.194542-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22bnxt_en: Use the pg_info field in bnxt_ctx_mem_type structMichael Chan2-54/+29
Use the newly added pg_info field in bnxt_ctx_mem_type struct and remove the standalone page info structures in bnxt_ctx_mem_info. This now completes the reorganization of the context memory structures to work better with the new and more flexible firmware interface for newer chips. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231120234405.194542-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22bnxt_en: Add page info to struct bnxt_ctx_mem_typeMichael Chan2-0/+32
This will further improve the organization of the bnxt_ctx_mem_info structure by moving the standalone page info structures into the bnxt_ctx_mem_type array. Add the allocation and free logic first and the next patch will migrate to use the new infrastructure. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231120234405.194542-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22bnxt_en: Restructure context memory data structuresMichael Chan2-182/+242
The current code uses a flat bnxt_ctx_mem_info structure to store 8 types of context memory for the NIC. All the context memory types are very similar and have similar parameters. They can all share a common structure to improve the organization. Also, new firmware interface will provide a new API to retrieve each type of context memory by calling the API repeatedly. This patch reorganizes the bnxt_ctx_mem_info structure to fit better with the new firmware interface. It will also work with the legacy firmware interface. The flat fields in bnxt_ctx_mem_info are replaced by the bnxt_ctx_mem_type array. The bnxt_mem_init array info will no longer be needed. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231120234405.194542-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22bnxt_en: Free bp->ctx inside bnxt_free_ctx_mem()Michael Chan2-14/+2
We always free bp->ctx right after calling bnxt_free_ctx_mem(), so just free it at the end of that function to make things simpler. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231120234405.194542-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22bnxt_en: The caller of bnxt_alloc_ctx_mem() should always free bp->ctxMichael Chan1-2/+2
bnxt_alloc_ctx_mem() calls bnxt_hwrm_func_backing_store_qcaps() to allocate the memory for bp->ctx. Initialize bp->ctx with the allocated memory and let the caller free it during unwind. The unwind logic is already there, we just need to always set bp->ctx to the allocated memory so the caller will always free it. This simplifies the logic and makes it easier to expand on the backing store logic. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231120234405.194542-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22Merge branch 'net-page_pool-add-netlink-based-introspection-part1'Jakub Kicinski2-15/+27
Jakub Kicinski says: ==================== net: page_pool: plit the page_pool_params into fast and slow Small refactoring in prep for adding more page pool params which won't be needed on the fast path. v1: https://lore.kernel.org/all/20231024160220.3973311-1-kuba@kernel.org/ RFC: https://lore.kernel.org/all/20230816234303.3786178-1-kuba@kernel.org/ ==================== Link: https://lore.kernel.org/r/20231121000048.789613-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22net: page_pool: avoid touching slow on the fastpathJakub Kicinski2-1/+5
To fully benefit from previous commit add one byte of state in the first cache line recording if we need to look at the slow part. The packing isn't all that impressive right now, we create a 7B hole. I'm expecting Olek's rework will reshuffle this, anyway. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://lore.kernel.org/r/20231121000048.789613-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22net: page_pool: split the page_pool_params into fast and slowJakub Kicinski2-15/+23
struct page_pool is rather performance critical and we use 16B of the first cache line to store 2 pointers used only by test code. Future patches will add more informational (non-fast path) attributes. It's convenient for the user of the API to not have to worry which fields are fast and which are slow path. Use struct groups to split the params into the two categories internally. Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/r/20231121000048.789613-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22Merge tag 'for-netdev' of ↵Jakub Kicinski18-416/+1043
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-11-21 We've added 19 non-merge commits during the last 4 day(s) which contain a total of 18 files changed, 1043 insertions(+), 416 deletions(-). The main changes are: 1) Fix BPF verifier to validate callbacks as if they are called an unknown number of times in order to fix not detecting some unsafe programs, from Eduard Zingerman. 2) Fix bpf_redirect_peer() handling which missed proper stats accounting for veth and netkit and also generally fix missing stats for the latter, from Peilin Ye, Daniel Borkmann et al. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: check if max number of bpf_loop iterations is tracked bpf: keep track of max number of bpf_loop callback iterations selftests/bpf: test widening for iterating callbacks bpf: widening for callback iterators selftests/bpf: tests for iterating callbacks bpf: verify callbacks as if they are called unknown number of times bpf: extract setup_func_entry() utility function bpf: extract __check_reg_arg() utility function selftests/bpf: fix bpf_loop_bench for new callback verification scheme selftests/bpf: track string payload offset as scalar in strobemeta selftests/bpf: track tcp payload offset as scalar in xdp_synproxy selftests/bpf: Add netkit to tc_redirect selftest selftests/bpf: De-veth-ize the tc_redirect test case bpf, netkit: Add indirect call wrapper for fetching peer dev bpf: Fix dev's rx stats for bpf_redirect_peer traffic veth: Use tstats per-CPU traffic counters netkit: Add tstats per-CPU traffic counters net: Move {l,t,d}stats allocation to core and convert veth & vrf net, vrf: Move dstats structure to core ==================== Link: https://lore.kernel.org/r/20231121193113.11796-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22Merge branch 'mlxsw-preparations-for-support-of-cff-flood-mode'Jakub Kicinski9-31/+169
Petr Machata says: ==================== mlxsw: Preparations for support of CFF flood mode PGT is an in-HW table that maps addresses to sets of ports. Then when some HW process needs a set of ports as an argument, instead of embedding the actual set in the dynamic configuration, what gets configured is the address referencing the set. The HW then works with the appropriate PGT entry. Among other allocations, the PGT currently contains two large blocks for bridge flooding: one for 802.1q and one for 802.1d. Within each of these blocks are three tables, for unknown-unicast, multicast and broadcast flooding: . . . | 802.1q | 802.1d | . . . | UC | MC | BC | UC | MC | BC | \______ _____/ \_____ ______/ v v FID flood vectors Thus each FID (which corresponds to an 802.1d bridge or one VLAN in an 802.1q bridge) uses three flood vectors spread across a fairly large region of PGT. This way of organizing the flood table (called "controlled") is not very flexible. E.g. to decrease a bridge scale and store more IP MC vectors, one would need to completely rewrite the bridge PGT blocks, or resort to hacks such as storing individual MC flood vectors into unused part of the bridge table. In order to address these shortcomings, Spectrum-2 and above support what is called CFF flood mode, for Compressed FID Flooding. In CFF flood mode, each FID has a little table of its own, with three entries adjacent to each other, one for unknown-UC, one for MC, one for BC. This allows for a much more fine-grained approach to PGT management, where bits of it are allocated on demand. . . . | FID | FID | FID | FID | FID | . . . |U|M|B|U|M|B|U|M|B|U|M|B|U|M|B| \_____________ _____________/ v FID flood vectors Besides the FID table organization, the CFF flood mode also impacts Router Subport (RSP) table. This table contains flood vectors for rFIDs, which are FIDs that reference front panel ports or LAGs. The RSP table contains two entries per front panel port and LAG, one for unknown-UC traffic, and one for everything else. Currently, the FW allocates and manages the table in its own part of PGT. rFIDs are marked with flood_rsp bit and managed specially. In CFF mode, rFIDs are managed as all other FIDs. The driver therefore has to allocate and maintain the flood vectors. Like with bridge FIDs, this is more work, but increases flexibility of the system. The FW currently supports both the controlled and CFF flood modes. To shed complexity, in the future it should only support CFF flood mode. Hence this patchset, which is the first in series of two to add CFF flood mode support to mlxsw. There are FW versions out there that do not support CFF flood mode, and on Spectrum-1 in particular, there is no plan to support it at all. mlxsw will therefore have to support both controlled flood mode as well as CFF. Another aspect is that at least on Spectrum-1, there are FW versions out there that claim to support CFF flood mode, but then reject or ignore configurations enabling the same. The driver thus has to have a say in whether an attempt to configure CFF flood mode should even be made. Much like with the LAG mode, the feature is therefore expressed in terms of "does the driver prefer CFF flood mode?", and "what flood mode the PCI module managed to configure the FW with". This gives to the driver a chance to determine whether CFF flood mode configuration should be attempted. In this patchset, we lay the ground with new definitions, registers and their fields, and some minor code shaping. The next patchset will be more focused on introducing necessary abstractions and implementation. - Patches #1 and #2 add CFF-related items to the command interface. - Patch #3 adds a new resource, for maximum number of flood profiles supported. (A flood profile is a mapping between traffic type and offset in the per-FID flood vector table.) - Patches #4 to #8 adjust reg.h. The SFFP register is added, which is used for configuring the abovementioned traffic-type-to-offset mapping. The SFMR, register, which serves for FID configuration, is extended with fields specific to CFF mode. And other minor adjustments. - Patches #9 and #10 add the plumbing for CFF mode: a way to request that CFF flood mode be configured, and a way to query the flood mode that was actually configured. - Patch #11 removes dead code. - Patches #12 and #13 add helpers that the next patchset will make use of. Patch #14 moves RIF setup ahead so that FID code can make use of it. ==================== Link: https://lore.kernel.org/r/cover.1700503643.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22mlxsw: spectrum_router: Call RIF setup before obtaining FIDPetr Machata1-3/+3
For subport RIFs, the setup initializes, among other things, RIF port and LAG numbers. Those are important to determine where in the PGT the RIF FID will be stored. Therefore, call the RIF setup before fid_get. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/f24d8cad7e4748b8e8e0e16894ca6a20704dea32.1700503644.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22mlxsw: spectrum_router: Add a helper to get subport number from a RIFPetr Machata2-0/+16
In the CFF flood mode, responsibility for management of the PGT entries for rFIDs is moved from FW to the driver. All rFIDs are based off either a front panel port, or a LAG port. The flood vectors for port-based rFIDs enable just the port itself, the ones for LAG-based rFIDs enable all member ports of the LAG in question. Since all rFIDs based off the same port have the same flood vector, and similarly for LAG-based rFIDs, the flood entries are shared. The PGT address of the flood vector is therefore determined based on the port (or LAG) number of the RIF connected with the rFID. Add a helper to determine subport number given a RIF, to be used in these calculations. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/d7ab43cf5b021f785f363f236e4b6780d10eea93.1700503644.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22mlxsw: spectrum_fid: Extract SFMR packing into a helperPetr Machata1-14/+19
Both mlxsw_sp_fid_op() and mlxsw_sp_fid_edit_op() pack the core of SFMR the same way. Extract the common code into a helper and call that. Extract out of that a wrapper that just calls mlxsw_reg_sfmr_pack(), because it will be useful for the dummy family later on. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/31f32b4d767183f6cb197148d0792feab2efadba.1700503644.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22mlxsw: spectrum_fid: Drop unnecessary conditionsPetr Machata1-6/+0
The caller already only calls mlxsw_sp_fid_flood_tables_init() and mlxsw_sp_fid_flood_tables_fini() if (fid_family->flood_tables). There is no configuration where the pointer is non-NULL, but the number of tables is zero. So drop the conditions. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/897c6841bc756ac632b797bf67ac83c6a66ba359.1700503644.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22mlxsw: pci: Permit enabling CFF modePetr Machata2-1/+14
There are FW versions out there that do not support CFF flood mode, and on Spectrum-1 in particular, there is no plan to support it at all. mlxsw will therefore have to support both controlled flood mode as well as CFF. There are also FW versions out there that claim to support CFF flood mode, but then reject or ignore configurations enabling the same. The driver thus has to have a say in whether an attempt to configure CFF flood mode should even be made, and what to use as a fallback. Hence express the feature in terms of "does the driver prefer CFF flood mode?", and "what flood mode the PCI module managed to configure the FW with". This gives to the driver a chance to determine whether CFF flood mode configuration should be attempted. The latter bit was added in previous patches. In this patch, add the bit that allows the driver to determine whether CFF enablement should be attempted, and the enablement code itself. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/41640a0ee58e0a9538f820f7b601a0e35f6449e4.1700503644.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22mlxsw: core, pci: Add plumbing related to CFF modePetr Machata3-0/+28
CFF mode, for Compressed FID Flooding, is a way of organizing flood vectors in the PGT table. The bus module determines whether CFF is supported, can configure flood mode to CFF if it is, and knows what flood mode has been configured. Therefore add a bus callback to determine the configured flood mode. Also add to core an API to query it. Since after this patch, we rely on mlxsw_pci->flood_mode being set, it becomes a coding error if a driver invokes this function with a set of fields that misses the initialization. Warn and bail out in that case. The CFF mode is not used as of this patch. The code to actually use it will be added later. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/889d58759dd40f5037f2206b9fc4a78a9240da80.1700503644.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22mlxsw: reg: Add to SFMR register the fields related to CFF flood modePetr Machata1-0/+20
Add the field cff_mid_base, which specifies at which point in PGT the per-FID flood table is stored. Add cff_prf_id, the profile ID, which determines on which row of the flood table a flood vector can be found for a given traffic type. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/3ad7ae38cf6534bedcd876f16090d109a814b3e3.1700503644.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22mlxsw: reg: Extract flood-mode specific part of mlxsw_reg_sfmr_pack()Petr Machata2-10/+10
In CFF mode, it is necessary to set a different set of SFMR fields. Leave in mlxsw_reg_sfmr_pack() only the common bits, and move the parts relevant to controlled flood mode directly to the call site. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/6f29639ebc3ca0722272e6c644ca910096469413.1700503644.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22mlxsw: reg: Drop unnecessary writes from mlxsw_reg_sfmr_pack()Petr Machata1-2/+0
The MLXSW_REG_ZERO at the beginning of the function wipes the whole payload. There's no need to set vtfp and vv to false explicitly. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/04a51ea7cf31eea0ef7707311d8e864e2d9ef307.1700503644.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22mlxsw: reg: Mark SFGC & some SFMR fields as reserved in CFF modePetr Machata1-0/+6
Some existing fields and the whole register of SFGC are reserved in CFF mode. Backport the reservation note to these fields. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/e1d5977a8cb778227e4ea2fd1515529957ce5de7.1700503643.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22mlxsw: reg: Add Switch FID Flooding Profiles RegisterPetr Machata1-0/+45
The SFFP register populates the fid flooding profile tables used for the NVE flooding and Compressed-FID Flooding (CFF). Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/ca42eb67763bd0c7cf035afc62ef73632f3f61a6.1700503643.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22mlxsw: resources: Add max_cap_nve_flood_prfPetr Machata1-0/+2
max_cap_nve_flood_prf describes maximum number of NVE flooding profiles. The same value then applies for flooding profiles for flooding in CFF mode. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/064a2e013d879e5f5494167a6c120c4bb85a2204.1700503643.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22mlxsw: cmd: Add MLXSW_CMD_MBOX_CONFIG_PROFILE_FLOOD_MODE_CFFPetr Machata1-0/+5
PGT, a port-group table is an in-HW block of specialized memory that holds sets of ports. Allocated within the PGT are series of flood tables that describe to which ports traffic of various types (unknown UC, BC, MC) should be flooded from which FID. The hitherto-used layout of these flood tables is being replaced with a more flexible scheme, called compressed FID flooding (CFF). CFF can be configured through CONFIG_PROFILE.flood_mode. In this patch, add MLXSW_CMD_MBOX_CONFIG_PROFILE_FLOOD_MODE_CFF, the value to use to enable the CFF mode. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/fc2e063742856492f8f22b0b87abf431ea6d53d0.1700503643.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22mlxsw: cmd: Add cmd_mbox.query_fw.cff_supportPetr Machata1-0/+6
PGT, a port-group table is an in-HW block of specialized memory that holds sets of ports. Allocated within the PGT are series of flood tables that describe to which ports traffic of various types (unknown UC, BC, MC) should be flooded from which FID. The hitherto-used layout of these flood tables is being replaced with a more flexible scheme, called compressed FID flooding (CFF). CFF can be configured through CONFIG_PROFILE.flood_mode. cff_support determines whether CONFIG_PROFILE.flood_mode can be set to CFF. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/af727d0e1095e30fa45c7e60404637cdc491aeec.1700503643.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22net: do not send a MOVE event when netdev changes netnsJakub Kicinski1-4/+6
Networking supports changing netdevice's netns and name at the same time. This allows avoiding name conflicts and having to rename the interface in multiple steps. E.g. netns1={eth0, eth1}, netns2={eth1} - we want to move netns1:eth1 to netns2 and call it eth0 there. If we can't rename "in flight" we'd need to (1) rename eth1 -> $tmp, (2) change netns, (3) rename $tmp -> eth0. To rename the underlying struct device we have to call device_rename(). The rename()'s MOVE event, however, doesn't "belong" to either the old or the new namespace. If there are conflicts on both sides it's actually impossible to issue a real MOVE (old name -> new name) without confusing user space. And Daniel reports that such confusions do in fact happen for systemd, in real life. Since we already issue explicit REMOVE and ADD events manually - suppress the MOVE event completely. Move the ADD after the rename, so that the REMOVE uses the old name, and the ADD the new one. If there is no rename this changes the picture as follows: Before: old ns | KERNEL[213.399289] remove /devices/virtual/net/eth0 (net) new ns | KERNEL[213.401302] add /devices/virtual/net/eth0 (net) new ns | KERNEL[213.401397] move /devices/virtual/net/eth0 (net) After: old ns | KERNEL[266.774257] remove /devices/virtual/net/eth0 (net) new ns | KERNEL[266.774509] add /devices/virtual/net/eth0 (net) If there is a rename and a conflict (using the exact eth0/eth1 example explained above) we get this: Before: old ns | KERNEL[224.316833] remove /devices/virtual/net/eth1 (net) new ns | KERNEL[224.318551] add /devices/virtual/net/eth1 (net) new ns | KERNEL[224.319662] move /devices/virtual/net/eth0 (net) After: old ns | KERNEL[333.033166] remove /devices/virtual/net/eth1 (net) new ns | KERNEL[333.035098] add /devices/virtual/net/eth0 (net) Note that "in flight" rename is only performed when needed. If there is no conflict for old name in the target netns - the rename will be performed separately by dev_change_name(), as if the rename was a different command, and there will still be a MOVE event for the rename: Before: old ns | KERNEL[194.416429] remove /devices/virtual/net/eth0 (net) new ns | KERNEL[194.418809] add /devices/virtual/net/eth0 (net) new ns | KERNEL[194.418869] move /devices/virtual/net/eth0 (net) new ns | KERNEL[194.420866] move /devices/virtual/net/eth1 (net) After: old ns | KERNEL[71.917520] remove /devices/virtual/net/eth0 (net) new ns | KERNEL[71.919155] add /devices/virtual/net/eth0 (net) new ns | KERNEL[71.920729] move /devices/virtual/net/eth1 (net) If deleting the MOVE event breaks some user space we should insert an explicit kobject_uevent(MOVE) after the ADD, like this: @@ -11192,6 +11192,12 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net, kobject_uevent(&dev->dev.kobj, KOBJ_ADD); netdev_adjacent_add_links(dev); + /* User space wants an explicit MOVE event, issue one unless + * dev_change_name() will get called later and issue one. + */ + if (!pat || new_name[0]) + kobject_uevent(&dev->dev.kobj, KOBJ_MOVE); + /* Adapt owner in case owning user namespace of target network * namespace is different from the original one. */ Reported-by: Daniel Gröber <dxld@darkboxed.org> Link: https://lore.kernel.org/all/20231010121003.x3yi6fihecewjy4e@House.clients.dxld.at/ Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/all/20231120184140.578375-1-kuba@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22docs: netdev: try to guide people on dealing with silenceJakub Kicinski1-3/+17
There has been more than a few threads which went idle before the merge window and now people came back to them and started asking about next steps. We currently tell people to be patient and not to repost too often. Our "not too often", however, is still a few orders of magnitude faster than other subsystems. Or so I feel after hearing people talk about review rates at LPC. Clarify in the doc that if the discussion went idle for a week on netdev, 95% of the time there's no point waiting longer. Link: https://lore.kernel.org/r/20231120200109.620392-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22net: usb: ax88179_178a: avoid two consecutive device resetsJose Ignacio Tornos Martinez1-2/+0
The device is always reset two consecutive times (ax88179_reset is called twice), one from usbnet_probe during the device binding and the other from usbnet_open. Remove the non-necessary reset during the device binding and let the reset operation from open to keep the normal behavior (tested with generic ASIX Electronics Corp. AX88179 Gigabit Ethernet device). Reported-by: Herb Wei <weihao.bj@ieisystem.com> Tested-by: Herb Wei <weihao.bj@ieisystem.com> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Link: https://lore.kernel.org/r/20231120121239.54504-1-jtornosm@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-22net: usb: ax88179_178a: fix failed operations during ax88179_resetJose Ignacio Tornos Martinez1-2/+2
Using generic ASIX Electronics Corp. AX88179 Gigabit Ethernet device, the following test cycle has been implemented: - power on - check logs - shutdown - after detecting the system shutdown, disconnect power - after approximately 60 seconds of sleep, power is restored Running some cycles, sometimes error logs like this appear: kernel: ax88179_178a 2-9:1.0 (unnamed net_device) (uninitialized): Failed to write reg index 0x0001: -19 kernel: ax88179_178a 2-9:1.0 (unnamed net_device) (uninitialized): Failed to read reg index 0x0001: -19 ... These failed operation are happening during ax88179_reset execution, so the initialization could not be correct. In order to avoid this, we need to increase the delay after reset and clock initial operations. By using these larger values, many cycles have been run and no failed operations appear. It would be better to check some status register to verify when the operation has finished, but I do not have found any available information (neither in the public datasheets nor in the manufacturer's driver). The only available information for the necessary delays is the maufacturer's driver (original values) but the proposed values are not enough for the tested devices. Fixes: e2ca90c276e1f ("ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver") Reported-by: Herb Wei <weihao.bj@ieisystem.com> Tested-by: Herb Wei <weihao.bj@ieisystem.com> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Link: https://lore.kernel.org/r/20231120120642.54334-1-jtornosm@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21Merge tag 'platform-drivers-x86-v6.7-2' of ↵Linus Torvalds5-56/+20
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform drivers fixes from Ilpo Järvinen: "Just a few fixes (one with two non-fix deps) plus tidying up MAINTAINERS" * tag 'platform-drivers-x86-v6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: intel_telemetry: Fix kernel doc descriptions MAINTAINERS: Drop Mark Gross as maintainer for x86 platform drivers platform/x86/amd/pmc: adjust getting DRAM size behavior platform/x86: hp-bioscfg: Remove unused obj in hp_add_other_attributes() platform/x86: hp-bioscfg: Fix error handling in hp_add_other_attributes() platform/x86: hp-bioscfg: move mutex_lock() down in hp_add_other_attributes() platform/x86: hp-bioscfg: Simplify return check in hp_add_other_attributes() platform/x86: ideapad-laptop: Set max_brightness before using it MAINTAINERS: Remove stale entry for SBL platform driver
2023-11-21Merge tag 'erofs-for-6.7-rc3-fixes' of ↵Linus Torvalds5-66/+44
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Tidy up erofs_read_inode() for simplicity - Fix broken fscache mode due to NULL dereference of dif->bdev_handle - Add the EROFS webpage to MAINTAINERS, documentation, and Kconfig * tag 'erofs-for-6.7-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: MAINTAINERS: erofs: add EROFS webpage erofs: fix NULL dereference of dif->bdev_handle in fscache mode erofs: simplify erofs_read_inode()
2023-11-21Merge branch 'selftests-bpf-update-multiple-prog_tests-to-use-assert_-macros'Andrii Nakryiko4-169/+105
Yuran Pereira says: ==================== selftests/bpf: Update multiple prog_tests to use ASSERT_ macros Multiple files/programs in `tools/testing/selftests/bpf/prog_tests/` still heavily use the `CHECK` macro, even when better `ASSERT_` alternatives are available. As it was already pointed out by Yonghong Song [1] in the bpf selftests the use of the ASSERT_* series of macros is preferred over the CHECK macro. This patchset replaces the usage of `CHECK(` macros to the equivalent `ASSERT_` family of macros in the following prog_tests: - bind_perm.c - bpf_obj_id.c - bpf_tcp_ca.c - vmlinux.c [1] https://lore.kernel.org/lkml/0a142924-633c-44e6-9a92-2dc019656bf2@linux.dev Changes in v3: - Addressed the following points mentioned by Yonghong Song - Improved `bpf_map_lookup_elem` assertion in bpf_tcp_ca. - Replaced assertion introduced in v2 with one that checks `thread_ret` instead of `pthread_join`. This ensures that `server`'s return value (thread_ret) is the one being checked, as oposed to `pthread_join`'s return value, since the latter one is less likely to fail. Changes in v2: - Fixed pthread_join assertion that broke the previous test Previous version: v2 - https://lore.kernel.org/lkml/GV1PR10MB6563AECF8E94798A1E5B36A4E8B6A@GV1PR10MB6563.EURPRD10.PROD.OUTLOOK.COM v1 - https://lore.kernel.org/lkml/GV1PR10MB6563FCFF1C5DEBE84FEA985FE8B0A@GV1PR10MB6563.EURPRD10.PROD.OUTLOOK.COM ==================== Link: https://lore.kernel.org/r/GV1PR10MB6563BEFEA4269E1DDBC264B1E8BBA@GV1PR10MB6563.EURPRD10.PROD.OUTLOOK.COM Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-11-21selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in vmlinuxYuran Pereira1-8/+8
vmlinux.c uses the `CHECK` calls even though the use of ASSERT_ series of macros is preferred in the bpf selftests. This patch replaces all `CHECK` calls for equivalent `ASSERT_` macro calls. Signed-off-by: Yuran Pereira <yuran.pereira@hotmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/GV1PR10MB6563ED1023A2A3AEF30BDA5DE8BBA@GV1PR10MB6563.EURPRD10.PROD.OUTLOOK.COM
2023-11-21selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bpf_obj_idYuran Pereira1-131/+73
bpf_obj_id uses the `CHECK` calls even though the use of ASSERT_ series of macros is preferred in the bpf selftests. This patch replaces all `CHECK` calls for equivalent `ASSERT_` macro calls. Signed-off-by: Yuran Pereira <yuran.pereira@hotmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/GV1PR10MB65639AA3A10B4BBAA79952C7E8BBA@GV1PR10MB6563.EURPRD10.PROD.OUTLOOK.COM
2023-11-21selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bind_permYuran Pereira1-4/+2
bind_perm uses the `CHECK` calls even though the use of ASSERT_ series of macros is preferred in the bpf selftests. This patch replaces all `CHECK` calls for equivalent `ASSERT_` macro calls. Signed-off-by: Yuran Pereira <yuran.pereira@hotmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/GV1PR10MB656314F467E075A106CA02BFE8BBA@GV1PR10MB6563.EURPRD10.PROD.OUTLOOK.COM
2023-11-21selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_caYuran Pereira1-26/+22
bpf_tcp_ca uses the `CHECK` calls even though the use of ASSERT_ series of macros is preferred in the bpf selftests. This patch replaces all `CHECK` calls for equivalent `ASSERT_` macro calls. Signed-off-by: Yuran Pereira <yuran.pereira@hotmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/GV1PR10MB6563F180C0F2BB4F6CFA5130E8BBA@GV1PR10MB6563.EURPRD10.PROD.OUTLOOK.COM
2023-11-21MAINTAINERS: Add indirect_call_wrapper.h to NETWORKING [GENERAL]Simon Horman1-0/+1
indirect_call_wrapper.h is not, strictly speaking, networking specific. However, it's git history indicates that in practice changes go through netdev and thus the netdev maintainers have effectively been taking responsibility for it. Formalise this by adding it to the NETWORKING [GENERAL] section in the MAINTAINERS file. It is not clear how many other files under include/linux fall into this category and it would be interesting, as a follow-up, to audit that and propose further updates to the MAINTAINERS file as appropriate. Link: https://lore.kernel.org/netdev/20231116010310.4664dd38@kernel.org/ Signed-off-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231120-indirect_call_wrapper-maintainer-v1-1-0a6bb1f7363e@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-21net: phylink: use for_each_set_bit()Russell King (Oracle)1-10/+8
Use for_each_set_bit() rather than open coding the for() test_bit() loop. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Link: https://lore.kernel.org/r/E1r4p15-00Cpxe-C7@rmk-PC.armlinux.org.uk Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-21Merge branch 'hv_netvsc-fix-race-of-netvsc-vf-register-and-slave-bit'Paolo Abeni1-21/+45
Haiyang Zhang says: ==================== hv_netvsc: fix race of netvsc, VF register, and slave bit There are some races between netvsc probe, set notifier, VF register, and slave bit setting. This patch set fixes them. ==================== Link: https://lore.kernel.org/r/1700411023-14317-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-21hv_netvsc: Mark VF as slave before exposing it to user-modeLong Li1-9/+23
When a VF is being exposed form the kernel, it should be marked as "slave" before exposing to the user-mode. The VF is not usable without netvsc running as master. The user-mode should never see a VF without the "slave" flag. This commit moves the code of setting the slave flag to the time before VF is exposed to user-mode. Cc: stable@vger.kernel.org Fixes: 0c195567a8f6 ("netvsc: transparent VF management") Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-21hv_netvsc: Fix race of register_netdevice_notifier and VF registerHaiyang Zhang1-2/+7
If VF NIC is registered earlier, NETDEV_REGISTER event is replayed, but NETDEV_POST_INIT is not. Move register_netdevice_notifier() earlier, so the call back function is set before probing. Cc: stable@vger.kernel.org Fixes: e04e7a7bbd4b ("hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe()") Reported-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-21hv_netvsc: fix race of netvsc and VF register_netdeviceHaiyang Zhang1-10/+15
The rtnl lock also needs to be held before rndis_filter_device_add() which advertises nvsp_2_vsc_capability / sriov bit, and triggers VF NIC offering and registering. If VF NIC finished register_netdev() earlier it may cause name based config failure. To fix this issue, move the call to rtnl_lock() before rndis_filter_device_add(), so VF will be registered later than netvsc / synthetic NIC, and gets a name numbered (ethX) after netvsc. Cc: stable@vger.kernel.org Fixes: e04e7a7bbd4b ("hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe()") Reported-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-21ipv4: Correct/silence an endian warning in __ip_do_redirectKunwu Chan1-1/+1
net/ipv4/route.c:783:46: warning: incorrect type in argument 2 (different base types) net/ipv4/route.c:783:46: expected unsigned int [usertype] key net/ipv4/route.c:783:46: got restricted __be32 [usertype] new_gw Fixes: 969447f226b4 ("ipv4: use new_gw for redirect neigh lookup") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Link: https://lore.kernel.org/r/20231119141759.420477-1-chentao@kylinos.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-21net: stmmac: reduce dma ring display code duplicationBaruch Siach1-17/+11
The code to show extended descriptor is identical to normal one. Consolidate the code to remove duplication. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Link: https://lore.kernel.org/r/a2a5c5ce9338bdea60ec71d7eeb00fe757281557.1700372381.git.baruch@tkos.co.il Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-21net: stmmac: remove extra newline from descriptors displayBaruch Siach1-1/+0
One newline per line should be enough. Reduce the verbosity of descriptors dump. Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Baruch Siach <baruch@tkos.co.il> Link: https://lore.kernel.org/r/444f3b1dd409fdb14ed2a1ae7679a86b110dadcd.1700372381.git.baruch@tkos.co.il Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-21bonding: return -ENOMEM instead of BUG in alb_upper_dev_walkZhengchao Shao2-2/+6
If failed to allocate "tags" or could not find the final upper device from start_dev's upper list in bond_verify_device_path(), only the loopback detection of the current upper device should be affected, and the system is no need to be panic. So return -ENOMEM in alb_upper_dev_walk to stop walking, print some warn information when failed to allocate memory for vlan tags in bond_verify_device_path. I also think that the following function calls netdev_walk_all_upper_dev_rcu ---->>>alb_upper_dev_walk ---------->>>bond_verify_device_path From this way, "end device" can eventually be obtained from "start device" in bond_verify_device_path, IS_ERR(tags) could be instead of IS_ERR_OR_NULL(tags) in alb_upper_dev_walk. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20231118081653.1481260-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-21octeon_ep: support Octeon CN10K devicesShinas Rasheed6-1/+1318
Add PCI Endpoint NIC support for Octeon CN10K devices. CN10K devices are part of Octeon 10 family products with similar PCI NIC characteristics. These include: - CN10KA - CNF10KA - CNF10KB - CN10KB Update supported device list in Documentation Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Link: https://lore.kernel.org/r/20231117103817.2468176-1-srasheed@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-21platform/x86: intel_telemetry: Fix kernel doc descriptionsAndy Shevchenko1-2/+2
LKP found issues with a kernel doc in the driver: core.c:116: warning: Function parameter or member 'ioss_evtconfig' not described in 'telemetry_update_events' core.c:188: warning: Function parameter or member 'ioss_evtconfig' not described in 'telemetry_get_eventconfig' It looks like it were copy'n'paste typos when these descriptions had been introduced. Fix the typos. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310070743.WALmRGSY-lkp@intel.com/ Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20231120150756.1661425-1-andriy.shevchenko@linux.intel.com Reviewed-by: Rajneesh Bhardwaj <irenic.rajneesh@gmail.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2023-11-21MAINTAINERS: Drop Mark Gross as maintainer for x86 platform driversHans de Goede1-3/+0
Mark has not really been active as maintainer for x86 platform drivers lately, drop Mark from the MAINTAINERS entries for drivers/platform/x86, drivers/platform/mellanox and drivers/platform/surface. Cc: Mark Gross <markgross@kernel.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20231120154548.611041-1-hdegoede@redhat.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2023-11-21Docs/zh_CN/LoongArch: Update links in LoongArch introduction.rstYanteng Si1-2/+2
LoongArch-Vol1 has been updated to v1.10, the links in the documentation are out of date, let's update it. Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>