summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-08-16inet: move inet->hdrincl to inet->inet_flagsEric Dumazet10-41/+31
IP_HDRINCL socket option can now be set/read without locking the socket. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16inet: move inet->freebind to inet->inet_flagsEric Dumazet7-22/+22
IP_FREEBIND socket option can now be set/read without locking the socket. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16inet: move inet->recverr_rfc4884 to inet->inet_flagsEric Dumazet3-11/+11
IP_RECVERR_RFC4884 socket option can now be set/read without locking the socket. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16inet: move inet->recverr to inet->inet_flagsEric Dumazet9-32/+30
IP_RECVERR socket option can now be set/get without locking the socket. This patch potentially avoid data-races around inet->recverr. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16inet: set/get simple options locklesslyEric Dumazet1-56/+62
Now we have inet->inet_flags, we can set following options without having to hold the socket lock: IP_PKTINFO, IP_RECVTTL, IP_RECVTOS, IP_RECVOPTS, IP_RETOPTS, IP_PASSSEC, IP_RECVORIGDSTADDR, IP_RECVFRAGSIZE. ip_sock_set_pktinfo() no longer hold the socket lock. Similarly we can get the following options whithout holding the socket lock: IP_PKTINFO, IP_RECVTTL, IP_RECVTOS, IP_RECVOPTS, IP_RETOPTS, IP_PASSSEC, IP_RECVORIGDSTADDR, IP_CHECKSUM, IP_RECVFRAGSIZE. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16inet: introduce inet->inet_flagsEric Dumazet8-71/+83
Various inet fields are currently racy. do_ip_setsockopt() and do_ip_getsockopt() are mostly holding the socket lock, but some (fast) paths do not. Use a new inet->inet_flags to hold atomic bits in the series. Remove inet->cmsg_flags, and use instead 9 bits from inet_flags. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16Merge branch 'redundant-of_match_ptr'David S. Miller6-7/+7
Ruan Jinjie says: ==================== net: Remove redundant of_match_ptr() macro Since these net drivers depend on CONFIG_OF, there is no need to wrap the macro of_match_ptr() here. Changes in v3: - Collect responses from v1 and v2. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16wlcore: spi: Remove redundant of_match_ptr()Ruan Jinjie1-1/+1
The driver depends on CONFIG_OF, it is not necessary to use of_match_ptr() here. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: qualcomm: Remove redundant of_match_ptr()Ruan Jinjie1-1/+1
The driver depends on CONFIG_OF, it is not necessary to use of_match_ptr() here. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: gemini: Remove redundant of_match_ptr()Ruan Jinjie1-2/+2
The driver depends on CONFIG_OF, it is not necessary to use of_match_ptr() here. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: dsa: rzn1-a5psw: Remove redundant of_match_ptr()Ruan Jinjie1-1/+1
The driver depends on CONFIG_OF, it is not necessary to use of_match_ptr() here. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: dsa: realtek: Remove redundant of_match_ptr()Ruan Jinjie2-2/+2
The driver depends on CONFIG_OF, it is not necessary to use of_match_ptr() here. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16nfc: virtual_ncidev: Use module_misc_device macro to simplify the codeLi Zetao1-12/+1
Use the module_misc_device macro to simplify the code, which is the same as declaring with module_init() and module_exit(). Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16Merge branch 'hns3-ethtool'David S. Miller12-681/+875
Jijie Shao says: ==================== hns3: refactor registers information for ethtool -d refactor registers information for ethtool -d ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: hns3: fix wrong rpu tln reg issueJijie Shao4-2/+71
In the original RPU query command, the status register values of multiple RPU tunnels are accumulated by default, which is unreasonable. This patch Fix it by querying the specified tunnel ID. The tunnel number of the device can be obtained from firmware during initialization. Fixes: ddb54554fa51 ("net: hns3: add DFX registers information for ethtool -d") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: hns3: Support tlv in regs data for HNS3 VF driverJijie Shao1-24/+61
The dump register function is being refactored. The third step in refactoring is to support tlv info in regs data for HNS3 PF driver. Currently, if we use "ethtool -d" to dump regs value, the output is as follows: offset1: 00 01 02 03 04 05 ... offset2:10 11 12 13 14 15 ... ...... We can't get the value of a register directly. This patch deletes the original separator information and add tag_len_value information in regs data. ethtool can parse register data in key-value format by -d command. a patch will be added to the ethtool to parse regs data in the following format: reg1 : value2 reg2 : value2 ...... Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: hns3: Support tlv in regs data for HNS3 PF driverJijie Shao1-65/+102
The dump register function is being refactored. The second step in refactoring is to support tlv info in regs data for HNS3 PF driver. Currently, if we use "ethtool -d" to dump regs value, the output is as follows: offset1: 00 01 02 03 04 05 ... offset2:10 11 12 13 14 15 ... ...... We can't get the value of a register directly. This patch deletes the original separator information and add tag_len_value information in regs data. ethtool can parse register data in key-value format by -d command. a patch will be added to the ethtool to parse regs data in the following format: reg1 : value2 reg2 : value2 ...... Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: hns3: move dump regs function to a separate fileJijie Shao10-680/+731
The dump register function is being refactored. The first step in refactoring is put the dump regs function into a separate file. Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16Merge branch 'fec-XDP_TX'David S. Miller2-61/+132
Wei Fang says: ==================== net: fec: add XDP_TX feature support This patch set is to support the XDP_TX feature of FEC driver, the first patch is add initial XDP_TX support, and the second patch improves the performance of XDP_TX by not using xdp_convert_buff_to_frame(). Please refer to the commit message of each patch for more details. ==================== Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: fec: improve XDP_TX performanceWei Fang2-70/+75
As suggested by Jesper and Alexander, we can avoid converting xdp_buff to xdp_frame in case of XDP_TX to save a bunch of CPU cycles, so that we can further improve the XDP_TX performance. Before this patch on i.MX8MP-EVK board, the performance shows as follows. root@imx8mpevk:~# ./xdp2 eth0 proto 17: 353918 pkt/s proto 17: 352923 pkt/s proto 17: 353900 pkt/s proto 17: 352672 pkt/s proto 17: 353912 pkt/s proto 17: 354219 pkt/s After applying this patch, the performance is improved. root@imx8mpevk:~# ./xdp2 eth0 proto 17: 369261 pkt/s proto 17: 369267 pkt/s proto 17: 369206 pkt/s proto 17: 369214 pkt/s proto 17: 369126 pkt/s proto 17: 369272 pkt/s Signed-off-by: Wei Fang <wei.fang@nxp.com> Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Suggested-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: fec: add XDP_TX feature supportWei Fang2-21/+87
The XDP_TX feature is not supported before, and all the frames which are deemed to do XDP_TX action actually do the XDP_DROP action. So this patch adds the XDP_TX support to FEC driver. I tested the performance of XDP_TX in XDP_DRV mode and XDP_SKB mode respectively on i.MX8MP-EVK platform, and as suggested by Jesper, I also tested the performance of XDP_REDIRECT on the same platform. And the test steps and results are as follows. XDP_TX test: Step 1: One board is used as generator and connects to switch,and the FEC port of DUT also connects to the switch. Both boards with flow control off. Then the generator runs the pktgen_sample03_burst_single_flow.sh script to generate and send burst traffic to DUT. Note that the size of packet was set to 64 bytes and the procotol of packet was UDP in my test scenario. In addition, the SMAC of the packet need to be different from the MAC of the generator, because the xdp2 program will swap the DMAC and SMAC of the packet and send it back to the generator. If the SMAC of the generated packet is the MAC of the generator, the generator will receive the returned traffic which increase the CPU loading and significantly degrade the transmit speed of the generator, and finally it affects the test of XDP_TX performance. Step 2: The DUT runs the xdp2 program to transmit received UDP packets back out on the same port where they were received. root@imx8mpevk:~# ./xdp2 eth0 proto 17: 353918 pkt/s proto 17: 352923 pkt/s proto 17: 353900 pkt/s proto 17: 352672 pkt/s proto 17: 353912 pkt/s proto 17: 354219 pkt/s root@imx8mpevk:~# ./xdp2 -S eth0 proto 17: 160604 pkt/s proto 17: 160708 pkt/s proto 17: 160564 pkt/s proto 17: 160684 pkt/s proto 17: 160640 pkt/s proto 17: 160720 pkt/s The above results show that the XDP_TX performance of XDP_DRV mode is much better than XDP_SKB mode, more than twice that of XDP_SKB mode, which is in line with our expectation. XDP_REDIRECT test: Step1: Both the generator and the FEC port of the DUT connet to the switch port. All the ports with flow control off, then the generator runs the pktgen script to generate and send burst traffic to DUT. Note that the size of packet was set to 64 bytes and the procotol of packet was UDP in my test scenario. Step2: The DUT runs the xdp_redirect program to redirect the traffic from the FEC port to the FEC port itself. root@imx8mpevk:~# ./xdp_redirect eth0 eth0 Redirecting from eth0 (ifindex 2; driver fec) to eth0 (ifindex 2; driver fec) Summary 232,302 rx/s 0 err,drop/s 232,344 xmit/s Summary 234,579 rx/s 0 err,drop/s 234,577 xmit/s Summary 235,548 rx/s 0 err,drop/s 235,549 xmit/s Summary 234,704 rx/s 0 err,drop/s 234,703 xmit/s Summary 235,504 rx/s 0 err,drop/s 235,504 xmit/s Summary 235,223 rx/s 0 err,drop/s 235,224 xmit/s Summary 234,509 rx/s 0 err,drop/s 234,507 xmit/s Summary 235,481 rx/s 0 err,drop/s 235,482 xmit/s Summary 234,684 rx/s 0 err,drop/s 234,683 xmit/s Summary 235,520 rx/s 0 err,drop/s 235,520 xmit/s Summary 235,461 rx/s 0 err,drop/s 235,461 xmit/s Summary 234,627 rx/s 0 err,drop/s 234,627 xmit/s Summary 235,611 rx/s 0 err,drop/s 235,611 xmit/s Packets received : 3,053,753 Average packets/s : 234,904 Packets transmitted : 3,053,792 Average transmit/s : 234,907 Compared the performance of XDP_TX with XDP_REDIRECT, XDP_TX is also much better than XDP_REDIRECT. It's also in line with our expectation. Signed-off-by: Wei Fang <wei.fang@nxp.com> Suggested-by: Jesper Dangaard Brouer <hawk@kernel.org> Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16selftests: bonding: remove redundant delete action of device link1_1Zhengchao Shao1-1/+0
When run command "ip netns delete client", device link1_1 has been deleted. So, it is no need to delete link1_1 again. Remove it. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16Merge tag 'mlx5-updates-2023-08-14' of ↵Jakub Kicinski25-569/+672
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-08-14 1) Handle PTP out of order CQEs issue 2) Check FW status before determining reset successful 3) Expose maximum supported SFs via devlink resource 4) MISC cleanups * tag 'mlx5-updates-2023-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Don't query MAX caps twice net/mlx5: Remove unused MAX HCA capabilities net/mlx5: Remove unused CAPs net/mlx5: Fix error message in mlx5_sf_dev_state_change_handler() net/mlx5: Remove redundant check of mlx5_vhca_event_supported() net/mlx5: Use mlx5_sf_start_function_id() helper instead of directly calling MLX5_CAP_GEN() net/mlx5: Remove redundant SF supported check from mlx5_sf_hw_table_init() net/mlx5: Use auxiliary_device_uninit() instead of device_put() net/mlx5: E-switch, Add checking for flow rule destinations net/mlx5: Check with FW that sync reset completed successfully net/mlx5: Expose max possible SFs via devlink resource net/mlx5e: Add recovery flow for tx devlink health reporter for unhealthy PTP SQ net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs net/mlx5: Consolidate devlink documentation in devlink/mlx5.rst ==================== Link: https://lore.kernel.org/r/20230814214144.159464-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16Merge branch 'net-warn-about-attempts-to-register-negative-ifindex'Jakub Kicinski3-3/+35
Jakub Kicinski says: ==================== net: warn about attempts to register negative ifindex Follow up to the recently posted fix for OvS lacking input validation: https://lore.kernel.org/all/20230814203840.2908710-1-kuba@kernel.org/ Warn about negative ifindex more explicitly and misc YNL updates. ==================== Link: https://lore.kernel.org/r/20230814205627.2914583-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16tools: ynl: add more info to KeyErrors on missing attrsJakub Kicinski1-3/+12
When developing specs its useful to know which attr space YNL was trying to find an attribute in on key error. Instead of printing: KeyError: 0 add info about the space: Exception: Space 'vport' has no attribute with value '0' Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20230814205627.2914583-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16netlink: specs: add ovs_vport new commandJakub Kicinski1-0/+18
Add NEW to the spec, it was useful testing the fix for OvS input validation. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20230814205627.2914583-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16net: warn about attempts to register negative ifindexJakub Kicinski1-0/+5
Since the xarray changes we mix returning valid ifindex and negative errno in a single int returned from dev_index_reserve(). This depends on the fact that ifindexes can't be negative. Otherwise we may insert into the xarray and return a very large negative value. This in turn may break ERR_PTR(). OvS is susceptible to this problem and lacking validation (fix posted separately for net). Reject negative ifindex explicitly. Add a warning because the input validation is better handled by the caller. Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230814205627.2914583-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16eth: r8152: try to use a normal budgetJakub Kicinski1-2/+1
Mario reports that loading r8152 on his system leads to a: netif_napi_add_weight() called with weight 256 warning getting printed. We don't have any solid data on why such high budget was chosen, and it may cause stalls in processing other softirqs and rt threads. So try to switch back to the default (64) weight. If this slows down someone's system we should investigate which part of stopping starting the NAPI poll in this driver are expensive. Reported-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/all/0bfd445a-81f7-f702-08b0-bd5a72095e49@amd.com/ Acked-by: Hayes Wang <hayeswang@realtek.com> Link: https://lore.kernel.org/r/20230814153521.2697982-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16net: e1000e: Remove unused declarationsYue Haibing1-2/+0
Commit bdfe2da6aefd ("e1000e: cosmetic move of function prototypes to the new mac.h") declared but never implemented them. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230814135821.4808-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16qed: remove unused 'resp_size' calculationArnd Bergmann1-28/+17
Newer versions of clang warn about this variable being assigned but never used: drivers/net/ethernet/qlogic/qed/qed_vf.c:63:67: error: parameter 'resp_size' set but not used [-Werror,-Wunused-but-set-parameter] There is no indication in the git history on how this was ever meant to be used, so just remove the entire calculation and argument passing for it to avoid the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230814074512.1067715-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16net: phy: mediatek-ge-soc: support PHY LEDsDaniel Golle1-9/+426
Implement netdev trigger and primitive bliking offloading as well as simple set_brigthness function for both PHY LEDs of the in-SoC PHYs found in MT7981 and MT7988. For MT7988, read boottrap register and apply LED polarities accordingly to get uniform behavior from all LEDs on MT7988. This requires syscon phandle 'mediatek,pio' present in parenting MDIO bus which should point to the syscon holding the boottrap register. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/dc324d48c00cd7350f3a506eaa785324cae97372.1691977904.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16Merge branch 'nexthop-various-cleanups'Jakub Kicinski1-6/+0
Ido Schimmel says: ==================== nexthop: Various cleanups Benefit from recent bug fixes and simplify the nexthop dump code. No regressions in existing tests: # ./fib_nexthops.sh [...] Tests passed: 234 Tests failed: 0 ==================== Link: https://lore.kernel.org/r/20230813164856.2379822-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16nexthop: Do not increment dump sentinel at the end of the dumpIdo Schimmel1-1/+0
The nexthop and nexthop bucket dump callbacks previously returned a positive return code even when the dump was complete, prompting the core netlink code to invoke the callback again, until returning zero. Zero was only returned by these callbacks when no information was filled in the provided skb, which was achieved by incrementing the dump sentinel at the end of the dump beyond the ID of the last nexthop. This is no longer necessary as when the dump is complete these callbacks return zero. Remove the unnecessary increment. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230813164856.2379822-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16nexthop: Simplify nexthop bucket dumpIdo Schimmel1-5/+0
Before commit f10d3d9df49d ("nexthop: Make nexthop bucket dump more efficient"), rtm_dump_nexthop_bucket_nh() returned a non-zero return code for each resilient nexthop group whose buckets it dumped, regardless if it encountered an error or not. This meant that the sentinel ('dd->ctx->nh.idx') used by the function that walked the different nexthops could not be used as a sentinel for the bucket dump, as otherwise buckets from the same group would be dumped over and over again. This was dealt with by adding another sentinel ('dd->ctx->done_nh_idx') that was incremented by rtm_dump_nexthop_bucket_nh() after successfully dumping all the buckets from a given group. After the previously mentioned commit this sentinel is no longer necessary since the function no longer returns a non-zero return code when successfully dumping all the buckets from a given group. Remove this sentinel and simplify the code. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230813164856.2379822-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16Merge branch 'seg6-add-next-c-sid-support-for-srv6-end-x-behavior'Jakub Kicinski3-20/+1302
Andrea Mayer says: ==================== seg6: add NEXT-C-SID support for SRv6 End.X behavior In the Segment Routing (SR) architecture a list of instructions, called segments, can be added to the packet headers to influence the forwarding and processing of the packets in an SR enabled network. Considering the Segment Routing over IPv6 data plane (SRv6) [1], the segment identifiers (SIDs) are IPv6 addresses (128 bits) and the segment list (SID List) is carried in the Segment Routing Header (SRH). A segment may correspond to a "behavior" that is executed by a node when the packet is received. The Linux kernel currently supports a large subset of the behaviors described in [2] (e.g., End, End.X, End.T and so on). In some SRv6 scenarios, the number of segments carried by the SID List may increase dramatically, reducing the MTU (Maximum Transfer Unit) size and/or limiting the processing power of legacy hardware devices (due to longer IPv6 headers). The NEXT-C-SID mechanism [3] extends the SRv6 architecture by providing several ways to efficiently represent the SID List. By leveraging the NEXT-C-SID, it is possible to encode several SRv6 segments within a single 128 bit SID address (also referenced as Compressed SID Container). In this way, the length of the SID List can be drastically reduced. The NEXT-C-SID mechanism is built upon the "flavors" framework defined in [2]. This framework is already supported by the Linux SRv6 subsystem and is used to modify and/or extend a subset of existing behaviors. In this patchset, we extend the SRv6 End.X behavior in order to support the NEXT-C-SID mechanism. In details, the patchset is made of: - patch 1/2: add NEXT-C-SID support for SRv6 End.X behavior; - patch 2/2: add selftest for NEXT-C-SID in SRv6 End.X behavior. From the user space perspective, we do not need to change the iproute2 code to support the NEXT-C-SID flavor for the SRv6 End.X behavior. However, we will update the man page considering the NEXT-C-SID flavor applied to the SRv6 End.X behavior in a separate patch. [1] - https://datatracker.ietf.org/doc/html/rfc8754 [2] - https://datatracker.ietf.org/doc/html/rfc8986 [3] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression ==================== Link: https://lore.kernel.org/r/20230812180926.16689-1-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16selftests: seg6: add selftest for NEXT-C-SID flavor in SRv6 End.X behaviorPaolo Lungaroni2-0/+1214
This selftest is designed for testing the support of NEXT-C-SID flavor for SRv6 End.X behavior. It instantiates a virtual network composed of several nodes: hosts and SRv6 routers. Each node is realized using a network namespace that is properly interconnected to others through veth pairs, according to the topology depicted in the selftest script file. The test considers SRv6 routers implementing IPv4/IPv6 L3 VPNs leveraged by hosts for communicating with each other. Such routers i) apply different SRv6 Policies to the traffic received from connected hosts, considering the IPv4 or IPv6 protocols; ii) use the NEXT-C-SID compression mechanism for encoding several SRv6 segments within a single 128-bit SID address, referred to as a Compressed SID (C-SID) container. The NEXT-C-SID is provided as a "flavor" of the SRv6 End.X behavior, enabling it to properly process the C-SID containers. The correct execution of the enabled NEXT-C-SID SRv6 End.X behavior is verified through reachability tests carried out between hosts belonging to the same VPN. Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it> Co-developed-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230812180926.16689-3-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16seg6: add NEXT-C-SID support for SRv6 End.X behaviorAndrea Mayer1-20/+88
The NEXT-C-SID mechanism described in [1] offers the possibility of encoding several SRv6 segments within a single 128 bit SID address. Such a SID address is called a Compressed SID (C-SID) container. In this way, the length of the SID List can be drastically reduced. A SID instantiated with the NEXT-C-SID flavor considers an IPv6 address logically structured in three main blocks: i) Locator-Block; ii) Locator-Node Function; iii) Argument. C-SID container +------------------------------------------------------------------+ | Locator-Block |Loc-Node| Argument | | |Function| | +------------------------------------------------------------------+ <--------- B -----------> <- NF -> <------------- A ---------------> (i) The Locator-Block can be any IPv6 prefix available to the provider; (ii) The Locator-Node Function represents the node and the function to be triggered when a packet is received on the node; (iii) The Argument carries the remaining C-SIDs in the current C-SID container. This patch leverages the NEXT-C-SID mechanism previously introduced in the Linux SRv6 subsystem [2] to support SID compression capabilities in the SRv6 End.X behavior [3]. An SRv6 End.X behavior with NEXT-C-SID flavor works as an End.X behavior but it is capable of processing the compressed SID List encoded in C-SID containers. An SRv6 End.X behavior with NEXT-C-SID flavor can be configured to support user-provided Locator-Block and Locator-Node Function lengths. In this implementation, such lengths must be evenly divisible by 8 (i.e. must be byte-aligned), otherwise the kernel informs the user about invalid values with a meaningful error code and message through netlink_ext_ack. If Locator-Block and/or Locator-Node Function lengths are not provided by the user during configuration of an SRv6 End.X behavior instance with NEXT-C-SID flavor, the kernel will choose their default values i.e., 32-bit Locator-Block and 16-bit Locator-Node Function. [1] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression [2] - https://lore.kernel.org/all/20220912171619.16943-1-andrea.mayer@uniroma2.it/ [3] - https://datatracker.ietf.org/doc/html/rfc8986#name-endx-l3-cross-connect Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230812180926.16689-2-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16Merge branch 'genetlink-provide-struct-genl_info-to-dumps'Jakub Kicinski45-180/+236
Jakub Kicinski says: ==================== genetlink: provide struct genl_info to dumps One of the biggest (which is not to say only) annoyances with genetlink handling today is that doit and dumpit need some of the same information, but it is passed to them in completely different structs. The implementations commonly end up writing a _fill() method which populates a message and have to pass at least 6 parameters. 3 of which are extracted manually from request info. After a lot of umming and ahing I decided to populate struct genl_info for dumps, without trying to factor out only the common parts. This makes the adoption easiest. In the future we may add a new version of dump which takes struct genl_info *info as the second argument, instead of struct netlink_callback *cb. For now developers have to call genl_info_dump(cb) to get the info. Typical genetlink families no longer get exposed to netlink protocol internals like pid and seq numbers. v3: - correct the condition in ethtool code (patch 10) v2: https://lore.kernel.org/all/20230810233845.2318049-1-kuba@kernel.org/ - replace the GENL_INFO_NTF() macro with init helper - fix the commit messages v1: https://lore.kernel.org/all/20230809182648.1816537-1-kuba@kernel.org/ ==================== Link: https://lore.kernel.org/r/20230814214723.2924989-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16ethtool: netlink: always pass genl_info to .prepare_dataJakub Kicinski25-47/+47
We had a number of bugs in the past because developers forgot to fully test dumps, which pass NULL as info to .prepare_data. .prepare_data implementations would try to access info->extack leading to a null-deref. Now that dumps and notifications can access struct genl_info we can pass it in, and remove the info null checks. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # pause Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-11-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16ethtool: netlink: simplify arguments to ethnl_default_parse()Jakub Kicinski1-12/+9
Pass struct genl_info directly instead of its members. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-10-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16netdev-genl: use struct genl_info for reply constructionJakub Kicinski1-9/+8
Use the just added APIs to make the code simpler. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16genetlink: add genlmsg_iput() APIJakub Kicinski1-1/+53
Add some APIs and helpers required for convenient construction of replies and notifications based on struct genl_info. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16genetlink: add a family pointer to struct genl_infoJakub Kicinski2-11/+14
Having family in struct genl_info is quite useful. It cuts down the number of arguments which need to be passed to helpers which already take struct genl_info. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16genetlink: use attrs from struct genl_infoJakub Kicinski14-23/+22
Since dumps carry struct genl_info now, use the attrs pointer from genl_info and remove the one in struct genl_dumpit_info. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16genetlink: add struct genl_info to struct genl_dumpit_infoJakub Kicinski2-2/+22
Netlink GET implementations must currently juggle struct genl_info and struct netlink_callback, depending on whether they were called from doit or dumpit. Add genl_info to the dump state and populate the fields. This way implementations can simply pass struct genl_info around. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16genetlink: remove userhdr from struct genl_infoJakub Kicinski7-27/+33
Only three families use info->userhdr today and going forward we discourage using fixed headers in new families. So having the pointer to user header in struct genl_info is an overkill. Compute the header pointer at runtime. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/20230814214723.2924989-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16genetlink: make genl_info->nlhdr constJakub Kicinski3-3/+3
struct netlink_callback has a const nlh pointer, make the pointer in struct genl_info const as well, to make copying between the two easier. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-16genetlink: push conditional locking into dumpit/doneJakub Kicinski1-55/+35
Add helpers which take/release the genl mutex based on family->parallel_ops. Remove the separation between handling of ops in locked and parallel families. Future patches would make the duplicated code grow even more. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15net: dsa: mv88e6060: add phylink_get_caps implementationRussell King (Oracle)1-0/+45
Add a phylink_get_caps implementation for Marvell 88e6060 DSA switch. This is a fast ethernet switch, with internal PHYs for ports 0 through 4. Port 4 also supports MII, REVMII, REVRMII and SNI. Port 5 supports MII, REVMII, REVRMII and SNI without an internal PHY. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/E1qUkx7-003dMX-9b@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15net/mlx5: Don't query MAX caps twiceShay Drory1-6/+6
Whenever mlx5 driver is probed or reloaded, it queries some capabilities in MAX mode via set_hca_cap() API. Afterwards, the driver queries all capabilities in MAX mode via mlx5_query_hca_caps() API. Since MAX caps are read only caps, querying them twice is redundant. Hence, delete the second query. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>