summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-04-04af_unix: Remove lock dance in unix_peek_fds().Kuniyuki Iwashima3-44/+1
In the previous GC implementation, the shape of the inflight socket graph was not expected to change while GC was in progress. MSG_PEEK was tricky because it could install inflight fd silently and transform the graph. Let's say we peeked a fd, which was a listening socket, and accept()ed some embryo sockets from it. The garbage collection algorithm would have been confused because the set of sockets visited in scan_inflight() would change within the same GC invocation. That's why we placed spin_lock(&unix_gc_lock) and spin_unlock() in unix_peek_fds() with a fat comment. In the new GC implementation, we no longer garbage-collect the socket if it exists in another queue, that is, if it has a bridge to another SCC. Also, accept() will require the lock if it has edges. Thus, we need not do the complicated lock dance. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240401173125.92184-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04af_unix: Remove scm_fp_dup() in unix_attach_fds().Kuniyuki Iwashima1-7/+2
When we passed fds, we used to bump each file's refcount twice in scm_fp_copy() and scm_fp_dup() before linking the socket to gc_inflight_list. This is because we incremented the inflight count of the socket and linked it to the list in advance before passing skb to the destination socket. Otherwise, the inflight socket could have been garbage-collected in a small race window between linking the socket to the list and queuing skb: CPU 1 : sendmsg(X) w/ A's fd CPU 2 : close(A) ----- ----- /* Here A's refcount is 1, and inflight count is 0 */ bump A's refcount to 2 in scm_fp_copy() bump A's inflight count to 1 link A to gc_inflight_list decrement A's refcount to 1 /* A's refcount == inflight count, thus A could be GC candidate */ start GC mark A as candidate purge A's receive queue queue skb w/ A's fd to X /* A is queued, but all data has been lost */ After commit 4090fa373f0e ("af_unix: Replace garbage collection algorithm."), we increment the inflight count and link the socket to the global list only when queuing the skb. The race no longer exists, so let's not clone the fd nor bump the count in unix_attach_fds(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240401173125.92184-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04Merge branch 'tcp-make-trace-of-reset-logic-complete'Jakub Kicinski5-18/+56
Jason Xing says: ==================== tcp: make trace of reset logic complete Before this, we miss some cases where the TCP layer could send RST but we cannot trace it. So I decided to complete it :) Link: https://lore.kernel.org/all/20240329034243.7929-1-kerneljasonxing@gmail.com/ ==================== Link: https://lore.kernel.org/r/20240401073605.37335-1-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04trace: tcp: fully support trace_tcp_send_resetJason Xing3-7/+43
Prior to this patch, what we can see by enabling trace_tcp_send is only happening under two circumstances: 1) active rst mode 2) non-active rst mode and based on the full socket That means the inconsistency occurs if we use tcpdump and trace simultaneously to see how rst happens. It's necessary that we should take into other cases into considerations, say: 1) time-wait socket 2) no socket ... By parsing the incoming skb and reversing its 4-tuple can we know the exact 'flow' which might not exist. Samples after applied this patch: 1. tcp_send_reset: skbaddr=XXX skaddr=XXX src=ip:port dest=ip:port state=TCP_ESTABLISHED 2. tcp_send_reset: skbaddr=000...000 skaddr=XXX src=ip:port dest=ip:port state=UNKNOWN Note: 1) UNKNOWN means we cannot extract the right information from skb. 2) skbaddr/skaddr could be 0 Signed-off-by: Jason Xing <kernelxing@tencent.com> Link: https://lore.kernel.org/r/20240401073605.37335-3-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04trace: adjust TP_STORE_ADDR_PORTS_SKB() parametersJason Xing3-11/+13
Introducing entry_saddr and entry_daddr parameters in this macro for later use can help us record the reverse 4-tuple by analyzing the 4-tuple of the incoming skb when receiving. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240401073605.37335-2-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04net: enable timestamp static key if CPUMarcelo Tosatti1-0/+5
For systems that use CPU isolation (via nohz_full), creating or destroying a socket with SO_TIMESTAMP, SO_TIMESTAMPNS or SO_TIMESTAMPING with flag SOF_TIMESTAMPING_RX_SOFTWARE will cause a static key to be enabled/disabled. This in turn causes undesired IPIs to isolated CPUs. So enable the static key unconditionally, if CPU isolation is enabled, thus avoiding the IPIs. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/ZgrUiLLtbEUf9SFn@tpad Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03Merge branch 'gve-ring-size-changes'David S. Miller8-122/+235
Harshitha Ramamurthy says: ==================== gve: enable ring size changes This series enables support to change ring size via ethtool in gve. The first three patches deal with some clean up, setting default values for the ring sizes and related fields. The last two patches enable ring size changes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-03gve: add support to change ring size via ethtoolHarshitha Ramamurthy3-14/+95
Allow the user to change ring size via ethtool if supported by the device. The driver relies on the ring size ranges queried from device to validate ring sizes requested by the user. Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-03gve: add support to read ring size ranges from the deviceHarshitha Ramamurthy3-24/+102
Add support to read ring size change capability and the min and max descriptor counts from the device and store it in the driver. Also accommodate a special case where the device does not provide minimum ring size depending on the version of the device. In that case, rely on default values for the minimums. Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-03gve: set page count for RX QPL for GQI and DQO queue formatsHarshitha Ramamurthy5-22/+20
Fulfill the requirement that for GQI, the number of pages per RX QPL is equal to the ring size. Set this value to be equal to ring size. Because of this change, the rx_data_slot_cnt and rx_pages_per_qpl fields stored in the priv structure are not needed, so remove their usage. And for DQO, the number of pages per RX QPL is more than ring size to account for out-of-order completions. So set it to two times of rx ring size. Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-03gve: make the completion and buffer ring size equal for DQOHarshitha Ramamurthy5-43/+13
For the DQO queue format, the gve driver stores two ring sizes for both TX and RX - one for completion queue ring and one for data buffer ring. This is supposed to enable asymmetric sizes for these two rings but that is not supported. Make both fields reference the same single variable. This change renders reading supported TX completion ring size and RX buffer ring size for DQO from the device useless, so change those fields to reserved and remove related code. Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-03gve: simplify setting decriptor count defaultsHarshitha Ramamurthy1-29/+15
Combine the gve_set_desc_cnt and gve_set_desc_cnt_dqo into one function which sets the counts after checking the queue format. Both the functions in the previous code and the new combined function never return an error so make the new function void and remove the goto on error. Also rename the new function to gve_set_default_desc_cnt to be clearer about its intention. Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-03octeontx2-pf: Reset MAC stats during probeSai Krishna9-0/+100
Reset CGX/RPM MAC HW statistics at the time of driver probe() Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-03net/smc: Avoid -Wflex-array-member-not-at-end warningsGustavo A. R. Silva2-12/+20
-Wflex-array-member-not-at-end is coming in GCC-14, and we are getting ready to enable it globally. There are currently a couple of objects in `struct smc_clc_msg_proposal_area` that contain a couple of flexible structures: struct smc_clc_msg_proposal_area { ... struct smc_clc_v2_extension pclc_v2_ext; ... struct smc_clc_smcd_v2_extension pclc_smcd_v2_ext; ... }; So, in order to avoid ending up with a couple of flexible-array members in the middle of a struct, we use the `struct_group_tagged()` helper to separate the flexible array from the rest of the members in the flexible structure: struct smc_clc_smcd_v2_extension { struct_group_tagged(smc_clc_smcd_v2_extension_fixed, fixed, u8 system_eid[SMC_MAX_EID_LEN]; u8 reserved[16]; ); struct smc_clc_smcd_gid_chid gidchid[]; }; With the change described above, we now declare objects of the type of the tagged struct without embedding flexible arrays in the middle of another struct: struct smc_clc_msg_proposal_area { ... struct smc_clc_v2_extension_fixed pclc_v2_ext; ... struct smc_clc_smcd_v2_extension_fixed pclc_smcd_v2_ext; ... }; We also use `container_of()` when we need to retrieve a pointer to the flexible structures. So, with these changes, fix the following warnings: In file included from net/smc/af_smc.c:42: net/smc/smc_clc.h:186:49: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] 186 | struct smc_clc_v2_extension pclc_v2_ext; | ^~~~~~~~~~~ net/smc/smc_clc.h:188:49: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] 188 | struct smc_clc_smcd_v2_extension pclc_smcd_v2_ext; | ^~~~~~~~~~~~~~~~ Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-03netdevice: add DEFINE_FREE() for dev_putJohannes Berg1-0/+2
For short netdev holds within a function there are still a lot of users of dev_put() rather than netdev_put(). Add DEFINE_FREE() to allow making those safer. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-03rtnetlink: add guard for RTNLJohannes Berg1-0/+3
The new guard/scoped_gard can be useful for the RTNL as well, so add a guard definition for it. It gets used like { guard(rtnl)(); // RTNL held until end of block } or scoped_guard(rtnl) { // RTNL held in this block } as with any other guard/scoped_guard. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-03Merge branch '100GbE' of ↵Jakub Kicinski20-500/+814
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-04-01 (ice) This series contains updates to ice driver only. Michal Schmidt changes flow for gettimex64 to use host-side spinlock rather than hardware semaphore for lighter-weight locking. Steven adds ability for switch recipes to be re-used when firmware supports it. Thorsten Blum removes unwanted newlines in netlink messaging. Michal Swiatkowski and Piotr re-organize devlink related code; renaming, moving, and consolidating it to a single location. Michal also simplifies the devlink init and cleanup path to occur under a single lock call. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: hold devlink lock for whole init/cleanup ice: move devlink port code to a separate file ice: move ice_devlink.[ch] to devlink folder ice: Remove newlines in NL_SET_ERR_MSG_MOD ice: Add switch recipe reusing feature ice: fold ice_ptp_read_time into ice_ptp_gettimex64 ice: avoid the PTP hardware semaphore in gettimex64 path ice: add ice_adapter for shared data across PFs on the same NIC ==================== Link: https://lore.kernel.org/r/20240401172421.1401696-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03dt-bindings: net: snps,dwmac: Align 'snps,priority' type definitionRob Herring1-2/+4
'snps,priority' is also defined in dma/snps,dw-axi-dmac.yaml as a uint32-array. It's preferred to have a single type for a given property name, so update the type in snps,dwmac schema to match. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240401204422.1692359-2-robh@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03Merge branch 'doc-netlink-add-a-yaml-spec-for-team'Jakub Kicinski7-127/+346
Hangbin Liu says: ==================== doc/netlink: add a YAML spec for team Add a YAML spec for team. As we need to link two objects together to form the team module, rename team to team_core for linking. ==================== Link: https://lore.kernel.org/r/20240401031004.1159713-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03uapi: team: use header file generated from YAML specHangbin Liu1-73/+43
generated with: $ ./tools/net/ynl/ynl-gen-c.py --mode uapi \ > --spec Documentation/netlink/specs/team.yaml \ > --header -o include/uapi/linux/if_team.h Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240401031004.1159713-5-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net: team: use policy generated by YAML specHangbin Liu4-55/+98
generated with: $ ./tools/net/ynl/ynl-gen-c.py --mode kernel \ > --spec Documentation/netlink/specs/team.yaml --source \ > -o drivers/net/team/team_nl.c $ ./tools/net/ynl/ynl-gen-c.py --mode kernel \ > --spec Documentation/netlink/specs/team.yaml --header \ > -o drivers/net/team/team_nl.h The TEAM_ATTR_LIST_PORT in team_nl_policy is removed as it is only in the port list reply attributes. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240401031004.1159713-4-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net: team: rename team to team_core for linkingHangbin Liu2-0/+1
Similar with commit 08d323234d10 ("net: fou: rename the source for linking"), We'll need to link two objects together to form the team module. This means the source can't be called team, the build system expects team.o to be the combined object. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240401031004.1159713-3-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03Documentation: netlink: add a YAML spec for teamHangbin Liu2-0/+205
Add a YAML specification for team. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240401031004.1159713-2-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03tcp/dccp: complete lockless accesses to sk->sk_max_ack_backlogJason Xing1-1/+1
Since commit 099ecf59f05b ("net: annotate lockless accesses to sk->sk_max_ack_backlog") decided to handle the sk_max_ack_backlog locklessly, there is one more function mostly called in TCP/DCCP cases. So this patch completes it:) Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240331090521.71965-1-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03caif: Use UTILITY_NAME_LENGTH instead of hard-coding 16Christophe JAILLET1-4/+4
UTILITY_NAME_LENGTH is 16. So better use the former when defining the 'utility_name' array. This makes the intent clearer when it is used around line 260. While at it, declare variable in reverse xmas tree style. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/8c1160501f69b64bb2d45ce9f26f746eec80ac77.1711787352.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03Merge branch 'avoid-explicit-cpumask-var-allocation-on-stack'Jakub Kicinski2-13/+27
Dawei Li says: ==================== Avoid explicit cpumask var allocation on stack v1: https://lore.kernel.org/lkml/20240329105610.922675-1-dawei.li@shingroup.cn/ ==================== Link: https://lore.kernel.org/r/20240331053441.1276826-1-dawei.li@shingroup.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net/dpaa2: Avoid explicit cpumask var allocation on stackDawei Li1-5/+9
For CONFIG_CPUMASK_OFFSTACK=y kernel, explicit allocation of cpumask variable on stack is not recommended since it can cause potential stack overflow. Instead, kernel code should always use *cpumask_var API(s) to allocate cpumask var in config-neutral way, leaving allocation strategy to CONFIG_CPUMASK_OFFSTACK. Use *cpumask_var API(s) to address it. Signed-off-by: Dawei Li <dawei.li@shingroup.cn> Link: https://lore.kernel.org/r/20240331053441.1276826-3-dawei.li@shingroup.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net/iucv: Avoid explicit cpumask var allocation on stackDawei Li1-8/+18
For CONFIG_CPUMASK_OFFSTACK=y kernel, explicit allocation of cpumask variable on stack is not recommended since it can cause potential stack overflow. Instead, kernel code should always use *cpumask_var API(s) to allocate cpumask var in config-neutral way, leaving allocation strategy to CONFIG_CPUMASK_OFFSTACK. Use *cpumask_var API(s) to address it. Signed-off-by: Dawei Li <dawei.li@shingroup.cn> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://lore.kernel.org/r/20240331053441.1276826-2-dawei.li@shingroup.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net: dsa: sja1105: drop driver owner assignmentKrzysztof Kozlowski1-1/+0
Core in spi_register_driver() already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240330211023.100924-2-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net: dsa: microchip: drop driver owner assignmentKrzysztof Kozlowski1-1/+0
Core in spi_register_driver() already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240330211023.100924-1-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03dt-bindings: net: renesas,ethertsn: Create child-node for MDIO busNiklas Söderlund1-19/+14
The bindings for Renesas Ethernet TSN was just merged in v6.9 and the design for the bindings followed that of other Renesas Ethernet drivers and thus did not force a child-node for the MDIO bus. As there are no upstream drivers or users of this binding yet take the opportunity to correct this and force the usage of a child-node for the MDIO bus. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20240330131228.1541227-1-niklas.soderlund+renesas@ragnatech.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03Merge branch 'page_pool-allow-direct-bulk-recycling'Jakub Kicinski5-61/+63
Alexander Lobakin says: ==================== page_pool: allow direct bulk recycling Previously, there was no reliable way to check whether it's safe to use direct PP cache. The drivers were passing @allow_direct to the PP recycling functions and that was it. Bulk recycling is used by xdp_return_frame_bulk() on .ndo_xdp_xmit() frames completion where the page origin is unknown, thus the direct recycling has never been tried. Now that we have at least 2 ways of checking if we're allowed to perform direct recycling -- pool->p.napi (Jakub) and pool->cpuid (Lorenzo), we can use them when doing bulk recycling as well. Just move that logic from the skb core to the PP core and call it before __page_pool_put_page() every time @allow_direct is false. Under high .ndo_xdp_xmit() traffic load, the win is 2-3% Pps assuming the sending driver uses xdp_return_frame_bulk() on Tx completion. ==================== Link: https://lore.kernel.org/r/20240329165507.3240110-1-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03page_pool: try direct bulk recyclingAlexander Lobakin1-2/+5
Now that the checks for direct recycling possibility live inside the Page Pool core, reuse them when performing bulk recycling. page_pool_put_page_bulk() can be called from process context as well, page_pool_napi_local() takes care of this at the very beginning. Under high .ndo_xdp_xmit() traffic load, the win is 2-3% Pps assuming the sending driver uses xdp_return_frame_bulk() on Tx completion. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20240329165507.3240110-3-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03page_pool: check for PP direct cache locality laterAlexander Lobakin5-59/+58
Since we have pool->p.napi (Jakub) and pool->cpuid (Lorenzo) to check whether it's safe to use direct recycling, we can use both globally for each page instead of relying solely on @allow_direct argument. Let's assume that @allow_direct means "I'm sure it's local, don't waste time rechecking this" and when it's false, try the mentioned params to still recycle the page directly. If neither is true, we'll lose some CPU cycles, but then it surely won't be hotpath. On the other hand, paths where it's possible to use direct cache, but not possible to safely set @allow_direct, will benefit from this move. The whole propagation of @napi_safe through a dozen of skb freeing functions can now go away, which saves us some stack space. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20240329165507.3240110-2-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03rhashtable: Improve grammarJonathan Neuschäfer1-5/+5
Change "a" to "an" according to the usual rules, fix an "if" that was mistyped as "in", improve grammar in "considerable slow" -> "considerably slower". Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240329-misc-rhashtable-v1-1-5862383ff798@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03tools: ynl: add ynl_dump_empty() helperJakub Kicinski2-0/+14
Checking if dump is empty requires a couple of casts. Add a convenient wrapper. Add an example use in the netdev sample, loopback is always present so an empty dump is an error. Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20240329181651.319326-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03nfp: Avoid -Wflex-array-member-not-at-end warningsGustavo A. R. Silva1-16/+25
-Wflex-array-member-not-at-end is coming in GCC-14, and we are getting ready to enable it globally. There is currently an object (`tl`), at the beginning of multiple structures, that contains a flexible structure (`struct nfp_dump_tl`), for example: struct nfp_dumpspec_csr { struct nfp_dump_tl tl; ... __be32 register_width; /* in bits */ }; So, in order to avoid ending up with flexible-array members in the middle of multiple other structs, we use the `struct_group_tagged()` helper to separate the flexible array from the rest of the members in the flexible structure: struct nfp_dump_tl { struct_group_tagged(nfp_dump_tl_hdr, hdr, ... the rest of members ); char data[]; }; With the change described above, we now declare objects of the type of the tagged struct, in this case `struct nfp_dump_tl_hdr`, without embedding flexible arrays in the middle of another struct: struct nfp_dumpspec_csr { struct nfp_dump_tl_hdr tl; ... __be32 register_width; /* in bits */ }; Also, use `container_of()` whenever we need to retrieve a pointer to the flexible structure, through which we can access the flexible array if needed. So, with these changes, fix 33 of the following warnings: drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c:58:28: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c:64:28: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c:70:28: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c:78:28: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c:87:28: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c:92:28: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Link: https://github.com/KSPP/linux/issues/202 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/ZgYWlkxdrrieDYIu@neat Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net: phy: aquantia: add support for AQR114C PHY IDPaweł Owoc1-0/+21
Add support for AQR114C PHY ID. This PHY advertise 10G speed: SPEED(0x04): 0x6031 capabilities: -400g +5g +2.5g -200g -25g -10g-xr -100g -40g -10g/1g -10 +100 +1000 -10-ts -2-tl +10g EXTABLE(0x0B): 0x40fc capabilities: -10g-cx4 -10g-lrm +10g-t +10g-kx4 +10g-kr +1000-t +1000-kx +100-tx -10-t -p2mp -40g/100g -1000/100-t1 -25g -200g/400g +2.5g/5g -1000-h but supports only up to 5G speed (as with AQR111/111B0). AQR111 init config is used to set max speed 5G. Signed-off-by: Paweł Owoc <frut3k7@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240401145114.1699451-1-frut3k7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02ipv6: remove RTNL protection from inet6_dump_fib()Eric Dumazet1-25/+26
No longer hold RTNL while calling inet6_dump_fib(). Also change return value for a completed dump, so that NLMSG_DONE can be appended to current skb, saving one recvmsg() system call. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240329183053.644630-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02Merge branch 'genetlink-remove-linux-genetlink-h'Jakub Kicinski11-27/+26
Jakub Kicinski says: ==================== genetlink: remove linux/genetlink.h There are two genetlink headers net/genetlink.h and linux/genetlink.h This is similar to netlink.h, but for netlink.h both contain good amount of code. For genetlink.h the linux/ version is leftover from before uAPI headers were split out, it has 10 lines of code. Move those 10 lines into other appropriate headers and delete linux/genetlink.h. I occasionally open the wrong header in the editor when coding, I guess I'm not the only one. v2: https://lore.kernel.org/all/20240325173716.2390605-1-kuba@kernel.org/ v1: https://lore.kernel.org/all/20240309183458.3014713-1-kuba@kernel.org ==================== Link: https://lore.kernel.org/r/20240329175710.291749-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02genetlink: remove linux/genetlink.hJakub Kicinski7-20/+12
genetlink.h is a shell of what used to be a combined uAPI and kernel header over a decade ago. It has fewer than 10 lines of code. Merge it into net/genetlink.h. In some ways it'd be better to keep the combined header under linux/ but it would make looking through git history harder. Acked-by: Sven Eckelmann <sven@narfation.org> Link: https://lore.kernel.org/r/20240329175710.291749-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02net: openvswitch: remove unnecessary linux/genetlink.h includeJakub Kicinski1-1/+0
The only legit reason I could think of for net/genetlink.h and linux/genetlink.h to be separate would be if one was included by other headers and we wanted to keep it lightweight. That is not the case, net/openvswitch/meter.h includes linux/genetlink.h but for no apparent reason (for struct genl_family perhaps? it's not necessary, types of externs do not need to be known). Link: https://lore.kernel.org/r/20240329175710.291749-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02netlink: create a new header for internal genetlink symbolsJakub Kicinski4-6/+14
There are things in linux/genetlink.h which are only used under net/netlink/. Move them to a new local header. A new header with just 2 externs isn't great, but alternative would be to include af_netlink.h in genetlink.c which feels even worse. Link: https://lore.kernel.org/r/20240329175710.291749-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02Merge branch '1GbE' of ↵Jakub Kicinski13-649/+582
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-03-29 (net: intel) This series contains updates to most Intel drivers. Jesse moves declaration of pci_driver struct to remove need for forward declarations in igb and converts Intel drivers to user newer power management ops. Sasha reworks power management flow on igc to avoid using rtnl_lock() during those flows. Maciej reorganizes i40e_nvm file to avoid forward declarations. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: i40e: avoid forward declarations in i40e_nvm.c igc: Refactor runtime power management flow net: intel: implement modern PM ops declarations igb: simplify pci ops declaration ==================== Link: https://lore.kernel.org/r/20240329175632.211340-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02tcp/dccp: do not care about families in inet_twsk_purge()Eric Dumazet8-25/+10
We lost ability to unload ipv6 module a long time ago. Instead of calling expensive inet_twsk_purge() twice, we can handle all families in one round. Also remove an extra line added in my prior patch, per Kuniyuki Iwashima feedback. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/netdev/20240327192934.6843-1-kuniyu@amazon.com/ Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240329153203.345203-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02inet: preserve const qualifier in inet_csk()Eric Dumazet7-11/+8
We can change inet_csk() to propagate its argument const qualifier, thanks to container_of_const(). We have to fix few places that had mistakes, like tcp_bound_rto(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240329144931.295800-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02Merge branch 'doc-netlink-add-hyperlinks-to-generated-docs'Jakub Kicinski2-19/+94
Donald Hunter says: ==================== doc: netlink: Add hyperlinks to generated docs Extend ynl-gen-rst to generate hyperlinks to definitions, attribute sets and sub-messages from all the places that reference them. ==================== Link: https://lore.kernel.org/r/20240329135021.52534-1-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02doc: netlink: Update tc spec with missing definitionsDonald Hunter1-0/+51
The tc spec referenced tc-u32-mark and tc-act-police-attrs but did not define them. The missing definitions were discovered when building the docs with generated hyperlinks because the hyperlink target labels were missing. Add definitions for tc-u32-mark and tc-act-police-attrs. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240329135021.52534-4-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02doc: netlink: Add hyperlinks to generated Netlink docsDonald Hunter1-18/+42
Update ynl-gen-rst to generate hyperlinks to definitions, attribute sets and sub-messages from all the places that reference them. Note that there is a single label namespace for all of the kernel docs. Hyperlinks within a single netlink doc need to be qualified by the family name to avoid collisions. The label format is 'family-type-name' which gives, for example, 'rt-link-attribute-set-link-attrs' as the link id. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240329135021.52534-3-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02doc: netlink: Change generated docs to limit TOC to depth 3Donald Hunter1-1/+1
The tables of contents in the generated Netlink docs include individual attribute definitions. This can make the contents exceedingly long and repeats a lot of what is on the rest of the pages. See for example: https://docs.kernel.org/networking/netlink_spec/tc.html Add a depth limit to the contents directive in generated .rst files to limit the contents depth to 3 levels. This reduces the contents to: - Family - Summary - Operations - op-one - op-two - ... - Definitions - struct-one - struct-two - enum-one - ... - Attribute sets - attrs-one - attrs-two - ... Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240329135021.52534-2-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>