summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-03-01ipv6: use xa_array iterator to implement inet6_netconf_dump_devconf()Eric Dumazet1-58/+44
1) inet6_netconf_dump_devconf() can run under RCU protection instead of RTNL. 2) properly return 0 at the end of a dump, avoiding an an extra recvmsg() system call. 3) Do not use inet6_base_seq() anymore, for_each_netdev_dump() has nice properties. Restarting a GETDEVCONF dump if a device has been added/removed or if net->ipv6.dev_addr_genid has changed is moot. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6/addrconf: annotate data-races around devconf fields (II)Eric Dumazet7-54/+61
Final (?) round of this series. Annotate lockless reads on following devconf fields, because they be changed concurrently from /proc/net/ipv6/conf. - accept_dad - optimistic_dad - use_optimistic - use_oif_addrs_only - ra_honor_pio_life - keep_addr_on_down - ndisc_notify - ndisc_evict_nocarrier - suppress_frag_ndisc - addr_gen_mode - seg6_enabled - ioam6_enabled - ioam6_id - ioam6_id_wide - drop_unicast_in_l2_multicast - mldv[12]_unsolicited_report_interval - force_mld_version - force_tllao - accept_untracked_na - drop_unsolicited_na - accept_source_route Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6/addrconf: annotate data-races around devconf fields (I)Eric Dumazet2-31/+39
Annotate lockless reads and writes on following devconf fields: - regen_min_advance - regen_max_retry - dad_transmits - use_tempaddr - max_addresses - max_desync_factor - temp_valid_lft - rtr_solicits - rtr_solicit_max_interval - rtr_solicit_interval - rtr_solicit_delay - enhanced_dad - accept_redirects Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: addrconf_disable_policy() optimizationEric Dumazet1-7/+6
Writing over /proc/sys/net/ipv6/conf/default/disable_policy does not need to hold RTNL. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: annotate data-races around devconf->disable_policyEric Dumazet3-5/+5
idev->cnf.disable_policy and net->ipv6.devconf_all->disable_policy can be read locklessly. Add appropriate annotations on reads and writes. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: annotate data-races around devconf->proxy_ndpEric Dumazet3-4/+6
devconf->proxy_ndp can be read and written locklessly, add appropriate annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: annotate data-races in rt6_probe()Eric Dumazet1-2/+3
Use READ_ONCE() while reading idev->cnf.rtr_probe_interval while its value could be changed. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: annotate data-races around idev->cnf.ignore_routes_with_linkdownEric Dumazet2-5/+5
idev->cnf.ignore_routes_with_linkdown can be used without any locks, add appropriate annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: annotate data-races in ndisc_router_discovery()Eric Dumazet1-15/+18
Annotate reads from in6_dev->cnf.XXX fields, as they could change concurrently. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: annotate data-races around cnf.forwardingEric Dumazet7-17/+22
idev->cnf.forwarding and net->ipv6.devconf_all->forwarding might be read locklessly, add appropriate READ_ONCE() and WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: annotate data-races around cnf.hop_limitEric Dumazet6-8/+8
idev->cnf.hop_limit and net->ipv6.devconf_all->hop_limit might be read locklessly, add appropriate READ_ONCE() and WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Florian Westphal <fw@strlen.de> # for netfilter parts Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: annotate data-races around cnf.mtu6Eric Dumazet4-8/+8
idev->cnf.mtu6 might be read locklessly, add appropriate READ_ONCE() and WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: addrconf_disable_ipv6() optimizationEric Dumazet1-7/+6
Writing over /proc/sys/net/ipv6/conf/default/disable_ipv6 does not need to hold RTNL. v3: remove a wrong change (Jakub Kicinski feedback) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: annotate data-races around cnf.disable_ipv6Eric Dumazet3-7/+8
disable_ipv6 is read locklessly, add appropriate READ_ONCE() and WRITE_ONCE() annotations. v2: do not preload net before rtnl_trylock() in addrconf_disable_ipv6() (Jiri) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01ipv6: add ipv6_devconf_read_txrx cacheline_groupEric Dumazet1-4/+9
IPv6 TX and RX fast path use the following fields: - disable_ipv6 - hop_limit - mtu6 - forwarding - disable_policy - proxy_ndp Place them in a group to increase data locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski403-2087/+3927
Cross-merge networking fixes after downstream PR. Conflicts: net/mptcp/protocol.c adf1bb78dab5 ("mptcp: fix snd_wnd initialization for passive socket") 9426ce476a70 ("mptcp: annotate lockless access for RX path fields") https://lore.kernel.org/all/20240228103048.19255709@canb.auug.org.au/ Adjacent changes: drivers/dpll/dpll_core.c 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") e7f8df0e81bf ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order") drivers/net/veth.c 1ce7d306ea63 ("veth: try harder when allocating queue memory") 0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers") drivers/net/wireless/intel/iwlwifi/mvm/d3.c 8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO") 78f65fbf421a ("wifi: iwlwifi: mvm: ensure offloading TID queue exists") net/wireless/nl80211.c f78c1375339a ("wifi: nl80211: reject iftype change with mesh ID change") 414532d8aa89 ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-01Merge branch 'create-shadow-types-for-struct_ops-maps-in-skeletons'Andrii Nakryiko7-14/+372
Kui-Feng Lee says: ==================== Create shadow types for struct_ops maps in skeletons This patchset allows skeleton users to change the values of the fields in struct_ops maps at runtime. It will create a shadow type pointer in a skeleton for each struct_ops map, allowing users to access the values of fields through these pointers. For instance, if there is an integer field named "FOO" in a struct_ops map called "testmap", you can access the value of "FOO" in this way. skel->struct_ops.testmap->FOO = 13; With this feature, the users can pass flags or other data along with the map from the user space to the kernel without creating separate struct_ops map for different values in BPF. == Shadow Type == The shadow type of a struct_ops map is a variant of the original struct type of the map. The code generator translates each field in the original struct type to a field in the shadow type. The type of a field in the shadow type may not be the same as the corresponding field in the original struct type. For example, modifiers like volatile, const, etc., are removed from the fields in a shadow type. Function pointers are translated to pointers of struct bpf_program. Currently, only scalar types and function pointers are supported. Fields belonging to structs, unions, non-function pointers, arrays, or other types are not supported. For those unsupported fields, they are converted to arrays of characters to preserve their space within the original struct type. The padding between consecutive fields is handled by padding fields (__padding_*). This helps to maintain the memory layout consistent with the original struct_type. Here is an example of shadow types. The origin struct type of a struct_ops map is struct bpf_testmod_ops { int (*test_1)(void); void (*test_2)(int a, int b); /* Used to test nullable arguments. */ int (*test_maybe_null)(int dummy, struct task_struct *task); /* The following fields are used to test shadow copies. */ char onebyte; struct { int a; int b; } unsupported; int data; }; The struct_ops map, named testmod_1, of this type will be translated to a pointer in the shadow type. struct { struct my_skel__testmod_1__bpf_testmod_ops { const struct bpf_program *test_1; const struct bpf_program *test_2; const struct bpf_program *test_maybe_null; char onebyte; char __padding_4[3]; char __unsupported_4[8]; int data; } *testmod_1; } struct_ops; == Convert st_ops->data to Shadow Type == libbpf converts st_ops->data to the format of the shadow type for each struct_ops map. This means that the bytes where function pointers are located are converted to the values of the pointers of struct bpf_program. The fields of other types are kept as they were. Libbpf will synchronize the pointers of struct bpf_program with st_ops->progs[] so that users can change function pointers (bpf_program) before loading the map. --- Changes from v5: - Generate names for shadow types. - Check btf and the number of struct_ops maps in gen_st_ops_shadow() and gen_st_ops_shadow_init() instead of do_skeleton() and do_subskeleton(). - Name unsupported fields in the pattern __unsupported_*. - Have a padding field for a unsupported fields as well if necessary. - Implement resolve_func_ptr() in gen.c instead of reusing the one in libbpf. (Remove the part 1 in v4.) - Fix stylistic issues. Changes from v4: - Convert function pointers to the pointers to struct bpf_program in bpf_object__collect_st_ops_relos(). Changes from v3: - Add comment to avoid people from removing resolve_func_ptr() from libbpf_internal.h - Fix commit logs and comments. - Add an example about using the pointers of shadow types for struct_ops maps to bpftool-gen.8. v5: https://lore.kernel.org/all/20240227010432.714127-1-thinker.li@gmail.com/ v4: https://lore.kernel.org/all/20240222222624.1163754-1-thinker.li@gmail.com/ v3: https://lore.kernel.org/all/20240221012329.1387275-1-thinker.li@gmail.com/ v2: https://lore.kernel.org/all/20240214020836.1845354-1-thinker.li@gmail.com/ v1: https://lore.kernel.org/all/20240124224130.859921-1-thinker.li@gmail.com/ Kui-Feng Lee (5): libbpf: set btf_value_type_id of struct bpf_map for struct_ops. libbpf: Convert st_ops->data to shadow type. bpftool: generated shadow variables for struct_ops maps. bpftool: Add an example for struct_ops map and shadow type. selftests/bpf: Test if shadow types work correctly. .../bpf/bpftool/Documentation/bpftool-gen.rst | 58 ++++- tools/bpf/bpftool/gen.c | 237 +++++++++++++++++- tools/lib/bpf/libbpf.c | 50 +++- .../selftests/bpf/bpf_testmod/bpf_testmod.c | 11 +- .../selftests/bpf/bpf_testmod/bpf_testmod.h | 8 + .../bpf/prog_tests/test_struct_ops_module.c | 19 +- .../selftests/bpf/progs/struct_ops_module.c | 8 + 7 files changed, 377 insertions(+), 14 deletions(-) ==================== Link: https://lore.kernel.org/r/20240229064523.2091270-1-thinker.li@gmail.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-03-01selftests/bpf: Test if shadow types work correctly.Kui-Feng Lee4-5/+41
Change the values of fields, including scalar types and function pointers, and check if the struct_ops map works as expected. The test changes the field "test_2" of "testmod_1" from the pointer to test_2() to pointer to test_3() and the field "data" to 13. The function test_2() and test_3() both compute a new value for "test_2_result", but in different way. By checking the value of "test_2_result", it ensures the struct_ops map works as expected with changes through shadow types. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240229064523.2091270-6-thinker.li@gmail.com
2024-03-01bpftool: Add an example for struct_ops map and shadow type.Kui-Feng Lee1-6/+52
The example in bpftool-gen.8 explains how to use the pointer of the shadow type to change the value of a field of a struct_ops map. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20240229064523.2091270-5-thinker.li@gmail.com
2024-03-01bpftool: Generated shadow variables for struct_ops maps.Kui-Feng Lee1-1/+236
Declares and defines a pointer of the shadow type for each struct_ops map. The code generator will create an anonymous struct type as the shadow type for each struct_ops map. The shadow type is translated from the original struct type of the map. The user of the skeleton use pointers of them to access the values of struct_ops maps. However, shadow types only supports certain types of fields, including scalar types and function pointers. Any fields of unsupported types are translated into an array of characters to occupy the space of the original field. Function pointers are translated into pointers of the struct bpf_program. Additionally, padding fields are generated to occupy the space between two consecutive fields. The pointers of shadow types of struct_osp maps are initialized when *__open_opts() in skeletons are called. For a map called FOO, the user can access it through the pointer at skel->struct_ops.FOO. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20240229064523.2091270-4-thinker.li@gmail.com
2024-03-01libbpf: Convert st_ops->data to shadow type.Kui-Feng Lee1-2/+38
Convert st_ops->data to the shadow type of the struct_ops map. The shadow type of a struct_ops type is a variant of the original struct type providing a way to access/change the values in the maps of the struct_ops type. bpf_map__initial_value() will return st_ops->data for struct_ops types. The skeleton is going to use it as the pointer to the shadow type of the original struct type. One of the main differences between the original struct type and the shadow type is that all function pointers of the shadow type are converted to pointers of struct bpf_program. Users can replace these bpf_program pointers with other BPF programs. The st_ops->progs[] will be updated before updating the value of a map to reflect the changes made by users. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240229064523.2091270-3-thinker.li@gmail.com
2024-03-01libbpf: Set btf_value_type_id of struct bpf_map for struct_ops.Kui-Feng Lee1-0/+5
For a struct_ops map, btf_value_type_id is the type ID of it's struct type. This value is required by bpftool to generate skeleton including pointers of shadow types. The code generator gets the type ID from bpf_map__btf_value_type_id() in order to get the type information of the struct type of a map. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240229064523.2091270-2-thinker.li@gmail.com
2024-03-01bpf: Replace bpf_lpm_trie_key 0-length array with flexible arrayKees Cook8-25/+59
Replace deprecated 0-length array in struct bpf_lpm_trie_key with flexible array. Found with GCC 13: ../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=] 207 | *(__be16 *)&key->data[i]); | ^~~~~~~~~~~~~ ../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16' 102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x)) | ^ ../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu' 97 | #define be16_to_cpu __be16_to_cpu | ^~~~~~~~~~~~~ ../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu' 206 | u16 diff = be16_to_cpu(*(__be16 *)&node->data[i] ^ | ^~~~~~~~~~~ In file included from ../include/linux/bpf.h:7: ../include/uapi/linux/bpf.h:82:17: note: while referencing 'data' 82 | __u8 data[0]; /* Arbitrary size */ | ^~~~ And found at run-time under CONFIG_FORTIFY_SOURCE: UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:218:49 index 0 is out of range for type '__u8 [*]' Changing struct bpf_lpm_trie_key is difficult since has been used by userspace. For example, in Cilium: struct egress_gw_policy_key { struct bpf_lpm_trie_key lpm_key; __u32 saddr; __u32 daddr; }; While direct references to the "data" member haven't been found, there are static initializers what include the final member. For example, the "{}" here: struct egress_gw_policy_key in_key = { .lpm_key = { 32 + 24, {} }, .saddr = CLIENT_IP, .daddr = EXTERNAL_SVC_IP & 0Xffffff, }; To avoid the build time and run time warnings seen with a 0-sized trailing array for struct bpf_lpm_trie_key, introduce a new struct that correctly uses a flexible array for the trailing bytes, struct bpf_lpm_trie_key_u8. As part of this, include the "header" portion (which is just the "prefixlen" member), so it can be used by anything building a bpf_lpr_trie_key that has trailing members that aren't a u8 flexible array (like the self-test[1]), which is named struct bpf_lpm_trie_key_hdr. Unfortunately, C++ refuses to parse the __struct_group() helper, so it is not possible to define struct bpf_lpm_trie_key_hdr directly in struct bpf_lpm_trie_key_u8, so we must open-code the union directly. Adjust the kernel code to use struct bpf_lpm_trie_key_u8 through-out, and for the selftest to use struct bpf_lpm_trie_key_hdr. Add a comment to the UAPI header directing folks to the two new options. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Closes: https://paste.debian.net/hidden/ca500597/ Link: https://lore.kernel.org/all/202206281009.4332AA33@keescook/ [1] Link: https://lore.kernel.org/bpf/20240222155612.it.533-kees@kernel.org
2024-02-29Merge tag 'net-6.8-rc7' of ↵Linus Torvalds74-211/+974
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, WiFi and netfilter. We have one outstanding issue with the stmmac driver, which may be a LOCKDEP false positive, not a blocker. Current release - regressions: - netfilter: nf_tables: re-allow NFPROTO_INET in nft_(match/target)_validate() - eth: ionic: fix error handling in PCI reset code Current release - new code bugs: - eth: stmmac: complete meta data only when enabled, fix null-deref - kunit: fix again checksum tests on big endian CPUs Previous releases - regressions: - veth: try harder when allocating queue memory - Bluetooth: - hci_bcm4377: do not mark valid bd_addr as invalid - hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST Previous releases - always broken: - info leak in __skb_datagram_iter() on netlink socket - mptcp: - map v4 address to v6 when destroying subflow - fix potential wake-up event loss due to sndbuf auto-tuning - fix double-free on socket dismantle - wifi: nl80211: reject iftype change with mesh ID change - fix small out-of-bound read when validating netlink be16/32 types - rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back - ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr() - ip_tunnel: prevent perpetual headroom growth with huge number of tunnels on top of each other - mctp: fix skb leaks on error paths of mctp_local_output() - eth: ice: fixes for DPLL state reporting - dpll: rely on rcu for netdev_dpll_pin() to prevent UaF - eth: dpaa: accept phy-interface-type = '10gbase-r' in the device tree" * tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits) dpll: fix build failure due to rcu_dereference_check() on unknown type kunit: Fix again checksum tests on big endian CPUs tls: fix use-after-free on failed backlog decryption tls: separate no-async decryption request handling from async tls: fix peeking with sync+async decryption tls: decrement decrypt_pending if no async completion will be called gtp: fix use-after-free and null-ptr-deref in gtp_newlink() net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames igb: extend PTP timestamp adjustments to i211 rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back tools: ynl: fix handling of multiple mcast groups selftests: netfilter: add bridge conntrack + multicast test case netfilter: bridge: confirm multicast packets before passing them up the stack netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() Bluetooth: qca: Fix triggering coredump implementation Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT Bluetooth: qca: Fix wrong event type for patch config command Bluetooth: Enforce validation on max value of connection interval Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST Bluetooth: mgmt: Fix limited discoverable off timeout ...
2024-02-29Merge tag 'landlock-6.8-rc7' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux Pull Landlock fix from Mickaël Salaün: "Fix a potential issue when handling inodes with inconsistent properties" * tag 'landlock-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: landlock: Fix asymmetric private inodes referring
2024-02-29dpll: fix build failure due to rcu_dereference_check() on unknown typeEric Dumazet2-4/+9
Tasmiya reports that their compiler complains that we deref a pointer to unknown type with rcu_dereference_rtnl(): include/linux/rcupdate.h:439:9: error: dereferencing pointer to incomplete type ‘struct dpll_pin’ Unclear what compiler it is, at the moment, and we can't report but since DPLL can't be a module - move the code from the header into the source file. Fixes: 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") Reported-by: Tasmiya Nalatwad <tasmiya@linux.vnet.ibm.com> Link: https://lore.kernel.org/all/3fcf3a2c-1c1b-42c1-bacb-78fdcd700389@linux.vnet.ibm.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240229190515.2740221-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29kunit: Fix again checksum tests on big endian CPUsChristophe Leroy1-8/+9
Commit b38460bc463c ("kunit: Fix checksum tests on big endian CPUs") fixed endianness issues with kunit checksum tests, but then commit 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum") introduced new issues on big endian CPUs. Those issues are once again reflected by the warnings reported by sparse. So, fix them with the same approach, perform proper conversion in order to support both little and big endian CPUs. Once the conversions are properly done and the right types used, the sparse warnings are cleared as well. Reported-by: Erhard Furtner <erhard_f@mailbox.org> Fixes: 6f4c45cbcb00 ("kunit: Add tests for csum_ipv6_magic and ip_fast_csum") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/73df3a9e95c2179119398ad1b4c84cdacbd8dfb6.1708684443.git.christophe.leroy@csgroup.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29Merge tag 'for-net-2024-02-28' of ↵Jakub Kicinski9-20/+48
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - mgmt: Fix limited discoverable off timeout - hci_qca: Set BDA quirk bit if fwnode exists in DT - hci_bcm4377: do not mark valid bd_addr as invalid - hci_sync: Check the correct flag before starting a scan - Enforce validation on max value of connection interval - hci_sync: Fix accept_list when attempting to suspend - hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST - Avoid potential use-after-free in hci_error_reset - rfcomm: Fix null-ptr-deref in rfcomm_check_security - hci_event: Fix wrongly recorded wakeup BD_ADDR - qca: Fix wrong event type for patch config command - qca: Fix triggering coredump implementation * tag 'for-net-2024-02-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: qca: Fix triggering coredump implementation Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT Bluetooth: qca: Fix wrong event type for patch config command Bluetooth: Enforce validation on max value of connection interval Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST Bluetooth: mgmt: Fix limited discoverable off timeout Bluetooth: hci_event: Fix wrongly recorded wakeup BD_ADDR Bluetooth: rfcomm: Fix null-ptr-deref in rfcomm_check_security Bluetooth: hci_sync: Fix accept_list when attempting to suspend Bluetooth: Avoid potential use-after-free in hci_error_reset Bluetooth: hci_sync: Check the correct flag before starting a scan Bluetooth: hci_bcm4377: do not mark valid bd_addr as invalid ==================== Link: https://lore.kernel.org/r/20240228145644.2269088-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29Merge branch 'tls-a-few-more-fixes-for-async-decrypt'Jakub Kicinski1-11/+29
Sabrina Dubroca says: ==================== tls: a few more fixes for async decrypt The previous patchset [1] took care of "full async". This adds a few fixes for cases where only part of the crypto operations go the async route, found by extending my previous debug patch [2] to do N synchronous operations followed by M asynchronous ops (with N and M configurable). [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=823784&state=* [2] https://lore.kernel.org/all/9d664093b1bf7f47497b2c40b3a085b45f3274a2.1694021240.git.sd@queasysnail.net/ ==================== Link: https://lore.kernel.org/r/cover.1709132643.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29tls: fix use-after-free on failed backlog decryptionSabrina Dubroca1-7/+17
When the decrypt request goes to the backlog and crypto_aead_decrypt returns -EBUSY, tls_do_decryption will wait until all async decryptions have completed. If one of them fails, tls_do_decryption will return -EBADMSG and tls_decrypt_sg jumps to the error path, releasing all the pages. But the pages have been passed to the async callback, and have already been released by tls_decrypt_done. The only true async case is when crypto_aead_decrypt returns -EINPROGRESS. With -EBUSY, we already waited so we can tell tls_sw_recvmsg that the data is available for immediate copy, but we need to notify tls_decrypt_sg (via the new ->async_done flag) that the memory has already been released. Fixes: 859054147318 ("net: tls: handle backlogging of crypto requests") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/4755dd8d9bebdefaa19ce1439b833d6199d4364c.1709132643.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29tls: separate no-async decryption request handling from asyncSabrina Dubroca1-5/+8
If we're not doing async, the handling is much simpler. There's no reference counting, we just need to wait for the completion to wake us up and return its result. We should preferably also use a separate crypto_wait. I'm not seeing a UAF as I did in the past, I think aec7961916f3 ("tls: fix race between async notify and socket close") took care of it. This will make the next fix easier. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/47bde5f649707610eaef9f0d679519966fc31061.1709132643.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29tls: fix peeking with sync+async decryptionSabrina Dubroca1-3/+6
If we peek from 2 records with a currently empty rx_list, and the first record is decrypted synchronously but the second record is decrypted async, the following happens: 1. decrypt record 1 (sync) 2. copy from record 1 to the userspace's msg 3. queue the decrypted record to rx_list for future read(!PEEK) 4. decrypt record 2 (async) 5. queue record 2 to rx_list 6. call process_rx_list to copy data from the 2nd record We currently pass copied=0 as skip offset to process_rx_list, so we end up copying once again from the first record. We should skip over the data we've already copied. Seen with selftest tls.12_aes_gcm.recv_peek_large_buf_mult_recs Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/1b132d2b2b99296bfde54e8a67672d90d6d16e71.1709132643.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29tls: decrement decrypt_pending if no async completion will be calledSabrina Dubroca1-0/+2
With mixed sync/async decryption, or failures of crypto_aead_decrypt, we increment decrypt_pending but we never do the corresponding decrement since tls_decrypt_done will not be called. In this case, we should decrement decrypt_pending immediately to avoid getting stuck. For example, the prequeue prequeue test gets stuck with mixed modes (one async decrypt + one sync decrypt). Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/c56d5fc35543891d5319f834f25622360e1bfbec.1709132643.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29gtp: fix use-after-free and null-ptr-deref in gtp_newlink()Alexander Ofitserov1-6/+6
The gtp_link_ops operations structure for the subsystem must be registered after registering the gtp_net_ops pernet operations structure. Syzkaller hit 'general protection fault in gtp_genl_dump_pdp' bug: [ 1010.702740] gtp: GTP module unloaded [ 1010.715877] general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] SMP KASAN NOPTI [ 1010.715888] KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] [ 1010.715895] CPU: 1 PID: 128616 Comm: a.out Not tainted 6.8.0-rc6-std-def-alt1 #1 [ 1010.715899] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-alt1 04/01/2014 [ 1010.715908] RIP: 0010:gtp_newlink+0x4d7/0x9c0 [gtp] [ 1010.715915] Code: 80 3c 02 00 0f 85 41 04 00 00 48 8b bb d8 05 00 00 e8 ed f6 ff ff 48 89 c2 48 89 c5 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 4f 04 00 00 4c 89 e2 4c 8b 6d 00 48 b8 00 00 00 [ 1010.715920] RSP: 0018:ffff888020fbf180 EFLAGS: 00010203 [ 1010.715929] RAX: dffffc0000000000 RBX: ffff88800399c000 RCX: 0000000000000000 [ 1010.715933] RDX: 0000000000000001 RSI: ffffffff84805280 RDI: 0000000000000282 [ 1010.715938] RBP: 000000000000000d R08: 0000000000000001 R09: 0000000000000000 [ 1010.715942] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88800399cc80 [ 1010.715947] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000400 [ 1010.715953] FS: 00007fd1509ab5c0(0000) GS:ffff88805b300000(0000) knlGS:0000000000000000 [ 1010.715958] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1010.715962] CR2: 0000000000000000 CR3: 000000001c07a000 CR4: 0000000000750ee0 [ 1010.715968] PKRU: 55555554 [ 1010.715972] Call Trace: [ 1010.715985] ? __die_body.cold+0x1a/0x1f [ 1010.715995] ? die_addr+0x43/0x70 [ 1010.716002] ? exc_general_protection+0x199/0x2f0 [ 1010.716016] ? asm_exc_general_protection+0x1e/0x30 [ 1010.716026] ? gtp_newlink+0x4d7/0x9c0 [gtp] [ 1010.716034] ? gtp_net_exit+0x150/0x150 [gtp] [ 1010.716042] __rtnl_newlink+0x1063/0x1700 [ 1010.716051] ? rtnl_setlink+0x3c0/0x3c0 [ 1010.716063] ? is_bpf_text_address+0xc0/0x1f0 [ 1010.716070] ? kernel_text_address.part.0+0xbb/0xd0 [ 1010.716076] ? __kernel_text_address+0x56/0xa0 [ 1010.716084] ? unwind_get_return_address+0x5a/0xa0 [ 1010.716091] ? create_prof_cpu_mask+0x30/0x30 [ 1010.716098] ? arch_stack_walk+0x9e/0xf0 [ 1010.716106] ? stack_trace_save+0x91/0xd0 [ 1010.716113] ? stack_trace_consume_entry+0x170/0x170 [ 1010.716121] ? __lock_acquire+0x15c5/0x5380 [ 1010.716139] ? mark_held_locks+0x9e/0xe0 [ 1010.716148] ? kmem_cache_alloc_trace+0x35f/0x3c0 [ 1010.716155] ? __rtnl_newlink+0x1700/0x1700 [ 1010.716160] rtnl_newlink+0x69/0xa0 [ 1010.716166] rtnetlink_rcv_msg+0x43b/0xc50 [ 1010.716172] ? rtnl_fdb_dump+0x9f0/0x9f0 [ 1010.716179] ? lock_acquire+0x1fe/0x560 [ 1010.716188] ? netlink_deliver_tap+0x12f/0xd50 [ 1010.716196] netlink_rcv_skb+0x14d/0x440 [ 1010.716202] ? rtnl_fdb_dump+0x9f0/0x9f0 [ 1010.716208] ? netlink_ack+0xab0/0xab0 [ 1010.716213] ? netlink_deliver_tap+0x202/0xd50 [ 1010.716220] ? netlink_deliver_tap+0x218/0xd50 [ 1010.716226] ? __virt_addr_valid+0x30b/0x590 [ 1010.716233] netlink_unicast+0x54b/0x800 [ 1010.716240] ? netlink_attachskb+0x870/0x870 [ 1010.716248] ? __check_object_size+0x2de/0x3b0 [ 1010.716254] netlink_sendmsg+0x938/0xe40 [ 1010.716261] ? netlink_unicast+0x800/0x800 [ 1010.716269] ? __import_iovec+0x292/0x510 [ 1010.716276] ? netlink_unicast+0x800/0x800 [ 1010.716284] __sock_sendmsg+0x159/0x190 [ 1010.716290] ____sys_sendmsg+0x712/0x880 [ 1010.716297] ? sock_write_iter+0x3d0/0x3d0 [ 1010.716304] ? __ia32_sys_recvmmsg+0x270/0x270 [ 1010.716309] ? lock_acquire+0x1fe/0x560 [ 1010.716315] ? drain_array_locked+0x90/0x90 [ 1010.716324] ___sys_sendmsg+0xf8/0x170 [ 1010.716331] ? sendmsg_copy_msghdr+0x170/0x170 [ 1010.716337] ? lockdep_init_map_type+0x2c7/0x860 [ 1010.716343] ? lockdep_hardirqs_on_prepare+0x430/0x430 [ 1010.716350] ? debug_mutex_init+0x33/0x70 [ 1010.716360] ? percpu_counter_add_batch+0x8b/0x140 [ 1010.716367] ? lock_acquire+0x1fe/0x560 [ 1010.716373] ? find_held_lock+0x2c/0x110 [ 1010.716384] ? __fd_install+0x1b6/0x6f0 [ 1010.716389] ? lock_downgrade+0x810/0x810 [ 1010.716396] ? __fget_light+0x222/0x290 [ 1010.716403] __sys_sendmsg+0xea/0x1b0 [ 1010.716409] ? __sys_sendmsg_sock+0x40/0x40 [ 1010.716419] ? lockdep_hardirqs_on_prepare+0x2b3/0x430 [ 1010.716425] ? syscall_enter_from_user_mode+0x1d/0x60 [ 1010.716432] do_syscall_64+0x30/0x40 [ 1010.716438] entry_SYSCALL_64_after_hwframe+0x62/0xc7 [ 1010.716444] RIP: 0033:0x7fd1508cbd49 [ 1010.716452] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ef 70 0d 00 f7 d8 64 89 01 48 [ 1010.716456] RSP: 002b:00007fff18872348 EFLAGS: 00000202 ORIG_RAX: 000000000000002e [ 1010.716463] RAX: ffffffffffffffda RBX: 000055f72bf0eac0 RCX: 00007fd1508cbd49 [ 1010.716468] RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000006 [ 1010.716473] RBP: 00007fff18872360 R08: 00007fff18872360 R09: 00007fff18872360 [ 1010.716478] R10: 00007fff18872360 R11: 0000000000000202 R12: 000055f72bf0e1b0 [ 1010.716482] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 1010.716491] Modules linked in: gtp(+) udp_tunnel ib_core uinput af_packet rfkill qrtr joydev hid_generic usbhid hid kvm_intel iTCO_wdt intel_pmc_bxt iTCO_vendor_support kvm snd_hda_codec_generic ledtrig_audio irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_intel nls_utf8 snd_intel_dspcfg nls_cp866 psmouse aesni_intel vfat crypto_simd fat cryptd glue_helper snd_hda_codec pcspkr snd_hda_core i2c_i801 snd_hwdep i2c_smbus xhci_pci snd_pcm lpc_ich xhci_pci_renesas xhci_hcd qemu_fw_cfg tiny_power_button button sch_fq_codel vboxvideo drm_vram_helper drm_ttm_helper ttm vboxsf vboxguest snd_seq_midi snd_seq_midi_event snd_seq snd_rawmidi snd_seq_device snd_timer snd soundcore msr fuse efi_pstore dm_mod ip_tables x_tables autofs4 virtio_gpu virtio_dma_buf drm_kms_helper cec rc_core drm virtio_rng virtio_scsi rng_core virtio_balloon virtio_blk virtio_net virtio_console net_failover failover ahci libahci libata evdev scsi_mod input_leds serio_raw virtio_pci intel_agp [ 1010.716674] virtio_ring intel_gtt virtio [last unloaded: gtp] [ 1010.716693] ---[ end trace 04990a4ce61e174b ]--- Cc: stable@vger.kernel.org Signed-off-by: Alexander Ofitserov <oficerovas@altlinux.org> Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240228114703.465107-1-oficerovas@altlinux.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-29Merge branch 'net-collect-tstats-automatically'Paolo Abeni2-1/+2
Breno Leitao says: ==================== net: collect tstats automatically The commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf") added a field in struct_netdevice, which tells what type of statistics the driver supports. That field is used primarily to allocate stats structures automatically, but, it also could leveraged to simplify the drivers even further, such as, if the driver relies in the default stats collection, then it doesn't need to assign to .ndo_get_stats64. That means that drivers only assign functions to .ndo_get_stats64 if they are using something special. I started to move some of these drivers[1][2][3] to use the core allocation, and with this change in, I just need to touch the driver once, and be able to simplify the whole stats allocation and collection for generic case. There are 44 devices today that could benefit from this simplification. # grep -r .ndo_get_stats64 | grep dev_get_tstats64 | wc -l 44 As of today, netnext only has the `sit` driver fully ported to core stats allocation, hence the second patch. Links: [1] https://lore.kernel.org/all/20240227182338.2739884-1-leitao@debian.org/ [2] https://lore.kernel.org/all/20240222144117.1370101-1-leitao@debian.org/ [3] https://lore.kernel.org/all/20240223115839.3572852-1-leitao@debian.org/ ==================== Link: https://lore.kernel.org/r/20240228113125.3473685-1-leitao@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-29net: sit: Do not set .ndo_get_stats64Breno Leitao1-1/+0
If the driver is using the network core allocation mechanism, by setting NETDEV_PCPU_STAT_TSTATS, as this driver is, then, it doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64 function pointer. Since the network core calls it automatically, and .ndo_get_stats64 should only be set if the driver needs special treatment. This simplifies the driver, since all the generic statistics is now handled by core. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-29net: get stats64 if device if driver is configuredBreno Leitao1-0/+2
If the network driver is relying in the net core to do stats allocation, then we want to dev_get_tstats64() instead of netdev_stats_to_stats64(), since there are per-cpu stats that needs to be taken in consideration. This will also simplify the drivers in regard to statistics. Once the driver sets NETDEV_PCPU_STAT_TSTATS, it doesn't not need to allocate the stacks, neither it needs to set `.ndo_get_stats64 = dev_get_tstats64` for the generic stats collection function anymore. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-29net: stmmac: fix typo in commentYanteng Si1-1/+1
This is just a trivial fix for a typo in a comment, no functional changes. Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20240228112447.1490926-1-siyanteng@loongson.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-29Merge tag 'nf-24-02-29' of ↵Paolo Abeni7-1/+338
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net Patch #1 restores NFPROTO_INET with nft_compat, from Ignat Korchagin. Patch #2 fixes an issue with bridge netfilter and broadcast/multicast packets. There is a day 0 bug in br_netfilter when used with connection tracking. Conntrack assumes that an nf_conn structure that is not yet added to hash table ("unconfirmed"), is only visible by the current cpu that is processing the sk_buff. For bridge this isn't true, sk_buff can get cloned in between, and clones can be processed in parallel on different cpu. This patch disables NAT and conntrack helpers for multicast packets. Patch #3 adds a selftest to cover for the br_netfilter bug. netfilter pull request 24-02-29 * tag 'nf-24-02-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: selftests: netfilter: add bridge conntrack + multicast test case netfilter: bridge: confirm multicast packets before passing them up the stack netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() ==================== Link: https://lore.kernel.org/r/20240229000135.8780-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-29net: hsr: Use correct offset for HSR TLV values in supervisory HSR framesLukasz Majewski1-1/+1
Current HSR implementation uses following supervisory frame (even for HSRv1 the HSR tag is not is not present): 00000000: 01 15 4e 00 01 2d XX YY ZZ 94 77 10 88 fb 00 01 00000010: 7e 1c 17 06 XX YY ZZ 94 77 10 1e 06 XX YY ZZ 94 00000020: 77 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 The current code adds extra two bytes (i.e. sizeof(struct hsr_sup_tlv)) when offset for skb_pull() is calculated. This is wrong, as both 'struct hsrv1_ethhdr_sp' and 'hsrv0_ethhdr_sp' already have 'struct hsr_sup_tag' defined in them, so there is no need for adding extra two bytes. This code was working correctly as with no RedBox support, the check for HSR_TLV_EOT (0x00) was off by two bytes, which were corresponding to zeroed padded bytes for minimal packet size. Fixes: eafaa88b3eb7 ("net: hsr: Add support for redbox supervision frames") Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240228085644.3618044-1-lukma@denx.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-29ipv4: raw: remove useless input parameter in do_raw_set/getsockoptZhengchao Shao1-5/+5
The input parameter 'level' in do_raw_set/getsockopt() is not used. Therefore, remove it. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240228072505.640550-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-29Merge branch 'net-dsa-mv88e6xxx-add-amethyst-specific-smi-gpio-function'Paolo Abeni3-4/+40
Robert Marko says: ==================== net: dsa: mv88e6xxx: add Amethyst specific SMI GPIO function Amethyst family (MV88E6191X/6193X/6393X) has a simplified SMI GPIO setting via the Scratch and Misc register so it requires family specific function. In the v1 review, Andrew pointed out that it would make sense to rename the existing mv88e6xxx_g2_scratch_gpio_set_smi as it only works on the MV6390 family. Changes in v2: * Add rename of mv88e6xxx_g2_scratch_gpio_set_smi to mv88e6390_g2_scratch_gpio_set_smi ==================== Link: https://lore.kernel.org/r/20240227175457.2766628-1-robimarko@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-29net: dsa: mv88e6xxx: add Amethyst specific SMI GPIO functionRobert Marko3-1/+37
The existing mv88e6390_g2_scratch_gpio_set_smi() cannot be used on the 88E6393X as it requires certain P0_MODE, it also checks the CPU mode as it impacts the bit setting value. This is all irrelevant for Amethyst (MV88E6191X/6193X/6393X) as only the default value of the SMI_PHY Config bit is set to CPU_MGD bootstrap pin value but it can be changed without restrictions so that GPIO pins 9 and 10 are used as SMI pins. So, introduce Amethyst specific function and call that if the Amethyst family wants to setup the external PHY. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Robert Marko <robimarko@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-29net: dsa: mv88e6xxx: rename mv88e6xxx_g2_scratch_gpio_set_smiRobert Marko3-4/+4
The name mv88e6xxx_g2_scratch_gpio_set_smi is a bit ambiguous as it appears to only be applicable to the 6390 family, so lets rename it to mv88e6390_g2_scratch_gpio_set_smi to make it more obvious. Signed-off-by: Robert Marko <robimarko@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-29inet6: expand rcu_read_lock() scope in inet6_dump_addr()Eric Dumazet1-4/+4
I missed that inet6_dump_addr() is calling in6_dump_addrs() from two points. First one under RTNL protection, and second one under rcu_read_lock(). Since we want to remove RTNL use from inet6_dump_addr() very soon, no longer assume in6_dump_addrs() is protected by RTNL (even if this is still the case). Use rcu_read_lock() earlier to fix this lockdep splat: WARNING: suspicious RCU usage 6.8.0-rc5-syzkaller-01618-gf8cbf6bde4c8 #0 Not tainted net/ipv6/addrconf.c:5317 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by syz-executor.2/8834: #0: ffff88802f554678 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x119/0x780 net/netlink/af_netlink.c:2338 #1: ffffffff8f377a88 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0x676/0xda0 net/netlink/af_netlink.c:2265 #2: ffff88807e5f0580 (&ndev->lock){++--}-{2:2}, at: in6_dump_addrs+0xb8/0x1de0 net/ipv6/addrconf.c:5279 stack backtrace: CPU: 1 PID: 8834 Comm: syz-executor.2 Not tainted 6.8.0-rc5-syzkaller-01618-gf8cbf6bde4c8 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106 lockdep_rcu_suspicious+0x220/0x340 kernel/locking/lockdep.c:6712 in6_dump_addrs+0x1b47/0x1de0 net/ipv6/addrconf.c:5317 inet6_dump_addr+0x1597/0x1690 net/ipv6/addrconf.c:5428 netlink_dump+0x6a6/0xda0 net/netlink/af_netlink.c:2266 __netlink_dump_start+0x59d/0x780 net/netlink/af_netlink.c:2374 netlink_dump_start include/linux/netlink.h:340 [inline] rtnetlink_rcv_msg+0xcf7/0x10d0 net/core/rtnetlink.c:6555 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2547 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361 netlink_sendmsg+0x8e0/0xcb0 net/netlink/af_netlink.c:1902 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667 Fixes: c3718936ec47 ("ipv6: anycast: complete RCU handling of struct ifacaddr6") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240227222259.4081489-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29net: call skb_defer_free_flush() from __napi_busy_loop()Eric Dumazet1-21/+22
skb_defer_free_flush() is currently called from net_rx_action() and napi_threaded_poll(). We should also call it from __napi_busy_loop() otherwise there is the risk the percpu queue can grow until an IPI is forced from skb_attempt_defer_free() adding a latency spike. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Samiullah Khawaja <skhawaja@google.com> Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240227210105.3815474-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29tcp: remove some holes in struct tcp_sockEric Dumazet1-4/+4
By moving some fields around, this patch shrinks holes size from 56 to 32, saving 24 bytes on 64bit arches. After the patch pahole gives the following for 'struct tcp_sock': /* size: 2304, cachelines: 36, members: 162 */ /* sum members: 2234, holes: 6, sum holes: 32 */ /* sum bitfield members: 34 bits, bit holes: 5, sum bit holes: 14 bits */ /* padding: 32 */ /* paddings: 3, sum paddings: 10 */ /* forced alignments: 1, forced holes: 1, sum forced holes: 12 */ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240227192721.3558982-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29igb: extend PTP timestamp adjustments to i211Oleksij Rempel1-2/+3
The i211 requires the same PTP timestamp adjustments as the i210, according to its datasheet. To ensure consistent timestamping across different platforms, this change extends the existing adjustments to include the i211. The adjustment result are tested and comparable for i210 and i211 based systems. Fixes: 3f544d2a4d5c ("igb: adjust PTP timestamps for Tx/Rx latency") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20240227184942.362710-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29net: bridge: Exit if multicast_init_stats failsBreno Leitao1-1/+2
If br_multicast_init_stats() fails, there is no need to set lockdep classes. Just return from the error path. Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240227182338.2739884-2-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29net: bridge: Do not allocate stats in the driverBreno Leitao1-11/+2
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Remove the allocation in the bridge driver and leverage the network core allocation. Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20240227182338.2739884-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>