summaryrefslogtreecommitdiff
path: root/net/tipc
AgeCommit message (Collapse)AuthorFilesLines
2022-02-15tipc: fix wrong publisher node address in link publicationsJon Maloy1-1/+1
When a link comes up we add its presence to the name table to make it possible for users to subscribe for link up/down events. However, after a previous call signature change the binding is wrongly published with the peer node as publishing node, instead of the own node as it should be. This has the effect that the command 'tipc name table show' will list the link binding (service type 2) with node scope and a peer node as originator, something that obviously is impossible. We correct this bug here. Fixes: 50a3499ab853 ("tipc: simplify signature of tipc_namtbl_publish()") Signed-off-by: Jon Maloy <jmaloy@redhat.com> Link: https://lore.kernel.org/r/20220214013852.2803940-1-jmaloy@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-13tipc: fix a bit overflow in tipc_crypto_key_rcv()Hangyu Hua1-1/+1
msg_data_sz return a 32bit value, but size is 16bit. This may lead to a bit overflow. Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-3/+10
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-11Merge tag 'net-5.17-rc4' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter and can. Current release - new code bugs: - sparx5: fix get_stat64 out-of-bound access and crash - smc: fix netdev ref tracker misuse Previous releases - regressions: - eth: ixgbevf: require large buffers for build_skb on 82599VF, avoid overflows - eth: ocelot: fix all IP traffic getting trapped to CPU with PTP over IP - bonding: fix rare link activation misses in 802.3ad mode Previous releases - always broken: - tcp: fix tcp sock mem accounting in zero-copy corner cases - remove the cached dst when uncloning an skb dst and its metadata, since we only have one ref it'd lead to an UaF - netfilter: - conntrack: don't refresh sctp entries in closed state - conntrack: re-init state for retransmitted syn-ack, avoid connection establishment getting stuck with strange stacks - ctnetlink: disable helper autoassign, avoid it getting lost - nft_payload: don't allow transport header access for fragments - dsa: fix use of devres for mdio throughout drivers - eth: amd-xgbe: disable interrupts during pci removal - eth: dpaa2-eth: unregister netdev before disconnecting the PHY - eth: ice: fix IPIP and SIT TSO offload" * tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister net: mscc: ocelot: fix mutex lock error during ethtool stats read ice: Avoid RTNL lock when re-creating auxiliary device ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler ice: fix IPIP and SIT TSO offload ice: fix an error code in ice_cfg_phy_fec() net: mpls: Fix GCC 12 warning dpaa2-eth: unregister the netdev before disconnecting from the PHY skbuff: cleanup double word in comment net: macb: Align the dma and coherent dma masks mptcp: netlink: process IPv6 addrs in creating listening sockets selftests: mptcp: add missing join check net: usb: qmi_wwan: Add support for Dell DW5829e vlan: move dev_put into vlan_dev_uninit vlan: introduce vlan_dev_free_egress_priority ax25: fix UAF bugs of net_device caused by rebinding operation net: dsa: fix panic when DSA master device unbinds on shutdown net: amd-xgbe: disable interrupts during pci removal tipc: rate limit warning for received illegal binding update net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE ...
2022-02-10tipc: improve size validations for received domain recordsJon Maloy2-2/+9
The function tipc_mon_rcv() allows a node to receive and process domain_record structs from peer nodes to track their views of the network topology. This patch verifies that the number of members in a received domain record does not exceed the limit defined by MAX_MON_DOMAIN, something that may otherwise lead to a stack overflow. tipc_mon_rcv() is called from the function tipc_link_proto_rcv(), where we are reading a 32 bit message data length field into a uint16. To avert any risk of bit overflow, we add an extra sanity check for this in that function. We cannot see that happen with the current code, but future designers being unaware of this risk, may introduce it by allowing delivery of very large (> 64k) sk buffers from the bearer layer. This potential problem was identified by Eric Dumazet. This fixes CVE-2022-0435 Reported-by: Samuel Page <samuel.page@appgate.com> Reported-by: Eric Dumazet <edumazet@google.com> Fixes: 35c55c9877f8 ("tipc: add neighbor monitoring framework") Signed-off-by: Jon Maloy <jmaloy@redhat.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Samuel Page <samuel.page@appgate.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-09tipc: rate limit warning for received illegal binding updateJon Maloy1-1/+1
It would be easy to craft a message containing an illegal binding table update operation. This is handled correctly by the code, but the corresponding warning printout is not rate limited as is should be. We fix this now. Fixes: b97bf3fd8f6a ("[TIPC] Initial merge") Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-27net: tipc: remove unused static inlinesJakub Kicinski1-23/+0
IIUC the TIPC msg helpers are not meant to provide and exhaustive API, so remove the unused ones. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+2
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-01net ticp:fix a kernel-infoleak in __tipc_sendmsg()Haimin Zhang1-0/+2
struct tipc_socket_addr.ref has a 4-byte hole,and __tipc_getname() currently copying it to user space,causing kernel-infoleak. BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline] BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:121 [inline] lib/usercopy.c:33 BUG: KMSAN: kernel-infoleak in _copy_to_user+0x1c9/0x270 lib/usercopy.c:33 lib/usercopy.c:33 instrument_copy_to_user include/linux/instrumented.h:121 [inline] instrument_copy_to_user include/linux/instrumented.h:121 [inline] lib/usercopy.c:33 _copy_to_user+0x1c9/0x270 lib/usercopy.c:33 lib/usercopy.c:33 copy_to_user include/linux/uaccess.h:209 [inline] copy_to_user include/linux/uaccess.h:209 [inline] net/socket.c:287 move_addr_to_user+0x3f6/0x600 net/socket.c:287 net/socket.c:287 __sys_getpeername+0x470/0x6b0 net/socket.c:1987 net/socket.c:1987 __do_sys_getpeername net/socket.c:1997 [inline] __se_sys_getpeername net/socket.c:1994 [inline] __do_sys_getpeername net/socket.c:1997 [inline] net/socket.c:1994 __se_sys_getpeername net/socket.c:1994 [inline] net/socket.c:1994 __x64_sys_getpeername+0xda/0x120 net/socket.c:1994 net/socket.c:1994 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_x64 arch/x86/entry/common.c:51 [inline] arch/x86/entry/common.c:82 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Uninit was stored to memory at: tipc_getname+0x575/0x5e0 net/tipc/socket.c:757 net/tipc/socket.c:757 __sys_getpeername+0x3b3/0x6b0 net/socket.c:1984 net/socket.c:1984 __do_sys_getpeername net/socket.c:1997 [inline] __se_sys_getpeername net/socket.c:1994 [inline] __do_sys_getpeername net/socket.c:1997 [inline] net/socket.c:1994 __se_sys_getpeername net/socket.c:1994 [inline] net/socket.c:1994 __x64_sys_getpeername+0xda/0x120 net/socket.c:1994 net/socket.c:1994 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_x64 arch/x86/entry/common.c:51 [inline] arch/x86/entry/common.c:82 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Uninit was stored to memory at: msg_set_word net/tipc/msg.h:212 [inline] msg_set_destport net/tipc/msg.h:619 [inline] msg_set_word net/tipc/msg.h:212 [inline] net/tipc/socket.c:1486 msg_set_destport net/tipc/msg.h:619 [inline] net/tipc/socket.c:1486 __tipc_sendmsg+0x44fa/0x5890 net/tipc/socket.c:1486 net/tipc/socket.c:1486 tipc_sendmsg+0xeb/0x140 net/tipc/socket.c:1402 net/tipc/socket.c:1402 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] sock_sendmsg_nosec net/socket.c:704 [inline] net/socket.c:2409 sock_sendmsg net/socket.c:724 [inline] net/socket.c:2409 ____sys_sendmsg+0xe11/0x12c0 net/socket.c:2409 net/socket.c:2409 ___sys_sendmsg net/socket.c:2463 [inline] ___sys_sendmsg net/socket.c:2463 [inline] net/socket.c:2492 __sys_sendmsg+0x704/0x840 net/socket.c:2492 net/socket.c:2492 __do_sys_sendmsg net/socket.c:2501 [inline] __se_sys_sendmsg net/socket.c:2499 [inline] __do_sys_sendmsg net/socket.c:2501 [inline] net/socket.c:2499 __se_sys_sendmsg net/socket.c:2499 [inline] net/socket.c:2499 __x64_sys_sendmsg+0xe2/0x120 net/socket.c:2499 net/socket.c:2499 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_x64 arch/x86/entry/common.c:51 [inline] arch/x86/entry/common.c:82 do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Local variable skaddr created at: __tipc_sendmsg+0x2d0/0x5890 net/tipc/socket.c:1419 net/tipc/socket.c:1419 tipc_sendmsg+0xeb/0x140 net/tipc/socket.c:1402 net/tipc/socket.c:1402 Bytes 4-7 of 16 are uninitialized Memory access of size 16 starts at ffff888113753e00 Data copied to user address 0000000020000280 Reported-by: syzbot+cdbd40e0c3ca02cae3b7@syzkaller.appspotmail.com Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Link: https://lore.kernel.org/r/1640918123-14547-1-git-send-email-tcs.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-4/+4
include/net/sock.h commit 8f905c0e7354 ("inet: fully convert sk->sk_rx_dst to RCU rules") commit 43f51df41729 ("net: move early demux fields close to sk_refcnt") https://lore.kernel.org/all/20211222141641.0caa0ab3@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-18Revert "tipc: use consistent GFP flags"Hoang Le1-4/+4
This reverts commit 86c3a3e964d910a62eeb277d60b2a60ebefa9feb. The tipc_aead_init() function can be calling from an interrupt routine. This allocation might sleep with GFP_KERNEL flag, hence the following BUG is reported. [ 17.657509] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:230 [ 17.660916] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/3 [ 17.664093] preempt_count: 302, expected: 0 [ 17.665619] RCU nest depth: 2, expected: 0 [ 17.667163] Preemption disabled at: [ 17.667165] [<0000000000000000>] 0x0 [ 17.669753] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G W 5.16.0-rc4+ #1 [ 17.673006] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 17.675540] Call Trace: [ 17.676285] <IRQ> [ 17.676913] dump_stack_lvl+0x34/0x44 [ 17.678033] __might_resched.cold+0xd6/0x10f [ 17.679311] kmem_cache_alloc_trace+0x14d/0x220 [ 17.680663] tipc_crypto_start+0x4a/0x2b0 [tipc] [ 17.682146] ? kmem_cache_alloc_trace+0xd3/0x220 [ 17.683545] tipc_node_create+0x2f0/0x790 [tipc] [ 17.684956] tipc_node_check_dest+0x72/0x680 [tipc] [ 17.686706] ? ___cache_free+0x31/0x350 [ 17.688008] ? skb_release_data+0x128/0x140 [ 17.689431] tipc_disc_rcv+0x479/0x510 [tipc] [ 17.690904] tipc_rcv+0x71c/0x730 [tipc] [ 17.692219] ? __netif_receive_skb_core+0xb7/0xf60 [ 17.693856] tipc_l2_rcv_msg+0x5e/0x90 [tipc] [ 17.695333] __netif_receive_skb_list_core+0x20b/0x260 [ 17.697072] netif_receive_skb_list_internal+0x1bf/0x2e0 [ 17.698870] ? dev_gro_receive+0x4c2/0x680 [ 17.700255] napi_complete_done+0x6f/0x180 [ 17.701657] virtnet_poll+0x29c/0x42e [virtio_net] [ 17.703262] __napi_poll+0x2c/0x170 [ 17.704429] net_rx_action+0x22f/0x280 [ 17.705706] __do_softirq+0xfd/0x30a [ 17.706921] common_interrupt+0xa4/0xc0 [ 17.708206] </IRQ> [ 17.708922] <TASK> [ 17.709651] asm_common_interrupt+0x1e/0x40 [ 17.711078] RIP: 0010:default_idle+0x18/0x20 Fixes: 86c3a3e964d9 ("tipc: use consistent GFP flags") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Link: https://lore.kernel.org/r/20211217030059.5947-1-hoang.h.le@dektech.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-15net: add net device refcount tracker to struct packet_typeEric Dumazet1-2/+2
Most notable changes are in af_packet, tipc ones are trivial. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jon Maloy <jmaloy@redhat.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-13tipc: discard MSG_CRYPTO msgs when key_exchange_enabled is not setXin Long1-1/+2
When key_exchange is disabled, there is no reason to accept MSG_CRYPTO msgs if it doesn't send MSG_CRYPTO msgs. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-26tipc: delete the unlikely branch in tipc_aead_encryptXin Long1-15/+4
When a skb comes to tipc_aead_encrypt(), it's always linear. The unlikely check 'skb_cloned(skb) && tailen <= skb_tailroom(skb)' can completely be taken care of in skb_cow_data() by the code in branch "if (!skb_has_frag_list())". Also, remove the 'TODO:' annotation, as the pages in skbs are not writable, see more on commit 3cf4375a0904 ("tipc: do not write skb_shinfo frags when doing decrytion"). Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Link: https://lore.kernel.org/r/47a478da0b6095b76e3cbe7a75cbd25d9da1df9a.1637773872.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-18tipc: check for null after calling kmemdupTadeusz Struk1-0/+4
kmemdup can return a null pointer so need to check for it, otherwise the null key will be dereferenced later in tipc_crypto_key_xmit as can be seen in the trace [1]. Cc: tipc-discussion@lists.sourceforge.net Cc: stable@vger.kernel.org # 5.15, 5.14, 5.10 [1] https://syzkaller.appspot.com/bug?id=bca180abb29567b189efdbdb34cbf7ba851c2a58 Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Link: https://lore.kernel.org/r/20211115160143.5099-1-tadeusz.struk@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-15tipc: only accept encrypted MSG_CRYPTO msgsXin Long1-2/+5
The MSG_CRYPTO msgs are always encrypted and sent to other nodes for keys' deployment. But when receiving in peers, if those nodes do not validate it and make sure it's encrypted, one could craft a malicious MSG_CRYPTO msg to deploy its key with no need to know other nodes' keys. This patch is to do that by checking TIPC_SKB_CB(skb)->decrypted and discard it if this packet never got decrypted. Note that this is also a supplementary fix to CVE-2021-43267 that can be triggered by an unencrypted malicious MSG_CRYPTO msg. Fixes: 1ef6f7c9390f ("tipc: add automatic session key exchange") Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-15tipc: use consistent GFP flagsTadeusz Struk1-4/+4
Some functions, like tipc_crypto_start use inconsisten GFP flags when allocating memory. The mentioned function use GFP_ATOMIC to to alloc a crypto instance, and then calls alloc_ordered_workqueue() which allocates memory with GFP_KERNEL. tipc_aead_init() function even uses GFP_KERNEL and GFP_ATOMIC interchangeably. No doc comment specifies what context a function is designed to work in, but the flags should at least be consistent within a function. Cc: Jon Maloy <jmaloy@redhat.com> Cc: Ying Xue <ying.xue@windriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Cc: tipc-discussion@lists.sourceforge.net Cc: linux-kernel@vger.kernel.org Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-11/+21
include/net/sock.h 7b50ecfcc6cd ("net: Rename ->stream_memory_read to ->sock_is_readable") 4c1e34c0dbff ("vsock: Enable y2038 safe timeval for timeout") drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c 0daa55d033b0 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table") e77bcdd1f639 ("octeontx2-af: Display all enabled PF VF rsrc_alloc entries.") Adjacent code addition in both cases, keep both. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-26tipc: fix size validations for the MSG_CRYPTO typeMax VA1-11/+21
The function tipc_crypto_key_rcv is used to parse MSG_CRYPTO messages to receive keys from other nodes in the cluster in order to decrypt any further messages from them. This patch verifies that any supplied sizes in the message body are valid for the received message. Fixes: 1ef6f7c9390f ("tipc: add automatic session key exchange") Signed-off-by: Max VA <maxv@sentinelone.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-13tipc: constify dev_addr passingJakub Kicinski4-5/+5
In preparation for netdev->dev_addr being constant make all relevant arguments in tipc constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-13tipc: increase timeout in tipc_sk_enqueue()Hoang Le1-1/+1
In tipc_sk_enqueue() we use hardcoded 2 jiffies to extract socket buffer from generic queue to particular socket. The 2 jiffies is too short in case there are other high priority tasks get CPU cycles for multiple jiffies update. As result, no buffer could be enqueued to particular socket. To solve this, we switch to use constant timeout 20msecs. Then, the function will be expired between 2 jiffies (CONFIG_100HZ) and 20 jiffies (CONFIG_1000HZ). Fixes: c637c1035534 ("tipc: resolve race problem at unicast message reception") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-03tipc: clean up inconsistent indentingColin Ian King1-1/+1
There is a statement that is indented one character too deeply, clean this up. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
drivers/ptp/Kconfig: 55c8fca1dae1 ("ptp_pch: Restore dependency on PCI") e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-16tipc: call tipc_wait_for_connect only when dlen is not 0Xin Long1-1/+1
__tipc_sendmsg() is called to send SYN packet by either tipc_sendmsg() or tipc_connect(). The difference is in tipc_connect(), it will call tipc_wait_for_connect() after __tipc_sendmsg() to wait until connecting is done. So there's no need to wait in __tipc_sendmsg() for this case. This patch is to fix it by calling tipc_wait_for_connect() only when dlen is not 0 in __tipc_sendmsg(), which means it's called by tipc_connect(). Note this also fixes the failure in tipcutils/test/ptts/: # ./tipcTS & # ./tipcTC 9 (hang) Fixes: 36239dab6da7 ("tipc: fix implicit-connect for SYN+") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-3/+3
Conflicts: drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h 9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp") 9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins") 099fdeda659d ("bnxt_en: Event handler for PPS events") kernel/bpf/helpers.c include/linux/bpf-cgroup.h a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()") c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current") drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c 5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray") 2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer") MAINTAINERS 7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo") 7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-12Revert "tipc: Return the correct errno code"Hoang Le1-3/+3
This reverts commit 0efea3c649f0 because of: - The returning -ENOBUF error is fine on socket buffer allocation. - There is side effect in the calling path tipc_node_xmit()->tipc_link_xmit() when checking error code returning. Fixes: 0efea3c649f0 ("tipc: Return the correct errno code") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-23/+21
Conflicting commits, all resolutions pretty trivial: drivers/bus/mhi/pci_generic.c 5c2c85315948 ("bus: mhi: pci-generic: configurable network interface MRU") 56f6f4c4eb2a ("bus: mhi: pci_generic: Apply no-op for wake using sideband wake boolean") drivers/nfc/s3fwrn5/firmware.c a0302ff5906a ("nfc: s3fwrn5: remove unnecessary label") 46573e3ab08f ("nfc: s3fwrn5: fix undefined parameter values in dev_err()") 801e541c79bb ("nfc: s3fwrn5: fix undefined parameter values in dev_err()") MAINTAINERS 7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver") 8a7b46fa7902 ("MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-25tipc: fix an use-after-free issue in tipc_recvmsgXin Long1-3/+5
syzbot reported an use-after-free crash: BUG: KASAN: use-after-free in tipc_recvmsg+0xf77/0xf90 net/tipc/socket.c:1979 Call Trace: tipc_recvmsg+0xf77/0xf90 net/tipc/socket.c:1979 sock_recvmsg_nosec net/socket.c:943 [inline] sock_recvmsg net/socket.c:961 [inline] sock_recvmsg+0xca/0x110 net/socket.c:957 tipc_conn_rcv_from_sock+0x162/0x2f0 net/tipc/topsrv.c:398 tipc_conn_recv_work+0xeb/0x190 net/tipc/topsrv.c:421 process_one_work+0x98d/0x1630 kernel/workqueue.c:2276 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422 As Hoang pointed out, it was caused by skb_cb->bytes_read still accessed after calling tsk_advance_rx_queue() to free the skb in tipc_recvmsg(). This patch is to fix it by accessing skb_cb->bytes_read earlier than calling tsk_advance_rx_queue(). Fixes: f4919ff59c28 ("tipc: keep the skb in rcv queue until the whole data is read") Reported-by: syzbot+e6741b97d5552f97c24d@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-24tipc: do not write skb_shinfo frags when doing decrytionXin Long1-10/+4
One skb's skb_shinfo frags are not writable, and they can be shared with other skbs' like by pskb_copy(). To write the frags may cause other skb's data crash. So before doing en/decryption, skb_cow_data() should always be called for a cloned or nonlinear skb if req dst is using the same sg as req src. While at it, the likely branch can be removed, as it will be covered by skb_cow_data(). Note that esp_input() has the same issue, and I will fix it in another patch. tipc_aead_encrypt() doesn't have this issue, as it only processes linear data in the unlikely branch. Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23tipc: fix sleeping in tipc accept routineHoang Le1-5/+4
The release_sock() is blocking function, it would change the state after sleeping. In order to evaluate the stated condition outside the socket lock context, switch to use wait_woken() instead. Fixes: 6398e23cdb1d8 ("tipc: standardize accept routine") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23tipc: fix implicit-connect for SYN+Xin Long1-8/+13
For implicit-connect, when it's either SYN- or SYN+, an ACK should be sent back to the client immediately. It's not appropriate for the client to enter established state only after receiving data from the server. On client side, after the SYN is sent out, tipc_wait_for_connect() should be called to wait for the ACK if timeout is set. This patch also restricts __tipc_sendstream() to call __sendmsg() only when it's in TIPC_OPEN state, so that the client can program in a single loop doing both connecting and data sending like: for (...) sendmsg(dest, buf); This makes the implicit-connect more implicit. Fixes: b97bf3fd8f6a ("[TIPC] Initial merge") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-17tipc: keep the skb in rcv queue until the whole data is readXin Long1-9/+27
Currently, when userspace reads a datagram with a buffer that is smaller than this datagram, the data will be truncated and only part of it can be received by users. It doesn't seem right that users don't know the datagram size and have to use a huge buffer to read it to avoid the truncation. This patch to fix it by keeping the skb in rcv queue until the whole data is read by users. Only the last msg of the datagram will be marked with MSG_EOR, just as TCP/SCTP does. Note that this will work as above only when MSG_EOR is set in the flags parameter of recvmsg(), so that it won't break any old user applications. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-01Merge tag 'net-next-5.14' of ↵Linus Torvalds9-113/+97
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - BPF: - add syscall program type and libbpf support for generating instructions and bindings for in-kernel BPF loaders (BPF loaders for BPF), this is a stepping stone for signed BPF programs - infrastructure to migrate TCP child sockets from one listener to another in the same reuseport group/map to improve flexibility of service hand-off/restart - add broadcast support to XDP redirect - allow bypass of the lockless qdisc to improving performance (for pktgen: +23% with one thread, +44% with 2 threads) - add a simpler version of "DO_ONCE()" which does not require jump labels, intended for slow-path usage - virtio/vsock: introduce SOCK_SEQPACKET support - add getsocketopt to retrieve netns cookie - ip: treat lowest address of a IPv4 subnet as ordinary unicast address allowing reclaiming of precious IPv4 addresses - ipv6: use prandom_u32() for ID generation - ip: add support for more flexible field selection for hashing across multi-path routes (w/ offload to mlxsw) - icmp: add support for extended RFC 8335 PROBE (ping) - seg6: add support for SRv6 End.DT46 behavior - mptcp: - DSS checksum support (RFC 8684) to detect middlebox meddling - support Connection-time 'C' flag - time stamping support - sctp: packetization Layer Path MTU Discovery (RFC 8899) - xfrm: speed up state addition with seq set - WiFi: - hidden AP discovery on 6 GHz and other HE 6 GHz improvements - aggregation handling improvements for some drivers - minstrel improvements for no-ack frames - deferred rate control for TXQs to improve reaction times - switch from round robin to virtual time-based airtime scheduler - add trace points: - tcp checksum errors - openvswitch - action execution, upcalls - socket errors via sk_error_report Device APIs: - devlink: add rate API for hierarchical control of max egress rate of virtual devices (VFs, SFs etc.) - don't require RCU read lock to be held around BPF hooks in NAPI context - page_pool: generic buffer recycling New hardware/drivers: - mobile: - iosm: PCIe Driver for Intel M.2 Modem - support for Qualcomm MSM8998 (ipa) - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU) - NXP SJA1110 Automotive Ethernet 10-port switch - Qualcomm QCA8327 switch support (qca8k) - Mikrotik 10/25G NIC (atl1c) Driver changes: - ACPI support for some MDIO, MAC and PHY devices from Marvell and NXP (our first foray into MAC/PHY description via ACPI) - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx - Mellanox/Nvidia NIC (mlx5) - NIC VF offload of L2 bridging - support IRQ distribution to Sub-functions - Marvell (prestera): - add flower and match all - devlink trap - link aggregation - Netronome (nfp): connection tracking offload - Intel 1GE (igc): add AF_XDP support - Marvell DPU (octeontx2): ingress ratelimit offload - Google vNIC (gve): new ring/descriptor format support - Qualcomm mobile (rmnet & ipa): inline checksum offload support - MediaTek WiFi (mt76) - mt7915 MSI support - mt7915 Tx status reporting - mt7915 thermal sensors support - mt7921 decapsulation offload - mt7921 enable runtime pm and deep sleep - Realtek WiFi (rtw88) - beacon filter support - Tx antenna path diversity support - firmware crash information via devcoredump - Qualcomm WiFi (wcn36xx) - Wake-on-WLAN support with magic packets and GTK rekeying - Micrel PHY (ksz886x/ksz8081): add cable test support" * tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits) tcp: change ICSK_CA_PRIV_SIZE definition tcp_yeah: check struct yeah size at compile time gve: DQO: Fix off by one in gve_rx_dqo() stmmac: intel: set PCI_D3hot in suspend stmmac: intel: Enable PHY WOL option in EHL net: stmmac: option to enable PHY WOL with PMT enabled net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del} net: use netdev_info in ndo_dflt_fdb_{add,del} ptp: Set lookup cookie when creating a PTP PPS source. net: sock: add trace for socket errors net: sock: introduce sk_error_report net: dsa: replay the local bridge FDB entries pointing to the bridge dev too net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev net: dsa: include fdb entries pointing to bridge in the host fdb list net: dsa: include bridge addresses which are local in the host fdb list net: dsa: sync static FDB entries on foreign interfaces to hardware net: dsa: install the host MDB and FDB entries in the master's RX filter net: dsa: reference count the FDB addresses at the cross-chip notifier level net: dsa: introduce a separate cross-chip notifier type for host FDBs net: dsa: reference count the MDB entries at the cross-chip notifier level ...
2021-06-29Merge tag 'fallthrough-fixes-clang-5.14-rc1' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull fallthrough fixes from Gustavo Silva: "Fix many fall-through warnings when building with Clang 12.0.0 and '-Wimplicit-fallthrough' so that we at some point will be able to enable that warning by default" * tag 'fallthrough-fixes-clang-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (26 commits) rxrpc: Fix fall-through warnings for Clang drm/nouveau/clk: Fix fall-through warnings for Clang drm/nouveau/therm: Fix fall-through warnings for Clang drm/nouveau: Fix fall-through warnings for Clang xfs: Fix fall-through warnings for Clang xfrm: Fix fall-through warnings for Clang tipc: Fix fall-through warnings for Clang sctp: Fix fall-through warnings for Clang rds: Fix fall-through warnings for Clang net/packet: Fix fall-through warnings for Clang net: netrom: Fix fall-through warnings for Clang ide: Fix fall-through warnings for Clang hwmon: (max6621) Fix fall-through warnings for Clang hwmon: (corsair-cpro) Fix fall-through warnings for Clang firewire: core: Fix fall-through warnings for Clang braille_console: Fix fall-through warnings for Clang ipv4: Fix fall-through warnings for Clang qlcnic: Fix fall-through warnings for Clang bnxt_en: Fix fall-through warnings for Clang netxen_nic: Fix fall-through warnings for Clang ...
2021-06-28net: tipc: replace align() with ALIGN in msg.cMenglong Dong1-8/+4
The function align() which is defined in msg.c is redundant, replace it with ALIGN() and introduce a BUF_ALIGN(). Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-28net: tipc: fix FB_MTU eat two pagesMenglong Dong3-11/+11
FB_MTU is used in 'tipc_msg_build()' to alloc smaller skb when memory allocation fails, which can avoid unnecessary sending failures. The value of FB_MTU now is 3744, and the data size will be: (3744 + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) + \ SKB_DATA_ALIGN(BUF_HEADROOM + BUF_TAILROOM + 3)) which is larger than one page(4096), and two pages will be allocated. To avoid it, replace '3744' with a calculation: (PAGE_SIZE - SKB_DATA_ALIGN(BUF_OVERHEAD) - \ SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) What's more, alloc_skb_fclone() will call SKB_DATA_ALIGN for data size, and it's not necessary to make alignment for buf_size in tipc_buf_acquire(). So, just remove it. Fixes: 4c94cc2d3d57 ("tipc: fall back to smaller MTU if allocation of local send skb fails") Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10tipc:subscr.c: fix a spelling mistakegushengxian1-1/+1
Fix a spelling mistake. Signed-off-by: gushengxian <gushengxian@yulong.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10tipc: socket.c: fix the use of copular verbgushengxian1-1/+1
Fix the use of copular verb. Signed-off-by: gushengxian <gushengxian@yulong.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10node.c: fix the use of indefinite articlegushengxian1-1/+1
Fix the use of indefinite article. Signed-off-by: gushengxian <gushengxian@yulong.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-05tipc: Return the correct errno codeZheng Yongjun1-3/+3
When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04tipc: simplify handling of lookup scope during multicast message receptionJon Maloy3-20/+16
We introduce a new macro TIPC_ANY_SCOPE to make the handling of the lookup scope value more comprehensible during multicast reception. The (unchanged) rules go as follows: 1) Multicast messages sent from own node are delivered to all matching sockets on the own node, irrespective of their binding scope. 2) Multicast messages sent from other nodes arrive here because they have found TIPC_CLUSTER_SCOPE bindings emanating from this node. Those messages should be delivered to exactly those sockets, but not to local sockets bound with TIPC_NODE_SCOPE, since the latter obviously were not meant to be visible for those senders. 3) Group multicast/broadcast messages are delivered to the sockets with a binding scope matching exactly the lookup scope indicated in the message header, and nobody else. Reviewed-by: Xin Long <lucien.xin@gmail.com> Tested-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04tipc: refactor function tipc_sk_anc_data_recv()Jon Maloy1-47/+38
We refactor tipc_sk_anc_data_recv() to make it slightly more comprehensible, but also to facilitate application of some additions to the code in a future commit. Reviewed-by: Xin Long <lucien.xin@gmail.com> Tested-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-04tipc: eliminate redundant fields in struct tipc_sockJon Maloy1-26/+27
We eliminate the redundant fields conn_type and conn_instance in struct tipc_sock. On the connecting side, this information is already present in the unused (after the connection is established) part of the pre-allocated header, and on the accepting side, we put it there when the new socket is created. Reviewed-by: Xin Long <lucien.xin@gmail.com> Tested-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18tipc: simplify the finalize work queueXin Long6-23/+14
This patch is to use "struct work_struct" for the finalize work queue instead of "struct tipc_net_work", as it can get the "net" and "addr" from tipc_net's other members and there is no need to add extra net and addr in tipc_net by defining "struct tipc_net_work". Note that it's safe to get net from tn->bcl as bcl is always released after the finalize work queue is done. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-18tipc: Fix fall-through warnings for ClangGustavo A. R. Silva1-0/+1
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by explicitly adding a break statement instead of letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2021-05-18tipc: wait and exit until all work queues are doneXin Long3-0/+6
On some host, a crash could be triggered simply by repeating these commands several times: # modprobe tipc # tipc bearer enable media udp name UDP1 localip 127.0.0.1 # rmmod tipc [] BUG: unable to handle kernel paging request at ffffffffc096bb00 [] Workqueue: events 0xffffffffc096bb00 [] Call Trace: [] ? process_one_work+0x1a7/0x360 [] ? worker_thread+0x30/0x390 [] ? create_worker+0x1a0/0x1a0 [] ? kthread+0x116/0x130 [] ? kthread_flush_work_fn+0x10/0x10 [] ? ret_from_fork+0x35/0x40 When removing the TIPC module, the UDP tunnel sock will be delayed to release in a work queue as sock_release() can't be done in rtnl_lock(). If the work queue is schedule to run after the TIPC module is removed, kernel will crash as the work queue function cleanup_beareri() code no longer exists when trying to invoke it. To fix it, this patch introduce a member wq_count in tipc_net to track the numbers of work queues in schedule, and wait and exit until all work queues are done in tipc_exit_net(). Fixes: d0f91938bede ("tipc: add ip/udp media type") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-17tipc: skb_linearize the head skb when reassembling msgsXin Long1-7/+2
It's not a good idea to append the frag skb to a skb's frag_list if the frag_list already has skbs from elsewhere, such as this skb was created by pskb_copy() where the frag_list was cloned (all the skbs in it were skb_get'ed) and shared by multiple skbs. However, the new appended frag skb should have been only seen by the current skb. Otherwise, it will cause use after free crashes as this appended frag skb are seen by multiple skbs but it only got skb_get called once. The same thing happens with a skb updated by pskb_may_pull() with a skb_cloned skb. Li Shuang has reported quite a few crashes caused by this when doing testing over macvlan devices: [] kernel BUG at net/core/skbuff.c:1970! [] Call Trace: [] skb_clone+0x4d/0xb0 [] macvlan_broadcast+0xd8/0x160 [macvlan] [] macvlan_process_broadcast+0x148/0x150 [macvlan] [] process_one_work+0x1a7/0x360 [] worker_thread+0x30/0x390 [] kernel BUG at mm/usercopy.c:102! [] Call Trace: [] __check_heap_object+0xd3/0x100 [] __check_object_size+0xff/0x16b [] simple_copy_to_iter+0x1c/0x30 [] __skb_datagram_iter+0x7d/0x310 [] __skb_datagram_iter+0x2a5/0x310 [] skb_copy_datagram_iter+0x3b/0x90 [] tipc_recvmsg+0x14a/0x3a0 [tipc] [] ____sys_recvmsg+0x91/0x150 [] ___sys_recvmsg+0x7b/0xc0 [] kernel BUG at mm/slub.c:305! [] Call Trace: [] <IRQ> [] kmem_cache_free+0x3ff/0x400 [] __netif_receive_skb_core+0x12c/0xc40 [] ? kmem_cache_alloc+0x12e/0x270 [] netif_receive_skb_internal+0x3d/0xb0 [] ? get_rx_page_info+0x8e/0xa0 [be2net] [] be_poll+0x6ef/0xd00 [be2net] [] ? irq_exit+0x4f/0x100 [] net_rx_action+0x149/0x3b0 ... This patch is to fix it by linearizing the head skb if it has frag_list set in tipc_buf_append(). Note that we choose to do this before calling skb_unshare(), as __skb_linearize() will avoid skb_copy(). Also, we can not just drop the frag_list either as the early time. Fixes: 45c8b7b175ce ("tipc: allow non-linear first fragment buffer") Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-15Revert "net:tipc: Fix a double free in tipc_sk_mcast_rcv"Hoang Le1-1/+4
This reverts commit 6bf24dc0cc0cc43b29ba344b66d78590e687e046. Above fix is not correct and caused memory leak issue. Fixes: 6bf24dc0cc0c ("net:tipc: Fix a double free in tipc_sk_mcast_rcv") Acked-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-11tipc: make node link identity publish thread safeHoang Le1-6/+6
The using of the node address and node link identity are not thread safe, meaning that two publications may be published the same values, as result one of them will get failure because of already existing in the name table. To avoid this we have to use the node address and node link identity values from inside the node item's write lock protection. Fixes: 50a3499ab853 ("tipc: simplify signature of tipc_namtbl_publish()") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-29Merge tag 'net-next-5.13' of ↵Linus Torvalds16-581/+712
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - bpf: - allow bpf programs calling kernel functions (initially to reuse TCP congestion control implementations) - enable task local storage for tracing programs - remove the need to store per-task state in hash maps, and allow tracing programs access to task local storage previously added for BPF_LSM - add bpf_for_each_map_elem() helper, allowing programs to walk all map elements in a more robust and easier to verify fashion - sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT redirection - lpm: add support for batched ops in LPM trie - add BTF_KIND_FLOAT support - mostly to allow use of BTF on s390 which has floats in its headers files - improve BPF syscall documentation and extend the use of kdoc parsing scripts we already employ for bpf-helpers - libbpf, bpftool: support static linking of BPF ELF files - improve support for encapsulation of L2 packets - xdp: restructure redirect actions to avoid a runtime lookup, improving performance by 4-8% in microbenchmarks - xsk: build skb by page (aka generic zerocopy xmit) - improve performance of software AF_XDP path by 33% for devices which don't need headers in the linear skb part (e.g. virtio) - nexthop: resilient next-hop groups - improve path stability on next-hops group changes (incl. offload for mlxsw) - ipv6: segment routing: add support for IPv4 decapsulation - icmp: add support for RFC 8335 extended PROBE messages - inet: use bigger hash table for IP ID generation - tcp: deal better with delayed TX completions - make sure we don't give up on fast TCP retransmissions only because driver is slow in reporting that it completed transmitting the original - tcp: reorder tcp_congestion_ops for better cache locality - mptcp: - add sockopt support for common TCP options - add support for common TCP msg flags - include multiple address ids in RM_ADDR - add reset option support for resetting one subflow - udp: GRO L4 improvements - improve 'forward' / 'frag_list' co-existence with UDP tunnel GRO, allowing the first to take place correctly even for encapsulated UDP traffic - micro-optimize dev_gro_receive() and flow dissection, avoid retpoline overhead on VLAN and TEB GRO - use less memory for sysctls, add a new sysctl type, to allow using u8 instead of "int" and "long" and shrink networking sysctls - veth: allow GRO without XDP - this allows aggregating UDP packets before handing them off to routing, bridge, OvS, etc. - allow specifing ifindex when device is moved to another namespace - netfilter: - nft_socket: add support for cgroupsv2 - nftables: add catch-all set element - special element used to define a default action in case normal lookup missed - use net_generic infra in many modules to avoid allocating per-ns memory unnecessarily - xps: improve the xps handling to avoid potential out-of-bound accesses and use-after-free when XPS change race with other re-configuration under traffic - add a config knob to turn off per-cpu netdev refcnt to catch underflows in testing Device APIs: - add WWAN subsystem to organize the WWAN interfaces better and hopefully start driving towards more unified and vendor- independent APIs - ethtool: - add interface for reading IEEE MIB stats (incl. mlx5 and bnxt support) - allow network drivers to dump arbitrary SFP EEPROM data, current offset+length API was a poor fit for modern SFP which define EEPROM in terms of pages (incl. mlx5 support) - act_police, flow_offload: add support for packet-per-second policing (incl. offload for nfp) - psample: add additional metadata attributes like transit delay for packets sampled from switch HW (and corresponding egress and policy-based sampling in the mlxsw driver) - dsa: improve support for sandwiched LAGs with bridge and DSA - netfilter: - flowtable: use direct xmit in topologies with IP forwarding, bridging, vlans etc. - nftables: counter hardware offload support - Bluetooth: - improvements for firmware download w/ Intel devices - add support for reading AOSP vendor capabilities - add support for virtio transport driver - mac80211: - allow concurrent monitor iface and ethernet rx decap - set priority and queue mapping for injected frames - phy: add support for Clause-45 PHY Loopback - pci/iov: add sysfs MSI-X vector assignment interface to distribute MSI-X resources to VFs (incl. mlx5 support) New hardware/drivers: - dsa: mv88e6xxx: add support for Marvell mv88e6393x - 11-port Ethernet switch with 8x 1-Gigabit Ethernet and 3x 10-Gigabit interfaces. - dsa: support for legacy Broadcom tags used on BCM5325, BCM5365 and BCM63xx switches - Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches - ath11k: support for QCN9074 a 802.11ax device - Bluetooth: Broadcom BCM4330 and BMC4334 - phy: Marvell 88X2222 transceiver support - mdio: add BCM6368 MDIO mux bus controller - r8152: support RTL8153 and RTL8156 (USB Ethernet) chips - mana: driver for Microsoft Azure Network Adapter (MANA) - Actions Semi Owl Ethernet MAC - can: driver for ETAS ES58X CAN/USB interfaces Pure driver changes: - add XDP support to: enetc, igc, stmmac - add AF_XDP support to: stmmac - virtio: - page_to_skb() use build_skb when there's sufficient tailroom (21% improvement for 1000B UDP frames) - support XDP even without dedicated Tx queues - share the Tx queues with the stack when necessary - mlx5: - flow rules: add support for mirroring with conntrack, matching on ICMP, GTP, flex filters and more - support packet sampling with flow offloads - persist uplink representor netdev across eswitch mode changes - allow coexistence of CQE compression and HW time-stamping - add ethtool extended link error state reporting - ice, iavf: support flow filters, UDP Segmentation Offload - dpaa2-switch: - move the driver out of staging - add spanning tree (STP) support - add rx copybreak support - add tc flower hardware offload on ingress traffic - ionic: - implement Rx page reuse - support HW PTP time-stamping - octeon: support TC hardware offloads - flower matching on ingress and egress ratelimitting. - stmmac: - add RX frame steering based on VLAN priority in tc flower - support frame preemption (FPE) - intel: add cross time-stamping freq difference adjustment - ocelot: - support forwarding of MRP frames in HW - support multiple bridges - support PTP Sync one-step timestamping - dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like learning, flooding etc. - ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350, SC7280 SoCs) - mt7601u: enable TDLS support - mt76: - add support for 802.3 rx frames (mt7915/mt7615) - mt7915 flash pre-calibration support - mt7921/mt7663 runtime power management fixes" * tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2451 commits) net: selftest: fix build issue if INET is disabled net: netrom: nr_in: Remove redundant assignment to ns net: tun: Remove redundant assignment to ret net: phy: marvell: add downshift support for M88E1240 net: dsa: ksz: Make reg_mib_cnt a u8 as it never exceeds 255 net/sched: act_ct: Remove redundant ct get and check icmp: standardize naming of RFC 8335 PROBE constants bpf, selftests: Update array map tests for per-cpu batched ops bpf: Add batched ops support for percpu array bpf: Implement formatted output helpers with bstr_printf seq_file: Add a seq_bprintf function sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues net:nfc:digital: Fix a double free in digital_tg_recv_dep_req net: fix a concurrency bug in l2tp_tunnel_register() net/smc: Remove redundant assignment to rc mpls: Remove redundant assignment to err llc2: Remove redundant assignment to rc net/tls: Remove redundant initialization of record rds: Remove redundant assignment to nr_sig dt-bindings: net: mdio-gpio: add compatible for microchip,mdio-smi0 ...