summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2021-05-05smc: disallow TCP_ULP in smc_setsockopt()Cong Wang1-1/+3
syzbot is able to setup kTLS on an SMC socket which coincidentally uses sk_user_data too. Later, kTLS treats it as psock so triggers a refcnt warning. The root cause is that smc_setsockopt() simply calls TCP setsockopt() which includes TCP_ULP. I do not think it makes sense to setup kTLS on top of SMC sockets, so we should just disallow this setup. It is hard to find a commit to blame, but we can apply this patch since the beginning of TCP_ULP. Reported-and-tested-by: syzbot+b54a1ce86ba4a623b7f0@syzkaller.appspotmail.com Fixes: 734942cc4ea6 ("tcp: ULP infrastructure") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-05ethtool: fix missing NLM_F_MULTI flag when dumpingFernando Fernandez Mancera1-1/+2
When dumping the ethtool information from all the interfaces, the netlink reply should contain the NLM_F_MULTI flag. This flag allows userspace tools to identify that multiple messages are expected. Link: https://bugzilla.redhat.com/1953847 Fixes: 365f9ae4ee36 ("ethtool: fix genlmsg_put() failure handling in ethnl_default_dumpit()") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-04net/nfc: fix use-after-free llcp_sock_bind/connectOr Cohen1-0/+4
Commits 8a4cd82d ("nfc: fix refcount leak in llcp_sock_connect()") and c33b1cc62 ("nfc: fix refcount leak in llcp_sock_bind()") fixed a refcount leak bug in bind/connect but introduced a use-after-free if the same local is assigned to 2 different sockets. This can be triggered by the following simple program: int sock1 = socket( AF_NFC, SOCK_STREAM, NFC_SOCKPROTO_LLCP ); int sock2 = socket( AF_NFC, SOCK_STREAM, NFC_SOCKPROTO_LLCP ); memset( &addr, 0, sizeof(struct sockaddr_nfc_llcp) ); addr.sa_family = AF_NFC; addr.nfc_protocol = NFC_PROTO_NFC_DEP; bind( sock1, (struct sockaddr*) &addr, sizeof(struct sockaddr_nfc_llcp) ) bind( sock2, (struct sockaddr*) &addr, sizeof(struct sockaddr_nfc_llcp) ) close(sock1); close(sock2); Fix this by assigning NULL to llcp_sock->local after calling nfc_llcp_local_put. This addresses CVE-2021-23134. Reported-by: Or Cohen <orcohen@paloaltonetworks.com> Reported-by: Nadav Markus <nmarkus@paloaltonetworks.com> Fixes: c33b1cc62 ("nfc: fix refcount leak in llcp_sock_bind()") Signed-off-by: Or Cohen <orcohen@paloaltonetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-04net: Only allow init netns to set default tcp cong to a restricted algoJonathon Reinhart1-0/+4
tcp_set_default_congestion_control() is netns-safe in that it writes to &net->ipv4.tcp_congestion_control, but it also sets ca->flags |= TCP_CONG_NON_RESTRICTED which is not namespaced. This has the unintended side-effect of changing the global net.ipv4.tcp_allowed_congestion_control sysctl, despite the fact that it is read-only: 97684f0970f6 ("net: Make tcp_allowed_congestion_control readonly in non-init netns") Resolve this netns "leak" by only allowing the init netns to set the default algorithm to one that is restricted. This restriction could be removed if tcp_allowed_congestion_control were namespace-ified in the future. This bug was uncovered with https://github.com/JonathonReinhart/linux-netns-sysctl-verify Fixes: 6670e1524477 ("tcp: Namespace-ify sysctl_tcp_default_congestion_control") Signed-off-by: Jonathon Reinhart <jonathon.reinhart@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-4/+3
Daniel Borkmann says: ==================== pull-request: bpf 2021-05-04 The following pull-request contains BPF updates for your *net* tree. We've added 5 non-merge commits during the last 4 day(s) which contain a total of 6 files changed, 52 insertions(+), 30 deletions(-). The main changes are: 1) Fix libbpf overflow when processing BPF ring buffer in case of extreme application behavior, from Brendan Jackman. 2) Fix potential data leakage of uninitialized BPF stack under speculative execution, from Daniel Borkmann. 3) Fix off-by-one when validating xsk pool chunks, from Xuan Zhuo. 4) Fix snprintf BPF selftest with a pid filter to avoid racing its output test buffer, from Florent Revest. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-04xsk: Fix for xp_aligned_validate_desc() when len == chunk_sizeXuan Zhuo1-4/+3
When desc->len is equal to chunk_size, it is legal. But when the xp_aligned_validate_desc() got chunk_end from desc->addr + desc->len pointing to the next chunk during the check, it caused the check to fail. This problem was first introduced in bbff2f321a86 ("xsk: new descriptor addressing scheme"). Later in 2b43470add8c ("xsk: Introduce AF_XDP buffer allocation API") this piece of code was moved into the new function called xp_aligned_validate_desc(). This function was then moved into xsk_queue.h via 26062b185eee ("xsk: Explicitly inline functions and move definitions"). Fixes: bbff2f321a86 ("xsk: new descriptor addressing scheme") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20210428094424.54435-1-xuanzhuo@linux.alibaba.com
2021-05-03sctp: delay auto_asconf init until binding the first addrXin Long1-14/+17
As Or Cohen described: If sctp_destroy_sock is called without sock_net(sk)->sctp.addr_wq_lock held and sp->do_auto_asconf is true, then an element is removed from the auto_asconf_splist without any proper locking. This can happen in the following functions: 1. In sctp_accept, if sctp_sock_migrate fails. 2. In inet_create or inet6_create, if there is a bpf program attached to BPF_CGROUP_INET_SOCK_CREATE which denies creation of the sctp socket. This patch is to fix it by moving the auto_asconf init out of sctp_init_sock(), by which inet_create()/inet6_create() won't need to operate it in sctp_destroy_sock() when calling sk_common_release(). It also makes more sense to do auto_asconf init while binding the first addr, as auto_asconf actually requires an ANY addr bind, see it in sctp_addr_wq_timeout_handler(). This addresses CVE-2021-23133. Fixes: 610236587600 ("bpf: Add new cgroup attach type to enable sock modifications") Reported-by: Or Cohen <orcohen@paloaltonetworks.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-03Revert "net/sctp: fix race condition in sctp_destroy_sock"Xin Long1-5/+8
This reverts commit b166a20b07382b8bc1dcee2a448715c9c2c81b5b. This one has to be reverted as it introduced a dead lock, as syzbot reported: CPU0 CPU1 ---- ---- lock(&net->sctp.addr_wq_lock); lock(slock-AF_INET6); lock(&net->sctp.addr_wq_lock); lock(slock-AF_INET6); CPU0 is the thread of sctp_addr_wq_timeout_handler(), and CPU1 is that of sctp_close(). The original issue this commit fixed will be fixed in the next patch. Reported-by: syzbot+959223586843e69a2674@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-03net: hsr: check skb can contain struct hsr_ethhdr in fill_frame_infoPhillip Potter1-0/+4
Check at start of fill_frame_info that the MAC header in the supplied skb is large enough to fit a struct hsr_ethhdr, as otherwise this is not a valid HSR frame. If it is too small, return an error which will then cause the callers to clean up the skb. Fixes a KMSAN-found uninit-value bug reported by syzbot at: https://syzkaller.appspot.com/bug?id=f7e9b601f1414f814f7602a82b6619a8d80bce3f Reported-by: syzbot+e267bed19bfc5478fb33@syzkaller.appspotmail.com Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-03sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_bXin Long1-1/+2
Normally SCTP_MIB_CURRESTAB is always incremented once asoc enter into ESTABLISHED from the state < ESTABLISHED and decremented when the asoc is being deleted. However, in sctp_sf_do_dupcook_b(), the asoc's state can be changed to ESTABLISHED from the state >= ESTABLISHED where it shouldn't increment SCTP_MIB_CURRESTAB. Otherwise, one asoc may increment MIB_CURRESTAB multiple times but only decrement once at the end. I was able to reproduce it by using scapy to do the 4-way shakehands, after that I replayed the COOKIE-ECHO chunk with 'peer_vtag' field changed to different values, and SCTP_MIB_CURRESTAB was incremented multiple times and never went back to 0 even when the asoc was freed. This patch is to fix it by only incrementing SCTP_MIB_CURRESTAB when the state < ESTABLISHED in sctp_sf_do_dupcook_b(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-03Revert "sctp: Fix SHUTDOWN CTSN Ack in the peer restart case"Xin Long1-5/+1
This reverts commit 12dfd78e3a74825e6f0bc8df7ef9f938fbc6bfe3. This can be reverted as shutdown and cookie_ack chunk are using the same asoc since commit 35b4f24415c8 ("sctp: do asoc update earlier in sctp_sf_do_dupcook_a"). Reported-by: Jere Leppänen <jere.leppanen@nokia.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-03Revert "Revert "sctp: Fix bundling of SHUTDOWN with COOKIE-ACK""Xin Long1-3/+3
This reverts commit 7e9269a5acec6d841d22e12770a0b02db4f5d8f2. As Jere notice, commit 35b4f24415c8 ("sctp: do asoc update earlier in sctp_sf_do_dupcook_a") only keeps the SHUTDOWN and COOKIE-ACK with the same asoc, not transport. So we have to bring this patch back. Reported-by: Jere Leppänen <jere.leppanen@nokia.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-01sctp: do asoc update earlier in sctp_sf_do_dupcook_bXin Long2-43/+30
The same thing should be done for sctp_sf_do_dupcook_b(). Meanwhile, SCTP_CMD_UPDATE_ASSOC cmd can be removed. v1->v2: - Fix the return value in sctp_sf_do_assoc_update(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-01Revert "sctp: Fix bundling of SHUTDOWN with COOKIE-ACK"Xin Long1-3/+3
This can be reverted as shutdown and cookie_ack chunk are using the same asoc since the last patch. This reverts commit 145cb2f7177d94bc54563ed26027e952ee0ae03c. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-01sctp: do asoc update earlier in sctp_sf_do_dupcook_aXin Long1-5/+20
There's a panic that occurs in a few of envs, the call trace is as below: [] general protection fault, ... 0x29acd70f1000a: 0000 [#1] SMP PTI [] RIP: 0010:sctp_ulpevent_notify_peer_addr_change+0x4b/0x1fa [sctp] [] sctp_assoc_control_transport+0x1b9/0x210 [sctp] [] sctp_do_8_2_transport_strike.isra.16+0x15c/0x220 [sctp] [] sctp_cmd_interpreter.isra.21+0x1231/0x1a10 [sctp] [] sctp_do_sm+0xc3/0x2a0 [sctp] [] sctp_generate_timeout_event+0x81/0xf0 [sctp] This is caused by a transport use-after-free issue. When processing a duplicate COOKIE-ECHO chunk in sctp_sf_do_dupcook_a(), both COOKIE-ACK and SHUTDOWN chunks are allocated with the transort from the new asoc. However, later in the sideeffect machine, the old asoc is used to send them out and old asoc's shutdown_last_sent_to is set to the transport that SHUTDOWN chunk attached to in sctp_cmd_setup_t2(), which actually belongs to the new asoc. After the new_asoc is freed and the old asoc T2 timeout, the old asoc's shutdown_last_sent_to that is already freed would be accessed in sctp_sf_t2_timer_expire(). Thanks Alexander and Jere for helping dig into this issue. To fix it, this patch is to do the asoc update first, then allocate the COOKIE-ACK and SHUTDOWN chunks with the 'updated' old asoc. This would make more sense, as a chunk from an asoc shouldn't be sent out with another asoc. We had fixed quite a few issues caused by this. Fixes: 145cb2f7177d ("sctp: Fix bundling of SHUTDOWN with COOKIE-ACK") Reported-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Reported-by: syzbot+bbe538efd1046586f587@syzkaller.appspotmail.com Reported-by: Michal Tesar <mtesar@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-01vsock/vmci: Remove redundant assignment to errYang Li1-2/+0
Variable 'err' is set to zero but this value is never read as it is overwritten with a new value later on, hence it is a redundant assignment and can be removed. Clean up the following clang-analyzer warning: net/vmw_vsock/vmci_transport.c:948:2: warning: Value stored to 'err' is never read [clang-analyzer-deadcode.DeadStores] Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-30net: Remove redundant assignment to errYang Li1-3/+0
Variable 'err' is set to -ENOMEM but this value is never read as it is overwritten with a new value later on, hence the 'If statements' and assignments are redundantand and can be removed. Cleans up the following clang-analyzer warning: net/ipv6/seg6.c:126:4: warning: Value stored to 'err' is never read [clang-analyzer-deadcode.DeadStores] Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-30bridge: Fix possible races between assigning rx_handler_data and setting ↵Zhang Zhengming1-2/+3
IFF_BRIDGE_PORT bit There is a crash in the function br_get_link_af_size_filtered, as the port_exists(dev) is true and the rx_handler_data of dev is NULL. But the rx_handler_data of dev is correct saved in vmcore. The oops looks something like: ... pc : br_get_link_af_size_filtered+0x28/0x1c8 [bridge] ... Call trace: br_get_link_af_size_filtered+0x28/0x1c8 [bridge] if_nlmsg_size+0x180/0x1b0 rtnl_calcit.isra.12+0xf8/0x148 rtnetlink_rcv_msg+0x334/0x370 netlink_rcv_skb+0x64/0x130 rtnetlink_rcv+0x28/0x38 netlink_unicast+0x1f0/0x250 netlink_sendmsg+0x310/0x378 sock_sendmsg+0x4c/0x70 __sys_sendto+0x120/0x150 __arm64_sys_sendto+0x30/0x40 el0_svc_common+0x78/0x130 el0_svc_handler+0x38/0x78 el0_svc+0x8/0xc In br_add_if(), we found there is no guarantee that assigning rx_handler_data to dev->rx_handler_data will before setting the IFF_BRIDGE_PORT bit of priv_flags. So there is a possible data competition: CPU 0: CPU 1: (RCU read lock) (RTNL lock) rtnl_calcit() br_add_slave() if_nlmsg_size() br_add_if() br_get_link_af_size_filtered() -> netdev_rx_handler_register ... // The order is not guaranteed ... -> dev->priv_flags |= IFF_BRIDGE_PORT; // The IFF_BRIDGE_PORT bit of priv_flags has been set -> if (br_port_exists(dev)) { // The dev->rx_handler_data has NOT been assigned -> p = br_port_get_rcu(dev); .... -> rcu_assign_pointer(dev->rx_handler_data, rx_handler_data); ... Fix it in br_get_link_af_size_filtered, using br_port_get_check_rcu() and checking the return value. Signed-off-by: Zhang Zhengming <zhangzhengming@huawei.com> Reviewed-by: Zhao Lei <zhaolei69@huawei.com> Reviewed-by: Wang Xiaogang <wangxiaogang3@huawei.com> Suggested-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-30net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packetsDavide Caratti1-4/+4
when 'act_mirred' tries to fragment IPv4 packets that had been previously re-assembled using 'act_ct', splats like the following can be observed on kernels built with KASAN: BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60 Read of size 1 at addr ffff888147009574 by task ping/947 CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 Call Trace: <IRQ> dump_stack+0x92/0xc1 print_address_description.constprop.7+0x1a/0x150 kasan_report.cold.13+0x7f/0x111 ip_do_fragment+0x1b03/0x1f60 sch_fragment+0x4bf/0xe40 tcf_mirred_act+0xc3d/0x11a0 [act_mirred] tcf_action_exec+0x104/0x3e0 fl_classify+0x49a/0x5e0 [cls_flower] tcf_classify_ingress+0x18a/0x820 __netif_receive_skb_core+0xae7/0x3340 __netif_receive_skb_one_core+0xb6/0x1b0 process_backlog+0x1ef/0x6c0 __napi_poll+0xaa/0x500 net_rx_action+0x702/0xac0 __do_softirq+0x1e4/0x97f do_softirq+0x71/0x90 </IRQ> __local_bh_enable_ip+0xdb/0xf0 ip_finish_output2+0x760/0x2120 ip_do_fragment+0x15a5/0x1f60 __ip_finish_output+0x4c2/0xea0 ip_output+0x1ca/0x4d0 ip_send_skb+0x37/0xa0 raw_sendmsg+0x1c4b/0x2d00 sock_sendmsg+0xdb/0x110 __sys_sendto+0x1d7/0x2b0 __x64_sys_sendto+0xdd/0x1b0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f82e13853eb Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89 RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003 RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0 R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0 The buggy address belongs to the page: page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009 flags: 0x17ffffc0001000(reserved) raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00 >ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2 ^ ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then, in the following call graph: ip_do_fragment() ip_skb_dst_mtu() ip_dst_mtu_maybe_forward() ip_mtu_locked() the pointer to struct dst_entry is used as pointer to struct rtable: this turns the access to struct members like rt_mtu_locked into an OOB read in the stack. Fix this changing the temporary variable used for IPv4 packets in sch_fragment(), similarly to what is done for IPv6 few lines below. Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.") Cc: <stable@vger.kernel.org> # 5.11 Reported-by: Shuang Li <shuali@redhat.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-30openvswitch: fix stack OOB read while fragmenting IPv4 packetsDavide Caratti1-4/+4
running openvswitch on kernels built with KASAN, it's possible to see the following splat while testing fragmentation of IPv4 packets: BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60 Read of size 1 at addr ffff888112fc713c by task handler2/1367 CPU: 0 PID: 1367 Comm: handler2 Not tainted 5.12.0-rc6+ #418 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 Call Trace: dump_stack+0x92/0xc1 print_address_description.constprop.7+0x1a/0x150 kasan_report.cold.13+0x7f/0x111 ip_do_fragment+0x1b03/0x1f60 ovs_fragment+0x5bf/0x840 [openvswitch] do_execute_actions+0x1bd5/0x2400 [openvswitch] ovs_execute_actions+0xc8/0x3d0 [openvswitch] ovs_packet_cmd_execute+0xa39/0x1150 [openvswitch] genl_family_rcv_msg_doit.isra.15+0x227/0x2d0 genl_rcv_msg+0x287/0x490 netlink_rcv_skb+0x120/0x380 genl_rcv+0x24/0x40 netlink_unicast+0x439/0x630 netlink_sendmsg+0x719/0xbf0 sock_sendmsg+0xe2/0x110 ____sys_sendmsg+0x5ba/0x890 ___sys_sendmsg+0xe9/0x160 __sys_sendmsg+0xd3/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f957079db07 Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 eb ec ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 24 ed ff ff 48 RSP: 002b:00007f956ce35a50 EFLAGS: 00000293 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000019 RCX: 00007f957079db07 RDX: 0000000000000000 RSI: 00007f956ce35ae0 RDI: 0000000000000019 RBP: 00007f956ce35ae0 R08: 0000000000000000 R09: 00007f9558006730 R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 R13: 00007f956ce37308 R14: 00007f956ce35f80 R15: 00007f956ce35ae0 The buggy address belongs to the page: page:00000000af2a1d93 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x112fc7 flags: 0x17ffffc0000000() raw: 0017ffffc0000000 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected addr ffff888112fc713c is located in stack of task handler2/1367 at offset 180 in frame: ovs_fragment+0x0/0x840 [openvswitch] this frame has 2 objects: [32, 144) 'ovs_dst' [192, 424) 'ovs_rt' Memory state around the buggy address: ffff888112fc7000: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888112fc7080: 00 f1 f1 f1 f1 00 00 00 00 00 00 00 00 00 00 00 >ffff888112fc7100: 00 00 00 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 ^ ffff888112fc7180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888112fc7200: 00 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 for IPv4 packets, ovs_fragment() uses a temporary struct dst_entry. Then, in the following call graph: ip_do_fragment() ip_skb_dst_mtu() ip_dst_mtu_maybe_forward() ip_mtu_locked() the pointer to struct dst_entry is used as pointer to struct rtable: this turns the access to struct members like rt_mtu_locked into an OOB read in the stack. Fix this changing the temporary variable used for IPv4 packets in ovs_fragment(), similarly to what is done for IPv6 few lines below. Fixes: d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmt") Cc: <stable@vger.kernel.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-30seg6: add counters support for SRv6 BehaviorsAndrea Mayer1-2/+196
This patch provides counters for SRv6 Behaviors as defined in [1], section 6. For each SRv6 Behavior instance, counters defined in [1] are: - the total number of packets that have been correctly processed; - the total amount of traffic in bytes of all packets that have been correctly processed; In addition, this patch introduces a new counter that counts the number of packets that have NOT been properly processed (i.e. errors) by an SRv6 Behavior instance. Counters are not only interesting for network monitoring purposes (i.e. counting the number of packets processed by a given behavior) but they also provide a simple tool for checking whether a behavior instance is working as we expect or not. Counters can be useful for troubleshooting misconfigured SRv6 networks. Indeed, an SRv6 Behavior can silently drop packets for very different reasons (i.e. wrong SID configuration, interfaces set with SID addresses, etc) without any notification/message to the user. Due to the nature of SRv6 networks, diagnostic tools such as ping and traceroute may be ineffective: paths used for reaching a given router can be totally different from the ones followed by probe packets. In addition, paths are often asymmetrical and this makes it even more difficult to keep up with the journey of the packets and to understand which behaviors are actually processing our traffic. When counters are enabled on an SRv6 Behavior instance, it is possible to verify if packets are actually processed by such behavior and what is the outcome of the processing. Therefore, the counters for SRv6 Behaviors offer an non-invasive observability point which can be leveraged for both traffic monitoring and troubleshooting purposes. [1] https://www.rfc-editor.org/rfc/rfc8986.html#name-counters Troubleshooting using SRv6 Behavior counters -------------------------------------------- Let's make a brief example to see how helpful counters can be for SRv6 networks. Let's consider a node where an SRv6 End Behavior receives an SRv6 packet whose Segment Left (SL) is equal to 0. In this case, the End Behavior (which accepts only packets with SL >= 1) discards the packet and increases the error counter. This information can be leveraged by the network operator for troubleshooting. Indeed, the error counter is telling the user that the packet: (i) arrived at the node; (ii) the packet has been taken into account by the SRv6 End behavior; (iii) but an error has occurred during the processing. The error (iii) could be caused by different reasons, such as wrong route settings on the node or due to an invalid SID List carried by the SRv6 packet. Anyway, the error counter is used to exclude that the packet did not arrive at the node or it has not been processed by the behavior at all. Turning on/off counters for SRv6 Behaviors ------------------------------------------ Each SRv6 Behavior instance can be configured, at the time of its creation, to make use of counters. This is done through iproute2 which allows the user to create an SRv6 Behavior instance specifying the optional "count" attribute as shown in the following example: $ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0 per-behavior counters can be shown by adding "-s" to the iproute2 command line, i.e.: $ ip -s -6 route show 2001:db8::1 2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Impact of counters for SRv6 Behaviors on performance ==================================================== To determine the performance impact due to the introduction of counters in the SRv6 Behavior subsystem, we have carried out extensive tests. We chose to test the throughput achieved by the SRv6 End.DX2 Behavior because, among all the other behaviors implemented so far, it reaches the highest throughput which is around 1.5 Mpps (per core at 2.4 GHz on a Xeon(R) CPU E5-2630 v3) on kernel 5.12-rc2 using packets of size ~ 100 bytes. Three different tests were conducted in order to evaluate the overall throughput of the SRv6 End.DX2 Behavior in the following scenarios: 1) vanilla kernel (without the SRv6 Behavior counters patch) and a single instance of an SRv6 End.DX2 Behavior; 2) patched kernel with SRv6 Behavior counters and a single instance of an SRv6 End.DX2 Behavior with counters turned off; 3) patched kernel with SRv6 Behavior counters and a single instance of SRv6 End.DX2 Behavior with counters turned on. All tests were performed on a testbed deployed on the CloudLab facilities [2], a flexible infrastructure dedicated to scientific research on the future of Cloud Computing. Results of tests are shown in the following table: Scenario (1): average 1504764,81 pps (~1504,76 kpps); std. dev 3956,82 pps Scenario (2): average 1501469,78 pps (~1501,47 kpps); std. dev 2979,85 pps Scenario (3): average 1501315,13 pps (~1501,32 kpps); std. dev 2956,00 pps As can be observed, throughputs achieved in scenarios (2),(3) did not suffer any observable degradation compared to scenario (1). Thanks to Jakub Kicinski and David Ahern for their valuable suggestions and comments provided during the discussion of the proposed RFCs. [2] https://www.cloudlab.us Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-29Merge tag 'net-next-5.13' of ↵Linus Torvalds340-7565/+14477
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - bpf: - allow bpf programs calling kernel functions (initially to reuse TCP congestion control implementations) - enable task local storage for tracing programs - remove the need to store per-task state in hash maps, and allow tracing programs access to task local storage previously added for BPF_LSM - add bpf_for_each_map_elem() helper, allowing programs to walk all map elements in a more robust and easier to verify fashion - sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT redirection - lpm: add support for batched ops in LPM trie - add BTF_KIND_FLOAT support - mostly to allow use of BTF on s390 which has floats in its headers files - improve BPF syscall documentation and extend the use of kdoc parsing scripts we already employ for bpf-helpers - libbpf, bpftool: support static linking of BPF ELF files - improve support for encapsulation of L2 packets - xdp: restructure redirect actions to avoid a runtime lookup, improving performance by 4-8% in microbenchmarks - xsk: build skb by page (aka generic zerocopy xmit) - improve performance of software AF_XDP path by 33% for devices which don't need headers in the linear skb part (e.g. virtio) - nexthop: resilient next-hop groups - improve path stability on next-hops group changes (incl. offload for mlxsw) - ipv6: segment routing: add support for IPv4 decapsulation - icmp: add support for RFC 8335 extended PROBE messages - inet: use bigger hash table for IP ID generation - tcp: deal better with delayed TX completions - make sure we don't give up on fast TCP retransmissions only because driver is slow in reporting that it completed transmitting the original - tcp: reorder tcp_congestion_ops for better cache locality - mptcp: - add sockopt support for common TCP options - add support for common TCP msg flags - include multiple address ids in RM_ADDR - add reset option support for resetting one subflow - udp: GRO L4 improvements - improve 'forward' / 'frag_list' co-existence with UDP tunnel GRO, allowing the first to take place correctly even for encapsulated UDP traffic - micro-optimize dev_gro_receive() and flow dissection, avoid retpoline overhead on VLAN and TEB GRO - use less memory for sysctls, add a new sysctl type, to allow using u8 instead of "int" and "long" and shrink networking sysctls - veth: allow GRO without XDP - this allows aggregating UDP packets before handing them off to routing, bridge, OvS, etc. - allow specifing ifindex when device is moved to another namespace - netfilter: - nft_socket: add support for cgroupsv2 - nftables: add catch-all set element - special element used to define a default action in case normal lookup missed - use net_generic infra in many modules to avoid allocating per-ns memory unnecessarily - xps: improve the xps handling to avoid potential out-of-bound accesses and use-after-free when XPS change race with other re-configuration under traffic - add a config knob to turn off per-cpu netdev refcnt to catch underflows in testing Device APIs: - add WWAN subsystem to organize the WWAN interfaces better and hopefully start driving towards more unified and vendor- independent APIs - ethtool: - add interface for reading IEEE MIB stats (incl. mlx5 and bnxt support) - allow network drivers to dump arbitrary SFP EEPROM data, current offset+length API was a poor fit for modern SFP which define EEPROM in terms of pages (incl. mlx5 support) - act_police, flow_offload: add support for packet-per-second policing (incl. offload for nfp) - psample: add additional metadata attributes like transit delay for packets sampled from switch HW (and corresponding egress and policy-based sampling in the mlxsw driver) - dsa: improve support for sandwiched LAGs with bridge and DSA - netfilter: - flowtable: use direct xmit in topologies with IP forwarding, bridging, vlans etc. - nftables: counter hardware offload support - Bluetooth: - improvements for firmware download w/ Intel devices - add support for reading AOSP vendor capabilities - add support for virtio transport driver - mac80211: - allow concurrent monitor iface and ethernet rx decap - set priority and queue mapping for injected frames - phy: add support for Clause-45 PHY Loopback - pci/iov: add sysfs MSI-X vector assignment interface to distribute MSI-X resources to VFs (incl. mlx5 support) New hardware/drivers: - dsa: mv88e6xxx: add support for Marvell mv88e6393x - 11-port Ethernet switch with 8x 1-Gigabit Ethernet and 3x 10-Gigabit interfaces. - dsa: support for legacy Broadcom tags used on BCM5325, BCM5365 and BCM63xx switches - Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches - ath11k: support for QCN9074 a 802.11ax device - Bluetooth: Broadcom BCM4330 and BMC4334 - phy: Marvell 88X2222 transceiver support - mdio: add BCM6368 MDIO mux bus controller - r8152: support RTL8153 and RTL8156 (USB Ethernet) chips - mana: driver for Microsoft Azure Network Adapter (MANA) - Actions Semi Owl Ethernet MAC - can: driver for ETAS ES58X CAN/USB interfaces Pure driver changes: - add XDP support to: enetc, igc, stmmac - add AF_XDP support to: stmmac - virtio: - page_to_skb() use build_skb when there's sufficient tailroom (21% improvement for 1000B UDP frames) - support XDP even without dedicated Tx queues - share the Tx queues with the stack when necessary - mlx5: - flow rules: add support for mirroring with conntrack, matching on ICMP, GTP, flex filters and more - support packet sampling with flow offloads - persist uplink representor netdev across eswitch mode changes - allow coexistence of CQE compression and HW time-stamping - add ethtool extended link error state reporting - ice, iavf: support flow filters, UDP Segmentation Offload - dpaa2-switch: - move the driver out of staging - add spanning tree (STP) support - add rx copybreak support - add tc flower hardware offload on ingress traffic - ionic: - implement Rx page reuse - support HW PTP time-stamping - octeon: support TC hardware offloads - flower matching on ingress and egress ratelimitting. - stmmac: - add RX frame steering based on VLAN priority in tc flower - support frame preemption (FPE) - intel: add cross time-stamping freq difference adjustment - ocelot: - support forwarding of MRP frames in HW - support multiple bridges - support PTP Sync one-step timestamping - dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like learning, flooding etc. - ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350, SC7280 SoCs) - mt7601u: enable TDLS support - mt76: - add support for 802.3 rx frames (mt7915/mt7615) - mt7915 flash pre-calibration support - mt7921/mt7663 runtime power management fixes" * tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2451 commits) net: selftest: fix build issue if INET is disabled net: netrom: nr_in: Remove redundant assignment to ns net: tun: Remove redundant assignment to ret net: phy: marvell: add downshift support for M88E1240 net: dsa: ksz: Make reg_mib_cnt a u8 as it never exceeds 255 net/sched: act_ct: Remove redundant ct get and check icmp: standardize naming of RFC 8335 PROBE constants bpf, selftests: Update array map tests for per-cpu batched ops bpf: Add batched ops support for percpu array bpf: Implement formatted output helpers with bstr_printf seq_file: Add a seq_bprintf function sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues net:nfc:digital: Fix a double free in digital_tg_recv_dep_req net: fix a concurrency bug in l2tp_tunnel_register() net/smc: Remove redundant assignment to rc mpls: Remove redundant assignment to err llc2: Remove redundant assignment to rc net/tls: Remove redundant initialization of record rds: Remove redundant assignment to nr_sig dt-bindings: net: mdio-gpio: add compatible for microchip,mdio-smi0 ...
2021-04-29net: selftest: fix build issue if INET is disabledOleksij Rempel3-3/+3
In case ethernet driver is enabled and INET is disabled, selftest will fail to build. Reported-by: Randy Dunlap <rdunlap@infradead.org> Fixes: 3e1e58d64c3d ("net: add generic selftest support") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20210428130947.29649-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-28net: netrom: nr_in: Remove redundant assignment to nsJiapeng Chong1-1/+0
Variable ns is set to 'skb->data[17]' but this value is never read as it is overwritten or not used later on, hence it is a redundant assignment and can be removed. Cleans up the following clang-analyzer warning: net/netrom/nr_in.c:156:2: warning: Value stored to 'ns' is never read [clang-analyzer-deadcode.DeadStores]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/1619603885-115604-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-28net/sched: act_ct: Remove redundant ct get and checkRoi Dayan1-3/+1
The assignment is not being used and redundant. The check for null is redundant as nf_conntrack_put() also checks this. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Link: https://lore.kernel.org/r/20210428060532.3330974-1-roid@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-28icmp: standardize naming of RFC 8335 PROBE constantsAndreas Roeseler1-8/+8
The current definitions of constants for PROBE, currently defined only in the net-next kernel branch, are inconsistent, with some beginning with ICMP and others with simply EXT. This patch attempts to standardize the naming conventions of the constants for PROBE before their release into a stable Kernel, and to update the relevant definitions in net/ipv4/icmp.c. Similarly, the definitions for the code field (previously ICMP_EXT_MAL_QUERY, etc) use the same prefixes as the type field. This patch adds _CODE_ to the prefix to clarify the distinction of these constants. Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com> Acked-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20210427153635.2591-1-andreas.a.roeseler@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-28Merge tag 'sched-core-2021-04-28' of ↵Linus Torvalds3-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Clean up SCHED_DEBUG: move the decades old mess of sysctl, procfs and debugfs interfaces to a unified debugfs interface. - Signals: Allow caching one sigqueue object per task, to improve performance & latencies. - Improve newidle_balance() irq-off latencies on systems with a large number of CPU cgroups. - Improve energy-aware scheduling - Improve the PELT metrics for certain workloads - Reintroduce select_idle_smt() to improve load-balancing locality - but without the previous regressions - Add 'scheduler latency debugging': warn after long periods of pending need_resched. This is an opt-in feature that requires the enabling of the LATENCY_WARN scheduler feature, or the use of the resched_latency_warn_ms=xx boot parameter. - CPU hotplug fixes for HP-rollback, and for the 'fail' interface. Fix remaining balance_push() vs. hotplug holes/races - PSI fixes, plus allow /proc/pressure/ files to be written by CAP_SYS_RESOURCE tasks as well - Fix/improve various load-balancing corner cases vs. capacity margins - Fix sched topology on systems with NUMA diameter of 3 or above - Fix PF_KTHREAD vs to_kthread() race - Minor rseq optimizations - Misc cleanups, optimizations, fixes and smaller updates * tag 'sched-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits) cpumask/hotplug: Fix cpu_dying() state tracking kthread: Fix PF_KTHREAD vs to_kthread() race sched/debug: Fix cgroup_path[] serialization sched,psi: Handle potential task count underflow bugs more gracefully sched: Warn on long periods of pending need_resched sched/fair: Move update_nohz_stats() to the CONFIG_NO_HZ_COMMON block to simplify the code & fix an unused function warning sched/debug: Rename the sched_debug parameter to sched_verbose sched,fair: Alternative sched_slice() sched: Move /proc/sched_debug to debugfs sched,debug: Convert sysctl sched_domains to debugfs debugfs: Implement debugfs_create_str() sched,preempt: Move preempt_dynamic to debug.c sched: Move SCHED_DEBUG sysctl to debugfs sched: Don't make LATENCYTOP select SCHED_DEBUG sched: Remove sched_schedstats sysctl out from under SCHED_DEBUG sched/numa: Allow runtime enabling/disabling of NUMA balance without SCHED_DEBUG sched: Use cpu_dying() to fix balance_push vs hotplug-rollback cpumask: Introduce DYING mask cpumask: Make cpu_{online,possible,present,active}() inline rseq: Optimise rseq_get_rseq_cs() and clear_rseq_cs() ...
2021-04-28Fix misc new gcc warningsLinus Torvalds1-1/+1
It seems like Fedora 34 ends up enabling a few new gcc warnings, notably "-Wstringop-overread" and "-Warray-parameter". Both of them cause what seem to be valid warnings in the kernel, where we have array size mismatches in function arguments (that are no longer just silently converted to a pointer to element, but actually checked). This fixes most of the trivial ones, by making the function declaration match the function definition, and in the case of intel_pm.c, removing the over-specified array size from the argument declaration. At least one 'stringop-overread' warning remains in the i915 driver, but that one doesn't have the same obvious trivial fix, and may or may not actually be indicative of a bug. [ It was a mistake to upgrade one of my machines to Fedora 34 while being busy with the merge window, but if this is the extent of the compiler upgrade problems, things are better than usual - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-28net:nfc:digital: Fix a double free in digital_tg_recv_dep_reqLv Yunlong1-0/+2
In digital_tg_recv_dep_req, it calls nfc_tm_data_received(..,resp). If nfc_tm_data_received() failed, the callee will free the resp via kfree_skb() and return error. But in the exit branch, the resp will be freed again. My patch sets resp to NULL if nfc_tm_data_received() failed, to avoid the double free. Fixes: 1c7a4c24fbfd9 ("NFC Digital: Add target NFC-DEP support") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller7-103/+525
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Add support for the catch-all set element. This special element can be used to define a default action to be applied in case that the set lookup returns no matching element. 2) Fix incorrect #ifdef dependencies in the nftables cgroupsv2 support, from Arnd Bergmann. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-28net: fix a concurrency bug in l2tp_tunnel_register()Gong, Sishuai1-5/+5
l2tp_tunnel_register() registers a tunnel without fully initializing its attribute. This can allow another kernel thread running l2tp_xmit_core() to access the uninitialized data and then cause a kernel NULL pointer dereference error, as shown below. Thread 1 Thread 2 //l2tp_tunnel_register() list_add_rcu(&tunnel->list, &pn->l2tp_tunnel_list); //pppol2tp_connect() tunnel = l2tp_tunnel_get(sock_net(sk), info.tunnel_id); // Fetch the new tunnel ... //l2tp_xmit_core() struct sock *sk = tunnel->sock; ... bh_lock_sock(sk); //Null pointer error happens tunnel->sock = sk; Fix this bug by initializing tunnel->sock before adding the tunnel into l2tp_tunnel_list. Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Sishuai Gong <sishuai@purdue.edu> Reported-by: Sishuai Gong <sishuai@purdue.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-28net/smc: Remove redundant assignment to rcJiapeng Chong1-1/+0
Variable rc is set to zero but this value is never read as it is overwritten with a new value later on, hence it is a redundant assignment and can be removed. Cleans up the following clang-analyzer warning: net/smc/af_smc.c:1079:3: warning: Value stored to 'rc' is never read [clang-analyzer-deadcode.DeadStores]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-28mpls: Remove redundant assignment to errJiapeng Chong1-1/+0
Variable err is set to -ENOMEM but this value is never read as it is overwritten with a new value later on, hence it is a redundant assignment and can be removed. Cleans up the following clang-analyzer warning: net/mpls/af_mpls.c:1022:2: warning: Value stored to 'err' is never read [clang-analyzer-deadcode.DeadStores]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-28llc2: Remove redundant assignment to rcJiapeng Chong1-2/+0
Variable rc is set to zero but this value is never read as it is overwritten with a new value later on, hence it is a redundant assignment and can be removed. Cleans up the following clang-analyzer warning: net/llc/llc_station.c:86:2: warning: Value stored to 'rc' is never read [clang-analyzer-deadcode.DeadStores]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-28net/tls: Remove redundant initialization of recordJiapeng Chong1-1/+1
record is being initialized to ctx->open_record but this is never read as record is overwritten later on. Remove the redundant initialization. Cleans up the following clang-analyzer warning: net/tls/tls_device.c:421:26: warning: Value stored to 'record' during its initialization is never read [clang-analyzer-deadcode.DeadStores]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-28rds: Remove redundant assignment to nr_sigJiapeng Chong1-1/+0
Variable nr_sig is being assigned a value however the assignment is never read, so this redundant assignment can be removed. Cleans up the following clang-analyzer warning: net/rds/ib_send.c:297:2: warning: Value stored to 'nr_sig' is never read [clang-analyzer-deadcode.DeadStores]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-28net: mscc: ocelot: support PTP Sync one-step timestampingYangbo Lu3-53/+17
Although HWTSTAMP_TX_ONESTEP_SYNC existed in ioctl for hardware timestamp configuration, the PTP Sync one-step timestamping had never been supported. This patch is to truely support it. - ocelot_port_txtstamp_request() This function handles tx timestamp request by storing ptp_cmd(tx timestamp type) in OCELOT_SKB_CB(skb)->ptp_cmd, and additionally for two-step timestamp storing ts_id in OCELOT_SKB_CB(clone)->ptp_cmd. - ocelot_ptp_rew_op() During xmit, this function is called to get rew_op (rewriter option) by checking skb->cb for tx timestamp request, and configure to transmitting. Non-onestep-Sync packet with one-step timestamp request falls back to use two-step timestamp. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-28net: dsa: free skb->cb usage in core driverYangbo Lu3-9/+9
Free skb->cb usage in core driver and let device drivers decide to use or not. The reason having a DSA_SKB_CB(skb)->clone was because dsa_skb_tx_timestamp() which may set the clone pointer was called before p->xmit() which would use the clone if any, and the device driver has no way to initialize the clone pointer. This patch just put memset(skb->cb, 0, sizeof(skb->cb)) at beginning of dsa_slave_xmit(). Some new features in the future, like one-step timestamp may need more bytes of skb->cb to use in dsa_skb_tx_timestamp(), and p->xmit(). Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-28net: dsa: no longer clone skb in core driverYangbo Lu1-11/+1
It was a waste to clone skb directly in dsa_skb_tx_timestamp(). For one-step timestamping, a clone was not needed. For any failure of port_txtstamp (this may usually happen), the skb clone had to be freed. So this patch moves skb cloning for tx timestamp out of dsa core, and let drivers clone skb in port_txtstamp if they really need. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-28net: dsa: no longer identify PTP packet in core driverYangbo Lu1-10/+2
Move ptp_classify_raw out of dsa core driver for handling tx timestamp request. Let device drivers do this if they want. Not all drivers want to limit tx timestamping for only PTP packet. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-28net: dsa: check tx timestamp request in core driverYangbo Lu1-0/+3
Check tx timestamp request in core driver at very beginning of dsa_skb_tx_timestamp(), so that most skbs not requiring tx timestamp just return. And drop such checking in device drivers. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-28rxrpc: rxkad: Remove redundant variable offsetJiapeng Chong1-2/+0
Variable offset is being assigned a value from a calculation however the variable is never read, so this redundant variable can be removed. Cleans up the following clang-analyzer warning: net/rxrpc/rxkad.c:579:2: warning: Value stored to 'offset' is never read [clang-analyzer-deadcode.DeadStores]. net/rxrpc/rxkad.c:485:2: warning: Value stored to 'offset' is never read [clang-analyzer-deadcode.DeadStores]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-28net: bridge: mcast: fix broken length + header check for MRDv6 Adv.Linus Lüssing2-30/+15
The IPv6 Multicast Router Advertisements parsing has the following two issues: For one thing, ICMPv6 MRD Advertisements are smaller than ICMPv6 MLD messages (ICMPv6 MRD Adv.: 8 bytes vs. ICMPv6 MLDv1/2: >= 24 bytes, assuming MLDv2 Reports with at least one multicast address entry). When ipv6_mc_check_mld_msg() tries to parse an Multicast Router Advertisement its MLD length check will fail - and it will wrongly return -EINVAL, even if we have a valid MRD Advertisement. With the returned -EINVAL the bridge code will assume a broken packet and will wrongly discard it, potentially leading to multicast packet loss towards multicast routers. The second issue is the MRD header parsing in br_ip6_multicast_mrd_rcv(): It wrongly checks for an ICMPv6 header immediately after the IPv6 header (IPv6 next header type). However according to RFC4286, section 2 all MRD messages contain a Router Alert option (just like MLD). So instead there is an IPv6 Hop-by-Hop option for the Router Alert between the IPv6 and ICMPv6 header, again leading to the bridge wrongly discarding Multicast Router Advertisements. To fix these two issues, introduce a new return value -ENODATA to ipv6_mc_check_mld() to indicate a valid ICMPv6 packet with a hop-by-hop option which is not an MLD but potentially an MRD packet. This also simplifies further parsing in the bridge code, as ipv6_mc_check_mld() already fully checks the ICMPv6 header and hop-by-hop option. These issues were found and fixed with the help of the mrdisc tool (https://github.com/troglobit/mrdisc). Fixes: 4b3087c7e37f ("bridge: Snoop Multicast Router Advertisements") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27Merge tag 'selinux-pr-20210426' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Add support for measuring the SELinux state and policy capabilities using IMA. - A handful of SELinux/NFS patches to compare the SELinux state of one mount with a set of mount options. Olga goes into more detail in the patch descriptions, but this is important as it allows more flexibility when using NFS and SELinux context mounts. - Properly differentiate between the subjective and objective LSM credentials; including support for the SELinux and Smack. My clumsy attempt at a proper fix for AppArmor didn't quite pass muster so John is working on a proper AppArmor patch, in the meantime this set of patches shouldn't change the behavior of AppArmor in any way. This change explains the bulk of the diffstat beyond security/. - Fix a problem where we were not properly terminating the permission list for two SELinux object classes. * tag 'selinux-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: add proper NULL termination to the secclass_map permissions smack: differentiate between subjective and objective task credentials selinux: clarify task subjective and objective credentials lsm: separate security_task_getsecid() into subjective and objective variants nfs: account for selinux security context when deciding to share superblock nfs: remove unneeded null check in nfs_fill_super() lsm,selinux: add new hook to compare new mount to an existing mount selinux: fix misspellings using codespell tool selinux: fix misspellings using codespell tool selinux: measure state and policy capabilities selinux: Allow context mounts for unpriviliged overlayfs
2021-04-27netfilter: nft_socket: fix build with CONFIG_SOCK_CGROUP_DATA=nArnd Bergmann1-2/+2
In some configurations, the sock_cgroup_ptr() function is not available: net/netfilter/nft_socket.c: In function 'nft_sock_get_eval_cgroupv2': net/netfilter/nft_socket.c:47:16: error: implicit declaration of function 'sock_cgroup_ptr'; did you mean 'obj_cgroup_put'? [-Werror=implicit-function-declaration] 47 | cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data); | ^~~~~~~~~~~~~~~ | obj_cgroup_put net/netfilter/nft_socket.c:47:14: error: assignment to 'struct cgroup *' from 'int' makes pointer from integer without a cast [-Werror=int-conversion] 47 | cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data); | ^ Change the caller to match the same #ifdef check, only calling it when the function is defined. Fixes: e0bb96db96f8 ("netfilter: nft_socket: add support for cgroupsv2") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-27netfilter: nft_socket: fix an unused variable warningArnd Bergmann1-2/+5
The variable is only used in an #ifdef, causing a harmless warning: net/netfilter/nft_socket.c: In function 'nft_socket_init': net/netfilter/nft_socket.c:137:27: error: unused variable 'level' [-Werror=unused-variable] 137 | unsigned int len, level; | ^~~~~ Move it into the same #ifdef block. Fixes: e0bb96db96f8 ("netfilter: nft_socket: add support for cgroupsv2") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-27Merge tag 'afs-netfs-lib-20210426' of ↵Linus Torvalds1-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS updates from David Howells: "Use the new netfs lib. Begin the process of overhauling the use of the fscache API by AFS and the introduction of support for features such as Transparent Huge Pages (THPs). - Add some support for THPs, including using core VM helper functions to find details of pages. - Use the ITER_XARRAY I/O iterator to mediate access to the pagecache as this handles THPs and doesn't require allocation of large bvec arrays. - Delegate address_space read/pre-write I/O methods for AFS to the netfs helper library. A method is provided to the library that allows it to issue a read against the server. This includes a change in use for PG_fscache (it now indicates a DIO write in progress from the marked page), so a number of waits need to be deployed for it. - Split the core AFS writeback function to make it easier to modify in future patches to handle writing to the cache. [This might feasibly make more sense moved out into my fscache-iter branch]. I've tested these with "xfstests -g quick" against an AFS volume (xfstests needs patching to make it work). With this, AFS without a cache passes all expected xfstests; with a cache, there's an extra failure, but that's also there before these patches. Fixing that probably requires a greater overhaul (as can be found on my fscache-iter branch, but that's for a later time). Thanks should go to Marc Dionne and Jeff Altman of AuriStor for exercising the patches in their test farm also" Link: https://lore.kernel.org/lkml/3785063.1619482429@warthog.procyon.org.uk/ * tag 'afs-netfs-lib-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Use the netfs_write_begin() helper afs: Use new netfs lib read helper API afs: Use the fs operation ops to handle FetchData completion afs: Prepare for use of THPs afs: Extract writeback extension into its own function afs: Wait on PG_fscache before modifying/releasing a page afs: Use ITER_XARRAY for writing afs: Set up the iov_iter before calling afs_extract_data() afs: Log remote unmarshalling errors afs: Don't truncate iter during data fetch afs: Move key to afs_read struct afs: Print the operation debug_id when logging an unexpected data version afs: Pass page into dirty region helpers to provide THP size afs: Disable use of the fscache I/O routines
2021-04-27Merge tag 'cfi-v5.13-rc1' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull CFI on arm64 support from Kees Cook: "This builds on last cycle's LTO work, and allows the arm64 kernels to be built with Clang's Control Flow Integrity feature. This feature has happily lived in Android kernels for almost 3 years[1], so I'm excited to have it ready for upstream. The wide diffstat is mainly due to the treewide fixing of mismatched list_sort prototypes. Other things in core kernel are to address various CFI corner cases. The largest code portion is the CFI runtime implementation itself (which will be shared by all architectures implementing support for CFI). The arm64 pieces are Acked by arm64 maintainers rather than coming through the arm64 tree since carrying this tree over there was going to be awkward. CFI support for x86 is still under development, but is pretty close. There are a handful of corner cases on x86 that need some improvements to Clang and objtool, but otherwise works well. Summary: - Clean up list_sort prototypes (Sami Tolvanen) - Introduce CONFIG_CFI_CLANG for arm64 (Sami Tolvanen)" * tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: arm64: allow CONFIG_CFI_CLANG to be selected KVM: arm64: Disable CFI for nVHE arm64: ftrace: use function_nocfi for ftrace_call arm64: add __nocfi to __apply_alternatives arm64: add __nocfi to functions that jump to a physical address arm64: use function_nocfi with __pa_symbol arm64: implement function_nocfi psci: use function_nocfi for cpu_resume lkdtm: use function_nocfi treewide: Change list_sort to use const pointers bpf: disable CFI in dispatcher functions kallsyms: strip ThinLTO hashes from static functions kthread: use WARN_ON_FUNCTION_MISMATCH workqueue: use WARN_ON_FUNCTION_MISMATCH module: ensure __cfi_check alignment mm: add generic function_nocfi macro cfi: add __cficanonical add support for Clang CFI
2021-04-27netfilter: nftables: add catch-all set element supportPablo Neira Ayuso6-63/+458
This patch extends the set infrastructure to add a special catch-all set element. If the lookup fails to find an element (or range) in the set, then the catch-all element is selected. Users can specify a mapping, expression(s) and timeout to be attached to the catch-all element. This patch adds a catchall list to the set, this list might contain more than one single catch-all element (e.g. in case that the catch-all element is removed and a new one is added in the same transaction). However, most of the time, there will be either one element or no elements at all in this list. The catch-all element is identified via NFT_SET_ELEM_CATCHALL flag and such special element has no NFTA_SET_ELEM_KEY attribute. There is a new nft_set_elem_catchall object that stores a reference to the dummy catch-all element (catchall->elem) whose layout is the same of the set element type to reuse the existing set element codebase. The set size does not apply to the catch-all element, users can define a catch-all element even if the set is full. The check for valid set element flags hava been updates to report EOPNOTSUPP in case userspace requests flags that are not supported when using new userspace nftables and old kernel. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-27netfilter: nftables: add helper function to validate set element dataPablo Neira Ayuso1-4/+11
When binding sets to rule, validate set element data according to set definition. This patch adds a helper function to be reused by the catch-all set element support. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>