summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2022-08-25net: genl: fix error path memory leak in policy dumpingJakub Kicinski2-3/+17
commit 249801360db3dec4f73768c502192020bfddeacc upstream. If construction of the array of policies fails when recording non-first policy we need to unwind. netlink_policy_dump_add_policy() itself also needs fixing as it currently gives up on error without recording the allocated pointer in the pstate pointer. Reported-by: syzbot+dc54d9ba8153b216cae0@syzkaller.appspotmail.com Fixes: 50a896cf2d6f ("genetlink: properly support per-op policy dumping") Link: https://lore.kernel.org/r/20220816161939.577583-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specifiedPablo Neira Ayuso1-0/+5
commit 1b6345d4160ecd3d04bd8cd75df90c67811e8cc9 upstream. Since f3a2181e16f1 ("netfilter: nf_tables: Support for sets with multiple ranged fields"), it possible to combine intervals and concatenations. Later on, ef516e8625dd ("netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag") provides the NFT_SET_CONCAT flag for userspace to report that the set stores a concatenation. Make sure NFT_SET_CONCAT is set on if field_count is specified for consistency. Otherwise, if NFT_SET_CONCAT is specified with no field_count, bail out with EINVAL. Fixes: ef516e8625dd ("netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25netfilter: nf_tables: validate NFTA_SET_ELEM_OBJREF based on NFT_SET_OBJECT flagPablo Neira Ayuso1-4/+9
commit 5a2f3dc31811e93be15522d9eb13ed61460b76c8 upstream. If the NFTA_SET_ELEM_OBJREF netlink attribute is present and NFT_SET_OBJECT flag is set on, report EINVAL. Move existing sanity check earlier to validate that NFT_SET_OBJECT requires NFTA_SET_ELEM_OBJREF. Fixes: 8aeff920dcc9 ("netfilter: nf_tables: add stateful object reference to set elements") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25netfilter: nf_tables: really skip inactive sets when allocating namePablo Neira Ayuso1-1/+1
commit 271c5ca826e0c3c53e0eb4032f8eaedea1ee391c upstream. While looping to build the bitmap of used anonymous set names, check the current set in the iteration, instead of the one that is being created. Fixes: 37a9cc525525 ("netfilter: nf_tables: add generation mask to sets") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25vsock: Set socket state back to SS_UNCONNECTED in vsock_connect_timeout()Peilin Ye1-0/+1
commit a3e7b29e30854ed67be0d17687e744ad0c769c4b upstream. Imagine two non-blocking vsock_connect() requests on the same socket. The first request schedules @connect_work, and after it times out, vsock_connect_timeout() sets *sock* state back to TCP_CLOSE, but keeps *socket* state as SS_CONNECTING. Later, the second request returns -EALREADY, meaning the socket "already has a pending connection in progress", even though the first request has already timed out. As suggested by Stefano, fix it by setting *socket* state back to SS_UNCONNECTED, so that the second request will return -ETIMEDOUT. Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25vsock: Fix memory leak in vsock_connect()Peilin Ye1-1/+8
commit 7e97cfed9929eaabc41829c395eb0d1350fccb9d upstream. An O_NONBLOCK vsock_connect() request may try to reschedule @connect_work. Imagine the following sequence of vsock_connect() requests: 1. The 1st, non-blocking request schedules @connect_work, which will expire after 200 jiffies. Socket state is now SS_CONNECTING; 2. Later, the 2nd, blocking request gets interrupted by a signal after a few jiffies while waiting for the connection to be established. Socket state is back to SS_UNCONNECTED, but @connect_work is still pending, and will expire after 100 jiffies. 3. Now, the 3rd, non-blocking request tries to schedule @connect_work again. Since @connect_work is already scheduled, schedule_delayed_work() silently returns. sock_hold() is called twice, but sock_put() will only be called once in vsock_connect_timeout(), causing a memory leak reported by syzbot: BUG: memory leak unreferenced object 0xffff88810ea56a40 (size 1232): comm "syz-executor756", pid 3604, jiffies 4294947681 (age 12.350s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 28 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00 (..@............ backtrace: [<ffffffff837c830e>] sk_prot_alloc+0x3e/0x1b0 net/core/sock.c:1930 [<ffffffff837cbe22>] sk_alloc+0x32/0x2e0 net/core/sock.c:1989 [<ffffffff842ccf68>] __vsock_create.constprop.0+0x38/0x320 net/vmw_vsock/af_vsock.c:734 [<ffffffff842ce8f1>] vsock_create+0xc1/0x2d0 net/vmw_vsock/af_vsock.c:2203 [<ffffffff837c0cbb>] __sock_create+0x1ab/0x2b0 net/socket.c:1468 [<ffffffff837c3acf>] sock_create net/socket.c:1519 [inline] [<ffffffff837c3acf>] __sys_socket+0x6f/0x140 net/socket.c:1561 [<ffffffff837c3bba>] __do_sys_socket net/socket.c:1570 [inline] [<ffffffff837c3bba>] __se_sys_socket net/socket.c:1568 [inline] [<ffffffff837c3bba>] __x64_sys_socket+0x1a/0x20 net/socket.c:1568 [<ffffffff84512815>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff84512815>] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80 [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae <...> Use mod_delayed_work() instead: if @connect_work is already scheduled, reschedule it, and undo sock_hold() to keep the reference count balanced. Reported-and-tested-by: syzbot+b03f55bf128f9a38f064@syzkaller.appspotmail.com Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Co-developed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25ipv6: do not use RT_TOS for IPv6 flowlabelMatthias May1-2/+1
commit ab7e2e0dfa5d37540ab1dc5376e9a2cb9188925d upstream. According to Guillaume Nault RT_TOS should never be used for IPv6. Quote: RT_TOS() is an old macro used to interprete IPv4 TOS as described in the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4 code, although, given the current state of the code, most of the existing calls have no consequence. But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS" field to be interpreted the RFC 1349 way. There's no historical compatibility to worry about. Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.") Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Matthias May <matthias.may@westermo.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25devlink: Fix use-after-free after a failed reloadIdo Schimmel1-2/+2
commit 6b4db2e528f650c7fb712961aac36455468d5902 upstream. After a failed devlink reload, devlink parameters are still registered, which means user space can set and get their values. In the case of the mlxsw "acl_region_rehash_interval" parameter, these operations will trigger a use-after-free [1]. Fix this by rejecting set and get operations while in the failed state. Return the "-EOPNOTSUPP" error code which does not abort the parameters dump, but instead causes it to skip over the problematic parameter. Another possible fix is to perform these checks in the mlxsw parameter callbacks, but other drivers might be affected by the same problem and I am not aware of scenarios where these stricter checks will cause a regression. [1] mlxsw_spectrum3 0000:00:10.0: Port 125: Failed to register netdev mlxsw_spectrum3 0000:00:10.0: Failed to create ports ================================================================== BUG: KASAN: use-after-free in mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904 Read of size 4 at addr ffff8880099dcfd8 by task kworker/u4:4/777 CPU: 1 PID: 777 Comm: kworker/u4:4 Not tainted 5.19.0-rc7-custom-126601-gfe26f28c586d #1 Hardware name: QEMU MSN4700, BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: netns cleanup_net Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x92/0xbd lib/dump_stack.c:106 print_address_description mm/kasan/report.c:313 [inline] print_report.cold+0x5e/0x5cf mm/kasan/report.c:429 kasan_report+0xb9/0xf0 mm/kasan/report.c:491 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report_generic.c:306 mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904 mlxsw_sp_acl_region_rehash_intrvl_get+0x49/0x60 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c:1106 mlxsw_sp_params_acl_region_rehash_intrvl_get+0x33/0x80 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3854 devlink_param_get net/core/devlink.c:4981 [inline] devlink_nl_param_fill+0x238/0x12d0 net/core/devlink.c:5089 devlink_param_notify+0xe5/0x230 net/core/devlink.c:5168 devlink_ns_change_notify net/core/devlink.c:4417 [inline] devlink_ns_change_notify net/core/devlink.c:4396 [inline] devlink_reload+0x15f/0x700 net/core/devlink.c:4507 devlink_pernet_pre_exit+0x112/0x1d0 net/core/devlink.c:12272 ops_pre_exit_list net/core/net_namespace.c:152 [inline] cleanup_net+0x494/0xc00 net/core/net_namespace.c:582 process_one_work+0x9fc/0x1710 kernel/workqueue.c:2289 worker_thread+0x675/0x10b0 kernel/workqueue.c:2436 kthread+0x30c/0x3d0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 </TASK> The buggy address belongs to the physical page: page:ffffea0000267700 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x99dc flags: 0x100000000000000(node=0|zone=1) raw: 0100000000000000 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880099dce80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8880099dcf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff8880099dcf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff8880099dd000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8880099dd080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== Fixes: 98bbf70c1c41 ("mlxsw: spectrum: add "acl_region_rehash_interval" devlink param") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25SUNRPC: Reinitialise the backchannel request buffers before reuseTrond Myklebust1-0/+14
commit 6622e3a73112fc336c1c2c582428fb5ef18e456a upstream. When we're reusing the backchannel requests instead of freeing them, then we should reinitialise any values of the send/receive xdr_bufs so that they reflect the available space. Fixes: 0d2a970d0ae5 ("SUNRPC: Fix a backchannel race") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25sunrpc: fix expiry of auth credsDan Aloni1-1/+1
commit f1bafa7375c01ff71fb7cb97c06caadfcfe815f3 upstream. Before this commit, with a large enough LRU of expired items (100), the loop skipped all the expired items and was entirely ineffectual in trimming the LRU list. Fixes: 95cd623250ad ('SUNRPC: Clean up the AUTH cache code') Signed-off-by: Dan Aloni <dan.aloni@vastdata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25bpf: Check the validity of max_rdwr_access for sock local storage map iteratorHou Tao1-1/+1
commit 52bd05eb7c88e1ad8541a48873188ccebca9da26 upstream. The value of sock local storage map is writable in map iterator, so check max_rdwr_access instead of max_rdonly_access. Fixes: 5ce6e77c7edf ("bpf: Implement bpf iterator for sock local storage map") Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-6-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25bpf: Acquire map uref in .init_seq_private for sock{map,hash} iteratorHou Tao1-1/+19
commit f0d2b2716d71778d0b0c8eaa433c073287d69d93 upstream. sock_map_iter_attach_target() acquires a map uref, and the uref may be released before or in the middle of iterating map elements. For example, the uref could be released in sock_map_iter_detach_target() as part of bpf_link_release(), or could be released in bpf_map_put_with_uref() as part of bpf_map_release(). Fixing it by acquiring an extra map uref in .init_seq_private and releasing it in .fini_seq_private. Fixes: 0365351524d7 ("net: Allow iterating sockmap and sockhash") Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-5-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25bpf: Acquire map uref in .init_seq_private for sock local storage map iteratorHou Tao1-1/+9
commit 3c5f6e698b5c538bbb23cd453b22e1e4922cffd8 upstream. bpf_iter_attach_map() acquires a map uref, and the uref may be released before or in the middle of iterating map elements. For example, the uref could be released in bpf_iter_detach_map() as part of bpf_link_release(), or could be released in bpf_map_put_with_uref() as part of bpf_map_release(). So acquiring an extra map uref in bpf_iter_init_sk_storage_map() and releasing it in bpf_iter_fini_sk_storage_map(). Fixes: 5ce6e77c7edf ("bpf: Implement bpf iterator for sock local storage map") Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-4-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-25rds: add missing barrier to release_refillMikulas Patocka1-0/+1
commit 9f414eb409daf4f778f011cf8266d36896bb930b upstream. The functions clear_bit and set_bit do not imply a memory barrier, thus it may be possible that the waitqueue_active function (which does not take any locks) is moved before clear_bit and it could miss a wakeup event. Fix this bug by adding a memory barrier after clear_bit. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-21net_sched: cls_route: disallow handle of 0Jamal Hadi Salim1-0/+10
commit 02799571714dc5dd6948824b9d080b44a295f695 upstream. Follows up on: https://lore.kernel.org/all/20220809170518.164662-1-cascardo@canonical.com/ handle of 0 implies from/to of universe realm which is not very sensible. Lets see what this patch will do: $sudo tc qdisc add dev $DEV root handle 1:0 prio //lets manufacture a way to insert handle of 0 $sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 \ route to 0 from 0 classid 1:10 action ok //gets rejected... Error: handle of 0 is not valid. We have an error talking to the kernel, -1 //lets create a legit entry.. sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 route from 10 \ classid 1:10 action ok //what did the kernel insert? $sudo tc filter ls dev $DEV parent 1:0 filter protocol ip pref 100 route chain 0 filter protocol ip pref 100 route chain 0 fh 0x000a8000 flowid 1:10 from 10 action order 1: gact action pass random type none pass val 0 index 1 ref 1 bind 1 //Lets try to replace that legit entry with a handle of 0 $ sudo tc filter replace dev $DEV parent 1:0 protocol ip prio 100 \ handle 0x000a8000 route to 0 from 0 classid 1:10 action drop Error: Replacing with handle of 0 is invalid. We have an error talking to the kernel, -1 And last, lets run Cascardo's POC: $ ./poc 0 0 -22 -22 -22 Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-21net/9p: Initialize the iounit field during fid creationTyler Hicks1-4/+1
commit aa7aeee169480e98cf41d83c01290a37e569be6d upstream. Ensure that the fid's iounit field is set to zero when a new fid is created. Certain 9P operations, such as OPEN and CREATE, allow the server to reply with an iounit size which the client code assigns to the p9_fid struct shortly after the fid is created by p9_fid_create(). On the other hand, an XATTRWALK operation doesn't allow for the server to specify an iounit value. The iounit field of the newly allocated p9_fid struct remained uninitialized in that case. Depending on allocation patterns, the iounit value could have been something reasonable that was carried over from previously freed fids or, in the worst case, could have been arbitrary values from non-fid related usages of the memory location. The bug was detected in the Windows Subsystem for Linux 2 (WSL2) kernel after the uninitialized iounit field resulted in the typical sequence of two getxattr(2) syscalls, one to get the size of an xattr and another after allocating a sufficiently sized buffer to fit the xattr value, to hit an unexpected ERANGE error in the second call to getxattr(2). An uninitialized iounit field would sometimes force rsize to be smaller than the xattr value size in p9_client_read_once() and the 9P server in WSL refused to chunk up the READ on the attr_fid and, instead, returned ERANGE to the client. The virtfs server in QEMU seems happy to chunk up the READ and this problem goes undetected there. Link: https://lkml.kernel.org/r/20220710141402.803295-1-tyhicks@linux.microsoft.com Fixes: ebf46264a004 ("fs/9p: Add support user. xattr") Cc: stable@vger.kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> [tyhicks: Adjusted context due to: - Lack of fid refcounting introduced in v5.11 commit 6636b6dcc3db ("9p: add refcount to p9_fid struct") - Difference in how buffer sizes are specified v5.16 commit 6e195b0f7c8e ("9p: fix a bunch of checkpatch warnings")] Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-21Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regressionLuiz Augusto von Dentz1-7/+6
commit 332f1795ca202489c665a75e62e18ff6284de077 upstream. The patch d0be8347c623: "Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put" from Jul 21, 2022, leads to the following Smatch static checker warning: net/bluetooth/l2cap_core.c:1977 l2cap_global_chan_by_psm() error: we previously assumed 'c' could be null (see line 1996) Fixes: d0be8347c623 ("Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-21tcp: fix over estimation in sk_forced_mem_schedule()Eric Dumazet1-3/+4
commit c4ee118561a0f74442439b7b5b486db1ac1ddfeb upstream. sk_forced_mem_schedule() has a bug similar to ones fixed in commit 7c80b038d23e ("net: fix sk_wmem_schedule() and sk_rmem_schedule() errors") While this bug has little chance to trigger in old kernels, we need to fix it before the following patch. Fixes: d83769a580f1 ("tcp: fix possible deadlock in tcp_send_fin()") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-21mac80211: fix a memory leak where sta_info is not freedAhmed Zaki1-3/+3
commit 8f9dcc29566626f683843ccac6113a12208315ca upstream. The following is from a system that went OOM due to a memory leak: wlan0: Allocated STA 74:83:c2:64:0b:87 wlan0: Allocated STA 74:83:c2:64:0b:87 wlan0: IBSS finish 74:83:c2:64:0b:87 (---from ieee80211_ibss_add_sta) wlan0: Adding new IBSS station 74:83:c2:64:0b:87 wlan0: moving STA 74:83:c2:64:0b:87 to state 2 wlan0: moving STA 74:83:c2:64:0b:87 to state 3 wlan0: Inserted STA 74:83:c2:64:0b:87 wlan0: IBSS finish 74:83:c2:64:0b:87 (---from ieee80211_ibss_work) wlan0: Adding new IBSS station 74:83:c2:64:0b:87 wlan0: moving STA 74:83:c2:64:0b:87 to state 2 wlan0: moving STA 74:83:c2:64:0b:87 to state 3 . . wlan0: expiring inactive not authorized STA 74:83:c2:64:0b:87 wlan0: moving STA 74:83:c2:64:0b:87 to state 2 wlan0: moving STA 74:83:c2:64:0b:87 to state 1 wlan0: Removed STA 74:83:c2:64:0b:87 wlan0: Destroyed STA 74:83:c2:64:0b:87 The ieee80211_ibss_finish_sta() is called twice on the same STA from 2 different locations. On the second attempt, the allocated STA is not destroyed creating a kernel memory leak. This is happening because sta_info_insert_finish() does not call sta_info_free() the second time when the STA already exists (returns -EEXIST). Note that the caller sta_info_insert_rcu() assumes STA is destroyed upon errors. Same fix is applied to -ENOMEM. Signed-off-by: Ahmed Zaki <anzaki@gmail.com> Link: https://lore.kernel.org/r/20211002145329.3125293-1-anzaki@gmail.com [change the error path label to use the existing code] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Viacheslav Sablin <sablin@ispras.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-21net_sched: cls_route: remove from list when handle is 0Thadeu Lima de Souza Cascardo1-1/+1
commit 9ad36309e2719a884f946678e0296be10f0bb4c1 upstream. When a route filter is replaced and the old filter has a 0 handle, the old one won't be removed from the hashtable, while it will still be freed. The test was there since before commit 1109c00547fc ("net: sched: RCU cls_route"), when a new filter was not allocated when there was an old one. The old filter was reused and the reinserting would only be necessary if an old filter was replaced. That was still wrong for the same case where the old handle was 0. Remove the old filter from the list independently from its handle value. This fixes CVE-2022-2588, also reported as ZDI-CAN-17440. Reported-by: Zhenpeng Lin <zplin@u.northwestern.edu> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Reviewed-by: Kamal Mostafa <kamal@canonical.com> Cc: <stable@vger.kernel.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20220809170518.164662-1-cascardo@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-21dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lockHangyu Hua1-5/+5
[ Upstream commit a41b17ff9dacd22f5f118ee53d82da0f3e52d5e3 ] In the case of sk->dccps_qpolicy == DCCPQ_POLICY_PRIO, dccp_qpolicy_full will drop a skb when qpolicy is full. And the lock in dccp_sendmsg is released before sock_alloc_send_skb and then relocked after sock_alloc_send_skb. The following conditions may lead dccp_qpolicy_push to add skb to an already full sk_write_queue: thread1--->lock thread1--->dccp_qpolicy_full: queue is full. drop a skb thread1--->unlock thread2--->lock thread2--->dccp_qpolicy_full: queue is not full. no need to drop. thread2--->unlock thread1--->lock thread1--->dccp_qpolicy_push: add a skb. queue is full. thread1--->unlock thread2--->lock thread2--->dccp_qpolicy_push: add a skb! thread2--->unlock Fix this by moving dccp_qpolicy_full. Fixes: b1308dc015eb ("[DCCP]: Set TX Queue Length Bounds via Sysctl") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Link: https://lore.kernel.org/r/20220729110027.40569-1-hbh25y@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-21net: rose: fix netdev reference changesEric Dumazet2-2/+11
[ Upstream commit 931027820e4dafabc78aff82af59f8c1c4bd3128 ] Bernard reported that trying to unload rose module would lead to infamous messages: unregistered_netdevice: waiting for rose0 to become free. Usage count = xx This patch solves the issue, by making sure each socket referring to a netdevice holds a reference count on it, and properly releases it in rose_release(). rose_dev_first() is also fixed to take a device reference before leaving the rcu_read_locked section. Following patch will add ref_tracker annotations to ease future bug hunting. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Bernard Pidoux <f6bvp@free.fr> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Bernard Pidoux <f6bvp@free.fr> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-21ipv6: add READ_ONCE(sk->sk_bound_dev_if) in INET6_MATCH()Eric Dumazet3-5/+5
[ Upstream commit 5d368f03280d3678433a7f119efe15dfbbb87bc8 ] INET6_MATCH() runs without holding a lock on the socket. We probably need to annotate most reads. This patch makes INET6_MATCH() an inline function to ease our changes. v2: inline function only defined if IS_ENABLED(CONFIG_IPV6) Change the name to inet6_match(), this is no longer a macro. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-21inet: add READ_ONCE(sk->sk_bound_dev_if) in INET_MATCH()Eric Dumazet2-12/+6
[ Upstream commit 4915d50e300e96929d2462041d6f6c6f061167fd ] INET_MATCH() runs without holding a lock on the socket. We probably need to annotate most reads. This patch makes INET_MATCH() an inline function to ease our changes. v2: We remove the 32bit version of it, as modern compilers should generate the same code really, no need to try to be smarter. Also make 'struct net *net' the first argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-21tcp: make retransmitted SKB fit into the send windowYonglong Li1-7/+16
[ Upstream commit 536a6c8e05f95e3d1118c40ae8b3022ee2d05d52 ] current code of __tcp_retransmit_skb only check TCP_SKB_CB(skb)->seq in send window, and TCP_SKB_CB(skb)->seq_end maybe out of send window. If receiver has shrunk his window, and skb is out of new window, it should retransmit a smaller portion of the payload. test packetdrill script: 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 fcntl(3, F_GETFL) = 0x2 (flags O_RDWR) +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0 +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0 > S 0:0(0) win 65535 <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 8> +.05 < S. 0:0(0) ack 1 win 6000 <mss 1000,nop,nop,sackOK> +0 > . 1:1(0) ack 1 +0 write(3, ..., 10000) = 10000 +0 > . 1:2001(2000) ack 1 win 65535 +0 > . 2001:4001(2000) ack 1 win 65535 +0 > . 4001:6001(2000) ack 1 win 65535 +.05 < . 1:1(0) ack 4001 win 1001 and tcpdump show: 192.168.226.67.55 > 192.0.2.1.8080: Flags [.], seq 1:2001, ack 1, win 65535, length 2000 192.168.226.67.55 > 192.0.2.1.8080: Flags [.], seq 2001:4001, ack 1, win 65535, length 2000 192.168.226.67.55 > 192.0.2.1.8080: Flags [P.], seq 4001:5001, ack 1, win 65535, length 1000 192.168.226.67.55 > 192.0.2.1.8080: Flags [.], seq 5001:6001, ack 1, win 65535, length 1000 192.0.2.1.8080 > 192.168.226.67.55: Flags [.], ack 4001, win 1001, length 0 192.168.226.67.55 > 192.0.2.1.8080: Flags [.], seq 5001:6001, ack 1, win 65535, length 1000 192.168.226.67.55 > 192.0.2.1.8080: Flags [P.], seq 4001:5001, ack 1, win 65535, length 1000 when cient retract window to 1001, send window is [4001,5002], but TLP send 5001-6001 packet which is out of send window. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1657532838-20200-1-git-send-email-liyonglong@chinatelecom.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-21netfilter: nf_tables: fix null deref due to zeroed list headFlorian Westphal1-0/+1
commit 580077855a40741cf511766129702d97ff02f4d9 upstream. In nf_tables_updtable, if nf_tables_table_enable returns an error, nft_trans_destroy is called to free the transaction object. nft_trans_destroy() calls list_del(), but the transaction was never placed on a list -- the list head is all zeroes, this results in a null dereference: BUG: KASAN: null-ptr-deref in nft_trans_destroy+0x26/0x59 Call Trace: nft_trans_destroy+0x26/0x59 nf_tables_newtable+0x4bc/0x9bc [..] Its sane to assume that nft_trans_destroy() can be called on the transaction object returned by nft_trans_alloc(), so make sure the list head is initialised. Fixes: 55dd6f93076b ("netfilter: nf_tables: use new transaction infrastructure to handle table") Reported-by: mingi cho <mgcho.minic@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-21netfilter: nf_tables: do not allow RULE_ID to refer to another chainThadeu Lima de Souza Cascardo1-2/+5
commit 36d5b2913219ac853908b0f1c664345e04313856 upstream. When doing lookups for rules on the same batch by using its ID, a rule from a different chain can be used. If a rule is added to a chain but tries to be positioned next to a rule from a different chain, it will be linked to chain2, but the use counter on chain1 would be the one to be incremented. When looking for rules by ID, use the chain that was used for the lookup by name. The chain used in the context copied to the transaction needs to match that same chain. That way, struct nft_rule does not need to get enlarged with another member. Fixes: 1a94e38d254b ("netfilter: nf_tables: add NFTA_RULE_ID attribute") Fixes: 75dd48e2e420 ("netfilter: nf_tables: Support RULE_ID reference in new rule") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-21netfilter: nf_tables: do not allow CHAIN_ID to refer to another tableThadeu Lima de Souza Cascardo1-2/+4
commit 95f466d22364a33d183509629d0879885b4f547e upstream. When doing lookups for chains on the same batch by using its ID, a chain from a different table can be used. If a rule is added to a table but refers to a chain in a different table, it will be linked to the chain in table2, but would have expressions referring to objects in table1. Then, when table1 is removed, the rule will not be removed as its linked to a chain in table2. When expressions in the rule are processed or removed, that will lead to a use-after-free. When looking for chains by ID, use the table that was used for the lookup by name, and only return chains belonging to that same table. Fixes: 837830a4b439 ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-21netfilter: nf_tables: do not allow SET_ID to refer to another tableThadeu Lima de Souza Cascardo1-1/+3
commit 470ee20e069a6d05ae549f7d0ef2bdbcee6a81b2 upstream. When doing lookups for sets on the same batch by using its ID, a set from a different table can be used. Then, when the table is removed, a reference to the set may be kept after the set is freed, leading to a potential use-after-free. When looking for sets by ID, use the table that was used for the lookup by name, and only return sets belonging to that same table. This fixes CVE-2022-2586, also reported as ZDI-CAN-17470. Reported-by: Team Orca of Sea Security (@seasecresponse) Fixes: 958bee14d071 ("netfilter: nf_tables: use new transaction infrastructure to handle sets") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03bpf: Add PROG_TEST_RUN support for sk_lookup programsLorenz Bauer2-0/+106
commit 7c32e8f8bc33a5f4b113a630857e46634e3e143b upstream. Allow to pass sk_lookup programs to PROG_TEST_RUN. User space provides the full bpf_sk_lookup struct as context. Since the context includes a socket pointer that can't be exposed to user space we define that PROG_TEST_RUN returns the cookie of the selected socket or zero in place of the socket pointer. We don't support testing programs that select a reuseport socket, since this would mean running another (unrelated) BPF program from the sk_lookup test handler. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210303101816.36774-3-lmb@cloudflare.com Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03bpf: Consolidate shared test timing codeLorenz Bauer1-62/+78
commit 607b9cc92bd7208338d714a22b8082fe83bcb177 upstream. Share the timing / signal interruption logic between different implementations of PROG_TEST_RUN. There is a change in behaviour as well. We check the loop exit condition before checking for pending signals. This resolves an edge case where a signal arrives during the last iteration. Instead of aborting with EINTR we return the successful result to user space. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210303101816.36774-2-lmb@cloudflare.com [dtcccc: fix conflicts in bpf_test_run()] Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03sctp: leave the err path free in sctp_stream_init to sctp_stream_freeXin Long2-19/+5
[ Upstream commit 181d8d2066c000ba0a0e6940a7ad80f1a0e68e9d ] A NULL pointer dereference was reported by Wei Chen: BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:__list_del_entry_valid+0x26/0x80 Call Trace: <TASK> sctp_sched_dequeue_common+0x1c/0x90 sctp_sched_prio_dequeue+0x67/0x80 __sctp_outq_teardown+0x299/0x380 sctp_outq_free+0x15/0x20 sctp_association_free+0xc3/0x440 sctp_do_sm+0x1ca7/0x2210 sctp_assoc_bh_rcv+0x1f6/0x340 This happens when calling sctp_sendmsg without connecting to server first. In this case, a data chunk already queues up in send queue of client side when processing the INIT_ACK from server in sctp_process_init() where it calls sctp_stream_init() to alloc stream_in. If it fails to alloc stream_in all stream_out will be freed in sctp_stream_init's err path. Then in the asoc freeing it will crash when dequeuing this data chunk as stream_out is missing. As we can't free stream out before dequeuing all data from send queue, and this patch is to fix it by moving the err path stream_out/in freeing in sctp_stream_init() to sctp_stream_free() which is eventually called when freeing the asoc in sctp_association_free(). This fix also makes the code in sctp_process_init() more clear. Note that in sctp_association_init() when it fails in sctp_stream_init(), sctp_association_free() will not be called, and in that case it should go to 'stream_free' err path to free stream instead of 'fail_init'. Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Reported-by: Wei Chen <harperchen1110@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/831a3dc100c4908ff76e5bcc363be97f2778bc0b.1658787066.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03netfilter: nf_queue: do not allow packet truncation below transport header ↵Florian Westphal1-1/+6
offset [ Upstream commit 99a63d36cb3ed5ca3aa6fcb64cffbeaf3b0fb164 ] Domingo Dirutigliano and Nicola Guerrera report kernel panic when sending nf_queue verdict with 1-byte nfta_payload attribute. The IP/IPv6 stack pulls the IP(v6) header from the packet after the input hook. If user truncates the packet below the header size, this skb_pull() will result in a malformed skb (skb->len < 0). Fixes: 7af4cc3fa158 ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink") Reported-by: Domingo Dirutigliano <pwnzer0tt1@proton.me> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03sctp: fix sleep in atomic context bug in timer handlersDuoming Zhou1-1/+1
[ Upstream commit b89fc26f741d9f9efb51cba3e9b241cf1380ec5a ] There are sleep in atomic context bugs in timer handlers of sctp such as sctp_generate_t3_rtx_event(), sctp_generate_probe_event(), sctp_generate_t1_init_event(), sctp_generate_timeout_event(), sctp_generate_t3_rtx_event() and so on. The root cause is sctp_sched_prio_init_sid() with GFP_KERNEL parameter that may sleep could be called by different timer handlers which is in interrupt context. One of the call paths that could trigger bug is shown below: (interrupt context) sctp_generate_probe_event sctp_do_sm sctp_side_effects sctp_cmd_interpreter sctp_outq_teardown sctp_outq_init sctp_sched_set_sched n->init_sid(..,GFP_KERNEL) sctp_sched_prio_init_sid //may sleep This patch changes gfp_t parameter of init_sid in sctp_sched_set_sched() from GFP_KERNEL to GFP_ATOMIC in order to prevent sleep in atomic context bugs. Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/20220723015809.11553-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix data-races around sysctl_tcp_reflect_tos.Kuniyuki Iwashima2-4/+4
[ Upstream commit 870e3a634b6a6cb1543b359007aca73fe6a03ac5 ] While reading sysctl_tcp_reflect_tos, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.Kuniyuki Iwashima1-1/+1
[ Upstream commit 79f55473bfc8ac51bd6572929a679eeb4da22251 ] While reading sysctl_tcp_comp_sack_nr, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 9c21d2fc41c0 ("tcp: add tcp_comp_sack_nr sysctl") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns.Kuniyuki Iwashima1-1/+1
[ Upstream commit 22396941a7f343d704738360f9ef0e6576489d43 ] While reading sysctl_tcp_comp_sack_slack_ns, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: a70437cc09a1 ("tcp: add hrtimer slack to sack compression") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.Kuniyuki Iwashima1-1/+2
[ Upstream commit 4866b2b0f7672b6d760c4b8ece6fb56f965dcc8a ] While reading sysctl_tcp_comp_sack_delay_ns, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 6d82aa242092 ("tcp: add tcp_comp_sack_delay_ns sysctl") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.Kuniyuki Iwashima1-1/+2
[ Upstream commit 2afdbe7b8de84c28e219073a6661080e1b3ded48 ] While reading sysctl_tcp_invalid_ratelimit, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 032ee4236954 ("tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_autocorking.Kuniyuki Iwashima1-1/+1
[ Upstream commit 85225e6f0a76e6745bc841c9f25169c509b573d8 ] While reading sysctl_tcp_autocorking, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: f54b311142a9 ("tcp: auto corking") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.Kuniyuki Iwashima1-1/+1
[ Upstream commit 1330ffacd05fc9ac4159d19286ce119e22450ed2 ] While reading sysctl_tcp_min_rtt_wlen, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: f672258391b4 ("tcp: track min RTT using windowed min-filter") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_min_tso_segs.Kuniyuki Iwashima1-1/+1
[ Upstream commit e0bb4ab9dfddd872622239f49fb2bd403b70853b ] While reading sysctl_tcp_min_tso_segs, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 95bd09eb2750 ("tcp: TSO packets automatic sizing") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03igmp: Fix data-races around sysctl_igmp_qrv.Kuniyuki Iwashima1-11/+13
[ Upstream commit 8ebcc62c738f68688ee7c6fec2efe5bc6d3d7e60 ] While reading sysctl_igmp_qrv, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. This test can be packed into a helper, so such changes will be in the follow-up series after net is merged into net-next. qrv ?: READ_ONCE(net->ipv4.sysctl_igmp_qrv); Fixes: a9fe8e29945d ("ipv4: implement igmp_qrv sysctl to tune igmp robustness variable") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03net/tls: Remove the context from the list in tls_device_downMaxim Mikityanskiy1-1/+6
commit f6336724a4d4220c89a4ec38bca84b03b178b1a3 upstream. tls_device_down takes a reference on all contexts it's going to move to the degraded state (software fallback). If sk_destruct runs afterwards, it can reduce the reference counter back to 1 and return early without destroying the context. Then tls_device_down will release the reference it took and call tls_device_free_ctx. However, the context will still stay in tls_device_down_list forever. The list will contain an item, memory for which is released, making a memory corruption possible. Fix the above bug by properly removing the context from all lists before any call to tls_device_free_ctx. Fixes: 3740651bf7e2 ("tls: Fix context leak on tls_device_down") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03net: ping6: Fix memleak in ipv6_renew_options().Kuniyuki Iwashima1-0/+6
commit e27326009a3d247b831eda38878c777f6f4eb3d1 upstream. When we close ping6 sockets, some resources are left unfreed because pingv6_prot is missing sk->sk_prot->destroy(). As reported by syzbot [0], just three syscalls leak 96 bytes and easily cause OOM. struct ipv6_sr_hdr *hdr; char data[24] = {0}; int fd; hdr = (struct ipv6_sr_hdr *)data; hdr->hdrlen = 2; hdr->type = IPV6_SRCRT_TYPE_4; fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP); setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24); close(fd); To fix memory leaks, let's add a destroy function. Note the socket() syscall checks if the GID is within the range of net.ipv4.ping_group_range. The default value is [1, 0] so that no GID meets the condition (1 <= GID <= 0). Thus, the local DoS does not succeed until we change the default value. However, at least Ubuntu/Fedora/RHEL loosen it. $ cat /usr/lib/sysctl.d/50-default.conf ... -net.ipv4.ping_group_range = 0 2147483647 Also, there could be another path reported with these options, and some of them require CAP_NET_RAW. setsockopt IPV6_ADDRFORM (inet6_sk(sk)->pktoptions) IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu) IPV6_HOPOPTS (inet6_sk(sk)->opt) IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt) IPV6_RTHDR (inet6_sk(sk)->opt) IPV6_DSTOPTS (inet6_sk(sk)->opt) IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt) getsockopt IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list) For the record, I left a different splat with syzbot's one. unreferenced object 0xffff888006270c60 (size 96): comm "repro2", pid 231, jiffies 4294696626 (age 13.118s) hex dump (first 32 bytes): 01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00 ....D........... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554) [<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715) [<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024) [<000000007096a025>] __sys_setsockopt (net/socket.c:2254) [<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262) [<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) [0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176 Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.") Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com Reported-by: Ayushman Dutta <ayudutta@amazon.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit.Kuniyuki Iwashima1-1/+1
commit db3815a2fa691da145cfbe834584f31ad75df9ff upstream. While reading sysctl_tcp_challenge_ack_limit, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 282f23c6ee34 ("tcp: implement RFC 5961 3.2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_limit_output_bytes.Kuniyuki Iwashima1-1/+1
commit 9fb90193fbd66b4c5409ef729fd081861f8b6351 upstream. While reading sysctl_tcp_limit_output_bytes, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 46d3ceabd8d9 ("tcp: TCP Small Queues") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03tcp: Fix data-races around sysctl_tcp_moderate_rcvbuf.Kuniyuki Iwashima2-2/+2
commit 780476488844e070580bfc9e3bc7832ec1cea883 upstream. While reading sysctl_tcp_moderate_rcvbuf, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03Revert "tcp: change pingpong threshold to 3"Wei Wang1-9/+6
commit 4d8f24eeedc58d5f87b650ddda73c16e8ba56559 upstream. This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. This to-be-reverted commit was meant to apply a stricter rule for the stack to enter pingpong mode. However, the condition used to check for interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is jiffy based and might be too coarse, which delays the stack entering pingpong mode. We revert this patch so that we no longer use the above condition to determine interactive session, and also reduce pingpong threshold to 1. Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") Reported-by: LemmyHuang <hlm3280@163.com> Suggested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220721204404.388396-1-weiwan@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03tcp: Fix data-races around sysctl_tcp_no_ssthresh_metrics_save.Kuniyuki Iwashima1-4/+4
commit ab1ba21b523ab496b1a4a8e396333b24b0a18f9a upstream. While reading sysctl_tcp_no_ssthresh_metrics_save, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 65e6d90168f3 ("net-tcp: Disable TCP ssthresh metrics cache by default") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>