summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-06-30s390/sclp: Fix typo in commentsJiang Jian1-1/+1
Remove the repeated word 'and' from comments Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220622142713.14187-1-jiangjian@cdjrlc.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-06-30s390/archrandom: simplify back to earlier design and initialize earlierJason A. Donenfeld3-224/+12
s390x appears to present two RNG interfaces: - a "TRNG" that gathers entropy using some hardware function; and - a "DRBG" that takes in a seed and expands it. Previously, the TRNG was wired up to arch_get_random_{long,int}(), but it was observed that this was being called really frequently, resulting in high overhead. So it was changed to be wired up to arch_get_random_ seed_{long,int}(), which was a reasonable decision. Later on, the DRBG was then wired up to arch_get_random_{long,int}(), with a complicated buffer filling thread, to control overhead and rate. Fortunately, none of the performance issues matter much now. The RNG always attempts to use arch_get_random_seed_{long,int}() first, which means a complicated implementation of arch_get_random_{long,int}() isn't really valuable or useful to have around. And it's only used when reseeding, which means it won't hit the high throughput complications that were faced before. So this commit returns to an earlier design of just calling the TRNG in arch_get_random_seed_{long,int}(), and returning false in arch_get_ random_{long,int}(). Part of what makes the simplification possible is that the RNG now seeds itself using the TRNG at bootup. But this only works if the TRNG is detected early in boot, before random_init() is called. So this commit also causes that check to happen in setup_arch(). Cc: stable@vger.kernel.org Cc: Harald Freudenberger <freude@linux.ibm.com> Cc: Ingo Franzki <ifranzki@linux.ibm.com> Cc: Juergen Christ <jchrist@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20220610222023.378448-1-Jason@zx2c4.com Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-06-30io_uring: fix provided buffer importDylan Yudaken1-3/+4
io_import_iovec uses the s pointer, but this was changed immediately after the iovec was re-imported and so it was imported into the wrong place. Change the ordering. Fixes: 2be2eb02e2f5 ("io_uring: ensure reads re-import for selected buffers") Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220630132006.2825668-1-dylany@fb.com [axboe: ensure we don't half-import as well] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds4-3/+8
Pull rdma fixes from Jason Gunthorpe: "Three minor bug fixes: - qedr not setting the QP timeout properly toward userspace - Memory leak on error path in ib_cm - Divide by 0 in RDMA interrupt moderation" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: linux/dim: Fix divide by 0 in RDMA DIM RDMA/cm: Fix memory leak in ib_cm_insert_listen RDMA/qedr: Fix reporting QP timeout attribute
2022-06-30Merge tag 'fsnotify_for_v5.19-rc5' of ↵Linus Torvalds2-15/+23
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fanotify fix from Jan Kara: "A fix for recently added fanotify API to have stricter checks and refuse some invalid flag combinations to make our life easier in the future" * tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: refine the validation checks on non-dir inode mask
2022-06-30Merge tag 'v5.19-p3' of ↵Linus Torvalds1-10/+2
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "Fix a regression that breaks the ccp driver" * tag 'v5.19-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: ccp - Fix device IRQ counting by using platform_irq_count()
2022-06-30Merge tag 'devfreq-fixes-for-5.19-rc5' of ↵Rafael J. Wysocki4-76/+75
git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux Pull devfreq fixes for 5.19-rc5 from Chanwoo Choi: "1. Fix devfreq passive governor issue when cpufreq policies are not ready during kernel boot because some CPUs turn on after kernel booting or others. - Re-initialize the vairables of struct devfreq_passive_data when PROBE_DEFER happens when cpufreq_get() returns NULL. - Use dev_err_probe to mute warning when PROBE_DEFER. - Fix cpufreq passive unregister erroring on PROBE_DEFER by using the allocated parent_cpu_data list to free resouce instead of for_each_possible_cpu(). - Remove duplicate cpufreq passive unregister and warning when PROBE_DEFER. - Use HZ_PER_KZH macro in units.h. - Fix wrong indentation in SPDX-License line. 2. Fix reference count leak in exynos-ppmu.c by using of_node_put(). 3. Rework freq_table to be local to devfreq struct - struct devfreq_dev_profile includes freq_table array to store the supported frequencies. If devfreq driver doesn't initialize the freq_table, devfreq core allocates the memory and initializes the freq_table. On a devfreq PROBE_DEFER, the freq_table in the driver profile struct is never reset and may be left in an undefined state. To fix this and correctly handle PROBE_DEFER, use a local freq_table and max_state in the devfreq struct." * tag 'devfreq-fixes-for-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux: PM / devfreq: passive: revert an editing accident in SPDX-License line PM / devfreq: Fix kernel warning with cpufreq passive register fail PM / devfreq: Rework freq_table to be local to devfreq struct PM / devfreq: exynos-ppmu: Fix refcount leak in of_get_devfreq_events PM / devfreq: passive: Use HZ_PER_KHZ macro in units.h PM / devfreq: Fix cpufreq passive unregister erroring on PROBE_DEFER PM / devfreq: Mute warning on governor PROBE_DEFER PM / devfreq: Fix kernel panic with cpu based scaling to passive gov
2022-06-30io_uring: keep sendrecv flags in ioprioPavel Begunkov2-5/+9
We waste a u64 SQE field for flags even though we don't need as many bits and it can be used for something more useful later. Store io_uring specific send/recv flags in sqe->ioprio instead of ->addr2. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Fixes: 0455d4ccec54 ("io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg") [axboe: change comment in io_uring.h as well] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-30s390/purgatory: remove duplicated build rule of kexec-purgatory.oMasahiro Yamada1-2/+1
This is equivalent to the pattern rule in scripts/Makefile.build. Having the dependency on $(obj)/purgatory.ro is enough. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20220613170902.1775211-3-masahiroy@kernel.org Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-06-30s390/purgatory: hard-code obj-y in MakefileMasahiro Yamada1-1/+1
The purgatory/ directory is entirely guarded in arch/s390/Kbuild. CONFIG_ARCH_HAS_KEXEC_PURGATORY is bool type. $(CONFIG_ARCH_HAS_KEXEC_PURGATORY) is always 'y' when Kbuild visits this Makefile for building. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20220613170902.1775211-2-masahiroy@kernel.org Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-06-30s390: remove unneeded 'select BUILD_BIN2C'Masahiro Yamada1-1/+0
Since commit 4c0f032d4963 ("s390/purgatory: Omit use of bin2c"), s390 builds the purgatory without using bin2c. Remove 'select BUILD_BIN2C' to avoid the unneeded build of bin2c. Fixes: 4c0f032d4963 ("s390/purgatory: Omit use of bin2c") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20220613170902.1775211-1-masahiroy@kernel.org Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2022-06-30net: sfp: fix memory leak in sfp_probe()Jianglei Nie1-1/+1
sfp_probe() allocates a memory chunk from sfp with sfp_alloc(). When devm_add_action() fails, sfp is not freed, which leads to a memory leak. We should use devm_add_action_or_reset() instead of devm_add_action(). Signed-off-by: Jianglei Nie <niejianglei2021@163.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20220629075550.2152003-1-niejianglei2021@163.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-30mlxsw: spectrum_router: Fix rollback in tunnel next hop initPetr Machata1-1/+13
In mlxsw_sp_nexthop6_init(), a next hop is always added to the router linked list, and mlxsw_sp_nexthop_type_init() is invoked afterwards. When that function results in an error, the next hop will not have been removed from the linked list. As the error is propagated upwards and the caller frees the next hop object, the linked list ends up holding an invalid object. A similar issue comes up with mlxsw_sp_nexthop4_init(), where rollback block does exist, however does not include the linked list removal. Both IPv6 and IPv4 next hops have a similar issue with next-hop counter rollbacks. As these were introduced in the same patchset as the next hop linked list, include the cleanup in this patch. Fixes: dbe4598c1e92 ("mlxsw: spectrum_router: Keep nexthops in a linked list") Fixes: a5390278a5eb ("mlxsw: spectrum: Add support for setting counters on nexthops") Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220629070205.803952-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-30net: rose: fix UAF bugs caused by timer handlerDuoming Zhou1-15/+19
There are UAF bugs in rose_heartbeat_expiry(), rose_timer_expiry() and rose_idletimer_expiry(). The root cause is that del_timer() could not stop the timer handler that is running and the refcount of sock is not managed properly. One of the UAF bugs is shown below: (thread 1) | (thread 2) | rose_bind | rose_connect | rose_start_heartbeat rose_release | (wait a time) case ROSE_STATE_0 | rose_destroy_socket | rose_heartbeat_expiry rose_stop_heartbeat | sock_put(sk) | ... sock_put(sk) // FREE | | bh_lock_sock(sk) // USE The sock is deallocated by sock_put() in rose_release() and then used by bh_lock_sock() in rose_heartbeat_expiry(). Although rose_destroy_socket() calls rose_stop_heartbeat(), it could not stop the timer that is running. The KASAN report triggered by POC is shown below: BUG: KASAN: use-after-free in _raw_spin_lock+0x5a/0x110 Write of size 4 at addr ffff88800ae59098 by task swapper/3/0 ... Call Trace: <IRQ> dump_stack_lvl+0xbf/0xee print_address_description+0x7b/0x440 print_report+0x101/0x230 ? irq_work_single+0xbb/0x140 ? _raw_spin_lock+0x5a/0x110 kasan_report+0xed/0x120 ? _raw_spin_lock+0x5a/0x110 kasan_check_range+0x2bd/0x2e0 _raw_spin_lock+0x5a/0x110 rose_heartbeat_expiry+0x39/0x370 ? rose_start_heartbeat+0xb0/0xb0 call_timer_fn+0x2d/0x1c0 ? rose_start_heartbeat+0xb0/0xb0 expire_timers+0x1f3/0x320 __run_timers+0x3ff/0x4d0 run_timer_softirq+0x41/0x80 __do_softirq+0x233/0x544 irq_exit_rcu+0x41/0xa0 sysvec_apic_timer_interrupt+0x8c/0xb0 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1b/0x20 RIP: 0010:default_idle+0xb/0x10 RSP: 0018:ffffc9000012fea0 EFLAGS: 00000202 RAX: 000000000000bcae RBX: ffff888006660f00 RCX: 000000000000bcae RDX: 0000000000000001 RSI: ffffffff843a11c0 RDI: ffffffff843a1180 RBP: dffffc0000000000 R08: dffffc0000000000 R09: ffffed100da36d46 R10: dfffe9100da36d47 R11: ffffffff83cf0950 R12: 0000000000000000 R13: 1ffff11000ccc1e0 R14: ffffffff8542af28 R15: dffffc0000000000 ... Allocated by task 146: __kasan_kmalloc+0xc4/0xf0 sk_prot_alloc+0xdd/0x1a0 sk_alloc+0x2d/0x4e0 rose_create+0x7b/0x330 __sock_create+0x2dd/0x640 __sys_socket+0xc7/0x270 __x64_sys_socket+0x71/0x80 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 152: kasan_set_track+0x4c/0x70 kasan_set_free_info+0x1f/0x40 ____kasan_slab_free+0x124/0x190 kfree+0xd3/0x270 __sk_destruct+0x314/0x460 rose_release+0x2fa/0x3b0 sock_close+0xcb/0x230 __fput+0x2d9/0x650 task_work_run+0xd6/0x160 exit_to_user_mode_loop+0xc7/0xd0 exit_to_user_mode_prepare+0x4e/0x80 syscall_exit_to_user_mode+0x20/0x40 do_syscall_64+0x4f/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 This patch adds refcount of sock when we use functions such as rose_start_heartbeat() and so on to start timer, and decreases the refcount of sock when timer is finished or deleted by functions such as rose_stop_heartbeat() and so on. As a result, the UAF bugs could be mitigated. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Tested-by: Duoming Zhou <duoming@zju.edu.cn> Link: https://lore.kernel.org/r/20220629002640.5693-1-duoming@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-30net: usb: ax88179_178a: Fix packet receivingJose Alonso1-25/+76
This patch corrects packet receiving in ax88179_rx_fixup. - problem observed: ifconfig shows allways a lot of 'RX Errors' while packets are received normally. This occurs because ax88179_rx_fixup does not recognise properly the usb urb received. The packets are normally processed and at the end, the code exits with 'return 0', generating RX Errors. (pkt_cnt==-2 and ptk_hdr over field rx_hdr trying to identify another packet there) This is a usb urb received by "tcpdump -i usbmon2 -X" on a little-endian CPU: 0x0000: eeee f8e3 3b19 87a0 94de 80e3 daac 0800 ^ packet 1 start (pkt_len = 0x05ec) ^^^^ IP alignment pseudo header ^ ethernet packet start last byte ethernet packet v padding (8-bytes aligned) vvvv vvvv 0x05e0: c92d d444 1420 8a69 83dd 272f e82b 9811 0x05f0: eeee f8e3 3b19 87a0 94de 80e3 daac 0800 ... ^ packet 2 0x0be0: eeee f8e3 3b19 87a0 94de 80e3 daac 0800 ... 0x1130: 9d41 9171 8a38 0ec5 eeee f8e3 3b19 87a0 ... 0x1720: 8cfc 15ff 5e4c e85c eeee f8e3 3b19 87a0 ... 0x1d10: ecfa 2a3a 19ab c78c eeee f8e3 3b19 87a0 ... 0x2070: eeee f8e3 3b19 87a0 94de 80e3 daac 0800 ... ^ packet 7 0x2120: 7c88 4ca5 5c57 7dcc 0d34 7577 f778 7e0a 0x2130: f032 e093 7489 0740 3008 ec05 0000 0080 ====1==== ====2==== hdr_off ^ pkt_len = 0x05ec ^^^^ AX_RXHDR_*=0x00830 ^^^^ ^ pkt_len = 0 ^^^^ AX_RXHDR_DROP_ERR=0x80000000 ^^^^ ^ 0x2140: 3008 ec05 0000 0080 3008 5805 0000 0080 0x2150: 3008 ec05 0000 0080 3008 ec05 0000 0080 0x2160: 3008 5803 0000 0080 3008 c800 0000 0080 ===11==== ===12==== ===13==== ===14==== 0x2170: 0000 0000 0e00 3821 ^^^^ ^^^^ rx_hdr ^^^^ pkt_cnt=14 ^^^^ hdr_off=0x2138 ^^^^ ^^^^ padding The dump shows that pkt_cnt is the number of entrys in the per-packet metadata. It is "2 * packet count". Each packet have two entrys. The first have a valid value (pkt_len and AX_RXHDR_*) and the second have a dummy-header 0x80000000 (pkt_len=0 with AX_RXHDR_DROP_ERR). Why exists dummy-header for each packet?!? My guess is that this was done probably to align the entry for each packet to 64-bits and maintain compatibility with old firmware. There is also a padding (0x00000000) before the rx_hdr to align the end of rx_hdr to 64-bit. Note that packets have a alignment of 64-bits (8-bytes). This patch assumes that the dummy-header and the last padding are optional. So it preserves semantics and recognises the same valid packets as the current code. This patch was made using only the dumpfile information and tested with only one device: 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet Fixes: 57bc3d3ae8c1 ("net: usb: ax88179_178a: Fix out-of-bounds accesses in RX fixup") Fixes: e2ca90c276e1 ("ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver") Signed-off-by: Jose Alonso <joalonsof@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/d6970bb04bf67598af4d316eaeb1792040b18cfd.camel@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-30nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA IM2P33F8ABR1Lamarque Vieira Souza1-0/+2
ADATA IM2P33F8ABR1 reports bogus eui64 values that appear to be the same across all drives. Quirk them out so they are not marked as "non globally unique" duplicates. Co-developed-by: Felipe de Jesus Araujo da Conceição <felipe.conceicao@petrosoftdesign.com> Signed-off-by: Felipe de Jesus Araujo da Conceição <felipe.conceicao@petrosoftdesign.com> Signed-off-by: Lamarque V. Souza <lamarque.souza@petrosoftdesign.com> Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-06-30nvmet: add a clear_ids attribute for passthru targetsAlan Adamson4-0/+82
If the clear_ids attribute is set to true, the EUI/GUID/UUID is cleared for the passthru target. By default, loop targets will set clear_ids to true. This resolves an issue where a connect to a passthru target fails when using a trtype of 'loop' because EUI/GUID/UUID is not unique. Fixes: 2079f41ec6ff ("nvme: check that EUI/GUID/UUID are globally unique") Signed-off-by: Alan Adamson <alan.adamson@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-06-30net: bonding: fix use-after-free after 802.3ad slave unbindYevhen Orlov1-1/+2
commit 0622cab0341c ("bonding: fix 802.3ad aggregator reselection"), resolve case, when there is several aggregation groups in the same bond. bond_3ad_unbind_slave will invalidate (clear) aggregator when __agg_active_ports return zero. So, ad_clear_agg can be executed even, when num_of_ports!=0. Than bond_3ad_unbind_slave can be executed again for, previously cleared aggregator. NOTE: at this time bond_3ad_unbind_slave will not update slave ports list, because lag_ports==NULL. So, here we got slave ports, pointing to freed aggregator memory. Fix with checking actual number of ports in group (as was before commit 0622cab0341c ("bonding: fix 802.3ad aggregator reselection") ), before ad_clear_agg(). The KASAN logs are as follows: [ 767.617392] ================================================================== [ 767.630776] BUG: KASAN: use-after-free in bond_3ad_state_machine_handler+0x13dc/0x1470 [ 767.638764] Read of size 2 at addr ffff00011ba9d430 by task kworker/u8:7/767 [ 767.647361] CPU: 3 PID: 767 Comm: kworker/u8:7 Tainted: G O 5.15.11 #15 [ 767.655329] Hardware name: DNI AmazonGo1 A7040 board (DT) [ 767.660760] Workqueue: lacp_1 bond_3ad_state_machine_handler [ 767.666468] Call trace: [ 767.668930] dump_backtrace+0x0/0x2d0 [ 767.672625] show_stack+0x24/0x30 [ 767.675965] dump_stack_lvl+0x68/0x84 [ 767.679659] print_address_description.constprop.0+0x74/0x2b8 [ 767.685451] kasan_report+0x1f0/0x260 [ 767.689148] __asan_load2+0x94/0xd0 [ 767.692667] bond_3ad_state_machine_handler+0x13dc/0x1470 Fixes: 0622cab0341c ("bonding: fix 802.3ad aggregator reselection") Co-developed-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu> Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu> Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20220629012914.361-1-yevhen.orlov@plvision.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30ipv6: fix lockdep splat in in6_dump_addrs()Eric Dumazet1-2/+2
As reported by syzbot, we should not use rcu_dereference() when rcu_read_lock() is not held. WARNING: suspicious RCU usage 5.19.0-rc2-syzkaller #0 Not tainted net/ipv6/addrconf.c:5175 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by syz-executor326/3617: #0: ffffffff8d5848e8 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xae/0xc20 net/netlink/af_netlink.c:2223 stack backtrace: CPU: 0 PID: 3617 Comm: syz-executor326 Not tainted 5.19.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 in6_dump_addrs+0x12d1/0x1790 net/ipv6/addrconf.c:5175 inet6_dump_addr+0x9c1/0xb50 net/ipv6/addrconf.c:5300 netlink_dump+0x541/0xc20 net/netlink/af_netlink.c:2275 __netlink_dump_start+0x647/0x900 net/netlink/af_netlink.c:2380 netlink_dump_start include/linux/netlink.h:245 [inline] rtnetlink_rcv_msg+0x73e/0xc90 net/core/rtnetlink.c:6046 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:734 ____sys_sendmsg+0x6eb/0x810 net/socket.c:2492 ___sys_sendmsg+0xf3/0x170 net/socket.c:2546 __sys_sendmsg net/socket.c:2575 [inline] __do_sys_sendmsg net/socket.c:2584 [inline] __se_sys_sendmsg net/socket.c:2582 [inline] __x64_sys_sendmsg+0x132/0x220 net/socket.c:2582 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: 88e2ca308094 ("mld: convert ifmcaddr6 to RCU") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20220628121248.858695-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30net: phy: ax88772a: fix lost pause advertisement configurationOleksij Rempel1-2/+4
In case of asix_ax88772a_link_change_notify() workaround, we run soft reset which will automatically clear MII_ADVERTISE configuration. The PHYlib framework do not know about changed configuration state of the PHY, so we need use phy_init_hw() to reinit PHY configuration. Fixes: dde258469257 ("net: usb/phy: asix: add support for ax88772A/C PHYs") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220628114349.3929928-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30net: phy: Don't trigger state machine while in suspendLukas Wunner3-0/+52
Upon system sleep, mdio_bus_phy_suspend() stops the phy_state_machine(), but subsequent interrupts may retrigger it: They may have been left enabled to facilitate wakeup and are not quiesced until the ->suspend_noirq() phase. Unwanted interrupts may hence occur between mdio_bus_phy_suspend() and dpm_suspend_noirq(), as well as between dpm_resume_noirq() and mdio_bus_phy_resume(). Retriggering the phy_state_machine() through an interrupt is not only undesirable for the reason given in mdio_bus_phy_suspend() (freezing it midway with phydev->lock held), but also because the PHY may be inaccessible after it's suspended: Accesses to USB-attached PHYs are blocked once usb_suspend_both() clears the can_submit flag and PHYs on PCI network cards may become inaccessible upon suspend as well. Amend phy_interrupt() to avoid triggering the state machine if the PHY is suspended. Signal wakeup instead if the attached net_device or its parent has been configured as a wakeup source. (Those conditions are identical to mdio_bus_phy_may_suspend().) Postpone handling of the interrupt until the PHY has resumed. Before stopping the phy_state_machine() in mdio_bus_phy_suspend(), wait for a concurrent phy_interrupt() to run to completion. That is necessary because phy_interrupt() may have checked the PHY's suspend status before the system sleep transition commenced and it may thus retrigger the state machine after it was stopped. Likewise, after re-enabling interrupt handling in mdio_bus_phy_resume(), wait for a concurrent phy_interrupt() to complete to ensure that interrupts which it postponed are properly rerun. The issue was exposed by commit 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling"), but has existed since forever. Fixes: 541cd3ee00a4 ("phylib: Fix deadlock on resume") Link: https://lore.kernel.org/netdev/a5315a8a-32c2-962f-f696-de9a26d30091@samsung.com/ Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: stable@vger.kernel.org # v2.6.33+ Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/b7f386d04e9b5b0e2738f0125743e30676f309ef.1656410895.git.lukas@wunner.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30usbnet: fix memory allocation in helpersOliver Neukum1-2/+2
usbnet provides some helper functions that are also used in the context of reset() operations. During a reset the other drivers on a device are unable to operate. As that can be block drivers, a driver for another interface cannot use paging in its memory allocations without risking a deadlock. Use GFP_NOIO in the helpers. Fixes: 877bd862f32b8 ("usbnet: introduce usbnet 3 command helpers") Signed-off-by: Oliver Neukum <oneukum@suse.com> Link: https://lore.kernel.org/r/20220628093517.7469-1-oneukum@suse.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30selftests net: fix kselftest net fatal errorColeman Dietsch1-1/+1
The incorrect path is causing the following error when trying to run net kselftests: In file included from bpf/nat6to4.c:43: ../../../lib/bpf/bpf_helpers.h:11:10: fatal error: 'bpf_helper_defs.h' file not found ^~~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: cf67838c4422 ("selftests net: fix bpf build error") Signed-off-by: Coleman Dietsch <dietschc@csp.edu> Link: https://lore.kernel.org/r/20220628174744.7908-1-dietschc@csp.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski5-32/+75
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Restore set counter when one of the CPU loses race to add elements to sets. 2) After NF_STOLEN, skb might be there no more, update nftables trace infra to avoid access to skb in this case. From Florian Westphal. 3) nftables bridge might register a prerouting hook with zero priority, br_netfilter incorrectly skips it. Also from Florian. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: br_netfilter: do not skip all hooks with 0 priority netfilter: nf_tables: avoid skb access on nf_stolen netfilter: nft_dynset: restore set element counter when failing to update ==================== Link: https://lore.kernel.org/r/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30Merge tag 'amd-drm-fixes-5.19-2022-06-29' of ↵Dave Airlie5-7/+6
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.19-2022-06-29: amdgpu: - GPU recovery fix - Fix integer type usage in fourcc header for AMD modifiers - KFD TLB flush fix for gfx9 APUs - Display fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220629192220.5870-1-alexander.deucher@amd.com
2022-06-30Merge tag 'drm-intel-fixes-2022-06-29' of ↵Dave Airlie3-24/+21
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.19-rc5: - Fix ioctl argument error return - Fix d3cold disable to allow PCI upstream bridge D3 transition - Fix setting cache_dirty for dma-buf objects on discrete Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/871qv7rblv.fsf@intel.com
2022-06-30dm raid: fix KASAN warning in raid5_add_disksMikulas Patocka1-0/+1
There's a KASAN warning in raid5_add_disk when running the LVM testsuite. The warning happens in the test lvconvert-raid-reshape-linear_to_raid6-single-type.sh. We fix the warning by verifying that rdev->saved_raid_disk is within limits. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-30dm raid: fix KASAN warning in raid5_remove_diskMikulas Patocka1-1/+4
There's a KASAN warning in raid5_remove_disk when running the LVM testsuite. We fix this warning by verifying that the "number" variable is within limits. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-06-30ata: pata_cs5535: Fix W=1 warningsJohn Garry1-2/+2
x86_64 allmodconfig build with W=1 gives these warnings: drivers/ata/pata_cs5535.c: In function ‘cs5535_set_piomode’: drivers/ata/pata_cs5535.c:93:11: error: variable ‘dummy’ set but not used [-Werror=unused-but-set-variable] u32 reg, dummy; ^~~~~ drivers/ata/pata_cs5535.c: In function ‘cs5535_set_dmamode’: drivers/ata/pata_cs5535.c:132:11: error: variable ‘dummy’ set but not used [-Werror=unused-but-set-variable] u32 reg, dummy; ^~~~~ cc1: all warnings being treated as errors Mark variables 'dummy' as "maybe unused" as they are only ever written in rdmsr() calls. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-06-30hwmon: (pmbus/ucd9200) fix typos in commentsJiang Jian1-1/+1
Drop the redundant word 'the' in the comments following /* * Set PHASE registers on all pages to 0xff to ensure that phase * specific commands will apply to all phases of a given page (rail). * This only affects the READ_IOUT and READ_TEMPERATURE2 registers. * READ_IOUT will return the sum of currents of all phases of a rail, * and READ_TEMPERATURE2 will return the maximum temperature detected * for the [the - DROP] phases of the rail. */ Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com> Link: https://lore.kernel.org/r/20220622063231.20612-1-jiangjian@cdjrlc.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-06-29hwmon: (occ) Prevent power cap command overwriting poll responseEddie James4-13/+15
Currently, the response to the power cap command overwrites the first eight bytes of the poll response, since the commands use the same buffer. This means that user's get the wrong data between the time of sending the power cap and the next poll response update. Fix this by specifying a different buffer for the power cap command response. Fixes: 5b5513b88002 ("hwmon: Add On-Chip Controller (OCC) hwmon driver") Signed-off-by: Eddie James <eajames@linux.ibm.com> Link: https://lore.kernel.org/r/20220628203029.51747-1-eajames@linux.ibm.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-06-29PM / devfreq: passive: revert an editing accident in SPDX-License lineLukas Bulwahn1-1/+1
Commit 26984d9d581e ("PM / devfreq: passive: Keep cpufreq_policy for possible cpus") reworked governor_passive.c, and accidently added a tab in the first line, i.e., the SPDX-License-Identifier line. The checkpatch script warns with the SPDX_LICENSE_TAG warning, and hence pointed this issue out while investigating checkpatch warnings. Revert this editing accident. No functional change. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2022-06-29PM / devfreq: Fix kernel warning with cpufreq passive register failChristian Marangi1-1/+0
Remove cpufreq_passive_unregister_notifier from cpufreq_passive_register_notifier in case of error as devfreq core already call unregister on GOV_START fail. This fix the kernel always printing a WARN on governor PROBE_DEFER as cpufreq_passive_unregister_notifier is called two times and return error on the second call as the cpufreq is already unregistered. Fixes: a03dacb0316f ("PM / devfreq: Add cpu based scaling support to passive governor") Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2022-06-29PM / devfreq: Rework freq_table to be local to devfreq structChristian Marangi3-44/+46
On a devfreq PROBE_DEFER, the freq_table in the driver profile struct, is never reset and may be leaved in an undefined state. This comes from the fact that we store the freq_table in the driver profile struct that is commonly defined as static and not reset on PROBE_DEFER. We currently skip the reinit of the freq_table if we found it's already defined since a driver may declare his own freq_table. This logic is flawed in the case devfreq core generate a freq_table, set it in the profile struct and then PROBE_DEFER, freeing the freq_table. In this case devfreq will found a NOT NULL freq_table that has been freed, skip the freq_table generation and probe the driver based on the wrong table. To fix this and correctly handle PROBE_DEFER, use a local freq_table and max_state in the devfreq struct and never modify the freq_table present in the profile struct if it does provide it. Fixes: 0ec09ac2cebe ("PM / devfreq: Set the freq_table of devfreq device") Cc: stable@vger.kernel.org Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2022-06-29PM / devfreq: exynos-ppmu: Fix refcount leak in of_get_devfreq_eventsMiaoqian Lin1-2/+6
of_get_child_by_name() returns a node pointer with refcount incremented, we should use of_node_put() on it when done. This function only calls of_node_put() in normal path, missing it in error paths. Add missing of_node_put() to avoid refcount leak. Fixes: f262f28c1470 ("PM / devfreq: event: Add devfreq_event class") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2022-06-29PM / devfreq: passive: Use HZ_PER_KHZ macro in units.hYicong Yang1-2/+1
HZ macros has been centralized in units.h since [1]. Use it to avoid duplicated definition. [1] commit e2c77032fcbe ("units: add the HZ macros") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2022-06-29PM / devfreq: Fix cpufreq passive unregister erroring on PROBE_DEFERChristian 'Ansuel' Marangi1-22/+17
With the passive governor, the cpu based scaling can PROBE_DEFER due to the fact that CPU policy are not ready. The cpufreq passive unregister notifier is called both from the GOV_START errors and for the GOV_STOP and assume the notifier is successfully registred every time. With GOV_START failing it's wrong to loop over each possible CPU since the register path has failed for some CPU policy not ready. Change the logic and unregister the notifer based on the current allocated parent_cpu_data list to correctly handle errors and the governor unregister path. Fixes: a03dacb0316f ("PM / devfreq: Add cpu based scaling support to passive governor") Signed-off-by: Christian 'Ansuel' Marangi <ansuelsmth@gmail.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2022-06-29PM / devfreq: Mute warning on governor PROBE_DEFERChristian 'Ansuel' Marangi1-2/+3
Don't print warning when a governor PROBE_DEFER as it's not a real GOV_START fail. Fixes: a03dacb0316f ("PM / devfreq: Add cpu based scaling support to passive governor") Signed-off-by: Christian 'Ansuel' Marangi <ansuelsmth@gmail.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2022-06-29PM / devfreq: Fix kernel panic with cpu based scaling to passive govChristian 'Ansuel' Marangi1-2/+1
The cpufreq passive register notifier can PROBE_DEFER and the devfreq struct is freed and then reallocaed on probe retry. The current logic assume that the code can't PROBE_DEFER so the devfreq struct in the this variable in devfreq_passive_data is assumed to be (if already set) always correct. This cause kernel panic as the code try to access the wrong address. To correctly handle this, update the this variable in devfreq_passive_data to the devfreq reallocated struct. Fixes: a03dacb0316f ("PM / devfreq: Add cpu based scaling support to passive governor") Signed-off-by: Christian 'Ansuel' Marangi <ansuelsmth@gmail.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
2022-06-29Revert "drm/amdgpu/display: set vblank_disable_immediate for DC"Alex Deucher2-3/+1
This reverts commit 92020e81ddbeac351ea4a19bcf01743f32b9c800. This causes stuttering and timeouts with DMCUB for some users so revert it until we understand why and safely enable it to save power. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1887 Acked-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Cc: stable@vger.kernel.org
2022-06-29drm/amdgpu: To flush tlb for MMHUB of RAVEN seriesRuili Ji1-1/+2
amdgpu: [mmhub0] no-retry page fault (src_id:0 ring:40 vmid:8 pasid:32769, for process test_basic pid 3305 thread test_basic pid 3305) amdgpu: in page starting at address 0x00007ff990003000 from IH client 0x12 (VMC) amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00840051 amdgpu: Faulty UTCL2 client ID: MP1 (0x0) amdgpu: MORE_FAULTS: 0x1 amdgpu: WALKER_ERROR: 0x0 amdgpu: PERMISSION_FAULTS: 0x5 amdgpu: MAPPING_ERROR: 0x0 amdgpu: RW: 0x1 When memory is allocated by kfd, no one triggers the tlb flush for MMHUB0. There is page fault from MMHUB0. v2:fix indentation v3:change subject and fix indentation Signed-off-by: Ruili Ji <ruiliji2@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-06-29drm/fourcc: fix integer type usage in uapi headerCarlos Llamas1-2/+2
Kernel uapi headers are supposed to use __[us]{8,16,32,64} types defined by <linux/types.h> as opposed to 'uint32_t' and similar. See [1] for the relevant discussion about this topic. In this particular case, the usage of 'uint64_t' escaped headers_check as these macros are not being called here. However, the following program triggers a compilation error: #include <drm/drm_fourcc.h> int main() { unsigned long x = AMD_FMT_MOD_CLEAR(RB); return 0; } gcc error: drm.c:5:27: error: ‘uint64_t’ undeclared (first use in this function) 5 | unsigned long x = AMD_FMT_MOD_CLEAR(RB); | ^~~~~~~~~~~~~~~~~ This patch changes AMD_FMT_MOD_{SET,CLEAR} macros to use the correct integer types, which fixes the above issue. [1] https://lkml.org/lkml/2019/6/5/18 Fixes: 8ba16d599374 ("drm/fourcc: Add AMD DRM modifiers.") Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Simon Ser <contact@emersion.fr> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-29drm/amdgpu: fix adev variable used in amdgpu_device_gpu_recover()Alex Deucher1-1/+1
Use the correct adev variable for the drm_fb_helper in amdgpu_device_gpu_recover(). Noticed by inspection. Fixes: 087451f372bf ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.") Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-06-29Merge tag 'platform-drivers-x86-v5.19-3' of ↵Linus Torvalds8-60/+127
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: - thinkpad_acpi/ideapad-laptop: mem-leak and platform-profile fixes - panasonic-laptop: missing hotkey presses regression fix - some hardware-id additions - some other small fixes * tag 'platform-drivers-x86-v5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: hp-wmi: Ignore Sanitization Mode event platform/x86: thinkpad_acpi: do not use PSC mode on Intel platforms platform/x86: thinkpad-acpi: profile capabilities as integer platform/x86: panasonic-laptop: filter out duplicate volume up/down/mute keypresses platform/x86: panasonic-laptop: don't report duplicate brightness key-presses platform/x86: panasonic-laptop: revert "Resolve hotkey double trigger bug" platform/x86: panasonic-laptop: sort includes alphabetically platform/x86: panasonic-laptop: de-obfuscate button codes ACPI: video: Change how we determine if brightness key-presses are handled platform/x86: ideapad-laptop: Add Ideapad 5 15ITL05 to ideapad_dytc_v4_allow_table[] platform/x86: ideapad-laptop: Add allow_v4_dytc module parameter platform/x86: thinkpad_acpi: Fix a memory leak of EFCH MMIO resource platform/mellanox: nvsw-sn2201: fix error code in nvsw_sn2201_create_static_devices() platform/x86: intel/pmc: Add Alder Lake N support to PMC core driver
2022-06-29Merge tag '5.19-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds4-28/+24
Pull ksmbd server fixes from Steve French: - seek null check (don't use f_seek op directly and blindly) - offset validation in FSCTL_SET_ZERO_DATA - fallocate fix (relates e.g. to xfstests generic/091 and 263) - two cleanup fixes - fix socket settings on some arch * tag '5.19-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: use vfs_llseek instead of dereferencing NULL ksmbd: check invalid FileOffset and BeyondFinalZero in FSCTL_ZERO_DATA ksmbd: set the range of bytes to zero without extending file size in FSCTL_ZERO_DATA ksmbd: remove duplicate flag set in smb2_write ksmbd: smbd: Remove useless license text when SPDX-License-Identifier is already used ksmbd: use SOCK_NONBLOCK type for kernel_accept()
2022-06-29ceph: wait on async create before checking caps for syncfsJeff Layton1-0/+1
Currently, we'll call ceph_check_caps, but if we're still waiting on the reply, we'll end up spinning around on the same inode in flush_dirty_session_caps. Wait for the async create reply before flushing caps. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/55823 Fixes: fbed7045f552 ("ceph: wait for async create reply before sending any cap messages") Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-06-29xfs: dont treat rt extents beyond EOF as eofblocks to be clearedDarrick J. Wong1-0/+2
On a system with a realtime volume and a 28k realtime extent, generic/491 fails because the test opens a file on a frozen filesystem and closing it causes xfs_release -> xfs_can_free_eofblocks to mistakenly think that the the blocks of the realtime extent beyond EOF are posteof blocks to be freed. Realtime extents cannot be partially unmapped, so this is pointless. Worse yet, this triggers posteof cleanup, which stalls on a transaction allocation, which is why the test fails. Teach the predicate to account for realtime extents properly. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-06-29xfs: don't hold xattr leaf buffers across transaction rollsDarrick J. Wong5-65/+12
Now that we've established (again!) that empty xattr leaf buffers are ok, we no longer need to bhold them to transactions when we're creating new leaf blocks. Get rid of the entire mechanism, which should simplify the xattr code quite a bit. The original justification for using bhold here was to prevent the AIL from trying to write the empty leaf block into the fs during the brief time that we release the buffer lock. The reason for /that/ was to prevent recovery from tripping over the empty ondisk block. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-06-29xfs: empty xattr leaf header blocks are not corruptionDarrick J. Wong1-9/+17
TLDR: Revert commit 51e6104fdb95 ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify") because it was wrong. Every now and then we get a corruption report from the kernel or xfs_repair about empty leaf blocks in the extended attribute structure. We've long thought that these shouldn't be possible, but prior to 5.18 one would shake loose in the recoveryloop fstests about once a month. A new addition to the xattr leaf block verifier in 5.19-rc1 makes this happen every 7 minutes on my testing cloud. I added a ton of logging to detect any time we set the header count on an xattr leaf block to zero. This produced the following dmesg output on generic/388: XFS (sda4): ino 0x21fcbaf leaf 0x129bf78 hdcount==0! Call Trace: <TASK> dump_stack_lvl+0x34/0x44 xfs_attr3_leaf_create+0x187/0x230 xfs_attr_shortform_to_leaf+0xd1/0x2f0 xfs_attr_set_iter+0x73e/0xa90 xfs_xattri_finish_update+0x45/0x80 xfs_attr_finish_item+0x1b/0xd0 xfs_defer_finish_noroll+0x19c/0x770 __xfs_trans_commit+0x153/0x3e0 xfs_attr_set+0x36b/0x740 xfs_xattr_set+0x89/0xd0 __vfs_setxattr+0x67/0x80 __vfs_setxattr_noperm+0x6e/0x120 vfs_setxattr+0x97/0x180 setxattr+0x88/0xa0 path_setxattr+0xc3/0xe0 __x64_sys_setxattr+0x27/0x30 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 So now we know that someone is creating empty xattr leaf blocks as part of converting a sf xattr structure into a leaf xattr structure. The conversion routine logs any existing sf attributes in the same transaction that creates the leaf block, so we know this is a setxattr to a file that has no attributes at all. Next, g/388 calls the shutdown ioctl and cycles the mount to trigger log recovery. I also augmented buffer item recovery to call ->verify_struct on any attr leaf blocks and complain if it finds a failure: XFS (sda4): Unmounting Filesystem XFS (sda4): Mounting V5 Filesystem XFS (sda4): Starting recovery (logdev: internal) XFS (sda4): xattr leaf daddr 0x129bf78 hdrcount == 0! Call Trace: <TASK> dump_stack_lvl+0x34/0x44 xfs_attr3_leaf_verify+0x3b8/0x420 xlog_recover_buf_commit_pass2+0x60a/0x6c0 xlog_recover_items_pass2+0x4e/0xc0 xlog_recover_commit_trans+0x33c/0x350 xlog_recovery_process_trans+0xa5/0xe0 xlog_recover_process_data+0x8d/0x140 xlog_do_recovery_pass+0x19b/0x720 xlog_do_log_recovery+0x62/0xc0 xlog_do_recover+0x33/0x1d0 xlog_recover+0xda/0x190 xfs_log_mount+0x14c/0x360 xfs_mountfs+0x517/0xa60 xfs_fs_fill_super+0x6bc/0x950 get_tree_bdev+0x175/0x280 vfs_get_tree+0x1a/0x80 path_mount+0x6f5/0xaa0 __x64_sys_mount+0x103/0x140 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7fc61e241eae And a moment later, the _delwri_submit of the recovered buffers trips the same verifier and recovery fails: XFS (sda4): Metadata corruption detected at xfs_attr3_leaf_verify+0x393/0x420 [xfs], xfs_attr3_leaf block 0x129bf78 XFS (sda4): Unmount and run xfs_repair XFS (sda4): First 128 bytes of corrupted metadata buffer: 00000000: 00 00 00 00 00 00 00 00 3b ee 00 00 00 00 00 00 ........;....... 00000010: 00 00 00 00 01 29 bf 78 00 00 00 00 00 00 00 00 .....).x........ 00000020: a5 1b d0 02 b2 9a 49 df 8e 9c fb 8d f8 31 3e 9d ......I......1>. 00000030: 00 00 00 00 02 1f cb af 00 00 00 00 10 00 00 00 ................ 00000040: 00 50 0f b0 00 00 00 00 00 00 00 00 00 00 00 00 .P.............. 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ XFS (sda4): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x37f/0x3b0 [xfs] (fs/xfs/xfs_buf.c:1518). Shutting down filesystem. XFS (sda4): Please unmount the filesystem and rectify the problem(s) XFS (sda4): log mount/recovery failed: error -117 XFS (sda4): log mount failed I think I see what's going on here -- setxattr is racing with something that shuts down the filesystem: Thread 1 Thread 2 -------- -------- xfs_attr_sf_addname xfs_attr_shortform_to_leaf <create empty leaf> xfs_trans_bhold(leaf) xattri_dela_state = XFS_DAS_LEAF_ADD <roll transaction> <flush log> <shut down filesystem> xfs_trans_bhold_release(leaf) <discover fs is dead, bail> Thread 3 -------- <cycle mount, start recovery> xlog_recover_buf_commit_pass2 xlog_recover_do_reg_buffer <replay empty leaf buffer from recovered buf item> xfs_buf_delwri_queue(leaf) xfs_buf_delwri_submit _xfs_buf_ioapply(leaf) xfs_attr3_leaf_write_verify <trip over empty leaf buffer> <fail recovery> As you can see, the bhold keeps the leaf buffer locked and thus prevents the *AIL* from tripping over the ichdr.count==0 check in the write verifier. Unfortunately, it doesn't prevent the log from getting flushed to disk, which sets up log recovery to fail. So. It's clear that the kernel has always had the ability to persist attr leaf blocks with ichdr.count==0, which means that it's part of the ondisk format now. Unfortunately, this check has been added and removed multiple times throughout history. It first appeared in[1] kernel 3.10 as part of the early V5 format patches. The check was later discovered to break log recovery and hence disabled[2] during log recovery in kernel 4.10. Simultaneously, the check was added[3] to xfs_repair 4.9.0 to try to weed out the empty leaf blocks. This was still not correct because log recovery would recover an empty attr leaf block successfully only for regular xattr operations to trip over the empty block during of the block during regular operation. Therefore, the check was removed entirely[4] in kernel 5.7 but removal of the xfs_repair check was forgotten. The continued complaints from xfs_repair lead to us mistakenly re-adding[5] the verifier check for kernel 5.19. Remove it once again. [1] 517c22207b04 ("xfs: add CRCs to attr leaf blocks") [2] 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier during log replay") [3] f7140161 ("xfs_repair: junk leaf attribute if count == 0") [4] f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf block") [5] 51e6104fdb95 ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify") Looking at the rest of the xattr code, it seems that files with empty leaf blocks behave as expected -- listxattr reports no attributes; getxattr on any xattr returns nothing as expected; removexattr does nothing; and setxattr can add attributes just fine. Original-bug: 517c22207b04 ("xfs: add CRCs to attr leaf blocks") Still-not-fixed-by: 2e1d23370e75 ("xfs: ignore leaf attr ichdr.count in verifier during log replay") Removed-in: f28cef9e4dac ("xfs: don't fail verifier on empty attr3 leaf block") Fixes: 51e6104fdb95 ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-06-29nvme: fix regression when disconnect a recovering ctrlRuozhu Li4-6/+19
We encountered a problem that the disconnect command hangs. After analyzing the log and stack, we found that the triggering process is as follows: CPU0 CPU1 nvme_rdma_error_recovery_work nvme_rdma_teardown_io_queues nvme_do_delete_ctrl nvme_stop_queues nvme_remove_namespaces --clear ctrl->namespaces nvme_start_queues --no ns in ctrl->namespaces nvme_ns_remove return(because ctrl is deleting) blk_freeze_queue blk_mq_freeze_queue_wait --wait for ns to unquiesce to clean infligt IO, hang forever This problem was not found in older kernels because we will flush err work in nvme_stop_ctrl before nvme_remove_namespaces.It does not seem to be modified for functional reasons, the patch can be revert to solve the problem. Revert commit 794a4cb3d2f7 ("nvme: remove the .stop_ctrl callout") Signed-off-by: Ruozhu Li <liruozhu@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>