summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-02-08blk-iocost: Fix an UBSAN shift-out-of-bounds warningTejun Heo1-0/+7
When iocg_kick_delay() is called from a CPU different than the one which set the delay, @now may be in the past of @iocg->delay_at leading to the following warning: UBSAN: shift-out-of-bounds in block/blk-iocost.c:1359:23 shift exponent 18446744073709 is too large for 64-bit type 'u64' (aka 'unsigned long long') ... Call Trace: <TASK> dump_stack_lvl+0x79/0xc0 __ubsan_handle_shift_out_of_bounds+0x2ab/0x300 iocg_kick_delay+0x222/0x230 ioc_rqos_merge+0x1d7/0x2c0 __rq_qos_merge+0x2c/0x80 bio_attempt_back_merge+0x83/0x190 blk_attempt_plug_merge+0x101/0x150 blk_mq_submit_bio+0x2b1/0x720 submit_bio_noacct_nocheck+0x320/0x3e0 __swap_writepage+0x2ab/0x9d0 The underflow itself doesn't really affect the behavior in any meaningful way; however, the past timestamp may exaggerate the delay amount calculated later in the code, which shouldn't be a material problem given the nature of the delay mechanism. If @now is in the past, this CPU is racing another CPU which recently set up the delay and there's nothing this CPU can contribute w.r.t. the delay. Let's bail early from iocg_kick_delay() in such cases. Reported-by: Breno Leitão <leitao@debian.org> Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 5160a5a53c0c ("blk-iocost: implement delay adjustment hysteresis") Link: https://lore.kernel.org/r/ZVvc9L_CYk5LO1fT@slm.duckdns.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08smb: client: set correct d_type for reparse points under DFS mountsPaulo Alcantara2-7/+14
Send query dir requests with an info level of SMB_FIND_FILE_FULL_DIRECTORY_INFO rather than SMB_FIND_FILE_DIRECTORY_INFO when the client is generating its own inode numbers (e.g. noserverino) so that reparse tags still can be parsed directly from the responses, but server won't send UniqueId (server inode number) Signed-off-by: Paulo Alcantara <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-08smb3: add missing null server pointer checkSteve French1-1/+1
Address static checker warning in cifs_ses_get_chan_index(): warn: variable dereferenced before check 'server' To be consistent, and reduce risk, we should add another check for null server pointer. Fixes: 88675b22d34e ("cifs: do not search for channel if server is terminating") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-08kprobes: Remove unnecessary initial values of variablesLi zeming1-2/+2
ri and sym is assigned first, so it does not need to initialize the assignment. Link: https://lore.kernel.org/all/20230919012823.7815-1-zeming@nfschina.com/ Signed-off-by: Li zeming <zeming@nfschina.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-02-08tracing/probes: Fix to set arg size and fmt after setting type from BTFMasami Hiramatsu (Google)1-12/+13
Since the BTF type setting updates probe_arg::type, the type size calculation and setting print-fmt should be done after that. Without this fix, the argument size and print-fmt can be wrong. Link: https://lore.kernel.org/all/170602218196.215583.6417859469540955777.stgit@devnote2/ Fixes: b576e09701c7 ("tracing/probes: Support function parameters if BTF is available") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-02-08tracing/probes: Fix to show a parse error for bad type for $commMasami Hiramatsu (Google)2-3/+7
Fix to show a parse error for bad type (non-string) for $comm/$COMM and immediate-string. With this fix, error_log file shows appropriate error message as below. /sys/kernel/tracing # echo 'p vfs_read $comm:u32' >> kprobe_events sh: write error: Invalid argument /sys/kernel/tracing # echo 'p vfs_read \"hoge":u32' >> kprobe_events sh: write error: Invalid argument /sys/kernel/tracing # cat error_log [ 30.144183] trace_kprobe: error: $comm and immediate-string only accepts string type Command: p vfs_read $comm:u32 ^ [ 62.618500] trace_kprobe: error: $comm and immediate-string only accepts string type Command: p vfs_read \"hoge":u32 ^ Link: https://lore.kernel.org/all/170602215411.215583.2238016352271091852.stgit@devnote2/ Fixes: 3dd1f7f24f8c ("tracing: probeevent: Fix to make the type of $comm string") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-02-08x86/vdso: Use CONFIG_COMPAT_32 to specify vdso32Masahiro Yamada2-4/+2
In arch/x86/Kconfig, COMPAT_32 is defined as (IA32_EMULATION || X86_32). Use it to eliminate redundancy in Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231121235701.239606-5-masahiroy@kernel.org
2024-02-08x86/vdso: Use $(addprefix ) instead of $(foreach )Masahiro Yamada1-3/+3
$(addprefix ) is slightly shorter and more intuitive. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231121235701.239606-4-masahiroy@kernel.org
2024-02-08x86/vdso: Simplify obj-y additionMasahiro Yamada1-12/+4
Add objects to obj-y in a more straightforward way. CONFIG_X86_32 and CONFIG_IA32_EMULATION are not enabled simultaneously, but even if they are, Kbuild graciously deduplicates obj-y entries. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231121235701.239606-3-masahiroy@kernel.org
2024-02-08x86/vdso: Consolidate targets and clean-filesMasahiro Yamada1-6/+1
'targets' and 'clean-files' do not need to list the same files because the files listed in 'targets' are cleaned up. Refactor the code. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231121235701.239606-2-masahiroy@kernel.org
2024-02-08Merge tag 'nf-24-02-08' of ↵Paolo Abeni15-102/+202
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Narrow down target/match revision to u8 in nft_compat. 2) Bail out with unused flags in nft_compat. 3) Restrict layer 4 protocol to u16 in nft_compat. 4) Remove static in pipapo get command that slipped through when reducing set memory footprint. 5) Follow up incremental fix for the ipset performance regression, this includes the missing gc cancellation, from Jozsef Kadlecsik. 6) Allow to filter by zone 0 in ctnetlink, do not interpret zone 0 as no filtering, from Felix Huettner. 7) Reject direction for NFT_CT_ID. 8) Use timestamp to check for set element expiration while transaction is handled to prevent garbage collection from removing set elements that were just added by this transaction. Packet path and netlink dump/get path still use current time to check for expiration. 9) Restore NF_REPEAT in nfnetlink_queue, from Florian Westphal. 10) map_index needs to be percpu and per-set, not just percpu. At this time its possible for a pipapo set to fill the all-zero part with ones and take the 'might have bits set' as 'start-from-zero' area. From Florian Westphal. This includes three patches: - Change scratchpad area to a structure that provides space for a per-set-and-cpu toggle and uses it of the percpu one. - Add a new free helper to prepare for the next patch. - Remove the scratch_aligned pointer and makes AVX2 implementation use the exact same memory addresses for read/store of the matching state. netfilter pull request 24-02-08 * tag 'nf-24-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_set_pipapo: remove scratch_aligned pointer netfilter: nft_set_pipapo: add helper to release pcpu scratch area netfilter: nft_set_pipapo: store index in scratch maps netfilter: nft_set_rbtree: skip end interval element from gc netfilter: nfnetlink_queue: un-break NF_REPEAT netfilter: nf_tables: use timestamp to check for set element timeout netfilter: nft_ct: reject direction for ct id netfilter: ctnetlink: fix filtering for zone 0 netfilter: ipset: Missing gc cancellations fixed netfilter: nft_set_pipapo: remove static in nft_pipapo_get() netfilter: nft_compat: restrict match/target protocol to u16 netfilter: nft_compat: reject unused compat flag netfilter: nft_compat: narrow down revision to unsigned 8-bits ==================== Link: https://lore.kernel.org/r/20240208112834.1433-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-08netfilter: nft_set_pipapo: remove scratch_aligned pointerFlorian Westphal3-39/+10
use ->scratch for both avx2 and the generic implementation. After previous change the scratch->map member is always aligned properly for AVX2, so we can just use scratch->map in AVX2 too. The alignoff delta is stored in the scratchpad so we can reconstruct the correct address to free the area again. Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation") Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08netfilter: nft_set_pipapo: add helper to release pcpu scratch areaFlorian Westphal1-5/+23
After next patch simple kfree() is not enough anymore, so add a helper for it. Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08netfilter: nft_set_pipapo: store index in scratch mapsFlorian Westphal3-26/+44
Pipapo needs a scratchpad area to keep state during matching. This state can be large and thus cannot reside on stack. Each set preallocates percpu areas for this. On each match stage, one scratchpad half starts with all-zero and the other is inited to all-ones. At the end of each stage, the half that starts with all-ones is always zero. Before next field is tested, pointers to the two halves are swapped, i.e. resmap pointer turns into fill pointer and vice versa. After the last field has been processed, pipapo stashes the index toggle in a percpu variable, with assumption that next packet will start with the all-zero half and sets all bits in the other to 1. This isn't reliable. There can be multiple sets and we can't be sure that the upper and lower half of all set scratch map is always in sync (lookups can be conditional), so one set might have swapped, but other might not have been queried. Thus we need to keep the index per-set-and-cpu, just like the scratchpad. Note that this bug fix is incomplete, there is a related issue. avx2 and normal implementation might use slightly different areas of the map array space due to the avx2 alignment requirements, so m->scratch (generic/fallback implementation) and ->scratch_aligned (avx) may partially overlap. scratch and scratch_aligned are not distinct objects, the latter is just the aligned address of the former. After this change, write to scratch_align->map_index may write to scratch->map, so this issue becomes more prominent, we can set to 1 a bit in the supposedly-all-zero area of scratch->map[]. A followup patch will remove the scratch_aligned and makes generic and avx code use the same (aligned) area. Its done in a separate change to ease review. Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08netfilter: nft_set_rbtree: skip end interval element from gcPablo Neira Ayuso1-3/+3
rbtree lazy gc on insert might collect an end interval element that has been just added in this transactions, skip end interval elements that are not yet active. Fixes: f718863aca46 ("netfilter: nft_set_rbtree: fix overlap expiration walk") Cc: stable@vger.kernel.org Reported-by: lonial con <kongln9170@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08netfilter: nfnetlink_queue: un-break NF_REPEATFlorian Westphal1-3/+10
Only override userspace verdict if the ct hook returns something other than ACCEPT. Else, this replaces NF_REPEAT (run all hooks again) with NF_ACCEPT (move to next hook). Fixes: 6291b3a67ad5 ("netfilter: conntrack: convert nf_conntrack_update to netfilter verdicts") Reported-by: l.6diay@passmail.com Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08netfilter: nf_tables: use timestamp to check for set element timeoutPablo Neira Ayuso5-15/+42
Add a timestamp field at the beginning of the transaction, store it in the nftables per-netns area. Update set backend .insert, .deactivate and sync gc path to use the timestamp, this avoids that an element expires while control plane transaction is still unfinished. .lookup and .update, which are used from packet path, still use the current time to check if the element has expired. And .get path and dump also since this runs lockless under rcu read size lock. Then, there is async gc which also needs to check the current time since it runs asynchronously from a workqueue. Fixes: c3e1b005ed1c ("netfilter: nf_tables: add set element timeout support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08netfilter: nft_ct: reject direction for ct idPablo Neira Ayuso1-0/+3
Direction attribute is ignored, reject it in case this ever needs to be supported Fixes: 3087c3f7c23b ("netfilter: nft_ct: Add ct id support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08netfilter: ctnetlink: fix filtering for zone 0Felix Huettner2-5/+50
previously filtering for the default zone would actually skip the zone filter and flush all zones. Fixes: eff3c558bb7e ("netfilter: ctnetlink: support filtering by zone") Reported-by: Ilya Maximets <i.maximets@ovn.org> Closes: https://lore.kernel.org/netdev/2032238f-31ac-4106-8f22-522e76df5a12@ovn.org/ Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08s390/qeth: Fix potential loss of L3-IP@ in case of network issuesAlexandra Winter1-3/+6
Symptom: In case of a bad cable connection (e.g. dirty optics) a fast sequence of network DOWN-UP-DOWN-UP could happen. UP triggers recovery of the qeth interface. In case of a second DOWN while recovery is still ongoing, it can happen that the IP@ of a Layer3 qeth interface is lost and will not be recovered by the second UP. Problem: When registration of IP addresses with Layer 3 qeth devices fails, (e.g. because of bad address format) the respective IP address is deleted from its hash-table in the driver. If registration fails because of a ENETDOWN condition, the address should stay in the hashtable, so a subsequent recovery can restore it. 3caa4af834df ("qeth: keep ip-address after LAN_OFFLINE failure") fixes this for registration failures during normal operation, but not during recovery. Solution: Keep L3-IP address in case of ENETDOWN in qeth_l3_recover_ip(). For consistency with qeth_l3_add_ip() we also keep it in case of EADDRINUSE, i.e. for some reason the card already/still has this address registered. Fixes: 4a71df50047f ("qeth: new qeth device driver") Cc: stable@vger.kernel.org Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://lore.kernel.org/r/20240206085849.2902775-1-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-08netfilter: ipset: Missing gc cancellations fixedJozsef Kadlecsik2-2/+4
The patch fdb8e12cc2cc ("netfilter: ipset: fix performance regression in swap operation") missed to add the calls to gc cancellations at the error path of create operations and at module unload. Also, because the half of the destroy operations now executed by a function registered by call_rcu(), neither NFNL_SUBSYS_IPSET mutex or rcu read lock is held and therefore the checking of them results false warnings. Fixes: 97f7cf1cd80e ("netfilter: ipset: fix performance regression in swap operation") Reported-by: syzbot+52bbc0ad036f6f0d4a25@syzkaller.appspotmail.com Reported-by: Brad Spengler <spender@grsecurity.net> Reported-by: Стас Ничипорович <stasn77@gmail.com> Tested-by: Brad Spengler <spender@grsecurity.net> Tested-by: Стас Ничипорович <stasn77@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08octeontx2-af: Initialize maps.Ratheesh Kannoth1-16/+15
kmalloc_array() without __GFP_ZERO flag does not initialize memory to zero. This causes issues. Use kcalloc() for maps and bitmap_zalloc() for bitmaps. Fixes: dd7842878633 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs") Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Reviewed-by: Brett Creeley <bcreeley@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240206024000.1070260-1-rkannoth@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-08Merge branch 'cpsw-enable-mac_managed_pm-to-fix-mdio'Paolo Abeni2-0/+5
Sinthu Raja says: ==================== CPSW: enable mac_managed_pm to fix mdio This patch fix the resume/suspend issue on CPSW interface. Reference from the foloowing patchwork: https://lore.kernel.org/netdev/20221014144729.1159257-2-shenwei.wang@nxp.com/T/ V1: https://patchwork.kernel.org/project/netdevbpf/patch/20240122083414.6246-1-sinthu.raja@ti.com/ V2: https://patchwork.kernel.org/project/netdevbpf/patch/20240122093326.7618-1-sinthu.raja@ti.com/ ==================== Link: https://lore.kernel.org/r/20240206005928.15703-1-sinthu.raja@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-08net: ethernet: ti: cpsw: enable mac_managed_pm to fix mdioSinthu Raja1-0/+2
The below commit introduced a WARN when phy state is not in the states: PHY_HALTED, PHY_READY and PHY_UP. commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") When cpsw resumes, there have port in PHY_NOLINK state, so the below warning comes out. Set mac_managed_pm be true to tell mdio that the phy resume/suspend is managed by the mac, to fix the following warning: WARNING: CPU: 0 PID: 965 at drivers/net/phy/phy_device.c:326 mdio_bus_phy_resume+0x140/0x144 CPU: 0 PID: 965 Comm: sh Tainted: G O 6.1.46-g247b2535b2 #1 Hardware name: Generic AM33XX (Flattened Device Tree) unwind_backtrace from show_stack+0x18/0x1c show_stack from dump_stack_lvl+0x24/0x2c dump_stack_lvl from __warn+0x84/0x15c __warn from warn_slowpath_fmt+0x1a8/0x1c8 warn_slowpath_fmt from mdio_bus_phy_resume+0x140/0x144 mdio_bus_phy_resume from dpm_run_callback+0x3c/0x140 dpm_run_callback from device_resume+0xb8/0x2b8 device_resume from dpm_resume+0x144/0x314 dpm_resume from dpm_resume_end+0x14/0x20 dpm_resume_end from suspend_devices_and_enter+0xd0/0x924 suspend_devices_and_enter from pm_suspend+0x2e0/0x33c pm_suspend from state_store+0x74/0xd0 state_store from kernfs_fop_write_iter+0x104/0x1ec kernfs_fop_write_iter from vfs_write+0x1b8/0x358 vfs_write from ksys_write+0x78/0xf8 ksys_write from ret_fast_syscall+0x0/0x54 Exception stack(0xe094dfa8 to 0xe094dff0) dfa0: 00000004 005c3fb8 00000001 005c3fb8 00000004 00000001 dfc0: 00000004 005c3fb8 b6f6bba0 00000004 00000004 0059edb8 00000000 00000000 dfe0: 00000004 bed918f0 b6f09bd3 b6e89a66 Cc: <stable@vger.kernel.org> # v6.0+ Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") Signed-off-by: Sinthu Raja <sinthu.raja@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-08net: ethernet: ti: cpsw_new: enable mac_managed_pm to fix mdioSinthu Raja1-0/+3
The below commit introduced a WARN when phy state is not in the states: PHY_HALTED, PHY_READY and PHY_UP. commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") When cpsw_new resumes, there have port in PHY_NOLINK state, so the below warning comes out. Set mac_managed_pm be true to tell mdio that the phy resume/suspend is managed by the mac, to fix the following warning: WARNING: CPU: 0 PID: 965 at drivers/net/phy/phy_device.c:326 mdio_bus_phy_resume+0x140/0x144 CPU: 0 PID: 965 Comm: sh Tainted: G O 6.1.46-g247b2535b2 #1 Hardware name: Generic AM33XX (Flattened Device Tree) unwind_backtrace from show_stack+0x18/0x1c show_stack from dump_stack_lvl+0x24/0x2c dump_stack_lvl from __warn+0x84/0x15c __warn from warn_slowpath_fmt+0x1a8/0x1c8 warn_slowpath_fmt from mdio_bus_phy_resume+0x140/0x144 mdio_bus_phy_resume from dpm_run_callback+0x3c/0x140 dpm_run_callback from device_resume+0xb8/0x2b8 device_resume from dpm_resume+0x144/0x314 dpm_resume from dpm_resume_end+0x14/0x20 dpm_resume_end from suspend_devices_and_enter+0xd0/0x924 suspend_devices_and_enter from pm_suspend+0x2e0/0x33c pm_suspend from state_store+0x74/0xd0 state_store from kernfs_fop_write_iter+0x104/0x1ec kernfs_fop_write_iter from vfs_write+0x1b8/0x358 vfs_write from ksys_write+0x78/0xf8 ksys_write from ret_fast_syscall+0x0/0x54 Exception stack(0xe094dfa8 to 0xe094dff0) dfa0: 00000004 005c3fb8 00000001 005c3fb8 00000004 00000001 dfc0: 00000004 005c3fb8 b6f6bba0 00000004 00000004 0059edb8 00000000 00000000 dfe0: 00000004 bed918f0 b6f09bd3 b6e89a66 Cc: <stable@vger.kernel.org> # v6.0+ Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") Fixes: fba863b81604 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM") Signed-off-by: Sinthu Raja <sinthu.raja@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-08gpio: remove GPIO device from the list unconditionally in error pathBartosz Golaszewski1-4/+4
Since commit 48e1b4d369cf ("gpiolib: remove the GPIO device from the list when it's unregistered") we remove the GPIO device entry from the global list (used to order devices by their GPIO ranges) when unregistering the chip, not when releasing the device. It will not happen when the last reference is put anymore. This means, we need to remove it in error path in gpiochip_add_data_with_key() unconditionally, without checking if the device's .release() callback is set. Fixes: 48e1b4d369cf ("gpiolib: remove the GPIO device from the list when it's unregistered") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2024-02-08netfilter: nft_set_pipapo: remove static in nft_pipapo_get()Pablo Neira Ayuso1-1/+1
This has slipped through when reducing memory footprint for set elements, remove it. Fixes: 9dad402b89e8 ("netfilter: nf_tables: expose opaque set element as struct nft_elem_priv") Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-02-08drm/xe: Remove TEST_VM_ASYNC_OPS_ERRORMatthew Brost2-35/+1
TEST_VM_ASYNC_OPS_ERROR is broken and unused. Remove for now and will pull back in a later time when it is used, fixed, and properly hidden behind a Kconfig option. Also fixup the supported flags value. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240206045010.2981051-1-matthew.brost@intel.com (cherry picked from commit d9890c028d66a9e1ee3cccaa081ab5aedcbfe431) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-02-08drm/xe/vm: don't ignore error when in_kthreadMatthew Auld1-4/+1
If GUP fails and we are in_kthread, we can have pinned = 0 and ret = 0. If that happens we call sg_alloc_append_table_from_pages() with n_pages = 0, which is not well behaved and can trigger: kernel BUG at include/linux/scatterlist.h:115! depending on if the pages array happens to be zeroed or not. Even if we don't hit that it crashes later when trying to dma_map the returned table. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240202171435.427630-2-matthew.auld@intel.com (cherry picked from commit 8087199cd5951c1eba26003b3e4296dbb2110adf) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-02-08drm/xe: Assume large page size if VMA not yet boundMatthew Brost1-1/+3
The calculation to determine max page size of a VMA during a REMAP operations assumes the VMA has been bound. This assumption is not true if the VMA is from an eariler operation in an array of binds. If a VMA has not been bound use the maximum page size which will ensure the previous / next REMAP operations are not incorrectly skipped. Fixes: 8f33b4f054fc ("drm/xe: Avoid doing rebinds") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240205231714.2956225-1-matthew.brost@intel.com (cherry picked from commit 5ad6af5c91e9b942c44b657122270d935db3a813) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-02-08drm/xe/display: Fix memleak in display initializationXiaoming Wang1-6/+0
intel_power_domains_init is called twice in xe_device_probe: 1) intel_power_domains_init() xe_display_init_nommio() xe_device_probe() 2) intel_power_domains_init() intel_display_driver_probe_noirq() xe_display_init_noirq() xe_device_probe() It needs remove one to avoid power_domains->power_wells double malloc. unreferenced object 0xffff88811150ee00 (size 512): comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s) hex dump (first 32 bytes): 10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff ................ ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8134b901>] __kmem_cache_alloc_node+0x1c1/0x2b0 [<ffffffff812c98b2>] __kmalloc+0x52/0x150 [<ffffffffa08b0033>] __set_power_wells+0xc3/0x360 [xe] [<ffffffffa08562fc>] xe_display_init_nommio+0x4c/0x70 [xe] [<ffffffffa07f0d1c>] xe_device_probe+0x3c/0x5a0 [xe] [<ffffffffa082e48f>] xe_pci_probe+0x33f/0x5a0 [xe] [<ffffffff817f2187>] local_pci_probe+0x47/0xa0 [<ffffffff817f3db3>] pci_device_probe+0xc3/0x1f0 [<ffffffff8192f2a2>] really_probe+0x1a2/0x410 [<ffffffff8192f598>] __driver_probe_device+0x78/0x160 [<ffffffff8192f6ae>] driver_probe_device+0x1e/0x90 [<ffffffff8192f92a>] __driver_attach+0xda/0x1d0 [<ffffffff8192c95c>] bus_for_each_dev+0x7c/0xd0 [<ffffffff8192e159>] bus_add_driver+0x119/0x220 [<ffffffff81930d00>] driver_register+0x60/0x120 [<ffffffffa05e50a0>] 0xffffffffa05e50a0 The call to intel_power_domains_cleanup() needs to stay where it is for now. The main issue is that while the init is called by the display side, shared by i915 and xe, the cleanup is called by a non-shared code path. Fixing that will be done as a separate commit. Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Signed-off-by: Xiaoming Wang <xiaoming.wang@intel.com> [ reword commit message and explain why the fini needs to stay where it is ] Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240202215658.561298-1-lucas.demarchi@intel.com (cherry picked from commit 86c99abb5f1b6fcd69fb268eeb2e34cb7c4f355c) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-02-08drm/xe: Map both mem.kernel_bb_pool and usm.bb_poolMatthew Brost2-6/+22
For integrated devices we need to map both mem.kernel_bb_pool and usm.bb_pool to be able to run batches from both pools. Fixes: a682b6a42d4d ("drm/xe: Support device page faults on integrated platforms") Tested-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Brian Welty <brian.welty@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240202033440.2351862-1-matthew.brost@intel.com (cherry picked from commit 72f86ed3c88933d6fa09b036de93621ea71097a7) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-02-08drm/xe: circumvent bogus stringop-overflow warningArnd Bergmann1-1/+1
gcc-13 warns about an array overflow that it sees but that is prevented by the "asid % NUM_PF_QUEUE" calculation: drivers/gpu/drm/xe/xe_gt_pagefault.c: In function 'xe_guc_pagefault_handler': include/linux/fortify-string.h:57:33: error: writing 16 bytes into a region of size 0 [-Werror=stringop-overflow=] include/linux/fortify-string.h:689:26: note: in expansion of macro '__fortify_memcpy_chk' 689 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ | ^~~~~~~~~~~~~~~~~~~~ drivers/gpu/drm/xe/xe_gt_pagefault.c:341:17: note: in expansion of macro 'memcpy' 341 | memcpy(pf_queue->data + pf_queue->tail, msg, len * sizeof(u32)); | ^~~~~~ drivers/gpu/drm/xe/xe_gt_types.h:102:25: note: at offset [1144, 265324] into destination object 'tile' of size 8 I found that rewriting the assignment using pointer addition rather than the equivalent array index calculation prevents the warning, so use that instead. I sent a bug report against gcc for the false positive warning. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113214 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240103114819.2913937-1-arnd@kernel.org (cherry picked from commit 774ef5dfc95578a9079426d5106076dcd59c4dfa) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-02-08drm/xe: Pick correct userptr VMA to repin on REMAP op failureMatthew Brost1-5/+17
A REMAP op is composed of 3 VMA's - unmap, prev map, and next map. When op_execute fails with -EAGAIN we need to update the local VMA pointer to the current op state and then repin the VMA if it is a userptr. Fixes a failure seen in xe_vm.munmap-style-unbind-userptr-one-partial. Fixes: b06d47be7c83 ("drm/xe: Port Xe to GPUVA") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240201004849.2219558-3-matthew.brost@intel.com (cherry picked from commit 447f74d223b4f6cbab74963bf1099050c15374ce) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-02-08drm/xe: Take a reference in xe_exec_queue_last_fence_get()Matthew Brost5-6/+11
Take a reference in xe_exec_queue_last_fence_get(). Also fix a reference counting underflow bug VM bind and unbind. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240201004849.2219558-2-matthew.brost@intel.com (cherry picked from commit a856b67a84169e065ebbeee50258936b1eacc9eb) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-02-08drm/xe: Fix loop in vm_bind_ioctl_ops_unwindMatthew Brost1-1/+1
The logic for the unwind loop is incorrect resulting in an infinite loop. Fix to unwind to go from the last operations list to he first. Fixes: 617eebb9c480 ("drm/xe: Fix array of binds") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240201175532.2303168-1-matthew.brost@intel.com (cherry picked from commit 3acc1ff1a72fce00cdbd3ef1c27108a967fd5616) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-02-08Merge tag 'v6.8-p3' of ↵Linus Torvalds3-4/+14
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "Fix regressions in cbc and algif_hash, as well as an older NULL-pointer dereference in ccp" * tag 'v6.8-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: algif_hash - Remove bogus SGL free on zero-length error path crypto: cbc - Ensure statesize is zero crypto: ccp - Fix null pointer dereference in __sev_platform_shutdown_locked
2024-02-08Merge tag 'percpu-for-6.8-rc4' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu Pull percpu fix from Dennis Zhou: - fix riscv wrong size passed to local_flush_tlb_range_asid() * tag 'percpu-for-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu: riscv: Fix wrong size passed to local_flush_tlb_range_asid()
2024-02-08nilfs2: fix potential bug in end_buffer_async_writeRyusuke Konishi1-3/+5
According to a syzbot report, end_buffer_async_write(), which handles the completion of block device writes, may detect abnormal condition of the buffer async_write flag and cause a BUG_ON failure when using nilfs2. Nilfs2 itself does not use end_buffer_async_write(). But, the async_write flag is now used as a marker by commit 7f42ec394156 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks") as a means of resolving double list insertion of dirty blocks in nilfs_lookup_dirty_data_buffers() and nilfs_lookup_node_buffers() and the resulting crash. This modification is safe as long as it is used for file data and b-tree node blocks where the page caches are independent. However, it was irrelevant and redundant to also introduce async_write for segment summary and super root blocks that share buffers with the backing device. This led to the possibility that the BUG_ON check in end_buffer_async_write would fail as described above, if independent writebacks of the backing device occurred in parallel. The use of async_write for segment summary buffers has already been removed in a previous change. Fix this issue by removing the manipulation of the async_write flag for the remaining super root block buffer. Link: https://lkml.kernel.org/r/20240203161645.4992-1-konishi.ryusuke@gmail.com Fixes: 7f42ec394156 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+5c04210f7c7f897c1e7f@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/00000000000019a97c05fd42f8c8@google.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-08mm/damon/sysfs-schemes: fix wrong DAMOS tried regions update timeout setupSeongJae Park1-1/+1
DAMON sysfs interface's update_schemes_tried_regions command has a timeout of two apply intervals of the DAMOS scheme. Having zero value DAMOS scheme apply interval means it will use the aggregation interval as the value. However, the timeout setup logic is mistakenly using the sampling interval insted of the aggregartion interval for the case. This could cause earlier-than-expected timeout of the command. Fix it. Link: https://lkml.kernel.org/r/20240202191956.88791-1-sj@kernel.org Fixes: 7d6fa31a2fd7 ("mm/damon/sysfs-schemes: add timeout for update_schemes_tried_regions") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> # 6.7.x Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-08nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()Ryusuke Konishi1-1/+7
Syzbot reported a hang issue in migrate_pages_batch() called by mbind() and nilfs_lookup_dirty_data_buffers() called in the log writer of nilfs2. While migrate_pages_batch() locks a folio and waits for the writeback to complete, the log writer thread that should bring the writeback to completion picks up the folio being written back in nilfs_lookup_dirty_data_buffers() that it calls for subsequent log creation and was trying to lock the folio. Thus causing a deadlock. In the first place, it is unexpected that folios/pages in the middle of writeback will be updated and become dirty. Nilfs2 adds a checksum to verify the validity of the log being written and uses it for recovery at mount, so data changes during writeback are suppressed. Since this is broken, an unclean shutdown could potentially cause recovery to fail. Investigation revealed that the root cause is that the wait for writeback completion in nilfs_page_mkwrite() is conditional, and if the backing device does not require stable writes, data may be modified without waiting. Fix these issues by making nilfs_page_mkwrite() wait for writeback to finish regardless of the stable write requirement of the backing device. Link: https://lkml.kernel.org/r/20240131145657.4209-1-konishi.ryusuke@gmail.com Fixes: 1d1d1a767206 ("mm: only enforce stable page writes if the backing device requires it") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+ee2ae68da3b22d04cd8d@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/00000000000047d819061004ad6c@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-08MAINTAINERS: Leo Yan has movedLeo Yan2-1/+2
I will lose access to my @linaro.org email address next week, update the MAINTAINERS file and map it in .mailmap with the new email address. Link: https://lkml.kernel.org/r/20240201021022.886-1-leo.yan@linux.dev Signed-off-by: Leo Yan <leo.yan@linux.dev> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-08mm/zswap: don't return LRU_SKIP if we have dropped lru lockChengming Zhou1-3/+1
LRU_SKIP can only be returned if we don't ever dropped lru lock, or we need to return LRU_RETRY to restart from the head of lru list. Otherwise, the iteration might continue from a cursor position that was freed while the locks were dropped. Actually we may need to introduce another LRU_STOP to really terminate the ongoing shrinking scan process, when we encounter a warm page already in the swap cache. The current list_lru implementation doesn't have this function to early break from __list_lru_walk_one. Link: https://lkml.kernel.org/r/20240126-zswap-writeback-race-v2-1-b10479847099@bytedance.com Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure") Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Cc: Chris Li <chriscli@google.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-08fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_superOscar Salvador1-2/+4
When configuring a hugetlb filesystem via the fsconfig() syscall, there is a possible NULL dereference in hugetlbfs_fill_super() caused by assigning NULL to ctx->hstate in hugetlbfs_parse_param() when the requested pagesize is non valid. E.g: Taking the following steps: fd = fsopen("hugetlbfs", FSOPEN_CLOEXEC); fsconfig(fd, FSCONFIG_SET_STRING, "pagesize", "1024", 0); fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); Given that the requested "pagesize" is invalid, ctxt->hstate will be replaced with NULL, losing its previous value, and we will print an error: ... ... case Opt_pagesize: ps = memparse(param->string, &rest); ctx->hstate = h; if (!ctx->hstate) { pr_err("Unsupported page size %lu MB\n", ps / SZ_1M); return -EINVAL; } return 0; ... ... This is a problem because later on, we will dereference ctxt->hstate in hugetlbfs_fill_super() ... ... sb->s_blocksize = huge_page_size(ctx->hstate); ... ... Causing below Oops. Fix this by replacing cxt->hstate value only when then pagesize is known to be valid. kernel: hugetlbfs: Unsupported page size 0 MB kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028 kernel: #PF: supervisor read access in kernel mode kernel: #PF: error_code(0x0000) - not-present page kernel: PGD 800000010f66c067 P4D 800000010f66c067 PUD 1b22f8067 PMD 0 kernel: Oops: 0000 [#1] PREEMPT SMP PTI kernel: CPU: 4 PID: 5659 Comm: syscall Tainted: G E 6.8.0-rc2-default+ #22 5a47c3fef76212addcc6eb71344aabc35190ae8f kernel: Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017 kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400 kernel: FS: 00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0 kernel: Call Trace: kernel: <TASK> kernel: ? __die_body+0x1a/0x60 kernel: ? page_fault_oops+0x16f/0x4a0 kernel: ? search_bpf_extables+0x65/0x70 kernel: ? fixup_exception+0x22/0x310 kernel: ? exc_page_fault+0x69/0x150 kernel: ? asm_exc_page_fault+0x22/0x30 kernel: ? __pfx_hugetlbfs_fill_super+0x10/0x10 kernel: ? hugetlbfs_fill_super+0xb4/0x1a0 kernel: ? hugetlbfs_fill_super+0x28/0x1a0 kernel: ? __pfx_hugetlbfs_fill_super+0x10/0x10 kernel: vfs_get_super+0x40/0xa0 kernel: ? __pfx_bpf_lsm_capable+0x10/0x10 kernel: vfs_get_tree+0x25/0xd0 kernel: vfs_cmd_create+0x64/0xe0 kernel: __x64_sys_fsconfig+0x395/0x410 kernel: do_syscall_64+0x80/0x160 kernel: ? syscall_exit_to_user_mode+0x82/0x240 kernel: ? do_syscall_64+0x8d/0x160 kernel: ? syscall_exit_to_user_mode+0x82/0x240 kernel: ? do_syscall_64+0x8d/0x160 kernel: ? exc_page_fault+0x69/0x150 kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76 kernel: RIP: 0033:0x7ffbc0cb87c9 kernel: Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 96 0d 00 f7 d8 64 89 01 48 kernel: RSP: 002b:00007ffc29d2f388 EFLAGS: 00000206 ORIG_RAX: 00000000000001af kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffbc0cb87c9 kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003 kernel: RBP: 00007ffc29d2f3b0 R08: 0000000000000000 R09: 0000000000000000 kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 kernel: R13: 00007ffc29d2f4c0 R14: 0000000000000000 R15: 0000000000000000 kernel: </TASK> kernel: Modules linked in: rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) sunrpc(E) netfs(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) intel_rapl_msr(E) intel_rapl_common(E) iTCO_wdt(E) intel_pmc_bxt(E) sb_edac(E) iTCO_vendor_support(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) rfkill(E) ipmi_ssif(E) kvm(E) acpi_ipmi(E) irqbypass(E) pcspkr(E) igb(E) ipmi_si(E) mei_me(E) i2c_i801(E) joydev(E) intel_pch_thermal(E) i2c_smbus(E) dca(E) lpc_ich(E) mei(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_pad(E) tiny_power_button(E) button(E) fuse(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) ext4(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sd_mod(E) t10_pi(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) polyval_clmulni(E) ahci(E) xhci_pci(E) polyval_generic(E) gf128mul(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) xhci_pci_renesas(E) libahci(E) ehci_pci(E) sha1_ssse3(E) xhci_hcd(E) ehci_hcd(E) libata(E) kernel: mgag200(E) i2c_algo_bit(E) usbcore(E) wmi(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E) aesni_intel(E) crypto_simd(E) cryptd(E) kernel: Unloaded tainted modules: acpi_cpufreq(E):1 fjes(E):1 kernel: CR2: 0000000000000028 kernel: ---[ end trace 0000000000000000 ]--- kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400 kernel: FS: 00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0 Link: https://lkml.kernel.org/r/20240130210418.3771-1-osalvador@suse.de Fixes: 32021982a324 ("hugetlbfs: Convert to fs_context") Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-08mailmap: switch email address for John MoonJohn Moon1-0/+1
Add current email address as QUIC email is no longer active. Link: https://lkml.kernel.org/r/20240131034311.46706-1-john@jmoon.dev Signed-off-by: John Moon <john@jmoon.dev> Acked-by: Trilok Soni <quic_tsoni@quicinc.com> Cc: Elliot Berman <quic_eberman@quicinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-08mm: zswap: fix objcg use-after-free in entry destructionJohannes Weiner1-4/+4
In the per-memcg LRU universe, LRU removal uses entry->objcg to determine which list count needs to be decreased. Drop the objcg reference after updating the LRU, to fix a possible use-after-free. Link: https://lkml.kernel.org/r/20240130013438.565167-1-hannes@cmpxchg.org Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-08mm/madvise: don't forget to leave lazy MMU mode in ↵Sergey Senozhatsky1-0/+1
madvise_cold_or_pageout_pte_range() We need to leave lazy MMU mode before unlocking. Link: https://lkml.kernel.org/r/20240126032608.355899-1-senozhatsky@chromium.org Fixes: b2f557a21bc8 ("mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range()") Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Jiexun Wang <wangjiexun@tinylab.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-08arch/arm/mm: fix major fault accounting when retrying under per-VMA lockSuren Baghdasaryan1-0/+2
The change [1] missed ARM architecture when fixing major fault accounting for page fault retry under per-VMA lock. The user-visible effects is that it restores correct major fault accounting that was broken after [2] was merged in 6.7 kernel. The more detailed description is in [3] and this patch simply adds the same fix to ARM architecture which I missed in [3]. Add missing code to fix ARM architecture fault accounting. [1] 46e714c729c8 ("arch/mm/fault: fix major fault accounting when retrying under per-VMA lock") [2] https://lore.kernel.org/all/20231006195318.4087158-6-willy@infradead.org/ [3] https://lore.kernel.org/all/20231226214610.109282-1-surenb@google.com/ Link: https://lkml.kernel.org/r/20240123064305.2829244-1-surenb@google.com Fixes: 12214eba1992 ("mm: handle read faults under the VMA lock") Reported-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-08selftests: core: include linux/close_range.h for CLOSE_RANGE_* macrosMuhammad Usama Anjum1-0/+1
Correct header file is needed for getting CLOSE_RANGE_* macros. Previously it was tested with newer glibc which didn't show the need to include the header which was a mistake. Link: https://lkml.kernel.org/r/20231024155137.219700-1-usama.anjum@collabora.com Fixes: ec54424923cf ("selftests: core: remove duplicate defines") Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com> Link: https://lore.kernel.org/all/7161219e-0223-d699-d6f3-81abd9abf13b@arm.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-08mm/memory-failure: fix crash in split_huge_page_to_list from soft_offline_pageMiaohe Lin1-0/+3
When I did soft offline stress test, a machine was observed to crash with the following message: kernel BUG at include/linux/memcontrol.h:554! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 5 PID: 3837 Comm: hwpoison.sh Not tainted 6.7.0-next-20240112-00001-g8ecf3e7fb7c8-dirty #97 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 RIP: 0010:folio_memcg+0xaf/0xd0 Code: 10 5b 5d c3 cc cc cc cc 48 c7 c6 08 b1 f2 b2 48 89 ef e8 b4 c5 f8 ff 90 0f 0b 48 c7 c6 d0 b0 f2 b2 48 89 ef e8 a2 c5 f8 ff 90 <0f> 0b 48 c7 c6 08 b1 f2 b2 48 89 ef e8 90 c5 f8 ff 90 0f 0b 66 66 RSP: 0018:ffffb6c043657c98 EFLAGS: 00000296 RAX: 000000000000004b RBX: ffff932bc1d1e401 RCX: ffff933abfb5c908 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff933abfb5c900 RBP: ffffea6f04019080 R08: ffffffffb3338ce8 R09: 0000000000009ffb R10: 00000000000004dd R11: ffffffffb3308d00 R12: ffffea6f04019080 R13: ffffea6f04019080 R14: 0000000000000001 R15: ffffb6c043657da0 FS: 00007f6c60f6b740(0000) GS:ffff933abfb40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000559c3bc8b980 CR3: 0000000107f1c000 CR4: 00000000000006f0 Call Trace: <TASK> split_huge_page_to_list+0x4d/0x1380 try_to_split_thp_page+0x3a/0xf0 soft_offline_page+0x1ea/0x8a0 soft_offline_page_store+0x52/0x90 kernfs_fop_write_iter+0x118/0x1b0 vfs_write+0x30b/0x430 ksys_write+0x5e/0xe0 do_syscall_64+0xb0/0x1b0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 RIP: 0033:0x7f6c60d14697 Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007ffe9b72b8d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f6c60d14697 RDX: 000000000000000c RSI: 0000559c3bc8b980 RDI: 0000000000000001 RBP: 0000559c3bc8b980 R08: 00007f6c60dd1460 R09: 000000007fffffff R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c R13: 00007f6c60e1a780 R14: 00007f6c60e16600 R15: 00007f6c60e15a00 The problem is that page->mapping is overloaded with slab->slab_list or slabs fields now, so slab pages could be taken as non-LRU movable pages if field slabs contains PAGE_MAPPING_MOVABLE or slab_list->prev is set to LIST_POISON2. These slab pages will be treated as thp later leading to crash in split_huge_page_to_list(). Link: https://lkml.kernel.org/r/20240126065837.2100184-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20240124084014.1772906-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Fixes: 130d4df57390 ("mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head") Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>