summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2023-07-23ksmbd: fix out of bounds read in smb2_sess_setupNamjae Jeon1-14/+17
commit 98422bdd4cb3ca4d08844046f6507d7ec2c2b8d8 upstream. ksmbd does not consider the case of that smb2 session setup is in compound request. If this is the second payload of the compound, OOB read issue occurs while processing the first payload in the smb2_sess_setup(). Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21355 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23ksmbd: add missing compound request handing in some commandsNamjae Jeon1-25/+53
commit 7b7d709ef7cf285309157fb94c33f625dd22c5e1 upstream. This patch add the compound request handling to the some commands. Existing clients do not send these commands as compound requests, but ksmbd should consider that they may come. Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19xfs: fix xfs_inodegc_stop racing with mod_delayed_workDarrick J. Wong1-5/+27
commit 2254a7396a0ca6309854948ee1c0a33fa4268cec upstream. syzbot reported this warning from the faux inodegc shrinker that tries to kick off inodegc work: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 102 at kernel/workqueue.c:1445 __queue_work+0xd44/0x1120 kernel/workqueue.c:1444 RIP: 0010:__queue_work+0xd44/0x1120 kernel/workqueue.c:1444 Call Trace: __queue_delayed_work+0x1c8/0x270 kernel/workqueue.c:1672 mod_delayed_work_on+0xe1/0x220 kernel/workqueue.c:1746 xfs_inodegc_shrinker_scan fs/xfs/xfs_icache.c:2212 [inline] xfs_inodegc_shrinker_scan+0x250/0x4f0 fs/xfs/xfs_icache.c:2191 do_shrink_slab+0x428/0xaa0 mm/vmscan.c:853 shrink_slab+0x175/0x660 mm/vmscan.c:1013 shrink_one+0x502/0x810 mm/vmscan.c:5343 shrink_many mm/vmscan.c:5394 [inline] lru_gen_shrink_node mm/vmscan.c:5511 [inline] shrink_node+0x2064/0x35f0 mm/vmscan.c:6459 kswapd_shrink_node mm/vmscan.c:7262 [inline] balance_pgdat+0xa02/0x1ac0 mm/vmscan.c:7452 kswapd+0x677/0xd60 mm/vmscan.c:7712 kthread+0x2e8/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 This warning corresponds to this code in __queue_work: /* * For a draining wq, only works from the same workqueue are * allowed. The __WQ_DESTROYING helps to spot the issue that * queues a new work item to a wq after destroy_workqueue(wq). */ if (unlikely(wq->flags & (__WQ_DESTROYING | __WQ_DRAINING) && WARN_ON_ONCE(!is_chained_work(wq)))) return; For this to trip, we must have a thread draining the inodedgc workqueue and a second thread trying to queue inodegc work to that workqueue. This can happen if freezing or a ro remount race with reclaim poking our faux inodegc shrinker and another thread dropping an unlinked O_RDONLY file: Thread 0 Thread 1 Thread 2 xfs_inodegc_stop xfs_inodegc_shrinker_scan xfs_is_inodegc_enabled <yes, will continue> xfs_clear_inodegc_enabled xfs_inodegc_queue_all <list empty, do not queue inodegc worker> xfs_inodegc_queue <add to list> xfs_is_inodegc_enabled <no, returns> drain_workqueue <set WQ_DRAINING> llist_empty <no, will queue list> mod_delayed_work_on(..., 0) __queue_work <sees WQ_DRAINING, kaboom> In other words, everything between the access to inodegc_enabled state and the decision to poke the inodegc workqueue requires some kind of coordination to avoid the WQ_DRAINING state. We could perhaps introduce a lock here, but we could also try to eliminate WQ_DRAINING from the picture. We could replace the drain_workqueue call with a loop that flushes the workqueue and queues workers as long as there is at least one inode present in the per-cpu inodegc llists. We've disabled inodegc at this point, so we know that the number of queued inodes will eventually hit zero as long as xfs_inodegc_start cannot reactivate the workers. There are four callers of xfs_inodegc_start. Three of them come from the VFS with s_umount held: filesystem thawing, failed filesystem freezing, and the rw remount transition. The fourth caller is mounting rw (no remount or freezing possible). There are three callers ofs xfs_inodegc_stop. One is unmounting (no remount or thaw possible). Two of them come from the VFS with s_umount held: fs freezing and ro remount transition. Hence, it is correct to replace the drain_workqueue call with a loop that drains the inodegc llists. Fixes: 6191cf3ad59f ("xfs: flush inodegc workqueue tasks before cancel") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19xfs: disable reaping in fscounters scrubDarrick J. Wong5-38/+6
commit 2d5f38a31980d7090f5bf91021488dc61a0ba8ee upstream. The fscounters scrub code doesn't work properly because it cannot quiesce updates to the percpu counters in the filesystem, hence it returns false corruption reports. This has been fixed properly in one of the online repair patchsets that are under review by replacing the xchk_disable_reaping calls with an exclusive filesystem freeze. Disabling background gc isn't sufficient to fix the problem. In other words, scrub doesn't need to call xfs_inodegc_stop, which is just as well since it wasn't correct to allow scrub to call xfs_inodegc_start when something else could be calling xfs_inodegc_stop (e.g. trying to freeze the filesystem). Neuter the scrubber for now, and remove the xchk_*_reaping functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19xfs: check that per-cpu inodegc workers actually run on that cpuDarrick J. Wong3-0/+8
commit b37c4c8339cd394ea6b8b415026603320a185651 upstream. Now that we've allegedly worked out the problem of the per-cpu inodegc workers being scheduled on the wrong cpu, let's put in a debugging knob to let us know if a worker ever gets mis-scheduled again. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19xfs: explicitly specify cpu when forcing inodegc delayed work to run immediatelyDarrick J. Wong1-2/+4
commit 03e0add80f4cf3f7393edb574eeb3a89a1db7758 upstream. I've been noticing odd racing behavior in the inodegc code that could only be explained by one cpu adding an inode to its inactivation llist at the same time that another cpu is processing that cpu's llist. Preemption is disabled between get/put_cpu_ptr, so the only explanation is scheduler mayhem. I inserted the following debug code into xfs_inodegc_worker (see the next patch): ASSERT(gc->cpu == smp_processor_id()); This assertion tripped during overnight tests on the arm64 machines, but curiously not on x86_64. I think we haven't observed any resource leaks here because the lockfree list code can handle simultaneous llist_add and llist_del_all functions operating on the same list. However, the whole point of having percpu inodegc lists is to take advantage of warm memory caches by inactivating inodes on the last processor to touch the inode. The incorrect scheduling seems to occur after an inodegc worker is subjected to mod_delayed_work(). This wraps mod_delayed_work_on with WORK_CPU_UNBOUND specified as the cpu number. Unbound allows for scheduling on any cpu, not necessarily the same one that scheduled the work. Because preemption is disabled for as long as we have the gc pointer, I think it's safe to use current_cpu() (aka smp_processor_id) to queue the delayed work item on the correct cpu. Fixes: 7cf2b0f9611b ("xfs: bound maximum wait time for inodegc work") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19fs: no need to check sourceJan Kara1-2/+1
commit 66d8fc0539b0d49941f313c9509a8384e4245ac1 upstream. The @source inode must be valid. It is even checked via IS_SWAPFILE() above making it pretty clear. So no need to check it when we unlock. What doesn't need to exist is the @target inode. The lock_two_inodes() helper currently swaps the @inode1 and @inode2 arguments if @inode1 is NULL to have consistent lock class usage. However, we know that at least for vfs_rename() that @inode1 is @source and thus is never NULL as per above. We also know that @source is a different inode than @target as that is checked right at the beginning of vfs_rename(). So we know that @source is valid and locked and that @target is locked. So drop the check whether @source is non-NULL. Fixes: 28eceeda130f ("fs: Lock moved directories") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202307030026.9sE2pk2x-lkp@intel.com Message-Id: <20230703-vfs-rename-source-v1-1-37eebb29b65b@kernel.org> [brauner: use commit message from patch I sent concurrently] Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block()Filipe Manana1-2/+7
commit 40b0a749388517de244643c09bdbb98f7dcb6ef1 upstream. At __btrfs_cow_block(), instead of doing a BUG_ON() in case we fail to record a tree mod log root insertion operation, do a transaction abort instead. There's really no need for the BUG_ON(), we can properly release all resources in this context and turn the filesystem to RO mode and in an error state instead. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19btrfs: fix extent buffer leak after tree mod log failure at split_node()Filipe Manana1-0/+2
commit ede600e497b1461d06d22a7d17703d9096868bc3 upstream. At split_node(), if we fail to log the tree mod log copy operation, we return without unlocking the split extent buffer we just allocated and without decrementing the reference we own on it. Fix this by unlocking it and decrementing the ref count before returning. Fixes: 5de865eebb83 ("Btrfs: fix tree mod logging") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19btrfs: fix race when deleting quota root from the dirty cow roots listFilipe Manana1-0/+2
commit b31cb5a6eb7a48b0a7bfdf06832b1fd5088d8c79 upstream. When disabling quotas we are deleting the quota root from the list fs_info->dirty_cowonly_roots without taking the lock that protects it, which is struct btrfs_fs_info::trans_lock. This unsynchronized list manipulation may cause chaos if there's another concurrent manipulation of this list, such as when adding a root to it with ctree.c:add_root_to_dirty_list(). This can result in all sorts of weird failures caused by a race, such as the following crash: [337571.278245] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP PTI [337571.278933] CPU: 1 PID: 115447 Comm: btrfs Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1 [337571.279153] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [337571.279572] RIP: 0010:commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.279928] Code: 85 38 06 00 (...) [337571.280363] RSP: 0018:ffff9f63446efba0 EFLAGS: 00010206 [337571.280582] RAX: ffff942d98ec2638 RBX: ffff9430b82b4c30 RCX: 0000000449e1c000 [337571.280798] RDX: dead000000000100 RSI: ffff9430021e4900 RDI: 0000000000036070 [337571.281015] RBP: ffff942d98ec2000 R08: ffff942d98ec2000 R09: 000000000000015b [337571.281254] R10: 0000000000000009 R11: 0000000000000001 R12: ffff942fe8fbf600 [337571.281476] R13: ffff942dabe23040 R14: ffff942dabe20800 R15: ffff942d92cf3b48 [337571.281723] FS: 00007f478adb7340(0000) GS:ffff94349fa40000(0000) knlGS:0000000000000000 [337571.281950] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [337571.282184] CR2: 00007f478ab9a3d5 CR3: 000000001e02c001 CR4: 0000000000370ee0 [337571.282416] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [337571.282647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [337571.282874] Call Trace: [337571.283101] <TASK> [337571.283327] ? __die_body+0x1b/0x60 [337571.283570] ? die_addr+0x39/0x60 [337571.283796] ? exc_general_protection+0x22e/0x430 [337571.284022] ? asm_exc_general_protection+0x22/0x30 [337571.284251] ? commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.284531] btrfs_commit_transaction+0x42e/0xf90 [btrfs] [337571.284803] ? _raw_spin_unlock+0x15/0x30 [337571.285031] ? release_extent_buffer+0x103/0x130 [btrfs] [337571.285305] reset_balance_state+0x152/0x1b0 [btrfs] [337571.285578] btrfs_balance+0xa50/0x11e0 [btrfs] [337571.285864] ? __kmem_cache_alloc_node+0x14a/0x410 [337571.286086] btrfs_ioctl+0x249a/0x3320 [btrfs] [337571.286358] ? mod_objcg_state+0xd2/0x360 [337571.286577] ? refill_obj_stock+0xb0/0x160 [337571.286798] ? seq_release+0x25/0x30 [337571.287016] ? __rseq_handle_notify_resume+0x3ba/0x4b0 [337571.287235] ? percpu_counter_add_batch+0x2e/0xa0 [337571.287455] ? __x64_sys_ioctl+0x88/0xc0 [337571.287675] __x64_sys_ioctl+0x88/0xc0 [337571.287901] do_syscall_64+0x38/0x90 [337571.288126] entry_SYSCALL_64_after_hwframe+0x72/0xdc [337571.288352] RIP: 0033:0x7f478aaffe9b So fix this by locking struct btrfs_fs_info::trans_lock before deleting the quota root from that list. Fixes: bed92eae26cc ("Btrfs: qgroup implementation and prototypes") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19btrfs: reinsert BGs failed to reclaimNaohiro Aota1-0/+2
commit 7e27180994383b7c741ad87749db01e4989a02ba upstream. The reclaim process can temporarily fail. For example, if the space is getting tight, it fails to make the block group read-only. If there are no further writes on that block group, the block group will never get back to the reclaim list, and the BG never gets reclaimed. In a certain workload, we can leave many such block groups never reclaimed. So, let's get it back to the list and give it a chance to be reclaimed. Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones") CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19btrfs: add block-group tree to lockdep classesDavid Sterba1-2/+3
commit 1a1b0e729d227f9f758f7b5f1c997e874e94156e upstream. The block group tree was not present among the lockdep classes. We could get potentially lockdep warnings but so far none has been seen, also because block-group-tree is a relatively new feature. CC: stable@vger.kernel.org # 6.1+ Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19btrfs: bail out reclaim process if filesystem is read-onlyNaohiro Aota1-2/+9
commit 93463ff7b54626f8276c0bd3d3f968fbf8d5d380 upstream. When a filesystem is read-only, we cannot reclaim a block group as it cannot rewrite the data. Just bail out in that case. Note that it can drop block groups in this case. As we did sb_start_write(), read-only filesystem means we got a fatal error and forced read-only. There is no chance to reclaim them again. Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19btrfs: delete unused BGs while reclaiming BGsNaohiro Aota1-0/+14
commit 3ed01616bad6c7e3de196676b542ae3df8058592 upstream. The reclaiming process only starts after the filesystem volumes are allocated to a certain level (75% by default). Thus, the list of reclaiming target block groups can build up so huge at the time the reclaim process kicks in. On a test run, there were over 1000 BGs in the reclaim list. As the reclaim involves rewriting the data, it takes really long time to reclaim the BGs. While the reclaim is running, btrfs_delete_unused_bgs() won't proceed because the reclaim side is holding fs_info->reclaim_bgs_lock. As a result, we will have a large number of unused BGs kept in the unused list. On my test run, I got 1057 unused BGs. Since deleting a block group is relatively easy and fast work, we can call btrfs_delete_unused_bgs() while it reclaims BGs, to avoid building up unused BGs. Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19btrfs: add handling for RAID1C23/DUP to btrfs_reduce_alloc_profileMatt Corallo1-1/+8
commit 160fe8f6fdb13da6111677be6263e5d65e875987 upstream. Callers of `btrfs_reduce_alloc_profile` expect it to return exactly one allocation profile flag, and failing to do so may ultimately result in a WARN_ON and remount-ro when allocating new blocks, like the below transaction abort on 6.1. `btrfs_reduce_alloc_profile` has two ways of determining the profile, first it checks if a conversion balance is currently running and uses the profile we're converting to. If no balance is currently running, it returns the max-redundancy profile which at least one block in the selected block group has. This works by simply checking each known allocation profile bit in redundancy order. However, `btrfs_reduce_alloc_profile` has not been updated as new flags have been added - first with the `DUP` profile and later with the RAID1C34 profiles. Because of the way it checks, if we have blocks with different profiles and at least one is known, that profile will be selected. However, if none are known we may return a flag set with multiple allocation profiles set. This is currently only possible when a balance from one of the three unhandled profiles to another of the unhandled profiles is canceled after allocating at least one block using the new profile. In that case, a transaction abort like the below will occur and the filesystem will need to be mounted with -o skip_balance to get it mounted rw again (but the balance cannot be resumed without a similar abort). [770.648] ------------[ cut here ]------------ [770.648] BTRFS: Transaction aborted (error -22) [770.648] WARNING: CPU: 43 PID: 1159593 at fs/btrfs/extent-tree.c:4122 find_free_extent+0x1d94/0x1e00 [btrfs] [770.648] CPU: 43 PID: 1159593 Comm: btrfs Tainted: G W 6.1.0-0.deb11.7-powerpc64le #1 Debian 6.1.20-2~bpo11+1a~test [770.648] Hardware name: T2P9D01 REV 1.00 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV [770.648] NIP: c00800000f6784fc LR: c00800000f6784f8 CTR: c000000000d746c0 [770.648] REGS: c000200089afe9a0 TRAP: 0700 Tainted: G W (6.1.0-0.deb11.7-powerpc64le Debian 6.1.20-2~bpo11+1a~test) [770.648] MSR: 9000000002029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28848282 XER: 20040000 [770.648] CFAR: c000000000135110 IRQMASK: 0 GPR00: c00800000f6784f8 c000200089afec40 c00800000f7ea800 0000000000000026 GPR04: 00000001004820c2 c000200089afea00 c000200089afe9f8 0000000000000027 GPR08: c000200ffbfe7f98 c000000002127f90 ffffffffffffffd8 0000000026d6a6e8 GPR12: 0000000028848282 c000200fff7f3800 5deadbeef0000122 c00000002269d000 GPR16: c0002008c7797c40 c000200089afef17 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000001 c000200008bc5a98 0000000000000001 GPR24: 0000000000000000 c0000003c73088d0 c000200089afef17 c000000016d3a800 GPR28: c0000003c7308800 c00000002269d000 ffffffffffffffea 0000000000000001 [770.648] NIP [c00800000f6784fc] find_free_extent+0x1d94/0x1e00 [btrfs] [770.648] LR [c00800000f6784f8] find_free_extent+0x1d90/0x1e00 [btrfs] [770.648] Call Trace: [770.648] [c000200089afec40] [c00800000f6784f8] find_free_extent+0x1d90/0x1e00 [btrfs] (unreliable) [770.648] [c000200089afed30] [c00800000f681398] btrfs_reserve_extent+0x1a0/0x2f0 [btrfs] [770.648] [c000200089afeea0] [c00800000f681bf0] btrfs_alloc_tree_block+0x108/0x670 [btrfs] [770.648] [c000200089afeff0] [c00800000f66bd68] __btrfs_cow_block+0x170/0x850 [btrfs] [770.648] [c000200089aff100] [c00800000f66c58c] btrfs_cow_block+0x144/0x288 [btrfs] [770.648] [c000200089aff1b0] [c00800000f67113c] btrfs_search_slot+0x6b4/0xcb0 [btrfs] [770.648] [c000200089aff2a0] [c00800000f679f60] lookup_inline_extent_backref+0x128/0x7c0 [btrfs] [770.648] [c000200089aff3b0] [c00800000f67b338] lookup_extent_backref+0x70/0x190 [btrfs] [770.648] [c000200089aff470] [c00800000f67b54c] __btrfs_free_extent+0xf4/0x1490 [btrfs] [770.648] [c000200089aff5a0] [c00800000f67d770] __btrfs_run_delayed_refs+0x328/0x1530 [btrfs] [770.648] [c000200089aff740] [c00800000f67ea2c] btrfs_run_delayed_refs+0xb4/0x3e0 [btrfs] [770.648] [c000200089aff800] [c00800000f699aa4] btrfs_commit_transaction+0x8c/0x12b0 [btrfs] [770.648] [c000200089aff8f0] [c00800000f6dc628] reset_balance_state+0x1c0/0x290 [btrfs] [770.648] [c000200089aff9a0] [c00800000f6e2f7c] btrfs_balance+0x1164/0x1500 [btrfs] [770.648] [c000200089affb40] [c00800000f6f8e4c] btrfs_ioctl+0x2b54/0x3100 [btrfs] [770.648] [c000200089affc80] [c00000000053be14] sys_ioctl+0x794/0x1310 [770.648] [c000200089affd70] [c00000000002af98] system_call_exception+0x138/0x250 [770.648] [c000200089affe10] [c00000000000c654] system_call_common+0xf4/0x258 [770.648] --- interrupt: c00 at 0x7fff94126800 [770.648] NIP: 00007fff94126800 LR: 0000000107e0b594 CTR: 0000000000000000 [770.648] REGS: c000200089affe80 TRAP: 0c00 Tainted: G W (6.1.0-0.deb11.7-powerpc64le Debian 6.1.20-2~bpo11+1a~test) [770.648] MSR: 900000000000d033 <SF,HV,EE,PR,ME,IR,DR,RI,LE> CR: 24002848 XER: 00000000 [770.648] IRQMASK: 0 GPR00: 0000000000000036 00007fffc9439da0 00007fff94217100 0000000000000003 GPR04: 00000000c4009420 00007fffc9439ee8 0000000000000000 0000000000000000 GPR08: 00000000803c7416 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00007fff9467d120 0000000107e64c9c 0000000107e64d0a GPR16: 0000000107e64d06 0000000107e64cf1 0000000107e64cc4 0000000107e64c73 GPR20: 0000000107e64c31 0000000107e64bf1 0000000107e64be7 0000000000000000 GPR24: 0000000000000000 00007fffc9439ee0 0000000000000003 0000000000000001 GPR28: 00007fffc943f713 0000000000000000 00007fffc9439ee8 0000000000000000 [770.648] NIP [00007fff94126800] 0x7fff94126800 [770.648] LR [0000000107e0b594] 0x107e0b594 [770.648] --- interrupt: c00 [770.648] Instruction dump: [770.648] 3b00ffe4 e8898828 481175f5 60000000 4bfff4fc 3be00000 4bfff570 3d220000 [770.648] 7fc4f378 e8698830 4811cd95 e8410018 <0fe00000> f9c10060 f9e10068 fa010070 [770.648] ---[ end trace 0000000000000000 ]--- [770.648] BTRFS: error (device dm-2: state A) in find_free_extent_update_loop:4122: errno=-22 unknown [770.648] BTRFS info (device dm-2: state EA): forced readonly [770.648] BTRFS: error (device dm-2: state EA) in __btrfs_free_extent:3070: errno=-22 unknown [770.648] BTRFS error (device dm-2: state EA): failed to run delayed ref for logical 17838685708288 num_bytes 24576 type 184 action 2 ref_mod 1: -22 [770.648] BTRFS: error (device dm-2: state EA) in btrfs_run_delayed_refs:2144: errno=-22 unknown [770.648] BTRFS: error (device dm-2: state EA) in reset_balance_state:3599: errno=-22 unknown Fixes: 47e6f7423b91 ("btrfs: add support for 3-copy replication (raid1c3)") Fixes: 8d6fac0087e5 ("btrfs: add support for 4-copy replication (raid1c4)") CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Matt Corallo <blnxfsl@bluematt.me> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19fs: Lock moved directoriesJan Kara1-8/+14
commit 28eceeda130f5058074dd007d9c59d2e8bc5af2e upstream. When a directory is moved to a different directory, some filesystems (udf, ext4, ocfs2, f2fs, and likely gfs2, reiserfs, and others) need to update their pointer to the parent and this must not race with other operations on the directory. Lock the directories when they are moved. Although not all filesystems need this locking, we perform it in vfs_rename() because getting the lock ordering right is really difficult and we don't want to expose these locking details to filesystems. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230601105830.13168-5-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19fs: Establish locking order for unrelated directoriesJan Kara3-2/+46
commit f23ce757185319886ca80c4864ce5f81ac6cc9e9 upstream. Currently the locking order of inode locks for directories that are not in ancestor relationship is not defined because all operations that needed to lock two directories like this were serialized by sb->s_vfs_rename_mutex. However some filesystems need to lock two subdirectories for RENAME_EXCHANGE operations and for this we need the locking order established even for two tree-unrelated directories. Provide a helper function lock_two_inodes() that establishes lock ordering for any two inodes and use it in lock_two_directories(). CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230601105830.13168-4-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19Revert "f2fs: fix potential corruption when moving a directory"Jan Kara1-15/+1
commit cde3c9d7e2a359e337216855dcb333a19daaa436 upstream. This reverts commit d94772154e524b329a168678836745d2773a6e02. The locking is going to be provided by VFS. CC: Jaegeuk Kim <jaegeuk@kernel.org> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230601105830.13168-3-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19ext4: Remove ext4 locking of moved directoryJan Kara1-15/+2
commit 3658840cd363f2be094f5dfd2f0b174a9055dd0f upstream. Remove locking of moved directory in ext4_rename2(). We will take care of it in VFS instead. This effectively reverts commit 0813299c586b ("ext4: Fix possible corruption when moving a directory") and followup fixes. CC: Ted Tso <tytso@mit.edu> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230601105830.13168-1-jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19fs: avoid empty option when generating legacy mount stringThomas Weißschuh1-1/+2
commit 62176420274db5b5127cd7a0083a9aeb461756ee upstream. As each option string fragment is always prepended with a comma it would happen that the whole string always starts with a comma. This could be interpreted by filesystem drivers as an empty option and may produce errors. For example the NTFS driver from ntfs.ko behaves like this and fails when mounted via the new API. Link: https://github.com/util-linux/util-linux/issues/2298 Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Fixes: 3e1aeb00e6d1 ("vfs: Implement a filesystem superblock creation/configuration context") Cc: stable@vger.kernel.org Message-Id: <20230607-fs-empty-option-v1-1-20c8dbf4671b@weissschuh.net> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19jffs2: reduce stack usage in jffs2_build_xattr_subsystem()Fabian Frederick3-7/+15
commit 1168f095417643f663caa341211e117db552989f upstream. Use kcalloc() for allocation/flush of 128 pointers table to reduce stack usage. Function now returns -ENOMEM or 0 on success. stackusage Before: ./fs/jffs2/xattr.c:775 jffs2_build_xattr_subsystem 1208 dynamic,bounded After: ./fs/jffs2/xattr.c:775 jffs2_build_xattr_subsystem 192 dynamic,bounded Also update definition when CONFIG_JFFS2_FS_XATTR is not enabled Tested with an MTD mount point and some user set/getfattr. Many current target on OpenWRT also suffer from a compilation warning (that become an error with CONFIG_WERROR) with the following output: fs/jffs2/xattr.c: In function 'jffs2_build_xattr_subsystem': fs/jffs2/xattr.c:887:1: error: the frame size of 1088 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] 887 | } | ^ Using dynamic allocation fix this compilation warning. Fixes: c9f700f840bd ("[JFFS2][XATTR] using 'delete marker' for xdatum/xref deletion") Reported-by: Tim Gardner <tim.gardner@canonical.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Ron Economos <re@w6rz.net> Reported-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Cc: stable@vger.kernel.org Message-Id: <20230506045612.16616-1-ansuelsmth@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based tmpfsRoberto Sassu1-1/+1
commit 36ce9d76b0a93bae799e27e4f5ac35478c676592 upstream. As the ramfs-based tmpfs uses ramfs_init_fs_context() for the init_fs_context method, which allocates fc->s_fs_info, use ramfs_kill_sb() to free it and avoid a memory leak. Link: https://lkml.kernel.org/r/20230607161523.2876433-1-roberto.sassu@huaweicloud.com Fixes: c3b1b1cbf002 ("ramfs: add support for "mode=" mount option") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19NFSD: add encoding of op_recall flag for write delegationDai Ngo1-1/+1
commit 58f5d894006d82ed7335e1c37182fbc5f08c2f51 upstream. Modified nfsd4_encode_open to encode the op_recall flag properly for OPEN result with write delegation granted. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19btrfs: do not BUG_ON() on tree mod log failure at balance_level()Filipe Manana1-3/+14
[ Upstream commit 39020d8abc7ec62c4de9b260e3d10d4a1c2478ce ] At balance_level(), instead of doing a BUG_ON() in case we fail to record tree mod log operations, do a transaction abort and return the error to the callers. There's really no need for the BUG_ON() as we can release all resources in this context, and we have to abort because other future tree searches that use the tree mod log (btrfs_search_old_slot()) may get inconsistent results if other operations modify the tree after that failure and before the tree mod log based search. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19afs: Fix accidental truncation when storing dataDavid Howells1-3/+5
[ Upstream commit 03275585cabd0240944f19f33d7584a1b099a3a8 ] When an AFS FS.StoreData RPC call is made, amongst other things it is given the resultant file size to be. On the server, this is processed by truncating the file to new size and then writing the data. Now, kafs has a lock (vnode->io_lock) that serves to serialise operations against a specific vnode (ie. inode), but the parameters for the op are set before the lock is taken. This allows two writebacks (say sync and kswapd) to race - and if writes are ongoing the writeback for a later write could occur before the writeback for an earlier one if the latter gets interrupted. Note that afs_writepages() cannot take i_mutex and only takes a shared lock on vnode->validate_lock. Also note that the server does the truncation and the write inside a lock, so there's no problem at that end. Fix this by moving the calculation for the proposed new i_size inside the vnode->io_lock. Also reset the iterator (which we might have read from) and update the mtime setting there. Fixes: bd80d8a80e12 ("afs: Use ITER_XARRAY for writing") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/3526895.1687960024@warthog.procyon.org.uk/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19fanotify: disallow mount/sb marks on kernel internal pseudo fsAmir Goldstein1-0/+14
[ Upstream commit 69562eb0bd3e6bb8e522a7b254334e0fb30dff0c ] Hopefully, nobody is trying to abuse mount/sb marks for watching all anonymous pipes/inodes. I cannot think of a good reason to allow this - it looks like an oversight that dated back to the original fanotify API. Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack3/ Fixes: 0ff21db9fcc3 ("fanotify: hooks the fanotify_mark syscall to the vfsmount code") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230629042044.25723-1-amir73il@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19ntfs: Fix panic about slab-out-of-bounds caused by ntfs_listxattr()Zeng Heng1-0/+3
[ Upstream commit 3c675ddffb17a8b1e32efad5c983254af18b12c2 ] Here is a BUG report from syzbot: BUG: KASAN: slab-out-of-bounds in ntfs_list_ea fs/ntfs3/xattr.c:191 [inline] BUG: KASAN: slab-out-of-bounds in ntfs_listxattr+0x401/0x570 fs/ntfs3/xattr.c:710 Read of size 1 at addr ffff888021acaf3d by task syz-executor128/3632 Call Trace: ntfs_list_ea fs/ntfs3/xattr.c:191 [inline] ntfs_listxattr+0x401/0x570 fs/ntfs3/xattr.c:710 vfs_listxattr fs/xattr.c:457 [inline] listxattr+0x293/0x2d0 fs/xattr.c:804 Fix the logic of ea_all iteration. When the ea->name_len is 0, return immediately, or Add2Ptr() would visit invalid memory in the next loop. Fixes: be71b5cba2e6 ("fs/ntfs3: Add attrib operations") Reported-by: syzbot+9fcea5ef6dc4dc72d334@syzkaller.appspotmail.com Signed-off-by: Zeng Heng <zengheng4@huawei.com> [almaz.alexandrovich@paragon-software.com: lines of the patch have changed] Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19f2fs: fix error path handling in truncate_dnode()Chao Yu1-1/+3
[ Upstream commit 0135c482fa97e2fd8245cb462784112a00ed1211 ] If truncate_node() fails in truncate_dnode(), it missed to call f2fs_put_page(), fix it. Fixes: 7735730d39d7 ("f2fs: fix to propagate error from __get_meta_page()") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19f2fs: check return value of freeze_super()Chao Yu1-1/+3
[ Upstream commit 8bec7dd1b3f7d7769d433d67bde404de948a2d95 ] freeze_super() can fail, it needs to check its return value and do error handling in f2fs_resize_fs(). Fixes: 04f0b2eaa3b3 ("f2fs: ioctl for removing a range from F2FS") Fixes: b4b10061ef98 ("f2fs: refactor resize_fs to avoid meta updates in progress") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19f2fs: fix to avoid NULL pointer dereference f2fs_write_end_io()Chao Yu3-5/+20
[ Upstream commit d8189834d4348ae608083e1f1f53792cfcc2a9bc ] butt3rflyh4ck reports a bug as below: When a thread always calls F2FS_IOC_RESIZE_FS to resize fs, if resize fs is failed, f2fs kernel thread would invoke callback function to update f2fs io info, it would call f2fs_write_end_io and may trigger null-ptr-deref in NODE_MAPPING. general protection fault, probably for non-canonical address KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] RIP: 0010:NODE_MAPPING fs/f2fs/f2fs.h:1972 [inline] RIP: 0010:f2fs_write_end_io+0x727/0x1050 fs/f2fs/data.c:370 <TASK> bio_endio+0x5af/0x6c0 block/bio.c:1608 req_bio_endio block/blk-mq.c:761 [inline] blk_update_request+0x5cc/0x1690 block/blk-mq.c:906 blk_mq_end_request+0x59/0x4c0 block/blk-mq.c:1023 lo_complete_rq+0x1c6/0x280 drivers/block/loop.c:370 blk_complete_reqs+0xad/0xe0 block/blk-mq.c:1101 __do_softirq+0x1d4/0x8ef kernel/softirq.c:571 run_ksoftirqd kernel/softirq.c:939 [inline] run_ksoftirqd+0x31/0x60 kernel/softirq.c:931 smpboot_thread_fn+0x659/0x9e0 kernel/smpboot.c:164 kthread+0x33e/0x440 kernel/kthread.c:379 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 The root cause is below race case can cause leaving dirty metadata in f2fs after filesystem is remount as ro: Thread A Thread B - f2fs_ioc_resize_fs - f2fs_readonly --- return false - f2fs_resize_fs - f2fs_remount - write_checkpoint - set f2fs as ro - free_segment_range - update meta_inode's data Then, if f2fs_put_super() fails to write_checkpoint due to readonly status, and meta_inode's dirty data will be writebacked after node_inode is put, finally, f2fs_write_end_io will access NULL pointer on sbi->node_inode. Thread A IRQ context - f2fs_put_super - write_checkpoint fails - iput(node_inode) - node_inode = NULL - iput(meta_inode) - write_inode_now - f2fs_write_meta_page - f2fs_write_end_io - NODE_MAPPING(sbi) : access NULL pointer on node_inode Fixes: b4b10061ef98 ("f2fs: refactor resize_fs to avoid meta updates in progress") Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Closes: https://lore.kernel.org/r/1684480657-2375-1-git-send-email-yangtiezhu@loongson.cn Tested-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19f2fs: fix potential deadlock due to unpaired node_write lock useChao Yu2-6/+8
[ Upstream commit f082c6b205a06953f26c40bdc7621cc5a58ceb7c ] If S_NOQUOTA is cleared from inode during data page writeback of quota file, it may miss to unlock node_write lock, result in potential deadlock, fix to use the lock in paired. Kworker Thread - writepage if (IS_NOQUOTA()) f2fs_down_read(&sbi->node_write); - vfs_cleanup_quota_inode - inode->i_flags &= ~S_NOQUOTA; if (IS_NOQUOTA()) f2fs_up_read(&sbi->node_write); Fixes: 79963d967b49 ("f2fs: shrink node_write lock coverage") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19gfs2: Fix duplicate should_fault_in_pages() callBob Peterson1-1/+1
[ Upstream commit c8ed1b35931245087968fd95b2ec3dfc50f77769 ] In gfs2_file_buffered_write(), we currently jump from the second call of function should_fault_in_pages() to above the first call, so should_fault_in_pages() is getting called twice in a row, causing it to accidentally fall back to single-page writes rather than trying the more efficient multi-page writes first. Fix that by moving the retry label to the correct place, behind the first call to should_fault_in_pages(). Fixes: e1fa9ea85ce8 ("gfs2: Stop using glock holder auto-demotion for now") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19kernfs: fix missing kernfs_idr_lock to remove an ID from the IDRMuchun Song1-0/+2
[ Upstream commit 30480b988f88c279752f3202a26b6fee5f586aef ] The root->ino_idr is supposed to be protected by kernfs_idr_lock, fix it. Fixes: 488dee96bb62 ("kernfs: allow creating kernfs objects with arbitrary uid/gid") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230523024017.24851-1-songmuchun@bytedance.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19f2fs: do not allow to defragment files have FI_COMPRESS_RELEASEDYangtao Li1-0/+6
[ Upstream commit 7cd2e5f75b86a1befa99834f3ed1d735eeff69e6 ] If a file has FI_COMPRESS_RELEASED, all writes for it should not be allowed. Fixes: 5fdb322ff2c2 ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE") Signed-off-by: Qi Han <hanqi@vivo.com> Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19btrfs: fix race when deleting free space root from the dirty cow roots listFilipe Manana1-0/+3
commit babebf023e661b90b1c78b2baa384fb03a226879 upstream. When deleting the free space tree we are deleting the free space root from the list fs_info->dirty_cowonly_roots without taking the lock that protects it, which is struct btrfs_fs_info::trans_lock. This unsynchronized list manipulation may cause chaos if there's another concurrent manipulation of this list, such as when adding a root to it with ctree.c:add_root_to_dirty_list(). This can result in all sorts of weird failures caused by a race, such as the following crash: [337571.278245] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] PREEMPT SMP PTI [337571.278933] CPU: 1 PID: 115447 Comm: btrfs Tainted: G W 6.4.0-rc6-btrfs-next-134+ #1 [337571.279153] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [337571.279572] RIP: 0010:commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.279928] Code: 85 38 06 00 (...) [337571.280363] RSP: 0018:ffff9f63446efba0 EFLAGS: 00010206 [337571.280582] RAX: ffff942d98ec2638 RBX: ffff9430b82b4c30 RCX: 0000000449e1c000 [337571.280798] RDX: dead000000000100 RSI: ffff9430021e4900 RDI: 0000000000036070 [337571.281015] RBP: ffff942d98ec2000 R08: ffff942d98ec2000 R09: 000000000000015b [337571.281254] R10: 0000000000000009 R11: 0000000000000001 R12: ffff942fe8fbf600 [337571.281476] R13: ffff942dabe23040 R14: ffff942dabe20800 R15: ffff942d92cf3b48 [337571.281723] FS: 00007f478adb7340(0000) GS:ffff94349fa40000(0000) knlGS:0000000000000000 [337571.281950] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [337571.282184] CR2: 00007f478ab9a3d5 CR3: 000000001e02c001 CR4: 0000000000370ee0 [337571.282416] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [337571.282647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [337571.282874] Call Trace: [337571.283101] <TASK> [337571.283327] ? __die_body+0x1b/0x60 [337571.283570] ? die_addr+0x39/0x60 [337571.283796] ? exc_general_protection+0x22e/0x430 [337571.284022] ? asm_exc_general_protection+0x22/0x30 [337571.284251] ? commit_cowonly_roots+0x11f/0x250 [btrfs] [337571.284531] btrfs_commit_transaction+0x42e/0xf90 [btrfs] [337571.284803] ? _raw_spin_unlock+0x15/0x30 [337571.285031] ? release_extent_buffer+0x103/0x130 [btrfs] [337571.285305] reset_balance_state+0x152/0x1b0 [btrfs] [337571.285578] btrfs_balance+0xa50/0x11e0 [btrfs] [337571.285864] ? __kmem_cache_alloc_node+0x14a/0x410 [337571.286086] btrfs_ioctl+0x249a/0x3320 [btrfs] [337571.286358] ? mod_objcg_state+0xd2/0x360 [337571.286577] ? refill_obj_stock+0xb0/0x160 [337571.286798] ? seq_release+0x25/0x30 [337571.287016] ? __rseq_handle_notify_resume+0x3ba/0x4b0 [337571.287235] ? percpu_counter_add_batch+0x2e/0xa0 [337571.287455] ? __x64_sys_ioctl+0x88/0xc0 [337571.287675] __x64_sys_ioctl+0x88/0xc0 [337571.287901] do_syscall_64+0x38/0x90 [337571.288126] entry_SYSCALL_64_after_hwframe+0x72/0xdc [337571.288352] RIP: 0033:0x7f478aaffe9b So fix this by locking struct btrfs_fs_info::trans_lock before deleting the free space root from that list. Fixes: a5ed91828518 ("Btrfs: implement the free space B-tree") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19ksmbd: avoid field overflow warningArnd Bergmann1-1/+1
[ Upstream commit 9cedc58bdbe9fff9aacd0ca19ee5777659f28fd7 ] clang warns about a possible field overflow in a memcpy: In file included from fs/smb/server/smb_common.c:7: include/linux/fortify-string.h:583:4: error: call to '__write_overflow_field' declared with 'warning' attribute: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror,-Wattribute-warning] __write_overflow_field(p_size_field, size); It appears to interpret the "&out[baselen + 4]" as referring to a single byte of the character array, while the equivalen "out + baselen + 4" is seen as an offset into the array. I don't see that kind of warning elsewhere, so just go with the simple rework. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19smb: client: fix broken file attrs with nodfs mountsPaulo Alcantara1-3/+0
[ Upstream commit d439b29057e26464120fc6c18f97433aa003b5fe ] *_get_inode_info() functions expect -EREMOTE when query path info calls find a DFS link, regardless whether !CONFIG_CIFS_DFS_UPCALL or 'nodfs' mount option. Otherwise, those files will miss the fake DFS file attributes. Before patch $ mount.cifs //srv/dfs /mnt/1 -o ...,nodfs $ ls -l /mnt/1 ls: cannot access '/mnt/1/link': Operation not supported total 0 -rwxr-xr-x 1 root root 0 Jul 26 2022 dfstest2_file1.txt drwxr-xr-x 2 root root 0 Aug 8 2022 dir1 d????????? ? ? ? ? ? link After patch $ mount.cifs //srv/dfs /mnt/1 -o ...,nodfs $ ls -l /mnt/1 total 0 -rwxr-xr-x 1 root root 0 Jul 26 2022 dfstest2_file1.txt drwxr-xr-x 2 root root 0 Aug 8 2022 dir1 drwx--x--x 2 root root 0 Jun 26 20:29 link Fixes: c877ce47e137 ("cifs: reduce roundtrips on create/qinfo requests") Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19cifs: do all necessary checks for credits within or before lockingShyam Prasad N2-19/+20
[ Upstream commit 326a8d04f147e2bf393f6f9cdb74126ee6900607 ] All the server credits and in-flight info is protected by req_lock. Once the req_lock is held, and we've determined that we have enough credits to continue, this lock cannot be dropped till we've made the changes to credits and in-flight count. However, we used to drop the lock in order to avoid deadlock with the recent srv_lock. This could cause the checks already made to be invalidated. Fixed it by moving the server status check to before locking req_lock. Fixes: d7d7a66aacd6 ("cifs: avoid use of global locks for high contention data") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19cifs: prevent use-after-free by freeing the cfile laterShyam Prasad N1-3/+3
[ Upstream commit 33f736187d08f6bc822117629f263b97d3df4165 ] In smb2_compound_op we have a possible use-after-free which can cause hard to debug problems later on. This was revealed during stress testing with KASAN enabled kernel. Fixing it by moving the cfile free call to a few lines below, after the usage. Fixes: 76894f3e2f71 ("cifs: improve symlink handling for smb2+") Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19SMB3: Do not send lease break acknowledgment if all file handles have been ↵Bharath SM1-13/+12
closed [ Upstream commit da787d5b74983f7525d1eb4b9c0b4aff2821511a ] In case if all existing file handles are deferred handles and if all of them gets closed due to handle lease break then we dont need to send lease break acknowledgment to server, because last handle close will be considered as lease break ack. After closing deferred handels, we check for openfile list of inode, if its empty then we skip sending lease break ack. Fixes: 59a556aebc43 ("SMB3: drop reference to cfile before sending oplock break") Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSIONOlga Kornievskaia1-0/+1
[ Upstream commit c907e72f58ed979a24a9fdcadfbc447c51d5e509 ] When the client received NFS4ERR_BADSESSION, it schedules recovery and start the state manager thread which in turn freezes the session table and does not allow for any new requests to use the no-longer valid session. However, it is possible that before the state manager thread runs, a new operation would use the released slot that received BADSESSION and was therefore not updated its sequence number. Such re-use of the slot can lead the application errors. Fixes: 5c441544f045 ("NFSv4.x: Handle bad/dead sessions correctly in nfs41_sequence_process()") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19NFSv4.2: fix wrong shrinker_idQi Zheng1-35/+44
[ Upstream commit 7f7ab336898f281e58540ef781a8fb375acc32a9 ] Currently, the list_lru::shrinker_id corresponding to the nfs4_xattr shrinkers is wrong: >>> prog["nfs4_xattr_cache_lru"].shrinker_id (int)0 >>> prog["nfs4_xattr_entry_lru"].shrinker_id (int)0 >>> prog["nfs4_xattr_large_entry_lru"].shrinker_id (int)0 >>> prog["nfs4_xattr_cache_shrinker"].id (int)18 >>> prog["nfs4_xattr_entry_shrinker"].id (int)19 >>> prog["nfs4_xattr_large_entry_shrinker"].id (int)20 This is not what we expect, which will cause these shrinkers not to be found in shrink_slab_memcg(). We should assign shrinker::id before calling list_lru_init_memcg(), so that the corresponding list_lru::shrinker_id will be assigned the correct value like below: >>> prog["nfs4_xattr_cache_lru"].shrinker_id (int)16 >>> prog["nfs4_xattr_entry_lru"].shrinker_id (int)17 >>> prog["nfs4_xattr_large_entry_lru"].shrinker_id (int)18 >>> prog["nfs4_xattr_cache_shrinker"].id (int)16 >>> prog["nfs4_xattr_entry_shrinker"].id (int)17 >>> prog["nfs4_xattr_large_entry_shrinker"].id (int)18 So just do it. Fixes: 95ad37f90c33 ("NFSv4.2: add client side xattr caching.") Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19ovl: update of dentry revalidate flags after copy upAmir Goldstein7-13/+30
[ Upstream commit b07d5cc93e1b28df47a72c519d09d0a836043613 ] After copy up, we may need to update d_flags if upper dentry is on a remote fs and lower dentries are not. Add helpers to allow incremental update of the revalidate flags. Fixes: bccece1ead36 ("ovl: allow remote upper") Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19ocfs2: Fix use of slab data with sendpageDavid Howells1-11/+12
[ Upstream commit 86d7bd6e66e9925f0f04a7bcf3c92c05fdfefb5a ] ocfs2 uses kzalloc() to allocate buffers for o2net_hand, o2net_keep_req and o2net_keep_resp and then passes these to sendpage. This isn't really allowed as the lifetime of slab objects is not controlled by page ref - though in this case it will probably work. sendmsg() with MSG_SPLICE_PAGES will, however, print a warning and give an error. Fix it to use folio_alloc() instead to allocate a buffer for the handshake message, keepalive request and reply messages. Fixes: 98211489d414 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem") Signed-off-by: David Howells <dhowells@redhat.com> cc: Mark Fasheh <mark@fasheh.com> cc: Kurt Hackel <kurt.hackel@oracle.com> cc: Joel Becker <jlbec@evilplan.org> cc: Joseph Qi <joseph.qi@linux.alibaba.com> cc: ocfs2-devel@oss.oracle.com Link: https://lore.kernel.org/r/20230623225513.2732256-14-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19pstore/ram: Add check for kstrdupJiasheng Jiang1-0/+2
[ Upstream commit d97038d5ec2062733c1e016caf9baaf68cf64ea1 ] Add check for the return value of kstrdup() and return the error if it fails in order to avoid NULL pointer dereference. Fixes: e163fdb3f7f8 ("pstore/ram: Regularize prz label allocation lifetime") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230614093733.36048-1-jiasheng@iscas.ac.cn Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19erofs: fix compact 4B support for 16k block sizeGao Xiang1-5/+1
[ Upstream commit 001b8ccd0650727e54ec16ef72bf1b8eeab7168e ] In compact 4B, two adjacent lclusters are packed together as a unit to form on-disk indexes for effective random access, as below: (amortized = 4, vcnt = 2) _____________________________________________ |___@_____ encoded bits __________|_ blkaddr _| 0 . amortized * vcnt = 8 . . . . amortized * vcnt - 4 = 4 . . .____________________________. |_type (2 bits)_|_clusterofs_| Therefore, encoded bits for each pack are 32 bits (4 bytes). IOWs, since each lcluster can get 16 bits for its type and clusterofs, the maximum supported lclustersize for compact 4B format is 16k (14 bits). Fix this to enable compact 4B format for 16k lclusters (blocks), which is tested on an arm64 server with 16k page size. Fixes: 152a333a5895 ("staging: erofs: add compacted compression indexes support") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230601112341.56960-1-hsiangkao@linux.alibaba.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19erofs: simplify iloc()Gao Xiang5-35/+25
[ Upstream commit b780d3fc6107464dcc43631a6208c43b6421f1e6 ] Actually we could pass in inodes directly to clean up all callers. Also rename iloc() as erofs_iloc(). Link: https://lore.kernel.org/r/20230114150823.432069-1-xiang@kernel.org Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Stable-dep-of: 001b8ccd0650 ("erofs: fix compact 4B support for 16k block size") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19lockd: drop inappropriate svc_get() from locked_get()NeilBrown1-1/+0
[ Upstream commit 665e89ab7c5af1f2d260834c861a74b01a30f95f ] The below-mentioned patch was intended to simplify refcounting on the svc_serv used by locked. The goal was to only ever have a single reference from the single thread. To that end we dropped a call to lockd_start_svc() (except when creating thread) which would take a reference, and dropped the svc_put(serv) that would drop that reference. Unfortunately we didn't also remove the svc_get() from lockd_create_svc() in the case where the svc_serv already existed. So after the patch: - on the first call the svc_serv was allocated and the one reference was given to the thread, so there are no extra references - on subsequent calls svc_get() was called so there is now an extra reference. This is clearly not consistent. The inconsistency is also clear in the current code in lockd_get() takes *two* references, one on nlmsvc_serv and one by incrementing nlmsvc_users. This clearly does not match lockd_put(). So: drop that svc_get() from lockd_get() (which used to be in lockd_create_svc(). Reported-by: Ido Schimmel <idosch@idosch.org> Closes: https://lore.kernel.org/linux-nfs/ZHsI%2FH16VX9kJQX1@shredder/T/#u Fixes: b73a2972041b ("lockd: move lockd_start_svc() call into lockd_create_svc()") Signed-off-by: NeilBrown <neilb@suse.de> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19erofs: kill hooked chains to avoid loops on deduplicated compressed imagesGao Xiang1-61/+11
[ Upstream commit 967c28b23f6c89bb8eef6a046ea88afe0d7c1029 ] After heavily stressing EROFS with several images which include a hand-crafted image of repeated patterns for more than 46 days, I found two chains could be linked with each other almost simultaneously and form a loop so that the entire loop won't be submitted. As a consequence, the corresponding file pages will remain locked forever. It can be _only_ observed on data-deduplicated compressed images. For example, consider two chains with five pclusters in total: Chain 1: 2->3->4->5 -- The tail pcluster is 5; Chain 2: 5->1->2 -- The tail pcluster is 2. Chain 2 could link to Chain 1 with pcluster 5; and Chain 1 could link to Chain 2 at the same time with pcluster 2. Since hooked chains are all linked locklessly now, I have no idea how to simply avoid the race. Instead, let's avoid hooked chains completely until I could work out a proper way to fix this and end users finally tell us that it's needed to add it back. Actually, this optimization can be found with multi-threaded workloads (especially even more often on deduplicated compressed images), yet I'm not sure about the overall system impacts of not having this compared with implementation complexity. Fixes: 267f2492c8f7 ("erofs: introduce multi-reference pclusters (fully-referenced)") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20230526201459.128169-4-hsiangkao@linux.alibaba.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19erofs: move zdata.h into zdata.cGao Xiang2-178/+165
[ Upstream commit a9a94d9373349e1a53f149d2015eb6f03a8517cf ] Definitions in zdata.h are only used in zdata.c and for internal use only. No logic changes. Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230204093040.97967-4-hsiangkao@linux.alibaba.com Stable-dep-of: 967c28b23f6c ("erofs: kill hooked chains to avoid loops on deduplicated compressed images") Signed-off-by: Sasha Levin <sashal@kernel.org>