summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2024-06-16proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operationTyler Hicks (Microsoft)1-22/+20
commit 0a960ba49869ebe8ff859d000351504dd6b93b68 upstream. The following commits loosened the permissions of /proc/<PID>/fdinfo/ directory, as well as the files within it, from 0500 to 0555 while also introducing a PTRACE_MODE_READ check between the current task and <PID>'s task: - commit 7bc3fa0172a4 ("procfs: allow reading fdinfo with PTRACE_MODE_READ") - commit 1927e498aee1 ("procfs: prevent unprivileged processes accessing fdinfo dir") Before those changes, inode based system calls like inotify_add_watch(2) would fail when the current task didn't have sufficient read permissions: [...] lstat("/proc/1/task/1/fdinfo", {st_mode=S_IFDIR|0500, st_size=0, ...}) = 0 inotify_add_watch(64, "/proc/1/task/1/fdinfo", IN_MODIFY|IN_ATTRIB|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE| IN_ONLYDIR|IN_DONT_FOLLOW|IN_EXCL_UNLINK) = -1 EACCES (Permission denied) [...] This matches the documented behavior in the inotify_add_watch(2) man page: ERRORS EACCES Read access to the given file is not permitted. After those changes, inotify_add_watch(2) started succeeding despite the current task not having PTRACE_MODE_READ privileges on the target task: [...] lstat("/proc/1/task/1/fdinfo", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0 inotify_add_watch(64, "/proc/1/task/1/fdinfo", IN_MODIFY|IN_ATTRIB|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE| IN_ONLYDIR|IN_DONT_FOLLOW|IN_EXCL_UNLINK) = 1757 openat(AT_FDCWD, "/proc/1/task/1/fdinfo", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 EACCES (Permission denied) [...] This change in behavior broke .NET prior to v7. See the github link below for the v7 commit that inadvertently/quietly (?) fixed .NET after the kernel changes mentioned above. Return to the old behavior by moving the PTRACE_MODE_READ check out of the file .open operation and into the inode .permission operation: [...] lstat("/proc/1/task/1/fdinfo", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0 inotify_add_watch(64, "/proc/1/task/1/fdinfo", IN_MODIFY|IN_ATTRIB|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE| IN_ONLYDIR|IN_DONT_FOLLOW|IN_EXCL_UNLINK) = -1 EACCES (Permission denied) [...] Reported-by: Kevin Parsons (Microsoft) <parsonskev@gmail.com> Link: https://github.com/dotnet/runtime/commit/89e5469ac591b82d38510fe7de98346cce74ad4f Link: https://stackoverflow.com/questions/75379065/start-self-contained-net6-build-exe-as-service-on-raspbian-system-unauthorizeda Fixes: 7bc3fa0172a4 ("procfs: allow reading fdinfo with PTRACE_MODE_READ") Cc: stable@vger.kernel.org Cc: Christian Brauner <brauner@kernel.org> Cc: Christian König <christian.koenig@amd.com> Cc: Jann Horn <jannh@google.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Hardik Garg <hargar@linux.microsoft.com> Cc: Allen Pais <apais@linux.microsoft.com> Signed-off-by: Tyler Hicks (Microsoft) <code@tyhicks.com> Link: https://lore.kernel.org/r/20240501005646.745089-1-code@tyhicks.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-16fsverity: use register_sysctl_init() to avoid kmemleak warningEric Biggers1-6/+1
commit ee5814dddefbaa181cb247a75676dd5103775db1 upstream. Since the fsverity sysctl registration runs as a builtin initcall, there is no corresponding sysctl deregistration and the resulting struct ctl_table_header is not used. This can cause a kmemleak warning just after the system boots up. (A pointer to the ctl_table_header is stored in the fsverity_sysctl_header static variable, which kmemleak should detect; however, the compiler can optimize out that variable.) Avoid the kmemleak warning by using register_sysctl_init() which is intended for use by builtin initcalls and uses kmemleak_not_leak(). Reported-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/r/CAHj4cs8DTSvR698UE040rs_pX1k-WVe7aR6N2OoXXuhXJPDC-w@mail.gmail.com Cc: stable@vger.kernel.org Reviewed-by: Joel Granados <j.granados@samsung.com> Link: https://lore.kernel.org/r/20240501025331.594183-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-16btrfs: qgroup: fix initialization of auto inherit arrayDan Carpenter1-1/+1
commit 0e39c9e524479b85c1b83134df0cfc6e3cb5353a upstream. The "i++" was accidentally left out so it just sets qgids[0] over and over. This can lead to unexpected problems, as the groups[1:] would be all 0, leading to later find_qgroup_rb() unable to find a qgroup and cause snapshot creation failure. Fixes: 5343cd9364ea ("btrfs: qgroup: simple quota auto hierarchy for nested subvolumes") CC: stable@vger.kernel.org # 6.7+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-16f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode()Chao Yu1-0/+6
commit 20faaf30e55522bba2b56d9c46689233205d7717 upstream. syzbot reports a kernel bug as below: F2FS-fs (loop0): Mounted with checkpoint version = 48b305e4 ================================================================== BUG: KASAN: slab-out-of-bounds in f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline] BUG: KASAN: slab-out-of-bounds in current_nat_addr fs/f2fs/node.h:213 [inline] BUG: KASAN: slab-out-of-bounds in f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600 Read of size 1 at addr ffff88807a58c76c by task syz-executor280/5076 CPU: 1 PID: 5076 Comm: syz-executor280 Not tainted 6.9.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline] current_nat_addr fs/f2fs/node.h:213 [inline] f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600 f2fs_xattr_fiemap fs/f2fs/data.c:1848 [inline] f2fs_fiemap+0x55d/0x1ee0 fs/f2fs/data.c:1925 ioctl_fiemap fs/ioctl.c:220 [inline] do_vfs_ioctl+0x1c07/0x2e50 fs/ioctl.c:838 __do_sys_ioctl fs/ioctl.c:902 [inline] __se_sys_ioctl+0x81/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f The root cause is we missed to do sanity check on i_xattr_nid during f2fs_iget(), so that in fiemap() path, current_nat_addr() will access nat_bitmap w/ offset from invalid i_xattr_nid, result in triggering kasan bug report, fix it. Reported-and-tested-by: syzbot+3694e283cf5c40df6d14@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/00000000000094036c0616e72a1d@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-16erofs: avoid allocating DEFLATE streams before mountingGao Xiang1-26/+29
commit 80eb4f62056d6ae709bdd0636ab96ce660f494b2 upstream. Currently, each DEFLATE stream takes one 32 KiB permanent internal window buffer even if there is no running instance which uses DEFLATE algorithm. It's unexpected and wasteful on embedded devices with limited resources and servers with hundreds of CPU cores if DEFLATE is enabled but unused. Fixes: ffa09b3bd024 ("erofs: DEFLATE compression support") Cc: <stable@vger.kernel.org> # 6.6+ Reviewed-by: Sandeep Dhavale <dhavale@google.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240520090106.2898681-1-hsiangkao@linux.alibaba.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-16afs: Don't cross .backup mountpoint from backup volumeMarc Dionne1-0/+5
commit 29be9100aca2915fab54b5693309bc42956542e5 upstream. Don't cross a mountpoint that explicitly specifies a backup volume (target is <vol>.backup) when starting from a backup volume. It it not uncommon to mount a volume's backup directly in the volume itself. This can cause tools that are not paying attention to get into a loop mounting the volume onto itself as they attempt to traverse the tree, leading to a variety of problems. This doesn't prevent the general case of loops in a sequence of mountpoints, but addresses a common special case in the same way as other afs clients. Reported-by: Jan Henrik Sylvester <jan.henrik.sylvester@uni-hamburg.de> Link: http://lists.infradead.org/pipermail/linux-afs/2024-May/008454.html Reported-by: Markus Suvanto <markus.suvanto@gmail.com> Link: http://lists.infradead.org/pipermail/linux-afs/2024-February/008074.html Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/768760.1716567475@warthog.procyon.org.uk Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-12cifs: Fix missing set of remote_i_sizeDavid Howells2-3/+4
[ Upstream commit 93a43155127fec0f8cc942d63b76668c2f8f69fa ] Occasionally, the generic/001 xfstest will fail indicating corruption in one of the copy chains when run on cifs against a server that supports FSCTL_DUPLICATE_EXTENTS_TO_FILE (eg. Samba with a share on btrfs). The problem is that the remote_i_size value isn't updated by cifs_setsize() when called by smb2_duplicate_extents(), but i_size *is*. This may cause cifs_remap_file_range() to then skip the bit after calling ->duplicate_extents() that sets sizes. Fix this by calling netfs_resize_file() in smb2_duplicate_extents() before calling cifs_setsize() to set i_size. This means we don't then need to call netfs_resize_file() upon return from ->duplicate_extents(), but we also fix the test to compare against the pre-dup inode size. [Note that this goes back before the addition of remote_i_size with the netfs_inode struct. It should probably have been setting cifsi->server_eof previously.] Fixes: cfc63fc8126a ("smb3: fix cached file size problems in duplicate extents (reflink)") Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12cifs: Set zero_point in the copy_file_range() and remap_file_range()David Howells1-0/+6
[ Upstream commit 3758c485f6c9124d8ad76b88382004cbc28a0892 ] Set zero_point in the copy_file_range() and remap_file_range() implementations so that we don't skip reading data modified on a server-side copy. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Stable-dep-of: 93a43155127f ("cifs: Fix missing set of remote_i_size") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12netfs: Fix setting of BDP_ASYNC from iocb flagsDavid Howells1-1/+1
[ Upstream commit c596bea1452ddf172ec9b588e4597228e9a1f4d5 ] Fix netfs_perform_write() to set BDP_ASYNC if IOCB_NOWAIT is set rather than if IOCB_SYNC is not set. It reflects asynchronicity in the sense of not waiting rather than synchronicity in the sense of not returning until the op is complete. Without this, generic/590 fails on cifs in strict caching mode with a complaint that one of the writes fails with EAGAIN. The test can be distilled down to: mount -t cifs /my/share /mnt -ostuff xfs_io -i -c 'falloc 0 8191M -c fsync -f /mnt/file xfs_io -i -c 'pwrite -b 1M -W 0 8191M' /mnt/file Fixes: c38f4e96e605 ("netfs: Provide func to copy data to pagecache for buffered write") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/316306.1716306586@warthog.procyon.org.uk Reviewed-by: Jens Axboe <axboe@kernel.dk> cc: Jeff Layton <jlayton@kernel.org> cc: Enzo Matsumiya <ematsumiya@suse.de> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12pNFS/filelayout: fixup pNfs allocation modesOlga Kornievskaia1-2/+2
[ Upstream commit 3ebcb24646f8c5bfad2866892d3f3cff05514452 ] Change left over allocation flags. Fixes: a245832aaa99 ("pNFS/files: Ensure pNFS allocation modes are consistent with nfsiod") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12nfs: keep server info for remountsMartin Kaiser1-3/+6
[ Upstream commit b322bf9e983addedff0894c55e92d58f4d16d92a ] With newer kernels that use fs_context for nfs mounts, remounts fail with -EINVAL. $ mount -t nfs -o nolock 10.0.0.1:/tmp/test /mnt/test/ $ mount -t nfs -o remount /mnt/test/ mount: mounting 10.0.0.1:/tmp/test on /mnt/test failed: Invalid argument For remounts, the nfs server address and port are populated by nfs_init_fs_context and later overwritten with 0x00 bytes by nfs23_parse_monolithic. The remount then fails as the server address is invalid. Fix this by not overwriting nfs server info in nfs23_parse_monolithic if we're doing a remount. Fixes: f2aedb713c28 ("NFS: Add fs_context support.") Signed-off-by: Martin Kaiser <martin@kaiser.cx> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12NFSv4: Fixup smatch warning for ambiguous returnBenjamin Coddington1-7/+5
[ Upstream commit 37ffe06537af3e3ec212e7cbe941046fce0a822f ] Dan Carpenter reports smatch warning for nfs4_try_migration() when a memory allocation failure results in a zero return value. In this case, a transient allocation failure error will likely be retried the next time the server responds with NFS4ERR_MOVED. We can fixup the smatch warning with a small refactor: attempt all three allocations before testing and returning on a failure. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: c3ed222745d9 ("NFSv4: Fix free of uninitialized nfs4_label on referral lookup.") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12fs/ntfs3: Use variable length array instead of fixed sizeKonstantin Komarov1-1/+1
[ Upstream commit 1997cdc3e727526aa5d84b32f7cbb3f56459b7ef ] Should fix smatch warning: ntfs_set_label() error: __builtin_memcpy() 'uni->name' too small (20 vs 256) Fixes: 4534a70b7056f ("fs/ntfs3: Add headers and misc files") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202401091421.3RJ24Mn3-lkp@intel.com/ Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12fs/ntfs3: Use 64 bit variable to avoid 32 bit overflowKonstantin Komarov1-1/+2
[ Upstream commit e931f6b630ffb22d66caab202a52aa8cbb10c649 ] For example, in the expression: vbo = 2 * vbo + skip Fixes: b46acd6a6a627 ("fs/ntfs3: Add NTFS journal") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12fs/ntfs3: Check 'folio' pointer for NULLKonstantin Komarov1-6/+11
[ Upstream commit 1cd6c96219c429ebcfa8e79a865277376c563803 ] It can be NULL if bmap is called. Fixes: 82cae269cfa95 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12ocfs2: correctly use ocfs2_find_next_zero_bit()Joseph Qi3-18/+9
[ Upstream commit 30dd3478c3cd7d01cc5afc4952e885ba4eefb730 ] If no bits are zero, ocfs2_find_next_zero_bit() will return max size, so check the return value with -1 is meaningless. Correct this usage and cleanup the code. Link: https://lkml.kernel.org/r/20240314021713.240796-1-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 28d2188709d9 ("selftests/harness: use 1024 in place of LINE_MAX") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12f2fs: fix to add missing iput() in gc_data_segment()Chao Yu1-2/+7
[ Upstream commit a798ff17cd2dabe47d5d4ed3d509631793c36e19 ] During gc_data_segment(), if inode state is abnormal, it missed to call iput(), fix it. Fixes: b73e52824c89 ("f2fs: reposition unlock_new_inode to prevent accessing invalid inode") Fixes: 9056d6489f5a ("f2fs: fix to do sanity check on inode type during garbage collection") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12fuse: clear FR_SENT when re-adding requests into pending listHou Tao1-0/+1
[ Upstream commit 246014876d782bbf2e652267482cd2e799fb5fcd ] The following warning was reported by lee bruce: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 8264 at fs/fuse/dev.c:300 fuse_request_end+0x685/0x7e0 fs/fuse/dev.c:300 Modules linked in: CPU: 0 PID: 8264 Comm: ab2 Not tainted 6.9.0-rc7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:fuse_request_end+0x685/0x7e0 fs/fuse/dev.c:300 ...... Call Trace: <TASK> fuse_dev_do_read.constprop.0+0xd36/0x1dd0 fs/fuse/dev.c:1334 fuse_dev_read+0x166/0x200 fs/fuse/dev.c:1367 call_read_iter include/linux/fs.h:2104 [inline] new_sync_read fs/read_write.c:395 [inline] vfs_read+0x85b/0xba0 fs/read_write.c:476 ksys_read+0x12f/0x260 fs/read_write.c:619 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xce/0x260 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f ...... </TASK> The warning is due to the FUSE_NOTIFY_RESEND notify sent by the write() syscall in the reproducer program and it happens as follows: (1) calls fuse_dev_read() to read the INIT request The read succeeds. During the read, bit FR_SENT will be set on the request. (2) calls fuse_dev_write() to send an USE_NOTIFY_RESEND notify The resend notify will resend all processing requests, so the INIT request is moved from processing list to pending list again. (3) calls fuse_dev_read() with an invalid output address fuse_dev_read() will try to copy the same INIT request to the output address, but it will fail due to the invalid address, so the INIT request is ended and triggers the warning in fuse_request_end(). Fix it by clearing FR_SENT when re-adding requests into pending list. Acked-by: Miklos Szeredi <mszeredi@redhat.com> Reported-by: xingwei lee <xrivendell7@gmail.com> Reported-by: yue sun <samsun1006219@gmail.com> Closes: https://lore.kernel.org/linux-fsdevel/58f13e47-4765-fce4-daf4-dffcc5ae2330@huaweicloud.com/T/#m091614e5ea2af403b259e7cea6a49e51b9ee07a7 Fixes: 760eac73f9f6 ("fuse: Introduce a new notification type for resend pending requests") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12fuse: set FR_PENDING atomically in fuse_resend()Hou Tao1-1/+1
[ Upstream commit 42815f8ac54c5113bf450ec4b7ccc5b62af0f6a7 ] When fuse_resend() moves the requests from processing lists to pending list, it uses __set_bit() to set FR_PENDING bit in req->flags. Using __set_bit() is not safe, because other functions may update req->flags concurrently (e.g., request_wait_answer() may call set_bit(FR_INTERRUPTED, &flags)). Fix it by using set_bit() instead. Fixes: 760eac73f9f6 ("fuse: Introduce a new notification type for resend pending requests") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12f2fs: compress: don't allow unaligned truncation on released compress inodeChao Yu1-3/+8
[ Upstream commit 29ed2b5dd521ce7c5d8466cd70bf0cc9d07afeee ] f2fs image may be corrupted after below testcase: - mkfs.f2fs -O extra_attr,compression -f /dev/vdb - mount /dev/vdb /mnt/f2fs - touch /mnt/f2fs/file - f2fs_io setflags compression /mnt/f2fs/file - dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=4 - f2fs_io release_cblocks /mnt/f2fs/file - truncate -s 8192 /mnt/f2fs/file - umount /mnt/f2fs - fsck.f2fs /dev/vdb [ASSERT] (fsck_chk_inode_blk:1256) --> ino: 0x5 has i_blocks: 0x00000002, but has 0x3 blocks [FSCK] valid_block_count matching with CP [Fail] [0x4, 0x5] [FSCK] other corrupted bugs [Fail] The reason is: partial truncation assume compressed inode has reserved blocks, after partial truncation, valid block count may change w/o .i_blocks and .total_valid_block_count update, result in corruption. This patch only allow cluster size aligned truncation on released compress inode for fixing. Fixes: c61404153eb6 ("f2fs: introduce FI_COMPRESS_RELEASED instead of using IMMUTABLE bit") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12f2fs: fix to release node block count in error path of f2fs_new_node_page()Chao Yu1-1/+1
[ Upstream commit 0fa4e57c1db263effd72d2149d4e21da0055c316 ] It missed to call dec_valid_node_count() to release node block count in error path, fix it. Fixes: 141170b759e0 ("f2fs: fix to avoid use f2fs_bug_on() in f2fs_new_node_page()") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12f2fs: compress: fix to cover {reserve,release}_compress_blocks() w/ cp_rwsem ↵Chao Yu1-0/+10
lock [ Upstream commit 0a4ed2d97cb6d044196cc3e726b6699222b41019 ] It needs to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock to avoid racing with checkpoint, otherwise, filesystem metadata including blkaddr in dnode, inode fields and .total_valid_block_count may be corrupted after SPO case. Fixes: ef8d563f184e ("f2fs: introduce F2FS_IOC_RELEASE_COMPRESS_BLOCKS") Fixes: c75488fb4d82 ("f2fs: introduce F2FS_IOC_RESERVE_COMPRESS_BLOCKS") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12f2fs: compress: fix error path of inc_valid_block_count()Chao Yu1-7/+8
[ Upstream commit 043c832371cd9023fbd725138ddc6c7f288dc469 ] If inc_valid_block_count() can not allocate all requested blocks, it needs to release block count in .total_valid_block_count and resevation blocks in inode. Fixes: 54607494875e ("f2fs: compress: fix to avoid inconsistence bewteen i_blocks and dnode") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12f2fs: compress: fix to update i_compr_blocks correctlyChao Yu1-7/+14
[ Upstream commit 186e7d71534df4589405925caca5597af7626c12 ] Previously, we account reserved blocks and compressed blocks into @compr_blocks, then, f2fs_i_compr_blocks_update(,compr_blocks) will update i_compr_blocks incorrectly, fix it. Meanwhile, for the case all blocks in cluster were reserved, fix to update dn->ofs_in_node correctly. Fixes: eb8fbaa53374 ("f2fs: compress: fix to check unreleased compressed cluster") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12f2fs: fix block migration when section is not aligned to pow2Wu Bo1-9/+8
[ Upstream commit aa4074e8fec4d2e686daee627fcafb3503efe365 ] As for zoned-UFS, f2fs section size is forced to zone size. And zone size may not aligned to pow2. Fixes: 859fca6b706e ("f2fs: swap: support migrating swapfile in aligned write mode") Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Wu Bo <bo.wu@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12ovl: remove upper umask handling from ovl_create_upper()Miklos Szeredi1-3/+0
[ Upstream commit 096802748ea1dea8b476938e0a8dc16f4bd2f1ad ] This is already done by vfs_prepare_mode() when creating the upper object by vfs_create(), vfs_mkdir() and vfs_mknod(). No regressions have been observed in xfstests run with posix acls turned off for the upper filesystem. Fixes: 1639a49ccdce ("fs: move S_ISGID stripping into the vfs_*() helpers") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12udf: Convert udf_expand_file_adinicb() to use a folioMatthew Wilcox (Oracle)1-13/+14
[ Upstream commit db6754090a4f99c67e05ae6b87343ba6e013531f ] Use the folio APIs throughout this function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Fixes: 1eeceaec794e ("udf: Convert udf_expand_file_adinicb() to avoid kmap_atomic()") Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240417150416.752929-4-willy@infradead.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12f2fs: write missing last sum blk of file pinning sectionDaeho Jeong1-0/+2
[ Upstream commit b084403cfc3295b59a1b6bcc94efaf870fc3c2c9 ] While do not allocating a new section in advance for file pinning area, I missed that we should write the sum block for the last segment of a file pinning section. Fixes: 9703d69d9d15 ("f2fs: support file pinning for zoned devices") Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12f2fs: fix to check pinfile flag in f2fs_move_file_range()Chao Yu1-1/+2
[ Upstream commit e07230da0500e0919a765037c5e81583b519be2c ] ioctl(F2FS_IOC_MOVE_RANGE) can truncate or punch hole on pinned file, fix to disallow it. Fixes: 5fed0be8583f ("f2fs: do not allow partial truncation on pinned file") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12f2fs: fix to relocate check condition in f2fs_fallocate()Chao Yu1-9/+11
[ Upstream commit 278a6253a673611dbc8ab72a3b34b151a8e75822 ] compress and pinfile flag should be checked after inode lock held to avoid race condition, fix it. Fixes: 4c8ff7095bef ("f2fs: support data compression") Fixes: 5fed0be8583f ("f2fs: do not allow partial truncation on pinned file") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12f2fs: compress: fix to relocate check condition in f2fs_ioc_{,de}compress_file()Chao Yu1-8/+4
[ Upstream commit bd9ae4ae9e585061acfd4a169f2321706f900246 ] Compress flag should be checked after inode lock held to avoid racing w/ f2fs_setflags_common() , fix it. Fixes: 5fdb322ff2c2 ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE") Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Closes: https://lore.kernel.org/linux-f2fs-devel/CAHJ8P3LdZXLc2rqeYjvymgYHr2+YLuJ0sLG9DdsJZmwO7deuhw@mail.gmail.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12f2fs: compress: fix to relocate check condition in ↵Chao Yu1-8/+4
f2fs_{release,reserve}_compress_blocks() [ Upstream commit 7c5dffb3d90c5921b91981cc663e02757d90526e ] Compress flag should be checked after inode lock held to avoid racing w/ f2fs_setflags_common(), fix it. Fixes: 4c8ff7095bef ("f2fs: support data compression") Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Closes: https://lore.kernel.org/linux-f2fs-devel/CAHJ8P3LdZXLc2rqeYjvymgYHr2+YLuJ0sLG9DdsJZmwO7deuhw@mail.gmail.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12f2fs: fix to wait on page writeback in __clone_blkaddrs()Chao Yu1-0/+3
[ Upstream commit d3876e34e7e789e2cbdd782360fef2a777391082 ] In below race condition, dst page may become writeback status in __clone_blkaddrs(), it needs to wait writeback before update, fix it. Thread A GC Thread - f2fs_move_file_range - filemap_write_and_wait_range(dst) - gc_data_segment - f2fs_down_write(dst) - move_data_page - set_page_writeback(dst_page) - f2fs_submit_page_write - f2fs_up_write(dst) - f2fs_down_write(dst) - __exchange_data_block - __clone_blkaddrs - f2fs_get_new_data_page - memcpy_page Fixes: 0a2aa8fbb969 ("f2fs: refactor __exchange_data_block for speed up") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-12f2fs: multidev: fix to recognize valid zero block addressChao Yu1-1/+1
[ Upstream commit 33e62cd7b4c281cd737c62e5d8c4f0e602a8c5c5 ] As reported by Yi Zhang in mailing list [1], kernel warning was catched during zbd/010 test as below: ./check zbd/010 zbd/010 (test gap zone support with F2FS) [failed] runtime ... 3.752s something found in dmesg: [ 4378.146781] run blktests zbd/010 at 2024-02-18 11:31:13 [ 4378.192349] null_blk: module loaded [ 4378.209860] null_blk: disk nullb0 created [ 4378.413285] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1) [ 4378.422334] scsi host15: scsi_debug: version 0191 [20210520] dev_size_mb=1024, opts=0x0, submit_queues=1, statistics=0 [ 4378.434922] scsi 15:0:0:0: Direct-Access-ZBC Linux scsi_debug 0191 PQ: 0 ANSI: 7 [ 4378.443343] scsi 15:0:0:0: Power-on or device reset occurred [ 4378.449371] sd 15:0:0:0: Attached scsi generic sg5 type 20 [ 4378.449418] sd 15:0:0:0: [sdf] Host-managed zoned block device ... (See '/mnt/tests/gitlab.com/api/v4/projects/19168116/repository/archive.zip/storage/blktests/blk/blktests/results/nodev/zbd/010.dmesg' WARNING: CPU: 22 PID: 44011 at fs/iomap/iter.c:51 CPU: 22 PID: 44011 Comm: fio Not tainted 6.8.0-rc3+ #1 RIP: 0010:iomap_iter+0x32b/0x350 Call Trace: <TASK> __iomap_dio_rw+0x1df/0x830 f2fs_file_read_iter+0x156/0x3d0 [f2fs] aio_read+0x138/0x210 io_submit_one+0x188/0x8c0 __x64_sys_io_submit+0x8c/0x1a0 do_syscall_64+0x86/0x170 entry_SYSCALL_64_after_hwframe+0x6e/0x76 Shinichiro Kawasaki helps to analyse this issue and proposes a potential fixing patch in [2]. Quoted from reply of Shinichiro Kawasaki: "I confirmed that the trigger commit is dbf8e63f48af as Yi reported. I took a look in the commit, but it looks fine to me. So I thought the cause is not in the commit diff. I found the WARN is printed when the f2fs is set up with multiple devices, and read requests are mapped to the very first block of the second device in the direct read path. In this case, f2fs_map_blocks() and f2fs_map_blocks_cached() modify map->m_pblk as the physical block address from each block device. It becomes zero when it is mapped to the first block of the device. However, f2fs_iomap_begin() assumes that map->m_pblk is the physical block address of the whole f2fs, across the all block devices. It compares map->m_pblk against NULL_ADDR == 0, then go into the unexpected branch and sets the invalid iomap->length. The WARN catches the invalid iomap->length. This WARN is printed even for non-zoned block devices, by following steps. - Create two (non-zoned) null_blk devices memory backed with 128MB size each: nullb0 and nullb1. # mkfs.f2fs /dev/nullb0 -c /dev/nullb1 # mount -t f2fs /dev/nullb0 "${mount_dir}" # dd if=/dev/zero of="${mount_dir}/test.dat" bs=1M count=192 # dd if="${mount_dir}/test.dat" of=/dev/null bs=1M count=192 iflag=direct ..." So, the root cause of this issue is: when multi-devices feature is on, f2fs_map_blocks() may return zero blkaddr in non-primary device, which is a verified valid block address, however, f2fs_iomap_begin() treats it as an invalid block address, and then it triggers the warning in iomap framework code. Finally, as discussed, we decide to use a more simple and direct way that checking (map.m_flags & F2FS_MAP_MAPPED) condition instead of (map.m_pblk != NULL_ADDR) to fix this issue. Thanks a lot for the effort of Yi Zhang and Shinichiro Kawasaki on this issue. [1] https://lore.kernel.org/linux-f2fs-devel/CAHj4cs-kfojYC9i0G73PRkYzcxCTex=-vugRFeP40g_URGvnfQ@mail.gmail.com/ [2] https://lore.kernel.org/linux-f2fs-devel/gngdj77k4picagsfdtiaa7gpgnup6fsgwzsltx6milmhegmjff@iax2n4wvrqye/ Reported-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/linux-f2fs-devel/CAHj4cs-kfojYC9i0G73PRkYzcxCTex=-vugRFeP40g_URGvnfQ@mail.gmail.com/ Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Fixes: 1517c1a7a445 ("f2fs: implement iomap operations") Fixes: 8d3c1fa3fa5e ("f2fs: don't rely on F2FS_MAP_* in f2fs_iomap_begin") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30ext4: remove the redundant folio_wait_stable()Zhang Yi1-3/+0
[ Upstream commit df0b5afc62f3368d657a8fe4a8d393ac481474c2 ] __filemap_get_folio() with FGP_WRITEBEGIN parameter has already wait for stable folio, so remove the redundant folio_wait_stable() in ext4_da_write_begin(), it was left over from the commit cc883236b792 ("ext4: drop unnecessary journal handle in delalloc write") that removed the retry getting page logic. Fixes: cc883236b792 ("ext4: drop unnecessary journal handle in delalloc write") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240419023005.2719050-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30ext4: fix potential unnitialized variableDan Carpenter1-0/+1
[ Upstream commit 3f4830abd236d0428e50451e1ecb62e14c365e9b ] Smatch complains "err" can be uninitialized in the caller. fs/ext4/indirect.c:349 ext4_alloc_branch() error: uninitialized symbol 'err'. Set the error to zero on the success path. Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/363a4673-0fb8-4adf-b4fb-90a499077276@moroto.mountain Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30nfsd: don't create nfsv4recoverydir in nfsdfs when not used.NeilBrown1-2/+2
[ Upstream commit 0770249b90f9d9f69714b76adc36cf6c895bc1f9 ] When CONFIG_NFSD_LEGACY_CLIENT_TRACKING is not set, the virtual file /proc/fs/nfsd/nfsv4recoverydir is created but responds EINVAL to any access. This is not useful, is somewhat surprising, and it causes ltp to complain. The only known user of this file is in nfs-utils, which handles non-existence and read-failure equally well. So there is nothing to gain from leaving the file present but inaccessible. So this patch removes the file when its content is not available - i.e. when that config option is not selected. Also remove the #ifdef which hides some of the enum values when CONFIG_NFSD_V$ not selection. simple_fill_super() quietly ignores array entries that are not present, so having slots in the array that don't get used is perfectly acceptable. So there is no value in this #ifdef. Reported-by: Petr Vorel <pvorel@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: 74fd48739d04 ("nfsd: new Kconfig option for legacy client tracking") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30ext4: avoid excessive credit estimate in ext4_tmpfile()Jan Kara1-1/+1
[ Upstream commit 35a1f12f0ca857fee1d7a04ef52cbd5f1f84de13 ] A user with minimum journal size (1024 blocks these days) complained about the following error triggered by generic/697 test in ext4_tmpfile(): run fstests generic/697 at 2024-02-28 05:34:46 JBD2: vfstest wants too many credits credits:260 rsv_credits:0 max:256 EXT4-fs error (device loop0) in __ext4_new_inode:1083: error 28 Indeed the credit estimate in ext4_tmpfile() is huge. EXT4_MAXQUOTAS_INIT_BLOCKS() is 219, then 10 credits from ext4_tmpfile() itself and then ext4_xattr_credits_for_new_inode() adds more credits needed for security attributes and ACLs. Now the EXT4_MAXQUOTAS_INIT_BLOCKS() is in fact unnecessary because we've already initialized quotas with dquot_init() shortly before and so EXT4_MAXQUOTAS_TRANS_BLOCKS() is enough (which boils down to 3 credits). Fixes: af51a2ac36d1 ("ext4: ->tmpfile() support") Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Luis Henriques <lhenriques@suse.de> Tested-by: Disha Goel <disgoel@linux.ibm.com> Link: https://lore.kernel.org/r/20240307115320.28949-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30mm/ksm: fix ksm exec support for prctlJinjiang Tu1-0/+11
[ Upstream commit 3a9e567ca45fb5280065283d10d9a11f0db61d2b ] Patch series "mm/ksm: fix ksm exec support for prctl", v4. commit 3c6f33b7273a ("mm/ksm: support fork/exec for prctl") inherits MMF_VM_MERGE_ANY flag when a task calls execve(). However, it doesn't create the mm_slot, so ksmd will not try to scan this task. The first patch fixes the issue. The second patch refactors to prepare for the third patch. The third patch extends the selftests of ksm to verfity the deduplication really happens after fork/exec inherits ths KSM setting. This patch (of 3): commit 3c6f33b7273a ("mm/ksm: support fork/exec for prctl") inherits MMF_VM_MERGE_ANY flag when a task calls execve(). Howerver, it doesn't create the mm_slot, so ksmd will not try to scan this task. To fix it, allocate and add the mm_slot to ksm_mm_head in __bprm_mm_init() when the mm has MMF_VM_MERGE_ANY flag. Link: https://lkml.kernel.org/r/20240328111010.1502191-1-tujinjiang@huawei.com Link: https://lkml.kernel.org/r/20240328111010.1502191-2-tujinjiang@huawei.com Fixes: 3c6f33b7273a ("mm/ksm: support fork/exec for prctl") Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Rik van Riel <riel@surriel.com> Cc: Stefan Roesch <shr@devkernel.io> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30btrfs: set start on clone before calling copy_extent_buffer_fullJosef Bacik1-2/+8
[ Upstream commit 53e24158684b527d013b5b2204ccb34d1f94c248 ] Our subpage testing started hanging on generic/560 and I bisected it down to 1cab1375ba6d ("btrfs: reuse cloned extent buffer during fiemap to avoid re-allocations"). This is subtle because we use eb->start to figure out where in the folio we're copying to when we're subpage, as our ->start may refer to an area inside of the folio. For example, assume a 16K page size machine with a 4K node size, and assume that we already have a cloned extent buffer when we cloned the previous search. copy_extent_buffer_full() will do the following when copying the extent buffer path->nodes[0] (src) into cloned (dest): src->start = 8k; // this is the new leaf we're cloning cloned->start = 4k; // this is left over from the previous clone src_addr = folio_address(src->folios[0]); dest_addr = folio_address(dest->folios[0]); memcpy(dest_addr + get_eb_offset_in_folio(dst, 0), src_addr + get_eb_offset_in_folio(src, 0), src->len); Now get_eb_offset_in_folio() is where the problems occur, because for sub-pagesize blocksize we can have multiple eb's per folio, the code for this is as follows size_t get_eb_offset_in_folio(eb, offset) { return (eb->start + offset & (folio_size(eb->folio[0]) - 1)); } So in the above example we are copying into offset 4K inside the folio. However once we update cloned->start to 8K to match the src the math for get_eb_offset_in_folio() changes, and any subsequent reads (i.e. btrfs_item_key_to_cpu()) will start reading from the offset 8K instead of 4K where we copied to, giving us garbage. Fix this by setting start before we co copy_extent_buffer_full() to make sure that we're copying into the same offset inside of the folio that we will read from later. All other sites of copy_extent_buffer_full() are correct because we either set ->start beforehand or we simply don't change it in the case of the tree-log usage. With this fix we now pass generic/560 on our subpage tests. Fixes: 1cab1375ba6d ("btrfs: reuse cloned extent buffer during fiemap to avoid re-allocations") Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30gfs2: do_xmote fixesAndreas Gruenbacher1-19/+25
[ Upstream commit 9947a06d29c0a30da88cdc6376ca5fd87083e130 ] Function do_xmote() is called with the glock spinlock held. Commit 86934198eefa added a 'goto skip_inval' statement at the beginning of the function to further below where the glock spinlock is expected not to be held anymore. Then it added code there that requires the glock spinlock to be held. This doesn't make sense; fix this up by dropping and retaking the spinlock where needed. In addition, when ->lm_lock() returned an error, do_xmote() didn't fail the locking operation, and simply left the glock hanging; fix that as well. (This is a much older error.) Fixes: 86934198eefa ("gfs2: Clear flags when withdraw prevents xmote") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30gfs2: finish_xmote cleanupAndreas Gruenbacher1-8/+13
[ Upstream commit 1cd28e15864054f3c48baee9eecda1c0441c48ac ] Currently, function finish_xmote() takes and releases the glock spinlock. However, all of its callers immediately take that spinlock again, so it makes more sense to take the spin lock before calling finish_xmote() already. With that, thaw_glock() is the only place that sets the GLF_HAVE_REPLY flag outside of the glock spinlock, but it also takes that spinlock immediately thereafter. Change that to set the bit when the spinlock is already held. This allows to switch from test_and_clear_bit() to test_bit() and clear_bit() in glock_work_func(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Stable-dep-of: 9947a06d29c0 ("gfs2: do_xmote fixes") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30gfs2: Fix potential glock use-after-free on unmountAndreas Gruenbacher6-16/+57
[ Upstream commit d98779e687726d8f8860f1c54b5687eec5f63a73 ] When a DLM lockspace is released and there ares still locks in that lockspace, DLM will unlock those locks automatically. Commit fb6791d100d1b started exploiting this behavior to speed up filesystem unmount: gfs2 would simply free glocks it didn't want to unlock and then release the lockspace. This didn't take the bast callbacks for asynchronous lock contention notifications into account, which remain active until until a lock is unlocked or its lockspace is released. To prevent those callbacks from accessing deallocated objects, put the glocks that should not be unlocked on the sd_dead_glocks list, release the lockspace, and only then free those glocks. As an additional measure, ignore unexpected ast and bast callbacks if the receiving glock is dead. Fixes: fb6791d100d1b ("GFS2: skip dlm_unlock calls in unmount") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: David Teigland <teigland@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30gfs2: Remove ill-placed consistency checkAndreas Gruenbacher1-1/+0
[ Upstream commit 59f60005797b4018d7b46620037e0c53d690795e ] This consistency check was originally added by commit 9287c6452d2b1 ("gfs2: Fix occasional glock use-after-free"). It is ill-placed in gfs2_glock_free() because if it holds there, it must equally hold in __gfs2_glock_put() already. Either way, the check doesn't seem necessary anymore. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Stable-dep-of: d98779e68772 ("gfs2: Fix potential glock use-after-free on unmount") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30gfs2: Fix "ignore unlock failures after withdraw"Andreas Gruenbacher2-2/+3
[ Upstream commit 5d9231111966b6c5a65016d58dcbeab91055bc91 ] Commit 3e11e53041502 tries to suppress dlm_lock() lock conversion errors that occur when the lockspace has already been released. It does that by setting and checking the SDF_SKIP_DLM_UNLOCK flag. This conflicts with the intended meaning of the SDF_SKIP_DLM_UNLOCK flag, so check whether the lockspace is still allocated instead. (Given the current DLM API, checking for this kind of error after the fact seems easier that than to make sure that the lockspace is still allocated before calling dlm_lock(). Changing the DLM API so that users maintain the lockspace references themselves would be an option.) Fixes: 3e11e53041502 ("GFS2: ignore unlock failures after withdraw") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30gfs2: Don't forget to complete delayed withdrawAndreas Gruenbacher1-0/+3
[ Upstream commit b01189333ee91c1ae6cd96dfd1e3a3c2e69202f0 ] Commit fffe9bee14b0 ("gfs2: Delay withdraw from atomic context") switched from gfs2_withdraw() to gfs2_withdraw_delayed() in gfs2_ail_error(), but failed to then check if a delayed withdraw had occurred. Fix that by adding the missing check in __gfs2_ail_flush(), where the spin locks are already dropped and a withdraw is possible. Fixes: fffe9bee14b0 ("gfs2: Delay withdraw from atomic context") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30dlm: fix user space lock decision to copy lvbAlexander Aring3-13/+17
[ Upstream commit ad191e0eeebf64a60ca2d16ca01a223d2b1dd25e ] This patch fixes the copy lvb decision for user space lock requests. Checking dlm_lvb_operations is done earlier, where granted/requested lock modes are available to use in the matrix. The decision had been moved to the wrong location, where granted mode and requested mode where the same, which causes the dlm_lvb_operations matix to produce the wrong copy decision. For PW or EX requests, the caller could get invalid lvb data. Fixes: 61bed0baa4db ("fs: dlm: use a non-static queue for callbacks") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30shmem: Fix shmem_rename2()Chuck Lever1-0/+9
[ Upstream commit ad191eb6d6942bb835a0b20b647f7c53c1d99ca4 ] When renaming onto an existing directory entry, user space expects the replacement entry to have the same directory offset as the original one. Link: https://gitlab.alpinelinux.org/alpine/aports/-/issues/15966 Fixes: a2e459555c5f ("shmem: stable directory offsets") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/20240415152057.4605-4-cel@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30libfs: Add simple_offset_rename() APIChuck Lever1-0/+21
[ Upstream commit 5a1a25be995e1014abd01600479915683e356f5c ] I'm about to fix a tmpfs rename bug that requires the use of internal simple_offset helpers that are not available in mm/shmem.c Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/20240415152057.4605-3-cel@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Stable-dep-of: ad191eb6d694 ("shmem: Fix shmem_rename2()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-05-30libfs: Fix simple_offset_rename_exchange()Chuck Lever1-6/+19
[ Upstream commit 23cdd0eed3f1fff3af323092b0b88945a7950d8e ] User space expects the replacement (old) directory entry to have the same directory offset after the rename. Suggested-by: Christian Brauner <brauner@kernel.org> Fixes: a2e459555c5f ("shmem: stable directory offsets") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/20240415152057.4605-2-cel@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>