summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2024-04-14Merge tag 'pull-sysfs-annotation-fix' of ↵Linus Torvalds1-1/+8
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull sysfs fix from Al Viro: "Get rid of lockdep false positives around sysfs/overlayfs syzbot has uncovered a class of lockdep false positives for setups with sysfs being one of the backing layers in overlayfs. The root cause is that of->mutex allocated when opening a sysfs file read-only (which overlayfs might do) is confused with of->mutex of a file opened writable (held in write to sysfs file, which overlayfs won't do). Assigning them separate lockdep classes fixes that bunch and it's obviously safe" * tag 'pull-sysfs-annotation-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: kernfs: annotate different lockdep class for of->mutex of writable files
2024-04-14kernfs: annotate different lockdep class for of->mutex of writable filesAmir Goldstein1-1/+8
The writable file /sys/power/resume may call vfs lookup helpers for arbitrary paths and readonly files can be read by overlayfs from vfs helpers when sysfs is a lower layer of overalyfs. To avoid a lockdep warning of circular dependency between overlayfs inode lock and kernfs of->mutex, use a different lockdep class for writable and readonly kernfs files. Reported-by: syzbot+9a5b0ced8b1bfb238b56@syzkaller.appspotmail.com Fixes: 0fedefd4c4e3 ("kernfs: sysfs: support custom llseek method for sysfs entries") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-04-13Merge tag 'zonefs-6.9-rc4' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fix from Damien Le Moal: - Suppress a coccicheck warning using str_plural() * tag 'zonefs-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: Use str_plural() to fix Coccinelle warning
2024-04-13Merge tag 'v6.9-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds9-42/+103
Pull smb client fixes from Steve French: - fix for oops in cifs_get_fattr of deleted files - fix for the remote open counter going negative in some directory lease cases - fix for mkfifo to instantiate dentry to avoid possible crash - important fix to allow handling key rotation for mount and remount (ie cases that are becoming more common when password that was used for the mount will expire soon but will be replaced by new password) * tag 'v6.9-rc3-SMB3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb3: fix broken reconnect when password changing on the server by allowing password rotation smb: client: instantiate when creating SFU files smb3: fix Open files on server counter going negative smb: client: fix NULL ptr deref in cifs_mark_open_handles_for_deleted_file()
2024-04-12Merge tag 'ceph-for-6.9-rc4' of https://github.com/ceph/ceph-clientLinus Torvalds4-10/+10
Pull ceph fixes from Ilya Dryomov: "Two CephFS fixes marked for stable and a MAINTAINERS update" * tag 'ceph-for-6.9-rc4' of https://github.com/ceph/ceph-client: MAINTAINERS: remove myself as a Reviewer for Ceph ceph: switch to use cap_delay_lock for the unlink delay list ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE
2024-04-12Merge tag 'trace-v6.9-rc3' of ↵Linus Torvalds1-4/+10
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix the buffer_percent accounting as it is dependent on three variables: 1) pages_read - number of subbuffers read 2) pages_lost - number of subbuffers lost due to overwrite 3) pages_touched - number of pages that a writer entered These three counters only increment, and to know how many active pages there are on the buffer at any given time, the pages_read and pages_lost are subtracted from pages_touched. But the pages touched was incremented whenever any writer went to the next subbuffer even if it wasn't the only one, so it was incremented more than it should be causing the counter for how many subbuffers currently have content incorrect, which caused the buffer_percent that holds waiters until the ring buffer is filled to a given percentage to wake up early. - Fix warning of unused functions when PERF_EVENTS is not configured in - Replace bad tab with space in Kconfig for FTRACE_RECORD_RECURSION_SIZE - Fix to some kerneldoc function comments in eventfs code. * tag 'trace-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Only update pages_touched when a new page is touched tracing: hide unused ftrace_event_id_fops tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry eventfs: Fix kernel-doc comments to functions
2024-04-12eventfs: Fix kernel-doc comments to functionsYang Li1-4/+10
This commit fix kernel-doc style comments with complete parameter descriptions for the lookup_file(),lookup_dir_entry() and lookup_file_dentry(). Link: https://lore.kernel.org/linux-trace-kernel/20240322062604.28862-1-yang.lee@linux.alibaba.com Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-04-12smb3: fix broken reconnect when password changing on the server by allowing ↵Steve French6-0/+44
password rotation There are various use cases that are becoming more common in which password changes are scheduled on a server(s) periodically but the clients connected to this server need to stay connected (even in the face of brief network reconnects) due to mounts which can not be easily unmounted and mounted at will, and servers that do password rotation do not always have the ability to tell the clients exactly when to the new password will be effective, so add support for an alt password ("password2=") on mount (and also remount) so that we can anticipate the upcoming change to the server without risking breaking existing mounts. An alternative would have been to use the kernel keyring for this but the processes doing the reconnect do not have access to the keyring but do have access to the ses structure. Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-12smb: client: instantiate when creating SFU filesPaulo Alcantara1-39/+55
In cifs_sfu_make_node(), on success, instantiate rather than leave it with dentry unhashed negative to support callers that expect mknod(2) to always instantiate. This fixes the following test case: mount.cifs //srv/share /mnt -o ...,sfu mkfifo /mnt/fifo ./xfstests/ltp/growfiles -b -W test -e 1 -u -i 0 -L 30 /mnt/fifo ... BUG: unable to handle page fault for address: 000000034cec4e58 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 1 PREEMPT SMP PTI CPU: 0 PID: 138098 Comm: growfiles Kdump: loaded Not tainted 5.14.0-436.3987_1240945149.el9.x86_64 #1 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:_raw_callee_save__kvm_vcpu_is_preempted+0x0/0x20 Code: e8 15 d9 61 00 e9 63 ff ff ff 41 bd ea ff ff ff e9 58 ff ff ff e8 d0 71 c0 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <48> 8b 04 fd 60 2b c1 99 80 b8 90 50 03 00 00 0f 95 c0 c3 cc cc cc RSP: 0018:ffffb6a143cf7cf8 EFLAGS: 00010206 RAX: ffff8a9bc30fb038 RBX: ffff8a9bc666a200 RCX: ffff8a9cc0260000 RDX: 00000000736f622e RSI: ffff8a9bc30fb038 RDI: 000000007665645f RBP: ffffb6a143cf7d70 R08: 0000000000001000 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: ffff8a9bc666a200 R13: 0000559a302a12b0 R14: 0000000000001000 R15: 0000000000000000 FS: 00007fbed1dbb740(0000) GS:ffff8a9cf0000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000034cec4e58 CR3: 0000000128ec6006 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? show_trace_log_lvl+0x1c4/0x2df ? show_trace_log_lvl+0x1c4/0x2df ? __mutex_lock.constprop.0+0x5f7/0x6a0 ? __die_body.cold+0x8/0xd ? page_fault_oops+0x134/0x170 ? exc_page_fault+0x62/0x150 ? asm_exc_page_fault+0x22/0x30 ? _pfx_raw_callee_save__kvm_vcpu_is_preempted+0x10/0x10 __mutex_lock.constprop.0+0x5f7/0x6a0 ? __mod_memcg_lruvec_state+0x84/0xd0 pipe_write+0x47/0x650 ? do_anonymous_page+0x258/0x410 ? inode_security+0x22/0x60 ? selinux_file_permission+0x108/0x150 vfs_write+0x2cb/0x410 ksys_write+0x5f/0xe0 do_syscall_64+0x5c/0xf0 ? syscall_exit_to_user_mode+0x22/0x40 ? do_syscall_64+0x6b/0xf0 ? sched_clock_cpu+0x9/0xc0 ? exc_page_fault+0x62/0x150 entry_SYSCALL_64_after_hwframe+0x6e/0x76 Cc: stable@vger.kernel.org Fixes: 72bc63f5e23a ("smb3: fix creating FIFOs when mounting with "sfu" mount option") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-12smb3: fix Open files on server counter going negativeSteve French1-2/+2
We were decrementing the count of open files on server twice for the case where we were closing cached directories. Fixes: 8e843bf38f7b ("cifs: return a single-use cfid if we did not get a lease") Cc: stable@vger.kernel.org Acked-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-11ceph: switch to use cap_delay_lock for the unlink delay listXiubo Li3-9/+7
The same list item will be used in both cap_delay_list and cap_unlink_delay_list, so it's buggy to use two different locks to protect them. Cc: stable@vger.kernel.org Fixes: dbc347ef7f0c ("ceph: add ceph_cap_unlink_work to fire check_caps() immediately") Link: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AODC76VXRAMXKLFDCTK4TKFDDPWUSCN5 Reported-by: Marc Ruhmann <ruhmann@luis.uni-hannover.de> Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Marc Ruhmann <ruhmann@luis.uni-hannover.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-04-11Merge tag 'bcachefs-2024-04-10' of https://evilpiepirate.org/git/bcachefsLinus Torvalds24-238/+359
Pull more bcachefs fixes from Kent Overstreet: "Notable user impacting bugs - On multi device filesystems, recovery was looping in btree_trans_too_many_iters(). This checks if a transaction has touched too many btree paths (because of iteration over many keys), and isuses a restart to drop unneeded paths. But it's now possible for some paths to exceed the previous limit without iteration in the interior btree update path, since the transaction commit will do alloc updates for every old and new btree node, and during journal replay we don't use the btree write buffer for locking reasons and thus those updates use btree paths when they wouldn't normally. - Fix a corner case in rebalance when moving extents on a durability=0 device. This wouldn't be hit when a device was formatted with durability=0 since in that case we'll only use it as a write through cache (only cached extents will live on it), but durability can now be changed on an existing device. - bch2_get_acl() could rarely forget to handle a transaction restart; this manifested as the occasional missing acl that came back after dropping caches. - Fix a major performance regression on high iops multithreaded write workloads (only since 6.9-rc1); a previous fix for a deadlock in the interior btree update path to check the journal watermark introduced a dependency on the state of btree write buffer flushing that we didn't want. - Assorted other repair paths and recovery fixes" * tag 'bcachefs-2024-04-10' of https://evilpiepirate.org/git/bcachefs: (25 commits) bcachefs: Fix __bch2_btree_and_journal_iter_init_node_iter() bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail() bcachefs: Fix a race in btree_update_nodes_written() bcachefs: btree_node_scan: Respect member.data_allowed bcachefs: Don't scan for btree nodes when we can reconstruct bcachefs: Fix check_topology() when using node scan bcachefs: fix eytzinger0_find_gt() bcachefs: fix bch2_get_acl() transaction restart handling bcachefs: fix the count of nr_freed_pcpu after changing bc->freed_nonpcpu list bcachefs: Fix gap buffer bug in bch2_journal_key_insert_take() bcachefs: Rename struct field swap to prevent macro naming collision MAINTAINERS: Add entry for bcachefs documentation Documentation: filesystems: Add bcachefs toctree bcachefs: JOURNAL_SPACE_LOW bcachefs: Disable errors=panic for BCH_IOCTL_FSCK_OFFLINE bcachefs: Fix BCH_IOCTL_FSCK_OFFLINE for encrypted filesystems bcachefs: fix rand_delete unit test bcachefs: fix ! vs ~ typo in __clear_bit_le64() bcachefs: Fix rebalance from durability=0 device bcachefs: Print shutdown journal sequence number ...
2024-04-11ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATENeilBrown1-1/+3
The page has been marked clean before writepage is called. If we don't redirty it before postponing the write, it might never get written. Cc: stable@vger.kernel.org Fixes: 503d4fa6ee28 ("ceph: remove reliance on bdi congestion") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-04-11Merge tag 'bootconfig-fixes-v6.9-rc3' of ↵Linus Torvalds1-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull bootconfig fixes from Masami Hiramatsu: - show the original cmdline only once, and only if it was modeified by bootconfig * tag 'bootconfig-fixes-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: fs/proc: Skip bootloader comment if no embedded kernel parameters fs/proc: remove redundant comments from /proc/bootconfig
2024-04-11bcachefs: Fix __bch2_btree_and_journal_iter_init_node_iter()Kent Overstreet1-5/+7
We weren't respecting trans->journal_replay_not_finished - we shouldn't be searching the journal keys unless we have a ref on them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-11bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail()Kent Overstreet1-27/+1
dropping read locks in bch2_btree_node_lock_write_nofail() dates from before we had the cycle detector; we can now tell the cycle detector directly when taking a lock may not fail because we can't handle transaction restarts. This is needed for adding should_be_locked asserts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-11bcachefs: Fix a race in btree_update_nodes_written()Kent Overstreet1-3/+7
One btree update might have terminated in a node update, and then while it is in flight another btree update might free that original node. This race has to be handled in btree_update_nodes_written() - we were missing a READ_ONCE(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-11smb: client: fix NULL ptr deref in cifs_mark_open_handles_for_deleted_file()Paulo Alcantara1-1/+2
cifs_get_fattr() may be called with a NULL inode, so check for a non-NULL inode before calling cifs_mark_open_handles_for_deleted_file(). This fixes the following oops: mount.cifs //srv/share /mnt -o ...,vers=3.1.1 cd /mnt touch foo; tail -f foo & rm foo cat foo BUG: kernel NULL pointer dereference, address: 00000000000005c0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 696 Comm: cat Not tainted 6.9.0-rc2 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014 RIP: 0010:__lock_acquire+0x5d/0x1c70 Code: 00 00 44 8b a4 24 a0 00 00 00 45 85 f6 0f 84 bb 06 00 00 8b 2d 48 e2 95 01 45 89 c3 41 89 d2 45 89 c8 85 ed 0 0 <48> 81 3f 40 7a 76 83 44 0f 44 d8 83 fe 01 0f 86 1b 03 00 00 31 d2 RSP: 0018:ffffc90000b37490 EFLAGS: 00010002 RAX: 0000000000000000 RBX: ffff888110021ec0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000005c0 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000200 FS: 00007f2a1fa08740(0000) GS:ffff888157a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000005c0 CR3: 000000011ac7c000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x180/0x490 ? srso_alias_return_thunk+0x5/0xfbef5 ? exc_page_fault+0x70/0x230 ? asm_exc_page_fault+0x26/0x30 ? __lock_acquire+0x5d/0x1c70 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 lock_acquire+0xc0/0x2d0 ? cifs_mark_open_handles_for_deleted_file+0x3a/0x100 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? kmem_cache_alloc+0x2d9/0x370 _raw_spin_lock+0x34/0x80 ? cifs_mark_open_handles_for_deleted_file+0x3a/0x100 [cifs] cifs_mark_open_handles_for_deleted_file+0x3a/0x100 [cifs] cifs_get_fattr+0x24c/0x940 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 cifs_get_inode_info+0x96/0x120 [cifs] cifs_lookup+0x16e/0x800 [cifs] cifs_atomic_open+0xc7/0x5d0 [cifs] ? lookup_open.isra.0+0x3ce/0x5f0 ? __pfx_cifs_atomic_open+0x10/0x10 [cifs] lookup_open.isra.0+0x3ce/0x5f0 path_openat+0x42b/0xc30 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 do_filp_open+0xc4/0x170 do_sys_openat2+0xab/0xe0 __x64_sys_openat+0x57/0xa0 do_syscall_64+0xc1/0x1e0 entry_SYSCALL_64_after_hwframe+0x72/0x7a Fixes: ffceb7640cbf ("smb: client: do not defer close open handles to deleted files") Reviewed-by: Meetakshi Setiya <msetiya@microsoft.com> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-04-10bcachefs: btree_node_scan: Respect member.data_allowedKent Overstreet1-0/+3
If a device wasn't used for btree nodes, no need to scan for them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-10zonefs: Use str_plural() to fix Coccinelle warningThorsten Blum1-1/+1
Fixes the following Coccinelle/coccicheck warning reported by string_choices.cocci: opportunity for str_plural(zgroup->g_nr_zones) Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2024-04-09fs/proc: Skip bootloader comment if no embedded kernel parametersMasami Hiramatsu1-1/+1
If the "bootconfig" kernel command-line argument was specified or if the kernel was built with CONFIG_BOOT_CONFIG_FORCE, but if there are no embedded kernel parameter, omit the "# Parameters from bootloader:" comment from the /proc/bootconfig file. This will cause automation to fall back to the /proc/cmdline file, which will be identical to the comment in this no-embedded-kernel-parameters case. Link: https://lore.kernel.org/all/20240409044358.1156477-2-paulmck@kernel.org/ Fixes: 8b8ce6c75430 ("fs/proc: remove redundant comments from /proc/bootconfig") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-04-09fs/proc: remove redundant comments from /proc/bootconfigZhenhua Huang1-6/+6
commit 717c7c894d4b ("fs/proc: Add boot loader arguments as comment to /proc/bootconfig") adds bootloader argument comments into /proc/bootconfig. /proc/bootconfig shows boot_command_line[] multiple times following every xbc key value pair, that's duplicated and not necessary. Remove redundant ones. Output before and after the fix is like: key1 = value1 *bootloader argument comments* key2 = value2 *bootloader argument comments* key3 = value3 *bootloader argument comments* ... key1 = value1 key2 = value2 key3 = value3 *bootloader argument comments* ... Link: https://lore.kernel.org/all/20240409044358.1156477-1-paulmck@kernel.org/ Fixes: 717c7c894d4b ("fs/proc: Add boot loader arguments as comment to /proc/bootconfig") Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: <linux-trace-kernel@vger.kernel.org> Cc: <linux-fsdevel@vger.kernel.org> Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-04-09bcachefs: Don't scan for btree nodes when we can reconstructKent Overstreet4-18/+29
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-09bcachefs: Fix check_topology() when using node scanKent Overstreet1-1/+1
shoot down journal keys _before_ populating journal keys with pointers to scanned nodes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-09bcachefs: fix eytzinger0_find_gt()Kent Overstreet1-6/+20
- fix return types: promoting from unsigned to ssize_t does not do what we want here, and was pointless since the rest of the eytzinger code is u32 - nr, not size Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-08Merge tag 'for-6.9-rc2-tag' of ↵Linus Torvalds7-33/+55
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Several fixes to qgroups that have been recently identified by test generic/475: - fix prealloc reserve leak in subvolume operations - various other fixes in reservation setup, conversion or cleanup" * tag 'for-6.9-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: always clear PERTRANS metadata during commit btrfs: make btrfs_clear_delalloc_extent() free delalloc reserve btrfs: qgroup: convert PREALLOC to PERTRANS after record_root_in_trans btrfs: record delayed inode root in transaction btrfs: qgroup: fix qgroup prealloc rsv leak in subvolume operations btrfs: qgroup: correctly model root qgroup rsv in convert
2024-04-08bcachefs: fix bch2_get_acl() transaction restart handlingKent Overstreet1-16/+14
bch2_acl_from_disk() uses allocate_dropping_locks, and can thus return a transaction restart - this wasn't handled. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-07bcachefs: fix the count of nr_freed_pcpu after changing bc->freed_nonpcpu listHongbo Li1-0/+2
When allocating bkey_cached from bc->freed_pcpu list, it missed decreasing the count of nr_freed_pcpu which would cause the mismatch between the value of nr_freed_pcpu and the list items. This problem also exists in moving new bkey_cached to bc->freed_pcpu list. If these happened, the bug info may appear in bch2_fs_btree_key_cache_exit by the follow code: BUG_ON(list_count_nodes(&bc->freed_pcpu) != bc->nr_freed_pcpu); BUG_ON(list_count_nodes(&bc->freed_nonpcpu) != bc->nr_freed_nonpcpu); Fixes: c65c13f0eac6 ("bcachefs: Run btree key cache shrinker less aggressively") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-07bcachefs: Fix gap buffer bug in bch2_journal_key_insert_take()Kent Overstreet1-10/+45
Multiple bug fixes for journal iters: - When the journal keys gap buffer is resized, we have to adjust the iterators for moving the gap to the end - We don't want to rewind iterators to point to the key we just inserted if it's not for the correct btree/level Also, add some new assertions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-07bcachefs: Rename struct field swap to prevent macro naming collisionThorsten Blum1-4/+4
The struct field swap can collide with the swap() macro defined in linux/minmax.h. Rename the struct field to prevent such collisions. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-06bcachefs: JOURNAL_SPACE_LOWKent Overstreet5-12/+19
"bcachefs; Fix deadlock in bch2_btree_update_start()" was a significant performance regression (nearly 50%) on multithreaded random writes with fio. The reason is that the journal watermark checks multiple things, including the state of the btree write buffer, and on multithreaded update heavy workloads we're bottleneked on write buffer flushing - we don't want kicknig off btree updates to depend on the state of the write buffer. This isn't strictly correct; the interior btree update path does do write buffer updates, but it's a tiny fraction of total accounting updates and we're more concerned with space in the journal itself. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-06bcachefs: Disable errors=panic for BCH_IOCTL_FSCK_OFFLINEKent Overstreet1-0/+4
BCH_IOCTL_FSCK_OFFLINE allows the userspace fsck tool to use the kernel implementation of fsck - primarily when the kernel version is a better version match. It should look and act exactly like the normal userspace fsck that the user expected to be invoking, so errors should never result in a kernel panic. We may want to consider further restricting errors=panic - it's only intended for debugging in controlled test environments, it should have no purpose it normal usage. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-06bcachefs: Fix BCH_IOCTL_FSCK_OFFLINE for encrypted filesystemsKent Overstreet1-44/+50
To open an encrypted filesystem, we use request_key() to get the encryption key from the user's keyring - but request_key() needs to happen in the context of the process that invoked the ioctl. This easily fixed by using bch2_fs_open() in nostart mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-06Merge tag 'nfsd-6.9-2' of ↵Linus Torvalds1-5/+2
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Address a slow memory leak with RPC-over-TCP - Prevent another NFS4ERR_DELAY loop during CREATE_SESSION * tag 'nfsd-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: hold a lighter-weight client reference over CB_RECALL_ANY SUNRPC: Fix a slow server-side memory leak with RPC-over-TCP
2024-04-06Merge tag 'xfs-6.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-2/+13
Pull xfs fix from Chandan Babu: - Allow creating new links to special files which were not associated with a project quota * tag 'xfs-6.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: allow cross-linking special files without project quota
2024-04-06Merge tag '6.9-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds22-178/+369
Pull smb client fixes from Steve French: - fix to retry close to avoid potential handle leaks when server returns EBUSY - DFS fixes including a fix for potential use after free - fscache fix - minor strncpy cleanup - reconnect race fix - deal with various possible UAF race conditions tearing sessions down * tag '6.9-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: fix potential UAF in cifs_signal_cifsd_for_reconnect() smb: client: fix potential UAF in smb2_is_network_name_deleted() smb: client: fix potential UAF in is_valid_oplock_break() smb: client: fix potential UAF in smb2_is_valid_oplock_break() smb: client: fix potential UAF in smb2_is_valid_lease_break() smb: client: fix potential UAF in cifs_stats_proc_show() smb: client: fix potential UAF in cifs_stats_proc_write() smb: client: fix potential UAF in cifs_dump_full_key() smb: client: fix potential UAF in cifs_debug_files_proc_show() smb3: retrying on failed server close smb: client: serialise cifs_construct_tcon() with cifs_mount_mutex smb: client: handle DFS tcons in cifs_construct_tcon() smb: client: refresh referral without acquiring refpath_lock smb: client: guarantee refcounted children from parent session cifs: Fix caching to try to do open O_WRONLY as rdwr on server smb: client: fix UAF in smb2_reconnect_server() smb: client: replace deprecated strncpy with strscpy
2024-04-05bcachefs: fix rand_delete unit testKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-05bcachefs: fix ! vs ~ typo in __clear_bit_le64()Dan Carpenter1-1/+1
The ! was obviously intended to be ~. As it is, this function does the equivalent to: "addr[bit / 64] = 0;". Fixes: 27fcec6c27ca ("bcachefs: Clear recovery_passes_required as they complete without errors") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-05nfsd: hold a lighter-weight client reference over CB_RECALL_ANYJeff Layton1-5/+2
Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the client. While a callback job is technically an RPC that counter is really more for client-driven RPCs, and this has the effect of preventing the client from being unhashed until the callback completes. If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we can end up in a situation where the callback can't complete on the (now dead) callback channel, but the new client can't connect because the old client can't be unhashed. This usually manifests as a NFS4ERR_DELAY return on the CREATE_SESSION operation. The job is only holding a reference to the client so it can clear a flag after the RPC completes. Fix this by having CB_RECALL_ANY instead hold a reference to the cl_nfsdfs.cl_ref. Typically we only take that sort of reference when dealing with the nfsdfs info files, but it should work appropriately here to ensure that the nfs4_client doesn't disappear. Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition") Reported-by: Vladimir Benes <vbenes@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-05Merge tag '6.9-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds5-8/+52
Pull smb server fixes from Steve French: "Three fixes, all also for stable: - encryption fix - memory overrun fix - oplock break fix" * tag '6.9-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: do not set SMB2_GLOBAL_CAP_ENCRYPTION for SMB 3.1.1 ksmbd: validate payload size in ipc response ksmbd: don't send oplock break if rename fails
2024-04-05Merge tag 'vfs-6.9-rc3.fixes' of ↵Linus Torvalds11-37/+19
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "This contains a few small fixes. This comes with some delay because I wanted to wait on people running their reproducers and the Easter Holidays meant that those replies came in a little later than usual: - Fix handling of preventing writes to mounted block devices. Since last kernel we allow to prevent writing to mounted block devices provided CONFIG_BLK_DEV_WRITE_MOUNTED isn't set and the block device is opened with restricted writes. When we switched to opening block devices as files we altered the mechanism by which we recognize when a block device has been opened with write restrictions. The detection logic assumed that only read-write mounted filesystems would apply write restrictions to their block devices from other openers. That of course is not true since it also makes sense to apply write restrictions for filesystems that are read-only. Fix the detection logic using an FMODE_* bit. We still have a few left since we freed up a couple a while ago. I also picked up a patch to free up four additional FMODE_* bits scheduled for the next merge window. - Fix counting the number of writers to a block device. This just changes the logic to be consistent. - Fix a bug in aio causing a NULL pointer derefernce after we implemented batched processing in aio. - Finally, add the changes we discussed that allows to yield block devices early even though file closing itself is deferred. This also allows us to remove two holder operations to get and release the holder to align lifetime of file and holder of the block device" * tag 'vfs-6.9-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: aio: Fix null ptr deref in aio_complete() wakeup fs,block: yield devices early block: count BLK_OPEN_RESTRICT_WRITES openers block: handle BLK_OPEN_RESTRICT_WRITES correctly
2024-04-05aio: Fix null ptr deref in aio_complete() wakeupKent Overstreet1-1/+1
list_del_init_careful() needs to be the last access to the wait queue entry - it effectively unlocks access. Previously, finish_wait() would see the empty list head and skip taking the lock, and then we'd return - but the completion path would still attempt to do the wakeup after the task_struct pointer had been overwritten. Fixes: 71eb6b6b0ba9 ("fs/aio: obey min_nr when doing wakeups") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-fsdevel/CAHTA-ubfwwB51A5Wg5M6H_rPEQK9pNf8FkAGH=vr=FEkyRrtqw@mail.gmail.com/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/stable/20240331215212.522544-1-kent.overstreet%40linux.dev Link: https://lore.kernel.org/r/20240331215212.522544-1-kent.overstreet@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-05bcachefs: Fix rebalance from durability=0 deviceKent Overstreet1-4/+13
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-05Merge tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefsLinus Torvalds39-494/+1869
Pull bcachefs repair code from Kent Overstreet: "A couple more small fixes, and new repair code. We can now automatically recover from arbitrary corrupted interior btree nodes by scanning, and we can reconstruct metadata as needed to bring a filesystem back into a working, consistent, read-write state and preserve access to whatevver wasn't corrupted. Meaning - you can blow away all metadata except for extents and dirents leaf nodes, and repair will reconstruct everything else and give you your data, and under the correct paths. If inodes are missing i_size will be slightly off and permissions/ownership/timestamps will be gone, and we do still need the snapshots btree if snapshots were in use - in the future we'll be able to guess the snapshot tree structure in some situations. IOW - aside from shaking out remaining bugs (fuzz testing is still coming), repair code should be complete and if repair ever doesn't work that's the highest priority bug that I want to know about immediately. This patchset was kindly tested by a user from India who accidentally wiped one drive out of a three drive filesystem with no replication on the family computer - it took a couple weeks but we got everything important back" * tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs: bcachefs: reconstruct_inode() bcachefs: Subvolume reconstruction bcachefs: Check for extents that point to same space bcachefs: Reconstruct missing snapshot nodes bcachefs: Flag btrees with missing data bcachefs: Topology repair now uses nodes found by scanning to fill holes bcachefs: Repair pass for scanning for btree nodes bcachefs: Don't skip fake btree roots in fsck bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake() bcachefs: Etyzinger cleanups bcachefs: bch2_shoot_down_journal_keys() bcachefs: Clear recovery_passes_required as they complete without errors bcachefs: ratelimit informational fsck errors bcachefs: Check for bad needs_discard before doing discard bcachefs: Improve bch2_btree_update_to_text() mean_and_variance: Drop always failing tests bcachefs: fix nocow lock deadlock bcachefs: BCH_WATERMARK_interior_updates bcachefs: Fix btree node reserve
2024-04-04bcachefs: Print shutdown journal sequence numberKent Overstreet1-0/+5
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-04bcachefs: Further improve btree_update_to_text()Kent Overstreet2-55/+48
Print start and end level of the btree update; also a bit of cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-04bcachefs: Move btree_updates to debugfsKent Overstreet2-9/+42
sysfs is limited to PAGE_SIZE, and when we're debugging strange deadlocks/priority inversions we need to see the full list. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-04bcachefs: Bump limit in btree_trans_too_many_iters()Kent Overstreet2-1/+15
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-04bcachefs: Make snapshot_is_ancestor() safeKent Overstreet1-6/+13
Snapshot table accesses generally need to be checking for invalid snapshot ID now, fix one that was missed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-04bcachefs: create debugfs dir for each btreeThomas Bertschinger1-15/+15
This creates a subdirectory for each individual btree under the btrees/ debugfs directory. Directory structure, before: /sys/kernel/debug/bcachefs/$FS_ID/btrees/ ├── alloc ├── alloc-bfloat-failed ├── alloc-formats ├── backpointers ├── backpointers-bfloat-failed ├── backpointers-formats ... Directory structure, after: /sys/kernel/debug/bcachefs/$FS_ID/btrees/ ├── alloc │   ├── bfloat-failed │   ├── formats │   └── keys ├── backpointers │   ├── bfloat-failed │   ├── formats │   └── keys ... Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>