summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2023-10-12btrfs: reformat remaining kdoc style commentsDavid Sterba18-51/+62
Function name in the comment does not bring much value to code not exposed as API and we don't stick to the kdoc format anymore. Update formatting of parameter descriptions. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: move functions comments from qgroup.h to qgroup.cDavid Sterba2-76/+71
We keep the comments next to the implementation, there were some left to move. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: comment about fsid and metadata_uuid relationshipAnand Jain1-0/+9
Add a comment explaining the relationship between fsid and metadata_uuid in the on-disk superblock and the in-memory struct btrfs_fs_devices. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: remove unused helpers for ulist aux dataJiapeng Chong1-10/+0
These functions are defined in the qgroup.c file, but not called anymore since commit "btrfs: qgroup: use qgroup_iterator_nested to in qgroup_update_refcnt()" so we can delete them. fs/btrfs/qgroup.c:149:19: warning: unused function 'qgroup_to_aux'. fs/btrfs/qgroup.c:154:36: warning: unused function 'unode_aux_to_qgroup'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6566 Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: prealloc btrfs_qgroup_list for __add_relation_rb()Qu Wenruo1-19/+59
Currently we go GFP_ATOMIC allocation for qgroup relation add, this includes the following 3 call sites: - btrfs_read_qgroup_config() This is not really needed, as at that time we're still in single thread mode, and no spin lock is held. - btrfs_add_qgroup_relation() This one is holding a spinlock, but we're ensured to add at most one relation, thus we can easily do a preallocation and use the preallocated memory to avoid GFP_ATOMIC. - btrfs_qgroup_inherit() This is a little more tricky, as we may have as many relationships as inherit::num_qgroups. Thus we have to properly allocate an array then preallocate all the memory. This patch would remove the GFP_ATOMIC allocation for above involved call sites, by doing preallocation before holding the spinlock, and let __add_relation_rb() to handle the freeing of the structure. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: pre-allocate btrfs_qgroup to reduce GFP_ATOMIC usageQu Wenruo1-26/+61
Qgroup is the heaviest user of GFP_ATOMIC, but one call site does not really need GFP_ATOMIC, that is add_qgroup_rb(). That function only searches the rbtree to find if we already have such entry. If not, then it would try to allocate memory for it. This means we can afford to pre-allocate such structure unconditionally, then free the memory if it's not needed. Considering this function is not a hot path, only utilized by the following functions: - btrfs_qgroup_inherit() For "btrfs subvolume snapshot -i" option. - btrfs_read_qgroup_config() At mount time, and we're ensured there would be no existing rb tree entry for each qgroup. - btrfs_create_qgroup() Thus we're completely safe to pre-allocate the extra memory for btrfs_qgroup structure, and reduce unnecessary GFP_ATOMIC usage. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: use qgroup_iterator_nested to in qgroup_update_refcnt()Qu Wenruo2-42/+53
The ulist @qgroups is utilized to record all involved qgroups from both old and new roots inside btrfs_qgroup_account_extent(). Due to the fact that qgroup_update_refcnt() itself is already utilizing qgroup_iterator, here we have to introduce another list_head, btrfs_qgroup::nested_iterator, allowing nested iteration. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: use qgroup_iterator to replace tmp ulist in ↵Qu Wenruo1-28/+11
qgroup_update_refcnt() For function qgroup_update_refcnt(), we use @tmp list to iterate all the involved qgroups of a subvolume. It's a perfect match for qgroup_iterator facility, as that @tmp ulist has a very limited lifespan (just inside the while() loop). By migrating to qgroup_iterator, we can get rid of the GFP_ATOMIC memory allocation and no error handling is needed. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: use qgroup_iterator in __qgroup_excl_accounting()Qu Wenruo1-64/+17
With the new qgroup_iterator_add() and qgroup_iterator_clean(), we can get rid of the ulist and its GFP_ATOMIC memory allocation. Furthermore we can merge the code handling the initial and parent qgroups into one loop, and drop the @tmp ulist parameter for involved call sites. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: use qgroup_iterator in qgroup_convert_meta()Qu Wenruo1-22/+10
With the new qgroup_iterator_add() and qgroup_iterator_clean(), we can get rid of the ulist and its GFP_ATOMIC memory allocation. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: use qgroup_iterator in btrfs_qgroup_free_refroot()Qu Wenruo1-22/+7
With the new qgroup_iterator_add() and qgroup_iterator_clean(), we can get rid of the ulist and its GFP_ATOMIC memory allocation. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: qgroup: iterate qgroups without memory allocation for qgroup_reserve()Qu Wenruo2-32/+38
Qgroup heavily relies on ulist to go through all the involved qgroups, but since we're using ulist inside fs_info->qgroup_lock spinlock, this means we're doing a lot of GFP_ATOMIC allocations. This patch reduces the GFP_ATOMIC usage for qgroup_reserve() by eliminating the memory allocation completely. This is done by moving the needed memory to btrfs_qgroup::iterator list_head, so that we can put all the involved qgroup into a on-stack list, thus eliminating the need to allocate memory while holding spinlock. The only cost is the slightly higher memory usage, but considering the reduce GFP_ATOMIC during a hot path, it should still be acceptable. Function qgroup_reserve() is the perfect start point for this conversion. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: remove extraneous includes from ctree.hJosef Bacik1-28/+0
We don't need any of these includes in the ctree.h header file for the header file itself, remove them to clean up ctree.h a little bit. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: include linux/security.h in super.cJosef Bacik1-0/+1
We use some of the security related code in here, include it in super.c so we can remove the include from ctree.h. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: include trace header in where necessaryJosef Bacik4-0/+4
If we no longer include the tracepoints from ctree.h we fail to compile because we have the dependency in some of the header files and source files. Add the include where we have these dependencies to allow us to remove the include from ctree.h. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: add btrfs_delayed_ref_head declaration to extent-tree.hJosef Bacik1-0/+1
extent-tree.h uses btrfs_delayed_ref_head in a function argument but doesn't pull it's declaration from anywhere, add it to the top of the header. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: add fscrypt related dependencies to respective headersJosef Bacik4-0/+6
These headers have struct fscrypt_str as function arguments, so add struct fscrypt_str to the theader, and include linux/fscrypt.h in btrfs_inode.h as it also needs the definition of struct fscrypt_name for the new inode args. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: include linux/iomap.h in file.cJosef Bacik1-0/+1
We use the iomap code in file.c, include it so we have our dependencies. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: include asm/unaligned.h in accessors.hJosef Bacik1-0/+1
We use the unaligned helpers directly in accessors.h, add the include here. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: move btrfs_name_hash to dir-item.hJosef Bacik4-5/+9
This is related to the name hashing for dir items, move it into dir-item.h. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: move btrfs_extref_hash into inode-item.hJosef Bacik2-9/+7
Ideally this would be un-inlined, but that is a cleanup for later. For now move this into inode-item.h, which is where the extref code lives. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: remove btrfs_crc32c wrapperJosef Bacik4-13/+8
This simply sends the same arguments into crc32c(), and is just used in a few places. Remove this wrapper and directly call crc32c() in these instances. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: move btrfs_crc32c_final into free-space-cache.cJosef Bacik2-5/+5
This is the only place this helper is used, take it out of ctree.h and move it into free-space-cache.c. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: do not require EXTENT_NOWAIT for btrfs_redirty_list_add()Qu Wenruo1-1/+1
The flag EXTENT_NOWAIT is a special flag to notify extent-io-tree code that this operation should not sleep for the extent state preallocation. However for btrfs_redirty_list_add(), all callers are able to sleep: - clean_log_buffer() Just 2 lines before, we call btrfs_pin_reserved_extent(), which calls pin_down_extent(), and that function does not require EXTENT_NOWAIT. Thus we're safe to call it without EXTENT_NOWAIT. - btrfs_free_tree_block() This function have several call sites which trigger tree read, e.g. walk_up_proc(), thus we're safe to call it without EXTENT_NOWAIT. Thus there is no need to require EXTENT_NOWAIT flag. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: sipmlify uuid parameters of alloc_fs_devices()Anand Jain2-17/+18
Among all the callers, only the device_list_add() function uses the second argument of alloc_fs_devices(). It passes metadata_uuid when available, otherwise, it passes NULL. And in turn, alloc_fs_devices() is designed to copy either metadata_uuid or fsid into fs_devices::metadata_uuid. So remove the second argument in alloc_fs_devices(), and always copy the fsid. In the caller device_list_add() function, we will overwrite it with metadata_uuid when it is available. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: update comment for reservation of metadata space for delayed itemsFilipe Manana1-1/+1
The second comment at btrfs_delayed_item_reserve_metadata() refers to a field named "index_items_size" of a delayed inode, however that field does not exists - it existed in a previous patch version, but then it split into the fields "curr_index_batch_size" and "index_item_leaves" in the final patch version that was picked. So update the comment. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12Merge tag 'fs_for_v6.6-rc6' of ↵Linus Torvalds1-27/+39
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota regression fix from Jan Kara. * tag 'fs_for_v6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Fix slow quotaoff
2023-10-11Merge tag 'for-6.6-rc5-tag' of ↵Linus Torvalds3-6/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A revert of recent mount option parsing fix, this breaks mounts with security options. The second patch is a flexible array annotation" * tag 'for-6.6-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: add __counted_by for struct btrfs_delayed_item and use struct_size() Revert "btrfs: reject unknown mount options early"
2023-10-11btrfs: add __counted_by for struct btrfs_delayed_item and use struct_size()Gustavo A. R. Silva2-2/+2
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). While there, use struct_size() helper, instead of the open-coded version, to calculate the size for the allocation of the whole flexible structure, including of course, the flexible-array member. This code was found with the help of Coccinelle, and audited and fixed manually. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-10Revert "btrfs: reject unknown mount options early"David Sterba1-4/+0
This reverts commit 5f521494cc73520ffac18ede0758883b9aedd018. The patch breaks mounts with security mount options like $ mount -o context=system_u:object_r:root_t:s0 /dev/sdX /mn mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdX, missing codepage or helper program, ... We cannot reject all unknown options in btrfs_parse_subvol_options() as intended, the security options can be present at this point and it's not possible to enumerate them in a future proof way. This means unknown mount options are silently accepted like before when the filesystem is mounted with either -o subvol=/path or as followup mounts of the same device. Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-08Merge tag '6.6-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds10-45/+151
Pull smb server fixes from Steve French: "Six SMB3 server fixes for various races found by RO0T Lab of Huawei: - Fix oops when racing between oplock break ack and freeing file - Simultaneous request fixes for parallel logoffs, and for parallel lock requests - Fixes for tree disconnect race, session expire race, and close/open race" * tag '6.6-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: fix race condition between tree conn lookup and disconnect ksmbd: fix race condition from parallel smb2 lock requests ksmbd: fix race condition from parallel smb2 logoff requests ksmbd: fix uaf in smb20_oplock_break_ack ksmbd: fix race condition with fp ksmbd: fix race condition between session lookup and expire
2023-10-07Merge tag '6.6-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds1-13/+13
Pull smb client fixes from Steve French: - protect cifs/smb3 socket connect from BPF address overwrite - fix case when directory leases disabled but wasting resources with unneeded thread on each mount * tag '6.6-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: do not start laundromat thread on nohandlecache smb: use kernel_connect() and kernel_bind()
2023-10-07Merge tag 'xfs-6.6-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds6-117/+311
Pull xfs fixes from Chandan Babu: - Prevent filesystem hang when executing fstrim operations on large and slow storage * tag 'xfs-6.6-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: abort fstrim if kernel is suspending xfs: reduce AGF hold times during fstrim operations xfs: move log discard work to xfs_discard.c
2023-10-06Merge tag 'for-6.6-rc4-tag' of ↵Linus Torvalds4-17/+47
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - reject unknown mount options - adjust transaction abort error message level - fix one more build warning with -Wmaybe-uninitialized - proper error handling in several COW-related cases * tag 'for-6.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: error out when reallocating block for defrag using a stale transaction btrfs: error when COWing block from a root that is being deleted btrfs: error out when COWing block using a stale transaction btrfs: always print transaction aborted messages with an error level btrfs: reject unknown mount options early btrfs: fix some -Wmaybe-uninitialized warnings in ioctl.c
2023-10-06quota: Fix slow quotaoffJan Kara1-27/+39
Eric has reported that commit dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide") heavily increases runtime of generic/270 xfstest for ext4 in nojournal mode. The reason for this is that ext4 in nojournal mode leaves dquots dirty until the last dqput() and thus the cleanup done in quota_release_workfn() has to write them all. Due to the way quota_release_workfn() is written this results in synchronize_srcu() call for each dirty dquot which makes the dquot cleanup when turning quotas off extremely slow. To be able to avoid synchronize_srcu() for each dirty dquot we need to rework how we track dquots to be cleaned up. Instead of keeping the last dquot reference while it is on releasing_dquots list, we drop it right away and mark the dquot with new DQ_RELEASING_B bit instead. This way we can we can remove dquot from releasing_dquots list when new reference to it is acquired and thus there's no need to call synchronize_srcu() each time we drop dq_list_lock. References: https://lore.kernel.org/all/ZRytn6CxFK2oECUt@debian-BULLSEYE-live-builder-AMD64 Reported-by: Eric Whitney <enwlinux@gmail.com> Fixes: dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2023-10-06Merge tag 'erofs-for-6.6-rc5-fixes' of ↵Linus Torvalds2-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: - Fix a memory leak issue when using LZMA global compressed deduplication - Fix empty device tags in flatdev mode - Update documentation for recent new features * tag 'erofs-for-6.6-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: update documentation erofs: allow empty device tags in flatdev mode erofs: fix memory leak of LZMA global compressed deduplication
2023-10-05Merge tag 'ovl-fixes-6.6-rc5' of ↵Linus Torvalds5-30/+28
git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs Pull overlayfs fixes from Amir Goldstein: - Fix for file reference leak regression - Fix for NULL pointer deref regression - Fixes for RCU-walk race regressions: Two of the fixes were taken from Al's RCU pathwalk race fixes series with his consent [1]. Note that unlike most of Al's series, these two patches are not about racing with ->kill_sb() and they are also very recent regressions from v6.5, so I think it's worth getting them into v6.5.y. There is also a fix for an RCU pathwalk race with ->kill_sb(), which may have been solved in vfs generic code as you suggested, but it also rids overlayfs from a nasty hack, so I think it's worth anyway. Link: https://lore.kernel.org/linux-fsdevel/20231003204749.GA800259@ZenIV/ [1] * tag 'ovl-fixes-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs: ovl: fix NULL pointer defer when encoding non-decodable lower fid ovl: make use of ->layers safe in rcu pathwalk ovl: fetch inode once in ovl_dentry_revalidate_common() ovl: move freeing ovl_entry past rcu delay ovl: fix file reference leak when submitting aio
2023-10-05ksmbd: fix race condition between tree conn lookup and disconnectNamjae Jeon6-17/+90
if thread A in smb2_write is using work-tcon, other thread B use smb2_tree_disconnect free the tcon, then thread A will use free'd tcon. Time + Thread A | Thread A smb2_write | smb2_tree_disconnect | | | kfree(tree_conn) | // UAF! | work->tcon->share_conf | + This patch add state, reference count and lock for tree conn to fix race condition issue. Reported-by: luosili <rootlab@huawei.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-05ksmbd: fix race condition from parallel smb2 lock requestsNamjae Jeon1-11/+1
There is a race condition issue between parallel smb2 lock request. Time + Thread A | Thread A smb2_lock | smb2_lock | insert smb_lock to lock_list | spin_unlock(&work->conn->llist_lock) | | | spin_lock(&conn->llist_lock); | kfree(cmp_lock); | // UAF! | list_add(&smb_lock->llist, &rollback_list) + This patch swaps the line for adding the smb lock to the rollback list and adding the lock list of connection to fix the race issue. Reported-by: luosili <rootlab@huawei.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-05ksmbd: fix race condition from parallel smb2 logoff requestsNamjae Jeon1-8/+16
If parallel smb2 logoff requests come in before closing door, running request count becomes more than 1 even though connection status is set to KSMBD_SESS_NEED_RECONNECT. It can't get condition true, and sleep forever. This patch fix race condition problem by returning error if connection status was already set to KSMBD_SESS_NEED_RECONNECT. Reported-by: luosili <rootlab@huawei.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-05ksmbd: fix uaf in smb20_oplock_break_ackluosili1-2/+2
drop reference after use opinfo. Signed-off-by: luosili <rootlab@huawei.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-05ksmbd: fix race condition with fpNamjae Jeon3-4/+32
fp can used in each command. If smb2_close command is coming at the same time, UAF issue can happen by race condition. Time + Thread A | Thread B1 B2 .... B5 smb2_open | smb2_close | __open_id | insert fp to file_table | | | atomic_dec_and_test(&fp->refcount) | if fp->refcount == 0, free fp by kfree. // UAF! | use fp | + This patch add f_state not to use freed fp is used and not to free fp in use. Reported-by: luosili <rootlab@huawei.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-05ksmbd: fix race condition between session lookup and expireNamjae Jeon3-3/+10
Thread A + Thread B ksmbd_session_lookup | smb2_sess_setup sess = xa_load | | | xa_erase(&conn->sessions, sess->id); | | ksmbd_session_destroy(sess) --> kfree(sess) | // UAF! | sess->last_active = jiffies | + This patch add rwsem to fix race condition between ksmbd_session_lookup and ksmbd_expire_session. Reported-by: luosili <rootlab@huawei.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-05smb: client: do not start laundromat thread on nohandlecachePaulo Alcantara1-8/+8
Honor 'nohandlecache' mount option by not starting laundromat thread even when SMB server supports directory leases. Do not waste system resources by having laundromat thread running with no directory caching at all. Fixes: 2da338ff752a ("smb3: do not start laundromat thread when dir leases disabled") Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-05smb: use kernel_connect() and kernel_bind()Jordan Rife1-5/+5
Recent changes to kernel_connect() and kernel_bind() ensure that callers are insulated from changes to the address parameter made by BPF SOCK_ADDR hooks. This patch wraps direct calls to ops->connect() and ops->bind() with kernel_connect() and kernel_bind() to ensure that SMB mounts do not see their mount address overwritten in such cases. Link: https://lore.kernel.org/netdev/9944248dba1bce861375fcce9de663934d933ba9.camel@redhat.com/ Cc: <stable@vger.kernel.org> # 6.0+ Signed-off-by: Jordan Rife <jrife@google.com> Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-04btrfs: error out when reallocating block for defrag using a stale transactionFilipe Manana1-2/+16
At btrfs_realloc_node() we have these checks to verify we are not using a stale transaction (a past transaction with an unblocked state or higher), and the only thing we do is to trigger two WARN_ON(). This however is a critical problem, highly unexpected and if it happens it's most likely due to a bug, so we should error out and turn the fs into error state so that such issue is much more easily noticed if it's triggered. The problem is critical because in btrfs_realloc_node() we COW tree blocks, and using such stale transaction will lead to not persisting the extent buffers used for the COW operations, as allocating tree block adds the range of the respective extent buffers to the ->dirty_pages iotree of the transaction, and a stale transaction, in the unlocked state or higher, will not flush dirty extent buffers anymore, therefore resulting in not persisting the tree block and resource leaks (not cleaning the dirty_pages iotree for example). So do the following changes: 1) Return -EUCLEAN if we find a stale transaction; 2) Turn the fs into error state, with error -EUCLEAN, so that no transaction can be committed, and generate a stack trace; 3) Combine both conditions into a single if statement, as both are related and have the same error message; 4) Mark the check as unlikely, since this is not expected to ever happen. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04btrfs: error when COWing block from a root that is being deletedFilipe Manana1-3/+7
At btrfs_cow_block() we check if the block being COWed belongs to a root that is being deleted and if so we log an error message. However this is an unexpected case and it indicates a bug somewhere, so we should return an error and abort the transaction. So change this in the following ways: 1) Abort the transaction with -EUCLEAN, so that if the issue ever happens it can easily be noticed; 2) Change the logged message level from error to critical, and change the message itself to print the block's logical address and the ID of the root; 3) Return -EUCLEAN to the caller; 4) As this is an unexpected scenario, that should never happen, mark the check as unlikely, allowing the compiler to potentially generate better code. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04btrfs: error out when COWing block using a stale transactionFilipe Manana1-8/+16
At btrfs_cow_block() we have these checks to verify we are not using a stale transaction (a past transaction with an unblocked state or higher), and the only thing we do is to trigger a WARN with a message and a stack trace. This however is a critical problem, highly unexpected and if it happens it's most likely due to a bug, so we should error out and turn the fs into error state so that such issue is much more easily noticed if it's triggered. The problem is critical because using such stale transaction will lead to not persisting the extent buffer used for the COW operation, as allocating a tree block adds the range of the respective extent buffer to the ->dirty_pages iotree of the transaction, and a stale transaction, in the unlocked state or higher, will not flush dirty extent buffers anymore, therefore resulting in not persisting the tree block and resource leaks (not cleaning the dirty_pages iotree for example). So do the following changes: 1) Return -EUCLEAN if we find a stale transaction; 2) Turn the fs into error state, with error -EUCLEAN, so that no transaction can be committed, and generate a stack trace; 3) Combine both conditions into a single if statement, as both are related and have the same error message; 4) Mark the check as unlikely, since this is not expected to ever happen. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04btrfs: always print transaction aborted messages with an error levelFilipe Manana1-2/+2
Commit b7af0635c87f ("btrfs: print transaction aborted messages with an error level") changed the log level of transaction aborted messages from a debug level to an error level, so that such messages are always visible even on production systems where the log level is normally above the debug level (and also on some syzbot reports). Later, commit fccf0c842ed4 ("btrfs: move btrfs_abort_transaction to transaction.c") changed the log level back to debug level when the error number for a transaction abort should not have a stack trace printed. This happened for absolutely no reason. It's always useful to print transaction abort messages with an error level, regardless of whether the error number should cause a stack trace or not. So change back the log level to error level. Fixes: fccf0c842ed4 ("btrfs: move btrfs_abort_transaction to transaction.c") CC: stable@vger.kernel.org # 6.5+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-04btrfs: reject unknown mount options earlyQu Wenruo1-0/+4
[BUG] The following script would allow invalid mount options to be specified (although such invalid options would just be ignored): # mkfs.btrfs -f $dev # mount $dev $mnt1 <<< Successful mount expected # mount $dev $mnt2 -o junk <<< Failed mount expected # echo $? 0 [CAUSE] For the 2nd mount, since the fs is already mounted, we won't go through open_ctree() thus no btrfs_parse_options(), but only through btrfs_parse_subvol_options(). However we do not treat unrecognized options from valid but irrelevant options, thus those invalid options would just be ignored by btrfs_parse_subvol_options(). [FIX] Add the handling for Opt_err to handle invalid options and error out, while still ignore other valid options inside btrfs_parse_subvol_options(). Reported-by: Anand Jain <anand.jain@oracle.com> CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>