summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2023-10-28fs: Convert to bdev_open_by_dev()Jan Kara3-8/+11
Convert mount code to use bdev_open_by_dev() and propagate the handle around to bdev_release(). Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-19-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-1/+1
Pull misc filesystem fixes from Al Viro: "Assorted fixes all over the place: literally nothing in common, could have been three separate pull requests. All are simple regression fixes, but not for anything from this cycle" * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ceph_wait_on_conflict_unlink(): grab reference before dropping ->d_lock io_uring: kiocb_done() should *not* trust ->ki_pos if ->{read,write}_iter() failed sparc32: fix a braino in fault handling in csum_and_copy_..._user()
2023-10-28ceph_wait_on_conflict_unlink(): grab reference before dropping ->d_lockAl Viro1-1/+1
Use of dget() after we'd dropped ->d_lock is too late - dentry might be gone by that point. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-10-25file, i915: fix file reference for mmap_singleton()Christian Brauner1-0/+25
Today we got a report at [1] for rcu stalls on the i915 testsuite in [2] due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict, get_file_rcu() goes into an infinite loop trying to carefully verify that i915->gem.mmap_singleton hasn't changed - see the splat below. So I stared at this code to figure out what it actually does. It seems that the i915->gem.mmap_singleton pointer itself never had rcu semantics. The i915->gem.mmap_singleton is replaced in file->f_op->release::singleton_release(): static int singleton_release(struct inode *inode, struct file *file) { struct drm_i915_private *i915 = file->private_data; cmpxchg(&i915->gem.mmap_singleton, file, NULL); drm_dev_put(&i915->drm); return 0; } The cmpxchg() is ordered against a concurrent update of i915->gem.mmap_singleton from mmap_singleton(). IOW, when mmap_singleton() fails to get a reference on i915->gem.mmap_singleton: While mmap_singleton() does rcu_read_lock(); file = get_file_rcu(&i915->gem.mmap_singleton); rcu_read_unlock(); it allocates a new file via anon_inode_getfile() and does smp_store_mb(i915->gem.mmap_singleton, file); So, then what happens in the case of this bug is that at some point fput() is called and drops the file->f_count to zero leaving the pointer in i915->gem.mmap_singleton in tact. Now, there might be delays until file->f_op->release::singleton_release() is called and i915->gem.mmap_singleton is set to NULL. Say concurrently another task hits mmap_singleton() and does: rcu_read_lock(); file = get_file_rcu(&i915->gem.mmap_singleton); rcu_read_unlock(); When get_file_rcu() fails to get a reference via atomic_inc_not_zero() it will try the reload from i915->gem.mmap_singleton expecting it to be NULL, assuming it has comparable semantics as we expect in __fget_files_rcu(). But it hasn't so it reloads the same pointer again, trying the same atomic_inc_not_zero() again and doing so until file->f_op->release::singleton_release() of the old file has been called. So, in contrast to __fget_files_rcu() here we want to not retry when atomic_inc_not_zero() has failed. We only want to retry in case we managed to get a reference but the pointer did change on reload. <3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: <3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238 <3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20) <6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002 <6> [511.395962] Call Trace: <6> [511.395966] <TASK> <6> [511.395974] ? __schedule+0x3a8/0xd70 <6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 <6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140 <6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 <6> [511.396029] ? get_file_rcu+0x10/0x30 <6> [511.396039] ? get_file_rcu+0x10/0x30 <6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915] <6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915] <6> [511.396903] ? mmap_region+0x253/0xb60 <6> [511.396925] ? do_mmap+0x334/0x5c0 <6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0 <6> [511.396949] ? rcu_is_watching+0x11/0x50 <6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915] <6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915] <6> [511.397762] ? igt_mmap+0x11e/0x150 [i915] <6> [511.398139] ? __trace_bprintk+0x76/0x90 <6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915] <6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915] <6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915] <6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915] <6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915] <6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915] <6> [511.400692] ? pci_device_probe+0x95/0x120 <6> [511.400704] ? really_probe+0x164/0x3c0 <6> [511.400715] ? __pfx___driver_attach+0x10/0x10 <6> [511.400722] ? __driver_probe_device+0x73/0x160 <6> [511.400731] ? driver_probe_device+0x19/0xa0 <6> [511.400741] ? __driver_attach+0xb6/0x180 <6> [511.400749] ? __pfx___driver_attach+0x10/0x10 <6> [511.400756] ? bus_for_each_dev+0x77/0xd0 <6> [511.400770] ? bus_add_driver+0x114/0x210 <6> [511.400781] ? driver_register+0x5b/0x110 <6> [511.400791] ? i915_init+0x23/0xc0 [i915] <6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915] <6> [511.401503] ? do_one_initcall+0x57/0x270 <6> [511.401515] ? rcu_is_watching+0x11/0x50 <6> [511.401521] ? kmalloc_trace+0xa3/0xb0 <6> [511.401532] ? do_init_module+0x5f/0x210 <6> [511.401544] ? load_module+0x1d00/0x1f60 <6> [511.401581] ? init_module_from_file+0x86/0xd0 <6> [511.401590] ? init_module_from_file+0x86/0xd0 <6> [511.401613] ? idempotent_init_module+0x17c/0x230 <6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0 <6> [511.401650] ? do_syscall_64+0x3c/0x90 <6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8 <6> [511.401684] </TASK> Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963 Cc: Jann Horn <jannh@google.com>, Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-25ext2: Convert ext2_prepare_chunk and ext2_commit_chunk to foliosMatthew Wilcox (Oracle)1-15/+14
All callers now have a folio, so pass it in. Saves one call to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz>
2023-10-25ext2: Convert ext2_make_empty() to use a folioMatthew Wilcox (Oracle)1-8/+8
Remove two hidden calls to compound_head() by using the folio API. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230921200746.3303942-9-willy@infradead.org>
2023-10-25ext2: Convert ext2_unlink() and ext2_rename() to use foliosMatthew Wilcox (Oracle)3-75/+55
This involves changing ext2_find_entry(), ext2_dotdot(), ext2_inode_by_name(), ext2_set_link() and ext2_delete_entry() to take a folio. These were also the last users of ext2_get_page() and ext2_put_page(), so remove those at the same time. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230921200746.3303942-8-willy@infradead.org>
2023-10-25ext2: Convert ext2_delete_entry() to use foliosMatthew Wilcox (Oracle)1-13/+17
Save some calls to compound_head() by using the folio API. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230921200746.3303942-7-willy@infradead.org>
2023-10-25ext2: Convert ext2_empty_dir() to use a folioMatthew Wilcox (Oracle)1-5/+5
Save two calls to compound_head() by using the folio API. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230921200746.3303942-6-willy@infradead.org>
2023-10-25ext2: Convert ext2_add_link() to use a folioMatthew Wilcox (Oracle)1-12/+12
Remove five hidden calls to compound_head() and fix a couple of places that assumed PAGE_SIZE. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230921200746.3303942-5-willy@infradead.org>
2023-10-25ext2: Convert ext2_readdir to use a folioMatthew Wilcox (Oracle)1-5/+5
Saves a hidden call to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230921200746.3303942-4-willy@infradead.org>
2023-10-25ext2: Add ext2_get_folio()Matthew Wilcox (Oracle)1-12/+24
Convert ext2_get_page() into ext2_get_folio() and keep the original function around as a temporary wrapper. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230921200746.3303942-3-willy@infradead.org>
2023-10-25ext2: Convert ext2_check_page to ext2_check_folioMatthew Wilcox (Oracle)1-14/+14
Support in this function for large folios is limited to supporting filesystems with block size > PAGE_SIZE. This new functionality will only be supported on machines without HIGHMEM, so the problem of kmap_local only being able to map a single page in the folio can be ignored. We will not use large folios for ext2 directories on HIGHMEM machines. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230921200746.3303942-2-willy@infradead.org>
2023-10-24autofs: fix add autofs_parse_fd()Ian Kent2-4/+11
We are seeing systemd hang on its autofs direct mount at /proc/sys/fs/binfmt_misc. Historically this was due to a mismatch in the communication structure size between a 64 bit kernel and a 32 bit user space and was fixed by making the pipe communication record oriented. During autofs v5 development I decided to stay with the existing usage instead of changing to a packed structure for autofs <=> user space communications which turned out to be a mistake on my part. Problems arose and they were fixed by allowing for the 64 bit to 32 bit size difference in the automount(8) code. Along the way systemd started to use autofs and eventually encountered this problem too. systemd refused to compensate for the length difference insisting it be fixed in the kernel. Fortunately Linus implemented the packetized pipe which resolved the problem in a straight forward and simple way. In the autofs mount api conversion series I inadvertatly dropped the packet pipe flag settings when adding the autofs_parse_fd() function. This patch fixes that omission. Fixes: 546694b8f658 ("autofs: add autofs_parse_fd()") Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/20231023093359.64265-1-raven@themaw.net Tested-by: Anders Roxell <anders.roxell@linaro.org> Cc: Bill O'Donnell <bodonnel@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Reported-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-24vfs: Convert BUG_ON to WARN_ON_ONCE in open_last_lookupsBernd Schubert1-1/+2
The calling code actually handles -ECHILD, so this BUG_ON can be converted to WARN_ON_ONCE. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Link: https://lore.kernel.org/r/20231023184718.11143-1-bschubert@ddn.com Cc: Christian Brauner <brauner@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Dharmendra Singh <dsingh@ddn.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-24Merge tag 'pull-nfsd-fix' of ↵Linus Torvalds1-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull nfsd fix from Al Viro: "Catch from lock_rename() audit; nfsd_rename() checked that both directories belonged to the same filesystem, but only after having done lock_rename(). Trivial fix, tested and acked by nfs folks" * tag 'pull-nfsd-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nfsd: lock_rename() needs both directories to live on the same fs
2023-10-23Merge tag 'for-6.6-rc7-tag' of ↵Linus Torvalds5-15/+33
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "One more fix for a problem with snapshot of a newly created subvolume that can lead to inconsistent data under some circumstances. Kernel 6.5 added a performance optimization to skip transaction commit for subvolume creation but this could end up with newer data on disk but not linked to other structures. The fix itself is an added condition, the rest of the patch is a parameter added to several functions" * tag 'for-6.6-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix unwritten extent buffer after snapshotting a new subvolume
2023-10-23btrfs: fix unwritten extent buffer after snapshotting a new subvolumeFilipe Manana5-15/+33
When creating a snapshot of a subvolume that was created in the current transaction, we can end up not persisting a dirty extent buffer that is referenced by the snapshot, resulting in IO errors due to checksum failures when trying to read the extent buffer later from disk. A sequence of steps that leads to this is the following: 1) At ioctl.c:create_subvol() we allocate an extent buffer, with logical address 36007936, for the leaf/root of a new subvolume that has an ID of 291. We mark the extent buffer as dirty, and at this point the subvolume tree has a single node/leaf which is also its root (level 0); 2) We no longer commit the transaction used to create the subvolume at create_subvol(). We used to, but that was recently removed in commit 1b53e51a4a8f ("btrfs: don't commit transaction for every subvol create"); 3) The transaction used to create the subvolume has an ID of 33, so the extent buffer 36007936 has a generation of 33; 4) Several updates happen to subvolume 291 during transaction 33, several files created and its tree height changes from 0 to 1, so we end up with a new root at level 1 and the extent buffer 36007936 is now a leaf of that new root node, which is extent buffer 36048896. The commit root remains as 36007936, since we are still at transaction 33; 5) Creation of a snapshot of subvolume 291, with an ID of 292, starts at ioctl.c:create_snapshot(). This triggers a commit of transaction 33 and we end up at transaction.c:create_pending_snapshot(), in the critical section of a transaction commit. There we COW the root of subvolume 291, which is extent buffer 36048896. The COW operation returns extent buffer 36048896, since there's no need to COW because the extent buffer was created in this transaction and it was not written yet. The we call btrfs_copy_root() against the root node 36048896. During this operation we allocate a new extent buffer to turn into the root node of the snapshot, copy the contents of the root node 36048896 into this snapshot root extent buffer, set the owner to 292 (the ID of the snapshot), etc, and then we call btrfs_inc_ref(). This will create a delayed reference for each leaf pointed by the root node with a reference root of 292 - this includes a reference for the leaf 36007936. After that we set the bit BTRFS_ROOT_FORCE_COW in the root's state. Then we call btrfs_insert_dir_item(), to create the directory entry in in the tree of subvolume 291 that points to the snapshot. This ends up needing to modify leaf 36007936 to insert the respective directory items. Because the bit BTRFS_ROOT_FORCE_COW is set for the root's state, we need to COW the leaf. We end up at btrfs_force_cow_block() and then at update_ref_for_cow(). At update_ref_for_cow() we call btrfs_block_can_be_shared() which returns false, despite the fact the leaf 36007936 is shared - the subvolume's root and the snapshot's root point to that leaf. The reason that it incorrectly returns false is because the commit root of the subvolume is extent buffer 36007936 - it was the initial root of the subvolume when we created it. So btrfs_block_can_be_shared() which has the following logic: int btrfs_block_can_be_shared(struct btrfs_root *root, struct extent_buffer *buf) { if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state) && buf != root->node && buf != root->commit_root && (btrfs_header_generation(buf) <= btrfs_root_last_snapshot(&root->root_item) || btrfs_header_flag(buf, BTRFS_HEADER_FLAG_RELOC))) return 1; return 0; } Returns false (0) since 'buf' (extent buffer 36007936) matches the root's commit root. As a result, at update_ref_for_cow(), we don't check for the number of references for extent buffer 36007936, we just assume it's not shared and therefore that it has only 1 reference, so we set the local variable 'refs' to 1. Later on, in the final if-else statement at update_ref_for_cow(): static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *buf, struct extent_buffer *cow, int *last_ref) { (...) if (refs > 1) { (...) } else { (...) btrfs_clear_buffer_dirty(trans, buf); *last_ref = 1; } } So we mark the extent buffer 36007936 as not dirty, and as a result we don't write it to disk later in the transaction commit, despite the fact that the snapshot's root points to it. Attempting to access the leaf or dumping the tree for example shows that the extent buffer was not written: $ btrfs inspect-internal dump-tree -t 292 /dev/sdb btrfs-progs v6.2.2 file tree key (292 ROOT_ITEM 33) node 36110336 level 1 items 2 free space 119 generation 33 owner 292 node 36110336 flags 0x1(WRITTEN) backref revision 1 checksum stored a8103e3e checksum calced a8103e3e fs uuid 90c9a46f-ae9f-4626-9aff-0cbf3e2e3a79 chunk uuid e8c9c885-78f4-4d31-85fe-89e5f5fd4a07 key (256 INODE_ITEM 0) block 36007936 gen 33 key (257 EXTENT_DATA 0) block 36052992 gen 33 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 total bytes 107374182400 bytes used 38572032 uuid 90c9a46f-ae9f-4626-9aff-0cbf3e2e3a79 The respective on disk region is full of zeroes as the device was trimmed at mkfs time. Obviously 'btrfs check' also detects and complains about this: $ btrfs check /dev/sdb Opening filesystem to check... Checking filesystem on /dev/sdb UUID: 90c9a46f-ae9f-4626-9aff-0cbf3e2e3a79 generation: 33 (33) [1/7] checking root items [2/7] checking extents checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 bad tree block 36007936, bytenr mismatch, want=36007936, have=0 owner ref check failed [36007936 4096] ERROR: errors found in extent allocation tree or chunk allocation [3/7] checking free space tree [4/7] checking fs roots checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29 bad tree block 36007936, bytenr mismatch, want=36007936, have=0 The following tree block(s) is corrupted in tree 292: tree block bytenr: 36110336, level: 1, node key: (256, 1, 0) root 292 root dir 256 not found ERROR: errors found in fs roots found 38572032 bytes used, error(s) found total csum bytes: 16048 total tree bytes: 1265664 total fs tree bytes: 1118208 total extent tree bytes: 65536 btree space waste bytes: 562598 file data blocks allocated: 65978368 referenced 36569088 Fix this by updating btrfs_block_can_be_shared() to consider that an extent buffer may be shared if it matches the commit root and if its generation matches the current transaction's generation. This can be reproduced with the following script: $ cat test.sh #!/bin/bash MNT=/mnt/sdi DEV=/dev/sdi # Use a filesystem with a 64K node size so that we have the same node # size on every machine regardless of its page size (on x86_64 default # node size is 16K due to the 4K page size, while on PPC it's 64K by # default). This way we can make sure we are able to create a btree for # the subvolume with a height of 2. mkfs.btrfs -f -n 64K $DEV mount $DEV $MNT btrfs subvolume create $MNT/subvol # Create a few empty files on the subvolume, this bumps its btree # height to 2 (root node at level 1 and 2 leaves). for ((i = 1; i <= 300; i++)); do echo -n > $MNT/subvol/file_$i done btrfs subvolume snapshot -r $MNT/subvol $MNT/subvol/snap umount $DEV btrfs check $DEV Running it on a 6.5 kernel (or any 6.6-rc kernel at the moment): $ ./test.sh Create subvolume '/mnt/sdi/subvol' Create a readonly snapshot of '/mnt/sdi/subvol' in '/mnt/sdi/subvol/snap' Opening filesystem to check... Checking filesystem on /dev/sdi UUID: bbdde2ff-7d02-45ca-8a73-3c36f23755a1 [1/7] checking root items [2/7] checking extents parent transid verify failed on 30539776 wanted 7 found 5 parent transid verify failed on 30539776 wanted 7 found 5 parent transid verify failed on 30539776 wanted 7 found 5 Ignoring transid failure owner ref check failed [30539776 65536] ERROR: errors found in extent allocation tree or chunk allocation [3/7] checking free space tree [4/7] checking fs roots parent transid verify failed on 30539776 wanted 7 found 5 Ignoring transid failure Wrong key of child node/leaf, wanted: (256, 1, 0), have: (2, 132, 0) Wrong generation of child node/leaf, wanted: 5, have: 7 root 257 root dir 256 not found ERROR: errors found in fs roots found 917504 bytes used, error(s) found total csum bytes: 0 total tree bytes: 851968 total fs tree bytes: 393216 total extent tree bytes: 65536 btree space waste bytes: 736550 file data blocks allocated: 0 referenced 0 A test case for fstests will follow soon. Fixes: 1b53e51a4a8f ("btrfs: don't commit transaction for every subvol create") CC: stable@vger.kernel.org # 6.5+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-23ksmbd: add support for surrogate pair conversionNamjae Jeon1-49/+138
ksmbd is missing supporting to convert filename included surrogate pair characters. It triggers a "file or folder does not exist" error in Windows client. [Steps to Reproduce for bug] 1. Create surrogate pair file touch $(echo -e '\xf0\x9d\x9f\xa3') touch $(echo -e '\xf0\x9d\x9f\xa4') 2. Try to open these files in ksmbd share through Windows client. This patch update unicode functions not to consider about surrogate pair (and IVS). Reviewed-by: Marios Makassikis <mmakassikis@freebox.fr> Tested-by: Marios Makassikis <mmakassikis@freebox.fr> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-23ksmbd: fix missing RDMA-capable flag for IPoIB device in ↵Kangjing Huang1-10/+30
ksmbd_rdma_capable_netdev() Physical ib_device does not have an underlying net_device, thus its association with IPoIB net_device cannot be retrieved via ops.get_netdev() or ib_device_get_by_netdev(). ksmbd reads physical ib_device port GUID from the lower 16 bytes of the hardware addresses on IPoIB net_device and match its underlying ib_device using ib_find_gid() Signed-off-by: Kangjing Huang <huangkangjing@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-23ksmbd: fix recursive locking in vfs helpersMarios Makassikis1-20/+3
Running smb2.rename test from Samba smbtorture suite against a kernel built with lockdep triggers a "possible recursive locking detected" warning. This is because mnt_want_write() is called twice with no mnt_drop_write() in between: -> ksmbd_vfs_mkdir() -> ksmbd_vfs_kern_path_create() -> kern_path_create() -> filename_create() -> mnt_want_write() -> mnt_want_write() Fix this by removing the mnt_want_write/mnt_drop_write calls from vfs helpers that call kern_path_create(). Full lockdep trace below: ============================================ WARNING: possible recursive locking detected 6.6.0-rc5 #775 Not tainted -------------------------------------------- kworker/1:1/32 is trying to acquire lock: ffff888005ac83f8 (sb_writers#5){.+.+}-{0:0}, at: ksmbd_vfs_mkdir+0xe1/0x410 but task is already holding lock: ffff888005ac83f8 (sb_writers#5){.+.+}-{0:0}, at: filename_create+0xb6/0x260 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sb_writers#5); lock(sb_writers#5); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by kworker/1:1/32: #0: ffff8880064e4138 ((wq_completion)ksmbd-io){+.+.}-{0:0}, at: process_one_work+0x40e/0x980 #1: ffff888005b0fdd0 ((work_completion)(&work->work)){+.+.}-{0:0}, at: process_one_work+0x40e/0x980 #2: ffff888005ac83f8 (sb_writers#5){.+.+}-{0:0}, at: filename_create+0xb6/0x260 #3: ffff8880057ce760 (&type->i_mutex_dir_key#3/1){+.+.}-{3:3}, at: filename_create+0x123/0x260 Cc: stable@vger.kernel.org Fixes: 40b268d384a2 ("ksmbd: add mnt_want_write to ksmbd vfs functions") Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-23ksmbd: fix kernel-doc comment of ksmbd_vfs_setxattr()Namjae Jeon1-1/+1
Fix argument list that the kdoc format and script verified in ksmbd_vfs_setxattr(). fs/smb/server/vfs.c:929: warning: Function parameter or member 'path' not described in 'ksmbd_vfs_setxattr' Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-23ksmbd: reorganize ksmbd_iov_pin_rsp()Namjae Jeon1-21/+22
If ksmbd_iov_pin_rsp fail, io vertor should be rollback. This patch moves memory allocations to before setting the io vector to avoid rollbacks. Fixes: e2b76ab8b5c9 ("ksmbd: add support for read compound") Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-23ksmbd: Remove unused field in ksmbd_user structCheng-Han Wu1-1/+0
fs/smb/server/mgmt/user_config.h:21: Remove the unused field 'failed_login_count' from the ksmbd_user struct. Signed-off-by: Cheng-Han Wu <hank20010209@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-10-23bcachefs: Refactor memcpy into direct assignmentKees Cook1-6/+4
The memcpy() in bch2_bkey_append_ptr() is operating on an embedded fake flexible array which looks to the compiler like it has 0 size. This causes W=1 builds to emit warnings due to -Wstringop-overflow: In file included from include/linux/string.h:254, from include/linux/bitmap.h:11, from include/linux/cpumask.h:12, from include/linux/smp.h:13, from include/linux/lockdep.h:14, from include/linux/radix-tree.h:14, from include/linux/backing-dev-defs.h:6, from fs/bcachefs/bcachefs.h:182: fs/bcachefs/extents.c: In function 'bch2_bkey_append_ptr': include/linux/fortify-string.h:57:33: warning: writing 8 bytes into a region of size 0 [-Wstringop-overflow=] 57 | #define __underlying_memcpy __builtin_memcpy | ^ include/linux/fortify-string.h:648:9: note: in expansion of macro '__underlying_memcpy' 648 | __underlying_##op(p, q, __fortify_size); \ | ^~~~~~~~~~~~~ include/linux/fortify-string.h:693:26: note: in expansion of macro '__fortify_memcpy_chk' 693 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ | ^~~~~~~~~~~~~~~~~~~~ fs/bcachefs/extents.c:235:17: note: in expansion of macro 'memcpy' 235 | memcpy((void *) &k->v + bkey_val_bytes(&k->k), | ^~~~~~ fs/bcachefs/bcachefs_format.h:287:33: note: destination object 'v' of size 0 287 | struct bch_val v; | ^ Avoid making any structure changes and just replace the u64 copy into a direct assignment, side-stepping the entire problem. Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Brian Foster <bfoster@redhat.com> Cc: linux-bcachefs@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202309192314.VBsjiIm5-lkp@intel.com/ Link: https://lore.kernel.org/r/20231010235609.work.594-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix drop_alloc_keys()Kent Overstreet1-15/+15
For consistency with the rest of the reconstruct_alloc option, we should be skipping all alloc keys. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: snapshot_create_lockKent Overstreet5-7/+44
Add a new lock for snapshot creation - this addresses a few races with logged operations and snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix snapshot skiplists during snapshot deletionKent Overstreet1-0/+3
In snapshot deleion, we have to pick new skiplist nodes for entries that point to nodes being deleted. The function that finds a new skiplist node, skipping over entries being deleted, was incorrect: if n = 0, but the parent node is being deleted, we also need to skip over that node. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_sb_field_get() refactoringKent Overstreet13-82/+71
Instead of using token pasting to generate methods for each superblock section, just make the type a parameter to bch2_sb_field_get(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: KEY_TYPE_error now counts towards i_sectorsKent Overstreet1-0/+1
KEY_TYPE_error is used when all replicas in an extent are marked as failed; it indicates that data was present, but has been lost. So that i_sectors doesn't change when replacing extents with KEY_TYPE_error, we now have to count error keys as allocations - this fixes fsck errors later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix handling of unknown bkey typesKent Overstreet1-1/+0
min_val_size was U8_MAX for unknown key types, causing us to flag any known key as invalid - it should have been 0. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Switch to unsafe_memcpy() in a few placesKent Overstreet2-5/+8
The new fortify checking doesn't work for us in all places; this switches to unsafe_memcpy() where appropriate to silence a few warnings/errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Use struct_size()Christophe JAILLET2-3/+2
Use struct_size() instead of hand writing it. This is less verbose and more robust. While at it, prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Correctly initialize new buckets on device resizeKent Overstreet3-9/+27
bch2_dev_resize() was never updated for the allocator rewrite with persistent freelists, and it wasn't noticed because the tests weren't running fsck - oops. Fix this by running bch2_dev_freespace_init() for the new buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix another smatch complaintKent Overstreet1-1/+1
This should be harmless, but initialize last_seq anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Use strsep() in split_devs()Kent Overstreet1-4/+2
Minor refactoring to fix a smatch complaint. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Add iops fields to bch_memberHunter Shaffer4-7/+35
Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Rename bch_sb_field_members -> bch_sb_field_members_v1Hunter Shaffer5-20/+19
Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: New superblock section members_v2Hunter Shaffer9-160/+326
members_v2 has dynamically resizable entries so that we can extend bch_member. The members can no longer be accessed with simple array indexing Instead members_v2_get is used to find a member's exact location within the array and returns a copy of that member. Alternatively member_v2_get_mut retrieves a mutable point to a member. Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Add new helper to retrieve bch_member from sbHunter Shaffer8-65/+62
Prep work for introducing bch_sb_field_members_v2 - introduce new helpers that will check for members_v2 if it exists, otherwise using v1 Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bucket_lock() is now a sleepable lockKent Overstreet2-4/+5
fsck_err() may sleep - it takes a mutex and may allocate memory, so bucket_lock() needs to be a sleepable lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: fix crc32c checksum merge byte order problemBrian Foster1-2/+2
An fsstress task on a big endian system (s390x) quickly produces a bunch of CRC errors in the system logs. Most of these are related to the narrow CRCs path, but the fundamental problem can be reduced to a single write and re-read (after dropping caches) of a previously merged extent. The key merge path that handles extent merges eventually calls into bch2_checksum_merge() to combine the CRCs of the associated extents. This code attempts to avoid a byte order swap by feeding the le64 values into the crc32c code, but the latter casts the resulting u64 value down to a u32, which truncates the high bytes where the actual crc value ends up. This results in a CRC value that does not change (since it is merged with a CRC of 0), and checksum failures ensue. Fix the checksum merge code to swap to cpu byte order on the boundaries to the external crc code such that any value casting is handled properly. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix bch2_inode_delete_keys()Kent Overstreet1-2/+8
bch2_inode_delete_keys() was using BTREE_ITER_NOT_EXTENTS, on the assumption that it would never need to split extents. But that caused a race with extents being split by other threads - specifically, the data move path. Extents iterators have the iterator position pointing to the start of the extent, which avoids the race. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Make btree root read errors recoverableKent Overstreet1-5/+4
The entire btree will be lost, but that is better than the entire filesystem not being recoverable. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fall back to requesting passphrase directlyKent Overstreet1-2/+31
We can only do this in userspace, unfortunately - but kernel keyrings have never seemed to worked reliably, this is a useful fallback. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix looping around bch2_propagate_key_to_snapshot_leaves()Kent Overstreet1-2/+8
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch_err_msg(), bch_err_fn() now filters out transaction restart errorsKent Overstreet4-137/+80
These errors aren't actual errors, and should never be printed - do this in the common helpers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Silence transaction restart error messageKent Overstreet1-2/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: More assertions for nocow lockingKent Overstreet3-8/+31
- assert in shutdown path that no nocow locks are held - check for overflow when taking nocow locks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: nocow locking: Fix lock leakKent Overstreet1-1/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>