summaryrefslogtreecommitdiff
path: root/fs/ext4
AgeCommit message (Collapse)AuthorFilesLines
9 daysMerge tag 'pull-stable-struct_fd' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 'struct fd' updates from Al Viro: "Just the 'struct fd' layout change, with conversion to accessor helpers" * tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: add struct fd constructors, get rid of __to_fd() struct fd: representation change introduce fd_file(), convert all accessors to it.
12 daysMerge tag 'ext4_for_linus-6.12-rc1' of ↵Linus Torvalds21-921/+930
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Lots of cleanups and bug fixes this cycle, primarily in the block allocation, extent management, fast commit, and journalling" * tag 'ext4_for_linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (93 commits) ext4: convert EXT4_B2C(sbi->s_stripe) users to EXT4_NUM_B2C ext4: check stripe size compatibility on remount as well ext4: fix i_data_sem unlock order in ext4_ind_migrate() ext4: remove the special buffer dirty handling in do_journal_get_write_access ext4: fix a potential assertion failure due to improperly dirtied buffer ext4: hoist ext4_block_write_begin and replace the __block_write_begin ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers ext4: dax: keep orphan list before truncate overflow allocated blocks ext4: fix error message when rejecting the default hash ext4: save unnecessary indentation in ext4_ext_create_new_leaf() ext4: make some fast commit functions reuse extents path ext4: refactor ext4_swap_extents() to reuse extents path ext4: get rid of ppath in convert_initialized_extent() ext4: get rid of ppath in ext4_ext_handle_unwritten_extents() ext4: get rid of ppath in ext4_ext_convert_to_initialized() ext4: get rid of ppath in ext4_convert_unwritten_extents_endio() ext4: get rid of ppath in ext4_split_convert_extents() ext4: get rid of ppath in ext4_split_extent() ext4: get rid of ppath in ext4_force_split_extent_at() ext4: get rid of ppath in ext4_split_extent_at() ...
2024-09-16Merge tag 'vfs-6.12.file' of ↵Linus Torvalds3-25/+34
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs file updates from Christian Brauner: "This is the work to cleanup and shrink struct file significantly. Right now, (focusing on x86) struct file is 232 bytes. After this series struct file will be 184 bytes aka 3 cacheline and a spare 8 bytes for future extensions at the end of the struct. With struct file being as ubiquitous as it is this should make a difference for file heavy workloads and allow further optimizations in the future. - struct fown_struct was embedded into struct file letting it take up 32 bytes in total when really it shouldn't even be embedded in struct file in the first place. Instead, actual users of struct fown_struct now allocate the struct on demand. This frees up 24 bytes. - Move struct file_ra_state into the union containg the cleanup hooks and move f_iocb_flags out of the union. This closes a 4 byte hole we created earlier and brings struct file to 192 bytes. Which means struct file is 3 cachelines and we managed to shrink it by 40 bytes. - Reorder struct file so that nothing crosses a cacheline. I suspect that in the future we will end up reordering some members to mitigate false sharing issues or just because someone does actually provide really good perf data. - Shrinking struct file to 192 bytes is only part of the work. Files use a slab that is SLAB_TYPESAFE_BY_RCU and when a kmem cache is created with SLAB_TYPESAFE_BY_RCU the free pointer must be located outside of the object because the cache doesn't know what part of the memory can safely be overwritten as it may be needed to prevent object recycling. That has the consequence that SLAB_TYPESAFE_BY_RCU may end up adding a new cacheline. So this also contains work to add a new kmem_cache_create_rcu() function that allows the caller to specify an offset where the freelist pointer is supposed to be placed. Thus avoiding the implicit addition of a fourth cacheline. - And finally this removes the f_version member in struct file. The f_version member isn't particularly well-defined. It is mainly used as a cookie to detect concurrent seeks when iterating directories. But it is also abused by some subsystems for completely unrelated things. It is mostly a directory and filesystem specific thing that doesn't really need to live in struct file and with its wonky semantics it really lacks a specific function. For pipes, f_version is (ab)used to defer poll notifications until a write has happened. And struct pipe_inode_info is used by multiple struct files in their ->private_data so there's no chance of pushing that down into file->private_data without introducing another pointer indirection. But pipes don't rely on f_pos_lock so this adds a union into struct file encompassing f_pos_lock and a pipe specific f_pipe member that pipes can use. This union of course can be extended to other file types and is similar to what we do in struct inode already" * tag 'vfs-6.12.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (26 commits) fs: remove f_version pipe: use f_pipe fs: add f_pipe ubifs: store cookie in private data ufs: store cookie in private data udf: store cookie in private data proc: store cookie in private data ocfs2: store cookie in private data input: remove f_version abuse ext4: store cookie in private data ext2: store cookie in private data affs: store cookie in private data fs: add generic_llseek_cookie() fs: use must_set_pos() fs: add must_set_pos() fs: add vfs_setpos_cookie() s390: remove unused f_version ceph: remove unused f_version adi: remove unused f_version mm: Removed @freeptr_offset to prevent doc warning ...
2024-09-09ext4: store cookie in private dataChristian Brauner3-25/+34
Store the cookie to detect concurrent seeks on directories in file->private_data. Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-11-6d3e4816aa7b@kernel.org Acked-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-04ext4: convert EXT4_B2C(sbi->s_stripe) users to EXT4_NUM_B2COjaswin Mujoo1-5/+7
Although we have checks to make sure s_stripe is a multiple of cluster size, in case we accidentally end up with a scenario where this is not the case, use EXT4_NUM_B2C() so that we don't end up with unexpected cases where EXT4_B2C(stripe) becomes 0. Also make the is_stripe_aligned check in regular_allocator a bit more robust while we are at it. This should ideally have no functional change unless we have a bug somewhere causing (stripe % cluster_size != 0) Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/e0c0a3b58a40935a1361f668851d041575861411.1725002410.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: check stripe size compatibility on remount as wellOjaswin Mujoo1-7/+22
We disable stripe size in __ext4_fill_super if it is not a multiple of the cluster ratio however this check is missed when trying to remount. This can leave us with cases where stripe < cluster_ratio after remount:set making EXT4_B2C(sbi->s_stripe) become 0 that can cause some unforeseen bugs like divide by 0. Fix that by adding the check in remount path as well. Reported-by: syzbot+1ad8bac5af24d01e2cbd@syzkaller.appspotmail.com Tested-by: syzbot+1ad8bac5af24d01e2cbd@syzkaller.appspotmail.com Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Fixes: c3defd99d58c ("ext4: treat stripe in block unit") Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/3a493bb503c3598e25dcfbed2936bb2dff3fece7.1725002410.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: fix i_data_sem unlock order in ext4_ind_migrate()Artem Sadovnikov1-1/+1
Fuzzing reports a possible deadlock in jbd2_log_wait_commit. This issue is triggered when an EXT4_IOC_MIGRATE ioctl is set to require synchronous updates because the file descriptor is opened with O_SYNC. This can lead to the jbd2_journal_stop() function calling jbd2_might_wait_for_commit(), potentially causing a deadlock if the EXT4_IOC_MIGRATE call races with a write(2) system call. This problem only arises when CONFIG_PROVE_LOCKING is enabled. In this case, the jbd2_might_wait_for_commit macro locks jbd2_handle in the jbd2_journal_stop function while i_data_sem is locked. This triggers lockdep because the jbd2_journal_start function might also lock the same jbd2_handle simultaneously. Found by Linux Verification Center (linuxtesting.org) with syzkaller. Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Co-developed-by: Mikhail Ukhin <mish.uxin2012@yandex.ru> Signed-off-by: Mikhail Ukhin <mish.uxin2012@yandex.ru> Signed-off-by: Artem Sadovnikov <ancowi69@gmail.com> Rule: add Link: https://lore.kernel.org/stable/20240404095000.5872-1-mish.uxin2012%40yandex.ru Link: https://patch.msgid.link/20240829152210.2754-1-ancowi69@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: remove the special buffer dirty handling in do_journal_get_write_accessShida Zhang1-17/+1
This kinda revert the commit 56d35a4cd13e("ext4: Fix dirtying of journalled buffers in data=journal mode") made by Jan 14 years ago, since the do_get_write_access() itself can deal with the extra unexpected buf dirting things in a proper way now. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240830053739.3588573-5-zhangshida@kylinos.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: fix a potential assertion failure due to improperly dirtied bufferShida Zhang3-12/+40
On an old kernel version(4.19, ext3, data=journal, pagesize=64k), an assertion failure will occasionally be triggered by the line below: ----------- jbd2_journal_commit_transaction { ... J_ASSERT_BH(bh, !buffer_dirty(bh)); /* * The buffer on BJ_Forget list and not jbddirty means ... } ----------- The same condition may also be applied to the lattest kernel version. When blocksize < pagesize and we truncate a file, there can be buffers in the mapping tail page beyond i_size. These buffers will be filed to transaction's BJ_Forget list by ext4_journalled_invalidatepage() during truncation. When the transaction doing truncate starts committing, we can grow the file again. This calls __block_write_begin() which allocates new blocks under these buffers in the tail page we go through the branch: if (buffer_new(bh)) { clean_bdev_bh_alias(bh); if (folio_test_uptodate(folio)) { clear_buffer_new(bh); set_buffer_uptodate(bh); mark_buffer_dirty(bh); continue; } ... } Hence buffers on BJ_Forget list of the committing transaction get marked dirty and this triggers the jbd2 assertion. Teach ext4_block_write_begin() to properly handle files with data journalling by avoiding dirtying them directly. Instead of folio_zero_new_buffers() we use ext4_journalled_zero_new_buffers() which takes care of handling journalling. We also don't need to mark new uptodate buffers as dirty in ext4_block_write_begin(). That will be either done either by block_commit_write() in case of success or by folio_zero_new_buffers() in case of failure. Reported-by: Baolin Liu <liubaolin@kylinos.cn> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240830053739.3588573-4-zhangshida@kylinos.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: hoist ext4_block_write_begin and replace the __block_write_beginShida Zhang3-24/+12
Using __block_write_begin() make it inconvenient to journal the user data dirty process. We can't tell the block layer maintainer, ‘Hey, we want to trace the dirty user data in ext4, can we add some special code for ext4 in __block_write_begin?’:P So use ext4_block_write_begin() instead. The two functions are basically doing the same thing except for the fscrypt related code. Remove the unnecessary #ifdef since fscrypt_inode_uses_fs_layer_crypto() returns false (and it's known at compile time) when !CONFIG_FS_ENCRYPTION. And hoist the ext4_block_write_begin so that it can be used in other files. Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240830053739.3588573-3-zhangshida@kylinos.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffersShida Zhang1-1/+1
For new uptodate buffers we also need to call write_end_fn() to persist the uptodate content, similarly as folio_zero_new_buffers() does it. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Shida Zhang <zhangshida@kylinos.cn> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240830053739.3588573-2-zhangshida@kylinos.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: dax: keep orphan list before truncate overflow allocated blocksyangerkun1-6/+6
Any extending write for ext4 requires the inode to be placed on the orphan list before the actual write. In addition, the inode can be actually removed from the orphan list only after all writes are completed. Otherwise we'd leave allocated blocks beyond i_disksize if we could not copy all the data into allocated block and e2fsck would complain. Currently, direct IO and buffered IO comply with this logic(buffered IO will truncate all overflow allocated blocks that has not been written successfully, and direct IO will truncate all allocated blocks when error occurs). However, dax write break this since dax write will remove the inode from the orphan list by calling ext4_handle_inode_extension unconditionally during extending write. We add a argument to help determine does we do a fully write, and for the case not fully write, we leave the inode on the orphan list, and the latter ext4_inode_extension_cleanup will help us truncate the overflow allocated blocks, and then remove the inode from the orphan list. Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240829110222.126685-1-yangerkun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: fix error message when rejecting the default hashGabriel Krisman Bertazi2-10/+18
Commit 985b67cd8639 ("ext4: filesystems without casefold feature cannot be mounted with siphash") properly rejects volumes where s_def_hash_version is set to DX_HASH_SIPHASH, but the check and the error message should not look into casefold setup - a filesystem should never have DX_HASH_SIPHASH as the default hash. Fix it and, since we are there, move the check to ext4_hash_info_init. Fixes:985b67cd8639 ("ext4: filesystems without casefold feature cannot be mounted with siphash") Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://patch.msgid.link/87jzg1en6j.fsf_-_@mailhost.krisman.be Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: save unnecessary indentation in ext4_ext_create_new_leaf()Baokun Li1-23/+21
Save an indentation level in ext4_ext_create_new_leaf() by removing unnecessary 'else'. Besides, the variable 'ee_block' is declared to avoid line breaks. No functional changes. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240822023545.1994557-26-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: make some fast commit functions reuse extents pathBaokun Li2-33/+29
The ext4_find_extent() can update the extent path so that it does not have to allocate and free the path repeatedly, thus reducing the consumption of memory allocation and freeing in the following functions: ext4_ext_clear_bb ext4_ext_replay_set_iblocks ext4_fc_replay_add_range ext4_fc_set_bitmaps_and_counters No functional changes. Note that ext4_find_extent() does not support error pointers, so in this case set path to NULL first. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-25-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: refactor ext4_swap_extents() to reuse extents pathBaokun Li1-26/+22
The ext4_find_extent() can update the extent path so it doesn't have to allocate and free path repeatedly, thus reducing the consumption of memory allocation and freeing in ext4_swap_extents(). Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-24-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: get rid of ppath in convert_initialized_extent()Baokun Li1-18/+19
The use of path and ppath is now very confusing, so to make the code more readable, pass path between functions uniformly, and get rid of ppath. To get rid of the ppath in convert_initialized_extent(), the following is done here: * Free the extents path when an error is encountered. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-23-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: get rid of ppath in ext4_ext_handle_unwritten_extents()Baokun Li1-45/+37
The use of path and ppath is now very confusing, so to make the code more readable, pass path between functions uniformly, and get rid of ppath. To get rid of the ppath in ext4_ext_handle_unwritten_extents(), the following is done here: * Free the extents path when an error is encountered. * The 'allocated' is changed from passing a value to passing an address. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-22-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: get rid of ppath in ext4_ext_convert_to_initialized()Baokun Li1-38/+35
The use of path and ppath is now very confusing, so to make the code more readable, pass path between functions uniformly, and get rid of ppath. To get rid of the ppath in ext4_ext_convert_to_initialized(), the following is done here: * Free the extents path when an error is encountered. * Its caller needs to update ppath if it uses ppath. * The 'allocated' is changed from passing a value to passing an address. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-21-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: get rid of ppath in ext4_convert_unwritten_extents_endio()Baokun Li1-20/+23
The use of path and ppath is now very confusing, so to make the code more readable, pass path between functions uniformly, and get rid of ppath. To get rid of the ppath in ext4_convert_unwritten_extents_endio(), the following is done here: * Free the extents path when an error is encountered. * Its caller needs to update ppath if it uses ppath. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-20-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: get rid of ppath in ext4_split_convert_extents()Baokun Li1-32/+33
The use of path and ppath is now very confusing, so to make the code more readable, pass path between functions uniformly, and get rid of ppath. To get rid of the ppath in ext4_split_convert_extents(), the following is done here: * Its caller needs to update ppath if it uses ppath. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-19-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: get rid of ppath in ext4_split_extent()Baokun Li1-47/+50
The use of path and ppath is now very confusing, so to make the code more readable, pass path between functions uniformly, and get rid of ppath. To get rid of the ppath in ext4_split_extent(), the following is done here: * The 'allocated' is changed from passing a value to passing an address. * Its caller needs to update ppath if it uses ppath. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-18-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: get rid of ppath in ext4_force_split_extent_at()Baokun Li1-31/+38
The use of path and ppath is now very confusing, so to make the code more readable, pass path between functions uniformly, and get rid of ppath. To get rid of the ppath in ext4_force_split_extent_at(), the following is done here: * Free the extents path when an error is encountered. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-17-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: get rid of ppath in ext4_split_extent_at()Baokun Li1-38/+47
The use of path and ppath is now very confusing, so to make the code more readable, pass path between functions uniformly, and get rid of ppath. To get rid of the ppath in ext4_split_extent_at(), the following is done here: * Free the extents path when an error is encountered. * Its caller needs to update ppath if it uses ppath. * Teach ext4_ext_show_leaf() to skip error pointer. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-16-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: get rid of ppath in ext4_ext_insert_extent()Baokun Li4-47/+61
The use of path and ppath is now very confusing, so to make the code more readable, pass path between functions uniformly, and get rid of ppath. To get rid of the ppath in ext4_ext_insert_extent(), the following is done here: * Free the extents path when an error is encountered. * Its caller needs to update ppath if it uses ppath. * Free path when npath is used, free npath when it is not used. * The got_allocated_blocks label in ext4_ext_map_blocks() does not update err now, so err is updated to 0 if the err returned by ext4_ext_search_right() is greater than 0 and is about to enter got_allocated_blocks. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-15-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: get rid of ppath in ext4_ext_create_new_leaf()Baokun Li1-21/+22
The use of path and ppath is now very confusing, so to make the code more readable, pass path between functions uniformly, and get rid of ppath. To get rid of the ppath in ext4_ext_create_new_leaf(), the following is done here: * Free the extents path when an error is encountered. * Its caller needs to update ppath if it uses ppath. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-14-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: get rid of ppath in get_ext_path()Baokun Li2-19/+20
The use of path and ppath is now very confusing, so to make the code more readable, pass path between functions uniformly, and get rid of ppath. After getting rid of ppath in get_ext_path(), its caller may pass an error pointer to ext4_free_ext_path(), so it needs to teach ext4_free_ext_path() and ext4_ext_drop_refs() to skip the error pointer. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-13-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: get rid of ppath in ext4_find_extent()Baokun Li3-30/+34
The use of path and ppath is now very confusing, so to make the code more readable, pass path between functions uniformly, and get rid of ppath. Getting rid of ppath in ext4_find_extent() requires its caller to update ppath. These ppaths will also be dropped later. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-12-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: propagate errors from ext4_find_extent() in ext4_insert_range()Baokun Li1-0/+1
Even though ext4_find_extent() returns an error, ext4_insert_range() still returns 0. This may confuse the user as to why fallocate returns success, but the contents of the file are not as expected. So propagate the error returned by ext4_find_extent() to avoid inconsistencies. Fixes: 331573febb6a ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-11-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: add new ext4_ext_path_brelse() helperBaokun Li1-12/+12
Add ext4_ext_path_brelse() helper function to reduce duplicate code and ensure that path->p_bh is set to NULL after it is released. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-10-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: fix double brelse() the buffer of the extents pathBaokun Li1-0/+1
In ext4_ext_try_to_merge_up(), set path[1].p_bh to NULL after it has been released, otherwise it may be released twice. An example of what triggers this is as follows: split2 map split1 |--------|-------|--------| ext4_ext_map_blocks ext4_ext_handle_unwritten_extents ext4_split_convert_extents // path->p_depth == 0 ext4_split_extent // 1. do split1 ext4_split_extent_at |ext4_ext_insert_extent | ext4_ext_create_new_leaf | ext4_ext_grow_indepth | le16_add_cpu(&neh->eh_depth, 1) | ext4_find_extent | // return -ENOMEM |// get error and try zeroout |path = ext4_find_extent | path->p_depth = 1 |ext4_ext_try_to_merge | ext4_ext_try_to_merge_up | path->p_depth = 0 | brelse(path[1].p_bh) ---> not set to NULL here |// zeroout success // 2. update path ext4_find_extent // 3. do split2 ext4_split_extent_at ext4_ext_insert_extent ext4_ext_create_new_leaf ext4_ext_grow_indepth le16_add_cpu(&neh->eh_depth, 1) ext4_find_extent path[0].p_bh = NULL; path->p_depth = 1 read_extent_tree_block ---> return err // path[1].p_bh is still the old value ext4_free_ext_path ext4_ext_drop_refs // path->p_depth == 1 brelse(path[1].p_bh) ---> brelse a buffer twice Finally got the following WARRNING when removing the buffer from lru: ============================================ VFS: brelse: Trying to free free buffer WARNING: CPU: 2 PID: 72 at fs/buffer.c:1241 __brelse+0x58/0x90 CPU: 2 PID: 72 Comm: kworker/u19:1 Not tainted 6.9.0-dirty #716 RIP: 0010:__brelse+0x58/0x90 Call Trace: <TASK> __find_get_block+0x6e7/0x810 bdev_getblk+0x2b/0x480 __ext4_get_inode_loc+0x48a/0x1240 ext4_get_inode_loc+0xb2/0x150 ext4_reserve_inode_write+0xb7/0x230 __ext4_mark_inode_dirty+0x144/0x6a0 ext4_ext_insert_extent+0x9c8/0x3230 ext4_ext_map_blocks+0xf45/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] ============================================ Fixes: ecb94f5fdf4b ("ext4: collapse a single extent tree block into the inode if possible") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-9-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: drop ppath from ext4_ext_replay_update_ex() to avoid double-freeBaokun Li1-11/+10
When calling ext4_force_split_extent_at() in ext4_ext_replay_update_ex(), the 'ppath' is updated but it is the 'path' that is freed, thus potentially triggering a double-free in the following process: ext4_ext_replay_update_ex ppath = path ext4_force_split_extent_at(&ppath) ext4_split_extent_at ext4_ext_insert_extent ext4_ext_create_new_leaf ext4_ext_grow_indepth ext4_find_extent if (depth > path[0].p_maxdepth) kfree(path) ---> path First freed *orig_path = path = NULL ---> null ppath kfree(path) ---> path double-free !!! So drop the unnecessary ppath and use path directly to avoid this problem. And use ext4_find_extent() directly to update path, avoiding unnecessary memory allocation and freeing. Also, propagate the error returned by ext4_find_extent() instead of using strange error codes. Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-8-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: aovid use-after-free in ext4_ext_insert_extent()Baokun Li1-0/+1
As Ojaswin mentioned in Link, in ext4_ext_insert_extent(), if the path is reallocated in ext4_ext_create_new_leaf(), we'll use the stale path and cause UAF. Below is a sample trace with dummy values: ext4_ext_insert_extent path = *ppath = 2000 ext4_ext_create_new_leaf(ppath) ext4_find_extent(ppath) path = *ppath = 2000 if (depth > path[0].p_maxdepth) kfree(path = 2000); *ppath = path = NULL; path = kcalloc() = 3000 *ppath = 3000; return path; /* here path is still 2000, UAF! */ eh = path[depth].p_hdr ================================================================== BUG: KASAN: slab-use-after-free in ext4_ext_insert_extent+0x26d4/0x3330 Read of size 8 at addr ffff8881027bf7d0 by task kworker/u36:1/179 CPU: 3 UID: 0 PID: 179 Comm: kworker/u6:1 Not tainted 6.11.0-rc2-dirty #866 Call Trace: <TASK> ext4_ext_insert_extent+0x26d4/0x3330 ext4_ext_map_blocks+0xe22/0x2d40 ext4_map_blocks+0x71e/0x1700 ext4_do_writepages+0x1290/0x2800 [...] Allocated by task 179: ext4_find_extent+0x81c/0x1f70 ext4_ext_map_blocks+0x146/0x2d40 ext4_map_blocks+0x71e/0x1700 ext4_do_writepages+0x1290/0x2800 ext4_writepages+0x26d/0x4e0 do_writepages+0x175/0x700 [...] Freed by task 179: kfree+0xcb/0x240 ext4_find_extent+0x7c0/0x1f70 ext4_ext_insert_extent+0xa26/0x3330 ext4_ext_map_blocks+0xe22/0x2d40 ext4_map_blocks+0x71e/0x1700 ext4_do_writepages+0x1290/0x2800 ext4_writepages+0x26d/0x4e0 do_writepages+0x175/0x700 [...] ================================================================== So use *ppath to update the path to avoid the above problem. Reported-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Closes: https://lore.kernel.org/r/ZqyL6rmtwl6N4MWR@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com Fixes: 10809df84a4d ("ext4: teach ext4_ext_find_extent() to realloc path if necessary") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240822023545.1994557-7-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: update orig_path in ext4_find_extent()Baokun Li2-2/+2
In ext4_find_extent(), if the path is not big enough, we free it and set *orig_path to NULL. But after reallocating and successfully initializing the path, we don't update *orig_path, in which case the caller gets a valid path but a NULL ppath, and this may cause a NULL pointer dereference or a path memory leak. For example: ext4_split_extent path = *ppath = 2000 ext4_find_extent if (depth > path[0].p_maxdepth) kfree(path = 2000); *orig_path = path = NULL; path = kcalloc() = 3000 ext4_split_extent_at(*ppath = NULL) path = *ppath; ex = path[depth].p_ext; // NULL pointer dereference! ================================================================== BUG: kernel NULL pointer dereference, address: 0000000000000010 CPU: 6 UID: 0 PID: 576 Comm: fsstress Not tainted 6.11.0-rc2-dirty #847 RIP: 0010:ext4_split_extent_at+0x6d/0x560 Call Trace: <TASK> ext4_split_extent.isra.0+0xcb/0x1b0 ext4_ext_convert_to_initialized+0x168/0x6c0 ext4_ext_handle_unwritten_extents+0x325/0x4d0 ext4_ext_map_blocks+0x520/0xdb0 ext4_map_blocks+0x2b0/0x690 ext4_iomap_begin+0x20e/0x2c0 [...] ================================================================== Therefore, *orig_path is updated when the extent lookup succeeds, so that the caller can safely use path or *ppath. Fixes: 10809df84a4d ("ext4: teach ext4_ext_find_extent() to realloc path if necessary") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240822023545.1994557-6-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: avoid use-after-free in ext4_ext_show_leaf()Baokun Li1-5/+4
In ext4_find_extent(), path may be freed by error or be reallocated, so using a previously saved *ppath may have been freed and thus may trigger use-after-free, as follows: ext4_split_extent path = *ppath; ext4_split_extent_at(ppath) path = ext4_find_extent(ppath) ext4_split_extent_at(ppath) // ext4_find_extent fails to free path // but zeroout succeeds ext4_ext_show_leaf(inode, path) eh = path[depth].p_hdr // path use-after-free !!! Similar to ext4_split_extent_at(), we use *ppath directly as an input to ext4_ext_show_leaf(). Fix a spelling error by the way. Same problem in ext4_ext_handle_unwritten_extents(). Since 'path' is only used in ext4_ext_show_leaf(), remove 'path' and use *ppath directly. This issue is triggered only when EXT_DEBUG is defined and therefore does not affect functionality. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-5-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: fix slab-use-after-free in ext4_split_extent_at()Baokun Li1-1/+20
We hit the following use-after-free: ================================================================== BUG: KASAN: slab-use-after-free in ext4_split_extent_at+0xba8/0xcc0 Read of size 2 at addr ffff88810548ed08 by task kworker/u20:0/40 CPU: 0 PID: 40 Comm: kworker/u20:0 Not tainted 6.9.0-dirty #724 Call Trace: <TASK> kasan_report+0x93/0xc0 ext4_split_extent_at+0xba8/0xcc0 ext4_split_extent.isra.0+0x18f/0x500 ext4_split_convert_extents+0x275/0x750 ext4_ext_handle_unwritten_extents+0x73e/0x1580 ext4_ext_map_blocks+0xe20/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] Allocated by task 40: __kmalloc_noprof+0x1ac/0x480 ext4_find_extent+0xf3b/0x1e70 ext4_ext_map_blocks+0x188/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] Freed by task 40: kfree+0xf1/0x2b0 ext4_find_extent+0xa71/0x1e70 ext4_ext_insert_extent+0xa22/0x3260 ext4_split_extent_at+0x3ef/0xcc0 ext4_split_extent.isra.0+0x18f/0x500 ext4_split_convert_extents+0x275/0x750 ext4_ext_handle_unwritten_extents+0x73e/0x1580 ext4_ext_map_blocks+0xe20/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] ================================================================== The flow of issue triggering is as follows: ext4_split_extent_at path = *ppath ext4_ext_insert_extent(ppath) ext4_ext_create_new_leaf(ppath) ext4_find_extent(orig_path) path = *orig_path read_extent_tree_block // return -ENOMEM or -EIO ext4_free_ext_path(path) kfree(path) *orig_path = NULL a. If err is -ENOMEM: ext4_ext_dirty(path + path->p_depth) // path use-after-free !!! b. If err is -EIO and we have EXT_DEBUG defined: ext4_ext_show_leaf(path) eh = path[depth].p_hdr // path also use-after-free !!! So when trying to zeroout or fix the extent length, call ext4_find_extent() to update the path. In addition we use *ppath directly as an ext4_ext_show_leaf() input to avoid possible use-after-free when EXT_DEBUG is defined, and to avoid unnecessary path updates. Fixes: dfe5080939ea ("ext4: drop EXT4_EX_NOFREE_ON_ERR from rest of extents handling code") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-4-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: prevent partial update of the extents pathBaokun Li1-4/+27
In ext4_ext_rm_idx() and ext4_ext_correct_indexes(), there is no proper rollback of already executed updates when updating a level of the extents path fails, so we may get an inconsistent extents tree, which may trigger some bad things in errors=continue mode. Hence clear the verified bit of modified extents buffers if the tree fails to be updated in ext4_ext_rm_idx() or ext4_ext_correct_indexes(), which forces the extents buffers to be checked in ext4_valid_extent_entries(), ensuring that the extents tree is consistent. Signed-off-by: zhanchengbin <zhanchengbin1@huawei.com> Link: https://lore.kernel.org/r/20230213080514.535568-3-zhanchengbin1@huawei.com/ Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-3-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: refactor ext4_ext_rm_idx() to index 'path'Baokun Li1-17/+15
As suggested by Honza in Link,modify ext4_ext_rm_idx() to leave 'path' alone and just index it like ext4_ext_correct_indexes() does it. This facilitates adding error handling later. No functional changes. Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/all/20230216130305.nrbtd42tppxhbynn@quack3/ Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-2-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: avoid OOB when system.data xattr changes underneath the filesystemThadeu Lima de Souza Cascardo1-10/+21
When looking up for an entry in an inlined directory, if e_value_offs is changed underneath the filesystem by some change in the block device, it will lead to an out-of-bounds access that KASAN detects as an UAF. EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none. loop0: detected capacity change from 2048 to 2047 ================================================================== BUG: KASAN: use-after-free in ext4_search_dir+0xf2/0x1c0 fs/ext4/namei.c:1500 Read of size 1 at addr ffff88803e91130f by task syz-executor269/5103 CPU: 0 UID: 0 PID: 5103 Comm: syz-executor269 Not tainted 6.11.0-rc4-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 ext4_search_dir+0xf2/0x1c0 fs/ext4/namei.c:1500 ext4_find_inline_entry+0x4be/0x5e0 fs/ext4/inline.c:1697 __ext4_find_entry+0x2b4/0x1b30 fs/ext4/namei.c:1573 ext4_lookup_entry fs/ext4/namei.c:1727 [inline] ext4_lookup+0x15f/0x750 fs/ext4/namei.c:1795 lookup_one_qstr_excl+0x11f/0x260 fs/namei.c:1633 filename_create+0x297/0x540 fs/namei.c:3980 do_symlinkat+0xf9/0x3a0 fs/namei.c:4587 __do_sys_symlinkat fs/namei.c:4610 [inline] __se_sys_symlinkat fs/namei.c:4607 [inline] __x64_sys_symlinkat+0x95/0xb0 fs/namei.c:4607 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f3e73ced469 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 21 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fff4d40c258 EFLAGS: 00000246 ORIG_RAX: 000000000000010a RAX: ffffffffffffffda RBX: 0032656c69662f2e RCX: 00007f3e73ced469 RDX: 0000000020000200 RSI: 00000000ffffff9c RDI: 00000000200001c0 RBP: 0000000000000000 R08: 00007fff4d40c290 R09: 00007fff4d40c290 R10: 0023706f6f6c2f76 R11: 0000000000000246 R12: 00007fff4d40c27c R13: 0000000000000003 R14: 431bde82d7b634db R15: 00007fff4d40c2b0 </TASK> Calling ext4_xattr_ibody_find right after reading the inode with ext4_get_inode_loc will lead to a check of the validity of the xattrs, avoiding this problem. Reported-by: syzbot+0c2508114d912a54ee79@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0c2508114d912a54ee79 Fixes: e8e948e7802a ("ext4: let ext4_find_entry handle inline data") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Link: https://patch.msgid.link/20240821152324.3621860-5-cascardo@igalia.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: explicitly exit when ext4_find_inline_entry returns an errorThadeu Lima de Souza Cascardo1-1/+1
__ext4_find_entry currently ignores the return of ext4_find_inline_entry, except for returning the bh or NULL when has_inline_data is 1. Even though has_inline_data is set to 1 before calling ext4_find_inline_entry and would only be set to 0 when that function returns NULL, check for an encoded error return explicitly in order to exit. That makes the code more readable, not requiring that one assumes the cases when has_inline_data is 1. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Link: https://patch.msgid.link/20240821152324.3621860-4-cascardo@igalia.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: return error on ext4_find_inline_entryThadeu Lima de Souza Cascardo1-3/+7
In case of errors when reading an inode from disk or traversing inline directory entries, return an error-encoded ERR_PTR instead of returning NULL. ext4_find_inline_entry only caller, __ext4_find_entry already returns such encoded errors. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Link: https://patch.msgid.link/20240821152324.3621860-3-cascardo@igalia.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: ext4_search_dir should return a proper errorThadeu Lima de Souza Cascardo1-5/+7
ext4_search_dir currently returns -1 in case of a failure, while it returns 0 when the name is not found. In such failure cases, it should return an error code instead. This becomes even more important when ext4_find_inline_entry returns an error code as well in the next commit. -EFSCORRUPTED seems appropriate as such error code as these failures would be caused by unexpected record lengths and is in line with other instances of ext4_check_dir_entry failures. In the case of ext4_dx_find_entry, the current use of ERR_BAD_DX_DIR was left as is to reduce the risk of regressions. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Link: https://patch.msgid.link/20240821152324.3621860-2-cascardo@igalia.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: check buffer_verified in advance to avoid unneeded ext4_get_group_info()Kemeng Shi1-2/+2
Check buffer_verified in advance to avoid unneeded ext4_get_group_info(). This could be a simple cleanup as compiler may handle this. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://patch.msgid.link/20240820132234.2759926-8-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: remove unneeded NULL check of buffer_head in ext4_mark_inode_used()Kemeng Shi1-1/+1
If gdp from ext4_get_group_desc() is not NULL, then returned group_desc_bh won't be NULL either. Remove check of group_desc_bh and only check returned gdp from ext4_get_group_desc() like how other callers do. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://patch.msgid.link/20240820132234.2759926-7-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: move checksum length calculation of inode bitmap into ↵Kemeng Shi4-14/+13
ext4_inode_bitmap_csum_[verify/set]() functions There are some little improve: 1. remove repeat code to calculate checksum length of inode bitmap 2. remove unnecessary checksum length calculation if checksum is not enabled. 3. use more efficient bit shift operation instead of div opreation. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://patch.msgid.link/20240820132234.2759926-6-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: remove dead check in __ext4_new_inode()Kemeng Shi1-3/+0
If we can't grab any inode, the prvious find_inode_bit() will set ino to be >= EXT4_INODES_PER_GROUP(sb). So the check of need to repeat in the same group is not needed. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://patch.msgid.link/20240820132234.2759926-5-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: avoid negative min_clusters in find_group_orlov()Kemeng Shi1-0/+2
min_clusters is signed integer and will be converted to unsigned integer when compared with unsigned number stats.free_clusters. If min_clusters is negative, it will be converted to a huge unsigned value in which case all groups may not meet the actual desired free clusters. Set negative min_clusters to 0 to avoid unexpected behavior. Fixes: ac27a0ec112a ("[PATCH] ext4: initial copy of files from ext3") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://patch.msgid.link/20240820132234.2759926-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: avoid potential buffer_head leak in __ext4_new_inode()Kemeng Shi1-3/+4
If a group is marked EXT4_GROUP_INFO_IBITMAP_CORRUPT after it's inode bitmap buffer_head was successfully verified, then __ext4_new_inode() will get a valid inode_bitmap_bh of a corrupted group from ext4_read_inode_bitmap() in which case inode_bitmap_bh misses a release. Hnadle "IS_ERR(inode_bitmap_bh)" and group corruption separately like how ext4_free_inode() does to avoid buffer_head leak. Fixes: 9008a58e5dce ("ext4: make the bitmap read routines return real error codes") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://patch.msgid.link/20240820132234.2759926-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-04ext4: avoid buffer_head leak in ext4_mark_inode_used()Kemeng Shi1-2/+3
Release inode_bitmap_bh from ext4_read_inode_bitmap() in ext4_mark_inode_used() to avoid buffer_head leak. By the way, remove unneeded goto for invalid ino when inode_bitmap_bh is NULL. Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Link: https://patch.msgid.link/20240820132234.2759926-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-02ext4: clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT even mount with discardyangerkun1-6/+4
Commit 3d56b8d2c74c ("ext4: Speed up FITRIM by recording flags in ext4_group_info") speed up fstrim by skipping trim trimmed group. We also has the chance to clear trimmed once there exists some block free for this group(mount without discard), and the next trim for this group will work well too. For mount with discard, we will issue dicard when we free blocks, so leave trimmed flag keep alive to skip useless trim trigger from userspace seems reasonable. But for some case like ext4 build on dm-thinpool(ext4 blocksize 4K, pool blocksize 128K), discard from ext4 maybe unaligned for dm thinpool, and thinpool will just finish this discard(see process_discard_bio when begein equals to end) without actually process discard. For this case, trim from userspace can really help us to free some thinpool block. So convert to clear trimmed flag for all case no matter mounted with discard or not. Fixes: 3d56b8d2c74c ("ext4: Speed up FITRIM by recording flags in ext4_group_info") Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240817085510.2084444-1-yangerkun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>