summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2023-03-17ext4: Fix possible corruption when moving a directoryJan Kara1-1/+10
[ Upstream commit 0813299c586b175d7edb25f56412c54b812d0379 ] When we are renaming a directory to a different directory, we need to update '..' entry in the moved directory. However nothing prevents moved directory from being modified and even converted from the inline format to the normal format. When such race happens the rename code gets confused and we crash. Fix the problem by locking the moved directory. CC: stable@vger.kernel.org Fixes: 32f7f22c0b52 ("ext4: let ext4_rename handle inline dir") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230126112221.11866-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17udf: Fix off-by-one error when discarding preallocationJan Kara1-1/+1
[ Upstream commit f54aa97fb7e5329a373f9df4e5e213ced4fc8759 ] The condition determining whether the preallocation can be used had an off-by-one error so we didn't discard preallocation when new allocation was just following it. This can then confuse code in inode_getblk(). CC: stable@vger.kernel.org Fixes: 16d055656814 ("udf: Discard preallocation before extending file with a hole") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17ext4: zero i_disksize when initializing the bootloader inodeZhihao Cheng1-0/+1
commit f5361da1e60d54ec81346aee8e3d8baf1be0b762 upstream. If the boot loader inode has never been used before, the EXT4_IOC_SWAP_BOOT inode will initialize it, including setting the i_size to 0. However, if the "never before used" boot loader has a non-zero i_size, then i_disksize will be non-zero, and the inconsistency between i_size and i_disksize can trigger a kernel warning: WARNING: CPU: 0 PID: 2580 at fs/ext4/file.c:319 CPU: 0 PID: 2580 Comm: bb Not tainted 6.3.0-rc1-00004-g703695902cfa RIP: 0010:ext4_file_write_iter+0xbc7/0xd10 Call Trace: vfs_write+0x3b1/0x5c0 ksys_write+0x77/0x160 __x64_sys_write+0x22/0x30 do_syscall_64+0x39/0x80 Reproducer: 1. create corrupted image and mount it: mke2fs -t ext4 /tmp/foo.img 200 debugfs -wR "sif <5> size 25700" /tmp/foo.img mount -t ext4 /tmp/foo.img /mnt cd /mnt echo 123 > file 2. Run the reproducer program: posix_memalign(&buf, 1024, 1024) fd = open("file", O_RDWR | O_DIRECT); ioctl(fd, EXT4_IOC_SWAP_BOOT); write(fd, buf, 1024); Fix this by setting i_disksize as well as i_size to zero when initiaizing the boot loader inode. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217159 Cc: stable@kernel.org Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20230308032643.641113-1-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17ext4: fix WARNING in ext4_update_inline_dataYe Bin1-0/+3
commit 2b96b4a5d9443ca4cad58b0040be455803c05a42 upstream. Syzbot found the following issue: EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 without journal. Quota mode: none. fscrypt: AES-256-CTS-CBC using implementation "cts-cbc-aes-aesni" fscrypt: AES-256-XTS using implementation "xts-aes-aesni" ------------[ cut here ]------------ WARNING: CPU: 0 PID: 5071 at mm/page_alloc.c:5525 __alloc_pages+0x30a/0x560 mm/page_alloc.c:5525 Modules linked in: CPU: 1 PID: 5071 Comm: syz-executor263 Not tainted 6.2.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 RIP: 0010:__alloc_pages+0x30a/0x560 mm/page_alloc.c:5525 RSP: 0018:ffffc90003c2f1c0 EFLAGS: 00010246 RAX: ffffc90003c2f220 RBX: 0000000000000014 RCX: 0000000000000000 RDX: 0000000000000028 RSI: 0000000000000000 RDI: ffffc90003c2f248 RBP: ffffc90003c2f2d8 R08: dffffc0000000000 R09: ffffc90003c2f220 R10: fffff52000785e49 R11: 1ffff92000785e44 R12: 0000000000040d40 R13: 1ffff92000785e40 R14: dffffc0000000000 R15: 1ffff92000785e3c FS: 0000555556c0d300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f95d5e04138 CR3: 00000000793aa000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __alloc_pages_node include/linux/gfp.h:237 [inline] alloc_pages_node include/linux/gfp.h:260 [inline] __kmalloc_large_node+0x95/0x1e0 mm/slab_common.c:1113 __do_kmalloc_node mm/slab_common.c:956 [inline] __kmalloc+0xfe/0x190 mm/slab_common.c:981 kmalloc include/linux/slab.h:584 [inline] kzalloc include/linux/slab.h:720 [inline] ext4_update_inline_data+0x236/0x6b0 fs/ext4/inline.c:346 ext4_update_inline_dir fs/ext4/inline.c:1115 [inline] ext4_try_add_inline_entry+0x328/0x990 fs/ext4/inline.c:1307 ext4_add_entry+0x5a4/0xeb0 fs/ext4/namei.c:2385 ext4_add_nondir+0x96/0x260 fs/ext4/namei.c:2772 ext4_create+0x36c/0x560 fs/ext4/namei.c:2817 lookup_open fs/namei.c:3413 [inline] open_last_lookups fs/namei.c:3481 [inline] path_openat+0x12ac/0x2dd0 fs/namei.c:3711 do_filp_open+0x264/0x4f0 fs/namei.c:3741 do_sys_openat2+0x124/0x4e0 fs/open.c:1310 do_sys_open fs/open.c:1326 [inline] __do_sys_openat fs/open.c:1342 [inline] __se_sys_openat fs/open.c:1337 [inline] __x64_sys_openat+0x243/0x290 fs/open.c:1337 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Above issue happens as follows: ext4_iget ext4_find_inline_data_nolock ->i_inline_off=164 i_inline_size=60 ext4_try_add_inline_entry __ext4_mark_inode_dirty ext4_expand_extra_isize_ea ->i_extra_isize=32 s_want_extra_isize=44 ext4_xattr_shift_entries ->after shift i_inline_off is incorrect, actually is change to 176 ext4_try_add_inline_entry ext4_update_inline_dir get_max_inline_xattr_value_size if (EXT4_I(inode)->i_inline_off) entry = (struct ext4_xattr_entry *)((void *)raw_inode + EXT4_I(inode)->i_inline_off); free += EXT4_XATTR_SIZE(le32_to_cpu(entry->e_value_size)); ->As entry is incorrect, then 'free' may be negative ext4_update_inline_data value = kzalloc(len, GFP_NOFS); -> len is unsigned int, maybe very large, then trigger warning when 'kzalloc()' To resolve the above issue we need to update 'i_inline_off' after 'ext4_xattr_shift_entries()'. We do not need to set EXT4_STATE_MAY_INLINE_DATA flag here, since ext4_mark_inode_dirty() already sets this flag if needed. Setting EXT4_STATE_MAY_INLINE_DATA when it is needed may trigger a BUG_ON in ext4_writepages(). Reported-by: syzbot+d30838395804afc2fa6f@syzkaller.appspotmail.com Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230307015253.2232062-3-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17ext4: move where set the MAY_INLINE_DATA flag is setYe Bin2-2/+6
commit 1dcdce5919115a471bf4921a57f20050c545a236 upstream. The only caller of ext4_find_inline_data_nolock() that needs setting of EXT4_STATE_MAY_INLINE_DATA flag is ext4_iget_extra_inode(). In ext4_write_inline_data_end() we just need to update inode->i_inline_off. Since we are going to add one more caller that does not need to set EXT4_STATE_MAY_INLINE_DATA, just move setting of EXT4_STATE_MAY_INLINE_DATA out to ext4_iget_extra_inode(). Signed-off-by: Ye Bin <yebin10@huawei.com> Cc: stable@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230307015253.2232062-2-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17ext4: fix another off-by-one fsmap error on 1k block filesystemsDarrick J. Wong1-0/+2
commit c993799baf9c5861f8df91beb80e1611b12efcbd upstream. Apparently syzbot figured out that issuing this FSMAP call: struct fsmap_head cmd = { .fmh_count = ...; .fmh_keys = { { .fmr_device = /* ext4 dev */, .fmr_physical = 0, }, { .fmr_device = /* ext4 dev */, .fmr_physical = 0, }, }, ... }; ret = ioctl(fd, FS_IOC_GETFSMAP, &cmd); Produces this crash if the underlying filesystem is a 1k-block ext4 filesystem: kernel BUG at fs/ext4/ext4.h:3331! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 3 PID: 3227965 Comm: xfs_io Tainted: G W O 6.2.0-rc8-achx Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:ext4_mb_load_buddy_gfp+0x47c/0x570 [ext4] RSP: 0018:ffffc90007c03998 EFLAGS: 00010246 RAX: ffff888004978000 RBX: ffffc90007c03a20 RCX: ffff888041618000 RDX: 0000000000000000 RSI: 00000000000005a4 RDI: ffffffffa0c99b11 RBP: ffff888012330000 R08: ffffffffa0c2b7d0 R09: 0000000000000400 R10: ffffc90007c03950 R11: 0000000000000000 R12: 0000000000000001 R13: 00000000ffffffff R14: 0000000000000c40 R15: ffff88802678c398 FS: 00007fdf2020c880(0000) GS:ffff88807e100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd318a5fe8 CR3: 000000007f80f001 CR4: 00000000001706e0 Call Trace: <TASK> ext4_mballoc_query_range+0x4b/0x210 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80] ext4_getfsmap_datadev+0x713/0x890 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80] ext4_getfsmap+0x2b7/0x330 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80] ext4_ioc_getfsmap+0x153/0x2b0 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80] __ext4_ioctl+0x2a7/0x17e0 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80] __x64_sys_ioctl+0x82/0xa0 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7fdf20558aff RSP: 002b:00007ffd318a9e30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00000000000200c0 RCX: 00007fdf20558aff RDX: 00007fdf1feb2010 RSI: 00000000c0c0583b RDI: 0000000000000003 RBP: 00005625c0634be0 R08: 00005625c0634c40 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fdf1feb2010 R13: 00005625be70d994 R14: 0000000000000800 R15: 0000000000000000 For GETFSMAP calls, the caller selects a physical block device by writing its block number into fsmap_head.fmh_keys[01].fmr_device. To query mappings for a subrange of the device, the starting byte of the range is written to fsmap_head.fmh_keys[0].fmr_physical and the last byte of the range goes in fsmap_head.fmh_keys[1].fmr_physical. IOWs, to query what mappings overlap with bytes 3-14 of /dev/sda, you'd set the inputs as follows: fmh_keys[0] = { .fmr_device = major(8, 0), .fmr_physical = 3}, fmh_keys[1] = { .fmr_device = major(8, 0), .fmr_physical = 14}, Which would return you whatever is mapped in the 12 bytes starting at physical offset 3. The crash is due to insufficient range validation of keys[1] in ext4_getfsmap_datadev. On 1k-block filesystems, block 0 is not part of the filesystem, which means that s_first_data_block is nonzero. ext4_get_group_no_and_offset subtracts this quantity from the blocknr argument before cracking it into a group number and a block number within a group. IOWs, block group 0 spans blocks 1-8192 (1-based) instead of 0-8191 (0-based) like what happens with larger blocksizes. The net result of this encoding is that blocknr < s_first_data_block is not a valid input to this function. The end_fsb variable is set from the keys that are copied from userspace, which means that in the above example, its value is zero. That leads to an underflow here: blocknr = blocknr - le32_to_cpu(es->s_first_data_block); The division then operates on -1: offset = do_div(blocknr, EXT4_BLOCKS_PER_GROUP(sb)) >> EXT4_SB(sb)->s_cluster_bits; Leaving an impossibly large group number (2^32-1) in blocknr. ext4_getfsmap_check_keys checked that keys[0].fmr_physical and keys[1].fmr_physical are in increasing order, but ext4_getfsmap_datadev adjusts keys[0].fmr_physical to be at least s_first_data_block. This implies that we have to check it again after the adjustment, which is the piece that I forgot. Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com Fixes: 4a4956249dac ("ext4: fix off-by-one fsmap error on 1k block filesystems") Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002 Cc: stable@vger.kernel.org Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/Y+58NPTH7VNGgzdd@magnolia Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17ext4: fix RENAME_WHITEOUT handling for inline directoriesEric Whitney1-6/+7
commit c9f62c8b2dbf7240536c0cc9a4529397bb8bf38e upstream. A significant number of xfstests can cause ext4 to log one or more warning messages when they are run on a test file system where the inline_data feature has been enabled. An example: "EXT4-fs warning (device vdc): ext4_dirblock_csum_set:425: inode #16385: comm fsstress: No space for directory leaf checksum. Please run e2fsck -D." The xfstests include: ext4/057, 058, and 307; generic/013, 051, 068, 070, 076, 078, 083, 232, 269, 270, 390, 461, 475, 476, 482, 579, 585, 589, 626, 631, and 650. In this situation, the warning message indicates a bug in the code that performs the RENAME_WHITEOUT operation on a directory entry that has been stored inline. It doesn't detect that the directory is stored inline, and incorrectly attempts to compute a dirent block checksum on the whiteout inode when creating it. This attempt fails as a result of the integrity checking in get_dirent_tail (usually due to a failure to match the EXT4_FT_DIR_CSUM magic cookie), and the warning message is then emitted. Fix this by simply collecting the inlined data state at the time the search for the source directory entry is performed. Existing code handles the rest, and this is sufficient to eliminate all spurious warning messages produced by the tests above. Go one step further and do the same in the code that resets the source directory entry in the event of failure. The inlined state should be present in the "old" struct, but given the possibility of a race there's no harm in taking a conservative approach and getting that information again since the directory entry is being reread anyway. Fixes: b7ff91fd030d ("ext4: find old entry again if failed to rename whiteout") Cc: stable@kernel.org Signed-off-by: Eric Whitney <enwlinux@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230210173244.679890-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17ext4: fix cgroup writeback accounting with fs-layer encryptionEric Biggers1-5/+6
commit ffec85d53d0f39ee4680a2cf0795255e000e1feb upstream. When writing a page from an encrypted file that is using filesystem-layer encryption (not inline encryption), ext4 encrypts the pagecache page into a bounce page, then writes the bounce page. It also passes the bounce page to wbc_account_cgroup_owner(). That's incorrect, because the bounce page is a newly allocated temporary page that doesn't have the memory cgroup of the original pagecache page. This makes wbc_account_cgroup_owner() not account the I/O to the owner of the pagecache page as it should. Fix this by always passing the pagecache page to wbc_account_cgroup_owner(). Fixes: 001e4a8775f6 ("ext4: implement cgroup writeback support") Cc: stable@vger.kernel.org Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203005503.141557-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17fs: prevent out-of-bounds array speculation when closing a file descriptorTheodore Ts'o1-0/+1
commit 609d54441493c99f21c1823dfd66fa7f4c512ff4 upstream. Google-Bug-Id: 114199369 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11ubifs: ubifs_writepage: Mark page dirty after writing inode failedZhihao Cheng1-3/+9
[ Upstream commit fb8bc4c74ae4526d9489362ab2793a936d072b84 ] There are two states for ubifs writing pages: 1. Dirty, Private 2. Not Dirty, Not Private There is a third possibility which maybe related to [1] that page is private but not dirty caused by following process: PA lock(page) ubifs_write_end attach_page_private // set Private __set_page_dirty_nobuffers // set Dirty unlock(page) write_cache_pages lock(page) clear_page_dirty_for_io(page) // clear Dirty ubifs_writepage write_inode // fail, goto out, following codes are not executed // do_writepage // set_page_writeback // set Writeback // detach_page_private // clear Private // end_page_writeback // clear Writeback out: unlock(page) // Private, Not Dirty PB ksys_fadvise64_64 generic_fadvise invalidate_inode_page // page is neither Dirty nor Writeback invalidate_complete_page // page_has_private is true try_to_release_page ubifs_releasepage ubifs_assert(c, 0) !!! Then we may get following assertion failed: UBIFS error (ubi0:0 pid 1492): ubifs_assert_failed [ubifs]: UBIFS assert failed: 0, in fs/ubifs/file.c:1499 UBIFS warning (ubi0:0 pid 1492): ubifs_ro_mode [ubifs]: switched to read-only mode, error -22 CPU: 2 PID: 1492 Comm: aa Not tainted 5.16.0-rc2-00012-g7bb767dee0ba-dirty Call Trace: dump_stack+0x13/0x1b ubifs_ro_mode+0x54/0x60 [ubifs] ubifs_assert_failed+0x4b/0x80 [ubifs] ubifs_releasepage+0x7e/0x1e0 [ubifs] try_to_release_page+0x57/0xe0 invalidate_inode_page+0xfb/0x130 invalidate_mapping_pagevec+0x12/0x20 generic_fadvise+0x303/0x3c0 vfs_fadvise+0x35/0x40 ksys_fadvise64_64+0x4c/0xb0 Jump [2] to find a reproducer. [1] https://linux-mtd.infradead.narkive.com/NQoBeT1u/patch-rfc-ubifs-fix-assert-failed-in-ubifs-set-page-dirty [2] https://bugzilla.kernel.org/show_bug.cgi?id=215357 Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11ubifs: dirty_cow_znode: Fix memleak in error handling pathZhihao Cheng1-1/+8
[ Upstream commit 122deabfe1428bffe95e2bf364ff8a5059bdf089 ] Following process will cause a memleak for copied up znode: dirty_cow_znode zn = copy_znode(c, znode); err = insert_old_idx(c, zbr->lnum, zbr->offs); if (unlikely(err)) return ERR_PTR(err); // No one refers to zn. Fix it by adding copied znode back to tnc, then it will be freed by ubifs_destroy_tnc_subtree() while closing tnc. Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216705 Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11ubifs: Re-statistic cleaned znode count if commit failedZhihao Cheng1-0/+15
[ Upstream commit 944e096aa24071d3fe22822f6249d3ae309e39ea ] Dirty znodes will be written on flash in committing process with following states: process A | znode state ------------------------------------------------------ do_commit | DIRTY_ZNODE ubifs_tnc_start_commit | DIRTY_ZNODE get_znodes_to_commit | DIRTY_ZNODE | COW_ZNODE layout_commit | DIRTY_ZNODE | COW_ZNODE fill_gap | 0 write master | 0 or OBSOLETE_ZNODE process B | znode state ------------------------------------------------------ do_commit | DIRTY_ZNODE[1] ubifs_tnc_start_commit | DIRTY_ZNODE get_znodes_to_commit | DIRTY_ZNODE | COW_ZNODE ubifs_tnc_end_commit | DIRTY_ZNODE | COW_ZNODE write_index | 0 write master | 0 or OBSOLETE_ZNODE[2] or | DIRTY_ZNODE[3] [1] znode is dirtied without concurrent committing process [2] znode is copied up (re-dirtied by other process) before cleaned up in committing process [3] znode is re-dirtied after cleaned up in committing process Currently, the clean znode count is updated in free_obsolete_znodes(), which is called only in normal path. If do_commit failed, clean znode count won't be updated, which triggers a failure ubifs assertion[4] in ubifs_tnc_close(): ubifs_assert_failed [ubifs]: UBIFS assert failed: freed == n [4] Commit 380347e9ca7682 ("UBIFS: Add an assertion for clean_zn_cnt"). Fix it by re-statisticing cleaned znode count in tnc_destroy_cnext(). Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216704 Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11ubifs: Fix memory leak in alloc_wbufs()Li Zetao1-4/+13
[ Upstream commit 4a1ff3c5d04b9079b4f768d9a71b51c4af578dd2 ] kmemleak reported a sequence of memory leaks, and show them as following: unreferenced object 0xffff8881575f8400 (size 1024): comm "mount", pid 19625, jiffies 4297119604 (age 20.383s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff8176cecd>] __kmalloc+0x4d/0x150 [<ffffffffa0406b2b>] ubifs_mount+0x307b/0x7170 [ubifs] [<ffffffff819fa8fd>] legacy_get_tree+0xed/0x1d0 [<ffffffff81936f2d>] vfs_get_tree+0x7d/0x230 [<ffffffff819b2bd4>] path_mount+0xdd4/0x17b0 [<ffffffff819b37aa>] __x64_sys_mount+0x1fa/0x270 [<ffffffff83c14295>] do_syscall_64+0x35/0x80 [<ffffffff83e0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 unreferenced object 0xffff8881798a6e00 (size 512): comm "mount", pid 19677, jiffies 4297121912 (age 37.816s) hex dump (first 32 bytes): 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk backtrace: [<ffffffff8176cecd>] __kmalloc+0x4d/0x150 [<ffffffffa0418342>] ubifs_wbuf_init+0x52/0x480 [ubifs] [<ffffffffa0406ca5>] ubifs_mount+0x31f5/0x7170 [ubifs] [<ffffffff819fa8fd>] legacy_get_tree+0xed/0x1d0 [<ffffffff81936f2d>] vfs_get_tree+0x7d/0x230 [<ffffffff819b2bd4>] path_mount+0xdd4/0x17b0 [<ffffffff819b37aa>] __x64_sys_mount+0x1fa/0x270 [<ffffffff83c14295>] do_syscall_64+0x35/0x80 [<ffffffff83e0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 The problem is that the ubifs_wbuf_init() returns an error in the loop which in the alloc_wbufs(), then the wbuf->buf and wbuf->inodes that were successfully alloced before are not freed. Fix it by adding error hanging path in alloc_wbufs() which frees the memory alloced before when ubifs_wbuf_init() returns an error. Fixes: 1e51764a3c2a ("UBIFS: add new flash file system") Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11ubifs: Reserve one leb for each journal head while doing budgetZhihao Cheng1-4/+3
[ Upstream commit e874dcde1cbf82c786c0e7f2899811c02630cc52 ] UBIFS calculates available space by c->main_bytes - c->lst.total_used (which means non-index lebs' free and dirty space is accounted into total available), then index lebs and four lebs (one for gc_lnum, one for deletions, two for journal heads) are deducted. In following situation, ubifs may get -ENOSPC from make_reservation(): LEB 84: DATAHD free 122880 used 1920 dirty 2176 dark 6144 LEB 110:DELETION free 126976 used 0 dirty 0 dark 6144 (empty) LEB 201:gc_lnum free 126976 used 0 dirty 0 dark 6144 LEB 272:GCHD free 77824 used 47672 dirty 1480 dark 6144 LEB 356:BASEHD free 0 used 39776 dirty 87200 dark 6144 OTHERS: index lebs, zero-available non-index lebs UBIFS calculates the available bytes is 6888 (How to calculate it: 126976 * 5[remain main bytes] - 1920[used] - 47672[used] - 39776[used] - 126976 * 1[deletions] - 126976 * 1[gc_lnum] - 126976 * 2[journal heads] - 6144 * 5[dark] = 6888) after doing budget, however UBIFS cannot use BASEHD's dirty space(87200), because UBIFS cannot find next BASEHD to reclaim current BASEHD. (c->bi.min_idx_lebs equals to c->lst.idx_lebs, the empty leb won't be found by ubifs_find_free_space(), and dirty index lebs won't be picked as gced lebs. All non-index lebs has dirty space less then c->dead_wm, non-index lebs won't be picked as gced lebs either. So new free lebs won't be produced.). See more details in Link. To fix it, reserve one leb for each journal head while doing budget. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216562 Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11ubifs: do_rename: Fix wrong space budget when target inode's nlink > 1Zhihao Cheng1-0/+2
[ Upstream commit 25fce616a61fc2f1821e4a9ce212d0e064707093 ] If target inode is a special file (eg. block/char device) with nlink count greater than 1, the inode with ui->data will be re-written on disk. However, UBIFS losts target inode's data_len while doing space budget. Bad space budget may let make_reservation() return with -ENOSPC, which could turn ubifs to read-only mode in do_writepage() process. Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216494 Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11ubifs: Fix wrong dirty space budget for dirty inodeZhihao Cheng1-1/+1
[ Upstream commit b248eaf049d9cdc5eb76b59399e4d3de233f02ac ] Each dirty inode should reserve 'c->bi.inode_budget' bytes in space budget calculation. Currently, space budget for dirty inode reports more space than what UBIFS actually needs to write. Fixes: 1e51764a3c2ac0 ("UBIFS: add new flash file system") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11ubifs: Rectify space budget for ubifs_xrename()Zhihao Cheng1-0/+5
[ Upstream commit 1b2ba09060e41adb356b9ae58ef94a7390928004 ] There is no space budget for ubifs_xrename(). It may let make_reservation() return with -ENOSPC, which could turn ubifs to read-only mode in do_writepage() process. Fix it by adding space budget for ubifs_xrename(). Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216569 Fixes: 9ec64962afb170 ("ubifs: Implement RENAME_EXCHANGE") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11ubifs: Rectify space budget for ubifs_symlink() if symlink is encryptedZhihao Cheng1-1/+1
[ Upstream commit c2c36cc6ca23e614f9e4238d0ecf48549ee9002a ] Fix bad space budget when symlink file is encrypted. Bad space budget may let make_reservation() return with -ENOSPC, which could turn ubifs to read-only mode in do_writepage() process. Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216490 Fixes: ca7f85be8d6cf9 ("ubifs: Add support for encrypted symlinks") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11ubifs: Fix build errors as symbol undefinedLi Hua1-0/+5
[ Upstream commit aa6d148e6d6270274e3d5a529b71c54cd329d17f ] With CONFIG_UBIFS_FS_AUTHENTICATION not set, the compiler can assume that ubifs_node_check_hash() is never true and drops the call to ubifs_bad_hash(). Is CONFIG_CC_OPTIMIZE_FOR_SIZE enabled this optimization does not happen anymore. So When CONFIG_UBIFS_FS and CONFIG_CC_OPTIMIZE_FOR_SIZE is enabled but CONFIG_UBIFS_FS_AUTHENTICATION is not set, the build errors is as followd: ERROR: modpost: "ubifs_bad_hash" [fs/ubifs/ubifs.ko] undefined! Fix it by add no-op ubifs_bad_hash() for the CONFIG_UBIFS_FS_AUTHENTICATION=n case. Fixes: 16a26b20d2af ("ubifs: authentication: Add hashes to index nodes") Signed-off-by: Li Hua <hucool.lihua@huawei.com> Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11fs: f2fs: initialize fsdata in pagecache_write()Alexander Potapenko1-1/+1
[ Upstream commit b1b9896718bc1a212dc288ad66a5fa2fef11353d ] When aops->write_begin() does not initialize fsdata, KMSAN may report an error passing the latter to aops->write_end(). Fix this by unconditionally initializing fsdata. Suggested-by: Eric Biggers <ebiggers@kernel.org> Fixes: 95ae251fe828 ("f2fs: add fs-verity support") Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11f2fs: use memcpy_{to,from}_page() where possibleEric Biggers3-28/+8
[ Upstream commit b87846bd61c7c09560617da416208a5454530d57 ] This is simpler, and as a side effect it replaces several uses of kmap_atomic() with its recommended replacement kmap_local_page(). Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Stable-dep-of: b1b9896718bc ("fs: f2fs: initialize fsdata in pagecache_write()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11fs/jfs: fix shift exponent db_agl2size negativeLiu Shixin via Jfs-discussion1-1/+2
[ Upstream commit fad376fce0af58deebc5075b8539dc05bf639af3 ] As a shift exponent, db_agl2size can not be less than 0. Add the missing check to fix the shift-out-of-bounds bug reported by syzkaller: UBSAN: shift-out-of-bounds in fs/jfs/jfs_dmap.c:2227:15 shift exponent -744642816 is negative Reported-by: syzbot+0be96567042453c0c820@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11ext4: refuse to create ea block when umountedJun Nie1-0/+7
commit f31173c19901a96bb2ebf6bcfec8a08df7095c91 upstream. The ea block expansion need to access s_root while it is already set as NULL when umount is triggered. Refuse this request to avoid panic. Reported-by: syzbot+2dacb8f015bf1420155f@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=3613786cb88c93aa1c6a279b1df6a7b201347d08 Link: https://lore.kernel.org/r/20230103014517.495275-3-jun.nie@linaro.org Cc: stable@kernel.org Signed-off-by: Jun Nie <jun.nie@linaro.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11ext4: optimize ea_inode block expansionJun Nie1-11/+17
commit 1e9d62d252812575ded7c620d8fc67c32ff06c16 upstream. Copy ea data from inode entry when expanding ea block if possible. Then remove the ea entry if expansion success. Thus memcpy to a temporary buffer may be avoided. If the expansion fails, we do not need to recovery the removed ea entry neither in this way. Reported-by: syzbot+2dacb8f015bf1420155f@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=3613786cb88c93aa1c6a279b1df6a7b201347d08 Link: https://lore.kernel.org/r/20230103014517.495275-2-jun.nie@linaro.org Cc: stable@kernel.org Signed-off-by: Jun Nie <jun.nie@linaro.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11jbd2: fix data missing when reusing bh which is ready to be checkpointedZhihao Cheng1-21/+29
commit e6b9bd7290d334451ce054e98e752abc055e0034 upstream. Following process will make data lost and could lead to a filesystem corrupted problem: 1. jh(bh) is inserted into T1->t_checkpoint_list, bh is dirty, and jh->b_transaction = NULL 2. T1 is added into journal->j_checkpoint_transactions. 3. Get bh prepare to write while doing checkpoing: PA PB do_get_write_access jbd2_log_do_checkpoint spin_lock(&jh->b_state_lock) if (buffer_dirty(bh)) clear_buffer_dirty(bh) // clear buffer dirty set_buffer_jbddirty(bh) transaction = journal->j_checkpoint_transactions jh = transaction->t_checkpoint_list if (!buffer_dirty(bh)) __jbd2_journal_remove_checkpoint(jh) // bh won't be flushed jbd2_cleanup_journal_tail __jbd2_journal_file_buffer(jh, transaction, BJ_Reserved) 4. Aborting journal/Power-cut before writing latest bh on journal area. In this way we get a corrupted filesystem with bh's data lost. Fix it by moving the clearing of buffer_dirty bit just before the call to __jbd2_journal_file_buffer(), both bit clearing and jh->b_transaction assignment are under journal->j_list_lock locked, so that jbd2_log_do_checkpoint() will wait until jh's new transaction fininshed even bh is currently not dirty. And journal_shrink_one_cp_list() won't remove jh from checkpoint list if the buffer head is reused in do_get_write_access(). Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216898 Cc: <stable@kernel.org> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: zhanchengbin <zhanchengbin1@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230110015327.1181863-1-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11udf: Fix file corruption when appending just after end of preallocated extentJan Kara1-13/+11
commit 36ec52ea038b18a53e198116ef7d7e70c87db046 upstream. When we append new block just after the end of preallocated extent, the code in inode_getblk() wrongly determined we're going to use the preallocated extent which resulted in adding block into a wrong logical offset in the file. Sequence like this manifests it: xfs_io -f -c "pwrite 0x2cacf 0xd122" -c "truncate 0x2dd6f" \ -c "pwrite 0x27fd9 0x69a9" -c "pwrite 0x32981 0x7244" <file> The code that determined the use of preallocated extent is actually stale because udf_do_extend_file() does not create preallocation anymore so after calling that function we are sure there's no usable preallocation. Just remove the faulty condition. CC: stable@vger.kernel.org Fixes: 16d055656814 ("udf: Discard preallocation before extending file with a hole") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11udf: Detect system inodes linked into directory hierarchyJan Kara1-1/+6
commit 85a37983ec69cc9fcd188bc37c4de15ee326355a upstream. When UDF filesystem is corrupted, hidden system inodes can be linked into directory hierarchy which is an avenue for further serious corruption of the filesystem and kernel confusion as noticed by syzbot fuzzed images. Refuse to access system inodes linked into directory hierarchy and vice versa. CC: stable@vger.kernel.org Reported-by: syzbot+38695a20b8addcbc1084@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11udf: Preserve link count of system filesJan Kara3-3/+10
commit fc8033a34a3ca7d23353e645e6dde5d364ac5f12 upstream. System files in UDF filesystem have link count 0. To not confuse VFS we fudge the link count to be 1 when reading such inodes however we forget to restore the link count of 0 when writing such inodes. Fix that. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11udf: Do not update file length for failed writes to inline filesJan Kara1-14/+12
commit 256fe4162f8b5a1625b8603ca5f7ff79725bfb47 upstream. When write to inline file fails (or happens only partly), we still updated length of inline data as if the whole write succeeded. Fix the update of length of inline data to happen only if the write succeeds. Reported-by: syzbot+0937935b993956ba28ab@syzkaller.appspotmail.com CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11udf: Do not bother merging very long extentsJan Kara1-17/+2
commit 53cafe1d6d8ef9f93318e5bfccc0d24f27d41ced upstream. When merging very long extents we try to push as much length as possible to the first extent. However this is unnecessarily complicated and not really worth the trouble. Furthermore there was a bug in the logic resulting in corrupting extents in the file as syzbot reproducer shows. So just don't bother with the merging of extents that are too long together. CC: stable@vger.kernel.org Reported-by: syzbot+60f291a24acecb3c2bd5@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11udf: Truncate added extents on failed expansionJan Kara1-4/+11
commit 70bfb3a8d661d4fdc742afc061b88a7f3fc9f500 upstream. When a file expansion failed because we didn't have enough space for indirect extents make sure we truncate extents created so far so that we don't leave extents beyond EOF. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11ocfs2: fix non-auto defrag path not working issueHeming Zhao via Ocfs2-devel1-11/+13
commit 236b9254f8d1edc273ad88b420aa85fbd84f492d upstream. This fixes three issues on move extents ioctl without auto defrag: a) In ocfs2_find_victim_alloc_group(), we have to convert bits to block first in case of global bitmap. b) In ocfs2_probe_alloc_group(), when finding enough bits in block group bitmap, we have to back off move_len to start pos as well, otherwise it may corrupt filesystem. c) In ocfs2_ioctl_move_extents(), set me_threshold both for non-auto and auto defrag paths. Otherwise it will set move_max_hop to 0 and finally cause unexpectedly ENOSPC error. Currently there are no tools triggering the above issues since defragfs.ocfs2 enables auto defrag by default. Tested with manually changing defragfs.ocfs2 to run non auto defrag path. Link: https://lkml.kernel.org/r/20230220050526.22020-1-heming.zhao@suse.com Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11ocfs2: fix defrag path triggering jbd2 ASSERTHeming Zhao via Ocfs2-devel1-10/+0
commit 60eed1e3d45045623e46944ebc7c42c30a4350f0 upstream. code path: ocfs2_ioctl_move_extents ocfs2_move_extents ocfs2_defrag_extent __ocfs2_move_extent + ocfs2_journal_access_di + ocfs2_split_extent //sub-paths call jbd2_journal_restart + ocfs2_journal_dirty //crash by jbs2 ASSERT crash stacks: PID: 11297 TASK: ffff974a676dcd00 CPU: 67 COMMAND: "defragfs.ocfs2" #0 [ffffb25d8dad3900] machine_kexec at ffffffff8386fe01 #1 [ffffb25d8dad3958] __crash_kexec at ffffffff8395959d #2 [ffffb25d8dad3a20] crash_kexec at ffffffff8395a45d #3 [ffffb25d8dad3a38] oops_end at ffffffff83836d3f #4 [ffffb25d8dad3a58] do_trap at ffffffff83833205 #5 [ffffb25d8dad3aa0] do_invalid_op at ffffffff83833aa6 #6 [ffffb25d8dad3ac0] invalid_op at ffffffff84200d18 [exception RIP: jbd2_journal_dirty_metadata+0x2ba] RIP: ffffffffc09ca54a RSP: ffffb25d8dad3b70 RFLAGS: 00010207 RAX: 0000000000000000 RBX: ffff9706eedc5248 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff97337029ea28 RDI: ffff9706eedc5250 RBP: ffff9703c3520200 R8: 000000000f46b0b2 R9: 0000000000000000 R10: 0000000000000001 R11: 00000001000000fe R12: ffff97337029ea28 R13: 0000000000000000 R14: ffff9703de59bf60 R15: ffff9706eedc5250 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffb25d8dad3ba8] ocfs2_journal_dirty at ffffffffc137fb95 [ocfs2] #8 [ffffb25d8dad3be8] __ocfs2_move_extent at ffffffffc139a950 [ocfs2] #9 [ffffb25d8dad3c80] ocfs2_defrag_extent at ffffffffc139b2d2 [ocfs2] Analysis This bug has the same root cause of 'commit 7f27ec978b0e ("ocfs2: call ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()")'. For this bug, jbd2_journal_restart() is called by ocfs2_split_extent() during defragmenting. How to fix For ocfs2_split_extent() can handle journal operations totally by itself. Caller doesn't need to call journal access/dirty pair, and caller only needs to call journal start/stop pair. The fix method is to remove journal access/dirty from __ocfs2_move_extent(). The discussion for this patch: https://oss.oracle.com/pipermail/ocfs2-devel/2023-February/000647.html Link: https://lkml.kernel.org/r/20230217003717.32469-1-heming.zhao@suse.com Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11f2fs: fix cgroup writeback accounting with fs-layer encryptionEric Biggers1-3/+3
commit 844545c51a5b2a524b22a2fe9d0b353b827d24b4 upstream. When writing a page from an encrypted file that is using filesystem-layer encryption (not inline encryption), f2fs encrypts the pagecache page into a bounce page, then writes the bounce page. It also passes the bounce page to wbc_account_cgroup_owner(). That's incorrect, because the bounce page is a newly allocated temporary page that doesn't have the memory cgroup of the original pagecache page. This makes wbc_account_cgroup_owner() not account the I/O to the owner of the pagecache page as it should. Fix this by always passing the pagecache page to wbc_account_cgroup_owner(). Fixes: 578c647879f7 ("f2fs: implement cgroup writeback support") Cc: stable@vger.kernel.org Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11f2fs: fix information leak in f2fs_move_inline_dirents()Eric Biggers1-7/+6
commit 9a5571cff4ffcfc24847df9fd545cc5799ac0ee5 upstream. When converting an inline directory to a regular one, f2fs is leaking uninitialized memory to disk because it doesn't initialize the entire directory block. Fix this by zero-initializing the block. This bug was introduced by commit 4ec17d688d74 ("f2fs: avoid unneeded initializing when converting inline dentry"), which didn't consider the security implications of leaking uninitialized memory to disk. This was found by running xfstest generic/435 on a KMSAN-enabled kernel. Fixes: 4ec17d688d74 ("f2fs: avoid unneeded initializing when converting inline dentry") Cc: <stable@vger.kernel.org> # v4.3+ Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11exfat: fix inode->i_blocks for non-512 byte sector size deviceYuezhang Mo4-9/+5
commit 39c1ce8eafc0ff64fb9e28536ccc7df6a8e2999d upstream. inode->i_blocks is not real number of blocks, but 512 byte ones. Fixes: 98d917047e8b ("exfat: add file operations") Cc: stable@vger.kernel.org # v5.7+ Reported-by: Wang Yugui <wangyugui@e16-tech.com> Tested-by: Wang Yugui <wangyugui@e16-tech.com> Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11exfat: redefine DIR_DELETED as the bad cluster numberSungjong Seo1-1/+1
commit bdaadfd343e3cba49ad0b009ff4b148dad0fa404 upstream. When a file or a directory is deleted, the hint for the cluster of its parent directory in its in-memory inode is set as DIR_DELETED. Therefore, DIR_DELETED must be one of invalid cluster numbers. According to the exFAT specification, a volume can have at most 2^32-11 clusters. However, DIR_DELETED is wrongly defined as 0xFFFF0321, which could be a valid cluster number. To fix it, let's redefine DIR_DELETED as 0xFFFFFFF7, the bad cluster number. Fixes: 1acf1a564b60 ("exfat: add in-memory and on-disk structures and headers") Cc: stable@vger.kernel.org # v5.7+ Reported-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11exfat: fix unexpected EOF while reading dirYuezhang Mo1-4/+1
commit 6cb5d1a16a51d080fbc1649a5144cbc5ca7d6f88 upstream. If the position is not aligned with the dentry size, the return value of readdir() will be NULL and errno is 0, which means the end of the directory stream is reached. If the position is aligned with dentry size, but there is no file or directory at the position, exfat_readdir() will continue to get dentry from the next dentry. So the dentry gotten by readdir() may not be at the position. After this commit, if the position is not aligned with the dentry size, round the position up to the dentry size and continue to get the dentry. Fixes: ca06197382bd ("exfat: add directory operations") Cc: stable@vger.kernel.org # v5.7+ Reported-by: Wang Yugui <wangyugui@e16-tech.com> Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11exfat: fix reporting fs error when reading dir beyond EOFYuezhang Mo1-1/+1
commit 706fdcac002316893434d753be8cfb549fe1d40d upstream. Since seekdir() does not check whether the position is valid, the position may exceed the size of the directory. We found that for a directory with discontinuous clusters, if the position exceeds the size of the directory and the excess size is greater than or equal to the cluster size, exfat_readdir() will return -EIO, causing a file system error and making the file system unavailable. Reproduce this bug by: seekdir(dir, dir_size + cluster_size); dirent = readdir(dir); The following log will be printed if mount with 'errors=remount-ro'. [11166.712896] exFAT-fs (sdb1): error, invalid access to FAT (entry 0xffffffff) [11166.712905] exFAT-fs (sdb1): Filesystem has been set read-only Fixes: 1e5654de0f51 ("exfat: handle wrong stream entry size in exfat_readdir()") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11fs: hfsplus: fix UAF issue in hfsplus_put_superDongliang Mu1-2/+2
commit 07db5e247ab5858439b14dd7cc1fe538b9efcf32 upstream. The current hfsplus_put_super first calls hfs_btree_close on sbi->ext_tree, then invokes iput on sbi->hidden_dir, resulting in an use-after-free issue in hfsplus_release_folio. As shown in hfsplus_fill_super, the error handling code also calls iput before hfs_btree_close. To fix this error, we move all iput calls before hfsplus_btree_close. Note that this patch is tested on Syzbot. Link: https://lkml.kernel.org/r/20230226124948.3175736-1-mudongliangabcd@gmail.com Reported-by: syzbot+57e3e98f7e3b80f64d56@syzkaller.appspotmail.com Tested-by: Dongliang Mu <mudongliangabcd@gmail.com> Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11hfs: fix missing hfs_bnode_get() in __hfs_bnode_createLiu Shixin1-0/+1
commit a9dc087fd3c484fd1ed18c5efb290efaaf44ce03 upstream. Syzbot found a kernel BUG in hfs_bnode_put(): kernel BUG at fs/hfs/bnode.c:466! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 3634 Comm: kworker/u4:5 Not tainted 6.1.0-rc7-syzkaller-00190-g97ee9d1c1696 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Workqueue: writeback wb_workfn (flush-7:0) RIP: 0010:hfs_bnode_put+0x46f/0x480 fs/hfs/bnode.c:466 Code: 8a 80 ff e9 73 fe ff ff 89 d9 80 e1 07 80 c1 03 38 c1 0f 8c a0 fe ff ff 48 89 df e8 db 8a 80 ff e9 93 fe ff ff e8 a1 68 2c ff <0f> 0b e8 9a 68 2c ff 0f 0b 0f 1f 84 00 00 00 00 00 55 41 57 41 56 RSP: 0018:ffffc90003b4f258 EFLAGS: 00010293 RAX: ffffffff825e318f RBX: 0000000000000000 RCX: ffff8880739dd7c0 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffc90003b4f430 R08: ffffffff825e2d9b R09: ffffed10045157d1 R10: ffffed10045157d1 R11: 1ffff110045157d0 R12: ffff8880228abe80 R13: ffff88807016c000 R14: dffffc0000000000 R15: ffff8880228abe00 FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa6ebe88718 CR3: 000000001e93d000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> hfs_write_inode+0x1bc/0xb40 write_inode fs/fs-writeback.c:1440 [inline] __writeback_single_inode+0x4d6/0x670 fs/fs-writeback.c:1652 writeback_sb_inodes+0xb3b/0x18f0 fs/fs-writeback.c:1878 __writeback_inodes_wb+0x125/0x420 fs/fs-writeback.c:1949 wb_writeback+0x440/0x7b0 fs/fs-writeback.c:2054 wb_check_start_all fs/fs-writeback.c:2176 [inline] wb_do_writeback fs/fs-writeback.c:2202 [inline] wb_workfn+0x827/0xef0 fs/fs-writeback.c:2235 process_one_work+0x877/0xdb0 kernel/workqueue.c:2289 worker_thread+0xb14/0x1330 kernel/workqueue.c:2436 kthread+0x266/0x300 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 </TASK> The BUG_ON() is triggered at here: /* Dispose of resources used by a node */ void hfs_bnode_put(struct hfs_bnode *node) { if (node) { <skipped> BUG_ON(!atomic_read(&node->refcnt)); <- we have issue here!!!! <skipped> } } By tracing the refcnt, I found the node is created by hfs_bmap_alloc() with refcnt 1. Then the node is used by hfs_btree_write(). There is a missing of hfs_bnode_get() after find the node. The issue happened in following path: <alloc> hfs_bmap_alloc hfs_bnode_find __hfs_bnode_create <- allocate a new node with refcnt 1. hfs_bnode_put <- decrease the refcnt <write> hfs_btree_write hfs_bnode_find __hfs_bnode_create hfs_bnode_findhash <- find the node without refcnt increased. hfs_bnode_put <- trigger the BUG_ON() since refcnt is 0. Link: https://lkml.kernel.org/r/20221212021627.3766829-1-liushixin2@huawei.com Reported-by: syzbot+5b04b49a7ec7226c7426@syzkaller.appspotmail.com Signed-off-by: Liu Shixin <liushixin2@huawei.com> Cc: Fabio M. De Francesco <fmdefrancesco@gmail.com> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11cifs: Fix uninitialized memory read in smb3_qfs_tcon()Volker Lendecke1-6/+7
commit d447e794a37288ec7a080aa1b044a8d9deebbab7 upstream. oparms was not fully initialized Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11nfsd: zero out pointers after putting nfsd_files on COPY setup errorJeff Layton1-0/+2
[ Upstream commit 1f0001d43d0c0ac2a19a34a914f6595ad97cbc1d ] At first, I thought this might be a source of nfsd_file overputs, but the current callers seem to avoid an extra put when nfsd4_verify_copy returns an error. Still, it's "bad form" to leave the pointers filled out when we don't have a reference to them anymore, and that might lead to bugs later. Zero them out as a defensive coding measure. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11gfs2: Improve gfs2_make_fs_rw error handlingAndreas Gruenbacher1-2/+6
[ Upstream commit b66f723bb552ad59c2acb5d45ea45c890f84498b ] In gfs2_make_fs_rw(), make sure to call gfs2_consist() to report an inconsistency and mark the filesystem as withdrawn when gfs2_find_jhead() fails. At the end of gfs2_make_fs_rw(), when we discover that the filesystem has been withdrawn, make sure we report an error. This also replaces the gfs2_withdrawn() check after gfs2_find_jhead(). Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: syzbot+f51cb4b9afbd87ec06f2@syzkaller.appspotmail.com Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11coda: Avoid partial allocation of sig_inputArgsKees Cook1-1/+1
[ Upstream commit 48df133578c70185a95a49390d42df1996ddba2a ] GCC does not like having a partially allocated object, since it cannot reason about it for bounds checking when it is passed to other code. Instead, fully allocate sig_inputArgs. (Alternatively, sig_inputArgs should be defined as a struct coda_in_hdr, if it is actually not using any other part of the union.) Seen under GCC 13: ../fs/coda/upcall.c: In function 'coda_upcall': ../fs/coda/upcall.c:801:22: warning: array subscript 'union inputArgs[0]' is partly outside array bounds of 'unsigned char[20]' [-Warray-bounds=] 801 | sig_inputArgs->ih.opcode = CODA_SIGNAL; | ^~ Cc: Jan Harkes <jaharkes@cs.cmu.edu> Cc: coda@cs.cmu.edu Cc: codalist@coda.cs.cmu.edu Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230127223921.never.882-kees@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11udf: Define EFSCORRUPTED error codeJan Kara1-0/+2
[ Upstream commit 3d2d7e61553dbcc8ba45201d8ae4f383742c8202 ] Similarly to other filesystems define EFSCORRUPTED error code for reporting internal filesystem corruption. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11gfs2: jdata writepage fixAndreas Gruenbacher1-2/+1
[ Upstream commit cbb60951ce18c9b6e91d2eb97deb41d8ff616622 ] The ->writepage() and ->writepages() operations are supposed to write entire pages. However, on filesystems with a block size smaller than PAGE_SIZE, __gfs2_jdata_writepage() only adds the first block to the current transaction instead of adding the entire page. Fix that. Fixes: 18ec7d5c3f43 ("[GFS2] Make journaled data files identical to normal files on disk") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11cifs: Fix warning and UAF when destroy the MR listZhang Xiaoxu1-1/+2
[ Upstream commit 3e161c2791f8e661eed24a2c624087084d910215 ] If the MR allocate failed, the MR recovery work not initialized and list not cleared. Then will be warning and UAF when release the MR: WARNING: CPU: 4 PID: 824 at kernel/workqueue.c:3066 __flush_work.isra.0+0xf7/0x110 CPU: 4 PID: 824 Comm: mount.cifs Not tainted 6.1.0-rc5+ #82 RIP: 0010:__flush_work.isra.0+0xf7/0x110 Call Trace: <TASK> __cancel_work_timer+0x2ba/0x2e0 smbd_destroy+0x4e1/0x990 _smbd_get_connection+0x1cbd/0x2110 smbd_get_connection+0x21/0x40 cifs_get_tcp_session+0x8ef/0xda0 mount_get_conns+0x60/0x750 cifs_mount+0x103/0xd00 cifs_smb3_do_mount+0x1dd/0xcb0 smb3_get_tree+0x1d5/0x300 vfs_get_tree+0x41/0xf0 path_mount+0x9b3/0xdd0 __x64_sys_mount+0x190/0x1d0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 BUG: KASAN: use-after-free in smbd_destroy+0x4fc/0x990 Read of size 8 at addr ffff88810b156a08 by task mount.cifs/824 CPU: 4 PID: 824 Comm: mount.cifs Tainted: G W 6.1.0-rc5+ #82 Call Trace: dump_stack_lvl+0x34/0x44 print_report+0x171/0x472 kasan_report+0xad/0x130 smbd_destroy+0x4fc/0x990 _smbd_get_connection+0x1cbd/0x2110 smbd_get_connection+0x21/0x40 cifs_get_tcp_session+0x8ef/0xda0 mount_get_conns+0x60/0x750 cifs_mount+0x103/0xd00 cifs_smb3_do_mount+0x1dd/0xcb0 smb3_get_tree+0x1d5/0x300 vfs_get_tree+0x41/0xf0 path_mount+0x9b3/0xdd0 __x64_sys_mount+0x190/0x1d0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Allocated by task 824: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x7a/0x90 _smbd_get_connection+0x1b6f/0x2110 smbd_get_connection+0x21/0x40 cifs_get_tcp_session+0x8ef/0xda0 mount_get_conns+0x60/0x750 cifs_mount+0x103/0xd00 cifs_smb3_do_mount+0x1dd/0xcb0 smb3_get_tree+0x1d5/0x300 vfs_get_tree+0x41/0xf0 path_mount+0x9b3/0xdd0 __x64_sys_mount+0x190/0x1d0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 824: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 ____kasan_slab_free+0x143/0x1b0 __kmem_cache_free+0xc8/0x330 _smbd_get_connection+0x1c6a/0x2110 smbd_get_connection+0x21/0x40 cifs_get_tcp_session+0x8ef/0xda0 mount_get_conns+0x60/0x750 cifs_mount+0x103/0xd00 cifs_smb3_do_mount+0x1dd/0xcb0 smb3_get_tree+0x1d5/0x300 vfs_get_tree+0x41/0xf0 path_mount+0x9b3/0xdd0 __x64_sys_mount+0x190/0x1d0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Let's initialize the MR recovery work before MR allocate to prevent the warning, remove the MRs from the list to prevent the UAF. Fixes: c7398583340a ("CIFS: SMBD: Implement RDMA memory registration") Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11cifs: Fix lost destroy smbd connection when MR allocate failedZhang Xiaoxu1-0/+1
[ Upstream commit e9d3401d95d62a9531082cd2453ed42f2740e3fd ] If the MR allocate failed, the smb direct connection info is NULL, then smbd_destroy() will directly return, then the connection info will be leaked. Let's set the smb direct connection info to the server before call smbd_destroy(). Fixes: c7398583340a ("CIFS: SMBD: Implement RDMA memory registration") Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11nfsd: fix race to check ls_layoutsBenjamin Coddington1-2/+2
[ Upstream commit fb610c4dbc996415d57d7090957ecddd4fd64fb6 ] Its possible for __break_lease to find the layout's lease before we've added the layout to the owner's ls_layouts list. In that case, setting ls_recalled = true without actually recalling the layout will cause the server to never send a recall callback. Move the check for ls_layouts before setting ls_recalled. Fixes: c5c707f96fc9 ("nfsd: implement pNFS layout recalls") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>