summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2022-03-03fsdax: depend on ZONE_DEVICE || FS_DAX_LIMITEDChristoph Hellwig1-0/+1
Add a depends on ZONE_DEVICE support or the s390-specific limited DAX support, as one of the two is required at runtime for fsdax code to actually work. Link: https://lkml.kernel.org/r/20220210072828.2930359-9-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Christian Knig <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-03mm: remove the extra ZONE_DEVICE struct page refcountChristoph Hellwig1-1/+0
ZONE_DEVICE struct pages have an extra reference count that complicates the code for put_page() and several places in the kernel that need to check the reference count to see that a page is not being used (gup, compaction, migration, etc.). Clean up the code so the reference count doesn't need to be treated specially for ZONE_DEVICE pages. Note that this excludes the special idle page wakeup for fsdax pages, which still happens at refcount 1. This is a separate issue and will be sorted out later. Given that only fsdax pages require the notifiacation when the refcount hits 1 now, the PAGEMAP_OPS Kconfig symbol can go away and be replaced with a FS_DAX check for this hook in the put_page fastpath. Based on an earlier patch from Ralph Campbell <rcampbell@nvidia.com>. Link: https://lkml.kernel.org/r/20220210072828.2930359-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Christian Knig <christian.koenig@amd.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-03mm: don't include <linux/memremap.h> in <linux/mm.h>Christoph Hellwig2-0/+2
Move the check for the actual pgmap types that need the free at refcount one behavior into the out of line helper, and thus avoid the need to pull memremap.h into mm.h. Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-02-15Merge tag 'for-5.17-rc4-tag' of ↵Linus Torvalds3-2/+19
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - yield CPU more often when defragmenting a large file - skip defragmenting extents already under writeback - improve error message when send fails to write file data - get rid of warning when mounted with 'flushoncommit' * tag 'for-5.17-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: send: in case of IO error log it btrfs: get rid of warning on transaction commit when using flushoncommit btrfs: defrag: don't try to defrag extents which are under writeback btrfs: don't hold CPU for too long when defragging a file
2022-02-13Merge tag '5.17-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds5-17/+31
Pull cifs fixes from Steve French: "Three small smb3 reconnect fixes and an error log clarification" * tag '5.17-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: mark sessions for reconnection in helper function cifs: call helper functions for marking channels for reconnect cifs: call cifs_reconnect when a connection is marked [smb3] improve error message when mount options conflict with posix
2022-02-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-10/+32
Merge misc fixes from Andrew Morton: "5 patches. Subsystems affected by this patch series: binfmt, procfs, and mm (vmscan, memcg, and kfence)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: kfence: make test case compatible with run time set sample interval mm: memcg: synchronize objcg lists with a dedicated spinlock mm: vmscan: remove deadlock due to throttling failing to make progress fs/proc: task_mmu.c: don't read mapcount for migration entry fs/binfmt_elf: fix PT_LOAD p_align values for loaders
2022-02-12fs/proc: task_mmu.c: don't read mapcount for migration entryYang Shi1-9/+31
The syzbot reported the below BUG: kernel BUG at include/linux/page-flags.h:785! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 4392 Comm: syz-executor560 Not tainted 5.16.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:PageDoubleMap include/linux/page-flags.h:785 [inline] RIP: 0010:__page_mapcount+0x2d2/0x350 mm/util.c:744 Call Trace: page_mapcount include/linux/mm.h:837 [inline] smaps_account+0x470/0xb10 fs/proc/task_mmu.c:466 smaps_pte_entry fs/proc/task_mmu.c:538 [inline] smaps_pte_range+0x611/0x1250 fs/proc/task_mmu.c:601 walk_pmd_range mm/pagewalk.c:128 [inline] walk_pud_range mm/pagewalk.c:205 [inline] walk_p4d_range mm/pagewalk.c:240 [inline] walk_pgd_range mm/pagewalk.c:277 [inline] __walk_page_range+0xe23/0x1ea0 mm/pagewalk.c:379 walk_page_vma+0x277/0x350 mm/pagewalk.c:530 smap_gather_stats.part.0+0x148/0x260 fs/proc/task_mmu.c:768 smap_gather_stats fs/proc/task_mmu.c:741 [inline] show_smap+0xc6/0x440 fs/proc/task_mmu.c:822 seq_read_iter+0xbb0/0x1240 fs/seq_file.c:272 seq_read+0x3e0/0x5b0 fs/seq_file.c:162 vfs_read+0x1b5/0x600 fs/read_write.c:479 ksys_read+0x12d/0x250 fs/read_write.c:619 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae The reproducer was trying to read /proc/$PID/smaps when calling MADV_FREE at the mean time. MADV_FREE may split THPs if it is called for partial THP. It may trigger the below race: CPU A CPU B ----- ----- smaps walk: MADV_FREE: page_mapcount() PageCompound() split_huge_page() page = compound_head(page) PageDoubleMap(page) When calling PageDoubleMap() this page is not a tail page of THP anymore so the BUG is triggered. This could be fixed by elevated refcount of the page before calling mapcount, but that would prevent it from counting migration entries, and it seems overkilling because the race just could happen when PMD is split so all PTE entries of tail pages are actually migration entries, and smaps_account() does treat migration entries as mapcount == 1 as Kirill pointed out. Add a new parameter for smaps_account() to tell this entry is migration entry then skip calling page_mapcount(). Don't skip getting mapcount for device private entries since they do track references with mapcount. Pagemap also has the similar issue although it was not reported. Fixed it as well. [shy828301@gmail.com: v4] Link: https://lkml.kernel.org/r/20220203182641.824731-1-shy828301@gmail.com [nathan@kernel.org: avoid unused variable warning in pagemap_pmd_range()] Link: https://lkml.kernel.org/r/20220207171049.1102239-1-nathan@kernel.org Link: https://lkml.kernel.org/r/20220120202805.3369-1-shy828301@gmail.com Fixes: e9b61f19858a ("thp: reintroduce split_huge_page()") Signed-off-by: Yang Shi <shy828301@gmail.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reported-by: syzbot+1f52b3a18d5633fa7f82@syzkaller.appspotmail.com Acked-by: David Hildenbrand <david@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-12fs/binfmt_elf: fix PT_LOAD p_align values for loadersMike Rapoport1-1/+1
Rui Salvaterra reported that Aisleroit solitaire crashes with "Wrong __data_start/_end pair" assertion from libgc after update to v5.17-rc1. Bisection pointed to commit 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE") that fixed handling of static PIEs, but made the condition that guards load_bias calculation to exclude loader binaries. Restoring the check for presence of interpreter fixes the problem. Link: https://lkml.kernel.org/r/20220202121433.3697146-1-rppt@kernel.org Fixes: 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reported-by: Rui Salvaterra <rsalvaterra@gmail.com> Tested-by: Rui Salvaterra <rsalvaterra@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H.J. Lu" <hjl.tools@gmail.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-11Merge tag 'gfs2-v5.16-rc3-fixes2' of ↵Linus Torvalds2-6/+4
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: - Revert debug commit that causes unexpected data corruption - Fix muti-block reservation regression * tag 'gfs2-v5.16-rc3-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix gfs2_release for non-writers regression Revert "gfs2: check context in gfs2_glock_put"
2022-02-11Merge tag 'io_uring-5.17-2022-02-11' of git://git.kernel.dk/linux-blockLinus Torvalds1-5/+3
Pull io_uring fixes from Jens Axboe: - Fix a false-positive warning from an older gcc (Alviro) - Allow oom killer invocations from io_uring_setup (Shakeel) * tag 'io_uring-5.17-2022-02-11' of git://git.kernel.dk/linux-block: mm: io_uring: allow oom-killer from io_uring_setup io_uring: Clean up a false-positive warning from GCC 9.3.0
2022-02-11gfs2: Fix gfs2_release for non-writers regressionBob Peterson1-3/+4
When a file is opened for writing, the vfs code (do_dentry_open) calls get_write_access for the inode, thus incrementing the inode's write count. That writer normally then creates a multi-block reservation for the inode (i_res) that can be re-used by other writers, which speeds up writes for applications that stupidly loop on open/write/close. When the writes are all done, the multi-block reservation should be deleted when the file is closed by the last "writer." Commit 0ec9b9ea4f83 broke that concept when it moved the call to gfs2_rs_delete before the check for FMODE_WRITE. Non-writers have no business removing the multi-block reservations of writers. In fact, if someone opens and closes the file for RO while a writer has a multi-block reservation, the RO closer will delete the reservation midway through the write, and this results in: kernel BUG at fs/gfs2/rgrp.c:677! (or thereabouts) which is: BUG_ON(rs->rs_requested); from function gfs2_rs_deltree. This patch moves the check back inside the check for FMODE_WRITE. Fixes: 0ec9b9ea4f83 ("gfs2: Check for active reservation in gfs2_release") Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-02-11Revert "gfs2: check context in gfs2_glock_put"Andreas Gruenbacher1-3/+0
It turns out that the might_sleep() call that commit 660a6126f8c3 adds is triggering occasional data corruption in testing. We're not sure about the root cause yet, but since this commit was added as a debugging aid only, revert it for now. This reverts commit 660a6126f8c3208f6df8d552039cda078a8426d1. Fixes: 660a6126f8c3 ("gfs2: check context in gfs2_glock_put") Cc: stable@vger.kernel.org # v5.16+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-02-09Merge tag 'nfsd-5.17-2' of ↵Linus Torvalds7-47/+74
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull more nfsd fixes from Chuck Lever: "Ensure that NFS clients cannot send file size or offset values that can cause the NFS server to crash or to return incorrect or surprising results. In particular, fix how the NFS server handles values larger than OFFSET_MAX" * tag 'nfsd-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Deprecate NFS_OFFSET_MAX NFSD: Fix offset type in I/O trace points NFSD: COMMIT operations must not return NFS?ERR_INVAL NFSD: Clamp WRITE offsets NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizes NFSD: Fix ia_size underflow NFSD: Fix the behavior of READ near OFFSET_MAX
2022-02-09btrfs: send: in case of IO error log itDāvis Mosāns1-0/+4
Currently if we get IO error while doing send then we abort without logging information about which file caused issue. So log it to help with debugging. CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Dāvis Mosāns <davispuh@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-02-09btrfs: get rid of warning on transaction commit when using flushoncommitFilipe Manana1-2/+10
When using the flushoncommit mount option, during almost every transaction commit we trigger a warning from __writeback_inodes_sb_nr(): $ cat fs/fs-writeback.c: (...) static void __writeback_inodes_sb_nr(struct super_block *sb, ... { (...) WARN_ON(!rwsem_is_locked(&sb->s_umount)); (...) } (...) The trace produced in dmesg looks like the following: [947.473890] WARNING: CPU: 5 PID: 930 at fs/fs-writeback.c:2610 __writeback_inodes_sb_nr+0x7e/0xb3 [947.481623] Modules linked in: nfsd nls_cp437 cifs asn1_decoder cifs_arc4 fscache cifs_md4 ipmi_ssif [947.489571] CPU: 5 PID: 930 Comm: btrfs-transacti Not tainted 95.16.3-srb-asrock-00001-g36437ad63879 #186 [947.497969] RIP: 0010:__writeback_inodes_sb_nr+0x7e/0xb3 [947.502097] Code: 24 10 4c 89 44 24 18 c6 (...) [947.519760] RSP: 0018:ffffc90000777e10 EFLAGS: 00010246 [947.523818] RAX: 0000000000000000 RBX: 0000000000963300 RCX: 0000000000000000 [947.529765] RDX: 0000000000000000 RSI: 000000000000fa51 RDI: ffffc90000777e50 [947.535740] RBP: ffff888101628a90 R08: ffff888100955800 R09: ffff888100956000 [947.541701] R10: 0000000000000002 R11: 0000000000000001 R12: ffff888100963488 [947.547645] R13: ffff888100963000 R14: ffff888112fb7200 R15: ffff888100963460 [947.553621] FS: 0000000000000000(0000) GS:ffff88841fd40000(0000) knlGS:0000000000000000 [947.560537] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [947.565122] CR2: 0000000008be50c4 CR3: 000000000220c000 CR4: 00000000001006e0 [947.571072] Call Trace: [947.572354] <TASK> [947.573266] btrfs_commit_transaction+0x1f1/0x998 [947.576785] ? start_transaction+0x3ab/0x44e [947.579867] ? schedule_timeout+0x8a/0xdd [947.582716] transaction_kthread+0xe9/0x156 [947.585721] ? btrfs_cleanup_transaction.isra.0+0x407/0x407 [947.590104] kthread+0x131/0x139 [947.592168] ? set_kthread_struct+0x32/0x32 [947.595174] ret_from_fork+0x22/0x30 [947.597561] </TASK> [947.598553] ---[ end trace 644721052755541c ]--- This is because we started using writeback_inodes_sb() to flush delalloc when committing a transaction (when using -o flushoncommit), in order to avoid deadlocks with filesystem freeze operations. This change was made by commit ce8ea7cc6eb313 ("btrfs: don't call btrfs_start_delalloc_roots in flushoncommit"). After that change we started producing that warning, and every now and then a user reports this since the warning happens too often, it spams dmesg/syslog, and a user is unsure if this reflects any problem that might compromise the filesystem's reliability. We can not just lock the sb->s_umount semaphore before calling writeback_inodes_sb(), because that would at least deadlock with filesystem freezing, since at fs/super.c:freeze_super() sync_filesystem() is called while we are holding that semaphore in write mode, and that can trigger a transaction commit, resulting in a deadlock. It would also trigger the same type of deadlock in the unmount path. Possibly, it could also introduce some other locking dependencies that lockdep would report. To fix this call try_to_writeback_inodes_sb() instead of writeback_inodes_sb(), because that will try to read lock sb->s_umount and then will only call writeback_inodes_sb() if it was able to lock it. This is fine because the cases where it can't read lock sb->s_umount are during a filesystem unmount or during a filesystem freeze - in those cases sb->s_umount is write locked and sync_filesystem() is called, which calls writeback_inodes_sb(). In other words, in all cases where we can't take a read lock on sb->s_umount, writeback is already being triggered elsewhere. An alternative would be to call btrfs_start_delalloc_roots() with a number of pages different from LONG_MAX, for example matching the number of delalloc bytes we currently have, in which case we would end up starting all delalloc with filemap_fdatawrite_wbc() and not with an async flush via filemap_flush() - that is only possible after the rather recent commit e076ab2a2ca70a ("btrfs: shrink delalloc pages instead of full inodes"). However that creates a whole new can of worms due to new lock dependencies, which lockdep complains, like for example: [ 8948.247280] ====================================================== [ 8948.247823] WARNING: possible circular locking dependency detected [ 8948.248353] 5.17.0-rc1-btrfs-next-111 #1 Not tainted [ 8948.248786] ------------------------------------------------------ [ 8948.249320] kworker/u16:18/933570 is trying to acquire lock: [ 8948.249812] ffff9b3de1591690 (sb_internal#2){.+.+}-{0:0}, at: find_free_extent+0x141e/0x1590 [btrfs] [ 8948.250638] but task is already holding lock: [ 8948.251140] ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs] [ 8948.252018] which lock already depends on the new lock. [ 8948.252710] the existing dependency chain (in reverse order) is: [ 8948.253343] -> #2 (&root->delalloc_mutex){+.+.}-{3:3}: [ 8948.253950] __mutex_lock+0x90/0x900 [ 8948.254354] start_delalloc_inodes+0x78/0x400 [btrfs] [ 8948.254859] btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs] [ 8948.255408] btrfs_commit_transaction+0x32f/0xc00 [btrfs] [ 8948.255942] btrfs_mksubvol+0x380/0x570 [btrfs] [ 8948.256406] btrfs_mksnapshot+0x81/0xb0 [btrfs] [ 8948.256870] __btrfs_ioctl_snap_create+0x17f/0x190 [btrfs] [ 8948.257413] btrfs_ioctl_snap_create_v2+0xbb/0x140 [btrfs] [ 8948.257961] btrfs_ioctl+0x1196/0x3630 [btrfs] [ 8948.258418] __x64_sys_ioctl+0x83/0xb0 [ 8948.258793] do_syscall_64+0x3b/0xc0 [ 8948.259146] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 8948.259709] -> #1 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}: [ 8948.260330] __mutex_lock+0x90/0x900 [ 8948.260692] btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs] [ 8948.261234] btrfs_commit_transaction+0x32f/0xc00 [btrfs] [ 8948.261766] btrfs_set_free_space_cache_v1_active+0x38/0x60 [btrfs] [ 8948.262379] btrfs_start_pre_rw_mount+0x119/0x180 [btrfs] [ 8948.262909] open_ctree+0x1511/0x171e [btrfs] [ 8948.263359] btrfs_mount_root.cold+0x12/0xde [btrfs] [ 8948.263863] legacy_get_tree+0x30/0x50 [ 8948.264242] vfs_get_tree+0x28/0xc0 [ 8948.264594] vfs_kern_mount.part.0+0x71/0xb0 [ 8948.265017] btrfs_mount+0x11d/0x3a0 [btrfs] [ 8948.265462] legacy_get_tree+0x30/0x50 [ 8948.265851] vfs_get_tree+0x28/0xc0 [ 8948.266203] path_mount+0x2d4/0xbe0 [ 8948.266554] __x64_sys_mount+0x103/0x140 [ 8948.266940] do_syscall_64+0x3b/0xc0 [ 8948.267300] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 8948.267790] -> #0 (sb_internal#2){.+.+}-{0:0}: [ 8948.268322] __lock_acquire+0x12e8/0x2260 [ 8948.268733] lock_acquire+0xd7/0x310 [ 8948.269092] start_transaction+0x44c/0x6e0 [btrfs] [ 8948.269591] find_free_extent+0x141e/0x1590 [btrfs] [ 8948.270087] btrfs_reserve_extent+0x14b/0x280 [btrfs] [ 8948.270588] cow_file_range+0x17e/0x490 [btrfs] [ 8948.271051] btrfs_run_delalloc_range+0x345/0x7a0 [btrfs] [ 8948.271586] writepage_delalloc+0xb5/0x170 [btrfs] [ 8948.272071] __extent_writepage+0x156/0x3c0 [btrfs] [ 8948.272579] extent_write_cache_pages+0x263/0x460 [btrfs] [ 8948.273113] extent_writepages+0x76/0x130 [btrfs] [ 8948.273573] do_writepages+0xd2/0x1c0 [ 8948.273942] filemap_fdatawrite_wbc+0x68/0x90 [ 8948.274371] start_delalloc_inodes+0x17f/0x400 [btrfs] [ 8948.274876] btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs] [ 8948.275417] flush_space+0x1f2/0x630 [btrfs] [ 8948.275863] btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs] [ 8948.276438] process_one_work+0x252/0x5a0 [ 8948.276829] worker_thread+0x55/0x3b0 [ 8948.277189] kthread+0xf2/0x120 [ 8948.277506] ret_from_fork+0x22/0x30 [ 8948.277868] other info that might help us debug this: [ 8948.278548] Chain exists of: sb_internal#2 --> &fs_info->delalloc_root_mutex --> &root->delalloc_mutex [ 8948.279601] Possible unsafe locking scenario: [ 8948.280102] CPU0 CPU1 [ 8948.280508] ---- ---- [ 8948.280915] lock(&root->delalloc_mutex); [ 8948.281271] lock(&fs_info->delalloc_root_mutex); [ 8948.281915] lock(&root->delalloc_mutex); [ 8948.282487] lock(sb_internal#2); [ 8948.282800] *** DEADLOCK *** [ 8948.283333] 4 locks held by kworker/u16:18/933570: [ 8948.283750] #0: ffff9b3dc00a9d48 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0 [ 8948.284609] #1: ffffa90349dafe70 ((work_completion)(&fs_info->async_data_reclaim_work)){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0 [ 8948.285637] #2: ffff9b3e14db5040 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}, at: btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs] [ 8948.286674] #3: ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs] [ 8948.287596] stack backtrace: [ 8948.287975] CPU: 3 PID: 933570 Comm: kworker/u16:18 Not tainted 5.17.0-rc1-btrfs-next-111 #1 [ 8948.288677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 8948.289649] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] [ 8948.290298] Call Trace: [ 8948.290517] <TASK> [ 8948.290700] dump_stack_lvl+0x59/0x73 [ 8948.291026] check_noncircular+0xf3/0x110 [ 8948.291375] ? start_transaction+0x228/0x6e0 [btrfs] [ 8948.291826] __lock_acquire+0x12e8/0x2260 [ 8948.292241] lock_acquire+0xd7/0x310 [ 8948.292714] ? find_free_extent+0x141e/0x1590 [btrfs] [ 8948.293241] ? lock_is_held_type+0xea/0x140 [ 8948.293601] start_transaction+0x44c/0x6e0 [btrfs] [ 8948.294055] ? find_free_extent+0x141e/0x1590 [btrfs] [ 8948.294518] find_free_extent+0x141e/0x1590 [btrfs] [ 8948.294957] ? _raw_spin_unlock+0x29/0x40 [ 8948.295312] ? btrfs_get_alloc_profile+0x124/0x290 [btrfs] [ 8948.295813] btrfs_reserve_extent+0x14b/0x280 [btrfs] [ 8948.296270] cow_file_range+0x17e/0x490 [btrfs] [ 8948.296691] btrfs_run_delalloc_range+0x345/0x7a0 [btrfs] [ 8948.297175] ? find_lock_delalloc_range+0x247/0x270 [btrfs] [ 8948.297678] writepage_delalloc+0xb5/0x170 [btrfs] [ 8948.298123] __extent_writepage+0x156/0x3c0 [btrfs] [ 8948.298570] extent_write_cache_pages+0x263/0x460 [btrfs] [ 8948.299061] extent_writepages+0x76/0x130 [btrfs] [ 8948.299495] do_writepages+0xd2/0x1c0 [ 8948.299817] ? sched_clock_cpu+0xd/0x110 [ 8948.300160] ? lock_release+0x155/0x4a0 [ 8948.300494] filemap_fdatawrite_wbc+0x68/0x90 [ 8948.300874] ? do_raw_spin_unlock+0x4b/0xa0 [ 8948.301243] start_delalloc_inodes+0x17f/0x400 [btrfs] [ 8948.301706] ? lock_release+0x155/0x4a0 [ 8948.302055] btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs] [ 8948.302564] flush_space+0x1f2/0x630 [btrfs] [ 8948.302970] btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs] [ 8948.303510] process_one_work+0x252/0x5a0 [ 8948.303860] ? process_one_work+0x5a0/0x5a0 [ 8948.304221] worker_thread+0x55/0x3b0 [ 8948.304543] ? process_one_work+0x5a0/0x5a0 [ 8948.304904] kthread+0xf2/0x120 [ 8948.305184] ? kthread_complete_and_exit+0x20/0x20 [ 8948.305598] ret_from_fork+0x22/0x30 [ 8948.305921] </TASK> It all comes from the fact that btrfs_start_delalloc_roots() takes the delalloc_root_mutex, in the transaction commit path we are holding a read lock on one of the superblock's freeze semaphores (via sb_start_intwrite()), the async reclaim task can also do a call to btrfs_start_delalloc_roots(), which ends up triggering writeback with calls to filemap_fdatawrite_wbc(), resulting in extent allocation which in turn can call btrfs_start_transaction(), which will result in taking the freeze semaphore via sb_start_intwrite(), forming a nasty dependency on all those locks which can be taken in different orders by different code paths. So just adopt the simple approach of calling try_to_writeback_inodes_sb() at btrfs_start_delalloc_flush(). Link: https://lore.kernel.org/linux-btrfs/20220130005258.GA7465@cuci.nl/ Link: https://lore.kernel.org/linux-btrfs/43acc426-d683-d1b6-729d-c6bc4a2fff4d@gmail.com/ Link: https://lore.kernel.org/linux-btrfs/6833930a-08d7-6fbc-0141-eb9cdfd6bb4d@gmail.com/ Link: https://lore.kernel.org/linux-btrfs/20190322041731.GF16651@hungrycats.org/ Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> [ add more link reports ] Signed-off-by: David Sterba <dsterba@suse.com>
2022-02-09btrfs: defrag: don't try to defrag extents which are under writebackQu Wenruo1-0/+4
Once we start writeback (have called btrfs_run_delalloc_range()), we allocate an extent, create an extent map point to that extent, with a generation of (u64)-1, created the ordered extent and then clear the DELALLOC bit from the range in the inode's io tree. Such extent map can pass the first call of defrag_collect_targets(), as its generation is (u64)-1, meets any possible minimal generation check. And the range will not have DELALLOC bit, also passing the DELALLOC bit check. It will only be re-checked in the second call of defrag_collect_targets(), which will wait for writeback. But at that stage we have already spent our time waiting for some IO we may or may not want to defrag. Let's reject such extents early so we won't waste our time. CC: stable@vger.kernel.org # 5.16 Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-02-09btrfs: don't hold CPU for too long when defragging a fileQu Wenruo1-0/+1
There is a user report about "btrfs filesystem defrag" causing 120s timeout problem. For btrfs_defrag_file() it will iterate all file extents if called from defrag ioctl, thus it can take a long time. There is no reason not to release the CPU during such a long operation. Add cond_resched() after defragged one cluster. CC: stable@vger.kernel.org # 5.16 Link: https://lore.kernel.org/linux-btrfs/10e51417-2203-f0a4-2021-86c8511cc367@gmx.com Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-02-09Fix regression due to "fs: move binfmt_misc sysctl to its own file"Domenico Andreoli2-5/+3
Commit 3ba442d5331f ("fs: move binfmt_misc sysctl to its own file") did not go unnoticed, binfmt-support stopped to work on my Debian system since v5.17-rc2 (did not check with -rc1). The existance of the /proc/sys/fs/binfmt_misc is a precondition for attempting to mount the binfmt_misc fs, which in turn triggers the autoload of the binfmt_misc module. Without it, no module is loaded and no binfmt is available at boot. Building as built-in or manually loading the module and mounting the fs works fine, it's therefore only a matter of interaction with user-space. I could try to improve the Debian systemd configuration but I can't say anything about the other distributions. This patch restores a working system right after boot. Fixes: 3ba442d5331f ("fs: move binfmt_misc sysctl to its own file") Signed-off-by: Domenico Andreoli <domenico.andreoli@linux.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-09NFSD: Deprecate NFS_OFFSET_MAXChuck Lever2-2/+2
NFS_OFFSET_MAX was introduced way back in Linux v2.3.y before there was a kernel-wide OFFSET_MAX value. As a clean up, replace the last few uses of it with its generic equivalent, and get rid of it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09NFSD: Fix offset type in I/O trace pointsChuck Lever1-7/+7
NFSv3 and NFSv4 use u64 offset values on the wire. Record these values verbatim without the implicit type case to loff_t. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09NFSD: COMMIT operations must not return NFS?ERR_INVALChuck Lever3-25/+38
Since, well, forever, the Linux NFS server's nfsd_commit() function has returned nfserr_inval when the passed-in byte range arguments were non-sensical. However, according to RFC 1813 section 3.3.21, NFSv3 COMMIT requests are permitted to return only the following non-zero status codes: NFS3ERR_IO NFS3ERR_STALE NFS3ERR_BADHANDLE NFS3ERR_SERVERFAULT NFS3ERR_INVAL is not included in that list. Likewise, NFS4ERR_INVAL is not listed in the COMMIT row of Table 6 in RFC 8881. RFC 7530 does permit COMMIT to return NFS4ERR_INVAL, but does not specify when it can or should be used. Instead of dropping or failing a COMMIT request in a byte range that is not supported, turn it into a valid request by treating one or both arguments as zero. Offset zero means start-of-file, count zero means until-end-of-file, so we only ever extend the commit range. NFS servers are always allowed to commit more and sooner than requested. The range check is no longer bounded by NFS_OFFSET_MAX, but rather by the value that is returned in the maxfilesize field of the NFSv3 FSINFO procedure or the NFSv4 maxfilesize file attribute. Note that this change results in a new pynfs failure: CMT4 st_commit.testCommitOverflow : RUNNING CMT4 st_commit.testCommitOverflow : FAILURE COMMIT with offset + count overflow should return NFS4ERR_INVAL, instead got NFS4_OK IMO the test is not correct as written: RFC 8881 does not allow the COMMIT operation to return NFS4ERR_INVAL. Reported-by: Dan Aloni <dan.aloni@vastdata.com> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Bruce Fields <bfields@fieldses.org>
2022-02-09NFSD: Clamp WRITE offsetsChuck Lever2-2/+8
Ensure that a client cannot specify a WRITE range that falls in a byte range outside what the kernel's internal types (such as loff_t, which is signed) can represent. The kiocb iterators, invoked in nfsd_vfs_write(), should properly limit write operations to within the underlying file system's s_maxbytes. Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizesChuck Lever1-1/+1
iattr::ia_size is a loff_t, so these NFSv3 procedures must be careful to deal with incoming client size values that are larger than s64_max without corrupting the value. Silently capping the value results in storing a different value than the client passed in which is unexpected behavior, so remove the min_t() check in decode_sattr3(). Note that RFC 1813 permits only the WRITE procedure to return NFS3ERR_FBIG. We believe that NFSv3 reference implementations also return NFS3ERR_FBIG when ia_size is too large. Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09NFSD: Fix ia_size underflowChuck Lever1-0/+4
iattr::ia_size is a loff_t, which is a signed 64-bit type. NFSv3 and NFSv4 both define file size as an unsigned 64-bit type. Thus there is a range of valid file size values an NFS client can send that is already larger than Linux can handle. Currently decode_fattr4() dumps a full u64 value into ia_size. If that value happens to be larger than S64_MAX, then ia_size underflows. I'm about to fix up the NFSv3 behavior as well, so let's catch the underflow in the common code path: nfsd_setattr(). Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09NFSD: Fix the behavior of READ near OFFSET_MAXChuck Lever3-10/+14
Dan Aloni reports: > Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to > the RPC read layers") on the client, a read of 0xfff is aligned up > to server rsize of 0x1000. > > As a result, in a test where the server has a file of size > 0x7fffffffffffffff, and the client tries to read from the offset > 0x7ffffffffffff000, the read causes loff_t overflow in the server > and it returns an NFS code of EINVAL to the client. The client as > a result indefinitely retries the request. The Linux NFS client does not handle NFS?ERR_INVAL, even though all NFS specifications permit servers to return that status code for a READ. Instead of NFS?ERR_INVAL, have out-of-range READ requests succeed and return a short result. Set the EOF flag in the result to prevent the client from retrying the READ request. This behavior appears to be consistent with Solaris NFS servers. Note that NFSv3 and NFSv4 use u64 offset values on the wire. These must be converted to loff_t internally before use -- an implicit type cast is not adequate for this purpose. Otherwise VFS checks against sb->s_maxbytes do not work properly. Reported-by: Dan Aloni <dan.aloni@vastdata.com> Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-09cifs: mark sessions for reconnection in helper functionShyam Prasad N1-5/+14
Today we have the code to mark connections and sessions (and tcons) for reconnect clubbed with the code to close the socket and abort all mids in the same function. Sometimes, we need to mark connections and sessions outside cifsd thread. So as a part of this change, I'm splitting this function into two different functions and calling them one after the other in cifs_reconnect. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-02-09cifs: call helper functions for marking channels for reconnectShyam Prasad N4-11/+6
cifs_mark_tcp_ses_conns_for_reconnect helper function is now meant to be used by any of the threads to mark a channel (or all the channels) for reconnect. Replace all such manual changes to tcpStatus to use this helper function, which takes care that the right channels, smb sessions and tcons are marked for reconnect. Also includes one line minor change Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-02-08Merge tag 'nfs-for-5.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds3-8/+21
Pull NFS client fixes from Anna Schumaker: "Stable Fixes: - Fix initialization of nfs_client cl_flags Other Fixes: - Fix performance issues with uncached readdir calls - Fix potential pointer dereferences in rpcrdma_ep_create - Fix nfs4_proc_get_locations() kernel-doc comment - Fix locking during sunrpc sysfs reads - Update my email address in the MAINTAINERS file to my new kernel.org email" * tag 'nfs-for-5.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: lock against ->sock changing during sysfs read MAINTAINERS: Update my email address NFS: Fix nfs4_proc_get_locations() kernel-doc comment xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_create NFS: Fix initialisation of nfs_client cl_flags field NFS: Avoid duplicate uncached readdir calls on eof NFS: Don't skip directory entries when doing uncached readdir NFS: Don't overfill uncached readdir pages
2022-02-08cifs: call cifs_reconnect when a connection is markedShyam Prasad N1-0/+1
In cifsd thread, we should continue to call cifs_reconnect whenever server->tcpStatus is marked as CifsNeedReconnect. This was inexplicably removed by one of my recent commits. Fixing that here. Fixes: a05885ce13bd ("cifs: fix the connection state transitions with multichannel") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-02-08NFS: Fix nfs4_proc_get_locations() kernel-doc commentYang Li1-1/+2
Add the description of @server and @fhandle, and remove the excess @inode in nfs4_proc_get_locations() kernel-doc comment to remove warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/nfs/nfs4proc.c:8219: warning: Function parameter or member 'server' not described in 'nfs4_proc_get_locations' fs/nfs/nfs4proc.c:8219: warning: Function parameter or member 'fhandle' not described in 'nfs4_proc_get_locations' fs/nfs/nfs4proc.c:8219: warning: Excess function parameter 'inode' description in 'nfs4_proc_get_locations' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-08NFS: Fix initialisation of nfs_client cl_flags fieldTrond Myklebust1-1/+1
For some long forgotten reason, the nfs_client cl_flags field is initialised in nfs_get_client() instead of being initialised at allocation time. This quirk was harmless until we moved the call to nfs_create_rpc_client(). Fixes: dd99e9f98fbf ("NFSv4: Initialise connection to the server in nfs4_alloc_client()") Cc: stable@vger.kernel.org # 4.8.x Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-08Merge tag '5.17-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds6-14/+70
Pull ksmbd server fixes from Steve French: - NTLMSSP authentication improvement - RDMA (smbdirect) fix allowing broader set of NICs to be supported - improved buffer validation - additional small fixes, including a posix extensions fix for stable * tag '5.17-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: add support for key exchange ksmbd: reduce smb direct max read/write size ksmbd: don't align last entry offset in smb2 query directory ksmbd: fix same UniqueId for dot and dotdot entries ksmbd: smbd: validate buffer descriptor structures ksmbd: fix SMB 3.11 posix extension mount failure
2022-02-07mm: io_uring: allow oom-killer from io_uring_setupShakeel Butt1-3/+2
On an overcommitted system which is running multiple workloads of varying priorities, it is preferred to trigger an oom-killer to kill a low priority workload than to let the high priority workload receiving ENOMEMs. On our memory overcommitted systems, we are seeing a lot of ENOMEMs instead of oom-kills because io_uring_setup callchain is using __GFP_NORETRY gfp flag which avoids the oom-killer. Let's remove it and allow the oom-killer to kill a lower priority job. Signed-off-by: Shakeel Butt <shakeelb@google.com> Link: https://lore.kernel.org/r/20220125051736.2981459-1-shakeelb@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-07io_uring: Clean up a false-positive warning from GCC 9.3.0Alviro Iskandar Setiawan1-2/+1
In io_recv(), if import_single_range() fails, the @flags variable is uninitialized, then it will goto out_free. After the goto, the compiler doesn't know that (ret < min_ret) is always true, so it thinks the "if ((flags & MSG_WAITALL) ..." path could be taken. The complaint comes from gcc-9 (Debian 9.3.0-22) 9.3.0: ``` fs/io_uring.c:5238 io_recvfrom() error: uninitialized symbol 'flags' ``` Fix this by bypassing the @ret and @flags check when import_single_range() fails. Reasons: 1. import_single_range() only returns -EFAULT when it fails. 2. At that point, @flags is uninitialized and shouldn't be read. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: "Chen, Rong A" <rong.a.chen@intel.com> Link: https://lore.gnuweeb.org/timl/d33bb5a9-8173-f65b-f653-51fc0681c6d6@intel.com/ Cc: Pavel Begunkov <asml.silence@gmail.com> Suggested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Fixes: 7297ce3d59449de49d3c9e1f64ae25488750a1fc ("io_uring: improve send/recv error handling") Signed-off-by: Alviro Iskandar Setiawan <alviro.iskandar@gmail.com> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Link: https://lore.kernel.org/r/20220207140533.565411-1-ammarfaizi2@gnuweeb.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-07[smb3] improve error message when mount options conflict with posixSteve French1-1/+10
POSIX extensions require SMB3.1.1 (so improve the error message when vers=3.0, 2.1 or 2.0 is specified on mount) Signed-off-by: Steve French <stfrench@microsoft.com>
2022-02-06Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds18-151/+192
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Various bug fixes for ext4 fast commit and inline data handling. Also fix regression introduced as part of moving to the new mount API" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: fs/ext4: fix comments mentioning i_mutex ext4: fix incorrect type issue during replay_del_range jbd2: fix kernel-doc descriptions for jbd2_journal_shrink_{scan,count}() ext4: fix potential NULL pointer dereference in ext4_fill_super() jbd2: refactor wait logic for transaction updates into a common function jbd2: cleanup unused functions declarations from jbd2.h ext4: fix error handling in ext4_fc_record_modified_inode() ext4: remove redundant max inline_size check in ext4_da_write_inline_data_begin() ext4: fix error handling in ext4_restore_inline_data() ext4: fast commit may miss file actions ext4: fast commit may not fallback for ineligible commit ext4: modify the logic of ext4_mb_new_blocks_simple ext4: prevent used blocks from being allocated during fast commit replay
2022-02-05Merge tag 'xfs-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds5-79/+69
Pull xfs fixes from Darrick Wong: "I was auditing operations in XFS that clear file privileges, and realized that XFS' fallocate implementation drops suid/sgid but doesn't clear file capabilities the same way that file writes and reflink do. There are VFS helpers that do it correctly, so refactor XFS to use them. I also noticed that we weren't flushing the log at the correct point in the fallocate operation, so that's fixed too. Summary: - Fix fallocate so that it drops all file privileges when files are modified instead of open-coding that incompletely. - Fix fallocate to flush the log if the caller wanted synchronous file updates" * tag 'xfs-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: ensure log flush at the end of a synchronous fallocate call xfs: move xfs_update_prealloc_flags() to xfs_pnfs.c xfs: set prealloc flag in xfs_alloc_file_space() xfs: fallocate() should call file_modified() xfs: remove XFS_PREALLOC_SYNC xfs: reject crazy array sizes being fed to XFS_IOC_GETBMAP*
2022-02-05Merge tag 'vfs-5.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds4-17/+37
Pull vfs fixes from Darrick Wong: "I was auditing the sync_fs code paths recently and noticed that most callers of ->sync_fs ignore its return value (and many implementations never return nonzero even if the fs is broken!), which means that internal fs errors and corruption are not passed up to userspace callers of syncfs(2) or FIFREEZE. Hence fixing the common code and XFS, and I'll start working on the ext4/btrfs folks if this is merged. Summary: - Fix a bug where callers of ->sync_fs (e.g. sync_filesystem and syncfs(2)) ignore the return value. - Fix a bug where callers of sync_filesystem (e.g. fs freeze) ignore the return value. - Fix a bug in XFS where xfs_fs_sync_fs never passed back error returns" * tag 'vfs-5.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: return errors in xfs_fs_sync_fs quota: make dquot_quota_sync return errors from ->sync_fs vfs: make sync_filesystem return errors from ->sync_fs vfs: make freeze_super abort when sync_filesystem returns error
2022-02-05Merge tag 'iomap-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2-5/+63
Pull iomap fix from Darrick Wong: "A single bugfix for iomap. The fix should eliminate occasional complaints about stall warnings when a lot of writeback IO completes all at once and we have to then go clearing status on a large number of folios. Summary: - Limit the length of ioend chains in writeback so that we don't trip the softlockup watchdog and to limit long tail latency on clearing PageWriteback" * tag 'iomap-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs, iomap: limit individual ioend chain lengths in writeback
2022-02-04Merge tag 'for-5.17-rc2-tag' of ↵Linus Torvalds8-9/+128
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few fixes and error handling improvements: - fix deadlock between quota disable and qgroup rescan worker - fix use-after-free after failure to create a snapshot - skip warning on unmount after log cleanup failure - don't start transaction for scrub if the fs is mounted read-only - tree checker verifies item sizes" * tag 'for-5.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: skip reserved bytes warning on unmount after log cleanup failure btrfs: fix use of uninitialized variable at rm device ioctl btrfs: fix use-after-free after failure to create a snapshot btrfs: tree-checker: check item_size for dev_item btrfs: tree-checker: check item_size for inode_item btrfs: fix deadlock between quota disable and qgroup rescan worker btrfs: don't start transaction for scrub if the fs is mounted read-only
2022-02-04Merge tag 'erofs-for-5.17-rc3-fixes' of ↵Linus Torvalds3-61/+67
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "Two fixes related to fsdax cleanup in this cycle and ztailpacking to fix small compressed data inlining. There is also a trivial cleanup to rearrange code for better reading. Summary: - fix fsdax partition offset misbehavior - clean up z_erofs_decompressqueue_work() declaration - fix up EOF lcluster inlining, especially for small compressed data" * tag 'erofs-for-5.17-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix small compressed files inlining erofs: avoid unnecessary z_erofs_decompressqueue_work() declaration erofs: fix fsdax partition offset handling
2022-02-04Merge tag '9p-for-5.17-rc3' of git://github.com/martinetd/linuxLinus Torvalds1-5/+4
Pull 9p fix from Dominique Martinet: "Fix 'cannot walk open fid' rule The 9p 'walk' operation requires fid arguments to not originate from an open or create call and we've missed that for a while as the servers regularly running tests with don't enforce the check and no active reviewer knew about the rule. Both reporters confirmed reverting this patch fixes things for them and looking at it further wasn't actually required... Will take more time for follow up and enforcing the rule more thoroughly later" * tag '9p-for-5.17-rc3' of git://github.com/martinetd/linux: Revert "fs/9p: search open fids first"
2022-02-04Merge tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds7-199/+323
Pull cifs fixes from Steve French: "SMB3 client fixes including: - multiple fscache related fixes, reenabling ability to read/write to cached files for cifs.ko (that was temporarily disabled for cifs.ko a few weeks ago due to the recent fscache changes) - also includes a new fscache helper function ("query_occupancy") used by above - fix for multiuser mounts and NTLMSSP auth (workstation name) for stable - fix locking ordering problem in multichannel code - trivial malformed comment fix" * tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix workstation_name for multiuser mounts Invalidate fscache cookie only when inode attributes are changed. cifs: Fix the readahead conversion to manage the batch when reading from cache cifs: Implement cache I/O by accessing the cache directly netfs, cachefiles: Add a method to query presence of data in the cache cifs: Transition from ->readpages() to ->readahead() cifs: unlock chan_lock before calling cifs_put_tcp_session Fix a warning about a malformed kernel doc comment in cifs
2022-02-04ksmbd: add support for key exchangeNamjae Jeon2-2/+29
When mounting cifs client, can see the following warning message. CIFS: decode_ntlmssp_challenge: authentication has been weakened as server does not support key exchange To remove this warning message, Add support for key exchange feature to ksmbd. This patch decrypts 16-byte ciphertext value sent by the client using RC4 with session key. The decrypted value is the recovered secondary key that will use instead of the session key for signing and sealing. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-02-04ksmbd: reduce smb direct max read/write sizeNamjae Jeon1-1/+1
ksmbd does not support more than one Buffer Descriptor V1 element in an smbdirect protocol request. Reducing the maximum read/write size to about 512KB allows interoperability with Windows over a wider variety of RDMA NICs, as an interim workaround. Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-02-04ksmbd: don't align last entry offset in smb2 query directoryNamjae Jeon2-3/+5
When checking smb2 query directory packets from other servers, OutputBufferLength is different with ksmbd. Other servers add an unaligned next offset to OutputBufferLength for the last entry. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-02-04ksmbd: fix same UniqueId for dot and dotdot entriesNamjae Jeon1-1/+4
ksmbd sets the inode number to UniqueId. However, the same UniqueId for dot and dotdot entry is set to the inode number of the parent inode. This patch set them using the current inode and parent inode. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-02-04ksmbd: smbd: validate buffer descriptor structuresHyunchul Lee1-6/+30
Check ChannelInfoOffset and ChannelInfoLength to validate buffer descriptor structures. And add a debug log to print the structures' content. Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-02-04erofs: fix small compressed files inliningGao Xiang1-0/+7
Prior to ztailpacking feature, it's enough that each lcluster has two pclusters at most, and the last pcluster should be turned into an uncompressed pcluster when necessary. For example, _________________________________________________ |_ pcluster n-2 _|_ pcluster n-1 _|____ EOFed ____| which should be converted into: _________________________________________________ |_ pcluster n-2 _|_ pcluster n-1 (uncompressed)' _| That is fine since either pcluster n-1 or (uncompressed)' takes one physical block. However, after ztailpacking was supported, the game is changed since the last pcluster can be inlined now. And such case above is quite common for inlining small files. Therefore, in order to inline more effectively, special EOF lclusters are now supported which can have three parts at most, as illustrated below: _________________________________________________ |_ pcluster n-2 _|_ pcluster n-1 _|____ EOFed ____| ^ i_size Actually similar code exists in Yue Hu's original patchset [1], but I removed this part on purpose. After evaluating more real cases with small files, I've changed my mind. [1] https://lore.kernel.org/r/20211215094449.15162-1-huyue2@yulong.com Link: https://lore.kernel.org/r/20220203190203.30794-1-xiang@kernel.org Fixes: ab92184ff8f1 ("erofs: add on-disk compressed tail-packing inline support") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-02-03fs/ext4: fix comments mentioning i_mutexhongnanli8-20/+20
inode->i_mutex has been replaced with inode->i_rwsem long ago. Fix comments still mentioning i_mutex. Signed-off-by: hongnanli <hongnan.li@linux.alibaba.com> Link: https://lore.kernel.org/r/20220121070611.21618-1-hongnan.li@linux.alibaba.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>