summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2021-04-04Merge tag 'io_uring-5.12-2021-04-03' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+5
POull io_uring fix from Jens Axboe: "Just fixing a silly braino in a previous patch, where we'd end up failing to compile if CONFIG_BLOCK isn't enabled. Not that a lot of people do that, but kernel bot spotted it and it's probably prudent to just flush this out now before -rc6. Sorry about that, none of my test compile configs have !CONFIG_BLOCK" * tag 'io_uring-5.12-2021-04-03' of git://git.kernel.dk/linux-block: io_uring: fix !CONFIG_BLOCK compilation failure
2021-04-03Merge tag 'gfs2-v5.12-rc2-fixes2' of ↵Linus Torvalds1-5/+9
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: "Two more gfs2 fixes" * tag 'gfs2-v5.12-rc2-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: report "already frozen/thawed" errors gfs2: Flag a withdraw if init_threads() fails
2021-04-03io_uring: fix !CONFIG_BLOCK compilation failureJens Axboe1-0/+5
kernel test robot correctly pinpoints a compilation failure if CONFIG_BLOCK isn't set: fs/io_uring.c: In function '__io_complete_rw': >> fs/io_uring.c:2509:48: error: implicit declaration of function 'io_rw_should_reissue'; did you mean 'io_rw_reissue'? [-Werror=implicit-function-declaration] 2509 | if ((res == -EAGAIN || res == -EOPNOTSUPP) && io_rw_should_reissue(req)) { | ^~~~~~~~~~~~~~~~~~~~ | io_rw_reissue cc1: some warnings being treated as errors Ensure that we have a stub declaration of io_rw_should_reissue() for !CONFIG_BLOCK. Fixes: 230d50d448ac ("io_uring: move reissue into regular IO path") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-03Merge tag 'block-5.12-2021-04-02' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+4
Pull block fixes from Jens Axboe: - Remove comment that never came to fruition in 22 years of development (Christoph) - Remove unused request flag (Christoph) - Fix for null_blk fake timeout handling (Damien) - Fix for IOCB_NOWAIT being ignored for O_DIRECT on raw bdevs (Pavel) - Error propagation fix for multiple split bios (Yufen) * tag 'block-5.12-2021-04-02' of git://git.kernel.dk/linux-block: block: remove the unused RQF_ALLOCED flag block: update a few comments in uapi/linux/blkpg.h block: don't ignore REQ_NOWAIT for direct IO null_blk: fix command timeout completion handling block: only update parent bi_status when bio fail
2021-04-03Merge tag 'io_uring-5.12-2021-04-02' of git://git.kernel.dk/linux-blockLinus Torvalds2-22/+32
Pull io_uring fixes from Jens Axboe: "Nothing really major in here, and finally nothing really related to signals. A few minor fixups related to the threading changes, and some general fixes, that's it. There's the pending gdb-get-confused-about-arch, but that's more of a cosmetic issue, nothing that hinder use of it. And given that other archs will likely be affected by that oddity too, better to postpone any changes there until 5.13 imho" * tag 'io_uring-5.12-2021-04-02' of git://git.kernel.dk/linux-block: io_uring: move reissue into regular IO path io_uring: fix EIOCBQUEUED iter revert io_uring/io-wq: protect against sprintf overflow io_uring: don't mark S_ISBLK async work as unbounded io_uring: drop sqd lock before handling signals for SQPOLL io_uring: handle setup-failed ctx in kill_timeouts io_uring: always go for cancellation spin on exec
2021-04-02io_uring: move reissue into regular IO pathJens Axboe1-4/+13
It's non-obvious how retry is done for block backed files, when it happens off the kiocb done path. It also makes it tricky to deal with the iov_iter handling. Just mark the req as needing a reissue, and handling it from the submission path instead. This makes it directly obvious that we're not re-importing the iovec from userspace past the submit point, and it means that we can just reuse our usual -EAGAIN retry path from the read/write handling. At some point in the future, we'll gain the ability to always reliably return -EAGAIN through the stack. A previous attempt on the block side didn't pan out and got reverted, hence the need to check for this information out-of-band right now. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-02block: don't ignore REQ_NOWAIT for direct IOPavel Begunkov1-0/+4
If IOCB_NOWAIT is set on submission, then that needs to get propagated to REQ_NOWAIT on the block side. Otherwise we completely lose this information, and any issuer of IOCB_NOWAIT IO will potentially end up blocking on eg request allocation on the storage side. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-01io_uring: fix EIOCBQUEUED iter revertPavel Begunkov1-4/+0
iov_iter_revert() is done in completion handlers that happensf before read/write returns -EIOCBQUEUED, no need to repeat reverting afterwards. Moreover, even though it may appear being just a no-op, it's actually races with 1) user forging a new iovec of a different size 2) reissue, that is done via io-wq continues completely asynchronously. Fixes: 3e6a0d3c7571c ("io_uring: fix -EAGAIN retry with IOPOLL") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-01io_uring/io-wq: protect against sprintf overflowPavel Begunkov2-3/+3
task_pid may be large enough to not fit into the left space of TASK_COMM_LEN-sized buffers and overflow in sprintf. We not so care about uniqueness, so replace it with safer snprintf(). Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1702c6145d7e1c46fbc382f28334c02e1a3d3994.1617267273.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-01io_uring: don't mark S_ISBLK async work as unboundedJens Axboe1-1/+1
S_ISBLK is marked as unbounded work for async preparation, because it doesn't match S_ISREG. That is incorrect, as any read/write to a block device is also a bounded operation. Fix it up and ensure that S_ISBLK isn't marked unbounded. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-31reiserfs: update reiserfs_xattrs_initialized() conditionTetsuo Handa1-1/+1
syzbot is reporting NULL pointer dereference at reiserfs_security_init() [1], for commit ab17c4f02156c4f7 ("reiserfs: fixup xattr_root caching") is assuming that REISERFS_SB(s)->xattr_root != NULL in reiserfs_xattr_jcreate_nblocks() despite that commit made REISERFS_SB(sb)->priv_root != NULL && REISERFS_SB(s)->xattr_root == NULL case possible. I guess that commit 6cb4aff0a77cc0e6 ("reiserfs: fix oops while creating privroot with selinux enabled") wanted to check xattr_root != NULL before reiserfs_xattr_jcreate_nblocks(), for the changelog is talking about the xattr root. The issue is that while creating the privroot during mount reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which dereferences the xattr root. The xattr root doesn't exist, so we get an oops. Therefore, update reiserfs_xattrs_initialized() to check both the privroot and the xattr root. Link: https://syzkaller.appspot.com/bug?id=8abaedbdeb32c861dc5340544284167dd0e46cde # [1] Reported-and-tested-by: syzbot <syzbot+690cb1e51970435f9775@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 6cb4aff0a77c ("reiserfs: fix oops while creating privroot with selinux enabled") Acked-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Jan Kara <jack@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-30io_uring: drop sqd lock before handling signals for SQPOLLJens Axboe1-8/+11
Don't call into get_signal() with the sqd mutex held, it'll fail if we're freezing the task and we'll get complaints on locks still being held: ==================================== WARNING: iou-sqp-8386/8387 still has locks held! 5.12.0-rc4-syzkaller #0 Not tainted ------------------------------------ 1 lock held by iou-sqp-8386/8387: #0: ffff88801e1d2470 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread+0x24c/0x13a0 fs/io_uring.c:6731 stack backtrace: CPU: 1 PID: 8387 Comm: iou-sqp-8386 Not tainted 5.12.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 try_to_freeze include/linux/freezer.h:66 [inline] get_signal+0x171a/0x2150 kernel/signal.c:2576 io_sq_thread+0x8d2/0x13a0 fs/io_uring.c:6748 Fold the get_signal() case in with the parking checks, as we need to drop the lock in both cases, and since we need to be checking for parking when juggling the lock anyway. Reported-by: syzbot+796d767eb376810256f5@syzkaller.appspotmail.com Fixes: dbe1bdbb39db ("io_uring: handle signals for IO threads like a normal thread") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-29io_uring: handle setup-failed ctx in kill_timeoutsPavel Begunkov1-2/+2
general protection fault, probably for non-canonical address 0xdffffc0000000018: 0000 [#1] KASAN: null-ptr-deref in range [0x00000000000000c0-0x00000000000000c7] RIP: 0010:io_commit_cqring+0x37f/0xc10 fs/io_uring.c:1318 Call Trace: io_kill_timeouts+0x2b5/0x320 fs/io_uring.c:8606 io_ring_ctx_wait_and_kill+0x1da/0x400 fs/io_uring.c:8629 io_uring_create fs/io_uring.c:9572 [inline] io_uring_setup+0x10da/0x2ae0 fs/io_uring.c:9599 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae It can get into wait_and_kill() before setting up ctx->rings, and hence io_commit_cqring() fails. Mimic poll cancel and do it only when we completed events, there can't be any requests if it failed before initialising rings. Fixes: 80c4cbdb5ee60 ("io_uring: do post-completion chore on t-out cancel") Reported-by: syzbot+0e905eb8228070c457a0@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/660261a48f0e7abf260c8e43c87edab3c16736fa.1617014345.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-29io_uring: always go for cancellation spin on execPavel Begunkov1-0/+2
Always try to do cancellation in __io_uring_task_cancel() at least once, so it actually goes and cleans its sqpoll tasks (i.e. via io_sqpoll_cancel_sync()), otherwise sqpoll task may submit new requests after cancellation and it's racy for many reasons. Fixes: 521d6a737a31c ("io_uring: cancel sqpoll via task_work") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0a21bd6d794bb1629bc906dd57a57b2c2985a8ac.1616839147.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-28Merge tag '5.12-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds8-22/+60
Pull cifs fixes from Steve French: "Five cifs/smb3 fixes, two for stable. Includes an important fix for encryption and an ACL fix, as well as a fix for possible reflink data corruption" * tag '5.12-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6: smb3: fix cached file size problems in duplicate extents (reflink) cifs: Silently ignore unknown oplock break handle cifs: revalidate mapping when we open files for SMB1 POSIX cifs: Fix chmod with modefromsid when an older ACE already exists. cifs: Adjust key sizes and key generation routines for AES256 encryption
2021-03-28Merge tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-blockLinus Torvalds2-58/+72
Pull io_uring fixes from Jens Axboe: - Use thread info versions of flag testing, as discussed last week. - The series enabling PF_IO_WORKER to just take signals, instead of needing to special case that they do not in a bunch of places. Ends up being pretty trivial to do, and then we can revert all the special casing we're currently doing. - Kill dead pointer assignment - Fix hashed part of async work queue trace - Fix sign extension issue for IORING_OP_PROVIDE_BUFFERS - Fix a link completion ordering regression in this merge window - Cancellation fixes * tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block: io_uring: remove unsued assignment to pointer io io_uring: don't cancel extra on files match io_uring: don't cancel-track common timeouts io_uring: do post-completion chore on t-out cancel io_uring: fix timeout cancel return code Revert "signal: don't allow STOP on PF_IO_WORKER threads" Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing" Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals" Revert "signal: don't allow sending any signals to PF_IO_WORKER threads" kernel: stop masking signals in create_io_thread() io_uring: handle signals for IO threads like a normal thread kernel: don't call do_exit() for PF_IO_WORKER threads io_uring: maintain CQE order of a failed link io-wq: fix race around pending work on teardown io_uring: do ctx sqd ejection in a clear context io_uring: fix provide_buffers sign extension io_uring: don't skip file_end_write() on reissue io_uring: correct io_queue_async_work() traces io_uring: don't use {test,clear}_tsk_thread_flag() for current
2021-03-28Merge tag 'block-5.12-2021-03-27' of git://git.kernel.dk/linux-blockLinus Torvalds1-2/+2
Pull block fixes from Jens Axboe: - Fix regression from this merge window with the xarray partition change, which allowed partition counts that overflow the u8 that holds the partition number (Ming) - Fix zone append warning (Johannes) - Segmentation count fix for multipage bvecs (David) - Partition scan fix (Chris) * tag 'block-5.12-2021-03-27' of git://git.kernel.dk/linux-block: block: don't create too many partitions block: support zone append bvecs block: recalculate segment count for multi-segment discards correctly block: clear GD_NEED_PART_SCAN later in bdev_disk_changed
2021-03-27io_uring: remove unsued assignment to pointer ioColin Ian King1-1/+0
There is an assignment to io that is never read after the assignment, the assignment is redundant and can be removed. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27io_uring: don't cancel extra on files matchPavel Begunkov1-2/+0
As tasks always wait and kill their io-wq on exec/exit, files are of no more concern to us, so we don't need to specifically cancel them by hand in those cases. Moreover we should not, because io_match_task() looks at req->task->files now, which is always true and so leads to extra cancellations, that wasn't a case before per-task io-wq. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0566c1de9b9dd417f5de345c817ca953580e0e2e.1616696997.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27io_uring: don't cancel-track common timeoutsPavel Begunkov1-1/+2
Don't account usual timeouts (i.e. not linked) as REQ_F_INFLIGHT but keep behaviour prior to dd59a3d595cc1 ("io_uring: reliably cancel linked timeouts"). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/104441ef5d97e3932113d44501fda0df88656b83.1616696997.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27io_uring: do post-completion chore on t-out cancelPavel Begunkov1-20/+22
Don't forget about io_commit_cqring() + io_cqring_ev_posted() after exit/exec cancelling timeouts. Both functions declared only after io_kill_timeouts(), so to avoid tons of forward declarations move it down. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/72ace588772c0f14834a6a4185d56c445a366fb4.1616696997.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27io_uring: fix timeout cancel return codePavel Begunkov1-4/+4
When we cancel a timeout we should emit a sensible return code, like -ECANCELED but not 0, otherwise it may trick users. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7b0ad1065e3bd1994722702bd0ba9e7bc9b0683b.1616696997.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27io_uring: handle signals for IO threads like a normal threadJens Axboe2-9/+20
We go through various hoops to disallow signals for the IO threads, but there's really no reason why we cannot just allow them. The IO threads never return to userspace like a normal thread, and hence don't go through normal signal processing. Instead, just check for a pending signal as part of the work loop, and call get_signal() to handle it for us if anything is pending. With that, we can support receiving signals, including special ones like SIGSTOP. Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27smb3: fix cached file size problems in duplicate extents (reflink)Steve French1-3/+15
There were two problems (one of which could cause data corruption) that were noticed with duplicate extents (ie reflink) when debugging why various xfstests were being incorrectly skipped (e.g. generic/138, generic/140, generic/142). First, we were not updating the file size locally in the cache when extending a file due to reflink (it would refresh after actimeo expires) but xfstest was checking the size immediately which was still 0 so caused the test to be skipped. Second, we were setting the target file size (which could shrink the file) in all cases to the end of the reflinked range rather than only setting the target file size when reflink would extend the file. CC: <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-27cifs: Silently ignore unknown oplock break handleVincent Whitchurch1-2/+2
Make SMB2 not print out an error when an oplock break is received for an unknown handle, similar to SMB1. The debug message which is printed for these unknown handles may also be misleading, so fix that too. The SMB2 lease break path is not affected by this patch. Without this, a program which writes to a file from one thread, and opens, reads, and writes the same file from another thread triggers the below errors several times a minute when run against a Samba server configured with "smb2 leases = no". CIFS: VFS: \\192.168.0.1 No task to wake, unknown frame received! NumMids 2 00000000: 424d53fe 00000040 00000000 00000012 .SMB@........... 00000010: 00000001 00000000 ffffffff ffffffff ................ 00000020: 00000000 00000000 00000000 00000000 ................ 00000030: 00000000 00000000 00000000 00000000 ................ Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Reviewed-by: Tom Talpey <tom@talpey.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-27cifs: revalidate mapping when we open files for SMB1 POSIXRonnie Sahlberg1-0/+1
RHBZ: 1933527 Under SMB1 + POSIX, if an inode is reused on a server after we have read and cached a part of a file, when we then open the new file with the re-cycled inode there is a chance that we may serve the old data out of cache to the application. This only happens for SMB1 (deprecated) and when posix are used. The simplest solution to avoid this race is to force a revalidate on smb1-posix open. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-27cifs: Fix chmod with modefromsid when an older ACE already exists.Shyam Prasad N1-2/+1
My recent fixes to cifsacl to maintain inherited ACEs had regressed modefromsid when an older ACL already exists. Found testing xfstest 495 with modefromsid mount option Fixes: f5065508897a ("cifs: Retain old ACEs when converting between mode bits and ACL") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-26cifs: Adjust key sizes and key generation routines for AES256 encryptionShyam Prasad N5-15/+41
For AES256 encryption (GCM and CCM), we need to adjust the size of a few fields to 32 bytes instead of 16 to accommodate the larger keys. Also, the L value supplied to the key generator needs to be changed from to 256 when these algorithms are used. Keeping the ioctl struct for dumping keys of the same size for now. Will send out a different patch for that one. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: <stable@vger.kernel.org> # v5.10+ Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-26Merge tag 'for-5.12-rc4-tag' of ↵Linus Torvalds6-17/+48
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Fixes for issues that have some user visibility and are simple enough for this time of development cycle: - a few fixes for rescue= mount option, adding more checks for missing trees - fix sleeping in atomic context on qgroup deletion - fix subvolume deletion on mount - fix build with M= syntax - fix checksum mismatch error message for direct io" * tag 'for-5.12-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix check_data_csum() error message for direct I/O btrfs: fix sleep while in non-sleep context during qgroup removal btrfs: fix subvolume/snapshot deletion not triggered on mount btrfs: fix build when using M=fs/btrfs btrfs: do not initialize dev replace for bad dev root btrfs: initialize device::fs_info always btrfs: do not initialize dev stats if we have no dev_root btrfs: zoned: remove outdated WARN_ON in direct IO
2021-03-25io_uring: maintain CQE order of a failed linkPavel Begunkov1-2/+2
Arguably we want CQEs of linked requests be in a strict order of submission as it always was. Now if init of a request fails its CQE may be posted before all prior linked requests including the head of the link. Fix it by failing it last. Fixes: de59bc104c24f ("io_uring: fail links more in io_submit_sqe()") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b7a96b05832e7ab23ad55f84092a2548c4a888b0.1616699075.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-25Merge branch 'akpm' (patches from Andrew)Linus Torvalds4-6/+15
Merge misc fixes from Andrew Morton: "14 patches. Subsystems affected by this patch series: mm (hugetlb, kasan, gup, selftests, z3fold, kfence, memblock, and highmem), squashfs, ia64, gcov, and mailmap" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mailmap: update Andrey Konovalov's email address mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP mm: memblock: fix section mismatch warning again kfence: make compatible with kmemleak gcov: fix clang-11+ support ia64: fix format strings for err_inject ia64: mca: allocate early mca with GFP_ATOMIC squashfs: fix xattr id and id lookup sanity checks squashfs: fix inode lookup sanity checks z3fold: prevent reclaim/free race for headless pages selftests/vm: fix out-of-tree build mm/mmu_notifiers: ensure range_end() is paired with range_start() kasan: fix per-page tags for non-page_alloc pages hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
2021-03-25gfs2: report "already frozen/thawed" errorsBob Peterson1-4/+6
Before this patch, gfs2's freeze function failed to report an error when the target file system was already frozen as it should (and as generic vfs function freeze_super does. Similarly, gfs2's thaw function failed to report an error when trying to thaw a file system that is not frozen, as vfs function thaw_super does. The errors were checked, but it always returned a 0 return code. This patch adds the missing error return codes to gfs2 freeze and thaw. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-03-25squashfs: fix xattr id and id lookup sanity checksPhillip Lougher2-4/+8
The checks for maximum metadata block size is missing SQUASHFS_BLOCK_OFFSET (the two byte length count). Link: https://lkml.kernel.org/r/2069685113.2081245.1614583677427@webmail.123-reg.co.uk Fixes: f37aa4c7366e23f ("squashfs: add more sanity checks in id lookup") Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Cc: Sean Nyekjaer <sean@geanix.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25squashfs: fix inode lookup sanity checksSean Nyekjaer2-2/+7
When mouting a squashfs image created without inode compression it fails with: "unable to read inode lookup table" It turns out that the BLOCK_OFFSET is missing when checking the SQUASHFS_METADATA_SIZE agaist the actual size. Link: https://lkml.kernel.org/r/20210226092903.1473545-1-sean@geanix.com Fixes: eabac19e40c0 ("squashfs: add more sanity checks in inode lookup") Signed-off-by: Sean Nyekjaer <sean@geanix.com> Acked-by: Phillip Lougher <phillip@squashfs.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-25io-wq: fix race around pending work on teardownJens Axboe1-1/+5
syzbot reports that it's triggering the warning condition on having pending work on shutdown: WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_destroy fs/io-wq.c:1061 [inline] WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_put+0x153/0x260 fs/io-wq.c:1072 Modules linked in: CPU: 1 PID: 12346 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:io_wq_destroy fs/io-wq.c:1061 [inline] RIP: 0010:io_wq_put+0x153/0x260 fs/io-wq.c:1072 Code: 8d e8 71 90 ea 01 49 89 c4 41 83 fc 40 7d 4f e8 33 4d 97 ff 42 80 7c 2d 00 00 0f 85 77 ff ff ff e9 7a ff ff ff e8 1d 4d 97 ff <0f> 0b eb b9 8d 6b ff 89 ee 09 de bf ff ff ff ff e8 18 51 97 ff 09 RSP: 0018:ffffc90001ebfb08 EFLAGS: 00010293 RAX: ffffffff81e16083 RBX: ffff888019038040 RCX: ffff88801e86b780 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000040 RBP: 1ffff1100b2f8a80 R08: ffffffff81e15fce R09: ffffed100b2f8a82 R10: ffffed100b2f8a82 R11: 0000000000000000 R12: 0000000000000000 R13: dffffc0000000000 R14: ffff8880597c5400 R15: ffff888019038000 FS: 00007f8dcd89c700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055e9a054e160 CR3: 000000001dfb8000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: io_uring_clean_tctx+0x1b7/0x210 fs/io_uring.c:8802 __io_uring_files_cancel+0x13c/0x170 fs/io_uring.c:8820 io_uring_files_cancel include/linux/io_uring.h:47 [inline] do_exit+0x258/0x2340 kernel/exit.c:780 do_group_exit+0x168/0x2d0 kernel/exit.c:922 get_signal+0x1734/0x1ef0 kernel/signal.c:2773 arch_do_signal_or_restart+0x3c/0x610 arch/x86/kernel/signal.c:811 handle_signal_work kernel/entry/common.c:147 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:208 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline] syscall_exit_to_user_mode+0x48/0x180 kernel/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x465f69 which shouldn't happen, but seems to be possible due to a race on whether or not the io-wq manager sees a fatal signal first, or whether the io-wq workers do. If we race with queueing work and then send a fatal signal to the owning task, and the io-wq worker sees that before the manager sets IO_WQ_BIT_EXIT, then it's possible to have the worker exit and leave work behind. Just turn the WARN_ON_ONCE() into a cancelation condition instead. Reported-by: syzbot+77a738a6bc947bf639ca@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-24Merge tag 'afs-cachefiles-fixes-20210323' of ↵Linus Torvalds2-6/+4
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull cachefiles and afs fixes from David Howells: "Fixes from Matthew Wilcox for page waiting-related issues in cachefiles and afs as extracted from his folio series[1]: - In cachefiles, remove the use of the wait_bit_key struct to access something that's actually in wait_page_key format. The proper struct is now available in the header, so that should be used instead. - Add a proper wait function for waiting killably on the page writeback flag. This includes a recent bugfix[2] that's not in the afs code. - In afs, use the function added in (2) rather than using wait_on_page_bit_killable() which doesn't provide the aforementioned bugfix" Link: https://lore.kernel.org/r/20210320054104.1300774-1-willy@infradead.org[1] Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7 [2] Link: https://lore.kernel.org/r/20210323120829.GC1719932@casper.infradead.org/ # v1 * tag 'afs-cachefiles-fixes-20210323' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Use wait_on_page_writeback_killable mm/writeback: Add wait_on_page_writeback_killable fs/cachefiles: Remove wait_bit_key layout dependency
2021-03-24cachefiles: do not yet allow on idmapped mountsChristian Brauner1-0/+6
Based on discussions (e.g. in [1]) my understanding of cachefiles and the cachefiles userspace daemon is that it creates a cache on a local filesystem (e.g. ext4, xfs etc.) for a network filesystem. The way this is done is by writing "bind" to /dev/cachefiles and pointing it to the directory to use as the cache. Currently this directory can technically also be an idmapped mount but cachefiles aren't yet fully aware of such mounts and thus don't take the idmapping into account when creating cache entries. This could leave users confused as the ownership of the files wouldn't match to what they expressed in the idmapping. Block cache files on idmapped mounts until the fscache rework is done and we have ported it to support idmapped mounts. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/lkml/20210303161528.n3jzg66ou2wa43qb@wittgenstein [1] Link: https://lore.kernel.org/r/20210316112257.2974212-1-christian.brauner@ubuntu.com/ # v1 Link: https://listman.redhat.com/archives/linux-cachefs/2021-March/msg00044.html # v2 Link: https://lore.kernel.org/r/20210319114146.410329-1-christian.brauner@ubuntu.com/ # v3 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-24io_uring: do ctx sqd ejection in a clear contextPavel Begunkov1-8/+8
WARNING: CPU: 1 PID: 27907 at fs/io_uring.c:7147 io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147 CPU: 1 PID: 27907 Comm: iou-sqp-27905 Not tainted 5.12.0-rc4-syzkaller #0 RIP: 0010:io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147 Call Trace: io_ring_ctx_wait_and_kill+0x214/0x700 fs/io_uring.c:8619 io_uring_release+0x3e/0x50 fs/io_uring.c:8646 __fput+0x288/0x920 fs/file_table.c:280 task_work_run+0xdd/0x1a0 kernel/task_work.c:140 io_run_task_work fs/io_uring.c:2238 [inline] io_run_task_work fs/io_uring.c:2228 [inline] io_uring_try_cancel_requests+0x8ec/0xc60 fs/io_uring.c:8770 io_uring_cancel_sqpoll+0x1cf/0x290 fs/io_uring.c:8974 io_sqpoll_cancel_cb+0x87/0xb0 fs/io_uring.c:8907 io_run_task_work_head+0x58/0xb0 fs/io_uring.c:1961 io_sq_thread+0x3e2/0x18d0 fs/io_uring.c:6763 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 May happen that last ctx ref is killed in io_uring_cancel_sqpoll(), so fput callback (i.e. io_uring_release()) is enqueued through task_work, and run by same cancellation. As it's deeply nested we can't do parking or taking sqd->lock there, because its state is unclear. So avoid ctx ejection from sqd list from io_ring_ctx_wait_and_kill() and do it in a clear context in io_ring_exit_work(). Fixes: f6d54255f423 ("io_uring: halt SQO submission on ctx exit") Reported-by: syzbot+e3a3f84f5cecf61f0583@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e90df88b8ff2cabb14a7534601d35d62ab4cb8c7.1616496707.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-23afs: Use wait_on_page_writeback_killableMatthew Wilcox (Oracle)1-2/+1
Open-coding this function meant it missed out on the recent bugfix for waiters being woken by a delayed wake event from a previous instantiation of the page[1]. [DH: Changed the patch to use vmf->page rather than variable page which doesn't exist yet upstream] Fixes: 1cf7a1518aef ("afs: Implement shared-writeable mmap") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: kafs-testing@auristor.com cc: linux-afs@lists.infradead.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20210320054104.1300774-4-willy@infradead.org Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7 [1]
2021-03-23fs/cachefiles: Remove wait_bit_key layout dependencyMatthew Wilcox (Oracle)1-4/+3
Cachefiles was relying on wait_page_key and wait_bit_key being the same layout, which is fragile. Now that wait_page_key is exposed in the pagemap.h header, we can remove that fragility A comment on the need to maintain structure layout equivalence was added by Linus[1] and that is no longer applicable. Fixes: 62906027091f ("mm: add PageWaiters indicating tasks are waiting for a page bit") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: kafs-testing@auristor.com cc: linux-cachefs@redhat.com cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20210320054104.1300774-2-willy@infradead.org/ Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3510ca20ece0150af6b10c77a74ff1b5c198e3e2 [1]
2021-03-23block: clear GD_NEED_PART_SCAN later in bdev_disk_changedChris Chiu1-2/+2
The GD_NEED_PART_SCAN is set by bdev_check_media_change to initiate a partition scan while removing a block device. It should be cleared after blk_drop_paritions because blk_drop_paritions could return -EBUSY and then the consequence __blkdev_get has no chance to do delete_partition if GD_NEED_PART_SCAN already cleared. It causes some problems on some card readers. Ex. Realtek card reader 0bda:0328 and 0bda:0158. The device node of the partition will not disappear after the memory card removed. Thus the user applications can not update the device mapping correctly. BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1920874 Signed-off-by: Chris Chiu <chris.chiu@canonical.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210323085219.24428-1-chris.chiu@canonical.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-22io_uring: fix provide_buffers sign extensionPavel Begunkov1-1/+3
io_provide_buffers_prep()'s "p->len * p->nbufs" to sign extension problems. Not a huge problem as it's only used for access_ok() and increases the checked length, but better to keep typing right. Reported-by: Colin Ian King <colin.king@canonical.com> Fixes: efe68c1ca8f49 ("io_uring: validate the full range of provided buffers for access") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/562376a39509e260d8532186a06226e56eb1f594.1616149233.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-22io_uring: don't skip file_end_write() on reissuePavel Begunkov1-3/+2
Don't miss to call kiocb_end_write() from __io_complete_rw() on reissue. Shouldn't be much of a problem as the function actually does some work only for ISREG, and NONBLOCK won't be reissued. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/32af9b77c5b874e1bee1a3c46396094bd969e577.1616366969.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-22io_uring: correct io_queue_async_work() tracesPavel Begunkov1-2/+2
Request's io-wq work is hashed in io_prep_async_link(), so as trace_io_uring_queue_async_work() looks at it should follow after prep has been done. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/709c9f872f4d2e198c7aed9c49019ca7095dd24d.1616366969.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-22Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds11-72/+168
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Miscellaneous ext4 bug fixes for v5.12" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: initialize ret to suppress smatch warning ext4: stop inode update before return ext4: fix rename whiteout with fast commit ext4: fix timer use-after-free on failed mount ext4: fix potential error in ext4_do_update_inode ext4: do not try to set xattr into ea_inode if value is empty ext4: do not iput inode under running transaction in ext4_rename() ext4: find old entry again if failed to rename whiteout ext4: fix error handling in ext4_end_enable_verity() ext4: fix bh ref count on error paths fs/ext4: fix integer overflow in s_log_groups_per_flex ext4: add reclaim checks to xattr code ext4: shrink race window in ext4_should_retry_alloc()
2021-03-21io_uring: don't use {test,clear}_tsk_thread_flag() for currentJens Axboe2-5/+3
Linus correctly points out that this is both unnecessary and generates much worse code on some archs as going from current to thread_info is actually backwards - and obviously just wasteful, since the thread_info is what we care about. Since io_uring only operates on current for these operations, just use test_thread_flag() instead. For io-wq, we can further simplify and use tracehook_notify_signal() to handle the TIF_NOTIFY_SIGNAL work and clear the flag. The latter isn't an actual bug right now, but it may very well be in the future if we place other work items under TIF_NOTIFY_SIGNAL. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/io-uring/CAHk-=wgYhNck33YHKZ14mFB5MzTTk8gqXHcfj=RWTAXKwgQJgg@mail.gmail.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-21Merge tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-blockLinus Torvalds2-6/+26
Pull io_uring followup fixes from Jens Axboe: - The SIGSTOP change from Eric, so we properly ignore that for PF_IO_WORKER threads. - Disallow sending signals to PF_IO_WORKER threads in general, we're not interested in having them funnel back to the io_uring owning task. - Stable fix from Stefan, ensuring we properly break links for short send/sendmsg recv/recvmsg if MSG_WAITALL is set. - Catch and loop when needing to run task_work before a PF_IO_WORKER threads goes to sleep. * tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block: io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL io-wq: ensure task is running before processing task_work signal: don't allow STOP on PF_IO_WORKER threads signal: don't allow sending any signals to PF_IO_WORKER threads
2021-03-21Merge tag 'x86_urgent_for_v5.12-rc4' of ↵Linus Torvalds1-6/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "The freshest pile of shiny x86 fixes for 5.12: - Add the arch-specific mapping between physical and logical CPUs to fix devicetree-node lookups - Restore the IRQ2 ignore logic - Fix get_nr_restart_syscall() to return the correct restart syscall number. Split in a 4-patches set to avoid kABI breakage when backporting to dead kernels" * tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic/of: Fix CPU devicetree-node lookups x86/ioapic: Ignore IRQ2 again x86: Introduce restart_block->arch_data to remove TS_COMPAT_RESTART x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall() x86: Move TS_COMPAT back to asm/thread_info.h kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()
2021-03-21io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with ↵Stefan Metzmacher1-4/+20
MSG_WAITALL Without that it's not safe to use them in a linked combination with others. Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE should be possible. We already handle short reads and writes for the following opcodes: - IORING_OP_READV - IORING_OP_READ_FIXED - IORING_OP_READ - IORING_OP_WRITEV - IORING_OP_WRITE_FIXED - IORING_OP_WRITE - IORING_OP_SPLICE - IORING_OP_TEE Now we have it for these as well: - IORING_OP_SENDMSG - IORING_OP_SEND - IORING_OP_RECVMSG - IORING_OP_RECV For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC flags in order to call req_set_fail_links(). There might be applications arround depending on the behavior that even short send[msg]()/recv[msg]() retuns continue an IOSQE_IO_LINK chain. It's very unlikely that such applications pass in MSG_WAITALL, which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'. It's expected that the low level sock_sendmsg() call just ignores MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set SO_ZEROCOPY. We also expect the caller to know about the implicit truncation to MAX_RW_COUNT, which we don't detect. cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/c4e1a4cc0d905314f4d5dc567e65a7b09621aab3.1615908477.git.metze@samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-21io-wq: ensure task is running before processing task_workJens Axboe1-2/+6
Mark the current task as running if we need to run task_work from the io-wq threads as part of work handling. If that is the case, then return as such so that the caller can appropriately loop back and reset if it was part of a going-to-sleep flush. Fixes: 3bfe6106693b ("io-wq: fork worker threads from original task") Signed-off-by: Jens Axboe <axboe@kernel.dk>