path: root/io_uring
Age | Commit message | Author | Files | Lines
2023-01-30io_uring: refactor io_put_task helpersPavel Begunkov1-5/+12
Add a helper for putting refs from the target task context, rename __io_put_task() and add a couple of comments around it. Use the remote version for __io_req_complete_post(); the local one is only needed for __io_submit_flush_completions(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3bf92ebd594769d8a5d648472a8e335f2031d542.1674484266.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: refactor req allocationPavel Begunkov3-14/+15
Follow the io_get_sqe pattern returning the result via a pointer and hide request cache refill inside io_alloc_req(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8c37c2e8a3cb5e4cd6a8ae3b91371227a92708a6.1674484266.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: improve io_get_sqePavel Begunkov1-5/+5
Return the SQE from io_get_sqe() via a pointer parameter and use the return value to indicate whether it failed or not. This enables the compiler to compile out the sqe NULL check when we know the returned SQE is valid. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9cceb11329240ea097dffef6bf0a675bca14cf42.1674484266.git.asml.silence@gmail.com [axboe: remove bogus const modifier on return value] Signed-off-by: Jens Axboe <axboe@kernel.dk>
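A minimal sketch of the pattern, assuming the usual cached SQ head/mask bookkeeping; the field names here are illustrative rather than the exact kernel code:

  static inline bool io_get_sqe(struct io_ring_ctx *ctx,
                                const struct io_uring_sqe **sqe)
  {
          if (unlikely(ctx->cached_sq_head == ctx->cached_sq_tail))
                  return false;
          *sqe = &ctx->sq_sqes[ctx->cached_sq_head++ & ctx->sq_mask];
          return true;
  }

  /* At the call site, on the true branch the compiler can see that *sqe
   * was assigned a non-NULL value, so a separate NULL check compiles out. */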
2023-01-30io_uring: kill outdated comment about overflow flushPavel Begunkov1-1/+0
__io_cqring_overflow_flush() doesn't return anything anymore, so remove the outdated comment. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4ce2bcbb17eac80cdf883fd1459d5ee6586e238c.1674484266.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: use user visible tail in io_uring_poll()Pavel Begunkov1-1/+1
We return POLLIN from io_uring_poll() depending on whether there are CQEs for the userspace, and so we should use the user visible tail pointer instead of a transient cached value. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/228ffcbf30ba98856f66ffdb9a6a60ead1dd96c0.1674484266.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: pass in io_issue_def to io_assign_file()Jens Axboe1-4/+5
This generates better code for me, avoiding an extra load on arm64, and both call sites already have this variable available for easy passing. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: Enable KASAN for request cacheBreno Leitao2-4/+10
Every io_uring request is represented by struct io_kiocb, which io_uring caches locally (not in SLAB/SLUB) in the list called submit_state.freelist. This patch enables KASAN for that free list. The list is initially created by KMEM_CACHE, but is later managed by io_uring. The patch poisons the objects that are not in use (i.e., those sitting on the free list), and unpoisons an object when it is allocated/removed from the list. Touching these poisoned objects while they are on the freelist will cause a KASAN warning. Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
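A rough sketch of the poison/unpoison idea, assuming an intrusive stack-style free list; apart from the kasan_*_object_data() helpers (which do exist for slab caches), the helper and field names are illustrative:

  static void io_req_cache_put(struct io_ring_ctx *ctx, struct io_kiocb *req)
  {
          wq_stack_add_head(&req->comp_list, &ctx->submit_state.free_list);
          /* object is now free: any access should trip KASAN */
          kasan_poison_object_data(req_cachep, req);
  }

  static struct io_kiocb *io_req_cache_get(struct io_ring_ctx *ctx)
  {
          struct io_wq_work_node *node = ctx->submit_state.free_list.next;
          struct io_kiocb *req = container_of(node, struct io_kiocb, comp_list);

          /* unpoison before reading node->next, which lives inside the object */
          kasan_unpoison_object_data(req_cachep, req);
          ctx->submit_state.free_list.next = node->next;
          return req;
  }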
2023-01-30io_uring: handle TIF_NOTIFY_RESUME when checking for task_workJens Axboe1-0/+8
If TIF_NOTIFY_RESUME is set, then we need to call resume_user_mode_work() for PF_IO_WORKER threads. They never return to usermode, hence never get a chance to process any items that are marked by this flag. Most notably this includes the final put of files, but also any throttling markers set by block cgroups. Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring/msg-ring: ensure flags passing works for task_work completionsJens Axboe1-1/+6
If the target ring is using IORING_SETUP_SINGLE_ISSUER and we're posting a message from a different thread, then we need to ensure that the fallback task_work that posts the CQE knows about the flags passing as well. If not, we'll always post 0 as the flags. Fixes: 3563d7ed58a5 ("io_uring/msg_ring: Pass custom flags to the cqe") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: Split io_issue_def structBreno Leitao4-115/+238
This patch removes some "cold" fields from `struct io_issue_def`. The plan is to keep only highly used fields in `struct io_issue_def`, so that it stays hot in the cache. The hot fields are basically all the bitfields and the .issue and .prep callback functions. The less frequently used fields now live in a secondary, cold struct called `io_cold_def`. Struct sizes before and after: Before: io_issue_def = 56 bytes. After: io_issue_def = 24 bytes; io_cold_def = 40 bytes. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20230112144411.2624698-2-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
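Schematically, the split has this shape; the exact field lists are abridged and partly guessed:

  struct io_issue_def {
          /* hot: consulted on every prep/issue */
          unsigned needs_file : 1;
          unsigned pollin : 1;
          unsigned pollout : 1;
          /* ... more bitfields ... */
          int (*issue)(struct io_kiocb *, unsigned int);
          int (*prep)(struct io_kiocb *, const struct io_uring_sqe *);
  };

  struct io_cold_def {
          /* cold: async setup, cleanup and error paths only */
          unsigned short async_size;
          const char *name;
          int (*prep_async)(struct io_kiocb *);
          void (*cleanup)(struct io_kiocb *);
          void (*fail)(struct io_kiocb *);
  };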
2023-01-30io_uring: Rename struct io_op_defBreno Leitao5-25/+25
The current io_op_def struct is becoming huge and the name is a bit generic. The goal of this patch is to rename this struct to `io_issue_def`. This struct will contain the hot functions associated with the issue code path. For now, this patch only renames the structure, and an upcoming patch will break up the structure in two, moving the non-issue fields to a secondary struct. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20230112144411.2624698-1-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: refactor __io_req_complete_postPavel Begunkov1-2/+2
Keep parts of __io_req_complete_post() relying on req->flags together so the value can be cached. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2b4fbb42f404a0e75c4d9f0a5b16f314a839d0a9.1673887636.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: simplify fallback executionPavel Begunkov1-8/+6
Lock the ring with uring_lock in io_fallback_req_func(), which should make it a bit safer and easier. With that we also don't need refs pinning, as io_ring_exit_work() will wait until uring_lock is released. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/56170e6a0cbfc8edee2794c6613e8f6f1d76d276.1673887636.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: don't export io_put_task()Pavel Begunkov2-11/+10
io_put_task() is only used in io_uring.c, so enclose it there together with __io_put_task(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/43c7f9227e2ab215f1a6069dadbc5382bed346fe.1673887636.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: return back links tw run optimisationPavel Begunkov1-1/+4
io_submit_flush_completions() may queue new requests for tw execution; this is especially true for linked requests. Recheck the tw list for emptiness after flushing completions. Note that this doesn't really fix the commit referenced below, but it does reinstate an optimisation that existed before that commit was merged. Fixes: f88262e60bb9 ("io_uring: lockless task list") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6328acdbb5e60efc762b18003382de077e6e1367.1673887636.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: make io_sqpoll_wait_sq return voidQuanfa Fu3-8/+5
Change the return type to void since it always returns 0, and drop the now-unnecessary check in the io_uring_enter syscall. Signed-off-by: Quanfa Fu <quanfafu@gmail.com> Link: https://lore.kernel.org/r/20230115071519.554282-1-quanfafu@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: optimise deferred tw executionPavel Begunkov1-17/+7
We needed fake nodes in __io_run_local_work() to avoid unnecessary wake ups while the task was already running task_works, but we don't need them anymore since wake ups are protected by cq_waiting, which is always cleared by the time we're executing deferred task_work items. Note that because of the loose sync around cq_waiting clearing, io_req_local_work_add() may wake the task more than once, but that's fine and should be rare enough not to hurt performance. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8839534891f0a2f1076e78554a31ea7e099f7de5.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: add io_req_local_work_add wake fast pathPavel Begunkov1-1/+6
Don't wake the master task after queueing a deferred tw unless it's currently waiting in io_cqring_wait. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/717702d772825a6647e6c315b4690277ba84c3fc.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: waitqueue-less cq waitingPavel Begunkov1-7/+12
With DEFER_TASKRUN only ctx->submitter_task might be waiting for CQEs; we can use this to optimise io_cqring_wait(). Replace the ->cq_wait waitqueue with waking the task directly. It works, but misses an important optimisation covered by the following patch, so this patch without follow ups might hurt performance. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/103d174d35d919d4cb0922d8a9c93a8f0c35f74a.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
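Schematically, the completion-side wake path becomes a direct task wake instead of a waitqueue traversal; a sketch under the assumption that only ->submitter_task can be waiting:

  /* completion side: wake the lone waiter, if any */
  if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
          wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE);
  else
          wake_up(&ctx->cq_wait);         /* legacy waitqueue path */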
2023-01-30io_uring: wake up optimisationsPavel Begunkov1-1/+20
Flushing completions is done either from the submit syscall or by task_work; both run in the context of the submitter task, and for single-threaded rings, as implied by ->task_complete, there can't be any waiters on ->cq_wait other than the master task. That means no tasks can be sleeping on cq_wait while we run __io_submit_flush_completions(), so waking up can be skipped. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/60ad9768ec74435a0ddaa6eec0ffa7729474f69f.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: add lazy poll_wq activationPavel Begunkov2-5/+62
Even though io_poll_wq_wake()'s waitqueue_active reuses a barrier we do for another waitqueue, it's not going to be the case in the future and so we want to have a fast path for it when the ring has never been polled. Move poll_wq wake ups into __io_commit_cqring_flush() using a new flag called ->poll_activated. The idea behind the flag is to set it when the ring was polled for the first time. This requires additional sync to not miss events, which is done here by using task_work for ->task_complete rings, and by default enabling the flag for all other types of rings. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/060785e8e9137a920b232c0c7f575b131af19cac.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: separate wq for ring pollingPavel Begunkov2-1/+11
Don't use ->cq_wait for ring polling; add a separate wait queue for it instead. We need it for the following patches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/dea0be0bf990503443c5c6c337fc66824af7d590.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: move io_run_local_work_lockedPavel Begunkov2-18/+17
io_run_local_work_locked() is only used in io_uring.c, move it there. With that we can also make __io_run_local_work() static. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/91757bcb33e5774e49fed6f2b6e058630608119b.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: mark io_run_local_work staticPavel Begunkov2-2/+1
io_run_local_work is enclosed in io_uring.c, we don't need to export it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b477fb81f5e77044f724a06fe245d5c078659364.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: don't set TASK_RUNNING in local tw runnerPavel Begunkov1-3/+2
The CQ waiting loop sets TASK_RUNNING before trying to execute task_work, no need to repeat it in io_run_local_work(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9d9422c429ef3f9457b4f4b8288bf4789564f33b.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: refactor io_wake_functionPavel Begunkov1-4/+2
Remove a local variable ctx in io_wake_function(), we don't need it if io_should_wake() triggers it to wake up. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e60eb1008aebe286aab7d34c772ed01c447bddb1.1673274244.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: remove excessive unlikely on IS_ERRDmitrii Bundin1-1/+1
The IS_ERR function uses the IS_ERR_VALUE macro under the hood which already wraps the condition into unlikely. Signed-off-by: Dmitrii Bundin <dmitrii.bundin.a@gmail.com> Link: https://lore.kernel.org/r/20230109185854.25698-1-dmitrii.bundin.a@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring/msg_ring: Pass custom flags to the cqeBreno Leitao1-5/+19
This patch adds a new flag (IORING_MSG_RING_FLAGS_PASS) in the message ring operations (IORING_OP_MSG_RING). This new flag enables the sender to specify custom flags, which will be copied over to cqe->flags in the receiving ring. These custom flags should be specified using the sqe->file_index field. This mechanism provides additional flexibility when sending messages between rings. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20230103160507.617416-1-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
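A hypothetical userspace sketch of the flag passing, poking the raw SQE fields directly; it assumes headers new enough to define IORING_MSG_RING_FLAGS_PASS, and error handling is omitted:

  #include <liburing.h>

  static int send_flagged_msg(struct io_uring *ring, int target_ring_fd)
  {
          struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

          io_uring_prep_msg_ring(sqe, target_ring_fd, /*len=*/0,
                                 /*data=*/0x42, /*flags=*/0);
          sqe->msg_ring_flags = IORING_MSG_RING_FLAGS_PASS; /* opt in */
          sqe->file_index = 0x1234;  /* copied into cqe->flags on the target */
          return io_uring_submit(ring);
  }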
2023-01-30io_uring: keep timeout in io_wait_queuePavel Begunkov1-14/+14
Move waiting timeout into io_wait_queue Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e4b48a9e26a3b1cf97c80121e62d4b5ab873d28d.1672916894.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: optimise non-timeout waitingPavel Begunkov1-1/+3
Unlike the jiffy-based scheduling version, schedule_hrtimeout() passes through a few extra functions before getting into schedule(), even when no actual timeout is needed. Some tests showed that it takes up to 1% of CPU cycles. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/89f880574eceee6f4899783377ead234df7b3d04.1672916894.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
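The gist of the optimisation, as a sketch, with KTIME_MAX standing in for "no timeout requested":

  if (iowq->timeout == KTIME_MAX)
          schedule();                     /* cheap path, no hrtimer setup */
  else if (!schedule_hrtimeout(&iowq->timeout, HRTIMER_MODE_ABS))
          return -ETIME;                  /* timer expired before a wake up */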
2023-01-30io_uring: set TASK_RUNNING right after schedulePavel Begunkov1-3/+2
Instead of constantly checking whether the task state is running before executing tw or taking locks in io_cqring_wait(), switch it back to TASK_RUNNING immediately after schedule(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/246dddee247d89fd52023f785ed17cc34962a008.1672916894.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: simplify io_has_workPavel Begunkov1-2/+1
->work_llist should never be non-empty for a non-DEFER_TASKRUN ring, so we can safely skip checking the flag. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/26af9f73c09a56c9a035f94db56127358688f3aa.1672916894.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: minimise io_cqring_wait_schedulePavel Begunkov1-16/+23
io_cqring_wait_schedule() is called after we've started waiting on the cq wq with the state set to TASK_INTERRUPTIBLE; for that reason we have to constantly worry about whether we have returned the state back to running or not. Leave only quick checks in io_cqring_wait_schedule() and move the rest, including running task work, to the callers. Note that we run tw in the loop after the sched checks because of the fast path at the beginning of the function. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2814fabe75e2e019e7ca43ea07daa94564349805.1672916894.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: parse check_cq out of wq waitingPavel Begunkov1-14/+18
We already avoid flushing overflows in io_cqring_wait_schedule() but only return an error for the outer loop to handle it. Minimise it even further by moving all ->check_cq parsing there. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9dfcec3121013f98208dbf79368d636d74e1231a.1672916894.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: move defer tw task checksPavel Begunkov2-9/+11
Most places that want to run local tw explicitly and in advance check whether they are allowed to do so. Don't rely on a similar check in __io_run_local_work(); leave it as a just-in-case warning and make sure the callers check capabilities themselves. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/990fe0e8e70fd4d57e43625e5ce8fba584821d1a.1672916894.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: kill io_run_task_work_ctxPavel Begunkov2-21/+5
There is only one user of io_run_task_work_ctx(), inline it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/40953c65f7c88fb00cdc4d870ca5d5319fb3d7ea.1672916894.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: don't iterate cq wait fast pathPavel Begunkov1-10/+8
Task work runners keep running until all queued tw items are exhausted. It's also rare for deferred tw to queue normal tw and vice versa. Taking that into account, there is only a slim chance that further iterating the io_cqring_wait() fast path will get us anything, so we can remove the loop there. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1f9565726661266abaa5d921e97433c831759ecf.1672916894.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-30io_uring: rearrange defer list checksPavel Begunkov2-4/+1
There should be nothing in ->work_llist for non-DEFER_TASKRUN rings, so we can skip the flag checks and test the list emptiness directly. Also move it out of io_run_local_work() for inlining. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/331d63fd15ca79b35b95c82a82d9246110686392.1672916894.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-27io_uring: always prep_async for drain requestsDylan Yudaken1-10/+8
Drain requests all go through io_drain_req(), which has a quick exit in case there is nothing pending (i.e. the drain is not useful). In that case it can issue the request immediately. However, for safety it queues it through task work. The problem is that in this case the request is run asynchronously, but the async work has not been prepared through io_req_prep_async(). This has not been a problem up to now, as the task work would always run before returning to userspace, and so the user would not have a chance to race with it. However, with IORING_SETUP_DEFER_TASKRUN this is no longer the case and the work might be deferred, giving userspace a chance to change data being referred to in the request. Instead, _always_ prep_async for drain requests, which is simpler anyway and removes this issue. Cc: stable@vger.kernel.org Fixes: c0e0d6ba25f1 ("io_uring: add IORING_SETUP_DEFER_TASKRUN") Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20230127105911.2420061-1-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-23io_uring/net: cache provided buffer group value for multishot receivesJens Axboe1-0/+11
If we're using ring provided buffers with multishot receive, and we end up doing an io-wq based issue at some point that also needs to select a buffer, we'll lose the initially assigned buffer group, as io_ring_buffer_select() correctly clears the buffer group list since the issue isn't serialized by the ctx uring_lock. This is fine for normal receives, as the request puts the buffer and finishes, but for multishot, we will re-arm and do further receives. On the next trigger for this multishot receive, the receive will try to pick from a buffer group whose value is the same as the buffer ID of the last receive. That is obviously incorrect, and will result in a premature -ENOBUFS error for the receive even if we had available buffers in the correct group. Cache the buffer group value at prep time, so we can restore it for future receives. This only needs doing for the above mentioned case, but just do it by default to keep it easier to read. Cc: stable@vger.kernel.org Fixes: b3fdea6ecb55 ("io_uring: multishot recv") Fixes: 9bb66906f23e ("io_uring: support multishot in recvmsg") Cc: Dylan Yudaken <dylany@meta.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
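A sketch of the caching; sr here stands for the request's io_sr_msg state, and buf_group is the cached field described above, with the exact restore point being illustrative:

  /* at prep time: remember which provided-buffer group was requested */
  sr->buf_group = req->buf_index;

  /* before each multishot re-arm/retry: restore it, since a previous
   * io-wq based issue may have replaced req->buf_index with a buffer ID */
  req->buf_index = sr->buf_group;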
2023-01-21io_uring/poll: don't reissue in case of poll race on multishot requestJens Axboe1-1/+5
A previous commit fixed a poll race that can occur, but it's only applicable for multishot requests. For a multishot request, we can safely ignore a spurious wakeup, as we never leave the waitqueue to begin with. A blunt reissue of a multishot armed request can cause us to leak a buffer, if the buffers are ring provided. While this seems like a bug in itself, it's not really defined behavior to reissue a multishot request directly. It's less efficient to do so as well, and not required to rearm anything like it is for singleshot poll requests. Cc: stable@vger.kernel.org Fixes: 6e5aedb9324a ("io_uring/poll: attempt request issue after racy poll wakeup") Reported-and-tested-by: Olivier Langlois <olivier@trillion01.com> Link: https://github.com/axboe/liburing/issues/778 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-20io_uring/msg_ring: fix remote queue to disabled ringPavel Begunkov2-2/+10
IORING_SETUP_R_DISABLED rings don't have the submitter task set, so it's not always safe to use ->submitter_task. Disallow posting msg_ring messages to disabled rings. Also add a task NULL check for the loose sync around testing for IORING_SETUP_R_DISABLED. Cc: stable@vger.kernel.org Fixes: 6d043ee1164ca ("io_uring: do msg_ring in target task via tw") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-20io_uring/msg_ring: fix flagging remote executionPavel Begunkov1-17/+23
There are a couple of problems with queueing a tw in io_msg_ring_data() for remote execution. First, once we queue it, the target ring can go away, so setting IORING_SQ_TASKRUN there is not safe. Secondly, the userspace might not expect IORING_SQ_TASKRUN. Extract a helper and uniformly use TWA_SIGNAL without the TWA_SIGNAL_NO_IPI tricks for now, just as it was done in the original patch. Cc: stable@vger.kernel.org Fixes: 6d043ee1164ca ("io_uring: do msg_ring in target task via tw") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-19io_uring/msg_ring: fix missing lock on overflow for IOPOLLJens Axboe1-9/+30
If the target ring is configured with IOPOLL, then we always need to hold the target ring uring_lock before posting CQEs. We could just grab it unconditionally, but since we don't expect many target rings to be of this type, make grabbing the uring_lock conditional on the ring type. Link: https://lore.kernel.org/io-uring/Y8krlYa52%2F0YGqkg@ip-172-31-85-199.ec2.internal/ Reported-by: Xingyuan Mo <hdthky0@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
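The conditional locking has roughly this shape; the wrapper name is illustrative, while io_post_aux_cqe() is the existing CQE-posting helper:

  static void io_msg_post_cqe_sketch(struct io_ring_ctx *target_ctx,
                                     u64 user_data, u32 cqe_flags)
  {
          bool need_lock = target_ctx->flags & IORING_SETUP_IOPOLL;

          if (need_lock)
                  mutex_lock(&target_ctx->uring_lock);
          io_post_aux_cqe(target_ctx, user_data, 0, cqe_flags);
          if (need_lock)
                  mutex_unlock(&target_ctx->uring_lock);
  }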
2023-01-19io_uring/msg_ring: move double lock/unlock helpers higher upJens Axboe1-24/+23
In preparation for needing them somewhere else, move them and get rid of the unused 'issue_flags' for the unlock side. No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-19mm/nommu: factor out check for NOMMU shared mappings into is_nommu_shared_mapping()David Hildenbrand1-1/+1
Patch series "mm/nommu: don't use VM_MAYSHARE for MAP_PRIVATE mappings". Trying to reduce the confusion around VM_SHARED and VM_MAYSHARE first requires !CONFIG_MMU to stop using VM_MAYSHARE for MAP_PRIVATE mappings. CONFIG_MMU only sets VM_MAYSHARE for MAP_SHARED mappings. This paves the way for further VM_MAYSHARE and VM_SHARED cleanups: for example, renaming VM_MAYSHARE to VM_MAP_SHARED to make it clearer what it actually means. Let's first get the weird case out of the way and not use VM_MAYSHARE in MAP_PRIVATE mappings, using a new VM_MAYOVERLAY flag instead. This patch (of 3): We want to stop using VM_MAYSHARE in private mappings to pave the way for clarifying the semantics of VM_MAYSHARE vs. VM_SHARED and reduce the confusion. While CONFIG_MMU uses VM_MAYSHARE to represent MAP_SHARED, !CONFIG_MMU also sets VM_MAYSHARE for selected R/O private file mappings that are an effective overlay of a file mapping. Let's factor out all relevant VM_MAYSHARE checks in !CONFIG_MMU code into is_nommu_shared_mapping() first. Note that whenever VM_SHARED is set, VM_MAYSHARE must be set as well (unless there is a serious BUG). So there is no need to test for VM_SHARED manually. No functional change intended. Link: https://lkml.kernel.org/r/20230102160856.500584-1-david@redhat.com Link: https://lkml.kernel.org/r/20230102160856.500584-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Nicolas Pitre <nico@fluxnic.net> Cc: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
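The factored-out helper is small; a sketch of what this first patch centralises (the VM_MAYOVERLAY rework lands in the follow-up patches, so here it's just the existing test):

  static inline bool is_nommu_shared_mapping(vm_flags_t flags)
  {
          /* on !CONFIG_MMU, VM_MAYSHARE currently marks both MAP_SHARED
           * mappings and the R/O private file-overlay mappings */
          return flags & VM_MAYSHARE;
  }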
2023-01-13io_uring: lock overflowing for IOPOLLPavel Begunkov1-1/+5
syzbot reports an issue with overflow filling for IOPOLL:

WARNING: CPU: 0 PID: 28 at io_uring/io_uring.c:734 io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734
CPU: 0 PID: 28 Comm: kworker/u4:1 Not tainted 6.2.0-rc3-syzkaller-16369-g358a161a6a9e #0
Workqueue: events_unbound io_ring_exit_work
Call trace:
 io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734
 io_req_cqe_overflow+0x5c/0x70 io_uring/io_uring.c:773
 io_fill_cqe_req io_uring/io_uring.h:168 [inline]
 io_do_iopoll+0x474/0x62c io_uring/rw.c:1065
 io_iopoll_try_reap_events+0x6c/0x108 io_uring/io_uring.c:1513
 io_uring_try_cancel_requests+0x13c/0x258 io_uring/io_uring.c:3056
 io_ring_exit_work+0xec/0x390 io_uring/io_uring.c:2869
 process_one_work+0x2d8/0x504 kernel/workqueue.c:2289
 worker_thread+0x340/0x610 kernel/workqueue.c:2436
 kthread+0x12c/0x158 kernel/kthread.c:376
 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:863

There is no real problem for normal IOPOLL as flush is also called with uring_lock taken, but it's getting more complicated for IOPOLL|SQPOLL, for which __io_cqring_overflow_flush() happens from the CQ waiting path. Reported-and-tested-by: syzbot+6805087452d72929404e@syzkaller.appspotmail.com Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-12io_uring/poll: attempt request issue after racy poll wakeupJens Axboe1-11/+20
If we have multiple requests waiting on the same target poll waitqueue, then it's quite possible to get a request triggered and get disappointed in not being able to make any progress with it. If we race in doing so, we'll potentially leave the poll request on the internal tables, but removed from the waitqueue. That means that any subsequent trigger of the poll waitqueue will not kick that request into action, causing an application to potentially wait for completion of a request that will never happen. Fix this by adding a new poll return state, IOU_POLL_REISSUE. Rather than have complicated logic for how to re-arm a given type of request, just punt it for a reissue. While in there, move the 'ret' variable to the only section where it gets used. This avoids confusion about its scope. Cc: stable@vger.kernel.org Fixes: eb0089d629ba ("io_uring: single shot poll removal optimisation") Signed-off-by: Jens Axboe <axboe@kernel.dk>
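The poll return states after this change, roughly; treat this as a sketch of the internal enum, with the comments paraphrasing the behaviour described above:

  enum {
          IOU_POLL_DONE = 0,                /* completed, post a CQE */
          IOU_POLL_NO_ACTION = 1,           /* spurious wakeup, keep waiting */
          IOU_POLL_REMOVE_POLL_USE_RES = 2, /* drop poll, use stashed result */
          IOU_POLL_REISSUE = 3,             /* raced: punt through normal issue */
  };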
2023-01-10io_uring/fdinfo: include locked hash table in fdinfo outputJens Axboe1-2/+10
A previous commit split the hash table for polled requests into two parts, but didn't get the fdinfo output updated. This makes it less useful for debugging, as we may incorrectly conclude that a given request is not pending poll. Fix this up by dumping the locked hash table contents too. Fixes: 9ca9fb24d5fe ("io_uring: mutex locked poll hashing") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-10io_uring/poll: add hash if ready poll request can't complete inlineJens Axboe1-5/+12
If we don't, then we may lose access to it completely, leading to a request leak. This will eventually stall the ring exit process as well. Cc: stable@vger.kernel.org Fixes: 49f1c68e048f ("io_uring: optimise submission side poll_refs") Reported-and-tested-by: syzbot+6c95df01470a47fc3af4@syzkaller.appspotmail.com Link: https://lore.kernel.org/io-uring/0000000000009f829805f1ce87b2@google.com/ Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>