summaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)AuthorFilesLines
2023-02-21Merge tag 'irq-core-2023-02-20' of ↵Linus Torvalds1-50/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt subsystem: Core: - Move the interrupt affinity spreading mechanism into lib/group_cpus so it can be used for similar spreading requirements, e.g. in the block multi-queue code This also contains a first usecase in the block multi-queue code which Jens asked to take along with the librarization - Improve irqdomain locking to close a number race conditions which can be observed with massive parallel device driver probing - Enforce and document the semantics of disable_irq() which cannot be invoked safely from non-sleepable context - Move the IPI multiplexing code from the Apple AIC driver into the core, so it can be reused by RISCV Drivers: - Plug OF node refcounting leaks in various drivers - Correctly mark level triggered interrupts in the Broadcom L2 drivers - The usual small fixes and improvements - No new drivers for the record!" * tag 'irq-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits) irqchip/irq-bcm7120-l2: Set IRQ_LEVEL for level triggered interrupts irqchip/irq-brcmstb-l2: Set IRQ_LEVEL for level triggered interrupts irqdomain: Switch to per-domain locking irqchip/mvebu-odmi: Use irq_domain_create_hierarchy() irqchip/loongson-pch-msi: Use irq_domain_create_hierarchy() irqchip/gic-v3-mbi: Use irq_domain_create_hierarchy() irqchip/gic-v3-its: Use irq_domain_create_hierarchy() irqchip/gic-v2m: Use irq_domain_create_hierarchy() irqchip/alpine-msi: Use irq_domain_add_hierarchy() x86/uv: Use irq_domain_create_hierarchy() x86/ioapic: Use irq_domain_create_hierarchy() irqdomain: Clean up irq_domain_push/pop_irq() irqdomain: Drop leftover brackets irqdomain: Drop dead domain-name assignment irqdomain: Drop revmap mutex irqdomain: Fix domain registration race irqdomain: Fix mapping-creation race irqdomain: Refactor __irq_domain_alloc_irqs() irqdomain: Look for existing mapping only once irqdomain: Drop bogus fwspec-mapping error handling ...
2023-02-21block: remove more NULL checks after bdev_get_queue()Juhyung Park1-10/+0
bdev_get_queue() never returns NULL. Several commits [1][2] have been made before to remove such superfluous checks, but some still remained. For places where bdev_get_queue() is called solely for NULL checks, it is removed entirely. [1] commit ec9fd2a13d74 ("blk-lib: don't check bdev_get_queue() NULL check") [2] commit fea127b36c93 ("block: remove superfluous check for request queue in bdev_is_zoned()") Signed-off-by: Juhyung Park <qkrwngud825@gmail.com> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20230203024029.48260-1-qkrwngud825@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-21iov_iter: Define flags to qualify page extraction.David Howells2-7/+7
Define flags to qualify page extraction to pass into iov_iter_*_pages*() rather than passing in FOLL_* flags. For now only a flag to allow peer-to-peer DMA is supported. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Logan Gunthorpe <logang@deltatee.com> cc: linux-fsdevel@vger.kernel.org cc: linux-block@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2023-02-21Merge tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linuxLinus Torvalds34-780/+1128
Pull block updates from Jens Axboe: - NVMe updates via Christoph: - Small improvements to the logging functionality (Amit Engel) - Authentication cleanups (Hannes Reinecke) - Cleanup and optimize the DMA mapping cod in the PCIe driver (Keith Busch) - Work around the command effects for Format NVM (Keith Busch) - Misc cleanups (Keith Busch, Christoph Hellwig) - Fix and cleanup freeing single sgl (Keith Busch) - MD updates via Song: - Fix a rare crash during the takeover process - Don't update recovery_cp when curr_resync is ACTIVE - Free writes_pending in md_stop - Change active_io to percpu - Updates to drbd, inching us closer to unifying the out-of-tree driver with the in-tree one (Andreas, Christoph, Lars, Robert) - BFQ update adding support for multi-actuator drives (Paolo, Federico, Davide) - Make brd compliant with REQ_NOWAIT (me) - Fix for IOPOLL and queue entering, fixing stalled IO waiting on timeouts (me) - Fix for REQ_NOWAIT with multiple bios (me) - Fix memory leak in blktrace cleanup (Greg) - Clean up sbitmap and fix a potential hang (Kemeng) - Clean up some bits in BFQ, and fix a bug in the request injection (Kemeng) - Clean up the request allocation and issue code, and fix some bugs related to that (Kemeng) - ublk updates and fixes: - Add support for unprivileged ublk (Ming) - Improve device deletion handling (Ming) - Misc (Liu, Ziyang) - s390 dasd fixes (Alexander, Qiheng) - Improve utility of request caching and fixes (Anuj, Xiao) - zoned cleanups (Pankaj) - More constification for kobjs (Thomas) - blk-iocost cleanups (Yu) - Remove bio splitting from drivers that don't need it (Christoph) - Switch blk-cgroups to use struct gendisk. Some of this is now incomplete as select late reverts were done. (Christoph) - Add bvec initialization helpers, and convert callers to use that rather than open-coding it (Christoph) - Misc fixes and cleanups (Jinke, Keith, Arnd, Bart, Li, Martin, Matthew, Ulf, Zhong) * tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits) brd: use radix_tree_maybe_preload instead of radix_tree_preload block: use proper return value from bio_failfast() block: bio-integrity: Copy flags when bio_integrity_payload is cloned block: Fix io statistics for cgroup in throttle path brd: mark as nowait compatible brd: check for REQ_NOWAIT and set correct page allocation mask brd: return 0/-error from brd_insert_page() block: sync mixed merged request's failfast with 1st bio's Revert "blk-cgroup: pin the gendisk in struct blkcg_gq" Revert "blk-cgroup: pass a gendisk to blkg_lookup" Revert "blk-cgroup: delay blk-cgroup initialization until add_disk" Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release" Revert "blk-cgroup: move the cgroup information to struct gendisk" nvme-pci: remove iod use_sgls nvme-pci: fix freeing single sgl block: ublk: check IO buffer based on flag need_get_data s390/dasd: Fix potential memleak in dasd_eckd_init() s390/dasd: sort out physical vs virtual pointers usage block: Remove the ALLOC_CACHE_SLACK constant block: make kobj_type structures constant ...
2023-02-21Merge tag 'for-6.3/iter-ubuf-2023-02-16' of git://git.kernel.dk/linuxLinus Torvalds1-4/+4
Pull io_uring ITER_UBUF conversion from Jens Axboe: "Since we now have ITER_UBUF available, switch to using it for single ranges as it's more efficient than ITER_IOVEC for that" * tag 'for-6.3/iter-ubuf-2023-02-16' of git://git.kernel.dk/linux: block: use iter_ubuf for single range iov_iter: move iter_ubuf check inside restore WARN io_uring: use iter_ubuf for single range imports io_uring: switch network send/recv to ITER_UBUF iov: add import_ubuf()
2023-02-19Merge tag 'irqchip-6.3' of ↵Thomas Gleixner1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: - New and improved irqdomain locking, closing a number of races that became apparent now that we are able to probe drivers in parallel - A bunch of OF node refcounting bugs have been fixed - We now have a new IPI mux, lifted from the Apple AIC code and made common. It is expected that riscv will eventually benefit from it - Two small fixes for the Broadcom L2 drivers - Various cleanups and minor bug fixes Link: https://lore.kernel.org/r/20230218143452.3817627-1-maz@kernel.org
2023-02-17block: fix scan partition for exclusively open device againYu Kuai2-5/+27
As explained in commit 36369f46e917 ("block: Do not reread partition table on exclusively open device"), reread partition on the device that is exclusively opened by someone else is problematic. This patch will make sure partition scan will only be proceed if current thread open the device exclusively, or the device is not opened exclusively, and in the later case, other scanners and exclusive openers will be blocked temporarily until partition scan is done. Fixes: 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions") Cc: <stable@vger.kernel.org> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230217022200.3092987-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-17block: Revert "block: Do not reread partition table on exclusively open device"Yu Kuai3-13/+9
This reverts commit 36369f46e91785688a5f39d7a5590e3f07981316. This patch can't fix the problem in a corner case that device can be opened exclusively after the checking and before blkdev_get_by_dev(). We'll use a new solution to fix the problem in the next patch, and the new solution doesn't need to change apis. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230217022200.3092987-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-17sed-opal: add support flag for SUM in status ioctlLuca Boccassi1-0/+2
Not every OPAL drive supports SUM (Single User Mode), so report this information to userspace via the get-status ioctl so that we can adjust the formatting options accordingly. Tested on a kingston drive (which supports it) and a samsung one (which does not). Signed-off-by: Luca Boccassi <bluca@debian.org> Link: https://lore.kernel.org/r/20230210010612.28729-1-luca.boccassi@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-17block: use proper return value from bio_failfast()Jens Axboe1-1/+1
kernel test robot complains about a type mismatch: block/blk-merge.c:984:42: sparse: expected restricted blk_opf_t const [usertype] ff block/blk-merge.c:984:42: sparse: got unsigned int block/blk-merge.c:1010:42: sparse: sparse: incorrect type in initializer (different base types) @@ expected restricted blk_opf_t const [usertype] ff @@ got unsigned int @@ block/blk-merge.c:1010:42: sparse: expected restricted blk_opf_t const [usertype] ff block/blk-merge.c:1010:42: sparse: got unsigned int because bio_failfast() is return an unsigned int rather than the appropriate blk_opt_f type. Fix it up. Fixes: 3ce6a115980c ("block: sync mixed merged request's failfast with 1st bio's") Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202302170743.GXypM9Rt-lkp@intel.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-16block: bio-integrity: Copy flags when bio_integrity_payload is clonedMartin K. Petersen1-0/+1
Make sure to copy the flags when a bio_integrity_payload is cloned. Otherwise per-I/O properties such as IP checksum flag will not be passed down to the HBA driver. Since the integrity buffer is owned by the original bio, the BIP_BLOCK_INTEGRITY flag needs to be masked off to avoid a double free in the completion path. Fixes: aae7df50190a ("block: Integrity checksum flag") Fixes: b1f01388574c ("block: Relocate bio integrity flags") Reported-by: Saurav Kashyap <skashyap@marvell.com> Tested-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230215171801.21062-1-martin.petersen@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-16block: Fix io statistics for cgroup in throttle pathJinke Han1-11/+12
In the current code, io statistics are missing for cgroup when bio was throttled by blk-throttle. Fix it by moving the unreaching code to submit_bio_noacct_nocheck. Fixes: 3f98c753717c ("block: don't check bio in blk_throtl_dispatch_work_fn") Signed-off-by: Jinke Han <hanjinke.666@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230216032250.74230-1-hanjinke.666@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-16block: sync mixed merged request's failfast with 1st bio'sMing Lei1-2/+33
We support mixed merge for requests/bios with different fastfail settings. When request fails, each time we only handle the portion with same failfast setting, then bios with failfast can be failed immediately, and bios without failfast can be retried. The idea is pretty good, but the current implementation has several defects: 1) initially RA bio doesn't set failfast, however bio merge code doesn't consider this point, and just check its failfast setting for deciding if mixed merge is required. Fix this issue by adding helper of bio_failfast(). 2) when merging bio to request front, if this request is mixed merged, we have to sync request's faifast setting with 1st bio's failfast. Fix it by calling blk_update_mixed_merge(). 3) when merging bio to request back, if this request is mixed merged, we have to mark the bio as failfast, because blk_update_request simply updates request failfast with 1st bio's failfast. Fix it by calling blk_update_mixed_merge(). Fixes one normal EXT4 READ IO failure issue, because it is observed that the normal READ IO is merged with RA IO, and the mixed merged request has different failfast setting with 1st bio's, so finally the normal READ IO doesn't get retried. Cc: Tejun Heo <tj@kernel.org> Fixes: 80a761fd33cf ("block: implement mixed merge of different failfast requests") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230209125527.667004-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-15block: export bio_split_rwChristoph Hellwig1-1/+2
bio_split_rw can be used by file systems to split and incoming write bio into multiple bios fitting the hardware limit for use as ZONE_APPEND bios. Export it for initial use in btrfs. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"Christoph Hellwig7-31/+33
This reverts commit 84d7d462b16dd5f0bf7c7ca9254bf81db2c952a2. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230214183308.1658775-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-15Revert "blk-cgroup: pass a gendisk to blkg_lookup"Christoph Hellwig2-18/+18
This reverts commit 821e840c08ad83736eced4037cdad864e95e2584. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230214183308.1658775-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-15Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"Christoph Hellwig1-9/+8
This reverts commit 178fa7d49815ea8001f43ade37a22072829fd8ab. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230214183308.1658775-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-15Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"Christoph Hellwig2-4/+3
This reverts commit c43332fe028c252a2a28e46be70a530f64fc3c9d as it is not needed without moving to disk references in the blkg. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230214183308.1658775-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-15Revert "blk-cgroup: move the cgroup information to struct gendisk"Christoph Hellwig5-48/+44
This reverts commit 3f13ab7c80fdb0ada86a8e3e818960bc1ccbaa59 as a patch it depends on caused a few problems. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230214183308.1658775-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-10block: Remove the ALLOC_CACHE_SLACK constantBart Van Assche1-1/+0
Commit b99182c501c3 ("bio: add pcpu caching for non-polling bio_put") removed the code that uses this constant. Hence also remove the constant itself. Cc: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230209230135.3475829-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-09block: make kobj_type structures constantThomas Weißschuh6-10/+10
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20230208-kobj_type-block-v1-1-0b3eafd7d983@weissschuh.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-09block: Merge bio before checking ->cached_rqXiao Ni1-3/+4
It checks if plug->cached_rq is empty before merging bio. But the merge action doesn't have relationship with plug->cached_rq, it trys to merge bio with requests within plug->mq_list. Now it checks if ->cached_rq is empty before merging bio. If it's empty, it will miss the merge chances. So move the merge function before checking ->cached_rq. Signed-off-by: Xiao Ni <xni@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230209031930.27354-1-xni@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-09Revert "blk-cgroup: simplify blkg freeing from initialization failure paths"Christoph Hellwig1-7/+20
It turns out this was too soon. blkg_conf_prep does to funky locking games with the queue lock for this to work properly. This reverts commit 27b642b07a4a5eb44dffa94a5171ce468bdc46f9. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230209053523.437927-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-09blk-cgroup: delay calling blkcg_exit_disk until disk_releaseChristoph Hellwig2-3/+4
While del_gendisk ensures there is no outstanding I/O on the queue, it can't prevent block layer users from building new I/O. This leads to a NULL ->root_blkg reference in bio_associate_blkg when allocating a new bio on a shut down file system. Delay freeing the blk-cgroup subsystems from del_gendisk until disk_release to make sure the blkg and throttle information is still avaіlable for bio submitters, even if those bios will immediately fail. This now can cause a case where disk_release is called on a disk that hasn't been added. That's mostly harmless, except for a case in blk_throttl_exit that now needs to check for a NULL ->td pointer. Fixes: 178fa7d49815 ("blk-cgroup: delay blk-cgroup initialization until add_disk") Reported-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230208063514.171485-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-07block, bfq: cleanup 'bfqg->online'Yu Kuai3-6/+2
After commit dfd6200a0954 ("blk-cgroup: support to track if policy is online"), there is no need to do this again in bfq. However, 'pd->online' is not protected by 'bfqd->lock', in order to make sure bfq won't see that 'pd->online' is still set after bfq_pd_offline(), clear it before bfq_pd_offline() is called. This is fine because other polices doesn't use 'pd->online' and bfq_pd_offline() will move active bfqq to root cgroup anyway. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230202134913.2364549-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06blk-mq: correct stale comment of .get_budgetKemeng Shi1-2/+2
Commit 88022d7201e96 ("blk-mq: don't handle failure in .get_budget") remove BLK_STS_RESOURCE return value and we only check if we can get the budget from .get_budget() now. Correct stale comment that ".get_budget() returns BLK_STS_NO_RESOURCE" to ".get_budget() fails to get the budget". Fixes: 88022d7201e9 ("blk-mq: don't handle failure in .get_budget") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06blk-mq: use switch/case to improve readability in blk_mq_try_issue_list_directlyKemeng Shi1-9/+13
Use switch/case handle error as other function do to improve readability in blk_mq_try_issue_list_directly. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06blk-mq: remove set of bd->last when get driver tag for next request failsKemeng Shi1-22/+2
Commit 113285b473824 ("blk-mq: ensure that bd->last is always set correctly") will set last if we failed to get driver tag for next request to avoid flush miss as we break the list walk and will not send the last request in the list which will be sent with last set normally. This code seems stale now becase the flush introduced is always redundant as: For case tag is really out, we will send a extra flush if we find list is not empty after list walk. For case some tag is freed before retry in blk_mq_prep_dispatch_rq for next, then we can get a tag for next request in retry and flush notified already is not necessary. Just remove these stale codes. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06blk-mq: remove unnecessary error count and check in blk_mq_dispatch_rq_listKemeng Shi1-6/+5
blk_mq_dispatch_rq_list will notify if hctx is busy in return bool. It will return true if we are not busy and can handle more and return false on the opposite. Inside blk_mq_dispatch_rq_list, errors is only used if list is empty and we will return true if list is empty and (errors + queued) != 0. There are three types of status returned from request: -busy error BLK_STS*_RESOURCE: the failed request will be added back to list and list will not be empty. -BLK_STS_OK: We count queued for BLK_STS_OK -rest error: We count errors for rest error If list is empty, there is no request gets busy error then (errors + queued) will be total requests in the list which is checked not empty at beginning of blk_mq_dispatch_rq_list. So (errors + queued) != 0 is always met if list is empty. Then the (errors + queued) != 0 check and errors number count is not needed. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06blk-mq: simplify flush check in blk_mq_dispatch_rq_listKemeng Shi1-3/+3
1. Remove check of needs_resource and ret == BLK_STS_DEV_RESOURCE. For busy error BLK_STS*_RESOURCE, request will always be added back to list, so need_resource will not be true and ret will not be == BLK_STS_DEV_RESOURCE if list is empty. We could remove these dead check. 2. Check ret of last request instead of errors If list is empty, we only need to explicitly commit_rqs if error happens at last request which is stored in ret. So check ret of last request instead of errors to remove unnecessary commit_rqs triggered by errors returned from previous request. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06blk-mq: use blk_mq_commit_rqs helper in blk_mq_try_issue_list_directlyKemeng Shi1-10/+3
Call blk_mq_commit_rqs instead of access ->commit_rqs directly. As you can see in comment of blk_mq_commit_rqs, we only need explicitly call this in two cases: -did not queue everything initially scheduled to queue -the last attempt to queue a request failed Both cases can be checked with ret of last request which breaks list walk. Then we can remove unnecessary error count and unnecessary commit triggered by error besides cases described above. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06blk-mq: remove unncessary error count and commit in blk_mq_plug_issue_directKemeng Shi1-10/+4
We need only to explicitly commit in two error cases: -did not queue everything initially scheduled to queue -the last attempt to queue a request failed (see comment of blk_mq_commit_rqs for more details). Both cases can be checked with ret of last request which breaks list walk. Remove unnecessary error count and unnecessary commit triggered by error which is not covered by cases described above. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06blk-mq: make blk_mq_commit_rqs a general function for all commitsKemeng Shi1-14/+23
1. move blk_mq_commit_rqs forward before functions need commits. 2. add queued check and only commits request if any request was queued in blk_mq_commit_rqs to keep commit behavior consistent and remove unnecessary commit. 3. split the queued clearing from blk_mq_plug_commit_rqs as it is not wanted general. 4. sync current caller of blk_mq_commit_rqs with new general blk_mq_commit_rqs. 5. document rule for unusual cases which need explicit commit_rqs. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06blk-mq: remove unncessary from_schedule parameter in blk_mq_plug_issue_directKemeng Shi1-5/+5
Function blk_mq_plug_issue_direct tries to issue batch requests in plug list to driver directly. We will only issue plug request to driver if we are not from scheduler, so from_scheduler parameter of blk_mq_plug_issue_direct is always false. Remove unncessary from_scheduler of blk_mq_plug_issue_direct. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06blk-mq: remove unnecessary list_empty check in blk_mq_try_issue_list_directlyKemeng Shi1-2/+1
We only break the list walk if we get 'BLK_STS_*RESOURCE'. We also count errors for 'BLK_STS_*RESOURCE' error. If list is not empty, errors will always be non-zero. So we can remove unnecessary list_empty check. This will remove redundant list_empty check for case that error happened at sending last request in list. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06blk-mq: Fix potential io hung for shared sbitmap per tagsetKemeng Shi1-2/+4
Commit f906a6a0f4268 ("blk-mq: improve tag waiting setup for non-shared tags") mark restart for unshared tags for improvement. At that time, tags is only shared betweens queues and we can check if tags is shared by test BLK_MQ_F_TAG_SHARED. Afterwards, commit 32bc15afed04b ("blk-mq: Facilitate a shared sbitmap per tagset") enabled tags share betweens hctxs inside a queue. We only mark restart for shared hctxs inside a queue and may cause io hung if there is no tag currently allocated by hctxs going to be marked restart. Wait on sbitmap_queue instead of mark restart for shared hctxs case to fix this. Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_waitKemeng Shi1-1/+5
For shared queues case, we will only wait on bitmap_tags if we fail to get driver tag. However, rq could be from breserved_tags, then two problems will occur: 1. io hung if no tag is currently allocated from bitmap_tags. 2. unnecessary wakeup when tag is freed to bitmap_tags while no tag is freed to breserved_tags. Wait on the bitmap which rq from to fix this. Fixes: f906a6a0f426 ("blk-mq: improve tag waiting setup for non-shared tags") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctxKemeng Shi1-2/+1
Commit 97889f9ac24f8 ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()") remove handle of TAG_SHARED in restart, then shared_hctx_restart counted for how many hardware queues are marked for restart is removed too. Remove the stale comment that we still count hardware queues need restart. Fixes: 97889f9ac24f ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06blk-mq: avoid sleep in blk_mq_alloc_request_hctxKemeng Shi1-1/+2
Commit 1f5bd336b9150 ("blk-mq: add blk_mq_alloc_request_hctx") add blk_mq_alloc_request_hctx to send commands to a specific queue. If BLK_MQ_REQ_NOWAIT is not set in tag allocation, we may change to different hctx after sleep and get tag from unexpected hctx. So BLK_MQ_REQ_NOWAIT must be set in flags for blk_mq_alloc_request_hctx. After commit 600c3b0cea784 ("blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx"), blk_mq_alloc_request_hctx return -EINVAL if both BLK_MQ_REQ_NOWAIT and BLK_MQ_REQ_RESERVED are not set instead of if BLK_MQ_REQ_NOWAIT is not set. So if BLK_MQ_REQ_NOWAIT is not set and BLK_MQ_REQ_RESERVED is set, blk_mq_alloc_request_hctx could alloc tag from unexpected hctx. I guess what we need here is that return -EINVAL if either BLK_MQ_REQ_NOWAIT or BLK_MQ_REQ_RESERVED is not set. Currently both BLK_MQ_REQ_NOWAIT and BLK_MQ_REQ_RESERVED will be set if specific hctx is needed in nvme_auth_submit, nvmf_connect_io_queue and nvmf_connect_admin_queue. Fix the potential BLK_MQ_REQ_NOWAIT missed case in future. Fixes: 600c3b0cea78 ("blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06block: stub out and deprecated the capability attribute on the gendiskChristoph Hellwig1-3/+2
The capability attribute was added in 2017 to expose the kernel internal GENHD_FL_MEDIA_CHANGE_NOTIFY to userspace without ever adding a value to an UAPI header, and without ever setting it in any driver until it was finally removed in Linux 5.7. Deprecate the file and always return 0 instead of exposing the other internal and frequently renumbered other gendisk flags. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230203150209.3199115-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-06blk-cgroup: fix freeing NULL blkg in blkg_createChristoph Hellwig1-1/+2
new_blkg can be NULL if the caller didn't pass in a pre-allocated blkg. Don't try to free it in that case. Fixes: 27b642b07a4a ("blk-cgroup: simplify blkg freeing from initialization failure paths") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230206150201.3438972-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03Merge tag 'block-6.2-2023-02-03' of git://git.kernel.dk/linuxLinus Torvalds4-4/+11
Pull block fixes from Jens Axboe: "A bit bigger than I'd like at this point, but mostly a bunch of little fixes. In detail: - NVMe pull request via Christoph: - Fix a missing queue put in nvmet_fc_ls_create_association (Amit Engel) - Clear queue pointers on tag_set initialization failure (Maurizio Lombardi) - Use workqueue dedicated to authentication (Shin'ichiro Kawasaki) - Fix for an overflow in ublk (Liu) - Fix for leaking a queue reference in block cgroups (Ming) - Fix for a use-after-free in BFQ (Yu)" * tag 'block-6.2-2023-02-03' of git://git.kernel.dk/linux: blk-cgroup: don't update io stat for root cgroup nvme-auth: use workqueue dedicated to authentication nvme: clear the request_queue pointers on failure in nvme_alloc_io_tag_set nvme: clear the request_queue pointers on failure in nvme_alloc_admin_tag_set nvme-fc: fix a missing queue put in nvmet_fc_ls_create_association block: Fix the blk_mq_destroy_queue() documentation block: ublk: extending queue_size to fix overflow block, bfq: fix uaf for bfqq in bic_set_bfqq()
2023-02-03block: factor out a bvec_set_page helperChristoph Hellwig2-16/+3
Add a helper to initialize a bvec based of a page pointer. This will help removing various open code bvec initializations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230203150634.3199647-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03blk-cgroup: move the cgroup information to struct gendiskChristoph Hellwig5-44/+48
cgroup information only makes sense on a live gendisk that allows file system I/O (which includes the raw block device). So move over the cgroup related members. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-20-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03blk-cgroup: pass a gendisk to blkg_lookupChristoph Hellwig2-18/+18
Pass a gendisk to blkg_lookup and use that to find the match as part of phasing out usage of the request_queue in the blk-cgroup code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03blk-cgroup: pass a gendisk to pd_alloc_fnChristoph Hellwig7-22/+21
No need to the request_queue here, pass a gendisk and extract the node ids from that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03blk-cgroup: pass a gendisk to blkcg_{de,}activate_policyChristoph Hellwig8-25/+25
Prepare for storing the blkcg information in the gendisk instead of the request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03blk-rq-qos: store a gendisk instead of request_queue in struct rq_qosChristoph Hellwig6-31/+27
This is what about half of the users already want, and it's only going to grow more. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03blk-rq-qos: constify rq_qos_opsChristoph Hellwig5-6/+6
These op vectors are constant, so mark them const. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03blk-rq-qos: make rq_qos_add and rq_qos_del more usefulChristoph Hellwig5-29/+21
Switch to passing a gendisk, and make rq_qos_add initialize all required fields and drop the not required q argument from rq_qos_del. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>