summaryrefslogtreecommitdiff
path: root/block/blk-core.c
AgeCommit message (Collapse)AuthorFilesLines
2021-10-20block: cleanup the flush plug helpersChristoph Hellwig1-7/+6
Consolidate the various helpers into a single blk_flush_plug helper that takes a plk_plug and the from_scheduler bool and switch all callsites to call it directly. Checks that the plug is non-NULL must be performed by the caller, something that most already do anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211020144119.142582-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20block: optimise blk_flush_plug_listPavel Begunkov1-2/+2
Don't call flush_plug_callbacks if there are no plug callbacks. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211020144119.142582-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20block: don't bloat enter_queue with percpu_refPavel Begunkov1-1/+1
percpu_ref_put() are inlined for performance and bloat the binary, we don't care about the fail case of blk_try_enter_queue(), so we can replace it with a call to blk_queue_exit(). Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20block: convert leftovers to bdev_get_queuePavel Begunkov1-1/+1
Convert bdev->bd_disk->queue to bdev_get_queue(), which is faster. Apparently, there are a few such spots in block that got lost during rebases. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: attempt direct issue of plug listJens Axboe1-0/+1
If we have just one queue type in the plug list, then we can extend our direct issue to cover a full plug list as well. This allows sending a batch of requests for direct issue, which is more efficient than doing one-at-a-time kind of issue. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: change plugging to use a singly linked listJens Axboe1-2/+2
Use a singly linked list for the blk_plug. This saves 8 bytes in the blk_plug struct, and makes for faster list manipulations than doubly linked lists. As we don't use the doubly linked lists for anything, singly linked is just fine. This yields a bump in default (merging enabled) performance from 7.0 to 7.1M IOPS, and ~7.5M IOPS with merging disabled. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: add a struct io_comp_batch argument to fops->iopoll()Jens Axboe1-4/+5
struct io_comp_batch contains a list head and a completion handler, which will allow completions to more effciently completed batches of IO. For now, no functional changes in this patch, we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: move update request helpers into blk-mq.cJens Axboe1-145/+1
For some reason we still have them in blk-core, with the rest of the request completion being in blk-mq. That causes and out-of-line call for each completion. Move them into blk-mq.c instead, where they belong. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: remove useless caller argument to print_req_error()Jens Axboe1-5/+4
We have exactly one caller of this, just get rid of adding the useless function name to the output. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: use bdev_get_queue() in blk-core.cPavel Begunkov1-6/+7
Convert bdev->bd_disk->queue to bdev_get_queue(), it's uses a cached queue pointer and so is faster. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/efc41f880262517c8dc32f932f1b23112f21b255.1634219547.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: switch polling to be bio basedChristoph Hellwig1-32/+95
Replace the blk_poll interface that requires the caller to keep a queue and cookie from the submissions with polling based on the bio. Polling for the bio itself leads to a few advantages: - the cookie construction can made entirely private in blk-mq.c - the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues - keeping the device and the cookie inside the bio allows to trivially support polling BIOs remapping by stacking drivers - a lot of code to propagate the cookie back up the submission path can be removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: rename REQ_HIPRI to REQ_POLLEDChristoph Hellwig1-1/+1
Unlike the RWF_HIPRI userspace ABI which is intentionally kept vague, the bio flag is specific to the polling implementation, so rename and document it properly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: inline hot paths of blk_account_io_*()Pavel Begunkov1-21/+9
Extract hot paths of __blk_account_io_start() and __blk_account_io_done() into inline functions, so we don't always pay for function calls. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b0662a636bd4cc7b4f84c9d0a41efa46a688ef13.1633781740.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: pre-allocate requests if plug is started and is a batchJens Axboe1-19/+28
The caller typically has a good (or even exact) idea of how many requests it needs to submit. We can make the request/tag allocation a lot more efficient if we just allocate N requests/tags upfront when we queue the first bio from the batch. Provide a new plug start helper that allows the caller to specify how many IOs are expected. This sets plug->nr_ios, and we can use that for smarter request allocation. The plug provides a holding spot for requests, and request allocation will check it before calling into the normal request allocation path. The blk_finish_plug() is called, check if there are unused requests and free them. This should not happen in normal operations. The exception is if we get merging, then we may be left with requests that need freeing when done. This raises the per-core performance on my setup from ~5.8M to ~6.1M IOPS. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: move blk-throtl fast path inlineJens Axboe1-0/+1
Even if no policies are defined, we spend ~2% of the total IO time checking. Move the fast path inline. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18blk-mq: Change shared sbitmap naming to shared tagsJohn Garry1-1/+1
Now that shared sbitmap support really means shared tags, rename symbols to match that. Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1633429419-228500-15-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}()John Garry1-1/+1
To be more concise and consistent in naming, rename blk_mq_sched_free_requests() -> blk_mq_sched_free_rqs(). Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/1633429419-228500-7-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQJohn Garry1-1/+1
It is a bit confusing that there is BLKDEV_MAX_RQ and MAX_SCHED_RQ, as the name BLKDEV_MAX_RQ would imply the max requests always, which it is not. Rename to BLKDEV_MAX_RQ to BLKDEV_DEFAULT_RQ, matching its usage - that being the default number of requests assigned when allocating a request queue. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/1633429419-228500-3-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: print the current process in handle_bad_sectorChristoph Hellwig1-1/+2
Make the bad sector information a little more useful by printing current->comm to identify the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20210928052755.113016-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: move integrity handling out of <linux/blkdev.h>Christoph Hellwig1-0/+1
Split the integrity/metadata handling definitions out into a new header. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-16block: drain file system I/O on del_gendiskChristoph Hellwig1-15/+12
Instead of delaying draining of file system I/O related items like the blk-qos queues, the integrity read workqueue and timeouts only when the request_queue is removed, do that when del_gendisk is called. This is important for SCSI where the upper level drivers that control the gendisk are separate entities, and the disk can be freed much earlier than the request_queue, or can even be unbound without tearing down the queue. Fixes: edb0872f44ec ("block: move the bdi from the request_queue to the gendisk") Reported-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20210929071241.934472-5-hch@lst.de Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-16block: split bio_queue_enter from blk_queue_enterChristoph Hellwig1-8/+25
To prepare for fixing a gendisk shutdown race, open code the blk_queue_enter logic in bio_queue_enter. This also removes the pointless flags translation. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20210929071241.934472-4-hch@lst.de Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-16block: factor out a blk_try_enter_queue helperChristoph Hellwig1-28/+32
Factor out the code to try to get q_usage_counter without blocking into a separate helper. Both to improve code readability and to prepare for splitting bio_queue_enter from blk_queue_enter. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20210929071241.934472-3-hch@lst.de Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-16block: call submit_bio_checks under q_usage_counterChristoph Hellwig1-22/+12
Ensure all bios check the current values of the queue under freeze protection, i.e. to make sure the zero capacity set by del_gendisk is actually seen before dispatching to the driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210929071241.934472-2-hch@lst.de Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-31Merge tag 'io_uring-bio-cache.5-2021-08-30' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull support for struct bio recycling from Jens Axboe: "This adds bio recycling support for polled IO, allowing quick reuse of a bio for high IOPS scenarios via a percpu bio_set list. It's good for almost a 10% improvement in performance, bumping our per-core IO limit from ~3.2M IOPS to ~3.5M IOPS" * tag 'io_uring-bio-cache.5-2021-08-30' of git://git.kernel.dk/linux-block: bio: improve kerneldoc documentation for bio_alloc_kiocb() block: provide bio_clear_hipri() helper block: use the percpu bio cache in __blkdev_direct_IO io_uring: enable use of bio alloc cache block: clear BIO_PERCPU_CACHE flag if polling isn't supported bio: add allocation cache abstraction fs: add kiocb alloc cache flag bio: optimize initialization of a bio
2021-08-31Merge tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-blockLinus Torvalds1-15/+3
Pull block updates from Jens Axboe: "Nothing major in here - lots of good cleanups and tech debt handling, which is also evident in the diffstats. In particular: - Add disk sequence numbers (Matteo) - Discard merge fix (Ming) - Relax disk zoned reporting restrictions (Niklas) - Bio error handling zoned leak fix (Pavel) - Start of proper add_disk() error handling (Luis, Christoph) - blk crypto fix (Eric) - Non-standard GPT location support (Dmitry) - IO priority improvements and cleanups (Damien)o - blk-throtl improvements (Chunguang) - diskstats_show() stack reduction (Abd-Alrhman) - Loop scheduler selection (Bart) - Switch block layer to use kmap_local_page() (Christoph) - Remove obsolete disk_name helper (Christoph) - block_device refcounting improvements (Christoph) - Ensure gendisk always has a request queue reference (Christoph) - Misc fixes/cleanups (Shaokun, Oliver, Guoqing)" * tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-block: (129 commits) sg: pass the device name to blk_trace_setup block, bfq: cleanup the repeated declaration blk-crypto: fix check for too-large dun_bytes blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN blk-zoned: allow zone management send operations without CAP_SYS_ADMIN block: mark blkdev_fsync static block: refine the disk_live check in del_gendisk mmc: sdhci-tegra: Enable MMC_CAP2_ALT_GPT_TEGRA mmc: block: Support alternative_gpt_sector() operation partitions/efi: Support non-standard GPT location block: Add alternative_gpt_sector() operation bio: fix page leak bio_add_hw_page failure block: remove CONFIG_DEBUG_BLOCK_EXT_DEVT block: remove a pointless call to MINOR() in device_add_disk null_blk: add error handling support for add_disk() virtio_blk: add error handling support for add_disk() block: add error handling for device_add_disk / add_disk block: return errors from disk_alloc_events block: return errors from blk_integrity_add block: call blk_register_queue earlier in device_add_disk ...
2021-08-23block: provide bio_clear_hipri() helperJens Axboe1-5/+2
Any case that turns off REQ_HIPRI must also clear BIO_PERCPU_CACHE, as non-polled IO may complete through hard/soft IRQ and hence isn't safe for our polled bio alloc cache. Provide a helper that does just that, and use it in the merging code as well if we split a bio and turn off polling. Fixes: be863b9e4348 ("block: clear BIO_PERCPU_CACHE flag if polling isn't supported") Reported-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23block: clear BIO_PERCPU_CACHE flag if polling isn't supportedJens Axboe1-1/+4
The bio alloc cache relies on the fact that a polled bio will complete in process context, clear the cacheable flag if we disable polling for a given bio. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-17blk-mq: fix kernel panic during iterating over flush requestMing Lei1-1/+0
For fixing use-after-free during iterating over requests, we grabbed request's refcount before calling ->fn in commit 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter"). Turns out this way may cause kernel panic when iterating over one flush request: 1) old flush request's tag is just released, and this tag is reused by one new request, but ->rqs[] isn't updated yet 2) the flush request can be re-used for submitting one new flush command, so blk_rq_init() is called at the same time 3) meantime blk_mq_queue_tag_busy_iter() is called, and old flush request is retrieved from ->rqs[tag]; when blk_mq_put_rq_ref() is called, flush_rq->end_io may not be updated yet, so NULL pointer dereference is triggered in blk_mq_put_rq_ref(). Fix the issue by calling refcount_set(&flush_rq->ref, 1) after flush_rq->end_io is set. So far the only other caller of blk_rq_init() is scsi_ioctl_reset() in which the request doesn't enter block IO stack and the request reference count isn't used, so the change is safe. Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter") Reported-by: "Blank-Burian, Markus, Dr." <blankburian@uni-muenster.de> Tested-by: "Blank-Burian, Markus, Dr." <blankburian@uni-muenster.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/20210811142624.618598-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09block: move the bdi from the request_queue to the gendiskChristoph Hellwig1-10/+3
The backing device information only makes sense for file system I/O, and thus belongs into the gendisk and not the lower level request_queue structure. Move it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210809141744.1203023-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09mm: hide laptop_mode_wb_timer entirely behind the BDI APIChristoph Hellwig1-5/+0
Don't leak the detaіls of the timer into the block layer, instead initialize the timer in bdi_alloc and delete it in bdi_unregister. Note that this means the timer is initialized (but not armed) for non-block queues as well now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210809141744.1203023-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-07block: fix the problem of io_ticks becoming smallerChunguang Xu1-1/+1
On the IO submission path, blk_account_io_start() may interrupt the system interruption. When the interruption returns, the value of part->stamp may have been updated by other cores, so the time value collected before the interruption may be less than part-> stamp. So when this happens, we should do nothing to make io_ticks more accurate? For kernels less than 5.0, this may cause io_ticks to become smaller, which in turn may cause abnormal ioutil values. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/1625521646-1069-1-git-send-email-brookxu.cn@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-01block: remove REQ_OP_SCSI_{IN,OUT}Christoph Hellwig1-2/+0
With the legacy IDE driver gone drivers now use either REQ_OP_DRV_* or REQ_OP_SCSI_*, so unify the two concepts of passthrough requests into a single one. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-03block: Update blk_update_request() documentationBart Van Assche1-8/+4
Although the original intent was to use blk_update_request() in stacking block drivers only, it is used much more widely today. Reflect this in the documentation block above this function. See also: * commit 32fab448e5e8 ("block: add request update interface"). * commit 2e60e02297cf ("block: clean up request completion API"). * commit ed6565e73424 ("block: handle partial completions for special payload requests"). Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210519175226.8853-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-01block: unexport blk_alloc_queueChristoph Hellwig1-1/+0
blk_alloc_queue is just an internal helper now, unexport it and remove it from the public header. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210521055116.1053587-27-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-24block_dump: remove block_dump featurezhangyi (F)1-9/+0
We have already delete block_dump feature in mark_inode_dirty() because it can be replaced by tracepoints, now we also remove the part in submit_bio() for the same reason. The part of block dump feature in submit_bio() dump the write process, write region and sectors on the target disk into kernel message. it can be replaced by block_bio_queue tracepoint in submit_bio_checks(), so we do not need block_dump anymore, remove the whole block_dump feature. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210313030146.2882027-3-yi.zhang@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-06block: refactor the bounce buffering codeChristoph Hellwig1-4/+2
Get rid of all the PFN arithmetics and just use an enum for the two remaining options, and use PageHighMem for the actual bounce decision. Add a fast path to entirely avoid the call for the common case of a queue not using the legacy bouncing code. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20210331073001.46776-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-22block: get rid of the trace rq insert wrapperChaitanya Kulkarni1-0/+1
Get rid of the wrapper for trace_block_rq_insert() and call the function directly. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25block: skip bio_check_eod for partition-remapped biosChristoph Hellwig1-5/+6
When an already remapped bio is resubmitted (e.g. by blk_queue_split), bio_check_eod will compare the remapped bi_sector against the size of the partition, leading to spurious I/O failures. Skip the EOD check in this case. Fixes: 309dca309fc3 ("block: store a block_device pointer in struct bio") Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25block: don't pass BIOSET_NEED_BVECS for q->bio_splitMing Lei1-1/+1
q->bio_split is only used by bio_split() for fast cloning bio, and no need to allocate bvecs, so remove this flag. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Tested-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25blk-mq: use ->bi_bdev for I/O accountingChristoph Hellwig1-1/+5
Remove the reverse map from a sector to a partition for I/O accounting by simply using ->bi_bdev. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25block: use ->bi_bdev for bio based I/O accountingChristoph Hellwig1-10/+13
Rework the I/O accounting for bio based drivers to use ->bi_bdev. This means all drivers can now simply use bio_start_io_acct to start accounting, and it will take partitions into account automatically. To end I/O account either bio_end_io_acct can be used if the driver never remaps I/O to a different device, or bio_end_io_acct_remapped if the driver did remap the I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25block: do not reassig ->bi_bdev when partition remappingChristoph Hellwig1-2/+3
There is no good reason to reassign ->bi_bdev when remapping the partition-relative block number to the device wide one, as all the information required by the drivers comes from the gendisk anyway. Keeping the original ->bi_bdev alive will allow to greatly simplify the partition-away I/O accounting. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25block: simplify submit_bio_checks a bitChristoph Hellwig1-25/+14
Merge a few checks for whole devices vs partitions to streamline the sanity checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25block: store a block_device pointer in struct bioChristoph Hellwig1-20/+17
Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block device. From that the gendisk can be trivially accessed with an extra indirection, but it also allows to directly look up all information related to partition remapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-25block: add a hard-readonly flag to struct gendiskChristoph Hellwig1-3/+1
Commit 20bd1d026aac ("scsi: sd: Keep disk read-only when re-reading partition") addressed a long-standing problem with user read-only policy being overridden as a result of a device-initiated revalidate. The commit has since been reverted due to a regression that left some USB devices read-only indefinitely. To fix the underlying problems with revalidate we need to keep track of hardware state and user policy separately. The gendisk has been updated to reflect the current hardware state set by the device driver. This is done to allow returning the device to the hardware state once the user clears the BLKROSET flag. The resulting semantics are as follows: - If BLKROSET sets a given partition read-only, that partition will remain read-only even if the underlying storage stack initiates a revalidate. However, the BLKRRPART ioctl will cause the partition table to be dropped and any user policy on partitions will be lost. - If BLKROSET has not been set, both the whole disk device and any partitions will reflect the current write-protect state of the underlying device. Based on a patch from Martin K. Petersen <martin.petersen@oracle.com>. Reported-by: Oleksii Kurochko <olkuroch@cisco.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201221 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-01Merge tag 'scsi-fixes' of ↵Linus Torvalds1-6/+7
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is a load of driver fixes (12 ufs, 1 mpt3sas, 1 cxgbi). The big core two fixes are for power management ("block: Do not accept any requests while suspended" and "block: Fix a race in the runtime power management code") which finally sorts out the resume problems we've occasionally been having. To make the resume fix, there are seven necessary precursors which effectively renames REQ_PREEMPT to REQ_PM, so every "special" request in block is automatically a power management exempt one. All of the non-PM preempt cases are removed except for the one in the SCSI Parallel Interface (spi) domain validation which is a genuine case where we have to run requests at high priority to validate the bus so this becomes an autopm get/put protected request" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (22 commits) scsi: cxgb4i: Fix TLS dependency scsi: ufs: Un-inline ufshcd_vops_device_reset function scsi: ufs: Re-enable WriteBooster after device reset scsi: ufs-mediatek: Use correct path to fix compile error scsi: mpt3sas: Signedness bug in _base_get_diag_triggers() scsi: block: Do not accept any requests while suspended scsi: block: Remove RQF_PREEMPT and BLK_MQ_REQ_PREEMPT scsi: core: Only process PM requests if rpm_status != RPM_ACTIVE scsi: scsi_transport_spi: Set RQF_PM for domain validation commands scsi: ide: Mark power management requests with RQF_PM instead of RQF_PREEMPT scsi: ide: Do not set the RQF_PREEMPT flag for sense requests scsi: block: Introduce BLK_MQ_REQ_PM scsi: block: Fix a race in the runtime power management code scsi: ufs-pci: Enable UFSHCD_CAP_RPM_AUTOSUSPEND for Intel controllers scsi: ufs-pci: Fix recovery from hibernate exit errors for Intel controllers scsi: ufs-pci: Ensure UFS device is in PowerDown mode for suspend-to-disk ->poweroff() scsi: ufs-pci: Fix restore from S4 for Intel controllers scsi: ufs-mediatek: Keep VCC always-on for specific devices scsi: ufs: Allow regulators being always-on scsi: ufs: Clear UAC for RPMB after ufshcd resets ...
2020-12-09scsi: block: Do not accept any requests while suspendedAlan Stern1-3/+4
blk_queue_enter() accepts BLK_MQ_REQ_PM requests independent of the runtime power management state. Now that SCSI domain validation no longer depends on this behavior, modify the behavior of blk_queue_enter() as follows: - Do not accept any requests while suspended. - Only process power management requests while suspending or resuming. Submitting BLK_MQ_REQ_PM requests to a device that is runtime suspended causes runtime-suspended devices not to resume as they should. The request which should cause a runtime resume instead gets issued directly, without resuming the device first. Of course the device can't handle it properly, the I/O fails, and the device remains suspended. The problem is fixed by checking that the queue's runtime-PM status isn't RPM_SUSPENDED before allowing a request to be issued, and queuing a runtime-resume request if it is. In particular, the inline blk_pm_request_resume() routine is renamed blk_pm_resume_queue() and the code is unified by merging the surrounding checks into the routine. If the queue isn't set up for runtime PM, or there currently is no restriction on allowed requests, the request is allowed. Likewise if the BLK_MQ_REQ_PM flag is set and the status isn't RPM_SUSPENDED. Otherwise a runtime resume is queued and the request is blocked until conditions are more suitable. [ bvanassche: modified commit message and removed Cc: stable because without the previous patches from this series this patch would break parallel SCSI domain validation + introduced queue_rpm_status() ] Link: https://lore.kernel.org/r/20201209052951.16136-9-bvanassche@acm.org Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Can Guo <cang@codeaurora.org> Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reported-and-tested-by: Martin Kepplinger <martin.kepplinger@puri.sm> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-09scsi: block: Remove RQF_PREEMPT and BLK_MQ_REQ_PREEMPTBart Van Assche1-4/+3
Remove flag RQF_PREEMPT and BLK_MQ_REQ_PREEMPT since these are no longer used by any kernel code. Link: https://lore.kernel.org/r/20201209052951.16136-8-bvanassche@acm.org Cc: Can Guo <cang@codeaurora.org> Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Ming Lei <ming.lei@redhat.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Martin Kepplinger <martin.kepplinger@puri.sm> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-09scsi: block: Introduce BLK_MQ_REQ_PMBart Van Assche1-3/+4
Introduce the BLK_MQ_REQ_PM flag. This flag makes the request allocation functions set RQF_PM. This is the first step towards removing BLK_MQ_REQ_PREEMPT. Link: https://lore.kernel.org/r/20201209052951.16136-3-bvanassche@acm.org Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Can Guo <cang@codeaurora.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>