summaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)AuthorFilesLines
2021-10-25block: add single bio async direct IO helperPavel Begunkov1-3/+84
As with __blkdev_direct_IO_simple(), we can implement direct IO more efficiently if there is only one bio. Add __blkdev_direct_IO_async() and blkdev_bio_end_io_async(). This patch brings me from 4.45-4.5 MIOPS with nullblk to 4.7+. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f0ae4109b7a6934adede490f84d188d53b97051b.1635006010.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-23Merge tag 'block-5.15-2021-10-22' of git://git.kernel.dk/linux-blockLinus Torvalds2-2/+4
Pull block fixes from Jens Axboe: "Fix for the cgroup code not ussing irq safe stats updates, and one fix for an error handling condition in add_partition()" * tag 'block-5.15-2021-10-22' of git://git.kernel.dk/linux-block: block: fix incorrect references to disk objects blk-cgroup: blk_cgroup_bio_start() should use irq-safe operations on blkg->iostat_cpu
2021-10-22blk-mq-sched: Don't reference queue tagset in blk_mq_sched_tags_teardown()John Garry1-1/+1
We should not reference the queue tagset in blk_mq_sched_tags_teardown() (see function comment) for the blk-mq flags, so use the passed flags instead. This solves a use-after-free, similarly fixed earlier (and since broken again) in commit f0c1c4d2864e ("blk-mq: fix use-after-free in blk_mq_exit_sched"). Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Fixes: e155b0c238b2 ("blk-mq: Use shared tags for shared sbitmap support") Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1634890340-15432-1-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22block: fix req_bio_endio append error handlingPavel Begunkov1-1/+1
Shinichiro Kawasaki reports that there is a bug in a recent req_bio_endio() patch causing problems with zonefs. As Shinichiro suggested, inverse the condition in zone append path to resemble how it was before: fail when it's not fully completed. Fixes: 478eb72b815f3 ("block: optimise req_bio_endio()") Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/344ea4e334aace9148b41af5f2426da38c8aa65a.1634914228.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22block: simplify the block device syncing codeChristoph Hellwig1-3/+14
Get rid of the indirections and just provide a sync_bdevs helper for the generic sync code. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062530.2174626-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22block: remove __sync_blockdevChristoph Hellwig1-5/+6
Instead offer a new sync_blockdev_nowait helper for the !wait case. This new helper is exported as it will grow modular callers in a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062530.2174626-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22block: remove QUEUE_FLAG_SCSI_PASSTHROUGHChristoph Hellwig1-1/+0
Export scsi_device_from_queue for use with pktcdvd and use that instead of the otherwise unused QUEUE_FLAG_SCSI_PASSTHROUGH queue flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20211021060607.264371-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22block: remove the initialize_rq_fn blk_mq_ops methodChristoph Hellwig1-8/+1
Entirely unused now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20211021060607.264371-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22bsg-lib: initialize the bsg_job in bsg_transport_sg_io_fnChristoph Hellwig1-19/+13
Directly initialize the bsg_job structure instead of relying on the ->.initialize_rq_fn indirection. This also removes the superflous initialization of the second request used for BIDI requests. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20211021060607.264371-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21blk-crypto: rename blk_keyslot_manager to blk_crypto_profileEric Biggers4-318/+304
blk_keyslot_manager is misnamed because it doesn't necessarily manage keyslots. It actually does several different things: - Contains the crypto capabilities of the device. - Provides functions to control the inline encryption hardware. Originally these were just for programming/evicting keyslots; however, new functionality (hardware-wrapped keys) will require new functions here which are unrelated to keyslots. Moreover, device-mapper devices already (ab)use "keyslot_evict" to pass key eviction requests to their underlying devices even though device-mapper devices don't have any keyslots themselves (so it really should be "evict_key", not "keyslot_evict"). - Sometimes (but not always!) it manages keyslots. Originally it always did, but device-mapper devices don't have keyslots themselves, so they use a "passthrough keyslot manager" which doesn't actually manage keyslots. This hack works, but the terminology is unnatural. Also, some hardware doesn't have keyslots and thus also uses a "passthrough keyslot manager" (support for such hardware is yet to be upstreamed, but it will happen eventually). Let's stop having keyslot managers which don't actually manage keyslots. Instead, rename blk_keyslot_manager to blk_crypto_profile. This is a fairly big change, since for consistency it also has to update keyslot manager-related function names, variable names, and comments -- not just the actual struct name. However it's still a fairly straightforward change, as it doesn't change any actual functionality. Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20211018180453.40441-4-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21blk-crypto: rename keyslot-manager files to blk-crypto-profileEric Biggers4-4/+4
In preparation for renaming struct blk_keyslot_manager to struct blk_crypto_profile, rename the keyslot-manager.h and keyslot-manager.c source files. Renaming these files separately before making a lot of changes to their contents makes it easier for git to understand that they were renamed. Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20211018180453.40441-3-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21blk-crypto-fallback: properly prefix function and struct namesEric Biggers1-29/+30
For clarity, avoid using just the "blk_crypto_" prefix for functions and structs that are specific to blk-crypto-fallback. Instead, use "blk_crypto_fallback_". Some places already did this, but others didn't. This is also a prerequisite for using "struct blk_crypto_keyslot" to mean a generic blk-crypto keyslot (which is what it sounds like). Rename the fallback one to "struct blk_crypto_fallback_keyslot". No change in behavior. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20211018180453.40441-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21block: Add invalidate_disk() helper to invalidate the gendiskXie Yongji1-0/+20
To hide internal implementation and simplify some driver code, this adds a helper to invalidate the gendisk. It will clean the gendisk's associated buffer/page caches and reset its internal states. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210922123711.187-2-xieyongji@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21block: kill extra rcu lock/unlock in queue enterPavel Begunkov1-1/+1
blk_try_enter_queue() already takes rcu_read_lock/unlock, so we can avoid the second pair in percpu_ref_tryget_live(), use a newly added percpu_ref_tryget_live_rcu(). As rcu_read_lock/unlock imply barrier()s, it's pretty noticeable, especially for for !CONFIG_PREEMPT_RCU (default for some distributions), where __rcu_read_lock/unlock() are not inlined. 3.20% io_uring [kernel.vmlinux] [k] __rcu_read_unlock 3.05% io_uring [kernel.vmlinux] [k] __rcu_read_lock 2.52% io_uring [kernel.vmlinux] [k] __rcu_read_unlock 2.28% io_uring [kernel.vmlinux] [k] __rcu_read_lock Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6b11c67ea495ed9d44f067622d852de4a510ce65.1634822969.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21block: convert fops.c magic constants to SHIFT_SECTORPavel Begunkov1-8/+10
Don't use shifting by a magic number 9 but replace with a more descriptive SHIFT_SECTOR. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/068782b9f7e97569fb59a99529b23bb17ea4c5e2.1634755800.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21block: clean up blk_mq_submit_bio() mergingPavel Begunkov3-20/+9
Combine blk_mq_sched_bio_merge() and blk_attempt_plug_merge() under a common if, so we don't check it twice. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/daedc90d4029a5d1d73344771632b1faca3aaf81.1634755800.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21block: optimise boundary blkdev_read_iter's checksPavel Begunkov1-8/+11
Combine pos and len checks and mark unlikely. Also, don't reexpand if it's not truncated. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fff34e613aeaae1ad12977dc4592cb1a1f5d3190.1634755800.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21fs: bdev: fix conflicting comment from lookup_bdevJackie Liu1-3/+5
We switched to directly use dev_t to get block device, lookup changed the meaning of use, now we fix this conflicting comment. Fixes: 4e7b5671c6a8 ("block: remove i_bdev") Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211021071344.1600362-1-liu.yun@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21blk-mq: Fix blk_mq_tagset_busy_iter() for shared tagsJohn Garry1-2/+5
Since it is now possible for a tagset to share a single set of tags, the iter function should not re-iter the tags for the count of #hw queues in that case. Rather it should just iter once. Fixes: e155b0c238b2 ("blk-mq: Use shared tags for shared sbitmap support") Reported-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Link: https://lore.kernel.org/r/1634550083-202815-1-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20block: cleanup the flush plug helpersChristoph Hellwig1-7/+6
Consolidate the various helpers into a single blk_flush_plug helper that takes a plk_plug and the from_scheduler bool and switch all callsites to call it directly. Checks that the plug is non-NULL must be performed by the caller, something that most already do anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211020144119.142582-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20block: optimise blk_flush_plug_listPavel Begunkov1-2/+2
Don't call flush_plug_callbacks if there are no plug callbacks. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211020144119.142582-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20blk-mq: move blk_mq_flush_plug_list to block/blk-mq.hChristoph Hellwig1-0/+1
This helper is internal to the block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211020144119.142582-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20blk-mq: only flush requests from the plug in blk_mq_submit_bioChristoph Hellwig1-1/+1
Replace the call to blk_flush_plug_list in blk_mq_submit_bio with a direct call to blk_mq_flush_plug_list. This means we do not flush plug callback from stackable devices, which doesn't really help with the accumulated requests anyway, and it also means the cached requests aren't freed here as they can still be used later on. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211020144119.142582-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20block: remove inaccurate requeue checkJens Axboe1-1/+0
This check is meant to catch cases where a requeue is attempted on a request that is still inserted. It's never really been useful to catch any misuse, and now it's actively wrong. Outside of that, this should not be a BUG_ON() to begin with. Remove the check as it's now causing active harm, as requeue off the plug path will trigger it even though the request state is just fine. Reported-by: Yi Zhang <yi.zhang@redhat.com> Link: https://lore.kernel.org/linux-block/CAHj4cs80zAUc2grnCZ015-2Rvd-=gXRfB_dFKy=RTm+wRo09HQ@mail.gmail.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20block: inline a part of bio_release_pages()Pavel Begunkov1-5/+2
Inline BIO_NO_PAGE_REF check of bio_release_pages() to avoid function call. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20block: don't bloat enter_queue with percpu_refPavel Begunkov1-1/+1
percpu_ref_put() are inlined for performance and bloat the binary, we don't care about the fail case of blk_try_enter_queue(), so we can replace it with a call to blk_queue_exit(). Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20block: optimise req_bio_endio()Pavel Begunkov1-9/+7
First, get rid of an extra branch and chain error checks. Also reshuffle it with bio_advance(), so it goes closer to the final check, with that the compiler loads rq->rq_flags only once, and also doesn't reload bio->bi_iter.bi_size if bio_advance() didn't actually advanced the iter. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20block: convert leftovers to bdev_get_queuePavel Begunkov2-2/+2
Convert bdev->bd_disk->queue to bdev_get_queue(), which is faster. Apparently, there are a few such spots in block that got lost during rebases. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20blk-mq: support concurrent queue quiesce/unquiesceMing Lei1-3/+19
blk_mq_quiesce_queue() has been used a bit wide now, so far we don't support concurrent/nested quiesce. One biggest issue is that unquiesce can happen unexpectedly in case that quiesce/unquiesce are run concurrently from more than one context. This patch introduces q->mq_quiesce_depth to deal concurrent quiesce, and we only unquiesce queue when it is the last/outer-most one of all contexts. Several kernel panic issue has been reported[1][2][3] when running stress quiesce test. And this patch has been verified in these reports. [1] https://lore.kernel.org/linux-block/9b21c797-e505-3821-4f5b-df7bf9380328@huawei.com/T/#m1fc52431fad7f33b1ffc3f12c4450e4238540787 [2] https://lore.kernel.org/linux-block/9b21c797-e505-3821-4f5b-df7bf9380328@huawei.com/T/#m10ad90afeb9c8cc318334190a7c24c8b5c5e0722 [3] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211014081710.1871747-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20block, bfq: fix UAF problem in bfqg_stats_init()Zheng Liang1-5/+7
In bfq_pd_alloc(), the function bfqg_stats_init() init bfqg. If blkg_rwstat_init() init bfqg_stats->bytes successful and init bfqg_stats->ios failed, bfqg_stats_init() return failed, bfqg will be freed. But blkg_rwstat->cpu_cnt is not deleted from the list of percpu_counters. If we traverse the list of percpu_counters, It will have UAF problem. we should use blkg_rwstat_exit() to cleanup bfqg_stats bytes in the above scenario. Fixes: commit fd41e60331b ("bfq-iosched: stop using blkg->stat_bytes and ->stat_ios") Signed-off-by: Zheng Liang <zhengliang6@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20211018024225.1493938-1-zhengliang6@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20block: inline fast path of driver tag allocationJens Axboe2-6/+17
If we don't use an IO scheduler or have shared tags, then we don't need to call into this external function at all. This saves ~2% for such a setup. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19blk-mq: don't handle non-flush requests in blk_insert_flushChristoph Hellwig3-15/+13
Return to the normal blk_mq_submit_bio flow if the bio did not end up actually being a flush because the device didn't support it. Note that this is basically impossible to hit without special instrumentation given that submit_bio_checks already clears these flags usually, so we'd need a tight race to actually hit this code path. With this the call to blk_mq_run_hw_queue for the flush requests can be removed given that the actual flush requests are always issued via the requeue workqueue which runs the queue unconditionally. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019122553.2467817-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: attempt direct issue of plug listJens Axboe2-0/+61
If we have just one queue type in the plug list, then we can extend our direct issue to cover a full plug list as well. This allows sending a batch of requests for direct issue, which is more efficient than doing one-at-a-time kind of issue. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: change plugging to use a singly linked listJens Axboe3-39/+49
Use a singly linked list for the blk_plug. This saves 8 bytes in the blk_plug struct, and makes for faster list manipulations than doubly linked lists. As we don't use the doubly linked lists for anything, singly linked is just fine. This yields a bump in default (merging enabled) performance from 7.0 to 7.1M IOPS, and ~7.5M IOPS with merging disabled. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19partitions/ibm: use bdev_nr_sectors instead of open coding itChristoph Hellwig1-9/+10
Use the proper helper to read the block device size and switch various places to pass the size in terms of sectors which is more practical. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062024.2171074-4-hch@lst.de [axboe: fix comment typo] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19partitions/efi: use bdev_nr_bytes instead of open coding itChristoph Hellwig1-1/+1
Use the proper helper to read the block device size. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062024.2171074-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block/ioctl: use bdev_nr_sectors and bdev_nr_bytesChristoph Hellwig1-12/+8
Use the proper helper to read the block device size. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019062024.2171074-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19blk-wbt: prevent NULL pointer dereference in wb_timer_fnAndrea Righi1-0/+3
The timer callback used to evaluate if the latency is exceeded can be executed after the corresponding disk has been released, causing the following NULL pointer dereference: [ 119.987108] BUG: kernel NULL pointer dereference, address: 0000000000000098 [ 119.987617] #PF: supervisor read access in kernel mode [ 119.987971] #PF: error_code(0x0000) - not-present page [ 119.988325] PGD 7c4a4067 P4D 7c4a4067 PUD 7bf63067 PMD 0 [ 119.988697] Oops: 0000 [#1] SMP NOPTI [ 119.988959] CPU: 1 PID: 9353 Comm: cloud-init Not tainted 5.15-rc5+arighi #rc5+arighi [ 119.989520] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 [ 119.990055] RIP: 0010:wb_timer_fn+0x44/0x3c0 [ 119.990376] Code: 41 8b 9c 24 98 00 00 00 41 8b 94 24 b8 00 00 00 41 8b 84 24 d8 00 00 00 4d 8b 74 24 28 01 d3 01 c3 49 8b 44 24 60 48 8b 40 78 <4c> 8b b8 98 00 00 00 4d 85 f6 0f 84 c4 00 00 00 49 83 7c 24 30 00 [ 119.991578] RSP: 0000:ffffb5f580957da8 EFLAGS: 00010246 [ 119.991937] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004 [ 119.992412] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88f476d7f780 [ 119.992895] RBP: ffffb5f580957dd0 R08: 0000000000000000 R09: 0000000000000000 [ 119.993371] R10: 0000000000000004 R11: 0000000000000002 R12: ffff88f476c84500 [ 119.993847] R13: ffff88f4434390c0 R14: 0000000000000000 R15: ffff88f4bdc98c00 [ 119.994323] FS: 00007fb90bcd9c00(0000) GS:ffff88f4bdc80000(0000) knlGS:0000000000000000 [ 119.994952] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 119.995380] CR2: 0000000000000098 CR3: 000000007c0d6000 CR4: 00000000000006e0 [ 119.995906] Call Trace: [ 119.996130] ? blk_stat_free_callback_rcu+0x30/0x30 [ 119.996505] blk_stat_timer_fn+0x138/0x140 [ 119.996830] call_timer_fn+0x2b/0x100 [ 119.997136] __run_timers.part.0+0x1d1/0x240 [ 119.997470] ? kvm_clock_get_cycles+0x11/0x20 [ 119.997826] ? ktime_get+0x3e/0xa0 [ 119.998110] ? native_apic_msr_write+0x2c/0x30 [ 119.998456] ? lapic_next_event+0x20/0x30 [ 119.998779] ? clockevents_program_event+0x94/0xf0 [ 119.999150] run_timer_softirq+0x2a/0x50 [ 119.999465] __do_softirq+0xcb/0x26f [ 119.999764] irq_exit_rcu+0x8c/0xb0 [ 120.000057] sysvec_apic_timer_interrupt+0x43/0x90 [ 120.000429] ? asm_sysvec_apic_timer_interrupt+0xa/0x20 [ 120.000836] asm_sysvec_apic_timer_interrupt+0x12/0x20 In this case simply return from the timer callback (no action required) to prevent the NULL pointer dereference. BugLink: https://bugs.launchpad.net/bugs/1947557 Link: https://lore.kernel.org/linux-mm/YWRNVTk9N8K0RMst@arighi-desktop/ Fixes: 34dbad5d26e2 ("blk-stat: convert to callback-based statistics reporting") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Link: https://lore.kernel.org/r/YW6N2qXpBU3oc50q@arighi-desktop Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: align blkdev_dio inlined bio to a cachelineJens Axboe1-1/+1
We get all sorts of unreliable and funky results since the bio is designed to align on a cacheline, which it does not when inlined like this. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: move blk_mq_tag_to_rq() inlineJens Axboe2-34/+0
This is in the fast path of driver issue or completion, and it's a single array index operation. Move it inline to avoid a function call for it. This does mean making struct blk_mq_tags block layer public, but there's not really much in there. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: get rid of plug list sortingJens Axboe1-19/+0
Even if we have multiple queues in the plug list, chances that they are very interspersed is minimal. Don't bother spending CPU cycles sorting the list. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: return whether or not to unplug through booleanJens Axboe3-14/+15
Instead of returning the same queue request through a request pointer, use a boolean to accomplish the same. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: don't call blk_status_to_errno in blk_update_requestChristoph Hellwig1-1/+1
We only need to call it to resolve the blk_status_t -> errno mapping for tracing, so move the conversion into the tracepoints that are not called at all when tracing isn't enabled. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: move bdev_read_only() into the headerJens Axboe1-6/+0
This is called for every write in the fast path, move it inline next to get_disk_ro() which is called internally. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19block: fix too broad elevator check in blk_mq_free_request()Jens Axboe1-1/+1
We added RQF_ELV to tell whether there's an IO scheduler attached, and RQF_ELVPRIV tells us whether there's an IO scheduler with private data attached. Don't check RQF_ELV in blk_mq_free_request(), what we care about here is just if we have scheduler private data attached. This fixes a boot crash Fixes: 2ff0682da6e0 ("block: store elevator state in request") Reported-by: Yi Zhang <yi.zhang@redhat.com> Reported-by: syzbot+eb8104072aeab6cc1195@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: cache inode size in bdevJens Axboe2-0/+2
Reading the inode size brings in a new cacheline for IO submit, and it's in the hot path being checked for every single IO. When doing millions of IOs per core per second, this is noticeable overhead. Cache the nr_sectors in the bdev itself. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: use bdev_nr_bytes instead of open coding it in blkdev_fallocateChristoph Hellwig1-1/+1
Use the proper helper to read the block device size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20211018101130.1838532-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: add support for blk_mq_end_request_batch()Jens Axboe3-19/+70
Instead of calling blk_mq_end_request() on a single request, add a helper that takes the new struct io_comp_batch and completes any request stored in there. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: add a struct io_comp_batch argument to fops->iopoll()Jens Axboe5-12/+15
struct io_comp_batch contains a list head and a completion handler, which will allow completions to more effciently completed batches of IO. For now, no functional changes in this patch, we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: provide helpers for rq_list manipulationJens Axboe1-14/+5
Instead of open-coding the list additions, traversal, and removal, provide a basic set of helpers. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>