summaryrefslogtreecommitdiff
path: root/block/blk-merge.c
AgeCommit message (Collapse)AuthorFilesLines
2019-05-23block: remove the bi_seg_{front,back}_size fields in struct bioChristoph Hellwig1-82/+12
At this point these fields aren't used for anything, so we can remove them. Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-23block: remove the segment size check in bio_will_gapChristoph Hellwig1-18/+1
We fundamentally do not have a maximum segement size for devices with a virt boundary. So don't bother checking it, especially given that the existing checks didn't properly work to start with as we never fully update the front/back segment size and miss the bi_seg_front_size that wuld have been required for some cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-23block: don't decrement nr_phys_segments for physically contigous segmentsChristoph Hellwig1-22/+1
Currently ll_merge_requests_fn, unlike all other merge functions, reduces nr_phys_segments by one if the last segment of the previous, and the first segment of the next segement are contigous. While this seems like a nice solution to avoid building smaller than possible requests it causes a mismatch between the segments actually present in the request and those iterated over by the bvec iterators, including __rq_for_each_bio. This can for example mistrigger the single segment optimization in the nvme-pci driver, and might lead to mismatching nr_phys_segments number when recalculating the number of request when inserting a cloned request. We could possibly work around this by making the bvec iterators take the front and back segment size into account, but that would require moving them from the bio to the bio_iter and spreading this mess over all users of bvecs. Or we could simply remove this optimization under the assumption that most users already build good enough bvecs, and that the bio merge patch never cared about this optimization either. The latter is what this patch does. dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests"). Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-22block: avoid scatterlist offsets > PAGE_SIZEChristoph Hellwig1-1/+13
While we generally allow scatterlists to have offsets larger than page size for an entry, and other subsystems like the crypto code make use of that, the block layer isn't quite ready for that. Flip the switch back to avoid them for now, and revisit that decision early in a merge window once the known offenders are fixed. Fixes: 8a96a0e40810 ("block: rewrite blk_bvec_map_sg to avoid a nth_page call") Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-12block: rewrite blk_bvec_map_sg to avoid a nth_page callChristoph Hellwig1-15/+6
The offset in scatterlists is allowed to be larger than the page size, so don't go to great length to avoid that case and simplify the arithmetics. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-08block: fix build warning in merging bvecsMing Lei1-6/+10
Commit f6970f83ef79 ("block: don't check if adjacent bvecs in one bio can be mergeable") changes bvec merge by only considering two bvecs from different bios. However, if the former bio doesn't inlcude any io bvec, then the following warning may be triggered: warning: ‘bvec.bv_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized] In practice, it shouldn't be triggered. Fixes it by adding check on former bio, the check shouldn't add any cost given 'bio->bi_iter' can be hit in cache. Reported-by: Jens Axboe <axboe@kernel.dk> Fixes: f6970f83ef79 ("block: don't check if adjacent bvecs in one bio can be mergeable") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-01block: don't check if adjacent bvecs in one bio can be mergeableMing Lei1-27/+42
Now both passthrough and FS IO have supported multi-page bvec, and bvec merging has been handled actually when adding page to bio, then adjacent bvecs won't be mergeable any more if they belong to same bio. So only try to merge bvecs if they are from different bios. Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-01block: reuse __blk_bvec_map_sg() for mapping page sized bvecMing Lei1-11/+9
Inside __blk_segment_map_sg(), page sized bvec mapping is optimized a bit with one standalone branch. So reuse __blk_bvec_map_sg() to do that. Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-01block: remove argument of 'request_queue' from __blk_bvec_map_sgMing Lei1-3/+3
The argument of 'request_queue' isn't used by __blk_bvec_map_sg(), so remove it. Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-01block: don't merge adjacent bvecs to one segment in bio blk_queue_splitMing Lei1-17/+0
For normal filesystem IO, each page is added via blk_add_page(), in which bvec(page) merge has been handled already, and basically not possible to merge two adjacent bvecs in one bio. So not try to merge two adjacent bvecs in blk_queue_split(). Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-06block: fix segment calculation for passthrough IOMing Lei1-7/+8
blk_recount_segments() can be called in bio_add_pc_page() for calculating how many segments this bio will has after one page is added to this bio. If the resulted segment number is beyond the queue limit, the added page will be removed. The try-and-fix policy requires blk_recount_segments(__blk_recalc_rq_segments) to not consider the segment number limit. Unfortunately bvec_split_segs() does check this limit, and causes small segment number returned to bio_add_pc_page(), then page still may be added to the bio even though segment number limit becomes broken. Fixes this issue by not considering segment number limit when calcualting bio's segment number. Fixes: dcebd755926b ("block: use bio_for_each_bvec() to compute multi-page bvec count") Cc: Christoph Hellwig <hch@lst.de> Cc: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-02block: fix updating bio's front segment sizeMing Lei1-0/+8
When the current bvec can be merged to the 1st segment, the bio's front segment size has to be updated. However, dcebd755926b doesn't consider that case, then bio's front segment size may not be correct. This patch fixes this issue. Cc: Christoph Hellwig <hch@lst.de> Cc: Omar Sandoval <osandov@fb.com> Fixes: dcebd755926b ("block: use bio_for_each_bvec() to compute multi-page bvec count") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-27block: optimize blk_bio_segment_split for single-page bvecMing Lei1-3/+9
Introduce a fast path for single-page bvec IO, then we can avoid to call bvec_split_segs() unnecessarily. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-27block: optimize __blk_segment_map_sg() for single-page bvecMing Lei1-2/+7
Introduce a fast path for single-page bvec IO, then blk_bvec_map_sg() can be avoided. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-27block: introduce bvec_nth_page()Ming Lei1-1/+1
Single-page bvec can often be seen in small BS workloads, so introduce bvec_nth_page() for avoiding to call nth_page() unnecessarily, which looks not cheap. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-19block: avoid to READ fields of null bioMing Lei1-1/+3
rq->bio can be NULL sometimes, such as flush request, so don't read bio->bi_seg_front_size until this 'bio' is checked as valid. Cc: Bart Van Assche <bvanassche@acm.org> Reported-by: Bart Van Assche <bvanassche@acm.org> Fixes: dcebd755926b0f39dd1e ("block: use bio_for_each_bvec() to compute multi-page bvec count") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15block: kill QUEUE_FLAG_NO_SG_MERGEMing Lei1-25/+6
Since bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting"), physical segment number is mainly figured out in blk_queue_split() for fast path, and the flag of BIO_SEG_VALID is set there too. Now only blk_recount_segments() and blk_recalc_rq_segments() use this flag. Basically blk_recount_segments() is bypassed in fast path given BIO_SEG_VALID is set in blk_queue_split(). For another user of blk_recalc_rq_segments(): - run in partial completion branch of blk_update_request, which is an unusual case - run in blk_cloned_rq_check_limits(), still not a big problem if the flag is killed since dm-rq is the only user. Multi-page bvec is enabled now, not doing S/G merging is rather pointless with the current setup of the I/O path, as it isn't going to save you a significant amount of cycles. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15block: use bio_for_each_bvec() to map sgMing Lei1-20/+50
It is more efficient to use bio_for_each_bvec() to map sg, meantime we have to consider splitting multipage bvec as done in blk_bio_segment_split(). Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15block: use bio_for_each_bvec() to compute multi-page bvec countMing Lei1-20/+83
First it is more efficient to use bio_for_each_bvec() in both blk_bio_segment_split() and __blk_recalc_rq_segments() to compute how many multi-page bvecs there are in the bio. Secondly once bio_for_each_bvec() is used, the bvec may need to be splitted because its length can be very longer than max segment size, so we have to split the big bvec into several segments. Thirdly when splitting multi-page bvec into segments, the max segment limit may be reached, so the bio split need to be considered under this situation too. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15block: don't use bio->bi_vcnt to figure out segment numberMing Lei1-7/+1
It is wrong to use bio->bi_vcnt to figure out how many segments there are in the bio even though CLONED flag isn't set on this bio, because this bio may be splitted or advanced. So always use bio_segments() in blk_recount_segments(), and it shouldn't cause any performance loss now because the physical segment number is figured out in blk_queue_split() and BIO_SEG_VALID is set meantime since bdced438acd83ad83a6c ("block: setup bi_phys_segments after splitting"). Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixes: 76d8137a3113 ("blk-merge: recaculate segment if it isn't less than max segments") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-27Revert "block: cover another queue enter recursion via BIO_QUEUE_ENTERED"Jens Axboe1-0/+10
We can't touch a bio after ->make_request_fn(), for all we know it could already have been completed by the time this function returns. This reverts commit 698cef173983b086977e633e46476e0f925ca01e. Reported-by: syzbot+4df6ca820108fd248943@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-01-22block: cover another queue enter recursion via BIO_QUEUE_ENTEREDMing Lei1-10/+0
Except for blk_queue_split(), bio_split() is used for splitting bio too, then the remained bio is often resubmit to queue via generic_make_request(). So the same queue enter recursion exits in this case too. Unfortunatley commit cd4a4ae4683dc2 doesn't help this case. This patch covers the above case by setting BIO_QUEUE_ENTERED before calling q->make_request_fn. In theory the per-bio flag is used to simulate one stack variable, it is just fine to clear it after q->make_request_fn is returned. Especially the same bio can't be submitted from another context. Fixes: cd4a4ae4683dc2 ("block: don't use blocking queue entered for recursive bio submits") Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: NeilBrown <neilb@suse.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-29Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-11/+7
Pull SCSI updates from James Bottomley: "This is mostly update of the usual drivers: smarpqi, lpfc, qedi, megaraid_sas, libsas, zfcp, mpt3sas, hisi_sas. Additionally, we have a pile of annotation, unused variable and minor updates. The big API change is the updates for Christoph's DMA rework which include removing the DISABLE_CLUSTERING flag. And finally there are a couple of target tree updates" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (259 commits) scsi: isci: request: mark expected switch fall-through scsi: isci: remote_node_context: mark expected switch fall-throughs scsi: isci: remote_device: Mark expected switch fall-throughs scsi: isci: phy: Mark expected switch fall-through scsi: iscsi: Capture iscsi debug messages using tracepoints scsi: myrb: Mark expected switch fall-throughs scsi: megaraid: fix out-of-bound array accesses scsi: mpt3sas: mpt3sas_scsih: Mark expected switch fall-through scsi: fcoe: remove set but not used variable 'port' scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown() scsi: smartpqi: fix build warnings scsi: smartpqi: update driver version scsi: smartpqi: add ofa support scsi: smartpqi: increase fw status register read timeout scsi: smartpqi: bump driver version scsi: smartpqi: add smp_utils support scsi: smartpqi: correct lun reset issues scsi: smartpqi: correct volume status scsi: smartpqi: do not offline disks for transient did no connect conditions scsi: smartpqi: allow for larger raid maps ...
2018-12-19scsi: block: remove the cluster flagChristoph Hellwig1-11/+7
Now that the the SCSI layer replaced the use of the cluster flag with segment size limits and the DMA boundary we can remove the cluster flag from the block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-14block: remove the blk_recount_segments exportChristoph Hellwig1-1/+0
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-10block: delete part_round_stats and switch to less precise countingMikulas Patocka1-1/+0
We want to convert to per-cpu in_flight counters. The function part_round_stats needs the in_flight counter every jiffy, it would be too costly to sum all the percpu variables every jiffy, so it must be deleted. part_round_stats is used to calculate two counters - time_in_queue and io_ticks. time_in_queue can be calculated without part_round_stats, by adding the duration of the I/O when the I/O ends (the value is almost as exact as the previously calculated value, except that time for in-progress I/Os is not counted). io_ticks can be approximated by increasing the value when I/O is started or ended and the jiffies value has changed. If the I/Os take less than a jiffy, the value is as exact as the previously calculated value. If the I/Os take more than a jiffy, io_ticks can drift behind the previously calculated value. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-10block: stop passing 'cpu' to all percpu stats methodsMike Snitzer1-3/+2
All of part_stat_* and related methods are used with preempt disabled, so there is no need to pass cpu around to allow of them. Just call smp_processor_id() as needed. Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-04Merge tag 'v4.20-rc5' into for-4.21/blockJens Axboe1-1/+1
Pull in v4.20-rc5, solving a conflict we'll otherwise get in aio.c and also getting the merge fix that went into mainline that users are hitting testing for-4.21/block and/or for-next. * tag 'v4.20-rc5': (664 commits) Linux 4.20-rc5 PCI: Fix incorrect value returned from pcie_get_speed_cap() MAINTAINERS: Update linux-mips mailing list address ocfs2: fix potential use after free mm/khugepaged: fix the xas_create_range() error path mm/khugepaged: collapse_shmem() do not crash on Compound mm/khugepaged: collapse_shmem() without freezing new_page mm/khugepaged: minor reorderings in collapse_shmem() mm/khugepaged: collapse_shmem() remember to clear holes mm/khugepaged: fix crashes due to misaccounted holes mm/khugepaged: collapse_shmem() stop if punched or truncated mm/huge_memory: fix lockdep complaint on 32-bit i_size_read() mm/huge_memory: splitting set mapping+index before unfreeze mm/huge_memory: rename freeze_page() to unmap_page() initramfs: clean old path before creating a hardlink kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace psi: make disabling/enabling easier for vendor kernels proc: fixup map_files test on arm debugobjects: avoid recursive calls with kmemleak userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set ...
2018-11-30block: fix single range discard mergeMing Lei1-1/+1
There are actually two kinds of discard merge: - one is the normal discard merge, just like normal read/write request, and call it single-range discard - another is the multi-range discard, queue_max_discard_segments(rq->q) > 1 For the former case, queue_max_discard_segments(rq->q) is 1, and we should handle this kind of discard merge like the normal read/write request. This patch fixes the following kernel panic issue[1], which is caused by not removing the single-range discard request from elevator queue. Guangwu has one raid discard test case, in which this issue is a bit easier to trigger, and I verified that this patch can fix the kernel panic issue in Guangwu's test case. [1] kernel panic log from Jens's report BUG: unable to handle kernel NULL pointer dereference at 0000000000000148 PGD 0 P4D 0. Oops: 0000 [#1] SMP PTI CPU: 37 PID: 763 Comm: kworker/37:1H Not tainted \ 4.20.0-rc3-00649-ge64d9a554a91-dirty #14 Hardware name: Wiwynn \ Leopard-Orv2/Leopard-DDR BW, BIOS LBM08 03/03/2017 Workqueue: kblockd \ blk_mq_run_work_fn RIP: \ 0010:blk_mq_get_driver_tag+0x81/0x120 Code: 24 \ 10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 00 00 00 \ 0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 <48> 8b 87 48 01 00 00 8b 40 04 39 43 20 72 37 \ f6 87 b0 00 00 00 02 RSP: 0018:ffffc90004aabd30 EFLAGS: 00010246 \ RAX: 0000000000000003 RBX: ffff888465ea1300 RCX: ffffc90004aabde8 RDX: 00000000ffffffff RSI: ffffc90004aabde8 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff888465ea1348 R09: 0000000000000000 R10: 0000000000001000 R11: 00000000ffffffff R12: ffff888465ea1300 R13: 0000000000000000 R14: ffff888465ea1348 R15: ffff888465d10000 FS: 0000000000000000(0000) GS:ffff88846f9c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000148 CR3: 000000000220a003 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: blk_mq_dispatch_rq_list+0xec/0x480 ? elv_rb_del+0x11/0x30 blk_mq_do_dispatch_sched+0x6e/0xf0 blk_mq_sched_dispatch_requests+0xfa/0x170 __blk_mq_run_hw_queue+0x5f/0xe0 process_one_work+0x154/0x350 worker_thread+0x46/0x3c0 kthread+0xf5/0x130 ? process_one_work+0x350/0x350 ? kthread_destroy_worker+0x50/0x50 ret_from_fork+0x1f/0x30 Modules linked in: sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel \ kvm switchtec irqbypass iTCO_wdt iTCO_vendor_support efivars cdc_ether usbnet mii \ cdc_acm i2c_i801 lpc_ich mfd_core ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq \ button sch_fq_codel nfsd nfs_acl lockd grace auth_rpcgss oid_registry sunrpc nvme \ nvme_core fuse sg loop efivarfs autofs4 CR2: 0000000000000148 \ ---[ end trace 340a1fb996df1b9b ]--- RIP: 0010:blk_mq_get_driver_tag+0x81/0x120 Code: 24 10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 \ 00 00 00 0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 <48> 8b 87 48 01 00 00 8b 40 04 39 43 \ 20 72 37 f6 87 b0 00 00 00 02 Fixes: 445251d0f4d329a ("blk-mq: fix discard merge with scheduler attached") Reported-by: Jens Axboe <axboe@kernel.dk> Cc: Guangwu Zhang <guazhang@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-20block: prevent merging of requests with different prioritiesDamien Le Moal1-2/+6
Growing in size a high priority request by merging it with a lower priority BIO or request will increase the request execution time. This is the opposite result of the desired effect of high I/O priorities, namely getting low I/O latencies. Prevent merging of requests and BIOs that have different I/O priorities to fix this. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-19Merge tag 'v4.20-rc3' into for-4.21/blockJens Axboe1-2/+3
Merge in -rc3 to resolve a few conflicts, but also to get a few important fixes that have gone into mainline since the block 4.21 branch was forked off (most notably the SCSI queue issue, which is both a conflict AND needed fix). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-15block: make blk_try_req_merge() staticEric Biggers1-1/+2
blk_try_req_merge() is only used in block/blk-merge.c, so make it static. This addresses a gcc warning when -Wmissing-prototypes is enabled. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-09block: make sure discard bio is aligned with logical block sizeMing Lei1-1/+2
Obviously the created discard bio has to be aligned with logical block size. This patch introduces the helper of bio_allowed_max_sectors() for this purpose. Cc: stable@vger.kernel.org Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Xiao Ni <xni@redhat.com> Cc: Mariusz Dabrowski <mariusz.dabrowski@intel.com> Fixes: 744889b7cbb56a6 ("block: don't deal with discard limit in blkdev_issue_discard()") Fixes: a22c4d7e34402cc ("block: re-add discard_granularity and alignment checks") Reported-by: Rui Salvaterra <rsalvaterra@gmail.com> Tested-by: Rui Salvaterra <rsalvaterra@gmail.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07block: kill request ->cpu memberJens Axboe1-2/+0
This was used for completion placement for the legacy path, but for mq we have rq->mq_ctx->cpu for that. Add a helper to get the request CPU assignment, as the mq_ctx type is private to blk-mq. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07blk-merge: kill dead queue lock held checkJens Axboe1-3/+0
This is dead code, any queue reaching this part has mq_ops attached. Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07block: remove req_no_special_merge() from merging codeJens Axboe1-22/+3
It'll always be false at this point, just remove it. Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07block: remove __blk_put_request()Jens Axboe1-1/+1
Now there's no difference between blk_put_request() and __blk_put_request() anymore, get rid of the underscore version and convert the few callers. Reviewed-by: Hannes Reinecke <hare@suse.com> Tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07block: remove dead elevator codeJens Axboe1-5/+0
This removes a bunch of core and elevator related code. On the core front, we remove anything related to queue running, draining, initialization, plugging, and congestions. We also kill anything related to request allocation, merging, retrieval, and completion. Remove any checking for single queue IO schedulers, as they no longer exist. This means we can also delete a bunch of code related to request issue, adding, completion, etc - and all the SQ related ops and helpers. Also kill the load_default_modules(), as all that did was provide for a way to load the default single queue elevator. Tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-07block: respect virtual boundary mask in bvecsJohannes Thumshirn1-1/+1
With drivers that are settting a virtual boundary constrain, we are seeing a lot of bio splitting and smaller I/Os being submitted to the driver. This happens because the bio gap detection code does not account cases where PAGE_SIZE - 1 is bigger than queue_virt_boundary() and thus will split the bio unnecessarily. Cc: Jan Kara <jack@suse.cz> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Ming Lei <ming.lei@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-29block: fix the DISCARD request mergeJianchao Wang1-10/+36
There are two cases when handle DISCARD merge. If max_discard_segments == 1, the bios/requests need to be contiguous to merge. If max_discard_segments > 1, it takes every bio as a range and different range needn't to be contiguous. But now, attempt_merge screws this up. It always consider contiguity for DISCARD for the case max_discard_segments > 1 and cannot merge contiguous DISCARD for the case max_discard_segments == 1, because rq_attempt_discard_merge always returns false in this case. This patch fixes both of the two cases above. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24block: merge BIOVEC_SEG_BOUNDARY into biovec_phys_mergeableChristoph Hellwig1-24/+5
These two checks should always be performed together, so merge them into a single helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24block: simplify BIOVEC_PHYS_MERGEABLEChristoph Hellwig1-5/+5
Turn the macro into an inline, move it to blk.h and simplify the arch hooks a bit. Also rename the function to biovec_phys_mergeable as there is no need to shout. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-24block: move req_gap_{back,front}_merge to blk-merge.cChristoph Hellwig1-0/+65
Keep it close to the actual users instead of exposing the function to all drivers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-03block: don't use blocking queue entered for recursive bio submitsJens Axboe1-0/+10
If we end up splitting a bio and the queue goes away between the initial submission and the later split submission, then we can block forever in blk_queue_enter() waiting for the reference to drop to zero. This will never happen, since we already hold a reference. Mark a split bio as already having entered the queue, so we can just use the live non-blocking queue enter variant. Thanks to Tetsuo Handa for the analysis. Reported-by: syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-31block: convert bounce, q->bio_split to bioset_init()/mempool_init()Kent Overstreet1-4/+4
Convert the core block functionality to embedded bio sets. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-09block: consolidate struct request timestamp fieldsOmar Sandoval1-6/+5
Currently, struct request has four timestamp fields: - A start time, set at get_request time, in jiffies, used for iostats - An I/O start time, set at start_request time, in ktime nanoseconds, used for blk-stats (i.e., wbt, kyber, hybrid polling) - Another start time and another I/O start time, used for cfq and bfq These can all be consolidated into one start time and one I/O start time, both in ktime nanoseconds, shaving off up to 16 bytes from struct request depending on the kernel config. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-02blk-mq: fix discard merge with scheduler attachedJens Axboe1-3/+26
I ran into an issue on my laptop that triggered a bug on the discard path: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3/0x430 Modules linked in: rfcomm fuse ctr ccm bnep arc4 binfmt_misc snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat snd_hda_codec_conexant fat snd_hda_codec_generic iwlmvm snd_hda_intel snd_hda_codec snd_hwdep mac80211 snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq x86_pkg_temp_thermal intel_powerclamp kvm_intel uvcvideo iwlwifi btusb snd_seq_device videobuf2_vmalloc btintel videobuf2_memops kvm snd_timer videobuf2_v4l2 bluetooth irqbypass videobuf2_core aesni_intel aes_x86_64 crypto_simd cryptd snd glue_helper videodev cfg80211 ecdh_generic soundcore hid_generic usbhid hid i915 psmouse e1000e ptp pps_core xhci_pci xhci_hcd intel_gtt CPU: 2 PID: 207 Comm: jbd2/nvme0n1p7- Tainted: G U 4.15.0+ #176 Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET59W (1.33 ) 12/19/2017 RIP: 0010:nvme_setup_cmd+0x3d3/0x430 RSP: 0018:ffff880423e9f838 EFLAGS: 00010217 RAX: 0000000000000000 RBX: ffff880423e9f8c8 RCX: 0000000000010000 RDX: ffff88022b200010 RSI: 0000000000000002 RDI: 00000000327f0000 RBP: ffff880421251400 R08: ffff88022b200000 R09: 0000000000000009 R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000ffff R13: ffff88042341e280 R14: 000000000000ffff R15: ffff880421251440 FS: 0000000000000000(0000) GS:ffff880441500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055b684795030 CR3: 0000000002e09006 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: nvme_queue_rq+0x40/0xa00 ? __sbitmap_queue_get+0x24/0x90 ? blk_mq_get_tag+0xa3/0x250 ? wait_woken+0x80/0x80 ? blk_mq_get_driver_tag+0x97/0xf0 blk_mq_dispatch_rq_list+0x7b/0x4a0 ? deadline_remove_request+0x49/0xb0 blk_mq_do_dispatch_sched+0x4f/0xc0 blk_mq_sched_dispatch_requests+0x106/0x170 __blk_mq_run_hw_queue+0x53/0xa0 __blk_mq_delay_run_hw_queue+0x83/0xa0 blk_mq_run_hw_queue+0x6c/0xd0 blk_mq_sched_insert_request+0x96/0x140 __blk_mq_try_issue_directly+0x3d/0x190 blk_mq_try_issue_directly+0x30/0x70 blk_mq_make_request+0x1a4/0x6a0 generic_make_request+0xfd/0x2f0 ? submit_bio+0x5c/0x110 submit_bio+0x5c/0x110 ? __blkdev_issue_discard+0x152/0x200 submit_bio_wait+0x43/0x60 ext4_process_freed_data+0x1cd/0x440 ? account_page_dirtied+0xe2/0x1a0 ext4_journal_commit_callback+0x4a/0xc0 jbd2_journal_commit_transaction+0x17e2/0x19e0 ? kjournald2+0xb0/0x250 kjournald2+0xb0/0x250 ? wait_woken+0x80/0x80 ? commit_timeout+0x10/0x10 kthread+0x111/0x130 ? kthread_create_worker_on_cpu+0x50/0x50 ? do_group_exit+0x3a/0xa0 ret_from_fork+0x1f/0x30 Code: 73 89 c1 83 ce 10 c1 e1 10 09 ca 83 f8 04 0f 87 0f ff ff ff 8b 4d 20 48 8b 7d 00 c1 e9 09 48 01 8c c7 00 08 00 00 e9 f8 fe ff ff <0f> ff 4c 89 c7 41 bc 0a 00 00 00 e8 0d 78 d6 ff e9 a1 fc ff ff ---[ end trace 50d361cc444506c8 ]--- print_req_error: I/O error, dev nvme0n1, sector 847167488 Decoding the assembly, the request claims to have 0xffff segments, while nvme counts two. This turns out to be because we don't check for a data carrying request on the mq scheduler path, and since blk_phys_contig_segment() returns true for a non-data request, we decrement the initial segment count of 0 and end up with 0xffff in the unsigned short. There are a few issues here: 1) We should initialize the segment count for a discard to 1. 2) The discard merging is currently using the data limits for segments and sectors. Fix this up by having attempt_merge() correctly identify the request, and by initializing the segment count correctly for discards. This can only be triggered with mq-deadline on discard capable devices right now, which isn't a common configuration. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-10Revert "block: blk-merge: try to make front segments in full size"Ming Lei1-49/+5
This reverts commit a2d37968d784363842f87820a21e106741d28004. If max segment size isn't 512-aligned, this patch won't work well. Also once multipage bvec is enabled, adjacent bvecs won't be physically contiguous if page is added via bio_add_page(), so we don't need this kind of complicated logic. Reported-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-06block: blk-merge: remove unnecessary checkMing Lei1-3/+1
In this case, 'sectors' can't be zero at all, so remove the check and let the bio be split. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-06block: blk-merge: try to make front segments in full sizeMing Lei1-5/+49
When merging one bvec into segment, if the bvec is too big to merge, current policy is to move the whole bvec into another new segment. This patchset changes the policy into trying to maximize size of front segments, that means in above situation, part of bvec is merged into current segment, and the remainder is put into next segment. This patch prepares for support multipage bvec because it can be quite common to see this case and we should try to make front segments in full size. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>