summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)AuthorFilesLines
2023-06-14brd: use cond_resched instead of cond_resched_rcuPankaj Raghav1-1/+1
The body of the loop is run without RCU lock held. Use the regular cond_resched() instead of cond_resched_rcu(). Fixes: 786bb0245881 ("brd: use XArray instead of radix-tree to index backing pages") Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20230614133538.1279369-1-p.raghav@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-13swim3: fix the floppy_locked_ioctl prototypeChristoph Hellwig1-1/+1
Add back the accidentally dropped mode parameter. Fixes: b60f7635788a ("swim3: fix the floppy_locked_ioctl prototype") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230613154309.327557-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: replace fmode_t with a block-specific type for block open flagsChristoph Hellwig21-98/+93
The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12rnbd-srv: replace sess->open_flags with a "bool readonly"Christoph Hellwig3-11/+9
Stop passing the fmode_t around and just use a simple bool to track if an export is read-only. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20230608110258.189493-24-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: use the holder as indication for exclusive opensChristoph Hellwig5-23/+27
The current interface for exclusive opens is rather confusing as it requires both the FMODE_EXCL flag and a holder. Remove the need to pass FMODE_EXCL and just key off the exclusive open off a non-NULL holder. For blkdev_put this requires adding the holder argument, which provides better debug checking that only the holder actually releases the hold, but at the same time allows removing the now superfluous mode argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12rnbd-srv: don't pass a holder for non-exclusive blkdev_get_by_pathChristoph Hellwig1-1/+1
Passing a holder to blkdev_get_by_path when FMODE_EXCL isn't set doesn't make sense, so pass NULL instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20230608110258.189493-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: remove the unused mode argument to ->releaseChristoph Hellwig13-17/+16
The mode argument to the ->release block_device_operation is never used, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: pass a gendisk to ->openChristoph Hellwig14-66/+63
->open is only called on the whole device. Make that explicit by passing a gendisk instead of the block_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: pass a gendisk on bdev_check_media_changeChristoph Hellwig5-14/+14
bdev_check_media_change should only ever be called for the whole device. Pass a gendisk to make that explicit and rename the function to disk_check_media_change. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block/rnbd-srv: make process_msg_sess_info returns voidGuoqing Jiang1-6/+3
Change the return type to void given it always returns 0. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230524070026.2932-9-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block/rnbd-srv: init err earlier in rnbd_srv_init_moduleGuoqing Jiang1-5/+3
With this, we can remove several lines of code. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230524070026.2932-8-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block/rnbd-srv: init ret with 0 instead of -EPERMGuoqing Jiang1-4/+3
Let's always set errno after pr_err which is consistent with default case. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230524070026.2932-7-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block/rnbd-srv: rename one member in rnbd_srv_devGuoqing Jiang2-8/+8
It actually represents the name of rnbd_srv_dev. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230524070026.2932-6-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block/rnbd-srv: no need to check sess_devGuoqing Jiang1-1/+1
Check ret is enough since if sess_dev is NULL which also implies ret should be 0. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230524070026.2932-5-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block/rnbd: introduce rnbd_access_modesGuoqing Jiang6-32/+16
Add one new array (marked with __maybe_unused to prevent gcc warning about "defined but not used" with W=1), then we can remove rnbd_access_mode_str and rnbd-common.c accordingly. Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Acked-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20230524070026.2932-4-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block/rnbd-srv: remove unused headerGuoqing Jiang1-1/+0
No need to include it since none of macros in limits.h are used by rnbd-srv. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230524070026.2932-3-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block/rnbd: kill rnbd_flags_supportedGuoqing Jiang1-22/+0
This routine is not called since added. Then the two flags (RNBD_OP_LAST and RNBD_F_ALL) can be removed too after kill rnbd_flags_supported. Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Link: https://lore.kernel.org/r/20230524070026.2932-2-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-10Merge tag 'block-6.4-2023-06-09' of git://git.kernel.dk/linuxLinus Torvalds1-0/+1
Pull block fixes from Jens Axboe: - Fix an issue with the hardware queue nr_active, causing it to become imbalanced (Tian) - Fix an issue with null_blk not releasing pages if configured as memory backed (Nitesh) - Fix a locking issue in dasd (Jan) * tag 'block-6.4-2023-06-09' of git://git.kernel.dk/linux: s390/dasd: Use correct lock while counting channel queue length null_blk: Fix: memory release when memory_backed=1 blk-mq: fix blk_mq_hw_ctx active request accounting
2023-06-07pktcdvd: Sort headersAndy Shevchenko1-15/+16
Sort the headers in alphabetic order in order to ease the maintenance for this part. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230310164549.22133-10-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07pktcdvd: Get rid of redundant 'else'Andy Shevchenko1-7/+7
In the snippets like the following if (...) return / goto / break / continue ...; else ... the 'else' is redundant. Get rid of it. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230310164549.22133-9-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07pktcdvd: Use put_unaligned_be16() and get_unaligned_be16()Andy Shevchenko1-17/+14
This makes the driver code slightly better to understand. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20230310164549.22133-8-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07pktcdvd: Use DEFINE_SHOW_ATTRIBUTE() to simplify codeAndy Shevchenko1-20/+3
Use DEFINE_SHOW_ATTRIBUTE() helper macro to simplify the code. No functional change. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230310164549.22133-7-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07pktcdvd: Drop redundant castings for sector_tAndy Shevchenko1-16/+10
Since the commit 72deb455b5ec ("block: remove CONFIG_LBDAF") the sector_t is always 64-bit type, no need to cast anymore. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230310164549.22133-6-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07pktcdvd: Get rid of pkt_seq_show() forward declarationAndy Shevchenko1-76/+75
The code can be neater without forward declarations. Get rid of pkt_seq_show() forward declaration. This will also allow futher cleanups to be cleaner. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230310164549.22133-5-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07pktcdvd: use sysfs_emit() to instead of scnprintf()Andy Shevchenko1-1/+1
Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230310164549.22133-4-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07pktcdvd: replace sscanf() by kstrtoul()Andy Shevchenko1-16/+18
The checkpatch.pl warns: "Prefer kstrto<type> to single variable sscanf". Fix the code accordingly. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20230310164549.22133-3-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07pktcdvd: Get rid of custom printing macrosAndy Shevchenko1-118/+129
We may use traditional dev_*() macros instead of custom ones provided by the driver. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230310164549.22133-2-andriy.shevchenko@linux.intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-07nbd: Add the maximum limit of allocated index in nbd_dev_addZhong Jinghua1-1/+2
If the index allocated by idr_alloc greater than MINORMASK >> part_shift, the device number will overflow, resulting in failure to create a block device. Fix it by imiting the size of the max allocation. Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230605122159.2134384-1-zhongjinghua@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-06rbd: get snapshot context after exclusive lock is ensured to be heldIlya Dryomov1-7/+23
Move capturing the snapshot context into the image request state machine, after exclusive lock is ensured to be held for the duration of dealing with the image request. This is needed to ensure correctness of fast-diff states (OBJECT_EXISTS vs OBJECT_EXISTS_CLEAN) and object deltas computed based off of them. Otherwise the object map that is forked for the snapshot isn't guaranteed to accurately reflect the contents of the snapshot when the snapshot is taken under I/O. This breaks differential backup and snapshot-based mirroring use cases with fast-diff enabled: since some object deltas may be incomplete, the destination image may get corrupted. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/61472 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2023-06-06rbd: move RBD_OBJ_FLAG_COPYUP_ENABLED flag settingIlya Dryomov1-11/+21
Move RBD_OBJ_FLAG_COPYUP_ENABLED flag setting into the object request state machine to allow for the snapshot context to be captured in the image request state machine rather than in rbd_queue_workfn(). Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2023-06-06null_blk: Fix: memory release when memory_backed=1Nitesh Shetty1-0/+1
Memory/pages are not freed, when unloading nullblk driver. Steps to reproduce issue 1.free -h total used free shared buff/cache available Mem: 7.8Gi 260Mi 7.1Gi 3.0Mi 395Mi 7.3Gi Swap: 0B 0B 0B 2.modprobe null_blk memory_backed=1 3.dd if=/dev/urandom of=/dev/nullb0 oflag=direct bs=1M count=1000 4.modprobe -r null_blk 5.free -h total used free shared buff/cache available Mem: 7.8Gi 1.2Gi 6.1Gi 3.0Mi 398Mi 6.3Gi Swap: 0B 0B 0B Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Link: https://lore.kernel.org/r/20230605062354.24785-1-nj.shetty@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05block: introduce holder opsChristoph Hellwig6-7/+9
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for thing like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05drbd: stop defining __KERNEL_SYSCALLS__Christoph Hellwig2-2/+0
__KERNEL_SYSCALLS__ hasn't been needed since Linux 2.6.19 so stop defining it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230601151646.1386867-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-04ublk: add control command of UBLK_U_CMD_GET_FEATURESMing Lei1-0/+21
Add control command of UBLK_U_CMD_GET_FEATURES for returning driver's feature set or capability. This way can simplify userspace for maintaining compatibility because userspace doesn't need to send command to one device for querying driver feature set any more. Such as, with the queried feature set, userspace can choose to use: - UBLK_CMD_GET_DEV_INFO2 or UBLK_CMD_GET_DEV_INFO, - UBLK_U_CMD_* or UBLK_CMD_* Userspace code: https://github.com/ming1/ubdsrv/commits/features-cmd Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230603040601.775227-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-31floppy: use __bio_add_page for adding single page to bioJohannes Thumshirn1-1/+1
The floppy code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/33c445a3b431270c72d9be03d5da1b08ae983920.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-31zram: use __bio_add_page for adding single page to bioJohannes Thumshirn1-1/+1
The zram writeback code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/cfd141dd7773315879a126f2aa81b7f698bc0e10.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-31drbd: use __bio_add_page to add page to bioJohannes Thumshirn1-3/+1
The drbd code only adds a single page to a newly created bio. So use __bio_add_page() to add the page which is guaranteed to succeed in this case. This brings us closer to marking bio_add_page() as __must_check. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/435007afac14f3766455559059d21843771fae53.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-27Merge tag 'for-linus-6.4-rc4-tag' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - a double free fix in the Xen pvcalls backend driver - a fix for a regression causing the MSI related sysfs entries to not being created in Xen PV guests - a fix in the Xen blkfront driver for handling insane input data better * tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/pci/xen: populate MSI sysfs entries xen/pvcalls-back: fix double frees with pvcalls_new_active_socket() xen/blkfront: Only check REQ_FUA for writes
2023-05-24xen/blkfront: Only check REQ_FUA for writesRoss Lagerwall1-1/+2
The existing code silently converts read operations with the REQ_FUA bit set into write-barrier operations. This results in data loss as the backend scribbles zeroes over the data instead of returning it. While the REQ_FUA bit doesn't make sense on a read operation, at least one well-known out-of-tree kernel module does set it and since it results in data loss, let's be safe here and only look at REQ_FUA for writes. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230426164005.2213139-1-ross.lagerwall@citrix.com Signed-off-by: Juergen Gross <jgross@suse.com>
2023-05-21ublk: fix build warning on iov_iter_get_pages2Ming Lei1-1/+2
Return type of iov_iter_get_pages2() is ssize_t instead of size_t, so fix it. Fixes: 981f95a571e3 ("ublk: cleanup ublk_copy_user_pages") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Julia Lawall <julia.lawall@inria.fr> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230520151134.459679-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-20ublk: support user copyMing Lei1-11/+47
Currently copy between io request buffer(pages) and userspace buffer is done inside ublk_map_io() or ublk_unmap_io(). This way performs very well in case of pre-allocated userspace io buffer. For dynamically allocated or external userspace backend io buffer, UBLK_F_NEED_GET_DATA is added for ublk server to provide buffer by one extra command communication for WRITE request. For READ, userspace simply provides buffer, but can't know when the buffer is done[1]. Add UBLK_F_USER_COPY by moving io data copy out of kernel by providing read()/write() on /dev/ublkcN, and simply let ublk server do the io data copy. This way makes both side cleaner, the cost is that one extra syscall for copy io data between request and backend buffer. With UBLK_F_USER_COPY, it actually becomes possible to run per-io zero copy now, such as, only do zero copy for big size IO, so it can be thought as one prep patch for supporting zero copy. Meantime zero copy still needs to expose read()/write() buffer for some corner case, such as passthrough IO. [1] READ buffer in UBLK_F_NEED_GET_DATA https://lore.kernel.org/linux-block/116d8a56-0881-56d3-9bcc-78ff3e1dc4e5@linux.alibaba.com/T/#m23bd4b8634c0a054e6797063167b469949a247bb ublksrv loop usercopy code: https://github.com/ming1/ubdsrv/commits/usercopy Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-8-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-20ublk: add read()/write() support for ublk char deviceMing Lei1-0/+151
Support pread()/pwrite() on ublk char device for reading/writing request io buffer, so data copy between io request buffer and userspace buffer can be moved to ublk server from ublk driver. Then UBLK_F_NEED_GET_DATA becomes not necessary, so ublk server can allocate buffer without one extra round uring command communication for userspace to provide buffer. IO buffer can be located by iocb->ki_pos which encodes buffer offset, io tag and queue id info, and type of iocb->ki_pos is u64, so it is big enough for holding reasonable queue depth, nr_queues and max io buffer size. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-20ublk: support to copy any part of request pagesMing Lei1-7/+24
Add 'offset' to 'struct ublk_map_data', so that ublk_copy_user_pages() can be used to copy any sub-buffer(linear mapped) of the request. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-6-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-20ublk: grab request reference when the request is handled by userspaceMing Lei1-3/+64
Add one reference counter into request pdu data, and hold this reference in the request's lifetime. Prepare for supporting to move request data copy into userspace, which needs to copy request data by read()/write() on /dev/ublkcN, so we have to guarantee that read()/write() is done on one valid/active request, and that will be enhanced by holding the io request reference in read()/write(). Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-20ublk: cleanup ublk_copy_user_pagesMing Lei1-63/+49
Clean up ublk_copy_user_pages() by using iov_iter_get_pages2, and code gets simplified a lot and becomes much more readable than before. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-20ublk: cleanup io cmd code path by adding ublk_fill_io_cmd()Ming Lei1-9/+11
Add one small helper to cleanup io command hanlding code path. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-20ublk: kill queuing request by task_work_addMing Lei1-38/+2
task_work_add() is used from early ublk development stage for handling request in batch. However, since commit 7d4a93176e01 ("ublk_drv: don't forward io commands in reserve order"), we can get similar batch processing with io_uring_cmd_complete_in_task(), and similar performance data is observed between task_work_add() and io_uring_cmd_complete_in_task(). Meantime we can kill one fast code path, which is actually seldom used given it is common to build ublk driver as module. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-18ublk: fix AB-BA lockdep warningMing Lei1-2/+7
When handling UBLK_IO_FETCH_REQ, ctx->uring_lock is grabbed first, then ub->mutex is acquired. When handling UBLK_CMD_STOP_DEV or UBLK_CMD_DEL_DEV, ub->mutex is grabbed first, then calling io_uring_cmd_done() for canceling uring command, in which ctx->uring_lock may be required. Real deadlock only happens when all the above commands are issued from same uring context, and in reality different uring contexts are often used for handing control command and IO command. Fix the issue by using io_uring_cmd_complete_in_task() to cancel command in ublk_cancel_dev(ublk_cancel_queue). Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Closes: https://lore.kernel.org/linux-block/becol2g7sawl4rsjq2dztsbc7mqypfqko6wzsyoyazqydoasml@rcxarzwidrhk Cc: Ziyang Zhang <ZiyangZhang@linux.alibaba.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20230517133408.210944-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-16brd: use XArray instead of radix-tree to index backing pagesPankaj Raghav1-69/+24
XArray was introduced to hold large array of pointers with a simple API. XArray API also provides array semantics which simplifies the way we store and access the backing pages, and the code becomes significantly easier to understand. No performance difference was noticed between the two implementation using fio with direct=1 [1]. [1] Performance in KIOPS: | radix-tree | XArray | Diff | | | write | 315 | 313 | -0.6% randwrite | 286 | 290 | +1.3% read | 330 | 335 | +1.5% randread | 309 | 312 | +0.9% Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20230511121544.111648-1-p.raghav@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-12ublk: fix command op code checkMing Lei1-1/+1
In case of CONFIG_BLKDEV_UBLK_LEGACY_OPCODES, type of cmd opcode could be 0 or 'u'; and type can only be 'u' if CONFIG_BLKDEV_UBLK_LEGACY_OPCODES isn't set. So fix the wrong check. Fixes: 2d786e66c966 ("block: ublk: switch to ioctl command encoding") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230505153142.1258336-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>