summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)AuthorFilesLines
2023-04-03block: ublk_drv: clean up several helpersMing Lei1-13/+5
Convert the following pattern in several helpers if (Z) return true return false into: return Z; Reviewed-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-03block: ublk_drv: add two helpers to clean up map/unmap requestMing Lei1-5/+14
Add two helpers for checking if map/unmap is needed, since we may have passthrough request which needs map or unmap in future, such as for supporting report zones. Meantime don't mark ublk_copy_user_pages as inline since this function is a bit fat now. Reviewed-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-03block: ublk_drv: don't consider flush request in map/unmap ioMing Lei1-7/+3
There isn't data in request of REQ_OP_FLUSH always, so don't consider it in both ublk_map_io() and ublk_unmap_io(). Reviewed-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-03block: ublk_drv: add common exit handlingMing Lei1-6/+9
Simplify exit handling a bit, and prepare for supporting fused command. Reviewed-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-02pktcdvd: simplify the class_pktcdvd logicGreg Kroah-Hartman1-28/+12
There is no need to dynamically create and destroy the class_pktcdvd structure, just make it static and remove the memory allocation logic which simplifies and cleans up the logic a lot. Cc: linux-block@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20230331164724.319703-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-02drbd: Pass a peer device to the resync and online verify functionsChristoph Böhmwalder6-108/+126
Originally-from: Andreas Grünbacher <agruen@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-02drbd: pass drbd_peer_device to __req_modChristoph Böhmwalder5-31/+46
In preparation to support multiple connections, we need to know which one we need to modify the request state for. Originally-from: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-02drbd: drbd_uuid_compare: pass a peer_deviceChristoph Böhmwalder1-4/+5
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-02drbd: INFO_bm_xfer_stats(): Pass a peer device argumentAndreas Gruenbacher3-7/+7
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-02drbd: Add peer device parameter to whole-bitmap I/O handlersAndreas Gruenbacher7-62/+96
Pass a peer device parameter through the bitmap I/O functions to the I/O handlers. In after_state_ch(), set that parameter when queuing the drbd_send_bitmap operation so that this operation knows where to send the bitmap. Signed-off-by: Andreas Gruenbacher <agruen@kernel.org> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-02drbd: Rip out the ERR_IF_CNT_IS_NEGATIVE macroAndreas Gruenbacher1-22/+15
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-02null_blk: use kmap_local_page() and kunmap_local()Chaitanya Kulkarni1-4/+4
Replace the deprecated API kmap_atomic() and kunmap_atomic() with kmap_local_page() and kunmap_local() in null_flush_cache_page(). Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230330184926.64209-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-02null_blk: use non-deprecated lib functionsChaitanya Kulkarni1-22/+7
Use library helper memcpy_page() to copy source page into destination instead of having duplicate code in copy_to_nullb() & copy_from_nullb(). In copy_from_nullb() also replace the memset call with zero_user(). Use library helper memset_page() to set the buffer to 0xFF instead of having duplicate code. This also removes deprecated API kmap_atomic() from copy_to_nullb() copy_from_nullb() and nullb_fill_pattern(), from :include/linux/highmem.h: "kmap_atomic - Atomically map a page for temporary usage - Deprecated!" Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230330184926.64209-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-29driver core: class: mark the struct class for sysfs callbacks as constantGreg Kroah-Hartman2-9/+8
struct class should never be modified in a sysfs callback as there is nothing in the structure to modify, and frankly, the structure is almost never used in a sysfs callback, so mark it as constant to allow struct class to be moved to read-only memory. While we are touching all class sysfs callbacks also mark the attribute as constant as it can not be modified. The bonding code still uses this structure so it can not be removed from the function callbacks. Cc: "David S. Miller" <davem@davemloft.net> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Bartosz Golaszewski <brgl@bgdev.pl> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Russ Weight <russell.h.weight@intel.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steve French <sfrench@samba.org> Cc: Vignesh Raghavendra <vigneshr@ti.com> Cc: linux-cifs@vger.kernel.org Cc: linux-gpio@vger.kernel.org Cc: linux-mtd@lists.infradead.org Cc: linux-rdma@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: netdev@vger.kernel.org Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20230325084537.3622280-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-27loop: LOOP_CONFIGURE: send uevents for partitionsAlyssa Ross1-9/+9
LOOP_CONFIGURE is, as far as I understand it, supposed to be a way to combine LOOP_SET_FD and LOOP_SET_STATUS64 into a single syscall. When using LOOP_SET_FD+LOOP_SET_STATUS64, a single uevent would be sent for each partition found on the loop device after the second ioctl(), but when using LOOP_CONFIGURE, no such uevent was being sent. In the old setup, uevents are disabled for LOOP_SET_FD, but not for LOOP_SET_STATUS64. This makes sense, as it prevents uevents being sent for a partially configured device during LOOP_SET_FD - they're only sent at the end of LOOP_SET_STATUS64. But for LOOP_CONFIGURE, uevents were disabled for the entire operation, so that final notification was never issued. To fix this, reduce the critical section to exclude the loop_reread_partitions() call, which causes the uevents to be issued, to after uevents are re-enabled, matching the behaviour of the LOOP_SET_FD+LOOP_SET_STATUS64 combination. I noticed this because Busybox's losetup program recently changed from using LOOP_SET_FD+LOOP_SET_STATUS64 to LOOP_CONFIGURE, and this broke my setup, for which I want a notification from the kernel any time a new partition becomes available. Signed-off-by: Alyssa Ross <hi@alyssa.is> [hch: reduced the critical section] Signed-off-by: Christoph Hellwig <hch@lst.de> Fixes: 3448914e8cc5 ("loop: Add LOOP_CONFIGURE ioctl") Link: https://lore.kernel.org/r/20230320125430.55367-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-23driver core: bus: mark the struct bus_type for sysfs callbacks as constantGreg Kroah-Hartman1-19/+15
struct bus_type should never be modified in a sysfs callback as there is nothing in the structure to modify, and frankly, the structure is almost never used in a sysfs callback, so mark it as constant to allow struct bus_type to be moved to read-only memory. Cc: "David S. Miller" <davem@davemloft.net> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Bounine <alex.bou9@gmail.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Ben Widawsky <bwidawsk@kernel.org> Cc: Dexuan Cui <decui@microsoft.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Harald Freudenberger <freude@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Hu Haowen <src.res@email.cn> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Stuart Yoder <stuyoder@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Acked-by: Ilya Dryomov <idryomov@gmail.com> # rbd Acked-by: Ira Weiny <ira.weiny@intel.com> # cxl Reviewed-by: Alex Shi <alexs@kernel.org> Acked-by: Iwona Winiarska <iwona.winiarska@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> # scsi Link: https://lore.kernel.org/r/20230313182918.1312597-23-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-21block/io_uring: pass in issue_flags for uring_cmd task_work handlingJens Axboe1-13/+18
io_uring_cmd_done() currently assumes that the uring_lock is held when invoked, and while it generally is, this is not guaranteed. Pass in the issue_flags associated with it, so that we have IO_URING_F_UNLOCKED available to be able to lock the CQ ring appropriately when completing events. Cc: stable@vger.kernel.org Fixes: ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-18block: ublk_drv: mark device as LIVE before adding diskMing Lei1-1/+2
IO can be started before add_disk() returns, such as reading parititon table, then the monitor work should work for making forward progress. So mark device as LIVE before adding disk, meantime change to DEAD if add_disk() fails. Fixed: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Reviewed-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230318141231.55562-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-17driver core: class: remove module * from class_create()Greg Kroah-Hartman4-4/+4
The module pointer in class_create() never actually did anything, and it shouldn't have been requred to be set as a parameter even if it did something. So just remove it and fix up all callers of the function in the kernel tree at the same time. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20230313181843.1207845-4-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17drivers: remove struct module * setting from struct classGreg Kroah-Hartman2-2/+0
There is no need to manually set the owner of a struct class, as the registering function does it automatically, so remove all of the explicit settings from various drivers that did so as it is unneeded. This allows us to remove this pointer entirely from this structure going forward. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20230313181843.1207845-2-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-15block: sunvdc: add check for mdesc_grab() returning NULLLiang He1-0/+2
In vdc_port_probe(), we should check the return value of mdesc_grab() as it may return NULL, which can cause potential NPD bug. Fixes: 43fdf27470b2 ("[SPARC64]: Abstract out mdesc accesses for better MD update handling.") Signed-off-by: Liang He <windhl@126.com> Link: https://lore.kernel.org/r/20230315062032.1741692-1-windhl@126.com [axboe: style cleanup] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-15block: null_blk: cleanup null_queue_rq()Damien Le Moal1-15/+14
Use a local struct request pointer variable to avoid having to dereference struct blk_mq_queue_data multiple times. While at it, also fix the function argument indentation and remove a useless "else" after a return. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20230314041106.19173-2-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-15block: null_blk: Fix handling of fake timeout requestDamien Le Moal1-3/+3
When injecting a fake timeout into the null_blk driver using fail_io_timeout, the request timeout handler does not execute blk_mq_complete_request(), so the complete callback is never executed for a timedout request. The null_blk driver also has a driver-specific fake timeout mechanism which does not have this problem. Fix the problem with fail_io_timeout by using the same meachanism as null_blk internal timeout feature, using the fake_timeout field of null_blk commands. Reported-by: Akinobu Mita <akinobu.mita@gmail.com> Fixes: de3510e52b0a ("null_blk: fix command timeout completion handling") Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230314041106.19173-2-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-15loop: Fix use-after-free issuesBart Van Assche1-8/+17
do_req_filebacked() calls blk_mq_complete_request() synchronously or asynchronously when using asynchronous I/O unless memory allocation fails. Hence, modify loop_handle_cmd() such that it does not dereference 'cmd' nor 'rq' after do_req_filebacked() finished unless we are sure that the request has not yet been completed. This patch fixes the following kernel crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000054 Call trace: css_put.42938+0x1c/0x1ac loop_process_work+0xc8c/0xfd4 loop_rootcg_workfn+0x24/0x34 process_one_work+0x244/0x558 worker_thread+0x400/0x8fc kthread+0x16c/0x1e0 ret_from_fork+0x10/0x20 Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Schatzberg <schatzberg.dan@gmail.com> Fixes: c74d40e8b5e2 ("loop: charge i/o to mem and blk cg") Fixes: bc07c10a3603 ("block: loop: support DIO & AIO") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230314182155.80625-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-14nbd: use the structured req attr checkJakub Kicinski1-4/+4
Use the macro for checking presence of required attributes. It has the advantage of reporting to the user which attr was missing in a machine-readable format (extack). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230224021301.1630703-2-kuba@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-14nbd: allow genl access outside init_netJakub Kicinski1-0/+1
NBD doesn't have much to do with networking, allow users outside init_net to access the family. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230224021301.1630703-1-kuba@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-10pktcdvd: Remove CONFIG_CDROM_PKTCDVD_WCACHE from uapi headerThomas Huth1-4/+9
CONFIG_* switches should not be exposed in uapi headers, thus let's get rid of the USE_WCACHING macro here (which was also named way to generic) and integrate the logic directly in the only function that needs it. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-03-03Merge tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linuxLinus Torvalds2-7/+4
Pull block fixes from Jens Axboe: - NVMe pull request via Christoph: - Don't access released socket during error recovery (Akinobu Mita) - Bring back auto-removal of deleted namespaces during sequential scan (Christoph Hellwig) - Fix an error code in nvme_auth_process_dhchap_challenge (Dan Carpenter) - Show well known discovery name (Daniel Wagner) - Add a missing endianess conversion in effects masking (Keith Busch) - Fix for a regression introduced in blk-rq-qos during init in this merge window (Breno) - Reorder a few fields in struct blk_mq_tag_set, eliminating a few holes and shrinking it (Christophe) - Remove redundant bdev_get_queue() NULL checks (Juhyung) - Add sed-opal single user mode support flag (Luca) - Remove SQE128 check in ublk as it isn't needed, saving some memory (Ming) - Op specific segment checking for cloned requests (Uday) - Exclusive open partition scan fixes (Yu) - Loop offset/size checking before assigning them in the device (Zhong) - Bio polling fixes (me) * tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linux: blk-mq: enforce op-specific segment limits in blk_insert_cloned_request nvme-fabrics: show well known discovery name nvme-tcp: don't access released socket during error recovery nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge() nvme: bring back auto-removal of deleted namespaces during sequential scan blk-iocost: Pass gendisk to ioc_refresh_params nvme: fix sparse warning on effects masking block: be a bit more careful in checking for NULL bdev while polling block: clear bio->bi_bdev when putting a bio back in the cache loop: loop_set_status_from_info() check before assignment ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmd block: remove more NULL checks after bdev_get_queue() blk-mq: Reorder fields in 'struct blk_mq_tag_set' block: fix scan partition for exclusively open device again block: Revert "block: Do not reread partition table on exclusively open device" sed-opal: add support flag for SUM in status ioctl
2023-03-02Merge tag 'ceph-for-6.3-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds1-11/+9
Pull ceph fixes from Ilya Dryomov: "Two small fixes from Xiubo and myself, marked for stable" * tag 'ceph-for-6.3-rc1' of https://github.com/ceph/ceph-client: rbd: avoid use-after-free in do_rbd_add() when rbd_dev_create() fails ceph: update the time stamps and try to drop the suid/sgid
2023-02-26rbd: avoid use-after-free in do_rbd_add() when rbd_dev_create() failsIlya Dryomov1-11/+9
If getting an ID or setting up a work queue in rbd_dev_create() fails, use-after-free on rbd_dev->rbd_client, rbd_dev->spec and rbd_dev->opts is triggered in do_rbd_add(). The root cause is that the ownership of these structures is transfered to rbd_dev prematurely and they all end up getting freed when rbd_dev_create() calls rbd_dev_free() prior to returning to do_rbd_add(). Found by Linux Verification Center (linuxtesting.org) with SVACE, an incomplete patch submitted by Natalia Petrova <n.petrova@fintech.ru>. Cc: stable@vger.kernel.org Fixes: 1643dfa4c2c8 ("rbd: introduce a per-device ordered workqueue") Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-02-25Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-54/+414
Pull virtio updates from Michael Tsirkin: - device feature provisioning in ifcvf, mlx5 - new SolidNET driver - support for zoned block device in virtio blk - numa support in virtio pmem - VIRTIO_F_RING_RESET support in vhost-net - more debugfs entries in mlx5 - resume support in vdpa - completion batching in virtio blk - cleanup of dma api use in vdpa - now simulating more features in vdpa-sim - documentation, features, fixes all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (64 commits) vdpa/mlx5: support device features provisioning vdpa/mlx5: make MTU/STATUS presence conditional on feature bits vdpa: validate device feature provisioning against supported class vdpa: validate provisioned device features against specified attribute vdpa: conditionally read STATUS in config space vdpa: fix improper error message when adding vdpa dev vdpa/mlx5: Initialize CVQ iotlb spinlock vdpa/mlx5: Don't clear mr struct on destroy MR vdpa/mlx5: Directly assign memory key tools/virtio: enable to build with retpoline vringh: fix a typo in comments for vringh_kiov vhost-vdpa: print warning when vhost_vdpa_alloc_domain fails scsi: virtio_scsi: fix handling of kmalloc failure vdpa: Fix a couple of spelling mistakes in some messages vhost-net: support VIRTIO_F_RING_RESET vhost-scsi: convert sysfs snprintf and sprintf to sysfs_emit vdpa: mlx5: support per virtqueue dma device vdpa: set dma mask for vDPA device virtio-vdpa: support per vq dma device vdpa: introduce get_vq_dma_device() ...
2023-02-24Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds2-78/+6
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-23loop: loop_set_status_from_info() check before assignmentZhong Jinghua1-4/+4
In loop_set_status_from_info(), lo->lo_offset and lo->lo_sizelimit should be checked before reassignment, because if an overflow error occurs, the original correct value will be changed to the wrong value, and it will not be changed back. More, the original patch did not solve the problem, the value was set and ioctl returned an error, but the subsequent io used the value in the loop driver, which still caused an alarm: loop_handle_cmd do_req_filebacked loff_t pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset; lo_rw_aio cmd->iocb.ki_pos = pos Fixes: c490a0b5a4f3 ("loop: Check for overflow while configuring loop") Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230221095027.3656193-1-zhongjinghua@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-23Merge tag 'ata-6.3-rc1' of ↵Linus Torvalds29-10756/+0
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ATA updates from Damien Le Moal: - Small cleanup of the pata_octeon driver to drop a useless platform callback (Uwe) - Simplify ata_scsi_cmd_error_handler() code using the fact that ap->ops->error_handler is NULL most of the time (Wenchao) - Several patches improving libata error handling. This is in preparation for supporting the command duration limits (CDL) feature. The changes allow handling corner cases of ATA NCQ errors which do not happen with regular drives but will be triggered with CDL drives (Niklas) - Simplify the qc_fill_rtf operation (me) - Improve SCSI command translation for REPORT_SUPPORTED_OPERATION_CODES command (me) - Cleanup of libata FUA handling. This falls short of enabling FUA for ATA drives that support it by default as there were concerns that old drives would break. The series however fixes several issues with the FUA support to ensure that FUA is reported as being supported only for drives that can handle all possible write cases (NCQ and non-NCQ). A check in the block layer is also added to ensure that we never see read FUA commands (current behavior) (me) - Several patches to move the old PARIDE (parallel port IDE) driver to libata as pata_parport. Given that this driver also needs protocol modules, the driver code resides in its own pata_parport directoy under drivers/ata (Ondrej) * tag 'ata-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata: pata_parport: Fix ida_alloc return value error check drivers/block: Move PARIDE protocol modules to drivers/ata/pata_parport drivers/block: Remove PARIDE core and high-level protocols ata: pata_parport: add driver (PARIDE replacement) ata: libata: exclude FUA support for known buggy drives ata: libata: Fix FUA handling in ata_build_rw_tf() ata: libata: cleanup fua support detection ata: libata: Rename and cleanup ata_rwcmd_protocol() ata: libata: Introduce ata_ncq_supported() block: add a sanity check for non-write flush/fua bios ata: libata-scsi: improve ata_scsiop_maint_in() ata: libata-scsi: do not overwrite SCSI ML and status bytes ata: libata: move NCQ related ATA_DFLAGs ata: libata: respect successfully completed commands during errors ata: libata: read the shared status for successful NCQ commands once ata: libata: simplify qc_fill_rtf port operation interface ata: scsi: rename flag ATA_QCFLAG_FAILED to ATA_QCFLAG_EH ata: libata-eh: Cleanup ata_scsi_cmd_error_handler() ata: octeon: Drop empty platform remove function
2023-02-21ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmdMing Lei1-3/+0
sizeof(struct ublksrv_io_cmd) is 16bytes, which can be held in 64byte SQE, so not necessary to check IO_URING_F_SQE128. With this change, we get chance to save half SQ ring memory. Fixed: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230220041413.1524335-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-21virtio-blk: support completion batching for the IRQ pathSuwan Kim1-37/+45
This patch adds completion batching to the IRQ path. It reuses batch completion code of virtblk_poll(). It collects requests to io_comp_batch and processes them all at once. It can boost up the performance by 2%. To validate the performance improvement and stabilty, I did fio test with 4 vCPU VM and 12 vCPU VM respectively. Both VMs have 8GB ram and the same number of HW queues as vCPU. The fio cammad is as follows and I ran the fio 5 times and got IOPS average. (io_uring, randread, direct=1, bs=512, iodepth=64 numjobs=2,4) Test result shows about 2% improvement. 4 vcpu VM | numjobs=2 | numjobs=4 ----------------------------------------------------------- fio without patch | 367.2K IOPS | 397.6K IOPS ----------------------------------------------------------- fio with patch | 372.8K IOPS | 407.7K IOPS 12 vcpu VM | numjobs=2 | numjobs=4 ----------------------------------------------------------- fio without patch | 363.6K IOPS | 374.8K IOPS ----------------------------------------------------------- fio with patch | 373.8K IOPS | 385.3K IOPS Signed-off-by: Suwan Kim <suwan.kim027@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Message-Id: <20221221145456.281218-3-suwan.kim027@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-21virtio-blk: set req->state to MQ_RQ_COMPLETE after polling I/O is finishedSuwan Kim1-2/+3
Driver should set req->state to MQ_RQ_COMPLETE after it finishes to process req. But virtio-blk doesn't set MQ_RQ_COMPLETE after virtblk_poll() handles req and req->state still remains MQ_RQ_IN_FLIGHT. Fortunately so far there is no issue about it because blk_mq_end_request_batch() sets req->state to MQ_RQ_IDLE. In this patch, virblk_poll() calls blk_mq_complete_request_remote() to set req->state to MQ_RQ_COMPLETE before it adds req to a batch completion list. So it properly sets req->state after polling I/O is finished. Fixes: 4e0400525691 ("virtio-blk: support polling I/O") Signed-off-by: Suwan Kim <suwan.kim027@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Message-Id: <20221221145456.281218-2-suwan.kim027@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-21Merge tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linuxLinus Torvalds16-199/+392
Pull block updates from Jens Axboe: - NVMe updates via Christoph: - Small improvements to the logging functionality (Amit Engel) - Authentication cleanups (Hannes Reinecke) - Cleanup and optimize the DMA mapping cod in the PCIe driver (Keith Busch) - Work around the command effects for Format NVM (Keith Busch) - Misc cleanups (Keith Busch, Christoph Hellwig) - Fix and cleanup freeing single sgl (Keith Busch) - MD updates via Song: - Fix a rare crash during the takeover process - Don't update recovery_cp when curr_resync is ACTIVE - Free writes_pending in md_stop - Change active_io to percpu - Updates to drbd, inching us closer to unifying the out-of-tree driver with the in-tree one (Andreas, Christoph, Lars, Robert) - BFQ update adding support for multi-actuator drives (Paolo, Federico, Davide) - Make brd compliant with REQ_NOWAIT (me) - Fix for IOPOLL and queue entering, fixing stalled IO waiting on timeouts (me) - Fix for REQ_NOWAIT with multiple bios (me) - Fix memory leak in blktrace cleanup (Greg) - Clean up sbitmap and fix a potential hang (Kemeng) - Clean up some bits in BFQ, and fix a bug in the request injection (Kemeng) - Clean up the request allocation and issue code, and fix some bugs related to that (Kemeng) - ublk updates and fixes: - Add support for unprivileged ublk (Ming) - Improve device deletion handling (Ming) - Misc (Liu, Ziyang) - s390 dasd fixes (Alexander, Qiheng) - Improve utility of request caching and fixes (Anuj, Xiao) - zoned cleanups (Pankaj) - More constification for kobjs (Thomas) - blk-iocost cleanups (Yu) - Remove bio splitting from drivers that don't need it (Christoph) - Switch blk-cgroups to use struct gendisk. Some of this is now incomplete as select late reverts were done. (Christoph) - Add bvec initialization helpers, and convert callers to use that rather than open-coding it (Christoph) - Misc fixes and cleanups (Jinke, Keith, Arnd, Bart, Li, Martin, Matthew, Ulf, Zhong) * tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits) brd: use radix_tree_maybe_preload instead of radix_tree_preload block: use proper return value from bio_failfast() block: bio-integrity: Copy flags when bio_integrity_payload is cloned block: Fix io statistics for cgroup in throttle path brd: mark as nowait compatible brd: check for REQ_NOWAIT and set correct page allocation mask brd: return 0/-error from brd_insert_page() block: sync mixed merged request's failfast with 1st bio's Revert "blk-cgroup: pin the gendisk in struct blkcg_gq" Revert "blk-cgroup: pass a gendisk to blkg_lookup" Revert "blk-cgroup: delay blk-cgroup initialization until add_disk" Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release" Revert "blk-cgroup: move the cgroup information to struct gendisk" nvme-pci: remove iod use_sgls nvme-pci: fix freeing single sgl block: ublk: check IO buffer based on flag need_get_data s390/dasd: Fix potential memleak in dasd_eckd_init() s390/dasd: sort out physical vs virtual pointers usage block: Remove the ALLOC_CACHE_SLACK constant block: make kobj_type structures constant ...
2023-02-17brd: use radix_tree_maybe_preload instead of radix_tree_preloadPankaj Raghav1-1/+1
Unconditionally calling radix_tree_preload_end() results in a OOPS message as the preload is only conditionally called for gfpflags_allow_blocking(). [ 20.267323] BUG: using smp_processor_id() in preemptible [00000000] code: fio/416 [ 20.267837] caller is brd_insert_page.part.0+0xbe/0x190 [brd] [ 20.269436] Call Trace: [ 20.269598] <TASK> [ 20.269742] dump_stack_lvl+0x32/0x50 [ 20.269982] check_preemption_disabled+0xd1/0xe0 [ 20.270289] brd_insert_page.part.0+0xbe/0x190 [brd] [ 20.270664] brd_submit_bio+0x33f/0xf40 [brd] Use radix_tree_maybe_preload() which does preload only if gfpflags_allow_blocking() is true but also takes the lock. Therefore, unconditionally calling radix_tree_preload_end() should not create any issues and the message disappears. Fixes: 6ded703c56c2 ("brd: check for REQ_NOWAIT and set correct page allocation mask") Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20230217121442.33914-1-p.raghav@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-16brd: mark as nowait compatibleJens Axboe1-0/+1
By default, non-mq drivers do not support nowait. This causes io_uring to use a slower path as the driver cannot be trust not to block. brd can safely set the nowait flag, as worst case all it does is a NOIO allocation. For io_uring, this makes a substantial difference. Before: submitter=0, tid=453, file=/dev/ram0, node=-1 polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=128 Engine=io_uring, sq_ring=128, cq_ring=128 IOPS=440.03K, BW=1718MiB/s, IOS/call=32/31 IOPS=428.96K, BW=1675MiB/s, IOS/call=32/32 IOPS=442.59K, BW=1728MiB/s, IOS/call=32/31 IOPS=419.65K, BW=1639MiB/s, IOS/call=32/32 IOPS=426.82K, BW=1667MiB/s, IOS/call=32/31 and after: submitter=0, tid=354, file=/dev/ram0, node=-1 polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=128 Engine=io_uring, sq_ring=128, cq_ring=128 IOPS=3.37M, BW=13.15GiB/s, IOS/call=32/31 IOPS=3.45M, BW=13.46GiB/s, IOS/call=32/31 IOPS=3.43M, BW=13.42GiB/s, IOS/call=32/32 IOPS=3.43M, BW=13.39GiB/s, IOS/call=32/31 IOPS=3.43M, BW=13.38GiB/s, IOS/call=32/31 or about an 8x in difference. Now that brd is prepared to deal with REQ_NOWAIT reads/writes, mark it as supporting that. Cc: stable@vger.kernel.org # 5.10+ Link: https://lore.kernel.org/linux-block/20230203103005.31290-1-p.raghav@samsung.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-16brd: check for REQ_NOWAIT and set correct page allocation maskJens Axboe1-20/+28
If REQ_NOWAIT is set, then do a non-blocking allocation if the operation is a write and we need to insert a new page. Currently REQ_NOWAIT cannot be set as the queue isn't marked as supporting nowait, this change is in preparation for allowing that. radix_tree_preload() warns on attempting to call it with an allocation mask that doesn't allow blocking. While that warning could arguably be removed, we need to handle radix insertion failures anyway as they are more likely if we cannot block to get memory. Remove legacy BUG_ON()'s and turn them into proper errors instead, one for the allocation failure and one for finding a page that doesn't match the correct index. Cc: stable@vger.kernel.org # 5.10+ Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-16brd: return 0/-error from brd_insert_page()Jens Axboe1-14/+12
It currently returns a page, but callers just check for NULL/page to gauge success. Clean this up and return the appropriate error directly instead. Cc: stable@vger.kernel.org # 5.10+ Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-15virtio_blk: zone append in header type tweakMichael S. Tsirkin1-1/+1
virtio blk returns a 64 bit append_sector in an input buffer, in LE format. This field is not tagged as LE correctly, so even though the generated code is ok, we get warnings from sparse: drivers/block/virtio_blk.c:332:33: sparse: sparse: cast to restricted __le64 Make sparse happy by using the correct type. Message-Id: <20221220125154.564265-1-mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-02-15virtio_blk: temporary variable type tweakMichael S. Tsirkin1-1/+1
virtblk_result returns blk_status_t which is a bitwise restricted type, so we are not supposed to stuff it in a plain int temporary variable. All we do with it is pass it on to a function expecting blk_status_t so the generated code is ok, but we get warnings from sparse: drivers/block/virtio_blk.c:326:36: sparse: sparse: incorrect type in initializer (different base types) @@ expected int status @@ +got restricted blk_status_t @@ drivers/block/virtio_blk.c:334:33: sparse: sparse: incorrect type in argument 2 (different base types) @@ expected restricted +blk_status_t [usertype] error @@ got int status @@ Make sparse happy by using the correct type. Message-Id: <20221220124152.523531-1-mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2023-02-15virtio-blk: add support for zoned block devicesDmitry Fomichev1-18/+369
This patch adds support for Zoned Block Devices (ZBDs) to the kernel virtio-blk driver. The patch accompanies the virtio-blk ZBD support draft that is now being proposed for standardization. The latest version of the draft is linked at https://github.com/oasis-tcs/virtio-spec/issues/143 . The QEMU zoned device code that implements these protocol extensions has been developed by Sam Li and it is currently in review at the QEMU mailing list. A number of virtblk request structure changes has been introduced to accommodate the functionality that is specific to zoned block devices and, most importantly, make room for carrying the Zoned Append sector value from the device back to the driver along with the request status. The zone-specific code in the patch is heavily influenced by NVMe ZNS code in drivers/nvme/host/zns.c, but it is simpler because the proposed virtio ZBD draft only covers the zoned device features that are relevant to the zoned functionality provided by Linux block layer. includes the following fixup: virtio-blk: fix probe without CONFIG_BLK_DEV_ZONED When building without CONFIG_BLK_DEV_ZONED, VIRTIO_BLK_F_ZONED is excluded from array of driver features. As a result virtio_has_feature panics in virtio_check_driver_offered_feature since that by design verifies that a feature we are checking for is listed in the feature array. To fix, replace the call to virtio_has_feature with a stub. Message-Id: <20221016034127.330942-3-dmitry.fomichev@wdc.com> Co-developed-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Message-Id: <20221220112340.518841-1-mst@redhat.com> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Debugged-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Tested-by: Anders Roxell <anders.roxell@linaro.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-02-13block: ublk: check IO buffer based on flag need_get_dataLiu Xiaodong1-4/+9
Currently, uring_cmd with UBLK_IO_FETCH_REQ or UBLK_IO_COMMIT_AND_FETCH_REQ is always checked whether userspace server has provided IO buffer even flag UBLK_F_NEED_GET_DATA is configured. This is a excessive check. If UBLK_F_NEED_GET_DATA is configured, FETCH_RQ doesn't need to provide IO buffer; COMMIT_AND_FETCH_REQ also doesn't need to do that if the IO type is not READ. Check ub_cmd->addr together with ublk_need_get_data() and IO type in ublk_ch_uring_cmd(). With this fix, userspace server doesn't need to preserve buffers for every ublk_io when flag UBLK_F_NEED_GET_DATA is configured, in order to save memory. Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com> Fixes: c86019ff75c1 ("ublk_drv: add support for UBLK_IO_NEED_GET_DATA") Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230210141356.112321-1-xiaodong.liu@intel.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-08block: ublk: improve handling device deletionMing Lei1-4/+18
Inside ublk_ctrl_del_dev(), when the device is removed, we wait until the device number is freed with holding global lock of ublk_ctl_mutex, this way isn't friendly from user viewpoint: 1) if device is in-use, the current delete command hangs in ublk_ctrl_del_dev(), and user can't break from the handling because wait_event() is used 2) global lock is held, so any new device can't be added and other old devices can't be removed. Improve the deleting handling by the following way, suggested by Nadav: 1) wait without holding the global lock 2) replace wait_event() with wait_event_interruptible() Reported-by: Nadav Amit <nadav.amit@gmail.com> Suggested-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230207150700.545530-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-07ublk: pass NULL to blk_mq_alloc_disk() as queuedataZiyang Zhang1-1/+1
queuedata is not referenced in ublk_drv and we can use driver_data instead. Pass NULL to blk_mq_alloc_disk() as queuedata while allocating ublk's gendisk. Signed-off-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230207070839.370817-4-ZiyangZhang@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-07ublk: mention WRITE_ZEROES in comment of ublk_complete_rq()Ziyang Zhang1-1/+1
WRITE_ZEROES won't return bytes returned just like FLUSH and DISCARD, and we can end it directly. Add missing comment for it in ublk_complete_rq(). Signed-off-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230207070839.370817-3-ZiyangZhang@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-07ublk: remove unnecessary NULL check in ublk_rq_has_data()Ziyang Zhang1-1/+1
bio_has_data() allows a NULL bio so the NULL check in ublk_rq_has_data() is unnecessary. Signed-off-by: Ziyang Zhang <ZiyangZhang@linux.alibaba.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230207070839.370817-2-ZiyangZhang@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>