summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)AuthorFilesLines
2021-08-19isystem: trim/fixup stdarg.h and other headersAlexey Dobriyan1-1/+0
Delete/fixup few includes in anticipation of global -isystem compile option removal. Note: crypto/aegis128-neon-inner.c keeps <stddef.h> due to redefinition of uintptr_t error (one definition comes from <stddef.h>, another from <linux/types.h>). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-08-16block: nbd: add sanity check for first_minorPavel Skripkin1-0/+10
Syzbot hit WARNING in internal_create_group(). The problem was in too big disk->first_minor. disk->first_minor is initialized by value, which comes from userspace and there wasn't any sanity checks about value correctness. It can cause duplicate creation of sysfs files/links, because disk->first_minor will be passed to MKDEV() which causes truncation to byte. Since maximum minor value is 0xff, let's check if first_minor is correct minor number. NOTE: the root case of the reported warning was in wrong error handling in register_disk(), but we can avoid passing knowingly wrong values to sysfs API, because sysfs error messages can confuse users. For example: user passed 1048576 as index, but sysfs complains about duplicate creation of /dev/block/43:0. It's not obvious how 1048576 becomes 0. Log and reproducer for above example can be found on syzkaller bug report page. Link: https://syzkaller.appspot.com/bug?id=03c2ae9146416edf811958d5fd7acfab75b143d1 Fixes: b0d9111a2d53 ("nbd: use an idr to keep track of nbd devices") Reported-by: syzbot+9937dc42271cd87d4b98@syzkaller.appspotmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-16ps3vram: use bvec_virtChristoph Hellwig1-1/+1
Use bvec_virt instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210804095634.460779-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-16virtio_blk: use bvec_virtChristoph Hellwig1-5/+2
Use bvec_virt instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20210804095634.460779-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-16rbd: use bvec_virtChristoph Hellwig1-2/+1
Use bvec_virt instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20210804095634.460779-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-16Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-6/+33
Pull virtio fixes from Michael Tsirkin: "Fixes in virtio, vhost, and vdpa drivers" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa/mlx5: Fix queue type selection logic vdpa/mlx5: Avoid destroying MR on empty iotlb tools/virtio: fix build virtio_ring: pull in spinlock header vringh: pull in spinlock header virtio-blk: Add validation for block size in config space vringh: Use wiov->used to check for read/write desc order virtio_vdpa: reject invalid vq indices vdpa: Add documentation for vdpa_alloc_device() macro vDPA/ifcvf: Fix return value check for vdpa_alloc_device() vp_vdpa: Fix return value check for vdpa_alloc_device() vdpa_sim: Fix return value check for vdpa_alloc_device() vhost: Fix the calculation in vhost_overflow() vhost-vdpa: Fix integer overflow in vhost_vdpa_process_iotlb_update() virtio_pci: Support surprise removal of virtio pci device virtio: Protect vqs list access virtio: Keep vring_del_virtqueue() mirror of VQ create virtio: Improve vq->broken access to avoid any compiler optimization
2021-08-14Merge tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-blockLinus Torvalds1-3/+11
Pull block fixes from Jens Axboe: "A few fixes for block that should go into 5.14: - Revert the mq-deadline cgroup addition. More work is needed on this front, let's revert it for now and get it right before having it in a released kernel (Tejun) - blk-iocost lockdep fix (Ming) - nbd double completion fix (Xie) - Fix for non-idling when clearing the shared tag flag (Yu)" * tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block: nbd: Aovid double completion of a request blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED Revert "block/mq-deadline: Add cgroup support" blk-iocost: fix lockdep warning on blkcg->lock
2021-08-13nbd: reduce the nbd_index_mutex scopeChristoph Hellwig1-27/+28
nbd_index_mutex is currently held over add_disk and inside ->open, which leads to lock order reversals. Refactor the device creation code path so that nbd_dev_add is called without nbd_index_mutex lock held and only takes it for the IDR insertation. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-7-hch@lst.de [axboe: fix whitespace] Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13nbd: refactor device search and allocation in nbd_genl_connectChristoph Hellwig1-31/+14
Use idr_for_each_entry instead of the awkward callback to find an existing device for the index == -1 case, and de-duplicate the device allocation if no existing device was found. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-6-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13nbd: return the allocated nbd_device from nbd_dev_addChristoph Hellwig1-12/+9
Return the device we just allocated instead of doing an extra search for it in the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-5-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13nbd: remove nbd_del_diskChristoph Hellwig1-12/+5
Fold nbd_del_disk and remove the pointless NULL check on ->disk given that it is always set for a successfully allocated nbd_device structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-4-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13nbd: refactor device removalChristoph Hellwig1-24/+13
Share common code for the synchronous and workqueue based device removal, and remove the pointless use of refcount_dec_and_mutex_lock. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-3-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13nbd: do del_gendisk() asynchronously for NBD_DESTROY_ON_DISCONNECTHou Tao1-9/+61
Now open_mutex is used to synchronize partition operations (e.g, blk_drop_partitions() and blkdev_reread_part()), however it makes nbd driver broken, because nbd may call del_gendisk() in nbd_release() or nbd_genl_disconnect() if NBD_CFLAG_DESTROY_ON_DISCONNECT is enabled, and deadlock occurs, as shown below: // AB-BA dead-lock nbd_genl_disconnect blkdev_open nbd_disconnect_and_put lock bd_mutex // last ref nbd_put lock nbd_index_mutex del_gendisk nbd_open try lock nbd_index_mutex try lock bd_mutex or // AA dead-lock nbd_release lock bd_mutex nbd_put try lock bd_mutex Instead of fixing block layer (e.g, introduce another lock), fixing the nbd driver to call del_gendisk() in a kworker when NBD_DESTROY_ON_DISCONNECT is enabled. When NBD_DESTROY_ON_DISCONNECT is disabled, nbd device will always be destroy through module removal, and there is no risky of deadlock. To ensure the reuse of nbd index succeeds, moving the calling of idr_remove() after del_gendisk(), so if the reused index is not found in nbd_index_idr, the old disk must have been deleted. And reusing the existing destroy_complete mechanism to ensure nbd_genl_connect() will wait for the completion of del_gendisk(). Also adding a new workqueue for nbd removal, so nbd_cleanup() can ensure all removals complete before exits. Reported-by: syzbot+0fe7752e52337864d29b@syzkaller.appspotmail.com Fixes: c76f48eb5c08 ("block: take bd_mutex around delete_partitions in del_gendisk") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210811124428.2368491-2-hch@lst.de Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13nbd: add the check to prevent overflow in __nbd_ioctl()Baokun Li1-2/+4
If user specify a large enough value of NBD blocks option, it may trigger signed integer overflow which may lead to nbd->config->bytesize becomes a large or small value, zero in particular. UBSAN: Undefined behaviour in drivers/block/nbd.c:325:31 signed integer overflow: 1024 * 4611686155866341414 cannot be represented in type 'long long int' [...] Call trace: [...] handle_overflow+0x188/0x1dc lib/ubsan.c:192 __ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213 nbd_size_set drivers/block/nbd.c:325 [inline] __nbd_ioctl drivers/block/nbd.c:1342 [inline] nbd_ioctl+0x998/0xa10 drivers/block/nbd.c:1395 __blkdev_driver_ioctl block/ioctl.c:311 [inline] [...] Although it is not a big deal, still silence the UBSAN by limit the input value. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20210804021212.990223-1-libaokun1@huawei.com [axboe: dropped unlikely()] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-13nbd: Aovid double completion of a requestXie Yongji1-3/+11
There is a race between iterating over requests in nbd_clear_que() and completing requests in recv_work(), which can lead to double completion of a request. To fix it, flush the recv worker before iterating over the requests and don't abort the completed request while iterating. Fixes: 96d97e17828f ("nbd: clear_sock on netlink disconnect") Reported-by: Jiang Yadong <jiangyadong@bytedance.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20210813151330.96-1-xieyongji@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-12sx8: use the internal state machine to check if del_gendisk needs to be calledChristoph Hellwig1-1/+1
Remove usage of the block layer internal GENHD_FL_UP by just looking at the host state. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210809064028.1198327-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-12block: move some macros to blkdev.hGuoqing Jiang2-7/+0
Move them (PAGE_SECTORS_SHIFT, PAGE_SECTORS and SECTOR_MASK) to the generic header file to remove redundancy. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Link: https://lore.kernel.org/r/20210721025315.1729118-1-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-11virtio-blk: Add validation for block size in config spaceXie Yongji1-6/+33
An untrusted device might presents an invalid block size in configuration space. This tries to add validation for it in the validate callback and clear the VIRTIO_BLK_F_BLK_SIZE feature bit if the value is out of the supported range. And we also double check the value in virtblk_probe() in case that it's changed after the validation. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20210809101609.148-1-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2021-08-10xen-blkfront: Remove redundant assignment to variable errColin Ian King1-1/+0
The variable err is being assigned a value that is never read, the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20210806110601.11386-1-colin.king@canonical.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09block: move the bdi from the request_queue to the gendiskChristoph Hellwig2-8/+5
The backing device information only makes sense for file system I/O, and thus belongs into the gendisk and not the lower level request_queue structure. Move it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210809141744.1203023-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09block: pass a gendisk to blk_queue_update_readaheadChristoph Hellwig1-1/+1
.. and rename the function to disk_update_readahead. This is in preparation for moving the BDI from the request_queue to the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210809141744.1203023-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-07Merge tag 'block-5.14-2021-08-07' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull block fixes from Jens Axboe: "A few minor fixes: - Fix ldm kernel-doc warning (Bart) - Fix adding offset twice for DMA address in n64cart (Christoph) - Fix use-after-free in dasd path handling (Stefan) - Order kyber insert trace correctly (Vincent) - raid1 errored write handling fix (Wei) - Fix blk-iolatency queue get failure handling (Yu)" * tag 'block-5.14-2021-08-07' of git://git.kernel.dk/linux-block: kyber: make trace_block_rq call consistent with documentation block/partitions/ldm.c: Fix a kernel-doc warning blk-iolatency: error out if blk_get_queue() failed in iolatency_set_limit() n64cart: fix the dma address in n64cart_do_bvec s390/dasd: fix use after free in dasd path handling md/raid10: properly indicate failure when ending a failed write request
2021-08-05loop: Select I/O scheduler 'none' from inside add_disk()Bart Van Assche1-1/+2
We noticed that the user interface of Android devices becomes very slow under memory pressure. This is because Android uses the zram driver on top of the loop driver for swapping, because under memory pressure the swap code alternates reads and writes quickly, because mq-deadline is the default scheduler for loop devices and because mq-deadline delays writes by five seconds for such a workload with default settings. Fix this by making the kernel select I/O scheduler 'none' from inside add_disk() for loop devices. This default can be overridden at any time from user space, e.g. via a udev rule. This approach has an advantage compared to changing the I/O scheduler from userspace from 'mq-deadline' into 'none', namely that synchronize_rcu() does not get called. This patch changes the default I/O scheduler for loop devices from 'mq-deadline' into 'none'. Additionally, this patch reduces the Android boot time on my test setup with 0.5 seconds compared to configuring the loop I/O scheduler from user space. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Martijn Coenen <maco@android.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210805174200.3250718-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-04n64cart: fix the dma address in n64cart_do_bvecChristoph Hellwig1-1/+1
dma_map_bvec already takes bv_offset into account. Fixes: 9b2a2bbbb4d0 ("block: Add n64 cart driver") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-02block/rnbd: Use sysfs_emit instead of s*printf function for sysfs showMd Haris Iqbal2-25/+22
sysfs_emit function was added to be aware of the PAGE_SIZE maximum of the temporary buffer used for outputting sysfs content, so there is no possible overruns. So replace the uses of any s*printf functions for the sysfs show functions with sysfs_emit. Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20210726115950.470543-3-jinpu.wang@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-02block/rnbd-clt: Use put_cpu_ptr after get_cpu_ptrGioh Kim1-1/+1
This patch replaces put_cpu_var with put_cpu_ptr because get_cpu_ptr should be paired with put_cpu_ptr. Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20210726115950.470543-2-jinpu.wang@ionos.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-02loop: raise media_change eventMatteo Croce1-0/+5
Make the loop device raise a DISK_MEDIA_CHANGE event on attach or detach. # udevadm monitor -up |grep -e DISK_MEDIA_CHANGE -e DEVNAME & # losetup -f zero [ 7.454235] loop0: detected capacity change from 0 to 16384 DISK_MEDIA_CHANGE=1 DEVNAME=/dev/loop0 DEVNAME=/dev/loop0 DEVNAME=/dev/loop0 # losetup -f zero [ 10.205245] loop1: detected capacity change from 0 to 16384 DISK_MEDIA_CHANGE=1 DEVNAME=/dev/loop1 DEVNAME=/dev/loop1 DEVNAME=/dev/loop1 # losetup -f zero2 [ 13.532368] loop2: detected capacity change from 0 to 40960 DISK_MEDIA_CHANGE=1 DEVNAME=/dev/loop2 DEVNAME=/dev/loop2 # losetup -D DEVNAME=/dev/loop1 DISK_MEDIA_CHANGE=1 DEVNAME=/dev/loop1 DEVNAME=/dev/loop2 DISK_MEDIA_CHANGE=1 DEVNAME=/dev/loop2 DEVNAME=/dev/loop0 DISK_MEDIA_CHANGE=1 DEVNAME=/dev/loop0 Signed-off-by: Matteo Croce <mcroce@microsoft.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Luca Boccassi <bluca@debian.org> Link: https://lore.kernel.org/r/20210712230530.29323-7-mcroce@linux.microsoft.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-02loop: don't grab a reference to the block deviceChristoph Hellwig1-5/+0
The whole device block device won't be removed while the disk is still alive, so don't bother to grab a reference to it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Ming Lei <ming.lei@rehat.com> Reviewed-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Link: https://lore.kernel.org/r/20210722075402.983367-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-02ps3disk: use memcpy_{from,to}_bvecChristoph Hellwig1-16/+2
Use the bvec helpers instead of open coding the copy. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Tested-by: Geoff Levand <geoff@infradead.org> Link: https://lore.kernel.org/r/20210727055646.118787-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-02rbd: use memzero_bvecChristoph Hellwig1-13/+2
Use memzero_bvec instead of reimplementing it. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20210727055646.118787-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-30Merge tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-blockLinus Torvalds1-31/+97
Pull block fixes from Jens Axboe: - gendisk freeing fix (Christoph) - blk-iocost wake ordering fix (Tejun) - tag allocation error handling fix (John) - loop locking fix. While this isn't the prettiest fix in the world, nobody has any good alternatives for 5.14. Something to likely revisit for 5.15. (Tetsuo) * tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-block: block: delay freeing the gendisk blk-iocost: fix operation ordering in iocg_wake_fn() blk-mq-sched: Fix blk_mq_sched_alloc_tags() error handling loop: reintroduce global lock for safe loop_validate_file() traversal
2021-07-29scsi: core: Rename CONFIG_BLK_SCSI_REQUEST to CONFIG_SCSI_COMMONChristoph Hellwig1-1/+1
CONFIG_BLK_SCSI_REQUEST is rather misnamed as it enables building a small amount of code shared by the SCSI initiator, target, and consumers of the scsi_request passthrough API. Rename it and also allow building it as a module. [mkp: add module license] Link: https://lore.kernel.org/r/20210724072033.1284840-20-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-29scsi: cdrom: Remove the call to scsi_cmd_blk_ioctl() from cdrom_ioctl()Christoph Hellwig2-2/+0
Only the sr driver can handle SCSI passthrough requests, so move the call to scsi_cmd_blk_ioctl() there. Link: https://lore.kernel.org/r/20210724072033.1284840-9-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-23loop: reintroduce global lock for safe loop_validate_file() traversalTetsuo Handa1-31/+97
Commit 6cc8e7430801fa23 ("loop: scale loop device by introducing per device lock") re-opened a race window for NULL pointer dereference at loop_validate_file() where commit 310ca162d779efee ("block/loop: Use global lock for ioctl() operation.") has closed. Although we need to guarantee that other loop devices will not change during traversal, we can't take remote "struct loop_device"->lo_mutex inside loop_validate_file() in order to avoid AB-BA deadlock. Therefore, introduce a global lock dedicated for loop_validate_file() which is conditionally taken before local "struct loop_device"->lo_mutex is taken. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 6cc8e7430801fa23 ("loop: scale loop device by introducing per device lock") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-21rbd: resurrect setting of disk->private_data in rbd_init_disk()Ilya Dryomov1-0/+1
rbd_open() and rbd_release() expect that disk->private_data is set to rbd_dev. Otherwise we hit a NULL pointer dereference when mapping the image. URL: https://tracker.ceph.com/issues/51759 Fixes: 195b1956b85b ("rbd: use blk_mq_alloc_disk and blk_cleanup_disk") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-07-20rbd: don't hold lock_rwsem while running_list is being drainedIlya Dryomov1-7/+5
Currently rbd_quiesce_lock() holds lock_rwsem for read while blocking on releasing_wait completion. On the I/O completion side, each image request also needs to take lock_rwsem for read. Because rw_semaphore implementation doesn't allow new readers after a writer has indicated interest in the lock, this can result in a deadlock if something that needs to take lock_rwsem for write gets involved. For example: 1. watch error occurs 2. rbd_watch_errcb() takes lock_rwsem for write, clears owner_cid and releases lock_rwsem 3. after reestablishing the watch, rbd_reregister_watch() takes lock_rwsem for write and calls rbd_reacquire_lock() 4. rbd_quiesce_lock() downgrades lock_rwsem to for read and blocks on releasing_wait until running_list becomes empty 5. another watch error occurs 6. rbd_watch_errcb() blocks trying to take lock_rwsem for write 7. no in-flight image request can complete and delete itself from running_list because lock_rwsem won't be granted anymore A similar scenario can occur with "lock has been acquired" and "lock has been released" notification handers which also take lock_rwsem for write to update owner_cid. We don't actually get anything useful from sitting on lock_rwsem in rbd_quiesce_lock() -- owner_cid updates certainly don't need to be synchronized with. In fact the whole owner_cid tracking logic could probably be removed from the kernel client because we don't support proxied maintenance operations. Cc: stable@vger.kernel.org # 5.3+ URL: https://tracker.ceph.com/issues/42757 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Robin Geuze <robin.geuze@nl.team.blue>
2021-07-20rbd: always kick acquire on "acquired" and "released" notificationsIlya Dryomov1-13/+7
Skipping the "lock has been released" notification if the lock owner is not what we expect based on owner_cid can lead to I/O hangs. One example is our own notifications: because owner_cid is cleared in rbd_unlock(), when we get our own notification it is processed as unexpected/duplicate and maybe_kick_acquire() isn't called. If a peer that requested the lock then doesn't go through with acquiring it, I/O requests that came in while the lock was being quiesced would be stalled until another I/O request is submitted and kicks acquire from rbd_img_exclusive_lock(). This makes the comment in rbd_release_lock() actually true: prior to this change the canceled work was being requeued in response to the "lock has been acquired" notification from rbd_handle_acquired_lock(). Cc: stable@vger.kernel.org # 5.3+ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Robin Geuze <robin.geuze@nl.team.blue>
2021-07-16Merge tag 'block-5.14-2021-07-16' of git://git.kernel.dk/linux-blockLinus Torvalds3-200/+28
Pull block fixes from Jens Axboe: - NVMe fixes via Christoph: - fix various races in nvme-pci when shutting down just after probing (Casey Chen) - fix a net_device leak in nvme-tcp (Prabhakar Kushwaha) - Fix regression in xen-blkfront by cleaning up the removal state machine (Christoph) - Fix tag_set and queue cleanup ordering regression in nbd (Wang) - Fix tag_set and queue cleanup ordering regression in pd (Guoqing) * tag 'block-5.14-2021-07-16' of git://git.kernel.dk/linux-block: xen-blkfront: sanitize the removal state machine nbd: fix order of cleaning up the queue and freeing the tagset pd: fix order of cleaning up the queue and freeing the tagset nvme-pci: do not call nvme_dev_remove_admin from nvme_remove nvme-pci: fix multiple races in nvme_setup_io_queues nvme-tcp: use __dev_get_by_name instead dev_get_by_name for OPT_HOST_IFACE
2021-07-15xen-blkfront: sanitize the removal state machineChristoph Hellwig1-198/+26
xen-blkfront has a weird protocol where close message from the remote side can be delayed, and where hot removals are treated somewhat differently from regular removals, all leading to potential NULL pointer removals, and a del_gendisk from the block device release method, which will deadlock. Fix this by just performing normal hot removals even when the device is opened like all other Linux block drivers. Fixes: c76f48eb5c08 ("block: take bd_mutex around delete_partitions in del_gendisk") Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20210715141711.1257293-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-15nbd: fix order of cleaning up the queue and freeing the tagsetWang Qing1-1/+1
We must release the queue before freeing the tagset. Fixes: 4af5f2e03013 ("nbd: use blk_mq_alloc_disk and blk_cleanup_disk") Reported-and-tested-by: syzbot+9ca43ff47167c0ee3466@syzkaller.appspotmail.com Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210706040016.1360412-1-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-15pd: fix order of cleaning up the queue and freeing the tagsetGuoqing Jiang1-1/+1
We must release the queue before freeing the tagset. Fixes: 262d431f9000 ("pd: use blk_mq_alloc_disk and blk_cleanup_disk") Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210706010734.1356066-1-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-09Merge tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-blockLinus Torvalds5-210/+172
Pull more block updates from Jens Axboe: "A combination of changes that ended up depending on both the driver and core branch (and/or the IDE removal), and a few late arriving fixes. In detail: - Fix io ticks wrap-around issue (Chunguang) - nvme-tcp sock locking fix (Maurizio) - s390-dasd fixes (Kees, Christoph) - blk_execute_rq polling support (Keith) - blk-cgroup RCU iteration fix (Yu) - nbd backend ID addition (Prasanna) - Partition deletion fix (Yufen) - Use blk_mq_alloc_disk for mmc, mtip32xx, ubd (Christoph) - Removal of now dead block request types due to IDE removal (Christoph) - Loop probing and control device cleanups (Christoph) - Device uevent fix (Christoph) - Misc cleanups/fixes (Tetsuo, Christoph)" * tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block: (34 commits) blk-cgroup: prevent rcu_sched detected stalls warnings while iterating blkgs block: fix the problem of io_ticks becoming smaller nvme-tcp: can't set sk_user_data without write_lock loop: remove unused variable in loop_set_status() block: remove the bdgrab in blk_drop_partitions block: grab a device refcount in disk_uevent s390/dasd: Avoid field over-reading memcpy() dasd: unexport dasd_set_target_state block: check disk exist before trying to add partition ubd: remove dead code in ubd_setup_common nvme: use return value from blk_execute_rq() block: return errors from blk_execute_rq() nvme: use blk_execute_rq() for passthrough commands block: support polling through blk_execute_rq block: remove REQ_OP_SCSI_{IN,OUT} block: mark blk_mq_init_queue_data static loop: rewrite loop_exit using idr_for_each_entry loop: split loop_lookup loop: don't allow deleting an unspecified loop device loop: move loop_ctl_mutex locking into loop_add ...
2021-07-09Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-8/+9
Pull virtio,vhost,vdpa updates from Michael Tsirkin: - Doorbell remapping for ifcvf, mlx5 - virtio_vdpa support for mlx5 - Validate device input in several drivers (for SEV and friends) - ZONE_MOVABLE aware handling in virtio-mem - Misc fixes, cleanups * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits) virtio-mem: prioritize unplug from ZONE_MOVABLE in Big Block Mode virtio-mem: simplify high-level unplug handling in Big Block Mode virtio-mem: prioritize unplug from ZONE_MOVABLE in Sub Block Mode virtio-mem: simplify high-level unplug handling in Sub Block Mode virtio-mem: simplify high-level plug handling in Sub Block Mode virtio-mem: use page_zonenum() in virtio_mem_fake_offline() virtio-mem: don't read big block size in Sub Block Mode virtio/vdpa: clear the virtqueue state during probe vp_vdpa: allow set vq state to initial state after reset virtio-pci library: introduce vp_modern_get_driver_features() vdpa: support packed virtqueue for set/get_vq_state() virtio-ring: store DMA metadata in desc_extra for split virtqueue virtio: use err label in __vring_new_virtqueue() virtio_ring: introduce virtqueue_desc_add_split() virtio_ring: secure handling of mapping errors virtio-ring: factor out desc_extra allocation virtio_ring: rename vring_desc_extra_packed virtio-ring: maintain next in extra state for packed virtqueue vdpa/mlx5: Clear vq ready indication upon device reset vdpa/mlx5: Add support for doorbell bypassing ...
2021-07-05Merge tag 'char-misc-5.14-rc1' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc driver updates from Greg KH: "Here is the big set of char / misc and other driver subsystem updates for 5.14-rc1. Included in here are: - habanalabs driver updates - fsl-mc driver updates - comedi driver updates - fpga driver updates - extcon driver updates - interconnect driver updates - mei driver updates - nvmem driver updates - phy driver updates - pnp driver updates - soundwire driver updates - lots of other tiny driver updates for char and misc drivers This is looking more and more like the "various driver subsystems mushed together" tree... All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits) mcb: Use DEFINE_RES_MEM() helper macro and fix the end address PNP: moved EXPORT_SYMBOL so that it immediately followed its function/variable bus: mhi: pci-generic: Add missing 'pci_disable_pcie_error_reporting()' calls bus: mhi: Wait for M2 state during system resume bus: mhi: core: Fix power down latency intel_th: Wait until port is in reset before programming it intel_th: msu: Make contiguous buffers uncached intel_th: Remove an unused exit point from intel_th_remove() stm class: Spelling fix nitro_enclaves: Set Bus Master for the NE PCI device misc: ibmasm: Modify matricies to matrices misc: vmw_vmci: return the correct errno code siox: Simplify error handling via dev_err_probe() fpga: machxo2-spi: Address warning about unused variable lkdtm/heap: Add init_on_alloc tests selftests/lkdtm: Enable various testable CONFIGs lkdtm: Add CONFIG hints in errors where possible lkdtm: Enable DOUBLE_FAULT on all architectures lkdtm/heap: Add vmalloc linear overflow test lkdtm/bugs: XFAIL UNALIGNED_LOAD_STORE_WRITE ...
2021-07-03virtio-blk: limit seg_max to a safe valueStefan Hajnoczi1-1/+7
The struct virtio_blk_config seg_max value is read from the device and incremented by 2 to account for the request header and status byte descriptors added by the driver. In preparation for supporting untrusted virtio-blk devices, protect against integer overflow and limit the value to a safe maximum. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20210524154020.98195-1-stefanha@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03virtio-blk: Fix memory leak among suspend/resume procedureXie Yongji1-0/+2
The vblk->vqs should be freed before we call init_vqs() in virtblk_restore(). Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20210517084332.280-1-xieyongji@bytedance.com Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03virtio_blk: cleanups: remove check obsoleted by CONFIG_LBDAF removalSohaib1-7/+0
Prior to 72deb455b5ec ("block: remove CONFIG_LBDAF"), it was optional if the 32-bit kernel support block device and/or file sizes larger than 2 TiB (considering the sector size is 512 bytes) But now sector_t and blkcnt_t are always 64-bit in size. Suggested-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Sohaib Mohammed <sohaib.amhmd@gmail.com> Link: https://lore.kernel.org/r/20210430103611.77345-1-sohaib.amhmd@gmail.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+1
Merge more updates from Andrew Morton: "190 patches. Subsystems affected by this patch series: mm (hugetlb, userfaultfd, vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock, migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap, zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc, core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs, signals, exec, kcov, selftests, compress/decompress, and ipc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) ipc/util.c: use binary search for max_idx ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock ipc: use kmalloc for msg_queue and shmid_kernel ipc sem: use kvmalloc for sem_undo allocation lib/decompressors: remove set but not used variabled 'level' selftests/vm/pkeys: exercise x86 XSAVE init state selftests/vm/pkeys: refill shadow register after implicit kernel write selftests/vm/pkeys: handle negative sys_pkey_alloc() return code selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random kcov: add __no_sanitize_coverage to fix noinstr for all architectures exec: remove checks in __register_bimfmt() x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned hfsplus: report create_date to kstat.btime hfsplus: remove unnecessary oom message nilfs2: remove redundant continue statement in a while-loop kprobes: remove duplicated strong free_insn_page in x86 and s390 init: print out unknown kernel parameters checkpatch: do not complain about positive return values starting with EPOLL checkpatch: improve the indented label test checkpatch: scripts/spdxcheck.py now requires python3 ...
2021-07-02loop: remove unused variable in loop_set_status()Tetsuo Handa1-2/+0
Commit 0384264ea8a39bd9 ("block: pass a gendisk to bdev_disk_changed") changed to pass lo->lo_disk instead of lo->lo_device. Fixes: 0384264ea8a3 ("block: pass a gendisk to bdev_disk_changed") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/20210702152714.7978-1-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2-6/+4
Pull rdma updates from Jason Gunthorpe: "This contains a replacement driver for Intel iWarp hardware. This new driver supports the old ethernet hardware and also newer chips that can do ROCE. Other than that, this contains the typical mix of patches: - Driver updates and cleanups for bnxt_re, cxgb4, mlx4, and mlx5 - Many static checker driven code clean ups, including a wide refcount_t conversion - Several series for the hns driver, more HIP09 HW capabilities, migration to new HW register manipulators, and code cleanups - Minor fixes and improvements in srp, rts, and cm - Improvements throughout for sysfs related code to use DEVICE_ATTR_*, make the ib_port sysfs first-class, and overall use sysfs APIs properly - Intel's new irdma driver replacing i40iw - rxe general clean ups and Memory Window support" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (211 commits) RDMA/core: Always release restrack object RDMA/mlx5: Don't access NULL-cleared mpi pointer RDMA/irdma: Fix potential overflow expression in irdma_prm_get_pbles RDMA/irdma: Check contents of user-space irdma_mem_reg_req object RDMA/rxe: Missing unlock on error in get_srq_wqe() RDMA/cma: Fix rdma_resolve_route() memory leak RDMA/core/sa_query: Remove unused argument RDMA/cma: Fix incorrect Packet Lifetime calculation RDMA/cma: Protect RMW with qp_mutex RDMA/cma: Remove unnecessary INIT->INIT transition RDMA/hns: Add window selection field of congestion control RDMA/hfi1: Remove use of kmap() RDMA/irdma: Remove use of kmap() RDMA/bnxt_re: Fix uninitialized struct bit field rsvd1 IB/isert: Align target max I/O size to initiator size RDMA/hns: Fix incorrect vlan enable bit in QPC MAINTAINERS: Update Broadcom RDMA maintainers RDMA/irdma: Use the queried port attributes RDMA/rxe: Fix redundant skb_put_zero RDMA/rxe: Fix extra copy in prepare_ack_packet ...