path: root/drivers/block/null_blk/main.c
Age | Commit message | Author | Files | Lines
2024-04-02 | nullblk: Fix cleanup order in null_add_dev() error path | Damien Le Moal | 1 | -2/+2

In null_add_dev(), if an error happens after initializing the resources for a zoned null block device, we must free these resources before exiting the function. To ensure this, move the out_cleanup_zone label after out_cleanup_disk, as we jump to this latter label if an error happens after calling null_init_zoned_dev().

Fixes: e440626b1caf ("null_blk: pass queue_limits to blk_mq_alloc_disk")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240330005300.1503252-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

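The label ordering described in this commit follows the usual goto-unwind idiom: cleanup labels appear in the reverse order of acquisition, so a jump to a later-stage label falls through every earlier-stage cleanup. The following is a minimal userspace sketch of that idiom, not the driver's code; `struct dev_state`, `add_dev()` and the `fail_at` parameter are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the resources a device setup acquires. */
struct dev_state {
	bool zone_ready;	/* acquired first (zoned resources) */
	bool disk_ready;	/* acquired second (disk allocation) */
	bool registered;	/* final step */
};

/*
 * fail_at: 0 = success, 1 = zone init fails, 2 = disk allocation fails,
 * 3 = registration fails. Purely illustrative.
 */
static int add_dev(struct dev_state *s, int fail_at)
{
	if (fail_at == 1)
		goto out;
	s->zone_ready = true;

	if (fail_at == 2)
		goto out_cleanup_zone;
	s->disk_ready = true;

	if (fail_at == 3)
		goto out_cleanup_disk;
	s->registered = true;
	return 0;

	/* Labels in reverse order of acquisition: disk first, then zone. */
out_cleanup_disk:
	s->disk_ready = false;
out_cleanup_zone:
	s->zone_ready = false;
out:
	return -1;
}
```

Because `out_cleanup_zone` sits after `out_cleanup_disk`, a failure late in setup releases both resources, while a failure right after zone setup releases only the zone state.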
2024-02-22 | null_blk: Delete nullb.{queue_depth, nr_queues} | John Garry | 1 | -11/+0

Since commit 8b631f9cf0b8 ("null_blk: remove the bio based I/O path"), struct nullb members queue_depth and nr_queues are only ever written, so delete them. With that, null_exit_hctx() can also be deleted.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20240222083420.6026-1-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2024-02-20 | null_blk: pass queue_limits to blk_mq_alloc_disk | Christoph Hellwig | 1 | -21/+20

Pass the queue limits directly to blk_mq_alloc_disk instead of setting them one at a time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240220093248.3290292-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2024-02-20 | null_blk: remove null_gendisk_register | Christoph Hellwig | 1 | -25/+16

null_gendisk_register isn't a very useful abstraction given that it doesn't even allocate the gendisk. Merge it into the only caller instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240220093248.3290292-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2024-02-20 | null_blk: refactor tag_set setup | Christoph Hellwig | 1 | -55/+51

Move the tagset initialization out of null_add_dev into a new null_setup_tagset helper, and move the shared vs local differences out of null_init_tag_set into the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240220093248.3290292-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2024-02-20 | null_blk: initialize the tag_set timeout in null_init_tag_set | Christoph Hellwig | 1 | -1/+1

Otherwise the timeout is reset to the same, unchanging value every time a device using the shared tag_set is initialized.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240220093248.3290292-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2024-02-20 | null_blk: remove the bio based I/O path | Christoph Hellwig | 1 | -301/+64

The bio based I/O path complicates null_blk and also makes various data structures, including the per-command one, far bigger than required for the main request based interface. As the bio-based path is mostly used by stacking drivers and simple memory based drivers, and brd is a good example driver for the latter, there is no need to have a bio based path in null_blk. Remove the path to simplify the driver and make future block layer API changes simpler by not having to deal with the complex two-API setup in null_blk.

Note that the queue_mode field in struct nullb_device is kept, as that is simpler than having two different places to check the value and fully open coding the debugfs helpers, since the existing ones won't work without a named struct member.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240220093248.3290292-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2024-02-20 | block: pass a queue_limits argument to blk_alloc_disk | Christoph Hellwig | 1 | -3/+4

Pass a queue_limits to blk_alloc_disk and apply it if non-NULL. This will allow allocating queues with valid queue limits instead of setting the values one at a time later.

Also change blk_alloc_disk to return an ERR_PTR instead of just NULL which can't distinguish errors.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20240215071055.2201424-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

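The difference between returning NULL and returning an ERR_PTR can be illustrated with a userspace approximation of the kernel's pointer-encoded error convention. This is a sketch under stated assumptions: `err_ptr()`, `ptr_err()`, `is_err()` and `alloc_disk_sketch()` are hypothetical lower-case stand-ins written for this example, not the kernel helpers or the real allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the top 4095 values of the address space. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an encoded pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the reserved error range. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_disk;	/* placeholder object for the success case */

/* Hypothetical allocator: distinct failures yield distinct error codes. */
static void *alloc_disk_sketch(bool limits_valid, bool have_memory)
{
	if (!limits_valid)
		return err_ptr(-EINVAL);
	if (!have_memory)
		return err_ptr(-ENOMEM);
	return &dummy_disk;
}
```

A caller that only sees NULL cannot tell invalid limits from an allocation failure; with the encoded pointer it can propagate the specific error code.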
2024-02-13 | block: pass a queue_limits argument to blk_mq_alloc_disk | Christoph Hellwig | 1 | -1/+1

Pass a queue_limits to blk_mq_alloc_disk and apply it if non-NULL. This will allow allocating queues with valid queue limits instead of setting the values one at a time later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20240213073425.1621680-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2024-02-08 | null_blk: add configfs variable shared_tags | Shin'ichiro Kawasaki | 1 | -18/+20

Allow setting shared_tags through configfs; previously it could only be set as a module parameter. For that purpose, delay tag_set initialization from null_init() to null_add_dev(). Use tag_set.ops as the flag to check whether tag_set is initialized.

The following parameters cannot be set through configfs yet:

    timeout
    requeue
    init_hctx

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20240130042134.2463659-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2024-01-14 | null_blk: Remove usage of the deprecated ida_simple_xx() API | Christophe JAILLET | 1 | -2/+2

ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove().

This is less verbose.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/bf257b1078475a415cdc3344c6a750842946e367.1705222845.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2023-12-29 | null_blk: use the default discard granularity | Christoph Hellwig | 1 | -1/+0

The discard granularity now defaults to a single sector, so don't set that value explicitly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20231228075545.362768-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2023-12-27 | null_blk: don't cap max_hw_sectors to BLK_DEF_MAX_SECTORS | Christoph Hellwig | 1 | -10/+2

null_blk has some rather odd capping of the max_hw_sectors value to BLK_DEF_MAX_SECTORS, which doesn't make sense: max_hw_sectors is the hardware limit, and BLK_DEF_MAX_SECTORS, despite the confusing name, is the default cap for the max_sectors field used for normal file system I/O.

Remove all the capping, and simply leave it to the block layer or user to take up or not all of that for file system I/O.

Fixes: ea17fd354ca8 ("null_blk: Allow controlling max_hw_sectors limit")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20231227092305.279567-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2023-11-20 | block/null_blk: Fix double blk_mq_start_request() warning | Chengming Zhou | 1 | -12/+13

When CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION is enabled, null_queue_rq() may return BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE for a request that has already been marked as MQ_RQ_IN_FLIGHT by blk_mq_start_request().

null_queue_rqs() then puts these requests in the rqlist and returns to the block layer core, which tries to queue them individually again, so the warning in blk_mq_start_request() is triggered.

Fix it by splitting null_queue_rq() into two parts: the first is the preparation of the request, the second is the handling of the request. blk_mq_start_request() is called after the preparation part, which may fail and return to the block layer core.

The throttling also belongs to the preparation part, so move it before blk_mq_start_request(). Also change the return type of null_handle_cmd() to void, since it now always returns BLK_STS_OK.

Reported-by: <syzbot+fcc47ba2476570cbbeb0@syzkaller.appspotmail.com>
Closes: https://lore.kernel.org/all/0000000000000e6aac06098aee0c@google.com/
Fixes: d78bfa1346ab ("block/null_blk: add queue_rqs() support")
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Link: https://lore.kernel.org/r/20231120032521.1012037-1-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2023-10-04 | null_blk: replace strncpy with strscpy | Justin Stitt | 1 | -1/+1

`strncpy` is deprecated for use on NUL-terminated destination strings [1].

We should favor a more robust and less ambiguous interface.

We expect that both `nullb->disk_name` and `disk->disk_name` be NUL-terminated:
| snprintf(nullb->disk_name, sizeof(nullb->disk_name),
|          "%s", config_item_name(&dev->group.cg_item));
...
| pr_info("disk %s created\n", nullb->disk_name);

It seems like NUL-padding may be required due to __assign_disk_name() utilizing a memcpy as opposed to a `str*cpy` api.
| static inline void __assign_disk_name(char *name, struct gendisk *disk)
| {
|         if (disk)
|                 memcpy(name, disk->disk_name, DISK_NAME_LEN);
|         else
|                 memset(name, 0, DISK_NAME_LEN);
| }

Then we go and print it with `__print_disk_name` which wraps `nullb_trace_disk_name()`.
| #define __print_disk_name(name) nullb_trace_disk_name(p, name)

This function obviously expects a NUL-terminated string.
| const char *nullb_trace_disk_name(struct trace_seq *p, char *name)
| {
|         const char *ret = trace_seq_buffer_ptr(p);
|
|         if (name && *name)
|                 trace_seq_printf(p, "disk=%s, ", name);
|         trace_seq_putc(p, 0);
|
|         return ret;
| }

From the above, we need both 1) a NUL-terminated string and 2) a NUL-padded string. So, let's use strscpy_pad() as per Kees' suggestion from v1.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Cc: Kees Cook <keescook@chromium.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230919-strncpy-drivers-block-null_blk-main-c-v3-1-10cf0a87a2c3@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

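The two properties this commit asks for, a guaranteed NUL terminator and NUL padding of the whole destination, can be shown with a small userspace approximation. This is only a sketch: `copy_pad()` is a hypothetical helper written for this example, it returns -1 on truncation as a simplification, and unlike the kernel function it assumes `src` is itself NUL-terminated because it calls `strlen()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst (capacity size), always NUL-terminating and
 * zero-filling any unused tail. Returns the copied length, or -1 if
 * the source had to be truncated or size is 0.
 */
static long copy_pad(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -1;

	len = strlen(src);
	if (len >= size) {
		/* Truncate, but still terminate; no room left to pad. */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -1;
	}

	memcpy(dst, src, len);
	memset(dst + len, 0, size - len);	/* terminator plus padding */
	return (long)len;
}
```

With a fully padded destination, a later fixed-length `memcpy()` of the buffer, as in `__assign_disk_name()` quoted above, copies no stale bytes past the terminator.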
2023-09-22 | block/null_blk: add queue_rqs() support | Chengming Zhou | 1 | -0/+20

Add batched mq_ops.queue_rqs() support in null_blk for testing. The implementation is quite simple since null_blk doesn't have commit_rqs().

We simply handle each request one by one; if errors are encountered, the failed requests are left in the passed-in list and we return.

There is about a 3.6% improvement in IOPS of fio/t/io_uring on null_blk with hw_queue_depth=256 on my test VM, from 1.09M to 1.13M.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230913151616.3164338-6-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>

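The batching behaviour described here (process each request in turn and hand back only the failures in the caller's list) can be sketched in userspace with a singly linked list. All names below (`struct req`, `queue_one()`, `queue_batch()`) are hypothetical and do not reflect the blk-mq API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request: should_fail simulates a per-request error. */
struct req {
	int id;
	int should_fail;
	int done;
	struct req *next;
};

/* Handle a single request; returns 0 on success, -1 on failure. */
static int queue_one(struct req *rq)
{
	if (rq->should_fail)
		return -1;
	rq->done = 1;
	return 0;
}

/*
 * Pop every request off *list and handle it. Requests that fail are
 * appended, in order, to a requeue list, which replaces *list on return.
 */
static void queue_batch(struct req **list)
{
	struct req *requeue = NULL;
	struct req **tail = &requeue;

	while (*list) {
		struct req *rq = *list;

		*list = rq->next;
		rq->next = NULL;

		if (queue_one(rq) != 0) {
			*tail = rq;
			tail = &rq->next;
		}
	}
	*list = requeue;
}
```

After the call, an empty list means the whole batch was accepted; anything still in the list is left for the caller to retry individually.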
2023-09-01 | null_blk: fix poll request timeout handling | Chengming Zhou | 1 | -2/+10

When running an io_uring benchmark on /dev/nullb0, it's easy to crash the kernel if a poll request timeout is triggered, as reported by David [1].

BUG: kernel NULL pointer dereference, address: 0000000000000008
Workqueue: kblockd blk_mq_timeout_work
RIP: 0010:null_timeout_rq+0x4e/0x91
Call Trace:
 ? null_timeout_rq+0x4e/0x91
 blk_mq_handle_expired+0x31/0x4b
 bt_iter+0x68/0x84
 ? bt_tags_iter+0x81/0x81
 __sbitmap_for_each_set.constprop.0+0xb0/0xf2
 ? __blk_mq_complete_request_remote+0xf/0xf
 bt_for_each+0x46/0x64
 ? __blk_mq_complete_request_remote+0xf/0xf
 ? percpu_ref_get_many+0xc/0x2a
 blk_mq_queue_tag_busy_iter+0x14d/0x18e
 blk_mq_timeout_work+0x95/0x127
 process_one_work+0x185/0x263
 worker_thread+0x1b5/0x227

This is indeed a race between null_timeout_rq() and null_poll().

null_poll()                                 null_timeout_rq()
  spin_lock(&nq->poll_lock)
  list_splice_init(&nq->poll_list, &list)
  spin_unlock(&nq->poll_lock)

  while (!list_empty(&list))
    req = list_first_entry()
    list_del_init()
    ...
    blk_mq_add_to_batch()
    // req->rq_next = NULL
                                            spin_lock(&nq->poll_lock)

                                            // rq->queuelist->next == NULL
                                            list_del_init(&rq->queuelist)

                                            spin_unlock(&nq->poll_lock)

Fix this by setting the request state to MQ_RQ_COMPLETE under nq->poll_lock protection, so that null_timeout_rq() can safely detect this race and return early.

Note that this patch only fixes the kernel panic when a request timeout happens.

[1] https://lore.kernel.org/all/3893581.1691785261@warthog.procyon.org.uk/

Fixes: 0a593fbbc245 ("null_blk: poll queue support")
Reported-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Link: https://lore.kernel.org/r/20230901120306.170520-2-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2023-06-06 | null_blk: Fix: memory release when memory_backed=1 | Nitesh Shetty | 1 | -0/+1

Memory/pages are not freed when unloading the nullblk driver.

Steps to reproduce the issue:

1. free -h
              total        used        free      shared  buff/cache   available
   Mem:       7.8Gi       260Mi       7.1Gi       3.0Mi       395Mi       7.3Gi
   Swap:         0B          0B          0B
2. modprobe null_blk memory_backed=1
3. dd if=/dev/urandom of=/dev/nullb0 oflag=direct bs=1M count=1000
4. modprobe -r null_blk
5. free -h
              total        used        free      shared  buff/cache   available
   Mem:       7.8Gi       1.2Gi       6.1Gi       3.0Mi       398Mi       6.3Gi
   Swap:         0B          0B          0B

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Link: https://lore.kernel.org/r/20230605062354.24785-1-nj.shetty@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2023-05-06 | Merge tag 'for-6.4/block-2023-05-06' of git://git.kernel.dk/linux | Linus Torvalds | 1 | -1/+0

Pull more block updates from Jens Axboe:

 - MD pull request via Song:
     - Improve raid5 sequential IO performance on spinning disks, which fixes a regression since v6.0 (Jan Kara)
     - Fix bitmap offset types, which fixes an issue introduced in this merge window (Jonathan Derrick)

 - Cleanup of hweight type used for cgroup writeback (Maxim)

 - Fix a regression with the "has_submit_bio" changes across partitions (Ming)

 - Cleanup of QUEUE_FLAG_ADD_RANDOM clearing. We used to set this flag on queues non blk-mq queues, and hence some drivers clear it unconditionally. Since all of these have since been converted to true blk-mq drivers, drop the useless clear as the bit is not set (Chaitanya)

 - Fix the flags being set in a bio for a flush for drbd (Christoph)

 - Cleanup and deduplication of the code handling setting block device capacity (Damien)

 - Fix for ublk handling IO timeouts (Ming)

 - Fix for a regression in blk-cgroup teardown (Tao)

 - NBD documentation and code fixes (Eric)

 - Convert blk-integrity to using device_attributes rather than a second kobject to manage lifetimes (Thomas)

* tag 'for-6.4/block-2023-05-06' of git://git.kernel.dk/linux:
  ublk: add timeout handler
  drbd: correctly submit flush bio on barrier
  mailmap: add mailmap entries for Jens Axboe
  block: Skip destroyed blkg when restart in blkg_destroy_all()
  writeback: fix call of incorrect macro
  md: Fix bitmap offset type in sb writer
  md/raid5: Improve performance for sequential IO
  docs nbd: userspace NBD now favors github over sourceforge
  block nbd: use req.cookie instead of req.handle
  uapi nbd: add cookie alias to handle
  uapi nbd: improve doc links to userspace spec
  blk-integrity: register sysfs attributes on struct device
  blk-integrity: convert to struct device_attribute
  blk-integrity: use sysfs_emit
  block/drivers: remove dead clear of random flag
  block: sync part's ->bd_has_submit_bio with disk's
  block: Cleanup set_capacity()/bdev_set_nr_sectors()

2023-04-26 | Merge tag 'for-6.4/block-2023-04-21' of git://git.kernel.dk/linux | Linus Torvalds | 1 | -47/+88

Pull block updates from Jens Axboe:

 - drbd patches, bringing us closer to unifying the out-of-tree version and the in tree one (Andreas, Christoph)

 - support for auto-quiesce for the s390 dasd driver (Stefan)

 - MD pull request via Song:
     - md/bitmap: Optimal last page size (Jon Derrick)
     - Various raid10 fixes (Yu Kuai, Li Nan)
     - md: add error_handlers for raid0 and linear (Mariusz Tkaczyk)

 - NVMe pull request via Christoph:
     - Drop redundant pci_enable_pcie_error_reporting (Bjorn Helgaas)
     - Validate nvmet module parameters (Chaitanya Kulkarni)
     - Fence TCP socket on receive error (Chris Leech)
     - Fix async event trace event (Keith Busch)
     - Minor cleanups (Chaitanya Kulkarni, zhenwei pi)
     - Fix and cleanup nvmet Identify handling (Damien Le Moal, Christoph Hellwig)
     - Fix double blk_mq_complete_request race in the timeout handler (Lei Yin)
     - Fix irq locking in nvme-fcloop (Ming Lei)
     - Remove queue mapping helper for rdma devices (Sagi Grimberg)

 - use structured request attribute checks for nbd (Jakub)

 - fix blk-crypto race conditions between keyslot management (Eric)

 - add sed-opal support for reading read locking range attributes (Ondrej)

 - make fault injection configurable for null_blk (Akinobu)

 - clean up the request insertion API (Christoph)

 - clean up the queue running API (Christoph)

 - blkg config helper cleanups (Tejun)

 - lazy init support for blk-iolatency (Tejun)

 - various fixes and tweaks to ublk (Ming)

 - remove hybrid polling. It hasn't really been useful since we got async polled IO support, and these days we don't support sync polled IO at all (Keith)

 - misc fixes, cleanups, improvements (Zhong, Ondrej, Colin, Chengming, Chaitanya, me)

* tag 'for-6.4/block-2023-04-21' of git://git.kernel.dk/linux: (118 commits)
  nbd: fix incomplete validation of ioctl arg
  ublk: don't return 0 in case of any failure
  sed-opal: geometry feature reporting command
  null_blk: Always check queue mode setting from configfs
  block: ublk: switch to ioctl command encoding
  blk-mq: fix the blk_mq_add_to_requeue_list call in blk_kick_flush
  block, bfq: Fix division by zero error on zero wsum
  fault-inject: fix build error when FAULT_INJECTION_CONFIGFS=y and CONFIGFS_FS=m
  block: store bdev->bd_disk->fops->submit_bio state in bdev
  block: re-arrange the struct block_device fields for better layout
  md/raid5: remove unused working_disks variable
  md/raid10: don't call bio_start_io_acct twice for bio which experienced read error
  md/raid10: fix memleak of md thread
  md/raid10: fix memleak for 'conf->bio_split'
  md/raid10: fix leak of 'r10bio->remaining' for recovery
  md/raid10: don't BUG_ON() in raise_barrier()
  md: fix soft lockup in status_resync
  md: add error_handlers for raid0 and linear
  md: Use optimal I/O size for last bitmap page
  md: Fix types in sb writer
  ...

2023-04-25 | block/drivers: remove dead clear of random flag | Chaitanya Kulkarni | 1 | -1/+0

QUEUE_FLAG_ADD_RANDOM is not set before we clear it for "null_blk", "brd", "nbd", "zram", and "bcache" since by default we don't set "QUEUE_FLAG_ADD_RANDOM" to MQ ops.

Remove dead clear of QUEUE_FLAG_ADD_RANDOM in above listed drivers.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> #zram
Link: https://lore.kernel.org/r/20230424234628.45544-2-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2023-04-19 | null_blk: Always check queue mode setting from configfs | Chaitanya Kulkarni | 1 | -0/+5

Make sure to check the device queue mode in null_validate_conf() and return an error for NULL_Q_RQ, as the legacy I/O path is not allowed. Without this patch, an oops occurs when the queue mode is set to 1 from configfs. Steps to reproduce:

modprobe null_blk nr_devices=0
mkdir config/nullb/nullb0
echo 1 > config/nullb/nullb0/memory_backed
echo 4096 > config/nullb/nullb0/blocksize
echo 20480 > config/nullb/nullb0/size
echo 1 > config/nullb/nullb0/queue_mode
echo 1 > config/nullb/nullb0/power

Entering kdb (current=0xffff88810acdd080, pid 2372) on processor 42 Oops: (null)
due to oops @ 0xffffffffc041c329
CPU: 42 PID: 2372 Comm: sh Tainted: G O N 6.3.0-rc5lblk+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:null_add_dev.part.0+0xd9/0x720 [null_blk]
Code: 01 00 00 85 d2 0f 85 a1 03 00 00 48 83 bb 08 01 00 00 00 0f 85 f7 03 00 00 80 bb 62 01 00 00 00 48 8b 75 20 0f 85 6d 02 00 00 <48> 89 6e 60 48 8b 75 20 bf 06 00 00 00 e8 f5 37 2c c1 48 8b 75 20
RSP: 0018:ffffc900052cbde0 EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff88811084d800 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888100042e00
RBP: ffff8881053d8200 R08: ffffc900052cbd68 R09: ffff888105db2000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
R13: ffff888104765200 R14: ffff88810eec1748 R15: ffff88810eec1740
FS: 00007fd445fd1740(0000) GS:ffff8897dfc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000060 CR3: 0000000166a00000 CR4: 0000000000350ee0
DR0: ffffffff8437a488 DR1: ffffffff8437a489 DR2: ffffffff8437a48a
DR3: ffffffff8437a48b DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 nullb_device_power_store+0xd1/0x120 [null_blk]
 configfs_write_iter+0xb4/0x120
 vfs_write+0x2ba/0x3c0
 ksys_write+0x5f/0xe0
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7fd4460c57a7
Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffd3792a4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4460c57a7
RDX: 0000000000000002 RSI: 000055b43c02e4c0 RDI: 0000000000000001
RBP: 000055b43c02e4c0 R08: 000000000000000a R09: 00007fd44615b4e0
R10: 00007fd44615b3e0 R11: 0000000000000246 R12: 0000000000000002
R13: 00007fd446198520 R14: 0000000000000002 R15: 00007fd446198700
 </TASK>

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Link: https://lore.kernel.org/r/20230416220339.43845-1-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2023-04-13 | block: null_blk: make fault-injection dynamically configurable per device | Akinobu Mita | 1 | -21/+72

The null_blk driver has multiple driver-specific fault injection mechanisms. Each fault injection configuration can only be specified by a module parameter and cannot be reconfigured without reloading the driver. Also, each configuration is common to all devices and is initialized every time a new device is added.

This change adds the following subdirectories for each null_blk device.

/sys/kernel/config/nullb/<disk>/timeout_inject
/sys/kernel/config/nullb/<disk>/requeue_inject
/sys/kernel/config/nullb/<disk>/init_hctx_fault_inject

Each fault injection attribute can be dynamically set per device by a corresponding file in these directories.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Link: https://lore.kernel.org/r/20230327143733.14599-3-akinobu.mita@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2023-04-02 | null_blk: use kmap_local_page() and kunmap_local() | Chaitanya Kulkarni | 1 | -4/+4

Replace the deprecated API kmap_atomic() and kunmap_atomic() with kmap_local_page() and kunmap_local() in null_flush_cache_page().

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20230330184926.64209-2-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2023-04-02 | null_blk: use non-deprecated lib functions | Chaitanya Kulkarni | 1 | -22/+7

Use library helper memcpy_page() to copy source page into destination instead of having duplicate code in copy_to_nullb() & copy_from_nullb(). In copy_from_nullb() also replace the memset call with zero_user().

Use library helper memset_page() to set the buffer to 0xFF instead of having duplicate code.

This also removes deprecated API kmap_atomic() from copy_to_nullb(), copy_from_nullb() and nullb_fill_pattern(). From include/linux/highmem.h:
"kmap_atomic - Atomically map a page for temporary usage - Deprecated!"

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20230330184926.64209-2-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2023-03-15 | block: null_blk: cleanup null_queue_rq() | Damien Le Moal | 1 | -15/+14

Use a local struct request pointer variable to avoid having to dereference struct blk_mq_queue_data multiple times. While at it, also fix the function argument indentation and remove a useless "else" after a return.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Link: https://lore.kernel.org/r/20230314041106.19173-2-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2023-03-15 | block: null_blk: Fix handling of fake timeout request | Damien Le Moal | 1 | -3/+3

When injecting a fake timeout into the null_blk driver using fail_io_timeout, the request timeout handler does not execute blk_mq_complete_request(), so the complete callback is never executed for a timed-out request.

The null_blk driver also has a driver-specific fake timeout mechanism which does not have this problem. Fix the problem with fail_io_timeout by using the same mechanism as the null_blk internal timeout feature, using the fake_timeout field of null_blk commands.

Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Fixes: de3510e52b0a ("null_blk: fix command timeout completion handling")
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20230314041106.19173-2-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2023-01-30 | block: make BLK_DEF_MAX_SECTORS unsigned | Keith Busch | 1 | -2/+1

This is used as an unsigned value, so define it that way to avoid having to cast it.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20230105205146.3610282-2-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2022-12-02 | null_blk: support read-only and offline zone conditions | Shin'ichiro Kawasaki | 1 | -1/+21

In zoned mode, zones with write pointers can have the conditions "read-only" or "offline". In the read-only condition, zones cannot be written. In the offline condition, zones can be neither written nor read. These conditions are intended for zones with media failures, so it is difficult to set them on zones of real devices.

To test handling of zones in these conditions, add a feature to null_blk to set up zones in the read-only or offline condition. Add new configuration attributes "zone_readonly" and "zone_offline". Write a sector to the attribute files to specify the target zone whose condition is to be set. For example, the following command lines do it:

   echo 0 > nullb1/zone_readonly
   echo 524288 > nullb1/zone_offline

When the specified zones are already in the read-only or offline condition, the normal empty condition is restored to the zones. These condition changes can be done only after the null_blk device is powered on, since the status area of each zone is not yet allocated before power-on.

Also improve the zone condition checks to inhibit all commands for zones in the offline condition. In the same manner, inhibit write and zone management commands for zones in the read-only condition.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20221201061036.2342206-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

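The access rules and the toggle behaviour described in this message can be summarised in a small userspace model. This is a sketch of one reading of the commit text, not the driver's implementation: the enums and the functions `zone_op_allowed()` and `set_zone_cond()` are hypothetical, and the model ignores every zone condition other than the three named here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of zone conditions and operations. */
enum zone_cond { ZONE_EMPTY, ZONE_READONLY, ZONE_OFFLINE };
enum zone_op { OP_READ, OP_WRITE, OP_ZONE_MGMT };

/*
 * Offline zones reject every command; read-only zones reject writes and
 * zone management commands but still allow reads.
 */
static bool zone_op_allowed(enum zone_cond cond, enum zone_op op)
{
	switch (cond) {
	case ZONE_OFFLINE:
		return false;
	case ZONE_READONLY:
		return op == OP_READ;
	default:
		return true;
	}
}

/*
 * Model of writing a sector to zone_readonly or zone_offline: a zone
 * already in the read-only or offline condition goes back to empty,
 * otherwise it takes the requested condition.
 */
static enum zone_cond set_zone_cond(enum zone_cond cur, enum zone_cond requested)
{
	if (cur == ZONE_READONLY || cur == ZONE_OFFLINE)
		return ZONE_EMPTY;
	return requested;
}
```

In this model, writing the same sector to the attribute twice first puts the zone into the requested condition and then restores it to empty.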
2022-08-22 | block: Change the return type of blk_mq_map_queues() into void | Bart Van Assche | 1 | -3/+1

Since blk_mq_map_queues() and the .map_queues() callbacks always return 0, change their return type into void. Most callers ignore the returned value anyway.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20220815170043.19489-3-bvanassche@acm.org
[axboe: fold in fix from Bart]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2022-08-22 | null_blk: Modify the behavior of null_map_queues() | Bart Van Assche | 1 | -1/+3

Instead of returning -EINVAL if an internal inconsistency is detected, fall back to a single submission queue. This patch prepares for changing the return value of the .map_queues() callbacks into void.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220815170043.19489-2-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

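The change from "report an error" to "degrade to a safe default" can be sketched as follows. The consistency check used here (hardware queue count equals submission plus poll queues) is an assumption made for the illustration, and `struct queue_counts` and `resolve_queue_counts()` are hypothetical names, not the driver's.

```c
#include <assert.h>

/* Hypothetical result type: how many queues of each kind to map. */
struct queue_counts {
	unsigned int submit;
	unsigned int poll;
};

/*
 * If the configured counts do not add up to the number of hardware
 * queues, fall back to one submission queue and no poll queues instead
 * of failing. The function therefore has no error return.
 */
static struct queue_counts resolve_queue_counts(unsigned int hw_queues,
						unsigned int submit,
						unsigned int poll)
{
	struct queue_counts qc = { submit, poll };

	if (hw_queues != submit + poll) {
		qc.submit = 1;
		qc.poll = 0;
	}
	return qc;
}
```

Since every input now maps to a usable configuration, a callback built this way has nothing left to return, which is what allows the return type to become void.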
2022-08-03 | null_blk: fix ida error handling in null_add_dev() | Dan Carpenter | 1 | -3/+11

There needs to be some error checking if ida_simple_get() fails. Also call ida_free() if there are errors later.

Fixes: 94bc02e30fb8 ("nullb: use ida to manage index")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/YtEhXsr6vJeoiYhd@kili
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2022-08-03 | null_blk: add configfs variables for 2 options | Vincent Fu | 1 | -19/+48

Allow setting via configfs these two options:

    no_sched
    shared_tag_bitmap

Previously these could only be activated as module parameters.

Still missing are:

    shared_tags
    timeout
    requeue
    init_hctx

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220708174943.87787-3-vincent.fu@samsung.com
[axboe: fold in nullb == NULL fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2022-08-03 | null_blk: add module parameters for 4 options | Vincent Fu | 1 | -0/+20

Add as module parameters these options:

    memory_backed
    discard
    mbps
    cache_size

Previously these could only be set via configfs.

Still missing is bad_blocks.

The kernel test robot found a documentation formatting issue in v1 of this patch.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20220708174943.87787-2-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2022-08-03 | block: null_blk: Use the bitmap API to allocate bitmaps | Christophe JAILLET | 1 | -4/+3

Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and it improves the semantics.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/7c4d3116ba843fc4a8ae557dd6176352a6cd0985.1656864320.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2022-07-14treewide: Rename enum req_opf into enum req_opBart Van Assche1-5/+4
The type name enum req_opf is misleading since it suggests that values of this type include both an operation type and flags. Since values of this type represent an operation only, change the type name into enum req_op. Convert the enum req_op documentation into kernel-doc format. Move a few definitions such that the enum req_op documentation occurs just above the enum req_op definition. The name "req_opf" was introduced by commit ef295ecf090d ("block: better op and flags encoding"). Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06blk-mq: Drop blk_mq_ops.timeout 'reserved' argJohn Garry1-1/+1
With new API blk_mq_is_reserved_rq() we can tell if a request is from the reserved pool, so stop passing 'reserved' arg. There is actually only a single user of that arg for all the callback implementations, which can use blk_mq_is_reserved_rq() instead. This will also allow us to stop passing the same 'reserved' around the blk-mq iter functions next. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/1657109034-206040-4-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28block: remove blk_cleanup_diskChristoph Hellwig1-2/+2
blk_cleanup_disk is nothing but a trivial wrapper for put_disk now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-02block: null_blk: Fix null_zone_write()Damien Le Moal1-6/+0
The bio and rq fields of struct nullb_cmd now overlap in a union, so we cannot use a test on ->bio being non-NULL to detect the NULL_Q_BIO queue mode. null_zone_write() uses such a broken test to set the sector position of a zone append write in the command bio or request. When the null_blk device uses the NULL_Q_MQ queue mode, null_zone_write() wrongly ends up setting the bio sector position, which corrupts the command request and leads to random crashes. Fix this by testing the device queue mode directly. Fixes: 8ba816b23abd ("null-blk: save memory footprint for struct nullb_cmd") Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220602120344.1365329-1-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
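The pitfall can be reproduced in a few lines of user-space C. The types below (demo_cmd, demo_bio, demo_request, demo_queue_mode) are hypothetical miniatures and not the driver's structures; they only show why a non-NULL test on one union member says nothing about which member is in use, and why checking an explicit mode is the reliable alternative.

```c
#include <assert.h>
#include <stddef.h>

enum demo_queue_mode { DEMO_Q_BIO, DEMO_Q_MQ };

struct demo_bio { unsigned long long sector; };
struct demo_request { unsigned long long sector; };

struct demo_cmd {
	union {			/* both pointers share the same storage */
		struct demo_bio *bio;
		struct demo_request *rq;
	};
	enum demo_queue_mode mode;
};

/* Broken: with the union, cmd->bio is non-NULL in MQ mode as well. */
static int demo_looks_like_bio_broken(const struct demo_cmd *cmd)
{
	return cmd->bio != NULL;
}

/* Fixed: decide from the queue mode, not from the pointer value. */
static void demo_set_sector(struct demo_cmd *cmd, unsigned long long sector)
{
	assert(cmd != NULL);
	if (cmd->mode == DEMO_Q_BIO)
		cmd->bio->sector = sector;
	else
		cmd->rq->sector = sector;
}
```

With a command set up for the MQ mode, the broken test still reports "bio", while demo_set_sector() updates the request as intended.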
2022-05-04block: null_blk: Improve device creation with configfsDamien Le Moal1-1/+27
Currently, the directory name used to create a nullb device through configfs is not used as the device name, potentially causing headaches for users if devices are already created through the modprobe operation with the nr_devices module parameter not set to 0. E.g. a user can do "mkdir /sys/kernel/config/nullb/nullb0" to create a nullb device even though /dev/nullb0 was already created by modprobe. In this case, the configfs nullb device will be named nullb1, causing confusion for the user. Simplify this by using the configfs directory name as the nullb device name, always, unless another nullb device is already using the same name. E.g. if modprobe created nullb0, then: $ mkdir /sys/kernel/config/nullb/nullb0 mkdir: cannot create directory '/sys/kernel/config/nullb/nullb0': File exists will be reported to the user. To implement this, the function null_find_dev_by_name() is added to check for the existence of a nullb device with the name used for a new configfs device directory. nullb_group_make_item() uses this new function to check if the directory name can be used as the disk name. Finally, null_add_dev() is modified to use the device config item name as the disk name for a new nullb device created using configfs. The naming of devices created through modprobe remains unchanged. Of note is that it is possible for a user to create through configfs a nullb device with the same name as an existing device. E.g. $ mkdir /sys/kernel/config/nullb/null will successfully create the nullb device named "null" but this block device will however not appear under /dev/ since /dev/null already exists. Suggested-by: Joseph Bacik <josef@toxicpanda.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220420005718.3780004-5-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
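The lookup-before-create logic described above can be sketched as follows. This is a user-space illustration under stated assumptions: the fixed-size name table and the nm_* functions are hypothetical and only loosely model null_find_dev_by_name() and nullb_group_make_item(); the real driver keeps its own device list and its own error codes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define NM_MAX_DEVS 8
#define NM_NAME_LEN 32

/* Hypothetical device table standing in for the driver's device list. */
static char nm_dev_names[NM_MAX_DEVS][NM_NAME_LEN];
static size_t nm_nr_devs;

/* Return the index of the device called name, or -1 if there is none. */
static int nm_find_dev_by_name(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < nm_nr_devs; i++) {
		if (strcmp(nm_dev_names[i], name) == 0)
			return (int)i;
	}
	return -1;
}

/* Create a device named after its directory; refuse duplicate names. */
static int nm_make_item(const char *dir_name)
{
	if (nm_find_dev_by_name(dir_name) >= 0)
		return -EEXIST;		/* shows up as "File exists" */
	if (nm_nr_devs == NM_MAX_DEVS)
		return -ENOSPC;
	if (strlen(dir_name) >= NM_NAME_LEN)
		return -ENAMETOOLONG;
	strcpy(nm_dev_names[nm_nr_devs++], dir_name);
	return 0;
}
```

A second attempt to create "nullb0" fails with -EEXIST, which is the errno behind the "File exists" message quoted in the commit text.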
2022-05-04block: null_blk: Cleanup messagesDamien Le Moal1-0/+5
Use the pr_fmt() macro to prefix all null_blk pr_xxx() messages with "null_blk:" to clarify which module is printing the messages. Also add a pr_info() message in null_add_dev() to print the name of a newly created disk. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220420005718.3780004-4-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-04block: null_blk: Cleanup device creation and deletionDamien Le Moal1-18/+30
Introduce the null_create_dev() and null_destroy_dev() helper functions to respectively create nullb devices on modprobe and destroy them on rmmod. The null_destroy_dev() helper avoids duplicated code in the null_init() and null_exit() functions for deleting devices. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220420005718.3780004-3-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-04block: null_blk: Fix code style issuesDamien Le Moal1-4/+6
Fix message grammar and code style issues (brackets and indentation) in null_init(). Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220420005718.3780004-2-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03null_blk: don't set the discard_alignment queue limitChristoph Hellwig1-1/+0
The discard_alignment queue limit is named a bit misleadingly: it means the offset into the block device at which the discard granularity starts. Setting it to the discard granularity as done by null_blk is mostly harmless but also useless. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18block: remove QUEUE_FLAG_DISCARDChristoph Hellwig1-1/+0
Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only place that needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-14block: null_blk: end timed out poll requestMing Lei1-1/+1
When a poll request times out, it is removed from the poll list but not completed, so the request is leaked and never gets a chance to complete. Fix the issue by ending it in the timeout handler. Fixes: 0a593fbbc245 ("null_blk: poll queue support") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220413084836.1571995-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-28null_blk: null_alloc_page() cleanupChaitanya Kulkarni1-7/+5
Remove the goto labels and use direct returns, since the error unwinding code only needs to free the t_page variable if the alloc_pages() call fails, and having two labels for one kfree() can be avoided easily. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220222152852.26043-3-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
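When only one resource can need unwinding, direct returns read more simply than labels. The following user-space sketch illustrates that shape; pg_wrapper and its helpers are hypothetical names, and malloc()/calloc()/free() stand in for the kernel allocators used by the real function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper object that owns a separately allocated buffer. */
struct pg_wrapper {
	unsigned char *buf;
	size_t size;
};

/* Allocate the wrapper and its zeroed buffer; return NULL on failure. */
static struct pg_wrapper *pg_wrapper_alloc(size_t size)
{
	struct pg_wrapper *w;

	assert(size > 0);

	w = malloc(sizeof(*w));
	if (!w)
		return NULL;		/* nothing to unwind yet */

	w->buf = calloc(1, size);
	if (!w->buf) {
		free(w);		/* the only cleanup this path needs */
		return NULL;
	}

	w->size = size;
	return w;
}

static void pg_wrapper_free(struct pg_wrapper *w)
{
	if (!w)
		return;
	free(w->buf);
	free(w);
}
```

Each failure point cleans up exactly what exists at that moment, so no shared label is required.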
2022-02-28null_blk: remove hardcoded null_alloc_page() paramChaitanya Kulkarni1-4/+4
The only caller of null_alloc_page() is null_insert_page(), which unconditionally sets the only parameter to GFP_NOIO, so the value is statically hard-coded in null_blk. There is no point in having a statically hardcoded function parameter. Remove the unnecessary parameter gfp_flags and adjust the code so that null_alloc_page() retains its existing behavior with GFP_NOIO. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220222152852.26043-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-28null_blk: remove hardcoded alloc_cmd() parameterChaitanya Kulkarni1-17/+12
The only caller of alloc_cmd() is null_submit_bio(), which unconditionally sets the second parameter to true, so the value is statically hard-coded in null_blk. There is no point in having a statically hardcoded function parameter. Remove the unnecessary parameter can_wait and adjust the code so it retains the existing behavior of waiting when we don't get a valid nullb_cmd from __alloc_cmd() in alloc_cmd(). The restructured code avoids multiple return statements, multiple calls to __alloc_cmd() and the resulting fast path call to prepare_to_wait() due to removal of the first alloc_cmd() call. Follow the pattern that we have in bio_alloc() to set the structure members in the structure allocation function in alloc_cmd() and pass bio to initialize the newly allocated cmd->bio member. Follow the pattern in copy_to_nullb() of using the result of one function call (null_cache_active()) as a parameter to another function call (null_insert_page()): use the result of alloc_cmd() as the first parameter to null_handle_cmd() in the null_submit_bio() function. This allows us to remove the local variable cmd on the stack in null_submit_bio(), which is in the fast path. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220216172945.31124-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-28null_blk: fix return value from null_add_dev()Chaitanya Kulkarni1-2/+3
The function nullb_device_power_store() returns -ENOMEM when null_add_dev() fails. null_add_dev() can fail with a return value other than -ENOMEM, such as -EINVAL when the Zoned Block Device option is used; see the call chain nullb_device_power_store() -> null_add_dev() -> null_init_zoned_dev(), which returns -EINVAL. When trying to load the module, having the -ENOMEM value returned on the command line creates confusion when plenty of memory is free on the machine. Instead of hardcoding -ENOMEM, return the value of the null_add_dev() function. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220215115951.15945-1-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
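The difference between masking and propagating an error code can be shown with a short user-space sketch. The prop_* functions are hypothetical stand-ins for null_add_dev() and nullb_device_power_store(); the simulated_error argument only exists so the example can inject a failure.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for device setup; fails with the given errno. */
static int prop_add_dev(int simulated_error)
{
	assert(simulated_error <= 0);
	return simulated_error;
}

/* Before: every failure is reported as -ENOMEM. */
static int prop_power_store_old(int simulated_error)
{
	if (prop_add_dev(simulated_error))
		return -ENOMEM;
	return 0;
}

/* After: the callee's error code is passed through unchanged. */
static int prop_power_store_new(int simulated_error)
{
	int ret = prop_add_dev(simulated_error);

	if (ret)
		return ret;
	return 0;
}
```

With an injected -EINVAL, the old variant misreports the failure as an out-of-memory condition while the new one reports the actual cause.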