summaryrefslogtreecommitdiff
path: root/drivers/block/null_blk
AgeCommit message (Collapse)AuthorFilesLines
2022-12-02null_blk: support read-only and offline zone conditionsShin'ichiro Kawasaki3-4/+121
In zoned mode, zones with write pointers can have conditions "read-only" or "offline". In read-only condition, zones can not be written. In offline condition, the zones can be neither written nor read. These conditions are intended for zones with media failures, then it is difficult to set those conditions to zones on real devices. To test handling of zones in the conditions, add a feature to null_blk to set up zones in read-only or offline condition. Add new configuration attributes "zone_readonly" and "zone_offline". Write a sector to the attribute files to specify the target zone to set the zone conditions. For example, following command lines do it: echo 0 > nullb1/zone_readonly echo 524288 > nullb1/zone_offline When the specified zones are already in read-only or offline condition, normal empty condition is restored to the zones. These condition changes can be done only after the null_blk device get powered, since status area of each zone is not yet allocated before power-on. Also improve zone condition checks to inhibit all commands for zones in offline conditions. In same manner, inhibit write and zone management commands for zones in read-only condition. Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20221201061036.2342206-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-22block: Change the return type of blk_mq_map_queues() into voidBart Van Assche1-3/+1
Since blk_mq_map_queues() and the .map_queues() callbacks always return 0, change their return type into void. Most callers ignore the returned value anyway. Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Wang <jasowang@redhat.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Doug Gilbert <dgilbert@interlog.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.garry@huawei.com> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220815170043.19489-3-bvanassche@acm.org [axboe: fold in fix from Bart] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-22null_blk: Modify the behavior of null_map_queues()Bart Van Assche1-1/+3
Instead of returning -EINVAL if an internal inconsistency is detected, fall back to a single submission queue. This patch prepares for changing the return value of the .map_queues() callbacks into void. Cc: Christoph Hellwig <hch@lst.de> Cc: Keith Busch <kbusch@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220815170043.19489-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03null_blk: fix ida error handling in null_add_dev()Dan Carpenter1-3/+11
There needs to be some error checking if ida_simple_get() fails. Also call ida_free() if there are errors later. Fixes: 94bc02e30fb8 ("nullb: use ida to manage index") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/YtEhXsr6vJeoiYhd@kili Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03null_blk: add configfs variables for 2 optionsVincent Fu2-19/+50
Allow setting via configfs these two options: no_sched shared_tag_bitmap Previously these could only be activated as module parameters. Still missing are: shared_tags timeout requeue init_hctx Signed-off-by: Vincent Fu <vincent.fu@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220708174943.87787-3-vincent.fu@samsung.com [axboe: fold in nullb == NULL fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03null_blk: add module parameters for 4 optionsVincent Fu1-0/+20
Add as module parameters these options: memory_backed discard mbps cache_size Previously these could only be set via configfs. Still missing is bad_blocks. The kernel test robot found a documentation formatting issue in v1 of this patch. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vincent Fu <vincent.fu@samsung.com> Link: https://lore.kernel.org/r/20220708174943.87787-2-vincent.fu@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-03block: null_blk: Use the bitmap API to allocate bitmapsChristophe JAILLET1-4/+3
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/7c4d3116ba843fc4a8ae557dd6176352a6cd0985.1656864320.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14treewide: Rename enum req_opf into enum req_opBart Van Assche4-15/+12
The type name enum req_opf is misleading since it suggests that values of this type include both an operation type and flags. Since values of this type represent an operation only, change the type name into enum req_op. Convert the enum req_op documentation into kernel-doc format. Move a few definitions such that the enum req_op documentation occurs just above the enum req_op definition. The name "req_opf" was introduced by commit ef295ecf090d ("block: better op and flags encoding"). Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: move zone related fields to struct gendiskChristoph Hellwig1-1/+1
Move the zone related fields that are currently stored in struct request_queue to struct gendisk as these are part of the highlevel block layer API and are only used for non-passthrough I/O that requires the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: replace blkdev_nr_zones with bdev_nr_zonesChristoph Hellwig1-1/+1
Pass a block_device instead of a request_queue as that is what most callers have at hand. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: pass a gendisk to blk_queue_max_open_zones and blk_queue_max_active_zonesChristoph Hellwig1-2/+2
Switch to a gendisk based API in preparation for moving all zone related fields from the request_queue to the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: pass a gendisk to blk_queue_set_zonedChristoph Hellwig1-1/+1
Prepare for storing the zone related field in struct gendisk instead of struct request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06blk-mq: Drop blk_mq_ops.timeout 'reserved' argJohn Garry1-1/+1
With new API blk_mq_is_reserved_rq() we can tell if a request is from the reserved pool, so stop passing 'reserved' arg. There is actually only a single user of that arg for all the callback implementations, which can use blk_mq_is_reserved_rq() instead. This will also allow us to stop passing the same 'reserved' around the blk-mq iter functions next. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/1657109034-206040-4-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-28block: remove blk_cleanup_diskChristoph Hellwig1-2/+2
blk_cleanup_disk is nothing but a trivial wrapper for put_disk now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-02block: null_blk: Fix null_zone_write()Damien Le Moal3-9/+10
The bio and rq fields of struct nullb_cmd are now overlapping in a union. So we cannot use a test on ->bio being non-NULL to detect the NULL_Q_BIO queue mode. null_zone_write() use such broken test to set the sector position of a zone append write in the command bio or request. When the null_blk device uses the NULL_Q_MQ queue mode, null_zone_write() wrongly end up setting the bio sector position, resulting in the command request to be broken and random crashes following. Fix this by testing the device queue mode directly. Fixes: 8ba816b23abd ("null-blk: save memory footprint for struct nullb_cmd") Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220602120344.1365329-1-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-04block: null_blk: Improve device creation with configfsDamien Le Moal1-1/+27
Currently, the directory name used to create a nullb device through sysfs is not used as the device name, potentially causing headaches for users if devices are already created through the modprobe operation withe the nr_device module parameter not set to 0. E.g. a user can do "mkdir /sys/kernel/config/nullb/nullb0" to create a nullb device even though /dev/nullb0 was already created by modprobe. In this case, the configfs nullb device will be named nullb1, causing confusion for the user. Simplify this by using the configfs directory name as the nullb device name, always, unless another nullb device is already using the same name. E.g. if modprobe created nullb0, then: $ mkdir /sys/kernel/config/nullb/nullb0 mkdir: cannot create directory '/sys/kernel/config/nullb/nullb0': File exists will be reported to the user. To implement this, the function null_find_dev_by_name() is added to check for the existence of a nullb device with the name used for a new configfs device directory. nullb_group_make_item() uses this new function to check if the directory name can be used as the disk name. Finally, null_add_dev() is modified to use the device config item name as the disk name for a new nullb device created using configfs. The naming of devices created though modprobe remains unchanged. Of note is that it is possible for a user to create through configfs a nullb device with the same name as an existing device. E.g. $ mkdir /sys/kernel/config/nullb/null will successfully create the nullb device named "null" but this block device will however not appear under /dev/ since /dev/null already exists. Suggested-by: Joseph Bacik <josef@toxicpanda.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220420005718.3780004-5-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-04block: null_blk: Cleanup messagesDamien Le Moal2-2/+10
Use the pr_fmt() macro to prefix all null_blk pr_xxx() messages with "null_blk:" to clarify which module is printing the messages. Also add a pr_info() message in null_add_dev() to print the name of a newly created disk. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220420005718.3780004-4-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-04block: null_blk: Cleanup device creation and deletionDamien Le Moal1-18/+30
Introduce the null_create_dev() and null_destroy_dev() helper functions to respectivel create nullb devices on modprobe and destroy them on rmmod. The null_destroy_dev() helper avoids duplicated code in the null_init() and null_exit() functions for deleting devices. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220420005718.3780004-3-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-04block: null_blk: Fix code style issuesDamien Le Moal1-4/+6
Fix message grammar and code style issues (brackets and indentation) in null_init(). Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220420005718.3780004-2-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03null_blk: don't set the discard_alignment queue limitChristoph Hellwig1-1/+0
The discard_alignment queue limit is named a bit misleading means the offset into the block device at which the discard granularity starts. Setting it to the discard granularity as done by null_blk is mostly harmless but also useless. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-26null-blk: save memory footprint for struct nullb_cmdYu Kuai1-3/+5
Total 16 bytes can be saved in two ways: 1) The field 'bio' will only be used in bio based mode, and the field 'rq' will only be used in mq mode. Since they won't be used in the same time, declare a union for them. 2) The field 'bool fake_timeout' can be placed in the hole after the field 'error'. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20220426022133.3999006-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18block: remove QUEUE_FLAG_DISCARDChristoph Hellwig1-1/+0
Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only places where needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-14block: null_blk: end timed out poll requestMing Lei1-1/+1
When poll request is timed out, it is removed from the poll list, but not completed, so the request is leaked, and never get chance to complete. Fix the issue by ending it in timeout handler. Fixes: 0a593fbbc245 ("null_blk: poll queue support") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220413084836.1571995-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-28null_blk: null_alloc_page() cleanupChaitanya Kulkarni1-7/+5
Remove goto labels and use direct returns as error unwinding code only needs to free t_page variable if we alloc_pages() call fails as having two labels for one kfree() can be avoided easily. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220222152852.26043-3-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-28null_blk: remove hardcoded null_alloc_page() paramChaitanya Kulkarni1-4/+4
Only caller of null_alloc_page() is null_insert_page() unconditionally sets only parameter to GFP_NOIO and that is statically hard-coded in null_blk. There is no point in having statically hardcoded function parameter. Remove the unnecessary parameter gfp_flags and adjust the code, so it can retain existing behavior null_alloc_page() with GFP_NOIO. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220222152852.26043-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-28null_blk: remove hardcoded alloc_cmd() parameterChaitanya Kulkarni1-17/+12
Only caller of alloc_cmd() is null_submit_bio() unconditionally sets second parameter to true and that is statically hard-coded in null_blk. There is no point in having statically hardcoded function parameter. Remove the unnecessary parameter can_wait and adjust the code so it can retain existing behavior of waiting when we don't get valid nullb_cmd from __alloc_cmd() in alloc_cmd(). The restructured code avoids multiple return statements, multiple calls to __alloc_cmd() and resulting a fast path call to prepare_to_wait() due to removal of first alloc_cmd() call. Follow the pattern that we have in bio_alloc() to set the structure members in the structure allocation function in alloc_cmd() and pass bio to initialize newly allocated cmd->bio member. Follow the pattern in copy_to_nullb() to use result of one function call (null_cache_active()) to be used as a parameter to another function call (null_insert_page()), use result of alloc_cmd() as a first parameter to the null_handle_cmd() in null_submit_bio() function. This allow us to remove the local variable cmd on stack in null_submit_bio() that is in fast path. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220216172945.31124-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-28null_blk: fix return value from null_add_dev()Chaitanya Kulkarni1-2/+3
The function nullb_device_power_store() returns -ENOMEM when null_add_dev() fails. null_add_dev() can fail with return value other than -ENOMEM such as -EINVAL when Zoned Block Device option is used, see : nullb_device_power_store() null_add_dev() null_init_zoned_dev() return -EINVAL; When trying to load the module having -ENOMEM value returned on the command line creates confusion when pleanty of memory is free on the machine. Instead of hardcoding -ENOMEM return the value of null_add_dev() function. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220215115951.15945-1-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-24block: null_blk: only set set->nr_maps as 3 if active poll_queues is > 0Ming Lei1-1/+1
It isn't correct to set set->nr_maps as 3 if g_poll_queues is > 0 since we can change it via configfs for null_blk device created there, so only set it as 3 if active poll_queues is > 0. Fixes divide zero exception reported by Shinichiro. Fixes: 2bfdbe8b7ebd ("null_blk: allow zero poll queues") Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20211224010831.1521805-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-11null_blk: cast command status to integerJens Axboe1-1/+1
kernel test robot reports that sparse now triggers a warning on null_blk: >> drivers/block/null_blk/main.c:1577:55: sparse: sparse: incorrect type in argument 3 (different base types) @@ expected int ioerror @@ got restricted blk_status_t [usertype] error @@ drivers/block/null_blk/main.c:1577:55: sparse: expected int ioerror drivers/block/null_blk/main.c:1577:55: sparse: got restricted blk_status_t [usertype] error because blk_mq_add_to_batch() takes an integer instead of a blk_status_t. Just cast this to an integer to silence it, null_blk is the odd one out here since the command status is the "right" type. If we change the function type, then we'll have do that for other callers too (existing and future ones). Fixes: 2385ebf38f94 ("block: null_blk: batched complete poll requests") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-03block: null_blk: batched complete poll requestsMing Lei1-1/+3
Complete poll requests via blk_mq_add_to_batch() and blk_mq_end_request_batch(), so that we can cover batched complete code path by running null_blk test. Meantime this way shows ~14% IOPS boost on 't/io_uring /dev/nullb0' in my test. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203081703.3506020-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-03null_blk: allow zero poll queuesMing Lei1-4/+2
There isn't any reason to not allow zero poll queues from user viewpoint. Also sometimes we need to compare io poll between poll mode and irq mode, so not allowing poll queues is bad. Fixes: 15dfc662ef31 ("null_blk: Fix handling of submit_queues and poll_queues attributes") Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203023935.3424042-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29block: remove the ->rq_disk field in struct requestChristoph Hellwig1-1/+1
Just use the disk attached to the request_queue instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29block: remove GENHD_FL_EXT_DEVTChristoph Hellwig1-1/+0
All modern drivers can support extra partitions using the extended dev_t. In fact except for the ioctl method drivers never even see partitions in normal operation. So remove the GENHD_FL_EXT_DEVT and allow extra partitions for all block devices that do support partitions, and require those that do not support partitions to explicit disallow them using GENHD_FL_NO_PART. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29null_blk: don't suppress partitioning informationChristoph Hellwig1-1/+1
This manually reverts commit 27290b469051 ("null_blk: suppress invalid partition info"). The message in that commit log can't appearch as the flag is never checked during probing, and there is no good reason to treat null_blk special in /proc/partitions. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29null_blk: Fix handling of submit_queues and poll_queues attributesShin'ichiro Kawasaki2-17/+87
Commit 0a593fbbc245 ("null_blk: poll queue support") introduced the poll queue feature to null_blk. After this change, null_blk device has both submit queues and poll queues, and null_map_queues() callback maps the both queues for corresponding hardware contexts. The commit also added the device configuration attribute 'poll_queues' in same manner as the existing attribute 'submit_queues'. These attributes allow to modify the numbers of queues. However, when the new values are stored to these attributes, the values are just handled only for the corresponding queue. When number of submit_queue is updated, number of poll_queue is not counted, or vice versa. This caused inconsistent number of queues and queue mapping and resulted in null-ptr-dereference. This failure was observed in blktests block/029 and block/030. To avoid the inconsistency, fix the attribute updates to care both submit_queues and poll_queues. Introduce the helper function nullb_update_nr_hw_queues() to handle stores to the both two attributes. Add poll_queues field to the struct nullb_device to track the number in same manner as submit_queues. Add two more fields prev_submit_queues and prev_poll_queues to keep the previous values before change. In case the block layer failed to update the nr_hw_queues, refer the previous values in null_map_queues() to map queues in same manner as before change. Also add poll_queues value checks in nullb_update_nr_hw_queues() and null_validate_conf(). They ensure the poll_queues value of each device is within the range from 1 to module parameter value of poll_queues. Fixes: 0a593fbbc245 ("null_blk: poll queue support") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20211029103926.845635-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18null_blk: poll queue supportJens Axboe2-4/+108
There's currently no way to experiment with polled IO with null_blk, which seems like an oversight. This patch adds support for polled IO. We keep a list of issued IOs on submit, and then process that list when mq_ops->poll() is invoked. A new parameter is added, poll_queues. It defaults to 1 like the submit queues, meaning we'll have 1 poll queue available. Fixes-by: Bart Van Assche <bvanassche@acm.org> Fixes-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/baca710d-0f2a-16e2-60bd-b105b854e0ae@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: switch polling to be bio basedChristoph Hellwig1-2/+1
Replace the blk_poll interface that requires the caller to keep a queue and cookie from the submissions with polling based on the bio. Polling for the bio itself leads to a few advantages: - the cookie construction can made entirely private in blk-mq.c - the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues - keeping the device and the cookie inside the bio allows to trivially support polling BIOs remapping by stacking drivers - a lot of code to propagate the cookie back up the submission path can be removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23null_blk: add error handling support for add_disk()Luis Chamberlain1-2/+1
We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. The actual cleanup in case of error is already handled by the caller of null_gendisk_register(). Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20210818144542.19305-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-12block: move some macros to blkdev.hGuoqing Jiang1-4/+0
Move them (PAGE_SECTORS_SHIFT, PAGE_SECTORS and SECTOR_MASK) to the generic header file to remove redundancy. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Link: https://lore.kernel.org/r/20210721025315.1729118-1-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-01null_blk: remove an unused variable assignment in null_add_devChristoph Hellwig1-1/+0
Fix up the recent blk_alloc_disk conversion. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210614060231.3965278-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-11nullb: use blk_mq_alloc_diskChristoph Hellwig1-6/+5
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210602065345.355274-21-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-03null_blk: Fix null pointer dereference on nullb->disk on blk_cleanup_disk callColin Ian King1-1/+1
The error handling on a nullb->disk allocation currently jumps to out_cleanup_disk that calls blk_cleanup_disk with a null pointer causing a null pointer dereference issue. Fix this by jumping to out_cleanup_tags instead. Addresses-Coverity: ("Dereference after null check") Fixes: 132226b301b5 ("null_blk: convert to blk_alloc_disk/blk_cleanup_disk") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210602100659.11058-1-colin.king@canonical.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-01null_blk: convert to blk_alloc_disk/blk_cleanup_diskChristoph Hellwig1-19/+19
Convert the null_blk driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Note that the blk-mq mode is left with its own allocations scheme, to be handled later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210521055116.1053587-26-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-29Merge tag 'for-5.13/drivers-2021-04-27' of git://git.kernel.dk/linux-blockLinus Torvalds3-1/+13
Pull block driver updates from Jens Axboe: - MD changes via Song: - raid5 POWER fix - raid1 failure fix - UAF fix for md cluster - mddev_find_or_alloc() clean up - Fix NULL pointer deref with external bitmap - Performance improvement for raid10 discard requests - Fix missing information of /proc/mdstat - rsxx const qualifier removal (Arnd) - Expose allocated brd pages (Calvin) - rnbd via Gioh Kim: - Change maintainer - Change domain address of maintainers' email - Add polling IO mode and document update - Fix memory leak and some bug detected by static code analysis tools - Code refactoring - Series of floppy cleanups/fixes (Denis) - s390 dasd fixes (Julian) - kerneldoc fixes (Lee) - null_blk double free (Lv) - null_blk virtual boundary addition (Max) - Remove xsysace driver (Michal) - umem driver removal (Davidlohr) - ataflop fixes (Dan) - Revalidate disk removal (Christoph) - Bounce buffer cleanups (Christoph) - Mark lightnvm as deprecated (Christoph) - mtip32xx init cleanups (Shixin) - Various fixes (Tian, Gustavo, Coly, Yang, Zhang, Zhiqiang) * tag 'for-5.13/drivers-2021-04-27' of git://git.kernel.dk/linux-block: (143 commits) async_xor: increase src_offs when dropping destination page drivers/block/null_blk/main: Fix a double free in null_init. md/raid1: properly indicate failure when ending a failed write request md-cluster: fix use-after-free issue when removing rdev nvme: introduce generic per-namespace chardev nvme: cleanup nvme_configure_apst nvme: do not try to reconfigure APST when the controller is not live nvme: add 'kato' sysfs attribute nvme: sanitize KATO setting nvmet: avoid queuing keep-alive timer if it is disabled brd: expose number of allocated pages in debugfs ataflop: fix off by one in ataflop_probe() ataflop: potential out of bounds in do_format() drbd: Fix fall-through warnings for Clang block/rnbd: Use strscpy instead of strlcpy block/rnbd-clt-sysfs: Remove copy buffer overlap in rnbd_clt_get_path_name block/rnbd-clt: Remove max_segment_size block/rnbd-clt: Generate kobject_uevent when the rnbd device state changes block/rnbd-srv: Remove unused arguments of rnbd_srv_rdma_ev Documentation/ABI/rnbd-clt: Add description for nr_poll_queues ...
2021-04-26drivers/block/null_blk/main: Fix a double free in null_init.Lv Yunlong1-0/+1
In null_init, null_add_dev(dev) is called. In null_add_dev, it calls null_free_zoned_dev(dev) to free dev->zones via kvfree(dev->zones) in out_cleanup_zone branch and returns err. Then null_init accept the err code and then calls null_free_dev(dev). But in null_free_dev(dev), dev->zones is freed again by null_free_zoned_dev(). My patch set dev->zones to NULL in null_free_zoned_dev() after kvfree(dev->zones) is called, to avoid the double free. Fixes: 2984c8684f962 ("nullb: factor disk parameters") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Link: https://lore.kernel.org/r/20210426143229.7374-1-lyl2019@mail.ustc.edu.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-12null_blk: add option for managing virtual boundaryMax Gurtovoy2-1/+12
This will enable changing the virtual boundary of null blk devices. For now, null blk devices didn't have any restriction on the scatter/gather elements received from the block layer. Add a module parameter and a configfs option that will control the virtual boundary. This will enable testing the efficiency of the block layer bounce buffer in case a suitable application will send discontiguous IO to the given device. Initial testing with patched FIO showed the following results (64 jobs, 128 iodepth, 1 nullb device): IO size READ (virt=false) READ (virt=true) Write (virt=false) Write (virt=true) ---------- ------------------- ----------------- ------------------- ------------------- 1k 10.7M 8482k 10.8M 8471k 2k 10.4M 8266k 10.4M 8271k 4k 10.4M 8274k 10.3M 8226k 8k 10.2M 8131k 9800k 7933k 16k 9567k 7764k 8081k 6828k 32k 8865k 7309k 5570k 5153k 64k 7695k 6586k 2682k 2617k 128k 5346k 5489k 1320k 1296k Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Link: https://lore.kernel.org/r/20210412095523.278632-1-mgurtovoy@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-01null_blk: fix command timeout completion handlingDamien Le Moal2-5/+22
Memory backed or zoned null block devices may generate actual request timeout errors due to the submission path being blocked on memory allocation or zone locking. Unlike fake timeouts or injected timeouts, the request submission path will call blk_mq_complete_request() or blk_mq_end_request() for these real timeout errors, causing a double completion and use after free situation as the block layer timeout handler executes blk_mq_rq_timed_out() and __blk_mq_free_request() in blk_mq_check_expired(). This problem often triggers a NULL pointer dereference such as: BUG: kernel NULL pointer dereference, address: 0000000000000050 RIP: 0010:blk_mq_sched_mark_restart_hctx+0x5/0x20 ... Call Trace: dd_finish_request+0x56/0x80 blk_mq_free_request+0x37/0x130 null_handle_cmd+0xbf/0x250 [null_blk] ? null_queue_rq+0x67/0xd0 [null_blk] blk_mq_dispatch_rq_list+0x122/0x850 __blk_mq_do_dispatch_sched+0xbb/0x2c0 __blk_mq_sched_dispatch_requests+0x13d/0x190 blk_mq_sched_dispatch_requests+0x30/0x60 __blk_mq_run_hw_queue+0x49/0x90 process_one_work+0x26c/0x580 worker_thread+0x55/0x3c0 ? process_one_work+0x580/0x580 kthread+0x134/0x150 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x1f/0x30 This problem very often triggers when running the full btrfs xfstests on a memory-backed zoned null block device in a VM with limited amount of memory. Avoid this by executing blk_mq_complete_request() in null_timeout_rq() only for commands that are marked for a fake timeout completion using the fake_timeout boolean in struct null_cmd. For timeout errors injected through debugfs, the timeout handler will execute blk_mq_complete_request()i as before. This is safe as the submission path does not execute complete requests in this case. In null_timeout_rq(), also make sure to set the command error field to BLK_STS_TIMEOUT and to propagate this error through to the request completion. Reported-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> Reviewed-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210331225244.126426-1-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-21Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-blockLinus Torvalds2-5/+5
Pull core block updates from Jens Axboe: "Another nice round of removing more code than what is added, mostly due to Christoph's relentless pursuit of tech debt removal/cleanups. This pull request contains: - Two series of BFQ improvements (Paolo, Jan, Jia) - Block iov_iter improvements (Pavel) - bsg error path fix (Pan) - blk-mq scheduler improvements (Jan) - -EBUSY discard fix (Jan) - bvec allocation improvements (Ming, Christoph) - bio allocation and init improvements (Christoph) - Store bdev pointer in bio instead of gendisk + partno (Christoph) - Block trace point cleanups (Christoph) - hard read-only vs read-only split (Christoph) - Block based swap cleanups (Christoph) - Zoned write granularity support (Damien) - Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)" * tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits) mm: simplify swapdev_block sd_zbc: clear zone resources for non-zoned case block: introduce blk_queue_clear_zone_settings() zonefs: use zone write granularity as block size block: introduce zone_write_granularity limit block: use blk_queue_set_zoned in add_partition() nullb: use blk_queue_set_zoned() to setup zoned devices nvme: cleanup zone information initialization block: document zone_append_max_bytes attribute block: use bi_max_vecs to find the bvec pool md/raid10: remove dead code in reshape_request block: mark the bio as cloned in bio_iov_bvec_set block: set BIO_NO_PAGE_REF in bio_iov_bvec_set block: remove a layer of indentation in bio_iov_iter_get_pages block: turn the nr_iovecs argument to bio_alloc* into an unsigned short block: remove the 1 and 4 vec bvec_slabs entries block: streamline bvec_alloc block: factor out a bvec_alloc_gfp helper block: move struct biovec_slab to bio.c block: reuse BIO_INLINE_VECS for integrity bvecs ...
2021-02-10nullb: use blk_queue_set_zoned() to setup zoned devicesDamien Le Moal1-4/+4
Use blk_queue_set_zoned() to set a nullb device zone model instead of directly assigning the device queue zoned limit. This initialization of the devicve zoned model as well as the setup of the queue flag QUEUE_FLAG_ZONE_RESETALL and of the device queue elevator feature are moved from null_init_zoned_dev() to null_register_zoned_dev() so that the initialization of the queue limits is done when the gendisk of the nullb device is available. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-29null_blk: cleanup zoned mode initializationDamien Le Moal1-7/+9
To avoid potential compilation problems, replaced the badly written MB_TO_SECTS() macro (missing parenthesis around the argument use) with the inline function mb_to_sects(). And while at it, simplify the calculation of the total number of zones of the device using the round_up() macro. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>