summaryrefslogtreecommitdiff
path: root/drivers/md
AgeCommit message (Collapse)AuthorFilesLines
2024-03-04dm vdo memory-alloc: return VDO_SUCCESS on successMike Snitzer2-16/+16
Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-04dm vdo errors: remove unused error codesMatthew Sakai4-51/+8
Also define VDO_SUCCESS in a more central location, and rename error block constants for clarity. Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-04dm vdo memory-alloc: rename vdo_do_allocation to __vdo_do_allocationMike Snitzer1-15/+15
__vdo_do_allocation shouldn't be used outside of memory-alloc.h, so add hidden prefix. Also, tabify the vdo_allocate_extended macro. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-04dm vdo memory-alloc: change from uds_ to vdo_ namespaceMike Snitzer44-455/+453
Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-04dm-vdo: change unnamed enums to definesBruce Johnston25-160/+91
Signed-off-by: Bruce Johnston <bjohnsto@redhat.com> Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-04dm vdo: remove outdated pointer_map referenceMatthew Sakai1-4/+1
Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-04dm vdo: update module commentsMatthew Sakai1-8/+3
Update outdated comments referring to separate VDO and UDS modules. Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-04dm vdo indexer delta-index: fix typos in commentsMatthew Sakai2-2/+2
Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-04dm vdo: fix various function names referenced in comment blocksJiapeng Chong8-12/+12
No functional modification involved. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-04dm vdo: move indexer files into sub-directoryMike Snitzer36-56/+77
The goal is to assist high-level understanding of which code is conceptually specific to VDO's indexer. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-04dm vdo: remove unnecessary indexer.h includesMatthew Sakai2-2/+0
Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-04dm vdo: clean up scnprintf usageChung Chung1-579/+239
Ignore scnprintf return status since it is not necessary. Change write_* functions type from int to void since we no longer return any result. Also, clean up any code that checks or uses any scnprintf return results. Check uds_allocate return code which was previous ignored, return and log error when uds_allocate failed. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Chung Chung <cchung@redhat.com> Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-04dm vdo: include <asm/current.h> to resolve current being undeclaredMike Snitzer3-0/+3
Reported when building on loongarch. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Bruce Johnston <bjohnsto@redhat.com> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-04dm vdo indexer-volume: fix missing mutex_lock in process_entryMike Snitzer1-1/+1
Must mutex_lock after dm_bufio_read, before dm_bufio_read error handling, otherwise process_entry error path will return without volume->read_threads_mutex held. This fixes potential double mutex_unlock. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Susan LeGendre-McGhee <slegendr@redhat.com> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-04dm vdo flush: initialize return to NULL in allocate_flushMike Snitzer1-1/+1
Otherwise, error path could result in allocate_flush's subsequent check for flush being non-NULL leading to false positive. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Susan LeGendre-McGhee <slegendr@redhat.com> Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-04dm vdo slab-depot: delete unnecessary check in allocate_componentsDan Carpenter1-3/+0
This is a duplicate check so it can't be true. Delete it. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Susan LeGendre-McGhee <slegendr@redhat.com> Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-04dm vdo memory-alloc: simplify allocations_allowed()Mike Snitzer1-4/+2
Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Susan LeGendre-McGhee <slegendr@redhat.com> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-04dm vdo: remove internal ticket referencesSusan LeGendre-McGhee11-39/+41
Signed-off-by: Susan LeGendre-McGhee <slegendr@redhat.com> Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-02dm-verity: Convert from tasklet to BH workqueueTejun Heo2-24/+45
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context. This commit converts dm-verity from tasklet to BH workqueue. It backfills tasklet code that was removed with commit 0a9bab391e33 ("dm-crypt, dm-verity: disable tasklets") and tweaks to use BH workqueue (and does some renaming). This is a minimal conversion which doesn't rename the related names including the "try_verify_in_tasklet" option. If this patch is applied, a follow-up patch would be necessary. I couldn't decide whether the option name would need to be updated too. Signed-off-by: Tejun Heo <tj@kernel.org> [snitzer: rename 'use_tasklet' to 'use_bh_wq' and 'in_tasklet' to 'in_bh'] Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-02dm-crypt: Convert from tasklet to BH workqueueTejun Heo1-1/+5
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws. To replace tasklets, BH workqueue support was recently added. A BH workqueue behaves similarly to regular workqueues except that the queued work items are executed in the BH context. This commit converts dm-crypt from tasklet to BH workqueue. It backfills tasklet code that was removed with commit 0a9bab391e33 ("dm-crypt, dm-verity: disable tasklets") and tweaks to use BH workqueue. Like a regular workqueue, a BH workqueue allows freeing the currently executing work item. Converting from tasklet to BH workqueue removes the need for deferring bio_endio() again to a work item, which was buggy anyway. I tested this lightly with "--perf-no_read_workqueue --perf-no_write_workqueue" + some code modifications, but would really -appreciate if someone who knows the code base better could take a look. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/82b964f0-c2c8-a2c6-5b1f-f3145dc2c8e5@redhat.com [snitzer: rebase ontop of commit 0a9bab391e33 reduced this commit's changes] Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-01dm: use queue_limits_setChristoph Hellwig1-15/+12
Use queue_limits_set which validates the limits and takes care of updating the readahead settings instead of directly assigning them to the queue. For that make sure all limits are actually updated before the assignment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20240228225653.947152-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-03-01dm vdo thread-device: rename all methods to reflect vdo-only useMike Snitzer4-29/+27
Also moved vdo_init()'s call to vdo_initialize_thread_device_registry next to other registry initialization. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-01dm vdo thread-registry: rename all methods to reflect vdo-only useMike Snitzer4-21/+21
Otherwise, uds_ prefix is misleading (vdo_ is the new catch-all for code that is used by vdo-only or _both_ vdo and the indexer code). Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-01dm vdo thread-utils: cleanup included headersMike Snitzer3-8/+3
Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-01dm vdo thread-utils: further cleanup of thread functionsMike Snitzer6-21/+15
Change thread function prefix from "uds_" to "vdo_" and fix vdo_join_threads() to return void. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-01dm vdo thread-utils: remove all uds_*_mutex wrappersMike Snitzer5-157/+96
Just use mutex_init, mutex_lock and mutex_unlock. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-01dm vdo thread-utils: push uds_*_cond interface down to indexerMike Snitzer6-79/+40
Only used by indexer components. Also return void from uds_init_cond(), remove uds_destroy_cond(), and fix up all callers. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-01dm vdo: fold thread-cond-var.c into thread-utilsMike Snitzer4-52/+34
Further cleanup is needed for thread-utils interfaces given many functions should return void or be removed entirely because they amount to obfuscation via wrappers. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-01dm vdo indexer: rename uds.h to indexer.hMike Snitzer22-24/+23
Also remove unnecessary include from funnel-queue.c. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-01dm vdo: rename uds-threads.[ch] to thread-utils.[ch]Mike Snitzer14-16/+15
Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-01dm vdo indexer sparse-cache: cleanup threads_barrier codeMike Snitzer1-41/+19
Rename 'barrier' to 'threads_barrier', remove useless uds_destroy_barrier(), return void from remaining methods and clean up uds_make_sparse_cache() accordingly. Also remove uds_ prefix from the 2 remaining threads_barrier functions. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-01dm vdo uds-threads: push 'barrier' down to sparse-cacheMike Snitzer3-72/+68
The sparse-cache is the only user of the 'barrier' data structure, so just move it private to it. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-01dm vdo uds-threads: eliminate uds_*_semaphore interfacesMike Snitzer2-44/+15
The implementation of thread 'barrier' data structure does not require overdone private semaphore wrappers. Also rename the barrier structure's 'mutex' member (a semaphore) to 'lock'. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-01dm vdo: make uds_*_semaphore interface private to uds-threads.cMike Snitzer2-37/+39
Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-01dm vdo block-map: rename page state name from "UDS_FREE" to "FREE"Mike Snitzer1-1/+1
Only used for log message, but no need for "UDS_" prefix. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Susan LeGendre-McGhee <slegendr@redhat.com> Signed-off-by: Matthew Sakai <msakai@redhat.com>
2024-03-01dm vdo volume-index: fix an assert statement in ↵Harshit Mogalapalli1-1/+1
start_restoring_volume_sub_index() Use "==" instead of "=" in ASSERT() statement. Fixes: ef074a31e88e ("dm vdo: implement the volume index") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Susan LeGendre-McGhee <slegendr@redhat.com> Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-01md/raid1: factor out helpers to choose the best rdev from read_balance()Yu Kuai1-77/+98
The way that best rdev is chosen: 1) If the read is sequential from one rdev: - if rdev is rotational, use this rdev; - if rdev is non-rotational, use this rdev until total read length exceed disk opt io size; 2) If the read is not sequential: - if there is idle disk, use it, otherwise: - if the array has non-rotational disk, choose the rdev with minimal inflight IO; - if all the underlaying disks are rotational disk, choose the rdev with closest IO; There are no functional changes, just to make code cleaner and prepare for following refactor. Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240229095714.926789-12-yukuai1@huaweicloud.com
2024-03-01md/raid1: factor out the code to manage sequential IOYu Kuai1-34/+37
There is no functional change for now, make read_balance() cleaner and prepare to fix problems and refactor the handler of sequential IO. Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240229095714.926789-11-yukuai1@huaweicloud.com
2024-03-01md/raid1: factor out choose_bb_rdev() from read_balance()Yu Kuai1-31/+48
read_balance() is hard to understand because there are too many status and branches, and it's overlong. This patch factor out the case to read the rdev with bad blocks from read_balance(), there are no functional changes. Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240229095714.926789-10-yukuai1@huaweicloud.com
2024-03-01md/raid1: factor out choose_slow_rdev() from read_balance()Yu Kuai1-17/+52
read_balance() is hard to understand because there are too many status and branches, and it's overlong. This patch factor out the case to read the slow rdev from read_balance(), there are no functional changes. Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240229095714.926789-9-yukuai1@huaweicloud.com
2024-03-01md/raid1: factor out read_first_rdev() from read_balance()Yu Kuai1-17/+46
read_balance() is hard to understand because there are too many status and branches, and it's overlong. This patch factor out the case to read the first rdev from read_balance(), there are no functional changes. Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240229095714.926789-8-yukuai1@huaweicloud.com
2024-03-01md/raid1-10: factor out a new helper raid1_should_read_first()Yu Kuai3-24/+24
If resync is in progress, read_balance() should find the first usable disk, otherwise, data could be inconsistent after resync is done. raid1 and raid10 implement the same checking, hence factor out the checking to make code cleaner. Noted that raid1 is using 'mddev->recovery_cp', which is updated after all resync IO is done, while raid10 is using 'conf->next_resync', which is inaccurate because raid10 update it before submitting resync IO. Fortunately, raid10 read IO can't concurrent with resync IO, hence there is no problem. And this patch also switch raid10 to use 'mddev->recovery_cp'. Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240229095714.926789-7-yukuai1@huaweicloud.com
2024-03-01md/raid1-10: add a helper raid1_check_read_range()Yu Kuai1-0/+49
The checking and handler of bad blocks appear many timers during read_balance() in raid1 and raid10. This helper will be used in later patches to simplify read_balance() a lot. Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240229095714.926789-6-yukuai1@huaweicloud.com
2024-03-01md/raid1: fix choose next idle in read_balance()Yu Kuai1-10/+22
Commit 12cee5a8a29e ("md/raid1: prevent merging too large request") add the case choose next idle in read_balance(): read_balance: for_each_rdev if(next_seq_sect == this_sector || dist == 0) -> sequential reads best_disk = disk; if (...) choose_next_idle = 1 continue; for_each_rdev -> iterate next rdev if (pending == 0) best_disk = disk; -> choose the next idle disk break; if (choose_next_idle) -> keep using this rdev if there are no other idle disk contine However, commit 2e52d449bcec ("md/raid1: add failfast handling for reads.") remove the code: - /* If device is idle, use it */ - if (pending == 0) { - best_disk = disk; - break; - } Hence choose next idle will never work now, fix this problem by following: 1) don't set best_disk in this case, read_balance() will choose the best disk after iterating all the disks; 2) add 'pending' so that other idle disk will be chosen; 3) add a new local variable 'sequential_disk' to record the disk, and if there is no other idle disk, 'sequential_disk' will be chosen; Fixes: 2e52d449bcec ("md/raid1: add failfast handling for reads.") Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240229095714.926789-5-yukuai1@huaweicloud.com
2024-03-01md/raid1: record nonrot rdevs while adding/removing rdevs to confYu Kuai3-7/+12
For raid1, each read will iterate all the rdevs from conf and check if any rdev is non-rotational, then choose rdev with minimal IO inflight if so, or rdev with closest distance otherwise. Disk nonrot info can be changed through sysfs entry: /sys/block/[disk_name]/queue/rotational However, consider that this should only be used for testing, and user really shouldn't do this in real life. Record the number of non-rotational disks in conf, to avoid checking each rdev in IO fast path and simplify read_balance() a little bit. Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240229095714.926789-4-yukuai1@huaweicloud.com
2024-03-01md/raid1: factor out helpers to add rdev to confYu Kuai1-32/+53
There are no functional changes, just make code cleaner and prepare to record disk non-rotational information while adding and removing rdev to conf Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240229095714.926789-3-yukuai1@huaweicloud.com
2024-03-01md: add a new helper rdev_has_badblock()Yu Kuai4-72/+44
The current api is_badblock() must pass in 'first_bad' and 'bad_sectors', however, many caller just want to know if there are badblocks or not, and these caller must define two local variable that will never be used. Add a new helper rdev_has_badblock() that will only return if there are badblocks or not, remove unnecessary local variables and replace is_badblock() with the new helper in many places. There are no functional changes, and the new helper will also be used later to refactor read_balance(). Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240229095714.926789-2-yukuai1@huaweicloud.com
2024-02-28md/raid5: fix atomicity violation in raid5_cache_countGui-Dong Han1-6/+8
In raid5_cache_count(): if (conf->max_nr_stripes < conf->min_nr_stripes) return 0; return conf->max_nr_stripes - conf->min_nr_stripes; The current check is ineffective, as the values could change immediately after being checked. In raid5_set_cache_size(): ... conf->min_nr_stripes = size; ... while (size > conf->max_nr_stripes) conf->min_nr_stripes = conf->max_nr_stripes; ... Due to intermediate value updates in raid5_set_cache_size(), concurrent execution of raid5_cache_count() and raid5_set_cache_size() may lead to inconsistent reads of conf->max_nr_stripes and conf->min_nr_stripes. The current checks are ineffective as values could change immediately after being checked, raising the risk of conf->min_nr_stripes exceeding conf->max_nr_stripes and potentially causing an integer overflow. This possible bug is found by an experimental static analysis tool developed by our team. This tool analyzes the locking APIs to extract function pairs that can be concurrently executed, and then analyzes the instructions in the paired functions to identify possible concurrency bugs including data races and atomicity violations. The above possible bug is reported when our tool analyzes the source code of Linux 6.2. To resolve this issue, it is suggested to introduce local variables 'min_stripes' and 'max_stripes' in raid5_cache_count() to ensure the values remain stable throughout the check. Adding locks in raid5_cache_count() fails to resolve atomicity violations, as raid5_set_cache_size() may hold intermediate values of conf->min_nr_stripes while unlocked. With this patch applied, our tool no longer reports the bug, with the kernel configuration allyesconfig for x86_64. Due to the lack of associated hardware, we cannot test the patch in runtime testing, and just verify it according to the code logic. Fixes: edbe83ab4c27 ("md/raid5: allow the stripe_cache to grow and shrink.") Cc: stable@vger.kernel.org Signed-off-by: Gui-Dong Han <2045gemini@gmail.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240112071017.16313-1-2045gemini@gmail.com Signed-off-by: Song Liu <song@kernel.org>
2024-02-27md/md-bitmap: fix incorrect usage for sb_indexHeming Zhao1-3/+6
Commit d7038f951828 ("md-bitmap: don't use ->index for pages backing the bitmap file") removed page->index from bitmap code, but left wrong code logic for clustered-md. current code never set slot offset for cluster nodes, will sometimes cause crash in clustered env. Call trace (partly): md_bitmap_file_set_bit+0x110/0x1d8 [md_mod] md_bitmap_startwrite+0x13c/0x240 [md_mod] raid1_make_request+0x6b0/0x1c08 [raid1] md_handle_request+0x1dc/0x368 [md_mod] md_submit_bio+0x80/0xf8 [md_mod] __submit_bio+0x178/0x300 submit_bio_noacct_nocheck+0x11c/0x338 submit_bio_noacct+0x134/0x614 submit_bio+0x28/0xdc submit_bh_wbc+0x130/0x1cc submit_bh+0x1c/0x28 Fixes: d7038f951828 ("md-bitmap: don't use ->index for pages backing the bitmap file") Cc: stable@vger.kernel.org # v6.6+ Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240223121128.28985-1-heming.zhao@suse.com
2024-02-26md: check mddev->pers before calling md_set_readonly()Li Nan1-11/+11
If 'mddev->pers' is NULL, there is nothing to do in md_set_readonly(). Except for md_ioctl(), the other two callers of md_set_readonly() have already checked 'mddev->pers'. To simplify the code, move the check of 'mddev->pers' to the caller. Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240226031444.3606764-10-linan666@huaweicloud.com