summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-04-17btrfs: pass a btrfs_bio to btrfs_submit_compressed_readChristoph Hellwig3-10/+10
btrfs_submit_compressed_read expects the bio passed to it to be embedded into a btrfs_bio structure. Pass the btrfs_bio directly to increase type safety and make the code self-documenting. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: pass a btrfs_bio to btrfs_submit_bioChristoph Hellwig5-14/+14
btrfs_submit_bio expects the bio passed to it to be embedded into a btrfs_bio structure. Pass the btrfs_bio directly to increase type safety and make the code self-documenting. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: move zero filling of compressed read bios into common codeChristoph Hellwig4-12/+7
All algorithms have to fill the remainder of the orig_bio with zeroes, so do it in common code. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: cleanup main loop in btrfs_encoded_read_regular_fill_pagesChristoph Hellwig1-28/+23
btrfs_encoded_read_regular_fill_pages has a pretty odd control flow. Unwind it so that there is a single loop over the pages array. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: remove unused members from struct btrfs_encoded_read_privateChristoph Hellwig1-4/+0
The inode and file_offset members in struct btrfs_encoded_read_private are unused, so remove them. Last used in commit 7959bd441176 ("btrfs: remove the start argument to check_data_csum and export") and commit 7609afac6775 ("btrfs: handle checksum validation and repair at the storage layer"). Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: locking: use atomic for DREW lock writersDavid Sterba3-33/+9
The DREW lock uses percpu variable to track lock counters and for that it needs to allocate the structure. In btrfs_read_tree_root() or btrfs_init_fs_root() this may add another error case or requires the NOFS scope protection. One way is to preallocate the structure as was suggested in https://lore.kernel.org/linux-btrfs/20221214021125.28289-1-robbieko@synology.com/ We may avoid the allocation altogether if we don't use the percpu variables but an atomic for the writer counter. This should not make any difference, the DREW lock is used for truncate and NOCOW writes along with other IO operations. The percpu counter for writers has been there since the original commit 8257b2dc3c1a1057 "Btrfs: introduce btrfs_{start, end}_nocow_write() for each subvolume". The reason could be to avoid hammering the same cacheline from all the readers but then the writers do that anyway. Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: remove redundant clearing of NODISCARDAnand Jain1-1/+0
If no discard mount option is specified, including the NODISCARD option, we make the async discard the default option then we don't have to call the clear_opt again to clear the NODISCARD flag. Though this makes no difference, that the call is redundant has been pointed out several times so we better remove it. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: avoid repetitive define BTRFS_FEATURE_INCOMPAT_SUPPAnand Jain1-20/+15
BTRFS_FEATURE_INCOMPAT_SUPP is defined twice, once under CONFIG_BTRFS_DEBUG and once without it, resulting in repetitive code. The reason for this is to add experimental features under CONFIG_BTRFS_DEBUG. To avoid repetitive code, add a common list BTRFS_FEATURE_INCOMPAT_SUPP_STABLE, and append experimental features only under CONFIG_BTRFS_DEBUG. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: scrub: remove root and csum_root arguments from scrub_simple_mirror()Qu Wenruo1-19/+9
We don't need to pass the roots as arguments, reading them from the rb-tree is cheap. Thus there is really not much need to pre-fetch it and pass it all the way from scrub_stripe(). And we already have more than enough arguments in scrub_simple_mirror() and scrub_simple_stripe(), it's better to remove them and only grab those roots in scrub_simple_mirror(). Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: scrub: remove unused path inside scrub_stripe()Qu Wenruo1-15/+0
The variable @path is no longer passed into any call sites after commit 18d30ab96149 ("btrfs: scrub: use scrub_simple_mirror() to handle RAID56 data stripe scrub"), thus we can remove the variable completely. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: do not use replace target device as an extra mirrorQu Wenruo1-120/+7
[BUG] Currently btrfs can use dev-replace device as an extra mirror for read-repair. But it can lead to NODATASUM corruption in the following case: There is a RAID1 data chunk, and dev-replace is running from dev2 to dev0. |//| = Replaced data X X+1MB X+2MB Dev 2: | | | <- Source dev Dev 0: |///////| | <- Target dev Then a read on dev 2 X+2MB happens. And something wrong happened inside devid 2, causing an -EIO. In that case, read-repair would try the next mirror, and since we can use target device as an extra mirror, we will use that mirror instead. But unfortunately since the read is beyond the current replace cursor, we should not trust it at all, what we get would be just uninitialized garbage. But if this read is for NODATASUM range, then we just trust them and cause data corruption. [CAUSE] We used to have some checks to make sure we only return such extra mirror when the range is before our left cursor. The first commit introducing this behavior is ad6d620e2a57 ("Btrfs: allow repair code to include target disk when searching mirrors"). But later a fix, 22ab04e814f4 ("Btrfs: fix race between device replace and chunk allocation") changed the behavior, to always let btrfs_map_block() include the extra mirror to address a race in dev-replace which can cause missing writes to target device. This means, we lose the tracking of cursor for the extra mirror, thus can lead to above corruption. [FIX] The extra mirror is never a reliable one, at the beginning of dev-replace, the reliability is zero, while only at the end of the replace it's a fully reliable mirror. We either do the complex tracking, or never trust it. IMHO it's much easier to maintain if we don't trust it at all, and the extra mirror can only benefit for a limited period of time (during replace). Thus this patch would completely remove the ability to use target device as an extra mirror. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: open_ctree() error handling cleanupQu Wenruo1-34/+31
Currently open_ctree() still uses two variables for error handling, err and ret. This can be confusing and missing some errors and does not conform to current coding style. This patch will fix the problems by: - Use only ret for error handling - Add proper ret assignment Originally we rely on the default value (-EINVAL) of err to handle errors, but that doesn't really reflects the error. This will change it use the correct error number for the following call sites: * subpage_info allocation * btrfs_free_extra_devids() * btrfs_check_rw_degradable() * cleaner_kthread allocation * transaction_kthread allocation - Add an extra ASSERT() To make sure we error out instead of returning 0. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: cleanup the main loop in btrfs_lookup_bio_sumsChristoph Hellwig1-24/+9
Introduce a bio_offset variable for the current offset into the bio instead of recalculating it over and over. Remove the now only used once search_len and sector_offset variables, and reduce the scope for count and cur_disk_bytenr. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: remove search_file_offset_in_bioChristoph Hellwig1-49/+3
There is no need to search for a file offset in a bio, it is now always provided in bbio->file_offset (set at bio allocation time since 0d495430db8d ("btrfs: set bbio->file_offset in alloc_new_bio")). Just use that with the offset into the bio. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: sink calc_bio_boundaries into its only callerJohannes Thumshirn1-22/+15
Nowadays calc_bio_boundaries() is a relatively simple function that only guarantees the one bio equals to one ordered extent rule for uncompressed Zone Append bios. Sink it into it's only caller alloc_new_bio(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: simplify main loop in submit_extent_pageChristoph Hellwig1-86/+30
bio_add_page always adds either the entire range passed to it or nothing. Based on that btrfs_bio_add_page can only return a length smaller than the passed in one when hitting the ordered extent limit, which can only happen for writes. Given that compressed writes never even use this code path, this means that all the special cases for compressed extent offset handling are dead code. Reflow submit_extent_page to take advantage of this by inlining btrfs_bio_add_page and handling the ordered extent limit by decrementing it for each added range and thus significantly simplifying the loop. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: check for contiguity in submit_extent_pageChristoph Hellwig1-33/+36
Different loop iterations in btrfs_bio_add_page not only have the same contiguity parameters, but also any non-initial operation operates on a fresh bio anyway. Factor out the contiguity check into a new btrfs_bio_is_contig and only call it once in submit_extent_page before descending into the bio_add_page loop. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: simplify the error handling in __extent_writepage_ioChristoph Hellwig1-11/+7
Remove the has_error and saved_ret variables, and just jump to a goto label for error handling from the only place returning an error from the main loop. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: remove the submit_extent_page return valueChristoph Hellwig1-120/+35
submit_extent_page always returns 0 since commit d5e4377d5051 ("btrfs: split zone append bios in btrfs_submit_bio"). Change it to a void return type and remove all the unreachable error handling code in the callers. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: remove the compress_type argument to submit_extent_pageChristoph Hellwig1-18/+14
Update the compress_type in the btrfs_bio_ctrl after forcing out the previous bio in btrfs_do_readpage, so that alloc_new_bio can just use the compress_type member in struct btrfs_bio_ctrl instead of passing the same information redundantly as a function argument. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: rename the this_bio_flag variable in btrfs_do_readpageChristoph Hellwig1-5/+5
Rename this_bio_flag to compress_type to match the surrounding code and better document the intent. Also use the proper enum type instead of unsigned long. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: move the compress_type check out of btrfs_bio_add_pageChristoph Hellwig1-9/+6
The compress_type can only change on a per-extent basis. So instead of checking it for every page in btrfs_bio_add_page, do the check once in btrfs_do_readpage, which is the only caller of btrfs_bio_add_page and submit_extent_page that deals with compressed extents. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: add a wbc pointer to struct btrfs_bio_ctrlChristoph Hellwig1-47/+41
Instead of passing down the wbc pointer the deep call chain, just add it to the btrfs_bio_ctrl structure. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: remove the sync_io flag in struct btrfs_bio_ctrlChristoph Hellwig1-9/+4
The sync_io flag is equivalent to wbc->sync_mode == WB_SYNC_ALL, so just check for that and remove the separate flag. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: store the bio opf in struct btrfs_bio_ctrlChristoph Hellwig1-36/+29
The bio op and flags never change over the life time of a bio_ctrl, so move it in there instead of passing it down the deep call chain all the way down to alloc_new_bio. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: remove the force_bio_submit to submit_extent_pageChristoph Hellwig1-13/+9
If force_bio_submit, submit_extent_page simply calls submit_one_bio as the first thing. This can just be moved to the only caller that sets force_bio_submit to true. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: don't set force_bio_submit in read_extent_buffer_subpageChristoph Hellwig1-1/+1
When read_extent_buffer_subpage calls submit_extent_page, it does so on a freshly initialized btrfs_bio_ctrl structure that can't have a valid bio to submit. Clear the force_bio_submit parameter to false as there is nothing to submit. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: open code btrfs_bin_search()Anand Jain4-25/+13
btrfs_bin_search() is a simple wrapper that searches for the whole slots by calling btrfs_generic_bin_search() with the starting slot/first_slot preset to 0. This simple wrapper can be open coded as btrfs_bin_search(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: dev-replace: properly follow its read modeQu Wenruo1-40/+112
[BUG] Although dev replace ioctl has a way to specify the mode on whether we should read from the source device, it's not properly followed. # mkfs.btrfs -f -d raid1 -m raid1 $dev1 $dev2 # mount $dev1 $mnt # xfs_io -f -c "pwrite 0 32M" $mnt/file # sync # btrfs replace start -r -f 1 $dev3 $mnt And one extra trace is added to scrub_submit(), showing the detail about the bio: btrfs-11569 [005] ... 37.0270: scrub_submit.part.0: devid=1 logical=22036480 phy=22036480 len=16384 btrfs-11569 [005] ... 37.0273: scrub_submit.part.0: devid=1 logical=30457856 phy=30457856 len=32768 btrfs-11569 [005] ... 37.0274: scrub_submit.part.0: devid=1 logical=30507008 phy=30507008 len=49152 btrfs-11569 [005] ... 37.0274: scrub_submit.part.0: devid=1 logical=30605312 phy=30605312 len=32768 btrfs-11569 [005] ... 37.0275: scrub_submit.part.0: devid=1 logical=30703616 phy=30703616 len=65536 btrfs-11569 [005] ... 37.0281: scrub_submit.part.0: devid=1 logical=298844160 phy=298844160 len=131072 ... btrfs-11569 [005] ... 37.0762: scrub_submit.part.0: devid=1 logical=322961408 phy=322961408 len=131072 btrfs-11569 [005] ... 37.0762: scrub_submit.part.0: devid=1 logical=323092480 phy=323092480 len=131072 One can see that all the reads are submitted to devid 1, even if we have specified "-r" option to avoid reading from the source device. [CAUSE] The dev-replace read mode is only set but not followed by scrub code at all. In fact, only common read path is properly following the read mode, but scrub itself has its own read path, thus not following the mode. [FIX] Here we enhance scrub_find_good_copy() to also follow the read mode. The idea is pretty simple, in the first loop, we avoid the following devices: - Missing devices This is the existing condition - The source device if the replace wants to avoid it. And if above loop found no candidate (e.g. replace a single device), then we discard the 2nd condition, and try again. Since we're here, also enhance the function scrub_find_good_copy() by: - Remove the forward declaration - Makes it return int To indicates errors, e.g. no good mirror found. - Add extra error messages Now with the same trace, "btrfs replace start -r" works as expected: btrfs-1213 [000] ... 991.9059: scrub_submit.part.0: devid=2 logical=22036480 phy=1064960 len=16384 btrfs-1213 [000] ... 991.9062: scrub_submit.part.0: devid=2 logical=30457856 phy=9486336 len=32768 btrfs-1213 [000] ... 991.9063: scrub_submit.part.0: devid=2 logical=30507008 phy=9535488 len=49152 btrfs-1213 [000] ... 991.9064: scrub_submit.part.0: devid=2 logical=30605312 phy=9633792 len=32768 btrfs-1213 [000] ... 991.9065: scrub_submit.part.0: devid=2 logical=30703616 phy=9732096 len=65536 btrfs-1213 [000] ... 991.9073: scrub_submit.part.0: devid=2 logical=298844160 phy=277872640 len=131072 btrfs-1213 [000] ... 991.9075: scrub_submit.part.0: devid=2 logical=298975232 phy=278003712 len=131072 btrfs-1213 [000] ... 991.9078: scrub_submit.part.0: devid=2 logical=299106304 phy=278134784 len=131072 ... btrfs-1213 [000] ... 991.9474: scrub_submit.part.0: devid=2 logical=318504960 phy=297533440 len=131072 btrfs-1213 [000] ... 991.9476: scrub_submit.part.0: devid=2 logical=318636032 phy=297664512 len=131072 btrfs-1213 [000] ... 991.9479: scrub_submit.part.0: devid=2 logical=318767104 phy=297795584 len=131072 Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: fold finish_compressed_bio_write into btrfs_finish_compressed_write_workChristoph Hellwig1-9/+4
Fold finish_compressed_bio_write into its only caller as there is no reason to keep them separate. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: don't clear page->mapping in btrfs_free_compressed_pagesChristoph Hellwig1-6/+2
No one ever set ->mapping on these pages, so don't bother clearing it. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: factor out a btrfs_free_compressed_pages helperChristoph Hellwig1-28/+13
Share the code to free the compressed pages and the array to hold them into a common helper. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: factor out a btrfs_add_compressed_bio_pages helperChristoph Hellwig1-63/+41
Factor out a common helper to add the compressed_bio pages to the bio that is shared by the compressed read and write path. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: use the bbio file offset in add_ra_bio_pagesChristoph Hellwig1-8/+1
struct btrfs_bio now has a file_offset field set up by all submitters. Use that value combined with the bio size in add_ra_bio_pages to calculate the last offset in the bio. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: use the bbio file offset in btrfs_submit_compressed_readChristoph Hellwig1-4/+1
struct btrfs_bio now has a file_offset field set up by all submitters. Use that in btrfs_submit_compressed_read instead of recalculating the value. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: remove redundant free_extent_map in btrfs_submit_compressed_readChristoph Hellwig1-2/+0
em can't be non-NULL after the free_extent_map label. Also remove the now pointless clearing of em to NULL after freeing it. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: embed a btrfs_bio into struct compressed_bioChristoph Hellwig5-109/+83
Embed a btrfs_bio into struct compressed_bio. This avoids potential (so far theoretical) deadlocks due to nesting of btrfs_bioset allocations for the original read bio and the compressed bio, and avoids an extra memory allocation in the I/O path. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: replace btrfs_io_context::raid_map with a fixed u64 valueQu Wenruo5-83/+78
In btrfs_io_context structure, we have a pointer raid_map, which indicates the logical bytenr for each stripe. But considering we always call sort_parity_stripes(), the result raid_map[] is always sorted, thus raid_map[0] is always the logical bytenr of the full stripe. So why we waste the space and time (for sorting) for raid_map? This patch will replace btrfs_io_context::raid_map with a single u64 number, full_stripe_start, by: - Replace btrfs_io_context::raid_map with full_stripe_start - Replace call sites using raid_map[0] to use full_stripe_start - Replace call sites using raid_map[i] to compare with nr_data_stripes. The benefits are: - Less memory wasted on raid_map It's sizeof(u64) * num_stripes vs sizeof(u64). It'll always save at least one u64, and the benefit grows larger with num_stripes. - No more weird alloc_btrfs_io_context() behavior As there is only one fixed size + one variable length array. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: use an efficient way to represent source of duplicated stripesQu Wenruo4-113/+115
For btrfs dev-replace, we have to duplicate writes to the source device into the target device. For non-RAID56, all writes into the same mapped ranges are sharing the same content, thus they don't really need to bother anything. (E.g. in btrfs_submit_bio() for non-RAID56 range we just submit the same write to all involved devices). But for RAID56, all stripes contain different content, thus we must have a clear mapping of which stripe is duplicated from which original stripe. Currently we use a complex way using tgtdev_map[] array, e.g: num_tgtdevs = 1 tgtdev_map[0] = 0 <- Means stripes[0] is not involved in replace. tgtdev_map[1] = 3 <- Means stripes[1] is involved in replace, and it's duplicated to stripes[3]. tgtdev_map[2] = 0 <- Means stripes[2] is not involved in replace. But this is wasting some space, and ignores one important thing for dev-replace, there is at most one running replace. Thus we can change it to a fixed array to represent the mapping: replace_nr_stripes = 1 replace_stripe_src = 1 <- Means stripes[1] is involved in replace. thus the extra stripe is a copy of stripes[1] By this we can save some space for bioc on RAID56 chunks with many devices. And we get rid of one variable sized array from bioc. Thus the patch involves the following changes: - Replace @num_tgtdevs and @tgtdev_map[] with @replace_nr_stripes and @replace_stripe_src. @num_tgtdevs is just renamed to @replace_nr_stripes. While the mapping is completely changed. - Add extra ASSERT()s for RAID56 code - Only add two more extra stripes for dev-replace cases. As we have an upper limit on how many dev-replace stripes we can have. - Unify the behavior of handle_ops_on_dev_replace() Previously handle_ops_on_dev_replace() go two different paths for WRITE and GET_READ_MIRRORS. Now unify them by always going the WRITE path first (with at most 2 replace stripes), then if we're doing GET_READ_MIRRORS and we have 2 extra stripes, just drop one stripe. - Remove the @real_stripes argument from alloc_btrfs_io_context() As we don't need the old variable length array any more. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: reduce type width of btrfs_io_contextsQu Wenruo2-12/+58
That structure is our ultimate object for all __btrfs_map_block() related functions. We have some hard to understand members, like tgtdev_map, but without any comments. This patch will improve the situation: - Add extra comments for num_stripes, mirror_num, num_tgtdevs and tgtdev_map[] Especially for the last two members, add a dedicated (thus very long) comments for them, with example to explain it. - Shrink those int members to u16. In fact our on-disk format is only using u16 for num_stripes, thus no need to use int at all. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: simplify the bioc argument for handle_ops_on_dev_replace()Qu Wenruo1-4/+2
There is no memory re-allocation for handle_ops_on_dev_replace(), thus we don't need to pass a btrfs_io_context pointer. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: reduce div64 calls by limiting the number of stripes of a chunk to u32Qu Wenruo4-42/+62
There are quite some div64 calls inside btrfs_map_block() and its variants. Such calls are for @stripe_nr, where @stripe_nr is the number of stripes before our logical bytenr inside a chunk. However we can eliminate such div64 calls by just reducing the width of @stripe_nr from 64 to 32. This can be done because our chunk size limit is already 10G, with fixed stripe length 64K. Thus a U32 is definitely enough to contain the number of stripes. With such width reduction, we can get rid of slower div64, and extra warning for certain 32bit arch. This patch would do: - Add a new tree-checker chunk validation on chunk length Make sure no chunk can reach 256G, which can also act as a bitflip checker. - Reduce the width from u64 to u32 for @stripe_nr variables - Replace unnecessary div64 calls with regular modulo and division 32bit division and modulo are much faster than 64bit operations, and we are finally free of the div64 fear at least in those involved functions. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: replace map_lookup->stripe_len by BTRFS_STRIPE_LENQu Wenruo5-60/+66
Currently btrfs doesn't support stripe lengths other than 64KiB. This is already set in the tree-checker. There is really no meaning to record that fixed value in map_lookup for now, and can all be replaced with BTRFS_STRIPE_LEN. Furthermore we can use the fix stripe length to do the following optimization: - Use BTRFS_STRIPE_LEN_SHIFT to replace some 64bit division Now we only need to do a right shift. And the value of BTRFS_STRIPE_LEN itself is already too large for bit shift, thus if we accidentally use BTRFS_STRIPE_LEN to do bit shift, a compiler warning would be triggered. Thus this bit shift optimization would be safe. - Use BTRFS_STRIPE_LEN_MASK to calculate the offset inside a stripe Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: move all btree inode initialization into btrfs_init_btree_inodeChristoph Hellwig1-7/+14
Move the remaining code that deals with initializing the btree inode into btrfs_init_btree_inode instead of splitting it between that helpers and its only caller. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: switch search_file_offset_in_bio to return boolAnand Jain1-4/+4
Function search_file_offset_in_bio() finds the file offset in the file_offset_ret, and we use the return value to indicate if it is successful, so use bool. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: avoid reusing return variable in nested block in btrfs_lookup_bio_sumsAnand Jain1-5/+4
The function btrfs_lookup_bio_sums() and a nested if statement declare ret respectively as blk_status_t and int. There is no need to store the return value of search_file_offset_in_bio() to ret as this is a one-time call. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: open code btrfs_csum_ptrJohannes Thumshirn1-8/+2
Remove btrfs_csum_ptr() and fold it into it's only caller. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: raid56: no need for irqsafe lockingChristoph Hellwig1-28/+22
These days all the operations that take locks in the raid56.c code are run from user context (mostly workqueues). Drop all the irqsafe locking that is not required any more. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: abort the transaction if we get an error during snapshot dropJosef Bacik1-0/+2
We were seeing weird errors when we were testing our btrfs backports before we had the incorrect level check fix. These errors appeared to be improper error handling, but error injection testing uncovered that the errors were a result of corruption that occurred from improper error handling during snapshot delete. With snapshot delete if we encounter any errors during walk_down or walk_up we'll simply return an error, we won't abort the transaction. This is problematic because we will be dropping references for nodes and leaves along the way, and if we fail in the middle we will leave the file system corrupt because we don't know where we left off in the drop. Fix this by making sure we abort if we hit any errors during the walk down or walk up operations, as we have no idea what operations could have been left half done at this point. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-17btrfs: handle errors in walk_down_tree properlyJosef Bacik1-4/+4
We can get errors in walk_down_proc as we try and lookup extent info for the snapshot dropping to act on. However if we get an error we simply return 1 which indicates we're done with walking down, which will lead us to improperly continue with the snapshot drop with the incorrect information. Instead break if we get any error from walk_down_proc or do_walk_down, and handle the case of ret == 1 by returning 0, otherwise return the ret value that we have. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>