summaryrefslogtreecommitdiff
path: root/fs/btrfs/compression.c
AgeCommit message (Collapse)AuthorFilesLines
2022-05-16btrfs: derive compression type from extent map during readsGoldwyn Rodrigues1-5/+5
Derive the compression type from extent map as opposed to the bio flags passed. This makes it more precise and not reliant on function parameters. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-16btrfs: do not return errors from btrfs_submit_compressed_readChristoph Hellwig1-6/+5
btrfs_submit_compressed_read already calls ->bi_end_io on error and the caller must ignore the return value, so remove it. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-16btrfs: do not pass compressed_bio to submit_compressed_bio()Goldwyn Rodrigues1-3/+2
Parameter struct compressed_bio is not used by the function submit_compressed_bio(). Remove it. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-16btrfs: factor out allocating an array of pagesSweet Tea Dorminy1-21/+15
Several functions currently populate an array of page pointers one allocated page at a time. Factor out the common code so as to allow improvements to all of the sites at once. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06btrfs: fix btrfs_submit_compressed_write cgroup attributionDennis Zhou1-0/+8
This restores the logic from commit 46bcff2bfc5e ("btrfs: fix compressed write bio blkcg attribution") which added cgroup attribution to btrfs writeback. It also adds back the REQ_CGROUP_PUNT flag for these ios. Fixes: 91507240482e ("btrfs: determine stripe boundary at bio allocation time in btrfs_submit_compressed_write") CC: stable@vger.kernel.org # 5.16+ Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14btrfs: do not double complete bio on errors during compressed readsJosef Bacik1-6/+14
I hit some weird panics while fixing up the error handling from btrfs_lookup_bio_sums(). Turns out the compression path will complete the bio we use if we set up any of the compression bios and then return an error, and then btrfs_submit_data_bio() will also call bio_endio() on the bio. Fix this by making btrfs_submit_compressed_read() responsible for calling bio_endio() on the bio if there are any errors. Currently it was only doing it if we created the compression bios, otherwise it was depending on btrfs_submit_data_bio() to do the right thing. This creates the above problem, so fix up btrfs_submit_compressed_read() to always call bio_endio() in case of an error, and then simply return from btrfs_submit_data_bio() if we had to call btrfs_submit_compressed_read(). Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14btrfs: track compressed bio errors as blk_status_tJosef Bacik1-11/+13
Right now we just have a binary "errors" flag, so any error we get on the compressed bio's gets translated to EIO. This isn't necessarily a bad thing, but if we get an ENOMEM it may be nice to know that's what happened instead of an EIO. Track our errors as a blk_status_t, and do the appropriate setting of the errors accordingly. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14btrfs: remove the bio argument from finish_compressed_bio_readJosef Bacik1-5/+3
This bio is usually one of the compressed bio's, and we don't actually need it in this function, so remove the argument and stop passing it around. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14btrfs: check correct bio in finish_compressed_bio_readJosef Bacik1-1/+1
Commit c09abff87f90 ("btrfs: cloned bios must not be iterated by bio_for_each_segment_all") added ASSERT()'s to make sure we weren't calling bio_for_each_segment_all() on a RAID5/6 bio. However it was checking the bio that the compression code passed in, not the cb->orig_bio that we actually iterate over, so adjust this ASSERT() to check the correct bio. Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14btrfs: add BTRFS_IOC_ENCODED_WRITEOmar Sandoval1-2/+5
The implementation resembles direct I/O: we have to flush any ordered extents, invalidate the page cache, and do the io tree/delalloc/extent map/ordered extent dance. From there, we can reuse the compression code with a minor modification to distinguish the write from writeback. This also creates inline extents when possible. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio()Omar Sandoval1-1/+1
btrfs_csum_one_bio() loops over each filesystem block in the bio while keeping a cursor of its current logical position in the file in order to look up the ordered extent to add the checksums to. However, this doesn't make much sense for compressed extents, as a sector on disk does not correspond to a sector of decompressed file data. It happens to work because: 1) the compressed bio always covers one ordered extent 2) the size of the bio is always less than the size of the ordered extent However, the second point will not always be true for encoded writes. Let's add a boolean parameter to btrfs_csum_one_bio() to indicate that it can assume that the bio only covers one ordered extent. Since we're already changing the signature, let's get rid of the contig parameter and make it implied by the offset parameter, similar to the change we recently made to btrfs_lookup_bio_sums(). Additionally, let's rename nr_sectors to blockcount to make it clear that it's the number of filesystem blocks, not the number of 512-byte sectors. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-07btrfs: remove unnecessary parameter type from compression_decompress_bioSu Yue1-4/+4
btrfs_decompress_bio, the only caller of compression_decompress_bio gets type from @cb and passes it to compression_decompress_bio. However, compression_decompress_bio can get compression type directly from @cb. So remove the parameter and access it through @cb. No functional change. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Su Yue <l@damenly.su> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-03btrfs: set BTRFS_FS_STATE_NO_CSUMS if we fail to load the csum rootJosef Bacik1-1/+2
We have a few places where we skip doing csums if we mounted with one of the rescue options that ignores bad csum roots. In the future when there are multiple csum roots it'll be costly to check and see if there are any missing csum roots, so simply add a flag to indicate the fs should skip loading csums in case of errors. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-11-01Merge tag 'for-5.16-tag' of ↵Linus Torvalds1-277/+404
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "The updates this time are more under the hood and enhancing existing features (subpage with compression and zoned namespaces). Performance related: - misc small inode logging improvements (+3% throughput, -11% latency on sample dbench workload) - more efficient directory logging: bulk item insertion, less tree searches and locking - speed up bulk insertion of items into a b-tree, which is used when logging directories, when running delayed items for directories (fsync and transaction commits) and when running the slow path (full sync) of an fsync (bulk creation run time -4%, deletion -12%) Core: - continued subpage support - make defragmentation work - make compression write work - zoned mode - support ZNS (zoned namespaces), zone capacity is number of usable blocks in each zone - add dedicated block group (zoned) for relocation, to prevent out of order writes in some cases - greedy block group reclaim, pick the ones with least usable space first - preparatory work for send protocol updates - error handling improvements - cleanups and refactoring Fixes: - lockdep warnings - in show_devname callback, on seeding device - device delete on loop device due to conversions to workqueues - fix deadlock between chunk allocation and chunk btree modifications - fix tracking of missing device count and status" * tag 'for-5.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (140 commits) btrfs: remove root argument from check_item_in_log() btrfs: remove root argument from add_link() btrfs: remove root argument from btrfs_unlink_inode() btrfs: remove root argument from drop_one_dir_item() btrfs: clear MISSING device status bit in btrfs_close_one_device btrfs: call btrfs_check_rw_degradable only if there is a missing device btrfs: send: prepare for v2 protocol btrfs: fix comment about sector sizes supported in 64K systems btrfs: update device path inode time instead of bd_inode fs: export an inode_update_time helper btrfs: fix deadlock when defragging transparent huge pages btrfs: sysfs: convert scnprintf and snprintf to sysfs_emit btrfs: make btrfs_super_block size match BTRFS_SUPER_INFO_SIZE btrfs: update comments for chunk allocation -ENOSPC cases btrfs: fix deadlock between chunk allocation and chunk btree modifications btrfs: zoned: use greedy gc for auto reclaim btrfs: check-integrity: stop storing the block device name in btrfsic_dev_state btrfs: use btrfs_get_dev_args_from_path in dev removal ioctls btrfs: add a btrfs_get_dev_args_from_path helper btrfs: handle device lookup with btrfs_dev_lookup_args ...
2021-11-01Merge tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+1
Pull block updates from Jens Axboe: - mq-deadline accounting improvements (Bart) - blk-wbt timer fix (Andrea) - Untangle the block layer includes (Christoph) - Rework the poll support to be bio based, which will enable adding support for polling for bio based drivers (Christoph) - Block layer core support for multi-actuator drives (Damien) - blk-crypto improvements (Eric) - Batched tag allocation support (me) - Request completion batching support (me) - Plugging improvements (me) - Shared tag set improvements (John) - Concurrent queue quiesce support (Ming) - Cache bdev in ->private_data for block devices (Pavel) - bdev dio improvements (Pavel) - Block device invalidation and block size improvements (Xie) - Various cleanups, fixes, and improvements (Christoph, Jackie, Masahira, Tejun, Yu, Pavel, Zheng, me) * tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block: (174 commits) blk-mq-debugfs: Show active requests per queue for shared tags block: improve readability of blk_mq_end_request_batch() virtio-blk: Use blk_validate_block_size() to validate block size loop: Use blk_validate_block_size() to validate block size nbd: Use blk_validate_block_size() to validate block size block: Add a helper to validate the block size block: re-flow blk_mq_rq_ctx_init() block: prefetch request to be initialized block: pass in blk_mq_tags to blk_mq_rq_ctx_init() block: add rq_flags to struct blk_mq_alloc_data block: add async version of bio_set_polled block: kill DIO_MULTI_BIO block: kill unused polling bits in __blkdev_direct_IO() block: avoid extra iter advance with async iocb block: Add independent access ranges support blk-mq: don't issue request directly in case that current is to be blocked sbitmap: silence data race warning blk-cgroup: synchronize blkg creation against policy deactivation block: refactor bio_iov_bvec_set() block: add single bio async direct IO helper ...
2021-10-27Revert "btrfs: compression: drop kmap/kunmap from generic helpers"David Sterba1-1/+2
This reverts commit 4c2bf276b56d8d27ddbafcdf056ef3fc60ae50b0. The kmaps in compression code are still needed and cause crashes on 32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004 with enabled LZO or ZSTD compression. Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839 Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: subpage: make end_compressed_bio_writeback() compatibleQu Wenruo1-1/+3
In end_compressed_writeback() we just clear the full page writeback. For subpage case, if there are two delalloc ranges in the same page, the 2nd range will trigger a BUG_ON() as the page writeback is already cleared by previous range. Fix it by using btrfs_page_clamp_clear_writeback() helper. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: subpage: make btrfs_submit_compressed_write() compatibleQu Wenruo1-1/+2
There is a WARN_ON() checking if @start is aligned to PAGE_SIZE, not sectorsize, which will cause false alert for subpage. Fix it to check against sectorsize. Furthermore: - Use ASSERT() to do the check So that in the future we may skip the check for production build - Also check alignment for @len Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: determine stripe boundary at bio allocation time in ↵Qu Wenruo1-82/+59
btrfs_submit_compressed_write Currently btrfs_submit_compressed_write() will check btrfs_bio_fits_in_stripe() each time a new page is going to be added. Even if compressed extent is small, we don't really need to do that for every page. Align the behavior to extent_io.c, by determining the stripe boundary when allocating a bio. Unlike extent_io.c, in compressed.c we don't need to bother things like different bio flags, thus no need to re-use bio_ctrl. Here we just manually introduce new local variable, next_stripe_start, and use that value returned from alloc_compressed_bio() to calculate the stripe boundary. Then each time we add some page range into the bio, we check if we reached the boundary. And if reached, submit it. Also, since we have @cur_disk_bytenr to determine whether we're the last bio, we don't need a explicit last_bio: tag for error handling any more. And since we use @cur_disk_bytenr to wait, there is no need for pending_bios, also remove it to save some memory of compressed_bio. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: determine stripe boundary at bio allocation time in ↵Qu Wenruo1-71/+97
btrfs_submit_compressed_read Currently btrfs_submit_compressed_read() will check btrfs_bio_fits_in_stripe() each time a new page is going to be added. Even if compressed extent is small, we don't really need to do that for every page. This patch will align the behavior to extent_io.c, by determining the stripe boundary when allocating a bio. Unlike extent_io.c, in compressed.c we don't need to bother things like different bio flags, thus no need to re-use bio_ctrl. Here we just manually introduce new local variable, next_stripe_start, and teach alloc_compressed_bio() to calculate the stripe boundary. Then each time we add some page range into the bio, we check if we reached the boundary. And if reached, submit it. Also, since we have @cur_disk_byte to determine whether we're the last bio, we don't need a explicit last_bio: tag for error handling any more. And we can use @cur_disk_byte to track which range has been added to bio, we can also use @cur_disk_byte to calculate the wait condition, no need for @pending_bios. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: introduce alloc_compressed_bio() for compressionQu Wenruo1-32/+58
Just aggregate the bio allocation code into one helper, so that we can replace 4 call sites. There is one special note for zoned write. Currently btrfs_submit_compressed_write() will only allocate the first bio using ZONE_APPEND. If we have to submit current bio due to stripe boundary, the new bio allocated will not use ZONE_APPEND. In theory this should be a bug, but considering zoned mode currently only support SINGLE profile, which doesn't have any stripe boundary limit, it should never be a problem and we have assertions in place. This function will provide a good entrance for any work which needs to be done at bio allocation time. Like determining the stripe boundary. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: introduce submit_compressed_bio() for compressionQu Wenruo1-26/+19
The new helper, submit_compressed_bio(), will aggregate the following work: - Increase compressed_bio::pending_bios - Remap the endio function - Map and submit the bio This slightly reorders calls to btrfs_csum_one_bio or btrfs_lookup_bio_sums but but none of them does anything regarding IO submission so this is effectively no change. We mainly care about order of - atomic_inc - btrfs_bio_wq_end_io - btrfs_map_bio Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: handle errors properly inside btrfs_submit_compressed_write()Qu Wenruo1-36/+62
Just like btrfs_submit_compressed_read(), there are quite some BUG_ON()s inside btrfs_submit_compressed_write() for the bio submission path. Fix them using the same method: - For last bio, just endio the bio As in that case, one of the endio function of all these submitted bio will be able to free the compressed_bio - For half-submitted bio, wait and finish the compressed_bio manually In this case, as long as all other bio finish, we're the only one referring the compressed bio, and can manually finish it. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: handle errors properly inside btrfs_submit_compressed_read()Qu Wenruo1-50/+83
There are quite some BUG_ON()s inside btrfs_submit_compressed_read(), namely all errors inside the for() loop relies on BUG_ON() to handle -ENOMEM. Handle these errors properly by: - Wait for submitted bios to finish first Using wake_var_event() APIs to wait without introducing extra memory overhead inside compressed_bio. This allows us to wait for any submitted bio to finish, while still keeps the compressed_bio from being freed. - Introduce finish_compressed_bio_read() to finish the compressed_bio - Properly end the bio and finish compressed_bio when error happens Now in btrfs_submit_compressed_read() even when the bio submission failed, we can properly handle the error without triggering BUG_ON(). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: subpage: add bitmap for PageChecked flagQu Wenruo1-2/+8
Although in btrfs we have very limited usage of PageChecked flag, it's still some page flag not yet subpage compatible. Fix it by introducing btrfs_subpage::checked_offset to do the convert. For most call sites, especially for free-space cache, COW fixup and btrfs_invalidatepage(), they all work in full page mode anyway. For other call sites, they work as subpage compatible mode. Some call sites need extra modification: - btrfs_drop_pages() Needs extra parameter to get the real range we need to clear checked flag. Also since btrfs_drop_pages() will accept pages beyond the dirtied range, update btrfs_subpage_clamp_range() to handle such case by setting @len to 0 if the page is beyond target range. - btrfs_invalidatepage() We need to call subpage helper before calling __btrfs_releasepage(), or it will trigger ASSERT() as page->private will be cleared. - btrfs_verify_data_csum() In theory we don't need the io_bio->csum check anymore, but it's won't hurt. Just change the comment. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: introduce compressed_bio::pending_sectors to trace compressed bioQu Wenruo1-33/+42
For btrfs_submit_compressed_read() and btrfs_submit_compressed_write(), we have a pretty weird dance around compressed_bio::pending_bios: btrfs_submit_compressed_read/write() { cb = kmalloc() refcount_set(&cb->pending_bios, 0); bio = btrfs_alloc_bio(); /* NOTE here, we haven't yet submitted any bio */ refcount_set(&cb->pending_bios, 1); for (pg_index = 0; pg_index < cb->nr_pages; pg_index++) { if (submit) { /* Here we submit bio, but we always have one * extra pending_bios */ refcount_inc(&cb->pending_bios); ret = btrfs_map_bio(); } } /* Submit the last bio */ ret = btrfs_map_bio(); } There are two reasons why we do this: - compressed_bio::pending_bios is a refcount Thus if it's reduced to 0, it can not be increased again. - To ensure the compressed_bio is not freed by some submitted bios If the submitted bio is finished before the next bio submitted, we can free the compressed_bio completely. But the above code is sometimes confusing, and we can do it better by introducing a new member, compressed_bio::pending_sectors. Now we use compressed_bio::pending_sectors to indicate whether we have any pending sectors under IO or not yet submitted. If pending_sectors == 0, we're definitely the last bio of compressed_bio, and is OK to release the compressed bio. Now the workflow looks like this: btrfs_submit_compressed_read/write() { cb = kmalloc() atomic_set(&cb->pending_bios, 0); refcount_set(&cb->pending_sectors, compressed_len >> sectorsize_bits); bio = btrfs_alloc_bio(); for (pg_index = 0; pg_index < cb->nr_pages; pg_index++) { if (submit) { refcount_inc(&cb->pending_bios); ret = btrfs_map_bio(); } } /* Submit the last bio */ refcount_inc(&cb->pending_bios); ret = btrfs_map_bio(); } For now we still need pending_bios for later error handling, but will remove pending_bios eventually after properly handling the errors. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: subpage: make add_ra_bio_pages() compatibleQu Wenruo1-32/+58
[BUG] If we remove the subpage limitation in add_ra_bio_pages(), then read a compressed extent which has part of its range in next page, like the following inode layout: 0 32K 64K 96K 128K |<--------------|-------------->| Btrfs will trigger ASSERT() in endio function: assertion failed: atomic_read(&subpage->readers) >= nbits ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.h:3431! Internal error: Oops - BUG: 0 [#1] SMP Workqueue: btrfs-endio btrfs_work_helper [btrfs] Call trace: assertfail.constprop.0+0x28/0x2c [btrfs] btrfs_subpage_end_reader+0x148/0x14c [btrfs] end_page_read+0x8c/0x100 [btrfs] end_bio_extent_readpage+0x320/0x6b0 [btrfs] bio_endio+0x15c/0x1dc end_workqueue_fn+0x44/0x64 [btrfs] btrfs_work_helper+0x74/0x250 [btrfs] process_one_work+0x1d4/0x47c worker_thread+0x180/0x400 kthread+0x11c/0x120 ret_from_fork+0x10/0x30 ---[ end trace c8b7b552d3bb408c ]--- [CAUSE] When we read the page range [0, 64K), we find it's a compressed extent, and we will try to add extra pages in add_ra_bio_pages() to avoid reading the same compressed extent. But when we add such page into the read bio, it doesn't follow the behavior of btrfs_do_readpage() to properly set subpage::readers. This means, for page [64K, 128K), its subpage::readers is still 0. And when endio is executed on both pages, since page [64K, 128K) has 0 subpage::readers, it triggers above ASSERT() [FIX] Function add_ra_bio_pages() is far from subpage compatible, it always assume PAGE_SIZE == sectorsize, thus when it skip to next range it always just skip PAGE_SIZE. Make it subpage compatible by: - Skip to next page properly when needed If we find there is already a page cache, we need to skip to next page. For that case, we shouldn't just skip PAGE_SIZE bytes, but use @pg_index to calculate the next bytenr and continue. - Only add the page range covered by current extent map We need to calculate which range is covered by current extent map and only add that part into the read bio. - Update subpage::readers before submitting the bio - Use proper cursor other than confusing @last_offset - Calculate the missed threshold based on sector size It's no longer using missed pages, as for 64K page size, we have at most 3 pages to skip. (If aligned only 2 pages) - Add ASSERT() to make sure our bytenr is always aligned - Add comment for the function Add a special note for subpage case, as the function won't really work well for subpage cases. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: remove unused parameter nr_pages in add_ra_bio_pages()Qu Wenruo1-2/+0
Variable @nr_pages only gets increased but never used. Remove it. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: rename struct btrfs_io_bio to btrfs_bioQu Wenruo1-8/+8
Previously we had "struct btrfs_bio", which records IO context for mirrored IO and RAID56, and "strcut btrfs_io_bio", which records extra btrfs specific info for logical bytenr bio. With "btrfs_bio" renamed to "btrfs_io_context", we are safe to rename "btrfs_io_bio" to "btrfs_bio" which is a more suitable name now. The struct btrfs_bio changes meaning by this commit. There was a suggested name like btrfs_logical_bio but it's a bit long and we'd prefer to use a shorter name. This could be a concern for backports to older kernels where the different meaning could possibly cause confusion or bugs. Comparing the new and old structures, there's no overlap among the struct members so a build would break in case of incorrect backport. We haven't had many backports to bio code anyway so this is more of a theoretical cause of bugs and a matter of precaution but we'll need to keep the semantic change in mind. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: remove btrfs_bio_alloc() helperQu Wenruo1-4/+8
The helper btrfs_bio_alloc() is almost the same as btrfs_io_bio_alloc(), except it's allocating using BIO_MAX_VECS as @nr_iovecs, and initializes bio->bi_iter.bi_sector. However the naming itself is not using "btrfs_io_bio" to indicate its parameter is "strcut btrfs_io_bio" and can be easily confused with "struct btrfs_bio". Considering assigned bio->bi_iter.bi_sector is such a simple work and there are already tons of call sites doing that manually, there is no need to do that in a helper. Remove btrfs_bio_alloc() helper, and enhance btrfs_io_bio_alloc() function to provide a fail-safe value for its @nr_iovecs. And then replace all btrfs_bio_alloc() callers with btrfs_io_bio_alloc(). Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-18mm: don't include <linux/blk-cgroup.h> in <linux/backing-dev.h>Christoph Hellwig1-0/+1
There is no need to pull blk-cgroup.h and thus blkdev.h in here, so break the include chain. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-23btrfs: rework btrfs_decompress_buf2page()Qu Wenruo1-79/+65
There are several bugs inside the function btrfs_decompress_buf2page() - @start_byte doesn't take bvec.bv_offset into consideration Thus it can't handle case where the target range is not page aligned. - Too many helper variables There are tons of helper variables, @buf_offset, @current_buf_start, @start_byte, @prev_start_byte, @working_bytes, @bytes. This hurts anyone who wants to read the function. - No obvious main cursor for the iteartion A new problem caused by previous problem. - Comments for parameter list makes no sense Like @buf_start is the offset to @buf, or offset inside the full decompressed extent? (Spoiler alert, the later case) And @total_out acts more like @buf_start + @size_of_buf. The worst is @disk_start. The real meaning of it is the file offset of the full decompressed extent. This patch will rework the whole function by: - Add a proper comment with ASCII art to explain the parameter list - Rework parameter list The old @buf_start is renamed to @decompressed, to show how many bytes are already decompressed inside the full decompressed extent. The old @total_out is replaced by @buf_len, which is the decompressed data size. For old @disk_start and @bio, just pass @compressed_bio in. - Use single main cursor The main cursor will be @cur_file_offset, to show what's the current file offset. Other helper variables will be declared inside the main loop, and only minimal amount of helper variables: * offset_inside_decompressed_buf: The only real helper * copy_start_file_offset: File offset we start memcpy * bvec_file_offset: File offset of current bvec Even with all these extensive comments, the final function is still smaller than the original function, which is definitely a win. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23btrfs: grab correct extent map for subpage compressed extent readQu Wenruo1-3/+6
[BUG] When subpage compressed read write support is enabled, btrfs/038 always fails with EIO. A simplified script can easily trigger the problem: mkfs.btrfs -f -s 4k $dev mount $dev $mnt -o compress=lzo xfs_io -f -c "truncate 118811" $mnt/foo xfs_io -c "pwrite -S 0x0d -b 39987 92267 39987" $mnt/foo > /dev/null sync btrfs subvolume snapshot -r $mnt $mnt/mysnap1 xfs_io -c "pwrite -S 0x3e -b 80000 200000 80000" $mnt/foo > /dev/null sync xfs_io -c "pwrite -S 0xdc -b 10000 250000 10000" $mnt/foo > /dev/null xfs_io -c "pwrite -S 0xff -b 10000 300000 10000" $mnt/foo > /dev/null sync btrfs subvolume snapshot -r $mnt $mnt/mysnap2 cat $mnt/mysnap2/foo # Above cat will fail due to EIO [CAUSE] The problem is in btrfs_submit_compressed_read(). When it tries to grab the extent map of the read range, it uses the following call: em = lookup_extent_mapping(em_tree, page_offset(bio_first_page_all(bio)), fs_info->sectorsize); The problem is in the page_offset(bio_first_page_all(bio)) part. The offending inode has the following file extent layout item 10 key (257 EXTENT_DATA 131072) itemoff 15639 itemsize 53 generation 8 type 1 (regular) extent data disk byte 13680640 nr 4096 extent data offset 0 nr 4096 ram 4096 extent compression 0 (none) item 11 key (257 EXTENT_DATA 135168) itemoff 15586 itemsize 53 generation 8 type 1 (regular) extent data disk byte 0 nr 0 item 12 key (257 EXTENT_DATA 196608) itemoff 15533 itemsize 53 generation 8 type 1 (regular) extent data disk byte 13676544 nr 4096 extent data offset 0 nr 53248 ram 86016 extent compression 2 (lzo) And the bio passed in has the following parameters: page_offset(bio_first_page_all(bio)) = 131072 bio_first_bvec_all(bio)->bv_offset = 65536 If we use page_offset(bio_first_page_all(bio) without adding bv_offset, we will get an extent map for file offset 131072, not 196608. This means we read uncompressed data from disk, and later decompression will definitely fail. [FIX] Take bv_offset into consideration when trying to grab an extent map. And add an ASSERT() to ensure we're really getting a compressed extent. Thankfully this won't affect anything but subpage, thus we only need to ensure this patch get merged before we enabled basic subpage support. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23btrfs: disable compressed readahead for subpageQu Wenruo1-0/+10
For current subpage support, we only support 64K page size with 4K sector size. This makes compressed readahead less effective, as maximum compressed extent size is only 128K, 2x the page size. On the other hand, the function add_ra_bio_pages() is still assuming sectorsize == PAGE_SIZE, and code change may affect 4K page size systems. So for now, let's disable subpage compressed readahead for now. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23btrfs: compression: drop kmap/kunmap from generic helpersDavid Sterba1-2/+1
The pages in compressed_pages are not from highmem anymore so we can drop the mapping for checksum calculation and inline extent. Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23btrfs: drop from __GFP_HIGHMEM all allocationsDavid Sterba1-2/+1
The highmem flag is used for allocating pages for compression and for raid56 pages. The high memory makes sense on 32bit systems but is not without problems. On 64bit system's it's just another layer of wrappers. The time the pages are allocated for compression or raid56 is relatively short (about a transaction commit), so the pages are not blocked indefinitely. As the number of pages depends on the amount of data being written/read, there's a theoretical problem. A fast device on a 32bit system could use most of the low memory pool, while with the highmem allocation that would not happen. This was possibly the original idea long time ago, but nowadays we optimize for 64bit systems. This patch removes all usage of the __GFP_HIGHMEM flag for page allocation, the kmap/kunmap are still in place and will be removed in followup patches. Remaining is masking out the bit in alloc_extent_state and __lookup_free_space_inode, that can safely stay. Signed-off-by: David Sterba <dsterba@suse.com>
2021-07-28btrfs: mark compressed range uptodate only if all bio succeedGoldwyn Rodrigues1-1/+1
In compression write endio sequence, the range which the compressed_bio writes is marked as uptodate if the last bio of the compressed (sub)bios is completed successfully. There could be previous bio which may have failed which is recorded in cb->errors. Set the writeback range as uptodate only if cb->errors is zero, as opposed to checking only the last bio's status. Backporting notes: in all versions up to 4.4 the last argument is always replaced by "!cb->errors". CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-22btrfs: remove a stale comment for btrfs_decompress_bio()Qu Wenruo1-14/+0
Since commit 8140dc30a432 ("btrfs: btrfs_decompress_bio() could accept compressed_bio instead"), btrfs_decompress_bio() accepts "struct compressed_bio" other than open-coded parameter list. Thus the comments for the parameter list is no longer needed. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-21btrfs: pass btrfs_inode to btrfs_writepage_endio_finish_ordered()Qu Wenruo1-3/+1
There is a pretty bad abuse of btrfs_writepage_endio_finish_ordered() in end_compressed_bio_write(). It passes compressed pages to btrfs_writepage_endio_finish_ordered(), which is only supposed to accept inode pages. Thankfully the important info here is the inode, so let's pass btrfs_inode directly into btrfs_writepage_endio_finish_ordered(), and make @page parameter optional. By this, end_compressed_bio_write() can happily pass page=NULL while still getting everything done properly. Also, to cooperate with such modification, replace @page parameter for trace_btrfs_writepage_end_io_hook() with btrfs_inode. Although this removes page_index info, the existing start/len should be enough for most usage. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-21btrfs: fix comment about max_out in btrfs_compress_pagesAnand Jain1-3/+0
Commit e5d74902362f ("btrfs: derive maximum output size in the compression implementation") removed @max_out argument in btrfs_compress_pages() but its comment remained, remove it. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-21btrfs: optimize variables size in btrfs_submit_compressed_writeAnand Jain1-3/+3
Patch "btrfs: reduce compressed_bio member's types" reduced some member's size. Function arguments @len, @compressed_len and @nr_pages can be declared as unsigned int. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-21btrfs: optimize variables size in btrfs_submit_compressed_readAnand Jain1-3/+3
Patch "btrfs: reduce compressed_bio member's types" reduced some member's size. Declare the variables @compressed_len, @nr_pages and @pg_index size as an unsigned int in the function btrfs_submit_compressed_read. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-21btrfs: reduce the variable size to fit nr_pagesAnand Jain1-3/+3
Patch "btrfs: reduce compressed_bio member's types" reduced the @nr_pages size to unsigned int, its cascading effects are updated here. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-21btrfs: reduce compressed_bio members' typesDavid Sterba1-1/+1
Several members of compressed_bio are of type that's unnecessarily big for the values that they'd hold: - the size of the uncompressed and compressed data is 128K now, we can keep is as int - same for number of pages - the compress type fits to a byte - the errors is 0/1 The size of the unpatched structure is 80 bytes with several holes. Reordering nr_pages next to the pages the hole after pending_bios is filled and the resulting size is 56 bytes. This keeps the csums array aligned to 8 bytes, which is nice. Further size optimizations may be possible but right now it looks good to me: struct compressed_bio { refcount_t pending_bios; /* 0 4 */ unsigned int nr_pages; /* 4 4 */ struct page * * compressed_pages; /* 8 8 */ struct inode * inode; /* 16 8 */ u64 start; /* 24 8 */ unsigned int len; /* 32 4 */ unsigned int compressed_len; /* 36 4 */ u8 compress_type; /* 40 1 */ u8 errors; /* 41 1 */ /* XXX 2 bytes hole, try to pack */ int mirror_num; /* 44 4 */ struct bio * orig_bio; /* 48 8 */ u8 sums[]; /* 56 0 */ /* size: 56, cachelines: 1, members: 12 */ /* sum members: 54, holes: 1, sum holes: 2 */ /* last cacheline: 56 bytes */ }; Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-21btrfs: zoned: factor out zoned device lookupJohannes Thumshirn1-12/+4
To be able to construct a zone append bio we need to look up the btrfs_device. The code doing the chunk map lookup to get the device is present in btrfs_submit_compressed_write and submit_extent_page. Factor out the lookup calls into a helper and use it in the submission paths. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-03Merge tag 'for-5.13-rc4-tag' of ↵Linus Torvalds1-5/+12
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Error handling improvements, caught by error injection: - handle errors during checksum deletion - set error on mapping when ordered extent io cannot be finished - inode link count fixup in tree-log - missing return value checks for inode updates in tree-log - abort transaction in rename exchange if adding second reference fails Fixes: - fix fsync failure after writes to prealloc extents - fix deadlock when cloning inline extents and low on available space - fix compressed writes that cross stripe boundary" * tag 'for-5.13-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: MAINTAINERS: add btrfs IRC link btrfs: fix deadlock when cloning inline extents and low on available space btrfs: fix fsync failure and transaction abort after writes to prealloc extents btrfs: abort in rename_exchange if we fail to insert the second ref btrfs: check error value from btrfs_update_inode in tree log btrfs: fixup error handling in fixup_inode_link_counts btrfs: mark ordered extent and inode with error if we fail to finish btrfs: return errors from btrfs_del_csums in cleanup_ref_head btrfs: fix error handling in btrfs_del_csums btrfs: fix compressed writes that cross stripe boundary
2021-05-28btrfs: fix compressed writes that cross stripe boundaryQu Wenruo1-5/+12
[BUG] When running btrfs/027 with "-o compress" mount option, it always crashes with the following call trace: BTRFS critical (device dm-4): mapping failed logical 298901504 bio len 12288 len 8192 ------------[ cut here ]------------ kernel BUG at fs/btrfs/volumes.c:6651! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 5 PID: 31089 Comm: kworker/u24:10 Tainted: G OE 5.13.0-rc2-custom+ #26 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Workqueue: btrfs-delalloc btrfs_work_helper [btrfs] RIP: 0010:btrfs_map_bio.cold+0x58/0x5a [btrfs] Call Trace: btrfs_submit_compressed_write+0x2d7/0x470 [btrfs] submit_compressed_extents+0x3b0/0x470 [btrfs] ? mark_held_locks+0x49/0x70 btrfs_work_helper+0x131/0x3e0 [btrfs] process_one_work+0x28f/0x5d0 worker_thread+0x55/0x3c0 ? process_one_work+0x5d0/0x5d0 kthread+0x141/0x160 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x22/0x30 ---[ end trace 63113a3a91f34e68 ]--- [CAUSE] The critical message before the crash means we have a bio at logical bytenr 298901504 length 12288, but only 8192 bytes can fit into one stripe, the remaining 4096 bytes go to another stripe. In btrfs, all bios are properly split to avoid cross stripe boundary, but commit 764c7c9a464b ("btrfs: zoned: fix parallel compressed writes") changed the behavior for compressed writes. Previously if we find our new page can't be fitted into current stripe, ie. "submit == 1" case, we submit current bio without adding current page. submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio, 0); page->mapping = NULL; if (submit || bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) { But after the modification, we will add the page no matter if it crosses stripe boundary, leading to the above crash. submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio, 0); if (pg_index == 0 && use_append) len = bio_add_zone_append_page(bio, page, PAGE_SIZE, 0); else len = bio_add_page(bio, page, PAGE_SIZE, 0); page->mapping = NULL; if (submit || len < PAGE_SIZE) { [FIX] It's no longer possible to revert to the original code style as we have two different bio_add_*_page() calls now. The new fix is to skip the bio_add_*_page() call if @submit is true. Also to avoid @len to be uninitialized, always initialize it to zero. If @submit is true, @len will not be checked. If @submit is not true, @len will be the return value of bio_add_*_page() call. Either way, the behavior is still the same as the old code. Reported-by: Josef Bacik <josef@toxicpanda.com> Fixes: 764c7c9a464b ("btrfs: zoned: fix parallel compressed writes") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-22Merge tag 'for-5.13-rc2-tag' of ↵Linus Torvalds1-4/+38
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes: - fix unaligned compressed writes in zoned mode - fix false positive lockdep warning when cloning inline extent - remove wrong BUG_ON in tree-log error handling" * tag 'for-5.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: fix parallel compressed writes btrfs: zoned: pass start block to btrfs_use_zone_append btrfs: do not BUG_ON in link_to_fixup_dir btrfs: release path before starting transaction when cloning inline extent
2021-05-20btrfs: zoned: fix parallel compressed writesJohannes Thumshirn1-4/+38
When multiple processes write data to the same block group on a compressed zoned filesystem, the underlying device could report I/O errors and data corruption is possible. This happens because on a zoned file system, compressed data writes where sent to the device via a REQ_OP_WRITE instead of a REQ_OP_ZONE_APPEND operation. But with REQ_OP_WRITE and parallel submission it cannot be guaranteed that the data is always submitted aligned to the underlying zone's write pointer. The change to using REQ_OP_ZONE_APPEND instead of REQ_OP_WRITE on a zoned filesystem is non intrusive on a regular file system or when submitting to a conventional zone on a zoned filesystem, as it is guarded by btrfs_use_zone_append. Reported-by: David Sterba <dsterba@suse.com> Fixes: 9d294a685fbc ("btrfs: zoned: enable to mount ZONED incompat flag") CC: stable@vger.kernel.org # 5.12.x: e380adfc213a13: btrfs: zoned: pass start block to btrfs_use_zone_append CC: stable@vger.kernel.org # 5.12.x Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-05btrfs: use memzero_page() instead of open coded kmap patternIra Weiny1-4/+1
There are many places where kmap/memset/kunmap patterns occur. Use the newly lifted memzero_page() to eliminate direct uses of kmap and leverage the new core functions use of kmap_local_page(). The development of this patch was aided by the following coccinelle script: // <smpl> // SPDX-License-Identifier: GPL-2.0-only // Find kmap/memset/kunmap pattern and replace with memset*page calls // // NOTE: Offsets and other expressions may be more complex than what the script // will automatically generate. Therefore a catchall rule is provided to find // the pattern which then must be evaluated by hand. // // Confidence: Low // Copyright: (C) 2021 Intel Corporation // URL: http://coccinelle.lip6.fr/ // Comments: // Options: // // Then the memset pattern // @ memset_rule1 @ expression page, V, L, Off; identifier ptr; type VP; @@ ( -VP ptr = kmap(page); | -ptr = kmap(page); | -VP ptr = kmap_atomic(page); | -ptr = kmap_atomic(page); ) <+... ( -memset(ptr, 0, L); +memzero_page(page, 0, L); | -memset(ptr + Off, 0, L); +memzero_page(page, Off, L); | -memset(ptr, V, L); +memset_page(page, V, 0, L); | -memset(ptr + Off, V, L); +memset_page(page, V, Off, L); ) ...+> ( -kunmap(page); | -kunmap_atomic(ptr); ) // Remove any pointers left unused @ depends on memset_rule1 @ identifier memset_rule1.ptr; type VP, VP1; @@ -VP ptr; ... when != ptr; ? VP1 ptr; // // Catch all // @ memset_rule2 @ expression page; identifier ptr; expression GenTo, GenSize, GenValue; type VP; @@ ( -VP ptr = kmap(page); | -ptr = kmap(page); | -VP ptr = kmap_atomic(page); | -ptr = kmap_atomic(page); ) <+... ( // // Some call sites have complex expressions within the memset/memcpy // The follow are catch alls which need to be evaluated by hand. // -memset(GenTo, 0, GenSize); +memzero_pageExtra(page, GenTo, GenSize); | -memset(GenTo, GenValue, GenSize); +memset_pageExtra(page, GenValue, GenTo, GenSize); ) ...+> ( -kunmap(page); | -kunmap_atomic(ptr); ) // Remove any pointers left unused @ depends on memset_rule2 @ identifier memset_rule2.ptr; type VP, VP1; @@ -VP ptr; ... when != ptr; ? VP1 ptr; // </smpl> Link: https://lkml.kernel.org/r/20210309212137.2610186-4-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: David Sterba <dsterba@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>