summaryrefslogtreecommitdiff
path: root/fs/xfs/scrub
AgeCommit message (Collapse)AuthorFilesLines
2024-01-30xfs: remove conditional building of rt geometry validator functionsDarrick J. Wong2-0/+2
I mistakenly turned off CONFIG_XFS_RT in the Kconfig file for arm64 variant of the djwong-wtf git branch. Unfortunately, it took me a good hour to figure out that RT wasn't built because this is what got printed to dmesg: XFS (sda2): realtime geometry sanity check failed XFS (sda2): Metadata corruption detected at xfs_sb_read_verify+0x170/0x190 [xfs], xfs_sb block 0x0 Whereas I would have expected: XFS (sda2): Not built with CONFIG_XFS_RT XFS (sda2): RT mount failed The root cause of these problems is the conditional compilation of the new functions xfs_validate_rtextents and xfs_compute_rextslog that I introduced in the two commits listed below. The !RT versions of these functions return false and 0, respectively, which causes primary superblock validation to fail, which explains the first message. Move the two functions to other parts of libxfs that are not conditionally defined by CONFIG_XFS_RT and remove the broken stubs so that validation works again. Fixes: e14293803f4e ("xfs: don't allow overly small or large realtime volumes") Fixes: a6a38f309afc ("xfs: make rextslog computation consistent with mkfs") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-29xfs: remove struct xfs_attr_shortformChristoph Hellwig2-7/+6
sparse complains about struct xfs_attr_shortform because it embeds a structure with a variable sized array in a variable sized array. Given that xfs_attr_shortform is not a very useful structure, and the dir2 equivalent has been removed a long time ago, remove it as well. Provide a xfs_attr_sf_firstentry helper that returns the first xfs_attr_sf_entry behind a xfs_attr_sf_hdr to replace the structure dereference. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-29xfs: make if_data a void pointerChristoph Hellwig3-12/+6
The xfs_ifork structure currently has a union of the if_root void pointer and the if_data char pointer. In either case it is an opaque pointer that depends on the fork format. Replace the union with a single if_data void pointer as that is what almost all callers want. Only the symlink NULL termination code in xfs_init_local_fork actually needs a new local variable now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs: remove rt-wrappers from xfs_format.hChristoph Hellwig1-1/+1
xfs_format.h has a bunch odd wrappers for helper functions and mount structure access using RT* prefixes. Replace them with their open coded versions (for those that weren't entirely unused) and remove the wrappers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-22xfs/health: cleanup, remove duplicated includingWang Jinchao1-2/+0
remove the second ones: \#include "xfs_trans_resv.h" \#include "xfs_mount.h" Signed-off-by: Wang Jinchao <wangjinchao@xfusion.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-15xfs: repair quotasDarrick J. Wong7-5/+618
Fix anything that causes the quota verifiers to fail. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: improve dquot iteration for scrubDarrick J. Wong5-9/+312
Upon a closer inspection of the quota record scrubber, I noticed that dqiterate wasn't actually walking all possible dquots for the mapped blocks in the quota file. This is due to xfs_qm_dqget_next skipping all XFS_IS_DQUOT_UNINITIALIZED dquots. For a fsck program, we really want to look at all the dquots, even if all counters and limits in the dquot record are zero. Rewrite the implementation to do this, as well as switching to an iterator paradigm to reduce the number of indirect calls. This enables removal of the old broken dqiterate code from xfs_dquot.c. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: check dquot resource timersDarrick J. Wong1-0/+21
For each dquot resource, ensure either (a) the resource usage is over the soft limit and there is a nonzero timer; or (b) usage is at or under the soft limit and the timer is unset. (a) is redundant with the dquot buffer verifier, but (b) isn't checked anywhere. Found by fuzzing xfs/426 and noticing that diskdq.btimer = add didn't trip any kind of warning for having a timer set even with no limits. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: check the ondisk space mapping behind a dquotDarrick J. Wong1-0/+58
Each xfs_dquot object caches the file offset and daddr of the ondisk block that backs the dquot. Make sure these cached values are the same as the bmapi data, and that the block state is written. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: online repair of realtime bitmapsDarrick J. Wong5-8/+241
Fix all the file metadata surrounding the realtime bitmap file, which includes the rt geometry, file size, forks, and space mappings. The bitmap contents themselves cannot be fixed without rt rmap, so that will come later. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: repair the inode core and forks of a metadata inodeDarrick J. Wong3-4/+168
Add a helper function to repair the core and forks of a metadata inode, so that we can get move onto the task of repairing higher level metadata that lives in an inode. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: always check the rtbitmap and rtsummary filesDarrick J. Wong1-2/+0
XFS filesystems always have a realtime bitmap and summary file, even if there has never been a realtime volume attached. Always check them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: check rt summary file geometry more thoroughlyDarrick J. Wong1-27/+110
I forgot that the xfs_mount tracks the size and number of levels in the realtime summary file, and that the rt summary file can have more blocks mapped to the data fork than m_rsumsize implies if growfsrt fails. So. Add to the rtsummary scrubber an explicit check that all the summary geometry values are correct, then adjust the rtsummary i_size checks to allow for the growfsrt failure case. Finally, flag post-eof blocks in the summary file. While we're at it, split the extent map checking so that we only call xfs_bmapi_read once per extent instead of once per rtsummary block. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: check rt bitmap file geometry more thoroughlyDarrick J. Wong1-15/+84
I forgot that the superblock tracks the number of blocks that are in the realtime bitmap, and that the rt bitmap file can have more blocks mapped to the data fork than sb_rbmblocks if growfsrt fails. So. Add to the rtbitmap scrubber an explicit check that sb_rextents and sb_rbmblocks are correct, then adjust the rtbitmap i_size checks to allow for the growfsrt failure case. Finally, flag post-eof blocks in the rtbitmap. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: repair problems in CoW forksDarrick J. Wong6-1/+770
Try to repair errors that we see in file CoW forks so that we don't do stupid things like remap garbage into a file. There's not a lot we can do with the COW fork -- the ondisk metadata record only that the COW staging extents are owned by the refcount btree, which effectively means that we can't reconstruct this incore structure from scratch. Actually, this is even worse -- we can't touch written extents, because those map space that are actively under writeback, and there's not much to do with delalloc reservations. Hence we can only detect crosslinked unwritten extents and fix them by punching out the problematic parts and replacing them with delalloc extents. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: refactor repair forcing tests into a repair.c helperDarrick J. Wong3-13/+25
There are a couple of conditions that userspace can set to force repairs of metadata. These really belong in the repair code and not open-coded into the check code, so refactor them into a helper. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: repair inode fork block mapping data structuresDarrick J. Wong7-4/+950
Use the reverse-mapping btree information to rebuild an inode block map. Update the btree bulk loading code as necessary to support inode rooted btrees and fix some bitrot problems. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: reintroduce reaping of file metadata blocks to xrep_reap_extentsDarrick J. Wong4-4/+160
Back in commit a55e07308831b ("xfs: only allow reaping of per-AG blocks in xrep_reap_extents"), we removed from the reaping code the ability to handle bmbt blocks. At the time, the reaping code only walked single blocks, didn't correctly detect crosslinked blocks, and the special casing made the function hard to understand. It was easier to remove unneeded functionality prior to fixing all the bugs. Now that we've fixed the problems, we want again the ability to reap file metadata blocks. Reintroduce the per-file reaping functionality atop the current implementation. We require that sc->sa is uninitialized, so that we can use it to hold all the per-AG context for a given extent. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: skip the rmapbt search on an empty attr fork unless we know it was zappedDarrick J. Wong1-22/+79
The attribute fork scrubber can optionally scan the reverse mapping records of the filesystem to determine if the fork is missing mappings that it should have. However, this is a very expensive operation, so we only want to do this if we suspect that the fork is missing records. For attribute forks the criteria for suspicion is that the attr fork is in EXTENTS format and has zero extents. However, there are several ways that a file can end up in this state through regular filesystem usage. For example, an LSM can set a s_security hook but then decide not to set an ACL; or an attr set can create the attr fork but then the actual set operation fails with ENOSPC; or we can delete all the attrs on a file whose data fork is in btree format, in which case we do not delete the attr fork. We don't want to run the expensive check for any case that can be arrived at through regular operations. However. When online inode repair decides to zap an attribute fork, it cannot determine if it is zapping ACL information. As a precaution it removes all the discretionary access control permissions and sets the user and group ids to zero. Check these three additional conditions to decide if we want to scan the rmap records. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: abort directory parent scrub scans if we encounter a zapped directoryDarrick J. Wong4-0/+46
In a previous patch, we added some code to perform sufficient repairs to an ondisk inode record such that the inode cache would be willing to load the inode. If the broken inode was a shortform directory, it will reset the directory to something plausible, which is to say an empty subdirectory of the root. The telltale signs that something is seriously wrong is the broken link count. Such directories look clean, but they shouldn't participate in a filesystem scan to find or confirm a directory parent pointer. Create a predicate that identifies such directories and abort the scrub. Found by fuzzing xfs/1554 with multithreaded xfs_scrub enabled and u3.bmx[0].startblock = zeroes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: zap broken inode forksDarrick J. Wong2-4/+751
Determine if inode fork damage is responsible for the inode being unable to pass the ifork verifiers in xfs_iget and zap the fork contents if this is true. Once this is done the fork will be empty but we'll be able to construct an in-core inode, and a subsequent call to the inode fork repair ioctl will search the rmapbt to rebuild the records that were in the fork. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: repair inode recordsDarrick J. Wong6-2/+1017
If an inode is so badly damaged that it cannot be loaded into the cache, fix the ondisk metadata and try again. If there /is/ a cached inode, fix any problems and apply any optimizations that can be solved incore. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: set inode sick state flags when we zap either ondisk forkDarrick J. Wong6-10/+101
In a few patches, we'll add some online repair code that tries to massage the ondisk inode record just enough to get it to pass the inode verifiers so that we can continue with more file repairs. Part of that massaging can include zapping the ondisk forks to clear errors. After that point, the bmap fork repair functions will rebuild the zapped forks. Christoph asked for stronger protections against online repair zapping a fork to get the inode to load vs. other threads trying to access the partially repaired file. Do this by adding a special "[DA]FORK_ZAPPED" inode health flag whenever repair zaps a fork, and sprinkling checks for that flag into the various file operations for things that don't like handling an unexpected zero-extents fork. In practice xfs_scrub will scrub and fix the forks almost immediately after zapping them, so the window is very small. However, if a crash or unmount should occur, we can still detect these zapped inode forks by looking for a zero-extents fork when data was expected. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: dont cast to char * for XFS_DFORK_*PTR macrosDarrick J. Wong1-1/+1
Code in the next patch will assign the return value of XFS_DFORK_*PTR macros to a struct pointer. gcc complains about casting char* strings to struct pointers, so let's fix the macro's cast to void* to shut up the warnings. While we're at it, fix one of the scrub tests that uses PTR to use BOFF instead for a simpler integer comparison, since other linters whine about char* and void* comparisons. Can't satisfy all these dman bots. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: add missing nrext64 inode flag check to scrubDarrick J. Wong1-0/+4
Add this missing check that the superblock nrext64 flag is set if the inode flag is set. Fixes: 9b7d16e34bbeb ("xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: try to attach dquots to files before repairing themDarrick J. Wong7-5/+55
Inode resource usage is tracked in the quota metadata. Repairing a file might change the resources used by that file, which means that we need to attach dquots to the file that we're examining before accessing anything in the file protected by the ILOCK. However, there's a twist: a dquot cache miss requires the dquot to be read in from the quota file, during which we drop the ILOCK on the file being examined. This means that we *must* try to attach the dquots before taking the ILOCK. Therefore, dquots must be attached to files in the scrub setup function. If doing so yields corruption errors (or unknown dquot errors), we instead clear the quotachecked status, which will cause a quotacheck on next mount. A future series will make this trigger live quotacheck. While we're here, change the xrep_ino_dqattach function to use the unlocked dqattach functions so that we avoid cycling the ILOCK if the inode already has dquots attached. This makes the naming and locking requirements consistent with the rest of the filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: disable online repair quota helpers when quota not enabledDarrick J. Wong2-0/+11
Don't compile the quota helper functions if quota isn't being built into the XFS module. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: repair refcount btreesDarrick J. Wong5-12/+810
Reconstruct the refcount data from the rmap btree. Link: https://docs.kernel.org/filesystems/xfs-online-fsck-design.html#case-study-rebuilding-the-space-reference-counts Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: repair inode btreesDarrick J. Wong8-37/+1001
Use the rmapbt to find inode chunks, query the chunks to compute hole and free masks, and with that information rebuild the inobt and finobt. Refer to the case study in Documentation/filesystems/xfs-online-fsck-design.rst for more details. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: repair free space btreesDarrick J. Wong11-13/+1173
Rebuild the free space btrees from the gaps in the rmap btree. Refer to the case study in Documentation/filesystems/xfs-online-fsck-design.rst for more details. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: remove trivial bnobt/inobt scrub helpersDarrick J. Wong4-46/+39
Christoph Hellwig complained about awkward code in the next two repair patches such as: sc->sm->sm_type = XFS_SCRUB_TYPE_BNOBT; error = xchk_bnobt(sc); This is a little silly, so let's export the xchk_{,i}allocbt functions to the dispatch table in scrub.c directly and get rid of the helpers. Originally I had planned each btree gets its own separate entry point, but since repair doesn't work that way, it no longer makes sense to complicate the call chain that way. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: roll the scrub transaction after completing a repairDarrick J. Wong1-3/+6
When we've finished repairing an AG header, roll the scrub transaction. This ensure that any failures caused by defer ops failing are captured by the xrep_done tracepoint and that any stacktraces that occur will point to the repair code that caused it, instead of xchk_teardown. Going forward, repair functions should commit the transaction if they're going to return success. Usually the space reaping functions that run after a successful atomic commit of the new metadata will take care of that for us. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: move the per-AG datatype bitmaps to separate filesDarrick J. Wong7-150/+174
Move struct xagb_bitmap to its own pair of C and header files per request of Christoph. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: create separate structures and code for u32 bitmapsDarrick J. Wong4-166/+458
Create a version of the xbitmap that handles 32-bit integer intervals and adapt the xfs_agblock_t bitmap to use it. This reduces the size of the interval tree nodes from 48 to 36 bytes and enables us to use a more efficient slab (:0000040 instead of :0000048) which allows us to pack more nodes into a single slab page (102 vs 85). As a side effect, the users of these bitmaps no longer have to convert between u32 and u64 quantities just to use the bitmap; and the hairy overflow checking code in xagb_bitmap_test goes away. Later in this patchset we're going to add bitmaps for xfs_agino_t, xfs_rgblock_t, and xfs_dablk_t, so the increase in code size (5622 vs. 9959 bytes) seems worth it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: constrain dirty buffers while formatting a staged btreeDarrick J. Wong1-0/+1
Constrain the number of dirty buffers that are locked by the btree staging code at any given time by establishing a threshold at which we put them all on the delwri queue and push them to disk. This limits memory consumption while writing out new btrees. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: add debug knobs to control btree bulk load slack factorsDarrick J. Wong1-3/+8
Add some debug knobs so that we can control the leaf and node block slack when rebuilding btrees. For developers, it might be useful to construct btrees of various heights by crafting a filesystem with a certain number of records and then using repair+knobs to rebuild the index with a certain shape. Practically speaking, you'd only ever do that for extreme stress testing of the runtime code or the btree generator. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: fix an off-by-one error in xreap_agextent_binvalDarrick J. Wong1-1/+1
Overall, this function tries to find and invalidate all buffers for a given extent of space on the data device. The inner for loop in this function tries to find all xfs_bufs for a given daddr. The lengths of all possible cached buffers range from 1 fsblock to the largest needed to contain a 64k xattr value (~17fsb). The scan is capped to avoid looking at anything buffer going past the given extent. Unfortunately, the loop continuation test is wrong -- max_fsbs is the largest size we want to scan, not one past that. Put another way, this loop is actually 1-indexed, not 0-indexed. Therefore, the continuation test should use <=, not <. As a result, online repairs of btree blocks fails to stale any buffers for btrees that are being torn down, which causes later assertions in the buffer cache when another thread creates a different-sized buffer. This happens in xfs/709 when allocating an inode cluster buffer: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 3346128 at fs/xfs/xfs_message.c:104 assfail+0x3a/0x40 [xfs] CPU: 0 PID: 3346128 Comm: fsstress Not tainted 6.7.0-rc4-djwx #rc4 RIP: 0010:assfail+0x3a/0x40 [xfs] Call Trace: <TASK> _xfs_buf_obj_cmp+0x4a/0x50 xfs_buf_get_map+0x191/0xba0 xfs_trans_get_buf_map+0x136/0x280 xfs_ialloc_inode_init+0x186/0x340 xfs_ialloc_ag_alloc+0x254/0x720 xfs_dialloc+0x21f/0x870 xfs_create_tmpfile+0x1a9/0x2f0 xfs_rename+0x369/0xfd0 xfs_vn_rename+0xfa/0x170 vfs_rename+0x5fb/0xc30 do_renameat2+0x52d/0x6e0 __x64_sys_renameat2+0x4b/0x60 do_syscall_64+0x3b/0xe0 entry_SYSCALL_64_after_hwframe+0x46/0x4e A later refactoring patch in the online repair series fixed this by accident, which is why I didn't notice this until I started testing only the patches that are likely to end up in 6.8. Fixes: 1c7ce115e521 ("xfs: reap large AG metadata extents when possible") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-12-07xfs: force small EFIs for reaping btree extentsDarrick J. Wong1-0/+5
Introduce the concept of a defer ops barrier to separate consecutively queued pending work items of the same type. With a barrier in place, the two work items will be tracked separately, and receive separate log intent items. The goal here is to prevent reaping of old metadata blocks from creating unnecessarily huge EFIs that could then run the risk of overflowing the scrub transaction. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-07xfs: log EFIs for all btree blocks being used to stage a btreeDarrick J. Wong2-8/+29
We need to log EFIs for every extent that we allocate for the purpose of staging a new btree so that if we fail then the blocks will be freed during log recovery. Use the autoreaping mechanism provided by the previous patch to attach paused freeing work to the scrub transaction. We can then mark the EFIs stale if we decide to commit the new btree, or we can unpause the EFIs if we decide to abort the repair. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-07xfs: implement block reservation accounting for btrees we're stagingDarrick J. Wong3-0/+594
Create a new xrep_newbt structure to encapsulate a fake root for creating a staged btree cursor as well as to track all the blocks that we need to reserve in order to build that btree. As for the particular choice of lowspace thresholds and btree block slack factors -- at this point one could say that the thresholds in online repair come from bulkload_estimate_ag_slack in xfs_repair[1]. But that's not the entire story, since the offline btree rebuilding code in xfs_repair was merged as a retroport of the online btree code in this patchset! Before xfs_btree_staging.[ch] came along, xfs_repair determined the slack factor (aka the number of slots to leave unfilled in each new btree block) via open-coded logic in repair/phase5.c[2]. At that point the slack factors were arbitrary quantities per btree. The rmapbt automatically left 10 slots free; everything else left zero. That had a noticeable effect on performance straight after mounting because adding records to /any/ btree would result in splits. A few years ago when this patch was first written, Dave and I decided that repair should generate btree blocks that were 75% full unless space was tight, in which case it should try to fill the blocks to nearly full. We defined tight as ~10% free to avoid repair failures but settled on 3/32 (~9%) to avoid div64. IOWs, we mostly pulled the thresholds out of thin air. We've been QAing with those geometry numbers ever since. ;) Link: https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/tree/repair/bulkload.c?h=v6.5.0#n114 Link: https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/tree/repair/phase5.c?h=v4.19.0#n1349 Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-12-07xfs: remove __xfs_free_extent_laterDarrick J. Wong1-1/+1
xfs_free_extent_later is a trivial helper, so remove it to reduce the amount of thinking required to understand the deferred freeing interface. This will make it easier to introduce automatic reaping of speculative allocations in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-07xfs: make xchk_iget safer in the presence of corrupt inode btreesDarrick J. Wong3-4/+31
When scrub is trying to iget an inode, ensure that it won't end up deadlocked on a cycle in the inode btree by using an empty transaction to store all the buffers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-19xfs: simplify rt bitmap/summary block accessor functionsDarrick J. Wong1-1/+1
Simplify the calling convention of these functions since the xfs_rtalloc_args structure contains the parameters we need. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-19xfs: simplify xfs_rtbuf_get calling conventionsDarrick J. Wong1-3/+2
Now that xfs_rtalloc_args holds references to the last-read bitmap and summary blocks, we don't need to pass the buffer pointer out of xfs_rtbuf_get. Callers no longer have to xfs_trans_brelse on their own, though they are required to call xfs_rtbuf_cache_relse before the xfs_rtalloc_args goes out of scope. While we're at it, create some trivial helpers so that we don't have to remember if "0" means "bitmap" and "1" means "summary". Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-19xfs: cache last bitmap block in realtime allocatorOmar Sandoval1-2/+2
Profiling a workload on a highly fragmented realtime device showed a ton of CPU cycles being spent in xfs_trans_read_buf() called by xfs_rtbuf_get(). Further tracing showed that much of that was repeated calls to xfs_rtbuf_get() for the same block of the realtime bitmap. These come from xfs_rtallocate_extent_block(): as it walks through ranges of free bits in the bitmap, each call to xfs_rtcheck_range() and xfs_rtfind_{forw,back}() gets the same bitmap block. If the bitmap block is very fragmented, then this is _a lot_ of buffer lookups. The realtime allocator already passes around a cache of the last used realtime summary block to avoid repeated reads (the parameters rbpp and rsb). We can do the same for the realtime bitmap. This replaces rbpp and rsb with a struct xfs_rtbuf_cache, which caches the most recently used block for both the realtime bitmap and summary. xfs_rtbuf_get() now handles the caching instead of the callers, which requires plumbing xfs_rtbuf_cache to more functions but also makes sure we don't miss anything. Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-19xfs: consolidate realtime allocation argumentsDave Chinner1-1/+5
Consolidate the arguments passed around the rt allocator into a struct xfs_rtalloc_arg similar to how the btree allocator arguments are consolidated in a struct xfs_alloc_arg.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-19xfs: use accessor functions for summary info wordsDarrick J. Wong3-15/+28
Create get and set functions for rtsummary words so that we can redefine the ondisk format with a specific endianness. Note that this requires the definition of a distinct type for ondisk summary info words so that the compiler can perform proper typechecking. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-18xfs: create helpers for rtbitmap block/wordcount computationsDarrick J. Wong1-4/+3
Create helper functions that compute the number of blocks or words necessary to store the rt bitmap. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-18xfs: convert rt summary macros to helpersDarrick J. Wong1-6/+8
Convert the realtime summary file macros to helper functions so that we can improve type checking. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-18xfs: convert the rtbitmap block and bit macros to static inline functionsDarrick J. Wong1-1/+1
Replace these macros with typechecked helper functions. Eventually we're going to add more logic to the helpers and it'll be easier if we don't have to macro it up. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>