summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2024-04-05Merge tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefsLinus Torvalds39-494/+1869
Pull bcachefs repair code from Kent Overstreet: "A couple more small fixes, and new repair code. We can now automatically recover from arbitrary corrupted interior btree nodes by scanning, and we can reconstruct metadata as needed to bring a filesystem back into a working, consistent, read-write state and preserve access to whatevver wasn't corrupted. Meaning - you can blow away all metadata except for extents and dirents leaf nodes, and repair will reconstruct everything else and give you your data, and under the correct paths. If inodes are missing i_size will be slightly off and permissions/ownership/timestamps will be gone, and we do still need the snapshots btree if snapshots were in use - in the future we'll be able to guess the snapshot tree structure in some situations. IOW - aside from shaking out remaining bugs (fuzz testing is still coming), repair code should be complete and if repair ever doesn't work that's the highest priority bug that I want to know about immediately. This patchset was kindly tested by a user from India who accidentally wiped one drive out of a three drive filesystem with no replication on the family computer - it took a couple weeks but we got everything important back" * tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs: bcachefs: reconstruct_inode() bcachefs: Subvolume reconstruction bcachefs: Check for extents that point to same space bcachefs: Reconstruct missing snapshot nodes bcachefs: Flag btrees with missing data bcachefs: Topology repair now uses nodes found by scanning to fill holes bcachefs: Repair pass for scanning for btree nodes bcachefs: Don't skip fake btree roots in fsck bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake() bcachefs: Etyzinger cleanups bcachefs: bch2_shoot_down_journal_keys() bcachefs: Clear recovery_passes_required as they complete without errors bcachefs: ratelimit informational fsck errors bcachefs: Check for bad needs_discard before doing discard bcachefs: Improve bch2_btree_update_to_text() mean_and_variance: Drop always failing tests bcachefs: fix nocow lock deadlock bcachefs: BCH_WATERMARK_interior_updates bcachefs: Fix btree node reserve
2024-04-03bcachefs: reconstruct_inode()Kent Overstreet1-2/+50
If an inode is missing, but corresponding extents and dirent still exist, it's well worth recreating it - this does so. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03bcachefs: Subvolume reconstructionKent Overstreet1-19/+148
We can now recreate missing subvolumes from dirents and/or inodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03bcachefs: Check for extents that point to same spaceKent Overstreet2-8/+168
In backpointer repair, if we get a missing backpointer - but there's already a backpointer that points to an existing extent - we've got multiple extents that point to the same space and need to decide which to keep. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03bcachefs: Reconstruct missing snapshot nodesKent Overstreet6-6/+199
When the snapshots btree is going, we'll have to delete huge amounts of data - unless we can reconstruct it by looking at the keys that refer to it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03bcachefs: Flag btrees with missing dataKent Overstreet6-5/+44
We need this to know when we should attempt to reconstruct the snapshots btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03bcachefs: Topology repair now uses nodes found by scanning to fill holesKent Overstreet2-107/+199
With the new btree node scan code, we can now recover from corrupt btree roots - simply create a new fake root at depth 1, and then insert all the leaves we found. If the root wasn't corrupt but there's corruption elsewhere in the btree, we can fill in holes as needed with the newest version of a given node(s) from the scan; we also check if a given btree node is older than what we found from the scan. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03bcachefs: Repair pass for scanning for btree nodesKent Overstreet12-51/+605
If a btree root or interior btree node goes bad, we're going to lose a lot of data, unless we can recover the nodes that it pointed to by scanning. Fortunately btree node headers are fully self describing, and additionally the magic number is xored with the filesytem UUID, so we can do so safely. This implements the scanning - next patch will rework topology repair to make use of the found nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03bcachefs: Don't skip fake btree roots in fsckKent Overstreet1-3/+0
When a btree root is unreadable, we might still have keys fro the journal to walk and mark. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake()Kent Overstreet3-7/+7
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03bcachefs: Etyzinger cleanupsKent Overstreet7-182/+285
Pull out eytzinger.c and kill eytzinger_cmp_fn. We now provide eytzinger0_sort and eytzinger0_sort_r, which use the standard cmp_func_t and cmp_r_func_t callbacks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03bcachefs: bch2_shoot_down_journal_keys()Kent Overstreet3-10/+35
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03bcachefs: Clear recovery_passes_required as they complete without errorsKent Overstreet3-12/+43
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03Merge tag 'vboxsf-v6.9-1' of ↵Linus Torvalds3-7/+6
git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux Pull vboxsf fixes from Hans de Goede: - Compiler warning fixes - Explicitly deny setlease attempts * tag 'vboxsf-v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux: vboxsf: explicitly deny setlease attempts vboxsf: Remove usage of the deprecated ida_simple_xx() API vboxsf: Avoid an spurious warning if load_nls_xxx() fails vboxsf: remove redundant variable out_len
2024-04-03security: Place security_path_post_mknod() where the original IMA call wasRoberto Sassu1-5/+2
Commit 08abce60d63f ("security: Introduce path_post_mknod hook") introduced security_path_post_mknod(), to replace the IMA-specific call to ima_post_path_mknod(). For symmetry with security_path_mknod(), security_path_post_mknod() was called after a successful mknod operation, for any file type, rather than only for regular files at the time there was the IMA call. However, as reported by VFS maintainers, successful mknod operation does not mean that the dentry always has an inode attached to it (for example, not for FIFOs on a SAMBA mount). If that condition happens, the kernel crashes when security_path_post_mknod() attempts to verify if the inode associated to the dentry is private. Move security_path_post_mknod() where the ima_post_path_mknod() call was, which is obviously correct from IMA/EVM perspective. IMA/EVM are the only in-kernel users, and only need to inspect regular files. Reported-by: Steve French <smfrench@gmail.com> Closes: https://lore.kernel.org/linux-kernel/CAH2r5msAVzxCUHHG8VKrMPUKQHmBpE6K9_vjhgDa1uAvwx4ppw@mail.gmail.com/ Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: 08abce60d63f ("security: Introduce path_post_mknod hook") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-04-03vboxsf: explicitly deny setlease attemptsJeff Layton1-0/+1
vboxsf does not break leases on its own, so it can't properly handle the case where the hypervisor changes the data. Don't allow file leases on vboxsf. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240319-setlease-v1-1-5997d67e04b3@kernel.org Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-04-03vboxsf: Remove usage of the deprecated ida_simple_xx() APIChristophe JAILLET1-3/+3
ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). This is less verbose. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/b3c057c86b73f0309a6362031d21f4d7ebb60587.1698835730.git.christophe.jaillet@wanadoo.fr Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-04-03vboxsf: Avoid an spurious warning if load_nls_xxx() failsChristophe JAILLET1-1/+2
If an load_nls_xxx() function fails a few lines above, the 'sbi->bdi_id' is still 0. So, in the error handling path, we will call ida_simple_remove(..., 0) which is not allocated yet. In order to prevent a spurious "ida_free called for id=0 which is not allocated." message, tweak the error handling path and add a new label. Fixes: 0fd169576648 ("fs: Add VirtualBox guest shared folder (vboxsf) support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/d09eaaa4e2e08206c58a1a27ca9b3e81dc168773.1698835730.git.christophe.jaillet@wanadoo.fr Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-04-03vboxsf: remove redundant variable out_lenColin Ian King1-3/+0
The variable out_len is being used to accumulate the number of bytes but it is not being used for any other purpose. The variable is redundant and can be removed. Cleans up clang scan build warning: fs/vboxsf/utils.c:443:9: warning: variable 'out_len' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20240229225138.351909-1-colin.i.king@gmail.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-04-03bcachefs: ratelimit informational fsck errorsKent Overstreet1-4/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03bcachefs: Check for bad needs_discard before doing discardKent Overstreet1-21/+26
In the discard worker, we were failing to validate the bucket state - meaning a corrupt needs_discard btree could cause us to discard a bucket that we shouldn't. If check_alloc_info hasn't run yet we just want to bail out, otherwise it's a filesystem inconsistent error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-03bcachefs: Improve bch2_btree_update_to_text()Kent Overstreet2-22/+43
Print out the mode as a string, and also print out the btree and watermark. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-02mean_and_variance: Drop always failing testsGuenter Roeck1-27/+1
mean_and_variance_test_2 and mean_and_variance_test_4 always fail. The input parameters to those tests are identical to the input parameters to tests 1 and 3, yet the expected result for tests 2 and 4 is different for the mean and stddev tests. That will always fail. Expected mean_and_variance_get_mean(mv) == mean[i], but mean_and_variance_get_mean(mv) == 22 (0x16) mean[i] == 10 (0xa) Drop the bad tests. Fixes: 65bc41090720 ("mean and variance: More tests") Closes: https://lore.kernel.org/lkml/065b94eb-6a24-4248-b7d7-d3212efb4787@roeck-us.net/ Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-02bcachefs: fix nocow lock deadlockKent Overstreet1-2/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-02bcachefs: BCH_WATERMARK_interior_updatesKent Overstreet6-7/+12
This adds a new watermark, higher priority than BCH_WATERMARK_reclaim, for interior btree updates. We've seen a deadlock where journal replay triggers a ton of btree node merges, and these use up all available open buckets and then interior updates get stuck. One cause of this is that we're currently lacking btree node merging on write buffer btrees - that needs to be fixed as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-02bcachefs: Fix btree node reserveKent Overstreet1-1/+1
Sign error when checking the watermark - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: On emergency shutdown, print out current journal sequence numberKent Overstreet1-1/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Fix overlapping extent repairKent Overstreet1-4/+6
overlapping extent repair was colliding with extent past end of inode checks - don't update "extent ends at" until we know we have an extent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Fix remove_dirent()Kent Overstreet1-3/+4
We were missing an iter_traverse(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Logged op errors should be ignoredKent Overstreet1-4/+3
If something is wrong with a logged op, we just want to delete it - there's nothing to repair. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Improve -o norecovery; opts.recovery_pass_limitKent Overstreet6-14/+26
This adds opts.recovery_pass_limit, and redoes -o norecovery to make use of it; this fixes some issues with -o norecovery so it can be safely used for data recovery. Norecovery means "don't do journal replay"; it's an important data recovery tool when we're getting stuck in journal replay. When using it this way we need to make sure we don't free journal keys after startup, so we continue to overlay them: thus it needs to imply retain_recovery_info, as well as nochanges. recovery_pass_limit is an explicit option for telling recovery to exit after a specific recovery pass; this is a much cleaner way of implementing -o norecovery, as well as being a useful debug feature in its own right. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: bch2_run_explicit_recovery_pass_persistent()Kent Overstreet2-0/+19
Flag that we need to run a recovery pass and run it - persistenly, so if we crash it'll still get run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Ensure bch_sb_field_ext always existsKent Overstreet2-17/+16
This makes bch_sb_field_ext more consistent with the rest of -o nochanges - we don't want to be varying other codepaths based on -o nochanges, since it's used for testing in dry run mode; also fixes some potential null ptr derefs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Flush journal immediately after replay if we did early repairKent Overstreet1-0/+20
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Resume logged ops after fsckKent Overstreet1-1/+1
Finishing logged ops requires the filesystem to be in a reasonably consistent state - and other fsck passes don't require it to have completed, so just run it last. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Add error messages to logged ops fnsKent Overstreet1-0/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Split out recovery_passes.cKent Overstreet16-286/+313
We've grown a fair amount of code for managing recovery passes; tracking which ones we're running, which ones need to be run, and flagging in the superblock which ones need to be run on the next recovery. So it's worth splitting out into its own file, this code is pretty different from the code in recovery.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: fix backpointer for missing alloc key msgKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Fix bch2_btree_increase_depth()Kent Overstreet1-2/+5
When we haven't yet allocated any btree nodes for a given btree, we first need to call the regular split path to allocate one. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Kill bch2_bkey_ptr_data_type()Kent Overstreet5-66/+67
Remove some duplication, and inconsistency between check_fix_ptrs and the main ptr marking paths Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Fix use after free in check_root_trans()Kent Overstreet1-7/+11
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Fix repair path for missing indirect extentsKent Overstreet1-2/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Fix use after free in bch2_check_fix_ptrs()Kent Overstreet1-6/+6
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Fix btree node keys accounting in topology repair pathKent Overstreet4-4/+14
When dropping keys now outside a now because we're changing the node min/max, we need to redo the node's accounting as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Check btree ptr min_key in .invalidKent Overstreet2-3/+9
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: add REQ_SYNC and REQ_IDLE in write diozhuxiaohui1-2/+2
when writing file with direct_IO on bcachefs, then performance is much lower than other fs due to write back throttle in block layer: wbt_wait+1 __rq_qos_throttle+32 blk_mq_submit_bio+394 submit_bio_noacct_nocheck+649 bch2_submit_wbio_replicas+538 __bch2_write+2539 bch2_direct_write+1663 bch2_write_iter+318 aio_write+355 io_submit_one+1224 __x64_sys_io_submit+169 do_syscall_64+134 entry_SYSCALL_64_after_hwframe+110 add set REQ_SYNC and REQ_IDLE in bio->bi_opf as standard dirct-io Signed-off-by: zhuxiaohui <zhuxiaohui.400@bytedance.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Improved topology repair checksKent Overstreet8-165/+137
Consolidate bch2_gc_check_topology() and btree_node_interior_verify(), and replace them with an improved version, bch2_btree_node_check_topology(). This checks that children of an interior node correctly span the full range of the parent node with no overlaps. Also, ensure that topology repairs at runtime are always a fatal error; in particular, this adds a check in btree_iter_down() - if we don't find a key while walking down the btree that's indicative of a topology error and should be flagged as such, not a null ptr deref. Some checks in btree_update_interior.c remaining BUG_ONS(), because we already checked the node for topology errors when starting the update, and the assertions indicate that we _just_ corrupted the btree node - i.e. the problem can't be that existing on disk corruption, they indicate an actual algorithmic bug. In the future, we'll be annotating the fsck errors list with which recovery pass corrects them; the open coded "run explicit recovery pass or fatal error" in bch2_btree_node_check_topology() will in the future be done for every fsck_err() call. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Be careful about btree node splits during journal replayKent Overstreet3-3/+26
Don't pick a pivot that's going to be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: btree_and_journal_iter now respects trans->journal_replay_not_finishedKent Overstreet1-5/+8
btree_and_journal_iter is now safe to use at runtime, not just during recovery before journal keys have been freed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: fix trans->mem realloc in __bch2_trans_kmallocHongbo Li1-3/+30
The old code doesn't consider the mem alloced from mempool when call krealloc on trans->mem. Also in bch2_trans_put, using mempool_free to free trans->mem by condition "trans->mem_bytes == BTREE_TRANS_MEM_MAX" is inaccurate when trans->mem was allocated by krealloc function. Instead, we use used_mempool stuff to record the situation, and realloc or free the trans->mem in elegant way. Also, after krealloc failed in __bch2_trans_kmalloc, the old data should be copied to the new buffer when alloc from mempool_alloc. Fixes: 31403dca5bb1 ("bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>