summaryrefslogtreecommitdiff
path: root/fs/bcachefs/subvolume.c
AgeCommit message (Collapse)AuthorFilesLines
2024-05-20bcachefs: Check for subvolues with bogus snapshot/inode fieldsKent Overstreet1-0/+9
This fixes an assertion pop in btree_iter.c that checks for forgetting to pass a snapshot ID when iterating over snapshots btrees. Reported-by: syzbot+0dfe05235e38653e2aee@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: s/bkey_invalid_flags/bch_validate_flagsKent Overstreet1-1/+1
We're about to start using bch_validate_flags for superblock section validation - it's no longer bkey specific. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: Fix type of flags parameter for some ->trigger() implementationsNathan Chancellor1-1/+1
When building with clang's -Wincompatible-function-pointer-types-strict (a warning designed to catch potential kCFI failures at build time), there are several warnings along the lines of: fs/bcachefs/bkey_methods.c:118:2: error: incompatible function pointer types initializing 'int (*)(struct btree_trans *, enum btree_id, unsigned int, struct bkey_s_c, struct bkey_s, enum btree_iter_update_trigger_flags)' with an expression of type 'int (struct btree_trans *, enum btree_id, unsigned int, struct bkey_s_c, struct bkey_s, unsigned int)' [-Werror,-Wincompatible-function-pointer-types-strict] 118 | BCH_BKEY_TYPES() | ^~~~~~~~~~~~~~~~ fs/bcachefs/bcachefs_format.h:394:2: note: expanded from macro 'BCH_BKEY_TYPES' 394 | x(inode, 8) \ | ^~~~~~~~~~~~~~~~~~~~~~~~~~ fs/bcachefs/bkey_methods.c:117:41: note: expanded from macro 'x' 117 | #define x(name, nr) [KEY_TYPE_##name] = bch2_bkey_ops_##name, | ^~~~~~~~~~~~~~~~~~~~ <scratch space>:277:1: note: expanded from here 277 | bch2_bkey_ops_inode | ^~~~~~~~~~~~~~~~~~~ fs/bcachefs/inode.h:26:13: note: expanded from macro 'bch2_bkey_ops_inode' 26 | .trigger = bch2_trigger_inode, \ | ^~~~~~~~~~~~~~~~~~ There are several functions that did not have their flags parameter converted to 'enum btree_iter_update_trigger_flags' in the recent unification, which will cause kCFI failures at runtime because the types, while ABI compatible (hence no warning from the non-strict version of this warning), do not match exactly. Fix up these functions (as well as a few other obvious functions that should have it, even if there are no warnings currently) to resolve the warnings and potential kCFI runtime failures. Fixes: 31e4ef3280c8 ("bcachefs: iter/update/trigger/str_hash flag cleanup") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-09bcachefs: iter/update/trigger/str_hash flag cleanupKent Overstreet1-9/+9
Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later. These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: Split out recovery_passes.cKent Overstreet1-0/+72
We've grown a fair amount of code for managing recovery passes; tracking which ones we're running, which ones need to be run, and flagging in the superblock which ones need to be run on the next recovery. So it's worth splitting out into its own file, this code is pretty different from the code in recovery.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-14bcachefs: Check for subvolume children when deleting subvolumesKent Overstreet1-0/+13
Recursively destroying subvolumes isn't allowed yet. Fixes: https://github.com/koverstreet/bcachefs/issues/634 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-14bcachefs: BTREE_ID_subvolume_childrenKent Overstreet1-0/+98
Add a btree to record a parent -> child subvolume relationships, according to the filesystem heirarchy. The subvolume_children btree is a bitset btree: if a bit is set at pos p, that means p.offset is a child of subvolume p.inode. This will be used for efficiently listing subvolumes, as well as recursive deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-14bcachefs: bch_subvolume::fs_path_parentKent Overstreet1-2/+21
Record the filesystem path heirarchy for subvolumes in bch_subvolume Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-14bcachefs: bch_subvolume::parent -> creation_parentKent Overstreet1-12/+12
bit of renaming, prep for adding a fs path parent Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-14bcachefs: Check subvol <-> inode pointers in check_subvol()Kent Overstreet1-0/+31
Subvolumes and subvolume root inodes point to each other: this verifies the subvolume -> inode -> subvolme path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: for_each_btree_key() now declares loop iterKent Overstreet1-7/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch_err_(fn|msg) check if should printKent Overstreet1-10/+7
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Explicity go RW for fsckKent Overstreet1-3/+1
This eliminates a lot of BCH_TRANS_COMMIT_lazy_rw flags, and is less error prone. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Rename BTREE_INSERT flagsKent Overstreet1-3/+3
BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: make RO snapshots actually ROKent Overstreet1-0/+18
Add checks to all the VFS paths for "are we in a RO snapshot?". Note - we don't check this when setting inode options via our xattr interface, since those generally only affect data placement, not contents of data. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>
2023-11-02bcachefs: Enumerate fsck errorsKent Overstreet1-8/+10
This patch adds a superblock error counter for every distinct fsck error; this means that when analyzing filesystems out in the wild we'll be able to see what sorts of inconsistencies are being found and repair, and hence what bugs to look for. Errors validating bkeys are not yet considered distinct fsck errors, but this patch adds a new helper, bkey_fsck_err(), in order to add distinct error types for them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: Don't run bch2_delete_dead_snapshots() unnecessarilyKent Overstreet1-17/+2
Be a bit more careful about when bch2_delete_dead_snapshots needs to run: it only needs to run synchronously if we're running fsck, and it only needs to run at all if we have snapshot nodes to delete or if fsck has noticed that it needs to run. Also: Rename BCH_FS_HAVE_DELETED_SNAPSHOTS -> BCH_FS_NEED_DELETE_DEAD_SNAPSHOTS Kill bch2_delete_dead_snapshots_hook(), move functionality to bch2_mark_snapshot() Factor out bch2_check_snapshot_needs_deletion(), to explicitly check if we need to be running snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Heap allocate btree_transKent Overstreet1-3/+3
We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix -Wincompatible-function-pointer-types-strict from key_invalid ↵Nathan Chancellor1-1/+1
callbacks When building bcachefs with -Wincompatible-function-pointer-types-strict, a clang warning designed to catch issues with mismatched function pointer types, which will be fatal at runtime due to kernel Control Flow Integrity (kCFI), there are several instances along the lines of: fs/bcachefs/bkey_methods.c:118:2: error: incompatible function pointer types initializing 'int (*)(const struct bch_fs *, struct bkey_s_c, enum bkey_invalid_flags, struct printbuf *)' with an expression of type 'int (const struct bch_fs *, struct bkey_s_c, unsigned int, struct printbuf *)' [-Werror,-Wincompatible-function-pointer-types-strict] 118 | BCH_BKEY_TYPES() | ^~~~~~~~~~~~~~~~ fs/bcachefs/bcachefs_format.h:342:2: note: expanded from macro 'BCH_BKEY_TYPES' 342 | x(deleted, 0) \ | ^~~~~~~~~~~~~~~~~~~~~~~~~~ fs/bcachefs/bkey_methods.c:117:41: note: expanded from macro 'x' 117 | #define x(name, nr) [KEY_TYPE_##name] = bch2_bkey_ops_##name, | ^~~~~~~~~~~~~~~~~~~~ <scratch space>:206:1: note: expanded from here 206 | bch2_bkey_ops_deleted | ^~~~~~~~~~~~~~~~~~~~~ fs/bcachefs/bkey_methods.c:34:17: note: expanded from macro 'bch2_bkey_ops_deleted' 34 | .key_invalid = deleted_key_invalid, \ | ^~~~~~~~~~~~~~~~~~~ The flags parameter should be of type 'enum bkey_invalid_flags', not 'unsigned int'. Adjust the type everywhere so that there is no more warning. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Convert more code to bch_err_msg()Kent Overstreet1-3/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Split out snapshot.cKent Overstreet1-1341/+19
subvolume.c has gotten a bit large, this splits out a separate file just for managing snapshot trees - BTREE_ID_snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_run_explicit_recovery_pass()Kent Overstreet1-5/+5
This introduces bch2_run_explicit_recovery_pass() and uses it for when fsck detects that we need to re-run dead snaphots cleanup, and makes dead snapshot cleanup more like a normal recovery pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Inline bch2_snapshot_is_ancestor() fast pathKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: is_ancestor bitmapKent Overstreet1-8/+15
Further optimization for bch2_snapshot_is_ancestor(). We add a small inline bitmap to snapshot_t, which indicates which of the next 128 snapshot IDs are ancestors of the current id - eliminating the last few iterations of the loop in bch2_snapshot_is_ancestor(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Convert snapshot table to RCU arrayKent Overstreet1-25/+117
This switches the generic radix tree for the in-memory table of snapshot nodes to a simple rcu array. This means we have to add new locking to deal with reallocations, but is faster than traversing the radix tree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Snapshot depth, skiplist fieldsKent Overstreet1-47/+214
This extents KEY_TYPE_snapshot to include some new fields: - depth, to indicate depth of this particular node from the root - skip[3], skiplist entries for quickly walking back up to the root These are to improve bch2_snapshot_is_ancestor(), making it O(ln(n)) instead of O(n) in the snapshot tree depth. Skiplist nodes are picked at random from the set of ancestor nodes, not some fixed fraction. This introduces bcachefs_metadata_version 1.1, snapshot_skiplists. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Enumerate recovery passesKent Overstreet1-5/+5
Recovery and fsck have many different passes/jobs to do, which always run in the same order - but not all of them run all the time. Some are for fsck, some for unclean shutdown, some for version upgrades. This adds some new structure: a defined list of recovery passes that we can run in a loop, as well as consolidating the log messages. The main benefit is consolidating the "should run this recovery pass" logic, as well as cleaning up the "this recovery pass has finished" state; instead of having a bunch of ad-hoc state bits in c->flags, we've now got c->curr_recovery_pass. By consolidating the "should run this recovery pass" logic, in the future on disk format upgrades will be able to say "upgrading to this version requires x passes to run", instead of forcing all of fsck to run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Change check for invalid key typesKent Overstreet1-2/+4
As part of the forward compatibility patch series, we need to allow for new key types without complaining loudly when running an old version. This patch changes the flags parameter of bkey_invalid to an enum, and adds a new flag to indicate we're being called from the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Assorted sparse fixesKent Overstreet1-3/+3
- endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Improve bch2_bkey_make_mut()Kent Overstreet1-5/+5
bch2_bkey_make_mut() now takes the bkey_s_c by reference and points it at the new, mutable key. This helps in some fsck paths that may have multiple repair operations on the same key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: New error message helpersKent Overstreet1-4/+5
Add two new helpers for printing error messages with __func__ and bch2_err_str(): - bch_err_fn - bch_err_msg Also kill the old error strings in the recovery path, which were causing us to incorrectly report memory allocation failures - they're not needed anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: snapshot_to_text() includes snapshot treeKent Overstreet1-2/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Convert -ENOENT to private error codesKent Overstreet1-8/+13
As with previous conversions, replace -ENOENT uses with more informative private error codes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix quotas + snapshotsKent Overstreet1-5/+5
Now that we can reliably designate and find the master subvolume out of a tree of snapshots, we can finally make quotas work with snapshots: That is - quotas will now _ignore_ snapshot subvolumes, and only be in effect for the master (non snapshot) subvolume. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Add otime, parent to bch_subvolumeKent Overstreet1-16/+81
Add two new fields to bch_subvolume: - otime: creation time - parent: For snapshots, this is the id of the subvolume the snapshot was created from Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: BTREE_ID_snapshot_treeKent Overstreet1-70/+520
This adds a new btree which gets us a persistent per-snapshot-tree identifier. - BTREE_ID_snapshot_trees - KEY_TYPE_snapshot_tree - bch_snapshot now has a field that points to a snapshot_tree This is going to be used to designate one snapshot ID/subvolume out of a given tree of snapshots as the "main" subvolume, so that we can do quota accounting in that subvolume and not the rest. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_bkey_get_empty_slot()Kent Overstreet1-19/+6
Add a new helper for allocating a new slot in a btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_bkey_get_mut() now calls bch2_trans_update()Kent Overstreet1-24/+11
It's safe to call bch2_trans_update with a k/v pair where the value hasn't been filled out, as long as the key part has been and the value is filled out by transaction commit time. This patch folds the bch2_trans_update() call into bch2_bkey_get_mut(), eliminating a bit of boilerplate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_bkey_alloc() now calls bch2_trans_update()Kent Overstreet1-8/+4
It's safe to call bch2_trans_update with a k/v pair where the value hasn't been filled out, as long as the key part has been and the value is filled out by transaction commit time. This patch folds the bch2_trans_update() call into bch2_bkey_alloc(), eliminating a bit of boilerplate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_bkey_get_mut() improvementsKent Overstreet1-21/+18
- bch2_bkey_get_mut() now handles types increasing in size, allocating a buffer for the type's current size when necessary - bch2_bkey_make_mut_typed() - bch2_bkey_get_mut() now initializes the iterator, like bch2_bkey_get_iter() Also, refactor so that most of the code is in functions - now macros are only used for wrappers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_bkey_get_iter() helpersKent Overstreet1-59/+21
Introduce new helpers for a common pattern: bch2_trans_iter_init(); bch2_btree_iter_peek_slot(); - bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of the correct type - bch2_bkey_get_val_typed() copies the val out of the btree to a (typically stack allocated) variable; it handles the case where the value in the btree is smaller than the current version of the type, zeroing out the remainder. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bkey_ops.min_val_sizeKent Overstreet1-12/+0
This adds a new field to bkey_ops for the minimum size of the value, which standardizes that check and also enforces the new rule (previously done somewhat ad-hoc) that we can extend value types by adding new fields on to the end. To make that work we do _not_ initialize min_val_size with sizeof, instead we initialize it to the size of the first version of those values. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: use dedicated workqueue for tasks holding write refsBrian Foster1-2/+2
A workqueue resource deadlock has been observed when running fsck on a filesystem with a full/stuck journal. fsck is not currently able to repair the fs due to fairly rapid emergency shutdown, but rather than exit gracefully the fsck process hangs during the shutdown sequence. Fortunately this is easily recoverable from userspace, but the root cause involves code shared between the kernel and userspace and so should be addressed. The deadlock scenario involves the main task in the bch2_fs_stop() -> bch2_fs_read_only() path waiting on write references to drain with the fs state lock held. A bch2_read_only_work() workqueue task is scheduled on the system_long_wq, blocked on the state lock. Finally, various other write ref holding workqueue tasks are scheduled to run on the same workqueue and must complete in order to release references that the initial task is waiting on. To avoid this problem, we can split the dependent workqueue tasks across different workqueues. It's a bit of a waste to create a dedicated wq for the read-only worker, but there are several tasks throughout the fs that follow the pattern of acquiring a write reference and then scheduling to the system wq. Use a local wq for such tasks to break the subtle dependency between these and the read-only worker. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Private error codes: ENOMEMKent Overstreet1-1/+1
This adds private error codes for most (but not all) of our ENOMEM uses, which makes it easier to track down assorted allocation failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Mark new snapshots earlier in create pathKent Overstreet1-2/+4
This fixes a null ptr deref when creating new snapshots: bch2_create_trans() will lookup the subvolume and find the _new_ snapshot in the BCH_CREATE_SUBVOL path that's being created in that transaction. We have to call bch2_mark_snapshot() earlier so that it's properly initialized, instead of leaving it for transaction commit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_mark_key() now takes btree_id & levelKent Overstreet1-1/+2
btree & level are passed to trans_mark - for backpointers - bch2_mark_key() should take them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_mark_snapshot() now called like other triggersKent Overstreet1-2/+1
This fixes a bug where bch2_mark_snapshot() wasn't called for existing snapshot nodes being updated when child nodes were added. This led to the data update path thinking the key being updated was for a snapshot that didn't have children, causing it to fail to insert whiteouts when splitting existing extents. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Change bkey_invalid() rw param to flagsKent Overstreet1-2/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Debug mode for c->writes referencesKent Overstreet1-9/+7
This adds a debug mode where we split up the c->writes refcount into distinct refcounts for every codepath that takes a reference, and adds sysfs code to print the value of each ref. This will make it easier to debug shutdown hangs due to refcount leaks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: New btree helpersKent Overstreet1-81/+27
This introduces some new conveniences, to help cut down on boilerplate: - bch2_trans_kmalloc_nomemzero() - performance optimiation - bch2_bkey_make_mut() - bch2_bkey_get_mut() - bch2_bkey_get_mut_typed() - bch2_bkey_alloc() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>