summaryrefslogtreecommitdiff
path: root/fs/bcachefs/btree_update_interior.c
AgeCommit message (Collapse)AuthorFilesLines
2023-10-23bcachefs: Kill BTREE_INSERT_USE_RESERVEKent Overstreet1-29/+27
Now that we have journal watermarks and alloc watermarks unified, BTREE_INSERT_USE_RESERVE is redundant and can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix a null ptr deref in bch2_fs_alloc() error pathKent Overstreet1-1/+4
This fixes a null ptr deref in bch2_free_pending_node_rewrites() when the list head wasn't initialized. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Kill JOURNAL_WATERMARKKent Overstreet1-3/+3
This unifies JOURNAL_WATERMARK with BCH_WATERMARK; we're working towards specifying watermarks once in the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Rename enum alloc_reserve -> bch_watermarkKent Overstreet1-3/+3
This is prep work for consolidating with JOURNAL_WATERMARK. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix bch2_btree_update_start()Kent Overstreet1-1/+1
The calculation for number of nodes to allocate in bch2_btree_update_start() was incorrect - this fixes a BUG_ON() on the small nodes test. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Delete weird hacky transaction restart injectionKent Overstreet1-3/+0
since we currently don't have a good fault injection library, bch2_btree_insert_node() was randomly injecting faults based on local_clock(). At the very least this should have been a debug mode only thing, but this is a brittle method so let's just delete it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: More drop_locks_do() conversionsKent Overstreet1-8/+4
Using drop_locks_do() ensures that every unlock() is paired with a relock(), with proper error checking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: drop_locks_do()Kent Overstreet1-6/+2
Add a new helper for the common pattern of: - trans_unlock() - do something - trans_relock() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: GFP_NOIO -> GFP_NOFSKent Overstreet1-1/+1
GFP_NOIO dates from the bcache days, when we operated under the block layer. Now, GFP_NOFS is more appropriate, so switch all GFP_NOIO uses to GFP_NOFS. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23six locks: Kill six_lock_state unionKent Overstreet1-1/+1
As suggested by Linus, this drops the six_lock_state union in favor of raw bitmasks. On the one hand, bitfields give more type-level structure to the code. However, a significant amount of the code was working with six_lock_state as a u64/atomic64_t, and the conversions from the bitfields to the u64 were deemed a bit too out-there. More significantly, because bitfield order is poorly defined (#ifdef __LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the sequence number would overflow into the rest of the bitfield if the compiler didn't put the sequence number at the high end of the word. The new code is a bit saner when we're on an architecture without real atomic64_t support - all accesses to lock->state now go through atomic64_*() operations. On architectures with real atomic64_t support, we additionally use atomic bit ops for setting/clearing individual bits. Text size: 7467 bytes -> 4649 bytes - compilers still suck at bitfields. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Improve trans_restart_split_race tracepointKent Overstreet1-2/+2
Seeing occasional test failures where we get stuck in a livelock that involves this event - this will help track it down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix bch2_extent_fallocate() in nocow modeKent Overstreet1-0/+2
When we allocate disk space, we need to be incrementing the WRITE io clock, which perhaps should be renamed to sectors allocated - copygc uses this io clock to know when to run. Also, we should be incrementing the same clock when allocating btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Private error codes: ENOMEMKent Overstreet1-3/+6
This adds private error codes for most (but not all) of our ENOMEM uses, which makes it easier to track down assorted allocation failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Drop some anonymous structs, unionsKent Overstreet1-2/+2
Rust bindgen doesn't cope well with anonymous structs and unions. This patch drops the fancy anonymous structs & unions in bkey_i that let us use the same helpers for bkey_i and bkey_packed; since bkey_packed is an internal type that's never exposed to outside code, it's only a minor inconvenienc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: BKEY_PADDED_ONSTACK()Kent Overstreet1-1/+1
Rust bindgen doesn't do anonymous structs very nicely: BKEY_PADDED() only needs the anonymous struct when it's used on the stack, to guarantee layout, not when it's embedded in another struct. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Plumb btree_trans through btree cache codeKent Overstreet1-4/+11
Soon, __bch2_btree_node_write() is going to require a btree_trans: zoned device support is going to require a new allocation for every btree node write. This is a bit of prep work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Add tracepoint & counter for btree split raceKent Overstreet1-1/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_journal_entries_postprocess()Kent Overstreet1-10/+5
This brings back journal_entries_compact(), but in a more efficient form - we need to do multiple postprocess steps, so iterate over the journal entries being written just once to make it more efficient. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Handle btree node rewrites before going RWKent Overstreet1-7/+58
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Add some logging for btree node rewrites due to errorsKent Overstreet1-3/+20
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Debug mode for c->writes referencesKent Overstreet1-3/+3
This adds a debug mode where we split up the c->writes refcount into distinct refcounts for every codepath that takes a reference, and adds sysfs code to print the value of each ref. This will make it easier to debug shutdown hangs due to refcount leaks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix btree_node_write_blocked() not being clearedKent Overstreet1-0/+3
The btree_node_write_blocked bit was a later addition to this code, it only mirrors the state of the b->write_blocked list (empty or nonempty) - unfortunately, when it was added it wasn't correctly kept in sync - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Improve btree_reserve_get_fail tracepointKent Overstreet1-1/+2
Now we include the return code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Convert EAGAIN errors to private error codesKent Overstreet1-1/+1
More error code cleanup, for better error messages and debugability. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: btree_iter->ip_allocatedKent Overstreet1-4/+6
In debug mode, we now track where btree iterators and paths are initialized/allocated - helpful in tracking down btree path overflows. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: New bpos_cmp(), bkey_cmp() replacementsKent Overstreet1-9/+8
This patch introduces - bpos_eq() - bpos_lt() - bpos_le() - bpos_gt() - bpos_ge() and equivalent replacements for bkey_cmp(). Looking at the generated assembly these could probably be improved further, but we already see a significant code size improvement with this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix a race with b->write_typeKent Overstreet1-2/+10
b->write_type needs to be set atomically with setting the btree_node_need_write flag, so move it into b->flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Btree split improvementKent Overstreet1-144/+106
This improves the bkey_format calculation when splitting btree nodes. Previously, we'd use a format calculated for the original node for the lower of the two new nodes. This was particularly bad on sequential insertions, where we iteratively split the last btree node, whos format has to include KEY_MAX. Now, we calculate formats precisely for the keys the two new nodes will contain. This also should make splitting a bit more efficient, since we're only copying keys once (from the original node to the new node, instead of new node, replacement node, then upper split). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Improved btree write statisticsKent Overstreet1-0/+1
This replaces sysfs btree_avg_write_size with btree_write_stats, which now breaks out statistics by the source of the btree write. Btree writes that are too small are a source of inefficiency, and excessive btree resort overhead - this will let us see what's causing them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Make error messages more uniformKent Overstreet1-3/+3
Use __func__ in error messages that refer to function name, and do so more uniformly - suggested by checkpatch.pl Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Assorted checkpatch fixesKent Overstreet1-1/+1
checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF UNSPECIFIED_INT - bcachefs coding style prefers single token type names NEW_TYPEDEFS - typedefs are occasionally good FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Call bch2_btree_update_add_new_node() before dropping write lockKent Overstreet1-11/+9
btree nodes can be written by other threads (shrinker, journal reclaim) with only a read lock, but brand new nodes should only be written by the thread doing the split/interior update. bch2_btree_update_add_new_node() sets btree node flags to indicate that this is a new node and should not be written out by other threads, thus we need to call it before dropping our write lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Btree splits now only take the locks they needKent Overstreet1-17/+25
Previously, bch2_btree_update_start() would always take all intent locks, all the way up to the root. We've finally got data from users where this became a scalability issue - so, this patch fixes bch2_btree_update_start() to only take the locks we need. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_btree_insert_node() no longer uses lock_write_nofailKent Overstreet1-1/+6
Now that we have an error path plumbed through, there's no need to be using bch2_btree_node_lock_write_nofail(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Add error path to btree_split()Kent Overstreet1-18/+87
The next patch in the series is (finally!) going to change btree splits (and interior updates in general) to not take intent locks all the way up to the root - instead only locking the nodes they'll need to modify. However, this will be introducing a race since if we're not holding a write lock on a btree node it can be written out by another thread, and then we might not have enough space for a new bset entry. We can handle this by retrying - we just need to introduce a new error path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Write new btree nodes after parent updateKent Overstreet1-22/+24
In order to avoid locking all btree nodes up to the root for btree node splits, we're going to have to introduce a new error path into bch2_btree_insert_node(); this mean we can't have done any writes or modified global state before that point. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix a deadlock in btree_update_nodes_written()Kent Overstreet1-0/+11
btree_node_lock_nopath() is something we'd like to get rid of, it's always prone to deadlocks if we accidentally are holding other locks, because it doesn't mark the lock it's taking in a path: we'll want to get rid of it in the future, but for now this patch works it by calling bch2_trans_unlock(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs; Mark __bch2_trans_iter_init as inlineKent Overstreet1-2/+1
This function is fairly small and only used in two places: one very hot, the other cold, so it should definitely be inlined. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix blocking with locks heldKent Overstreet1-2/+2
This is a major oopsy - we should always be unlocking before calling closure_sync(), else we'll cause a deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: btree_update_nodes_written() needs BTREE_INSERT_USE_RESERVEKent Overstreet1-0/+1
This fixes an obvious deadlock - whoops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix error handling in bch2_btree_update_start()Kent Overstreet1-2/+2
We were checking for -EAGAIN, but we're not returned that when we didn't pass a closure to wait with - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: All held locks must be in a btree pathKent Overstreet1-3/+75
With the new deadlock cycle detector, it's critical that all held locks be marked in a btree_path, because that's what the cycle detector traverses - any locks that aren't correctly marked will cause deadlocks. This changes the btree_path to allocate some btree_paths for the new nodes, since until the final update is done we otherwise don't have a path referencing them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_btree_path_upgrade() now emits transaction restartKent Overstreet1-7/+5
Centralizing the transaction restart/tracepoint in bch2_btree_path_upgrade() lets us improve the tracepoint - now it emits old and new locks_want. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Avoid using btree_node_lock_nopath()Kent Overstreet1-26/+21
With the upcoming cycle detector, we have to be careful about using btree_node_lock_nopath - in particular, using it to take write locks can cause deadlocks. All held locks need to be tracked in a btree_path, so that the cycle detector knows about them - unless we know that we cannot cause deadlocks for other reasons: e.g. we are only taking read locks, or we're in very early fsck (topology repair). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Convert more locking code to btree_bkey_cached_commonKent Overstreet1-2/+2
Ideally, all the code in btree_locking.c should be converted, but then we'd want to convert btree_path to point to btree_key_cached_common too, and then we'd be in for a much bigger cleanup - but a bit of incremental cleanup will still be helpful for the next patches. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_btree_node_lock_write_nofail()Kent Overstreet1-2/+2
Taking a write lock will be able to fail, with the new cycle detector - unless we pass it nofail, which is possible but not preferred. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: New locking functionsKent Overstreet1-43/+52
In the future, with the new deadlock cycle detector, we won't be using bare six_lock_* anymore: lock wait entries will all be embedded in btree_trans, and we will need a btree_trans context whenever locking a btree node. This patch plumbs a btree_trans to the few places that need it, and adds two new locking functions - btree_node_lock_nopath, which may fail returning a transaction restart, and - btree_node_lock_nopath_nofail, to be used in places where we know we cannot deadlock (i.e. because we're holding no other locks). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Add persistent counters for all tracepointsKent Overstreet1-11/+10
Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix bch2_btree_update_start() to return ↵Kent Overstreet1-0/+5
-BCH_ERR_journal_reclaim_would_deadlock On failure to get a journal pre-reservation because we're called from journal reclaim we're not supposed to return a transaction restart error - this fixes a livelock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Improve trans_restart_journal_preres_get tracepointKent Overstreet1-1/+1
It now includes journal_flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>