summaryrefslogtreecommitdiff
path: root/fs/bcachefs/movinggc.c
AgeCommit message (Collapse)AuthorFilesLines
2024-01-01bcachefs: for_each_member_device() now declares loop iterKent Overstreet1-5/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: for_each_btree_key() now declares loop iterKent Overstreet1-2/+0
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: for_each_btree_key_upto() -> for_each_btree_key_old_upto()Kent Overstreet1-1/+1
And for_each_btree_key2_upto -> for_each_btree_key_upto Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: darray_for_each() now declares loop iterKent Overstreet1-1/+0
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch_err_(fn|msg) check if should printKent Overstreet1-4/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: copygc shouldn't try moving buckets on errorDaniel Hill1-4/+12
Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: copygc should wakeup on shutdown if disabledDaniel Hill1-1/+2
Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: remove dead bch2_evacuate_bucket()Daniel Hill1-1/+1
Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_write_buffer_flush() -> bch2_btree_write_buffer_tryflush()Kent Overstreet1-2/+2
More accurate naming. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Clean up btree write buffer write ref handlingKent Overstreet1-0/+3
__bch2_btree_write_buffer_flush() now assumes a write ref is already held (as called by the transaction commit path); and the wrappers bch2_write_buffer_flush() and flush_sync() take an explicit write ref. This means internally the write buffer code can always use BTREE_INSERT_NOCHECK_RW, instead of in the previous code passing flags around and hoping the NOCHECK_RW flag was always carried around correctly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: New bucket sector count helpersKent Overstreet1-1/+1
This introduces bch2_bucket_sectors() and bch2_bucket_sectors_dirty(), prep work for separately accounting stripe sectors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-29bcachefs: Extra kthread_should_stop() calls for copygcKent Overstreet1-1/+1
This fixes a bug where going read-only was taking longer than it should have due to copygc forgetting to check kthread_should_stop() Additionally: fix a missing is_kthread check in bch2_move_ratelimit(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-05bcachefs: fix odebug warn and lockdep splat due to on-stack rhashtableBrian Foster1-10/+14
Guenter Roeck reports a lockdep splat and DEBUG_OBJECTS_WORK related warning when bch2_copygc_thread() initializes its rhashtable. The lockdep splat relates to a warning print caused by the fact that the rhashtable exists on the stack but is not annotated as so. This is something that could be addressed by INIT_WORK_ONSTACK(), but rhashtable doesn't expose that control and probably isnt worth the churn for just one user. Instead, dynamically allocate the buckets_in_flight structure and avoid the splat that way. Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-05bcachefs: Data move path now uses bch2_trans_unlock_long()Kent Overstreet1-2/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-04bcachefs: Ensure copygc does not spinKent Overstreet1-2/+18
If copygc does no work - finds no fragmented buckets - wait for a bit of IO to happen. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: move: move_stats refactoringKent Overstreet1-0/+1
data_progress_list is gone - it was redundant with moving_context_list The upcoming rebalance rewrite is going to have it using two different move_stats objects with the same moving_context, depending on whether it's scanning or using the rebalance_work btree - this patch plumbs stats around a bit differently so that will work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: moving_context now owns a btree_transKent Overstreet1-20/+16
btree_trans and moving_context are used together, and having the moving_context owns the transaction object reduces some plumbing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Heap allocate btree_transKent Overstreet1-9/+9
We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix W=12 build errorsKent Overstreet1-13/+13
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix -Wcompare-distinct-pointer-types in bch2_copygc_get_buckets()Nathan Chancellor1-1/+1
When building bcachefs for 32-bit ARM, there is a warning when using max() to compare an expression involving 'size_t' with an 'unsigned long' literal: fs/bcachefs/movinggc.c:159:21: error: comparison of distinct pointer types ('typeof (16UL) *' (aka 'unsigned long *') and 'typeof (buckets_in_flight->nr / 4) *' (aka 'unsigned int *')) [-Werror,-Wcompare-distinct-pointer-types] 159 | size_t nr_to_get = max(16UL, buckets_in_flight->nr / 4); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/minmax.h:76:19: note: expanded from macro 'max' 76 | #define max(x, y) __careful_cmp(x, y, >) | ^~~~~~~~~~~~~~~~~~~~~~ include/linux/minmax.h:38:24: note: expanded from macro '__careful_cmp' 38 | __builtin_choose_expr(__safe_cmp(x, y), \ | ^~~~~~~~~~~~~~~~ include/linux/minmax.h:28:4: note: expanded from macro '__safe_cmp' 28 | (__typecheck(x, y) && __no_side_effects(x, y)) | ^~~~~~~~~~~~~~~~~ include/linux/minmax.h:22:28: note: expanded from macro '__typecheck' 22 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~ 1 error generated. On 64-bit architectures, size_t is 'unsigned long', so there is no warning when comparing these two expressions. Use max_t(size_t, ...) for this situation, eliminating the warning. Fixes: dd49018737d4 ("bcachefs: Rhashtable based buckets_in_flight for copygc") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Break up io.cKent Overstreet1-8/+0
More reorganization, this splits up io.c into - io_read.c - io_misc.c - fallocate, fpunch, truncate - io_write.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Convert more code to bch_err_msg()Kent Overstreet1-4/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix for bch2_copygc() spuriously returning -EEXISTKent Overstreet1-1/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Kill BTREE_INSERT_USE_RESERVEKent Overstreet1-1/+1
Now that we have journal watermarks and alloc watermarks unified, BTREE_INSERT_USE_RESERVE is redundant and can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Kill JOURNAL_WATERMARKKent Overstreet1-1/+1
This unifies JOURNAL_WATERMARK with BCH_WATERMARK; we're working towards specifying watermarks once in the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Add a missing rhashtable_destroy() callKent Overstreet1-0/+1
Fixes https://lore.kernel.org/linux-bcachefs/784c3e6a-75bd-e6ca-535a-43b3e1daf643@kernel.dk/T/#mbf7caf005f960018eba23b58795d06c06c947411 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Rename enum alloc_reserve -> bch_watermarkKent Overstreet1-1/+1
This is prep work for consolidating with JOURNAL_WATERMARK. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Convert -ENOENT to private error codesKent Overstreet1-1/+1
As with previous conversions, replace -ENOENT uses with more informative private error codes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_bkey_get_iter() helpersKent Overstreet1-13/+3
Introduce new helpers for a common pattern: bch2_trans_iter_init(); bch2_btree_iter_peek_slot(); - bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of the correct type - bch2_bkey_get_val_typed() copies the val out of the btree to a (typically stack allocated) variable; it handles the case where the value in the btree is smaller than the current version of the type, zeroing out the remainder. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Mark bch2_copygc() noinlineKent Overstreet1-0/+1
This works around a "stack from too large" error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Kill bch2_verify_bucket_evacuated()Kent Overstreet1-7/+0
With backpointers, it's now impossible for bch2_evacuate_bucket() to be completely reliable: it can race with an extent being partially overwritten or split, which needs a new write buffer flush for the backpointer to be seen. This shouldn't be a real issue in practice; the previous patch added a new tracepoint so we'll be able to see more easily if it is. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Rhashtable based buckets_in_flight for copygcKent Overstreet1-85/+127
Previously, copygc used a fifo for tracking buckets in flight - this had the disadvantage of being fixed size, since we pass references to elements into the move code. This restructures it to be a hash table and linked list, since with erasure coding we need to be able to pipeline across an arbitrary number of buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Improved copygc wait debuggingKent Overstreet1-1/+9
This just adds a line for how long copygc has been waiting to sysfs copygc_wait, helpful for debugging why copygc isn't running. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix an assert in copygc thread shutdown pathKent Overstreet1-1/+1
We're not supposed to have nested (locked) btree_trans on the stack: this means copygc shutdown needs to exit our btree_trans before exiting the move_ctxt, which calls bch2_write(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_bucket_is_movable() -> BTREE_ITER_CACHEDKent Overstreet1-1/+1
BTREE_ITER_CACHED should really be the default for cached btrees - this is an easy mistake to make. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Improved copygc pipeliningKent Overstreet1-38/+153
This improves copygc pipelining across multiple buckets: we now track each in flight bucket we're evacuating, with separate moving_contexts. This means that whereas previously we had to wait for outstanding moves to complete to ensure we didn't try to evacuate the same bucket twice, we can now just check buckets we want to evacuate against the pending list. This also mean we can run the verify_bucket_evacuated() check without killing pipelining - meaning it can now always be enabled, not just on debug builds. This is going to be important for the upcoming erasure coding work, where moving IOs that are being erasure coded will now skip the initial replication step; instead the IOs will wait on the stripe to complete. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Mark stripe buckets with correct data typeKent Overstreet1-3/+7
Currently, we don't use bucket data type for tracking whether buckets are part of a stripe; parity buckets are BCH_DATA_parity, but data buckets in a stripe are BCH_DATA_user. There's a separate counter, buckets_ec, outside the BCH_DATA_TYPES system for tracking number of buckets on a device that are part of a stripe. The trouble with this approach is that it's too coarse grained, and we need better information on fragmentation for debugging copygc. With this patch, data buckets in a stripe are now tracked as BCH_DATA_stripe buckets. This doesn't yet differentiate between erasure coded and non-erasure coded data in a stripe bucket, nor do we yet track empty data buckets in stripes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_copygc_wait_to_text()Kent Overstreet1-0/+12
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fragmentation LRUKent Overstreet1-101/+70
Now that we have much more efficient updates to the LRU btree, this patch adds a new LRU that indexes buckets by fragmentation. This means copygc no longer has to scan every bucket to find buckets that need to be evacuated. Changes: - A new field in bch_alloc_v4, fragmentation_lru - this corresponds to the bucket's position in the fragmentation LRU. We add a new field for this instead of calculating it as needed because we may make the fragmentation LRU optional; this field indicates whether a bucket is on the fragmentation LRU. Also, zoned devices will introduce variable bucket sizes; explicitly recording the LRU position will be safer for them. - A new copygc path for using the fragmentation LRU instead of scanning every bucket and building up an in-memory heap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Copygc now uses backpointersKent Overstreet1-206/+30
Previously, copygc needed to walk the entire extents & reflink btrees to find extents that needed to be moved. Now that we have backpointers, this patch implements bch2_evacuate_bucket() in the move code, which copygc now uses for evacuating mostly empty buckets. Also, thanks to the new backpointers code, copygc can now move btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Better inlining for bch2_alloc_to_v4_mutKent Overstreet1-8/+9
This separates out the slowpath into a separate function, and inlines bch2_alloc_v4_mut into bch2_trans_start_alloc_update(), the main place it's called. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Convert EROFS errors to private error codesKent Overstreet1-1/+1
More error code improvements - this gets us more useful error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Suppress -EROFS messages when shutting downKent Overstreet1-1/+1
This isn't actually an error condition, this just indicates a normal shutdown - no reason for these to be in the log. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fixes for building in userspaceKent Overstreet1-1/+1
- Marking a non-static function as inline doesn't actually work and is now causing problems - drop that - Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages with bcachefs (filesystem name) - Userspace doesn't have real percpu variables (maybe we can get this fixed someday), put an #ifdef around bch2_disk_reservation_add() fastpath Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Add persistent counters for all tracepointsKent Overstreet1-2/+2
Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Use bch2_err_str() in error messagesKent Overstreet1-4/+7
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: move.c refactoringKent Overstreet1-3/+3
- add bch2_moving_ctxt_(init|exit) - split out __bch2_evacutae_bucket() which takes an existing moving_ctxt, this will be used for improving copygc performance by pipelining across multiple buckets Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: data jobs, including rebalance wait for copygc.Daniel Hill1-4/+11
move_ratelimit() now has a bool that specifies whether we want to wait for copygc to finish. When copygc is running, we're probably low on free buckets instead of consuming the remaining buckets, we want to wait for copygc to finish. This should help with performance, and run away bucket fragmentation. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Redo data_update interfaceKent Overstreet1-32/+31
This patch significantly cleans up and simplifies the data_update interface. Instead of only being able to specify a single pointer by device to rewrite, we're now able to specify any or all of the pointers in the original extent to be rewrited, as a bitmask. data_cmd is no more: the various pred functions now just return true if the extent should be moved/updated. All the data_update path does is rewrite existing replicas, or add new ones. This fixes a bug where with background compression on replicated filesystems, where rebalance -> data_update would incorrectly drop the wrong old replica, and keep trying to recompress an extent pointer and each time failing to drop the right replica. Oops. Now, the data update path doesn't look at the io options to decide which pointers to keep and which to drop - it only goes off of the data_update_options passed to it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23bcachefs: Improve "copygc requested to run" error messageKent Overstreet1-1/+22
This improves the "copygc requested to run but no buckets found" to show the device that requires copygc to be run on - we'll definitely need to improve this more. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>