path: root/fs/bcachefs/io.c
2023-10-23  bcachefs: Rename enum alloc_reserve -> bch_watermark  [Kent Overstreet]  1 file, -2/+2
This is prep work for consolidating with JOURNAL_WATERMARK. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Fix more lockdep splats in debug.c  [Kent Overstreet]  1 file, -1/+1
Similar to previous fixes, we can't incur page faults while holding btree locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Delete warning from promote_alloc()  [Kent Overstreet]  1 file, -3/+4
It's possible to see a -BCH_ERR_ENOSPC_disk_reservation here, and that's fine. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Fix error handling in promote path  [Kent Overstreet]  1 file, -2/+4
The promote path had a BUG_ON() for unknown error type, which we're now seeing: change it to a WARN_ON() - because we're curious what this is - and otherwise handle it in the normal error path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: GFP_NOIO -> GFP_NOFS  [Kent Overstreet]  1 file, -10/+10
GFP_NOIO dates from the bcache days, when we operated under the block layer. Now, GFP_NOFS is more appropriate, so switch all GFP_NOIO uses to GFP_NOFS. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
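For illustration, each conversion is a mechanical flag swap; a representative (hypothetical) hunk is below - GFP_NOFS still permits block-layer IO, it only forbids recursion back into the filesystem:

	-	bio = bio_alloc_bioset(NULL, nr_vecs, REQ_OP_WRITE, GFP_NOIO, &c->bio_write);
	+	bio = bio_alloc_bioset(NULL, nr_vecs, REQ_OP_WRITE, GFP_NOFS, &c->bio_write);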
2023-10-23  bcachefs: bch2_bkey_make_mut() now calls bch2_trans_update()  [Kent Overstreet]  1 file, -1/+1
It's safe to call bch2_trans_update() with a k/v pair where the value hasn't been filled out, as long as the key part has been filled out and the value is filled out by transaction commit time. This patch folds the bch2_trans_update() call into bch2_bkey_make_mut(), eliminating a bit of boilerplate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
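A sketch of the boilerplate this folds away (the iterator and flags arguments here are illustrative, not the exact signature):

	/* before: callers paired the two calls by hand */
	new = bch2_bkey_make_mut(trans, k);
	ret = PTR_ERR_OR_ZERO(new) ?:
		bch2_trans_update(trans, &iter, new, 0);

	/* after: bch2_bkey_make_mut() queues the update itself */
	new = bch2_bkey_make_mut(trans, &iter, k);
	ret = PTR_ERR_OR_ZERO(new);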
2023-10-23  bcachefs: bch2_bkey_get_mut() now calls bch2_trans_update()  [Kent Overstreet]  1 file, -1/+1
It's safe to call bch2_trans_update() with a k/v pair where the value hasn't been filled out, as long as the key part has been filled out and the value is filled out by transaction commit time. This patch folds the bch2_trans_update() call into bch2_bkey_get_mut(), eliminating a bit of boilerplate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: bch2_bkey_get_mut() improvements  [Kent Overstreet]  1 file, -7/+6
- bch2_bkey_get_mut() now handles types increasing in size, allocating a buffer for the type's current size when necessary
- bch2_bkey_make_mut_typed()
- bch2_bkey_get_mut() now initializes the iterator, like bch2_bkey_get_iter()
Also, refactor so that most of the code is in functions - now macros are only used for wrappers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: bch2_bkey_get_iter() helpers  [Kent Overstreet]  1 file, -7/+4
Introduce new helpers for a common pattern:
  bch2_trans_iter_init();
  bch2_btree_iter_peek_slot();
- bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of the correct type
- bch2_bkey_get_val_typed() copies the val out of the btree to a (typically stack allocated) variable; it handles the case where the value in the btree is smaller than the current version of the type, zeroing out the remainder.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
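For reference, a sketch of the old pattern next to the new helper (btree id and flags are illustrative):

	/* old: two calls, plus error handling */
	bch2_trans_iter_init(trans, &iter, BTREE_ID_extents, pos, BTREE_ITER_SLOTS);
	k = bch2_btree_iter_peek_slot(&iter);
	ret = bkey_err(k);

	/* new: init + peek in one call */
	k = bch2_bkey_get_iter(trans, &iter, BTREE_ID_extents, pos, BTREE_ITER_SLOTS);
	ret = bkey_err(k);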
2023-10-23  bcachefs: Fix nocow write path closure bug  [Kent Overstreet]  1 file, -4/+4
With regular waitlists, we need to ensure we always call finish_wait(). With closures, the equivalent is that we need to call closure_sync() before returning with a stack-allocated closure. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
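The pattern, as a minimal sketch - closure_sync() is the closure equivalent of finish_wait():

	struct closure cl;

	closure_init_stack(&cl);

	/* async work takes a ref with closure_get(&cl) and drops it
	 * with closure_put(&cl) on completion */

	/* must wait for all refs to drop before cl goes out of scope: */
	closure_sync(&cl);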
2023-10-23  bcachefs: Nocow write error path fix  [Kent Overstreet]  1 file, -11/+7
The nocow write error path was iterating over pointers in an extent, after we'd dropped btree locks - oops. Fortunately we'd already stashed what we need in nocow_lock_bucket, so use that instead. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Fix bch2_extent_fallocate() in nocow mode  [Kent Overstreet]  1 file, -0/+7
When we allocate disk space, we need to be incrementing the WRITE io clock, which perhaps should be renamed to sectors allocated - copygc uses this io clock to know when to run. Also, we should be incrementing the same clock when allocating btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: New erasure coding shutdown path  [Kent Overstreet]  1 file, -2/+8
This implements a new shutdown path for erasure coding, which is needed for the upcoming BCH_WRITE_WAIT_FOR_EC write path. The process is:
- Cancel new stripes being built up
- Close out/cancel open buckets on write points or the partial list that are for stripes
- Shutdown rebalance/copygc
- Then wait for in flight new stripes to finish
With BCH_WRITE_WAIT_FOR_EC, move ops will be waiting on stripes to fill up before they complete; the new ec shutdown path is needed for shutting down copygc/rebalance without deadlocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: bch2_fs_moving_ctxts_to_text()  [Kent Overstreet]  1 file, -0/+28
This also adds bch2_write_op_to_text(): now we can see outstanding moves, useful for debugging shutdown with the upcoming BCH_WRITE_WAIT_FOR_EC and likely for other things in the future. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Private error codes: ENOMEM  [Kent Overstreet]  1 file, -9/+17
This adds private error codes for most (but not all) of our ENOMEM uses, which makes it easier to track down assorted allocation failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
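Usage shape, as a sketch (the specific code name below is hypothetical; private codes still match ENOMEM via bch2_err_matches() and unwrap to plain -ENOMEM for userspace via bch2_err_class()):

	buf = kmalloc(size, GFP_NOFS);
	if (!buf)
		return -BCH_ERR_ENOMEM_promote_alloc;	/* hypothetical code name */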
2023-10-23  bcachefs: Kill bch_write_op->btree_update_ready  [Kent Overstreet]  1 file, -24/+13
This changes the write path to not add write ops to the write_point's list of pending work items until it's ready; this means we have to change the lock protecting it to an irq-safe lock, but means bch2_write_point_do_index_updates() no longer has to iterate over the list, which is beneficial with the way the new BCH_WRITE_WAIT_FOR_EC code works. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  fixup bcachefs: Use for_each_btree_key_upto() more consistently  [Kent Overstreet]  1 file, -1/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Free move buffers as early as possible  [Kent Overstreet]  1 file, -0/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Drop some anonymous structs, unions  [Kent Overstreet]  1 file, -1/+1
Rust bindgen doesn't cope well with anonymous structs and unions. This patch drops the fancy anonymous structs & unions in bkey_i that let us use the same helpers for bkey_i and bkey_packed; since bkey_packed is an internal type that's never exposed to outside code, it's only a minor inconvenience. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: bch2_write_queue()  [Kent Overstreet]  1 file, -9/+13
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Add option for completely disabling nocow  [Kent Overstreet]  1 file, -1/+1
This adds an option for completely disabling nocow mode, including the locking in the data move path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: don't block reads if we're promoting  [Daniel Hill]  1 file, -0/+7
The promote path calls data_update_init(), and now that we take locks here, there's potential for promote to block our read path. Just error when we can't take the lock instead of blocking. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Fix promote path leak  [Kent Overstreet]  1 file, -2/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Fix deadlock on nocow locks in data move path  [Kent Overstreet]  1 file, -13/+15
The recent nocow locking rework introduced a deadlock in the data move path: the new nocow locking scheme uses a hash table with a fixed size array for chaining, meaning on hash collision we may have to wait for other locks to be released before we can lock a bucket. And since the data move path needs to submit writes from the same thread that's taking nocow locks and submitting reads, this introduces a deadlock. This shouldn't happen often in practice, but since the data move path can keep large numbers of IOs in flight simultaneously, it's something we have to handle. This patch makes move_ctxt_wait_event() available to bch2_data_update_init() and uses it when appropriate, which is our normal solution to this kind of thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Improved nocow locking  [Kent Overstreet]  1 file, -10/+8
This improves the nocow lock table so that hash table entries have multiple locks, and locks specify which bucket they're for - i.e. we can now resolve hash collisions. This is important because the allocator has to skip buckets that are locked in the nocow lock table, and previously hash collisions would cause it to spuriously skip unlocked buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
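The assumed shape of a hash table entry after this change - several lock slots per entry, each tagged with the bucket it currently covers (field names and sizes here are a sketch, not the exact in-tree definitions):

	struct nocow_lock_bucket {
		spinlock_t	lock;
		u64		b[4];	/* which bucket each slot covers */
		atomic_t	l[4];	/* two-state reader/writer count per slot */
	};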
2023-10-23  bcachefs: Nocow support  [Kent Overstreet]  1 file, -13/+439
This adds support for nocow mode, where we do writes in-place when possible. Patch components:
- New boolean filesystem and inode option, nocow: note that when nocow is enabled, data checksumming and compression are implicitly disabled
- To prevent in-place writes from racing with data moves (data_update.c) or bucket reuse (i.e. a bucket being reused and re-allocated while a nocow write is in flight), we have a new locking mechanism. Buckets can be locked for either data update or data move, using a fixed size hash table of two_state_shared locks. We don't have any chaining, meaning updates and moves to different buckets that hash to the same lock will wait unnecessarily - we'll want to watch for this becoming an issue.
- The allocator path also needs to check for in-place writes in flight to a given bucket before giving it out: thus we add another counter to bucket_alloc_state so we can track this.
- Fsync now may need to issue cache flushes to block devices instead of flushing the journal. We add a device bitmask to bch_inode_info, ei_devs_need_flush, which tracks devices that need to have flushes issued - note that this will lead to unnecessary flushes when other codepaths have already issued flushes; we may want to replace this with a sequence number.
- New nocow write path: look up extents, and if they're writable write to them - otherwise fall back to the normal COW write path.
XXX: switch to sequence numbers instead of bitmask for devs needing journal flush
XXX: ei_quota_lock being a mutex means bch2_nocow_write_done() needs to run in process context - see if we can improve this
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Unwritten extents support  [Kent Overstreet]  1 file, -0/+3
- bch2_extent_merge checks unwritten bit
- read path returns 0s for unwritten extents without actually reading
- reflink path skips over unwritten extents
- bch2_bkey_ptrs_invalid() checks for extents with both written and unwritten extents, and non-normal extents (stripes, btree ptrs) with unwritten ptrs
- fiemap checks for unwritten extents and returns FIEMAP_EXTENT_UNWRITTEN
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: bch2_extent_update_i_size_sectors()  [Kent Overstreet]  1 file, -54/+60
In the io path, when we do the extent update we also have to update the inode - for i_size and i_sectors updates, as well as for bi_journal_seq for fsync. This factors that out into a new helper which will be used in the new nocow mode, in the unwritten extent conversion path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: bch2_extent_fallocate()  [Kent Overstreet]  1 file, -0/+30
This factors out part of __bchfs_fallocate() in fs-io.c into a new, lower level io.c helper, which creates a single extent reservation. This is prep work for nocow support - the new helper will shortly gain the ability to create unwritten extents. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Skip inode unpack/pack in bch2_extent_update()  [Kent Overstreet]  1 file, -43/+45
This takes advantage of the new inode type to skip the expensive pack/unpack when inode updates are required in the extent update path. Additionally, we now skip the inode update entirely when i_sectors and i_size aren't changing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Drop old maybe_extending optimization  [Kent Overstreet]  1 file, -32/+2
The extend update path had an optimization to avoid updating the inode if we knew we were definitely not extending the file. But now that we're updating inodes on every extent update - for fsync - that code can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-23  bcachefs: KEY_TYPE_inode_v3, metadata_version_inode_v3  [Kent Overstreet]  1 file, -1/+1
Move bi_size and bi_sectors into the non-varint portion of the inode, so that the write path can update them without going through the relatively expensive unpack/pack operations. Other changes:
- Add a field for the offset of the varint section, so we can add new non-varint fields without needing a new inode type, like alloc_v3
- Move bi_mode into the flags field, so that the varint section can be u64 aligned
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Delete in memory ec backpointers  [Kent Overstreet]  1 file, -4/+0
Post btree backpointers, these aren't needed anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: trans->notrace_relock_fail  [Kent Overstreet]  1 file, -0/+6
When we unlock in order to submit IO, the next relock event is likely to fail if submit_bio() blocked - we shouldn't count those events in our _fail stats, since those are expected events and shouldn't cause test failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Debug mode for c->writes references  [Kent Overstreet]  1 file, -5/+5
This adds a debug mode where we split up the c->writes refcount into distinct refcounts for every codepath that takes a reference, and adds sysfs code to print the value of each ref. This will make it easier to debug shutdown hangs due to refcount leaks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
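The assumed shape of the debug mode (enum and config names here are illustrative): instead of one percpu_ref, keep one counter per codepath so a leaked reference points at the culprit:

	#ifdef BCH_WRITE_REF_DEBUG
		atomic_long_t		writes[BCH_WRITE_REF_NR];
	#else
		struct percpu_ref	writes;
	#endif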
2023-10-23  bcachefs: Use for_each_btree_key_upto() more consistently  [Kent Overstreet]  1 file, -4/+5
It's important that in BTREE_ITER_FILTER_SNAPSHOTS mode we always use peek_upto() and provide an end for the interval we're searching for - otherwise, when we hit the end of the inode the next inode may be in a different subvolume and not have any keys in the current snapshot, and we'd iterate over arbitrarily many keys before returning one. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
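Usage sketch - the end position bounds the iteration to the current inode (btree id and flags illustrative):

	for_each_btree_key_upto(trans, iter, BTREE_ID_extents,
				POS(inum, start),
				POS(inum, U64_MAX),
				BTREE_ITER_SLOTS, k, ret) {
		/* k is guaranteed not to be past the end position */
	}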
2023-10-23  bcachefs: Better inlining in core write path  [Kent Overstreet]  1 file, -2/+2
Provide inline versions of some allocation functions:
- bch2_alloc_sectors_done_inlined()
- bch2_alloc_sectors_append_ptrs_inlined()
and use them in the core IO path. Also, inline bch2_extent_update_i_size_sectors() and bch2_bkey_append_ptr(). In the core write path, function call overhead matters - every function call is a jump to a new location and a potential cache miss. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
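The usual shape of this change, as a sketch: the hot path gets a static inline copy in the header, and the original symbol becomes a thin out-of-line wrapper for everyone else:

	static inline void bch2_alloc_sectors_done_inlined(struct bch_fs *c,
							   struct write_point *wp)
	{
		/* body moved here from the .c file */
	}

	void bch2_alloc_sectors_done(struct bch_fs *c, struct write_point *wp)
	{
		bch2_alloc_sectors_done_inlined(c, wp);
	}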
2023-10-23  bcachefs: Convert EAGAIN errors to private error codes  [Kent Overstreet]  1 file, -1/+1
More error code cleanup, for better error messages and debuggability. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Convert EROFS errors to private error codes  [Kent Overstreet]  1 file, -1/+1
More error code improvements - this gets us more useful error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Suppress -EROFS messages when shutting down  [Kent Overstreet]  1 file, -2/+4
This isn't actually an error condition, this just indicates a normal shutdown - no reason for these to be in the log. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
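The likely shape of the check, sketched with bcachefs's error helpers:

	if (ret && !bch2_err_matches(ret, EROFS))
		bch_err(c, "write error: %s", bch2_err_str(ret));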
2023-10-23  bcachefs: New bpos_cmp(), bkey_cmp() replacements  [Kent Overstreet]  1 file, -4/+4
This patch introduces:
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()
and equivalent replacements for bkey_cmp(). Looking at the generated assembly these could probably be improved further, but we already see a significant code size improvement with this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
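A sketch of what these boil down to (the in-tree versions may be written differently for better codegen):

	static inline bool bpos_eq(struct bpos l, struct bpos r)
	{
		return l.inode == r.inode &&
		       l.offset == r.offset &&
		       l.snapshot == r.snapshot;
	}

	static inline bool bpos_lt(struct bpos l, struct bpos r)
	{
		return bpos_cmp(l, r) < 0;	/* conceptually */
	}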
2023-10-23  bcachefs: Kill BCH_FEATURE_incompressible  [Kent Overstreet]  1 file, -6/+1
This isn't needed anymore, we only support metadata versions that have this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Error message improvement  [Kent Overstreet]  1 file, -17/+35
- Centralize format strings in bcachefs.h
- Add bch2_fmt_inum_offset() and related helpers
- Switch error messages for inodes to also print out the offset, in bytes
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Kill BCH_WRITE_FLUSH  [Kent Overstreet]  1 file, -35/+9
BCH_WRITE_FLUSH is a write flag that causes a journal flush. It's only used in the direct IO path, and this will allow for some consolidation with the regular fsync path, which will help with the upcoming nocow mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Kill bch2_alloc_sectors_start()  [Kent Overstreet]  1 file, -13/+13
Only used in one place, just inline it there. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: BCH_WRITE_SYNC  [Kent Overstreet]  1 file, -30/+51
This adds a new flag for the write path, BCH_WRITE_SYNC, and switches the O_DIRECT write path to use it when we're not running asynchronously. It runs the btree update after the write in the original thread's context instead of a kworker, cutting context switches in half. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
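Roughly, write completion now branches on the new flag (a sketch; helper names assumed from context):

	if (op->flags & BCH_WRITE_SYNC) {
		/* wait for the data write, then run the btree update
		 * in the submitting thread's context */
		closure_sync(&op->cl);
		__bch2_write_index(op);
	} else {
		/* async: punt the index update to a kworker, as before */
		continue_at(&op->cl, bch2_write_index, index_update_wq(op));
	}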
2023-10-23  bcachefs: Kill BCH_WRITE_JOURNAL_SEQ_PTR  [Kent Overstreet]  1 file, -2/+2
Dead code, delete. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: fix bch2_write_extent() crc corruption.  [Daniel Hill]  1 file, -2/+10
crc.compression_type & nonce get reset inside bch2_rechecksum_bio(); we set them back to the previously calculated values. This fixes incompressible extents being marked as uncompressed. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
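The shape of the fix, as a sketch - carry the original fields over onto the freshly computed crc:

	/* bch2_rechecksum_bio() computes a new crc and resets
	 * compression_type/nonce; restore the originals: */
	new_crc.compression_type = crc.compression_type;
	new_crc.nonce		 = crc.nonce;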
2023-10-23  bcachefs: Factor out bch2_write_drop_io_error_ptrs()  [Kent Overstreet]  1 file, -13/+26
Move slowpath code to a separate, non-inline function. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23  bcachefs: Add persistent counters for all tracepoints  [Kent Overstreet]  1 file, -5/+5
Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>