summaryrefslogtreecommitdiff
path: root/fs/bcachefs/super.c
AgeCommit message (Collapse)AuthorFilesLines
2024-01-01bcachefs: for_each_member_device_rcu() now declares loop iterKent Overstreet1-16/+9
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: for_each_member_device() now declares loop iterKent Overstreet1-23/+12
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: add more verbose loggingKent Overstreet1-4/+5
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: darray_for_each() now declares loop iterKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch_err_(fn|msg) check if should printKent Overstreet1-48/+31
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: btree write buffer now slurps keys from journalKent Overstreet1-1/+2
Previosuly, the transaction commit path would have to add keys to the btree write buffer as a separate operation, requiring additional global synchronization. This patch introduces a new journal entry type, which indicates that the keys need to be copied into the btree write buffer prior to being written out. We switch the journal entry type back to JSET_ENTRY_btree_keys prior to write, so this is not an on disk format change. Flushing the btree write buffer may require pulling keys out of journal entries yet to be written, and quiescing outstanding journal reservations; we previously added journal->buf_lock for synchronization with the journal write path. We also can't put strict bounds on the number of keys in the journal destined for the write buffer, which means we might overflow the size of the preallocated buffer and have to reallocate - this introduces a potentially fatal memory allocation failure. This is something we'll have to watch for, if it becomes an issue in practice we can do additional mitigation. The transaction commit path no longer has to explicitly check if the write buffer is full and wait on flushing; this is another performance optimization. Instead, when the btree write buffer is close to full we change the journal watermark, so that only reservations for journal reclaim are allowed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: BCH_IOCTL_FSCK_ONLINEKent Overstreet1-0/+1
This adds a new ioctl for running fsck on a mounted, in use filesystem. This reuses the fsck_thread code from the previous patch for running fsck on an offline, unmounted filesystem, so that log messages for the fsck thread are redirected to userspace. Only one running fsck instance is allowed at a time; a new semaphore (since the lock will be taken by one thread and released by another) is added for this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Add ability to redirect log outputKent Overstreet1-0/+28
Upcoming patches are going to add two new ioctls for running fsck in the kernel, but pretending that we're running our normal userspace fsck. This patch adds some plumbing for redirecting our normal log messages away from the dmesg log to a thread_with_file file descriptor - via a struct log_output, which will be consumed by the fsck f_op's read method. The new ioctls will allow for running fsck in the kernel against an offline filesystem (without mounting it), and an online filesystem. For an offline filesystem we need a way to pass in a pointer to the log_output, which is done via a new hidden opts.h option. For online fsck, we can set c->output directly, but only want to redirect log messages from the thread running fsck - hence the new c->output_filter method. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: c->ro_refKent Overstreet1-0/+6
Add a new refcount for async ops that don't necessarily need the fs to be RW, with similar lifetime/rules otherwise as c->writes. To be used by online fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: convert bch_fs_flags to x-macroKent Overstreet1-28/+35
Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: track_event_change()Kent Overstreet1-1/+2
This introduces a new helper for connecting time_stats to state changes, i.e. when taking journal reservations is blocked for some reason. We use this to track separately the different reasons the journal might be blocked - i.e. space in the journal full, or the journal pin fifo full. Also do some cleanup and improvements on the time stats code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Add extra verbose logging for ro pathKent Overstreet1-2/+12
Also log time waiting for c->writes references to be dropped; this will help in debugging why unmounts are taking longer than they should. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-14bcachefs: improve modprobe support by providing softdepsDaniel Hill1-0/+6
We need to help modprobe load architecture specific modules so we don't fall back to generic software implementations, this should help performance when building as a module. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-14bcachefs: fix invalid memory access in bch2_fs_alloc() error pathThomas Bertschinger1-0/+1
When bch2_fs_alloc() gets an error before calling bch2_fs_btree_iter_init(), bch2_fs_btree_iter_exit() makes an invalid memory access because btree_trans_list is uninitialized. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Fixes: 6bd68ec266ad ("bcachefs: Heap allocate btree_trans") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Proper refcounting for journal_keysKent Overstreet1-2/+4
The btree iterator code overlays keys from the journal until journal replay is finished; since we're now starting copygc/rebalance etc. before replay is finished, this is multithreaded access and thus needs refcounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Start gc, copygc, rebalance threads after initing writes refKent Overstreet1-12/+16
This fixes a bug where copygc would occasionally race with going read-write and die, thinking we were read only, because it couldn't take a ref on c->writes. It's not necessary for copygc (or rebalance, or copygc) to take write refs; they could run with BCH_TRANS_COMMIT_nocheck_rw, but this is an easier fix that making sure that flag is passed correctly everywhere. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-05bcachefs: Convert bch2_fs_open() to darrayKent Overstreet1-32/+28
Open coded dynamic arrays are deprecated. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-02bcachefs: bch_sb_field_errorsKent Overstreet1-4/+8
Add a new superblock section to keep counts of errors seen since filesystem creation: we'll be addingcounters for every distinct fsck error. The new superblock section has entries of the for [ id, count, time_of_last_error ]; this is intended to let us see what errors are occuring - and getting fixed - via show-super output. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-02bcachefs: Add IO error counts to bch_memberKent Overstreet1-0/+5
We now track IO errors per device since filesystem creation. IO error counts can be viewed in sysfs, or with the 'bcachefs show-super' command. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-02bcachefs: Fix a kasan splat in bch2_dev_add()Kent Overstreet1-10/+2
This fixes a use after free - mi is dangling after the resize call. Additionally, resizing the device's member info section was useless - we were attempting to preallocate the space required before adding it to the filesystem superblock, but there's other sections that we should have been preallocating as well for that to work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: bch2_disk_path_to_text() no longer takes sb_lockKent Overstreet1-1/+1
We're going to be using bch2_target_to_text() -> bch2_disk_path_to_text() from bch2_bkey_ptrs_to_text() and bch2_bkey_ptrs_invalid(), which can be called in any context. This patch adds the actual label to bch_disk_group_cpu so that it can be used by bch2_disk_path_to_text, and splits out bch2_disk_path_to_text() into two variants - like the previous patch, one for when we have a running filesystem and another for when we only have a superblock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: Ensure devices are always correctly initializedKent Overstreet1-13/+17
We can't mark device superblocks or allocate journal on a device that isn't online. That means we may need to do this on every mount, because we may have formatted a new filesystem and then done the first mount (bch2_fs_initialize()) in degraded mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: Delete duplicate time stats initializationKent Overstreet1-6/+0
This code duplicated initialization already done in bch2_fs_btree_iter_init(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: snapshot_create_lockKent Overstreet1-0/+1
Add a new lock for snapshot creation - this addresses a few races with logged operations and snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: bch2_sb_field_get() refactoringKent Overstreet1-3/+3
Instead of using token pasting to generate methods for each superblock section, just make the type a parameter to bch2_sb_field_get(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Correctly initialize new buckets on device resizeKent Overstreet1-0/+14
bch2_dev_resize() was never updated for the allocator rewrite with persistent freelists, and it wasn't noticed because the tests weren't running fsck - oops. Fix this by running bch2_dev_freespace_init() for the new buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: New superblock section members_v2Hunter Shaffer1-23/+31
members_v2 has dynamically resizable entries so that we can extend bch_member. The members can no longer be accessed with simple array indexing Instead members_v2_get is used to find a member's exact location within the array and returns a copy of that member. Alternatively member_v2_get_mut retrieves a mutable point to a member. Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Add new helper to retrieve bch_member from sbHunter Shaffer1-24/+11
Prep work for introducing bch_sb_field_members_v2 - introduce new helpers that will check for members_v2 if it exists, otherwise using v1 Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: More assertions for nocow lockingKent Overstreet1-0/+1
- assert in shutdown path that no nocow locks are held - check for overflow when taking nocow locks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: fix error checking in bch2_fs_alloc()Dan Carpenter1-1/+1
There is a typo here where it uses ";" instead of "?:". The result is that bch2_fs_fs_io_direct_init() is called unconditionally and the errors from it are not checked. Fixes: 0060c68159fc ("bcachefs: Split up fs-io.[ch]") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Brian Foster <bfoster@redhat.com>
2023-10-23bcachefs: Remove a redundant and harmless bch2_free_super() callChristophe JAILLET1-1/+0
Remove a redundant call to bch2_free_super(). This is harmless because bch2_free_super() has a memset() at its end. So a second call would only lead to from kfree(NULL). Remove the redundant call and only rely on the error handling path. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix use-after-free in bch2_dev_add()Christophe JAILLET1-3/+1
If __bch2_dev_attach_bdev() fails, bch2_dev_free() is called twice. Once here and another time in the error handling path. This leads to several use-after-free. Remove the redundant call and only rely on the error handling path. Fixes: 6a44735653d4 ("bcachefs: Improved superblock-related error messages") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: add module description to fix modpost warningBrian Foster1-0/+1
modpost produces the following warning: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/bcachefs/bcachefs.o Add a module description for bcachefs. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Heap allocate btree_transKent Overstreet1-7/+0
We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix W=12 build errorsKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: BTREE_ID_logged_opsKent Overstreet1-0/+1
Add a new btree for long running logged operations - i.e. for logging operations that we can't do within a single btree transaction, so that they can be resumed if we crash. Keys in the logged operations btree will represent operations in progress, with the state of the operation stored in the value. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Break up io.cKent Overstreet1-3/+6
More reorganization, this splits up io.c into - io_read.c - io_misc.c - fallocate, fpunch, truncate - io_write.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Kill incorrect assertionKent Overstreet1-2/+0
In the bch2_fs_alloc() error path we call bch2_fs_free() without setting BCH_FS_STOPPING - this is fine. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Convert more code to bch_err_msg()Kent Overstreet1-22/+21
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: restart journal reclaim thread on ro->rw transitionsBrian Foster1-0/+4
Commit c2d5ff36065a4 ("bcachefs: Start journal reclaim thread earlier") tweaked reclaim thread management to start a bit earlier in the mount sequence by moving the start call from __bch2_fs_read_write() to bch2_fs_journal_start(). This has the side effect of never starting the reclaim thread on a ro->rw transition, which can be observed by monitoring reclaim behavior via the journal_reclaim tracepoints. I.e. once an fs has remounted ro->rw, we only ever rely on direct reclaim from that point forward. Since bch2_journal_reclaim_start() properly handles the case where the reclaim thread has already been created, restore the start call in the read-write helper. This allows the reclaim thread to start early when appropriate and also exit/restart on remounts or freeze cycles. In the latter case it may be possible to simply allow the task to freeze rather than destroy it, but for now just fix the immediate bug. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Fix bch2_mount error pathKent Overstreet1-0/+2
In the bch2_mount() error path, we were calling deactivate_locked_super(), which calls ->kill_sb(), which in our case was calling bch2_fs_free() without __bch2_fs_stop(). This changes bch2_mount() to just call bch2_fs_stop() directly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Split out snapshot.cKent Overstreet1-0/+1
subvolume.c has gotten a bit large, this splits out a separate file just for managing snapshot trees - BTREE_ID_snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: btree_journal_iter.cKent Overstreet1-0/+1
Split out a new file from recovery.c for managing the list of keys we read from the journal: before journal replay finishes the btree iterator code needs to be able to iterate over and return keys from the journal as well, so there's a fair bit of code here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: sb-clean.cKent Overstreet1-0/+1
Pull code for bch_sb_field_clean out into its own file. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Split up fs-io.[ch]Kent Overstreet1-1/+7
fs-io.c is too big - time for some reorganization - fs-dio.c: direct io - fs-pagecache.c: pagecache data structures (bch_folio), utility code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: BCH_COMPAT_bformat_overflow_done no longer requiredKent Overstreet1-0/+1
Awhile back, we changed bkey_format generation to ensure that the packed representation could never represent fields larger than the unpacked representation. This was to ensure that bkey_packed_successor() always gave a sensible result, but in the current code bkey_packed_successor() is only used in a debug assertion - not for anything important. This kills the requirement that we've gotten rid of those weird bkey formats, and instead changes the assertion to check if we're dealing with an old weird bkey format. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Print version, options earlier in startup pathKent Overstreet1-2/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Don't start copygc until recovery is finishedKent Overstreet1-6/+13
With "bcachefs: Snapshot depth, skiplist fields", we now can't run data move operations until after bch2_check_snapshots() is complete. Ideally we'd have the copygc (and rebalance) threads wait until c->curr_recovery_pass has advanced, but the waitlist handling is tricky - so for now, move starting copygc back to read_write_late(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Convert more -EROFS to private error codesKent Overstreet1-5/+6
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-23bcachefs: Delete redundant log messagesKent Overstreet1-13/+1
Now that we have distinct error codes for different memory allocation failures, the early init log messages are no longer needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>