summaryrefslogtreecommitdiff
path: root/fs/bcachefs
AgeCommit message (Collapse)AuthorFilesLines
2024-01-11Merge tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefsLinus Torvalds123-5887/+7025
Pull bcachefs updates from Kent Overstreet: - btree write buffer rewrite: instead of adding keys to the btree write buffer at transaction commit time, we now journal them with a different journal entry type and copy them from the journal to the write buffer just prior to journal write. This reduces the number of atomic operations on shared cachelines in the transaction commit path and is a signicant performance improvement on some workloads: multithreaded 4k random writes went from ~650k iops to ~850k iops. - Bring back optimistic spinning for six locks: the new implementation doesn't use osq locks; instead we add to the lock waitlist as normal, and then spin on the lock_acquired bit in the waitlist entry, _not_ the lock itself. - New ioctls: - BCH_IOCTL_DEV_USAGE_V2, which allows for new data types - BCH_IOCTL_OFFLINE_FSCK, which runs the kernel implementation of fsck but without mounting: useful for transparently using the kernel version of fsck from 'bcachefs fsck' when the kernel version is a better match for the on disk filesystem. - BCH_IOCTL_ONLINE_FSCK: online fsck. Not all passes are supported yet, but the passes that are supported are fully featured - errors may be corrected as normal. The new ioctls use the new 'thread_with_file' abstraction for kicking off a kthread that's tied to a file descriptor returned to userspace via the ioctl. - btree_paths within a btree_trans are now dynamically growable, instead of being limited to 64. This is important for the check_directory_structure phase of fsck, and also fixes some issues we were having with btree path overflow in the reflink btree. - Trigger refactoring; prep work for the upcoming disk space accounting rewrite - Numerous bugfixes :) * tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (226 commits) bcachefs: eytzinger0_find() search should be const bcachefs: move "ptrs not changing" optimization to bch2_trigger_extent() bcachefs: fix simulateously upgrading & downgrading bcachefs: Restart recovery passes more reliably bcachefs: bch2_dump_bset() doesn't choke on u64s == 0 bcachefs: improve checksum error messages bcachefs: improve validate_bset_keys() bcachefs: print sb magic when relevant bcachefs: __bch2_sb_field_to_text() bcachefs: %pg is banished bcachefs: Improve would_deadlock trace event bcachefs: fsck_err()s don't need to manually check c->sb.version anymore bcachefs: Upgrades now specify errors to fix, like downgrades bcachefs: no thread_with_file in userspace bcachefs: Don't autofix errors we can't fix bcachefs: add missing bch2_latency_acct() call bcachefs: increase max_active on io_complete_wq bcachefs: add time_stats for btree_node_read_done() bcachefs: don't clear accessed bit in btree node fill bcachefs: Add an option to control btree node prefetching ...
2024-01-09Merge tag 'mm-stable-2024-01-08-15-31' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Peng Zhang has done some mapletree maintainance work in the series 'maple_tree: add mt_free_one() and mt_attr() helpers' 'Some cleanups of maple tree' - In the series 'mm: use memmap_on_memory semantics for dax/kmem' Vishal Verma has altered the interworking between memory-hotplug and dax/kmem so that newly added 'device memory' can more easily have its memmap placed within that newly added memory. - Matthew Wilcox continues folio-related work (including a few fixes) in the patch series 'Add folio_zero_tail() and folio_fill_tail()' 'Make folio_start_writeback return void' 'Fix fault handler's handling of poisoned tail pages' 'Convert aops->error_remove_page to ->error_remove_folio' 'Finish two folio conversions' 'More swap folio conversions' - Kefeng Wang has also contributed folio-related work in the series 'mm: cleanup and use more folio in page fault' - Jim Cromie has improved the kmemleak reporting output in the series 'tweak kmemleak report format'. - In the series 'stackdepot: allow evicting stack traces' Andrey Konovalov to permits clients (in this case KASAN) to cause eviction of no longer needed stack traces. - Charan Teja Kalla has fixed some accounting issues in the page allocator's atomic reserve calculations in the series 'mm: page_alloc: fixes for high atomic reserve caluculations'. - Dmitry Rokosov has added to the samples/ dorectory some sample code for a userspace memcg event listener application. See the series 'samples: introduce cgroup events listeners'. - Some mapletree maintanance work from Liam Howlett in the series 'maple_tree: iterator state changes'. - Nhat Pham has improved zswap's approach to writeback in the series 'workload-specific and memory pressure-driven zswap writeback'. - DAMON/DAMOS feature and maintenance work from SeongJae Park in the series 'mm/damon: let users feed and tame/auto-tune DAMOS' 'selftests/damon: add Python-written DAMON functionality tests' 'mm/damon: misc updates for 6.8' - Yosry Ahmed has improved memcg's stats flushing in the series 'mm: memcg: subtree stats flushing and thresholds'. - In the series 'Multi-size THP for anonymous memory' Ryan Roberts has added a runtime opt-in feature to transparent hugepages which improves performance by allocating larger chunks of memory during anonymous page faults. - Matthew Wilcox has also contributed some cleanup and maintenance work against eh buffer_head code int he series 'More buffer_head cleanups'. - Suren Baghdasaryan has done work on Andrea Arcangeli's series 'userfaultfd move option'. UFFDIO_MOVE permits userspace heap compaction algorithms to move userspace's pages around rather than UFFDIO_COPY'a alloc/copy/free. - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm: Add ksm advisor'. This is a governor which tunes KSM's scanning aggressiveness in response to userspace's current needs. - Chengming Zhou has optimized zswap's temporary working memory use in the series 'mm/zswap: dstmem reuse optimizations and cleanups'. - Matthew Wilcox has performed some maintenance work on the writeback code, both code and within filesystems. The series is 'Clean up the writeback paths'. - Andrey Konovalov has optimized KASAN's handling of alloc and free stack traces for secondary-level allocators, in the series 'kasan: save mempool stack traces'. - Andrey also performed some KASAN maintenance work in the series 'kasan: assorted clean-ups'. - David Hildenbrand has gone to town on the rmap code. Cleanups, more pte batching, folio conversions and more. See the series 'mm/rmap: interface overhaul'. - Kinsey Ho has contributed some maintenance work on the MGLRU code in the series 'mm/mglru: Kconfig cleanup'. - Matthew Wilcox has contributed lruvec page accounting code cleanups in the series 'Remove some lruvec page accounting functions'" * tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits) mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER mm, treewide: introduce NR_PAGE_ORDERS selftests/mm: add separate UFFDIO_MOVE test for PMD splitting selftests/mm: skip test if application doesn't has root privileges selftests/mm: conform test to TAP format output selftests: mm: hugepage-mmap: conform to TAP format output selftests/mm: gup_test: conform test to TAP format output mm/selftests: hugepage-mremap: conform test to TAP format output mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large mm/memcontrol: remove __mod_lruvec_page_state() mm/khugepaged: use a folio more in collapse_file() slub: use a folio in __kmalloc_large_node slub: use folio APIs in free_large_kmalloc() slub: use alloc_pages_node() in alloc_slab_page() mm: remove inc/dec lruvec page state functions mm: ratelimit stat flush from workingset shrinker kasan: stop leaking stack trace handles mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE mm/mglru: add dummy pmd_dirty() ...
2024-01-08Merge tag 'vfs-6.8.super' of ↵Linus Torvalds3-11/+13
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs super updates from Christian Brauner: "This contains the super work for this cycle including the long-awaited series by Jan to make it possible to prevent writing to mounted block devices: - Writing to mounted devices is dangerous and can lead to filesystem corruption as well as crashes. Furthermore syzbot comes with more and more involved examples how to corrupt block device under a mounted filesystem leading to kernel crashes and reports we can do nothing about. Add tracking of writers to each block device and a kernel cmdline argument which controls whether other writeable opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are allowed. Note that this effectively only prevents modification of the particular block device's page cache by other writers. The actual device content can still be modified by other means - e.g. by issuing direct scsi commands, by doing writes through devices lower in the storage stack (e.g. in case loop devices, DM, or MD are involved) etc. But blocking direct modifications of the block device page cache is enough to give filesystems a chance to perform data validation when loading data from the underlying storage and thus prevent kernel crashes. Syzbot can use this cmdline argument option to avoid uninteresting crashes. Also users whose userspace setup does not need writing to mounted block devices can set this option for hardening. We expect that this will be interesting to quite a few workloads. Btrfs is currently opted out of this because they still haven't merged patches we require for this to work from three kernel releases ago. - Reimplement block device freezing and thawing as holder operations on the block device. This allows us to extend block device freezing to all devices associated with a superblock and not just the main device. It also allows us to remove get_active_super() and thus another function that scans the global list of superblocks. Freezing via additional block devices only works if the filesystem chooses to use @fs_holder_ops for these additional devices as well. That currently only includes ext4 and xfs. Earlier releases switched get_tree_bdev() and mount_bdev() to use @fs_holder_ops. The remaining nilfs2 open-coded version of mount_bdev() has been converted to rely on @fs_holder_ops as well. So block device freezing for the main block device will continue to work as before. There should be no regressions in functionality. The only special case is btrfs where block device freezing for the main block device never worked because sb->s_bdev isn't set. Block device freezing for btrfs can be fixed once they can switch to @fs_holder_ops but that can happen whenever they're ready" * tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits) block: Fix a memory leak in bdev_open_by_dev() super: don't bother with WARN_ON_ONCE() super: massage wait event mechanism ext4: Block writes to journal device xfs: Block writes to log device fs: Block writes to mounted block devices btrfs: Do not restrict writes to btrfs devices block: Add config option to not allow writing to mounted devices block: Remove blkdev_get_by_*() functions bcachefs: Convert to bdev_open_by_path() fs: handle freezing from multiple devices fs: remove dead check nilfs2: simplify device handling fs: streamline thaw_super_locked ext4: simplify device handling xfs: simplify device handling fs: simplify setup_bdev_super() calls blkdev: comment fs_holder_ops porting: document block device freeze and thaw changes fs: remove unused helper ...
2024-01-06bcachefs: eytzinger0_find() search should be constKent Overstreet1-5/+5
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: move "ptrs not changing" optimization to bch2_trigger_extent()Kent Overstreet2-8/+12
This is useful for btree ptrs as well, when we're just updating sectors_written. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: fix simulateously upgrading & downgradingKent Overstreet1-3/+12
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: Restart recovery passes more reliablyKent Overstreet1-1/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: bch2_dump_bset() doesn't choke on u64s == 0Kent Overstreet1-0/+6
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: improve checksum error messagesKent Overstreet5-29/+78
new helpers: - bch2_csum_to_text() - bch2_csum_err_msg() standardize our checksum error messages a bit, and print out the checksums a bit more nicely. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: improve validate_bset_keys()Kent Overstreet1-20/+55
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: print sb magic when relevantKent Overstreet1-1/+8
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: __bch2_sb_field_to_text()Kent Overstreet2-7/+14
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: %pg is banishedKent Overstreet4-16/+52
not portable to userspace Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: Improve would_deadlock trace eventKent Overstreet5-17/+42
We now include backtraces for every thread involved in the cycle. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: fsck_err()s don't need to manually check c->sb.version anymoreKent Overstreet8-55/+42
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: Upgrades now specify errors to fix, like downgradesKent Overstreet6-99/+116
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: no thread_with_file in userspaceKent Overstreet1-0/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: Don't autofix errors we can't fixKent Overstreet1-1/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: add missing bch2_latency_acct() callKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: increase max_active on io_complete_wqKent Overstreet1-1/+1
this definitely should _not_ be 1, and we don't actually want any concurrency limiting at all here - btree node read completions are getting blocked behind btree node write submissions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: add time_stats for btree_node_read_done()Kent Overstreet2-0/+3
Seeing weird latency issues in the btree node read path - add one bch2_btree_node_read_done(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: don't clear accessed bit in btree node fillKent Overstreet1-6/+0
Seeing strange performance issues that might be caused by memory pressure causing prefetched nodes to be evicted before they're used. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: Add an option to control btree node prefetchingKent Overstreet2-3/+11
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: kill useless return retKent Overstreet1-3/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: Combine .trans_trigger, .atomic_triggerKent Overstreet11-91/+61
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: unify extent triggerKent Overstreet5-96/+39
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: bch2_trigger_stripe_ptr()Kent Overstreet1-67/+61
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: Online fsck can now fix errorsKent Overstreet4-11/+20
BCH_FS_fsck_done -> BCH_FS_fsck_running; set when we might be fixing fsck errors. Also; set fix_errors to ask by default when fsck is running. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: bch2_trigger_pointer()Kent Overstreet1-234/+209
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: unify stripe triggerKent Overstreet2-97/+76
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: move stripe triggers to ec.cKent Overstreet4-340/+352
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: unify alloc triggerKent Overstreet2-165/+137
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: move bch2_mark_alloc() to alloc_background.cKent Overstreet4-128/+132
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: unify reservation triggerKent Overstreet3-63/+40
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: unify reflink_p triggerKent Overstreet2-82/+58
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: unify inode triggerKent Overstreet2-50/+33
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: kill mem_trigger_run_overwrite_then_insert()Kent Overstreet3-7/+4
now that type signatures are unified, redundant Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: BTREE_TRIGGER_TRANSACTIONALKent Overstreet1-4/+22
New flag so that triggers can distinguish whether we're running transactional or atomic triggers (or gc) - unifying the callbacks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: Kill BTREE_TRIGGER_NOATOMICKent Overstreet2-6/+1
dead code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: mark now takes bkey_sKent Overstreet11-29/+43
Prep work for disk space accounting rewrite: we're going to want to use a single callback for both of our current triggers, so we need to change them to have the same type signature first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: trans_mark now takes bkey_sKent Overstreet12-62/+62
Prep work for disk space accounting rewrite: we're going to want to use a single callback for both of our current triggers, so we need to change them to have the same type signature first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: Upgrading uses bch_sb.recovery_passes_requiredKent Overstreet1-8/+6
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: factor out thread_with_file, thread_with_stdioKent Overstreet9-245/+459
thread_with_stdio now knows how to handle input - fsck can now prompt to fix errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: Fix printing of device durabilityKent Overstreet1-1/+1
BCH_MEMBER_DURABILITY() was not present initially; a value of 0 means use the default, nonzero means use v - 1. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: __bch2_journal_key_to_wb -> bch2_journal_key_to_wb_slowpathKent Overstreet2-3/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: __journal_keys_sort() refactoringKent Overstreet1-3/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: wb_key_cmp -> wb_key_ref_cmpKent Overstreet1-6/+6
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: track transaction durationsKent Overstreet3-12/+27
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: btree_trans always has statsKent Overstreet3-19/+8
reserve slot 0 for unknown (when we overflow), to avoid some branches Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-06bcachefs: Split brain detectionKent Overstreet2-11/+65
Use the new bch_member->seq, sb->write_time fields to detect split brain and kick out devices when necessary. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>