summaryrefslogtreecommitdiff
path: root/fs/f2fs
AgeCommit message (Collapse)AuthorFilesLines
2024-03-11Merge tag 'for-6.9/block-20240310' of git://git.kernel.dk/linuxLinus Torvalds1-3/+12
Pull block updates from Jens Axboe: - MD pull requests via Song: - Cleanup redundant checks (Yu Kuai) - Remove deprecated headers (Marc Zyngier, Song Liu) - Concurrency fixes (Li Lingfeng) - Memory leak fix (Li Nan) - Refactor raid1 read_balance (Yu Kuai, Paul Luse) - Clean up and fix for md_ioctl (Li Nan) - Other small fixes (Gui-Dong Han, Heming Zhao) - MD atomic limits (Christoph) - NVMe pull request via Keith: - RDMA target enhancements (Max) - Fabrics fixes (Max, Guixin, Hannes) - Atomic queue_limits usage (Christoph) - Const use for class_register (Ricardo) - Identification error handling fixes (Shin'ichiro, Keith) - Improvement and cleanup for cached request handling (Christoph) - Moving towards atomic queue limits. Core changes and driver bits so far (Christoph) - Fix UAF issues in aoeblk (Chun-Yi) - Zoned fix and cleanups (Damien) - s390 dasd cleanups and fixes (Jan, Miroslav) - Block issue timestamp caching (me) - noio scope guarding for zoned IO (Johannes) - block/nvme PI improvements (Kanchan) - Ability to terminate long running discard loop (Keith) - bdev revalidation fix (Li) - Get rid of old nr_queues hack for kdump kernels (Ming) - Support for async deletion of ublk (Ming) - Improve IRQ bio recycling (Pavel) - Factor in CPU capacity for remote vs local completion (Qais) - Add shared_tags configfs entry for null_blk (Shin'ichiro - Fix for a regression in page refcounts introduced by the folio unification (Tony) - Misc fixes and cleanups (Arnd, Colin, John, Kunwu, Li, Navid, Ricardo, Roman, Tang, Uwe) * tag 'for-6.9/block-20240310' of git://git.kernel.dk/linux: (221 commits) block: partitions: only define function mac_fix_string for CONFIG_PPC_PMAC block/swim: Convert to platform remove callback returning void cdrom: gdrom: Convert to platform remove callback returning void block: remove disk_stack_limits md: remove mddev->queue md: don't initialize queue limits md/raid10: use the atomic queue limit update APIs md/raid5: use the atomic queue limit update APIs md/raid1: use the atomic queue limit update APIs md/raid0: use the atomic queue limit update APIs md: add queue limit helpers md: add a mddev_is_dm helper md: add a mddev_add_trace_msg helper md: add a mddev_trace_remap helper bcache: move calculation of stripe_size and io_opt into bcache_device_init virtio_blk: Do not use disk_set_max_open/active_zones() aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts block: move capacity validation to blkpg_do_ioctl() block: prevent division by zero in blk_rq_stat_sum() drbd: atomically update queue limits in drbd_reconsider_queue_parameters ...
2024-03-11Merge tag 'vfs-6.9.uuid' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs uuid updates from Christian Brauner: "This adds two new ioctl()s for getting the filesystem uuid and retrieving the sysfs path based on the path of a mounted filesystem. Getting the filesystem uuid has been implemented in filesystem specific code for a while it's now lifted as a generic ioctl" * tag 'vfs-6.9.uuid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: xfs: add support for FS_IOC_GETFSSYSFSPATH fs: add FS_IOC_GETFSSYSFSPATH fat: Hook up sb->s_uuid fs: FS_IOC_GETUUID ovl: convert to super_set_uuid() fs: super_set_uuid()
2024-03-11Merge tag 'vfs-6.9.super' of ↵Linus Torvalds2-7/+7
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull block handle updates from Christian Brauner: "Last cycle we changed opening of block devices, and opening a block device would return a bdev_handle. This allowed us to implement support for restricting and forbidding writes to mounted block devices. It was accompanied by converting and adding helpers to operate on bdev_handles instead of plain block devices. That was already a good step forward but ultimately it isn't necessary to have special purpose helpers for opening block devices internally that return a bdev_handle. Fundamentally, opening a block device internally should just be equivalent to opening files. So now all internal opens of block devices return files just as a userspace open would. Instead of introducing a separate indirection into bdev_open_by_*() via struct bdev_handle bdev_file_open_by_*() is made to just return a struct file. Opening and closing a block device just becomes equivalent to opening and closing a file. This all works well because internally we already have a pseudo fs for block devices and so opening block devices is simple. There's a few places where we needed to be careful such as during boot when the kernel is supposed to mount the rootfs directly without init doing it. Here we need to take care to ensure that we flush out any asynchronous file close. That's what we already do for opening, unpacking, and closing the initramfs. So nothing new here. The equivalence of opening and closing block devices to regular files is a win in and of itself. But it also has various other advantages. We can remove struct bdev_handle completely. Various low-level helpers are now private to the block layer. Other helpers were simply removable completely. A follow-up series that is already reviewed build on this and makes it possible to remove bdev->bd_inode and allows various clean ups of the buffer head code as well. All places where we stashed a bdev_handle now just stash a file and use simple accessors to get to the actual block device which was already the case for bdev_handle" * tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits) block: remove bdev_handle completely block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access bdev: remove bdev pointer from struct bdev_handle bdev: make struct bdev_handle private to the block layer bdev: make bdev_{release, open_by_dev}() private to block layer bdev: remove bdev_open_by_path() reiserfs: port block device access to file ocfs2: port block device access to file nfs: port block device access to files jfs: port block device access to file f2fs: port block device access to files ext4: port block device access to file erofs: port device access to file btrfs: port device access to file bcachefs: port block device access to file target: port block device access to file s390: port block device access to file nvme: port block device access to file block2mtd: port device access to files bcache: port block device access to files ...
2024-03-11Merge tag 'vfs-6.9.iomap' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull iomap updates from Christian Brauner: - Restore read-write hints in struct bio through the bi_write_hint member for the sake of UFS devices in mobile applications. This can result in up to 40% lower write amplification in UFS devices. The patch series that builds on this will be coming in via the SCSI maintainers (Bart) - Overhaul the iomap writeback code. Afterwards ->map_blocks() is able to map multiple blocks at once as long as they're in the same folio. This reduces CPU usage for buffered write workloads on e.g., xfs on systems with lots of cores (Christoph) - Record processed bytes in iomap_iter() trace event (Kassey) - Extend iomap_writepage_map() trace event after Christoph's ->map_block() changes to map mutliple blocks at once (Zhang) * tag 'vfs-6.9.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits) iomap: Add processed for iomap_iter iomap: add pos and dirty_len into trace_iomap_writepage_map block, fs: Restore the per-bio/request data lifetime fields fs: Propagate write hints to the struct block_device inode fs: Move enum rw_hint into a new header file fs: Split fcntl_rw_hint() fs: Verify write lifetime constants at compile time fs: Fix rw_hint validation iomap: pass the length of the dirty region to ->map_blocks iomap: map multiple blocks at a time iomap: submit ioends immediately iomap: factor out a iomap_writepage_map_block helper iomap: only call mapping_set_error once for each failed bio iomap: don't chain bios iomap: move the iomap_sector sector calculation out of iomap_add_to_ioend iomap: clean up the iomap_alloc_ioend calling convention iomap: move all remaining per-folio logic into iomap_writepage_map iomap: factor out a iomap_writepage_handle_eof helper iomap: move the PF_MEMALLOC check to iomap_writepages iomap: move the io_folios field out of struct iomap_ioend ...
2024-03-07Merge tag 'for-next-6.9' of ↵Christian Brauner2-1/+1
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krisman/unicode into vfs.misc Merge case-insensitive updates from Gabriel Krisman Bertazi: - Patch case-insensitive lookup by trying the case-exact comparison first, before falling back to costly utf8 casefolded comparison. - Fix to forbid using a case-insensitive directory as part of an overlayfs mount. - Patchset to ensure d_op are set at d_alloc time for fscrypt and casefold volumes, ensuring filesystem dentries will all have the correct ops, whether they come from a lookup or not. * tag 'for-next-6.9' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/krisman/unicode: libfs: Drop generic_set_encrypted_ci_d_ops ubifs: Configure dentry operations at dentry-creation time f2fs: Configure dentry operations at dentry-creation time ext4: Configure dentry operations at dentry-creation time libfs: Add helper to choose dentry operations at mount-time libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops fscrypt: Drop d_revalidate once the key is added fscrypt: Drop d_revalidate for valid dentries during lookup fscrypt: Factor out a helper to configure the lookup dentry ovl: Always reject mounting over case-insensitive directories libfs: Attempt exact-match comparison first during casefolded lookup Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-28f2fs: Configure dentry operations at dentry-creation timeGabriel Krisman Bertazi2-1/+1
This was already the case for case-insensitive before commit bb9cd9106b22 ("fscrypt: Have filesystems handle their d_ops"), but it was changed to set at lookup-time to facilitate the integration with fscrypt. But it's a problem because dentries that don't get created through ->lookup() won't have any visibility of the operations. Since fscrypt now also supports configuring dentry operations at creation-time, do it for any encrypted and/or casefold volume, simplifying the implementation across these features. Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20240221171412.10710-9-krisman@suse.de Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-02-25f2fs: port block device access to filesChristian Brauner2-7/+7
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-22-adbd023e19cc@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25bdev: open block device as filesChristian Brauner1-1/+1
Add two new helpers to allow opening block devices as files. This is not the final infrastructure. This still opens the block device before opening a struct a file. Until we have removed all references to struct bdev_handle we can't switch the order: * Introduce blk_to_file_flags() to translate from block specific to flags usable to pen a new file. * Introduce bdev_file_open_by_{dev,path}(). * Introduce temporary sb_bdev_handle() helper to retrieve a struct bdev_handle from a block device file and update places that directly reference struct bdev_handle to rely on it. * Don't count block device openes against the number of open files. A bdev_file_open_by_{dev,path}() file is never installed into any file descriptor table. One idea that came to mind was to use kernel_tmpfile_open() which would require us to pass a path and it would then call do_dentry_open() going through the regular fops->open::blkdev_open() path. But then we're back to the problem of routing block specific flags such as BLK_OPEN_RESTRICT_WRITES through the open path and would have to waste FMODE_* flags every time we add a new one. With this we can avoid using a flag bit and we have more leeway in how we open block devices from bdev_open_by_{dev,path}(). Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-1-adbd023e19cc@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-12block: remove gfp_flags from blkdev_zone_mgmtJohannes Thumshirn1-2/+2
Now that all callers pass in GFP_KERNEL to blkdev_zone_mgmt() and use memalloc_no{io,fs}_{save,restore}() to define the allocation scope, we can drop the gfp_mask parameter from blkdev_zone_mgmt() as well as blkdev_zone_reset_all() and blkdev_zone_reset_all_emulated(). Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-5-ae3b7c8def61@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12f2fs: guard blkdev_zone_mgmt with nofs scopeJohannes Thumshirn1-3/+12
Guard the calls to blkdev_zone_mgmt() with a memalloc_nofs scope. This helps us getting rid of the GFP_NOFS argument to blkdev_zone_mgmt(); Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-4-ae3b7c8def61@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-08fs: super_set_uuid()Kent Overstreet1-1/+1
Some weird old filesytems have UUID-like things that we wish to expose as UUIDs, but are smaller; add a length field so that the new FS_IOC_(GET|SET)UUID ioctls can handle them in generic code. And add a helper super_set_uuid(), for setting nonstandard length uuids. Helper is now required for the new FS_IOC_GETUUID ioctl; if super_set_uuid() hasn't been called, the ioctl won't be supported. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20240207025624.1019754-2-kent.overstreet@linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-06fs: Move enum rw_hint into a new header fileBart Van Assche1-0/+1
Move enum rw_hint into a new header file to prepare for using this data type in the block layer. Add the attribute __packed to reduce the space occupied by instances of this data type from four bytes to one byte. Change the data type of i_write_hint from u8 into enum rw_hint. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Chao Yu <chao@kernel.org> # for the F2FS part Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240202203926.2478590-5-bvanassche@acm.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-23Merge tag 'exportfs-6.9' of ↵Christian Brauner1-11/+0
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/cel/linux Merge exportfs fixes from Chuck Lever: * tag 'exportfs-6.9' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/cel/linux: fs: Create a generic is_dot_dotdot() utility exportfs: fix the fallback implementation of the get_name export operation Link: https://lore.kernel.org/r/BDC2AEB4-7085-4A7C-8DE8-A659FE1DBA6A@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-23fs: Create a generic is_dot_dotdot() utilityChuck Lever1-11/+0
De-duplicate the same functionality in several places by hoisting the is_dot_dotdot() utility function into linux/fs.h. Suggested-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-13f2fs: fix double free of f2fs_sb_infoEric Biggers1-0/+1
kill_f2fs_super() is called even if f2fs_fill_super() fails. f2fs_fill_super() frees the struct f2fs_sb_info, so it must set sb->s_fs_info to NULL to prevent it from being freed again. Fixes: 275dca4630c1 ("f2fs: move release of block devices to after kill_block_super()") Reported-by: <syzbot+8f477ac014ff5b32d81f@syzkaller.appspotmail.com> Closes: https://lore.kernel.org/lkml/0000000000006cb174060ec34502@google.com Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/linux-f2fs-devel/20240113005747.38887-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-01-12Merge tag 'f2fs-for-6.8-rc1' of ↵Linus Torvalds13-254/+260
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs update from Jaegeuk Kim: "In this series, we've some progress to support Zoned block device regarding to the power-cut recovery flow and enabling checkpoint=disable feature which is essential for Android OTA. Other than that, some patches touched sysfs entries and tracepoints which are minor, while several bug fixes on error handlers and compression flows are good to improve the overall stability. Enhancements: - enable checkpoint=disable for zoned block device - sysfs entries such as discard status, discard_io_aware, dir_level - tracepoints such as f2fs_vm_page_mkwrite(), f2fs_rename(), f2fs_new_inode() - use shared inode lock during f2fs_fiemap() and f2fs_seek_block() Bug fixes: - address some power-cut recovery issues on zoned block device - handle errors and logics on do_garbage_collect(), f2fs_reserve_new_block(), f2fs_move_file_range(), f2fs_recover_xattr_data() - don't set FI_PREALLOCATED_ALL for partial write - fix to update iostat correctly in f2fs_filemap_fault() - fix to wait on block writeback for post_read case - fix to tag gcing flag on page during block migration - restrict max filesize for 16K f2fs - fix to avoid dirent corruption - explicitly null-terminate the xattr list There are also several clean-up patches to remove dead codes and better readability" * tag 'f2fs-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (33 commits) f2fs: show more discard status by sysfs f2fs: Add error handling for negative returns from do_garbage_collect f2fs: Constrain the modification range of dir_level in the sysfs f2fs: Use wait_event_freezable_timeout() for freezable kthread f2fs: fix to check return value of f2fs_recover_xattr_data f2fs: don't set FI_PREALLOCATED_ALL for partial write f2fs: fix to update iostat correctly in f2fs_filemap_fault() f2fs: fix to check compress file in f2fs_move_file_range() f2fs: fix to wait on block writeback for post_read case f2fs: fix to tag gcing flag on page during block migration f2fs: add tracepoint for f2fs_vm_page_mkwrite() f2fs: introduce f2fs_invalidate_internal_cache() for cleanup f2fs: update blkaddr in __set_data_blkaddr() for cleanup f2fs: introduce get_dnode_addr() to clean up codes f2fs: delete obsolete FI_DROP_CACHE f2fs: delete obsolete FI_FIRST_BLOCK_WRITTEN f2fs: Restrict max filesize for 16K f2fs f2fs: let's finish or reset zones all the time f2fs: check write pointers when checkpoint=disable f2fs: fix write pointers on zoned device after roll forward ...
2024-01-12Merge tag 'pull-rename' of ↵Linus Torvalds1-7/+8
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull rename updates from Al Viro: "Fix directory locking scheme on rename This was broken in 6.5; we really can't lock two unrelated directories without holding ->s_vfs_rename_mutex first and in case of same-parent rename of a subdirectory 6.5 ends up doing just that" * tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: rename(): avoid a deadlock in the case of parents having no common ancestor kill lock_two_inodes() rename(): fix the locking of subdirectories f2fs: Avoid reading renamed directory if parent does not change ext4: don't access the source subdirectory content on same-directory rename ext2: Avoid reading renamed directory if parent does not change udf_rename(): only access the child content on cross-directory rename ocfs2: Avoid touching renamed directory if parent does not change reiserfs: Avoid touching renamed directory if parent does not change
2024-01-12Merge tag 'for-6.8/block-2024-01-08' of git://git.kernel.dk/linuxLinus Torvalds2-11/+8
Pull block updates from Jens Axboe: "Pretty quiet round this time around. This contains: - NVMe updates via Keith: - nvme fabrics spec updates (Guixin, Max) - nvme target udpates (Guixin, Evan) - nvme attribute refactoring (Daniel) - nvme-fc numa fix (Keith) - MD updates via Song: - Fix/Cleanup RCU usage from conf->disks[i].rdev (Yu Kuai) - Fix raid5 hang issue (Junxiao Bi) - Add Yu Kuai as Reviewer of the md subsystem - Remove deprecated flavors (Song Liu) - raid1 read error check support (Li Nan) - Better handle events off-by-1 case (Alex Lyakas) - Efficiency improvements for passthrough (Kundan) - Support for mapping integrity data directly (Keith) - Zoned write fix (Damien) - rnbd fixes (Kees, Santosh, Supriti) - Default to a sane discard size granularity (Christoph) - Make the default max transfer size naming less confusing (Christoph) - Remove support for deprecated host aware zoned model (Christoph) - Misc fixes (me, Li, Matthew, Min, Ming, Randy, liyouhong, Daniel, Bart, Christoph)" * tag 'for-6.8/block-2024-01-08' of git://git.kernel.dk/linux: (78 commits) block: Treat sequential write preferred zone type as invalid block: remove disk_clear_zoned sd: remove the !ZBC && blk_queue_is_zoned case in sd_read_block_characteristics drivers/block/xen-blkback/common.h: Fix spelling typo in comment blk-cgroup: fix rcu lockdep warning in blkg_lookup() blk-cgroup: don't use removal safe list iterators block: floor the discard granularity to the physical block size mtd_blkdevs: use the default discard granularity bcache: use the default discard granularity zram: use the default discard granularity null_blk: use the default discard granularity nbd: use the default discard granularity ubd: use the default discard granularity block: default the discard granularity to sector size bcache: discard_granularity should not be smaller than a sector block: remove two comments in bio_split_discard block: rename and document BLK_DEF_MAX_SECTORS loop: don't abuse BLK_DEF_MAX_SECTORS aoe: don't abuse BLK_DEF_MAX_SECTORS null_blk: don't cap max_hw_sectors to BLK_DEF_MAX_SECTORS ...
2024-01-10Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linuxLinus Torvalds1-5/+8
Pull fscrypt updates from Eric Biggers: "Adjust the timing of the fscrypt keyring destruction, to prepare for btrfs's fscrypt support. Also document that CephFS supports fscrypt now" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fs: move fscrypt keyring destruction to after ->put_super f2fs: move release of block devices to after kill_block_super() fscrypt: document that CephFS supports fscrypt now fscrypt: update comment for do_remove_key() fscrypt.rst: update definition of struct fscrypt_context_v2
2024-01-09Merge tag 'mm-stable-2024-01-08-15-31' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Peng Zhang has done some mapletree maintainance work in the series 'maple_tree: add mt_free_one() and mt_attr() helpers' 'Some cleanups of maple tree' - In the series 'mm: use memmap_on_memory semantics for dax/kmem' Vishal Verma has altered the interworking between memory-hotplug and dax/kmem so that newly added 'device memory' can more easily have its memmap placed within that newly added memory. - Matthew Wilcox continues folio-related work (including a few fixes) in the patch series 'Add folio_zero_tail() and folio_fill_tail()' 'Make folio_start_writeback return void' 'Fix fault handler's handling of poisoned tail pages' 'Convert aops->error_remove_page to ->error_remove_folio' 'Finish two folio conversions' 'More swap folio conversions' - Kefeng Wang has also contributed folio-related work in the series 'mm: cleanup and use more folio in page fault' - Jim Cromie has improved the kmemleak reporting output in the series 'tweak kmemleak report format'. - In the series 'stackdepot: allow evicting stack traces' Andrey Konovalov to permits clients (in this case KASAN) to cause eviction of no longer needed stack traces. - Charan Teja Kalla has fixed some accounting issues in the page allocator's atomic reserve calculations in the series 'mm: page_alloc: fixes for high atomic reserve caluculations'. - Dmitry Rokosov has added to the samples/ dorectory some sample code for a userspace memcg event listener application. See the series 'samples: introduce cgroup events listeners'. - Some mapletree maintanance work from Liam Howlett in the series 'maple_tree: iterator state changes'. - Nhat Pham has improved zswap's approach to writeback in the series 'workload-specific and memory pressure-driven zswap writeback'. - DAMON/DAMOS feature and maintenance work from SeongJae Park in the series 'mm/damon: let users feed and tame/auto-tune DAMOS' 'selftests/damon: add Python-written DAMON functionality tests' 'mm/damon: misc updates for 6.8' - Yosry Ahmed has improved memcg's stats flushing in the series 'mm: memcg: subtree stats flushing and thresholds'. - In the series 'Multi-size THP for anonymous memory' Ryan Roberts has added a runtime opt-in feature to transparent hugepages which improves performance by allocating larger chunks of memory during anonymous page faults. - Matthew Wilcox has also contributed some cleanup and maintenance work against eh buffer_head code int he series 'More buffer_head cleanups'. - Suren Baghdasaryan has done work on Andrea Arcangeli's series 'userfaultfd move option'. UFFDIO_MOVE permits userspace heap compaction algorithms to move userspace's pages around rather than UFFDIO_COPY'a alloc/copy/free. - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm: Add ksm advisor'. This is a governor which tunes KSM's scanning aggressiveness in response to userspace's current needs. - Chengming Zhou has optimized zswap's temporary working memory use in the series 'mm/zswap: dstmem reuse optimizations and cleanups'. - Matthew Wilcox has performed some maintenance work on the writeback code, both code and within filesystems. The series is 'Clean up the writeback paths'. - Andrey Konovalov has optimized KASAN's handling of alloc and free stack traces for secondary-level allocators, in the series 'kasan: save mempool stack traces'. - Andrey also performed some KASAN maintenance work in the series 'kasan: assorted clean-ups'. - David Hildenbrand has gone to town on the rmap code. Cleanups, more pte batching, folio conversions and more. See the series 'mm/rmap: interface overhaul'. - Kinsey Ho has contributed some maintenance work on the MGLRU code in the series 'mm/mglru: Kconfig cleanup'. - Matthew Wilcox has contributed lruvec page accounting code cleanups in the series 'Remove some lruvec page accounting functions'" * tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits) mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER mm, treewide: introduce NR_PAGE_ORDERS selftests/mm: add separate UFFDIO_MOVE test for PMD splitting selftests/mm: skip test if application doesn't has root privileges selftests/mm: conform test to TAP format output selftests: mm: hugepage-mmap: conform to TAP format output selftests/mm: gup_test: conform test to TAP format output mm/selftests: hugepage-mremap: conform test to TAP format output mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large mm/memcontrol: remove __mod_lruvec_page_state() mm/khugepaged: use a folio more in collapse_file() slub: use a folio in __kmalloc_large_node slub: use folio APIs in free_large_kmalloc() slub: use alloc_pages_node() in alloc_slab_page() mm: remove inc/dec lruvec page state functions mm: ratelimit stat flush from workingset shrinker kasan: stop leaking stack trace handles mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE mm/mglru: add dummy pmd_dirty() ...
2023-12-28f2fs: move release of block devices to after kill_block_super()Eric Biggers1-5/+8
Call destroy_device_list() and free the f2fs_sb_info from kill_f2fs_super(), after the call to kill_block_super(). This is necessary to order it after the call to fscrypt_destroy_keyring() once generic_shutdown_super() starts calling fscrypt_destroy_keyring() just after calling ->put_super. This is because fscrypt_destroy_keyring() may call into f2fs_get_devices() via the fscrypt_operations. Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20231227171429.9223-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-12-27f2fs: show more discard status by sysfsZhiguo Niu1-0/+34
The current pending_discard attr just only shows the discard_cmd_cnt information. More discard status can be shown so that we can check them through sysfs when needed. Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-27f2fs: Add error handling for negative returns from do_garbage_collectYongpeng Yang1-0/+3
The function do_garbage_collect can return a value less than 0 due to f2fs_cp_error being true or page allocation failure, as a result of calling f2fs_get_sum_page. However, f2fs_gc does not account for such cases, which could potentially lead to an abnormal total_freed and thus cause subsequent code to behave unexpectedly. Given that an f2fs_cp_error is irrecoverable, and considering that do_garbage_collect already retries page allocation errors through its call to f2fs_get_sum_page->f2fs_get_meta_page_retry, any error reported by do_garbage_collect should immediately terminate the current GC. Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-27f2fs: Constrain the modification range of dir_level in the sysfsYongpeng Yang1-0/+7
The {struct f2fs_sb_info}->dir_level can be modified through the sysfs interface, but its value range is not limited. If the value exceeds MAX_DIR_HASH_DEPTH and the mount options include "noinline_dentry", the following error will occur: [root@fedora ~]# mount -o noinline_dentry /dev/sdb /mnt/sdb/ [root@fedora ~]# echo 128 > /sys/fs/f2fs/sdb/dir_level [root@fedora ~]# cd /mnt/sdb/ [root@fedora sdb]# mkdir test [root@fedora sdb]# cd test/ [root@fedora test]# mkdir test mkdir: cannot create directory 'test': Argument list too long Signed-off-by: Yongpeng Yang <yangyongpeng1@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-27f2fs: Use wait_event_freezable_timeout() for freezable kthreadKevin Hao2-8/+5
A freezable kernel thread can enter frozen state during freezing by either calling try_to_freeze() or using wait_event_freezable() and its variants. So for the following snippet of code in a kernel thread loop: wait_event_interruptible_timeout(); try_to_freeze(); We can change it to a simple wait_event_freezable_timeout() and then eliminate the function calls to try_to_freeze() and freezing(). Signed-off-by: Kevin Hao <haokexin@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-20block: remove support for the host aware zone modelChristoph Hellwig2-11/+8
When zones were first added the SCSI and ATA specs, two different models were supported (in addition to the drive managed one that is invisible to the host): - host managed where non-conventional zones there is strict requirement to write at the write pointer, or else an error is returned - host aware where a write point is maintained if writes always happen at it, otherwise it is left in an under-defined state and the sequential write preferred zones behave like conventional zones (probably very badly performing ones, though) Not surprisingly this lukewarm model didn't prove to be very useful and was finally removed from the ZBC and SBC specs (NVMe never implemented it). Due to to the easily disappearing write pointer host software could never rely on the write pointer to actually be useful for say recovery. Fortunately only a few HDD prototypes shipped using this model which never made it to mass production. Drop the support before it is too late. Note that any such host aware prototype HDD can still be used with Linux as we'll now treat it as a conventional HDD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20231217165359.604246-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-16f2fs: fix to check return value of f2fs_recover_xattr_dataZhiguo Niu2-7/+10
Should check return value of f2fs_recover_xattr_data in __f2fs_setxattr rather than doing invalid retry if error happen. Also just do set_page_dirty in f2fs_recover_xattr_data when page is changed really. Fixes: 50a472bbc79f ("f2fs: do not return EFSCORRUPTED, but try to run online repair") Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-12f2fs: don't set FI_PREALLOCATED_ALL for partial writeChao Yu1-1/+2
In f2fs_preallocate_blocks(), if it is partial write in 4KB, it's not necessary to call f2fs_map_blocks() and set FI_PREALLOCATED_ALL flag. Cc: Eric Biggers <ebiggers@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-12f2fs: fix to update iostat correctly in f2fs_filemap_fault()Chao Yu1-1/+1
In f2fs_filemap_fault(), it fixes to update iostat info only if VM_FAULT_LOCKED is tagged in return value of filemap_fault(). Fixes: 8b83ac81f428 ("f2fs: support read iostat") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-12f2fs: fix to check compress file in f2fs_move_file_range()Chao Yu1-0/+5
f2fs_move_file_range() doesn't support migrating compressed cluster data, let's add the missing check condition and return -EOPNOTSUPP for the case until we support it. Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-12f2fs: fix to wait on block writeback for post_read caseChao Yu1-3/+4
If inode is compressed, but not encrypted, it missed to call f2fs_wait_on_block_writeback() to wait for GCed page writeback in IPU write path. Thread A GC-Thread - f2fs_gc - do_garbage_collect - gc_data_segment - move_data_block - f2fs_submit_page_write migrate normal cluster's block via meta_inode's page cache - f2fs_write_single_data_page - f2fs_do_write_data_page - f2fs_inplace_write_data - f2fs_submit_page_bio IRQ - f2fs_read_end_io IRQ old data overrides new data due to out-of-order GC and common IO. - f2fs_read_end_io Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-12f2fs: fix to tag gcing flag on page during block migrationChao Yu2-1/+5
It needs to add missing gcing flag on page during block migration, in order to garantee migrated data be persisted during checkpoint, otherwise out-of-order persistency between data and node may cause data corruption after SPOR. Similar issue was fixed by commit 2d1fe8a86bf5 ("f2fs: fix to tag gcing flag on page during file defragment"). Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-12f2fs: add tracepoint for f2fs_vm_page_mkwrite()Chao Yu1-10/+15
This patch adds to support tracepoint for f2fs_vm_page_mkwrite(), meanwhile it prints more details for trace_f2fs_filemap_fault(). Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-12f2fs: introduce f2fs_invalidate_internal_cache() for cleanupChao Yu4-18/+16
Just cleanup, no logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-12f2fs: update blkaddr in __set_data_blkaddr() for cleanupChao Yu3-16/+11
This patch allows caller to pass blkaddr to f2fs_set_data_blkaddr() and let __set_data_blkaddr() inside f2fs_set_data_blkaddr() to update dn->data_blkaddr w/ last value of blkaddr. Just cleanup, no logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-12f2fs: introduce get_dnode_addr() to clean up codesChao Yu4-37/+32
Just cleanup, no logic changes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-12f2fs: delete obsolete FI_DROP_CACHEChao Yu2-9/+0
FI_DROP_CACHE was introduced in commit 1e84371ffeef ("f2fs: change atomic and volatile write policies") for volatile write feature, after commit 7bc155fec5b3 ("f2fs: kill volatile write support"), we won't support volatile write, let's delete related codes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-12f2fs: delete obsolete FI_FIRST_BLOCK_WRITTENChao Yu6-40/+0
Commit 3c6c2bebef79 ("f2fs: avoid punch_hole overhead when releasing volatile data") introduced FI_FIRST_BLOCK_WRITTEN as below reason: This patch is to avoid some punch_hole overhead when releasing volatile data. If volatile data was not written yet, we just can make the first page as zero. After commit 7bc155fec5b3 ("f2fs: kill volatile write support"), we won't support volatile write, but it missed to remove obsolete FI_FIRST_BLOCK_WRITTEN, delete it in this patch. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-11fs: convert error_remove_page to error_remove_folioMatthew Wilcox (Oracle)2-2/+2
There were already assertions that we were not passing a tail page to error_remove_page(), so make the compiler enforce that by converting everything to pass and use a folio. Link: https://lkml.kernel.org/r/20231117161447.2461643-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-08f2fs: Restrict max filesize for 16K f2fsDaniel Rosenberg1-0/+8
Blocks are tracked by u32, so the max permitted filesize is (U32_MAX + 1) * BLOCK_SIZE. Additionally, in order to support crypto data unit sizes of 4K with a 16K block with IV_INO_LBLK_{32,64}, we must further restrict max filesize to (U32_MAX + 1) * 4096. This does not affect 4K blocksize f2fs as the natural limit for files are well below that. Fixes: d7e9a9037de2 ("f2fs: Support Block Size == Page Size") Signed-off-by: Daniel Rosenberg <drosen@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-05f2fs: let's finish or reset zones all the timeJaegeuk Kim1-58/+17
In order to limit # of open zones, let's finish or reset zones given # of valid blocks per section and its zone condition. Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-05f2fs: check write pointers when checkpoint=disableJaegeuk Kim1-2/+1
Even if f2fs was rebooted as staying checkpoint=disable, let's match the write pointers all the time. Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-05f2fs: fix write pointers on zoned device after roll forwardJaegeuk Kim1-0/+2
1. do roll forward recovery 2. update current segments pointers 3. fix the entire zones' write pointers 4. do checkpoint Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-05f2fs: allocate new section if it's not newJaegeuk Kim1-16/+17
If fsck can allocate a new zone, it'd be better to use that instead of allocating a new one. And, it modifies kernel messages. Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-01f2fs: allow checkpoint=disable for zoned block deviceJaegeuk Kim1-5/+0
Let's allow checkpoint=disable back for zoned block device. It's very risky as the feature relies on fsck or runtime recovery which matches the write pointers again if the device rebooted while disabling the checkpoint. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-11-28f2fs: sysfs: support discard_io_awareChao Yu3-1/+21
It gives a way to enable/disable IO aware feature for background discard, so that we can tune background discard more precisely based on undiscard condition. e.g. force to disable IO aware if there are large number of discard extents, and discard IO may always be interrupted by frequent common IO. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-11-28f2fs: introduce tracepoint for f2fs_rename()Chao Yu1-5/+11
This patch adds tracepoints for f2fs_rename(). Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-11-28f2fs: fix to avoid dirent corruptionChao Yu1-1/+1
As Al reported in link[1]: f2fs_rename() ... if (old_dir != new_dir && !whiteout) f2fs_set_link(old_inode, old_dir_entry, old_dir_page, new_dir); else f2fs_put_page(old_dir_page, 0); You want correct inumber in the ".." link. And cross-directory rename does move the source to new parent, even if you'd been asked to leave a whiteout in the old place. [1] https://lore.kernel.org/all/20231017055040.GN800259@ZenIV/ With below testcase, it may cause dirent corruption, due to it missed to call f2fs_set_link() to update ".." link to new directory. - mkdir -p dir/foo - renameat2 -w dir/foo bar [ASSERT] (__chk_dots_dentries:1421) --> Bad inode number[0x4] for '..', parent parent ino is [0x3] [FSCK] other corrupted bugs [Fail] Fixes: 7e01e7ad746b ("f2fs: support RENAME_WHITEOUT") Cc: Jan Kara <jack@suse.cz> Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-11-25f2fs: Avoid reading renamed directory if parent does not changeJan Kara1-7/+8
The VFS will not be locking moved directory if its parent does not change. Change f2fs rename code to avoid reading renamed directory if its parent does not change. Having it uninlined while we are reading it would cause trouble and we won't be able to rely upon ->i_rwsem on the directory being renamed in cases that do not alter its parent. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-11-20f2fs: skip adding a discard command if existsJaegeuk Kim1-1/+2
When recovering zoned UFS, sometimes we add the same zone to discard multiple times. Simple workaround is to bypass adding it. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>