summaryrefslogtreecommitdiff
path: root/fs/f2fs
AgeCommit message (Collapse)AuthorFilesLines
2023-02-03f2fs: convert f2fs_sync_meta_pages() to use filemap_get_folios_tag()Vishal Moola (Oracle)1-23/+26
Convert function to use folios throughout. This is in preparation for the removal of find_get_pages_range_tag(). This change removes 5 calls to compound_head(). Initially the function was checking if the previous page index is truly the previous page i.e. 1 index behind the current page. To convert to folios and maintain this check we need to make the check folio->index != prev + folio_nr_pages(previous folio) since we don't know how many pages are in a folio. At index i == 0 the check is guaranteed to succeed, so to workaround indexing bounds we can simply ignore the check for that specific index. This makes the initial assignment of prev trivial, so I removed that as well. Also modify a comment in commit_checkpoint for consistency. Link: https://lkml.kernel.org/r/20230104211448.4804-17-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03f2fs: convert last_fsync_dnode() to use filemap_get_folios_tag()Vishal Moola (Oracle)1-9/+10
Convert to use a folio_batch instead of pagevec. This is in preparation for the removal of find_get_pages_range_tag(). Link: https://lkml.kernel.org/r/20230104211448.4804-16-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03f2fs: convert f2fs_write_cache_pages() to use filemap_get_folios_tag()Vishal Moola (Oracle)1-26/+58
Convert the function to use a folio_batch instead of pagevec. This is in preparation for the removal of find_get_pages_range_tag(). Also modified f2fs_all_cluster_page_ready to take in a folio_batch instead of pagevec. This does NOT support large folios. The function currently only utilizes folios of size 1 so this shouldn't cause any issues right now. This version of the patch limits the number of pages fetched to F2FS_ONSTACK_PAGES. If that ever happens, update the start index here since filemap_get_folios_tag() updates the index to be after the last found folio, not necessarily the last used page. Link: https://lkml.kernel.org/r/20230104211448.4804-15-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03f2fs: convert f2fs_sync_node_pages() to use filemap_get_folios_tag()Vishal Moola (Oracle)1-8/+9
Convert function to use a folio_batch instead of pagevec. This is in preparation for the removal of find_get_pages_range_tag(). Link: https://lkml.kernel.org/r/20230104211448.4804-14-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03f2fs: convert f2fs_flush_inline_data() to use filemap_get_folios_tag()Vishal Moola (Oracle)1-8/+9
Convert function to use a folio_batch instead of pagevec. This is in preparation for the removal of find_get_pages_tag(). Link: https://lkml.kernel.org/r/20230104211448.4804-13-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03f2fs: convert f2fs_fsync_node_pages() to use filemap_get_folios_tag()Vishal Moola (Oracle)1-9/+10
Convert function to use a folio_batch instead of pagevec. This is in preparation for the removal of find_get_pages_range_tag(). Link: https://lkml.kernel.org/r/20230104211448.4804-12-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-03f2fs: let's avoid panic if extent_tree is not createdJaegeuk Kim1-1/+2
This patch avoids the below panic. pc : __lookup_extent_tree+0xd8/0x760 lr : f2fs_do_write_data_page+0x104/0x87c sp : ffffffc010cbb3c0 x29: ffffffc010cbb3e0 x28: 0000000000000000 x27: ffffff8803e7f020 x26: ffffff8803e7ed40 x25: ffffff8803e7f020 x24: ffffffc010cbb460 x23: ffffffc010cbb480 x22: 0000000000000000 x21: 0000000000000000 x20: ffffffff22e90900 x19: 0000000000000000 x18: ffffffc010c5d080 x17: 0000000000000000 x16: 0000000000000020 x15: ffffffdb1acdbb88 x14: ffffff888759e2b0 x13: 0000000000000000 x12: ffffff802da49000 x11: 000000000a001200 x10: ffffff8803e7ed40 x9 : ffffff8023195800 x8 : ffffff802da49078 x7 : 0000000000000001 x6 : 0000000000000000 x5 : 0000000000000006 x4 : ffffffc010cbba28 x3 : 0000000000000000 x2 : ffffffc010cbb480 x1 : 0000000000000000 x0 : ffffff8803e7ed40 Call trace: __lookup_extent_tree+0xd8/0x760 f2fs_do_write_data_page+0x104/0x87c f2fs_write_single_data_page+0x420/0xb60 f2fs_write_cache_pages+0x418/0xb1c __f2fs_write_data_pages+0x428/0x58c f2fs_write_data_pages+0x30/0x40 do_writepages+0x88/0x190 __writeback_single_inode+0x48/0x448 writeback_sb_inodes+0x468/0x9e8 __writeback_inodes_wb+0xb8/0x2a4 wb_writeback+0x33c/0x740 wb_do_writeback+0x2b4/0x400 wb_workfn+0xe4/0x34c process_one_work+0x24c/0x5bc worker_thread+0x3e8/0xa50 kthread+0x150/0x1b4 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-03f2fs: should use a temp extent_info for lookupJaegeuk Kim1-6/+7
Otherwise, __lookup_extent_tree() will override the given extent_info which will be used by caller. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-03f2fs: don't mix to use union values in extent_infoJaegeuk Kim1-8/+8
Let's explicitly use the defined values in block_age case only. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-03f2fs: initialize extent_cache parameterJaegeuk Kim4-4/+4
This can avoid confusing tracepoint values. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-03f2fs: fix to avoid NULL pointer dereference in f2fs_issue_flush()Chao Yu1-7/+4
With below two cases, it will cause NULL pointer dereference when accessing SM_I(sbi)->fcc_info in f2fs_issue_flush(). a) If kthread_run() fails in f2fs_create_flush_cmd_control(), it will release SM_I(sbi)->fcc_info, - mount -o noflush_merge /dev/vda /mnt/f2fs - mount -o remount,flush_merge /dev/vda /mnt/f2fs -- kthread_run() fails - dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=1 conv=fsync b) we will never allocate memory for SM_I(sbi)->fcc_info w/ below testcase, - mount -o ro /dev/vda /mnt/f2fs - mount -o rw,remount /dev/vda /mnt/f2fs - dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=1 conv=fsync In order to fix this issue, let change as below: - fix error path handling in f2fs_create_flush_cmd_control(). - allocate SM_I(sbi)->fcc_info even if readonly is on. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-15Merge tag 'f2fs-for-6.2-rc1' of ↵Linus Torvalds19-880/+1453
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've added two features: F2FS_IOC_START_ATOMIC_REPLACE and a per-block age-based extent cache. F2FS_IOC_START_ATOMIC_REPLACE is a variant of the previous atomic write feature which guarantees a per-file atomicity. It would be more efficient than AtomicFile implementation in Android framework. The per-block age-based extent cache implements another type of extent cache in memory which keeps the per-block age in a file, so that block allocator could split the hot and cold data blocks more accurately. Enhancements: - introduce F2FS_IOC_START_ATOMIC_REPLACE - refactor extent_cache to add a new per-block-age-based extent cache support - introduce discard_urgent_util, gc_mode, max_ordered_discard sysfs knobs - add proc entry to show discard_plist info - optimize iteration over sparse directories - add barrier mount option Bug fixes: - avoid victim selection from previous victim section - fix to enable compress for newly created file if extension matches - set zstd compress level correctly - initialize locks early in f2fs_fill_super() to fix bugs reported by syzbot - correct i_size change for atomic writes - allow to read node block after shutdown - allow to set compression for inlined file - fix gc mode when gc_urgent_high_remaining is 1 - should put a page when checking the summary info Minor fixes and various clean-ups in GC, discard, debugfs, sysfs, and doc" * tag 'f2fs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (63 commits) f2fs: reset wait_ms to default if any of the victims have been selected f2fs: fix some format WARNING in debug.c and sysfs.c f2fs: don't call f2fs_issue_discard_timeout() when discard_cmd_cnt is 0 in f2fs_put_super() f2fs: fix iostat parameter for discard f2fs: Fix spelling mistake in label: free_bio_enrty_cache -> free_bio_entry_cache f2fs: add block_age-based extent cache f2fs: allocate the extent_cache by default f2fs: refactor extent_cache to support for read and more f2fs: remove unnecessary __init_extent_tree f2fs: move internal functions into extent_cache.c f2fs: specify extent cache for read explicitly f2fs: introduce f2fs_is_readonly() for readability f2fs: remove F2FS_SET_FEATURE() and F2FS_CLEAR_FEATURE() macro f2fs: do some cleanup for f2fs module init MAINTAINERS: Add f2fs bug tracker link f2fs: remove the unused flush argument to change_curseg f2fs: open code allocate_segment_by_default f2fs: remove struct segment_allocation default_salloc_ops f2fs: introduce discard_urgent_util sysfs node f2fs: define MIN_DISCARD_GRANULARITY macro ...
2022-12-13Merge tag 'fsverity-for-linus' of ↵Linus Torvalds2-53/+64
git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt Pull fsverity updates from Eric Biggers: "The main change this cycle is to stop using the PG_error flag to track verity failures, and instead just track failures at the bio level. This follows a similar fscrypt change that went into 6.1, and it is a step towards freeing up PG_error for other uses. There's also one other small cleanup" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fsverity: simplify fsverity_get_digest() fsverity: stop using PG_error to track error status
2022-12-13Merge tag 'fs.acl.rework.v6.2' of ↵Linus Torvalds4-6/+8
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull VFS acl updates from Christian Brauner: "This contains the work that builds a dedicated vfs posix acl api. The origins of this work trace back to v5.19 but it took quite a while to understand the various filesystem specific implementations in sufficient detail and also come up with an acceptable solution. As we discussed and seen multiple times the current state of how posix acls are handled isn't nice and comes with a lot of problems: The current way of handling posix acls via the generic xattr api is error prone, hard to maintain, and type unsafe for the vfs until we call into the filesystem's dedicated get and set inode operations. It is already the case that posix acls are special-cased to death all the way through the vfs. There are an uncounted number of hacks that operate on the uapi posix acl struct instead of the dedicated vfs struct posix_acl. And the vfs must be involved in order to interpret and fixup posix acls before storing them to the backing store, caching them, reporting them to userspace, or for permission checking. Currently a range of hacks and duct tape exist to make this work. As with most things this is really no ones fault it's just something that happened over time. But the code is hard to understand and difficult to maintain and one is constantly at risk of introducing bugs and regressions when having to touch it. Instead of continuing to hack posix acls through the xattr handlers this series builds a dedicated posix acl api solely around the get and set inode operations. Going forward, the vfs_get_acl(), vfs_remove_acl(), and vfs_set_acl() helpers must be used in order to interact with posix acls. They operate directly on the vfs internal struct posix_acl instead of abusing the uapi posix acl struct as we currently do. In the end this removes all of the hackiness, makes the codepaths easier to maintain, and gets us type safety. This series passes the LTP and xfstests suites without any regressions. For xfstests the following combinations were tested: - xfs - ext4 - btrfs - overlayfs - overlayfs on top of idmapped mounts - orangefs - (limited) cifs There's more simplifications for posix acls that we can make in the future if the basic api has made it. A few implementation details: - The series makes sure to retain exactly the same security and integrity module permission checks. Especially for the integrity modules this api is a win because right now they convert the uapi posix acl struct passed to them via a void pointer into the vfs struct posix_acl format to perform permission checking on the mode. There's a new dedicated security hook for setting posix acls which passes the vfs struct posix_acl not a void pointer. Basing checking on the posix acl stored in the uapi format is really unreliable. The vfs currently hacks around directly in the uapi struct storing values that frankly the security and integrity modules can't correctly interpret as evidenced by bugs we reported and fixed in this area. It's not necessarily even their fault it's just that the format we provide to them is sub optimal. - Some filesystems like 9p and cifs need access to the dentry in order to get and set posix acls which is why they either only partially or not even at all implement get and set inode operations. For example, cifs allows setxattr() and getxattr() operations but doesn't allow permission checking based on posix acls because it can't implement a get acl inode operation. Thus, this patch series updates the set acl inode operation to take a dentry instead of an inode argument. However, for the get acl inode operation we can't do this as the old get acl method is called in e.g., generic_permission() and inode_permission(). These helpers in turn are called in various filesystem's permission inode operation. So passing a dentry argument to the old get acl inode operation would amount to passing a dentry to the permission inode operation which we shouldn't and probably can't do. So instead of extending the existing inode operation Christoph suggested to add a new one. He also requested to ensure that the get and set acl inode operation taking a dentry are consistently named. So for this version the old get acl operation is renamed to ->get_inode_acl() and a new ->get_acl() inode operation taking a dentry is added. With this we can give both 9p and cifs get and set acl inode operations and in turn remove their complex custom posix xattr handlers. In the future I hope to get rid of the inode method duplication but it isn't like we have never had this situation. Readdir is just one example. And frankly, the overall gain in type safety and the more pleasant api wise are simply too big of a benefit to not accept this duplication for a while. - We've done a full audit of every codepaths using variant of the current generic xattr api to get and set posix acls and surprisingly it isn't that many places. There's of course always a chance that we might have missed some and if so I'm sure we'll find them soon enough. The crucial codepaths to be converted are obviously stacking filesystems such as ecryptfs and overlayfs. For a list of all callers currently using generic xattr api helpers see [2] including comments whether they support posix acls or not. - The old vfs generic posix acl infrastructure doesn't obey the create and replace semantics promised on the setxattr(2) manpage. This patch series doesn't address this. It really is something we should revisit later though. The patches are roughly organized as follows: (1) Change existing set acl inode operation to take a dentry argument (Intended to be a non-functional change) (2) Rename existing get acl method (Intended to be a non-functional change) (3) Implement get and set acl inode operations for filesystems that couldn't implement one before because of the missing dentry. That's mostly 9p and cifs (Intended to be a non-functional change) (4) Build posix acl api, i.e., add vfs_get_acl(), vfs_remove_acl(), and vfs_set_acl() including security and integrity hooks (Intended to be a non-functional change) (5) Implement get and set acl inode operations for stacking filesystems (Intended to be a non-functional change) (6) Switch posix acl handling in stacking filesystems to new posix acl api now that all filesystems it can stack upon support it. (7) Switch vfs to new posix acl api (semantical change) (8) Remove all now unused helpers (9) Additional regression fixes reported after we merged this into linux-next Thanks to Seth for a lot of good discussion around this and encouragement and input from Christoph" * tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (36 commits) posix_acl: Fix the type of sentinel in get_acl orangefs: fix mode handling ovl: call posix_acl_release() after error checking evm: remove dead code in evm_inode_set_acl() cifs: check whether acl is valid early acl: make vfs_posix_acl_to_xattr() static acl: remove a slew of now unused helpers 9p: use stub posix acl handlers cifs: use stub posix acl handlers ovl: use stub posix acl handlers ecryptfs: use stub posix acl handlers evm: remove evm_xattr_acl_change() xattr: use posix acl api ovl: use posix acl api ovl: implement set acl method ovl: implement get acl method ecryptfs: implement set acl method ecryptfs: implement get acl method ksmbd: use vfs_remove_acl() acl: add vfs_remove_acl() ...
2022-12-13f2fs: reset wait_ms to default if any of the victims have been selectedYuwei Guan1-0/+4
In non-foreground gc mode, if no victim is selected, the gc process will wait for no_gc_sleep_time before waking up again. In this subsequent time, even though a victim will be selected, the gc process still waits for no_gc_sleep_time before waking up. The configuration of wait_ms is not reasonable. After any of the victims have been selected, we need to reset wait_ms to default sleep time from no_gc_sleep_time. Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-13f2fs: fix some format WARNING in debug.c and sysfs.cYangtao Li2-27/+28
To fix: WARNING: function definition argument 'struct f2fs_attr *' should also have an identifier name + ssize_t (*show)(struct f2fs_attr *, struct f2fs_sb_info *, char *); WARNING: return sysfs_emit(...) formats should include a terminating newline + return sysfs_emit(buf, "(none)"); WARNING: Prefer 'unsigned int' to bare use of 'unsigned' + unsigned npages = NODE_MAPPING(sbi)->nrpages; WARNING: Missing a blank line after declarations + unsigned npages = COMPRESS_MAPPING(sbi)->nrpages; + si->page_mem += (unsigned long long)npages << PAGE_SHIFT; WARNING: quoted string split across lines + seq_printf(s, "CP merge (Queued: %4d, Issued: %4d, Total: %4d, " + "Cur time: %4d(ms), Peak time: %4d(ms))\n", Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-13f2fs: don't call f2fs_issue_discard_timeout() when discard_cmd_cnt is 0 in ↵Yangtao Li2-8/+6
f2fs_put_super() No need to call f2fs_issue_discard_timeout() in f2fs_put_super, when no discard command requires issue. Since the caller of f2fs_issue_discard_timeout() usually judges the number of discard commands before using it. Let's move this logic to f2fs_issue_discard_timeout(). By the way, use f2fs_realtime_discard_enable to simplify the code. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-13f2fs: fix iostat parameter for discardYangtao Li1-1/+1
Just like other data we count uses the number of bytes as the basic unit, but discard uses the number of cmds as the statistical unit. In fact the discard command contains the number of blocks, so let's change to the number of bytes as the base unit. Fixes: b0af6d491a6b ("f2fs: add app/fs io stat") Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-13f2fs: Fix spelling mistake in label: free_bio_enrty_cache -> ↵Colin Ian King1-2/+2
free_bio_entry_cache There is a spelling mistake in a label name. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-13f2fs: add block_age-based extent cacheJaegeuk Kim11-7/+329
This patch introduces a runtime hot/cold data separation method for f2fs, in order to improve the accuracy for data temperature classification, reduce the garbage collection overhead after long-term data updates. Enhanced hot/cold data separation can record data block update frequency as "age" of the extent per inode, and take use of the age info to indicate better temperature type for data block allocation: - It records total data blocks allocated since mount; - When file extent has been updated, it calculate the count of data blocks allocated since last update as the age of the extent; - Before the data block allocated, it searches for the age info and chooses the suitable segment for allocation. Test and result: - Prepare: create about 30000 files * 3% for cold files (with cold file extension like .apk, from 3M to 10M) * 50% for warm files (with random file extension like .FcDxq, from 1K to 4M) * 47% for hot files (with hot file extension like .db, from 1K to 256K) - create(5%)/random update(90%)/delete(5%) the files * total write amount is about 70G * fsync will be called for .db files, and buffered write will be used for other files The storage of test device is large enough(128G) so that it will not switch to SSR mode during the test. Benefit: dirty segment count increment reduce about 14% - before: Dirty +21110 - after: Dirty +18286 Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com> Signed-off-by: xiongping1 <xiongping1@xiaomi.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-13f2fs: allocate the extent_cache by defaultJaegeuk Kim4-24/+27
Let's allocate it to remove the runtime complexity. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-13f2fs: refactor extent_cache to support for read and moreJaegeuk Kim10-281/+434
This patch prepares extent_cache to be ready for addition. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-13f2fs: remove unnecessary __init_extent_treeJaegeuk Kim1-16/+5
Added into the caller. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-13f2fs: move internal functions into extent_cache.cJaegeuk Kim2-76/+81
No functional change. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-13f2fs: specify extent cache for read explicitlyJaegeuk Kim7-18/+18
Let's descrbie it's read extent cache. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-13f2fs: introduce f2fs_is_readonly() for readabilityYangtao Li2-3/+7
Introduce f2fs_is_readonly() and use it to simplify code. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-13f2fs: remove F2FS_SET_FEATURE() and F2FS_CLEAR_FEATURE() macroYangtao Li1-5/+1
F2FS_SET_FEATURE() and F2FS_CLEAR_FEATURE() have never been used since they were introduced by this commit 76f105a2dbcd("f2fs: add feature facility in superblock"). So let's remove them. BTW, convert f2fs_sb_has_##name to return bool. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-08f2fs: do some cleanup for f2fs module initYangtao Li5-62/+14
Just for cleanup, no functional changes. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-08f2fs: remove the unused flush argument to change_cursegChristoph Hellwig1-9/+7
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-08f2fs: open code allocate_segment_by_defaultChristoph Hellwig1-26/+24
allocate_segment_by_default has just two callers, which use very different code pathes inside it based on the force paramter. Just open code the logic in the two callers using a new helper to decided if a new segment should be allocated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-08f2fs: remove struct segment_allocation default_salloc_opsChristoph Hellwig2-15/+2
There is only single instance of these ops, so remove the indirection and call allocate_segment_by_default directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-29fsverity: stop using PG_error to track error statusEric Biggers2-53/+64
As a step towards freeing the PG_error flag for other uses, change ext4 and f2fs to stop using PG_error to track verity errors. Instead, if a verity error occurs, just mark the whole bio as failed. The coarser granularity isn't really a problem since it isn't any worse than what the block layer provides, and errors from a multi-page readahead aren't reported to applications unless a single-page read fails too. f2fs supports compression, which makes the f2fs changes a bit more complicated than desired, but the basic premise still works. Note: there are still a few uses of PageError in f2fs, but they are on the write path, so they are unrelated and this patch doesn't touch them. Reviewed-by: Chao Yu <chao@kernel.org> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221129070401.156114-1-ebiggers@kernel.org
2022-11-28f2fs: introduce discard_urgent_util sysfs nodeYangtao Li3-1/+12
Through this node, you can control the background discard to run more aggressively or not aggressively when reach the utilization rate of the space. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28f2fs: define MIN_DISCARD_GRANULARITY macroYangtao Li3-3/+6
Do cleanup in f2fs_tuning_parameters() and __init_discard_policy(), let's use macro instead of number. Suggested-by: Chao Yu <chao@kernel.org> Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28f2fs: init discard policy after thread wakeupYangtao Li1-11/+9
Under the current logic, after the discard thread wakes up, it will not run according to the expected policy, but will use the expected policy before sleep. Move the strategy selection to after the thread wakes up, so that the running state of the thread meets expectations. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28f2fs: avoid victim selection from previous victim sectionYonggil Song1-2/+3
When f2fs chooses GC victim in large section & LFS mode, next_victim_seg[gc_type] is referenced first. After segment is freed, next_victim_seg[gc_type] has the next segment number. However, next_victim_seg[gc_type] still has the last segment number even after the last segment of section is freed. In this case, when f2fs chooses a victim for the next GC round, the last segment of previous victim section is chosen as a victim. Initialize next_victim_seg[gc_type] to NULL_SEGNO for the last segment in large section. Fixes: e3080b0120a1 ("f2fs: support subsectional garbage collection") Signed-off-by: Yonggil Song <yonggil.song@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28f2fs: truncate blocks in batch in __complete_revoke_list()Chao Yu1-7/+2
Use f2fs_do_truncate_blocks() to truncate all blocks in-batch in __complete_revoke_list(). Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28f2fs: make __queue_discard_cmd() return voidYangtao Li1-5/+6
Since __queue_discard_cmd() never returns an error, let's make it return void. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28f2fs: move set_file_temperature into f2fs_new_inodeSheng Yong1-33/+29
Since the file name has already passed to f2fs_new_inode(), let's move set_file_temperature() into f2fs_new_inode(). Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28f2fs: fix to enable compress for newly created file if extension matchesSheng Yong2-167/+164
If compress_extension is set, and a newly created file matches the extension, the file could be marked as compression file. However, if inline_data is also enabled, there is no chance to check its extension since f2fs_should_compress() always returns false. This patch moves set_compress_inode(), which do extension check, in f2fs_should_compress() to check extensions before setting inline data flag. Fixes: 7165841d578e ("f2fs: fix to check inline_data during compressed inode conversion") Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28f2fs: set zstd compress level correctlySheng Yong1-1/+1
Fixes: cf30f6a5f0c6 ("lib: zstd: Add kernel-specific API") Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Nick Terrell <terrelln@fb.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28f2fs: change type for 'sbi->readdir_ra'Yuwei Guan4-3/+8
Before this patch, the varibale 'readdir_ra' takes effect if it's equal to '1' or not, so we can change type for it from 'int' to 'bool'. Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28f2fs: cleanup for 'f2fs_tuning_parameters' functionYuwei Guan1-5/+3
A cleanup patch for 'f2fs_tuning_parameters' function. Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28f2fs: fix to alloc_mode changed after remount on a small volume deviceYuwei Guan1-2/+5
The commit 84b89e5d943d8 ("f2fs: add auto tuning for small devices") add tuning for small volume device, now support to tune alloce_mode to 'reuse' if it's small size. But the alloc_mode will change to 'default' when do remount on this small size dievce. This patch fo fix alloc_mode changed when do remount for a small volume device. Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28f2fs: remove submit label in __submit_discard_cmd()Yangtao Li1-4/+3
Complaint from Matthew Wilcox in another similar place: "submit? You don't submit anything at the 'submit' label. it should be called 'skip' or something. But I think this is just badly written and you don't need a goto at all." Let's remove submit label for readability. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28f2fs: fix to do sanity check on i_extra_isize in is_alive()Chao Yu1-6/+12
syzbot found a f2fs bug: BUG: KASAN: slab-out-of-bounds in data_blkaddr fs/f2fs/f2fs.h:2891 [inline] BUG: KASAN: slab-out-of-bounds in is_alive fs/f2fs/gc.c:1117 [inline] BUG: KASAN: slab-out-of-bounds in gc_data_segment fs/f2fs/gc.c:1520 [inline] BUG: KASAN: slab-out-of-bounds in do_garbage_collect+0x386a/0x3df0 fs/f2fs/gc.c:1734 Read of size 4 at addr ffff888076557568 by task kworker/u4:3/52 CPU: 1 PID: 52 Comm: kworker/u4:3 Not tainted 6.1.0-rc4-syzkaller-00362-gfef7fd48922d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Workqueue: writeback wb_workfn (flush-7:0) Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:284 [inline] print_report+0x15e/0x45d mm/kasan/report.c:395 kasan_report+0xbb/0x1f0 mm/kasan/report.c:495 data_blkaddr fs/f2fs/f2fs.h:2891 [inline] is_alive fs/f2fs/gc.c:1117 [inline] gc_data_segment fs/f2fs/gc.c:1520 [inline] do_garbage_collect+0x386a/0x3df0 fs/f2fs/gc.c:1734 f2fs_gc+0x88c/0x20a0 fs/f2fs/gc.c:1831 f2fs_balance_fs+0x544/0x6b0 fs/f2fs/segment.c:410 f2fs_write_inode+0x57e/0xe20 fs/f2fs/inode.c:753 write_inode fs/fs-writeback.c:1440 [inline] __writeback_single_inode+0xcfc/0x1440 fs/fs-writeback.c:1652 writeback_sb_inodes+0x54d/0xf90 fs/fs-writeback.c:1870 wb_writeback+0x2c5/0xd70 fs/fs-writeback.c:2044 wb_do_writeback fs/fs-writeback.c:2187 [inline] wb_workfn+0x2dc/0x12f0 fs/fs-writeback.c:2227 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289 worker_thread+0x665/0x1080 kernel/workqueue.c:2436 kthread+0x2e4/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 The root cause is that we forgot to do sanity check on .i_extra_isize in below path, result in accessing invalid address later, fix it. - gc_data_segment - is_alive - data_blkaddr - offset_in_addr Reported-by: syzbot+f8f3dfa4abc489e768a1@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-f2fs-devel/0000000000003cb3c405ed5c17f9@google.com/T/#u Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28f2fs: introduce F2FS_IOC_START_ATOMIC_REPLACEDaeho Jeong4-7/+31
introduce a new ioctl to replace the whole content of a file atomically, which means it induces truncate and content update at the same time. We can start it with F2FS_IOC_START_ATOMIC_REPLACE and complete it with F2FS_IOC_COMMIT_ATOMIC_WRITE. Or abort it with F2FS_IOC_ABORT_ATOMIC_WRITE. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-18treewide: use get_random_u32_inclusive() when possibleJason A. Donenfeld1-3/+3
These cases were done with this Coccinelle: @@ expression H; expression L; @@ - (get_random_u32_below(H) + L) + get_random_u32_inclusive(L, H + L - 1) @@ expression H; expression L; expression E; @@ get_random_u32_inclusive(L, H - + E - - E ) @@ expression H; expression L; expression E; @@ get_random_u32_inclusive(L, H - - E - + E ) @@ expression H; expression L; expression E; expression F; @@ get_random_u32_inclusive(L, H - - E + F - + E ) @@ expression H; expression L; expression E; expression F; @@ get_random_u32_inclusive(L, H - + E + F - - E ) And then subsequently cleaned up by hand, with several automatic cases rejected if it didn't make sense contextually. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-18treewide: use get_random_u32_below() instead of deprecated functionJason A. Donenfeld2-5/+5
This is a simple mechanical transformation done by: @@ expression E; @@ - prandom_u32_max + get_random_u32_below (E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-11-11f2fs: fix to set flush_merge opt and show noflush_mergeYangtao Li1-2/+11
Some minor modifications to flush_merge and related parameters: 1.The FLUSH_MERGE opt is set by default only in non-ro mode. 2.When ro and merge are set at the same time, an error is reported. 3.Display noflush_merge mount opt. Suggested-by: Chao Yu <chao@kernel.org> Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>