summaryrefslogtreecommitdiff
path: root/fs/f2fs
AgeCommit message (Collapse)AuthorFilesLines
2023-04-10f2fs: Fix system crash due to lack of free space in LFSYonggil Song3-11/+40
When f2fs tries to checkpoint during foreground gc in LFS mode, system crash occurs due to lack of free space if the amount of dirty node and dentry pages generated by data migration exceeds free space. The reproduction sequence is as follows. - 20GiB capacity block device (null_blk) - format and mount with LFS mode - create a file and write 20,000MiB - 4k random write on full range of the file RIP: 0010:new_curseg+0x48a/0x510 [f2fs] Code: 55 e7 f5 89 c0 48 0f af c3 48 8b 5d c0 48 c1 e8 20 83 c0 01 89 43 6c 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b f0 41 80 4f 48 04 45 85 f6 0f 84 ba fd ff ff e9 ef fe ff ff RSP: 0018:ffff977bc397b218 EFLAGS: 00010246 RAX: 00000000000027b9 RBX: 0000000000000000 RCX: 00000000000027c0 RDX: 0000000000000000 RSI: 00000000000027b9 RDI: ffff8c25ab4e74f8 RBP: ffff977bc397b268 R08: 00000000000027b9 R09: ffff8c29e4a34b40 R10: 0000000000000001 R11: ffff977bc397b0d8 R12: 0000000000000000 R13: ffff8c25b4dd81a0 R14: 0000000000000000 R15: ffff8c2f667f9000 FS: 0000000000000000(0000) GS:ffff8c344ec80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c00055d000 CR3: 0000000e30810003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> allocate_segment_by_default+0x9c/0x110 [f2fs] f2fs_allocate_data_block+0x243/0xa30 [f2fs] ? __mod_lruvec_page_state+0xa0/0x150 do_write_page+0x80/0x160 [f2fs] f2fs_do_write_node_page+0x32/0x50 [f2fs] __write_node_page+0x339/0x730 [f2fs] f2fs_sync_node_pages+0x5a6/0x780 [f2fs] block_operations+0x257/0x340 [f2fs] f2fs_write_checkpoint+0x102/0x1050 [f2fs] f2fs_gc+0x27c/0x630 [f2fs] ? folio_mark_dirty+0x36/0x70 f2fs_balance_fs+0x16f/0x180 [f2fs] This patch adds checking whether free sections are enough before checkpoint during gc. Signed-off-by: Yonggil Song <yonggil.song@samsung.com> [Jaegeuk Kim: code clean-up] Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-10f2fs: remove struct victim_selection default_v_opsYangtao Li4-21/+10
There is only single instance of these ops, and Jaegeuk point out that: Originally this was intended to give a chance to provide other allocation option. Anyway, it seems quit hard to do it anymore. So remove the indirection and call f2fs_get_victim() directly. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-04f2fs: fix null pointer panic in tracepoint in __replace_atomic_write_blockJaegeuk Kim1-1/+1
We got a kernel panic if old_addr is NULL. https://bugzilla.kernel.org/show_bug.cgi?id=217266 BUG: kernel NULL pointer dereference, address: 0000000000000000 Call Trace: <TASK> f2fs_commit_atomic_write+0x619/0x990 [f2fs a1b985b80f5babd6f3ea778384908880812bfa43] __f2fs_ioctl+0xd8e/0x4080 [f2fs a1b985b80f5babd6f3ea778384908880812bfa43] ? vfs_write+0x2ae/0x3f0 ? vfs_write+0x2ae/0x3f0 __x64_sys_ioctl+0x91/0xd0 do_syscall_64+0x5c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7f69095fe53f Fixes: 2f3a9ae990a7 ("f2fs: introduce trace_f2fs_replace_atomic_write_block") Cc: <stable@vger.kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-04f2fs: fix iostat lock protectionQilin Tan1-2/+2
Made iostat lock irq safe to avoid potentinal deadlock. Deadlock scenario: f2fs_attr_store -> f2fs_sbi_store -> _sbi_store -> spin_lock(sbi->iostat_lock) <interrupt request> -> scsi_end_request -> bio_endio -> f2fs_dio_read_end_io -> f2fs_update_iostat -> spin_lock_irqsave(sbi->iostat_lock) ===> Dead lock here Fixes: 61803e984307 ("f2fs: fix iostat related lock protection") Fixes: a1e09b03e6f5 ("f2fs: use iomap for direct I/O") Signed-off-by: Qilin Tan <qilin.tan@mediatek.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-04f2fs: fix align check for npo2Yohan Joung1-2/+3
Fix alignment check to be correct in npo2 as well Signed-off-by: Yohan Joung <yohan.joung@sk.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-04f2fs: add compression feature check for all compress mount optYangtao Li1-0/+12
Opt_compress_chksum, Opt_compress_mode and Opt_compress_cache lack the necessary check to see if the image supports compression, let's add it. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-04f2fs: convert is_extension_exist() to return bool typeYangtao Li1-6/+6
is_extension_exist() only return two values, 0 or 1. So there is no need to use int type. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-30f2fs: fix scheduling while atomic in decompression pathJaegeuk Kim3-2/+8
[ 16.945668][ C0] Call trace: [ 16.945678][ C0] dump_backtrace+0x110/0x204 [ 16.945706][ C0] dump_stack_lvl+0x84/0xbc [ 16.945735][ C0] __schedule_bug+0xb8/0x1ac [ 16.945756][ C0] __schedule+0x724/0xbdc [ 16.945778][ C0] schedule+0x154/0x258 [ 16.945793][ C0] bit_wait_io+0x48/0xa4 [ 16.945808][ C0] out_of_line_wait_on_bit+0x114/0x198 [ 16.945824][ C0] __sync_dirty_buffer+0x1f8/0x2e8 [ 16.945853][ C0] __f2fs_commit_super+0x140/0x1f4 [ 16.945881][ C0] f2fs_commit_super+0x110/0x28c [ 16.945898][ C0] f2fs_handle_error+0x1f4/0x2f4 [ 16.945917][ C0] f2fs_decompress_cluster+0xc4/0x450 [ 16.945942][ C0] f2fs_end_read_compressed_page+0xc0/0xfc [ 16.945959][ C0] f2fs_handle_step_decompress+0x118/0x1cc [ 16.945978][ C0] f2fs_read_end_io+0x168/0x2b0 [ 16.945993][ C0] bio_endio+0x25c/0x2c8 [ 16.946015][ C0] dm_io_dec_pending+0x3e8/0x57c [ 16.946052][ C0] clone_endio+0x134/0x254 [ 16.946069][ C0] bio_endio+0x25c/0x2c8 [ 16.946084][ C0] blk_update_request+0x1d4/0x478 [ 16.946103][ C0] scsi_end_request+0x38/0x4cc [ 16.946129][ C0] scsi_io_completion+0x94/0x184 [ 16.946147][ C0] scsi_finish_command+0xe8/0x154 [ 16.946164][ C0] scsi_complete+0x90/0x1d8 [ 16.946181][ C0] blk_done_softirq+0xa4/0x11c [ 16.946198][ C0] _stext+0x184/0x614 [ 16.946214][ C0] __irq_exit_rcu+0x78/0x144 [ 16.946234][ C0] handle_domain_irq+0xd4/0x154 [ 16.946260][ C0] gic_handle_irq.33881+0x5c/0x27c [ 16.946281][ C0] call_on_irq_stack+0x40/0x70 [ 16.946298][ C0] do_interrupt_handler+0x48/0xa4 [ 16.946313][ C0] el1_interrupt+0x38/0x68 [ 16.946346][ C0] el1h_64_irq_handler+0x20/0x30 [ 16.946362][ C0] el1h_64_irq+0x78/0x7c [ 16.946377][ C0] finish_task_switch+0xc8/0x3d8 [ 16.946394][ C0] __schedule+0x600/0xbdc [ 16.946408][ C0] preempt_schedule_common+0x34/0x5c [ 16.946423][ C0] preempt_schedule+0x44/0x48 [ 16.946438][ C0] process_one_work+0x30c/0x550 [ 16.946456][ C0] worker_thread+0x414/0x8bc [ 16.946472][ C0] kthread+0x16c/0x1e0 [ 16.946486][ C0] ret_from_fork+0x10/0x20 Fixes: bff139b49d9f ("f2fs: handle decompress only post processing in softirq") Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-30f2fs: preserve direct write semantics when buffering is forcedHans Holmberg1-8/+26
In some cases, e.g. for zoned block devices, direct writes are forced into buffered writes that will populate the page cache and be written out just like buffered io. Direct reads, on the other hand, is supported for the zoned block device case. This has the effect that applications built for direct io will fill up the page cache with data that will never be read, and that is a waste of resources. If we agree that this is a problem, how do we fix it? A) Supporting proper direct writes for zoned block devices would be the best, but it is currently not supported (probably for a good but non-obvious reason). Would it be feasible to implement proper direct IO? B) Avoid the cost of keeping unwanted data by syncing and throwing out the cached pages for buffered O_DIRECT writes before completion. This patch implements B) by reusing the code for how partial block writes are flushed out on the "normal" direct write path. Note that this changes the performance characteristics of f2fs quite a bit. Direct IO performance for zoned block devices is lower for small writes after this patch, but this should be expected with direct IO and in line with how f2fs behaves on top of conventional block devices. Another open question is if the flushing should be done for all cases where buffered writes are forced. Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Yonggil Song <yonggil.song@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-30f2fs: compress: fix to call f2fs_wait_on_page_writeback() in ↵Yangtao Li1-0/+6
f2fs_write_raw_pages() BUG_ON() will be triggered when writing files concurrently, because the same page is writtenback multiple times. 1597 void folio_end_writeback(struct folio *folio) 1598 { ...... 1618 if (!__folio_end_writeback(folio)) 1619 BUG(); ...... 1625 } kernel BUG at mm/filemap.c:1619! Call Trace: <TASK> f2fs_write_end_io+0x1a0/0x370 blk_update_request+0x6c/0x410 blk_mq_end_request+0x15/0x130 blk_complete_reqs+0x3c/0x50 __do_softirq+0xb8/0x29b ? sort_range+0x20/0x20 run_ksoftirqd+0x19/0x20 smpboot_thread_fn+0x10b/0x1d0 kthread+0xde/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Below is the concurrency scenario: [Process A] [Process B] [Process C] f2fs_write_raw_pages() - redirty_page_for_writepage() - unlock page() f2fs_do_write_data_page() - lock_page() - clear_page_dirty_for_io() - set_page_writeback() [1st writeback] ..... - unlock page() generic_perform_write() - f2fs_write_begin() - wait_for_stable_page() - f2fs_write_end() - set_page_dirty() - lock_page() - f2fs_do_write_data_page() - set_page_writeback() [2st writeback] This problem was introduced by the previous commit 7377e853967b ("f2fs: compress: fix potential deadlock of compress file"). All pagelocks were released in f2fs_write_raw_pages(), but whether the page was in the writeback state was ignored in the subsequent writing process. Let's fix it by waiting for the page to writeback before writing. Cc: Christoph Hellwig <hch@lst.de> Fixes: 4c8ff7095bef ("f2fs: support data compression") Fixes: 7377e853967b ("f2fs: compress: fix potential deadlock of compress file") Signed-off-by: Qi Han <hanqi@vivo.com> Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-30f2fs: remove else in f2fs_write_cache_pages()Yangtao Li1-5/+2
As Christoph Hellwig point out: Please avoid the else by doing the goto in the branch. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-30f2fs: apply zone capacity to all zone typeJaegeuk Kim2-61/+7
If we manage the zone capacity per zone type, it'll break the GC assumption. And, the current logic complains valid block count mismatch. Let's apply zone capacity to all zone type, if specified. Fixes: de881df97768 ("f2fs: support zone capacity less than zone size") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-30f2fs: fix to handle filemap_fdatawrite() error in ↵Yangtao Li1-4/+10
f2fs_ioc_decompress_file/f2fs_ioc_compress_file It seems inappropriate that the current logic does not handle filemap_fdatawrite() errors, so let's fix it. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-30f2fs: convert to MAX_SBI_FLAG instead of 32 in stat_show()Yangtao Li2-19/+23
BIW reduce the s_flag array size and make s_flag constant. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-30f2fs: Fix discard bug on zoned block devices with 2MiB zone sizeYonggil Song1-1/+3
When using f2fs on a zoned block device with 2MiB zone size, IO errors occurs because f2fs tries to write data to a zone that has not been reset. The cause is that f2fs tries to discard multiple zones at once. This is caused by a condition in f2fs_clear_prefree_segments that does not check for zoned block devices when setting the discard range. This leads to invalid reset commands and write pointer mismatches. This patch fixes the zoned block device with 2MiB zone size to reset one zone at a time. Signed-off-by: Yonggil Song <yonggil.song@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-30f2fs: remove entire rb_entry sharingJaegeuk Kim2-112/+71
This is a last part to remove the memory sharing for rb_tree in extent_cache. This should also fix arm32 memory alignment issue. [struct extent_node] [struct rb_entry] [0] struct rb_node rb_node; [0] struct rb_node rb_node; union { union { struct { struct { [16] unsigned int fofs; [12] unsigned int ofs; unsigned int len; unsigned int len; }; unsigned long long key; } __packed; Cc: <stable@vger.kernel.org> Fixes: 13054c548a1c ("f2fs: introduce infra macro and data structure of rb-tree extent cache") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-30f2fs: factor out discard_cmd usage from general rb_tree useJaegeuk Kim3-139/+169
This is a second part to remove the mixed use of rb_tree in discard_cmd from extent_cache. This should also fix arm32 memory alignment issue caused by shared rb_entry. [struct discard_cmd] [struct rb_entry] [0] struct rb_node rb_node; [0] struct rb_node rb_node; union { union { struct { struct { [16] block_t lstart; [12] unsigned int ofs; block_t len; unsigned int len; }; unsigned long long key; } __packed; Cc: <stable@vger.kernel.org> Fixes: 004b68621897 ("f2fs: use rb-tree to track pending discard commands") Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-30f2fs: factor out victim_entry usage from general rb_tree useJaegeuk Kim5-115/+93
Let's reduce the complexity of mixed use of rb_tree in victim_entry from extent_cache and discard_cmd. This should fix arm32 memory alignment issue caused by shared rb_entry. [struct victim_entry] [struct rb_entry] [0] struct rb_node rb_node; [0] struct rb_node rb_node; union { struct { unsigned int ofs; unsigned int len; }; [16] unsigned long long mtime; [12] unsigned long long key; } __packed; Cc: <stable@vger.kernel.org> Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-30f2fs: fix uninitialized skipped_gc_rwsemYonggil Song1-1/+1
When f2fs skipped a gc round during victim migration, there was a bug which would skip all upcoming gc rounds unconditionally because skipped_gc_rwsem was not initialized. It fixes the bug by correctly initializing the skipped_gc_rwsem inside the gc loop. Fixes: 6f8d4455060d ("f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc") Signed-off-by: Yonggil Song <yonggil.song@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-30f2fs: handle dqget error in f2fs_transfer_project_quota()Yangtao Li1-7/+8
We should set the error code when dqget() failed. Fixes: 2c1d03056991 ("f2fs: support F2FS_IOC_FS{GET,SET}XATTR") Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-30f2fs: convert to use bitmap APIYangtao Li10-46/+44
Let's use BIT() and GENMASK() instead of open it. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-30f2fs: export compress_percent and compress_watermark entriesYangtao Li1-0/+18
This patch export below sysfs entries for better control cached compress page count. /sys/fs/f2fs/<disk>/compress_watermark /sys/fs/f2fs/<disk>/compress_percent Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-30f2fs: make f2fs_sync_inode_meta() staticLi Zetao2-2/+1
After commit 26b5a079197c ("f2fs: cleanup dirty pages if recover failed"), f2fs_sync_inode_meta() is only used in checkpoint.c, so f2fs_sync_inode_meta() should only be visible inside. Delete the declaration in the header file and change f2fs_sync_inode_meta() to static. Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-28Merge tag 'f2fs-for-6.3-rc1' of ↵Linus Torvalds21-881/+948
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've got a huge number of patches that improve code readability along with minor bug fixes, while we've mainly fixed some critical issues in recently-added per-block age-based extent_cache, atomic write support, and some folio cases. Enhancements: - add sysfs nodes to set last_age_weight and manage discard_io_aware_gran - show ipu policy in debugfs - reduce stack memory cost by using bitfield in struct f2fs_io_info - introduce trace_f2fs_replace_atomic_write_block - enhance iostat support and adds flush commands Bug fixes: - revert "f2fs: truncate blocks in batch in __complete_revoke_list()" - fix kernel crash on the atomic write abort flow - call clear_page_private_reference in .{release,invalid}_folio - support .migrate_folio for compressed inode - fix cgroup writeback accounting with fs-layer encryption - retry to update the inode page given data corruption - fix kernel crash due to NULL io->bio - fix some bugs in per-block age-based extent_cache: - wrong calculation of block age - update age extent in f2fs_do_zero_range() - update age extent correctly during truncation" * tag 'f2fs-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (81 commits) f2fs: drop unnecessary arg for f2fs_ioc_*() f2fs: Revert "f2fs: truncate blocks in batch in __complete_revoke_list()" f2fs: synchronize atomic write aborts f2fs: fix wrong segment count f2fs: replace si->sbi w/ sbi in stat_show() f2fs: export ipu policy in debugfs f2fs: make kobj_type structures constant f2fs: fix to do sanity check on extent cache correctly f2fs: add missing description for ipu_policy node f2fs: fix to set ipu policy f2fs: fix typos in comments f2fs: fix kernel crash due to null io->bio f2fs: use iostat_lat_type directly as a parameter in the iostat_update_and_unbind_ctx() f2fs: add sysfs nodes to set last_age_weight f2fs: fix f2fs_show_options to show nogc_merge mount option f2fs: fix cgroup writeback accounting with fs-layer encryption f2fs: fix wrong calculation of block age f2fs: fix to update age extent in f2fs_do_zero_range() f2fs: fix to update age extent correctly during truncation f2fs: fix to avoid potential memory corruption in __update_iostat_latency() ...
2023-02-24Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds3-83/+122
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-20Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linuxLinus Torvalds2-5/+4
Pull fsverity updates from Eric Biggers: "Fix the longstanding implementation limitation that fsverity was only supported when the Merkle tree block size, filesystem block size, and PAGE_SIZE were all equal. Specifically, add support for Merkle tree block sizes less than PAGE_SIZE, and make ext4 support fsverity on filesystems where the filesystem block size is less than PAGE_SIZE. Effectively, this means that fsverity can now be used on systems with non-4K pages, at least on ext4. These changes have been tested using the verity group of xfstests, newly updated to cover the new code paths. Also update fs/verity/ to support verifying data from large folios. There's also a similar patch for fs/crypto/, to support decrypting data from large folios, which I'm including in here to avoid a merge conflict between the fscrypt and fsverity branches" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fscrypt: support decrypting data from large folios fsverity: support verifying data from large folios fsverity.rst: update git repo URL for fsverity-utils ext4: allow verity with fs block size < PAGE_SIZE fs/buffer.c: support fsverity in block_read_full_folio() f2fs: simplify f2fs_readpage_limit() ext4: simplify ext4_readpage_limit() fsverity: support enabling with tree block size < PAGE_SIZE fsverity: support verification with tree block size < PAGE_SIZE fsverity: replace fsverity_hash_page() with fsverity_hash_block() fsverity: use EFBIG for file too large to enable verity fsverity: store log2(digest_size) precomputed fsverity: simplify Merkle tree readahead size calculation fsverity: use unsigned long for level_start fsverity: remove debug messages and CONFIG_FS_VERITY_DEBUG fsverity: pass pos and size to ->write_merkle_tree_block fsverity: optimize fsverity_cleanup_inode() on non-verity files fsverity: optimize fsverity_prepare_setattr() on non-verity files fsverity: optimize fsverity_file_open() on non-verity files
2023-02-20Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linuxLinus Torvalds1-6/+0
Pull fscrypt updates from Eric Biggers: "Simplify the implementation of the test_dummy_encryption mount option by adding the 'test dummy key' on-demand" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: clean up fscrypt_add_test_dummy_key() fs/super.c: stop calling fscrypt_destroy_keyring() from __put_super() f2fs: stop calling fscrypt_add_test_dummy_key() ext4: stop calling fscrypt_add_test_dummy_key() fscrypt: add the test dummy encryption key on-demand
2023-02-20Merge tag 'fs.idmapped.v6.3' of ↵Linus Torvalds7-68/+68
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfs idmapping updates from Christian Brauner: - Last cycle we introduced the dedicated struct mnt_idmap type for mount idmapping and the required infrastucture in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). As promised in last cycle's pull request message this converts everything to rely on struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevant on the mount level. Especially for non-vfs developers without detailed knowledge in this area this was a potential source for bugs. This finishes the conversion. Instead of passing the plain namespace around this updates all places that currently take a pointer to a mnt_userns with a pointer to struct mnt_idmap. Now that the conversion is done all helpers down to the really low-level helpers only accept a struct mnt_idmap argument instead of two namespace arguments. Conflating mount and other idmappings will now cause the compiler to complain loudly thus eliminating the possibility of any bugs. This makes it impossible for filesystem developers to mix up mount and filesystem idmappings as they are two distinct types and require distinct helpers that cannot be used interchangeably. Everything associated with struct mnt_idmap is moved into a single separate file. With that change no code can poke around in struct mnt_idmap. It can only be interacted with through dedicated helpers. That means all filesystems are and all of the vfs is completely oblivious to the actual implementation of idmappings. We are now also able to extend struct mnt_idmap as we see fit. For example, we can decouple it completely from namespaces for users that don't require or don't want to use them at all. We can also extend the concept of idmappings so we can cover filesystem specific requirements. In combination with the vfs{g,u}id_t work we finished in v6.2 this makes this feature substantially more robust and thus difficult to implement wrong by a given filesystem and also protects the vfs. - Enable idmapped mounts for tmpfs and fulfill a longstanding request. A long-standing request from users had been to make it possible to create idmapped mounts for tmpfs. For example, to share the host's tmpfs mount between multiple sandboxes. This is a prerequisite for some advanced Kubernetes cases. Systemd also has a range of use-cases to increase service isolation. And there are more users of this. However, with all of the other work going on this was way down on the priority list but luckily someone other than ourselves picked this up. As usual the patch is tiny as all the infrastructure work had been done multiple kernel releases ago. In addition to all the tests that we already have I requested that Rodrigo add a dedicated tmpfs testsuite for idmapped mounts to xfstests. It is to be included into xfstests during the v6.3 development cycle. This should add a slew of additional tests. * tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits) shmem: support idmapped mounts for tmpfs fs: move mnt_idmap fs: port vfs{g,u}id helpers to mnt_idmap fs: port fs{g,u}id helpers to mnt_idmap fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap fs: port i_{g,u}id_{needs_}update() to mnt_idmap quota: port to mnt_idmap fs: port privilege checking helpers to mnt_idmap fs: port inode_owner_or_capable() to mnt_idmap fs: port inode_init_owner() to mnt_idmap fs: port acl to mnt_idmap fs: port xattr to mnt_idmap fs: port ->permission() to pass mnt_idmap fs: port ->fileattr_set() to pass mnt_idmap fs: port ->set_acl() to pass mnt_idmap fs: port ->get_acl() to pass mnt_idmap fs: port ->tmpfile() to pass mnt_idmap fs: port ->rename() to pass mnt_idmap fs: port ->mknod() to pass mnt_idmap fs: port ->mkdir() to pass mnt_idmap ...
2023-02-15f2fs: drop unnecessary arg for f2fs_ioc_*()Yangtao Li1-8/+8
They are not used, let's remove them. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-15f2fs: Revert "f2fs: truncate blocks in batch in __complete_revoke_list()"Jaegeuk Kim1-2/+7
We should not truncate replaced blocks, and were supposed to truncate the first part as well. This reverts commit 78a99fe6254cad4be310cd84af39f6c46b668c72. Cc: stable@vger.kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-14f2fs: synchronize atomic write abortsDaeho Jeong4-22/+38
To fix a race condition between atomic write aborts, I use the inode lock and make COW inode to be re-usable thoroughout the whole atomic file inode lifetime. Reported-by: syzbot+823000d23b3400619f7c@syzkaller.appspotmail.com Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-14f2fs: fix wrong segment countJaegeuk Kim1-4/+5
MAIN_SEGS is for data area, while TOTAL_SEGS includes data and metadata. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-14f2fs: replace si->sbi w/ sbi in stat_show()Yangtao Li1-21/+23
For each loop add a local f2fs_sb_info pointer insted of looking it up. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-14f2fs: export ipu policy in debugfsYangtao Li2-0/+25
Export ipu_policy as a string in debugfs for better readability and it can help us better understand some strategies of the file system. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-13f2fs: make kobj_type structures constantThomas Weißschuh1-5/+5
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-09f2fs: fix to do sanity check on extent cache correctlyChao Yu3-16/+32
In do_read_inode(), sanity check for extent cache should be called after f2fs_init_read_extent_tree(), fix it. Fixes: 72840cccc0a1 ("f2fs: allocate the extent_cache by default") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-08f2fs: stop calling fscrypt_add_test_dummy_key()Eric Biggers1-6/+0
Now that fs/crypto/ adds the test dummy encryption key on-demand when it's needed, there's no need for individual filesystems to call fscrypt_add_test_dummy_key(). Remove the call to it from f2fs. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230208062107.199831-4-ebiggers@kernel.org
2023-02-07f2fs: fix to set ipu policyYangtao Li3-5/+29
For LFS mode, it should update outplace and no need inplace update. When using LFS mode for small-volume devices, IPU will not be used, and the OPU writing method is actually used, but F2FS_IPU_FORCE can be read from the ipu_policy node, which is different from the actual situation. And remount to lfs mode should be disallowed when f2fs ipu is enabled, let's fix it. Fixes: 84b89e5d943d ("f2fs: add auto tuning for small devices") Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-07f2fs: fix typos in commentsJinyoung CHOI7-14/+14
This patch is to fix typos in f2fs files. Signed-off-by: Jinyoung Choi <j-young.choi@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-07f2fs: fix kernel crash due to null io->bioJaegeuk Kim1-0/+4
We should return when io->bio is null before doing anything. Otherwise, panic. BUG: kernel NULL pointer dereference, address: 0000000000000010 RIP: 0010:__submit_merged_write_cond+0x164/0x240 [f2fs] Call Trace: <TASK> f2fs_submit_merged_write+0x1d/0x30 [f2fs] commit_checkpoint+0x110/0x1e0 [f2fs] f2fs_write_checkpoint+0x9f7/0xf00 [f2fs] ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs] __checkpoint_and_complete_reqs+0x84/0x190 [f2fs] ? preempt_count_add+0x82/0xc0 ? __pfx_issue_checkpoint_thread+0x10/0x10 [f2fs] issue_checkpoint_thread+0x4c/0xf0 [f2fs] ? __pfx_autoremove_wake_function+0x10/0x10 kthread+0xff/0x130 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> Cc: stable@vger.kernel.org # v5.18+ Fixes: 64bf0eef0171 ("f2fs: pass the bio operation to bio_alloc_bioset") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-07f2fs: use iostat_lat_type directly as a parameter in the ↵Yangtao Li3-38/+33
iostat_update_and_unbind_ctx() Convert to use iostat_lat_type as parameter instead of raw number. BTW, move NUM_PREALLOC_IOSTAT_CTXS to the header file, adjust iostat_lat[{0,1,2}] to iostat_lat[{READ_IO,WRITE_SYNC_IO,WRITE_ASYNC_IO}] in tracepoint function, and rename iotype to page_type to match the definition. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-07f2fs: add sysfs nodes to set last_age_weightqixiaoyu13-6/+21
Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com> Signed-off-by: xiongping1 <xiongping1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-06f2fs: fix f2fs_show_options to show nogc_merge mount optionYangtao Li1-0/+2
Commit 5911d2d1d1a3 ("f2fs: introduce gc_merge mount option") forgot to show nogc_merge option, let's fix it. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-06f2fs: fix cgroup writeback accounting with fs-layer encryptionEric Biggers1-3/+3
When writing a page from an encrypted file that is using filesystem-layer encryption (not inline encryption), f2fs encrypts the pagecache page into a bounce page, then writes the bounce page. It also passes the bounce page to wbc_account_cgroup_owner(). That's incorrect, because the bounce page is a newly allocated temporary page that doesn't have the memory cgroup of the original pagecache page. This makes wbc_account_cgroup_owner() not account the I/O to the owner of the pagecache page as it should. Fix this by always passing the pagecache page to wbc_account_cgroup_owner(). Fixes: 578c647879f7 ("f2fs: implement cgroup writeback support") Cc: stable@vger.kernel.org Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-06f2fs: fix wrong calculation of block ageqixiaoyu11-3/+10
Currently we wrongly calculate the new block age to old * LAST_AGE_WEIGHT / 100. Fix it to new * (100 - LAST_AGE_WEIGHT) / 100 + old * LAST_AGE_WEIGHT / 100. Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com> Signed-off-by: xiongping1 <xiongping1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-03f2fs: convert f2fs_sync_meta_pages() to use filemap_get_folios_tag()Vishal Moola (Oracle)1-23/+26
Convert function to use folios throughout. This is in preparation for the removal of find_get_pages_range_tag(). This change removes 5 calls to compound_head(). Initially the function was checking if the previous page index is truly the previous page i.e. 1 index behind the current page. To convert to folios and maintain this check we need to make the check folio->index != prev + folio_nr_pages(previous folio) since we don't know how many pages are in a folio. At index i == 0 the check is guaranteed to succeed, so to workaround indexing bounds we can simply ignore the check for that specific index. This makes the initial assignment of prev trivial, so I removed that as well. Also modify a comment in commit_checkpoint for consistency. Link: https://lkml.kernel.org/r/20230104211448.4804-17-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03f2fs: convert last_fsync_dnode() to use filemap_get_folios_tag()Vishal Moola (Oracle)1-9/+10
Convert to use a folio_batch instead of pagevec. This is in preparation for the removal of find_get_pages_range_tag(). Link: https://lkml.kernel.org/r/20230104211448.4804-16-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03f2fs: convert f2fs_write_cache_pages() to use filemap_get_folios_tag()Vishal Moola (Oracle)1-26/+58
Convert the function to use a folio_batch instead of pagevec. This is in preparation for the removal of find_get_pages_range_tag(). Also modified f2fs_all_cluster_page_ready to take in a folio_batch instead of pagevec. This does NOT support large folios. The function currently only utilizes folios of size 1 so this shouldn't cause any issues right now. This version of the patch limits the number of pages fetched to F2FS_ONSTACK_PAGES. If that ever happens, update the start index here since filemap_get_folios_tag() updates the index to be after the last found folio, not necessarily the last used page. Link: https://lkml.kernel.org/r/20230104211448.4804-15-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03f2fs: convert f2fs_sync_node_pages() to use filemap_get_folios_tag()Vishal Moola (Oracle)1-8/+9
Convert function to use a folio_batch instead of pagevec. This is in preparation for the removal of find_get_pages_range_tag(). Link: https://lkml.kernel.org/r/20230104211448.4804-14-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-03f2fs: convert f2fs_flush_inline_data() to use filemap_get_folios_tag()Vishal Moola (Oracle)1-8/+9
Convert function to use a folio_batch instead of pagevec. This is in preparation for the removal of find_get_pages_tag(). Link: https://lkml.kernel.org/r/20230104211448.4804-13-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>