summaryrefslogtreecommitdiff
path: root/fs/f2fs/super.c
AgeCommit message (Collapse)AuthorFilesLines
2023-12-08f2fs: Restrict max filesize for 16K f2fsDaniel Rosenberg1-0/+8
Blocks are tracked by u32, so the max permitted filesize is (U32_MAX + 1) * BLOCK_SIZE. Additionally, in order to support crypto data unit sizes of 4K with a 16K block with IV_INO_LBLK_{32,64}, we must further restrict max filesize to (U32_MAX + 1) * 4096. This does not affect 4K blocksize f2fs as the natural limit for files are well below that. Fixes: d7e9a9037de2 ("f2fs: Support Block Size == Page Size") Signed-off-by: Daniel Rosenberg <drosen@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-05f2fs: check write pointers when checkpoint=disableJaegeuk Kim1-2/+1
Even if f2fs was rebooted as staying checkpoint=disable, let's match the write pointers all the time. Reviewed-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-12-01f2fs: allow checkpoint=disable for zoned block deviceJaegeuk Kim1-5/+0
Let's allow checkpoint=disable back for zoned block device. It's very risky as the feature relies on fsck or runtime recovery which matches the write pointers again if the device rebooted while disabling the checkpoint. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-11-07Merge tag 'vfs-6.7.fsid' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fanotify fsid updates from Christian Brauner: "This work is part of the plan to enable fanotify to serve as a drop-in replacement for inotify. While inotify is availabe on all filesystems, fanotify currently isn't. In order to support fanotify on all filesystems two things are needed: (1) all filesystems need to support AT_HANDLE_FID (2) all filesystems need to report a non-zero f_fsid This contains (1) and allows filesystems to encode non-decodable file handlers for fanotify without implementing any exportfs operations by encoding a file id of type FILEID_INO64_GEN from i_ino and i_generation. Filesystems that want to opt out of encoding non-decodable file ids for fanotify that don't support NFS export can do so by providing an empty export_operations struct. This also partially addresses (2) by generating f_fsid for simple filesystems as well as freevxfs. Remaining filesystems will be dealt with by separate patches. Finally, this contains the patch from the current exportfs maintainers which moves exportfs under vfs with Chuck, Jeff, and Amir as maintainers and vfs.git as tree" * tag 'vfs-6.7.fsid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: MAINTAINERS: create an entry for exportfs fs: fix build error with CONFIG_EXPORTFS=m or not defined freevxfs: derive f_fsid from bdev->bd_dev fs: report f_fsid from s_dev for "simple" filesystems exportfs: support encoding non-decodeable file handles by default exportfs: define FILEID_INO64_GEN* file handle types exportfs: make ->encode_fh() a mandatory method for NFS export exportfs: add helpers to check if filesystem can encode/decode file handles
2023-11-04Merge tag 'f2fs-for-6.7-rc1' of ↵Linus Torvalds1-30/+68
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this cycle, we introduce a bigger page size support by changing the internal f2fs's block size aligned to the page size. We also continue to improve zoned block device support regarding the power off recovery. As usual, there are some bug fixes regarding the error handling routines in compression and ioctl. Enhancements: - Support Block Size == Page Size - let f2fs_precache_extents() traverses in file range - stop iterating f2fs_map_block if hole exists - preload extent_cache for POSIX_FADV_WILLNEED - compress: fix to avoid fragment w/ OPU during f2fs_ioc_compress_file() Bug fixes: - do not return EFSCORRUPTED, but try to run online repair - finish previous checkpoints before returning from remount - fix error handling of __get_node_page and __f2fs_build_free_nids - clean up zones when not successfully unmounted - fix to initialize map.m_pblk in f2fs_precache_extents() - fix to drop meta_inode's page cache in f2fs_put_super() - set the default compress_level on ioctl - fix to avoid use-after-free on dic - fix to avoid redundant compress extension - do sanity check on cluster when CONFIG_F2FS_CHECK_FS is on - fix deadloop in f2fs_write_cache_pages()" * tag 'f2fs-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: finish previous checkpoints before returning from remount f2fs: fix error handling of __get_node_page f2fs: do not return EFSCORRUPTED, but try to run online repair f2fs: fix error path of __f2fs_build_free_nids f2fs: Clean up errors in segment.h f2fs: clean up zones when not successfully unmounted f2fs: let f2fs_precache_extents() traverses in file range f2fs: avoid format-overflow warning f2fs: fix to initialize map.m_pblk in f2fs_precache_extents() f2fs: Support Block Size == Page Size f2fs: stop iterating f2fs_map_block if hole exists f2fs: preload extent_cache for POSIX_FADV_WILLNEED f2fs: set the default compress_level on ioctl f2fs: compress: fix to avoid fragment w/ OPU during f2fs_ioc_compress_file() f2fs: fix to drop meta_inode's page cache in f2fs_put_super() f2fs: split initial and dynamic conditions for extent_cache f2fs: compress: fix to avoid redundant compress extension f2fs: compress: do sanity check on cluster when CONFIG_F2FS_CHECK_FS is on f2fs: compress: fix to avoid use-after-free on dic f2fs: compress: fix deadloop in f2fs_write_cache_pages()
2023-11-03Merge tag 'mm-stable-2023-11-01-14-33' of ↵Linus Torvalds1-8/+23
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Kemeng Shi has contributed some compation maintenance work in the series 'Fixes and cleanups to compaction' - Joel Fernandes has a patchset ('Optimize mremap during mutual alignment within PMD') which fixes an obscure issue with mremap()'s pagetable handling during a subsequent exec(), based upon an implementation which Linus suggested - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the following patch series: mm/damon: misc fixups for documents, comments and its tracepoint mm/damon: add a tracepoint for damos apply target regions mm/damon: provide pseudo-moving sum based access rate mm/damon: implement DAMOS apply intervals mm/damon/core-test: Fix memory leaks in core-test mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval - In the series 'Do not try to access unaccepted memory' Adrian Hunter provides some fixups for the recently-added 'unaccepted memory' feature. To increase the feature's checking coverage. 'Plug a few gaps where RAM is exposed without checking if it is unaccepted memory' - In the series 'cleanups for lockless slab shrink' Qi Zheng has done some maintenance work which is preparation for the lockless slab shrinking code - Qi Zheng has redone the earlier (and reverted) attempt to make slab shrinking lockless in the series 'use refcount+RCU method to implement lockless slab shrink' - David Hildenbrand contributes some maintenance work for the rmap code in the series 'Anon rmap cleanups' - Kefeng Wang does more folio conversions and some maintenance work in the migration code. Series 'mm: migrate: more folio conversion and unification' - Matthew Wilcox has fixed an issue in the buffer_head code which was causing long stalls under some heavy memory/IO loads. Some cleanups were added on the way. Series 'Add and use bdev_getblk()' - In the series 'Use nth_page() in place of direct struct page manipulation' Zi Yan has fixed a potential issue with the direct manipulation of hugetlb page frames - In the series 'mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO' has improved our handling of gigantic pages in the hugetlb vmmemmep optimizaton code. This provides significant boot time improvements when significant amounts of gigantic pages are in use - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code rationalization and folio conversions in the hugetlb code - Yin Fengwei has improved mlock()'s handling of large folios in the series 'support large folio for mlock' - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has added statistics for memcg v1 users which are available (and useful) under memcg v2 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable) prctl so that userspace may direct the kernel to not automatically propagate the denial to child processes. The series is named 'MDWE without inheritance' - Kefeng Wang has provided the series 'mm: convert numa balancing functions to use a folio' which does what it says - In the series 'mm/ksm: add fork-exec support for prctl' Stefan Roesch makes is possible for a process to propagate KSM treatment across exec() - Huang Ying has enhanced memory tiering's calculation of memory distances. This is used to permit the dax/kmem driver to use 'high bandwidth memory' in addition to Optane Data Center Persistent Memory Modules (DCPMM). The series is named 'memory tiering: calculate abstract distance based on ACPI HMAT' - In the series 'Smart scanning mode for KSM' Stefan Roesch has optimized KSM by teaching it to retain and use some historical information from previous scans - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the series 'mm: memcg: fix tracking of pending stats updates values' - In the series 'Implement IOCTL to get and optionally clear info about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits us to atomically read-then-clear page softdirty state. This is mainly used by CRIU - Hugh Dickins contributed the series 'shmem,tmpfs: general maintenance', a bunch of relatively minor maintenance tweaks to this code - Matthew Wilcox has increased the use of the VMA lock over file-backed page faults in the series 'Handle more faults under the VMA lock'. Some rationalizations of the fault path became possible as a result - In the series 'mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap()' David Hildenbrand has implemented some cleanups and folio conversions - In the series 'various improvements to the GUP interface' Lorenzo Stoakes has simplified and improved the GUP interface with an eye to providing groundwork for future improvements - Andrey Konovalov has sent along the series 'kasan: assorted fixes and improvements' which does those things - Some page allocator maintenance work from Kemeng Shi in the series 'Two minor cleanups to break_down_buddy_pages' - In thes series 'New selftest for mm' Breno Leitao has developed another MM self test which tickles a race we had between madvise() and page faults - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups and an optimization to the core pagecache code - Nhat Pham has added memcg accounting for hugetlb memory in the series 'hugetlb memcg accounting' - Cleanups and rationalizations to the pagemap code from Lorenzo Stoakes, in the series 'Abstract vma_merge() and split_vma()' - Audra Mitchell has fixed issues in the procfs page_owner code's new timestamping feature which was causing some misbehaviours. In the series 'Fix page_owner's use of free timestamps' - Lorenzo Stoakes has fixed the handling of new mappings of sealed files in the series 'permit write-sealed memfd read-only shared mappings' - Mike Kravetz has optimized the hugetlb vmemmap optimization in the series 'Batch hugetlb vmemmap modification operations' - Some buffer_head folio conversions and cleanups from Matthew Wilcox in the series 'Finish the create_empty_buffers() transition' - As a page allocator performance optimization Huang Ying has added automatic tuning to the allocator's per-cpu-pages feature, in the series 'mm: PCP high auto-tuning' - Roman Gushchin has contributed the patchset 'mm: improve performance of accounted kernel memory allocations' which improves their performance by ~30% as measured by a micro-benchmark - folio conversions from Kefeng Wang in the series 'mm: convert page cpupid functions to folios' - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about kmemleak' - Qi Zheng has improved our handling of memoryless nodes by keeping them off the allocation fallback list. This is done in the series 'handle memoryless nodes more appropriately' - khugepaged conversions from Vishal Moola in the series 'Some khugepaged folio conversions'" [ bcachefs conflicts with the dynamically allocated shrinkers have been resolved as per Stephen Rothwell in https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/ with help from Qi Zheng. The clone3 test filtering conflict was half-arsed by yours truly ] * tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits) mm/damon/sysfs: update monitoring target regions for online input commit mm/damon/sysfs: remove requested targets when online-commit inputs selftests: add a sanity check for zswap Documentation: maple_tree: fix word spelling error mm/vmalloc: fix the unchecked dereference warning in vread_iter() zswap: export compression failure stats Documentation: ubsan: drop "the" from article title mempolicy: migration attempt to match interleave nodes mempolicy: mmap_lock is not needed while migrating folios mempolicy: alloc_pages_mpol() for NUMA policy without vma mm: add page_rmappable_folio() wrapper mempolicy: remove confusing MPOL_MF_LAZY dead code mempolicy: mpol_shared_policy_init() without pseudo-vma mempolicy trivia: use pgoff_t in shared mempolicy tree mempolicy trivia: slightly more consistent naming mempolicy trivia: delete those ancient pr_debug()s mempolicy: fix migrate_pages(2) syscall return nr_failed kernfs: drop shared NUMA mempolicy hooks hugetlbfs: drop shared NUMA mempolicy pretence mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets() ...
2023-10-30Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linuxLinus Torvalds1-9/+4
Pull fscrypt updates from Eric Biggers: "This update adds support for configuring the crypto data unit size (i.e. the granularity of file contents encryption) to be less than the filesystem block size. This can allow users to use inline encryption hardware in some cases when it wouldn't otherwise be possible. In addition, there are two commits that are prerequisites for the extent-based encryption support that the btrfs folks are working on" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: track master key presence separately from secret fscrypt: rename fscrypt_info => fscrypt_inode_info fscrypt: support crypto data unit size less than filesystem block size fscrypt: replace get_ino_and_lblk_bits with just has_32bit_inodes fscrypt: compute max_lblk_bits from s_maxbytes and block size fscrypt: make the bounce page pool opt-in instead of opt-out fscrypt: make it clearer that key_prefix is deprecated
2023-10-30Merge tag 'vfs-6.7.ctime' of ↵Linus Torvalds1-1/+1
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode time accessor updates from Christian Brauner: "This finishes the conversion of all inode time fields to accessor functions as discussed on list. Changing timestamps manually as we used to do before is error prone. Using accessors function makes this robust. It does not contain the switch of the time fields to discrete 64 bit integers to replace struct timespec and free up space in struct inode. But after this, the switch can be trivially made and the patch should only affect the vfs if we decide to do it" * tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (86 commits) fs: rename inode i_atime and i_mtime fields security: convert to new timestamp accessors selinux: convert to new timestamp accessors apparmor: convert to new timestamp accessors sunrpc: convert to new timestamp accessors mm: convert to new timestamp accessors bpf: convert to new timestamp accessors ipc: convert to new timestamp accessors linux: convert to new timestamp accessors zonefs: convert to new timestamp accessors xfs: convert to new timestamp accessors vboxsf: convert to new timestamp accessors ufs: convert to new timestamp accessors udf: convert to new timestamp accessors ubifs: convert to new timestamp accessors tracefs: convert to new timestamp accessors sysv: convert to new timestamp accessors squashfs: convert to new timestamp accessors server: convert to new timestamp accessors client: convert to new timestamp accessors ...
2023-10-28exportfs: make ->encode_fh() a mandatory method for NFS exportAmir Goldstein1-0/+1
Rename the default helper for encoding FILEID_INO32_GEN* file handles to generic_encode_ino32_fh() and convert the filesystems that used the default implementation to use the generic helper explicitly. After this change, exportfs_encode_inode_fh() no longer has a default implementation to encode FILEID_INO32_GEN* file handles. This is a step towards allowing filesystems to encode non-decodeable file handles for fanotify without having to implement any export_operations. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-3-amir73il@gmail.com Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28f2fs: Convert to bdev_open_by_dev/path()Jan Kara1-6/+7
Convert f2fs to use bdev_open_by_dev/path() and pass the handle around. CC: Jaegeuk Kim <jaegeuk@kernel.org> CC: Chao Yu <chao@kernel.org> CC: linux-f2fs-devel@lists.sourceforge.net Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-23-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-22f2fs: finish previous checkpoints before returning from remountDaeho Jeong1-27/+32
Flush remaining checkpoint requests at the end of remount, since a new checkpoint would be triggered while remount and we need to take care of it before returning from remount, in order to avoid the below race condition. - Thread - checkpoint thread do_remount() down_write(&sb->s_umount); f2fs_remount() f2fs_disable_checkpoint(sbi) -> add checkpoints to the list block_operations() down_read_trylock(&sb->s_umount) = 0 up_write(&sb->s_umount); f2fs_quota_sync() dquot_writeback_dquots() WARN_ON_ONCE(!rwsem_is_locked(&sb->s_umount)); Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-10-18f2fs: convert to new timestamp accessorsJeff Layton1-1/+1
Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-34-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-05f2fs: Support Block Size == Page SizeDaniel Rosenberg1-2/+2
This allows f2fs to support cases where the block size = page size for both 4K and 16K block sizes. Other sizes should work as well, should the need arise. This does not currently support 4K Block size filesystems if the page size is 16K. Signed-off-by: Daniel Rosenberg <drosen@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-10-04f2fs: dynamically allocate the f2fs-shrinkerQi Zheng1-8/+23
Use new APIs to dynamically allocate the f2fs-shrinker. Link: https://lkml.kernel.org/r/20230911094444.68966-8-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Chao Yu <chao@kernel.org> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Chuck Lever <cel@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: Dai Ngo <Dai.Ngo@oracle.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Sterba <dsterba@suse.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Rui <ray.huang@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nadav Amit <namit@vmware.com> Cc: Neil Brown <neilb@suse.de> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Clark <robdclark@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sean Paul <sean@poorly.run> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Song Liu <song@kernel.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Tom Talpey <tom@talpey.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Yue Hu <huyue2@coolpad.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-26fscrypt: support crypto data unit size less than filesystem block sizeEric Biggers1-0/+1
Until now, fscrypt has always used the filesystem block size as the granularity of file contents encryption. Two scenarios have come up where a sub-block granularity of contents encryption would be useful: 1. Inline crypto hardware that only supports a crypto data unit size that is less than the filesystem block size. 2. Support for direct I/O at a granularity less than the filesystem block size, for example at the block device's logical block size in order to match the traditional direct I/O alignment requirement. (1) first came up with older eMMC inline crypto hardware that only supports a crypto data unit size of 512 bytes. That specific case ultimately went away because all systems with that hardware continued using out of tree code and never actually upgraded to the upstream inline crypto framework. But, now it's coming back in a new way: some current UFS controllers only support a data unit size of 4096 bytes, and there is a proposal to increase the filesystem block size to 16K. (2) was discussed as a "nice to have" feature, though not essential, when support for direct I/O on encrypted files was being upstreamed. Still, the fact that this feature has come up several times does suggest it would be wise to have available. Therefore, this patch implements it by using one of the reserved bytes in fscrypt_policy_v2 to allow users to select a sub-block data unit size. Supported data unit sizes are powers of 2 between 512 and the filesystem block size, inclusively. Support is implemented for both the FS-layer and inline crypto cases. This patch focuses on the basic support for sub-block data units. Some things are out of scope for this patch but may be addressed later: - Supporting sub-block data units in combination with FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64, in most cases. Unfortunately this combination usually causes data unit indices to exceed 32 bits, and thus fscrypt_supported_policy() correctly disallows it. The users who potentially need this combination are using f2fs. To support it, f2fs would need to provide an option to slightly reduce its max file size. - Supporting sub-block data units in combination with FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32. This has the same problem described above, but also it will need special code to make DUN wraparound still happen on a FS block boundary. - Supporting use case (2) mentioned above. The encrypted direct I/O code will need to stop requiring and assuming FS block alignment. This won't be hard, but it belongs in a separate patch. - Supporting this feature on filesystems other than ext4 and f2fs. (Filesystems declare support for it via their fscrypt_operations.) On UBIFS, sub-block data units don't make sense because UBIFS encrypts variable-length blocks as a result of compression. CephFS could support it, but a bit more work would be needed to make the fscrypt_*_block_inplace functions play nicely with sub-block data units. I don't think there's a use case for this on CephFS anyway. Link: https://lore.kernel.org/r/20230925055451.59499-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-09-26fscrypt: replace get_ino_and_lblk_bits with just has_32bit_inodesEric Biggers1-8/+1
Now that fs/crypto/ computes the filesystem's lblk_bits from its maximum file size, it is no longer necessary for filesystems to provide lblk_bits via fscrypt_operations::get_ino_and_lblk_bits. It is still necessary for fs/crypto/ to retrieve ino_bits from the filesystem. However, this is used only to decide whether inode numbers fit in 32 bits. Also, ino_bits is static for all relevant filesystems, i.e. it doesn't depend on the filesystem instance. Therefore, in the interest of keeping things as simple as possible, replace 'get_ino_and_lblk_bits' with a flag 'has_32bit_inodes'. This can always be changed back to a function if a filesystem needs it to be dynamic, but for now a static flag is all that's needed. Link: https://lore.kernel.org/r/20230925055451.59499-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-09-25fscrypt: make the bounce page pool opt-in instead of opt-outEric Biggers1-0/+1
Replace FS_CFLG_OWN_PAGES with a bit flag 'needs_bounce_pages' which has the opposite meaning. I.e., filesystems now opt into the bounce page pool instead of opt out. Make fscrypt_alloc_bounce_page() check that the bounce page pool has been initialized. I believe the opt-in makes more sense, since nothing else in fscrypt_operations is opt-out, and these days filesystems can choose to use blk-crypto which doesn't need the fscrypt bounce page pool. Also, I happen to be planning to add two more flags, and I wanted to fix the "FS_CFLG_" name anyway as it wasn't prefixed with "FSCRYPT_". Link: https://lore.kernel.org/r/20230925055451.59499-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-09-25fscrypt: make it clearer that key_prefix is deprecatedEric Biggers1-1/+1
fscrypt_operations::key_prefix should not be set by any filesystems that aren't setting it already. This is already documented, but apparently it's not sufficiently clear, as both ceph and btrfs have tried to set it. Rename the field to legacy_key_prefix and improve the documentation to hopefully make it clearer. Link: https://lore.kernel.org/r/20230925055451.59499-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-09-12f2fs: fix to drop meta_inode's page cache in f2fs_put_super()Chao Yu1-1/+1
syzbot reports a kernel bug as below: F2FS-fs (loop1): detect filesystem reference count leak during umount, type: 10, count: 1 kernel BUG at fs/f2fs/super.c:1639! CPU: 0 PID: 15451 Comm: syz-executor.1 Not tainted 6.5.0-syzkaller-09338-ge0152e7481c6 #0 RIP: 0010:f2fs_put_super+0xce1/0xed0 fs/f2fs/super.c:1639 Call Trace: generic_shutdown_super+0x161/0x3c0 fs/super.c:693 kill_block_super+0x3b/0x70 fs/super.c:1646 kill_f2fs_super+0x2b7/0x3d0 fs/f2fs/super.c:4879 deactivate_locked_super+0x9a/0x170 fs/super.c:481 deactivate_super+0xde/0x100 fs/super.c:514 cleanup_mnt+0x222/0x3d0 fs/namespace.c:1254 task_work_run+0x14d/0x240 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x210/0x240 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x1d/0x60 kernel/entry/common.c:296 do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd In f2fs_put_super(), it tries to do sanity check on dirty and IO reference count of f2fs, once there is any reference count leak, it will trigger panic. The root case is, during f2fs_put_super(), if there is any IO error in f2fs_wait_on_all_pages(), we missed to truncate meta_inode's page cache later, result in panic, fix this case. Fixes: 20872584b8c0 ("f2fs: fix to drop all dirty meta/node pages during umount()") Reported-by: syzbot+ebd7072191e2eddd7d6e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000a14f020604a62a98@google.com Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-09-12f2fs: compress: fix to avoid redundant compress extensionChao Yu1-0/+33
With below script, redundant compress extension will be parsed and added by parse_options(), because parse_options() doesn't check whether the extension is existed or not, fix it. 1. mount -t f2fs -o compress_extension=so /dev/vdb /mnt/f2fs 2. mount -t f2fs -o remount,compress_extension=so /mnt/f2fs 3. mount|grep f2fs /dev/vdb on /mnt/f2fs type f2fs (...,compress_extension=so,compress_extension=so,...) Fixes: 4c8ff7095bef ("f2fs: support data compression") Fixes: 151b1982be5d ("f2fs: compress: add nocompress extensions support") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-09-03Merge tag 'f2fs-for-6-6-rc1' of ↵Linus Torvalds1-18/+22
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this cycle, we don't have a highlighted feature enhancement, but mostly have fixed issues mainly in two parts: 1) zoned block device, and 2) compression support. For zoned block device, we've tried to improve the power-off recovery flow as much as possible. For compression, we found some corner cases caused by wrong compression policy and logics. Other than them, there were some reverts and stat corrections. Bug fixes: - use finish zone command when closing a zone - check zone type before sending async reset zone command - fix to assign compress_level for lz4 correctly - fix error path of f2fs_submit_page_read() - don't {,de}compress non-full cluster - send small discard commands during checkpoint back - flush inode if atomic file is aborted - correct to account gc/cp stats And, there are minor bug fixes, avoiding false lockdep warning, and clean-ups" * tag 'f2fs-for-6-6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (25 commits) f2fs: use finish zone command when closing a zone f2fs: compress: fix to assign compress_level for lz4 correctly f2fs: fix error path of f2fs_submit_page_read() f2fs: clean up error handling in sanity_check_{compress_,}inode() f2fs: avoid false alarm of circular locking Revert "f2fs: do not issue small discard commands during checkpoint" f2fs: doc: fix description of max_small_discards f2fs: should update REQ_TIME for direct write f2fs: fix to account cp stats correctly f2fs: fix to account gc stats correctly f2fs: remove unneeded check condition in __f2fs_setxattr() f2fs: fix to update i_ctime in __f2fs_setxattr() Revert "f2fs: fix to do sanity check on extent cache correctly" f2fs: increase usage of folio_next_index() helper f2fs: Only lfs mode is allowed with zoned block device feature f2fs: check zone type before sending async reset zone command f2fs: compress: don't {,de}compress non-full cluster f2fs: allow f2fs_ioc_{,de}compress_file to be interrupted f2fs: don't reopen the main block device in f2fs_scan_devices f2fs: fix to avoid mmap vs set_compress_option case ...
2023-08-28Merge tag 'v6.6-vfs.super' of ↵Linus Torvalds1-4/+3
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull superblock updates from Christian Brauner: "This contains the super rework that was ready for this cycle. The first part changes the order of how we open block devices and allocate superblocks, contains various cleanups, simplifications, and a new mechanism to wait on superblock state changes. This unblocks work to ultimately limit the number of writers to a block device. Jan has already scheduled follow-up work that will be ready for v6.7 and allows us to restrict the number of writers to a given block device. That series builds on this work right here. The second part contains filesystem freezing updates. Overview: The generic superblock changes are rougly organized as follows (ignoring additional minor cleanups): (1) Removal of the bd_super member from struct block_device. This was a very odd back pointer to struct super_block with unclear rules. For all relevant places we have other means to get the same information so just get rid of this. (2) Simplify rules for superblock cleanup. Roughly, everything that is allocated during fs_context initialization and that's stored in fs_context->s_fs_info needs to be cleaned up by the fs_context->free() implementation before the superblock allocation function has been called successfully. After sget_fc() returned fs_context->s_fs_info has been transferred to sb->s_fs_info at which point sb->kill_sb() if fully responsible for cleanup. Adhering to these rules means that cleanup of sb->s_fs_info in fill_super() is to be avoided as it's brittle and inconsistent. Cleanup shouldn't be duplicated between sb->put_super() as sb->put_super() is only called if sb->s_root has been set aka when the filesystem has been successfully born (SB_BORN). That complexity should be avoided. This also means that block devices are to be closed in sb->kill_sb() instead of sb->put_super(). More details in the lower section. (3) Make it possible to lookup or create a superblock before opening block devices There's a subtle dependency on (2) as some filesystems did rely on fill_super() to be called in order to correctly clean up sb->s_fs_info. All these filesystems have been fixed. (4) Switch most filesystem to follow the same logic as the generic mount code now does as outlined in (3). (5) Use the superblock as the holder of the block device. We can now easily go back from block device to owning superblock. (6) Export and extend the generic fs_holder_ops and use them as holder ops everywhere and remove the filesystem specific holder ops. (7) Call from the block layer up into the filesystem layer when the block device is removed, allowing to shut down the filesystem without risk of deadlocks. (8) Get rid of get_super(). We can now easily go back from the block device to owning superblock and can call up from the block layer into the filesystem layer when the device is removed. So no need to wade through all registered superblock to find the owning superblock anymore" Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/ * tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits) super: use higher-level helper for {freeze,thaw} super: wait until we passed kill super super: wait for nascent superblocks super: make locking naming consistent super: use locking helpers fs: simplify invalidate_inodes fs: remove get_super block: call into the file system for ioctl BLKFLSBUF block: call into the file system for bdev_mark_dead block: consolidate __invalidate_device and fsync_bdev block: drop the "busy inodes on changed media" log message dasd: also call __invalidate_device when setting the device offline amiflop: don't call fsync_bdev in FDFMTBEG floppy: call disk_force_media_change when changing the format block: simplify the disk_force_media_change interface nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl xfs use fs_holder_ops for the log and RT devices xfs: drop s_umount over opening the log and RT devices ext4: use fs_holder_ops for the log device ext4: drop s_umount over opening the log device ...
2023-08-23f2fs: compress: fix to assign compress_level for lz4 correctlyChao Yu1-1/+1
After remount, F2FS_OPTION().compress_level was assgin to LZ4HC_DEFAULT_CLEVEL incorrectly, result in lz4hc:9 was enabled, fix it. 1. mount /dev/vdb /dev/vdb on /mnt/f2fs type f2fs (...,compress_algorithm=lz4,compress_log_size=2,...) 2. mount -t f2fs -o remount,compress_log_size=3 /mnt/f2fs/ 3. mount|grep f2fs /dev/vdb on /mnt/f2fs type f2fs (...,compress_algorithm=lz4:9,compress_log_size=3,...) Fixes: 00e120b5e4b5 ("f2fs: assign default compression level") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-14f2fs: fix to account cp stats correctlyChao Yu1-1/+7
cp_foreground_calls sysfs entry shows total CP call count rather than foreground CP call count, fix it. Fixes: fc7100ea2a52 ("f2fs: Add f2fs stats to sysfs") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-14f2fs: fix to account gc stats correctlyChao Yu1-0/+1
As reported, status debugfs entry shows inconsistent GC stats as below: GC calls: 6008 (BG: 6161) - data segments : 3053 (BG: 3053) - node segments : 2955 (BG: 2955) Total GC calls is larger than BGGC calls, the reason is: - f2fs_stat_info.call_count accounts total migrated section count by f2fs_gc() - f2fs_stat_info.bg_gc accounts total call times of f2fs_gc() from background gc_thread Another issue is gc_foreground_calls sysfs entry shows total GC call count rather than FGGC call count. This patch changes as below for fix: - account GC calls and migrated segment count separately - support to account migrated section count if it enables large section mode - fix to show correct value in gc_foreground_calls sysfs entry Fixes: fc7100ea2a52 ("f2fs: Add f2fs stats to sysfs") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-14f2fs: Only lfs mode is allowed with zoned block device featureChunhai Guo1-5/+5
Now f2fs support four block allocation modes: lfs, adaptive, fragment:segment, fragment:block. Only lfs mode is allowed with zoned block device feature. Fixes: 6691d940b0e0 ("f2fs: introduce fragment allocation mode mount option") Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-14f2fs: don't reopen the main block device in f2fs_scan_devicesChristoph Hellwig1-12/+8
f2fs_scan_devices reopens the main device since the very beginning, which has always been useless, and also means that we don't pass the right holder for the reopen, which now leads to a warning as the core super.c holder ops aren't passed in for the reopen. Fixes: 3c62be17d4f5 ("f2fs: support multiple devices") Fixes: 0718afd47f70 ("block: introduce holder ops") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-11fs: use the super_block as holder when mounting file systemsChristoph Hellwig1-4/+3
The file system type is not a very useful holder as it doesn't allow us to go back to the actual file system instance. Pass the super_block instead which is useful when passed back to the file system driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christian Brauner <brauner@kernel.org> Message-Id: <20230802154131.2221419-7-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-24f2fs: convert to ctime accessor functionsJeff Layton1-1/+1
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-41-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-06Merge tag 'f2fs-for-6.5-rc1' of ↵Linus Torvalds1-38/+214
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this cycle, we've mainly investigated the zoned block device support along with patches such as correcting write pointers between f2fs and storage, adding asynchronous zone reset flow, and managing the number of open zones. Other than them, f2fs adds another mount option, "errors=x" to specify how to handle when it detects an unexpected behavior at runtime. Enhancements: - support 'errors=remount-ro|continue|panic' mount option - enforce some inode flag policies - allow .tmp compression given extensions - add some ioctls to manage the f2fs compression - improve looped node chain flow - avoid issuing small-sized discard commands during checkpoint - implement an asynchronous zone reset Bug fixes: - fix deadlock in xattr and inode page lock - fix and add sanity check in some error paths - fix to avoid NULL pointer dereference f2fs_write_end_io() along with put_super - set proper flags to quota files - fix potential deadlock due to unpaired node_write lock use - fix over-estimating free section during FG GC - fix the wrong condition to determine atomic context As usual, also there are a number of patches with code refactoring and minor clean-ups" * tag 'f2fs-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (46 commits) f2fs: fix to do sanity check on direct node in truncate_dnode() f2fs: only set release for file that has compressed data f2fs: fix compile warning in f2fs_destroy_node_manager() f2fs: fix error path handling in truncate_dnode() f2fs: fix deadlock in i_xattr_sem and inode page lock f2fs: remove unneeded page uptodate check/set f2fs: update mtime and ctime in move file range method f2fs: compress tmp files given extension f2fs: refactor struct f2fs_attr macro f2fs: convert to use sbi directly f2fs: remove redundant assignment to variable err f2fs: do not issue small discard commands during checkpoint f2fs: check zone write pointer points to the end of zone f2fs: add f2fs_ioc_get_compress_blocks f2fs: cleanup MIN_INLINE_XATTR_SIZE f2fs: add helper to check compression level f2fs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method f2fs: do more sanity check on inode f2fs: compress: fix to check validity of i_compress_flag field f2fs: add sanity compress level check for compressed file ...
2023-06-26f2fs: cleanup MIN_INLINE_XATTR_SIZESheng Yong1-1/+1
Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-26f2fs: add helper to check compression levelSheng Yong1-2/+2
This patch adds a helper function to check if compression level is valid. Signed-off-by: Sheng Yong <shengyong@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-26f2fs: assign default compression levelJaegeuk Kim1-5/+7
Let's avoid any confusion from assigning compress_level=0 for LZ4HC and ZSTD. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-26f2fs: introduce F2FS_QUOTA_DEFAULT_FL for cleanupChao Yu1-3/+3
This patch adds F2FS_QUOTA_DEFAULT_FL to include two default flags: F2FS_NOATIME_FL and F2FS_IMMUTABLE_FL, and use it to clean up codes. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-12f2fs: avoid dead loop in f2fs_issue_checkpoint()Chao Yu1-2/+13
generic/082 reports a bug as below: __schedule+0x332/0xf60 schedule+0x6f/0xf0 schedule_timeout+0x23b/0x2a0 wait_for_completion+0x8f/0x140 f2fs_issue_checkpoint+0xfe/0x1b0 f2fs_sync_fs+0x9d/0xb0 sync_filesystem+0x87/0xb0 dquot_load_quota_sb+0x41b/0x460 dquot_load_quota_inode+0xa5/0x130 dquot_quota_on+0x4b/0x60 f2fs_quota_on+0xe3/0x1b0 do_quotactl+0x483/0x700 __x64_sys_quotactl+0x15c/0x310 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc The root casue is race case as below: Thread A Kworker IRQ - write() : write data to quota.user file - writepages - f2fs_submit_page_write - __is_cp_guaranteed return false - inc_page_count(F2FS_WB_DATA) - submit_bio - quotactl(Q_QUOTAON) - f2fs_quota_on - dquot_quota_on - dquot_load_quota_inode - vfs_setup_quota_inode : inode->i_flags |= S_NOQUOTA - f2fs_write_end_io - __is_cp_guaranteed return true - dec_page_count(F2FS_WB_CP_DATA) - dquot_load_quota_sb - f2fs_sync_fs - f2fs_issue_checkpoint - do_checkpoint - f2fs_wait_on_all_pages(F2FS_WB_CP_DATA) : loop due to F2FS_WB_CP_DATA count is negative Calling filemap_fdatawrite() and filemap_fdatawait() to keep all data clean before quota file setup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-12f2fs: fix to drop all dirty meta/node pages during umount()Chao Yu1-2/+16
For cp error case, there will be dirty meta/node pages remained after f2fs_write_checkpoint() in f2fs_put_super(), drop them explicitly, and do sanity check on reference count of dirty pages and inflight IOs. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-12f2fs: flush error flags in workqueueChao Yu1-3/+23
In IRQ context, it wakes up workqueue to record errors into on-disk superblock fields rather than in-memory fields. Fixes: 1aa161e43106 ("f2fs: fix scheduling while atomic in decompression path") Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-12f2fs: don't reset unchangable mount option in f2fs_remount()Chao Yu1-12/+18
syzbot reports a bug as below: general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:__lock_acquire+0x69/0x2000 kernel/locking/lockdep.c:4942 Call Trace: lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5691 __raw_write_lock include/linux/rwlock_api_smp.h:209 [inline] _raw_write_lock+0x2e/0x40 kernel/locking/spinlock.c:300 __drop_extent_tree+0x3ac/0x660 fs/f2fs/extent_cache.c:1100 f2fs_drop_extent_tree+0x17/0x30 fs/f2fs/extent_cache.c:1116 f2fs_insert_range+0x2d5/0x3c0 fs/f2fs/file.c:1664 f2fs_fallocate+0x4e4/0x6d0 fs/f2fs/file.c:1838 vfs_fallocate+0x54b/0x6b0 fs/open.c:324 ksys_fallocate fs/open.c:347 [inline] __do_sys_fallocate fs/open.c:355 [inline] __se_sys_fallocate fs/open.c:353 [inline] __x64_sys_fallocate+0xbd/0x100 fs/open.c:353 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The root cause is race condition as below: - since it tries to remount rw filesystem, so that do_remount won't call sb_prepare_remount_readonly to block fallocate, there may be race condition in between remount and fallocate. - in f2fs_remount(), default_options() will reset mount option to default one, and then update it based on result of parse_options(), so there is a hole which race condition can happen. Thread A Thread B - f2fs_fill_super - parse_options - clear_opt(READ_EXTENT_CACHE) - f2fs_remount - default_options - set_opt(READ_EXTENT_CACHE) - f2fs_fallocate - f2fs_insert_range - f2fs_drop_extent_tree - __drop_extent_tree - __may_extent_tree - test_opt(READ_EXTENT_CACHE) return true - write_lock(&et->lock) access NULL pointer - parse_options - clear_opt(READ_EXTENT_CACHE) Cc: <stable@vger.kernel.org> Reported-by: syzbot+d015b6c2fbb5c383bf08@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/20230522124203.3838360-1-chao@kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-12f2fs: fix to set noatime and immutable flag for quota fileChao Yu1-0/+9
We should set noatime bit for quota files, since no one cares about atime of quota file, and we should set immutalbe bit as well, due to nobody should write to the file through exported interfaces. Meanwhile this patch use inode_lock to avoid race condition during inode->i_flags, f2fs_inode->i_flags update. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-12block: replace fmode_t with a block-specific type for block open flagsChristoph Hellwig1-1/+1
The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12fs: remove sb->s_modeChristoph Hellwig1-4/+6
There is no real need to store the open mode in the super_block now. It is only used by f2fs, which can easily recalculate it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: use the holder as indication for exclusive opensChristoph Hellwig1-1/+1
The current interface for exclusive opens is rather confusing as it requires both the FMODE_EXCL flag and a holder. Remove the need to pass FMODE_EXCL and just key off the exclusive open off a non-NULL holder. For blkdev_put this requires adding the holder argument, which provides better debug checking that only the holder actually releases the hold, but at the same time allows removing the now superfluous mode argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05block: introduce holder opsChristoph Hellwig1-2/+2
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for thing like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-08f2fs: support errors=remount-ro|continue|panic mountoptionChao Yu1-10/+124
This patch supports errors=remount-ro|continue|panic mount option for f2fs. f2fs behaves as below in three different modes: mode continue remount-ro panic access ops normal noraml N/A syscall errors -EIO -EROFS N/A mount option rw ro N/A pending dir write keep keep N/A pending non-dir write drop keep N/A pending node write drop keep N/A pending meta write keep keep N/A By default it uses "continue" mode. [Yangtao helps to clean up function's name] Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-24f2fs: remove power-of-two limitation of zoned deviceJaegeuk Kim1-6/+2
In f2fs, there's no reason to force po2. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-13f2fs: fix to recover quota data correctlyChao Yu1-0/+57
With -O quota mkfs option, xfstests generic/417 fails due to fsck detects data corruption on quota inodes. [ASSERT] (fsck_chk_quota_files:2051) --> Quota file is missing or invalid quota file content found. The root cause is there is a hole f2fs doesn't hold quota inodes, so all recovered quota data will be dropped due to SBI_POR_DOING flag was set. - f2fs_fill_super - f2fs_recover_orphan_inodes - f2fs_enable_quota_files - f2fs_quota_off_umount <--- quota inodes were dropped ---> - f2fs_recover_fsync_data - f2fs_enable_quota_files - f2fs_quota_off_umount This patch tries to eliminate the hole by holding quota inodes during entire recovery flow as below: - f2fs_fill_super - f2fs_recover_quota_begin - f2fs_recover_orphan_inodes - f2fs_recover_fsync_data - f2fs_recover_quota_end Then, recovered quota data can be persisted after SBI_POR_DOING is cleared. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-13f2fs: fix to check readonly condition correctlyChao Yu1-1/+1
With below case, it can mount multi-device image w/ rw option, however one of secondary device is set as ro, later update will cause panic, so let's introduce f2fs_dev_is_readonly(), and check multi-devices rw status in f2fs_remount() w/ it in order to avoid such inconsistent mount status. mkfs.f2fs -c /dev/zram1 /dev/zram0 -f blockdev --setro /dev/zram1 mount -t f2fs dev/zram0 /mnt/f2fs mount: /mnt/f2fs: WARNING: source write-protected, mounted read-only. mount -t f2fs -o remount,rw mnt/f2fs dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=8192 kernel BUG at fs/f2fs/inline.c:258! RIP: 0010:f2fs_write_inline_data+0x23e/0x2d0 [f2fs] Call Trace: f2fs_write_single_data_page+0x26b/0x9f0 [f2fs] f2fs_write_cache_pages+0x389/0xa60 [f2fs] __f2fs_write_data_pages+0x26b/0x2d0 [f2fs] f2fs_write_data_pages+0x2e/0x40 [f2fs] do_writepages+0xd3/0x1b0 __writeback_single_inode+0x5b/0x420 writeback_sb_inodes+0x236/0x5a0 __writeback_inodes_wb+0x56/0xf0 wb_writeback+0x2a3/0x490 wb_do_writeback+0x2b2/0x330 wb_workfn+0x6a/0x260 process_one_work+0x270/0x5e0 worker_thread+0x52/0x3e0 kthread+0xf4/0x120 ret_from_fork+0x29/0x50 Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-10f2fs: use f2fs_hw_is_readonly() instead of bdev_read_only()Chao Yu1-2/+2
f2fs has supported multi-device feature, to check devices' rw status, it should use f2fs_hw_is_readonly() rather than bdev_read_only(), fix it. Meanwhile, it removes f2fs_hw_is_readonly() check condition in: - f2fs_write_checkpoint() - f2fs_convert_inline_inode() As it has checked f2fs_readonly() condition, and if f2fs' devices were readonly, f2fs_readonly() must be true. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-10f2fs: set default compress option only when sb_has_compressionYangtao Li1-4/+6
If the compress feature is not enabled, there is no need to set compress-related parameters. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-04f2fs: add compression feature check for all compress mount optYangtao Li1-0/+12
Opt_compress_chksum, Opt_compress_mode and Opt_compress_cache lack the necessary check to see if the image supports compression, let's add it. Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>