diff options
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/caching/netfs-api.rst | 7 | ||||
-rw-r--r-- | Documentation/filesystems/dax.rst | 6 | ||||
-rw-r--r-- | Documentation/filesystems/erofs.rst | 2 | ||||
-rw-r--r-- | Documentation/filesystems/ext4/blocks.rst | 2 | ||||
-rw-r--r-- | Documentation/filesystems/fscrypt.rst | 25 | ||||
-rw-r--r-- | Documentation/filesystems/locking.rst | 48 | ||||
-rw-r--r-- | Documentation/filesystems/porting.rst | 6 | ||||
-rw-r--r-- | Documentation/filesystems/vfs.rst | 62 |
8 files changed, 95 insertions, 63 deletions
diff --git a/Documentation/filesystems/caching/netfs-api.rst b/Documentation/filesystems/caching/netfs-api.rst index f84e9ffdf0b4..5066113acad5 100644 --- a/Documentation/filesystems/caching/netfs-api.rst +++ b/Documentation/filesystems/caching/netfs-api.rst @@ -345,8 +345,9 @@ The following facilities are provided to manage this: To support this, the following functions are provided:: - int fscache_set_page_dirty(struct page *page, - struct fscache_cookie *cookie); + bool fscache_dirty_folio(struct address_space *mapping, + struct folio *folio, + struct fscache_cookie *cookie); void fscache_unpin_writeback(struct writeback_control *wbc, struct fscache_cookie *cookie); void fscache_clear_inode_writeback(struct fscache_cookie *cookie, @@ -354,7 +355,7 @@ To support this, the following functions are provided:: const void *aux); The *set* function is intended to be called from the filesystem's -``set_page_dirty`` address space operation. If ``I_PINNING_FSCACHE_WB`` is not +``dirty_folio`` address space operation. If ``I_PINNING_FSCACHE_WB`` is not set, it sets that flag and increments the use count on the cookie (the caller must already have called ``fscache_use_cookie()``). diff --git a/Documentation/filesystems/dax.rst b/Documentation/filesystems/dax.rst index e3b30429d703..c04609d8ee24 100644 --- a/Documentation/filesystems/dax.rst +++ b/Documentation/filesystems/dax.rst @@ -23,11 +23,11 @@ on it as usual. The `DAX` code currently only supports files with a block size equal to your kernel's `PAGE_SIZE`, so you may need to specify a block size when creating the filesystem. -Currently 4 filesystems support `DAX`: ext2, ext4, xfs and virtiofs. +Currently 5 filesystems support `DAX`: ext2, ext4, xfs, virtiofs and erofs. Enabling `DAX` on them is different. -Enabling DAX on ext2 --------------------- +Enabling DAX on ext2 and erofs +------------------------------ When mounting the filesystem, use the ``-o dax`` option on the command line or add 'dax' to the options in ``/etc/fstab``. This works to enable `DAX` on all files diff --git a/Documentation/filesystems/erofs.rst b/Documentation/filesystems/erofs.rst index 7119aa213be7..bef6d3040ce4 100644 --- a/Documentation/filesystems/erofs.rst +++ b/Documentation/filesystems/erofs.rst @@ -40,7 +40,7 @@ Here is the main features of EROFS: Inode metadata size 32 bytes 64 bytes Max file size 4 GB 16 EB (also limited by max. vol size) Max uids/gids 65536 4294967296 - File change time no yes (64 + 32-bit timestamp) + Per-inode timestamp no yes (64 + 32-bit timestamp) Max hardlinks 65536 4294967296 Metadata reserved 4 bytes 14 bytes ===================== ============ ===================================== diff --git a/Documentation/filesystems/ext4/blocks.rst b/Documentation/filesystems/ext4/blocks.rst index bd722ecd92d6..b0f80ea87c90 100644 --- a/Documentation/filesystems/ext4/blocks.rst +++ b/Documentation/filesystems/ext4/blocks.rst @@ -39,7 +39,7 @@ For 32-bit filesystems, limits are as follows: - 4TiB - 8TiB - 16TiB - - 256PiB + - 256TiB * - Blocks Per Block Group - 8,192 - 16,384 diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst index 4d5d50dca65c..6ccd5efb25b7 100644 --- a/Documentation/filesystems/fscrypt.rst +++ b/Documentation/filesystems/fscrypt.rst @@ -1047,8 +1047,8 @@ astute users may notice some differences in behavior: may be used to overwrite the source files but isn't guaranteed to be effective on all filesystems and storage devices. -- Direct I/O is not supported on encrypted files. Attempts to use - direct I/O on such files will fall back to buffered I/O. +- Direct I/O is supported on encrypted files only under some + circumstances. For details, see `Direct I/O support`_. - The fallocate operations FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_INSERT_RANGE are not supported on encrypted files and will @@ -1179,6 +1179,27 @@ Inline encryption doesn't affect the ciphertext or other aspects of the on-disk format, so users may freely switch back and forth between using "inlinecrypt" and not using "inlinecrypt". +Direct I/O support +================== + +For direct I/O on an encrypted file to work, the following conditions +must be met (in addition to the conditions for direct I/O on an +unencrypted file): + +* The file must be using inline encryption. Usually this means that + the filesystem must be mounted with ``-o inlinecrypt`` and inline + encryption hardware must be present. However, a software fallback + is also available. For details, see `Inline encryption support`_. + +* The I/O request must be fully aligned to the filesystem block size. + This means that the file position the I/O is targeting, the lengths + of all I/O segments, and the memory addresses of all I/O buffers + must be multiples of this value. Note that the filesystem block + size may be greater than the logical block size of the block device. + +If either of the above conditions is not met, then direct I/O on the +encrypted file will fall back to buffered I/O. + Implementation details ====================== diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 3f9b1497ebb8..2998cec9af4b 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -239,7 +239,7 @@ prototypes:: int (*writepage)(struct page *page, struct writeback_control *wbc); int (*readpage)(struct file *, struct page *); int (*writepages)(struct address_space *, struct writeback_control *); - int (*set_page_dirty)(struct page *page); + bool (*dirty_folio)(struct address_space *, struct folio *folio); void (*readahead)(struct readahead_control *); int (*readpages)(struct file *filp, struct address_space *mapping, struct list_head *pages, unsigned nr_pages); @@ -250,21 +250,21 @@ prototypes:: loff_t pos, unsigned len, unsigned copied, struct page *page, void *fsdata); sector_t (*bmap)(struct address_space *, sector_t); - void (*invalidatepage) (struct page *, unsigned int, unsigned int); + void (*invalidate_folio) (struct folio *, size_t start, size_t len); int (*releasepage) (struct page *, int); void (*freepage)(struct page *); int (*direct_IO)(struct kiocb *, struct iov_iter *iter); bool (*isolate_page) (struct page *, isolate_mode_t); int (*migratepage)(struct address_space *, struct page *, struct page *); void (*putback_page) (struct page *); - int (*launder_page)(struct page *); - int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long); + int (*launder_folio)(struct folio *); + bool (*is_partially_uptodate)(struct folio *, size_t from, size_t count); int (*error_remove_page)(struct address_space *, struct page *); int (*swap_activate)(struct file *); int (*swap_deactivate)(struct file *); locking rules: - All except set_page_dirty and freepage may block + All except dirty_folio and freepage may block ====================== ======================== ========= =============== ops PageLocked(page) i_rwsem invalidate_lock @@ -272,20 +272,20 @@ ops PageLocked(page) i_rwsem invalidate_lock writepage: yes, unlocks (see below) readpage: yes, unlocks shared writepages: -set_page_dirty no +dirty_folio maybe readahead: yes, unlocks shared readpages: no shared write_begin: locks the page exclusive write_end: yes, unlocks exclusive bmap: -invalidatepage: yes exclusive +invalidate_folio: yes exclusive releasepage: yes freepage: yes direct_IO: isolate_page: yes migratepage: yes (both) putback_page: yes -launder_page: yes +launder_folio: yes is_partially_uptodate: yes error_remove_page: yes swap_activate: no @@ -361,22 +361,22 @@ If nr_to_write is NULL, all dirty pages must be written. writepages should _only_ write pages which are present on mapping->io_pages. -->set_page_dirty() is called from various places in the kernel -when the target page is marked as needing writeback. It may be called -under spinlock (it cannot block) and is sometimes called with the page -not locked. +->dirty_folio() is called from various places in the kernel when +the target folio is marked as needing writeback. The folio cannot be +truncated because either the caller holds the folio lock, or the caller +has found the folio while holding the page table lock which will block +truncation. ->bmap() is currently used by legacy ioctl() (FIBMAP) provided by some filesystems and by the swapper. The latter will eventually go away. Please, keep it that way and don't breed new callers. -->invalidatepage() is called when the filesystem must attempt to drop +->invalidate_folio() is called when the filesystem must attempt to drop some or all of the buffers from the page when it is being truncated. It -returns zero on success. If ->invalidatepage is zero, the kernel uses -block_invalidatepage() instead. The filesystem must exclusively acquire -invalidate_lock before invalidating page cache in truncate / hole punch path -(and thus calling into ->invalidatepage) to block races between page cache -invalidation and page cache filling functions (fault, read, ...). +returns zero on success. The filesystem must exclusively acquire +invalidate_lock before invalidating page cache in truncate / hole punch +path (and thus calling into ->invalidate_folio) to block races between page +cache invalidation and page cache filling functions (fault, read, ...). ->releasepage() is called when the kernel is about to try to drop the buffers from the page in preparation for freeing it. It returns zero to @@ -386,9 +386,9 @@ the kernel assumes that the fs has no private interest in the buffers. ->freepage() is called when the kernel is done dropping the page from the page cache. -->launder_page() may be called prior to releasing a page if -it is still found to be dirty. It returns zero if the page was successfully -cleaned, or an error value if not. Note that in order to prevent the page +->launder_folio() may be called prior to releasing a folio if +it is still found to be dirty. It returns zero if the folio was successfully +cleaned, or an error value if not. Note that in order to prevent the folio getting mapped back in and redirtied, it needs to be kept locked across the entire operation. @@ -438,13 +438,13 @@ prototypes:: locking rules: ====================== ============= ================= ========= -ops inode->i_lock blocked_lock_lock may block +ops flc_lock blocked_lock_lock may block ====================== ============= ================= ========= -lm_notify: yes yes no +lm_notify: no yes no lm_grant: no no no lm_break: yes no no lm_change yes no no -lm_breaker_owns_lease: no no no +lm_breaker_owns_lease: yes no no ====================== ============= ================= ========= buffer_head diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst index bf19fd6b86e7..7c1583dbeb59 100644 --- a/Documentation/filesystems/porting.rst +++ b/Documentation/filesystems/porting.rst @@ -45,6 +45,12 @@ typically between calling iget_locked() and unlocking the inode. At some point that will become mandatory. +**mandatory** + +The foo_inode_info should always be allocated through alloc_inode_sb() rather +than kmem_cache_alloc() or kmalloc() related to set up the inode reclaim context +correctly. + --- **mandatory** diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index bf5c48066fac..4f14edf93941 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -658,7 +658,7 @@ pages, however the address_space has finer control of write sizes. The read process essentially only requires 'readpage'. The write process is more complicated and uses write_begin/write_end or -set_page_dirty to write data into the address_space, and writepage and +dirty_folio to write data into the address_space, and writepage and writepages to writeback data to storage. Adding and removing pages to/from an address_space is protected by the @@ -724,7 +724,7 @@ cache in your filesystem. The following members are defined: int (*writepage)(struct page *page, struct writeback_control *wbc); int (*readpage)(struct file *, struct page *); int (*writepages)(struct address_space *, struct writeback_control *); - int (*set_page_dirty)(struct page *page); + bool (*dirty_folio)(struct address_space *, struct folio *); void (*readahead)(struct readahead_control *); int (*readpages)(struct file *filp, struct address_space *mapping, struct list_head *pages, unsigned nr_pages); @@ -735,7 +735,7 @@ cache in your filesystem. The following members are defined: loff_t pos, unsigned len, unsigned copied, struct page *page, void *fsdata); sector_t (*bmap)(struct address_space *, sector_t); - void (*invalidatepage) (struct page *, unsigned int, unsigned int); + void (*invalidate_folio) (struct folio *, size_t start, size_t len); int (*releasepage) (struct page *, int); void (*freepage)(struct page *); ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter); @@ -745,10 +745,10 @@ cache in your filesystem. The following members are defined: int (*migratepage) (struct page *, struct page *); /* put migration-failed page back to right list */ void (*putback_page) (struct page *); - int (*launder_page) (struct page *); + int (*launder_folio) (struct folio *); - int (*is_partially_uptodate) (struct page *, unsigned long, - unsigned long); + bool (*is_partially_uptodate) (struct folio *, size_t from, + size_t count); void (*is_dirty_writeback) (struct page *, bool *, bool *); int (*error_remove_page) (struct mapping *mapping, struct page *page); int (*swap_activate)(struct file *); @@ -793,25 +793,29 @@ cache in your filesystem. The following members are defined: This will choose pages from the address space that are tagged as DIRTY and will pass them to ->writepage. -``set_page_dirty`` - called by the VM to set a page dirty. This is particularly - needed if an address space attaches private data to a page, and - that data needs to be updated when a page is dirtied. This is +``dirty_folio`` + called by the VM to mark a folio as dirty. This is particularly + needed if an address space attaches private data to a folio, and + that data needs to be updated when a folio is dirtied. This is called, for example, when a memory mapped page gets modified. - If defined, it should set the PageDirty flag, and the - PAGECACHE_TAG_DIRTY tag in the radix tree. + If defined, it should set the folio dirty flag, and the + PAGECACHE_TAG_DIRTY search mark in i_pages. ``readahead`` Called by the VM to read pages associated with the address_space object. The pages are consecutive in the page cache and are locked. The implementation should decrement the page refcount after starting I/O on each page. Usually the page will be - unlocked by the I/O completion handler. If the filesystem decides - to stop attempting I/O before reaching the end of the readahead - window, it can simply return. The caller will decrement the page - refcount and unlock the remaining pages for you. Set PageUptodate - if the I/O completes successfully. Setting PageError on any page - will be ignored; simply unlock the page if an I/O error occurs. + unlocked by the I/O completion handler. The set of pages are + divided into some sync pages followed by some async pages, + rac->ra->async_size gives the number of async pages. The + filesystem should attempt to read all sync pages but may decide + to stop once it reaches the async pages. If it does decide to + stop attempting I/O, it can simply return. The caller will + remove the remaining pages from the address space, unlock them + and decrement the page refcount. Set PageUptodate if the I/O + completes successfully. Setting PageError on any page will be + ignored; simply unlock the page if an I/O error occurs. ``readpages`` called by the VM to read pages associated with the address_space @@ -868,15 +872,15 @@ cache in your filesystem. The following members are defined: to find out where the blocks in the file are and uses those addresses directly. -``invalidatepage`` - If a page has PagePrivate set, then invalidatepage will be - called when part or all of the page is to be removed from the +``invalidate_folio`` + If a folio has private data, then invalidate_folio will be + called when part or all of the folio is to be removed from the address space. This generally corresponds to either a truncation, punch hole or a complete invalidation of the address space (in the latter case 'offset' will always be 0 and 'length' - will be PAGE_SIZE). Any private data associated with the page + will be folio_size()). Any private data associated with the page should be updated to reflect this truncation. If offset is 0 - and length is PAGE_SIZE, then the private data should be + and length is folio_size(), then the private data should be released, because the page must be able to be completely discarded. This may be done by calling the ->releasepage function, but in this case the release MUST succeed. @@ -930,16 +934,16 @@ cache in your filesystem. The following members are defined: ``putback_page`` Called by the VM when isolated page's migration fails. -``launder_page`` - Called before freeing a page - it writes back the dirty page. - To prevent redirtying the page, it is kept locked during the +``launder_folio`` + Called before freeing a folio - it writes back the dirty folio. + To prevent redirtying the folio, it is kept locked during the whole operation. ``is_partially_uptodate`` Called by the VM when reading a file through the pagecache when - the underlying blocksize != pagesize. If the required block is - up to date then the read can complete without needing the IO to - bring the whole page up to date. + the underlying blocksize is smaller than the size of the folio. + If the required block is up to date then the read can complete + without needing I/O to bring the whole page up to date. ``is_dirty_writeback`` Called by the VM when attempting to reclaim a page. The VM uses |