summaryrefslogtreecommitdiff
path: root/Documentation/filesystems
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r--Documentation/filesystems/caching/netfs-api.rst7
-rw-r--r--Documentation/filesystems/dax.rst6
-rw-r--r--Documentation/filesystems/erofs.rst2
-rw-r--r--Documentation/filesystems/ext4/blocks.rst2
-rw-r--r--Documentation/filesystems/fscrypt.rst25
-rw-r--r--Documentation/filesystems/locking.rst48
-rw-r--r--Documentation/filesystems/porting.rst6
-rw-r--r--Documentation/filesystems/vfs.rst62
8 files changed, 95 insertions, 63 deletions
diff --git a/Documentation/filesystems/caching/netfs-api.rst b/Documentation/filesystems/caching/netfs-api.rst
index f84e9ffdf0b4..5066113acad5 100644
--- a/Documentation/filesystems/caching/netfs-api.rst
+++ b/Documentation/filesystems/caching/netfs-api.rst
@@ -345,8 +345,9 @@ The following facilities are provided to manage this:
To support this, the following functions are provided::
- int fscache_set_page_dirty(struct page *page,
- struct fscache_cookie *cookie);
+ bool fscache_dirty_folio(struct address_space *mapping,
+ struct folio *folio,
+ struct fscache_cookie *cookie);
void fscache_unpin_writeback(struct writeback_control *wbc,
struct fscache_cookie *cookie);
void fscache_clear_inode_writeback(struct fscache_cookie *cookie,
@@ -354,7 +355,7 @@ To support this, the following functions are provided::
const void *aux);
The *set* function is intended to be called from the filesystem's
-``set_page_dirty`` address space operation. If ``I_PINNING_FSCACHE_WB`` is not
+``dirty_folio`` address space operation. If ``I_PINNING_FSCACHE_WB`` is not
set, it sets that flag and increments the use count on the cookie (the caller
must already have called ``fscache_use_cookie()``).
diff --git a/Documentation/filesystems/dax.rst b/Documentation/filesystems/dax.rst
index e3b30429d703..c04609d8ee24 100644
--- a/Documentation/filesystems/dax.rst
+++ b/Documentation/filesystems/dax.rst
@@ -23,11 +23,11 @@ on it as usual. The `DAX` code currently only supports files with a block
size equal to your kernel's `PAGE_SIZE`, so you may need to specify a block
size when creating the filesystem.
-Currently 4 filesystems support `DAX`: ext2, ext4, xfs and virtiofs.
+Currently 5 filesystems support `DAX`: ext2, ext4, xfs, virtiofs and erofs.
Enabling `DAX` on them is different.
-Enabling DAX on ext2
---------------------
+Enabling DAX on ext2 and erofs
+------------------------------
When mounting the filesystem, use the ``-o dax`` option on the command line or
add 'dax' to the options in ``/etc/fstab``. This works to enable `DAX` on all files
diff --git a/Documentation/filesystems/erofs.rst b/Documentation/filesystems/erofs.rst
index 7119aa213be7..bef6d3040ce4 100644
--- a/Documentation/filesystems/erofs.rst
+++ b/Documentation/filesystems/erofs.rst
@@ -40,7 +40,7 @@ Here is the main features of EROFS:
Inode metadata size 32 bytes 64 bytes
Max file size 4 GB 16 EB (also limited by max. vol size)
Max uids/gids 65536 4294967296
- File change time no yes (64 + 32-bit timestamp)
+ Per-inode timestamp no yes (64 + 32-bit timestamp)
Max hardlinks 65536 4294967296
Metadata reserved 4 bytes 14 bytes
===================== ============ =====================================
diff --git a/Documentation/filesystems/ext4/blocks.rst b/Documentation/filesystems/ext4/blocks.rst
index bd722ecd92d6..b0f80ea87c90 100644
--- a/Documentation/filesystems/ext4/blocks.rst
+++ b/Documentation/filesystems/ext4/blocks.rst
@@ -39,7 +39,7 @@ For 32-bit filesystems, limits are as follows:
- 4TiB
- 8TiB
- 16TiB
- - 256PiB
+ - 256TiB
* - Blocks Per Block Group
- 8,192
- 16,384
diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
index 4d5d50dca65c..6ccd5efb25b7 100644
--- a/Documentation/filesystems/fscrypt.rst
+++ b/Documentation/filesystems/fscrypt.rst
@@ -1047,8 +1047,8 @@ astute users may notice some differences in behavior:
may be used to overwrite the source files but isn't guaranteed to be
effective on all filesystems and storage devices.
-- Direct I/O is not supported on encrypted files. Attempts to use
- direct I/O on such files will fall back to buffered I/O.
+- Direct I/O is supported on encrypted files only under some
+ circumstances. For details, see `Direct I/O support`_.
- The fallocate operations FALLOC_FL_COLLAPSE_RANGE and
FALLOC_FL_INSERT_RANGE are not supported on encrypted files and will
@@ -1179,6 +1179,27 @@ Inline encryption doesn't affect the ciphertext or other aspects of
the on-disk format, so users may freely switch back and forth between
using "inlinecrypt" and not using "inlinecrypt".
+Direct I/O support
+==================
+
+For direct I/O on an encrypted file to work, the following conditions
+must be met (in addition to the conditions for direct I/O on an
+unencrypted file):
+
+* The file must be using inline encryption. Usually this means that
+ the filesystem must be mounted with ``-o inlinecrypt`` and inline
+ encryption hardware must be present. However, a software fallback
+ is also available. For details, see `Inline encryption support`_.
+
+* The I/O request must be fully aligned to the filesystem block size.
+ This means that the file position the I/O is targeting, the lengths
+ of all I/O segments, and the memory addresses of all I/O buffers
+ must be multiples of this value. Note that the filesystem block
+ size may be greater than the logical block size of the block device.
+
+If either of the above conditions is not met, then direct I/O on the
+encrypted file will fall back to buffered I/O.
+
Implementation details
======================
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 3f9b1497ebb8..2998cec9af4b 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -239,7 +239,7 @@ prototypes::
int (*writepage)(struct page *page, struct writeback_control *wbc);
int (*readpage)(struct file *, struct page *);
int (*writepages)(struct address_space *, struct writeback_control *);
- int (*set_page_dirty)(struct page *page);
+ bool (*dirty_folio)(struct address_space *, struct folio *folio);
void (*readahead)(struct readahead_control *);
int (*readpages)(struct file *filp, struct address_space *mapping,
struct list_head *pages, unsigned nr_pages);
@@ -250,21 +250,21 @@ prototypes::
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
- void (*invalidatepage) (struct page *, unsigned int, unsigned int);
+ void (*invalidate_folio) (struct folio *, size_t start, size_t len);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
bool (*isolate_page) (struct page *, isolate_mode_t);
int (*migratepage)(struct address_space *, struct page *, struct page *);
void (*putback_page) (struct page *);
- int (*launder_page)(struct page *);
- int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
+ int (*launder_folio)(struct folio *);
+ bool (*is_partially_uptodate)(struct folio *, size_t from, size_t count);
int (*error_remove_page)(struct address_space *, struct page *);
int (*swap_activate)(struct file *);
int (*swap_deactivate)(struct file *);
locking rules:
- All except set_page_dirty and freepage may block
+ All except dirty_folio and freepage may block
====================== ======================== ========= ===============
ops PageLocked(page) i_rwsem invalidate_lock
@@ -272,20 +272,20 @@ ops PageLocked(page) i_rwsem invalidate_lock
writepage: yes, unlocks (see below)
readpage: yes, unlocks shared
writepages:
-set_page_dirty no
+dirty_folio maybe
readahead: yes, unlocks shared
readpages: no shared
write_begin: locks the page exclusive
write_end: yes, unlocks exclusive
bmap:
-invalidatepage: yes exclusive
+invalidate_folio: yes exclusive
releasepage: yes
freepage: yes
direct_IO:
isolate_page: yes
migratepage: yes (both)
putback_page: yes
-launder_page: yes
+launder_folio: yes
is_partially_uptodate: yes
error_remove_page: yes
swap_activate: no
@@ -361,22 +361,22 @@ If nr_to_write is NULL, all dirty pages must be written.
writepages should _only_ write pages which are present on
mapping->io_pages.
-->set_page_dirty() is called from various places in the kernel
-when the target page is marked as needing writeback. It may be called
-under spinlock (it cannot block) and is sometimes called with the page
-not locked.
+->dirty_folio() is called from various places in the kernel when
+the target folio is marked as needing writeback. The folio cannot be
+truncated because either the caller holds the folio lock, or the caller
+has found the folio while holding the page table lock which will block
+truncation.
->bmap() is currently used by legacy ioctl() (FIBMAP) provided by some
filesystems and by the swapper. The latter will eventually go away. Please,
keep it that way and don't breed new callers.
-->invalidatepage() is called when the filesystem must attempt to drop
+->invalidate_folio() is called when the filesystem must attempt to drop
some or all of the buffers from the page when it is being truncated. It
-returns zero on success. If ->invalidatepage is zero, the kernel uses
-block_invalidatepage() instead. The filesystem must exclusively acquire
-invalidate_lock before invalidating page cache in truncate / hole punch path
-(and thus calling into ->invalidatepage) to block races between page cache
-invalidation and page cache filling functions (fault, read, ...).
+returns zero on success. The filesystem must exclusively acquire
+invalidate_lock before invalidating page cache in truncate / hole punch
+path (and thus calling into ->invalidate_folio) to block races between page
+cache invalidation and page cache filling functions (fault, read, ...).
->releasepage() is called when the kernel is about to try to drop the
buffers from the page in preparation for freeing it. It returns zero to
@@ -386,9 +386,9 @@ the kernel assumes that the fs has no private interest in the buffers.
->freepage() is called when the kernel is done dropping the page
from the page cache.
-->launder_page() may be called prior to releasing a page if
-it is still found to be dirty. It returns zero if the page was successfully
-cleaned, or an error value if not. Note that in order to prevent the page
+->launder_folio() may be called prior to releasing a folio if
+it is still found to be dirty. It returns zero if the folio was successfully
+cleaned, or an error value if not. Note that in order to prevent the folio
getting mapped back in and redirtied, it needs to be kept locked
across the entire operation.
@@ -438,13 +438,13 @@ prototypes::
locking rules:
====================== ============= ================= =========
-ops inode->i_lock blocked_lock_lock may block
+ops flc_lock blocked_lock_lock may block
====================== ============= ================= =========
-lm_notify: yes yes no
+lm_notify: no yes no
lm_grant: no no no
lm_break: yes no no
lm_change yes no no
-lm_breaker_owns_lease: no no no
+lm_breaker_owns_lease: yes no no
====================== ============= ================= =========
buffer_head
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index bf19fd6b86e7..7c1583dbeb59 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -45,6 +45,12 @@ typically between calling iget_locked() and unlocking the inode.
At some point that will become mandatory.
+**mandatory**
+
+The foo_inode_info should always be allocated through alloc_inode_sb() rather
+than kmem_cache_alloc() or kmalloc() related to set up the inode reclaim context
+correctly.
+
---
**mandatory**
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index bf5c48066fac..4f14edf93941 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -658,7 +658,7 @@ pages, however the address_space has finer control of write sizes.
The read process essentially only requires 'readpage'. The write
process is more complicated and uses write_begin/write_end or
-set_page_dirty to write data into the address_space, and writepage and
+dirty_folio to write data into the address_space, and writepage and
writepages to writeback data to storage.
Adding and removing pages to/from an address_space is protected by the
@@ -724,7 +724,7 @@ cache in your filesystem. The following members are defined:
int (*writepage)(struct page *page, struct writeback_control *wbc);
int (*readpage)(struct file *, struct page *);
int (*writepages)(struct address_space *, struct writeback_control *);
- int (*set_page_dirty)(struct page *page);
+ bool (*dirty_folio)(struct address_space *, struct folio *);
void (*readahead)(struct readahead_control *);
int (*readpages)(struct file *filp, struct address_space *mapping,
struct list_head *pages, unsigned nr_pages);
@@ -735,7 +735,7 @@ cache in your filesystem. The following members are defined:
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
- void (*invalidatepage) (struct page *, unsigned int, unsigned int);
+ void (*invalidate_folio) (struct folio *, size_t start, size_t len);
int (*releasepage) (struct page *, int);
void (*freepage)(struct page *);
ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
@@ -745,10 +745,10 @@ cache in your filesystem. The following members are defined:
int (*migratepage) (struct page *, struct page *);
/* put migration-failed page back to right list */
void (*putback_page) (struct page *);
- int (*launder_page) (struct page *);
+ int (*launder_folio) (struct folio *);
- int (*is_partially_uptodate) (struct page *, unsigned long,
- unsigned long);
+ bool (*is_partially_uptodate) (struct folio *, size_t from,
+ size_t count);
void (*is_dirty_writeback) (struct page *, bool *, bool *);
int (*error_remove_page) (struct mapping *mapping, struct page *page);
int (*swap_activate)(struct file *);
@@ -793,25 +793,29 @@ cache in your filesystem. The following members are defined:
This will choose pages from the address space that are tagged as
DIRTY and will pass them to ->writepage.
-``set_page_dirty``
- called by the VM to set a page dirty. This is particularly
- needed if an address space attaches private data to a page, and
- that data needs to be updated when a page is dirtied. This is
+``dirty_folio``
+ called by the VM to mark a folio as dirty. This is particularly
+ needed if an address space attaches private data to a folio, and
+ that data needs to be updated when a folio is dirtied. This is
called, for example, when a memory mapped page gets modified.
- If defined, it should set the PageDirty flag, and the
- PAGECACHE_TAG_DIRTY tag in the radix tree.
+ If defined, it should set the folio dirty flag, and the
+ PAGECACHE_TAG_DIRTY search mark in i_pages.
``readahead``
Called by the VM to read pages associated with the address_space
object. The pages are consecutive in the page cache and are
locked. The implementation should decrement the page refcount
after starting I/O on each page. Usually the page will be
- unlocked by the I/O completion handler. If the filesystem decides
- to stop attempting I/O before reaching the end of the readahead
- window, it can simply return. The caller will decrement the page
- refcount and unlock the remaining pages for you. Set PageUptodate
- if the I/O completes successfully. Setting PageError on any page
- will be ignored; simply unlock the page if an I/O error occurs.
+ unlocked by the I/O completion handler. The set of pages are
+ divided into some sync pages followed by some async pages,
+ rac->ra->async_size gives the number of async pages. The
+ filesystem should attempt to read all sync pages but may decide
+ to stop once it reaches the async pages. If it does decide to
+ stop attempting I/O, it can simply return. The caller will
+ remove the remaining pages from the address space, unlock them
+ and decrement the page refcount. Set PageUptodate if the I/O
+ completes successfully. Setting PageError on any page will be
+ ignored; simply unlock the page if an I/O error occurs.
``readpages``
called by the VM to read pages associated with the address_space
@@ -868,15 +872,15 @@ cache in your filesystem. The following members are defined:
to find out where the blocks in the file are and uses those
addresses directly.
-``invalidatepage``
- If a page has PagePrivate set, then invalidatepage will be
- called when part or all of the page is to be removed from the
+``invalidate_folio``
+ If a folio has private data, then invalidate_folio will be
+ called when part or all of the folio is to be removed from the
address space. This generally corresponds to either a
truncation, punch hole or a complete invalidation of the address
space (in the latter case 'offset' will always be 0 and 'length'
- will be PAGE_SIZE). Any private data associated with the page
+ will be folio_size()). Any private data associated with the page
should be updated to reflect this truncation. If offset is 0
- and length is PAGE_SIZE, then the private data should be
+ and length is folio_size(), then the private data should be
released, because the page must be able to be completely
discarded. This may be done by calling the ->releasepage
function, but in this case the release MUST succeed.
@@ -930,16 +934,16 @@ cache in your filesystem. The following members are defined:
``putback_page``
Called by the VM when isolated page's migration fails.
-``launder_page``
- Called before freeing a page - it writes back the dirty page.
- To prevent redirtying the page, it is kept locked during the
+``launder_folio``
+ Called before freeing a folio - it writes back the dirty folio.
+ To prevent redirtying the folio, it is kept locked during the
whole operation.
``is_partially_uptodate``
Called by the VM when reading a file through the pagecache when
- the underlying blocksize != pagesize. If the required block is
- up to date then the read can complete without needing the IO to
- bring the whole page up to date.
+ the underlying blocksize is smaller than the size of the folio.
+ If the required block is up to date then the read can complete
+ without needing I/O to bring the whole page up to date.
``is_dirty_writeback``
Called by the VM when attempting to reclaim a page. The VM uses