summaryrefslogtreecommitdiff
path: root/Documentation/filesystems
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r--Documentation/filesystems/Locking4
-rw-r--r--Documentation/filesystems/autofs-mount-control.txt6
-rw-r--r--Documentation/filesystems/autofs.txt66
-rw-r--r--Documentation/filesystems/debugfs.txt16
-rw-r--r--Documentation/filesystems/porting35
-rw-r--r--Documentation/filesystems/vfs.txt8
6 files changed, 105 insertions, 30 deletions
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index efea228ccd8a..dac435575384 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -52,7 +52,7 @@ prototypes:
int (*rename) (struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
int (*readlink) (struct dentry *, char __user *,int);
- const char *(*get_link) (struct dentry *, struct inode *, void **);
+ const char *(*get_link) (struct dentry *, struct inode *, struct delayed_call *);
void (*truncate) (struct inode *);
int (*permission) (struct inode *, int, unsigned int);
int (*get_acl)(struct inode *, int);
@@ -118,6 +118,7 @@ set: exclusive
--------------------------- super_operations ---------------------------
prototypes:
struct inode *(*alloc_inode)(struct super_block *sb);
+ void (*free_inode)(struct inode *);
void (*destroy_inode)(struct inode *);
void (*dirty_inode) (struct inode *, int flags);
int (*write_inode) (struct inode *, struct writeback_control *wbc);
@@ -139,6 +140,7 @@ locking rules:
All may block [not true, see below]
s_umount
alloc_inode:
+free_inode: called from RCU callback
destroy_inode:
dirty_inode:
write_inode:
diff --git a/Documentation/filesystems/autofs-mount-control.txt b/Documentation/filesystems/autofs-mount-control.txt
index 45edad6933cc..acc02fc57993 100644
--- a/Documentation/filesystems/autofs-mount-control.txt
+++ b/Documentation/filesystems/autofs-mount-control.txt
@@ -354,8 +354,10 @@ this ioctl is called until no further expire candidates are found.
The call requires an initialized struct autofs_dev_ioctl with the
ioctlfd field set to the descriptor obtained from the open call. In
-addition an immediate expire, independent of the mount timeout, can be
-requested by setting the how field of struct args_expire to 1. If no
+addition an immediate expire that's independent of the mount timeout,
+and a forced expire that's independent of whether the mount is busy,
+can be requested by setting the how field of struct args_expire to
+AUTOFS_EXP_IMMEDIATE or AUTOFS_EXP_FORCED, respectively . If no
expire candidates can be found the ioctl returns -1 with errno set to
EAGAIN.
diff --git a/Documentation/filesystems/autofs.txt b/Documentation/filesystems/autofs.txt
index 373ad25852d3..3af38c7fd26d 100644
--- a/Documentation/filesystems/autofs.txt
+++ b/Documentation/filesystems/autofs.txt
@@ -116,7 +116,7 @@ that purpose there is another flag.
**DCACHE_MANAGE_TRANSIT**
If a dentry has DCACHE_MANAGE_TRANSIT set then two very different but
-related behaviors are invoked, both using the `d_op->d_manage()`
+related behaviours are invoked, both using the `d_op->d_manage()`
dentry operation.
Firstly, before checking to see if any filesystem is mounted on the
@@ -193,8 +193,8 @@ VFS remain in RCU-walk mode, but can only tell it to get out of
RCU-walk mode by returning `-ECHILD`.
So `d_manage()`, when called with `rcu_walk` set, should either return
--ECHILD if there is any reason to believe it is unsafe to end the
-mounted filesystem, and otherwise should return 0.
+-ECHILD if there is any reason to believe it is unsafe to enter the
+mounted filesystem, otherwise it should return 0.
autofs will return `-ECHILD` if an expiry of the filesystem has been
initiated or is being considered, otherwise it returns 0.
@@ -210,7 +210,7 @@ mounts that were created by `d_automount()` returning a filesystem to be
mounted. As autofs doesn't return such a filesystem but leaves the
mounting to the automount daemon, it must involve the automount daemon
in unmounting as well. This also means that autofs has more control
-of expiry.
+over expiry.
The VFS also supports "expiry" of mounts using the MNT_EXPIRE flag to
the `umount` system call. Unmounting with MNT_EXPIRE will fail unless
@@ -225,7 +225,7 @@ unmount any filesystems mounted on the autofs filesystem or remove any
symbolic links or empty directories any time it likes. If the unmount
or removal is successful the filesystem will be returned to the state
it was before the mount or creation, so that any access of the name
-will trigger normal auto-mount processing. In particlar, `rmdir` and
+will trigger normal auto-mount processing. In particular, `rmdir` and
`unlink` do not leave negative entries in the dcache as a normal
filesystem would, so an attempt to access a recently-removed object is
passed to autofs for handling.
@@ -240,11 +240,18 @@ Normally the daemon only wants to remove entries which haven't been
used for a while. For this purpose autofs maintains a "`last_used`"
time stamp on each directory or symlink. For symlinks it genuinely
does record the last time the symlink was "used" or followed to find
-out where it points to. For directories the field is a slight
-misnomer. It actually records the last time that autofs checked if
-the directory or one of its descendents was busy and found that it
-was. This is just as useful and doesn't require updating the field so
-often.
+out where it points to. For directories the field is used slightly
+differently. The field is updated at mount time and during expire
+checks if it is found to be in use (ie. open file descriptor or
+process working directory) and during path walks. The update done
+during path walks prevents frequent expire and immediate mount of
+frequently accessed automounts. But in the case where a GUI continually
+access or an application frequently scans an autofs directory tree
+there can be an accumulation of mounts that aren't actually being
+used. To cater for this case the "`strictexpire`" autofs mount option
+can be used to avoid the "`last_used`" update on path walk thereby
+preventing this apparent inability to expire mounts that aren't
+really in use.
The daemon is able to ask autofs if anything is due to be expired,
using an `ioctl` as discussed later. For a *direct* mount, autofs
@@ -255,8 +262,12 @@ up.
There is an option with indirect mounts to consider each of the leaves
that has been mounted on instead of considering the top-level names.
-This is intended for compatability with version 4 of autofs and should
-be considered as deprecated.
+This was originally intended for compatibility with version 4 of autofs
+and should be considered as deprecated for Sun Format automount maps.
+However, it may be used again for amd format mount maps (which are
+generally indirect maps) because the amd automounter allows for the
+setting of an expire timeout for individual mounts. But there are
+some difficulties in making the needed changes for this.
When autofs considers a directory it checks the `last_used` time and
compares it with the "timeout" value set when the filesystem was
@@ -273,7 +284,7 @@ mounts. If it finds something in the root directory to expire it will
return the name of that thing. Once a name has been returned the
automount daemon needs to unmount any filesystems mounted below the
name normally. As described above, this is unsafe for non-toplevel
-mounts in a version-5 autofs. For this reason the current `automountd`
+mounts in a version-5 autofs. For this reason the current `automount(8)`
does not use this ioctl.
The second mechanism uses either the **AUTOFS_DEV_IOCTL_EXPIRE_CMD** or
@@ -345,7 +356,7 @@ The `wait_queue_token` is a unique number which can identify a
particular request to be acknowledged. When a message is sent over
the pipe the affected dentry is marked as either "active" or
"expiring" and other accesses to it block until the message is
-acknowledged using one of the ioctls below and the relevant
+acknowledged using one of the ioctls below with the relevant
`wait_queue_token`.
Communicating with autofs: root directory ioctls
@@ -367,15 +378,14 @@ The available ioctl commands are:
This mode is also entered if a write to the pipe fails.
- **AUTOFS_IOC_PROTOVER**: This returns the protocol version in use.
- **AUTOFS_IOC_PROTOSUBVER**: Returns the protocol sub-version which
- is really a version number for the implementation. It is
- currently 2.
+ is really a version number for the implementation.
- **AUTOFS_IOC_SETTIMEOUT**: This passes a pointer to an unsigned
long. The value is used to set the timeout for expiry, and
the current timeout value is stored back through the pointer.
- **AUTOFS_IOC_ASKUMOUNT**: Returns, in the pointed-to `int`, 1 if
the filesystem could be unmounted. This is only a hint as
the situation could change at any instant. This call can be
- use to avoid a more expensive full unmount attempt.
+ used to avoid a more expensive full unmount attempt.
- **AUTOFS_IOC_EXPIRE**: as described above, this asks if there is
anything suitable to expire. A pointer to a packet:
@@ -400,6 +410,11 @@ The available ioctl commands are:
**AUTOFS_EXP_IMMEDIATE** causes `last_used` time to be ignored
and objects are expired if the are not in use.
+ **AUTOFS_EXP_FORCED** causes the in use status to be ignored
+ and objects are expired ieven if they are in use. This assumes
+ that the daemon has requested this because it is capable of
+ performing the umount.
+
**AUTOFS_EXP_LEAVES** will select a leaf rather than a top-level
name to expire. This is only safe when *maxproto* is 4.
@@ -415,7 +430,7 @@ which can be used to communicate directly with the autofs filesystem.
It requires CAP_SYS_ADMIN for access.
The `ioctl`s that can be used on this device are described in a separate
-document `autofs-mount-control.txt`, and are summarized briefly here.
+document `autofs-mount-control.txt`, and are summarised briefly here.
Each ioctl is passed a pointer to an `autofs_dev_ioctl` structure:
struct autofs_dev_ioctl {
@@ -511,6 +526,21 @@ directories.
Catatonic mode can only be left via the
**AUTOFS_DEV_IOCTL_OPENMOUNT_CMD** ioctl on the `/dev/autofs`.
+The "ignore" mount option
+-------------------------
+
+The "ignore" mount option can be used to provide a generic indicator
+to applications that the mount entry should be ignored when displaying
+mount information.
+
+In other OSes that provide autofs and that provide a mount list to user
+space based on the kernel mount list a no-op mount option ("ignore" is
+the one use on the most common OSes) is allowed so that autofs file
+system users can optionally use it.
+
+This is intended to be used by user space programs to exclude autofs
+mounts from consideration when reading the mounts list.
+
autofs, name spaces, and shared mounts
--------------------------------------
diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt
index 4f45f71149cb..4a0a9c3f4af6 100644
--- a/Documentation/filesystems/debugfs.txt
+++ b/Documentation/filesystems/debugfs.txt
@@ -31,10 +31,10 @@ This call, if successful, will make a directory called name underneath the
indicated parent directory. If parent is NULL, the directory will be
created in the debugfs root. On success, the return value is a struct
dentry pointer which can be used to create files in the directory (and to
-clean it up at the end). A NULL return value indicates that something went
-wrong. If ERR_PTR(-ENODEV) is returned, that is an indication that the
-kernel has been built without debugfs support and none of the functions
-described below will work.
+clean it up at the end). An ERR_PTR(-ERROR) return value indicates that
+something went wrong. If ERR_PTR(-ENODEV) is returned, that is an
+indication that the kernel has been built without debugfs support and none
+of the functions described below will work.
The most general way to create a file within a debugfs directory is with:
@@ -48,8 +48,9 @@ should hold the file, data will be stored in the i_private field of the
resulting inode structure, and fops is a set of file operations which
implement the file's behavior. At a minimum, the read() and/or write()
operations should be provided; others can be included as needed. Again,
-the return value will be a dentry pointer to the created file, NULL for
-error, or ERR_PTR(-ENODEV) if debugfs support is missing.
+the return value will be a dentry pointer to the created file,
+ERR_PTR(-ERROR) on error, or ERR_PTR(-ENODEV) if debugfs support is
+missing.
Create a file with an initial size, the following function can be used
instead:
@@ -214,7 +215,8 @@ can be removed with:
void debugfs_remove(struct dentry *dentry);
-The dentry value can be NULL, in which case nothing will be removed.
+The dentry value can be NULL or an error value, in which case nothing will
+be removed.
Once upon a time, debugfs users were required to remember the dentry
pointer for every debugfs file they created so that all files could be
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index cf43bc4dbf31..3bd1148d8bb6 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -638,3 +638,38 @@ in your dentry operations instead.
inode to d_splice_alias() will also do the right thing (equivalent of
d_add(dentry, NULL); return NULL;), so that kind of special cases
also doesn't need a separate treatment.
+--
+[strongly recommended]
+ take the RCU-delayed parts of ->destroy_inode() into a new method -
+ ->free_inode(). If ->destroy_inode() becomes empty - all the better,
+ just get rid of it. Synchronous work (e.g. the stuff that can't
+ be done from an RCU callback, or any WARN_ON() where we want the
+ stack trace) *might* be movable to ->evict_inode(); however,
+ that goes only for the things that are not needed to balance something
+ done by ->alloc_inode(). IOW, if it's cleaning up the stuff that
+ might have accumulated over the life of in-core inode, ->evict_inode()
+ might be a fit.
+
+ Rules for inode destruction:
+ * if ->destroy_inode() is non-NULL, it gets called
+ * if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
+ * combination of NULL ->destroy_inode and NULL ->free_inode is
+ treated as NULL/free_inode_nonrcu, to preserve the compatibility.
+
+ Note that the callback (be it via ->free_inode() or explicit call_rcu()
+ in ->destroy_inode()) is *NOT* ordered wrt superblock destruction;
+ as the matter of fact, the superblock and all associated structures
+ might be already gone. The filesystem driver is guaranteed to be still
+ there, but that's it. Freeing memory in the callback is fine; doing
+ more than that is possible, but requires a lot of care and is best
+ avoided.
+--
+[mandatory]
+ DCACHE_RCUACCESS is gone; having an RCU delay on dentry freeing is the
+ default. DCACHE_NORCU opts out, and only d_alloc_pseudo() has any
+ business doing so.
+--
+[mandatory]
+ d_alloc_pseudo() is internal-only; uses outside of alloc_file_pseudo() are
+ very suspect (and won't work in modules). Such uses are very likely to
+ be misspelled d_alloc_anon().
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 761c6fd24a53..57fc576b1f3e 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -3,8 +3,6 @@
Original author: Richard Gooch <rgooch@atnf.csiro.au>
- Last updated on June 24, 2007.
-
Copyright (C) 1999 Richard Gooch
Copyright (C) 2005 Pekka Enberg
@@ -465,6 +463,12 @@ otherwise noted.
argument. If request can't be handled without leaving RCU mode,
have it return ERR_PTR(-ECHILD).
+ If the filesystem stores the symlink target in ->i_link, the
+ VFS may use it directly without calling ->get_link(); however,
+ ->get_link() must still be provided. ->i_link must not be
+ freed until after an RCU grace period. Writing to ->i_link
+ post-iget() time requires a 'release' memory barrier.
+
readlink: this is now just an override for use by readlink(2) for the
cases when ->get_link uses nd_jump_link() or object is not in
fact a symlink. Normally filesystems should only implement