From d6b4dcf5c580470ed553052206836adfaa2052fc Mon Sep 17 00:00:00 2001 From: Al Viro Date: Tue, 5 Dec 2017 09:41:03 -0500 Subject: fs/file.c: trim includes Signed-off-by: Al Viro --- fs/file.c | 5 ----- 1 file changed, 5 deletions(-) (limited to 'fs') diff --git a/fs/file.c b/fs/file.c index 3b080834b870..bb2d251e19c1 100644 --- a/fs/file.c +++ b/fs/file.c @@ -11,18 +11,13 @@ #include #include #include -#include -#include #include #include -#include #include #include #include -#include #include #include -#include unsigned int sysctl_nr_open __read_mostly = 1024*1024; unsigned int sysctl_nr_open_min = BITS_PER_LONG; -- cgit v1.2.3 From 9c5650359a1e7fc21e191fdc087f31154ce27ae2 Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Sat, 18 Nov 2017 07:02:17 +0800 Subject: vfs: remove unused hardirq.h Preempt counter APIs have been split out; currently, hardirq.h just includes the irq_enter/exit APIs, which are not used by vfs at all. So, remove the unused hardirq.h. Signed-off-by: Yang Shi Cc: Alexander Viro Signed-off-by: Al Viro --- fs/dcache.c | 1 - fs/file_table.c | 1 - 2 files changed, 2 deletions(-) (limited to 'fs') diff --git a/fs/dcache.c b/fs/dcache.c index 5c7df1df81ff..b99a39206930 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -32,7 +32,6 @@ #include #include #include -#include #include #include #include diff --git a/fs/file_table.c b/fs/file_table.c index 2dc9f38bd195..7ec0b3e5f05d 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -23,7 +23,6 @@ #include #include #include -#include #include #include #include -- cgit v1.2.3 From f1ee616214cb22410e939d963bbb2349c2570f02 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Thu, 21 Dec 2017 09:45:40 +1100 Subject: VFS: don't keep disconnected dentries on d_anon The original purpose of the per-superblock d_anon list was to keep disconnected dentries in the cache between consecutive requests to the NFS server.
Dentries can be disconnected if a client holds a file open and repeatedly performs IO on it, and if the server drops the dentry, whether due to memory pressure, server restart, or "echo 3 > /proc/sys/vm/drop_caches". This purpose was thwarted by commit 75a6f82a0d10 ("freeing unlinked file indefinitely delayed") which caused disconnected dentries to be freed as soon as their refcount reached zero. This means that, when a dentry being used by nfsd gets disconnected, a new one needs to be allocated for every request (unless requests overlap). As the dentry has no name, no parent, and no children, there is little of value to cache. As small memory allocations are typically fast (from per-cpu free lists), this likely has little cost. This means that the original purpose of s_anon is no longer relevant: there is no longer any need to keep disconnected dentries on a list so they appear to be hashed. However, s_anon now has a new use. When you mount an NFS filesystem, the dentry stored in s_root is just a placebo. The "real" root dentry is allocated using d_obtain_root() and so it is kept on the s_anon list. I don't know the reason for this, but suspect it is related to NFSv4, where a mount of "server:/some/path" requires NFS to look up the root filehandle on the server, then walk down "/some" and "/path" to get the filehandle to mount. Whatever the reason, NFS depends on the s_anon list and on shrink_dcache_for_umount() pruning all dentries on this list. So we cannot simply remove s_anon. We could just leave the code unchanged, but apart from that being potentially confusing, the (unfair) bit-spin-lock which protects s_anon can become a bottleneck when lots of disconnected dentries are being created. So this patch renames s_anon to s_roots, and stops storing disconnected dentries on the list. Only dentries obtained with d_obtain_root() are now stored on this list.
There are many fewer of these (only NFS and NILFS2 use the call, and only during filesystem mount) so contention on the bit-lock will not be a problem. Possibly an alternate solution should be found for NFS and NILFS2, but that would require understanding their needs first. Signed-off-by: NeilBrown Signed-off-by: Al Viro --- Documentation/filesystems/nfs/Exporting | 27 +++++++++++++++------- .../staging/lustre/lustre/llite/llite_internal.h | 10 +------- fs/dcache.c | 22 ++++++++++-------- fs/super.c | 2 +- include/linux/fs.h | 2 +- 5 files changed, 34 insertions(+), 29 deletions(-) (limited to 'fs') diff --git a/Documentation/filesystems/nfs/Exporting b/Documentation/filesystems/nfs/Exporting index 520a4becb75c..63889149f532 100644 --- a/Documentation/filesystems/nfs/Exporting +++ b/Documentation/filesystems/nfs/Exporting @@ -56,13 +56,25 @@ a/ A dentry flag DCACHE_DISCONNECTED which is set on any dentry that might not be part of the proper prefix. This is set when anonymous dentries are created, and cleared when a dentry is noticed to be a child of a dentry which is in the proper - prefix. - -b/ A per-superblock list "s_anon" of dentries which are the roots of - subtrees that are not in the proper prefix. These dentries, as - well as the proper prefix, need to be released at unmount time. As - these dentries will not be hashed, they are linked together on the - d_hash list_head. + prefix. If the refcount on a dentry with this flag set + becomes zero, the dentry is immediately discarded, rather than being + kept in the dcache. If a dentry that is not already in the dcache + is repeatedly accessed by filehandle (as NFSD might do), an new dentry + will be a allocated for each access, and discarded at the end of + the access. + + Note that such a dentry can acquire children, name, ancestors, etc. + without losing DCACHE_DISCONNECTED - that flag is only cleared when + subtree is successfully reconnected to root. 
Until then dentries + in such subtree are retained only as long as there are references; + refcount reaching zero means immediate eviction, same as for unhashed + dentries. That guarantees that we won't need to hunt them down upon + umount. + +b/ A primitive for creation of secondary roots - d_obtain_root(inode). + Those do _not_ bear DCACHE_DISCONNECTED. They are placed on the + per-superblock list (->s_roots), so they can be located at umount + time for eviction purposes. c/ Helper routines to allocate anonymous dentries, and to help attach loose directory dentries at lookup time. They are: @@ -77,7 +89,6 @@ c/ Helper routines to allocate anonymous dentries, and to help attach (such as an anonymous one created by d_obtain_alias), if appropriate. It returns NULL when the passed-in dentry is used, following the calling convention of ->lookup. - Filesystem Issues ----------------- diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h index b133fd00c08c..0d62fcf016dc 100644 --- a/drivers/staging/lustre/lustre/llite/llite_internal.h +++ b/drivers/staging/lustre/lustre/llite/llite_internal.h @@ -1296,15 +1296,7 @@ static inline void d_lustre_invalidate(struct dentry *dentry, int nested) spin_lock_nested(&dentry->d_lock, nested ? DENTRY_D_LOCK_NESTED : DENTRY_D_LOCK_NORMAL); ll_d2d(dentry)->lld_invalid = 1; - /* - * We should be careful about dentries created by d_obtain_alias(). - * These dentries are not put in the dentry tree, instead they are - * linked to sb->s_anon through dentry->d_hash. - * shrink_dcache_for_umount() shrinks the tree and sb->s_anon list. - * If we unhashed such a dentry, unmount would not be able to find - * it and busy inodes would be reported. 
- */ - if (d_count(dentry) == 0 && !(dentry->d_flags & DCACHE_DISCONNECTED)) + if (d_count(dentry) == 0) __d_drop(dentry); spin_unlock(&dentry->d_lock); } diff --git a/fs/dcache.c b/fs/dcache.c index b99a39206930..17e6b84b9656 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -48,8 +48,8 @@ * - i_dentry, d_u.d_alias, d_inode of aliases * dcache_hash_bucket lock protects: * - the dcache hash table - * s_anon bl list spinlock protects: - * - the s_anon list (see __d_drop) + * s_roots bl list spinlock protects: + * - the s_roots list (see __d_drop) * dentry->d_sb->s_dentry_lru_lock protects: * - the dcache lru lists and counters * d_lock protects: @@ -67,7 +67,7 @@ * dentry->d_lock * dentry->d_sb->s_dentry_lru_lock * dcache_hash_bucket lock - * s_anon lock + * s_roots lock * * If there is an ancestor relationship: * dentry->d_parent->...->d_parent->d_lock @@ -476,10 +476,10 @@ void __d_drop(struct dentry *dentry) /* * Hashed dentries are normally on the dentry hashtable, * with the exception of those newly allocated by - * d_obtain_alias, which are always IS_ROOT: + * d_obtain_root, which are always IS_ROOT: */ if (unlikely(IS_ROOT(dentry))) - b = &dentry->d_sb->s_anon; + b = &dentry->d_sb->s_roots; else b = d_hash(dentry->d_name.hash); @@ -1499,8 +1499,8 @@ void shrink_dcache_for_umount(struct super_block *sb) sb->s_root = NULL; do_one_tree(dentry); - while (!hlist_bl_empty(&sb->s_anon)) { - dentry = dget(hlist_bl_entry(hlist_bl_first(&sb->s_anon), struct dentry, d_hash)); + while (!hlist_bl_empty(&sb->s_roots)) { + dentry = dget(hlist_bl_entry(hlist_bl_first(&sb->s_roots), struct dentry, d_hash)); do_one_tree(dentry); } } @@ -1964,9 +1964,11 @@ static struct dentry *__d_obtain_alias(struct inode *inode, int disconnected) spin_lock(&tmp->d_lock); __d_set_inode_and_type(tmp, inode, add_flags); hlist_add_head(&tmp->d_u.d_alias, &inode->i_dentry); - hlist_bl_lock(&tmp->d_sb->s_anon); - hlist_bl_add_head(&tmp->d_hash, &tmp->d_sb->s_anon); - 
hlist_bl_unlock(&tmp->d_sb->s_anon); + if (!disconnected) { + hlist_bl_lock(&tmp->d_sb->s_roots); + hlist_bl_add_head(&tmp->d_hash, &tmp->d_sb->s_roots); + hlist_bl_unlock(&tmp->d_sb->s_roots); + } spin_unlock(&tmp->d_lock); spin_unlock(&inode->i_lock); diff --git a/fs/super.c b/fs/super.c index d4e33e8f1e6f..9ea66601d664 100644 --- a/fs/super.c +++ b/fs/super.c @@ -207,7 +207,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags, if (s->s_user_ns != &init_user_ns) s->s_iflags |= SB_I_NODEV; INIT_HLIST_NODE(&s->s_instances); - INIT_HLIST_BL_HEAD(&s->s_anon); + INIT_HLIST_BL_HEAD(&s->s_roots); mutex_init(&s->s_sync_lock); INIT_LIST_HEAD(&s->s_inodes); spin_lock_init(&s->s_inode_list_lock); diff --git a/include/linux/fs.h b/include/linux/fs.h index 2995a271ec46..6276f8315e5b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1359,7 +1359,7 @@ struct super_block { const struct fscrypt_operations *s_cop; - struct hlist_bl_head s_anon; /* anonymous dentries for (nfs) exporting */ + struct hlist_bl_head s_roots; /* alternate root dentries for NFS */ struct list_head s_mounts; /* list of mounts; _not_ for fs use */ struct block_device *s_bdev; struct backing_dev_info *s_bdi; -- cgit v1.2.3 From 6db620012fceea7cf203a9889e311f27dc49a2c7 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sat, 30 Dec 2017 00:03:39 -0500 Subject: nfs4file: get rid of pointless include of btrfs.h should've been killed by "vfs: pull btrfs clone API to vfs layer"... 
Signed-off-by: Al Viro --- fs/nfs/nfs4file.c | 1 - 1 file changed, 1 deletion(-) (limited to 'fs') diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index 626d1382002e..6b3b372b59b9 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -8,7 +8,6 @@ #include #include #include -#include /* BTRFS_IOC_CLONE/BTRFS_IOC_CLONE_RANGE */ #include "delegation.h" #include "internal.h" #include "iostat.h" -- cgit v1.2.3 From 7d815165c1a64da9fd1b0f4ac8d97ba938ff1d71 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Sat, 6 Jan 2018 09:45:42 -0800 Subject: eventfd: convert to use anon_inode_getfd() Nothing actually calls eventfd_file_create() besides the eventfd2() system call itself. So simplify things by folding it into the system call and using anon_inode_getfd() instead of anon_inode_getfile(). This removes over 40 lines with no change in functionality. (eventfd_file_create() was apparently added years ago for KVM irqfd's, but was never used.) Signed-off-by: Eric Biggers Signed-off-by: Al Viro --- fs/eventfd.c | 53 +++++++------------------------------------------ include/linux/eventfd.h | 5 ----- 2 files changed, 7 insertions(+), 51 deletions(-) (limited to 'fs') diff --git a/fs/eventfd.c b/fs/eventfd.c index 2fb4eadaa118..4167e670ed4d 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -412,72 +412,33 @@ struct eventfd_ctx *eventfd_ctx_fileget(struct file *file) } EXPORT_SYMBOL_GPL(eventfd_ctx_fileget); -/** - * eventfd_file_create - Creates an eventfd file pointer. - * @count: Initial eventfd counter value. - * @flags: Flags for the eventfd file. - * - * This function creates an eventfd file pointer, w/out installing it into - * the fd table. This is useful when the eventfd file is used during the - * initialization of data structures that require extra setup after the eventfd - * creation. So the eventfd creation is split into the file pointer creation - * phase, and the file descriptor installation phase. 
- * In this way races with userspace closing the newly installed file descriptor - * can be avoided. - * Returns an eventfd file pointer, or a proper error pointer. - */ -struct file *eventfd_file_create(unsigned int count, int flags) +SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags) { - struct file *file; struct eventfd_ctx *ctx; + int fd; /* Check the EFD_* constants for consistency. */ BUILD_BUG_ON(EFD_CLOEXEC != O_CLOEXEC); BUILD_BUG_ON(EFD_NONBLOCK != O_NONBLOCK); if (flags & ~EFD_FLAGS_SET) - return ERR_PTR(-EINVAL); + return -EINVAL; ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) - return ERR_PTR(-ENOMEM); + return -ENOMEM; kref_init(&ctx->kref); init_waitqueue_head(&ctx->wqh); ctx->count = count; ctx->flags = flags; - file = anon_inode_getfile("[eventfd]", &eventfd_fops, ctx, - O_RDWR | (flags & EFD_SHARED_FCNTL_FLAGS)); - if (IS_ERR(file)) + fd = anon_inode_getfd("[eventfd]", &eventfd_fops, ctx, + O_RDWR | (flags & EFD_SHARED_FCNTL_FLAGS)); + if (fd < 0) eventfd_free_ctx(ctx); - return file; -} - -SYSCALL_DEFINE2(eventfd2, unsigned int, count, int, flags) -{ - int fd, error; - struct file *file; - - error = get_unused_fd_flags(flags & EFD_SHARED_FCNTL_FLAGS); - if (error < 0) - return error; - fd = error; - - file = eventfd_file_create(count, flags); - if (IS_ERR(file)) { - error = PTR_ERR(file); - goto err_put_unused_fd; - } - fd_install(fd, file); - return fd; - -err_put_unused_fd: - put_unused_fd(fd); - - return error; } SYSCALL_DEFINE1(eventfd, unsigned int, count) diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h index 60b2985e8a18..15826192cc23 100644 --- a/include/linux/eventfd.h +++ b/include/linux/eventfd.h @@ -30,7 +30,6 @@ struct file; #ifdef CONFIG_EVENTFD -struct file *eventfd_file_create(unsigned int count, int flags); struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx); void eventfd_ctx_put(struct eventfd_ctx *ctx); struct file *eventfd_fget(int fd); @@ -47,10 +46,6 @@ int 
eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_entry_t *w * Ugly ugly ugly error layer to support modules that uses eventfd but * pretend to work in !CONFIG_EVENTFD configurations. Namely, AIO. */ -static inline struct file *eventfd_file_create(unsigned int count, int flags) -{ - return ERR_PTR(-ENOSYS); -} static inline struct eventfd_ctx *eventfd_ctx_fdget(int fd) { -- cgit v1.2.3 From b6364572d641c8eba9eab9bcc31d8962f96ddf15 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Sat, 6 Jan 2018 09:45:43 -0800 Subject: eventfd: fold eventfd_ctx_read() into eventfd_read() eventfd_ctx_read() is not used outside of eventfd.c, so unexport it and fold it into eventfd_read(). This slightly simplifies the code and makes it more analogous to eventfd_write(). (eventfd_ctx_read() was apparently added years ago for KVM irqfd's, but was never used.) Signed-off-by: Eric Biggers Signed-off-by: Al Viro --- fs/eventfd.c | 53 ++++++++++++++----------------------------------- include/linux/eventfd.h | 7 ------- 2 files changed, 15 insertions(+), 45 deletions(-) (limited to 'fs') diff --git a/fs/eventfd.c b/fs/eventfd.c index 4167e670ed4d..6138d2b5cdeb 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -207,36 +207,27 @@ int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_entry_t *w } EXPORT_SYMBOL_GPL(eventfd_ctx_remove_wait_queue); -/** - * eventfd_ctx_read - Reads the eventfd counter or wait if it is zero. - * @ctx: [in] Pointer to eventfd context. - * @no_wait: [in] Different from zero if the operation should not block. - * @cnt: [out] Pointer to the 64-bit counter value. - * - * Returns %0 if successful, or the following error codes: - * - * - -EAGAIN : The operation would have blocked but @no_wait was non-zero. - * - -ERESTARTSYS : A signal interrupted the wait operation. - * - * If @no_wait is zero, the function might sleep until the eventfd internal - * counter becomes greater than zero. 
- */ -ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx, int no_wait, __u64 *cnt) +static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count, + loff_t *ppos) { + struct eventfd_ctx *ctx = file->private_data; ssize_t res; + __u64 ucnt = 0; DECLARE_WAITQUEUE(wait, current); + if (count < sizeof(ucnt)) + return -EINVAL; + spin_lock_irq(&ctx->wqh.lock); - *cnt = 0; res = -EAGAIN; if (ctx->count > 0) - res = 0; - else if (!no_wait) { + res = sizeof(ucnt); + else if (!(file->f_flags & O_NONBLOCK)) { __add_wait_queue(&ctx->wqh, &wait); for (;;) { set_current_state(TASK_INTERRUPTIBLE); if (ctx->count > 0) { - res = 0; + res = sizeof(ucnt); break; } if (signal_pending(current)) { @@ -250,31 +241,17 @@ ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx, int no_wait, __u64 *cnt) __remove_wait_queue(&ctx->wqh, &wait); __set_current_state(TASK_RUNNING); } - if (likely(res == 0)) { - eventfd_ctx_do_read(ctx, cnt); + if (likely(res > 0)) { + eventfd_ctx_do_read(ctx, &ucnt); if (waitqueue_active(&ctx->wqh)) wake_up_locked_poll(&ctx->wqh, POLLOUT); } spin_unlock_irq(&ctx->wqh.lock); - return res; -} -EXPORT_SYMBOL_GPL(eventfd_ctx_read); - -static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count, - loff_t *ppos) -{ - struct eventfd_ctx *ctx = file->private_data; - ssize_t res; - __u64 cnt; - - if (count < sizeof(cnt)) - return -EINVAL; - res = eventfd_ctx_read(ctx, file->f_flags & O_NONBLOCK, &cnt); - if (res < 0) - return res; + if (res > 0 && put_user(ucnt, (__u64 __user *)buf)) + return -EFAULT; - return put_user(cnt, (__u64 __user *) buf) ? 
-EFAULT : sizeof(cnt); + return res; } static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t count, diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h index 15826192cc23..566fef14d0a6 100644 --- a/include/linux/eventfd.h +++ b/include/linux/eventfd.h @@ -36,7 +36,6 @@ struct file *eventfd_fget(int fd); struct eventfd_ctx *eventfd_ctx_fdget(int fd); struct eventfd_ctx *eventfd_ctx_fileget(struct file *file); __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n); -ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx, int no_wait, __u64 *cnt); int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_entry_t *wait, __u64 *cnt); @@ -62,12 +61,6 @@ static inline void eventfd_ctx_put(struct eventfd_ctx *ctx) } -static inline ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx, int no_wait, - __u64 *cnt) -{ - return -ENOSYS; -} - static inline int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_entry_t *wait, __u64 *cnt) { -- cgit v1.2.3 From 105f2b7096075eacb6d2c83a6e00b652c2951063 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Sat, 6 Jan 2018 09:45:44 -0800 Subject: eventfd: fold eventfd_ctx_get() into eventfd_ctx_fileget() eventfd_ctx_get() is not used outside of eventfd.c, so unexport it and fold it into eventfd_ctx_fileget(). (eventfd_ctx_get() was apparently added years ago for KVM irqfd's, but was never used.) Signed-off-by: Eric Biggers Signed-off-by: Al Viro --- fs/eventfd.c | 21 ++++++--------------- include/linux/eventfd.h | 2 +- 2 files changed, 7 insertions(+), 16 deletions(-) (limited to 'fs') diff --git a/fs/eventfd.c b/fs/eventfd.c index 6138d2b5cdeb..bc0105ae253f 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -79,25 +79,12 @@ static void eventfd_free(struct kref *kref) eventfd_free_ctx(ctx); } -/** - * eventfd_ctx_get - Acquires a reference to the internal eventfd context. - * @ctx: [in] Pointer to the eventfd context. 
- * - * Returns: In case of success, returns a pointer to the eventfd context. - */ -struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx) -{ - kref_get(&ctx->kref); - return ctx; -} -EXPORT_SYMBOL_GPL(eventfd_ctx_get); - /** * eventfd_ctx_put - Releases a reference to the internal eventfd context. * @ctx: [in] Pointer to eventfd context. * * The eventfd context reference must have been previously acquired either - * with eventfd_ctx_get() or eventfd_ctx_fdget(). + * with eventfd_ctx_fdget() or eventfd_ctx_fileget(). */ void eventfd_ctx_put(struct eventfd_ctx *ctx) { @@ -382,10 +369,14 @@ EXPORT_SYMBOL_GPL(eventfd_ctx_fdget); */ struct eventfd_ctx *eventfd_ctx_fileget(struct file *file) { + struct eventfd_ctx *ctx; + if (file->f_op != &eventfd_fops) return ERR_PTR(-EINVAL); - return eventfd_ctx_get(file->private_data); + ctx = file->private_data; + kref_get(&ctx->kref); + return ctx; } EXPORT_SYMBOL_GPL(eventfd_ctx_fileget); diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h index 566fef14d0a6..7094718b653b 100644 --- a/include/linux/eventfd.h +++ b/include/linux/eventfd.h @@ -26,11 +26,11 @@ #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK) #define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE) +struct eventfd_ctx; struct file; #ifdef CONFIG_EVENTFD -struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx); void eventfd_ctx_put(struct eventfd_ctx *ctx); struct file *eventfd_fget(int fd); struct eventfd_ctx *eventfd_ctx_fdget(int fd); -- cgit v1.2.3 From 4bfd054ae11ea061685c4a2a6234fdc8e92fad41 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 16 Jan 2018 21:44:24 -0800 Subject: fs: fold __inode_permission() into inode_permission() Since commit 9c630ebefeee ("ovl: simplify permission checking"), overlayfs doesn't call __inode_permission() anymore, which leaves no users other than inode_permission(). So just fold it back into inode_permission(). 
Signed-off-by: Eric Biggers Signed-off-by: Al Viro --- fs/namei.c | 71 ++++++++++++++++++++---------------------------------- include/linux/fs.h | 1 - 2 files changed, 26 insertions(+), 46 deletions(-) (limited to 'fs') diff --git a/fs/namei.c b/fs/namei.c index f0c7a7b9b6ca..29b044022e9c 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -390,50 +390,6 @@ static inline int do_inode_permission(struct inode *inode, int mask) return generic_permission(inode, mask); } -/** - * __inode_permission - Check for access rights to a given inode - * @inode: Inode to check permission on - * @mask: Right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC) - * - * Check for read/write/execute permissions on an inode. - * - * When checking for MAY_APPEND, MAY_WRITE must also be set in @mask. - * - * This does not check for a read-only file system. You probably want - * inode_permission(). - */ -int __inode_permission(struct inode *inode, int mask) -{ - int retval; - - if (unlikely(mask & MAY_WRITE)) { - /* - * Nobody gets write access to an immutable file. - */ - if (IS_IMMUTABLE(inode)) - return -EPERM; - - /* - * Updating mtime will likely cause i_uid and i_gid to be - * written back improperly if their true value is unknown - * to the vfs. - */ - if (HAS_UNMAPPED_ID(inode)) - return -EACCES; - } - - retval = do_inode_permission(inode, mask); - if (retval) - return retval; - - retval = devcgroup_inode_permission(inode, mask); - if (retval) - return retval; - - return security_inode_permission(inode, mask); -} -EXPORT_SYMBOL(__inode_permission); - /** * sb_permission - Check superblock-level permissions * @sb: Superblock of inode to check permission on @@ -472,7 +428,32 @@ int inode_permission(struct inode *inode, int mask) retval = sb_permission(inode->i_sb, inode, mask); if (retval) return retval; - return __inode_permission(inode, mask); + + if (unlikely(mask & MAY_WRITE)) { + /* + * Nobody gets write access to an immutable file. 
+ */ + if (IS_IMMUTABLE(inode)) + return -EPERM; + + /* + * Updating mtime will likely cause i_uid and i_gid to be + * written back improperly if their true value is unknown + * to the vfs. + */ + if (HAS_UNMAPPED_ID(inode)) + return -EACCES; + } + + retval = do_inode_permission(inode, mask); + if (retval) + return retval; + + retval = devcgroup_inode_permission(inode, mask); + if (retval) + return retval; + + return security_inode_permission(inode, mask); } EXPORT_SYMBOL(inode_permission); diff --git a/include/linux/fs.h b/include/linux/fs.h index 85c8ddc55760..b49251112add 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2699,7 +2699,6 @@ extern sector_t bmap(struct inode *, sector_t); #endif extern int notify_change(struct dentry *, struct iattr *, struct inode **); extern int inode_permission(struct inode *, int); -extern int __inode_permission(struct inode *, int); extern int generic_permission(struct inode *, int); extern int __check_sticky(struct inode *dir, struct inode *inode); -- cgit v1.2.3 From 01950a349ec254f28bf9ad06e74a166521d213e1 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 16 Jan 2018 22:25:12 -0800 Subject: fs/buffer.c: fold init_buffer() into init_page_buffers() Since commit e76004093db1 ("fs/buffer.c: remove unnecessary init operation after allocating buffer_head"), there are no callers of init_buffer() outside of init_page_buffers(). So just fold it into init_page_buffers(). 
Signed-off-by: Eric Biggers Signed-off-by: Al Viro --- fs/buffer.c | 10 ++-------- include/linux/buffer_head.h | 1 - 2 files changed, 2 insertions(+), 9 deletions(-) (limited to 'fs') diff --git a/fs/buffer.c b/fs/buffer.c index 0736a6a2e2f0..3091801169ce 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -53,13 +53,6 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh, #define BH_ENTRY(list) list_entry((list), struct buffer_head, b_assoc_buffers) -void init_buffer(struct buffer_head *bh, bh_end_io_t *handler, void *private) -{ - bh->b_end_io = handler; - bh->b_private = private; -} -EXPORT_SYMBOL(init_buffer); - inline void touch_buffer(struct buffer_head *bh) { trace_block_touch_buffer(bh); @@ -922,7 +915,8 @@ init_page_buffers(struct page *page, struct block_device *bdev, do { if (!buffer_mapped(bh)) { - init_buffer(bh, NULL, NULL); + bh->b_end_io = NULL; + bh->b_private = NULL; bh->b_bdev = bdev; bh->b_blocknr = block; if (uptodate) diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 8b1bf8d3d4a2..58a82f58e44e 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -151,7 +151,6 @@ void buffer_check_dirty_writeback(struct page *page, void mark_buffer_dirty(struct buffer_head *bh); void mark_buffer_write_io_error(struct buffer_head *bh); -void init_buffer(struct buffer_head *, bh_end_io_t *, void *); void touch_buffer(struct buffer_head *bh); void set_bh_page(struct buffer_head *bh, struct page *page, unsigned long offset); -- cgit v1.2.3 From 854d3e63438d72cde8296a4c4564898c5f9dd01a Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Mon, 20 Nov 2017 18:05:07 +0300 Subject: dcache: subtract d_hash_shift from 32 in advance Signed-off-by: Alexey Dobriyan Signed-off-by: Al Viro --- fs/dcache.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/dcache.c b/fs/dcache.c index 17e6b84b9656..d4f5b52d99be 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -110,7 +110,7 @@ 
static struct hlist_bl_head *dentry_hashtable __read_mostly; static inline struct hlist_bl_head *d_hash(unsigned int hash) { - return dentry_hashtable + (hash >> (32 - d_hash_shift)); + return dentry_hashtable + (hash >> d_hash_shift); } #define IN_LOOKUP_SHIFT 10 @@ -3593,6 +3593,7 @@ static void __init dcache_init_early(void) &d_hash_mask, 0, 0); + d_hash_shift = 32 - d_hash_shift; } static void __init dcache_init(void) @@ -3619,6 +3620,7 @@ static void __init dcache_init(void) &d_hash_mask, 0, 0); + d_hash_shift = 32 - d_hash_shift; } /* SLAB cache for __getname() consumers */ -- cgit v1.2.3 From b35d786b674345bb32b5181d48408ec2de147011 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Mon, 20 Nov 2017 18:05:52 +0300 Subject: dcache: delete unused d_hash_mask Signed-off-by: Alexey Dobriyan Signed-off-by: Al Viro --- fs/dcache.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/dcache.c b/fs/dcache.c index d4f5b52d99be..f110e9eebb58 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -103,7 +103,6 @@ EXPORT_SYMBOL(slash_name); * information, yet avoid using a prime hash-size or similar. 
*/ -static unsigned int d_hash_mask __read_mostly; static unsigned int d_hash_shift __read_mostly; static struct hlist_bl_head *dentry_hashtable __read_mostly; @@ -3590,7 +3589,7 @@ static void __init dcache_init_early(void) 13, HASH_EARLY | HASH_ZERO, &d_hash_shift, - &d_hash_mask, + NULL, 0, 0); d_hash_shift = 32 - d_hash_shift; @@ -3617,7 +3616,7 @@ static void __init dcache_init(void) 13, HASH_ZERO, &d_hash_shift, - &d_hash_mask, + NULL, 0, 0); d_hash_shift = 32 - d_hash_shift; -- cgit v1.2.3 From 5bdd0c6f89fba430e18d636493398389dadc3b17 Mon Sep 17 00:00:00 2001 From: Jake Daryll Obina Date: Fri, 22 Sep 2017 00:00:14 +0800 Subject: jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path If jffs2_iget() fails for a newly-allocated inode, jffs2_do_clear_inode() can get called twice in the error handling path, the first call in jffs2_iget() itself and the second through iget_failed(). This can result in a use-after-free error in the second jffs2_do_clear_inode() call, such as shown by the oops below wherein the second jffs2_do_clear_inode() call was trying to free node fragments that were already freed in the first jffs2_do_clear_inode() call. [ 78.178860] jffs2: error: (1904) jffs2_do_read_inode_internal: CRC failed for read_inode of inode 24 at physical location 0x1fc00c [ 78.178914] Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b7b [ 78.185871] pgd = ffffffc03a567000 [ 78.188794] [6b6b6b6b6b6b6b7b] *pgd=0000000000000000, *pud=0000000000000000 [ 78.194968] Internal error: Oops: 96000004 [#1] PREEMPT SMP ...
[ 78.513147] PC is at rb_first_postorder+0xc/0x28 [ 78.516503] LR is at jffs2_kill_fragtree+0x28/0x90 [jffs2] [ 78.520672] pc : [] lr : [] pstate: 60000105 [ 78.526757] sp : ffffff800cea38f0 [ 78.528753] x29: ffffff800cea38f0 x28: ffffffc01f3f8e80 [ 78.532754] x27: 0000000000000000 x26: ffffff800cea3c70 [ 78.536756] x25: 00000000dc67c8ae x24: ffffffc033d6945d [ 78.540759] x23: ffffffc036811740 x22: ffffff800891a5b8 [ 78.544760] x21: 0000000000000000 x20: 0000000000000000 [ 78.548762] x19: ffffffc037d48910 x18: ffffff800891a588 [ 78.552764] x17: 0000000000000800 x16: 0000000000000c00 [ 78.556766] x15: 0000000000000010 x14: 6f2065646f6e695f [ 78.560767] x13: 6461657220726f66 x12: 2064656c69616620 [ 78.564769] x11: 435243203a6c616e x10: 7265746e695f6564 [ 78.568771] x9 : 6f6e695f64616572 x8 : ffffffc037974038 [ 78.572774] x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008 [ 78.576775] x5 : 002f91d85bd44a2f x4 : 0000000000000000 [ 78.580777] x3 : 0000000000000000 x2 : 000000403755e000 [ 78.584779] x1 : 6b6b6b6b6b6b6b6b x0 : 6b6b6b6b6b6b6b6b ... [ 79.038551] [] rb_first_postorder+0xc/0x28 [ 79.042962] [] jffs2_do_clear_inode+0x88/0x100 [jffs2] [ 79.048395] [] jffs2_evict_inode+0x3c/0x48 [jffs2] [ 79.053443] [] evict+0xb0/0x168 [ 79.056835] [] iput+0x1c0/0x200 [ 79.060228] [] iget_failed+0x30/0x3c [ 79.064097] [] jffs2_iget+0x2d8/0x360 [jffs2] [ 79.068740] [] jffs2_lookup+0xe8/0x130 [jffs2] [ 79.073434] [] lookup_slow+0x118/0x190 [ 79.077435] [] walk_component+0xfc/0x28c [ 79.081610] [] path_lookupat+0x84/0x108 [ 79.085699] [] filename_lookup+0x88/0x100 [ 79.089960] [] user_path_at_empty+0x58/0x6c [ 79.094396] [] vfs_statx+0xa4/0x114 [ 79.098138] [] SyS_newfstatat+0x58/0x98 [ 79.102227] [] __sys_trace_return+0x0/0x4 [ 79.106489] Code: d65f03c0 f9400001 b40000e1 aa0103e0 (f9400821) The jffs2_do_clear_inode() call in jffs2_iget() is unnecessary since iget_failed() will eventually call jffs2_do_clear_inode() if needed, so just remove it. 
Fixes: 5451f79f5f81 ("iget: stop JFFS2 from using iget() and read_inode()") Reviewed-by: Richard Weinberger Signed-off-by: Jake Daryll Obina Signed-off-by: Al Viro --- fs/jffs2/fs.c | 1 - 1 file changed, 1 deletion(-) (limited to 'fs') diff --git a/fs/jffs2/fs.c b/fs/jffs2/fs.c index e96c6b05e43e..3c96f4bdc549 100644 --- a/fs/jffs2/fs.c +++ b/fs/jffs2/fs.c @@ -362,7 +362,6 @@ error_io: ret = -EIO; error: mutex_unlock(&f->sem); - jffs2_do_clear_inode(c, f); iget_failed(inode); return ERR_PTR(ret); } -- cgit v1.2.3