summaryrefslogtreecommitdiff
path: root/fs/nfsd/nfsctl.c
AgeCommit message (Collapse)AuthorFilesLines
2023-09-01Merge tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds1-0/+1
Pull nfsd updates from Chuck Lever: "I'm thrilled to announce that the Linux in-kernel NFS server now offers NFSv4 write delegations. A write delegation enables a client to cache data and metadata for a single file more aggressively, reducing network round trips and server workload. Many thanks to Dai Ngo for contributing this facility, and to Jeff Layton and Neil Brown for reviewing and testing it. This release also sees the removal of all support for DES- and triple-DES-based Kerberos encryption types in the kernel's SunRPC implementation. These encryption types have been deprecated by the Internet community for years and are considered insecure. This change affects both the in-kernel NFS client and server. The server's UDP and TCP socket transports have now fully adopted David Howells' new bio_vec iterator so that no more than one sendmsg() call is needed to transmit each RPC message. In particular, this helps kTLS optimize record boundaries when sending RPC-with-TLS replies, and it takes the server a baby step closer to handling file I/O via folios. We've begun work on overhauling the SunRPC thread scheduler to remove a costly linked-list walk when looking for an idle RPC service thread to wake. The pre-requisites are included in this release. Thanks to Neil Brown for his ongoing work on this improvement" * tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (56 commits) Documentation: Add missing documentation for EXPORT_OP flags SUNRPC: Remove unused declaration rpc_modcount() SUNRPC: Remove unused declarations NFSD: da_addr_body field missing in some GETDEVICEINFO replies SUNRPC: Remove return value of svc_pool_wake_idle_thread() SUNRPC: make rqst_should_sleep() idempotent() SUNRPC: Clean up svc_set_num_threads SUNRPC: Count ingress RPC messages per svc_pool SUNRPC: Deduplicate thread wake-up code SUNRPC: Move trace_svc_xprt_enqueue SUNRPC: Add enum svc_auth_status SUNRPC: change svc_xprt::xpt_flags bits to enum SUNRPC: change svc_rqst::rq_flags bits to enum SUNRPC: change svc_pool::sp_flags bits to enum SUNRPC: change cache_head.flags bits to enum SUNRPC: remove timeout arg from svc_recv() SUNRPC: change svc_recv() to return void. SUNRPC: call svc_process() from svc_recv(). nfsd: separate nfsd_last_thread() from nfsd_put() nfsd: Simplify code around svc_exit_thread() call in nfsd() ...
2023-08-30nfsd: add a MODULE_DESCRIPTIONJeff Layton1-0/+1
I got this today from modpost: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nfsd/nfsd.o Add a module description. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-28Merge tag 'v6.6-vfs.ctime' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs timestamp updates from Christian Brauner: "This adds VFS support for multi-grain timestamps and converts tmpfs, xfs, ext4, and btrfs to use them. This carries acks from all relevant filesystems. The VFS always uses coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot of metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g., backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. This introduces fine-grained timestamps that are used when they are actively queried. This uses the 31st bit of the ctime tv_nsec field to indicate that something has queried the inode for the mtime or ctime. When this flag is set, on the next mtime or ctime update, the kernel will fetch a fine-grained timestamp instead of the usual coarse-grained one. As POSIX generally mandates that when the mtime changes, the ctime must also change the kernel always stores normalized ctime values, so only the first 30 bits of the tv_nsec field are ever used. Filesytems can opt into this behavior by setting the FS_MGTIME flag in the fstype. Filesystems that don't set this flag will continue to use coarse-grained timestamps. Various preparatory changes, fixes and cleanups are included: - Fixup all relevant places where POSIX requires updating ctime together with mtime. This is a wide-range of places and all maintainers provided necessary Acks. - Add new accessors for inode->i_ctime directly and change all callers to rely on them. Plain accesses to inode->i_ctime are now gone and it is accordingly rename to inode->__i_ctime and commented as requiring accessors. - Extend generic_fillattr() to pass in a request mask mirroring in a sense the statx() uapi. This allows callers to pass in a request mask to only get a subset of attributes filled in. - Rework timestamp updates so it's possible to drop the @now parameter the update_time() inode operation and associated helpers. - Add inode_update_timestamps() and convert all filesystems to it removing a bunch of open-coding" * tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits) btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps tmpfs: add support for multigrain timestamps fs: add infrastructure for multigrain timestamps fs: drop the timespec64 argument from update_time xfs: have xfs_vn_update_time gets its own timestamp fat: make fat_update_time get its own timestamp fat: remove i_version handling from fat_update_time ubifs: have ubifs_update_time use inode_update_timestamps btrfs: have it use inode_update_timestamps fs: drop the timespec64 arg from generic_update_time fs: pass the request_mask to generic_fillattr fs: remove silly warning from current_time gfs2: fix timestamp handling on quota inodes fs: rename i_ctime field to __i_ctime selinux: convert to ctime accessor functions security: convert to ctime accessor functions apparmor: convert to ctime accessor functions sunrpc: convert to ctime accessor functions ...
2023-08-24NFSD: Fix a thinko introduced by recent trace point changesChuck Lever1-0/+1
The fixed commit erroneously removed a call to nfsd_end_grace(), which makes calls to write_v4_end_grace() a no-op. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202308241229.68396422-oliver.sang@intel.com Fixes: 39d432fc7630 ("NFSD: trace nfsctl operations") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-07-24nfsd: convert to ctime accessor functionsJeff Layton1-1/+1
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-56-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-06-18NFSD: Distinguish per-net namespace initializationChuck Lever1-4/+19
I find the naming of nfsd_init_net() and nfsd_startup_net() to be confusingly similar. Rename the namespace initialization and tear- down ops and add comments to distinguish their separate purposes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-18nfsd: move init of percpu reply_cache_stats counters back to nfsd_init_netJeff Layton1-1/+9
Commit f5f9d4a314da ("nfsd: move reply cache initialization into nfsd startup") moved the initialization of the reply cache into nfsd startup, but didn't account for the stats counters, which can be accessed before nfsd is ever started. The result can be a NULL pointer dereference when someone accesses /proc/fs/nfsd/reply_cache_stats while nfsd is still shut down. This is a regression and a user-triggerable oops in the right situation: - non-x86_64 arch - /proc/fs/nfsd is mounted in the namespace - nfsd is not started in the namespace - unprivileged user calls "cat /proc/fs/nfsd/reply_cache_stats" Although this is easy to trigger on some arches (like aarch64), on x86_64, calling this_cpu_ptr(NULL) evidently returns a pointer to the fixed_percpu_data. That struct looks just enough like a newly initialized percpu var to allow nfsd_reply_cache_stats_show to access it without Oopsing. Move the initialization of the per-net+per-cpu reply-cache counters back into nfsd_init_net, while leaving the rest of the reply cache allocations to be done at nfsd startup time. Kudos to Eirik who did most of the legwork to track this down. Cc: stable@vger.kernel.org # v6.3+ Fixes: f5f9d4a314da ("nfsd: move reply cache initialization into nfsd startup") Reported-and-tested-by: Eirik Fuller <efuller@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2215429 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-05NFSD: trace nfsctl operationsChuck Lever1-8/+25
Add trace log eye-catchers that record the arguments used to configure NFSD. This helps when troubleshooting the NFSD administrative interfaces. These tracepoints can capture NFSD start-up and shutdown times and parameters, changes in lease time and thread count, and a request to end the namespace's NFSv4 grace period, in addition to the set of NFS versions that are enabled. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-05NFSD: Clean up nfsctl_transaction_write()Chuck Lever1-6/+6
For easier readability, follow the common convention: if (error) handle_error; continue_normally; No behavior change is expected. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-05NFSD: Clean up nfsctl white-space damageChuck Lever1-19/+19
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-05-31nfsd: fix double fget() bug in __write_ports_addfd()Dan Carpenter1-6/+1
The bug here is that you cannot rely on getting the same socket from multiple calls to fget() because userspace can influence that. This is a kind of double fetch bug. The fix is to delete the svc_alien_sock() function and instead do the checking inside the svc_addsock() function. Fixes: 3064639423c4 ("nfsd: check passed socket's net matches NFSd superblock's one") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-05-02nfsd: define exports_proc_ops with CONFIG_PROC_FSTom Rix1-12/+13
gcc with W=1 and ! CONFIG_PROC_FS fs/nfsd/nfsctl.c:161:30: error: ‘exports_proc_ops’ defined but not used [-Werror=unused-const-variable=] 161 | static const struct proc_ops exports_proc_ops = { | ^~~~~~~~~~~~~~~~ The only use of exports_proc_ops is when CONFIG_PROC_FS is defined, so its definition should be likewise conditional. Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20NFSD: Clean up nfsd_symlink()Chuck Lever1-8/+3
The pointer dentry is assigned a value that is never read, the assignment is redundant and can be removed. Cleans up clang-scan warning: fs/nfsd/nfsctl.c:1231:2: warning: Value stored to 'dentry' is never read [deadcode.DeadStores] dentry = ERR_PTR(ret); No need to initialize "int ret = -ENOMEM;" either. These are vestiges of nfsd_mkdir(), from whence I copied nfsd_symlink(). Reported-by: Colin Ian King <colin.i.king@gmail.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20NFSD: Replace /proc/fs/nfsd/supported_krb5_enctypes with a symlinkChuck Lever1-16/+58
Now that I've added a file under /proc/net/rpc that is managed by the SunRPC's Kerberos mechanism, replace NFSD's supported_krb5_enctypes file with a symlink to the new SunRPC proc file, which contains exactly the same content. Remarkably, commit b0b0c0a26e84 ("nfsd: add proc file listing kernel's gss_krb5 enctypes") added the nfsd_supported_krb5_enctypes file in 2011, but this file has never been documented in nfsd(7). Tested-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Simo Sorce <simo@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20nfsd: move reply cache initialization into nfsd startupJeff Layton1-8/+0
There's no need to start the reply cache before nfsd is up and running, and doing so means that we register a shrinker for every net namespace instead of just the ones where nfsd is running. Move it to the per-net nfsd startup instead. Reported-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-01-12NFSD: register/unregister of nfsd-client shrinker at nfsd startup/shutdown timeDai Ngo1-6/+1
Currently the nfsd-client shrinker is registered and unregistered at the time the nfsd module is loaded and unloaded. The problem with this is the shrinker is being registered before all of the relevant fields in nfsd_net are initialized when nfsd is started. This can lead to an oops when memory is low and the shrinker is called while nfsd is not running. This patch moves the register/unregister of nfsd-client shrinker from module load/unload time to nfsd startup/shutdown time. Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition") Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-28nfsd: allow disabling NFSv2 at compile timeJeff Layton1-0/+2
rpc.nfsd stopped supporting NFSv2 a year ago. Take the next logical step toward deprecating it and allow NFSv2 support to be compiled out. Add a new CONFIG_NFSD_V2 option that can be turned off and rework the CONFIG_NFSD_V?_ACL option dependencies. Add a description that discourages enabling it. Also, change the description of CONFIG_NFSD to state that the always-on version is now 3 instead of 2. Finally, add an #ifdef around "case 2:" in __write_versions. When NFSv2 is disabled at compile time, this should make the kernel ignore attempts to disable it at runtime, but still error out when trying to enable it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-11-28nfsd: ignore requests to disable unsupported versionsJeff Layton1-1/+3
The kernel currently errors out if you attempt to enable or disable a version that it doesn't recognize. Change it to ignore attempts to disable an unrecognized version. If we don't support it, then there is no harm in doing so. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-10-11NFSD: unregister shrinker when nfsd_init_net() failsTetsuo Handa1-1/+3
syzbot is reporting UAF read at register_shrinker_prepared() [1], for commit 7746b32f467b3813 ("NFSD: add shrinker to reap courtesy clients on low memory condition") missed that nfsd4_leases_net_shutdown() from nfsd_exit_net() is called only when nfsd_init_net() succeeded. If nfsd_init_net() fails due to nfsd_reply_cache_init() failure, register_shrinker() from nfsd4_init_leases_net() has to be undone before nfsd_init_net() returns. Link: https://syzkaller.appspot.com/bug?extid=ff796f04613b4c84ad89 [1] Reported-by: syzbot <syzbot+ff796f04613b4c84ad89@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 7746b32f467b3813 ("NFSD: add shrinker to reap courtesy clients on low memory condition") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_file_cache_stats_fopsChenXiaoSong1-7/+2
Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code. Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fopsChenXiaoSong1-7/+3
Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code. nfsd_net is converted from seq_file->file instead of seq_file->private in nfsd_reply_cache_stats_show(). Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> [ cel: reduce line length ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26nfsd: use DEFINE_SHOW_ATTRIBUTE to define export_features_fops and ↵ChenXiaoSong1-24/+5
supported_enctypes_fops Use DEFINE_SHOW_ATTRIBUTE helper macro to simplify the code. Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> [ cel: reduce line length ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-09-26NFSD: add shrinker to reap courtesy clients on low memory conditionDai Ngo1-2/+4
Add courtesy_client_reaper to react to low memory condition triggered by the system memory shrinker. The delayed_work for the courtesy_client_reaper is scheduled on the shrinker's count callback using the laundry_wq. The shrinker's scan callback is not used for expiring the courtesy clients due to potential deadlocks. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-30nfsd: silence extraneous printk on nfsd.ko insertionJeff Layton1-1/+0
This printk pops every time nfsd.ko gets plugged in. Most kmods don't do that and this one is not very informative. Olaf's email address seems to be defunct at this point anyway. Just drop it. Cc: Olaf Kirch <okir@suse.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-30NFSD: refactoring v4 specific code to a helper in nfs4state.cDai Ngo1-8/+1
This patch moves the v4 specific code from nfsd_init_net() to nfsd4_init_leases_net() helper in nfs4state.c Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-30NFSD: Hook up the filecache stat fileChuck Lever1-0/+10
There has always been the capability of exporting filecache metrics via /proc, but it was never hooked up. Let's surface these metrics to enable better observability of the filecache. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-30nfsd: remove redundant assignment to variable lenColin Ian King1-1/+0
Variable len is being assigned a value zero and this is never read, it is being re-assigned later. The assignment is redundant and can be removed. Cleans up clang scan-build warning: fs/nfsd/nfsctl.c:636:2: warning: Value stored to 'len' is never read Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23nfsd: Fix null-ptr-deref in nfsd_fill_super()Zhang Xiaoxu1-7/+7
KASAN report null-ptr-deref as follows: BUG: KASAN: null-ptr-deref in nfsd_fill_super+0xc6/0xe0 [nfsd] Write of size 8 at addr 000000000000005d by task a.out/852 CPU: 7 PID: 852 Comm: a.out Not tainted 5.18.0-rc7-dirty #66 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 kasan_report+0xab/0x120 ? nfsd_mkdir+0x71/0x1c0 [nfsd] ? nfsd_fill_super+0xc6/0xe0 [nfsd] nfsd_fill_super+0xc6/0xe0 [nfsd] ? nfsd_mkdir+0x1c0/0x1c0 [nfsd] get_tree_keyed+0x8e/0x100 vfs_get_tree+0x41/0xf0 __do_sys_fsconfig+0x590/0x670 ? fscontext_read+0x180/0x180 ? anon_inode_getfd+0x4f/0x70 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae This can be reproduce by concurrent operations: 1. fsopen(nfsd)/fsconfig 2. insmod/rmmod nfsd Since the nfsd file system is registered before than nfsd_net allocated, the caller may get the file_system_type and use the nfsd_net before it allocated, then null-ptr-deref occurred. So init_nfsd() should call register_filesystem() last. Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-23nfsd: Unregister the cld notifier when laundry_wq create failedZhang Xiaoxu1-1/+3
If laundry_wq create failed, the cld notifier should be unregistered. Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-19NFSD: move create/destroy of laundry_wq to init_nfsd and exit_nfsdDai Ngo1-0/+4
This patch moves create/destroy of laundry_wq from nfs4_state_start and nfs4_state_shutdown_net to init_nfsd and exit_nfsd to prevent the laundromat from being freed while a thread is processing a conflicting lock. Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28SUNRPC: Rename svc_close_xprt()Chuck Lever1-1/+1
Clean up: Use the "svc_xprt_<task>" function naming convention as is used for other external APIs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-02-28SUNRPC: Rename svc_create_xprt()Chuck Lever1-4/+4
Clean up: Use the "svc_xprt_<task>" function naming convention as is used for other external APIs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-01-24fsnotify: fix fsnotify hooks in pseudo filesystemsAmir Goldstein1-2/+3
Commit 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()") moved the fsnotify delete hook before d_delete() so fsnotify will have access to a positive dentry. This allowed a race where opening the deleted file via cached dentry is now possible after receiving the IN_DELETE event. To fix the regression in pseudo filesystems, convert d_delete() calls to d_drop() (see commit 46c46f8df9aa ("devpts_pty_kill(): don't bother with d_delete()") and move the fsnotify hook after d_drop(). Add a missing fsnotify_unlink() hook in nfsdfs that was found during the audit of fsnotify hooks in pseudo filesystems. Note that the fsnotify hooks in simple_recursive_removal() follow d_invalidate(), so they require no change. Link: https://lore.kernel.org/r/20220120215305.282577-2-amir73il@gmail.com Reported-by: Ivan Delalande <colona@arista.com> Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/ Fixes: 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()") Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2022-01-08NFSD: Clean up the nfsd_net::nfssvc_boot fieldChuck Lever1-1/+2
There are two boot-time fields in struct nfsd_net: one called boot_time and one called nfssvc_boot. The latter is used only to form write verifiers, but its documenting comment declares: /* Time of server startup */ Since commit 27c438f53e79 ("nfsd: Support the server resetting the boot verifier"), this field can be reset at any time; it's no longer tied to server restart. So that comment is stale. Also, according to pahole, struct timespec64 is 16 bytes long on x86_64. The nfssvc_boot field is used only to form a write verifier, which is 8 bytes long. Let's clarify this situation by manufacturing an 8-byte verifier in nfs_reset_boot_verifier() and storing only that in struct nfsd_net. We're grabbing 128 bits of time, so compress all of those into a 64-bit verifier instead of throwing out the high-order bits. In the future, the siphash_key can be re-used for other hashed objects per-nfsd_net. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13NFSD: simplify locking for network notifier.NeilBrown1-2/+0
nfsd currently maintains an open-coded read/write semaphore (refcount and wait queue) for each network namespace to ensure the nfs service isn't shut down while the notifier is running. This is excessive. As there is unlikely to be contention between notifiers and they run without sleeping, a single spinlock is sufficient to avoid problems. Signed-off-by: NeilBrown <neilb@suse.de> [ cel: ensure nfsd_notifier_lock is static ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13SUNRPC: stop using ->sv_nrthreads as a refcountNeilBrown1-12/+10
The use of sv_nrthreads as a general refcount results in clumsy code, as is seen by various comments needed to explain the situation. This patch introduces a 'struct kref' and uses that for reference counting, leaving sv_nrthreads to be a pure count of threads. The kref is managed particularly in svc_get() and svc_put(), and also nfsd_put(); svc_destroy() now takes a pointer to the embedded kref, rather than to the serv. nfsd allows the svc_serv to exist with ->sv_nrhtreads being zero. This happens when a transport is created before the first thread is started. To support this, a 'keep_active' flag is introduced which holds a ref on the svc_serv. This is set when any listening socket is successfully added (unless there are running threads), and cleared when the number of threads is set. So when the last thread exits, the nfs_serv will be destroyed. The use of 'keep_active' replaces previous code which checked if there were any permanent sockets. We no longer clear ->rq_server when nfsd() exits. This was done to prevent svc_exit_thread() from calling svc_destroy(). Instead we take an extra reference to the svc_serv to prevent svc_destroy() from being called. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13SUNRPC/NFSD: clean up get/put functions.NeilBrown1-2/+2
svc_destroy() is poorly named - it doesn't necessarily destroy the svc, it might just reduce the ref count. nfsd_destroy() is poorly named for the same reason. This patch: - removes the refcount functionality from svc_destroy(), moving it to a new svc_put(). Almost all previous callers of svc_destroy() now call svc_put(). - renames nfsd_destroy() to nfsd_put() and improves the code, using the new svc_destroy() rather than svc_put() - removes a few comments that explain the important for balanced get/put calls. This should be obvious. The only non-trivial part of this is that svc_destroy() would call svc_sock_update() on a non-final decrement. It can no longer do that, and svc_put() isn't really a good place of it. This call is now made from svc_exit_thread() which seems like a good place. This makes the call *before* sv_nrthreads is decremented rather than after. This is not particularly important as the call just sets a flag which causes sv_nrthreads set be checked later. A subsequent patch will improve the ordering. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13NFSD: handle errors better in write_ports_addfd()NeilBrown1-1/+1
If write_ports_add() fails, we shouldn't destroy the serv, unless we had only just created it. So if there are any permanent sockets already attached, leave the serv in place. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-10nfsd: Fix nsfd startup race (again)Alexander Sverdlin1-7/+7
Commit bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") has re-opened rpc_pipefs_event() race against nfsd_net_id registration (register_pernet_subsys()) which has been fixed by commit bb7ffbf29e76 ("nfsd: fix nsfd startup race triggering BUG_ON"). Restore the order of register_pernet_subsys() vs register_cld_notifier(). Add WARN_ON() to prevent a future regression. Crash info: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000012 CPU: 8 PID: 345 Comm: mount Not tainted 5.4.144-... #1 pc : rpc_pipefs_event+0x54/0x120 [nfsd] lr : rpc_pipefs_event+0x48/0x120 [nfsd] Call trace: rpc_pipefs_event+0x54/0x120 [nfsd] blocking_notifier_call_chain rpc_fill_super get_tree_keyed rpc_fs_get_tree vfs_get_tree do_mount ksys_mount __arm64_sys_mount el0_svc_handler el0_svc Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") Cc: stable@vger.kernel.org Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-11-11Merge tag 'nfsd-5.16' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-3/+3
Pull nfsd updates from Bruce Fields: "A slow cycle for nfsd: mainly cleanup, including Neil's patch dropping support for a filehandle format deprecated 20 years ago, and further xdr-related cleanup from Chuck" * tag 'nfsd-5.16' of git://linux-nfs.org/~bfields/linux: (26 commits) nfsd4: remove obselete comment nfsd: document server-to-server-copy parameters NFSD:fix boolreturn.cocci warning nfsd: update create verifier comment SUNRPC: Change return value type of .pc_encode SUNRPC: Replace the "__be32 *p" parameter to .pc_encode NFSD: Save location of NFSv4 COMPOUND status SUNRPC: Change return value type of .pc_decode SUNRPC: Replace the "__be32 *p" parameter to .pc_decode SUNRPC: De-duplicate .pc_release() call sites SUNRPC: Simplify the SVC dispatch code path SUNRPC: Capture value of xdr_buf::page_base SUNRPC: Add trace event when alloc_pages_bulk() makes no progress svcrdma: Split svcrmda_wc_{read,write} tracepoints svcrdma: Split the svcrdma_wc_send() tracepoint svcrdma: Split the svcrdma_wc_receive() tracepoint NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment() SUNRPC: xdr_stream_subsegment() must handle non-zero page_bases NFSD: Initialize pointer ni with NULL and not plain integer 0 NFSD: simplify struct nfsfh ...
2021-10-06NFSD: Keep existing listeners on portlist errorBenjamin Coddington1-1/+4
If nfsd has existing listening sockets without any processes, then an error returned from svc_create_xprt() for an additional transport will remove those existing listeners. We're seeing this in practice when userspace attempts to create rpcrdma transports without having the rpcrdma modules present before creating nfsd kernel processes. Fix this by checking for existing sockets before calling nfsd_destroy(). Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-10-02NFSD: simplify struct nfsfhNeilBrown1-3/+3
Most of the fields in 'struct knfsd_fh' are 2 levels deep (a union and a struct) and are accessed using macros like: #define fh_FOO fh_base.fh_new.fb_FOO This patch makes the union and struct anonymous, so that "fh_FOO" can be a name directly within 'struct knfsd_fh' and the #defines aren't needed. The file handle as a whole is sometimes accessed as "fh_base" or "fh_base.fh_pad", neither of which are particularly helpful names. As the struct holding the filehandle is now anonymous, we cannot use the name of that, so we union it with 'fh_raw' and use that where the raw filehandle is needed. fh_raw also ensure the structure is large enough for the largest possible filehandle. fh_raw is a 'char' array, removing any need to cast it for memcpy etc. SVCFH_fmt() is simplified using the "%ph" printk format. This changes the appearance of filehandles in dprintk() debugging, making them a little more precise. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-09-30nfsd: fix error handling of register_pernet_subsys() in init_nfsd()Patrick Ho1-1/+1
init_nfsd() should not unregister pernet subsys if the register fails but should instead unwind from the last successful operation which is register_filesystem(). Unregistering a failed register_pernet_subsys() call can result in a kernel GPF as revealed by programmatically injecting an error in register_pernet_subsys(). Verified the fix handled failure gracefully with no lingering nfsd entry in /proc/filesystems. This change was introduced by the commit bd5ae9288d64 ("nfsd: register pernet ops last, unregister first"), the original error handling logic was correct. Fixes: bd5ae9288d64 ("nfsd: register pernet ops last, unregister first") Cc: stable@vger.kernel.org Signed-off-by: Patrick Ho <Patrick.Ho@netapp.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-04-20nfsd: Fix fall-through warnings for ClangGustavo A. R. Silva1-0/+1
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple warnings by explicitly adding a couple of break statements instead of just letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22nfsd: report client confirmation status in "info" fileNeilBrown1-6/+8
mountd can now monitor clients appearing and disappearing in /proc/fs/nfsd/clients, and will log these events, in liu of the logging of mount/unmount events for NFSv3. Currently it cannot distinguish between unconfirmed clients (which might be transient and totally uninteresting) and confirmed clients. So add a "status: " line which reports either "confirmed" or "unconfirmed", and use fsnotify to report that the info file has been modified. This requires a bit of infrastructure to keep the dentry for the "info" file. There is no need to take a counted reference as the dentry must remain around until the client is removed. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-03-22nfsd: Ensure knfsd shuts down when the "nfsd" pseudofs is unmountedTrond Myklebust1-12/+2
In order to ensure that knfsd threads don't linger once the nfsd pseudofs is unmounted (e.g. when the container is killed) we let nfsd_umount() shut down those threads and wait for them to exit. This also should ensure that we don't need to do a kernel mount of the pseudofs, since the thread lifetime is now limited by the lifetime of the filesystem. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-02-23Merge tag 'nfsd-5.12-1' of ↵Linus Torvalds1-7/+7
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull more nfsd updates from Chuck Lever: "Here are a few additional NFSD commits for the merge window: Optimization: - Cork the socket while there are queued replies Fixes: - DRC shutdown ordering - svc_rdma_accept() lockdep splat" * tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Further clean up svc_tcp_sendmsg() SUNRPC: Remove redundant socket flags from svc_tcp_sendmsg() SUNRPC: Use TCP_CORK to optimise send performance on the server svcrdma: Hold private mutex while invoking rdma_accept() nfsd: register pernet ops last, unregister first
2021-02-15nfsd: register pernet ops last, unregister firstJ. Bruce Fields1-7/+7
These pernet operations may depend on stuff set up or torn down in the module init/exit functions. And they may be called at any time in between. So it makes more sense for them to be the last to be registered in the init function, and the first to be unregistered in the exit function. In particular, without this, the drc slab is being destroyed before all the per-net drcs are shut down, resulting in an "Objects remaining in nfsd_drc on __kmem_cache_shutdown()" warning in exit_nfsd. Reported-by: Zhi Li <yieli@redhat.com> Fixes: 3ba75830ce17 "nfsd4: drc containerization" Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25nfsd: report per-export statsAmir Goldstein1-0/+3
Collect some nfsd stats per export in addition to the global stats. A new nfsdfs export_stats file is created. It uses the same ops as the exports file to iterate the export entries and we use the file's name to determine the reported info per export. For example: $ cat /proc/fs/nfsd/export_stats # Version 1.1 # Path Client Start-time # Stats /test localhost 92 fh_stale: 0 io_read: 9 io_write: 1 Every export entry reports the start time when stats collection started, so stats collecting scripts can know if stats where reset between samples. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25nfsd: protect concurrent access to nfsd stats countersAmir Goldstein1-1/+4
nfsd stats counters can be updated by concurrent nfsd threads without any protection. Convert some nfsd_stats and nfsd_net struct members to use percpu counters. The longest_chain* members of struct nfsd_net remain unprotected. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>