summaryrefslogtreecommitdiff
path: root/net/sunrpc
AgeCommit message (Collapse)AuthorFilesLines
2023-06-18svcrdma: Fix stale commentChuck Lever1-4/+2
Commit 7d81ee8722d6 ("svcrdma: Single-stage RDMA Read") changed the behavior of svc_rdma_recvfrom() but neglected to update the documenting comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17SUNRPC: Address RCU warning in net/sunrpc/svc.cChuck Lever1-2/+6
$ make C=1 W=1 net/sunrpc/svc.o make[1]: Entering directory 'linux/obj/manet.1015granger.net' GEN Makefile CALL linux/server-development/scripts/checksyscalls.sh DESCEND objtool INSTALL libsubcmd_headers DESCEND bpf/resolve_btfids INSTALL libsubcmd_headers CC [M] net/sunrpc/svc.o CHECK linux/server-development/net/sunrpc/svc.c linux/server-development/net/sunrpc/svc.c:1225:9: warning: incorrect type in argument 1 (different address spaces) linux/server-development/net/sunrpc/svc.c:1225:9: expected struct spinlock [usertype] *lock linux/server-development/net/sunrpc/svc.c:1225:9: got struct spinlock [noderef] __rcu * linux/server-development/net/sunrpc/svc.c:1227:40: warning: incorrect type in argument 1 (different address spaces) linux/server-development/net/sunrpc/svc.c:1227:40: expected struct spinlock [usertype] *lock linux/server-development/net/sunrpc/svc.c:1227:40: got struct spinlock [noderef] __rcu * make[1]: Leaving directory 'linux/obj/manet.1015granger.net' Warning introduced by commit 913292c97d75 ("sched.h: Annotate sighand_struct with __rcu"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17SUNRPC: Use sysfs_emit in place of strlcpy/sprintfAzeem Shaikh1-5/+5
Part of an effort to remove strlcpy() tree-wide [1]. Direct replacement is safe here since the getter in kernel_params_ops handles -errno return [2]. [1] https://github.com/KSPP/linux/issues/89 [2] https://elixir.bootlin.com/linux/v6.4-rc6/source/include/linux/moduleparam.h#L52 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17SUNRPC: Remove transport class dprintk call sitesChuck Lever1-3/+0
Remove a couple of dprintk call sites that are of little value. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17SUNRPC: Fix comments for transport class registrationChuck Lever1-0/+12
The preceding block comment before svc_register_xprt_class() is not related to that function. While we're here, add proper documenting comments for these two publicly-visible functions. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17svcrdma: Remove an unused argument from __svc_rdma_put_rw_ctxt()Chuck Lever1-4/+3
Clean up. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17svcrdma: trace cc_release callsChuck Lever1-0/+2
This event brackets the svcrdma_post_* trace points. If this trace event is enabled but does not appear as expected, that indicates a chunk_ctxt leak. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17svcrdma: Convert "might sleep" comment into a code annotationChuck Lever2-2/+5
Try to catch incorrect calling contexts mechanically rather than by code review. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17SUNRPC: Move initialization of rq_stimeChuck Lever1-1/+2
Micro-optimization: Call ktime_get() only when ->xpo_recvfrom() has given us a full RPC message to process. rq_stime isn't used otherwise, so this avoids pointless work. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17SUNRPC: Optimize page release in svc_rdma_sendto()Chuck Lever1-2/+2
Now that we have bulk page allocation and release APIs, it's more efficient to use those than it is for nfsd threads to wait for send completions. Previous patches have eliminated the calls to wait_for_completion() and complete(), in order to avoid scheduler overhead. Now release pages-under-I/O in the send completion handler using the efficient bulk release API. I've measured a 7% reduction in cumulative CPU utilization in svc_rdma_sendto(), svc_rdma_wc_send(), and svc_xprt_release(). In particular, using release_pages() instead of complete() cuts the time per svc_rdma_wc_send() call by two-thirds. This helps improve scalability because svc_rdma_wc_send() is single-threaded per connection. Reviewed-by: Tom Talpey <tom@talpey.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-17svcrdma: Prevent page release when nothing was receivedChuck Lever1-6/+6
I noticed that svc_rqst_release_pages() was still unnecessarily releasing a page when svc_rdma_recvfrom() returns zero. Fixes: a53d5cb0646a ("svcrdma: Avoid releasing a page in svc_xprt_release()") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12svcrdma: Revert 2a1e4f21d841 ("svcrdma: Normalize Send page handling")Chuck Lever2-22/+13
Get rid of the completion wait in svc_rdma_sendto(), and release pages in the send completion handler again. A subsequent patch will handle releasing those pages more efficiently. Reverted by hand: patch -R would not apply cleanly. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12SUNRPC: Revert 579900670ac7 ("svcrdma: Remove unused sc_pages field")Chuck Lever1-0/+25
Pre-requisite for releasing pages in the send completion handler. Reverted by hand: patch -R would not apply cleanly. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12SUNRPC: Revert cc93ce9529a6 ("svcrdma: Retain the page backing ↵Chuck Lever1-5/+0
rq_res.head[0].iov_base") Pre-requisite for releasing pages in the send completion handler. Reverted by hand: patch -R would not apply cleanly. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12svcrdma: Clean up allocation of svc_rdma_rw_ctxtChuck Lever1-4/+6
The physical device's favored NUMA node ID is available when allocating a rw_ctxt. Use that value instead of relying on the assumption that the memory allocation happens to be running on a node close to the device. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12svcrdma: Clean up allocation of svc_rdma_send_ctxtChuck Lever1-5/+4
The physical device's favored NUMA node ID is available when allocating a send_ctxt. Use that value instead of relying on the assumption that the memory allocation happens to be running on a node close to the device. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12svcrdma: Clean up allocation of svc_rdma_recv_ctxtChuck Lever1-11/+7
The physical device's favored NUMA node ID is available when allocating a recv_ctxt. Use that value instead of relying on the assumption that the memory allocation happens to be running on a node close to the device. This clean up eliminates the hack of destroying recv_ctxts that were not created by the receive CQ thread -- recv_ctxts are now always allocated on a "good" node. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-12svcrdma: Allocate new transports on device's NUMA nodeChuck Lever1-9/+9
The physical device's NUMA node ID is available when allocating an svc_xprt for an incoming connection. Use that value to ensure the svc_xprt structure is allocated on the NUMA node closest to the device. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-11NFSD: Hoist rq_vec preparation into nfsd_read() [step two]Chuck Lever1-14/+12
Now that the preparation of an rq_vec has been removed from the generic read path, nfsd_splice_read() no longer needs to reset rq_next_page. nfsd4_encode_read() calls nfsd_splice_read() directly. As far as I can ascertain, resetting rq_next_page for NFSv4 splice reads is unnecessary because rq_next_page is already set correctly. Moreover, resetting it might even be incorrect if previous operations in the COMPOUND have already consumed at least a page of the send buffer. I would expect that the result would be encoding the READ payload over previously-encoded results. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-05SUNRPC: Use __alloc_bulk_pages() in svc_init_buffer()Chuck Lever1-16/+7
Clean up: Use the bulk page allocator when filling a server thread's buffer page array. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-05SUNRPC: Resupply rq_pages from node-local memoryChuck Lever1-2/+3
svc_init_buffer() is careful to allocate the initial set of server thread buffer pages from memory on the local NUMA node. svc_alloc_arg() should also be that careful. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-05SUNRPC: Trace struct svc_sock lifetime eventsChuck Lever1-1/+3
Capture a timestamp and pointer address during the creation and destruction of struct svc_sock to record its lifetime. This helps to diagnose transport reference counting issues. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-05SUNRPC: Improve observability in svc_tcp_accept()Chuck Lever1-7/+2
The -ENOMEM arm could fire repeatedly if the system runs low on memory, so remove it. Don't bother to trace -EAGAIN error events, since those fire after a listener is created (with no work done) and once again after an accept has been handled successfully (again, with no work done). Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-05SUNRPC: Remove dprintk() in svc_handle_xprt()Chuck Lever1-3/+0
When enabled, this dprintk() fires for every incoming RPC, which is an enormous amount of log traffic. These days, after the first few hundred log messages, the system journald is just going to mute it, along with all other NFSD debug output. Let's rely on trace points for this high-traffic information instead. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-05SUNRPC: Fix an incorrect commentChuck Lever1-1/+1
The correct function name is svc_tcp_listen_data_ready(). Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-05SUNRPC: Fix UAF in svc_tcp_listen_data_ready()Ding Hui1-12/+11
After the listener svc_sock is freed, and before invoking svc_tcp_accept() for the established child sock, there is a window that the newsock retaining a freed listener svc_sock in sk_user_data which cloning from parent. In the race window, if data is received on the newsock, we will observe use-after-free report in svc_tcp_listen_data_ready(). Reproduce by two tasks: 1. while :; do rpc.nfsd 0 ; rpc.nfsd; done 2. while :; do echo "" | ncat -4 127.0.0.1 2049 ; done KASAN report: ================================================================== BUG: KASAN: slab-use-after-free in svc_tcp_listen_data_ready+0x1cf/0x1f0 [sunrpc] Read of size 8 at addr ffff888139d96228 by task nc/102553 CPU: 7 PID: 102553 Comm: nc Not tainted 6.3.0+ #18 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 Call Trace: <IRQ> dump_stack_lvl+0x33/0x50 print_address_description.constprop.0+0x27/0x310 print_report+0x3e/0x70 kasan_report+0xae/0xe0 svc_tcp_listen_data_ready+0x1cf/0x1f0 [sunrpc] tcp_data_queue+0x9f4/0x20e0 tcp_rcv_established+0x666/0x1f60 tcp_v4_do_rcv+0x51c/0x850 tcp_v4_rcv+0x23fc/0x2e80 ip_protocol_deliver_rcu+0x62/0x300 ip_local_deliver_finish+0x267/0x350 ip_local_deliver+0x18b/0x2d0 ip_rcv+0x2fb/0x370 __netif_receive_skb_one_core+0x166/0x1b0 process_backlog+0x24c/0x5e0 __napi_poll+0xa2/0x500 net_rx_action+0x854/0xc90 __do_softirq+0x1bb/0x5de do_softirq+0xcb/0x100 </IRQ> <TASK> ... </TASK> Allocated by task 102371: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x7b/0x90 svc_setup_socket+0x52/0x4f0 [sunrpc] svc_addsock+0x20d/0x400 [sunrpc] __write_ports_addfd+0x209/0x390 [nfsd] write_ports+0x239/0x2c0 [nfsd] nfsctl_transaction_write+0xac/0x110 [nfsd] vfs_write+0x1c3/0xae0 ksys_write+0xed/0x1c0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Freed by task 102551: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x50 __kasan_slab_free+0x106/0x190 __kmem_cache_free+0x133/0x270 svc_xprt_free+0x1e2/0x350 [sunrpc] svc_xprt_destroy_all+0x25a/0x440 [sunrpc] nfsd_put+0x125/0x240 [nfsd] nfsd_svc+0x2cb/0x3c0 [nfsd] write_threads+0x1ac/0x2a0 [nfsd] nfsctl_transaction_write+0xac/0x110 [nfsd] vfs_write+0x1c3/0xae0 ksys_write+0xed/0x1c0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Fix the UAF by simply doing nothing in svc_tcp_listen_data_ready() if state != TCP_LISTEN, that will avoid dereferencing svsk for all child socket. Link: https://lore.kernel.org/lkml/20230507091131.23540-1-dinghui@sangfor.com.cn/ Fixes: fa9251afc33c ("SUNRPC: Call the default socket callbacks instead of open coding") Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-06-02Merge tag 'nfsd-6.4-2' of ↵Linus Torvalds1-18/+6
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Two minor bug fixes * tag 'nfsd-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: fix double fget() bug in __write_ports_addfd() nfsd: make a copy of struct iattr before calling notify_change
2023-05-31nfsd: fix double fget() bug in __write_ports_addfd()Dan Carpenter1-18/+6
The bug here is that you cannot rely on getting the same socket from multiple calls to fget() because userspace can influence that. This is a kind of double fetch bug. The fix is to delete the svc_alien_sock() function and instead do the checking inside the svc_addsock() function. Fixes: 3064639423c4 ("nfsd: check passed socket's net matches NFSd superblock's one") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-05-22Merge tag 'nfs-for-6.4-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds1-3/+2
Pull NFS client fixes from Anna Schumaker: "Stable Fix: - Don't change task->tk_status after the call to rpc_exit_task Other Bugfixes: - Convert kmap_atomic() to kmap_local_folio() - Fix a potential double free with READ_PLUS" * tag 'nfs-for-6.4-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4.2: Fix a potential double free with READ_PLUS SUNRPC: Don't change task->tk_status after the call to rpc_exit_task NFS: Convert kmap_atomic() to kmap_local_folio()
2023-05-19SUNRPC: Don't change task->tk_status after the call to rpc_exit_taskTrond Myklebust1-3/+2
Some calls to rpc_exit_task() may deliberately change the value of task->tk_status, for instance because it gets checked by the RPC call's rpc_release() callback. That makes it wrong to reset the value to task->tk_rpc_status. In particular this causes a bug where the rpc_call_done() callback tries to fail over a set of pNFS/flexfiles writes to a different IP address, but the reset of task->tk_status causes nfs_commit_release_pages() to immediately mark the file as having a fatal error. Fixes: 39494194f93b ("SUNRPC: Fix races with rpc_killall_tasks()") Cc: stable@vger.kernel.org # 6.1.x Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-05-17Merge tag 'nfsd-6.4-1' of ↵Linus Torvalds6-46/+66
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - A collection of minor bug fixes * tag 'nfsd-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: Remove open coding of string copy SUNRPC: Fix trace_svc_register() call site SUNRPC: always free ctxt when freeing deferred request SUNRPC: double free xprt_ctxt while still in use SUNRPC: Fix error handling in svc_setup_socket() SUNRPC: Fix encoding of accepted but unsuccessful RPC replies lockd: define nlm_port_min,max with CONFIG_SYSCTL nfsd: define exports_proc_ops with CONFIG_PROC_FS SUNRPC: Avoid relying on crypto API to derive CBC-CTS output IV
2023-05-14SUNRPC: Fix trace_svc_register() call siteChuck Lever1-1/+1
The trace event recorded incorrect values for the registered family, protocol, and port because the arguments are in the wrong order. Fixes: b4af59328c25 ("SUNRPC: Trace server-side rpcbind registration events") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-05-14SUNRPC: always free ctxt when freeing deferred requestNeilBrown4-27/+39
Since the ->xprt_ctxt pointer was added to svc_deferred_req, it has not been sufficient to use kfree() to free a deferred request. We may need to free the ctxt as well. As freeing the ctxt is all that ->xpo_release_rqst() does, we repurpose it to explicit do that even when the ctxt is not stored in an rqst. So we now have ->xpo_release_ctxt() which is given an xprt and a ctxt, which may have been taken either from an rqst or from a dreq. The caller is now responsible for clearing that pointer after the call to ->xpo_release_ctxt. We also clear dr->xprt_ctxt when the ctxt is moved into a new rqst when revisiting a deferred request. This ensures there is only one pointer to the ctxt, so the risk of double freeing in future is reduced. The new code in svc_xprt_release which releases both the ctxt and any rq_deferred depends on this. Fixes: 773f91b2cf3f ("SUNRPC: Fix NFSD's request deferral on RDMA transports") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-05-14SUNRPC: double free xprt_ctxt while still in useNeilBrown1-1/+2
When an RPC request is deferred, the rq_xprt_ctxt pointer is moved out of the svc_rqst into the svc_deferred_req. When the deferred request is revisited, the pointer is copied into the new svc_rqst - and also remains in the svc_deferred_req. In the (rare?) case that the request is deferred a second time, the old svc_deferred_req is reused - it still has all the correct content. However in that case the rq_xprt_ctxt pointer is NOT cleared so that when xpo_release_xprt is called, the ctxt is freed (UDP) or possible added to a free list (RDMA). When the deferred request is revisited for a second time, it will reference this ctxt which may be invalid, and the free the object a second time which is likely to oops. So change svc_defer() to *always* clear rq_xprt_ctxt, and assert that the value is now stored in the svc_deferred_req. Fixes: 773f91b2cf3f ("SUNRPC: Fix NFSD's request deferral on RDMA transports") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-05-06SUNRPC: Fix error handling in svc_setup_socket()Chuck Lever1-12/+4
Dan points out that sock_alloc_file() releases @sock on error, but so do all of svc_setup_socket's callers, resulting in a double- release if sock_alloc_file() returns an error. Rather than allocating a struct file for all new sockets, allocate one only for sockets created during a TCP accept. For the moment, those are the only ones that will ever be used with RPC-with-TLS. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: ae0d77708aae ("SUNRPC: Ensure server-side sockets have a sock->file") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-05-03SUNRPC: Fix encoding of accepted but unsuccessful RPC repliesChuck Lever1-6/+11
Jiri Slaby says: > I bisected to this ... as it breaks nfs3-only servers in 6.3. > I.e. /etc/nfs.conf containing: > [nfsd] > vers4=no > > The client sees: > mount("10.0.2.15:/tmp", "/mnt", "nfs", 0, "vers=4.2,addr=10.0.2.15,clientad"...) = -1 EIO (Input/output error) > write(2, "mount.nfs: mount system call fai"..., 45 > mount.nfs: mount system call failed for /mnt > > And the kernel says: > nfs4_discover_server_trunking unhandled error -5. Exiting with error EIO Reported-by: Jiri Slaby <jirislaby@kernel.org> Link: https://bugzilla.suse.com/show_bug.cgi?id=1210995 Fixes: 4bcf0343e8a6 ("SUNRPC: Set rq_accept_statp inside ->accept methods") Tested-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-05-02SUNRPC: Avoid relying on crypto API to derive CBC-CTS output IVArd Biesheuvel1-0/+10
Scott reports SUNRPC self-test failures regarding the output IV on arm64 when using the SIMD accelerated implementation of AES in CBC mode with ciphertext stealing ("cts(cbc(aes))" in crypto API speak). These failures are the result of the fact that, while RFC 3962 does specify what the output IV should be and includes test vectors for it, the general concept of an output IV is poorly defined, and generally, not specified by the various algorithms implemented by the crypto API. Only algorithms that support transparent chaining (e.g., CBC mode on a block boundary) have requirements on the output IV, but ciphertext stealing (CTS) is fundamentally about how to encapsulate CBC in a way where the length of the entire message may not be an integral multiple of the cipher block size, and the concept of an output IV does not exist here because it has no defined purpose past the end of the message. The generic CTS template takes advantage of this chaining capability of the CBC implementations, and as a result, happens to return an output IV, simply because it passes its IV buffer directly to the encapsulated CBC implementation, which operates on full blocks only, and always returns an IV. This output IV happens to match how RFC 3962 defines it, even though the CTS template itself does not contain any output IV logic whatsoever, and, for this reason, lacks any test vectors that exercise this accidental output IV generation. The arm64 SIMD implementation of cts(cbc(aes)) does not use the generic CTS template at all, but instead, implements the CBC mode and ciphertext stealing directly, and therefore does not encapsule a CBC implementation that returns an output IV in the same way. The arm64 SIMD implementation complies with the specification and passes all internal tests, but when invoked by the SUNRPC code, fails to produce the expected output IV and causes its selftests to fail. Given that the output IV is defined as the penultimate block (where the final block may smaller than the block size), we can quite easily derive it in the caller by copying the appropriate slice of ciphertext after encryption. Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Jeff Layton <jlayton@kernel.org> Reported-by: Scott Mayhew <smayhew@redhat.com> Tested-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-04-29Merge tag 'nfsd-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds6-69/+243
Pull nfsd updates from Chuck Lever: "The big ticket item for this release is that support for RPC-with-TLS [RFC 9289] has been added to the Linux NFS server. The goal is to provide a simple-to-deploy, low-overhead in-transit confidentiality and peer authentication mechanism. It can supplement NFS Kerberos and it can protect the use of legacy non-cryptographic user authentication flavors such as AUTH_SYS. The TLS Record protocol is handled entirely by kTLS, meaning it can use either software encryption or offload encryption to smart NICs. Aside from that, work continues on improving NFSD's open file cache. Among the many clean-ups in that area is a patch to convert the rhashtable to use the list-hashing version of that data structure" * tag 'nfsd-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (31 commits) NFSD: Handle new xprtsec= export option SUNRPC: Support TLS handshake in the server-side TCP socket code NFSD: Clean up xattr memory allocation flags NFSD: Fix problem of COMMIT and NFS4ERR_DELAY in infinite loop SUNRPC: Clear rq_xid when receiving a new RPC Call SUNRPC: Recognize control messages in server-side TCP socket code SUNRPC: Be even lazier about releasing pages SUNRPC: Convert svc_xprt_release() to the release_pages() API SUNRPC: Relocate svc_free_res_pages() nfsd: simplify the delayed disposal list code SUNRPC: Ignore return value of ->xpo_sendto SUNRPC: Ensure server-side sockets have a sock->file NFSD: Watch for rq_pages bounds checking errors in nfsd_splice_actor() sunrpc: simplify two-level sysctl registration for svcrdma_parm_table SUNRPC: return proper error from get_expiry() lockd: add some client-side tracepoints nfs: move nfs_fhandle_hash to common include file lockd: server should unlock lock if client rejects the grant lockd: fix races in client GRANTED_MSG wait logic lockd: move struct nlm_wait to lockd.h ...
2023-04-29Merge tag 'nfs-for-6.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds5-52/+18
Pull NFS client updates from Anna Schumaker: "New Features: - Convert the readdir path to use folios - Convert the NFS fscache code to use netfs Bugfixes and Cleanups: - Always send a RECLAIM_COMPLETE after establishing a lease - Simplify sysctl registrations and other cleanups - Handle out-of-order write replies on NFS v3 - Have sunrpc call_bind_status use standard hard/soft task semantics - Other minor cleanups" * tag 'nfs-for-6.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4.2: Rework scratch handling for READ_PLUS NFS: Cleanup unused rpc_clnt variable NFS: set varaiable nfs_netfs_debug_id storage-class-specifier to static SUNRPC: remove the maximum number of retries in call_bind_status NFS: Convert readdir page array functions to use a folio NFS: Convert the readdir array-of-pages into an array-of-folios NFSv3: handle out-of-order write replies. NFS: Remove fscache specific trace points and NFS_INO_FSCACHE bit NFS: Remove all NFSIOS_FSCACHE counters due to conversion to netfs API NFS: Convert buffered read paths to use netfs when fscache is enabled NFS: Configure support for netfs when NFS fscache is configured NFS: Rename readpage_async_filler to nfs_read_add_folio sunrpc: simplify one-level sysctl registration for debug_table sunrpc: move sunrpc_table and proc routines above sunrpc: simplify one-level sysctl registration for xs_tunables_table sunrpc: simplify one-level sysctl registration for xr_tunables_table nfs: simplify two-level sysctl registration for nfs_cb_sysctls nfs: simplify two-level sysctl registration for nfs4_cb_sysctls lockd: simplify two-level sysctl registration for nlm_sysctls NFSv4.1: Always send a RECLAIM_COMPLETE after establishing lease
2023-04-28SUNRPC: Support TLS handshake in the server-side TCP socket codeChuck Lever3-6/+111
This patch adds opportunitistic RPC-with-TLS to the Linux in-kernel NFS server. If the client requests RPC-with-TLS and the user space handshake agent is running, the server will set up a TLS session. There are no policy settings yet. For example, the server cannot yet require the use of RPC-with-TLS to access its data. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-04-28SUNRPC: Clear rq_xid when receiving a new RPC CallChuck Lever1-0/+2
This is an eye-catcher for tracepoints that record the XID: it means svc_rqst() has not received a full RPC Call with an XID yet. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-04-28SUNRPC: Recognize control messages in server-side TCP socket codeChuck Lever1-2/+46
To support kTLS, the server-side TCP socket receive path needs to watch for CMSGs. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-04-28SUNRPC: Be even lazier about releasing pagesChuck Lever2-3/+3
A single RPC transaction that touches only a couple of pages means rq_pvec will not be even close to full in svc_xpt_release(). This is a common case. Instead, just leave the pages in rq_pvec until it is completely full. This improves the efficiency of the batch release mechanism on workloads that involve small RPC messages. The rq_pvec is also fully emptied just before thread exit. Reviewed-by: Calum Mackay <calum.mackay@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-04-26SUNRPC: Convert svc_xprt_release() to the release_pages() APIChuck Lever1-6/+5
Instead of invoking put_page() one-at-a-time, pass the "response" portion of rq_pages directly to release_pages() to reduce the number of times each nfsd thread invokes a page allocator API. Since svc_xprt_release() is not invoked while a client is waiting for an RPC Reply, this is not expected to directly impact mean request latencies on a lightly or moderately loaded server. However as workload intensity increases, I expect somewhat better scalability: the same number of server threads should be able to handle more work. Reviewed-by: Calum Mackay <calum.mackay@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-04-26SUNRPC: Relocate svc_free_res_pages()Chuck Lever2-1/+20
Clean-up: There doesn't seem to be a reason why this function is stuck in a header. One thing it prevents is the convenient addition of tracing. Moving it to a source file also makes the rq_respages clean-up logic easier to find. Reviewed-by: Calum Mackay <calum.mackay@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-04-26SUNRPC: Ignore return value of ->xpo_sendtoChuck Lever2-18/+16
Clean up: All callers of svc_process() ignore its return value, so svc_process() can safely be converted to return void. Ditto for svc_send(). The return value of ->xpo_sendto() is now used only as part of a trace event. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-04-26SUNRPC: Ensure server-side sockets have a sock->fileChuck Lever1-7/+18
The TLS handshake upcall mechanism requires a non-NULL sock->file on the socket it hands to user space. svc_sock_free() already releases sock->file properly if one exists. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-04-26NFSD: Watch for rq_pages bounds checking errors in nfsd_splice_actor()Chuck Lever1-1/+14
There have been several bugs over the years where the NFSD splice actor has attempted to write outside the rq_pages array. This is a "should never happen" condition, but if for some reason the pipe splice actor should attempt to walk past the end of rq_pages, it needs to terminate the READ operation to prevent corruption of the pointer addresses in the fields just beyond the array. A server crash is thus prevented. Since the code is not behaving, the READ operation returns -EIO to the client. None of the READ payload data can be trusted if the splice actor isn't operating as expected. Suggested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2023-04-26sunrpc: simplify two-level sysctl registration for svcrdma_parm_tableLuis Chamberlain1-19/+2
There is no need to declare two tables to just create directories, this can be easily be done with a prefix path with register_sysctl(). Simplify this registration. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-04-26SUNRPC: return proper error from get_expiry()NeilBrown2-12/+12
The get_expiry() function currently returns a timestamp, and uses the special return value of 0 to indicate an error. Unfortunately this causes a problem when 0 is the correct return value. On a system with no RTC it is possible that the boot time will be seen to be "3". When exportfs probes to see if a particular filesystem supports NFS export it tries to cache information with an expiry time of "3". The intention is for this to be "long in the past". Even with no RTC it will not be far in the future (at most a second or two) so this is harmless. But if the boot time happens to have been calculated to be "3", then get_expiry will fail incorrectly as it converts the number to "seconds since bootime" - 0. To avoid this problem we change get_expiry() to report the error quite separately from the expiry time. The error is now the return value. The expiry time is reported through a by-reference parameter. Reported-by: Jerry Zhang <jerry@skydio.com> Tested-by: Jerry Zhang <jerry@skydio.com> Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>