summaryrefslogtreecommitdiff
path: root/net/sunrpc
AgeCommit message (Collapse)AuthorFilesLines
2024-01-08svcrdma: De-duplicate completion ID initialization helpersChuck Lever3-22/+1
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Move the svc_rdma_cc_init() callChuck Lever2-3/+9
Now that the chunk_ctxt for Reads is no longer dynamically allocated it can be initialized once for the life of the object that contains it (struct svc_rdma_recv_ctxt). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Remove struct svc_rdma_read_infoChuck Lever1-29/+0
The remaining fields of struct svc_rdma_read_info are no longer referenced. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Update the synopsis of svc_rdma_read_special()Chuck Lever1-10/+9
Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_read_special() can use that recv_ctxt to derive the read_info rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Update the synopsis of svc_rdma_read_call_chunk()Chuck Lever1-13/+11
Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_read_call_chunk() can use that recv_ctxt to derive the read_info rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Update synopsis of svc_rdma_read_multiple_chunks()Chuck Lever1-10/+9
Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_read_multiple_chunks() can use that recv_ctxt to derive the read_info rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Update synopsis of svc_rdma_copy_inline_range()Chuck Lever1-8/+9
Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_copy_inline_range() can use that recv_ctxt to derive the read_info rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Update the synopsis of svc_rdma_read_data_item()Chuck Lever1-9/+8
Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_build_read_data_item() can use that recv_ctxt to derive that information rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Update synopsis of svc_rdma_read_chunk_range()Chuck Lever1-12/+12
Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_build_read_chunk_range() can use that recv_ctxt to derive that information rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Update synopsis of svc_rdma_build_read_chunk()Chuck Lever1-11/+10
Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_build_read_chunk() can use that recv_ctxt to derive that information rather than the other way around. This removes another usage of the ri_readctxt field, enabling its removal in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Update synopsis of svc_rdma_build_read_segment()Chuck Lever1-8/+9
Since the RDMA Read I/O state is now contained in the recv_ctxt, svc_rdma_build_read_segment() can use the recv_ctxt to derive that information rather than the other way around. This removes one usage of the ri_readctxt field, enabling its removal in a subsequent patch. At the same time, the use of ri_rqst can similarly be replaced with a passed-in function parameter. Start with build_read_segment() because it is a common utility function at the bottom of the Read chunk path. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Move read_info::ri_pageoff into struct svc_rdma_recv_ctxtChuck Lever1-16/+15
Further clean up: move the starting byte offset field into svc_rdma_recv_ctxt. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Move svc_rdma_read_info::ri_pageno to struct svc_rdma_recv_ctxtChuck Lever1-12/+9
Further clean up: move the page index field into svc_rdma_recv_ctxt. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Start moving fields out of struct svc_rdma_read_infoChuck Lever1-31/+26
Since the request's svc_rdma_recv_ctxt will stay around for the duration of the RDMA Read operation, the contents of struct svc_rdma_read_info can reside in the request's svc_rdma_recv_ctxt rather than being allocated separately. This will eventually save a call to kmalloc() in a hot path. Start this clean-up by moving the Read chunk's svc_rdma_chunk_ctxt. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Move struct svc_rdma_chunk_ctxt to svc_rdma.hChuck Lever1-18/+0
Prepare for nestling these into the send and recv ctxts so they no longer have to be allocated dynamically. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Remove the svc_rdma_chunk_ctxt::cc_rdma fieldChuck Lever1-2/+0
In every instance, the pointer address in that field is now available by other means. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Pass a pointer to the transport to svc_rdma_cc_release()Chuck Lever1-6/+7
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Explicitly pass the transport to svc_rdma_post_chunk_ctxt()Chuck Lever1-5/+5
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Explicitly pass the transport into Read chunk I/O pathsChuck Lever1-22/+36
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Explicitly pass the transport into Write chunk I/O pathsChuck Lever1-1/+4
Enable the eventual removal of the svc_rdma_chunk_ctxt::cc_rdma field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Acquire the svcxprt_rdma pointer from the CQ contextChuck Lever1-2/+3
Enable the removal of the svc_rdma_chunk_ctxt::cc_rdma field in a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Reduce size of struct svc_rdma_rw_ctxtChuck Lever1-4/+8
SG_CHUNK_SIZE is 128, making struct svc_rdma_rw_ctxt + the first SGL array more than 4200 bytes in length, pushing the memory allocation well into order 1. Even so, the RDMA rw core doesn't seem to use more than max_send_sge entries in that array (typically 32 or less), so that is all wasted space. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Update some svcrdma DMA-related tracepointsChuck Lever1-5/+5
A send/recv_ctxt already records transport-related information in the cq.id, thus there is no need to record the IP addresses of the transport endpoints. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: DMA error tracepoints should report completion IDsChuck Lever1-4/+5
Update the DMA error flow tracepoints to report the completion ID of the failing context. This ties the wait/failure to a particular operation or request, which is more useful than knowing only the failing transport. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: SQ error tracepoints should report completion IDsChuck Lever2-6/+6
Update the Send Queue's error flow tracepoints to report the completion ID of the waiting or failing context. This ties the wait/failure to a particular operation or request, which is a little more useful than knowing only the transport that is about to close. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08rpcrdma: Introduce a simple cid tracepoint classChuck Lever4-4/+4
De-duplicate some code, making it easier to add new tracepoints that report only a completion ID. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Add lockdep class keys for transport locksChuck Lever1-0/+6
Two svcrdma-related transport locks can become quite contended. Collate their use and make them easy to find in /proc/lock_stat for better observability. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Clean up lockingChuck Lever1-2/+2
There's no need to protect llist_entry() with a spin lock. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Add an async version of svc_rdma_write_info_free()Chuck Lever1-1/+11
DMA unmapping can take quite some time, so it should not be handled in a single-threaded completion handler. Defer releasing write_info structs to the recently-added workqueue. With this patch, DMA unmapping can be handled in parallel, and it does not cause head-of-queue blocking of Write completions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Add an async version of svc_rdma_send_ctxt_put()Chuck Lever1-9/+25
DMA unmapping can take quite some time, so it should not be handled in a single-threaded completion handler. Defer releasing send_ctxts to the recently-added workqueue. With this patch, DMA unmapping can be handled in parallel, and it does not cause head-of-queue blocking of Send completions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Add a utility workqueue to svcrdmaChuck Lever2-8/+25
To handle work in the background, set up an UNBOUND workqueue for svcrdma. Subsequent patches will make use of it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Pre-allocate svc_rdma_recv_ctxt objectsChuck Lever1-11/+21
The original reason for allocating svc_rdma_recv_ctxt objects during Receive completion was to ensure the objects were allocated on the NUMA node closest to the underlying IB device. Since commit c5d68d25bd6b ("svcrdma: Clean up allocation of svc_rdma_recv_ctxt"), however, the device's favored node is explicitly passed to the memory allocator. To enable switching Receive completion to soft IRQ context, move memory allocation out of completion handling, since it can be costly, and it can sleep. A limited number of objects is now allocated at "accept" time. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08svcrdma: Eliminate allocation of recv_ctxt objects in backchannelChuck Lever2-21/+21
The svc_rdma_recv_ctxt free list uses a lockless list to avoid the need for a spin lock in the fast path. llist_del_first(), which is used by svc_rdma_recv_ctxt_get(), requires serialization, however, when there are multiple list producers that are unserialized. I mistakenly thought there was only one caller of svc_rdma_recv_ctxt_get() (svc_rdma_refresh_recvs()), thus explicit serialization would not be necessary. But there is another caller: svc_rdma_bc_sendto(), and these two are not serialized against each other. I haven't seen ill effects that I could directly ascribe to a lack of serialization. It's just an observation based on code audit. When DMA-mapping before sending a Reply, the passed-in struct svc_rdma_recv_ctxt is used only for its write and reply PCLs. These are currently always empty in the backchannel case. So, instead of passing a full svc_rdma_recv_ctxt object to svc_rdma_map_reply_msg(), let's pass in just the Write and Reply PCLs. This change makes it unnecessary for the backchannel to acquire a dummy svc_rdma_recv_ctxt object when sending an RPC Call. The need for svc_rdma_recv_ctxt free list serialization is now completely avoided. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08SUNRPC: Remove RQ_SPLICE_OKChuck Lever2-12/+0
This flag is no longer used. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-01-08SUNRPC: Add a server-side API for retrieving an RPC's pseudoflavorChuck Lever2-0/+22
NFSD will use this new API to determine whether nfsd_splice_read is safe to use. This avoids the need to add a dependency to NFSD for CONFIG_SUNRPC_GSS. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-12-20Merge tag 'nfsd-6.7-2' of ↵Linus Torvalds1-3/+2
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Address a few recently-introduced issues * tag 'nfsd-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Revert 5f7fc5d69f6e92ec0b38774c387f5cf7812c5806 NFSD: Revert 738401a9bd1ac34ccd5723d69640a4adbb1a4bc0 NFSD: Revert 6c41d9a9bd0298002805758216a9c44e38a8500d nfsd: hold nfsd_mutex across entire netlink operation nfsd: call nfsd_last_thread() before final nfsd_put()
2023-12-19SUNRPC: Revert 5f7fc5d69f6e92ec0b38774c387f5cf7812c5806Chuck Lever1-3/+2
Guillaume says: > I believe commit 5f7fc5d69f6e ("SUNRPC: Resupply rq_pages from > node-local memory") in Linux 6.5+ is incorrect. It passes > unconditionally rq_pool->sp_id as the NUMA node. > > While the comment in the svc_pool declaration in sunrpc/svc.h says > that sp_id is also the NUMA node id, it might not be the case if > the svc is created using svc_create_pooled(). svc_created_pooled() > can use the per-cpu pool mode therefore in this case sp_id would > be the cpu id. Fix this by reverting now. At a later point this minor optimization, and the deceptive labeling of the sp_id field, can be revisited. Reported-by: Guillaume Morin <guillaume@morinfr.org> Closes: https://lore.kernel.org/linux-nfs/ZYC9rsno8qYggVt9@bender.morinfr.org/T/#u Fixes: 5f7fc5d69f6e ("SUNRPC: Resupply rq_pages from node-local memory") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-12-16cred: get rid of CONFIG_DEBUG_CREDENTIALSJens Axboe1-3/+0
This code is rarely (never?) enabled by distros, and it hasn't caught anything in decades. Let's kill off this legacy debug code. Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-11-09Merge tag 'nfs-for-6.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds4-14/+18
Pull NFS client updates from Trond Myklebust: "Bugfixes: - SUNRPC: - re-probe the target RPC port after an ECONNRESET error - handle allocation errors from rpcb_call_async() - fix a use-after-free condition in rpc_pipefs - fix up various checks for timeouts - NFSv4.1: - Handle NFS4ERR_DELAY errors during session trunking - fix SP4_MACH_CRED protection for pnfs IO - NFSv4: - Ensure that we test all delegations when the server notifies us that it may have revoked some of them Features: - Allow knfsd processes to break out of NFS4ERR_DELAY loops when re-exporting NFSv4.x by setting appropriate values for the 'delay_retrans' module parameter - nfs: Convert nfs_symlink() to use a folio" * tag 'nfs-for-6.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: Convert nfs_symlink() to use a folio SUNRPC: Fix RPC client cleaned up the freed pipefs dentries NFSv4.1: fix SP4_MACH_CRED protection for pnfs IO SUNRPC: Add an IS_ERR() check back to where it was NFSv4.1: fix handling NFS4ERR_DELAY when testing for session trunking nfs41: drop dependency between flexfiles layout driver and NFSv3 modules NFSv4: fairly test all delegations on a SEQ4_ revocation SUNRPC: SOFTCONN tasks should time out when on the sending list SUNRPC: Force close the socket when a hard error is reported SUNRPC: Don't skip timeout checks in call_connect_status() SUNRPC: ECONNRESET might require a rebind NFSv4/pnfs: Allow layoutget to return EAGAIN for softerr mounts NFSv4: Add a parameter to limit the number of retries after NFS4ERR_DELAY
2023-11-03Merge tag 'mm-stable-2023-11-01-14-33' of ↵Linus Torvalds1-8/+12
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Kemeng Shi has contributed some compation maintenance work in the series 'Fixes and cleanups to compaction' - Joel Fernandes has a patchset ('Optimize mremap during mutual alignment within PMD') which fixes an obscure issue with mremap()'s pagetable handling during a subsequent exec(), based upon an implementation which Linus suggested - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the following patch series: mm/damon: misc fixups for documents, comments and its tracepoint mm/damon: add a tracepoint for damos apply target regions mm/damon: provide pseudo-moving sum based access rate mm/damon: implement DAMOS apply intervals mm/damon/core-test: Fix memory leaks in core-test mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval - In the series 'Do not try to access unaccepted memory' Adrian Hunter provides some fixups for the recently-added 'unaccepted memory' feature. To increase the feature's checking coverage. 'Plug a few gaps where RAM is exposed without checking if it is unaccepted memory' - In the series 'cleanups for lockless slab shrink' Qi Zheng has done some maintenance work which is preparation for the lockless slab shrinking code - Qi Zheng has redone the earlier (and reverted) attempt to make slab shrinking lockless in the series 'use refcount+RCU method to implement lockless slab shrink' - David Hildenbrand contributes some maintenance work for the rmap code in the series 'Anon rmap cleanups' - Kefeng Wang does more folio conversions and some maintenance work in the migration code. Series 'mm: migrate: more folio conversion and unification' - Matthew Wilcox has fixed an issue in the buffer_head code which was causing long stalls under some heavy memory/IO loads. Some cleanups were added on the way. Series 'Add and use bdev_getblk()' - In the series 'Use nth_page() in place of direct struct page manipulation' Zi Yan has fixed a potential issue with the direct manipulation of hugetlb page frames - In the series 'mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO' has improved our handling of gigantic pages in the hugetlb vmmemmep optimizaton code. This provides significant boot time improvements when significant amounts of gigantic pages are in use - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code rationalization and folio conversions in the hugetlb code - Yin Fengwei has improved mlock()'s handling of large folios in the series 'support large folio for mlock' - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has added statistics for memcg v1 users which are available (and useful) under memcg v2 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable) prctl so that userspace may direct the kernel to not automatically propagate the denial to child processes. The series is named 'MDWE without inheritance' - Kefeng Wang has provided the series 'mm: convert numa balancing functions to use a folio' which does what it says - In the series 'mm/ksm: add fork-exec support for prctl' Stefan Roesch makes is possible for a process to propagate KSM treatment across exec() - Huang Ying has enhanced memory tiering's calculation of memory distances. This is used to permit the dax/kmem driver to use 'high bandwidth memory' in addition to Optane Data Center Persistent Memory Modules (DCPMM). The series is named 'memory tiering: calculate abstract distance based on ACPI HMAT' - In the series 'Smart scanning mode for KSM' Stefan Roesch has optimized KSM by teaching it to retain and use some historical information from previous scans - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the series 'mm: memcg: fix tracking of pending stats updates values' - In the series 'Implement IOCTL to get and optionally clear info about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits us to atomically read-then-clear page softdirty state. This is mainly used by CRIU - Hugh Dickins contributed the series 'shmem,tmpfs: general maintenance', a bunch of relatively minor maintenance tweaks to this code - Matthew Wilcox has increased the use of the VMA lock over file-backed page faults in the series 'Handle more faults under the VMA lock'. Some rationalizations of the fault path became possible as a result - In the series 'mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap()' David Hildenbrand has implemented some cleanups and folio conversions - In the series 'various improvements to the GUP interface' Lorenzo Stoakes has simplified and improved the GUP interface with an eye to providing groundwork for future improvements - Andrey Konovalov has sent along the series 'kasan: assorted fixes and improvements' which does those things - Some page allocator maintenance work from Kemeng Shi in the series 'Two minor cleanups to break_down_buddy_pages' - In thes series 'New selftest for mm' Breno Leitao has developed another MM self test which tickles a race we had between madvise() and page faults - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups and an optimization to the core pagecache code - Nhat Pham has added memcg accounting for hugetlb memory in the series 'hugetlb memcg accounting' - Cleanups and rationalizations to the pagemap code from Lorenzo Stoakes, in the series 'Abstract vma_merge() and split_vma()' - Audra Mitchell has fixed issues in the procfs page_owner code's new timestamping feature which was causing some misbehaviours. In the series 'Fix page_owner's use of free timestamps' - Lorenzo Stoakes has fixed the handling of new mappings of sealed files in the series 'permit write-sealed memfd read-only shared mappings' - Mike Kravetz has optimized the hugetlb vmemmap optimization in the series 'Batch hugetlb vmemmap modification operations' - Some buffer_head folio conversions and cleanups from Matthew Wilcox in the series 'Finish the create_empty_buffers() transition' - As a page allocator performance optimization Huang Ying has added automatic tuning to the allocator's per-cpu-pages feature, in the series 'mm: PCP high auto-tuning' - Roman Gushchin has contributed the patchset 'mm: improve performance of accounted kernel memory allocations' which improves their performance by ~30% as measured by a micro-benchmark - folio conversions from Kefeng Wang in the series 'mm: convert page cpupid functions to folios' - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about kmemleak' - Qi Zheng has improved our handling of memoryless nodes by keeping them off the allocation fallback list. This is done in the series 'handle memoryless nodes more appropriately' - khugepaged conversions from Vishal Moola in the series 'Some khugepaged folio conversions'" [ bcachefs conflicts with the dynamically allocated shrinkers have been resolved as per Stephen Rothwell in https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/ with help from Qi Zheng. The clone3 test filtering conflict was half-arsed by yours truly ] * tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits) mm/damon/sysfs: update monitoring target regions for online input commit mm/damon/sysfs: remove requested targets when online-commit inputs selftests: add a sanity check for zswap Documentation: maple_tree: fix word spelling error mm/vmalloc: fix the unchecked dereference warning in vread_iter() zswap: export compression failure stats Documentation: ubsan: drop "the" from article title mempolicy: migration attempt to match interleave nodes mempolicy: mmap_lock is not needed while migrating folios mempolicy: alloc_pages_mpol() for NUMA policy without vma mm: add page_rmappable_folio() wrapper mempolicy: remove confusing MPOL_MF_LAZY dead code mempolicy: mpol_shared_policy_init() without pseudo-vma mempolicy trivia: use pgoff_t in shared mempolicy tree mempolicy trivia: slightly more consistent naming mempolicy trivia: delete those ancient pr_debug()s mempolicy: fix migrate_pages(2) syscall return nr_failed kernfs: drop shared NUMA mempolicy hooks hugetlbfs: drop shared NUMA mempolicy pretence mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets() ...
2023-11-03Merge tag 'v6.7-p1' of ↵Linus Torvalds2-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Add virtual-address based lskcipher interface - Optimise ahash/shash performance in light of costly indirect calls - Remove ahash alignmask attribute Algorithms: - Improve AES/XTS performance of 6-way unrolling for ppc - Remove some uses of obsolete algorithms (md4, md5, sha1) - Add FIPS 202 SHA-3 support in pkcs1pad - Add fast path for single-page messages in adiantum - Remove zlib-deflate Drivers: - Add support for S4 in meson RNG driver - Add STM32MP13x support in stm32 - Add hwrng interface support in qcom-rng - Add support for deflate algorithm in hisilicon/zip" * tag 'v6.7-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (283 commits) crypto: adiantum - flush destination page before unmapping crypto: testmgr - move pkcs1pad(rsa,sha3-*) to correct place Documentation/module-signing.txt: bring up to date module: enable automatic module signing with FIPS 202 SHA-3 crypto: asymmetric_keys - allow FIPS 202 SHA-3 signatures crypto: rsa-pkcs1pad - Add FIPS 202 SHA-3 support crypto: FIPS 202 SHA-3 register in hash info for IMA x509: Add OIDs for FIPS 202 SHA-3 hash and signatures crypto: ahash - optimize performance when wrapping shash crypto: ahash - check for shash type instead of not ahash type crypto: hash - move "ahash wrapping shash" functions to ahash.c crypto: talitos - stop using crypto_ahash::init crypto: chelsio - stop using crypto_ahash::init crypto: ahash - improve file comment crypto: ahash - remove struct ahash_request_priv crypto: ahash - remove crypto_ahash_alignmask crypto: gcm - stop using alignmask of ahash crypto: chacha20poly1305 - stop using alignmask of ahash crypto: ccm - stop using alignmask of ahash net: ipv6: stop checking crypto_ahash_alignmask ...
2023-11-01SUNRPC: Fix RPC client cleaned up the freed pipefs dentriesfelix1-1/+4
RPC client pipefs dentries cleanup is in separated rpc_remove_pipedir() workqueue,which takes care about pipefs superblock locking. In some special scenarios, when kernel frees the pipefs sb of the current client and immediately alloctes a new pipefs sb, rpc_remove_pipedir function would misjudge the existence of pipefs sb which is not the one it used to hold. As a result, the rpc_remove_pipedir would clean the released freed pipefs dentries. To fix this issue, rpc_remove_pipedir should check whether the current pipefs sb is consistent with the original pipefs sb. This error can be catched by KASAN: ========================================================= [ 250.497700] BUG: KASAN: slab-use-after-free in dget_parent+0x195/0x200 [ 250.498315] Read of size 4 at addr ffff88800a2ab804 by task kworker/0:18/106503 [ 250.500549] Workqueue: events rpc_free_client_work [ 250.501001] Call Trace: [ 250.502880] kasan_report+0xb6/0xf0 [ 250.503209] ? dget_parent+0x195/0x200 [ 250.503561] dget_parent+0x195/0x200 [ 250.503897] ? __pfx_rpc_clntdir_depopulate+0x10/0x10 [ 250.504384] rpc_rmdir_depopulate+0x1b/0x90 [ 250.504781] rpc_remove_client_dir+0xf5/0x150 [ 250.505195] rpc_free_client_work+0xe4/0x230 [ 250.505598] process_one_work+0x8ee/0x13b0 ... [ 22.039056] Allocated by task 244: [ 22.039390] kasan_save_stack+0x22/0x50 [ 22.039758] kasan_set_track+0x25/0x30 [ 22.040109] __kasan_slab_alloc+0x59/0x70 [ 22.040487] kmem_cache_alloc_lru+0xf0/0x240 [ 22.040889] __d_alloc+0x31/0x8e0 [ 22.041207] d_alloc+0x44/0x1f0 [ 22.041514] __rpc_lookup_create_exclusive+0x11c/0x140 [ 22.041987] rpc_mkdir_populate.constprop.0+0x5f/0x110 [ 22.042459] rpc_create_client_dir+0x34/0x150 [ 22.042874] rpc_setup_pipedir_sb+0x102/0x1c0 [ 22.043284] rpc_client_register+0x136/0x4e0 [ 22.043689] rpc_new_client+0x911/0x1020 [ 22.044057] rpc_create_xprt+0xcb/0x370 [ 22.044417] rpc_create+0x36b/0x6c0 ... [ 22.049524] Freed by task 0: [ 22.049803] kasan_save_stack+0x22/0x50 [ 22.050165] kasan_set_track+0x25/0x30 [ 22.050520] kasan_save_free_info+0x2b/0x50 [ 22.050921] __kasan_slab_free+0x10e/0x1a0 [ 22.051306] kmem_cache_free+0xa5/0x390 [ 22.051667] rcu_core+0x62c/0x1930 [ 22.051995] __do_softirq+0x165/0x52a [ 22.052347] [ 22.052503] Last potentially related work creation: [ 22.052952] kasan_save_stack+0x22/0x50 [ 22.053313] __kasan_record_aux_stack+0x8e/0xa0 [ 22.053739] __call_rcu_common.constprop.0+0x6b/0x8b0 [ 22.054209] dentry_free+0xb2/0x140 [ 22.054540] __dentry_kill+0x3be/0x540 [ 22.054900] shrink_dentry_list+0x199/0x510 [ 22.055293] shrink_dcache_parent+0x190/0x240 [ 22.055703] do_one_tree+0x11/0x40 [ 22.056028] shrink_dcache_for_umount+0x61/0x140 [ 22.056461] generic_shutdown_super+0x70/0x590 [ 22.056879] kill_anon_super+0x3a/0x60 [ 22.057234] rpc_kill_sb+0x121/0x200 Fixes: 0157d021d23a ("SUNRPC: handle RPC client pipefs dentries by network namespace aware routines") Signed-off-by: felix <fuzhen5@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-11-01SUNRPC: Add an IS_ERR() check back to where it wasDan Carpenter1-0/+4
This IS_ERR() check was deleted during in a cleanup because, at the time, the rpcb_call_async() function could not return an error pointer. That changed in commit 25cf32ad5dba ("SUNRPC: Handle allocation failure in rpc_new_task()") and now it can return an error pointer. Put the check back. A related revert was done in commit 13bd90141804 ("Revert "SUNRPC: Remove unreachable error condition""). Fixes: 037e910b52b0 ("SUNRPC: Remove unreachable error condition in rpcb_getport_async()") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-10-30Merge tag 'nfsd-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds5-220/+193
Pull nfsd updates from Chuck Lever: "This release completes the SunRPC thread scheduler work that was begun in v6.6. The scheduler can now find an svc thread to wake in constant time and without a list walk. Thanks again to Neil Brown for this overhaul. Lorenzo Bianconi contributed infrastructure for a netlink-based NFSD control plane. The long-term plan is to provide the same functionality as found in /proc/fs/nfsd, plus some interesting additions, and then migrate the NFSD user space utilities to netlink. A long series to overhaul NFSD's NFSv4 operation encoding was applied in this release. The goals are to bring this family of encoding functions in line with the matching NFSv4 decoding functions and with the NFSv2 and NFSv3 XDR functions, preparing the way for better memory safety and maintainability. A further improvement to NFSD's write delegation support was contributed by Dai Ngo. This adds a CB_GETATTR callback, enabling the server to retrieve cached size and mtime data from clients holding write delegations. If the server can retrieve this information, it does not have to recall the delegation in some cases. The usual panoply of bug fixes and minor improvements round out this release. As always I am grateful to all contributors, reviewers, and testers" * tag 'nfsd-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (127 commits) svcrdma: Fix tracepoint printk format svcrdma: Drop connection after an RDMA Read error NFSD: clean up alloc_init_deleg() NFSD: Fix frame size warning in svc_export_parse() NFSD: Rewrite synopsis of nfsd_percpu_counters_init() nfsd: Clean up errors in nfs3proc.c nfsd: Clean up errors in nfs4state.c NFSD: Clean up errors in stats.c NFSD: simplify error paths in nfsd_svc() NFSD: Clean up nfsd4_encode_seek() NFSD: Clean up nfsd4_encode_offset_status() NFSD: Clean up nfsd4_encode_copy_notify() NFSD: Clean up nfsd4_encode_copy() NFSD: Clean up nfsd4_encode_test_stateid() NFSD: Clean up nfsd4_encode_exchange_id() NFSD: Clean up nfsd4_do_encode_secinfo() NFSD: Clean up nfsd4_encode_access() NFSD: Clean up nfsd4_encode_readdir() NFSD: Clean up nfsd4_encode_entry4() NFSD: Add an nfsd4_encode_nfs_cookie4() helper ...
2023-10-23SUNRPC: SOFTCONN tasks should time out when on the sending listTrond Myklebust1-2/+2
SOFTCONN tasks need to periodically check if the transport is still connected, so that they can time out if that is not the case. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-10-23SUNRPC: Force close the socket when a hard error is reportedTrond Myklebust1-9/+5
Fix up xs_wake_error() to close the socket when a hard error is being reported. Usually, that means an ECONNRESET was received on a connection attempt. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-10-23SUNRPC: Don't skip timeout checks in call_connect_status()Trond Myklebust1-1/+2
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-10-23SUNRPC: ECONNRESET might require a rebindTrond Myklebust1-1/+1
If connect() is returning ECONNRESET, it usually means that nothing is listening on that port. If so, a rebind might be required in order to obtain the new port on which the RPC service is listening. Fixes: fd01b2597941 ("SUNRPC: ECONNREFUSED should cause a rebind.") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-10-18sunrpc: convert to new timestamp accessorsJeff Layton1-1/+1
Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-81-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-16svcrdma: Drop connection after an RDMA Read errorChuck Lever1-1/+2
When an RPC Call message cannot be pulled from the client, that is a message loss, by definition. Close the connection to trigger the client to resend. Cc: <stable@vger.kernel.org> Reviewed-by: Tom Talpey <tom@talpey.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>