path: root/include/linux/sunrpc
Age | Commit message | Author | Files | Lines
2016-05-29 | Merge branch 'hash' of git://ftp.sciencehorizons.net/linux | Linus Torvalds | 1 | -31/+9
Pull string hash improvements from George Spelvin: "This series does several related things: - Makes the dcache hash (fs/namei.c) useful for general kernel use. (Thanks to Bruce for noticing the zero-length corner case) - Converts the string hashes in <linux/sunrpc/svcauth.h> to use the above. - Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two 32-bit multiplies will do well enough. - Rids the world of the bad hash multipliers in hash_32. This finishes the job started in commit 689de1d6ca95 ("Minimal fix-up of bad hashing behavior of hash_64()") The vast majority of Linux architectures have hardware support for 32x32-bit multiply and so derive no benefit from "simplified" multipliers. The few processors that do not (68000, h8/300 and some models of Microblaze) have arch-specific implementations added. Those patches are last in the series. - Overhauls the dcache hash mixing. The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion. Replaced with a much more careful design that's simultaneously faster and better. (My own invention, as there was nothing suitable in the literature I could find. Comments welcome!) - Modify the hash_name() loop to skip the initial HASH_MIX(). This would let us salt the hash if we ever wanted to. - Sort out partial_name_hash(). The hash function is declared as using a long state, even though it's truncated to 32 bits at the end and the extra internal state contributes nothing to the result. And some callers do odd things: - fs/hfs/string.c only allocates 32 bits of state - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes - Modify bytemask_from_count to handle inputs of 1..sizeof(long) rather than 0..sizeof(long)-1. This would simplify users other than full_name_hash" Special thanks to Bruce Fields for testing and finding bugs in v1. (I learned some humbling lessons about "obviously correct" code.)
On the arch-specific front, the m68k assembly has been tested in a standalone test harness, I've been in contact with the Microblaze maintainers who mostly don't care, as the hardware multiplier is never omitted in real-world applications, and I haven't heard anything from the H8/300 world" * 'hash' of git://ftp.sciencehorizons.net/linux: h8300: Add <asm/hash.h> microblaze: Add <asm/hash.h> m68k: Add <asm/hash.h> <linux/hash.h>: Add support for architecture-specific functions fs/namei.c: Improve dcache hash function Eliminate bad hash multipliers from hash_32() and hash_64() Change hash_64() return value to 32 bits <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string() fs/namei.c: Add hashlen_string() function Pull out string hash to <linux/stringhash.h>
2016-05-28 | <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string() | George Spelvin | 1 | -31/+9
Finally, the first use of previous two patches: eliminate the separate ad-hoc string hash functions in the sunrpc code. Now hash_str() is a wrapper around hash_string(), and hash_mem() is likewise a wrapper around full_name_hash(). Note that sunrpc code *does* call hash_mem() with a zero length, which is why the previous patch needed to handle that in full_name_hash(). (Thanks, Bruce, for finding that!) This also eliminates the only caller of hash_long which asks for more than 32 bits of output. The comment about the quality of hashlen_string() and full_name_hash() is jumping the gun by a few patches; they aren't very impressive now, but will be improved greatly later in the series. Signed-off-by: George Spelvin <linux@sciencehorizons.net> Tested-by: J. Bruce Fields <bfields@redhat.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: linux-nfs@vger.kernel.org
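The wrapper relationship described in this commit can be sketched in userspace C. This is only an illustration: `sketch_full_name_hash()` is a hypothetical stand-in (32-bit FNV-1a), not the kernel's hash, and keeping the high-order `bits` bits of the 32-bit result is an assumption about how the wrappers reduce the hash to a table index. The point it shows is that a zero-length buffer must be a valid input.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for full_name_hash(): 32-bit FNV-1a.
 * This is NOT the kernel's algorithm; it only needs to accept len == 0. */
static uint32_t sketch_full_name_hash(const char *buf, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)buf[i];
		h *= 16777619u;
	}
	return h;
}

/* Assumed reduction: keep the top `bits` bits (caller ensures 1 <= bits <= 32). */
static uint32_t sketch_hash_mem(const char *buf, size_t len, int bits)
{
	return sketch_full_name_hash(buf, len) >> (32 - bits);
}

/* The string variant is just the memory variant applied to strlen() bytes. */
static uint32_t sketch_hash_str(const char *name, int bits)
{
	return sketch_hash_mem(name, strlen(name), bits);
}
```

With an 8-bit table, `sketch_hash_mem("", 0, 8)` still yields a valid bucket index below 256, which is the zero-length case the commit message calls out.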
2016-05-26 | Merge tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs | Linus Torvalds | 5 | -6/+30
Pull NFS client updates from Anna Schumaker: "Highlights include: Features: - Add support for the NFS v4.2 COPY operation - Add support for NFS/RDMA over IPv6 Bugfixes and cleanups: - Avoid race that crashes nfs_init_commit() - Fix oops in callback path - Fix LOCK/OPEN race when unlinking an open file - Choose correct stateids when using delegations in setattr, read and write - Don't send empty SETATTR after OPEN_CREATE - xprtrdma: Prevent server from writing a reply into memory client has released - xprtrdma: Support using Read list and Reply chunk in one RPC call" * tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (61 commits) pnfs: pnfs_update_layout needs to consider if strict iomode checking is on nfs/flexfiles: Use the layout segment for reading unless it a IOMODE_RW and reading is disabled nfs/flexfiles: Helper function to detect FF_FLAGS_NO_READ_IO nfs: avoid race that crashes nfs_init_commit NFS: checking for NULL instead of IS_ERR() in nfs_commit_file() pnfs: make pnfs_layout_process more robust pnfs: rework LAYOUTGET retry handling pnfs: lift retry logic from send_layoutget to pnfs_update_layout pnfs: fix bad error handling in send_layoutget flexfiles: add kerneldoc header to nfs4_ff_layout_prepare_ds flexfiles: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTED pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args pnfs: keep track of the return sequence number in pnfs_layout_hdr pnfs: record sequence in pnfs_layout_segment when it's created pnfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit set pNFS/flexfiles: When initing reads or writes, we might have to retry connecting to DSes pNFS/flexfiles: When checking for available DSes, conditionally check for MDS io pNFS/flexfile: Fix erroneous fall back to read/write through the MDS NFS: Reclaim writes via writepage are opportunistic NFSv4: Use the right stateid for delegations in setattr, read and write ...
2016-05-17 | SUNRPC: Ensure get_rpccred() and put_rpccred() can take NULL arguments | Trond Myklebust | 1 | -1/+2
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 | xprtrdma: Bound the inline threshold values | Chuck Lever | 1 | -1/+3
Currently the sysctls that allow setting the inline threshold allow any value to be set. Small values only make the transport run slower. The default 1KB setting is as low as is reasonable. And the logic that decides how to divide a Send buffer between RPC-over-RDMA header and RPC message assumes (but does not check) that the lower bound is not crazy (say, 57 bytes). Send and receive buffers share a page with some control information. Values larger than about 3KB can't be supported, currently. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
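A bounded setter of the kind this commit describes might look like the sketch below. The names are hypothetical, the lower bound is the 1KB default mentioned above, and the upper bound of 3072 is only an assumed stand-in for "about 3KB"; the kernel's actual limit may differ.

```c
#include <errno.h>

#define SKETCH_MIN_INLINE 1024u /* the 1KB default named in the commit message */
#define SKETCH_MAX_INLINE 3072u /* assumption: stand-in for "about 3KB" */

/* Reject out-of-range requests and leave the current threshold untouched. */
static int sketch_set_inline_threshold(unsigned int *threshold,
				       unsigned int requested)
{
	if (requested < SKETCH_MIN_INLINE || requested > SKETCH_MAX_INLINE)
		return -EINVAL;
	*threshold = requested;
	return 0;
}
```

Rejecting rather than silently clamping is a design choice in this sketch; it makes a bad setting such as 57 bytes visible to whoever wrote it.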
2016-05-17 | sunrpc: Advertise maximum backchannel payload size | Chuck Lever | 2 | -0/+2
RPC-over-RDMA transports have a limit on how large a backward direction (backchannel) RPC message can be. Ensure that the NFSv4.x CREATE_SESSION operation advertises this limit to servers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17 | sunrpc: Update RPCBIND_MAXNETIDLEN | Chuck Lever | 1 | -2/+2
Commit 176e21ee2ec8 ("SUNRPC: Support for RPC over AF_LOCAL transports") added a 5-character netid, but did not bump RPCBIND_MAXNETIDLEN from 4 to 5. Fixes: 176e21ee2ec8 ("SUNRPC: Support for RPC over AF_LOCAL ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
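The off-by-one is easy to see with a length check over a few netid strings. In the sketch below the table is illustrative rather than the kernel's full list, and "local" is assumed to be the 5-character netid the message refers to; the `SKETCH_` names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_OLD_MAXNETIDLEN 4 /* limit before the fix */
#define SKETCH_NEW_MAXNETIDLEN 5 /* limit after the fix */

/* Illustrative subset of netids; "local" is the assumed AF_LOCAL netid. */
static const char *const sketch_netids[] = {
	"udp", "tcp", "udp6", "tcp6", "rdma", "local",
};

/* Length of the longest netid in the table, excluding the NUL terminator. */
static size_t sketch_longest_netid(void)
{
	size_t longest = 0;

	for (size_t i = 0; i < sizeof(sketch_netids) / sizeof(sketch_netids[0]); i++) {
		size_t len = strlen(sketch_netids[i]);

		if (len > longest)
			longest = len;
	}
	return longest;
}
```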
2016-05-13 | svcrdma: Generalize svc_rdma_xdr_decode_req() | Chuck Lever | 1 | -1/+1
Clean up: Pass in just the piece of the svc_rqst that is needed here. While we're in the area, add an informative documenting comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-05-09 | sunrpc: add a get_rpccred_rcu inline | Jeff Layton | 1 | -0/+18
Sometimes we might have a RCU managed credential pointer and don't want to use locking to handle it. Add a function that will take a reference to the cred iff the refcount is not already zero. Callers can dereference the pointer under the rcu_read_lock and use that function to take a reference only if the cred is not on its way to destruction. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
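The "take a reference iff the refcount is not already zero" idea can be sketched with C11 atomics. This is a userspace illustration with hypothetical names, not the kernel helper itself; it shows the compare-and-swap loop that refuses to resurrect an object whose count has already dropped to zero.

```c
#include <stdatomic.h>
#include <stddef.h>

struct sketch_cred {
	atomic_int refcount;
};

/* Returns cred with its refcount bumped, or NULL if the count was already 0
 * (meaning the object is on its way to destruction). */
static struct sketch_cred *sketch_get_cred_rcu(struct sketch_cred *cred)
{
	int old;

	if (cred == NULL)
		return NULL;

	old = atomic_load(&cred->refcount);
	while (old != 0) {
		/* On failure, `old` is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&cred->refcount, &old, old + 1))
			return cred;
	}
	return NULL;
}
```

In the kernel the caller would hold rcu_read_lock() around the pointer dereference so the memory stays valid while the check runs; the sketch omits that part.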
2016-05-09 | sunrpc: add rpc_lookup_generic_cred | Weston Andros Adamson | 1 | -0/+1
Add function rpc_lookup_generic_cred, which allows lookups of a generic credential that's not current_cred(). [jlayton: add gfp_t parm] Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09 | sunrpc: plumb gfp_t parm into crcreate operation | Jeff Layton | 1 | -2/+2
We need to be able to call the generic_cred creator from different contexts. Add a gfp_t parm to the crcreate operation and to rpcauth_lookup_credcache. For now, we just push the gfp_t parms up one level to the *_lookup_cred functions. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-04-04 | mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage | Kirill A. Shutemov | 1 | -1/+1
Mostly direct substitution with occasional adjustment or removing outdated comments. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-24 | Merge tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux | Linus Torvalds | 2 | -7/+20
Pull nfsd updates from Bruce Fields: "Various bugfixes, a RDMA update from Chuck Lever, and support for a new pnfs layout type from Christoph Hellwig. The new layout type is a variant of the block layout which uses SCSI features to offer improved fencing and device identification. (Also: note this pull request also includes the client side of SCSI layout, with Trond's permission.)" * tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux: sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race nfsd: recover: fix memory leak nfsd: fix deadlock secinfo+readdir compound nfsd4: resfh unused in nfsd4_secinfo svcrdma: Use new CQ API for RPC-over-RDMA server send CQs svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs svcrdma: Remove close_out exit path svcrdma: Hook up the logic to return ERR_CHUNK svcrdma: Use correct XID in error replies svcrdma: Make RDMA_ERROR messages work rpcrdma: Add RPCRDMA_HDRLEN_ERR svcrdma: svc_rdma_post_recv() should close connection on error svcrdma: Close connection when a send error occurs nfsd: Lower NFSv4.1 callback message size limit svcrdma: Do not send Write chunk XDR pad with inline content svcrdma: Do not write xdr_buf::tail in a Write chunk svcrdma: Find client-provided write and reply chunks once per reply nfsd: Update NFS server comments related to RDMA support nfsd: Fix a memory leak when meeting unsupported state_protect_how4 nfsd4: fix bad bounds checking
2016-03-22 | Merge tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds | 6 | -35/+120
Pull NFS client updates from Trond Myklebust: "Highlights include: Features: - Add support for multiple NFSv4.1 callbacks in flight - Initial patchset for RPC multipath support - Adapt RPC/RDMA to use the new completion queue API Bugfixes and cleanups: - nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed - Cleanups to remove nfs_inode_dio_wait and nfs4_file_fsync - Fix RPC/RDMA credit accounting - Properly handle RDMA_ERROR replies - xprtrdma: Do not wait if ib_post_send() fails - xprtrdma: Segment head and tail XDR buffers on page boundaries - xprtrdma cleanups for dprintk, physical_op_map and unused macros" * tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (35 commits) nfs/blocklayout: make sure making a aligned read request nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed nfs: remove nfs_inode_dio_wait nfs: remove nfs4_file_fsync xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs xprtrdma: Use an anonymous union in struct rpcrdma_mw xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs xprtrdma: Serialize credit accounting again xprtrdma: Properly handle RDMA_ERROR replies rpcrdma: Add RPCRDMA_HDRLEN_ERR xprtrdma: Do not wait if ib_post_send() fails xprtrdma: Segment head and tail XDR buffers on page boundaries xprtrdma: Clean up dprintk format string containing a newline xprtrdma: Clean up physical_op_map() xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro NFS add callback_ops to nfs4_proc_bind_conn_to_session_callback pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3 SUNRPC: Allow addition of new transports to a struct rpc_clnt NFSv4.1: nfs4_proc_bind_conn_to_session must iterate over all connections SUNRPC: Make NFS swap work with multipath ...
2016-03-16 | Merge tag 'nfs-rdma-4.6-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma | Trond Myklebust | 2 | -7/+7
NFS: NFSoRDMA Client Side Changes These patches include several bugfixes and cleanups for the NFSoRDMA client. This includes bugfixes for NFS v4.1, proper RDMA_ERROR handling, and fixes from the recent workqueue switchover. These patches also switch xprtrdma to use the new CQ API. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> * tag 'nfs-rdma-4.6-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (787 commits) xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs xprtrdma: Use an anonymous union in struct rpcrdma_mw xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs xprtrdma: Serialize credit accounting again xprtrdma: Properly handle RDMA_ERROR replies rpcrdma: Add RPCRDMA_HDRLEN_ERR xprtrdma: Do not wait if ib_post_send() fails xprtrdma: Segment head and tail XDR buffers on page boundaries xprtrdma: Clean up dprintk format string containing a newline xprtrdma: Clean up physical_op_map() xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro
2016-03-14 | xprtrdma: Properly handle RDMA_ERROR replies | Chuck Lever | 1 | -5/+6
These are shorter than RPCRDMA_HDRLEN_MIN, and they need to complete the waiting RPC. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 | rpcrdma: Add RPCRDMA_HDRLEN_ERR | Chuck Lever | 1 | -0/+1
Error headers are shorter than either RDMA_MSG or RDMA_NOMSG. Since HDRLEN_MIN is already used in several other places that would be annoying to change, add RPCRDMA_HDRLEN_ERR for the one or two spots where the shorter length is needed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-14 | xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro | Chuck Lever | 1 | -2/+0
Fixes: b3221d6a53c4 ('xprtrdma: Remove logic that constructs...') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-03-02 | svcrdma: Use new CQ API for RPC-over-RDMA server send CQs | Chuck Lever | 1 | -2/+7
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2498e ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. This new API also aims each completion at a function that is specific to the WR's opcode. Thus the ctxt->wr_op field and the switch in process_context is replaced by a set of methods that handle each completion type. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. The server's rdma_stat_sq_poll and rdma_stat_sq_prod metrics are no longer updated. As a clean up, the cq_event_handler, the dto_tasklet, and all associated locking is removed, as they are no longer referenced or used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
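The shift from a central opcode switch to per-completion methods can be sketched as follows. The types here are hypothetical simplifications, not the IB core's structures: each work request embeds a small completion entry carrying a function pointer, and the core only has to call it.

```c
#include <stddef.h>

/* Hypothetical completion entry: just a pointer to the handler. */
struct sketch_cqe {
	void (*done)(struct sketch_cqe *cqe, int status);
};

/* A consumer context embeds the completion entry. */
struct sketch_send_ctxt {
	int completed; /* 0 = pending, 1 = ok, -1 = failed */
	struct sketch_cqe cqe;
};

#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Handler specific to send completions; recovers its context from the cqe. */
static void sketch_send_done(struct sketch_cqe *cqe, int status)
{
	struct sketch_send_ctxt *ctxt =
		sketch_container_of(cqe, struct sketch_send_ctxt, cqe);

	ctxt->completed = (status == 0) ? 1 : -1;
}

/* What the "core" does per completion: no switch on opcode, just dispatch. */
static void sketch_core_complete(struct sketch_cqe *cqe, int status)
{
	cqe->done(cqe, status);
}
```

Because the handler is reached through the embedded entry, the core never needs to know which consumer or operation type it is completing.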
2016-03-02 | svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs | Chuck Lever | 1 | -1/+1
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2498e ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. svcrdma receive completions no longer use the dto_tasklet. Each polled Receive WC is now handled individually in soft IRQ context. The server transport's rdma_stat_rq_poll and rdma_stat_rq_prod metrics are no longer updated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-02 | svcrdma: Use correct XID in error replies | Chuck Lever | 1 | -1/+1
When constructing an error reply, svc_rdma_xdr_encode_error() needs to view the client's request message so it can get the failing request's XID. svc_rdma_xdr_decode_req() is supposed to return a pointer to the client's request header. But if it fails to decode the client's message (and thus an error reply is needed) it does not return the pointer. The server then sends a bogus XID in the error reply. Instead, unconditionally generate the pointer to the client's header in svc_rdma_recvfrom(), and pass that pointer to both functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-02 | svcrdma: Make RDMA_ERROR messages work | Chuck Lever | 1 | -2/+2
Fix several issues with svc_rdma_send_error(): - Post a receive buffer to replace the one that was consumed by the incoming request - Posting a send should use DMA_TO_DEVICE, not DMA_FROM_DEVICE - No need to put_page _and_ free pages in svc_rdma_put_context - Make sure the sge is set up completely in case the error path goes through svc_rdma_unmap_dma() - Replace the use of ENOSYS, which has a reserved meaning Related fixes in svc_rdma_recvfrom(): - Don't leak the ctxt associated with the incoming request - Don't close the connection after sending an error reply - Let svc_rdma_send_error() figure out the right header error code As a last clean up, move svc_rdma_send_error() to svc_rdma_sendto.c with other similar functions. There is some common logic in these functions that could someday be combined to reduce code duplication. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-02 | rpcrdma: Add RPCRDMA_HDRLEN_ERR | Chuck Lever | 1 | -0/+1
Error headers are shorter than either RDMA_MSG or RDMA_NOMSG. Since HDRLEN_MIN is already used in several other places that would be annoying to change, add RPCRDMA_HDRLEN_ERR for the one or two spots where the shorter length is needed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-02 | svcrdma: svc_rdma_post_recv() should close connection on error | Chuck Lever | 1 | -0/+1
Clean up: Most svc_rdma_post_recv() call sites close the transport connection when a receive cannot be posted. Wrap that in a common helper. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-02 | nfsd: Lower NFSv4.1 callback message size limit | Chuck Lever | 1 | -0/+7
The maximum size of a backchannel message on RPC-over-RDMA depends on the connection's inline threshold. Today that threshold is typically 1024 bytes, making the maximum message size 996 bytes. The Linux server's CREATE_SESSION operation checks that the size of callback Calls can be as large as 1044 bytes, to accommodate RPCSEC_GSS. Thus CREATE_SESSION fails if a client advertises the true message size maximum of 996 bytes. But the server's backchannel currently does not support RPCSEC_GSS. The actual maximum size it needs is much smaller. It is safe to reduce the limit to enable NFSv4.1 on RDMA backchannel operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
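The 996-byte figure follows from subtracting the transport header from the inline threshold. The sketch below assumes the minimal RPC-over-RDMA header is seven 4-byte XDR words (28 bytes), which is consistent with 1024 - 996 in the message above; the `SKETCH_` names are hypothetical.

```c
#define SKETCH_XDR_UNIT   4u
/* Assumption: XID, version, credits, procedure, and three empty chunk lists. */
#define SKETCH_HDRLEN_MIN (7u * SKETCH_XDR_UNIT)

/* Largest RPC message that fits inline next to the minimal transport header. */
static unsigned int sketch_max_backchannel_payload(unsigned int inline_threshold)
{
	return inline_threshold - SKETCH_HDRLEN_MIN;
}
```

With the typical 1024-byte threshold this gives 996, which is below the 1044 bytes the server previously insisted on, hence the CREATE_SESSION failure described above.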
2016-03-02 | svcrdma: Do not send Write chunk XDR pad with inline content | Chuck Lever | 1 | -1/+1
The NFS server's XDR encoders add an XDR pad for content in the xdr_buf page list at the beginning of the xdr_buf's tail buffer. On RDMA transports, Write chunks are sent separately and without an XDR pad. If a Write chunk is being sent, strip off the pad in the tail buffer so that inline content following the Write chunk remains XDR-aligned when it is sent to the client. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
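XDR rounds variable-length data up to a multiple of 4 bytes, so the pad in question is 0 to 3 bytes long. The sketch below, with hypothetical names, computes that pad and how many leading tail bytes would be skipped when the page-list content travels as a Write chunk.

```c
/* Number of pad bytes XDR adds after `len` bytes of opaque data (0..3). */
static unsigned int sketch_xdr_pad(unsigned int len)
{
	return (4u - (len & 3u)) & 3u;
}

/* Offset into the tail buffer at which inline content should start.
 * If the page-list payload goes out as a Write chunk, the pad that the
 * encoder placed at the start of the tail is skipped; otherwise it stays. */
static unsigned int sketch_inline_tail_offset(unsigned int page_len,
					      int sent_as_write_chunk)
{
	return sent_as_write_chunk ? sketch_xdr_pad(page_len) : 0u;
}
```

For a 5-byte payload the pad is 3 bytes, so the inline content would begin 3 bytes into the tail when a Write chunk is used and at offset 0 otherwise.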
2016-02-06 | SUNRPC: Allow addition of new transports to a struct rpc_clnt | Trond Myklebust | 1 | -0/+11
Add a function to allow creation and addition of a new transport to an existing rpc_clnt Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-06 | SUNRPC: Add a helper to apply a function to all the rpc_clnt's transports | Trond Myklebust | 1 | -0/+4
Add a helper for tasks that require us to apply a function to all the transports in an rpc_clnt. An example of a usecase would be BIND_CONN_TO_SESSION, where we want to send one RPC call down each transport. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
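A helper of this shape is essentially a for-each over the client's transport list with a caller-supplied callback. The sketch below uses hypothetical, simplified types (a plain singly linked list, no locking or RCU), and stopping at the first non-zero return is a choice made for the sketch rather than a statement about the kernel helper.

```c
#include <stddef.h>

struct sketch_xprt {
	int id;
	struct sketch_xprt *next;
};

struct sketch_clnt {
	struct sketch_xprt *xprts; /* head of the transport list */
};

typedef int (*sketch_xprt_fn)(struct sketch_clnt *clnt,
			      struct sketch_xprt *xprt, void *data);

/* Call fn once per transport; stop early if fn reports an error. */
static int sketch_for_each_xprt(struct sketch_clnt *clnt,
				sketch_xprt_fn fn, void *data)
{
	for (struct sketch_xprt *x = clnt->xprts; x != NULL; x = x->next) {
		int ret = fn(clnt, x, data);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Example callback: counts the calls, standing in for "send one RPC". */
static int sketch_count_call(struct sketch_clnt *clnt,
			     struct sketch_xprt *xprt, void *data)
{
	(void)clnt;
	(void)xprt;
	++*(int *)data;
	return 0;
}
```

A BIND_CONN_TO_SESSION-style user would pass a callback that issues one call per transport instead of incrementing a counter.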
2016-02-06 | SUNRPC: Allow caller to specify the transport to use | Trond Myklebust | 1 | -0/+1
This is needed in order to allow the NFSv4.1 backchannel and BIND_CONN_TO_SESSION function to work. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-06 | SUNRPC: Use the multipath iterator to assign a transport to each task | Trond Myklebust | 1 | -0/+2
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-06 | SUNRPC: Make rpc_clnt store the multipath iterators | Trond Myklebust | 1 | -0/+2
This is a pre-patch for the RPC multipath code. It sets up the storage in struct rpc_clnt for the multipath code. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-06 | SUNRPC: Add a structure to track multiple transports | Trond Myklebust | 2 | -0/+74
In order to support multipathing/trunking we will need the ability to track multiple transports. This patch sets up a basic structure for doing so. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-01 | SUNRPC: Make freeing of struct xprt rcu-safe | Trond Myklebust | 1 | -0/+1
Have it call kfree_rcu() to ensure that we can use it on rcu-protected lists. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-01 | SUNRPC: Uninline xprt_get(); It isn't performance critical. | Trond Myklebust | 1 | -13/+3
Also allow callers to pass NULL arguments to xprt_get() and xprt_put(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-01 | SUNRPC: Reorder rpc_task to put waitqueue related info in same cachelines | Trond Myklebust | 1 | -14/+15
Try to group all the data required by the waitqueues, their timers and timer callbacks into the same cachelines for performance. With this reordering, "pahole" reports the following structure on x86_64:

    struct rpc_task {
            atomic_t                     tk_count;             /*     0     4 */
            int                          tk_status;            /*     4     4 */
            struct list_head             tk_task;              /*     8    16 */
            void                         (*tk_callback)(struct rpc_task *); /*    24     8 */
            void                         (*tk_action)(struct rpc_task *);   /*    32     8 */
            long unsigned int            tk_timeout;           /*    40     8 */
            long unsigned int            tk_runstate;          /*    48     8 */
            struct rpc_wait_queue *      tk_waitqueue;         /*    56     8 */
            /* --- cacheline 1 boundary (64 bytes) --- */
            union {
                    struct work_struct   tk_work;              /*          64 */
                    struct rpc_wait      tk_wait;              /*          56 */
            } u;                                               /*    64    64 */
            /* --- cacheline 2 boundary (128 bytes) --- */
            struct rpc_message           tk_msg;               /*   128    32 */
            void *                       tk_calldata;          /*   160     8 */
            const struct rpc_call_ops *  tk_ops;               /*   168     8 */
            struct rpc_clnt *            tk_client;            /*   176     8 */
            struct rpc_rqst *            tk_rqstp;             /*   184     8 */
            /* --- cacheline 3 boundary (192 bytes) --- */
            struct workqueue_struct *    tk_workqueue;         /*   192     8 */
            ktime_t                      tk_start;             /*   200     8 */
            pid_t                        tk_owner;             /*   208     4 */
            short unsigned int           tk_flags;             /*   212     2 */
            short unsigned int           tk_timeouts;          /*   214     2 */
            short unsigned int           tk_pid;               /*   216     2 */
            unsigned char                tk_priority:2;        /*   218: 6  1 */
            unsigned char                tk_garb_retry:2;      /*   218: 4  1 */
            unsigned char                tk_cred_retry:2;      /*   218: 2  1 */
            unsigned char                tk_rebind_retry:2;    /*   218: 0  1 */

            /* size: 224, cachelines: 4, members: 24 */
            /* padding: 5 */
            /* last cacheline: 32 bytes */
    };

whereas on i386, it reports everything fitting into the 1st cacheline. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-02-01 | SUNRPC: Remove unused function rpc_task_reset_client | Trond Myklebust | 1 | -1/+0
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-27 | sunrpc: Use skcipher and ahash/shash | Herbert Xu | 1 | -16/+16
This patch replaces uses of blkcipher with skcipher and the long obsolete hash interface with either shash (for non-SG users) and ahash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-01-24 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma | Linus Torvalds | 1 | -9/+30
Pull rdma updates from Doug Ledford: "Initial roundup of 4.5 merge window patches - Remove usage of ib_query_device and instead store attributes in ib_device struct - Move iopoll out of block and into lib, rename to irqpoll, and use in several places in the rdma stack as our new completion queue polling library mechanism. Update the other block drivers that already used iopoll to use the new mechanism too. - Replace the per-entry GID table locks with a single GID table lock - IPoIB multicast cleanup - Cleanups to the IB MR facility - Add support for 64bit extended IB counters - Fix for netlink oops while parsing RDMA nl messages - RoCEv2 support for the core IB code - mlx4 RoCEv2 support - mlx5 RoCEv2 support - Cross Channel support for mlx5 - Timestamp support for mlx5 - Atomic support for mlx5 - Raw QP support for mlx5 - MAINTAINERS update for mlx4/mlx5 - Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates - Add support for remote invalidate to the iSER driver (pushed through the RDMA tree due to dependencies, acknowledged by nab) - Update to NFSoRDMA (pushed through the RDMA tree due to dependencies, acknowledged by Bruce)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits) IB/mlx5: Unify CQ create flags check IB/mlx5: Expose Raw Packet QP to user space consumers {IB, net}/mlx5: Move the modify QP operation table to mlx5_ib IB/mlx5: Support setting Ethernet priority for Raw Packet QPs IB/mlx5: Add Raw Packet QP query functionality IB/mlx5: Add create and destroy functionality for Raw Packet QP IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types IB/mlx5: Allocate a Transport Domain for each ucontext net/mlx5_core: Warn on unsupported events of QP/RQ/SQ net/mlx5_core: Add RQ and SQ event handling net/mlx5_core: Export transport objects IB/mlx5: Expose CQE version to user-space IB/mlx5: Add CQE version 1 support to user QPs and SRQs IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext IB/sa: Fix netlink local service GFP crash IB/srpt: Remove redundant wc array IB/qib: Improve ipoib UD performance IB/mlx4: Advertise RoCE v2 support IB/mlx4: Create and use another QP1 for RoCEv2 IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers ...
2016-01-19 | svc_rdma: use local_dma_lkey | Christoph Hellwig | 1 | -2/+0
We now always have a per-PD local_dma_lkey available. Make use of that fact in svc_rdma and stop registering our own MR. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 | svcrdma: Add class for RDMA backwards direction transport | Chuck Lever | 1 | -0/+5
To support the server-side of an NFSv4.1 backchannel on RDMA connections, add a transport class that enables backward direction messages on an existing forward channel connection. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 | svcrdma: Define maximum number of backchannel requests | Chuck Lever | 1 | -3/+10
Extra resources for handling backchannel requests have to be pre-allocated when a transport instance is created. Set up additional fields in svcxprt_rdma to track these resources. The max_requests fields are elements of the RPC-over-RDMA protocol, so they should be u32. To ensure that unsigned arithmetic is used everywhere, some other fields in the svcxprt_rdma struct are updated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 | svcrdma: Make map_xdr non-static | Chuck Lever | 1 | -0/+2
Pre-requisite to use map_xdr in the backchannel code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 | svcrdma: Add gfp flags to svc_rdma_post_recv() | Chuck Lever | 1 | -1/+1
svc_rdma_post_recv() allocates pages for receive buffers on-demand. It uses GFP_KERNEL so the allocator tries hard, and may sleep. But I'm about to add a call to svc_rdma_post_recv() from a function that may not sleep. Since all svc_rdma_post_recv() call sites can tolerate its failure, allow it to fail if the page allocator returns nothing. Longer term, receive buffers, being a finite resource per-connection, should be pre-allocated and re-used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 | svcrdma: Remove unused req_map and ctxt kmem_caches | Chuck Lever | 1 | -0/+1
Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 | svcrdma: Improve allocation of struct svc_rdma_req_map | Chuck Lever | 1 | -2/+6
To ensure this allocation cannot fail and will not sleep, pre-allocate the req_map structures per-connection. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19 | svcrdma: Improve allocation of struct svc_rdma_op_ctxt | Chuck Lever | 1 | -1/+5
When the maximum payload size of NFS READ and WRITE was increased by commit cc9a903d915c ("svcrdma: Change maximum server payload back to RPCSVC_MAXPAYLOAD"), the size of struct svc_rdma_op_ctxt increased to over 6KB (on x86_64). That makes allocating one of these from a kmem_cache more likely to fail in situations when system memory is exhausted. Since I'm about to add a caller where this allocation must always work _and_ it cannot sleep, pre-allocate ctxts for each connection. Another motivation for this change is that NFSv4.x servers are required by specification not to drop NFS requests. Pre-allocating memory resources reduces the likelihood of a drop. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
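Per-connection pre-allocation usually reduces to a free list that is filled once at setup time, so the hot path never calls the allocator and never sleeps. The sketch below is a single-threaded illustration with hypothetical names and an arbitrary pool size; real code would protect the list with a lock and size the pool from the connection's limits.

```c
#include <stddef.h>

#define SKETCH_POOL_SIZE 4 /* arbitrary, for illustration only */

struct sketch_ctxt {
	struct sketch_ctxt *next_free;
};

struct sketch_conn {
	struct sketch_ctxt ctxts[SKETCH_POOL_SIZE]; /* allocated with the connection */
	struct sketch_ctxt *free_list;
};

/* Done once at connection setup: thread every ctxt onto the free list. */
static void sketch_conn_init(struct sketch_conn *conn)
{
	conn->free_list = NULL;
	for (size_t i = 0; i < SKETCH_POOL_SIZE; i++) {
		conn->ctxts[i].next_free = conn->free_list;
		conn->free_list = &conn->ctxts[i];
	}
}

/* Pop a ctxt; no allocator call, so this cannot sleep. NULL if exhausted. */
static struct sketch_ctxt *sketch_get_ctxt(struct sketch_conn *conn)
{
	struct sketch_ctxt *ctxt = conn->free_list;

	if (ctxt != NULL)
		conn->free_list = ctxt->next_free;
	return ctxt;
}

/* Push a ctxt back for reuse. */
static void sketch_put_ctxt(struct sketch_conn *conn, struct sketch_ctxt *ctxt)
{
	ctxt->next_free = conn->free_list;
	conn->free_list = ctxt;
}
```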
2015-12-23  sunrpc: Add a function to close temporary transports immediately  Scott Mayhew  1  -0/+1
Add a function svc_age_temp_xprts_now() to immediately close temporary transports whose xpt_local matches the address passed in server_addr, instead of waiting for them to be closed by the timer function.

The function is intended to be used by notifier_blocks that will be added to nfsd and lockd and that will run when an IP address is deleted.

This will eliminate the ACK storms and client hangs that occur in HA-NFS configurations where nfsd and lockd are left running on the cluster nodes all the time and the NFS 'service' is migrated back and forth within a short timeframe.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
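The matching step described above (close every temporary transport whose local address equals the deleted address) can be sketched as follows. This is a simplified userspace illustration, not the sunrpc code: `struct temp_xprt` and `close_xprts_on_addr()` are hypothetical, addresses are reduced to a 32-bit IPv4 value, and "closing" is modelled as setting a flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified temporary transport (one per client connection). */
struct temp_xprt {
    uint32_t local_addr;     /* server-side IPv4 address of the connection */
    bool closed;             /* set once the transport has been shut down */
    struct temp_xprt *next;
};

/*
 * Walk the list and close every still-open transport whose local address
 * matches server_addr. Returns the number of transports closed, so a
 * caller (for example an address-removal notifier) can log or assert on it.
 */
size_t close_xprts_on_addr(struct temp_xprt *head, uint32_t server_addr)
{
    size_t nr_closed = 0;
    struct temp_xprt *x;

    for (x = head; x; x = x->next) {
        if (!x->closed && x->local_addr == server_addr) {
            x->closed = true;
            nr_closed++;
        }
    }
    return nr_closed;
}
```

Transports bound to other addresses are left untouched, which is the point of matching on the local address rather than closing everything.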
2015-11-24  nfsd4: fix gss-proxy 4.1 mounts for some AD principals  J. Bruce Fields  1  -1/+8
The principal name on a gss cred is used to set up the NFSv4.0 callback, which has to have a client principal name to authenticate to.

That code wants the name to be in the form servicetype@hostname. rpc.svcgssd passes down such names (and passes down no principal name at all in the case the principal isn't a service principal).

gss-proxy always passes down the principal name, and passes it down in the form servicetype/hostname@REALM. So we've been munging the name gss-proxy passes down into the format the NFSv4.0 callback code expects, or throwing away the name if we can't.

Since the introduction of the MACH_CRED enforcement in NFSv4.1, we've also been using the principal name to verify that certain operations are done as the same principal as was used on the original EXCHANGE_ID call.

For that application, the original name passed down by gss-proxy is also useful.

Lack of that name in some cases was causing some kerberized NFSv4.1 mount failures in an Active Directory environment.

This fix only works in the gss-proxy case. The fix for legacy rpc.svcgssd would be more involved, and rpc.svcgssd already has other problems in the AD case.

Reported-and-tested-by: James Ralston <ralston@pobox.com>
Acked-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
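The name munging described above, from servicetype/hostname@REALM to servicetype@hostname, can be sketched in userspace C. This is an illustration only, not the nfsd code: `principal_to_service_name()` is a hypothetical helper, and it writes into a separate buffer so that the raw name stays available to the caller, in the spirit of keeping the original name for the MACH_CRED check.

```c
#include <stddef.h>
#include <string.h>

/*
 * Convert "servicetype/hostname@REALM" into "servicetype@hostname".
 * The raw string is not modified. Returns 0 on success, or -1 if the
 * name is not a service principal (no realm, no '/' before the realm,
 * or an empty component) or if the output buffer is too small.
 */
int principal_to_service_name(const char *raw, char *out, size_t outlen)
{
    const char *at = strchr(raw, '@');   /* start of the realm part */
    const char *slash;
    size_t svc_len, host_len;

    if (!at)
        return -1;
    /* The service/host delimiter must appear before the realm. */
    slash = memchr(raw, '/', (size_t)(at - raw));
    if (!slash)
        return -1;

    svc_len = (size_t)(slash - raw);
    host_len = (size_t)(at - slash) - 1;
    if (svc_len == 0 || host_len == 0)
        return -1;
    /* service + '@' + host + terminating NUL */
    if (svc_len + 1 + host_len + 1 > outlen)
        return -1;

    memcpy(out, raw, svc_len);
    out[svc_len] = '@';
    memcpy(out + svc_len + 1, slash + 1, host_len);
    out[svc_len + 1 + host_len] = '\0';
    return 0;
}
```

For example, the input `nfs/host.example.com@EXAMPLE.COM` yields `nfs@host.example.com`, while a user principal such as `alice@EXAMPLE.COM` is rejected because it has no service/host delimiter.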
2015-11-12  Merge tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linux  Linus Torvalds  1  -5/+11
Pull nfsd updates from Bruce Fields:
"Apologies for coming a little late in the merge window. Fortunately this is another fairly quiet one:

Mainly smaller bugfixes and cleanup. We're still finding some bugs from the breakup of the big NFSv4 state lock in 3.17 -- thanks especially to Andrew Elble and Jeff Layton for tracking down some of the remaining races"

* tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linux:
  svcrpc: document lack of some memory barriers
  nfsd: fix race with open / open upgrade stateids
  nfsd: eliminate sending duplicate and repeated delegations
  nfsd: remove recurring workqueue job to clean DRC
  SUNRPC: drop stale comment in svc_setup_socket()
  nfsd: ensure that seqid morphing operations are atomic wrt to copies
  nfsd: serialize layout stateid morphing operations
  nfsd: improve client_has_state to check for unused openowners
  nfsd: fix clid_inuse on mount with security change
  sunrpc/cache: make cache flushing more reliable.
  nfsd: move include of state.h from trace.c to trace.h
  sunrpc: avoid warning in gss_key_timeout
  lockd: get rid of reference-counted NSM RPC clients
  SUNRPC: Use MSG_SENDPAGE_NOTLAST when calling sendpage()
  lockd: create NSM handles per net namespace
  nfsd: switch unsigned char flags in svc_fh to bools
  nfsd: move svc_fh->fh_maxsize to just after fh_handle
  nfsd: drop null test before destroy functions
  nfsd: serialize state seqid morphing operations
2015-11-10  Merge tag 'nfs-for-4.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs  Linus Torvalds  4  -1/+21
Pull NFS client updates from Trond Myklebust:
"Highlights include:

New features:
 - RDMA client backchannel from Chuck
 - Support for NFSv4.2 file CLONE using the btrfs ioctl

Bugfixes + cleanups:
 - Move socket data receive out of the bottom halves and into a workqueue
 - Refactor NFSv4 error handling so synchronous and asynchronous RPC handles errors identically.
 - Fix a panic when blocks or object layouts reads return a bad data length
 - Fix nfsroot so it can handle a 1024 byte long path.
 - Fix bad usage of page offset in bl_read_pagelist
 - Various NFSv4 callback cleanups+fixes
 - Fix GETATTR bitmap verification
 - Support hexadecimal number for sunrpc debug sysctl files"

* tag 'nfs-for-4.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (53 commits)
  Sunrpc: Supports hexadecimal number for sysctl files of sunrpc debug
  nfs: Fix GETATTR bitmap verification
  nfs: Remove unused xdr page offsets in getacl/setacl arguments
  fs/nfs: remove unnecessary new_valid_dev check
  SUNRPC: fix variable type
  NFS: Enable client side NFSv4.1 backchannel to use other transports
  pNFS/flexfiles: Add support for FF_FLAGS_NO_IO_THRU_MDS
  pNFS/flexfiles: When mirrored, retry failed reads by switching mirrors
  SUNRPC: Remove the TCP-only restriction in bc_svc_process()
  svcrdma: Add backward direction service for RPC/RDMA transport
  xprtrdma: Handle incoming backward direction RPC calls
  xprtrdma: Add support for sending backward direction RPC replies
  xprtrdma: Pre-allocate Work Requests for backchannel
  xprtrdma: Pre-allocate backward rpc_rqst and send/receive buffers
  SUNRPC: Abstract backchannel operations
  xprtrdma: Saving IRQs no longer needed for rb_lock
  xprtrdma: Remove reply tasklet
  xprtrdma: Use workqueue to process RPC/RDMA replies
  xprtrdma: Replace send and receive arrays
  xprtrdma: Refactor reply handler error handling
  ...