summaryrefslogtreecommitdiff
path: root/fs/nfs/nfs4proc.c
AgeCommit message (Collapse)AuthorFilesLines
2015-11-24nfs: use sliding delay when LAYOUTGET gets NFS4ERR_DELAYJeff Layton1-1/+1
When LAYOUTGET gets NFS4ERR_DELAY, we currently will wait 15s before retrying the call. That is a _very_ long time, so add a timeout value to struct nfs4_layoutget and pass nfs4_async_handle_error a pointer to it. This allows the RPC engine to use a sliding delay window, instead of a 15s delay. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-14Merge branch 'for-linus-3' of ↵Linus Torvalds1-14/+20
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs xattr cleanups from Al Viro. * 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: f2fs: xattr simplifications squashfs: xattr simplifications 9p: xattr simplifications xattr handlers: Pass handler to operations instead of flags jffs2: Add missing capability check for listing trusted xattrs hfsplus: Remove unused xattr handler list operations ubifs: Remove unused security xattr handler vfs: Fix the posix_acl_xattr_list return value vfs: Check attribute names in posix acl xattr handers
2015-11-14xattr handlers: Pass handler to operations instead of flagsAndreas Gruenbacher1-14/+20
The xattr_handler operations are currently all passed a file system specific flags value which the operations can use to disambiguate between different handlers; some file systems use that to distinguish the xattr namespace, for example. In some oprations, it would be useful to also have access to the handler prefix. To allow that, pass a pointer to the handler to operations instead of the flags value alone. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-10Merge tag 'nfs-for-4.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-91/+91
Pull NFS client updates from Trond Myklebust: "Highlights include: New features: - RDMA client backchannel from Chuck - Support for NFSv4.2 file CLONE using the btrfs ioctl Bugfixes + cleanups: - Move socket data receive out of the bottom halves and into a workqueue - Refactor NFSv4 error handling so synchronous and asynchronous RPC handles errors identically. - Fix a panic when blocks or object layouts reads return a bad data length - Fix nfsroot so it can handle a 1024 byte long path. - Fix bad usage of page offset in bl_read_pagelist - Various NFSv4 callback cleanups+fixes - Fix GETATTR bitmap verification - Support hexadecimal number for sunrpc debug sysctl files" * tag 'nfs-for-4.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (53 commits) Sunrpc: Supports hexadecimal number for sysctl files of sunrpc debug nfs: Fix GETATTR bitmap verification nfs: Remove unused xdr page offsets in getacl/setacl arguments fs/nfs: remove unnecessary new_valid_dev check SUNRPC: fix variable type NFS: Enable client side NFSv4.1 backchannel to use other transports pNFS/flexfiles: Add support for FF_FLAGS_NO_IO_THRU_MDS pNFS/flexfiles: When mirrored, retry failed reads by switching mirrors SUNRPC: Remove the TCP-only restriction in bc_svc_process() svcrdma: Add backward direction service for RPC/RDMA transport xprtrdma: Handle incoming backward direction RPC calls xprtrdma: Add support for sending backward direction RPC replies xprtrdma: Pre-allocate Work Requests for backchannel xprtrdma: Pre-allocate backward rpc_rqst and send/receive buffers SUNRPC: Abstract backchannel operations xprtrdma: Saving IRQs no longer needed for rb_lock xprtrdma: Remove reply tasklet xprtrdma: Use workqueue to process RPC/RDMA replies xprtrdma: Replace send and receive arrays xprtrdma: Refactor reply handler error handling ...
2015-11-05Merge tag 'locks-v4.4-1' of git://git.samba.org/jlayton/linuxLinus Torvalds1-12/+1
Pull file locking updates from Jeff Layton: "The largest series of changes is from Ben who offered up a set to add a new helper function for setting locks based on the type set in fl_flags. Dmitry also send in a fix for a potential race that he found with KTSAN" * tag 'locks-v4.4-1' of git://git.samba.org/jlayton/linux: locks: cleanup posix_lock_inode_wait and flock_lock_inode_wait Move locks API users to locks_lock_inode_wait() locks: introduce locks_lock_inode_wait() locks: Use more file_inode and fix a comment fs: fix data races on inode->i_flctx locks: change tracepoint for generic_add_lease
2015-11-03nfs: Remove unused xdr page offsets in getacl/setacl argumentsAndreas Gruenbacher1-3/+2
The arguments passed around for getacl and setacl xdr encoding, struct nfs_setaclargs and struct nfs_getaclargs, both contain an array of pages, an offset into the first page, and the length of the page data. The offset is unused as it is always zero; remove it. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-22Move locks API users to locks_lock_inode_wait()Benjamin Coddington1-12/+1
Instead of having users check for FL_POSIX or FL_FLOCK to call the correct locks API function, use the check within locks_lock_inode_wait(). This allows for some later cleanup. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
2015-10-21Merge branch 'nfsclone'Trond Myklebust1-1/+3
* nfsclone: nfs: add missing linux/types.h NFS: Fix an 'unused variable' complaint when #ifndef CONFIG_NFS_V4_2 nfs42: add NFS_IOC_CLONE_RANGE ioctl nfs42: respect clone_blksize nfs: get clone_blksize when probing fsinfo nfs42: add NFS_IOC_CLONE ioctl nfs42: add CLONE proc functions nfs42: add CLONE xdr functions
2015-10-15nfs: get clone_blksize when probing fsinfoPeng Tao1-0/+1
NFSv42 CLONE operation is supposed to respect it. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-15nfs42: add CLONE proc functionsPeng Tao1-1/+2
Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-08NFSv4: Unify synchronous and asynchronous error handlingTrond Myklebust1-74/+49
They now only differ in the way we handle waiting, so let's unify. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-08NFSv4: Don't use synchronous delegation recall in exception handlingTrond Myklebust1-5/+3
The code needs to be able to work from inside an asynchronous context. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-08NFSv4: nfs4_async_handle_error should take a non-const nfs_serverTrond Myklebust1-3/+3
For symmetry with the synchronous handler, and so that we can potentially handle errors such as NFS4ERR_BADNAME. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-08NFSv4: Update the delay statistics counter for synchronous delaysTrond Myklebust1-1/+2
Currently, we only do so for asynchronous delays. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-08NFSv4: Refactor NFSv4 error handlingTrond Myklebust1-7/+32
Prepare for unification of the synchronous and asynchronous error handling. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-02nfs4: reset states to use open_stateid when returning delegation voluntarilyJeff Layton1-0/+3
When the client goes to return a delegation, it should always update any nfs4_state currently set up to use that delegation stateid to instead use the open stateid. It already does do this in some cases, particularly in the state recovery code, but not currently when the delegation is voluntarily returned (e.g. in advance of a RENAME). This causes the client to try to continue using the delegation stateid after the DELEGRETURN, e.g. in LAYOUTGET. Set the nfs4_state back to using the open stateid in nfs4_open_delegation_recall, just before clearing the NFS_DELEGATED_STATE bit. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-02NFSv4: Fix a nograce recovery hangBenjamin Coddington1-0/+2
Since commit 5cae02f42793130e1387f4ec09c4d07056ce9fa5 an OPEN_CONFIRM should have a privileged sequence in the recovery case to allow nograce recovery to proceed for NFSv4.0. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-10-02NFSv4.1: nfs4_opendata_check_deleg needs to handle NFS4_OPEN_CLAIM_DELEG_CUR_FHTrond Myklebust1-2/+8
We need to warn against broken NFSv4.1 servers that try to hand out delegations in response to NFS4_OPEN_CLAIM_DELEG_CUR_FH. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-23NFS41: make close wait for layoutreturnPeng Tao1-0/+17
If we send a layoutreturn asynchronously before close, the close might reach server first and layoutreturn would fail with BADSTATEID because there is nothing keeping the layout stateid alive. Also do not pretend sending layoutreturn if we are not. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-21NFSv4.x/pnfs: Don't try to recover stateids twice in layoutgetTrond Myklebust1-11/+17
If the current open or layout stateid doesn't match the stateid used in the layoutget RPC call, then don't try to recover it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-21NFSv4: Recovery of recalled read delegations is brokenTrond Myklebust1-34/+47
When a read delegation is being recalled, and we're reclaiming the cached opens, we need to make sure that we only reclaim read-only modes. A previous attempt to do this, relied on retrieving the delegation type from the nfs4_opendata structure. Unfortunately, as Kinglong pointed out, this field can only be set when performing reboot recovery. Furthermore, if we call nfs4_open_recover(), then we end up clobbering the state->flags for all modes that we're not recovering... The fix is to have the delegation recall code pass this information to the recovery call, and then refactor the recovery code so that nfs4_open_delegation_recall() does not need to call nfs4_open_recover(). Reported-by: Kinglong Mee <kinglongmee@gmail.com> Fixes: 39f897fdbd46 ("NFSv4: When returning a delegation, don't...") Tested-by: Kinglong Mee <kinglongmee@gmail.com> Cc: NeilBrown <neilb@suse.com> Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-20NFS: Fix an infinite loop when layoutget fail with BAD_STATEIDKinglong Mee1-0/+1
If layouget fail with BAD_STATEID, restart should not using the old stateid. But, nfs client choose the layout stateid at first, and then the open stateid. To avoid the infinite loop of using bad stateid for layoutget, this patch sets the layout flag'ss NFS_LAYOUT_INVALID_STID bit to skip choosing the bad layout stateid. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-02nfs: Remove unneeded checking of the return value from scnprintfKinglong Mee1-18/+1
The return value from scnprintf always less than the buffer length. So, result >= len always false. This patch removes those checking. int vscnprintf(char *buf, size_t size, const char *fmt, va_list args) { int i; i = vsnprintf(buf, size, fmt, args); if (likely(i < size)) return i; if (size != 0) return size - 1; return 0; } Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-02nfs: Fix truncated client owner id without proto typeKinglong Mee1-1/+1
The length of "Linux NFSv4.0 " is 14, not 10. Without this patch, I get a truncated client owner id as, "Linux NFSv4.0 ::1/::1" With this patch, "Linux NFSv4.0 ::1/::1 tcp" Fixes: a319268891 ("nfs: make nfs4_init_nonuniform_client_string use a dynamically allocated buffer") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-31NFSv4.1/pnfs: Handle LAYOUTGET return values correctlyTrond Myklebust1-1/+14
According to RFC5661 section 18.43.3, if the server cannot satisfy the loga_minlength argument to LAYOUTGET, there are 2 cases: 1) If loga_minlength == 0, it returns NFS4ERR_LAYOUTTRYLATER 2) If loga_minlength != 0, it returns NFS4ERR_BADLAYOUT Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-31NFSv4.1: Fix a protocol issue with CLOSE stateidsTrond Myklebust1-5/+10
According to RFC5661 Section 18.2.4, CLOSE is supposed to return the zero stateid. This means that nfs_clear_open_stateid_locked() cannot assume that the result stateid will always match the 'other' field of the existing open stateid when trying to determine a race with a parallel OPEN. Instead, we look at the argument, and check for matches. Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-28NFS: Send attributes in OPEN request for NFS4_CREATE_EXCLUSIVE4_1Kinglong Mee1-4/+14
Client sends a SETATTR request after OPEN for updating attributes. For create file with S_ISGID is set, the S_ISGID in SETATTR will be ignored at nfs server as chmod of no PERMISSION. v3, same as v2. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-28NFS: Get suppattr_exclcreat when getting server capabilitiesKinglong Mee1-1/+13
Create file with attributs as NFS4_CREATE_EXCLUSIVE4_1 mode depends on suppattr_exclcreat attribut. v3, same as v2. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-28NFS: Make opened as optional argument in _nfs4_do_openKinglong Mee1-3/+2
Check opened, only update it when non-NULL. It's not needs define an unused value for the opened when calling _nfs4_do_open. v3, same as v2. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-20NFSv4: Enable delegated opens even when reboot recovery is pendingTrond Myklebust1-8/+19
Unlike the previous attempt, this takes into account the fact that we may be calling it from the recovery thread itself. Detect this by looking at what kind of open we're doing, and checking the state of the NFS_DELEGATION_NEED_RECLAIM if it turns out we're doing a reboot reclaim-type open. Cc: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-19Revert "NFSv4: Remove incorrect check in can_open_delegated()"Trond Myklebust1-0/+2
This reverts commit 4e379d36c050b0117b5d10048be63a44f5036115. This commit opens up a race between the recovery code and the open code. Reported-by: Olga Kornievskaia <aglo@umich.edu> Cc: stable@vger.kernel # v4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-19NFSv4.1/pnfs: Play safe w.r.t. close() races when return-on-close is setTrond Myklebust1-5/+5
If we have an OPEN_DOWNGRADE and CLOSE race with one another, we want to ensure that the layout is forgotten by the client, so that we start afresh with a new layoutget. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-19NFSv4.1/pnfs: Fix a close/delegreturn hang when return-on-close is setTrond Myklebust1-8/+4
The helper pnfs_roc() has already verified that we have no delegations, and no further open files, hence no outstanding I/O and it has marked all the return-on-close lsegs as being invalid. Furthermore, it sets the NFS_LAYOUT_RETURN bit, thus serialising the close/delegreturn with all future layoutget calls on this inode. The checks in pnfs_roc_drain() for valid layout segments are therefore redundant: those cannot exist until another layoutget completes. The other check for whether or not NFS_LAYOUT_RETURN is set, actually causes a hang, since we already know that we hold that flag. To fix, we therefore strip out all the functionality in pnfs_roc_drain() except the retrieval of the barrier state, and then rename the function accordingly. Reported-by: Christoph Hellwig <hch@infradead.org> Fixes: 5c4a79fb2b1c ("Don't prevent layoutgets when doing return-on-close") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17NFS: Remove nfs41_server_notify_{target|highest}_slotid_update()Anna Schumaker1-1/+1
All these functions do is call nfs41_ping_server() without adding anything. Let's remove them and give nfs41_ping_server() a better name instead. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17NFS: Fix a NULL pointer dereference of migration recovery ops for v4.2 clientKinglong Mee1-0/+1
---Steps to Reproduce-- <nfs-server> # cat /etc/exports /nfs/referal *(rw,insecure,no_subtree_check,no_root_squash,crossmnt) /nfs/old *(ro,insecure,subtree_check,root_squash,crossmnt) <nfs-client> # mount -t nfs nfs-server:/nfs/ /mnt/ # ll /mnt/*/ <nfs-server> # cat /etc/exports /nfs/referal *(rw,insecure,no_subtree_check,no_root_squash,crossmnt,refer=/nfs/old/@nfs-server) /nfs/old *(ro,insecure,subtree_check,root_squash,crossmnt) # service nfs restart <nfs-client> # ll /mnt/*/ --->>>>> oops here [ 5123.102925] BUG: unable to handle kernel NULL pointer dereference at (null) [ 5123.103363] IP: [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4] [ 5123.103752] PGD 587b9067 PUD 3cbf5067 PMD 0 [ 5123.104131] Oops: 0000 [#1] [ 5123.104529] Modules linked in: nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev vmw_balloon parport_pc parport i2c_piix4 shpchp auth_rpcgss nfs_acl vmw_vmci lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi serio_raw scsi_transport_spi e1000 mptscsih mptbase ata_generic pata_acpi [last unloaded: nfsd] [ 5123.105887] CPU: 0 PID: 15853 Comm: ::1-manager Tainted: G OE 4.2.0-rc6+ #214 [ 5123.106358] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014 [ 5123.106860] task: ffff88007620f300 ti: ffff88005877c000 task.ti: ffff88005877c000 [ 5123.107363] RIP: 0010:[<ffffffffa03ed38b>] [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4] [ 5123.107909] RSP: 0018:ffff88005877fdb8 EFLAGS: 00010246 [ 5123.108435] RAX: ffff880053f3bc00 RBX: ffff88006ce6c908 RCX: ffff880053a0d240 [ 5123.108968] RDX: ffffea0000e6d940 RSI: ffff8800399a0000 RDI: ffff88006ce6c908 [ 5123.109503] RBP: ffff88005877fe28 R08: ffffffff81c708a0 R09: 0000000000000000 [ 5123.110045] R10: 00000000000001a2 R11: ffff88003ba7f5c8 R12: ffff880054c55800 [ 5123.110618] R13: 0000000000000000 R14: ffff880053a0d240 R15: ffff880053a0d240 [ 5123.111169] FS: 0000000000000000(0000) GS:ffffffff81c27000(0000) knlGS:0000000000000000 [ 5123.111726] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5123.112286] CR2: 0000000000000000 CR3: 0000000054cac000 CR4: 00000000001406f0 [ 5123.112888] Stack: [ 5123.113458] ffffea0000e6d940 ffff8800399a0000 00000000000167d0 0000000000000000 [ 5123.114049] 0000000000000000 0000000000000000 0000000000000000 00000000a7ec82c6 [ 5123.114662] ffff88005877fe18 ffffea0000e6d940 ffff8800399a0000 ffff880054c55800 [ 5123.115264] Call Trace: [ 5123.115868] [<ffffffffa03fb44b>] nfs4_try_migration+0xbb/0x220 [nfsv4] [ 5123.116487] [<ffffffffa03fcb3b>] nfs4_run_state_manager+0x4ab/0x7b0 [nfsv4] [ 5123.117104] [<ffffffffa03fc690>] ? nfs4_do_reclaim+0x510/0x510 [nfsv4] [ 5123.117813] [<ffffffff810a4527>] kthread+0xd7/0xf0 [ 5123.118456] [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160 [ 5123.119108] [<ffffffff816d9cdf>] ret_from_fork+0x3f/0x70 [ 5123.119723] [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160 [ 5123.120329] Code: 4c 8b 6a 58 74 17 eb 52 48 8d 55 a8 89 c6 4c 89 e7 e8 4a b5 ff ff 8b 45 b0 85 c0 74 1c 4c 89 f9 48 8b 55 90 48 8b 75 98 48 89 df <41> ff 55 00 3d e8 d8 ff ff 41 89 c6 74 cf 48 8b 4d c8 65 48 33 [ 5123.121643] RIP [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4] [ 5123.122308] RSP <ffff88005877fdb8> [ 5123.122942] CR2: 0000000000000000 Fixes: ec011fe847 ("NFS: Introduce a vector of migration recovery ops") Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-12NFSv4: don't set SETATTR for O_RDONLY|O_EXCLNeilBrown1-1/+1
It is unusual to combine the open flags O_RDONLY and O_EXCL, but it appears that libre-office does just that. [pid 3250] stat("/home/USER/.config", {st_mode=S_IFDIR|0700, st_size=8192, ...}) = 0 [pid 3250] open("/home/USER/.config/libreoffice/4-suse/user/extensions/buildid", O_RDONLY|O_EXCL <unfinished ...> NFSv4 takes O_EXCL as a sign that a setattr command should be sent, probably to reset the timestamps. When it was an O_RDONLY open, the SETATTR command does not identify any actual attributes to change. If no delegation was provided to the open, the SETATTR uses the all-zeros stateid and the request is accepted (at least by the Linux NFS server - no harm, no foul). If a read-delegation was provided, this is used in the SETATTR request, and a Netapp filer will justifiably claim NFS4ERR_BAD_STATEID, which the Linux client takes as a sign to retry - indefinitely. So only treat O_EXCL specially if O_CREAT was also given. Signed-off-by: NeilBrown <neilb@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-28Merge tag 'nfs-for-4.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-15/+21
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable patches: - Fix a situation where the client uses the wrong (zero) stateid. - Fix a memory leak in nfs_do_recoalesce Bugfixes: - Plug a memory leak when ->prepare_layoutcommit fails - Fix an Oops in the NFSv4 open code - Fix a backchannel deadlock - Fix a livelock in sunrpc when sendmsg fails due to low memory availability - Don't revalidate the mapping if both size and change attr are up to date - Ensure we don't miss a file extension when doing pNFS - Several fixes to handle NFSv4.1 sequence operation status bits correctly - Several pNFS layout return bugfixes" * tag 'nfs-for-4.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits) nfs: Fix an oops caused by using other thread's stack space in ASYNC mode nfs: plug memory leak when ->prepare_layoutcommit fails SUNRPC: Report TCP errors to the caller sunrpc: translate -EAGAIN to -ENOBUFS when socket is writable. NFSv4.2: handle NFS-specific llseek errors NFS: Don't clear desc->pg_moreio in nfs_do_recoalesce() NFS: Fix a memory leak in nfs_do_recoalesce NFS: nfs_mark_for_revalidate should always set NFS_INO_REVAL_PAGECACHE NFS: Remove the "NFS_CAP_CHANGE_ATTR" capability NFS: Set NFS_INO_REVAL_PAGECACHE if the change attribute is uninitialised NFS: Don't revalidate the mapping if both size and change attr are up to date NFSv4/pnfs: Ensure we don't miss a file extension NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_locked SUNRPC: xprt_complete_bc_request must also decrement the free slot count SUNRPC: Fix a backchannel deadlock pNFS: Don't throw out valid layout segments pNFS: pnfs_roc_drain() fix a race with open pNFS: Fix races between return-on-close and layoutreturn. pNFS: pnfs_roc_drain should return 'true' when sleeping pNFS: Layoutreturn must invalidate all existing layout segments. ...
2015-07-28nfs: Fix an oops caused by using other thread's stack space in ASYNC modeKinglong Mee1-1/+9
An oops caused by using other thread's stack space in sunrpc ASYNC sending thread. [ 9839.007187] ------------[ cut here ]------------ [ 9839.007923] kernel BUG at fs/nfs/nfs4xdr.c:910! [ 9839.008069] invalid opcode: 0000 [#1] SMP [ 9839.008069] Modules linked in: blocklayoutdriver rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm joydev iosf_mbi crct10dif_pclmul snd_timer crc32_pclmul crc32c_intel ghash_clmulni_intel snd soundcore ppdev pvpanic parport_pc i2c_piix4 serio_raw virtio_balloon parport acpi_cpufreq nfsd nfs_acl lockd grace auth_rpcgss sunrpc qxl drm_kms_helper virtio_net virtio_console virtio_blk ttm drm virtio_pci virtio_ring virtio ata_generic pata_acpi [ 9839.008069] CPU: 0 PID: 308 Comm: kworker/0:1H Not tainted 4.0.0-0.rc4.git1.3.fc23.x86_64 #1 [ 9839.008069] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 9839.008069] Workqueue: rpciod rpc_async_schedule [sunrpc] [ 9839.008069] task: ffff8800d8b4d8e0 ti: ffff880036678000 task.ti: ffff880036678000 [ 9839.008069] RIP: 0010:[<ffffffffa0339cc9>] [<ffffffffa0339cc9>] reserve_space.part.73+0x9/0x10 [nfsv4] [ 9839.008069] RSP: 0018:ffff88003667ba58 EFLAGS: 00010246 [ 9839.008069] RAX: 0000000000000000 RBX: 000000001fc15e18 RCX: ffff8800c0193800 [ 9839.008069] RDX: ffff8800e4ae3f24 RSI: 000000001fc15e2c RDI: ffff88003667bcd0 [ 9839.008069] RBP: ffff88003667ba58 R08: ffff8800d9173008 R09: 0000000000000003 [ 9839.008069] R10: ffff88003667bcd0 R11: 000000000000000c R12: 0000000000010000 [ 9839.008069] R13: ffff8800d9173350 R14: 0000000000000000 R15: ffff8800c0067b98 [ 9839.008069] FS: 0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000 [ 9839.008069] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9839.008069] CR2: 00007f988c9c8bb0 CR3: 00000000d99b6000 CR4: 00000000000407f0 [ 9839.008069] Stack: [ 9839.008069] ffff88003667bbc8 ffffffffa03412c5 00000000c6c55680 ffff880000000003 [ 9839.008069] 0000000000000088 00000010c6c55680 0001000000000002 ffffffff816e87e9 [ 9839.008069] 0000000000000000 00000000477290e2 ffff88003667bab8 ffffffff81327ba3 [ 9839.008069] Call Trace: [ 9839.008069] [<ffffffffa03412c5>] encode_attrs+0x435/0x530 [nfsv4] [ 9839.008069] [<ffffffff816e87e9>] ? inet_sendmsg+0x69/0xb0 [ 9839.008069] [<ffffffff81327ba3>] ? selinux_socket_sendmsg+0x23/0x30 [ 9839.008069] [<ffffffff8164c1df>] ? do_sock_sendmsg+0x9f/0xc0 [ 9839.008069] [<ffffffff8164c278>] ? kernel_sendmsg+0x58/0x70 [ 9839.008069] [<ffffffffa011acc0>] ? xdr_reserve_space+0x20/0x170 [sunrpc] [ 9839.008069] [<ffffffffa011acc0>] ? xdr_reserve_space+0x20/0x170 [sunrpc] [ 9839.008069] [<ffffffffa0341b40>] ? nfs4_xdr_enc_open_noattr+0x130/0x130 [nfsv4] [ 9839.008069] [<ffffffffa03419a5>] encode_open+0x2d5/0x340 [nfsv4] [ 9839.008069] [<ffffffffa0341b40>] ? nfs4_xdr_enc_open_noattr+0x130/0x130 [nfsv4] [ 9839.008069] [<ffffffffa011ab89>] ? xdr_encode_opaque+0x19/0x20 [sunrpc] [ 9839.008069] [<ffffffffa0339cfb>] ? encode_string+0x2b/0x40 [nfsv4] [ 9839.008069] [<ffffffffa0341bf3>] nfs4_xdr_enc_open+0xb3/0x140 [nfsv4] [ 9839.008069] [<ffffffffa0110a4c>] rpcauth_wrap_req+0xac/0xf0 [sunrpc] [ 9839.008069] [<ffffffffa01017db>] call_transmit+0x18b/0x2d0 [sunrpc] [ 9839.008069] [<ffffffffa0101650>] ? call_decode+0x860/0x860 [sunrpc] [ 9839.008069] [<ffffffffa0101650>] ? call_decode+0x860/0x860 [sunrpc] [ 9839.008069] [<ffffffffa010caa0>] __rpc_execute+0x90/0x460 [sunrpc] [ 9839.008069] [<ffffffffa010ce85>] rpc_async_schedule+0x15/0x20 [sunrpc] [ 9839.008069] [<ffffffff810b452b>] process_one_work+0x1bb/0x410 [ 9839.008069] [<ffffffff810b47d3>] worker_thread+0x53/0x470 [ 9839.008069] [<ffffffff810b4780>] ? process_one_work+0x410/0x410 [ 9839.008069] [<ffffffff810b4780>] ? process_one_work+0x410/0x410 [ 9839.008069] [<ffffffff810ba7b8>] kthread+0xd8/0xf0 [ 9839.008069] [<ffffffff810ba6e0>] ? kthread_worker_fn+0x180/0x180 [ 9839.008069] [<ffffffff81786418>] ret_from_fork+0x58/0x90 [ 9839.008069] [<ffffffff810ba6e0>] ? kthread_worker_fn+0x180/0x180 [ 9839.008069] Code: 00 00 48 c7 c7 21 fa 37 a0 e8 94 1c d6 e0 c6 05 d2 17 05 00 01 8b 03 eb d7 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 89 f3 [ 9839.008069] RIP [<ffffffffa0339cc9>] reserve_space.part.73+0x9/0x10 [nfsv4] [ 9839.008069] RSP <ffff88003667ba58> [ 9839.071114] ---[ end trace cc14c03adb522e94 ]--- Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-23NFS: Remove the "NFS_CAP_CHANGE_ATTR" capabilityTrond Myklebust1-3/+0
Setting the change attribute has been mandatory for all NFS versions, since commit 3a1556e8662c ("NFSv2/v3: Simulate the change attribute"). We should therefore not have anything be conditional on it being set/unset. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-23NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_lockedTrond Myklebust1-0/+3
Otherwise, nfs4_select_rw_stateid() will always return the zero stateid instead of the correct open stateid. Fixes: f95549cf24660 ("NFSv4: More CLOSE/OPEN races") Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-13nfs4: have do_vfs_lock take an inode pointerJeff Layton1-8/+8
Now that we have file locking helpers that can deal with an inode instead of a filp, we can change the NFSv4 locking code to use that instead. This should fix the case where we have a filp that is closed while flock or OFD locks are set on it, and the task is signaled so that it doesn't wait for the LOCKU reply to come in before the filp is freed. At that point we can end up with a use-after-free with the current code, which relies on dereferencing the fl_file in the lock request. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: "J. Bruce Fields" <bfields@fieldses.org> Tested-by: "J. Bruce Fields" <bfields@fieldses.org>
2015-07-13Revert "nfs: take extra reference to fl->fl_file when running a LOCKU operation"Jeff Layton1-2/+0
This reverts commit db2efec0caba4f81a22d95a34da640b86c313c8e. William reported that he was seeing instability with this patch, which is likely due to the fact that it can cause the kernel to take a new reference to a filp after the last reference has already been put. Revert this patch for now, as we'll need to fix this in another way. Cc: stable@vger.kernel.org Reported-by: William Dauchy <william@gandi.net> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: "J. Bruce Fields" <bfields@fieldses.org> Tested-by: "J. Bruce Fields" <bfields@fieldses.org>
2015-07-11pNFS: Fix races between return-on-close and layoutreturn.Trond Myklebust1-2/+0
If one or more of the layout segments reports an error during I/O, then we may have to send a layoutreturn to report the error back to the NFS metadata server. This patch ensures that the return-on-close code can detect the outstanding layoutreturn, and not preempt it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-11pNFS: Layoutreturn must invalidate all existing layout segments.Trond Myklebust1-0/+3
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-05NFSv4: Leases are renewed in sequence_done when we have sessionsTrond Myklebust1-7/+5
Ensure that the calls to renew_lease() in open_done() etc. only apply to session-less versions of NFSv4.x (i.e. NFSv4.0). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-05NFSv4.1: nfs41_sequence_done should handle sequence flag errorsTrond Myklebust1-2/+1
Instead of just kicking off lease recovery, we should look into the sequence flag errors and handle them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-01nfs: Remove invalid tk_pid from debug messageKinglong Mee1-3/+2
Before rpc_run_task(), tk_pid is uninitiated as 0 always. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-01nfs: Fix a memory leak when meeting an unsupported state protectKinglong Mee1-6/+8
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-01nfs: take extra reference to fl->fl_file when running a LOCKU operationJeff Layton1-0/+2
Jean reported another crash, similar to the one fixed by feaff8e5b2cf: BUG: unable to handle kernel NULL pointer dereference at 0000000000000148 IP: [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0 PGD 0 Oops: 0000 [#1] SMP Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache vmw_vsock_vmci_transport vsock cfg80211 rfkill coretemp crct10dif_pclmul ppdev vmw_balloon crc32_pclmul crc32c_intel ghash_clmulni_intel pcspkr vmxnet3 parport_pc i2c_piix4 microcode serio_raw parport nfsd floppy vmw_vmci acpi_cpufreq auth_rpcgss shpchp nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih ata_generic mptbase i2c_core pata_acpi CPU: 0 PID: 329 Comm: kworker/0:1H Not tainted 4.1.0-rc7+ #2 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013 Workqueue: rpciod rpc_async_schedule [sunrpc] 30ec000 RIP: 0010:[<ffffffff8124ef7f>] [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0 RSP: 0018:ffff8802330efc08 EFLAGS: 00010296 RAX: ffff8802330efc58 RBX: ffff880097187c80 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000 RBP: ffff8802330efc18 R08: ffff88023fc173d8 R09: 3038b7bf00000000 R10: 00002f1a02000000 R11: 3038b7bf00000000 R12: 0000000000000000 R13: 0000000000000000 R14: ffff8802337a2300 R15: 0000000000000020 FS: 0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000148 CR3: 000000003680f000 CR4: 00000000001407f0 Stack: ffff880097187c80 ffff880097187cd8 ffff8802330efc98 ffffffff81250281 ffff8802330efc68 ffffffffa013e7df ffff8802330efc98 0000000000000246 ffff8801f6901c00 ffff880233d2b8d8 ffff8802330efc58 ffff8802330efc58 Call Trace: [<ffffffff81250281>] __posix_lock_file+0x31/0x5e0 [<ffffffffa013e7df>] ? rpc_wake_up_task_queue_locked.part.35+0xcf/0x240 [sunrpc] [<ffffffff8125088b>] posix_lock_file_wait+0x3b/0xd0 [<ffffffffa03890b2>] ? nfs41_wake_and_assign_slot+0x32/0x40 [nfsv4] [<ffffffffa0365808>] ? nfs41_sequence_done+0xd8/0x300 [nfsv4] [<ffffffffa0367525>] do_vfs_lock+0x35/0x40 [nfsv4] [<ffffffffa03690c1>] nfs4_locku_done+0x81/0x120 [nfsv4] [<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc] [<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc] [<ffffffffa013e33c>] rpc_exit_task+0x2c/0x90 [sunrpc] [<ffffffffa0134400>] ? call_refreshresult+0x170/0x170 [sunrpc] [<ffffffffa013ece4>] __rpc_execute+0x84/0x410 [sunrpc] [<ffffffffa013f085>] rpc_async_schedule+0x15/0x20 [sunrpc] [<ffffffff810add67>] process_one_work+0x147/0x400 [<ffffffff810ae42b>] worker_thread+0x11b/0x460 [<ffffffff810ae310>] ? rescuer_thread+0x2f0/0x2f0 [<ffffffff810b35d9>] kthread+0xc9/0xe0 [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pmd+0xa0/0x160 [<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170 [<ffffffff8173c222>] ret_from_fork+0x42/0x70 [<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170 Code: a5 81 e8 85 75 e4 ff c6 05 31 ee aa 00 01 eb 98 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 53 <48> 8b 9f 48 01 00 00 48 85 db 74 08 48 89 d8 5b 41 5c 5d c3 83 RIP [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0 RSP <ffff8802330efc08> CR2: 0000000000000148 ---[ end trace 64484f16250de7ef ]--- The problem is almost exactly the same as the one fixed by feaff8e5b2cf. We must take a reference to the struct file when running the LOCKU compound to prevent the final fput from running until the operation is complete. Reported-by: Jean Spector <jean@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-29NFSv4: When returning a delegation, don't reclaim an incompatible open mode.NeilBrown1-0/+7
It is possible to have an active open with one mode, and a delegation for the same file with a different mode. In particular, a WR_ONLY open and an RD_ONLY delegation. This happens if a WR_ONLY open is followed by a RD_ONLY open which provides a delegation, but is then close. When returning the delegation, we currently try to claim opens for every open type (n_rdwr, n_rdonly, n_wronly). As there is no harm in claiming an open for a mode that we already have, this is often simplest. However if the delegation only provides a subset of the modes that we currently have open, this will produce an error from the server. So when claiming open modes prior to returning a delegation, skip the open request if the mode is not covered by the delegation - the open_stateid must already cover that mode, so there is nothing to do. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>