summaryrefslogtreecommitdiff
path: root/fs/nfs/inode.c
AgeCommit message (Collapse)AuthorFilesLines
2016-12-05NFS: Fix incorrect mapping revalidation when holding a delegationTrond Myklebust1-3/+9
We should only care about checking the attributes if the page cache is marked as dubious (using NFS_INO_REVAL_PAGECACHE) and the NFS_INO_REVAL_FORCED flag is set. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-02NFS: Allow getattr to also report readdirplus cache hitsTrond Myklebust1-4/+17
If the use called stat() on an 'ls -l' workload, and the attribute cache was successfully revalidate by READDIRPLUS, then we want to report that back so that the readdir code continues to use readdirplus. Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-02NFS: discard nfs_lockowner structure.NeilBrown1-2/+2
It now has only one field and is only used in one structure. So replaced it in that structure by the field it contains. Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-02NFSv4: add flock_owner to open contextNeilBrown1-2/+5
An open file description (struct file) in a given process can be associated with two different lock owners. It can have a Posix lock owner which will be different in each process that has a fd on the file. It can have a Flock owner which will be the same in all processes. When searching for a lock stateid to use, we need to consider both of these owners So add a new "flock_owner" to the "nfs_open_context" (of which there is one for each open file description). This flock_owner does not need to be reference-counted as there is a 1-1 relation between 'struct file' and nfs open contexts, and it will never be part of a list of contexts. So there is no need for a 'flock_context' - just the owner is enough. The io_count included in the (Posix) lock_context provides no guarantee that all read-aheads that could use the state have completed, so not supporting it for flock locks in not a serious problem. Synchronization between flock and read-ahead can be added later if needed. When creating an open_context for a non-openning create call, we don't have a 'struct file' to pass in, so the lock context gets initialized with a NULL owner, but this will never be used. The flock_owner is not used at all in this patch, that will come later. Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-02NFS: remove l_pid field from nfs_lockownerNeilBrown1-3/+0
this field is not used in any important way and probably should have been removed by Commit: 8003d3c4aaa5 ("nfs4: treat lock owners as opaque values") which removed the pid argument from nfs4_get_lock_state. Except in unusual and uninteresting cases, two threads with the same ->tgid will have the same ->files pointer, so keeping them both for comparison brings no benefit. Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-12-02NFSv4: Optimise away forced revalidation when we know the attributes are OKTrond Myklebust1-1/+1
The NFS_INO_REVAL_FORCED flag needs to be set if we just got a delegation, and we see that there might still be some ambiguity as to whether or not our attribute or data cache are valid. In practice, this means that a call to nfs_check_inode_attributes() will have noticed a discrepancy between cached attributes and measured ones, so let's move the setting of NFS_INO_REVAL_FORCED to there. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-11-18netns: make struct pernet_operations::id unsigned intAlexey Dobriyan1-1/+1
Make struct pernet_operations::id unsigned. There are 2 reasons to do so: 1) This field is really an index into an zero based array and thus is unsigned entity. Using negative value is out-of-bound access by definition. 2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preffered to signed 32-bit data. "int" being used as an array index needs to be sign-extended to 64-bit before being used. void f(long *p, int i) { g(p[i]); } roughly translates to movsx rsi, esi mov rdi, [rsi+...] call g MOVSX is 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default. Now, there is net_generic() function which, you guessed it right, uses "int" as an array index: static inline void *net_generic(const struct net *net, int id) { ... ptr = ng->ptr[id - 1]; ... } And this function is used a lot, so those sign extensions add up. Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation): add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) Unfortunately some functions actually grow bigger. This is a semmingly random artefact of code generation with register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8] However, overall balance is in negative direction: add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) function old new delta nfsd4_lock 3886 3959 +73 tipc_link_build_proto_msg 1096 1140 +44 mac80211_hwsim_new_radio 2776 2808 +32 tipc_mon_rcv 1032 1058 +26 svcauth_gss_legacy_init 1413 1429 +16 tipc_bcbase_select_primary 379 392 +13 nfsd4_exchange_id 1247 1260 +13 nfsd4_setclientid_confirm 782 793 +11 ... put_client_renew_locked 494 480 -14 ip_set_sockfn_get 730 716 -14 geneve_sock_add 829 813 -16 nfsd4_sequence_done 721 703 -18 nlmclnt_lookup_host 708 686 -22 nfsd4_lockt 1085 1063 -22 nfs_get_client 1077 1050 -27 tcf_bpf_init 1106 1076 -30 nfsd4_encode_fattr 5997 5930 -67 Total: Before=154856051, After=154854321, chg -0.00% Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-28pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstandingBenjamin Coddington1-3/+5
A LAYOUTCOMMIT then subsequent GETATTR may both return the same attributes, and in that case NFS_INO_INVALID_ATTR is never set on the second pass through nfs_update_inode(). The existing check to skip the clearing of NFS_INO_INVALID_ATTR if a LAYOUTCOMMIT is outstanding does not help in this case (see commit 10b7e9ad4488: "pNFS: Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstanding"). We know that if a LAYOUTCOMMIT is outstanding then attributes will need upating, so always set NFS_INO_INVALID_ATTR. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-25Merge branch 'writeback'Trond Myklebust1-71/+67
2016-07-18pNFS: Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstandingTrond Myklebust1-1/+4
We know that the attributes will need updating if there is still a LAYOUTCOMMIT outstanding. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-06NFS: Getattr doesn't require data sync semanticsTrond Myklebust1-3/+1
When retrieving stat() information, NFS unfortunately does require us to sync writes to disk in order to ensure that mtime and ctime are up to date. However we shouldn't have to ensure that those writes are persisted. Relaxing that requirement does mean that we may see an mtime/ctime change if the server reboots and forces us to replay all writes. The exception to this rule are pNFS clients that are required to send layoutcommit, however that is dealt with by the call to pnfs_sync_inode() in _nfs_revalidate_inode(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-06NFS: Do not aggressively cache file attributes in the case of O_DIRECTTrond Myklebust1-2/+7
A file that is open for O_DIRECT is by definition not obeying close-to-open cache consistency semantics, so let's not cache the attributes too aggressively either. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-06NFS: Remove unused function nfs_revalidate_mapping_protected()Trond Myklebust1-34/+4
Clean up... Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-06pNFS: Ensure we layoutcommit before revalidating attributesTrond Myklebust1-16/+7
If we need to update the cached attributes, then we'd better make sure that we also layoutcommit first. Otherwise, the server may have stale attributes. Prior to this patch, the revalidation code tried to "fix" this problem by simply disabling attributes that would be affected by the layoutcommit. That approach breaks nfs_writeback_check_extend(), leading to a file size corruption. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-24NFS: Fix potential race in nfs_fhget()Trond Myklebust1-0/+1
If we don't set the mode correctly in nfs_init_locked(), then there is potential for a race with a second call to nfs_fhget that will cause inode aliasing. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-06-22NFS: Cache aggressively when file is open for writingTrond Myklebust1-18/+44
Unless the user is using file locking, we must assume close-to-open cache consistency when the file is open for writing. Adjust the caching algorithm so that it does not clear the cache on out-of-order writes and/or attribute revalidations. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-06-13NFS: Don't flush caches for a getattr that races with writebackTrond Myklebust1-6/+9
If there were outstanding writes then chalk up the unexpected change attribute on the server to them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-05-09nfs: per-name sillyunlink exclusionAl Viro1-3/+1
use d_alloc_parallel() for sillyunlink/lookup exclusion and explicit rwsem (nfs_rmdir() being a writer and nfs_call_unlink() - a reader) for rmdir/sillyunlink one. That ought to make lookup/readdir/!O_CREAT atomic_open really parallel on NFS. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-08Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfixes from Ted Ts'o: "These changes contains a fix for overlayfs interacting with some (badly behaved) dentry code in various file systems. These have been reviewed by Al and the respective file system mtinainers and are going through the ext4 tree for convenience. This also has a few ext4 encryption bug fixes that were discovered in Android testing (yes, we will need to get these sync'ed up with the fs/crypto code; I'll take care of that). It also has some bug fixes and a change to ignore the legacy quota options to allow for xfstests regression testing of ext4's internal quota feature and to be more consistent with how xfs handles this case" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: ignore quota mount options if the quota feature is enabled ext4 crypto: fix some error handling ext4: avoid calling dquot_get_next_id() if quota is not enabled ext4: retry block allocation for failed DIO and DAX writes ext4: add lockdep annotations for i_data_sem ext4: allow readdir()'s of large empty directories to be interrupted btrfs: fix crash/invalid memory access on fsync when using overlayfs ext4 crypto: use dget_parent() in ext4_d_revalidate() ext4: use file_dentry() ext4: use dget_parent() in ext4_file_open() nfs: use file_dentry() fs: add file_dentry() ext4 crypto: don't let data integrity writebacks fail with ENOMEM ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()
2016-03-26nfs: use file_dentry()Miklos Szeredi1-1/+1
NFS may be used as lower layer of overlayfs and accessing f_path.dentry can lead to a crash. Fix by replacing direct access of file->f_path.dentry with the file_dentry() accessor, which will always return a native object. Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: <stable@vger.kernel.org> # v4.2 Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk>
2016-03-16nfs: remove nfs_inode_dio_waitChristoph Hellwig1-1/+1
Just call inode_dio_wait directly instead of through a pointless wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-01-23wrappers for ->i_mutex accessAl Viro1-4/+4
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+1
Merge first patch-bomb from Andrew Morton: - A few hotfixes which missed 4.4 becasue I was asleep. cc'ed to -stable - A few misc fixes - OCFS2 updates - Part of MM. Including pretty large changes to page-flags handling and to thp management which have been buffered up for 2-3 cycles now. I have a lot of MM material this time. [ It turns out the THP part wasn't quite ready, so that got dropped from this series - Linus ] * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits) zsmalloc: reorganize struct size_class to pack 4 bytes hole mm/zbud.c: use list_last_entry() instead of list_tail_entry() zram/zcomp: do not zero out zcomp private pages zram: pass gfp from zcomp frontend to backend zram: try vmalloc() after kmalloc() zram/zcomp: use GFP_NOIO to allocate streams mm: add tracepoint for scanning pages drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64 mm/page_isolation: use macro to judge the alignment mm: fix noisy sparse warning in LIBCFS_ALLOC_PRE() mm: rework virtual memory accounting include/linux/memblock.h: fix ordering of 'flags' argument in comments mm: move lru_to_page to mm_inline.h Documentation/filesystems: describe the shared memory usage/accounting memory-hotplug: don't BUG() in register_memory_resource() hugetlb: make mm and fs code explicitly non-modular mm/swapfile.c: use list_for_each_entry_safe in free_swap_count_continuations mm: /proc/pid/clear_refs: no need to clear VM_SOFTDIRTY in clear_soft_dirty_pmd() mm: make sure isolate_lru_page() is never called for tail page vmstat: make vmstat_updater deferrable again and shut down on idle ...
2016-01-15Merge tag 'nfs-for-4.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-21/+57
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - Fix a regression in the SunRPC socket polling code - Fix the attribute cache revalidation code - Fix race in __update_open_stateid() - Fix an lo->plh_block_lgets imbalance in layoutreturn - Fix an Oopsable typo in ff_mirror_match_fh() Features: - pNFS layout recall performance improvements. - pNFS/flexfiles: Support server-supplied layoutstats sampling period Bugfixes + cleanups: - NFSv4: Don't perform cached access checks before we've OPENed the file - Fix starvation issues with background flushes - Reclaim writes should be flushed as unstable writes if there are already entries in the commit lists - Various bugfixes from Chuck to fix NFS/RDMA send queue ordering problems - Ensure that we propagate fatal layoutget errors back to the application - Fixes for sundry flexfiles layoutstats bugs - Fix files/flexfiles to not cache invalidated layouts in the DS commit buckets" * tag 'nfs-for-4.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (68 commits) NFS: Fix a compile warning about unused variable in nfs_generic_pg_pgios() NFSv4: Fix a compile warning about no prototype for nfs4_ioctl() NFS: Use wait_on_atomic_t() for unlock after readahead SUNRPC: Fixup socket wait for memory NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments NFSv4.1/pnfs: Cleanup copying of pnfs_layout_range structures NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid() NFSv4.1/pNFS: Fix a race in initiate_file_draining() NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn() NFS: Relax requirements in nfs_flush_incompatible NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid NFS: Allow multiple commit requests in flight per file NFS/pNFS: Fix up pNFS write reschedule layering violations and bugs SUNRPC: Fix a missing break in rpc_anyaddr() pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh() NFS: Fix attribute cache revalidation NFS: Ensure we revalidate attributes before using execute_ok() ...
2016-01-15kmemcg: account certain kmem allocations to memcgVladimir Davydov1-1/+1
Mark those kmem allocations that are known to be easily triggered from userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to memcg. For the list, see below: - threadinfo - task_struct - task_delay_info - pid - cred - mm_struct - vm_area_struct and vm_region (nommu) - anon_vma and anon_vma_chain - signal_struct - sighand_struct - fs_struct - files_struct - fdtable and fdtable->full_fds_bits - dentry and external_name - inode for all filesystems. This is the most tedious part, because most filesystems overwrite the alloc_inode method. The list is far from complete, so feel free to add more objects. Nevertheless, it should be close to "account everything" approach and keep most workloads within bounds. Malevolent users will be able to breach the limit, but this was possible even with the former "account everything" approach (simply because it did not account everything in fact). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-12Merge branch 'work.symlinks' of ↵Linus Torvalds1-2/+24
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs RCU symlink updates from Al Viro: "Replacement of ->follow_link/->put_link, allowing to stay in RCU mode even if the symlink is not an embedded one. No changes since the mailbomb on Jan 1" * 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch ->get_link() to delayed_call, kill ->put_link() kill free_page_put_link() teach nfs_get_link() to work in RCU mode teach proc_self_get_link()/proc_thread_self_get_link() to work in RCU mode teach shmem_get_link() to work in RCU mode teach page_get_link() to work in RCU mode replace ->follow_link() with new method that could stay in RCU mode don't put symlink bodies in pagecache into highmem namei: page_getlink() and page_follow_link_light() are the same thing ufs: get rid of ->setattr() for symlinks udf: don't duplicate page_symlink_inode_operations logfs: don't duplicate page_symlink_inode_operations switch befs long symlinks to page_symlink_operations
2016-01-08Merge branch 'bugfixes'Trond Myklebust1-15/+39
* bugfixes: SUNRPC: Fixup socket wait for memory SUNRPC: Fix a missing break in rpc_anyaddr() pNFS/flexfiles: Fix an Oopsable typo in ff_mirror_match_fh() NFS: Fix attribute cache revalidation NFS: Ensure we revalidate attributes before using execute_ok() NFS: Flush reclaim writes using FLUSH_COND_STABLE NFS: Background flush should not be low priority NFSv4.1/pnfs: Fixup an lo->plh_block_lgets imbalance in layoutreturn NFSv4: Don't perform cached access checks before we've OPENed the file NFS: Allow the combination pNFS and labeled NFS NFS42: handle layoutstats stateid error nfs: Fix race in __update_open_stateid() nfs: fix missing assignment in nfs4_sequence_done tracepoint
2016-01-08NFS: Use wait_on_atomic_t() for unlock after readaheadBenjamin Coddington1-6/+12
The use of wait_on_atomic_t() for waiting on I/O to complete before unlocking allows us to git rid of the NFS_IO_INPROGRESS flag, and thus the nfs_iocounter's flags member, and finally the nfs_iocounter altogether. The count of I/O is moved to the lock context, and the counter increment/decrement functions become simple enough to open-code. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> [Trond: Fix up conflict with existing function nfs_wait_atomic_killable()] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-30NFS: Fix attribute cache revalidationTrond Myklebust1-15/+39
If a NFSv4 client uses the cache_consistency_bitmask in order to request only information about the change attribute, timestamps and size, then it has not revalidated all attributes, and hence the attribute timeout timestamp should not be updated. Reported-by: Donald Buczek <buczek@molgen.mpg.de> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28nfs: handle request add failure properlyPeng Tao1-0/+6
When we fail to queue a read page to IO descriptor, we need to clean it up otherwise it is hanging around preventing nfs module from being removed. When we fail to queue a write page to IO descriptor, we need to clean it up and also save the failure status to open context. Then at file close, we can try to write pages back again and drop the page if it fails to writeback in .launder_page, which will be done in the next patch. Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-14sched/wait: Fix the signal handling fixPeter Zijlstra1-3/+3
Jan Stancek reported that I wrecked things for him by fixing things for Vladimir :/ His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which should not be possible, however my previous patch made this possible by unconditionally checking signal_pending(). We cannot use current->state as was done previously, because the instruction after the store to that variable it can be changed. We must instead pass the initial state along and use that. Fixes: 68985633bccb ("sched/wait: Fix signal handling in bit wait helpers") Reported-by: Jan Stancek <jstancek@redhat.com> Reported-by: Chris Mason <clm@fb.com> Tested-by: Jan Stancek <jstancek@redhat.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Chris Mason <clm@fb.com> Reviewed-by: Paul Turner <pjt@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: tglx@linutronix.de Cc: Oleg Nesterov <oleg@redhat.com> Cc: hpa@zytor.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-09teach nfs_get_link() to work in RCU modeAl Viro1-0/+21
based upon the corresponding patch from Neil's March patchset, again with kmap-related horrors removed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-09don't put symlink bodies in pagecache into highmemAl Viro1-2/+3
kmap() in page_follow_link_light() needed to go - allowing to hold an arbitrary number of kmaps for long is a great way to deadlocking the system. new helper (inode_nohighmem(inode)) needs to be used for pagecache symlinks inodes; done for all in-tree cases. page_follow_link_light() instrumented to yell about anything missed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-11-25nfs: if we have no valid attrs, then don't declare the attribute cache validJeff Layton1-1/+5
If we pass in an empty nfs_fattr struct to nfs_update_inode, it will (correctly) not update any of the attributes, but it then clears the NFS_INO_INVALID_ATTR flag, which indicates that the attributes are up to date. Don't clear the flag if the fattr struct has no valid attrs to apply. Reviewed-by: Steve French <steve.french@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-25nfs: ensure that attrcache is revalidated after a SETATTRJeff Layton1-1/+4
If we get no post-op attributes back from a SETATTR operation, then no attributes will of course be updated during the call to nfs_update_inode. We know however that the attributes are invalid at that point, since we just changed some of them. At the very least, the ctime will be bogus. If we get no post-op attributes back on the call, mark the attrcache invalid to reflect that fact. Reviewed-by: Steve French <steve.french@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-04Revert "NFS: Make close(2) asynchronous when closing NFS O_DIRECT files"Trond Myklebust1-1/+6
This reverts commit f895c53f8ace3c3e49ebf9def90e63fc6d46d2bf. This commit causes a NFSv4 regression in that close()+unlink() can end up failing. The reason is that we no longer have a guarantee that the CLOSE has completed on the server, meaning that the subsequent call to REMOVE may fail with NFS4ERR_FILE_OPEN if the server implements Windows unlink() semantics. Reported-by: <Olga Kornievskaia <aglo@umich.edu> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-09-04NFS: Optimise away the close-to-open getattr if there is no cached dataTrond Myklebust1-3/+10
If there is no cached data, then there is no need to track the file change attribute on close. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-28NFS: Check size by inode_newsize_ok in nfs_setattrKinglong Mee1-8/+10
Set rlimit for NFS's files is useless right now. For local process's rlimit, it should be checked by nfs client. The same, CIFS also call inode_change_ok checking rlimit at its client in cifs_setattr_nounix() and cifs_setattr_unix(). v3, fix bad using of error Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-25NFSv4: Force a post-op attribute update when holding a delegationTrond Myklebust1-0/+7
If the ctime or mtime or change attribute have changed because of an operation we initiated, we should make sure that we force an attribute update. However we do not want to mark the page cache for revalidation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v4.0+
2015-08-17NFS: Don't let the ctime override attribute barriers.Trond Myklebust1-8/+0
Chuck reports seeing cases where a GETATTR that happens to race with an asynchronous WRITE is overriding the file size, despite the attribute barrier being set by the writeback code. The culprit turns out to be the check in nfs_ctime_need_update(), which sees that the ctime is newer than the cached ctime, and assumes that it is safe to override the attribute barrier. This patch removes that override, and ensures that attribute barriers are always respected. Reported-by: Chuck Lever <chuck.lever@oracle.com> Fixes: a08a8cd375db9 ("NFS: Add attribute update barriers to NFS writebacks") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-08-17NFS: Remove nfs_release()Anna Schumaker1-7/+1
And call nfs_file_clear_open_context() directly. This makes it obvious that nfs_file_release() will always return 0. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-23NFS: Remove the "NFS_CAP_CHANGE_ATTR" capabilityTrond Myklebust1-2/+2
Setting the change attribute has been mandatory for all NFS versions, since commit 3a1556e8662c ("NFSv2/v3: Simulate the change attribute"). We should therefore not have anything be conditional on it being set/unset. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-23NFS: Set NFS_INO_REVAL_PAGECACHE if the change attribute is uninitialisedTrond Myklebust1-1/+2
We can't allow caching of data until the change attribute has been initialised correctly. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-23NFS: Don't revalidate the mapping if both size and change attr are up to dateTrond Myklebust1-4/+4
If we've ensured that the size and the change attribute are both correct, then there is no point in marking those attributes as needing revalidation again. Only do so if we know the size is incorrect and was not updated. Fixes: f2467b6f64da ("NFS: Clear NFS_INO_REVAL_PAGECACHE when...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-01nfs: Remove unneeded micro checking of CONFIG_PROC_FSKinglong Mee1-7/+3
Have checking CONFIG_PROC_FS in include/linux/sunrpc/stats.h. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-06-02NFS: report more appropriate block size for directories.NeilBrown1-0/+2
In glibc 2.21 (and several previous), a call to opendir() will result in a 32K (BUFSIZ*4) buffer being allocated and passed to getdents. However a call to fdopendir() results in an 'fstat' request to determine block size and a matching buffer allocated for subsequent use with getdents. This will typically be 1M. The first getdents call on an NFS directory will always use READDIR_PLUS (or NFSv4 equivalent) if available. Subsequent getdents calls only use this more expensive version if some 'stat' requests are made between the getdents calls. For this reason it is good to keep at least that first getdents call relatively short. When fdopendir() and readdir() is used on a large directory, it takes approximately 32 times as long to complete as using "opendir". Current versions of 'find' use fdopendir() and demonstrate this slowness. 'stat' on a directory currently returns the 'wsize'. This number has no meaning on directories. Actual READDIR requests are limited to ->dtsize, which itself is capped at 4 pages, coincidently the same as BUFSIZ*4. So this is a meaningful number to use as the blocksize on directories, and has the effect of making 'find' on large directories go a lot faster. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-27Merge tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-9/+27
Pull NFS client updates from Trond Myklebust: "Another set of mainly bugfixes and a couple of cleanups. No new functionality in this round. Highlights include: Stable patches: - Fix a regression in /proc/self/mountstats - Fix the pNFS flexfiles O_DIRECT support - Fix high load average due to callback thread sleeping Bugfixes: - Various patches to fix the pNFS layoutcommit support - Do not cache pNFS deviceids unless server notifications are enabled - Fix a SUNRPC transport reconnection regression - make debugfs file creation failure non-fatal in SUNRPC - Another fix for circular directory warnings on NFSv4 "junctioned" mountpoints - Fix locking around NFSv4.2 fallocate() support - Truncating NFSv4 file opens should also sync O_DIRECT writes - Prevent infinite loop in rpcrdma_ep_create() Features: - Various improvements to the RDMA transport code's handling of memory registration - Various code cleanups" * tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits) fs/nfs: fix new compiler warning about boolean in switch nfs: Remove unneeded casts in nfs NFS: Don't attempt to decode missing directory entries Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one" NFS: Rename idmap.c to nfs4idmap.c NFS: Move nfs_idmap.h into fs/nfs/ NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h NFS: Add a stub for GETDEVICELIST nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes nfs: fix DIO good bytes calculation nfs: Fetch MOUNTED_ON_FILEID when updating an inode sunrpc: make debugfs file creation failure non-fatal nfs: fix high load average due to callback thread sleeping NFS: Reduce time spent holding the i_mutex during fallocate() NFS: Don't zap caches on fallocate() xprtrdma: Make rpcrdma_{un}map_one() into inline functions xprtrdma: Handle non-SEND completions via a callout xprtrdma: Add "open" memreg op xprtrdma: Add "destroy MRs" memreg op xprtrdma: Add "reset MRs" memreg op ...
2015-04-23nfs: Remove unneeded casts in nfsFiro Yang1-1/+1
Don't unnecessarily cast allocation return value in fs/nfs/inode.c::nfs_alloc_inode(). Signed-off-by: Firo Yang <firogm@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-23nfs: Fetch MOUNTED_ON_FILEID when updating an inodeAnna Schumaker1-1/+14
2ef47eb1 (NFS: Fix use of nfs_attr_use_mounted_on_fileid()) was a good start to fixing a circular directory structure warning for NFS v4 "junctioned" mountpoints. Unfortunately, further testing continued to generate this error. My server is configured like this: anna@nfsd ~ % df Filesystem Size Used Avail Use% Mounted on /dev/vda1 9.1G 2.0G 6.5G 24% / /dev/vdc1 1014M 33M 982M 4% /exports /dev/vdc2 1014M 33M 982M 4% /exports/vol1 /dev/vdc3 1014M 33M 982M 4% /exports/vol1/vol2 anna@nfsd ~ % cat /etc/exports /exports/ *(rw,async,no_subtree_check,no_root_squash) /exports/vol1/ *(rw,async,no_subtree_check,no_root_squash) /exports/vol1/vol2 *(rw,async,no_subtree_check,no_root_squash) I've been running chown across the entire mountpoint twice in a row to hit this problem. The first run succeeds, but the second one fails with the circular directory warning along with: anna@client ~ % dmesg [Apr 3 14:28] NFS: server 192.168.100.204 error: fileid changed fsid 0:39: expected fileid 0x100080, got 0x80 WHere 0x80 is the mountpoint's fileid and 0x100080 is the mounted-on fileid. This patch fixes the issue by requesting an updated mounted-on fileid from the server during nfs_update_inode(), and then checking that the fileid stored in the nfs_inode matches either the fileid or mounted-on fileid returned by the server. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-04-23NFS: Don't zap caches on fallocate()Anna Schumaker1-1/+0
This patch adds a GETATTR to the end of ALLOCATE and DEALLOCATE operations so we can set the updated inode size and change attribute directly. DEALLOCATE will still need to release pagecache pages, so nfs42_proc_deallocate() now calls truncate_pagecache_range() before contacting the server. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>