summaryrefslogtreecommitdiff
path: root/fs/xfs
AgeCommit message (Collapse)AuthorFilesLines
2016-12-13Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-blockLinus Torvalds2-5/+4
Pull block layer updates from Jens Axboe: "This is the main block pull request this series. Contrary to previous release, I've kept the core and driver changes in the same branch. We always ended up having dependencies between the two for obvious reasons, so makes more sense to keep them together. That said, I'll probably try and keep more topical branches going forward, especially for cycles that end up being as busy as this one. The major parts of this pull request is: - Improved support for O_DIRECT on block devices, with a small private implementation instead of using the pig that is fs/direct-io.c. From Christoph. - Request completion tracking in a scalable fashion. This is utilized by two components in this pull, the new hybrid polling and the writeback queue throttling code. - Improved support for polling with O_DIRECT, adding a hybrid mode that combines pure polling with an initial sleep. From me. - Support for automatic throttling of writeback queues on the block side. This uses feedback from the device completion latencies to scale the queue on the block side up or down. From me. - Support from SMR drives in the block layer and for SD. From Hannes and Shaun. - Multi-connection support for nbd. From Josef. - Cleanup of request and bio flags, so we have a clear split between which are bio (or rq) private, and which ones are shared. From Christoph. - A set of patches from Bart, that improve how we handle queue stopping and starting in blk-mq. - Support for WRITE_ZEROES from Chaitanya. - Lightnvm updates from Javier/Matias. - Supoort for FC for the nvme-over-fabrics code. From James Smart. - A bunch of fixes from a whole slew of people, too many to name here" * 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits) blk-stat: fix a few cases of missing batch flushing blk-flush: run the queue when inserting blk-mq flush elevator: make the rqhash helpers exported blk-mq: abstract out blk_mq_dispatch_rq_list() helper blk-mq: add blk_mq_start_stopped_hw_queue() block: improve handling of the magic discard payload blk-wbt: don't throttle discard or write zeroes nbd: use dev_err_ratelimited in io path nbd: reset the setup task for NBD_CLEAR_SOCK nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME nvme-fabrics: Add target support for FC transport nvme-fabrics: Add host support for FC transport nvme-fabrics: Add FC transport LLDD api definitions nvme-fabrics: Add FC transport FC-NVME definitions nvme-fabrics: Add FC transport error codes to nvme.h Add type 0x28 NVME type code to scsi fc headers nvme-fabrics: patch target code in prep for FC transport support nvme-fabrics: set sqe.command_id in core not transports parser: add u64 number parser nvme-rdma: align to generic ib_event logging helper ...
2016-11-02writeback: add wbc_to_write_flags()Jens Axboe1-6/+2
Add wbc_to_write_flags(), which returns the write modifier flags to use, based on a struct writeback_control. No functional changes in this patch, but it prepares us for factoring other wbc fields for write type. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-11-01block,fs: use REQ_* flags directlyChristoph Hellwig2-5/+8
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-24xfs: defer should abort intent items if the trans roll failsDarrick J. Wong1-12/+5
If the deferred ops transaction roll fails, we need to abort the intent items if we haven't already logged a done item for it, regardless of whether or not the deferred ops has had a transaction committed. Dave found this while running generic/388. Move the tracepoint to make it easier to track object lifetimes. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-24xfs: clear cowblocks tag when cow fork is emptiedBrian Foster1-1/+5
The background cowblocks scan job takes care of scanning for inodes with potentially lingering blocks in the cow fork and clearing them out. If the background scanner reclaims the cow fork blocks, however, it doesn't immediately clear the cowblocks tag from the inode. Instead, the inode remains tagged until the background scanner comes around again, discovers the inode cow fork has no blocks, clears the tag and fires the trace_xfs_inode_free_cowblocks_invalid() tracepoint to indicate that the inode may have been incorrectly tagged. This is not a major functional problem as the tag is ultimately cleared. Nonetheless, clear the tag when an inode cow fork is explicitly emptied to avoid the extra round trip through the background scanner and spurious "invalid" tracepoint. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-24xfs: fix up inode cowblocks tracking tracepointsBrian Foster1-4/+4
These calls are still using the eofblocks tracepoints. The cowblocks equivalents are already defined, we just aren't actually calling them. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: remove xfs_bunmapi_cowChristoph Hellwig2-23/+0
Since no one uses it anymore. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: optimize xfs_reflink_end_cowChristoph Hellwig2-64/+56
Instead of doing a full extent list search for each extent that is to be deleted using xfs_bmapi_read and then doing another one inside of xfs_bunmapi_cow use the same scheme that xfs_bumapi uses: look up the last extent to be deleted and then use the extent index to walk downward until we are outside the range to be deleted. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: optimize xfs_reflink_cancel_cow_blocksChristoph Hellwig1-28/+23
Rewrite xfs_reflink_cancel_cow_blocks so that we only do a search for the first extent in the extent list and then iterate over the remaining extents using the extent index, passing the extent we operate on directly to xfs_bmap_del_extent_delay or xfs_bmap_del_extent_cow instead of going through xfs_bunmapi and doing yet another extent list lookup. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: refactor xfs_bunmapi_cowChristoph Hellwig3-159/+225
Split out two helpers for deleting delayed or real extents from the COW fork. This allows to call them directly from xfs_reflink_cow_end_io once that function is refactored to iterate the extent tree. It will also allow to reuse the delalloc deletion from xfs_bunmapi in the future. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: optimize writes to reflink filesChristoph Hellwig4-105/+100
Instead of reserving space as the first thing in write_begin move it past reading the extent in the data fork. That way we only have to read from the data fork once and can reuse that information for trimming the extent to the shared/unshared boundary. Additionally this allows to easily limit the actual write size to said boundary, and avoid a roundtrip on the ilock. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: don't bother looking at the refcount tree for readsChristoph Hellwig1-5/+8
There is no need to trim an extent into a shared or non-shared one, or report any flags for plain old reads. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: handle "raw" delayed extents xfs_reflink_trim_around_sharedChristoph Hellwig1-1/+2
Delalloc extents in the extent list contain the number of reserved indirect blocks in their startblock value and don't use the magic DELAYSTARTBLOCK constant. Ensure that xfs_reflink_trim_around_shared handles them properly by checking for isnullstartblock(). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: add xfs_trim_extentDarrick J. Wong2-0/+35
This helpers allows to trim an extent to a subset of it's original range while making sure the block numbers in it remain valid, In the future xfs_trim_extent and xfs_bmapi_trim_map should probably be merged in some form. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> [hch: split from a previous patch from Darrick, moved around and added support for "raw" delayed extents"] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: merge xfs_reflink_remap_range and xfs_file_share_rangeChristoph Hellwig3-174/+143
There is no clear division of responsibility between those functions, so just merge them into one to keep the code simple. Also move xfs_file_wait_for_io to xfs_reflink.c together with its only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: remove xfs_file_wait_for_ioChristoph Hellwig1-29/+10
filemap_write_and_wait_range operates on full pages, so there is no need for the rounding operations. Additionally this allows us to micro-optimize by skipping the second inode_dio_wait for a intra-file clone. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: move inode locking from xfs_reflink_remap_range to xfs_file_share_rangeChristoph Hellwig2-36/+41
We need the iolock protection to stabilizie the IS_SWAPFILE and IS_IMMUTABLE values, as well as preventing new buffered writers re-dirtying the file data that we just wrote out. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: fix the same_inode check in xfs_file_share_rangeChristoph Hellwig1-1/+1
The VFS i_ino is an unsigned long, while XFS inode numbers are 64-bit wide, so checking i_ino for equality could lead to rate false positives on 32-bit architectures. Just compare the inode pointers themselves to be safe. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: remove the same fs check from xfs_file_share_rangeChristoph Hellwig1-3/+0
The VFS already does the check, and the placement of this duplicate is in the way of the following locking rework. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20libxfs: v3 inodes are only valid on crc-enabled filesystemsRoger Willcocks3-2/+14
xfs_repair was not detecting that version 3 inodes are invalid for for non-CRC filesystems. The result is specific inode corruptions go undetected and hence aren't repaired if only the version number is out of range. The core of the problem is that the XFS_DINODE_GOOD_VERSION() macro doesn't know that valid inode versions are dependent on a superblock version number. Fix this in libxfs, and propagate the new function out into the rest of xfsprogs to fix the issue. [Darrick: port to kernel from xfsprogs] Reported-by: Leslie Rhorer <lrhorer@mygrande.net> Signed-off-by: Roger Willcocks <roger@filmlight.ltd.uk> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20libxfs: clean up _calc_dquots_per_chunkDarrick J. Wong1-2/+1
The function xfs_calc_dquots_per_chunk takes a parameter in units of basic blocks. The kernel seems to get the units wrong, but userspace got 'fixed' by commenting out the unnecessary conversion. Fix both. cc: <stable@vger.kernel.org> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: unset MS_ACTIVE if mount failsDarrick J. Wong1-0/+1
As part of the inode block map intent log item recovery process, we had to set the IRECOVERY flag to prevent an unlinked inode from being truncated during the first iput call. This required us to set MS_ACTIVE so that iput puts the inode on the lru instead of immediately evicting the inode. Unfortunately, if the mount fails later on, the inodes that have been loaded (root dir and realtime) actually need to be evicted since we're aborting the mount. If we don't clear MS_ACTIVE in the failure step, those inodes are not evicted and therefore leak. The leak was found by running xfs/130 and rmmoding xfs immediately after the test. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: remove pointless error goto in xfs_bmap_remap_allocEric Sandeen1-3/+0
The commit: f65306ea xfs: map an inode's offset to an exact physical block added a pointless error0: target; remove it. Addresses-Coverity-Id: 1373865 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: don't take the IOLOCK exclusive for direct I/O page invalidationChristoph Hellwig1-68/+30
XFS historically took the iolock exclusive when invalidating pages before direct I/O operations to protect against writeback starvations. But this writeback starvation issues has been fixed a long time ago in the core writeback code, and all other file systems manage to do without the exclusive lock. Convert XFS over to avoid the exclusive lock in this case, and also move to range invalidations like done by the other file systems. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: add some 'static' annotationsEric Biggers2-3/+3
sparse reported that several variables and a function were not forward-declared anywhere and therefore should be 'static'. Found with sparse by running 'make C=2 CF=-D__CHECK_ENDIAN__ fs/xfs/' Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: Fix uninitialized variable in xfs_reflink_reserve_cow_range()Geert Uytterhoeven1-1/+1
with gcc 4.1.2: fs/xfs/xfs_reflink.c: In function xfs_reflink_reserve_cow_range: fs/xfs/xfs_reflink.c:327: warning: error may be used uninitialized in this function Indeed, if "count" is zero, the function will return an uninitialized error value. While "count" is unlikely to be zero, this function is called through the public iomap API. Hence fix this by preinitializing error to zero. Fixes: 2a06705cd5954030 ("xfs: create delalloc extents in CoW fork") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20xfs: remove redundant assignment of ifpColin Ian King1-1/+1
Remove redundant ifp = ifp statement, it does nothing. Found with static analysis by CoverityScan. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-14Merge tag 'xfs-reflink-for-linus-4.9-rc1' of ↵Linus Torvalds72-308/+10552
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs < XFS has gained super CoW powers! > ---------------------------------- \ ^__^ \ (oo)\_______ (__)\ )\/\ ||----w | || || Pull XFS support for shared data extents from Dave Chinner: "This is the second part of the XFS updates for this merge cycle. This pullreq contains the new shared data extents feature for XFS. Given the complexity and size of this change I am expecting - like the addition of reverse mapping last cycle - that there will be some follow-up bug fixes and cleanups around the -rc3 stage for issues that I'm sure will show up once the code hits a wider userbase. What it is: At the most basic level we are simply adding shared data extents to XFS - i.e. a single extent on disk can now have multiple owners. To do this we have to add new on-disk features to both track the shared extents and the number of times they've been shared. This is done by the new "refcount" btree that sits in every allocation group. When we share or unshare an extent, this tree gets updated. Along with this new tree, the reverse mapping tree needs to be updated to track each owner or a shared extent. This also needs to be updated ever share/unshare operation. These interactions at extent allocation and freeing time have complex ordering and recovery constraints, so there's a significant amount of new intent-based transaction code to ensure that operations are performed atomically from both the runtime and integrity/crash recovery perspectives. We also need to break sharing when writes hit a shared extent - this is where the new copy-on-write implementation comes in. We allocate new storage and copy the original data along with the overwrite data into the new location. We only do this for data as we don't share metadata at all - each inode has it's own metadata that tracks the shared data extents, the extents undergoing CoW and it's own private extents. Of course, being XFS, nothing is simple - we use delayed allocation for CoW similar to how we use it for normal writes. ENOSPC is a significant issue here - we build on the reservation code added in 4.8-rc1 with the reverse mapping feature to ensure we don't get spurious ENOSPC issues part way through a CoW operation. These mechanisms also help minimise fragmentation due to repeated CoW operations. To further reduce fragmentation overhead, we've also introduced a CoW extent size hint, which indicates how large a region we should allocate when we execute a CoW operation. With all this functionality in place, we can hook up .copy_file_range, .clone_file_range and .dedupe_file_range and we gain all the capabilities of reflink and other vfs provided functionality that enable manipulation to shared extents. We also added a fallocate mode that explicitly unshares a range of a file, which we implemented as an explicit CoW of all the shared extents in a file. As such, it's a huge chunk of new functionality with new on-disk format features and internal infrastructure. It warns at mount time as an experimental feature and that it may eat data (as we do with all new on-disk features until they stabilise). We have not released userspace suport for it yet - userspace support currently requires download from Darrick's xfsprogs repo and build from source, so the access to this feature is really developer/tester only at this point. Initial userspace support will be released at the same time the kernel with this code in it is released. The new code causes 5-6 new failures with xfstests - these aren't serious functional failures but things the output of tests changing slightly due to perturbations in layouts, space usage, etc. OTOH, we've added 150+ new tests to xfstests that specifically exercise this new functionality so it's got far better test coverage than any functionality we've previously added to XFS. Darrick has done a pretty amazing job getting us to this stage, and special mention also needs to go to Christoph (review, testing, improvements and bug fixes) and Brian (caught several intricate bugs during review) for the effort they've also put in. Summary: - unshare range (FALLOC_FL_UNSHARE) support for fallocate - copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr interface - shared extent support for XFS - copy-on-write support for shared extents - copy_file_range support - clone_file_range support (implements reflink) - dedupe_file_range support - defrag support for reverse mapping enabled filesystems" * tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits) xfs: convert COW blocks to real blocks before unwritten extent conversion xfs: rework refcount cow recovery error handling xfs: clear reflink flag if setting realtime flag xfs: fix error initialization xfs: fix label inaccuracies xfs: remove isize check from unshare operation xfs: reduce stack usage of _reflink_clear_inode_flag xfs: check inode reflink flag before calling reflink functions xfs: implement swapext for rmap filesystems xfs: refactor swapext code xfs: various swapext cleanups xfs: recognize the reflink feature bit xfs: simulate per-AG reservations being critically low xfs: don't mix reflink and DAX mode for now xfs: check for invalid inode reflink flags xfs: set a default CoW extent size of 32 blocks xfs: convert unwritten status of reverse mappings for shared files xfs: use interval query for rmap alloc operations on shared files xfs: add shared rmap map/unmap/convert log item types xfs: increase log reservations for reflink ...
2016-10-11Merge branch 'for-linus' of ↵Linus Torvalds4-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: ">rename2() work from Miklos + current_time() from Deepa" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Replace current_fs_time() with current_time() fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps fs: Replace CURRENT_TIME with current_time() for inode timestamps fs: proc: Delete inode time initializations in proc_alloc_inode() vfs: Add current_time() api vfs: add note about i_op->rename changes to porting fs: rename "rename2" i_op to "rename" vfs: remove unused i_op->rename fs: make remaining filesystems use .rename2 libfs: support RENAME_NOREPLACE in simple_rename() fs: support RENAME_NOREPLACE for local filesystems ncpfs: fix unused variable warning
2016-10-11Merge remote-tracking branch 'ovl/rename2' into for-linusAl Viro1-2/+2
2016-10-11Merge branch 'work.xattr' of ↵Linus Torvalds1-15/+0
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs xattr updates from Al Viro: "xattr stuff from Andreas This completes the switch to xattr_handler ->get()/->set() from ->getxattr/->setxattr/->removexattr" * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: Remove {get,set,remove}xattr inode operations xattr: Stop calling {get,set,remove}xattr inode operations vfs: Check for the IOP_XATTR flag in listxattr xattr: Add __vfs_{get,set,remove}xattr helpers libfs: Use IOP_XATTR flag for empty directory handling vfs: Use IOP_XATTR flag for bad-inode handling vfs: Add IOP_XATTR inode operations flag vfs: Move xattr_resolve_name to the front of fs/xattr.c ecryptfs: Switch to generic xattr handlers sockfs: Get rid of getxattr iop sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names kernfs: Switch to generic xattr handlers hfs: Switch to generic xattr handlers jffs2: Remove jffs2_{get,set,remove}xattr macros xattr: Remove unnecessary NULL attribute name check
2016-10-11xfs: convert COW blocks to real blocks before unwritten extent conversionChristoph Hellwig1-2/+2
We need to splice COW blocks we've completed in xfs_end_io_direct_write into the data fork before converting unwritten extents. Otherwise xfs_bmapi_write might first allocate blocks for any holes in the data fork, which isn't only not needed but also harmful as it might cause reserved block underruns in the transaction. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-10Merge branch 'work.splice_read' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull splice fixups from Al Viro: "A couple of fixups for interaction of pipe-backed iov_iter with O_DIRECT reads + constification of a couple of primitives in uio.h missed by previous rounds. Kudos to davej - his fuzzing has caught those bugs" * 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: [btrfs] fix check_direct_IO() for non-iovec iterators constify iov_iter_count() and iter_is_iovec() fix ITER_PIPE interaction with direct_IO
2016-10-10Merge branch 'work.misc' of ↵Linus Torvalds6-44/+70
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted misc bits and pieces. There are several single-topic branches left after this (rename2 series from Miklos, current_time series from Deepa Dinamani, xattr series from Andreas, uaccess stuff from from me) and I'd prefer to send those separately" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits) proc: switch auxv to use of __mem_open() hpfs: support FIEMAP cifs: get rid of unused arguments of CIFSSMBWrite() posix_acl: uapi header split posix_acl: xattr representation cleanups fs/aio.c: eliminate redundant loads in put_aio_ring_file fs/internal.h: add const to ns_dentry_operations declaration compat: remove compat_printk() fs/buffer.c: make __getblk_slow() static proc: unsigned file descriptors fs/file: more unsigned file descriptors fs: compat: remove redundant check of nr_segs cachefiles: Fix attempt to read i_blocks after deleting file [ver #2] cifs: don't use memcpy() to copy struct iov_iter get rid of separate multipage fault-in primitives fs: Avoid premature clearing of capabilities fs: Give dentry to inode_change_ok() instead of inode fuse: Propagate dentry down to inode_change_ok() ceph: Propagate dentry down to inode_change_ok() xfs: Propagate dentry down to inode_change_ok() ...
2016-10-10fix ITER_PIPE interaction with direct_IOAl Viro1-1/+1
by making sure we call iov_iter_advance() on original iov_iter even if direct_IO (done on its copy) has returned 0. It's a no-op for old iov_iter flavours and does the right thing (== truncation of the stuff we'd allocated, but not filled) in ITER_PIPE case. Failures (e.g. -EIO) get caught and dealt with by cleanup in generic_file_read_iter(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-10xfs: rework refcount cow recovery error handlingDarrick J. Wong1-11/+9
The error handling in xfs_refcount_recover_cow_leftovers is confused and can potentially leak memory, so rework it to release resources correctly on error. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-10xfs: clear reflink flag if setting realtime flagDarrick J. Wong1-2/+2
Since we can only turn on the rt flag if there are no data extents, we can safely turn off the reflink flag if the rt flag is being turned on. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-10xfs: fix error initializationDarrick J. Wong1-1/+1
Eric Sandeen reported a gcc complaint about uninitialized error variables, so fix that. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-10xfs: fix label inaccuraciesDarrick J. Wong1-4/+4
Since we don't unlock anything on the way out, change the label. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-10xfs: remove isize check from unshare operationDarrick J. Wong1-12/+6
Now that fallocate has an explicit unshare flag again, let's try to remove the inode reflink flag whenever the user unshares any part of a file since checking is cheap compared to the CoW. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-10xfs: reduce stack usage of _reflink_clear_inode_flagDarrick J. Wong1-24/+16
The loop in _reflink_clear_inode_flag isn't necessary since we jump out if any part of any extent is shared. Remove the loop and we no longer need two maps, so we can save some stack use. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-10xfs: check inode reflink flag before calling reflink functionsDarrick J. Wong2-7/+9
There are a couple of places where we don't check the inode's reflink flag before calling into the reflink code. Fix those, and add some asserts so we don't make this mistake again. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-08Merge remote-tracking branch 'jk/vfs' into work.miscAl Viro6-44/+70
2016-10-08Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-0/+1
Merge updates from Andrew Morton: - fsnotify updates - ocfs2 updates - all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits) console: don't prefer first registered if DT specifies stdout-path cred: simpler, 1D supplementary groups CREDITS: update Pavel's information, add GPG key, remove snail mail address mailmap: add Johan Hovold .gitattributes: set git diff driver for C source code files uprobes: remove function declarations from arch/{mips,s390} spelling.txt: "modeled" is spelt correctly nmi_backtrace: generate one-line reports for idle cpus arch/tile: adopt the new nmi_backtrace framework nmi_backtrace: do a local dump_stack() instead of a self-NMI nmi_backtrace: add more trigger_*_cpu_backtrace() methods min/max: remove sparse warnings when they're nested Documentation/filesystems/proc.txt: add more description for maps/smaps mm, proc: fix region lost in /proc/self/smaps proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self proc: add LSM hook checks to /proc/<tid>/timerslack_ns proc: relax /proc/<tid>/timerslack_ns capability requirements meminfo: break apart a very long seq_printf with #ifdefs seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char proc: faster /proc/*/status ...
2016-10-08vfs: Remove {get,set,remove}xattr inode operationsAndreas Gruenbacher1-15/+0
These inode operations are no longer used; remove them. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-08ext2/4, xfs: call thp_get_unmapped_area() for pmd mappingsToshi Kani1-0/+1
To support DAX pmd mappings with unmodified applications, filesystems need to align an mmap address by the pmd size. Call thp_get_unmapped_area() from f_op->get_unmapped_area. Note, there is no change in behavior for a non-DAX file. Link: http://lkml.kernel.org/r/1472497881-9323-3-git-send-email-toshi.kani@hpe.com Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-08Merge branch 'work.splice_read' of ↵Linus Torvalds2-41/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS splice updates from Al Viro: "There's a bunch of branches this cycle, both mine and from other folks and I'd rather send pull requests separately. This one is the conversion of ->splice_read() to ITER_PIPE iov_iter (and introduction of such). Gets rid of a lot of code in fs/splice.c and elsewhere; there will be followups, but these are for the next cycle... Some pipe/splice-related cleanups from Miklos in the same branch as well" * 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: pipe: fix comment in pipe_buf_operations pipe: add pipe_buf_steal() helper pipe: add pipe_buf_confirm() helper pipe: add pipe_buf_release() helper pipe: add pipe_buf_get() helper relay: simplify relay_file_read() switch default_file_splice_read() to use of pipe-backed iov_iter switch generic_file_splice_read() to use of ->read_iter() new iov_iter flavour: pipe-backed fuse_dev_splice_read(): switch to add_to_pipe() skb_splice_bits(): get rid of callback new helper: add_to_pipe() splice: lift pipe_lock out of splice_to_pipe() splice: switch get_iovec_page_array() to iov_iter splice_to_pipe(): don't open-code wakeup_pipe_readers() consistent treatment of EFAULT on O_DIRECT read/write
2016-10-06xfs: implement swapext for rmap filesystemsDarrick J. Wong3-3/+173
Implement swapext for filesystems that have reverse mapping. Back in the reflink patches, we augmented the bmap code with a 'REMAP' flag that updates only the bmbt and doesn't touch the allocator and implemented log redo items for those two operations. Now we can rewrite extent swapping as a (looong) series of remap operations. This is far less efficient than the fork swapping method implemented in the past, so we only switch this on for rmap. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-06xfs: refactor swapext codeDarrick J. Wong1-131/+144
Refactor the swapext function to pull out the fork swapping piece into a separate function. In the next patch we'll add in the bit we need to make it work with rmap filesystems. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-06xfs: various swapext cleanupsDarrick J. Wong1-16/+16
Replace structure typedefs with struct expressions and fix some whitespace issues that result. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>