summaryrefslogtreecommitdiff
path: root/fs/xfs/xfs_iomap.c
AgeCommit message (Collapse)AuthorFilesLines
2019-11-13xfs: convert open coded corruption check to use XFS_IS_CORRUPTDarrick J. Wong1-4/+2
Convert the last of the open coded corruption check and report idioms to use the XFS_IS_CORRUPT macro. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-11xfs: attach dquots and reserve quota blocks during unwritten conversionDarrick J. Wong1-0/+10
In xfs_iomap_write_unwritten, we need to ensure that dquots are attached to the inode and quota blocks reserved so that we capture in the quota counters any blocks allocated to handle a bmbt split. This can happen on the first unwritten extent conversion to a preallocated sparse file on a fresh mount. This was found by running generic/311 with quotas enabled. The bug seems to have been introduced in "[XFS] rework iocore infrastructure, remove some code and make it more" from ~2002? Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-10xfs: refactor "does this fork map blocks" predicateDarrick J. Wong1-2/+1
Replace the open-coded checks for whether or not an inode fork maps blocks with a macro that will implant the code for us. This helps us declutter the bmap code a bit. Note that I had to use a macro instead of a static inline function because of C header dependency problems between xfs_inode.h and xfs_inode_fork.h. Conversion was performed with the following Coccinelle script: @@ expression ip, w; @@ - XFS_IFORK_FORMAT(ip, w) == XFS_DINODE_FMT_EXTENTS || XFS_IFORK_FORMAT(ip, w) == XFS_DINODE_FMT_BTREE + xfs_ifork_has_extents(ip, w) @@ expression ip, w; @@ - XFS_IFORK_FORMAT(ip, w) != XFS_DINODE_FMT_EXTENTS && XFS_IFORK_FORMAT(ip, w) != XFS_DINODE_FMT_BTREE + !xfs_ifork_has_extents(ip, w) @@ expression ip, w; @@ - XFS_IFORK_FORMAT(ip, w) == XFS_DINODE_FMT_BTREE || XFS_IFORK_FORMAT(ip, w) == XFS_DINODE_FMT_EXTENTS + xfs_ifork_has_extents(ip, w) @@ expression ip, w; @@ - XFS_IFORK_FORMAT(ip, w) != XFS_DINODE_FMT_BTREE && XFS_IFORK_FORMAT(ip, w) != XFS_DINODE_FMT_EXTENTS + !xfs_ifork_has_extents(ip, w) @@ expression ip, w; @@ - (xfs_ifork_has_extents(ip, w)) + xfs_ifork_has_extents(ip, w) @@ expression ip, w; @@ - (!xfs_ifork_has_extents(ip, w)) + !xfs_ifork_has_extents(ip, w) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-03xfs: simplify the xfs_iomap_write_direct callingChristoph Hellwig1-53/+29
Move the EOF alignment and checking for the next allocated extent into the callers to avoid the need to pass the byte based offset and count as well as looking at the incoming imap. The added benefit is that the caller can unlock the incoming ilock and the function doesn't have funny unbalanced locking contexts. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-03xfs: remove the extsize argument to xfs_eof_alignmentChristoph Hellwig1-15/+13
And move the code dependent on it to the one caller that cares instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-03xfs: mark xfs_eof_alignment staticChristoph Hellwig1-1/+1
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-03xfs: simplify xfs_iomap_eof_align_last_fsbChristoph Hellwig1-25/+22
By open coding xfs_bmap_last_extent instead of calling it through a double indirection we don't need to handle an error return that can't happen given that we are guaranteed to have the extent list in memory already. Also simplify the calling conventions a little and move the extent list assert from the only caller into the function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-29xfs: rename the XFS_MOUNT_DFLT_IOSIZE option toChristoph Hellwig1-2/+2
Make the flag match the mount option and usage. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-29xfs: rename the m_writeio_* fields in struct xfs_mountChristoph Hellwig1-8/+8
Use the allocsize name to match the mount option and usage instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-28xfs: add a xfs_inode_buftarg helperChristoph Hellwig1-4/+7
Add a new xfs_inode_buftarg helper that gets the data I/O buftarg for a given inode. Replace the existing xfs_find_bdev_for_inode and xfs_find_daxdev_for_inode helpers with this new general one and cleanup some of the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-24xfs: don't set bmapi total block req where minleft isBrian Foster1-2/+2
xfs_bmapi_write() takes a total block requirement parameter that is passed down to the block allocation code and is used to specify the total block requirement of the associated transaction. This is used to try and select an AG that can not only satisfy the requested extent allocation, but can also accommodate subsequent allocations that might be required to complete the transaction. For example, additional bmbt block allocations may be required on insertion of the resulting extent to an inode data fork. While it's important for callers to calculate and reserve such extra blocks in the transaction, it is not necessary to pass the total value to xfs_bmapi_write() in all cases. The latter automatically sets minleft to ensure that sufficient free blocks remain after the allocation attempt to expand the format of the associated inode (i.e., such as extent to btree conversion, btree splits, etc). Therefore, any callers that pass a total block requirement of the bmap mapping length plus worst case bmbt expansion essentially specify the additional reservation requirement twice. These callers can pass a total of zero to rely on the bmapi minleft policy. Beyond being superfluous, the primary motivation for this change is that the total reservation logic in the bmbt code is dubious in scenarios where minlen < maxlen and a maxlen extent cannot be allocated (which is more common for data extent allocations where contiguity is not required). The total value is based on maxlen in the xfs_bmapi_write() caller. If the bmbt code falls back to an allocation between minlen and maxlen, that allocation will not succeed until total is reset to minlen, which essentially throws away any additional reservation included in total by the caller. In addition, the total value is not reset until after alignment is dropped, which means that such callers drop alignment far too aggressively than necessary. Update all callers of xfs_bmapi_write() that pass a total block value of the mapping length plus bmbt reservation to instead pass zero and rely on xfs_bmapi_minleft() to enforce the bmbt reservation requirement. This trades off slightly less conservative AG selection for the ability to preserve alignment in more scenarios. xfs_bmapi_write() callers that incorporate unrelated or additional reservations in total beyond what is already included in minleft must continue to use the former. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: improve the IOMAP_NOWAIT check for COW inodesChristoph Hellwig1-18/+5
Only bail out once we know that a COW allocation is actually required, similar to how we handle normal data fork allocations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: cleanup xfs_direct_write_iomap_beginChristoph Hellwig1-42/+46
Move more checks into the helpers that determine if we need a COW operation or allocation and split the return path for when an existing data for allocation has been found versus a new allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: rename the whichfork variable in xfs_buffered_write_iomap_beginChristoph Hellwig1-11/+11
Renaming whichfork to allocfork in xfs_buffered_write_iomap_begin makes the usage of this variable a little more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: split the iomap ops for buffered vs direct writesChristoph Hellwig1-40/+21
Instead of lots of magic conditionals in the main write_begin handler this make the intent very clear. Thing will become even better once we support delayed allocations for extent size hints and realtime allocations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: move xfs_file_iomap_begin_delay aroundChristoph Hellwig1-213/+221
Move xfs_file_iomap_begin_delay near the end of the file next to the other iomap functions to prepare for additional refactoring. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: split out a new set of read-only iomap opsChristoph Hellwig1-16/+47
Start untangling xfs_file_iomap_begin by splitting out the read-only case into its own set of iomap_ops with a very simply iomap_begin helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: factor out a helper to calculate the end_fsbChristoph Hellwig1-20/+20
We have lots of places that want to calculate the final fsb for a offset + count in bytes and check that the result fits into s_maxbytes. Factor out a helper for that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: fill out the srcmap in iomap_beginChristoph Hellwig1-25/+24
Replace our local hacks to report the source block in the main iomap with the proper scrmap reporting. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: refactor xfs_file_iomap_begin_delayChristoph Hellwig1-24/+24
Rejuggle the return path to prepare for filling out a source iomap. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: pass two imaps to xfs_reflink_allocate_cowChristoph Hellwig1-4/+4
xfs_reflink_allocate_cow consumes the source data fork imap, and potentially returns the COW fork imap. Split the arguments in two to clear up the calling conventions and to prepare for returning a source iomap from ->iomap_begin. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: also call xfs_file_iomap_end_delalloc for zeroing operationsChristoph Hellwig1-1/+2
There is no reason not to punch out stale delalloc blocks for zeroing operations, as they otherwise behave exactly like normal writes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: use a srcmap for a read-modify-write I/OGoldwyn Rodrigues1-3/+6
The srcmap is used to identify where the read is to be performed from. It is passed to ->iomap_begin, which can fill it in if we need to read data for partially written blocks from a different location than the write target. The srcmap is only supported for buffered writes so far. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> [hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as srcmap if not set, adjust length down to srcmap end as well] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2019-10-21xfs: set IOMAP_F_NEW more carefullyChristoph Hellwig1-3/+6
Don't set IOMAP_F_NEW if we COW over an existing allocated range, as these aren't strictly new allocations. This is required to be able to use IOMAP_F_NEW to zero newly allocated blocks, which is required for the iomap code to fully support file systems that don't do delayed allocations or use unwritten extents. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: initialize iomap->flags in xfs_bmbt_to_iomapChristoph Hellwig1-12/+18
Currently we don't overwrite the flags field in the iomap in xfs_bmbt_to_iomap. This works fine with 0-initialized iomaps on stack, but is harmful once we want to be able to reuse an iomap in the writeback code. Replace the shared parameter with a set of initial flags an thus ensures the flags field is always reinitialized. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-17iomap: iomap that extends beyond EOF should be marked dirtyDave Chinner1-0/+7
When doing a direct IO that spans the current EOF, and there are written blocks beyond EOF that extend beyond the current write, the only metadata update that needs to be done is a file size extension. However, we don't mark such iomaps as IOMAP_F_DIRTY to indicate that there is IO completion metadata updates required, and hence we may fail to correctly sync file size extensions made in IO completion when O_DSYNC writes are being used and the hardware supports FUA. Hence when setting IOMAP_F_DIRTY, we need to also take into account whether the iomap spans the current EOF. If it does, then we need to mark it dirty so that IO completion will call generic_write_sync() to flush the inode size update to stable storage correctly. Fixes: 3460cac1ca76 ("iomap: Use FUA for pure data O_DSYNC DIO writes") Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: removed the ext4 part; they'll handle it separately] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-09-03xfs: add a xfs_valid_startblock helperChristoph Hellwig1-3/+3
Add a helper that validates the startblock is valid. This checks for a non-zero block on the main device, but skips that check for blocks on the realtime device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-30xfs: remove XFS_TRANS_NOFSChristoph Hellwig1-1/+1
Instead of a magic flag for xfs_trans_alloc, just ensure all callers that can't relclaim through the file system use memalloc_nofs_save to set the per-task nofs flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-29xfs: remove unused header filesEric Sandeen1-3/+0
There are many, many xfs header files which are included but unneeded (or included twice) in the xfs code, so remove them. nb: xfs_linux.h includes about 9 headers for everyone, so those explicit includes get removed by this. I'm not sure what the preference is, but if we wanted explicit includes everywhere, a followup patch could remove those xfs_*.h includes from xfs_linux.h and move them into the files that need them. Or it could be left as-is. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-25xfs: rework breaking of shared extents in xfs_file_iomap_beginDarrick J. Wong1-5/+7
Rework the data flow in xfs_file_iomap_begin where we decide if we have to break shared extents. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-25xfs: don't pass iomap flags to xfs_reflink_allocate_cowDarrick J. Wong1-6/+7
Don't pass raw iomap flags to xfs_reflink_allocate_cow; signal our intention with a boolean argument. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-02-21xfs: introduce an always_cow modeChristoph Hellwig1-10/+18
Add a mode where XFS never overwrites existing blocks in place. This is to aid debugging our COW code, and also put infatructure in place for things like possible future support for zoned block devices, which can't support overwrites. This mode is enabled globally by doing a: echo 1 > /sys/fs/xfs/debug/always_cow Note that the parameter is global to allow running all tests in xfstests easily in this mode, which would not easily be possible with a per-fs sysfs file. In always_cow mode persistent preallocations are disabled, and fallocate will fail when called with a 0 mode (with our without FALLOC_FL_KEEP_SIZE), and not create unwritten extent for zeroed space when called with FALLOC_FL_ZERO_RANGE or FALLOC_FL_UNSHARE_RANGE. There are a few interesting xfstests failures when run in always_cow mode: - generic/392 fails because the bytes used in the file used to test hole punch recovery are less after the log replay. This is because the blocks written and then punched out are only freed with a delay due to the logging mechanism. - xfs/170 will fail as the already fragile file streams mechanism doesn't seem to interact well with the COW allocator - xfs/180 xfs/182 xfs/192 xfs/198 xfs/204 and xfs/208 will claim the file system is badly fragmented, but there is not much we can do to avoid that when always writing out of place - xfs/205 fails because overwriting a file in always_cow mode will require new space allocation and the assumption in the test thus don't work anymore. - xfs/326 fails to modify the file at all in always_cow mode after injecting the refcount error, leading to an unexpected md5sum after the remount, but that again is expected Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-21xfs: report IOMAP_F_SHARED from xfs_file_iomap_begin_delayChristoph Hellwig1-3/+4
No user of it in the iomap code at the moment, but we should not actively report wrong information if we can trivially get it right. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-21xfs: merge COW handling into xfs_file_iomap_begin_delayChristoph Hellwig1-39/+94
Besides simplifying the code a bit this allows to actually implement the behavior of using COW preallocation for non-COW data mentioned in the current comments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-21xfs: don't use delalloc extents for COW on files with extsize hintsChristoph Hellwig1-11/+18
While using delalloc for extsize hints is generally a good idea, the current code that does so only for COW doesn't help us much and creates a lot of special cases. Switch it to use real allocations like we do for direct I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-21xfs: fix SEEK_DATA for speculative COW fork preallocationChristoph Hellwig1-0/+86
We speculatively allocate extents in the COW fork to reduce fragmentation. But when we write data into such COW fork blocks that do now shadow an allocation in the data fork SEEK_DATA will not correctly report it, as it only looks at the data fork extents. The only reason why that hasn't been an issue so far is because we even use these speculative COW fork preallocations over holes in the data fork at all for buffered writes, and blocks in the COW fork that are written by direct writes are moved into the data fork immediately at I/O completion time. Add a new set of iomap_ops for SEEK_HOLE/SEEK_DATA which looks into both the COW and data fork, and reports all COW extents as unwritten to the iomap layer. While this isn't strictly true for COW fork extents that were already converted to real extents, the practical semantics that you can't read data from them until they are moved into the data fork are very similar, and this will force the iomap layer into probing the extents for actually present data. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-21xfs: make xfs_bmbt_to_iomap more usefulChristoph Hellwig1-46/+38
Move checking for invalid zero blocks and setting of various iomap flags into this helper. Also make it deal with "raw" delalloc extents to avoid clutter in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-17xfs: move xfs_iomap_write_allocate to xfs_aops.cChristoph Hellwig1-81/+0
This function is a small wrapper only used by the writeback code, so move it together with the writeback code and simplify it down to the glorified do { } while loop that is now is. A few bits intentionally got lost here: no need to call xfs_qm_dqattach because quotas are always attached when we create the delalloc reservation, and no need for the imap->br_startblock == 0 check given that xfs_bmapi_convert_delalloc already has a WARN_ON_ONCE for exactly that condition. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-17xfs: move stat accounting to xfs_bmapi_convert_delallocChristoph Hellwig1-4/+0
This way we can actually count how many bytes got converted and how many calls we need, unlike in the caller which doesn't have the detailed view. Note that this includes a slight change in behavior as the xs_xstrat_quick is now bumped for every allocation instead of just the one covering the requested writeback offset, which makes a lot more sense. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-17xfs: move transaction handling to xfs_bmapi_convert_delallocChristoph Hellwig1-28/+4
No need to deal with the transaction and the inode locking in the caller. Note that we also switch to passing whichfork as the second paramter, matching what most related functions do. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-17xfs: remove the io_type field from the writeback context and ioendChristoph Hellwig1-4/+4
The io_type field contains what is basically a summary of information from the inode fork and the imap. But we can just as easily use that information directly, simplifying a few bits here and there and improving the trace points. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-12xfs: use the latest extent at writeback delalloc conversion timeBrian Foster1-113/+57
The writeback delalloc conversion code is racy with respect to changes in the currently cached file mapping outside of the current page. This is because the ilock is cycled between the time the caller originally looked up the mapping and across each real allocation of the provided file range. This code has collected various hacks over the years to help combat the symptoms of these races (i.e., truncate race detection, allocation into hole detection, etc.), but none address the fundamental problem that the imap may not be valid at allocation time. Rather than continue to use race detection hacks, update writeback delalloc conversion to a model that explicitly converts the delalloc extent backing the current file offset being processed. The current file offset is the only block we can trust to remain once the ilock is dropped because any operation that can remove the block (truncate, hole punch, etc.) must flush and discard pagecache pages first. Modify xfs_iomap_write_allocate() to use the xfs_bmapi_delalloc() mechanism to request allocation of the entire delalloc extent backing the current offset instead of assuming the extent passed by the caller is unchanged. Record the range specified by the caller and apply it to the resulting allocated extent so previous checks by the caller for COW fork overlap are not lost. Finally, overload the bmapi delalloc flag with the range reval flag behavior since this is the only use case for both. This ensures that writeback always picks up the correct and current extent associated with the page, regardless of races with other extent modifying operations. If operating on a data fork and the COW overlap state has changed since the ilock was cycled, the caller revalidates against the COW fork sequence number before using the imap for the next block. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-12xfs: validate writeback mapping using data fork seq counterBrian Foster1-3/+2
The writeback code caches the current extent mapping across multiple xfs_do_writepage() calls to avoid repeated lookups for sequential pages backed by the same extent. This is known to be slightly racy with extent fork changes in certain difficult to reproduce scenarios. The cached extent is trimmed to within EOF to help avoid the most common vector for this problem via speculative preallocation management, but this is a band-aid that does not address the fundamental problem. Now that we have an xfs_ifork sequence counter mechanism used to facilitate COW writeback, we can use the same mechanism to validate consistency between the data fork and cached writeback mappings. On its face, this is somewhat of a big hammer approach because any change to the data fork invalidates any mapping currently cached by a writeback in progress regardless of whether the data fork change overlaps with the range under writeback. In practice, however, the impact of this approach is minimal in most cases. First, data fork changes (delayed allocations) caused by sustained sequential buffered writes are amortized across speculative preallocations. This means that a cached mapping won't be invalidated by each buffered write of a common file copy workload, but rather only on less frequent allocation events. Second, the extent tree is always entirely in-core so an additional lookup of a usable extent mostly costs a shared ilock cycle and in-memory tree lookup. This means that a cached mapping reval is relatively cheap compared to the I/O itself. Third, spurious invalidations don't impact ioend construction. This means that even if the same extent is revalidated multiple times across multiple writepage instances, we still construct and submit the same size ioend (and bio) if the blocks are physically contiguous. Update struct xfs_writepage_ctx with a new field to hold the sequence number of the data fork associated with the currently cached mapping. Check the wpc seqno against the data fork when the mapping is validated and reestablish the mapping whenever the fork has changed since the mapping was cached. This ensures that writeback always uses a valid extent mapping and thus prevents lost writebacks and stale delalloc block problems. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-10-18xfs: remove the unused trimmed argument from xfs_reflink_trim_around_sharedChristoph Hellwig1-3/+2
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18xfs: remove the unused shared argument to xfs_reflink_reserve_cowChristoph Hellwig1-4/+2
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18xfs: handle zeroing in xfs_file_iomap_begin_delayChristoph Hellwig1-6/+38
We only need to allocate blocks for zeroing for reflink inodes, and for we currently have a special case for reflink files in the otherwise direct I/O path that I'd like to get rid of. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-08-07xfs: use WRITE_ONCE to update if_seqChristoph Hellwig1-1/+2
This adds ordering of the updates and makes sure we always see the if_seq update before the extent tree is modified. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-03xfs: automatic dfops inode reloggingBrian Foster1-3/+0
Inodes that are held across deferred operations are explicitly joined to the dfops structure to ensure appropriate relogging. While inodes are currently joined explicitly, we can detect the conditions that require relogging at dfops finish time by inspecting the transaction item list for inodes with ili_lock_flags == 0. Replace the xfs_defer_ijoin() infrastructure with such detection and automatic relogging of held inodes. This eliminates the need for the per-dfops inode list, replaced by an on-stack variant in xfs_defer_trans_roll(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-08-03xfs: add missing defer ijoins for held inodesBrian Foster1-0/+3
Log items that require relogging during deferred operations processing are explicitly joined to the associated dfops via the xfs_defer_*join() helpers. These calls imply that the associated object is "held" by the transaction such that when rolled, the item can be immediately joined to a follow up transaction. For buffers, this means the buffer remains locked and held after each roll. For inodes, this means that the inode remains locked. Failure to join a held item to the dfops structure means the associated object pins the tail of the log while dfops processing completes, because the item never relogs and is not unlocked or released until deferred processing completes. Currently, all buffers that are held in transactions (XFS_BLI_HOLD) with deferred operations are explicitly joined to the dfops. This is not the case for inodes, however, as various contexts defer operations to transactions with held inodes without explicit joins to the associated dfops (and thus not relogging). While this is not a catastrophic problem, it is not ideal. Given that we want to eventually relog such items automatically during dfops processing, start by explicitly adding these missing xfs_defer_ijoin() calls. A call is added everywhere an inode is joined to a transaction without transferring lock ownership and said transaction runs deferred operations. All xfs_defer_ijoin() calls will eventually be replaced by automatic dfops inode relogging. This patch essentially implements the behavior change that would otherwise occur due to automatic inode dfops relogging. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-31xfs: avoid COW fork extent lookups in writeback if the fork didn't changeChristoph Hellwig1-1/+4
Used the per-fork sequence counter to avoid lookups in the writeback code unless the COW fork actually changed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>