summaryrefslogtreecommitdiff
path: root/fs/xfs/xfs_bmap_util.c
AgeCommit message (Collapse)AuthorFilesLines
2018-07-12xfs: remove xfs_bmapi_write() dfops paramBrian Foster1-1/+1
Now that all callers use ->t_dfops, the xfs_bmapi_write() dfops parameter is no longer necessary. Remove it and access ->t_dfops directly. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-12xfs: use ->t_dfops for all xfs_bmapi_write() callersBrian Foster1-2/+3
Attach ->t_dfops for all remaining callers of xfs_bmapi_write(). This prepares the latter to no longer require a separate dfops parameter. Note that xfs_symlink() already uses ->t_dfops. Fix up the local references for consistency. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-12xfs: move locking into xfs_bmap_punch_delalloc_rangeChristoph Hellwig1-4/+5
Both callers want the same looking, so do it only once. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-24xfs: ensure post-EOF zeroing happens after zeroing part of a fileDarrick J. Wong1-1/+16
If a user asks us to zero_range part of a file, the end of the range is EOF, and not aligned to a page boundary, invoke writeback of the EOF page to ensure that the post-EOF part of the page is zeroed. This ensures that we don't expose stale memory contents via mmap, if in a clumsy manner. Found by running generic/127 when it runs zero_range and mapread at EOF one after the other. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2018-06-24xfs: don't allow insert-range to shift extents past the maximum offsetDarrick J. Wong1-0/+4
Zorro Lang reports that generic/485 blows an assert on a filesystem with 512 byte blocks. The test tries to fallocate a post-eof extent at the maximum file size and calls insert range to shift the extents right by two blocks. On a 512b block filesystem this causes startoff to overflow the 54-bit startoff field, leading to the assert. Therefore, always check the rightmost extent to see if it would overflow prior to invoking the insert range machinery. Reported-by: zlang@redhat.com Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200137 Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-22xfs: simplify xfs_bmap_punch_delalloc_rangeChristoph Hellwig1-53/+32
Instead of using xfs_bmapi_read to find delalloc extents and then punch them out using xfs_bunmapi, opencode the loop to iterate over the extents and call xfs_bmap_del_extent_delay directly. This both simplifies the code and reduces the number of extent tree lookups required. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-08xfs: replace do_mod with native operationsDave Chinner1-4/+8
do_mod() is a hold-over from when we have different sizes for file offsets and and other internal values for 40 bit XFS filesystems. Hence depending on build flags variables passed to do_mod() could change size. We no longer support those small format filesystems and hence everything is of fixed size theses days, even on 32 bit platforms. As such, we can convert all the do_mod() callers to platform optimised modulus operations as defined by linux/math64.h. Individual conversions depend on the types of variables being used. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-07xfs: convert to SPDX license tagsDave Chinner1-13/+1
Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-16xfs: factor out nodiscard helpersBrian Foster1-2/+2
The changes to skip discards of speculative preallocation and unwritten extents introduced several new wrapper functions through the bunmapi -> extent free codepath to reduce churn in all of the associated callers. In several cases, these wrappers simply toggle a single flag to skip or not skip discards for the resulting blocks. The explicit _nodiscard() wrappers for such an isolated set of callers is a bit overkill. Kill off these wrappers and replace with the calls to the underlying functions in the contexts that need to control discard behavior. Retain the wrappers that preserve the original calling conventions to serve the original purpose of reducing code churn. This is a refactoring patch and does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-10xfs: remove unnecessary xfs_qm_dqattach parameterDarrick J. Wong1-3/+3
The flags argument is always zero, get rid of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-05-10xfs: skip online discard during eofblocks trimsBrian Foster1-2/+2
We've had reports of online discard operations being sent from XFS on write-only workloads. These discards occur as a result of eofblocks trims that can occur after a large file copy completes. These discards are slightly confusing for users who might be paying close attention to online discards (i.e., vdo) due to performance sensitivity. They also happen to be spurious because freed post-eof blocks by definition have not been written to during the current allocation cycle. Update xfs_free_eofblocks() to skip discards that are purely attributed to eofblocks trims. This cuts down the number of spurious discards that may occur on write-only workloads due to normal preallocation activity. Note that discards of post-eof extents can still occur from other codepaths that do not isolate handling of post-eof blocks from those within eof. For example, file unlinks and truncates may still cause discards for any file blocks affected by the operation. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-04-09xfs: non-scrub - remove unused function parametersEric Sandeen1-2/+1
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-15xfs: remove xfs_zero_rangeChristoph Hellwig1-7/+4
This helper doesn't add any real value over just calling iomap_zero_range directly, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-15xfs: fix the check for COW extents in xfs_swap_extentsChristoph Hellwig1-2/+2
i_cnextents does not include delayed allocated extents, so switch to the inode fork size check that we already use in other places instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-12xfs: account format bouncing into rmapbt swapext tx reservationBrian Foster1-9/+20
The extent swap mechanism requires a unique implementation for rmapbt enabled filesystems. Because the rmapbt tracks extent owner information, extent swap must individually unmap and remap each extent between the two inodes. The rmapbt extent swap transaction block reservation currently accounts for the worst case bmapbt block and rmapbt block consumption based on the extent count of each inode. There is a corner case that exists due to the extent swap implementation that is not covered by this reservation, however. If one of the associated inodes is just over the max extent count used for extent format inodes (i.e., the inode is in btree format by a single extent), the unmap/remap cycle of the extent swap can bounce the inode between extent and btree format multiple times, almost as many times as there are extents in the inode (if the opposing inode happens to have one less, for example). Each back and forth cycle involves a block free and allocation, which isn't a problem except for that the initial transaction reservation must account for the total number of block allocations performed by the chain of deferred operations. If not, a block reservation overrun occurs and the filesystem shuts down. Update the rmapbt extent swap block reservation to check for this situation and add some block reservation slop to ensure the entire operation succeeds. We'd never likely require reservation for both inodes as fsr wouldn't defrag the file in that case, but the additional reservation is constrained by the data fork size so be cautious and check for both. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-01-29xfs: allow xfs_lock_two_inodes to take different EXCL/SHARED modesDarrick J. Wong1-2/+2
Refactor xfs_lock_two_inodes to take separate locking modes for each inode. Specifically, this enables us to take a SHARED lock on one inode and an EXCL lock on the other. The lock class (MMAPLOCK/ILOCK) must be the same for each inode. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-11-06xfs: remove support for inlining data/extents into the inode forkChristoph Hellwig1-15/+0
Supporting a small bit of data inside the inode fork blows up the fork size a lot, removing the 32 bytes of inline data halves the effective size of the inode fork (and it still has a lot of unused padding left), and the performance of a single kmalloc doesn't show up compared to the size to read an inode or create one. It also simplifies the fork management code a lot. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-06xfs: introduce the xfs_iext_cursor abstractionChristoph Hellwig1-5/+7
Add a new xfs_iext_cursor structure to hide the direct extent map index manipulations. In addition to the existing lookup/get/insert/ remove and update routines new primitives to get the first and last extent cursor, as well as moving up and down by one extent are provided. Also new are convenience to increment/decrement the cursor and retreive the new extent, as well as to peek into the previous/next extent without updating the cursor and last but not least a macro to iterate over all extents in a fork. [darrick: rename for_each_iext to for_each_xfs_iext] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-27xfs: add asserts for the mmap lock in xfs_{insert,collapse}_file_spaceChristoph Hellwig1-0/+4
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-27xfs: split xfs_bmap_shift_extentsChristoph Hellwig1-8/+6
Have a separate helper for insert vs collapse, as this prepares us for simplifying the code in the next patches. Also changed the done output argument to a bool intead of int for both new functions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-27xfs: remove XFS_BMAP_MAX_SHIFT_EXTENTSChristoph Hellwig1-12/+2
The define was always set to 1, which means looping until we reach is was dead code from the start. Also remove an initialization of next_fsb for the done case that doesn't fit the new code flow - it was never checked by the caller in the done case to start with. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-27xfs: inline xfs_shift_file_space into callersChristoph Hellwig1-90/+102
The code is sufficiently different for the insert vs collapse cases both in xfs_shift_file_space itself and the callers that untangling them will make life a lot easier down the road. We still keep a common helper for flushing all data and COW state to get the inode into the right shape for shifting the extents around. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-27xfs: simplify the xfs_getbmap interfaceChristoph Hellwig1-29/+9
Instead of passing in a formatter callback allocate the bmap buffer in the caller and process the entries there. Additionally replace the in-kernel buffer with a new much smaller structure, and unify the implementation of the different ioctls in a single function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-27xfs: rewrite getbmap using the xfs_iext_* helpersChristoph Hellwig1-317/+208
Currently getbmap uses xfs_bmapi_read to query the extent map, and then fixes up various bits that are eventually reported to userspace. This patch instead rewrites it to use xfs_iext_lookup_extent and xfs_iext_get_extent to iteratively process the extent map. This not only avoids the need to allocate a map for the returned xfs_bmbt_irec structures but also greatly simplified the code. There are two intentional behavior changes compared to the old code: - the current code reports unwritten extents that don't directly border a written one as unwritten even when not passing the BMV_IF_PREALLOC option, contrary to the documentation. The new code requires the BMV_IF_PREALLOC flag to report the unwrittent extent bit. - The new code does never merges consecutive extents, unlike the old code that sometimes does it based on the boundaries of the xfs_bmapi_read calls. Note that the extent merging behavior was entirely undocumented. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-11xfs: move more RT specific code under CONFIG_XFS_RTDave Chinner1-0/+2
Various utility functions and interfaces that iterate internal devices try to reference the realtime device even when RT support is not compiled into the kernel. Make sure this code is excluded from the CONFIG_XFS_RT=n build, and where appropriate stub functions to return fatal errors if they ever get called when RT support is not present. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-04xfs: always swap the cow forks when swapping extentsDarrick J. Wong1-2/+22
Since the CoW fork exists as a secondary data structure to the data fork, we must always swap cow forks during swapext. We also need to swap the extent counts and reset the cowblocks tags. Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-26xfs: evict CoW fork extents when performing finsert/fcollapseDarrick J. Wong1-1/+13
When we perform an finsert/fcollapse operation, cancel all the CoW extents for the affected file offset range so that they don't end up pointing to the wrong blocks. Reported-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01xfs: rewrite xfs_bmap_count_leaves using xfs_iext_get_extentChristoph Hellwig1-11/+10
This avoids poking into the internals of the extent list. Also return the number of extents as the return value instead of an additional by reference argument, and make it available to callers outside of xfs_bmap_util.c Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01xfs: relog dirty buffers during swapext bmbt owner changeBrian Foster1-10/+47
The owner change bmbt scan that occurs during extent swap operations does not handle ordered buffer failures. Buffers that cannot be marked ordered must be physically logged so previously dirty ranges of the buffer can be relogged in the transaction. Since the bmbt scan may need to process and potentially log a large number of blocks, we can't expect to complete this operation in a single transaction. Update extent swap to use a permanent transaction with enough log reservation to physically log a buffer. Update the bmbt scan to physically log any buffers that cannot be ordered and to terminate the scan with -EAGAIN. On -EAGAIN, the caller rolls the transaction and restarts the scan. Finally, update the bmbt scan helper function to skip bmbt blocks that already match the expected owner so they are not reprocessed after scan restarts. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [darrick: fix the xfs_trans_roll call] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01xfs: move bmbt owner change to last step of extent swapBrian Foster1-18/+26
The extent swap operation currently resets bmbt block owners before the inode forks are swapped. The bmbt buffers are marked as ordered so they do not have to be physically logged in the transaction. This use of ordered buffers is not safe as bmbt buffers may have been previously physically logged. The bmbt owner change algorithm needs to be updated to physically log buffers that are already dirty when/if they are encountered. This means that an extent swap will eventually require multiple rolling transactions to handle large btrees. In addition, all inode related changes must be logged before the bmbt owner change scan begins and can roll the transaction for the first time to preserve fs consistency via log recovery. In preparation for such fixes to the bmbt owner change algorithm, refactor the bmbt scan out of the extent fork swap code to the last operation before the transaction is committed. Update xfs_swap_extent_forks() to only set the inode log flags when an owner change scan is necessary. Update xfs_swap_extents() to trigger the owner change based on the inode log flags. Note that since the owner change now occurs after the extent fork swap, the inode btrees must be fixed up with the inode number of the current inode (similar to log recovery). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-09-01xfs: remove the ip argument to xfs_defer_finishChristoph Hellwig1-4/+6
And instead require callers to explicitly join the inode using xfs_defer_ijoin. Also consolidate the defer error handling in a few places using a goto label. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-06-20xfs: refactor the ifork block counting functionDarrick J. Wong1-43/+66
Refactor the inode fork block counting function to count extents for us at the same time. This will be used by the bmbt scrubber function. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-06-20xfs: make _bmap_count_blocks consistent wrt delalloc extent behaviorDarrick J. Wong1-10/+13
There is an inconsistency in the way that _bmap_count_blocks deals with delalloc reservations -- if the specified fork is in extents format, *count is set to the total number of blocks referenced by the in-core fork, including delalloc extents. However, if the fork is in btree format, *count is set to the number of blocks referenced by the on-disk fork, which does /not/ include delalloc extents. For the lone existing caller of _bmap_count_blocks this hasn't been an issue because the function is only used to count xattr fork blocks (where there aren't any delalloc reservations). However, when scrub comes along it will use this same function to check di_nblocks against both on-disk extent maps, so we need this behavior to be consistent. Therefore, fix _bmap_count_leaves not to include delalloc extents and remove unnecessary parameters. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-06-20xfs: reflink find shared should take a transactionDarrick J. Wong1-2/+2
Adapt _reflink_find_shared to take an optional transaction pointer. The inode scrubber code will need to decide (within transaction context) if a file has shared blocks. To avoid buffer deadlocks, we must pass the tp through to this function's utility calls. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-06-20xfs: remove double-underscore integer typesDarrick J. Wong1-12/+12
This is a purely mechanical patch that removes the private __{u,}int{8,16,32,64}_t typedefs in favor of using the system {u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform the transformation and fix the resulting whitespace and indentation errors: s/typedef\t__uint8_t/typedef __uint8_t\t/g s/typedef\t__uint/typedef __uint/g s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g s/__uint8_t\t/__uint8_t\t\t/g s/__uint/uint/g s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g s/__int/int/g /^typedef.*int[0-9]*_t;$/d Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-06-19xfs: avoid harmless gcc-7 warningsArnd Bergmann1-2/+2
gcc-7 flags the use of integer math inside of a condition as a potential bug: fs/xfs/xfs_bmap_util.c: In function 'xfs_swap_extents_check_format': fs/xfs/xfs_bmap_util.c:1619:8: error: '<<' in boolean context, did you mean '<' ? [-Werror=int-in-bool-context] fs/xfs/xfs_bmap_util.c:1629:8: error: '<<' in boolean context, did you mean '<' ? [-Werror=int-in-bool-context] There is already a helper function for testing the di_forkoff field for zero, so let's use that instead to shut up the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-05-16xfs: bad assertion for delalloc an extent that start at i_sizeZorro Lang1-1/+1
By run fsstress long enough time enough in RHEL-7, I find an assertion failure (harder to reproduce on linux-4.11, but problem is still there): XFS: Assertion failed: (iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c The assertion is in xfs_getbmap() funciton: if (map[i].br_startblock == DELAYSTARTBLOCK && --> map[i].br_startoff <= XFS_B_TO_FSB(mp, XFS_ISIZE(ip))) ASSERT((iflags & BMV_IF_DELALLOC) != 0); When map[i].br_startoff == XFS_B_TO_FSB(mp, XFS_ISIZE(ip)), the startoff is just at EOF. But we only need to make sure delalloc extents that are within EOF, not include EOF. Signed-off-by: Zorro Lang <zlang@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-05-16xfs: BMAPX shouldn't barf on inline-format directoriesDarrick J. Wong1-2/+6
When we're fulfilling a BMAPX request, jump out early if the data fork is in local format. This prevents us from hitting a debugging check in bmapi_read and barfing errors back to userspace. The on-disk extent count check later isn't sufficient for IF_DELALLOC mode because da extents are in memory and not on disk. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-05-06Merge tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-13/+7
Pull xfs updates from Darrick Wong: "Here are the XFS changes for 4.12. The big new feature for this release is the new space mapping ioctl that we've been discussing since LSF2016, but other than that most of the patches are larger bug fixes, memory corruption prevention, and other cleanups. Summary: - various code cleanups - introduce GETFSMAP ioctl - various refactoring - avoid dio reads past eof - fix memory corruption and other errors with fragmented directory blocks - fix accidental userspace memory corruptions - publish fs uuid in superblock - make fstrim terminatable - fix race between quotaoff and in-core inode creation - avoid use-after-free when finishing up w/ buffer heads - reserve enough space to handle bmap tree resizing during cow remap" * tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (53 commits) xfs: fix use-after-free in xfs_finish_page_writeback xfs: reserve enough blocks to handle btree splits when remapping xfs: wait on new inodes during quotaoff dquot release xfs: update ag iterator to support wait on new inodes xfs: support ability to wait on new inodes xfs: publish UUID in struct super_block xfs: Allow user to kill fstrim process xfs: better log intent item refcount checking xfs: fix up quotacheck buffer list error handling xfs: remove xfs_trans_ail_delete_bulk xfs: don't use bool values in trace buffers xfs: fix getfsmap userspace memory corruption while setting OF_LAST xfs: fix __user annotations for xfs_ioc_getfsmap xfs: corruption needs to respect endianess too! xfs: use NULL instead of 0 to initialize a pointer in xfs_ioc_getfsmap xfs: use NULL instead of 0 to initialize a pointer in xfs_getfsmap xfs: simplify validation of the unwritten extent bit xfs: remove unused values from xfs_exntst_t xfs: remove the unused XFS_MAXLINK_1 define xfs: more do_div cleanups ...
2017-05-01Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull block layer updates from Jens Axboe: - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ was initially a fork of CFQ, but subsequently changed to implement fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant to be used on desktop type single drives, providing good fairness. From Paolo. - Add Kyber IO scheduler. This is a full multiqueue aware scheduler, using a scalable token based algorithm that throttles IO based on live completion IO stats, similary to blk-wbt. From Omar. - A series from Jan, moving users to separately allocated backing devices. This continues the work of separating backing device life times, solving various problems with hot removal. - A series of updates for lightnvm, mostly from Javier. Includes a 'pblk' target that exposes an open channel SSD as a physical block device. - A series of fixes and improvements for nbd from Josef. - A series from Omar, removing queue sharing between devices on mostly legacy drivers. This helps us clean up other bits, if we know that a queue only has a single device backing. This has been overdue for more than a decade. - Fixes for the blk-stats, and improvements to unify the stats and user windows. This both improves blk-wbt, and enables other users to register a need to receive IO stats for a device. From Omar. - blk-throttle improvements from Shaohua. This provides a scalable framework for implementing scalable priotization - particularly for blk-mq, but applicable to any type of block device. The interface is marked experimental for now. - Bucketized IO stats for IO polling from Stephen Bates. This improves efficiency of polled workloads in the presence of mixed block size IO. - A few fixes for opal, from Scott. - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics. From a variety of folks, mostly Sagi and James Smart. - A series from Bart, improving our exposed info and capabilities from the blk-mq debugfs support. - A series from Christoph, cleaning up how handle WRITE_ZEROES. - A series from Christoph, cleaning up the block layer handling of how we track errors in a request. On top of being a nice cleanup, it also shrinks the size of struct request a bit. - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was never used by platforms, and the latter has outlived it's usefulness. - Various little bug fixes and cleanups from a wide variety of folks. * 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits) block: hide badblocks attribute by default blk-mq: unify hctx delay_work and run_work block: add kblock_mod_delayed_work_on() blk-mq: unify hctx delayed_run_work and run_work nbd: fix use after free on module unload MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler blk-mq-sched: alloate reserved tags out of normal pool mtip32xx: use runtime tag to initialize command header scsi: Implement blk_mq_ops.show_rq() blk-mq: Add blk_mq_ops.show_rq() blk-mq: Show operation, cmd_flags and rq_flags names blk-mq: Make blk_flags_show() callers append a newline character blk-mq: Move the "state" debugfs attribute one level down blk-mq: Unregister debugfs attributes earlier blk-mq: Only unregister hctxs for which registration succeeded blk-mq-debugfs: Rename functions for registering and unregistering the mq directory blk-mq: Let blk_mq_debugfs_register() look up the queue name blk-mq: Register <dev>/queue/mq after having registered <dev>/queue ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset ide-pm: always pass 0 error to __blk_end_request_all ..
2017-04-25xfs: more do_div cleanupsEric Sandeen1-4/+1
On some architectures do_div does the pointer compare trick to make sure that we've sent it an unsigned 64-bit number. (Why unsigned? I don't know.) Fix up the few places that squawk about this; in xfs_bmap_wants_extents() we just used a bare int64_t so change that to unsigned. In xfs_adjust_extent_unmap_boundaries() all we wanted was the mod, and we have an xfs-specific function to handle that w/o side effects, which includes proper casting for do_div. In xfs_daddr_to_ag[b]no, we were using the wrong type anyway; XFS_BB_TO_FSBT returns a block in the filesystem, so use xfs_rfsblock_t not xfs_daddr_t, and gain the unsignedness from that type as a bonus. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-12xfs: drop iolock from reclaim context to appease lockdepBrian Foster1-5/+3
Lockdep complains about use of the iolock in inode reclaim context because it doesn't understand that reclaim has the last reference to the inode, and thus an iolock->reclaim->iolock deadlock is not possible. The iolock is technically not necessary in xfs_inactive() and was only added to appease an assert in xfs_free_eofblocks(), which can be called from other non-reclaim contexts. Therefore, just kill the assert and drop the use of the iolock from reclaim context to quiet lockdep. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-08block: add a flags argument to (__)blkdev_issue_zerooutChristoph Hellwig1-1/+1
Turn the existing discard flag into a new BLKDEV_ZERO_UNMAP flag with similar semantics, but without referring to diѕcard. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-04xfs: factor out a xfs_bmap_is_real_extent helperChristoph Hellwig1-4/+3
This checks for all the non-normal extent types, including handling both encodings of delayed allocations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-04xfs: Honor FALLOC_FL_KEEP_SIZE when punching ends of filesCalvin Owens1-1/+9
When punching past EOF on XFS, fallocate(mode=PUNCH_HOLE|KEEP_SIZE) will round the file size up to the nearest multiple of PAGE_SIZE: calvinow@vm-disks/generic-xfs-1 ~$ dd if=/dev/urandom of=test bs=2048 count=1 calvinow@vm-disks/generic-xfs-1 ~$ stat test Size: 2048 Blocks: 8 IO Block: 4096 regular file calvinow@vm-disks/generic-xfs-1 ~$ fallocate -n -l 2048 -o 2048 -p test calvinow@vm-disks/generic-xfs-1 ~$ stat test Size: 4096 Blocks: 8 IO Block: 4096 regular file Commit 3c2bdc912a1cc050 ("xfs: kill xfs_zero_remaining_bytes") replaced xfs_zero_remaining_bytes() with calls to iomap helpers. The new helpers don't enforce that [pos,offset) lies strictly on [0,i_size) when being called from xfs_free_file_space(), so by "leaking" these ranges into xfs_zero_range() we get this buggy behavior. Fix this by reintroducing the checks xfs_zero_remaining_bytes() did against i_size at the bottom of xfs_free_file_space(). Reported-by: Aaron Gao <gzh@fb.com> Fixes: 3c2bdc912a1cc050 ("xfs: kill xfs_zero_remaining_bytes") Cc: Christoph Hellwig <hch@lst.de> Cc: Brian Foster <bfoster@redhat.com> Cc: <stable@vger.kernel.org> # 4.8+ Signed-off-by: Calvin Owens <calvinowens@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-03xfs: Honor FALLOC_FL_KEEP_SIZE when punching ends of filesCalvin Owens1-1/+9
When punching past EOF on XFS, fallocate(mode=PUNCH_HOLE|KEEP_SIZE) will round the file size up to the nearest multiple of PAGE_SIZE: calvinow@vm-disks/generic-xfs-1 ~$ dd if=/dev/urandom of=test bs=2048 count=1 calvinow@vm-disks/generic-xfs-1 ~$ stat test Size: 2048 Blocks: 8 IO Block: 4096 regular file calvinow@vm-disks/generic-xfs-1 ~$ fallocate -n -l 2048 -o 2048 -p test calvinow@vm-disks/generic-xfs-1 ~$ stat test Size: 4096 Blocks: 8 IO Block: 4096 regular file Commit 3c2bdc912a1cc050 ("xfs: kill xfs_zero_remaining_bytes") replaced xfs_zero_remaining_bytes() with calls to iomap helpers. The new helpers don't enforce that [pos,offset) lies strictly on [0,i_size) when being called from xfs_free_file_space(), so by "leaking" these ranges into xfs_zero_range() we get this buggy behavior. Fix this by reintroducing the checks xfs_zero_remaining_bytes() did against i_size at the bottom of xfs_free_file_space(). Reported-by: Aaron Gao <gzh@fb.com> Fixes: 3c2bdc912a1cc050 ("xfs: kill xfs_zero_remaining_bytes") Cc: Christoph Hellwig <hch@lst.de> Cc: Brian Foster <bfoster@redhat.com> Cc: <stable@vger.kernel.org> # 4.8+ Signed-off-by: Calvin Owens <calvinowens@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-18xfs: simplify xfs_rtallocate_extentChristoph Hellwig1-9/+4
We can deduce the allocation type from the bno argument, and do the return without prod much simpler internally. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix the macro for the non-rt build] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-02-17xfs: don't reserve blocks for right shift transactionsBrian Foster1-10/+10
The block reservation for the transaction allocated in xfs_shift_file_space() is an artifact of the original collapse range support. It exists to handle the case where a collapse range occurs, the initial extent is left shifted into a location that forms a contiguous boundary with the previous extent and thus the extents are merged. This code was subsequently refactored and reused for insert range (right shift) support. If an insert range occurs under low free space conditions, the extent at the starting offset is split before the first shift transaction is allocated. If the block reservation fails, this leaves separate, but contiguous extents around in the inode. While not a fatal problem, this is unexpected and will flag a warning on subsequent insert range operations on the inode. This problem has been reproduce intermittently by generic/270 running against a ramdisk device. Since right shift does not create new extent boundaries in the inode, a block reservation for extent merge is unnecessary. Update xfs_shift_file_space() to conditionally reserve fs blocks for left shift transactions only. This avoids the warning reproduced by generic/270. Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-01-31xfs: remove unused full argument from bmapEric Sandeen1-4/+2
The "full" argument was used only by the fiemap formatter, which is now gone with the iomap updates. Remove the unused arg. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-01-31xfs: fix eofblocks race with file extending async dio writesBrian Foster1-0/+3
It's possible for post-eof blocks to end up being used for direct I/O writes. dio write performs an upfront unwritten extent allocation, sends the dio and then updates the inode size (if necessary) on write completion. If a file release occurs while a file extending dio write is in flight, it is possible to mistake the post-eof blocks for speculative preallocation and incorrectly truncate them from the inode. This means that the resulting dio write completion can discover a hole and allocate new blocks rather than perform unwritten extent conversion. This requires a strange mix of I/O and is thus not likely to reproduce in real world workloads. It is intermittently reproduced by generic/299. The error manifests as an assert failure due to transaction overrun because the aforementioned write completion transaction has only reserved enough blocks for btree operations: XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, \ file: fs/xfs//xfs_trans.c, line: 309 The root cause is that xfs_free_eofblocks() uses i_size to truncate post-eof blocks from the inode, but async, file extending direct writes do not update i_size until write completion, long after inode locks are dropped. Therefore, xfs_free_eofblocks() effectively truncates the inode to the incorrect size. Update xfs_free_eofblocks() to serialize against dio similar to how extending writes are serialized against i_size updates before post-eof block zeroing. Specifically, wait on dio while under the iolock. This ensures that dio write completions have updated i_size before post-eof blocks are processed. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>