summaryrefslogtreecommitdiff
path: root/fs/xfs/xfs_file.c
AgeCommit message (Collapse)AuthorFilesLines
2021-02-24Merge tag 'idmapped-mounts-v5.12' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8 In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...
2021-02-22Merge tag 'arm64-upstream' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: - vDSO build improvements including support for building with BSD. - Cleanup to the AMU support code and initialisation rework to support cpufreq drivers built as modules. - Removal of synthetic frame record from exception stack when entering the kernel from EL0. - Add support for the TRNG firmware call introduced by Arm spec DEN0098. - Cleanup and refactoring across the board. - Avoid calling arch_get_random_seed_long() from add_interrupt_randomness() - Perf and PMU updates including support for Cortex-A78 and the v8.3 SPE extensions. - Significant steps along the road to leaving the MMU enabled during kexec relocation. - Faultaround changes to initialise prefaulted PTEs as 'old' when hardware access-flag updates are supported, which drastically improves vmscan performance. - CPU errata updates for Cortex-A76 (#1463225) and Cortex-A55 (#1024718) - Preparatory work for yielding the vector unit at a finer granularity in the crypto code, which in turn will one day allow us to defer softirq processing when it is in use. - Support for overriding CPU ID register fields on the command-line. * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (85 commits) drivers/perf: Replace spin_lock_irqsave to spin_lock mm: filemap: Fix microblaze build failure with 'mmu_defconfig' arm64: Make CPU_BIG_ENDIAN depend on ld.bfd or ld.lld 13.0.0+ arm64: cpufeatures: Allow disabling of Pointer Auth from the command-line arm64: Defer enabling pointer authentication on boot core arm64: cpufeatures: Allow disabling of BTI from the command-line arm64: Move "nokaslr" over to the early cpufeature infrastructure KVM: arm64: Document HVC_VHE_RESTART stub hypercall arm64: Make kvm-arm.mode={nvhe, protected} an alias of id_aa64mmfr1.vh=0 arm64: Add an aliasing facility for the idreg override arm64: Honor VHE being disabled from the command-line arm64: Allow ID_AA64MMFR1_EL1.VH to be overridden from the command line arm64: cpufeature: Add an early command-line cpufeature override facility arm64: Extract early FDT mapping from kaslr_early_init() arm64: cpufeature: Use IDreg override in __read_sysreg_by_encoding() arm64: cpufeature: Add global feature override facility arm64: Move SCTLR_EL1 initialisation to EL-agnostic code arm64: Simplify init_el2_state to be non-VHE only arm64: Move VHE-specific SPE setup to mutate_to_vhe() arm64: Drop early setting of MDSCR_EL2.TPMS ...
2021-02-21Merge tag 'xfs-5.12-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-49/+65
Pull xfs updates from Darrick Wong: "There's a lot going on this time, which seems about right for this drama-filled year. Community developers added some code to speed up freezing when read-only workloads are still running, refactored the logging code, added checks to prevent file extent counter overflow, reduced iolock cycling to speed up fsync and gc scans, and started the slow march towards supporting filesystem shrinking. There's a huge refactoring of the internal speculative preallocation garbage collection code which fixes a bunch of bugs, makes the gc scheduling per-AG and hence multithreaded, and standardizes the retry logic when we try to reserve space or quota, can't, and want to trigger a gc scan. We also enable multithreaded quotacheck to reduce mount times further. This is also preparation for background file gc, which may or may not land for 5.13. We also fixed some deadlocks in the rename code, fixed a quota accounting leak when FSSETXATTR fails, restored the behavior that write faults to an mmap'd region actually cause a SIGBUS, fixed a bug where sgid directory inheritance wasn't quite working properly, and fixed a bug where symlinks weren't working properly in ecryptfs. We also now advertise the inode btree counters feature that was introduced two cycles ago. Summary: - Fix an ABBA deadlock when renaming files on overlayfs. - Make sure that we can't overflow the inode extent counters when adding to or removing extents from a file. - Make directory sgid inheritance work the same way as all the other filesystems. - Don't drain the buffer cache on freeze and ro remount, which should reduce the amount of time if read-only workloads are continuing during the freeze. - Fix a bug where symlink size isn't reported to the vfs in ecryptfs. - Disentangle log cleaning from log covering. This refactoring sets us up for future changes to the log, though for now it simply means that we can use covering for freezes, and cleaning becomes something we only do at unmount. - Speed up file fsyncs by reducing iolock cycling. - Fix delalloc blocks leaking when changing the project id fails because of input validation errors in FSSETXATTR. - Fix oversized quota reservation when converting unwritten extents during a DAX write. - Create a transaction allocation helper function to standardize the idiom of allocating a transaction, reserving blocks, locking inodes, and reserving quota. Replace all the open-coded logic for file creation, file ownership changes, and file modifications to use them. - Actually shut down the fs if the incore quota reservations get corrupted. - Fix background block garbage collection scans to not block and to actually clean out CoW staging extents properly. - Run block gc scans when we run low on project quota. - Use the standardized transaction allocation helpers to make it so that ENOSPC and EDQUOT errors during reservation will back out, invoke the block gc scanner, and try again. This is preparation for introducing background inode garbage collection in the next cycle. - Combine speculative post-EOF block garbage collection with speculative copy on write block garbage collection. - Enable multithreaded quotacheck. - Allow sysadmins to tweak the CPU affinities and maximum concurrency levels of quotacheck and background blockgc worker pools. - Expose the inode btree counter feature in the fs geometry ioctl. - Cleanups of the growfs code in preparation for starting work on filesystem shrinking. - Fix all the bloody gcc warnings that the maintainer knows about. :P - Fix a RST syntax error. - Don't trigger bmbt corruption assertions after the fs shuts down. - Restore behavior of forcing SIGBUS on a shut down filesystem when someone triggers a mmap write fault (or really, any buffered write)" * tag 'xfs-5.12-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (85 commits) xfs: consider shutdown in bmapbt cursor delete assert xfs: fix boolreturn.cocci warnings xfs: restore shutdown check in mapped write fault path xfs: fix rst syntax error in admin guide xfs: fix incorrect root dquot corruption error when switching group/project quota types xfs: get rid of xfs_growfs_{data,log}_t xfs: rename `new' to `delta' in xfs_growfs_data_private() libxfs: expose inobtcount in xfs geometry xfs: don't bounce the iolock between free_{eof,cow}blocks xfs: expose the blockgc workqueue knobs publicly xfs: parallelize block preallocation garbage collection xfs: rename block gc start and stop functions xfs: only walk the incore inode tree once per blockgc scan xfs: consolidate the eofblocks and cowblocks workers xfs: consolidate incore inode radix tree posteof/cowblocks tags xfs: remove trivial eof/cowblocks functions xfs: hide xfs_icache_free_cowblocks xfs: hide xfs_icache_free_eofblocks xfs: relocate the eofb/cowb workqueue functions xfs: set WQ_SYSFS on all workqueues in debug mode ...
2021-02-03xfs: refactor xfs_icache_free_{eof,cow}blocks call sitesDarrick J. Wong1-2/+1
In anticipation of more restructuring of the eof/cowblocks gc code, refactor calling of those two functions into a single internal helper function, then present a new standard interface to purge speculative block preallocations and start shifting higher level code to use that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03xfs: pass flags and return gc errors from xfs_blockgc_free_quotaDarrick J. Wong1-5/+5
Change the signature of xfs_blockgc_free_quota in preparation for the next few patches. Callers can now pass EOF_FLAGS into the function to control scan parameters; and the function will now pass back any corruption errors seen while scanning, though for our retry loops we'll just try again unconditionally. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03xfs: move and rename xfs_inode_free_quota_blocks to avoid conflictsDarrick J. Wong1-1/+1
Move this function further down in the file so that later cleanups won't have to declare static functions. Change the name because we're about to rework all the code that performs garbage collection of speculatively allocated file blocks. No functional changes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-03xfs: trigger all block gc scans when low on quota spaceDarrick J. Wong1-9/+6
The functions to run an eof/cowblocks scan to try to reduce quota usage are kind of a mess -- the logic repeatedly initializes an eofb structure and there are logic bugs in the code that result in the cowblocks scan never actually happening. Replace all three functions with a single function that fills out an eofb and runs both eof and cowblocks scans. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-02-01xfs: reduce exclusive locking on unaligned dioDave Chinner1-13/+45
Attempt shared locking for unaligned DIO, but only if the the underlying extent is already allocated and in written state. On failure, retry with the existing exclusive locking. Test case is fio randrw of 512 byte IOs using AIO and an iodepth of 32 IOs. Vanilla: READ: bw=4560KiB/s (4670kB/s), 4560KiB/s-4560KiB/s (4670kB/s-4670kB/s), io=134MiB (140MB), run=30001-30001msec WRITE: bw=4567KiB/s (4676kB/s), 4567KiB/s-4567KiB/s (4676kB/s-4676kB/s), io=134MiB (140MB), run=30001-30001msec Patched: READ: bw=37.6MiB/s (39.4MB/s), 37.6MiB/s-37.6MiB/s (39.4MB/s-39.4MB/s), io=1127MiB (1182MB), run=30002-30002msec WRITE: bw=37.6MiB/s (39.4MB/s), 37.6MiB/s-37.6MiB/s (39.4MB/s-39.4MB/s), io=1128MiB (1183MB), run=30002-30002msec That's an improvement from ~18k IOPS to a ~150k IOPS, which is about the IOPS limit of the VM block device setup I'm testing on. 4kB block IO comparison: READ: bw=296MiB/s (310MB/s), 296MiB/s-296MiB/s (310MB/s-310MB/s), io=8868MiB (9299MB), run=30002-30002msec WRITE: bw=296MiB/s (310MB/s), 296MiB/s-296MiB/s (310MB/s-310MB/s), io=8878MiB (9309MB), run=30002-30002msec Which is ~150k IOPS, same as what the test gets for sub-block AIO+DIO writes with this patch. Signed-off-by: Dave Chinner <dchinner@redhat.com> [hch: rebased, split unaligned from nowait] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-02-01xfs: split the unaligned DIO write code outDave Chinner1-85/+85
The unaligned DIO write path is more convolted than the normal path, and we are about to make it more complex. Keep the block aligned fast path dio write code trim and simple by splitting out the unaligned DIO code from it. Signed-off-by: Dave Chinner <dchinner@redhat.com> [hch: rebased, fixed a few minor nits] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-02-01xfs: improve the reflink_bounce_dio_write tracepointChristoph Hellwig1-1/+1
Use a more suitable event class. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-02-01xfs: simplify the read/write tracepointsChristoph Hellwig1-12/+8
Pass the iocb and iov_iter to the tracepoints and leave decoding of actual arguments to the code only run when tracing is enabled. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-02-01xfs: remove the buffered I/O fallback assertChristoph Hellwig1-6/+0
The iomap code has been designed from the start not to do magic fallback, so remove the assert in preparation for further code cleanups. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-02-01xfs: cleanup the read/write helper namingChristoph Hellwig1-15/+15
Drop a few pointless aio_ prefixes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-02-01xfs: make xfs_file_aio_write_checks IOCB_NOWAIT-awareChristoph Hellwig1-4/+21
Ensure we don't block on the iolock, or waiting for I/O in xfs_file_aio_write_checks if the caller asked to avoid that. Fixes: 29a5d29ec181 ("xfs: nowait aio support") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-02-01xfs: factor out a xfs_ilock_iocb helperChristoph Hellwig1-26/+29
Add a helper to factor out the nowait locking logical for the read/write helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-01-24xfs: support idmapped mountsChristoph Hellwig1-1/+3
Enable idmapped mounts for xfs. This basically just means passing down the user_namespace argument from the VFS methods down to where it is passed to the relevant helpers. Note that full-filesystem bulkstat is not supported from inside idmapped mounts as it is an administrative operation that acts on the whole file system. The limitation is not applied to the bulkstat single operation that just operates on a single inode. Link: https://lore.kernel.org/r/20210121131959.646623-40-christian.brauner@ubuntu.com Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-23iomap: pass a flags argument to iomap_dio_rwChristoph Hellwig1-3/+2
Pass a set of flags to iomap_dio_rw instead of the boolean wait_for_completion argument. The IOMAP_DIO_FORCE_WAIT flag replaces the wait_for_completion, but only needs to be passed when the iocb isn't synchronous to start with to simplify the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> [djwong: rework xfs_file.c so that we can push iomap changes separately] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-01-23xfs: reduce ilock acquisitions in xfs_file_fsyncChristoph Hellwig1-1/+8
If the inode is not pinned by the time fsync is called we don't need the ilock to protect against concurrent clearing of ili_fsync_fields as the inode won't need a log flush or clearing of these fields. Not taking the iolock allows for full concurrency of fsync and thus O_DSYNC completions with io_uring/aio write submissions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-01-23xfs: refactor xfs_file_fsyncChristoph Hellwig1-31/+50
Factor out the log syncing logic into two helpers to make the code easier to read and more maintainable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-01-23xfs: remove a stale comment from xfs_file_aio_write_checks()Eric Biggers1-6/+0
The comment in xfs_file_aio_write_checks() about calling file_modified() after dropping the ilock doesn't make sense, because the code that unconditionally acquires and drops the ilock was removed by commit 467f78992a07 ("xfs: reduce ilock hold times in xfs_file_aio_write_checks"). Remove this outdated comment. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2021-01-20mm: Cleanup faultaround and finish_fault() codepathsKirill A. Shutemov1-2/+4
alloc_set_pte() has two users with different requirements: in the faultaround code, it called from an atomic context and PTE page table has to be preallocated. finish_fault() can sleep and allocate page table as needed. PTL locking rules are also strange, hard to follow and overkill for finish_fault(). Let's untangle the mess. alloc_set_pte() has gone now. All locking is explicit. The price is some code duplication to handle huge pages in faultaround path, but it should be fine, having overall improvement in readability. Link: https://lore.kernel.org/r/20201229132819.najtavneutnf7ajp@box Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> [will: s/from from/from/ in comment; spotted by willy] Signed-off-by: Will Deacon <will@kernel.org>
2020-10-21xfs: fix fallocate functions when rtextsize is larger than 1Darrick J. Wong1-5/+35
In commit fe341eb151ec, I forgot that xfs_free_file_space isn't strictly a "remove mapped blocks" function. It is actually a function to zero file space by punching out the middle and writing zeroes to the unaligned ends of the specified range. Therefore, putting a rtextsize alignment check in that function is wrong because that breaks unaligned ZERO_RANGE on the realtime volume. Furthermore, xfs_file_fallocate already has alignment checks for the functions require the file range to be aligned to the size of a fundamental allocation unit (which is 1 FSB on the data volume and 1 rt extent on the realtime volume). Create a new helper to check fallocate arguments against the realtiem allocation unit size, fix the fallocate frontend to use it, fix free_file_space to delete the correct range, and remove a now redundant check from insert_file_space. NOTE: The realtime extent size is not required to be a power of two! Fixes: fe341eb151ec ("xfs: ensure that fpunch, fcollapse, and finsert operations are aligned to rt extent size") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2020-09-16xfs: force the log after remapping a synchronous-writes fileDarrick J. Wong1-1/+16
Commit 5833112df7e9 tried to make it so that a remap operation would force the log out to disk if the filesystem is mounted with mandatory synchronous writes. Unfortunately, that commit failed to handle the case where the inode or the file descriptor require mandatory synchronous writes. Refactor the check into into a helper that will look for all three conditions, and now we can treat reflink just like any other synchronous write. Fixes: 5833112df7e9 ("xfs: reflink should force the log out if mounted with wsync") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-09-05xfs: don't update mtime on COW faultsMikulas Patocka1-2/+10
When running in a dax mode, if the user maps a page with MAP_PRIVATE and PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime when the user hits a COW fault. This breaks building of the Linux kernel. How to reproduce: 1. extract the Linux kernel tree on dax-mounted xfs filesystem 2. run make clean 3. run make -j12 4. run make -j12 at step 4, make would incorrectly rebuild the whole kernel (although it was already built in step 3). The reason for the breakage is that almost all object files depend on objtool. When we run objtool, it takes COW page fault on its .data section, and these faults will incorrectly update the timestamp of the objtool binary. The updated timestamp causes make to rebuild the whole tree. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07Merge tag 'xfs-5.9-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-6/+22
Pull xfs updates from Darrick Wong: "There are quite a few changes in this release, the most notable of which is that we've made inode flushing fully asynchronous, and we no longer block memory reclaim on this. Furthermore, we have fixed a long-standing bug in the quota code where soft limit warnings and inode limits were never tracked properly. Moving further down the line, the reflink control loops have been redesigned to behave more efficiently; and numerous small bugs have been fixed (see below). The xattr and quota code have been extensively refactored in preparation for more new features coming down the line. Finally, the behavior of DAX between ext4 and xfs has been stabilized, which gets us a step closer to removing the experimental tag from that feature. We have a few new contributors this time around. Welcome, all! I anticipate a second pull request next week for a few small bugfixes that have been trickling in, but this is it for big changes. Summary: - Fix some btree block pingponging problems when swapping extents - Redesign the reflink copy loop so that we only run one remapping operation per transaction. This helps us avoid running out of block reservation on highly deduped filesystems. - Take the MMAPLOCK around filemap_map_pages. - Make inode reclaim fully async so that we avoid stalling processes on flushing inodes to disk. - Reduce inode cluster buffer RMW cycles by attaching the buffer to dirty inodes so we won't let go of the cluster buffer when we know we're going to need it soon. - Add some more checks to the realtime bitmap file scrubber. - Don't trip false lockdep warnings in fs freeze. - Remove various redundant lines of code. - Remove unnecessary calls to xfs_perag_{get,put}. - Preserve I_VERSION state across remounts. - Fix an unmount hang due to AIL going to sleep with a non-empty delwri buffer list. - Fix an error in the inode allocation space reservation macro that caused regressions in generic/531. - Fix a potential livelock when dquot flush fails because the dquot buffer is locked. - Fix a miscalculation when reserving inode quota that could cause users to exceed a hardlimit. - Refactor struct xfs_dquot to use native types for incore fields instead of abusing the ondisk struct for this purpose. This will eventually enable proper y2038+ support, but for now it merely cleans up the quota function declarations. - Actually increment the quota softlimit warning counter so that soft failures turn into hard(er) failures when they exceed the softlimit warning counter limits set by the administrator. - Split incore dquot state flags into their own field and namespace, to avoid mixing them with quota type flags. - Create a new quota type flags namespace so that we can make it obvious when a quota function takes a quota type (user, group, project) as an argument. - Rename the ondisk dquot flags field to type, as that more accurately represents what we store in it. - Drop our bespoke memory allocation flags in favor of GFP_*. - Rearrange the xattr functions so that we no longer mix metadata updates and transaction management (e.g. rolling complex transactions) in the same functions. This work will prepare us for atomic xattr operations (itself a prerequisite for directory backrefs) in future release cycles. - Support FS_DAX_FL (aka FS_XFLAG_DAX) via GETFLAGS/SETFLAGS" * tag 'xfs-5.9-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (117 commits) fs/xfs: Support that ioctl(SETXFLAGS/GETXFLAGS) can set/get inode DAX on XFS. xfs: Lift -ENOSPC handler from xfs_attr_leaf_addname xfs: Simplify xfs_attr_node_addname xfs: Simplify xfs_attr_leaf_addname xfs: Add helper function xfs_attr_node_removename_rmt xfs: Add helper function xfs_attr_node_removename_setup xfs: Add remote block helper functions xfs: Add helper function xfs_attr_leaf_mark_incomplete xfs: Add helpers xfs_attr_is_shortform and xfs_attr_set_shortform xfs: Remove xfs_trans_roll in xfs_attr_node_removename xfs: Remove unneeded xfs_trans_roll_inode calls xfs: Add helper function xfs_attr_node_shrink xfs: Pull up xfs_attr_rmtval_invalidate xfs: Refactor xfs_attr_rmtval_remove xfs: Pull up trans roll in xfs_attr3_leaf_clearflag xfs: Factor out xfs_attr_rmtval_invalidate xfs: Pull up trans roll from xfs_attr3_leaf_setflag xfs: Refactor xfs_attr_try_sf_addname xfs: Split apart xfs_attr_leaf_addname xfs: Pull up trans handling in xfs_attr3_leaf_flipflags ...
2020-08-07Merge tag 'iomap-5.9-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-4/+4
Pull iomap updates from Darrick Wong: "The most notable changes are: - iomap no longer invalidates the page cache when performing a direct read, since doing so is unnecessary and the old directio code doesn't do that either. - iomap embraced the use of returning ENOTBLK from a direct write to trigger falling back to a buffered write since ext4 already did this and btrfs wants it for their port. - iomap falls back to buffered writes if we're doing a direct write and the page cache invalidation after the flush fails; this was necessary to handle a corner case in the btrfs port. - Remove email virus scanner detritus that was accidentally included in yesterday's pull request. Clearly I need(ed) to update my git branch checker scripts. :(" * tag 'iomap-5.9-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: fall back to buffered writes for invalidation failures xfs: use ENOTBLK for direct I/O to buffered I/O fallback iomap: Only invalidate page cache pages on direct IO writes iomap: Make sure iomap_end is called after iomap_begin
2020-08-05iomap: fall back to buffered writes for invalidation failuresChristoph Hellwig1-2/+2
Failing to invalid the page cache means data in incoherent, which is a very bad state for the system. Always fall back to buffered I/O through the page cache if we can't invalidate mappings. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> # for ext4 Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> # for gfs2 Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
2020-08-05xfs: use ENOTBLK for direct I/O to buffered I/O fallbackChristoph Hellwig1-2/+2
This is what the classic fs/direct-io.c implementation and thuse other file systems use. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-07-06xfs: add an inode item lockDave Chinner1-3/+6
The inode log item is kind of special in that it can be aggregating new changes in memory at the same time time existing changes are being written back to disk. This means there are fields in the log item that are accessed concurrently from contexts that don't share any locking at all. e.g. updating ili_last_fields occurs at flush time under the ILOCK_EXCL and flush lock at flush time, under the flush lock at IO completion time, and is read under the ILOCK_EXCL when the inode is logged. Hence there is no actual serialisation between reading the field during logging of the inode in transactions vs clearing the field in IO completion. We currently get away with this by the fact that we are only clearing fields in IO completion, and nothing bad happens if we accidentally log more of the inode than we actually modify. Worst case is we consume a tiny bit more memory and log bandwidth. However, if we want to do more complex state manipulations on the log item that requires updates at all three of these potential locations, we need to have some mechanism of serialising those operations. To do this, introduce a spinlock into the log item to serialise internal state. This could be done via the xfs_inode i_flags_lock, but this then leads to potential lock inversion issues where inode flag updates need to occur inside locks that best nest inside the inode log item locks (e.g. marking inodes stale during inode cluster freeing). Using a separate spinlock avoids these sorts of problems and simplifies future code. This does not touch the use of ili_fields in the item formatting code - that is entirely protected by the ILOCK_EXCL at this point in time, so it remains untouched. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-07-06xfs: use MMAPLOCK around filemap_map_pages()Dave Chinner1-1/+14
The page faultround path ->map_pages is implemented in XFS via filemap_map_pages(). This function checks that pages found in page cache lookups have not raced with truncate based invalidation by checking page->mapping is correct and page->index is within EOF. However, we've known for a long time that this is not sufficient to protect against races with invalidations done by operations that do not change EOF. e.g. hole punching and other fallocate() based direct extent manipulations. The way we protect against these races is we wrap the page fault operations in a XFS_MMAPLOCK_SHARED lock so they serialise against fallocate and truncate before calling into the filemap function that processes the fault. Do the same for XFS's ->map_pages implementation to close this potential data corruption issue. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-07-06xfs: move helpers that lock and unlock two inodes against userspace IODarrick J. Wong1-1/+1
Move the double-inode locking helpers to xfs_inode.c since they're not specific to reflink. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-07-06xfs: refactor locking and unlocking two inodes against userspace IODarrick J. Wong1-1/+1
Refactor the two functions that we use to lock and unlock two inodes to block userspace from initiating IO against a file, whether via system calls or mmap activity. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-07-06xfs: fix xfs_reflink_remap_prep calling conventionsDarrick J. Wong1-1/+1
Fix the return value of xfs_reflink_remap_prep so that its return value conventions match the rest of xfs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-06-22xfs: flag files as supporting buffered async readsJens Axboe1-1/+1
XFS uses generic_file_read_iter(), which already supports this. Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-09mmap locking API: convert mmap_sem commentsMichel Lespinasse1-1/+1
Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-19xfs: move the per-fork nextents fields into struct xfs_iforkChristoph Hellwig1-1/+1
There are there are three extents counters per inode, one for each of the forks. Two are in the legacy icdinode and one is directly in struct xfs_inode. Switch to a single counter in the xfs_ifork structure where it uses up padding at the end of the structure. This simplifies various bits of code that just wants the number of extents counter and can now directly dereference it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-04-06xfs: reflink should force the log out if mounted with wsyncChristoph Hellwig1-0/+4
Reflink should force the log out to disk if the filesystem was mounted with wsync, the same as most other operations in xfs. [Note: XFS_MOUNT_WSYNC is set when the admin mounts the filesystem with either the 'wsync' or 'sync' mount options, which effectively means that we're classifying reflink/dedupe as IO operations and making them synchronous when required.] Fixes: 3fc9f5e409319 ("xfs: remove xfs_reflink_remap_range") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> [darrick: add more to the changelog] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-04-06xfs: factor out a new xfs_log_force_inode helperChristoph Hellwig1-11/+1
Create a new helper to force the log up to the last LSN touching an inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-01-16xfs: fix IOCB_NOWAIT handling in xfs_file_dio_aio_readChristoph Hellwig1-1/+6
Direct I/O reads can also be used with RWF_NOWAIT & co. Fix the inode locking in xfs_file_dio_aio_read to take IOCB_NOWAIT into account. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-22xfs: remove the mappedbno argument to xfs_da_reada_bufChristoph Hellwig1-1/+1
Replace the mappedbno argument with the simple flags for xfs_da_reada_buf and xfs_dir3_data_readahead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-31xfs: properly serialise fallocate against AIO+DIODave Chinner1-0/+30
AIO+DIO can extend the file size on IO completion, and it holds no inode locks while the IO is in flight. Therefore, a race condition exists in file size updates if we do something like this: aio-thread fallocate-thread lock inode submit IO beyond inode->i_size unlock inode ..... lock inode break layouts if (off + len > inode->i_size) new_size = off + len ..... inode_dio_wait() <blocks> ..... completes inode->i_size updated inode_dio_done() .... <wakes> <does stuff no long beyond EOF> if (new_size) xfs_vn_setattr(inode, new_size) Yup, that attempt to extend the file size in the fallocate code turns into a truncate - it removes the whatever the aio write allocated and put to disk, and reduced the inode size back down to where the fallocate operation ends. Fundamentally, xfs_file_fallocate() not compatible with racing AIO+DIO completions, so we need to move the inode_dio_wait() call up to where the lock the inode and break the layouts. Secondly, storing the inode size and then using it unchecked without holding the ILOCK is not safe; we can only do such a thing if we've locked out and drained all IO and other modification operations, which we don't do initially in xfs_file_fallocate. It should be noted that some of the fallocate operations are compound operations - they are made up of multiple manipulations that may zero data, and so we may need to flush and invalidate the file multiple times during an operation. However, we only need to lock out IO and other space manipulation operations once, as that lockout is maintained until the entire fallocate operation has been completed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-29xfs: consolidate preallocation in xfs_file_fallocateChristoph Hellwig1-8/+24
Remove xfs_zero_file_space and reorganize xfs_file_fallocate so that a single call to xfs_alloc_file_space covers all modes that preallocate blocks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-28xfs: use xfs_inode_buftarg in xfs_file_dio_aio_writeChristoph Hellwig1-2/+1
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-28xfs: add a xfs_inode_buftarg helperChristoph Hellwig1-7/+7
Add a new xfs_inode_buftarg helper that gets the data I/O buftarg for a given inode. Replace the existing xfs_find_bdev_for_inode and xfs_find_daxdev_for_inode helpers with this new general one and cleanup some of the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: split the iomap ops for buffered vs direct writesChristoph Hellwig1-6/+10
Instead of lots of magic conditionals in the main write_begin handler this make the intent very clear. Thing will become even better once we support delayed allocations for extent size hints and realtime allocations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: split out a new set of read-only iomap opsChristoph Hellwig1-3/+6
Start untangling xfs_file_iomap_begin by splitting out the read-only case into its own set of iomap_ops with a very simply iomap_begin helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-15xfs: Use iomap_dio_rw to wait for unaligned direct IOJan Kara1-8/+4
Use iomap_dio_rw() to wait for unaligned direct IO instead of opencoding the wait. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-15iomap: Allow forcing of waiting for running DIO in iomap_dio_rw()Jan Kara1-2/+3
Filesystems do not support doing IO as asynchronous in some cases. For example in case of unaligned writes or in case file size needs to be extended (e.g. for ext4). Instead of forcing filesystem to wait for AIO in such cases, add argument to iomap_dio_rw() which makes the function wait for IO completion. This also results in executing iomap_dio_complete() inline in iomap_dio_rw() providing its return value to the caller as for ordinary sync IO. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-09-25Merge tag 'iomap-5.4-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-4/+10
Pull iomap updates from Darrick Wong: "After last week's failed pull request attempt, I scuttled everything in the branch except for the directio endio api changes, which were trivial. Everything else will simply have to wait for the next cycle. Summary: - Report both io errors and short io results to the directio endio handler. - Allow directio callers to pass an ops structure to iomap_dio_rw" * tag 'iomap-5.4-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: move the iomap_dio_rw ->end_io callback into a structure iomap: split size and error for iomap_dio_rw ->end_io
2019-09-20iomap: move the iomap_dio_rw ->end_io callback into a structureChristoph Hellwig1-1/+5
Add a new iomap_dio_ops structure that for now just contains the end_io handler. This avoid storing the function pointer in a mutable structure, which is a possible exploit vector for kernel code execution, and prepares for adding a submit_io handler that btrfs needs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>