summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2018-11-27ceph: quota: fix null pointer dereference in quota checkLuis Henriques1-1/+2
[ Upstream commit 71f2cc64d027d712f29bf8d09d3e123302d5f245 ] This patch fixes a possible null pointer dereference in check_quota_exceeded, detected by the static checker smatch, with the following warning:    fs/ceph/quota.c:240 check_quota_exceeded()     error: we previously assumed 'realm' could be null (see line 188) Fixes: b7a2921765cf ("ceph: quota: support for ceph.quota.max_files") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-27fs/exofs: fix potential memory leak in mount option parsingChengguang Xu1-1/+4
[ Upstream commit 515f1867addaba49c1c6ac73abfaffbc192c1db4 ] There are some cases can cause memory leak when parsing option 'osdname'. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-27afs: Handle EIO from delivery functionDavid Howells1-1/+4
[ Upstream commit 4ac15ea53622272c01954461b4814892b7481b40 ] Fix afs_deliver_to_call() to handle -EIO being returned by the operation delivery function, indicating that the call found itself in the wrong state, by printing an error and aborting the call. Currently, an assertion failure will occur. This can happen, say, if the delivery function falls off the end without calling afs_extract_data() with the want_more parameter set to false to collect the end of the Rx phase of a call. The assertion failure looks like: AFS: Assertion failed 4 == 7 is false 0x4 == 0x7 is false ------------[ cut here ]------------ kernel BUG at fs/afs/rxrpc.c:462! and is matched in the trace buffer by a line like: kworker/7:3-3226 [007] ...1 85158.030203: afs_io_error: c=0003be0c r=-5 CM_REPLY Fixes: 98bf40cd99fc ("afs: Protect call->state changes against signals") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-27hfsplus: prevent btree data loss on root splitErnesto A. Fernández1-0/+4
[ Upstream commit 0a3021d4f5295aa073c7bf5c5e4de60a2e292578 ] Creating, renaming or deleting a file may cause catalog corruption and data loss. This bug is randomly triggered by xfstests generic/027, but here is a faster reproducer: truncate -s 50M fs.iso mkfs.hfsplus fs.iso mount fs.iso /mnt i=100 while [ $i -le 150 ]; do touch /mnt/$i &>/dev/null ((++i)) done i=100 while [ $i -le 150 ]; do mv /mnt/$i /mnt/$(perl -e "print $i x82") &>/dev/null ((++i)) done umount /mnt fsck.hfsplus -n fs.iso The bug is triggered whenever hfs_brec_update_parent() needs to split the root node. The height of the btree is not increased, which leaves the new node orphaned and its records lost. Link: http://lkml.kernel.org/r/26d882184fc43043a810114258f45277752186c7.1535682461.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-27hfs: prevent btree data loss on root splitErnesto A. Fernández1-0/+4
[ Upstream commit d057c036672f33d43a5f7344acbb08cf3a8a0c09 ] This bug is triggered whenever hfs_brec_update_parent() needs to split the root node. The height of the btree is not increased, which leaves the new node orphaned and its records lost. It is not possible for this to happen on a valid hfs filesystem because the index nodes have fixed length keys. For reasons I ignore, the hfs module does have support for a number of hfsplus features. A corrupt btree header may report variable length keys and trigger this bug, so it's better to fix it. Link: http://lkml.kernel.org/r/9750b1415685c4adca10766895f6d5ef12babdb0.1535682463.git.ernesto.mnd.fernandez@gmail.com Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-27reiserfs: propagate errors from fill_with_dentries() properlyJann Horn1-0/+7
[ Upstream commit b10298d56c9623f9b173f19959732d3184b35f4f ] fill_with_dentries() failed to propagate errors up to reiserfs_for_each_xattr() properly. Plumb them through. Note that reiserfs_for_each_xattr() is only used by reiserfs_delete_xattrs() and reiserfs_chown_xattrs(). The result of reiserfs_delete_xattrs() is discarded anyway, the only difference there is whether a warning is printed to dmesg. The result of reiserfs_chown_xattrs() does matter because it can block chowning of the file to which the xattrs belong; but either way, the resulting state can have misaligned ownership, so my patch doesn't improve things greatly. Credit for making me look at this code goes to Al Viro, who pointed out that the ->actor calling convention is suboptimal and should be changed. Link: http://lkml.kernel.org/r/20180802163335.83312-1-jannh@google.com Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-27cifs: fix return value for cifs_listxattrRonnie Sahlberg1-5/+6
[ Upstream commit 0c5d6cb6643f48ad3775322f3ebab6c7eb67484e ] If the application buffer was too small to fit all the names we would still count the number of bytes and return this for listxattr. This would then trigger a BUG in usercopy.c Fix the computation of the size so that we return -ERANGE correctly when the buffer is too small. This fixes the kernel BUG for xfstest generic/377 Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-27cifs: don't dereference smb_file_target before null checkColin Ian King1-2/+5
[ Upstream commit 8c6c9bed8773375b1d54ccca2911ec892c59db5d ] There is a null check on dst_file->private data which suggests it can be potentially null. However, before this check, pointer smb_file_target is derived from dst_file->private and dereferenced in the call to tlink_tcon, hence there is a potential null pointer deference. Fix this by assigning smb_file_target and target_tcon after the null pointer sanity checks. Detected by CoverityScan, CID#1475302 ("Dereference before null check") Fixes: 04b38d601239 ("vfs: pull btrfs clone API to vfs layer") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-21mm: don't reclaim inodes with many attached pagesRoman Gushchin1-2/+5
commit a76cf1a474d7dbcd9336b5f5afb0162baa142cf0 upstream. Spock reported that commit 172b06c32b94 ("mm: slowly shrink slabs with a relatively small number of objects") leads to a regression on his setup: periodically the majority of the pagecache is evicted without an obvious reason, while before the change the amount of free memory was balancing around the watermark. The reason behind is that the mentioned above change created some minimal background pressure on the inode cache. The problem is that if an inode is considered to be reclaimed, all belonging pagecache page are stripped, no matter how many of them are there. So, if a huge multi-gigabyte file is cached in the memory, and the goal is to reclaim only few slab objects (unused inodes), we still can eventually evict all gigabytes of the pagecache at once. The workload described by Spock has few large non-mapped files in the pagecache, so it's especially noticeable. To solve the problem let's postpone the reclaim of inodes, which have more than 1 attached page. Let's wait until the pagecache pages will be evicted naturally by scanning the corresponding LRU lists, and only then reclaim the inode structure. Link: http://lkml.kernel.org/r/20181023164302.20436-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reported-by: Spock <dairinin@gmail.com> Tested-by: Spock <dairinin@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: <stable@vger.kernel.org> [4.19.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21gfs2: Fix metadata read-ahead during truncate (2)Andreas Gruenbacher1-4/+10
commit e7445ceddfc220c1aede6d42758a5acb8844e9c3 upstream. The previous attempt to fix for metadata read-ahead during truncate was incorrect: for files with a height > 2 (1006989312 bytes with a block size of 4096 bytes), read-ahead requests were not being issued for some of the indirect blocks discovered while walking the metadata tree, leading to significant slow-downs when deleting large files. Fix that. In addition, only issue read-ahead requests in the first pass through the meta-data tree, while deallocating data blocks. Fixes: c3ce5aa9b0 ("gfs2: Fix metadata read-ahead during truncate") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21gfs2: Put bitmap buffers in put_superAndreas Gruenbacher1-1/+2
commit 10283ea525d30f2e99828978fd04d8427876a7ad upstream. gfs2_put_super calls gfs2_clear_rgrpd to destroy the gfs2_rgrpd objects attached to the resource group glocks. That function should release the buffers attached to the gfs2_bitmap objects (bi_bh), but the call to gfs2_rgrp_brelse for doing that is missing. When gfs2_releasepage later runs across these buffers which are still referenced, it refuses to free them. This causes the pages the buffers are attached to to remain referenced as well. With enough mount/unmount cycles, the system will eventually run out of memory. Fix this by adding the missing call to gfs2_rgrp_brelse in gfs2_clear_rgrpd. (Also fix a gfs2_rgrp_relse -> gfs2_rgrp_brelse typo in a comment.) Fixes: 39b0f1e92908 ("GFS2: Don't brelse rgrp buffer_heads every allocation") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21fuse: fix possibly missed wake-up after abortMiklos Szeredi1-3/+9
commit 2d84a2d19b6150c6dbac1e6ebad9c82e4c123772 upstream. In current fuse_drop_waiting() implementation it's possible that fuse_wait_aborted() will not be woken up in the unlikely case that fuse_abort_conn() + fuse_wait_aborted() runs in between checking fc->connected and calling atomic_dec(&fc->num_waiting). Do the atomic_dec_and_test() unconditionally, which also provides the necessary barrier against reordering with the fc->connected check. The explicit smp_mb() in fuse_wait_aborted() is not actually needed, since the spin_unlock() in fuse_abort_conn() provides the necessary RELEASE barrier after resetting fc->connected. However, this is not a performance sensitive path, and adding the explicit barrier makes it easier to document. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: b8f95e5d13f5 ("fuse: umount should wait for all requests") Cc: <stable@vger.kernel.org> #v4.19 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21fuse: fix leaked notify replyMiklos Szeredi1-1/+3
commit 7fabaf303458fcabb694999d6fa772cc13d4e217 upstream. fuse_request_send_notify_reply() may fail if the connection was reset for some reason (e.g. fs was unmounted). Don't leak request reference in this case. Besides leaking memory, this resulted in fc->num_waiting not being decremented and hence fuse_wait_aborted() left in a hanging and unkillable state. Fixes: 2d45ba381a74 ("fuse: add retrieve request") Fixes: b8f95e5d13f5 ("fuse: umount should wait for all requests") Reported-and-tested-by: syzbot+6339eda9cb4ebbc4c37b@syzkaller.appspotmail.com Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> #v2.6.36 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21fuse: fix use-after-free in fuse_direct_IO()Lukas Czerner1-1/+3
commit ebacb81273599555a7a19f7754a1451206a5fc4f upstream. In async IO blocking case the additional reference to the io is taken for it to survive fuse_aio_complete(). In non blocking case this additional reference is not needed, however we still reference io to figure out whether to wait for completion or not. This is wrong and will lead to use-after-free. Fix it by storing blocking information in separate variable. This was spotted by KASAN when running generic/208 fstest. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 744742d692e3 ("fuse: Add reference counting for fuse_io_priv") Cc: <stable@vger.kernel.org> # v4.6 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21nfsd: COPY and CLONE operations require the saved filehandle to be setScott Mayhew1-0/+3
commit 01310bb7c9c98752cc763b36532fab028e0f8f81 upstream. Make sure we have a saved filehandle, otherwise we'll oops with a null pointer dereference in nfs4_preprocess_stateid_op(). Signed-off-by: Scott Mayhew <smayhew@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21NFSv4: Don't exit the state manager without clearing NFS4CLNT_MANAGER_RUNNINGTrond Myklebust1-3/+5
commit 21a446cf186570168b7281b154b1993968598aca upstream. If we exit the NFSv4 state manager due to a umount, then we can end up leaving the NFS4CLNT_MANAGER_RUNNING flag set. If another mount causes the nfs4_client to be rereferenced before it is destroyed, then we end up never being able to recover state. Fixes: 47c2199b6eb5 ("NFSv4.1: Ensure state manager thread dies on last ...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21mnt: fix __detach_mounts infinite loopBenjamin Coddington1-3/+3
commit 1e9c75fb9c47a75a9aec0cd17db5f6dc36b58e00 upstream. Since commit ff17fa561a04 ("d_invalidate(): unhash immediately") immediately unhashes the dentry, we'll never return the mountpoint in lookup_mountpoint(), which can lead to an unbreakable loop in d_invalidate(). I have reports of NFS clients getting into this condition after the server removes an export of an existing mount created through follow_automount(), but I suspect there are various other ways to produce this problem if we hunt down users of d_invalidate(). For example, it is possible to get into this state by using XFS' d_invalidate() call in xfs_vn_unlink(): truncate -s 100m img{1,2} mkfs.xfs -q -n version=ci img1 mkfs.xfs -q -n version=ci img2 mkdir -p /mnt/xfs mount img1 /mnt/xfs mkdir /mnt/xfs/sub1 mount img2 /mnt/xfs/sub1 cat > /mnt/xfs/sub1/foo & umount -l /mnt/xfs/sub1 mount img2 /mnt/xfs/sub1 mount --make-private /mnt/xfs mkdir /mnt/xfs/sub2 mount --move /mnt/xfs/sub1 /mnt/xfs/sub2 rmdir /mnt/xfs/sub1 Fix this by moving the check for an unlinked dentry out of the detach_mounts() path. Fixes: ff17fa561a04 ("d_invalidate(): unhash immediately") Cc: stable@vger.kernel.org Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21mount: Prevent MNT_DETACH from disconnecting locked mountsEric W. Biederman1-1/+1
commit 9c8e0a1b683525464a2abe9fb4b54404a50ed2b4 upstream. Timothy Baldwin <timbaldwin@fastmail.co.uk> wrote: > As per mount_namespaces(7) unprivileged users should not be able to look under mount points: > > Mounts that come as a single unit from more privileged mount are locked > together and may not be separated in a less privileged mount namespace. > > However they can: > > 1. Create a mount namespace. > 2. In the mount namespace open a file descriptor to the parent of a mount point. > 3. Destroy the mount namespace. > 4. Use the file descriptor to look under the mount point. > > I have reproduced this with Linux 4.16.18 and Linux 4.18-rc8. > > The setup: > > $ sudo sysctl kernel.unprivileged_userns_clone=1 > kernel.unprivileged_userns_clone = 1 > $ mkdir -p A/B/Secret > $ sudo mount -t tmpfs hide A/B > > > "Secret" is indeed hidden as expected: > > $ ls -lR A > A: > total 0 > drwxrwxrwt 2 root root 40 Feb 12 21:08 B > > A/B: > total 0 > > > The attack revealing "Secret": > > $ unshare -Umr sh -c "exec unshare -m ls -lR /proc/self/fd/4/ 4<A" > /proc/self/fd/4/: > total 0 > drwxr-xr-x 3 root root 60 Feb 12 21:08 B > > /proc/self/fd/4/B: > total 0 > drwxr-xr-x 2 root root 40 Feb 12 21:08 Secret > > /proc/self/fd/4/B/Secret: > total 0 I tracked this down to put_mnt_ns running passing UMOUNT_SYNC and disconnecting all of the mounts in a mount namespace. Fix this by factoring drop_mounts out of drop_collected_mounts and passing 0 instead of UMOUNT_SYNC. There are two possible behavior differences that result from this. - No longer setting UMOUNT_SYNC will no longer set MNT_SYNC_UMOUNT on the vfsmounts being unmounted. This effects the lazy rcu walk by kicking the walk out of rcu mode and forcing it to be a non-lazy walk. - No longer disconnecting locked mounts will keep some mounts around longer as they stay because the are locked to other mounts. There are only two users of drop_collected mounts: audit_tree.c and put_mnt_ns. In audit_tree.c the mounts are private and there are no rcu lazy walks only calls to iterate_mounts. So the changes should have no effect except for a small timing effect as the connected mounts are disconnected. In put_mnt_ns there may be references from process outside the mount namespace to the mounts. So the mounts remaining connected will be the bug fix that is needed. That rcu walks are allowed to continue appears not to be a problem especially as the rcu walk change was about an implementation detail not about semantics. Cc: stable@vger.kernel.org Fixes: 5ff9d8a65ce8 ("vfs: Lock in place mounts from more privileged users") Reported-by: Timothy Baldwin <timbaldwin@fastmail.co.uk> Tested-by: Timothy Baldwin <timbaldwin@fastmail.co.uk> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21mount: Don't allow copying MNT_UNBINDABLE|MNT_LOCKED mountsEric W. Biederman1-2/+8
commit df7342b240185d58d3d9665c0bbf0a0f5570ec29 upstream. Jonathan Calmels from NVIDIA reported that he's able to bypass the mount visibility security check in place in the Linux kernel by using a combination of the unbindable property along with the private mount propagation option to allow a unprivileged user to see a path which was purposefully hidden by the root user. Reproducer: # Hide a path to all users using a tmpfs root@castiana:~# mount -t tmpfs tmpfs /sys/devices/ root@castiana:~# # As an unprivileged user, unshare user namespace and mount namespace stgraber@castiana:~$ unshare -U -m -r # Confirm the path is still not accessible root@castiana:~# ls /sys/devices/ # Make /sys recursively unbindable and private root@castiana:~# mount --make-runbindable /sys root@castiana:~# mount --make-private /sys # Recursively bind-mount the rest of /sys over to /mnnt root@castiana:~# mount --rbind /sys/ /mnt # Access our hidden /sys/device as an unprivileged user root@castiana:~# ls /mnt/devices/ breakpoint cpu cstate_core cstate_pkg i915 intel_pt isa kprobe LNXSYSTM:00 msr pci0000:00 platform pnp0 power software system tracepoint uncore_arb uncore_cbox_0 uncore_cbox_1 uprobe virtual Solve this by teaching copy_tree to fail if a mount turns out to be both unbindable and locked. Cc: stable@vger.kernel.org Fixes: 5ff9d8a65ce8 ("vfs: Lock in place mounts from more privileged users") Reported-by: Jonathan Calmels <jcalmels@nvidia.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21mount: Retest MNT_LOCKED in do_umountEric W. Biederman1-2/+8
commit 25d202ed820ee347edec0bf3bf553544556bf64b upstream. It was recently pointed out that the one instance of testing MNT_LOCKED outside of the namespace_sem is in ksys_umount. Fix that by adding a test inside of do_umount with namespace_sem and the mount_lock held. As it helps to fail fails the existing test is maintained with an additional comment pointing out that it may be racy because the locks are not held. Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Fixes: 5ff9d8a65ce8 ("vfs: Lock in place mounts from more privileged users") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: fix buffer leak in __ext4_read_dirblock() on error pathVasily Averin1-0/+1
commit de59fae0043f07de5d25e02ca360f7d57bfa5866 upstream. Fixes: dc6982ff4db1 ("ext4: refactor code to read directory blocks ...") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 3.9 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: fix buffer leak in ext4_expand_extra_isize_ea() on error pathVasily Averin1-2/+5
commit 53692ec074d00589c2cf1d6d17ca76ad0adce6ec upstream. Fixes: de05ca852679 ("ext4: move call to ext4_error() into ...") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 4.17 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: fix buffer leak in ext4_xattr_move_to_block() on error pathVasily Averin1-0/+2
commit 6bdc9977fcdedf47118d2caf7270a19f4b6d8a8f upstream. Fixes: 3f2571c1f91f ("ext4: factor out xattr moving") Fixes: 6dd4ee7cab7e ("ext4: Expand extra_inodes space per ...") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 2.6.23 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: release bs.bh before re-using in ext4_xattr_block_find()Vasily Averin1-0/+2
commit 45ae932d246f721e6584430017176cbcadfde610 upstream. bs.bh was taken in previous ext4_xattr_block_find() call, it should be released before re-using Fixes: 7e01c8e5420b ("ext3/4: fix uninitialized bs in ...") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 2.6.26 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: fix buffer leak in ext4_xattr_get_block() on error pathVasily Averin1-1/+3
commit ecaaf408478b6fb4d9986f9b6652f3824e374f4c upstream. Fixes: dec214d00e0d ("ext4: xattr inode deduplication") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 4.13 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: fix possible leak of s_journal_flag_rwsem in error pathVasily Averin1-0/+1
commit af18e35bfd01e6d65a5e3ef84ffe8b252d1628c5 upstream. Fixes: c8585c6fcaf2 ("ext4: fix races between changing inode journal ...") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 4.7 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: fix possible leak of sbi->s_group_desc_leak in error pathTheodore Ts'o1-8/+8
commit 9e463084cdb22e0b56b2dfbc50461020409a5fd3 upstream. Fixes: bfe0a5f47ada ("ext4: add more mount time checks of the superblock") Reported-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 4.18 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: avoid possible double brelse() in add_new_gdb() on error pathTheodore Ts'o1-0/+1
commit 4f32c38b4662312dd3c5f113d8bdd459887fb773 upstream. Fixes: b40971426a83 ("ext4: add error checking to calls to ...") Reported-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 2.6.38 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: fix missing cleanup if ext4_alloc_flex_bg_array() fails while resizingVasily Averin1-1/+1
commit f348e2241fb73515d65b5d77dd9c174128a7fbf2 upstream. Fixes: 117fff10d7f1 ("ext4: grow the s_flex_groups array as needed ...") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 3.7 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: avoid buffer leak in ext4_orphan_add() after prior errorsVasily Averin1-1/+3
commit feaf264ce7f8d54582e2f66eb82dd9dd124c94f3 upstream. Fixes: d745a8c20c1f ("ext4: reduce contention on s_orphan_lock") Fixes: 6e3617e579e0 ("ext4: Handle non empty on-disk orphan link") Cc: Dmitry Monakhov <dmonakhov@gmail.com> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 2.6.34 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: avoid buffer leak on shutdown in ext4_mark_iloc_dirty()Vasily Averin1-2/+3
commit a6758309a005060b8297a538a457c88699cb2520 upstream. ext4_mark_iloc_dirty() callers expect that it releases iloc->bh even if it returns an error. Fixes: 0db1ff222d40 ("ext4: add shutdown bit and check for it") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 4.11 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: fix possible inode leak in the retry loop of ext4_resize_fs()Vasily Averin1-0/+4
commit db6aee62406d9fbb53315fcddd81f1dc271d49fa upstream. Fixes: 1c6bd7173d66 ("ext4: convert file system to meta_bg if needed ...") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 3.7 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: missing !bh check in ext4_xattr_inode_write()Vasily Averin1-0/+6
commit eb6984fa4ce2837dcb1f66720a600f31b0bb3739 upstream. According to Ted Ts'o ext4_getblk() called in ext4_xattr_inode_write() should not return bh = NULL The only time that bh could be NULL, then, would be in the case of something really going wrong; a programming error elsewhere (perhaps a wild pointer dereference) or I/O error causing on-disk file system corruption (although that would be highly unlikely given that we had *just* allocated the blocks and so the metadata blocks in question probably would still be in the cache). Fixes: e50e5129f384 ("ext4: xattr-in-inode support") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 4.13 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: avoid potential extra brelse in setup_new_flex_group_blocks()Vasily Averin1-6/+2
commit 9e4028935cca3f9ef9b6a90df9da6f1f94853536 upstream. Currently bh is set to NULL only during first iteration of for cycle, then this pointer is not cleared after end of using. Therefore rollback after errors can lead to extra brelse(bh) call, decrements bh counter and later trigger an unexpected warning in __brelse() Patch moves brelse() calls in body of cycle to exclude requirement of brelse() call in rollback. Fixes: 33afdcc5402d ("ext4: add a function which sets up group blocks ...") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 3.3+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: add missing brelse() add_new_gdb_meta_bg()'s error pathVasily Averin1-2/+1
commit 61a9c11e5e7a0dab5381afa5d9d4dd5ebf18f7a0 upstream. Fixes: 01f795f9e0d6 ("ext4: add online resizing support for meta_bg ...") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 3.7 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: add missing brelse() in set_flexbg_block_bitmap()'s error pathVasily Averin1-2/+4
commit cea5794122125bf67559906a0762186cf417099c upstream. Fixes: 33afdcc5402d ("ext4: add a function which sets up group blocks ...") Cc: stable@kernel.org # 3.3 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ext4: add missing brelse() update_backups()'s error pathVasily Averin1-1/+3
commit ea0abbb648452cdb6e1734b702b6330a7448fcf8 upstream. Fixes: ac27a0ec112a ("ext4: initial copy of files from ext3") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # 2.6.19 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21btrfs: tree-checker: Fix misleading group system informationShaokun Zhang1-1/+1
commit 761333f2f50ccc887aa9957ae829300262c0d15b upstream. block_group_err shows the group system as a decimal value with a '0x' prefix, which is somewhat misleading. Fix it to print hexadecimal, as was intended. Fixes: fce466eab7ac6 ("btrfs: tree-checker: Verify block_group_item") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21Btrfs: fix data corruption due to cloning of eof blockFilipe Manana1-2/+10
commit ac765f83f1397646c11092a032d4f62c3d478b81 upstream. We currently allow cloning a range from a file which includes the last block of the file even if the file's size is not aligned to the block size. This is fine and useful when the destination file has the same size, but when it does not and the range ends somewhere in the middle of the destination file, it leads to corruption because the bytes between the EOF and the end of the block have undefined data (when there is support for discard/trimming they have a value of 0x00). Example: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ export foo_size=$((256 * 1024 + 100)) $ xfs_io -f -c "pwrite -S 0x3c 0 $foo_size" /mnt/foo $ xfs_io -f -c "pwrite -S 0xb5 0 1M" /mnt/bar $ xfs_io -c "reflink /mnt/foo 0 512K $foo_size" /mnt/bar $ od -A d -t x1 /mnt/bar 0000000 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 * 0524288 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c * 0786528 3c 3c 3c 3c 00 00 00 00 00 00 00 00 00 00 00 00 0786544 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 0790528 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 * 1048576 The bytes in the range from 786532 (512Kb + 256Kb + 100 bytes) to 790527 (512Kb + 256Kb + 4Kb - 1) got corrupted, having now a value of 0x00 instead of 0xb5. This is similar to the problem we had for deduplication that got recently fixed by commit de02b9f6bb65 ("Btrfs: fix data corruption when deduplicating between different files"). Fix this by not allowing such operations to be performed and return the errno -EINVAL to user space. This is what XFS is doing as well at the VFS level. This change however now makes us return -EINVAL instead of -EOPNOTSUPP for cases where the source range maps to an inline extent and the destination range's end is smaller then the destination file's size, since the detection of inline extents is done during the actual process of dropping file extent items (at __btrfs_drop_extents()). Returning the -EINVAL error is done early on and solely based on the input parameters (offsets and length) and destination file's size. This makes us consistent with XFS and anyone else supporting cloning since this case is now checked at a higher level in the VFS and is where the -EINVAL will be returned from starting with kernel 4.20 (the VFS changed was introduced in 4.20-rc1 by commit 07d19dc9fbe9 ("vfs: avoid problematic remapping requests into partial EOF block"). So this change is more geared towards stable kernels, as it's unlikely the new VFS checks get removed intentionally. A test case for fstests follows soon, as well as an update to filter existing tests that expect -EOPNOTSUPP to accept -EINVAL as well. CC: <stable@vger.kernel.org> # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21Btrfs: fix infinite loop on inode eviction after deduplication of eof blockFilipe Manana1-0/+2
commit 11023d3f5fdf89bba5e1142127701ca6e6014587 upstream. If we attempt to deduplicate the last block of a file A into the middle of a file B, and file A's size is not a multiple of the block size, we end rounding the deduplication length to 0 bytes, to avoid the data corruption issue fixed by commit de02b9f6bb65 ("Btrfs: fix data corruption when deduplicating between different files"). However a length of zero will cause the insertion of an extent state with a start value greater (by 1) then the end value, leading to a corrupt extent state that will trigger a warning and cause chaos such as an infinite loop during inode eviction. Example trace: [96049.833585] ------------[ cut here ]------------ [96049.833714] WARNING: CPU: 0 PID: 24448 at fs/btrfs/extent_io.c:436 insert_state+0x101/0x120 [btrfs] [96049.833767] CPU: 0 PID: 24448 Comm: xfs_io Not tainted 4.19.0-rc7-btrfs-next-39 #1 [96049.833768] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [96049.833780] RIP: 0010:insert_state+0x101/0x120 [btrfs] [96049.833783] RSP: 0018:ffffafd2c3707af0 EFLAGS: 00010282 [96049.833785] RAX: 0000000000000000 RBX: 000000000004dfff RCX: 0000000000000006 [96049.833786] RDX: 0000000000000007 RSI: ffff99045c143230 RDI: ffff99047b2168a0 [96049.833787] RBP: ffff990457851cd0 R08: 0000000000000001 R09: 0000000000000000 [96049.833787] R10: ffffafd2c3707ab8 R11: 0000000000000000 R12: ffff9903b93b12c8 [96049.833788] R13: 000000000004e000 R14: ffffafd2c3707b80 R15: ffffafd2c3707b78 [96049.833790] FS: 00007f5c14e7d700(0000) GS:ffff99047b200000(0000) knlGS:0000000000000000 [96049.833791] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [96049.833792] CR2: 00007f5c146abff8 CR3: 0000000115f4c004 CR4: 00000000003606f0 [96049.833795] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [96049.833796] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [96049.833796] Call Trace: [96049.833809] __set_extent_bit+0x46c/0x6a0 [btrfs] [96049.833823] lock_extent_bits+0x6b/0x210 [btrfs] [96049.833831] ? _raw_spin_unlock+0x24/0x30 [96049.833841] ? test_range_bit+0xdf/0x130 [btrfs] [96049.833853] lock_extent_range+0x8e/0x150 [btrfs] [96049.833864] btrfs_double_extent_lock+0x78/0xb0 [btrfs] [96049.833875] btrfs_extent_same_range+0x14e/0x550 [btrfs] [96049.833885] ? rcu_read_lock_sched_held+0x3f/0x70 [96049.833890] ? __kmalloc_node+0x2b0/0x2f0 [96049.833899] ? btrfs_dedupe_file_range+0x19a/0x280 [btrfs] [96049.833909] btrfs_dedupe_file_range+0x270/0x280 [btrfs] [96049.833916] vfs_dedupe_file_range_one+0xd9/0xe0 [96049.833919] vfs_dedupe_file_range+0x131/0x1b0 [96049.833924] do_vfs_ioctl+0x272/0x6e0 [96049.833927] ? __fget+0x113/0x200 [96049.833931] ksys_ioctl+0x70/0x80 [96049.833933] __x64_sys_ioctl+0x16/0x20 [96049.833937] do_syscall_64+0x60/0x1b0 [96049.833939] entry_SYSCALL_64_after_hwframe+0x49/0xbe [96049.833941] RIP: 0033:0x7f5c1478ddd7 [96049.833943] RSP: 002b:00007ffe15b196a8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [96049.833945] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5c1478ddd7 [96049.833946] RDX: 00005625ece322d0 RSI: 00000000c0189436 RDI: 0000000000000004 [96049.833947] RBP: 0000000000000000 R08: 00007f5c14a46f48 R09: 0000000000000040 [96049.833948] R10: 0000000000000541 R11: 0000000000000202 R12: 0000000000000000 [96049.833949] R13: 0000000000000000 R14: 0000000000000004 R15: 00005625ece322d0 [96049.833954] irq event stamp: 6196 [96049.833956] hardirqs last enabled at (6195): [<ffffffff91b00663>] console_unlock+0x503/0x640 [96049.833958] hardirqs last disabled at (6196): [<ffffffff91a037dd>] trace_hardirqs_off_thunk+0x1a/0x1c [96049.833959] softirqs last enabled at (6114): [<ffffffff92600370>] __do_softirq+0x370/0x421 [96049.833964] softirqs last disabled at (6095): [<ffffffff91a8dd4d>] irq_exit+0xcd/0xe0 [96049.833965] ---[ end trace db7b05f01b7fa10c ]--- [96049.935816] R13: 0000000000000000 R14: 00005562e5259240 R15: 00007ffff092b910 [96049.935822] irq event stamp: 6584 [96049.935823] hardirqs last enabled at (6583): [<ffffffff91b00663>] console_unlock+0x503/0x640 [96049.935825] hardirqs last disabled at (6584): [<ffffffff91a037dd>] trace_hardirqs_off_thunk+0x1a/0x1c [96049.935827] softirqs last enabled at (6328): [<ffffffff92600370>] __do_softirq+0x370/0x421 [96049.935828] softirqs last disabled at (6313): [<ffffffff91a8dd4d>] irq_exit+0xcd/0xe0 [96049.935829] ---[ end trace db7b05f01b7fa123 ]--- [96049.935840] ------------[ cut here ]------------ [96049.936065] WARNING: CPU: 1 PID: 24463 at fs/btrfs/extent_io.c:436 insert_state+0x101/0x120 [btrfs] [96049.936107] CPU: 1 PID: 24463 Comm: umount Tainted: G W 4.19.0-rc7-btrfs-next-39 #1 [96049.936108] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [96049.936117] RIP: 0010:insert_state+0x101/0x120 [btrfs] [96049.936119] RSP: 0018:ffffafd2c3637bc0 EFLAGS: 00010282 [96049.936120] RAX: 0000000000000000 RBX: 000000000004dfff RCX: 0000000000000006 [96049.936121] RDX: 0000000000000007 RSI: ffff990445cf88e0 RDI: ffff99047b2968a0 [96049.936122] RBP: ffff990457851cd0 R08: 0000000000000001 R09: 0000000000000000 [96049.936123] R10: ffffafd2c3637b88 R11: 0000000000000000 R12: ffff9904574301e8 [96049.936124] R13: 000000000004e000 R14: ffffafd2c3637c50 R15: ffffafd2c3637c48 [96049.936125] FS: 00007fe4b87e72c0(0000) GS:ffff99047b280000(0000) knlGS:0000000000000000 [96049.936126] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [96049.936128] CR2: 00005562e52618d8 CR3: 00000001151c8005 CR4: 00000000003606e0 [96049.936129] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [96049.936131] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [96049.936131] Call Trace: [96049.936141] __set_extent_bit+0x46c/0x6a0 [btrfs] [96049.936154] lock_extent_bits+0x6b/0x210 [btrfs] [96049.936167] btrfs_evict_inode+0x1e1/0x5a0 [btrfs] [96049.936172] evict+0xbf/0x1c0 [96049.936174] dispose_list+0x51/0x80 [96049.936176] evict_inodes+0x193/0x1c0 [96049.936180] generic_shutdown_super+0x3f/0x110 [96049.936182] kill_anon_super+0xe/0x30 [96049.936189] btrfs_kill_super+0x13/0x100 [btrfs] [96049.936191] deactivate_locked_super+0x3a/0x70 [96049.936193] cleanup_mnt+0x3b/0x80 [96049.936195] task_work_run+0x93/0xc0 [96049.936198] exit_to_usermode_loop+0xfa/0x100 [96049.936201] do_syscall_64+0x17f/0x1b0 [96049.936202] entry_SYSCALL_64_after_hwframe+0x49/0xbe [96049.936204] RIP: 0033:0x7fe4b80cfb37 [96049.936206] RSP: 002b:00007ffff092b688 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [96049.936207] RAX: 0000000000000000 RBX: 00005562e5259060 RCX: 00007fe4b80cfb37 [96049.936208] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00005562e525faa0 [96049.936209] RBP: 00005562e525faa0 R08: 00005562e525f770 R09: 0000000000000015 [96049.936210] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fe4b85d1e64 [96049.936211] R13: 0000000000000000 R14: 00005562e5259240 R15: 00007ffff092b910 [96049.936211] R13: 0000000000000000 R14: 00005562e5259240 R15: 00007ffff092b910 [96049.936216] irq event stamp: 6616 [96049.936219] hardirqs last enabled at (6615): [<ffffffff91b00663>] console_unlock+0x503/0x640 [96049.936219] hardirqs last disabled at (6616): [<ffffffff91a037dd>] trace_hardirqs_off_thunk+0x1a/0x1c [96049.936222] softirqs last enabled at (6328): [<ffffffff92600370>] __do_softirq+0x370/0x421 [96049.936222] softirqs last disabled at (6313): [<ffffffff91a8dd4d>] irq_exit+0xcd/0xe0 [96049.936223] ---[ end trace db7b05f01b7fa124 ]--- The second stack trace, from inode eviction, is repeated forever due to the infinite loop during eviction. This is the same type of problem fixed way back in 2015 by commit 113e8283869b ("Btrfs: fix inode eviction infinite loop after extent_same ioctl") and commit ccccf3d67294 ("Btrfs: fix inode eviction infinite loop after cloning into it"). So fix this by returning immediately if the deduplication range length gets rounded down to 0 bytes, as there is nothing that needs to be done in such case. Example reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0xe6 0 100" /mnt/foo $ xfs_io -f -c "pwrite -S 0xe6 0 1M" /mnt/bar # Unmount the filesystem and mount it again so that we start without any # extent state records when we ask for the deduplication. $ umount /mnt $ mount /dev/sdb /mnt $ xfs_io -c "dedupe /mnt/foo 0 500K 100" /mnt/bar # This unmount triggers the infinite loop. $ umount /mnt A test case for fstests will follow soon. Fixes: de02b9f6bb65 ("Btrfs: fix data corruption when deduplicating between different files") CC: <stable@vger.kernel.org> # 4.19+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21Btrfs: fix cur_offset in the error case for nocowRobbie Ko1-3/+2
commit 506481b20e818db40b6198815904ecd2d6daee64 upstream. When the cow_file_range fails, the related resources are unlocked according to the range [start..end), so the unlock cannot be repeated in run_delalloc_nocow. In some cases (e.g. cur_offset <= end && cow_start != -1), cur_offset is not updated correctly, so move the cur_offset update before cow_file_range. kernel BUG at mm/page-writeback.c:2663! Internal error: Oops - BUG: 0 [#1] SMP CPU: 3 PID: 31525 Comm: kworker/u8:7 Tainted: P O Hardware name: Realtek_RTD1296 (DT) Workqueue: writeback wb_workfn (flush-btrfs-1) task: ffffffc076db3380 ti: ffffffc02e9ac000 task.ti: ffffffc02e9ac000 PC is at clear_page_dirty_for_io+0x1bc/0x1e8 LR is at clear_page_dirty_for_io+0x14/0x1e8 pc : [<ffffffc00033c91c>] lr : [<ffffffc00033c774>] pstate: 40000145 sp : ffffffc02e9af4f0 Process kworker/u8:7 (pid: 31525, stack limit = 0xffffffc02e9ac020) Call trace: [<ffffffc00033c91c>] clear_page_dirty_for_io+0x1bc/0x1e8 [<ffffffbffc514674>] extent_clear_unlock_delalloc+0x1e4/0x210 [btrfs] [<ffffffbffc4fb168>] run_delalloc_nocow+0x3b8/0x948 [btrfs] [<ffffffbffc4fb948>] run_delalloc_range+0x250/0x3a8 [btrfs] [<ffffffbffc514c0c>] writepage_delalloc.isra.21+0xbc/0x1d8 [btrfs] [<ffffffbffc516048>] __extent_writepage+0xe8/0x248 [btrfs] [<ffffffbffc51630c>] extent_write_cache_pages.isra.17+0x164/0x378 [btrfs] [<ffffffbffc5185a8>] extent_writepages+0x48/0x68 [btrfs] [<ffffffbffc4f5828>] btrfs_writepages+0x20/0x30 [btrfs] [<ffffffc00033d758>] do_writepages+0x30/0x88 [<ffffffc0003ba0f4>] __writeback_single_inode+0x34/0x198 [<ffffffc0003ba6c4>] writeback_sb_inodes+0x184/0x3c0 [<ffffffc0003ba96c>] __writeback_inodes_wb+0x6c/0xc0 [<ffffffc0003bac20>] wb_writeback+0x1b8/0x1c0 [<ffffffc0003bb0f0>] wb_workfn+0x150/0x250 [<ffffffc0002b0014>] process_one_work+0x1dc/0x388 [<ffffffc0002b02f0>] worker_thread+0x130/0x500 [<ffffffc0002b6344>] kthread+0x10c/0x110 [<ffffffc000284590>] ret_from_fork+0x10/0x40 Code: d503201f a9025bb5 a90363b7 f90023b9 (d4210000) CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21Btrfs: fix missing data checksums after a ranged fsync (msync)Filipe Manana1-0/+17
commit 008c6753f7e070c77c70d708a6bf0255b4381763 upstream. Recently we got a massive simplification for fsync, where for the fast path we no longer log new extents while their respective ordered extents are still running. However that simplification introduced a subtle regression for the case where we use a ranged fsync (msync). Consider the following example: CPU 0 CPU 1 mmap write to range [2Mb, 4Mb[ mmap write to range [512Kb, 1Mb[ msync range [512K, 1Mb[ --> triggers fast fsync (BTRFS_INODE_NEEDS_FULL_SYNC not set) --> creates extent map A for this range and adds it to list of modified extents --> starts ordered extent A for this range --> waits for it to complete writeback triggered for range [2Mb, 4Mb[ --> create extent map B and adds it to the list of modified extents --> creates ordered extent B --> start looking for and logging modified extents --> logs extent maps A and B --> finds checksums for extent A in the csum tree, but not for extent B fsync (msync) finishes --> ordered extent B finishes and its checksums are added to the csum tree <power cut> After replaying the log, we have the extent covering the range [2Mb, 4Mb[ but do not have the data checksum items covering that file range. This happens because at the very beginning of an fsync (btrfs_sync_file()) we start and wait for IO in the given range [512Kb, 1Mb[ and therefore wait for any ordered extents in that range to complete before we start logging the extents. However if right before we start logging the extent in our range [512Kb, 1Mb[, writeback is started for any other dirty range, such as the range [2Mb, 4Mb[ due to memory pressure or a concurrent fsync or msync (btrfs_sync_file() starts writeback before acquiring the inode's lock), an ordered extent is created for that other range and a new extent map is created to represent that range and added to the inode's list of modified extents. That means that we will see that other extent in that list when collecting extents for logging (done at btrfs_log_changed_extents()) and log the extent before the respective ordered extent finishes - namely before the checksum items are added to the checksums tree, which is where log_extent_csums() looks for the checksums, therefore making us log an extent without logging its checksums. Before that massive simplification of fsync, this wasn't a problem because besides looking for checkums in the checksums tree, we also looked for them in any ordered extent still running. The consequence of data checksums missing for a file range is that users attempting to read the affected file range will get -EIO errors and dmesg reports the following: [10188.358136] BTRFS info (device sdc): no csum found for inode 297 start 57344 [10188.359278] BTRFS warning (device sdc): csum failed root 5 ino 297 off 57344 csum 0x98f94189 expected csum 0x00000000 mirror 1 So fix this by skipping extents outside of our logging range at btrfs_log_changed_extents() and leaving them on the list of modified extents so that any subsequent ranged fsync may collect them if needed. Also, if we find a hole extent outside of the range still log it, just to prevent having gaps between extent items after replaying the log, otherwise fsck will complain when we are not using the NO_HOLES feature (fstest btrfs/056 triggers such case). Fixes: e7175a692765 ("btrfs: remove the wait ordered logic in the log_one_extent path") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21btrfs: fix pinned underflow after transaction abortedLu Fengqi1-1/+11
commit fcd5e74288f7d36991b1f0fb96b8c57079645e38 upstream. When running generic/475, we may get the following warning in dmesg: [ 6902.102154] WARNING: CPU: 3 PID: 18013 at fs/btrfs/extent-tree.c:9776 btrfs_free_block_groups+0x2af/0x3b0 [btrfs] [ 6902.109160] CPU: 3 PID: 18013 Comm: umount Tainted: G W O 4.19.0-rc8+ #8 [ 6902.110971] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [ 6902.112857] RIP: 0010:btrfs_free_block_groups+0x2af/0x3b0 [btrfs] [ 6902.118921] RSP: 0018:ffffc9000459bdb0 EFLAGS: 00010286 [ 6902.120315] RAX: ffff880175050bb0 RBX: ffff8801124a8000 RCX: 0000000000170007 [ 6902.121969] RDX: 0000000000000002 RSI: 0000000000170007 RDI: ffffffff8125fb74 [ 6902.123716] RBP: ffff880175055d10 R08: 0000000000000000 R09: 0000000000000000 [ 6902.125417] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880175055d88 [ 6902.127129] R13: ffff880175050bb0 R14: 0000000000000000 R15: dead000000000100 [ 6902.129060] FS: 00007f4507223780(0000) GS:ffff88017ba00000(0000) knlGS:0000000000000000 [ 6902.130996] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6902.132558] CR2: 00005623599cac78 CR3: 000000014b700001 CR4: 00000000003606e0 [ 6902.134270] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 6902.135981] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 6902.137836] Call Trace: [ 6902.138939] close_ctree+0x171/0x330 [btrfs] [ 6902.140181] ? kthread_stop+0x146/0x1f0 [ 6902.141277] generic_shutdown_super+0x6c/0x100 [ 6902.142517] kill_anon_super+0x14/0x30 [ 6902.143554] btrfs_kill_super+0x13/0x100 [btrfs] [ 6902.144790] deactivate_locked_super+0x2f/0x70 [ 6902.146014] cleanup_mnt+0x3b/0x70 [ 6902.147020] task_work_run+0x9e/0xd0 [ 6902.148036] do_syscall_64+0x470/0x600 [ 6902.149142] ? trace_hardirqs_off_thunk+0x1a/0x1c [ 6902.150375] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 6902.151640] RIP: 0033:0x7f45077a6a7b [ 6902.157324] RSP: 002b:00007ffd589f3e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 6902.159187] RAX: 0000000000000000 RBX: 000055e8eec732b0 RCX: 00007f45077a6a7b [ 6902.160834] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055e8eec73490 [ 6902.162526] RBP: 0000000000000000 R08: 000055e8eec734b0 R09: 00007ffd589f26c0 [ 6902.164141] R10: 0000000000000000 R11: 0000000000000246 R12: 000055e8eec73490 [ 6902.165815] R13: 00007f4507ac61a4 R14: 0000000000000000 R15: 00007ffd589f40d8 [ 6902.167553] irq event stamp: 0 [ 6902.168998] hardirqs last enabled at (0): [<0000000000000000>] (null) [ 6902.170731] hardirqs last disabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00 [ 6902.172773] softirqs last enabled at (0): [<ffffffff810cd810>] copy_process.part.55+0x3b0/0x1f00 [ 6902.174671] softirqs last disabled at (0): [<0000000000000000>] (null) [ 6902.176407] ---[ end trace 463138c2986b275c ]--- [ 6902.177636] BTRFS info (device dm-3): space_info 4 has 273465344 free, is not full [ 6902.179453] BTRFS info (device dm-3): space_info total=276824064, used=4685824, pinned=18446744073708158976, reserved=0, may_use=0, readonly=65536 In the above line there's "pinned=18446744073708158976" which is an unsigned u64 value of -1392640, an obvious underflow. When transaction_kthread is running cleanup_transaction(), another fsstress is running btrfs_commit_transaction(). The btrfs_finish_extent_commit() may get the same range as btrfs_destroy_pinned_extent() got, which causes the pinned underflow. Fixes: d4b450cd4b33 ("Btrfs: fix race between transaction commit and empty block group removal") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ocfs2: free up write context when direct IO failedWengang Wang2-2/+19
commit 5040f8df56fb90c7919f1c9b0b6e54c843437456 upstream. The write context should also be freed even when direct IO failed. Otherwise a memory leak is introduced and entries remain in oi->ip_unwritten_list causing the following BUG later in unlink path: ERROR: bug expression: !list_empty(&oi->ip_unwritten_list) ERROR: Clear inode of 215043, inode has unwritten extents ... Call Trace: ? __set_current_blocked+0x42/0x68 ocfs2_evict_inode+0x91/0x6a0 [ocfs2] ? bit_waitqueue+0x40/0x33 evict+0xdb/0x1af iput+0x1a2/0x1f7 do_unlinkat+0x194/0x28f SyS_unlinkat+0x1b/0x2f do_syscall_64+0x79/0x1ae entry_SYSCALL_64_after_hwframe+0x151/0x0 This patch also logs, with frequency limit, direct IO failures. Link: http://lkml.kernel.org/r/20181102170632.25921-1-wen.gang.wang@oracle.com Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Changwei Ge <ge.changwei@h3c.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21ocfs2: fix a misuse a of brelse after failing ocfs2_check_dir_entryChangwei Ge1-2/+1
commit 29aa30167a0a2e6045a0d6d2e89d8168132333d5 upstream. Somehow, file system metadata was corrupted, which causes ocfs2_check_dir_entry() to fail in function ocfs2_dir_foreach_blk_el(). According to the original design intention, if above happens we should skip the problematic block and continue to retrieve dir entry. But there is obviouse misuse of brelse around related code. After failure of ocfs2_check_dir_entry(), current code just moves to next position and uses the problematic buffer head again and again during which the problematic buffer head is released for multiple times. I suppose, this a serious issue which is long-lived in ocfs2. This may cause other file systems which is also used in a the same host insane. So we should also consider about bakcporting this patch into linux -stable. Link: http://lkml.kernel.org/r/HK2PR06MB045211675B43EED794E597B6D56E0@HK2PR06MB0452.apcprd06.prod.outlook.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Suggested-by: Changkuo Shi <shi.changkuo@h3c.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21Revert "ceph: fix dentry leak in splice_dentry()"Yan, Zheng1-2/+6
commit efe328230dc01aa0b1269aad0b5fae73eea4677a upstream. This reverts commit 8b8f53af1ed9df88a4c0fbfdf3db58f62060edf3. splice_dentry() is used by three places. For two places, req->r_dentry is passed to splice_dentry(). In the case of error, req->r_dentry does not get updated. So splice_dentry() should not drop reference. Cc: stable@vger.kernel.org # 4.18+ Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21fuse: set FR_SENT while lockedMiklos Szeredi1-1/+1
commit 4c316f2f3ff315cb48efb7435621e5bfb81df96d upstream. Otherwise fuse_dev_do_write() could come in and finish off the request, and the set_bit(FR_SENT, ...) could trigger the WARN_ON(test_bit(FR_SENT, ...)) in request_end(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reported-by: syzbot+ef054c4d3f64cd7f7cec@syzkaller.appspotmai Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts") Cc: <stable@vger.kernel.org> # v4.2 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21fuse: fix blocked_waitq wakeupMiklos Szeredi1-4/+11
commit 908a572b80f6e9577b45e81b3dfe2e22111286b8 upstream. Using waitqueue_active() is racy. Make sure we issue a wake_up() unconditionally after storing into fc->blocked. After that it's okay to optimize with waitqueue_active() since the first wake up provides the necessary barrier for all waiters, not the just the woken one. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 3c18ef8117f0 ("fuse: optimize wake_up") Cc: <stable@vger.kernel.org> # v3.10 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21fuse: Fix use-after-free in fuse_dev_do_write()Kirill Tkhai1-1/+5
commit d2d2d4fb1f54eff0f3faa9762d84f6446a4bc5d0 upstream. After we found req in request_find() and released the lock, everything may happen with the req in parallel: cpu0 cpu1 fuse_dev_do_write() fuse_dev_do_write() req = request_find(fpq, ...) ... spin_unlock(&fpq->lock) ... ... req = request_find(fpq, oh.unique) ... spin_unlock(&fpq->lock) queue_interrupt(&fc->iq, req); ... ... ... ... ... request_end(fc, req); fuse_put_request(fc, req); ... queue_interrupt(&fc->iq, req); Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts") Cc: <stable@vger.kernel.org> # v4.2 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21fuse: Fix use-after-free in fuse_dev_do_read()Kirill Tkhai1-0/+2
commit bc78abbd55dd28e2287ec6d6502b842321a17c87 upstream. We may pick freed req in this way: [cpu0] [cpu1] fuse_dev_do_read() fuse_dev_do_write() list_move_tail(&req->list, ...); ... spin_unlock(&fpq->lock); ... ... request_end(fc, req); ... fuse_put_request(fc, req); if (test_bit(FR_INTERRUPTED, ...)) queue_interrupt(fiq, req); Fix that by keeping req alive until we finish all manipulations. Reported-by: syzbot+4e975615ca01f2277bdd@syzkaller.appspotmail.com Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 46c34a348b0a ("fuse: no fc->lock for pqueue parts") Cc: <stable@vger.kernel.org> # v4.2 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>