summaryrefslogtreecommitdiff
path: root/fs/btrfs/ioctl.c
AgeCommit message (Collapse)AuthorFilesLines
2020-11-18btrfs: fix potential overflow in cluster_pages_for_defrag on 32bit archMatthew Wilcox (Oracle)1-6/+4
commit a1fbc6750e212c5675a4e48d7f51d44607eb8756 upstream. On 32-bit systems, this shift will overflow for files larger than 4GB as start_index is unsigned long while the calls to btrfs_delalloc_*_space expect u64. CC: stable@vger.kernel.org # 4.4+ Fixes: df480633b891 ("btrfs: extent-tree: Switch to new delalloc space reserve and release") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Sterba <dsterba@suse.com> [ define the variable instead of repeating the shift ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-18btrfs: reschedule when cloning lots of extentsJohannes Thumshirn1-0/+2
[ Upstream commit 6b613cc97f0ace77f92f7bc112b8f6ad3f52baf8 ] We have several occurrences of a soft lockup from fstest's generic/175 testcase, which look more or less like this one: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [xfs_io:10030] Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 10030 Comm: xfs_io Tainted: G L 5.9.0-rc5+ #768 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014 Call Trace: <IRQ> dump_stack+0x77/0xa0 panic+0xfa/0x2cb watchdog_timer_fn.cold+0x85/0xa5 ? lockup_detector_update_enable+0x50/0x50 __hrtimer_run_queues+0x99/0x4c0 ? recalibrate_cpu_khz+0x10/0x10 hrtimer_run_queues+0x9f/0xb0 update_process_times+0x28/0x80 tick_handle_periodic+0x1b/0x60 __sysvec_apic_timer_interrupt+0x76/0x210 asm_call_on_stack+0x12/0x20 </IRQ> sysvec_apic_timer_interrupt+0x7f/0x90 asm_sysvec_apic_timer_interrupt+0x12/0x20 RIP: 0010:btrfs_tree_unlock+0x91/0x1a0 [btrfs] RSP: 0018:ffffc90007123a58 EFLAGS: 00000282 RAX: ffff8881cea2fbe0 RBX: ffff8881cea2fbe0 RCX: 0000000000000000 RDX: ffff8881d23fd200 RSI: ffffffff82045220 RDI: ffff8881cea2fba0 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000032 R10: 0000160000000000 R11: 0000000000001000 R12: 0000000000001000 R13: ffff8882357fd5b0 R14: ffff88816fa76e70 R15: ffff8881cea2fad0 ? btrfs_tree_unlock+0x15b/0x1a0 [btrfs] btrfs_release_path+0x67/0x80 [btrfs] btrfs_insert_replace_extent+0x177/0x2c0 [btrfs] btrfs_replace_file_extents+0x472/0x7c0 [btrfs] btrfs_clone+0x9ba/0xbd0 [btrfs] btrfs_clone_files.isra.0+0xeb/0x140 [btrfs] ? file_update_time+0xcd/0x120 btrfs_remap_file_range+0x322/0x3b0 [btrfs] do_clone_file_range+0xb7/0x1e0 vfs_clone_file_range+0x30/0xa0 ioctl_file_clone+0x8a/0xc0 do_vfs_ioctl+0x5b2/0x6f0 __x64_sys_ioctl+0x37/0xa0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f87977fc247 RSP: 002b:00007ffd51a2f6d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f87977fc247 RDX: 00007ffd51a2f710 RSI: 000000004020940d RDI: 0000000000000003 RBP: 0000000000000004 R08: 00007ffd51a79080 R09: 0000000000000000 R10: 00005621f11352f2 R11: 0000000000000206 R12: 0000000000000000 R13: 0000000000000000 R14: 00005621f128b958 R15: 0000000080000000 Kernel Offset: disabled ---[ end Kernel panic - not syncing: softlockup: hung tasks ]--- All of these lockup reports have the call chain btrfs_clone_files() -> btrfs_clone() in common. btrfs_clone_files() calls btrfs_clone() with both source and destination extents locked and loops over the source extent to create the clones. Conditionally reschedule in the btrfs_clone() loop, to give some time back to other processes. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-17btrfs: fix wrong address when faulting in pages in the search ioctlFilipe Manana1-1/+2
commit 1c78544eaa4660096aeb6a57ec82b42cdb3bfe5a upstream. When faulting in the pages for the user supplied buffer for the search ioctl, we are passing only the base address of the buffer to the function fault_in_pages_writeable(). This means that after the first iteration of the while loop that searches for leaves, when we have a non-zero offset, stored in 'sk_offset', we try to fault in a wrong page range. So fix this by adding the offset in 'sk_offset' to the base address of the user supplied buffer when calling fault_in_pages_writeable(). Several users have reported that the applications compsize and bees have started to operate incorrectly since commit a48b73eca4ceb9 ("btrfs: fix potential deadlock in the search ioctl") was added to stable trees, and these applications make heavy use of the search ioctls. This fixes their issues. Link: https://lore.kernel.org/linux-btrfs/632b888d-a3c3-b085-cdf5-f9bb61017d92@lechevalier.se/ Link: https://github.com/kilobyte/compsize/issues/34 Fixes: a48b73eca4ceb9 ("btrfs: fix potential deadlock in the search ioctl") CC: stable@vger.kernel.org # 4.4+ Tested-by: A L <mail@lechevalier.se> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-09btrfs: fix potential deadlock in the search ioctlJosef Bacik1-7/+20
[ Upstream commit a48b73eca4ceb9b8a4b97f290a065335dbcd8a04 ] With the conversion of the tree locks to rwsem I got the following lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 5.8.0-rc7-00165-g04ec4da5f45f-dirty #922 Not tainted ------------------------------------------------------ compsize/11122 is trying to acquire lock: ffff889fabca8768 (&mm->mmap_lock#2){++++}-{3:3}, at: __might_fault+0x3e/0x90 but task is already holding lock: ffff889fe720fe40 (btrfs-fs-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (btrfs-fs-00){++++}-{3:3}: down_write_nested+0x3b/0x70 __btrfs_tree_lock+0x24/0x120 btrfs_search_slot+0x756/0x990 btrfs_lookup_inode+0x3a/0xb4 __btrfs_update_delayed_inode+0x93/0x270 btrfs_async_run_delayed_root+0x168/0x230 btrfs_work_helper+0xd4/0x570 process_one_work+0x2ad/0x5f0 worker_thread+0x3a/0x3d0 kthread+0x133/0x150 ret_from_fork+0x1f/0x30 -> #1 (&delayed_node->mutex){+.+.}-{3:3}: __mutex_lock+0x9f/0x930 btrfs_delayed_update_inode+0x50/0x440 btrfs_update_inode+0x8a/0xf0 btrfs_dirty_inode+0x5b/0xd0 touch_atime+0xa1/0xd0 btrfs_file_mmap+0x3f/0x60 mmap_region+0x3a4/0x640 do_mmap+0x376/0x580 vm_mmap_pgoff+0xd5/0x120 ksys_mmap_pgoff+0x193/0x230 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&mm->mmap_lock#2){++++}-{3:3}: __lock_acquire+0x1272/0x2310 lock_acquire+0x9e/0x360 __might_fault+0x68/0x90 _copy_to_user+0x1e/0x80 copy_to_sk.isra.32+0x121/0x300 search_ioctl+0x106/0x200 btrfs_ioctl_tree_search_v2+0x7b/0xf0 btrfs_ioctl+0x106f/0x30a0 ksys_ioctl+0x83/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 other info that might help us debug this: Chain exists of: &mm->mmap_lock#2 --> &delayed_node->mutex --> btrfs-fs-00 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(btrfs-fs-00); lock(&delayed_node->mutex); lock(btrfs-fs-00); lock(&mm->mmap_lock#2); *** DEADLOCK *** 1 lock held by compsize/11122: #0: ffff889fe720fe40 (btrfs-fs-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180 stack backtrace: CPU: 17 PID: 11122 Comm: compsize Kdump: loaded Not tainted 5.8.0-rc7-00165-g04ec4da5f45f-dirty #922 Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018 Call Trace: dump_stack+0x78/0xa0 check_noncircular+0x165/0x180 __lock_acquire+0x1272/0x2310 lock_acquire+0x9e/0x360 ? __might_fault+0x3e/0x90 ? find_held_lock+0x72/0x90 __might_fault+0x68/0x90 ? __might_fault+0x3e/0x90 _copy_to_user+0x1e/0x80 copy_to_sk.isra.32+0x121/0x300 ? btrfs_search_forward+0x2a6/0x360 search_ioctl+0x106/0x200 btrfs_ioctl_tree_search_v2+0x7b/0xf0 btrfs_ioctl+0x106f/0x30a0 ? __do_sys_newfstat+0x5a/0x70 ? ksys_ioctl+0x83/0xc0 ksys_ioctl+0x83/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The problem is we're doing a copy_to_user() while holding tree locks, which can deadlock if we have to do a page fault for the copy_to_user(). This exists even without my locking changes, so it needs to be fixed. Rework the search ioctl to do the pre-fault and then copy_to_user_nofault for the copying. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22btrfs: merge btrfs_find_device and find_deviceAnand Jain1-2/+3
commit 09ba3bc9dd150457c506e4661380a6183af651c1 upstream. Both btrfs_find_device() and find_device() does the same thing except that the latter does not take the seed device onto account in the device scanning context. We can merge them. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> [4.19.y backport notes: Vikash : - To apply this patch, a portion of commit e4319cd9cace was used to change the first argument of function "btrfs_find_device" from "struct btrfs_fs_info" to "struct btrfs_fs_devices". Signed-off-by: Vikash Bansal <bvikas@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-31btrfs: abort transaction after failed inode updates in create_subvolJosef Bacik1-2/+8
commit c7e54b5102bf3614cadb9ca32d7be73bad6cecf0 upstream. We can just abort the transaction here, and in fact do that for every other failure in this function except these two cases. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-01btrfs: defrag: use btrfs_mod_outstanding_extents in cluster_pages_for_defragSu Yue1-1/+1
[ Upstream commit 28c4a3e21ad030d7571ee9b1b246a5cbfd886627 ] Since commit 8b62f87bad9c ("Btrfs: rework outstanding_extents"), manual operations of outstanding_extent in btrfs_inode are replaced by btrfs_mod_outstanding_extents(). The one in cluster_pages_for_defrag seems to be lost, so replace it of btrfs_mod_outstanding_extents(). Fixes: 8b62f87bad9c ("Btrfs: rework outstanding_extents") Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-06btrfs: qgroup: Always free PREALLOC META reserve in ↵Qu Wenruo1-4/+2
btrfs_delalloc_release_extents() [ Upstream commit 8702ba9396bf7bbae2ab93c94acd4bd37cfa4f09 ] [Background] Btrfs qgroup uses two types of reserved space for METADATA space, PERTRANS and PREALLOC. PERTRANS is metadata space reserved for each transaction started by btrfs_start_transaction(). While PREALLOC is for delalloc, where we reserve space before joining a transaction, and finally it will be converted to PERTRANS after the writeback is done. [Inconsistency] However there is inconsistency in how we handle PREALLOC metadata space. The most obvious one is: In btrfs_buffered_write(): btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes, true); We always free qgroup PREALLOC meta space. While in btrfs_truncate_block(): btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, (ret != 0)); We only free qgroup PREALLOC meta space when something went wrong. [The Correct Behavior] The correct behavior should be the one in btrfs_buffered_write(), we should always free PREALLOC metadata space. The reason is, the btrfs_delalloc_* mechanism works by: - Reserve metadata first, even it's not necessary In btrfs_delalloc_reserve_metadata() - Free the unused metadata space Normally in: btrfs_delalloc_release_extents() |- btrfs_inode_rsv_release() Here we do calculation on whether we should release or not. E.g. for 64K buffered write, the metadata rsv works like: /* The first page */ reserve_meta: num_bytes=calc_inode_reservations() free_meta: num_bytes=0 total: num_bytes=calc_inode_reservations() /* The first page caused one outstanding extent, thus needs metadata rsv */ /* The 2nd page */ reserve_meta: num_bytes=calc_inode_reservations() free_meta: num_bytes=calc_inode_reservations() total: not changed /* The 2nd page doesn't cause new outstanding extent, needs no new meta rsv, so we free what we have reserved */ /* The 3rd~16th pages */ reserve_meta: num_bytes=calc_inode_reservations() free_meta: num_bytes=calc_inode_reservations() total: not changed (still space for one outstanding extent) This means, if btrfs_delalloc_release_extents() determines to free some space, then those space should be freed NOW. So for qgroup, we should call btrfs_qgroup_free_meta_prealloc() other than btrfs_qgroup_convert_reserved_meta(). The good news is: - The callers are not that hot The hottest caller is in btrfs_buffered_write(), which is already fixed by commit 336a8bb8e36a ("btrfs: Fix wrong btrfs_delalloc_release_extents parameter"). Thus it's not that easy to cause false EDQUOT. - The trans commit in advance for qgroup would hide the bug Since commit f5fef4593653 ("btrfs: qgroup: Make qgroup async transaction commit more aggressive"), when btrfs qgroup metadata free space is slow, it will try to commit transaction and free the wrongly converted PERTRANS space, so it's not that easy to hit such bug. [FIX] So to fix the problem, remove the @qgroup_free parameter for btrfs_delalloc_release_extents(), and always pass true to btrfs_inode_rsv_release(). Reported-by: Filipe Manana <fdmanana@suse.com> Fixes: 43b18595d660 ("btrfs: qgroup: Use separate meta reservation type for delalloc") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-17Btrfs: do not allow trimming when a fs is mounted with the nologreplay optionFilipe Manana1-0/+10
commit f35f06c35560a86e841631f0243b83a984dc11a9 upstream. Whan a filesystem is mounted with the nologreplay mount option, which requires it to be mounted in RO mode as well, we can not allow discard on free space inside block groups, because log trees refer to extents that are not pinned in a block group's free space cache (pinning the extents is precisely the first phase of replaying a log tree). So do not allow the fitrim ioctl to do anything when the filesystem is mounted with the nologreplay option, because later it can be mounted RW without that option, which causes log replay to happen and result in either a failure to replay the log trees (leading to a mount failure), a crash or some silent corruption. Reported-by: Darrick J. Wong <darrick.wong@oracle.com> Fixes: 96da09192cda ("btrfs: Introduce new mount option to disable tree log replay") CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-12btrfs: use tagged writepage to mitigate livelock of snapshotEthan Lien1-1/+1
[ Upstream commit 3cd24c698004d2f7668e0eb9fc1f096f533c791b ] Snapshot is expected to be fast. But if there are writers steadily creating dirty pages in our subvolume, the snapshot may take a very long time to complete. To fix the problem, we use tagged writepage for snapshot flusher as we do in the generic write_cache_pages(), so we can omit pages dirtied after the snapshot command. This does not change the semantics regarding which data get to the snapshot, if there are pages being dirtied during the snapshotting operation. There's a sync called before snapshot is taken in old/new case, any IO in flight just after that may be in the snapshot but this depends on other system effects that might still sync the IO. We do a simple snapshot speed test on a Intel D-1531 box: fio --ioengine=libaio --iodepth=32 --bs=4k --rw=write --size=64G --direct=0 --thread=1 --numjobs=1 --time_based --runtime=120 --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5; time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio original: 1m58sec patched: 6.54sec This is the best case for this patch since for a sequential write case, we omit nearly all pages dirtied after the snapshot command. For a multi writers, random write test: fio --ioengine=libaio --iodepth=32 --bs=4k --rw=randwrite --size=64G --direct=0 --thread=1 --numjobs=4 --time_based --runtime=120 --filename=/mnt/sub/testfile --name=job1 --group_reporting & sleep 5; time btrfs sub snap -r /mnt/sub /mnt/snap; killall fio original: 15.83sec patched: 10.35sec The improvement is smaller compared to the sequential write case, since we omit only half of the pages dirtied after snapshot command. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Ethan Lien <ethanlien@synology.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-21Btrfs: fix data corruption due to cloning of eof blockFilipe Manana1-2/+10
commit ac765f83f1397646c11092a032d4f62c3d478b81 upstream. We currently allow cloning a range from a file which includes the last block of the file even if the file's size is not aligned to the block size. This is fine and useful when the destination file has the same size, but when it does not and the range ends somewhere in the middle of the destination file, it leads to corruption because the bytes between the EOF and the end of the block have undefined data (when there is support for discard/trimming they have a value of 0x00). Example: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ export foo_size=$((256 * 1024 + 100)) $ xfs_io -f -c "pwrite -S 0x3c 0 $foo_size" /mnt/foo $ xfs_io -f -c "pwrite -S 0xb5 0 1M" /mnt/bar $ xfs_io -c "reflink /mnt/foo 0 512K $foo_size" /mnt/bar $ od -A d -t x1 /mnt/bar 0000000 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 * 0524288 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c * 0786528 3c 3c 3c 3c 00 00 00 00 00 00 00 00 00 00 00 00 0786544 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 0790528 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 * 1048576 The bytes in the range from 786532 (512Kb + 256Kb + 100 bytes) to 790527 (512Kb + 256Kb + 4Kb - 1) got corrupted, having now a value of 0x00 instead of 0xb5. This is similar to the problem we had for deduplication that got recently fixed by commit de02b9f6bb65 ("Btrfs: fix data corruption when deduplicating between different files"). Fix this by not allowing such operations to be performed and return the errno -EINVAL to user space. This is what XFS is doing as well at the VFS level. This change however now makes us return -EINVAL instead of -EOPNOTSUPP for cases where the source range maps to an inline extent and the destination range's end is smaller then the destination file's size, since the detection of inline extents is done during the actual process of dropping file extent items (at __btrfs_drop_extents()). Returning the -EINVAL error is done early on and solely based on the input parameters (offsets and length) and destination file's size. This makes us consistent with XFS and anyone else supporting cloning since this case is now checked at a higher level in the VFS and is where the -EINVAL will be returned from starting with kernel 4.20 (the VFS changed was introduced in 4.20-rc1 by commit 07d19dc9fbe9 ("vfs: avoid problematic remapping requests into partial EOF block"). So this change is more geared towards stable kernels, as it's unlikely the new VFS checks get removed intentionally. A test case for fstests follows soon, as well as an update to filter existing tests that expect -EOPNOTSUPP to accept -EINVAL as well. CC: <stable@vger.kernel.org> # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-21Btrfs: fix infinite loop on inode eviction after deduplication of eof blockFilipe Manana1-0/+2
commit 11023d3f5fdf89bba5e1142127701ca6e6014587 upstream. If we attempt to deduplicate the last block of a file A into the middle of a file B, and file A's size is not a multiple of the block size, we end rounding the deduplication length to 0 bytes, to avoid the data corruption issue fixed by commit de02b9f6bb65 ("Btrfs: fix data corruption when deduplicating between different files"). However a length of zero will cause the insertion of an extent state with a start value greater (by 1) then the end value, leading to a corrupt extent state that will trigger a warning and cause chaos such as an infinite loop during inode eviction. Example trace: [96049.833585] ------------[ cut here ]------------ [96049.833714] WARNING: CPU: 0 PID: 24448 at fs/btrfs/extent_io.c:436 insert_state+0x101/0x120 [btrfs] [96049.833767] CPU: 0 PID: 24448 Comm: xfs_io Not tainted 4.19.0-rc7-btrfs-next-39 #1 [96049.833768] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [96049.833780] RIP: 0010:insert_state+0x101/0x120 [btrfs] [96049.833783] RSP: 0018:ffffafd2c3707af0 EFLAGS: 00010282 [96049.833785] RAX: 0000000000000000 RBX: 000000000004dfff RCX: 0000000000000006 [96049.833786] RDX: 0000000000000007 RSI: ffff99045c143230 RDI: ffff99047b2168a0 [96049.833787] RBP: ffff990457851cd0 R08: 0000000000000001 R09: 0000000000000000 [96049.833787] R10: ffffafd2c3707ab8 R11: 0000000000000000 R12: ffff9903b93b12c8 [96049.833788] R13: 000000000004e000 R14: ffffafd2c3707b80 R15: ffffafd2c3707b78 [96049.833790] FS: 00007f5c14e7d700(0000) GS:ffff99047b200000(0000) knlGS:0000000000000000 [96049.833791] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [96049.833792] CR2: 00007f5c146abff8 CR3: 0000000115f4c004 CR4: 00000000003606f0 [96049.833795] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [96049.833796] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [96049.833796] Call Trace: [96049.833809] __set_extent_bit+0x46c/0x6a0 [btrfs] [96049.833823] lock_extent_bits+0x6b/0x210 [btrfs] [96049.833831] ? _raw_spin_unlock+0x24/0x30 [96049.833841] ? test_range_bit+0xdf/0x130 [btrfs] [96049.833853] lock_extent_range+0x8e/0x150 [btrfs] [96049.833864] btrfs_double_extent_lock+0x78/0xb0 [btrfs] [96049.833875] btrfs_extent_same_range+0x14e/0x550 [btrfs] [96049.833885] ? rcu_read_lock_sched_held+0x3f/0x70 [96049.833890] ? __kmalloc_node+0x2b0/0x2f0 [96049.833899] ? btrfs_dedupe_file_range+0x19a/0x280 [btrfs] [96049.833909] btrfs_dedupe_file_range+0x270/0x280 [btrfs] [96049.833916] vfs_dedupe_file_range_one+0xd9/0xe0 [96049.833919] vfs_dedupe_file_range+0x131/0x1b0 [96049.833924] do_vfs_ioctl+0x272/0x6e0 [96049.833927] ? __fget+0x113/0x200 [96049.833931] ksys_ioctl+0x70/0x80 [96049.833933] __x64_sys_ioctl+0x16/0x20 [96049.833937] do_syscall_64+0x60/0x1b0 [96049.833939] entry_SYSCALL_64_after_hwframe+0x49/0xbe [96049.833941] RIP: 0033:0x7f5c1478ddd7 [96049.833943] RSP: 002b:00007ffe15b196a8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [96049.833945] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5c1478ddd7 [96049.833946] RDX: 00005625ece322d0 RSI: 00000000c0189436 RDI: 0000000000000004 [96049.833947] RBP: 0000000000000000 R08: 00007f5c14a46f48 R09: 0000000000000040 [96049.833948] R10: 0000000000000541 R11: 0000000000000202 R12: 0000000000000000 [96049.833949] R13: 0000000000000000 R14: 0000000000000004 R15: 00005625ece322d0 [96049.833954] irq event stamp: 6196 [96049.833956] hardirqs last enabled at (6195): [<ffffffff91b00663>] console_unlock+0x503/0x640 [96049.833958] hardirqs last disabled at (6196): [<ffffffff91a037dd>] trace_hardirqs_off_thunk+0x1a/0x1c [96049.833959] softirqs last enabled at (6114): [<ffffffff92600370>] __do_softirq+0x370/0x421 [96049.833964] softirqs last disabled at (6095): [<ffffffff91a8dd4d>] irq_exit+0xcd/0xe0 [96049.833965] ---[ end trace db7b05f01b7fa10c ]--- [96049.935816] R13: 0000000000000000 R14: 00005562e5259240 R15: 00007ffff092b910 [96049.935822] irq event stamp: 6584 [96049.935823] hardirqs last enabled at (6583): [<ffffffff91b00663>] console_unlock+0x503/0x640 [96049.935825] hardirqs last disabled at (6584): [<ffffffff91a037dd>] trace_hardirqs_off_thunk+0x1a/0x1c [96049.935827] softirqs last enabled at (6328): [<ffffffff92600370>] __do_softirq+0x370/0x421 [96049.935828] softirqs last disabled at (6313): [<ffffffff91a8dd4d>] irq_exit+0xcd/0xe0 [96049.935829] ---[ end trace db7b05f01b7fa123 ]--- [96049.935840] ------------[ cut here ]------------ [96049.936065] WARNING: CPU: 1 PID: 24463 at fs/btrfs/extent_io.c:436 insert_state+0x101/0x120 [btrfs] [96049.936107] CPU: 1 PID: 24463 Comm: umount Tainted: G W 4.19.0-rc7-btrfs-next-39 #1 [96049.936108] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [96049.936117] RIP: 0010:insert_state+0x101/0x120 [btrfs] [96049.936119] RSP: 0018:ffffafd2c3637bc0 EFLAGS: 00010282 [96049.936120] RAX: 0000000000000000 RBX: 000000000004dfff RCX: 0000000000000006 [96049.936121] RDX: 0000000000000007 RSI: ffff990445cf88e0 RDI: ffff99047b2968a0 [96049.936122] RBP: ffff990457851cd0 R08: 0000000000000001 R09: 0000000000000000 [96049.936123] R10: ffffafd2c3637b88 R11: 0000000000000000 R12: ffff9904574301e8 [96049.936124] R13: 000000000004e000 R14: ffffafd2c3637c50 R15: ffffafd2c3637c48 [96049.936125] FS: 00007fe4b87e72c0(0000) GS:ffff99047b280000(0000) knlGS:0000000000000000 [96049.936126] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [96049.936128] CR2: 00005562e52618d8 CR3: 00000001151c8005 CR4: 00000000003606e0 [96049.936129] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [96049.936131] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [96049.936131] Call Trace: [96049.936141] __set_extent_bit+0x46c/0x6a0 [btrfs] [96049.936154] lock_extent_bits+0x6b/0x210 [btrfs] [96049.936167] btrfs_evict_inode+0x1e1/0x5a0 [btrfs] [96049.936172] evict+0xbf/0x1c0 [96049.936174] dispose_list+0x51/0x80 [96049.936176] evict_inodes+0x193/0x1c0 [96049.936180] generic_shutdown_super+0x3f/0x110 [96049.936182] kill_anon_super+0xe/0x30 [96049.936189] btrfs_kill_super+0x13/0x100 [btrfs] [96049.936191] deactivate_locked_super+0x3a/0x70 [96049.936193] cleanup_mnt+0x3b/0x80 [96049.936195] task_work_run+0x93/0xc0 [96049.936198] exit_to_usermode_loop+0xfa/0x100 [96049.936201] do_syscall_64+0x17f/0x1b0 [96049.936202] entry_SYSCALL_64_after_hwframe+0x49/0xbe [96049.936204] RIP: 0033:0x7fe4b80cfb37 [96049.936206] RSP: 002b:00007ffff092b688 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [96049.936207] RAX: 0000000000000000 RBX: 00005562e5259060 RCX: 00007fe4b80cfb37 [96049.936208] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00005562e525faa0 [96049.936209] RBP: 00005562e525faa0 R08: 00005562e525f770 R09: 0000000000000015 [96049.936210] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007fe4b85d1e64 [96049.936211] R13: 0000000000000000 R14: 00005562e5259240 R15: 00007ffff092b910 [96049.936211] R13: 0000000000000000 R14: 00005562e5259240 R15: 00007ffff092b910 [96049.936216] irq event stamp: 6616 [96049.936219] hardirqs last enabled at (6615): [<ffffffff91b00663>] console_unlock+0x503/0x640 [96049.936219] hardirqs last disabled at (6616): [<ffffffff91a037dd>] trace_hardirqs_off_thunk+0x1a/0x1c [96049.936222] softirqs last enabled at (6328): [<ffffffff92600370>] __do_softirq+0x370/0x421 [96049.936222] softirqs last disabled at (6313): [<ffffffff91a8dd4d>] irq_exit+0xcd/0xe0 [96049.936223] ---[ end trace db7b05f01b7fa124 ]--- The second stack trace, from inode eviction, is repeated forever due to the infinite loop during eviction. This is the same type of problem fixed way back in 2015 by commit 113e8283869b ("Btrfs: fix inode eviction infinite loop after extent_same ioctl") and commit ccccf3d67294 ("Btrfs: fix inode eviction infinite loop after cloning into it"). So fix this by returning immediately if the deduplication range length gets rounded down to 0 bytes, as there is nothing that needs to be done in such case. Example reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0xe6 0 100" /mnt/foo $ xfs_io -f -c "pwrite -S 0xe6 0 1M" /mnt/bar # Unmount the filesystem and mount it again so that we start without any # extent state records when we ask for the deduplication. $ umount /mnt $ mount /dev/sdb /mnt $ xfs_io -c "dedupe /mnt/foo 0 500K 100" /mnt/bar # This unmount triggers the infinite loop. $ umount /mnt A test case for fstests will follow soon. Fixes: de02b9f6bb65 ("Btrfs: fix data corruption when deduplicating between different files") CC: <stable@vger.kernel.org> # 4.19+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-13btrfs: Ensure btrfs_trim_fs can trim the whole filesystemQu Wenruo1-4/+7
commit 6ba9fc8e628becf0e3ec94083450d089b0dec5f5 upstream. [BUG] fstrim on some btrfs only trims the unallocated space, not trimming any space in existing block groups. [CAUSE] Before fstrim_range passed to btrfs_trim_fs(), it gets truncated to range [0, super->total_bytes). So later btrfs_trim_fs() will only be able to trim block groups in range [0, super->total_bytes). While for btrfs, any bytenr aligned to sectorsize is valid, since btrfs uses its logical address space, there is nothing limiting the location where we put block groups. For filesystem with frequent balance, it's quite easy to relocate all block groups and bytenr of block groups will start beyond super->total_bytes. In that case, btrfs will not trim existing block groups. [FIX] Just remove the truncation in btrfs_ioctl_fitrim(), so btrfs_trim_fs() can get the unmodified range, which is normally set to [0, U64_MAX]. Reported-by: Chris Murphy <lists@colorremedies.com> Fixes: f4c697e6406d ("btrfs: return EINVAL if start > total_bytes in fitrim ioctl") CC: <stable@vger.kernel.org> # v4.4+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-06Merge tag 'for-4.19-rc2-tag' of ↵Linus Torvalds1-0/+35
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix for improper fsync after hardlink - fix for a corruption during file deduplication - use after free fixes - RCU warning fix - fix for buffered write to nodatacow file * tag 'for-4.19-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: Fix suspicious RCU usage warning in btrfs_debug_in_rcu btrfs: use after free in btrfs_quota_enable btrfs: btrfs_shrink_device should call commit transaction at the end btrfs: fix qgroup_free wrong num_bytes in btrfs_subvolume_reserve_metadata Btrfs: fix data corruption when deduplicating between different files Btrfs: sync log after logging new name Btrfs: fix unexpected failure of nocow buffered writes after snapshotting when low on space
2018-08-23Btrfs: fix data corruption when deduplicating between different filesFilipe Manana1-0/+19
If we deduplicate extents between two different files we can end up corrupting data if the source range ends at the size of the source file, the source file's size is not aligned to the filesystem's block size and the destination range does not go past the size of the destination file size. Example: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0x6b 0 2518890" /mnt/foo # The first byte with a value of 0xae starts at an offset (2518890) # which is not a multiple of the sector size. $ xfs_io -c "pwrite -S 0xae 2518890 102398" /mnt/foo # Confirm the file content is full of bytes with values 0x6b and 0xae. $ od -t x1 /mnt/foo 0000000 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b * 11467540 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ae ae ae ae ae ae 11467560 ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae * 11777540 ae ae ae ae ae ae ae ae 11777550 # Create a second file with a length not aligned to the sector size, # whose bytes all have the value 0x6b, so that its extent(s) can be # deduplicated with the first file. $ xfs_io -f -c "pwrite -S 0x6b 0 557771" /mnt/bar # Now deduplicate the entire second file into a range of the first file # that also has all bytes with the value 0x6b. The destination range's # end offset must not be aligned to the sector size and must be less # then the offset of the first byte with the value 0xae (byte at offset # 2518890). $ xfs_io -c "dedupe /mnt/bar 0 1957888 557771" /mnt/foo # The bytes in the range starting at offset 2515659 (end of the # deduplication range) and ending at offset 2519040 (start offset # rounded up to the block size) must all have the value 0xae (and not # replaced with 0x00 values). In other words, we should have exactly # the same data we had before we asked for deduplication. $ od -t x1 /mnt/foo 0000000 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b * 11467540 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ae ae ae ae ae ae 11467560 ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae * 11777540 ae ae ae ae ae ae ae ae 11777550 # Unmount the filesystem and mount it again. This guarantees any file # data in the page cache is dropped. $ umount /dev/sdb $ mount /dev/sdb /mnt $ od -t x1 /mnt/foo 0000000 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b * 11461300 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 00 11461320 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 11470000 ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae ae * 11777540 ae ae ae ae ae ae ae ae 11777550 # The bytes in range 2515659 to 2519040 have a value of 0x00 and not a # value of 0xae, data corruption happened due to the deduplication # operation. So fix this by rounding down, to the sector size, the length used for the deduplication when the following conditions are met: 1) Source file's range ends at its i_size; 2) Source file's i_size is not aligned to the sector size; 3) Destination range does not cross the i_size of the destination file. Fixes: e1d227a42ea2 ("btrfs: Handle unaligned length in extent_same") CC: stable@vger.kernel.org # 4.2+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-22Merge tag 'ovl-update-4.19' of ↵Linus Torvalds1-7/+4
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "This contains two new features: - Stack file operations: this allows removal of several hacks from the VFS, proper interaction of read-only open files with copy-up, possibility to implement fs modifying ioctls properly, and others. - Metadata only copy-up: when file is on lower layer and only metadata is modified (except size) then only copy up the metadata and continue to use the data from the lower file" * tag 'ovl-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (66 commits) ovl: Enable metadata only feature ovl: Do not do metacopy only for ioctl modifying file attr ovl: Do not do metadata only copy-up for truncate operation ovl: add helper to force data copy-up ovl: Check redirect on index as well ovl: Set redirect on upper inode when it is linked ovl: Set redirect on metacopy files upon rename ovl: Do not set dentry type ORIGIN for broken hardlinks ovl: Add an inode flag OVL_CONST_INO ovl: Treat metacopy dentries as type OVL_PATH_MERGE ovl: Check redirects for metacopy files ovl: Move some dir related ovl_lookup_single() code in else block ovl: Do not expose metacopy only dentry from d_real() ovl: Open file with data except for the case of fsync ovl: Add helper ovl_inode_realdata() ovl: Store lower data inode in ovl_inode ovl: Fix ovl_getattr() to get number of blocks from lower ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry ovl: Copy up meta inode data from lowest data inode ovl: Modify ovl_lookup() and friends to lookup metacopy dentry ...
2018-08-17Btrfs: fix unexpected failure of nocow buffered writes after snapshotting ↵Robbie Ko1-0/+16
when low on space Commit e9894fd3e3b3 ("Btrfs: fix snapshot vs nocow writting") forced nocow writes to fallback to COW, during writeback, when a snapshot is created. This resulted in writes made before creating the snapshot to unexpectedly fail with ENOSPC during writeback when success (0) was returned to user space through the write system call. The steps leading to this problem are: 1. When it's not possible to allocate data space for a write, the buffered write path checks if a NOCOW write is possible. If it is, it will not reserve space and success (0) is returned to user space. 2. Then when a snapshot is created, the root's will_be_snapshotted atomic is incremented and writeback is triggered for all inode's that belong to the root being snapshotted. Incrementing that atomic forces all previous writes to fallback to COW during writeback (running delalloc). 3. This results in the writeback for the inodes to fail and therefore setting the ENOSPC error in their mappings, so that a subsequent fsync on them will report the error to user space. So it's not a completely silent data loss (since fsync will report ENOSPC) but it's a very unexpected and undesirable behaviour, because if a clean shutdown/unmount of the filesystem happens without previous calls to fsync, it is expected to have the data present in the files after mounting the filesystem again. So fix this by adding a new atomic named snapshot_force_cow to the root structure which prevents this behaviour and works the following way: 1. It is incremented when we start to create a snapshot after triggering writeback and before waiting for writeback to finish. 2. This new atomic is now what is used by writeback (running delalloc) to decide whether we need to fallback to COW or not. Because we incremented this new atomic after triggering writeback in the snapshot creation ioctl, we ensure that all buffered writes that happened before snapshot creation will succeed and not fallback to COW (which would make them fail with ENOSPC). 3. The existing atomic, will_be_snapshotted, is kept because it is used to force new buffered writes, that start after we started snapshotting, to reserve data space even when NOCOW is possible. This makes these writes fail early with ENOSPC when there's no available space to allocate, preventing the unexpected behaviour of writeback later failing with ENOSPC due to a fallback to COW mode. Fixes: e9894fd3e3b3 ("Btrfs: fix snapshot vs nocow writting") Signed-off-by: Robbie Ko <robbieko@synology.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: Use wrapper macro for rcu string to remove duplicate codeMisono Tomohiro1-4/+2
Cleanup patch and no functional changes. Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: Remove fs_info from btrfs_add_root_refLu Fengqi1-2/+1
It can be referenced from the passed transaction handle. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: allow defrag on a file opened read-only that has rw permissionsAdam Borowski1-2/+8
Requiring a read-write descriptor conflicts both ways with exec, returning ETXTBSY whenever you try to defrag a program that's currently being run, or causing intermittent exec failures on a live system being defragged. As defrag doesn't change the file's contents in any way, there's no reason to consider it a rw operation. Thus, let's check only whether the file could have been opened rw. Such access control is still needed as currently defrag can use extra disk space, and might trigger bugs. We return EINVAL when the request is invalid; here it's ok but merely the user has insufficient privileges. Thus, the EPERM return value reflects the error better -- as discussed in the identical case for dedupe. According to codesearch.debian.net, no userspace program distinguishes these values beyond strerror(). Signed-off-by: Adam Borowski <kilobyte@angband.pl> Reviewed-by: David Sterba <dsterba@suse.com> [ fold the EPERM patch from Adam ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: qgroup: Drop fs_info parameter from btrfs_qgroup_inheritLu Fengqi1-1/+1
It can be fetched from the transaction handle. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: qgroup: Drop fs_info parameter from btrfs_run_qgroupsLu Fengqi1-1/+1
It can be fetched from the transaction handle. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: qgroup: Drop fs_info parameter from btrfs_limit_qgroupLu Fengqi1-2/+1
It can be fetched from the transaction handle. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: qgroup: Drop fs_info parameter from btrfs_remove_qgroupLu Fengqi1-2/+1
It can be fetched from the transaction handle. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: qgroup: Drop fs_info parameter from btrfs_create_qgroupLu Fengqi1-1/+1
It can be fetched from the transaction handle. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: qgroup: Drop fs_info parameter from btrfs_del_qgroup_relationLu Fengqi1-2/+1
It can be fetched from the transaction handle. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: qgroup: Drop fs_info parameter from btrfs_add_qgroup_relationLu Fengqi1-2/+1
It can be fetched from the transaction handle. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: qgroups: Move transaction management inside btrfs_quota_enable/disableNikolay Borisov1-13/+2
Commit 5d23515be669 ("btrfs: Move qgroup rescan on quota enable to btrfs_quota_enable") not only resulted in an easier to follow code but it also introduced a subtle bug. It changed the timing when the initial transaction rescan was happening: - before the commit: it would happen after transaction commit had occured - after the commit: it might happen before the transaction was committed This results in failure to correctly rescan the quota since there could be data which is still not committed on disk. This patch aims to fix this by moving the transaction creation/commit inside btrfs_quota_enable, which allows to schedule the quota commit after the transaction has been committed. Fixes: 5d23515be669 ("btrfs: Move qgroup rescan on quota enable to btrfs_quota_enable") Reported-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Link: https://marc.info/?l=linux-btrfs&m=152999289017582 Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: prune unused includesDavid Sterba1-5/+0
Remove includes if none of the interfaces and exports is used in the given source file. Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: Deduplicate extent_buffer init codeNikolay Borisov1-8/+0
When a new extent buffer is allocated there are a few mandatory fields which need to be set in order for the buffer to be sane: level, generation, bytenr, backref_rev, owner and FSID/UUID. Currently this is open coded in the callers of btrfs_alloc_tree_block, meaning it's fairly high in the abstraction hierarchy of operations. This patch solves this by simply moving this init code in btrfs_init_new_buffer, since this is the function which initializes a newly allocated extent buffer. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06btrfs: Fix misleading indentation reported by smatchBart Van Assche1-2/+2
This patch avoids that building the BTRFS source code with smatch triggers complaints about inconsistent indenting. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-07-18Merge tag 'for-4.18-rc5-tag' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Three regression fixes. They're few-liners and fixing some corner cases missed in the origial patches" * tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block() btrfs: fix use-after-free of cmp workspace pages btrfs: restore uuid_mutex in btrfs_open_devices
2018-07-13btrfs: fix use-after-free of cmp workspace pagesNaohiro Aota1-0/+2
btrfs_cmp_data_free() puts cmp's src_pages and dst_pages, but leaves their page address intact. Now, if you hit "goto again" in btrfs_extent_same_range() and hit some error in btrfs_cmp_data_prepare(), you'll try to unlock/put already put pages. This is simple fix to reset the address to avoid use-after-free. Fixes: 67b07bd4bec5 ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl") Signed-off-by: Naohiro Aota <naota@elisp.net> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-07-07vfs: dedupe: rationalize argsMiklos Szeredi1-3/+4
Clean up f_op->dedupe_file_range() interface. 1) Use loff_t for offsets and length instead of u64 2) Order the arguments the same way as {copy|clone}_file_range(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-07vfs: dedupe: return intMiklos Szeredi1-7/+3
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-06-26Merge tag 'for-4.18-rc1-tag' of ↵Linus Torvalds1-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Two regression fixes and an incorrect error value propagation fix from 'rename exchange'" * tag 'for-4.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix return value on rename exchange failure btrfs: fix invalid-free in btrfs_extent_same Btrfs: fix physical offset reported by fiemap for inline extents
2018-06-21btrfs: fix invalid-free in btrfs_extent_sameLu Fengqi1-5/+5
If this condition ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) != (BTRFS_I(dst)->flags & BTRFS_INODE_NODATASUM)) is hit, we will go to free the uninitialized cmp.src_pages and cmp.dst_pages. Fixes: 67b07bd4bec5 ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl") Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-06-15Merge tag 'vfs-timespec64' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground Pull inode timestamps conversion to timespec64 from Arnd Bergmann: "This is a late set of changes from Deepa Dinamani doing an automated treewide conversion of the inode and iattr structures from 'timespec' to 'timespec64', to push the conversion from the VFS layer into the individual file systems. As Deepa writes: 'The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions' Thomas Gleixner adds: 'I think there is no point to drag that out for the next merge window. The whole thing needs to be done in one go for the core changes which means that you're going to play that catchup game forever. Let's get over with it towards the end of the merge window'" * tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground: pstore: Remove bogus format string definition vfs: change inode times to use struct timespec64 pstore: Convert internal records to timespec64 udf: Simplify calls to udf_disk_stamp_to_time fs: nfs: get rid of memcpys for inode times ceph: make inode time prints to be long long lustre: Use long long type to print inode time fs: add timespec64_truncate()
2018-06-06vfs: change inode times to use struct timespec64Deepa Dinamani1-2/+2
struct timespec is not y2038 safe. Transition vfs to use y2038 safe struct timespec64 instead. The change was made with the help of the following cocinelle script. This catches about 80% of the changes. All the header file and logic changes are included in the first 5 rules. The rest are trivial substitutions. I avoid changing any of the function signatures or any other filesystem specific data structures to keep the patch simple for review. The script can be a little shorter by combining different cases. But, this version was sufficient for my usecase. virtual patch @ depends on patch @ identifier now; @@ - struct timespec + struct timespec64 current_time ( ... ) { - struct timespec now = current_kernel_time(); + struct timespec64 now = current_kernel_time64(); ... - return timespec_trunc( + return timespec64_trunc( ... ); } @ depends on patch @ identifier xtime; @@ struct \( iattr \| inode \| kstat \) { ... - struct timespec xtime; + struct timespec64 xtime; ... } @ depends on patch @ identifier t; @@ struct inode_operations { ... int (*update_time) (..., - struct timespec t, + struct timespec64 t, ...); ... } @ depends on patch @ identifier t; identifier fn_update_time =~ "update_time$"; @@ fn_update_time (..., - struct timespec *t, + struct timespec64 *t, ...) { ... } @ depends on patch @ identifier t; @@ lease_get_mtime( ... , - struct timespec *t + struct timespec64 *t ) { ... } @te depends on patch forall@ identifier ts; local idexpression struct inode *inode_node; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn_update_time =~ "update_time$"; identifier fn; expression e, E3; local idexpression struct inode *node1; local idexpression struct inode *node2; local idexpression struct iattr *attr1; local idexpression struct iattr *attr2; local idexpression struct iattr attr; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; @@ ( ( - struct timespec ts; + struct timespec64 ts; | - struct timespec ts = current_time(inode_node); + struct timespec64 ts = current_time(inode_node); ) <+... when != ts ( - timespec_equal(&inode_node->i_xtime, &ts) + timespec64_equal(&inode_node->i_xtime, &ts) | - timespec_equal(&ts, &inode_node->i_xtime) + timespec64_equal(&ts, &inode_node->i_xtime) | - timespec_compare(&inode_node->i_xtime, &ts) + timespec64_compare(&inode_node->i_xtime, &ts) | - timespec_compare(&ts, &inode_node->i_xtime) + timespec64_compare(&ts, &inode_node->i_xtime) | ts = current_time(e) | fn_update_time(..., &ts,...) | inode_node->i_xtime = ts | node1->i_xtime = ts | ts = inode_node->i_xtime | <+... attr1->ia_xtime ...+> = ts | ts = attr1->ia_xtime | ts.tv_sec | ts.tv_nsec | btrfs_set_stack_timespec_sec(..., ts.tv_sec) | btrfs_set_stack_timespec_nsec(..., ts.tv_nsec) | - ts = timespec64_to_timespec( + ts = ... -) | - ts = ktime_to_timespec( + ts = ktime_to_timespec64( ...) | - ts = E3 + ts = timespec_to_timespec64(E3) | - ktime_get_real_ts(&ts) + ktime_get_real_ts64(&ts) | fn(..., - ts + timespec64_to_timespec(ts) ,...) ) ...+> ( <... when != ts - return ts; + return timespec64_to_timespec(ts); ...> ) | - timespec_equal(&node1->i_xtime1, &node2->i_xtime2) + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2) | - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2) + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2) | - timespec_compare(&node1->i_xtime1, &node2->i_xtime2) + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2) | node1->i_xtime1 = - timespec_trunc(attr1->ia_xtime1, + timespec64_trunc(attr1->ia_xtime1, ...) | - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2, + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2, ...) | - ktime_get_real_ts(&attr1->ia_xtime1) + ktime_get_real_ts64(&attr1->ia_xtime1) | - ktime_get_real_ts(&attr.ia_xtime1) + ktime_get_real_ts64(&attr.ia_xtime1) ) @ depends on patch @ struct inode *node; struct iattr *attr; identifier fn; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; expression e; @@ ( - fn(node->i_xtime); + fn(timespec64_to_timespec(node->i_xtime)); | fn(..., - node->i_xtime); + timespec64_to_timespec(node->i_xtime)); | - e = fn(attr->ia_xtime); + e = fn(timespec64_to_timespec(attr->ia_xtime)); ) @ depends on patch forall @ struct inode *node; struct iattr *attr; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); fn (..., - &attr->ia_xtime, + &ts, ...); ) ...+> } @ depends on patch forall @ struct inode *node; struct iattr *attr; struct kstat *stat; identifier ia_xtime =~ "^ia_[acm]time$"; identifier i_xtime =~ "^i_[acm]time$"; identifier xtime =~ "^[acm]time$"; identifier fn, ret; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime); + &ts); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime); + &ts); | + ts = timespec64_to_timespec(stat->xtime); ret = fn (..., - &stat->xtime); + &ts); ) ...+> } @ depends on patch @ struct inode *node; struct inode *node2; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier i_xtime3 =~ "^i_[acm]time$"; struct iattr *attrp; struct iattr *attrp2; struct iattr attr ; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; struct kstat *stat; struct kstat stat1; struct timespec64 ts; identifier xtime =~ "^[acmb]time$"; expression e; @@ ( ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ; | node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | stat->xtime = node2->i_xtime1; | stat1.xtime = node2->i_xtime1; | ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ; | ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2; | - e = node->i_xtime1; + e = timespec64_to_timespec( node->i_xtime1 ); | - e = attrp->ia_xtime1; + e = timespec64_to_timespec( attrp->ia_xtime1 ); | node->i_xtime1 = current_time(...); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | - node->i_xtime1 = e; + node->i_xtime1 = timespec_to_timespec64(e); ) Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: <anton@tuxera.com> Cc: <balbi@kernel.org> Cc: <bfields@fieldses.org> Cc: <darrick.wong@oracle.com> Cc: <dhowells@redhat.com> Cc: <dsterba@suse.com> Cc: <dwmw2@infradead.org> Cc: <hch@lst.de> Cc: <hirofumi@mail.parknet.co.jp> Cc: <hubcap@omnibond.com> Cc: <jack@suse.com> Cc: <jaegeuk@kernel.org> Cc: <jaharkes@cs.cmu.edu> Cc: <jslaby@suse.com> Cc: <keescook@chromium.org> Cc: <mark@fasheh.com> Cc: <miklos@szeredi.hu> Cc: <nico@linaro.org> Cc: <reiserfs-devel@vger.kernel.org> Cc: <richard@nod.at> Cc: <sage@redhat.com> Cc: <sfrench@samba.org> Cc: <swhiteho@redhat.com> Cc: <tj@kernel.org> Cc: <trond.myklebust@primarydata.com> Cc: <tytso@mit.edu> Cc: <viro@zeniv.linux.org.uk>
2018-06-05btrfs: Check error of btrfs_iget in btrfs_search_path_in_tree_userMisono Tomohiro1-0/+4
The patch introducing the ioctl was not the latest version at the time of merging to the mainline and needs a fixup from this patch. Fixes: ba637a252d30 ("btrfs: Check error of btrfs_iget() in btrfs_search_path_in_tree_user") Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-31btrfs: Add unprivileged version of ino_lookup ioctlTomohiro Misono1-0/+204
Add unprivileged version of ino_lookup ioctl BTRFS_IOC_INO_LOOKUP_USER to allow normal users to call "btrfs subvolume list/show" etc. in combination with BTRFS_IOC_GET_SUBVOL_INFO/BTRFS_IOC_GET_SUBVOL_ROOTREF. This can be used like BTRFS_IOC_INO_LOOKUP but the argument is different. This is because it always searches the fs/file tree correspoinding to the fd with which this ioctl is called and also returns the name of bottom subvolume. The main differences from original ino_lookup ioctl are: 1. Read + Exec permission will be checked using inode_permission() during path construction. -EACCES will be returned in case of failure. 2. Path construction will be stopped at the inode number which corresponds to the fd with which this ioctl is called. If constructed path does not exist under fd's inode, -EACCES will be returned. 3. The name of bottom subvolume is also searched and filled. Note that the maximum length of path is shorter 256 (BTRFS_VOL_NAME_MAX+1) bytes than ino_lookup ioctl because of space of subvolume's name. Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com> Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> [ style fixes ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-31btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REFTomohiro Misono1-0/+99
Add unprivileged ioctl BTRFS_IOC_GET_SUBVOL_ROOTREF which returns ROOT_REF information of the subvolume containing this inode except the subvolume name (this is because to prevent potential name leak). The subvolume name will be gained by user version of ino_lookup ioctl (BTRFS_IOC_INO_LOOKUP_USER) which also performs permission check. The min id of root ref's subvolume to be searched is specified by @min_id in struct btrfs_ioctl_get_subvol_rootref_args. After the search ends, @min_id is set to the last searched root ref's subvolid + 1. Also, if there are more root refs than BTRFS_MAX_ROOTREF_BUFFER_NUM, -EOVERFLOW is returned. Therefore the caller can just call this ioctl again without changing the argument to continue search. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com> Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com> Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> [ style fixes and struct item renames ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-31btrfs: Add unprivileged ioctl which returns subvolume informationTomohiro Misono1-0/+121
Add new unprivileged ioctl BTRFS_IOC_GET_SUBVOL_INFO which returns the information of subvolume containing this inode. (i.e. returns the information in ROOT_ITEM and ROOT_BACKREF.) Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com> Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com> Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> [ minor style fixes, update struct comments ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30btrfs: drop unused parameter qgroup_reservedGu JinXiang1-4/+1
Since commit 7775c8184ec0 ("btrfs: remove unused parameter from btrfs_subvolume_release_metadata") parameter qgroup_reserved is not used by caller of function btrfs_subvolume_reserve_metadata. So remove it. Signed-off-by: Gu JinXiang <gujx@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30btrfs: Remove fs_info argument from btrfs_uuid_tree_remLu Fengqi1-2/+1
This function always takes a transaction handle which contains a reference to the fs_info. Use that and remove the extra argument. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> [ rename the function ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30btrfs: Remove fs_info argument from btrfs_uuid_tree_addLu Fengqi1-2/+2
This function always takes a transaction handle which contains a reference to the fs_info. Use that and remove the extra argument. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-30Btrfs: fix memory and mount leak in btrfs_ioctl_rm_dev_v2()Omar Sandoval1-2/+4
If we have invalid flags set, when we error out we must drop our writer counter and free the buffer we allocated for the arguments. This bug is trivially reproduced with the following program on 4.7+: #include <fcntl.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/ioctl.h> #include <sys/stat.h> #include <sys/types.h> #include <linux/btrfs.h> #include <linux/btrfs_tree.h> int main(int argc, char **argv) { struct btrfs_ioctl_vol_args_v2 vol_args = { .flags = UINT64_MAX, }; int ret; int fd; if (argc != 2) { fprintf(stderr, "usage: %s PATH\n", argv[0]); return EXIT_FAILURE; } fd = open(argv[1], O_WRONLY); if (fd == -1) { perror("open"); return EXIT_FAILURE; } ret = ioctl(fd, BTRFS_IOC_RM_DEV_V2, &vol_args); if (ret == -1) perror("ioctl"); close(fd); return EXIT_SUCCESS; } When unmounting the filesystem, we'll hit the WARN_ON(mnt_get_writers(mnt)) in cleanup_mnt() and also may prevent the filesystem to be remounted read-only as the writer count will stay lifted. Fixes: 6b526ed70cf1 ("btrfs: introduce device delete by devid") CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-29Btrfs: fix clone vs chattr NODATASUM raceOmar Sandoval1-5/+7
In btrfs_clone_files(), we must check the NODATASUM flag while the inodes are locked. Otherwise, it's possible that btrfs_ioctl_setflags() will change the flags after we check and we can end up with a party checksummed file. The race window is only a few instructions in size, between the if and the locks which is: 3834 if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode)) 3835 return -EISDIR; where the setflags must be run and toggle the NODATASUM flag (provided the file size is 0). The clone will block on the inode lock, segflags takes the inode lock, changes flags, releases log and clone continues. Not impossible but still needs a lot of bad luck to hit unintentionally. Fixes: 0e7b824c4ef9 ("Btrfs: don't make a file partly checksummed through file clone") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28btrfs: use error code returned by btrfs_read_fs_root_no_name in search ioctlMisono Tomohiro1-3/+2
btrfs_read_fs_root_no_name() may return ERR_PTR(-ENOENT) or ERR_PTR(-ENOMEM) and therefore search_ioctl() and btrfs_search_path_in_tree() should use PTR_ERR() instead of -ENOENT, which all other callers of btrfs_read_fs_root_no_name() do. Drop the error message as it would be confusing, the caller of ioctl will likely interpret the error code and not look into the syslog. Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-05-28btrfs: use kvzalloc for EXTENT_SAME temporary dataDavid Sterba1-7/+9
The dedupe range is 16 MiB, with 4 KiB pages and 8 byte pointers, the arrays can be 32KiB large. To avoid allocation failures due to fragmented memory, use the allocation with fallback to vmalloc. The arrays are allocated and freed only inside btrfs_extent_same and reused for all the ranges. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>