summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2023-11-28Merge tag 'for-6.7-rc3-tag' of ↵Linus Torvalds9-10/+62
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few fixes and message updates: - for simple quotas, handle the case when a snapshot is created and the target qgroup already exists - fix a warning when file descriptor given to send ioctl is not writable - fix off-by-one condition when checking chunk maps - free pages when page array allocation fails during compression read, other cases were handled - fix memory leak on error handling path in ref-verify debugging feature - copy missing struct member 'version' in 64/32bit compat send ioctl - tree-checker verifies inline backref ordering - print messages to syslog on first mount and last unmount - update error messages when reading chunk maps" * tag 'for-6.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: send: ensure send_fd is writable btrfs: free the allocated memory if btrfs_alloc_page_array() fails btrfs: fix 64bit compat send ioctl arguments not initializing version member btrfs: make error messages more clear when getting a chunk map btrfs: fix off-by-one when checking chunk map includes logical address btrfs: ref-verify: fix memory leaks in btrfs_ref_tree_mod() btrfs: add dmesg output for first mount and last unmount of a filesystem btrfs: do not abort transaction if there is already an existing qgroup btrfs: tree-checker: add type and sequence check for inline backrefs
2023-11-28Merge tag '6.7-rc3-smb3-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds9-141/+162
Pull smb server fixes from Steve French: - Memory leak fix - Fix possible deadlock in open - Multiple SMB3 leasing (caching) fixes including: - incorrect open count (found via xfstest generic/002 with leases) - lease breaking incorrect serialization - lease break error handling fix - fix sending async response when lease pending - Async command fix * tag '6.7-rc3-smb3-server-fixes' of git://git.samba.org/ksmbd: ksmbd: don't update ->op_state as OPLOCK_STATE_NONE on error ksmbd: move setting SMB2_FLAGS_ASYNC_COMMAND and AsyncId ksmbd: release interim response after sending status pending response ksmbd: move oplock handling after unlock parent dir ksmbd: separately allocate ci per dentry ksmbd: fix possible deadlock in smb2_open ksmbd: prevent memory leak on error return
2023-11-27debugfs: add API to allow debugfs operations cancellationJohannes Berg3-1/+118
In some cases there might be longer-running hardware accesses in debugfs files, or attempts to acquire locks, and we want to still be able to quickly remove the files. Introduce a cancellations API to use inside the debugfs handler functions to be able to cancel such operations on a per-file basis. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-11-27debugfs: annotate debugfs handlers vs. removal with lockdepJohannes Berg3-0/+28
When you take a lock in a debugfs handler but also try to remove the debugfs file under that lock, things can deadlock since the removal has to wait for all users to finish. Add lockdep annotations in debugfs_file_get()/_put() to catch such issues. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-11-27debugfs: fix automount d_fsdata usageJohannes Berg3-9/+36
debugfs_create_automount() stores a function pointer in d_fsdata, but since commit 7c8d469877b1 ("debugfs: add support for more elaborate ->d_fsdata") debugfs_release_dentry() will free it, now conditionally on DEBUGFS_FSDATA_IS_REAL_FOPS_BIT, but that's not set for the function pointer in automount. As a result, removing an automount dentry would attempt to free the function pointer. Luckily, the only user of this (tracing) never removes it. Nevertheless, it's safer if we just handle the fsdata in one way, namely either DEBUGFS_FSDATA_IS_REAL_FOPS_BIT or allocated. Thus, change the automount to allocate it, and use the real_fops in the data to indicate whether or not automount is filled, rather than adding a type tag. At least for now this isn't actually needed, but the next changes will require it. Also check in debugfs_file_get() that it gets only called on regular files, just to make things clearer. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-11-27Merge tag 'trace-v6.7-rc2' of ↵Linus Torvalds2-48/+30
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt:: "Eventfs fixes: - With the usage of simple_recursive_remove() recommended by Al Viro, the code should not be calling "d_invalidate()" itself. Doing so is causing crashes. The code was calling d_invalidate() on the race of trying to look up a file while the parent was being deleted. This was detected, and the added dentry was having d_invalidate() called on it, but the deletion of the directory was also calling d_invalidate() on that same dentry. - A fix to not free the eventfs_inode (ei) until the last dput() was called on its ei->dentry made the ei->dentry exist even after it was marked for free by setting the ei->is_freed. But code elsewhere still was checking if ei->dentry was NULL if ei->is_freed is set and would trigger WARN_ON if that was the case. That's no longer true and there should not be any warnings when it is true. - Use GFP_NOFS for allocations done under eventfs_mutex. The eventfs_mutex can be taken on file system reclaim, make sure that allocations done under that mutex do not trigger file system reclaim. - Clean up code by moving the taking of inode_lock out of the helper functions and into where they are needed, and not use the parameter to know to take it or not. It must always be held but some callers of the helper function have it taken when they were called. - Warn if the inode_lock is not held in the helper functions. - Warn if eventfs_start_creating() is called without a parent. As eventfs is underneath tracefs, all files created will have a parent (the top one will have a tracefs parent). Tracing update: - Add Mathieu Desnoyers as an official reviewer of the tracing subsystem" * tag 'trace-v6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: MAINTAINERS: TRACING: Add Mathieu Desnoyers as Reviewer eventfs: Make sure that parent->d_inode is locked in creating files/dirs eventfs: Do not allow NULL parent to eventfs_start_creating() eventfs: Move taking of inode_lock into dcache_dir_open_wrapper() eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held eventfs: Do not invalidate dentry in create_file/dir_dentry() eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL
2023-11-26Merge tag '6.7-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds10-375/+314
Pull smb client fixes from Steve French: - use after free fix in releasing multichannel interfaces - fixes for special file types (report char, block, FIFOs properly when created e.g. by NFS to Windows) - fixes for reporting various special file types and symlinks properly when using SMB1 * tag '6.7-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: introduce cifs_sfu_make_node() smb: client: set correct file type from NFS reparse points smb: client: introduce ->parse_reparse_point() smb: client: implement ->query_reparse_point() for SMB1 cifs: fix use after free for iface while disabling secondary channels
2023-11-26bcachefs: bpos is misaligned on big endianKent Overstreet1-1/+5
bkey embeds a bpos that is misaligned on big endian; this is so that bch2_bkey_swab() works correctly without having to differentiate between packed and non-packed keys (a debatable design decision). This means it can't have the __aligned() tag on big endian. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-26bcachefs: Fix ec + durability calculationKent Overstreet1-18/+12
Durability of an erasure coded pointer doesn't add the device durability; durability is the same for any extent in that stripe so the calculation only comes from the stripe. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-26bcachefs: Data update path won't accidentaly grow replicasKent Overstreet5-67/+96
Previously, there was a bug where if an extent had greater durability than required (because we needed to move a durability=1 pointer and ended up putting it on a durability 2 device), we would submit a write for replicas=2 - the durability of the pointer being rewritten - instead of the number of replicas required to bring it back up to the data_replicas option. This, plus the allocation path sometimes allocating on a greater durability device than requested, meant that extents could continue having more and more replicas added as they were being rewritten. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-25Merge tag 'xfs-6.7-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds2-5/+21
Pull xfs fix from Chandan Babu: - Validate quota records recovered from the log before writing them to the disk. * tag 'xfs-6.7-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: dquot recovery does not validate the recovered dquot xfs: clean up dqblk extraction
2023-11-24Merge tag 'afs-fixes-20231124' of ↵Linus Torvalds5-3/+18
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS fixes from David Howells: - Fix the afs_server_list struct to be cleaned up with RCU - Fix afs to translate a no-data result from a DNS lookup into ENOENT, not EDESTADDRREQ for consistency with OpenAFS - Fix afs to translate a negative DNS lookup result into ENOENT rather than EDESTADDRREQ - Fix file locking on R/O volumes to operate in local mode as the server doesn't handle exclusive locks on such files - Set SB_RDONLY on superblocks for RO and Backup volumes so that the VFS can see that they're read only * tag 'afs-fixes-20231124' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Mark a superblock for an R/O or Backup volume as SB_RDONLY afs: Fix file locking on R/O volumes to operate in local mode afs: Return ENOENT if no cell DNS record can be found afs: Make error on cell lookup failure consistent with OpenAFS afs: Fix afs_server_list to be cleaned up with RCU
2023-11-24btrfs: send: ensure send_fd is writableJann Horn1-1/+1
kernel_write() requires the caller to ensure that the file is writable. Let's do that directly after looking up the ->send_fd. We don't need a separate bailout path because the "out" path already does fput() if ->send_filp is non-NULL. This has no security impact for two reasons: - the ioctl requires CAP_SYS_ADMIN - __kernel_write() bails out on read-only files - but only since 5.8, see commit a01ac27be472 ("fs: check FMODE_WRITE in __kernel_write") Reported-and-tested-by: syzbot+12e098239d20385264d3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=12e098239d20385264d3 Fixes: 31db9f7c23fb ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive") CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-24btrfs: free the allocated memory if btrfs_alloc_page_array() failsQu Wenruo1-3/+8
[BUG] If btrfs_alloc_page_array() fail to allocate all pages but part of the slots, then the partially allocated pages would be leaked in function btrfs_submit_compressed_read(). [CAUSE] As explicitly stated, if btrfs_alloc_page_array() returned -ENOMEM, caller is responsible to free the partially allocated pages. For the existing call sites, most of them are fine: - btrfs_raid_bio::stripe_pages Handled by free_raid_bio(). - extent_buffer::pages[] Handled btrfs_release_extent_buffer_pages(). - scrub_stripe::pages[] Handled by release_scrub_stripe(). But there is one exception in btrfs_submit_compressed_read(), if btrfs_alloc_page_array() failed, we didn't cleanup the array and freed the array pointer directly. Initially there is still the error handling in commit dd137dd1f2d7 ("btrfs: factor out allocating an array of pages"), but later in commit 544fe4a903ce ("btrfs: embed a btrfs_bio into struct compressed_bio"), the error handling is removed, leading to the possible memory leak. [FIX] This patch would add back the error handling first, then to prevent such situation from happening again, also Make btrfs_alloc_page_array() to free the allocated pages as a extra safety net, then we don't need to add the error handling to btrfs_submit_compressed_read(). Fixes: 544fe4a903ce ("btrfs: embed a btrfs_bio into struct compressed_bio") CC: stable@vger.kernel.org # 6.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-24btrfs: fix 64bit compat send ioctl arguments not initializing version memberDavid Sterba1-0/+1
When the send protocol versioning was added in 5.16 e77fbf990316 ("btrfs: send: prepare for v2 protocol"), the 32/64bit compat code was not updated (added by 2351f431f727 ("btrfs: fix send ioctl on 32bit with 64bit kernel")), missing the version struct member. The compat code is probably rarely used, nobody reported any bugs. Found by tool https://github.com/jirislaby/clang-struct . Fixes: e77fbf990316 ("btrfs: send: prepare for v2 protocol") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-24Merge tag 'vfs-6.7-rc3.fixes' of ↵Linus Torvalds10-56/+97
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Avoid calling back into LSMs from vfs_getattr_nosec() calls. IMA used to query inode properties accessing raw inode fields without dedicated helpers. That was finally fixed a few releases ago by forcing IMA to use vfs_getattr_nosec() helpers. The goal of the vfs_getattr_nosec() helper is to query for attributes without calling into the LSM layer which would be quite problematic because incredibly IMA is called from __fput()... __fput() -> ima_file_free() What it does is to call back into the filesystem to update the file's IMA xattr. Querying the inode without using vfs_getattr_nosec() meant that IMA didn't handle stacking filesystems such as overlayfs correctly. So the switch to vfs_getattr_nosec() is quite correct. But the switch to vfs_getattr_nosec() revealed another bug when used on stacking filesystems: __fput() -> ima_file_free() -> vfs_getattr_nosec() -> i_op->getattr::ovl_getattr() -> vfs_getattr() -> i_op->getattr::$WHATEVER_UNDERLYING_FS_getattr() -> security_inode_getattr() # calls back into LSMs Now, if that __fput() happens from task_work_run() of an exiting task current->fs and various other pointer could already be NULL. So anything in the LSM layer relying on that not being NULL would be quite surprised. Fix that by passing the information that this is a security request through to the stacking filesystem by adding a new internal ATT_GETATTR_NOSEC flag. Now the callchain becomes: __fput() -> ima_file_free() -> vfs_getattr_nosec() -> i_op->getattr::ovl_getattr() -> if (AT_GETATTR_NOSEC) vfs_getattr_nosec() else vfs_getattr() -> i_op->getattr::$WHATEVER_UNDERLYING_FS_getattr() - Fix a bug introduced with the iov_iter rework from last cycle. This broke /proc/kcore by copying too much and without the correct offset. - Add a missing NULL check when allocating the root inode in autofs_fill_super(). - Fix stable writes for multi-device filesystems (xfs, btrfs etc) and the block device pseudo filesystem. Stable writes used to be a superblock flag only, making it a per filesystem property. Add an additional AS_STABLE_WRITES mapping flag to allow for fine-grained control. - Ensure that offset_iterate_dir() returns 0 after reaching the end of a directory so it adheres to getdents() convention. * tag 'vfs-6.7-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: libfs: getdents() should return 0 after reaching EOD xfs: respect the stable writes flag on the RT device xfs: clean up FS_XFLAG_REALTIME handling in xfs_ioctl_setattr_xflags block: update the stable_writes flag in bdev_add filemap: add a per-mapping stable writes flag autofs: add: new_inode check in autofs_fill_super() iov_iter: fix copy_page_to_iter_nofault() fs: Pass AT_GETATTR_NOSEC flag to getattr interface function
2023-11-24afs: Mark a superblock for an R/O or Backup volume as SB_RDONLYDavid Howells1-1/+3
Mark a superblock that is for for an R/O or Backup volume as SB_RDONLY when mounting it. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-11-24afs: Fix file locking on R/O volumes to operate in local modeDavid Howells1-0/+2
AFS doesn't really do locking on R/O volumes as fileservers don't maintain state with each other and thus a lock on a R/O volume file on one fileserver will not be be visible to someone looking at the same file on another fileserver. Further, the server may return an error if you try it. Fix this by doing what other AFS clients do and handle filelocking on R/O volume files entirely within the client and don't touch the server. Fixes: 6c6c1d63c243 ("afs: Provide mount-time configurable byte-range file locking emulation") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-11-24afs: Return ENOENT if no cell DNS record can be foundDavid Howells1-0/+10
Make AFS return error ENOENT if no cell SRV or AFSDB DNS record (or cellservdb config file record) can be found rather than returning EDESTADDRREQ. Also add cell name lookup info to the cursor dump. Fixes: d5c32c89b208 ("afs: Fix cell DNS lookup") Reported-by: Markus Suvanto <markus.suvanto@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216637 Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-11-24bcachefs: deallocate_extra_replicas()Kent Overstreet1-0/+27
When allocating from devices with different durability, we might end up with more replicas than required; this changes bch2_alloc_sectors_start() to check for this, and drop replicas that aren't needed to hit the number of replicas requested. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Proper refcounting for journal_keysKent Overstreet6-11/+42
The btree iterator code overlays keys from the journal until journal replay is finished; since we're now starting copygc/rebalance etc. before replay is finished, this is multithreaded access and thus needs refcounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: preserve device path as device nameBrian Foster3-2/+7
Various userspace scripts/tools may expect mount entries in /proc/mounts to reflect the device path names used to mount the associated filesystem. bcachefs seems to normalize the device path to the underlying device name based on the block device. This confuses tools like fstests when the test devices might be lvm or device-mapper based. The default behavior for show_vfsmnt() appers to be to use the string passed to alloc_vfsmnt(), so tweak bcachefs to copy the path at device superblock read time and to display it via ->show_devname(). Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Fix an endianness conversionKent Overstreet1-1/+1
cpu_to_le32(), not le32_to_cpu() - fixes a sparse complaint. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Start gc, copygc, rebalance threads after initing writes refKent Overstreet1-12/+16
This fixes a bug where copygc would occasionally race with going read-write and die, thinking we were read only, because it couldn't take a ref on c->writes. It's not necessary for copygc (or rebalance, or copygc) to take write refs; they could run with BCH_TRANS_COMMIT_nocheck_rw, but this is an easier fix that making sure that flag is passed correctly everywhere. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Don't stop copygc thread on device resizeKent Overstreet1-2/+0
copygc no longer has to scan the buckets, so it's no longer a problem if the number of buckets is changing while it's running. This also fixes a bug where we forgot to restart copygc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Make sure bch2_move_ratelimit() also waits for move_opsKent Overstreet2-13/+23
This adds move_ctxt_wait_event_timeout(), which can sleep for a timeout while also issueing pending moves as reads complete. Co-developed-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: bch2_moving_ctxt_flush_all()Kent Overstreet1-5/+11
Introduce a new helper to flush all move IOs, and use it in a few places where we should have been. The new helper also drops btree locks before waiting on outstanding move writes, avoiding potential deadlocks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Put erasure coding behind an EXPERIMENTAL kconfig optionKent Overstreet2-0/+15
We still have disk space accounting changes coming for erasure coding, and the changes won't be as strictly backwards compatible as they'd ought to be - specifically, we need to start accounting striped data under a separate counter in bch_alloc (which describes buckets). A fsck will suffice for upgrading/downgrading, but since erasure coding is the most incomplete major feature of bcachefs it still makes sense to put behind a separate kconfig option, so that users are fully aware. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24closures: CLOSURE_CALLBACK() to fix type punningKent Overstreet7-29/+26
Control flow integrity is now checking that type signatures match on indirect function calls. That breaks closures, which embed a work_struct in a closure in such a way that a closure_fn may also be used as a workqueue fn by the underlying closure code. So we have to change closure fns to take a work_struct as their argument - but that results in a loss of clarity, as closure fns have different semantics from normal workqueue functions (they run owning a ref on the closure, which must be released with continue_at() or closure_return()). Thus, this patc introduces CLOSURE_CALLBACK() and closure_type() macros as suggested by Kees, to smooth things over a bit. Suggested-by: Kees Cook <keescook@chromium.org> Cc: Coly Li <colyli@suse.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24ksmbd: don't update ->op_state as OPLOCK_STATE_NONE on errorNamjae Jeon1-1/+0
ksmbd set ->op_state as OPLOCK_STATE_NONE on lease break ack error. op_state of lease should not be updated because client can send lease break ack again. This patch fix smb2.lease.breaking2 test failure. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-24ksmbd: move setting SMB2_FLAGS_ASYNC_COMMAND and AsyncIdNamjae Jeon1-5/+2
Directly set SMB2_FLAGS_ASYNC_COMMAND flags and AsyncId in smb2 header of interim response instead of current response header. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-24ksmbd: release interim response after sending status pending responseNamjae Jeon2-1/+5
Add missing release async id and delete interim response entry after sending status pending response. This only cause when smb2 lease is enable. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-24ksmbd: move oplock handling after unlock parent dirNamjae Jeon1-56/+65
ksmbd should process secound parallel smb2 create request during waiting oplock break ack. parent lock range that is too large in smb2_open() causes smb2_open() to be serialized. Move the oplock handling to the bottom of smb2_open() and make it called after parent unlock. This fixes the failure of smb2.lease.breaking1 testcase. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-24ksmbd: separately allocate ci per dentryNamjae Jeon4-25/+18
xfstests generic/002 test fail when enabling smb2 leases feature. This test create hard link file, but removeal failed. ci has a file open count to count file open through the smb client, but in the case of hard link files, The allocation of ci per inode cause incorrectly open count for file deletion. This patch allocate ci per dentry to counts open counts for hard link. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-24ksmbd: fix possible deadlock in smb2_openNamjae Jeon5-59/+75
[ 8743.393379] ====================================================== [ 8743.393385] WARNING: possible circular locking dependency detected [ 8743.393391] 6.4.0-rc1+ #11 Tainted: G OE [ 8743.393397] ------------------------------------------------------ [ 8743.393402] kworker/0:2/12921 is trying to acquire lock: [ 8743.393408] ffff888127a14460 (sb_writers#8){.+.+}-{0:0}, at: ksmbd_vfs_setxattr+0x3d/0xd0 [ksmbd] [ 8743.393510] but task is already holding lock: [ 8743.393515] ffff8880360d97f0 (&type->i_mutex_dir_key#6/1){+.+.}-{3:3}, at: ksmbd_vfs_kern_path_locked+0x181/0x670 [ksmbd] [ 8743.393618] which lock already depends on the new lock. [ 8743.393623] the existing dependency chain (in reverse order) is: [ 8743.393628] -> #1 (&type->i_mutex_dir_key#6/1){+.+.}-{3:3}: [ 8743.393648] down_write_nested+0x9a/0x1b0 [ 8743.393660] filename_create+0x128/0x270 [ 8743.393670] do_mkdirat+0xab/0x1f0 [ 8743.393680] __x64_sys_mkdir+0x47/0x60 [ 8743.393690] do_syscall_64+0x5d/0x90 [ 8743.393701] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 8743.393711] -> #0 (sb_writers#8){.+.+}-{0:0}: [ 8743.393728] __lock_acquire+0x2201/0x3b80 [ 8743.393737] lock_acquire+0x18f/0x440 [ 8743.393746] mnt_want_write+0x5f/0x240 [ 8743.393755] ksmbd_vfs_setxattr+0x3d/0xd0 [ksmbd] [ 8743.393839] ksmbd_vfs_set_dos_attrib_xattr+0xcc/0x110 [ksmbd] [ 8743.393924] compat_ksmbd_vfs_set_dos_attrib_xattr+0x39/0x50 [ksmbd] [ 8743.394010] smb2_open+0x3432/0x3cc0 [ksmbd] [ 8743.394099] handle_ksmbd_work+0x2c9/0x7b0 [ksmbd] [ 8743.394187] process_one_work+0x65a/0xb30 [ 8743.394198] worker_thread+0x2cf/0x700 [ 8743.394209] kthread+0x1ad/0x1f0 [ 8743.394218] ret_from_fork+0x29/0x50 This patch add mnt_want_write() above parent inode lock and remove nested mnt_want_write calls in smb2_open(). Fixes: 40b268d384a2 ("ksmbd: add mnt_want_write to ksmbd vfs functions") Cc: stable@vger.kernel.org Reported-by: Marios Makassikis <mmakassikis@freebox.fr> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-24ksmbd: prevent memory leak on error returnZongmin Zhou1-2/+5
When allocated memory for 'new' failed,just return will cause memory leak of 'ar'. Fixes: 1819a9042999 ("ksmbd: reorganize ksmbd_iov_pin_rsp()") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202311031837.H3yo7JVl-lkp@intel.com/ Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-24btrfs: make error messages more clear when getting a chunk mapFilipe Manana1-3/+4
When getting a chunk map, at btrfs_get_chunk_map(), we do some sanity checks to verify we found a chunk map and that map found covers the logical address the caller passed in. However the messages aren't very clear in the sense that don't mention the issue is with a chunk map and one of them prints the 'length' argument as if it were the end offset of the requested range (while the in the string format we use %llu-%llu which suggests a range, and the second %llu-%llu is actually a range for the chunk map). So improve these two details in the error messages. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-24btrfs: fix off-by-one when checking chunk map includes logical addressFilipe Manana1-1/+1
At btrfs_get_chunk_map() we get the extent map for the chunk that contains the given logical address stored in the 'logical' argument. Then we do sanity checks to verify the extent map contains the logical address. One of these checks verifies if the extent map covers a range with an end offset behind the target logical address - however this check has an off-by-one error since it will consider an extent map whose start offset plus its length matches the target logical address as inclusive, while the fact is that the last byte it covers is behind the target logical address (by 1). So fix this condition by using '<=' rather than '<' when comparing the extent map's "start + length" against the target logical address. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-24btrfs: ref-verify: fix memory leaks in btrfs_ref_tree_mod()Bragatheswaran Manickavel1-0/+2
In btrfs_ref_tree_mod(), when !parent 're' was allocated through kmalloc(). In the following code, if an error occurs, the execution will be redirected to 'out' or 'out_unlock' and the function will be exited. However, on some of the paths, 're' are not deallocated and may lead to memory leaks. For example: lookup_block_entry() for 'be' returns NULL, the out label will be invoked. During that flow ref and 'ra' are freed but not 're', which can potentially lead to a memory leak. CC: stable@vger.kernel.org # 5.10+ Reported-and-tested-by: syzbot+d66de4cbf532749df35f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d66de4cbf532749df35f Signed-off-by: Bragatheswaran Manickavel <bragathemanick0908@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-24btrfs: add dmesg output for first mount and last unmount of a filesystemQu Wenruo2-1/+5
There is a feature request to add dmesg output when unmounting a btrfs. There are several alternative methods to do the same thing, but with their own problems: - Use eBPF to watch btrfs_put_super()/open_ctree() Not end user friendly, they have to dip their head into the source code. - Watch for directory /sys/fs/<uuid>/ This is way more simple, but still requires some simple device -> uuid lookups. And a script needs to use inotify to watch /sys/fs/. Compared to all these, directly outputting the information into dmesg would be the most simple one, with both device and UUID included. And since we're here, also add the output when mounting a filesystem for the first time for parity. A more fine grained monitoring of subvolume mounts should be done by another layer, like audit. Now mounting a btrfs with all default mkfs options would look like this: [81.906566] BTRFS info (device dm-8): first mount of filesystem 633b5c16-afe3-4b79-b195-138fe145e4f2 [81.907494] BTRFS info (device dm-8): using crc32c (crc32c-intel) checksum algorithm [81.908258] BTRFS info (device dm-8): using free space tree [81.912644] BTRFS info (device dm-8): auto enabling async discard [81.913277] BTRFS info (device dm-8): checking UUID tree [91.668256] BTRFS info (device dm-8): last unmount of filesystem 633b5c16-afe3-4b79-b195-138fe145e4f2 CC: stable@vger.kernel.org # 5.4+ Link: https://github.com/kdave/btrfs-progs/issues/689 Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-23smb: client: introduce cifs_sfu_make_node()Paulo Alcantara3-120/+52
Remove duplicate code and add new helper for creating special files in SFU (Services for UNIX) format that can be shared by SMB1+ code. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-23smb: client: set correct file type from NFS reparse pointsPaulo Alcantara8-61/+116
Handle all file types in NFS reparse points as specified in MS-FSCC 2.1.2.6 Network File System (NFS) Reparse Data Buffer. The client is now able to set all file types based on the parsed NFS reparse point, which used to support only symlinks. This works for SMB1+. Before patch: $ mount.cifs //srv/share /mnt -o ... $ ls -l /mnt ls: cannot access 'block': Operation not supported ls: cannot access 'char': Operation not supported ls: cannot access 'fifo': Operation not supported ls: cannot access 'sock': Operation not supported total 1 l????????? ? ? ? ? ? block l????????? ? ? ? ? ? char -rwxr-xr-x 1 root root 5 Nov 18 23:22 f0 l????????? ? ? ? ? ? fifo l--------- 1 root root 0 Nov 18 23:23 link -> f0 l????????? ? ? ? ? ? sock After patch: $ mount.cifs //srv/share /mnt -o ... $ ls -l /mnt total 1 brwxr-xr-x 1 root root 123, 123 Nov 18 00:34 block crwxr-xr-x 1 root root 1234, 1234 Nov 18 00:33 char -rwxr-xr-x 1 root root 5 Nov 18 23:22 f0 prwxr-xr-x 1 root root 0 Nov 18 23:23 fifo lrwxr-xr-x 1 root root 0 Nov 18 23:23 link -> f0 srwxr-xr-x 1 root root 0 Nov 19 2023 sock Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-23smb: client: introduce ->parse_reparse_point()Paulo Alcantara4-42/+56
Parse reparse point into cifs_open_info_data structure and feed it through cifs_open_info_to_fattr(). Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-23smb: client: implement ->query_reparse_point() for SMB1Paulo Alcantara5-175/+113
Reparse points are not limited to symlinks, so implement ->query_reparse_point() in order to handle different file types. Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-23cifs: fix use after free for iface while disabling secondary channelsRitvik Budhiraja1-1/+1
We were deferencing iface after it has been released. Fix is to release after all dereference instances have been encountered. Signed-off-by: Ritvik Budhiraja <rbudhiraja@microsoft.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202311110815.UJaeU3Tt-lkp@intel.com/ Signed-off-by: Steve French <stfrench@microsoft.com>
2023-11-23eventfs: Make sure that parent->d_inode is locked in creating files/dirsSteven Rostedt (Google)1-0/+4
Since the locking of the parent->d_inode has been moved outside the creation of the files and directories (as it use to be locked via a conditional), add a WARN_ON_ONCE() to the case that it's not locked. Link: https://lkml.kernel.org/r/20231121231112.853962542@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-23eventfs: Do not allow NULL parent to eventfs_start_creating()Steven Rostedt (Google)1-9/+4
The eventfs directory is dynamically created via the meta data supplied by the existing trace events. All files and directories in eventfs has a parent. Do not allow NULL to be passed into eventfs_start_creating() as the parent because that should never happen. Warn if it does. Link: https://lkml.kernel.org/r/20231121231112.693841807@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-23eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()Steven Rostedt (Google)1-14/+2
The both create_file_dentry() and create_dir_dentry() takes a boolean parameter "lookup", as on lookup the inode_lock should already be taken, but for dcache_dir_open_wrapper() it is not taken. There's no reason that the dcache_dir_open_wrapper() can't take the inode_lock before calling these functions. In fact, it's better if it does, as the lock can be held throughout both directory and file creations. This also simplifies the code, and possibly prevents unexpected race conditions when the lock is released. Link: https://lkml.kernel.org/r/20231121231112.528544825@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-23eventfs: Use GFP_NOFS for allocation when eventfs_mutex is heldSteven Rostedt (Google)1-2/+2
If memory reclaim happens, it can reclaim file system pages. The file system pages from eventfs may take the eventfs_mutex on reclaim. This means that allocation while holding the eventfs_mutex must not call into filesystem reclaim. A lockdep splat uncovered this. Link: https://lkml.kernel.org/r/20231121231112.373501894@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 28e12c09f5aa0 ("eventfs: Save ownership and mode") Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-22xfs: dquot recovery does not validate the recovered dquotDarrick J. Wong1-0/+14
When we're recovering ondisk quota records from the log, we need to validate the recovered buffer contents before writing them to disk. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>