summaryrefslogtreecommitdiff
path: root/fs/kernfs
AgeCommit message (Collapse)AuthorFilesLines
2023-02-24Merge tag 'driver-core-6.3-rc1' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.3-rc1. There's a lot of changes this development cycle, most of the work falls into two different categories: - fw_devlink fixes and updates. This has gone through numerous review cycles and lots of review and testing by lots of different devices. Hopefully all should be good now, and Saravana will be keeping a watch for any potential regression on odd embedded systems. - driver core changes to work to make struct bus_type able to be moved into read-only memory (i.e. const) The recent work with Rust has pointed out a number of areas in the driver core where we are passing around and working with structures that really do not have to be dynamic at all, and they should be able to be read-only making things safer overall. This is the contuation of that work (started last release with kobject changes) in moving struct bus_type to be constant. We didn't quite make it for this release, but the remaining patches will be finished up for the release after this one, but the groundwork has been laid for this effort. Other than that we have in here: - debugfs memory leak fixes in some subsystems - error path cleanups and fixes for some never-able-to-be-hit codepaths. - cacheinfo rework and fixes - Other tiny fixes, full details are in the shortlog All of these have been in linux-next for a while with no reported problems" [ Geert Uytterhoeven points out that that last sentence isn't true, and that there's a pending report that has a fix that is queued up - Linus ] * tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits) debugfs: drop inline constant formatting for ERR_PTR(-ERROR) OPP: fix error checking in opp_migrate_dentry() debugfs: update comment of debugfs_rename() i3c: fix device.h kernel-doc warnings dma-mapping: no need to pass a bus_type into get_arch_dma_ops() driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place Revert "driver core: add error handling for devtmpfs_create_node()" Revert "devtmpfs: add debug info to handle()" Revert "devtmpfs: remove return value of devtmpfs_delete_node()" driver core: cpu: don't hand-override the uevent bus_type callback. devtmpfs: remove return value of devtmpfs_delete_node() devtmpfs: add debug info to handle() driver core: add error handling for devtmpfs_create_node() driver core: bus: update my copyright notice driver core: bus: add bus_get_dev_root() function driver core: bus: constify bus_unregister() driver core: bus: constify some internal functions driver core: bus: constify bus_get_kset() driver core: bus: constify bus_register/unregister_notifier() driver core: remove private pointer from struct bus_type ...
2023-01-19kernfs: remove an unused if statement in kernfs_path_from_node_locked()Zhen Lei1-3/+0
It makes no sense to call kernfs_path_from_node_locked() with NULL buf, and no one is doing that right now. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221126111634.1994-1-thunder.leizhen@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-19fs: port xattr to mnt_idmapChristian Brauner1-2/+2
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->permission() to pass mnt_idmapChristian Brauner2-3/+3
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->rename() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->mkdir() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->getattr() to pass mnt_idmapChristian Brauner2-3/+3
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->setattr() to pass mnt_idmapChristian Brauner2-4/+4
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-11-23kernfs: fix all kernel-doc warnings and multiple typosRandy Dunlap6-48/+74
Fix kernel-doc warnings. Many of these are about a function's return value, so use the kernel-doc Return: format to fix those Use % prefix on numeric constant values. dir.c: fix typos/spellos file.c fix typo: s/taret/target/ Fix all of these kernel-doc warnings: dir.c:305: warning: missing initial short description on line: * kernfs_name_hash dir.c:137: warning: No description found for return value of 'kernfs_path_from_node_locked' dir.c:196: warning: No description found for return value of 'kernfs_name' dir.c:224: warning: No description found for return value of 'kernfs_path_from_node' dir.c:292: warning: No description found for return value of 'kernfs_get_parent' dir.c:312: warning: No description found for return value of 'kernfs_name_hash' dir.c:404: warning: No description found for return value of 'kernfs_unlink_sibling' dir.c:588: warning: No description found for return value of 'kernfs_node_from_dentry' dir.c:806: warning: No description found for return value of 'kernfs_find_ns' dir.c:879: warning: No description found for return value of 'kernfs_find_and_get_ns' dir.c:904: warning: No description found for return value of 'kernfs_walk_and_get_ns' dir.c:927: warning: No description found for return value of 'kernfs_create_root' dir.c:996: warning: No description found for return value of 'kernfs_root_to_node' dir.c:1016: warning: No description found for return value of 'kernfs_create_dir_ns' dir.c:1048: warning: No description found for return value of 'kernfs_create_empty_dir' dir.c:1306: warning: No description found for return value of 'kernfs_next_descendant_post' dir.c:1568: warning: No description found for return value of 'kernfs_remove_self' dir.c:1630: warning: No description found for return value of 'kernfs_remove_by_name_ns' dir.c:1667: warning: No description found for return value of 'kernfs_rename_ns' file.c:66: warning: No description found for return value of 'of_on' file.c:88: warning: No description found for return value of 'kernfs_deref_open_node_locked' file.c:1036: warning: No description found for return value of '__kernfs_create_file' inode.c:100: warning: No description found for return value of 'kernfs_setattr' mount.c:160: warning: No description found for return value of 'kernfs_root_from_sb' mount.c:198: warning: No description found for return value of 'kernfs_node_dentry' mount.c:302: warning: No description found for return value of 'kernfs_super_ns' mount.c:318: warning: No description found for return value of 'kernfs_get_tree' symlink.c:28: warning: No description found for return value of 'kernfs_create_link' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20221112031456.22980-1-rdunlap@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-21Merge 6.1-rc6 into driver-core-nextGreg Kroah-Hartman1-2/+12
We need the kernfs changes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-10kernfs: Fix spurious lockdep warning in kernfs_find_and_get_node_by_id()Tejun Heo1-2/+12
c25491747b21 ("kernfs: Add KERNFS_REMOVING flags") made kernfs_find_and_get_node_by_id() test kernfs_active() instead of KERNFS_ACTIVATED. kernfs_find_and_get_by_id() is called without holding the kernfs_rwsem triggering the following lockdep warning. WARNING: CPU: 1 PID: 6191 at fs/kernfs/dir.c:36 kernfs_active+0xe8/0x120 fs/kernfs/dir.c:38 Modules linked in: CPU: 1 PID: 6191 Comm: syz-executor.1 Not tainted 6.0.0-syzkaller-09413-g4899a36f91a9 #0 Hardware name: linux,dummy-virt (DT) pstate: 10000005 (nzcV daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : kernfs_active+0xe8/0x120 fs/kernfs/dir.c:36 lr : lock_is_held include/linux/lockdep.h:283 [inline] lr : kernfs_active+0x94/0x120 fs/kernfs/dir.c:36 sp : ffff8000182c7a00 x29: ffff8000182c7a00 x28: 0000000000000002 x27: 0000000000000001 x26: ffff00000ee1f6a8 x25: 1fffe00001dc3ed5 x24: 0000000000000000 x23: ffff80000ca1fba0 x22: ffff8000089efcb0 x21: 0000000000000001 x20: ffff0000091181d0 x19: ffff0000091181d0 x18: ffff00006a9e6b88 x17: 0000000000000000 x16: 0000000000000000 x15: ffff00006a9e6bc4 x14: 1ffff00003058f0e x13: 1fffe0000258c816 x12: ffff700003058f39 x11: 1ffff00003058f38 x10: ffff700003058f38 x9 : dfff800000000000 x8 : ffff80000e482f20 x7 : ffff0000091d8058 x6 : ffff80000e482c60 x5 : ffff000009402ee8 x4 : 1ffff00001bd1f46 x3 : 1fffe0000258c6d1 x2 : 0000000000000003 x1 : 00000000000000c0 x0 : 0000000000000000 Call trace: kernfs_active+0xe8/0x120 fs/kernfs/dir.c:38 kernfs_find_and_get_node_by_id+0x6c/0x140 fs/kernfs/dir.c:708 __kernfs_fh_to_dentry fs/kernfs/mount.c:102 [inline] kernfs_fh_to_dentry+0x88/0x1fc fs/kernfs/mount.c:128 exportfs_decode_fh_raw+0x104/0x560 fs/exportfs/expfs.c:435 exportfs_decode_fh+0x10/0x5c fs/exportfs/expfs.c:575 do_handle_to_path fs/fhandle.c:152 [inline] handle_to_path fs/fhandle.c:207 [inline] do_handle_open+0x2a4/0x7b0 fs/fhandle.c:223 __do_compat_sys_open_by_handle_at fs/fhandle.c:277 [inline] __se_compat_sys_open_by_handle_at fs/fhandle.c:274 [inline] __arm64_compat_sys_open_by_handle_at+0x6c/0x9c fs/fhandle.c:274 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x6c/0x260 arch/arm64/kernel/syscall.c:52 el0_svc_common.constprop.0+0xc4/0x254 arch/arm64/kernel/syscall.c:142 do_el0_svc_compat+0x40/0x70 arch/arm64/kernel/syscall.c:212 el0_svc_compat+0x54/0x140 arch/arm64/kernel/entry-common.c:772 el0t_32_sync_handler+0x90/0x140 arch/arm64/kernel/entry-common.c:782 el0t_32_sync+0x190/0x194 arch/arm64/kernel/entry.S:586 irq event stamp: 232 hardirqs last enabled at (231): [<ffff8000081edf70>] raw_spin_rq_unlock_irq kernel/sched/sched.h:1367 [inline] hardirqs last enabled at (231): [<ffff8000081edf70>] finish_lock_switch kernel/sched/core.c:4943 [inline] hardirqs last enabled at (231): [<ffff8000081edf70>] finish_task_switch.isra.0+0x200/0x880 kernel/sched/core.c:5061 hardirqs last disabled at (232): [<ffff80000c888bb4>] el1_dbg+0x24/0x80 arch/arm64/kernel/entry-common.c:404 softirqs last enabled at (228): [<ffff800008010938>] _stext+0x938/0xf58 softirqs last disabled at (207): [<ffff800008019380>] ____do_softirq+0x10/0x20 arch/arm64/kernel/irq.c:79 ---[ end trace 0000000000000000 ]--- The lockdep warning in kernfs_active() is there to ensure that the activated state stays stable for the caller. For kernfs_find_and_get_node_by_id(), all that's needed is ensuring that a node which has never been activated can't be looked up and guaranteeing lookup success when the caller knows the node to be active, both of which can be achieved by testing the active count without holding the kernfs_rwsem. Fix the spurious warning by introducing __kernfs_active() which doesn't have the lockdep annotation. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: syzbot+590ce62b128e79cf0a35@syzkaller.appspotmail.com Fixes: c25491747b21 ("kernfs: Add KERNFS_REMOVING flags") Cc: Amir Goldstein <amir73il@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/r/Y0SwqBsZ9BMmZv6x@slm.duckdns.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-20kernfs: dont take i_lock on revalidateIan Kent1-7/+17
In kernfs_dop_revalidate() when the passed in dentry is negative the dentry directory is checked to see if it has changed and if so the negative dentry is discarded so it can refreshed. During this check the dentry inode i_lock is taken to mitigate against a possible concurrent rename. But if it's racing with a rename, becuase the dentry is negative, it can't be the source it must be the target and it must be going to do a d_move() otherwise the rename will return an error. In this case the parent dentry of the target will not change, it will be the same over the d_move(), only the source dentry parent may change so the inode i_lock isn't needed. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/166606036967.13363.9336408133975631967.stgit@donald.themaw.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-20kernfs: dont take i_lock on inode attr readIan Kent1-4/+0
The kernfs write lock is held when the kernfs node inode attributes are updated. Therefore, when either kernfs_iop_getattr() or kernfs_iop_permission() are called the kernfs node inode attributes won't change. Consequently concurrent kernfs_refresh_inode() calls always copy the same values from the kernfs node. So there's no need to take the inode i_lock to get consistent values for generic_fillattr() and generic_permission(), the kernfs read lock is sufficient. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/166606036215.13363.1288735296954908554.stgit@donald.themaw.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-24kernfs: fix use-after-free in __kernfs_removeChristian A. Ehrhardt1-1/+4
Syzkaller managed to trigger concurrent calls to kernfs_remove_by_name_ns() for the same file resulting in a KASAN detected use-after-free. The race occurs when the root node is freed during kernfs_drain(). To prevent this acquire an additional reference for the root of the tree that is removed before calling __kernfs_remove(). Found by syzkaller with the following reproducer (slab_nomerge is required): syz_mount_image$ext4(0x0, &(0x7f0000000100)='./file0\x00', 0x100000, 0x0, 0x0, 0x0, 0x0) r0 = openat(0xffffffffffffff9c, &(0x7f0000000080)='/proc/self/exe\x00', 0x0, 0x0) close(r0) pipe2(&(0x7f0000000140)={0xffffffffffffffff, <r1=>0xffffffffffffffff}, 0x800) mount$9p_fd(0x0, &(0x7f0000000040)='./file0\x00', &(0x7f00000000c0), 0x408, &(0x7f0000000280)={'trans=fd,', {'rfdno', 0x3d, r0}, 0x2c, {'wfdno', 0x3d, r1}, 0x2c, {[{@cache_loose}, {@mmap}, {@loose}, {@loose}, {@mmap}], [{@mask={'mask', 0x3d, '^MAY_EXEC'}}, {@fsmagic={'fsmagic', 0x3d, 0x10001}}, {@dont_hash}]}}) Sample report: ================================================================== BUG: KASAN: use-after-free in kernfs_type include/linux/kernfs.h:335 [inline] BUG: KASAN: use-after-free in kernfs_leftmost_descendant fs/kernfs/dir.c:1261 [inline] BUG: KASAN: use-after-free in __kernfs_remove.part.0+0x843/0x960 fs/kernfs/dir.c:1369 Read of size 2 at addr ffff8880088807f0 by task syz-executor.2/857 CPU: 0 PID: 857 Comm: syz-executor.2 Not tainted 6.0.0-rc3-00363-g7726d4c3e60b #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x6e/0x91 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold+0x5e/0x5e5 mm/kasan/report.c:433 kasan_report+0xa3/0x130 mm/kasan/report.c:495 kernfs_type include/linux/kernfs.h:335 [inline] kernfs_leftmost_descendant fs/kernfs/dir.c:1261 [inline] __kernfs_remove.part.0+0x843/0x960 fs/kernfs/dir.c:1369 __kernfs_remove fs/kernfs/dir.c:1356 [inline] kernfs_remove_by_name_ns+0x108/0x190 fs/kernfs/dir.c:1589 sysfs_slab_add+0x133/0x1e0 mm/slub.c:5943 __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899 create_cache mm/slab_common.c:229 [inline] kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335 p9_client_create+0xd4d/0x1190 net/9p/client.c:993 v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408 v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126 legacy_get_tree+0xf1/0x200 fs/fs_context.c:610 vfs_get_tree+0x85/0x2e0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x675/0x1d00 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x282/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f725f983aed Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f725f0f7028 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 00007f725faa3f80 RCX: 00007f725f983aed RDX: 00000000200000c0 RSI: 0000000020000040 RDI: 0000000000000000 RBP: 00007f725f9f419c R08: 0000000020000280 R09: 0000000000000000 R10: 0000000000000408 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000006 R14: 00007f725faa3f80 R15: 00007f725f0d7000 </TASK> Allocated by task 855: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:45 [inline] set_alloc_info mm/kasan/common.c:437 [inline] __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:470 kasan_slab_alloc include/linux/kasan.h:224 [inline] slab_post_alloc_hook mm/slab.h:727 [inline] slab_alloc_node mm/slub.c:3243 [inline] slab_alloc mm/slub.c:3251 [inline] __kmem_cache_alloc_lru mm/slub.c:3258 [inline] kmem_cache_alloc+0xbf/0x200 mm/slub.c:3268 kmem_cache_zalloc include/linux/slab.h:723 [inline] __kernfs_new_node+0xd4/0x680 fs/kernfs/dir.c:593 kernfs_new_node fs/kernfs/dir.c:655 [inline] kernfs_create_dir_ns+0x9c/0x220 fs/kernfs/dir.c:1010 sysfs_create_dir_ns+0x127/0x290 fs/sysfs/dir.c:59 create_dir lib/kobject.c:63 [inline] kobject_add_internal+0x24a/0x8d0 lib/kobject.c:223 kobject_add_varg lib/kobject.c:358 [inline] kobject_init_and_add+0x101/0x160 lib/kobject.c:441 sysfs_slab_add+0x156/0x1e0 mm/slub.c:5954 __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899 create_cache mm/slab_common.c:229 [inline] kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335 p9_client_create+0xd4d/0x1190 net/9p/client.c:993 v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408 v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126 legacy_get_tree+0xf1/0x200 fs/fs_context.c:610 vfs_get_tree+0x85/0x2e0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x675/0x1d00 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x282/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 857: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track+0x21/0x30 mm/kasan/common.c:45 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:370 ____kasan_slab_free mm/kasan/common.c:367 [inline] ____kasan_slab_free mm/kasan/common.c:329 [inline] __kasan_slab_free+0x108/0x190 mm/kasan/common.c:375 kasan_slab_free include/linux/kasan.h:200 [inline] slab_free_hook mm/slub.c:1754 [inline] slab_free_freelist_hook mm/slub.c:1780 [inline] slab_free mm/slub.c:3534 [inline] kmem_cache_free+0x9c/0x340 mm/slub.c:3551 kernfs_put.part.0+0x2b2/0x520 fs/kernfs/dir.c:547 kernfs_put+0x42/0x50 fs/kernfs/dir.c:521 __kernfs_remove.part.0+0x72d/0x960 fs/kernfs/dir.c:1407 __kernfs_remove fs/kernfs/dir.c:1356 [inline] kernfs_remove_by_name_ns+0x108/0x190 fs/kernfs/dir.c:1589 sysfs_slab_add+0x133/0x1e0 mm/slub.c:5943 __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899 create_cache mm/slab_common.c:229 [inline] kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335 p9_client_create+0xd4d/0x1190 net/9p/client.c:993 v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408 v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126 legacy_get_tree+0xf1/0x200 fs/fs_context.c:610 vfs_get_tree+0x85/0x2e0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x675/0x1d00 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x282/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The buggy address belongs to the object at ffff888008880780 which belongs to the cache kernfs_node_cache of size 128 The buggy address is located 112 bytes inside of 128-byte region [ffff888008880780, ffff888008880800) The buggy address belongs to the physical page: page:00000000732833f8 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8880 flags: 0x100000000000200(slab|node=0|zone=1) raw: 0100000000000200 0000000000000000 dead000000000122 ffff888001147280 raw: 0000000000000000 0000000000150015 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888008880680: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb ffff888008880700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >ffff888008880780: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888008880800: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb ffff888008880880: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ================================================================== Acked-by: Tejun Heo <tj@kernel.org> Cc: stable <stable@kernel.org> # -rc3 Signed-off-by: Christian A. Ehrhardt <lk@c--e.de> Link: https://lore.kernel.org/r/20220913121723.691454-1-lk@c--e.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01kernfs: Implement kernfs_show()Tejun Heo1-1/+36
Currently, kernfs nodes can be created hidden and activated later by calling kernfs_activate() to allow creation of multiple nodes to succeed or fail as a unit. This is an one-way one-time-only transition. This patch introduces kernfs_show() which can toggle visibility dynamically. As the currently proposed use - toggling the cgroup pressure files - only requires operating on leaf nodes, for the sake of simplicity, restrict it as such for now. Hiding uses the same mechanism as deactivation and likewise guarantees that there are no in-flight operations on completion. KERNFS_ACTIVATED and KERNFS_HIDDEN are used to manage the interactions between activations and show/hide operations. A node is visible iff both activated & !hidden. Cc: Chengming Zhou <zhouchengming@bytedance.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-9-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01kernfs: Factor out kernfs_activate_one()Tejun Heo1-10/+17
Factor out kernfs_activate_one() from kernfs_activate() and reorder operations so that KERNFS_ACTIVATED now simply indicates whether activation was attempted on the node ignoring whether activation took place. As the flag doesn't have a reader, the refactoring and reordering shouldn't cause any behavior difference. Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-8-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01kernfs: Add KERNFS_REMOVING flagsTejun Heo1-14/+7
KERNFS_ACTIVATED tracks whether a given node has ever been activated. As a node was only deactivated on removal, this was used for 1. Drain optimization (removed by the previous patch). 2. To hide !activated nodes 3. To avoid double activations 4. Reject adding children to a node being removed 5. Skip activaing a node which is being removed. We want to decouple deactivation from removal so that nodes can be deactivated and hidden dynamically, which makes KERNFS_ACTIVATED useless for all of the above purposes. #1 is already gone. #2 and #3 can instead test whether the node is currently active. A new flag KERNFS_REMOVING is added to explicitly mark nodes which are being removed for #4 and #5. While this leaves KERNFS_ACTIVATED with no users, leave it be as it will be used in a following patch. Cc: Chengming Zhou <zhouchengming@bytedance.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-7-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01kernfs: Improve kernfs_drain() and always call on removalTejun Heo1-12/+12
__kernfs_remove() was skipping draining based on KERNFS_ACTIVATED - whether the node has ever been activated since creation. Instead, update it to always call kernfs_drain() which now drains or skips based on the precise drain conditions. This ensures that the nodes will be deactivated and drained regardless of their states. This doesn't make meaningful difference now but will enable deactivating and draining nodes dynamically by making removals safe when racing those operations. While at it, drop / update comments. v2: Fix the inverted test on kernfs_should_drain_open_files() noted by Chengming. This was fixed by the next unrelated patch in the previous posting. Cc: Chengming Zhou <zhouchengming@bytedance.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-6-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01kernfs: Skip kernfs_drain_open_files() more aggressivelyTejun Heo3-21/+48
Track the number of mmapped files and files that need to be released and skip kernfs_drain_open_file() if both are zero, which are the precise conditions which require draining open_files. The early exit test is factored into kernfs_should_drain_open_files() which is now tested by kernfs_drain_open_files()'s caller - kernfs_drain(). This isn't a meaningful optimization on its own but will enable future stand-alone kernfs_deactivate() implementation. v2: Chengming noticed that on->nr_to_release was leaking after ->open() failure. Fix it by telling kernfs_unlink_open_file() that it's called from the ->open() fail path and should dec the counter. Use kzalloc() to allocate kernfs_open_node so that the tracking fields are correctly initialized. Cc: Chengming Zhou <zhouchengming@bytedance.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-5-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01kernfs: Refactor kernfs_get_open_node()Tejun Heo1-14/+11
Factor out commont part. This is cleaner and should help with future changes. No functional changes. Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-4-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01kernfs: Drop unnecessary "mutex" local variable initializationTejun Heo1-4/+5
These are unnecessary and unconventional. Remove them. Also move variable declaration into the block that it's used. No functional changes. Cc: Imran Khan <imran.f.khan@oracle.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-3-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-01kernfs: Simply by replacing kernfs_deref_open_node() with of_on()Tejun Heo1-43/+13
kernfs_node->attr.open is an RCU pointer to kernfs_open_node. However, RCU dereference is currently only used in kernfs_notify(). Everywhere else, either we're holding the lock which protects it or know that the kernfs_open_node is pinned becaused we have a pointer to a kernfs_open_file which is hanging off of it. kernfs_deref_open_node() is used for the latter case - accessing kernfs_open_node from kernfs_open_file. The lifetime and visibility rules are simple and clear here. To someone who can access a kernfs_open_file, its kernfs_open_node is pinned and visible through of->kn->attr.open. Replace kernfs_deref_open_node() which simpler of_on(). The former takes both @kn and @of and RCU deref @kn->attr.open while sanity checking with @of. The latter takes @of and uses protected deref on of->kn->attr.open. As the return value can't be NULL, remove the error handling in the callers too. This shouldn't cause any functional changes. Cc: Imran Khan <imran.f.khan@oracle.com> Tested-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220828050440.734579-2-tj@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-28kernfs: Fix typo 'the the' in commentSlark Xiao1-1/+1
Replace 'the the' with 'the' in the comment. Signed-off-by: Slark Xiao <slark_xiao@163.com> Link: https://lore.kernel.org/r/20220722100518.79741-1-slark_xiao@163.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-06Revert "kernfs: Change kernfs_notify_list to llist."Imran Khan1-20/+27
This reverts commit b8f35fa1188b84035c59d4842826c4e93a1b1c9f. This is causing regression due to same kernfs_node getting added multiple times in kernfs_notify_list so revert it until safe way of using llist in this context is found. Reported-by: Nathan Chancellor <nathan@kernel.org> Reported-by: Michael Walle <michael@walle.cc> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Cc: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220705201026.2487665-1-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-01kernfs: fix potential NULL dereference in __kernfs_removeYushan Zhou1-2/+5
When lockdep is enabled, lockdep_assert_held_write would cause potential NULL pointer dereference. Fix the following smatch warnings: fs/kernfs/dir.c:1353 __kernfs_remove() warn: variable dereferenced before check 'kn' (see line 1346) Fixes: 393c3714081a ("kernfs: switch global kernfs_rwsem lock to per-fs lock") Signed-off-by: Yushan Zhou <katrinzhou@tencent.com> Link: https://lore.kernel.org/r/20220630082512.3482581-1-zys.zljxml@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-27kernfs: Replace global kernfs_open_file_mutex with hashed mutexes.Imran Khan3-14/+26
In current kernfs design a single mutex, kernfs_open_file_mutex, protects the list of kernfs_open_file instances corresponding to a sysfs attribute. So even if different tasks are opening or closing different sysfs files they can contend on osq_lock of this mutex. The contention is more apparent in large scale systems with few hundred CPUs where most of the CPUs have running tasks that are opening, accessing or closing sysfs files at any point of time. Using hashed mutexes in place of a single global mutex, can significantly reduce contention around global mutex and hence can provide better scalability. Moreover as these hashed mutexes are not part of kernfs_node objects we will not see any singnificant change in memory utilization of kernfs based file systems like sysfs, cgroupfs etc. Modify interface introduced in previous patch to make use of hashed mutexes. Use kernfs_node address as hashing key. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Link: https://lore.kernel.org/r/20220615021059.862643-5-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-27kernfs: Introduce interface to access global kernfs_open_file_mutex.Imran Khan1-18/+38
This allows to change underlying mutex locking, without needing to change the users of the lock. For example next patch modifies this interface to use hashed mutexes in place of a single global kernfs_open_file_mutex. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Link: https://lore.kernel.org/r/20220615021059.862643-4-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-27kernfs: Change kernfs_notify_list to llist.Imran Khan1-27/+20
At present kernfs_notify_list is implemented as a singly linked list of kernfs_node(s), where last element points to itself and value of ->attr.next tells if node is present on the list or not. Both addition and deletion to list happen under kernfs_notify_lock. Change kernfs_notify_list to llist so that addition to list can heppen locklessly. Suggested by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Link: https://lore.kernel.org/r/20220615021059.862643-3-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-27kernfs: make ->attr.open RCU protected.Imran Khan1-46/+101
After removal of kernfs_open_node->refcnt in the previous patch, kernfs_open_node_lock can be removed as well by making ->attr.open RCU protected. kernfs_put_open_node can delegate freeing to ->attr.open to RCU and other readers of ->attr.open can do so under rcu_read_(un)lock. Suggested by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Link: https://lore.kernel.org/r/20220615021059.862643-2-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-27kernfs/file.c: remove redundant error return counter assignmentLin Feng1-1/+0
Since previous 'rc = -EINVAL;', rc value doesn't change, so not necessary to re-assign it again. Signed-off-by: Lin Feng <linf@wangsu.com> Link: https://lore.kernel.org/r/20220617091746.206515-1-linf@wangsu.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-19kernfs: Separate kernfs_pr_cont_buf and rename_lock.Hao Luo1-12/+19
Previously the protection of kernfs_pr_cont_buf was piggy backed by rename_lock, which means that pr_cont() needs to be protected under rename_lock. This can cause potential circular lock dependencies. If there is an OOM, we have the following call hierarchy: -> cpuset_print_current_mems_allowed() -> pr_cont_cgroup_name() -> pr_cont_kernfs_name() pr_cont_kernfs_name() will grab rename_lock and call printk. So we have the following lock dependencies: kernfs_rename_lock -> console_sem Sometimes, printk does a wakeup before releasing console_sem, which has the dependence chain: console_sem -> p->pi_lock -> rq->lock Now, imagine one wants to read cgroup_name under rq->lock, for example, printing cgroup_name in a tracepoint in the scheduler code. They will be holding rq->lock and take rename_lock: rq->lock -> kernfs_rename_lock Now they will deadlock. A prevention to this circular lock dependency is to separate the protection of pr_cont_buf from rename_lock. In principle, rename_lock is to protect the integrity of cgroup name when copying to buf. Once pr_cont_buf has got its content, rename_lock can be dropped. So it's safe to drop rename_lock after kernfs_name_locked (and kernfs_path_from_node_locked) and rely on a dedicated pr_cont_lock to protect pr_cont_buf. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220516190951.3144144-1-haoluo@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-06kernfs: Rename kernfs_put_open_node to kernfs_unlink_open_file.Imran Khan1-9/+19
Since we are no longer using refcnt for kernfs_open_node instances, rename kernfs_put_open_node to kernfs_unlink_open_file to reflect this change. Also update function description and inline comments accordingly. Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Link: https://lore.kernel.org/r/20220504095123.295859-2-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-02Merge 5.18-rc5 into driver-core-nextGreg Kroah-Hartman1-1/+6
We need the kernfs/driver core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27kernfs: fix NULL dereferencing in kernfs_removeMinchan Kim1-1/+6
kernfs_remove supported NULL kernfs_node param to bail out but revent per-fs lock change introduced regression that dereferencing the param without NULL check so kernel goes crash. This patch checks the NULL kernfs_node in kernfs_remove and if so, just return. Quote from bug report by Jirka ``` The bug is triggered by running NAS Parallel benchmark suite on SuperMicro servers with 2x Xeon(R) Gold 6126 CPU. Here is the error log: [ 247.035564] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 247.036009] #PF: supervisor read access in kernel mode [ 247.036009] #PF: error_code(0x0000) - not-present page [ 247.036009] PGD 0 P4D 0 [ 247.036009] Oops: 0000 [#1] PREEMPT SMP PTI [ 247.058060] CPU: 1 PID: 6546 Comm: umount Not tainted 5.16.0393c3714081a53795bbff0e985d24146def6f57f+ #16 [ 247.058060] Hardware name: Supermicro Super Server/X11DDW-L, BIOS 2.0b 03/07/2018 [ 247.058060] RIP: 0010:kernfs_remove+0x8/0x50 [ 247.058060] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 49 c7 c4 f4 ff ff ff eb b2 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 41 54 55 <48> 8b 47 08 48 89 fd 48 85 c0 48 0f 44 c7 4c 8b 60 50 49 83 c4 60 [ 247.058060] RSP: 0018:ffffbbfa48a27e48 EFLAGS: 00010246 [ 247.058060] RAX: 0000000000000001 RBX: ffffffff89e31f98 RCX: 0000000080200018 [ 247.058060] RDX: 0000000080200019 RSI: fffff6760786c900 RDI: 0000000000000000 [ 247.058060] RBP: ffffffff89e31f98 R08: ffff926b61b24d00 R09: 0000000080200018 [ 247.122048] R10: ffff926b61b24d00 R11: ffff926a8040c000 R12: ffff927bd09a2000 [ 247.122048] R13: ffffffff89e31fa0 R14: dead000000000122 R15: dead000000000100 [ 247.122048] FS: 00007f01be0a8c40(0000) GS:ffff926fa8e40000(0000) knlGS:0000000000000000 [ 247.122048] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 247.122048] CR2: 0000000000000008 CR3: 00000001145c6003 CR4: 00000000007706e0 [ 247.122048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 247.122048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 247.122048] PKRU: 55555554 [ 247.122048] Call Trace: [ 247.122048] <TASK> [ 247.122048] rdt_kill_sb+0x29d/0x350 [ 247.122048] deactivate_locked_super+0x36/0xa0 [ 247.122048] cleanup_mnt+0x131/0x190 [ 247.122048] task_work_run+0x5c/0x90 [ 247.122048] exit_to_user_mode_prepare+0x229/0x230 [ 247.122048] syscall_exit_to_user_mode+0x18/0x40 [ 247.122048] do_syscall_64+0x48/0x90 [ 247.122048] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 247.122048] RIP: 0033:0x7f01be2d735b ``` Link: https://bugzilla.kernel.org/show_bug.cgi?id=215696 Link: https://lore.kernel.org/lkml/CAE4VaGDZr_4wzRn2___eDYRtmdPaGGJdzu_LCSkJYuY9BEO3cw@mail.gmail.com/ Fixes: 393c3714081a (kernfs: switch global kernfs_rwsem lock to per-fs lock) Cc: stable@vger.kernel.org Reported-by: Jirka Hladky <jhladky@redhat.com> Tested-by: Jirka Hladky <jhladky@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Minchan Kim <minchan@kernel.org> Link: https://lore.kernel.org/r/20220427172152.3505364-1-minchan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-27kernfs: Remove reference counting for kernfs_open_node.Imran Khan1-14/+9
The decision to free kernfs_open_node object in kernfs_put_open_node can be taken based on whether kernfs_open_node->files list is empty or not. As far as kernfs_drain_open_files is concerned it can't overlap with kernfs_fops_open and hence can check for ->attr.open optimistically (if ->attr.open is NULL) or under kernfs_open_file_mutex (if it needs to traverse the ->files list.) Thus kernfs_drain_open_files can work w/o ref counting involved kernfs_open_node as well. So remove ->refcnt and modify the above mentioned users accordingly. Suggested by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Link: https://lore.kernel.org/r/20220324103040.584491-2-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-02Merge branch 'work.misc' of ↵Linus Torvalds1-6/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted bits and pieces" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: aio: drop needless assignment in aio_read() clean overflow checks in count_mounts() a bit seq_file: fix NULL pointer arithmetic warning uml/x86: use x86 load_unaligned_zeropad() asm/user.h: killed unused macros constify struct path argument of finish_automount()/do_add_mount() fs: Remove FIXME comment in generic_write_checks()
2022-03-18kernfs: fix typos in commentsJulia Lawall1-1/+1
Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20220314115354.144023-5-Julia.Lawall@inria.fr Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23kernfs: move struct kernfs_root out of the public view.Greg Kroah-Hartman2-0/+27
There is no need to have struct kernfs_root be part of kernfs.h for the whole kernel to see and poke around it. Move it internal to kernfs code and provide a helper function, kernfs_root_to_node(), to handle the one field that kernfs users were directly accessing from the structure. Cc: Imran Khan <imran.f.khan@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220222070713.3517679-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-22kernfs: remove redundant kernfs_rwsem declaration.Imran Khan1-1/+0
Since 'commit 393c3714081a ("kernfs: switch global kernfs_rwsem lock to per-fs lock")' per-fs kernfs_rwsem has replaced global kernfs_rwsem. Remove redundant declaration of global kernfs_rwsem. Fixes: 393c3714081a ("kernfs: switch global kernfs_rwsem lock to per-fs lock") Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Link: https://lore.kernel.org/r/20220218010205.717582-1-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-01seq_file: fix NULL pointer arithmetic warningMaíra Canal1-6/+1
Implement conditional logic in order to replace NULL pointer arithmetic. The use of NULL pointer arithmetic was pointed out by clang with the following warning: fs/kernfs/file.c:128:15: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] return NULL + !*ppos; ~~~~ ^ fs/seq_file.c:559:14: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] return NULL + (*pos == 0); Signed-off-by: Maíra Canal <maira.canal@usp.br> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-12-03kernfs: prevent early freeing of root nodeMinchan Kim1-1/+7
Marek reported the warning below. ========================= WARNING: held lock freed! 5.16.0-rc2+ #10984 Not tainted ------------------------- kworker/1:0/18 is freeing memory ffff00004034e200-ffff00004034e3ff, with a lock still held there! ffff00004034e348 (&root->kernfs_rwsem){++++}-{3:3}, at: __kernfs_remove+0x310/0x37c 3 locks held by kworker/1:0/18: #0: ffff000040107938 ((wq_completion)cgroup_destroy){+.+.}-{0:0}, at: process_one_work+0x1f0/0x6f0 #1: ffff80000b55bdc0 ((work_completion)(&(&css->destroy_rwork)->work)){+.+.}-{0:0}, at: process_one_work+0x1f0/0x6f0 #2: ffff00004034e348 (&root->kernfs_rwsem){++++}-{3:3}, at: __kernfs_remove+0x310/0x37c stack backtrace: CPU: 1 PID: 18 Comm: kworker/1:0 Not tainted 5.16.0-rc2+ #10984 Hardware name: Raspberry Pi 4 Model B (DT) Workqueue: cgroup_destroy css_free_rwork_fn Call trace: dump_backtrace+0x0/0x1ac show_stack+0x18/0x24 dump_stack_lvl+0x8c/0xb8 dump_stack+0x18/0x34 debug_check_no_locks_freed+0x124/0x140 kfree+0xf0/0x3a4 kernfs_put+0x1f8/0x224 __kernfs_remove+0x1b8/0x37c kernfs_destroy_root+0x38/0x50 css_free_rwork_fn+0x288/0x3d4 process_one_work+0x288/0x6f0 worker_thread+0x74/0x470 kthread+0x188/0x194 ret_from_fork+0x10/0x20 Since kernfs moves the kernfs_rwsem lock into root, it couldn't hold the lock when the root node is tearing down. Thus, get the refcount of root node. Fixes: 393c3714081a ("kernfs: switch global kernfs_rwsem lock to per-fs lock") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Minchan Kim <minchan@kernel.org> Link: https://lore.kernel.org/r/20211201231648.1027165-1-minchan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-24kernfs: switch global kernfs_rwsem lock to per-fs lockMinchan Kim5-63/+95
The kernfs implementation has big lock granularity(kernfs_rwsem) so every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the lock. It makes trouble for some cases to wait the global lock for a long time even though they are totally independent contexts each other. A general example is process A goes under direct reclaim with holding the lock when it accessed the file in sysfs and process B is waiting the lock with exclusive mode and then process C is waiting the lock until process B could finish the job after it gets the lock from process A. This patch switches the global kernfs_rwsem to per-fs lock, which put the rwsem into kernfs_root. Suggested-by: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Minchan Kim <minchan@kernel.org> Link: https://lore.kernel.org/r/20211118230008.2679780-1-minchan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-18Merge 5.15-rc6 into driver-core-nextGreg Kroah-Hartman1-1/+8
We need the driver-core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-05fs/kernfs/symlink.c: replace S_IRWXUGO with 0777 on kernfs_create_link()Luis Chamberlain1-2/+1
If one ends up extending this line checkpatch will complain about the use of S_IRWXUGO suggesting it is not preferred and that 0777 should be used instead. Take the tip from checkpatch and do that change before we do our subsequent changes. This makes no functional changes. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20210927163805.808907-8-mcgrof@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-04kernfs: don't create a negative dentry if inactive node existsIan Kent1-1/+8
It's been reported that doing stress test for module insertion and removal can result in an ENOENT from libkmod for a valid module. In kernfs_iop_lookup() a negative dentry is created if there's no kernfs node associated with the dentry or the node is inactive. But inactive kernfs nodes are meant to be invisible to the VFS and creating a negative dentry for these can have unexpected side effects when the node transitions to an active state. The point of creating negative dentries is to avoid the expensive alloc/free cycle that occurs if there are frequent lookups for kernfs attributes that don't exist. So kernfs nodes that are not yet active should not result in a negative dentry being created so when they transition to an active state VFS lookups can create an associated dentry is a natural way. It's also been reported that https://github.com/osandov/blktests.git test block/001 hangs during the test. It was suggested that recent changes to blktests might have caused it but applying this patch resolved the problem without change to blktests. Fixes: c7e7c04274b1 ("kernfs: use VFS negative dentry caching") Tested-by: Yi Zhang <yi.zhang@redhat.com> ACKed-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/163330943316.19450.15056895533949392922.stgit@mickey.themaw.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-28kernfs: also call kernfs_set_rev() for positive dentryHou Tao1-2/+7
A KMSAN warning is reported by Alexander Potapenko: BUG: KMSAN: uninit-value in kernfs_dop_revalidate+0x61f/0x840 fs/kernfs/dir.c:1053 kernfs_dop_revalidate+0x61f/0x840 fs/kernfs/dir.c:1053 d_revalidate fs/namei.c:854 lookup_dcache fs/namei.c:1522 __lookup_hash+0x3a6/0x590 fs/namei.c:1543 filename_create+0x312/0x7c0 fs/namei.c:3657 do_mkdirat+0x103/0x930 fs/namei.c:3900 __do_sys_mkdir fs/namei.c:3931 __se_sys_mkdir fs/namei.c:3929 __x64_sys_mkdir+0xda/0x120 fs/namei.c:3929 do_syscall_x64 arch/x86/entry/common.c:51 It seems a positive dentry in kernfs becomes a negative dentry directly through d_delete() in vfs_rmdir(). dentry->d_time is uninitialized when accessing it in kernfs_dop_revalidate(), because it is only initialized when created as negative dentry in kernfs_iop_lookup(). The problem can be reproduced by the following command: cd /sys/fs/cgroup/pids && mkdir hi && stat hi && rmdir hi && stat hi A simple fixes seems to be initializing d->d_time for positive dentry in kernfs_iop_lookup() as well. The downside is the negative dentry will be revalidated again after it becomes negative in d_delete(), because the revison of its parent must have been increased due to its removal. Alternative solution is implement .d_iput for kernfs, and assign d_time for the newly-generated negative dentry in it. But we may need to take kernfs_rwsem to protect again the concurrent kernfs_link_sibling() on the parent directory, it is a little over-killing. Now the simple fix is chosen. Link: https://marc.info/?l=linux-fsdevel&m=163249838610499 Fixes: c7e7c04274b1 ("kernfs: use VFS negative dentry caching") Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20210928140750.1274441-1-houtao1@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27kernfs: dont call d_splice_alias() under kernfs node lockIan Kent1-4/+2
The call to d_splice_alias() in kernfs_iop_lookup() doesn't depend on any kernfs node so there's no reason to hold the kernfs node lock when calling it. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/162642772000.63632.10672683419693513226.stgit@web.messagingengine.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27kernfs: use i_lock to protect concurrent inode updatesIan Kent2-8/+14
The inode operations .permission() and .getattr() use the kernfs node write lock but all that's needed is the read lock to protect against partial updates of these kernfs node fields which are all done under the write lock. And .permission() is called frequently during path walks and can cause quite a bit of contention between kernfs node operations and path walks when the number of concurrent walks is high. To change kernfs_iop_getattr() and kernfs_iop_permission() to take the rw sem read lock instead of the write lock an additional lock is needed to protect against multiple processes concurrently updating the inode attributes and link count in kernfs_refresh_inode(). The inode i_lock seems like the sensible thing to use to protect these inode attribute updates so use it in kernfs_refresh_inode(). The last hunk in the patch, applied to kernfs_fill_super(), is possibly not needed but taking the lock was present originally. I prefer to continue to take it to protect against a partial update of the source kernfs fields during the call to kernfs_refresh_inode() made by kernfs_get_inode(). Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/162642771474.63632.16295959115893904470.stgit@web.messagingengine.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27kernfs: switch kernfs to use an rwsemIan Kent6-70/+71
The kernfs global lock restricts the ability to perform kernfs node lookup operations in parallel during path walks. Change the kernfs mutex to an rwsem so that, when opportunity arises, node searches can be done in parallel with path walk lookups. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/162642770946.63632.2218304587223241374.stgit@web.messagingengine.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-27kernfs: use VFS negative dentry cachingIan Kent1-20/+35
If there are many lookups for non-existent paths these negative lookups can lead to a lot of overhead during path walks. The VFS allows dentries to be created as negative and hashed, and caches them so they can be used to reduce the fairly high overhead alloc/free cycle that occurs during these lookups. Use the kernfs node parent revision to identify if a change has been made to the containing directory so that the negative dentry can be discarded and the lookup redone. Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Ian Kent <raven@themaw.net> Link: https://lore.kernel.org/r/162642770420.63632.15791924970508867106.stgit@web.messagingengine.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>