summaryrefslogtreecommitdiff
path: root/fs/tracefs/event_inode.c
AgeCommit message (Collapse)AuthorFilesLines
2023-11-02eventfs: Use simple_recursive_removal() to clean up dentriesSteven Rostedt (Google)1-33/+44
Looking at how dentry is removed via the tracefs system, I found that eventfs does not do everything that it did under tracefs. The tracefs removal of a dentry calls simple_recursive_removal() that does a lot more than a simple d_invalidate(). As it should be a requirement that any eventfs_inode that has a dentry, so does its parent. When removing a eventfs_inode, if it has a dentry, a call to simple_recursive_removal() on that dentry should clean up all the dentries underneath it. Add WARN_ON_ONCE() to check for the parent having a dentry if any children do. Link: https://lore.kernel.org/all/20231101022553.GE1957730@ZenIV/ Link: https://lkml.kernel.org/r/20231101172650.552471568@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-02eventfs: Remove special processing of dput() of events directorySteven Rostedt (Google)1-17/+2
The top level events directory is no longer special with regards to how it should be delete. Remove the extra processing for it in eventfs_set_ei_status_free(). Link: https://lkml.kernel.org/r/20231101172650.340876747@goodmis.org Cc: Ajay Kaher <akaher@vmware.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-02eventfs: Delete eventfs_inode when the last dentry is freedSteven Rostedt (Google)1-79/+67
There exists a race between holding a reference of an eventfs_inode dentry and the freeing of the eventfs_inode. If user space has a dentry held long enough, it may still be able to access the dentry's eventfs_inode after it has been freed. To prevent this, have he eventfs_inode freed via the last dput() (or via RCU if the eventfs_inode does not have a dentry). This means reintroducing the eventfs_inode del_list field at a temporary place to put the eventfs_inode. It needs to mark it as freed (via the list) but also must invalidate the dentry immediately as the return from eventfs_remove_dir() expects that they are. But the dentry invalidation must not be called under the eventfs_mutex, so it must be done after the eventfs_inode is marked as free (put on a deletion list). Link: https://lkml.kernel.org/r/20231101172650.123479767@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ajay Kaher <akaher@vmware.com> Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-02eventfs: Hold eventfs_mutex when calling callback functionsSteven Rostedt (Google)1-2/+20
The callback function that is used to create inodes and dentries is not protected by anything and the data that is passed to it could become stale. After eventfs_remove_dir() is called by the tracing system, it is free to remove the events that are associated to that directory. Unfortunately, that means the callbacks must not be called after that. CPU0 CPU1 ---- ---- eventfs_root_lookup() { eventfs_remove_dir() { mutex_lock(&event_mutex); ei->is_freed = set; mutex_unlock(&event_mutex); } kfree(event_call); for (...) { entry = &ei->entries[i]; r = entry->callback() { call = data; // call == event_call above if (call->flags ...) [ USE AFTER FREE BUG ] The safest way to protect this is to wrap the callback with: mutex_lock(&eventfs_mutex); if (!ei->is_freed) r = entry->callback(); else r = -1; mutex_unlock(&eventfs_mutex); This will make sure that the callback will not be called after it is freed. But now it needs to be known that the callback is called while holding internal eventfs locks, and that it must not call back into the eventfs / tracefs system. There's no reason it should anyway, but document that as well. Link: https://lore.kernel.org/all/CA+G9fYu9GOEbD=rR5eMR-=HJ8H6rMsbzDC2ZY5=Y50WpWAE7_Q@mail.gmail.com/ Link: https://lkml.kernel.org/r/20231101172649.906696613@goodmis.org Cc: Ajay Kaher <akaher@vmware.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-02eventfs: Save ownership and modeSteven Rostedt (Google)1-13/+135
Now that inodes and dentries are created on the fly, they are also reclaimed on memory pressure. Since the ownership and file mode are saved in the inode, if they are freed, any changes to the ownership and mode will be lost. To counter this, if the user changes the permissions or ownership, save them, and when creating the inodes again, restore those changes. Link: https://lkml.kernel.org/r/20231101172649.691841445@goodmis.org Cc: stable@vger.kernel.org Cc: Ajay Kaher <akaher@vmware.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions") Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-02eventfs: Test for ei->is_freed when accessing ei->dentrySteven Rostedt (Google)1-6/+39
The eventfs_inode (ei) is protected by SRCU, but the ei->dentry is not. It is protected by the eventfs_mutex. Anytime the eventfs_mutex is released, and access to the ei->dentry needs to be done, it should first check if ei->is_freed is set under the eventfs_mutex. If it is, then the ei->dentry is invalid and must not be used. The ei->dentry must only be accessed under the eventfs_mutex and after checking if ei->is_freed is set. When the ei is being freed, it will (under the eventfs_mutex) set is_freed and at the same time move the dentry to a free list to be cleared after the eventfs_mutex is released. This means that any access to the ei->dentry must check first if ei->is_freed is set, because if it is, then the dentry is on its way to be freed. Also add comments to describe this better. Link: https://lore.kernel.org/all/CA+G9fYt6pY+tMZEOg=SoEywQOe19fGP3uR15SGowkdK+_X85Cg@mail.gmail.com/ Link: https://lore.kernel.org/all/CA+G9fYuDP3hVQ3t7FfrBAjd_WFVSurMgCepTxunSJf=MTe=6aA@mail.gmail.com/ Link: https://lkml.kernel.org/r/20231101172649.477608228@goodmis.org Cc: Ajay Kaher <akaher@vmware.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Beau Belgrave <beaub@linux.microsoft.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-02eventfs: Have a free_ei() that just frees the eventfs_inodeSteven Rostedt (Google)1-8/+11
As the eventfs_inode is freed in two different locations, make a helper function free_ei() to make sure all the allocated fields of the eventfs_inode is freed. This requires renaming the existing free_ei() which is called by the srcu handler to free_rcu_ei() and have free_ei() just do the freeing, where free_rcu_ei() will call it. Link: https://lkml.kernel.org/r/20231101172649.265214087@goodmis.org Cc: Ajay Kaher <akaher@vmware.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-02eventfs: Remove "is_freed" union with rcu headSteven Rostedt (Google)1-0/+2
The eventfs_inode->is_freed was a union with the rcu_head with the assumption that when it was on the srcu list the head would contain a pointer which would make "is_freed" true. But that was a wrong assumption as the rcu head is a single link list where the last element is NULL. Instead, split the nr_entries integer so that "is_freed" is one bit and the nr_entries is the next 31 bits. As there shouldn't be more than 10 (currently there's at most 5 to 7 depending on the config), this should not be a problem. Link: https://lkml.kernel.org/r/20231101172649.049758712@goodmis.org Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ajay Kaher <akaher@vmware.com> Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions") Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-02eventfs: Fix kerneldoc of eventfs_remove_rec()Steven Rostedt (Google)1-2/+4
The eventfs_remove_rec() had some missing parameters in the kerneldoc comment above it. Also, rephrase the description a bit more to have a bit more correct grammar. Link: https://lore.kernel.org/linux-trace-kernel/20231030121523.0b2225a7@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode"); Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310052216.4SgqasWo-lkp@intel.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-11-02eventfs: Remove extra dget() in eventfs_create_events_dir()Steven Rostedt (Google)1-3/+0
The creation of the top events directory does a dget() at the end of the creation in eventfs_create_events_dir() with a comment saying the final dput() will happen when it is removed. The problem is that a dget() is already done on the dentry when it was created with tracefs_start_creating()! The dget() now just causes a memory leak of that dentry. Remove the extra dget() as the final dput() in the deletion of the events directory actually matches the one in tracefs_start_creating(). Link: https://lore.kernel.org/linux-trace-kernel/20231031124229.4f2e3fa1@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-10-26eventfs: Fix WARN_ON() in create_file_dentry()Steven Rostedt (Google)1-1/+2
As the comment right above a WARN_ON() in create_file_dentry() states: * Note, with the mutex held, the e_dentry cannot have content * and the ei->is_freed be true at the same time. But the WARN_ON() only has: WARN_ON_ONCE(ei->is_free); Where to match the comment (and what it should actually do) is: dentry = *e_dentry; WARN_ON_ONCE(dentry && ei->is_free) Also in that case, set dentry to NULL (although it should never happen). Link: https://lore.kernel.org/linux-trace-kernel/20231024123628.62b88755@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-10-20tracefs/eventfs: Modify mismatched function nameJiapeng Chong1-1/+1
No functional modification involved. fs/tracefs/event_inode.c:864: warning: expecting prototype for eventfs_remove(). Prototype was for eventfs_remove_dir() instead. Link: https://lore.kernel.org/linux-trace-kernel/20231019031353.73846-1-jiapeng.chong@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6939 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-10-20eventfs: Fix failure path in eventfs_create_events_dir()Steven Rostedt (Google)1-1/+2
The failure path of allocating ei goes to a path that dereferences ei. Add another label that skips over the ei dereferences to do the rest of the clean up. Link: https://lore.kernel.org/all/70e7bace-561c-95f-1117-706c2c220bc@inria.fr/ Link: https://lore.kernel.org/linux-trace-kernel/20231019204132.6662fef0@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Fixes: 5790b1fb3d67 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: Julia Lawall <julia.lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-10-18eventfs: Use ERR_CAST() in eventfs_create_events_dir()Nathan Chancellor1-1/+1
When building with clang and CONFIG_RANDSTRUCT_FULL=y, there is an error due to a cast in eventfs_create_events_dir(): fs/tracefs/event_inode.c:734:10: error: casting from randomized structure pointer type 'struct dentry *' to 'struct eventfs_inode *' 734 | return (struct eventfs_inode *)dentry; | ^ 1 error generated. Use the ERR_CAST() function to resolve the error, as it was designed for this exact situation (casting an error pointer to another type). Link: https://lore.kernel.org/linux-trace-kernel/20231018-ftrace-fix-clang-randstruct-v1-1-338cb214abfb@kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issues/1947 Fixes: 5790b1fb3d67 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-10-05eventfs: Use eventfs_remove_events_dir()Steven Rostedt (Google)1-12/+7
The update to removing the eventfs_file changed the way the events top level directory was handled. Instead of returning a dentry, it now returns the eventfs_inode. In this changed, the removing of the events top level directory is not much different than removing any of the other directories. Because of this, the removal just called eventfs_remove_dir() instead of eventfs_remove_events_dir(). Although eventfs_remove_dir() does the clean up, it misses out on the dget() of the ei->dentry done in eventfs_create_events_dir(). It makes more sense to match eventfs_create_events_dir() with a specific function eventfs_remove_events_dir() and this specific function can then perform the dput() to the dentry that had the dget() when it was created. Fixes: 5790b1fb3d67 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310051743.y9EobbUr-lkp@intel.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-10-05eventfs: Remove eventfs_file and just use eventfs_inodeSteven Rostedt (Google)1-415/+432
Instead of having a descriptor for every file represented in the eventfs directory, only have the directory itself represented. Change the API to send in a list of entries that represent all the files in the directory (but not other directories). The entry list contains a name and a callback function that will be used to create the files when they are accessed. struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry *parent, const struct eventfs_entry *entries, int size, void *data); is used for the top level eventfs directory, and returns an eventfs_inode that will be used by: struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode *parent, const struct eventfs_entry *entries, int size, void *data); where both of the above take an array of struct eventfs_entry entries for every file that is in the directory. The entries are defined by: typedef int (*eventfs_callback)(const char *name, umode_t *mode, void **data, const struct file_operations **fops); struct eventfs_entry { const char *name; eventfs_callback callback; }; Where the name is the name of the file and the callback gets called when the file is being created. The callback passes in the name (in case the same callback is used for multiple files), a pointer to the mode, data and fops. The data will be pointing to the data that was passed in eventfs_create_dir() or eventfs_create_events_dir() but may be overridden to point to something else, as it will be used to point to the inode->i_private that is created. The information passed back from the callback is used to create the dentry/inode. If the callback fills the data and the file should be created, it must return a positive number. On zero or negative, the file is ignored. This logic may also be used as a prototype to convert entire pseudo file systems into just-in-time allocation. The "show_events_dentry" file has been updated to show the directories, and any files they have. With just the eventfs_file allocations: Before after deltas for meminfo (in kB): MemFree: -14360 MemAvailable: -14260 Buffers: 40 Cached: 24 Active: 44 Inactive: 48 Inactive(anon): 28 Active(file): 44 Inactive(file): 20 Dirty: -4 AnonPages: 28 Mapped: 4 KReclaimable: 132 Slab: 1604 SReclaimable: 132 SUnreclaim: 1472 Committed_AS: 12 Before after deltas for slabinfo: <slab>: <objects> [ * <size> = <total>] ext4_inode_cache 27 [* 1184 = 31968 ] extent_status 102 [* 40 = 4080 ] tracefs_inode_cache 144 [* 656 = 94464 ] buffer_head 39 [* 104 = 4056 ] shmem_inode_cache 49 [* 800 = 39200 ] filp -53 [* 256 = -13568 ] dentry 251 [* 192 = 48192 ] lsm_file_cache 277 [* 32 = 8864 ] vm_area_struct -14 [* 184 = -2576 ] trace_event_file 1748 [* 88 = 153824 ] kmalloc-1k 35 [* 1024 = 35840 ] kmalloc-256 49 [* 256 = 12544 ] kmalloc-192 -28 [* 192 = -5376 ] kmalloc-128 -30 [* 128 = -3840 ] kmalloc-96 10581 [* 96 = 1015776 ] kmalloc-64 3056 [* 64 = 195584 ] kmalloc-32 1291 [* 32 = 41312 ] kmalloc-16 2310 [* 16 = 36960 ] kmalloc-8 9216 [* 8 = 73728 ] Free memory dropped by 14,360 kB Available memory dropped by 14,260 kB Total slab additions in size: 1,771,032 bytes With this change: Before after deltas for meminfo (in kB): MemFree: -12084 MemAvailable: -11976 Buffers: 32 Cached: 32 Active: 72 Inactive: 168 Inactive(anon): 176 Active(file): 72 Inactive(file): -8 Dirty: 24 AnonPages: 196 Mapped: 8 KReclaimable: 148 Slab: 836 SReclaimable: 148 SUnreclaim: 688 Committed_AS: 324 Before after deltas for slabinfo: <slab>: <objects> [ * <size> = <total>] tracefs_inode_cache 144 [* 656 = 94464 ] shmem_inode_cache -23 [* 800 = -18400 ] filp -92 [* 256 = -23552 ] dentry 179 [* 192 = 34368 ] lsm_file_cache -3 [* 32 = -96 ] vm_area_struct -13 [* 184 = -2392 ] trace_event_file 1748 [* 88 = 153824 ] kmalloc-1k -49 [* 1024 = -50176 ] kmalloc-256 -27 [* 256 = -6912 ] kmalloc-128 1864 [* 128 = 238592 ] kmalloc-64 4685 [* 64 = 299840 ] kmalloc-32 -72 [* 32 = -2304 ] kmalloc-16 256 [* 16 = 4096 ] total = 721352 Free memory dropped by 12,084 kB Available memory dropped by 11,976 kB Total slab additions in size: 721,352 bytes That's over 2 MB in savings per instance for free and available memory, and over 1 MB in savings per instance of slab memory. Link: https://lore.kernel.org/linux-trace-kernel/20231003184059.4924468e@gandalf.local.home Link: https://lore.kernel.org/linux-trace-kernel/20231004165007.43d79161@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ajay Kaher <akaher@vmware.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-30eventfs: Test for dentries array allocated in eventfs_release()Steven Rostedt (Google)1-1/+1
The dcache_dir_open_wrapper() could be called when a dynamic event is being deleted leaving a dentry with no children. In this case the dlist->dentries array will never be allocated. This needs to be checked for in eventfs_release(), otherwise it will trigger a NULL pointer dereference. Link: https://lore.kernel.org/linux-trace-kernel/20230930090106.1c3164e9@rorschach.local.home Cc: Mark Rutland <mark.rutland@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Fixes: ef36b4f92868 ("eventfs: Remember what dentries were created on dir open") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-22eventfs: Remember what dentries were created on dir openSteven Rostedt (Google)1-17/+70
Using the following code with libtracefs: int dfd; // create the directory events/kprobes/kp1 tracefs_kprobe_raw(NULL, "kp1", "schedule_timeout", "time=$arg1"); // Open the kprobes directory dfd = tracefs_instance_file_open(NULL, "events/kprobes", O_RDONLY); // Do a lookup of the kprobes/kp1 directory (by looking at enable) tracefs_file_exists(NULL, "events/kprobes/kp1/enable"); // Now create a new entry in the kprobes directory tracefs_kprobe_raw(NULL, "kp2", "schedule_hrtimeout", "expires=$arg1"); // Do another lookup to create the dentries tracefs_file_exists(NULL, "events/kprobes/kp2/enable")) // Close the directory close(dfd); What happened above, the first open (dfd) will call dcache_dir_open_wrapper() that will create the dentries and up their ref counts. Now the creation of "kp2" will add another dentry within the kprobes directory. Upon the close of dfd, eventfs_release() will now do a dput for all the entries in kprobes. But this is where the problem lies. The open only upped the dentry of kp1 and not kp2. Now the close is decrementing both kp1 and kp2, which causes kp2 to get a negative count. Doing a "trace-cmd reset" which deletes all the kprobes cause the kernel to crash! (due to the messed up accounting of the ref counts). To solve this, save all the dentries that are opened in the dcache_dir_open_wrapper() into an array, and use this array to know what dentries to do a dput on in eventfs_release(). Since the dcache_dir_open_wrapper() calls dcache_dir_open() which uses the file->private_data, we need to also add a wrapper around dcache_readdir() that uses the cursor assigned to the file->private_data. This is because the dentries need to also be saved in the file->private_data. To do this create the structure: struct dentry_list { void *cursor; struct dentry **dentries; }; Which will hold both the cursor and the dentries. Some shuffling around is needed to make sure that dcache_dir_open() and dcache_readdir() only see the cursor. Link: https://lore.kernel.org/linux-trace-kernel/20230919211804.230edf1e@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20230922163446.1431d4fa@gandalf.local.home Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ajay Kaher <akaher@vmware.com> Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions") Reported-by: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-12tracefs/eventfs: Use list_for_each_srcu() in dcache_dir_open_wrapper()Steven Rostedt (Google)1-1/+2
The eventfs files list is protected by SRCU. In earlier iterations it was protected with just RCU, but because it needed to also call sleepable code, it had to be switch to SRCU. The dcache_dir_open_wrapper() list_for_each_rcu() was missed and did not get converted over to list_for_each_srcu(). That needs to be fixed. Link: https://lore.kernel.org/linux-trace-kernel/20230911120053.ca82f545e7f46ea753deda18@kernel.org/ Link: https://lore.kernel.org/linux-trace-kernel/20230911200654.71ce927c@gandalf.local.home Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ajay Kaher <akaher@vmware.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-09tracefs/eventfs: Free top level files on removalSteven Rostedt (Google)1-4/+26
When an instance is removed, the top level files of the eventfs directory are not cleaned up. Call the eventfs_remove() on each of the entries to free them. This was found via kmemleak: unreferenced object 0xffff8881047c1280 (size 96): comm "mkdir", pid 924, jiffies 4294906489 (age 2013.077s) hex dump (first 32 bytes): 18 31 ed 03 81 88 ff ff 00 31 09 24 81 88 ff ff .1.......1.$.... 00 00 00 00 00 00 00 00 98 19 7c 04 81 88 ff ff ..........|..... backtrace: [<000000000fa46b4d>] kmalloc_trace+0x2a/0xa0 [<00000000e729cd0c>] eventfs_prepare_ef.constprop.0+0x3a/0x160 [<000000009032e6a8>] eventfs_add_events_file+0xa0/0x160 [<00000000fe968442>] create_event_toplevel_files+0x6f/0x130 [<00000000e364d173>] event_trace_add_tracer+0x14/0x140 [<00000000411840fa>] trace_array_create_dir+0x52/0xf0 [<00000000967804fa>] trace_array_create+0x208/0x370 [<00000000da505565>] instance_mkdir+0x6b/0xb0 [<00000000dc1215af>] tracefs_syscall_mkdir+0x5b/0x90 [<00000000a8aca289>] vfs_mkdir+0x272/0x380 [<000000007709b242>] do_mkdirat+0xfc/0x1d0 [<00000000c0b6d219>] __x64_sys_mkdir+0x78/0xa0 [<0000000097b5dd4b>] do_syscall_64+0x3f/0x90 [<00000000a3f00cfa>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 unreferenced object 0xffff888103ed3118 (size 8): comm "mkdir", pid 924, jiffies 4294906489 (age 2013.077s) hex dump (first 8 bytes): 65 6e 61 62 6c 65 00 00 enable.. backtrace: [<0000000010f75127>] __kmalloc_node_track_caller+0x51/0x160 [<000000004b3eca91>] kstrdup+0x34/0x60 [<0000000050074d7a>] eventfs_prepare_ef.constprop.0+0x53/0x160 [<000000009032e6a8>] eventfs_add_events_file+0xa0/0x160 [<00000000fe968442>] create_event_toplevel_files+0x6f/0x130 [<00000000e364d173>] event_trace_add_tracer+0x14/0x140 [<00000000411840fa>] trace_array_create_dir+0x52/0xf0 [<00000000967804fa>] trace_array_create+0x208/0x370 [<00000000da505565>] instance_mkdir+0x6b/0xb0 [<00000000dc1215af>] tracefs_syscall_mkdir+0x5b/0x90 [<00000000a8aca289>] vfs_mkdir+0x272/0x380 [<000000007709b242>] do_mkdirat+0xfc/0x1d0 [<00000000c0b6d219>] __x64_sys_mkdir+0x78/0xa0 [<0000000097b5dd4b>] do_syscall_64+0x3f/0x90 [<00000000a3f00cfa>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Link: https://lore.kernel.org/linux-trace-kernel/20230907175859.6fedbaa2@gandalf.local.home Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ajay Kaher <akaher@vmware.com> Cc: Zheng Yejian <zhengyejian1@huawei.com> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Fixes: 5bdcd5f5331a2 eventfs: ("Implement removal of meta data from eventfs") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-07tracefs/eventfs: Use dput to free the toplevel events directorySteven Rostedt (Google)1-5/+12
Currently when rmdir on an instance is done, eventfs_remove_events_dir() is called and it does a dput on the dentry and then frees the eventfs_inode that represents the events directory. But there's no protection against a reader reading the top level events directory at the same time and we can get a use after free error. Instead, use the dput() associated to the dentry to also free the eventfs_inode associated to the events directory, as that will get called when the last reference to the directory is released. This issue triggered the following KASAN report: ================================================================== BUG: KASAN: slab-use-after-free in eventfs_root_lookup+0x88/0x1b0 Read of size 8 at addr ffff888120130ca0 by task ftracetest/1201 CPU: 4 PID: 1201 Comm: ftracetest Not tainted 6.5.0-test-10737-g469e0a8194e7 #13 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x57/0x90 print_report+0xcf/0x670 ? __pfx_ring_buffer_record_off+0x10/0x10 ? _raw_spin_lock_irqsave+0x2b/0x70 ? __virt_addr_valid+0xd9/0x160 kasan_report+0xd4/0x110 ? eventfs_root_lookup+0x88/0x1b0 ? eventfs_root_lookup+0x88/0x1b0 eventfs_root_lookup+0x88/0x1b0 ? eventfs_root_lookup+0x33/0x1b0 __lookup_slow+0x194/0x2a0 ? __pfx___lookup_slow+0x10/0x10 ? down_read+0x11c/0x330 walk_component+0x166/0x220 link_path_walk.part.0.constprop.0+0x3a3/0x5a0 ? seqcount_lockdep_reader_access+0x82/0x90 ? __pfx_link_path_walk.part.0.constprop.0+0x10/0x10 path_openat+0x143/0x11f0 ? __lock_acquire+0xa1a/0x3220 ? __pfx_path_openat+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 do_filp_open+0x166/0x290 ? __pfx_do_filp_open+0x10/0x10 ? lock_is_held_type+0xce/0x120 ? preempt_count_sub+0xb7/0x100 ? _raw_spin_unlock+0x29/0x50 ? alloc_fd+0x1a0/0x320 do_sys_openat2+0x126/0x160 ? rcu_is_watching+0x34/0x60 ? __pfx_do_sys_openat2+0x10/0x10 ? __might_resched+0x2cf/0x3b0 ? __fget_light+0xdf/0x100 __x64_sys_openat+0xcd/0x140 ? __pfx___x64_sys_openat+0x10/0x10 ? syscall_enter_from_user_mode+0x22/0x90 ? lockdep_hardirqs_on+0x7d/0x100 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f1dceef5e51 Code: 75 57 89 f0 25 00 00 41 00 3d 00 00 41 00 74 49 80 3d 9a 27 0e 00 00 74 6d 89 da 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 93 00 00 00 48 8b 54 24 28 64 48 2b 14 25 RSP: 002b:00007fff2cddf380 EFLAGS: 00000202 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 0000000000000241 RCX: 00007f1dceef5e51 RDX: 0000000000000241 RSI: 000055d7520677d0 RDI: 00000000ffffff9c RBP: 000055d7520677d0 R08: 000000000000001e R09: 0000000000000001 R10: 00000000000001b6 R11: 0000000000000202 R12: 0000000000000000 R13: 0000000000000003 R14: 000055d752035678 R15: 000055d752067788 </TASK> Allocated by task 1200: kasan_save_stack+0x2f/0x50 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x8b/0x90 eventfs_create_events_dir+0x54/0x220 create_event_toplevel_files+0x42/0x130 event_trace_add_tracer+0x33/0x180 trace_array_create_dir+0x52/0xf0 trace_array_create+0x361/0x410 instance_mkdir+0x6b/0xb0 tracefs_syscall_mkdir+0x57/0x80 vfs_mkdir+0x275/0x380 do_mkdirat+0x1da/0x210 __x64_sys_mkdir+0x74/0xa0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Freed by task 1251: kasan_save_stack+0x2f/0x50 kasan_set_track+0x21/0x30 kasan_save_free_info+0x27/0x40 __kasan_slab_free+0x106/0x180 __kmem_cache_free+0x149/0x2e0 event_trace_del_tracer+0xcb/0x120 __remove_instance+0x16a/0x340 instance_rmdir+0x77/0xa0 tracefs_syscall_rmdir+0x77/0xc0 vfs_rmdir+0xed/0x2d0 do_rmdir+0x235/0x280 __x64_sys_rmdir+0x5f/0x90 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 The buggy address belongs to the object at ffff888120130ca0 which belongs to the cache kmalloc-16 of size 16 The buggy address is located 0 bytes inside of freed 16-byte region [ffff888120130ca0, ffff888120130cb0) The buggy address belongs to the physical page: page:000000004dbddbb0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x120130 flags: 0x17ffffc0000800(slab|node=0|zone=2|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 0017ffffc0000800 ffff8881000423c0 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000800080 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888120130b80: 00 00 fc fc 00 05 fc fc 00 00 fc fc 00 02 fc fc ffff888120130c00: 00 07 fc fc 00 00 fc fc 00 00 fc fc fa fb fc fc >ffff888120130c80: 00 00 fc fc fa fb fc fc 00 00 fc fc 00 00 fc fc ^ ffff888120130d00: 00 00 fc fc 00 00 fc fc 00 00 fc fc fa fb fc fc ffff888120130d80: 00 00 fc fc 00 00 fc fc 00 00 fc fc 00 00 fc fc ================================================================== Link: https://lkml.kernel.org/r/20230907024803.250873643@goodmis.org Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/ Cc: Ajay Kaher <akaher@vmware.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 5bdcd5f5331a2 eventfs: ("Implement removal of meta data from eventfs") Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-06tracefs/eventfs: Add missing lockdown checksSteven Rostedt (Google)1-0/+15
All the eventfs external functions do not check if TRACEFS_LOCKDOWN was set or not. This can caused some functions to return success while others fail, which can trigger unexpected errors. Add the missing lockdown checks. Link: https://lkml.kernel.org/r/20230905182711.899724045@goodmis.org Link: https://lore.kernel.org/all/202309050916.58201dc6-oliver.sang@intel.com/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ajay Kaher <akaher@vmware.com> Cc: Ching-lin Yu <chinglinyu@google.com> Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-08-22tracefs: Remove kerneldoc from struct eventfs_fileSteven Rostedt (Google)1-4/+10
The struct eventfs_file is a local structure and should not be parsed by kernel doc. It also does not fully follow the kerneldoc format and is causing kerneldoc to spit out errors. Replace the /** to /* so that kerneldoc no longer processes this structure. Also format the comments of the delete union of the structure to be a bit better. Link: https://lore.kernel.org/linux-trace-kernel/20230818201414.2729745-1-willy@infradead.org/ Link: https://lore.kernel.org/linux-trace-kernel/20230822053313.77aa3397@rorschach.local.home Cc: Mark Rutland <mark.rutland@arm.com> Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-31eventfs: Implement removal of meta data from eventfsAjay Kaher1-0/+143
When events are removed from tracefs, the eventfs must be aware of this. The eventfs_remove() removes the meta data from eventfs so that it will no longer create the files associated with that event. When an instance is removed from tracefs, eventfs_remove_events_dir() will remove and clean up the entire "events" directory. The helper function eventfs_remove_rec() is used to clean up and free the associated data from eventfs for both of the added functions. SRCU is used to protect the lists of meta data stored in the eventfs. The eventfs_mutex is used to protect the content of the items in the list. As lookups may be happening as deletions of events are made, the freeing of dentry/inodes and relative information is done after the SRCU grace period has passed. Link: https://lkml.kernel.org/r/1690568452-46553-9-git-send-email-akaher@vmware.com Signed-off-by: Ajay Kaher <akaher@vmware.com> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Ching-lin Yu <chinglinyu@google.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202305030611.Kas747Ev-lkp@intel.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-31eventfs: Implement functions to create files and dirs when accessedAjay Kaher1-2/+59
Add create_file() and create_dir() functions to create the files and directories respectively when they are accessed. The functions will be called from the lookup operation of the inode_operations or from the open function of file_operations. Link: https://lkml.kernel.org/r/1690568452-46553-8-git-send-email-akaher@vmware.com Signed-off-by: Ajay Kaher <akaher@vmware.com> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Ching-lin Yu <chinglinyu@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-31eventfs: Implement eventfs lookup, read, open functionsAjay Kaher1-0/+304
Add the inode_operations, file_operations, and helper functions to eventfs: dcache_dir_open_wrapper() eventfs_root_lookup() eventfs_release() eventfs_set_ef_status_free() eventfs_post_create_dir() The inode_operations and file_operations functions will be called from the VFS layer. create_file() and create_dir() are added as stub functions and will be filled in later. Link: https://lkml.kernel.org/r/1690568452-46553-7-git-send-email-akaher@vmware.com Signed-off-by: Ajay Kaher <akaher@vmware.com> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Ching-lin Yu <chinglinyu@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-31eventfs: Implement eventfs file add functionsAjay Kaher1-0/+86
Add the following functions to add files to evenfs: eventfs_add_events_file() to add the data needed to create a specific file located at the top level events directory. The dentry/inode will be created when the events directory is scanned. eventfs_add_file() to add the data needed for files within the directories below the top level events directory. The dentry/inode of the file will be created when the directory that the file is in is scanned. Link: https://lkml.kernel.org/r/1690568452-46553-6-git-send-email-akaher@vmware.com Signed-off-by: Ajay Kaher <akaher@vmware.com> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Ching-lin Yu <chinglinyu@google.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-31eventfs: Implement eventfs dir creation functionsAjay Kaher1-0/+211
Add eventfs_file structure which will hold the properties of the eventfs files and directories. Add following functions to create the directories in eventfs: eventfs_create_events_dir() will create the top level "events" directory within the tracefs file system. eventfs_add_subsystem_dir() creates an eventfs_file descriptor with the given name of the subsystem. eventfs_add_dir() creates an eventfs_file descriptor with the given name of the directory and attached to a eventfs_file of a subsystem. Add tracefs_inode structure to hold the inodes, flags and pointers to private data used by eventfs. Link: https://lkml.kernel.org/r/1690568452-46553-5-git-send-email-akaher@vmware.com Signed-off-by: Ajay Kaher <akaher@vmware.com> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Ching-lin Yu <chinglinyu@google.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-lkp/202305051619.9a469a9a-yujie.liu@intel.com Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>