From 21c094d3f8a6c88dedbd9831631a263d5c49775f Mon Sep 17 00:00:00 2001 From: Donglin Peng Date: Sat, 8 Apr 2023 05:42:16 -0700 Subject: tracing: Add documentation for funcgraph-retval and funcgraph-retval-hex Add documentation for the two newly introduced options for the function_graph tracer. The funcgraph-retval option is used to control whether or not to display the return value, while the funcgraph-retval-hex option is used to control the display format of the return value. Link: https://lkml.kernel.org/r/2b5635f05146161b54c9ea6307e25efe5ccebdad.1680954589.git.pengdonglin@sangfor.com.cn Acked-by: Masami Hiramatsu (Google) Signed-off-by: Donglin Peng Signed-off-by: Steven Rostedt (Google) --- Documentation/trace/ftrace.rst | 126 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 126 insertions(+) (limited to 'Documentation/trace') diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst index 027437b745a0..df2d3e57a83f 100644 --- a/Documentation/trace/ftrace.rst +++ b/Documentation/trace/ftrace.rst @@ -1359,6 +1359,19 @@ Options for function_graph tracer: only a closing curly bracket "}" is displayed for the return of a function. + funcgraph-retval + When set, the return value of each traced function + will be printed after an equal sign "=". By default + this is off. + + funcgraph-retval-hex + When set, the return value will always be printed + in hexadecimal format. If the option is not set and + the return value is an error code, it will be printed + in signed decimal format; otherwise it will also be + printed in hexadecimal format. By default, this option + is off. + sleep-time When running function graph tracer, to include the time a task schedules out in its function. @@ -2704,6 +2717,119 @@ It is default disabled. 0) 1.757 us | } /* kmem_cache_free() */ 0) 2.861 us | } /* putname() */ +The return value of each traced function can be displayed after +an equal sign "=". When encountering system call failures, it +can be verfy helpful to quickly locate the function that first +returns an error code. + + - hide: echo nofuncgraph-retval > trace_options + - show: echo funcgraph-retval > trace_options + + Example with funcgraph-retval:: + + 1) | cgroup_migrate() { + 1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */ + 1) | cgroup_migrate_execute() { + 1) | cpu_cgroup_can_attach() { + 1) | cgroup_taskset_first() { + 1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */ + 1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */ + 1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */ + 1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */ + 1) 4.369 us | } /* cgroup_migrate_execute = -22 */ + 1) 7.143 us | } /* cgroup_migrate = -22 */ + +The above example shows that the function cpu_cgroup_can_attach +returned the error code -22 firstly, then we can read the code +of this function to get the root cause. + +When the option funcgraph-retval-hex is not set, the return value can +be displayed in a smart way. Specifically, if it is an error code, +it will be printed in signed decimal format, otherwise it will +printed in hexadecimal format. + + - smart: echo nofuncgraph-retval-hex > trace_options + - hexadecimal: echo funcgraph-retval-hex > trace_options + + Example with funcgraph-retval-hex:: + + 1) | cgroup_migrate() { + 1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */ + 1) | cgroup_migrate_execute() { + 1) | cpu_cgroup_can_attach() { + 1) | cgroup_taskset_first() { + 1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */ + 1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */ + 1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */ + 1) 2.335 us | } /* cpu_cgroup_can_attach = 0xffffffea */ + 1) 4.369 us | } /* cgroup_migrate_execute = 0xffffffea */ + 1) 7.143 us | } /* cgroup_migrate = 0xffffffea */ + +At present, there are some limitations when using the funcgraph-retval +option, and these limitations will be eliminated in the future: + +- Even if the function return type is void, a return value will still + be printed, and you can just ignore it. + +- Even if return values are stored in multiple registers, only the + value contained in the first register will be recorded and printed. + To illustrate, in the x86 architecture, eax and edx are used to store + a 64-bit return value, with the lower 32 bits saved in eax and the + upper 32 bits saved in edx. However, only the value stored in eax + will be recorded and printed. + +- In certain procedure call standards, such as arm64's AAPCS64, when a + type is smaller than a GPR, it is the responsibility of the consumer + to perform the narrowing, and the upper bits may contain UNKNOWN values. + Therefore, it is advisable to check the code for such cases. For instance, + when using a u8 in a 64-bit GPR, bits [63:8] may contain arbitrary values, + especially when larger types are truncated, whether explicitly or implicitly. + Here are some specific cases to illustrate this point: + + **Case One**:: + + The function narrow_to_u8 is defined as follows:: + + u8 narrow_to_u8(u64 val) + { + // implicitly truncated + return val; + } + + It may be compiled to:: + + narrow_to_u8: + < ... ftrace instrumentation ... > + RET + + If you pass 0x123456789abcdef to this function and want to narrow it, + it may be recorded as 0x123456789abcdef instead of 0xef. + + **Case Two**:: + + The function error_if_not_4g_aligned is defined as follows:: + + int error_if_not_4g_aligned(u64 val) + { + if (val & GENMASK(31, 0)) + return -EINVAL; + + return 0; + } + + It could be compiled to:: + + error_if_not_4g_aligned: + CBNZ w0, .Lnot_aligned + RET // bits [31:0] are zero, bits + // [63:32] are UNKNOWN + .Lnot_aligned: + MOV x0, #-EINVAL + RET + + When passing 0x2_0000_0000 to it, the return value may be recorded as + 0x2_0000_0000 instead of 0. + You can put some comments on specific functions by using trace_printk() For example, if you want to put a comment inside the __might_sleep() function, you just have to include -- cgit v1.2.3 From 83f74441bcb16c324b7bdba0ab4261a44cb1ac21 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Sun, 11 Jun 2023 15:00:29 +0200 Subject: ftrace: Show all functions with addresses in available_filter_functions_addrs Adding new available_filter_functions_addrs file that shows all available functions (same as available_filter_functions) together with addresses, like: # cat available_filter_functions_addrs | head ffffffff81000770 __traceiter_initcall_level ffffffff810007c0 __traceiter_initcall_start ffffffff81000810 __traceiter_initcall_finish ffffffff81000860 trace_initcall_finish_cb ... Note displayed address is the patch-site address and can differ from /proc/kallsyms address. It's useful to have address avilable for traceable symbols, so we don't need to allways cross check kallsyms with available_filter_functions (or the other way around) and have all the data in single file. For backwards compatibility reasons we can't change the existing available_filter_functions file output, but we need to add new file. The problem is that we need to do 2 passes: - through available_filter_functions and find out if the function is traceable - through /proc/kallsyms to get the address for traceable function Having available_filter_functions symbols together with addresses allow us to skip the kallsyms step and we are ok with the address in available_filter_functions_addr not being the function entry, because kprobe_multi uses fprobe and that handles both entry and patch-site address properly. We have 2 interfaces how to create kprobe_multi link: a) passing symbols to kernel 1) user gathers symbols and need to ensure that they are trace-able -> pass through available_filter_functions file 2) kernel takes those symbols and translates them to addresses through kallsyms api 3) addresses are passed to fprobe/ftrace through: register_fprobe_ips -> ftrace_set_filter_ips b) passing addresses to kernel 1) user gathers symbols and needs to ensure that they are trace-able -> pass through available_filter_functions file 2) user takes those symbols and translates them to addresses through /proc/kallsyms 3) addresses are passed to the kernel and kernel calls: register_fprobe_ips -> ftrace_set_filter_ips The new available_filter_functions_addrs file helps us with option b), because we can make 'b 1' and 'b 2' in one step - while filtering traceable functions, we get the address directly. Link: https://lore.kernel.org/linux-trace-kernel/20230611130029.1202298-1-jolsa@kernel.org Cc: Masami Hiramatsu Cc: Mark Rutland Cc: Andrii Nakryiko Tested-by: Jackie Liu # x86 Suggested-by: Steven Rostedt (Google) Suggested-by: Andrii Nakryiko Signed-off-by: Jiri Olsa Signed-off-by: Steven Rostedt (Google) --- Documentation/trace/ftrace.rst | 6 ++++++ include/linux/ftrace.h | 1 + kernel/trace/ftrace.c | 37 +++++++++++++++++++++++++++++++++++++ 3 files changed, 44 insertions(+) (limited to 'Documentation/trace') diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst index df2d3e57a83f..b7308ab10c0e 100644 --- a/Documentation/trace/ftrace.rst +++ b/Documentation/trace/ftrace.rst @@ -324,6 +324,12 @@ of ftrace. Here is a list of some of the key files: "set_graph_function", or "set_graph_notrace". (See the section "dynamic ftrace" below for more details.) + available_filter_functions_addrs: + + Similar to available_filter_functions, but with address displayed + for each function. The displayed address is the patch-site address + and can differ from /proc/kallsyms address. + dyn_ftrace_total_info: This file is for debugging purposes. The number of functions that diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 49f279f4c3a1..8e59bd954153 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -633,6 +633,7 @@ enum { FTRACE_ITER_MOD = (1 << 5), FTRACE_ITER_ENABLED = (1 << 6), FTRACE_ITER_TOUCHED = (1 << 7), + FTRACE_ITER_ADDRS = (1 << 8), }; void arch_ftrace_update_code(int command); diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 764668467155..b24c573934af 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -3861,6 +3861,9 @@ static int t_show(struct seq_file *m, void *v) if (!rec) return 0; + if (iter->flags & FTRACE_ITER_ADDRS) + seq_printf(m, "%lx ", rec->ip); + if (print_rec(m, rec->ip)) { /* This should only happen when a rec is disabled */ WARN_ON_ONCE(!(rec->flags & FTRACE_FL_DISABLED)); @@ -3996,6 +3999,30 @@ ftrace_touched_open(struct inode *inode, struct file *file) return 0; } +static int +ftrace_avail_addrs_open(struct inode *inode, struct file *file) +{ + struct ftrace_iterator *iter; + int ret; + + ret = security_locked_down(LOCKDOWN_TRACEFS); + if (ret) + return ret; + + if (unlikely(ftrace_disabled)) + return -ENODEV; + + iter = __seq_open_private(file, &show_ftrace_seq_ops, sizeof(*iter)); + if (!iter) + return -ENOMEM; + + iter->pg = ftrace_pages_start; + iter->flags = FTRACE_ITER_ADDRS; + iter->ops = &global_ops; + + return 0; +} + /** * ftrace_regex_open - initialize function tracer filter files * @ops: The ftrace_ops that hold the hash filters @@ -5916,6 +5943,13 @@ static const struct file_operations ftrace_touched_fops = { .release = seq_release_private, }; +static const struct file_operations ftrace_avail_addrs_fops = { + .open = ftrace_avail_addrs_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release_private, +}; + static const struct file_operations ftrace_filter_fops = { .open = ftrace_filter_open, .read = seq_read, @@ -6377,6 +6411,9 @@ static __init int ftrace_init_dyn_tracefs(struct dentry *d_tracer) trace_create_file("available_filter_functions", TRACE_MODE_READ, d_tracer, NULL, &ftrace_avail_fops); + trace_create_file("available_filter_functions_addrs", TRACE_MODE_READ, + d_tracer, NULL, &ftrace_avail_addrs_fops); + trace_create_file("enabled_functions", TRACE_MODE_READ, d_tracer, NULL, &ftrace_enabled_fops); -- cgit v1.2.3 From e88ed227f639ebcb31ed4e5b88756b47d904584b Mon Sep 17 00:00:00 2001 From: Daniel Bristot de Oliveira Date: Tue, 6 Jun 2023 17:12:27 +0200 Subject: tracing/timerlat: Add user-space interface Going a step further, we propose a way to use any user-space workload as the task waiting for the timerlat timer. This is done via a per-CPU file named osnoise/cpu$id/timerlat_fd file. The tracef_fd allows a task to open at a time. When a task reads the file, the timerlat timer is armed for future osnoise/timerlat_period_us time. When the timer fires, it prints the IRQ latency and wakes up the user-space thread waiting in the timerlat_fd. The thread then starts to run, executes the timerlat measurement, prints the thread scheduling latency and returns to user-space. When the thread rereads the timerlat_fd, the tracer will print the user-ret(urn) latency, which is an additional metric. This additional metric is also traced by the tracer and can be used, for example of measuring the context switch overhead from kernel-to-user and user-to-kernel, or the response time for an arbitrary execution in user-space. The tracer supports one thread per CPU, the thread must be pinned to the CPU, and it cannot migrate while holding the timerlat_fd. The reason is that the tracer is per CPU (nothing prohibits the tracer from allowing migrations in the future). The tracer monitors the migration of the thread and disables the tracer if detected. The timerlat_fd is only available for opening/reading when timerlat tracer is enabled, and NO_OSNOISE_WORKLOAD is set. The simplest way to activate this feature from user-space is: -------------------------------- %< ----------------------------------- int main(void) { char buffer[1024]; int timerlat_fd; int retval; long cpu = 0; /* place in CPU 0 */ cpu_set_t set; CPU_ZERO(&set); CPU_SET(cpu, &set); if (sched_setaffinity(gettid(), sizeof(set), &set) == -1) return 1; snprintf(buffer, sizeof(buffer), "/sys/kernel/tracing/osnoise/per_cpu/cpu%ld/timerlat_fd", cpu); timerlat_fd = open(buffer, O_RDONLY); if (timerlat_fd < 0) { printf("error opening %s: %s\n", buffer, strerror(errno)); exit(1); } for (;;) { retval = read(timerlat_fd, buffer, 1024); if (retval < 0) break; } close(timerlat_fd); exit(0); } -------------------------------- >% ----------------------------------- When disabling timerlat, if there is a workload holding the timerlat_fd, the SIGKILL will be sent to the thread. Link: https://lkml.kernel.org/r/69fe66a863d2792ff4c3a149bf9e32e26468bb3a.1686063934.git.bristot@kernel.org Cc: Juri Lelli Cc: William White Cc: Daniel Bristot de Oliveira Cc: Masami Hiramatsu Cc: Jonathan Corbet Signed-off-by: Daniel Bristot de Oliveira Signed-off-by: Steven Rostedt (Google) --- Documentation/trace/timerlat-tracer.rst | 78 +++++++ kernel/trace/trace_osnoise.c | 378 +++++++++++++++++++++++++++++++- kernel/trace/trace_output.c | 4 +- 3 files changed, 455 insertions(+), 5 deletions(-) (limited to 'Documentation/trace') diff --git a/Documentation/trace/timerlat-tracer.rst b/Documentation/trace/timerlat-tracer.rst index db17df312bc8..53a56823e903 100644 --- a/Documentation/trace/timerlat-tracer.rst +++ b/Documentation/trace/timerlat-tracer.rst @@ -180,3 +180,81 @@ dummy_load_1ms_pd_init, which had the following code (on purpose):: return 0; } + +User-space interface +--------------------------- + +Timerlat allows user-space threads to use timerlat infra-structure to +measure scheduling latency. This interface is accessible via a per-CPU +file descriptor inside $tracing_dir/osnoise/per_cpu/cpu$ID/timerlat_fd. + +This interface is accessible under the following conditions: + + - timerlat tracer is enable + - osnoise workload option is set to NO_OSNOISE_WORKLOAD + - The user-space thread is affined to a single processor + - The thread opens the file associated with its single processor + - Only one thread can access the file at a time + +The open() syscall will fail if any of these conditions are not met. +After opening the file descriptor, the user space can read from it. + +The read() system call will run a timerlat code that will arm the +timer in the future and wait for it as the regular kernel thread does. + +When the timer IRQ fires, the timerlat IRQ will execute, report the +IRQ latency and wake up the thread waiting in the read. The thread will be +scheduled and report the thread latency via tracer - as for the kernel +thread. + +The difference from the in-kernel timerlat is that, instead of re-arming +the timer, timerlat will return to the read() system call. At this point, +the user can run any code. + +If the application rereads the file timerlat file descriptor, the tracer +will report the return from user-space latency, which is the total +latency. If this is the end of the work, it can be interpreted as the +response time for the request. + +After reporting the total latency, timerlat will restart the cycle, arm +a timer, and go to sleep for the following activation. + +If at any time one of the conditions is broken, e.g., the thread migrates +while in user space, or the timerlat tracer is disabled, the SIG_KILL +signal will be sent to the user-space thread. + +Here is an basic example of user-space code for timerlat:: + + int main(void) + { + char buffer[1024]; + int timerlat_fd; + int retval; + long cpu = 0; /* place in CPU 0 */ + cpu_set_t set; + + CPU_ZERO(&set); + CPU_SET(cpu, &set); + + if (sched_setaffinity(gettid(), sizeof(set), &set) == -1) + return 1; + + snprintf(buffer, sizeof(buffer), + "/sys/kernel/tracing/osnoise/per_cpu/cpu%ld/timerlat_fd", + cpu); + + timerlat_fd = open(buffer, O_RDONLY); + if (timerlat_fd < 0) { + printf("error opening %s: %s\n", buffer, strerror(errno)); + exit(1); + } + + for (;;) { + retval = read(timerlat_fd, buffer, 1024); + if (retval < 0) + break; + } + + close(timerlat_fd); + exit(0); + } diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c index 220172cb874d..bd0d01d00fb9 100644 --- a/kernel/trace/trace_osnoise.c +++ b/kernel/trace/trace_osnoise.c @@ -181,6 +181,7 @@ struct osn_irq { #define IRQ_CONTEXT 0 #define THREAD_CONTEXT 1 +#define THREAD_URET 2 /* * sofirq runtime info. */ @@ -238,6 +239,7 @@ struct timerlat_variables { u64 abs_period; bool tracing_thread; u64 count; + bool uthread_migrate; }; static DEFINE_PER_CPU(struct timerlat_variables, per_cpu_timerlat_var); @@ -1181,6 +1183,78 @@ thread_exit(struct osnoise_variables *osn_var, struct task_struct *t) osn_var->thread.arrival_time = 0; } +#ifdef CONFIG_TIMERLAT_TRACER +/* + * osnoise_stop_exception - Stop tracing and the tracer. + */ +static __always_inline void osnoise_stop_exception(char *msg, int cpu) +{ + struct osnoise_instance *inst; + struct trace_array *tr; + + rcu_read_lock(); + list_for_each_entry_rcu(inst, &osnoise_instances, list) { + tr = inst->tr; + trace_array_printk_buf(tr->array_buffer.buffer, _THIS_IP_, + "stop tracing hit on cpu %d due to exception: %s\n", + smp_processor_id(), + msg); + + if (test_bit(OSN_PANIC_ON_STOP, &osnoise_options)) + panic("tracer hit on cpu %d due to exception: %s\n", + smp_processor_id(), + msg); + + tracer_tracing_off(tr); + } + rcu_read_unlock(); +} + +/* + * trace_sched_migrate_callback - sched:sched_migrate_task trace event handler + * + * his function is hooked to the sched:sched_migrate_task trace event, and monitors + * timerlat user-space thread migration. + */ +static void trace_sched_migrate_callback(void *data, struct task_struct *p, int dest_cpu) +{ + struct osnoise_variables *osn_var; + long cpu = task_cpu(p); + + osn_var = per_cpu_ptr(&per_cpu_osnoise_var, cpu); + if (osn_var->pid == p->pid && dest_cpu != cpu) { + per_cpu_ptr(&per_cpu_timerlat_var, cpu)->uthread_migrate = 1; + osnoise_taint("timerlat user-thread migrated\n"); + osnoise_stop_exception("timerlat user-thread migrated", cpu); + } +} + +static int register_migration_monitor(void) +{ + int ret = 0; + + /* + * Timerlat thread migration check is only required when running timerlat in user-space. + * Thus, enable callback only if timerlat is set with no workload. + */ + if (timerlat_enabled() && !test_bit(OSN_WORKLOAD, &osnoise_options)) + ret = register_trace_sched_migrate_task(trace_sched_migrate_callback, NULL); + + return ret; +} + +static void unregister_migration_monitor(void) +{ + if (timerlat_enabled() && !test_bit(OSN_WORKLOAD, &osnoise_options)) + unregister_trace_sched_migrate_task(trace_sched_migrate_callback, NULL); +} +#else +static int register_migration_monitor(void) +{ + return 0; +} +static void unregister_migration_monitor(void) {} +#endif /* * trace_sched_switch - sched:sched_switch trace event handler * @@ -1204,7 +1278,7 @@ trace_sched_switch_callback(void *data, bool preempt, } /* - * hook_thread_events - Hook the insturmentation for thread noise + * hook_thread_events - Hook the instrumentation for thread noise * * Hook the osnoise tracer callbacks to handle the noise from other * threads on the necessary kernel events. @@ -1217,11 +1291,19 @@ static int hook_thread_events(void) if (ret) return -EINVAL; + ret = register_migration_monitor(); + if (ret) + goto out_unreg; + return 0; + +out_unreg: + unregister_trace_sched_switch(trace_sched_switch_callback, NULL); + return -EINVAL; } /* - * unhook_thread_events - *nhook the insturmentation for thread noise + * unhook_thread_events - unhook the instrumentation for thread noise * * Unook the osnoise tracer callbacks to handle the noise from other * threads on the necessary kernel events. @@ -1229,6 +1311,7 @@ static int hook_thread_events(void) static void unhook_thread_events(void) { unregister_trace_sched_switch(trace_sched_switch_callback, NULL); + unregister_migration_monitor(); } /* @@ -1864,10 +1947,24 @@ static void stop_kthread(unsigned int cpu) kthread = per_cpu(per_cpu_osnoise_var, cpu).kthread; if (kthread) { - kthread_stop(kthread); + if (test_bit(OSN_WORKLOAD, &osnoise_options)) { + kthread_stop(kthread); + } else { + /* + * This is a user thread waiting on the timerlat_fd. We need + * to close all users, and the best way to guarantee this is + * by killing the thread. NOTE: this is a purpose specific file. + */ + kill_pid(kthread->thread_pid, SIGKILL, 1); + put_task_struct(kthread); + } per_cpu(per_cpu_osnoise_var, cpu).kthread = NULL; } else { + /* if no workload, just return */ if (!test_bit(OSN_WORKLOAD, &osnoise_options)) { + /* + * This is set in the osnoise tracer case. + */ per_cpu(per_cpu_osnoise_var, cpu).sampling = false; barrier(); return; @@ -1912,7 +2009,6 @@ static int start_kthread(unsigned int cpu) barrier(); return 0; } - snprintf(comm, 24, "osnoise/%d", cpu); } @@ -1941,6 +2037,11 @@ static int start_per_cpu_kthreads(void) int retval = 0; int cpu; + if (!test_bit(OSN_WORKLOAD, &osnoise_options)) { + if (timerlat_enabled()) + return 0; + } + cpus_read_lock(); /* * Run only on online CPUs in which osnoise is allowed to run. @@ -2281,6 +2382,223 @@ err_free: return err; } +#ifdef CONFIG_TIMERLAT_TRACER +static int timerlat_fd_open(struct inode *inode, struct file *file) +{ + struct osnoise_variables *osn_var; + struct timerlat_variables *tlat; + long cpu = (long) inode->i_cdev; + + mutex_lock(&interface_lock); + + /* + * This file is accessible only if timerlat is enabled, and + * NO_OSNOISE_WORKLOAD is set. + */ + if (!timerlat_enabled() || test_bit(OSN_WORKLOAD, &osnoise_options)) { + mutex_unlock(&interface_lock); + return -EINVAL; + } + + migrate_disable(); + + osn_var = this_cpu_osn_var(); + + /* + * The osn_var->pid holds the single access to this file. + */ + if (osn_var->pid) { + mutex_unlock(&interface_lock); + migrate_enable(); + return -EBUSY; + } + + /* + * timerlat tracer is a per-cpu tracer. Check if the user-space too + * is pinned to a single CPU. The tracer laters monitor if the task + * migrates and then disables tracer if it does. However, it is + * worth doing this basic acceptance test to avoid obviusly wrong + * setup. + */ + if (current->nr_cpus_allowed > 1 || cpu != smp_processor_id()) { + mutex_unlock(&interface_lock); + migrate_enable(); + return -EPERM; + } + + /* + * From now on, it is good to go. + */ + file->private_data = inode->i_cdev; + + get_task_struct(current); + + osn_var->kthread = current; + osn_var->pid = current->pid; + + /* + * Setup is done. + */ + mutex_unlock(&interface_lock); + + tlat = this_cpu_tmr_var(); + tlat->count = 0; + + migrate_enable(); + return 0; +}; + +/* + * timerlat_fd_read - Read function for "timerlat_fd" file + * @file: The active open file structure + * @ubuf: The userspace provided buffer to read value into + * @cnt: The maximum number of bytes to read + * @ppos: The current "file" position + * + * Prints 1 on timerlat, the number of interferences on osnoise, -1 on error. + */ +static ssize_t +timerlat_fd_read(struct file *file, char __user *ubuf, size_t count, + loff_t *ppos) +{ + long cpu = (long) file->private_data; + struct osnoise_variables *osn_var; + struct timerlat_variables *tlat; + struct timerlat_sample s; + s64 diff; + u64 now; + + migrate_disable(); + + tlat = this_cpu_tmr_var(); + + /* + * While in user-space, the thread is migratable. There is nothing + * we can do about it. + * So, if the thread is running on another CPU, stop the machinery. + */ + if (cpu == smp_processor_id()) { + if (tlat->uthread_migrate) { + migrate_enable(); + return -EINVAL; + } + } else { + per_cpu_ptr(&per_cpu_timerlat_var, cpu)->uthread_migrate = 1; + osnoise_taint("timerlat user thread migrate\n"); + osnoise_stop_tracing(); + migrate_enable(); + return -EINVAL; + } + + osn_var = this_cpu_osn_var(); + + /* + * The timerlat in user-space runs in a different order: + * the read() starts from the execution of the previous occurrence, + * sleeping for the next occurrence. + * + * So, skip if we are entering on read() before the first wakeup + * from timerlat IRQ: + */ + if (likely(osn_var->sampling)) { + now = ktime_to_ns(hrtimer_cb_get_time(&tlat->timer)); + diff = now - tlat->abs_period; + + /* + * it was not a timer firing, but some other signal? + */ + if (diff < 0) + goto out; + + s.seqnum = tlat->count; + s.timer_latency = diff; + s.context = THREAD_URET; + + trace_timerlat_sample(&s); + + notify_new_max_latency(diff); + + tlat->tracing_thread = false; + if (osnoise_data.stop_tracing_total) + if (time_to_us(diff) >= osnoise_data.stop_tracing_total) + osnoise_stop_tracing(); + } else { + tlat->tracing_thread = false; + tlat->kthread = current; + + hrtimer_init(&tlat->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED_HARD); + tlat->timer.function = timerlat_irq; + + /* Annotate now to drift new period */ + tlat->abs_period = hrtimer_cb_get_time(&tlat->timer); + + osn_var->sampling = 1; + } + + /* wait for the next period */ + wait_next_period(tlat); + + /* This is the wakeup from this cycle */ + now = ktime_to_ns(hrtimer_cb_get_time(&tlat->timer)); + diff = now - tlat->abs_period; + + /* + * it was not a timer firing, but some other signal? + */ + if (diff < 0) + goto out; + + s.seqnum = tlat->count; + s.timer_latency = diff; + s.context = THREAD_CONTEXT; + + trace_timerlat_sample(&s); + + if (osnoise_data.stop_tracing_total) { + if (time_to_us(diff) >= osnoise_data.stop_tracing_total) { + timerlat_dump_stack(time_to_us(diff)); + notify_new_max_latency(diff); + osnoise_stop_tracing(); + } + } + +out: + migrate_enable(); + return 0; +} + +static int timerlat_fd_release(struct inode *inode, struct file *file) +{ + struct osnoise_variables *osn_var; + struct timerlat_variables *tlat_var; + long cpu = (long) file->private_data; + + migrate_disable(); + mutex_lock(&interface_lock); + + osn_var = per_cpu_ptr(&per_cpu_osnoise_var, cpu); + tlat_var = per_cpu_ptr(&per_cpu_timerlat_var, cpu); + + hrtimer_cancel(&tlat_var->timer); + memset(tlat_var, 0, sizeof(*tlat_var)); + + osn_var->sampling = 0; + osn_var->pid = 0; + + /* + * We are leaving, not being stopped... see stop_kthread(); + */ + if (osn_var->kthread) { + put_task_struct(osn_var->kthread); + osn_var->kthread = NULL; + } + + mutex_unlock(&interface_lock); + migrate_enable(); + return 0; +} +#endif + /* * osnoise/runtime_us: cannot be greater than the period. */ @@ -2344,6 +2662,13 @@ static struct trace_min_max_param timerlat_period = { .max = &timerlat_max_period, .min = &timerlat_min_period, }; + +static const struct file_operations timerlat_fd_fops = { + .open = timerlat_fd_open, + .read = timerlat_fd_read, + .release = timerlat_fd_release, + .llseek = generic_file_llseek, +}; #endif static const struct file_operations cpus_fops = { @@ -2381,18 +2706,63 @@ static int init_timerlat_stack_tracefs(struct dentry *top_dir) } #endif /* CONFIG_STACKTRACE */ +static int osnoise_create_cpu_timerlat_fd(struct dentry *top_dir) +{ + struct dentry *timerlat_fd; + struct dentry *per_cpu; + struct dentry *cpu_dir; + char cpu_str[30]; /* see trace.c: tracing_init_tracefs_percpu() */ + long cpu; + + /* + * Why not using tracing instance per_cpu/ dir? + * + * Because osnoise/timerlat have a single workload, having + * multiple files like these are wast of memory. + */ + per_cpu = tracefs_create_dir("per_cpu", top_dir); + if (!per_cpu) + return -ENOMEM; + + for_each_possible_cpu(cpu) { + snprintf(cpu_str, 30, "cpu%ld", cpu); + cpu_dir = tracefs_create_dir(cpu_str, per_cpu); + if (!cpu_dir) + goto out_clean; + + timerlat_fd = trace_create_file("timerlat_fd", TRACE_MODE_READ, + cpu_dir, NULL, &timerlat_fd_fops); + if (!timerlat_fd) + goto out_clean; + + /* Record the CPU */ + d_inode(timerlat_fd)->i_cdev = (void *)(cpu); + } + + return 0; + +out_clean: + tracefs_remove(per_cpu); + return -ENOMEM; +} + /* * init_timerlat_tracefs - A function to initialize the timerlat interface files */ static int init_timerlat_tracefs(struct dentry *top_dir) { struct dentry *tmp; + int retval; tmp = tracefs_create_file("timerlat_period_us", TRACE_MODE_WRITE, top_dir, &timerlat_period, &trace_min_max_fops); if (!tmp) return -ENOMEM; + retval = osnoise_create_cpu_timerlat_fd(top_dir); + if (retval) + return retval; + return init_timerlat_stack_tracefs(top_dir); } #else /* CONFIG_TIMERLAT_TRACER */ diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index 15f05faaae44..9f10c0071c4f 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -1446,6 +1446,8 @@ static struct trace_event trace_osnoise_event = { }; /* TRACE_TIMERLAT */ + +static char *timerlat_lat_context[] = {"irq", "thread", "user-ret"}; static enum print_line_t trace_timerlat_print(struct trace_iterator *iter, int flags, struct trace_event *event) @@ -1458,7 +1460,7 @@ trace_timerlat_print(struct trace_iterator *iter, int flags, trace_seq_printf(s, "#%-5u context %6s timer_latency %9llu ns\n", field->seqnum, - field->context ? "thread" : "irq", + timerlat_lat_context[field->context], field->timer_latency); return trace_handle_return(s); -- cgit v1.2.3 From fc30ace06f250f79381a8e3f6ed92dd68e25a9f5 Mon Sep 17 00:00:00 2001 From: Donglin Peng Date: Fri, 23 Jun 2023 15:17:28 +0800 Subject: tracing: Fix warnings when building htmldocs for function graph retval When building htmldocs, the following warnings appear: Documentation/trace/ftrace.rst:2797: WARNING: Literal block expected; none found. Documentation/trace/ftrace.rst:2816: WARNING: Literal block expected; none found. So fix it. Link: https://lore.kernel.org/all/20230623143517.19ffc6c0@canb.auug.org.au/ Link: https://lkml.kernel.org/r/20230623071728.25688-1-pengdonglin@sangfor.com.cn Fixes: 21c094d3f8a6 ("tracing: Add documentation for funcgraph-retval and funcgraph-retval-hex") Signed-off-by: Donglin Peng Acked-by: Masami Hiramatsu (Google) Reported-by: Stephen Rothwell Signed-off-by: Steven Rostedt (Google) --- Documentation/trace/ftrace.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/trace') diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst index b7308ab10c0e..f606c5bd1c0d 100644 --- a/Documentation/trace/ftrace.rst +++ b/Documentation/trace/ftrace.rst @@ -2792,7 +2792,7 @@ option, and these limitations will be eliminated in the future: especially when larger types are truncated, whether explicitly or implicitly. Here are some specific cases to illustrate this point: - **Case One**:: + **Case One**: The function narrow_to_u8 is defined as follows:: @@ -2811,7 +2811,7 @@ option, and these limitations will be eliminated in the future: If you pass 0x123456789abcdef to this function and want to narrow it, it may be recorded as 0x123456789abcdef instead of 0xef. - **Case Two**:: + **Case Two**: The function error_if_not_4g_aligned is defined as follows:: -- cgit v1.2.3