path: root/tools/perf/arch
2021-07-05  tools headers UAPI: Sync files changed by the quotactl_fd new syscall  (Arnaldo Carvalho de Melo, 4 files, -4/+4)

To pick the changes in these csets:

    64c2c2c62f92339b ("quota: Change quotactl_path() systcall to an fd-based one")
    65ffb3d69ed3da28 ("quota: Wire up quotactl_fd syscall")

That silences these perf build warnings and adds support for those new syscalls in tools such as 'perf trace'. For instance, this is now possible:

    # perf trace -v -e quota*
    event qualifier tracepoint filter: (common_pid != 158365 && common_pid != 2512) && (id == 179 || id == 443)
    ^C#

That is the filter expression attached to the raw_syscalls:sys_{enter,exit} tracepoints.

    $ grep quota tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
    179  common  quotactl     sys_quotactl
    443  common  quotactl_fd  sys_quotactl_fd
    $

This addresses these perf build warnings:

    Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
      diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
    Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
      diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
    Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'
      diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
    Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl'
      diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
    Warning: Kernel ABI header at 'tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl' differs from latest version at 'arch/mips/kernel/syscalls/syscall_n64.tbl'
      diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl

Cc: Jan Kara <jack@suse.cz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-01  perf cs-etm: Remove callback cs_etm_find_snapshot()  (Leo Yan, 1 file, -133/+0)

The callback cs_etm_find_snapshot() is invoked in snapshot mode; its main purpose is to find the correct AUX trace data and return "head" and "old" (the old head) to the caller, __auxtrace_mmap__read(), which uses these two pointers to decide the AUX trace data size.

This patch removes cs_etm_find_snapshot() for the following reasons:

- The first thing cs_etm_find_snapshot() does is check whether the head has wrapped around; if it has not, it directly bails out. This check is pointless, because the "head" and "old" pointers are both monotonically increasing, so they never wrap around.

- cs_etm_find_snapshot() adjusts the "head" and "old" pointers on the assumption that the AUX ring buffer is fully filled with hardware trace data, so it always subtracts "mm->len" from "head" to get "old". If the snapshot is taken over a very short interval, the tracers only fill a small chunk of trace data into the AUX ring buffer; in that case it is wrong to copy the whole AUX ring buffer into the perf file.

- Since the "head" and "old" pointers are monotonically increasing, __auxtrace_mmap__read() already handles them properly: it calculates the remainders for the two pointers and clamps the size to never exceed "snapshot_size". We can simply rely on __auxtrace_mmap__read() to calculate the correct result for the data copy; an Arm CoreSight specific callback is not necessary.

Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Daniel Kiss <daniel.kiss@arm.com> Cc: Denis Nikitin <denik@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: coresight@lists.linaro.org Link: http://lore.kernel.org/lkml/20210701093537.90759-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
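As a rough sketch of the generic calculation this relies on (variable names are assumed; this is not the actual __auxtrace_mmap__read() code):

    /* Sketch: derive the amount of AUX data to copy from monotonically
     * increasing head/old positions, clamped to the snapshot size. */
    u64 head = mm->head;                /* assumed: current write position */
    u64 old  = mm->old;                 /* assumed: position already copied */
    u64 size = head - old;              /* bytes produced since last read */

    if (size > snapshot_size)           /* never copy more than one snapshot */
            size = snapshot_size;

    /* remainders: turn the monotonic positions into ring-buffer offsets */
    u64 start = (head - size) & (mm->len - 1);
    u64 end   = head & (mm->len - 1);
    /* copy [start, end), wrapping once at the buffer end if start >= end */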
2021-06-01  perf tools: Support pmu prefix for mem-store event  (Jin Yao, 1 file, -1/+11)
For enabling mem-store event, it doesn't need an auxiliary event. So just build an event name string with the pmu prefix. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210527001610.10553-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-06-01  perf tools: Support pmu prefix for mem-load event  (Jin Yao, 3 files, -9/+30)

perf_mem_events__name() generates the mem-load event name. It uses a variable 'mem_loads_name__init' to avoid generating the event name every time (because perf_pmu__scan() takes some time).

perf_mem_events__name() assumes the pmu is "cpu", but that's not correct for hybrid platforms. For Alderlake, the pmu is "cpu_core" or "cpu_atom".

Introduce a new parameter 'pmu_name' in perf_mem_events__name() to let the caller specify a pmu name. Since such an event name is x86 specific, move perf_mem_events[] to arch/x86/util/mem-events.c.

We still keep the variable 'mem_loads_name__init', but it's only used when pmu_name is NULL (compatible with the original behavior). When pmu_name is not NULL (e.g. "cpu_core"), this patch doesn't have that optimization; it can be implemented in a follow-up patch.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210527001610.10553-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
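A hedged sketch of the idea (the helper and format string here are illustrative, not necessarily the exact code in arch/x86/util/mem-events.c):

    /* Sketch: build the mem-loads event name with a caller-supplied PMU
     * prefix; "cpu" is assumed when no hybrid PMU name is given. */
    static char mem_loads_name[100];

    static const char *mem_load_event_name(const char *pmu_name, unsigned int ldlat)
    {
            if (!pmu_name)
                    pmu_name = "cpu";        /* original, non-hybrid behavior */
            scnprintf(mem_loads_name, sizeof(mem_loads_name),
                      "%s/mem-loads,ldlat=%u/P", pmu_name, ldlat);
            return mem_loads_name;
    }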
2021-06-01  perf tools: Check mem-loads auxiliary event  (Jin Yao, 1 file, -2/+7)

For some platforms, an auxiliary event has to be enabled simultaneously with the load latency event. For Alderlake, the auxiliary event is created in the "cpu_core" pmu. So first check the existence of the "cpu_core" pmu, and then check whether this pmu has the auxiliary event.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210527001610.10553-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-25  perf arm-spe: Remove redundant checking for "full_auxtrace"  (Leo Yan, 1 file, -1/+1)
The option "opts->full_auxtrace" is checked at the earlier place, if it is false the function will directly bail out. So remove the redundant checking for "opts->full_auxtrace". Suggested-by: James Clark <james.clark@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20210519041546.1574961-5-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-25  perf arm-spe: Enable timestamp for per-cpu mode  (Leo Yan, 1 file, -2/+31)
For per-cpu mmap, it should enable timestamp tracing for Arm SPE; this is helpful for samples correlation. To automatically enable the timestamp, a helper arm_spe_set_timestamp() is introduced for setting "ts_enable" format bit. Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210519041546.1574961-4-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
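A minimal sketch of setting such a format bit on the event config (helper names follow the perf PMU API, but the exact call used by the patch is assumed):

    /* Sketch: set the Arm SPE "ts_enable" format bit for per-cpu mmaps.
     * perf_pmu__format_bits() is assumed to return the config mask that
     * the PMU's sysfs format entry assigns to "ts_enable". */
    static void arm_spe_set_timestamp_sketch(struct perf_pmu *spe_pmu,
                                             struct evsel *evsel)
    {
            u64 bit = perf_pmu__format_bits(&spe_pmu->format, "ts_enable");

            evsel->core.attr.config |= bit;
    }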
2021-05-25  perf arm-spe: Correct sample flags for dummy event  (Leo Yan, 1 file, -3/+4)

The dummy event is mainly used for mmap; the TIME sample is only needed in the per-cpu case so that the perf tool can rely on correct timing for parsing symbols, and the CPU sample is useless for mmap. The BRANCH_STACK sample bit is always reset for the dummy event in evsel__config(), so there is no need to reset it again in the Arm SPE specific code.

So this patch only enables the TIME sample for per-cpu mmap.

Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20210519041546.1574961-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-25  perf arm-spe: Correct sample flags for SPE event  (Leo Yan, 1 file, -3/+4)

Currently the sample flags for CPU, TIME and TID are hard coded for the SPE event, which is pointless. The CPU sample is useful only for the per-cpu mmap case, where it indicates which CPU the AUX trace is associated with. The TIME sample is not needed for the AUX event, since the time of the AUX event is not really used and is a different thing from the timestamp in the Arm SPE trace (timestamp tracing is controlled by Arm SPE's config bit). The TID sample is not useful for the AUX event.

This patch corrects the sample flags for the SPE event: it only sets the CPU sample bit for the per-cpu mmap case.

Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: James Clark <james.clark@arm.com> Tested-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Al Grant <Al.Grant@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20210519041546.1574961-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
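Illustrating the resulting policy (a sketch with assumed helpers, not the patch itself):

    /* Sketch: request PERF_SAMPLE_CPU on the SPE event only for per-cpu
     * mmaps; leave TIME and TID alone for the AUX event. */
    if (!perf_cpu_map__empty(cpus))          /* per-cpu mmap, not per-thread */
            evsel__set_sample_bit(arm_spe_evsel, CPU);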
2021-05-25  Merge remote-tracking branch 'torvalds/master' into perf/core  (Arnaldo Carvalho de Melo, 4 files, -4/+4)
To pick up fixes from perf/urgent. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-21  perf tests: Drop __maybe_unused on x86 test declarations  (Rob Herring, 1 file, -3/+2)
Function declarations don't need __maybe_unused annotations, only the implementations do. Drop them on the perf x86 tests. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: masayoshi mizuma <msys.mizuma@gmail.com> Link: http://lore.kernel.org/lkml/20210513174614.2242210-2-robh@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-21  perf tests: Consolidate test__arch_unwind_sample declaration  (Rob Herring, 6 files, -26/+0)
There's no reason for making the test__arch_unwind_sample declaration per arch. Currently that's done 2 different ways either with a declaration in arch-tests.h or with an arch define. Unify all this with an unconditional declaration in tests.h. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: masayoshi mizuma <msys.mizuma@gmail.com> Link: http://lore.kernel.org/lkml/20210513174614.2242210-1-robh@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-21  tools headers UAPI: Sync files changed by the quotactl_path unwiring  (Arnaldo Carvalho de Melo, 4 files, -4/+4)

To pick the changes in this cset:

    5b9fedb31e476693 ("quota: Disable quotactl_path syscall")

That silences these perf build warnings:

    Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
      diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
    Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
      diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
    Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'
      diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
    Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl'
      diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
    Warning: Kernel ABI header at 'tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl' differs from latest version at 'arch/mips/kernel/syscalls/syscall_n64.tbl'
      diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl

Cc: Jan Kara <jack@suse.cz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12  perf x86 kvm-stat: Support to analyze kvm MSR  (Lei Zhao, 1 file, -0/+46)

usage:

- kvm stat: run a command and gather performance counter statistics
- show the result:

    perf kvm stat report --event=msr

See the msr events:

    Analyze events for all VMs, all VCPUs:

        MSR Access    Samples  Samples%    Time%   Min Time    Max Time   Avg time

           0x6e0:W      67007    98.17%   98.31%     0.59us     10.69us     0.90us ( +-   0.10% )
           0x830:W       1186     1.74%    1.60%     0.53us    108.34us     0.82us ( +-  11.02% )
            0x3b:R         66     0.10%    0.09%     0.56us      1.26us     0.80us ( +-   3.24% )

    Total Samples:68259, Total events handled time:61150.95us.

Signed-off-by: Lei Zhao <zhaolei27@baidu.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1618470001-7239-1-git-send-email-lirongqing@baidu.com Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-10  tools headers UAPI: Sync files changed by landlock, quotactl_path and mount_setattr new syscalls  (Arnaldo Carvalho de Melo, 4 files, -0/+17)

To pick the changes in these csets:

    a49f4f81cb48925e ("arch: Wire up Landlock syscalls")
    2a1867219c7b27f9 ("fs: add mount_setattr()")
    fa8b90070a80bb1a ("quota: wire up quotactl_path")

That silences these perf build warnings and adds support for those new syscalls in tools such as 'perf trace'. For instance, this is now possible:

    # ~acme/bin/perf trace -v -e landlock*
    event qualifier tracepoint filter: (common_pid != 129365 && common_pid != 3502) && (id == 444 || id == 445 || id == 446)
    ^C#

That is the filter expression attached to the raw_syscalls:sys_{enter,exit} tracepoints.

    $ grep landlock tools/perf/arch/x86/entry/syscalls/syscall_64.tbl
    444  common  landlock_create_ruleset  sys_landlock_create_ruleset
    445  common  landlock_add_rule        sys_landlock_add_rule
    446  common  landlock_restrict_self   sys_landlock_restrict_self
    $

This addresses these perf build warnings:

    Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
      diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
    Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
      diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
    Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'
      diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
    Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl'
      diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
    Warning: Kernel ABI header at 'tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl' differs from latest version at 'arch/mips/kernel/syscalls/syscall_n64.tbl'
      diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl

Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: James Morris <jamorris@linux.microsoft.com> Cc: Jan Kara <jack@suse.cz> Cc: Mickaël Salaün <mic@linux.microsoft.com> Cc: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-10  perf tools: Fix a build error on arm64 with clang  (Masami Hiramatsu, 1 file, -1/+1)

Since clang's -Wmissing-field-initializers warns if a data structure is initialized with a single NULL, as below:

    ----
    tools/perf $ make CC=clang LLVM=1
    ...
    arch/arm64/util/kvm-stat.c:74:9: error: missing field 'ops' initializer [-Werror,-Wmissing-field-initializers]
            { NULL },
                   ^
    1 error generated.
    ----

add the other field initializer explicitly, the same as in the other arches' kvm-stat.c code.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Link: http://lore.kernel.org/lkml/162037767540.94840.15758657049033010518.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-01  Merge tag 'perf-tools-for-v5.13-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux  (Linus Torvalds, 25 files, -32/+1228)

Pull perf tool updates from Arnaldo Carvalho de Melo:

 "perf stat:
  - Add support for hybrid PMUs to support systems such as Intel Alderlake and its BIG/little core/atom cpus.
  - Introduce 'bperf' to share hardware PMCs with BPF.
  - New --iostat option to collect and present IO stats on Intel hardware. This functionality is based on recently introduced sysfs attributes for Intel® Xeon® Scalable processor family (code name Skylake-SP) in commit bb42b3d39781 ("perf/x86/intel/uncore: Expose an Uncore unit to IIO PMON mapping"). It is intended to provide four I/O performance metrics in MB per each PCIe root port:
      - Inbound Read: I/O devices below root port read from the host memory
      - Inbound Write: I/O devices below root port write to the host memory
      - Outbound Read: CPU reads from I/O devices below root port
      - Outbound Write: CPU writes to I/O devices below root port
  - Align CSV output for summary.
  - Clarify --null use cases: Assess raw overhead of 'perf stat' or measure just wall clock time.
  - Improve readability of shadow stats.

 perf record:
  - Change the COMM when starting the workload so that --exclude-perf doesn't seem to be not honoured.
  - Improve 'Workload failed' message printing events + what was exec'ed.
  - Fix cross-arch support for TIME_CONV.

 perf report:
  - Add option to disable raw event ordering.
  - Dump the contents of PERF_RECORD_TIME_CONV in 'perf report -D'.
  - Improvements to --stat output, that shows information about PERF_RECORD_ events.
  - Preserve identifier id in OCaml demangler.

 perf annotate:
  - Show full source location with 'l' hotkey in the 'perf annotate' TUI.
  - Add line number like in TUI and source location at EOL to the 'perf annotate' --stdio mode.
  - Add --demangle and --demangle-kernel to 'perf annotate'.
  - Allow configuring annotate.demangle{,_kernel} in 'perf config'.
  - Fix sample events lost in stdio mode.

 perf data:
  - Allow converting a perf.data file to JSON.

 libperf:
  - Add support for user space counter access.
  - Update topdown documentation to permit rdpmc calls.

 perf test:
  - Add 'perf test' for 'perf stat' CSV output.
  - Add 'perf test' entries to test the hybrid PMU support.
  - Cleanup 'perf test daemon' if its 'perf test' is interrupted.
  - Handle metric reuse in pmu-events parsing 'perf test' entry.
  - Add test for PE executable support.
  - Add timeout for wait for daemon start in its 'perf test' entries.

 Build:
  - Enable libtraceevent dynamic linking.
  - Improve feature detection output.
  - Fix caching of feature checks.
  - First round of updates for tools copies of kernel headers.
  - Enable warnings when compiling BPF programs.

 Vendor specific events:
  - Intel:
      - Add missing skylake & icelake model numbers.
  - arm64:
      - Add Hisi hip08 L1, L2 and L3 metrics.
      - Add Fujitsu A64FX PMU events.
  - PowerPC:
      - Initial JSON/events list for power10 platform.
      - Remove unsupported power9 metrics.
  - AMD:
      - Add Zen3 events.
      - Fix broken L2 Cache Hits from L2 HWPF metric.
      - Use lowercases for all the eventcodes and umasks.

 Hardware tracing:
  - arm64:
      - Update CoreSight ETM metadata format.
      - Fix bitmap for CS-ETM option.
      - Support PID tracing in config.
      - Detect pid in VMID for kernel running at EL2.

 Arch specific updates:
  - MIPS:
      - Support MIPS unwinding and dwarf-regs.
      - Generate mips syscalls_n64.c syscall table.
  - PowerPC:
      - Add support for PERF_SAMPLE_WEIGHT_STRUCT on PowerPC.
      - Support pipeline stage cycles for powerpc.

 libbeauty:
  - Fix fsconfig generator"

* tag 'perf-tools-for-v5.13-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (132 commits)
  perf build: Defer printing detected features to the end of all feature checks
  tools build: Allow deferring printing the results of feature detection
  perf build: Regenerate the FEATURE_DUMP file after extra feature checks
  perf session: Dump PERF_RECORD_TIME_CONV event
  perf session: Add swap operation for event TIME_CONV
  perf jit: Let convert_timestamp() to be backwards-compatible
  perf tools: Change fields type in perf_record_time_conv
  perf tools: Enable libtraceevent dynamic linking
  perf Documentation: Document intel-hybrid support
  perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid
  perf tests: Support 'Convert perf time to TSC' test for hybrid
  perf tests: Support 'Session topology' test for hybrid
  perf tests: Support 'Parse and process metrics' test for hybrid
  perf tests: Support 'Track with sched_switch' test for hybrid
  perf tests: Skip 'Setup struct perf_event_attr' test for hybrid
  perf tests: Add hybrid cases for 'Roundtrip evsel->name' test
  perf tests: Add hybrid cases for 'Parse event definition strings' test
  perf record: Uniquify hybrid event name
  perf stat: Warn group events from different hybrid PMU
  perf stat: Filter out unmatched aggregation for hybrid event
  ...
2021-04-28  Merge tag 'perf-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 1 file, -0/+6)

Pull perf event updates from Ingo Molnar:

 - Improve Intel uncore PMU support:
     - Parse uncore 'discovery tables' - a new hardware capability enumeration method introduced on the latest Intel platforms. This table is in a well-defined PCI namespace location and is read via MMIO. It is organized in an rbtree. These uncore tables will allow the discovery of standard counter blocks, but fancier counters still need to be enumerated explicitly.
     - Add Alder Lake support
     - Improve IIO stacks to PMON mapping support on Skylake servers

 - Add Intel Alder Lake PMU support - which requires the introduction of 'hybrid' CPUs and PMUs. Alder Lake is a mix of Golden Cove ('big') and Gracemont ('small' - Atom derived) cores. The CPU-side feature set is entirely symmetrical - but on the PMU side there's core type dependent PMU functionality.

 - Reduce data loss with CPU level hardware tracing on Intel PT / AUX profiling, by fixing the AUX allocation watermark logic.

 - Improve ring buffer allocation on NUMA systems

 - Put 'struct perf_event' into their separate kmem_cache pool

 - Add support for synchronous signals for select perf events. The immediate motivation is to support low-overhead sampling-based race detection for user-space code. The feature consists of the following main changes:
     - Add thread-only event inheritance via perf_event_attr::inherit_thread, which limits inheritance of events to CLONE_THREAD.
     - Add the ability for events to not leak through exec(), via perf_event_attr::remove_on_exec.
     - Allow the generation of SIGTRAP via perf_event_attr::sigtrap, extend siginfo with an u64 ::si_perf, and add the breakpoint information to ::si_addr and ::si_perf if the event is PERF_TYPE_BREAKPOINT. The siginfo support is adequate for breakpoints right now - but the new field can be used to introduce support for other types of metadata passed over siginfo as well.

 - Misc fixes, cleanups and smaller updates.

* tag 'perf-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  signal, perf: Add missing TRAP_PERF case in siginfo_layout()
  signal, perf: Fix siginfo_t by avoiding u64 on 32-bit architectures
  perf/x86: Allow for 8<num_fixed_counters<16
  perf/x86/rapl: Add support for Intel Alder Lake
  perf/x86/cstate: Add Alder Lake CPU support
  perf/x86/msr: Add Alder Lake CPU support
  perf/x86/intel/uncore: Add Alder Lake support
  perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
  perf/x86/intel: Add Alder Lake Hybrid support
  perf/x86: Support filter_match callback
  perf/x86/intel: Add attr_update for Hybrid PMUs
  perf/x86: Add structures for the attributes of Hybrid PMUs
  perf/x86: Register hybrid PMUs
  perf/x86: Factor out x86_pmu_show_pmu_cap
  perf/x86: Remove temporary pmu assignment in event_init
  perf/x86/intel: Factor out intel_pmu_check_extra_regs
  perf/x86/intel: Factor out intel_pmu_check_event_constraints
  perf/x86/intel: Factor out intel_pmu_check_num_counters
  perf/x86: Hybrid PMU support for extra_regs
  perf/x86: Hybrid PMU support for event constraints
  ...
2021-04-20  perf arm64: Fix off-by-one directory paths.  (Ian Rogers, 3 files, -6/+6)
Relative path include works in the regular build due to -I paths but may break in other situations. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lore.kernel.org/lkml/20210416214113.552252-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20  perf stat: Enable iostat mode for x86 platforms  (Alexander Antonov, 2 files, -0/+361)

This functionality is based on recently introduced sysfs attributes for the Intel® Xeon® Scalable processor family (code name Skylake-SP):

    Commit bb42b3d39781d7fc ("perf/x86/intel/uncore: Expose an Uncore unit to IIO PMON mapping")

The mode is intended to provide four I/O performance metrics in MB per each PCIe root port:

- Inbound Read: I/O devices below root port read from the host memory
- Inbound Write: I/O devices below root port write to the host memory
- Outbound Read: CPU reads from I/O devices below root port
- Outbound Write: CPU writes to I/O devices below root port

Each metric requires only one uncore event which increments at every 4B transfer in the corresponding direction. The formula to compute the metrics is generic:

    #EventCount * 4B / (1024 * 1024)

Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey V Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210419094147.15909-4-alexander.antonov@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
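A tiny worked example of that formula (illustrative only):

    #include <stdint.h>

    /* Each uncore event increments once per 4-byte transfer, so: */
    static double iostat_megabytes(uint64_t event_count)
    {
            return (double)event_count * 4 / (1024 * 1024);   /* #EventCount * 4B / (1024 * 1024) */
    }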
2021-04-20  perf stat: Helper functions for PCIe root ports list in iostat mode  (Alexander Antonov, 1 file, -0/+110)
Introduce helper functions to control PCIe root ports list. These helpers will be used in the follow-up patch. Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey V Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210419094147.15909-3-alexander.antonov@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-16  perf intel-pt: Use aux_watermark  (Alexander Shishkin, 1 file, -0/+6)

Turns out, the default setting of attr.aux_watermark to half of the total buffer size is not very useful, especially with smaller buffers. The problem is that, after half of the buffer is filled up, the kernel updates ->aux_head and sets up the next "transaction", while observing that ->aux_tail is still zero (as userspace hasn't had the chance to update it), meaning that the trace will have to stop at the end of this second "transaction".

This means, for example, that the second PERF_RECORD_AUX in every trace comes with the TRUNCATED flag set.

Setting attr.aux_watermark to a quarter of the buffer gives enough space for the ->aux_tail update to be observed and prevents the data loss.

The obligatory before/after showcase:

    > # perf_before record -e intel_pt//u -m,8 uname
    > Linux
    > [ perf record: Woken up 6 times to write data ]
    > Warning:
    > AUX data lost 4 times out of 10!
    >
    > [ perf record: Captured and wrote 0.099 MB perf.data ]
    > # perf record -e intel_pt//u -m,8 uname
    > Linux
    > [ perf record: Woken up 4 times to write data ]
    > [ perf record: Captured and wrote 0.039 MB perf.data ]

The effect is still visible with large workloads and large buffers, although less pronounced.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210414154955.49603-3-alexander.shishkin@linux.intel.com
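A rough sketch of the recording-side idea (not necessarily the exact patch):

    /* Sketch: wake userspace after a quarter of the AUX buffer fills, so
     * ->aux_tail can be advanced before the next "transaction" starts.
     * aux_mmap_len is assumed to be the AUX buffer size in bytes. */
    static void intel_pt_set_aux_watermark(struct perf_event_attr *attr,
                                           size_t aux_mmap_len)
    {
            if (!attr->aux_watermark)
                    attr->aux_watermark = aux_mmap_len / 4;
    }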
2021-04-08  perf pmu: Add pmu_events_map__find() function to find the common PMU map for the system  (John Garry, 2 files, -0/+26)

Add a function to find the common PMU map for the system.

For arm64, a special variant is added. This is because arm64 supports heterogeneous CPU systems, so it cannot be guaranteed that the cpumap is the same for all CPUs. In the case of heterogeneous systems, don't return a cpumap.

Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Paul A. Clarke <pc@us.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/1617791570-165223-4-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
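A hedged sketch of what the arm64 variant could look like (helper names follow the perf pmu API, but this is illustrative, not the patch's exact code):

    /* Sketch: return a shared PMU events map only when the core PMU's
     * cpumap covers every CPU; on heterogeneous arm64 systems there is
     * no such common map, so return NULL. */
    static const struct pmu_events_map *arm64_common_map_sketch(void)
    {
            struct perf_pmu *pmu = NULL;

            while ((pmu = perf_pmu__scan(pmu)) != NULL) {
                    if (!is_pmu_core(pmu->name))
                            continue;
                    if (pmu->cpus && pmu->cpus->nr != cpu__max_cpu())
                            return NULL;        /* heterogeneous: no common map */
                    return perf_pmu__find_map(pmu);
            }
            return NULL;
    }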
2021-03-26  perf sort: Display sort dimension p_stage_cyc only on supported archs  (Athira Rajeev, 1 file, -0/+7)

The sort dimension "p_stage_cyc" is used to represent pipeline stage cycle information. Presently, this is used only on powerpc. For unsupported platforms, we don't want to display it in the perf report output columns. Hence add a check in sort_dimension__add() and skip the sort key in case it is not applicable for the particular arch.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: https://lore.kernel.org/r/1616425047-1666-6-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-26  perf tools: Support pipeline stage cycles for powerpc  (Athira Rajeev, 1 file, -2/+16)

The pipeline stage cycle details can be recorded on powerpc from the contents of Performance Monitor Unit (PMU) registers. On ISA v3.1 platforms, the sampling registers expose the cycles spent in different pipeline stages. This patch adds perf tools support to present two of these cycle counters along with memory latency (weight).

Re-use the field 'ins_lat' for storing the first pipeline stage cycle; this is stored in the 'var2_w' field of 'perf_sample_weight'. Add a new field 'p_stage_cyc' to store the second pipeline stage cycle, which is stored in the 'var3_w' field of perf_sample_weight.

Add a new sort function 'Pipeline Stage Cycle' and include it in default_mem_sort_order[]. This new sort function may be used to denote some other pipeline stage in another architecture, so add it to the list of sort entries that can have a dynamic header string.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: https://lore.kernel.org/r/1616425047-1666-5-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-26  perf powerpc: Add support for PERF_SAMPLE_WEIGHT_STRUCT  (Athira Rajeev, 3 files, -0/+42)
Add arch specific arch_evsel__set_sample_weight() to set the new sample type for powerpc. Add arch specific arch_perf_parse_sample_weight() to store the sample->weight values depending on the sample type applied. if the new sample type (PERF_SAMPLE_WEIGHT_STRUCT) is applied, store only the lower 32 bits to sample->weight. If sample type is 'PERF_SAMPLE_WEIGHT', store the full 64-bit to sample->weight. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: https://lore.kernel.org/r/1616425047-1666-4-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-23  perf tools: Fix various typos in comments  (Ingo Molnar, 7 files, -10/+10)
Fix ~124 single-word typos and a few spelling errors in the perf tooling code, accumulated over the years. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210321113734.GA248990@gmail.com Link: http://lore.kernel.org/lkml/20210323160915.GA61903@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-15  tools/perf: Convert to insn_decode()  (Borislav Petkov, 2 files, -9/+9)
Simplify code, no functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lkml.kernel.org/r/20210304174237.31945-20-bp@alien8.de
2021-03-08  Merge remote-tracking branch 'torvalds/master' into perf/core  (Arnaldo Carvalho de Melo, 11 files, -7/+142)
To pick up the fixes sent for v5.12 and continue development based on v5.12-rc2, i.e. without the swap on file bug. This also gets a slightly newer and better tools/perf/arch/arm/util/cs-etm.c patch version, using the BIT() macro, that had already been slated to v5.13 but ended up going to v5.12-rc1 on an older version. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06  perf cs-etm: Fix bitmap for option  (Suzuki K Poulose, 1 file, -4/+8)

When setting options with the macros ETM_OPT_CTXTID and ETM_OPT_TS, the code wrongly takes these two values (14 and 28 respectively) as bit masks, while actually both are bit offsets. This doesn't lead to further failure, because the AND logic operation is always true for ETM_OPT_CTXTID / ETM_OPT_TS.

This patch defines new independent macros (rather than using the "config" bits) for requesting the "contextid" and "timestamp" for cs_etm_set_option().

Signed-off-by: Suzuki Poulouse <suzuki.poulose@arm.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Cc: Al Grant <al.grant@arm.com> Cc: Daniel Kiss <daniel.kiss@arm.com> Cc: Denis Nikitin <denik@chromium.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-doc@vger.kernel.org Link: http://lore.kernel.org/lkml/20210206150833.42120-5-leo.yan@linaro.org [ Extract the change as a separate patch for easier review ] Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06  perf tests x86: Move insn.h include to make sure it finds stddef.h  (Arnaldo Carvalho de Melo, 2 files, -2/+2)

In some versions of Alpine Linux the perf build is broken since commit 1d509f2a6ebca1ae ("x86/insn: Support big endian cross-compiles"):

    In file included from /usr/include/linux/byteorder/little_endian.h:13,
                     from /usr/include/asm/byteorder.h:5,
                     from arch/x86/util/../../../../arch/x86/include/asm/insn.h:10,
                     from arch/x86/util/archinsn.c:2:
    /usr/include/linux/swab.h:161:8: error: unknown type name '__always_inline'
     static __always_inline __u16 __swab16p(const __u16 *p)

So move the inclusion of arch/x86/include/asm/insn.h to later in the places where linux/stddef.h (which conditionally defines __always_inline) is included, to work around this problem on Alpine Linux 3.9 to 3.11; 3.12 onwards works.

Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06  perf test: Support the ins_lat check in the X86 specific test  (Kan Liang, 4 files, -0/+127)

The ins_lat of PERF_SAMPLE_WEIGHT_STRUCT stands for the instruction latency, which is only available for X86. Add an X86 specific test for the ins_lat and PERF_SAMPLE_WEIGHT_STRUCT type.

The test__x86_sample_parsing() uses the same approach as test__sample_parsing() to verify a sample type. Since ins_lat and PERF_SAMPLE_WEIGHT_STRUCT are the only X86 specific sample types for now, test__x86_sample_parsing() only verifies the PERF_SAMPLE_WEIGHT_STRUCT type. Other sample types are still verified in the generic test.

    $ perf test 77 -v
    77: x86 Sample parsing                :
    --- start ---
    test child forked, pid 102370
    test child finished with 0
    ---- end ----
    x86 Sample parsing: Ok

Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/1614787285-104151-2-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06  tools headers: Update syscall.tbl files to support mount_setattr  (Arnaldo Carvalho de Melo, 3 files, -0/+3)

To pick the changes from:

    9caccd41541a6f7d ("fs: introduce MOUNT_ATTR_IDMAP")

This adds this new syscall to the tables used by tools such as 'perf trace', so that one can specify it by name and have it filtered, etc.

Addressing these perf build warnings:

    Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
      diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
    Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'
      diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
    Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl'
      diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl

Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/YD6Wsxr9ByUbab/a@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06  perf tools: Clean 'generated' directory used for creating the syscall table on x86  (Andreas Wendleder, 1 file, -5/+6)

Remove generated directory tools/perf/arch/x86/include/generated.

Signed-off-by: Andreas Wendleder <andreas.wendleder@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210301185642.163396-1-gonsolo@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06  perf arch powerpc: Sync powerpc syscall.tbl with the kernel sources  (Arnaldo Carvalho de Melo, 1 file, -15/+5)
To get the changes in: fbcee2ebe8edbb6a ("powerpc/32: Always save non volatile GPRs at syscall entry") That shouldn't cause any change in tooling, just silences the following tools/perf/ build warning: Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl' Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-02  perf cs-etm: Support PID tracing in config  (Suzuki K Poulose, 1 file, -12/+49)
If the kernel is running at EL2, the pid of a task is exposed via VMID instead of the CONTEXTID. Add support for this in the perf tool. This patch respects user setting if user has specified any configs from "contextid", "contextid1" or "contextid2"; otherwise, it dynamically sets config based on PMU format "contextid". Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Co-developed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Mike Leach <mike.leach@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Al Grant <al.grant@arm.com> Link: https://lore.kernel.org/r/20210213113220.292229-4-leo.yan@linaro.org Link: https://lore.kernel.org/r/20210224164835.3497311-5-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-02  perf cs-etm: Fix bitmap for option  (Suzuki K Poulose, 1 file, -4/+4)

When setting options with the macros ETM_OPT_CTXTID and ETM_OPT_TS, the code wrongly takes these two values (14 and 28 respectively) as bit masks, while actually both are bit offsets. This doesn't lead to further failure, because the AND logic operation is always true for ETM_OPT_CTXTID / ETM_OPT_TS.

This patch uses the BIT() macro for the option bits, so that the correct bitmaps are requested for "contextid" and "timestamp" when calling cs_etm_set_option().

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Mike Leach <mike.leach@linaro.org> Link: https://lore.kernel.org/r/20210213113220.292229-3-leo.yan@linaro.org Link: https://lore.kernel.org/r/20210224164835.3497311-4-mathieu.poirier@linaro.org [Extract the change as a separate patch for easier review] Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
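To illustrate the mask-versus-offset confusion (a sketch; the cs_etm_set_option() call shown is simplified):

    /* ETM_OPT_CTXTID and ETM_OPT_TS are bit *offsets* (14 and 28), not masks. */

    /* Wrong: uses the offset value itself as a mask, so the test is
     * effectively always true for these options. */
    if (config & ETM_OPT_CTXTID)
            cs_etm_set_option(evsel, ETM_OPT_CTXTID);

    /* Right: BIT() turns the offset into a single-bit mask. */
    if (config & BIT(ETM_OPT_CTXTID))
            cs_etm_set_option(evsel, BIT(ETM_OPT_CTXTID));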
2021-03-02  perf cs-etm: Update ETM metadata format  (Mike Leach, 1 file, -2/+5)
The current fixed metadata version format (version 0), means that adding metadata parameter items renders files from a previous version of perf unreadable. Per CPU parameters appear in a fixed order, but there is no field to indicate the number of ETM parameters per CPU. This patch updates the per CPU parameter blocks to include a NR_PARAMs value which indicates the number of parameters in the block. The header version is incremented to 1. Fixed ordering is retained, new ETM parameters are added to the end of the list. The reader code is updated to be able to read current version 0 files, For version 1, the reader will read the number of parameters in the per CPU block. This allows the reader to process older or newer files that may have different numbers of parameters than in use at the time perf was built. Signed-off-by: Mike Leach <mike.leach@linaro.org> Reviewed-by: Leo Yan <leo.yan@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Link: https://lore.kernel.org/r/20210202214040.32349-1-mike.leach@linaro.org Link: https://lore.kernel.org/r/20210224164835.3497311-2-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-01  perf tools: Generate mips syscalls_n64.c syscall table  (Tiezhu Yang, 3 files, -0/+408)
Grab a copy of arch/mips/kernel/syscalls/syscall_n64.tbl and use it to generate tools/perf/arch/mips/include/generated/asm/syscalls_n64.c file, this is similar with commit 1b700c997500 ("perf tools: Build syscall table .c header from kernel's syscall_64.tbl") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Juxin Gao <gaojuxin@loongson.cn> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Xuefeng Li <lixuefeng@loongson.cn> Cc: linux-mips@vger.kernel.org Link: http://lore.kernel.org/lkml/1612409724-3516-4-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-01  perf tools: Support MIPS unwinding and dwarf-regs  (Tiezhu Yang, 7 files, -0/+188)
Map perf APIs (perf_reg_name/get_arch_regstr/unwind__arch_reg_id) with MIPS specific registers. [ayan@wavecomp.com: repick this patch for unwinding userstack backtrace by perf and libunwind on MIPS based CPU.] [yangtiezhu@loongson.cn: Add sample_reg_masks[] to fix build error, silence some checkpatch errors and warnings, and also separate the original patches into two parts (MIPS kernel and perf tools) to merge easily.] The original patches: https://lore.kernel.org/patchwork/patch/1126521/ https://lore.kernel.org/patchwork/patch/1126520/ Committer notes: Do it as __perf_reg_name() to cope with: 067012974c8ae31a ("perf tools: Fix arm64 build error with gcc-11") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Archer Yan <ayan@wavecomp.com> Cc: David Daney <david.daney@cavium.com> Cc: Jianlin Lv <Jianlin.Lv@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Juxin Gao <gaojuxin@loongson.cn> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Xuefeng Li <lixuefeng@loongson.cn> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: linux-mips@vger.kernel.org Link: http://lore.kernel.org/lkml/1612409724-3516-3-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Archer Yan <ayan@wavecomp.com> Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-23  perf arch powerpc: Sync powerpc syscall.tbl with the kernel sources  (Arnaldo Carvalho de Melo, 1 file, -15/+5)
To get the changes in: fbcee2ebe8edbb6a ("powerpc/32: Always save non volatile GPRs at syscall entry") That shouldn't cause any change in tooling, just silences the following tools/perf/ build warning: Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl' Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18  perf tools: Fix arm64 build error with gcc-11  (Jianlin Lv, 7 files, -7/+7)

gcc version: 11.0.0 20210208 (experimental) (GCC)

The following build error occurs on arm64:

    .......
    In function ‘printf’,
      inlined from ‘regs_dump__printf’ at util/session.c:1141:3,
      inlined from ‘regs__printf’ at util/session.c:1169:2:
    /usr/include/aarch64-linux-gnu/bits/stdio2.h:107:10: \
      error: ‘%-5s’ directive argument is null [-Werror=format-overflow=]
      107 | return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, \ __va_arg_pack ());
    ......
    In function ‘fprintf’,
      inlined from ‘perf_sample__fprintf_regs.isra’ at \ builtin-script.c:622:14:
    /usr/include/aarch64-linux-gnu/bits/stdio2.h:100:10: \
      error: ‘%5s’ directive argument is null [-Werror=format-overflow=]
      100 | return __fprintf_chk (__stream, __USE_FORTIFY_LEVEL - 1, __fmt,
      101 |  __va_arg_pack ());
    cc1: all warnings being treated as errors
    .......

This patch fixes the Wformat-overflow warnings: add a helper function to convert NULL to "unknown".

Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com> Reviewed-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Guo Ren <guoren@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: iecedge@gmail.com Cc: linux-csky@vger.kernel.org Cc: linux-riscv@lists.infradead.org Link: http://lore.kernel.org/lkml/20210218031245.2078492-1-Jianlin.Lv@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
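A sketch of the kind of helper that avoids passing NULL to a %s conversion (illustrative; the real helper in perf may differ):

    static inline const char *reg_name_or_unknown(const char *name)
    {
            return name ?: "unknown";        /* never hand NULL to printf's %s */
    }

    /* usage: printf("%-5s", reg_name_or_unknown(perf_reg_name(r))); */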
2021-02-18  perf intel-pt: Retain the last PIP packet payload as is  (Adrian Hunter, 1 file, -2/+2)
Retain the PIP packet payload as is, instead of just the CR3, because it contains also the VMX NR flag which is needed to track VM-Entry. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210218095801.19576-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18  perf intel_pt: Add vmlaunch and vmresume as branches  (Adrian Hunter, 1 file, -0/+1)
In preparation to support Intel PT decoding of virtual machine traces, add vmlaunch and vmresume as branch instructions. Note, sample flags will show "VMentry" even if the VM-Entry fails. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210218095801.19576-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18  perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processing  (Kan Liang, 1 file, -0/+25)
For X86, the var2_w field of PERF_SAMPLE_WEIGHT_STRUCT stands for the instruction latency. Current perf forces the var2_w to the data->ins_lat in the generic code. It works well for now because X86 is the only architecture that supports the PERF_SAMPLE_WEIGHT_STRUCT, but it may bring problems once other architectures support the sample type. For example, the var2_w may be used to capture something else on PowerPC. Create two architecture specific functions to parse and synthesize the weight related samples. Move the X86 specific codes to the X86 version functions. Other architectures can implement their own functions later separately. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/1612540912-6562-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
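A hedged sketch of such an x86 parse hook (types follow the perf UAPI's union perf_sample_weight; the exact code is assumed):

    /* Sketch: on x86, split the 64-bit weight value into the
     * PERF_SAMPLE_WEIGHT_STRUCT fields: var1_dw is the memory latency,
     * var2_w the instruction latency. */
    void arch_perf_parse_sample_weight_sketch(struct perf_sample *data,
                                              const __u64 *array, __u64 type)
    {
            union perf_sample_weight weight;

            weight.full = *array;
            if (type & PERF_SAMPLE_WEIGHT) {
                    data->weight = weight.full;
            } else {                        /* PERF_SAMPLE_WEIGHT_STRUCT */
                    data->weight = weight.var1_dw;
                    data->ins_lat = weight.var2_w;
            }
    }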
2021-02-09  perf arm64/s390: Fix printf conversion specifier for IP addresses  (Arnaldo Carvalho de Melo, 2 files, -2/+4)
We need to use "%#" PRIx64 for u64 values, not "%lx". In arm64's and s390x cases the compiler doesn't complain, but lets fix this in case this code gets copied to a 32-bit arch, like with powerpc 32-bit that got fixed in the previous patch. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hewenliang <hewenliang4@huawei.com> Cc: Hu Shiyuan <hushiyuan@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-09  perf powerpc: Fix printf conversion specifier for IP addresses  (Arnaldo Carvalho de Melo, 1 file, -1/+2)

We need to use "%#" PRIx64 for u64 values, not "%lx", fixing this build problem on powerpc 32-bit:

    72    13.69 ubuntu:18.04-x-powerpc        : FAIL powerpc-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
      arch/powerpc/util/machine.c: In function 'arch__symbols__fixup_end':
      arch/powerpc/util/machine.c:23:12: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'u64 {aka long long unsigned int}' [-Werror=format=]
         pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
                   ^
      /git/linux/tools/perf/util/debug.h:18:21: note: in definition of macro 'pr_fmt'
       #define pr_fmt(fmt) fmt
                           ^~~
      /git/linux/tools/perf/util/debug.h:33:29: note: in expansion of macro 'pr_debugN'
       #define pr_debug4(fmt, ...) pr_debugN(4, pr_fmt(fmt), ##__VA_ARGS__)
                                   ^~~~~~~~~
      /git/linux/tools/perf/util/debug.h:33:42: note: in expansion of macro 'pr_fmt'
       #define pr_debug4(fmt, ...) pr_debugN(4, pr_fmt(fmt), ##__VA_ARGS__)
                                                ^~~~~~
      arch/powerpc/util/machine.c:23:2: note: in expansion of macro 'pr_debug4'
        pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
        ^~~~~~~~~
      cc1: all warnings being treated as errors
      /git/linux/tools/build/Makefile.build:139: recipe for target 'util' failed
      make[5]: *** [util] Error 2
      /git/linux/tools/build/Makefile.build:139: recipe for target 'powerpc' failed
      make[4]: *** [powerpc] Error 2
      /git/linux/tools/build/Makefile.build:139: recipe for target 'arch' failed
      make[3]: *** [arch] Error 2
    73    30.47 ubuntu:18.04-x-powerpc64      : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

Fixes: 557c3eadb7712741 ("perf powerpc: Fix gap between kernel end and module start") Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08  perf tools: Support PERF_SAMPLE_WEIGHT_STRUCT  (Kan Liang, 2 files, -0/+9)

The new sample type, PERF_SAMPLE_WEIGHT_STRUCT, is an alternative to the PERF_SAMPLE_WEIGHT sample type. Users can apply either the PERF_SAMPLE_WEIGHT sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample type to retrieve the sample weight, but they cannot apply both sample types simultaneously.

The new sample type shares the same space as the PERF_SAMPLE_WEIGHT sample type. The lower 32 bits are exactly the same for both sample types; the higher 32 bits may differ between architectures.

Add arch specific arch_evsel__set_sample_weight() to set the new sample type for X86. Only store the lower 32 bits for sample->weight if the new sample type is applied. In practice, no memory access could take longer than 4G cycles, so no data will be lost.

If the kernel doesn't support the new sample type, fall back to the PERF_SAMPLE_WEIGHT sample type. There is no impact on other architectures.

Committer notes: Fixup related to PERF_SAMPLE_CODE_PAGE_SIZE, present in acme/perf/core but not upstream yet.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/1612296553-21962-6-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08  perf tools: Support the auxiliary event  (Kan Liang, 2 files, -0/+45)

On the Intel Sapphire Rapids server, an auxiliary event has to be enabled simultaneously with the load latency event to retrieve complete Memory Info.

Add an X86 specific perf_mem_events__name() to handle the auxiliary event:

- Users are only interested in the samples of the mem-loads event. Sample read the auxiliary event.
- The auxiliary event must be in front of the load latency event in a group. Assume the second event is the one to sample if the auxiliary event is the leader.
- Add a weak is_mem_loads_aux_event() to check the auxiliary event for X86. For other archs, it always returns false.

Parse the unique event name, mem-loads-aux, for the auxiliary event.

Committer notes:

According to 61b985e3e775a3a7 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids"), ENODATA is only returned by sys_perf_event_open() when used with these auxiliary events, with this in evsel__open_strerror():

    case ENODATA:
            return scnprintf(msg, size, "Cannot collect data source with the load latency event alone. "
                             "Please add an auxiliary event in front of the load latency event.");

This is OK at this point in time, but fragile long term; I pointed this out in the e-mail thread, requesting a follow-up patch to check whether ENODATA is really for this specific case.

Fixed up sizeof(MEM_LOADS_AUX_NAME) bug pointed out by Namhyung.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210205152648.GC920417@kernel.org Link: http://lore.kernel.org/lkml/1612296553-21962-3-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
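For illustration only (the exact string used by perf's x86 mem-events code may differ), an event group with the auxiliary event leading could be spelled like this:

    /* Sketch: mem-loads-aux leads the group; mem-loads (with ldlat) is
     * the member whose samples are actually of interest. */
    #define MEM_LOADS_AUX_NAME_SKETCH \
            "{%s/mem-loads-aux/,%s/mem-loads,ldlat=%u/}:P"

    /* e.g. "{cpu/mem-loads-aux/,cpu/mem-loads,ldlat=30/}:P" */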
2021-02-08  perf powerpc: Support exposing Performance Monitor Counter SPRs as part of extended regs  (Athira Rajeev, 2 files, -0/+12)

To enable presenting the Performance Monitor Counter registers (PMC1 to PMC6) as part of extended registers, this patch adds these to sample_reg_mask on the tool side (to use with the -I? option).

Simplified the PERF_REG_PMU_MASK_300/31 definition. Excluded the unsupported SPRs (MMCR3, SIER2, SIER3) from the extended mask value for CPU_FTR_ARCH_300.

Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>