Diffstat (limited to 'tools/perf/Documentation')
-rw-r--r--  tools/perf/Documentation/perf-annotate.txt   3
-rw-r--r--  tools/perf/Documentation/perf-config.txt     8
-rw-r--r--  tools/perf/Documentation/perf-kvm.txt        9
-rw-r--r--  tools/perf/Documentation/perf-lock.txt       4
-rw-r--r--  tools/perf/Documentation/perf-record.txt    60
-rw-r--r--  tools/perf/Documentation/perf-report.txt     4
-rw-r--r--  tools/perf/Documentation/perf-stat.txt      27
-rw-r--r--  tools/perf/Documentation/perf-top.txt       10
-rw-r--r--  tools/perf/Documentation/topdown.txt        70
9 files changed, 135 insertions, 60 deletions
diff --git a/tools/perf/Documentation/perf-annotate.txt b/tools/perf/Documentation/perf-annotate.txt
index 980fe2c29275..fe168e8165c8 100644
--- a/tools/perf/Documentation/perf-annotate.txt
+++ b/tools/perf/Documentation/perf-annotate.txt
@@ -116,6 +116,9 @@ include::itrace.txt[]
-M::
--disassembler-style=:: Set disassembler style for objdump.
+--addr2line=<path>::
+ Path to addr2line binary.
+
--objdump=<path>::
Path to objdump binary.
diff --git a/tools/perf/Documentation/perf-config.txt b/tools/perf/Documentation/perf-config.txt
index 39c890ead2dc..e56ae54805a8 100644
--- a/tools/perf/Documentation/perf-config.txt
+++ b/tools/perf/Documentation/perf-config.txt
@@ -250,7 +250,13 @@ annotate.*::
These are in control of addresses, jump function, source code
in lines of assembly code from a specific program.
- annotate.disassembler_style:
+ annotate.addr2line::
+ addr2line binary to use for file names and line numbers.
+
+ annotate.objdump::
+ objdump binary to use for disassembly and annotations.
+
+ annotate.disassembler_style::
Use this to change the default disassembler style to some other value
supported by binutils, such as "intel", see the '-M' option help in the
'objdump' man page.
diff --git a/tools/perf/Documentation/perf-kvm.txt b/tools/perf/Documentation/perf-kvm.txt
index 2ad3f5d9f72b..b66be66fe836 100644
--- a/tools/perf/Documentation/perf-kvm.txt
+++ b/tools/perf/Documentation/perf-kvm.txt
@@ -58,7 +58,7 @@ There are a couple of variants of perf kvm:
events.
'perf kvm stat report' reports statistical data which includes events
- handled time, samples, and so on.
+ handled sample, percent_sample, time, percent_time, max_t, min_t, mean_t.
'perf kvm stat live' reports statistical data in a live mode (similar to
record + report but with statistical data updated live at a given display
@@ -82,6 +82,8 @@ OPTIONS
:GMEXAMPLESUBCMD: top
include::guest-files.txt[]
+--stdio:: Use the stdio interface.
+
-v::
--verbose::
Be more verbose (show counter open errors, etc).
@@ -97,7 +99,10 @@ STAT REPORT OPTIONS
-k::
--key=<value>::
Sorting key. Possible values: sample (default, sort by samples
- number), time (sort by average time).
+ number), percent_sample (sort by sample percentage), time
+ (sort by average time), percent_time (sort by time percentage),
+ max_t (sort by maximum time), min_t (sort by minimum time), mean_t
+ (sort by mean time).
-p::
--pid=::
Analyze events only for given process ID(s) (comma separated list).
diff --git a/tools/perf/Documentation/perf-lock.txt b/tools/perf/Documentation/perf-lock.txt
index 37aae194a2a1..6e5ba3cd2b72 100644
--- a/tools/perf/Documentation/perf-lock.txt
+++ b/tools/perf/Documentation/perf-lock.txt
@@ -155,8 +155,10 @@ CONTENTION OPTIONS
--tid=<value>::
Record events on existing thread ID (comma separated list).
+-M::
--map-nr-entries=<value>::
- Maximum number of BPF map entries (default: 10240).
+ Maximum number of BPF map entries (default: 16384).
+ This will be aligned to a power of 2.
--max-stack=<value>::
Maximum stack depth when collecting lock contention (default: 8).
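
Note on the --map-nr-entries change above: the new text says the value "will be aligned to a power of 2" but does not say in which direction. The sketch below assumes the alignment rounds up, which would be consistent with the old default of 10240 becoming 16384. The helper name round_up_pow2 is hypothetical and is not part of perf.

```python
def round_up_pow2(n: int) -> int:
    """Smallest power of two that is >= n (assumed alignment direction)."""
    if n < 1:
        raise ValueError("entry count must be positive")
    # (n - 1).bit_length() is the exponent of the next power of two.
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    # 10240 is the old documented default, 16384 the new one.
    for requested in (10240, 16384, 20000):
        print(requested, "->", round_up_pow2(requested))
```

Under this assumption, a value that is already a power of two is left unchanged.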
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index ff815c2f67e8..680396c56bd1 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -119,9 +119,12 @@ OPTIONS
"perf report" to view group events together.
--filter=<filter>::
- Event filter. This option should follow an event selector (-e) which
- selects either tracepoint event(s) or a hardware trace PMU
- (e.g. Intel PT or CoreSight).
+ Event filter. This option should follow an event selector (-e).
+ If the event is a tracepoint, the filter string will be parsed by
+ the kernel. If the event is a hardware trace PMU (e.g. Intel PT
+ or CoreSight), it'll be processed as an address filter. Otherwise
+ it means a general filter using BPF which can be applied for any
+ kind of event.
- tracepoint filters
@@ -176,6 +179,57 @@ OPTIONS
Multiple filters can be separated with space or comma.
+ - bpf filters
+
+ A BPF filter can access the sample data and make a decision based on the
+ data. Users need to set an appropriate sample type to use the BPF
+ filter. BPF filters need root privilege.
+
+ The sample data field can be specified in lower case letter. Multiple
+ filters can be separated with comma. For example,
+
+ --filter 'period > 1000, cpu == 1'
+ or
+ --filter 'mem_op == load || mem_op == store, mem_lvl > l1'
+
+ The former filter only accept samples with period greater than 1000 AND
+ CPU number is 1. The latter one accepts either load and store memory
+ operations but it should have memory level above the L1. Since the
+ mem_op and mem_lvl fields come from the (memory) data_source, it'd only
+ work with some events which set the data_source field.
+
+ Also user should request to collect that information (with -d option in
+ the above case). Otherwise, the following message will be shown.
+
+ $ sudo perf record -e cycles --filter 'mem_op == load'
+ Error: cycles event does not have PERF_SAMPLE_DATA_SRC
+ Hint: please add -d option to perf record.
+ failed to set filter "BPF" on event cycles with 22 (Invalid argument)
+
+ Essentially the BPF filter expression is:
+
+ <term> <operator> <value> (("," | "||") <term> <operator> <value>)*
+
+ The <term> can be one of:
+ ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
+ code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
+ p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
+ mem_dtlb, mem_blk, mem_hops
+
+ The <operator> can be one of:
+ ==, !=, >, >=, <, <=, &
+
+ The <value> can be one of:
+ <number> (for any term)
+ na, load, store, pfetch, exec (for mem_op)
+ l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
+ na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
+ remote (for mem_remote)
+ na, locked (for mem_lock)
+ na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
+ na, by_data, by_addr (for mem_blk)
+ hops0, hops1, hops2, hops3 (for mem_hops)
+
--exclude-perf::
Don't record events issued by perf itself. This option should follow
an event selector (-e) which selects tracepoint event(s). It adds a
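
Note on the BPF filter grammar added above: the text describes comma-separated terms as combined with AND and "||"-separated terms as alternatives. The following is a minimal Python sketch of that reading only. It is not how perf or its BPF program implements the filter. The function names parse_filter and accepts are hypothetical, samples are modelled as plain dictionaries, symbolic values such as "load" are compared as strings for == and != only, and treating '&' as a non-zero bitwise test is an assumption.

```python
import operator
import re

# Operators listed in the documentation text. '&' is modelled as a
# bitwise-AND test that passes on a non-zero result (assumption).
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
    "&": lambda a, b: (a & b) != 0,
}
TERM_RE = re.compile(r"^\s*(\w+)\s*(==|!=|>=|<=|>|<|&)\s*(\w+)\s*$")


def parse_filter(expr):
    """Return a list of AND-ed groups; each group is a list of OR-ed terms."""
    groups = []
    for group in expr.split(","):
        terms = []
        for term in group.split("||"):
            match = TERM_RE.match(term)
            if match is None:
                raise ValueError(f"cannot parse term: {term!r}")
            name, op, value = match.groups()
            if value.isdigit():
                value = int(value)
            elif op not in ("==", "!="):
                # Ordering of symbolic values (e.g. 'mem_lvl > l1') needs
                # perf's internal encoding, which this sketch does not model.
                raise ValueError(f"ordering on symbolic value: {term!r}")
            terms.append((name, op, value))
        groups.append(terms)
    return groups


def accepts(groups, sample):
    """True when every comma-separated group has at least one matching term."""
    return all(
        any(OPS[op](sample[name], value) for name, op, value in terms)
        for terms in groups
    )


if __name__ == "__main__":
    both = parse_filter("period > 1000, cpu == 1")
    print(accepts(both, {"period": 2000, "cpu": 1}))
    print(accepts(both, {"period": 2000, "cpu": 0}))

    either = parse_filter("mem_op == load || mem_op == store, period > 1000")
    print(accepts(either, {"mem_op": "store", "period": 5000}))
```

The second expression is a variant of the documented example with the mem_lvl term replaced by a numeric one, because the sketch deliberately rejects ordering comparisons on symbolic values.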
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index c242e8da6b1a..af068b4f1e5a 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -117,6 +117,7 @@ OPTIONS
- addr: (Full) virtual address of the sampled instruction
- retire_lat: On X86, this reports pipeline stall of this instruction compared
to the previous instruction in cycles. And currently supported only on X86
+ - simd: Flags describing a SIMD operation. "e" for empty Arm SVE predicate. "p" for partial Arm SVE predicate
By default, comm, dso and symbol keys are used.
(i.e. --sort comm,dso,symbol)
@@ -380,6 +381,9 @@ OPTIONS
This allows to examine the path the program took to each sample.
The data collection must have used -b (or -j) and -g.
+--addr2line=<path>::
+ Path to addr2line binary.
+
--objdump=<path>::
Path to objdump binary.
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 18abdc1dce05..29bdcfa93f04 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -394,10 +394,10 @@ See perf list output for the possible metrics and metricgroups.
Do not aggregate counts across all monitored CPUs.
--topdown::
-Print complete top-down metrics supported by the CPU. This allows to
-determine bottle necks in the CPU pipeline for CPU bound workloads,
-by breaking the cycles consumed down into frontend bound, backend bound,
-bad speculation and retiring.
+Print top-down metrics supported by the CPU. This allows to determine
+bottle necks in the CPU pipeline for CPU bound workloads, by breaking
+the cycles consumed down into frontend bound, backend bound, bad
+speculation and retiring.
Frontend bound means that the CPU cannot fetch and decode instructions fast
enough. Backend bound means that computation or memory access is the bottle
@@ -430,15 +430,18 @@ CPUs the workload runs on. If needed the CPUs can be forced using
taskset.
--td-level::
-Print the top-down statistics that equal to or lower than the input level.
-It allows users to print the interested top-down metrics level instead of
-the complete top-down metrics.
+Print the top-down statistics that equal the input level. It allows
+users to print the interested top-down metrics level instead of the
+level 1 top-down metrics.
+
+As the higher levels gather more metrics and use more counters they
+will be less accurate. By convention a metric can be examined by
+appending '_group' to it and this will increase accuracy compared to
+gathering all metrics for a level. For example, level 1 analysis may
+highlight 'tma_frontend_bound'. This metric may be drilled into with
+'tma_frontend_bound_group' with
+'perf stat -M tma_frontend_bound_group...'.
-The availability of the top-down metrics level depends on the hardware. For
-example, Ice Lake only supports L1 top-down metrics. The Sapphire Rapids
-supports both L1 and L2 top-down metrics.
-
-Default: 0 means the max level that the current hardware support.
Error out if the input is higher than the supported max level.
--no-merge::
diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index c60e615b7183..3c202ec080ba 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -161,6 +161,12 @@ Default is to monitor all CPUS.
-M::
--disassembler-style=:: Set disassembler style for objdump.
+--addr2line=<path>::
+ Path to addr2line binary.
+
+--objdump=<path>::
+ Path to objdump binary.
+
--prefix=PREFIX::
--prefix-strip=N::
Remove first N entries from source file path names in executables
@@ -248,6 +254,10 @@ Default is to monitor all CPUS.
The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
Note that this feature may not be available on all processors.
+--branch-history::
+ Add the addresses of sampled taken branches to the callstack.
+ This allows to examine the path the program took to each sample.
+
--raw-trace::
When displaying traceevent output, do not use print fmt or plugins.
diff --git a/tools/perf/Documentation/topdown.txt b/tools/perf/Documentation/topdown.txt
index a15b93fdcf50..ae0aee86844f 100644
--- a/tools/perf/Documentation/topdown.txt
+++ b/tools/perf/Documentation/topdown.txt
@@ -1,46 +1,35 @@
-Using TopDown metrics in user space
------------------------------------
+Using TopDown metrics
+---------------------
-Intel CPUs (since Sandy Bridge and Silvermont) support a TopDown
-methodology to break down CPU pipeline execution into 4 bottlenecks:
-frontend bound, backend bound, bad speculation, retiring.
+TopDown metrics break apart performance bottlenecks. Starting at level
+1 it is typical to get metrics on retiring, bad speculation, frontend
+bound, and backend bound. Higher levels provide more detail in to the
+level 1 bottlenecks, such as at level 2: core bound, memory bound,
+heavy operations, light operations, branch mispredicts, machine
+clears, fetch latency and fetch bandwidth. For more details see [1][2][3].
-For more details on Topdown see [1][5]
+perf stat --topdown implements this using available metrics that vary
+per architecture.
-Traditionally this was implemented by events in generic counters
-and specific formulas to compute the bottlenecks.
-
-perf stat --topdown implements this.
-
-Full Top Down includes more levels that can break down the
-bottlenecks further. This is not directly implemented in perf,
-but available in other tools that can run on top of perf,
-such as toplev[2] or vtune[3]
+% perf stat -a --topdown -I1000
+# time % tma_retiring % tma_backend_bound % tma_frontend_bound % tma_bad_speculation
+ 1.001141351 11.5 34.9 46.9 6.7
+ 2.006141972 13.4 28.1 50.4 8.1
+ 3.010162040 12.9 28.1 51.1 8.0
+ 4.014009311 12.5 28.6 51.8 7.2
+ 5.017838554 11.8 33.0 48.0 7.2
+ 5.704818971 14.0 27.5 51.3 7.3
+...
-New Topdown features in Ice Lake
-===============================
+New Topdown features in Intel Ice Lake
+======================================
With Ice Lake CPUs the TopDown metrics are directly available as
fixed counters and do not require generic counters. This allows
to collect TopDown always in addition to other events.
-% perf stat -a --topdown -I1000
-# time retiring bad speculation frontend bound backend bound
- 1.001281330 23.0% 15.3% 29.6% 32.1%
- 2.003009005 5.0% 6.8% 46.6% 41.6%
- 3.004646182 6.7% 6.7% 46.0% 40.6%
- 4.006326375 5.0% 6.4% 47.6% 41.0%
- 5.007991804 5.1% 6.3% 46.3% 42.3%
- 6.009626773 6.2% 7.1% 47.3% 39.3%
- 7.011296356 4.7% 6.7% 46.2% 42.4%
- 8.012951831 4.7% 6.7% 47.5% 41.1%
-...
-
-This also enables measuring TopDown per thread/process instead
-of only per core.
-
-Using TopDown through RDPMC in applications on Ice Lake
-======================================================
+Using TopDown through RDPMC in applications on Intel Ice Lake
+=============================================================
For more fine grained measurements it can be useful to
access the new directly from user space. This is more complicated,
@@ -301,8 +290,8 @@ This "opens" a new measurement period.
A program using RDPMC for TopDown should schedule such a reset
regularly, as in every few seconds.
-Limits on Ice Lake
-==================
+Limits on Intel Ice Lake
+========================
Four pseudo TopDown metric events are exposed for the end-users,
topdown-retiring, topdown-bad-spec, topdown-fe-bound and topdown-be-bound.
@@ -318,8 +307,8 @@ a sampling read group. Since the SLOTS event must be the leader of a TopDown
group, the second event of the group is the sampling event.
For example, perf record -e '{slots, $sampling_event, topdown-retiring}:S'
-Extension on Sapphire Rapids Server
-===================================
+Extension on Intel Sapphire Rapids Server
+=========================================
The metrics counter is extended to support TMA method level 2 metrics.
The lower half of the register is the TMA level 1 metrics (legacy).
The upper half is also divided into four 8-bit fields for the new level 2
@@ -338,7 +327,6 @@ other four level 2 metrics by subtracting corresponding metrics as below.
[1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
-[2] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
-[3] https://software.intel.com/en-us/intel-vtune-amplifier-xe
+[2] https://sites.google.com/site/analysismethods/yasin-pubs
+[3] https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
[4] https://github.com/andikleen/pmu-tools/tree/master/jevents
-[5] https://sites.google.com/site/analysismethods/yasin-pubs
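
Note on the 'perf stat -a --topdown -I1000' sample added to topdown.txt above: the four level 1 columns are percentages of the same total, so each row should sum to roughly 100 and the largest column names the dominant bottleneck for that interval. The sketch below parses two rows copied from that sample and reports both. It assumes the whitespace-separated layout shown in the diff, which may differ between perf versions, and the function name parse_topdown is hypothetical.

```python
# Two intervals copied from the sample output in the diff above.
SAMPLE = """\
# time % tma_retiring % tma_backend_bound % tma_frontend_bound % tma_bad_speculation
 1.001141351 11.5 34.9 46.9 6.7
 2.006141972 13.4 28.1 50.4 8.1
"""


def parse_topdown(text):
    """Return a list of (timestamp, {metric_name: percent}) tuples."""
    names, rows = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "#":
            # Header: drop '#', 'time' and the '%' unit markers.
            names = [f for f in fields[2:] if f != "%"]
            continue
        timestamp, *values = (float(f) for f in fields)
        rows.append((timestamp, dict(zip(names, values))))
    return rows


if __name__ == "__main__":
    for timestamp, metrics in parse_topdown(SAMPLE):
        dominant = max(metrics, key=metrics.get)
        total = sum(metrics.values())
        print(f"{timestamp:.3f}s dominant={dominant} "
              f"({metrics[dominant]}%) sum={total:.1f}%")
```

In both intervals the largest share is tma_frontend_bound, which is the kind of level 1 result that the perf-stat.txt change suggests drilling into with 'perf stat -M tma_frontend_bound_group'.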