author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-03-27 23:42:32 +0300
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-03-27 23:42:32 +0300
commit | 7b58b82b86c8b65a2b57a4c6cb96a460654f9e09
tree | a13e19f216389f16f1cb6641d54751f167482515 /tools/perf/util/maps.c
parent | 02f9a04d76b76b80b05ddc33ceabe806b84fda3c
parent | ab0809af0bee88b689ba289ec8c40aa2be3a17ec
Merge tag 'perf-tools-for-v5.18-2022-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo:
"New features:
perf ftrace:
- Add -n/--use-nsec option to the 'latency' subcommand.
Default: usecs:
$ sudo perf ftrace latency -T dput -a sleep 1
# DURATION | COUNT | GRAPH |
0 - 1 us | 2098375 | ############################# |
1 - 2 us | 61 | |
2 - 4 us | 33 | |
4 - 8 us | 13 | |
8 - 16 us | 124 | |
16 - 32 us | 123 | |
32 - 64 us | 1 | |
64 - 128 us | 0 | |
128 - 256 us | 1 | |
256 - 512 us | 0 | |
Better granularity with nsec:
$ sudo perf ftrace latency -T dput -a -n sleep 1
# DURATION | COUNT | GRAPH |
0 - 1 us | 0 | |
1 - 2 ns | 0 | |
2 - 4 ns | 0 | |
4 - 8 ns | 0 | |
8 - 16 ns | 0 | |
16 - 32 ns | 0 | |
32 - 64 ns | 0 | |
64 - 128 ns | 1163434 | ############## |
128 - 256 ns | 914102 | ############# |
256 - 512 ns | 884 | |
512 - 1024 ns | 613 | |
1 - 2 us | 31 | |
2 - 4 us | 17 | |
4 - 8 us | 7 | |
8 - 16 us | 123 | |
16 - 32 us | 83 | |
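Each histogram row above covers twice the range of the previous one (1 - 2, 2 - 4, 4 - 8, ...), so a duration d of at least one unit lands in bucket floor(log2(d)) + 1, and bucket 0 holds everything below one unit. That is why, in microseconds, almost all dput() calls collapse into the 0 - 1 us row, while -n spreads them over the 64 - 128 ns and 128 - 256 ns rows. A minimal C sketch of that bucketing, assuming a hypothetical NUM_BUCKETS of 22 and hypothetical helpers latency_bucket() and latency_record(); it is not the code perf itself uses:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bucket count for this sketch; perf's real table size may differ. */
#define NUM_BUCKETS 22

/*
 * Log2 bucketing: bucket 0 holds [0, 1), bucket i >= 1 holds [2^(i-1), 2^i)
 * in the display unit (us by default, ns with -n). Values beyond the last
 * bucket are clamped into it.
 */
int latency_bucket(uint64_t delta)
{
	int idx = 0;

	while (delta > 0 && idx < NUM_BUCKETS - 1) {
		delta >>= 1;  /* each halving moves one bucket up */
		idx++;
	}
	return idx;
}

/* Hypothetical recorder: convert a nanosecond duration to the display unit, then count it. */
void latency_record(uint64_t hist[NUM_BUCKETS], uint64_t delta_ns, int use_nsec)
{
	uint64_t delta = use_nsec ? delta_ns : delta_ns / 1000;

	hist[latency_bucket(delta)]++;
}
```

For example, a 150 ns call goes to bucket 8 ([128, 256) ns) when use_nsec is set, but to bucket 0 ([0, 1) us) when it is not.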
perf lock:
- Add -c/--combine-locks option to merge lock instances in the same
class into a single entry.
# perf lock report -c
Name acquired contended avg wait(ns) total wait(ns) max wait(ns) min wait(ns)
rcu_read_lock 251225 0 0 0 0 0
hrtimer_bases.lock 39450 0 0 0 0 0
&sb->s_type->i_l... 10301 1 662 662 662 662
ptlock_ptr(page) 10173 2 701 1402 760 642
&(ei->i_block_re... 8732 0 0 0 0 0
&xa->xa_lock 8088 0 0 0 0 0
&base->lock 6705 0 0 0 0 0
&p->pi_lock 5549 0 0 0 0 0
&dentry->d_lockr... 5010 4 1274 5097 1844 789
&ep->lock 3958 0 0 0 0 0
- Add -F/--field option to customize the list of fields to output:
$ perf lock report -F contended,wait_max -k avg_wait
Name contended max wait(ns) avg wait(ns)
slock-AF_INET6 1 23543 23543
&lruvec->lru_lock 5 18317 11254
slock-AF_INET6 1 10379 10379
rcu_node_1 1 2104 2104
&dentry->d_lockr... 1 1844 1844
&dentry->d_lockr... 1 1672 1672
&newf->file_lock 15 2279 1025
&dentry->d_lockr... 1 792 792
- Add --synth=no option for record, as there is no need to symbolize:
lock names come from the tracepoints.
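Combining by class (-c) amounts to summing the per-instance counters and keeping the overall maximum and minimum waits; the average then has to be recomputed from the merged totals rather than averaged across rows. A sketch in C, assuming a hypothetical struct lock_row that mirrors the report columns and hypothetical helpers combine_by_class() and sort_by_avg_wait() (the latter standing in for -k avg_wait); perf's own lock statistics code is organized differently:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical row type mirroring the columns of 'perf lock report'. */
struct lock_row {
	const char *class_name;  /* lock class the instance belongs to */
	uint64_t acquired;
	uint64_t contended;
	uint64_t wait_total;     /* ns */
	uint64_t wait_max;       /* ns */
	uint64_t wait_min;       /* ns, only meaningful when contended > 0 */
};

/* avg wait(ns) is derived from the totals, never stored. */
uint64_t avg_wait(const struct lock_row *r)
{
	return r->contended ? r->wait_total / r->contended : 0;
}

/* Fold src into dst: counters add up, the extreme waits are kept. */
static void merge_row(struct lock_row *dst, const struct lock_row *src)
{
	if (src->contended) {
		/* In this sketch, uncontended rows carry no min/max information. */
		if (!dst->contended || src->wait_min < dst->wait_min)
			dst->wait_min = src->wait_min;
		if (src->wait_max > dst->wait_max)
			dst->wait_max = src->wait_max;
	}
	dst->acquired += src->acquired;
	dst->contended += src->contended;
	dst->wait_total += src->wait_total;
}

/* Combine rows that share a class name, in place; returns the new row count. */
size_t combine_by_class(struct lock_row *rows, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j = 0;

		while (j < out && strcmp(rows[j].class_name, rows[i].class_name) != 0)
			j++;
		if (j < out)
			merge_row(&rows[j], &rows[i]);
		else
			rows[out++] = rows[i];
	}
	return out;
}

/* qsort() comparator: larger average wait first, like '-k avg_wait'. */
static int cmp_avg_wait_desc(const void *a, const void *b)
{
	uint64_t x = avg_wait(a), y = avg_wait(b);

	return (x < y) - (x > y);
}

void sort_by_avg_wait(struct lock_row *rows, size_t n)
{
	qsort(rows, n, sizeof(*rows), cmp_avg_wait_desc);
}
```

Two instances with 2 contentions each and total waits of 1000 ns and 4000 ns combine into one row with 4 contentions and an average of 1250 ns, not the 1250 ns you would get only by coincidence from averaging the two per-row averages (500 and 2000).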
perf record:
- Threaded recording, opt-in, via the new --threads command line
option.
- Improve AMD IBS (Instruction-Based Sampling) error handling
messages.
perf script:
- Add 'brstackinsnlen' field (use it with -F) for branch stacks.
- Output branch sample type in 'perf script'.
perf report:
- Add "addr_from" and "addr_to" sort dimensions.
- Print branch stack entry type in 'perf report --dump-raw-trace'
- Fix symbolization for chrooted workloads.
Hardware tracing:
Intel PT:
- Add CFE (Control Flow Event) and EVD (Event Data) packets support.
- Add MODE.Exec IFLAG bit support.
An explanation of these features, from the "Intel® 64 and IA-32
Architectures Software Developer’s Manual Combined Volumes: 1, 2A,
2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4" PDF at:
https://cdrdv2.intel.com/v1/dl/getContent/671200
On page 3951:
"32.2.4
Event Trace is a capability that exposes details about the
asynchronous events, when they are generated, and when their
corresponding software event handler completes execution. These
include:
o Interrupts, including NMI and SMI, including the interrupt
vector when defined.
o Faults, exceptions including the fault vector.
- Page faults additionally include the page fault address,
when in context.
o Event handler returns, including IRET and RSM.
o VM exits and VM entries.¹
- VM exits include the values written to the “exit reason”
and “exit qualification” VMCS fields. INIT and SIPI events.
o TSX aborts, including the abort status returned for the RTM
instructions.
o Shutdown.
Additionally, it provides indication of the status of the
Interrupt Flag (IF), to indicate when interrupts are masked"
ARM CoreSight:
- Use advertised caps/min_interval as default sample_period on ARM
SPE (Statistical Profiling Extension).
- Update deduction of TRCCONFIGR register for branch broadcast on
ARM's CoreSight ETM.
Vendor Events (JSON):
Intel:
- Update events and metrics for: Alderlake, Broadwell, Broadwell DE,
BroadwellX, CascadelakeX, Elkhartlake, Bonnell, Goldmont,
GoldmontPlus, Westmere EP-DP, Haswell, HaswellX, Icelake, IcelakeX,
Ivybridge, Ivytown, Jaketown, Knights Landing, Nehalem EP,
Sandybridge, Silvermont, Skylake, Skylake Server, SkylakeX,
Tigerlake, TremontX, Westmere EP-SP, and Westmere EX.
ARM:
- Add support for HiSilicon CPA PMU aliasing.
perf stat:
- Fix forked applications enablement of counters.
- Fix the 'slots' event being printed in a different order than the
one specified on the command line even when no 'topdown' events are
present; the reordering should only happen when they are.
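The fix can be pictured as making the reorder conditional. As background not stated in this message: on Intel CPUs with topdown metric support, the kernel expects 'slots' to lead a group that contains topdown metric events, which is why perf moves it at all. A sketch over plain event-name strings, assuming a hypothetical helper move_slots_first() and treating any name starting with "topdown-" as a topdown event; perf performs its real reordering on parsed events, not on strings:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified test: real perf inspects parsed events, not names. */
static bool is_topdown(const char *name)
{
	return strncmp(name, "topdown-", strlen("topdown-")) == 0;
}

/*
 * Hypothetical helper: move "slots" to the front, keeping the relative order
 * of every other event, but only when at least one topdown event is present.
 */
void move_slots_first(const char **events, size_t n)
{
	bool has_topdown = false;
	size_t slots = n;  /* n means "not found" */

	for (size_t i = 0; i < n; i++) {
		if (is_topdown(events[i]))
			has_topdown = true;
		else if (slots == n && strcmp(events[i], "slots") == 0)
			slots = i;
	}
	if (!has_topdown || slots == n)
		return;  /* keep the command-line order untouched */

	const char *s = events[slots];

	/* Shift events[0..slots-1] one place right, then put slots first. */
	memmove(&events[1], &events[0], slots * sizeof(*events));
	events[0] = s;
}
```

With "topdown-retiring,cycles,slots" the sketch yields "slots,topdown-retiring,cycles"; with "cycles,slots" it leaves the order alone, which is the behavior the fix restores.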
Miscellaneous:
- Sync msr-index, cpufeatures header files with the kernel sources.
- Stop using some deprecated libbpf APIs in 'perf trace'.
- Fix some spelling mistakes.
- Refactor the maps pointers usage to pave the way for using refcount
debugging.
- Only offer the --tui option on perf top, report and annotate when
perf was built with libslang.
- Don't mention --to-ctf in 'perf data --help' when not linking with
the required library, libbabeltrace.
- Use ARRAY_SIZE() instead of ad hoc equivalent, spotted by
array_size.cocci.
- Enhance the matching of sub-commands abbreviations:
'perf c2c rec' -> 'perf c2c record'
'perf c2c recport' -> error
- Set build-id using build-id header on new mmap records.
- Fix generation of 'perf --version' string.
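The two abbreviation examples above imply a prefix rule: an argument is accepted only if it is a prefix of the full sub-command name, so "rec" expands to "record" while "recport", which merely starts with "rec", is rejected. A sketch of that rule in C, assuming a hypothetical helper subcmd_matches() and a hypothetical three-character minimum; perf's own sub-command dispatch code is not reproduced here:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: accept 'arg' as an abbreviation of 'full' only if it
 * has at least 'min_len' characters and is a prefix of 'full'. Comparing
 * just the first three characters of 'arg' would wrongly accept "recport".
 */
bool subcmd_matches(const char *arg, const char *full, size_t min_len)
{
	size_t len = strlen(arg);

	return len >= min_len && len <= strlen(full) &&
	       strncmp(full, arg, len) == 0;
}
```

The minimum length keeps very short arguments such as "re" from being ambiguous between sub-commands like "record" and "report".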
perf test:
- Add test for the arm_spe event.
- Add test to check unwinding using frame-pointer (fp) mode on arm64.
- Make metric testing more robust in 'perf test'.
- Add error message for unsupported branch stack cases.
libperf:
- Add API for allocating new thread map array.
- Fix typo in perf_evlist__open() failure error messages in libperf
tests.
perf c2c:
- Replace bitmap_weight() with bitmap_empty() where appropriate"
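Replacing bitmap_weight() with bitmap_empty() pays off when the caller only needs to know whether any bit is set: the weight has to count bits in every word, while the emptiness check can return at the first non-zero word. A simplified C sketch, with hypothetical sketch_bitmap_weight() and sketch_bitmap_empty() standing in for the kernel-style bitmap helpers perf uses:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Bits of 'word' that lie inside the first 'nbits' bits of the bitmap. */
static unsigned long masked_word(const unsigned long *map, size_t i, size_t nbits)
{
	unsigned long word = map[i];

	if (i == nbits / BITS_PER_LONG)  /* partial last word */
		word &= (1UL << (nbits % BITS_PER_LONG)) - 1;
	return word;
}

/* Counts every set bit, so it always walks the whole bitmap. */
size_t sketch_bitmap_weight(const unsigned long *map, size_t nbits)
{
	size_t w = 0;

	for (size_t i = 0; i < BITS_TO_LONGS(nbits); i++)
		for (unsigned long word = masked_word(map, i, nbits); word; word &= word - 1)
			w++;  /* word &= word - 1 clears the lowest set bit */
	return w;
}

/* Stops at the first word that has any bit set. */
bool sketch_bitmap_empty(const unsigned long *map, size_t nbits)
{
	for (size_t i = 0; i < BITS_TO_LONGS(nbits); i++)
		if (masked_word(map, i, nbits))
			return false;
	return true;
}
```

Both answer "is anything set?" identically, so swapping one for the other only changes how much work is done, not the result.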
* tag 'perf-tools-for-v5.18-2022-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (143 commits)
perf evsel: Improve AMD IBS (Instruction-Based Sampling) error handling messages
perf python: Add perf_env stubs that will be needed in evsel__open_strerror()
perf tools: Enhance the matching of sub-commands abbreviations
libperf tests: Fix typo in perf_evlist__open() failure error messages
tools arm64: Import cputype.h
perf lock: Add -F/--field option to control output
perf lock: Extend struct lock_key to have print function
perf lock: Add --synth=no option for record
tools headers cpufeatures: Sync with the kernel sources
tools headers cpufeatures: Sync with the kernel sources
perf stat: Fix forked applications enablement of counters
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf evsel: Make evsel__env() always return a valid env
perf build-id: Fix spelling mistake "Cant" -> "Can't"
perf header: Fix spelling mistake "could't" -> "couldn't"
perf script: Add 'brstackinsnlen' for branch stacks
perf parse-events: Move slots only with topdown
perf ftrace latency: Update documentation
perf ftrace latency: Add -n/--use-nsec option
perf tools: Fix version kernel tag
...
Diffstat (limited to 'tools/perf/util/maps.c')
-rw-r--r-- | tools/perf/util/maps.c | 403 |
1 files changed, 403 insertions, 0 deletions
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
new file mode 100644
index 000000000000..37bd5b40000d
--- /dev/null
+++ b/tools/perf/util/maps.c
@@ -0,0 +1,403 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <errno.h>
+#include <stdlib.h>
+#include <linux/zalloc.h>
+#include "debug.h"
+#include "dso.h"
+#include "map.h"
+#include "maps.h"
+#include "thread.h"
+#include "ui/ui.h"
+#include "unwind.h"
+
+static void __maps__insert(struct maps *maps, struct map *map);
+
+static void maps__init(struct maps *maps, struct machine *machine)
+{
+	maps->entries = RB_ROOT;
+	init_rwsem(&maps->lock);
+	maps->machine = machine;
+	maps->last_search_by_name = NULL;
+	maps->nr_maps = 0;
+	maps->maps_by_name = NULL;
+	refcount_set(&maps->refcnt, 1);
+}
+
+static void __maps__free_maps_by_name(struct maps *maps)
+{
+	/*
+	 * Free everything to try to do it from the rbtree in the next search
+	 */
+	zfree(&maps->maps_by_name);
+	maps->nr_maps_allocated = 0;
+}
+
+void maps__insert(struct maps *maps, struct map *map)
+{
+	down_write(&maps->lock);
+	__maps__insert(maps, map);
+	++maps->nr_maps;
+
+	if (map->dso && map->dso->kernel) {
+		struct kmap *kmap = map__kmap(map);
+
+		if (kmap)
+			kmap->kmaps = maps;
+		else
+			pr_err("Internal error: kernel dso with non kernel map\n");
+	}
+
+
+	/*
+	 * If we already performed some search by name, then we need to add the just
+	 * inserted map and resort.
+	 */
+	if (maps->maps_by_name) {
+		if (maps->nr_maps > maps->nr_maps_allocated) {
+			int nr_allocate = maps->nr_maps * 2;
+			struct map **maps_by_name = realloc(maps->maps_by_name, nr_allocate * sizeof(map));
+
+			if (maps_by_name == NULL) {
+				__maps__free_maps_by_name(maps);
+				up_write(&maps->lock);
+				return;
+			}
+
+			maps->maps_by_name = maps_by_name;
+			maps->nr_maps_allocated = nr_allocate;
+		}
+		maps->maps_by_name[maps->nr_maps - 1] = map;
+		__maps__sort_by_name(maps);
+	}
+	up_write(&maps->lock);
+}
+
+static void __maps__remove(struct maps *maps, struct map *map)
+{
+	rb_erase_init(&map->rb_node, &maps->entries);
+	map__put(map);
+}
+
+void maps__remove(struct maps *maps, struct map *map)
+{
+	down_write(&maps->lock);
+	if (maps->last_search_by_name == map)
+		maps->last_search_by_name = NULL;
+
+	__maps__remove(maps, map);
+	--maps->nr_maps;
+	if (maps->maps_by_name)
+		__maps__free_maps_by_name(maps);
+	up_write(&maps->lock);
+}
+
+static void __maps__purge(struct maps *maps)
+{
+	struct map *pos, *next;
+
+	maps__for_each_entry_safe(maps, pos, next) {
+		rb_erase_init(&pos->rb_node, &maps->entries);
+		map__put(pos);
+	}
+}
+
+static void maps__exit(struct maps *maps)
+{
+	down_write(&maps->lock);
+	__maps__purge(maps);
+	up_write(&maps->lock);
+}
+
+bool maps__empty(struct maps *maps)
+{
+	return !maps__first(maps);
+}
+
+struct maps *maps__new(struct machine *machine)
+{
+	struct maps *maps = zalloc(sizeof(*maps));
+
+	if (maps != NULL)
+		maps__init(maps, machine);
+
+	return maps;
+}
+
+void maps__delete(struct maps *maps)
+{
+	maps__exit(maps);
+	unwind__finish_access(maps);
+	free(maps);
+}
+
+void maps__put(struct maps *maps)
+{
+	if (maps && refcount_dec_and_test(&maps->refcnt))
+		maps__delete(maps);
+}
+
+struct symbol *maps__find_symbol(struct maps *maps, u64 addr, struct map **mapp)
+{
+	struct map *map = maps__find(maps, addr);
+
+	/* Ensure map is loaded before using map->map_ip */
+	if (map != NULL && map__load(map) >= 0) {
+		if (mapp != NULL)
+			*mapp = map;
+		return map__find_symbol(map, map->map_ip(map, addr));
+	}
+
+	return NULL;
+}
+
+struct symbol *maps__find_symbol_by_name(struct maps *maps, const char *name, struct map **mapp)
+{
+	struct symbol *sym;
+	struct map *pos;
+
+	down_read(&maps->lock);
+
+	maps__for_each_entry(maps, pos) {
+		sym = map__find_symbol_by_name(pos, name);
+
+		if (sym == NULL)
+			continue;
+		if (!map__contains_symbol(pos, sym)) {
+			sym = NULL;
+			continue;
+		}
+		if (mapp != NULL)
+			*mapp = pos;
+		goto out;
+	}
+
+	sym = NULL;
+out:
+	up_read(&maps->lock);
+	return sym;
+}
+
+int maps__find_ams(struct maps *maps, struct addr_map_symbol *ams)
+{
+	if (ams->addr < ams->ms.map->start || ams->addr >= ams->ms.map->end) {
+		if (maps == NULL)
+			return -1;
+		ams->ms.map = maps__find(maps, ams->addr);
+		if (ams->ms.map == NULL)
+			return -1;
+	}
+
+	ams->al_addr = ams->ms.map->map_ip(ams->ms.map, ams->addr);
+	ams->ms.sym = map__find_symbol(ams->ms.map, ams->al_addr);
+
+	return ams->ms.sym ? 0 : -1;
+}
+
+size_t maps__fprintf(struct maps *maps, FILE *fp)
+{
+	size_t printed = 0;
+	struct map *pos;
+
+	down_read(&maps->lock);
+
+	maps__for_each_entry(maps, pos) {
+		printed += fprintf(fp, "Map:");
+		printed += map__fprintf(pos, fp);
+		if (verbose > 2) {
+			printed += dso__fprintf(pos->dso, fp);
+			printed += fprintf(fp, "--\n");
+		}
+	}
+
+	up_read(&maps->lock);
+
+	return printed;
+}
+
+int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp)
+{
+	struct rb_root *root;
+	struct rb_node *next, *first;
+	int err = 0;
+
+	down_write(&maps->lock);
+
+	root = &maps->entries;
+
+	/*
+	 * Find first map where end > map->start.
+	 * Same as find_vma() in kernel.
+	 */
+	next = root->rb_node;
+	first = NULL;
+	while (next) {
+		struct map *pos = rb_entry(next, struct map, rb_node);
+
+		if (pos->end > map->start) {
+			first = next;
+			if (pos->start <= map->start)
+				break;
+			next = next->rb_left;
+		} else
+			next = next->rb_right;
+	}
+
+	next = first;
+	while (next) {
+		struct map *pos = rb_entry(next, struct map, rb_node);
+		next = rb_next(&pos->rb_node);
+
+		/*
+		 * Stop if current map starts after map->end.
+		 * Maps are ordered by start: next will not overlap for sure.
+		 */
+		if (pos->start >= map->end)
+			break;
+
+		if (verbose >= 2) {
+
+			if (use_browser) {
+				pr_debug("overlapping maps in %s (disable tui for more info)\n",
+					   map->dso->name);
+			} else {
+				fputs("overlapping maps:\n", fp);
+				map__fprintf(map, fp);
+				map__fprintf(pos, fp);
+			}
+		}
+
+		rb_erase_init(&pos->rb_node, root);
+		/*
+		 * Now check if we need to create new maps for areas not
+		 * overlapped by the new map:
+		 */
+		if (map->start > pos->start) {
+			struct map *before = map__clone(pos);
+
+			if (before == NULL) {
+				err = -ENOMEM;
+				goto put_map;
+			}
+
+			before->end = map->start;
+			__maps__insert(maps, before);
+			if (verbose >= 2 && !use_browser)
+				map__fprintf(before, fp);
+			map__put(before);
+		}
+
+		if (map->end < pos->end) {
+			struct map *after = map__clone(pos);
+
+			if (after == NULL) {
+				err = -ENOMEM;
+				goto put_map;
+			}
+
+			after->start = map->end;
+			after->pgoff += map->end - pos->start;
+			assert(pos->map_ip(pos, map->end) == after->map_ip(after, map->end));
+			__maps__insert(maps, after);
+			if (verbose >= 2 && !use_browser)
+				map__fprintf(after, fp);
+			map__put(after);
+		}
+put_map:
+		map__put(pos);
+
+		if (err)
+			goto out;
+	}
+
+	err = 0;
+out:
+	up_write(&maps->lock);
+	return err;
+}
+
+/*
+ * XXX This should not really _copy_ te maps, but refcount them.
+ */
+int maps__clone(struct thread *thread, struct maps *parent)
+{
+	struct maps *maps = thread->maps;
+	int err;
+	struct map *map;
+
+	down_read(&parent->lock);
+
+	maps__for_each_entry(parent, map) {
+		struct map *new = map__clone(map);
+
+		if (new == NULL) {
+			err = -ENOMEM;
+			goto out_unlock;
+		}
+
+		err = unwind__prepare_access(maps, new, NULL);
+		if (err)
+			goto out_unlock;
+
+		maps__insert(maps, new);
+		map__put(new);
+	}
+
+	err = 0;
+out_unlock:
+	up_read(&parent->lock);
+	return err;
+}
+
+static void __maps__insert(struct maps *maps, struct map *map)
+{
+	struct rb_node **p = &maps->entries.rb_node;
+	struct rb_node *parent = NULL;
+	const u64 ip = map->start;
+	struct map *m;
+
+	while (*p != NULL) {
+		parent = *p;
+		m = rb_entry(parent, struct map, rb_node);
+		if (ip < m->start)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+
+	rb_link_node(&map->rb_node, parent, p);
+	rb_insert_color(&map->rb_node, &maps->entries);
+	map__get(map);
+}
+
+struct map *maps__find(struct maps *maps, u64 ip)
+{
+	struct rb_node *p;
+	struct map *m;
+
+	down_read(&maps->lock);
+
+	p = maps->entries.rb_node;
+	while (p != NULL) {
+		m = rb_entry(p, struct map, rb_node);
+		if (ip < m->start)
+			p = p->rb_left;
+		else if (ip >= m->end)
+			p = p->rb_right;
+		else
+			goto out;
+	}
+
+	m = NULL;
+out:
+	up_read(&maps->lock);
+	return m;
+}
+
+struct map *maps__first(struct maps *maps)
+{
+	struct rb_node *first = rb_first(&maps->entries);
+
+	if (first)
+		return rb_entry(first, struct map, rb_node);
+	return NULL;
+}