path: root/arch/arc/kernel/perf_event.c
2021-12-29  arc: perf: Move static structs to where they're really used  (Alexey Brodkin; 1 file, -0/+162)
It is all well described by Stephen Rothwell, who initially spotted it:

----------------------------->8----------------------------
After merging the origin tree, today's linux-next build (arc
haps_hs_smp_defconfig+kselftest) produced these warnings:

  arch/arc/include/asm/perf_event.h:126:27: warning: 'arc_pmu_cache_map' defined but not used [-Wunused-const-variable=]
  arch/arc/include/asm/perf_event.h:91:27: warning: 'arc_pmu_ev_hw_map' defined but not used [-Wunused-const-variable=]

Introduced by commit 0dd450fe13da ("ARC: Add perf support for ARC700 cores")

The 2 static arrays should be moved into arch/arc/kernel/perf_event.c
(the only place that uses them). We get the warning because perf_event.h
is also included by arch/arc/kernel/unaligned.c.
----------------------------->8----------------------------

This can easily be reproduced by running make with "W=1" on any up-to-date sources, which enables extra warnings (in particular "-Wunused-const-variable") that are otherwise disabled by default in the top-level Makefile because "These warnings generated too much noise in a regular build".

Cc: Mischa Jonker <mjonker@synopsys.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
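The pattern behind the warning, reduced to a hypothetical two-file example (the file and array names are made up for illustration):

---------->8-----------
/* some_header.h (hypothetical): every .c file including this gets its
 * own private copy of the array.
 */
static const int my_map[] = { 1, 2, 3 };

/* a.c includes the header and reads my_map: no warning.
 * b.c includes the header but never references my_map: building with
 * W=1 then emits
 *     warning: 'my_map' defined but not used [-Wunused-const-variable=]
 * Moving the definition into a.c, the only user, removes both the
 * duplicate copy and the warning.
 */
---------->8-----------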
2021-12-29  ARC: perf: fix misleading comment about pmu vs counter stop  (Vineet Gupta; 1 file, -1/+1)
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
2021-12-29  ARC: perf: Remove redundant initialization of variable idx  (Colin Ian King; 1 file, -1/+1)
The variable idx is being initialized with a value that is never read; it is updated later on. The assignment is redundant and can be removed.

Reviewed-by: Vladimir Isaev <isaev@synopsys.com>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
2020-10-22  ARC: perf: redo the pct irq missing in device-tree handling  (Vineet Gupta; 1 file, -9/+18)
commit feb92d7d3813456c11dce21 ("ARC: perf: don't bail setup if pct irq missing in device-tree") introduced a silly brown-paper-bag bug: the assignment and comparison in an if statement were not bracketed correctly, so '>=' bound tighter than '=' and irq received the boolean result of the comparison rather than the value returned by platform_get_irq() (see the sketch below).

|
|	if (has_interrupts && (irq = platform_get_irq(pdev, 0) >= 0)) {
|	                       ^^^                                ^^^^
|

And given such a chance, the compiler will bite you hard, fully entitled to generating this piece of beauty:

|
|	# if (has_interrupts && (irq = platform_get_irq(pdev, 0) >= 0)) {
|
|	bl.d @platform_get_irq      <-- irq returned in r0
|
|	setge r2, r0, 0             <-- r2 is bool 1 or 0 if irq >= 0 true/false
|	brlt.d r0, 0, @.L114
|
|	st_s r2,[sp]                <-- irq saved is bool 1 or 0, not actual return val
|	st 1,[r3,160]  # arc_pmu.18_29->irq    <-- drops bool and assumes 1
|
|	# return __request_percpu_irq(irq, handler, 0,
|
|	bl.d @__request_percpu_irq
|	mov_s r0,1                  <-- drops even bool and assumes 1 which fails

With the snafu fixed, everything is as expected:

|	bl.d @platform_get_irq      <-- returns irq in r0
|
|	mov_s r2,r0
|	brlt.d r2, 0, @.L112
|
|	st_s r0,[sp]                <-- irq saved is actual return value above
|	st r0,[r13,160]  # arc_pmu.18_27->irq
|
|	bl.d @__request_percpu_irq  <-- r0 unchanged so actual irq returned
|	add r4,r4,r12  #, tmp363, __ptr

Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
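For reference, the broken vs. fixed bracketing in plain C, using the same expression as quoted above (only the bracketing differs):

---------->8-----------
/* Broken: '>=' binds tighter than '=', so irq is assigned the boolean
 * result of the comparison (0 or 1), not platform_get_irq()'s return value.
 */
if (has_interrupts && (irq = platform_get_irq(pdev, 0) >= 0)) {
	/* irq == 1 here even when the real IRQ number was, say, 20 */
}

/* Fixed: parenthesize the assignment first, then compare against 0. */
if (has_interrupts && ((irq = platform_get_irq(pdev, 0)) >= 0)) {
	/* irq now holds the actual interrupt number */
}
---------->8-----------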
2020-08-17  ARC: perf: don't bail setup if pct irq missing in device-tree  (Vineet Gupta; 1 file, -10/+4)
Current code inadvertently bails if the hardware supports sampling/overflow interrupts but the irq is missing from the device tree.

|
|	# perf stat -e cycles,instructions,major-faults,minor-faults ../hackbench
|	Running with 10 groups 400 process
|	Time: 0.921
|
|	Performance counter stats for '../hackbench':
|
|	<not supported>      cycles
|	<not supported>      instructions
|	              0      major-faults
|	           8679      minor-faults

This need not be the case, as we can still do simple counting-based perf stat. This unborks perf on HSDK-4xD.

Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-10-22  ARC: perf: Accommodate big-endian CPU  (Alexey Brodkin; 1 file, -2/+2)
8-letter strings representing ARC perf events are stored in two 32-bit registers as ASCII characters, like "IJMP", "IALL", "IJMPTAK" etc. The same order of bytes in the word is used regardless of CPU endianness, which means that on a big-endian CPU core we need to swap bytes to get the same order as on a little-endian CPU. Otherwise we see the following error message on boot:

------------------------->8----------------------
ARC perf        : 8 counters (32 bits), 40 conditions, [overflow IRQ support]
sysfs: cannot create duplicate filename '/devices/arc_pct/events/pmji'
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.18 #3
Stack Trace:
  arc_unwind_core+0xd4/0xfc
  dump_stack+0x64/0x80
  sysfs_warn_dup+0x46/0x58
  sysfs_add_file_mode_ns+0xb2/0x168
  create_files+0x70/0x2a0
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at kernel/events/core.c:12144 perf_event_sysfs_init+0x70/0xa0
Failed to register pmu: arc_pct, reason -17
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.18 #3
Stack Trace:
  arc_unwind_core+0xd4/0xfc
  dump_stack+0x64/0x80
  __warn+0x9c/0xd4
  warn_slowpath_fmt+0x22/0x2c
  perf_event_sysfs_init+0x70/0xa0
---[ end trace a75fb9a9837bd1ec ]---
------------------------->8----------------------

What happens here is that we try to register more than one raw perf event with the same name "PMJI". Why? Because ARC perf events are 4 to 8 letters and encoded into two 32-bit words. In this particular case we deal with 2 events:
 * "IJMP____" which counts all jump & branch instructions
 * "IJMPC___" which counts only conditional jumps & branches
Those strings are split into two 32-bit words this way: "IJMP" + "____" and "IJMP" + "C___" correspondingly. Now if we read them swapped because the CPU core is big-endian, then we read "PMJI" + "____" and "PMJI" + "___C". And since we interpret the read array of ASCII letters as a null-terminated string, on a big-endian CPU we end up with 2 events of the same name "PMJI".

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: stable@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
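A sketch of the corresponding byte-order handling; the union layout and register names follow the ARC PCT convention but should be read as assumptions, not the exact patch:

---------->8-----------
union cc_name_sketch {
	struct {
		u32 word0, word1;
		char sentinel;		/* NUL-terminates the 8 ASCII chars */
	} indiv;
	char str[9];
};

static void arc_pct_read_name_sketch(union cc_name_sketch *cc)
{
	cc->indiv.word0 = read_aux_reg(ARC_REG_CC_NAME0);
	cc->indiv.word1 = read_aux_reg(ARC_REG_CC_NAME1);
#ifdef CONFIG_CPU_BIG_ENDIAN
	/* The hardware packs the ASCII bytes in little-endian word order,
	 * so swap on big-endian cores to keep "IJMP" from reading as "PMJI".
	 */
	cc->indiv.word0 = le32_to_cpu(cc->indiv.word0);
	cc->indiv.word1 = le32_to_cpu(cc->indiv.word1);
#endif
	cc->indiv.sentinel = '\0';
}
---------->8-----------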
2019-01-18  ARC: perf: avoid kernel killing where it is possible  (Eugeniy Paltsev; 1 file, -2/+4)
No, not gonna die tonight.

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-01-18  ARC: perf: move HW events mapping to separate function  (Eugeniy Paltsev; 1 file, -15/+33)
Move HW events mapping to a separate function to make the code more readable.

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-01-18  ARC: perf: introduce Kernel PMU events support  (Eugeniy Paltsev; 1 file, -1/+105)
Export all available ARC architected hardware events as kernel PMU events to make non-generic events accessible.

ARC PMU hardware allows us to read the list of all available event names, so we generate the kernel PMU event list dynamically in arc_pmu_device_probe(), using the human-readable event names we get from the hardware instead of a pre-defined event list.

-------------------------->8--------------------------
$ perf list
[snip]
  arc_pmu/bdata64/          [Kernel PMU event]
  arc_pmu/bdcstall/         [Kernel PMU event]
  arc_pmu/bdslot/           [Kernel PMU event]
  arc_pmu/bfbmp/            [Kernel PMU event]
  arc_pmu/bfirqex/          [Kernel PMU event]
  arc_pmu/bflgstal/         [Kernel PMU event]
  arc_pmu/bflush/           [Kernel PMU event]
-------------------------->8--------------------------

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-01-18  ARC: perf: trivial code cleanup  (Eugeniy Paltsev; 1 file, -43/+42)
* Use BIT(), lower_32_bits(), upper_32_bits() macros; fix code style violations.
* Use u32, u64, s64 instead of uint32_t, uint64_t, int64_t.
* Fix the description comment, as this code doesn't belong only to ARC700 anymore.
* Use an SPDX License Identifier.
* Remove useless ifdefs. The ifdef around the 'arc_pmu_match' structure declaration is useless as we refer to 'arc_pmu_match' in several places which aren't guarded with an ifdef. Besides, the 'ARC' option selects 'OF' unconditionally, so we can simply get rid of this ifdef.

Acked-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-11-22  ARCv2: perf: optimize given that num counters <= 32  (Vineet Gupta; 1 file, -9/+7)
Use the ffz primitive, which maps to an ARCv2 instruction, vs. the non-atomic __test_and_set_bit. It is unlikely that we will ever have more than 32 counters, but still add a BUILD_BUG to catch that (see the sketch below).

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
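The counter-allocation idea in isolation, as a hedged sketch (the helper name and the 32-counter constant are assumptions; ffz(), __set_bit() and BUILD_BUG_ON() are the generic kernel primitives referred to above):

---------->8-----------
#include <linux/bitops.h>
#include <linux/bug.h>
#include <linux/errno.h>

#define ARC_PERF_MAX_COUNTERS	32	/* hardware limit assumed above */

static int arc_pmu_alloc_counter_sketch(unsigned long *used_mask, int n_counters)
{
	int idx;

	/* ffz() scans a single word, which is enough when there are at
	 * most 32 counters; enforce that assumption at build time.
	 */
	BUILD_BUG_ON(ARC_PERF_MAX_COUNTERS > 32);

	idx = ffz(used_mask[0]);
	if (idx >= n_counters)
		return -EAGAIN;

	__set_bit(idx, used_mask);
	return idx;
}
---------->8-----------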
2017-11-22  ARCv2: perf: tweak overflow interrupt  (Vineet Gupta; 1 file, -10/+14)
Current perf ISR loops thru all 32 counters, checking for each if it caused the interrupt. Instead only loop thru counters which actually interrupted (typically 1).

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
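A sketch of an ISR shaped that way (register names mirror the ARC PCT block but are assumptions here, not a quote of the patch):

---------->8-----------
static irqreturn_t arc_pmu_intr_sketch(int irq, void *dev)
{
	/* Bitmap of counters that actually raised this interrupt. */
	unsigned long active_ints = read_aux_reg(ARC_REG_PCT_INT_ACT);
	int idx;

	/* Walk only the set bits instead of probing all 32 counter slots. */
	for_each_set_bit(idx, &active_ints, 32) {
		/* ... read counter idx, update its perf event, re-arm ... */
	}

	/* Ack every serviced counter in one go to de-assert the line. */
	write_aux_reg(ARC_REG_PCT_INT_ACT, active_ints);

	return IRQ_HANDLED;
}
---------->8-----------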
2016-10-01  arc: perf: Enable generic "cache-references" and "cache-misses" events  (Alexey Brodkin; 1 file, -2/+4)
We used to live with PERF_COUNT_HW_CACHE_REFERENCES and PERF_COUNT_HW_CACHE_MISSES not specified on ARC. Those events are actually aliases of 2 cache events that we do support, so this change sets the "cache-references" and "cache-misses" events in the same way as "L1-dcache-loads" and "L1-dcache-load-misses".

And while at it, add debug info for cache events as well as a subtle fix in the HW events debug info: the config value is much better represented by hex, so we may see not only the event index but also any other control bits that are set.

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
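The aliasing described above, sketched as a designated-initializer map; the condition-name strings below are placeholders, not the real ARC event names:

---------->8-----------
#include <linux/perf_event.h>

static const char * const arc_pmu_ev_map_sketch[PERF_COUNT_HW_MAX] = {
	/* The generic ids simply reuse whatever strings L1-dcache-loads
	 * and L1-dcache-load-misses already map to (placeholder names).
	 */
	[PERF_COUNT_HW_CACHE_REFERENCES] = "dc_read_evt",
	[PERF_COUNT_HW_CACHE_MISSES]     = "dc_miss_evt",
};
---------->8-----------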
2016-05-30  Fix typos  (Andrea Gelmini; 1 file, -1/+1)
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-05-17  perf core: Pass max stack as a perf_callchain_entry context  (Arnaldo Carvalho de Melo; 1 file, -3/+3)
This makes perf_callchain_{user,kernel}() receive the max stack as context for the perf_callchain_entry, instead of accessing the global sysctl_perf_event_max_stack.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-12-12  ARCv2: perf: Ensure perf intr gets enabled on all cores  (Vineet Gupta; 1 file, -23/+9)
This was the second perf interrupt issue.

perf sampling on multicore requires the interrupt to be enabled on all cores. ARC perf probe code used the helper arc_request_percpu_irq(), which calls
 - request_percpu_irq() on core0
 - enable_percpu_irq() on all cores (including core0)

genirq requires that the request be made ahead of the enable call. However if perf probe happened on a non-core0 CPU (observed on a 3.18 kernel), enable would get called ahead of request, failing obviously and rendering the perf interrupt disabled on all such cores.

| [   11.120000] 1 ARC perf        : 8 counters (48 bits), 113 conditions, [overflow IRQ support]
| [   11.130000] 1 -----> enable_percpu_irq() IRQ 20 failed
| [   11.140000] 3 -----> enable_percpu_irq() IRQ 20 failed
| [   11.140000] 2 -----> enable_percpu_irq() IRQ 20 failed
| [   11.140000] 0 =====> request_percpu_irq() IRQ 20
| [   11.140000] 0 -----> enable_percpu_irq() IRQ 20

Fix this fragility by calling request_percpu_irq() on whatever core calls probe (there is no requirement on which core calls this anyway) and then calling enable on each core (see the sketch below).

Interestingly this started as an investigation of STAR 9000838902: "sporadically IRQs enabled on perf probe", which was about occasional boot spew as request_percpu_irq() got called non-locally (from an IPI) and re-enabled interrupts in the path proc_mkdir -> spin_unlock_irq(), which the irq work code didn't like:

| ARC perf        : 8 counters (48 bits), 113 conditions, [overflow IRQ support]
|
| BUG: failure at ../kernel/irq_work.c:135/irq_work_run_list()!
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.10-01127-g285efb8e66d1 #2
|
| Stack Trace:
|  arc_unwind_core.constprop.1+0x94/0x104
|  dump_stack+0x62/0x98
|  irq_work_run_list+0xb0/0xb4
|  irq_work_run+0x22/0x3c
|  do_IPI+0x74/0x9c
|  handle_irq_event_percpu+0x34/0x164
|  handle_percpu_irq+0x58/0x78
|  generic_handle_irq+0x1e/0x2c
|  arch_do_IRQ+0x3c/0x60
|  ret_from_exception+0x0/0x8

Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Alexey Brodkin <abrodkin@synopsys.com>
Cc: <stable@vger.kernel.org> #4.2+
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
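The fixed ordering, sketched with the genirq calls only (everything else, including the per-cpu device id, is an assumption for illustration):

---------->8-----------
#include <linux/interrupt.h>
#include <linux/irq.h>
#include <linux/percpu.h>
#include <linux/smp.h>

static DEFINE_PER_CPU(unsigned long, arc_pmu_dev_sketch);

static void arc_cpu_pmu_irq_enable_sketch(void *info)
{
	enable_percpu_irq(*(int *)info, IRQ_TYPE_NONE);
}

static int arc_pmu_setup_irq_sketch(int irq, irq_handler_t handler)
{
	int ret;

	/* genirq requires the request to precede any enable_percpu_irq(),
	 * so request once on whichever core runs the probe...
	 */
	ret = request_percpu_irq(irq, handler, "ARC perf counters",
				 &arc_pmu_dev_sketch);
	if (ret)
		return ret;

	/* ...and only then enable it on every core, including this one. */
	on_each_cpu(arc_cpu_pmu_irq_enable_sketch, &irq, 1);
	return 0;
}
---------->8-----------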
2015-08-27  ARCv2: perf: Finally introduce HS perf unit  (Vineet Gupta; 1 file, -1/+2)
With all features in place, the ARC HS pct block can now be effectively allowed to be probed/used.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-27  ARCv2: perf: SMP support  (Alexey Brodkin; 1 file, -15/+54)
* split off pmu info into singleton and per-cpu bits
* setup PMU on all cores

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-27  ARCv2: perf: implement exclusion of event counting in user or kernel mode  (Alexey Brodkin; 1 file, -2/+14)
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-27  ARCv2: perf: Support sampling events using overflow interrupts  (Alexey Brodkin; 1 file, -8/+120)
In the days of ARC 700, performance counters didn't have interrupt support, so for ARC we only had support for non-sampling events. Put simply, only "perf stat" was functional.

Now with ARC HS we have interrupt support in the performance counters, which this change introduces support for.

ARC performance counters act in the following way with regard to interrupt generation:
 [1] A counter counts starting from the value set in the PCT_COUNT register pair
 [2] Once the counter reaches the value set in PCT_INT_CNT, an interrupt is raised

The basic setup looks like this (see the sketch below):
 [1] PCT_COUNT = 0;
 [2] PCT_INT_CNT = __limit_value__;
 [3] Enable interrupts for that counter and let it run
 [4] Let the counter reach its limit
 [5] Handle the interrupt when it happens

Note that the PCT HW block is built into the CPU core, and so its interrupt line (which is basically the OR of all counters' IRQs) is wired directly to the top-level IRQC. That means that to de-assert the PCT interrupt it is required to reset the IRQs from all counters that have reached their limit values.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
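The [1]-[5] sequence above, as a per-counter sketch; the aux register names mirror the ARC PCT block but are assumptions here:

---------->8-----------
static void arc_pmu_arm_counter_sketch(int idx, u64 limit)
{
	write_aux_reg(ARC_REG_PCT_INDEX, idx);		/* select counter */

	/* [1] start counting from 0 */
	write_aux_reg(ARC_REG_PCT_COUNTL, 0);
	write_aux_reg(ARC_REG_PCT_COUNTH, 0);

	/* [2] raise an interrupt once the counter reaches the limit */
	write_aux_reg(ARC_REG_PCT_INT_CNTL, lower_32_bits(limit));
	write_aux_reg(ARC_REG_PCT_INT_CNTH, upper_32_bits(limit));

	/* [3] enable the interrupt for this counter and let it run;
	 * [4]/[5] the overflow is then handled in the PMU ISR, which must
	 * also ack this counter's IRQ since all counters share one line.
	 */
	write_aux_reg(ARC_REG_PCT_INT_CTRL, BIT(idx));
}
---------->8-----------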
2015-08-27ARCv2: perf: implement "event_set_period"Alexey Brodkin1-16/+63
This generalization prepares for support of overflow interrupts. Hardware event counters on ARC work that way: Each counter counts from programmed start value (set in ARC_REG_PCT_COUNT) to a limit value (set in ARC_REG_PCT_INT_CNT) and once limit value is reached this timer generates an interrupt. Even though this hardware implementation allows for more flexibility, in Linux kernel we decided to mimic behavior of other architectures this way: [1] Set limit value as half of counter's max value (to allow counter to run after reaching it limit, see below for more explanation): ---------->8----------- arc_pmu->max_period = (1ULL << counter_size) / 2 - 1ULL; ---------->8----------- [2] Set start value as "arc_pmu->max_period - sample_period" and then count up to the limit Our event counters don't stop on reaching max value (the one we set in ARC_REG_PCT_INT_CNT) but continue to count until kernel explicitly stops each of them. And setting a limit as half of counter capacity is done to allow capturing of additional events in between moment when interrupt was triggered until we're actually processing PMU interrupts. That way we're trying to be more precise. For example if we count CPU cycles we keep track of cycles while running through generic IRQ handling code: [1] We set counter period as say 100_000 events of type "crun" [2] Counter reaches that limit and raises its interrupt [3] Once we get in PMU IRQ handler we read current counter value from ARC_REG_PCT_SNAP ans see there something like 105_000. If counters stop on reaching a limit value then we would miss additional 5000 cycles. Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
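The period arithmetic in isolation, as a minimal sketch (not the full event_set_period implementation):

---------->8-----------
/* max_period is half the counter range, per the snippet above:
 *	arc_pmu->max_period = (1ULL << counter_size) / 2 - 1ULL;
 * The counter is started at (max_period - sample_period) so it raises
 * its interrupt after sample_period events, yet keeps counting usefully
 * past that point until the ISR gets around to reading it.
 */
static u64 arc_pmu_start_value_sketch(u64 max_period, u64 sample_period)
{
	u64 left = sample_period;

	if (left > max_period)
		left = max_period;

	/* value to program into the PCT_COUNT register pair */
	return max_period - left;
}
---------->8-----------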
2015-08-27  ARC: perf: cap the number of counters to hardware max of 32  (Vineet Gupta; 1 file, -3/+3)
The number of counters in PCT can never be more than 32 (while countable conditions could be 100+) for both ARCompact and ARCv2.

And while at it, update copyright dates.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-08-20  ARC: add/fix some comments in code - no functional change  (Vineet Gupta; 1 file, -2/+2)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-06-19  ARC: perf: Remove unnecessary local variable  (Tobias Klauser; 1 file, -4/+2)
Directly return the result of perf_pmu_register() in arc_pmu_device_probe() instead of assigning and returning variable ret.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-06-19  arc: fix use of uninitialized arc_pmu  (Max Filippov; 1 file, -1/+0)
The static arc_pmu in arch/arc/kernel/perf_event.c is not initialized, as it is shadowed by a local variable of the same name in arc_pmu_device_probe() (see the sketch below).

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Fixes: 03c94fcf954d "ARC: perf: make @arc_pmu static global"
CC: <stable@vger.kernel.org> # 4.1
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
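The bug class, reduced to a self-contained illustration (names are illustrative, not the driver's actual code):

---------->8-----------
struct arc_pmu_sketch { int n_counters; };

static struct arc_pmu_sketch pmu_storage;
static struct arc_pmu_sketch *arc_pmu;	/* what the rest of the file dereferences */

static int arc_pmu_device_probe_sketch(void)
{
	/* BUG: this local declaration shadows the file-scope pointer, so
	 * every other function still sees the never-assigned (NULL) one.
	 * The fix is to drop the local and assign:  arc_pmu = &pmu_storage;
	 */
	struct arc_pmu_sketch *arc_pmu = &pmu_storage;

	arc_pmu->n_counters = 8;
	return 0;
}
---------->8-----------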
2015-04-20  ARC: perf: don't add code for impossible case  (Vineet Gupta; 1 file, -4/+1)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-04-20  ARC: perf: Rename DT binding to not confuse with power mgmt  (Vineet Gupta; 1 file, -2/+2)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-04-20  ARC: perf: add user space attribution in callchains  (Vineet Gupta; 1 file, -0/+10)
The actual user space unwinding is more involved, so simply capture the user space PC (see the sketch below).

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
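What "simply capture the user space PC" roughly looks like, written against the current perf_callchain_entry_ctx signature (a sketch, not necessarily the exact upstream code):

---------->8-----------
#include <linux/perf_event.h>
#include <linux/ptrace.h>

void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
			 struct pt_regs *regs)
{
	/* A full user-space unwind needs frame/stack walking; recording
	 * just the user PC still attributes the sample to the right spot.
	 */
	perf_callchain_store(entry, instruction_pointer(regs));
}
---------->8-----------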
2015-04-20  ARC: perf: Add kernel callchain support  (Vineet Gupta; 1 file, -0/+29)
Signed-off-by: Mischa Jonker <mjonker@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-04-20  ARC: perf: Add some comments/debug stuff  (Vineet Gupta; 1 file, -5/+13)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-04-20  ARC: perf: make @arc_pmu static global  (Vineet Gupta; 1 file, -5/+2)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2014-10-13  ARC: boot: cpu feature print enhancements  (Vineet Gupta; 1 file, -12/+10)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2014-06-18  arc, perf: Use common PMU interrupt disabled code  (Vince Weaver; 1 file, -4/+3)
Transition to using the new generic PERF_PMU_CAP_NO_INTERRUPT method for failing a sampling event when no PMU interrupt is available.

Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rob Herring <robh+dt@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1406150159280.16738@vincent-weaver-1.umelst.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
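The generic mechanism being transitioned to, sketched (the surrounding probe-time names are assumptions; PERF_PMU_CAP_NO_INTERRUPT and struct pmu are the real perf-core pieces):

---------->8-----------
#include <linux/perf_event.h>

static void arc_pmu_set_caps_sketch(struct pmu *pmu, bool has_interrupts)
{
	/* With no overflow interrupt, advertise the capability flag and let
	 * perf core itself refuse sampling events; counting still works.
	 */
	if (!has_interrupts)
		pmu->capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
}
---------->8-----------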
2013-11-28  ARC: [perf] Fix a few thinkos  (Vineet Gupta; 1 file, -2/+2)
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-11-15  ARC: perf: ARC 700 PMU doesn't support sampling events  (Mischa Jonker; 1 file, -0/+4)
The ARC 700 PMU does not have an interrupt associated with it, and as such it cannot trigger when a counter overflows. As the counters are 48 bit, it will usually take at least 100 days before a counter overflows, so for mere counting of events there is no problem. Sampling is not supported, though.

Signed-off-by: Mischa Jonker <mjonker@synopsys.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-11-12  ARC: Add perf support for ARC700 cores  (Mischa Jonker; 1 file, -0/+322)
This adds basic perf support for ARC700 cores. Most PERF_COUNT_HW* events are supported now.

Signed-off-by: Mischa Jonker <mjonker@synopsys.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>