summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-11-25io_uring: make poll refs more robustPavel Begunkov1-1/+35
poll_refs carry two functions, the first is ownership over the request. The second is notifying the io_poll_check_events() that there was an event but wake up couldn't grab the ownership, so io_poll_check_events() should retry. We want to make poll_refs more robust against overflows. Instead of always incrementing it, which covers two purposes with one atomic, check if poll_refs is elevated enough and if so set a retry flag without attempts to grab ownership. The gap between the bias check and following atomics may seem racy, but we don't need it to be strict. Moreover there might only be maximum 4 parallel updates: by the first and the second poll entries, __io_arm_poll_handler() and cancellation. From those four, only poll wake ups may be executed multiple times, but they're protected by a spin. Cc: stable@vger.kernel.org Reported-by: Lin Ma <linma@zju.edu.cn> Fixes: aa43477b04025 ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c762bc31f8683b3270f3587691348a7119ef9c9d.1668963050.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-25io_uring: cmpxchg for poll arm refs releasePavel Begunkov1-5/+3
Replace atomically substracting the ownership reference at the end of arming a poll with a cmpxchg. We try to release ownership by setting 0 assuming that poll_refs didn't change while we were arming. If it did change, we keep the ownership and use it to queue a tw, which is fully capable to process all events and (even tolerates spurious wake ups). It's a bit more elegant as we reduce races b/w setting the cancellation flag and getting refs with this release, and with that we don't have to worry about any kinds of underflows. It's not the fastest path for polling. The performance difference b/w cmpxchg and atomic dec is usually negligible and it's not the fastest path. Cc: stable@vger.kernel.org Fixes: aa43477b04025 ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0c95251624397ea6def568ff040cad2d7926fd51.1668963050.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-18io_uring: disallow self-propelled ring pollingPavel Begunkov1-0/+2
When we post a CQE we wake all ring pollers as it normally should be. However, if a CQE was generated by a multishot poll request targeting its own ring, it'll wake that request up, which will make it to post a new CQE, which will wake the request and so on until it exhausts all CQ entries. Don't allow multishot polling io_uring files but downgrade them to oneshots, which was always stated as a correct behaviour that the userspace should check for. Cc: stable@vger.kernel.org Fixes: aa43477b04025 ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3124038c0e7474d427538c2d915335ec28c92d21.1668785722.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-17io_uring: fix multishot recv request leaksPavel Begunkov1-9/+7
Having REQ_F_POLLED set doesn't guarantee that the request is executed as a multishot from the polling path. Fortunately for us, if the code thinks it's multishot issue when it's not, it can only ask to skip completion so leaking the request. Use issue_flags to mark multipoll issues. Cc: stable@vger.kernel.org Fixes: 1300ebb20286b ("io_uring: multishot recv") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/37762040ba9c52b81b92a2f5ebfd4ee484088951.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-17io_uring: fix multishot accept request leaksPavel Begunkov4-8/+8
Having REQ_F_POLLED set doesn't guarantee that the request is executed as a multishot from the polling path. Fortunately for us, if the code thinks it's multishot issue when it's not, it can only ask to skip completion so leaking the request. Use issue_flags to mark multipoll issues. Cc: stable@vger.kernel.org Fixes: 390ed29b5e425 ("io_uring: add IORING_ACCEPT_MULTISHOT for accept") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7700ac57653f2823e30b34dc74da68678c0c5f13.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-17io_uring: fix tw losing poll eventsPavel Begunkov1-0/+7
We may never try to process a poll wake and its mask if there was multiple wake ups racing for queueing up a tw. Force io_poll_check_events() to update the mask by vfs_poll(). Cc: stable@vger.kernel.org Fixes: aa43477b04025 ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/00344d60f8b18907171178d7cf598de71d127b0b.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-17io_uring: update res mask in io_poll_check_eventsPavel Begunkov1-0/+3
When io_poll_check_events() collides with someone attempting to queue a task work, it'll spin for one more time. However, it'll continue to use the mask from the first iteration instead of updating it. For example, if the first wake up was a EPOLLIN and the second EPOLLOUT, the userspace will not get EPOLLOUT in time. Clear the mask for all subsequent iterations to force vfs_poll(). Cc: stable@vger.kernel.org Fixes: aa43477b04025 ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2dac97e8f691231049cb259c4ae57e79e40b537c.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-11io_uring/poll: lockdep annote io_poll_req_insert_lockedPavel Begunkov1-0/+2
Add a lockdep annotation in io_poll_req_insert_locked(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8115d8e702733754d0aea119e9b5bb63d1eb8b24.1668184658.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-11io_uring/poll: fix double poll req->flags racesPavel Begunkov1-12/+17
io_poll_double_prepare() | io_poll_wake() | poll->head = NULL smp_load(&poll->head); /* NULL */ | flags = req->flags; | | req->flags &= ~SINGLE_POLL; req->flags = flags | DOUBLE_POLL | The idea behind io_poll_double_prepare() is to serialise with the first poll entry by taking the wq lock. However, it's not safe to assume that io_poll_wake() is not running when we can't grab the lock and so we may race modifying req->flags. Skip double poll setup if that happens. It's ok because the first poll entry will only be removed when it's definitely completing, e.g. pollfree or oneshot with a valid mask. Fixes: 49f1c68e048f1 ("io_uring: optimise submission side poll_refs") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b7fab2d502f6121a7d7b199fe4d914a43ca9cdfd.1668184658.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-10io_uring: check for rollover of buffer ID when providing buffersJens Axboe1-0/+2
We already check if the chosen starting offset for the buffer IDs fit within an unsigned short, as 65535 is the maximum value for a provided buffer. But if the caller asks to add N buffers at offset M, and M + N would exceed the size of the unsigned short, we simply add buffers with wrapping around the ID. This is not necessarily a bug and could in fact be a valid use case, but it seems confusing and inconsistent with the initial check for starting offset. Let's check for wrap consistently, and error the addition if we do need to wrap. Reported-by: Olivier Langlois <olivier@trillion01.com> Link: https://github.com/axboe/liburing/issues/726 Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-08io_uring: calculate CQEs from the user visible valueDylan Yudaken1-2/+8
io_cqring_wait (and it's wake function io_has_work) used cached_cq_tail in order to calculate the number of CQEs. cached_cq_tail is set strictly before the user visible rings->cq.tail However as far as userspace is concerned, if io_uring_enter(2) is called with a minimum number of events, they will verify by checking rings->cq.tail. It is therefore possible for io_uring_enter(2) to return early with fewer events visible to the user. Instead make the wait functions read from the user visible value, so there will be no discrepency. This is triggered eventually by the following reproducer: struct io_uring_sqe *sqe; struct io_uring_cqe *cqe; unsigned int cqe_ready; struct io_uring ring; int ret, i; ret = io_uring_queue_init(N, &ring, 0); assert(!ret); while(true) { for (i = 0; i < N; i++) { sqe = io_uring_get_sqe(&ring); io_uring_prep_nop(sqe); sqe->flags |= IOSQE_ASYNC; } ret = io_uring_submit(&ring); assert(ret == N); do { ret = io_uring_wait_cqes(&ring, &cqe, N, NULL, NULL); } while(ret == -EINTR); cqe_ready = io_uring_cq_ready(&ring); assert(!ret); assert(cqe_ready == N); io_uring_cq_advance(&ring, N); } Fixes: ad3eb2c89fb2 ("io_uring: split overflow state into SQ and CQ side") Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221108153016.1854297-1-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-06io_uring: fix typo in io_uring.h commentJens Axboe1-1/+1
Just a basic s/thig/this swap, fixing up a typo introduced by a commit added in the 6.1 release. Fixes: 9cda70f622cd ("io_uring: introduce fixed buffer support for io_uring_cmd") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-02selftests/net: don't tests batched TCP io_uring zcPavel Begunkov1-1/+1
It doesn't make sense batch submitting io_uring requests to a single TCP socket without linking or some other kind of ordering. Moreover, it causes spurious -EINTR fails due to interaction with task_work. Disable it for now and keep queue depth=1. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b547698d5938b1b1a898af1c260188d8546ded9a.1666700897.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-27io_uring: unlock if __io_run_local_work locked insideDylan Yudaken2-6/+15
It is possible for tw to lock the ring, and this was not propogated out to io_run_local_work. This can cause an unlock to be missed. Instead pass a pointer to locked into __io_run_local_work. Fixes: 8ac5d85a89b4 ("io_uring: add local task_work run helper that is entered locked") Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221027144429.3971400-3-dylany@meta.com [axboe: WARN_ON() -> WARN_ON_ONCE() and add a minor comment] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-27io_uring: use io_run_local_work_locked helperDylan Yudaken1-2/+1
prefer to use io_run_local_work_locked helper for consistency Signed-off-by: Dylan Yudaken <dylany@meta.com> Link: https://lore.kernel.org/r/20221027144429.3971400-2-dylany@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-22io_uring/net: fail zc sendmsg when unsupported by socketPavel Begunkov1-0/+2
The previous patch fails zerocopy send requests for protocols that don't support it, do the same for zerocopy sendmsg. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0854e7bb4c3d810a48ec8b5853e2f61af36a0467.1666346426.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-22io_uring/net: fail zc send when unsupported by socketPavel Begunkov1-0/+2
If a protocol doesn't support zerocopy it will silently fall back to copying. This type of behaviour has always been a source of troubles so it's better to fail such requests instead. Cc: <stable@vger.kernel.org> # 6.0 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2db3c7f16bb6efab4b04569cd16e6242b40c5cb3.1666346426.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-22net: flag sockets supporting msghdr originated zerocopyPavel Begunkov3-0/+3
We need an efficient way in io_uring to check whether a socket supports zerocopy with msghdr provided ubuf_info. Add a new flag into the struct socket flags fields. Cc: <stable@vger.kernel.org> # 6.0 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/3dafafab822b1c66308bb58a0ac738b1e3f53f74.1666346426.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-20io-wq: Fix memory leak in worker creationRafael Mendonca1-1/+1
If the CPU mask allocation for a node fails, then the memory allocated for the 'io_wqe' struct of the current node doesn't get freed on the error handling path, since it has not yet been added to the 'wqes' array. This was spotted when fuzzing v6.1-rc1 with Syzkaller: BUG: memory leak unreferenced object 0xffff8880093d5000 (size 1024): comm "syz-executor.2", pid 7701, jiffies 4295048595 (age 13.900s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000cb463369>] __kmem_cache_alloc_node+0x18e/0x720 [<00000000147a3f9c>] kmalloc_node_trace+0x2a/0x130 [<000000004e107011>] io_wq_create+0x7b9/0xdc0 [<00000000c38b2018>] io_uring_alloc_task_context+0x31e/0x59d [<00000000867399da>] __io_uring_add_tctx_node.cold+0x19/0x1ba [<000000007e0e7a79>] io_uring_setup.cold+0x1b80/0x1dce [<00000000b545e9f6>] __x64_sys_io_uring_setup+0x5d/0x80 [<000000008a8a7508>] do_syscall_64+0x5d/0x90 [<000000004ac08bec>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 0e03496d1967 ("io-wq: use private CPU mask") Cc: stable@vger.kernel.org Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Link: https://lore.kernel.org/r/20221020014710.902201-1-rafaelmendsr@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-19io_uring/msg_ring: Fix NULL pointer dereference in io_msg_send_fd()Harshit Mogalapalli1-0/+3
Syzkaller produced the below call trace: BUG: KASAN: null-ptr-deref in io_msg_ring+0x3cb/0x9f0 Write of size 8 at addr 0000000000000070 by task repro/16399 CPU: 0 PID: 16399 Comm: repro Not tainted 6.1.0-rc1 #28 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 Call Trace: <TASK> dump_stack_lvl+0xcd/0x134 ? io_msg_ring+0x3cb/0x9f0 kasan_report+0xbc/0xf0 ? io_msg_ring+0x3cb/0x9f0 kasan_check_range+0x140/0x190 io_msg_ring+0x3cb/0x9f0 ? io_msg_ring_prep+0x300/0x300 io_issue_sqe+0x698/0xca0 io_submit_sqes+0x92f/0x1c30 __do_sys_io_uring_enter+0xae4/0x24b0 .... RIP: 0033:0x7f2eaf8f8289 RSP: 002b:00007fff40939718 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2eaf8f8289 RDX: 0000000000000000 RSI: 0000000000006f71 RDI: 0000000000000004 RBP: 00007fff409397a0 R08: 0000000000000000 R09: 0000000000000039 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004006d0 R13: 00007fff40939880 R14: 0000000000000000 R15: 0000000000000000 </TASK> Kernel panic - not syncing: panic_on_warn set ... We don't have a NULL check on file_ptr in io_msg_send_fd() function, so when file_ptr is NUL src_file is also NULL and get_file() dereferences a NULL pointer and leads to above crash. Add a NULL check to fix this issue. Fixes: e6130eba8a84 ("io_uring: add support for passing fixed file descriptors") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Link: https://lore.kernel.org/r/20221019171218.1337614-1-harshit.m.mogalapalli@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-17io_uring/rw: remove leftover debug statementJens Axboe1-2/+0
This debug statement was never meant to go into the upstream release, kill it off before it ends up in a release. It was just part of the testing for the initial version of the patch. Fixes: 2ec33a6c3cca ("io_uring/rw: ensure kiocb_end_write() is always called") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-17io_uring: don't iopoll from io_ring_ctx_wait_and_kill()Pavel Begunkov1-8/+5
We should not be completing requests from a task context that has already undergone io_uring cancellations, i.e. __io_uring_cancel(), as there are some assumptions, e.g. around cached task refs draining. Remove iopolling from io_ring_ctx_wait_and_kill() as it can be called later after PF_EXITING is set with the last task_work run. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7c03cc91455c4a1af49c6b9cbda4e57ea467aa11.1665891182.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-17io_uring: reuse io_alloc_req()Pavel Begunkov1-6/+2
Don't duplicate io_alloc_req() in io_req_caches_free() but reuse the helper. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6005fc88274864a49fc3096c22d8bdd605cf8576.1665891182.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-17io_uring: kill hot path fixed file bitmap debug checksPavel Begunkov2-1/+1
We test file_table.bitmap in io_file_get_fixed() to check invariants, don't do it, it's expensive and was showing up in profiles. No reports of this triggering has come in. Move the check to the file clear instead, which will still catch any wrong usage. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cf77f2ded68d2e5b2bc7355784d969837d48e023.1665891182.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-17io_uring: remove FFS_SCMPavel Begunkov4-25/+3
THe lifetime of SCM'ed files is bound to ring_sock, which is destroyed strictly after we're done with registered file tables. This means there is no need for the FFS_SCM hack, which was not available on 32-bit builds anyway. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/984226a1045adf42dc35d8bd7fb5a8bbfa472ce1.1665891182.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-10-17Linux 6.1-rc1v6.1-rc1Linus Torvalds1-2/+2
2022-10-17Merge tag 'random-6.1-rc1-for-linus' of ↵Linus Torvalds185-421/+378
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull more random number generator updates from Jason Donenfeld: "This time with some large scale treewide cleanups. The intent of this pull is to clean up the way callers fetch random integers. The current rules for doing this right are: - If you want a secure or an insecure random u64, use get_random_u64() - If you want a secure or an insecure random u32, use get_random_u32() The old function prandom_u32() has been deprecated for a while now and is just a wrapper around get_random_u32(). Same for get_random_int(). - If you want a secure or an insecure random u16, use get_random_u16() - If you want a secure or an insecure random u8, use get_random_u8() - If you want secure or insecure random bytes, use get_random_bytes(). The old function prandom_bytes() has been deprecated for a while now and has long been a wrapper around get_random_bytes() - If you want a non-uniform random u32, u16, or u8 bounded by a certain open interval maximum, use prandom_u32_max() I say "non-uniform", because it doesn't do any rejection sampling or divisions. Hence, it stays within the prandom_*() namespace, not the get_random_*() namespace. I'm currently investigating a "uniform" function for 6.2. We'll see what comes of that. By applying these rules uniformly, we get several benefits: - By using prandom_u32_max() with an upper-bound that the compiler can prove at compile-time is ≤65536 or ≤256, internally get_random_u16() or get_random_u8() is used, which wastes fewer batched random bytes, and hence has higher throughput. - By using prandom_u32_max() instead of %, when the upper-bound is not a constant, division is still avoided, because prandom_u32_max() uses a faster multiplication-based trick instead. - By using get_random_u16() or get_random_u8() in cases where the return value is intended to indeed be a u16 or a u8, we waste fewer batched random bytes, and hence have higher throughput. This series was originally done by hand while I was on an airplane without Internet. Later, Kees and I worked on retroactively figuring out what could be done with Coccinelle and what had to be done manually, and then we split things up based on that. So while this touches a lot of files, the actual amount of code that's hand fiddled is comfortably small" * tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: prandom: remove unused functions treewide: use get_random_bytes() when possible treewide: use get_random_u32() when possible treewide: use get_random_{u8,u16}() when possible, part 2 treewide: use get_random_{u8,u16}() when possible, part 1 treewide: use prandom_u32_max() when possible, part 2 treewide: use prandom_u32_max() when possible, part 1
2022-10-17Merge tag 'perf-tools-for-v6.1-2-2022-10-16' of ↵Linus Torvalds36-71/+1265
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull more perf tools updates from Arnaldo Carvalho de Melo: - Use BPF CO-RE (Compile Once, Run Everywhere) to support old kernels when using bperf (perf BPF based counters) with cgroups. - Support HiSilicon PCIe Performance Monitoring Unit (PMU), that monitors bandwidth, latency, bus utilization and buffer occupancy. Documented in Documentation/admin-guide/perf/hisi-pcie-pmu.rst. - User space tasks can migrate between CPUs, so when tracing selected CPUs, system-wide sideband is still needed, fix it in the setup of Intel PT on hybrid systems. - Fix metricgroups title message in 'perf list', it should state that the metrics groups are to be used with the '-M' option, not '-e'. - Sync the msr-index.h copy with the kernel sources, adding support for using "AMD64_TSC_RATIO" in filter expressions in 'perf trace' as well as decoding it when printing the MSR tracepoint arguments. - Fix program header size and alignment when generating a JIT ELF in 'perf inject'. - Add multiple new Intel PT 'perf test' entries, including a jitdump one. - Fix the 'perf test' entries for 'perf stat' CSV and JSON output when running on PowerPC due to an invalid topology number in that arch. - Fix the 'perf test' for arm_coresight failures on the ARM Juno system. - Fix the 'perf test' attr entry for PERF_FORMAT_LOST, adding this option to the or expression expected in the intercepted perf_event_open() syscall. - Add missing condition flags ('hs', 'lo', 'vc', 'vs') for arm64 in the 'perf annotate' asm parser. - Fix 'perf mem record -C' option processing, it was being chopped up when preparing the underlying 'perf record -e mem-events' and thus being ignored, requiring using '-- -C CPUs' as a workaround. - Improvements and tidy ups for 'perf test' shell infra. - Fix Intel PT information printing segfault in uClibc, where a NULL format was being passed to fprintf. * tag 'perf-tools-for-v6.1-2-2022-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (23 commits) tools arch x86: Sync the msr-index.h copy with the kernel sources perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driver perf auxtrace arm: Refactor event list iteration in auxtrace_record__init() perf tests stat+json_output: Include sanity check for topology perf tests stat+csv_output: Include sanity check for topology perf intel-pt: Fix system_wide dummy event for hybrid perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc perf test: Fix attr tests for PERF_FORMAT_LOST perf test: test_intel_pt.sh: Add 9 tests perf inject: Fix GEN_ELF_TEXT_OFFSET for jit perf test: test_intel_pt.sh: Add jitdump test perf test: test_intel_pt.sh: Tidy some alignment perf test: test_intel_pt.sh: Print a message when skipping kernel tracing perf test: test_intel_pt.sh: Tidy some perf record options perf test: test_intel_pt.sh: Fix return checking again perf: Skip and warn on unknown format 'configN' attrs perf list: Fix metricgroups title message perf mem: Fix -C option behavior for perf mem record perf annotate: Add missing condition flags for arm64 ...
2022-10-16Merge tag 'kbuild-fixes-v6.1' of ↵Linus Torvalds6-11/+18
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y compile error for the combination of Clang >= 14 and GAS <= 2.35. - Drop vmlinux.bz2 from the rpm package as it just annoyingly increased the package size. - Fix modpost error under build environments using musl. - Make *.ll files keep value names for easier debugging - Fix single directory build - Prevent RISC-V from selecting the broken DWARF5 support when Clang and GAS are used together. * tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5 kbuild: fix single directory build kbuild: add -fno-discard-value-names to cmd_cc_ll_c scripts/clang-tools: Convert clang-tidy args to list modpost: put modpost options before argument kbuild: Stop including vmlinux.bz2 in the rpm's Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5
2022-10-16Merge tag 'clk-for-linus' of ↵Linus Torvalds18-129/+1831
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull more clk updates from Stephen Boyd: "This is the final part of the clk patches for this merge window. The clk rate range series needed another week to fully bake. Maxime fixed the bug that broke clk notifiers and prevented this from being included in the first pull request. He also added a unit test on top to make sure it doesn't break so easily again. The majority of the series fixes up how the clk_set_rate_*() APIs work, particularly around when the rate constraints are dropped and how they move around when reparenting clks. Overall it's a much needed improvement to the clk rate range APIs that used to be pretty broken if you looked sideways. Beyond the core changes there are a few driver fixes for a compilation issue or improper data causing clks to fail to register or have the wrong parents. These are good to get in before the first -rc so that the system actually boots on the affected devices" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (31 commits) clk: tegra: Fix Tegra PWM parent clock clk: at91: fix the build with binutils 2.27 clk: qcom: gcc-msm8660: Drop hardcoded fixed board clocks clk: mediatek: clk-mux: Add .determine_rate() callback clk: tests: Add tests for notifiers clk: Update req_rate on __clk_recalc_rates() clk: tests: Add missing test case for ranges clk: qcom: clk-rcg2: Take clock boundaries into consideration for gfx3d clk: Introduce the clk_hw_get_rate_range function clk: Zero the clk_rate_request structure clk: Stop forwarding clk_rate_requests to the parent clk: Constify clk_has_parent() clk: Introduce clk_core_has_parent() clk: Switch from __clk_determine_rate to clk_core_round_rate_nolock clk: Add our request boundaries in clk_core_init_rate_req clk: Introduce clk_hw_init_rate_request() clk: Move clk_core_init_rate_req() from clk_core_round_rate_nolock() to its caller clk: Change clk_core_init_rate_req prototype clk: Set req_rate on reparenting clk: Take into account uncached clocks in clk_set_rate_range() ...
2022-10-16Merge tag '6.1-rc-smb3-client-fixes-part2' of ↵Linus Torvalds23-726/+922
git://git.samba.org/sfrench/cifs-2.6 Pull more cifs updates from Steve French: - fix a regression in guest mounts to old servers - improvements to directory leasing (caching directory entries safely beyond the root directory) - symlink improvement (reducing roundtrips needed to process symlinks) - an lseek fix (to problem where some dir entries could be skipped) - improved ioctl for returning more detailed information on directory change notifications - clarify multichannel interface query warning - cleanup fix (for better aligning buffers using ALIGN and round_up) - a compounding fix - fix some uninitialized variable bugs found by Coverity and the kernel test robot * tag '6.1-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: smb3: improve SMB3 change notification support cifs: lease key is uninitialized in two additional functions when smb1 cifs: lease key is uninitialized in smb1 paths smb3: must initialize two ACL struct fields to zero cifs: fix double-fault crash during ntlmssp cifs: fix static checker warning cifs: use ALIGN() and round_up() macros cifs: find and use the dentry for cached non-root directories also cifs: enable caching of directories for which a lease is held cifs: prevent copying past input buffer boundaries cifs: fix uninitialised var in smb2_compound_op() cifs: improve symlink handling for smb2+ smb3: clarify multichannel warning cifs: fix regression in very old smb1 mounts cifs: fix skipping to incorrect offset in emit_cached_dirents
2022-10-16Revert "cpumask: fix checking valid cpu range".Tetsuo Handa1-8/+11
This reverts commit 78e5a3399421 ("cpumask: fix checking valid cpu range"). syzbot is hitting WARN_ON_ONCE(cpu >= nr_cpumask_bits) warning at cpu_max_bits_warn() [1], for commit 78e5a3399421 ("cpumask: fix checking valid cpu range") is broken. Obviously that patch hits WARN_ON_ONCE() when e.g. reading /proc/cpuinfo because passing "cpu + 1" instead of "cpu" will trivially hit cpu == nr_cpumask_bits condition. Although syzbot found this problem in linux-next.git on 2022/09/27 [2], this problem was not fixed immediately. As a result, that patch was sent to linux.git before the patch author recognizes this problem, and syzbot started failing to test changes in linux.git since 2022/10/10 [3]. Andrew Jones proposed a fix for x86 and riscv architectures [4]. But [2] and [5] indicate that affected locations are not limited to arch code. More delay before we find and fix affected locations, less tested kernel (and more difficult to bisect and fix) before release. We should have inspected and fixed basically all cpumask users before applying that patch. We should not crash kernels in order to ask existing cpumask users to update their code, even if limited to CONFIG_DEBUG_PER_CPU_MAPS=y case. Link: https://syzkaller.appspot.com/bug?extid=d0fd2bf0dd6da72496dd [1] Link: https://syzkaller.appspot.com/bug?extid=21da700f3c9f0bc40150 [2] Link: https://syzkaller.appspot.com/bug?extid=51a652e2d24d53e75734 [3] Link: https://lkml.kernel.org/r/20221014155845.1986223-1-ajones@ventanamicro.com [4] Link: https://syzkaller.appspot.com/bug?extid=4d46c43d81c3bd155060 [5] Reported-by: Andrew Jones <ajones@ventanamicro.com> Reported-by: syzbot+d0fd2bf0dd6da72496dd@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Yury Norov <yury.norov@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-10-16lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5Nathan Chancellor1-2/+7
When building with a RISC-V kernel with DWARF5 debug info using clang and the GNU assembler, several instances of the following error appear: /tmp/vgettimeofday-48aa35.s:2963: Error: non-constant .uleb128 is not supported Dumping the .s file reveals these .uleb128 directives come from .debug_loc and .debug_ranges: .Ldebug_loc0: .byte 4 # DW_LLE_offset_pair .uleb128 .Lfunc_begin0-.Lfunc_begin0 # starting offset .uleb128 .Ltmp1-.Lfunc_begin0 # ending offset .byte 1 # Loc expr size .byte 90 # DW_OP_reg10 .byte 0 # DW_LLE_end_of_list .Ldebug_ranges0: .byte 4 # DW_RLE_offset_pair .uleb128 .Ltmp6-.Lfunc_begin0 # starting offset .uleb128 .Ltmp27-.Lfunc_begin0 # ending offset .byte 4 # DW_RLE_offset_pair .uleb128 .Ltmp28-.Lfunc_begin0 # starting offset .uleb128 .Ltmp30-.Lfunc_begin0 # ending offset .byte 0 # DW_RLE_end_of_list There is an outstanding binutils issue to support a non-constant operand to .sleb128 and .uleb128 in GAS for RISC-V but there does not appear to be any movement on it, due to concerns over how it would work with linker relaxation. To avoid these build errors, prevent DWARF5 from being selected when using clang and an assembler that does not have support for these symbol deltas, which can be easily checked in Kconfig with as-instr plus the small test program from the dwz test suite from the binutils issue. Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27215 Link: https://github.com/ClangBuiltLinux/linux/issues/1719 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-10-16kbuild: fix single directory buildMasahiro Yamada1-0/+2
Commit f110e5a250e3 ("kbuild: refactor single builds of *.ko") was wrong. KBUILD_MODULES _is_ needed for single builds. Otherwise, "make foo/bar/baz/" does not build module objects at all. Fixes: f110e5a250e3 ("kbuild: refactor single builds of *.ko") Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: David Sterba <dsterba@suse.com>
2022-10-16Merge tag 'slab-for-6.1-rc1-hotfix' of ↵Linus Torvalds2-19/+19
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab hotfix from Vlastimil Babka: "A single fix for the common-kmalloc series, for warnings on mips and sparc64 reported by Guenter Roeck" * tag 'slab-for-6.1-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation
2022-10-16Merge tag 'for-linus' of https://github.com/openrisc/linuxLinus Torvalds2-9/+9
Pull OpenRISC updates from Stafford Horne: "I have relocated to London so not much work from me while I get settled. Still, OpenRISC picked up two patches in this window: - Fix for kernel page table walking from Jann Horn - MAINTAINER entry cleanup from Palmer Dabbelt" * tag 'for-linus' of https://github.com/openrisc/linux: MAINTAINERS: git://github -> https://github.com for openrisc openrisc: Fix pagewalk usage in arch_dma_{clear, set}_uncached
2022-10-16Merge tag 'pci-v6.1-fixes-1' of ↵Linus Torvalds1-61/+1
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci fix from Bjorn Helgaas: "Revert the attempt to distribute spare resources to unconfigured hotplug bridges at boot time. This fixed some dock hot-add scenarios, but Jonathan Cameron reported that it broke a topology with a multi-function device where one function was a Switch Upstream Port and the other was an Endpoint" * tag 'pci-v6.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: Revert "PCI: Distribute available resources for root buses, too"
2022-10-15mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocationHyeonggon Yoo2-19/+19
After commit d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator"), SLAB passes large ( > PAGE_SIZE * 2) requests to buddy like SLUB does. SLAB has been using kmalloc caches to allocate freelist_idx_t array for off slab caches. But after the commit, freelist_size can be bigger than KMALLOC_MAX_CACHE_SIZE. Instead of using pointer to kmalloc cache, use kmalloc_node() and only check if the kmalloc cache is off slab during calculate_slab_order(). If freelist_size > KMALLOC_MAX_CACHE_SIZE, no looping condition happens as it allocates freelist_idx_t array directly from buddy. Link: https://lore.kernel.org/all/20221014205818.GA1428667@roeck-us.net/ Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Fixes: d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator") Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-10-15MAINTAINERS: git://github -> https://github.com for openriscPalmer Dabbelt1-1/+1
Github deprecated the git:// links about a year ago, so let's move to the https:// URLs instead. Reported-by: Conor Dooley <conor.dooley@microchip.com> Link: https://github.blog/2021-09-01-improving-git-protocol-security-github/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Stafford Horne <shorne@gmail.com>
2022-10-15smb3: improve SMB3 change notification supportSteve French6-13/+90
Change notification is a commonly supported feature by most servers, but the current ioctl to request notification when a directory is changed does not return the information about what changed (even though it is returned by the server in the SMB3 change notify response), it simply returns when there is a change. This ioctl improves upon CIFS_IOC_NOTIFY by returning the notify information structure which includes the name of the file(s) that changed and why. See MS-SMB2 2.2.35 for details on the individual filter flags and the file_notify_information structure returned. To use this simply pass in the following (with enough space to fit at least one file_notify_information structure) struct __attribute__((__packed__)) smb3_notify { uint32_t completion_filter; bool watch_tree; uint32_t data_len; uint8_t data[]; } __packed; using CIFS_IOC_NOTIFY_INFO 0xc009cf0b or equivalently _IOWR(CIFS_IOCTL_MAGIC, 11, struct smb3_notify_info) The ioctl will block until the server detects a change to that directory or its subdirectories (if watch_tree is set). Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-15cifs: lease key is uninitialized in two additional functions when smb1Steve French1-2/+2
cifs_open and _cifsFileInfo_put also end up with lease_key uninitialized in smb1 mounts. It is cleaner to set lease key to zero in these places where leases are not supported (smb1 can not return lease keys so the field was uninitialized). Addresses-Coverity: 1514207 ("Uninitialized scalar variable") Addresses-Coverity: 1514331 ("Uninitialized scalar variable") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-15cifs: lease key is uninitialized in smb1 pathsSteve French1-1/+1
It is cleaner to set lease key to zero in the places where leases are not supported (smb1 can not return lease keys so the field was uninitialized). Addresses-Coverity: 1513994 ("Uninitialized scalar variable") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-15smb3: must initialize two ACL struct fields to zeroSteve French1-1/+2
Coverity spotted that we were not initalizing Stbz1 and Stbz2 to zero in create_sd_buf. Addresses-Coverity: 1513848 ("Uninitialized scalar variable") Cc: <stable@vger.kernel.org> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-15cifs: fix double-fault crash during ntlmsspPaulo Alcantara1-7/+9
The crash occurred because we were calling memzero_explicit() on an already freed sess_data::iov[1] (ntlmsspblob) in sess_free_buffer(). Fix this by not calling memzero_explicit() on sess_data::iov[1] as it's already by handled by callers. Fixes: a4e430c8c8ba ("cifs: replace kfree() with kfree_sensitive() for sensitive data") Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-10-15tools arch x86: Sync the msr-index.h copy with the kernel sourcesArnaldo Carvalho de Melo1-0/+18
To pick up the changes in: b8d1d163604bd1e6 ("x86/apic: Don't disable x2APIC if locked") ca5b7c0d9621702e ("perf/x86/amd/lbr: Add LbrExtV2 branch record support") Addressing these tools/perf build warnings: diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' That makes the beautification scripts to pick some new entries: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2022-10-14 18:06:34.294561729 -0300 +++ after 2022-10-14 18:06:41.285744044 -0300 @@ -264,6 +264,7 @@ [0xc0000102 - x86_64_specific_MSRs_offset] = "KERNEL_GS_BASE", [0xc0000103 - x86_64_specific_MSRs_offset] = "TSC_AUX", [0xc0000104 - x86_64_specific_MSRs_offset] = "AMD64_TSC_RATIO", + [0xc000010e - x86_64_specific_MSRs_offset] = "AMD64_LBR_SELECT", [0xc000010f - x86_64_specific_MSRs_offset] = "AMD_DBG_EXTN_CFG", [0xc0000300 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_STATUS", [0xc0000301 - x86_64_specific_MSRs_offset] = "AMD64_PERF_CNTR_GLOBAL_CTL", $ Now one can trace systemwide asking to see backtraces to where that MSR is being read/written, see this example with a previous update: # perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" ^C# If we use -v (verbose mode) we can see what it does behind the scenes: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB" Using CPUID AuthenticAMD-25-21-0 0x6a0 0x6a8 New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) 0x6a0 0x6a8 New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313) mmap size 528384B ^C# Example with a frequent msr: # perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2 Using CPUID AuthenticAMD-25-21-0 0x48 New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) 0x48 New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841) mmap size 528384B Looking at the vmlinux_path (8 entries long) symsrc__init: build id mismatch for vmlinux. Using /proc/kcore for kernel data Using /proc/kallsyms for symbols 0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule ([kernel.kallsyms]) futex_wait_queue_me ([kernel.kallsyms]) futex_wait ([kernel.kallsyms]) do_futex ([kernel.kallsyms]) __x64_sys_futex ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) __futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so) 0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2) do_trace_write_msr ([kernel.kallsyms]) do_trace_write_msr ([kernel.kallsyms]) __switch_to_xtra ([kernel.kallsyms]) __switch_to ([kernel.kallsyms]) __schedule ([kernel.kallsyms]) schedule_idle ([kernel.kallsyms]) do_idle ([kernel.kallsyms]) cpu_startup_entry ([kernel.kallsyms]) secondary_startup_64_no_verify ([kernel.kallsyms]) # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Sneddon <daniel.sneddon@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/lkml/Y0nQkz2TUJxwfXJd@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packetQi Liu7-0/+396
Add support for using 'perf report --dump-raw-trace' to parse PTT packet. Example usage: Output will contain raw PTT data and its textual representation, such as (8DW format): 0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000 offset: 0 ref: 0xa5d50c725 idx: 0 tid: -1 cpu: 0 . . ... HISI PTT data: size 4194304 bytes . 00000000: 00 00 00 00 Prefix . 00000004: 08 20 00 60 Header DW0 . 00000008: ff 02 00 01 Header DW1 . 0000000c: 20 08 00 00 Header DW2 . 00000010: 10 e7 44 ab Header DW3 . 00000014: 2a a8 1e 01 Time . 00000020: 00 00 00 00 Prefix . 00000024: 01 00 00 60 Header DW0 . 00000028: 0f 1e 00 01 Header DW1 . 0000002c: 04 00 00 00 Header DW2 . 00000030: 40 00 81 02 Header DW3 . 00000034: ee 02 00 00 Time .... This patch only add basic parsing support according to the definition of the PTT packet described in Documentation/trace/hisi-ptt.rst. And the fields of each packet can be further decoded following the PCIe Spec's definition of TLP packet. Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: John Garry <john.garry@huawei.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi6124@gmail.com> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Zeng Prime <prime.zeng@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20220927081400.14364-4-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driverQi Liu7-1/+273
HiSilicon PCIe tune and trace device (PTT) could dynamically tune the PCIe link's events, and trace the TLP headers). This patch add support for PTT device in perf tool, so users could use 'perf record' to get TLP headers trace data. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Acked-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi6124@gmail.com> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Zeng Prime <prime.zeng@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20220927081400.14364-3-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15perf auxtrace arm: Refactor event list iteration in auxtrace_record__init()Qi Liu1-19/+34
Add find_pmu_for_event() and use to simplify logic in auxtrace_record_init(). find_pmu_for_event() will be reused in subsequent patches. Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi6124@gmail.com> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Zeng Prime <prime.zeng@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20220927081400.14364-2-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15perf tests stat+json_output: Include sanity check for topologyAthira Rajeev1-4/+39
Testcase stat+json_output.sh fails in powerpc: 86: perf stat JSON output linter : FAILED! The testcase "stat+json_output.sh" verifies perf stat JSON output. The test covers aggregation modes like per-socket, per-core, per-die, -A (no_aggr mode) along with few other tests. It counts expected fields for various commands. For example say -A (i.e, AGGR_NONE mode), expects 7 fields in the output having "CPU" as first field. Same way, for per-socket, it expects the first field in result to point to socket id. The testcases compares the result with expected count. The values for socket, die, core and cpu are fetched from topology directory: /sys/devices/system/cpu/cpu*/topology. For example, socket value is fetched from "physical_package_id" file of topology directory. (cpu__get_topology_int() in util/cpumap.c) If a platform fails to fetch the topology information, values will be set to -1. For example, incase of pSeries platform of powerpc, value for "physical_package_id" is restricted and not exposed. So, -1 will be assigned. Perf code has a checks for valid cpu id in "aggr_printout" (stat-display.c), which displays the fields. So, in cases where topology values not exposed, first field of the output displaying will be empty. This cause the testcase to fail, as it counts number of fields in the output. Incase of -A (AGGR_NONE mode,), testcase expects 7 fields in the output, becos of -1 value obtained from topology files for some, only 6 fields are printed. Hence a testcase failure reported due to mismatch in number of fields in the output. Patch here adds a sanity check in the testcase for topology. Check will help to skip the test if -1 value found. Fixes: 0c343af2a2f82844 ("perf test: JSON format checking") Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Suggested-by: Ian Rogers <irogers@google.com> Suggested-by: James Clark <james.clark@arm.com> Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Claire Jensen <cjense@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Link: https://lore.kernel.org/r/20221006155149.67205-2-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15perf tests stat+csv_output: Include sanity check for topologyAthira Rajeev1-4/+39
Testcase stat+csv_output.sh fails in powerpc: 84: perf stat CSV output linter: FAILED! The testcase "stat+csv_output.sh" verifies perf stat CSV output. The test covers aggregation modes like per-socket, per-core, per-die, -A (no_aggr mode) along with few other tests. It counts expected fields for various commands. For example say -A (i.e, AGGR_NONE mode), expects 7 fields in the output having "CPU" as first field. Same way, for per-socket, it expects the first field in result to point to socket id. The testcases compares the result with expected count. The values for socket, die, core and cpu are fetched from topology directory: /sys/devices/system/cpu/cpu*/topology. For example, socket value is fetched from "physical_package_id" file of topology directory. (cpu__get_topology_int() in util/cpumap.c) If a platform fails to fetch the topology information, values will be set to -1. For example, incase of pSeries platform of powerpc, value for "physical_package_id" is restricted and not exposed. So, -1 will be assigned. Perf code has a checks for valid cpu id in "aggr_printout" (stat-display.c), which displays the fields. So, in cases where topology values not exposed, first field of the output displaying will be empty. This cause the testcase to fail, as it counts number of fields in the output. Incase of -A (AGGR_NONE mode,), testcase expects 7 fields in the output, becos of -1 value obtained from topology files for some, only 6 fields are printed. Hence a testcase failure reported due to mismatch in number of fields in the output. Patch here adds a sanity check in the testcase for topology. Check will help to skip the test if -1 value found. Fixes: 7473ee56dbc91c98 ("perf test: Add checking for perf stat CSV output.") Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Suggested-by: Ian Rogers <irogers@google.com> Suggested-by: James Clark <james.clark@arm.com> Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Claire Jensen <cjense@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Link: https://lore.kernel.org/r/20221006155149.67205-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>