2024-02-08  tools/resolve_btfids: Fix cross-compilation to non-host endianness  (Viktor Malik; 1 file, -0/+35)

The .BTF_ids section is pre-filled with zeroed BTF ID entries during the build and afterwards patched by resolve_btfids with correct values. Since resolve_btfids always writes in host-native endianness, it relies on libelf to do the translation when the target ELF is cross-compiled to a different endianness (this was introduced in commit 61e8aeda9398 ("bpf: Fix libelf endian handling in resolv_btfids")).

Unfortunately, the translation will corrupt the flags fields of SET8 entries because these were written during vmlinux compilation and are in the correct endianness already. This will lead to numerous selftests failures such as:

  $ sudo ./test_verifier 502 502
  #502/p sleepable fentry accept FAIL
  Failed to load prog 'Invalid argument'!
  bpf_fentry_test1 is not sleepable
  verification time 34 usec
  stack depth 0
  processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
  Summary: 0 PASSED, 0 SKIPPED, 1 FAILED

Since it's not possible to instruct libelf to translate just certain values, let's manually bswap the flags (both global and entry flags) in resolve_btfids when needed, so that libelf then translates everything correctly.

Fixes: ef2c6f370a63 ("tools/resolve_btfids: Add support for 8-byte BTF sets")
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/7b6bff690919555574ce0f13d2a5996cacf7bf69.1707223196.git.vmalik@redhat.com
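A minimal sketch of the bswap step described above (field and helper names are assumptions for illustration, not the actual resolve_btfids code):

  #include <byteswap.h>
  #include <stdbool.h>
  #include <stdint.h>

  /* hypothetical SET8 entry: BTF ID plus per-entry flags */
  struct set8_entry {
          uint32_t id;
          uint32_t flags;
  };

  /* The flags were written target-endian at vmlinux build time, so
   * pre-swap them; libelf's whole-section translation then flips them
   * back to the correct byte order. The BTF IDs themselves are written
   * host-endian by resolve_btfids and are translated correctly. */
  static void fixup_set8_flags(uint32_t *set_flags, struct set8_entry *ents,
                               uint32_t cnt, bool endianness_differs)
  {
          uint32_t i;

          if (!endianness_differs)
                  return;
          *set_flags = bswap_32(*set_flags);               /* global set flags */
          for (i = 0; i < cnt; i++)
                  ents[i].flags = bswap_32(ents[i].flags); /* entry flags */
  }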
2024-02-08  tools/resolve_btfids: Refactor set sorting with types from btf_ids.h  (Viktor Malik; 2 files, -14/+30)

Instead of using magic offsets to access BTF ID set data, leverage types from btf_ids.h (btf_id_set and btf_id_set8) which define the actual layout of the data. Thanks to this change, set sorting should also continue working if the layout changes.

This requires syncing the definition of 'struct btf_id_set8' from include/linux/btf_ids.h to tools/include/linux/btf_ids.h. We don't sync the rest of the file at the moment, because that would require also syncing multiple dependent headers, and we don't need any other definitions from btf_ids.h.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/bpf/ff7f062ddf6a00815fda3087957c4ce667f50532.1707223196.git.vmalik@redhat.com
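For reference, the two layout types are along these lines (reproduced from memory of btf_ids.h, so treat it as a sketch rather than the verbatim header):

  struct btf_id_set {
          u32 cnt;
          u32 ids[];
  };

  struct btf_id_set8 {
          u32 cnt;
          u32 flags;
          struct {
                  u32 id;
                  u32 flags;
          } pairs[];
  };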
2024-02-06  libbpf: Use OPTS_SET() macro in bpf_xdp_query()  (Toke Høiland-Jørgensen; 1 file, -2/+2)

When the feature_flags and xdp_zc_max_segs fields were added to the libbpf bpf_xdp_query_opts, the code writing them did not use the OPTS_SET() macro. This causes libbpf to write to those fields unconditionally, which means that programs compiled against an older version of libbpf (with a smaller bpf_xdp_query_opts struct) will have their stacks corrupted by libbpf writing out of bounds.

The patch adding the feature_flags field has an early bail out if the feature_flags field is not part of the opts struct (via the OPTS_HAS macro), but the patch adding xdp_zc_max_segs does not. For consistency, this fix just changes the assignments to both fields to use the OPTS_SET() macro.

Fixes: 13ce2daa259a ("xsk: add new netlink attribute dedicated for ZC max frags")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240206125922.1992815-1-toke@redhat.com
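A sketch of the shape of the fix (illustrative, not the verbatim libbpf diff):

  /* before: unconditional store may write past the end of an older,
   * smaller bpf_xdp_query_opts provided by the caller */
  opts->feature_flags = feature_flags;
  opts->xdp_zc_max_segs = zc_max_segs;

  /* after: OPTS_SET() stores only if the caller's struct is actually
   * large enough to contain the field */
  OPTS_SET(opts, feature_flags, feature_flags);
  OPTS_SET(opts, xdp_zc_max_segs, zc_max_segs);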
2024-02-06  bpf: Use -Wno-address-of-packed-member in some selftests  (Jose E. Marchesi; 3 files, -0/+6)

[Differences from V2:
 - Remove conditionals in the source files pragmas, as the pragma is supported by both GCC and clang.]

Both GCC and clang implement the -Waddress-of-packed-member warning, which is enabled by -Wall, and which warns about taking the address of a packed struct field when it can lead to an "unaligned" address. This triggers the following errors (-Werror) when building three particular BPF selftests with GCC:

  progs/test_cls_redirect.c
  986 |   if (ipv4_is_fragment((void *)&encap->ip)) {
  progs/test_cls_redirect_dynptr.c
  410 |   pkt_ipv4_checksum((void *)&encap_gre->ip);
  progs/test_cls_redirect.c
  521 |   pkt_ipv4_checksum((void *)&encap_gre->ip);
  progs/test_tc_tunnel.c
  232 |   set_ipv4_csum((void *)&h_outer.ip);

These warnings do not signal any real problem in the tests as far as I can see. This patch adds pragmas to these test files that inhibit the -Waddress-of-packed-member warning.

Tested in bpf-next master. No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240206102330.7113-1-jose.marchesi@oracle.com
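The pragma in question has the following form (its placement near the top of each affected progs/*.c file is an assumption for illustration):

  /* both GCC and clang accept this diagnostic pragma */
  #pragma GCC diagnostic ignored "-Waddress-of-packed-member"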
2024-02-06  bpf, docs: Fix typos in instruction-set.rst  (Dave Thaler; 1 file, -3/+4)

* "imm32" should just be "imm"
* Add blank line to fix formatting error reported by Stephen Rothwell [0]

[0]: https://lore.kernel.org/bpf/20240206153301.4ead0bad@canb.auug.org.au/T/#u

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240206045146.4965-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-06  selftests/bpf: mark dynptr kfuncs __weak to make them optional on old kernels  (Andrii Nakryiko; 1 file, -9/+9)

Mark dynptr kfuncs as __weak to allow verifier_global_subprogs/arg_ctx_{perf,kprobe,raw_tp} subtests to be loadable on old kernels. Because the bpf_dynptr_from_xdp() kfunc is used from the arg_tag_dynptr BPF program in progs/verifier_global_subprogs.c *and* is not marked as __weak, loading any subtest from verifier_global_subprogs fails on old kernels that don't have the bpf_dynptr_from_xdp() kfunc defined. Even if the arg_tag_dynptr program itself is not loaded, libbpf bails out on the non-weak reference to bpf_dynptr_from_xdp (which can't be resolved), and that reference is shared across all programs in progs/verifier_global_subprogs.c.

So mark all dynptr-related kfuncs as __weak to unblock libbpf CI ([0]). In the upcoming "kfunc in vmlinux.h" work we should make sure that kfuncs are always declared __weak as well.

[0] https://github.com/libbpf/libbpf/actions/runs/7792673215/job/21251250831?pr=776#step:4:7961

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240206004008.1541513-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-06  libbpf: fix return value for PERF_EVENT __arg_ctx type fix up check  (Andrii Nakryiko; 1 file, -3/+3)
If PERF_EVENT program has __arg_ctx argument with matching architecture-specific pt_regs/user_pt_regs/user_regs_struct pointer type, libbpf should still perform type rewrite for old kernels, but not emit the warning. Fix copy/paste from kernel code where 0 is meant to signify "no error" condition. For libbpf we need to return "true" to proceed with type rewrite (which for PERF_EVENT program will be a canonical `struct bpf_perf_event_data *` type). Fixes: 9eea8fafe33e ("libbpf: fix __arg_ctx type enforcement for perf_event programs") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240206002243.1439450-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-06  Merge branch 'xsk-support-redirect-to-any-socket-bound-to-the-same-umem'  (Alexei Starovoitov; 2 files, -15/+23)

Magnus Karlsson says:

====================
xsk: support redirect to any socket bound to the same umem

This patch set adds support for directing a packet to any socket bound to the same umem. This makes it possible to use the XDP program to select what socket the packet should be received on. The user can populate the XSKMAP with various sockets and as long as they share the same umem, the XDP program can pick any one of them.

The implementation is straight-forward. Instead of testing that the incoming packet is targeting the same device and queue id as the socket is bound to, just check that the umem the packet was received on is the same as the umem of the socket we want it to be received on. This guarantees that the redirect is legal as the packet is already in the correct umem.

Patch #1 implements the feature and patch #2 adds documentation.

Thanks: Magnus
====================

Link: https://lore.kernel.org/r/20240205123553.22180-1-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-06  xsk: document ability to redirect to any socket bound to the same umem  (Magnus Karlsson; 1 file, -14/+19)
Document the ability to redirect to any socket bound to the same umem. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20240205123553.22180-3-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-06  xsk: support redirect to any socket bound to the same umem  (Magnus Karlsson; 1 file, -1/+4)
Add support for directing a packet to any socket bound to the same umem. This makes it possible to use the XDP program to select what socket the packet should be received on. The user can populate the XSKMAP with various sockets and as long as they share the same umem, the XDP program can pick any one of them. Suggested-by: Yuval El-Hanany <yuvale@radware.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240205123553.22180-2-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
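Conceptually, the bind check relaxes as in the following sketch (stand-in types for illustration; the kernel structures are richer and the exact hunk differs):

  struct xdp_umem;
  struct net_device;

  struct xdp_sock {
          struct net_device *dev;
          unsigned int queue_id;
          struct xdp_umem *umem;
  };

  /* Old rule: xs->dev/xs->queue_id had to match the receiving device and
   * queue. New rule: any socket sharing the umem the packet arrived in is
   * a legal target, since the frame already resides in that umem. */
  static _Bool xsk_redirect_ok(const struct xdp_sock *xs,
                               const struct xdp_umem *rx_umem)
  {
          return xs->umem == rx_umem;
  }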
2024-02-06  Merge branch 'transfer-rcu-lock-state-across-subprog-calls'  (Alexei Starovoitov; 3 files, -2/+127)

Kumar Kartikeya Dwivedi says:

====================
Transfer RCU lock state across subprog calls

David suggested during the discussion in [0] that we should handle RCU locks in a similar fashion to spin locks where the verifier understands when a lock held in a caller is released in callee, or lock taken in callee is released in a caller, or the callee is called within a lock critical section. This set extends the same semantics to RCU read locks and adds a few selftests to verify correct behavior. This issue has also come up for sched-ext programs.

This would now allow static subprog calls to be made without errors within RCU read sections, for subprogs to release RCU locks of callers and return to them, or for subprogs to take RCU lock which is later released in the caller.

[0]: https://lore.kernel.org/bpf/20240204120206.796412-1-memxor@gmail.com

Changelog:
----------
v1 -> v2:
v1: https://lore.kernel.org/bpf/20240204230231.1013964-1-memxor@gmail.com
* Add tests for global subprog behaviour (Yafang)
* Add Acks, Tested-by (Yonghong, Yafang)
====================

Link: https://lore.kernel.org/r/20240205055646.1112186-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-06  selftests/bpf: Add tests for RCU lock transfer between subprogs  (Kumar Kartikeya Dwivedi; 2 files, -0/+126)

Add selftests covering the following cases:
- A static or global subprog called from within an RCU read section works
- A static subprog taking an RCU read lock which is released in the caller works
- A static subprog releasing the caller's RCU read lock works

Global subprogs that leave the lock in an imbalanced state will not work, as they are verified separately, so ensure those cases fail as well.

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240205055646.1112186-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-06  bpf: Transfer RCU lock state between subprog calls  (Kumar Kartikeya Dwivedi; 1 file, -2/+1)
Allow transferring an imbalanced RCU lock state between subprog calls during verification. This allows patterns where a subprog call returns with an RCU lock held, or a subprog call releases an RCU lock held by the caller. Currently, the verifier would end up complaining if the RCU lock is not released when processing an exit from a subprog, which is non-ideal if its execution is supposed to be enclosed in an RCU read section of the caller. Instead, simply only check whether we are processing exit for frame#0 and do not complain on an active RCU lock otherwise. We only need to update the check when processing BPF_EXIT insn, as copy_verifier_state is already set up to do the right thing. Suggested-by: David Vernet <void@manifault.com> Tested-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20240205055646.1112186-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
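A sketch of the relaxed BPF_EXIT check (illustrative fragment; the error string and exact field names are assumptions):

  /* in process_bpf_exit handling: only the exit from frame #0, i.e. the
   * main program, must have no RCU read lock still active; subprog exits
   * may now carry the lock state back to the caller */
  if (env->cur_state->curframe == 0 && env->cur_state->active_rcu_lock) {
          verbose(env, "bpf_rcu_read_unlock is missing\n");
          return -EINVAL;
  }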
2024-02-06  Merge branch 'enable-static-subprog-calls-in-spin-lock-critical-sections'  (Alexei Starovoitov; 5 files, -4/+120)

Kumar Kartikeya Dwivedi says:

====================
Enable static subprog calls in spin lock critical sections

This set allows a BPF program to make a call to a static subprog within a bpf_spin_lock critical section. This problem has been hit in sched-ext and ghOSt [0] as well, and is mostly an annoyance which is worked around by inlining the static subprog into the critical section.

In case of sched-ext, there are a lot of other helper/kfunc calls that need to be allowlisted for the support to be complete, but a separate follow up will deal with that.

Unlike static subprogs, global subprogs cannot be allowed yet as the verifier will not explore their body when encountering a call instruction for them. Therefore, we would need an alternative approach (some sort of function summarization to ensure a lock is never taken from a global subprog and all its callees).

[0]: https://lore.kernel.org/bpf/bd173bf2-dea6-3e0e-4176-4a9256a9a056@google.com

Changelog:
----------
v1 -> v2
v1: https://lore.kernel.org/bpf/20240204120206.796412-1-memxor@gmail.com
* Indicate global function call in verifier error string (Yonghong, David)
* Add Acks from Yonghong, David
====================

Link: https://lore.kernel.org/r/20240204222349.938118-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-06  selftests/bpf: Add test for static subprog call in lock cs  (Kumar Kartikeya Dwivedi; 3 files, -0/+111)
Add selftests for static subprog calls within bpf_spin_lock critical section, and ensure we still reject global subprog calls. Also test the case where a subprog call will unlock the caller's held lock, or the caller will unlock a lock taken by a subprog call, ensuring correct transfer of lock state across frames on exit. Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: David Vernet <void@manifault.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20240204222349.938118-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-06  bpf: Allow calling static subprogs while holding a bpf_spin_lock  (Kumar Kartikeya Dwivedi; 2 files, -4/+9)

Currently, calling any helpers, kfuncs, or subprogs except the graph data structure (lists, rbtrees) API kfuncs while holding a bpf_spin_lock is not allowed. One of the original motivations of this decision was to force the BPF programmer's hand into keeping the bpf_spin_lock critical section small, and to ensure the execution time of the program does not increase due to lock waiting times. In addition to this, some of the helpers and kfuncs may be unsafe to call while holding a bpf_spin_lock.

However, when it comes to subprog calls, at least for static subprogs, the verifier is able to explore their instructions during verification. Therefore, it is similar in effect to having the same code inlined into the critical section. Hence, not allowing static subprog calls in the bpf_spin_lock critical section is mostly an annoyance that needs to be worked around, without providing any tangible benefit.

Unlike static subprog calls, global subprog calls are not safe to permit within the critical section, as the verifier does not explore them during verification; therefore, whether the same lock will be taken again, or unlocked, cannot be ascertained. Therefore, allow calling static subprogs within a bpf_spin_lock critical section, and only reject the call in case the subprog linkage is global.

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240204222349.938118-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
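A sketch of the resulting rule in the call-checking path (illustrative, not the exact verifier.c hunk):

  if (env->cur_state->active_lock.ptr) {
          /* static subprogs are explored inline during verification, so
           * they are as safe as inlined code; global subprogs are not
           * explored, so reject only those */
          if (subprog_is_global(env, subprog)) {
                  verbose(env, "global function calls are not allowed while holding a lock\n");
                  return -EINVAL;
          }
  }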
2024-02-06  bpf, docs: Expand set of initial conformance groups  (Dave Thaler; 1 file, -12/+36)

This patch attempts to update the ISA specification according to the latest mailing list discussion about conformance groups, in a way that is intended to be consistent with IANA registry processes and IETF 118 WG meeting discussion.

It does the following:
* Split basic into base32 and base64 for 32-bit vs 64-bit base instructions
* Split division/multiplication/modulo instructions out of base groups
* Split atomic instructions out of base groups

There may be additional changes as discussion continues, but there seems to be consensus on the principles above.

v1->v2: fixed typo pointed out by David Vernet
v2->v3: Moved multiplication to same groups as division/modulo

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240202221110.3872-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-05  selftests/bpf: Fix flaky selftest lwt_redirect/lwt_reroute  (Yonghong Song; 3 files, -2/+2)

Recently, when running './test_progs -j', I occasionally hit the following errors:

  test_lwt_redirect:PASS:pthread_create 0 nsec
  test_lwt_redirect_run:FAIL:netns_create unexpected error: 256 (errno 0)
  #142/2 lwt_redirect/lwt_redirect_normal_nomac:FAIL
  #142 lwt_redirect:FAIL
  test_lwt_reroute:PASS:pthread_create 0 nsec
  test_lwt_reroute_run:FAIL:netns_create unexpected error: 256 (errno 0)
  test_lwt_reroute:PASS:pthread_join 0 nsec
  #143/2 lwt_reroute/lwt_reroute_qdisc_dropped:FAIL
  #143 lwt_reroute:FAIL

The netns_create() definition looks like below:

  #define NETNS "ns_lwt"

  static inline int netns_create(void)
  {
          return system("ip netns add " NETNS);
  }

One possibility is that both lwt_redirect and lwt_reroute create a netns with the same name "ns_lwt", which may cause a conflict. I tried the following example:

  $ sudo ip netns add abc
  $ echo $?
  0
  $ sudo ip netns add abc
  Cannot create namespace file "/var/run/netns/abc": File exists
  $ echo $?
  1

The return code for the above netns_create() is 256. An internet search suggests that the return value for 'ip netns add ns_lwt' is 1, which matches the above 'sudo ip netns add abc' example. This patch uses different netns names for the two tests to avoid the 'ip netns add <name>' failure.

I ran './test_progs -j' 10 times and all runs succeeded with the lwt_redirect/lwt_reroute tests.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240205052914.1742687-1-yonghong.song@linux.dev
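A sketch of the fix (the two new names are illustrative; the point is that the tests no longer share one netns name):

  #include <stdio.h>
  #include <stdlib.h>

  #define NETNS_REDIRECT "ns_lwt_redirect"
  #define NETNS_REROUTE  "ns_lwt_reroute"

  static inline int netns_create(const char *name)
  {
          char cmd[128];

          snprintf(cmd, sizeof(cmd), "ip netns add %s", name);
          return system(cmd);
  }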
2024-02-05  selftests/bpf: Suppress warning message of an unused variable.  (Kui-Feng Lee; 3 files, -7/+4)
"r" is used to receive the return value of test_2 in bpf_testmod.c, but it is not actually used. So, we remove "r" and change the return type to "void". Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202401300557.z5vzn8FM-lkp@intel.com/ Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240204061204.1864529-1-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-05  selftests/bpf: Fix flaky test ptr_untrusted  (Yonghong Song; 1 file, -3/+3)

Somehow recently I frequently hit the following test failure with either ./test_progs or ./test_progs-cpuv4:

  serial_test_ptr_untrusted:PASS:skel_open 0 nsec
  serial_test_ptr_untrusted:PASS:lsm_attach 0 nsec
  serial_test_ptr_untrusted:PASS:raw_tp_attach 0 nsec
  serial_test_ptr_untrusted:FAIL:cmp_tp_name unexpected cmp_tp_name: actual -115 != expected 0
  #182 ptr_untrusted:FAIL

Further investigation found that the failure is due to bpf_probe_read_user_str(), where reading the user-level string attr->raw_tracepoint.name does not succeed, most likely because the string itself is still on disk and not populated into memory yet.

One solution is to do a printf() call of the string before doing the bpf syscall, which will force raw_tracepoint.name into memory. But I think a more robust solution is to use bpf_copy_from_user(), which is usable in sleepable programs and can tolerate page faults, and the fix here uses the latter approach.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240204194452.2785936-1-yonghong.song@linux.dev
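A sketch of the BPF-side change, close in spirit to the test (program and variable names here are illustrative):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  char tp_name[128];

  SEC("lsm.s/bpf") /* sleepable, so page faults can be tolerated */
  int BPF_PROG(lsm_run, int cmd, union bpf_attr *attr, unsigned int size)
  {
          if (cmd == BPF_RAW_TRACEPOINT_OPEN)
                  /* unlike bpf_probe_read_user_str(), this may fault the
                   * user page in instead of failing */
                  bpf_copy_from_user(tp_name, sizeof(tp_name),
                                     (void *)(unsigned long)attr->raw_tracepoint.name);
          return 0;
  }

  char _license[] SEC("license") = "GPL";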
2024-02-05  bpf: Remove an unnecessary check.  (Kui-Feng Lee; 1 file, -12/+9)
The "i" here is always equal to "btf_type_vlen(t)" since the "for_each_member()" loop never breaks. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240203055119.2235598-1-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-03  Merge branch 'two-small-fixes-for-global-subprog-tagging'  (Alexei Starovoitov; 3 files, -3/+36)
Andrii Nakryiko says: ==================== Two small fixes for global subprog tagging Fix a bug with passing trusted PTR_TO_BTF_ID_OR_NULL register into global subprog that expects `__arg_trusted __arg_nullable` arguments, which was discovered when adopting production BPF application. Also fix annoying warnings that are irrelevant for static subprogs, which are just an artifact of using btf_prepare_func_args() for both static and global subprogs. ==================== Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240202190529.2374377-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-03  bpf: don't emit warnings intended for global subprogs for static subprogs  (Andrii Nakryiko; 1 file, -0/+6)

When btf_prepare_func_args() was generalized to handle both static and global subprogs, a few warnings/errors that are meant only for the global subprog case started to be emitted for static subprogs, where they are sort of expected and irrelevant. Stop polluting verifier logs with irrelevant scary-looking messages.

Fixes: e26080d0da87 ("bpf: prepare btf_prepare_func_args() for handling static subprogs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240202190529.2374377-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-03  selftests/bpf: add more cases for __arg_trusted __arg_nullable args  (Andrii Nakryiko; 1 file, -3/+29)

Add an extra layer of global functions to ensure that passing around (trusted) PTR_TO_BTF_ID_OR_NULL registers works as expected. We also extend the trusted_task_arg_nullable subtest to check three possible valid arguments: known NULL, known non-NULL, and maybe-NULL cases.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240202190529.2374377-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-03  bpf: handle trusted PTR_TO_BTF_ID_OR_NULL in argument check logic  (Andrii Nakryiko; 1 file, -0/+1)
Add PTR_TRUSTED | PTR_MAYBE_NULL modifiers for PTR_TO_BTF_ID to check_reg_type() to support passing trusted nullable PTR_TO_BTF_ID registers into global functions accepting `__arg_trusted __arg_nullable` arguments. This hasn't been caught earlier because tests were either passing known non-NULL PTR_TO_BTF_ID registers or known NULL (SCALAR) registers. When utilizing this functionality in complicated real-world BPF application that passes around PTR_TO_BTF_ID_OR_NULL, it became apparent that verifier rejects valid case because check_reg_type() doesn't handle this case explicitly. Existing check_reg_type() logic is already anticipating this combination, so we just need to explicitly list this combo in the switch statement. Fixes: e2b3c4ff5d18 ("bpf: add __arg_trusted global func arg tag") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240202190529.2374377-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
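The fix amounts to one extra case label in check_reg_type()'s switch, roughly as below (the surrounding cases shown for context are reproduced from memory and may differ):

  case PTR_TO_BTF_ID | PTR_TRUSTED:
  case PTR_TO_BTF_ID | MEM_RCU:
  case PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL:  /* the new combo */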
2024-02-03  selftests/bpf: trace_helpers.c: do not use poisoned type  (Shung-Hsi Yu; 1 file, -1/+1)

After commit c698eaebdf47 ("selftests/bpf: trace_helpers.c: Optimize kallsyms cache"), trace_helpers.c now includes libbpf_internal.h and thus can no longer use the u32 type (among others), since these are poisoned in libbpf_internal.h. Replace u32 with __u32 to fix the following error when building trace_helpers.c on powerpc:

  error: attempt to use poisoned "u32"

Fixes: c698eaebdf47 ("selftests/bpf: trace_helpers.c: Optimize kallsyms cache")
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20240202095559.12900-1-shung-hsi.yu@suse.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-03  Merge branch 'improvements-for-tracking-scalars-in-the-bpf-verifier'  (Andrii Nakryiko; 3 files, -32/+404)

Maxim Mikityanskiy says:

====================
Improvements for tracking scalars in the BPF verifier

From: Maxim Mikityanskiy <maxim@isovalent.com>

The goal of this series is to extend the verifier's capabilities of tracking scalars when they are spilled to stack, especially when the spill or fill is narrowing. It also contains a fix by Eduard for infinite loop detection and a state pruning optimization by Eduard that compensates for a verification complexity regression introduced by tracking unbounded scalars. These improvements reduce the surface of false rejections that I saw while working on the Cilium codebase.

Patches 1-9 of the original series were previously applied in v2.

Patches 1-2 (Maxim): Support the case when boundary checks are first performed after the register was spilled to the stack.

Patches 3-4 (Maxim): Support narrowing fills.

Patches 5-6 (Eduard): Optimization for state pruning in stacksafe() to mitigate the verification complexity regression.

  veristat -e file,prog,states -f '!states_diff<50' -f '!states_pct<10' -f '!states_a<10' -f '!states_b<10' -C ...

* Without patch 5:

  File                  Program   States (A)  States (B)  States (DIFF)
  --------------------  --------  ----------  ----------  ----------------
  pyperf100.bpf.o       on_event        4878        6528   +1650 (+33.83%)
  pyperf180.bpf.o       on_event        6936       11032   +4096 (+59.05%)
  pyperf600.bpf.o       on_event       22271       39455  +17184 (+77.16%)
  pyperf600_iter.bpf.o  on_event         400         490     +90 (+22.50%)
  strobemeta.bpf.o      on_event        4895       14028  +9133 (+186.58%)

* With patch 5:

  File                     Program        States (A)  States (B)  States (DIFF)
  -----------------------  -------------  ----------  ----------  ---------------
  bpf_xdp.o                tail_lb_ipv4         2770        2224   -546 (-19.71%)
  pyperf100.bpf.o          on_event             4878        5848   +970 (+19.89%)
  pyperf180.bpf.o          on_event             6936        8868  +1932 (+27.85%)
  pyperf600.bpf.o          on_event            22271       29656  +7385 (+33.16%)
  pyperf600_iter.bpf.o     on_event              400         450    +50 (+12.50%)
  xdp_synproxy_kern.bpf.o  syncookie_tc          280         226    -54 (-19.29%)
  xdp_synproxy_kern.bpf.o  syncookie_xdp         302         228    -74 (-24.50%)

v2 changes: Fixed comments in patch 1, moved endianness checks to header files in patch 12 where possible, added Eduard's ACKs.

v3 changes:

Maxim: Removed __is_scalar_unbounded altogether, addressed Andrii's comments.

Eduard: Patch #5 (#14 in v2) changed significantly:
- Logical changes:
  - Handling of STACK_{MISC,ZERO} mix turned out to be incorrect: a mix of MISC and ZERO in old state is not equivalent to e.g. just MISC in current state, because the verifier could have deduced zero scalars from ZERO slots in old state for some loads.
  - There is no reason to limit the change only to cases when old or current stack is a spill of an unbounded scalar; it is valid to compare any 64-bit scalar spill with a fake register impersonating MISC.
  - STACK_ZERO vs spilled zero case was dropped; after recent changes for zero handling by Andrii and Yonghong it is hard (impossible?) to conjure all ZERO slots for an spi. => the case does not make any difference in veristat results.
- Use global static variable for unbound_reg (Andrii)
- Code shuffling to remove duplication in stacksafe() (Andrii)
====================

Link: https://lore.kernel.org/r/20240127175237.526726-1-maxtram95@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-02-03  selftests/bpf: States pruning checks for scalar vs STACK_MISC  (Eduard Zingerman; 1 file, -0/+154)

Check that stacksafe() compares spilled scalars with STACK_MISC. The following combinations are explored:
- old spill of imprecise scalar is equivalent to cur STACK_{MISC,INVALID} (plus error in unpriv mode);
- old spill of precise scalar is not equivalent to cur STACK_MISC;
- old STACK_MISC is equivalent to cur scalar;
- old STACK_MISC is not equivalent to cur non-scalar.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240127175237.526726-7-maxtram95@gmail.com
2024-02-03  bpf: Handle scalar spill vs all MISC in stacksafe()  (Eduard Zingerman; 1 file, -3/+69)

When check_stack_read_fixed_off() reads a value from an spi all stack slots of which are set to STACK_{MISC,INVALID}, the destination register is set to unbound SCALAR_VALUE. Exploit this fact by allowing stacksafe() to use a fake unbound scalar register to compare an 'mmmm mmmm' stack value in the old state vs a spilled 64-bit scalar in the current state, and vice versa.

Veristat results after this patch show some gains:

  ./veristat -C -e file,prog,states -f 'states_pct>10' not-opt after
  File                     Program                States (DIFF)
  -----------------------  ---------------------  ---------------
  bpf_overlay.o            tail_rev_nodeport_lb4    -45 (-15.85%)
  bpf_xdp.o                tail_lb_ipv4            -541 (-19.57%)
  pyperf100.bpf.o          on_event                -680 (-10.42%)
  pyperf180.bpf.o          on_event               -2164 (-19.62%)
  pyperf600.bpf.o          on_event               -9799 (-24.84%)
  strobemeta.bpf.o         on_event               -9157 (-65.28%)
  xdp_synproxy_kern.bpf.o  syncookie_tc             -54 (-19.29%)
  xdp_synproxy_kern.bpf.o  syncookie_xdp            -74 (-24.50%)

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240127175237.526726-6-maxtram95@gmail.com
2024-02-03  selftests/bpf: Add test cases for narrowing fill  (Maxim Mikityanskiy; 1 file, -0/+111)
The previous commit allowed to preserve boundaries and track IDs of scalars on narrowing fills. Add test cases for that pattern. Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240127175237.526726-5-maxtram95@gmail.com
2024-02-03  bpf: Preserve boundaries and track scalars on narrowing fill  (Maxim Mikityanskiy; 3 files, -11/+39)
When the width of a fill is smaller than the width of the preceding spill, the information about scalar boundaries can still be preserved, as long as it's coerced to the right width (done by coerce_reg_to_size). Even further, if the actual value fits into the fill width, the ID can be preserved as well for further tracking of equal scalars. Implement the above improvements, which makes narrowing fills behave the same as narrowing spills and MOVs between registers. Two tests are adjusted to accommodate for endianness differences and to take into account that it's now allowed to do a narrowing fill from the least significant bits. reg_bounds_sync is added to coerce_reg_to_size to correctly adjust umin/umax boundaries after the var_off truncation, for example, a 64-bit value 0xXXXXXXXX00000000, when read as a 32-bit, gets umin = 0, umax = 0xFFFFFFFF, var_off = (0x0; 0xffffffff00000000), which needs to be synced down to umax = 0, otherwise reg_bounds_sanity_check doesn't pass. Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240127175237.526726-4-maxtram95@gmail.com
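A sketch of where the new call lands (illustrative fragment; the existing truncation logic is elided):

  static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
  {
          /* ... existing truncation of var_off and bounds to 'size' bytes ... */

          /* new: re-derive umin/umax etc. from the truncated var_off, so a
           * 64-bit 0xXXXXXXXX00000000 read as 32 bits ends up with umax = 0
           * rather than 0xFFFFFFFF, satisfying reg_bounds_sanity_check() */
          reg_bounds_sync(reg);
  }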
2024-02-03  selftests/bpf: Test tracking spilled unbounded scalars  (Maxim Mikityanskiy; 1 file, -0/+27)
The previous commit added tracking for unbounded scalars on spill. Add the test case to check the new functionality. Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240127175237.526726-3-maxtram95@gmail.com
2024-02-03  bpf: Track spilled unbounded scalars  (Maxim Mikityanskiy; 2 files, -18/+4)
Support the pattern where an unbounded scalar is spilled to the stack, then boundary checks are performed on the src register, after which the stack frame slot is refilled into a register. Before this commit, the verifier didn't treat the src register and the stack slot as related if the src register was an unbounded scalar. The register state wasn't copied, the id wasn't preserved, and the stack slot was marked as STACK_MISC. Subsequent boundary checks on the src register wouldn't result in updating the boundaries of the spilled variable on the stack. After this commit, the verifier will preserve the bond between src and dst even if src is unbounded, which permits to do boundary checks on src and refill dst later, still remembering its boundaries. Such a pattern is sometimes generated by clang when compiling complex long functions. One test is adjusted to reflect that now unbounded scalars are tracked. Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240127175237.526726-2-maxtram95@gmail.com
2024-02-02  selftests/bpf: Fix bench runner SIGSEGV  (Andrii Nakryiko; 1 file, -2/+10)

Some benchmarks don't have either a "consumer" or a "producer" side. For example, trig-tp and other BPF triggering benchmarks don't have consumers, as they only do "producing" by calling into a syscall or predefined uprobes. As such, it's valid for some benchmarks to have zero consumers or producers, so allow specifying `-c0` explicitly.

This exposes another problem. If a benchmark doesn't support either the consumer or producer side, the consumer_thread/producer_thread callback will be NULL, but the benchmark runner will attempt to use those NULL callbacks to create threads anyway. So instead of crashing with SIGSEGV in case of a misconfigured benchmark, detect the condition and report an error.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-6-andrii@kernel.org
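A sketch of the guard (illustrative; the bench runner's actual field names may differ slightly):

  if (env.consumer_cnt && !bench->consumer_thread) {
          fprintf(stderr, "benchmark '%s' does not support consumers, use -c0\n",
                  bench->name);
          exit(1);
  }
  /* ... analogous check for env.producer_cnt / bench->producer_thread ... */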
2024-02-02  libbpf: Add missed btf_ext__raw_data() API  (Andrii Nakryiko; 3 files, -3/+7)
Another API that was declared in libbpf.map but actual implementation was missing. btf_ext__get_raw_data() was intended as a discouraged alias to consistently-named btf_ext__raw_data(), so make this an actuality. Fixes: 20eccf29e297 ("libbpf: hide and discourage inconsistently named getters") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240201172027.604869-5-andrii@kernel.org
2024-02-02  libbpf: Add btf__new_split() API that was declared but not implemented  (Andrii Nakryiko; 2 files, -1/+7)
Seems like original commit adding split BTF support intended to add btf__new_split() API, and even declared it in libbpf.map, but never added (trivial) implementation. Fix this. Fixes: ba451366bf44 ("libbpf: Implement basic split BTF support") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240201172027.604869-4-andrii@kernel.org
2024-02-02  libbpf: Add missing LIBBPF_API annotation to libbpf_set_memlock_rlim API  (Andrii Nakryiko; 1 file, -1/+1)
LIBBPF_API annotation seems missing on libbpf_set_memlock_rlim API, so add it to make this API callable from libbpf's shared library version. Fixes: e542f2c4cd16 ("libbpf: Auto-bump RLIMIT_MEMLOCK if kernel needs it for BPF") Fixes: ab9a5a05dc48 ("libbpf: fix up few libbpf.map problems") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240201172027.604869-3-andrii@kernel.org
2024-02-02  libbpf: Call memfd_create() syscall directly  (Andrii Nakryiko; 1 file, -1/+10)
Some versions of Android do not implement memfd_create() wrapper in their libc implementation, leading to build failures ([0]). On the other hand, memfd_create() is available as a syscall on quite old kernels (3.17+, while bpf() syscall itself is available since 3.18+), so it is ok to assume that syscall availability and call into it with syscall() helper to avoid Android-specific workarounds. Validated in libbpf-bootstrap's CI ([1]). [0] https://github.com/libbpf/libbpf-bootstrap/actions/runs/7701003207/job/20986080319#step:5:83 [1] https://github.com/libbpf/libbpf-bootstrap/actions/runs/7715988887/job/21031767212?pr=253 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240201172027.604869-2-andrii@kernel.org
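The wrapper plausibly looks like this (a sketch; the exact libbpf function name is an assumption):

  #define _GNU_SOURCE
  #include <unistd.h>
  #include <sys/syscall.h>

  /* bypass libc so Android builds don't need a memfd_create() symbol;
   * the syscall itself has existed since Linux 3.17 */
  static int sys_memfd_create(const char *name, unsigned int flags)
  {
          return syscall(__NR_memfd_create, name, flags);
  }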
2024-02-01  bpf: Minor clean-up to sleepable_lsm_hooks BTF set  (Matt Bobrowski; 1 file, -4/+2)
There's already one main CONFIG_SECURITY_NETWORK ifdef block within the sleepable_lsm_hooks BTF set. Consolidate this duplicated ifdef block as there's no need for it and all things guarded by it should remain in one place in this specific context. Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/Zbt1smz43GDMbVU3@google.com
2024-02-01  selftests/bpf: Enable inline bpf_kptr_xchg() test for RV64  (Pu Lehui; 1 file, -1/+2)

Enable the inline bpf_kptr_xchg() test for RV64; the test has passed, as shown below:

  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240130124659.670321-3-pulehui@huaweicloud.com
2024-02-01  riscv, bpf: Enable inline bpf_kptr_xchg() for RV64  (Pu Lehui; 1 file, -0/+5)

The RV64 JIT supports 64-bit BPF_XCHG atomic instructions. At the same time, the underlying implementation of xchg() and atomic64_xchg() on RV64 is raw_xchg(), which supports 64-bit operations. Therefore, the inlined bpf_kptr_xchg() will have equivalent semantics. Let's inline it for better performance.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240130124659.670321-2-pulehui@huaweicloud.com
2024-02-01  bpf, docs: Clarify which legacy packet instructions existed  (Dave Thaler; 1 file, -1/+3)
As discussed on the BPF IETF mailing list (see link), this patch updates the "Legacy BPF Packet access instructions" section to clarify which instructions are deprecated (vs which were never defined and so are not deprecated). Signed-off-by: Dave Thaler <dthaler1968@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: David Vernet <void@manifault.com> Link: https://mailarchive.ietf.org/arch/msg/bpf/5LnnKm093cGpOmDI9TnLQLBXyys Link: https://lore.kernel.org/bpf/20240131033759.3634-1-dthaler1968@gmail.com
2024-02-01  libbpf: Remove unnecessary null check in kernel_supports()  (Eduard Zingerman; 1 file, -1/+1)

After recent changes, Coverity complained about inconsistent null checks in the kernel_supports() function:

  kernel_supports(const struct bpf_object *obj, ...)
  [...]
  // var_compare_op: Comparing obj to null implies that obj might be null
  if (obj && obj->gen_loader)
          return true;
  // var_deref_op: Dereferencing null pointer obj
  if (obj->token_fd)
          return feat_supported(obj->feat_cache, feat_id);
  [...]

- The original null check was introduced by commit [0], which introduced a call `kernel_supports(NULL, ...)` in function bump_rlimit_memlock();
- This call was refactored to use `feat_supported(NULL, ...)` in commit [1].

Looking at all places where kernel_supports() is called:
- There is either `obj->...` access before the call;
- Or `obj` comes from `prog->obj` expression, where `prog` comes from enumeration of programs in `obj`;
- Or `obj` comes from `prog->obj`, where `prog` is a parameter to one of the API functions:
  - bpf_program__attach_kprobe_opts;
  - bpf_program__attach_kprobe;
  - bpf_program__attach_ksyscall.

Assuming correct API usage, it appears that `obj` can never be null when passed to kernel_supports(). Silence the Coverity warning by removing the redundant null check.

[0] e542f2c4cd16 ("libbpf: Auto-bump RLIMIT_MEMLOCK if kernel needs it for BPF")
[1] d6dd1d49367a ("libbpf: Further decouple feature checking logic from bpf_object")

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240131212615.20112-1-eddyz87@gmail.com
2024-02-01  Merge branch 'annotate-kfuncs-in-btf_ids-section'  (Alexei Starovoitov; 23 files, -66/+87)

Daniel Xu says:

====================
Annotate kfuncs in .BTF_ids section

=== Description ===

This is a bpf-treewide change that annotates all kfuncs as such inside .BTF_ids. This annotation eventually allows us to automatically generate kfunc prototypes from bpftool.

We store this metadata inside a yet-unused flags field inside struct btf_id_set8 (thanks Kumar!). pahole will be taught where to look. More details about the full chain of events are available in commit 3's description.

The accompanying pahole and bpftool changes can be viewed here on these "frozen" branches [0][1].

[0]: https://github.com/danobi/pahole/tree/kfunc_btf-v3-mailed
[1]: https://github.com/danobi/linux/tree/kfunc_bpftool-mailed

=== Changelog ===

Changes from v3:
* Rebase to bpf-next and add missing annotation on new kfunc

Changes from v2:
* Only WARN() for vmlinux kfuncs

Changes from v1:
* Move WARN_ON() up a call level
* Also return error when kfunc set is not properly tagged
* Use BTF_KFUNCS_START/END instead of flags
* Rename BTF_SET8_KFUNC to BTF_SET8_KFUNCS
====================

Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/cover.1706491398.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-01  bpf: treewide: Annotate BPF kfuncs in BTF  (Daniel Xu; 22 files, -62/+70)

This commit marks kfuncs as such inside the .BTF_ids section. The upshot of these annotations is that we'll be able to automatically generate kfunc prototypes for downstream users. The process is as follows:

1. In source, use the BTF_KFUNCS_START/END macro pair to mark kfuncs
2. During build, pahole injects into BTF a "bpf_kfunc" BTF_DECL_TAG for each function inside BTF_KFUNCS sets
3. At runtime, vmlinux or module BTF is made available in sysfs
4. At runtime, bpftool (or similar) can look at the provided BTF and generate appropriate prototypes for functions with the "bpf_kfunc" tag

To ensure future kfuncs are similarly tagged, we now also return an error from kfunc registration for untagged kfuncs. For vmlinux kfuncs, we also WARN(), as initcall machinery does not handle errors.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Acked-by: Benjamin Tissoires <bentiss@kernel.org>
Link: https://lore.kernel.org/r/e55150ceecbf0a5d961e608941165c0bee7bc943.1706491398.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
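A sketch of the registration-time enforcement described above (names hypothetical, not the exact btf.c hunk):

  /* reject kfunc sets not registered via BTF_KFUNCS_START/END */
  if (!(set->flags & BTF_SET8_KFUNCS)) {
          WARN_ON(is_vmlinux_btf); /* vmlinux initcalls cannot return errors */
          return -EINVAL;
  }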
2024-02-01  bpf: btf: Add BTF_KFUNCS_START/END macro pair  (Daniel Xu; 1 file, -0/+11)
This macro pair is functionally equivalent to BTF_SET8_START/END, except with BTF_SET8_KFUNCS flag set in the btf_id_set8 flags field. The next commit will codemod all kfunc set8s to this new variant such that all kfuncs are tagged as such in .BTF_ids section. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/d536c57c7c2af428686853cc7396b7a44faa53b7.1706491398.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
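The pair plausibly reduces to the existing SET8 macros with the flag pre-set, along these lines (a sketch, not necessarily the verbatim btf_ids.h):

  #define BTF_KFUNCS_START(name) __BTF_SET8_START(name, local, BTF_SET8_KFUNCS)
  #define BTF_KFUNCS_END(name)   BTF_SET8_END(name)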
2024-01-31  bpf: btf: Support flags for BTF_SET8 sets  (Daniel Xu; 1 file, -4/+6)
This commit adds support for flags on BTF_SET8s. struct btf_id_set8 already supported 32 bits worth of flags, but was only used for alignment purposes before. We now use these bits to encode flags. The first use case is tagging kfunc sets with a flag so that pahole can recognize which BTF_ID_FLAGS(func, ..) are actual kfuncs. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/7bb152ec76d6c2c930daec88e995bf18484a5ebb.1706491398.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-31  selftests/bpf: Disable IPv6 for lwt_redirect test  (Manu Bretelle; 1 file, -0/+1)

After a recent change in the vmtest runner, this test started failing sporadically.

Investigation showed that this test was subject to a race condition which got exacerbated after the vm runner change. The symptom is that the logic that waits for an ICMPv4 packet is naive and will break if 5 or more non-ICMPv4 packets make it to tap0. When ICMPv6 is enabled, the kernel will generate traffic such as ICMPv6 router solicitations... On a system with good performance, the expected ICMPv4 packet would very likely make it to the network interface promptly, but on a system with poor performance, those "guarantees" do not hold true anymore.

Given that the test is IPv4 only, this change disables IPv6 in the test netns by setting `net.ipv6.conf.all.disable_ipv6` to 1. This essentially leaves "ping" as the sole generator of traffic in the network namespace. If this test were to be made IPv6 compatible, the logic in `wait_for_packet` would need to be modified.

In more detail, at a high level, the test does:
- create a new namespace
- in `setup_redirect_target`, set up lo, tap0, and link_err interfaces, as well as add 2 routes that attach ingress/egress sections of `test_lwt_redirect.bpf.o` to the xmit path
- in `send_and_capture_test_packets`, send an ICMP packet and read off the tap interface (using `wait_for_packet`) to check that an ICMP packet with the right size is read

`wait_for_packet` will try to read `max_retry` (5) times from the tap0 fd looking for an ICMPv4 packet matching some criteria. The problem is that when we set up the `tap0` interface, because IPv6 is enabled by default, traffic such as router solicitations is sent through tap0, as in:

  # tcpdump -r /tmp/lwt_redirect.pcap
  reading from file /tmp/lwt_redirect.pcap, link-type EN10MB (Ethernet)
  04:46:23.578352 IP6 :: > ff02::1:ffc0:4427: ICMP6, neighbor solicitation, who has fe80::fcba:dff:fec0:4427, length 32
  04:46:23.659522 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
  04:46:24.389169 IP 10.0.0.1 > 20.0.0.9: ICMP echo request, id 122, seq 1, length 108
  04:46:24.618599 IP6 fe80::fcba:dff:fec0:4427 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
  04:46:24.619985 IP6 fe80::fcba:dff:fec0:4427 > ff02::2: ICMP6, router solicitation, length 16
  04:46:24.767326 IP6 fe80::fcba:dff:fec0:4427 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
  04:46:28.936402 IP6 fe80::fcba:dff:fec0:4427 > ff02::2: ICMP6, router solicitation, length 16

If `wait_for_packet` sees 5 non-ICMPv4 packets, it will return 0, which is what we see in:

  2024-01-31T03:51:25.0336992Z test_lwt_redirect_run:PASS:netns_create 0 nsec
  2024-01-31T03:51:25.0341309Z open_netns:PASS:malloc token 0 nsec
  2024-01-31T03:51:25.0344844Z open_netns:PASS:open /proc/self/ns/net 0 nsec
  2024-01-31T03:51:25.0350071Z open_netns:PASS:open netns fd 0 nsec
  2024-01-31T03:51:25.0353516Z open_netns:PASS:setns 0 nsec
  2024-01-31T03:51:25.0356560Z test_lwt_redirect_run:PASS:setns 0 nsec
  2024-01-31T03:51:25.0360140Z open_tuntap:PASS:open(/dev/net/tun) 0 nsec
  2024-01-31T03:51:25.0363822Z open_tuntap:PASS:ioctl(TUNSETIFF) 0 nsec
  2024-01-31T03:51:25.0367402Z open_tuntap:PASS:fcntl(O_NONBLOCK) 0 nsec
  2024-01-31T03:51:25.0371167Z setup_redirect_target:PASS:open_tuntap 0 nsec
  2024-01-31T03:51:25.0375180Z setup_redirect_target:PASS:if_nametoindex 0 nsec
  2024-01-31T03:51:25.0379929Z setup_redirect_target:PASS:ip link add link_err type dummy 0 nsec
  2024-01-31T03:51:25.0384874Z setup_redirect_target:PASS:ip link set lo up 0 nsec
  2024-01-31T03:51:25.0389678Z setup_redirect_target:PASS:ip addr add dev lo 10.0.0.1/32 0 nsec
  2024-01-31T03:51:25.0394814Z setup_redirect_target:PASS:ip link set link_err up 0 nsec
  2024-01-31T03:51:25.0399874Z setup_redirect_target:PASS:ip link set tap0 up 0 nsec
  2024-01-31T03:51:25.0407731Z setup_redirect_target:PASS:ip route add 10.0.0.0/24 dev link_err encap bpf xmit obj test_lwt_redirect.bpf.o sec redir_ingress 0 nsec
  2024-01-31T03:51:25.0419105Z setup_redirect_target:PASS:ip route add 20.0.0.0/24 dev link_err encap bpf xmit obj test_lwt_redirect.bpf.o sec redir_egress 0 nsec
  2024-01-31T03:51:25.0427209Z test_lwt_redirect_normal:PASS:setup_redirect_target 0 nsec
  2024-01-31T03:51:25.0431424Z ping_dev:PASS:if_nametoindex 0 nsec
  2024-01-31T03:51:25.0437222Z send_and_capture_test_packets:FAIL:wait_for_epacket unexpected wait_for_epacket: actual 0 != expected 1
  2024-01-31T03:51:25.0448298Z (/tmp/work/bpf/bpf/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c:175: errno: Success) test_lwt_redirect_normal egress test fails
  2024-01-31T03:51:25.0457124Z close_netns:PASS:setns 0 nsec

When running in a VM with potential resource constraints, the odds that calling `ping` is not scheduled very soon after bringing `tap0` up increase, and with them the chances of getting our ICMP packet pushed to position 6+ in the network trace.

To confirm this indeed solves the issue, I ran the test 100 times in a row with:

  errors=0
  successes=0
  for i in `seq 1 100`
  do
    ./test_progs -t lwt_redirect/lwt_redirect_normal
    if [ $? -eq 0 ]; then
      successes=$((successes+1))
    else
      errors=$((errors+1))
    fi
  done
  echo "successes: $successes/errors: $errors"

While this test would previously fail at least a couple of times every 10 runs, here it ran 100 times with no error.

Fixes: 43a7c3ef8a15 ("selftests/bpf: Add lwt_xmit tests for BPF_REDIRECT")
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240131053212.2247527-1-chantr4@gmail.com
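The one-line fix plausibly reduces to running the equivalent of the following inside the freshly created netns (illustrative):

  /* IPv4-only test: silence router solicitations and other IPv6 chatter */
  system("sysctl -qw net.ipv6.conf.all.disable_ipv6=1");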
2024-01-31  Merge branch 'libbpf: add bpf_core_cast() helper'  (Martin KaFai Lau; 10 files, -24/+27)
Andrii Nakryiko says: ==================== Add bpf_core_cast(<ptr>, <type>) macro wrapper around bpf_rdonly_cast() kfunc to make it easier to use this functionality in BPF code. See patch #2 for BPF selftests conversions demonstrating improvements in code succinctness. ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-01-31  selftests/bpf: convert bpf_rdonly_cast() uses to bpf_core_cast() macro  (Andrii Nakryiko; 8 files, -18/+12)
Use more ergonomic bpf_core_cast() macro instead of bpf_rdonly_cast() in selftests code. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240130212023.183765-3-andrii@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
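For a sense of the ergonomic win, usage changes roughly like this (the task_struct example and the source pointer p are illustrative; declarations come from bpf_core_read.h):

  struct task_struct *t;

  /* before: spell out the kfunc and the kernel type ID lookup */
  t = bpf_rdonly_cast(p, bpf_core_type_id_kernel(struct task_struct));

  /* after: one self-describing macro */
  t = bpf_core_cast(p, struct task_struct);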