2022-03-11  ima: Always return a file measurement in ima_file_hash()  (Roberto Sassu, 1 file changed, -13/+33)

__ima_inode_hash() checks whether a digest has already been calculated by looking for the integrity_iint_cache structure associated with the passed inode. Users of ima_file_hash() (e.g. eBPF) might be interested in obtaining the information without having to set up an IMA policy, so that the digest is always available at the time they call this function. In addition, they likely expect the digest to be fresh, e.g. recalculated by IMA after a file write.

Although getting the digest from the bprm_committed_creds hook (as in the eBPF test) ensures that the digest is fresh, as the IMA hook is executed before that hook, this is not always the case (e.g. for the mmap_file hook).

Call ima_collect_measurement() in __ima_inode_hash() if the file descriptor is available (passed by ima_file_hash()) and the digest is not available or not fresh, and store the file measurement in a temporary integrity_iint_cache structure.

This change does not increase memory usage, because the integrity_iint_cache structure is temporary, and because the ima_digest_data structure inside it is freed before returning from __ima_inode_hash().

For compatibility reasons, the behavior of ima_inode_hash() remains unchanged.

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20220302111404.193900-3-roberto.sassu@huawei.com

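As a hedged illustration of the post-patch behaviour, a kernel-side caller could look like the sketch below. The wrapper function and buffer size are hypothetical, not taken from the patch; only ima_file_hash() and its return convention (hash algorithm on success, negative error otherwise) come from the source.

  #include <linux/fs.h>
  #include <linux/ima.h>

  /* Hypothetical caller: fetch a fresh measurement of an open file. */
  static int example_get_file_digest(struct file *file)
  {
          char digest[64];        /* large enough for SHA-512 */
          int algo;

          /*
           * Returns the hash algorithm on success, a negative error
           * otherwise. After this patch the digest is collected on
           * demand, so no IMA policy rule is needed and the result
           * reflects the file content even after a recent write.
           */
          algo = ima_file_hash(file, digest, sizeof(digest));
          if (algo < 0)
                  return algo;

          /* ... use digest[] ... */
          return 0;
  }
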
2022-03-11  ima: Fix documentation-related warnings in ima_main.c  (Roberto Sassu, 1 file changed, -5/+6)

Fix the following warnings in ima_main.c, displayed with the W=n make argument:

  security/integrity/ima/ima_main.c:432: warning: Function parameter or member 'vma' not described in 'ima_file_mprotect'
  security/integrity/ima/ima_main.c:636: warning: Function parameter or member 'inode' not described in 'ima_post_create_tmpfile'
  security/integrity/ima/ima_main.c:636: warning: Excess function parameter 'file' description in 'ima_post_create_tmpfile'
  security/integrity/ima/ima_main.c:843: warning: Function parameter or member 'load_id' not described in 'ima_post_load_data'
  security/integrity/ima/ima_main.c:843: warning: Excess function parameter 'id' description in 'ima_post_load_data'

Also, fix some style issues in the description of ima_post_create_tmpfile() and ima_post_path_mknod().

Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20220302111404.193900-2-roberto.sassu@huawei.com

2022-03-11  bpftool: Ensure bytes_memlock json output is correct  (Chris J Arges, 2 files changed, -2/+2)

If a BPF map's memlock value exceeds 2^32, the value displayed in JSON format will be incorrect. Use atoll() instead of atoi() so that the correct number is displayed.

```
$ bpftool map create /sys/fs/bpf/test_bpfmap type hash key 4 \
    value 1024 entries 4194304 name test_bpfmap
$ bpftool map list
1: hash  name test_bpfmap  flags 0x0
        key 4B  value 1024B  max_entries 4194304  memlock 4328521728B
$ sudo bpftool map list -j | jq .[].bytes_memlock
33554432
```

Signed-off-by: Chris J Arges <carges@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/b6601087-0b11-33cc-904a-1133d1500a10@cloudflare.com

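A minimal standalone sketch of the truncation, using the memlock value from the example above. Strictly speaking atoi() overflow is undefined behaviour; on common implementations it wraps, which is where the bogus 33554432 (4328521728 mod 2^32) in the JSON output comes from.

  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
          const char *memlock = "4328521728"; /* > INT_MAX */

          /* atoi() cannot represent the value in an int; atoll()
           * returns a long long and preserves it.
           */
          printf("atoi:  %d\n", atoi(memlock));    /* 33554432 (wrapped) */
          printf("atoll: %lld\n", atoll(memlock)); /* 4328521728 */
          return 0;
  }
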
2022-03-11  bpf: Use offsetofend() to simplify macro definition  (Yuntao Wang, 1 file changed, -2/+1)

Use offsetofend() instead of offsetof() + sizeof() to simplify the MIN_BPF_LINEINFO_SIZE macro definition.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/bpf/20220310161518.534544-1-ytcoode@gmail.com

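Sketched out, the change looks like the following; the exact "before" shape is assumed for illustration, the idea (offset of the last field plus its size versus offsetofend()) is from the patch.

  /* before: offset of the last field plus its size */
  #define MIN_BPF_LINEINFO_SIZE \
          (offsetof(struct bpf_line_info, line_col) + \
           sizeof(((struct bpf_line_info *)NULL)->line_col))

  /* after: offsetofend() expresses the same thing directly */
  #define MIN_BPF_LINEINFO_SIZE \
          offsetofend(struct bpf_line_info, line_col)
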
2022-03-11  bpf: Fix comment for helper bpf_current_task_under_cgroup()  (Hengqi Chen, 2 files changed, -4/+4)

Fix the descriptions of the return values of helper bpf_current_task_under_cgroup().

Fixes: c6b5fb8690fa ("bpf: add documentation for eBPF helpers (42-50)")
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220310155335.1278783-1-hengqi.chen@gmail.com

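A hedged sketch of how a program might rely on the corrected description. The map layout, attach point, and function names are illustrative, and the return-value convention stated in the comment is taken from the corrected documentation, not verified here.

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct {
          __uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
          __uint(max_entries, 1);
          __uint(key_size, sizeof(__u32));
          __uint(value_size, sizeof(__u32));
  } cgroup_map SEC(".maps");

  SEC("kprobe/example_func") /* hypothetical attach point */
  int on_example(void *ctx)
  {
          /* Corrected semantics: 1 if the current task belongs to the
           * cgroup at index 0, 0 if it does not, negative on error.
           */
          if (bpf_current_task_under_cgroup(&cgroup_map, 0) != 1)
                  return 0;

          /* ... current task is inside the cgroup hierarchy ... */
          return 0;
  }

  char _license[] SEC("license") = "GPL";
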
2022-03-11  Merge branch 'bpf-tstamp-follow-ups'  (Daniel Borkmann, 6 files changed, -138/+125)

Martin KaFai Lau says:

====================
This set is a follow-up on the bpf side based on discussion [0].

Patch 1 removes some skbuff macros that are used in bpf filter.c.

Patches 2 and 3 simplify the bpf insn rewrite on __sk_buff->tstamp.

Patch 4 simplifies the bpf uapi by modeling __sk_buff->tstamp and __sk_buff->tstamp_type (was delivery_time_type) the same as their kernel counterparts skb->tstamp and skb->mono_delivery_time.

Patch 5 adjusts the bpf selftests due to the changes in patch 4.

[0]: https://lore.kernel.org/bpf/419d994e-ff61-7c11-0ec7-11fefcb0186e@iogearbox.net/
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

2022-03-11  bpf: selftests: Update tests after s/delivery_time/tstamp/ change in bpf.h  (Martin KaFai Lau, 1 file changed, -19/+19)

The previous patch made the following changes:
 - s/delivery_time_type/tstamp_type/
 - s/bpf_skb_set_delivery_time/bpf_skb_set_tstamp/
 - BPF_SKB_DELIVERY_TIME_* to BPF_SKB_TSTAMP_*

This patch changes test_tc_dtime.c to reflect the above.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090515.3712742-1-kafai@fb.com

2022-03-11  bpf: Remove BPF_SKB_DELIVERY_TIME_NONE and rename s/delivery_time_/tstamp_/  (Martin KaFai Lau, 4 files changed, -93/+77)

This patch simplifies the uapi bpf.h regarding the tstamp type and uses a similar way as the kernel to describe the value stored in __sk_buff->tstamp.

My earlier thought was to avoid describing the semantic and clock base for the rcv timestamp until there is more clarity on the use case, hence the __sk_buff->delivery_time_type naming instead of __sk_buff->tstamp_type. With some thought, it can reuse the UNSPEC naming.

This patch first removes BPF_SKB_DELIVERY_TIME_NONE and renames BPF_SKB_DELIVERY_TIME_UNSPEC to BPF_SKB_TSTAMP_UNSPEC and BPF_SKB_DELIVERY_TIME_MONO to BPF_SKB_TSTAMP_DELIVERY_MONO.

The semantic of BPF_SKB_TSTAMP_DELIVERY_MONO is the same: __sk_buff->tstamp has the delivery time in the mono clock base.

BPF_SKB_TSTAMP_UNSPEC means __sk_buff->tstamp has the (rcv) tstamp at ingress and the delivery time at egress. At egress, the clock base can be found from skb->sk->sk_clockid. __sk_buff->tstamp == 0 naturally means NONE, so NONE is not needed.

With BPF_SKB_TSTAMP_UNSPEC for the rcv tstamp at ingress, __sk_buff->delivery_time_type is also renamed to __sk_buff->tstamp_type, which was also suggested in the earlier discussion:
https://lore.kernel.org/bpf/b181acbe-caf8-502d-4b7b-7d96b9fc5d55@iogearbox.net/

The above then makes __sk_buff->tstamp and __sk_buff->tstamp_type the same as their kernel counterparts skb->tstamp and skb->mono_delivery_time.

The internal kernel function bpf_skb_convert_dtime_type_read() is renamed to bpf_skb_convert_tstamp_type_read(), and it can be simplified with BPF_SKB_DELIVERY_TIME_NONE gone. A BPF_ALU32_IMM(BPF_AND) insn is also saved by using BPF_JMP32_IMM(BPF_JSET).

The bpf helper bpf_skb_set_delivery_time() is also renamed to bpf_skb_set_tstamp(). The arg name is changed from dtime to tstamp as well. It only allows setting tstamp 0 for BPF_SKB_TSTAMP_UNSPEC; this could be relaxed later if there is a use case to change a mono delivery time to non-mono.

prog->delivery_time_access is also renamed to prog->tstamp_type_access.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090509.3712315-1-kafai@fb.com

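A sketch of how a tc BPF program might consume the renamed uapi. The program and attach point are illustrative; the field names, enum names, and the "only tstamp 0 with UNSPEC" rule come from the commit message above.

  #include <linux/bpf.h>
  #include <linux/pkt_cls.h>
  #include <bpf/bpf_helpers.h>

  SEC("tc")
  int tstamp_probe(struct __sk_buff *skb)
  {
          if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO) {
                  /* skb->tstamp holds a delivery time in the mono
                   * clock base
                   */
          } else {
                  /* BPF_SKB_TSTAMP_UNSPEC: rcv tstamp at ingress,
                   * delivery time at egress (clock base found from
                   * skb->sk->sk_clockid)
                   */
          }

          /* Renamed helper; only tstamp 0 is allowed with UNSPEC. */
          bpf_skb_set_tstamp(skb, 0, BPF_SKB_TSTAMP_UNSPEC);
          return TC_ACT_OK;
  }

  char _license[] SEC("license") = "GPL";
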
2022-03-11  bpf: Simplify insn rewrite on BPF_WRITE __sk_buff->tstamp  (Martin KaFai Lau, 1 file changed, -12/+14)

BPF_JMP32_IMM(BPF_JSET) is used to save a BPF_ALU32_IMM(BPF_AND). The skb->tc_at_ingress and skb->mono_delivery_time are at the same offset, so only one BPF_LDX_MEM(BPF_B) is needed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090502.3711982-1-kafai@fb.com

2022-03-11  bpf: Simplify insn rewrite on BPF_READ __sk_buff->tstamp  (Martin KaFai Lau, 1 file changed, -10/+11)

The skb->tc_at_ingress and skb->mono_delivery_time are at the same byte offset. Thus, only one BPF_LDX_MEM(BPF_B) is needed and both bits can be tested together:

  /* BPF_READ: a = __sk_buff->tstamp */
  if (skb->tc_at_ingress && skb->mono_delivery_time)
          a = 0;
  else
          a = skb->tstamp;

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090456.3711530-1-kafai@fb.com

2022-03-11  bpf: net: Remove TC_AT_INGRESS_OFFSET and SKB_MONO_DELIVERY_TIME_OFFSET macro  (Martin KaFai Lau, 2 files changed, -12/+12)

This patch removes the TC_AT_INGRESS_OFFSET and SKB_MONO_DELIVERY_TIME_OFFSET macros. Instead, PKT_VLAN_PRESENT_OFFSET is used, because all of them are at the same offset. A comment is added to make it clear that changing the position of tc_at_ingress or mono_delivery_time will require adjusting the defined macros.

The earlier discussion can be found here:
https://lore.kernel.org/bpf/419d994e-ff61-7c11-0ec7-11fefcb0186e@iogearbox.net/

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090450.3710955-1-kafai@fb.com

2022-03-10  bpf, test_run: Use kvfree() for memory allocated with kvmalloc()  (Yihao Han, 1 file changed, -2/+2)

The buffer is allocated with kvmalloc(), so the corresponding release function should not be kfree(); use kvfree() instead.

Generated by: scripts/coccinelle/api/kfree_mismatch.cocci

Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Signed-off-by: Yihao Han <hanyihao@vivo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20220310092828.13405-1-hanyihao@vivo.com

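The pairing rule in miniature; the wrapper function names are hypothetical, the kvmalloc()/kvfree() pairing is the point of the patch.

  #include <linux/mm.h>
  #include <linux/slab.h>

  static void *example_alloc(size_t size)
  {
          /* kvmalloc() tries kmalloc() first and may fall back to
           * vmalloc() for larger sizes...
           */
          return kvmalloc(size, GFP_KERNEL);
  }

  static void example_free(void *buf)
  {
          /* ...so the buffer must be released with kvfree(), which
           * handles both backings; kfree() on a vmalloc()-backed
           * pointer is a bug.
           */
          kvfree(buf);
  }
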
2022-03-10  bpf: Initialise retval in bpf_prog_test_run_xdp()  (Toke Høiland-Jørgensen, 1 file changed, -1/+1)

The kernel test robot pointed out that the newly added bpf_test_run_xdp_live() runner doesn't set the retval in the caller (by design), which means that the variable can be passed uninitialised to bpf_test_finish(). Fix this by initialising the variable properly.

Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220310110228.161869-1-toke@redhat.com

2022-03-10  bpftool: Restore support for BPF offload-enabled feature probing  (Niklas Söderlund, 1 file changed, -13/+139)

Commit 1a56c18e6c2e4e74 ("bpftool: Stop supporting BPF offload-enabled feature probing") removed support for probing BPF offload features. This is still useful for the NFP NIC, which can support offloading of BPF programs.

The reason for the dropped support was that libbpf, starting with v1.0, would drop support for passing the ifindex to the BPF prog/map/helper feature probing APIs. In order to keep this useful feature for NFP, restore the functionality by moving it directly into bpftool.

The restored code is a simplified version of the code that existed in libbpf, which supported passing the ifindex. The simplification is that it only targets the cases where an ifindex is given, and calls into libbpf for the cases where it's not.

Before restoring support for probing offload features:

  # bpftool feature probe dev ens4np0
  Scanning system call availability...
  bpf() syscall is available
  Scanning eBPF program types...
  Scanning eBPF map types...
  Scanning eBPF helper functions...
  eBPF helpers supported for program type sched_cls:
  eBPF helpers supported for program type xdp:
  Scanning miscellaneous eBPF features...
  Large program size limit is NOT available
  Bounded loop support is NOT available
  ISA extension v2 is NOT available
  ISA extension v3 is NOT available

With support for probing offload features restored:

  # bpftool feature probe dev ens4np0
  Scanning system call availability...
  bpf() syscall is available
  Scanning eBPF program types...
  eBPF program_type sched_cls is available
  eBPF program_type xdp is available
  Scanning eBPF map types...
  eBPF map_type hash is available
  eBPF map_type array is available
  Scanning eBPF helper functions...
  eBPF helpers supported for program type sched_cls:
        - bpf_map_lookup_elem
        - bpf_get_prandom_u32
        - bpf_perf_event_output
  eBPF helpers supported for program type xdp:
        - bpf_map_lookup_elem
        - bpf_get_prandom_u32
        - bpf_perf_event_output
        - bpf_xdp_adjust_head
        - bpf_xdp_adjust_tail
  Scanning miscellaneous eBPF features...
  Large program size limit is NOT available
  Bounded loop support is NOT available
  ISA extension v2 is NOT available
  ISA extension v3 is NOT available

Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220310121846.921256-1-niklas.soderlund@corigine.com

2022-03-10  Merge branch 'Add support for transmitting packets using XDP in bpf_prog_run()'  (Alexei Starovoitov, 14 files changed, -105/+821)

Toke Høiland-Jørgensen says:

====================
This series adds support for transmitting packets using XDP in bpf_prog_run(), by enabling a new "live packet" mode which will handle the XDP program return codes and redirect the packets to the stack or other devices.

The primary use case for this is testing the redirect map types and the ndo_xdp_xmit driver operation without an external traffic generator. But it turns out to also be useful for creating a programmable traffic generator in XDP, as well as for injecting frames into the stack. A sample traffic generator, which was included in previous versions of the series but has now moved to xdp-tools, transmits up to 9 Mpps/core on my test machine.

To transmit the frames, the new mode instantiates a page_pool structure in bpf_prog_run() and initialises the pages to contain XDP frames with the data passed in by userspace. These frames can then be handled as though they came from the hardware XDP path, and the existing page_pool code takes care of returning and recycling them. The setup is optimised for high performance with a high number of repetitions to support stress testing and the traffic generator use case; see patch 1 for details.

v11:
- Fix override of return code in xdp_test_run_batch()
- Add Martin's ACKs to remaining patches

v10:
- Only propagate memory allocation errors from xdp_test_run_batch()
- Get rid of BPF_F_TEST_XDP_RESERVED; batch_size can be used to probe
- Check that batch_size is unset in non-XDP test_run funcs
- Lower the number of repetitions in the selftest to 10k
- Count number of recycled pages in the selftest
- Fix a few other nits from Martin, carry forward ACKs

v9:
- XDP_DROP packets in the selftest to ensure pages are recycled
- Fix a few issues reported by the kernel test robot
- Rewrite the documentation of the batch size to make it a bit clearer
- Rebase to newest bpf-next

v8:
- Make the batch size configurable from userspace
- Don't interrupt the packet loop on errors in do_redirect (this can be caught from the tracepoint)
- Add documentation of the feature
- Add reserved flag userspace can use to probe for support (kernel didn't check flags previously)
- Rebase to newest bpf-next, disallow live mode for jumbo frames

v7:
- Extend the local_bh_disable() to cover the full test run loop, to prevent running concurrently with the softirq. Fixes a deadlock with veth xmit.
- Reinstate the forwarding sysctl setting in the selftest, and bump up the number of packets being transmitted to trigger the above bug.
- Update commit message to make it clear that user space can select the ingress interface.

v6:
- Fix meta vs data pointer setting and add a selftest for it
- Add local_bh_disable() around code passing packets up the stack
- Create a new netns for the selftest and use a TC program instead of the forwarding hack to count packets being XDP_PASS'ed from the test prog.
- Check for the correct ingress ifindex in the selftest
- Rebase and drop patches 1-5 that were already merged

v5:
- Rebase to current bpf-next

v4:
- Fix a few code style issues (Alexei)
- Also handle the other return codes: XDP_PASS builds skbs and injects them into the stack, and XDP_TX is turned into a redirect out the same interface (Alexei).
- Drop the last patch adding an xdp_trafficgen program to samples/bpf; this will live in xdp-tools instead (Alexei).
- Add a separate bpf_test_run_xdp_live() function to test_run.c instead of entangling the new mode in the existing bpf_test_run().

v3:
- Reorder patches to make sure they all build individually (Patchwork)
- Remove a couple of unused variables (Patchwork)
- Remove unlikely() annotation in slow path and add back John's ACK that I accidentally dropped for v2 (John)

v2:
- Split up __xdp_do_redirect to avoid passing two pointers to it (John)
- Always reset context pointers before each test run (John)
- Use get_mac_addr() from xdp_sample_user.h instead of rolling our own (Kumar)
- Fix wrong offset for metadata pointer
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

2022-03-10  selftests/bpf: Add selftest for XDP_REDIRECT in BPF_PROG_RUN  (Toke Høiland-Jørgensen, 2 files changed, -0/+277)

This adds a selftest for the XDP_REDIRECT facility in BPF_PROG_RUN, which redirects packets into a veth and counts them using an XDP program on the other side of the veth pair and a TC program on the local side of the veth.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-6-toke@redhat.com

2022-03-10  selftests/bpf: Move open_netns() and close_netns() into network_helpers.c  (Toke Høiland-Jørgensen, 3 files changed, -89/+95)

These will also be used by the xdp_do_redirect test being added in the next commit.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-5-toke@redhat.com

2022-03-10  libbpf: Support batch_size option to bpf_prog_test_run  (Toke Høiland-Jørgensen, 2 files changed, -1/+3)

Add support for setting the new batch_size parameter to BPF_PROG_TEST_RUN to libbpf; just add it as an option and pass it through to the kernel.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-4-toke@redhat.com

2022-03-10  Documentation/bpf: Add documentation for BPF_PROG_RUN  (Toke Høiland-Jørgensen, 2 files changed, -0/+118)

This adds documentation for the BPF_PROG_RUN command: a short overview of the command itself, and a more verbose description of the "live packet" mode for XDP introduced in the previous commit.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-3-toke@redhat.com

2022-03-10bpf: Add "live packet" mode for XDP in BPF_PROG_RUNToke Høiland-Jørgensen5-15/+328
This adds support for running XDP programs through BPF_PROG_RUN in a mode that enables live packet processing of the resulting frames. Previous uses of BPF_PROG_RUN for XDP returned the XDP program return code and the modified packet data to userspace, which is useful for unit testing of XDP programs. The existing BPF_PROG_RUN for XDP allows userspace to set the ingress ifindex and RXQ number as part of the context object being passed to the kernel. This patch reuses that code, but adds a new mode with different semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES flag. When running BPF_PROG_RUN in this mode, the XDP program return codes will be honoured: returning XDP_PASS will result in the frame being injected into the networking stack as if it came from the selected networking interface, while returning XDP_TX and XDP_REDIRECT will result in the frame being transmitted out that interface. XDP_TX is translated into an XDP_REDIRECT operation to the same interface, since the real XDP_TX action is only possible from within the network drivers themselves, not from the process context where BPF_PROG_RUN is executed. Internally, this new mode of operation creates a page pool instance while setting up the test run, and feeds pages from that into the XDP program. The setup cost of this is amortised over the number of repetitions specified by userspace. To support the performance testing use case, we further optimise the setup step so that all pages in the pool are pre-initialised with the packet data, and pre-computed context and xdp_frame objects stored at the start of each page. This makes it possible to entirely avoid touching the page content on each XDP program invocation, and enables sending up to 9 Mpps/core on my test box. Because the data pages are recycled by the page pool, and the test runner doesn't re-initialise them for each run, subsequent invocations of the XDP program will see the packet data in the state it was after the last time it ran on that particular page. This means that an XDP program that modifies the packet before redirecting it has to be careful about which assumptions it makes about the packet content, but that is only an issue for the most naively written programs. Enabling the new flag is only allowed when not setting ctx_out and data_out in the test specification, since using it means frames will be redirected somewhere else, so they can't be returned. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220309105346.100053-2-toke@redhat.com
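As a hedged usage sketch (the repetition count, batch size, and wrapper function are illustrative choices, not prescriptive), a userspace runner might drive the new mode like this:

  #include <bpf/bpf.h>
  #include <linux/bpf.h>

  /* Sketch: run an XDP prog over live frames via BPF_PROG_RUN. */
  static int run_live(int prog_fd, void *pkt, __u32 pkt_len, int ifindex)
  {
          struct xdp_md ctx_in = {
                  .data_end = pkt_len,            /* must match data_size_in */
                  .ingress_ifindex = ifindex,     /* frames "arrive" here */
          };
          LIBBPF_OPTS(bpf_test_run_opts, opts,
                  .data_in = pkt,
                  .data_size_in = pkt_len,
                  .ctx_in = &ctx_in,
                  .ctx_size_in = sizeof(ctx_in),
                  .repeat = 1 << 20,                      /* amortise setup cost */
                  .flags = BPF_F_TEST_XDP_LIVE_FRAMES,    /* honour XDP verdicts */
                  .batch_size = 64,
          );

          /* No data_out/ctx_out: redirected frames can't be returned. */
          return bpf_prog_test_run_opts(prog_fd, &opts);
  }
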
2022-03-09  Merge branch 'BPF test_progs tests improvement'  (Andrii Nakryiko, 6 files changed, -22/+35)

Mykola Lysenko says:

====================
The first patch reduces the sample_freq to 1000 to ensure the tests work even when kernel.perf_event_max_sample_rate was reduced to 1000.

The patches for send_signal and find_vma tune the test implementation to make sure the needed thread is scheduled. Also, both tests will finish as soon as possible after the test condition is met.
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

2022-03-09  Improve stability of find_vma BPF test  (Mykola Lysenko, 1 file changed, -9/+19)

Remove the unneeded sleep and increase the length of the dummy CPU-intensive computation to guarantee test process execution. Also, complete the aforementioned computation as soon as the test success criteria are met.

Signed-off-by: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220308200449.1757478-4-mykolal@fb.com

2022-03-09  Improve send_signal BPF test stability  (Mykola Lysenko, 2 files changed, -8/+11)

Substitute the sleep with a dummy CPU-intensive computation. Finish the aforementioned computation as soon as the signal is delivered to the test process. Make the BPF code execute only when the PID global variable is set.

Signed-off-by: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220308200449.1757478-3-mykolal@fb.com

2022-03-09  Improve perf related BPF tests (sample_freq issue)  (Mykola Lysenko, 4 files changed, -5/+5)

The Linux kernel may automatically reduce the kernel.perf_event_max_sample_rate value when running tests in parallel on slow systems. The kernel checks against this limit when opening a perf event with the freq=1 parameter set. The lower bound is 1000.

This patch reduces the sample_freq value to 1000 in all BPF tests that use sample_freq, to ensure they can always open the perf event.

Signed-off-by: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220308200449.1757478-2-mykolal@fb.com

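A sketch of the attribute setup the fixed tests rely on; the event type/config and function name are illustrative, the sample_freq = 1000 choice is the point of the patch.

  #include <string.h>
  #include <linux/perf_event.h>

  static void init_sampling_attr(struct perf_event_attr *attr)
  {
          memset(attr, 0, sizeof(*attr));
          attr->size = sizeof(*attr);
          attr->type = PERF_TYPE_HARDWARE;
          attr->config = PERF_COUNT_HW_CPU_CYCLES;
          attr->freq = 1;
          /* 1000 is the lower bound the kernel may clamp
           * kernel.perf_event_max_sample_rate down to, so opening
           * with this sample_freq cannot hit the limit.
           */
          attr->sample_freq = 1000;
  }
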
2022-03-09  tools: Fix unavoidable GCC call in Clang builds  (Adrian Ratiu, 1 file changed, -0/+4)

In ChromeOS and Gentoo we catch any unwanted mixed Clang/LLVM and GCC/binutils usage via toolchain wrappers which fail builds. This has revealed that GCC is called unconditionally in Clang-configured builds to populate GCC_TOOLCHAIN_DIR.

Allow the user to override CLANG_CROSS_FLAGS to avoid the GCC call - in our case we set the var directly in the ebuild recipe.

In theory Clang could be able to autodetect these settings so this logic could be removed entirely, but in practice, as commit cebdb7374577 ("tools: Help cross-building with clang") mentions, this does not always work, so giving distributions more control to specify their flags & sysroot is beneficial.

Suggested-by: Manoj Gupta <manojgupta@chromium.com>
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/lkml/87czjk4osi.fsf@ryzen9.i-did-not-set--mail-host-address--so-tickle-me
Link: https://lore.kernel.org/bpf/20220308121428.81735-1-adrian.ratiu@collabora.com

2022-03-08  selftests/bpf: Make test_lwt_ip_encap more stable and faster  (Felix Maurer, 1 file changed, -1/+9)

In test_lwt_ip_encap, the ingress IPv6 encap test failed from time to time. The failure occurred when an IPv4 ping through the IPv6 GRE encapsulation did not receive a reply within the timeout. The IPv4 ping and the IPv6 ping in the test used different timeouts (1 sec for IPv4 and 6 sec for IPv6), probably taking into account that IPv6 might need longer to successfully complete. However, when IPv4 pings (with the short timeout) are encapsulated into the IPv6 tunnel, the delays of IPv6 apply.

The actual reason for the long delays with IPv6 was that the IPv6 neighbor discovery sometimes did not complete in time. This was caused by the outgoing interface only having a tentative link-local address, i.e., not having completed DAD for that lladdr. The ND was successfully retried after 1 sec, but that was too late for the ping timeout.

The IPv6 addresses for the test were already added with nodad. However, for the lladdrs, DAD was still performed. We now disable DAD in the test netns completely and just assume that the two lladdrs on each veth pair do not collide. This removes all the delays for IPv6 traffic in the test.

Without the delays, we can now also reduce the timeout of the IPv6 ping to 1 sec. This makes the whole test complete faster, because we don't need to wait for the excessive timeout for each IPv6 ping that is supposed to fail.

Fixes: 0fde56e4385b0 ("selftests: bpf: add test_lwt_ip_encap selftest")
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/4987d549d48b4e316cd5b3936de69c8d4bc75a4f.1646305899.git.fmaurer@redhat.com

2022-03-08  bpf: Determine buf_info inside check_buffer_access()  (Shung-Hsi Yu, 1 file changed, -9/+3)

Instead of determining the buf_info string in the caller of check_buffer_access(), we can determine whether the register type is read-only through the type_is_rdonly_mem() helper inside check_buffer_access() and construct buf_info there, making the code slightly cleaner.

Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/YiWYLnAkEZXBP/gH@syu-laptop

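The idea in miniature; this is a simplified analogue under assumed names, not the verifier code itself. The verifier's actual type_is_rdonly_mem() helper tests the MEM_RDONLY flag in the same way.

  /* MEM_RDONLY stands in for the verifier's type flag. */
  #define MEM_RDONLY (1U << 0)

  static const char *buf_info(unsigned int reg_type)
  {
          /* Derived where it is used, instead of being passed in by
           * every caller of check_buffer_access().
           */
          return (reg_type & MEM_RDONLY) ? "rdonly" : "rdwr";
  }
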
2022-03-08  bpf/docs: Update list of architectures supported.  (KP Singh, 1 file changed, -1/+1)

vmtest.sh also supports s390x now.

Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220307133048.1287644-2-kpsingh@kernel.org

2022-03-08  bpf/docs: Update vmtest docs for static linking  (KP Singh, 1 file changed, -0/+8)

Dynamic linking when compiling on the host can cause issues when the libc version does not match the one in the VM image. Update the docs to explain how to link the test runners statically instead.

Before:

  ./vmtest.sh -- ./test_progs -t test_ima
  ./test_progs: /usr/lib/libc.so.6: version `GLIBC_2.33' not found (required by ./test_progs)

After:

  LDLIBS=-static ./vmtest.sh -- ./test_progs -t test_ima
  test_ima:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Reported-by: "Geyslan G. Bem" <geyslan@gmail.com>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220307133048.1287644-1-kpsingh@kernel.org

2022-03-08  bpf: Remove redundant slash  (Yuntao Wang, 1 file changed, -3/+2)

The trailing slash of LIBBPF_SRCS is redundant; remove it. Also inline it, as it's only used in LIBBPF_INCLUDE.

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220305161013.361646-1-ytcoode@gmail.com

2022-03-08  libbpf: Fix array_size.cocci warning  (Guo Zhengkui, 2 files changed, -3/+4)

Fix the following coccicheck warnings:

  tools/lib/bpf/bpf.c:114:31-32: WARNING: Use ARRAY_SIZE
  tools/lib/bpf/xsk.c:484:34-35: WARNING: Use ARRAY_SIZE
  tools/lib/bpf/xsk.c:485:35-36: WARNING: Use ARRAY_SIZE

It has been tested with gcc (Debian 8.3.0-6) 8.3.0 on x86_64.

Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220306023426.19324-1-guozhengkui@vivo.com

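The substitution in miniature, as a self-contained sketch. The local ARRAY_SIZE define below assumes the usual definition; the array name is hypothetical.

  #include <stdio.h>

  #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

  int main(void)
  {
          int fds[16];

          /* before: open-coded element count */
          size_t n_open  = sizeof(fds) / sizeof(fds[0]);
          /* after: the macro states the intent */
          size_t n_macro = ARRAY_SIZE(fds);

          printf("%zu %zu\n", n_open, n_macro); /* 16 16 */
          return 0;
  }
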
2022-03-08  bpf: Replace strncpy() with strscpy()  (Yuntao Wang, 1 file changed, -7/+2)

Using strncpy() on NUL-terminated strings is considered deprecated [1]. Moreover, if the length of 'task->comm' is less than the destination buffer size, strncpy() will NUL-pad the destination buffer, which is a needless performance penalty. Replacing strncpy() with strscpy() fixes all these issues.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings

Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220304070408.233658-1-ytcoode@gmail.com

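A sketch of the replacement pattern; the wrapper function is hypothetical, the task->comm source and strscpy() semantics are from the source and the deprecation doc it cites.

  #include <linux/sched.h>
  #include <linux/string.h>

  static void example_copy_comm(struct task_struct *task,
                                char buf[TASK_COMM_LEN])
  {
          /* strncpy() would NUL-pad the whole buffer and is not
           * guaranteed to terminate it; strscpy() copies at most
           * TASK_COMM_LEN - 1 bytes, always terminates, and returns
           * -E2BIG on truncation.
           */
          strscpy(buf, task->comm, TASK_COMM_LEN);
  }
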
2022-03-08  libbpf: Unmap rings when umem deleted  (lic121, 1 file changed, -0/+11)

xsk_umem__create() does mmap for the fill/comp rings, but xsk_umem__delete() doesn't do the unmap. This works fine for regular cases, because xsk_socket__delete() does the unmap for the rings. But for the case that xsk_socket__create_shared() fails, the umem rings are not unmapped.

fill_save/comp_save are checked to determine if the rings have already been unmapped by xsk. If fill_save and comp_save are NULL, it means that the rings have already been used by xsk; then they are supposed to be unmapped by xsk_socket__delete(). Otherwise, xsk_umem__delete() does the unmap.

Fixes: 2f6324a3937f ("libbpf: Support shared umems between queues and devices")
Signed-off-by: Cheng Li <lic121@chinatelecom.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220301132623.GA19995@vscode.7~

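A sketch of the failure path this fixes. The interface name, queue id, and NULL configs are illustrative; the create/delete pairing follows the commit description.

  #include <bpf/xsk.h>

  static int example_setup(void *area, __u64 size)
  {
          struct xsk_ring_prod fill, tx;
          struct xsk_ring_cons comp, rx;
          struct xsk_umem *umem = NULL;
          struct xsk_socket *xsk = NULL;
          int err;

          /* mmaps the fill/completion rings */
          err = xsk_umem__create(&umem, area, size, &fill, &comp, NULL);
          if (err)
                  return err;

          err = xsk_socket__create_shared(&xsk, "eth0", 0, umem,
                                          &rx, &tx, &fill, &comp, NULL);
          if (err)
                  /* xsk_socket__delete() never runs on this path; with
                   * this fix, deleting the umem also munmaps the rings
                   */
                  xsk_umem__delete(umem);

          return err;
  }
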
2022-03-06  Merge branch 'bpf: add __percpu tagging in vmlinux BTF'  (Alexei Starovoitov, 7 files changed, -44/+253)

Hao Luo says:

====================
This patchset is very similar to Yonghong's patchset on adding __user tagging [1], where a "user" btf_type_tag was introduced to describe __user memory pointers. A similar approach can be applied to __percpu pointers. The __percpu attribute in the kernel is used to identify pointers that point to memory allocated in the percpu region. Normally, accessing __percpu memory requires using special functions like per_cpu_ptr() etc. Directly accessing a __percpu pointer is meaningless.

Currently vmlinux BTF does not have a way to differentiate a __percpu pointer from a regular pointer. So BPF programs are allowed to load __percpu memory directly, which is incorrect behavior. With the previous work that encodes __user information in BTF, a nice framework has been set up to allow us to encode __percpu information in BTF and let the verifier reject programs that try to directly access a percpu pointer.

Previously, there was a PTR_TO_PERCPU_BTF_ID reg type which was used to represent percpu static variables in the kernel. Pahole is able to collect variables that are stored in the ".data..percpu" section in the kernel image and emit BTF information for those variables. The bpf_per_cpu_ptr() and bpf_this_cpu_ptr() helper functions were added to access these variables. Now with __percpu information, we can tag __percpu fields in a struct (such as cgroup->rstat_cpu) and allow the pair of bpf percpu helpers to access them as well.

In addition to adding __percpu tagging, this patchset also fixes a harmless bug in the previous patch that introduced __user. Patch 01/04 is for that. Patch 02/04 adds the new attribute "percpu". Patch 03/04 adds the MEM_PERCPU tag for PTR_TO_BTF_ID and replaces PTR_TO_PERCPU_BTF_ID with (BTF_ID | MEM_PERCPU). Patch 04/04 refactors the btf_tag test a bit and adds tests for the percpu tag.

Like [1], the minimal requirements for btf_type_tag are clang (>= clang14) and pahole (>= 1.23).

[1] https://lore.kernel.org/bpf/20211220015110.3rqxk5qwub3pa2gh@ast-mbp.dhcp.thefacebook.com/t/
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

2022-03-06  selftests/bpf: Add a test for btf_type_tag "percpu"  (Hao Luo, 3 files changed, -29/+215)

Add a test for the percpu btf_type_tag. Similar to the "user" tag, we test the following cases:

 1. __percpu struct field.
 2. __percpu as function parameter.
 3. per_cpu_ptr() accepts dynamically allocated __percpu memory.

Because the test for "user" and the test for "percpu" are very similar, a little bit of refactoring has been done in btf_tag.c. Basically, both tests share the same function for loading vmlinux and module btf.

Example output from the log:

  > ./test_progs -v -t btf_tag
  libbpf: prog 'test_percpu1': BPF program load failed: Permission denied
  libbpf: prog 'test_percpu1': -- BEGIN PROG LOAD LOG --
  ...
  ; g = arg->a;
  1: (61) r1 = *(u32 *)(r1 +0)
  R1 is ptr_bpf_testmod_btf_type_tag_1 access percpu memory: off=0
  ...
  test_btf_type_tag_mod_percpu:PASS:btf_type_tag_percpu 0 nsec
  #26/6 btf_tag/btf_type_tag_percpu_mod1:OK

  libbpf: prog 'test_percpu2': BPF program load failed: Permission denied
  libbpf: prog 'test_percpu2': -- BEGIN PROG LOAD LOG --
  ...
  ; g = arg->p->a;
  2: (61) r1 = *(u32 *)(r1 +0)
  R1 is ptr_bpf_testmod_btf_type_tag_1 access percpu memory: off=0
  ...
  test_btf_type_tag_mod_percpu:PASS:btf_type_tag_percpu 0 nsec
  #26/7 btf_tag/btf_type_tag_percpu_mod2:OK

  libbpf: prog 'test_percpu_load': BPF program load failed: Permission denied
  libbpf: prog 'test_percpu_load': -- BEGIN PROG LOAD LOG --
  ...
  ; g = (__u64)cgrp->rstat_cpu->updated_children;
  2: (79) r1 = *(u64 *)(r1 +48)
  R1 is ptr_cgroup_rstat_cpu access percpu memory: off=48
  ...
  test_btf_type_tag_vmlinux_percpu:PASS:btf_type_tag_percpu_load 0 nsec
  #26/8 btf_tag/btf_type_tag_percpu_vmlinux_load:OK

  load_btfs:PASS:could not load vmlinux BTF 0 nsec
  test_btf_type_tag_vmlinux_percpu:PASS:btf_type_tag_percpu 0 nsec
  test_btf_type_tag_vmlinux_percpu:PASS:btf_type_tag_percpu_helper 0 nsec
  #26/9 btf_tag/btf_type_tag_percpu_vmlinux_helper:OK

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220304191657.981240-5-haoluo@google.com

2022-03-06  bpf: Reject programs that try to load __percpu memory.  (Hao Luo, 3 files changed, -13/+30)

With the introduction of the btf_type_tag "percpu", we can add MEM_PERCPU to identify those pointers that point to percpu memory. The ability to differentiate percpu pointers from regular memory pointers has two benefits:

 1. It forbids unexpected use of percpu pointers, such as direct loads. In the kernel, there are special functions used for accessing percpu memory. Directly loading percpu memory is meaningless. We already have BPF helpers like bpf_per_cpu_ptr() and bpf_this_cpu_ptr() that wrap the kernel percpu functions. So we can now convert percpu pointers into regular pointers in a safe way.

 2. Previously, bpf_per_cpu_ptr() and bpf_this_cpu_ptr() only worked on PTR_TO_PERCPU_BTF_ID, a special reg_type which describes static percpu variables in the kernel (we rely on pahole to encode them into vmlinux BTF). Now, since we can identify __percpu tagged pointers, we can also identify dynamically allocated percpu memory. It means we can use bpf_xxx_cpu_ptr() on dynamic percpu memory. This is very convenient when accessing fields like "cgroup->rstat_cpu". See the sketch below for the intended access pattern.

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220304191657.981240-4-haoluo@google.com

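A hedged sketch of the access pattern this enables. The attach point and cpu index are illustrative assumptions; the cgrp->rstat_cpu field and the bpf_per_cpu_ptr() helper are named in the commit.

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  SEC("fentry/cgroup_rstat_flush") /* illustrative attach point */
  int BPF_PROG(rstat_probe, struct cgroup *cgrp)
  {
          struct cgroup_rstat_cpu *rstat;

          /* A direct load through cgrp->rstat_cpu is now rejected,
           * since the field is tagged __percpu in BTF; it must be
           * converted first.
           */
          rstat = bpf_per_cpu_ptr(cgrp->rstat_cpu, 0);
          if (rstat) {
                  /* a regular pointer now; loads are allowed */
          }
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";
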
2022-03-06  compiler_types: Define __percpu as __attribute__((btf_type_tag("percpu")))  (Hao Luo, 1 file changed, -1/+6)

This is similar to commit 7472d5a642c9 ("compiler_types: define __user as __attribute__((btf_type_tag("user")))"), where a type tag "user" was introduced to identify the pointers that point to user memory. With that change, the newest compile toolchain can encode __user information into vmlinux BTF, which can be used by the BPF verifier to enforce safe program behaviors.

Similarly, we have the __percpu attribute, which is mainly used to indicate memory is allocated in the percpu region. The __percpu pointers in the kernel are supposed to be used together with functions like per_cpu_ptr() and this_cpu_ptr(), which perform necessary calculation on the pointer's base address. Without the btf_type_tag introduced in this patch, __percpu pointers will be treated as regular memory pointers in vmlinux BTF, and BPF programs are allowed to directly dereference them, generating incorrect behaviors. Now with the "percpu" btf_type_tag, the BPF verifier is able to differentiate __percpu pointers from regular pointers and forbids unexpected behaviors like direct load.

The following is an example similar to the one given in commit 7472d5a642c9:

  [$ ~] cat test.c
  #define __percpu __attribute__((btf_type_tag("percpu")))
  int foo(int __percpu *arg) {
          return *arg;
  }
  [$ ~] clang -O2 -g -c test.c
  [$ ~] pahole -JV test.o
  ...
  File test.o:
  [1] INT int size=4 nr_bits=32 encoding=SIGNED
  [2] TYPE_TAG percpu type_id=1
  [3] PTR (anon) type_id=2
  [4] FUNC_PROTO (anon) return=1 args=(3 arg)
  [5] FUNC foo type_id=4
  [$ ~]

For the function argument "int __percpu *arg", its type is described as

  PTR -> TYPE_TAG(percpu) -> INT

The kernel can use this information for bpf verification or other use cases.

Like commit 7472d5a642c9, this feature requires clang (>= clang14) and pahole (>= 1.23).

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220304191657.981240-3-haoluo@google.com

2022-03-06  bpf: Fix checking PTR_TO_BTF_ID in check_mem_access  (Hao Luo, 1 file changed, -1/+2)

With the introduction of MEM_USER in commit c6f1bfe89ac9 ("bpf: reject program if a __user tagged memory accessed in kernel way"), PTR_TO_BTF_ID can be combined with a MEM_USER tag. Therefore, most likely, when we compare reg_type against PTR_TO_BTF_ID, we want to use the reg's base_type. Previously, the check in check_mem_access() wants to say: if the reg is BTF_ID but not NULL, the execution flow falls into the 'then' branch. But now a reg of (BTF_ID | MEM_USER), which should go into the 'then' branch, goes into the 'else'.

The end results before and after this patch are the same: regs tagged with MEM_USER get rejected, but not in the way we intended. So fix the condition; the error message is now correct.

Before (log from commit 696c39011538):

  $ ./test_progs -v -n 22/3
  ...
  libbpf: prog 'test_user1': BPF program load failed: Permission denied
  libbpf: prog 'test_user1': -- BEGIN PROG LOAD LOG --
  R1 type=ctx expected=fp
  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  ; int BPF_PROG(test_user1, struct bpf_testmod_btf_type_tag_1 *arg)
  0: (79) r1 = *(u64 *)(r1 +0)
  func 'bpf_testmod_test_btf_type_tag_user_1' arg0 has btf_id 136561 type STRUCT 'bpf_testmod_btf_type_tag_1'
  1: R1_w=user_ptr_bpf_testmod_btf_type_tag_1(id=0,off=0,imm=0)
  ; g = arg->a;
  1: (61) r1 = *(u32 *)(r1 +0)
  R1 invalid mem access 'user_ptr_'

Now:

  libbpf: prog 'test_user1': BPF program load failed: Permission denied
  libbpf: prog 'test_user1': -- BEGIN PROG LOAD LOG --
  R1 type=ctx expected=fp
  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  ; int BPF_PROG(test_user1, struct bpf_testmod_btf_type_tag_1 *arg)
  0: (79) r1 = *(u64 *)(r1 +0)
  func 'bpf_testmod_test_btf_type_tag_user_1' arg0 has btf_id 104036 type STRUCT 'bpf_testmod_btf_type_tag_1'
  1: R1_w=user_ptr_bpf_testmod_btf_type_tag_1(id=0,ref_obj_id=0,off=0,imm=0)
  ; g = arg->a;
  1: (61) r1 = *(u32 *)(r1 +0)
  R1 is ptr_bpf_testmod_btf_type_tag_1 access user memory: off=0

Note the error message for the reason of rejection.

Fixes: c6f1bfe89ac9 ("bpf: reject program if a __user tagged memory accessed in kernel way")
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220304191657.981240-2-haoluo@google.com

2022-03-06  Merge branch 'Fixes for bad PTR_TO_BTF_ID offset'  (Alexei Starovoitov, 11 files changed, -53/+230)

Kumar Kartikeya Dwivedi says:

====================
This set fixes a bug related to bad var_off being permitted for kfunc calls in the case of PTR_TO_BTF_ID, consolidates offset checks for all register types allowed as helper or kfunc arguments into a common shared helper, and introduces a couple of other checks to harden the kfunc release logic and prevent future bugs. Some selftests are also included that fail in the absence of these fixes, serving as a demonstration of the issues being fixed.

Changelog:
----------
v3 -> v4:
v3: https://lore.kernel.org/bpf/20220304000508.2904128-1-memxor@gmail.com
 * Update commit message for __diag patch to say clang instead of LLVM (Nathan)
 * Address nits for check_func_arg_reg_off (Martin)
 * Add comment for fixed_off_ok case, remove is_kfunc check (Martin)

v2 -> v3:
v2: https://lore.kernel.org/bpf/20220303045029.2645297-1-memxor@gmail.com
 * Add my SoB to __diag for clang patch (Nathan)

v1 -> v2:
v1: https://lore.kernel.org/bpf/20220301065745.1634848-1-memxor@gmail.com
 * Put reg->off check for release kfunc inside check_func_arg_reg_off, make the check a bit more readable
 * Squash verifier selftests errstr update into patch 3 for bisect (Alexei)
 * Include fix from Nathan for clang warning about missing prototypes
 * Add unified __diag_ignore_all that works for both GCC/LLVM (Alexei)

Older discussion:
Link: https://lore.kernel.org/bpf/20220219113744.1852259-1-memxor@gmail.com

Kumar Kartikeya Dwivedi (7):
  bpf: Add check_func_arg_reg_off function
  bpf: Fix PTR_TO_BTF_ID var_off check
  bpf: Disallow negative offset in check_ptr_off_reg
  bpf: Harden register offset checks for release helpers and kfuncs
  compiler_types.h: Add unified __diag_ignore_all for GCC/LLVM
  bpf: Replace __diag_ignore with unified __diag_ignore_all
  selftests/bpf: Add tests for kfunc register offset checks
====================

Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

2022-03-06  selftests/bpf: Add tests for kfunc register offset checks  (Kumar Kartikeya Dwivedi, 2 files changed, -0/+94)

Include a few verifier selftests that test against the problems being fixed by the previous commits, i.e. release kfuncs always require PTR_TO_BTF_ID fixed and var_off to be 0, and negative offset is not permitted and returns a helpful error message.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220304224645.3677453-9-memxor@gmail.com

2022-03-06  bpf: Replace __diag_ignore with unified __diag_ignore_all  (Kumar Kartikeya Dwivedi, 2 files changed, -4/+5)

Currently, the -Wmissing-prototypes warning is ignored for GCC, but not clang. This leads to a clang build warning in W=1 mode. Since the flag used by both compilers is the same, we can use the unified __diag_ignore_all macro that works for all supported versions and compilers which have __diag macro support (currently GCC >= 8.0, and Clang >= 11.0).

Also add the nf_conntrack_bpf.h include to prevent a missing prototype warning for register_nf_conntrack_bpf().

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220304224645.3677453-8-memxor@gmail.com

2022-03-06  compiler_types.h: Add unified __diag_ignore_all for GCC/LLVM  (Kumar Kartikeya Dwivedi, 3 files changed, -0/+10)

Add a __diag_ignore_all macro to ignore warnings for both GCC and LLVM, without having to specify the compiler type and version. By default, GCC 8 and clang 11 are used. This will be used by the bpf subsystem to ignore the -Wmissing-prototypes warning for functions that are meant to be global functions, so that they are in vmlinux BTF but don't have a prototype.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220304224645.3677453-7-memxor@gmail.com

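A usage sketch of the new macro; the function is hypothetical, the push/ignore/pop pattern mirrors how the bpf subsystem applies it per the commit message.

  #include <linux/compiler_types.h>

  __diag_push();
  __diag_ignore_all("-Wmissing-prototypes",
                    "Global functions are meant to live in vmlinux BTF");

  /* hypothetical global function meant to be visible only via BTF */
  int bpf_example_global_func(int x)
  {
          return x + 1;
  }

  __diag_pop();
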
2022-03-06  compiler-clang.h: Add __diag infrastructure for clang  (Nathan Chancellor, 1 file changed, -0/+22)

Add __diag macros similar to those in compiler-gcc.h, so that warnings that need to be adjusted for specific cases but not globally can be ignored when building with clang.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220304224645.3677453-6-memxor@gmail.com
[ Kartikeya: wrote commit message ]

2022-03-06  bpf: Harden register offset checks for release helpers and kfuncs  (Kumar Kartikeya Dwivedi, 3 files changed, -18/+43)

Let's ensure that the PTR_TO_BTF_ID reg being passed in to release BPF helpers and kfuncs always has its offset set to 0. While not a real problem now, there's a very real possibility this will become a problem when more and more kfuncs are exposed, and more BPF helpers are added which can release PTR_TO_BTF_ID.

Previous commits already protected against non-zero var_off. One of the cases we are concerned about now is when we have a type that can be returned by e.g. an acquire kfunc:

  struct foo {
          int a;
          int b;
          struct bar bar;
  };

...and struct bar is also a type that can be returned by another acquire kfunc. Then, doing the following sequence:

  struct foo *f = bpf_get_foo(); // acquire kfunc
  if (!f)
          return 0;
  bpf_put_bar(&f->bar); // release kfunc

...would work with the current code, since btf_struct_ids_match takes reg->off into account for matching the pointer type with the release kfunc argument type, but would obviously be incorrect, and most likely lead to a kernel crash. A test has been included later to prevent regressions in this area.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220304224645.3677453-5-memxor@gmail.com

2022-03-06  bpf: Disallow negative offset in check_ptr_off_reg  (Kumar Kartikeya Dwivedi, 3 files changed, -5/+11)

check_ptr_off_reg only allows a fixed offset to be set for PTR_TO_BTF_ID, where reg->off < 0 doesn't make sense. This would shift the pointer backwards and fail later in btf_struct_ids_match or btf_struct_walk due to an out-of-bounds access (since the offset is interpreted as unsigned).

Improve the verifier by rejecting this case, with a better error message for BPF helpers and kfuncs, by putting a check inside the check_func_arg_reg_off function. Also, update the existing verifier selftests to work with the new error string.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220304224645.3677453-4-memxor@gmail.com

2022-03-06  bpf: Fix PTR_TO_BTF_ID var_off check  (Kumar Kartikeya Dwivedi, 1 file changed, -3/+6)

When kfunc support was added, check_ctx_reg was called for the PTR_TO_CTX register, but no offset checks were made for PTR_TO_BTF_ID. Only reg->off was taken into account by btf_struct_ids_match, which protected against type mismatch due to non-zero reg->off; but when reg->off was zero, a user could set the variable offset of the register and allow it to be passed to a kfunc, leading to a bad pointer being passed into the kernel.

Fix this by reusing the extracted helper check_func_arg_reg_off from the previous commit, and make one call before checking all supported register types. Since the list is maintained, any future changes will be taken into account by updating check_func_arg_reg_off. This function prevents non-zero var_off from being set for PTR_TO_BTF_ID, but still allows a fixed non-zero reg->off, which is needed for type matching to work correctly when using pointer arithmetic.

ARG_DONTCARE is passed as the arg_type, since kfuncs don't support accepting ARG_PTR_TO_ALLOC_MEM without relying on the size of the parameter type from BTF (in the case of a pointer), or using a mem, len pair. The forcing of the offset check for ARG_PTR_TO_ALLOC_MEM is done because the ringbuf helpers obtain the size from the header located at the beginning of the memory region, hence any changes to the original pointer shouldn't be allowed. In the case of kfuncs, the size is always known, either at verification time or via the length parameter, hence this forcing is not required.

Since this check will happen once already for PTR_TO_CTX, remove the check_ptr_off_reg call inside its block.

Fixes: e6ac2450d6de ("bpf: Support bpf program calling kernel function")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220304224645.3677453-3-memxor@gmail.com

2022-03-06  bpf: Add check_func_arg_reg_off function  (Kumar Kartikeya Dwivedi, 2 files changed, -28/+44)

Lift the list of register types allowed to have fixed and variable offsets when passed as helper function arguments into a common helper, so that it can be reused for kfunc checks in later commits. Keeping a common helper aids maintainability and allows us to follow the same consistent rules across helpers and kfuncs. Also, convert check_func_arg to use this function.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220304224645.3677453-2-memxor@gmail.com

2022-03-05  Merge branch 'libbpf: support custom SEC() handlers'  (Alexei Starovoitov, 6 files changed, -102/+586)

Andrii Nakryiko says:

====================
Add the ability for user applications and libraries to register custom BPF program SEC() handlers. See patch #2 for examples where this is useful.

Patch #1 does some preliminary refactoring to allow exposing program init, preload, and attach callbacks as public API. It also establishes a protocol to allow optional auto-attach behavior. This will also help the case of sometimes auto-attachable uprobes.

v4->v5:
 - API documentation improvements (Daniel);
v3->v4:
 - init_fn -> prog_setup_fn, preload_fn -> prog_prepare_load_fn (Alexei);
v2->v3:
 - moved callbacks and cookie into OPTS struct (Alan);
 - added more test scenarios (Alan);
 - address most of Alan's feedback, but kept API name;
v1->v2:
 - resubmitting due to git send-email screw up.

Cc: Alan Maguire <alan.maguire@oracle.com>
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

2022-03-05  selftests/bpf: Add custom SEC() handling selftest  (Andrii Nakryiko, 2 files changed, -0/+239)

Add a selftest validating various aspects of libbpf's handling of custom SEC() handlers. It also demonstrates how libraries can ensure very early callback registration and unregistration using __attribute__((constructor))/__attribute__((destructor)) functions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220305010129.1549719-4-andrii@kernel.org

2022-03-05  libbpf: Support custom SEC() handlers  (Andrii Nakryiko, 4 files changed, -53/+268)

Allow registering and unregistering custom handlers for BPF programs. This allows user applications and libraries to plug into libbpf's declarative SEC() definition handling logic, and makes it possible to offload complex and intricate custom logic into external libraries while still providing a great user experience.

One such example is a USDT handling library, which has a lot of code and complexity that doesn't make sense to put into libbpf directly, but where it would be really great for users to be able to specify BPF programs with something like SEC("usdt/<path-to-binary>:<usdt_provider>:<usdt_name>") and have the correct BPF program type set (BPF_PROG_TYPE_KPROBE, as it is a uprobe), and even have BPF skeleton's auto-attach logic supported.

In some cases, it might even be a good idea to override libbpf's default handling, like for SEC("perf_event") programs. With a custom library, it's possible to extend the logic to support specifying a perf event specification right there in the SEC() definition without burdening libbpf with lots of custom logic or extra library dependencies (e.g., libpfm4). With the current patch it's possible to override libbpf's SEC("perf_event") handling and specify completely custom ones.

Further, it's possible to specify a generic fallback handling for any SEC() that doesn't match any other custom or standard libbpf handlers. This allows to accommodate whatever legacy use cases there might be, if necessary.

See the doc comments for libbpf_register_prog_handler() and libbpf_unregister_prog_handler() for detailed semantics.

This patch also bumps the libbpf development version to v0.8 and adds the new APIs there.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220305010129.1549719-3-andrii@kernel.org

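A hedged sketch of the registration flow using the early-constructor pattern mentioned in the selftest commit above. The setup callback body and the exact matching semantics of the "usdt" section name are assumptions; the API names come from the commit.

  #include <linux/bpf.h>
  #include <bpf/libbpf.h>

  /* Hypothetical setup callback for SEC("usdt/...") programs. */
  static int usdt_prog_setup(struct bpf_program *prog, long cookie)
  {
          /* inspect bpf_program__section_name(prog), tweak the prog */
          return 0;
  }

  static int usdt_handler_id;

  __attribute__((constructor))
  static void register_usdt_handler(void)
  {
          LIBBPF_OPTS(libbpf_prog_handler_opts, opts,
                  .prog_setup_fn = usdt_prog_setup,
          );

          /* USDT probes are uprobes under the hood */
          usdt_handler_id = libbpf_register_prog_handler("usdt",
                                                         BPF_PROG_TYPE_KPROBE,
                                                         0, &opts);
  }

  __attribute__((destructor))
  static void unregister_usdt_handler(void)
  {
          if (usdt_handler_id > 0)
                  libbpf_unregister_prog_handler(usdt_handler_id);
  }
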