2021-02-16  bpf: Clear subreg_def for global function return values  (Ilya Leoshkevich; 1 file, -1/+2)

test_global_func4 fails on s390 as reported by Yauheni in [1].

The immediate problem is that the zext code includes the instruction whose result needs to be zero-extended into the zero-extension patchlet, and if this instruction happens to be a branch, then its delta is not adjusted. As a result, the verifier rejects the program later.

However, according to [2], as far as the verifier's algorithm is concerned, and as specified by the insn_no_def() function, branching insns do not define anything. This includes call insns, even though one might argue that they define %r0.

This means that the real problem is that zero extension kicks in at all. This happens because clear_caller_saved_regs() sets BPF_REG_0's subreg_def after global function calls. This can be fixed in many ways; this patch mimics what helper function call handling already does.

[1] https://lore.kernel.org/bpf/20200903140542.156624-1-yauheni.kaliuta@redhat.com/
[2] https://lore.kernel.org/bpf/CAADnVQ+2RPKcftZw8d+B1UwB35cpBhpF5u3OocNh90D9pETPwg@mail.gmail.com/

Fixes: 51c39bb1d5d1 ("bpf: Introduce function-by-function verification")
Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210212040408.90109-1-iii@linux.ibm.com
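A rough sketch of the idea, mirroring what the helper-call path already does (an illustration of the approach, not the verbatim patch):

  /* after a global function call, R0 is scratched anyway; also clear
   * subreg_def so the zero-extension pass emits no patchlet around
   * the call insn */
  mark_reg_unknown(env, caller->regs, BPF_REG_0);
  caller->regs[BPF_REG_0].subreg_def = DEF_NOT_SUBREG;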
2021-02-13  Merge branch 'Add support of pointer to struct in global'  (Alexei Starovoitov; 14 files, -47/+588)

Dmitrii Banshchikov says:

====================
This patchset adds support for pointers to types with a known size among global function arguments. The motivation is to overcome the limit on the maximum number of allowed arguments and to avoid tricky and suboptimal ways of passing arguments. A referenced type may contain pointers, but access via such pointers cannot currently be verified.

v2 -> v3:
- Fix reg ID generation
- Fix commit description
- Fix typo
- Fix tests

v1 -> v2:
- Allow pointer to any type with known size rather than struct only
- Allow pointer in global functions only
- Add more tests
- Fix wrapping and v1 comments
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-02-13  selftests/bpf: Add unit tests for pointers in global functions  (Dmitrii Banshchikov; 11 files, -0/+449)

test_global_func9 - check valid pointer scenarios
test_global_func10 - check that a smaller type cannot be passed as a larger one
test_global_func11 - check that a CTX pointer cannot be passed
test_global_func12 - check access to a null pointer
test_global_func13 - check access to an arbitrary pointer value
test_global_func14 - check that an opaque pointer cannot be passed
test_global_func15 - check that a variable has an unknown value after it was passed to a global function by pointer
test_global_func16 - check access to uninitialized stack memory
test_global_func_args - check read and write operations through a pointer

Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212205642.620788-5-me@ubique.spb.ru
2021-02-13  bpf: Support pointers in global func args  (Dmitrii Banshchikov; 3 files, -10/+77)

Add the ability to pass a pointer to a type with known size in the arguments of a global function. Such pointers may be used to overcome the limit on the maximum number of arguments, avoid expensive and tricky workarounds, and have multiple output arguments. A referenced type may contain pointers, but indirect access through them isn't supported.

The implementation consists of two parts. If a global function has an argument that is a pointer to a type with known size, then:

1) In btf_check_func_arg_match(): check that the corresponding register points to NULL or to a valid memory region that is large enough to contain the expected argument's type.

2) In btf_prepare_func_args(): set the corresponding register type to PTR_TO_MEM_OR_NULL and its size to the size of the expected type.

Only global functions are supported because allowing pointers for static functions might break validation. Consider the following scenario. A static function has a pointer argument. A caller passes a pointer to its stack memory. Because the callee can change the referenced memory, the verifier can no longer assume any particular slot type for the caller's stack memory, hence the slot type is changed to SLOT_MISC. If there is an operation that relies on a slot type other than SLOT_MISC, the verifier won't be able to infer the safety of the operation.

When the verifier sees a static function that has a pointer argument different from PTR_TO_CTX, it skips the arguments check and continues with "inline" validation with more information available. The operation that relies on the particular slot type now succeeds.

Because global functions were not allowed to have pointer arguments different from PTR_TO_CTX, it's not possible to break existing and valid code.

Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212205642.620788-4-me@ubique.spb.ru
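A minimal sketch of what this enables (struct, section, and function names are hypothetical; __noinline keeps the function global rather than inlined):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct event {
          int pid;
          __u64 ts;
  };

  /* Global (non-static) function: at the call site the verifier checks
   * that the argument points to NULL or to at least sizeof(struct event)
   * bytes; inside, the argument is typed PTR_TO_MEM_OR_NULL. */
  __noinline int fill_event(struct event *e)
  {
          if (!e)                 /* the NULL case must be handled */
                  return 0;
          e->pid = 1;
          e->ts = bpf_ktime_get_ns();
          return 1;
  }

  SEC("raw_tp/sched_switch")
  int prog(void *ctx)
  {
          struct event e = {};

          return fill_event(&e);  /* pointer used as an output argument */
  }

  char LICENSE[] SEC("license") = "GPL";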
2021-02-13  bpf: Extract nullable reg type conversion into a helper function  (Dmitrii Banshchikov; 1 file, -31/+52)

Extract the conversion from a register's nullable type to a type with a value. The helper will be used in mark_ptr_not_null_reg().

Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212205642.620788-3-me@ubique.spb.ru
2021-02-13  bpf: Rename bpf_reg_state variables  (Dmitrii Banshchikov; 1 file, -8/+12)

Using "reg" for an array of bpf_reg_state and "reg[i + 1]" for an individual bpf_reg_state is error-prone and verbose. Use "regs" for the former and "reg" for the latter, as other code nearby does.

Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212205642.620788-2-me@ubique.spb.ru
2021-02-13  selftests/bpf: Tests using bpf_check_mtu BPF-helper  (Jesper Dangaard Brouer; 2 files, -0/+414)

Add a selftest for the BPF-helper bpf_check_mtu(), making sure it can be used from both XDP and TC.

V16:
- Fix 'void' function definition

V11:
- Address nitpicks from Andrii Nakryiko

V10:
- Remove errno non-zero test in CHECK_ATTR()
- Address comments from Andrii Nakryiko

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/161287791989.790810.13612620012522164562.stgit@firesoul
2021-02-13  selftests/bpf: Use bpf_check_mtu in selftest test_cls_redirect  (Jesper Dangaard Brouer; 1 file, -0/+7)

This demonstrates how the bpf_check_mtu() helper can easily be used together with the bpf_skb_adjust_room() helper, prior to doing the size adjustment, as the delta argument is already set up.

Hint: this specific test can be selected like this:
  ./test_progs -t cls_redirect

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/161287791481.790810.4444271170546646080.stgit@firesoul
2021-02-13  bpf: Drop MTU check when doing TC-BPF redirect to ingress  (Jesper Dangaard Brouer; 3 files, -23/+47)

The use case for dropping the MTU check when TC-BPF does a redirect to ingress is described by Eyal Birger in an email [0]. The summary is the ability to increase the packet size (e.g. with IPv6 headers for NAT64), redirect the packet to ingress, and let the normal netstack fragment the packet as needed.

[0] https://lore.kernel.org/netdev/CAHsH6Gug-hsLGHQ6N0wtixdOa85LDZ3HNRHVd0opR=19Qo4W4Q@mail.gmail.com/

V15:
- Add missing static for function declaration

V9:
- Make net_device "up" (IFF_UP) check explicit in skb_do_redirect

V4:
- Keep net_device "up" (IFF_UP) check
- Adjustment to handle bpf_redirect_peer() helper

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/161287790971.790810.11785274340154740591.stgit@firesoul
2021-02-13  bpf: Add BPF-helper for MTU checking  (Jesper Dangaard Brouer; 3 files, -0/+264)

This BPF-helper, bpf_check_mtu(), works for both XDP and TC-BPF programs.

The SKB object is complex, and the skb->len value (accessible from a BPF-prog) also includes the length of any extra GRO/GSO segments, but without taking into account that these GRO/GSO segments get transport (L4) and network (L3) headers added before being transmitted. Thus, this BPF-helper is created so that the BPF programmer doesn't need to handle these details in the BPF-prog.

The API is designed to help BPF programmers who want to do packet context size changes, which involve other helpers. These other helpers usually do a delta size adjustment. This helper also supports a delta size (len_diff), which allows the BPF programmer to reuse the arguments needed by these other helpers and perform the MTU check prior to doing any actual size adjustment of the packet context.

It is intentional that a len adjustment is allowed to produce a negative result that will pass the MTU check. This might seem weird, but it is not this helper's responsibility to "catch" wrong len_diff adjustments. Other helpers will take care of these checks, if the BPF programmer chooses to do an actual size adjustment.

V14:
- Improve man-page desc of len_diff

V13:
- Enforce flag BPF_MTU_CHK_SEGS cannot use len_diff

V12:
- Simplify segment check that calls skb_gso_validate_network_len
- Helpers should return long

V9:
- Use dev->hard_header_len (instead of ETH_HLEN)
- Annotate with unlikely req from Daniel
- Fix logic error using skb_gso_validate_network_len from Daniel

V6:
- Took John's advice and dropped BPF_MTU_CHK_RELAX
- Returned MTU is kept at L3-level (like fib_lookup)

V4: Lots of changes
- ifindex 0 now uses current netdev for MTU lookup
- rename helper from bpf_mtu_check to bpf_check_mtu
- fix bug for GSO pkt length (as skb->len is total len)
- remove __bpf_len_adj_positive, simply allow negative len adj

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/161287790461.790810.3429728639563297353.stgit@firesoul
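A sketch of the intended usage pattern from TC (section name and encap size are illustrative):

  #include <linux/bpf.h>
  #include <linux/pkt_cls.h>
  #include <bpf/bpf_helpers.h>

  #define ENCAP_LEN 20    /* hypothetical: room we plan to add */

  SEC("tc")
  int mtu_guard(struct __sk_buff *skb)
  {
          __u32 mtu_len = 0;

          /* ifindex 0 means "current netdev"; len_diff lets the MTU be
           * checked before bpf_skb_adjust_room() is called with the
           * same delta */
          if (bpf_check_mtu(skb, 0, &mtu_len, ENCAP_LEN, 0))
                  return TC_ACT_SHOT;     /* would exceed MTU; mtu_len holds the L3 MTU */

          /* ... grow headroom with bpf_skb_adjust_room(), then encap ... */
          return TC_ACT_OK;
  }

  char LICENSE[] SEC("license") = "GPL";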
2021-02-13  bpf: bpf_fib_lookup return MTU value as output when looked up  (Jesper Dangaard Brouer; 3 files, -11/+33)

The BPF-helpers for FIB lookup (bpf_xdp_fib_lookup and bpf_skb_fib_lookup) can perform an MTU check and return BPF_FIB_LKUP_RET_FRAG_NEEDED. The BPF-prog doesn't know the MTU value that caused this rejection. If the BPF-prog wants to implement PMTU (Path MTU Discovery, RFC 1191), it needs to know this MTU value for the ICMP packet.

This patch changes the lookup and result struct bpf_fib_lookup to contain this MTU value as output, via a union with 'tot_len', as this is the value used for the MTU lookup.

V5:
- Fixed uninit value spotted by Dan Carpenter
- Name struct output member mtu_result

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/161287789952.790810.13134700381067698781.stgit@firesoul
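An illustrative sketch of reading the new output member after a failed lookup (packet parsing elided; the key would normally be filled from the parsed headers):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("xdp")
  int xdp_pmtu(struct xdp_md *ctx)
  {
          struct bpf_fib_lookup fib = {};
          long rc;

          fib.ifindex = ctx->ingress_ifindex;
          /* family, addresses and tot_len omitted for brevity */

          rc = bpf_fib_lookup(ctx, &fib, sizeof(fib), 0);
          if (rc == BPF_FIB_LKUP_RET_FRAG_NEEDED) {
                  /* mtu_result shares a union with tot_len and is set on
                   * output: the MTU to report in an ICMP frag-needed reply */
                  __u32 mtu = fib.mtu_result;

                  bpf_printk("frag needed, mtu %u", mtu);
          }
          return XDP_PASS;
  }

  char LICENSE[] SEC("license") = "GPL";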
2021-02-13  bpf: Fix bpf_fib_lookup helper MTU check for SKB ctx  (Jesper Dangaard Brouer; 1 file, -3/+10)

A BPF end-user on the Cilium Slack channel (Carlo Carraro) wants to use bpf_fib_lookup for doing an MTU check, but *prior* to extending the packet size, by adjusting fib_params 'tot_len' with the packet length plus the expected encap size (just like the bpf_check_mtu helper supports). He discovered that for the SKB ctx, param->tot_len was not used; instead skb->len was used (via the MTU check in is_skb_forwardable() that checks against the netdev MTU).

Fix this by using fib_params 'tot_len' for the MTU check. If it is not provided (e.g. zero), then keep the existing TC behaviour intact. Notice that the 'tot_len' MTU check is done like the XDP code path, which checks against the FIB-dst MTU.

V16:
- Revert V13 optimization; 2nd lookup is against egress/resulting netdev

V13:
- Only do ifindex lookup one time, calling dev_get_by_index_rcu()

V10:
- Use same method as XDP for 'tot_len' MTU check

Fixes: 4c79579b44b1 ("bpf: Change bpf_fib_lookup to return lookup status")
Reported-by: Carlo Carraro <colrack@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/161287789444.790810.15247494756551413508.stgit@firesoul
2021-02-13  bpf: Remove MTU check in __bpf_skb_max_len  (Jesper Dangaard Brouer; 1 file, -8/+4)

Multiple BPF-helpers that can manipulate/increase the size of the SKB use __bpf_skb_max_len() as the max length. This function limits the size against the current net_device MTU (skb->dev->mtu).

When a BPF-prog grows the packet size, it should not be limited to the MTU. The MTU is a transmit limitation, and software receiving this packet should be allowed to increase the size. Furthermore, the current MTU check in __bpf_skb_max_len uses the MTU from the ingress/current net_device, which in case of redirects uses the wrong net_device.

This patch keeps a sanity max limit of SKB_MAX_ALLOC (16KiB). The real limit is elsewhere in the system. Jesper's testing [1] showed it was not possible to exceed 8KiB when expanding the SKB size via a BPF-helper. The limiting factor is the define KMALLOC_MAX_CACHE_SIZE, which is 8192 for the SLUB allocator (CONFIG_SLUB) in case PAGE_SIZE is 4096. This define is in effect due to this being called from softirq context; see __gfp_pfmemalloc_flags() and __do_kmalloc_node(). Jakub's testing showed that frames above 16KiB can cause NICs to reset (but not crash). Keep this sanity limit at this level, as the memory layer can differ based on kernel config.

[1] https://github.com/xdp-project/bpf-examples/tree/master/MTU-tests

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/161287788936.790810.2937823995775097177.stgit@firesoul
2021-02-13  bpf, devmap: Use GFP_KERNEL for xdp bulk queue allocation  (Jun'ichi Nomura; 1 file, -3/+1)

The devmap bulk queue is allocated with GFP_ATOMIC, and the allocation may fail if there is no available space in the existing percpu pool.

Since commit 75ccae62cb8d42 ("xdp: Move devmap bulk queue into struct net_device") moved the bulk queue allocation to the NETDEV_REGISTER callback, whose context is allowed to sleep, use GFP_KERNEL instead of GFP_ATOMIC to let the percpu allocator extend the pool when needed and avoid possible failure of netdev registration. As the required alignment is natural, we can simply use alloc_percpu().

Fixes: 75ccae62cb8d42 ("xdp: Move devmap bulk queue into struct net_device")
Signed-off-by: Jun'ichi Nomura <junichi.nomura@nec.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210209082451.GA44021@jeru.linux.bs1.fc.nec.co.jp
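The change boils down to something like the following (a sketch of the devmap NETDEV_REGISTER notifier path, not the verbatim diff):

  /* process context: a sleeping GFP_KERNEL allocation lets the percpu
   * allocator grow its pool instead of failing as GFP_ATOMIC could */
  netdev->xdp_bulkq = alloc_percpu(struct xdp_dev_bulk_queue);
  if (!netdev->xdp_bulkq)
          return NOTIFY_BAD;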
2021-02-13  bpf: Fix an uninitialized value in bpf_iter  (Yonghong Song; 1 file, -1/+1)

Commit 15d83c4d7cef ("bpf: Allow loading of a bpf_iter program") cached btf_id in struct bpf_iter_target_info so that later on it can be checked cheaply compared to checking registered names. syzbot found a bug where an uninitialized value may occur in bpf_iter_target_info->btf_id. This is because we allocated the bpf_iter_target_info structure with kmalloc and never initialized the field btf_id afterwards. This uninitialized btf_id is typically compared to a u32 bpf program func proto btf_id, and the chance of being equal is extremely slim.

This patch fixes the issue by using kzalloc, which will also prevent likely future instances of the same kind when new fields are added.

Fixes: 15d83c4d7cef ("bpf: Allow loading of a bpf_iter program")
Reported-by: syzbot+580f4f2a272e452d55cb@syzkaller.appspotmail.com
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210212005926.2875002-1-yhs@fb.com
2021-02-13  tools/resolve_btfids: Add /libbpf to .gitignore  (Stanislav Fomichev; 1 file, -0/+1)

This is what I see after compiling the kernel:

  # bpf-next...bpf-next/master
  ?? tools/bpf/resolve_btfids/libbpf/

Fixes: fc6b48f692f8 ("tools/resolve_btfids: Build libbpf and libsubcmd in separate directories")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212010053.668700-1-sdf@google.com
2021-02-12  Merge branch 'introduce bpf_iter for task_vma'  (Alexei Starovoitov; 5 files, -11/+444)

Song Liu says:

====================
This set introduces bpf_iter for task_vma, which can be used to generate information similar to /proc/pid/maps. Patch 4/4 adds an example that mimics /proc/pid/maps.

The current /proc/<pid>/maps and /proc/<pid>/smaps provide information on the vma's of a process. However, this information is not flexible enough to cover all use cases. For example, if a vma covers mixed 2MB pages and 4kB pages (x86_64), there is no easy way to tell which address ranges are backed by 2MB pages. task_vma solves the problem by enabling the user to generate customized information based on the vma (and vma->vm_mm, vma->vm_file, etc.).

Changes v6 => v7:
1. Let BPF iter program use bpf_d_path without specifying sleepable. (Alexei)

Changes v5 => v6:
1. Add more comments for task_vma_seq_get_next() to explain the logic of find_vma() calls. (Alexei)
2. Skip the vma found by find_vma() when both vm_start and vm_end match prev_vm_[start|end]. Previous versions only compared vm_start. IOW, if the vma [4k, 8k] is replaced by [4k, 12k] after relocking mmap_lock, v5 will skip the new vma, while v6 will process it.

Changes v4 => v5:
1. Fix a refcount leak on task_struct. (Yonghong)
2. Fix the selftest. (Yonghong)

Changes v3 => v4:
1. Avoid skipping a vma by assigning an invalid prev_vm_start in task_vma_seq_stop(). (Yonghong)
2. Move the "again" label in task_vma_seq_get_next() to save a check. (Yonghong)

Changes v2 => v3:
1. Rewrite 1/4 so that we hold mmap_lock while calling the BPF program. This enables the BPF program to access the real vma with BTF. (Alexei)
2. Fix the logic when control is returned to user space. (Yonghong)
3. Revise commit log and cover letter. (Yonghong)

Changes v1 => v2:
1. Small fixes in task_iter.c and the selftests. (Yonghong)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-02-12  selftests/bpf: Add test for bpf_iter_task_vma  (Song Liu; 3 files, -10/+174)

The test dumps information similar to /proc/pid/maps. The first line of the output is compared against the /proc file to make sure they match.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210212183107.50963-4-songliubraving@fb.com
2021-02-12  bpf: Allow bpf_d_path in bpf_iter program  (Song Liu; 1 file, -0/+4)

task_file and task_vma iter programs have access to file->f_path. Enable bpf_d_path to print the paths of these files.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210212183107.50963-3-songliubraving@fb.com
2021-02-12  bpf: Introduce task_vma bpf_iter  (Song Liu; 1 file, -1/+266)

Introduce a task_vma bpf_iter to print memory information of a process. It can be used to print customized information similar to /proc/<pid>/maps.

The current /proc/<pid>/maps and /proc/<pid>/smaps provide information on the vma's of a process. However, this information is not flexible enough to cover all use cases. For example, if a vma covers mixed 2MB pages and 4kB pages (x86_64), there is no easy way to tell which address ranges are backed by 2MB pages. task_vma solves the problem by enabling the user to generate customized information based on the vma (and vma->vm_mm, vma->vm_file, etc.).

To access the vma safely in the BPF program, the task_vma iterator holds the target mmap_lock while calling the BPF program. If the mmap_lock is contended, task_vma unlocks the mmap_lock between iterations to unblock the writer(s). This lock contention avoidance mechanism is similar to the one used in show_smaps_rollup().

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210212183107.50963-2-songliubraving@fb.com
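A minimal sketch of such an iterator program (the VM_EXEC value is copied from linux/mm.h, since vmlinux.h carries no macros; BPF_SEQ_PRINTF comes from the libbpf helper headers, and the format string is illustrative):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  #define VM_EXEC 0x00000004

  SEC("iter/task_vma")
  int proc_maps(struct bpf_iter__task_vma *ctx)
  {
          struct vm_area_struct *vma = ctx->vma;
          struct seq_file *seq = ctx->meta->seq;
          struct task_struct *task = ctx->task;

          if (!vma || !task)
                  return 0;

          /* mmap_lock is held while this runs, so vma fields are stable */
          BPF_SEQ_PRINTF(seq, "%08llx-%08llx %s\n",
                         vma->vm_start, vma->vm_end,
                         (vma->vm_flags & VM_EXEC) ? "x" : "-");
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";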
2021-02-12  bpf: selftests: Add non function pointer test to struct_ops  (Martin KaFai Lau; 1 file, -0/+1)

This patch adds a "void *owner" member. The existing bpf_tcp_ca test will ensure the bpf_cubic.o and bpf_dctcp.o can be loaded.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212021037.267278-1-kafai@fb.com
2021-02-12  libbpf: Ignore non function pointer member in struct_ops  (Martin KaFai Lau; 1 file, -11/+11)

When libbpf initializes the kernel's struct_ops in bpf_map__init_kern_struct_ops(), it enforces that all pointer types must be function pointers and rejects others. This turns out to be too strict. For example, when directly using "struct tcp_congestion_ops" from vmlinux.h, it has a "struct module *owner" member, which is set to NULL in a bpf_tcp_cc.o. Instead, libbpf only needs to ensure the member is a function pointer if it has been set (relocated) to a bpf-prog.

This patch moves the "btf_is_func_proto(kern_mtype)" check after the existing "if (!prog) { continue; }". The original debug message in "if (!prog) { continue; }" is also removed since it is no longer valid. Besides, there is a later debug message to tell which function pointer is set.

The "btf_is_func_proto(mtype)" check has already been guaranteed in "bpf_object__collect_st_ops_relos()", which runs before "bpf_map__init_kern_struct_ops()". Thus, this check is removed.

v2:
- Remove outdated debug message (Andrii). Removed because there is a later debug message to tell which function pointer is set.
- Following mtype->type is no longer needed. Remove: "skip_mods_and_typedefs(btf, mtype->type, &mtype_id)"
- Do the "if (!prog)" test before skip_mods_and_typedefs.

Fixes: 590a00888250 ("bpf: libbpf: Add STRUCT_OPS support")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212021030.266932-1-kafai@fb.com
2021-02-12  libbpf: Use AF_LOCAL instead of AF_INET in xsk.c  (Stanislav Fomichev; 1 file, -1/+1)

We have environments where usage of AF_INET is prohibited (cgroup/sock_create returns EPERM for AF_INET). Let's use AF_LOCAL instead of AF_INET; it should work perfectly with SIOCETHTOOL.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20210209221826.922940-1-sdf@google.com
2021-02-12  bpf: Fix subreg optimization for BPF_FETCH  (Ilya Leoshkevich; 1 file, -2/+21)

All 32-bit variants of BPF_FETCH (add, and, or, xor, xchg, cmpxchg) define a 32-bit subreg and thus have zext_dst set. Their encoding, however, uses the dst_reg field as a base register, which causes opt_subreg_zext_lo32_rnd_hi32() to zero-extend said base register instead of the one the insn really defines (r0 or src_reg).

Fix by properly choosing the register being defined, similar to how check_atomic() already does that.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210210204502.83429-1-iii@linux.ibm.com
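A sketch of the register-selection logic in opt_subreg_zext_lo32_rnd_hi32() (simplified from the patch, not verbatim):

  /* for 32-bit atomics with BPF_FETCH, the insn defines r0 (cmpxchg)
   * or src_reg (all other fetch variants), not the dst_reg base */
  if (BPF_CLASS(insn.code) == BPF_STX &&
      BPF_MODE(insn.code) == BPF_ATOMIC &&
      insn.imm & BPF_FETCH)
          load_reg = insn.imm == BPF_CMPXCHG ? BPF_REG_0 : insn.src_reg;
  else
          load_reg = insn.dst_reg;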
2021-02-12  docs: bpf: Clarify BPF_CMPXCHG wording  (Ilya Leoshkevich; 1 file, -2/+2)

Based on [1], BPF_CMPXCHG should always load the old value into R0. The phrasing in bpf.rst is somewhat ambiguous in this regard; improve it to make this aspect crystal clear.

[1] https://lore.kernel.org/bpf/CAADnVQJFcFwxEz=wnV=hkie-EDwa8s5JGbBQeFt1TGux1OihJw@mail.gmail.com/

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210210142853.82203-1-iii@linux.ibm.com
2021-02-12  bpf: Clear per_cpu pointers during bpf_prog_realloc  (Alexei Starovoitov; 1 file, -0/+2)

bpf_prog_realloc copies the contents of struct bpf_prog. The pointers have to be cleared before freeing the old struct.

Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
Fixes: 700d4796ef59 ("bpf: Optimize program stats")
Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-02-12  selftests/bpf: Add a selftest for the tracing bpf_get_socket_cookie  (Florent Revest; 2 files, -6/+41)

This builds on the existing socket cookie test, which checks whether the bpf_get_socket_cookie helpers provide the same value in cgroup/connect6 and sockops programs for a socket created by the userspace part of the test.

Instead of having an update_cookie sockops program tag a socket local storage with 0xFF, this uses both an update_cookie_sockops program and an update_cookie_tracing program which successively tag the socket with 0x0F and then 0xF0.

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-5-revest@chromium.org
2021-02-12  selftests/bpf: Use vmlinux.h in socket_cookie_prog.c  (Florent Revest; 1 file, -5/+6)

When migrating from bpf.h's to vmlinux.h's definition of struct bpf_sock, an interesting LLVM behavior happened: LLVM started producing two fetches of ctx->sk in the sockops program, which means that the verifier could not keep track of the NULL check on ctx->sk. Therefore, we need to extract ctx->sk into a variable before checking and dereferencing it.

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-4-revest@chromium.org
2021-02-12  selftests/bpf: Integrate the socket_cookie test to test_progs  (Florent Revest; 5 files, -213/+72)

Currently, the selftest for the BPF socket_cookie helpers is built and run independently from test_progs. It's easy to forget and hard to maintain.

This patch moves the socket cookies test into prog_tests/ and vastly simplifies its logic by:
- rewriting the loading code with BPF skeletons
- rewriting the server/client code with network helpers
- rewriting the cgroup code with test__join_cgroup
- rewriting the error handling code with CHECKs

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-3-revest@chromium.org
2021-02-12  bpf: Expose bpf_get_socket_cookie to tracing programs  (Florent Revest; 5 files, -0/+31)

This needs a new helper that:
- can work in a sleepable context (using sock_gen_cookie)
- takes a struct sock pointer and checks that it's not NULL

Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-2-revest@chromium.org
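A sketch of how a tracing program can now use the helper (attach point and program name chosen for illustration only):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  SEC("fentry/inet_shutdown")
  int BPF_PROG(trace_shutdown, struct socket *sock, int how)
  {
          /* the tracing variant takes a struct sock * and the helper
           * itself checks for NULL, so it can run in sleepable progs */
          __u64 cookie = bpf_get_socket_cookie(sock->sk);

          bpf_printk("shutdown: sk cookie %llu", cookie);
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";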
2021-02-12  bpf: Be less specific about socket cookies guarantees  (Florent Revest; 2 files, -8/+8)

Since commit 92acdc58ab11 ("bpf, net: Rework cookie generator as per-cpu one"), socket cookies are not guaranteed to be non-decreasing. The bpf_get_socket_cookie helper descriptions currently specify that cookies are non-decreasing, but we don't want users to rely on that.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-1-revest@chromium.org
2021-02-11  kbuild: Do not clean resolve_btfids if the output does not exist  (Jiri Olsa; 1 file, -1/+7)

Nathan reported an issue with cleaning an empty build directory:

  $ make -s O=build distclean
  ../../scripts/Makefile.include:4: *** \
  O=/ho...build/tools/bpf/resolve_btfids does not exist.  Stop.

The problem is that the tools scripts require an existing output directory; otherwise they fail. Add a check around the resolve_btfids clean target to ensure the output directory is in place.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/bpf/20210211124004.1144344-1-jolsa@kernel.org
2021-02-11  selftests/bpf: Add a test for map-in-map and per-cpu maps in sleepable progs  (Alexei Starovoitov; 1 file, -0/+69)

Add a basic test for map-in-map and per-cpu maps in sleepable programs.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-10-alexei.starovoitov@gmail.com
2021-02-11  bpf: Allows per-cpu maps and map-in-map in sleepable programs  (Alexei Starovoitov; 2 files, -3/+8)

Since sleepable programs now execute under migrate_disable, the per-cpu maps are safe to use. Map-in-map has been OK to use in sleepable programs from the time sleepable progs were introduced.

Note that non-preallocated maps are still not safe, since there is no rcu_read_lock yet in sleepable programs, and dynamically allocated map elements rely on rcu protection. Sleepable programs have rcu_read_lock_trace instead. That limitation will be addressed in the future.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-9-alexei.starovoitov@gmail.com
2021-02-11  selftests/bpf: Improve recursion selftest  (Alexei Starovoitov; 1 file, -0/+8)

Since the recursion_misses counter is available in bpf_prog_info, improve the selftest to make sure it's counting correctly.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210210033634.62081-8-alexei.starovoitov@gmail.com
2021-02-11  bpf: Count the number of times recursion was prevented  (Alexei Starovoitov; 6 files, -6/+33)

Add a per-program counter for the number of times the recursion prevention mechanism was triggered, and expose it via show_fdinfo and bpf_prog_info. Teach bpftool to print it.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-7-alexei.starovoitov@gmail.com
2021-02-11  selftest/bpf: Add a recursion test  (Alexei Starovoitov; 2 files, -0/+79)

Add a recursive non-sleepable fentry program as a test. All attach points where sleepable progs can execute are non-recursive so far. The recursion protection mechanism for sleepable programs cannot be activated yet.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-6-alexei.starovoitov@gmail.com
2021-02-11  bpf: Add per-program recursion prevention mechanism  (Alexei Starovoitov; 7 files, -11/+50)

Since both sleepable and non-sleepable programs execute under migrate_disable, add a recursion prevention mechanism to both types of programs when they're executed via bpf trampoline.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-5-alexei.starovoitov@gmail.com
2021-02-11  bpf: Compute program stats for sleepable programs  (Alexei Starovoitov; 3 files, -35/+42)

Since sleepable programs don't migrate from the cpu, the execution stats can be computed for them as well. Reuse the same infrastructure for both sleepable and non-sleepable programs:

run_cnt -> the number of times the program was executed.
run_time_ns -> the program execution time in nanoseconds, including the off-cpu time when the program was sleeping.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-4-alexei.starovoitov@gmail.com
2021-02-11  bpf: Run sleepable programs with migration disabled  (Alexei Starovoitov; 1 file, -0/+2)

In older non-RT kernels, migrate_disable() was the same as preempt_disable(). Since commit 74d862b682f5 ("sched: Make migrate_disable/enable() independent of RT"), migrate_disable() is real and doesn't prevent sleeping. Running sleepable programs with migration disabled makes it possible to add support for program stats and per-cpu maps later.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-3-alexei.starovoitov@gmail.com
2021-02-11  bpf: Optimize program stats  (Alexei Starovoitov; 6 files, -18/+18)

Move bpf_prog_stats from prog->aux into prog to avoid one extra load in the critical path of program execution.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-2-alexei.starovoitov@gmail.com
2021-02-11  bpf_lru_list: Read double-checked variable once without lock  (Marco Elver; 1 file, -3/+4)

For double-checked locking in bpf_common_lru_push_free(), node->type is read outside the critical section and then re-checked under the lock. However, concurrent writes to node->type result in data races. For example, the following concurrent access was observed by KCSAN:

  write to 0xffff88801521bc22 of 1 bytes by task 10038 on cpu 1:
    __bpf_lru_node_move_in  kernel/bpf/bpf_lru_list.c:91
    __local_list_flush      kernel/bpf/bpf_lru_list.c:298
    ...

  read to 0xffff88801521bc22 of 1 bytes by task 10043 on cpu 0:
    bpf_common_lru_push_free  kernel/bpf/bpf_lru_list.c:507
    bpf_lru_push_free         kernel/bpf/bpf_lru_list.c:555
    ...

Fix the data races where node->type is read outside the critical section (for double-checked locking) by marking the access with READ_ONCE() as well as ensuring the variable is only accessed once.

Fixes: 3a08c2fd7634 ("bpf: LRU List")
Reported-by: syzbot+3536db46dfa58c573458@syzkaller.appspotmail.com
Reported-by: syzbot+516acdb03d3e27d91bcd@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210209112701.3341724-1-elver@google.com
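A simplified sketch of the fixed pattern (not the full function; type and field names as in bpf_lru_list.c):

  static void push_free(struct bpf_lru *lru, struct bpf_lru_node *node)
  {
          u8 node_type = READ_ONCE(node->type);   /* racy read, done once */
          unsigned long flags;

          if (node_type == BPF_LRU_LOCAL_LIST_T) {
                  struct bpf_lru_locallist *loc_l;

                  loc_l = per_cpu_ptr(lru->common_lru.local_list, node->cpu);
                  raw_spin_lock_irqsave(&loc_l->lock, flags);
                  /* re-check under the lock before acting on the value */
                  if (node->type != BPF_LRU_LOCAL_LIST_T) {
                          raw_spin_unlock_irqrestore(&loc_l->lock, flags);
                          return; /* fall back to the global list path */
                  }
                  /* ... move the node onto the local free list ... */
                  raw_spin_unlock_irqrestore(&loc_l->lock, flags);
          }
  }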
2021-02-10  selftests/bpf: Simplify the calculation of variables  (Jiapeng Chong; 1 file, -3/+3)

Fix the following coccicheck warnings:

  ./tools/testing/selftests/bpf/xdpxceiver.c:954:28-30: WARNING !A || A && B is equivalent to !A || B.
  ./tools/testing/selftests/bpf/xdpxceiver.c:932:28-30: WARNING !A || A && B is equivalent to !A || B.
  ./tools/testing/selftests/bpf/xdpxceiver.c:909:28-30: WARNING !A || A && B is equivalent to !A || B.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1612860398-102839-1-git-send-email-jiapeng.chong@linux.alibaba.com
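The simplification is purely boolean; with hypothetical names it looks like:

  /* before: !A || (A && B) -- the A in the middle term is redundant */
  if (!opt_pkt_dump || (opt_pkt_dump && is_stats_test))
          handle_packet();

  /* after: equivalent to !A || B */
  if (!opt_pkt_dump || is_stats_test)
          handle_packet();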
2021-02-10  selftests/bpf: Fix endianness issues in atomic tests  (Ilya Leoshkevich; 3 files, -3/+3)

Atomic tests store a DW, but then load it back as a W from the same address. This doesn't work on big-endian systems, and since the point of those tests is not testing narrow loads, fix simply by loading a DW.

Fixes: 98d666d05a1d ("bpf: Add tests for new BPF atomic operations")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210020713.77911-1-iii@linux.ibm.com
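An illustrative before/after in terms of the insn macros (the actual registers and offsets in the tests may differ):

  /* before: 32-bit load of a slot that was stored as 64 bits; on
   * big-endian this reads the wrong half of the value */
  BPF_LDX_MEM(BPF_W,  BPF_REG_1, BPF_REG_10, -8),

  /* after: load width matches the BPF_ST_MEM(BPF_DW, ...) store */
  BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -8),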
2021-02-10  Merge branch 'allow variable-offset stack acces'  (Alexei Starovoitov; 14 files, -186/+742)

Andrei Matei says:

====================
Before this patch, variable offset access to the stack was disallowed for regular instructions but was allowed for "indirect" accesses (i.e. helpers). This patch removes the restriction, allowing reading and writing to the stack through stack pointers with variable offsets.

This makes stack-allocated buffers more usable in programs and brings stack pointers closer to other types of pointers. The motivation is being able to use stack-allocated buffers for data manipulation. When the stack size limit is sufficient, allocating buffers on the stack is simpler than per-cpu arrays or other alternatives.

V2 -> V3:
- var-offset writes mark all the stack slots in range as initialized, so that future reads are not rejected
- rewrote the C test to not use uprobes, as per Andrii's suggestion
- addressed other review comments from Alexei

V1 -> V2:
- add support for var-offset stack writes, in addition to reads
- add a C test
- made variable-offset direct reads no longer destroy spilled registers in the access range
- address review nits
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-02-10  selftest/bpf: Add test for var-offset stack access  (Andrei Matei; 2 files, -0/+86)

Add a higher-level test (a C BPF program) for the new functionality: variable-offset stack reads and writes.

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210207011027.676572-5-andreimatei1@gmail.com
2021-02-10  selftest/bpf: Verifier tests for var-off access  (Andrei Matei; 1 file, -2/+97)

Add tests for the new functionality: reading and writing to the stack through a variable-offset pointer.

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210207011027.676572-4-andreimatei1@gmail.com
2021-02-10  selftest/bpf: Adjust expected verifier errors  (Andrei Matei; 9 files, -37/+41)

The verifier errors around stack accesses have changed slightly in the previous commit (generally for the better).

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210207011027.676572-3-andreimatei1@gmail.com
2021-02-10  bpf: Allow variable-offset stack access  (Andrei Matei; 3 files, -147/+518)

Before this patch, variable offset access to the stack was disallowed for regular instructions but was allowed for "indirect" accesses (i.e. helpers). This patch removes the restriction, allowing reading and writing to the stack through stack pointers with variable offsets. This makes stack-allocated buffers more usable in programs and brings stack pointers closer to other types of pointers.

The motivation is being able to use stack-allocated buffers for data manipulation. When the stack size limit is sufficient, allocating buffers on the stack is simpler than per-cpu arrays or other alternatives.

In unprivileged programs, variable-offset reads and writes are disallowed (they were already disallowed for the indirect access case) because the speculative execution checking code doesn't support them. Additionally, when writing through a variable-offset stack pointer, if any pointers are in the accessible range, there are possibilities of later leaking pointers, because the write cannot be tracked precisely.

Writes with variable offset mark the whole range as initialized, even though we don't know which stack slots are actually written. This is in order to not reject future reads to these slots. Note that this doesn't affect writes done through helpers; like before, helpers need the whole stack range to be initialized to begin with. All the stack slots in range are considered scalars after the write; variable-offset register spills are not tracked.

For reads, all the stack slots in the variable range need to be initialized (but see above about what writes do); otherwise the read is rejected. All registers spilled in stack slots that might be read are marked as having been read; however, reads through such pointers don't do register filling. The target register will always be either a scalar or a constant zero.

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210207011027.676572-2-andreimatei1@gmail.com
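A minimal sketch of a program this newly permits (section and names are illustrative; any program type with a runtime-derived index works):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("tc")
  int var_off_stack(struct __sk_buff *skb)
  {
          char buf[16] = {};              /* stack-allocated buffer */
          __u32 idx = skb->len & 0xf;     /* unknown at verification time */

          buf[idx] = 1;    /* variable-offset write: marks the whole
                            * buf[0..15] range as initialized */
          return buf[idx]; /* variable-offset read: yields a scalar */
  }

  char LICENSE[] SEC("license") = "GPL";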
2021-02-09  Merge branch 'kbuild/resolve_btfids: Invoke resolve_btfids'  (Andrii Nakryiko; 3 files, -25/+28)

Jiri Olsa says:

====================
hi,
the resolve_btfids tool is used during the kernel build, so we should clean it on the kernel's make clean.

v2 changes:
- add Song's acks on patches 1 and 4 (others changed) [Song]
- add missing / [Andrii]
- change srctree variable initialization [Andrii]
- shifted ifdef for clean target [Andrii]

thanks,
jirka
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Jiri Olsa says: ==================== hi, resolve_btfids tool is used during the kernel build, so we should clean it on kernel's make clean. v2 changes: - add Song's acks on patches 1 and 4 (others changed) [Song] - add missing / [Andrii] - change srctree variable initialization [Andrii] - shifted ifdef for clean target [Andrii] thanks, jirka ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>