summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-03-07mm: Introduce VM_SPARSE kind and vm_area_[un]map_pages().Alexei Starovoitov2-2/+62
vmap/vmalloc APIs are used to map a set of pages into contiguous kernel virtual space. get_vm_area() with appropriate flag is used to request an area of kernel address range. It's used for vmalloc, vmap, ioremap, xen use cases. - vmalloc use case dominates the usage. Such vm areas have VM_ALLOC flag. - the areas created by vmap() function should be tagged with VM_MAP. - ioremap areas are tagged with VM_IOREMAP. BPF would like to extend the vmap API to implement a lazily-populated sparse, yet contiguous kernel virtual space. Introduce VM_SPARSE flag and vm_area_map_pages(area, start_addr, count, pages) API to map a set of pages within a given area. It has the same sanity checks as vmap() does. It also checks that get_vm_area() was created with VM_SPARSE flag which identifies such areas in /proc/vmallocinfo and returns zero pages on read through /proc/kcore. The next commits will introduce bpf_arena which is a sparsely populated shared memory region between bpf program and user space process. It will map privately-managed pages into a sparse vm area with the following steps: // request virtual memory region during bpf prog verification area = get_vm_area(area_size, VM_SPARSE); // on demand vm_area_map_pages(area, kaddr, kend, pages); vm_area_unmap_pages(area, kaddr, kend); // after bpf program is detached and unloaded free_vm_area(area); Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/bpf/20240305030516.41519-3-alexei.starovoitov@gmail.com
2024-03-06mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.Alexei Starovoitov1-0/+13
There are various users of get_vm_area() + ioremap_page_range() APIs. Enforce that get_vm_area() was requested as VM_IOREMAP type and range passed to ioremap_page_range() matches created vm_area to avoid accidentally ioremap-ing into wrong address range. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/bpf/20240305030516.41519-2-alexei.starovoitov@gmail.com
2024-03-05Merge branch 'Allow struct_ops maps with a large number of programs'Martin KaFai Lau7-60/+279
Kui-Feng Lee says: ==================== The BPF struct_ops previously only allowed for one page to be used for the trampolines of all links in a map. However, we have recently run out of space due to the large number of BPF program links. By allocating additional pages when we exhaust an existing page, we can accommodate more links in a single map. The variable st_map->image has been changed to st_map->image_pages, and its type has been changed to an array of pointers to buffers of PAGE_SIZE. Additional pages are allocated when all existing pages are exhausted. The test case loads a struct_ops maps having 40 programs. Their trampolines takes about 6.6k+ bytes over 1.5 pages on x86. --- Major differences from v3: - Refactor buffer allocations to bpf_struct_ops_tramp_buf_alloc() and bpf_struct_ops_tramp_buf_free(). Major differences from v2: - Move image buffer allocation to bpf_struct_ops_prepare_trampoline(). Major differences from v1: - Always free pages if failing to update. - Allocate 8 pages at most. v3: https://lore.kernel.org/all/20240224030302.1500343-1-thinker.li@gmail.com/ v2: https://lore.kernel.org/all/20240221225911.757861-1-thinker.li@gmail.com/ v1: https://lore.kernel.org/all/20240216182828.201727-1-thinker.li@gmail.com/ ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-05selftests/bpf: Test struct_ops maps with a large number of struct_ops program.Kui-Feng Lee3-0/+176
Create and load a struct_ops map with a large number of struct_ops programs to generate trampolines taking a size over multiple pages. The map includes 40 programs. Their trampolines takes 6.6k+, more than 1.5 pages, on x86. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240224223418.526631-4-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-05bpf: struct_ops supports more than one page for trampolines.Kui-Feng Lee3-50/+96
The BPF struct_ops previously only allowed one page of trampolines. Each function pointer of a struct_ops is implemented by a struct_ops bpf program. Each struct_ops bpf program requires a trampoline. The following selftest patch shows each page can hold a little more than 20 trampolines. While one page is more than enough for the tcp-cc usecase, the sched_ext use case shows that one page is not always enough and hits the one page limit. This patch overcomes the one page limit by allocating another page when needed and it is limited to a total of MAX_IMAGE_PAGES (8) pages which is more than enough for reasonable usages. The variable st_map->image has been changed to st_map->image_pages, and its type has been changed to an array of pointers to pages. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240224223418.526631-3-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-04bpf, net: validate struct_ops when updating value.Kui-Feng Lee2-10/+7
Perform all validations when updating values of struct_ops maps. Doing validation in st_ops->reg() and st_ops->update() is not necessary anymore. However, tcp_register_congestion_control() has been called in various places. It still needs to do validations. Cc: netdev@vger.kernel.org Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240224223418.526631-2-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-04selftests/bpf: xdp_hw_metadata reduce sleep intervalSong Yoong Siang1-1/+1
In current ping-pong design, xdp_hw_metadata will wait until the packet transmission completely done, then only start to receive the next packet. The current sleep interval is 10ms, which is unnecessary large. Typically, a NIC does not need such a long time to transmit a packet. Furthermore, during this 10ms sleep time, the app is unable to receive incoming packets. Therefore, this commit reduce sleep interval to 10us, so that xdp_hw_metadata is able to support periodic packets with shorter interval. 10us * 500 = 5ms should be enough for packet transmission and status retrieval. Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20240303083225.1184165-2-yoong.siang.song@intel.com
2024-03-04selftests/bpf: Extend uprobe/uretprobe triggering benchmarksAndrii Nakryiko4-46/+103
Settle on three "flavors" of uprobe/uretprobe, installed on different kinds of instruction: nop, push, and ret. All three are testing different internal code paths emulating or single-stepping instructions, so are interesting to compare and benchmark separately. To ensure `push rbp` instruction we ensure that uprobe_target_push() is not a leaf function by calling (global __weak) noop function and returning something afterwards (if we don't do that, compiler will just do a tail call optimization). Also, we need to make sure that compiler isn't skipping frame pointer generation, so let's add `-fno-omit-frame-pointers` to Makefile. Just to give an idea of where we currently stand in terms of relative performance of different uprobe/uretprobe cases vs a cheap syscall (getpgid()) baseline, here are results from my local machine: $ benchs/run_bench_uprobes.sh base : 1.561 ± 0.020M/s uprobe-nop : 0.947 ± 0.007M/s uprobe-push : 0.951 ± 0.004M/s uprobe-ret : 0.443 ± 0.007M/s uretprobe-nop : 0.471 ± 0.013M/s uretprobe-push : 0.483 ± 0.004M/s uretprobe-ret : 0.306 ± 0.007M/s Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240301214551.1686095-1-andrii@kernel.org
2024-03-04libbpf: Correct debug message in btf__load_vmlinux_btfChen Shen1-1/+1
In the function btf__load_vmlinux_btf, the debug message incorrectly refers to 'path' instead of 'sysfs_btf_path'. Signed-off-by: Chen Shen <peterchenshen@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240302062218.3587-1-peterchenshen@gmail.com
2024-03-04bpf, docs: Rename legacy conformance group to packetDave Thaler1-2/+2
There could be other legacy conformance groups in the future, so use a more descriptive name. The status of the conformance group in the IANA registry is what designates it as legacy, not the name of the group. Signed-off-by: Dave Thaler <dthaler1968@gmail.com> Link: https://lore.kernel.org/r/20240302012229.16452-1-dthaler1968@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2024-03-03bpf, docs: Use IETF format for field definitions in instruction-set.rstDave Thaler1-241/+290
In preparation for publication as an IETF RFC, the WG chairs asked me to convert the document to use IETF packet format for field layout, so this patch attempts to make it consistent with other IETF documents. Some fields that are not byte aligned were previously inconsistent in how values were defined. Some were defined as the value of the byte containing the field (like 0x20 for a field holding the high four bits of the byte), and others were defined as the value of the field itself (like 0x2). This PR makes them be consistent in using just the values of the field itself, which is IETF convention. As a result, some of the defines that used BPF_* would no longer match the value in the spec, and so this patch also drops the BPF_* prefix to avoid confusion with the defines that are the full-byte equivalent values. For consistency, BPF_* is then dropped from other fields too. BPF_<foo> is thus the Linux implementation-specific define for <foo> as it appears in the BPF ISA specification. The syntax BPF_ADD | BPF_X | BPF_ALU only worked for full-byte values so the convention {ADD, X, ALU} is proposed for referring to field values instead. Also replace the redundant "LSB bits" with "least significant bits". A preview of what the resulting Internet Draft would look like can be seen at: https://htmlpreview.github.io/?https://raw.githubusercontent.com/dthaler/ebp f-docs-1/format/draft-ietf-bpf-isa.html v1->v2: Fix sphinx issue as recommended by David Vernet Signed-off-by: Dave Thaler <dthaler1968@gmail.com> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20240301222337.15931-1-dthaler1968@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-03Merge tag 'for-netdev' of ↵Jakub Kicinski150-995/+3589
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-02-29 We've added 119 non-merge commits during the last 32 day(s) which contain a total of 150 files changed, 3589 insertions(+), 995 deletions(-). The main changes are: 1) Extend the BPF verifier to enable static subprog calls in spin lock critical sections, from Kumar Kartikeya Dwivedi. 2) Fix confusing and incorrect inference of PTR_TO_CTX argument type in BPF global subprogs, from Andrii Nakryiko. 3) Larger batch of riscv BPF JIT improvements and enabling inlining of the bpf_kptr_xchg() for RV64, from Pu Lehui. 4) Allow skeleton users to change the values of the fields in struct_ops maps at runtime, from Kui-Feng Lee. 5) Extend the verifier's capabilities of tracking scalars when they are spilled to stack, especially when the spill or fill is narrowing, from Maxim Mikityanskiy & Eduard Zingerman. 6) Various BPF selftest improvements to fix errors under gcc BPF backend, from Jose E. Marchesi. 7) Avoid module loading failure when the module trying to register a struct_ops has its BTF section stripped, from Geliang Tang. 8) Annotate all kfuncs in .BTF_ids section which eventually allows for automatic kfunc prototype generation from bpftool, from Daniel Xu. 9) Several updates to the instruction-set.rst IETF standardization document, from Dave Thaler. 10) Shrink the size of struct bpf_map resp. bpf_array, from Alexei Starovoitov. 11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer, from Benjamin Tissoires. 12) Fix bpftool to be more portable to musl libc by using POSIX's basename(), from Arnaldo Carvalho de Melo. 13) Add libbpf support to gcc in CORE macro definitions, from Cupertino Miranda. 14) Remove a duplicate type check in perf_event_bpf_event, from Florian Lehner. 15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them with notrace correctly, from Yonghong Song. 16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible array to fix build warnings, from Kees Cook. 17) Fix resolve_btfids cross-compilation to non host-native endianness, from Viktor Malik. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits) selftests/bpf: Test if shadow types work correctly. bpftool: Add an example for struct_ops map and shadow type. bpftool: Generated shadow variables for struct_ops maps. libbpf: Convert st_ops->data to shadow type. libbpf: Set btf_value_type_id of struct bpf_map for struct_ops. bpf: Replace bpf_lpm_trie_key 0-length array with flexible array bpf, arm64: use bpf_prog_pack for memory management arm64: patching: implement text_poke API bpf, arm64: support exceptions arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT bpf: add is_async_callback_calling_insn() helper bpf: introduce in_sleepable() helper bpf: allow more maps in sleepable bpf programs selftests/bpf: Test case for lacking CFI stub functions. bpf: Check cfi_stubs before registering a struct_ops type. bpf: Clarify batch lookup/lookup_and_delete semantics bpf, docs: specify which BPF_ABS and BPF_IND fields were zero bpf, docs: Fix typos in instruction-set.rst selftests/bpf: update tcp_custom_syncookie to use scalar packet offset bpf: Shrink size of struct bpf_map/bpf_array. ... ==================== Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-01Merge branch 'inet_dump_ifaddr-no-rtnl'David S. Miller2-92/+79
Eric Dumazet says: ==================== inet: no longer use RTNL to protect inet_dump_ifaddr() This series convert inet so that a dump of addresses (ip -4 addr) no longer requires RTNL. ==================== Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01inet: use xa_array iterator to implement inet_dump_ifaddr()Eric Dumazet1-58/+37
1) inet_dump_ifaddr() can can run under RCU protection instead of RTNL. 2) properly return 0 at the end of a dump, avoiding an an extra recvmsg() system call. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01inet: prepare inet_base_seq() to run without RTNLEric Dumazet2-3/+4
In the following patch, inet_base_seq() will no longer be called with RTNL held. Add READ_ONCE()/WRITE_ONCE() annotations in dev_base_seq_inc() and inet_base_seq(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01inet: annotate data-races around ifa->ifa_flagsEric Dumazet1-9/+13
ifa->ifa_flags can be read locklessly. Add appropriate READ_ONCE()/WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01inet: annotate data-races around ifa->ifa_preferred_lftEric Dumazet1-8/+8
ifa->ifa_preferred_lft can be read locklessly. Add appropriate READ_ONCE()/WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01inet: annotate data-races around ifa->ifa_valid_lftEric Dumazet1-8/+8
ifa->ifa_valid_lft can be read locklessly. Add appropriate READ_ONCE()/WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01inet: annotate data-races around ifa->ifa_tstamp and ifa->ifa_cstampEric Dumazet1-10/+13
ifa->ifa_tstamp can be read locklessly. Add appropriate READ_ONCE()/WRITE_ONCE() annotations. Do the same for ifa->ifa_cstamp to prepare upcoming changes. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01Merge branch 'netdevsim-link'David S. Miller6-5/+342
David Wei says: ==================== netdevsim: link and forward skbs between ports This patchset adds the ability to link two netdevsim ports together and forward skbs between them, similar to veth. The goal is to use netdevsim for testing features e.g. zero copy Rx using io_uring. This feature was tested locally on QEMU, and a selftest is included. I ran netdev selftests CI style and all tests but the following passed: - gro.sh - l2tp.sh - ip_local_port_range.sh gro.sh fails because virtme-ng mounts as read-only and it tries to write to log.txt. This issue was reported to virtme-ng upstream. l2tp.sh and ip_local_port_range.sh both fail for me on net-next/main as well. --- v13->v14: - implement ndo_get_iflink() - fix returning 0 if peer is already linked during linking or not linked during unlinking - bump dropped counter if nsim_ipsec_tx() fails and generally reorder nsim_start_xmit() - fix overflowing lines and indentations v12->v13: - wait for socat listening port to be ready before sending data in selftest v11->v12: - fix leaked netns refs - fix rtnetlink.sh kci_test_ipsec_offload() selftest v10->v11: - add udevadm settle after creating netdevsims in selftest v9->v10: - fix not freeing skb when not there is no peer - prevent possible id clashes in selftest - cleanup selftest on error paths v8->v9: - switch to getting netns using fd rather than id - prevent linking a netdevsim to itself - update tests v7->v8: - fix not dereferencing RCU ptr using rcu_dereference() - remove unused variables in selftest v6->v7: - change link syntax to netnsid:ifidx - replace dev_get_by_index() with __dev_get_by_index() - check for NULL peer when linking - add a sysfs attribute for unlinking - only update Tx stats if not dropped - update selftest v5->v6: - reworked to link two netdevsims using sysfs attribute on the bus device instead of debugfs due to deadlock possibility if a netdevsim is removed during linking - removed unnecessary patch maintaining a list of probed nsim_devs - updated selftest v4->v5: - reduce nsim_dev_list_lock critical section - fixed missing mutex unlock during unwind ladder - rework nsim_dev_peer_write synchronization to take devlink lock as well as rtnl_lock - return err msgs to user during linking if port doesn't exist or linking to self - update tx stats outside of RCU lock v3->v4: - maintain a mutex protected list of probed nsim_devs instead of using nsim_bus_dev - fixed synchronization issues by taking rtnl_lock - track tx_dropped skbs v2->v3: - take lock when traversing nsim_bus_dev_list - take device ref when getting a nsim_bus_dev - return 0 if nsim_dev_peer_read cannot find the port - address code formatting - do not hard code values in selftests - add Makefile for selftests v1->v2: - renamed debugfs file from "link" to "peer" - replaced strstep() with sscanf() for consistency - increased char[] buf sz to 22 for copying id + port from user - added err msg w/ expected fmt when linking as a hint to user - prevent linking port to itself - protect peer ptr using RCU ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01netdevsim: fix rtnetlink.sh selftestDavid Wei1-0/+2
I cleared IFF_NOARP flag from netdevsim dev->flags in order to support skb forwarding. This breaks the rtnetlink.sh selftest kci_test_ipsec_offload() test because ipsec does not connect to peers it cannot transmit to. Fix the issue by adding a neigh entry manually. ipsec_offload test now successfully pass. Signed-off-by: David Wei <dw@davidwei.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01netdevsim: add selftest for forwarding skb between connected portsDavid Wei2-0/+144
Connect two netdevsim ports in different namespaces together, then send packets between them using socat. Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01netdevsim: add ndo_get_iflink() implementationDavid Wei1-0/+16
Add an implementation for ndo_get_iflink() in netdevsim that shows the ifindex of the linked peer, if any. Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01netdevsim: forward skbs from one connected port to anotherDavid Wei2-5/+23
Forward skbs sent from one netdevsim port to its connected netdevsim port using dev_forward_skb, in a spirit similar to veth. Add a tx_dropped variable to struct netdevsim, tracking the number of skbs that could not be forwarded using dev_forward_skb(). The xmit() function accessing the peer ptr is protected by an RCU read critical section. The rcu_read_lock() is functionally redundant as since v5.0 all softirqs are implicitly RCU read critical sections; but it is useful for human readers. If another CPU is concurrently in nsim_destroy(), then it will first set the peer ptr to NULL. This does not affect any existing readers that dereferenced a non-NULL peer. Then, in unregister_netdevice(), there is a synchronize_rcu() before the netdev is actually unregistered and freed. This ensures that any readers i.e. xmit() that got a non-NULL peer will complete before the netdev is freed. Any readers after the RCU_INIT_POINTER() but before synchronize_rcu() will dereference NULL, making it safe. The codepath to nsim_destroy() and nsim_create() takes both the newly added nsim_dev_list_lock and rtnl_lock. This makes it safe with concurrent calls to linking two netdevsims together. Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01netdevsim: allow two netdevsim ports to be connectedDavid Wei3-0/+157
Add two netdevsim bus attribute to sysfs: /sys/bus/netdevsim/link_device /sys/bus/netdevsim/unlink_device Writing "A M B N" to link_device will link netdevsim M in netnsid A with netdevsim N in netnsid B. Writing "A M" to unlink_device will unlink netdevsim M in netnsid A from its peer, if any. rtnl_lock is taken to ensure nothing changes during the linking. Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Maciek Machnikowski <maciek@machnikowski.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01Merge branch 'selftests-xfail'David S. Miller10-141/+178
Jakub Kicinski says: ==================== selftests: kselftest_harness: support using xfail When running selftests for our subsystem in our CI we'd like all tests to pass. Currently some tests use SKIP for cases they expect to fail, because the kselftest_harness limits the return codes to pass/fail/skip. XFAIL which would be a great match here cannot be used. Remove the no_print handling and use vfork() to run the test in a different process than the setup. This way we don't need to pass "failing step" via the exit code. Further clean up the exit codes so that we can use all KSFT_* values. Rewrite the result printing to make handling XFAIL/XPASS easier. Support tests declaring combinations of fixture + variant they expect to fail. Merge plan is to put it on top of -rc6 and merge into net-next. That way others should be able to pull the patches without any networking changes. v4: - rebase on top of Mickael's vfork() changes v3: https://lore.kernel.org/all/20240220192235.2953484-1-kuba@kernel.org/ - combine multiple series - change to "list of expected failures" rather than SKIP()-like handling v2: https://lore.kernel.org/all/20240216002619.1999225-1-kuba@kernel.org/ - fix alignment follow up RFC: https://lore.kernel.org/all/20240216004122.2004689-1-kuba@kernel.org/ v1: https://lore.kernel.org/all/20240213154416.422739-1-kuba@kernel.org/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01selftests: ip_local_port_range: use XFAIL instead of SKIPJakub Kicinski1-3/+3
SCTP does not support IP_LOCAL_PORT_RANGE and we know it, so use XFAIL instead of SKIP. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01selftests: kselftest_harness: support using xfailJakub Kicinski1-1/+48
Currently some tests report skip for things they expect to fail e.g. when given combination of parameters is known to be unsupported. This is confusing because in an ideal test environment and fully featured kernel no tests should be skipped. Selftest summary line already includes xfail and xpass counters, e.g.: Totals: pass:725 fail:0 xfail:0 xpass:0 skip:0 error:0 but there's no way to use it from within the harness. Add a new per-fixture+variant combination list of test cases we expect to fail. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01selftests: kselftest_harness: let PASS / FAIL provide diagnosticJakub Kicinski1-5/+4
Switch to printing KTAP line for PASS / FAIL with ksft_test_result_code(), this gives us the ability to report diagnostic messages. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01selftests: kselftest_harness: separate diagnostic message with # in ↵Jakub Kicinski2-1/+6
ksft_test_result_code() According to the spec we should always print a # if we add a diagnostic message. Having the caller pass in the new line as part of diagnostic message makes handling this a bit counter-intuitive, so append the new line in the helper. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01selftests: kselftest_harness: print test name for SKIPJakub Kicinski2-4/+6
Jakub points out that for parsers it's rather useful to always have the test name on the result line. Currently if we SKIP (or soon XFAIL or XPASS), we will print: ok 17 # SKIP SCTP doesn't support IP_BIND_ADDRESS_NO_PORT ^ no test name Always print the test name. KTAP format seems to allow or even call for it, per: https://docs.kernel.org/dev-tools/ktap.html Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/all/87jzn6lnou.fsf@cloudflare.com/ Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01selftests: kselftest: add ksft_test_result_code(), handling all exit codesJakub Kicinski2-2/+46
For generic test harness code it's more useful to deal with exit codes directly, rather than having to switch on them and call the right ksft_test_result_*() helper. Add such function to kselftest.h. Note that "directive" and "diagnostic" are what ktap docs call those parts of the message. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01selftests: kselftest_harness: use exit code to store skipJakub Kicinski1-14/+5
We always use skip in combination with exit_code being 0 (KSFT_PASS). This are basic KSFT / KTAP semantics. Store the right KSFT_* code in exit_code directly. This makes it easier to support tests reporting other extended KSFT_* codes like XFAIL / XPASS. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01selftests: kselftest_harness: save full exit code in metadataJakub Kicinski7-35/+41
Instead of tracking passed = 0/1 rename the field to exit_code and invert the values so that they match the KSFT_* exit codes. This will allow us to fold SKIP / XFAIL into the same value. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01selftests: kselftest_harness: generate test name onceJakub Kicinski1-6/+10
Since we added variant support generating full test case name takes 4 string arguments. We're about to need it in another two places. Stop the duplication and print once into a temporary buffer. Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01selftests: kselftest_harness: use KSFT_* exit codesJakub Kicinski1-6/+5
Now that we no longer need low exit codes to communicate assertion steps - use normal KSFT exit codes. Acked-by: Kees Cook <keescook@chromium.org> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01selftests/harness: Merge TEST_F_FORK() into TEST_F()Mickaël Salaün2-91/+27
Replace Landlock-specific TEST_F_FORK() with an improved TEST_F() which brings four related changes: Run TEST_F()'s tests in a grandchild process to make it possible to drop privileges and delegate teardown to the parent. Compared to TEST_F_FORK(), simplify handling of the test grandchild process thanks to vfork(2), and makes it generic (e.g. no explicit conversion between exit code and _metadata). Compared to TEST_F_FORK(), run teardown even when tests failed with an assert thanks to commit 63e6b2a42342 ("selftests/harness: Run TEARDOWN for ASSERT failures"). Simplify the test harness code by removing the no_print and step fields which are not used. I added this feature just after I made kselftest_harness.h more broadly available but this step counter remained even though it wasn't needed after all. See commit 369130b63178 ("selftests: Enhance kselftest_harness.h to print which assert failed"). Replace spaces with tabs in one line of __TEST_F_IMPL(). Cc: Günther Noack <gnoack@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Signed-off-by: Mickaël Salaün <mic@digikod.net> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01selftests/landlock: Redefine TEST_F() as TEST_F_FORK()Mickaël Salaün1-1/+5
This has the effect of creating a new test process for either TEST_F() or TEST_F_FORK(), which doesn't change tests but will ease potential backports. See next commit for the TEST_F_FORK() merge into TEST_F(). Cc: Günther Noack <gnoack@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Signed-off-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01Merge branch 'net-asp22-optimizations'David S. Miller6-118/+204
Justin Chen says: ==================== Support for ASP 2.2 and optimizations ASP 2.2 adds some power savings during low power modes. Also make various improvements when entering low power modes and reduce MDIO traffic by hooking up interrupts. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01net: bcmasp: Add support for PHY interruptsJustin Chen3-0/+26
Hook up the phy interrupts for internal phys to reduce mdio traffic and improve responsiveness of link changes. Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01net: bcmasp: Keep buffers through power managementJustin Chen2-108/+85
There is no advantage of freeing and re-allocating buffers through suspend and resume. This waste cycles and makes suspend/resume time longer. We also open ourselves to failed allocations in systems with heavy memory fragmentation. Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01net: phy: mdio-bcm-unimac: Add asp v2.2 supportJustin Chen1-0/+1
Add mdio compat string for ASP 2.0 ethernet driver. Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01net: bcmasp: Add support for ASP 2.2Justin Chen3-10/+87
ASP 2.2 improves power savings during low power modes. A new register was added to toggle to a slower clock during low power modes. EEE was broken for ASP 2.0/2.1. A HW workaround was added for ASP 2.2 that requires toggling a chicken bit. Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01dt-bindings: net: brcm,asp-v2.0: Add asp-v2.2Justin Chen1-0/+4
Add support for ASP 2.2. Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01dt-bindings: net: brcm,unimac-mdio: Add asp-v2.2Justin Chen1-0/+1
The ASP 2.2 Ethernet controller uses a brcm unimac. Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01Merge branch 'qcom-phy-possible'David S. Miller1-5/+24
Robert Marko says: ==================== net: phy: qcom: qca808x: fill in possible_interfaces QCA808x does not currently fill in the possible_interfaces. This leads to Phylink not being aware that it supports 2500Base-X as well so in cases where it is connected to a DSA switch like MV88E6393 it will limit that port to phy-mode set in the DTS. That means that if SGMII is used you are limited to 1G only while if 2500Base-X was set you are limited to 2.5G only. Populating the possible_interfaces fixes this. Changes in v2: * Get rid of the if/else by Russels suggestion in the helper ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01net: phy: qcom: qca808x: fill in possible_interfacesRobert Marko1-0/+12
Currently QCA808x driver does not fill the possible_interfaces. 2.5G QCA808x support SGMII and 2500Base-X while 1G model only supports SGMII, so fill the possible_interfaces accordingly. Signed-off-by: Robert Marko <robimarko@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01net: phy: qcom: qca808x: add helper for checking for 1G only modelRobert Marko1-5/+12
There are 2 versions of QCA808x, one 2.5G capable and one 1G capable. Currently, this matter only in the .get_features call however, it will be required for filling supported interface modes so lets add a helper that can be reused. Signed-off-by: Robert Marko <robimarko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01Simplify net_dbg_ratelimited() dummyGeert Uytterhoeven1-4/+1
There is no need to wrap calls to the no_printk() helper inside an always-false check, as no_printk() already does that internally. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-01Merge branch 'ipv6-devconf-lockless'David S. Miller20-229/+245
Eric Dumazet says: ==================== ipv6: lockless accesses to devconf - First patch puts in a cacheline_group the fields used in fast paths. - Annotate all data races around idev->cnf fields. - Last patch in this series removes RTNL use for RTM_GETNETCONF dumps. v3: addressed Jakub Kicinski feedback in addrconf_disable_ipv6() Added tags from Jiri and Florian. v2: addressed Jiri Pirko feedback - Added "ipv6: addrconf_disable_ipv6() optimizations" and "ipv6: addrconf_disable_policy() optimization" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>