path: root/tools/include
Age | Commit message | Author | Files | Lines
2023-08-29Merge tag 'net-next-6.6' of ↵Linus Torvalds3-18/+145
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Increase size limits for to-be-sent skb frag allocations. This allows tun, tap devices and packet sockets to better cope with large writes operations - Store netdevs in an xarray, to simplify iterating over netdevs - Refactor nexthop selection for multipath routes - Improve sched class lifetime handling - Add backup nexthop ID support for bridge - Implement drop reasons support in openvswitch - Several data races annotations and fixes - Constify the sk parameter of routing functions - Prepend kernel version to netconsole message Protocols: - Implement support for TCP probing the peer being under memory pressure - Remove hard coded limitation on IPv6 specific info placement inside the socket struct - Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per socket scaling factor - Scaling-up the IPv6 expired route GC via a separated list of expiring routes - In-kernel support for the TLS alert protocol - Better support for UDP reuseport with connected sockets - Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR header size - Get rid of additional ancillary per MPTCP connection struct socket - Implement support for BPF-based MPTCP packet schedulers - Format MPTCP subtests selftests results in TAP - Several new SMC 2.1 features including unique experimental options, max connections per lgr negotiation, max links per lgr negotiation BPF: - Multi-buffer support in AF_XDP - Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds - Implement an fd-based tc BPF attach API (TCX) and BPF link support on top of it - Add SO_REUSEPORT support for TC bpf_sk_assign - Support new instructions from cpu v4 to simplify the generated code and feature completeness, for x86, arm64, riscv64 - Support defragmenting IPv(4|6) packets in BPF - Teach verifier actual bounds of 
bpf_get_smp_processor_id() and fix perf+libbpf issue related to custom section handling - Introduce bpf map element count and enable it for all program types - Add a BPF hook in sys_socket() to change the protocol ID from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy - Introduce bpf_me_mcache_free_rcu() and fix OOM under stress - Add uprobe support for the bpf_get_func_ip helper - Check skb ownership against full socket - Support for up to 12 arguments in BPF trampoline - Extend link_info for kprobe_multi and perf_event links Netfilter: - Speed-up process exit by aborting ruleset validation if a fatal signal is pending - Allow NLA_POLICY_MASK to be used with BE16/BE32 types Driver API: - Page pool optimizations, to improve data locality and cache usage - Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need for raw ioctl() handling in drivers - Simplify genetlink dump operations (doit/dumpit) providing them the common information already populated in struct genl_info - Extend and use the yaml devlink specs to [re]generate the split ops - Introduce devlink selective dumps, to allow SF filtering SF based on handle and other attributes - Add yaml netlink spec for netlink-raw families, allow route, link and address related queries via the ynl tool - Remove phylink legacy mode support - Support offload LED blinking to phy - Add devlink port function attributes for IPsec New hardware / drivers: - Ethernet: - Broadcom ASP 2.0 (72165) ethernet controller - MediaTek MT7988 SoC - Texas Instruments AM654 SoC - Texas Instruments IEP driver - Atheros qca8081 phy - Marvell 88Q2110 phy - NXP TJA1120 phy - WiFi: - MediaTek mt7981 support - Can: - Kvaser SmartFusion2 PCI Express devices - Allwinner T113 controllers - Texas Instruments tcan4552/4553 chips - Bluetooth: - Intel Gale Peak - Qualcomm WCN3988 and WCN7850 - NXP AW693 and IW624 - Mediatek MT2925 Drivers: - Ethernet NICs: - nVidia/Mellanox: - mlx5: - support UDP encapsulation in packet offload 
mode - IPsec packet offload support in eswitch mode - improve aRFS observability by adding new set of counters - extends MACsec offload support to cover RoCE traffic - dynamic completion EQs - mlx4: - convert to use auxiliary bus instead of custom interface logic - Intel - ice: - implement switchdev bridge offload, even for LAG interfaces - implement SRIOV support for LAG interfaces - igc: - add support for multiple in-flight TX timestamps - Broadcom: - bnxt: - use the unified RX page pool buffers for XDP and non-XDP - use the NAPI skb allocation cache - OcteonTX2: - support Round Robin scheduling HTB offload - TC flower offload support for SPI field - Freescale: - add XDP_TX feature support - AMD: - ionic: add support for PCI FLR event - sfc: - basic conntrack offload - introduce eth, ipv4 and ipv6 pedit offloads - ST Microelectronics: - stmmac: maximze PTP timestamping resolution - Virtual NICs: - Microsoft vNIC: - batch ringing RX queue doorbell on receiving packets - add page pool for RX buffers - Virtio vNIC: - add per queue interrupt coalescing support - Google vNIC: - add queue-page-list mode support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add port range matching tc-flower offload - permit enslavement to netdevices with uppers - Ethernet embedded switches: - Marvell (mv88e6xxx): - convert to phylink_pcs - Renesas: - r8A779fx: add speed change support - rzn1: enables vlan support - Ethernet PHYs: - convert mv88e6xxx to phylink_pcs - WiFi: - Qualcomm Wi-Fi 7 (ath12k): - extremely High Throughput (EHT) PHY support - RealTek (rtl8xxxu): - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU), RTL8192EU and RTL8723BU - RealTek (rtw89): - Introduce Time Averaged SAR (TAS) support - Connector: - support for event filtering" * tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits) net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show net: ethernet: mtk_wed: add some more info in wed_txinfo_show 
handler net: stmmac: clarify difference between "interface" and "phy_interface" r8152: add vendor/device ID pair for D-Link DUB-E250 devlink: move devlink_notify_register/unregister() to dev.c devlink: move small_ops definition into netlink.c devlink: move tracepoint definitions into core.c devlink: push linecard related code into separate file devlink: push rate related code into separate file devlink: push trap related code into separate file devlink: use tracepoint_enabled() helper devlink: push region related code into separate file devlink: push param related code into separate file devlink: push resource related code into separate file devlink: push dpipe related code into separate file devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper devlink: push shared buffer related code into separate file devlink: push port related code into separate file devlink: push object register/unregister notifications into separate helpers inet: fix IP_TRANSPARENT error handling ...
2023-08-29Merge tag 'linux-kselftest-nolibc-6.6-rc1' of ↵Linus Torvalds20-994/+673
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull nolibc updates from Shuah Khan: "Nolibc: - improved portability by removing build errors with -ENOSYS - added syscall6() on MIPS to support pselect6() and mmap() - added setvbuf(), rmdir(), pipe(), pipe2() - add support for ppc/ppc64 - environ is no longer optional - fixed frame pointer issues at -O0 - dropped sys_stat() in favor of sys_statx() - centralized _start_c() to remove lots of asm code - switched size_t to __SIZE_TYPE__ Selftests: - improved status reporting (success/warning/failure counts, path to log file) - various code cleanups (indent, unused variables, ...) - more consistent test numbering - enabled compiler warnings - dropped unreliable chmod_net test - improved reliability (create /dev/zero & /tmp, rely less on /proc) - new tests (brk/sbrk/mmap/munmap) - improved compatibility with musl - new run-nolibc-test target to build and run natively - new run-libc-test target to build and run against native libc - made the cmdline parser more reliable against boolean arguments - dropped dependency on memfd for vfprintf() test - nolibc-test is no longer stripped - added support for extending ARCH via XARCH Other: - add Thomas as co-maintainer" * tag 'linux-kselftest-nolibc-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (103 commits) tools/nolibc: avoid undesired casts in the __sysret() macro tools/nolibc: keep brk(), sbrk(), mmap() away from __sysret() tools/nolibc: silence ppc64 compile warnings selftests/nolibc: libc-test: use HOSTCC instead of CC tools/nolibc: stackprotector.h: make __stack_chk_init static selftests/nolibc: allow report with existing test log selftests/nolibc: add test support for ppc64 selftests/nolibc: add test support for ppc64le selftests/nolibc: add test support for ppc selftests/nolibc: add XARCH and ARCH mapping support tools/nolibc: add support for powerpc64 tools/nolibc: add support for powerpc MAINTAINERS: nolibc: add 
myself as co-maintainer selftests/nolibc: enable compiler warnings selftests/nolibc: don't strip nolibc-test selftests/nolibc: prevent out of bounds access in expect_vfprintf selftests/nolibc: use correct return type for read() and write() selftests/nolibc: avoid sign-compare warnings selftests/nolibc: avoid unused parameter warnings selftests/nolibc: make functions static if possible ...
2023-08-23tools/nolibc: avoid undesired casts in the __sysret() macroWilly Tarreau1-14/+13
Having __sysret() as an inline function has the unfortunate effect of adding casts and large constants comparisons after the syscall returns that significantly inflate some light code that's otherwise syscall- heavy. Even nolibc-test grew by ~1%. Let's switch back to a macro for this, and use it only with signed arguments. Note that it is also possible to design a slightly more complex macro covering unsigned and pointers but we only have 3 such syscalls so it is pointless, and these were just addressed not to use this macro anymore. Now for the argument (the local variable containing the syscall return value), any negative value is an error, that results in -1 being returned and errno to be assigned the opposite value. This may be revisited again in the future if really needed but for now let's get back to something sane. Fixes: 428905da6ec4 ("tools/nolibc: sys.h: add a syscall return helper") Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/ Link: https://lore.kernel.org/lkml/ZNKOJY+g66nkIyvv@1wt.eu/ Cc: Zhangjin Wu <falcon@tinylab.org> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Thomas Weißschuh <thomas@t-8ch.de> Signed-off-by: Willy Tarreau <w@1wt.eu>
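The rule described above (any negative value is an error, -1 is returned and errno receives the opposite value) can be sketched as a macro. This is only an illustrative sketch, not the actual nolibc definition: `DEMO_SYSRET`, `demo_errno`, `demo_fake_syscall` and `demo_call` are hypothetical names, and the statement expression is a GCC/clang extension.

```c
/* Hypothetical stand-in for errno; nolibc provides its own. */
static int demo_errno;

/*
 * Sketch of a macro-style return helper for signed return values.
 * Uses a GNU statement expression so the argument is evaluated once
 * and keeps its own type (no forced casts to a wider type).
 */
#define DEMO_SYSRET(arg)                        \
({                                              \
	__typeof__(arg) _ret = (arg);           \
	if (_ret < 0) {                         \
		demo_errno = (int)-_ret;        \
		_ret = -1;                      \
	}                                       \
	_ret;                                   \
})

/* Hypothetical syscall stub: simply hands back the raw kernel-style value. */
static long demo_fake_syscall(long raw)
{
	return raw;
}

/* Wrapper in the style of a libc function built on the helper. */
static long demo_call(long raw)
{
	return DEMO_SYSRET(demo_fake_syscall(raw));
}
```

Because the macro keeps the type of its argument, an `int`-returning syscall is compared as an `int`, which is the size benefit the commit message refers to.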
2023-08-23tools/nolibc: keep brk(), sbrk(), mmap() away from __sysret()Willy Tarreau1-3/+16
The __sysret() function causes some undesirable casts so we'll revert it. In order to keep it simple it will now only support integer return values like in the past, so we must basically revert the changes that were made to these 3 syscalls which return a pointer so that they simply rely on their own test and the SET_ERRNO() macro. Fixes: 4201cfce15fe ("tools/nolibc: clean up sbrk() routine") Fixes: 924e9539aeaa ("tools/nolibc: clean up mmap() routine") Fixes: d27447bc2e0a ("tools/nolibc: sys.h: apply __sysret() helper") Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/ Link: https://lore.kernel.org/lkml/ZNKOJY+g66nkIyvv@1wt.eu/ Cc: Zhangjin Wu <falcon@tinylab.org> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Thomas Weißschuh <thomas@t-8ch.de> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: silence ppc64 compile warningsZhangjin Wu1-3/+11
Silence the following warnings reported by the new -Wall -Wextra options with pure assembly code. In file included from sysroot/powerpc/include/stdio.h:13, from nolibc-test.c:13: sysroot/powerpc/include/arch.h: In function '_start': sysroot/powerpc/include/arch.h:192:32: warning: unused variable 'r2' [-Wunused-variable] 192 | register volatile long r2 __asm__ ("r2") = (void *)&TOC - (void *)_start; | ^~ sysroot/powerpc/include/arch.h:187:97: warning: optimization may eliminate reads and/or writes to register variables [-Wvolatile-register-var] 187 | void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) __no_stack_protector _start(void) | ^~~~~~ Since only elfv2 ABI requires to save the TOC/GOT pointer to r2 register, when using elfv1 ABI, the old C code is simply ignored by the compiler, but the compiler can not ignore the inline assembly code and will introduce build failure or running segfaults. So, let's further only add the new assembly code for elfv2 ABI with the checking of _CALL_ELF == 2. Link: https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.pdf Link: https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Euro-LLVM-2014-Weigand.pdf Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: stackprotector.h: make __stack_chk_init staticZhangjin Wu2-4/+3
This allows generating a smaller binary (text/data/dec size). Since the _start_c() function was added by crt.h, __stack_chk_init() is called from _start_c() instead of the assembly _start, so it can now be marked static. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: add support for powerpc64Zhangjin Wu1-0/+16
This follows the 64-bit PowerPC ABI [1], refers to the slides: "A new ABI for little-endian PowerPC64 Design & Implementation" [2] and the musl code in arch/powerpc64/crt_arch.h. First, stdu and clrrdi are used instead of stwu and clrrwi for powerpc64. Second, the stack frame size is increased to 32 bytes for powerpc64, 32 bytes is the minimal stack frame size supported described in [2]. Besides, the TOC pointer (GOT pointer) must be saved to r2. This works on both little endian and big endian 64-bit PowerPC. [1]: https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.pdf [2]: https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Euro-LLVM-2014-Weigand.pdf Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: add support for powerpcZhangjin Wu2-0/+199
Both the syscall declarations and the _start code definition are added to nolibc for powerpc.

Like mips, powerpc uses a register (more precisely, the summary overflow bit) to record that an error occurred, and uses another register to return the value [1]. So the return value of every syscall declaration must be normalized to match the __sysret() helper: return -value when there is an error, otherwise return value directly.

Glibc and musl use different methods to check the summary overflow bit. Glibc (sysdeps/unix/sysv/linux/powerpc/sysdep.h) first saves the cr register to r0, and then checks the summary overflow bit in cr0:

    mfcr r0
    r0 & (1 << 28) ? -r3 : r3

    -->

    10003c14: 7c 00 00 26  mfcr r0
    10003c18: 74 09 10 00  andis. r9,r0,4096
    10003c1c: 41 82 00 08  beq 0x10003c24
    10003c20: 7c 63 00 d0  neg r3,r3

Musl (arch/powerpc/syscall_arch.h) directly checks the summary overflow bit with the 'bns' instruction, which is smaller:

    /* no summary overflow bit means no error, return value directly */
    bns+ 1f
    /* otherwise, return negated value */
    neg r3, r3
    1:

    -->

    10000418: 40 a3 00 08  bns 0x10000420
    1000041c: 7c 63 00 d0  neg r3,r3

Like musl, Linux (arch/powerpc/include/asm/vdso/gettimeofday.h) uses the same method for do_syscall_2() too. The second method is applied here to get a smaller size.

[1]: https://man7.org/linux/man-pages/man2/syscall.2.html

Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
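The normalization rule stated in this message can be written in plain C as a rough model. This is a sketch under stated assumptions, not the powerpc assembly itself: `demo_ppc_normalize` is a hypothetical name, and the `summary_overflow` parameter stands in for the summary overflow bit of cr0.

```c
/*
 * Model of the powerpc return-value normalization:
 * the kernel leaves a positive error code in r3 and sets the summary
 * overflow bit on failure; the wrapper turns that into a negative
 * value so a generic "negative means error" helper can handle it.
 */
static long demo_ppc_normalize(long r3, int summary_overflow)
{
	/* no summary overflow bit means no error, return value directly */
	if (!summary_overflow)
		return r3;
	/* otherwise, return negated value */
	return -r3;
}
```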
2023-08-23tools/nolibc: stdint: use __SIZE_TYPE__ for size_tThomas Weißschuh1-1/+1
Otherwise both gcc and clang may generate warnings about type mismatches: sysroot/mips/include/string.h:12:14: warning: mismatch in argument 1 type of built-in function 'malloc'; expected 'unsigned int' [-Wbuiltin-declaration-mismatch] 12 | static void *malloc(size_t len); | ^~~~~~ The compiler provides __SIZE_TYPE__ as the type that corresponds to size_t (typically "long unsigned int" or "unsigned int"). It was verified to be available at least since gcc-3.4 and clang-3.8, so from now on we'll use this definition for size_t. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/lkml/20230805161929.GA15284@1wt.eu/ Signed-off-by: Willy Tarreau <w@1wt.eu>
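A minimal sketch of the approach, assuming a gcc or clang compiler that predefines `__SIZE_TYPE__`; the names `demo_size_t` and `demo_strlen` are hypothetical and only used for illustration.

```c
/* Let the compiler tell us which type it uses for sizeof results. */
typedef __SIZE_TYPE__ demo_size_t;

/* Small helper using the typedef, so its return type matches what
 * the compiler expects from size-returning built-ins. */
static demo_size_t demo_strlen(const char *s)
{
	demo_size_t len = 0;

	while (s[len])
		len++;
	return len;
}
```

Since the typedef follows the compiler's own definition, it stays correct on targets where size_t is "unsigned int" as well as on those where it is "long unsigned int".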
2023-08-23tools/nolibc: sys: avoid implicit sign castThomas Weißschuh1-1/+1
getauxval() returns an unsigned long but the overall type of the ternary operator needs to be signed. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: setvbuf: avoid unused parameter warningsThomas Weißschuh1-1/+4
This warning will be enabled later so avoid triggering it. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: fix return type of getpagesize()Thomas Weißschuh1-2/+2
It's documented as returning int which is also implemented by glibc and musl, so adopt that return type. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: drop unused variablesThomas Weißschuh1-1/+0
Nobody needs it, get rid of it. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: add pipe() and pipe2() supportYuan Tan1-0/+24
According to the manual page [1], the POSIX spec [2] and source code like arch/mips/kernel/syscall.c, for historic reasons, the sys_pipe() syscall on some architectures has an unusual calling convention. It returns results in two registers, which means there is no need for it to verify the validity of a userspace pointer argument. Historically that used to be expensive in Linux. These days the performance advantage is negligible. Nolibc doesn't support the unusual calling convention above; luckily, Linux has provided a generic sys_pipe2() with an additional flags argument since 2.6.27. If flags is 0, then pipe2() is the same as pipe(). So here we use sys_pipe2() to implement pipe(). pipe2() is also provided to allow users to use the flags argument on demand. [1]: https://man7.org/linux/man-pages/man2/pipe.2.html [2]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/pipe.html Suggested-by: Zhangjin Wu <falcon@tinylab.org> Link: https://lore.kernel.org/all/20230729100401.GA4577@1wt.eu/ Signed-off-by: Yuan Tan <tanyuan@tinylab.org> Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
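The idea of building pipe() on top of pipe2() with flags set to 0 can be sketched as follows. This is not the nolibc code: it assumes a Linux system with a regular libc that provides `syscall()` and `SYS_pipe2`, and `demo_pipe2` / `demo_pipe` are hypothetical names.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the generic pipe2 system call. */
static int demo_pipe2(int fds[2], int flags)
{
	return (int)syscall(SYS_pipe2, fds, flags);
}

/* pipe() is simply pipe2() with no flags. */
static int demo_pipe(int fds[2])
{
	return demo_pipe2(fds, 0);
}
```

A caller that needs, for example, close-on-exec descriptors would call the pipe2-style wrapper directly with the corresponding flag.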
2023-08-23tools/nolibc/stdio: add setvbuf() to set buffering modeRyan Roberts1-0/+24
Add a minimal implementation of setvbuf(), which error checks the mode argument (as required by spec) and returns. Since nolibc never buffers output, nothing needs to be done. The kselftest framework recently added a call to setvbuf(). As a result, any tests that use the kselftest framework and nolibc cause a compiler error due to missing function. This provides an urgent fix for the problem which is preventing arm64 testing on linux-next. Example: clang --target=aarch64-linux-gnu -fintegrated-as -Werror=unknown-warning-option -Werror=ignored-optimization-argument -Werror=option-ignored -Werror=unused-command-line-argument --target=aarch64-linux-gnu -fintegrated-as -fno-asynchronous-unwind-tables -fno-ident -s -Os -nostdlib \ -include ../../../../include/nolibc/nolibc.h -I../..\ -static -ffreestanding -Wall za-fork.c build/kselftest/arm64/fp/za-fork-asm.o -o build/kselftest/arm64/fp/za-fork In file included from <built-in>:1: In file included from ./../../../../include/nolibc/nolibc.h:97: In file included from ./../../../../include/nolibc/arch.h:25: ./../../../../include/nolibc/arch-aarch64.h:178:35: warning: unknown attribute 'optimize' ignored [-Wunknown-attributes] void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from za-fork.c:12: ../../kselftest.h:123:2: error: call to undeclared function 'setvbuf'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] setvbuf(stdout, NULL, _IOLBF, 0); ^ ../../kselftest.h:123:24: error: use of undeclared identifier '_IOLBF' setvbuf(stdout, NULL, _IOLBF, 0); ^ 1 warning and 2 errors generated. 
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/linux-kselftest/CA+G9fYus3Z8r2cg3zLv8uH8MRrzLFVWdnor02SNr=rCz+_WGVg@mail.gmail.com/ Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
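A minimal sketch of what such a setvbuf() could look like, assuming the standard `_IOFBF`, `_IOLBF` and `_IONBF` constants from <stdio.h>; `demo_setvbuf` is a hypothetical name and this is not the nolibc source.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Validate the mode argument and do nothing else: an implementation
 * that never buffers output has no buffering state to change.
 */
static int demo_setvbuf(FILE *stream, char *buf, int mode, size_t size)
{
	/* Parameters are intentionally unused in this sketch. */
	(void)stream;
	(void)buf;
	(void)size;

	switch (mode) {
	case _IOFBF:
	case _IOLBF:
	case _IONBF:
		return 0;
	default:
		return EOF;
	}
}
```

With this in place, a call such as the one in kselftest.h (line-buffered stdout) succeeds without changing anything.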
2023-08-23tools/nolibc: s390: shrink _start with _start_cZhangjin Wu1-31/+5
Move most of the _start operations to _start_c(), including the stackprotector initialization. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: riscv: shrink _start with _start_cZhangjin Wu1-39/+5
Move most of the _start operations to _start_c(), including the stackprotector initialization. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: loongarch: shrink _start with _start_cZhangjin Wu1-40/+4
Move most of the _start operations to _start_c(), including the stackprotector initialization. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: mips: shrink _start with _start_cZhangjin Wu1-38/+8
Move most of the _start operations to _start_c(), including the stackprotector initialization. Also clean up the instructions in delay slots. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: x86_64: shrink _start with _start_cZhangjin Wu1-23/+6
Move most of the _start operations to _start_c(), including the stackprotector initialization. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: i386: shrink _start with _start_cZhangjin Wu1-27/+7
Move most of the _start operations to _start_c(), including the stackprotector initialization. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: aarch64: shrink _start with _start_cZhangjin Wu1-23/+4
Move most of the _start operations to _start_c(), including the stackprotector initialization. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: arm: shrink _start with _start_cZhangjin Wu1-39/+5
Move most of the _start operations to _start_c(), including the stackprotector initialization. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: crt.h: initialize stack protectorZhangjin Wu1-0/+4
As suggested by Thomas, the stackprotector initialization can be moved from the assembly _start to the beginning of the new _start_c(). Let's call __stack_chk_init() in _start_c() as a preparation. Suggested-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/lkml/a00284a6-54b1-498c-92aa-44997fa78403@t-8ch.de/ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: stackprotector.h: add empty __stack_chk_init for ↵Zhangjin Wu1-0/+2
!_NOLIBC_STACKPROTECTOR Let's define an empty __stack_chk_init for the !_NOLIBC_STACKPROTECTOR branch. This allows removing the #ifdef around every call to __stack_chk_init(). Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: add new crt.h with _start_cZhangjin Wu2-0/+58
As environ and _auxv support was added to nolibc, the assembly _start function became more and more complex, which makes porting nolibc to new architectures harder and harder. To simplify portability, this C version, _start_c(), is added to do most of the assembly start operations in C, which reduces the complexity a lot and will eventually simplify porting nolibc to new architectures. The new _start_c() only requires a stack pointer argument; it finds argc, argv, envp/environ and _auxv for us, then calls main(), and finally calls exit() with main's return status. With this new _start_c(), future architectures only need to add very few assembly instructions. As suggested by Thomas, users may use a different signature for main (e.g. void main(void)), so a _nolibc_main alias is added for main to silence the warning about potentially conflicting types. As suggested by Willy, the code is carefully polished for both smaller size and better readability, with local variables and the right types. Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/20230715095729.GC24086@1wt.eu/ Suggested-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/lkml/90fdd255-32f4-4caf-90ff-06456b53dac3@t-8ch.de/ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
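How a C startup routine can locate argc, argv, envp and the auxiliary vector from a single stack pointer can be sketched as below. It assumes the usual Linux process stack layout (argc, then the argv pointers, a NULL, the envp pointers, a NULL, then the auxiliary vector). `struct demo_startup` and `demo_parse_stack` are hypothetical names, and the sketch only locates the data; it does not call main() or exit().

```c
#include <stddef.h>
#include <stdint.h>

/* Everything a startup routine needs before calling main(). */
struct demo_startup {
	long argc;
	char **argv;
	char **envp;
	const void *auxv;
};

/* 'sp' points at the initial stack: argc is stored in the first slot. */
static struct demo_startup demo_parse_stack(char **sp)
{
	struct demo_startup s;
	char **p;

	/* first slot: argument count */
	s.argc = (long)(intptr_t)sp[0];
	/* argv[] follows immediately and is NULL-terminated */
	s.argv = sp + 1;
	/* envp[] starts right after argv's terminating NULL */
	s.envp = s.argv + s.argc + 1;
	/* skip the environment to find its terminating NULL */
	for (p = s.envp; *p; p++)
		;
	/* the auxiliary vector starts right after that NULL */
	s.auxv = p + 1;
	return s;
}
```

With this layout handled in C, the per-architecture assembly only has to pass the initial stack pointer as the first argument and call the C routine.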
2023-08-23tools/nolibc: remove the old sys_stat supportZhangjin Wu9-248/+13
The statx manpage [1] shows that it has been supported since Linux 4.11 and glibc 2.28. The Linux support can be checked for all of the architectures with this command: $ git grep -r statx v4.11 arch/ include/uapi/asm-generic/unistd.h \ | grep -E "aarch64|arm|mips|s390|x86|:include/uapi" Apart from riscv and loongarch, all of the nolibc-supported architectures have had sys_statx since Linux v4.11. riscv was mainlined in v4.15 and loongarch in v5.19; both use the generic unistd.h, so they have had sys_statx since their first mainline versions. The oldest current stable branch is v4.14, so keeping only sys_statx still preserves compatibility with all of the supported stable branches. So let's remove the old architecture-specific sys_stat support completely. This also eases porting to future architectures. [1]: https://man7.org/linux/man-pages/man2/statx.2.html Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: fix up startup failures for -O0 under gcc < 11.1.0Zhangjin Wu8-8/+8
As the gcc doc [1] shows: Most optimizations are completely disabled at -O0 or if an -O level is not set on the command line, even if individual optimization flags are specified. Test results [2] show that gcc >= 11.1.0 deviates from the above description, but before gcc 11.1.0, "-O0" still forcibly uses the frame pointer in the _start function even if the individual optimize("omit-frame-pointer") flag is specified. The frame pointer related operations change the stack pointer (e.g. on x86_64, an extra "push %rbp" is inserted at the beginning of _start) so that it differs from the one we expect, which breaks the whole startup function. To fix this issue, as suggested by Thomas, the individual "Os" and "omit-frame-pointer" optimize flags are used together on the _start function to disable the frame pointer completely even if -O0 is set on the command line. [1]: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html [2]: https://lore.kernel.org/lkml/20230714094723.140603-1-falcon@tinylab.org/ Suggested-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/lkml/34b21ba5-7b59-4b3b-9ed6-ef9a3a5e06f7@t-8ch.de/ Fixes: 7f8548589661 ("tools/nolibc: make compiler and assembler agree on the section around _start") Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: arch-*.h: add missing space after ','Zhangjin Wu8-8/+8
Fix up such errors reported by scripts/checkpatch.pl: ERROR: space required after that ',' (ctx:VxV) #148: FILE: tools/include/nolibc/arch-aarch64.h:148: +void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void) ^ ERROR: space required after that ',' (ctx:VxV) #148: FILE: tools/include/nolibc/arch-aarch64.h:148: +void __attribute__((weak,noreturn,optimize("omit-frame-pointer"))) __no_stack_protector _start(void) ^ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: completely remove optional environ supportThomas Weißschuh1-10/+2
In commit 52e423f5b93e ("tools/nolibc: export environ as a weak symbol on i386") and friends the asm startup logic was extended to directly populate the "environ" array. This makes it impossible for "environ" to be dropped by the linker. Therefore also drop the other logic to handle non-present "environ". Also add a testcase to validate the initialization of environ. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: add rmdir() supportZhangjin Wu1-0/+22
rmdir() is the reverse operation of mkdir(), so it is worth adding here. It is required by nolibc-test to remove /proc when CONFIG_PROC_FS is not enabled. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
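One way an rmdir() wrapper can be built on raw system calls is sketched below. This is not the nolibc implementation: it assumes a Linux system with a regular libc providing `syscall()`, and it assumes that architectures without a dedicated rmdir syscall number can use unlinkat() with AT_REMOVEDIR instead. `demo_rmdir` is a hypothetical name.

```c
#include <fcntl.h>        /* AT_FDCWD, AT_REMOVEDIR */
#include <sys/syscall.h>
#include <unistd.h>

/* Remove an empty directory; returns 0 on success, -1 on error. */
static int demo_rmdir(const char *path)
{
#ifdef SYS_rmdir
	/* dedicated system call, available on older architectures */
	return (int)syscall(SYS_rmdir, path);
#else
	/* generic fallback: unlinkat() relative to the current directory */
	return (int)syscall(SYS_unlinkat, AT_FDCWD, path, AT_REMOVEDIR);
#endif
}
```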
2023-08-23tools/nolibc: types.h: add RB_ flags for reboot()Zhangjin Wu2-2/+11
Both glibc and musl provide RB_ flags via <sys/reboot.h> for reboot(), they don't need to include <linux/reboot.h>, let nolibc provide RB_ flags too. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: clean up sbrk() routineZhangjin Wu1-5/+4
Fix up the error reported by scripts/checkpatch.pl: ERROR: do not use assignment in if condition #95: FILE: tools/include/nolibc/sys.h:95: + if ((ret = sys_brk(0)) && (sys_brk(ret + inc) == ret + inc)) Apply the new generic __sysret() to merge the SET_ERRNO() and return lines. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: clean up mmap() routineZhangjin Wu2-23/+12
Do several cleanups together:
- Since all supported architectures have my_syscall6() now, remove the #ifdef check.
- Move the mmap() related macros to tools/include/nolibc/types.h and reuse most of them from <linux/mman.h>.
- Apply the new generic __sysret() to convert the call to sys_mmap() into a one-line statement.
Note: since MAP_FAILED is -1 on Linux, we can use the generic __sysret(), which returns -1 upon error, and still satisfy userland code that checks for MAP_FAILED. Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/20230702192347.GJ16233@1wt.eu/ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: __sysret: support syscalls who return a pointerZhangjin Wu1-5/+12
No official reference states the errno range, so this aligns with musl and glibc and uses [-MAX_ERRNO, -1] instead of all negative values. - musl: src/internal/syscall_ret.c - glibc: sysdeps/unix/sysv/linux/sysdep.h The MAX_ERRNO used by musl and glibc is 4095, the same as the one nolibc defines in tools/include/nolibc/errno.h. Suggested-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/ZKKdD%2Fp4UkEavru6@1wt.eu/ Suggested-by: David Laight <David.Laight@ACULAB.COM> Link: https://lore.kernel.org/linux-riscv/94dd5170929f454fbc0a10a2eb3b108d@AcuMS.aculab.com/ Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
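The [-MAX_ERRNO, -1] range check for a pointer-returning syscall can be sketched like this. It is an illustration only, not the nolibc helper: `DEMO_MAX_ERRNO`, `DEMO_MAP_FAILED`, `demo_errno` and `demo_ptr_ret` are hypothetical names, with 4095 taken from the value quoted above.

```c
#include <stdint.h>

#define DEMO_MAX_ERRNO  4095
#define DEMO_MAP_FAILED ((void *)-1)

/* Hypothetical stand-in for errno. */
static int demo_errno;

/*
 * A raw return value is an error only if, seen as an unsigned
 * address, it falls in the last DEMO_MAX_ERRNO values of the
 * address space, i.e. in [-DEMO_MAX_ERRNO, -1] when seen as signed.
 * Any other value, including one with the top bit set, is a valid
 * pointer and is returned unchanged.
 */
static void *demo_ptr_ret(void *raw)
{
	if ((uintptr_t)raw >= (uintptr_t)-DEMO_MAX_ERRNO) {
		demo_errno = (int)-(intptr_t)raw;
		return DEMO_MAP_FAILED;
	}
	return raw;
}
```

Treating every negative value as an error would wrongly reject valid addresses in the upper half of the address space, which is why the narrower range is used.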
2023-08-23tools/nolibc: add missing my_syscall6() for mipsZhangjin Wu2-6/+30
On mips, the 6th argument can be passed via the stack just like the 5th argument, so let's add a new my_syscall6(). See [1] for details: The mips/o32 system call convention passes arguments 5 through 8 on the user stack. Both mmap() and pselect6() require my_syscall6(). [1]: https://man7.org/linux/man-pages/man2/syscall.2.html Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 | tools/nolibc: arch-mips.h: shrink with _NOLIBC_SYSCALL_CLOBBERLIST | Zhangjin Wu | 1 | -12/+10
The my_syscall<N> macros share the same long clobber list, so define a macro for it. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 | tools/nolibc: arch-loongarch.h: shrink with _NOLIBC_SYSCALL_CLOBBERLIST | Zhangjin Wu | 1 | -14/+9
The my_syscall<N> macros share the same long clobber list, so define a macro for it. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 | toolc/nolibc: arch-*.h: clean up whitespaces after __asm__ | Zhangjin Wu | 8 | -54/+54
replace "__asm__ volatile" with "__asm__ volatile" and insert necessary whitespace before "\" to make sure the lines are aligned. $ sed -i -e 's/__asm__ volatile ( /__asm__ volatile ( /g' tools/include/nolibc/*.h Note, arch-s390.h uses post-tab instead of post-whitespaces, must avoid insert whitespace just before the tabs: $ sed -i -e 's/__asm__ volatile (\t/__asm__ volatile (\t/g' tools/include/nolibc/arch-*.h Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23 | tools/nolibc: arch-*.h: fix up code indent errors | Zhangjin Wu | 5 | -39/+39
Code indents that contain 8 or more consecutive whitespaces are replaced with "tab + whitespaces" to fix up such errors reported by scripts/checkpatch.pl:

    ERROR: code indent should use tabs where possible
    #64: FILE: tools/include/nolibc/arch-mips.h:64:
    +^I \$

    ERROR: code indent should use tabs where possible
    #72: FILE: tools/include/nolibc/arch-mips.h:72:
    +^I "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \$

This command is used; it replaces every run of 8 spaces with a tab on lines whose indent, after any leading tabs, starts with 8 spaces:

    $ sed -i -e '/^\t*        /{s/        /\t/g}' tools/include/nolibc/arch-*.h

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-22 | bpf: Add pid filter support for uprobe_multi link | Jiri Olsa | 1 | -0/+1
Add support for specifying a pid for the uprobe_multi link, so that the uprobes are created only for the task with the given pid value. This uses the consumer.filter callback, so the task gets filtered during uprobe installation. We still need to check the task at runtime in the uprobe handler, because the handler could get executed if there's another system-wide consumer on the same uprobe (thanks Oleg for the insight). Cc: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-6-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-22 | bpf: Add cookies support for uprobe_multi link | Jiri Olsa | 1 | -0/+1
Add support for specifying a cookies array for the uprobe_multi link. The cookies array shares indexes and length with the other uprobe_multi arrays (offsets/ref_ctr_offsets). The cookies[i] value defines the cookie for the i-th uprobe and will be returned by the bpf_get_attach_cookie helper when called from an eBPF program hooked to that specific uprobe. Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-5-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-22 | bpf: Add multi uprobe link | Jiri Olsa | 1 | -0/+16
Add a new multi uprobe link that allows attaching a bpf program to multiple uprobes.

Uprobes to attach are specified via the new link_create uprobe_multi union:

    struct {
            __aligned_u64   path;
            __aligned_u64   offsets;
            __aligned_u64   ref_ctr_offsets;
            __u32           cnt;
            __u32           flags;
    } uprobe_multi;

Uprobes are defined for a single binary specified in path and multiple call sites specified in the offsets array, with optional reference counters specified in the ref_ctr_offsets array. All specified arrays have a length of 'cnt'.

The 'flags' field supports a single bit for now, which marks the uprobe as a return probe.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230809083440.3209381-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-22 | bpf: Switch BPF_F_KPROBE_MULTI_RETURN macro to enum | Jiri Olsa | 1 | -1/+3
Switching the BPF_F_KPROBE_MULTI_RETURN macro to an anonymous enum, so it'd show up in vmlinux.h. There's no functional change compared to having this as a macro. Acked-by: Yafang Shao <laoar.shao@gmail.com> Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230809083440.3209381-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-08 | bpf: Add support for bpf_get_func_ip helper for uprobe program | Jiri Olsa | 1 | -1/+6
Adding support for the bpf_get_func_ip helper for uprobe programs to return the probed address for both uprobe and return uprobe.

We discussed this in [1] and agreed that uprobe can have a special use of the bpf_get_func_ip helper that differs from kprobe.

The kprobe bpf_get_func_ip returns:
- address of the function if the probe is attached on function entry, for both kprobe and return kprobe
- 0 if the probe is not attached on function entry

The uprobe bpf_get_func_ip returns:
- address of the probe for both uprobe and return uprobe

The reason for this semantic change is that the kernel can't really tell if the probed user space address is a function entry.

The uprobe program is actually a kprobe type program attached as a uprobe. One of the consequences of this design is that uprobes do not have their own set of helpers, but share them with kprobes.

As we need different functionality for the bpf_get_func_ip helper for uprobe, I'm adding the bool value to the bpf_trace_run_ctx, so the helper can detect that it's executed in uprobe context and call specific code.

The is_uprobe bool is set to true in bpf_prog_run_array_sleepable, which is currently used only for executing bpf programs in uprobe.

Renaming bpf_prog_run_array_sleepable to bpf_prog_run_array_uprobe to reflect that it's only used for uprobes and that it sets run_ctx.is_uprobe, as suggested by Yafang Shao.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
[1] https://lore.kernel.org/bpf/CAEf4BzZ=xLVkG5eurEuvLU79wAMtwho7ReR+XJAgwhFF4M-7Cg@mail.gmail.com/
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Viktor Malik <vmalik@redhat.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20230807085956.2344866-2-jolsa@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-06 | tools/nolibc: unistd.h: reorder the syscall macros | Zhangjin Wu | 1 | -2/+2
Reorder the macros to match their order of use and align most of them. Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-06 | tools/nolibc: sys.h: apply __sysret() helper | Zhangjin Wu | 1 | -310/+44
Use __sysret() to shrink most of the library routines to one line of code each. This removes 266 lines of duplicated code. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-06 | tools/nolibc: unistd.h: apply __sysret() helper | Zhangjin Wu | 1 | -10/+1
Use __sysret() to shrink the whole _syscall() to one line of code. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-06 | tools/nolibc: sys.h: add a syscall return helper | Zhangjin Wu | 1 | -0/+10
Most of the library routines share the same syscall return logic:

    In general, a 0 return value indicates success. A -1 return value indicates an error, and an error number is stored in errno. [1]

Let's add a __sysret() helper for the above logic to simplify the coding and shrink the code lines too.

Thomas suggested using an inline function instead of a macro for __sysret().

Willy suggested making __sysret() always inline.

[1]: https://man7.org/linux/man-pages/man2/syscall.2.html

Suggested-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/linux-riscv/ZH1+hkhiA2+ItSvX@1wt.eu/
Suggested-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/linux-riscv/ea4e7442-7223-4211-ba29-70821e907888@t-8ch.de/
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-06 | tools/nolibc: fix up undeclared syscall macros with #ifdef and -ENOSYS | Zhangjin Wu | 1 | -0/+12
Compiling nolibc for rv32 got such errors:

    nolibc/sysroot/riscv/include/sys.h: In function ‘sys_gettimeofday’:
    nolibc/sysroot/riscv/include/sys.h:557:21: error: ‘__NR_gettimeofday’ undeclared (first use in this function); did you mean ‘sys_gettimeofday’?
      557 |         return my_syscall2(__NR_gettimeofday, tv, tz);
          |                            ^~~~~~~~~~~~~~~~~
    nolibc/sysroot/riscv/include/sys.h: In function ‘sys_lseek’:
    nolibc/sysroot/riscv/include/sys.h:675:21: error: ‘__NR_lseek’ undeclared (first use in this function)
      675 |         return my_syscall3(__NR_lseek, fd, offset, whence);
          |                            ^~~~~~~~~~
    nolibc/sysroot/riscv/include/sys.h: In function ‘sys_wait4’:
    nolibc/sysroot/riscv/include/sys.h:1341:21: error: ‘__NR_wait4’ undeclared (first use in this function)
     1341 |         return my_syscall4(__NR_wait4, pid, status, options, rusage);

If a syscall macro is not supported by a target platform, wrap it with '#ifdef' and 'return -ENOSYS' for the '#else' branch, which lets the other syscalls work as-is and allows developers to fix up the test failures reported by nolibc-test one by one later.

This wraps all of the failed syscall macros with '#ifdef' and 'return -ENOSYS' for the '#else' branch, so all of the undeclared failures are fixed.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/linux-riscv/5e7d2adf-e96f-41ca-a4c6-5c87a25d4c9c@app.fastmail.com/
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>