From 90156f4bfa21ada9943b538e27385152407605b1 Mon Sep 17 00:00:00 2001
From: Dave Marchevsky
Date: Fri, 16 Dec 2022 13:43:18 -0800
Subject: bpf, x86: Improve PROBE_MEM runtime load check

This patch rewrites the runtime PROBE_MEM check insns emitted by the BPF
JIT in order to ensure load safety. The changes in the patch fix two
issues with the previous logic and more generally improve the size of the
emitted code. Paragraphs between this one and "FIX 1" below explain the
purpose of the runtime check and examine the current implementation.

When a load is marked PROBE_MEM - e.g. due to PTR_UNTRUSTED access - the
address being loaded from is not necessarily valid. The BPF JIT sets up
exception handlers for each such load which catch page faults and zero
out the destination register.

Arbitrary register-relative loads can escape this exception handling
mechanism. Specifically, a load like dst_reg = *(src_reg + off) will not
trigger BPF exception handling if (src_reg + off) is outside of kernel
address space, resulting in an uncaught page fault. A concrete example of
such behavior is a program like:

  struct result {
    char space[40];
    long a;
  };

  /* if err, returns ERR_PTR(-EINVAL) */
  struct result *ptr = get_ptr_maybe_err();
  long x = ptr->a;

If get_ptr_maybe_err returns ERR_PTR(-EINVAL) and the result isn't
checked for err, 'ptr' will be (u64)-EINVAL, a number close to U64_MAX.
The address of the ptr->a load will then exceed U64_MAX and wrap over to
a small positive u64, which will be in userspace and thus not covered by
the BPF exception handling mechanism.

In order to prevent such loads from occurring, the BPF JIT emits some
instructions which do runtime checking of (src_reg + off) and skip the
actual load if it's out of range. As an example, here are the
instructions emitted for a %rdi = *(%rdi + 0x10) PROBE_MEM load:

  72:   movabs $0x800000000010,%r11       --|
  7c:   cmp    %r11,%rdi                    |- 72 - 7f: Check 1
  7f:   jb     0x000000000000008d         --|
  81:   mov    %rdi,%r11                  -----|
  84:   add    $0x0000000000000010,%r11        |- 81 - 8b: Check 2
  8b:   jnc    0x0000000000000091         -----|
  8d:   xor    %edi,%edi                  ---- 0 out dest
  8f:   jmp    0x0000000000000095
  91:   mov    0x10(%rdi),%rdi            ---- Actual load
  95:

The JIT considers kernel address space to start at TASK_SIZE_MAX +
PAGE_SIZE. Determining whether a load will be outside of kernel address
space should be a simple check:

  (src_reg + off) >= TASK_SIZE_MAX + PAGE_SIZE

But because there is only one spare register when the checking logic is
emitted, this logic is split into two checks:

  Check 1: src_reg >= (TASK_SIZE_MAX + PAGE_SIZE - off)
  Check 2: src_reg + off doesn't wrap over U64_MAX and result in small pos u64

Emitted insns implementing Checks 1 and 2 are annotated in the above
example. Check 1 can be done with the single spare register since the
source reg is by definition the left-hand side of the inequality. Since
adding 'off' to both sides of Check 1's inequality yields the original
inequality we want, testing Check 1 is equivalent - except in the case
where src_reg + off wraps past U64_MAX, which is why Check 2 needs to
actually add src_reg + off if Check 1 passes, again using the single
spare reg.

FIX 1: The Check 1 inequality listed above is not what current code is
doing. Current code is a bit more pessimistic, instead checking:

  src_reg >= (TASK_SIZE_MAX + PAGE_SIZE + abs(off))

The 0x800000000010 in the above example comes from this current check.
If Check 1 were corrected to use the correct right-hand side, the value
would be 0x7ffffffffff0.
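As a rough illustration of FIX 1, the two right-hand sides can be compared
with ordinary C arithmetic. This is only a sketch: the base address is
hardcoded to the 0x800000000000 value visible in the disassembly above
rather than taken from kernel headers, and the offset is the 0x10 from the
same example.

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
          /* Assumed TASK_SIZE_MAX + PAGE_SIZE, taken from the example above */
          const uint64_t kernel_start = 0x800000000000ULL;
          const int16_t off = 0x10;    /* offset of the PROBE_MEM load */

          /* bound used by the current code: base + abs(off) */
          uint64_t old_bound = kernel_start + (off < 0 ? -(int64_t)off : off);
          /* corrected Check 1 bound: base - off */
          uint64_t new_bound = kernel_start - off;

          printf("current: %#llx corrected: %#llx\n",
                 (unsigned long long)old_bound, (unsigned long long)new_bound);
          /* prints current: 0x800000000010 corrected: 0x7ffffffffff0 */
          return 0;
  }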
This patch changes the checking logic more broadly (FIX 2 below will
elaborate), fixing this issue as a side-effect of the rewrite. Regardless,
it's important to understand why Check 1 should have been using
TASK_SIZE_MAX + PAGE_SIZE - off before proceeding.

FIX 2: Current code relies on a 'jnc' to determine whether the
src_reg + off addition wrapped over. For negative offsets this logic is
incorrect. Consider the Check 2 insns emitted when off = -0x10:

  81:   mov    %rdi,%r11
  84:   add    $0xfffffffffffffff0,%r11
  8b:   jnc    0x0000000000000091

The two's complement representation of -0x10 is a large positive u64, so
any value of src_reg that passes Check 1 will result in the carry flag
being set by the (src_reg + off) addition. A load with any negative
offset will therefore always fail Check 2 at runtime and never do the
actual load.

This patch fixes the negative offset issue by rewriting both checks so
that they do not rely on the carry flag. The rewrite takes advantage of
the fact that, while we only have one scratch reg to hold arbitrary
values, we know the offset at JIT time. Thus we can use src_reg itself as
a temporary scratch reg to hold src_reg + offset, since we can return it
to its original value by later subtracting offset. As a result we can
directly check the original inequality we care about:

  (src_reg + off) >= TASK_SIZE_MAX + PAGE_SIZE

For a load like %rdi = *(%rsi + -0x10), this results in emitted code:

  43:   movabs $0x800000000000,%r11
  4d:   add    $0xfffffffffffffff0,%rsi   --- src_reg += off
  54:   cmp    %r11,%rsi                  --- Check original inequality
  57:   jae    0x000000000000005d
  59:   xor    %edi,%edi
  5b:   jmp    0x0000000000000061
  5d:   mov    0x0(%rsi),%rdi             --- Actual load
  61:   sub    $0xfffffffffffffff0,%rsi   --- src_reg -= off

Note that the actual load is always done with offset 0, since the previous
insns have already done src_reg += off. Regardless of whether the new
check succeeds or fails, insn 61 is always executed, returning src_reg to
its original value.

Because the goal of these checks is to ensure that the loaded-from address
is protected by the BPF exception handler, the new check can safely ignore
any wrap-over from insn 4d. If such a wrapped-over address passes the
cmp-and-jmp at insns 54 and 57, it will have that protection, so the load
can proceed.

IMPROVEMENTS: The above improved logic is 8 insns vs the original logic's
9, and has 1 fewer jmp. The number of checking insns can be further
reduced in common scenarios:

If src_reg == dst_reg, the actual load insn clobbers src_reg, so there is
no original src_reg state for the sub insn immediately following the load
to restore, and it can be omitted. In fact, it must be omitted, since it
would otherwise incorrectly subtract from the result of the load. So for
src_reg == dst_reg, the JIT emits these insns:

  3c:   movabs $0x800000000000,%r11
  46:   add    $0xfffffffffffffff0,%rdi
  4d:   cmp    %r11,%rdi
  50:   jae    0x0000000000000056
  52:   xor    %edi,%edi
  54:   jmp    0x000000000000005a
  56:   mov    0x0(%rdi),%rdi
  5a:

The only difference from the larger example is the omitted sub, which
would have been insn 5a in this example.

If offset == 0, we can similarly omit the sub as in the previous case,
since there is nothing added to subtract. For the same reason we can omit
the addition as well, resulting in the JIT emitting these insns:

  43:   movabs $0x800000000000,%r11
  4d:   cmp    %r11,%rdi
  50:   jae    0x0000000000000056
  52:   xor    %edi,%edi
  54:   jmp    0x000000000000005a
  56:   mov    0x0(%rdi),%rdi
  5a:

Although the above example also has src_reg == dst_reg, the same
offset == 0 optimization is valid to apply if src_reg != dst_reg.
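Taken together, the rewritten checks behave like the following C model.
This is only a sketch of the runtime semantics, not the JIT code or the
emitted x86: 'limit' stands in for TASK_SIZE_MAX + PAGE_SIZE and 'do_load'
for the guarded load that the exception handler protects.

  #include <stdint.h>

  /* Models dst_reg = *(src_reg + off) under the rewritten PROBE_MEM check. */
  static uint64_t probe_mem_model(uint64_t src_reg, int16_t off, uint64_t limit,
                                  uint64_t (*do_load)(uint64_t addr))
  {
          uint64_t dst_reg;

          src_reg += (int64_t)off;            /* add src_reg, insn->off (may wrap) */
          if (src_reg >= limit)               /* cmp + jae: looks like a kernel address */
                  dst_reg = do_load(src_reg); /* actual load, at offset 0 */
          else
                  dst_reg = 0;                /* xor dst_reg, dst_reg */
          src_reg -= (int64_t)off;            /* restore src_reg; the real JIT omits
                                               * this when src_reg == dst_reg or
                                               * off == 0 */
          return dst_reg;
  }

Negative offsets work naturally here because the comparison is unsigned
and no carry flag is involved, which is exactly what FIX 2 requires.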
To summarize the improvements in emitted insn count for the check-and-load: BEFORE: 8 check insns, 3 jmps AFTER (general case): 7 check insns, 2 jmps (12.5% fewer insn, 33% jmp) AFTER (src == dst): 6 check insns, 2 jmps (25% fewer insn) AFTER (offset == 0): 5 check insns, 2 jmps (37.5% fewer insn) (Above counts don't include the 1 load insn, just checking around it) Based on BPF bytecode + JITted x86 insn I saw while experimenting with these improvements, I expect the src_reg == dst_reg case to occur most often, followed by offset == 0, then the general case. Signed-off-by: Dave Marchevsky Signed-off-by: Daniel Borkmann Acked-by: Yonghong Song Link: https://lore.kernel.org/bpf/20221216214319.3408356-1-davemarchevsky@fb.com --- arch/x86/net/bpf_jit_comp.c | 70 +++++++++++++++++++++++++-------------------- 1 file changed, 39 insertions(+), 31 deletions(-) (limited to 'arch') diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 36ffe67ad6e5..e3e2b57e4e13 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -992,6 +992,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image u8 b2 = 0, b3 = 0; u8 *start_of_ldx; s64 jmp_offset; + s16 insn_off; u8 jmp_cond; u8 *func; int nops; @@ -1358,57 +1359,52 @@ st: if (is_imm8(insn->off)) case BPF_LDX | BPF_PROBE_MEM | BPF_W: case BPF_LDX | BPF_MEM | BPF_DW: case BPF_LDX | BPF_PROBE_MEM | BPF_DW: + insn_off = insn->off; + if (BPF_MODE(insn->code) == BPF_PROBE_MEM) { - /* Though the verifier prevents negative insn->off in BPF_PROBE_MEM - * add abs(insn->off) to the limit to make sure that negative - * offset won't be an issue. - * insn->off is s16, so it won't affect valid pointers. + /* Conservatively check that src_reg + insn->off is a kernel address: + * src_reg + insn->off >= TASK_SIZE_MAX + PAGE_SIZE + * src_reg is used as scratch for src_reg += insn->off and restored + * after emit_ldx if necessary */ - u64 limit = TASK_SIZE_MAX + PAGE_SIZE + abs(insn->off); - u8 *end_of_jmp1, *end_of_jmp2; - /* Conservatively check that src_reg + insn->off is a kernel address: - * 1. src_reg + insn->off >= limit - * 2. src_reg + insn->off doesn't become small positive. - * Cannot do src_reg + insn->off >= limit in one branch, - * since it needs two spare registers, but JIT has only one. + u64 limit = TASK_SIZE_MAX + PAGE_SIZE; + u8 *end_of_jmp; + + /* At end of these emitted checks, insn->off will have been added + * to src_reg, so no need to do relative load with insn->off offset */ + insn_off = 0; /* movabsq r11, limit */ EMIT2(add_1mod(0x48, AUX_REG), add_1reg(0xB8, AUX_REG)); EMIT((u32)limit, 4); EMIT(limit >> 32, 4); + + if (insn->off) { + /* add src_reg, insn->off */ + maybe_emit_1mod(&prog, src_reg, true); + EMIT2_off32(0x81, add_1reg(0xC0, src_reg), insn->off); + } + /* cmp src_reg, r11 */ maybe_emit_mod(&prog, src_reg, AUX_REG, true); EMIT2(0x39, add_2reg(0xC0, src_reg, AUX_REG)); - /* if unsigned '<' goto end_of_jmp2 */ - EMIT2(X86_JB, 0); - end_of_jmp1 = prog; - - /* mov r11, src_reg */ - emit_mov_reg(&prog, true, AUX_REG, src_reg); - /* add r11, insn->off */ - maybe_emit_1mod(&prog, AUX_REG, true); - EMIT2_off32(0x81, add_1reg(0xC0, AUX_REG), insn->off); - /* jmp if not carry to start_of_ldx - * Otherwise ERR_PTR(-EINVAL) + 128 will be the user addr - * that has to be rejected. 
- */ - EMIT2(0x73 /* JNC */, 0); - end_of_jmp2 = prog; + + /* if unsigned '>=', goto load */ + EMIT2(X86_JAE, 0); + end_of_jmp = prog; /* xor dst_reg, dst_reg */ emit_mov_imm32(&prog, false, dst_reg, 0); /* jmp byte_after_ldx */ EMIT2(0xEB, 0); - /* populate jmp_offset for JB above to jump to xor dst_reg */ - end_of_jmp1[-1] = end_of_jmp2 - end_of_jmp1; - /* populate jmp_offset for JNC above to jump to start_of_ldx */ + /* populate jmp_offset for JAE above to jump to start_of_ldx */ start_of_ldx = prog; - end_of_jmp2[-1] = start_of_ldx - end_of_jmp2; + end_of_jmp[-1] = start_of_ldx - end_of_jmp; } - emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off); + emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn_off); if (BPF_MODE(insn->code) == BPF_PROBE_MEM) { struct exception_table_entry *ex; u8 *_insn = image + proglen + (start_of_ldx - temp); @@ -1417,6 +1413,18 @@ st: if (is_imm8(insn->off)) /* populate jmp_offset for JMP above */ start_of_ldx[-1] = prog - start_of_ldx; + if (insn->off && src_reg != dst_reg) { + /* sub src_reg, insn->off + * Restore src_reg after "add src_reg, insn->off" in prev + * if statement. But if src_reg == dst_reg, emit_ldx + * above already clobbered src_reg, so no need to restore. + * If add src_reg, insn->off was unnecessary, no need to + * restore either. + */ + maybe_emit_1mod(&prog, src_reg, true); + EMIT2_off32(0x81, add_1reg(0xE8, src_reg), insn->off); + } + if (!bpf_prog->aux->extable) break; -- cgit v1.2.3 From 7f7880495770329d095d402c2865bfa7089192f8 Mon Sep 17 00:00:00 2001 From: Pu Lehui Date: Thu, 5 Jan 2023 11:50:26 +0800 Subject: bpf, x86: Simplify the parsing logic of structure parameters Extra_nregs of structure parameters and nr_args can be added directly at the beginning, and using a flip flag to identifiy structure parameters. Meantime, renaming some variables to make them more sense. Signed-off-by: Pu Lehui Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20230105035026.3091988-1-pulehui@huaweicloud.com Signed-off-by: Martin KaFai Lau --- arch/x86/net/bpf_jit_comp.c | 101 +++++++++++++++++++++----------------------- 1 file changed, 48 insertions(+), 53 deletions(-) (limited to 'arch') diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 8db6077febdd..1056bbf55b17 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -1857,62 +1857,59 @@ emit_jmp: return proglen; } -static void save_regs(const struct btf_func_model *m, u8 **prog, int nr_args, +static void save_regs(const struct btf_func_model *m, u8 **prog, int nr_regs, int stack_size) { - int i, j, arg_size, nr_regs; + int i, j, arg_size; + bool next_same_struct = false; + /* Store function arguments to stack. * For a function that accepts two pointers the sequence will be: * mov QWORD PTR [rbp-0x10],rdi * mov QWORD PTR [rbp-0x8],rsi */ - for (i = 0, j = 0; i < min(nr_args, 6); i++) { - if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG) { - nr_regs = (m->arg_size[i] + 7) / 8; + for (i = 0, j = 0; i < min(nr_regs, 6); i++) { + /* The arg_size is at most 16 bytes, enforced by the verifier. */ + arg_size = m->arg_size[j]; + if (arg_size > 8) { arg_size = 8; - } else { - nr_regs = 1; - arg_size = m->arg_size[i]; + next_same_struct = !next_same_struct; } - while (nr_regs) { - emit_stx(prog, bytes_to_bpf_size(arg_size), - BPF_REG_FP, - j == 5 ? X86_REG_R9 : BPF_REG_1 + j, - -(stack_size - j * 8)); - nr_regs--; - j++; - } + emit_stx(prog, bytes_to_bpf_size(arg_size), + BPF_REG_FP, + i == 5 ? 
X86_REG_R9 : BPF_REG_1 + i, + -(stack_size - i * 8)); + + j = next_same_struct ? j : j + 1; } } -static void restore_regs(const struct btf_func_model *m, u8 **prog, int nr_args, +static void restore_regs(const struct btf_func_model *m, u8 **prog, int nr_regs, int stack_size) { - int i, j, arg_size, nr_regs; + int i, j, arg_size; + bool next_same_struct = false; /* Restore function arguments from stack. * For a function that accepts two pointers the sequence will be: * EMIT4(0x48, 0x8B, 0x7D, 0xF0); mov rdi,QWORD PTR [rbp-0x10] * EMIT4(0x48, 0x8B, 0x75, 0xF8); mov rsi,QWORD PTR [rbp-0x8] */ - for (i = 0, j = 0; i < min(nr_args, 6); i++) { - if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG) { - nr_regs = (m->arg_size[i] + 7) / 8; + for (i = 0, j = 0; i < min(nr_regs, 6); i++) { + /* The arg_size is at most 16 bytes, enforced by the verifier. */ + arg_size = m->arg_size[j]; + if (arg_size > 8) { arg_size = 8; - } else { - nr_regs = 1; - arg_size = m->arg_size[i]; + next_same_struct = !next_same_struct; } - while (nr_regs) { - emit_ldx(prog, bytes_to_bpf_size(arg_size), - j == 5 ? X86_REG_R9 : BPF_REG_1 + j, - BPF_REG_FP, - -(stack_size - j * 8)); - nr_regs--; - j++; - } + emit_ldx(prog, bytes_to_bpf_size(arg_size), + i == 5 ? X86_REG_R9 : BPF_REG_1 + i, + BPF_REG_FP, + -(stack_size - i * 8)); + + j = next_same_struct ? j : j + 1; } } @@ -2138,8 +2135,8 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i struct bpf_tramp_links *tlinks, void *func_addr) { - int ret, i, nr_args = m->nr_args, extra_nregs = 0; - int regs_off, ip_off, args_off, stack_size = nr_args * 8, run_ctx_off; + int i, ret, nr_regs = m->nr_args, stack_size = 0; + int regs_off, nregs_off, ip_off, run_ctx_off; struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY]; struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT]; struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN]; @@ -2148,17 +2145,14 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i u8 *prog; bool save_ret; - /* x86-64 supports up to 6 arguments. 7+ can be added in the future */ - if (nr_args > 6) - return -ENOTSUPP; - - for (i = 0; i < MAX_BPF_FUNC_ARGS; i++) { + /* extra registers for struct arguments */ + for (i = 0; i < m->nr_args; i++) if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG) - extra_nregs += (m->arg_size[i] + 7) / 8 - 1; - } - if (nr_args + extra_nregs > 6) + nr_regs += (m->arg_size[i] + 7) / 8 - 1; + + /* x86-64 supports up to 6 arguments. 7+ can be added in the future */ + if (nr_regs > 6) return -ENOTSUPP; - stack_size += extra_nregs * 8; /* Generated trampoline stack layout: * @@ -2172,7 +2166,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i * [ ... 
] * RBP - regs_off [ reg_arg1 ] program's ctx pointer * - * RBP - args_off [ arg regs count ] always + * RBP - nregs_off [ regs count ] always * * RBP - ip_off [ traced function ] BPF_TRAMP_F_IP_ARG flag * @@ -2184,11 +2178,12 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i if (save_ret) stack_size += 8; + stack_size += nr_regs * 8; regs_off = stack_size; - /* args count */ + /* regs count */ stack_size += 8; - args_off = stack_size; + nregs_off = stack_size; if (flags & BPF_TRAMP_F_IP_ARG) stack_size += 8; /* room for IP address argument */ @@ -2221,11 +2216,11 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i EMIT1(0x53); /* push rbx */ /* Store number of argument registers of the traced function: - * mov rax, nr_args + extra_nregs - * mov QWORD PTR [rbp - args_off], rax + * mov rax, nr_regs + * mov QWORD PTR [rbp - nregs_off], rax */ - emit_mov_imm64(&prog, BPF_REG_0, 0, (u32) nr_args + extra_nregs); - emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -args_off); + emit_mov_imm64(&prog, BPF_REG_0, 0, (u32) nr_regs); + emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -nregs_off); if (flags & BPF_TRAMP_F_IP_ARG) { /* Store IP address of the traced function: @@ -2236,7 +2231,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -ip_off); } - save_regs(m, &prog, nr_args, regs_off); + save_regs(m, &prog, nr_regs, regs_off); if (flags & BPF_TRAMP_F_CALL_ORIG) { /* arg1: mov rdi, im */ @@ -2266,7 +2261,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i } if (flags & BPF_TRAMP_F_CALL_ORIG) { - restore_regs(m, &prog, nr_args, regs_off); + restore_regs(m, &prog, nr_regs, regs_off); if (flags & BPF_TRAMP_F_ORIG_STACK) { emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, 8); @@ -2307,7 +2302,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i } if (flags & BPF_TRAMP_F_RESTORE_REGS) - restore_regs(m, &prog, nr_args, regs_off); + restore_regs(m, &prog, nr_regs, regs_off); /* This needs to be done regardless. If there were fmod_ret programs, * the return value is only updated on the stack and still needs to be -- cgit v1.2.3 From 1f4263ea6a4b3a3457e9b21bec00976b00a1136e Mon Sep 17 00:00:00 2001 From: Clark Wang Date: Fri, 13 Jan 2023 11:33:44 +0800 Subject: arm64: dts: imx93: add eqos support Add EQoS node for imx93 platform. Signed-off-by: Clark Wang Reviewed-by: Peng Fan Signed-off-by: David S. 
Miller --- arch/arm64/boot/dts/freescale/imx93.dtsi | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) (limited to 'arch') diff --git a/arch/arm64/boot/dts/freescale/imx93.dtsi b/arch/arm64/boot/dts/freescale/imx93.dtsi index 5d79663b3b84..de85e7f31582 100644 --- a/arch/arm64/boot/dts/freescale/imx93.dtsi +++ b/arch/arm64/boot/dts/freescale/imx93.dtsi @@ -536,6 +536,28 @@ status = "disabled"; }; + eqos: ethernet@428a0000 { + compatible = "nxp,imx93-dwmac-eqos", "snps,dwmac-5.10a"; + reg = <0x428a0000 0x10000>; + interrupts = , + ; + interrupt-names = "eth_wake_irq", "macirq"; + clocks = <&clk IMX93_CLK_ENET_QOS_GATE>, + <&clk IMX93_CLK_ENET_QOS_GATE>, + <&clk IMX93_CLK_ENET_TIMER2>, + <&clk IMX93_CLK_ENET>, + <&clk IMX93_CLK_ENET_QOS_GATE>; + clock-names = "stmmaceth", "pclk", "ptp_ref", "tx", "mem"; + assigned-clocks = <&clk IMX93_CLK_ENET_TIMER2>, + <&clk IMX93_CLK_ENET>; + assigned-clock-parents = <&clk IMX93_CLK_SYS_PLL_PFD1_DIV2>, + <&clk IMX93_CLK_SYS_PLL_PFD0_DIV2>; + assigned-clock-rates = <100000000>, <250000000>; + intf_mode = <&wakeupmix_gpr 0x28>; + clk_csr = <0>; + status = "disabled"; + }; + usdhc3: mmc@428b0000 { compatible = "fsl,imx93-usdhc", "fsl,imx8mm-usdhc"; reg = <0x428b0000 0x10000>; -- cgit v1.2.3 From eaaf471085402df4d2f0afe355322820d114e85d Mon Sep 17 00:00:00 2001 From: Clark Wang Date: Fri, 13 Jan 2023 11:33:45 +0800 Subject: arm64: dts: imx93: add FEC support Add FEC node for imx93 platform. Signed-off-by: Clark Wang Reviewed-by: Peng Fan Signed-off-by: David S. Miller --- arch/arm64/boot/dts/freescale/imx93.dtsi | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'arch') diff --git a/arch/arm64/boot/dts/freescale/imx93.dtsi b/arch/arm64/boot/dts/freescale/imx93.dtsi index de85e7f31582..22dd2ee70be7 100644 --- a/arch/arm64/boot/dts/freescale/imx93.dtsi +++ b/arch/arm64/boot/dts/freescale/imx93.dtsi @@ -558,6 +558,32 @@ status = "disabled"; }; + fec: ethernet@42890000 { + compatible = "fsl,imx93-fec", "fsl,imx8mq-fec", "fsl,imx6sx-fec"; + reg = <0x42890000 0x10000>; + interrupts = , + , + , + ; + clocks = <&clk IMX93_CLK_ENET1_GATE>, + <&clk IMX93_CLK_ENET1_GATE>, + <&clk IMX93_CLK_ENET_TIMER1>, + <&clk IMX93_CLK_ENET_REF>, + <&clk IMX93_CLK_ENET_REF_PHY>; + clock-names = "ipg", "ahb", "ptp", + "enet_clk_ref", "enet_out"; + assigned-clocks = <&clk IMX93_CLK_ENET_TIMER1>, + <&clk IMX93_CLK_ENET_REF>, + <&clk IMX93_CLK_ENET_REF_PHY>; + assigned-clock-parents = <&clk IMX93_CLK_SYS_PLL_PFD1_DIV2>, + <&clk IMX93_CLK_SYS_PLL_PFD0_DIV2>, + <&clk IMX93_CLK_SYS_PLL_PFD1_DIV2>; + assigned-clock-rates = <100000000>, <250000000>, <50000000>; + fsl,num-tx-queues = <3>; + fsl,num-rx-queues = <3>; + status = "disabled"; + }; + usdhc3: mmc@428b0000 { compatible = "fsl,imx93-usdhc", "fsl,imx8mm-usdhc"; reg = <0x428b0000 0x10000>; -- cgit v1.2.3 From 1b110dd678d9a1a5a97f7691bfa29432a42928b8 Mon Sep 17 00:00:00 2001 From: Clark Wang Date: Fri, 13 Jan 2023 11:33:46 +0800 Subject: arm64: dts: imx93-11x11-evk: enable eqos Enable EQoS function for imx93-11x11-evk board. Signed-off-by: Clark Wang Reviewed-by: Peng Fan Signed-off-by: David S. 
Miller --- arch/arm64/boot/dts/freescale/imx93-11x11-evk.dts | 39 +++++++++++++++++++++++ 1 file changed, 39 insertions(+) (limited to 'arch') diff --git a/arch/arm64/boot/dts/freescale/imx93-11x11-evk.dts b/arch/arm64/boot/dts/freescale/imx93-11x11-evk.dts index 69786c326db0..f37b6efa32aa 100644 --- a/arch/arm64/boot/dts/freescale/imx93-11x11-evk.dts +++ b/arch/arm64/boot/dts/freescale/imx93-11x11-evk.dts @@ -35,6 +35,26 @@ status = "okay"; }; +&eqos { + pinctrl-names = "default"; + pinctrl-0 = <&pinctrl_eqos>; + phy-mode = "rgmii-id"; + phy-handle = <ðphy1>; + status = "okay"; + + mdio { + compatible = "snps,dwmac-mdio"; + #address-cells = <1>; + #size-cells = <0>; + clock-frequency = <5000000>; + + ethphy1: ethernet-phy@1 { + reg = <1>; + eee-broken-1000t; + }; + }; +}; + &lpuart1 { /* console */ pinctrl-names = "default"; pinctrl-0 = <&pinctrl_uart1>; @@ -65,6 +85,25 @@ }; &iomuxc { + pinctrl_eqos: eqosgrp { + fsl,pins = < + MX93_PAD_ENET1_MDC__ENET_QOS_MDC 0x57e + MX93_PAD_ENET1_MDIO__ENET_QOS_MDIO 0x57e + MX93_PAD_ENET1_RD0__ENET_QOS_RGMII_RD0 0x57e + MX93_PAD_ENET1_RD1__ENET_QOS_RGMII_RD1 0x57e + MX93_PAD_ENET1_RD2__ENET_QOS_RGMII_RD2 0x57e + MX93_PAD_ENET1_RD3__ENET_QOS_RGMII_RD3 0x57e + MX93_PAD_ENET1_RXC__CCM_ENET_QOS_CLOCK_GENERATE_RX_CLK 0x5fe + MX93_PAD_ENET1_RX_CTL__ENET_QOS_RGMII_RX_CTL 0x57e + MX93_PAD_ENET1_TD0__ENET_QOS_RGMII_TD0 0x57e + MX93_PAD_ENET1_TD1__ENET_QOS_RGMII_TD1 0x57e + MX93_PAD_ENET1_TD2__ENET_QOS_RGMII_TD2 0x57e + MX93_PAD_ENET1_TD3__ENET_QOS_RGMII_TD3 0x57e + MX93_PAD_ENET1_TXC__CCM_ENET_QOS_CLOCK_GENERATE_TX_CLK 0x5fe + MX93_PAD_ENET1_TX_CTL__ENET_QOS_RGMII_TX_CTL 0x57e + >; + }; + pinctrl_uart1: uart1grp { fsl,pins = < MX93_PAD_UART1_RXD__LPUART1_RX 0x31e -- cgit v1.2.3 From c897dc7f3a8d0ea5355f86d15a7da69c9d4d8309 Mon Sep 17 00:00:00 2001 From: Clark Wang Date: Fri, 13 Jan 2023 11:33:47 +0800 Subject: arm64: dts: imx93-11x11-evk: enable fec function Enable FEC function for imx93-11x11-evk board. Signed-off-by: Clark Wang Reviewed-by: Peng Fan Signed-off-by: David S. 
Miller --- arch/arm64/boot/dts/freescale/imx93-11x11-evk.dts | 39 +++++++++++++++++++++++ 1 file changed, 39 insertions(+) (limited to 'arch') diff --git a/arch/arm64/boot/dts/freescale/imx93-11x11-evk.dts b/arch/arm64/boot/dts/freescale/imx93-11x11-evk.dts index f37b6efa32aa..91b196c49be1 100644 --- a/arch/arm64/boot/dts/freescale/imx93-11x11-evk.dts +++ b/arch/arm64/boot/dts/freescale/imx93-11x11-evk.dts @@ -55,6 +55,26 @@ }; }; +&fec { + pinctrl-names = "default"; + pinctrl-0 = <&pinctrl_fec>; + phy-mode = "rgmii-id"; + phy-handle = <ðphy2>; + fsl,magic-packet; + status = "okay"; + + mdio { + #address-cells = <1>; + #size-cells = <0>; + clock-frequency = <5000000>; + + ethphy2: ethernet-phy@2 { + reg = <2>; + eee-broken-1000t; + }; + }; +}; + &lpuart1 { /* console */ pinctrl-names = "default"; pinctrl-0 = <&pinctrl_uart1>; @@ -104,6 +124,25 @@ >; }; + pinctrl_fec: fecgrp { + fsl,pins = < + MX93_PAD_ENET2_MDC__ENET1_MDC 0x57e + MX93_PAD_ENET2_MDIO__ENET1_MDIO 0x57e + MX93_PAD_ENET2_RD0__ENET1_RGMII_RD0 0x57e + MX93_PAD_ENET2_RD1__ENET1_RGMII_RD1 0x57e + MX93_PAD_ENET2_RD2__ENET1_RGMII_RD2 0x57e + MX93_PAD_ENET2_RD3__ENET1_RGMII_RD3 0x57e + MX93_PAD_ENET2_RXC__ENET1_RGMII_RXC 0x5fe + MX93_PAD_ENET2_RX_CTL__ENET1_RGMII_RX_CTL 0x57e + MX93_PAD_ENET2_TD0__ENET1_RGMII_TD0 0x57e + MX93_PAD_ENET2_TD1__ENET1_RGMII_TD1 0x57e + MX93_PAD_ENET2_TD2__ENET1_RGMII_TD2 0x57e + MX93_PAD_ENET2_TD3__ENET1_RGMII_TD3 0x57e + MX93_PAD_ENET2_TXC__ENET1_RGMII_TXC 0x5fe + MX93_PAD_ENET2_TX_CTL__ENET1_RGMII_TX_CTL 0x57e + >; + }; + pinctrl_uart1: uart1grp { fsl,pins = < MX93_PAD_UART1_RXD__LPUART1_RX 0x31e -- cgit v1.2.3 From 68f4eae781dd25aca2eb84ca2279663689db8d19 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 25 Jan 2023 23:14:16 -0800 Subject: net: checksum: drop the linux/uaccess.h include net/checksum.h pulls in linux/uaccess.h which is large. In the x86 header the include seems to not be needed at all. ARM on the other hand does not include uaccess.h, even tho it calls access_ok(). In the generic implementation guard the include of linux/uaccess.h with the same condition as the code that needs it. With this change pre-processed net/checksum.h shrinks on x86 from 30616 lines to just 1193. Signed-off-by: Jakub Kicinski Signed-off-by: David S. 
Miller --- arch/arm/include/asm/checksum.h | 1 + arch/x86/include/asm/checksum_64.h | 1 - include/net/checksum.h | 4 +++- 3 files changed, 4 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/arm/include/asm/checksum.h b/arch/arm/include/asm/checksum.h index f0f54aef3724..d8a13959bff0 100644 --- a/arch/arm/include/asm/checksum.h +++ b/arch/arm/include/asm/checksum.h @@ -11,6 +11,7 @@ #define __ASM_ARM_CHECKSUM_H #include +#include /* * computes the checksum of a memory block at buff, length len, diff --git a/arch/x86/include/asm/checksum_64.h b/arch/x86/include/asm/checksum_64.h index 407beebadaf4..4d4a47a3a8ab 100644 --- a/arch/x86/include/asm/checksum_64.h +++ b/arch/x86/include/asm/checksum_64.h @@ -9,7 +9,6 @@ */ #include -#include #include /** diff --git a/include/net/checksum.h b/include/net/checksum.h index 6bc783b7a06c..1338cb92c8e7 100644 --- a/include/net/checksum.h +++ b/include/net/checksum.h @@ -18,8 +18,10 @@ #include #include #include -#include #include +#if !defined(_HAVE_ARCH_COPY_AND_CSUM_FROM_USER) || !defined(HAVE_CSUM_COPY_USER) +#include +#endif #ifndef _HAVE_ARCH_COPY_AND_CSUM_FROM_USER static __always_inline -- cgit v1.2.3 From 07dcbd7325cee92d5798536f417b9b1da497f9da Mon Sep 17 00:00:00 2001 From: Ilya Leoshkevich Date: Sat, 28 Jan 2023 01:06:45 +0100 Subject: s390/bpf: Fix a typo in a comment "desription" should be "description". Signed-off-by: Ilya Leoshkevich Link: https://lore.kernel.org/r/20230128000650.1516334-27-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov --- arch/s390/net/bpf_jit_comp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index af35052d06ed..eb1a78c0e6a8 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -510,7 +510,7 @@ static void bpf_skip(struct bpf_jit *jit, int size) * Emit function prologue * * Save registers and create stack frame if necessary. - * See stack frame layout desription in "bpf_jit.h"! + * See stack frame layout description in "bpf_jit.h"! */ static void bpf_jit_prologue(struct bpf_jit *jit, u32 stack_depth) { -- cgit v1.2.3 From bb4ef8fc3d193ed8d5583fb47cbeff5d8fb8302f Mon Sep 17 00:00:00 2001 From: Ilya Leoshkevich Date: Sun, 29 Jan 2023 20:04:55 +0100 Subject: s390/bpf: Add expoline to tail calls All the indirect jumps in the eBPF JIT already use expolines, except for the tail call one. 
Fixes: de5cb6eb514e ("s390: use expoline thunks in the BPF JIT") Signed-off-by: Ilya Leoshkevich Link: https://lore.kernel.org/r/20230129190501.1624747-3-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov --- arch/s390/net/bpf_jit_comp.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index eb1a78c0e6a8..8400a06c926e 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -1393,8 +1393,16 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, /* lg %r1,bpf_func(%r1) */ EMIT6_DISP_LH(0xe3000000, 0x0004, REG_1, REG_1, REG_0, offsetof(struct bpf_prog, bpf_func)); - /* bc 0xf,tail_call_start(%r1) */ - _EMIT4(0x47f01000 + jit->tail_call_start); + if (nospec_uses_trampoline()) { + jit->seen |= SEEN_FUNC; + /* aghi %r1,tail_call_start */ + EMIT4_IMM(0xa70b0000, REG_1, jit->tail_call_start); + /* brcl 0xf,__s390_indirect_jump_r1 */ + EMIT6_PCREL_RILC(0xc0040000, 0xf, jit->r1_thunk_ip); + } else { + /* bc 0xf,tail_call_start(%r1) */ + _EMIT4(0x47f01000 + jit->tail_call_start); + } /* out: */ if (jit->prg_buf) { *(u16 *)(jit->prg_buf + patch_1_clrj + 2) = -- cgit v1.2.3 From f1d5df84cd8c3ec6460c78f5b86be7c84577a83f Mon Sep 17 00:00:00 2001 From: Ilya Leoshkevich Date: Sun, 29 Jan 2023 20:04:56 +0100 Subject: s390/bpf: Implement bpf_arch_text_poke() bpf_arch_text_poke() is used to hotpatch eBPF programs and trampolines. s390x has a very strict hotpatching restriction: the only thing that is allowed to be hotpatched is conditional branch mask. Take the same approach as commit de5012b41e5c ("s390/ftrace: implement hotpatching"): create a conditional jump to a "plt", which loads the target address from memory and jumps to it; then first patch this address, and then the mask. Trampolines (introduced in the next patch) respect the ftrace calling convention: the return address is in %r0, and %r1 is clobbered. With that in mind, bpf_arch_text_poke() does not differentiate between jumps and calls. However, there is a simple optimization for jumps (for the epilogue_ip case): if a jump already points to the destination, then there is no "plt" and we can just flip the mask. For simplicity, the "plt" template is defined in assembly, and its size is used to define C arrays. There doesn't seem to be a way to convey this size to C as a constant, so it's hardcoded and double-checked during runtime. Signed-off-by: Ilya Leoshkevich Link: https://lore.kernel.org/r/20230129190501.1624747-4-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov --- arch/s390/net/bpf_jit_comp.c | 97 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 97 insertions(+) (limited to 'arch') diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index 8400a06c926e..c72eb3fc1f98 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -30,6 +30,7 @@ #include #include #include +#include #include "bpf_jit.h" struct bpf_jit { @@ -50,6 +51,8 @@ struct bpf_jit { int r14_thunk_ip; /* Address of expoline thunk for 'br %r14' */ int tail_call_start; /* Tail call start offset */ int excnt; /* Number of exception table entries */ + int prologue_plt_ret; /* Return address for prologue hotpatch PLT */ + int prologue_plt; /* Start of prologue hotpatch PLT */ }; #define SEEN_MEM BIT(0) /* use mem[] for temporary storage */ @@ -506,6 +509,36 @@ static void bpf_skip(struct bpf_jit *jit, int size) } } +/* + * PLT for hotpatchable calls. 
The calling convention is the same as for the + * ftrace hotpatch trampolines: %r0 is return address, %r1 is clobbered. + */ +extern const char bpf_plt[]; +extern const char bpf_plt_ret[]; +extern const char bpf_plt_target[]; +extern const char bpf_plt_end[]; +#define BPF_PLT_SIZE 32 +asm( + ".pushsection .rodata\n" + " .align 8\n" + "bpf_plt:\n" + " lgrl %r0,bpf_plt_ret\n" + " lgrl %r1,bpf_plt_target\n" + " br %r1\n" + " .align 8\n" + "bpf_plt_ret: .quad 0\n" + "bpf_plt_target: .quad 0\n" + "bpf_plt_end:\n" + " .popsection\n" +); + +static void bpf_jit_plt(void *plt, void *ret, void *target) +{ + memcpy(plt, bpf_plt, BPF_PLT_SIZE); + *(void **)((char *)plt + (bpf_plt_ret - bpf_plt)) = ret; + *(void **)((char *)plt + (bpf_plt_target - bpf_plt)) = target; +} + /* * Emit function prologue * @@ -514,6 +547,11 @@ static void bpf_skip(struct bpf_jit *jit, int size) */ static void bpf_jit_prologue(struct bpf_jit *jit, u32 stack_depth) { + /* No-op for hotpatching */ + /* brcl 0,prologue_plt */ + EMIT6_PCREL_RILC(0xc0040000, 0, jit->prologue_plt); + jit->prologue_plt_ret = jit->prg; + if (jit->seen & SEEN_TAIL_CALL) { /* xc STK_OFF_TCCNT(4,%r15),STK_OFF_TCCNT(%r15) */ _EMIT6(0xd703f000 | STK_OFF_TCCNT, 0xf000 | STK_OFF_TCCNT); @@ -589,6 +627,13 @@ static void bpf_jit_epilogue(struct bpf_jit *jit, u32 stack_depth) /* br %r1 */ _EMIT2(0x07f1); } + + jit->prg = ALIGN(jit->prg, 8); + jit->prologue_plt = jit->prg; + if (jit->prg_buf) + bpf_jit_plt(jit->prg_buf + jit->prg, + jit->prg_buf + jit->prologue_plt_ret, NULL); + jit->prg += BPF_PLT_SIZE; } static int get_probe_mem_regno(const u8 *insn) @@ -1776,6 +1821,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp) struct bpf_jit jit; int pass; + if (WARN_ON_ONCE(bpf_plt_end - bpf_plt != BPF_PLT_SIZE)) + return orig_fp; + if (!fp->jit_requested) return orig_fp; @@ -1867,3 +1915,52 @@ out: tmp : orig_fp); return fp; } + +int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, + void *old_addr, void *new_addr) +{ + struct { + u16 opc; + s32 disp; + } __packed insn; + char expected_plt[BPF_PLT_SIZE]; + char current_plt[BPF_PLT_SIZE]; + char *plt; + int err; + + /* Verify the branch to be patched. */ + err = copy_from_kernel_nofault(&insn, ip, sizeof(insn)); + if (err < 0) + return err; + if (insn.opc != (0xc004 | (old_addr ? 0xf0 : 0))) + return -EINVAL; + + if (t == BPF_MOD_JUMP && + insn.disp == ((char *)new_addr - (char *)ip) >> 1) { + /* + * The branch already points to the destination, + * there is no PLT. + */ + } else { + /* Verify the PLT. */ + plt = (char *)ip + (insn.disp << 1); + err = copy_from_kernel_nofault(current_plt, plt, BPF_PLT_SIZE); + if (err < 0) + return err; + bpf_jit_plt(expected_plt, (char *)ip + 6, old_addr); + if (memcmp(current_plt, expected_plt, BPF_PLT_SIZE)) + return -EINVAL; + /* Adjust the call address. */ + s390_kernel_write(plt + (bpf_plt_target - bpf_plt), + &new_addr, sizeof(void *)); + } + + /* Adjust the mask of the branch. */ + insn.opc = 0xc004 | (new_addr ? 0xf0 : 0); + s390_kernel_write((char *)ip + 1, (char *)&insn.opc + 1, 1); + + /* Make the new code visible to the other CPUs. */ + text_poke_sync_lock(); + + return 0; +} -- cgit v1.2.3 From 528eb2cb87bc1353235a6384696b4849bde8b0ba Mon Sep 17 00:00:00 2001 From: Ilya Leoshkevich Date: Sun, 29 Jan 2023 20:04:57 +0100 Subject: s390/bpf: Implement arch_prepare_bpf_trampoline() arch_prepare_bpf_trampoline() is used for direct attachment of eBPF programs to various places, bypassing kprobes. 
It's responsible for calling a number of eBPF programs before, instead and/or after whatever they are attached to. Add a s390x implementation, paying attention to the following: - Reuse the existing JIT infrastructure, where possible. - Like the existing JIT, prefer making multiple passes instead of backpatching. Currently 2 passes is enough. If literal pool is introduced, this needs to be raised to 3. However, at the moment adding literal pool only makes the code larger. If branch shortening is introduced, the number of passes needs to be increased even further. - Support both regular and ftrace calling conventions, depending on the trampoline flags. - Use expolines for indirect calls. - Handle the mismatch between the eBPF and the s390x ABIs. - Sign-extend fmod_ret return values. invoke_bpf_prog() produces about 120 bytes; it might be possible to slightly optimize this, but reaching 50 bytes, like on x86_64, looks unrealistic: just loading cookie, __bpf_prog_enter, bpf_func, insnsi and __bpf_prog_exit as literals already takes at least 5 * 12 = 60 bytes, and we can't use relative addressing for most of them. Therefore, lower BPF_MAX_TRAMP_LINKS on s390x. Signed-off-by: Ilya Leoshkevich Link: https://lore.kernel.org/r/20230129190501.1624747-5-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov --- arch/s390/net/bpf_jit_comp.c | 542 +++++++++++++++++++++++++++++++++++++++++-- include/linux/bpf.h | 4 + 2 files changed, 524 insertions(+), 22 deletions(-) (limited to 'arch') diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index c72eb3fc1f98..3fe082d5328d 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -71,6 +71,10 @@ struct bpf_jit { #define REG_0 REG_W0 /* Register 0 */ #define REG_1 REG_W1 /* Register 1 */ #define REG_2 BPF_REG_1 /* Register 2 */ +#define REG_3 BPF_REG_2 /* Register 3 */ +#define REG_4 BPF_REG_3 /* Register 4 */ +#define REG_7 BPF_REG_6 /* Register 7 */ +#define REG_8 BPF_REG_7 /* Register 8 */ #define REG_14 BPF_REG_0 /* Register 14 */ /* @@ -595,6 +599,43 @@ static void bpf_jit_prologue(struct bpf_jit *jit, u32 stack_depth) } } +/* + * Emit an expoline for a jump that follows + */ +static void emit_expoline(struct bpf_jit *jit) +{ + /* exrl %r0,.+10 */ + EMIT6_PCREL_RIL(0xc6000000, jit->prg + 10); + /* j . */ + EMIT4_PCREL(0xa7f40000, 0); +} + +/* + * Emit __s390_indirect_jump_r1 thunk if necessary + */ +static void emit_r1_thunk(struct bpf_jit *jit) +{ + if (nospec_uses_trampoline()) { + jit->r1_thunk_ip = jit->prg; + emit_expoline(jit); + /* br %r1 */ + _EMIT2(0x07f1); + } +} + +/* + * Call r1 either directly or via __s390_indirect_jump_r1 thunk + */ +static void call_r1(struct bpf_jit *jit) +{ + if (nospec_uses_trampoline()) + /* brasl %r14,__s390_indirect_jump_r1 */ + EMIT6_PCREL_RILB(0xc0050000, REG_14, jit->r1_thunk_ip); + else + /* basr %r14,%r1 */ + EMIT2(0x0d00, REG_14, REG_1); +} + /* * Function epilogue */ @@ -608,25 +649,13 @@ static void bpf_jit_epilogue(struct bpf_jit *jit, u32 stack_depth) if (nospec_uses_trampoline()) { jit->r14_thunk_ip = jit->prg; /* Generate __s390_indirect_jump_r14 thunk */ - /* exrl %r0,.+10 */ - EMIT6_PCREL_RIL(0xc6000000, jit->prg + 10); - /* j . */ - EMIT4_PCREL(0xa7f40000, 0); + emit_expoline(jit); } /* br %r14 */ _EMIT2(0x07fe); - if ((nospec_uses_trampoline()) && - (is_first_pass(jit) || (jit->seen & SEEN_FUNC))) { - jit->r1_thunk_ip = jit->prg; - /* Generate __s390_indirect_jump_r1 thunk */ - /* exrl %r0,.+10 */ - EMIT6_PCREL_RIL(0xc6000000, jit->prg + 10); - /* j . 
*/ - EMIT4_PCREL(0xa7f40000, 0); - /* br %r1 */ - _EMIT2(0x07f1); - } + if (is_first_pass(jit) || (jit->seen & SEEN_FUNC)) + emit_r1_thunk(jit); jit->prg = ALIGN(jit->prg, 8); jit->prologue_plt = jit->prg; @@ -707,6 +736,34 @@ static int bpf_jit_probe_mem(struct bpf_jit *jit, struct bpf_prog *fp, return 0; } +/* + * Sign-extend the register if necessary + */ +static int sign_extend(struct bpf_jit *jit, int r, u8 size, u8 flags) +{ + if (!(flags & BTF_FMODEL_SIGNED_ARG)) + return 0; + + switch (size) { + case 1: + /* lgbr %r,%r */ + EMIT4(0xb9060000, r, r); + return 0; + case 2: + /* lghr %r,%r */ + EMIT4(0xb9070000, r, r); + return 0; + case 4: + /* lgfr %r,%r */ + EMIT4(0xb9140000, r, r); + return 0; + case 8: + return 0; + default: + return -1; + } +} + /* * Compile one eBPF instruction into s390x code * @@ -1355,13 +1412,8 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, jit->seen |= SEEN_FUNC; /* lgrl %w1,func */ EMIT6_PCREL_RILB(0xc4080000, REG_W1, _EMIT_CONST_U64(func)); - if (nospec_uses_trampoline()) { - /* brasl %r14,__s390_indirect_jump_r1 */ - EMIT6_PCREL_RILB(0xc0050000, REG_14, jit->r1_thunk_ip); - } else { - /* basr %r14,%w1 */ - EMIT2(0x0d00, REG_14, REG_W1); - } + /* %r1() */ + call_r1(jit); /* lgr %b0,%r2: load return value into %b0 */ EMIT4(0xb9040000, BPF_REG_0, REG_2); break; @@ -1964,3 +2016,449 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, return 0; } + +struct bpf_tramp_jit { + struct bpf_jit common; + int orig_stack_args_off;/* Offset of arguments placed on stack by the + * func_addr's original caller + */ + int stack_size; /* Trampoline stack size */ + int stack_args_off; /* Offset of stack arguments for calling + * func_addr, has to be at the top + */ + int reg_args_off; /* Offset of register arguments for calling + * func_addr + */ + int ip_off; /* For bpf_get_func_ip(), has to be at + * (ctx - 16) + */ + int arg_cnt_off; /* For bpf_get_func_arg_cnt(), has to be at + * (ctx - 8) + */ + int bpf_args_off; /* Offset of BPF_PROG context, which consists + * of BPF arguments followed by return value + */ + int retval_off; /* Offset of return value (see above) */ + int r7_r8_off; /* Offset of saved %r7 and %r8, which are used + * for __bpf_prog_enter() return value and + * func_addr respectively + */ + int r14_off; /* Offset of saved %r14 */ + int run_ctx_off; /* Offset of struct bpf_tramp_run_ctx */ + int do_fexit; /* do_fexit: label */ +}; + +static void load_imm64(struct bpf_jit *jit, int dst_reg, u64 val) +{ + /* llihf %dst_reg,val_hi */ + EMIT6_IMM(0xc00e0000, dst_reg, (val >> 32)); + /* oilf %rdst_reg,val_lo */ + EMIT6_IMM(0xc00d0000, dst_reg, val); +} + +static int invoke_bpf_prog(struct bpf_tramp_jit *tjit, + const struct btf_func_model *m, + struct bpf_tramp_link *tlink, bool save_ret) +{ + struct bpf_jit *jit = &tjit->common; + int cookie_off = tjit->run_ctx_off + + offsetof(struct bpf_tramp_run_ctx, bpf_cookie); + struct bpf_prog *p = tlink->link.prog; + int patch; + + /* + * run_ctx.cookie = tlink->cookie; + */ + + /* %r0 = tlink->cookie */ + load_imm64(jit, REG_W0, tlink->cookie); + /* stg %r0,cookie_off(%r15) */ + EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W0, REG_0, REG_15, cookie_off); + + /* + * if ((start = __bpf_prog_enter(p, &run_ctx)) == 0) + * goto skip; + */ + + /* %r1 = __bpf_prog_enter */ + load_imm64(jit, REG_1, (u64)bpf_trampoline_enter(p)); + /* %r2 = p */ + load_imm64(jit, REG_2, (u64)p); + /* la %r3,run_ctx_off(%r15) */ + EMIT4_DISP(0x41000000, REG_3, REG_15, tjit->run_ctx_off); + /* %r1() */ + 
call_r1(jit); + /* ltgr %r7,%r2 */ + EMIT4(0xb9020000, REG_7, REG_2); + /* brcl 8,skip */ + patch = jit->prg; + EMIT6_PCREL_RILC(0xc0040000, 8, 0); + + /* + * retval = bpf_func(args, p->insnsi); + */ + + /* %r1 = p->bpf_func */ + load_imm64(jit, REG_1, (u64)p->bpf_func); + /* la %r2,bpf_args_off(%r15) */ + EMIT4_DISP(0x41000000, REG_2, REG_15, tjit->bpf_args_off); + /* %r3 = p->insnsi */ + if (!p->jited) + load_imm64(jit, REG_3, (u64)p->insnsi); + /* %r1() */ + call_r1(jit); + /* stg %r2,retval_off(%r15) */ + if (save_ret) { + if (sign_extend(jit, REG_2, m->ret_size, m->ret_flags)) + return -1; + EMIT6_DISP_LH(0xe3000000, 0x0024, REG_2, REG_0, REG_15, + tjit->retval_off); + } + + /* skip: */ + if (jit->prg_buf) + *(u32 *)&jit->prg_buf[patch + 2] = (jit->prg - patch) >> 1; + + /* + * __bpf_prog_exit(p, start, &run_ctx); + */ + + /* %r1 = __bpf_prog_exit */ + load_imm64(jit, REG_1, (u64)bpf_trampoline_exit(p)); + /* %r2 = p */ + load_imm64(jit, REG_2, (u64)p); + /* lgr %r3,%r7 */ + EMIT4(0xb9040000, REG_3, REG_7); + /* la %r4,run_ctx_off(%r15) */ + EMIT4_DISP(0x41000000, REG_4, REG_15, tjit->run_ctx_off); + /* %r1() */ + call_r1(jit); + + return 0; +} + +static int alloc_stack(struct bpf_tramp_jit *tjit, size_t size) +{ + int stack_offset = tjit->stack_size; + + tjit->stack_size += size; + return stack_offset; +} + +/* ABI uses %r2 - %r6 for parameter passing. */ +#define MAX_NR_REG_ARGS 5 + +/* The "L" field of the "mvc" instruction is 8 bits. */ +#define MAX_MVC_SIZE 256 +#define MAX_NR_STACK_ARGS (MAX_MVC_SIZE / sizeof(u64)) + +/* -mfentry generates a 6-byte nop on s390x. */ +#define S390X_PATCH_SIZE 6 + +static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, + struct bpf_tramp_jit *tjit, + const struct btf_func_model *m, + u32 flags, + struct bpf_tramp_links *tlinks, + void *func_addr) +{ + struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN]; + struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY]; + struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT]; + int nr_bpf_args, nr_reg_args, nr_stack_args; + struct bpf_jit *jit = &tjit->common; + int arg, bpf_arg_off; + int i, j; + + /* Support as many stack arguments as "mvc" instruction can handle. */ + nr_reg_args = min_t(int, m->nr_args, MAX_NR_REG_ARGS); + nr_stack_args = m->nr_args - nr_reg_args; + if (nr_stack_args > MAX_NR_STACK_ARGS) + return -ENOTSUPP; + + /* Return to %r14, since func_addr and %r0 are not available. */ + if (!func_addr && !(flags & BPF_TRAMP_F_ORIG_STACK)) + flags |= BPF_TRAMP_F_SKIP_FRAME; + + /* + * Compute how many arguments we need to pass to BPF programs. + * BPF ABI mirrors that of x86_64: arguments that are 16 bytes or + * smaller are packed into 1 or 2 registers; larger arguments are + * passed via pointers. + * In s390x ABI, arguments that are 8 bytes or smaller are packed into + * a register; larger arguments are passed via pointers. + * We need to deal with this difference. + */ + nr_bpf_args = 0; + for (i = 0; i < m->nr_args; i++) { + if (m->arg_size[i] <= 8) + nr_bpf_args += 1; + else if (m->arg_size[i] <= 16) + nr_bpf_args += 2; + else + return -ENOTSUPP; + } + + /* + * Calculate the stack layout. + */ + + /* Reserve STACK_FRAME_OVERHEAD bytes for the callees. 
*/ + tjit->stack_size = STACK_FRAME_OVERHEAD; + tjit->stack_args_off = alloc_stack(tjit, nr_stack_args * sizeof(u64)); + tjit->reg_args_off = alloc_stack(tjit, nr_reg_args * sizeof(u64)); + tjit->ip_off = alloc_stack(tjit, sizeof(u64)); + tjit->arg_cnt_off = alloc_stack(tjit, sizeof(u64)); + tjit->bpf_args_off = alloc_stack(tjit, nr_bpf_args * sizeof(u64)); + tjit->retval_off = alloc_stack(tjit, sizeof(u64)); + tjit->r7_r8_off = alloc_stack(tjit, 2 * sizeof(u64)); + tjit->r14_off = alloc_stack(tjit, sizeof(u64)); + tjit->run_ctx_off = alloc_stack(tjit, + sizeof(struct bpf_tramp_run_ctx)); + /* The caller has already reserved STACK_FRAME_OVERHEAD bytes. */ + tjit->stack_size -= STACK_FRAME_OVERHEAD; + tjit->orig_stack_args_off = tjit->stack_size + STACK_FRAME_OVERHEAD; + + /* aghi %r15,-stack_size */ + EMIT4_IMM(0xa70b0000, REG_15, -tjit->stack_size); + /* stmg %r2,%rN,fwd_reg_args_off(%r15) */ + if (nr_reg_args) + EMIT6_DISP_LH(0xeb000000, 0x0024, REG_2, + REG_2 + (nr_reg_args - 1), REG_15, + tjit->reg_args_off); + for (i = 0, j = 0; i < m->nr_args; i++) { + if (i < MAX_NR_REG_ARGS) + arg = REG_2 + i; + else + arg = tjit->orig_stack_args_off + + (i - MAX_NR_REG_ARGS) * sizeof(u64); + bpf_arg_off = tjit->bpf_args_off + j * sizeof(u64); + if (m->arg_size[i] <= 8) { + if (i < MAX_NR_REG_ARGS) + /* stg %arg,bpf_arg_off(%r15) */ + EMIT6_DISP_LH(0xe3000000, 0x0024, arg, + REG_0, REG_15, bpf_arg_off); + else + /* mvc bpf_arg_off(8,%r15),arg(%r15) */ + _EMIT6(0xd207f000 | bpf_arg_off, + 0xf000 | arg); + j += 1; + } else { + if (i < MAX_NR_REG_ARGS) { + /* mvc bpf_arg_off(16,%r15),0(%arg) */ + _EMIT6(0xd20ff000 | bpf_arg_off, + reg2hex[arg] << 12); + } else { + /* lg %r1,arg(%r15) */ + EMIT6_DISP_LH(0xe3000000, 0x0004, REG_1, REG_0, + REG_15, arg); + /* mvc bpf_arg_off(16,%r15),0(%r1) */ + _EMIT6(0xd20ff000 | bpf_arg_off, 0x1000); + } + j += 2; + } + } + /* stmg %r7,%r8,r7_r8_off(%r15) */ + EMIT6_DISP_LH(0xeb000000, 0x0024, REG_7, REG_8, REG_15, + tjit->r7_r8_off); + /* stg %r14,r14_off(%r15) */ + EMIT6_DISP_LH(0xe3000000, 0x0024, REG_14, REG_0, REG_15, tjit->r14_off); + + if (flags & BPF_TRAMP_F_ORIG_STACK) { + /* + * The ftrace trampoline puts the return address (which is the + * address of the original function + S390X_PATCH_SIZE) into + * %r0; see ftrace_shared_hotpatch_trampoline_br and + * ftrace_init_nop() for details. 
+ */ + + /* lgr %r8,%r0 */ + EMIT4(0xb9040000, REG_8, REG_0); + } else { + /* %r8 = func_addr + S390X_PATCH_SIZE */ + load_imm64(jit, REG_8, (u64)func_addr + S390X_PATCH_SIZE); + } + + /* + * ip = func_addr; + * arg_cnt = m->nr_args; + */ + + if (flags & BPF_TRAMP_F_IP_ARG) { + /* %r0 = func_addr */ + load_imm64(jit, REG_0, (u64)func_addr); + /* stg %r0,ip_off(%r15) */ + EMIT6_DISP_LH(0xe3000000, 0x0024, REG_0, REG_0, REG_15, + tjit->ip_off); + } + /* lghi %r0,nr_bpf_args */ + EMIT4_IMM(0xa7090000, REG_0, nr_bpf_args); + /* stg %r0,arg_cnt_off(%r15) */ + EMIT6_DISP_LH(0xe3000000, 0x0024, REG_0, REG_0, REG_15, + tjit->arg_cnt_off); + + if (flags & BPF_TRAMP_F_CALL_ORIG) { + /* + * __bpf_tramp_enter(im); + */ + + /* %r1 = __bpf_tramp_enter */ + load_imm64(jit, REG_1, (u64)__bpf_tramp_enter); + /* %r2 = im */ + load_imm64(jit, REG_2, (u64)im); + /* %r1() */ + call_r1(jit); + } + + for (i = 0; i < fentry->nr_links; i++) + if (invoke_bpf_prog(tjit, m, fentry->links[i], + flags & BPF_TRAMP_F_RET_FENTRY_RET)) + return -EINVAL; + + if (fmod_ret->nr_links) { + /* + * retval = 0; + */ + + /* xc retval_off(8,%r15),retval_off(%r15) */ + _EMIT6(0xd707f000 | tjit->retval_off, + 0xf000 | tjit->retval_off); + + for (i = 0; i < fmod_ret->nr_links; i++) { + if (invoke_bpf_prog(tjit, m, fmod_ret->links[i], true)) + return -EINVAL; + + /* + * if (retval) + * goto do_fexit; + */ + + /* ltg %r0,retval_off(%r15) */ + EMIT6_DISP_LH(0xe3000000, 0x0002, REG_0, REG_0, REG_15, + tjit->retval_off); + /* brcl 7,do_fexit */ + EMIT6_PCREL_RILC(0xc0040000, 7, tjit->do_fexit); + } + } + + if (flags & BPF_TRAMP_F_CALL_ORIG) { + /* + * retval = func_addr(args); + */ + + /* lmg %r2,%rN,reg_args_off(%r15) */ + if (nr_reg_args) + EMIT6_DISP_LH(0xeb000000, 0x0004, REG_2, + REG_2 + (nr_reg_args - 1), REG_15, + tjit->reg_args_off); + /* mvc stack_args_off(N,%r15),orig_stack_args_off(%r15) */ + if (nr_stack_args) + _EMIT6(0xd200f000 | + (nr_stack_args * sizeof(u64) - 1) << 16 | + tjit->stack_args_off, + 0xf000 | tjit->orig_stack_args_off); + /* lgr %r1,%r8 */ + EMIT4(0xb9040000, REG_1, REG_8); + /* %r1() */ + call_r1(jit); + /* stg %r2,retval_off(%r15) */ + EMIT6_DISP_LH(0xe3000000, 0x0024, REG_2, REG_0, REG_15, + tjit->retval_off); + + im->ip_after_call = jit->prg_buf + jit->prg; + + /* + * The following nop will be patched by bpf_tramp_image_put(). 
+ */ + + /* brcl 0,im->ip_epilogue */ + EMIT6_PCREL_RILC(0xc0040000, 0, (u64)im->ip_epilogue); + } + + /* do_fexit: */ + tjit->do_fexit = jit->prg; + for (i = 0; i < fexit->nr_links; i++) + if (invoke_bpf_prog(tjit, m, fexit->links[i], false)) + return -EINVAL; + + if (flags & BPF_TRAMP_F_CALL_ORIG) { + im->ip_epilogue = jit->prg_buf + jit->prg; + + /* + * __bpf_tramp_exit(im); + */ + + /* %r1 = __bpf_tramp_exit */ + load_imm64(jit, REG_1, (u64)__bpf_tramp_exit); + /* %r2 = im */ + load_imm64(jit, REG_2, (u64)im); + /* %r1() */ + call_r1(jit); + } + + /* lmg %r2,%rN,reg_args_off(%r15) */ + if ((flags & BPF_TRAMP_F_RESTORE_REGS) && nr_reg_args) + EMIT6_DISP_LH(0xeb000000, 0x0004, REG_2, + REG_2 + (nr_reg_args - 1), REG_15, + tjit->reg_args_off); + /* lgr %r1,%r8 */ + if (!(flags & BPF_TRAMP_F_SKIP_FRAME)) + EMIT4(0xb9040000, REG_1, REG_8); + /* lmg %r7,%r8,r7_r8_off(%r15) */ + EMIT6_DISP_LH(0xeb000000, 0x0004, REG_7, REG_8, REG_15, + tjit->r7_r8_off); + /* lg %r14,r14_off(%r15) */ + EMIT6_DISP_LH(0xe3000000, 0x0004, REG_14, REG_0, REG_15, tjit->r14_off); + /* lg %r2,retval_off(%r15) */ + if (flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET)) + EMIT6_DISP_LH(0xe3000000, 0x0004, REG_2, REG_0, REG_15, + tjit->retval_off); + /* aghi %r15,stack_size */ + EMIT4_IMM(0xa70b0000, REG_15, tjit->stack_size); + /* Emit an expoline for the following indirect jump. */ + if (nospec_uses_trampoline()) + emit_expoline(jit); + if (flags & BPF_TRAMP_F_SKIP_FRAME) + /* br %r14 */ + _EMIT2(0x07fe); + else + /* br %r1 */ + _EMIT2(0x07f1); + + emit_r1_thunk(jit); + + return 0; +} + +int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, + void *image_end, const struct btf_func_model *m, + u32 flags, struct bpf_tramp_links *tlinks, + void *func_addr) +{ + struct bpf_tramp_jit tjit; + int ret; + int i; + + for (i = 0; i < 2; i++) { + if (i == 0) { + /* Compute offsets, check whether the code fits. */ + memset(&tjit, 0, sizeof(tjit)); + } else { + /* Generate the code. */ + tjit.common.prg = 0; + tjit.common.prg_buf = image; + } + ret = __arch_prepare_bpf_trampoline(im, &tjit, m, flags, + tlinks, func_addr); + if (ret < 0) + return ret; + if (tjit.common.prg > (char *)image_end - (char *)image) + /* + * Use the same error code as for exceeding + * BPF_MAX_TRAMP_LINKS. + */ + return -E2BIG; + } + + return ret; +} diff --git a/include/linux/bpf.h b/include/linux/bpf.h index c411c6bb86c4..e11db75094d0 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -944,7 +944,11 @@ struct btf_func_model { * bytes on x86. */ enum { +#if defined(__s390x__) + BPF_MAX_TRAMP_LINKS = 27, +#else BPF_MAX_TRAMP_LINKS = 38, +#endif }; struct bpf_tramp_links { -- cgit v1.2.3 From dd691e847d28ac5f8b8e3005be44fd0e46722809 Mon Sep 17 00:00:00 2001 From: Ilya Leoshkevich Date: Sun, 29 Jan 2023 20:04:58 +0100 Subject: s390/bpf: Implement bpf_jit_supports_subprog_tailcalls() Allow mixing subprogs and tail calls by passing the current tail call count to subprogs. 
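Conceptually, the change makes each call site hand its current tail call
counter to the callee, while only the main program resets it. A minimal C
sketch of that rule (illustrative only, not the s390x JIT code, which
clears and copies the 4-byte counter between stack slots with xc/mvc):

  #include <stdbool.h>

  struct bpf_frame {
          unsigned int tail_call_cnt;
  };

  static void setup_tail_call_cnt(struct bpf_frame *frame,
                                  const struct bpf_frame *caller,
                                  bool is_main_prog)
  {
          if (is_main_prog)
                  frame->tail_call_cnt = 0;       /* cleared in the main prologue */
          else                                    /* inherited by subprograms */
                  frame->tail_call_cnt = caller->tail_call_cnt;
  }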
Signed-off-by: Ilya Leoshkevich Link: https://lore.kernel.org/r/20230129190501.1624747-6-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov --- arch/s390/net/bpf_jit_comp.c | 37 +++++++++++++++++++++++++++---------- 1 file changed, 27 insertions(+), 10 deletions(-) (limited to 'arch') diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index 3fe082d5328d..b84c7ddb758a 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -58,7 +58,6 @@ struct bpf_jit { #define SEEN_MEM BIT(0) /* use mem[] for temporary storage */ #define SEEN_LITERAL BIT(1) /* code uses literals */ #define SEEN_FUNC BIT(2) /* calls C functions */ -#define SEEN_TAIL_CALL BIT(3) /* code uses tail calls */ #define SEEN_STACK (SEEN_FUNC | SEEN_MEM) /* @@ -549,20 +548,23 @@ static void bpf_jit_plt(void *plt, void *ret, void *target) * Save registers and create stack frame if necessary. * See stack frame layout description in "bpf_jit.h"! */ -static void bpf_jit_prologue(struct bpf_jit *jit, u32 stack_depth) +static void bpf_jit_prologue(struct bpf_jit *jit, struct bpf_prog *fp, + u32 stack_depth) { /* No-op for hotpatching */ /* brcl 0,prologue_plt */ EMIT6_PCREL_RILC(0xc0040000, 0, jit->prologue_plt); jit->prologue_plt_ret = jit->prg; - if (jit->seen & SEEN_TAIL_CALL) { + if (fp->aux->func_idx == 0) { + /* Initialize the tail call counter in the main program. */ /* xc STK_OFF_TCCNT(4,%r15),STK_OFF_TCCNT(%r15) */ _EMIT6(0xd703f000 | STK_OFF_TCCNT, 0xf000 | STK_OFF_TCCNT); } else { /* - * There are no tail calls. Insert nops in order to have - * tail_call_start at a predictable offset. + * Skip the tail call counter initialization in subprograms. + * Insert nops in order to have tail_call_start at a + * predictable offset. */ bpf_skip(jit, 6); } @@ -1410,6 +1412,19 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, REG_SET_SEEN(BPF_REG_5); jit->seen |= SEEN_FUNC; + /* + * Copy the tail call counter to where the callee expects it. + * + * Note 1: The callee can increment the tail call counter, but + * we do not load it back, since the x86 JIT does not do this + * either. + * + * Note 2: We assume that the verifier does not let us call the + * main program, which clears the tail call counter on entry. + */ + /* mvc STK_OFF_TCCNT(4,%r15),N(%r15) */ + _EMIT6(0xd203f000 | STK_OFF_TCCNT, + 0xf000 | (STK_OFF_TCCNT + STK_OFF + stack_depth)); /* lgrl %w1,func */ EMIT6_PCREL_RILB(0xc4080000, REG_W1, _EMIT_CONST_U64(func)); /* %r1() */ @@ -1426,10 +1441,7 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, * B1: pointer to ctx * B2: pointer to bpf_array * B3: index in bpf_array - */ - jit->seen |= SEEN_TAIL_CALL; - - /* + * * if (index >= array->map.max_entries) * goto out; */ @@ -1793,7 +1805,7 @@ static int bpf_jit_prog(struct bpf_jit *jit, struct bpf_prog *fp, jit->prg = 0; jit->excnt = 0; - bpf_jit_prologue(jit, stack_depth); + bpf_jit_prologue(jit, fp, stack_depth); if (bpf_set_addr(jit, 0) < 0) return -1; for (i = 0; i < fp->len; i += insn_count) { @@ -2462,3 +2474,8 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, return ret; } + +bool bpf_jit_supports_subprog_tailcalls(void) +{ + return true; +} -- cgit v1.2.3 From 63d7b53ab59f0dd8540e672b49836493d1fa0a79 Mon Sep 17 00:00:00 2001 From: Ilya Leoshkevich Date: Sun, 29 Jan 2023 20:04:59 +0100 Subject: s390/bpf: Implement bpf_jit_supports_kfunc_call() Implement calling kernel functions from eBPF. 
In general, the eBPF ABI is fairly close to that of s390x, with one important difference: on s390x callers should sign-extend signed arguments. Handle that by using information returned by bpf_jit_find_kfunc_model(). Here is an example of how sign extension works. Suppose we need to call the following function from BPF: ; long noinline bpf_kfunc_call_test4(signed char a, short b, int c, long d) 0000000000936a78 : 936a78: c0 04 00 00 00 00 jgnop bpf_kfunc_call_test4 ; return (long)a + (long)b + (long)c + d; 936a7e: b9 08 00 45 agr %r4,%r5 936a82: b9 08 00 43 agr %r4,%r3 936a86: b9 08 00 24 agr %r2,%r4 936a8a: c0 f4 00 1e 3b 27 jg <__s390_indirect_jump_r14> As per the s390x ABI, bpf_kfunc_call_test4() has the right to assume that a, b and c are sign-extended by the caller, which results in using 64-bit additions (agr) without any additional conversions. Without sign extension we would have the following on the JITed code side: ; tmp = bpf_kfunc_call_test4(-3, -30, -200, -1000); ; 5: b4 10 00 00 ff ff ff fd w1 = -3 0x3ff7fdcdad4: llilf %r2,0xfffffffd ; 6: b4 20 00 00 ff ff ff e2 w2 = -30 0x3ff7fdcdada: llilf %r3,0xffffffe2 ; 7: b4 30 00 00 ff ff ff 38 w3 = -200 0x3ff7fdcdae0: llilf %r4,0xffffff38 ; 8: b7 40 00 00 ff ff fc 18 r4 = -1000 0x3ff7fdcdae6: lgfi %r5,-1000 0x3ff7fdcdaec: mvc 64(4,%r15),160(%r15) 0x3ff7fdcdaf2: lgrl %r1,bpf_kfunc_call_test4@GOT 0x3ff7fdcdaf8: brasl %r14,__s390_indirect_jump_r1 The first 3 llilfs are 32-bit loads that need to be sign-extended to 64 bits. Note: at the moment bpf_jit_find_kfunc_model() does not seem to play nicely with XDP metadata functions: add_kfunc_call() adds an "abstract" bpf_*() version to kfunc_btf_tab, but then fixup_kfunc_call() puts the concrete version into insn->imm, which bpf_jit_find_kfunc_model() cannot find. But this seems to be a common code problem. Signed-off-by: Ilya Leoshkevich Link: https://lore.kernel.org/r/20230129190501.1624747-7-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov --- arch/s390/net/bpf_jit_comp.c | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index b84c7ddb758a..d0846ba818ee 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -1401,9 +1401,10 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, */ case BPF_JMP | BPF_CALL: { - u64 func; + const struct btf_func_model *m; bool func_addr_fixed; - int ret; + int j, ret; + u64 func; ret = bpf_jit_get_func_addr(fp, insn, extra_pass, &func, &func_addr_fixed); @@ -1425,6 +1426,21 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, /* mvc STK_OFF_TCCNT(4,%r15),N(%r15) */ _EMIT6(0xd203f000 | STK_OFF_TCCNT, 0xf000 | (STK_OFF_TCCNT + STK_OFF + stack_depth)); + + /* Sign-extend the kfunc arguments.
*/ + if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) { + m = bpf_jit_find_kfunc_model(fp, insn); + if (!m) + return -1; + + for (j = 0; j < m->nr_args; j++) { + if (sign_extend(jit, BPF_REG_1 + j, + m->arg_size[j], + m->arg_flags[j])) + return -1; + } + } + /* lgrl %w1,func */ EMIT6_PCREL_RILB(0xc4080000, REG_W1, _EMIT_CONST_U64(func)); /* %r1() */ @@ -1980,6 +1996,11 @@ out: return fp; } +bool bpf_jit_supports_kfunc_call(void) +{ + return true; +} + int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, void *old_addr, void *new_addr) { -- cgit v1.2.3 From 2083656bb30df231caad56abb4f1c2a6366f5923 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Fri, 27 Jan 2023 23:31:08 -0800 Subject: sh: checksum: add missing linux/uaccess.h include SuperH does not include uaccess.h, even though it calls access_ok(). Fixes: 68f4eae781dd ("net: checksum: drop the linux/uaccess.h include") Reviewed-by: Simon Horman Tested-by: Simon Horman Link: https://lore.kernel.org/r/20230128073108.1603095-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- arch/sh/include/asm/checksum_32.h | 1 + 1 file changed, 1 insertion(+) (limited to 'arch') diff --git a/arch/sh/include/asm/checksum_32.h b/arch/sh/include/asm/checksum_32.h index a6501b856f3e..2b5fa75b4651 100644 --- a/arch/sh/include/asm/checksum_32.h +++ b/arch/sh/include/asm/checksum_32.h @@ -7,6 +7,7 @@ */ #include +#include /* * computes the checksum of a memory block at buff, length len, -- cgit v1.2.3 From 64f50f6575721ef03d001e907455cbe3baa2a5b1 Mon Sep 17 00:00:00 2001 From: Hengqi Chen Date: Tue, 14 Feb 2023 15:26:33 +0000 Subject: LoongArch, bpf: Use 4 instructions for function address in JIT This patch fixes the following issue with function calls in the JIT: [ 29.346981] multi-func JIT bug 105 != 103 The issue can be reproduced by running the "inline simple bpf_loop call" verifier test. This is because we are emitting 2-4 instructions for 64-bit immediate moves. During the first pass of the JIT, the placeholder address is zero, so two instructions are emitted for it. In the extra pass, the function address is in XKVRANGE, so four instructions are emitted for it. This changes the instruction index in the JIT context. Let's always use 4 instructions for the function address in the JIT, so that the instruction sequences don't change between the first pass and the extra pass for function calls.
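To illustrate the fixed-length encoding, here is an illustrative user-space sketch (not the JIT code itself; the sample address is made up) that computes the four immediates consumed by the lu12i.w/ori/lu32i.d/lu52i.d sequence the patch emits:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t addr = 0x9000000012345678ULL;		/* made-up XKVRANGE-style address */
	uint64_t imm_11_0  = addr & 0xfff;		/* ori */
	uint64_t imm_31_12 = (addr >> 12) & 0xfffff;	/* lu12i.w */
	uint64_t imm_51_32 = (addr >> 32) & 0xfffff;	/* lu32i.d */
	uint64_t imm_63_52 = (addr >> 52) & 0xfff;	/* lu52i.d */

	printf("lu12i.w rd, 0x%llx\n", (unsigned long long)imm_31_12);
	printf("ori     rd, rd, 0x%llx\n", (unsigned long long)imm_11_0);
	printf("lu32i.d rd, 0x%llx\n", (unsigned long long)imm_51_32);
	printf("lu52i.d rd, rd, 0x%llx\n", (unsigned long long)imm_63_52);
	return 0;
}

Because this sequence always emits four instructions regardless of the address value, the instruction count stays stable between the first pass and the extra pass.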
Fixes: 5dc615520c4d ("LoongArch: Add BPF JIT support") Signed-off-by: Hengqi Chen Signed-off-by: Daniel Borkmann Tested-by: Tiezhu Yang Link: https://lore.kernel.org/bpf/20230214152633.2265699-1-hengqi.chen@gmail.com --- arch/loongarch/net/bpf_jit.c | 2 +- arch/loongarch/net/bpf_jit.h | 21 +++++++++++++++++++++ 2 files changed, 22 insertions(+), 1 deletion(-) (limited to 'arch') diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c index c4b1947ebf76..288003a9f0ca 100644 --- a/arch/loongarch/net/bpf_jit.c +++ b/arch/loongarch/net/bpf_jit.c @@ -841,7 +841,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, bool ext if (ret < 0) return ret; - move_imm(ctx, t1, func_addr, is32); + move_addr(ctx, t1, func_addr); emit_insn(ctx, jirl, t1, LOONGARCH_GPR_RA, 0); move_reg(ctx, regmap[BPF_REG_0], LOONGARCH_GPR_A0); break; diff --git a/arch/loongarch/net/bpf_jit.h b/arch/loongarch/net/bpf_jit.h index ca708024fdd3..c335dc4eed37 100644 --- a/arch/loongarch/net/bpf_jit.h +++ b/arch/loongarch/net/bpf_jit.h @@ -82,6 +82,27 @@ static inline void emit_sext_32(struct jit_ctx *ctx, enum loongarch_gpr reg, boo emit_insn(ctx, addiw, reg, reg, 0); } +static inline void move_addr(struct jit_ctx *ctx, enum loongarch_gpr rd, u64 addr) +{ + u64 imm_11_0, imm_31_12, imm_51_32, imm_63_52; + + /* lu12iw rd, imm_31_12 */ + imm_31_12 = (addr >> 12) & 0xfffff; + emit_insn(ctx, lu12iw, rd, imm_31_12); + + /* ori rd, rd, imm_11_0 */ + imm_11_0 = addr & 0xfff; + emit_insn(ctx, ori, rd, rd, imm_11_0); + + /* lu32id rd, imm_51_32 */ + imm_51_32 = (addr >> 32) & 0xfffff; + emit_insn(ctx, lu32id, rd, imm_51_32); + + /* lu52id rd, rd, imm_63_52 */ + imm_63_52 = (addr >> 52) & 0xfff; + emit_insn(ctx, lu52id, rd, rd, imm_63_52); +} + static inline void move_imm(struct jit_ctx *ctx, enum loongarch_gpr rd, long imm, bool is32) { long imm_11_0, imm_31_12, imm_51_32, imm_63_52, imm_51_0, imm_51_31; -- cgit v1.2.3 From 5e57fb7b0bd3ea7e994ef1c0ab3562d1fe0676b2 Mon Sep 17 00:00:00 2001 From: Pu Lehui Date: Wed, 15 Feb 2023 21:52:02 +0800 Subject: riscv: Extend patch_text for multiple instructions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Extend patch_text to handle multiple instructions. This is the preparation for multiple-instruction text patching in the riscv BPF trampoline, and may be useful for other scenarios as well.
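As a rough usage sketch of the extended interface (kernel context assumed; the target address and instruction encodings are placeholders, not taken from the patch), a caller can now patch several instructions in one stop_machine() invocation:

#include <linux/types.h>
#include <asm/patch.h>

static int patch_two_nops(void *addr)
{
	u32 insns[2] = {
		0x00000013,	/* nop (addi x0, x0, 0) */
		0x00000013,	/* nop */
	};

	/* patch_text() now takes an instruction array and a count. */
	return patch_text(addr, insns, 2);
}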
Signed-off-by: Pu Lehui Signed-off-by: Daniel Borkmann Tested-by: Björn Töpel Reviewed-by: Conor Dooley Acked-by: Björn Töpel Link: https://lore.kernel.org/bpf/20230215135205.1411105-2-pulehui@huaweicloud.com --- arch/riscv/include/asm/patch.h | 2 +- arch/riscv/kernel/patch.c | 19 ++++++++++++------- arch/riscv/kernel/probes/kprobes.c | 15 ++++++++------- 3 files changed, 21 insertions(+), 15 deletions(-) (limited to 'arch') diff --git a/arch/riscv/include/asm/patch.h b/arch/riscv/include/asm/patch.h index 9a7d7346001e..f433121774c0 100644 --- a/arch/riscv/include/asm/patch.h +++ b/arch/riscv/include/asm/patch.h @@ -7,6 +7,6 @@ #define _ASM_RISCV_PATCH_H int patch_text_nosync(void *addr, const void *insns, size_t len); -int patch_text(void *addr, u32 insn); +int patch_text(void *addr, u32 *insns, int ninsns); #endif /* _ASM_RISCV_PATCH_H */ diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c index 765004b60513..8086d1a281cd 100644 --- a/arch/riscv/kernel/patch.c +++ b/arch/riscv/kernel/patch.c @@ -15,7 +15,8 @@ struct patch_insn { void *addr; - u32 insn; + u32 *insns; + int ninsns; atomic_t cpu_count; }; @@ -102,12 +103,15 @@ NOKPROBE_SYMBOL(patch_text_nosync); static int patch_text_cb(void *data) { struct patch_insn *patch = data; - int ret = 0; + unsigned long len; + int i, ret = 0; if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) { - ret = - patch_text_nosync(patch->addr, &patch->insn, - GET_INSN_LENGTH(patch->insn)); + for (i = 0; ret == 0 && i < patch->ninsns; i++) { + len = GET_INSN_LENGTH(patch->insns[i]); + ret = patch_text_nosync(patch->addr + i * len, + &patch->insns[i], len); + } atomic_inc(&patch->cpu_count); } else { while (atomic_read(&patch->cpu_count) <= num_online_cpus()) @@ -119,11 +123,12 @@ static int patch_text_cb(void *data) } NOKPROBE_SYMBOL(patch_text_cb); -int patch_text(void *addr, u32 insn) +int patch_text(void *addr, u32 *insns, int ninsns) { struct patch_insn patch = { .addr = addr, - .insn = insn, + .insns = insns, + .ninsns = ninsns, .cpu_count = ATOMIC_INIT(0), }; diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c index 41c7481afde3..ef6d6e702485 100644 --- a/arch/riscv/kernel/probes/kprobes.c +++ b/arch/riscv/kernel/probes/kprobes.c @@ -23,13 +23,14 @@ post_kprobe_handler(struct kprobe *, struct kprobe_ctlblk *, struct pt_regs *); static void __kprobes arch_prepare_ss_slot(struct kprobe *p) { + u32 insn = __BUG_INSN_32; unsigned long offset = GET_INSN_LENGTH(p->opcode); p->ainsn.api.restore = (unsigned long)p->addr + offset; - patch_text(p->ainsn.api.insn, p->opcode); + patch_text(p->ainsn.api.insn, &p->opcode, 1); patch_text((void *)((unsigned long)(p->ainsn.api.insn) + offset), - __BUG_INSN_32); + &insn, 1); } static void __kprobes arch_prepare_simulate(struct kprobe *p) @@ -114,16 +115,16 @@ void *alloc_insn_page(void) /* install breakpoint in text */ void __kprobes arch_arm_kprobe(struct kprobe *p) { - if ((p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32) - patch_text(p->addr, __BUG_INSN_32); - else - patch_text(p->addr, __BUG_INSN_16); + u32 insn = (p->opcode & __INSN_LENGTH_MASK) == __INSN_LENGTH_32 ? 
+ __BUG_INSN_32 : __BUG_INSN_16; + + patch_text(p->addr, &insn, 1); } /* remove breakpoint from text */ void __kprobes arch_disarm_kprobe(struct kprobe *p) { - patch_text(p->addr, p->opcode); + patch_text(p->addr, &p->opcode, 1); } void __kprobes arch_remove_kprobe(struct kprobe *p) -- cgit v1.2.3 From 0fd1fd0104954380477353aea29c347e85dff16d Mon Sep 17 00:00:00 2001 From: Pu Lehui Date: Wed, 15 Feb 2023 21:52:03 +0800 Subject: riscv, bpf: Factor out emit_call for kernel and bpf context MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The current emit_call function is not suitable for kernel function calls, because it stores the return value to the BPF R0 register. Separate that part out so the call emission can be shared. Meanwhile, simplify the judgment logic: a fixed function address can use either jal or auipc+jalr, while an unfixed one can use only auipc+jalr. Signed-off-by: Pu Lehui Signed-off-by: Daniel Borkmann Tested-by: Björn Töpel Acked-by: Björn Töpel Link: https://lore.kernel.org/bpf/20230215135205.1411105-3-pulehui@huaweicloud.com --- arch/riscv/net/bpf_jit_comp64.c | 30 +++++++++++++----------------- 1 file changed, 13 insertions(+), 17 deletions(-) (limited to 'arch') diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c index f2417ac54edd..69ebab81d935 100644 --- a/arch/riscv/net/bpf_jit_comp64.c +++ b/arch/riscv/net/bpf_jit_comp64.c @@ -428,12 +428,12 @@ static void emit_sext_32_rd(u8 *rd, struct rv_jit_context *ctx) *rd = RV_REG_T2; } -static int emit_jump_and_link(u8 rd, s64 rvoff, bool force_jalr, +static int emit_jump_and_link(u8 rd, s64 rvoff, bool fixed_addr, struct rv_jit_context *ctx) { s64 upper, lower; - if (rvoff && is_21b_int(rvoff) && !force_jalr) { + if (rvoff && fixed_addr && is_21b_int(rvoff)) { emit(rv_jal(rd, rvoff >> 1), ctx); return 0; } else if (in_auipc_jalr_range(rvoff)) { @@ -454,24 +454,17 @@ static bool is_signed_bpf_cond(u8 cond) cond == BPF_JSGE || cond == BPF_JSLE; } -static int emit_call(bool fixed, u64 addr, struct rv_jit_context *ctx) +static int emit_call(u64 addr, bool fixed_addr, struct rv_jit_context *ctx) { s64 off = 0; u64 ip; - u8 rd; - int ret; if (addr && ctx->insns) { ip = (u64)(long)(ctx->insns + ctx->ninsns); off = addr - ip; } - ret = emit_jump_and_link(RV_REG_RA, off, !fixed, ctx); - if (ret) - return ret; - rd = bpf_to_rv_reg(BPF_REG_0, ctx); - emit_mv(rd, RV_REG_A0, ctx); - return 0; + return emit_jump_and_link(RV_REG_RA, off, fixed_addr, ctx); } static void emit_atomic(u8 rd, u8 rs, s16 off, s32 imm, bool is64, @@ -913,7 +906,7 @@ out_be: /* JUMP off */ case BPF_JMP | BPF_JA: rvoff = rv_offset(i, off, ctx); - ret = emit_jump_and_link(RV_REG_ZERO, rvoff, false, ctx); + ret = emit_jump_and_link(RV_REG_ZERO, rvoff, true, ctx); if (ret) return ret; break; @@ -1032,17 +1025,20 @@ out_be: /* function call */ case BPF_JMP | BPF_CALL: { - bool fixed; + bool fixed_addr; u64 addr; mark_call(ctx); - ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass, &addr, - &fixed); + ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass, + &addr, &fixed_addr); if (ret < 0) return ret; - ret = emit_call(fixed, addr, ctx); + + ret = emit_call(addr, fixed_addr, ctx); if (ret) return ret; + + emit_mv(bpf_to_rv_reg(BPF_REG_0, ctx), RV_REG_A0, ctx); break; } /* tail call */ @@ -1057,7 +1053,7 @@ out_be: break; rvoff = epilogue_offset(ctx); - ret = emit_jump_and_link(RV_REG_ZERO, rvoff, false, ctx); + ret = emit_jump_and_link(RV_REG_ZERO, rvoff, true, ctx); if (ret) return ret; break; -- cgit v1.2.3 From
596f2e6f9cf41436a5512a3f278c86da5c5598fb Mon Sep 17 00:00:00 2001 From: Pu Lehui Date: Wed, 15 Feb 2023 21:52:04 +0800 Subject: riscv, bpf: Add bpf_arch_text_poke support for RV64 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Implement bpf_arch_text_poke for RV64. For the call scenario, to make the BPF trampoline compatible with both the kernel and BPF contexts, we follow the RV64 ftrace framework and reserve 4 nops as the function entry of BPF programs, using auipc+jalr instructions for the function call. However, since the auipc+jalr call sequence is not atomic, we need to use stop-machine to make sure the instructions are patched atomically. The jump scenario likewise uses an auipc+jalr pair and needs to be patched in stop-machine context. Signed-off-by: Pu Lehui Signed-off-by: Daniel Borkmann Tested-by: Björn Töpel Acked-by: Björn Töpel Link: https://lore.kernel.org/bpf/20230215135205.1411105-4-pulehui@huaweicloud.com --- arch/riscv/net/bpf_jit.h | 5 +++ arch/riscv/net/bpf_jit_comp64.c | 88 ++++++++++++++++++++++++++++++++++++++++- 2 files changed, 91 insertions(+), 2 deletions(-) (limited to 'arch') diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h index d926e0f7ef57..bf9802a63061 100644 --- a/arch/riscv/net/bpf_jit.h +++ b/arch/riscv/net/bpf_jit.h @@ -573,6 +573,11 @@ static inline u32 rv_fence(u8 pred, u8 succ) return rv_i_insn(imm11_0, 0, 0, 0, 0xf); } +static inline u32 rv_nop(void) +{ + return rv_i_insn(0, 0, 0, 0, 0x13); +} + /* RVC instrutions. */ static inline u16 rvc_addi4spn(u8 rd, u32 imm10) diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c index 69ebab81d935..b6b9bbcc977a 100644 --- a/arch/riscv/net/bpf_jit_comp64.c +++ b/arch/riscv/net/bpf_jit_comp64.c @@ -8,6 +8,8 @@ #include #include #include +#include +#include #include "bpf_jit.h" #define RV_REG_TCC RV_REG_A6 @@ -238,7 +240,7 @@ static void __build_epilogue(bool is_tail_call, struct rv_jit_context *ctx) if (!is_tail_call) emit_mv(RV_REG_A0, RV_REG_A5, ctx); emit_jalr(RV_REG_ZERO, is_tail_call ? RV_REG_T3 : RV_REG_RA, - is_tail_call ? 4 : 0, /* skip TCC init */ + is_tail_call ? 20 : 0, /* skip reserved nops and TCC init */ ctx); } @@ -615,6 +617,84 @@ static int add_exception_handler(const struct bpf_insn *insn, return 0; } +static int gen_call_or_nops(void *target, void *ip, u32 *insns) +{ + s64 rvoff; + int i, ret; + struct rv_jit_context ctx; + + ctx.ninsns = 0; + ctx.insns = (u16 *)insns; + + if (!target) { + for (i = 0; i < 4; i++) + emit(rv_nop(), &ctx); + return 0; + } + + rvoff = (s64)(target - (ip + 4)); + emit(rv_sd(RV_REG_SP, -8, RV_REG_RA), &ctx); + ret = emit_jump_and_link(RV_REG_RA, rvoff, false, &ctx); + if (ret) + return ret; + emit(rv_ld(RV_REG_RA, -8, RV_REG_SP), &ctx); + + return 0; +} + +static int gen_jump_or_nops(void *target, void *ip, u32 *insns) +{ + s64 rvoff; + struct rv_jit_context ctx; + + ctx.ninsns = 0; + ctx.insns = (u16 *)insns; + + if (!target) { + emit(rv_nop(), &ctx); + emit(rv_nop(), &ctx); + return 0; + } + + rvoff = (s64)(target - ip); + return emit_jump_and_link(RV_REG_ZERO, rvoff, false, &ctx); +} + +int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type, + void *old_addr, void *new_addr) +{ + u32 old_insns[4], new_insns[4]; + bool is_call = poke_type == BPF_MOD_CALL; + int (*gen_insns)(void *target, void *ip, u32 *insns); + int ninsns = is_call ? 4 : 2; + int ret; + + if (!is_bpf_text_address((unsigned long)ip)) + return -ENOTSUPP; + + gen_insns = is_call ?
gen_call_or_nops : gen_jump_or_nops; + + ret = gen_insns(old_addr, ip, old_insns); + if (ret) + return ret; + + if (memcmp(ip, old_insns, ninsns * 4)) + return -EFAULT; + + ret = gen_insns(new_addr, ip, new_insns); + if (ret) + return ret; + + cpus_read_lock(); + mutex_lock(&text_mutex); + if (memcmp(ip, new_insns, ninsns * 4)) + ret = patch_text(ip, new_insns, ninsns); + mutex_unlock(&text_mutex); + cpus_read_unlock(); + + return ret; +} + int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, bool extra_pass) { @@ -1266,7 +1346,7 @@ out_be: void bpf_jit_build_prologue(struct rv_jit_context *ctx) { - int stack_adjust = 0, store_offset, bpf_stack_adjust; + int i, stack_adjust = 0, store_offset, bpf_stack_adjust; bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16); if (bpf_stack_adjust) @@ -1293,6 +1373,10 @@ void bpf_jit_build_prologue(struct rv_jit_context *ctx) store_offset = stack_adjust - 8; + /* reserve 4 nop insns */ + for (i = 0; i < 4; i++) + emit(rv_nop(), ctx); + /* First instruction is always setting the tail-call-counter * (TCC) register. This instruction is skipped for tail calls. * Force using a 4-byte (non-compressed) instruction. -- cgit v1.2.3 From 49b5e77ae3e214acff4728595b4ac7bf776693ca Mon Sep 17 00:00:00 2001 From: Pu Lehui Date: Wed, 15 Feb 2023 21:52:05 +0800 Subject: riscv, bpf: Add bpf trampoline support for RV64 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit BPF trampoline is the critical infrastructure of the BPF subsystem, acting as a mediator between kernel functions and BPF programs. Numerous important features, such as using BPF programs for zero-overhead kernel introspection, rely on this key component. We can't wait to support the BPF trampoline on RV64. The related tests have passed, and test_verifier reports no new failure cases.
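For context, here is a minimal fentry program of the kind that depends on trampoline support (a hedged sketch, not part of the patch; the traced function and the skeleton-style includes are illustrative only):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

SEC("fentry/do_nanosleep")
int BPF_PROG(trace_do_nanosleep)
{
	/* With RV64 trampoline support, this program can now attach
	 * via the fentry hook instead of a kprobe.
	 */
	bpf_printk("do_nanosleep entered");
	return 0;
}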
Signed-off-by: Pu Lehui Signed-off-by: Daniel Borkmann Tested-by: Björn Töpel Acked-by: Björn Töpel Link: https://lore.kernel.org/bpf/20230215135205.1411105-5-pulehui@huaweicloud.com --- arch/riscv/net/bpf_jit_comp64.c | 317 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 317 insertions(+) (limited to 'arch') diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c index b6b9bbcc977a..f5a668736c79 100644 --- a/arch/riscv/net/bpf_jit_comp64.c +++ b/arch/riscv/net/bpf_jit_comp64.c @@ -695,6 +695,323 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type poke_type, return ret; } +static void store_args(int nregs, int args_off, struct rv_jit_context *ctx) +{ + int i; + + for (i = 0; i < nregs; i++) { + emit_sd(RV_REG_FP, -args_off, RV_REG_A0 + i, ctx); + args_off -= 8; + } +} + +static void restore_args(int nregs, int args_off, struct rv_jit_context *ctx) +{ + int i; + + for (i = 0; i < nregs; i++) { + emit_ld(RV_REG_A0 + i, -args_off, RV_REG_FP, ctx); + args_off -= 8; + } +} + +static int invoke_bpf_prog(struct bpf_tramp_link *l, int args_off, int retval_off, + int run_ctx_off, bool save_ret, struct rv_jit_context *ctx) +{ + int ret, branch_off; + struct bpf_prog *p = l->link.prog; + int cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie); + + if (l->cookie) { + emit_imm(RV_REG_T1, l->cookie, ctx); + emit_sd(RV_REG_FP, -run_ctx_off + cookie_off, RV_REG_T1, ctx); + } else { + emit_sd(RV_REG_FP, -run_ctx_off + cookie_off, RV_REG_ZERO, ctx); + } + + /* arg1: prog */ + emit_imm(RV_REG_A0, (const s64)p, ctx); + /* arg2: &run_ctx */ + emit_addi(RV_REG_A1, RV_REG_FP, -run_ctx_off, ctx); + ret = emit_call((const u64)bpf_trampoline_enter(p), true, ctx); + if (ret) + return ret; + + /* if (__bpf_prog_enter(prog) == 0) + * goto skip_exec_of_prog; + */ + branch_off = ctx->ninsns; + /* nop reserved for conditional jump */ + emit(rv_nop(), ctx); + + /* store prog start time */ + emit_mv(RV_REG_S1, RV_REG_A0, ctx); + + /* arg1: &args_off */ + emit_addi(RV_REG_A0, RV_REG_FP, -args_off, ctx); + if (!p->jited) + /* arg2: progs[i]->insnsi for interpreter */ + emit_imm(RV_REG_A1, (const s64)p->insnsi, ctx); + ret = emit_call((const u64)p->bpf_func, true, ctx); + if (ret) + return ret; + + if (save_ret) + emit_sd(RV_REG_FP, -retval_off, regmap[BPF_REG_0], ctx); + + /* update branch with beqz */ + if (ctx->insns) { + int offset = ninsns_rvoff(ctx->ninsns - branch_off); + u32 insn = rv_beq(RV_REG_A0, RV_REG_ZERO, offset >> 1); + *(u32 *)(ctx->insns + branch_off) = insn; + } + + /* arg1: prog */ + emit_imm(RV_REG_A0, (const s64)p, ctx); + /* arg2: prog start time */ + emit_mv(RV_REG_A1, RV_REG_S1, ctx); + /* arg3: &run_ctx */ + emit_addi(RV_REG_A2, RV_REG_FP, -run_ctx_off, ctx); + ret = emit_call((const u64)bpf_trampoline_exit(p), true, ctx); + + return ret; +} + +static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, + const struct btf_func_model *m, + struct bpf_tramp_links *tlinks, + void *func_addr, u32 flags, + struct rv_jit_context *ctx) +{ + int i, ret, offset; + int *branches_off = NULL; + int stack_size = 0, nregs = m->nr_args; + int retaddr_off, fp_off, retval_off, args_off; + int nregs_off, ip_off, run_ctx_off, sreg_off; + struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY]; + struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT]; + struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN]; + void *orig_call = func_addr; + bool save_ret; + u32 insn; + + /* Generated trampoline stack layout: + * + * FP - 8 [ RA of parent func ] 
return address of parent + * function + * FP - retaddr_off [ RA of traced func ] return address of traced + * function + * FP - fp_off [ FP of parent func ] + * + * FP - retval_off [ return value ] BPF_TRAMP_F_CALL_ORIG or + * BPF_TRAMP_F_RET_FENTRY_RET + * [ argN ] + * [ ... ] + * FP - args_off [ arg1 ] + * + * FP - nregs_off [ regs count ] + * + * FP - ip_off [ traced func ] BPF_TRAMP_F_IP_ARG + * + * FP - run_ctx_off [ bpf_tramp_run_ctx ] + * + * FP - sreg_off [ callee saved reg ] + * + * [ pads ] pads for 16 bytes alignment + */ + + if (flags & (BPF_TRAMP_F_ORIG_STACK | BPF_TRAMP_F_SHARE_IPMODIFY)) + return -ENOTSUPP; + + /* extra regiters for struct arguments */ + for (i = 0; i < m->nr_args; i++) + if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG) + nregs += round_up(m->arg_size[i], 8) / 8 - 1; + + /* 8 arguments passed by registers */ + if (nregs > 8) + return -ENOTSUPP; + + /* room for parent function return address */ + stack_size += 8; + + stack_size += 8; + retaddr_off = stack_size; + + stack_size += 8; + fp_off = stack_size; + + save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET); + if (save_ret) { + stack_size += 8; + retval_off = stack_size; + } + + stack_size += nregs * 8; + args_off = stack_size; + + stack_size += 8; + nregs_off = stack_size; + + if (flags & BPF_TRAMP_F_IP_ARG) { + stack_size += 8; + ip_off = stack_size; + } + + stack_size += round_up(sizeof(struct bpf_tramp_run_ctx), 8); + run_ctx_off = stack_size; + + stack_size += 8; + sreg_off = stack_size; + + stack_size = round_up(stack_size, 16); + + emit_addi(RV_REG_SP, RV_REG_SP, -stack_size, ctx); + + emit_sd(RV_REG_SP, stack_size - retaddr_off, RV_REG_RA, ctx); + emit_sd(RV_REG_SP, stack_size - fp_off, RV_REG_FP, ctx); + + emit_addi(RV_REG_FP, RV_REG_SP, stack_size, ctx); + + /* callee saved register S1 to pass start time */ + emit_sd(RV_REG_FP, -sreg_off, RV_REG_S1, ctx); + + /* store ip address of the traced function */ + if (flags & BPF_TRAMP_F_IP_ARG) { + emit_imm(RV_REG_T1, (const s64)func_addr, ctx); + emit_sd(RV_REG_FP, -ip_off, RV_REG_T1, ctx); + } + + emit_li(RV_REG_T1, nregs, ctx); + emit_sd(RV_REG_FP, -nregs_off, RV_REG_T1, ctx); + + store_args(nregs, args_off, ctx); + + /* skip to actual body of traced function */ + if (flags & BPF_TRAMP_F_SKIP_FRAME) + orig_call += 16; + + if (flags & BPF_TRAMP_F_CALL_ORIG) { + emit_imm(RV_REG_A0, (const s64)im, ctx); + ret = emit_call((const u64)__bpf_tramp_enter, true, ctx); + if (ret) + return ret; + } + + for (i = 0; i < fentry->nr_links; i++) { + ret = invoke_bpf_prog(fentry->links[i], args_off, retval_off, run_ctx_off, + flags & BPF_TRAMP_F_RET_FENTRY_RET, ctx); + if (ret) + return ret; + } + + if (fmod_ret->nr_links) { + branches_off = kcalloc(fmod_ret->nr_links, sizeof(int), GFP_KERNEL); + if (!branches_off) + return -ENOMEM; + + /* cleanup to avoid garbage return value confusion */ + emit_sd(RV_REG_FP, -retval_off, RV_REG_ZERO, ctx); + for (i = 0; i < fmod_ret->nr_links; i++) { + ret = invoke_bpf_prog(fmod_ret->links[i], args_off, retval_off, + run_ctx_off, true, ctx); + if (ret) + goto out; + emit_ld(RV_REG_T1, -retval_off, RV_REG_FP, ctx); + branches_off[i] = ctx->ninsns; + /* nop reserved for conditional jump */ + emit(rv_nop(), ctx); + } + } + + if (flags & BPF_TRAMP_F_CALL_ORIG) { + restore_args(nregs, args_off, ctx); + ret = emit_call((const u64)orig_call, true, ctx); + if (ret) + goto out; + emit_sd(RV_REG_FP, -retval_off, RV_REG_A0, ctx); + im->ip_after_call = ctx->insns + ctx->ninsns; + /* 2 nops reserved for auipc+jalr pair */ + 
emit(rv_nop(), ctx); + emit(rv_nop(), ctx); + } + + /* update branches saved in invoke_bpf_mod_ret with bnez */ + for (i = 0; ctx->insns && i < fmod_ret->nr_links; i++) { + offset = ninsns_rvoff(ctx->ninsns - branches_off[i]); + insn = rv_bne(RV_REG_T1, RV_REG_ZERO, offset >> 1); + *(u32 *)(ctx->insns + branches_off[i]) = insn; + } + + for (i = 0; i < fexit->nr_links; i++) { + ret = invoke_bpf_prog(fexit->links[i], args_off, retval_off, + run_ctx_off, false, ctx); + if (ret) + goto out; + } + + if (flags & BPF_TRAMP_F_CALL_ORIG) { + im->ip_epilogue = ctx->insns + ctx->ninsns; + emit_imm(RV_REG_A0, (const s64)im, ctx); + ret = emit_call((const u64)__bpf_tramp_exit, true, ctx); + if (ret) + goto out; + } + + if (flags & BPF_TRAMP_F_RESTORE_REGS) + restore_args(nregs, args_off, ctx); + + if (save_ret) + emit_ld(RV_REG_A0, -retval_off, RV_REG_FP, ctx); + + emit_ld(RV_REG_S1, -sreg_off, RV_REG_FP, ctx); + + if (flags & BPF_TRAMP_F_SKIP_FRAME) + /* return address of parent function */ + emit_ld(RV_REG_RA, stack_size - 8, RV_REG_SP, ctx); + else + /* return address of traced function */ + emit_ld(RV_REG_RA, stack_size - retaddr_off, RV_REG_SP, ctx); + + emit_ld(RV_REG_FP, stack_size - fp_off, RV_REG_SP, ctx); + emit_addi(RV_REG_SP, RV_REG_SP, stack_size, ctx); + + emit_jalr(RV_REG_ZERO, RV_REG_RA, 0, ctx); + + ret = ctx->ninsns; +out: + kfree(branches_off); + return ret; +} + +int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, + void *image_end, const struct btf_func_model *m, + u32 flags, struct bpf_tramp_links *tlinks, + void *func_addr) +{ + int ret; + struct rv_jit_context ctx; + + ctx.ninsns = 0; + ctx.insns = NULL; + ret = __arch_prepare_bpf_trampoline(im, m, tlinks, func_addr, flags, &ctx); + if (ret < 0) + return ret; + + if (ninsns_rvoff(ret) > (long)image_end - (long)image) + return -EFBIG; + + ctx.ninsns = 0; + ctx.insns = image; + ret = __arch_prepare_bpf_trampoline(im, m, tlinks, func_addr, flags, &ctx); + if (ret < 0) + return ret; + + bpf_flush_icache(ctx.insns, ctx.insns + ctx.ninsns); + + return ninsns_rvoff(ret); +} + int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, bool extra_pass) { -- cgit v1.2.3