|
Add LSX and LASX implementations of xor operations, operating on 64
bytes (one L1 cache line) at a time, for a balance between memory
utilization and instruction mix. Huacai confirmed that all future
LoongArch implementations by Loongson (that we care about) will likely also
feature 64-byte cache lines, and experiments show no throughput
improvement with further unrolling.
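As a rough illustration of the 64-bytes-per-iteration structure described above, here is a portable C sketch. It is not the LSX/LASX assembly from this patch: the name xor_2_sketch and the use of eight 64-bit words in place of vector registers are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XOR_LINE_BYTES 64	/* one L1 cache line, per the commit message */

/* p1 ^= p2, consuming one 64-byte line per loop iteration. */
void xor_2_sketch(size_t bytes, uint64_t *p1, const uint64_t *p2)
{
	size_t lines = bytes / XOR_LINE_BYTES;

	/* Assumption of this sketch: callers pass whole lines only. */
	assert(bytes % XOR_LINE_BYTES == 0);

	while (lines--) {
		/*
		 * Eight 64-bit words make up 64 bytes. A vector version
		 * would cover the same span with four 128-bit (LSX) or
		 * two 256-bit (LASX) loads per source buffer.
		 */
		for (int i = 0; i < 8; i++)
			p1[i] ^= p2[i];
		p1 += 8;
		p2 += 8;
	}
}
```

Calling xor_2_sketch(128, a, b) would XOR two cache lines of b into a in two loop iterations.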
Performance numbers measured during system boot on a 3A5000 @ 2.5GHz:
> 8regs : 12702 MB/sec
> 8regs_prefetch : 10920 MB/sec
> 32regs : 12686 MB/sec
> 32regs_prefetch : 10918 MB/sec
> lsx : 17589 MB/sec
> lasx : 26116 MB/sec
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Inspired by the commit 42d038c4fb00f ("arm64: Add support for function
error injection") and the commit ee55ff803b383 ("riscv: Add support for
function error injection"), this patch supports function error injection
for LoongArch.
It mainly implements two functions:
(1) regs_set_return_value(), which overwrites the return value;
(2) override_function_with_return(), which makes the probed function
return immediately by jumping back to its caller.
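The two helpers can be modelled in plain C roughly as follows. This is a sketch under stated assumptions, not the kernel code: struct regs_sketch is a hypothetical, simplified stand-in for struct pt_regs, and the _sketch function names are made up for illustration. The register numbers follow the LoongArch calling convention (r1 is the return address, r4 is a0, which carries the return value).

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct pt_regs. */
struct regs_sketch {
	uint64_t gpr[32];	/* general-purpose registers r0..r31 */
	uint64_t pc;		/* address at which execution resumes */
};

enum { REG_RA = 1, REG_A0 = 4 };	/* LoongArch ABI: r1 = ra, r4 = a0 */

/* Overwrite the value the probed function will appear to return. */
void regs_set_return_value_sketch(struct regs_sketch *regs, uint64_t val)
{
	regs->gpr[REG_A0] = val;
}

/* Skip the probed function's body by resuming at its return address. */
void override_function_with_return_sketch(struct regs_sketch *regs)
{
	regs->pc = regs->gpr[REG_RA];
}
```

With the saved registers edited this way at the probed function's entry, the caller would see an immediate return carrying the injected value, for example -EINVAL as in the test output of this commit.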
Here is a simple test with CONFIG_FUNCTION_ERROR_INJECTION and
CONFIG_FAIL_FUNCTION enabled:
# echo sys_clone > /sys/kernel/debug/fail_function/inject
# echo 100 > /sys/kernel/debug/fail_function/probability
# dmesg
bash: fork: Invalid argument
# dmesg
...
FAULT_INJECTION: forcing a failure.
name fail_function, interval 1, probability 100, space 0, times 1
...
Call Trace:
[<90000000002238f4>] show_stack+0x5c/0x180
[<90000000012e384c>] dump_stack_lvl+0x60/0x88
[<9000000000b1879c>] should_fail_ex+0x1b0/0x1f4
[<900000000032ead4>] fei_kprobe_handler+0x28/0x6c
[<9000000000230970>] kprobe_breakpoint_handler+0xf0/0x118
[<90000000012e3e60>] do_bp+0x2c4/0x358
[<9000000002241924>] exception_handlers+0x1924/0x10000
[<900000000023b7d0>] sys_clone+0x0/0x4
[<90000000012e4744>] do_syscall+0x7c/0x94
[<9000000000221e44>] handle_syscall+0xc4/0x160
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
LoongArch is a 64-bit platform that supports 8-byte memory accesses,
but the generic checksum functions use 4-byte memory accesses.
So add an 8-byte memory access optimization for the checksum functions
on LoongArch. The code is derived from arm64.
When network hardware checksumming is disabled, iperf performance
improves by about 10% with this patch.
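To show why wider loads help, here is a minimal C sketch of a ones'-complement checksum that consumes 8 bytes per step, next to a 2-byte reference used only for comparison. It is not the code taken from arm64: all function names are hypothetical, and the result is left in native byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit ones'-complement accumulator down to 16 bits. */
uint16_t csum_fold64_sketch(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffu) + (sum >> 16);
	sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/* 64-bit add with end-around carry. */
uint64_t csum_add64_sketch(uint64_t sum, uint64_t word)
{
	sum += word;
	return sum + (sum < word);	/* (sum < word) is the carry out */
}

/* Checksum using one 8-byte load per iteration. */
uint16_t csum_8byte_sketch(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t sum = 0, word;

	while (len >= 8) {
		memcpy(&word, p, 8);	/* typically a single 8-byte load */
		sum = csum_add64_sketch(sum, word);
		p += 8;
		len -= 8;
	}
	if (len) {
		word = 0;
		memcpy(&word, p, len);	/* zero-padded tail */
		sum = csum_add64_sketch(sum, word);
	}
	return csum_fold64_sketch(sum);
}

/* Reference: the same sum taken 2 bytes at a time. */
uint16_t csum_2byte_ref(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t sum = 0;
	uint16_t half;

	for (; len >= 2; p += 2, len -= 2) {
		memcpy(&half, p, 2);
		sum += half;
	}
	if (len) {
		half = 0;
		memcpy(&half, p, 1);	/* zero-padded odd byte */
		sum += half;
	}
	return csum_fold64_sketch(sum);
}
```

Both functions return the same 16-bit value for the same buffer, because 2^16 - 1 divides 2^64 - 1; the 8-byte version simply needs a quarter of the loads and additions of a 2-byte loop.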
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Use the alternative mechanism to optimize common library routines
according to whether the CPU has the UAL (hardware unaligned access
support) feature, including memset(), memcpy(), memmove(), copy_user()
and clear_user().
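The idea of choosing a routine once, based on a CPU feature, can be sketched in user-space C as below. Note the swapped-in technique: the kernel's alternative mechanism patches instructions in place at boot, whereas this sketch uses a plain function pointer. The has_ual parameter and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise copy: safe when the hardware cannot do unaligned accesses. */
void *memcpy_generic_sketch(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}

/* Word-wise copy: profitable when unaligned 8-byte accesses are cheap. */
void *memcpy_fast_sketch(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	uint64_t w;

	while (n >= 8) {
		/* memcpy() of 8 bytes is C's portable unaligned load/store. */
		memcpy(&w, s, 8);
		memcpy(d, &w, 8);
		d += 8;
		s += 8;
		n -= 8;
	}
	while (n--)
		*d++ = *s++;
	return dst;
}

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Pick the implementation once, from a (hypothetical) feature flag. */
memcpy_fn select_memcpy_sketch(bool has_ual)
{
	return has_ual ? memcpy_fast_sketch : memcpy_generic_sketch;
}
```

A caller would run select_memcpy_sketch() once at startup and reuse the returned pointer; the in-place patching used by the kernel avoids even that indirect call.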
We have tested UnixBench on a Loongson-3A5000 quad-core machine (1.6GHz):
1, One copy, before patch:
System Benchmarks Index Values            BASELINE       RESULT    INDEX
Dhrystone 2 using register variables      116700.0    9566582.0    819.8
Double-Precision Whetstone                    55.0       2805.3    510.1
Execl Throughput                              43.0       2120.0    493.0
File Copy 1024 bufsize 2000 maxblocks       3960.0     209833.0    529.9
File Copy 256 bufsize 500 maxblocks         1655.0      89400.0    540.2
File Copy 4096 bufsize 8000 maxblocks       5800.0     320036.0    551.8
Pipe Throughput                            12440.0     340624.0    273.8
Pipe-based Context Switching                4000.0     109939.1    274.8
Process Creation                             126.0       4728.7    375.3
Shell Scripts (1 concurrent)                  42.4       2223.1    524.3
Shell Scripts (8 concurrent)                   6.0        883.1   1471.9
System Call Overhead                       15000.0     518639.1    345.8
                                                                ========
System Benchmarks Index Score                                      500.2
2, One copy, after patch:
System Benchmarks Index Values            BASELINE       RESULT    INDEX
Dhrystone 2 using register variables      116700.0    9567674.7    819.9
Double-Precision Whetstone                    55.0       2805.5    510.1
Execl Throughput                              43.0       2392.7    556.4
File Copy 1024 bufsize 2000 maxblocks       3960.0     417804.0   1055.1
File Copy 256 bufsize 500 maxblocks         1655.0     112909.5    682.2
File Copy 4096 bufsize 8000 maxblocks       5800.0    1255207.4   2164.2
Pipe Throughput                            12440.0     555712.0    446.7
Pipe-based Context Switching                4000.0      99964.5    249.9
Process Creation                             126.0       5192.5    412.1
Shell Scripts (1 concurrent)                  42.4       2302.4    543.0
Shell Scripts (8 concurrent)                   6.0        919.6   1532.6
System Call Overhead                       15000.0     511159.3    340.8
                                                                ========
System Benchmarks Index Score                                      640.1
3, Four copies, before patch:
System Benchmarks Index Values            BASELINE       RESULT    INDEX
Dhrystone 2 using register variables      116700.0   38268610.5   3279.2
Double-Precision Whetstone                    55.0      11222.2   2040.4
Execl Throughput                              43.0       7892.0   1835.3
File Copy 1024 bufsize 2000 maxblocks       3960.0     235149.6    593.8
File Copy 256 bufsize 500 maxblocks         1655.0      74959.6    452.9
File Copy 4096 bufsize 8000 maxblocks       5800.0     545048.5    939.7
Pipe Throughput                            12440.0    1337359.0   1075.0
Pipe-based Context Switching                4000.0     473663.9   1184.2
Process Creation                             126.0      17491.2   1388.2
Shell Scripts (1 concurrent)                  42.4       6865.7   1619.3
Shell Scripts (8 concurrent)                   6.0       1015.9   1693.1
System Call Overhead                       15000.0    1899535.2   1266.4
                                                                ========
System Benchmarks Index Score                                     1278.3
4, Four copies, after patch:
System Benchmarks Index Values            BASELINE       RESULT    INDEX
Dhrystone 2 using register variables      116700.0   38272815.5   3279.6
Double-Precision Whetstone                    55.0      11222.8   2040.5
Execl Throughput                              43.0       8839.2   2055.6
File Copy 1024 bufsize 2000 maxblocks       3960.0     313912.9    792.7
File Copy 256 bufsize 500 maxblocks         1655.0      80976.1    489.3
File Copy 4096 bufsize 8000 maxblocks       5800.0    1176594.3   2028.6
Pipe Throughput                            12440.0    2100941.9   1688.9
Pipe-based Context Switching                4000.0     476696.4   1191.7
Process Creation                             126.0      18394.7   1459.9
Shell Scripts (1 concurrent)                  42.4       7172.2   1691.6
Shell Scripts (8 concurrent)                   6.0       1058.3   1763.9
System Call Overhead                       15000.0    1874714.7   1249.8
                                                                ========
System Benchmarks Index Score                                     1488.8
Signed-off-by: Jun Yi <yijun@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
The Loongson-2 series (Loongson-2K500, Loongson-2K1000) does not support
unaligned access in hardware, while on the Loongson-3 series
(Loongson-3A5000, Loongson-3C5000) hardware support for unaligned access
is configurable. This patch adds unaligned access emulation for those
LoongArch processors without hardware support.
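The core of such emulation is replacing one wide access with accesses that can never be misaligned. A minimal C sketch of that step for a 64-bit load and store follows; the function names are hypothetical, and the decoding of the faulting instruction that a real handler must also perform is omitted. LoongArch is little-endian, so byte 0 is the least significant.

```c
#include <stdint.h>

/* Emulate a little-endian 64-bit load using only byte accesses. */
uint64_t emulate_load64_sketch(const unsigned char *addr)
{
	uint64_t val = 0;

	/* Start from the most significant byte and shift it into place. */
	for (int i = 7; i >= 0; i--)
		val = (val << 8) | addr[i];
	return val;
}

/* Emulate a little-endian 64-bit store using only byte accesses. */
void emulate_store64_sketch(unsigned char *addr, uint64_t val)
{
	for (int i = 0; i < 8; i++) {
		addr[i] = (unsigned char)val;	/* low byte first */
		val >>= 8;
	}
}
```

Because every access is a single byte, the sketch works for any address, at the cost of eight memory operations instead of one.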
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Add Kbuild, Makefile, Kconfig and linker script for LoongArch build
infrastructure.
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|