path: root/arch/s390/include/asm/fpu-insn.h
Age | Commit message | Author | Files | Lines
2024-02-16 | s390/fpu: add vector instruction inline assemblies for crc32 | Heiko Carstens | 1 | -0/+56
Provide various vector instruction inline assemblies for crc32 calculations. This is just preparation to keep the conversion of the existing crc32 implementations from assembly to C small.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/sysinfo: convert bogomips calculation to C | Heiko Carstens | 1 | -0/+35
Provide several one-instruction fpu inline assemblies and use them to implement the bogomips calculation in a C-like style. This mainly illustrates how kernel fpu code can be written in C. The advantage is that the author only has to take care of the floating point instructions, and does not need to take care of general purpose register allocation (if needed) or the semantics of all other instructions not related to fpu.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/raid6: convert to use standard fpu_*() inline assemblies | Heiko Carstens | 1 | -0/+48
Move the s390 specific raid6 inline assemblies, make them generic, and reuse them for the raid6 gen/xor implementation.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/checksum: provide csum_partial_copy_nocheck() | Heiko Carstens | 1 | -0/+58
With csum_partial(), which reads all bytes into registers, it is easy to also implement csum_partial_copy_nocheck(), which copies the buffer while calculating its checksum. For a 512 byte buffer this reduces the runtime by 19%. Compared to the old generic variant (memcpy() + cksm instruction), runtime is reduced by 42%.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
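The idea of copying a buffer while summing it can be illustrated with a minimal portable sketch. This is not the kernel implementation, which uses s390 vector registers. The names `fold16()`, `copy_and_sum()`, and `demo_copy_and_sum()` are hypothetical, and the 16-bit ones' complement sum (RFC 1071 style, big-endian word order) is assumed here for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a wide accumulator into 16 bits with end-around carry. */
uint16_t fold16(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Hypothetical helper: copy len bytes from src to dst and return the
 * folded 16-bit ones' complement sum of the data. Each byte is read
 * once and then both stored and added, so no second pass is needed.
 */
uint16_t copy_and_sum(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		uint8_t hi = src[i], lo = src[i + 1];

		dst[i] = hi;
		dst[i + 1] = lo;
		sum += ((uint32_t)hi << 8) | lo;
	}
	if (i < len) {
		/* Odd trailing byte: treated as the high half of a word. */
		dst[i] = src[i];
		sum += (uint32_t)src[i] << 8;
	}
	return fold16(sum);
}

/* Returns 1 if the copy is exact and the sum matches the RFC 1071 example. */
int demo_copy_and_sum(void)
{
	static const uint8_t src[8] = {
		0x00, 0x01, 0xf2, 0x03, 0xf4, 0xf5, 0xf6, 0xf7
	};
	uint8_t dst[8] = { 0 };
	uint16_t sum = copy_and_sum(dst, src, sizeof(src));

	return sum == 0xddf2 && memcmp(dst, src, sizeof(src)) == 0;
}
```

The single loop is what distinguishes this from a memcpy() followed by a separate checksum pass: the source data is touched only once.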
2024-02-16 | s390/checksum: provide vector register variant of csum_partial() | Heiko Carstens | 1 | -0/+99
Provide a faster variant of csum_partial() which uses vector registers instead of the cksm instruction.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/fpu: let fpu_vlm() and fpu_vstm() return number of registers | Heiko Carstens | 1 | -8/+16
Let the fpu_vlm() and fpu_vstm() macros return the number of registers saved / loaded. This helps to keep code easy to read when there are several subsequent fpu_vlm() or fpu_vstm() calls:

    __vector128 *vxrs = ....
    vxrs += fpu_vstm(0, 15, vxrs);
    vxrs += fpu_vstm(16, 31, vxrs);

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
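The pointer-advancing pattern from this commit message can be tried out with a portable stand-in. The sketch below does not use the real fpu_vstm() macro or any s390 instruction. `vec128`, `sim_regs`, `sim_vstm()`, `sim_save_all()`, and `demo_save_all()` are hypothetical names that only mimic the "return the number of registers stored" convention.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's __vector128 type. */
typedef struct {
	uint64_t hi, lo;
} vec128;

/* Simulated register file with 32 vector registers. */
vec128 sim_regs[32];

/*
 * Stand-in for fpu_vstm(): store registers first..last to buf and
 * return how many registers were stored.
 */
int sim_vstm(int first, int last, vec128 *buf)
{
	int n = last - first + 1;

	memcpy(buf, &sim_regs[first], n * sizeof(vec128));
	return n;
}

/* Save all 32 registers; returns the number of entries written. */
int sim_save_all(vec128 *buf)
{
	vec128 *p = buf;

	/* Each call advances p by the count it returns. */
	p += sim_vstm(0, 15, p);
	p += sim_vstm(16, 31, p);
	return (int)(p - buf);
}

/* Returns 1 if all 32 registers land in order in the buffer. */
int demo_save_all(void)
{
	vec128 buf[32];
	int i;

	for (i = 0; i < 32; i++) {
		sim_regs[i].hi = (uint64_t)i;
		sim_regs[i].lo = ~(uint64_t)i;
	}
	if (sim_save_all(buf) != 32)
		return 0;
	for (i = 0; i < 32; i++)
		if (buf[i].hi != (uint64_t)i || buf[i].lo != ~(uint64_t)i)
			return 0;
	return 1;
}
```

Because the helper returns the count, the caller never has to repeat the register range as a separate pointer offset, which keeps the two calls from drifting out of sync.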
2024-02-16 | s390/fpu: provide and use vlm and vstm inline assemblies | Heiko Carstens | 1 | -0/+70
Instead of open-coding vlm and vstm inline assemblies at several locations, provide an fpu_* function for each instruction, and use them in the new save_vx_regs() and load_vx_regs() helper functions.
Note that "O" and "R" inline assembly operand modifiers are used in order to pass the displacement and base register of the memory operands to the existing VLM and VSTM macros. The two operand modifiers are not available for clang. Therefore provide two variants of each inline assembly.
The clang variant always uses and clobbers general purpose register 1, like in the previous inline assemblies, so it can be used as base register with a zero displacement. This generates slightly less efficient code, but can be removed as soon as clang has support for the used operand modifiers.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/fpu: provide and use lfpc, sfpc, and stfpc inline assemblies | Heiko Carstens | 1 | -0/+26
Instead of open-coding lfpc, sfpc, and stfpc inline assemblies at several locations, provide an fpu_* function for each instruction and use the function instead.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/fpu: provide and use ld and std inline assemblies | Heiko Carstens | 1 | -0/+18
Deduplicate the 64 ld and std inline assemblies. Provide an fpu inline assembly for both instructions, and use them in the new save_fp_regs() and load_fp_regs() helper functions.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/fpu: use lfpc instead of sfpc instruction | Heiko Carstens | 1 | -8/+14
The only user of sfpc_safe() needs to read the new fpc register value from memory before it is set with sfpc. Avoid this indirection and use lfpc, which reads the new value from memory. Also add the "fpu_" prefix to have a common name space for fpu related inline assemblies, and provide memory access instrumentation.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/fpu: add documentation about fpu helper functions | Heiko Carstens | 1 | -0/+20
Add documentation which describes what the fpu helper functions are good for, and why they should be used.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 | s390/fpu: move, rename, and merge header files | Heiko Carstens | 1 | -0/+42
Move, rename, and merge the fpu and vx header files. This way fpu header files have a consistent naming scheme (fpu*.h). Also get rid of the fpu subdirectory and move header files to asm directory, so that all fpu and vx header files can be found at the same location. Merge the internal.h header file into other header files, since the internal helpers are used at many locations, so those helper functions are not really internal.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>