summaryrefslogtreecommitdiff
path: root/arch/x86/crypto/aria_aesni_avx_glue.c
AgeCommit message (Collapse)AuthorFilesLines
2023-01-20crypto: x86/aria-avx - fix build failure with old binutilsTaehee Yoo1-1/+3
The minimum version of binutils for kernel build is currently 2.23 and it doesn't support GFNI. So, it fails to build the aria-avx if the old binutils is used. The code using GFNI is an optional part of aria-avx. So, it disables GFNI part in it when the old binutils is used. In order to check whether the using binutils is supporting GFNI or not, AS_GFNI is added. Fixes: ba3579e6e45c ("crypto: aria-avx - add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher") Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-01-06crypto: x86/aria - implement aria-avx2Taehee Yoo1-0/+6
aria-avx2 implementation uses AVX2, AES-NI, and GFNI. It supports 32way parallel processing. So, byteslicing code is changed to support 32way parallel. And it exports some aria-avx functions such as encrypt() and decrypt(). There are two main logics, s-box layer and diffusion layer. These codes are the same as aria-avx implementation. But some instruction are exchanged because they don't support 256bit registers. Also, AES-NI doesn't support 256bit register. So, aesenclast and aesdeclast are used twice like below: vextracti128 $1, ymm0, xmm6; vaesenclast xmm7, xmm0, xmm0; vaesenclast xmm7, xmm6, xmm6; vinserti128 $1, xmm6, ymm0, ymm0; Benchmark with modprobe tcrypt mode=610 num_mb=8192, i3-12100: ARIA-AVX2 with GFNI(128bit and 256bit) testing speed of multibuffer ecb(aria) (ecb-aria-avx2) encryption tcrypt: 1 operation in 2003 cycles (1024 bytes) tcrypt: 1 operation in 5867 cycles (4096 bytes) tcrypt: 1 operation in 2358 cycles (1024 bytes) tcrypt: 1 operation in 7295 cycles (4096 bytes) testing speed of multibuffer ecb(aria) (ecb-aria-avx2) decryption tcrypt: 1 operation in 2004 cycles (1024 bytes) tcrypt: 1 operation in 5956 cycles (4096 bytes) tcrypt: 1 operation in 2409 cycles (1024 bytes) tcrypt: 1 operation in 7564 cycles (4096 bytes) ARIA-AVX with GFNI(128bit and 256bit) testing speed of multibuffer ecb(aria) (ecb-aria-avx) encryption tcrypt: 1 operation in 2761 cycles (1024 bytes) tcrypt: 1 operation in 9390 cycles (4096 bytes) tcrypt: 1 operation in 3401 cycles (1024 bytes) tcrypt: 1 operation in 11876 cycles (4096 bytes) testing speed of multibuffer ecb(aria) (ecb-aria-avx) decryption tcrypt: 1 operation in 2735 cycles (1024 bytes) tcrypt: 1 operation in 9424 cycles (4096 bytes) tcrypt: 1 operation in 3369 cycles (1024 bytes) tcrypt: 1 operation in 11954 cycles (4096 bytes) Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-01-06crypto: x86/aria - add keystream array into request ctxTaehee Yoo1-13/+26
avx accelerated aria module used local keystream array. But, keystream array size is too big. So, it puts the keystream array into request ctx. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-09-24crypto: aria-avx - add AES-NI/AVX/x86_64/GFNI assembler implementation of ↵Taehee Yoo1-0/+213
aria cipher The implementation is based on the 32-bit implementation of the aria. Also, aria-avx process steps are the similar to the camellia-avx. 1. Byteslice(16way) 2. Add-round-key. 3. Sbox 4. Diffusion layer. Except for s-box, all steps are the same as the aria-generic implementation. s-box step is very similar to camellia and sm4 implementation. There are 2 implementations for s-box step. One is to use AES-NI and affine transformation, which is the same as Camellia, sm4, and others. Another is to use GFNI. GFNI implementation is faster than AES-NI implementation. So, it uses GFNI implementation if the running CPU supports GFNI. There are 4 s-boxes in the ARIA and the 2 s-boxes are the same as AES's s-boxes. To calculate the first sbox, it just uses the aesenclast and then inverts shift_row. No more process is needed for this job because the first s-box is the same as the AES encryption s-box. To calculate the second sbox(invert of s1), it just uses the aesdeclast and then inverts shift_row. No more process is needed for this job because the second s-box is the same as the AES decryption s-box. To calculate the third s-box, it uses the aesenclast, then affine transformation, which is combined AES inverse affine and ARIA S2. To calculate the last s-box, it uses the aesdeclast, then affine transformation, which is combined X2 and AES forward affine. The optimized third and last s-box logic and GFNI s-box logic are implemented by Jussi Kivilinna. The aria-generic implementation is based on a 32-bit implementation, not an 8-bit implementation. the aria-avx Diffusion Layer implementation is based on aria-generic implementation because 8-bit implementation is not fit for parallel implementation but 32-bit is enough to fit for this. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>