summaryrefslogtreecommitdiff
path: root/arch/x86/crypto
AgeCommit message (Collapse)AuthorFilesLines
2018-07-08crypto: shash - remove useless setting of type flagsEric Biggers5-21/+1
Many shash algorithms set .cra_flags = CRYPTO_ALG_TYPE_SHASH. But this is redundant with the C structure type ('struct shash_alg'), and crypto_register_shash() already sets the type flag automatically, clearing any type flag that was already there. Apparently the useless assignment has just been copy+pasted around. So, remove the useless assignment from all the shash algorithms. This patch shouldn't change any actual behavior. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-07-08crypto: x86/sha-mb - decrease priority of multibuffer algorithmsEric Biggers3-3/+24
With all the crypto modules enabled on x86, and with a CPU that supports AVX-2 but not SHA-NI instructions (e.g. Haswell, Broadwell, Skylake), the "multibuffer" implementations of SHA-1, SHA-256, and SHA-512 are the highest priority. However, these implementations only perform well when many hash requests are being submitted concurrently, filling all 8 AVX-2 lanes. Otherwise, they are incredibly slow, as they waste time waiting for more requests to arrive before proceeding to execute each request. For example, here are the speeds I see hashing 4096-byte buffers with a single thread on a Haswell-based processor: generic avx2 mb (multibuffer) ------- -------- ---------------- sha1 602 MB/s 997 MB/s 0.61 MB/s sha256 228 MB/s 412 MB/s 0.61 MB/s sha512 312 MB/s 559 MB/s 0.61 MB/s So, the multibuffer implementation is 500 to 1000 times slower than the other implementations. Note that with smaller buffers or more update()s per digest, the difference would be even greater. I believe the vast majority of people are in the boat where the multibuffer code is much slower, and only a small minority are doing the highly parallel, hashing-intensive, latency-flexible workloads (maybe IPsec on servers?) where the multibuffer code may be beneficial. Yet, people often aren't familiar with all the crypto config options and so the multibuffer code may inadvertently be built into the kernel. Also the multibuffer code apparently hasn't been very well tested, seeing as it was sometimes computing the wrong SHA-256 digest. So, let's make the multibuffer algorithms low priority. Users who want to use them can either request them explicitly by driver name, or use NETLINK_CRYPTO (crypto_user) to increase their priority at runtime. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-07-08crypto: x86/sha256-mb - fix digest copy in sha256_mb_mgr_get_comp_job_avx2()Eric Biggers1-1/+1
There is a copy-paste error where sha256_mb_mgr_get_comp_job_avx2() copies the SHA-256 digest state from sha256_mb_mgr::args::digest to job_sha256::result_digest. Consequently, the sha256_mb algorithm sometimes calculates the wrong digest. Fix it. Reproducer using AF_ALG: #include <assert.h> #include <linux/if_alg.h> #include <stdio.h> #include <string.h> #include <sys/socket.h> #include <unistd.h> static const __u8 expected[32] = "\xad\x7f\xac\xb2\x58\x6f\xc6\xe9\x66\xc0\x04\xd7\xd1\xd1\x6b\x02" "\x4f\x58\x05\xff\x7c\xb4\x7c\x7a\x85\xda\xbd\x8b\x48\x89\x2c\xa7"; int main() { int fd; struct sockaddr_alg addr = { .salg_type = "hash", .salg_name = "sha256_mb", }; __u8 data[4096] = { 0 }; __u8 digest[32]; int ret; int i; fd = socket(AF_ALG, SOCK_SEQPACKET, 0); bind(fd, (void *)&addr, sizeof(addr)); fork(); fd = accept(fd, 0, 0); do { ret = write(fd, data, 4096); assert(ret == 4096); ret = read(fd, digest, 32); assert(ret == 32); } while (memcmp(digest, expected, 32) == 0); printf("wrong digest: "); for (i = 0; i < 32; i++) printf("%02x", digest[i]); printf("\n"); } Output was: wrong digest: ad7facb2000000000000000000000000ffffffef7cb47c7a85dabd8b48892ca7 Fixes: 172b1d6b5a93 ("crypto: sha256-mb - fix ctx pointer and digest copy") Cc: <stable@vger.kernel.org> # v4.8+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-30crypto: x86/salsa20 - remove x86 salsa20 implementationsEric Biggers4-1838/+0
The x86 assembly implementations of Salsa20 use the frame base pointer register (%ebp or %rbp), which breaks frame pointer convention and breaks stack traces when unwinding from an interrupt in the crypto code. Recent (v4.10+) kernels will warn about this, e.g. WARNING: kernel stack regs at 00000000a8291e69 in syzkaller047086:4677 has bad 'bp' value 000000001077994c [...] But after looking into it, I believe there's very little reason to still retain the x86 Salsa20 code. First, these are *not* vectorized (SSE2/SSSE3/AVX2) implementations, which would be needed to get anywhere close to the best Salsa20 performance on any remotely modern x86 processor; they're just regular x86 assembly. Second, it's still unclear that anyone is actually using the kernel's Salsa20 at all, especially given that now ChaCha20 is supported too, and with much more efficient SSSE3 and AVX2 implementations. Finally, in benchmarks I did on both Intel and AMD processors with both gcc 8.1.0 and gcc 4.9.4, the x86_64 salsa20-asm is actually slightly *slower* than salsa20-generic (~3% slower on Skylake, ~10% slower on Zen), while the i686 salsa20-asm is only slightly faster than salsa20-generic (~15% faster on Skylake, ~20% faster on Zen). The gcc version made little difference. So, the x86_64 salsa20-asm is pretty clearly useless. That leaves just the i686 salsa20-asm, which based on my tests provides a 15-20% speed boost. But that's without updating the code to not use %ebp. And given the maintenance cost, the small speed difference vs. salsa20-generic, the fact that few people still use i686 kernels, the doubt that anyone is even using the kernel's Salsa20 at all, and the fact that a SSE2 implementation would almost certainly be much faster on any remotely modern x86 processor yet no one has cared enough to add one yet, I don't think it's worthwhile to keep. Thus, just remove both the x86_64 and i686 salsa20-asm implementations. Reported-by: syzbot+ffa3a158337bbc01ff09@syzkaller.appspotmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-30crypto: morus - Mark MORUS SIMD glue as x86-specificOndrej Mosnacek3-0/+603
Commit 56e8e57fc3a7 ("crypto: morus - Add common SIMD glue code for MORUS") accidetally consiedered the glue code to be usable by different architectures, but it seems to be only usable on x86. This patch moves it under arch/x86/crypto and adds 'depends on X86' to the Kconfig options and also removes the prompt to hide these internal options from the user. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-26crypto: x86/aegis256 - Fix wrong key buffer sizeOndrej Mosnacek1-3/+3
AEGIS-256 key is two blocks, not one. Fixes: 1d373d4e8e15 ("crypto: x86 - Add optimized AEGIS implementations") Reported-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-18crypto: x86 - Add optimized MORUS implementationsOndrej Mosnacek7-0/+2344
This patch adds optimized implementations of MORUS-640 and MORUS-1280, utilizing the SSE2 and AVX2 x86 extensions. For MORUS-1280 (which operates on 256-bit blocks) we provide both AVX2 and SSE2 implementation. Although SSE2 MORUS-1280 is slower than AVX2 MORUS-1280, it is comparable in speed to the SSE2 MORUS-640. Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-18crypto: x86 - Add optimized AEGIS implementationsOndrej Mosnacek7-0/+3505
This patch adds optimized implementations of AEGIS-128, AEGIS-128L, and AEGIS-256, utilizing the AES-NI and SSE2 x86 extensions. Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-05crypto: ghash-clmulni - fix spelling mistake: "acclerated" -> "accelerated"Colin Ian King1-1/+1
Trivial fix to spelling mistake in module description text Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-09crypto: x86/des3_ede - des3_ede_skciphers[] can be staticWu Fengguang1-1/+1
Fixes: 09c0f03bf8ce ("crypto: x86/des3_ede - convert to skcipher interface") Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/glue_helper - rename glue_skwalk_fpu_begin()Eric Biggers2-14/+11
There are no users of the original glue_fpu_begin() anymore, so rename glue_skwalk_fpu_begin() to glue_fpu_begin() so that it matches glue_fpu_end() again. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/glue_helper - remove blkcipher_walk functionsEric Biggers1-344/+0
Now that all glue_helper users have been switched from the blkcipher interface over to the skcipher interface, remove the versions of the glue_helper functions that handled the blkcipher interface. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: lrw - remove lrw_crypt()Eric Biggers1-1/+0
Now that all users of lrw_crypt() have been removed in favor of the LRW template wrapping an ECB mode algorithm, remove lrw_crypt(). Also remove crypto/lrw.h as that is no longer needed either; and fold 'struct lrw_table_ctx' into 'struct priv', lrw_init_table() into setkey(), and lrw_free_table() into exit_tfm(). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/camellia-aesni-avx, avx2 - convert to skcipher interfaceEric Biggers2-432/+202
Convert the AESNI AVX and AESNI AVX2 implementations of Camellia from the (deprecated) ablkcipher and blkcipher interfaces over to the skcipher interface. Note that this includes replacing the use of ablk_helper with crypto_simd. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/camellia - convert to skcipher interfaceEric Biggers1-83/+79
Convert the x86 asm implementation of Camellia from the (deprecated) blkcipher interface over to the skcipher interface. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/camellia - remove XTS algorithmEric Biggers2-111/+22
The XTS template now wraps an ECB mode algorithm rather than the block cipher directly. Therefore it is now redundant for crypto modules to wrap their ECB code with generic XTS code themselves via xts_crypt(). Remove the xts-camellia-asm algorithm which did this. Users who request xts(camellia) and previously would have gotten xts-camellia-asm will now get xts(ecb-camellia-asm) instead, which is just as fast. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/camellia - remove LRW algorithmEric Biggers1-84/+1
The LRW template now wraps an ECB mode algorithm rather than the block cipher directly. Therefore it is now redundant for crypto modules to wrap their ECB code with generic LRW code themselves via lrw_crypt(). Remove the lrw-camellia-asm algorithm which did this. Users who request lrw(camellia) and previously would have gotten lrw-camellia-asm will now get lrw(ecb-camellia-asm) instead, which is just as fast. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/camellia-aesni-avx2 - remove LRW algorithmEric Biggers1-179/+1
The LRW template now wraps an ECB mode algorithm rather than the block cipher directly. Therefore it is now redundant for crypto modules to wrap their ECB code with generic LRW code themselves via lrw_crypt(). Remove the lrw-camellia-aesni-avx2 algorithm which did this. Users who request lrw(camellia) and previously would have gotten lrw-camellia-aesni-avx2 will now get lrw(ecb-camellia-aesni-avx2) instead, which is just as fast. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/camellia-aesni-avx - remove LRW algorithmEric Biggers1-167/+1
The LRW template now wraps an ECB mode algorithm rather than the block cipher directly. Therefore it is now redundant for crypto modules to wrap their ECB code with generic LRW code themselves via lrw_crypt(). Remove the lrw-camellia-aesni algorithm which did this. Users who request lrw(camellia) and previously would have gotten lrw-camellia-aesni will now get lrw(ecb-camellia-aesni) instead, which is just as fast. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/des3_ede - convert to skcipher interfaceEric Biggers1-119/+119
Convert the x86 asm implementation of Triple DES from the (deprecated) blkcipher interface over to the skcipher interface. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/blowfish: convert to skcipher interfaceEric Biggers1-117/+113
Convert the x86 asm implementation of Blowfish from the (deprecated) blkcipher interface over to the skcipher interface. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/cast6-avx - convert to skcipher interfaceEric Biggers1-211/+100
Convert the AVX implementation of CAST6 from the (deprecated) ablkcipher and blkcipher interfaces over to the skcipher interface. Note that this includes replacing the use of ablk_helper with crypto_simd. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/cast6-avx - remove LRW algorithmEric Biggers1-179/+1
The LRW template now wraps an ECB mode algorithm rather than the block cipher directly. Therefore it is now redundant for crypto modules to wrap their ECB code with generic LRW code themselves via lrw_crypt(). Remove the lrw-cast6-avx algorithm which did this. Users who request lrw(cast6) and previously would have gotten lrw-cast6-avx will now get lrw(ecb-cast6-avx) instead, which is just as fast. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/cast5-avx - convert to skcipher interfaceEric Biggers1-224/+127
Convert the AVX implementation of CAST5 from the (deprecated) ablkcipher and blkcipher interfaces over to the skcipher interface. Note that this includes replacing the use of ablk_helper with crypto_simd. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/cast5-avx - fix ECB encryption when long sg follows short oneEric Biggers1-2/+1
With ecb-cast5-avx, if a 128+ byte scatterlist element followed a shorter one, then the algorithm accidentally encrypted/decrypted only 8 bytes instead of the expected 128 bytes. Fix it by setting the encryption/decryption 'fn' correctly. Fixes: c12ab20b162c ("crypto: cast5/avx - avoid using temporary stack buffers") Cc: <stable@vger.kernel.org> # v3.8+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/twofish-avx - convert to skcipher interfaceEric Biggers1-215/+100
Convert the AVX implementation of Twofish from the (deprecated) ablkcipher and blkcipher interfaces over to the skcipher interface. Note that this includes replacing the use of ablk_helper with crypto_simd. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/twofish-avx - remove LRW algorithmEric Biggers1-188/+1
The LRW template now wraps an ECB mode algorithm rather than the block cipher directly. Therefore it is now redundant for crypto modules to wrap their ECB code with generic LRW code themselves via lrw_crypt(). Remove the lrw-twofish-avx algorithm which did this. Users who request lrw(twofish) and previously would have gotten lrw-twofish-avx will now get lrw(ecb-twofish-avx) instead, which is just as fast. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/twofish-3way - convert to skcipher interfaceEric Biggers1-84/+67
Convert the 3-way implementation of Twofish from the (deprecated) blkcipher interface over to the skcipher interface. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/twofish-3way - remove XTS algorithmEric Biggers2-109/+25
The XTS template now wraps an ECB mode algorithm rather than the block cipher directly. Therefore it is now redundant for crypto modules to wrap their ECB code with generic XTS code themselves via xts_crypt(). Remove the xts-twofish-3way algorithm which did this. Users who request xts(twofish) and previously would have gotten xts-twofish-3way will now get xts(ecb-twofish-3way) instead, which is just as fast. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/twofish-3way - remove LRW algorithmEric Biggers2-80/+27
The LRW template now wraps an ECB mode algorithm rather than the block cipher directly. Therefore it is now redundant for crypto modules to wrap their ECB code with generic LRW code themselves via lrw_crypt(). Remove the lrw-twofish-3way algorithm which did this. Users who request lrw(twofish) and previously would have gotten lrw-twofish-3way will now get lrw(ecb-twofish-3way) instead, which is just as fast. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/serpent-avx,avx2 - convert to skcipher interfaceEric Biggers2-435/+212
Convert the AVX and AVX2 implementations of Serpent from the (deprecated) ablkcipher and blkcipher interfaces over to the skcipher interface. Note that this includes replacing the use of ablk_helper with crypto_simd. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/serpent-avx - remove LRW algorithmEric Biggers1-176/+1
The LRW template now wraps an ECB mode algorithm rather than the block cipher directly. Therefore it is now redundant for crypto modules to wrap their ECB code with generic LRW code themselves via lrw_crypt(). Remove the lrw-serpent-avx algorithm which did this. Users who request lrw(serpent) and previously would have gotten lrw-serpent-avx will now get lrw(ecb-serpent-avx) instead, which is just as fast. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/serpent-avx2 - remove LRW algorithmEric Biggers1-175/+1
The LRW template now wraps an ECB mode algorithm rather than the block cipher directly. Therefore it is now redundant for crypto modules to wrap their ECB code with generic LRW code themselves via lrw_crypt(). Remove the lrw-serpent-avx2 algorithm which did this. Users who request lrw(serpent) and previously would have gotten lrw-serpent-avx2 will now get lrw(ecb-serpent-avx2) instead, which is just as fast. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/serpent-sse2 - convert to skcipher interfaceEric Biggers1-151/+70
Convert the SSE2 implementation of Serpent from the (deprecated) ablkcipher and blkcipher interfaces over to the skcipher interface. Note that this includes replacing the use of ablk_helper with crypto_simd. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/serpent-sse2 - remove XTS algorithmEric Biggers1-172/+0
The XTS template now wraps an ECB mode algorithm rather than the block cipher directly. Therefore it is now redundant for crypto modules to wrap their ECB code with generic XTS code themselves via xts_crypt(). Remove the xts-serpent-sse2 algorithm which did this. Users who request xts(serpent) and previously would have gotten xts-serpent-sse2 will now get xts(ecb-serpent-sse2) instead, which is just as fast. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/serpent-sse2 - remove LRW algorithmEric Biggers1-129/+1
The LRW template now wraps an ECB mode algorithm rather than the block cipher directly. Therefore it is now redundant for crypto modules to wrap their ECB code with generic LRW code themselves via lrw_crypt(). Remove the lrw-serpent-sse2 algorithm which did this. Users who request lrw(serpent) and previously would have gotten lrw-serpent-sse2 will now get lrw(ecb-serpent-sse2) instead, which is just as fast. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-02crypto: x86/glue_helper - add skcipher_walk functionsEric Biggers1-0/+207
Add ECB, CBC, and CTR functions to glue_helper which use skcipher_walk rather than blkcipher_walk. This will allow converting the remaining x86 algorithms from the blkcipher interface over to the skcipher interface, after which we'll be able to remove the blkcipher_walk versions of these functions. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Update aesni-intel_glue to use scatter/gatherDave Watson1-0/+133
Add gcmaes_crypt_by_sg routine, that will do scatter/gather by sg. Either src or dst may contain multiple buffers, so iterate over both at the same time if they are different. If the input is the same as the output, iterate only over one. Currently both the AAD and TAG must be linear, so copy them out with scatterlist_map_and_copy. If first buffer contains the entire AAD, we can optimize and not copy. Since the AAD can be any size, if copied it must be on the heap. TAG can be on the stack since it is always < 16 bytes. Only the SSE routines are updated so far, so leave the previous gcmaes_en/decrypt routines, and branch to the sg ones if the keysize is inappropriate for avx, or we are SSE only. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Introduce scatter/gather asm function stubsDave Watson2-26/+106
The asm macros are all set up now, introduce entry points. GCM_INIT and GCM_COMPLETE have arguments supplied, so that the new scatter/gather entry points don't have to take all the arguments, and only the ones they need. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Add fast path for > 16 byte updateDave Watson1-0/+25
We can fast-path any < 16 byte read if the full message is > 16 bytes, and shift over by the appropriate amount. Usually we are reading > 16 bytes, so this should be faster than the READ_PARTIAL macro introduced in b20209c91e2 for the average case. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Introduce partial block macroDave Watson1-1/+150
Before this diff, multiple calls to GCM_ENC_DEC will succeed, but only if all calls are a multiple of 16 bytes. Handle partial blocks at the start of GCM_ENC_DEC, and update aadhash as appropriate. The data offset %r11 is also updated after the partial block. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Move HashKey computation from stack to gcm_contextDave Watson1-99/+106
HashKey computation only needs to happen once per scatter/gather operation, save it between calls in gcm_context struct instead of on the stack. Since the asm no longer stores anything on the stack, we can use %rsp directly, and clean up the frame save/restore macros a bit. Hashkeys actually only need to be calculated once per key and could be moved to when set_key is called, however, the current glue code falls back to generic aes code if fpu is disabled. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Move ghash_mul to GCM_COMPLETEDave Watson1-1/+9
Prepare to handle partial blocks between scatter/gather calls. For the last partial block, we only want to calculate the aadhash in GCM_COMPLETE, and a new partial block macro will handle both aadhash update and encrypting partial blocks between calls. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Fill in new context data structuresDave Watson1-12/+39
Fill in aadhash, aadlen, pblocklen, curcount with appropriate values. pblocklen, aadhash, and pblockenckey are also updated at the end of each scatter/gather operation, to be carried over to the next operation. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Split AAD hash calculation to separate macroDave Watson1-28/+43
AAD hash only needs to be calculated once for each scatter/gather operation. Move it to its own macro, and call it from GCM_INIT instead of INITIAL_BLOCKS. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Introduce gcm_context_dataDave Watson2-75/+121
Introduce a gcm_context_data struct that will be used to pass context data between scatter/gather update calls. It is passed as the second argument (after crypto keys), other args are renumbered. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Merge encode and decode to GCM_ENC_DEC macroDave Watson1-179/+114
Make a macro for the main encode/decode routine. Only a small handful of lines differ for enc and dec. This will also become the main scatter/gather update routine. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Add GCM_COMPLETE macroDave Watson1-109/+63
Merge encode and decode tag calculations in GCM_COMPLETE macro. Scatter/gather routines will call this once at the end of encryption or decryption. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Add GCM_INIT macroDave Watson1-51/+33
Reduce code duplication by introducting GCM_INIT macro. This macro will also be exposed as a function for implementing scatter/gather support, since INIT only needs to be called once for the full operation. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22crypto: aesni - Macro-ify func save/restoreDave Watson1-29/+24
Macro-ify function save and restore. These will be used in new functions added for scatter/gather update operations. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>