summaryrefslogtreecommitdiff
path: root/arch/arm/crypto
AgeCommit message (Collapse)AuthorFilesLines
2024-02-24crypto: arm/sha - fix function cast warningsArnd Bergmann2-15/+10
clang-16 warns about casting between incompatible function types: arch/arm/crypto/sha256_glue.c:37:5: error: cast from 'void (*)(u32 *, const void *, unsigned int)' (aka 'void (*)(unsigned int *, const void *, unsigned int)') to 'sha256_block_fn *' (aka 'void (*)(struct sha256_state *, const unsigned char *, int)') converts to incompatible function type [-Werror,-Wcast-function-type-strict] 37 | (sha256_block_fn *)sha256_block_data_order); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/arm/crypto/sha512-glue.c:34:3: error: cast from 'void (*)(u64 *, const u8 *, int)' (aka 'void (*)(unsigned long long *, const unsigned char *, int)') to 'sha512_block_fn *' (aka 'void (*)(struct sha512_state *, const unsigned char *, int)') converts to incompatible function type [-Werror,-Wcast-function-type-strict] 34 | (sha512_block_fn *)sha512_block_data_order); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix the prototypes for the assembler functions to match the typedef. The code already relies on the digest being the first part of the state structure, so there is no change in behavior. Fixes: c80ae7ca3726 ("crypto: arm/sha512 - accelerated SHA-512 using ARM generic ASM and NEON") Fixes: b59e2ae3690c ("crypto: arm/sha256 - move SHA-224/256 ASM/NEON implementation to base layer") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20crypto: arm/nhpoly1305 - implement ->digestEric Biggers1-0/+9
Implement the ->digest method to improve performance on single-page messages by reducing the number of indirect calls. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-05-12crypto: arm/sha512-neon - Fix clang function cast warningsHerbert Xu1-7/+5
Instead of casting the function which upsets clang for some reason, change the assembly function siganture instead. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202304081828.zjGcFUyE-lkp@intel.com/ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-05-12crypto: arm/sha256-neon - Fix clang function cast warningsHerbert Xu1-7/+5
Instead of casting the function which upsets clang for some reason, change the assembly function siganture instead. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202304081828.zjGcFUyE-lkp@intel.com/ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-05-12crypto: arm/sha1-neon - Fix clang function cast warningsHerbert Xu1-7/+5
Instead of casting the function which upsets clang for some reason, change the assembly function siganture instead. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202304081828.zjGcFUyE-lkp@intel.com/ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-02-22Merge tag 'v6.3-p1' of ↵Linus Torvalds1-8/+6
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Use kmap_local instead of kmap_atomic - Change request callback to take void pointer - Print FIPS status in /proc/crypto (when enabled) Algorithms: - Add rfc4106/gcm support on arm64 - Add ARIA AVX2/512 support on x86 Drivers: - Add TRNG driver for StarFive SoC - Delete ux500/hash driver (subsumed by stm32/hash) - Add zlib support in qat - Add RSA support in aspeed" * tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (156 commits) crypto: x86/aria-avx - Do not use avx2 instructions crypto: aspeed - Fix modular aspeed-acry crypto: hisilicon/qm - fix coding style issues crypto: hisilicon/qm - update comments to match function crypto: hisilicon/qm - change function names crypto: hisilicon/qm - use min() instead of min_t() crypto: hisilicon/qm - remove some unused defines crypto: proc - Print fips status crypto: crypto4xx - Call dma_unmap_page when done crypto: octeontx2 - Fix objects shared between several modules crypto: nx - Fix sparse warnings crypto: ecc - Silence sparse warning tls: Pass rec instead of aead_req into tls_encrypt_done crypto: api - Remove completion function scaffolding tls: Remove completion function scaffolding tipc: Remove completion function scaffolding net: ipv6: Remove completion function scaffolding net: ipv4: Remove completion function scaffolding net: macsec: Remove completion function scaffolding dm: Remove completion function scaffolding ...
2023-02-22Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds3-17/+790
Pull ARM udpates from Russell King: - Improve Kconfig help text for Cortex A8 and Cortex A9 errata - Kconfig spelling and grammar fixes - Allow kernel-mode VFP/Neon in softirq context - Use Neon in softirq context - Implement AES-CTR/GHASH version of GCM * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 9289/1: Allow pre-ARMv5 builds with ld.lld 16.0.0 and newer ARM: 9288/1: Kconfigs: fix spelling & grammar ARM: 9286/1: crypto: Implement fused AES-CTR/GHASH version of GCM ARM: 9285/1: remove meaningless arch/arm/mach-rda/Makefile ARM: 9283/1: permit non-nested kernel mode NEON in softirq context ARM: 9282/1: vfp: Manipulate task VFP state with softirqs disabled ARM: 9281/1: improve Cortex A8/A9 errata help text
2023-01-23ARM: 9287/1: Reduce __thumb2__ definition to crypto files that require itNathan Chancellor1-1/+6
Commit 1d2e9b67b001 ("ARM: 9265/1: pass -march= only to compiler") added a __thumb2__ define to ASFLAGS to avoid build errors in the crypto code, which relies on __thumb2__ for preprocessing. Commit 59e2cf8d21e0 ("ARM: 9275/1: Drop '-mthumb' from AFLAGS_ISA") followed up on this by removing -mthumb from AFLAGS so that __thumb2__ would not be defined when the default target was ARMv7 or newer. Unfortunately, the second commit's fix assumes that the toolchain defaults to -mno-thumb / -marm, which is not the case for Debian's arm-linux-gnueabihf target, which defaults to -mthumb: $ echo | arm-linux-gnueabihf-gcc -dM -E - | grep __thumb #define __thumb2__ 1 #define __thumb__ 1 This target is used by several CI systems, which will still see redefined macro warnings, despite '-mthumb' not being present in the flags: <command-line>: warning: "__thumb2__" redefined <built-in>: note: this is the location of the previous definition Remove the global AFLAGS __thumb2__ define and move it to the crypto folder where it is required by the imported OpenSSL algorithms; the rest of the kernel should use the internal CONFIG_THUMB2_KERNEL symbol to know whether or not Thumb2 is being used or not. Be sure that __thumb2__ is undefined first so that there are no macro redefinition warnings. Link: https://github.com/ClangBuiltLinux/linux/issues/1772 Reported-by: "kernelci.org bot" <bot@kernelci.org> Suggested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Fixes: 59e2cf8d21e0 ("ARM: 9275/1: Drop '-mthumb' from AFLAGS_ISA") Fixes: 1d2e9b67b001 ("ARM: 9265/1: pass -march= only to compiler") Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2023-01-18ARM: 9286/1: crypto: Implement fused AES-CTR/GHASH version of GCMArd Biesheuvel3-17/+790
On 32-bit ARM, AES in GCM mode takes full advantage of the ARMv8 Crypto Extensions when available, resulting in a performance of 6-7 cycles per byte for typical IPsec frames on cores such as Cortex-A53, using the generic GCM template encapsulating the accelerated AES-CTR and GHASH implementations. At such high rates, any time spent copying data or doing other poorly optimized work in the generic layer hurts disproportionately, and we can get a significant performance improvement by combining the optimized AES-CTR and GHASH implementations into a single GCM driver. On Cortex-A53, this results in a performance improvement of around 75%, and AES-256-GCM-128 with RFC4106 encapsulation runs in 4 cycles per byte. Note that this code takes advantage of the fact that kernel mode NEON is now supported in softirq context as well, and therefore does not provide a non-NEON fallback path at all. (AEADs are only callable in process or softirq context) Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-12-30crypto: arm/sha1 - Fix clang function cast warningsHerbert Xu1-8/+6
Instead of casting the function which upsets clang for some reason, change the assembly function siganture instead. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-12-02crypto: Prepare to move crypto_tfm_ctxHerbert Xu1-1/+1
The helper crypto_tfm_ctx is only used by the Crypto API algorithm code and should really be in algapi.h. However, for historical reasons many files relied on it to be in crypto.h. This patch changes those files to use algapi.h instead in prepartion for a move. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-25crypto: arm/nhpoly1305 - eliminate unnecessary CFI wrapperEric Biggers2-10/+3
The arm architecture doesn't support CFI yet, and even if it did, the new CFI implementation supports indirect calls to assembly functions. Therefore, there's no need to use a wrapper function for nh_neon(). Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-11-11crypto: move gf128mul library into lib/cryptoArd Biesheuvel1-1/+1
The gf128mul library does not depend on the crypto API at all, so it can be moved into lib/crypto. This will allow us to use it in other library code in a subsequent patch without having to depend on CONFIG_CRYPTO. While at it, change the Kconfig symbol name to align with other crypto library implementations. However, the source file name is retained, as it is reflected in the module .ko filename, and changing this might break things for users. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-08-26crypto: Kconfig - simplify cipher entriesRobert Elliott1-9/+31
Shorten menu titles and make them consistent: - acronym - name - architecture features in parenthesis - no suffixes like "<something> algorithm", "support", or "hardware acceleration", or "optimized" Simplify help text descriptions, update references, and ensure that https references are still valid. Signed-off-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-08-26crypto: Kconfig - simplify hash entriesRobert Elliott1-29/+65
Shorten menu titles and make them consistent: - acronym - name - architecture features in parenthesis - no suffixes like "<something> algorithm", "support", or "hardware acceleration", or "optimized" Simplify help text descriptions, update references, and ensure that https references are still valid. Signed-off-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-08-26crypto: Kconfig - simplify CRC entriesRobert Elliott1-2/+15
Shorten menu titles and make them consistent: - acronym - name - architecture features in parenthesis - no suffixes like "<something> algorithm", "support", or "hardware acceleration", or "optimized" Simplify help text descriptions, update references, and ensure that https references are still valid. Signed-off-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-08-26crypto: Kconfig - simplify public-key entriesRobert Elliott1-1/+6
Shorten menu titles and make them consistent: - acronym - name - architecture features in parenthesis - no suffixes like "<something> algorithm", "support", or "hardware acceleration", or "optimized" Simplify help text descriptions, update references, and ensure that https references are still valid. Signed-off-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-08-26crypto: Kconfig - sort the arm entriesRobert Elliott1-55/+56
Sort the arm entries so all like entries are together. Signed-off-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-08-26crypto: Kconfig - submenus for arm and arm64Robert Elliott1-9/+2
Move ARM- and ARM64-accelerated menus into a submenu under the Crypto API menu (paralleling all the architectures). Make each submenu always appear if the corresponding architecture is supported. Get rid of the ARM_CRYPTO and ARM64_CRYPTO symbols. The "ARM Accelerated" or "ARM64 Accelerated" entry disappears from: General setup ---> Platform selection ---> Kernel Features ---> Boot options ---> Power management options ---> CPU Power Management ---> [*] ACPI (Advanced Configuration and Power Interface) Support ---> [*] Virtualization ---> [*] ARM Accelerated Cryptographic Algorithms ---> (or) [*] ARM64 Accelerated Cryptographic Algorithms ---> ... -*- Cryptographic API ---> Library routines ---> Kernel hacking ---> and moves into the Cryptographic API menu, which now contains: ... Accelerated Cryptographic Algorithms for CPU (arm) ---> (or) Accelerated Cryptographic Algorithms for CPU (arm64) ---> [*] Hardware crypto devices ---> ... Suggested-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-06-10crypto: blake2s - remove shash moduleJason A. Donenfeld3-79/+2
BLAKE2s has no currently known use as an shash. Just remove all of this unnecessary plumbing. Removing this shash was something we talked about back when we were making BLAKE2s a built-in, but I simply never got around to doing it. So this completes that project. Importantly, this fixs a bug in which the lib code depends on crypto_simd_disabled_for_test, causing linker errors. Also add more alignment tests to the selftests and compare SIMD and non-SIMD compression functions, to make up for what we lose from testmgr.c. Reported-by: gaochao <gaochao49@huawei.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: stable@vger.kernel.org Fixes: 6048fdcc5f26 ("lib/crypto: blake2s: include as built-in") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-03-31Merge tag 'v5.18-p1' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: - Missing Kconfig dependency on arm that leads to boot failure - x86 SLS fixes - Reference leak in the stm32 driver * tag 'v5.18-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/sm3 - Fixup SLS crypto: x86/poly1305 - Fixup SLS crypto: x86/chacha20 - Avoid spurious jumps to other functions crypto: stm32 - fix reference leak in stm32_crc_remove crypto: arm/aes-neonbs-cbc - Select generic cbc and aes
2022-03-25crypto: arm/aes-neonbs-cbc - Select generic cbc and aesHerbert Xu1-0/+2
The algorithm __cbc-aes-neonbs requires a fallback so we need to select the config options for them or otherwise it will fail to register on boot-up. Fixes: 00b99ad2bac2 ("crypto: arm/aes-neonbs - Use generic cbc...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-03-22Merge branch 'linus' of ↵Linus Torvalds2-63/+77
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - hwrng core now credits for low-quality RNG devices. Algorithms: - Optimisations for neon aes on arm/arm64. - Add accelerated crc32_be on arm64. - Add ffdheXYZ(dh) templates. - Disallow hmac keys < 112 bits in FIPS mode. - Add AVX assembly implementation for sm3 on x86. Drivers: - Add missing local_bh_disable calls for crypto_engine callback. - Ensure BH is disabled in crypto_engine callback path. - Fix zero length DMA mappings in ccree. - Add synchronization between mailbox accesses in octeontx2. - Add Xilinx SHA3 driver. - Add support for the TDES IP available on sama7g5 SoC in atmel" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (137 commits) crypto: xilinx - Turn SHA into a tristate and allow COMPILE_TEST MAINTAINERS: update HPRE/SEC2/TRNG driver maintainers list crypto: dh - Remove the unused function dh_safe_prime_dh_alg() hwrng: nomadik - Change clk_disable to clk_disable_unprepare crypto: arm64 - cleanup comments crypto: qat - fix initialization of pfvf rts_map_msg structures crypto: qat - fix initialization of pfvf cap_msg structures crypto: qat - remove unneeded assignment crypto: qat - disable registration of algorithms crypto: hisilicon/qm - fix memset during queues clearing crypto: xilinx: prevent probing on non-xilinx hardware crypto: marvell/octeontx - Use swap() instead of open coding it crypto: ccree - Fix use after free in cc_cipher_exit() crypto: ccp - ccp_dmaengine_unregister release dma channels crypto: octeontx2 - fix missing unlock hwrng: cavium - fix NULL but dereferenced coccicheck error crypto: cavium/nitrox - don't cast parameter in bit operations crypto: vmx - add missing dependencies MAINTAINERS: Add maintainer for Xilinx ZynqMP SHA3 driver crypto: xilinx - Add Xilinx SHA3 driver ...
2022-02-05crypto: arm/aes-neonbs-ctr - deal with non-multiples of AES block sizeArd Biesheuvel2-63/+77
Instead of falling back to C code to deal with the final bit of input that is not a round multiple of the block size, handle this in the asm code, permitting us to use overlapping loads and stores for performance, and implement the 16-byte wide XOR using a single NEON instruction. Since NEON loads and stores have a natural width of 16 bytes, we need to handle inputs of less than 16 bytes in a special way, but this rarely occurs in practice so it does not impact performance. All other input sizes can be consumed directly by the NEON asm code, although it should be noted that the core AES transform can still only process 128 bytes (8 AES blocks) at a time. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-02-04lib/crypto: blake2s: avoid indirect calls to compression function for Clang CFIJason A. Donenfeld1-2/+2
blake2s_compress_generic is weakly aliased by blake2s_compress. The current harness for function selection uses a function pointer, which is ordinarily inlined and resolved at compile time. But when Clang's CFI is enabled, CFI still triggers when making an indirect call via a weak symbol. This seems like a bug in Clang's CFI, as though it's bucketing weak symbols and strong symbols differently. It also only seems to trigger when "full LTO" mode is used, rather than "thin LTO". [ 0.000000][ T0] Kernel panic - not syncing: CFI failure (target: blake2s_compress_generic+0x0/0x1444) [ 0.000000][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-mainline-06981-g076c855b846e #1 [ 0.000000][ T0] Hardware name: MT6873 (DT) [ 0.000000][ T0] Call trace: [ 0.000000][ T0] dump_backtrace+0xfc/0x1dc [ 0.000000][ T0] dump_stack_lvl+0xa8/0x11c [ 0.000000][ T0] panic+0x194/0x464 [ 0.000000][ T0] __cfi_check_fail+0x54/0x58 [ 0.000000][ T0] __cfi_slowpath_diag+0x354/0x4b0 [ 0.000000][ T0] blake2s_update+0x14c/0x178 [ 0.000000][ T0] _extract_entropy+0xf4/0x29c [ 0.000000][ T0] crng_initialize_primary+0x24/0x94 [ 0.000000][ T0] rand_initialize+0x2c/0x6c [ 0.000000][ T0] start_kernel+0x2f8/0x65c [ 0.000000][ T0] __primary_switched+0xc4/0x7be4 [ 0.000000][ T0] Rebooting in 5 seconds.. Nonetheless, the function pointer method isn't so terrific anyway, so this patch replaces it with a simple boolean, which also gets inlined away. This successfully works around the Clang bug. In general, I'm not too keen on all of the indirection involved here; it clearly does more harm than good. Hopefully the whole thing can get cleaned up down the road when lib/crypto is overhauled more comprehensively. But for now, we go with a simple bandaid. Fixes: 6048fdcc5f26 ("lib/crypto: blake2s: include as built-in") Link: https://github.com/ClangBuiltLinux/linux/issues/1567 Reported-by: Miles Chen <miles.chen@mediatek.com> Tested-by: Miles Chen <miles.chen@mediatek.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Tested-by: John Stultz <john.stultz@linaro.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-01-07lib/crypto: blake2s: include as built-inJason A. Donenfeld4-77/+83
In preparation for using blake2s in the RNG, we change the way that it is wired-in to the build system. Instead of using ifdefs to select the right symbol, we use weak symbols. And because ARM doesn't need the generic implementation, we make the generic one default only if an arch library doesn't need it already, and then have arch libraries that do need it opt-in. So that the arch libraries can remain tristate rather than bool, we then split the shash part from the glue code. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: linux-kbuild@vger.kernel.org Cc: linux-crypto@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-07-16crypto: arm/curve25519 - rename 'mod_init' & 'mod_exit' functions to be ↵Randy Dunlap1-4/+4
module-specific Rename module_init & module_exit functions that are named "mod_init" and "mod_exit" so that they are unique in both the System.map file and in initcall_debug output instead of showing up as almost anonymous "mod_init". This is helpful for debugging and in determining how long certain module_init calls take to execute. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: linux-arm-kernel@lists.infradead.org Cc: patches@armlinux.org.uk Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-05-14crypto: arm - use a pattern rule for generating *.S filesMasahiro Yamada1-7/+1
Unify similar build rules. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-05-14crypto: arm - generate *.S by Perl at build time instead of shipping themMasahiro Yamada4-5848/+3
Generate *.S by Perl like arch/{mips,x86}/crypto/Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-04-16crypto: arm/curve25519 - Move '.fpu' after '.arch'Nathan Chancellor1-1/+1
Debian's clang carries a patch that makes the default FPU mode 'vfp3-d16' instead of 'neon' for 'armv7-a' to avoid generating NEON instructions on hardware that does not support them: https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/raw/5a61ca6f21b4ad8c6ac4970e5ea5a7b5b4486d22/debian/patches/clang-arm-default-vfp3-on-armv7a.patch https://bugs.debian.org/841474 https://bugs.debian.org/842142 https://bugs.debian.org/914268 This results in the following build error when clang's integrated assembler is used because the '.arch' directive overrides the '.fpu' directive: arch/arm/crypto/curve25519-core.S:25:2: error: instruction requires: NEON vmov.i32 q0, #1 ^ arch/arm/crypto/curve25519-core.S:26:2: error: instruction requires: NEON vshr.u64 q1, q0, #7 ^ arch/arm/crypto/curve25519-core.S:27:2: error: instruction requires: NEON vshr.u64 q0, q0, #8 ^ arch/arm/crypto/curve25519-core.S:28:2: error: instruction requires: NEON vmov.i32 d4, #19 ^ Shuffle the order of the '.arch' and '.fpu' directives so that the code builds regardless of the default FPU mode. This has been tested against both clang with and without Debian's patch and GCC. Cc: stable@vger.kernel.org Fixes: d8f1308a025f ("crypto: arm/curve25519 - wire up NEON implementation") Link: https://github.com/ClangBuiltLinux/continuous-integration2/issues/118 Reported-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Jessica Clarke <jrtc27@jrtc27.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-04-02crypto: poly1305 - fix poly1305_core_setkey() declarationArnd Bergmann1-1/+1
gcc-11 points out a mismatch between the declaration and the definition of poly1305_core_setkey(): lib/crypto/poly1305-donna32.c:13:67: error: argument 2 of type ‘const u8[16]’ {aka ‘const unsigned char[16]’} with mismatched bound [-Werror=array-parameter=] 13 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 raw_key[16]) | ~~~~~~~~~^~~~~~~~~~~ In file included from lib/crypto/poly1305-donna32.c:11: include/crypto/internal/poly1305.h:21:68: note: previously declared as ‘const u8 *’ {aka ‘const unsigned char *’} 21 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 *raw_key); This is harmless in principle, as the calling conventions are the same, but the more specific prototype allows better type checking in the caller. Change the declaration to match the actual function definition. The poly1305_simd_init() is a bit suspicious here, as it previously had a 32-byte argument type, but looks like it needs to take the 16-byte POLY1305_BLOCK_SIZE array instead. Fixes: 1c08a104360f ("crypto: poly1305 - add new 32 and 64-bit generic versions") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-03-19crypto: arm/chacha-scalar - switch to common rev_l macroArd Biesheuvel1-30/+13
Drop the local definition of a byte swapping macro and use the common one instead. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nicolas Pitre <nico@fluxnic.net> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-03-19crypto: arm/aes-scalar - switch to common rev_l/mov_l macrosArd Biesheuvel1-32/+10
The scalar AES implementation has some locally defined macros which reimplement things that are now available in macros defined in assembler.h. So let's switch to those. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Nicolas Pitre <nico@fluxnic.net> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-03-19crypto: arm/blake2s - fix for big endianEric Biggers1-0/+21
The new ARM BLAKE2s code doesn't work correctly (fails the self-tests) in big endian kernel builds because it doesn't swap the endianness of the message words when loading them. Fix this. Fixes: 5172d322d34c ("crypto: arm/blake2s - add ARM scalar optimized BLAKE2s") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-03-07crypto: arm/blake2b - drop unnecessary return statementEric Biggers1-2/+2
Neither crypto_unregister_shashes() nor the module_exit function return a value, so the explicit 'return' is unnecessary. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-03crypto: arm/blake2b - add NEON-accelerated BLAKE2bEric Biggers4-0/+464
Add a NEON-accelerated implementation of BLAKE2b. On Cortex-A7 (which these days is the most common ARM processor that doesn't have the ARMv8 Crypto Extensions), this is over twice as fast as SHA-256, and slightly faster than SHA-1. It is also almost three times as fast as the generic implementation of BLAKE2b: Algorithm Cycles per byte (on 4096-byte messages) =================== ======================================= blake2b-256-neon 14.0 sha1-neon 16.3 blake2s-256-arm 18.8 sha1-asm 20.8 blake2s-256-generic 26.0 sha256-neon 28.9 sha256-asm 32.0 blake2b-256-generic 38.9 This implementation isn't directly based on any other implementation, but it borrows some ideas from previous NEON code I've written as well as from chacha-neon-core.S. At least on Cortex-A7, it is faster than the other NEON implementations of BLAKE2b I'm aware of (the implementation in the BLAKE2 official repository using intrinsics, and Andrew Moon's implementation which can be found in SUPERCOP). It does only one block at a time, so it performs well on short messages too. NEON-accelerated BLAKE2b is useful because there is interest in using BLAKE2b-256 for dm-verity on low-end Android devices (specifically, devices that lack the ARMv8 Crypto Extensions) to replace SHA-1. On these devices, the performance cost of upgrading to SHA-256 may be unacceptable, whereas BLAKE2b-256 would actually improve performance. Although BLAKE2b is intended for 64-bit platforms (unlike BLAKE2s which is intended for 32-bit platforms), on 32-bit ARM processors with NEON, BLAKE2b is actually faster than BLAKE2s. This is because NEON supports 64-bit operations, and because BLAKE2s's block size is too small for NEON to be helpful for it. The best I've been able to do with BLAKE2s on Cortex-A7 is 18.8 cpb with an optimized scalar implementation. (I didn't try BLAKE2sp and BLAKE3, which in theory would be faster, but they're more complex as they require running multiple hashes at once. Note that BLAKE2b already uses all the NEON bandwidth on the Cortex-A7, so I expect that any speedup from BLAKE2sp or BLAKE3 would come only from the smaller number of rounds, not from the extra parallelism.) For now this BLAKE2b implementation is only wired up to the shash API, since there is no library API for BLAKE2b yet. However, I've tried to keep things consistent with BLAKE2s, e.g. by defining blake2b_compress_arch() which is analogous to blake2s_compress_arch() and could be exported for use by the library API later if needed. Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-03crypto: arm/blake2s - add ARM scalar optimized BLAKE2sEric Biggers4-0/+374
Add an ARM scalar optimized implementation of BLAKE2s. NEON isn't very useful for BLAKE2s because the BLAKE2s block size is too small for NEON to help. Each NEON instruction would depend on the previous one, resulting in poor performance. With scalar instructions, on the other hand, we can take advantage of ARM's "free" rotations (like I did in chacha-scalar-core.S) to get an implementation get runs much faster than the C implementation. Performance results on Cortex-A7 in cycles per byte using the shash API: 4096-byte messages: blake2s-256-arm: 18.8 blake2s-256-generic: 26.0 500-byte messages: blake2s-256-arm: 20.3 blake2s-256-generic: 27.9 100-byte messages: blake2s-256-arm: 29.7 blake2s-256-generic: 39.2 32-byte messages: blake2s-256-arm: 50.6 blake2s-256-generic: 66.2 Except on very short messages, this is still slower than the NEON implementation of BLAKE2b which I've written; that is 14.0, 16.4, 25.8, and 76.1 cpb on 4096, 500, 100, and 32-byte messages, respectively. However, optimized BLAKE2s is useful for cases where BLAKE2s is used instead of BLAKE2b, such as WireGuard. This new implementation is added in the form of a new module blake2s-arm.ko, which is analogous to blake2s-x86_64.ko in that it provides blake2s_compress_arch() for use by the library API as well as optionally register the algorithms with the shash API. Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-03crypto: remove cipher routines from public crypto APIArd Biesheuvel1-0/+3
The cipher routines in the crypto API are mostly intended for templates implementing skcipher modes generically in software, and shouldn't be used outside of the crypto subsystem. So move the prototypes and all related definitions to a new header file under include/crypto/internal. Also, let's use the new module namespace feature to move the symbol exports into a new namespace CRYPTO_INTERNAL. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2021-01-03crypto: arm/chacha-neon - add missing counter incrementArd Biesheuvel1-0/+1
Commit 86cd97ec4b943af3 ("crypto: arm/chacha-neon - optimize for non-block size multiples") refactored the chacha block handling in the glue code in a way that may result in the counter increment to be omitted when calling chacha_block_xor_neon() to process a full block. This violates the skcipher API, which requires that the output IV is suitable for handling more input as long as the preceding input has been presented in round multiples of the block size. Also, the same code is exposed via the chacha library interface whose callers may actually rely on this increment to occur even for final blocks that are smaller than the chacha block size. So increment the counter after calling chacha_block_xor_neon(). Fixes: 86cd97ec4b943af3 ("crypto: arm/chacha-neon - optimize for non-block size multiples") Reported-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-12-04crypto: arm/aes-ce - work around Cortex-A57/A72 silion errataArd Biesheuvel1-10/+22
ARM Cortex-A57 and Cortex-A72 cores running in 32-bit mode are affected by silicon errata #1742098 and #1655431, respectively, where the second instruction of a AES instruction pair may execute twice if an interrupt is taken right after the first instruction consumes an input register of which a single 32-bit lane has been updated the last time it was modified. This is not such a rare occurrence as it may seem: in counter mode, only the least significant 32-bit word is incremented in the absence of a carry, which makes our counter mode implementation susceptible to these errata. So let's shuffle the counter assignments around a bit so that the most recent updates when the AES instruction pair executes are 128-bit wide. [0] ARM-EPM-049219 v23 Cortex-A57 MPCore Software Developers Errata Notice [1] ARM-EPM-012079 v11.0 Cortex-A72 MPCore Software Developers Errata Notice Cc: <stable@vger.kernel.org> # v5.4+ Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-20crypto: sha - split sha.h into sha1.h and sha2.hEric Biggers9-9/+9
Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2, and <crypto/sha3.h> contains declarations for SHA-3. This organization is inconsistent, but more importantly SHA-1 is no longer considered to be cryptographically secure. So to the extent possible, SHA-1 shouldn't be grouped together with any of the other SHA versions, and usage of it should be phased out. Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and <crypto/sha2.h>, and make everyone explicitly specify whether they want the declarations for SHA-1, SHA-2, or both. This avoids making the SHA-1 declarations visible to files that don't want anything to do with SHA-1. It also prepares for potentially moving sha1.h into a new insecure/ or dangerous/ directory. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-13crypto: arm/chacha-neon - optimize for non-block size multiplesArd Biesheuvel2-24/+107
The current NEON based ChaCha implementation for ARM is optimized for multiples of 4x the ChaCha block size (64 bytes). This makes sense for block encryption, but given that ChaCha is also often used in the context of networking, it makes sense to consider arbitrary length inputs as well. For example, WireGuard typically uses 1420 byte packets, and performing ChaCha encryption involves 5 invocations of chacha_4block_xor_neon() and 3 invocations of chacha_block_xor_neon(), where the last one also involves a memcpy() using a buffer on the stack to process the final chunk of 1420 % 64 == 12 bytes. Let's optimize for this case as well, by letting chacha_4block_xor_neon() deal with any input size between 64 and 256 bytes, using NEON permutation instructions and overlapping loads and stores. This way, the 140 byte tail of a 1420 byte input buffer can simply be processed in one go. This results in the following performance improvements for 1420 byte blocks, without significant impact on power-of-2 input sizes. (Note that Raspberry Pi is widely used in combination with a 32-bit kernel, even though the core is 64-bit capable) Cortex-A8 (BeagleBone) : 7% Cortex-A15 (Calxeda Midway) : 21% Cortex-A53 (Raspberry Pi 3) : 3% Cortex-A72 (Raspberry Pi 4) : 19% Cc: Eric Biggers <ebiggers@google.com> Cc: "Jason A . Donenfeld" <Jason@zx2c4.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-06crypto: arm/aes-neonbs - fix usage of cbc(aes) fallbackHoria Geantă1-3/+5
Loading the module deadlocks since: -local cbc(aes) implementation needs a fallback and -crypto API tries to find one but the request_module() resolves back to the same module Fix this by changing the module alias for cbc(aes) and using the NEED_FALLBACK flag when requesting for a fallback algorithm. Fixes: 00b99ad2bac2 ("crypto: arm/aes-neonbs - Use generic cbc encryption path") Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: arm/aes-neonbs - use typed init/exit routines for XTSArd Biesheuvel1-6/+6
Use the typed skcipher init/exit routines instead of the generic cra_init/_exit routines when instantiating/releasing the XTS skciphers. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: arm/aes-neonbs - avoid loading reorder argument on encryptionArd Biesheuvel1-2/+3
Reordering the tweak is never necessary for encryption, so avoid the argument load on the encryption path. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: arm/aes-neonbs - avoid hacks to prevent Thumb2 mode switchesArd Biesheuvel1-27/+22
Instead of using a homegrown macrofied version of the adr instruction that sets the Thumb bit in the output value, only to ensure that any bx instructions consuming that value will not switch out of Thumb mode when branching, use non-interworking mov (to PC) instructions, which achieve the same thing. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: arm/sha512-neon - avoid ADRL pseudo instructionArd Biesheuvel2-4/+4
The ADRL pseudo instruction is not an architectural construct, but a convenience macro that was supported by the ARM proprietary assembler and adopted by binutils GAS as well, but only when assembling in 32-bit ARM mode. Therefore, it can only be used in assembler code that is known to assemble in ARM mode only, but as it turns out, the Clang assembler does not implement ADRL at all, and so it is better to get rid of it entirely. So replace the ADRL instruction with a ADR instruction that refers to a nearer symbol, and apply the delta explicitly using an additional instruction. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-25crypto: arm/sha256-neon - avoid ADRL pseudo instructionArd Biesheuvel2-4/+4
The ADRL pseudo instruction is not an architectural construct, but a convenience macro that was supported by the ARM proprietary assembler and adopted by binutils GAS as well, but only when assembling in 32-bit ARM mode. Therefore, it can only be used in assembler code that is known to assemble in ARM mode only, but as it turns out, the Clang assembler does not implement ADRL at all, and so it is better to get rid of it entirely. So replace the ADRL instruction with a ADR instruction that refers to a nearer symbol, and apply the delta explicitly using an additional instruction. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-11crypto: arm/aes-neonbs - Use generic cbc encryption pathHerbert Xu1-18/+28
Since commit b56f5cbc7e08ec7d31c42fc41e5247677f20b143 ("crypto: arm/aes-neonbs - resolve fallback cipher at runtime") the CBC encryption path in aes-neonbs is now identical to that obtained through the cbc template. This means that it can simply call the generic cbc template instead of doing its own thing. This patch removes the custom encryption path and simply invokes the generic cbc template. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-09-04crypto: arm/poly1305 - Add prototype for poly1305_blocks_neonHerbert Xu1-0/+1
This patch adds a prototype for poly1305_blocks_neon to slience a compiler warning: CC [M] arch/arm/crypto/poly1305-glue.o ../arch/arm/crypto/poly1305-glue.c:25:13: warning: no previous prototype for `poly1305_blocks_neon' [-Wmissing-prototypes] void __weak poly1305_blocks_neon(void *state, const u8 *src, u32 len, u32 hibit) ^~~~~~~~~~~~~~~~~~~~ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>