2024-05-17  crypto: ecc - Prevent ecc_digits_from_bytes from reading too many bytes  (Stefan Berger; 2 files, -13/+24)
Prevent ecc_digits_from_bytes from reading too many bytes from the input byte array when an insufficient number of bytes is provided to fill the output array of ndigits digits. To that end, initialize the most significant digits with 0 so that no attempt is made to read too many bytes later on. Convert the function into a regular function, since it is getting too big for an inline function. If too many bytes are provided in the input byte array, the extra bytes are ignored, since the input variable 'ndigits' limits the number of digits that will be filled. Fixes: d67c96fb97b5 ("crypto: ecdsa - Convert byte arrays with key coordinates to digits") Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
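A minimal sketch of the idea described above, in plain C with illustrative names (not the kernel's exact implementation): zero-fill the output digits first, then consume at most ndigits * 8 of the big-endian input bytes and ignore any extras.

    #include <stdint.h>
    #include <string.h>

    /* Convert a big-endian byte string into ndigits 64-bit digits
     * (least significant digit first). Missing input bytes leave the
     * most significant digits at 0; extra input bytes are ignored. */
    static void digits_from_bytes(const uint8_t *in, size_t nbytes,
                                  uint64_t *out, size_t ndigits)
    {
            size_t i;

            memset(out, 0, ndigits * sizeof(*out));
            if (nbytes > ndigits * sizeof(*out))
                    nbytes = ndigits * sizeof(*out);

            /* in[nbytes - 1] is the least significant byte */
            for (i = 0; i < nbytes; i++)
                    out[i / 8] |= (uint64_t)in[nbytes - 1 - i] << (8 * (i % 8));
    }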
2024-05-17  crypto: qat - Fix ADF_DEV_RESET_SYNC memory leak  (Herbert Xu; 1 file, -14/+5)
Using completion_done to determine whether the caller has gone away only works after a call to complete. Furthermore, it is still possible that the caller has not yet called wait_for_completion, resulting in another potential UAF. Fix this by making the caller use cancel_work_sync and then freeing the memory safely. Fixes: 7d42e097607c ("crypto: qat - resolve race condition during AER recovery") Cc: <stable@vger.kernel.org> #6.8+ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
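A hedged sketch of the safe teardown pattern the fix describes (struct and function names here are illustrative, not the QAT driver's): cancel or wait for the work item before freeing the shared object, instead of inferring liveness from completion_done().

    #include <linux/workqueue.h>
    #include <linux/slab.h>

    struct reset_ctx {
            struct work_struct work;
            /* ... state shared with the work handler ... */
    };

    static void reset_ctx_teardown(struct reset_ctx *ctx)
    {
            /* After this returns, the handler has finished or will never run. */
            cancel_work_sync(&ctx->work);
            kfree(ctx);             /* no concurrent user can remain */
    }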
2024-05-10  crypto: atmel-sha204a - provide the otp content  (Lothar Rubusch; 1 file, -0/+45)
Set up sysfs for the Atmel SHA204a. Provide the content of the otp zone as an attribute field on the sysfs entry. Thereby make sure that the sysfs device is not set up if the chip is locked, not connected, or there is trouble with the i2c bus. This is mostly already handled in atmel-i2c. Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10  crypto: atmel-sha204a - add reading from otp zone  (Lothar Rubusch; 3 files, -0/+52)
Provide a read function for reading the otp zone. The otp zone can be used for storing serial numbers. The otp zone, like the data zone, is only accessible if the chip was locked before. Locking the chip is a post-production customization and has to be done manually, i.e. not by this driver. Without this step the chip is pretty much unusable, whereas putting data into the otp zone is optional. Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10  crypto: atmel-i2c - rename read function  (Lothar Rubusch; 2 files, -4/+4)
Make the memory read function name more specific to the memory zone being read. The Atmel SHA204 chips provide config, otp and data zones. The implemented read function in fact only reads some fields in the config zone. The renaming allows for a uniform naming scheme when reading from other memory zones. Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10  crypto: atmel-i2c - add missing arg description  (Lothar Rubusch; 1 file, -0/+1)
Add missing description for argument hwrng. Signed-off-by: Lothar Rubusch <l.rubusch@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10  crypto: iaa - Use kmemdup() instead of kzalloc() and memcpy()  (Thorsten Blum; 1 file, -4/+2)
Fixes the following two Coccinelle/coccicheck warnings reported by memdup.cocci:

    iaa_crypto_main.c:350:19-26: WARNING opportunity for kmemdup
    iaa_crypto_main.c:358:18-25: WARNING opportunity for kmemdup

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
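For context, the generic before/after shape of this Coccinelle transformation (variable and function names are illustrative, not the iaa driver's code):

    #include <linux/slab.h>
    #include <linux/string.h>

    /* before: allocate, then copy */
    static void *dup_buf_old(const void *src, size_t len)
    {
            void *dst = kzalloc(len, GFP_KERNEL);

            if (dst)
                    memcpy(dst, src, len);
            return dst;
    }

    /* after: kmemdup() allocates and copies in one call */
    static void *dup_buf_new(const void *src, size_t len)
    {
            return kmemdup(src, len, GFP_KERNEL);
    }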
2024-05-10  crypto: sahara - use 'time_left' variable with wait_for_completion_timeout()  (Wolfram Sang; 1 file, -8/+8)
There is a confusing pattern in the kernel to use a variable named 'timeout' to store the result of wait_for_completion_timeout(), causing patterns like:

    timeout = wait_for_completion_timeout(...)
    if (!timeout)
            return -ETIMEDOUT;

with all kinds of permutations. Use 'time_left' as a variable to make the code self-explaining. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
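With the rename, the same check reads naturally; a small sketch under assumed names (the completion object and the 1000 ms timeout are illustrative):

    #include <linux/completion.h>
    #include <linux/jiffies.h>
    #include <linux/errno.h>

    static int wait_for_dma(struct completion *done)
    {
            unsigned long time_left;

            time_left = wait_for_completion_timeout(done, msecs_to_jiffies(1000));
            if (!time_left)
                    return -ETIMEDOUT;
            return 0;
    }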
2024-05-10  crypto: api - use 'time_left' variable with wait_for_completion_killable_timeout()  (Wolfram Sang; 1 file, -4/+4)
There is a confusing pattern in the kernel to use a variable named 'timeout' to store the result of wait_for_completion_killable_timeout(), causing patterns like:

    timeout = wait_for_completion_killable_timeout(...)
    if (!timeout)
            return -ETIMEDOUT;

with all kinds of permutations. Use 'time_left' as a variable to make the code self-explaining. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10  crypto: caam - i.MX8ULP does not have CAAM page0 access  (Pankaj Gupta; 1 file, -0/+3)
i.MX8ULP has a secure-enclave hardware IP called EdgeLock Enclave (ELE) that controls access to the CAAM controller's register page, i.e. page0. If the ELE releases access to the CAAM controller's register page at all, it releases it to the secure world only. Clocks are turned on automatically for i.MX8ULP. There is a CAAM clock gating bit, but it is not advisable to gate the clock from Linux, as optee-os or any other entity might be using it. Signed-off-by: Pankaj Gupta <pankaj.gupta@nxp.com> Reviewed-by: Gaurav Jain <gaurav.jain@nxp.com> Reviewed-by: Horia Geanta <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10  crypto: caam - init-clk based on caam-page0-access  (Pankaj Gupta; 1 file, -1/+15)
CAAM clock initialization is done on the basis of SoC-specific info stored in struct caam_imx_data: the caam-page0-access flag and num_clks. The CAAM driver needs to be aware of the access rights to the CAAM control page, i.e. page0, to do things differently. Signed-off-by: Pankaj Gupta <pankaj.gupta@nxp.com> Reviewed-by: Gaurav Jain <gaurav.jain@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10  crypto: starfive - Use fallback for unaligned dma access  (Jia Jie Ho; 1 file, -5/+7)
DMA address mapping fails on unaligned scatterlist offsets. Use the software fallback for these cases. Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10  crypto: starfive - Do not free stack buffer  (Jia Jie Ho; 1 file, -1/+0)
RSA text data uses a variable-length buffer allocated on the stack. Calling kfree on it causes undefined behaviour in subsequent operations. Cc: <stable@vger.kernel.org> #6.7+ Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10  crypto: starfive - Skip unneeded fallback allocation  (Jia Jie Ho; 1 file, -6/+4)
Skip the software fallback allocation if the RSA module failed to get a device handle. Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-10  crypto: starfive - Skip dma setup for zeroed message  (Jia Jie Ho; 1 file, -0/+4)
Skip DMA setup and mapping in the AES driver if the plaintext is empty. Signed-off-by: Jia Jie Ho <jiajie.ho@starfivetech.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-03  crypto: hisilicon/sec2 - fix for register offset  (Wenkai Lin; 1 file, -1/+1)
The offset of SEC_CORE_ENABLE_BITMAP should be 0 instead of 32; the wrong value causes a KASAN shift-out-of-bounds warning. Fix it. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-03  crypto: hisilicon/debugfs - mask the unnecessary info from the dump  (Chenghai Huang; 3 files, -13/+22)
Some information shown by the dump function is invalid. Mask the unnecessary information from the dump file. Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-03  crypto: qat - specify firmware files for 402xx  (Giovanni Cabiddu; 1 file, -0/+2)
The 4xxx driver can probe 4xxx and 402xx devices. However, the driver only specifies the firmware images required for 4xxx. This might result in external tools missing these binaries, if required, in the initramfs. Specify the firmware image used by 402xx with the MODULE_FIRMWARE() macros in the 4xxx driver. Fixes: a3e8c919b993 ("crypto: qat - add support for 402xx devices") Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Damian Muszynski <damian.muszynski@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26  crypto: x86/aes-gcm - simplify GCM hash subkey derivation  (Eric Biggers; 1 file, -18/+8)
Remove a redundant expansion of the AES key, and use rodata for zeroes. Also rename rfc4106_set_hash_subkey() to aes_gcm_derive_hash_subkey() because it's used for both versions of AES-GCM, not just RFC4106. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26  crypto: x86/aes-gcm - delete unused GCM assembly code  (Eric Biggers; 1 file, -186/+0)
Delete aesni_gcm_enc() and aesni_gcm_dec() because they are unused. Only the incremental AES-GCM functions (aesni_gcm_init(), aesni_gcm_enc_update(), aesni_gcm_finalize()) are actually used. This saves 17 KB of object code. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26  crypto: x86/aes-xts - simplify loop in xts_crypt_slowpath()  (Eric Biggers; 1 file, -8/+5)
Since the total length processed by the loop in xts_crypt_slowpath() is a multiple of AES_BLOCK_SIZE, just round the length down to AES_BLOCK_SIZE even on the last step. This doesn't change behavior, as the last step will process a multiple of AES_BLOCK_SIZE regardless. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26  hwrng: stm32 - repair clock handling  (Marek Vasut; 1 file, -0/+9)
The clock management in this driver does not seem to be correct. The struct hwrng .init callback enables the clock, but there is no matching .cleanup callback to disable the clock. The clock gets disabled at some later point by the runtime PM suspend callback. Furthermore, both the runtime PM and sleep suspend callbacks access registers first and disable the clock used for register access second. If the IP is already in RPM suspend and the system enters a sleep state, the sleep callback will attempt to access registers while the register clock is already disabled. This bug has been fixed once before already in commit 9bae54942b13 ("hwrng: stm32 - fix pm_suspend issue"), and regressed in commit ff4e46104f2e ("hwrng: stm32 - rework power management sequences"). Fix this slightly differently: disable the register clock at the end of the .init callback, so that the IP is disabled after .init. On every access to the IP, which really is only stm32_rng_read(), do pm_runtime_get_sync(), which is already done in stm32_rng_read(), to bring the IP out of RPM suspend, and pm_runtime_mark_last_busy()/pm_runtime_put_sync_autosuspend() to put it back into RPM suspend. Change the sleep suspend/resume callbacks to enable and disable the register clock around register access, as those cannot use the RPM suspend/resume callbacks due to slightly different initialization in those sleep callbacks. This way, register access should always be performed with the clock surely enabled. Fixes: ff4e46104f2e ("hwrng: stm32 - rework power management sequences") Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
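A hedged sketch of the per-read runtime-PM pattern described above (the function name, base pointer and register offset are assumptions, not the stm32-rng driver's exact code):

    #include <linux/pm_runtime.h>
    #include <linux/io.h>
    #include <linux/types.h>

    static int rng_read_word(struct device *dev, void __iomem *base, u32 *out)
    {
            int ret;

            ret = pm_runtime_get_sync(dev);         /* wake the IP */
            if (ret < 0) {
                    pm_runtime_put_noidle(dev);
                    return ret;
            }

            *out = readl_relaxed(base + 0x08);      /* assumed data register offset */

            pm_runtime_mark_last_busy(dev);         /* arm autosuspend */
            pm_runtime_put_sync_autosuspend(dev);   /* back into RPM suspend */
            return 0;
    }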
2024-04-26  hwrng: stm32 - put IP into RPM suspend on failure  (Marek Vasut; 1 file, -2/+5)
In case of an irrecoverable failure, put the IP into RPM suspend to avoid RPM imbalance. I did not trigger this case, but it seems it should be done based on reading the code. Fixes: b17bc6eb7c2b ("hwrng: stm32 - rework error handling in stm32_rng_read()") Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26  hwrng: stm32 - use logical OR in conditional  (Marek Vasut; 1 file, -1/+1)
The conditional is used to check whether err is non-zero OR whether the reg variable is non-zero after clearing bits from it. This should be done using logical OR, not bitwise OR; fix it. Fixes: 6b85a7e141cb ("hwrng: stm32 - implement STM32MP13x support") Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
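Illustration of the intent (variable and mask names are assumptions): the check asks "is either value non-zero?", which is logical OR; bitwise OR combines the bit patterns and does not short-circuit.

    #include <linux/types.h>

    /* logical OR: true when either value is non-zero, and it short-circuits */
    static bool has_error(int err, u32 reg, u32 mask)
    {
            return err || (reg & ~mask);    /* not: err | (reg & ~mask) */
    }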
2024-04-26  crypto: ecdh - Initialize ctx->private_key in proper byte order  (Stefan Berger; 3 files, -25/+15)
The private key in ctx->private_key is currently initialized in reverse byte order in ecdh_set_secret and whenever the key is needed in proper byte order the variable priv is introduced and the bytes from ctx->private_key are copied into priv while being byte-swapped (ecc_swap_digits). To get rid of the unnecessary byte swapping initialize ctx->private_key in proper byte order and clean up all functions that were previously using priv or were called with ctx->private_key:

- ecc_gen_privkey: Directly initialize the passed ctx->private_key with random bytes filling all the digits of the private key. Get rid of the priv variable. This function only has ecdh_set_secret as a caller to create NIST P192/256/384 private keys.
- crypto_ecdh_shared_secret: Called only from ecdh_compute_value with ctx->private_key. Get rid of the priv variable and work with the passed private_key directly.
- ecc_make_pub_key: Called only from ecdh_compute_value with ctx->private_key. Get rid of the priv variable and work with the passed private_key directly.

Cc: Salvatore Benedetto <salvatore.benedetto@intel.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26  crypto: ecdh - Pass private key in proper byte order to check valid key  (Stefan Berger; 1 file, -3/+8)
ecc_is_key_valid expects a key with the most significant digit in the last entry of the digit array. Currently ecdh_set_secret passes a reversed key to ecc_is_key_valid, which then passes the rather simple test checking whether the private key is in the range [2, n-3]. For all current ecdh-supported curves (NIST P192/256/384) the 'n' parameter is a rather large number, so the reversed key easily passes this test. Throughout the ecdh and ecc codebase the variable 'priv' is used for a private key holding the bytes in proper byte order. Therefore, introduce priv in ecdh_set_secret and copy the bytes from ctx->private_key into priv in proper byte order by using ecc_swap_digits. Pass priv to ecc_is_key_valid. Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Salvatore Benedetto <salvatore.benedetto@intel.com> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
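A minimal sketch of the described change (function parameters are assumptions standing in for the ecdh context fields; not the exact patch hunk):

    #include <crypto/internal/ecc.h>
    #include <linux/string.h>

    /* Validate a private key that is stored in the reverse of the order
     * ecc_is_key_valid() expects, by first swapping it with ecc_swap_digits(). */
    static int check_reversed_privkey(unsigned int curve_id, unsigned int ndigits,
                                      const u64 *reversed_key, unsigned int key_size)
    {
            u64 priv[ECC_MAX_DIGITS];
            int ret;

            ecc_swap_digits(reversed_key, priv, ndigits);
            ret = ecc_is_key_valid(curve_id, ndigits, priv, key_size);
            memzero_explicit(priv, sizeof(priv));
            return ret;
    }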
2024-04-26  crypto: tegra - Fix some error codes  (Dan Carpenter; 1 file, -2/+2)
Return negative -ENOMEM, instead of positive ENOMEM. Fixes: 0880bb3b00c8 ("crypto: tegra - Add Tegra Security Engine driver") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Akhil R <akhilrajeev@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26  dt-bindings: crypto: starfive: Restore sort order  (Geert Uytterhoeven; 1 file, -1/+1)
Restore alphabetical sort order of the list of supported compatible values. Fixes: 2ccf7a5d9c50f3ea ("dt-bindings: crypto: starfive: Add jh8100 support") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26  crypto: qat - validate slices count returned by FW  (Lucas Segarra Fernandez; 3 files, -0/+23)
The function adf_send_admin_tl_start() enables the telemetry (TL) feature on a QAT device by sending the ICP_QAT_FW_TL_START message to the firmware. This triggers the FW to start writing TL data to a DMA buffer in memory and returns an array containing the number of accelerators of each type (slices) supported by this HW. The pointer to this array is stored in the slice_cnt field of the adf_tl_hw_data data structure. The slice_cnt array is then used in the function tl_print_dev_data() to report in debugfs only statistics about the supported accelerators. An incorrect value of the elements in slice_cnt might lead to an out-of-bounds memory read. At the moment, there isn't an implementation of FW that returns a wrong value, but for robustness validate the slice count array returned by the FW. Fixes: 69e7649f7cc2 ("crypto: qat - add support for device telemetry") Signed-off-by: Lucas Segarra Fernandez <lucas.segarra.fernandez@intel.com> Reviewed-by: Damian Muszynski <damian.muszynski@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26  crypto: aead,cipher - zeroize key buffer after use  (Hailey Mothershead; 2 files, -4/+2)
I.G 9.7.B for FIPS 140-3 specifies that variables temporarily holding cryptographic information should be zeroized once they are no longer needed. Accomplish this by using kfree_sensitive for buffers that previously held the private key. Signed-off-by: Hailey Mothershead <hailmo@amazon.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
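Generic shape of the change (illustrative function and variable names, not the exact crypto/aead.c or cipher.c hunks): replace a plain kfree() of a key-holding buffer with kfree_sensitive(), which zeroizes before freeing.

    #include <linux/slab.h>

    static int use_key_copy(const u8 *key, unsigned int keylen)
    {
            u8 *keybuf = kmemdup(key, keylen, GFP_KERNEL);

            if (!keybuf)
                    return -ENOMEM;
            /* ... work with the temporary key copy ... */
            kfree_sensitive(keybuf);        /* zeroize, then free */
            return 0;
    }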
2024-04-26  crypto: arm64/aes-ce - Simplify round key load sequence  (Ard Biesheuvel; 2 files, -30/+24)
Tweak the round key logic so that the keys can be loaded using a single branchless sequence using overlapping loads. This is shorter and simpler, and puts the conditional branches based on the key size further apart, which might benefit microarchitectures that cannot record taken branches at every instruction. For these branches, use test-bit-branch instructions that don't clobber the condition flags. Note that none of this has any impact on performance, positive or otherwise (and the branch prediction benefit would only benefit AES-192, which nobody uses). It does make for nicer code, though. While at it, use \@ to generate the labels inside the macros, which is more robust than using fixed numbers, which could clash inadvertently. Also, bring aes-neon.S in line with these changes, including the switch to test-and-branch instructions, to avoid surprises in the future when we might start relying on the condition flags being preserved in the chaining mode wrappers in aes-modes.S. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-26  crypto: tegra - Convert to platform remove callback returning void  (Uwe Kleine-König; 1 file, -4/+2)
The .remove() callback for a platform driver returns an int, which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However, the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To address this, there is a quest to make the remove callback return void. In the first step of this quest, all drivers are converted to .remove_new(), which already returns void. Eventually, after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Fixes: 0880bb3b00c8 ("crypto: tegra - Add Tegra Security Engine driver") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Akhil R <akhilrajeev@nvidia.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
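A sketch of what such a conversion looks like (driver and function names here are illustrative, not the Tegra driver's actual identifiers):

    #include <linux/platform_device.h>

    /* Previously an int-returning .remove() that always returned 0;
     * now a void callback registered via .remove_new. */
    static void example_se_remove(struct platform_device *pdev)
    {
            /* resource teardown; nothing useful can be done with an error here */
    }

    static struct platform_driver example_se_driver = {
            .driver         = { .name = "example-se" },
            .remove_new     = example_se_remove,
    };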
2024-04-19  crypto: x86/aes-xts - optimize size of instructions operating on lengths  (Eric Biggers; 2 files, -28/+30)
x86_64 has the "interesting" property that the instruction size is generally a bit shorter for instructions that operate on the 32-bit (or less) part of registers, or registers that are in the original set of 8. This patch adjusts the AES-XTS code to take advantage of that property by changing the LEN parameter from size_t to unsigned int (which is all that's needed and is what the non-AVX implementation uses) and using the %eax register for KEYLEN. This decreases the size of aes-xts-avx-x86_64.o by 1.2%. Note that changing the kmovq to kmovd was going to be needed anyway to make the AVX10/256 code really work on CPUs that don't support 512-bit vectors (since the AVX10 spec says that 64-bit opmask instructions will only be supported on processors that support 512-bit vectors). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19  crypto: x86/aes-xts - eliminate a few more instructions  (Eric Biggers; 1 file, -26/+13)
- For conditionally subtracting 16 from LEN when decrypting a message whose length isn't a multiple of 16, use the cmovnz instruction.
- Fold the addition of 4*VL to LEN into the sub of VL or 16 from LEN.
- Remove an unnecessary test instruction.
This results in slightly shorter code, both source and binary. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19  crypto: x86/aes-xts - handle AES-128 and AES-192 more efficiently  (Eric Biggers; 1 file, -86/+92)
Decrease the amount of code specific to the different AES variants by "right-aligning" the sequence of round keys, and for AES-128 and AES-192 just skipping irrelevant rounds at the beginning. This shrinks the size of aes-xts-avx-x86_64.o by 13.3%, and it improves the efficiency of AES-128 and AES-192. The tradeoff is that for AES-256 some additional not-taken conditional jumps are now executed. But these are predicted well and are cheap on x86. Note that the ARMv8 CE based AES-XTS implementation uses a similar strategy to handle the different AES variants. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19  crypto: x86/aesni-xts - deduplicate aesni_xts_enc() and aesni_xts_dec()  (Eric Biggers; 1 file, -191/+79)
Since aesni_xts_enc() and aesni_xts_dec() are very similar, generate them from a macro that's passed an argument enc=1 or enc=0. This reduces the length of aesni-intel_asm.S by 112 lines while still producing the exact same object file in both 32-bit and 64-bit mode. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19  crypto: x86/aes-xts - handle CTS encryption more efficiently  (Eric Biggers; 1 file, -24/+29)
When encrypting a message whose length isn't a multiple of 16 bytes, encrypt the last full block in the main loop. This works because only decryption uses the last two tweaks in reverse order, not encryption. This improves the performance of decrypting messages whose length isn't a multiple of the AES block length, shrinks the size of aes-xts-avx-x86_64.o by 5.0%, and eliminates two instructions (a test and a not-taken conditional jump) when encrypting a message whose length *is* a multiple of the AES block length. While it's not super useful to optimize for ciphertext stealing given that it's rarely needed in practice, the other two benefits mentioned above make this optimization worthwhile. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19  crypto: stm32/hash - add full DMA support for stm32mpx  (Maxime Méré; 1 file, -122/+448)
Due to a lack of alignment in the data sent by requests, the current DMA support of the STM32 hash driver only works with digest calls. This patch, based on the algorithm used in the driver omap-sham.c, allows DMA to be used in any situation. It has been functionally tested on STM32MP15, STM32MP13 and STM32MP25. Checking the performance of this new driver with OpenSSL gave the following results:

Performance: (datasize: 4096, number of hashes performed in 10s)

    |type   |no DMA    |DMA support|software  |
    |-------|----------|-----------|----------|
    |md5    |13873.56k |10958.03k  |71163.08k |
    |sha1   |13796.15k |10729.47k  |39670.58k |
    |sha224 |13737.98k |10775.76k  |22094.64k |
    |sha256 |13655.65k |10872.01k  |22075.39k |

CPU Usage: (algorithm used: sha256, computation time: 20s, measurement taken at ~10s)

    |datasize  |no DMA |DMA  |software |
    |----------|-------|-----|---------|
    | 2048     | 56%   | 49% | 50%     |
    | 4096     | 54%   | 46% | 50%     |
    | 8192     | 53%   | 40% | 50%     |
    | 16384    | 53%   | 33% | 50%     |

Note: this update doesn't change the driver's performance without DMA. As shown, performance with DMA is slightly lower than without, but in most cases it will save CPU time. Signed-off-by: Maxime Méré <maxime.mere@foss.st.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19  crypto: qat - improve error logging to be consistent across features  (Adam Guerin; 1 file, -1/+1)
Improve error logging in the rate limiting feature, staying consistent with the error logging found in the telemetry feature. Fixes: d9fb8408376e ("crypto: qat - add rate limiting feature to qat_4xxx") Signed-off-by: Adam Guerin <adam.guerin@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19  crypto: qat - improve error message in adf_get_arbiter_mapping()  (Adam Guerin; 2 files, -2/+2)
Improve the error message to be more readable. Fixes: 5da6a2d5353e ("crypto: qat - generate dynamically arbiter mappings") Signed-off-by: Adam Guerin <adam.guerin@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19  crypto: x86/sha256-ni - simplify do_4rounds  (Eric Biggers; 1 file, -6/+4)
Instead of loading the message words into both MSG and \m0 and then adding the round constants to MSG, load the message words into \m0 and the round constants into MSG and then add \m0 to MSG. This shortens the source code slightly. It changes the instructions slightly, but it doesn't affect binary code size and doesn't seem to affect performance. Suggested-by: Stefan Kanthak <stefan.kanthak@nexgo.de> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19  crypto: x86/sha256-ni - optimize code size  (Eric Biggers; 1 file, -15/+15)
- Load the SHA-256 round constants relative to a pointer that points into the middle of the constants rather than to the beginning. Since x86 instructions use signed offsets, this decreases the instruction length required to access some of the later round constants.
- Use punpcklqdq or punpckhqdq instead of longer instructions such as pshufd, pblendw, and palignr. This doesn't harm performance.
The end result is that sha256_ni_transform shrinks from 839 bytes to 791 bytes, with no loss in performance. Suggested-by: Stefan Kanthak <stefan.kanthak@nexgo.de> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19  crypto: x86/sha256-ni - rename some register aliases  (Eric Biggers; 1 file, -17/+17)
MSGTMP[0-3] are used to hold the message schedule and are not temporary registers per se. MSGTMP4 is used as a temporary register for several different purposes and isn't really related to MSGTMP[0-3]. Rename them to MSG[0-3] and TMP accordingly. Also add a comment that clarifies what MSG is. Suggested-by: Stefan Kanthak <stefan.kanthak@nexgo.de> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19  crypto: x86/sha256-ni - convert to use rounds macros  (Eric Biggers; 1 file, -182/+29)
To avoid source code duplication, do the SHA-256 rounds using macros. This reduces the length of sha256_ni_asm.S by 153 lines while still producing the exact same object file. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19  crypto: qat - implement dh fallback for primes > 4K  (Damian Muszynski; 1 file, -6/+60)
The Intel QAT driver provides support for the Diffie-Hellman (DH) algorithm, limited to prime numbers up to 4K. This driver is used by default on platforms with integrated QAT hardware for all DH requests. This has led to failures with algorithms requiring larger prime sizes, such as ffdhe6144.

    alg: ffdhe6144(dh): test failed on vector 1, err=-22
    alg: self-tests for ffdhe6144(qat-dh) (ffdhe6144(dh)) failed (rc=-22)

Implement a fallback mechanism when an unsupported request is received. Signed-off-by: Damian Muszynski <damian.muszynski@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
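One common way such a fallback is set up, sketched under assumptions (the size limit and helper names are illustrative, not the QAT driver's actual plumbing): allocate a software "dh" transform at init time and hand requests with oversized primes to it instead of the hardware.

    #include <linux/crypto.h>
    #include <linux/types.h>
    #include <crypto/kpp.h>

    #define HW_DH_MAX_PRIME_BYTES (4096 / 8)        /* assumed hardware limit */

    /* CRYPTO_ALG_NEED_FALLBACK in the mask prevents selecting a driver
     * that itself requires a fallback (e.g. the hw implementation). */
    static struct crypto_kpp *alloc_sw_dh_fallback(void)
    {
            return crypto_alloc_kpp("dh", 0, CRYPTO_ALG_NEED_FALLBACK);
    }

    /* Requests whose prime exceeds the hw limit are routed to the sw tfm. */
    static bool dh_needs_fallback(unsigned int prime_bytes)
    {
            return prime_bytes > HW_DH_MAX_PRIME_BYTES;
    }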
2024-04-19  crypto: x86/aes-xts - access round keys using single-byte offsets  (Eric Biggers; 1 file, -37/+44)
Access the AES round keys using offsets -7*16 through 7*16, instead of 0*16 through 14*16. This allows VEX-encoded instructions to address all round keys using 1-byte offsets, whereas before some needed 4-byte offsets. This decreases the code size of aes-xts-avx-x86_64.o by 4.2%. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19  crypto: octeontx2 - add missing check for dma_map_single  (Chen Ni; 1 file, -0/+4)
Add a check for dma_map_single() and return an error if it fails, in order to avoid using an invalid DMA address. Fixes: e92971117c2c ("crypto: octeontx2 - add ctx_val workaround") Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Bharat Bhushan <bbhushan2@marvell.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
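The standard shape of such a check (device, buffer and DMA direction are illustrative):

    #include <linux/dma-mapping.h>
    #include <linux/errno.h>

    static int map_buf(struct device *dev, void *buf, size_t len, dma_addr_t *out)
    {
            dma_addr_t daddr = dma_map_single(dev, buf, len, DMA_BIDIRECTIONAL);

            if (dma_mapping_error(dev, daddr))      /* never use an unchecked handle */
                    return -ENOMEM;
            *out = daddr;
            return 0;
    }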
2024-04-12  crypto: x86/aes-xts - make non-AVX implementation use new glue code  (Eric Biggers; 3 files, -203/+132)
Make the non-AVX implementation of AES-XTS (xts-aes-aesni) use the new glue code that was introduced for the AVX implementations of AES-XTS. This reduces code size, and it improves the performance of xts-aes-aesni due to the optimization for messages that don't span page boundaries. This required moving the new glue functions higher up in the file and allowing the IV encryption function to be specified by the caller. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12  X.509: Introduce scope-based x509_certificate allocation  (Lukas Wunner; 4 files, -49/+30)
Add a DEFINE_FREE() clause for x509_certificate structs and use it in x509_cert_parse() and x509_key_preparse(). These are the only functions where scope-based x509_certificate allocation currently makes sense. A third user will be introduced with the forthcoming SPDM library (Security Protocol and Data Model) for PCI device authentication. Unlike most other DEFINE_FREE() clauses, this one checks for IS_ERR() instead of NULL before calling x509_free_certificate() at end of scope. That's because the "constructor" of x509_certificate structs, x509_cert_parse(), returns a valid pointer or an ERR_PTR(), but never NULL. Comparing the assembler output before/after has shown it is identical, save for the fact that gcc-12 always generates two return paths when __cleanup() is used, one for the success case and one for the error case. In x509_cert_parse(), add a hint for the compiler that kzalloc() never returns an ERR_PTR(). Otherwise the compiler adds a gratuitous IS_ERR() check on return. Introduce an assume() macro for this, which can be re-used elsewhere in the kernel to provide hints for the compiler. Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Link: https://lore.kernel.org/all/20231003153937.000034ca@Huawei.com/ Link: https://lwn.net/Articles/934679/ Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
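A hedged sketch of how such a DEFINE_FREE() clause and a caller could look (simplified; the in-tree definition and call sites may differ in detail, and the caller function name is illustrative):

    #include <linux/cleanup.h>
    #include <linux/err.h>
    #include "x509_parser.h"    /* local header providing x509_cert_parse() etc. */

    /* Free at end of scope unless the pointer is an ERR_PTR(). */
    DEFINE_FREE(x509_free_certificate, struct x509_certificate *,
                if (!IS_ERR(_T)) x509_free_certificate(_T))

    static int parse_and_use(const void *data, size_t datalen)
    {
            struct x509_certificate *cert __free(x509_free_certificate) =
                    x509_cert_parse(data, datalen);

            if (IS_ERR(cert))
                    return PTR_ERR(cert);

            /* ... use cert; it is released automatically on return ... */
            return 0;
    }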
2024-04-12  crypto: hisilicon/qm - Add the err memory release process to qm uninit  (Chenghai Huang; 1 file, -4/+1)
When the qm uninit command is executed, the err data needs to be released to prevent memory leakage. The error information release operation and uacce_remove are integrated in qm_remove_uacce, so add qm_remove_uacce to qm uninit to avoid leaking the err memory. Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>