summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-04-12crypto: hisilicon/debugfs - Resolve the problem of applying for redundant ↵Chenghai Huang1-6/+5
space in sq dump When dumping SQ, only the corresponding ID's SQE needs to be dumped, and there is no need to apply for the entire SQE memory. This is because excessive dump operations can lead to memory resource waste. Therefor apply for the space corresponding to sqe_id separately to avoid space waste. Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: hisilicon/sec - Fix memory leak for sec resource releaseChenghai Huang1-1/+3
The AIV is one of the SEC resources. When releasing resources, it need to release the AIV resources at the same time. Otherwise, memory leakage occurs. The aiv resource release is added to the sec resource release function. Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: hisilicon - Adjust debugfs creation and release orderChenghai Huang3-36/+32
There is a scenario where the file directory is created but the file memory is not set. In this case, if a user accesses the file, an error occurs. So during the creation process of debugfs, memory should be allocated first before creating the directory. In the release process, the directory should be deleted first before releasing the memory to avoid the situation where the memory does not exist when accessing the directory. In addition, the directory released by the debugfs is a global variable. When the debugfs of an accelerator fails to be initialized, releasing the directory of the global variable affects the debugfs initialization of other accelerators. The debugfs root directory released by debugfs init should be a member of qm, not a global variable. Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: hisilicon/qm - Add the default processing branchChenghai Huang1-0/+3
The cmd type can be extended. Currently, only four types of cmd can be processed. Therefor, add the default processing branch to intercept incorrect parameter input. Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: hisilicon/debugfs - Fix the processing logic issue in the debugfs ↵Chenghai Huang1-3/+3
creation There is a scenario where the file directory is created but the file attribute is not set. In this case, if a user accesses the file, an error occurs. So adjust the processing logic in the debugfs creation to prevent the file from being accessed before the file attributes such as the index are set. Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: hisilicon/sgl - Delete redundant parameter verificationChenghai Huang1-4/+1
The input parameter check in acc_get_sgl is redundant. The caller has been verified once. When the check is performed for multiple times, the performance deteriorates. So the redundant parameter verification is deleted, and the index verification is changed to the module entry function for verification. Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: hisilicon/debugfs - Fix debugfs uninit process issueChenghai Huang1-3/+18
During the zip probe process, the debugfs failure does not stop the probe. When debugfs initialization fails, jumping to the error branch will also release regs, in addition to its own rollback operation. As a result, it may be released repeatedly during the regs uninit process. Therefore, the null check needs to be added to the regs uninit process. Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: hisilicon/sec - Add the condition for configuring the sriov functionChenghai Huang1-1/+2
When CONFIG_PCI_IOV is disabled, the SRIOV configuration function is not required. An error occurs if this function is incorrectly called. Consistent with other modules, add the condition for configuring the sriov function of sec_pci_driver. Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: x86/sha512-avx2 - add missing vzeroupperEric Biggers1-0/+1
Since sha512_transform_rorx() uses ymm registers, execute vzeroupper before returning from it. This is necessary to avoid reducing the performance of SSE code. Fixes: e01d69cb0195 ("crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions.") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: x86/sha256-avx2 - add missing vzeroupperEric Biggers1-0/+1
Since sha256_transform_rorx() uses ymm registers, execute vzeroupper before returning from it. This is necessary to avoid reducing the performance of SSE code. Fixes: d34a460092d8 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: x86/nh-avx2 - add missing vzeroupperEric Biggers1-0/+1
Since nh_avx2() uses ymm registers, execute vzeroupper before returning from it. This is necessary to avoid reducing the performance of SSE code. Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: iaa - Use cpumask_weight() when rebalancingTom Zanussi1-2/+2
If some cpus are offlined, or if the node mask is smaller than expected, the 'nonexistent cpu' warning in rebalance_wq_table() may be erroneously triggered. Use cpumask_weight() to make sure we only iterate over the exact number of cpus in the mask. Also use num_possible_cpus() instead of num_online_cpus() to make sure all slots in the wq table are initialized. Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: x509 - Add OID for NIST P521 and extend parser for itStefan Berger2-0/+4
Enable the x509 parser to accept NIST P521 certificates and add the OID for ansip521r1, which is the identifier for NIST P521. Cc: David Howells <dhowells@redhat.com> Tested-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: asymmetric_keys - Adjust signature size calculation for NIST P521Stefan Berger1-1/+13
Adjust the calculation of the maximum signature size for support of NIST P521. While existing curves may prepend a 0 byte to their coordinates (to make the number positive), NIST P521 will not do this since only the first bit in the most significant byte is used. If the encoding of the x & y coordinates requires at least 128 bytes then an additional byte is needed for the encoding of the length. Take this into account when calculating the maximum signature size. Reviewed-by: Lukas Wunner <lukas@wunner.de> Tested-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: ecdsa - Register NIST P521 and extend test suiteStefan Berger3-0/+184
Register NIST P521 as an akcipher and extend the testmgr with NIST P521-specific test vectors. Add a module alias so the module gets automatically loaded by the crypto subsystem when the curve is needed. Tested-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: ecdsa - Rename keylen to bufsize where necessaryStefan Berger1-6/+6
In cases where 'keylen' was referring to the size of the buffer used by a curve's digits, it does not reflect the purpose of the variable anymore once NIST P521 is used. What it refers to then is the size of the buffer, which may be a few bytes larger than the size a coordinate of a key. Therefore, rename keylen to bufsize where appropriate. Tested-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: ecdsa - Replace ndigits with nbits where precision is neededStefan Berger1-1/+1
Replace the usage of ndigits with nbits where precise space calculations are needed, such as in ecdsa_max_size where the length of a coordinate is determined. Tested-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: ecc - Add NIST P521 curve parametersStefan Berger3-0/+48
Add the parameters for the NIST P521 curve and define a new curve ID for it. Make the curve available in ecc_get_curve. Tested-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: ecc - Add special case for NIST P521 in ecc_point_multStefan Berger1-1/+4
In ecc_point_mult use the number of bits of the NIST P521 curve + 2. The change is required specifically for NIST P521 to pass mathematical tests on the public key. Tested-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: ecc - Implement vli_mmod_fast_521 for NIST p521Stefan Berger2-1/+27
Implement vli_mmod_fast_521 following the description for how to calculate the modulus for NIST P521 in the NIST publication "Recommendations for Discrete Logarithm-Based Cryptography: Elliptic Curve Domain Parameters" section G.1.4. NIST p521 requires 9 64bit digits, so increase the ECC_MAX_DIGITS so that the vli digit array provides enough elements to fit the larger integers required by this curve. Tested-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: ecc - Add nbits field to ecc_curve structureStefan Berger3-0/+11
Add the number of bits a curve has to the ecc_curve definition to be able to derive the number of bytes a curve requires for its coordinates from it. It also allows one to identify a curve by its particular size. Set the number of bits on all curve definitions. Tested-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Vitaly Chikunov <vt@altlinux.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: ecdsa - Extend res.x mod n calculation for NIST P521Stefan Berger1-1/+1
res.x has been calculated by ecc_point_mult_shamir, which uses 'mod curve_prime' on res.x and therefore p > res.x with 'p' being the curve_prime. Further, it is true that for the NIST curves p > n with 'n' being the 'curve_order' and therefore the following may be true as well: p > res.x >= n. If res.x >= n then res.x mod n can be calculated by iteratively sub- tracting n from res.x until res.x < n. For NIST P192/256/384 this can be done in a single subtraction. This can also be done in a single subtraction for NIST P521. The mathematical reason why a single subtraction is sufficient is due to the values of 'p' and 'n' of the NIST curves where the following holds true: note: max(res.x) = p - 1 max(res.x) - n < n p - 1 - n < n p - 1 < 2n => holds true for the NIST curves Tested-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: ecdsa - Adjust tests on length of key parametersStefan Berger1-1/+1
In preparation for support of NIST P521, adjust the basic tests on the length of the provided key parameters to only ensure that the length of the x plus y coordinates parameter array is not an odd number and that each coordinate fits into an array of 'ndigits' digits. Mathematical tests on the key's parameters are then done in ecc_is_pubkey_valid_full rejecting invalid keys. The change is necessary since NIST P521 keys do not have keys with coordinates that each require 'full' digits (= all bits in u64 used). NIST P521 only requires 2 bytes (9 bits) in the most significant digit unlike NIST P192/256/384 that each require multiple 'full' digits. Tested-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: ecdsa - Convert byte arrays with key coordinates to digitsStefan Berger2-5/+30
For NIST P192/256/384 the public key's x and y parameters could be copied directly from a given array since both parameters filled 'ndigits' of digits (a 'digit' is a u64). For support of NIST P521 the key parameters need to have leading zeros prepended to the most significant digit since only 2 bytes of the most significant digit are provided. Therefore, implement ecc_digits_from_bytes to convert a byte array into an array of digits and use this function in ecdsa_set_pub_key where an input byte array needs to be converted into digits. Suggested-by: Lukas Wunner <lukas@wunner.de> Tested-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: ecc - Use ECC_CURVE_NIST_P192/256/384_DIGITS where possibleStefan Berger1-6/+6
Replace hard-coded numbers with ECC_CURVE_NIST_P192/256/384_DIGITS where possible. Tested-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: tegra - Add Tegra Security Engine driverAkhil R9-0/+4171
Add support for Tegra Security Engine which can accelerate various crypto algorithms. The Engine has two separate instances within for AES and HASH algorithms respectively. The driver registers two crypto engines - one for AES and another for HASH algorithms and these operate independently and both uses the host1x bus. Additionally, it provides hardware-assisted key protection for up to 15 symmetric keys which it can use for the cipher operations. Signed-off-by: Akhil R <akhilrajeev@nvidia.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12gpu: host1x: Add Tegra SE to SID tableAkhil R1-0/+24
Add Tegra Security Engine details to the SID table in host1x driver. These entries are required to be in place to configure the stream ID for SE. Register writes to stream ID registers fail otherwise. Signed-off-by: Akhil R <akhilrajeev@nvidia.com> Acked-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12dt-bindings: crypto: Add Tegra Security EngineAkhil R2-0/+104
Add DT binding document for Tegra Security Engine. The AES and HASH algorithms are handled independently by separate engines within the Security Engine. These engines are registered as two separate crypto engine drivers. Signed-off-by: Akhil R <akhilrajeev@nvidia.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12padata: Disable BH when taking works lock on MT pathHerbert Xu1-4/+4
As the old padata code can execute in softirq context, disable softirqs for the new padata_do_mutithreaded code too as otherwise lockdep will get antsy. Reported-by: syzbot+0cb5bb0f4bf9e79db3b3@syzkaller.appspotmail.com Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: ccp - drop platform ifdef checksArnd Bergmann1-12/+2
When both ACPI and OF are disabled, the dev_vdata variable is unused: drivers/crypto/ccp/sp-platform.c:33:34: error: unused variable 'dev_vdata' [-Werror,-Wunused-const-variable] This is not a useful configuration, and there is not much point in saving a few bytes when only one of the two is enabled, so just remove all these ifdef checks and rely on of_match_node() and acpi_match_device() returning NULL when these subsystems are disabled. Fixes: 6c5063434098 ("crypto: ccp - Add ACPI support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: qat - Fix spelling mistake "Invalide" -> "Invalid"Colin Ian King1-1/+1
There is a spelling mistake in a dev_err message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: algboss - remove NULL check in cryptomgr_schedule_probe()Roman Smirnov1-3/+0
The for loop will be executed at least once, so i > 0. If the loop is interrupted before i is incremented (e.g., when checking len for NULL), i will not be checked. Found by Linux Verification Center (linuxtesting.org) with Svace. Signed-off-by: Roman Smirnov <r.smirnov@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12crypto: ecc - remove checks in crypto_ecdh_shared_secret() and ↵Roman Smirnov1-2/+2
ecc_make_pub_key() With the current state of the code, ecc_get_curve() cannot return NULL in crypto_ecdh_shared_secret() and ecc_make_pub_key(). This is conditioned by the fact that they are only called from ecdh_compute_value(), which implements the kpp_alg::{generate_public_key,compute_shared_secret}() methods. Also ecdh implements the kpp_alg::init() method, which is called before the other methods and sets ecdh_ctx::curve_id to a valid value. Signed-off-by: Roman Smirnov <r.smirnov@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05crypto: jitter - Replace http with httpsThorsten Blum1-1/+1
The PDF is also available via https. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05crypto: jitter - Remove duplicate word in commentThorsten Blum1-1/+1
s/in// Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05crypto: x86/aes-xts - wire up VAES + AVX10/512 implementationEric Biggers2-0/+41
Add an AES-XTS implementation "xts-aes-vaes-avx10_512" for x86_64 CPUs with the VAES, VPCLMULQDQ, and either AVX10/512 or AVX512BW + AVX512VL extensions. This implementation uses zmm registers to operate on four AES blocks at a time. The assembly code is instantiated using a macro so that most of the source code is shared with other implementations. To avoid downclocking on older Intel CPU models, an exclusion list is used to prevent this 512-bit implementation from being used by default on some CPU models. They will use xts-aes-vaes-avx10_256 instead. For now, this exclusion list is simply coded into aesni-intel_glue.c. It may make sense to eventually move it into a more central location. xts-aes-vaes-avx10_512 is slightly faster than xts-aes-vaes-avx10_256 on some current CPUs. E.g., on AMD Zen 4, AES-256-XTS decryption throughput increases by 13% with 4096-byte inputs, or 14% with 512-byte inputs. On Intel Sapphire Rapids, AES-256-XTS decryption throughput increases by 2% with 4096-byte inputs, or 3% with 512-byte inputs. Future CPUs may provide stronger 512-bit support, in which case a larger benefit should be seen. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05crypto: x86/aes-xts - wire up VAES + AVX10/256 implementationEric Biggers2-0/+25
Add an AES-XTS implementation "xts-aes-vaes-avx10_256" for x86_64 CPUs with the VAES, VPCLMULQDQ, and either AVX10/256 or AVX512BW + AVX512VL extensions. This implementation avoids using zmm registers, instead using ymm registers to operate on two AES blocks at a time. The assembly code is instantiated using a macro so that most of the source code is shared with other implementations. This is the optimal implementation on CPUs that support VAES and AVX512 but where the zmm registers should not be used due to downclocking effects, for example Intel's Ice Lake. It should also be the optimal implementation on future CPUs that support AVX10/256 but not AVX10/512. The performance is slightly better than that of xts-aes-vaes-avx2, which uses the same 256-bit vector length, due to factors such as being able to use ymm16-ymm31 to cache the AES round keys, and being able to use the vpternlogd instruction to do XORs more efficiently. For example, on Ice Lake, the throughput of decrypting 4096-byte messages with AES-256-XTS is 6.6% higher with xts-aes-vaes-avx10_256 than with xts-aes-vaes-avx2. While this is a small improvement, it is straightforward to provide this implementation (xts-aes-vaes-avx10_256) as long as we are providing xts-aes-vaes-avx2 and xts-aes-vaes-avx10_512 anyway, due to the way the _aes_xts_crypt macro is structured. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05crypto: x86/aes-xts - wire up VAES + AVX2 implementationEric Biggers2-0/+31
Add an AES-XTS implementation "xts-aes-vaes-avx2" for x86_64 CPUs with the VAES, VPCLMULQDQ, and AVX2 extensions, but not AVX512 or AVX10. This implementation uses ymm registers to operate on two AES blocks at a time. The assembly code is instantiated using a macro so that most of the source code is shared with other implementations. This is the optimal implementation on AMD Zen 3. It should also be the optimal implementation on Intel Alder Lake, which similarly supports VAES but not AVX512. Comparing to xts-aes-aesni-avx on Zen 3, xts-aes-vaes-avx2 provides 70% higher AES-256-XTS decryption throughput with 4096-byte messages, or 23% higher with 512-byte messages. A large improvement is also seen with CPUs that do support AVX512 (e.g., 98% higher AES-256-XTS decryption throughput on Ice Lake with 4096-byte messages), though the following patches add AVX512 optimized implementations to get a bit more performance on those CPUs. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05crypto: x86/aes-xts - wire up AESNI + AVX implementationEric Biggers2-2/+209
Add an AES-XTS implementation "xts-aes-aesni-avx" for x86_64 CPUs that have the AES-NI and AVX extensions but not VAES. It's similar to the existing xts-aes-aesni in that uses xmm registers to operate on one AES block at a time. It differs from xts-aes-aesni in the following ways: - It uses the VEX-coded (non-destructive) instructions from AVX. This improves performance slightly. - It incorporates some additional optimizations such as interleaving the tweak computation with AES en/decryption, handling single-page messages more efficiently, and caching the first round key. - It supports only 64-bit (x86_64). - It's generated by an assembly macro that will also be used to generate VAES-based implementations. The performance improvement over xts-aes-aesni varies from small to large, depending on the CPU and other factors such as the size of the messages en/decrypted. For example, the following increases in AES-256-XTS decryption throughput are seen on the following CPUs: | 4096-byte messages | 512-byte messages | ----------------------+--------------------+-------------------+ Intel Skylake | 6% | 31% | Intel Cascade Lake | 4% | 26% | AMD Zen 1 | 61% | 73% | AMD Zen 2 | 36% | 59% | (The above CPUs don't support VAES, so they can't use VAES instead.) While this isn't as large an improvement as what VAES provides, this still seems worthwhile. This implementation is fairly easy to provide based on the assembly macro that's needed for VAES anyway, and it will be the best implementation on a large number of CPUs (very roughly, the CPUs launched by Intel and AMD from 2011 to 2018). This makes the existing xts-aes-aesni *mostly* obsolete. For now, leave it in place to support 32-bit kernels and also CPUs like Intel Westmere that support AES-NI but not AVX. (We could potentially remove it anyway and just rely on the indirect acceleration via ecb-aes-aesni in those cases, but that change will need to be considered separately.) Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05crypto: x86/aes-xts - add AES-XTS assembly macro for modern CPUsEric Biggers2-1/+802
Add an assembly file aes-xts-avx-x86_64.S which contains a macro that expands into AES-XTS implementations for x86_64 CPUs that support at least AES-NI and AVX, optionally also taking advantage of VAES, VPCLMULQDQ, and AVX512 or AVX10. This patch doesn't expand the macro at all. Later patches will do so, adding each implementation individually so that the motivation and use case for each individual implementation can be fully presented. The file also provides a function aes_xts_encrypt_iv() which handles the encryption of the IV (tweak), using AES-NI and AVX. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05x86: add kconfig symbols for assembler VAES and VPCLMULQDQ supportEric Biggers1-0/+10
Add config symbols AS_VAES and AS_VPCLMULQDQ that expose whether the assembler supports the vector AES and carryless multiplication cryptographic extensions. Reviewed-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05crypto: ecdh - explicitly zeroize private_keyJoachim Vandersmissen1-0/+2
private_key is overwritten with the key parameter passed in by the caller (if present), or alternatively a newly generated private key. However, it is possible that the caller provides a key (or the newly generated key) which is shorter than the previous key. In that scenario, some key material from the previous key would not be overwritten. The easiest solution is to explicitly zeroize the entire private_key array first. Note that this patch slightly changes the behavior of this function: previously, if the ecc_gen_privkey failed, the old private_key would remain. Now, the private_key is always zeroized. This behavior is consistent with the case where params.key is set and ecc_is_key_valid fails. Signed-off-by: Joachim Vandersmissen <git@jvdsn.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05crypto: fips - Remove the now superfluous sentinel element from ctl_table arrayJoel Granados1-1/+0
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) Remove sentinel from crypto_sysctl_table Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05crypto: jitter - Use kvfree_sensitive() to fix Coccinelle warningThorsten Blum1-2/+1
Replace memzero_explicit() and kvfree() with kvfree_sensitive() to fix the following Coccinelle/coccicheck warning reported by kfree_sensitive.cocci: WARNING opportunity for kfree_sensitive/kvfree_sensitive Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05dt-bindings: crypto: ti,omap-sham: Convert to dtschemaAnimesh Agarwal2-28/+56
Convert the OMAP SoC SHA crypto Module bindings to DT Schema. Signed-off-by: Animesh Agarwal <animeshagarwal28@gmail.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05crypto: qat - Avoid -Wflex-array-member-not-at-end warningsGustavo A. R. Silva2-6/+11
-Wflex-array-member-not-at-end is coming in GCC-14, and we are getting ready to enable it globally. Use the `__struct_group()` helper to separate the flexible array from the rest of the members in flexible `struct qat_alg_buf_list`, through tagged `struct qat_alg_buf_list_hdr`, and avoid embedding the flexible-array member in the middle of `struct qat_alg_fixed_buf_list`. Also, use `container_of()` whenever we need to retrieve a pointer to the flexible structure. So, with these changes, fix the following warnings: drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/crypto/intel/qat/qat_common/qat_bl.h:25:33: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Link: https://github.com/KSPP/linux/issues/202 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02hwrng: mxc-rnga - Drop usage of platform_driver_probe()Uwe Kleine-König1-4/+5
There are considerations to drop platform_driver_probe() as a concept that isn't relevant any more today. It comes with an added complexity that makes many users hold it wrong. (E.g. this driver should have mark the driver struct with __refdata.) Convert the driver to the more usual module_platform_driver(). This fixes a W=1 build warning: WARNING: modpost: drivers/char/hw_random/mxc-rnga: section mismatch in reference: mxc_rnga_driver+0x10 (section: .data) -> mxc_rnga_remove (section: .exit.text) with CONFIG_HW_RANDOM_MXC_RNGA=m. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02crypto: x86/aesni - Update aesni_set_key() to return voidChang S. Bae2-7/+6
The aesni_set_key() implementation has no error case, yet its prototype specifies to return an error code. Modify the function prototype to return void and adjust the related code. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: linux-crypto@vger.kernel.org Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02crypto: x86/aesni - Rearrange AES key size checkChang S. Bae1-10/+8
aes_expandkey() already includes an AES key size check. If AES-NI is unusable, invoke the function without the size check. Also, use aes_check_keylen() instead of open code. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: linux-crypto@vger.kernel.org Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02crypto: bcm - Fix pointer arithmeticAleksandr Mishin1-1/+1
In spu2_dump_omd() value of ptr is increased by ciph_key_len instead of hash_iv_len which could lead to going beyond the buffer boundaries. Fix this bug by changing ciph_key_len to hash_iv_len. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 9d12ba86f818 ("crypto: brcm - Add Broadcom SPU driver") Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>