Age | Commit message (Collapse) | Author | Files | Lines |
|
With all the crypto modules enabled on x86, and with a CPU that supports
AVX-2 but not SHA-NI instructions (e.g. Haswell, Broadwell, Skylake),
the "multibuffer" implementations of SHA-1, SHA-256, and SHA-512 are the
highest priority. However, these implementations only perform well when
many hash requests are being submitted concurrently, filling all 8 AVX-2
lanes. Otherwise, they are incredibly slow, as they waste time waiting
for more requests to arrive before proceeding to execute each request.
For example, here are the speeds I see hashing 4096-byte buffers with a
single thread on a Haswell-based processor:
generic avx2 mb (multibuffer)
------- -------- ----------------
sha1 602 MB/s 997 MB/s 0.61 MB/s
sha256 228 MB/s 412 MB/s 0.61 MB/s
sha512 312 MB/s 559 MB/s 0.61 MB/s
So, the multibuffer implementation is 500 to 1000 times slower than the
other implementations. Note that with smaller buffers or more update()s
per digest, the difference would be even greater.
I believe the vast majority of people are in the boat where the
multibuffer code is much slower, and only a small minority are doing the
highly parallel, hashing-intensive, latency-flexible workloads (maybe
IPsec on servers?) where the multibuffer code may be beneficial. Yet,
people often aren't familiar with all the crypto config options and so
the multibuffer code may inadvertently be built into the kernel.
Also the multibuffer code apparently hasn't been very well tested,
seeing as it was sometimes computing the wrong SHA-256 digest.
So, let's make the multibuffer algorithms low priority. Users who want
to use them can either request them explicitly by driver name, or use
NETLINK_CRYPTO (crypto_user) to increase their priority at runtime.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The HASH_FIRST flag is never set. Remove it.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
We recently added some new locking but missed the unlocks on these
error paths in sha512_ctx_mgr_submit().
Fixes: c459bd7beda0 ("crypto: sha512-mb - Protect sha512 mb ctx mgr access")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The flusher and regular multi-buffer computation via mcryptd may race with another.
Add here a lock and turn off interrupt to to access multi-buffer
computation state cstate->mgr before a round of computation. This should
prevent the flusher code jumping in.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Current multi-buffer hash implementations have a restriction on the total
length of a hash job to 512MB. Hashing larger buffers will result in an
incorrect hash. This extends the limit to 2^62 - 1.
Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
1. fix ctx pointer
Use req_ctx which is the ctx for the next job that have
been completed in the lanes instead of the first
completed job rctx, whose completion could have been
called and released.
Signed-off-by: Xiaodong Liu <xiaodong.liu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
for condition comparison and cleanup multiline comment style
In sha*_ctx_mgr_submit, we currently use the | operator instead of ||
((ctx->partial_block_buffer_length) | (len < SHA1_BLOCK_SIZE))
Switching it to || and remove extraneous paranthesis to
adhere to coding style.
Also cleanup inconsistent multiline comment style.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
This patch introduces the multi-buffer job manager which is responsible
for submitting scatter-gather buffers from several SHA512 jobs to the
multi-buffer algorithm. It also contains the flush routine that's called
by the crypto daemon to complete the job when no new jobs arrive before
the deadline of maximum latency of a SHA512 crypto job.
The SHA512 multi-buffer crypto algorithm is defined and initialized in this
patch.
Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|