Re: crypto_scrypt-sse.c speedup
Colin -
What's the point in having blockmix_salsa8() use externally-provided
temporary space (the X variable)? Is it to ensure X is properly
aligned (regardless of stack (mis)alignment)? Is it to ensure there's
no cache tag conflict between X and other arrays, in case we're on a
machine with low L1 data cache associativity?
I am getting some speedup by declaring X as a local variable in
blockmix_salsa8() instead:
__m128i X[4];
The old speed on FX-8120 was (with my previous set of optimizations):
On Thu, Nov 15, 2012 at 02:52:11PM +0400, Solar Designer wrote:
> user@bull:~/scrypt/escrypt/escrypt-5$ time ./tests | md5sum
> 4455b1ce0529e7f877de53f24ff78bec -
>
> real 0m2.732s
> user 0m2.184s
> sys 0m0.512s
With this change, it is:
$ time ./tests | md5sum
4455b1ce0529e7f877de53f24ff78bec -
real 0m2.657s
user 0m2.080s
sys 0m0.540s
The actual speed varies from invocation to invocation a little bit, but
the above difference in real and user time is typical (across many
invocations) between these two code revisions.
I think having X as a local variable lets the compiler fully keep it in
registers, whereas having it passed into the function by reference may
result in unnecessary writes into the provided X array before the
function returns; it may also encourage the compiler to do such writes
inside the loop, especially since its iteration count is determined by r
and thus is not known at compile time (might be low).
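To make the two variants concrete, here is a minimal sketch in plain C. It is not the actual crypto_scrypt-sse.c code: toy_mix() is a made-up stand-in for the salsa20/8 core, plain uint32_t words are used instead of __m128i, and the names blockmix_ext() and blockmix_local() are hypothetical. The only point is the difference in where X lives. In the first variant the caller can observe X after the call, so its final contents have to be stored to memory; in the second, X is private to the function, so the compiler is free to keep it in registers if it chooses to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the salsa20/8 core: mixes 16 words in place. */
static void toy_mix(uint32_t x[16])
{
    for (int i = 0; i < 16; i++) {
        uint32_t s = x[i] + x[(i + 12) & 15];
        x[(i + 4) & 15] ^= (s << 7) | (s >> 25); /* rotate left by 7 */
    }
}

/* Variant A: scratch space X is provided by the caller.
 * X is visible outside the function, so its final value must be
 * written back to memory before returning. */
static void blockmix_ext(const uint32_t *in, uint32_t *out,
                         uint32_t *X, size_t r)
{
    assert(r > 0);
    memcpy(X, &in[(2 * r - 1) * 16], 64);
    for (size_t i = 0; i < 2 * r; i++) {
        for (int k = 0; k < 16; k++)
            X[k] ^= in[i * 16 + k];
        toy_mix(X);
        /* Even blocks go to the first half, odd blocks to the second. */
        memcpy(&out[(i / 2 + (i & 1) * r) * 16], X, 64);
    }
}

/* Variant B: X is a local array, private to this function.
 * Nothing outside can observe it, so no write-back is required. */
static void blockmix_local(const uint32_t *in, uint32_t *out, size_t r)
{
    uint32_t X[16];

    assert(r > 0);
    memcpy(X, &in[(2 * r - 1) * 16], 64);
    for (size_t i = 0; i < 2 * r; i++) {
        for (int k = 0; k < 16; k++)
            X[k] ^= in[i * 16 + k];
        toy_mix(X);
        memcpy(&out[(i / 2 + (i & 1) * r) * 16], X, 64);
    }
}
```

Both variants produce the same output for the same input; whether the second one is actually faster depends on the compiler and its inlining and register allocation decisions, so it is worth comparing the generated assembly rather than assuming.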
Alexander