
Re: crypto_scrypt-sse.c speedup

Colin -

What's the point in having blockmix_salsa8() use externally-provided
temporary space (the X variable)?  Is it to ensure X is properly
aligned (regardless of stack (mis)alignment)?  Is it to ensure there's
no cache tag conflict between X and other arrays, in case we're on a
machine with low L1 data cache associativity?

I am getting some speedup by declaring X as a local variable in
blockmix_salsa8() instead:

	__m128i X[4];

The old speed on FX-8120 was (with my previous set of optimizations):

On Thu, Nov 15, 2012 at 02:52:11PM +0400, Solar Designer wrote:
> user@bull:~/scrypt/escrypt/escrypt-5$ time ./tests | md5sum
> 4455b1ce0529e7f877de53f24ff78bec  -
> real    0m2.732s
> user    0m2.184s
> sys     0m0.512s

with this change, it is:

$ time ./tests | md5sum
4455b1ce0529e7f877de53f24ff78bec  -

real    0m2.657s
user    0m2.080s
sys     0m0.540s

The actual speed varies a little from invocation to invocation, but
the above difference (roughly 3% in real time and 5% in user time) is
typical across many invocations of these two code revisions.

I think having X as a local variable lets the compiler keep it entirely
in registers.  When X is passed into the function by reference, the
compiler may emit unnecessary writes into the provided X array before
the function returns.  It may also be encouraged to do such writes
inside the loop, especially since the loop's iteration count is
determined by r and thus is not known at compile time (it might be low).
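
To illustrate the two variants side by side, here is a minimal sketch in
C with SSE2 intrinsics.  It is not the actual crypto_scrypt-sse.c code:
toy_mix() is a made-up stand-in for the salsa20/8 core, the output
ordering is simplified, and the names blockmix_ext(), blockmix_local()
and variants_agree() are hypothetical.  It assumes an x86 compiler with
SSE2 enabled.

```c
#include <assert.h>
#include <emmintrin.h>	/* SSE2 intrinsics */
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the salsa20/8 core; NOT the real function. */
static inline void toy_mix(__m128i X[4])
{
	X[0] = _mm_add_epi32(X[0], X[3]);
	X[1] = _mm_xor_si128(X[1], _mm_slli_epi32(X[0], 7));
	X[2] = _mm_add_epi32(X[2], X[1]);
	X[3] = _mm_xor_si128(X[3], _mm_srli_epi32(X[2], 9));
}

/* Variant A: scratch space X is provided by the caller.  X might alias
 * in or out, so the compiler may have to store X back to memory. */
void blockmix_ext(const __m128i *in, __m128i *out, __m128i *X, size_t r)
{
	size_t i, k;

	assert(r > 0);
	for (k = 0; k < 4; k++)
		X[k] = in[(2 * r - 1) * 4 + k];
	for (i = 0; i < 2 * r; i++) {
		for (k = 0; k < 4; k++)
			X[k] = _mm_xor_si128(X[k], in[i * 4 + k]);
		toy_mix(X);
		for (k = 0; k < 4; k++)
			out[i * 4 + k] = X[k];
	}
}

/* Variant B: X is local and its address never escapes, so the compiler
 * is free to keep all four vectors in XMM registers. */
void blockmix_local(const __m128i *in, __m128i *out, size_t r)
{
	__m128i X[4];
	size_t i, k;

	assert(r > 0);
	for (k = 0; k < 4; k++)
		X[k] = in[(2 * r - 1) * 4 + k];
	for (i = 0; i < 2 * r; i++) {
		for (k = 0; k < 4; k++)
			X[k] = _mm_xor_si128(X[k], in[i * 4 + k]);
		toy_mix(X);
		for (k = 0; k < 4; k++)
			out[i * 4 + k] = X[k];
	}
}

/* Returns 1 if both variants produce identical output for r = 2. */
int variants_agree(void)
{
	enum { R = 2, N = 2 * R * 4 };
	__m128i in[N], out_a[N], out_b[N], scratch[4];
	int i;

	for (i = 0; i < N; i++)
		in[i] = _mm_set_epi32(4 * i + 3, 4 * i + 2, 4 * i + 1, 4 * i);
	blockmix_ext(in, out_a, scratch, R);
	blockmix_local(in, out_b, R);
	return memcmp(out_a, out_b, sizeof(out_a)) == 0;
}
```

Comparing the assembly generated for the two functions (for example
with gcc -O2 -S) should show whether the stores to X go away in the
local variant; the result depends on the compiler and flags used.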