
scrypt Integerify



Colin,

Curiously, the scrypt paper does not appear to fully define Integerify().
The code uses:

static inline uint64_t
integerify(void * B, size_t r)
{
	uint32_t * X = (void *)((uintptr_t)(B) + (2 * r - 1) * 64);

	return (((uint64_t)(X[13]) << 32) + X[0]);
}

However, the following also works fine on little-endian systems.  It
gives the same result unless N exceeds 2^32, where the two would differ:

static inline uint64_t
integerify(void * B, size_t r)
{
	return *(uint64_t *)((uintptr_t)(B) + (2 * r - 1) * 64);
}

Obviously, this is slightly smaller and faster code.
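
As a rough sketch of why the two agree for realistic N, both variants can
be placed side by side and compared after reduction mod N.  The names
integerify_13, integerify_64 and same_index are made up for this
illustration and are not in the scrypt tree; N is assumed to be a power
of 2 and the host is assumed to be little-endian:

```c
#include <stdint.h>
#include <string.h>

/* Current code: high half from X[13], low half from X[0] of the last
 * 64-byte block.  B is assumed to be suitably aligned for uint32_t. */
static uint64_t
integerify_13(const void * B, size_t r)
{
	const uint32_t * X =
	    (const uint32_t *)((const uint8_t *)B + (2 * r - 1) * 64);

	return (((uint64_t)(X[13]) << 32) + X[0]);
}

/* Proposed code: one 64-bit load from the start of the last block.
 * memcpy is used here only to sidestep alignment and aliasing concerns. */
static uint64_t
integerify_64(const void * B, size_t r)
{
	uint64_t v;

	memcpy(&v, (const uint8_t *)B + (2 * r - 1) * 64, sizeof(v));
	return (v);
}

/* Returns 1 if both variants pick the same index j for this N
 * (hypothetical helper; N must be a power of 2). */
static int
same_index(const void * B, size_t r, uint64_t N)
{
	return ((integerify_13(B, r) & (N - 1)) ==
	    (integerify_64(B, r) & (N - 1)));
}
```

The two values share their low 32 bits (both come from X[0]), so the
index can only differ once N - 1 has bits set above bit 31.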

Since the test vectors given in the scrypt paper only go up to 2^20 for
N, which is way below 2^32, and in fact 2^32 would correspond to at
least 512 GiB of memory, can we possibly change scrypt code as above
sooner rather than later?  We need to do it some years before people
possibly start using scrypt with 512+ GiB of memory.  Note that this is
not even a redefinition of scrypt, because the paper does not define
Integerify. :-)  The paper gives test vectors, which will continue to
apply despite this change (I've just tested this to make sure).
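
To spell out the 512 GiB figure: scrypt's V array holds N blocks of
128 * r bytes each, so with the smallest block size parameter (r = 1)
and N = 2^32 it takes 2^39 bytes.  A tiny sketch, where the function
name scrypt_v_bytes is made up for illustration:

```c
#include <stdint.h>

/* Size in bytes of scrypt's V array: N blocks of 128 * r bytes each. */
static uint64_t
scrypt_v_bytes(uint64_t N, uint32_t r)
{
	return (128 * (uint64_t)r * N);
}
```

Larger r only raises the memory needed at a given N, hence "at least".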

Perhaps the intent behind using X[13] was to use Salsa20's X3 output
here, so that Integerify can't be computed until Salsa20 is fully
computed.  On one hand, this goal was not achieved for currently
realistic amounts of memory anyway (only X[0] matters while N is at most
2^32); on the other, it was not needed, since further computation
depends on the full value of X anyway.  While we probably don't want to
reveal the memory access pattern too much in advance, knowing where the
very next V_j is to be fetched from slightly sooner may be beneficial on
typical computers (where scrypt is to be used defensively).

Can we get this change in?  Obviously, the source files that may be used
on big-endian will have to use 32-bit X[1] and X[0] - we'll just change
13 to 1 there (we can optionally add optimized code within an #ifdef).
In the -sse file, we'll optimize things as above since SSE2 implies that
we're on x86, which is little-endian.
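
For the non-SSE files, the change might look roughly like the sketch
below.  This is only an illustration: it assumes the compiler provides
the GCC/Clang-style __BYTE_ORDER__ macro, and it assumes X holds
native-endian 32-bit words as in the existing code.  The memcpy calls
are my own addition to avoid alignment and aliasing questions.

```c
#include <stdint.h>
#include <string.h>

static inline uint64_t
integerify(const void * B, size_t r)
{
	/* Start of the last 64-byte block of B. */
	const uint8_t * p = (const uint8_t *)B + (2 * r - 1) * 64;
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	/* Optimized path: a single 64-bit load (assumes the macro above). */
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return (v);
#else
	/* Generic path: same as the current code with 13 changed to 1. */
	uint32_t X[2];

	memcpy(X, p, sizeof(X));
	return (((uint64_t)(X[1]) << 32) + X[0]);
#endif
}
```

Both paths return X[1] in the high half and X[0] in the low half, so
they should agree with each other regardless of which one is compiled.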

Even if we don't make this change, attackers will, even if it's just a
1% overall speedup or less.

Alexander