
optimized/specialized/revised scrypt implementations


@floodyberry on Twitter kindly pointed me at these:


cpuminer has scrypt code that is specialized for Litecoin's scrypt
parameters, yet highly optimized.  It is written in assembly for x86-64
(including a version using the XOP extensions) and 32-bit x86 (both
under the 2-clause BSD license), and for ARM (under GPLv2).

One curious aspect is that it includes a version with 3x interleave
(3 instances of scrypt are computed with intermixed instructions for
greater instruction-level parallelism).  This confirms my gut feeling
that the Salsa20 core does not contain sufficient parallelism for some
current CPUs.
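
To illustrate the interleaving idea, here is a minimal sketch in plain
C.  It is not cpuminer's code (which is assembly and works on whole
SIMD vectors); the function names quarter1 and quarter3 are made up for
this example.  Within one Salsa20 quarter-round each step consumes the
result of the previous step, so a single instance forms a serial
dependency chain.  Issuing the same step for three independent
instances back to back gives the CPU three chains to overlap.

```c
#include <stdint.h>

static uint32_t rotl32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

/* One Salsa20 quarter-round on one state.  Every line needs the value
 * written by the line before it, so the four steps cannot overlap. */
static void quarter1(uint32_t y[4])
{
    y[1] ^= rotl32(y[0] + y[3], 7);
    y[2] ^= rotl32(y[1] + y[0], 9);
    y[3] ^= rotl32(y[2] + y[1], 13);
    y[0] ^= rotl32(y[3] + y[2], 18);
}

/* The same quarter-round on three independent states, interleaved.
 * The three statements in each group do not depend on each other, so
 * an out-of-order or multi-issue CPU can execute them in parallel. */
static void quarter3(uint32_t a[4], uint32_t b[4], uint32_t c[4])
{
    a[1] ^= rotl32(a[0] + a[3], 7);
    b[1] ^= rotl32(b[0] + b[3], 7);
    c[1] ^= rotl32(c[0] + c[3], 7);

    a[2] ^= rotl32(a[1] + a[0], 9);
    b[2] ^= rotl32(b[1] + b[0], 9);
    c[2] ^= rotl32(c[1] + c[0], 9);

    a[3] ^= rotl32(a[2] + a[1], 13);
    b[3] ^= rotl32(b[2] + b[1], 13);
    c[3] ^= rotl32(c[2] + c[1], 13);

    a[0] ^= rotl32(a[3] + a[2], 18);
    b[0] ^= rotl32(b[3] + b[2], 18);
    c[0] ^= rotl32(c[3] + c[2], 18);
}
```

The results are identical to calling quarter1 on each state in turn;
only the instruction order changes.  A compiler may reorder scalar code
like this on its own, which is presumably why hand-written assembly is
used when the interleaving matters.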

scrypt-jane also contains optimized assembly and intrinsics code, but it
is not specialized.  It has a modular design, where the mixing function
can be changed from Salsa20/8 to any other, and it provides a version
with ChaCha20/8.

In my opinion, ChaCha20 is an improvement in terms of using SSSE3
shuffles, but if we do deviate from the original scrypt, we should do a
lot more than just this.  (I have specific concerns and ideas.)
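
As a rough illustration of the shuffle point: Salsa20 rotates by 7, 9,
13 and 18 bits, none of which is a multiple of 8, whereas the ChaCha
quarter-round rotates by 16, 12, 8 and 7.  A rotate by 16 or 8 bits
only moves whole bytes, so in SIMD code it can be done with one SSSE3
byte shuffle (the _mm_shuffle_epi8 intrinsic) instead of two shifts and
an OR.  The sketch below is portable scalar C, not scrypt-jane's code,
and the helper name rotl_bytes is made up for this example.  It does
the two byte-aligned rotates purely by moving bytes.

```c
#include <stdint.h>

static uint32_t rotl32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

/* Rotate left by nbytes whole bytes (1..3) using only byte moves.
 * This is the scalar analogue of what a byte shuffle does to each
 * 32-bit lane of a vector. */
static uint32_t rotl_bytes(uint32_t v, unsigned nbytes)
{
    uint32_t out = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint32_t byte = (v >> (8 * i)) & 0xff;
        out |= byte << (8 * ((i + nbytes) % 4));
    }
    return out;
}

/* ChaCha quarter-round with the byte-aligned rotates singled out. */
static void chacha_quarter(uint32_t *a, uint32_t *b,
                           uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl_bytes(*d, 2); /* <<< 16: shuffle */
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);    /* shifts and OR   */
    *a += *b; *d ^= *a; *d = rotl_bytes(*d, 1); /* <<< 8: shuffle  */
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);     /* shifts and OR   */
}
```

So two of the four rotates per quarter-round become single shuffles,
while the rotates by 12 and 7 still need the usual shift sequence.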