optimized/specialized/revised scrypt implementations
Hi,
@floodyberry on Twitter kindly pointed me at these:
https://github.com/pooler/cpuminer
https://github.com/floodyberry/scrypt-jane
cpuminer has highly optimized scrypt code that is specialized for
Litecoin's scrypt parameters. It is written in assembly for x86-64
(including a version using XOP extensions) and 32-bit x86 (both under
2-clause BSD), and for ARM (under GPLv2).
One curious aspect is that it includes a version with 3x interleave
(3 instances of scrypt are computed with intermixed instructions for
greater instruction-level parallelism). This confirms my gut feeling
that the Salsa20 core does not contain sufficient parallelism for some
current CPUs.
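
To illustrate the interleaving idea, here is a minimal sketch in portable C.
It is not code from cpuminer; the function names are made up for this
example. Each step of a Salsa20 quarter-round needs the result of the
previous step, so a single instance forms a serial dependency chain. Issuing
the same step for three independent instances back to back gives the CPU
three unrelated operations it can overlap.

```c
#include <stdint.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* One Salsa20 quarter-round. Every line depends on the line before it. */
void salsa20_quarterround(uint32_t y[4])
{
    y[1] ^= ROTL32(y[0] + y[3], 7);
    y[2] ^= ROTL32(y[1] + y[0], 9);
    y[3] ^= ROTL32(y[2] + y[1], 13);
    y[0] ^= ROTL32(y[3] + y[2], 18);
}

/*
 * The same quarter-round for three independent instances a, b and c,
 * with their steps intermixed. Within each group of three lines there
 * are no dependencies, so the operations can execute in parallel.
 * (Hypothetical helper, for illustration only.)
 */
void salsa20_quarterround_x3(uint32_t a[4], uint32_t b[4], uint32_t c[4])
{
    a[1] ^= ROTL32(a[0] + a[3], 7);
    b[1] ^= ROTL32(b[0] + b[3], 7);
    c[1] ^= ROTL32(c[0] + c[3], 7);

    a[2] ^= ROTL32(a[1] + a[0], 9);
    b[2] ^= ROTL32(b[1] + b[0], 9);
    c[2] ^= ROTL32(c[1] + c[0], 9);

    a[3] ^= ROTL32(a[2] + a[1], 13);
    b[3] ^= ROTL32(b[2] + b[1], 13);
    c[3] ^= ROTL32(c[2] + c[1], 13);

    a[0] ^= ROTL32(a[3] + a[2], 18);
    b[0] ^= ROTL32(b[3] + b[2], 18);
    c[0] ^= ROTL32(c[3] + c[2], 18);
}
```

The interleaved version computes exactly the same results as three separate
calls. Whether it actually runs faster depends on the CPU and on how well
the compiler or hand-written assembly schedules the instructions.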
scrypt-jane also contains optimized assembly and intrinsics code, but it
is not specialized. It has a modular design, where the mixing function
can be swapped from Salsa20/8 to any other, and it provides a version
with ChaCha20/8.
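
A replaceable mixing function could be sketched as follows. This is only an
illustration of the idea and makes no claim about how scrypt-jane is
actually structured; the mix_fn type, blockmix() and toy_mix() are
hypothetical names. The sketch is scrypt's BlockMix step with the 64-byte
core passed in as a function pointer.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_WORDS 16 /* one 64-byte block as sixteen 32-bit words */

/* Hypothetical interface: any in-place transform of one 64-byte block. */
typedef void (*mix_fn)(uint32_t block[BLOCK_WORDS]);

/*
 * scrypt's BlockMix over 2*r blocks, with the core supplied by the caller.
 * "in" and "out" must not overlap.
 */
void blockmix(mix_fn mix, const uint32_t *in, uint32_t *out, size_t r)
{
    uint32_t x[BLOCK_WORDS];
    size_t i, k;

    /* Start from the last input block. */
    memcpy(x, &in[(2 * r - 1) * BLOCK_WORDS], sizeof(x));

    for (i = 0; i < 2 * r; i++) {
        for (k = 0; k < BLOCK_WORDS; k++)
            x[k] ^= in[i * BLOCK_WORDS + k];
        mix(x);
        /* Even-numbered results fill the first half, odd the second. */
        memcpy(&out[(i / 2 + (i & 1) * r) * BLOCK_WORDS], x, sizeof(x));
    }
}

/* Placeholder core for trying the plumbing; NOT a real mixing function. */
void toy_mix(uint32_t block[BLOCK_WORDS])
{
    size_t k;
    for (k = 0; k < BLOCK_WORDS; k++)
        block[k] += 1;
}
```

A call such as blockmix(toy_mix, in, out, r) would then become
blockmix(some_salsa20_8, in, out, r) or blockmix(some_chacha20_8, in, out, r)
given real cores with the same signature. An optimized implementation would
more likely select the core at compile time, so that it can be inlined.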
In my opinion, ChaCha20 is an improvement in that it can use SSSE3
shuffles, but if we do deviate from the original scrypt, we should do a
lot more than just this. (I have specific concerns and ideas.)
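
For reference, here is a small portable C sketch of why shuffles apply to
ChaCha. The helper names are made up for this example. The ChaCha
quarter-round rotates by 16, 12, 8 and 7 bits, whereas Salsa20 rotates by 7,
9, 13 and 18. A rotate by 8 or 16 bits only moves whole bytes around, which
is the kind of operation a byte shuffle such as SSSE3's pshufb performs in
one instruction, instead of two shifts and an OR. The second function below
spells out the rotate by 8 as a byte permutation in scalar code.

```c
#include <stdint.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* ChaCha quarter-round: rotation counts 16, 12, 8, 7. */
void chacha_quarterround(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = ROTL32(*d, 16);
    *c += *d; *b ^= *c; *b = ROTL32(*b, 12);
    *a += *b; *d ^= *a; *d = ROTL32(*d, 8);
    *c += *d; *b ^= *c; *b = ROTL32(*b, 7);
}

/*
 * Rotate left by 8 written as a pure byte permutation: the top byte
 * moves to the bottom and the other three bytes each move up one place.
 * (Hypothetical helper, for illustration only.)
 */
uint32_t rotl8_as_byte_permutation(uint32_t v)
{
    uint32_t b0 = v & 0xff;
    uint32_t b1 = (v >> 8) & 0xff;
    uint32_t b2 = (v >> 16) & 0xff;
    uint32_t b3 = v >> 24;

    return b3 | (b0 << 8) | (b1 << 16) | (b2 << 24);
}
```

The rotates by 12 and 7 are not byte-aligned and still need shifts, so the
shuffle covers only two of the four rotations per quarter-round.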
Alexander