[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

scrypt crypto primitive (was: revising scrypt for password hashing)

On Wed, Mar 27, 2013 at 10:21:34AM +0400, Solar Designer wrote:
> Another replacement I am considering is sort of "wide Blowfish" - an
> extension of Blowfish to SIMD vectors - which may be shown as being
> about as fast to compute as normal Blowfish (key setup) in bcrypt, but
> no quicker to attack than bcrypt (including on GPUs and specialized
> devices).

As it relates to attacks with ASICs, the Passwords^12 revision of scrypt


gives a table on slide 51 comparing different crypto primitives for use
in scrypt.  Specifically, it compares SHA-256, Blowfish, AES-128,
Salsa20/8, and Keccak - and it shows how Salsa20/8 is best among these.
The score is calculated as SW^2/HW, where SW and HW are performance in
software and hardware (for one crypto core), respectively.  This formula
is such because a faster crypto primitive lets us not only compute it
more times, but also use more memory given the same target duration.
The die area consumed by the crypto core is not considered because the
die area cost is assumed to be almost exclusively from the memory needs.

The table gives:

SW = 800 Mbps, HW = 1000 Mbps, Score = 640

SW = 1200 Mbps, HW = 40000 Mbps, Score = 36

SW = 2000 Mbps, HW = 2000 Mbps, Score = 2000

The HW speeds given here might not actually be directly comparable as I
think the fast AES core's latency might not be proportionally lower, and
for use in scrypt it matters - although it depends on how AES is used.

Now let's see what happens if we manage to extend Blowfish to 128-bit
SIMD while not reducing its speed per 32-bit SIMD vector element:

(4*800)^2/(4*1000) = 2560

For an actual block cipher, such extension could be difficult, but in
scrypt our use of Blowfish would be less demanding, so I think this is

Two other knobs are the rounds count and the instruction-level
parallelism factor.  For example, we might halve the rounds count, but
mix instructions from two instances of the crypto primitive (computed on
slightly different inputs), then XOR the two outputs together.  Would
the resulting scheme remain secure enough for our data mixing use?
Perhaps it would.  Assuming that our CPU can execute the two inter-mixed
instances as fast as it does one instance (per round), we get:

For 2*Salsa20/4:

(2*2000)^2/(2*2000) = 4000

For 2*wideBlowfish/8:

(2*4*800)^2/(2*4*1000) = 5120

However, as we make the crypto primitive faster, we're more likely to
bump into the memory bandwidth, which would limit the actual score.