[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: crypto_scrypt-sse.c speedup

On Thu, Nov 15, 2012 at 06:47:16AM -0800, Colin Percival wrote:
> On 11/15/12 02:52, Solar Designer wrote:
> > The attached patch for crypto_scrypt-sse.c (against its revision in
> > scrypt-1.1.6.tgz) speeds it up by 30% on AMD Bulldozer (tested on
> > FX-8120) in -march=native or -mxop builds, and by 5% to 10% on Intel
> > CPUs (tested on Xeon E5649), when run on the official test vectors.
> Interesting!  What compiler is this?

gcc 4.6.x

> I'm surprised that the "inline"
> does anything, and it seems odd that the loop unrolling wouldn't happen
> automatically too.

I've tried compiling in two ways:

1. -march=native -O2 -fomit-frame-pointer
2. -march=native -O2 -fomit-frame-pointer -funroll-loops -finline-functions

The 5% to 10% speedup on Intel is for #1.  With #2, I've just measured a
speedup of 4% on the same E5649.

I think the compiler does some inlining and unrolling when that is
requested with optimization flags, but doing it explicitly helps anyway.

My use of the "inline" keyword is a bit selective, and in the unrolling
I rely on knowledge that the block size is a multiple of 64.  The
compiler would only be able to make use of such knowledge along with
inlining or by examining all calls to a non-inlined static function,
which is trickier, and it'd need to use the fact that "128 * r" is a
multiple of 64.  Perhaps modern compilers are capable of all that, but
I do see some performance difference even with -funroll-loops
-finline-functions somehow.

The 30% speedup on AMD Bulldozer is primarily due to the use of XOP bit
rotate intrinsics, indeed.  With only this one change and no other
changes, the speedup was about 25%.

> > Please let me know if I should add a copyright statement, although maybe
> > these changes are too minor to be subject to copyright.
> Up to you -- either declare your changes to be public domain (and don't add
> a copyright line in your name) or declare them to be 2-clause BSD licensed
> (and add a copyright line).  Either is fine with me, but I need you to pick
> one. :-)

Let's do the latter.  I've been placing some of my works in the public
domain until a few years ago, but some people expressed concern that it
might not be legally possible in my jurisdiction - so I had to add
fallbacks to permissive license to my public domain statements anyway.

So can you add this copyright line if/when you merge the changes? -

 * Copyright 2012 Solar Designer

or you may use my real name (Alexander Peslyak), or both.