Re: crypto_scrypt-sse.c speedup

On Fri, Nov 16, 2012 at 02:12:46AM -0800, Colin Percival wrote:
> On 11/15/12 16:09, Solar Designer wrote:
> > The 30% speedup on AMD Bulldozer is primarily due to the use of XOP bit
> > rotate intrinsics, indeed.  With only this one change and no other
> > changes, the speedup was about 25%.
> Sounds like those are worth having... at the expense of needing yet another
> compile-time option (or run-time detection, ick).

I think we can start by using #ifdef __XOP__, as I did in the proposed
patch.  The compiler defines this macro whenever it is permitted to
generate XOP instructions.  For example, gcc defines it when run with
-mxop, with -march=native on a host whose CPU supports XOP, or with a
specific -march name that implies XOP support.
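
To illustrate the idea, here is a minimal sketch of compile-time
selection, not the code from the actual patch.  It assumes an x86 target
with SSE2 and gcc or clang; ROTL32X4 and rotl7_x4 are names made up for
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#ifdef __XOP__
#include <x86intrin.h>   /* declares the XOP intrinsics under gcc and clang */
#endif

/*
 * Rotate each 32-bit lane of x left by the constant n, 0 < n < 32.
 * The fallback evaluates x twice, so pass a plain variable.
 */
#ifdef __XOP__
#define ROTL32X4(x, n) _mm_roti_epi32((x), (n))
#else
#define ROTL32X4(x, n) \
    _mm_or_si128(_mm_slli_epi32((x), (n)), _mm_srli_epi32((x), 32 - (n)))
#endif

/* Hypothetical helper: rotate four 32-bit words left by 7 bits. */
static void rotl7_x4(const uint32_t in[4], uint32_t out[4])
{
    __m128i x;

    assert(in != NULL && out != NULL);
    x = _mm_loadu_si128((const __m128i *)in);
    x = ROTL32X4(x, 7);
    _mm_storeu_si128((__m128i *)out, x);
}
```

Built without -mxop this takes the shift/shift/or path; built with -mxop
it uses a single rotate instruction, and the binary then requires a CPU
with XOP.  Either way, rotl7_x4 maps 0x00000001 to 0x00000080.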

Yes, we could also have --enable-xop, which would add -mxop, and/or we
could have runtime detection.  The latter is best for users, but it is
also the most complicated option, especially if we want the code to be
almost as fast as each of the compile-time choices.
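
A rough sketch of what runtime detection could look like, under several
assumptions: gcc or clang on x86, in a version whose
__builtin_cpu_supports() accepts the "xop" feature name.  All function
names below are hypothetical, and both kernels are scalar stand-ins; in
a real build the XOP variant would be compiled separately with XOP
enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*rotl7_fn)(uint32_t *v, size_t n);

/* Stand-in for the generic (SSE2) kernel. */
static void rotl7_generic(uint32_t *v, size_t n)
{
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++)
        v[i] = (v[i] << 7) | (v[i] >> 25);
}

/* Stand-in for a kernel that would be built with XOP enabled. */
static void rotl7_xop(uint32_t *v, size_t n)
{
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++)
        v[i] = (v[i] << 7) | (v[i] >> 25);
}

/* Returns nonzero if the compiler's CPU check reports XOP support. */
static int cpu_has_xop(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("xop") != 0;
#else
    return 0;   /* unknown compiler or non-x86 target: use generic code */
#endif
}

/* Choose a kernel once; callers keep and reuse the returned pointer. */
static rotl7_fn select_rotl7(void)
{
    return cpu_has_xop() ? rotl7_xop : rotl7_generic;
}
```

Selecting once and dispatching at the level of a whole function, rather
than per rotate, keeps the indirect call out of the inner loop, which is
one way to stay close to the speed of the compile-time choices.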

> I'll put your legal name in, just in case...