[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: identical input, different output?

On Wed, Dec 10, 2014 at 01:16:02AM +1100, Ray Callaghan wrote:
> Perhaps if the SSE self check fails, rather than random, just revert to the
> NO-SSE implementation (after self-testing the no-sse version) - so it still
> returns valid data, exactly like it should.

... and what if it fails too?

I think what you describe makes sense not so much to deal with "the
Pentium 3 issue", but simply to support SSE2 in a binary that also works
on pre-SSE2 CPUs.  To do this, a source file (a new wrapper file or the
-nosse source file) would need to be built without gcc's -msse2 (or the
current compiler's equivalent).  It would need to check CPUID
(unfortunately, no "portable" intrinsic for it exists, so some #ifdef's
and/or inline asm there).  If SSE2 is detected, it would invoke the
function from the -sse source file.  Otherwise it'd proceed with the one
from -nosse.

> Sure, it doubles the program size, but on anything which might support SSE,
> nobody cares about a few kb of code.

However, we care about the associated complexity and risk of bugs.
Having a piece of almost-dead code in there, which is omitted from most
tests (thereby reducing their relative code coverage), is not great.

Having a runtime self-test even after the -nosse code was invoked helps
(and I think we should), but it means we do need to decide on how to
handle possible failure.

> And presumably, compilers should detect SSE issues, but if the
> compiler is broken, we can only do it at runtime, with a performance
> penalty; but we should probably still consider that a compiler issue.

I disagree.  A compiler is also not able to "detect SSE issues" other
than at runtime.

> I think there should definitely be a way for a caller to ask scrypt to do a
> self-check; regardless of sse, or any other issues; but we can apply a
> heuristic as to whether we automatically force the self-check, or whether
> it's only for the caller to explicitly ask for. Possibly, the compiler has
> a "yes" (like detecting 64 bit), so uses SSE without a forced self-check; a
> "no", uses the NO-SSE; and a "maybe", like the SSE flags are on, but it's a
> 32-bit - in which case, compile with SSE, but force a self-check on every
> usage. (or compile with both, and have a runtime downgrade to no-sse)
> Thoughts?

I think this would introduce too much complexity.  I think if we do
introduce a runtime self-test, it's better to invoke it all the time
(and it would need to be lightweight enough for that).

The way scrypt's SSE2 code runs on Pentium 3 era CPUs is just a special
case (albeit a rather unique one: both the compiler and the CPU work as
intended, yet the result is wrong).  There may be other issues, both
with -sse and -nosse.  I've already mentioned C strict aliasing issues
(OK, need to fix those in the source code) and possible compiler bugs.

If we do introduce a self-test, it'd be a pity to limit it to just this
one special case.