So to make sure my crypto_scrypt() function will work on all CPUs, I should prefer crypto_scrypt-nosse.c over crypto_scrypt-sse.c?
Are there modern CPUs on which SSE won't work? The Apple A4, A5, ... chips on their iOS devices, maybe?
All x64 CPUs support SSE2. It is part of the baseline x86-64 instruction set, so it is mandatory rather than something every manufacturer just happened to include. Only older 32-bit x86 CPUs can lack it.
The Apple chips, like most tablet/phone chips, use the ARM instruction set (which is neither x86 nor x64), so they don't support SSE at all. ARM does have its own SIMD extension (NEON) with equivalent operations.
For best portability, use the nosse version, which will work everywhere. On any CPU that supports SSE, though, the sse version is much faster.
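If you want both, one option is to let the compiler pick at build time. The following is only a rough sketch: the SCRYPT_USE_SSE2 macro and the scrypt_impl_name() helper are made-up names for illustration, not part of scrypt, and it assumes the sse file needs SSE2. The predefined macros themselves are real: GCC and Clang define __SSE2__ when SSE2 code generation is enabled, and MSVC defines _M_X64 on x64 targets and _M_IX86_FP on 32-bit x86.

```c
#include <stddef.h>

/* Hypothetical switch (not part of scrypt): 1 if the target supports SSE2. */
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define SCRYPT_USE_SSE2 1
#else
#define SCRYPT_USE_SSE2 0
#endif

/* Hypothetical helper: names the source file a build would compile in.
   On ARM (e.g. Apple A4/A5) none of the macros above are defined,
   so this falls back to the portable version. */
static const char *scrypt_impl_name(void)
{
    return SCRYPT_USE_SSE2 ? "crypto_scrypt-sse.c" : "crypto_scrypt-nosse.c";
}
```

In a real build you would use the same condition in your makefile or project settings to decide which of the two files to compile, since both define crypto_scrypt(). Note this is a compile-time decision: a 32-bit x86 binary built with SSE2 enabled will still crash on a CPU without it.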