[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Miscompilation under strict C aliasing rules
Thanks for the report, and to all discussion about this!
We fixed this in scrypt over a month ago:
2022-04-01 scrypt: Fix strict aliasing
https://github.com/Tarsnap/scrypt/commit/209fd279c9357010d1dabd446c458dfeb9820e6c
but it occurred to me that I should add something for the benefit of the email
archives. The above commit fixes both the SSE2 and non-SSE2 code. If anybody
is interested in the detailed discussion,
https://github.com/Tarsnap/scrypt/pull/333
You mentioned that looping over bytes (instead of uint32_t *) could end up
being faster (with a modern compiler). I did find that to be true (+3% on
clang, +7.5% on gcc), but that was for running non-SSE2 code on an x86 system.
We decided not to make that change, since it complicated the code somewhat,
and approximately 0% of x86 users should use non-SSE2 code, and we would need
to benchmark a range of different non-x86 systems (and compilers) to get a
sense of the effects of such a change. If anybody is curious about this, the
commit is available:
https://github.com/Tarsnap/scrypt/tree/DO-NOT-MERGE-faster-plain-smix
Cheers,
- Graham Percival
On Thu, Feb 24, 2022 at 09:42:06AM +0100, Quentin Carbonneaux wrote:
> Hi,
>
> Yesterday evening my archival tool started refusing correct
> passphrases. A bit of debugging revealed a difference of
> behavior in the blockmix_salsa8() function when compiled
> with GCC 11.2.0 (straight from the Debian package) with
> optimization flags -O3.
>
> Here is the offending assembly:
>
> 0x00005555555b6fdf <+351>: mov (%rbx),%r14
> A) 0x00005555555b6fe2 <+354>: xor 0x40(%r12),%r14
> 0x00005555555b6fe7 <+359>: mov %rbx,%rdi
> B) 0x00005555555b6fea <+362>: mov %r14,(%rbx)
> 0x00005555555b6fed <+365>: mov 0x48(%r12),%rax
> 0x00005555555b6ff2 <+370>: xor %rax,0x8(%rbx)
> 0x00005555555b6ff6 <+374>: mov 0x50(%r12),%rax
> 0x00005555555b6ffb <+379>: xor %rax,0x10(%rbx)
> 0x00005555555b6fff <+383>: mov 0x58(%r12),%rax
> 0x00005555555b7004 <+388>: xor %rax,0x18(%rbx)
> 0x00005555555b7008 <+392>: mov 0x60(%r12),%rax
> 0x00005555555b700d <+397>: xor %rax,0x20(%rbx)
> 0x00005555555b7011 <+401>: mov 0x68(%r12),%rax
> 0x00005555555b7016 <+406>: xor %rax,0x28(%rbx)
> 0x00005555555b701a <+410>: mov 0x70(%r12),%rax
> 0x00005555555b701f <+415>: xor %rax,0x30(%rbx)
> 0x00005555555b7023 <+419>: mov 0x78(%r12),%rax
> 0x00005555555b7028 <+424>: xor %rax,0x38(%rbx)
> C) 0x00005555555b702c <+428>: call 0x5555555b6b40 <salsa20_8>
> 0x00005555555b7031 <+433>: mov -0x58(%rbp),%rdi
> 0x00005555555b7035 <+437>: mov -0x48(%rbp),%rcx
> 0x00005555555b7039 <+441>: mov %r12,-0x50(%rbp)
> 0x00005555555b703d <+445>: mov -0x38(%rbp),%rsi
> D) 0x00005555555b7041 <+449>: mov %r14,(%rdi,%rcx,4)
>
> The root of the problem is that the store B) resulting
> from inlining blkxor() line 134 of blockmix_salsa8 [0]
> is forwarded to D). And this despite the call C).
>
> This effectively results in a dysfunctional implementation
> of the primitive.
>
> The compiler is technically allowed to forward the store
> because blkcpy (and also blkxor) breaks the strict aliasing
> rules of C. (See 6.5.7. in the C99 standard, for example.)
>
> On my end, I will replace blkcpy() with memcpy() and have
> blkxor() act on bytes. I hope this can be fixed upstream
> as well.
>
> Cheers!
>
> [0] https://github.com/Tarsnap/scrypt/blob/6154fe1ab139a30a733ef4f831ef0912c8b11b6e/lib/crypto/crypto_scrypt_smix.c