Colin,

You might want to add a source code comment to crypto_scrypt() in crypto_scrypt-*.c saying that if the "for (i = 0; i < p; i++)" loop is ever actually parallelized, separate instances of XY and V will need to be allocated for each thread. It may be worth emphasizing in the comment that there may be fewer than "p" of these, since only one pair is needed per thread, not per loop iteration.

Thanks,

Alexander
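
P.S. To illustrate the point about per-thread XY and V, here is a rough sketch rather than a patch. It assumes OpenMP purely for illustration, and without OpenMP enabled the pragmas are ignored and the loop runs serially. smix_stub() and mix_all_blocks() are hypothetical names standing in for the real smix() call and the loop in crypto_scrypt(). The buffer sizes (256 * r bytes for XY, 128 * r * N bytes for V) are my assumption about the reference version and should be checked against the actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for smix(); it only touches its scratch buffers. */
static void
smix_stub(uint8_t *B, size_t r, uint64_t N, void *V, void *XY)
{
    memset(XY, 0, 256 * r);               /* XY is private scratch space */
    memset(V, 0, (size_t)(128 * r * N));  /* V is private scratch space */
    B[0] ^= 1;                            /* mark this block as processed */
}

/*
 * Sketch of a parallelized "for (i = 0; i < p; i++)" loop.
 * Returns 0 on success, -1 if any thread could not allocate its buffers
 * (in which case B must be treated as invalid).
 */
static int
mix_all_blocks(uint8_t *B, size_t r, uint64_t N, uint32_t p)
{
    int failed = 0;

    assert(r > 0 && N > 0);

#pragma omp parallel reduction(|:failed)
    {
        /*
         * One XY and one V per THREAD, not per value of i: the number
         * of threads, and so of buffer pairs, may be fewer than p.
         */
        void *XY = malloc(256 * r);
        void *V = malloc((size_t)(128 * r * N));
        int ok = (XY != NULL && V != NULL);
        int i;

        if (!ok)
            failed = 1;

#pragma omp for
        for (i = 0; i < (int)p; i++) {
            if (ok)
                smix_stub(&B[(size_t)i * 128 * r], r, N, V, XY);
        }

        free(V);
        free(XY);
    }

    return (failed ? -1 : 0);
}
```

With p = 4 and two threads, for example, this allocates two XY/V pairs and each thread reuses its own pair for the blocks it is given. The memory cost therefore scales with the thread count rather than with p.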