On Thu, Jul 17, 2025 at 07:20:45PM +0000, Colin Percival wrote:
Aha! I had completely forgotten about this -- it's something I implemented
in 2008 -- but yes, there's a scenario where tarsnap switches into "lowmem"
mode. Specifically, if you're archiving a large number of small files, the
default ("normalmem") regime can end up caching a lot of archive data; to
avoid this, tarsnap keeps track of the memory used to store "trailers" (i.e.
data at the end of a file which is too small to be its own block) and stops
storing those if they're taking up more memory than tarsnap is using to keep
track of complete blocks of data.
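In rough outline, the switch amounts to something like this (a sketch only;
the names are illustrative, not the actual internals):

#include <stddef.h>

/* Illustrative sketch only; not tarsnap's real data structures. */
struct cache_stats {
        size_t chunk_bytes;     /* Memory spent tracking complete chunks. */
        size_t trailer_bytes;   /* Memory spent storing trailers. */
        int store_trailers;     /* Are we still caching trailers? */
};

/*
 * Stop storing trailers once they cost more memory than the records for
 * complete chunks do -- the "lowmem" behaviour described above.
 */
static void
maybe_switch_to_lowmem(struct cache_stats *cs)
{

        if (cs->trailer_bytes > cs->chunk_bytes)
                cs->store_trailers = 0;
}
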
The good news here is that
1. In this scenario, *most* large files still get completely cached; it's just
an unlucky few (around 5-10% of them) -- the ones which happen to be a number
of complete chunks plus a small extra bit -- which are affected, and
2. Tarsnap is still caching the list of complete chunks, so while it has to
re-read the file every time, it's doing the far less CPU-intensive process of
computing hashes to verify that the data hasn't changed, rather than running
all of the data through the (considerably more CPU-intensive) chunking code;
see the sketch below.
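Roughly speaking, the per-archive work for one of those unlucky files then
looks like this (again, illustrative names rather than the real code; the
helpers are only declared here to sketch the shape of it):

#include <stddef.h>

/* Illustrative stand-ins only; these are not tarsnap's real functions. */
struct file_entry;                              /* Stands in for a cache entry. */

size_t file_nchunks(struct file_entry *);       /* Complete chunks in the file. */
int hash_matches_cached_chunk(struct file_entry *, size_t);
int rechunk_and_store(struct file_entry *);     /* Expensive chunking path. */
int store_trailer(struct file_entry *);         /* Handle the uncached tail. */

static int
archive_file_lowmem(struct file_entry *fe)
{
        size_t i;

        /* Cheap path: re-read and hash each chunk to confirm it's unchanged. */
        for (i = 0; i < file_nchunks(fe); i++)
                if (!hash_matches_cached_chunk(fe, i))
                        return (rechunk_and_store(fe));

        /* Only the small uncached trailer has to be handled from scratch. */
        return (store_trailer(fe));
}
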
Hmm, in this particular archive there are <30k files to back up. Is that a
"large number"? It's mostly small files, plus a very small percentage of
huge ones.