Re: Backing up large unchanging files



On 7/17/25 13:37, Tim Bishop wrote:
On Thu, Jul 17, 2025 at 07:20:45PM +0000, Colin Percival wrote:
Aha!  I had completely forgotten about this -- it's something I implemented
in 2008 -- but yes, there's a scenario where tarsnap switches into "lowmem"
mode.  Specifically, if you're archiving a large number of small files, the
default ("normalmem") regime can end up caching a lot of archive data; to
avoid this, tarsnap keeps track of the memory used to track "trailers" (aka
data at the end of a file which is too small to be its own block) and stops
storing those if they're taking up more memory than tarsnap is using to keep
track of complete blocks of data.
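
As a rough illustration of the heuristic described above, here is a minimal Python sketch. It is not tarsnap's code: the class name, the fixed 32-byte per-block record size, and the choice to keep trailers that were already cached before the switch are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkCacheSketch:
    """Toy model of the trailer heuristic; names and sizes are hypothetical."""

    block_record_bytes: int = 0   # memory spent tracking complete blocks
    trailer_bytes: int = 0        # memory spent holding cached trailers
    lowmem: bool = False          # once set, no further trailers are stored
    trailers: dict = field(default_factory=dict)

    def add_file(self, path: str, n_blocks: int, trailer: bytes,
                 record_size: int = 32) -> None:
        # Complete blocks are always tracked (assumed fixed-size records).
        self.block_record_bytes += n_blocks * record_size
        if self.lowmem or not trailer:
            return
        self.trailers[path] = trailer
        self.trailer_bytes += len(trailer)
        # Switch regimes once trailers cost more than the block records.
        if self.trailer_bytes > self.block_record_bytes:
            self.lowmem = True
```

In this model a large file added after the switch still has its complete blocks tracked, but its trailer is not cached, which matches the situation described in the message.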

The good news here is that
1. In this scenario, *most* large files still get completely cached; it's just
an unlucky few (around 5-10% of them) which happen to be a number of complete
chunks plus a small extra bit which get affected, and
2. Tarsnap is still caching the list of complete chunks, so while it has to
re-read the file every time, it's doing the far less CPU-intensive process of
computing hashes to verify that the data hasn't changed, rather than running
all of the data through the (considerably more CPU-intensive) chunking code.
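
Point 2 can be pictured with a short Python sketch: given a cached list of chunk lengths and hashes, a file can be re-read and checked chunk by chunk without searching for chunk boundaries again. This is only an illustration under assumptions: plain SHA-256 and the `(length, hex digest)` cache layout are stand-ins, and `verify_cached_chunks` is a hypothetical name, not a tarsnap function.

```python
import hashlib
import io


def verify_cached_chunks(stream, cached):
    """Check `stream` against cached (length, sha256 hex digest) pairs.

    Returns the bytes left after the last cached chunk (the uncached
    trailer) if every chunk matches, or None if any chunk differs.
    """
    for length, digest in cached:
        data = stream.read(length)
        if len(data) != length or hashlib.sha256(data).hexdigest() != digest:
            return None
    return stream.read()


# Demo input: two cached 8-byte chunks followed by a 4-byte uncached trailer.
parts = [b"a" * 8, b"b" * 8]
cached = [(len(p), hashlib.sha256(p).hexdigest()) for p in parts]
unchanged = io.BytesIO(b"".join(parts) + b"tail")
leftover = verify_cached_chunks(unchanged, cached)
```

The whole file is still read, but each chunk costs one hash computation; only the leftover bytes would need to be processed again.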

Hmm, in this particular archive there's <30k files to back up. Is that a
"large number"? It's mostly small files, plus a very small percentage of
huge ones.

It's not so much "number" as "proportion".  Or "average file size" really --
the calculation basically depends on the combined sizes of files < 4 kB as a
fraction of the total amount of data stored.
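
The quantity mentioned above can be written down as a small Python sketch. Treating "4 kB" as 4096 bytes is an assumption, `small_file_fraction` is a hypothetical helper, and the message does not say at what value of this fraction tarsnap changes mode.

```python
def small_file_fraction(sizes, threshold=4096):
    """Fraction of total bytes that belongs to files smaller than `threshold`.

    `sizes` is an iterable of file sizes in bytes; the 4096-byte default
    is an assumed reading of "4 kB".
    """
    sizes = list(sizes)
    total = sum(sizes)
    if total == 0:
        return 0.0
    small = sum(s for s in sizes if s < threshold)
    return small / total
```

For example, `small_file_fraction([1000, 1000, 1000, 1_000_000])` divides the 3000 bytes held in small files by the 1,003,000 bytes stored overall.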

I *think* I can address this with a patch to say "cache trailers on large
files even if we've decided on our own to switch into lowmem mode".  You've
been using Tarsnap for a long time; can I assume that you can test a patch
for me once I've written something?

Absolutely!

Can you try this patch?

https://github.com/Tarsnap/tarsnap/commit/61dab1292c19510f98d7dc7266b0e9320457ee97

--
Colin Percival
FreeBSD Release Engineering Lead & EC2 platform maintainer
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid