
Re: confusing differences in total storage stats



On 12/20/25 15:27, tarasnap@spag.fastmail.com wrote:
I've used tarsnap to back up a series of old cd/dvd/external drive backups, a lot of which were duplicate directories of photos.

It's costing more than I'd budgeted so I was looking to cut down what I was storing and wanted to see what was taking up the most space.

I used the `tarsnap --print-stats -f '*'` command from the docs but the output confuses me: the 'All archives (unique data)' line lists the compressed size as ~140GB, and this tallies with my monthly spend. But the sum of all the individual archives' compressed unique data only comes to ~41GB, which I wasn't expecting.

[...]

So either I've missed something or I've misunderstood how to interpret the stats, but I'm not sure how to figure out what I can prune if 70% of the storage is not directly attributed -- any pointers?

"Unique data" means "how much data is in this archive *and not any others*".
Or from a different perspective: It tells you how much data will be removed
if you delete that *one* archive.

If you have two identical archives, they'll both show very close to zero
"unique data" (just a very small amount of non-deduplicated metadata).

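As a rough illustration (a toy model, not Tarsnap's actual code or data format), you can think of each archive as a set of deduplicated blocks. The archive names, block IDs, and sizes below are all made up, and the small per-archive metadata is ignored:

```python
def unique_bytes(archives, block_size):
    """Per archive: bytes in blocks that appear in this archive and no other."""
    result = {}
    for name, blocks in archives.items():
        others = set()
        for other_name, other_blocks in archives.items():
            if other_name != name:
                others |= other_blocks
        result[name] = sum(block_size[b] for b in blocks - others)
    return result


def total_bytes(archives, block_size):
    """Bytes actually stored: each distinct block is counted once."""
    stored = set().union(*archives.values())
    return sum(block_size[b] for b in stored)


# Hypothetical data: two identical photo archives plus one that overlaps them.
block_size = {"a": 10, "b": 20, "c": 30, "d": 5}
archives = {
    "photos-2019": {"a", "b", "c"},
    "photos-2019-copy": {"a", "b", "c"},
    "documents": {"c", "d"},
}

per_archive = unique_bytes(archives, block_size)
total = total_bytes(archives, block_size)
print(per_archive, sum(per_archive.values()), total)
```

In this model both photo archives report zero unique bytes even though their blocks account for most of the stored total, so the per-archive figures add up to far less than the "all archives" figure.
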
So, of your ~140 GB of data, ~40 GB is blocks which are present in only one
archive and the other ~100 GB is blocks which appear in multiple archives.
Some of those blocks might appear in two archives; some might be found in
every single one of your archives.  (Tarsnap does actually know for each
block of data how many archives use it -- it needs this reference count in
order to know when a block can be deleted -- but there's no interface to
that information.)
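
To make the reference-count idea concrete, here is a small sketch under the same kind of toy model (archives as sets of block IDs; every name and number is hypothetical, and none of this is an interface Tarsnap provides). It groups stored bytes by how many archives reference each block, and estimates what deleting a *group* of archives together would free:

```python
from collections import Counter


def reference_counts(archives):
    """Map each block ID to the number of archives that contain it."""
    counts = Counter()
    for blocks in archives.values():
        counts.update(blocks)  # each block counts once per archive
    return counts


def bytes_by_refcount(archives, block_size):
    """Stored bytes grouped by how many archives share the block."""
    grouped = Counter()
    for block, n in reference_counts(archives).items():
        grouped[n] += block_size[block]
    return dict(grouped)


def freed_by_deleting(archives, block_size, doomed):
    """Bytes released if every archive named in `doomed` is deleted together."""
    counts = reference_counts(archives)
    for name in doomed:
        counts.subtract(archives[name])
    # A block can go only once no remaining archive references it.
    return sum(block_size[b] for b, n in counts.items() if n == 0)


# Hypothetical data, not real Tarsnap output.
block_size = {"a": 10, "b": 20, "c": 30, "d": 5}
archives = {
    "photos-2019": {"a", "b", "c"},
    "photos-2019-copy": {"a", "b", "c"},
    "documents": {"c", "d"},
}

grouped = bytes_by_refcount(archives, block_size)
freed_one = freed_by_deleting(archives, block_size, ["photos-2019"])
freed_both = freed_by_deleting(
    archives, block_size, ["photos-2019", "photos-2019-copy"]
)
print(grouped, freed_one, freed_both)
```

Deleting either photo archive alone frees nothing in this model, but deleting both frees the blocks that only those two shared. That is why the per-archive "unique data" numbers can understate what pruning a set of overlapping archives would save.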

--
Colin Percival
FreeBSD Release Engineering Lead & EC2 platform maintainer
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid