On 21/12/2025 04:07, Colin Percival wrote:
> On 12/20/25 15:27, tarasnap@spag.fastmail.com wrote:
>> I used the `tarsnap --print-stats -f '*'` command from the docs but the output confuses me: the 'All archives (unique data)' line lists the compressed size as ~140GB, and this tallies with my monthly spend. But the sum of all the individual archives' compressed unique data only comes to ~41GB, which I wasn't expecting.
>
> "Unique data" means "how much data is in this archive *and not any others*". Or from a different perspective: It tells you how much data will be removed if you delete that *one* archive. If you have two identical archives, they'll both show very close to zero "unique data" (just a very small amount of non-deduplicated metadata).
>
> So, of your ~140 GB of data, ~40 GB is blocks which are present in only one archive and the other ~100 GB is blocks which appear in multiple archives.
>
> Some of those blocks might appear in two archives; some might be found in every single one of your archives. (Tarsnap does actually know for each block of data how many archives use it -- it needs this reference count in order to know when it can be deleted -- but there's no interface to that information.)
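Colin's definition can be illustrated with a toy model. This is a sketch under stated assumptions, not Tarsnap's actual implementation or anyone's real data: each archive is modelled as a set of hypothetical block IDs, each block has an invented compressed size in GB, and a block's reference count is simply the number of archives that contain it. An archive's "unique data" is then the size of its blocks with a count of 1, while the deduplicated total counts every distinct block once, so the per-archive figures need not add up to the total.

```python
from collections import Counter

# Hypothetical toy data (assumption): block ID -> compressed size in GB.
# Real Tarsnap blocks are far smaller and far more numerous; this only
# illustrates the accounting.
BLOCK_SIZES = {"a": 25, "b": 35, "c": 35, "d": 30, "e": 15}

# Hypothetical archives (assumption): archive name -> set of block IDs.
ARCHIVES = {
    "mon": {"a", "b", "d"},
    "tue": {"b", "c", "d"},
    "wed": {"c", "d", "e"},
}


def reference_counts(archives):
    """Count how many archives reference each block."""
    counts = Counter()
    for blocks in archives.values():
        counts.update(blocks)
    return counts


def unique_data(archives, sizes):
    """Per archive, the size of blocks that no other archive references.

    This is what deleting that one archive on its own would free.
    """
    counts = reference_counts(archives)
    return {
        name: sum(sizes[b] for b in blocks if counts[b] == 1)
        for name, blocks in archives.items()
    }


def total_stored(archives, sizes):
    """Deduplicated total: every distinct block counted exactly once."""
    return sum(sizes[b] for b in reference_counts(archives))


per_archive = unique_data(ARCHIVES, BLOCK_SIZES)
print(per_archive)                         # {'mon': 25, 'tue': 0, 'wed': 15}
print(sum(per_archive.values()))           # 40
print(total_stored(ARCHIVES, BLOCK_SIZES))  # 140
```

With these made-up sizes, only 40 of the 140 GB sits in blocks referenced by a single archive; the other 100 GB is shared, so it appears in no archive's unique figure at all.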
Thank you, that makes sense -- I'd come to the same conclusion having slept on it! I think my previous mental model of unique data was 'unique at the time of upload', with the later archives then being incremental, but that number wouldn't actually help me work out which archives would be the most beneficial to offload.
So in terms of outright minimisation, my best strategy is to remove the archives with the most unique data, since that's exactly what each deletion frees. To figure out anything more complicated in terms of keeping what's most valuable to me, I may have to get creative with dry-run options...
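One wrinkle with that strategy follows directly from Colin's definition: unique data has to be re-read after every deletion, because a block shared by exactly two archives becomes unique to the survivor once the other archive is gone. The following is a minimal sketch of a greedy loop that repeatedly removes the archive with the most unique data. The archive names, block IDs, sizes, and the `greedy_delete` helper are all hypothetical, and since Tarsnap exposes no per-block reference counts, this models the idea rather than anything you could feed from real output.

```python
from collections import Counter

# Hypothetical toy data (assumption): block ID -> compressed size in GB,
# and archive name -> set of block IDs.
BLOCK_SIZES = {"a": 25, "b": 35, "c": 35, "d": 30, "e": 15}
ARCHIVES = {
    "mon": {"a", "b", "d"},
    "tue": {"b", "c", "d"},
    "wed": {"c", "d", "e"},
}


def unique_data(archives, sizes):
    """Per archive, the size of blocks that no other archive references."""
    counts = Counter(b for blocks in archives.values() for b in blocks)
    return {
        name: sum(sizes[b] for b in blocks if counts[b] == 1)
        for name, blocks in archives.items()
    }


def greedy_delete(archives, sizes, target_freed):
    """Hypothetical helper: delete archives one at a time, always picking
    the one with the most unique data, until at least target_freed GB
    has been freed (or nothing is left).

    Unique data is recomputed after every deletion, because blocks shared
    with the deleted archive may now belong to only one survivor.
    """
    remaining = dict(archives)  # copy so the caller's dict is untouched
    deleted, freed = [], 0
    while remaining and freed < target_freed:
        scores = unique_data(remaining, sizes)
        victim = max(scores, key=scores.get)
        freed += scores[victim]
        deleted.append(victim)
        del remaining[victim]
    return deleted, freed


deleted, freed = greedy_delete(ARCHIVES, BLOCK_SIZES, 60)
print(deleted, freed)  # ['mon', 'tue'] 60
```

Here "tue" starts with zero unique data but shows 35 GB once "mon" has been deleted. Picking the largest figure each time is a greedy heuristic and isn't guaranteed to reach a target with the fewest deletions; with real archives, the equivalent step is re-running `tarsnap --print-stats -f '*'` after each deletion.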
I don't know if this has come up before, but the most valuable version of Tarsnap for me would be a local one, i.e. the ability to use an external drive or a remote filesystem of my choice as the store. I can't fault the software itself, and the encryption and deduplication are excellent, but the backup/restore process is so slow (I wouldn't have predicted 'days' on a fibre connection!) and comparatively expensive that, weighed against the service as a whole, the total cost ends up pretty high.
I would definitely pay for the software on its own, without the remote storage!

Thanks again,
Tara