
Re: Rotating back-ups, removed files, etc.

Craig Hartnett <craig@1811.spamslip.com> wrote:

> Hi again,
> So if I delete my initial archive today, Tarsnap will realise that it
> has to upload pretty much everything -- not everything, but almost
> everything -- again, right?
> And what if I delete a file -- any file -- on my hard drive that has
> been backed up in the past? Of course Tarsnap won't upload a null file,
> but does that file continue to exist in the archives unless or until I
> delete the last archive that contains it? In other words, it's *my*
> responsibility to curate my archives, right? (I'm quite happy to curate
> my own stuff. Just want to make sure.)

It's your duty to prune your old archives, yes, but the rest of your
assumptions are wrong.

It's actually far simpler than you are making it (whilst at the
same time being far more complex!)

What I mean is that, from a coding point of view, the way files are
actually stored is very efficient, with encryption, compression and
de-duplication at the block level rather than the file level.

However, you can ignore all of this complex black magic:

Just consider every backup you make to be a complete and independent
full backup. They are not incremental backups! (Behind the scenes, the
black magic only uploads data that isn't already on the server, and data
shared between archives is stored only once server-side, but again,
leave that to the black magic!)

So, every backup appears to you as a complete and independent full archive.

So, you'll already know the answers to your questions, but to confirm:

If you delete your first archive, it won't be much different from
deleting your second or your third archive.

All data which only existed in the deleted archive is removed from the
server; all the rest remains.
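
To make that concrete, here is a toy model of the idea in Python. It is only a sketch and not how Tarsnap is implemented: the ToyServer class, the tiny fixed-size blocks and the file contents are all made up for illustration, and compression and encryption are left out entirely. It just shows that deleting the first archive frees only the blocks no other archive uses, and that later backups upload nothing extra because of it.

```python
import hashlib

BLOCK_SIZE = 4  # tiny fixed-size blocks, purely for illustration


def split_blocks(data):
    """Cut a byte string into BLOCK_SIZE pieces."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


class ToyServer:
    """Hypothetical store: each distinct block is kept once, shared by archives."""

    def __init__(self):
        self.blocks = {}    # block hash -> block bytes
        self.archives = {}  # archive name -> list of block hashes

    def create(self, name, files):
        """Store an archive; return the number of bytes newly uploaded."""
        uploaded = 0
        hashes = []
        for content in files.values():
            for block in split_blocks(content):
                h = hashlib.sha256(block).hexdigest()
                if h not in self.blocks:  # only unseen blocks are uploaded
                    self.blocks[h] = block
                    uploaded += len(block)
                hashes.append(h)
        self.archives[name] = hashes
        return uploaded

    def delete(self, name):
        """Delete an archive; return the number of bytes actually freed."""
        removed = set(self.archives.pop(name))
        still_used = {h for hs in self.archives.values() for h in hs}
        freed = 0
        for h in removed - still_used:  # blocks no other archive refers to
            freed += len(self.blocks.pop(h))
        return freed


server = ToyServer()
day1 = {"a.txt": b"AAAABBBB", "old.txt": b"ZZZZ"}
day2 = {"a.txt": b"AAAABBBB", "new.txt": b"CCCC"}  # old.txt deleted locally

up1 = server.create("day1", day1)  # 12 bytes: everything is new
up2 = server.create("day2", day2)  # 4 bytes: only new.txt's block
freed = server.delete("day1")      # 4 bytes: only old.txt's block goes
up3 = server.create("day3", day2)  # 0 bytes: nothing needs re-uploading
```

Every archive in the model lists all of its blocks, so each one restores on its own, yet the first one is nothing special when it comes to deleting it.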

And if you delete a file client-side, then future backups won't reference
it at all (in other words, yes, no null files will be uploaded...
remember, think of your next backup as an independent full backup), but
the data will still exist server-side, inside every other backup that
file still exists within (again, black magic: the file data doesn't literally
exist in duplicate, but from your point of view it does).

I.E. If you remove a file, its data will only be removed server-side
once you've deleted all prior backups that refer to that file in its specific

> And what if I want to delete a file from my hard drive *and* my
> back-ups? Since the archives are immutable, and this file was in my
> initial back-up, am I right that there is no way to delete that single
> file from the back-up archives without deleting the whole archive, and
> consequently re-uploading most of the original archive again?

You delete the whole archive, and any other archive it exists in, yes.

BUT, black magic deals with the rest - no re-uploading will be done.
tarsnap intelligently keeps the bits still needed. In other words, deleting
the whole archive won't make any difference to time/data of subsequent backups.

> Which leads me to the conclusion that I should pick a time frame -- say,
> 90 days -- or come up with some traditional, staggered rotation system,
> and start deleting archives older than that *except* the initial
> archive, right?

You'll know the answer by now.... :-) Yep, you'll probably want to have
some sort of traditional deletion mechanism of old archives, but first,
check this out:

# tarsnap --print-stats --humanize-numbers -f '*'

(It takes some time to run...)

This will show you how much data each archive is using individually. I.E. How
much space will be literally freed up if you delete said archive.

Unless you update files manically, you'll probably be surprised how little space
each one takes up.

So, bearing those stats in mind, choose an appropriate deletion schedule.

Personally, I don't bother. I just do it manually now and then... Maybe for
recent backups deleting all but the first of the day, for older ones, all
but the first of the week, for older still, all but the first of the month etc.
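
A rough Python sketch of that kind of thinning, in case it helps. Everything in it is an assumption for illustration: the archives_to_delete function, the 7-day and 90-day thresholds, and the archive names and dates. It only works out which names to drop; you would still delete each one yourself (for example with tarsnap -d -f followed by the archive name).

```python
from datetime import datetime, timedelta


def archives_to_delete(archives, now, daily_days=7, weekly_days=90):
    """Return archive names to delete, keeping the first per day/week/month.

    archives: dict mapping archive name -> creation datetime.
    Recent archives keep the first of each day, older ones the first of
    each ISO week, and older still the first of each month.
    """
    keep = set()
    seen = set()
    # Oldest first, so the first archive seen in each bucket is the one kept.
    for name, created in sorted(archives.items(), key=lambda kv: kv[1]):
        age = now - created
        if age <= timedelta(days=daily_days):
            bucket = ("day", created.date())
        elif age <= timedelta(days=weekly_days):
            iso = created.isocalendar()
            bucket = ("week", iso[0], iso[1])
        else:
            bucket = ("month", created.year, created.month)
        if bucket not in seen:
            seen.add(bucket)
            keep.add(name)
    return sorted(n for n in archives if n not in keep)


now = datetime(2024, 6, 30, 12, 0)
archives = {
    "home-2024-06-30-0100": datetime(2024, 6, 30, 1, 0),
    "home-2024-06-30-0900": datetime(2024, 6, 30, 9, 0),
    "home-2024-06-10": datetime(2024, 6, 10, 1, 0),
    "home-2024-06-12": datetime(2024, 6, 12, 1, 0),
    "home-2024-01-02": datetime(2024, 1, 2, 1, 0),
    "home-2024-01-20": datetime(2024, 1, 20, 1, 0),
}
doomed = archives_to_delete(archives, now)
# doomed == ["home-2024-01-20", "home-2024-06-12", "home-2024-06-30-0900"]
```

The thresholds are the knobs to adjust once you have looked at the stats above.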

But due to the efficient storage, I sometimes have years' worth (hundreds) of backups
stored at a time until I get around to pruning.

What I will do manually and quickly is demonstrated by the following example:

I recently downloaded a zip file many, many GB in size.

tarsnap duly backed it up (of course, I could have set it not to, but it was important).

About a week or so later, I was still too busy to process it, but I tried unzipping
it and archiving it as an xz-compressed tar file. This took up far less space, but because
both the old and new files are compressed, there is nothing in common between the
two that tarsnap can hold on to. So I deleted all the archives from the previous week
that contained the old zip file, and let the new .tar.xz file be uploaded in its
place (cursing myself for not doing this before the file was ever backed up, which
would have saved the upload pennies).

This leads me into another point:

You don't have to dump all your data as one archive. You may want to back up the system
stuff separately from the user stuff.

Then, for instance, if you completely upgrade your system and are happy, you can zap
all the old system backups straight away, whilst still leaving user backups around a bit
longer, in case you suddenly realise that document you deleted a month ago is still

In other words, you can make a customised rotation scheme not just per machine, but
also per data-set, keeping some longer than others.
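
As a small Python sketch of a per-data-set scheme: the data-set names, the retention periods and the "dataset-YYYY-MM-DD" naming convention below are all invented for the example, so adapt them to however you actually name your archives.

```python
from datetime import date, timedelta

# Hypothetical retention periods per data set, in days.
RETENTION_DAYS = {"system": 14, "home": 365}


def expired(archive_names, today):
    """Return the names whose age exceeds their data set's retention period.

    Names are assumed to look like 'system-2024-06-01' (data set, then date).
    """
    out = []
    for name in archive_names:
        dataset, stamp = name.split("-", 1)
        created = date.fromisoformat(stamp)
        if today - created > timedelta(days=RETENTION_DAYS[dataset]):
            out.append(name)
    return out


names = [
    "system-2024-06-01",
    "system-2024-06-25",
    "home-2024-06-01",
    "home-2023-01-15",
]
old = expired(names, date(2024, 6, 30))
# old == ["system-2024-06-01", "home-2023-01-15"]
```

Here the month-old system archive goes while the home archive of the same age stays.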

As for needing to keep your original archive, you know the answer now...:

"Nope, delete the initial archive just like any other! It's nothing special!"

> Or am I completely out to lunch here? :)

I'm afraid so, but I am most of the time, and anyway, lunch is more enjoyable than

> Thanks for any light you can shed on this, via links to documentation
> that covers it of course if I have missed it.

All of the black magic stuff and other stuff is on the www.tarsnap.com website...
But beware, witches and goblins!

Cheers! Jamie