
Re: Rotating back-ups, removed files, etc.



Hi again Jamie,

A thousand more apologies for not thanking you for the time you took to
explain all of this to me, and whoever finds this in the future. I
appreciate both your humour in doing so and your patience. Everything
you wrote made complete sense.


Craig



On Sat, 2018-11-24 at 16:40 +0000, Jamie Landeg-Jones wrote:
> Craig Hartnett <craig@1811.spamslip.com> wrote:
> 
> > Hi again,
> >
> > So if I delete my initial archive today, Tarsnap will realise that it
> > has to upload pretty much everything -- not everything, but almost
> > everything -- again, right?
> >
> > And what if I delete a file -- any file -- on my hard drive that has
> > been backed up in the past? Of course Tarsnap won't upload a null file,
> > but does that file continue to exist in the archives unless or until I
> > delete the last archive that contains it? In other words, it's *my*
> > responsibility to curate my archives, right? (I'm quite happy to curate
> > my own stuff. Just want to make sure.)
> 
> It's your duty to prune your old archives, yes, but the rest of your
> assumptions are wrong.
> 
> It's actually far simpler than you are making it (whilst at the
> same time being far more complex!)
> 
> What I mean is that, from a coding point of view, the way files are
> actually stored is very efficient, with encryption, compression and
> de-duplication at the block level rather than the file level.
> 
> However, you can ignore all of this complex black magic:
> 
> Just consider every backup you make to be a complete and independent
> full backup. They are not incremental backups! (Thanks to the black
> magic, only data that has actually changed gets uploaded, and duplicate
> data isn't stored twice server-side, but again, leave that to the black
> magic!)
> 
> So, every backup appears to you as a complete and independent full archive.
> 
> So, you'll already know the answers to your questions, but to confirm:
> 
> If you delete your first archive, it won't be much different from
> deleting your second or your third archive.
> 
> All data which existed only in that archive is deleted from the
> server; all the rest remains.
> 
> And if you delete a file client-side, then future backups won't reference
> it at all (in other words, yes, no null files will be uploaded...
> remember, think of your next backup as an independent full backup), but
> the data will still exist server-side, inside every other backup that
> file still exists within (again, black magic: the file data doesn't literally
> exist in duplicate, but from your view it does).
> 
> I.e. if you remove a file, its data will only be removed server-side
> once you've deleted all prior backups that refer to that file in its specific
> incarnation.
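
As a rough illustration of the behaviour described above, here is a toy model in Python. It is only a sketch under loud assumptions: the ToyStore class, the fixed 4-byte blocks and the archive names are all made up for the example, there is no encryption or compression, and this is not how Tarsnap is actually implemented. It only shows why deleting any archive, including the first, frees just the blocks nothing else refers to, and why later backups upload nothing extra as a result.

```python
import hashlib

BLOCK_SIZE = 4  # tiny fixed-size blocks, purely for illustration


class ToyStore:
    """Toy deduplicating store; a hypothetical model, not Tarsnap's format."""

    def __init__(self):
        self.blocks = {}    # block hash -> block bytes, each stored once
        self.archives = {}  # archive name -> ordered list of block hashes

    def create(self, name, data):
        """Store an archive and return how many bytes were newly uploaded."""
        uploaded = 0
        hashes = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                # Only blocks the store has never seen cost any upload.
                self.blocks[digest] = block
                uploaded += len(block)
            hashes.append(digest)
        self.archives[name] = hashes
        return uploaded

    def delete(self, name):
        """Delete an archive and return how many bytes were freed."""
        removed = set(self.archives.pop(name))
        still_needed = set()
        for hashes in self.archives.values():
            still_needed.update(hashes)
        freed = 0
        # Drop only the blocks that no remaining archive refers to.
        for digest in removed - still_needed:
            freed += len(self.blocks.pop(digest))
        return freed

    def restore(self, name):
        """Rebuild the full contents of one archive."""
        return b"".join(self.blocks[h] for h in self.archives[name])


store = ToyStore()
first = store.create("monday", b"AAAABBBBCCCC")
second = store.create("tuesday", b"AAAABBBBDDDD")  # CCCC gone, DDDD new
freed = store.delete("monday")                     # the "initial" archive
third = store.create("wednesday", b"AAAABBBBDDDD")
```

In this sketch the first archive uploads 12 bytes, the second only the 4 new ones, deleting the first frees just the 4 bytes unique to it, and the next backup uploads nothing, while "tuesday" still restores in full.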
> 
> > And what if I want to delete a file from my hard drive *and* my
> > back-ups? Since the archives are immutable, and this file was in my
> > initial back-up, am I right that there is no way to delete that single
> > file from the back-up archives without deleting the whole archive, and
> > consequently re-uploading most of the original archive again?
> 
> You delete the whole archive, and any other archive it exists in, yes.
> 
> BUT, black magic deals with the rest - no re-uploading will be done.
> tarsnap intelligently keeps the bits still needed. In other words, deleting
> the whole archive won't make any difference to time/data of subsequent backups.
> 
> > Which leads me to the conclusion that I should pick a time frame -- say,
> > 90 days -- or come up with some traditional, staggered rotation system,
> > and start deleting archives older than that *except* the initial
> > archive, right?
> 
> You'll know the answer by now.... :-) Yep, you'll probably want to have
> some sort of traditional deletion mechanism of old archives, but first,
> check this out:
> 
> # tarsnap --print-stats --humanize-numbers -f '*'
> 
> (It takes some time to run...)
> 
> This will show you how much data each archive is using individually, i.e. how
> much space will literally be freed up if you delete said archive.
> 
> Unless you update files manically, you'll probably be surprised how little space
> each one takes up.
> 
> So, bearing those stats in mind, choose an appropriate deletion schedule.
> 
> Personally, I don't bother. I just do it manually now and then... Maybe for
> recent backups deleting all but the first of the day, for older ones, all
> but the first of the week, for older still, all but the first of the month etc.
> 
> But due to the efficient storage, I sometimes have years' worth (hundreds) of backups
> stored at a time until I get around to pruning.
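
The manual thinning described above (first of the day, then first of the week, then first of the month) could be sketched in Python along these lines. The function name, the 7-day and 90-day cut-offs and the sample archive names are assumptions made up for the example, and the function only picks names; it does not talk to tarsnap at all.

```python
from datetime import datetime, timedelta


def archives_to_delete(archives, now, daily_days=7, weekly_days=90):
    """Return names of archives to prune, keeping the first per period.

    archives: iterable of (name, created) pairs, created being a datetime.
    The cut-offs are hypothetical defaults, not anything tarsnap defines.
    """
    kept_periods = set()
    doomed = []
    # Oldest first, so the earliest archive in each period is the one kept.
    for name, created in sorted(archives, key=lambda a: a[1]):
        age = now - created
        if age <= timedelta(days=daily_days):
            period = ("day", created.date())
        elif age <= timedelta(days=weekly_days):
            iso = created.isocalendar()
            period = ("week", iso[0], iso[1])  # ISO year and week number
        else:
            period = ("month", created.year, created.month)
        if period in kept_periods:
            doomed.append(name)
        else:
            kept_periods.add(period)
    return doomed


now = datetime(2018, 11, 24, 18, 0)
sample = [
    ("a", datetime(2018, 11, 24, 2, 0)),   # today, first of the day
    ("b", datetime(2018, 11, 24, 14, 0)),  # today, second of the day
    ("c", datetime(2018, 11, 5, 2, 0)),    # a few weeks old, first of its week
    ("d", datetime(2018, 11, 7, 2, 0)),    # same week as "c"
    ("e", datetime(2018, 6, 1, 2, 0)),     # months old, first of its month
    ("f", datetime(2018, 6, 20, 2, 0)),    # same month as "e"
]
doomed = archives_to_delete(sample, now)
```

With this sample, "b", "d" and "f" are the ones selected for deletion. Each name would then still have to be deleted with tarsnap itself, by hand or from a script.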
> 
> What I will do manually and quickly is demonstrated by the following example:
> 
> I recently downloaded a zip file many, many GB in size.
> 
> tarsnap duly backed it up (of course, I could have set it not to, but it was important).
> 
> About a week or so later, I was still too busy to process it, but I tried unzipping
> it and archiving it as an xz-compressed tar file. This took up far less space, but because
> both the old and new files are compressed, there is nothing in common between the
> two that tarsnap can hold on to. So I deleted all the archives from the previous week
> that contained the old zip file, and let the new .tar.xz file be uploaded in its
> place (cursing myself for not doing this before the file was ever backed up, saving
> the upload pennies).
> 
> This leads me into another point:
> 
> You don't have to dump all your data as one archive. You may want to back up the system
> stuff separately from the user stuff.
> 
> Then, for instance, if you completely upgrade your system and are happy, you can zap
> all the old system backups straight away, whilst still leaving user backups around a bit
> longer, in case you suddenly realise that document you deleted a month ago is still
> required!
> 
> In other words, you can make a customised rotation scheme not just per machine, but
> also per data-set, keeping some longer than others.
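
A per-data-set scheme like the one described above might look something like this in Python. Everything specific here is hypothetical: the "system-" and "home-" name prefixes, the retention periods and the sample archive names are invented for the example, and again nothing in it calls tarsnap.

```python
from datetime import datetime, timedelta

# Hypothetical policy: archive-name prefix -> how long to keep such archives.
RETENTION = {
    "system-": timedelta(days=14),
    "home-": timedelta(days=365),
}


def expired(archives, now, retention=RETENTION):
    """Return names of archives older than the retention for their prefix.

    archives: iterable of (name, created) pairs, created being a datetime.
    Names matching no prefix are never reported, so they are never pruned.
    """
    result = []
    for name, created in archives:
        for prefix, keep_for in retention.items():
            if name.startswith(prefix) and now - created > keep_for:
                result.append(name)
                break
    return result


now = datetime(2018, 11, 24)
sample = [
    ("system-2018-10-01", datetime(2018, 10, 1)),
    ("home-2018-10-01", datetime(2018, 10, 1)),
    ("system-2018-11-20", datetime(2018, 11, 20)),
    ("home-2017-01-01", datetime(2017, 1, 1)),
]
stale = expired(sample, now)
```

Here the October system archive is already stale while the home archive from the same day is kept, which is the whole point of splitting the data sets.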
> 
> As for needing to keep your original archive, you know the answer now...:
> 
> "Nope, delete the initial archive just like any other! It's nothing special!"
> 
> > Or am I completely out to lunch here? :)
> 
> I'm afraid so, but I am most of the time, and anyway, lunch is more enjoyable than
> computers!
> 
> > Thanks for any light you can shed on this, via links to documentation
> > that covers it of course if I have missed it.
> 
> All of the black magic stuff and other stuff is on the www.tarsnap.com website...
> But beware, witches and goblins!
> 
> Cheers! Jamie