
Re: Ability to tell what has changed



On Aug 23, 2015, at 12:43 PM, Colin Percival <cperciva@tarsnap.com> wrote:

> On 08/22/15 15:42, Garance AE Drosehn wrote:
>>                                       Total size  Compressed size
>> This archive                            20_468320         5_974899
>> New data                                   109878             7560
>> 
>> The thing is that I'm pretty sure I haven't changed anything in that
>> folder (although it is possible I did).
> 
> It's impossible to say for certain based on these statistics alone, but
> the very high compression ratio for the new data makes me think that
> it's probably mostly tar headers [...]
> 
> Considering that you're only looking at 7560 new bytes of compressed
> data, I wouldn't worry too much at this point. ;-)
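A quick sanity check of the ratios (using only the numbers quoted above) supports that reading:

```python
# Figures quoted from the --print-stats output above.
archive_total, archive_compressed = 20_468_320, 5_974_899
new_total, new_compressed = 109_878, 7_560

# The archive as a whole compresses about 3.4:1, but the new data
# compresses roughly 14.5:1 -- far better than typical file contents,
# which is consistent with it being mostly tar headers and metadata.
print(f"whole archive: {archive_total / archive_compressed:.1f}:1")
print(f"new data:      {new_total / new_compressed:.1f}:1")
```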

Ah, indeed!  I hadn't paid attention to the compressed size.

>> And in a related question:  Is there a way to do a dry-run and find
>> out that there is *no* new data to back up?
> 
> Not right now.  Theoretically the archive-creation code could check
> whether any new blocks were uploaded before it uploads the final
> non-deduplicated metadata and abort if not, but I'd have to write
> new code for this.

Well, it doesn't need to abort.  It could be a feature that is only
available with --dry-run.  But the feature would also have to
handle the issue I describe below, so it may be completely
infeasible.
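In the meantime, something along these lines could be scripted outside tarsnap itself: do a dry run first, parse the stats, and only create the real archive if a non-trivial amount of new data would be uploaded.  A sketch, assuming the stats format shown at the top of this thread (the threshold and the archive/path names are just placeholders, not anything tarsnap defines):

```python
def new_compressed_bytes(stats_output: str) -> int:
    """Pull the compressed size of new data out of tarsnap's
    --print-stats output (digit separators, if any, are ignored)."""
    for line in stats_output.splitlines():
        if line.strip().startswith("New data"):
            fields = line.split()
            return int(fields[-1].replace("_", "").replace(",", ""))
    raise ValueError("no 'New data' line found in stats output")

def should_archive(stats_output: str, threshold: int = 10_000) -> bool:
    """Skip archiving when only a trivial amount of new (compressed)
    data would be uploaded -- likely just tar headers."""
    return new_compressed_bytes(stats_output) > threshold

# Hypothetical usage (tarsnap writes its stats to stderr):
#
# import subprocess
# result = subprocess.run(
#     ["tarsnap", "-c", "--dry-run", "--print-stats",
#      "-f", "probe", "/path/to/collection"],
#     capture_output=True, text=True)
# if should_archive(result.stderr):
#     subprocess.run(["tarsnap", "-c", "-f", "daily-backup",
#                     "/path/to/collection"])
```

This still has the removed-files problem described below, of course, since deleted files contribute no new data at all.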

>> For some of the
>> folders that I back up, I do not want to create another backup if
>> nothing has changed.
> 
> Why not?

It depends on the collection of data that I'm archiving.  For some
collections all I want is the ability to recreate the data "as it
was yesterday" if (say) my hard disk crashes.  But for other
collections the data will not change often, but it may be several
months before I realize that something has changed for the worse
and I want an older snapshot of that data.

If I back up that second category of data every day, then I could
end up with 200 archives (which are basically the same thing) before
I get to an archive where something has actually changed.  Those
200 archives are doing nothing for me, except to be a whole lot of
clutter when I do want to find a significant set of changes.

I'd rather have four archives of the collection where each archive
includes some changes, than to have 365 archives of the collection.

And yet I do want to *check* every day, because if there is some
significant change then I want to capture those changes before I
forget that they happened.  Disk disasters can happen to all types
of data collections!

    =   =   issue   =   = 
And after thinking some more about my request, I realize that it
wouldn't really do what I want anyway (even if you could implement
exactly what I had asked for).  It wouldn't catch when files are
removed from the collection of data, and that's pretty important.

I looked through some earlier messages to this mailing list, and
I'm going to adapt the idea that Hugo Osvaldo Barrera described
in January 2015 in "How I keep track of file additions/removals".
I may do something with md5deep, which I've used before in other
situations.
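For the archives, a self-contained sketch of that manifest-diff idea (plain Python standing in for md5deep's `md5deep -r -l` output; the directory layout in the usage note is hypothetical):

```python
import hashlib
import os

def manifest(root: str) -> dict[str, str]:
    """Map each file's relative path to its MD5 digest -- the same
    information a recursive md5deep run would emit."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, "rb") as f:
                digests[rel] = hashlib.md5(f.read()).hexdigest()
    return digests

def compare(old: dict[str, str], new: dict[str, str]):
    """Report additions, removals, and content changes between two
    manifests -- removals being exactly what a new-data check misses."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```

Saving yesterday's manifest and comparing it against today's (e.g. `compare(load("manifest.old"), manifest("/path/to/collection"))`) would flag removed files as well as added and modified ones, and the archive run could be triggered only when any of the three lists is non-empty.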

Thanks!

-- 
Garance Alistair Drosehn            =  gadcode@earthlink.net
Senior Systems Programmer           or       gad@FreeBSD.org