
Re: Ability to tell what has changed



On 08/22/15 15:42, Garance AE Drosehn wrote:
>                                        Total size  Compressed size
> This archive                            20_468320         5_974899
> New data                                   109878             7560
> 
> The thing is that I'm pretty sure I haven't changed anything in that
> folder (although it is possible I did).

It's impossible to say for certain based on these statistics alone, but
the very high compression ratio for the new data makes me think that
it's probably mostly tar headers (which are inherently very compressible,
since most of each header block is zero padding).  Archiving files from a
different location (such that the relative paths are different), having an
updated mtime on one of the files... there are lots of things which could
cause a few bytes of tar headers to change.
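
As a rough illustration of why header-only changes compress so well, the
sketch below builds a single ustar header with Python's tarfile module and
compresses it with zlib.  This is an assumption-laden stand-in: the file
name, size and mtime are made up, and tarsnap's own tar writer and
compression may produce different numbers.

```python
import tarfile
import zlib


def header_block(name: str, size: int, mtime: int) -> bytes:
    """Return the 512-byte ustar header that tarfile writes for one entry."""
    info = tarfile.TarInfo(name=name)
    info.size = size
    info.mtime = mtime
    return info.tobuf(format=tarfile.USTAR_FORMAT)


# Hypothetical entry; only the header is built, no file data is needed.
hdr = header_block("photos/2015/img_0001.jpg", size=123456, mtime=1440250920)

zero_fraction = hdr.count(0) / len(hdr)   # share of NUL padding bytes
compressed = zlib.compress(hdr, 9)

print(f"header bytes:     {len(hdr)}")
print(f"zero bytes:       {zero_fraction:.0%}")
print(f"compressed bytes: {len(compressed)}")
```

Most of the block is NUL padding around a handful of short fields, which
is consistent with a large "Total size" shrinking to a small "Compressed
size" when only headers have changed.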

Considering that you're only looking at 7560 new bytes of compressed data,
I wouldn't worry too much at this point. ;-)

> But one of the reasons I like the idea of doing dry-runs is to see
> if the amount of new data to backup seems "reasonable".  I've been
> known to download or generate pretty huge files "temporarily", only
> to forget about those files for years.  And there was one time that
> a new version of the Tivoli Storage Manager caused my config file
> to be handled differently, and I nearly backed up almost 1 TB of
> data which I really did *not* want to be backed up to TSM.

FYI, you can also use tarsnap's --maxbw option to tell it to stop
archiving when it hits a certain number of bytes of upload.
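
The "does this look reasonable?" check can also be scripted against the
statistics of a dry-run.  The sketch below parses a "New data" line laid
out like the one quoted above and compares it with a limit.  The function
name and the 100 MB threshold are hypothetical, and it assumes the byte
counts are printed as plain numbers rather than in a humanized form.

```python
import re


def new_data_bytes(stats_text: str) -> tuple[int, int]:
    """Return (total, compressed) byte counts from the 'New data' line."""
    for line in stats_text.splitlines():
        m = re.match(r"\s*New data\s+(\S+)\s+(\S+)\s*$", line)
        if m:
            # Drop any digit-grouping characters before converting.
            total, compressed = (int(re.sub(r"[^0-9]", "", g)) for g in m.groups())
            return total, compressed
    raise ValueError("no 'New data' line found")


# Sample text copied from the statistics quoted earlier in this message.
sample = """\
                                       Total size  Compressed size
This archive                            20_468320         5_974899
New data                                   109878             7560
"""

total, compressed = new_data_bytes(sample)
LIMIT = 100 * 1024 * 1024   # hypothetical threshold: 100 MB of new data
ok = compressed <= LIMIT
print(total, compressed, "ok" if ok else "too much new data")
```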

> Is there any way that tarsnap would tell me which files have new
> data, at least when I'm doing a dry-run?  It would probably be
> nice to have two options:  (1) a count of files which have new
> data, (2) a list of the specific filenames.

No, because the concept of "which files have new data" isn't really
well-defined; files turn into a stream of tar which gets chopped up
into pieces which then get deduplicated, so one new chunk of data
could correspond to one file, or several files, or even to no files
at all (in the case of a change which only happened in tar headers).
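
A toy model of this, sketched with Python's tarfile module: three files
are written to a tar stream, the stream is cut into chunks, and the chunk
hashes of two runs are compared.  The fixed 4096-byte chunk size, the file
names and the mtimes are assumptions for illustration only; tarsnap's real
chunking is variable-length and this is not its code.

```python
import hashlib
import io
import tarfile

CHUNK = 4096  # hypothetical fixed chunk size, chosen to keep the demo simple


def make_tar(files: dict[str, bytes], mtimes: dict[str, int]) -> bytes:
    """Serialize the files into one ustar stream, in sorted name order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = mtimes[name]
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


def chunk_hashes(stream: bytes) -> list[str]:
    """Hash each fixed-size chunk of the stream."""
    return [hashlib.sha256(stream[i:i + CHUNK]).hexdigest()
            for i in range(0, len(stream), CHUNK)]


files = {"a.txt": b"A" * 3000, "b.txt": b"B" * 3000, "c.txt": b"C" * 9000}
old = make_tar(files, {"a.txt": 1000, "b.txt": 1000, "c.txt": 1000})
# Same contents; only b.txt's mtime differs (as after a "touch").
new = make_tar(files, {"a.txt": 1000, "b.txt": 2000, "c.txt": 1000})

known = set(chunk_hashes(old))
new_chunks = [i for i, h in enumerate(chunk_hashes(new)) if h not in known]
print(new_chunks)
```

In this layout the only new chunk is the first one, which holds a.txt's
header and data plus b.txt's header.  No file contents changed, yet a
chunk is "new", and it overlaps two different files.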

> And in a related question:  Is there a way to do a dry-run and find
> out that there is *no* new data to back up?

Not right now.  Theoretically the archive-creation code could check
whether any new blocks were uploaded before it uploads the final
non-deduplicated metadata and abort if not, but I'd have to write
new code for this.

> For some of the
> folders that I back up, I do not want to create another backup if
> nothing has changed.

Why not?

-- 
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid