
Re: Odd thing about restore times



Hi Jamie,

A thousand apologies for not acknowledging your email from almost four
months ago. You did a great job of educating me, and of bringing the -q
switch to my attention. Thanks.


Craig



On Sat, 2018-11-24 at 16:01 +0000, Jamie Landeg-Jones wrote:
> Craig Hartnett <craig@1811.spamslip.com> wrote:
> 
> > Yet in both cases, the command does not exit for about 16-21 minutes,
> > which is what was going to lead me to complain. However, the actual
> > restore was done about as quickly as one would expect.
> 
> Hi. tarsnap follows the "tar" standard. In fact, it actually uses the
> BSD "libarchive" library to do the tar-bits.
> 
> The thing with the tar standard is that a file can exist at any place in
> the archive, and also *more than once* in an archive. (I don't know why,
> but if I had to guess, I'd say this is to allow appending to tar archives
> an update to a file - useful for tape backups!)
> 
> For the first case, tarsnap therefore needs to scan the whole archive
> when you are restoring directories or wildcards.
> 
> For the second case, to ensure you always restore the latest version of a
> file, even when you specify a specific filename directly, it still needs
> to scan through everything even after it has found your file.
> 
> To get around this issue, Colin has added a "quick" mode flag that causes
> the restore to stop as soon as the listed files are restored. Note,
> using this will only work properly if each specific filename you want
> to restore is specified literally, no wildcards or just directory names.
> 
> This is what you want to use.
> 
> From the tarsnap manpage:
> 
>  | -q (--fast-read) (x and t modes only) Extract or list only the first
>  |     archive entry that matches each pattern or filename operand.
>  |     Exit as soon as each specified pattern or filename has been matched.
>  |     By default, the archive is always read to the very end, since there
>  |     can be multiple entries with the same name and, by convention, later
>  |     entries overwrite earlier entries.  This option is provided as a
>  |     performance optimization.
> 
> Now, this raises 2 other questions, but you'll need a reply from Colin or
> Graham on these!:
> 
> 1) We can never append to tarsnap archives. The way they are stored, this is
>    nonsensical.
> 
>    Storing the same file more than once in an archive therefore makes no
>    sense.
> 
>    Therefore, wouldn't it be better if on a single tarsnap run, tarsnap refused
>    to back up the same file more than once, and then similarly on a restore,
>    always assumed "-q" when all the named entries to be retrieved are mentioned
>    literally?
> 
>    Obviously, if an entry turns out to be a directory, and not a file,
>    tarsnap will then have to fall back to scanning the full index (as indeed
>    it would if it first happens to find a filename whose path is a subset of
>    the named entry, i.e. the named entry is a directory, but a file within it
>    happens to be in the index before the directory itself). Though I guess this
>    latter point would never happen with tarsnap, so may be moot.
> 
>  2) tarsnap archives are not sequential-only-access files. Why isn't the index
>     of an archive's contents held more optimally than it appears? (There will be
>     a good reason for this, I just don't know what it is!)
> 
>    Cheers,
>    Jamie
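
For anyone finding this thread in the archive later: the "later entries
overwrite earlier entries" rule and the effect of stopping at the first match
can be seen in a rough sketch like the one below. It uses Python's standard
tarfile module on an in-memory archive, not tarsnap or libarchive, so it only
illustrates the idea and is not how tarsnap is implemented. The helper names
(build_archive, read_member) and the fast_read parameter are made up for the
example.

```python
import io
import tarfile


def build_archive(entries):
    """Return an in-memory tar archive holding (name, payload) pairs in order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in entries:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def read_member(archive, wanted, fast_read=False):
    """Return (payload, headers_scanned) for the entry named `wanted`.

    Default: walk every header and keep the last match, since a later
    entry with the same name overrides an earlier one.
    fast_read=True: stop at the first match (loosely analogous to -q).
    """
    archive.seek(0)
    payload = None
    scanned = 0
    with tarfile.open(fileobj=archive, mode="r") as tar:
        for member in tar:
            scanned += 1
            if member.name == wanted:
                payload = tar.extractfile(member).read()
                if fast_read:
                    break
    return payload, scanned


# Hypothetical archive in which etc/hosts was appended a second time.
archive = build_archive([
    ("etc/hosts", b"old\n"),
    ("etc/passwd", b"x\n"),
    ("var/log/a", b"a\n"),
    ("etc/hosts", b"new\n"),
])

full = read_member(archive, "etc/hosts")
quick = read_member(archive, "etc/hosts", fast_read=True)
print(full)   # the full scan reads all four headers and returns the last copy
print(quick)  # the fast read stops after the first header
```

The full scan has to touch every header before it can be sure which copy is
the latest, which matches the long tail after the restore described above. The
fast read returns as soon as the name is seen, at the cost of getting the
first copy if duplicates exist.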