Re: Odd thing about restore times
Hi Jamie,
A thousand apologies for not acknowledging your email almost four months
ago. You did a great job of educating me, and bringing to my attention
the -q switch. Thanks.
Craig
On Sat, 2018-11-24 at 16:01 +0000, Jamie Landeg-Jones wrote:
> Craig Hartnett <craig@1811.spamslip.com> wrote:
>
> > Yet in both cases, the command does not exit for about 16-21 minutes,
> > which is what was going to lead me to complain. However, the actual
> > restore was done about as quickly as one would expect.
>
> Hi. tarsnap follows the "tar" standard. In fact, it actually uses the
> bsd "libarchive" library to do the tar-bits.
>
> The thing with the tar standard is that a file can exist at any place in
> the archive, and also *more than once* in an archive. (I don't know why,
> but if I had to guess, I'd say this is to allow appending to tar archives
> an update to a file - useful for tape backups!)
>
> For the first case, tarsnap therefore needs to scan the whole archive
> when you are restoring directories or wildcards.
>
> For the second case, to ensure you always restore the latest version of a
> file, even when you specify a specific filename directly, it still needs
> to scan through everything even after it has found your file.
>
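To illustrate the "last entry wins" convention described above, here is a
minimal sketch using Python's standard tarfile module. It is not tarsnap's
code, and the helper names build_archive and read_latest are made up for
this example. It builds a small in-memory tar archive that contains the same
filename twice, then scans every entry so that the later copy replaces the
earlier one.

```python
import io
import tarfile


def build_archive(entries):
    """Return the bytes of an uncompressed tar holding (name, data) pairs in order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in entries:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_latest(archive_bytes, wanted):
    """Scan every entry to the end; the last entry named `wanted` wins."""
    latest = None
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        for member in tar:
            if member.name == wanted:
                latest = tar.extractfile(member).read()
    return latest


# The same name appears twice, as if an update had been appended later.
archive = build_archive([
    ("notes.txt", b"old version"),
    ("other.txt", b"unrelated"),
    ("notes.txt", b"new version"),
])
print(read_latest(archive, "notes.txt"))  # b'new version'
```

Because the newer copy can sit anywhere after the older one, the reader
cannot know it has the latest version until it reaches the end of the
archive.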
> To get around this issue, Colin has added a "quick" mode flag that causes
> the restore to stop as soon as the listed files are restored. Note,
> using this will only work properly if each specific filename you want
> to restore is specified literally, with no wildcards or bare directory names.
>
> This is what you want to use.
>
> From the tarsnap manpage:
>
> | -q (--fast-read) (x and t modes only) Extract or list only the first
> | archive entry that matches each pattern or filename operand.
> | Exit as soon as each specified pattern or filename has been matched.
> | By default, the archive is always read to the very end, since there
> | can be multiple entries with the same name and, by convention, later
> | entries overwrite earlier entries. This option is provided as a
> | performance optimization.
>
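The trade-off in that manpage excerpt can be sketched with Python's standard
tarfile module. This only imitates the behaviour and says nothing about how
tarsnap implements it; build_archive, find_entry and the fast_read parameter
are hypothetical names chosen for the example. The default scan reads every
entry and keeps the last match, while the fast-read scan stops at the first
match.

```python
import io
import tarfile


def build_archive(entries):
    """Return the bytes of an uncompressed tar holding (name, data) pairs in order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in entries:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def find_entry(archive_bytes, wanted, fast_read=False):
    """Return (data, entries_scanned) for the entry named `wanted`.

    With fast_read=False the whole archive is read and the last match wins.
    With fast_read=True the scan stops at the first match.
    """
    data = None
    scanned = 0
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        for member in tar:
            scanned += 1
            if member.name == wanted:
                data = tar.extractfile(member).read()
                if fast_read:
                    break
    return data, scanned


# The wanted file is first, followed by 1000 unrelated entries and a second copy.
entries = [("wanted.txt", b"first copy")]
entries += [(f"filler/{i}.txt", b"x") for i in range(1000)]
entries += [("wanted.txt", b"second copy")]
archive = build_archive(entries)

print(find_entry(archive, "wanted.txt"))                  # (b'second copy', 1002)
print(find_entry(archive, "wanted.txt", fast_read=True))  # (b'first copy', 1)
```

The early exit saves reading 1001 entries here, but it returns the earlier
copy. That only matters if an archive really does contain the same name more
than once.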
> Now, this raises 2 other questions, but you'll need a reply from Colin or
> Graham on these!:
>
> 1) We can never append to tarsnap archives. The way they are stored, this is
> nonsensical.
>
> Storing the same file more than once in an archive therefore makes no
> sense.
>
> Therefore, wouldn't it be better if on a single tarsnap run, tarsnap refused
> to back up the same file more than once, and then similarly on a restore,
> always assumed "-q" when all the named entries to be retrieved are mentioned
> literally?
>
> Obviously, if an entry turns out to be a directory, and not a file,
> tarsnap will have to then fall back to scanning the full index (as indeed
> it would if it first happens to find a filename whose path is a subset of
> the named entry [i.e. the named entry is a directory, but a file within it
> happens to be in the index before the directory itself...]). Though I guess
> this latter point would never happen with tarsnap, so may be moot.
>
> 2) tarsnap archives are not sequential-access-only files. Why isn't the index
> of an archive's contents held more optimally than it appears? (There will be
> a good reason for this, I just don't know what it is!)
>
> Cheers,
> Jamie