
Re: Odd thing about restore times



Craig Hartnett <craig@1811.spamslip.com> wrote:

> Yet in both cases, the command does not exit for about 16-21 minutes,
> which is what was going to lead me to complain. However, the actual
> restore was done about as quickly as one would expect.

Hi. tarsnap follows the "tar" standard. In fact, it uses the BSD
"libarchive" library to do the tar bits.

The thing with the tar standard is that a file can appear anywhere in
the archive, and also *more than once* in an archive. (I don't know why,
but if I had to guess, I'd say this is to allow an update to a file to be
appended to a tar archive - useful for tape backups!)
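
To make the "more than once" point concrete, here is a minimal sketch using
Python's standard tarfile module. It only illustrates the tar format; it is
not how tarsnap or libarchive is implemented, and the helper name
build_archive and the file names are made up for the example.

```python
import io
import tarfile


def build_archive(entries):
    """Return tar bytes holding the given (name, data) pairs, in order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in entries:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# The same name is written twice, as if an update had been appended later.
archive = build_archive([
    ("notes.txt", b"first version"),
    ("other.txt", b"unrelated"),
    ("notes.txt", b"second version"),
])

# Listing the archive shows both entries; nothing in the format forbids it.
with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
    names = [member.name for member in tar]

print(names)  # ['notes.txt', 'other.txt', 'notes.txt']
```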

Because of the first point, tarsnap needs to scan the whole archive
when you are restoring directories or wildcards.

Because of the second, to ensure you always restore the latest version of
a file, it still needs to scan through everything even after it has found
your file, and even when you specify a filename directly.

To get around this issue, Colin has added a "quick" mode flag that causes
the restore to stop as soon as the listed files have been restored. Note
that this will only work properly if each filename you want to restore is
specified literally: no wildcards and no bare directory names.

This is what you want to use.

From the tarsnap manpage:

 | -q (--fast-read) (x and t modes only) Extract or list only the first
 |     archive entry that matches each pattern or filename operand.
 |     Exit as soon as each specified pattern or filename has been matched.
 |     By default, the archive is always read to the very end, since there
 |     can be multiple entries with the same name and, by convention, later
 |     entries overwrite earlier entries.  This option is provided as a
 |     performance optimization.
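
The difference between the default behaviour and -q can be sketched roughly
as follows, again with Python's standard tarfile module. This is only an
illustration of the two scanning strategies the manpage describes, not
tarsnap's actual code; make_tar, restore and the fast_read parameter are
names invented for the example, and it matches literal names only.

```python
import io
import tarfile


def make_tar(entries):
    """Return tar bytes holding the given (name, data) pairs, in order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in entries:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def restore(archive, wanted, fast_read=False):
    """Return ({name: data}, entries_scanned) for the wanted literal names."""
    wanted = set(wanted)
    restored = {}
    scanned = 0
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        for member in tar:
            scanned += 1
            if member.name not in wanted:
                continue
            if fast_read and member.name in restored:
                continue  # fast read keeps only the first match per name
            # Default mode: a later entry overwrites an earlier one.
            restored[member.name] = tar.extractfile(member).read()
            if fast_read and len(restored) == len(wanted):
                break  # every requested name matched, stop reading
    return restored, scanned


archive = make_tar([
    ("a.txt", b"old"),
    ("b.txt", b"x"),
    ("a.txt", b"new"),
    ("c.txt", b"y"),
])

latest, scanned_all = restore(archive, ["a.txt"])
first, scanned_fast = restore(archive, ["a.txt"], fast_read=True)
print(latest, scanned_all)   # {'a.txt': b'new'} 4
print(first, scanned_fast)   # {'a.txt': b'old'} 1
```

Note the trade-off this shows: the fast path reads one entry instead of four,
but it returns the first copy of a.txt rather than the last, which only
matters if the archive really does hold the same name twice.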

Now, this raises two other questions, but you'll need a reply from Colin or
Graham on these:

1) We can never append to tarsnap archives. The way they are stored, this is
   nonsensical.

   Having the same file more than once in an archive therefore makes no
   sense.

   Therefore, wouldn't it be better if, on a single tarsnap run, tarsnap
   refused to back up the same file more than once, and then similarly on a
   restore always assumed "-q" when all the named entries to be retrieved
   are given literally?

   Obviously, if an entry turns out to be a directory and not a file,
   tarsnap would then have to fall back to scanning the full index. (It
   would also have to if it first happened to find a file whose path lies
   under the named entry, i.e. the named entry is a directory, but a file
   within it appears in the index before the directory itself. I guess that
   would never happen with tarsnap, though, so it may be moot.)

2) tarsnap archives are not sequential-access-only files. Why isn't the
   index of an archive's contents held in a more efficient form than it
   appears to be? (There will be a good reason for this, I just don't know
   what it is!)

   Cheers,
   Jamie