Re: Client->Server bandwidth < Server->Client bandwidth?



Hi Colin,

On Tue, Oct 29, 2013 at 12:51:34AM -0700, Colin Percival wrote:
> If you use the --print-stats option when creating archives, I think you'll find
> that the total size of the archive you're creating is much much larger than the
> amount of new data being uploaded.

That definitely makes sense: since I'm snapshotting every hour, the delta
for each archive is very small compared to the total size.

> When you create an archive, tarsnap deduplicates everything locally and only
> uploads new blocks.  Most of those blocks are (~64kB of) data; some are
> metadata consisting of lists of (~1600) data blocks.  If you have enough
> (~100 MB) of unchanging data all together, the metadata block listing the
> data blocks will be identical to a previously stored one, so that won't be
> uploaded again either.

Oh, so metadata is deduplicated too? That is nice.
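
To check my understanding, here is a toy model of that scheme in Python. It
is only a sketch: the function name, the use of SHA-256 as a block ID, and
the fixed-size lists are my own assumptions, not tarsnap's actual format.
The figures do line up, though: 1600 block IDs times ~64 kB per block is
roughly 100 MB of data per metadata block.

```python
import hashlib

def store_archive(chunks, stored, list_size=1600):
    """Toy model: upload only blocks not already in `stored`.

    `stored` is a set of block IDs already on the server (hypothetical).
    Returns the number of bytes that would be uploaded.
    """
    uploaded = 0
    ids = []
    for chunk in chunks:
        block_id = hashlib.sha256(chunk).digest()
        ids.append(block_id)
        if block_id not in stored:          # new data block
            stored.add(block_id)
            uploaded += len(chunk)
    # Group the block IDs into metadata blocks, deduplicated the same way.
    for i in range(0, len(ids), list_size):
        meta = b"".join(ids[i:i + list_size])
        meta_id = hashlib.sha256(meta).digest()
        if meta_id not in stored:           # new metadata block
            stored.add(meta_id)
            uploaded += len(meta)
    return uploaded
```

In this model, a second archive of unchanged data uploads nothing, and
changing one block re-uploads that block plus the one list that names it.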

> When you delete an archive, tarsnap needs to download all the metadata -- all
> the lists of blocks -- so that it can adjust its reference counts locally and
> figure out which blocks need to be deleted.

I thought the metadata was cached locally, hence my surprise. So tarsnap does
not keep any information locally about an archive once it is uploaded (except
very indirectly through the values of the reference counts)?
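
If so, I can see why the block lists are needed: the counts alone do not say
which blocks belong to a given archive. A minimal sketch of the delete step
as I picture it (the function and variable names are hypothetical, and this
is not tarsnap's actual code):

```python
def delete_archive(block_list, refcounts):
    """Toy model: drop one reference per block listed by the archive.

    `block_list` stands for the block IDs read from the downloaded metadata;
    `refcounts` is the local mapping from block ID to reference count.
    Returns the block IDs that no archive references any more.
    """
    to_delete = []
    for block_id in block_list:
        refcounts[block_id] -= 1
        if refcounts[block_id] == 0:        # last reference is gone
            del refcounts[block_id]
            to_delete.append(block_id)
    return to_delete
```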

Many thanks for your explanations,
-- 
Gabriel