Re: Client->Server bandwidth < Server->Client bandwidth?
On 10/29/13 01:07, Gabriel Kerneis wrote:
> On Tue, Oct 29, 2013 at 12:51:34AM -0700, Colin Percival wrote:
>> When you create an archive, tarsnap deduplicates everything locally and only
>> uploads new blocks. Most of those blocks are (~64kB of) data; some are
>> metadata consisting of lists of (~1600) data blocks. If you have enough
>> (~100 MB) of unchanging data all together, the metadata block listing the
>> data blocks will be identical to a previously stored one, so that won't be
>> uploaded again either.
>
> Oh, so metadata is deduplicated too? That is nice.
Yes -- the lists of blocks, at least. There's a separate metadata block with
the name of the archive, the time it was created, the command line, etc., and
that one is not deduplicated (it can't be, since archive names have to be
different). But 100% of the "tar format" data is deduplicated and 99% of the
rest is.
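
To make the idea concrete, here's a minimal sketch in Python (not tarsnap's
actual C code; fixed-size blocks and SHA-256 are simplifications, and all the
names are made up). The key point is that a list of block hashes is itself
stored as a block, so it deduplicates the same way the data does:

    import hashlib

    BLOCK_SIZE = 64 * 1024  # tarsnap's blocks are variable-sized, ~64 kB average

    def store_block(store, uploads, data):
        """Store a block under its hash; queue an upload only if it's new."""
        h = hashlib.sha256(data).digest()
        if h not in store:
            store[h] = data
            uploads.append(h)  # only never-before-seen blocks get uploaded
        return h

    def create_archive(store, uploads, file_data):
        # Split the data into blocks and store each one.
        hashes = [store_block(store, uploads, file_data[i:i + BLOCK_SIZE])
                  for i in range(0, len(file_data), BLOCK_SIZE)]
        # The list of hashes is stored as a block too: if ~100 MB of data
        # is unchanged, this list is byte-identical to the one stored last
        # time, so it isn't uploaded again either.
        return store_block(store, uploads, b"".join(hashes))

(The per-archive header block -- name, creation time, command line -- would be
stored unconditionally, which is why it's the one piece that can't dedupe.)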
>> When you delete an archive, tarsnap needs to download all the metadata -- all
>> the lists of blocks -- so that it can adjust its reference counts locally and
>> figure out which blocks need to be deleted.
>
> I thought this was cached, hence my surprise. So tarsnap does not keep any
> information locally about an archive once it is uploaded (except very indirectly
> through the value of reference counts)?
The ${cachedir}/directory file has block hashes, sizes, and reference counts.
The ${cachedir}/cache file is basically a disk cache -- it says "last time we
looked at file foo, here are the blocks it contained", which allows tarsnap to
avoid the disk bandwidth and CPU work of reading and re-deduplicating files
which haven't been modified recently.
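
As a sketch of what happens on delete (again illustrative Python with invented
names -- the real cache files are binary formats):

    def delete_archive(directory, fetch_block_list, list_hashes):
        """Drop one reference from every block the archive used.

        directory: block hash -> (size, refcount), i.e. ${cachedir}/directory.
        The block lists aren't kept locally, so fetch_block_list() has to
        download them -- that's the server->client traffic on delete.
        """
        removable = []
        for lh in list_hashes:
            for bh in fetch_block_list(lh):
                size, refs = directory[bh]
                if refs == 1:
                    del directory[bh]
                    removable.append(bh)  # no remaining archive needs it
                else:
                    directory[bh] = (size, refs - 1)
        return removable  # blocks the server can now be told to delete

And the disk-cache idea, roughly (a real implementation would check more than
the mtime before trusting a cached entry):

    import os

    def blocks_for_file(cache, path, chunk_and_store):
        """cache: path -> (mtime, block hashes), i.e. ${cachedir}/cache."""
        mtime = os.stat(path).st_mtime
        entry = cache.get(path)
        if entry is not None and entry[0] == mtime:
            return entry[1]  # unmodified since last time: skip the read
        hashes = chunk_and_store(path)  # read, split, hash, and store blocks
        cache[path] = (mtime, hashes)
        return hashes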
--
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid