
Re: Client->Server bandwidth < Server->Client bandwidth?

Hi Gabriel & list,

On 10/29/13 00:45, Gabriel Kerneis wrote:
> I have a question about daily bandwidth usage.  On my admin interface, I see:
> 
>     Server->Client ≃ 3.7 × Client->Server
> 
> I wonder if this is expected, considering that I am only creating (and deleting in
> a round-robin manner), never restoring archives.  I am slightly surprised that
> the amount of control data would be so large, but this is just a naive question,
> I didn't study tarsnap source code. 

I think it's the other way around: The amount of data being uploaded is so small.

When you create an archive, tarsnap deduplicates everything locally and only
uploads new blocks.  Most of those blocks are (~64kB of) data; some are
metadata consisting of lists of (~1600) data blocks.  If you have enough
(~100 MB) of unchanging data all together, the metadata block listing the
data blocks will be identical to a previously stored one, so that won't be
uploaded again either.
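
A minimal Python sketch of the create side, assuming fixed-size blocks named by their SHA-256 hash (tarsnap's real chunking, block naming and metadata format differ; split_blocks, store_archive and the parameters are hypothetical). Data blocks are uploaded only if the server lacks them, and each list of up to blocks_per_list hashes is itself stored as a block, so an unchanged stretch costs nothing to upload:

```python
import hashlib
import random

def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into fixed-size blocks (a simplification: tarsnap's real chunking differs)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def store_archive(data: bytes, server: dict[str, bytes],
                  block_size: int = 64 * 1024, blocks_per_list: int = 1600) -> int:
    """Store one archive, uploading only blocks the server lacks; return bytes uploaded."""
    uploaded = 0
    hashes = []
    for block in split_blocks(data, block_size):
        h = hashlib.sha256(block).hexdigest()
        hashes.append(h)
        if h not in server:              # new data block: upload it
            server[h] = block
            uploaded += len(block)
    # Group the data-block hashes into metadata blocks, which are deduplicated too.
    for i in range(0, len(hashes), blocks_per_list):
        meta = "\n".join(hashes[i:i + blocks_per_list]).encode()
        mh = hashlib.sha256(meta).hexdigest()
        if mh not in server:             # new metadata block: upload it
            server[mh] = meta
            uploaded += len(meta)
    return uploaded

# Usage with tiny, hypothetical sizes so it runs instantly:
server: dict[str, bytes] = {}
data = random.Random(0).randbytes(16 * 12)       # 12 blocks -> 3 metadata lists of 4
first = store_archive(data, server, block_size=16, blocks_per_list=4)
again = store_archive(data, server, block_size=16, blocks_per_list=4)
changed = data[:-1] + bytes([data[-1] ^ 1])       # flip one bit in the last block
third = store_archive(changed, server, block_size=16, blocks_per_list=4)
print(first, again, third)   # 969 0 275
```

Re-storing identical data uploads nothing, and a one-bit change costs one data block plus the one metadata list that contains it.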

When you delete an archive, tarsnap needs to download all the metadata -- all
the lists of blocks -- so that it can adjust its reference counts locally and
figure out which blocks need to be deleted.  As a result, the download bandwidth
used by deletes depends only on the size of the archive -- not how well it was
deduplicated when it was created.
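
To make the delete side concrete, here is a hedged Python sketch, assuming the client keeps a per-block reference count and that a metadata block is simply a newline-separated list of block ids (delete_archive and this layout are hypothetical, not tarsnap's actual format). Every metadata block of the archive is fetched, including ones that were deduplicated away when the archive was created:

```python
from collections import Counter

def delete_archive(meta_ids: list[str], server: dict[str, bytes],
                   refcounts: Counter) -> tuple[int, list[str]]:
    """Delete one archive; return (bytes downloaded, ids of blocks freed)."""
    downloaded = 0
    freed = []
    for mid in meta_ids:
        meta = server[mid]                        # download this list of data blocks
        downloaded += len(meta)
        for bid in meta.decode().split("\n"):     # one less reference to each data block
            refcounts[bid] -= 1
            if refcounts[bid] == 0:
                freed.append(bid)
        refcounts[mid] -= 1                       # and to the metadata block itself
        if refcounts[mid] == 0:
            freed.append(mid)
    for bid in freed:                             # nothing references these any more
        server.pop(bid, None)
        del refcounts[bid]
    return downloaded, freed

# Two hypothetical archives: A = [m1], B = [m1, m2]; m1 was deduplicated when B was created.
server = {"m1": b"b1\nb2", "m2": b"b3", "b1": b"...", "b2": b"...", "b3": b"..."}
refcounts = Counter({"b1": 2, "b2": 2, "b3": 1, "m1": 2, "m2": 1})
print(delete_archive(["m1", "m2"], server, refcounts))   # (7, ['b3', 'm2'])
```

Deleting B downloads m1 as well as m2, even though creating B never uploaded m1.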

If you use the --print-stats option when creating archives, I think you'll find
that the total size of the archive you're creating is much much larger than the
amount of new data being uploaded.
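
A back-of-the-envelope Python sketch of why the two directions diverge, assuming ~64 kB blocks and ~1600 references per metadata block (the figures above), a hypothetical entry_bytes per block reference, and new data clustered in one place. daily_traffic and every number passed to it are illustrative, not measurements: upload scales with the new data, download with the whole archive.

```python
BLOCK_SIZE = 64 * 1024       # assumed average data-block size (~64 kB)
BLOCKS_PER_LIST = 1600       # assumed data-block references per metadata block

def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)

def daily_traffic(archive_bytes: int, new_bytes: int,
                  entry_bytes: int) -> tuple[int, int]:
    """Rough (upload, download) bytes for creating one archive and deleting an
    equally large one, assuming the new data is clustered in one place."""
    total_blocks = ceil_div(archive_bytes, BLOCK_SIZE)
    new_blocks = ceil_div(new_bytes, BLOCK_SIZE)
    list_bytes = BLOCKS_PER_LIST * entry_bytes
    # Upload: only the new data blocks plus the metadata lists that changed.
    upload = new_blocks * BLOCK_SIZE + ceil_div(new_blocks, BLOCKS_PER_LIST) * list_bytes
    # Download: every block reference in the deleted archive, deduplicated or not.
    download = total_blocks * entry_bytes
    return upload, download

# Hypothetical numbers: a 50 GiB archive, 8 MiB of new data, 64 bytes per reference.
up, down = daily_traffic(50 * 2**30, 8 * 2**20, 64)
print(up, down)   # 8491008 52428800
```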

> Could this indicate an issue with my local cache?

No, I can't see any possible connection there.

-- 
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid