Re: Delete archive time
On 04/18/12 10:40, Chris Webb wrote:
> Colin Percival <cperciva@tarsnap.com> writes:
>> On 04/18/12 06:34, Michael Stevens wrote:
>>> I've been trying to clear up old archives, deleting about a year's worth
>>> of once-a-day archives - it's been going for around 36 hours now and
>>> using a fair bit of bandwidth.
>>
>> Can you send me your --print-stats output?
>
> I've just tried one of the slow deletes, and got something like:
>
>                       Total size  Compressed size
> All archives    63455072738534   40291312663270
>   (unique data)   429840395099     231331365110
> This archive      207425160114     135175092080
> Deleted data         724614571        360346294

Hmm. Just crunching some numbers here: Your ~200 GB archive should contain
approximately 3.2 million blocks, and the list of those blocks alone runs to
about 120 MB of mostly incompressible data. So Tarsnap has to download
roughly 120 MB in order to delete 360 MB... ick. So much for my "it's not
likely that anyone will ever see deduplication performance high enough to
make the index downloads significant".
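
(Spelled out, with round figures - a mean block size of roughly 64 kB and
something like 40 bytes of per-block list data; both constants are
approximations here, but they don't change the conclusion:

    207425160114 bytes / ~65536 bytes per block  ~= 3.2 million blocks
    3.2 million blocks * ~40 bytes per entry     ~= 120-130 MB of block list

versus the ~360 MB of compressed data actually freed by the delete.)
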
Do you need faster performance for deleting *one* archive, or would it
be sufficient to have faster performance for deleting *several archives
at once*? The reason I ask is that with this much data deduplication
going on, you've probably got some index blocks which are duplicated
between archives too -- in which case it should be possible with a small
amount of new code to make this substantially faster (by caching index
blocks rather than processing them, freeing them, and then potentially
re-downloading them later).
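
To make that concrete, here is a rough sketch of the caching idea. This is
illustrative C, not Tarsnap's actual internals; the names, the direct-mapped
cache, and the block_fetch() stand-in are all made up for the example:

/* indexcache.c: sketch of caching index blocks by hash so that deleting
 * several archives in one run does not re-download blocks they share. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HASHLEN 32		/* Block identifier, e.g. a SHA-256 digest. */
#define CACHE_SLOTS 4096	/* Small direct-mapped cache, for illustration. */

struct centry {
	uint8_t key[HASHLEN];
	uint8_t *data;
	size_t len;
	int valid;
};

static struct centry cache[CACHE_SLOTS];
static unsigned long fetches;	/* Number of real downloads performed. */

/* Stand-in for downloading a block from the server. */
static int
block_fetch(const uint8_t key[HASHLEN], uint8_t **data, size_t *len)
{

	fetches++;
	*len = 65536;		/* Pretend every block is 64 kB. */
	if ((*data = malloc(*len)) == NULL)
		return (-1);
	memset(*data, key[0], *len);
	return (0);
}

/* Return the index block for ${key}, downloading it only on a cache miss. */
static int
indexblock_get(const uint8_t key[HASHLEN], const uint8_t **data, size_t *len)
{
	struct centry *e = &cache[((key[1] << 8) | key[0]) % CACHE_SLOTS];
	uint8_t *buf;
	size_t buflen;

	if (e->valid && memcmp(e->key, key, HASHLEN) == 0) {
		/* Hit: no network round trip, no re-download. */
		*data = e->data;
		*len = e->len;
		return (0);
	}

	/* Miss: fetch the block, then keep it rather than freeing it. */
	if (block_fetch(key, &buf, &buflen))
		return (-1);
	free(e->data);
	memcpy(e->key, key, HASHLEN);
	e->data = buf;
	e->len = buflen;
	e->valid = 1;
	*data = buf;
	*len = buflen;
	return (0);
}

int
main(void)
{
	uint8_t key[HASHLEN] = {0};
	const uint8_t *data;
	size_t len;

	/* Simulate two archives whose index blocks overlap completely. */
	for (int pass = 0; pass < 2; pass++)
		for (int i = 0; i < 1000; i++) {
			key[0] = i & 0xff;
			key[1] = (i >> 8) & 0xff;
			indexblock_get(key, &data, &len);
		}

	printf("blocks requested: 2000, actually fetched: %lu\n", fetches);
	return (0);
}

The point is simply that once one archive's delete has pulled down an index
block, any other archive deleted in the same run which references that block
gets it for free - which is exactly the case where heavy deduplication makes
the current "process, free, maybe re-download" pattern expensive.
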
Performance will also improve across the board once the extract performance
improvements are done, but that's further out and won't have any effect on
the amount of data which needs to be downloaded.
--
Colin Percival
Security Officer, FreeBSD | freebsd.org | The power to serve
Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid