
Re: Delete archive time



On 4/18/12 3:13 PM, Michael Stevens wrote:
> On Wed, Apr 18, 2012 at 06:33:04PM +0100, Michael Stevens wrote:
>> On Wed, Apr 18, 2012 at 10:09:48AM -0700, Colin Percival wrote:
>>> [picking a random email to reply to...]
>>>
>>> On 04/18/12 06:34, Michael Stevens wrote:
>>>> I'm finding deleting archives is very slow and quite bandwidth-intensive
>>>> as well.
>>> The bandwidth used should be about 0.1% of the size of the archive you're
>>> deleting; are you seeing more than this?
>> It looks like I've used about 10-20 GB of bandwidth in the last 48 hours
>> deleting archives. I'm deleting ~100 archives with content mostly
>> duplicated between them.
>>
>>>> I've been trying to clear up old archives, deleting about a year's
>>>> worth of daily archives; it's been going for around 36 hours now and
>>>> using a fair bit of bandwidth.
>>> Can you send me your --print-stats output?  It's possible that if Tarsnap's
>>> deduplication worked extremely well when it was creating archives then the
>>> 0.1% overhead is significant -- I hadn't worried about it because I figured
>>> it would always be much less than the time taken to originally create the
>>> archives.
>> On one machine (but not the one I'm referring to above):
>>                                        Total size  Compressed size
>> All archives                         603859837025     245148994034
>>   (unique data)                       24571415132      10465650048
>>
>> I'll see if I can get some more numbers later.
> The machine I was particularly complaining about has taken almost
> exactly 48 hours to delete around 300 archives (one failed due to
> network problems on my end).
>
> The output of the final delete:
>
>                                        Total size  Compressed size
> All archives                        1075430463240     689511951789
>   (unique data)                      104970009987      75556009572
> This archive                         154419395356      99378386213
> Deleted data                            341060679        234719159
>
> Going as far back as I have scrollback, for one of the earlier deletes:
>
>                                        Total size  Compressed size
> All archives                       29119731147593   18272658548233
>   (unique data)                      136239802444      91778391522
> This archive                         148559529876      92986492326
> Deleted data                            107609840         61792498
>
> I had one delete I wanted to do left over, which I've timed:
>
> root@osaka:~# time tarsnap -d -f osaka-home-2011-05-18
>                                        Total size  Compressed size
> All archives                         926836591828     596496483543
>   (unique data)                       96754663741      68973013309
> This archive                         148593871412      93015468246
> Deleted data                           8215346246       6582996263
>
> real	14m37.343s
> user	0m42.731s
> sys	0m4.344s
>
> The network connection is an ADSL line, roughly 10 Mbit/s down and
> 1 Mbit/s up.
>
> In my case, the reason I'm deleting so much is that I have no automated
> way of cleaning up old incremental backups, and they're cheap, so I've
> let them build up, and am finding cleaning things up is slower than I
> expected.
>
> I'm wondering if a cheap fix for this would be to provide some higher
> level operations (possibly as a wrapper), like "archive daily and keep
> last n days" (<insert attempt to get colin to do all the work here>).
>
> Michael

Hi Michael,

I have been using feather (https://github.com/danrue/feather), and it
has been working well as a replacement for rsnapshot.  I also added it
to the FreeBSD ports tree and maintain it there:
http://www.freshports.org/sysutils/feather/
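If you would rather script the "archive daily and keep last n days" rule yourself, here is a minimal Python sketch of just the selection step. It assumes (hypothetically) that every archive name ends in a YYYY-MM-DD date, as in osaka-home-2011-05-18; the function names are illustrative, and it only builds the `tarsnap -d -f NAME` command lines rather than running them.

```python
import re
from datetime import date, timedelta

# Assumed naming convention (hypothetical): archive names end in -YYYY-MM-DD,
# as in "osaka-home-2011-05-18".
DATE_SUFFIX = re.compile(r"-(\d{4})-(\d{2})-(\d{2})$")


def archive_date(name):
    """Return the date encoded at the end of an archive name, or None."""
    m = DATE_SUFFIX.search(name)
    if m is None:
        return None
    year, month, day = (int(g) for g in m.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Suffix looks like a date but isn't a valid one (e.g. month 13).
        return None


def archives_to_delete(names, keep_days, today):
    """Return archives dated before (today - keep_days), oldest first.

    Names without a valid date suffix are never selected, so archives
    that don't follow the naming convention are left alone.
    """
    cutoff = today - timedelta(days=keep_days)
    dated = [(archive_date(n), n) for n in names]
    old = [(d, n) for d, n in dated if d is not None and d < cutoff]
    return [n for _, n in sorted(old)]


def delete_commands(names):
    """Build one `tarsnap -d -f NAME` command line per archive."""
    return [["tarsnap", "-d", "-f", n] for n in names]
```

The list of names could come from `tarsnap --list-archives` (one name per line), and each returned command list could then be handed to `subprocess.run`. Skipping names without a parseable date is deliberate: a retention script should err on the side of keeping archives.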

Hope that helps,
Greg Larkin