Re: Delete archive time



On Wed, Apr 18, 2012 at 06:33:04PM +0100, Michael Stevens wrote:
> On Wed, Apr 18, 2012 at 10:09:48AM -0700, Colin Percival wrote:
> > [picking a random email to reply to...]
> > 
> > On 04/18/12 06:34, Michael Stevens wrote:
> > > I'm finding deleting archives is very slow and quite bandwidth intensive
> > > as well.
> > 
> > The bandwidth used should be about 0.1% of the size of the archive you're
> > deleting; are you seeing more than this?
> 
> It looks like I've used about 10-20 GB of bandwidth in the last 48 hours
> deleting archives. I'm deleting ~100 archives with content mostly
> duplicated between them.
> 
> > > I've been trying to clear up old archives, deleting about a year's
> > > worth of daily ones. It's been going for around 36 hours now and using
> > > a fair bit of bandwidth.
> > 
> > Can you send me your --print-stats output?  It's possible that if Tarsnap's
> > deduplication worked extremely well when it was creating archives then the
> > 0.1% overhead is significant -- I hadn't worried about it because I figured
> > it would always be much less than the time taken to originally create the
> > archives.
> 
> On one machine (but not the one I'm referring to above):
>                                        Total size  Compressed size
> All archives                         603859837025     245148994034
>   (unique data)                       24571415132      10465650048
> 
> I'll see if I can get some more numbers later.

The machine I was particularly complaining about has taken almost
exactly 48 hours to delete around 300 archives (one delete failed due
to network problems at my end).

The output of the final delete:

                                       Total size  Compressed size
All archives                        1075430463240     689511951789
  (unique data)                      104970009987      75556009572
This archive                         154419395356      99378386213
Deleted data                            341060679        234719159

Going as far back as I have scrollback, for one of the earlier deletes:

                                       Total size  Compressed size
All archives                       29119731147593   18272658548233
  (unique data)                      136239802444      91778391522
This archive                         148559529876      92986492326
Deleted data                            107609840         61792498

I had one delete I wanted to do left over, which I've timed:

root@osaka:~# time tarsnap -d -f osaka-home-2011-05-18
                                       Total size  Compressed size
All archives                         926836591828     596496483543
  (unique data)                       96754663741      68973013309
This archive                         148593871412      93015468246
Deleted data                           8215346246       6582996263

real	14m37.343s
user	0m42.731s
sys	0m4.344s

The network connection is an ADSL line, roughly 10 Mbit/s down and
1 Mbit/s up.
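
As a rough sanity check on the figures above, the sketch below applies
the "about 0.1%" estimate quoted earlier to the timed delete. It assumes
the 0.1% applies to the "This archive" total size (the thread does not
say whether total or compressed size is meant), treats the line speeds
as exactly 10 Mbit/s and 1 Mbit/s, and ignores latency and protocol
overhead. The function names are made up for illustration.

```python
# Figure copied from the timed delete above (bytes).
ARCHIVE_TOTAL_SIZE = 148_593_871_412   # "This archive", Total size column
OVERHEAD_FRACTION = 0.001              # the "about 0.1%" estimate quoted earlier

# Assumed link speeds, taken from the rough ADSL figures (bits per second).
DOWN_BPS = 10_000_000
UP_BPS = 1_000_000


def expected_delete_bytes(archive_size, fraction=OVERHEAD_FRACTION):
    """Bytes of traffic predicted by the ~0.1% rule for one archive."""
    return archive_size * fraction


def transfer_seconds(num_bytes, bits_per_second):
    """Time to move num_bytes over a link, ignoring latency and overhead."""
    return num_bytes * 8 / bits_per_second


per_archive = expected_delete_bytes(ARCHIVE_TOTAL_SIZE)
print(f"expected traffic per delete: {per_archive / 1e6:.1f} MB")
print(f"time at 10 Mbit/s: {transfer_seconds(per_archive, DOWN_BPS) / 60:.1f} min")
print(f"time at 1 Mbit/s:  {transfer_seconds(per_archive, UP_BPS) / 60:.1f} min")
print(f"100 such deletes:  {100 * per_archive / 1e9:.1f} GB")
```

Under those assumptions the estimate works out to about 150 MB per
delete, or about 15 GB for 100 archives of this size, which is in the
same range as the 10-20 GB mentioned earlier in the thread.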

In my case, the reason I'm deleting so much is that I have no automated
way of cleaning up old incremental backups. They're cheap, so I've let
them build up, and now I'm finding that cleaning them up is slower than
I expected.

I'm wondering if a cheap fix for this would be to provide some
higher-level operations (possibly as a wrapper), like "archive daily and
keep the last n days" (<insert attempt to get Colin to do all the work
here>).
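
A minimal sketch of what the "keep last n days" half of such a wrapper
might look like, not an existing Tarsnap feature. It assumes archive
names follow the <prefix>-YYYY-MM-DD pattern seen in
osaka-home-2011-05-18, and that the list of names has already been
obtained (for example from tarsnap --list-archives). The function name
and the sample names are hypothetical. It only prints the delete
commands, in the same tarsnap -d -f form used above, rather than running
them.

```python
from datetime import date, timedelta


def archives_to_delete(names, prefix, keep_days, today):
    """Return archives named '<prefix>-YYYY-MM-DD' older than the last keep_days days.

    The most recent keep_days calendar days (including today) are kept.
    Names that do not match the pattern are never selected.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix + "-"):
            continue  # belongs to some other backup set
        stamp = name[len(prefix) + 1:]
        try:
            day = date.fromisoformat(stamp)
        except ValueError:
            continue  # suffix is not a date; leave the archive alone
        if day <= cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    # Hypothetical archive list for illustration only.
    sample = [
        "osaka-home-2011-05-18",
        "osaka-home-2011-05-19",
        "osaka-home-2011-05-20",
    ]
    expired = archives_to_delete(sample, "osaka-home", keep_days=2,
                                 today=date(2011, 5, 20))
    for name in expired:
        print("tarsnap -d -f", name)
```

Running something like this once a day would mean each run deletes at
most one or two archives, instead of a year's worth in one go.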

Michael