Re: Using wildcards to remove archives

On Fri, Mar 15, 2019 at 12:42:39AM -0700, Craig Hartnett wrote:
> On Thu, 2019-03-14 at 20:21 -0700, Graham Percival wrote:
> > $ tarsnap -d --archive-names todelete.txt
> Brilliant! Almost exactly what I was looking for and it worked
> perfectly. Thanks Graham. Only problem is now you're making less money
> from my having previously over-archived.

Unless your data changes a lot, it probably doesn't make much of a
difference to the amount that you pay.  For example, let's look at
my personal data that I back up.  I started in July of 2012, and
made 3 archives:

$ grep "2012-07" archives.txt > check.txt
$ more check.txt 
$ tarsnap --print-stats --archive-names check.txt
                                       Total size  Compressed size
All archives                          31530638982      21480417909
  (unique data)                        1590891610       1025306568
all-2012-07-14                          388384107        297320442
  (unique data)                           1970795           838647
all-2012-07-15                          387590611        296985868
  (unique data)                            712486           235231
all-2012-07-28                          391691883        299653912
  (unique data)                           2165719          1050518

What we care about is the "(unique data)" row and the "Compressed
size" column.  Look at my very first archive, all-2012-07-14:
838647 bytes (or slightly less than 1 MB).

Because I've lazily not bothered to delete that archive, how much
money have I paid to Tarsnap to store it over the past 7 years, at
250 picodollars per byte-month?

$ wcalc
Enter an expression to evaluate, q to quit, or ? for help:
-> 838647 * 250 * 10^-12 * 7*12
 = 0.0176116

Less than two cents.  Deduplication is amazing!
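
If you don't have wcalc handy, the same arithmetic can be sketched
in a few lines of Python.  This is only an illustration: the
function name is made up, and it assumes storage is billed at a
flat 250 picodollars per byte-month, which is the rate used in the
expression above.

```python
# Assumption: storage is billed at 250 picodollars per byte-month,
# the same rate used in the wcalc expression.
PICODOLLARS_PER_BYTE_MONTH = 250


def storage_cost_usd(unique_compressed_bytes, months):
    """Return the cost in US dollars of storing the data for `months` months."""
    picodollars = unique_compressed_bytes * PICODOLLARS_PER_BYTE_MONTH * months
    return picodollars / 1e12  # 10^12 picodollars per dollar


# Unique compressed size of all-2012-07-14, kept for 7 years.
print(f"{storage_cost_usd(838647, 7 * 12):.7f}")
```

Only the "(unique data)" compressed size goes in, since that is the
storage that deleting the archive would actually free up.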

Sure, I sometimes get OCD twinges at having all these old useless
backups floating around (although I only make approximately one
archive per month, so it's far fewer than you'd imagine).  But whenever I
look at the math of how much I'm spending, vs. the possible risk
of my deleting an archive that I might end up needing later... I
figure it's just not worth it.

Of course, everybody's data and usage patterns are different.  If
you're curious, I encourage you to run this experiment with your
own data.  For any given archive, how much will it cost to save
that unique compressed data for 5 or 10 years?
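
One rough way to run that experiment is to feed the --print-stats
output into a small script.  The sketch below is not part of
tarsnap; the helper names are made up, it assumes archive names
contain no spaces, and it assumes a flat rate of 250 picodollars
per byte-month.  The sample text is the output shown earlier.

```python
# Assumption: flat storage rate of 250 picodollars per byte-month.
PICODOLLARS_PER_BYTE_MONTH = 250

# Sample `tarsnap --print-stats` output, copied from earlier in this message.
STATS = """\
                                       Total size  Compressed size
All archives                          31530638982      21480417909
  (unique data)                        1590891610       1025306568
all-2012-07-14                          388384107        297320442
  (unique data)                           1970795           838647
all-2012-07-15                          387590611        296985868
  (unique data)                            712486           235231
all-2012-07-28                          391691883        299653912
  (unique data)                           2165719          1050518
"""


def unique_compressed_sizes(stats_text):
    """Map each archive name to the compressed size of its unique data."""
    sizes = {}
    current = None  # archive whose "(unique data)" row we expect next
    for line in stats_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if line.lstrip().startswith("(unique data)"):
            if current is not None:
                sizes[current] = int(fields[-1])  # last column: compressed size
            current = None
        elif len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
            current = fields[0]  # "<name> <total size> <compressed size>"
        else:
            current = None  # header row or the "All archives" summary
    return sizes


def storage_cost_usd(num_bytes, months):
    """Cost in US dollars of storing `num_bytes` for `months` months."""
    return num_bytes * PICODOLLARS_PER_BYTE_MONTH * months / 1e12


for name, size in unique_compressed_sizes(STATS).items():
    five = storage_cost_usd(size, 5 * 12)
    ten = storage_cost_usd(size, 10 * 12)
    print(f"{name}: ${five:.4f} for 5 years, ${ten:.4f} for 10 years")
```

To try it on your own archives, replace STATS with the text printed
by tarsnap --print-stats for the archives you're curious about.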

- Graham