Re: Tarsnap GUI shows 0 data archived or backed up

On Thu, Sep 21, 2017 at 06:58:43PM -0000, amar@sdf.org wrote:
> Sorry for the delay. Here's the file type distribution (some are unknown;
> I just shared as powershell printed):

Thanks!  This looks quite reasonable to me.  Just to keep the data together, in
another email you wrote:

> I am backing up 1.45 GB and I see the usage are 1.07 GB.

Disregarding the blank extensions in your list, and only looking at extensions
whose files total more than 50 MB, we have:

> Extension  Size (MB) Count
> ---------  --------- -----
> jpg           391.85  4653
> pdf           180.03   456
> tcx            108.9   146
> mp4             80.2     7
> dcm            63.26    56

jpg files are typically compressed.  This can be disabled in most image
programs, but I'm willing to bet that these are compressed.  Tarsnap's
deduplication will be useful when you make additional archives, but Tarsnap's
compression stage will not save you a lot of space on those jpgs.

Same goes for mp4 and pdf -- they're already compressed.
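
As a rough illustration of why a second compression pass gains so little on
already-compressed formats, here is a minimal sketch using Python's zlib
module (a DEFLATE implementation).  The inputs are stand-ins, not your real
files: random bytes are assumed to behave like already-compressed content,
and a repeated XML-like line is an artificially easy example of text data.
Real text files will compress less dramatically than this.

```python
import os
import zlib


def deflate_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Stand-in (assumption) for already-compressed content such as jpg or mp4:
# random bytes have no redundancy left for DEFLATE to remove.
already_compressed = os.urandom(1_000_000)

# Stand-in (assumption) for a text-based format: one XML-like line repeated
# many times.  This is far more repetitive than real data would be.
text_like = b"<Trackpoint><Time>2017-09-21T18:58:43Z</Time></Trackpoint>\n" * 20_000

print(f"already-compressed stand-in: {deflate_ratio(already_compressed):.3f}")
print(f"text-like stand-in:          {deflate_ratio(text_like):.3f}")
```

The first ratio should come out at roughly 1.0 (no saving), while the second
should be a small fraction of that.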

I'm not certain about "tcx" and "dcm" files.  Googling suggests that the latter
could be DICOM images (which include compression) or "DCM audio module" (I'm
not certain about the compression status there).  "tcx" could be TurboCAD in
text form or "Training Center XML", both of which *will* benefit from Tarsnap's
compression.

Assuming that the above guesses are plausible, and only looking at the small
table of file extensions, there's 823 MB of data in total, of which we could
expect a significant reduction (due to compression) in 171 MB.  This
value depends a huge amount on the actual content, so people are quite
reluctant to say anything like "assuming typical user data, DEFLATE will give
you a reduction of xy%".

That said, let's assume that we reduce that 171 MB by 75%.  That saves us 128
MB.

You saw a reduction of 1.45 GB to 1.07 GB, or a saving of 380 MB.  I was
looking at roughly half of your data, so let's double the estimated saving and
get 250 MB.  So you're seeing more compression than we expected.
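
The arithmetic above can be sketched in a few lines of Python.  The sizes are
copied from the table; treating "tcx" and "dcm" as the compressible part is my
reading of where the 171 MB figure comes from, and the 75% reduction is the
same guess as in the text.  The unrounded sums land within a MB or so of the
rounded figures quoted above.

```python
# Sizes in MB, copied from the quoted table (extensions above 50 MB only).
sizes_mb = {"jpg": 391.85, "pdf": 180.03, "tcx": 108.9, "mp4": 80.2, "dcm": 63.26}

# Assumption: only these extensions compress well; the rest are treated as
# already compressed and contribute no saving.
compressible = {"tcx", "dcm"}

# Assumption: a complete guess, as in the text.
ASSUMED_REDUCTION = 0.75


def estimate_saving_mb(sizes, compressible_exts, reduction):
    """Estimated saving in MB if only the compressible extensions shrink."""
    compressible_total = sum(
        size for ext, size in sizes.items() if ext in compressible_exts
    )
    return compressible_total * reduction


total_mb = sum(sizes_mb.values())
compressible_mb = sum(sizes_mb[ext] for ext in compressible)
saving_mb = estimate_saving_mb(sizes_mb, compressible, ASSUMED_REDUCTION)

# The table covers roughly half of the 1.45 GB, so double the estimate.
doubled_saving_mb = 2 * saving_mb

# Observed saving: 1.45 GB backed up, 1.07 GB of usage.
observed_saving_mb = (1.45 - 1.07) * 1000

print(f"total in table:        {total_mb:.0f} MB")
print(f"assumed compressible:  {compressible_mb:.0f} MB")
print(f"estimated saving:      {saving_mb:.0f} MB")
print(f"doubled estimate:      {doubled_saving_mb:.0f} MB")
print(f"observed saving:       {observed_saving_mb:.0f} MB")
```

Changing ASSUMED_REDUCTION or the set of compressible extensions shows how
sensitive the estimate is to those guesses.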

There's a huge amount of quibble room here.  Deduplication will probably have
/some/ kind of benefit to your initial data.  Your jpg files were probably
compressed by a further 0.2% or 0.5% by Tarsnap.  The data in the extensions
that total less than 50 MB each, which I skipped, might be easier to compress.
And the figure of "75%" was a complete guess on my part.

Still, in terms of a "back-of-the-napkin analysis", the answer is "yes, your
reduction of 1.45 GB to 1.07 GB for the initial archive, on the data you
provided, looks quite reasonable".

- Graham