Re: Database backup and deduplication question



On 12/23/11 11:49 PM, Colin Percival wrote:
> Can you try
>
> # tarsnap --dry-run -cvf testarchive file1 file1
> # tarsnap --dry-run -cvf testarchive part1 part1
> # tarsnap --dry-run -cvf testarchive part2 part2
> # tarsnap --dry-run -cvf testarchive part3 part3
>
> You should get a perfect 2:1 deduplication ratio (modulo overhead) storing the
> same file twice... but of course you should have gotten a 2:1 ratio when storing
> the file and its separate parts too, so I'd like to see if this works properly.
>
Hi Colin,

I ran those tests as shown, and each one worked as expected with a 2:1
ratio.  Then I ran a few more sequences, in case they help:

sh-3.2# tarsnap --dry-run -cvf testarchive file1 file1 part1
a file1
a file1
a part1
                                       Total size  Compressed size
All archives                            236079049        237256169
  (unique data)                         105008116        105526431
This archive                            236079049        237256169
New data                                105008116        105526431
sh-3.2# tarsnap --dry-run -cvf testarchive file1 file1 part1 part2
a file1
a file1
a part1
a part2
                                       Total size  Compressed size
All archives                            288540303        289974299
  (unique data)                         129278071        129911860
This archive                            288540303        289974299
New data                                129278071        129911860
sh-3.2# tarsnap --dry-run -cvf testarchive file1 file1 part1 part2 part3
a file1
a file1
a part1
a part2
a part3
                                       Total size  Compressed size
All archives                            314771757        316338103
  (unique data)                         138220066        138902042
This archive                            314771757        316338103
New data                                138220066        138902042

The first sequence looks OK (file1 file1 part1), but after that, once
part2 and then part3 are added, the "New data" number increases more
than expected.  I have another Mac, and I'll try the same tests to see
if there's any machine-specific issue here.
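
To put numbers on that, here is a rough sketch of the arithmetic in
Python. The figures are copied from the "Total size" column of the three
runs above; the variable names are just for illustration. It assumes
that the growth in "This archive" between two runs is roughly the size
of the part that was added (plus some tar overhead), and that a part
which deduplicated fully against file1 would add close to nothing to
"New data".

```python
# (arguments, "This archive" total size, "New data" total size), in bytes,
# copied from the three dry runs shown above.
runs = [
    ("file1 file1 part1",             236079049, 105008116),
    ("file1 file1 part1 part2",       288540303, 129278071),
    ("file1 file1 part1 part2 part3", 314771757, 138220066),
]

deltas = []
# Compare each run with the one before it.
for (_, prev_total, prev_new), (args, total, new) in zip(runs, runs[1:]):
    added = total - prev_total   # bytes the extra part contributed to the archive
    stored = new - prev_new      # how much of that was counted as new data
    deltas.append((added, stored))
    print(f"{args}: +{added} bytes archived, "
          f"+{stored} bytes new ({stored / added:.0%})")
```

If the assumptions hold, the percentage printed for each run is the
share of the added part that was not deduplicated against file1.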

Thank you,
Greg