Re: Database backup and deduplication question
> On 12/23/11 11:49 PM, Colin Percival wrote:
>> Can you try
>>
>> # tarsnap --dry-run -cvf testarchive file1 file1
>> # tarsnap --dry-run -cvf testarchive part1 part1
>> # tarsnap --dry-run -cvf testarchive part2 part2
>> # tarsnap --dry-run -cvf testarchive part3 part3
>>
>> You should get a perfect 2:1 deduplication ratio (modulo overhead) storing
>> the same file twice... but of course you should have gotten a 2:1 ratio
>> when storing the file and its separate parts too, so I'd like to see if
>> this works properly.
>>
> Hi Colin,
>
> I ran those tests as shown, and each one worked as expected with a 2:1
> ratio. Then I ran a few more sequences, in case they help:
>
> sh-3.2# tarsnap --dry-run -cvf testarchive file1 file1 part1
> a file1
> a file1
> a part1
>                 Total size  Compressed size
> All archives     236079049        237256169
>   (unique data)  105008116        105526431
> This archive     236079049        237256169
> New data         105008116        105526431
> sh-3.2# tarsnap --dry-run -cvf testarchive file1 file1 part1 part2
> a file1
> a file1
> a part1
> a part2
>                 Total size  Compressed size
> All archives     288540303        289974299
>   (unique data)  129278071        129911860
> This archive     288540303        289974299
> New data         129278071        129911860
> sh-3.2# tarsnap --dry-run -cvf testarchive file1 file1 part1 part2 part3
> a file1
> a file1
> a part1
> a part2
> a part3
>                 Total size  Compressed size
> All archives     314771757        316338103
>   (unique data)  138220066        138902042
> This archive     314771757        316338103
> New data         138220066        138902042
>
> The first sequence looks OK (file1 file1 part1), but after that the
> "New data" number grows by much more than expected: adding part2 and
> then part3 should add almost no unique data. I have another Mac, and
> I'll try the same tests to see if there's any machine-specific issue here.
>
> Thank you,
> Greg
>
Hi again,
I just ran the same tests with the same files on a different Mac running
10.5.8 instead of 10.6.8. This time, the results are as expected:
sh-3.2# tarsnap --dry-run -cvf testarchive file1 file1 part1
a file1
a file1
a part1
                Total size  Compressed size
All archives     236075569        237226272
  (unique data)  105098024        105606395
This archive     236075569        237226272
New data         105098024        105606395
sh-3.2#
sh-3.2# tarsnap --dry-run -cvf testarchive file1 file1 part1 part2
a file1
a file1
a part1
a part2
                Total size  Compressed size
All archives     288537463        289949808
  (unique data)  105244723        105751742
This archive     288537463        289949808
New data         105244723        105751742
sh-3.2# tarsnap --dry-run -cvf testarchive file1 file1 part1 part2 part3
a file1
a file1
a part1
a part2
a part3
                Total size  Compressed size
All archives     314768037        316306069
  (unique data)  105331285        105837271
This archive     314768037        316306069
New data         105331285        105837271
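To make the difference between the two machines concrete, here is a small Python sketch of the arithmetic. It is not part of tarsnap. The byte counts are copied from the "Total size" column of the runs above, and the names RUNS, growth and report are made up for this illustration. For each machine it prints the total/unique ratio per run and how much of each newly added part showed up as unique data. By this arithmetic, on 10.6.8 roughly 46% of part2 and 34% of part3 were counted as new, while on 10.5.8 well under 1% of either part was.

```python
# (label, total bytes, unique bytes) copied from the "Total size" column
# of the tarsnap --dry-run output quoted in this thread.
RUNS = {
    "10.6.8": [
        ("file1 file1 part1", 236079049, 105008116),
        ("file1 file1 part1 part2", 288540303, 129278071),
        ("file1 file1 part1 part2 part3", 314771757, 138220066),
    ],
    "10.5.8": [
        ("file1 file1 part1", 236075569, 105098024),
        ("file1 file1 part1 part2", 288537463, 105244723),
        ("file1 file1 part1 part2 part3", 314768037, 105331285),
    ],
}


def growth(runs):
    """Return (added_total, added_unique) between consecutive runs."""
    return [
        (t2 - t1, u2 - u1)
        for (_, t1, u1), (_, t2, u2) in zip(runs, runs[1:])
    ]


def report():
    """Print per-run ratios and per-step growth for each machine."""
    for machine, runs in RUNS.items():
        print(machine)
        for label, total, unique in runs:
            print(f"  {label}: total/unique = {total / unique:.2f}")
        for added_total, added_unique in growth(runs):
            share = 100 * added_unique / added_total
            print(f"  step: +{added_total} total, +{added_unique} unique "
                  f"({share:.1f}% not deduplicated)")


if __name__ == "__main__":
    report()
```

If deduplication against file1 were working for every part, the unique-data growth at each step would stay near zero, as it does on the 10.5.8 machine.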
If you have any ideas about debug messages or other instrumentation that I
can put in the source code, let me know. I'll compare the shell
environment between the two machines to see if that reveals anything
interesting.
Thank you,
Greg