
Re: Expected deduplication doesn't take place



Colin Percival wrote on 19/01/2016 21:35:
> Hi Igor,
> 
> On 01/19/16 03:14, Igor Ostapenko wrote:
>> Colin Percival wrote on 19/01/2016 11:17:
>>> Looks like the compression on file.tar.xz is getting in the way -- tarsnap
>>> can't find any duplicated blocks, because in the compressed files there
>>> aren't any.
>> 
>> Then it looks like I don't understand tarsnap's deduplication mechanism
>> correctly. That is, in this particular case I didn't expect tarsnap to
>> find duplicates within the *.tar.xz file, but I did expect the
>> previously archived file blob to be re-used (referenced) somehow in the
>> next archive, which contains exactly the same file.
> 
> Oh, it's exactly the same file?  I assumed it was a new daily file.  That's
> strange then.

Yes, this is my case.

> 
> Speaking of strange though, and looking more closely...
>> $ # The first run
>> $ tarsnap -cvf .test.daily.20160119104958 .test
>> a .test
>> a .test/file.tar.xz
>>                                        Total size  Compressed size
>> All archives                               8.3 GB           3.5 GB
>>   (unique data)                            1.4 GB           622 MB
>> This archive                                10 MB            10 MB
>> New data                                    10 MB            10 MB
>> 
>> $ # The second run
>> $ tarsnap -cvf .test.daily.20160119105034 .test
>> a .test
>> a .test/file.tar.xz
>>                                        Total size  Compressed size
>> All archives                               8.3 GB           3.5 GB
>>   (unique data)                            1.4 GB           622 MB
>> This archive                                10 MB            10 MB
>> New data                                    10 MB            10 MB
> 
> The unique compressed data is 622 MB in both cases.  Are you sure that
> you didn't delete .test.daily.20160119104958 before you ran tarsnap
> again to create .test.daily.20160119105034 ?
> 

The second run was invoked right after the first one. There was no
deletion. In fact, a write-only key is used in this situation.

Yep, 'unique data' is still the same. That probably means
deduplication is working fine and the question is really about what
'--print-stats' reports.
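
To make the expectation concrete, here is a minimal sketch of block-level
deduplication in Python. It is not tarsnap's implementation: the fixed
64 KiB chunk size, the SHA-256 digests, and the names ChunkStore,
add_archive and demo are all assumptions made up for illustration, and
tarsnap's real chunking and storage format differ. It only shows the
behaviour I expected: storing the same bytes a second time should add no
new data and leave the unique total unchanged.

```python
import hashlib
import os

CHUNK_SIZE = 64 * 1024  # fixed-size chunks; an assumption for illustration only


class ChunkStore:
    """Hypothetical content-addressed store: each distinct chunk is kept once."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def add_archive(self, data: bytes):
        """Store data chunk by chunk; return (archive_bytes, new_bytes)."""
        total = 0
        new = 0
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            total += len(chunk)
            if digest not in self.chunks:
                # Only chunks never seen before count as new data.
                self.chunks[digest] = chunk
                new += len(chunk)
        return total, new

    def unique_bytes(self):
        """Total size of all distinct chunks held by the store."""
        return sum(len(c) for c in self.chunks.values())


def demo():
    store = ChunkStore()
    # Random bytes stand in for an already-compressed file such as file.tar.xz.
    payload = os.urandom(1024 * 1024)
    first = store.add_archive(payload)
    unique_after_first = store.unique_bytes()
    second = store.add_archive(payload)  # exactly the same file again
    unique_after_second = store.unique_bytes()
    return first, second, unique_after_first, unique_after_second
```

In this sketch the second call to add_archive reports the full archive
size but zero new bytes, and unique_bytes() is the same before and after
it. That matches the unchanged 'unique data' line above, but not the
"New data" line that was printed for the second run.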