Re: Expected deduplication doesn't take place
- To: Colin Percival <cperciva@tarsnap.com>
- Subject: Re: Expected deduplication doesn't take place
- From: Igor Ostapenko <igor.ostapenko@gmail.com>
- Date: Tue, 19 Jan 2016 23:21:27 +0200
- Cc: tarsnap-users@tarsnap.com
- In-reply-to: <000001525b62c00f-810e550a-d311-40fe-a1c4-8f4d8d3de279-000000@email.amazonses.com>
- Openpgp: id=FDD1E68BAB02BCB54B587C686604BB2A48FE9F99
- References: <569DFDF5.10408@gmail.com> <00000152592cfc9e-f40fe2e4-8fb1-4bfa-bf1e-896c034d9516-000000@email.amazonses.com> <569E1A94.1040507@gmail.com> <000001525b62c00f-810e550a-d311-40fe-a1c4-8f4d8d3de279-000000@email.amazonses.com>
Colin Percival wrote on 19/01/2016 21:35:
> Hi Igor,
>
> On 01/19/16 03:14, Igor Ostapenko wrote:
>> Colin Percival wrote on 19/01/2016 11:17:
>>> Looks like the compression on file.tar.xz is getting in the way -- tarsnap
>>> can't find any duplicated blocks, because in the compressed files there
>>> aren't any.
>>
>> Then it looks like I don't understand tarsnap's deduplication mechanism
>> correctly. That is, in this particular case I didn't expect tarsnap to
>> find duplicates inside the *.tar.xz file, but I did expect the
>> previously archived file blob to be re-used (referenced) somehow in the
>> next archive, which contains exactly the same file.
>
> Oh, it's exactly the same file? I assumed it was a new daily file. That's
> strange then.
Yes, that is my case: it's exactly the same file in both archives.
>
> Speaking of strange though, and looking more closely...
>> $ # The first run
>> $ tarsnap -cvf .test.daily.20160119104958 .test
>> a .test
>> a .test/file.tar.xz
>>                       Total size  Compressed size
>> All archives              8.3 GB           3.5 GB
>>   (unique data)           1.4 GB           622 MB
>> This archive               10 MB            10 MB
>> New data                   10 MB            10 MB
>>
>> $ # The second run
>> $ tarsnap -cvf .test.daily.20160119105034 .test
>> a .test
>> a .test/file.tar.xz
>>                       Total size  Compressed size
>> All archives              8.3 GB           3.5 GB
>>   (unique data)           1.4 GB           622 MB
>> This archive               10 MB            10 MB
>> New data                   10 MB            10 MB
>
> The unique compressed data is 622 MB in both cases. Are you sure that
> you didn't delete .test.daily.20160119104958 before you ran tarsnap
> again to create .test.daily.20160119105034 ?
>
The second run was invoked right after the first one. There was no
deletion; in fact, a write-only key is used in this setup.
Yep, 'unique data' is still the same. That probably means
deduplication is working fine and this is just a question about what
'--print-stats' reports.
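
To make clear what I expected, here is a toy model of how the four
statistics would behave when the same file is archived twice. It is only
a sketch under stated assumptions: it uses fixed-size chunks and SHA-256,
which is a simplification and not a description of tarsnap's actual
chunking or accounting, and the names ToyStore and add_archive are made up
for this example.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # fixed-size chunks: an assumption made for simplicity


class ToyStore:
    """Hypothetical block store that keeps one copy of each distinct chunk."""

    def __init__(self):
        self.unique = {}  # chunk digest -> chunk length in bytes
        self.total = 0    # bytes across all archives, before deduplication

    def add_archive(self, data: bytes) -> dict:
        """Store one archive and return the four statistics for it."""
        new_bytes = 0
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.unique:
                # Only chunks never seen before count as new data.
                self.unique[digest] = len(chunk)
                new_bytes += len(chunk)
        self.total += len(data)
        return {
            "all_archives": self.total,
            "unique_data": sum(self.unique.values()),
            "this_archive": len(data),
            "new_data": new_bytes,
        }


# 1 MiB of deterministic, non-repeating bytes standing in for file.tar.xz.
payload = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(32768))

store = ToyStore()
first = store.add_archive(payload)
second = store.add_archive(payload)
print("first run: ", first)
print("second run:", second)
```

In this model the second run reports zero new data while 'unique data'
stays unchanged, which is the behaviour I expected to see for an identical
file.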