
Re: Database backup and deduplication question

To: "Greg Larkin" <glarkin@sourcehosting.net>
Subject: Re: Database backup and deduplication question
From: "Greg Larkin" <glarkin@sourcehosting.net>
Date: Thu, 29 Dec 2011 23:02:46 -0500 (EST)
Cc: "Colin Percival" <cperciva@tarsnap.com>, tarsnap-users@tarsnap.com
Importance: Normal
In-reply-to: <4EFCD9E1.2000206@sourcehosting.net>
References: <4EEFC1E6.5020606@sourcehosting.net> <4EF01785.4030205@tarsnap.com> <4EF0D7C8.6050107@sourcehosting.net> <4EF1C4C5.1070500@tarsnap.com> <4EF20A09.7070504@sourcehosting.net> <4EF559C2.8060609@tarsnap.com> <4EFCD9E1.2000206@sourcehosting.net>
Reply-to: glarkin@sourcehosting.net

> On 12/23/11 11:49 PM, Colin Percival wrote:
>> Can you try
>>
>> # tarsnap --dry-run -cvf testarchive file1 file1
>> # tarsnap --dry-run -cvf testarchive part1 part1
>> # tarsnap --dry-run -cvf testarchive part2 part2
>> # tarsnap --dry-run -cvf testarchive part3 part3
>>
>> You should get a perfect 2:1 deduplication ratio (modulo overhead)
>> storing the same file twice... but of course you should have gotten a
>> 2:1 ratio when storing the file and its separate parts too, so I'd like
>> to see if this works properly.
>>
> Hi Colin,
>
> I ran those tests as shown, and each one worked as expected with a 2:1
> ratio.  Then I ran a few more sequences, in case they help:
>
> sh-3.2# tarsnap --dry-run -cvf testarchive file1 file1 part1
> a file1
> a file1
> a part1
>                                        Total size  Compressed size
> All archives                            236079049        237256169
>   (unique data)                         105008116        105526431
> This archive                            236079049        237256169
> New data                                105008116        105526431
> sh-3.2# tarsnap --dry-run -cvf testarchive file1 file1 part1 part2
> a file1
> a file1
> a part1
> a part2
>                                        Total size  Compressed size
> All archives                            288540303        289974299
>   (unique data)                         129278071        129911860
> This archive                            288540303        289974299
> New data                                129278071        129911860
> sh-3.2# tarsnap --dry-run -cvf testarchive file1 file1 part1 part2 part3
> a file1
> a file1
> a part1
> a part2
> a part3
>                                        Total size  Compressed size
> All archives                            314771757        316338103
>   (unique data)                         138220066        138902042
> This archive                            314771757        316338103
> New data                                138220066        138902042
>
> The first sequence looks ok (file1 file1 part1), but after that, the
> "New data" number increases more than expected.  I have another Mac, and
> I'll try the same tests to see if there's any machine-specific issue here.
>
> Thank you,
> Greg
>

Hi again,

I just ran the same tests with the same files on a different Mac running
10.5.8 instead of 10.6.8.  This time the results are as expected: adding
part2 and part3 barely changes the "New data" figure.

sh-3.2# tarsnap --dry-run -cvf testarchive file1 file1 part1
a file1
a file1
a part1
                                       Total size  Compressed size
All archives                            236075569        237226272
  (unique data)                         105098024        105606395
This archive                            236075569        237226272
New data                                105098024        105606395
sh-3.2#
sh-3.2# tarsnap --dry-run -cvf testarchive file1 file1 part1 part2
a file1
a file1
a part1
a part2
                                       Total size  Compressed size
All archives                            288537463        289949808
  (unique data)                         105244723        105751742
This archive                            288537463        289949808
New data                                105244723        105751742
sh-3.2# tarsnap --dry-run -cvf testarchive file1 file1 part1 part2 part3
a file1
a file1
a part1
a part2
a part3
                                       Total size  Compressed size
All archives                            314768037        316306069
  (unique data)                         105331285        105837271
This archive                            314768037        316306069
New data                                105331285        105837271
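
To put a number on the difference, here is a minimal Python sketch that
takes the "Total size" figures from the two sets of runs above and works
out how much of each added part was stored as new data. The names RUNS
and extra_new_data are made up for this illustration, and the "bytes
added" value includes whatever per-file overhead tarsnap counts, so treat
the percentages as approximate.

```python
# Figures copied from the "Total size" column of the tarsnap output above.
# Each tuple is (this_archive_total, new_data_total) for one run, in the
# order: file1 file1 part1 / ... part2 / ... part3.
RUNS = {
    "10.6.8": [(236079049, 105008116), (288540303, 129278071), (314771757, 138220066)],
    "10.5.8": [(236075569, 105098024), (288537463, 105244723), (314768037, 105331285)],
}


def extra_new_data(runs):
    """For each added part, return (bytes added to archive, extra new-data bytes)."""
    return [
        (total - prev_total, new - prev_new)
        for (prev_total, prev_new), (total, new) in zip(runs, runs[1:])
    ]


if __name__ == "__main__":
    for machine, runs in RUNS.items():
        for part, (added, extra) in zip(("part2", "part3"), extra_new_data(runs)):
            share = extra / added
            print(f"{machine} {part}: {extra} of {added} bytes stored as new ({share:.1%})")
```

On the 10.6.8 machine roughly 46% of part2 and 34% of part3 are counted
as new data; on the 10.5.8 machine both are well under 1%.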

If you have any ideas about debug messages or other instrumentation that I
can put in the source code, let me know.  I'll compare the shell
environment between the two machines to see if that reveals anything
interesting.
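
One way to do that comparison, as a rough sketch: save the output of
`env` on each machine to a text file and diff the two. The file names
below are hypothetical, and the parser assumes one KEY=value pair per
line, so variables with multi-line values would be misread.

```python
def parse_env(text):
    """Parse `env` output (one KEY=value per line) into a dict."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '=' at all
            env[key] = value
    return env


def diff_env(a_text, b_text):
    """Return {name: (value_on_a, value_on_b)} for variables that differ."""
    a, b = parse_env(a_text), parse_env(b_text)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # Hypothetical file names: the output of `env` saved on each machine.
    with open("env-10.6.8.txt") as fa, open("env-10.5.8.txt") as fb:
        for name, (va, vb) in diff_env(fa.read(), fb.read()).items():
            print(f"{name}: 10.6.8={va!r} 10.5.8={vb!r}")
```

A variable that is missing on one machine shows up with None on that
side.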

Thank you,
Greg
