
Re: Database backup and deduplication question



On 12/19/11 6:09 PM, Dan Rue wrote:
> On Mon, Dec 19, 2011 at 05:59:50PM -0500, Greg Larkin wrote:
>>                                        Total size  Compressed size
>> All archives                            524618980        527202985
>>   (unique data)                         183722213        184630527
>> deduptest5                              104924533        105446860
>>   (unique data)                          78795582         79187867
>>
>> Tarsnap reports 75MB of unique data in this archive, instead of 50MB.
>> Is that due to the design of the chunking algorithm and expected
>> behavior?  If so, is my best option to split the dump file into parts
>> that will likely remain static and parts that will change more
>> frequently?
> Hi Greg, 
>
> I went through the same exercise a while ago. I found that as soon as
> you compress something, you can basically forget about deduplication.
> However, there is an option in the GNU version of gzip (available in
> FreeBSD under archivers/gzip) called --rsyncable that compresses in
> such a way as to preserve deduplication, at a small cost in
> compression ratio.
>
> I wrote a couple blog posts along these lines:
> http://therub.org/2010/10/20/can-compression-play-nicely-with-deduplication/
> http://therub.org/2010/11/08/compression-can-play-nice-with-deduplication-and-rsync/
>
> Dan
Hi Dan,

Excellent - thanks very much for that tip.  I didn't know about that
optional patch, and it looks like it will help a lot!

Cheers,
Greg
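
For readers finding this thread later, the pipeline Dan suggests might look roughly like the sketch below. It assumes a gzip build that supports --rsyncable (the archivers/gzip port on FreeBSD; newer GNU gzip releases also include it); the dump file and archive name are made-up placeholders, not from this thread.

```shell
# Sketch: compress the database dump with gzip's --rsyncable option, which
# resets the compressor's state periodically so a local change in the dump
# only perturbs a local region of the compressed output, letting tarsnap's
# chunk-level deduplication skip the unchanged chunks.
seq 1 100000 > dump.sql                      # stand-in for a real DB dump
gzip --rsyncable -c dump.sql > dump.sql.gz   # dedup-friendly compression
gzip -t dump.sql.gz && echo "rsyncable gzip OK"
# tarsnap -c -f "db-$(date +%Y%m%d)" dump.sql.gz   # then archive as usual
```

Comparing tarsnap --print-stats across runs, as in the original message, should then show far less "unique data" per archive when only part of the dump changes.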