Re: Database backup and deduplication question
On 12/19/11 6:09 PM, Dan Rue wrote:
> On Mon, Dec 19, 2011 at 05:59:50PM -0500, Greg Larkin wrote:
>>                       Total size  Compressed size
>> All archives           524618980        527202985
>>   (unique data)        183722213        184630527
>> deduptest5             104924533        105446860
>>   (unique data)         78795582         79187867
>>
>> Tarsnap reports 75MB of unique data in this archive, instead of 50MB.
>> Is that due to the design of the chunking algorithm and therefore
>> expected behavior? If it is, is my best option to split the dump file
>> into the parts that will likely remain static and those that will
>> change more frequently?
> Hi Greg,
>
> I went through the same exercise a while ago. I found that whenever you
> compress something, you can basically forget about deduplication.
> However, there is an option in the GNU version of gzip (available in
> FreeBSD under archivers/gzip) called --rsyncable that will compress in
> such a way as to preserve deduplication, at a small penalty in
> compression efficiency.
>
> I wrote a couple of blog posts along these lines:
> http://therub.org/2010/10/20/can-compression-play-nicely-with-deduplication/
> http://therub.org/2010/11/08/compression-can-play-nice-with-deduplication-and-rsync/
>
> Dan
Hi Dan,
Excellent - thanks very much for that tip. I didn't know about that
optional patch, and it looks like it will help a lot!
Cheers,
Greg
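The effect Dan describes can be demonstrated with a minimal shell sketch. The file names below are hypothetical, and --rsyncable is only available in gzip builds that include the rsyncable patch (e.g. FreeBSD's archivers/gzip port), so the script probes for it before using it:

```shell
#!/bin/sh
# Why compression defeats chunk-based deduplication: with plain gzip,
# a small change near the start of the input shifts the entire
# compressed stream, so a chunk-based deduplicator like tarsnap's
# finds almost nothing to share between the two compressed files.

seq 1 200000 > dump1.sql                                  # stand-in for a database dump
{ echo '-- changed header'; cat dump1.sql; } > dump2.sql  # same dump, one line prepended

gzip -c dump1.sql > dump1.sql.gz
gzip -c dump2.sql > dump2.sql.gz

# Nearly every byte of the two compressed streams now differs:
echo "differing compressed bytes: $(cmp -l dump1.sql.gz dump2.sql.gz 2>/dev/null | wc -l)"

# If this gzip build supports --rsyncable, it periodically resets the
# compressor at content-defined boundaries, so the two streams realign
# shortly after the changed region and most chunks deduplicate again,
# at a small cost in compression ratio.
if gzip --rsyncable -c /dev/null > /dev/null 2>&1; then
    gzip --rsyncable -c dump1.sql > dump1.rsync.gz
    gzip --rsyncable -c dump2.sql > dump2.rsync.gz
    echo "rsyncable streams written; compare their tails to see realignment"
else
    echo "this gzip build lacks --rsyncable"
fi
```

Note this only illustrates the mechanism; whether the split-dump approach Greg asks about beats --rsyncable for a given database depends on how localized the changes between dumps actually are.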