Re: Database backup and deduplication question
On Mon, Dec 19, 2011 at 05:59:50PM -0500, Greg Larkin wrote:
>                    Total size    Compressed size
> All archives        524618980          527202985
>   (unique data)     183722213          184630527
> deduptest5          104924533          105446860
>   (unique data)      78795582           79187867
>
> Tarsnap reports 75MB of unique data in this archive, instead of 50MB.
> Is that due to the design of the chunking algorithm and expected
> behavior? If it is, is my best option to split the dump file into parts
> that will likely remain static and the ones that will change more
> frequently?
Hi Greg,
I went through the same exercise a while ago. I found that whenever you
compress something, you can basically forget about deduplication.
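To see why, here is a toy Python sketch (zlib rather than tarsnap's actual chunker, and made-up sizes): a one-byte edit near the front of a file leaves almost all of the uncompressed data identical and therefore deduplicable, but the compressed streams diverge almost immediately and never line up again.

```python
import zlib

# ~500 KB of structured text, then the same text with a one-byte edit
# near the front.
base = "".join(f"record {i}: value={i * i}\n" for i in range(20000)).encode()
edited = bytearray(base)
edited[100] ^= 1
edited = bytes(edited)

# Uncompressed, everything after the edit still matches, so a
# content-defined chunker could deduplicate nearly all of it.
assert base[101:] == edited[101:]

cbase = zlib.compress(base)
cedit = zlib.compress(edited)

# Compressed, the streams diverge very early and stay different, so
# chunk-level deduplication finds almost nothing in common.
prefix = 0
while prefix < min(len(cbase), len(cedit)) and cbase[prefix] == cedit[prefix]:
    prefix += 1
print(f"{len(cbase)} compressed bytes, only {prefix} shared as a prefix")
```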
However, the GNU version of gzip (available in FreeBSD under
archivers/gzip) has an option called --rsyncable that compresses in
such a way as to preserve deduplication, at a small penalty in
compression ratio.
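The idea behind --rsyncable can be sketched in Python with zlib: flush and reset the compressor at boundaries chosen by a rolling sum over the raw data, so that identical regions after an edit compress to identical bytes again. The window and mask values below are toy parameters, not gzip's actual ones.

```python
import zlib

WIN = 64     # rolling-sum window (toy value)
MASK = 0xFF  # boundary when the window sum hits a multiple of 256

def rsyncable_compress(data: bytes) -> bytes:
    """Raw-deflate compress, issuing a full flush (which resets the
    compressor's dictionary) at content-defined boundaries. After a
    reset, identical input regions produce identical compressed
    bytes, which is the property --rsyncable preserves."""
    comp = zlib.compressobj(wbits=-15)  # raw deflate, no header/checksum
    out = []
    window_sum = 0
    chunk_start = 0
    for i, b in enumerate(data):
        window_sum += b
        if i >= WIN:
            window_sum -= data[i - WIN]
        if (window_sum & MASK) == 0 and i + 1 - chunk_start >= WIN:
            out.append(comp.compress(data[chunk_start:i + 1]))
            out.append(comp.flush(zlib.Z_FULL_FLUSH))
            chunk_start = i + 1
    out.append(comp.compress(data[chunk_start:]))
    out.append(comp.flush())
    return b"".join(out)

base = "".join(f"record {i}: value={i * i}\n" for i in range(20000)).encode()
edited = bytearray(base)
edited[100] ^= 1  # one-byte edit near the front
edited = bytes(edited)

cbase = rsyncable_compress(base)
cedit = rsyncable_compress(edited)

# Past the first boundary after the edit, the two compressed streams
# resynchronize, so most of their chunks deduplicate again.
suffix = 0
while suffix < min(len(cbase), len(cedit)) and cbase[-1 - suffix] == cedit[-1 - suffix]:
    suffix += 1
print(f"{len(cbase)} compressed bytes, {suffix} shared as a suffix")
```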
I wrote a couple of blog posts along these lines:
http://therub.org/2010/10/20/can-compression-play-nicely-with-deduplication/
http://therub.org/2010/11/08/compression-can-play-nice-with-deduplication-and-rsync/
Dan