
Re: Database backup and deduplication question



On Mon, Dec 19, 2011 at 05:59:50PM -0500, Greg Larkin wrote:
>                                        Total size  Compressed size
> All archives                            524618980        527202985
>   (unique data)                         183722213        184630527
> deduptest5                              104924533        105446860
>   (unique data)                          78795582         79187867
> 
> Tarsnap reports 75MB of unique data in this archive, instead of 50MB. 
> Is that due to the design of the chunking algorithm and expected
> behavior?  If it is, is my best option to split the dump file into parts
> that will likely remain static and the ones that will change more
> frequently?

Hi Greg, 

I went through the same exercise a while ago. I found that once you
compress something, you can basically forget about deduplication: a
small change early in the input changes the entire compressed stream
after it. However, there is an option in the GNU version of gzip
(available in FreeBSD under archivers/gzip) called --rsyncable that
resets the compressor at input-dependent boundaries, so local changes
stay local in the output and deduplication still works, at a small
penalty in compression ratio.
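For what it's worth, here's a quick way to see the effect: compare how
much two streams still have in common after their inputs differ by a
single byte. This is just a minimal Python sketch using zlib as a
stand-in compressor (an illustration of the general problem, not
tarsnap's actual chunking, and the "database dumps" are made up):

```python
import zlib

def common_suffix(x, y):
    """Length of the longest common suffix of two byte strings."""
    n = 0
    while n < min(len(x), len(y)) and x[-1 - n] == y[-1 - n]:
        n += 1
    return n

# Two hypothetical dumps, identical except for one byte in the header.
a = b"header-v1\n" + b"stable row data\n" * 5000
b = b"header-v2\n" + b"stable row data\n" * 5000

ca, cb = zlib.compress(a), zlib.compress(b)

print(common_suffix(a, b))    # large: everything after the header matches
print(common_suffix(ca, cb))  # tiny: the compressed streams diverge
```

Uncompressed, a chunk-based deduplicator can match nearly all of the
second dump against the first; compressed, there's essentially nothing
left to match. --rsyncable works around this by restarting the
compressor at boundaries derived from the input content itself.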

I wrote a couple of blog posts along these lines:
http://therub.org/2010/10/20/can-compression-play-nicely-with-deduplication/
http://therub.org/2010/11/08/compression-can-play-nice-with-deduplication-and-rsync/

Dan