Database backup and deduplication question
Hi everyone,
I'm using tarsnap to back up some large MySQL database dump files, and
at the moment they are compressed before being backed up. Because even a
small change to the underlying data changes the compressed output almost
entirely, that means I have to push close to the maximum amount of data
each day, so I'm looking to reduce the time and bandwidth it takes to
store each one.
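As a quick illustration of why compression defeats deduplication (this
is generic Python, nothing tarsnap-specific), a single flipped byte
scrambles the compressed stream from roughly that point onward:

    import gzip

    # Two ~800KB inputs that differ by one flipped byte near the start.
    a = b"\n".join(b"row %06d,value" % i for i in range(50000))
    a_mod = bytearray(a)
    a_mod[100] ^= 0xFF
    a_mod = bytes(a_mod)

    # mtime=0 keeps the gzip headers identical, so only the payload
    # can differ.
    ca = gzip.compress(a, mtime=0)
    cb = gzip.compress(a_mod, mtime=0)

    # Find the first position where the compressed streams diverge.
    diverge = next(i for i, (x, y) in enumerate(zip(ca, cb)) if x != y)
    print(f"compressed ~{len(ca)} bytes; streams diverge at byte {diverge}")
    # Nearly everything after the divergence point differs, so a
    # chunk-level deduplicator would see the two compressed files as
    # mostly new data.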
I assume that if I generate an uncompressed full database backup file of
~1GB each day and only ~5MB of the contents change, tarsnap recognizes
that and sends only the changed data. My questions are:
1) Does that assumption hold even if the filename changes each day?
2) Does that assumption hold if the filesystem inodes of the backup file
change each day?
3) Does tarsnap recognize what data to send if only a small amount in
the middle of the file changes?
I tested some scenarios by generating a 100MB file of random data. I
tarsnapped it several times, and as expected, the full file was only
transmitted the first time. I then copied the file to a new name and
tarsnapped again. The file was not transmitted, because its contents are
identical to the original file, so it appears the answers to #1 and #2
are yes.
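That matches the mental model I've been working from; purely as my own
assumption about how content-addressed deduplication behaves (not a
claim about tarsnap's actual implementation), it's something like this
toy sketch:

    import hashlib
    import os

    class ToyChunkStore:
        """Toy content-addressed store: chunks are keyed only by their
        SHA-256 hash, so the source's filename and inode never matter."""

        def __init__(self):
            self.chunks = {}

        def add(self, data, chunk_size=64 * 1024):
            """Split data into fixed 64KB chunks; return how many bytes
            were not already in the store."""
            new_bytes = 0
            for i in range(0, len(data), chunk_size):
                chunk = data[i:i + chunk_size]
                key = hashlib.sha256(chunk).digest()
                if key not in self.chunks:
                    self.chunks[key] = chunk
                    new_bytes += len(chunk)
            return new_bytes

    store = ToyChunkStore()
    data = os.urandom(16 * 1024 * 1024)  # stand-in for the test file
    print(store.add(data))  # first archive: every byte is new
    print(store.add(data))  # same content under a new name: 0 new bytes

(Fixed-size chunks are obviously far cruder than whatever tarsnap really
does, but they are enough to model the rename/copy case.)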
Next, I modified the copy so that it had 25MB of data identical to the
original at the beginning of the file, 50MB of different data in the
middle, and 25MB of identical data at the end. I tarsnapped again, and
this time I saw:
                                   Total size  Compressed size
All archives                        524618980        527202985
  (unique data)                     183722213        184630527
deduptest5                          104924533        105446860
  (unique data)                      78795582         79187867
Tarsnap reports 75MB of unique data in this archive, instead of the
~50MB I actually changed. Is that expected behavior, arising from the
design of the chunking algorithm? If it is, is my best option to split
the dump file into the parts that are likely to remain static and the
parts that will change more frequently?
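For reference, the ~50MB expectation comes from my mental model of
generic rolling-hash, content-defined chunking (again, my own sketch and
assumption, not tarsnap's actual algorithm), where a contiguous
overwrite should only cost the changed region plus a chunk or so at each
edge:

    import hashlib
    import os

    def cdc_chunks(data, window=48, mask=0x0FFF):
        """Naive content-defined chunker: cut wherever the rolling sum
        of the last `window` bytes has all of `mask`'s bits set (about
        1 position in 4096, so chunks average ~4KB on random input)."""
        chunks, start, rolling = [], 0, 0
        for i in range(len(data)):
            rolling += data[i]
            if i - start >= window:
                rolling -= data[i - window]
            if (rolling & mask) == mask:
                chunks.append(data[start:i + 1])
                start, rolling = i + 1, 0
        if start < len(data):
            chunks.append(data[start:])
        return chunks

    seen = set()

    def new_bytes(data):
        """Bytes of `data` whose chunks were not stored previously."""
        n = 0
        for chunk in cdc_chunks(data):
            key = hashlib.sha256(chunk).digest()
            if key not in seen:
                seen.add(key)
                n += len(chunk)
        return n

    data = bytearray(os.urandom(4 * 2**20))
    print(new_bytes(bytes(data)))   # first pass: all ~4MiB is new

    # Overwrite 1MiB in the middle, leaving the rest identical.
    data[2**20:2 * 2**20] = os.urandom(2**20)
    print(new_bytes(bytes(data)))   # ~1MiB plus a chunk or two: the
                                    # chunk boundaries resynchronize
                                    # right after the edited region

In that toy model the second pass reports only slightly more than the
1MiB that changed, which is why the extra ~25MB in my real test
surprised me.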
Thank you,
Greg