
Re: Database backup and deduplication question



On 12/20/11 10:45, Greg Larkin wrote:
> On 12/20/11 12:05 AM, Colin Percival wrote:
>> That's very weird.  I just did my own test with two 100 MB files which
>> were the same aside from their middle 50 MB and I got the expected result
>> (200 MB total archive size, 150 MB post-deduplication):
>>
> Ok, I just simplified everything and used the same dd/cat commands you
> have above.  My test is a little different in that I am only adding
> single files to each archive, instead of multiple files with a lot of
> common data to the same archive.  I wonder if that has something to do
> with it?

I just tested with exactly the same sequence of commands as you and I'm
still seeing 100 MB for the first archive and 50 MB for the second.  Are
you using the checkpoint-bytes option?  That can affect the sequence of
chunks which a block of data gets divided into, although it shouldn't
have this much of an effect.  Aside from that slight possibility, I'm
utterly mystified... can you try
# dd if=/dev/random bs=1m count=25 of=part1
# dd if=/dev/random bs=1m count=50 of=part2
# dd if=/dev/random bs=1m count=25 of=part3
# cat part1 part2 part3 > file1
# tarsnap --dry-run -c file1 part1 part2 part3
so we can see if the individual parts are getting deduplicated properly
when they're not stuck together with other data?
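
A sketch of the same dry run with --print-stats added, in case the
statistics aren't already being printed via tarsnap.conf (the archive
name "deduptest" here is arbitrary):
# tarsnap --dry-run --print-stats -c -f deduptest file1 part1 part2 part3
If the parts deduplicate against file1's chunks the way they should,
the "This archive" line of the output should show roughly 200 MB of
total data while the "New data" line should come out close to 100 MB.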

-- 
Colin Percival
Security Officer, FreeBSD | freebsd.org | The power to serve
Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid