
Re: Client-side deduplication during extraction



Hi Colin,

On Sat, Apr 08, 2017 at 07:52:54PM -0700, Colin Percival wrote:
> > If not, then I am planning to use a us-east-1 EC2 instance so that at
> > least the Tarsnap server<->client bandwidth is in one place. I can then
> > use that machine to deduplicate and then the download to my machine here
> > can at least be efficient. In this case, will I still end up being
> > billed by Tarsnap for the "Compressed size/All archives" figure?
> 
> If you extract all of the archives, yes.
> 
> How are you planning on storing your data after you extract all of the
> archives?  Something like ZFS which provides filesystem level deduplication?

I intend to use my own tool, ddar[1], which deduplicates at the
userspace level. This particular set of archives will never change, and
I won't need to add to it again, so I'd like to take it offline, where
storage is cheaper. I use git-annex to keep multiple copies of static
data, including off-site, as needed.

Robie

[1] https://github.com/basak/ddar