
Re: Client-side deduplication during extraction



Hi Colin,

On Sat, Apr 08, 2017 at 07:52:54PM -0700, Colin Percival wrote:
> > If not, then I am planning to use a us-east-1 EC2 instance so that at
> > least the Tarsnap server<->client bandwidth is in one place. I can then
> > use that machine to deduplicate and then the download to my machine here
> > can at least be efficient. In this case, will I still end up being
> > billed by Tarsnap for the "Compressed size/All archives" figure?
> 
> If you extract all of the archives, yes.
> 
> How are you planning on storing your data after you extract all of the
> archives?  Something like ZFS which provides filesystem level deduplication?

I intend to use my own tool, ddar[1], which deduplicates at the
userspace level. This particular set of archives will never change, and
I won't need to add to it again, so I'd like to take it offline, where
storage is cheaper. I use git-annex to keep multiple copies of static
data, including off-site, as needed.

Robie

[1] https://github.com/basak/ddar