Re: Verbose mode that prints added/modified files?
On 04/03/11 00:42, Simo Melenius wrote:
> A quick question/feature req: When creating an archive is it possible
> to print the paths of new files and updated files (that generated new
> blocks on top of what was in the store already for that particular
> file)?
Reporting new files isn't possible: Tarsnap's deduplication works on an
archive-set-wide basis, and the tar headers are bundled together into blocks
of their own, so there's no way to identify whether a file is new or not.
(And how exactly do you define "new"? What if a file is renamed and thus has
a new path, but all of its contents are old?)
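To make the rename case concrete, here is a minimal sketch of a content-addressed chunk store. The fixed 4-byte chunking, the `chunk_hashes` and `archive` helpers, and the file names are all hypothetical; Tarsnap's real chunkifier produces variable-size chunks, and tar headers are left out entirely. A renamed file with unchanged contents yields zero new chunks, so chunk novelty alone says nothing about whether a path is new:

```python
import hashlib

# Hypothetical content-addressed store: the set of chunk hashes already uploaded.
store: set[str] = set()


def chunk_hashes(data: bytes, size: int = 4) -> list[str]:
    # Fixed-size chunking purely for illustration; Tarsnap's chunkifier is
    # content-defined and produces variable-size chunks.
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def archive(files: dict[str, bytes]) -> int:
    """Store each file's chunks and return how many were not already stored."""
    new = 0
    for data in files.values():  # the path plays no part in deduplication
        for h in chunk_hashes(data):
            if h not in store:
                store.add(h)
                new += 1
    return new


first = archive({"docs/report.txt": b"quarterly numbers"})
second = archive({"docs/report-final.txt": b"quarterly numbers"})  # a rename
print(first, second)  # 5 0
```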
Reporting updated files is difficult, but theoretically possible. You'd need
to decide what to do with file fragments, though: if Tarsnap has a small
amount left over from a file which isn't worth storing as a block by itself,
Tarsnap will bundle it together with other small fragments, so a small part
of a file could be stored again even though the file itself hadn't changed.
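A minimal sketch of the fragment problem, assuming a hypothetical 8-byte block size and a single shared bundle for every leftover tail (the real block sizes and bundling rules differ). Only `b.txt` changes, yet `a.txt` also ends up owning a never-before-seen block, because its tail shares a bundle with the tail of `b.txt`:

```python
import hashlib

BLOCK = 8  # hypothetical block size, tiny so the example stays readable


def blocks_for(files: dict[str, bytes]) -> dict[str, set[str]]:
    """Map each path to the hashes of the blocks that hold its bytes.

    Whole BLOCK-sized pieces become blocks of their own; every leftover
    tail shorter than BLOCK is appended to one shared fragment bundle.
    """
    owners: dict[str, set[str]] = {path: set() for path in files}
    bundle = b""
    bundle_owners: list[str] = []
    for path, data in files.items():
        cut = len(data) - len(data) % BLOCK
        for i in range(0, cut, BLOCK):
            owners[path].add(hashlib.sha256(data[i:i + BLOCK]).hexdigest())
        if cut < len(data):  # a tail too small to be stored by itself
            bundle += data[cut:]
            bundle_owners.append(path)
    if bundle:
        bundle_hash = hashlib.sha256(bundle).hexdigest()
        for path in bundle_owners:
            owners[path].add(bundle_hash)
    return owners


before = blocks_for({"a.txt": b"unchanged file!", "b.txt": b"version 1"})
stored = set().union(*before.values())
after = blocks_for({"a.txt": b"unchanged file!", "b.txt": b"version 2"})
# Files that own at least one block the store has never seen:
flagged = [path for path, hashes in after.items() if hashes - stored]
print(flagged)  # ['a.txt', 'b.txt']
```

A per-file "updated" report built on block novelty would therefore need some rule for blocks shared between files, or it will flag unchanged files like `a.txt`.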
> [snip]
> Would this be technically difficult considering tarsnap architecture?
> I'm willing to cook up a patch for review and potential inclusion in
> the upstream if it doesn't break badly against the design of tarsnap.
The problem here is that the deduplication is done at a different layer
from the crawl-a-directory-tree-and-generate-a-tarball code. In essence
the layers are:
* bsdtar code crawls a directory tree and feeds files to
* libarchive code, which generates a stream of tar and feeds it to
* multitape code, which splits the stream into several sub-streams and uses the
* chunkifier code, which splits each sub-stream into chunks and sends them to
* chunk deduplication code, which looks at each chunk to decide if it's new.
(And underneath this all is the transactional storage layer, the request
protocol layer, the network connection protocol layer, and the underlying
non-blocking network I/O code.)
Feeding information back from the chunk deduplication code up to the bsdtar
code is possible, but would definitely be awkward.
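A toy version of the layering, to show where the awkwardness comes from. Everything here is hypothetical: a fake header format stands in for the bsdtar and libarchive layers, fixed 8-byte chunks stand in for the chunkifier, and the multitape split into sub-streams is omitted entirely (that split is where the real bookkeeping would get much harder). The deduplication layer only ever sees byte offsets, so to report paths the stream generator has to record where each file's contents landed, and the new chunks have to be mapped back up through those offsets:

```python
import hashlib
from collections.abc import Iterator

CHUNK = 8  # hypothetical fixed chunk size; the real chunkifier is content-defined


def make_stream(files: dict[str, bytes]) -> tuple[bytes, list[tuple[str, int, int]]]:
    """Stand-in for bsdtar + libarchive: emit a fake header plus contents per
    file, recording the [start, end) offsets of each file's contents."""
    stream = b""
    spans = []
    for path, data in files.items():
        stream += b"HDR:" + path.encode() + b"\0"
        spans.append((path, len(stream), len(stream) + len(data)))
        stream += data
    return stream, spans


def chunkify(stream: bytes) -> Iterator[tuple[int, bytes]]:
    """Stand-in for the chunkifier: yield (offset, chunk) pairs."""
    for i in range(0, len(stream), CHUNK):
        yield i, stream[i:i + CHUNK]


def dedup(chunks: Iterator[tuple[int, bytes]], store: set[str]) -> list[tuple[int, int]]:
    """Stand-in for deduplication: store unseen chunks, return their ranges."""
    new = []
    for off, chunk in chunks:
        h = hashlib.sha256(chunk).hexdigest()
        if h not in store:
            store.add(h)
            new.append((off, off + len(chunk)))
    return new


def files_with_new_chunks(files: dict[str, bytes], store: set[str]) -> list[str]:
    """The awkward part: map new chunk ranges back up to file paths."""
    stream, spans = make_stream(files)
    new = dedup(chunkify(stream), store)
    return sorted({path for path, start, end in spans
                   for n_start, n_end in new if n_start < end and start < n_end})


store: set[str] = set()
print(files_with_new_chunks({"a": b"AAAAAAAA", "b": b"BBBBBBBB"}, store))  # ['a', 'b']
print(files_with_new_chunks({"a": b"AAAAAAAA", "b": b"BBBBCCCC"}, store))  # ['b']
```

Even in this toy, a chunk that straddles two files is attributed to both, which is the same ambiguity as the fragment case above.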
--
Colin Percival
Security Officer, FreeBSD | freebsd.org | The power to serve
Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid