Hi Scott,
Thanks for clarifying the use case! Colin had the idea of using the Tarsnap
cache to detect disk errors. Namely, if the filesystem reports that a file
hasn't changed, there would be a random chance that the tarsnap client would
read the file anyway, and compare the chunk hashes against the expected values
(from previous backups). If you wanted to be paranoid, you could specify a
probability of 100%, but more likely you'd pick a value like 10% so that it
didn't impact performance too much.
This wouldn't warn you about a disk failure which changed the file
modification time or size, but it would be perfect for a disk which flipped a
few bits in a file.
https://github.com/Tarsnap/tarsnap/issues/19
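In the meantime, a rough userspace approximation of the same idea is
to save a hash manifest at backup time and spot-check a random sample
of files against it. The manifest path and the 10% sampling rate
below are just examples:

    # Manifest written at the last backup, e.g. with:
    #   find /data -type f -exec sha256sum {} + > /var/backups/manifest.sha256
    MANIFEST=/var/backups/manifest.sha256
    # Pick ~10% of files at random and verify their hashes.
    find /data -type f | awk 'BEGIN { srand() } rand() < 0.10' |
    while IFS= read -r f; do
        # sha256sum prints "hash  path"; -qxF checks for that exact line.
        grep -qxF "$(sha256sum "$f")" "$MANIFEST" ||
            echo "changed or corrupt: $f"
    done
    # (Files created since the manifest was written will also be flagged.)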
The good news is that I have a proof-of-concept implementation of this.
I ended up putting it on the back-burner, but I've been looking at the code
this morning and I still think it's feasible. Does this sound useful?
Cheers,
- Graham
On Sun, Jun 23, 2019 at 06:16:52PM +1000, Scott Dickinson wrote:
Thanks Colin & Jacob.
With several hundred GB of data being archived, the local tarball
option is probably not going to work for me.
Does "tarsnap -t -f" show file modification date based on what the
filesystem is reporting, on when tarsnap detects a change?
To provide more details: a number of sectors on an SSD failed
silently, so I needed to identify and restore the files that were
corrupted by this event. The filesystem did not report any change in
modification date on these files, so I couldn't rely on that to
identify which files to restore. Hence my question about reporting
on the files affected by block changes between archives, both to
spot unexpected changes and to recover from them.
If tarsnap can't do this, perhaps I need to start capturing a hash of
each file at the time of backup, and compare those between archives.
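Something like this, I imagine (paths and dates are illustrative):

    # At backup time, record a hash of every file alongside the archive:
    find /data -type f -exec sha256sum {} + | sort -k 2 > manifest-2019-06-01.txt
    # Diff against the previous backup's manifest to see which
    # files' contents changed:
    diff manifest-2019-05-01.txt manifest-2019-06-01.txt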
Cheers,
Scott
On 19/6/19 7:15 am, Jacob Larsen wrote:
I had the same issue a while back. I was told it was not easily
fixed due to the layers in Tarsnap. I ended up making a regular
tarball and feeding that to tarsnap. That way I had a local tarball
that matched the actual data in the archive. Then I could extract it
and compare at the next backup. It's a data-heavy process, but it
gave me what I needed. It is scriptable, so you can let your backup
script log the changed files on each backup run, but it has a pretty
high cost in disk I/O, plus you need to keep a copy of your data
around between backups.
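Roughly like this (paths and archive names are just examples):

    # Build a local tarball and feed it to tarsnap so the two match:
    tar -cf /backups/data.tar /data
    tarsnap -c -f "data-$(date +%F)" @/backups/data.tar
    # At the next backup, before rebuilding the tarball, extract the
    # old one and diff it against the live tree:
    mkdir -p /tmp/prev
    tar -xf /backups/data.tar -C /tmp/prev
    # (tar strips the leading '/', hence /tmp/prev/data)
    diff -r /tmp/prev/data /data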
/Jacob
On 18/06/2019 13.49, Scott Dickinson wrote:
Hi,
I'm trying to work out how to generate a report on files that are
new or changed in a particular archive. I can't seem to find an easy
way to do this, so I'm hoping someone can help.
Here is the scenario I'm working through.
1. Back up directory "x" on 1st May 2019. First archive; all 10 GB
are sent, as expected.
2. Back up directory "x" on 1st June 2019. Second archive; 25 MB are
sent.
How can I report on which files that 25 MB of deltas belongs to? In
this scenario, I wasn't expecting any changes to the files over the
month, so I was surprised there was anything beyond the metadata to
back up. My understanding is that Tarsnap needs to know which files
the changed blocks belong to, so in theory this metadata should be
extractable.
The closest I've found is "tarsnap -t -f 'x' -v --iso-dates", but
this doesn't natively provide the details I'm after. Ideally I'd
like tarsnap to report which files were uploaded at the time of
archiving, via an option similar to --print-stats.
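The best I can see so far is diffing the verbose listings of two
archives (archive names here are made up):

    tarsnap -t -v --iso-dates -f x-2019-05-01 | sort > may.txt
    tarsnap -t -v --iso-dates -f x-2019-06-01 | sort > june.txt
    diff may.txt june.txt

but that only shows entries whose listed metadata differs, not which
blocks were actually re-uploaded.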
Anyone got any ideas?
Cheers,
Scott