Re: Identifying which files changed between archives
Hi Scott,
Of course! The current draft has it called
--probability-check-file
The man-page entry (again, in the draft) is:
--probability-check-file p
(c mode only)
If a file has not changed, read it anyway (with the given probability p from
0.0 to 1.0) and compare its hash(es) and trailer to the cached values. This
might provide a warning about a failing disk if other detection methods fail.
However, this operation will make Tarsnap run considerably slower, so we do
not recommend using a high probability with large archives.
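The behaviour described in the draft man page can be illustrated with a small Python sketch. This is not Tarsnap's implementation (Tarsnap is written in C and compares per-chunk hashes and a trailer). The cache layout, a dict mapping each path to `(mtime_ns, size, sha256)`, and the function names are hypothetical, and a single whole-file SHA-256 stands in for Tarsnap's chunk hashes.

```python
import hashlib
import os
import random


def file_sha256(path):
    """Return the SHA-256 hex digest of a file's contents, read in blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def probability_check(cache, p, rng=random):
    """Re-read 'unchanged' files with probability p and report mismatches.

    cache: hypothetical dict mapping path -> (mtime_ns, size, sha256_hex)
    recorded at the previous backup. Returns the paths whose metadata looks
    unchanged but whose contents no longer match the cached hash.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0.0 and 1.0")
    suspicious = []
    for path, (mtime_ns, size, digest) in cache.items():
        st = os.stat(path)
        if (st.st_mtime_ns, st.st_size) != (mtime_ns, size):
            continue  # metadata changed: the file would be backed up normally
        # rng.random() is in [0.0, 1.0), so p=1.0 always reads, p=0.0 never does
        if rng.random() < p:
            if file_sha256(path) != digest:
                suspicious.append(path)
    return suspicious
```

With `p=1.0` every file that looks unchanged is re-read; with `p=0.1`, roughly one in ten is, which is where the slowdown on large archives comes from.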
Cheers,
- Graham
On Tue, Jun 25, 2019 at 07:55:04AM +1000, Scott Dickinson wrote:
> Hi Graham,
> That definitely sounds useful. Is the proposed probability value
> intended to be user-configurable? I'm thinking of keeping it
> low for normal backups, as you suggested, then ramping up to a higher
> value once every x backups. It's a similar idea to running deltas most of the
> time, then a full backup every period.
> Cheers,
> Scott
>
> On 25 June 2019 7:09:54 am AEST, Graham Percival <gperciva@tarsnap.com> wrote:
>
> Hi Scott,
> Thanks for clarifying the use case! Colin had the idea of using the Tarsnap
> cache to detect disk errors. Namely, if the filesystem reports that a file
> hasn't changed, there would be a random chance that the tarsnap client would
> read the file anyway, and compare the chunk hashes against the expected values
> (from previous backups). If you wanted to be paranoid, you could specify a
> probability of 100%, but more likely you'd pick a value like 10% so that it
> didn't impact performance too much.
> This wouldn't warn you about a disk failure which changed the file
> modification time or size, but it would be perfect for a disk which flipped a
> few bits in a file.
> [1]https://github.com/Tarsnap/tarsnap/issues/19
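To try out a check like this, one could simulate the fault Graham describes. The idea is to flip a single bit in place, then restore the original timestamps so that a size-and-mtime comparison sees no change. This is a minimal Python sketch; the helper name is hypothetical, and it should only be run on scratch copies of data.

```python
import os


def flip_bit_preserving_times(path, offset, bit=0):
    """Flip one bit of a file in place, then restore its atime and mtime.

    Simulates silent corruption: size and modification time stay the same,
    so only a content comparison (hash) can notice the change. The inode
    change time (ctime) still updates and cannot be reset from user space.
    """
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in 0..7")
    st = os.stat(path)
    with open(path, "r+b") as f:
        f.seek(offset)
        byte = f.read(1)
        if len(byte) != 1:
            raise ValueError("offset is past the end of the file")
        f.seek(offset)
        f.write(bytes([byte[0] ^ (1 << bit)]))
    # Put the timestamps back exactly (nanosecond precision).
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
```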
> The good news is that I have a proof-of-concept implementation of this.
> I ended up putting it on the back-burner, but I've been looking at the code
> this morning and I still think it's plausible. Does this sound useful?
> Cheers,
> - Graham
> On Sun, Jun 23, 2019 at 06:16:52PM +1000, Scott Dickinson wrote:
>
> Thanks Colin & Jacob.
> With several hundred GB of data being archived, the local tarball
> option is probably not going to work for me.
> Does "tarsnap -t -f" show the file modification date based on what the
> filesystem reports, or on when tarsnap detects a change?
> To provide more details, a number of sectors on an SSD silently
> failed, so I needed to identify and restore the files that were
> corrupted by this event. The filesystem did not report any change in
> modification date on these files, so I couldn't rely on it to identify
> which files to restore. Hence my question about reporting the files
> affected by block changes between archives, both to identify an
> expected change and to recover from this.
> If tarsnap can't do this, perhaps I need to start capturing a hash of
> each file at the time of backup, and compare those between archives.
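Scott's fallback idea, recording a hash of every file at backup time and diffing the records, might look like the following minimal Python sketch. The function names are hypothetical, and the manifest would have to be saved alongside each backup (for example with `json.dump`); nothing here is part of Tarsnap.

```python
import hashlib
import os


def sha256_file(path):
    """Return the SHA-256 hex digest of a file, read in 64 KiB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 16), b""):
            h.update(block)
    return h.hexdigest()


def build_manifest(root):
    """Map each regular file under root (as a relative path) to its SHA-256."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = sha256_file(full)
    return manifest


def diff_manifests(old, new):
    """Return (added, removed, changed) as sorted lists of relative paths."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, changed
```

A file that appears in `changed` while its modification time is unchanged is exactly the silent-corruption case described above; recording the mtime next to each hash would let a script flag those separately.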
> Cheers,
> Scott
> On 19/6/19 7:15 am, Jacob Larsen wrote:
> I had the same issue a while back. I was told it was not easily
> fixed due to the layers in Tarsnap. I ended up making a regular
> tarball and feeding that to tarsnap. That way I had a local tarball that
> matched the actual data in the archive. Then I could extract it and
> compare at the next backup. It's a rather data-heavy process, but it gave me
> what I needed. It is scriptable, so it is possible to let your
> backup script log the changed files on each backup run, but it has a
> pretty high cost in disk I/O, plus you need to keep a copy of your
> data around between backups.
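A variation on Jacob's approach can be sketched in Python with the standard `tarfile` module. Rather than extracting the local tarball as Jacob describes, it streams each member and compares its hash with the file currently on disk. The function names are hypothetical, and it assumes member names are relative to the backed-up directory. It still reads all the data, so the I/O cost Jacob mentions remains.

```python
import hashlib
import os
import tarfile


def _sha256_stream(fileobj):
    """Hash a binary file-like object in 64 KiB blocks."""
    h = hashlib.sha256()
    for block in iter(lambda: fileobj.read(1 << 16), b""):
        h.update(block)
    return h.hexdigest()


def changed_since_tarball(tar_path, root):
    """List regular-file members of tar_path that differ from files under root.

    Members that are missing on disk are reported as changed too.
    """
    changed = []
    with tarfile.open(tar_path, "r") as tar:  # "r" also handles compressed tars
        for member in tar.getmembers():
            if not member.isfile():
                continue
            disk_path = os.path.join(root, member.name)
            if not os.path.isfile(disk_path):
                changed.append(member.name)
                continue
            archived = _sha256_stream(tar.extractfile(member))
            with open(disk_path, "rb") as f:
                current = _sha256_stream(f)
            if archived != current:
                changed.append(member.name)
    return sorted(changed)
```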
> /Jacob
> On 18/06/2019 13.49, Scott Dickinson wrote:
> Hi,
> I'm trying to work out how to generate a report on files that are
> new or changed in a particular archive. I can't seem to find an easy
> way to do this, so hoping someone can help.
> Here is the scenario I'm working through.
> 1. Back up directory "x" on 1st May 2019. First archive; all
> 10 GB are sent, as expected.
> 2. Back up directory "x" on 1st June 2019. Second archive; 25 MB
> are sent.
> How can I report which files those 25 MB of deltas belong to? In
> this scenario, I wasn't expecting any changes to the files over the
> month, so I am surprised there was anything beyond the metadata to be
> backed up. My understanding is that Tarsnap needs to know which
> files the changed blocks belong to, so in theory this
> metadata should be extractable.
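Scott's reasoning, that deduplication implies a mapping from newly stored blocks back to files, can be illustrated with a toy model in Python. This is a simplification: it uses fixed-size chunks, whereas Tarsnap splits data into variable-length chunks. The thread also suggests Tarsnap does not currently expose such a report. The function names and the 4096-byte chunk size are hypothetical.

```python
import hashlib


def chunk_hashes(data, chunk_size=4096):
    """Split data into fixed-size chunks and return their SHA-256 digests."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def new_bytes_per_file(files, known_chunks, chunk_size=4096):
    """Attribute not-yet-stored chunks to the files that contain them.

    files: dict mapping path -> bytes for the new archive.
    known_chunks: set of chunk digests already stored by earlier archives;
    it is updated in place, so a chunk seen twice is only counted once.
    Returns a dict mapping path -> new bytes, for files with any new data.
    """
    report = {}
    for path in sorted(files):
        data = files[path]
        for i, digest in enumerate(chunk_hashes(data, chunk_size)):
            if digest in known_chunks:
                continue  # already stored: deduplicated, nothing uploaded
            known_chunks.add(digest)
            start = i * chunk_size
            report[path] = report.get(path, 0) + min(chunk_size, len(data) - start)
    return report
```

Summing the values of the returned dict gives the toy equivalent of the "25 MB sent" figure, broken down per file.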
> The closest I've found is "tarsnap -t -f 'x' -v --iso-dates",
> but this doesn't natively provide the details I'm after. Ideally
> I'd like tarsnap to be able to report which files were uploaded at
> the time of archiving, with an option similar to --print-stats.
> Anyone got any ideas?
> Cheers,
> Scott
>
>
> References
>
> 1. https://github.com/Tarsnap/tarsnap/issues/19