Re: Backing up large unchanging files



Hi Colin,

On Mon, Jul 14, 2025 at 03:53:50PM +0000, Colin Percival wrote:
> On 7/14/25 08:46, Tim Bishop wrote:
> > I'm backing up some large unchanging files (web server logs). Aside from
> > the current log, they mostly are unchanging on a daily basis. As per
> > recommendations I've not compressed these files which gives Tarsnap the
> > best chance to deduplicate and compress.
> > 
> > But, the problem is that Tarsnap is reading these files every day in
> > their entirety. I guess it has to so it can identify changed blocks, but
> > this is making the backup take a long time and creates a fair amount of
> > I/O. And aside from the monthly log rollover, these files haven't
> > changed from one day to the next.
> 
> Assuming you're not running with --lowmem, tarsnap should recognize files
> which haven't had their {path, inode number, size, mtime} change since the
> last backup.  So it should only be re-reading the file which is currently
> being written, not the rotated logs.

Thanks - that's what I hoped would happen. But not what I saw (no
--lowmem option in use). Here's an example rotated log file:

root@server:/home/tdb # stat -x /logs/access.log.0
  File: "/logs/access.log.0"
  Size: 49492046236  FileType: Regular File
  Mode: (0644/-rw-r--r--)         Uid: (    0/    root)  Gid: (    0/ wheel)
Device: 18446744072915916357,1759051977   Inode: 13271    Links: 1
Access: Mon Jul 14 16:19:56 2025
Modify: Thu Jul 10 16:22:28 2025
Change: Thu Jul 10 16:22:28 2025
 Birth: Fri Jul 23 15:33:48 2010

So path and mtime definitely haven't changed in 4 days. I can't see how
the inode could have. Size must be the same too.

There have been successful tarsnap runs over the past few days with these
same files too - I checked the archives with -t to confirm the mtime and
size at least.

Here's some procstat output for tarsnap when it was reading that file:

root@server:/ # procstat files 46728
  PID COMM                FD T V FLAGS    REF  OFFSET PRO NAME
46728 tarsnap           text v r r-------   -       - -   /usr/local/bin/tarsnap
46728 tarsnap            cwd v d r-------   -       - -   /logs
46728 tarsnap           root v d r-------   -       - -   /
46728 tarsnap              0 p - rw------   3       0 -   -
46728 tarsnap              1 p - rw------   4       0 -   -
46728 tarsnap              2 p - rw------   4       0 -   -
46728 tarsnap              3 v r rw------   1       0 -   /var/tarsnap-cache/lockf
46728 tarsnap              4 s - rw---n--   1       0 TCP 0 0 my::ipv6.21911 tarsnap::ipv6.9279
46728 tarsnap              5 v d r-------   1       0 -   /root
46728 tarsnap              6 v d r----n--   1 536547227 - /logs
46728 tarsnap              7 v r r-------   1 11414077440 - /logs/access.log.0

The last line shows it some distance through reading that file (offset
11414077440 of 49492046236 bytes, so roughly 23% of the way in).

Of course, I now do a dry run with -v and it flies past the file without
a problem. And I'm pretty sure yesterday it didn't spend ages on it
either.

Is there a way of extracting the last {path, inode number, size, mtime}
for a particular file, either from the cache or previous archives? That
would at least confirm whether it's just my mistake somewhere.

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x6C226B37FDF38D55