Re: Backing up large unchanging files
Hi Colin,
On Mon, Jul 14, 2025 at 03:53:50PM +0000, Colin Percival wrote:
> On 7/14/25 08:46, Tim Bishop wrote:
> > I'm backing up some large unchanging files (web server logs). Aside from
> > the current log, they mostly are unchanging on a daily basis. As per
> > recommendations I've not compressed these files which gives Tarsnap the
> > best chance to deduplicate and compress.
> >
> > But, the problem is that Tarsnap is reading these files every day in
> > their entirety. I guess it has to so it can identify changed blocks, but
> > this is making the backup take a long time and creates a fair amount of
> > I/O. And aside from the monthly log rollover, these files haven't
> > changed from one day to the next.
>
> Assuming you're not running with --lowmem, tarsnap should recognize files
> which haven't had their {path, inode number, size, mtime} change since the
> last backup. So it should only be re-reading the file which is currently
> being written, not the rotated logs.
Thanks - that's what I hoped would happen. But not what I saw (no
--lowmem option in use). Here's an example rotated log file:
root@server:/home/tdb # stat -x /logs/access.log.0
File: "/logs/access.log.0"
Size: 49492046236 FileType: Regular File
Mode: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ wheel)
Device: 18446744072915916357,1759051977 Inode: 13271 Links: 1
Access: Mon Jul 14 16:19:56 2025
Modify: Thu Jul 10 16:22:28 2025
Change: Thu Jul 10 16:22:28 2025
Birth: Fri Jul 23 15:33:48 2010
So path and mtime definitely haven't changed in 4 days. I can't see how
the inode could have. Size must be the same too.
There have been successful runs of tarsnap over the past few days with these
same files too - I checked the archives with -t to confirm the mtime and size
at least.
Here's some procstat output for tarsnap when it was reading that file:
root@server:/ # procstat files 46728
PID COMM FD T V FLAGS REF OFFSET PRO NAME
46728 tarsnap text v r r------- - - - /usr/local/bin/tarsnap
46728 tarsnap cwd v d r------- - - - /logs
46728 tarsnap root v d r------- - - - /
46728 tarsnap 0 p - rw------ 3 0 - -
46728 tarsnap 1 p - rw------ 4 0 - -
46728 tarsnap 2 p - rw------ 4 0 - -
46728 tarsnap 3 v r rw------ 1 0 - /var/tarsnap-cache/lockf
46728 tarsnap 4 s - rw---n-- 1 0 TCP 0 0 my::ipv6.21911 tarsnap::ipv6.9279
46728 tarsnap 5 v d r------- 1 0 - /root
46728 tarsnap 6 v d r----n-- 1 536547227 - /logs
46728 tarsnap 7 v r r------- 1 11414077440 - /logs/access.log.0
The last line shows it some distance through reading that file.
Of course, I now do a dry run with -v and it flies past the file without
a problem. And I'm pretty sure yesterday it didn't spend ages on it
either.
Is there a way of extracting the last {path, inode number, size, mtime}
for a particular file, either from the cache or previous archives? That
would at least confirm whether it's just my mistake somewhere.
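In the meantime, one way to rule out a change on the filesystem side is to record the same four fields independently before each backup and compare them with the previous run. Below is a minimal sketch of that idea. It does not read tarsnap's cache or archives (I don't know a supported way to do that); it only saves what lstat reports. The script name statwatch.py and the state file path are made up for illustration.

```python
import json
import os
import sys


def stat_key(path):
    """Return the fields mentioned in the thread: path, inode, size, mtime."""
    st = os.lstat(path)
    return {
        "path": os.path.abspath(path),
        "inode": st.st_ino,
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,
    }


def changed_fields(old, new):
    """List the field names whose values differ between two records."""
    fields = ("path", "inode", "size", "mtime_ns")
    return [k for k in fields if old.get(k) != new.get(k)]


def check(path, state_file):
    """Compare the current record with the saved one, then save the new one.

    Returns None if there was no earlier record, otherwise the list of
    fields that changed (empty if nothing changed).
    """
    try:
        with open(state_file) as f:
            state = json.load(f)
    except FileNotFoundError:
        state = {}
    new = stat_key(path)
    old = state.get(new["path"])
    diff = None if old is None else changed_fields(old, new)
    state[new["path"]] = new
    with open(state_file, "w") as f:
        json.dump(state, f, indent=2)
    return diff


if __name__ == "__main__":
    # Hypothetical usage:
    #   python3 statwatch.py /logs/access.log.0 /var/tmp/statwatch.json
    result = check(sys.argv[1], sys.argv[2])
    if result is None:
        print("no previous record")
    elif result:
        print("changed:", ", ".join(result))
    else:
        print("unchanged")
```

Run just before each tarsnap invocation, this would show whether any of the four fields moved between backups. If it reports "unchanged" on a day when tarsnap still reads the whole file, the cause is more likely on the cache side than on the filesystem side.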
Tim.
--
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x6C226B37FDF38D55