
Re: Backing up large unchanging files



On 7/16/25 02:10, Tim Bishop wrote:
> On Mon, Jul 14, 2025 at 07:07:10PM +0100, Tim Bishop wrote:
>> On Mon, Jul 14, 2025 at 03:53:50PM +0000, Colin Percival wrote:
>>> On 7/14/25 08:46, Tim Bishop wrote:
>>>> I'm backing up some large, unchanging files (web server logs). Aside
>>>> from the current log, they are mostly unchanged on a daily basis. As
>>>> per recommendations, I've not compressed these files, which gives
>>>> Tarsnap the best chance to deduplicate and compress.
>>>>
>>>> But the problem is that Tarsnap is reading these files in their
>>>> entirety every day. I guess it has to, so it can identify changed
>>>> blocks, but this makes the backup take a long time and creates a fair
>>>> amount of I/O. And aside from the monthly log rollover, these files
>>>> haven't changed from one day to the next.
>>>
>>> Assuming you're not running with --lowmem, tarsnap should recognize
>>> files which haven't had their {path, inode number, size, mtime} change
>>> since the last backup.  So it should only be re-reading the file which
>>> is currently being written, not the rotated logs.
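
For the curious, that "has this file changed?" test is conceptually just a
stat(2) comparison against the cache entry; a minimal sketch follows (the
struct and function names are hypothetical, not tarsnap's actual code):

#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

struct cache_entry {
    char path[4096];    /* archive path at last backup */
    ino_t ino;          /* inode number at last backup */
    off_t size;         /* file size at last backup */
    time_t mtime;       /* modification time at last backup */
};

/* Return nonzero if the file still matches its cached entry, i.e. it
 * can be skipped without re-reading any data. */
static int
cache_entry_matches(const struct cache_entry *ce, const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return (0);
    return (strcmp(ce->path, path) == 0 &&
        sb.st_ino == ce->ino &&
        sb.st_size == ce->size &&
        sb.st_mtime == ce->mtime);
}

int
main(void)
{
    struct cache_entry ce = { "/var/log/httpd/access.log.1", 0, 0, 0 };

    puts(cache_entry_matches(&ce, ce.path) ?
        "unchanged: reuse cached chunk list" :
        "changed: re-read and re-chunk");
    return (0);
}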

>> Thanks - that's what I hoped would happen, but it's not what I saw (no
>> --lowmem option in use). Here's an example rotated log file:
>
> It happened again the following night. I couldn't replicate it with a
> dry run, or with a dry run using --lowmem. However, a dry run with
> --verylowmem did exhibit the same behaviour.
>
> Does Tarsnap automatically enable lowmem/verylowmem under any
> circumstances? For example, if system memory is low.

Aha!  I had completely forgotten about this -- it's something I implemented
in 2008 -- but yes, there's a scenario where tarsnap switches into "lowmem"
mode.  Specifically, if you're archiving a large number of small files, the
default ("normalmem") regime can end up caching a lot of archive data; to
avoid this, tarsnap tracks the memory used to store "trailers" (i.e. the
data at the end of a file which is too small to be its own chunk) and stops
storing them if they're taking up more memory than tarsnap is using to keep
track of complete chunks of data.
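
In rough outline, the accounting works like this (a minimal sketch of the
heuristic described above; all names and numbers are hypothetical, not
tarsnap's actual code):

#include <stddef.h>
#include <stdio.h>

static size_t chunk_cache_bytes;    /* memory tracking complete chunks */
static size_t trailer_cache_bytes;  /* memory caching trailer data */

/* Account for the bookkeeping cost of one complete chunk. */
static void
note_chunk(size_t overhead)
{
    chunk_cache_bytes += overhead;
}

/* Decide whether to cache a trailer or drop it (lowmem behaviour):
 * once trailers cost more memory than the bookkeeping for complete
 * chunks, stop storing them. */
static int
should_cache_trailer(size_t trailerlen)
{
    if (trailer_cache_bytes + trailerlen > chunk_cache_bytes)
        return (0);
    trailer_cache_bytes += trailerlen;
    return (1);
}

int
main(void)
{
    /* A big file first: plenty of complete chunks to keep track of. */
    note_chunk(100000);
    printf("trailer cached: %d\n", should_cache_trailer(4096));  /* 1 */

    /* After many small files, the balance tips and caching stops. */
    trailer_cache_bytes = 200000;
    printf("trailer cached: %d\n", should_cache_trailer(4096));  /* 0 */
    return (0);
}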

The good news here is that:
1. In this scenario, *most* large files still get completely cached; it's
just an unlucky few (around 5-10% of them) -- those which happen to be a
number of complete chunks plus a small extra bit -- which are affected, and
2. Tarsnap still caches the list of complete chunks, so while it has to
re-read the file every time, it's only doing the far less CPU-intensive
work of computing hashes to verify that the data hasn't changed, rather
than running all of the data through the (considerably more CPU-intensive)
chunking code.
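
To make point 2 concrete: with the chunk list cached, the re-read is just
sequential reads plus one hash per chunk, with no boundary search.  A
sketch (the hash here is a cheap stand-in for the real cryptographic one,
and all names are hypothetical):

#include <stdint.h>
#include <stdio.h>

/* Stand-in for the real (cryptographic) per-chunk hash. */
static uint64_t
hash_chunk(const uint8_t *buf, size_t len)
{
    uint64_t h = 14695981039346656037ULL;  /* FNV-1a */
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 1099511628211ULL;
    }
    return (h);
}

struct chunk_rec {
    size_t len;     /* chunk length recorded at the last backup */
    uint64_t hash;  /* chunk hash recorded at the last backup */
};

/* Verify a file against its cached chunk list: hashing only, no
 * chunk-boundary search.  Returns 1 if every chunk still matches. */
static int
verify_chunks(FILE *f, const struct chunk_rec *recs, size_t nrecs)
{
    uint8_t buf[65536];
    size_t i;

    for (i = 0; i < nrecs; i++) {
        if (recs[i].len > sizeof(buf) ||
            fread(buf, 1, recs[i].len, f) != recs[i].len)
            return (0);
        if (hash_chunk(buf, recs[i].len) != recs[i].hash)
            return (0);
    }
    return (1);
}

int
main(void)
{
    const char data[] = "example chunk data";
    struct chunk_rec rec = { sizeof(data),
        hash_chunk((const uint8_t *)data, sizeof(data)) };
    FILE *f = tmpfile();

    if (f == NULL)
        return (1);
    fwrite(data, 1, sizeof(data), f);
    rewind(f);
    puts(verify_chunks(f, &rec, 1) ? "unchanged" : "changed");
    fclose(f);
    return (0);
}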

I *think* I can address this with a patch to say "cache trailers on large
files even if we've decided on our own to switch into lowmem mode".  You've
been using Tarsnap for a long time; can I assume that you can test a patch
for me once I've written something?
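
Concretely, I'd expect the patch to amount to a wrapper around the
heuristic sketched earlier, something like the following (hypothetical
names; the size threshold is just a placeholder):

#include <stddef.h>
#include <sys/types.h>

/* Hypothetical size above which a file counts as "large". */
#define LARGE_FILE_THRESHOLD    ((off_t)16 * 1024 * 1024)

/* The accounting heuristic from the earlier sketch. */
extern int should_cache_trailer(size_t trailerlen);

/* Proposed behaviour: always cache the trailer of a large file, even
 * after the automatic switch into lowmem mode, so that the whole file
 * can be skipped on the next run.  (Accounting omitted for brevity.) */
int
should_cache_trailer_patched(size_t trailerlen, off_t filelen)
{
    if (filelen >= LARGE_FILE_THRESHOLD)
        return (1);
    return (should_cache_trailer(trailerlen));
}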

--
Colin Percival
FreeBSD Release Engineering Lead & EC2 platform maintainer
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid