
faster restore needed

Hi fellow tarsnappers,

like all good sysadmins we're actually testing our restores, however
throughput isn't sufficient at present for the business - zfs reports
stored (logicalused = uncompressed) after 3 hours of restore.  At this
rate a recovery of ~ 120GB data will take 40+ hours. I'd expected
something around 5 hours duration.

We've tried a few different VMs in different locations over the last
couple of days with similar results. The data below is from a physical
production box in Ashburn, Virginia, showing the same constraints
despite bandwidth, network & disk io being for tarsnap/practical
purposes unlimited.

## archive

The archive is neither big data nor toy data:

- 30 files
- 120 GiB (archive size)
- 3 large DB files
	- representing 2/5, 1/5 and 1/5 of the total backed-up compressed data
	- 110, 52, 49 GiB "on-disk" size
- all the rest are 18GiB and less

I can parallelise the restore somewhat by using multiple tarsnap
processes, each retrieving a single file from the archive, which does
help, but the large DB files still push overall restore time to 10+
hours, despite throughput of up to ~ 100Mb/s. See end of this email for
an example.
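
To see why parallelism stops helping once one file dominates, here is a rough sketch of the lower bound on restore time. The function name `restore_floor` is made up for illustration, and the per-process rate of 10 GB/hour is an assumed figure, not a measurement from this restore. It also assumes every tarsnap process gets the same throughput, which may not hold in practice.

```shell
# restore_floor STREAMS RATE_GB_PER_HOUR < sizes
# Reads one file size (in GB) per line on stdin and prints a lower bound,
# in hours, for restoring them with STREAMS parallel processes that each
# run at RATE_GB_PER_HOUR (hypothetical helper, assumed equal rates).
restore_floor() {
	awk -v n="$1" -v rate="$2" '
		{ total += $1; if ($1 > largest) largest = $1 }
		END {
			spread = total / (n * rate)   # perfect balancing across streams
			single = largest / rate       # one stream must fetch the largest file
			bound = single
			if (spread > bound) bound = spread
			printf "%.1f\n", bound
		}'
}

# the three large DB files (on-disk GiB figures from the list above),
# 10 streams, assumed 10 GB/hour per stream
printf '110\n52\n49\n' | restore_floor 10 10   # prints 11.0
```

Under these assumptions the 110 GiB file alone sets the floor, so adding more streams cannot shorten the restore.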

## questions

- is this expected throughput? what are other people getting?
- how can I increase the restore throughput for a single large file?
- would spinning up a machine inside a specific AWS zone make a
  difference? a 20% increase here would be worth it

## process

- acquire a FreeBSD 11.x amd64 + zfs system
- at least 4GB RAM & 200GB zpool, minimum 2 CPU cores

# zfs create -o canmount=off zroot/var/db
# zfs create zroot/var/db/couchdb
# chown -R couchdb:couchdb /var/db/couchdb
# echo 'tmpfs   /tmp    tmpfs   rw,mode=01700,size=12g  0       0' >>
# mount /tmp
# pkg install -y couchdb tmux rsync htop pstree tarsnap

## recover data

- retrieve the master key to /tmp/tarsnap.key and chmod it

cd /var/db/couchdb
time tarsnap -xv --retry-forever -S --strip-components 6 \
  --print-stats --humanize-numbers --keyfile /tmp/tarsnap.key \
  --chroot -f couchdb-20171008-2100

## network

# sockstat -s |grep 51772
root     tarsnap    51772 3  dgram  -> /var/run/log
root     tarsnap    51772 4  tcp4   ESTABLISHED

### net-mgmt/iftop report

There is a marked difference between iftop's bandwidth report and what
is ending up on disk. If iftop's throughput matched on-disk restored
data, we would be ok:

- 7Mb/sec ~ 25Gb/hour = 5 hours for the 120Gb
- but we are seeing on-disk speed of < 3 Gb/hour.

             =>                 219Kb   184Kb   220Kb
             <=                 5.65Mb  4.67Mb  5.63Mb
averaged rates over 40s:
transmit        731Kb   844Kb   905Kb
receive         6.84Mb  6.69Mb  6.44Mb
total           7.56Mb  7.52Mb  7.33Mb
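
The estimate above depends on whether "Mb" is read as megabytes or megabits. As far as I know, iftop shows bits per second unless it is started with -B, so it may be worth checking both readings. A minimal sketch, assuming decimal units; the helper name `hours_for` is made up for illustration:

```shell
# hours_for SIZE_GB RATE UNIT
# Estimate the hours needed to transfer SIZE_GB gigabytes at RATE, where
# UNIT is "MB" (megabytes per second) or "Mb" (megabits per second).
hours_for() {
	awk -v size="$1" -v rate="$2" -v unit="$3" 'BEGIN {
		if (unit == "Mb") rate = rate / 8   # convert bits to bytes
		printf "%.1f\n", (size * 1000) / rate / 3600
	}'
}

hours_for 120 7 MB   # prints 4.8  (7 megabytes per second)
hours_for 120 7 Mb   # prints 38.1 (7 megabits per second)
```

The megabit reading lands close to the 40+ hours extrapolated from the on-disk numbers, so part of the apparent gap between iftop and disk may be a units question rather than lost throughput.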

## tarsnap archive details

last tarsnap run (via backed-up machine):
a var/db/couchdb/.zfs/snapshot/tarsnap-active-backup/_replicator.couch
a var/db/couchdb/.zfs/snapshot/tarsnap-active-backup/_users.couch
.. another 30 files 
                                       Total size  Compressed size
All archives                                27 TB           9.4 TB
  (unique data)                            307 GB           109 GB
This archive                               307 GB           109 GB
New data                                   254 MB            92 MB

## infrastructure:

- Xeon-Dual-E5-2660 64GB + 200GB local SSD
- FreeBSD 11.0p9 amd64
- tarsnap 1.0.37 built from custom ports tree

## some observations

### speed & usability

As a user, I'd expect that tarsnap should without scripting be able to
restore as fast as my network and disk can handle it -- the backend S3
should be an implementation detail, not a customer-facing limiting
factor.
Personally, I don't really mind how long backups take, but restores
should go as fast as possible without resorting to ad hoc scripting.
When I need to restore I'm not going to mind additional cost so I can
get the business up and running again.

A --parallel-restore <n> flag that spawns n incantations and does the
optimising work of sorting files by size for the incantations for
restore, thus churning through files a bit faster, would ease my
specific case, but I'm still limited by the largest file time, which is
too slow. Yes, I can script this, but it increases the chance of errors
at a time when there is already enough panic.

Using Google's Nearline storage
could solve this for everybody.

### duration

Not knowing the estimated time to completion is really frustrating. In a
recovery situation this would be the #1 thing I'd want to have even a
rough estimate of. We're calculating the data manually via zfs stats and
network stats rather than a more accurate internal accounting. It would
be awesome to be able to hit ^T and get useful stats about recovery
rather than wait for the restore to finish:

- bandwidth (data transiting the network)
- throughput (data coming out of the tar end onto disk)
- estimated time to completion (sweaty palm duration)
- in human-readable figures (Gb/hour or similar)
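
Until something like this exists in tarsnap itself, the manual zfs calculation can be scripted. A rough sketch only: the helper name `eta_hours` is made up, the dataset name is taken from the setup steps in this email, and the total size has to be supplied by hand (for example the uncompressed "Total size" from --print-stats).

```shell
# eta_hours TOTAL_BYTES DONE_BEFORE DONE_AFTER INTERVAL_SECONDS
# Prints on-disk throughput in GB/hour and the estimated hours remaining,
# from two samples of restored bytes taken INTERVAL_SECONDS apart.
eta_hours() {
	awk -v total="$1" -v a="$2" -v b="$3" -v secs="$4" 'BEGIN {
		rate = (b - a) / secs   # bytes per second
		if (rate <= 0) { print "stalled"; exit }
		printf "%.2f GB/hour, %.1f hours left\n", rate * 3600 / 1e9, (total - b) / rate / 3600
	}'
}

# Example on the restore box (commented out: needs zfs and a running restore).
# before=$(zfs get -Hp -o value logicalused zroot/var/db/couchdb)
# sleep 60
# after=$(zfs get -Hp -o value logicalused zroot/var/db/couchdb)
# eta_hours 307000000000 "$before" "$after" 60
```

This measures what lands on disk, so it reflects throughput rather than network bandwidth; a short sampling interval will give a noisy estimate.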

BTW I was tempted to report all the figures in nanoseconds and
"picobytes" ;-)

Pro Ops Yak Herder

# recover all files in parallel from the most recent archive
# MIT license
# https://git.io/vdrbG
# "works on my machine"
# lots of assumptions, notably path length (--strip-components)
# and file names without whitespace

# get the latest archive, as our names can be sorted by time
ARCHIVE=$(tarsnap --keyfile /tmp/tarsnap.key --list-archives | sort | tail -1)

# order the files in the archive by descending size
# (fields 5 and 9 of the -tv listing are size and name; cut -w is FreeBSD cut)
FILES=$(tarsnap --keyfile /tmp/tarsnap.key -tvf "${ARCHIVE}" \
	| cut -w -f 5,9 | sort -rn | cut -w -f 2)

# spawn 10 invocations in parallel (use -P 0 for unlimited)
echo $FILES | xargs -P 10 -n 1 -t \
	time tarsnap \
	--retry-forever \
	-S \
	--strip-components 6 \
	--print-stats \
	--humanize-numbers \
	--keyfile /tmp/tarsnap.key \
	--chroot \
	-xv \
	-f ${ARCHIVE}

# profit