Tarsnap outage 2017-02-28 17:37 -- 2017-02-28 ~21:52



Hi everyone,

I'm sure that most of you are aware of Tuesday's Amazon S3 outage and the
resulting Tarsnap outage.  I'm sending this email to provide a few additional
details.

Attempts to create an archive between 17:37:33 and approximately 21:52 UTC
will have failed.  Archive creations which started before 17:37:33 and were
still in progress when the outage began will almost certainly have failed as
well.  (The exception is if the tarsnap client spent the entire outage, over
four hours, reading data without finding anything which it identified as
"new"; this is possible but unlikely.)  Such archives may, however, have
created checkpoints before the outage began.  Those checkpoints will be
automatically recovered into "partial" archives (named "foo.part", where "foo"
is the name originally specified for the archive) the next time an archive is
created or deleted, or if you run `tarsnap --recover`.
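
For anyone scripting around this, here is a minimal Python sketch (not part of
Tarsnap) which picks recovered partial archives out of a list of archive
names.  It assumes the names arrive one per line, as `tarsnap --list-archives`
prints them; the helper name `partial_archives` and the sample names are made
up for illustration.

```python
def partial_archives(listing):
    """Return the archive names ending in ".part", given one name per line."""
    names = (line.strip() for line in listing.splitlines())
    return [name for name in names if name.endswith(".part")]


if __name__ == "__main__":
    # Hypothetical names standing in for real `tarsnap --list-archives` output.
    sample = "home-2017-02-27\nhome-2017-02-28.part\nmail-2017-02-28\n"
    print(partial_archives(sample))
```

Note that this matches on the suffix alone, so an archive which you
deliberately named with a trailing ".part" would be reported too.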

Attempts to list or extract archives between 17:37:33 and approximately 21:03
UTC will have failed.  (S3 recovered for reads approximately 49 minutes before
it recovered for writes.  This is not surprising: writes are generally more
likely to fail than reads, since a write has to store data in multiple places
while a read only needs to fetch it from one, and Amazon's process of
bringing internal errors under control appears to have been gradual.)
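
If you want to check timestamps from your own backup logs against these
windows, something like the following Python sketch may help.  It is only an
illustration: the constant and function names are hypothetical, and the two
recovery times are approximate, so results close to 21:03 or 21:52 UTC should
be treated with caution.

```python
from datetime import datetime, timezone

# Outage windows in UTC, taken from the times given in this email.
# The two recovery times are approximate.
OUTAGE_START = datetime(2017, 2, 28, 17, 37, 33, tzinfo=timezone.utc)
READS_RECOVERED = datetime(2017, 2, 28, 21, 3, tzinfo=timezone.utc)
WRITES_RECOVERED = datetime(2017, 2, 28, 21, 52, tzinfo=timezone.utc)


def affected_operations(when):
    """Return the kinds of operation expected to have failed at `when`.

    `when` must be a timezone-aware datetime.
    """
    if when.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    affected = []
    if OUTAGE_START <= when < READS_RECOVERED:
        affected.append("read")   # listing or extracting archives
    if OUTAGE_START <= when < WRITES_RECOVERED:
        affected.append("write")  # creating archives
    return affected


if __name__ == "__main__":
    # A job which ran at 21:30 UTC falls after reads recovered but
    # before writes did.
    print(affected_operations(datetime(2017, 2, 28, 21, 30, tzinfo=timezone.utc)))
    # Gap between the two recovery times.
    print(WRITES_RECOVERED - READS_RECOVERED)
```

Timestamps in other time zones work as long as they carry their UTC offset,
since Python compares timezone-aware datetimes as absolute instants.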

No data was lost; any archives successfully created (even those created at
17:37:32) remain intact, despite having been inaccessible for the duration of
the outage.

As always, I'm happy to answer any further questions.

Sincerely,
-- 
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid