
Re: retry/append/restart restores ? Re: Speeding up slooooowwww extractions




> On 27 May 2021, at 14:53 , Dave Cottlehuber <dch@skunkwerks.at> wrote:
> 
> On Thu, 27 May 2021, at 11:04, hvjunk wrote:
>> So, my next issue is the ability to restart/append a file that is
>> halfway through being extracted when the tarsnap process gets
>> killed etc. during the restore.
>> I don’t see anything in the manual page, so I wonder where that is
>> documented, if at all?
> 
> does --retry-forever help?

the case/issue is that the VM/instance got restarted, i.e. tarsnap itself
needs to be restarted, not just recover from a connection error
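
For what it’s worth, the closest thing I can see to a restart is a manual
workaround rather than a real resume: re-run the extraction with -k so
files that already landed on disk are not overwritten. A minimal sketch,
assuming a placeholder archive name "mybackup" and restore dir /restore:

    #!/bin/sh
    # Sketch: after the instance comes back, re-run the extraction with
    # -k so already-extracted files are kept as-is.
    # CAVEAT: the one file tarsnap was writing when it died is left
    # truncated, and -k will keep that truncated copy -- remove it by
    # hand before re-running.  tarsnap may also still have to download
    # the skipped entries' data; -k only avoids rewriting files on disk.
    cd /restore || exit 1
    tarsnap -x -k --retry-forever -f mybackup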

> 
>> ( and yes, I’ve started an instance in Canada to be closer to the
>> tarsnap servers in the USA for the restores; yes, it seems to be about
>> double the speed, but still <50% done after 24 hours for a 100GB file
>> extraction ;( )
> 
> https://www.tarsnap.com/improve-speed.html
> 
> The only sensible option for performant tarsnap restores of large files is:
> 
> - splitting the archive *before* it goes to tarsnap

yeah, that is the challenge ;( (a splitting sketch follows after this list)

> - parallelised recovery

as mentioned earlier: it’s currently a single big file

> - into an AWS server running in the US, hopefully in the same network
>   region as tarsnap’s S3 storage

that suddenly adds $$$$$ in cost ;(

> - then move to the expected location

adding extra $$$ ;(
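
To make the splitting step concrete, here’s a minimal sketch of the backup
side under my own assumptions (file name bigfile.img, 1G chunks, archive
name with a date stamp; none of this is anything tarsnap prescribes):

    #!/bin/sh
    # Sketch: split the big file into 1G chunks and back up the chunks
    # plus a checksum manifest, so a truncated or corrupt chunk is
    # detectable at restore time.
    set -eu
    split -b 1G bigfile.img bigfile.img.part.
    sha256sum bigfile.img bigfile.img.part.* > MANIFEST.sha256  # sha256(1) on FreeBSD
    tarsnap -c -f "bigfile-$(date +%Y%m%d)" MANIFEST.sha256 bigfile.img.part.*

Including the sum of the original file in the manifest means the
reassembled file can be verified end-to-end, not just chunk by chunk.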

> I hacked up a script here https://git.io/vdrbG that "works on my machine"
> and makes a number of assumptions, including about path length, that may
> bite you. It won't help you restore a single large file, but it does help
> for many large-ish files.

Yes, that’s why it won’t work in my (current) case ;(
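
For archives that do hold many files, the same effect as that script can
be sketched with xargs: list the archive contents and fan out one tarsnap
extraction per file. The archive name "mybackup" and the four workers are
arbitrary choices of mine, each worker is a full tarsnap invocation, and
paths containing newlines will break it:

    # Sketch: parallel per-file restore with 4 workers; -I {} makes
    # xargs take each listed path as a single argument.
    tarsnap -t -f mybackup | xargs -P 4 -I {} tarsnap -x -f mybackup {}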

> The moment we introduce pipes and splitting in shell scripts is the
> moment when, years later, we find that the split tool truncates at some
> 64-bit size limit and data has been irrecoverably lost. tarsnap really
> should be able to handle this scenario natively and sensibly.


;(
Yes, that is exactly what I’m trying to protect myself against.
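
Until it does, the silent-truncation failure mode can at least be made
loud: verify the manifest after reassembly and refuse to clean up on any
mismatch. The restore side of the splitting sketch above, with the same
placeholder names:

    #!/bin/sh
    # Sketch: reassemble the chunks and verify against the manifest
    # before deleting anything; set -e aborts on the first mismatch.
    set -eu
    cat bigfile.img.part.* > bigfile.img   # the glob sorts lexically,
                                           # matching split's suffix order
    sha256sum -c MANIFEST.sha256           # FreeBSD: sha256 -c differs
    rm -- bigfile.img.part.*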


> In all other respects it’s
> my preferred choice for backup & recovery of Stuff That Matters.

Yes, so the current trade-off is spending the CAPEX now to write a solution
that splits the big files (*reliably*) before backup, or spending more OPEX
on a different solution ;(