Is there any documentation on how de-duplication works?
From what I understand, each archive is split into chunks, which are hashed
then encrypted (or encrypted then hashed, perhaps) and uploaded to the
server, and two chunks with the same hash are considered duplicates, saving
an upload.
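To check I'm picturing the hash-based part correctly, here's a rough Python
sketch of what I imagine the dedup step looks like, assuming the chunk
boundaries are already decided somehow (the names upload_archive, seen_hashes
and upload are mine, not anything from the actual tool):

    import hashlib

    def upload_archive(chunks, seen_hashes, upload):
        # Hash each chunk; only send chunks the server hasn't stored yet.
        for chunk in chunks:
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in seen_hashes:
                upload(digest, chunk)       # new content -> must upload
                seen_hashes.add(digest)
            # otherwise: duplicate chunk, nothing is sent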
I'm wondering how the boundaries between chunks are established. If my
first upload is chunked as (BC)(DEFG)(HIJ) and my 2nd (after inserting an A
at the start) as (AB)(CDEF)(GHIJ) then none of the chunks will be the same,
and the whole file will need to be uploaded again even though the change to
the file was tiny.
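Here's the naive scheme I'm picturing, chunking at fixed byte offsets, which
has exactly this all-boundaries-shift problem (purely illustrative, I'm not
claiming this is what the tool does):

    import hashlib

    def fixed_chunks(data, size=4):
        # Split at fixed byte offsets: 0, size, 2*size, ...
        return [data[i:i + size] for i in range(0, len(data), size)]

    def chunk_hashes(data):
        return {hashlib.sha256(c).hexdigest() for c in fixed_chunks(data)}

    before = b"BCDEFGHIJ"
    after = b"ABCDEFGHIJ"                    # one byte inserted at the front
    print(chunk_hashes(before) & chunk_hashes(after))   # empty set: no chunk reused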
... so I guess that's not how it works, and I'm left wondering what kind of
cleverness is in play here.