
de-duplication detail



Is there any documentation on how de-duplication works?

From what I understand, each archive is split into chunks, which are hashed then encrypted (or encrypted then hashed, perhaps) and uploaded to the server, and two chunks with the same hash are considered duplicates, saving an upload.

I'm wondering how the boundaries between chunks are established.  If my first upload is chunked as (BC)(DEFG)(HIJ) and my second (after inserting an A at the start) as (AB)(CDEF)(GHIJ), then none of the chunks will be the same, and the whole file will need to be uploaded again even though the change to the file was tiny.
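To make the worry concrete, here is a minimal sketch of the naive scheme I have in mind. It assumes fixed-size chunks cut at fixed offsets and SHA-256 as the chunk hash; both are my assumptions for illustration, not a description of what the software actually does, and the function names are made up.

```python
import hashlib


def fixed_chunks(data: bytes, size: int) -> list[bytes]:
    # Hypothetical naive chunker: cut at fixed offsets, ignoring content.
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_hashes(data: bytes, size: int) -> set[str]:
    # Hash each chunk; equal hashes are treated as duplicate chunks.
    return {hashlib.sha256(c).hexdigest() for c in fixed_chunks(data, size)}


original = b"BCDEFGHIJ"
modified = b"A" + original  # one byte inserted at the start

old = chunk_hashes(original, 3)
new = chunk_hashes(modified, 3)

# Chunks whose hashes match would not need to be uploaded again.
shared = old & new
print(f"{len(shared)} of {len(new)} chunks already stored")
```

With boundaries at fixed offsets, the single inserted byte shifts every later boundary, so no chunk of the second upload matches any chunk of the first.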

... so I guess that's not how it works, and I'm left wondering what kind of cleverness is in play here.