Rsync:ProblemSolution: RsyncOverview of How Hashing UsedIdeal CaseIdeal ProtocolSender Analyzes Own BlocksCommands: Copy or AddAdvantage of IdealDisadvantage of IdealCompute More HashesOrdinary Sum of BytesDisadvantage of a Simple SumWeighted SumReordering the i + 1 SumFurther EnhancementsSynchronizing DirectoriesSummary of Hashing UsedRsync: Efficiently Synchronizing Files Using HashingBy David ShaoFor CS 265, Spring 2004ProblemWant to synchronize with newer version of a file on a remote serverWant to minimize data sent over slow network linkWant to minimize (round-trip) communication latenciesSolution: RsyncOpen source software projecthttp://samba.anu.edu.au/rsync/Command line driven server and client for Unix-like systemsSynchronizes directories as well as filesAndrew Tridgell’s Ph.D. thesisOverview of How Hashing UsedCan reduce amount of data sent if willing to live with a very small probability of inaccuracySeveral layers of hashing—fast but less accurate and slower but almost always accurate both usedIdeal CaseDivide files into equal-sized blocksFiles are almost identical except for relatively few blocksHave almost all of the data blocks one needs—but how to know it.ReceiverSenderIdeal ProtocolReceiverSenderHashes of blocksCommands on how to build fileSender Analyzes Own BlocksHash Receiver Block 1Hash Receiver Block 2Hash Receiver Block 3Hash Receiver Block 4Hash Sender Block?Commands: Copy or AddCOPY: If the receiver already has the data block, just tell him to copy it. ADD: If the receiver does not have a data block, send it to him. COPY cheap, ADD expensiveAdvantage of IdealIf COPY, reduction in network traffic by factor approximately L / h, where L is the block size and h is the size of a hash of a block of size LDisadvantage of IdealExample: Edit source code, delete a comment at the beginningBlocks no longer neatly alignedCompute More HashesSender needs to compute hash at every byte positionMore expensive: L times more hashes computed by senderUse weaker, faster hash to weed outOrdinary Sum of BytesRolling-type property: sum of L bytes starting at position i+1 almost the same as sum starting at i.Subtract red, add green, yellow sameSum starting at iSum starting at i+1Disadvantage of a Simple SumA simple sum is too symmetricSum of “All men are mortals” is the same as “All mortals are men”Weighted SumFirst bytes have more weight than the tail ones—arbitrary decision0 1 2 3 4 5 6 0 1 2 3 4 5 6Reordering the i + 1 SumRed part to be subtracted and the green part to be added. Yellow is same.0 1 2 3 4 5 6 0 1 2 3 4 5 6Further EnhancementsCompute separate (MD4) signature for entire fileReconstruct new file using temporary storage so that the old version is never removed until a new one is known to be goodSynchronizing DirectoriesDivide into separate receiver/generatorReceiverGeneratorSenderSummary of Hashing UsedWeaker easier to compute hash with the rolling propertyStronger hash (MD4) once most candidates have been weeded outSignature over entire file as a separate
View Full Document