Flexible Update Propagation for Weakly Consistent ReplicationOutlineAnti-EntropyGoalsData StructuresOrderingsOrderings, continuedThe Algorithm (Quick Version)The Algorithm, continuedSlide 10Creation and RetirementCreation and Retirement, continuedDiscussionDiscussion, continuedPerformancePerformance, continuedP2P?Flexible Update Propagation for Weakly Consistent ReplicationKarin Petersen, Mike K. Spreitzer, Douglas B. Terry,Marvin M. Theimer and Alan J. DemersPresented by: Ryan HuebschCS294-4 P2P Systems – 10/13/03OutlineAnti-EntropyGoalsData StructuresOrderingThe AlgorithmCreation and RetirementDiscussionPerformanceP2P discussion/questionsAnti-EntropyEntropy - a process of degradation or running down or a trend to disorder.Bring 2 replicas up-to-dateThree Major Design DecisionsPairwise communication between replicasExchange of update operationsOrdered propagation of operationsGoalsSupport for arbitrary communication topologiesOperation over low-bandwidth networksIncremental progressEventual consistencyEfficient storage managementLight-weight management of dynamic replica setsArbitrary policy choicesData StructuresReplica:DatabaseWrite LogServer:ClockV, OCSN, OSN…DatabaseLogTruncatedLogTruncated(< OSN)AHighest A.Clockfor server Athat is in logB CCommitted(< CSN)AHighest A.Clock forserver A that hasbeen truncatedB COVOrderingsPrefix PropertyIf R has write Wi that was accepted by server X, it has all writes X accepted before WiStable (Committed Order)Decided by primary replicaAssigns the final CSN, which is < infinityNew CSN is propagated to nodesAccept OrderPartial order of all writes accepted by a particular serverAccept stamp – logical or real-time clockOrderings, continuedCausal-Accept OrderAccept-stamp is a logical clockClock is advanced when a write is received (through anti-entropy) that has a higher accept-stamp.Provides better chances of a node seeing the same database from different serversIf they have the same writes, even if uncommitted, will be same orderThe Algorithm (Quick Version)R is being updated by SS retrieves R.V and R.CSNSTEP 1: Decide if a full transfer is neededIF (S.OSN > R.CSN) THEN[If S does have enough log]Rollback S’s database to the state corresponding to S.O[Remove all writes that S has a log for]OutputDatabase(S.DB)OutputVector(S.O)OutputOSN(S.OSN)[R now has the same database and truncated thewrite log to the same point as S]ENDThe Algorithm, continuedStep 2: Bring R up-to-date with remaining committed writesIF R.CSN < S.CSN THEN[If R is missing committed writes] w = first write after CSNWHILE (w) DO IF w.accept-stamp <= R.V(w.server-id) THEN [Check R’s vector to see if it has the write]OutputCommitNotification(w) ELSEOutputWrite(w) END w = next commited write in S.logENDENDThe Algorithm, continuedStep 3: Bring R up-to-date with remaining uncommitted writesw = first tentative write in S.logWHILE (w) DOIF R.V(w.server-id) < w.accept-stamp THEN[Check R’s vector to see if has the write] OutputWrite(w)ENDw = next write in S.logENDStep 4: Finish UpOutputCSN(S.CSN)OutputVector(S.V)Creation and RetirementTreated just like a write (elegant)Si is trying to join via server SxSx creates a new write<infinity, Tk,i, Sk>Si is server id, <Tk,i, Sk>Si sets clock to Tk,i + 1Notice the new server id is globally unique, recursive, and could be longThe write is propagated to other nodes through anti-entropyCreation and Retirement, continuedServer S is updating server RServer S.V has an entry for server Si (<Tk,i, Sk>), while R does not.2 Cases:R has not seen the creation of SiThen R.V(Sk) < Tk,iS has not seen the retirement of SiThen R.V(Sk) >= Tk,iWhy? Creation/Deletion is recorded as a normal write, thus the prefix property will hold.Recursive naming helps too, if Sk retired, can still trace back and decide the proper state. This is explained as the virtual CompleteV in the paper.DiscussionDiscussion, continuedMost properties are not special in themselves, the combination is novelDifferent decisions are mostly independentIdeas can be applied to other systems (other than Bayou)SecurityUse certificates to insure user can make updateNot much detail givenUsed later on as an excuse for high overheadsLots of policy decisions to be madeWhen to reconcile, with whom, when to truncate logPerformance1316 bytes of update overhead520 bytes for certificateNetwork transfer most significant costPerformance, continuedHard to know if the numbers are good, nothing to compare them toWould have been nice to see a larger deployment and measure propagation delay, consistency, etc.P2P?Is Anti-Entropy applicable to P2P systems?Review the goals… arbitrary topology, low b/w, aggressive storage management…There is a centralized component (the serializer)… is this okay?Can it handle failures/churn?Security, what happens if there is a faulty
View Full Document