1Gossip-based protocolsWhere we were Programmers face problems in buildingdistributed applications Fundamental problems Consensus Atomic Broadcast / Multicast Group membership Isis Toolkit [Birman, van Renesse et al.]Where we areScalabilityA. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis,D. Swinehart, and D. Terry. Epidemic algorithms for replicated databasemaintenance. In Proceedings of the 6th Annual ACM Symposium on Principlesof Distributed Computing, Vancouver, BC, August 1987, pp. 1-12.Setup Database replicated at thousands of sites Network is slightly unreliable Point-to-Point communication abstraction Crash failure model2Setup Database replicated at thousands of sites Network is slightly unreliable Point-to-Point communication abstraction Crash failure model Updates injected at a single site Updates must propagate to all other sites* Want contents of all replicas to be identical ifupdates stop and system left aloneNotation S is a set of n sites (replicas) K is a set of keys V is a set of values T is a set of timestamps (totally ordered) For any site s and key k,s.ValueOf : K (V T)More notation Pretend there is only one keys.ValueOf (V T) Consistency definition s, s’ S : s.ValueOf = s’.ValueOf To update the database with value v at time ts.ValueOf := (v, t)Direct mailIdea: If an update is injected at site s, then s mails theupdate to every other site in SUpon an update at site s: for each s’ S \ {s} dosend (Update, s.ValueOf) to s’ endloopUpon receiving (Update, (v,t)): if s.ValueOf.t < t thens.ValueOf := (v,t) endifWeakness: send is not reliable what if site crashes?3Anti-entropyIdea: Every site regularly chooses another site at randomand exchanges database contents with it to resolvedifferences.Each server s periodically executes: for some s’ S \ {s} doResolveDifference(s,s’) endloopPush: ResolveDifference(s,s’) { if s.ValueOf.t > s’.ValueOf.t thens’.ValueOf := s.ValueOf endifPull: ResolveDifference(s,s’) { if s.ValueOf.t < s’.ValueOf.t thens.ValueOf := s’.ValueOf endifPush vs. pull analysisLet pi be the probability that a site still has notbeen updated by the ith try at anti-entropyFor large values of n: Push: pi+1 = pi e-1Pull: pi+1 = (pi )2Converges much faster for small piExample using pull mechanisms1s2s3s4(v, 1)Updatedirect mail(v, 1)anti-entropy(v, 1)(v, 2)direct mail(v, 2)(v, 2)(v, 2)Anti-entropy facts Guaranteed to eventually propagate updateto everyone with probability 1 Anti-entropy infects everyone in O(log n) foruniformly chosen sites Backup mechanism for direct mail Weakness: must go through entire database4Epidemic terminology Complex epidemics Sites can become “cured” Terminology: susceptible, infective, removed Strengths: sites do not mail everyone and do nothave to enumerate entire database Weakness: some may be left susceptible Resilient to unreliable communication Anti-entropy is a simple epidemicRumor mongering (informal) All sites start out susceptible When a site s receives a new update, itbecomes infective s periodically chooses another site s’Rumor mongering (informal) All sites start out susceptible When a site s receives a new update, itbecomes infective s periodically chooses another site s’ If s’ does not know the rumor, then it receivesthe update and also becomes infective If s’ already knows the rumor, then sbecomes removed with some probabilityRumor mongering protocolFor a site s:let L be a list of (initially empty) infective updatesperiodically:for some s’ S \ {s} do for each update u L send u to s’if s’ already knows about u then remove u from L with probability 1/k endloopupon receiving new update u:insert u into L5Analysis of rumor mongeringi = fraction of infective sitess = fraction of susceptible sitesr = fraction of removed sitessidtds=isksidtdi)1(1+=kskkdsdi 11++=dskskkdi ++=11cksskksi +++=ln1)(kkc1+=()ksskksiln11)( ++=()()skes+=11Rumor mongering facts Expected fraction of susceptible sitess = e-(k+1)(1-s)Back up mongering with anti-entropy Mongering vs. direct mail Redistribution Consider case when half of sites receive update Old rumors die fastDeath and its consequences Replace deleted item with a deathcertificate = (NIL, tnow) Provided no further updates, a deathcertificate eventually “deletes” all copies of anitem…but when? Problem: what if a single site is down?Death certificates Death certificate contains two values t – time of deletion t1 – threshold value, all servers discard deathcertificate after time t + t16Dormant death certificates Death certificate contains four values R – set of sites that keep a dormant death certificateafter t + t1 t – time of deletion t1 – threshold value, all servers not in R discard deathcertificate after time t + t1 t2 – all servers discard the certificate after t + t2Dormant death certificates Death certificate contains five values R – set of sites that keep a dormant death certificateafter ta + t1 t – time of deletion ta – time of activation t1 – all servers not in R discard certificate after ta + t1 t2 – all servers discard the certificate after ta + t21Bimodal multicastK.P. Birman, M. Hayden, O. Ozkasap, Z. Xiao, M. Budiu, and Y. Minsky. BimodalMulticast. ACM Transactions on Computer Systems. 17(2): 41-88. May 1999War gamesgeneralClass I – Strong reliability Properties: Agreement, validity, termination, integrity Costly protocols Limited scalability Unpredictable performance undercongestion Degraded throughput under transientfailures (full buffers and flow control)Class II – Best effort reliability “If a participating process discovers a failure,a reasonable effort is made to overcome it.” Better scalability than Class I protocols Difficult to reason about systems withoutconcrete guarantees2Bimodal multicast claims Scales well Provides predictable reliability and steadythroughput under highly perturbed conditions Very small probability a few processes deliver High probability almost everyone delivers “Vanishingly small probability” in betweenA problem to our solution Applications that need high
View Full Document