DOC PREVIEW
UT CS 378 - Epidemics

This preview shows page 1-2-14-15-30-31 out of 31 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 31 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 31 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 31 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 31 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 31 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 31 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 31 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

EpidemicsContext“This course is designed to cover some of the key ideas that haveproved useful or are expected to be useful for designing andbuilding tomorrow's distributed systems. The course focuses onfundamentals.”Some fundamentals Atomic Commit Reliable Broadcast Consensus Isis Toolkit [Birman, van Renesse et al.]Y. Amir, D. Dolev, S. Kramer, and D. Malki. Transis: A communication sub-system for high availability. In Proc. 22nd Annual International Symposium on Fault-Tolerant Computing, pages 76--84, July 1992.Scalability Long message delays Unreliable communication Network partitionsA. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis,D. Swinehart, and D. Terry. Epidemic algorithms for replicated databasemaintenance. In Proceedings of the 6th Annual ACM Symposium on Principlesof Distributed Computing, Vancouver, BC, August 1987, pp. 1-12.Setup Database replicated at thousands of sitesacross the nation Unreliable point-to-point links Crash failure model Updates injected at a single site Updates must propagate to all other sites Want contents of all replicas to be identical ifupdates stop and system left aloneNotation S is a set of n sites (replicas) Pretend there is only one database entry v is the value of the database entry t is the timestamp associated with v Timestamps are totally ordered All sites are consistent iff s, s’  S : s.v = s’.vBroadcastIdea: If an update is injected at site s, then s mails theupdate to every other site in SUpon an update at site s: for each s’  S \ {s} dosend (Update, s.v, s.t) to s’ endloopUpon receiving (Update, v’, t’): if s.t < t’ thens.v := v’s.t := t’ endifWeakness: send is unreliable what if crashes occurAnti-entropyIdea: Every site regularly chooses another site at randomand exchanges database contents with it to resolvedifferences.Each server s periodically executes: for some s’  S \ {s} doResolveDifference(s,s’) endloopPush: ResolveDifference(s,s’) { if s.t > s’.t then s’.v := s.vs’.t = s.t endifPull: ResolveDifference(s,s’) { if s.t < s’.t then s.v := s’.vs.t := s’.t endifPush vs. pull analysisLet pi be the probability that a site still has notbeen updated by the ith try at anti-entropyFor large values of n: Push: pi+1 = pi e-1Pull: pi+1 = (pi )2Converges much faster for small piExample using pull mechanisms1s2s3s4(v, 1)Updatebroadcast(v, 1)anti-entropy(v, 1)(v, 2)broadcast(v, 2)(v, 2)(v, 2)Anti-entropy facts Guaranteed to eventually propagate updateto everyone with probability 1 Anti-entropy infects everyone in O(log n)time for uniformly chosen sites Good backup mechanism for direct mail Weakness: must go through entire databaseEpidemics Complex epidemics Sites can become “cured” Terminology: susceptible, infective, removed Strengths: sites do not mail everyone and do nothave to enumerate entire database Weakness: some may be left susceptible Resilient against unreliable communication Anti-entropy is a simple epidemicRumor mongering (informal) All sites start out susceptible When a site s receives a new update, itbecomes infective s periodically chooses another site s’ If s’ does not know the rumor, then itreceives the update and also becomesinfective If s’ already knows the rumor, then sbecomes removed with some probabilityRumor mongering protocolFor a site s:let L be a list of (initially empty) infective updatesperiodically:for some s’  S \ {s} do for each update u  L send u to s’if s’ already knows about u then remove u from L with probability 1/k end loopend loopupon receiving new update u:insert u into LAnalysis of rumor mongeringi = fraction of infective sitess = fraction of susceptible sitesr = fraction of removed sitessidtds=isksidtdi)1(1+=kskkdsdi 11++=dskskkdi ++=11cksskksi +++=ln1)(kkc1+=()ksskksiln11)( ++=()()skes+=11Rumor mongering facts Expected fraction of susceptible sitess = e-(k+1)(1-s)Back up mongering with anti-entropy Redistribution of update Rumor mongering vs. broadcast Consider case when half of sites receive update Old rumors die fastDeath and its consequences Replace deleted item with a deathcertificate = (NIL, tnow) Provided no further updates, a deathcertificate eventually “deletes” all copies ofan item…but when? Problem: what if a single site is down?Death certificates Death certificate contains two values t – time of deletion t1 – threshold value, all servers discard deathcertificate after time t + t1Dormant death certificates Death certificate contains four values R – set of sites that keep a dormant death certificateafter t + t1 t – time of deletion t1 – all servers not in R discard death certificate aftertime t + t1 t2 – all servers discard the certificate after t + t2 Interaction with anti-entropy?Dormant death certificates Death certificate contains five values R – set of sites that keep a dormant death certificateafter ta + t1 t – time of deletion ta – time of activation t1 – all servers not in R discard certificate after ta + t1 t2 – all servers discard the certificate after ta + t2Bimodal multicastK.P. Birman, M. Hayden, O. Ozkasap, Z. Xiao, M. Budiu, and Y. Minsky. BimodalMulticast. ACM Transactions on Computer Systems. 17(2): 41-88. May 1999Class I – Strong reliability Properties: agreement, validity, termination, integrity Expensive and limited scalability Unpredictable performance undercongestion Degraded throughput under transientfailures Why? Full buffers and flow controlClass II – Best effort reliability “If a participating process discovers a failure,a reasonable effort is made to overcome it.” Better scalability than Class I protocols Difficult to reason about systems withoutconcrete guaranteesBimodal multicast claims Provides predictable reliability and steadythroughput under highly perturbed conditions Very small probability only a few processesdeliver High probability almost everyone delivers “Vanishingly small probability” in betweenSystem assumptions At least 75% of healthy processes willrespond to incoming messages within aknown bound 75% of messages will get through thenetwork Crash failure


View Full Document

UT CS 378 - Epidemics

Documents in this Course
Discourse

Discourse

13 pages

Phishing

Phishing

49 pages

Load more
Download Epidemics
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Epidemics and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Epidemics 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?