DOC PREVIEW
U of I CS 525 - A Gossip-Style Failure Detection Service

This preview shows page 1-2-3-4 out of 12 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1MembershipDongyun JinThuy NguyenCS525 PresentationA Gossip-Style Failure Detection ServiceR. v. RenesseY. MinskyM. Hayden3OutlineMotivationSystem ModelMechanism of the AlgorithmParameters to tune the AlgorithmAnalysisMulti-level GossipingDiscussionConclusion4MotivationWhy do we need Failure Detection?System ManagementReplicationLoad BalancingWhen we detect a Failure in the system,Not using itMoving responsibilityMaking another copy5Motivation26810161722File “reviews.doc”6System ModelFail-stop model (not Byzantine Fault)Minimal assumptions about the networkSome message lostMost messages are delivered within predetermined, reasonable time.Goal : detecting failures at each hostCompletenessAccuracySpeed27Characterization & Approach{Strong/Weak} COMPLETENESSFailure of any member is detected by {all/some} non-faulty membersSTRONG ACCURACYno mistakeCannot achieve bothGuarantees completeness always and accuracy with high probability.8RequirementsCOMPLETENESSSPEEDEvery failure is detected by some non-faulty member within T time unitsACCURACYThe probability that non-faulty member(not yet detected as failed) is detected as faulty by any other non-faulty member should be less than Pmistake9OutlineMotivationSystem ModelMechanism of the AlgorithmParameters to tune the AlgorithmAnalysisMulti-level GossipingDiscussionConclusion10Mechanism of the Algorithm1243AddressHeartbeat CounterTimeGossiping this list to othersAnd when a node receives it, merge this list with its list6210103263100983661012016510111411Mechanism of the Algorithm162101032631009836610120165101114243Gossiping this list to othersAnd when a node receives it, merge this list with its list6410118165101114581009036410110270101201651011147010098364101102Current time : 70 at node 212Mechanism of the AlgorithmIf the heartbeat has not increased for more than Tfailseconds, the member is considered failedAnd after Tcleanupseconds, it will delete the member from the list313Mechanism of the AlgorithmThe reason they don’t delete a host right after Tfailseconds?1621010325510098366101201651011142436610120165101114501009836410110266101201651011146410110266101201651011147510098364101102Current time : 75 at node 214OutlineMotivationSystem ModelMechanism of the AlgorithmParameters to tune the AlgorithmAnalysisMulti-level GossipingDiscussionConclusion15Parameters to tune the AlgorithmGiven Pmistake,TfailTcleanupRate of gossiping, TgossipChoose Tfailso that erroneous failure detection -> PmistakeChoose Tcleanupto be double of TfailChoose Rate of gossiping depending on network bandwidth16Parameters to tune the Algorithmfailuretimet+Tfailt+Tcleanup=t+2*Tfailt17AnalysisSimplified AnalysisIn a round, only one member can gossip to another member with the success probability ParrivalBy using dynamic method, we get the required number of rounds to achieve a certain quality of detection.As # round increases, Pmistake(r) decreaseDetection Time : Multiply the # round when Pmistake(r) ≤ Pmistake, by Tgossip18AnalysisAs # members increases, the detection time increase419AnalysisAs requirement is loosened, the detection time decrease.20AnalysisAs # failed members increases, the detection time increase significantly21AnalysisThe algorithm is resilient to message loss22OutlineMotivationSystem ModelMechanism of the AlgorithmParameters to tune the AlgorithmAnalysisMulti-level GossipingDiscussionConclusion23Multi-level GossipingIn subnet i, with probability 1/ni24DiscussionWe might be able to use other gossip method to improve the performanceA hybrid of pull and push?Sending only new contents?It is consuming a lot of network resource.525ConclusionA failure detection based on a gossip protocol.Accurate with known probabilityResilient against message lossSimple analysis of this algorithmMulti-level GossipingOn Scalable and Efficient Distributed Failure DetectorsI. GuptaT. D. ChandraG. S. Goldszmidt27OutlineMotivation & Previous WorkProblem StatementWorst-case Network LoadRandomized Distributed Failure Detector ProtocolAnalysis and Experimental ResultDiscussionConclusion28Motivation & Previous WorkMost distributed applications rely on failure detector algorithms to avoid impossibility resultThe Heartbeating algorithms are not as efficient and scalable as claimedPrevious analysis model didn’t consider scalability.29Characterization & Approach{Strong/Weak} COMPLETENESSFailure of any member is detected by {all/some} non-faulty membersSTRONG ACCURACYno mistakeCannot achieve bothGuarantees completeness always and accuracy with high probability.30RequirementsCOMPLETENESSSPEEDEvery failure is detected by some non-faulty member within T time unitsACCURACYThe probability that non-faulty member(not yet detected as failed) is detected as faulty by any other non-faulty member should be less than PM(T)631System ModelA large group of n membersEach member knows each otherCrash (Non-Byzantine) FailureMessage loss rate pmlMember Failure Prebability Pfqml= (1 – pml)qf= (1 – pf)32OutlineMotivation & Previous WorkProblem StatementWorst-case Network LoadRandomized Distributed Failure Detector ProtocolAnalysis and Experimental ResultDiscussionConclusion33Worst-case Network LoadSCALEThe worst-case network load L is close to the optimal worst-case network load as possibleEqual expected load per memberDefinitionThe worst-case network load L of a failure detector protocol is the maximum number of messagestransmitted by any run of the protocol within any time interval of length T, divided by T34Any distributed failure detector algorithm imposes a minimal worst-case network load of , forGroup of size nSatisfying COMPLETENESS, SPEED, ACCURACYGiven values of T and PM(T)TpTPMnLml⋅⋅=)log())(log(*Optimal Worst-case Network Load35Optimal Worst-case Network LoadA group member at a random point in time t is not detected as failed yet and stays non-faulty until at least time t+TIt sends m messages for time TIf all messages are dropped, it need to be detected as failed (SPEED)Its probability should be less than PM(T))(TPMpmml≤)log())(log(mlpTPMm ≥36Worst-case Network


View Full Document

U of I CS 525 - A Gossip-Style Failure Detection Service

Documents in this Course
Epidemics

Epidemics

12 pages

LECTURE

LECTURE

7 pages

LECTURE

LECTURE

39 pages

LECTURE

LECTURE

41 pages

P2P Apps

P2P Apps

49 pages

Lecture

Lecture

48 pages

Epidemics

Epidemics

69 pages

GRIFFIN

GRIFFIN

25 pages

Load more
Download A Gossip-Style Failure Detection Service
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view A Gossip-Style Failure Detection Service and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view A Gossip-Style Failure Detection Service 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?