DOC PREVIEW
U of I CS 525 - A Gossip-Style Failure Detection Service

This preview shows page 1-2-3-4-5-34-35-36-37-68-69-70-71-72 out of 72 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 72 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 72 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 72 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 72 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 72 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 72 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 72 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 72 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 72 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 72 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 72 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 72 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 72 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 72 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 72 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

MembershipA Gossip-Style Failure Detection ServiceOutlineMotivationSlide 5System ModelCharacterization & ApproachRequirementsSlide 9Mechanism of the AlgorithmSlide 11Slide 12Slide 13Slide 14Parameters to tune the AlgorithmSlide 16AnalysisSlide 18Slide 19Slide 20Slide 21Slide 22Multi-level GossipingDiscussionConclusionOn Scalable and Efficient Distributed Failure DetectorsSlide 27Motivation & Previous WorkSlide 29Slide 30Slide 31Slide 32Worst-case Network LoadOptimal Worst-case Network LoadSlide 35Slide 36Slide 37Randomized Distributed Failure Detector ProtocolSlide 39Slide 40Slide 41Analysis and ParametersExperimental ResultSlide 44Slide 45SWIM: Scalable Weakly-consistent Infection-style Process Group Memberhip ProtocolQuestions:Group Membership ServiceLarge Scale Process Group requires ScalabilitySWIM Actions:Scalable Failure DetectorsMinimal Network Load L*HeartbeatingSWIM: Failure DetectorSWIM vs HeartbeatingSWIM Failure DetectorDisseminationDissemination OptionsInfection-style DisseminationSlide 60Suspicion MechanismSlide 62Slide 63Time-bounded CompletenessExperiments Set-UpSlide 66Slide 67Slide 68Slide 69Slide 70Answers:Critiques/QuestionsMembershipDongyun JinThuy NguyenCS525 PresentationA Gossip-Style Failure Detection ServiceR. v. Renesse Y. MinskyM. Hayden3OutlineMotivationSystem ModelMechanism of the AlgorithmParameters to tune the AlgorithmAnalysisMulti-level GossipingDiscussionConclusion4MotivationWhy do we need Failure Detection?System ManagementReplicationLoad BalancingWhen we detect a Failure in the system,Not using itMoving responsibilityMaking another copy5Motivation26810161722File “reviews.doc”6System ModelFail-stop model (not Byzantine Fault)Minimal assumptions about the networkSome message lostMost messages are delivered within predetermined, reasonable time.Goal : detecting failures at each hostCompletenessAccuracySpeed7Characterization & Approach{Strong/Weak} COMPLETENESSFailure of any member is detected by {all/some} non-faulty membersSTRONG ACCURACYno mistakeCannot achieve bothGuarantees completeness always and accuracy with high probability.8RequirementsCOMPLETENESSSPEEDEvery failure is detected by some non-faulty member within T time unitsACCURACYThe probability that non-faulty member(not yet detected as failed) is detected as faulty by any other non-faulty member should be less than Pmistake9OutlineMotivationSystem ModelMechanism of the AlgorithmParameters to tune the AlgorithmAnalysisMulti-level GossipingDiscussionConclusion10Mechanism of the Algorithm1243AddressHeartbeat CounterTimeGossiping this list to othersAnd when a node receives it, merge this list with its list1 10120 662 10103 623 10098 634 10111 6511Mechanism of the Algorithm11 10120 662 10103 623 10098 634 10111 65243Gossiping this list to othersAnd when a node receives it, merge this list with its list1 10118 642 10110 643 10090 584 10111 651 10120 702 10110 643 10098 704 10111 65Current time : 70 at node 212Mechanism of the AlgorithmIf the heartbeat has not increased for more than Tfail seconds, the member is considered failedAnd after Tcleanup seconds, it will delete the member from the list13Mechanism of the AlgorithmThe reason they don’t delete a host right after Tfail seconds?11 10120 662 10103 623 10098 554 10111 652431 10120 662 10110 643 10098 504 10111 651 10120 662 10110 644 10111 651 10120 662 10110 643 10098 754 10111 65Current time : 75 at node 214OutlineMotivationSystem ModelMechanism of the AlgorithmParameters to tune the AlgorithmAnalysisMulti-level GossipingDiscussionConclusion15Parameters to tune the AlgorithmGiven Pmistake,Tfail TcleanupRate of gossiping, TgossipChoose Tfail so that erroneous failure detection -> PmistakeChoose Tcleanup to be double of TfailChoose Rate of gossiping depending on network bandwidth16Parameters to tune the Algorithmfailuretimet+Tfailt+Tcleanup=t+2*Tfailt17AnalysisSimplified AnalysisIn a round, only one member can gossip to another member with the success probability ParrivalBy using dynamic method, we get the required number of rounds to achieve a certain quality of detection.As # round increases, Pmistake(r) decreaseDetection Time : Multiply the # round when Pmistake(r) ≤ Pmistake , by Tgossip18AnalysisAs # members increases, the detection time increase19AnalysisAs requirement is loosened, the detection time decrease.20AnalysisAs # failed members increases, the detection time increase significantly21AnalysisThe algorithm is resilient to message loss22OutlineMotivationSystem ModelMechanism of the AlgorithmParameters to tune the AlgorithmAnalysisMulti-level GossipingDiscussionConclusion23Multi-level GossipingIn subnet i, with probability 1/ni24DiscussionWe might be able to use other gossip method to improve the performanceA hybrid of pull and push?Sending only new contents?It is consuming a lot of network resource.25ConclusionA failure detection based on a gossip protocol.Accurate with known probabilityResilient against message lossSimple analysis of this algorithmMulti-level GossipingOn Scalable and Efficient Distributed Failure DetectorsI. GuptaT. D. ChandraG. S. Goldszmidt27OutlineMotivation & Previous WorkProblem StatementWorst-case Network LoadRandomized Distributed Failure Detector ProtocolAnalysis and Experimental ResultDiscussionConclusion28Motivation & Previous WorkMost distributed applications rely on failure detector algorithms to avoid impossibility resultThe Heartbeating algorithms are not as efficient and scalable as claimedPrevious analysis model didn’t consider scalability.29Characterization & Approach{Strong/Weak} COMPLETENESSFailure of any member is detected by {all/some} non-faulty membersSTRONG ACCURACYno mistakeCannot achieve bothGuarantees completeness always and accuracy with high probability.30RequirementsCOMPLETENESSSPEEDEvery failure is detected by some non-faulty member within T time unitsACCURACYThe probability that non-faulty member(not yet detected as failed) is detected as faulty by any other non-faulty member should be less than PM(T)31System ModelA large group of n membersEach member knows each otherCrash (Non-Byzantine) FailureMessage loss rate pmlMember Failure


View Full Document

U of I CS 525 - A Gossip-Style Failure Detection Service

Documents in this Course
Epidemics

Epidemics

12 pages

LECTURE

LECTURE

7 pages

LECTURE

LECTURE

39 pages

LECTURE

LECTURE

41 pages

P2P Apps

P2P Apps

49 pages

Lecture

Lecture

48 pages

Epidemics

Epidemics

69 pages

GRIFFIN

GRIFFIN

25 pages

Load more
Download A Gossip-Style Failure Detection Service
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view A Gossip-Style Failure Detection Service and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view A Gossip-Style Failure Detection Service 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?