U of I CS 525 - A Gossip-Style Failure Detection Service - D1676029

Home> Schools> University of Illinois> Computer Science (CS) > CS 525> A Gossip-Style Failure Detection Service

DOC PREVIEW

U of I CS 525 - A Gossip-Style Failure Detection Service

School name University of Illinois

Course Cs 525- Advanced Graphics Processor Programming

Pages 72

This preview shows page 1-2-3-4-5-34-35-36-37-68-69-70-71-72 out of 72 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 72 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 72 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 72 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 72 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 72 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 72 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 72 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 72 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 72 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 72 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 72 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 72 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 72 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 72 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 72 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

MembershipA Gossip-Style Failure Detection ServiceOutlineMotivationSlide 5System ModelCharacterization & ApproachRequirementsSlide 9Mechanism of the AlgorithmSlide 11Slide 12Slide 13Slide 14Parameters to tune the AlgorithmSlide 16AnalysisSlide 18Slide 19Slide 20Slide 21Slide 22Multi-level GossipingDiscussionConclusionOn Scalable and Efficient Distributed Failure DetectorsSlide 27Motivation & Previous WorkSlide 29Slide 30Slide 31Slide 32Worst-case Network LoadOptimal Worst-case Network LoadSlide 35Slide 36Slide 37Randomized Distributed Failure Detector ProtocolSlide 39Slide 40Slide 41Analysis and ParametersExperimental ResultSlide 44Slide 45SWIM: Scalable Weakly-consistent Infection-style Process Group Memberhip ProtocolQuestions:Group Membership ServiceLarge Scale Process Group requires ScalabilitySWIM Actions:Scalable Failure DetectorsMinimal Network Load L*HeartbeatingSWIM: Failure DetectorSWIM vs HeartbeatingSWIM Failure DetectorDisseminationDissemination OptionsInfection-style DisseminationSlide 60Suspicion MechanismSlide 62Slide 63Time-bounded CompletenessExperiments Set-UpSlide 66Slide 67Slide 68Slide 69Slide 70Answers:Critiques/QuestionsMembershipDongyun JinThuy NguyenCS525 PresentationA Gossip-Style Failure Detection ServiceR. v. Renesse Y. MinskyM. Hayden3OutlineMotivationSystem ModelMechanism of the AlgorithmParameters to tune the AlgorithmAnalysisMulti-level GossipingDiscussionConclusion4MotivationWhy do we need Failure Detection?System ManagementReplicationLoad BalancingWhen we detect a Failure in the system,Not using itMoving responsibilityMaking another copy5Motivation26810161722File “reviews.doc”6System ModelFail-stop model (not Byzantine Fault)Minimal assumptions about the networkSome message lostMost messages are delivered within predetermined, reasonable time.Goal : detecting failures at each hostCompletenessAccuracySpeed7Characterization & Approach{Strong/Weak} COMPLETENESSFailure of any member is detected by {all/some} non-faulty membersSTRONG ACCURACYno mistakeCannot achieve bothGuarantees completeness always and accuracy with high probability.8RequirementsCOMPLETENESSSPEEDEvery failure is detected by some non-faulty member within T time unitsACCURACYThe probability that non-faulty member(not yet detected as failed) is detected as faulty by any other non-faulty member should be less than Pmistake9OutlineMotivationSystem ModelMechanism of the AlgorithmParameters to tune the AlgorithmAnalysisMulti-level GossipingDiscussionConclusion10Mechanism of the Algorithm1243AddressHeartbeat CounterTimeGossiping this list to othersAnd when a node receives it, merge this list with its list1 10120 662 10103 623 10098 634 10111 6511Mechanism of the Algorithm11 10120 662 10103 623 10098 634 10111 65243Gossiping this list to othersAnd when a node receives it, merge this list with its list1 10118 642 10110 643 10090 584 10111 651 10120 702 10110 643 10098 704 10111 65Current time : 70 at node 212Mechanism of the AlgorithmIf the heartbeat has not increased for more than Tfail seconds, the member is considered failedAnd after Tcleanup seconds, it will delete the member from the list13Mechanism of the AlgorithmThe reason they don’t delete a host right after Tfail seconds?11 10120 662 10103 623 10098 554 10111 652431 10120 662 10110 643 10098 504 10111 651 10120 662 10110 644 10111 651 10120 662 10110 643 10098 754 10111 65Current time : 75 at node 214OutlineMotivationSystem ModelMechanism of the AlgorithmParameters to tune the AlgorithmAnalysisMulti-level GossipingDiscussionConclusion15Parameters to tune the AlgorithmGiven Pmistake,Tfail TcleanupRate of gossiping, TgossipChoose Tfail so that erroneous failure detection -> PmistakeChoose Tcleanup to be double of TfailChoose Rate of gossiping depending on network bandwidth16Parameters to tune the Algorithmfailuretimet+Tfailt+Tcleanup=t+2*Tfailt17AnalysisSimplified AnalysisIn a round, only one member can gossip to another member with the success probability ParrivalBy using dynamic method, we get the required number of rounds to achieve a certain quality of detection.As # round increases, Pmistake(r) decreaseDetection Time : Multiply the # round when Pmistake(r) ≤ Pmistake , by Tgossip18AnalysisAs # members increases, the detection time increase19AnalysisAs requirement is loosened, the detection time decrease.20AnalysisAs # failed members increases, the detection time increase significantly21AnalysisThe algorithm is resilient to message loss22OutlineMotivationSystem ModelMechanism of the AlgorithmParameters to tune the AlgorithmAnalysisMulti-level GossipingDiscussionConclusion23Multi-level GossipingIn subnet i, with probability 1/ni24DiscussionWe might be able to use other gossip method to improve the performanceA hybrid of pull and push?Sending only new contents?It is consuming a lot of network resource.25ConclusionA failure detection based on a gossip protocol.Accurate with known probabilityResilient against message lossSimple analysis of this algorithmMulti-level GossipingOn Scalable and Efficient Distributed Failure DetectorsI. GuptaT. D. ChandraG. S. Goldszmidt27OutlineMotivation & Previous WorkProblem StatementWorst-case Network LoadRandomized Distributed Failure Detector ProtocolAnalysis and Experimental ResultDiscussionConclusion28Motivation & Previous WorkMost distributed applications rely on failure detector algorithms to avoid impossibility resultThe Heartbeating algorithms are not as efficient and scalable as claimedPrevious analysis model didn’t consider scalability.29Characterization & Approach{Strong/Weak} COMPLETENESSFailure of any member is detected by {all/some} non-faulty membersSTRONG ACCURACYno mistakeCannot achieve bothGuarantees completeness always and accuracy with high probability.30RequirementsCOMPLETENESSSPEEDEvery failure is detected by some non-faulty member within T time unitsACCURACYThe probability that non-faulty member(not yet detected as failed) is detected as faulty by any other non-faulty member should be less than PM(T)31System ModelA large group of n membersEach member knows each otherCrash (Non-Byzantine) FailureMessage loss rate pmlMember Failure

View Full Document