1 INTRODUCTION
2 SCALABILITY AND RELIABILITY
2.1 Throughput instability
2.2 Micropartitions
2.3 Convoys
2.4 Request and retransmission storms
3 BIMODAL MULTICAST
4 FINDINGS SO FAR AND PROVIDING STRONGER GUARANTEES
4.1 General insight
4.2 Our solution: a non-traditional stack architecture
5 SCALING VIRTUAL SYNCHRONY
5.1 Scaling virtual synchrony: a back-of-the-envelope analysis
5.1.1 Virtual synchrony, unicast implementation.
5.1.2 Virtual synchrony, implemented over Pbcast.
5.2 Algorithm description
6 EXPERIMENTAL EVALUATION
6.1 Scalability
6.2 Insensitivity to perturbations
6.3 Laziness of the new non-traditional stack
6.4 Message load on members
6.5 Effect of larger leader committees
6.6 Summary of experimental results
7 SUMMARY AND CONCLUSIONS

QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL
Qual. Reliab. Engng. Int. 2002; 18: 165–184 (DOI: 10.1002/qre.473)

FIGHTING FIRE WITH FIRE: USING RANDOMIZED GOSSIP TO COMBAT STOCHASTIC SCALABILITY LIMITS

INDRANIL GUPTA, KENNETH P. BIRMAN* AND ROBBERT VAN RENESSE
Cornell University, Ithaca, NY 14853, USA

*Correspondence to: K. P. Birman, Department of Computer Science, Cornell University, 4130 Upson Hall, Ithaca, NY 14853, USA. Email: [email protected]
Contract/grant sponsor: DARPA/AFRL-IFGA; contract/grant number: F30602-99-1-0532
Contract/grant sponsor: NSF-CISE; contract/grant number: 9703470
Contract/grant sponsor: AFRL-IFGA Information Assurance Institute
Contract/grant sponsor: Microsoft Research
Contract/grant sponsor: Intel Corporation

SUMMARY
The mechanisms used to improve the reliability of distributed systems often limit performance and scalability. Focusing on one widely-used definition of reliability, we explore the origins of this phenomenon and conclude that it reflects a tradeoff arising deep within the typical protocol stack. Specifically, we suggest that protocol designs often disregard the high cost of infrequent events. When a distributed system is scaled, both the frequency and the overall cost of such events often grow with the size of the system. This triggers an O(n^2) phenomenon, which becomes visible above some threshold sizes. Our findings suggest that it would be more effective to construct large-scale reliable systems where, unlike traditional protocol stacks, lower layers use randomized mechanisms, with probabilistic guarantees, to overcome low-probability events. Reliability and other end-to-end properties are introduced closer to the application. We employ a back-of-the-envelope analysis to quantify this phenomenon for a class of strongly reliable multicast problems. We construct a non-traditional stack, as described above, that implements virtually synchronous multicast. Experimental results reveal that virtual synchrony over a non-traditional, probabilistic stack helps break through the scalability barrier faced by traditional implementations of the protocol. Copyright 2002 John Wiley & Sons, Ltd.

KEY WORDS: group communication; reliable multicast; scalability; randomized algorithms; non-traditional stack; virtual synchrony

1. INTRODUCTION

The scalability of distributed protocols and systems is a major determinant of success in demanding settings. This paper focuses on the scalability of distributed protocols, in group communication systems, providing some form of guaranteed reliability. Examples include the virtual synchrony protocols for reliable group communication [1], scalable reliable multicast (SRM) [2], and the reliable multicast transport protocol (RMTP) [3]. We argue that the usual architecture for supporting reliability exposes mechanisms of this sort to serious scalability problems.

Reliability can be understood as a point within a spectrum of possible guarantees. At one extreme of this spectrum one finds very costly, very strong guarantees (for example, the Byzantine Agreement used to support Replicated State Machines). Replicated database systems represent a step in the direction of weaker, cheaper goals: one-copy serializability, typically implemented by some form of multiphase commit using a quorum read and write mechanism. Lamport's Paxos architecture and the consensus-based approach of Chandra and Toueg achieve similar properties. Virtual synchrony, the model upon which we focus here, is perhaps the weakest and cheapest approach that can still be rigorously specified, modeled and reasoned about. Beyond this point in the spectrum, one finds best-effort reliability models, lacking rigorous semantics, but more typical of the modern Internet; SRM and RMTP are of this nature. The usual assumption is that, being the weakest reliability properties, they should also be the cheapest to achieve and most scalable. One surprising finding of our work is that this usual belief is incorrect.

Traditionally, one discusses reliability by posing a problem in a setting exposed to some class of faults. Fault-tolerant protocols solving the problem can then be compared. The protocols cited above provide multicast tolerance of message loss and endpoint failures, and discussions of their behavior would typically look at throughput and latency under normal conditions, message complexity, background overheads, and at the degree to which failures disrupt these properties.

Oddly, even thorough performance analyses generally focus on the extremes: performance of the protocol under ideal conditions (when nothing goes wrong), and the performance impact when injected failures disrupt execution. This paper adopts a different perspective; it looks at reliable protocols under the influence of what might be called mundane transient problems, such as network or processor scheduling delays and brief periods of packet loss. One would expect reliable protocols to ride out such events, but we find that this is rarely the case, particularly if we look at the impact of a disruptive event as a function of scale. In fact, reliable protocols degrade dramatically under this type of mundane stress, a phenomenon attributable to low-probability events that become both more frequent and more costly as the scale of the system grows.

After presenting these arguments, we shift attention to an old class of protocols that resemble NNTP, the gossip-based algorithm used to propagate 'news' on the Internet. These turn out to be scalable under the same style of analysis that predicts poor scalability for their non-gossip counterparts.

This finding leads us to suggest that protocol designers should fight fire with fire, employing probabilistic techniques that
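To make the gossip intuition above concrete, the following is a minimal simulation sketch of push-style gossip dissemination of a single multicast. It is an illustrative assumption rather than the paper's Pbcast protocol or experimental code; the function name gossip_rounds and the fanout parameter are invented for the example.

    import random

    def gossip_rounds(n, fanout=3, seed=None):
        """Simulate push-style gossip dissemination of one multicast among
        n members. In each round, every member that already holds the
        message forwards it to `fanout` members chosen uniformly at random.
        Returns (rounds taken, fraction of members reached)."""
        rng = random.Random(seed)
        infected = {0}          # member 0 is the original sender
        rounds = 0
        while True:
            newly_reached = set()
            for _ in infected:  # each informed member gossips once per round
                for target in rng.sample(range(n), fanout):
                    if target not in infected:
                        newly_reached.add(target)
            if not newly_reached:
                break
            infected |= newly_reached
            rounds += 1
        return rounds, len(infected) / n

    if __name__ == "__main__":
        for n in (100, 1_000, 10_000):
            r, coverage = gossip_rounds(n, fanout=3, seed=1)
            print(f"n={n:6d}  rounds={r:3d}  coverage={coverage:.3f}")

Under a model of this kind, each member sends only a constant number of messages per round, dissemination typically completes in a number of rounds that grows roughly logarithmically with n, and there remains a small residual probability that some members are missed. That combination of bounded per-member load with probabilistic rather than absolute delivery is the trade the paper exploits, reintroducing stronger end-to-end properties such as virtual synchrony above the gossip layer.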

