Virtual Synchrony
Krzysztof [email protected]

Outline
  A motivating example
  Requirements for a distributed system (or a replicated service)
  Requirements for a distributed system (or a replicated service)
  System as a set of groups
  A process group
  A model of a dynamic process group
  The lifecycle of a member (a replica)
  The Idea (Roughly)
  Why another approach, though?
  A special class of applications
  Distributed trading system
  What’s special about these systems?
  Differences (in a Nutshell)
  Back to virtual synchrony
  A process group: joining / leaving
  A process group: handling failures
  Causal delivery and vector clocks
  What’s great about fbcast / cbcast?
  Asynchronous pipelining
  What’s to be careful with?
  Why use causality?
  A migrating thread and FIFO analogy
  Causal vs. total ordering
  Total ordering: atomic, synchronous
  Why total ordering?
  Implementing totally ordered multicast
  Atomicity of failures
  Why atomicity of failures?
  Atomicity: message flushing
  A multi-phase failure-atomic protocol
  Simple tools
  Simple replication: state machine
  Updates with token-style locking
  Multiple locks on unrelated data
  Replicated services
  Other types of tools
  Complaints (Cheriton / Skeen)
  At what level to apply causality?
  Overheads
  Conclusions

A motivating example
  [Figure labels: GENERALS, SENSORS, WEAPON; notifications, orders; DANGER, detection; decision made; well-coordinated response?]

Requirements for a distributed system (or a replicated service)
  - Consistent views across components
      - E.g. vice-generals see the same events as the chief general
  - Agreement on what messages have been received or delivered
      - E.g. each general has the same view of the world (consistent state)
  - Replicas of the distributed service do not diverge
      - E.g. everyone should have the same view of membership
      - If a component is unavailable, all others decide up or down together

Requirements for a distributed system (or a replicated service)
  - Consistent actions
      - E.g. generals don’t contradict each other (don’t issue conflicting orders)
  - A single service may need to respond to a single request
  - Responses to independent requests may need to be consistent
  - But: consistent ≠ same (same actions ⇒ determinism ⇒ no fault tolerance)

System as a set of groups
  [Figure labels: client-server groups, peer group, diffusion group; multicasting a decision; event]

A process group
  - A notion of process group
      - Members know each other, cooperate
      - Suspected failures ⇒ group restructures itself, consistently
      - A failed node is excluded, eventually learns of its failure
      - A recovered node rejoins
  - A group maintains a consistent common state
      - Consistency refers only to members
      - Membership is a problem on its own... but it CAN be solved

A model of a dynamic process group
  [Figure labels: processes A to F; CRASH, JOIN, RECOVER; membership views; consistent state; state transfers]

The lifecycle of a member (a replica)
  [Figure: state diagram. States: "alive, but not in group"; "in group, processing requests"; "dead or suspected to be dead". Transitions: join (transfer state here); unjoin; fail or just unreachable; come up (assumed tabula rasa: all information = lost)]

The Idea (Roughly)
  - Take some membership protocol, or an external service
  - Guarantee consistency in an inductive manner
      - Start in an identical replicated state
      - Apply any changes
          - Atomically, that is, either everywhere or nowhere
          - In the exact same order at all replicas
  - Consistency of all actions / responses comes as a result
  - Same events seen:
      - Rely on ordering + atomicity of failures and message delivery

The Idea (Roughly)
  We achieve it by the following primitives:
  - Lower-level
      - Create / join / leave a group
      - Multicasting: FBCAST, CBCAST / ABCAST (the "CATOCS")
  - Higher-level
      - Download current state from the existing active replicas
      - Request / release locks (read-only / read-write)
      - Update
      - Read (locally)

Why another approach, though?
  - We have the whole range of other tools
      - Transactions: ACID; one-copy serializability with durability
      - Paxos, Chandra-Toueg (FLP-style consensus schemes)
      - All kinds of locking schemes, e.g. two-phase locking (2PL)
  - Virtual Synchrony is a point in the space of solutions
  - Why are other tools not perfect?
      - Some are very slow: lots of messages, roundtrip latencies
      - Some limit asynchrony (e.g. transactions at commit time)
      - Have us pay a very high cost for features we may not need

A special class of applications
  - Command / Control
      - Joint Battlespace Infosphere, telecommunications
  - Distribution / processing / filtering data streams
      - Trading system, air traffic control system, stock exchange, real-time data for banking, risk-management
  - Real-Time Systems
      - Shop floor process control, medical decision support, power grid
  - What do they have in common:
      - A distributed, but coordinated processing and control
      - Highly-efficient, pipelined distributed data processing

Distributed trading system
  [Figure labels: Pricing DB’s, Historical Data, Analytics, Current Pricing, Market Data Feeds, Long-Haul WAN Spooler, Tokyo, London, Zurich, ..., Trader Clients]
  1. Availability for historical data
  2. Load balancing and consistent message delivery for price distribution
  3. Parallel execution for analytics

What’s special about these systems?
  - Need high performance: we must weaken consistency
  - Data is of a different nature: more dynamic
      - More relevant online, in context
      - Storing it persistently often doesn’t make that much sense
  - Communication-oriented
  - Online progress: nobody cares about faulty nodes
      - Faulty nodes can be rebooted
      - Rebooted nodes are just spare replicas in the common pool

Differences (in a Nutshell)
  Databases                              | Command / Control
  relatively independent programs        | closely cooperating programs, organized into process groups
  consistent data; (external) strong     | weakened consistency; instead focus on making online progress
    consistency                          |
  persistency, durable operations        | mostly replicated state and control info
  one-copy serializability               | serializability w/o durability (nobody cares)
  atomicity of groups of operations      | atomicity of messages + causality
  heavy-weight mechanisms; slow          | lightweight, stress on responsiveness
  relationships in data                  | relationships between actions and in the sequences of messages
  multi-phase protocols, ACKs etc.       | preferably one-way, pipelined processing

Back to virtual synchrony
  Our plan:
  - Group membership
  - Ordered multicast within group membership views
  - Delivery of new views synchronized with multicast
  - Higher-level primitives

A process group: joining / leaving
  [Figure labels: processes A, B, C, D; V1 = {A,B,C}; request to join; group membership protocol; sending a new view; V2 = {A,B,C,D}; request to leave; V3 =
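
The join / leave diagram can be read as a sequence of numbered membership views, each replacing the previous one. The following is a minimal sketch of that bookkeeping only, assuming a single process stands in for the group membership protocol; the names View and GroupMembership are hypothetical and do not come from the slides. A real protocol would also have to get all surviving members to agree on each view and to synchronize view delivery with multicast, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """One membership view: a sequence number plus the agreed member set."""
    number: int
    members: frozenset


class GroupMembership:
    """Hypothetical single-process stand-in for a group membership protocol."""

    def __init__(self, initial_members):
        # The first view is V1, as in the diagram.
        self.views = [View(1, frozenset(initial_members))]

    @property
    def current(self):
        return self.views[-1]

    def _install(self, members):
        # Every membership change produces a new view with the next number.
        view = View(self.current.number + 1, frozenset(members))
        self.views.append(view)
        return view

    def join(self, member):
        return self._install(self.current.members | {member})

    def leave(self, member):
        return self._install(self.current.members - {member})


group = GroupMembership({"A", "B", "C"})   # V1 = {A,B,C}
v2 = group.join("D")                        # D's request to join is granted
print(f"V{v2.number} = {sorted(v2.members)}")  # prints: V2 = ['A', 'B', 'C', 'D']
```

Views are immutable and kept in a list so that each member set can be tied to a view number; that is the hook the later "ordered multicast within group membership views" step relies on.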