Virtual Synchrony
Ken Birman

Virtual Synchrony
- Goal: simplifies distributed systems development by emulating a simplified world – a synchronous one
- Features of the virtual synchrony model:
  - Process groups with state transfer, automated fault detection and membership reporting
  - Ordered reliable multicast, in several flavors
  - Fault-tolerance and replication tools layered on top
  - Extremely good performance

Process groups
- Offered as a new and fundamental programming abstraction
- Just a set of application processes that cooperate for some purpose
- Could replicate data, coordinate handling of incoming requests or events, perform parallel tasks, or have a shared perspective on some sort of "fact" about the system
- Can create many of them
  - Within limits… many systems only had limited scalability

Why "virtual" synchrony?
- What would a synchronous execution look like?
- In what ways is a "virtual" synchrony execution not the same thing?

A synchronous execution
[Figure: timelines for processes p, q, r, s, t, u]
- With true synchrony, executions run in genuine lock-step

Virtual Synchrony at a glance
[Figure: timelines for processes p, q, r, s, t, u]
- With virtual synchrony, executions only look "lock step" to the application

Virtual Synchrony at a glance
[Figure: timelines for processes p, q, r, s, t, u]
- We use the weakest (least ordered, hence fastest) form of communication possible

Chances to "weaken" ordering
- Suppose that any conflicting updates are synchronized using some form of locking
  - The multicast sender will have mutual exclusion
- Hence, simply because we used locks, cbcast delivers conflicting updates in the order they were performed!
- If our system ever does see concurrent multicasts… they must not have conflicted
  - So it won't matter if cbcast delivers them in different orders at different recipients!

Causally ordered updates
[Figure: processes p, r, s, t exchange updates 1–5, grouped under two locks]
- Each thread corresponds to a different lock
- In effect: red "events" never conflict with green ones!
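The causal-delivery rule behind cbcast can be sketched with vector clocks, the standard mechanism for causal ordering. This is a minimal illustrative sketch, not the Isis implementation: the `Process`, `cbcast`, and `receive` names are invented for this example, and a message is delivered only once every message that causally precedes it has been delivered.

```python
# Hypothetical sketch of cbcast's causal delivery rule via vector clocks.
# Class and method names are invented; a real system also handles membership.

class Process:
    def __init__(self, pid, n):
        self.pid = pid
        self.vc = [0] * n          # vector clock: one entry per group member
        self.pending = []          # received but not yet causally deliverable
        self.delivered = []        # payloads, in delivery order

    def cbcast(self, payload):
        """Send: stamp the message with the sender's vector clock."""
        self.vc[self.pid] += 1
        return (self.pid, list(self.vc), payload)   # multicast to the group

    def receive(self, msg):
        self.pending.append(msg)
        self._try_deliver()

    def _deliverable(self, sender, mvc):
        # Deliver when this is the sender's next message and we have seen
        # everything the sender had seen at send time.
        return (mvc[sender] == self.vc[sender] + 1 and
                all(mvc[k] <= self.vc[k]
                    for k in range(len(mvc)) if k != sender))

    def _try_deliver(self):
        progress = True
        while progress:
            progress = False
            for msg in list(self.pending):
                sender, mvc, payload = msg
                if self._deliverable(sender, mvc):
                    self.pending.remove(msg)
                    self.vc[sender] = mvc[sender]
                    self.delivered.append(payload)
                    progress = True
```

If two multicasts from the same sender arrive out of order, the second is buffered in `pending` until the first is delivered – exactly the property that makes lock-protected updates arrive in the order they were performed.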
In general?
- Replace "safe" (dynamic uniformity) with a standard multicast when possible
- Replace abcast with cbcast
- Replace cbcast with fbcast
- Unless replies are needed, don't wait for replies to a multicast

Why "virtual" synchrony?
- The user writes code as if it will experience a purely synchronous execution
  - Simplifies the developer's task – very few cases to worry about, and all group members see the same thing at the same "time"
- But the actual execution is rather concurrent and asynchronous
  - Maximizes performance
  - Reduces the risk that lock-step execution will trigger correlated failures

Why groups?
- Other concurrent work, such as Lamport's state machines, treats the entire program as a deterministic entity and replicates it
- But a group replicates state at the "abstract data type" level
  - Each group can correspond to one object
- This is a good fit with modern styles of application development

Correlated failures
- Perhaps surprisingly, experiments showed that virtual synchrony makes these less likely!
- Recall that many programs are buggy
  - Often these are Heisenbugs (order-sensitive)
- With lock-step execution, each group member sees group events in identical order
  - So all die in unison
- With virtual synchrony, orders differ
  - So an order-sensitive bug might only kill one group member!

Programming with groups
- Many systems just have one group
  - E.g. replicated bank servers
  - Cluster mimics one highly reliable server
- But we can also use groups at finer granularity, e.g.
to replicate a shared data structure
- Now one process might belong to many groups
- A further reason that different processes might see different inputs and event orders

Embedding groups into "tools"
- We can design a groups API: pg_join(), pg_leave(), cbcast()…
- But we can also use groups to build other, higher-level mechanisms:
  - Distributed algorithms, like snapshot
  - Fault-tolerant request execution
  - Publish-subscribe

Distributed algorithms
- Processes that might participate join an appropriate group
- Now the group view gives a simple leader election rule
  - Everyone sees the same members, in the same order, ranked by when they joined
  - The leader can be, e.g., the "oldest" process

Distributed algorithms
- A group can easily solve consensus
  - The leader multicasts: "what's your input?"
  - All reply: "Mine is 0", "Mine is 1"
  - The initiator picks the most common value and multicasts that: the "decision value"
- If the leader fails, the new leader just restarts the algorithm
- Puzzle: does FLP apply here?

Distributed algorithms
- A group can easily run a consistent snapshot algorithm
  - Either use cbcast throughout the system, or build the algorithm over gbcast
- Two phases:
  - Start snapshot: a first cbcast
  - Finished: a second cbcast; collect process states and channel logs

Distributed algorithms: summary
- Leader election
- Consensus and other forms of agreement, like voting
- Snapshots, hence deadlock detection, auditing, load balancing

More tools: fault-tolerance
- Suppose that we want to offer clients "fault-tolerant request execution"
- We can replace a traditional service with a group of members
  - Each request is assigned to a primary (ideally, spread the work around) and a backup
  - The primary sends a "cc" of its response to the request to the backup
  - The backup keeps a copy of the request and steps in only if the primary crashes before replying
- Sometimes called "coordinator/cohort" just to distinguish it from "primary/backup"

Coordinator-cohort
[Figure: processes p, q, r, s, t, u]
- Q assigned as coordinator for t's request…
- … but p takes over if q fails

Coordinator-cohort
[Figure: processes p, q, r, s, t, u]
- P picked to perform u's request
- Q stands by until it sees the request-completion message

Parallel processing
[Figure: processes p, q, r, s, t]
- P and Q split a task: P performs part 1 of 2; Q performs part 2 of 2
  - Such as searching a large database…
- They agree on the initial state in which the request was received

Parallel processing
[Figure: processes p, q, r, s, t]
- In this example, r is the cohort and both p and q function as coordinators
- If either fails, r can step in and take over its role…

Parallel processing
[Figure: processes p, q, r, s, t]
- … as it did in this case, when q fails

Publish / Subscribe
- Goal is to support a simple API:
  - Publish("topic", message)
  - Subscribe("topic", event_handler)
- We can just create a group for each topic
  - Publish multicasts to the group
  - Subscribers are the members

Scalability warnings!
- Many existing group communication systems don't scale particularly well
  - E.g. JGroups, Isis, Horus, Ensemble, Spread
- Group sizes
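The coordinator/cohort pattern described above can be sketched in a few lines. This is an invented simulation, not the Isis API: `Member`, `Group`, and `request` are hypothetical names, the "crash" is a flag rather than a real failure detector, and the cohort's copy of the request stands in for the "cc" protocol.

```python
# Hedged sketch of coordinator/cohort fault-tolerant request execution.
# Names are invented; real systems detect crashes via group membership changes.

class Member:
    def __init__(self, name):
        self.name = name
        self.saved = {}            # cohort's copies of in-flight requests

    def handle(self, request):
        """Compute a reply; deterministic so cohort and coordinator agree."""
        return f"reply-to-{request}"

class Group:
    def __init__(self, members):
        # Group view ranks members by join order; pick roles per request
        self.members = members

    def request(self, req_id, request, crash_coordinator=False):
        coordinator, cohort = self.members[0], self.members[1]
        cohort.saved[req_id] = request           # cohort keeps a copy of the request
        if not crash_coordinator:
            reply = coordinator.handle(request)
            del cohort.saved[req_id]             # "cc" of the reply lets cohort discard it
            return reply, coordinator.name
        # Coordinator crashed before replying: cohort steps in with its copy
        reply = cohort.handle(cohort.saved.pop(req_id))
        return reply, cohort.name
```

In the normal case the cohort does no work beyond buffering the request; only when the coordinator fails before the completion message does it replay the request itself.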
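The publish/subscribe mapping onto process groups is equally direct: one group per topic, publish as a multicast, subscribers as members. The sketch below is illustrative only – `GroupRuntime`, `pg_join`, and `multicast` are stand-ins for a real group communication layer, which would add reliability, ordering, and membership handling.

```python
# Hypothetical sketch: publish/subscribe layered on process groups.
# One group per topic; publishing multicasts to that group's members.

from collections import defaultdict

class GroupRuntime:
    """Toy stand-in for a group communication layer."""
    def __init__(self):
        self.groups = defaultdict(list)     # topic -> handlers of group members

    def pg_join(self, topic, handler):
        self.groups[topic].append(handler)  # subscriber joins the topic's group

    def multicast(self, topic, message):
        for handler in self.groups[topic]:  # deliver to every current member
            handler(message)

runtime = GroupRuntime()

def subscribe(topic, event_handler):
    runtime.pg_join(topic, event_handler)

def publish(topic, message):
    runtime.multicast(topic, message)
```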