CS514: Intermediate Course in Operating SystemsRecap… Consistent cutsWhat time is it?But what does time “mean”?Lamport’s approachDrawing time-line pictures:Slide 7Slide 8Slide 9Slide 10Slide 11Happens before “relation”Logical clocksRules for managing logical clocksTime-line with LT annotationsSlide 16Can we do better?Vector clocksTime-line with VT annotationsRules for comparison of VTsSlide 21Vector time and happens beforeConsistent cutsReplication… and replication is powerful!Let’s look at time vis-à-vis updatesDrill down: Life of a groupQuestions to ask about orderSlide 29Ordering exampleSlide 31Slide 32Slide 33CommutativitySingle updaterSlide 36Mutual exclusionSlide 38Slide 39Slide 40Types of ordering we’ve seenSlide 42Recommended readingsCS514: Intermediate Course in Operating SystemsProfessor Ken BirmanVivek Vishnumurthy: TARecap… Consistent cutsOn Monday we saw that simply gathering the state of a system isn’t enoughOften the “state” includes tricky relationshipsConsistent cuts are a way of collecting state that “could” have arisen concurrently in real-timeWhat time is it?In distributed system we need practical ways to deal with timeE.g. we may need to agree that update A occurred before update BOr offer a “lease” on a resource that expires at time 10:10.0150 Or guarantee that a time critical event will reach all interested parties within 100msBut what does time “mean”?Time on a global clock?E.g. with GPS receiver… or on a machine’s local clockBut was it set accurately?And could it drift, e.g. run fast or slow?What about faults, like stuck bits?… or could try to agree on timeLamport’s approachLeslie Lamport suggested that we should reduce time to its basicsTime lets a system ask “Which came first: event A or event B?”In effect: time is a means of labeling events so that…If A happened before B, TIME(A) < TIME(B)If TIME(A) < TIME(B), A happened before BDrawing time-line pictures:pmsndp(m)qrcvq(m) delivq(m)DDrawing time-line pictures:A, B, C and D are “events”. Could be anything meaningful to the applicationSo are snd(m) and rcv(m) and deliv(m)What ordering claims are meaningful?pmACBsndp(m)qrcvq(m) delivq(m)DDrawing time-line pictures:A happens before B, and C before D“Local ordering” at a single processWrite and pqmACBrcvq(m) delivq(m)sndp(m)BApDCqDDrawing time-line pictures:sndp(m) also happens before rcvq(m)“Distributed ordering” introduced by a messageWritepqmACBrcvq(m) delivq(m)sndp(m))m(rcv)m(sndqMpDDrawing time-line pictures:A happens before DTransitivity: A happens before sndp(m), which happens before rcvq(m), which happens before DpqmDACBrcvq(m) delivq(m)sndp(m)Drawing time-line pictures:B and D are concurrentLooks like B happens first, but D has no way to know. No information flowed…pqmDACBrcvq(m) delivq(m)sndp(m)Happens before “relation”We’ll say that “A happens before B”, written AB, if1. APB according to the local ordering, or2. A is a snd and B is a rcv and AMB, or3. A and B are related under the transitive closure of rules (1) and (2)So far, this is just a mathematical notation, not a “systems tool”Logical clocksA simple tool that can capture parts of the happens before relationFirst version: uses just a single integerDesigned for big (64-bit or more) countersEach process p maintains LTp, a local counterA message m will carry LTmRules for managing logical clocksWhen an event happens at a process p it increments LTp. Any event that matters to pNormally, also snd and rcv events (since we want receive to occur “after” the matching send)When p sends m, setLTm = LTpWhen q receives m, setLTq = max(LTq, LTm)+1Time-line with LT annotationsLT(A) = 1, LT(sndp(m)) = 2, LT(m) = 2LT(rcvq(m))=max(1,2)+1=3, etc…pqmDACBrcvq(m) delivq(m)sndp(m)LTq0 0 0 1 1 1 1 3 3 3 4 5 5LTp0 1 1 2 2 2 2 2 2 3 3 3 3Logical clocksIf A happens before B, AB,then LT(A)<LT(B)But converse might not be true:If LT(A)<LT(B) can’t be sure that AB This is because processes that don’t communicate still assign timestamps and hence events will “seem” to have an orderCan we do better?One option is to use vector clocksHere we treat timestamps as a listOne counter for each processRules for managing vector times differ from what did with logical clocksVector clocksClock is a vector: e.g. VT(A)=[1, 0]We’ll just assign p index 0 and q index 1Vector clocks require either agreement on the numbering, or that the actual process id’s be included with the vectorRules for managing vector clockWhen event happens at p, increment VTp[indexp]Normally, also increment for snd and rcv events When sending a message, set VT(m)=VTpWhen receiving, set VTq=max(VTq, VT(m))Time-line with VT annotationspqmDACBrcvq(m) delivq(m)sndp(m)VTq0 00 00 00 10 10 10 12 22 22 2232 32 4VTp0 01 01 02 02 02 02 02 02 03 0303 03 0VT(m)=[2,0]Could also be [1,0] if we decide not to increment the clock on a snd event. Decision depends on how the timestamps will be used.Rules for comparison of VTsWe’ll say that VTA ≤ VTB ifI, VTA[i] ≤ VTB[i]And we’ll say that VTA < VTB ifVTA ≤ VTB but VTA ≠ VTBThat is, for some i, VTA[i] < VTB[i]Examples?[2,4] ≤ [2,4][1,3] < [7,3][1,3] is “incomparable” to [3,1]Time-line with VT annotationsVT(A)=[1,0]. VT(D)=[2,4]. So VT(A)<VT(D)VT(B)=[3,0]. So VT(B) and VT(D) are incomparablepqmDACBrcvq(m) delivq(m)sndp(m)VTq0 00 00 00 10 10 10 12 22 22 2232 32 4VTp0 01 01 02 02 02 02 02 02 03 0303 03 0VT(m)=[2,0]Vector time and happens beforeIf AB, then VT(A)<VT(B)Write a chain of events from A to BStep by step the vector clocks get largerIf VT(A)<VT(B) then ABTwo cases: if A and B both happen at same process p, trivialIf A happens at p and B at q, can trace the path back by which q “learned” VTA[p]Otherwise A and B happened concurrentlyConsistent cutsIf we had time, we could revisit these using logical and vector clocksIn fact there are algorithms that find a consistent cut byImplementing some form of clockAsking everyone to record their state at time now+ (for some large )And this can be made to work well…ReplicationAnother use of time arises when we talk about replicating
View Full Document