Unformatted text preview:

6.897: Advanced Data Structures Spring 2003Lecture 19 — May 5, 2003Prof. Erik Demaine Scribe: Ilya Baran, Deniss Cebikins, and Lev Teytelman1 OverviewIn this lecture, we examine fault-tolerant data structures. To motivate their study, consider thestandard implementation of a linked list:°−→°−→°−→°−→°−→°If the first node in the list is corrupted, the entire list is lost. We would like to construct datastructures with good bounds on the damage that a small number of faults may cause.We will construct fault-tolerant versions of the stack and of the linked list and analyze their per-formance in terms of the damage that faults can cause and the time it takes to recover from faults.It is also possible to construct fault-tolerant trees, but we will not cover it. All of these results arebased on a paper by Aumann and Bender [AB96].1.1 ModelWe will work in the pointer-machine model of computation. Our model of faults is as follows:1. A node may become faulty at any time. If a node becomes faulty, all of its data and pointersare lost forever.2. An algorithm can detect whether a node is faulty, but only upon attempt to access the node.In practice, this may be accomplished with error-correcting codes.3. The number of nodes is unlimited. When replacing faulty nodes, we never run out of nodes.4. There are s “secure” words of memory which cannot become faulty. This is necessary to havean “entry-point” into the data structure. This is justified in real life by the existence of thememory hierarchy: for example, disks fail more often than RAM and RAM fails more oftenthan registers.1.2 GoalsWe wish to build fault-tolerant versions of familiar pointer-machine data structures. The behaviorof fault-tolerant data structures should be the same as that of their non-fault-tolerant counterpartsand their asymptotic performance should not be worse. Furthermore, when faults occur, we wouldlike the following characteristics:11. There should be a constant c > 0 such that, as long as there are fewer than cs faults, we don’tlose the entire data structure. Notice that c cannot be greater than or equal to 1, becauseif we lose all of the nodes to which the s secure words point, the entire data structure isnecessarily lost.2. Suppose f ≤ cs faults occur before a fault is detected. The number of nodes we lose shoulddepend only on f, not on the size of the entire data structure, n. We should lose no morethan O(`(f)) nodes for some reasonable (polynomial) function `.3. Similarly, if f faults occur before a fault is detected, the recovery should take time that isindependent of n. The data structure should be able to recover to a consistent state usingO(r(f, s)) time for some polynomial function r.4. A recovery should preserve some basic properties of the data structure. For example, theelements of a linked list or a stack that are not lost should remain in the same order theywere before the fault. The recovery procedure must guarantee such a specified property ofthe structure.2 StackIn this section we consider the stack data structure. We divide the n elements into dn/se layers ofconsecutive elements as shown. The top layer is the only layer with possibly fewer than s elements.sn/s...Horizontal links.Horizontal links indicate the list order and are not actually stored, because the Stack data structuredoes not require functionality for finding the sucessor of a given node.We now connect the layers vertically, and use the secure storage to point to the topmost elementin each column:...Top...2Vertical links.We are not yet done: even using both horizontal and vertical links, we can easily lose Ω(s2) nodeswith O(s) pointer faults, and we would like to do better.We now connect every dlg se + 1 layers with a butterfly exchange network. This means connectingx in level i + 1 to x ⊕ 2i mod dlg se+1at level i. In other words, we flip the lowest-order bit in thebinary representation of the column number in the first layer, then the next bit in the next layer,and so on to the highest-order bit. These links are called diagonal links.lowestorder bithighestorder bit...Diagonal Links.In the structure with vertical and diagonal links, every node has exactly 2 outgoing pointers, andeither 2 incoming pointers or 1 secure incoming pointer (the top-layer nodes).2.1 ImplementationThe top-layer element in each column has a secure pointer. There is also a secure counter to storewhich column starts the topmost layer. We now need to implement Top, Pop, and Push methodsof stack:• Top:– Follow the secure pointer corresponding to the first column of the topmost layer.• Pop:– Remove the Top element– Follow the Down pointer to update the secure pointer– Increment the secure counter• Push:– Decrement the secure counter modulo s– Add a node to this column– Add Down, Right, and Diagonal pointers using secure pointersNext, we are consider recovery algorithm which is invoked any time a failed node is encounteredby any of the above procedures.32.2 Recovery Algorithm• Repeatedly pop nodes from the stack.• For every node, check whether it is accessible by either of the incoming pointers from aprevious level. If this fails, mark this node as lost and discard.• Stop when s accessible (not lost and not faulty) consecutive nodes have been encountered.• We now have all the secure pointers required for the next level.• Push all popped nodes back onto the stack.2.3 Fault Analysisxi...i+1...x 2i+Suppose we have seen f faults so far. (There might be more down below.) We claim that O(f lg f)extra nodes are lost in this case:• Consider inaccessible node x in layer i. If x ⊕ 2imoddlg se+1(the diagonal pointer from x) isalso inaccessible in layer i, then both are inaccessible in layer i + 1. If x ⊕ 2imoddlg se+1isaccessible in layer i, then both are accessible in layer i + 1.• We can have at most f lg f pairs hx, ii such that x and x ⊕ 2imoddlg se+1are both faulty atany layer. Provided that f ≤ s/2, we stay in one butterfly structure.• Therefore, the number of lost nodes is O(f lg f).• Therefore, the number of inaccessible nodes is O(f(1 + lg f)) = O(f lg f).2.4 Recovery Algorithm Performance• There are O(f lg f) inaccessible nodes.• Therefore, at most O(f lg f) layers are considered.• There are s elements per layer, so recovery time is O(sf lg f).• We can get rid of the lg f factor in the recovery time bound by noting that there can be atmost f layers with inaccessible


View Full Document

MIT 6 897 - Lecture Notes

Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?