Unformatted text preview:

6.897: Advanced Data Structures Spring 2003Lecture 12 — Monday, April 7, 2003Prof. Erik Demaine Scribes: Wayland Ni and Gautam Jayaraman1 OverviewIn the last lecture we solved the static least common ancestor (LCA) problem in constant timeper operation after linear preprocessing. We also built suffix trees in linear time after sorting thealphabet.In this lecture we establish succinct data structures, which are [typically static] structures that usethe information-theoretic optimum space up to lower-order terms and support basic navigationaloperations quickly.2 Succinct Data StructuresOur goal is to use space equal to the information-theoretic optimum (or information-theoretic lowerbound ) plus lower-order terms and still be able to manipulate and search the structure. Specifically,for an n-bit string, we would like to get the right constant term in front of the n instead of justsettling for O(n).2.1 Binary Trees/TriesFor a given binary trie, we would like to know how many bits it would take to encode the trie. Westart by noting that the number of distinct rooted labelled binary tries on n nodes is the Catalannumber Cn, where Cn≡¡2nn¢/(n + 1). Then, on average, it would take lg Cnbits to encode asingle trie.lg Cn= lg(2n)!(n!)2− lg(n + 1)= lg(2n)! − 2 lg(n!) − lg(n + 1)= 2n lg(2n) − 2n lg n + o(n) (from Stirling’s Approximation)= (2n)(1 + lg n) − 2n lg n + o(n)= 2n + o(n)So we can intuitively (log-asymptotically) think of Cnas being roughly 4n, because lg Cn∼ 2n.2.1.1 Sidenote: Prefix CodesPrefix codes are inserted at the beginning of a code to establish the length of the encoding, so thereader knows when to stop reading. A good tool to know is how to make codes into prefix codes.1Given an n-bit string, we must encode its length. Of course, lg n bits are needed to encode thelength of the string. But then, we also need lg lg n bits to encode the length of the extra lg n bits,and lg lg lg n bits to encode the length of the extra lg lg n bits, etc. Thus, the length of the optimalprefix code would be n + lg n + lg lg n + ··· + 1 + lg∗n.However, since the length of the optimal code is such a complex expression, we can simplify thecode by using a little extra space. We use lg n bits (the “prefix”) to encode the length of the n-bitstring and another lg n bits (the “pre-prefix”) to encode the length of the lg n-bit prefix in unary.In other words, we repeat 0’s followed by a 1 to indicate the length. The total length would thenbe n + 2 lg n.2.2 Level-Order RepresentationThe trivial way to use the optimum space, lg Cn≈ 2n + o(n), would be to use a lookup table sowe could refer to a specific binary trie by number. However, our goal is to be able to navigate andoperate on the structure efficently as well.Definition: An internal node is a node that contains an element.Definition: An external node is a node without an element. External nodes are appended to thetree for each missing child.We now examine an encoding that is more navigable, called level-order encoding. To construct theencoding, we first append an external node for each missing child. Now, there are 2n + 1 nodes, sowe spend 1 bit per node. We visit the nodes in level order, that is, we examine the nodes one rowat a time, starting at the root and continuing downwards, going from left to right within each row.We write down a 1 for an internal node and a 0 for an external node. Since this encoding takes2n + 1 total bits, it must be optimal.Here is an example encoding for the tree below:A B C * D E F * G * * * * * *1 1 1 0 1 1 1 0 1 0 0 0 0 0 0ABDGCE FAB·D·G· ·CE· ·F· ·Visual representation of binary trie (left) and with added missing children (right).Claim 1. The positions of the left and right children of the ith internal node are 2i and 2i + 1.2Proof. Suppose there are j external nodes up to the ith internal node. We show that strictlybetween the ith internal node and its left child, there are i − j − 1 nodes:• offspring of i − 1 previous internal nodes = (2(i − 1)).• i − 1 of which were internal nodes, (all but root).• j of which were external nodes.Thus, the number of nodes strictly between the ith internal node and its left child is 2(i −1) −(i −1) − j = i − j −1.Therefore, the position of left child = i + j + (i − j − 1) + 1 = 2i.2.3 Rank and SelectWe define the functions Rank(i) and Select(i) in a static bit string:Rank(i) ≡ Number of 1’s at or before index iSelect(i) ≡ Index of ith 1 bitNote that Rank(i) and Select(i) are inverses. We can now define left-child(i), right-child(i) andparent(i) as functions of Rank(i) and Select(i):left-child(i) ≡ 2 × Rank(i)right-child(i) ≡ 2 × Rank(i) + 1parent(i) ≡ Select(bi/2c)Our goal is to support these queries in O(1) time and o(n) extra space.2.3.1 RankSuppose we divide up the string into bit strings of length12lg n. Using a lookup table on each piecewould require O(√n lg n lg lg n) space. Between each pair of pieces, we must store the number of1’s so far from the left. Since we split the string inton1/2lgn=2nlg npieces, and lg n bits are neededto store the cumulative counts of 1’s, the total cost is Θ(n), which is too much.Instead, using the idea from [J89], we can split the string inton(lg n)2pieces of length (lg n)2. Thenstoring the cumulative ranks will cost O(n(lg n)2lg n) = O(nlg n) bits, which is o(n).Now we must split the (lg n)2-size chunks into12lg n-size sub chunks. We then store the cumulativeranks within chunks between each pair of sub chunks. Essentially, we are storing ”local” countsof 1 bits between every pair of12lg n-bit sub chunks. Moreover, since each count is local to thespecific chunk, the maximum number of 1 bits is (lg n)2, which takes lg lg n bits to encode. Sincethere arenlg ntotal subchunks, and each local count can be encoded by lg lg n bits, the amount ofspace used is O(nlg nlg lg n) = o(n).At the bottom level of the structure, a simple lookup table can be used.31/2 lg n1/2 lg n store cumulative ranks(lg n)^2store cumulative ranksstore cumulative rankswithin chunksFigure 1: Division of strings into bit strings of length12lg n (top) and length (lg n)2(bottom)2.3.2 SelectImplementing Select is also possible but much harder, because the splits aren’t “even” in terms ofsparseness of 1 bits. The (complicated) details can be found in [CM96].2.4 TransformationsWe now show three succinct binary trie representations, and how to move between


View Full Document

MIT 6 897 - Lecture Notes

Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?