6.897: Advanced Data Structures (Spring 2003)
Lecture 13 (Compression), April 9, 2003
Prof. Erik Demaine
Scribes: Percy Liang, David Malan

1 Overview

The motivation for compression is the following: a sender S wishes to transmit some data (a string s of letters) through a channel to a receiver R. S encodes s into some code c (hopefully smaller than s) and sends c across the channel. R decodes c to recover s. We want to take advantage of knowledge about the probability distribution of s to minimize the expected number of bits in c.

First, we consider the case where the distribution of s is based on a stochastic source that generates a sequence of independent letters. We examine how Huffman coding (Section 2) and arithmetic coding (Section 3) attempt to achieve entropy.

Next, we see how we can do better if our data comes from a non-stochastic source. We generalize the notion of entropy (Section 4) and introduce the Burrows-Wheeler Transform (BWT) to achieve kth-order entropy (Section 5).

2 Huffman Coding

Assume that our stochastic source generates letters from a finite alphabet Σ such that letter i is generated with probability p_i. The probability distribution of letters is static.

2.1 Goals

The idea of Huffman coding [1] is to construct a prefix code for Σ: associate each letter i with a codeword (bit string) w_i. The goal is to minimize the average codeword length

    $E[|w_i|] = \sum_{i=1}^{|\Sigma|} p_i \, |w_i|.$

To be a prefix code, no w_i can be a prefix of another w_j. This means that, when the receiver is reading the incoming stream of codewords, reading w_i unambiguously identifies letter i. A binary prefix code corresponds to the set of root-to-leaf paths of a binary tree. See the example in Figure 1.

The receiver does not need to know the probability distribution, but only the codewords. (Of course, given the probability distribution, one can recover the codewords if some canonical way of breaking ties between equal probabilities in the Huffman algorithm is established.)

    Letter i    p_i    w_i
    A           0.1    000
    B           0.1    001
    C           0.2    01
    D           0.3    10
    E           0.3    11

Table 1: Σ = {A, B, C, D, E}, and each letter's probability p_i is given in the table. The codewords w_i are an example of an optimal Huffman code. Figure 1 shows the corresponding binary tree of this code.

Figure 1 (not reproduced here): The binary tree of the example Huffman code in Table 1. Each node v is labelled with the sum of the probabilities of the leaves in the subtree rooted at v. The labels of the edges traversed in a root-to-leaf path determine the codeword for the letter corresponding to that leaf.

2.2 Algorithm

Now we present an algorithm for constructing the binary tree of a Huffman code. Note that this problem is similar to constructing an optimal BST, but there are two differences: (1) the order of the leaves is irrelevant and (2) only root-to-leaf paths matter with respect to optimality. Because the problem is less constrained, we can solve it very efficiently using a greedy algorithm.

The algorithm is as follows:

1. Start with the |Σ| disconnected leaves, which are trivial subtrees.
2. Choose the two subtrees t_1 and t_2 with smallest p_{t_1} and p_{t_2}.
3. Create a new internal node t with the two children t_1 and t_2. t defines a subtree with probability p_t = p_{t_1} + p_{t_2}.
4. Until we have a single tree, repeat Steps 2–3. The root node is added last with probability 1. Steps 2–3 are repeated |Σ| − 1 times.
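The greedy construction is easy to implement with a priority queue. The following Python sketch (not part of the original notes; the function name huffman_code and the tie-breaking counter are illustrative choices) repeatedly merges the two lowest-probability subtrees until one tree remains, then reads the codewords off the root-to-leaf paths.

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code for {letter: probability} by greedily
    merging the two lowest-probability subtrees (Steps 2-3 above)."""
    # Each heap entry is (probability, tiebreak, tree); a tree is either
    # a letter (leaf) or a (left, right) pair (internal node).
    heap = [(p, i, letter) for i, (letter, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:                      # repeated |Sigma| - 1 times
        p1, _, t1 = heapq.heappop(heap)       # the two smallest subtrees
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, counter, (t1, t2)))
        counter += 1
    # Read codewords off the root-to-leaf paths (left edge = 0, right edge = 1).
    codewords = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codewords[tree] = prefix or "0"   # guard for a one-letter alphabet
    walk(heap[0][2], "")
    return codewords

# Example: the distribution from Table 1.
print(huffman_code({"A": 0.1, "B": 0.1, "C": 0.2, "D": 0.3, "E": 0.3}))
# One possible output: {'C': '00', 'A': '010', 'B': '011', 'D': '10', 'E': '11'};
# the codeword lengths (3, 3, 2, 2, 2) match Table 1, and the exact bit
# patterns depend only on how ties are broken.
```

With a binary heap, the |Σ| − 1 merges take O(|Σ| lg |Σ|) time in total.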
2.3 Properties

Theorem 1. A Huffman code minimizes E[|w_i|] = E[codeword length] = E[leaf depth].

Proof. A formal proof is given in [1]. The main idea is that we first show that in an optimum tree, if a leaf v_a has a lower probability than v_b, then the depth of v_a is at least the depth of v_b. Then, if an optimal code is not a Huffman code, we can repeatedly interchange two leaves, improving (or at least not worsening) the expected depth of a leaf, until we obtain the Huffman tree.

Theorem 2. E[Huffman codeword length] ∈ [H, H + 1), where H is the entropy of the probability distribution of Σ.

Proof. Consider an infinite complete binary tree. The claim is that we can, for each letter i, pick a node l_i in the binary tree such that depth(l_i) = ⌈lg(1/p_i)⌉ and no l_i is an ancestor of another l_j. Then, taking the tree whose leaves are l_1, l_2, ..., l_{|Σ|}, we have

    $E[\text{leaf depth}] = \sum_{i=1}^{|\Sigma|} p_i \left\lceil \lg \frac{1}{p_i} \right\rceil < H + 1.$

To pick the nodes, suppose p_1 ≥ p_2 ≥ ... ≥ p_n and pick l_1, l_2, ..., l_n in order of increasing leaf depth. When we come to pick l_i, each previously chosen l_j blocks (as an ancestor) a fraction 2^{−⌈lg(1/p_j)⌉} ≤ p_j of the nodes at depth ⌈lg(1/p_i)⌉, so at most a p_1 + ... + p_{i−1} ≤ 1 − p_i < 1 fraction of that level is blocked and an available node always remains. Since the Huffman code minimizes the expected codeword length (Theorem 1), it does at least as well as this code, giving E[Huffman codeword length] < H + 1. (The lower bound of H holds for any prefix code.)

We can also generalize Huffman codes to nonbinary codes (nonbinary trees), or to k-grams instead of letters, but we omit the details.

3 Arithmetic Coding

3.1 Motivation

Consider an example where Σ = {a, b} with p_a = 1/1024 and p_b = 1 − 1/1024. One can verify that H ≈ 0.01. However, a Huffman code is w_a = 0 and w_b = 1. The expected code length E[|w_i|] is obviously 1, which is about 100 times greater than the entropy H!

In this example, Huffman coding is terrible in the multiplicative sense. The problem is that codewords can only be an integer number of bits, which forces us to round up to 1 bit per letter no matter how small the entropy is. A clever idea is to aggregate several fractional bits into one bit. This is done in arithmetic coding [4]. Instead of losing 1 bit per letter, we lose 1 bit over the entire string. Thus, for long strings, we basically achieve entropy.

3.2 The Code

The idea is to encode each string as a unique real number in [0, 1]. We divide the [0, 1] interval into |Σ| first-level subintervals, with the length of subinterval i equal to p_i. Then we recursively divide each subinterval in the same way, scaling accordingly. Given a real number r, the first-level subinterval of [0, 1] containing r determines the first letter of the decoded string; the second-level subinterval (within that first-level subinterval) containing r determines the second letter, and so on. The sender must include the length of the string (an additional O(lg n) bits) with the code so that the receiver knows when to stop recursing.

For infinite strings the code produces a single real number, but for finite strings there is an interval of acceptable real numbers (L, R]. (We use L and R to represent both the real numbers and their corresponding bit-string encodings.) When encoding a string, we arbitrarily pick the number lcp(L, R)1, where lcp(L, R) is the longest common prefix of the bit strings of L and R. If we think of the bit strings as having an infinite number of trailing zeros, then after lcp(L, R), L reads 0... and R reads 1..., so clearly lcp(L, R)1 lies in (L, R].
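To make the interval subdivision concrete, here is a minimal Python sketch (not from the notes). It computes the final interval for a string under a fixed letter distribution and decodes any number inside that interval back to the string, given its length. It uses exact fractions and returns the interval itself rather than the bit string lcp(L, R)1; the distribution (borrowed from Table 1) is just an assumption for the example, and a practical coder would emit bits incrementally and renormalize instead.

```python
from fractions import Fraction

# Static model: each letter owns a slice of [0, 1) of length p_i, in
# dictionary order (half-open [lo, hi) convention; the notes use (L, R],
# which differs only at the endpoints).
PROBS = {"A": Fraction(1, 10), "B": Fraction(1, 10), "C": Fraction(2, 10),
         "D": Fraction(3, 10), "E": Fraction(3, 10)}

def encode_interval(s):
    """Return the subinterval of [0, 1) corresponding to string s."""
    lo, hi = Fraction(0), Fraction(1)
    for letter in s:
        width = hi - lo
        cum = Fraction(0)
        for sym, p in PROBS.items():          # locate this letter's slice
            if sym == letter:
                lo, hi = lo + cum * width, lo + (cum + p) * width
                break
            cum += p
    return lo, hi

def decode(r, n):
    """Recover n letters from a number r inside the encoded interval."""
    out = []
    lo, hi = Fraction(0), Fraction(1)
    for _ in range(n):
        width = hi - lo
        cum = Fraction(0)
        for sym, p in PROBS.items():          # which slice contains r?
            if lo + cum * width <= r < lo + (cum + p) * width:
                out.append(sym)
                lo, hi = lo + cum * width, lo + (cum + p) * width
                break
            cum += p
    return "".join(out)

lo, hi = encode_interval("DEED")
r = (lo + hi) / 2            # any number strictly inside the interval works
print(lo, hi, decode(r, 4))  # prints the interval endpoints and "DEED"
```

Note that after encoding n letters the interval has width p_{x_1} p_{x_2} ... p_{x_n}, so roughly Σ_i lg(1/p_{x_i}) bits suffice to name a number inside it, which is how arithmetic coding approaches the entropy of the string.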

