LECTURE 6 Last time: Kraft inequality • optimal codes. • Lecture outline Huffman codes • Reading: Scts. 5.5-5.7.Kraft inequality Any instantaneous code C with code lengths l1, l2, . . . , lm must satisfy m� D−li ≤ 1 i=1 Conversely, given lengths l1, l2, . . . , lm that satisfy the above inequality, there exists and instantaneous code with these codeword lengths How do we achieve such a code in a prac-tical fashion? Make frequent elements short and infre-quent one longer.Huffman codes Definition: let X be a set of m source sym-bols, let D be a D-ary alphabet. A Huffman code CHuff is an optimum instanta-: X �→ D∗ neous code in which the 2+((m−2)mod(D− 1)) least likely source symbols have the same length and differ only in the last digit Proposition: for any set of source symbols X with m symbols, it is possible define a Huffman code for those source symbols Consider a binary code: reorder the xi in terms of decreasing prob-ability the two least likely symbols are xm−1, xm�Huffman codes for binary D For the code C to be optimal, l(xi) ≥ l(xj) for i ≥ j for every maximal length codeword C(xi) there must a codeword C(xj) that differs only in the last bit -otherwise erase one bit while still satisfying prefix condition to satisfy that C(xm) and C(xm−1) differ only in the last bit: find xi such that C(xm) and C(xi) differ only in the last bit and if xi = xm, swap them repeat with code for symbols x1, . . . , xm−2How do we construct them? Find the q = 2 + ((m − 2)mod(D − 1)) least likely source symbols xm, . . . , xm−q+1 Delete these symbols from the set of source symbols and replace them with a single sym- bol ym−q Assign p(ym−q) = �mi=m−qp(xi) Now we have new set of symbols X� Construct a code CHuff,m−q : X� �→ D∗ Note: could be using arbitrary weight func- tion instead of probabilityWhy does this work? Illustrate for binaryWhy does this work? Amalgamation is not always least likely event in X�Why does this work? Two questions arise: Why is it enough to now find a Huffman code CHuff,m−q? Where does the the q = 2 +((m− 2)mod(D−1)) come from? Add one more letter for the q symbols xm, . . . , xm−q with respect to CHuff,m−q Average length of code is average length of CHuff,m−q, plus p(ym−q) = �mi=m−qp(xi) Could we have done better by taking some unused node in CHuff,m−q to represent some of the xm, . . . , xm−q? We’ll see that this is not possible and it is related to the first questionComplete trees Definition: a complete code tree is a fi-nite code tree in which each intermediate node has D nodes of the next higher order stemming from it In a complete tree the Kraft inequality is satisfied with equalityComplete trees The number of terminal nodes in a com-plete code tree with alphabet size D must be of the form D + n(D − 1) Smallest complete tree has D terminal nodes When we replace a terminal node by an in-termediate node, we lose one terminal node and gain D more, for a net gain of D − 1Optimal codes and complete trees Optimal code can be seen as a complete tree with some number B of unused termi-nal nodes By contradiction, if there are incomplete intermediate nodes, nodes of higher order could complete intermediate nodes without adverse effect on length B ≤ D −2, otherwise we could swap unused terminal nodes to group D − 1 of them, in which case we can altogether eliminate those terminal nodesOptimal codes and complete trees How large is B? B + m = n(D − 1) + D so D − 2 − B is the remainder of dividing m − 2 by D − 1, or (m − 2)mod(D − 1) B = D − 2 − ((m − 2)mod(D − 1)) That is why we first group the q = 2+((m− 2)mod(D − 1)) least likely source symbols After we have grouped those symbols, a complete tree is needed for the remaining m − q symbols plus the symbol created by the amalgamation of the least likely q sym-bols Use the fact that B + q = D m − q + 1 = n(D − 1) + D − B − q + 1 = n(D − 1) + 1 = (n − 1)(D − 1) + DWhat happens if the unlikely events change probability? Major change may be necessary in the code Cannot do a good job of coding until all events have been cataloguedMIT OpenCourseWarehttp://ocw.mit.edu 6.441 Information Theory Spring 2010 For information about citing these materials or our Terms of Use, visit:
View Full Document