Huffman EncodingText Compression (Zip)ASCII CodesFirst ApproachOf Course!!However…Prefix CodesFor ExampleHow do we find the optimal coding tree?Constructing a Huffman CodeSlide 11Slide 12Slide 13Slide 14Slide 15Slide 16Slide 17Slide 18Slide 19Decode the treeIn class practiceHuffman EncodingDr. Bernard Chen Ph.D.University of Central ArkansasText Compression (Zip)On a computer: changing the representation of a file so that it takes less space to store or/and less time to transmit.Original file can be reconstructed exactly from the compressed representationASCII CodesLet the wordFirst ApproachLet the wordHow to write this string in a most economical way?Since it has 5 words, 3 bit to represent it is required!!Is there a better way?Of Course!!However…There are some concerns…Suppose we have A-> 01B-> 0101If we have 010101, is this AB? BA? Or AAA?Therefore: prefix codes, no codeword is a prefix of another codeword, is necessaryPrefix CodesAny prefix code can be represented by a full binary treeEach leaf stores a symbol.Each node has two children – left branch means 0, right means 1.codeword = path from the root to the leaf interpreting suitably the left and right branchesFor ExampleA = 0B = 100C = 1010D = 1011R = 11Decoding is unique and simple!How do we find the optimal coding tree?it is clear that the two symbols with the smallest frequencies must be at the bottom of the optimal tree, as children of the lowest internal nodeThis is a good sign that we have to use a bottom-up manner to build the optimal code!Huffman’s idea is based on a greedy approach, using the previous notices.Constructing a Huffman CodeAssume that frequencies of symbols are A: 50 B: 15 C: 10 D: 10 R: 18Smallest numbers are 10 and 10 (C and D)Constructing a Huffman CodeNow Assume that frequencies of symbols are A: 50 B: 15 C+D: 20 R: 18C and D have already been used, and the new node above them (call it C+D) has value 20The smallest values are B + RConstructing a Huffman CodeNow Assume that frequencies of symbols are A: 50 B+R: 33 C+D: 20The smallest values are (B + R)+(C+D)=53Constructing a Huffman CodeNow Assume that frequencies of symbols are A: 50 (B+R) + (C+D): 53The smallest values are A+ ((B + R)+(C+D))=103Constructing a Huffman CodeConstructing a Huffman CodeAssume that frequencies of symbols are A: 50 B: 20 C: 10 D: 10 R: 30Smallest numbers are 10 and 10 (C and D)Constructing a Huffman CodeAssume that frequencies of symbols are A: 50 B: 20 C: 10 D: 10 R: 30C and D have already beenused, and the new node above them (call it C+D) has value 20The smallest values are B, C+DConstructing a Huffman CodeAssume that frequencies of symbols are A: 50 B: 20 C: 10 D: 10 R: 30Next, B+C+D (40) and R (30)Constructing a Huffman CodeAssume that frequencies of symbols are A: 50 B: 20 C: 10 D: 10 R: 30Finally:Constructing a Huffman CodeDecode the treeSuppose we have the Following code:10001011What is the decode result?In class practiceA: 10B: 10C: 25D: 15E: 30F: 21What is the Huffman Encoding
View Full Document