Huffman Codes

• Information coding:
– Most information-transmitting machines (a computer terminal, the Voyager spacecraft) use a binary code.
– Why? Electrical signals are either present or absent at any given moment.
• Suppose the Voyager on-board camera is sensitive to four shades of gray:
– White
– Light gray
– Dark gray
– Black
• A camera picture is digitized into 24,000 (400 × 600) "dots", then transmitted by radio to Earth in a single stream of signals, to be reconstructed and printed.

• In designing a binary code, we must decide how to encode the "color" of each dot in binary so that:
– 1) No signals are wasted (efficiency)
– 2) The stream can be decoded later (recognizability)
• First try, one digit per shade:
– White – 0001
– Light gray – 0010
– Dark gray – 0100
– Black – 1000
WASTEFUL! At 4 digits per symbol (dot), one picture would cost 4 × 24,000 = 96,000, almost 100,000 signals.
• How many digits do we really need?
– 1 is not enough: only 2 values
– 2 is enough: 4 values
– 3 is too many

• Second try, a fixed-length code of length 2 (two yes/no questions suffice to identify the color):
– W – 00
– LG – 01
– DG – 10
– B – 11
There is no problem on the receiving end: every two digits define one dot.
• Decoding mechanism: a decision tree. Start at the root and follow the branches (0 = left, 1 = right) until a leaf is reached.

• There are other tree shapes with four leaf nodes. Which one is better? The criterion is the weighted average code length. Suppose we have these probabilities, and a second tree with these codewords:
– W – .40 – 1
– LG – .30 – 00
– DG – .18 – 011
– B – .12 – 010

• VARIABLE-LENGTH CODE
• Weighted average for tree 1 (fixed length) = .40 × 2 + .30 × 2 + .18 × 2 + .12 × 2 = 2
• Weighted average for tree 2 = .40 × 1 + .30 × 2 + .18 × 3 + .12 × 3 = 1.9
• On average, tree 2 is better: a picture costs only 1.9 × 24,000 = 45,600 signals, less than half of the first try.

• General problem: given n symbols with their respective probabilities, which is the best tree? 
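The weighted-average comparison above can be checked with a short Python sketch (not part of the slides; the symbol names, probabilities, and codewords are taken from the example):

```python
# Probabilities of the four shades, from the slides.
probs = {"W": 0.40, "LG": 0.30, "DG": 0.18, "B": 0.12}

# Tree 1: the fixed-length code. Tree 2: the variable-length code.
tree1 = {"W": "00", "LG": "01", "DG": "10", "B": "11"}
tree2 = {"W": "1", "LG": "00", "DG": "011", "B": "010"}

def weighted_avg_length(code):
    """Expected number of binary digits per symbol (dot)."""
    return sum(probs[s] * len(word) for s, word in code.items())

print(weighted_avg_length(tree1))           # 2 digits per dot
print(weighted_avg_length(tree2))           # approximately 1.9 digits per dot
print(weighted_avg_length(tree2) * 24000)   # approximately 45,600 signals per picture
```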
(Which code?) That is, the code needing the fewest digits (yes/no questions) to identify a symbol.
• Construct the tree from the leaves to the root:
– 1) Label each leaf with its probability.
– 2) Find the two fatherless nodes with the smallest probabilities. In case of a tie, choose arbitrarily.
– 3) Create a father for these two nodes; label the father with the sum of the two probabilities.
– 4) Repeat 2) and 3) until only one fatherless node is left (the root).
• In our case (.12, .18, .30, .40 for B, DG, LG, W): merge .12 and .18 into a node of probability .30; merge the two .30 nodes into .60; finally merge .60 and .40 into the root, 1.0. By convention, left branches are labeled 0 and right branches 1.
• A code obtained by this method is a minimum-redundancy, or Huffman, code. So we have:
– W – .40 – 1
– LG – .30 – 01
– DG – .18 – 001
– B – .12 – 000

• Another sample Huffman code: minimize the average number of yes/no questions necessary to distinguish 1 of 5 symbols that occur with known probabilities:
– a – .28 – 01
– b – .25 – 11
– c – .21 – 10
– d – .15 – 001
– e – .11 – 000
(Merging order: .11 + .15 = .26; .21 + .25 = .46; .26 + .28 = .54; .54 + .46 = 1.0.)
• Weighted average length = 2 × (.28 + .25 + .21) + 3 × (.15 + .11) = 2 × .74 + 3 × .26 = 2.26

• The Huffman code is always a prefix code. A code satisfies the prefix condition if no codeword is a prefix of another codeword.
• Not a prefix code: a – 0, b – 1, c – 00, d – 01. On receiving 00, decoding is ambiguous: we cannot tell whether it is "aa" or "c".
• A prefix code (the Huffman code above): W – 1, LG – 01, DG – 001, B – 000. Given the stream 101001000, at any point it is possible to delimit the symbol: 1 | 01 | 001 | 000 = W, LG, DG, B.
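The leaves-to-root construction above can be sketched in Python (an illustrative implementation, not from the slides; `huffman_code` and `decode` are hypothetical helper names). A heap finds the two fatherless nodes of smallest probability, and each merge prepends a 0 or 1 to the codewords in the two subtrees:

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a minimum-redundancy (Huffman) code.

    probs: dict mapping symbol -> probability.
    Returns a dict mapping symbol -> bit string.
    """
    tie = count()  # tie-breaker so equal probabilities never compare the dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Steps 2) and 3): merge the two smallest fatherless nodes.
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in left.items()}       # left branch: 0
        merged.update({s: "1" + w for s, w in right.items()})  # right branch: 1
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]  # step 4): one fatherless node remains, the root

def decode(bits, code):
    """Decode a stream under a prefix code: delimit each symbol as soon
    as the accumulated bits match a codeword."""
    inverse = {w: s for s, w in code.items()}
    symbols, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    return symbols

# The five-symbol example from the slides.
probs5 = {"a": 0.28, "b": 0.25, "c": 0.21, "d": 0.15, "e": 0.11}
code5 = huffman_code(probs5)
avg = sum(probs5[s] * len(w) for s, w in code5.items())
print(avg)  # approximately 2.26, matching the weighted average above

# The four-shade Huffman code; the stream delimits unambiguously.
print(decode("101001000", {"W": "1", "LG": "01", "DG": "001", "B": "000"}))
```

Note that the exact bit patterns depend on how ties and left/right assignments are broken, but the codeword lengths, and hence the weighted average, are the same for any Huffman tree.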