Penn CIT 594 - Huffman Encoding

Contents: Huffman Encoding, Entropy, Actual information content, Fixed and variable bit widths, Example Huffman encoding, Why it works, Creating a Huffman encoding, Example (steps I-V), Unique prefix property, Practical considerations, About the example, Data compression, The End

Jan 13, 2019

Huffman Encoding

Entropy
- Entropy is a measure of information content: the number of bits actually required to store data.
  (Formally, a source whose symbols occur with probabilities p(i) has entropy -sum over i of p(i) log2 p(i) bits per symbol.)
- Entropy is sometimes called a measure of surprise.
- A highly predictable sequence contains little actual information.
  Example: 11011011011011011011011011 (what's next?)
  Example: I didn't win the lottery this week.
- A completely unpredictable sequence of n bits contains n bits of information.
  Example: 01000001110110011010010000 (what's next?)
  Example: I just won $10 million in the lottery!!!!
- Note that nothing says the information has to have any "meaning" (whatever that is).

Actual information content
- A partially predictable sequence of n bits carries less than n bits of information.
- Example #1: 111110101111111100101111101100
  Blocks of 3: 111 110 101 111 111 100 101 111 101 100
- Example #2: 101111011111110111111011111100
  Unequal probabilities: p(1) = 0.75, p(0) = 0.25
- Example #3: "We, the people, in order to form a..."
  Unequal character probabilities: e and t are common, j and q are uncommon.
- Example #4: {we, the, people, in, order, to, ...}
  Unequal word probabilities: the is very common.

Fixed and variable bit widths
- To encode English text, we need 26 lowercase letters, 26 uppercase letters, and a handful of punctuation marks.
- We can get by with 64 characters (6 bits) in all, so each character is 6 bits wide.
- We can do better, provided:
  - some characters are more frequent than others;
  - characters may have different bit widths, so that, for example, e uses only one or two bits while x uses several;
  - we have a way of decoding the bit stream: we must be able to tell where each character begins and ends.

Example Huffman encoding
- A = 0, B = 100, C = 1010, D = 1011, R = 11
- ABRACADABRA = 01001101010010110100110
- This is eleven letters in 23 bits.
- A fixed-width encoding would require 3 bits for five different letters, or 33 bits for 11 letters.
- Notice that the encoded bit string can be decoded!

Why it works
- In this example, A was the most common letter.
- In ABRACADABRA:
  - 5 As: the code for A is 1 bit long
  - 2 Rs: the code for R is 2 bits long
  - 2 Bs: the code for B is 3 bits long
  - 1 C: the code for C is 4 bits long
  - 1 D: the code for D is 4 bits long

Creating a Huffman encoding
- For each encoding unit (a letter, in this example), associate a frequency (the number of times it occurs).
  You can also use a percentage or a probability.
- Create a binary tree whose children are the two encoding units with the smallest frequencies.
  The frequency of the root is the sum of the frequencies of the leaves.
- Repeat this procedure until all the encoding units are in the binary tree.
  (A code sketch of this procedure appears after the worked example below.)

Example, step I
- Assume that the relative frequencies are A: 40, B: 20, C: 10, D: 10, R: 20.
  (I chose simpler numbers than the real frequencies.)
- The smallest numbers are 10 and 10 (C and D), so connect those.

Example, step II
- C and D have already been used, and the new node above them (call it C+D) has value 20.
- The smallest values are now B, C+D, and R, all of which have value 20.
- Connect any two of these.

Example, step III
- The smallest value is R's 20, while A and B+C+D both have value 40.
- Connect R to either of the others.

Example, step IV
- Connect the final two nodes.

Example, step V
- Assign 0 to left branches, 1 to right branches.
- Each encoding is a path from the root:
  A = 0, B = 100, C = 1010, D = 1011, R = 11
- Each path terminates at a leaf.
- Do you see why encoded strings are decodable?
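To make the construction steps above concrete, here is a minimal Java sketch of the same procedure (my own illustration, not code from the lecture): put every letter in a priority queue keyed by frequency, repeatedly merge the two lowest-frequency nodes into a parent whose frequency is their sum, and then read each code off as the path from the root (0 for a left branch, 1 for a right branch). The Node class and method names are assumptions of this sketch; because ties (such as B, C+D, and R all having value 20) can be broken in any order, the exact codes it prints may differ from the slides', although the total encoded length is still optimal.

    import java.util.Comparator;
    import java.util.Map;
    import java.util.PriorityQueue;
    import java.util.TreeMap;

    public class HuffmanBuildDemo {

        // A tree node: leaves carry a symbol, internal nodes carry only a combined frequency.
        static class Node {
            final Character symbol;   // null for internal nodes
            final int freq;
            final Node left, right;

            Node(Character symbol, int freq, Node left, Node right) {
                this.symbol = symbol;
                this.freq = freq;
                this.left = left;
                this.right = right;
            }

            boolean isLeaf() {
                return symbol != null;
            }
        }

        // Repeatedly merge the two lowest-frequency nodes; the parent's frequency is their sum.
        static Node buildTree(Map<Character, Integer> freqs) {
            PriorityQueue<Node> queue = new PriorityQueue<>(Comparator.comparingInt((Node n) -> n.freq));
            freqs.forEach((symbol, freq) -> queue.add(new Node(symbol, freq, null, null)));
            while (queue.size() > 1) {
                Node a = queue.poll();   // smallest remaining frequency
                Node b = queue.poll();   // next smallest
                queue.add(new Node(null, a.freq + b.freq, a, b));
            }
            return queue.poll();
        }

        // Each code is the path from the root to a leaf: 0 for a left branch, 1 for a right branch.
        static void assignCodes(Node node, String path, Map<Character, String> codes) {
            if (node.isLeaf()) {
                codes.put(node.symbol, path);
            } else {
                assignCodes(node.left, path + "0", codes);
                assignCodes(node.right, path + "1", codes);
            }
        }

        public static void main(String[] args) {
            // The relative frequencies from the worked example.
            Map<Character, Integer> freqs = Map.of('A', 40, 'B', 20, 'C', 10, 'D', 10, 'R', 20);
            Map<Character, String> codes = new TreeMap<>();
            assignCodes(buildTree(freqs), "", codes);
            codes.forEach((letter, code) -> System.out.println(letter + " = " + code));
        }
    }

Running this prints one code per letter; depending on how the priority queue happens to order equal frequencies, the output may or may not match A = 0, B = 100, C = 1010, D = 1011, R = 11 exactly.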
Unique prefix property
- A = 0, B = 100, C = 1010, D = 1011, R = 11
- No bit string is a prefix of any other bit string.
- For example, if we added E = 01, then A (0) would be a prefix of E.
- Similarly, if we added F = 10, then it would be a prefix of three other encodings (B = 100, C = 1010, and D = 1011).
- The unique prefix property holds because, in a binary tree, a leaf is not on a path to any other node.
  (A decoding sketch that relies on this property appears at the end of these notes.)

Practical considerations
- It is not practical to create a Huffman encoding for a single short string, such as ABRACADABRA.
  - To decode it, you would need the code table.
  - If you include the code table with the message, the whole thing is bigger than just the ASCII message.
- Huffman encoding is practical if:
  - the encoded string is large relative to the code table, OR
  - we agree on the code table beforehand.
- For example, it's easy to find a table of letter frequencies for English (or any other alphabet-based language).

About the example
- My example gave a nice, good-looking binary tree, with no lines crossing other lines.
- That's because I chose my example and numbers carefully.
- If you do this for real data, you can expect your drawing to be a lot messier; that's OK.

Data compression
- Huffman encoding is a simple example of data compression: representing data in fewer bits than it would otherwise need.
- A more sophisticated method is GIF (Graphics Interchange Format) compression, for .gif files.
- Another is JPEG (Joint Photographic Experts Group) compression, for .jpg files.
- Unlike the others, JPEG is lossy: it loses information.
- That is generally OK for photographs (if you don't compress them too much), because decompression adds "fake" data very similar to the original.

The End
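As a closing illustration of why the unique prefix property makes decoding unambiguous, here is a small Java sketch (again my own, using the code table from the ABRACADABRA example) that reads the encoded bit string left to right and emits a letter whenever the bits seen so far form a complete code. This greedy matching is safe precisely because no code is a prefix of any other.

    import java.util.Map;

    public class HuffmanDecodeDemo {
        public static void main(String[] args) {
            // The code table from the example, inverted for lookup.
            // Greedy matching works only because no code is a prefix of any other.
            Map<String, Character> codeToLetter = Map.of(
                    "0", 'A', "100", 'B', "1010", 'C', "1011", 'D', "11", 'R');
            String bits = "01001101010010110100110";   // encoded ABRACADABRA from the slides

            StringBuilder decoded = new StringBuilder();
            StringBuilder buffer = new StringBuilder();  // bits read since the last complete code
            for (char bit : bits.toCharArray()) {
                buffer.append(bit);
                Character letter = codeToLetter.get(buffer.toString());
                if (letter != null) {                    // buffer holds a complete code: emit and reset
                    decoded.append(letter);
                    buffer.setLength(0);
                }
            }
            System.out.println(decoded);                 // prints ABRACADABRA
        }
    }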

