DOC PREVIEW
Duke CPS 100E - Text Compression

This preview shows page 1-2-3-4-5 out of 15 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CompSci 100E34.1Text Compressionÿ Input: String Sÿ Output: String S′′′′þ Shorterþ S can be reconstructed from S′′′′CompSci 100E34.2Text Compressionÿ …He said to his friend, "If the British marchBy land or sea from the town to-night,Hang a lantern aloft in the belfry archOf the North Church tower as a signal light,--One if by land, and two if by sea;And I on the opposite shore will be,Ready to ride and spread the alarmThrough every Middlesex village and farm,For the country folk to be up and to arm."…Henry Wadsworth Longfellow (1807-1882)By land 1By sea 2CompSci 100E34.3Text Compression: Examples00101101100100d0101001100011c1010001100101e1100101100010b00000001100001aVar.lengthFixedlengthASCIISymbol“abcde” in the different formatsASCII:01100001011000100110001101100100…Fixed:000001010011100Var:00011010011000000001111a b c d eadbc e00001111EncodingsASCII: 8 bits/characterUnicode: 16 bits/characterCompSci 100E34.4Huffman coding:go go gophersÿ Encoding uses tree:þ 0 left/1 rightþ How many bits? 37!!þ Savings? Worth it?ASCII(7bits) 3 bits Huffmang 1031100111000 00o 1111101111001 01p 1121110000010 1100h 1041101000011 1101e 1011100101100 1110r 1141110010101 1111s 1151110011110 101sp. 321000000111 1013s1*22p1h12e1r14g3o3632p1h12e1r14s1*27g3o3613CompSci 100E34.5Huffman Codingÿ D.A Huffman in early 1950’sÿ Before compressing data, analyze the input streamÿ Represent data using variable length codesÿ Variable length codes though Prefix codesþ Each letter is assigned a codewordþ Codeword is for a given letter is produced by traversingthe Huffman treeþ Property: No codeword produced is the prefix of anotherþ Letters appearing frequently have short codewords, whilethose that appear rarely have longer onesÿ Huffman coding is optimal per-character codingmethodCompSci 100E34.6Building a Huffman treeÿ Begin with a forest of single-node trees (leaves)þ Each node/tree/leaf is weighted with character countþ Node stores two values: character and countþ There are n nodes in forest, n is size of alphabet?ÿ Repeat until there is only one node left: root of treeþ Remove two minimally weighted trees from forestþ Create new tree with minimal trees as children,o New tree root's weight: sum of children (character ignored)ÿ Does this process terminate? How do we get minimal trees?þ Remove minimal trees, hmmm……CompSci 100E34.7Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1C1F1P2U2R2L2D2G3T3O3B3A4M4SCompSci 100E34.8Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1C1F1P2U2R2L2D2G3T3O3B3A4M4S22CompSci 100E34.9Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1F1C1P2U2R2L2D2G3T3O3B3A4M4S2233CompSci 100E34.10Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1F1C1P2U2R2L2D2G3T3O3B3A4M4S223344CompSci 100E34.11Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1F1C1P2U2R2L2D2G3T3O3B3A4M4S22334444CompSci 100E34.12Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1F1C1P2U2R2L2D2G3O3T3B3A4M4S233444455CompSci 100E34.13Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 344445566CompSci 100E34.14Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 34444556666CompSci 100E34.15Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 3444455666886CompSci 100E34.16Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 3445566688688CompSci 100E34.17Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 34455666886881010CompSci 100E34.18Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 3445668868810101111CompSci 100E34.19Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”116I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 3445688688101011111212CompSci 100E34.20Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”116I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 344568681610101111121216CompSci 100E34.21Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”116I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 344568681610211112121621CompSci 100E34.22Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”116I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 34456868161021111223162123CompSci 100E34.23Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”116I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 34456868161021111223372337CompSci 100E34.24Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”116I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 34456868161021111223376060CompSci 100E34.25Encoding1. Count occurrence of all occurring character O( )2. Build priority queue O()3. Build Huffman tree O( )4. Create Table of codes from tree O( )5. Write Huffman tree and coded data to file O( )NAA log AA log ANCompSci 100E34.26Properties of Huffman codingÿ Want to minimize weighted path length L(T)of tree Tÿþ wiis the weight or count of each codeword iþ diis the leaf corresponding to codeword iÿ How do we calculate character (codeword)frequencies?ÿ Huffman coding creates pretty full bushy trees?þ When would it produce a “bad” tree?ÿ How do we produce coded compressed data frominput efficiently?ÿ∈=)()(TLeafiiiwdTLCompSci 100E34.27Writing code out to fileÿ How do we go from characters to encodings?þ Build Huffman treeþ Root-to-leaf path generates encodingÿ Need way of writing bits out to fileþ Platform dependent?þ Complicated to write bits and read in same orderingÿ See BitInputStream and BitOutputStream classesþ Depend on each other, bit ordering preservedÿ How do we know bits come from compressed file?þ Store a magic numberCompSci 100E34.28Decoding a message116I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 34456868161021111223376001100000100001001101CompSci 100E34.29Decoding a message116I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 3445686816102111122337601100000100001001101CompSci 100E34.30Decoding a message116I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2


View Full Document

Duke CPS 100E - Text Compression

Documents in this Course
Topics

Topics

9 pages

Lecture

Lecture

3 pages

Notes

Notes

2 pages

Hashing

Hashing

19 pages

Lecture

Lecture

59 pages

Lecture

Lecture

6 pages

Lecture

Lecture

4 pages

Lecture

Lecture

20 pages

Lecture

Lecture

12 pages

Lecture

Lecture

12 pages

Lecture

Lecture

7 pages

Lecture

Lecture

8 pages

Lecture

Lecture

10 pages

Lecture

Lecture

4 pages

Notes

Notes

16 pages

Lecture

Lecture

5 pages

Lecture

Lecture

9 pages

Lecture

Lecture

4 pages

Lecture

Lecture

13 pages

Lecture

Lecture

6 pages

Lecture

Lecture

16 pages

Lecture

Lecture

5 pages

Lecture

Lecture

5 pages

Lecture

Lecture

12 pages

Lecture

Lecture

12 pages

Lecture

Lecture

10 pages

Sets

Sets

14 pages

Lecture

Lecture

9 pages

Lecture

Lecture

4 pages

Test 1

Test 1

7 pages

Load more
Download Text Compression
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Text Compression and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Text Compression 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?