DOC PREVIEW
Duke CPS 100E - Text Compression

This preview shows page 1-2-3-4-27-28-29-30-56-57-58-59 out of 59 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 59 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 59 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 59 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 59 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 59 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 59 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 59 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 59 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 59 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 59 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 59 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 59 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 59 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Text CompressionSlide 2Text Compression: ExamplesHuffman coding: go go gophersHuffman CodingBuilding a Huffman treeBuilding a treeSlide 8Slide 9Slide 10Slide 11Slide 12Slide 13Slide 14Slide 15Slide 16Slide 17Slide 18Slide 19Slide 20Slide 21Slide 22Slide 23Slide 24EncodingProperties of Huffman codingWriting code out to fileDecoding a messageSlide 29Slide 30Slide 31Slide 32Slide 33Slide 34Slide 35Slide 36Slide 37Slide 38Slide 39Slide 40Slide 41Slide 42Slide 43Slide 44Slide 45Slide 46Slide 47Slide 48DecodingHuffman Tree 2Slide 51Slide 52Slide 53Slide 54Slide 55Slide 56Slide 57Slide 58Other methodsCompSci 100E34.1Text CompressionInput: String SOutput: String SShorterS can be reconstructed from SCompSci 100E34.2Text Compression… He said to his friend, "If the British marchBy land or sea from the town to-night,Hang a lantern aloft in the belfry archOf the North Church tower as a signal light,--One if by land, and two if by sea;And I on the opposite shore will be,Ready to ride and spread the alarmThrough every Middlesex village and farm,For the country folk to be up and to arm." … Henry Wadsworth Longfellow (1807-1882)By land 1By sea 2CompSci 100E34.3Text Compression: ExamplesSymbolASCII FixedlengthVar.lengtha 01100001000 000b 01100010001 11c 01100011010 01d 01100100011 001e 01100101100 10“abcde” in the different formatsASCII: 01100001011000100110001101100100…Fixed: 000001010011100 Var: 000110100110000000011 11a b c d eadbc e0000 1111EncodingsASCII: 8 bits/characterUnicode: 16 bits/characterCompSci 100E34.4Huffman coding: go go gophersEncoding uses tree:0 left/1 rightHow many bits? 37!!Savings? Worth it? ASCII(7bits) 3 bits Huffmang 103 1100111 000 00o 111 1101111 001 01p 112 1110000 010 1100h 104 1101000 011 1101e 101 1100101 100 1110r 114 1110010 101 1111s 115 1110011 110 101sp. 32 1000000 111 1013s1*22p1h12e1r14g3o3632p1h12e1r14s1*27g3o3613CompSci 100E34.5Huffman CodingD.A Huffman in early 1950’sBefore compressing data, analyze the input streamRepresent data using variable length codesVariable length codes though Prefix codesEach letter is assigned a codewordCodeword is for a given letter is produced by traversing the Huffman treeProperty: No codeword produced is the prefix of anotherLetters appearing frequently have short codewords, while those that appear rarely have longer onesHuffman coding is optimal per-character coding methodCompSci 100E34.6Building a Huffman treeBegin with a forest of single-node trees (leaves)Each node/tree/leaf is weighted with character countNode stores two values: character and countThere are n nodes in forest, n is size of alphabet?Repeat until there is only one node left: root of treeRemove two minimally weighted trees from forestCreate new tree with minimal trees as children,oNew tree root's weight: sum of children (character ignored)Does this process terminate? How do we get minimal trees?Remove minimal trees, hmmm……CompSci 100E34.7Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1C1F1P2U2R2L2D2G3T3O3B3A4M4SCompSci 100E34.8Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1C1F1P2U2R2L2D2G3T3O3B3A4M4S22CompSci 100E34.9Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1F1C1P2U2R2L2D2G3T3O3B3A4M4S2233CompSci 100E34.10Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1F1C1P2U2R2L2D2G3T3O3B3A4M4S223344CompSci 100E34.11Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1F1C1P2U2R2L2D2G3T3O3B3A4M4S22334444CompSci 100E34.12Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1F1C1P2U2R2L2D2G3O3T3B3A4M4S233444455CompSci 100E34.13Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 344445566CompSci 100E34.14Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 34444556666CompSci 100E34.15Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 3444455666886CompSci 100E34.16Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5E5N1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 3445566688688CompSci 100E34.17Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 34455666886881010CompSci 100E34.18Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”11 6I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 3445668868810101111CompSci 100E34.19Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”116I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 3445688688101011111212CompSci 100E34.20Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”116I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 344568681610101111121216CompSci 100E34.21Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”116I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 344568681610211112121621CompSci 100E34.22Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”116I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 34456868161021111223162123CompSci 100E34.23Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”116I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 34456868161021111223372337CompSci 100E34.24Building a tree“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”116I5N5E1F1C1P2U2R2L2D2G3O3T3B3A4M4S2 34456868161021111223376060CompSci 100E34.25Encoding1. Count occurrence of all occurring character O( )2. Build priority queue O( )3. Build Huffman tree O( )4. Create Table of codes from tree O( )5. Write Huffman tree and coded data to file O( )NAA log AA log ANCompSci 100E34.26Properties of Huffman codingWant to minimize weighted path length L(T)of tree Twi is the weight or count of each codeword idi is the leaf corresponding to codeword iHow do we calculate character (codeword) frequencies?Huffman coding creates pretty full bushy trees?When would it produce a “bad” tree?How do we produce coded compressed data from input


View Full Document

Duke CPS 100E - Text Compression

Documents in this Course
Topics

Topics

9 pages

Lecture

Lecture

3 pages

Notes

Notes

2 pages

Hashing

Hashing

19 pages

Lecture

Lecture

59 pages

Lecture

Lecture

6 pages

Lecture

Lecture

4 pages

Lecture

Lecture

20 pages

Lecture

Lecture

12 pages

Lecture

Lecture

12 pages

Lecture

Lecture

7 pages

Lecture

Lecture

8 pages

Lecture

Lecture

10 pages

Lecture

Lecture

4 pages

Notes

Notes

16 pages

Lecture

Lecture

5 pages

Lecture

Lecture

9 pages

Lecture

Lecture

4 pages

Lecture

Lecture

13 pages

Lecture

Lecture

6 pages

Lecture

Lecture

16 pages

Lecture

Lecture

5 pages

Lecture

Lecture

5 pages

Lecture

Lecture

12 pages

Lecture

Lecture

12 pages

Lecture

Lecture

10 pages

Sets

Sets

14 pages

Lecture

Lecture

9 pages

Lecture

Lecture

4 pages

Test 1

Test 1

7 pages

Load more
Download Text Compression
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Text Compression and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Text Compression 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?