UMD CMSC 132 - Compression & Huffman Codes

This preview shows page 1-2-14-15-29-30 out of 30 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 30 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Compression & Huffman Codes
Fawzi Emad and Chau-Wen Tseng
Department of Computer Science, University of Maryland, College Park

Compression

Definition
- Reduce the size of data (the number of bits needed to represent it)

Benefits
- Less storage needed
- Lower transmission cost / latency / bandwidth

Compression Examples

Tools
- winzip, pkzip, compress, gzip

Formats
- Images: .jpg, .gif
- Audio: .mp3, .wav
- Video: MPEG-1 (VCD), MPEG-2 (DVD), MPEG-4 (DivX)
- General: .zip, .gz

Sources of Compressibility

Redundancy
- Recognize repeating patterns
- Exploit them using a dictionary or variable-length encoding

Human perception
- We are less sensitive to some information
- Less important data can be discarded

Types of Compression

Lossless
- Preserves all information
- Exploits redundancy in the data
- Applied to general data

Lossy
- May lose some information
- Exploits redundancy & human perception
- Applied to audio, images, video

Effectiveness of Compression

Metrics
- Bits per byte (8 bits)
  - 2 bits / byte => 1/4 of the original size
  - 8 bits / byte => no compression
- Percentage
  - 75% compression => 1/4 of the original size

Depends on the data
- Random data is hard. Example: 1001110100 => ?
- Organized data is easy. Example: 1111111111 => 1x10 (ten 1s, run-length encoded)
- Corollary: there is no universally best compression algorithm

Compression is not guaranteed
- Pigeonhole principle: shrinking every input by 1 bit means only half of the inputs can be represented
  - Example: 000, 001, 010, 011, 100, 101, 110, 111 (8 values) => 00, 01, 10, 11 (only 4 values)
- Alternative view: if compression were always possible, then
  1. Compress the file (reduce its size by 1 bit)
  2. Recompress the output
  3. Repeat, until any data could be stored in 0 bits, which is absurd

Lossless Compression Techniques
- LZW (Lempel-Ziv-Welch): build a pattern dictionary and replace patterns with an index into the dictionary
- Burrows-Wheeler transform: block-sort the data to improve compression
- Run-length encoding: find & compress repetitive sequences
- Huffman code: use variable-length codes based on symbol frequency

Huffman Code

Approach
- Variable-length encoding of symbols
- Exploits the statistical frequency of symbols
- Efficient when symbol probabilities vary widely

Principle
- Use fewer bits to represent frequent symbols
- Use more bits to represent infrequent symbols

Huffman Code Example

Symbol              Dog      Cat      Bird     Fish
Frequency           1/8      1/4      1/2      1/8
Original encoding   00       01       10       11
                    2 bits   2 bits   2 bits   2 bits
Huffman encoding    110      10       0        111
                    3 bits   2 bits   1 bit    3 bits

Expected size (a quick arithmetic check appears at the end of these notes)
- Original: 1/8 x 2 + 1/4 x 2 + 1/2 x 2 + 1/8 x 2 = 2 bits / symbol
- Huffman:  1/8 x 3 + 1/4 x 2 + 1/2 x 1 + 1/8 x 3 = 1.75 bits / symbol

Huffman Code Data Structures
- Binary (Huffman) tree
  - Represents the Huffman code
  - Edge => code bit (0 or 1)
  - Leaf => symbol
  - Path from root to leaf => encoding
  - Example: A = "11", H = "10", C = "0"
- Priority queue
  - Used to build the binary tree efficiently

Huffman Code Algorithm Overview

Encoding
1. Calculate the frequency of each symbol in the file
2. Create a binary tree representing the "best" encoding
3. Use the binary tree to encode the compressed file
   - For each symbol, output the path from the root to its leaf
   - Size of the encoding = length of the path
4. Save the binary tree

Huffman Code - Creating the Tree

Algorithm (a Java sketch follows this list)
1. Place each symbol in a leaf; the weight of the leaf is the symbol's frequency
2. Select two trees L and R (initially leaves) such that L and R have the lowest frequencies
3. Create a new internal node with left child L, right child R, and frequency = frequency(L) + frequency(R)
4. Repeat until all nodes are merged into one tree
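This construction maps naturally onto Java's PriorityQueue. The sketch below is illustrative rather than the course's reference implementation: the names HuffmanNode and buildTree are mine, and the frequencies match the construction example that follows (A=3, C=5, E=8, H=2, I=7).

    import java.util.PriorityQueue;

    // Minimal sketch of Huffman tree construction (names are illustrative).
    public class HuffmanBuilder {

        // A tree node: leaves carry a symbol, internal nodes only a weight.
        static class HuffmanNode implements Comparable<HuffmanNode> {
            final char symbol;              // meaningful only for leaves
            final int weight;               // frequency, or sum of children
            final HuffmanNode left, right;

            HuffmanNode(char symbol, int weight) {  // leaf constructor
                this(symbol, weight, null, null);
            }
            HuffmanNode(char symbol, int weight, HuffmanNode left, HuffmanNode right) {
                this.symbol = symbol;
                this.weight = weight;
                this.left = left;
                this.right = right;
            }
            boolean isLeaf() { return left == null && right == null; }

            // The priority queue orders nodes by weight, lowest first.
            public int compareTo(HuffmanNode other) {
                return Integer.compare(this.weight, other.weight);
            }
        }

        // Steps 1-4 of the algorithm above.
        static HuffmanNode buildTree(char[] symbols, int[] weights) {
            PriorityQueue<HuffmanNode> pq = new PriorityQueue<>();
            for (int i = 0; i < symbols.length; i++) {  // step 1: one leaf per symbol
                pq.add(new HuffmanNode(symbols[i], weights[i]));
            }
            while (pq.size() > 1) {                     // step 4: until one tree remains
                HuffmanNode l = pq.poll();              // step 2: two lowest-weight trees
                HuffmanNode r = pq.poll();
                pq.add(new HuffmanNode('\0', l.weight + r.weight, l, r)); // step 3: merge
            }
            return pq.poll();                           // the root
        }

        public static void main(String[] args) {
            // Frequencies from the construction example below.
            HuffmanNode root = buildTree(new char[] {'A', 'C', 'E', 'H', 'I'},
                                         new int[]  {3, 5, 8, 2, 7});
            System.out.println("total weight = " + root.weight); // prints 25
        }
    }

Note that when two trees tie on weight, as C and the merged H+A subtree do in this example, the queue may return either one first; this is exactly the nondeterminism discussed under Huffman Code Properties below.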
Huffman Tree Construction 1-5

These slides build the tree for the frequencies A=3, C=5, E=8, H=2, I=7, one merge per slide:

1. Start with one leaf per symbol: A=3, C=5, E=8, H=2, I=7
2. Merge the two lowest-weight trees, H (2) and A (3), into a node of weight 5
3. Merge that node (5) with C (5) into a node of weight 10
4. Merge I (7) and E (8) into a node of weight 15
5. Merge the last two trees (10 and 15) into the root, of weight 25

With the edges labeled 0 and 1 as drawn in the slides, the resulting codes are:
E = 01, I = 00, C = 10, A = 111, H = 110

Huffman Coding Example

Using the code above:
- Input:  ACE
- Output: (111)(10)(01) = 1111001

Huffman Code Algorithm Overview

Decoding
1. Read the compressed file & the binary tree
2. Use the binary tree to decode the file: repeatedly follow the path from the root to a leaf

Huffman Decoding 1-7

These slides trace 1111001 through the same tree, one bit per slide (a code sketch follows the Properties section below):
- Bits 1, 1, 1 lead from the root to leaf A => output A
- Bits 1, 0 lead to leaf C => output C
- Bits 0, 1 lead to leaf E => output E
- Decoded output: ACE

Huffman Code Properties

Prefix code
- No code is a prefix of another code
  - Example: Huffman("dog") = ab together with Huffman("cat") = abc would not be a legal prefix code
- Decoding can stop as soon as a complete code is found
- No end-of-code marker is needed

Nondeterministic
- Multiple Huffman codings are possible for the same input
- Happens when more than two trees share the same minimal weight

Greedy algorithm
- Chooses the best local solution at each step: combine the 2 trees with the lowest frequency
- Still yields the overall best solution: an optimal prefix code for the given statistical frequencies
- Better compression may still be possible, depending on the data, using other approaches (e.g., pattern ...)
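To make the encoding and decoding walkthroughs concrete, here is a minimal companion sketch reusing HuffmanNode and buildTree from the construction code above (it assumes both classes share a package). The 0 = left, 1 = right edge convention is an assumption, since the slides only show labeled diagrams; the exact bit pattern may therefore differ from the slides' 1111001 depending on tie-breaking, but the round trip always recovers the input.

    import java.util.HashMap;
    import java.util.Map;

    // Minimal encode/decode sketch using the tree built by HuffmanBuilder.
    public class HuffmanCodec {

        // Walk the tree once to build the symbol -> bit-string table.
        static void buildTable(HuffmanBuilder.HuffmanNode node, String path,
                               Map<Character, String> table) {
            if (node.isLeaf()) {
                table.put(node.symbol, path);
                return;
            }
            buildTable(node.left, path + "0", table);   // assumed convention: 0 = left
            buildTable(node.right, path + "1", table);  // assumed convention: 1 = right
        }

        // Encoding: for each symbol, output the path from root to leaf.
        static String encode(String text, Map<Character, String> table) {
            StringBuilder bits = new StringBuilder();
            for (char c : text.toCharArray()) {
                bits.append(table.get(c));
            }
            return bits.toString();
        }

        // Decoding: follow edges from the root; emit a symbol at each leaf, restart.
        static String decode(String bits, HuffmanBuilder.HuffmanNode root) {
            StringBuilder out = new StringBuilder();
            HuffmanBuilder.HuffmanNode node = root;
            for (char bit : bits.toCharArray()) {
                node = (bit == '0') ? node.left : node.right;
                if (node.isLeaf()) {         // prefix property: a complete code is
                    out.append(node.symbol); // recognized immediately, no marker needed
                    node = root;
                }
            }
            return out.toString();
        }

        public static void main(String[] args) {
            HuffmanBuilder.HuffmanNode root = HuffmanBuilder.buildTree(
                    new char[] {'A', 'C', 'E', 'H', 'I'}, new int[] {3, 5, 8, 2, 7});
            Map<Character, String> table = new HashMap<>();
            buildTable(root, "", table);
            String bits = encode("ACE", table);
            System.out.println(bits + " -> " + decode(bits, root)); // round-trips to ACE
        }
    }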

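Finally, a quick check of the expected-size arithmetic from the Huffman Code Example slide; the frequencies and code lengths below are taken directly from that table.

    // Quick check of the expected-size arithmetic (Dog, Cat, Bird, Fish).
    public class ExpectedLength {
        public static void main(String[] args) {
            double[] freq = {1.0 / 8, 1.0 / 4, 1.0 / 2, 1.0 / 8};
            int[] fixedLen = {2, 2, 2, 2};   // original fixed-length encoding
            int[] huffLen  = {3, 2, 1, 3};   // Huffman encoding
            double fixed = 0, huff = 0;
            for (int i = 0; i < freq.length; i++) {
                fixed += freq[i] * fixedLen[i];
                huff  += freq[i] * huffLen[i];
            }
            System.out.println("original: " + fixed + " bits/symbol"); // 2.0
            System.out.println("huffman:  " + huff + " bits/symbol");  // 1.75
        }
    }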
