DOC PREVIEW
SJSU CS 146 - Data Compressor

This preview shows page 1-2-3-21-22-23-42-43-44 out of 44 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 44 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Data Compressor---Huffman Encoding and DecodingHuffman EncodingPowerPoint PresentationSlide 4Slide 5Slide 6Slide 7Huffman's AlgorithmSlide 9Slide 10Slide 11Slide 12Slide 13Binary Tree RepresentationSlide 15Slide 16Prefix CodeSlide 18Slide 19Slide 20Slide 21Slide 22Slide 23Slide 24Slide 25Slide 26Slide 27Slide 28Slide 29Slide 30Slide 31Huffman Encoding - OperationSlide 33Slide 34Slide 35Slide 36Slide 37Slide 38Slide 39Slide 40Slide 41Huffman Encoding - DecodingSlide 43Huffman Encoding - Time ComplexityData Compressor---Huffman Encoding and DecodingHuffman Encoding•Compression•Typically, in files and messages,•Each character requires 1 byte or 8 bits•Already wasting 1 bit for most purposes!•Question•What’s the smallest number of bits that can be used to store an arbitrary piece of text?•Idea•Find the frequency of occurrence of each character•Encode Frequent characters short bit strings• Rarer characters longer bit stringsHuffman's Algorithm•1952•Repeatedly merges trees - maintains a forest•Tree weight - the sum of its leaves frequencies•For C characters to code, start with C single node trees•Select two trees, T1 and T2, of smallest weights and merge them•C - 1 merge operationsHuffman Encoding•Encoding•Use a tree•Encode by followingtree to leaf•eg •E is 00•S is 011•Frequent charactersE, T 2 bit encodings•Others A, S, N, O 3 bit encodingsHuffman Encoding•Encoding•Use a tree•Inefficient in practice•Use a direct-addressed lookuptable ?Finding the optimal encoding•Smallest number of bits torepresent arbitrary textA 010E 00 B::N:ST11000110•A divide-and-conquer approach might have us asking which characters should appear in the left and right subtrees and trying to build the tree from the top down. •A greedy approach places our n characters in n sub-trees and starts by combining the two least weight nodes into a tree which is assigned the sum of the two leaf node weights as the weight for its root node.Huffman Encoding•Divide and conquer•Decide on a root - n choices•Decide on roots for sub-trees - n choices•Repeat n timesO(n!)•Greedy Approach•Sort characters by frequency•Form two lowest weight nodes into a sub-tree•Sub-tree weight = sum of weights of nodes•Move new tree to correct placeStandard Coding SchemeBinary Tree Representation•For the character set of C characters, the standard fixed-length coding needs ┌log C┐ bits•Fixed-length code can be represented by a binary tree where characters are stored only in leaf nodes - binary trie•Each character path - start at the root, follow the branches, record 0 for the left branch and 1 for the right branch•Optimal code is always a full tree - all nodes are either leaves or have two childrenRepresentation by a Binary TrieImproved Binary TriePrefix Code•The fixed-length character code that has characters places only at the leaves guarantees that any bit sequence can be decoded unambiguously•Prefix code - characters may have varying lengths as long as no character code is a prefix of another code•That means that characters can be only in leafsOptimal Prefix Code TreeOptimal Prefix Code CostHuffman’s Algorithm Example - IHuffman’s Algorithm Example - IIHuffman’s Algorithm Example - IIIHuffman’s Algorithm Example - IVHuffman’s Algorithm Example - VHuffman’s Algorithm Example - VIHuffman’s Algorithm Example-VIIHuffman Encoding - OperationInitial sequenceSorted by frequencyCombine lowest twointo sub-treeMove it to correctplaceAfter shifting sub-treeto its correct place ...Huffman Encoding - OperationCombine next lowestpairMove sub-tree to correct placeMove the new tree to the correct place ...Huffman Encoding - OperationNow the lowest two are the“14” sub-tree and DCombine and move to correct placeMove the new tree to the correct place ...Huffman Encoding - OperationNow the lowest two are thethe “25” and “30” treesCombine and move to correct placeHuffman Encoding - OperationCombine last two trees•How do we decode a Huffman-encoded bit string? With these variable length strings, it's not possible to break up an encoded string of bits into characters!" •The decoding procedure is deceptively simple. Starting with the first bit in the stream, one then uses successive bits from the stream to determine whether to go left or right in the decoding tree. When we reach a leaf of the tree, we've decoded a character, so we place that character onto the (uncompressed) output stream. The next bit in the input stream is the first bit of the next character.Huffman Encoding - DecodingHuffman Encoding - Time Complexity•Sort keys O(n log n)•Repeat n times•Form new sub-tree O(1)•Move sub-tree O(logn)(binary search)•Total O(n log n) •Overall O(n log


View Full Document

SJSU CS 146 - Data Compressor

Download Data Compressor
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Data Compressor and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Data Compressor 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?