SJSU CS 146 - Data Compression

Data Compression
Gabriel Laden
CS146 – Dr. Sin-Min Lee
Spring 2004

Outline: What is Data Compression? · Examples in computers · What is a Greedy Algorithm? · Examples of Greedy · Problem with Greedy · David Huffman · ASCII Code · Intro to Huffman Algorithm · Huffman Algorithm (English) · Huffman Algorithm (Technical) · Ambiguity in using a code? · Worked example · Proof: part 1 · Proof: part 2 · Lengths of Encoding Set · Expected Value / character · Main Point · Image Compression · Steps to Image Compression · Image Decomposition · References

What is Data Compression?
•Compression can be lossless or lossy; either way, file size is reduced.
•This saves both time and space (both at a premium).
•Data compression algorithms are more successful when they are based on statistical analysis of the frequencies in the data and of the accuracy needed to represent the data.

Examples in computers
•jpeg is a compressed image file
•mp3 is a compressed audio file
•zip is a compressed archive of files
•There are many encoding algorithms; we will look at Huffman's algorithm (see our textbook, pp. 357–362).

What is a Greedy Algorithm?
•Solve a problem in stages.
•At each stage, make the locally optimal decision.
•The algorithm is good when the local optimum equals the global optimum.

Examples of Greedy
•Dijkstra, Prim, Kruskal
•Bin packing problem
•Huffman code

Problem with Greedy
•A greedy algorithm does not always work for every data set; there can be conflicts.
•What if all characters are equally distributed?
•What if characters are very unequally distributed?
•A problem from our textbook: if there were such a thing as a 12-cent coin and we were asked to make 15 cents in change, the greedy algorithm would produce
  1 (12-cent coin) + 3 (pennies) = 15 cents: four coins, suboptimal
  1 (dime) + 1 (nickel) = 15 cents: two coins, optimal

David Huffman
•Paper published in 1952: "A Method for the Construction of Minimum-Redundancy Codes"
•What we call "data compression" is what he termed "minimum redundancy"

ASCII Code
•128 characters, including punctuation
•log2(128) = 7 bits
•1 byte = 8 bits
•All characters are 8 bits long: "fixed-length encoding"
•"Etaoin Shrdlu": the most common letters in English, in frequency order!

Intro to Huffman Algorithm
•A method of constructing an encoding tree
•Full binary tree representation
•Each edge of the tree has a value (0 for the left child, 1 for the right child)
•Data is at the leaves, not at internal nodes
•Result: an encoding tree, giving a "variable-length encoding"

Huffman Algorithm (English)
•1. Maintain a forest of trees.
•2. The weight of a tree is the sum of the frequencies of its leaves.
•3. For i = 1 to n − 1:
  –Select the two trees of smallest weight
  –Merge them into a new tree

Huffman Algorithm (Technical)
  n ← |C|
  Q ← C
  for i ← 1 to n − 1
      do z ← AllocateNode()
         x ← left[z] ← ExtractMin(Q)
         y ← right[z] ← ExtractMin(Q)
         f[z] ← f[x] + f[y]
         Insert(Q, z)
  return ExtractMin(Q)

Ambiguity in using a code?
•What if you have an encoded string: 000010101101011000110001110
•How do you know where to break it up?
•Prefix coding rule
  –No code is a prefix of another
  –The way the tree is built disallows ambiguity
  –If there is a "00" code, there cannot be a "0" code

Worked example (frequencies: q=10, w=20, e=5, r=15, t=25, y=1, u=16)
Step 0: forest of single-node trees: (q)10 (w)20 (e)5 (r)15 (t)25 (y)1 (u)16
Step 1: merge (y)1 + (e)5 into a tree of weight 6
Step 2: merge that 6 + (q)10 into a tree of weight 16
Step 3: merge (r)15 + (u)16 into a tree of weight 31
Step 4: merge the 16 + (w)20 into a tree of weight 36
Step 5: merge (t)25 + the 31 into a tree of weight 56
Step 6: merge the 36 + the 56 into the root, weight 92

When the tree is used to encode a file, it is written as a header above the body of the encoded bits of text.
•0 is the left edge, 1 is the right edge
•use a stack to traverse the tree

Table:
  y 0000    w 01
  e 0001    t 10
  q 001     r 110
            u 111

Header: 0000y 0001e 001q 01w 10t 110r 111u

Proof: part 1
•Lemma: Let C be an alphabet in which each character c in C has frequency f[c], and let x and y be two characters in C having the lowest frequencies. Then there exists an optimal prefix code for C in which the codes for x and y have the same length and differ only in the last bit.

Proof: part 2
•Lemma: Let T be a full binary tree representing an optimal prefix code over an alphabet C, and let z be the parent of two leaves x and y. Then T′ = T − {x, y} represents an optimal prefix code for C′ = (C − {x, y}) ∪ {z}.

Lengths of Encoding Set
•Balanced tree: 8 leaves, all at depth 3. Length of the set: (8 nodes) × (3 edges) = 24 bits. This is what you would get if the symbols are mostly random and equal in probability.
•Skewed tree: 8 leaves at depths 1 through 7, with two leaves at depth 7. Length of the set: 7+7+6+5+4+3+2+1 = 35 bits. This is what you would get if the symbols vary the most in probability.

Expected Value / character
•In example 1: 8 × (1/2^3 × 3) = 3 bits
•In example 2: 2 × (1/2^7 × 7) + (1/2^6 × 6) + (1/2^5 × 5) + (1/2^4 × 4) + (1/2^3 × 3) + (1/2^2 × 2) + (1/2^1 × 1) ≈ 1.98 bits

Main Point
•Statistical methods work better when the symbols in the data set have varying probabilities.
•Otherwise you need to use a different method of compression (example: jpeg).

Image Compression
•"Lossy" means that details are lost.
•An approximation of the original image is made, in which large areas of similar color are combined into a single block.
•This introduces a certain amount of error; that is the tradeoff.

Steps to Image Compression
•Specify the requested output file size
•Divide the image into several areas
•Divide the file size by the number of areas
•Quantize each area (information is lost here)
•Encode each area separately and write it to the file

Image Decomposition
(figure)

References
•Data Structures & Algorithm Analysis – Mark Allen Weiss
•Introduction to Algorithms – Thomas H.
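The greedy construction described in the slides can be sketched in Python. This is an illustration, not code from the slides: it uses a binary heap for ExtractMin, represents an internal node as a (left, right) pair, and walks the finished tree to read off the codes (0 = left edge, 1 = right edge). With the worked example's frequencies and insertion-order tie-breaking it happens to reproduce the slides' code table; other tie-breaks give different but equally optimal tables.

```python
import heapq

def huffman_codes(freqs):
    """Greedy Huffman construction: repeatedly merge the two
    lowest-weight trees, as in the pseudocode above."""
    # Heap entries are (weight, tie_breaker, tree); a tree is either a
    # symbol (a leaf) or a (left, right) pair (an internal node).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # ExtractMin(Q)
        f2, _, right = heapq.heappop(heap)   # ExtractMin(Q)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    root = heap[0][2]

    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):          # internal node
            walk(tree[0], prefix + "0")      # 0 = left edge
            walk(tree[1], prefix + "1")      # 1 = right edge
        else:                                # data lives at the leaves
            codes[tree] = prefix or "0"      # lone-symbol edge case
    walk(root, "")
    return codes

freqs = {"q": 10, "w": 20, "e": 5, "r": 15, "t": 25, "y": 1, "u": 16}
codes = huffman_codes(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes)
print(total_bits)  # 237: the optimal total for these frequencies
```

The total of 237 bits is the same for every optimal tree on these frequencies, even when tie-breaking changes the shape of the tree.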
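The prefix coding rule is exactly what makes the encoded string on the "Ambiguity" slide decodable without separators. A sketch (again an illustration, with the code table taken from the worked example): scan left to right, accumulating bits until they match a table entry, then emit that symbol and start over.

```python
def huffman_decode(bits, codes):
    """Decode a bitstring with a prefix-free code table. Because no
    code is a prefix of another, a symbol can be emitted the moment
    the accumulated bits match an entry."""
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("bitstring ended in the middle of a code")
    return "".join(out)

# The code table derived in the worked example.
codes = {"y": "0000", "e": "0001", "q": "001", "w": "01",
         "t": "10", "r": "110", "u": "111"}
print(huffman_decode("000010101101011000110001110", codes))  # yttrtrqtqr
```

Under this table, the slide's example string breaks up unambiguously as 0000·10·10·110·10·110·001·10·001·110.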
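The two expected-value computations from the "Expected Value / character" slide can be checked directly; the probabilities below are the ones implied by the two example trees (equal probabilities for the balanced tree, powers of 1/2 for the skewed tree).

```python
# Example 1: a balanced tree, 8 equally likely symbols, every code 3 bits.
probs_equal = [1 / 8] * 8
e1 = sum(p * 3 for p in probs_equal)

# Example 2: the skewed tree, code lengths 1..7 with two leaves at depth 7.
probs_skewed = [1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, 1/128]
lengths = [1, 2, 3, 4, 5, 6, 7, 7]
e2 = sum(p * n for p, n in zip(probs_skewed, lengths))

print(e1)  # 3.0
print(e2)  # 1.984375, i.e. about 1.98 bits, as on the slide
```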
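The "combine large areas of similar color into a single block" idea from the image compression slides can be sketched as block averaging. This is a toy illustration of lossy quantization only; real codecs such as jpeg quantize frequency-domain coefficients, not raw pixels.

```python
def compress_blocks(img, bs):
    """Toy lossy compression: replace every bs-by-bs block of a
    grayscale image (a list of rows of pixel values) with the block's
    average. Flat regions survive exactly; detail inside a block is lost."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r0 in range(0, h, bs):
        for c0 in range(0, w, bs):
            rows = range(r0, min(r0 + bs, h))
            cols = range(c0, min(c0 + bs, w))
            block = [img[r][c] for r in rows for c in cols]
            avg = sum(block) // len(block)  # quantize: information lost here
            for r in rows:
                for c in cols:
                    out[r][c] = avg
    return out

# Four flat 2x2 regions: 2x2 blocks reproduce the image exactly,
# while a single 4x4 block averages everything into one value.
img = [[10, 10, 200, 200],
       [10, 10, 200, 200],
       [12, 12, 198, 198],
       [12, 12, 198, 198]]
print(compress_blocks(img, 2) == img)  # True: no error in flat blocks
print(compress_blocks(img, 4)[0][0])   # 105: heavy averaging, more error
```

Larger blocks mean a smaller encoded file but more error, which is the size/quality tradeoff the slides describe.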

