DOC PREVIEW
U-M EECS 281 - Lecture Note

This preview shows page 1-2-19-20 out of 20 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

OutlinePA1 Timing Analysis Report Due NowLast time:• Trie• RLE and Huffman encodingToday:• Dictionary-based compression• LZ77 and LZSS• LZ78 and LZWSugih Jamin ([email protected])Dictionary-based CompressionHuffman encoding represents frequently seen symbols with small codesOften sequences of symbols are frequently seenExample: in English text, “the”, “and”, “ion”, “ing”, “of the”(and “like”, “so”, “right”) are seen more frequently than other sequencesIdea: assign codes to entire sequencesAdvantages over Huffman:• better compression• universal code: encode a piece of text “on-the-fly,”in one-pass, “streaming” mode• no need to store code table (dictionary) in compressed fileSugih Jamin ([email protected])Dictionary-based TaxonomyAll of dictionary-based methods can be divided into two:• if a sequence has been previously encountered in the input data,output a pointer to the earlier occurrence; the dictionary is implicitlyrepresented by previously processed and output data, example: LZ77Input Stream:Output Stream:P• enter phrases into the dictionary; when the same phrase isencountered, output its index into the dictionary, example: LZWInput Stream:Output Stream:Dictionary:1 41234156Sugih Jamin ([email protected])Lempel-Ziv 77 CompressionLZ77:• find duplicated strings in input data• replace latter occurrence of a string bya pointer to the previous occurrencePointer is encoded as (P, L), where• P is the distance between the two occurrences of the string• L is the length of the matchExample: Blah blah blah blah blah!Output: Blah b(5,18)!Sugih Jamin ([email protected])LZ77Algorithm:• uses two variables: prefix, and lookahead buffer of samefixed-size• prefix initially empty, buffer contains input block• while (buffer not empty) {find longest match in prefix+buffer of bufferif (match) {output (P, L)move prefix and buffer forward L chars} else {output (0, 0)}output the char at head of buffermove prefix and buffer forward 1 char}Sugih Jamin ([email protected])LZ77 ExampleString: aaabbcabstep pos prefix[8] buffer[8] match output forward1 1 – aaabbcab – (0,0)a 12 2 a aabbcab aa (1,2)b 33 5 aaab bcab b (1,1)c 24 7 aaabbc ab ab (4,2)\0 35 10 aaabbcab – – – –Output: 00a12b11c42 (56 bits)(P, L) is represented with 6 bits, so each (P, L)char is 14 bitsSugih Jamin ([email protected])LZSSLZ-Storer-Szymanski observations:• null pointer redundant• actually, if L < 2, use of pointer redundant• the mismatch after a pointer could be a matchLZSS algorithm:while (buffer not empty) {find longest match in prefix+buffer of bufferif (match and L >= MINLENGTH(2)) {output (P, L)move prefix and buffer forward L chars} else {output the char at head of buffermove prefix and buffer forward 1 char}}Sugih Jamin ([email protected])LZSS Mode ExampleString: aaabbcabstep pos prefix[8] buffer[8] match output forward1 1 – aaabbcab – a 12 4 a aabbcab aa (1,2) 23 5 aaa bbcab – b 14 6 aaab bcab b b 15 7 aaabb cab – c 17 8 aaabbc ab ab (4,2) 28 10 aaabbcab – – – –Output: a12bbc42 (54 bits)Assume each char is represented as 9 bits:1 bit to differentiate between char and pointerSugih Jamin ([email protected])LZSS ExampleString: Blah blah blah blah blah!matched pos pos prefix[8] buffer[8] output/count– 1 – Blah bla B– 2 B lah blah l– 3 Bl ah blah a– 4 Bla h blah b h– 5 Blah blah bl– 6 Blah blah bla b2 7 Blah b lah blah (5,13 8 Blah bl ah blah 24 9 Blah bla h blah b 35 10 lah blah blah bl 46 11 ah blah blah bla 57 12 h blah b lah blah 6. . . . . . . . . . . . . . .19 24 blah bla h! (5, 18)– 25 lah blah ! !– 26 lah blah! – –Output: Blah b518! (requires 5 bits for L)Sugih Jamin ([email protected])LZ77/SS Wrap UpImplementation issue: how big should the prefix+buffer window be?Implications?LZSS is what people usually mean when they say LZ77gzip: LZ77/SS output is further compressed using HuffmanAlso used in PKZip, ARJ, LHArc, Zoo, etc.Sugih Jamin ([email protected])Dictionary-based TaxonomyAll of dictionary-based methods can be divided into two:• if a sequence has been previously encountered in the input data,output a pointer to the earlier occurrence; the dictionary is implicitlyrepresented by previously processed and output data, example: LZ77Input Stream:Output Stream:P• enter phrases into the dictionary; when the same phrase isencountered, output its index into the dictionary, example: LZWInput Stream:Output Stream:Dictionary:1 41234156Sugih Jamin ([email protected])LZ78Idea: enter sequences of symbols into a dictionary;when a sequence is seen again, output its index in the dictionaryAlgorithm:1. define a phrase recursively as a previously seen phrase (prefix) plusan additional character (such that the concatenation has not beenseen before)2. enter each new phrase into a dictionary maintained as a trie3. reset prefix after entering a new phrase4. nodes on the trie are labeled in order of addition (phrase index),root has index 05. trie ordered by index6. when a phrase is repeated, output the phrase’s indexinstead of the phrase, otherwise output 07. output the last (mismatching) character from Step 1Sugih Jamin ([email protected])LZ78 Encoding ExampleString: how now brown cow in town.ohwn_wc_b_tnri012 3 4 5812 14.67 101391115Encoding: 0h0o0w00n2w4b0r6n4c6 0i5 0t9.Time complexity (n string length, |Σ| alphabet size): O(n|Σ|)Sugih Jamin ([email protected])LZ78 DecodingString: how now brown cow in town.Encoding: 0h0o0w00n2w4b0r6n4c6 0i5 0t9.How to decode the encoded string in O(n),n the size of the encoded string?Sugih Jamin ([email protected])LZW CompressionLZ78 encoding is O(n|Σ|), want an O(n) algorithm, n: input sizeWelch’s adaptation of LZ78:• encoding and decoding use an array (dictionary)(similar to LZ78 decoding, each trie node encoded as an array index)• all 8-bit chars assumed already in array, at indices 0-255• each new phrase is assigned the next index• phrase matching doesn’t start at root node, instead use last characterin previous phrase as start of new phrase• finding a phrase in the dictionary is like finding a path in the LZ78 trie,reading in each new char corresponds to going down one level intothe trie, hence no code output until mismatch• upon mismatch, output the index of the matched prefix, and enter thenew phrase into the dictionary• use hashing to match phrases in O(1) time ⇒ total time O(n)Sugih Jamin


View Full Document

U-M EECS 281 - Lecture Note

Download Lecture Note
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Note and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Note 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?