DOC PREVIEW
MIT 6 375 - Encoding

This preview shows page 1-2-20-21 out of 21 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 21 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

GZIP Encoding6.375 Final ProjectBehram Mistree & Dmitry KashlevGZIP - OutlineGZIP−Lossless compression algorithm−Specified by RFC 1950, 1951, and 1952.−Two parts:LZ77Huffman EncodingInput FileCompressed FileHuffman DecoderLZ77 DecoderCompressed FileOriginal FileLZ77 EncoderHuffman EncoderLZ77 – Basic IdeaLZ77 looks at partial strings within text.−If a particular string occurred within the previous 32 Kb of data, replace it with a pointer to the previous string.−Takes advantage of the repetitive nature of English text.LZ77 - ExampleOriginal Text:This will be encoded.This will be encoded.This will be encoded.This will be encoded.This will be encoded.Encoded Text:This will be encoded.<21, 21><21, 21><21, 21><21, 21>LZ77 - EncoderDecoupled 32Kb of memory−Stores last 32Kb of text fileMemory manager−Writes to correct position of memory−Checks memory against input dataWrapper−Receives value from GET/PUT interface−Writes out either single character or encoded distance, length pairLZ77 - DecoderDeals with two cases:−Case 1: Receives a characterPipes character directly to output.−Case 2: Receives a length-distance pairPerform a memory lookup and write out string of characters to output buffer.LZ77 - Clock Time and AreaEncoder*:−4051.00 µm2−3.93 ns critical pathDecoder*:−2015.25 µm2−3.93 ns critical path* We could not get Encounter to synthesize memories correctly, so these values do not include a 32K long, 8 byte SRAM memory.LZ77 - Initial Results CompressionPre-Encoding: 108,673 character text file input.Encoded: 27,052 pair and character values in encoded file. −23848 pair values.−3204 single characters slip through.−Encoded gives 89,653 bytes*. 82% the size of the initial file without Huffman.*Assuming a 29 bits for each pair.LZ77 - Limiting FactorsThe encoding algorithm is the primary bottleneck.Relies on repetitive nature of document. In worst case (no repetitions), to encode each character will need to examine 32Kb of data.Algorithm is O(n) time. But it has a huge constant factor.LZ77 - ExplorationMemory requests can return more than one piece of data at a time.−Increase data from memoryIncrease concurrency−Can check for multiple characters and single characters concurrentlyHuffman codeEvery ascii character has an equivalent Huffman codeHuffman code is a sequence of bits.The huffman sequences may have the same value, but different bit lengthExample: 0011 and 11 are different huffman codesAssuming the following alphabet:D: 00 E: 11H: 010L: 011O: 101R: 1000W: 1001 : 10001HELLO WORLD01011011011101100011001101100001100Huffman TreeHuffman code is a prefix-free codeHuffman code does not have a fixed lengthDuring encoding, the ascii characters are generated by going down the binary Huffman tree starting at root node.Leaf Val=BNode NodeLeafValue=ANodeNodeRoot nodeLeafVal=CLeafVal=DNode LeafVal=E010101010101A: 00B: 11C: 010D: 011E: 101Huffman EncoderA Table mapping ascii characters to huffman code stored in a register file. indexHuffman codeRegister1111111111111111….11111111111111101111111111111111….111111111111111101111111111111111….1111111111111111101111111111111111….11111111111111111101111111111111111….11111111111111111110979899100101‘a’‘b’‘c’‘d’‘e’RegistersFor every ascii character in a file, perform table lookup to get huffman code for the characterLookup is easy. Every register stores huffman code for equivalent ascii character (A=97 in ascii)Huffman DecoderA huffman tree is generated before decoding.Root Node (leftPointer=2, rightPointer=3)Leaf, (Value=’a’)Node (leftPointer=4, rightPointer=5, Value=1)Leaf (Value=’b’)Node (leftPointer=6, rightPointer=7, Value=1)Leaf (Value=’c’)123456Leaf (Value=’d’)7NodeRoot node01Left pointer is taken if bit is “0”Right pointer is taken if bit is “1”LeafVal=aLeafVal=aNodeLeafVal=aLeafVal=a0011Huffman module overviewinQoutQinQ outQEncoderDecoderHuffman TableHuffman TreeInput from LZ77 encoder CHAR or PAIRInput from Huffman Encoder (1 bit)Output to LZ77 Decoder CHAR or PAIROutput to Huffman Encoder (1 bit)If PAIR, pass to outQIf CHAR, look up huffman code in tableIf PAIR, pass to outQIf BIT, go down huffman tree to obtain ascii charLZ77 and Huffman ResultsPre-Encoding: 108,673 character text file input.Encoded: 161,177 bytes*.*Assuming a 29 bits for each pair.\0InitializationRead 3 valuesinto first[0], first[1], first[2]from infifocheck_equalssend requestto mem checkfor all values of first look_for_morewrites first[0], first[1], and first[2] into memoryset count = 0.Write first[0]Write first[0]into memory and to outfifo.Case 1: Memory !contain first in consecutive order.Case 2: Memory contains first in consecutive order.Shiftfirst[0] = first[1]first[1] = first[2]Read new valuefirst[3] = infifo.first()infifo.deq()check_equalstmp = infifo.first()Case 1: !=Case 2: ==write tmp to memoryinfifo.deq()++countWriteWrite pair(count +3, distance)to outfifo. (note: distance is returned by check_equals)Things to add and improve•Compress LZ77 pairs with huffman coding•Dynamic Huffman•Instead of large register file, use decoupled memoryStatic vs. Dynamic HuffmanTwo ways of generating huffman alphabet2)Static Huffman – the huffman table and tree are generated before encoding/decoding takes place3)Dynamic Huffman – The table is dynamically changing as new ascii characters are introduced, based on the frequency of ascii charactersLZ77 - SpeedRequired 2,854,403,040 clock cycles to perform encoding and decoding with LZ77 previous file.11.2178039 s to perform both lz77 encoding and decoding 2,853,820,260 clock cycles to perform encoding and decoding with huffman and lz77. 11.2155136 s (assuming 3.93ns critical


View Full Document

MIT 6 375 - Encoding

Documents in this Course
IP Lookup

IP Lookup

15 pages

Verilog 1

Verilog 1

19 pages

Verilog 2

Verilog 2

23 pages

Quiz

Quiz

10 pages

IP Lookup

IP Lookup

30 pages

Load more
Download Encoding
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Encoding and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Encoding 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?