DOC PREVIEW
Princeton COS 226 - Data Compression

This preview shows page 1-2-3-4 out of 11 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Algorithms in Java, 4th Edition·Robert Sedgewick and Kevin Wayne·Copyright © 2008·April 15, 2008 11:25:10 PMData CompressionReferences: Algorithms 2nd edition, Chapter 22 http://www.cs.princeton.edu/algs4/65compression‣fixed-length codes‣variable-length codes‣an application‣adaptive codes2Data compressionCompression reduces the size of a file:•To save space when storing it.•To save time when transmitting it.•Most files have lots of redundancy.Who needs compression?•Moore's law: # transistors on a chip doubles every 18-24 months.•Parkinson's law: data expands to fill space available. •Text, images, sound, video, …Basic concepts ancient (1950s), best technology recently developed.“ All of the books in the world contain no more information than is broadcast as video in a single large American city in a single year. Not all bits have equal value. ” — Carl Sagan3ApplicationsGeneric file compression.•Files: GZIP, BZIP, BOA.•Archivers: PKZIP.•File systems: NTFS.Multimedia.•Images: GIF, JPEG. •Sound: MP3.•Video: MPEG, DivX™, HDTV.Communication.•ITU-T T4 Group 3 Fax.•V.42bis modem.Databases. Google.4Encoding and decodingMessage. Binary data M we want to compress.Encode. Generate a "compressed" representation C(M).Decode. Reconstruct original message or some approximation M'.Compression ratio. Bits in C(M) / bits in M.Lossless. M = M', 50-75% or lower.Ex. Natural language, source code, executables.Lossy. M ≈ M', 10% or lower.Ex. Images, sound, video.EncoderMDecoderC(M) M'uses fewer bits (you hope)this lecture5Food for thoughtData compression has been omnipresent since antiquity:•Number systems. •Natural languages. •Mathematical notation.has played a central role in communications technology,•Braille.•Morse code.•Telephone system. and is part of modern life.•MP3.•MPEG.Q. What role will it play in the future?6What data can be compressed?US Patent 5,533,051 on "Methods for Data Compression", which is capable of compression all files.Slashdot reports of the Zero Space Tuner™ and BinaryAccelerator™.“ ZeoSync has announced a breakthrough in data compression that allows for 100:1 lossless compression of random data. If this is true, our bandwidth problems just got a lot smaller.… ” 7Perpetual motion machinesUniversal data compression is the analog of perpetual motion.Closed-cycle mill by Robert Fludd, 1618Gravity engine by Bob SchadewaldReference: Museum of Unworkable Devices by Donald E. Simanekhttp://www.lhup.edu/~dsimanek/museum/unwork.htm8What data can be compressed?Proposition. Impossible to losslessly compress all files.Pf 1.•consider all 1,000 bit messages.•21000 possible messages.•only 2999 + 2998 + … + 1 can be encoded with ≤ 999 bits.•only 1 in 2499 can be encoded with ≤ 500 bits!Pf 2 (by contradiction).•given a file M, compress it to get a smaller file M1.•compress that file to get a still smaller file M2.•continue until reaching file size 0.•implication: all files can be compressed with 0 bits!9Rdenudcany in Enlgsih lnagugaeQ. How much redundancy is in the English language?A. Quite a bit.“ ... randomising letters in the middle of words [has] little or no effect on the ability of skilled readers to understand the text. This is easy to denmtrasote. In a pubiltacion of New Scnieitst you could ramdinose all the letetrs, keipeng the first two and last two the same, and reibadailty would hadrly be aftcfeed. My ansaylis did not come to much beucase the thoery at the time was for shape and senqeuce retigcionon. Saberi's work sugsegts we may have some pofrweul palrlael prsooscers at work. The resaon for this is suerly that idnetiyfing coentnt by paarllel prseocsing speeds up regnicoiton. We only need the first and last two letetrs to spot chganes in meniang. ” — Graham Rawlinson10‣fixed-length codes‣variable-length codes‣an application‣adaptive codes11Fixed-length coding•Use same number of bits for each symbol.•k-bit code supports 2k different symbols.Ex. 7-bit ASCII code.chardecimalcodeNUL00000000......a971100001b981100010c991100011d1001100100......DEL1271111111abracadabra!11000011100010111001011000011100011110000111001001100001110001011100101100001111111112 symbols × 7 bits per symbol = 84 bits in code12Fixed-length coding•Use same number of bits for each symbol.•k-bit code supports 2k different symbols.Ex. 3-bit custom code.charcodea000b001c010d011r100!111abracadabra!00000110000001000001100000110000011112 symbols × 3 bits per symbol = 36 bits in code13Fixed-length coding: general scheme•Count number of different symbols.•~ lg M bits suffice to support M different symbols.Ex. Genomic sequences.•4 different nucleotides.•2 bits suffice.•Amazing but true: initial databases in 1990s did not use such a code!Important detail. Decoder needs to know the code!charcodea00c01t10g11actacagatgaa000110000100110010110000~ 2N bits to encode genome with N nucleotides 2-bit DNA code14‣fixed-length codes‣variable-length codes‣an application‣adaptive codesUse different number of bits to encode different symbols.Ex. Morse code.Issue. Ambiguity. • • • − − − • • •SOS ?IAMIE ?EEWNI ?V7O ?Remark. Separate words with medium gap. 15Variable-length codingQ. How do we avoid ambiguity?A. Ensure that no codeword is a prefix of another.Ex 1. Fixed-length code.Ex 2. Append special stop symbol to each codeword.Ex 3. Custom prefix-free code.16Variable-length codingcharcodeS• • •E•I••V•••-prefix of Vprefix of I, Sprefix of Sabracadabra!0111110010100100011111001011charcodea0b111c1010d100r110!110128 bits to encode message (vs. 36 bits for fixed-length 3-bit code)17Q. How to represent a prefix-free code?A. Binary trie.•Symbols are stored in leaves.•Codeword is path to leaf.Encoding.•Method 1: start at leaf; follow path upto the root, and print bits in reverse order.•Method 2: create ST of symbol-codeword pairs.Decoding.•Start at root of tree.•Go left if bit is 0; go right if 1. •If leaf node, print symbol and return to root.Prefix-free code: encoding and decodingad0c !r b10 10 10 10 1charcodea0b111c1010d100r110!101118Representing the codeQ. How to transmit the trie?A. Send preorder traversal of trie.Note. If message is long, overhead of transmitting trie is small.*a**d*c!*rb120111110010100100011111001011preorder traversal# chars to decodethe message bits(pack 8 to the byte)ad0c !r b10


View Full Document

Princeton COS 226 - Data Compression

Documents in this Course
QUICKSORT

QUICKSORT

14 pages

QUICKSORT

QUICKSORT

55 pages

STRINGS

STRINGS

69 pages

Lecture

Lecture

4 pages

STRINGS

STRINGS

18 pages

Hashing

Hashing

7 pages

MERGESORT

MERGESORT

14 pages

Quicksort

Quicksort

12 pages

Strings

Strings

10 pages

Overview

Overview

15 pages

Hashing

Hashing

13 pages

Mergesort

Mergesort

15 pages

Tries

Tries

13 pages

Final

Final

12 pages

Final

Final

12 pages

Mergesort

Mergesort

13 pages

Final

Final

10 pages

Load more
Download Data Compression
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Data Compression and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Data Compression 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?