UW CSEP 590 - Lecture Notes

This preview shows pages 1-5 of 16.


CSEP 590: Data Compression
Autumn 2007, Lecture 1
Course Policies; Introduction to Data Compression; Entropy; Variable Length Codes

Instructors
• Instructor
– Richard Ladner
– [email protected]
– 206 543-9347
• TA
– Rahul Vanam
– [email protected]

Helpful Knowledge
• Algorithm Design and Analysis
• Probability

Resources
• Text book
– Khalid Sayood, Introduction to Data Compression, Third Edition, Morgan Kaufmann Publishers, 2006.
• Course web page
– http://www.cs.washington.edu/csep590a
• Papers and sections from books
• Discussion board
– For discussion

Engagement by Students
• Weekly assignments
– Understand compression methodology
– Due in class on Fridays (except midterm Friday)
– No late assignments accepted except with prior approval
• Programming project
– Bi-level arithmetic coder and decoder
– Build code and experiment

Final Exam and Grading
• Final exam: 6:30-8:20 p.m. Thursday, Dec. 13, 2007
• Percentages
– Weekly assignments (50%)
– Project (20%)
– Final exam (30%)

Logistics
• I will be gone the week of October 15th; we will need to have a make-up class.
• There is no class Thanksgiving week, November 19th.
• We have some guest speakers toward the end of the quarter.

Basic Data Compression Concepts
• original x → Encoder → compressed y → Decoder → decompressed x̂
• Lossless compression (x̂ = x)
– Also called entropy coding or reversible coding.
• Lossy compression (x̂ ≠ x)
– Also called irreversible coding.
• Compression ratio = |x| / |y|, where |x| is the number of bits in x.

Why Compress
• Conserve storage space
• Reduce time for transmission
– Faster to encode, send, then decode than to send the original
• Progressive transmission
– Some compression techniques allow us to send the most important bits first, so we can get a low-resolution version of some data before getting the high-fidelity version
• Reduce computation
– Use less data to achieve an approximate answer

Braille
• System to read text by feeling raised dots on paper (or on electronic displays). Invented in the 1820s by Louis Braille, a blind Frenchman.
[Figure: Braille cells for a, b, c, z and the contractions "and", "the", "with", "mother", "th", "gh", "ch"]

Braille Example
• Clear text (238 characters):
Call me Ishmael. Some years ago -- never mind how long precisely -- having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.
• Grade 2 Braille in ASCII (203 characters):
,call me ,i%mael4 ,``s ye>s ago -- n``e m9d h[ l;g precisely -- hav+ ll or no m``oy 9 my purse1 & no?+ ``picul> 6 9t]e/ me on %ore1 ,i ?``| ,i wd sail ab a ll & see ! wat]y ``p ( ! _w4
• Compression ratio = 238/203 ≈ 1.17

Lossless Compression
• Data is not lost - the original is really needed.
– Text compression
– Compression of computer binary files
• Compression ratio typically no better than 4:1 for lossless compression on many kinds of files.
• Statistical techniques
– Huffman coding
– Arithmetic coding
– Golomb coding
• Dictionary techniques
– LZW, LZ77
– Sequitur
– Burrows-Wheeler method
• Standards - Morse code, Braille, Unix compress, gzip, zip, bzip, GIF, JBIG, Lossless JPEG

Lossy Compression
• Data is lost, but not too much.
– Audio
– Video
– Still images, medical images, photographs
• Compression ratios of 10:1 often yield quite high-fidelity results.
• Major techniques
– Vector quantization
– Wavelets
– Block transforms
• Standards - JPEG, JPEG2000, MPEG-2, H.264

Why Is Data Compression Possible
• Most data from nature has redundancy.
– There is more data than the actual information contained in the data.
– Squeezing out the excess data amounts to compression.
– However, unsqueezing is necessary to be able to figure out what the data means.
• Information theory is needed to understand the limits of compression and gives clues on how to compress well.

What Is Information
• Analog data
– Also called continuous data
– Represented by real numbers (or complex numbers)
• Digital data
– Finite set of symbols {a1, a2, ..., am}
– All data represented as sequences (strings) over the symbol set
– Example: {a, b, c, d, r}: abracadabra
– Digital data can be an approximation to analog data

Symbols
• Roman alphabet plus punctuation
• ASCII - 256 symbols
• Binary - {0, 1}
– 0 and 1 are called bits
– All digital information can be represented efficiently in binary
– Example: a fixed-length representation of {a, b, c, d} uses 2 bits per symbol:

symbol  binary
a       00
b       01
c       10
d       11

Exercise - How Many Bits Per Symbol?
• Suppose we have n symbols. How many bits (as a function of n) are needed to represent a symbol in binary?
– First try n a power of 2.

Discussion: Non-Powers of Two
• Can we do better than a fixed-length representation for non-powers of two?

Information Theory
• Developed by Shannon in the 1940s and 50s
• Attempts to explain the limits of communication using probability theory.
• Example: Suppose English text is being sent.
– It is much more likely to receive an "e" than a "z".
– In some sense "z" has more information than "e".

First-Order Information
[Plot: y = -log(x) for 0 < x < 1]
• Suppose we are given symbols {a1, a2, ..., am}.
• P(ai) = probability of symbol ai occurring in the absence of any other information, with P(a1) + P(a2) + ... + P(am) = 1.
• inf(ai) = log2(1/P(ai)) is the information of ai in bits.

Example
• {a, b, c} with P(a) = 1/8, P(b) = 1/4, P(c) = 5/8
– inf(a) = log2(8) = 3
– inf(b) = log2(4) = 2
– inf(c) = log2(8/5) ≈ 0.678
• Receiving an "a" has more information than receiving a "b" or "c".

First-Order Entropy
• The first-order entropy is defined for a probability distribution over symbols {a1, a2, ..., am}:
H = P(a1) inf(a1) + P(a2) inf(a2) + ... + P(am) inf(am) = Σi P(ai) log2(1/P(ai))
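The information and entropy definitions above can be sanity-checked with a short Python sketch (an illustrative script, not part of the course materials), using the example distribution P(a) = 1/8, P(b) = 1/4, P(c) = 5/8 from the slides:

```python
import math

def information(p):
    """Self-information inf(a) = log2(1/P(a)), in bits."""
    return math.log2(1 / p)

def first_order_entropy(probs):
    """First-order entropy: the expected information, sum of P(a) * inf(a)."""
    return sum(p * information(p) for p in probs)

# Example distribution from the slides.
probs = {"a": 1 / 8, "b": 1 / 4, "c": 5 / 8}

for sym, p in probs.items():
    print(f"inf({sym}) = {information(p):.3f} bits")
print(f"H = {first_order_entropy(probs.values()):.3f} bits per symbol")

# Output:
# inf(a) = 3.000 bits
# inf(b) = 2.000 bits
# inf(c) = 0.678 bits
# H = 1.299 bits per symbol
```

Note that the entropy (about 1.299 bits per symbol) is below the 2 bits per symbol a fixed-length code would use for three or four symbols, which is the gap variable-length codes exploit.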

