6.441 Transmission of Information                                May 18, 2006

Lecture 24

Lecturer: Madhu Sudan                                       Scribe: Chung Chan

1 Kolmogorov complexity

Shannon's notion of compressibility is closely tied to a probability distribution. However, the probability distribution of the source is often unknown to the encoder, and sometimes we are interested in the compressibility of one specific sequence, e.g., how compressible is the Bible? The Lempel-Ziv universal compression algorithm can compress any string without knowledge of the underlying probabilities, assuming only that the string comes from a stochastic source. But is it the best compression possible for each individual, deterministic bit string? This notion of the compressibility/complexity of a deterministic bit string was studied by Solomonoff in 1964, Kolmogorov in 1965, Chaitin in 1966, and Levin.

Consider the following length-n sequence:

    0100011011000001010011100101110111 ...

Although it may appear random, it is the enumeration of all binary strings in order: 0, 1, 00, 01, 10, 11, 000, ... A natural way to compress it is to represent it by the procedure that generates it: enumerate all strings in binary and stop at the n-th bit. The compression achieved, in bits, is

    |compression| \le 2 \log n + O(1)

where the O(1) bits describe the fixed enumeration procedure and roughly 2 \log n bits suffice to encode the stopping index n in a self-delimiting way.
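To make the procedure concrete, here is a minimal Python sketch of the generator (the function name and loop structure are ours; the lecture describes the procedure only in words). The program text has constant length, so beyond it only the stopping index n must be communicated:

```python
from itertools import count

def enumeration_sequence(n):
    """First n bits of the enumeration of all binary strings
    in order: 0, 1, 00, 01, 10, 11, 000, 001, ..."""
    bits = []
    for length in count(1):               # string lengths 1, 2, 3, ...
        for value in range(2 ** length):  # all strings of this length
            bits.extend(format(value, f"0{length}b"))
            if len(bits) >= n:
                return "".join(bits[:n])

print(enumeration_sequence(34))
# -> 0100011011000001010011100101110111, the prefix shown above
```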
More generally, with a universal Turing machine we can encode data as a computer program that generates it. The length of the shortest program that produces the bit string x is called the Kolmogorov complexity of x.

Definition 1 (Kolmogorov complexity) For every language L, the Kolmogorov complexity of the bit string x with respect to L is

    K_L(x) = \min_{p : L(p) = x} l(p)

where p is a program represented as a bit string, L(p) is the output of the program with respect to the language L, and l(p) is the length of the program, or more precisely, the point at which the execution halts.

But this notion of complexity depends on the particular language L, which seems too specific to be useful. For example, we may have a language that prints the Bible with a short program, say the ASCII string "print the bible". The length of the program depends so much on the language that it does not seem to reflect our intuitive understanding of what compressibility is. Without fixing a particular language, however, the notion of complexity is ill-defined. Fortunately, we have the following theorem on universal languages, which roughly says that the Kolmogorov complexity of x with respect to a universal language is well-defined up to a constant.

Theorem 2 (Universal language)

    (\exists universal language U)(\forall language L)(\exists finite constant C_L)(\forall bit string x)   K_U(x) \le K_L(x) + C_L

Can we choose the best universal language U, one that minimizes C_L over all choices of L? Such a minimum may not exist because C_L, although finite for each L, is unbounded over languages. [1]

[1] Given any universal language U that one claims to be the best, we can find a finite bit string x that U compresses to more than 1 bit, and then give a universal language U' that compresses x to exactly 1 bit by storing x within the language manual.

To relate Kolmogorov complexity to Shannon's notion of compressibility, let us ask the following question: what is the probability model P_U under which compression using the shortest program is good? Intuitively, sequences that can be generated by shorter programs should be more probable, i.e.,

    K_U(x) \approx \log \frac{1}{P_U(x)}

along the idea of entropy encoding. This can be satisfied with the following model for generating X:

1. Fix a universal language U.
2. Generate a random program p from an i.i.d. Bern(1/2) source.
3. Generate X as the output U(p).

The corresponding distribution P_U is called the universal probability, defined as follows.

Definition 3 (Universal probability)

    P_U(x) = \sum_{p : U(p) = x} 2^{-l(p)}

1.1 Spectrum of research on data compression

Shannon's model assumes a known distribution, and simple optimal compression schemes are available. At the other extreme, Kolmogorov's model is robust to an unknown distribution, but the compression, which involves searching for the shortest program for each string, is incomputable. Fortunately, there is a variety of models between these two extremes.

Lempel-Ziv assumes a finite-state Markov source that may be unknown to the encoder. The compression algorithm is simple and is implemented in practice.

Kieffer-Yang uses grammars to compress strings. For example, a grammar may consist of the following rules with associated probabilities:

    sentence --.9-->  subject, Verb, Object        subject --.9-->  noun
    sentence --.1-->  one word                     subject --.1-->  pronoun
    Verb     --.2-->  an                           pronoun --.99--> I
    object   --.3-->  me
    ...

Charikar, Sahai, Lehman, and others have come up with efficient grammar constructions that give polynomial-time compression algorithms.

Resource-bounded Kolmogorov complexity K^{n^2}_U(x) is defined as the length of the shortest program p that produces x in time l(x)^2. The compression can be done within time 2^{l(x)^2}, basically by searching through all the programs that satisfy the resource constraint, as in the sketch below.
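The exhaustive search is easy to write down even though it is hopeless in practice. The sketch below is a toy illustration, not the lecture's construction: the two-opcode "language" and the names run and K are invented here so the code is self-contained, and the language is so simple that every program halts. For a genuine universal language, each simulation would additionally be cut off after l(x)^2 steps; that cutoff is exactly what makes the resource-bounded variant computable.

```python
from itertools import product

# Toy "language" over bit strings (ours, for illustration only):
#   '0' + w           -> output the literal string w
#   '1' + kkkk + w    -> output pattern w repeated k times (k is 4 bits)
# Every x has the trivial program '0' + x, so K(x) <= l(x) + 1.
def run(p):
    if not p:
        return None
    if p[0] == "0":
        return p[1:]
    if len(p) < 5:                 # need the 4-bit repeat count
        return None
    k = int(p[1:5], 2)
    return p[5:] * k

def K(x):
    """Length of the shortest toy-language program printing x, found by
    brute-force search in order of increasing program length.  A
    resource-bounded search over a real universal language would also
    abort each run(p) after l(x)**2 simulation steps."""
    for length in range(1, len(x) + 2):    # '0' + x guarantees success
        for bits in product("01", repeat=length):
            if run("".join(bits)) == x:
                return length

x = "01" * 8                          # a highly regular 16-bit string
print(K(x), "vs the trivial bound", len(x) + 1)   # prints: 7 vs ... 17
```

The same enumeration also yields a computable lower bound on the universal probability of Definition 3: summing 2^{-l(p)} over every program p found to print x approximates P_U(x) from below.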
2 Summary of the course

Starting from basic probability theory, we defined entropy and mutual information as intuitive measures of the uncertainty of a random variable and of the information shared between two random variables. Then we introduced the AEP, which comes in handy when we tackle large objects built from small processes; in particular, we applied it to typical-set source encoding and channel decoding. Random encoding is another important notion, one that simplified the error analysis when proving achievability results for source and channel coding. We then introduced differential entropy as the appropriate measure of the randomness of a continuous random variable, not in an absolute sense but in a relative sense, for the purpose of comparison with another random variable in the same coordinate system. Finally, we introduced network information theory for the multiple access and broadcast channels, coding theory, and Kolmogorov complexity. The need for information theory to address computational complexity is important for designing practical systems, such as channel codes with efficient encoding and decoding algorithms, and cryptographic systems that are computationally infeasible to break.

There are some topics that we wish we had covered, for example, applications of information theory outside communication settings, such as Gambling and the Stock Market, and rate distortion theory.