
6.441 Transmission of Information                              February 23, 2006
Lecture 5
Lecturer: Madhu Sudan                                  Scribe: Hyun Sung Chang

1 Introduction

1.1 Today's Topic

• Markov chains/processes
• Entropy rate of Markov chains

1.2 Motivating Example

Example 1: Let us start by considering the following example. What are the rates of X and Y?

[Figure omitted: a three-state chain X = <X_n>_n together with an observed output process Y = <Y_n>_n.]

2 Stochastic Process

A stochastic process can be viewed as an infinite sequence of random variables, e.g., X_{-n}, X_{-n+1}, ..., X_0, X_1, X_2, ..., X_n, ..., whose distribution may be expressed by

    Pr[X_1 = x_1, X_2 = x_2, ..., X_n = x_n] ~ p(x_1, ..., x_n).

There are some meaningful, restricted classes of stochastic processes.

Definition 1 (Stationary Process) <X_n>_n is a stationary process if

    Pr[X_1 = x_1, ..., X_n = x_n] = Pr[X_{1+l} = x_1, ..., X_{n+l} = x_n]   (time shift by l),

for all n, l, x_1, ..., x_n.

Definition 2 (Markov Process/Markov Chain) <X_n>_n is a Markov chain if

    Pr[X_n = x_n | X_1 = x_1, ..., X_{n-1} = x_{n-1}] = Pr[X_n = x_n | X_{n-1} = x_{n-1}],   for all n, x_1, ..., x_n.

If X_i ∈ Ω and Ω is finite, then for each n the conditional distribution Pr[X_n = x_n | X_{n-1} = x_{n-1}] is just |Ω|^2 entries. But since these entries may differ for every n, can we describe the whole chain in finite terms? No. This motivates the next definition.

Definition 3 (Time-Invariant Markov Chain) A Markov chain is time-invariant if

    Pr[X_n = a | X_{n-1} = b] = Pr[X_{n+l} = a | X_{n+l-1} = b],   for all n, l and all a, b ∈ Ω.

A time-invariant Markov chain can be specified by a distribution on X_0 and a probability transition matrix P = [P_ij], where P_ij = Pr[X_2 = j | X_1 = i]. Throughout the rest of the lecture, "time-invariant Markov chain" will be referred to simply as a Markov chain (MC).

Example 2: Consider the three-state MC on states {A, B, C} that moves deterministically A → B → C → A. In this case,

    P = [ 0  1  0
          0  0  1
          1  0  0 ].

With X_0 = A, the resulting sequence will be "ABCABCABC...". Note that this process is not stationary, because Pr[X_0 = A, X_1 = B, X_2 = C] = 1 but Pr[X_1 = A, X_2 = B, X_3 = C] = 0.
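The deterministic behavior of this chain can be checked with a short simulation sketch. The state names and helper functions below are illustrative, not from the lecture; the sampler works for any row-stochastic P, and for this particular P the trajectory is the deterministic cycle described above.

```python
import random

# Illustrative sketch: simulate the three-state chain of Example 2.
STATES = "ABC"
P = [[0, 1, 0],   # A -> B
     [0, 0, 1],   # B -> C
     [1, 0, 0]]   # C -> A

def step(i):
    """Sample the next state index given current state index i, using row P[i]."""
    r, acc = random.random(), 0.0
    for j, p in enumerate(P[i]):
        acc += p
        if r < acc:
            return j
    return len(P[i]) - 1  # guard against floating-point rounding

def run(start, n):
    """Return the first n states of a trajectory started at state index `start`."""
    i, out = start, []
    for _ in range(n):
        out.append(STATES[i])
        i = step(i)
    return "".join(out)

print(run(0, 9))  # starting from A: "ABCABCABC"
```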
Instead, Pr[X_1 = B, X_2 = C, X_3 = A] = 1.

Fact 1 For every MC, there exists a stationary distribution µ on X_0 such that µ and P define a stationary process. In Example 2, µ = [1/3  1/3  1/3].

Because

    Pr[X_1 = x_1, X_2 = x_2, ..., X_n = x_n]
      = Pr[X_1 = x_1] · Pr[X_2 = x_2 | X_1 = x_1] ··· Pr[X_n = x_n | X_{n-1} = x_{n-1}]
      = Pr[X_1 = x_1] · P_{x_1 x_2} ··· P_{x_{n-1} x_n},

the overall distribution depends only on the distribution of X_1, which implies that the distribution µ on X_0 yields a stationary process if Pr[X_1 = i] = µ_i (= Pr[X_0 = i]).

Example 3: Let us consider the following example.

[Figure omitted: a three-state MC on states {A, B, C}.]

In this case, µ_A = µ_C = 0, µ_B = 1 is stationary, but µ_A = µ_B = 0, µ_C = 1 is also stationary. More than one stationary distribution can be problematic, and this situation arises because the MC is reducible.

Definition 4 (Reducibility of Markov Chain)
1. A Markov chain given by probability transition matrix P is reducible if P can be written (after relabeling the states) as

       [ P_0  P_1
         0    P_2 ],

   where P_0 and P_2 are square matrices.
2. An MC is irreducible if it is not reducible.

In terms of the graph structure, the "irreducible" and "aperiodic" properties can be interpreted as follows:

• irreducible: the graph is strongly connected, i.e., there exists a path from each state i to each state j.
• aperiodic: the greatest common divisor of the cycle lengths is 1.

Theorem 2 (Perron-Frobenius Theorem) Every (aperiodic) irreducible Markov chain has a unique stationary distribution.

For a stationary distribution, the probability distribution of X_1 should be the same as µ, the probability distribution of X_0:

    Pr[X_1 = j] = Σ_{i=1}^N µ_i P_ij = µ_j,

where N = |Ω| and Ω = {1, 2, ..., N}. In vector-matrix notation,

    µ P = µ,                                                       (1)

so µ corresponds to a (left) eigenvector of P with eigenvalue 1. For Example 1,

    P = [ 0.9  0.1  0
          0    2/3  1/3
          2/3  1/3  0 ].

Theorem 2 implies that there exists a unique such eigenvector with all entries non-negative. We can compute µ = [µ_1  µ_2  µ_3] using (1) together with µ_1 + µ_2 + µ_3 = 1.
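One way to solve (1) numerically, sketched here under the assumption that the chain is irreducible and aperiodic so that µ P^k converges to the stationary distribution from any starting point, is power iteration on the Example 1 matrix:

```python
# Illustrative sketch (not from the lecture): solve mu P = mu by power iteration.
# Assumes the chain is irreducible and aperiodic, so mu P^k converges.
P = [[0.9,   0.1,   0.0],
     [0.0,   2 / 3, 1 / 3],
     [2 / 3, 1 / 3, 0.0]]

def times_P(mu):
    """Row vector times matrix: (mu P)_j = sum_i mu_i P_ij."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

mu = [1 / 3, 1 / 3, 1 / 3]   # any starting distribution works
for _ in range(200):          # iterate mu -> mu P; 200 steps is far more than enough here
    mu = times_P(mu)

print(mu)  # converges to [20/32, 9/32, 3/32] = [0.625, 0.28125, 0.09375]
```

The convergence rate is governed by the second-largest eigenvalue magnitude of P, which for this matrix is well below 1, so a couple hundred iterations reach float precision.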
⇒ µ = [20/32  9/32  3/32].

3 Entropy Rate of Stochastic Process

There are two reasonable notions for measuring the uncertainty of X = <X_n>_n.

• Entropy rate:
      H(X) = lim_{n→∞} (1/n) H(X_1, ..., X_n), if the limit exists.
• Entropy' rate:
      H'(X) = lim_{n→∞} H(X_n | X_1, ..., X_{n-1}), if the limit exists.

Theorem 3 The entropy rate of a stationary stochastic process exists and equals the entropy' rate:

    H(X) = H'(X).

Proof Idea The following inequality, which shows that H(X_n | X_1, ..., X_{n-1}) is non-increasing in n, can be used to prove that H'(X) exists:

    H(X_n | X_1, ..., X_{n-1}) ≤ H(X_n | X_2, ..., X_{n-1}) = H(X_{n-1} | X_1, ..., X_{n-2}).

Here the inequality holds because conditioning reduces entropy, and the equality holds by stationarity. For the complete proof, refer to pp. 64-65 of Cover.

Theorem 4 If an irreducible MC has probability transition matrix P and stationary distribution µ, then

    H(X) = H'(X) = - Σ_{i,j} µ_i P_ij log P_ij.                    (2)

Proof

    H'(X) = lim_{n→∞} H(X_n | X_1, ..., X_{n-1})
          = lim_{n→∞} H(X_n | X_{n-1})
          = H(X_2 | X_1)
          = Σ_i Pr[X_1 = i] · H(X_2 | X_1 = i)
          = - Σ_i µ_i Σ_j P_ij log P_ij.

Using (2), H(X) of Example 1 can be computed:

    H(X) = (5/8) H(0.9) + (3/8) H(2/3),

where H(p) denotes the binary entropy function.

AEP for Markov Chains:

    -(1/n) log p(X_1, ..., X_n) → H(X).

This does not follow from our law of large numbers, because the random variables may be dependent on each other.

Hidden Markov Model: Now, let us consider the rate of <Y_n>_n in Example 1. H'(Y) = lim_{n→∞} H(Y_n | Y_1, ..., Y_{n-1}), and it is bounded by

    H(Y_n | Y_1, ..., Y_{n-1}, X_1) ≤ H'(Y) ≤ H(Y_n | Y_1, ..., Y_{n-1})   for all n.

(Try to prove the inequality on the left-hand side!) If we denote the gap between the upper and lower bounds by ε_n, then

    ε_n = H(Y_n | Y_1, ..., Y_{n-1}) - H(Y_n | Y_1, ..., Y_{n-1}, X_1) = I(X_1; Y_n | Y_1, ..., Y_{n-1}),

and, by the chain rule for mutual information,

    Σ_{n=1}^M ε_n = Σ_{n=1}^M I(X_1; Y_n | Y_1, ..., Y_{n-1}) = I(X_1; Y_1, ..., Y_M) ≤ H(X_1).

Since the gaps ε_n are non-negative and summable, ε_n → 0, so the upper and lower bounds converge to the same limit.
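Formula (2) applied to Example 1 can be checked numerically with a short sketch (the helper function names here are illustrative): the entropy rate is a µ-weighted average of the row entropies of P, and the rows of the Example 1 matrix have entropies H(0.9), H(2/3), and H(2/3).

```python
from math import log2

# Illustrative sketch: entropy rate of the Example 1 chain, H = -sum_ij mu_i P_ij log P_ij.
P = [[0.9,   0.1,   0.0],
     [0.0,   2 / 3, 1 / 3],
     [2 / 3, 1 / 3, 0.0]]
mu = [20 / 32, 9 / 32, 3 / 32]   # stationary distribution computed in Section 2

def h2(p):
    """Binary entropy H(p) in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Terms with P_ij = 0 contribute nothing (0 log 0 = 0), so skip them.
H = -sum(mu[i] * P[i][j] * log2(P[i][j])
         for i in range(3) for j in range(3) if P[i][j] > 0)

print(H)  # ≈ 0.6375 bits per symbol, matching (5/8) H(0.9) + (3/8) H(2/3)
```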

