MIT 6.856J: Markov Chains

Markov chains:
• Powerful tool for sampling from complicated distributions.
• Rely only on local moves to explore the state space.
• Many use Markov chains to model events that arise in nature.
• We create Markov chains to explore and sample from problems.

2SAT:
• Fix some satisfying assignment A. (The algorithm repeatedly picks an unsatisfied clause and flips a uniformly random one of its two variables; each flip matches A with probability at least 1/2, and the analysis below pessimistically assumes exactly 1/2.)
• Let f(k) be the expected time to get all n variables to match A if k currently match.
• Then f(n) = 0, f(0) = 1 + f(1), and f(k) = 1 + (1/2)(f(k + 1) + f(k − 1)).
• Rewrite: f(0) − f(1) = 1 and f(k) − f(k + 1) = 2 + (f(k − 1) − f(k)).
• So f(k) − f(k + 1) = 2k + 1.
• Deduce f(0) = 1 + 3 + · · · + (2n − 1) = n². (A numeric check appears at the end of this section.)
• So the walk finds a satisfying assignment with probability 1/2 within 2n² steps (Markov's inequality).
• With high probability, it finds one in O(n² log n) time.

More general formulation: Markov chain
• State space S.
• The Markov chain begins in a start state X_0 and moves from state to state, so the output of the chain is a sequence of states X_0, X_1, . . . = {X_t}, t = 0, 1, 2, . . .
• Movement is controlled by the matrix of transition probabilities p_ij = probability the next state will be j given the current state is i.
• Thus Σ_j p_ij = 1 for every i ∈ S.
• Implicit in the definition is the memorylessness property: Pr[X_{t+1} = j | X_0 = i_0, X_1 = i_1, . . . , X_t = i] = Pr[X_{t+1} = j | X_t = i] = p_ij.
• The initial state X_0 can come from any probability distribution, or might be fixed (a trivial probability distribution).
• A distribution for X_0 leads to a distribution over sequences {X_t}.
• Suppose X_t has distribution q (a row vector, with q_i the probability of state i). Then X_{t+1} has distribution qP. Why? (qP)_j = Σ_i q_i p_ij. (A small sketch of this evolution appears at the end of this section.)
• Observe Pr[X_{t+r} = j | X_t = i] = (P^r)_ij.

Graph of MC:
• A vertex for every state.
• An edge (i, j) if p_ij > 0, with edge weight p_ij.
• Weighted outdegree 1 at every vertex.
• Possible state sequences are paths through the graph.

Stationary distribution:
• A π such that πP = π.
• A left eigenvector of P, with eigenvalue 1.
• Steady-state behavior of the chain: if it is in the stationary distribution, it stays there.
• Note a draw from the stationary distribution is a sample from the state space, so if we can get the right stationary distribution, we can sample.
• Lots of chains have one; to say which, we need some definitions. Things to rule out:
  – infinite directed line (no stationary distribution)
  – 2-cycle (periodic: it has a stationary distribution but never converges to it)
  – disconnected graph (multiple stationary distributions)

Irreducibility:
• Any state can reach any other state,
• i.e. there is a path between any two states,
• i.e. the graph is a single strong component.

Persistence/Transience:
• r_ij(t) is the probability of first hitting state j at time t, given start state i.
• f_ij = Σ_t r_ij(t) is the probability of eventually reaching j from i.
• The expected time to reach j from i is the hitting time h_ij = Σ_t t · r_ij(t).
• If f_ij < 1 then h_ij = ∞, since the chain might never reach j. The converse is not always true.
• If f_ii < 1, the state is transient. Else persistent. If persistent with h_ii = ∞, null persistent.

Persistence in finite graphs:
• The graph decomposes into strong components.
• A final strong component has no outgoing edges.
• Nonfinal components:
  – Once the chain leaves a nonfinal component, it cannot return.
  – If in a nonfinal component, there is nonzero probability of leaving within n steps.
  – So the chain is guaranteed to leave eventually.
  – So vertices in nonfinal components are transient.
• Final components:
  – Once in a final component, the chain stays in that component.
  – Any two vertices in the same strong component have paths between them,
  – so there is nonzero probability of reaching one from the other in (say) n steps.
  – So vertices in final components are persistent.
  – The time to return is dominated by a geometric distribution, so its expectation is finite: not null-persistent.
• Conclusion:
  – In a finite chain, there are no null-persistent states.
  – In a finite irreducible chain, all states are non-null persistent (no transient states).
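To ground the 2SAT calculation, here is a minimal Python check (the helper name f_values is mine): it builds f from the telescoped differences f(k) − f(k+1) = 2k + 1 and confirms that the original recurrence holds and that f(0) = n².

    # Verify: with f(n) = 0 and f(k) - f(k+1) = 2k + 1, the recurrence
    # f(k) = 1 + (f(k+1) + f(k-1))/2 holds and f(0) = n^2.
    def f_values(n):
        f = [0.0] * (n + 1)
        for k in range(n - 1, -1, -1):
            f[k] = f[k + 1] + 2 * k + 1
        return f

    for n in (5, 10, 50):
        f = f_values(n)
        assert f[0] == n * n                      # f(0) = n^2
        assert f[0] == 1 + f[1]                   # boundary case
        for k in range(1, n):                     # interior recurrence
            assert abs(f[k] - (1 + 0.5 * (f[k + 1] + f[k - 1]))) < 1e-9
    print("f(0) = n^2 verified for n = 5, 10, 50")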
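Here is a small Python sketch of how distributions evolve under P, using a made-up 3-state transition matrix (the matrix is an illustrative assumption, not from the notes): if X_t has distribution q then X_{t+1} has distribution qP, and the r-step probabilities are the entries of P^r.

    # Distribution evolution: q -> qP per step; r-step probabilities = P^r.
    import numpy as np

    P = np.array([[0.5,  0.5,  0.0 ],   # each row sums to 1
                  [0.25, 0.5,  0.25],
                  [0.0,  0.5,  0.5 ]])

    q = np.array([1.0, 0.0, 0.0])       # X_0 fixed at state 0
    for _ in range(5):
        q = q @ P                       # distribution of the next state
    print("distribution of X_5:", q)

    # Same thing via matrix powers: row i of P^r is Pr[X_{t+r} = . | X_t = i].
    print("row 0 of P^5:      ", np.linalg.matrix_power(P, 5)[0])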
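Continuing with the same made-up matrix, a sketch of computing the stationary distribution as a left eigenvector with eigenvalue 1 (one of several ways to find π):

    # Stationary distribution: solve pi P = pi, a left eigenvector of P.
    import numpy as np

    P = np.array([[0.5,  0.5,  0.0 ],
                  [0.25, 0.5,  0.25],
                  [0.0,  0.5,  0.5 ]])

    vals, vecs = np.linalg.eig(P.T)         # eigenvectors of P^T = left eigenvectors of P
    i = int(np.argmin(np.abs(vals - 1.0)))  # pick the eigenvalue 1
    pi = np.real(vecs[:, i])
    pi = pi / pi.sum()                      # scale to a probability distribution
    print("pi   =", pi)                     # (0.25, 0.5, 0.25) for this P
    print("pi P =", pi @ P)                 # unchanged: pi is stationary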
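The strong-component argument above can be checked mechanically. A sketch on a hypothetical 4-state chain (graph and function names are mine): a state is persistent exactly when every state it can reach can reach it back, i.e. its strong component is final.

    # Classify states of a finite chain as transient or persistent.
    def reachable(adj, s):
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    def classify(adj):
        # persistent iff the state's strong component is final
        return {i: ("persistent"
                    if all(i in reachable(adj, j) for j in reachable(adj, i))
                    else "transient")
                for i in adj}

    # Hypothetical chain: 0 -> 1 -> 2 <-> 3; the final component is {2, 3}.
    print(classify({0: [1], 1: [2], 2: [3], 3: [2]}))
    # {0: 'transient', 1: 'transient', 2: 'persistent', 3: 'persistent'}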
Periodicity:
• The period of a state is the largest T such that the chain can be at that state only at times a + Ti (integer i).
• The chain is periodic if some state has period > 1.
• In the graph, all cycles containing the state have length a multiple of T.
• Easy to eliminate: add self-loops.
• This slows the chain down, but it is otherwise the same. (A small demonstration appears at the end of this section.)

Ergodic:
• Aperiodic and non-null persistent.
• Means the chain might be in the state at any time in the (sufficiently far) future.

Fundamental Theorem of Markov chains: any irreducible, finite, aperiodic Markov chain satisfies:
• All states are ergodic (reachable at any time sufficiently far in the future).
• There is a unique stationary distribution π, with all π_i > 0.
• f_ii = 1 and h_ii = 1/π_i.
• The number of visits to i in t steps approaches tπ_i in the limit of large t.
We justify all except uniqueness here. Finite, irreducible, and aperiodic implies ergodic (since finite irreducible implies non-null persistent).

Intuitions for the quantities:
• h_ii is the expected return time.
• So the chain hits i once every h_ii steps on average, i.e. a 1/h_ii fraction of the time.
• So h_ii = 1/π_i.
• If started in the stationary distribution, the tπ_i visit count follows from linearity of expectation.

Random walks on undirected graphs:
• General Markov chains are directed graphs, but undirected graphs have some very nice properties.
• Take a connected, non-bipartite undirected graph on n vertices with m edges; the states are the vertices.
• Move to a uniformly chosen neighbor.
• So p_uv = 1/d(u) for every neighbor v of u.
• Stationary distribution: π_v = d(v)/2m.
• Uniqueness says this is the only one.
• Deduce h_vv = 2m/d(v). (An empirical check appears at the end of this section.)

Definitions:
• The hitting time h_uv is the expected time to reach v from u (as before).
• The commute time is h_uv + h_vu.
• C_u(G) is the expected time to visit all vertices of G, starting at u.
• The cover time is max_u C_u(G) (so in fact it is the max over any starting distribution).
• Let's analyze the max cover time.

Examples:
• Clique: commute time Θ(n), cover time Θ(n log n) (coupon collection).
• Line: the commute time between the two ends is Θ(n²).
• Lollipop (a clique on n/2 vertices with a path of n/2 vertices attached): with u in the clique and v at the far end of the path, h_uv = Θ(n³) while h_vu = Θ(n²) (a big difference!).
• Also note: the lollipop is the line with edges added, yet it has a higher cover time: adding edges can increase the cover time even though it improves connectivity.

General graphs, adjacent vertices:
• Lemma: for adjacent (u, v), h_uv + h_vu ≤ 2m.
• Proof: build a new Markov chain whose state is the directed edge just traversed by the vertex chain.
  – Its transition matrix is doubly stochastic: column sums are 1 (exactly d(v) edges can transit to edge (v, w), and each does so with probability 1/d(v)).
  – In the homework, we show such matrices have the uniform stationary distribution.
  – Deduce π_e = 1/2m over the 2m directed edges. Thus h_ee = 2m.
• Now suppose the original chain is at vertex v:
  – Suppose it arrived via edge (u, v).
  – It is expected to traverse (u, v) again within 2m steps.
  – At that point it will have gone from v to u and back to v, i.e. commuted.
  – So, conditioning on the arrival edge, the expected commute time is at most 2m (thanks to memorylessness).

General graph cover time:
• Theorem: the cover time is O(mn).
• Proof: find a spanning tree and traverse it in depth-first order, crossing each of its n − 1 edges once in each direction; the cover time is at most the sum of the commute times over the tree edges, each at most 2m by the lemma, for a total of O(mn).
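A quick demonstration of eliminating periodicity with self-loops, as promised above (the specific mix (P + I)/2 is my choice of how to add them): the 2-cycle never converges, while its self-loop version does, and both fix the same stationary distribution.

    # Self-loops eliminate periodicity: compare a 2-cycle with a lazy version.
    import numpy as np

    P = np.array([[0.0, 1.0],
                  [1.0, 0.0]])            # period 2: alternates forever
    lazy = 0.5 * (P + np.eye(2))          # stay put with probability 1/2

    print(np.linalg.matrix_power(P, 101))     # still a permutation matrix
    print(np.linalg.matrix_power(lazy, 101))  # ~ [[0.5, 0.5], [0.5, 0.5]]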
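An empirical check of π_v = d(v)/2m on a made-up 4-vertex graph (a triangle with a pendant vertex, so m = 4): long-run visit frequencies approach degree over 2m.

    # Random walk on an undirected graph: visit frequency -> d(v)/2m.
    import random

    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}   # non-bipartite, m = 4
    m = sum(len(ns) for ns in adj.values()) // 2

    steps, visits, u = 200_000, {v: 0 for v in adj}, 0
    for _ in range(steps):
        u = random.choice(adj[u])        # move to a uniformly chosen neighbor
        visits[u] += 1

    for v in adj:
        print(v, round(visits[v] / steps, 3), "vs pi =", len(adj[v]) / (2 * m))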
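A simulation of the adjacency lemma on the same made-up graph: for the pendant edge (2, 3) the commute time is exactly 2m = 8, so the bound is tight there.

    # Empirical commute time between adjacent vertices vs the 2m bound.
    import random

    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    m = 4

    def hitting_time(src, dst, trials=5000):
        total = 0
        for _ in range(trials):
            u = src
            while u != dst:
                u = random.choice(adj[u])
                total += 1
        return total / trials

    commute = hitting_time(2, 3) + hitting_time(3, 2)
    print(commute, "<=", 2 * m)          # ~ 8 <= 8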
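Rough empirical cover times for the clique and line examples (sizes and trial counts are arbitrary choices): the clique behaves like coupon collection, the line like a quadratic-time walk.

    # Estimate cover times of the clique vs the line (path) on n vertices.
    import random

    def cover_time(adj, start, trials=200):
        total = 0
        for _ in range(trials):
            u, seen, t = start, {start}, 0
            while len(seen) < len(adj):
                u = random.choice(adj[u])
                seen.add(u)
                t += 1
            total += t
        return total / trials

    n = 30
    clique = {v: [w for w in range(n) if w != v] for v in range(n)}
    line = {v: [w for w in (v - 1, v + 1) if 0 <= w < n] for v in range(n)}
    print("clique:", cover_time(clique, 0))   # about n*H_n ~ 120
    print("line:  ", cover_time(line, 0))     # about (n-1)^2 ~ 840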

