MIT 6.856J: Markov Chains

Markov chains:
• Powerful tool for sampling from complicated distributions.
• Rely only on local moves to explore the state space.
• Many use Markov chains to model events that arise in nature.
• We create Markov chains to explore and sample from problems.

2SAT:
• Fix some satisfying assignment A. (The algorithm repeatedly picks an unsatisfied clause and flips a uniformly random one of its two variables; each flip matches A with probability at least 1/2, and the analysis below pessimistically assumes exactly 1/2.)
• Let f(k) be the expected time to get all n variables to match A if k currently match.
• Then f(n) = 0, f(0) = 1 + f(1), and f(k) = 1 + (1/2)(f(k + 1) + f(k − 1)).
• Rewrite: f(0) − f(1) = 1 and f(k) − f(k + 1) = 2 + (f(k − 1) − f(k)).
• So f(k) − f(k + 1) = 2k + 1.
• Deduce f(0) = 1 + 3 + · · · + (2n − 1) = n². (A numeric check appears at the end of this section.)
• So the walk finds a satisfying assignment with probability 1/2 within 2n² steps (Markov's inequality).
• With high probability, it finds one in O(n² log n) time.

More general formulation: Markov chain
• State space S.
• The Markov chain begins in a start state X_0 and moves from state to state, so the output of the chain is a sequence of states X_0, X_1, . . . = {X_t}, t = 0, 1, 2, . . .
• Movement is controlled by the matrix of transition probabilities p_ij = probability the next state will be j given the current state is i.
• Thus Σ_j p_ij = 1 for every i ∈ S.
• Implicit in the definition is the memorylessness property: Pr[X_{t+1} = j | X_0 = i_0, X_1 = i_1, . . . , X_t = i] = Pr[X_{t+1} = j | X_t = i] = p_ij.
• The initial state X_0 can come from any probability distribution, or might be fixed (a trivial probability distribution).
• A distribution for X_0 leads to a distribution over sequences {X_t}.
• Suppose X_t has distribution q (a row vector, with q_i the probability of state i). Then X_{t+1} has distribution qP. Why? (qP)_j = Σ_i q_i p_ij. (A small sketch of this evolution appears at the end of this section.)
• Observe Pr[X_{t+r} = j | X_t = i] = (P^r)_ij.

Graph of MC:
• A vertex for every state.
• An edge (i, j) if p_ij > 0, with edge weight p_ij.
• Weighted outdegree 1 at every vertex.
• Possible state sequences are paths through the graph.

Stationary distribution:
• A π such that πP = π.
• A left eigenvector of P, with eigenvalue 1.
• Steady-state behavior of the chain: if it is in the stationary distribution, it stays there.
• Note a draw from the stationary distribution is a sample from the state space, so if we can get the right stationary distribution, we can sample.
• Lots of chains have one; to say which, we need some definitions. Things to rule out:
  – infinite directed line (no stationary distribution)
  – 2-cycle (periodic: it has a stationary distribution but never converges to it)
  – disconnected graph (multiple stationary distributions)

Irreducibility:
• Any state can reach any other state,
• i.e. there is a path between any two states,
• i.e. the graph is a single strong component.

Persistence/Transience:
• r_ij(t) is the probability of first hitting state j at time t, given start state i.
• f_ij = Σ_t r_ij(t) is the probability of eventually reaching j from i.
• The expected time to reach j from i is the hitting time h_ij = Σ_t t · r_ij(t).
• If f_ij < 1 then h_ij = ∞, since the chain might never reach j. The converse is not always true.
• If f_ii < 1, the state is transient. Else persistent. If persistent with h_ii = ∞, null persistent.

Persistence in finite graphs:
• The graph decomposes into strong components.
• A final strong component has no outgoing edges.
• Nonfinal components:
  – Once the chain leaves a nonfinal component, it cannot return.
  – If in a nonfinal component, there is nonzero probability of leaving within n steps.
  – So the chain is guaranteed to leave eventually.
  – So vertices in nonfinal components are transient.
• Final components:
  – Once in a final component, the chain stays in that component.
  – Any two vertices in the same strong component have paths between them,
  – so there is nonzero probability of reaching one from the other in (say) n steps.
  – So vertices in final components are persistent.
  – The time to return is dominated by a geometric distribution, so its expectation is finite: not null-persistent.
• Conclusion:
  – In a finite chain, there are no null-persistent states.
  – In a finite irreducible chain, all states are non-null persistent (no transient states).
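To ground the 2SAT calculation, here is a minimal Python check (the helper name f_values is mine): it builds f from the telescoped differences f(k) − f(k+1) = 2k + 1 and confirms that the original recurrence holds and that f(0) = n².

    # Verify: with f(n) = 0 and f(k) - f(k+1) = 2k + 1, the recurrence
    # f(k) = 1 + (f(k+1) + f(k-1))/2 holds and f(0) = n^2.
    def f_values(n):
        f = [0.0] * (n + 1)
        for k in range(n - 1, -1, -1):
            f[k] = f[k + 1] + 2 * k + 1
        return f

    for n in (5, 10, 50):
        f = f_values(n)
        assert f[0] == n * n                      # f(0) = n^2
        assert f[0] == 1 + f[1]                   # boundary case
        for k in range(1, n):                     # interior recurrence
            assert abs(f[k] - (1 + 0.5 * (f[k + 1] + f[k - 1]))) < 1e-9
    print("f(0) = n^2 verified for n = 5, 10, 50")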
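Here is a small Python sketch of how distributions evolve under P, using a made-up 3-state transition matrix (the matrix is an illustrative assumption, not from the notes): if X_t has distribution q then X_{t+1} has distribution qP, and the r-step probabilities are the entries of P^r.

    # Distribution evolution: q -> qP per step; r-step probabilities = P^r.
    import numpy as np

    P = np.array([[0.5,  0.5,  0.0 ],   # each row sums to 1
                  [0.25, 0.5,  0.25],
                  [0.0,  0.5,  0.5 ]])

    q = np.array([1.0, 0.0, 0.0])       # X_0 fixed at state 0
    for _ in range(5):
        q = q @ P                       # distribution of the next state
    print("distribution of X_5:", q)

    # Same thing via matrix powers: row i of P^r is Pr[X_{t+r} = . | X_t = i].
    print("row 0 of P^5:      ", np.linalg.matrix_power(P, 5)[0])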
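Continuing with the same made-up matrix, a sketch of computing the stationary distribution as a left eigenvector with eigenvalue 1 (one of several ways to find π):

    # Stationary distribution: solve pi P = pi, a left eigenvector of P.
    import numpy as np

    P = np.array([[0.5,  0.5,  0.0 ],
                  [0.25, 0.5,  0.25],
                  [0.0,  0.5,  0.5 ]])

    vals, vecs = np.linalg.eig(P.T)         # eigenvectors of P^T = left eigenvectors of P
    i = int(np.argmin(np.abs(vals - 1.0)))  # pick the eigenvalue 1
    pi = np.real(vecs[:, i])
    pi = pi / pi.sum()                      # scale to a probability distribution
    print("pi   =", pi)                     # (0.25, 0.5, 0.25) for this P
    print("pi P =", pi @ P)                 # unchanged: pi is stationary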
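The strong-component argument above can be checked mechanically. A sketch on a hypothetical 4-state chain (graph and function names are mine): a state is persistent exactly when every state it can reach can reach it back, i.e. its strong component is final.

    # Classify states of a finite chain as transient or persistent.
    def reachable(adj, s):
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    def classify(adj):
        # persistent iff the state's strong component is final
        return {i: ("persistent"
                    if all(i in reachable(adj, j) for j in reachable(adj, i))
                    else "transient")
                for i in adj}

    # Hypothetical chain: 0 -> 1 -> 2 <-> 3; the final component is {2, 3}.
    print(classify({0: [1], 1: [2], 2: [3], 3: [2]}))
    # {0: 'transient', 1: 'transient', 2: 'persistent', 3: 'persistent'}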
Periodicity:
• The period of a state is the largest T such that the chain can be at that state only at times a + Ti (integer i).
• The chain is periodic if some state has period > 1.
• In the graph, all cycles containing the state have length a multiple of T.
• Easy to eliminate: add self-loops.
• This slows the chain down, but it is otherwise the same. (A small demonstration appears at the end of this section.)

Ergodic:
• Aperiodic and non-null persistent.
• Means the chain might be in the state at any time in the (sufficiently far) future.

Fundamental Theorem of Markov chains: any irreducible, finite, aperiodic Markov chain satisfies:
• All states are ergodic (reachable at any time sufficiently far in the future).
• There is a unique stationary distribution π, with all π_i > 0.
• f_ii = 1 and h_ii = 1/π_i.
• The number of visits to i in t steps approaches tπ_i in the limit of large t.
We justify all except uniqueness here. Finite, irreducible, and aperiodic implies ergodic (since finite irreducible implies non-null persistent).

Intuitions for the quantities:
• h_ii is the expected return time.
• So the chain hits i once every h_ii steps on average, i.e. a 1/h_ii fraction of the time.
• So h_ii = 1/π_i.
• If started in the stationary distribution, the tπ_i visit count follows from linearity of expectation.

Random walks on undirected graphs:
• General Markov chains are directed graphs, but undirected graphs have some very nice properties.
• Take a connected, non-bipartite undirected graph on n vertices with m edges; the states are the vertices.
• Move to a uniformly chosen neighbor.
• So p_uv = 1/d(u) for every neighbor v of u.
• Stationary distribution: π_v = d(v)/2m.
• Uniqueness says this is the only one.
• Deduce h_vv = 2m/d(v). (An empirical check appears at the end of this section.)

Definitions:
• The hitting time h_uv is the expected time to reach v from u (as before).
• The commute time is h_uv + h_vu.
• C_u(G) is the expected time to visit all vertices of G, starting at u.
• The cover time is max_u C_u(G) (so in fact it is the max over any starting distribution).
• Let's analyze the max cover time.

Examples:
• Clique: commute time Θ(n), cover time Θ(n log n) (coupon collection).
• Line: the commute time between the two ends is Θ(n²).
• Lollipop (a clique on n/2 vertices with a path of n/2 vertices attached): with u in the clique and v at the far end of the path, h_uv = Θ(n³) while h_vu = Θ(n²) (a big difference!).
• Also note: the lollipop is the line with edges added, yet it has a higher cover time: adding edges can increase the cover time even though it improves connectivity.

General graphs, adjacent vertices:
• Lemma: for adjacent (u, v), h_uv + h_vu ≤ 2m.
• Proof: build a new Markov chain whose state is the directed edge just traversed by the vertex chain.
  – Its transition matrix is doubly stochastic: column sums are 1 (exactly d(v) edges can transit to edge (v, w), and each does so with probability 1/d(v)).
  – In the homework, we show such matrices have the uniform stationary distribution.
  – Deduce π_e = 1/2m over the 2m directed edges. Thus h_ee = 2m.
• Now suppose the original chain is at vertex v:
  – Suppose it arrived via edge (u, v).
  – It is expected to traverse (u, v) again within 2m steps.
  – At that point it will have gone from v to u and back to v, i.e. commuted.
  – So, conditioning on the arrival edge, the expected commute time is at most 2m (thanks to memorylessness).

General graph cover time:
• Theorem: the cover time is O(mn).
• Proof: find a spanning tree and traverse it in depth-first order, crossing each of its n − 1 edges once in each direction; the cover time is at most the sum of the commute times over the tree edges, each at most 2m by the lemma, for a total of O(mn).
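A quick demonstration of eliminating periodicity with self-loops, as promised above (the specific mix (P + I)/2 is my choice of how to add them): the 2-cycle never converges, while its self-loop version does, and both fix the same stationary distribution.

    # Self-loops eliminate periodicity: compare a 2-cycle with a lazy version.
    import numpy as np

    P = np.array([[0.0, 1.0],
                  [1.0, 0.0]])            # period 2: alternates forever
    lazy = 0.5 * (P + np.eye(2))          # stay put with probability 1/2

    print(np.linalg.matrix_power(P, 101))     # still a permutation matrix
    print(np.linalg.matrix_power(lazy, 101))  # ~ [[0.5, 0.5], [0.5, 0.5]]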
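An empirical check of π_v = d(v)/2m on a made-up 4-vertex graph (a triangle with a pendant vertex, so m = 4): long-run visit frequencies approach degree over 2m.

    # Random walk on an undirected graph: visit frequency -> d(v)/2m.
    import random

    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}   # non-bipartite, m = 4
    m = sum(len(ns) for ns in adj.values()) // 2

    steps, visits, u = 200_000, {v: 0 for v in adj}, 0
    for _ in range(steps):
        u = random.choice(adj[u])        # move to a uniformly chosen neighbor
        visits[u] += 1

    for v in adj:
        print(v, round(visits[v] / steps, 3), "vs pi =", len(adj[v]) / (2 * m))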
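A simulation of the adjacency lemma on the same made-up graph: for the pendant edge (2, 3) the commute time is exactly 2m = 8, so the bound is tight there.

    # Empirical commute time between adjacent vertices vs the 2m bound.
    import random

    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    m = 4

    def hitting_time(src, dst, trials=5000):
        total = 0
        for _ in range(trials):
            u = src
            while u != dst:
                u = random.choice(adj[u])
                total += 1
        return total / trials

    commute = hitting_time(2, 3) + hitting_time(3, 2)
    print(commute, "<=", 2 * m)          # ~ 8 <= 8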
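Rough empirical cover times for the clique and line examples (sizes and trial counts are arbitrary choices): the clique behaves like coupon collection, the line like a quadratic-time walk.

    # Estimate cover times of the clique vs the line (path) on n vertices.
    import random

    def cover_time(adj, start, trials=200):
        total = 0
        for _ in range(trials):
            u, seen, t = start, {start}, 0
            while len(seen) < len(adj):
                u = random.choice(adj[u])
                seen.add(u)
                t += 1
            total += t
        return total / trials

    n = 30
    clique = {v: [w for w in range(n) if w != v] for v in range(n)}
    line = {v: [w for w in (v - 1, v + 1) if 0 <= w < n] for v in range(n)}
    print("clique:", cover_time(clique, 0))   # about n*H_n ~ 120
    print("line:  ", cover_time(line, 0))     # about (n-1)^2 ~ 840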

