New version page

# UW-Madison CS 760 - Markov Chain Monte Carlo

Pages: 25
Documents in this Course
Unformatted text preview:

Markov Chain Monte CarloProf. David Pagetranscribed by Matthew G. LeeMarkov Chain• A Markov chain includes– A set of states– A set of associated transition probabilities• For every pair of states s and s’ (not necessarily distinct) we have an associated transition probability T(sÎs’) of moving from state s to state s’• For any time t, T(sÎs’) is the probability of the Markov process being in state s’ at time t+1 given that it is in state s at time tSome Properties of Markov Chains(Some we’ll use, some you may hear used elsewhere and want to know about)• Irreducible chain: can get from any state to any other eventually (non-zero probability)• Periodic state: state i is periodic with period k if all returns to i must occur in multiples of k• Ergodic chain: irreducible and has an aperiodic state. Implies all states are aperiodic, so chain is aperiodic.• Finite state space: can represent chain as matrix of transition probabilities… then ergodic = regular…• Regular chain: some power of chain has only positive elements• Reversible chain: satisfies detailed balance (later)Sufficient Condition for Regularity• A Markov chain is regular if the following properties both hold:1. For any pair of states s, s’ that each have nonzero probability there exists some path from s to s’ with nonzero probability2. For all s with nonzero probability, the “self loop” probability T(sÎs) is nonzero• Gibbs sampling is regular if no zeroes in CPTsExamples of Markov Chains (arrows denote nonzero-probability transitions)• Regular • Non-regularSampling of Random Variables Defines a Markov Chain• A state in the Markov chain is an assignment of values to all random variablesExample• Each of the four large ovals is a state• Transitions correspond to a Gibbs samplerY1= F Y2=TY1= T Y2=TY1= F Y2=FY1= T Y2=FBayes Net for which Gibbs Sampling is a Non-Regular Markov Chain0FF1TF1FT0TTP(C)BAA BC.xx...P(A).yy...P(B)The Markov chain defined by Gibbs sampling has eight states, each an assignment to the three Boolean states A, B, and C. It is impossible to go from the state A=T, B=T, C=F to any other stateNotation: States• yiand yi’ denote assignments of values to the random variable Yi• We abbreviate Yi=yiby yi• y denotes the state of assignments y=(y1,y2,...,yn)• uiis the partial description of a state given by Yj=yjfor all j not equal to i, or (y1,y2,...,yi-1,yi+1...,yn)• Similarly, y’ =(y1’,y2’,...,yn’) and ui’=(y1’,y2’,...,yi-1’,yi+1’...,yn’)Notation: Probabilities• πt(y) = probability of being in state y at time t• Transition function T(yÎy’) = probability of moving from state y to state y’Bayesian Network Probabilities• We use P to denote probabilities according to our Bayesian network, conditioned on the evidence– For example, P(yi’|ui) is the probability that random variable Yihas value yi’ given that Yj=yjfor all j not equal to iAssumption: CPTs nonzero• We will assume that all probabilities in all conditional probability tables are nonzero• So, for any y,• So, for any event S, • So, for any events S1and S2, 0])[|()(1>∈∀=∏=nijiiParentsjyyPyP0)()( >=∑∈SyyPSP0)()()|(22121>∩=SPSSPSSPGibbs Sampler Markov Chain• We assume we have already chosen to sample variable Yi–T(ui,yiÎui,yi’) = P(yi’|ui)• If we want to incorporate the probability of randomly uniformly choosing a variable to sample, simply multiply all transition probabilities by 1/nGibbs Sampler Markov Chain is Regular• Path from y to y’ with Nonzero Probability:–Let n be the number of variables in the Bayes net.– For step i = 1 to n :– Set variable Yito yi’ and leave other variables the same. That is, go from (y1’,y2’,...,yi-1’,yi,yi+1,...,yn) to (y1’,y2’,...,yi-1’,yi’,yi+1,...,yn)– The probability of this step isP(yi’|y1’,y2’,...,yi-1’,yi+1,...,yn), which is nonzero• So all steps, and thus the path, has nonzero probability• Self loop T(yÎy) has probability P(yi|ui) > 0How π Changes with Time in a Markov Chain• πt+1(y’) = • A distribution πtis stationary if πt= πt+1, that is, for all y, πt(y) = πt+1(y)∑→yyyy ))T((t'πDetailed Balance• A Markov chain satisfies detailed balance if there exists a unique distribution π such that for all states y, y’,π(y)T(yÎy’) = π(y’)T(y’Îy)• If a regular Markov chain satisfies detailed balance with distribution π, then there exists tsuch that for any initial distribution π0, πt= π• Detailed balance (with regularity) implies convergence to unique stationary distributionExamples of Markov Chains (arrows denote nonzero-probability transitions)• Regular, Detailed Balance (with appropriate π and T) ÎConverges to Stationary Distribution• Detailed Balance with πon nodes and T on arcs. Does not converge because not regular1/61/32/31/31/32/31/61/32/31/31/32/3Gibbs Sampler satisfies Detailed BalanceClaim: A Gibbs sampler Markov chain defined by a Bayesian network with all CPT entries nonzero satisfies detailed balance with probability distribution π(y)=P(y) for all states yProof: First we will show that P(y)T(yÎy’) = P(y’)T(y’Îy). Then we will show that no other probability distribution πsatisfies π(y)T(yÎy’) = π(y’)T(y’Îy)Gibbs Sampler satisfies Detailed Balance, Part 1P(y)T(yÎy’) = P(yi,ui)P(yi’|ui) (Gibbs Sampler Def.)= P(yi|ui)P(ui)P(yi’|ui) (Chain Rule)= P(yi’,ui)P(yi|ui) (Reverse Chain Rule)= P(y’)T(y’Îy) (Gibbs Sampler Def.)Gibbs Sampler Satisfies Detailed Balance, Part 2Since all CPT entries are nonzero, the Markov chain is regular. Suppose there exists a probability distribution π not equal to P such that π(y)T(yÎy’) = π(y’)T(y’Îy). Without loss of generality, there exists some state y such that π(y) > P(y). So, for every neighbor y’ of y, that is, every y’ such that T(yÎy’) is nonzero,π(y’)T(y’Îy) = π(y)T(yÎy’) > P(y)T(yÎy’) = P(y’)T(y’Îy)So π(y’) > P(y’).Gibbs Sampler Satisfies Detailed Balance, Part 3We can inductively see that π(y’’) > P(y’’) for every state y’’ path-reachable from y with nonzero probability. Since the Markov chain is regular, π(y’’) > P(y’’) for all states y’’ with nonzero probability. But the sum over all states y’’ of π(y’’) is 1, and the sum over all states y’’of P(y’’) is 1. This is a contradiction. So we can conclude that P is the unique

View Full Document Unlocking...