CMU CS 10708 - Lecture

Approximate Inference: Markov Chain Monte Carlo
Probabilistic Graphical Models (10-708)
Lecture 19, Nov 26, 2007
Eric Xing

[Figure: a signalling-pathway graphical model with nodes Receptor A, Receptor B, Kinase C, Kinase D, Kinase E, TF F, Gene G, Gene H, labelled X1-X8.]

Reading: J-Chap. 1, KF-Chap. 11

Summary: Monte Carlo Methods
- Direct sampling: very difficult to populate a high-dimensional state space.
- Rejection sampling: create samples as in direct sampling, but count only the samples that are consistent with the given evidence.
- Likelihood weighting, ...: sample variables and compute an evidence weight, so that only samples supporting the evidence are created.
- Markov chain Monte Carlo (MCMC): Metropolis-Hastings, Gibbs.

Rao-Blackwellised sampling
- Sampling in high-dimensional spaces causes high variance in the estimate.
- RB idea: sample some variables X_p and, conditional on those, compute the expected value of the rest, X_d, analytically:

    E_p[f(X)] = ∫∫ f(x_p, x_d) p(x_p, x_d | e) dx_p dx_d
              = ∫ p(x_p | e) [ ∫ f(x_p, x_d) p(x_d | x_p, e) dx_d ] dx_p
              ≈ (1/M) Σ_m E[f(x_p^(m), X_d) | x_p^(m), e],   with x_p^(m) ~ p(x_p | e)

- This has lower variance, because of the identity (the law of total variance):

    var[τ(X_p, X_d)] = var[ E[τ(X_p, X_d) | X_p] ] + E[ var[τ(X_p, X_d) | X_p] ]

- Hence var[ E[τ(X_p, X_d) | X_p] ] ≤ var[τ(X_p, X_d)], so τ(x_p) = E[f(x_p, X_d) | x_p] is a lower-variance estimator.

Markov chain Monte Carlo (MCMC)
- Importance sampling does not scale well to high dimensions, and Rao-Blackwellisation is not always possible; MCMC is an alternative.
- Construct a Markov chain whose stationary distribution is the target density P(X | e).
- Run for T samples (the burn-in time) until the chain converges / mixes / reaches its stationary distribution, then collect M (correlated) samples x^(m).
- Key issues: designing proposals so that the chain mixes rapidly, and diagnosing convergence.
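The Rao-Blackwellisation variance reduction above can be illustrated with a small sketch. The model and numbers here are hypothetical (a pair of Bernoulli variables, not from the lecture): the plain estimator samples both X_p and X_d, while the RB estimator samples only X_p and substitutes E[X_d | X_p] analytically.

```python
import random

# Toy model (hypothetical, for illustration): X_p ~ Bernoulli(0.5),
# X_d | X_p ~ Bernoulli(0.2 + 0.6 * X_p), and we estimate E[f(X)] with f = X_d.

def plain_estimate(rng, m=50):
    # Plain Monte Carlo: sample both X_p and X_d, average f = X_d.
    total = 0.0
    for _ in range(m):
        xp = 1 if rng.random() < 0.5 else 0
        xd = 1 if rng.random() < 0.2 + 0.6 * xp else 0
        total += xd
    return total / m

def rb_estimate(rng, m=50):
    # Rao-Blackwellised: sample only X_p, use E[X_d | X_p] analytically.
    total = 0.0
    for _ in range(m):
        xp = 1 if rng.random() < 0.5 else 0
        total += 0.2 + 0.6 * xp          # E[X_d | X_p = xp]
    return total / m

rng = random.Random(0)
reps = 2000
plain = [plain_estimate(rng) for _ in range(reps)]
rb = [rb_estimate(rng) for _ in range(reps)]

def variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

var_plain, var_rb = variance(plain), variance(rb)
# Both estimators target E[X_d] = 0.5; by the total-variance identity the
# RB estimator's variance is strictly smaller.
```

Repeating each estimator 2000 times and comparing empirical variances makes the inequality from the identity directly visible.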
Markov Chains
- Definition: given an n-dimensional state space, let the random vector X = (x_1, ..., x_n), and write x^(t) for the state at time-step t.
- x^(t) transitions to x^(t+1) with probability P(x^(t+1) | x^(t), ..., x^(1)) = T(x^(t+1) | x^(t)) = T(x^(t) → x^(t+1)).
- Homogeneous: the chain is determined by the initial state x^(0) and a fixed transition kernel T (whose rows sum to 1).
- Equilibrium: π(x) is a stationary (equilibrium) distribution if π(x') = Σ_x π(x) T(x → x'), i.e., π is a left eigenvector of the transition matrix: π^T T = π^T.

[Figure: a three-state chain on X1, X2, X3 with its 3x3 transition matrix; the labelled edge probabilities include 0.25, 0.7, 0.5, 0.5, 0.75, and 0.3.]

Markov Chains (properties)
- An MC is irreducible if its transition graph is connected.
- An MC is aperiodic if it is not trapped in cycles.
- An MC is ergodic (regular) if you can get from any state x to any state x' in a finite number of steps.
- Detailed balance: p^(t)(x^(t)) T(x^(t-1) | x^(t)) = p^(t-1)(x^(t-1)) T(x^(t) | x^(t-1)).
- Summing over x^(t-1) gives p^(t)(x^(t)) = Σ_{x^(t-1)} p^(t-1)(x^(t-1)) T(x^(t) | x^(t-1)).
- Detailed balance ⇒ a stationary distribution exists.

Metropolis-Hastings
- Treat the target distribution as the stationary distribution.
- Sample from an easier proposal distribution, followed by an acceptance test; this induces a transition matrix that satisfies detailed balance.
- MH proposes moves according to Q(x' | x) and accepts samples with probability A(x' | x).
- The induced transition matrix is T(x → x') = Q(x' | x) A(x' | x).
- Detailed balance means π(x) Q(x' | x) A(x' | x) = π(x') Q(x | x') A(x | x').
- Hence the acceptance ratio is

    A(x' | x) = min( 1, [π(x') Q(x | x')] / [π(x) Q(x' | x)] )
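The detailed-balance and stationarity claims above can be checked numerically on a small chain. The 3x3 kernel below uses made-up numbers (not the matrix from the slide's figure): it satisfies detailed balance with respect to π = (0.25, 0.5, 0.25), π is left-invariant under T, and power iteration from any start converges to π.

```python
# A small reversible chain (hypothetical numbers, not the slide's matrix).
T = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]

def step(mu, T):
    # One application of the kernel: mu'(x') = sum_x mu(x) T(x -> x').
    n = len(T)
    return [sum(mu[x] * T[x][xp] for x in range(n)) for xp in range(n)]

mu = [1.0, 0.0, 0.0]           # arbitrary starting distribution
for _ in range(200):           # power iteration toward the stationary distribution
    mu = step(mu, T)

pi = [0.25, 0.5, 0.25]
invariant = step(pi, T)        # pi T should reproduce pi (left eigenvector, eigenvalue 1)

# Detailed balance: pi(x) T(x -> x') == pi(x') T(x' -> x) for all pairs.
balanced = all(
    abs(pi[i] * T[i][j] - pi[j] * T[j][i]) < 1e-12
    for i in range(3) for j in range(3)
)
```

This is the "detailed balance ⇒ stationary distribution exists" argument in executable form: summing the balance condition over the source state recovers π T = π.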
Metropolis-Hastings algorithm
1. Initialize x^(0).
2. While not mixing:   // burn-in
   - x = x^(t); t += 1
   - sample u ~ Unif(0, 1)
   - sample x* ~ Q(x* | x)
   - if u < A(x* | x) = min( 1, [π(x*) Q(x | x*)] / [π(x) Q(x* | x)] ):
       x^(t) = x*      // transition
     else:
       x^(t) = x       // stay in current state
3. Reset t = 0; for t = 1:N, x^(t+1) ← Draw-sample(x^(t)), where Draw-sample is one pass of the loop body above.

Mixing time
- The ε-mixing time T_ε is the minimal number of steps (from any starting distribution) until D_var(P^(T), π) ≤ ε, where D_var is the variational distance between the two distributions:

    D_var(μ1, μ2) = sup_{A ⊆ S} |μ1(A) − μ2(A)|

- Chains with low-bandwidth (low-conductance) regions of the state space take a long time to mix.
- This arises for GMs with deterministic or highly skewed potentials.

[Figure: an example graphical model on X1-X7.]

MCMC example
- Proposal: q(x* | x) ~ N(x^(i), 100)
- Target: p(x) ∝ 0.3 exp(−0.2 x^2) + 0.7 exp(−0.2 (x − 10)^2)

Summary of MH
- A random walk through the state space; multiple chains can be simulated in parallel.
- Much hinges on the proposal distribution Q: we want to visit the parts of the state space where p(X) puts mass, and we want A(x* | x) to be high in the modes of p(X), so that the chain mixes well.
- Convergence diagnosis: how can we tell when burn-in is over? Run multiple chains from different starting conditions and wait until they start "behaving similarly"; various heuristics have been proposed.
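The MCMC example above can be run directly. The target and proposal are the slide's; the step count, seed, and burn-in length are choices made here for illustration. Since the Gaussian random-walk proposal is symmetric, Q cancels and the acceptance probability reduces to min(1, p(x*)/p(x)), so the unnormalised density suffices.

```python
import math
import random

def p_tilde(x):
    # Unnormalised target from the slide: a bimodal mixture with modes at 0 and 10.
    return 0.3 * math.exp(-0.2 * x * x) + 0.7 * math.exp(-0.2 * (x - 10.0) ** 2)

def metropolis_hastings(n_steps, x0=0.0, seed=0):
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        x_star = rng.gauss(x, 10.0)           # proposal N(x, 100): sd = 10
        if rng.random() < min(1.0, p_tilde(x_star) / p_tilde(x)):
            x = x_star                        # accept the move
        samples.append(x)                     # on rejection, the current state repeats
    return samples

samples = metropolis_hastings(20000)
kept = samples[2000:]                         # discard burn-in
frac_right_mode = sum(1 for s in kept if s > 5.0) / len(kept)
mean = sum(kept) / len(kept)
# Both components have the same width, so about 70% of the mass sits in the
# mode at x = 10 and the overall mean is near 7.
```

Because the proposal's standard deviation (10) matches the distance between modes, the chain hops between them easily; with a much narrower proposal this target becomes exactly the slow-mixing, low-conductance situation described above.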
Gibbs sampling
- Gibbs sampling is an MCMC algorithm that is especially appropriate for inference in graphical models.
- The procedure: we have a variable set X = {x_1, x_2, ..., x_N} for a GM.
- At each step, one of the variables X_i is selected (at random or according to some fixed sequence); denote the remaining variables as X_{-i} and their current values as x_{-i}^(t-1).
- Using the "alarm network" as an example: say at time t we choose X_E, and we denote the current value assignments of the remaining variables, X_{-E}, obtained from previous samples, as

    x_{-E}^(t-1) = { x_B^(t-1), x_A^(t-1), x_J^(t-1), x_M^(t-1) }

- The conditional distribution p(X_i | x_{-i}^(t-1)) is computed.
- A value x_i^(t) is sampled from this distribution; the sample x_i^(t) replaces the previous sampled value of X_i in X, i.e., x^(t) = x_i^(t) ∪ x_{-i}^(t-1).

Markov Blanket
- MB in a BN: a variable is independent of the others given its parents, its children, and its children's parents (d-separation).
- MB in an MRF: a variable is independent of all its non-neighbours given all its direct neighbours.
- ⇒ p(X_i | X_{-i}) = p(X_i | MB(X_i)).
- Gibbs sampling: at every step, choose one variable and sample it from P(X | MB(X)) based on the previous sample.

Gibbs sampling of the alarm network
- Goal: calculate P(J | B=1, M=1).
- Choose (B=1, E=0, A=1, M=1, J=1) as a start; the evidence is B=1, M=1, and the free variables are A, E, J.
- Choose the next variable as A; sample A from P(A | MB(A)) = P(A | B=1, E=0, M=1, J=1); suppose it comes out false:
  (B=1, E=0, A=0, M=1, J=1).
- Choose the next variable as E; sample E ~ P(E | B=1, A=0).
- ...
- Here MB(A) = {B, E, J, M} and MB(E) = {A, B}.

Gibbs sampling as MH
- Gibbs sampling is a special case of MH.
- The transition matrix updates each node one at a time, using the ...
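The Gibbs procedure can be sketched on a minimal model. Instead of the alarm network (whose CPTs are not given in this preview), the sketch below uses a hypothetical two-variable joint specified as a table; the full conditionals p(x_i | x_{-i}) are then just renormalised slices of that table, and alternating the two conditional draws makes the empirical marginal converge to the exact one.

```python
import random

# Hypothetical joint over two binary variables (numbers chosen for illustration).
# Exact marginal: p(x1 = 1) = 0.1 + 0.4 = 0.5.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def sample_conditional(rng, fixed_idx, fixed_val):
    # p(x_i | x_{-i}): renormalise the joint table along the free variable.
    if fixed_idx == 1:                       # sampling x1 given x2 = fixed_val
        w = [joint[(0, fixed_val)], joint[(1, fixed_val)]]
    else:                                    # sampling x2 given x1 = fixed_val
        w = [joint[(fixed_val, 0)], joint[(fixed_val, 1)]]
    return 1 if rng.random() < w[1] / (w[0] + w[1]) else 0

rng = random.Random(0)
x1, x2 = 0, 0                                # arbitrary starting state
count_x1 = 0
n_sweeps = 20000
for _ in range(n_sweeps):
    x1 = sample_conditional(rng, 1, x2)      # x1 ~ p(x1 | x2)
    x2 = sample_conditional(rng, 0, x1)      # x2 ~ p(x2 | x1)
    count_x1 += x1

marginal_x1 = count_x1 / n_sweeps            # estimate of p(x1 = 1)
```

In a graphical model the same code shape applies, except the conditional is computed from the Markov blanket, p(X_i | MB(X_i)), rather than from an explicit joint table.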

