CMU CS 10708 - Approximate Inference by Sampling

Approximate Inference by Sampling
Graphical Models – 10708
Ajit Singh, Carnegie Mellon University
November 10th, 2008
Readings: K&F 10.1, 10.2, 10.3 (Particle-Based Approximate Inference)

What you've learned so far
- VE & Junction Trees: exact inference; exponential in tree-width.
- Belief Propagation, Mean Field: approximate inference for marginals/conditionals; fast, but can give inaccurate estimates.
- Sample-based inference: approximate inference for marginals/conditionals; with "enough" samples, it converges to the right answer (or a high-accuracy estimate). (If you want to be cynical, replace "enough" with "ridiculously many.")

Goal
- Often we want expectations given samples x[1] … x[M] from a distribution P.
- For discrete random variables, a probability is the expectation of an indicator, so it can be estimated from the number of samples drawn from P(X) in which the event holds.

Forward Sampling
- Sample nodes in topological order.
- The assignment to a node's parents selects the distribution P(X | Pa(X)) to sample from.
- The end result is one sample from P(X); repeat to get more samples.

Multinomial Sampling
- Given an assignment to its parents, Xi is a multinomial random variable.

Sample-based probability estimates
- Given a set of M samples from P(X), we can estimate any probability, marginal or conditional, by counting records.
- Rejection sampling: once a sample and the evidence disagree, throw the sample away.
- Rare events: if the evidence is unlikely, i.e., P(E = e) is small, then the effective sample size for estimating P(X | E = e) is low.

Sample Complexity
- In many cases the probability estimate is a sum of indicator (Bernoulli) random variables: forward sampling for marginal probabilities, rejection sampling for conditional probabilities.
- The indicators are independent and identically distributed, so Chernoff bounds apply: an additive Chernoff bound controls absolute error, a multiplicative Chernoff bound controls relative error. Bound the right-hand side by δ and solve for M.
- Reducing relative error is hard if P(x) is small. Here P(x) can be replaced by any marginal or conditional estimated as a sum of iid Bernoulli indicators.
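As a minimal illustration of forward sampling, rejection, and count-based estimation, here is a Python sketch; the two-node Rain → WetGrass network, its CPT values, and the function names are illustrative assumptions rather than anything from the lecture.

    import random

    # Hypothetical two-node network: Rain -> WetGrass, encoded as CPTs.
    # cpts[var] maps an assignment of the parents (a tuple) to P(var = 1 | parents).
    parents = {"Rain": (), "WetGrass": ("Rain",)}
    cpts = {
        "Rain": {(): 0.2},                    # P(Rain = 1)
        "WetGrass": {(0,): 0.1, (1,): 0.9},   # P(WetGrass = 1 | Rain)
    }
    topo_order = ["Rain", "WetGrass"]         # a topological ordering of the DAG

    def forward_sample():
        """Sample every variable in topological order, conditioning on sampled parents."""
        sample = {}
        for var in topo_order:
            pa = tuple(sample[p] for p in parents[var])
            sample[var] = 1 if random.random() < cpts[var][pa] else 0
        return sample

    def estimate(query_var, query_val, evidence, m=100_000):
        """Rejection sampling: keep only samples that agree with the evidence,
        then estimate P(query | evidence) as a fraction of the kept samples."""
        kept = hits = 0
        for _ in range(m):
            s = forward_sample()
            if any(s[e] != v for e, v in evidence.items()):
                continue                      # sample disagrees with evidence: throw it away
            kept += 1
            hits += (s[query_var] == query_val)
        return hits / kept if kept else float("nan")

    # P(Rain = 1 | WetGrass = 1); exact value is 0.2*0.9 / (0.2*0.9 + 0.8*0.1) ≈ 0.692
    print(estimate("Rain", 1, {"WetGrass": 1}))

If the evidence were rare, most samples would be rejected and the kept count would be tiny; that is exactly the rare-event problem that motivates importance sampling below.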
Importance Sampling
Limitations of forward and rejection sampling:
- What if the evidence is a rare event? Either accept a low-accuracy estimate, or sample a lot more.
- What if the model has no topological ordering? Bayesian networks always have one, and tree Markov random fields have one, but arbitrary undirected graphical models may not, and then it is hard to sample from P(X) directly.
Importance sampling addresses these issues.

Importance Sampling
- We want to estimate P(X).
- Basic idea: pick a proposal Q(X) such that KL(P||Q) is small.
- Dominance: Q(x) > 0 whenever P(x) > 0.
- Sampling from Q is easier than sampling from P; this assumes it is easy to evaluate P(x).

Mutilated Proposal Q(X)
- Fix the distributions of the evidence nodes to their observed values, and cut edges so that observed nodes have no parents.
- We do not throw away samples (as rejection sampling does), so less work is wasted.
- The variance of the estimates decreases at a rate of 1/M.
- If Q is good, the variance of the estimates is lower than for forward or rejection sampling.

Importance Sampling
- Can be generalized to MRFs, where we can only easily evaluate unnormalized probabilities; Gibbs sampling is more common in undirected models.
- Importance sampling yields a priori bounds on the sample complexity.

Limitation of Forward Samplers
- Forward sampling, rejection sampling, and importance sampling are all forward samplers.
- Fixing an evidence node only allows it to directly affect its descendants.

Markov Blanket Approaches
- Forward samplers: compute the weight of Xi given the assignment to its ancestors in a topological ordering.
- Markov blanket samplers: compute the weight of Xi given the assignment to its Markov blanket.
(Figure: side-by-side illustration of a forward sampler and a Markov blanket sampler.)

Gibbs Sampling
We will focus on Gibbs sampling:
- The most common Markov blanket sampler.
- Works for directed and undirected models.
- Exploits independencies in graphical models.
- A common form of Markov chain Monte Carlo.

Gibbs Sampling (algorithm)
1. Let X be the non-evidence variables.
2. Generate an initial assignment x(0).
3. For t = 1..MAXITER:
   1. x(t) = x(t-1)
   2. For each Xi in X:
      1. ui = the values of the variables X - {Xi} in sample x(t)
      2. Compute P(Xi | ui)
      3. Sample xi(t) from P(Xi | ui)
      4. Set the value of Xi to xi(t) in x(t)
4. Samples are taken from x(0) … x(T).

Computing P(Xi | ui)
- The major task in designing a Gibbs sampler is deriving P(Xi | ui).
- Use conditional independence: Xi ⊥ Xj | MB(Xi) for all Xj in X - MB(Xi) - {Xi}, so P(Xi | ui) depends only on the assignment to Xi's Markov blanket and can be computed locally from the factors (or CPDs) that mention Xi.

(Simple) Image Segmentation
- Noisy grayscale image; label each pixel as on/off.
- Model using a pairwise MRF.
(Slides 18-19: figures illustrating Gibbs sampling on a small grid of pixels I1, I2, I3, I4.)

MAP by Sampling
- Generate a few samples from the posterior.
- For each Xi, take the majority assignment across samples as the MAP estimate.

Markov Chain Interpretation
- The state space consists of assignments to X.
- The P(xi | ui) are the transition probabilities; neighboring states differ in only one variable.
- Given the transition matrix, you could compute the exact stationary distribution, but it is typically impossible even to store the transition matrix.
- Gibbs sampling does not need to store the transition matrix!
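As a concrete sketch of the Gibbs sampler above, applied in the spirit of the pairwise-MRF segmentation example: the grid size, the Ising-style potentials, and the parameter values (COUPLING, DATA_WEIGHT) are illustrative assumptions, not taken from the slides.

    import math
    import random

    random.seed(0)

    H, W = 8, 8              # small image grid (assumed size)
    COUPLING = 1.0           # strength of agreement between neighboring labels
    DATA_WEIGHT = 1.5        # how strongly a label should match its noisy observation

    # Noisy "observations": left half mostly -1, right half mostly +1, with random flips.
    obs = [[(1 if c >= W // 2 else -1) * (1 if random.random() < 0.8 else -1)
            for c in range(W)] for r in range(H)]

    def neighbors(r, c):
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < H and 0 <= nc < W:
                yield nr, nc

    def gibbs(num_sweeps=200, burn_in=50, thin=5):
        """Gibbs sampling for the pairwise MRF
        P(x) proportional to exp(sum_edges COUPLING*x_i*x_j + sum_i DATA_WEIGHT*x_i*y_i),
        with x_i in {-1, +1}. Each pixel's full conditional depends only on its
        Markov blanket (its grid neighbors) and its own observation."""
        x = [[random.choice((-1, 1)) for _ in range(W)] for _ in range(H)]
        samples = []
        for t in range(num_sweeps):
            for r in range(H):
                for c in range(W):
                    field = COUPLING * sum(x[nr][nc] for nr, nc in neighbors(r, c))
                    field += DATA_WEIGHT * obs[r][c]
                    p_plus = 1.0 / (1.0 + math.exp(-2.0 * field))  # P(x_rc = +1 | Markov blanket)
                    x[r][c] = 1 if random.random() < p_plus else -1
            if t >= burn_in and (t - burn_in) % thin == 0:         # discard burn-in, thin the chain
                samples.append([row[:] for row in x])
        return samples

    samples = gibbs()
    # "MAP by sampling": per-pixel majority vote over the retained samples.
    segmentation = [[1 if sum(s[r][c] for s in samples) > 0 else -1
                     for c in range(W)] for r in range(H)]
    print(segmentation)

Each update touches only the pixel's Markov blanket, and the burn_in and thin parameters anticipate the convergence discussion below.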
Convergence
- Not all of the samples x(0) … x(T) are independent; consider how the estimate of one marginal P(xi | ui) evolves along the chain.
- Burn-in: discard the early samples, taken before the chain has (approximately) reached its stationary distribution.
- Thinning: keep only every k-th sample to reduce the correlation between retained samples.

What you need to know
Forward sampling approaches:
- Forward Sampling / Rejection Sampling: generate samples from P(X) or P(X | e).
- Likelihood Weighting / Importance Sampling: sampling where the evidence is rare; fixing the evidence variables (see the sketch below).
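To make the likelihood-weighting point concrete, here is a minimal sketch on the same hypothetical Rain → WetGrass network used in the forward-sampling sketch above: evidence nodes are clamped to their observed values and each sample is weighted by the probability of the evidence given its sampled parents. The network, CPT values, and function names are again illustrative assumptions.

    import random

    # Same hypothetical Rain -> WetGrass network as in the forward-sampling sketch.
    parents = {"Rain": (), "WetGrass": ("Rain",)}
    cpts = {
        "Rain": {(): 0.2},
        "WetGrass": {(0,): 0.1, (1,): 0.9},
    }
    topo_order = ["Rain", "WetGrass"]

    def prob(var, val, pa_assignment):
        """P(var = val | parents) from the CPT (binary variables)."""
        p1 = cpts[var][pa_assignment]
        return p1 if val == 1 else 1.0 - p1

    def likelihood_weighting(query_var, query_val, evidence, m=100_000):
        """Sample non-evidence nodes forward; fix evidence nodes to their observed
        values and multiply their likelihoods into the sample's weight."""
        num = den = 0.0
        for _ in range(m):
            sample, weight = {}, 1.0
            for var in topo_order:
                pa = tuple(sample[p] for p in parents[var])
                if var in evidence:
                    sample[var] = evidence[var]
                    weight *= prob(var, evidence[var], pa)   # weight the sample, don't reject it
                else:
                    sample[var] = 1 if random.random() < cpts[var][pa] else 0
            den += weight
            if sample[query_var] == query_val:
                num += weight
        return num / den

    # P(Rain = 1 | WetGrass = 1); exact value ≈ 0.692, as before.
    print(likelihood_weighting("Rain", 1, {"WetGrass": 1}))

Unlike the rejection-based estimate, every sample contributes here, which is why this approach copes better when the evidence is rare.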

