Approximate Inference by Sampling
Graphical Models – 10-708
Ajit Singh
Carnegie Mellon University
November 3rd, 2006
Readings: K&F 10.1, 10.2, 10.3
10-708 – Ajit Singh 2006

Approximate inference overview
- There are very many approximate inference algorithms for PGMs.
- We will focus on three representative ones:
  - sampling (today)
  - variational inference (continues next class)
  - loopy belief propagation and generalized belief propagation

Goal
- Often we want expectations given samples x[1] … x[M] from a distribution P:
  E_P[f(X)] ≈ (1/M) Σ_{m=1..M} f(x[m])

Forward Sampling
- Sample the nodes in topological order.

Forward Sampling (cont.)
- P(Y = y) ≈ #(Y = y) / N
- P(Y = y | E = e) ≈ #(Y = y, E = e) / #(E = e)
- Rejection sampling: throw away samples that do not match the evidence.
- Sample efficiency: how often do we expect to see a record with E = e?

Idea
- What if we just fix the value of the evidence nodes?
- What is the expected number of records with (Intelligence = Low)?

Likelihood Weighting
- Fix each evidence node to its observed value; weight each sample by the likelihood of the evidence given the sampled values of its parents.

Importance Sampling
- What if you cannot easily sample from P?
  - Posterior distribution on a Bayesian network: P(Y = y | E = e) where the evidence itself is a rare event.
  - Sampling from a Markov network with cycles is always hard (see Homework 4).
- Pick some distribution Q(X) that is easier to sample from.
- Assume that if P(x) > 0 then Q(x) > 0.
- Hopefully D(P||Q) is small.

Importance Sampling (cont.)
- Unnormalized importance sampling:
  E_P[f(X)] ≈ (1/M) Σ_{m=1..M} f(x[m]) P(x[m]) / Q(x[m]),  with x[m] sampled from Q

Mutilated BN Proposal
- Generating a proposal distribution for a Bayesian
network:
  - Evidence nodes have no parents; each evidence node Zi = zi has the distribution P(Zi = zi) = 1.
- Equivalent to likelihood weighting.

Forward Sampling Approaches
- Forward sampling, rejection sampling, and likelihood weighting are all forward samplers.
- They require a topological ordering, which limits us to:
  - Bayesian networks
  - tree Markov networks
- Unnormalized importance sampling can be done on cyclic Markov networks, but it is expensive (see Homework 4).
- Limitation: fixing an evidence node only allows it to directly affect its descendants.

Markov Blanket Approaches
- Forward samplers: compute the weight of Xi given an assignment to its ancestors in a topological ordering.
- Markov blanket samplers: compute the weight of Xi given an assignment to its Markov blanket.

Markov Blanket Samplers
- Work on any type of graphical model covered in the course thus far.

Gibbs Sampling
1. Let X be the non-evidence variables.
2. Generate an initial assignment x(0).
3. For t = 1..T:
   a. x(t) = x(t-1)
   b. For each Xi in X:
      i. Let ui = the value of the variables X − {Xi} in sample x(t).
      ii. Compute P(Xi | ui).
      iii. Sample xi(t) from P(Xi | ui).
      iv. Set the value of Xi to xi(t) in x(t).
4. The samples are x(0) … x(T).

Computing P(Xi | ui)
- The major task in designing a Gibbs sampler is deriving P(Xi | ui).
- Use conditional independence: Xi ⊥ Xj | MB(Xi) for all Xj in X − MB(Xi) − {Xi}, so P(Xi | ui) = P(Xi | MB(Xi)).
- In a Bayesian network, P(Xi | MB(Xi)) ∝ P(Xi | Pa(Xi)) · Π_{Yj ∈ Children(Xi)} P(yj | pa(Yj)).

Pairwise Markov Random Field
- (Figure slide: deriving P(Xi | ui) on a pairwise Markov random field.)

Markov Chain Interpretation
- The state space consists of assignments to X.
- P(xi | ui) gives the transition probabilities; neighboring states differ in only one variable.
- Given the transition matrix you could compute the exact stationary distribution, but it is typically impossible even to store the transition matrix.
- Gibbs does not need to store the transition matrix!

Convergence
- Not all samples x(0) … x(T) are independent; consider one marginal P(xi | ui).
- Burn-in: discard the initial samples, taken before the chain has mixed.
- Thinning: keep only every k-th sample to reduce correlation.

MAP by Sampling
- Generate a few samples from the posterior.
- For each Xi, the MAP estimate is the majority assignment.

What you need to know
- Forward sampling approaches
  - Forward sampling / rejection sampling: generate samples from P(X) or P(X | e).
  - Likelihood weighting / importance sampling: sampling where the evidence is rare; fixing variables lowers the variance of the samples compared to rejection sampling; useful on Bayesian networks and tree Markov networks.
- Markov blanket approaches
  - Gibbs sampling: works on any graphical model where we can sample from P(Xi | rest); Markov chain interpretation; samples are independent when the Markov chain converges; convergence heuristics, burn-in, thinning.
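As a concrete illustration of the likelihood-weighting idea summarized above, here is a minimal sketch on a hypothetical two-node Bayesian network X → E with made-up CPD numbers (the network and all probabilities are illustrative assumptions, not taken from the lecture):

```python
import random

# Likelihood weighting on a hypothetical two-node network X -> E:
# fix the evidence node E to its observed value e = 1 and weight each
# sample by the likelihood of the evidence given its sampled parent,
# instead of rejecting samples that miss the evidence.
P_X1 = 0.3                        # P(X = 1), illustrative CPD
P_E1_GIVEN_X = {0: 0.1, 1: 0.8}   # P(E = 1 | X = x), illustrative CPD

def likelihood_weighting(num_samples, seed=0):
    """Estimate P(X = 1 | E = 1) from weighted samples."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(num_samples):
        x = 1 if rng.random() < P_X1 else 0   # sample X in topological order
        w = P_E1_GIVEN_X[x]                   # weight = likelihood of the evidence
        den += w
        if x == 1:
            num += w
    return num / den

estimate = likelihood_weighting(100_000)
# Exact posterior: (0.3 * 0.8) / (0.3 * 0.8 + 0.7 * 0.1) = 0.24 / 0.31 ≈ 0.774
```

Note that every sample contributes some weight, unlike rejection sampling, which would discard roughly the fraction of samples where E ≠ 1.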
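The Gibbs sampling procedure described earlier can likewise be sketched on a tiny example. The following is a minimal sketch on a hypothetical pairwise MRF over two binary variables; the potential values are illustrative assumptions, not from the lecture:

```python
import random

# A tiny pairwise MRF over binary variables A and B with one edge
# potential phi(A, B). The values are hypothetical illustrative numbers;
# the attractive potential favors states where A == B.
phi = {(0, 0): 10.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 10.0}

def gibbs(num_samples, burn_in=100, seed=0):
    """Gibbs sampling: repeatedly resample each variable conditioned on
    its Markov blanket (here, just the other variable)."""
    rng = random.Random(seed)
    a, b = 0, 0                       # initial assignment x(0)
    samples = []
    for t in range(burn_in + num_samples):
        # P(A = 1 | B = b) is proportional to phi(1, b)
        p_a1 = phi[(1, b)] / (phi[(0, b)] + phi[(1, b)])
        a = 1 if rng.random() < p_a1 else 0
        # P(B = 1 | A = a) is proportional to phi(a, 1)
        p_b1 = phi[(a, 1)] / (phi[(a, 0)] + phi[(a, 1)])
        b = 1 if rng.random() < p_b1 else 0
        if t >= burn_in:              # burn-in: discard the early samples
            samples.append((a, b))
    return samples

samples = gibbs(5000)
# The exact probability of agreement is 20/22, so the Monte Carlo
# estimate should land near 0.91.
agree = sum(a == b for a, b in samples) / len(samples)
```

Each conditional uses only the Markov blanket of the variable being resampled, and the full transition matrix over the four joint states is never constructed, matching the point made in the Markov chain interpretation.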