Probabilistic Graphical Models
Lecture 16 – Sampling
CS/CNS/EE 155, Andreas Krause

Announcements
- Homework 3 due today
- Project poster session on Friday, December 4 (tentative)
- Final writeup (8 pages, NIPS format) due December 9

Approximate inference
Three major classes of general-purpose approaches:
- Message passing, e.g. loopy belief propagation
- Inference as optimization: approximate the posterior distribution by a simple distribution (mean field / structured mean field, assumed density filtering / expectation propagation)
- Sampling-based inference: importance sampling, particle filtering, Gibbs sampling, MCMC
Many other alternatives exist, often for special cases.

Variational approximation
Key idea: approximate the posterior with a simpler distribution Q that is as close as possible to the true distribution P.
- What is a "simple" distribution? Simple = efficient inference. Typically factorized (fully independent, chain, tree, ...), or a Gaussian approximation.
- What does "as close as possible" mean? Closeness is measured by KL divergence.

Finding simple approximate distributions
KL divergence is not symmetric, so we need to choose a direction (P: true distribution; Q: our approximation):
- min D(P || Q): the "right" way, but often intractable to compute. Used by assumed density filtering.
- min D(Q || P): the "reverse" way; underestimates the support of P (overconfident). Used by the mean field approximation.
Both are special cases of the α-divergence.

Approximate inference (recap)
Of the three classes above (message passing, inference as optimization, sampling-based inference), this lecture covers the third: sampling.

Sampling-based inference
So far we have seen deterministic inference techniques:
- Loopy belief propagation
- (Structured) mean field approximation
- Assumed density filtering
We now introduce stochastic approximations: algorithms that "randomize" to compute expectations. In contrast to the deterministic methods, these can sometimes give approximation guarantees; they are more exact, but slower, than the deterministic variants.

Computing expectations
Often we are not necessarily interested in computing marginal distributions, but in certain expectations, such as moments (mean, variance, ...) or event probabilities:
E_P[f(X)] = ∑_x P(x) f(x),    P(X ∈ A) = E_P[1{X ∈ A}].

Sample approximations of expectations
Let x_1, ..., x_N be samples from the random variable X. By the law of large numbers,
(1/N) ∑_{i=1}^N f(x_i) → E_P[f(X)]  as N → ∞,
where the convergence holds with probability 1 (almost sure convergence). With finitely many samples, we use the empirical mean Ê_N = (1/N) ∑_{i=1}^N f(x_i) as our estimate.

How many samples do we need?
Hoeffding inequality: suppose f is bounded in [0, C]. Then
P(|Ê_N − E[f(X)]| ≥ ε) ≤ 2 exp(−2Nε² / C²).
Thus the probability of error decreases exponentially in N! But we need to be able to draw samples from P in the first place. (See the first code sketch after the multinomial slide below.)

Sampling from a Bernoulli distribution
X ~ Bernoulli(p). How can we draw samples from X? Draw u uniformly from [0, 1] and output 1 if u < p, else 0.

Sampling from a multinomial
X ~ Mult([θ_1, ..., θ_k]), where θ_i = P(X = i) and ∑_i θ_i = 1.
- Partition [0, 1] into consecutive segments of lengths θ_1, ..., θ_k; this defines a function g: [0, 1] → {1, ..., k} that assigns a state g(u) to each u.
- Draw a sample u from the uniform distribution on [0, 1].
- Return g(u).
[Figure: the unit interval [0, 1] split into segments of lengths θ_1, θ_2, ..., θ_k.]
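To make the empirical-mean estimator and the Hoeffding sample-size bound concrete, here is a minimal Python sketch. The function names (monte_carlo_expectation, hoeffding_sample_size) and the Gaussian test case are illustrative choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo_expectation(draw_sample, f, n):
    """Estimate E[f(X)] by the empirical mean of f over n i.i.d. samples."""
    return np.mean([f(draw_sample()) for _ in range(n)])

# Hoeffding (f bounded in [0, C]): P(|estimate - E[f]| >= eps) <= 2 exp(-2 n eps^2 / C^2),
# so n >= C^2 * log(2/delta) / (2 eps^2) samples suffice for error eps w.p. 1 - delta.
def hoeffding_sample_size(eps, delta, C=1.0):
    return int(np.ceil(C**2 * np.log(2.0 / delta) / (2.0 * eps**2)))

# Example: estimate P(X >= 1) for X ~ N(0, 1); f is an indicator, bounded in [0, 1].
n = hoeffding_sample_size(eps=0.01, delta=0.05)   # ~18,445 samples
est = monte_carlo_expectation(lambda: rng.standard_normal(),
                              lambda x: float(x >= 1.0), n)
print(n, est)   # est should be close to 1 - Phi(1) ≈ 0.159
```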
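The inverse-CDF construction from the last two slides, as a sketch. numpy's rng.choice would do the same job; the explicit version below mirrors the partition-of-[0,1] argument:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_bernoulli(p):
    """Draw u ~ Uniform[0,1]; report 1 iff u lands in the length-p segment."""
    return int(rng.uniform() < p)

def sample_multinomial(theta):
    """Partition [0,1] into segments of lengths theta_1..theta_k and
    return the (1-based) index of the segment that u falls into."""
    u = rng.uniform()
    cum = np.cumsum(theta)            # right endpoints of the segments
    return int(np.searchsorted(cum, u)) + 1

samples = [sample_multinomial([0.2, 0.5, 0.3]) for _ in range(10000)]
print([samples.count(i) / 10000 for i in (1, 2, 3)])   # ≈ [0.2, 0.5, 0.3]
```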
Forward sampling from a BN
[Figure: the student network, a Bayesian network over the variables C, D, I, G, S, L, J, H.]

Monte Carlo sampling from a BN
- Sort the variables in a topological ordering X_1, ..., X_n.
- For i = 1 to n: sample x_i ~ P(X_i | X_1 = x_1, ..., X_{i−1} = x_{i−1}), which by the BN structure reduces to P(X_i | Pa_{X_i}).
This works even with high-treewidth models! (See the toy-network sketch after the rejection sampling slide below.)

Computing probabilities through sampling
We want to estimate probabilities by drawing N samples from the BN.
- Marginals: estimate P(X_A = x_A) by the fraction of samples in which X_A = x_A.
- Conditionals: P(X_A | X_B = x_B)? See rejection sampling, next.

Rejection sampling
- Collect samples over all variables.
- Throw away the samples that disagree with the evidence x_B, and estimate from the rest.
This can be problematic if X_B = x_B is a rare event.
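A minimal forward-sampling sketch on a made-up two-variable network D → G (the CPT numbers are invented for illustration; the student network from the slides would work the same way, variable by variable in topological order):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy BN: Difficulty -> Grade. CPTs are illustrative made-up numbers.
P_D = [0.6, 0.4]                          # P(D)
P_G_given_D = {0: [0.7, 0.2, 0.1],        # P(G | D=0) over grades {0, 1, 2}
               1: [0.3, 0.4, 0.3]}        # P(G | D=1)

def categorical(probs):
    return int(rng.choice(len(probs), p=probs))

def forward_sample():
    """Sample variables in topological order, each from P(X_i | parents)."""
    d = categorical(P_D)
    g = categorical(P_G_given_D[d])
    return {"D": d, "G": g}

# Marginal estimate: P(G = 0) as the fraction of samples with G == 0.
samples = [forward_sample() for _ in range(20000)]
print(sum(s["G"] == 0 for s in samples) / len(samples))  # ≈ 0.6*0.7 + 0.4*0.3 = 0.54
```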
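Rejection sampling on the same toy network, estimating a conditional by keeping only the samples that agree with the evidence. Note the survival rate, P(G = 2) ≈ 0.18; with rarer evidence almost everything would be discarded:

```python
import numpy as np

rng = np.random.default_rng(3)

# Same toy network D -> G as in the previous sketch (invented CPTs).
def forward_sample():
    d = int(rng.uniform() < 0.4)                        # P(D = 1) = 0.4
    probs = [0.3, 0.4, 0.3] if d else [0.7, 0.2, 0.1]   # P(G | D)
    return d, int(rng.choice(3, p=probs))

samples = [forward_sample() for _ in range(20000)]

# Rejection sampling: keep only samples consistent with the evidence G = 2.
kept = [d for d, g in samples if g == 2]
print(len(kept) / len(samples))   # ≈ P(G = 2) = 0.6*0.1 + 0.4*0.3 = 0.18
print(sum(kept) / len(kept))      # ≈ P(D = 1 | G = 2) = 0.12 / 0.18 ≈ 0.667
```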
Sample complexity for probability estimates
- Absolute error: by the Hoeffding inequality, P(|Ê_N − P(x)| ≥ ε) ≤ 2 exp(−2Nε²), so N = O(ε⁻² log(1/δ)) samples suffice, regardless of how small P(x) is.
- Relative error: by the multiplicative Chernoff bound, P(Ê_N ∉ [(1−ε) P(x), (1+ε) P(x)]) ≤ 2 exp(−N P(x) ε² / 3), so the number of samples needed grows like 1 / P(x).

Sampling from rare events
Estimating conditional probabilities P(X_A | X_B = x_B) using rejection sampling is hard! The more observations we condition on, the less likely the event X_B = x_B becomes. We would like to sample directly from the posterior distribution.

Sampling from intractable distributions
Given an unnormalized distribution
P(X) ∝ Q(X),
where Q(X) is efficient to evaluate but the normalizer is intractable; for example, Q(X) = ∏_j Ψ_j(C_j), a product of clique potentials. We want to sample from P(X).
Ingenious idea: we can create a Markov chain that is efficient to simulate and that has stationary distribution P(X).

Markov chains
A Markov chain is a sequence of random variables X_1, ..., X_N, ... with
- a prior P(X_1), and
- transition probabilities P(X_{t+1} | X_t).
A Markov chain with P(X_{t+1} | X_t) > 0 has a unique stationary distribution µ(X), such that for all x,
lim_{N→∞} P(X_N = x) = µ(x).
The stationary distribution is independent of P(X_1).
[Figure: a chain X1 → X2 → X3 → X4 → X5 → X6.]

Simulating a Markov chain
We can sample from a Markov chain as from a BN:
- Sample x_1 ~ P(X_1)
- Sample x_2 ~ P(X_2 | X_1 = x_1)
- ...
- Sample x_N ~ P(X_N | X_{N−1} = x_{N−1})
- ...
If the chain is simulated "sufficiently long", the sample X_N is drawn from a distribution "very close" to the stationary distribution µ.

Markov chain Monte Carlo
Given an unnormalized distribution Q(x), we want to design a Markov chain with stationary distribution
π(x) = (1/Z) Q(x).
To do so, we need to specify the transition probabilities P(x' | x)!

Detailed balance equation
A Markov chain satisfies the detailed balance equation for an unnormalized distribution Q if for all x, x':
Q(x) P(x' | x) = Q(x') P(x | x').
In this case, the Markov chain has stationary distribution (1/Z) Q(x): summing both sides over x gives ∑_x Q(x) P(x' | x) = Q(x'), i.e., (1/Z) Q is invariant under the transition.

Designing Markov chains
1) Proposal distribution R(X' | X): given X_t = x, sample a "proposal" x' ~ R(X' | X = x). The performance of the algorithm will strongly depend on R.
2) Acceptance distribution: suppose X_t = x. With probability
α = min{ 1, [Q(x') R(x | x')] / [Q(x) R(x' | x)] }
set X_{t+1} = x'; with probability 1 − α, set X_{t+1} = x.
Theorem [Metropolis, Hastings]: the stationary distribution is Z⁻¹ Q(x).
Proof: the Markov chain satisfies the detailed balance condition! (See the verification at the end of these notes.)

MCMC for graphical models
The random vector X = (X_1, ..., X_n) is high-dimensional, so we need to specify proposal distributions R(x' | x) over such random vectors (x: old state; x': proposed state, x' ~ R(X' | X = x)). A natural example: propose a change to one variable at a time, which leads to Gibbs sampling.

Gibbs sampling
- Start with an initial assignment x⁽⁰⁾ to all variables.
- For t = 1 to ∞:
  - Set x⁽ᵗ⁾ = x⁽ᵗ⁻¹⁾.
  - For each variable X_i:
    - Set v_i = the values of all variables in x⁽ᵗ⁾ except x_i.
    - Sample x_i⁽ᵗ⁾ from P(X_i | v_i).
Gibbs sampling satisfies the detailed balance equation for P. The key challenge is computing the conditional distributions P(X_i | v_i). (A sketch follows at the end of these notes.)

Computing P(X_i | v_i)
P(X_i | v_i) = P(x_i, v_i) / ∑_{x_i'} P(x_i', v_i). All factors not containing X_i cancel in this ratio, so the conditional depends only on the factors over X_i and its Markov blanket, and is cheap to compute.

Example: (simple) image segmentation [see Singh '08]
[Figure: foreground/background segmentation of an image with a grid-structured model.]

Gibbs sampling iterations
[Figure: segmentations after successive Gibbs sampling iterations.]

Convergence of Gibbs sampling
When are we close to the stationary distribution?

Summary of sampling
- Randomized approximate inference for computing expectations, (conditional) probabilities, etc.
- Exact in the limit, but may need ridiculously many samples.
- We can even sample directly from intractable distributions: disguise the distribution as the stationary distribution of a Markov chain. Famous example: Gibbs sampling.
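As promised above, the detailed-balance verification behind the Metropolis-Hastings theorem, using the symmetry of the min:

```latex
% MH transition for x' != x: propose from R, then accept with probability alpha,
%   P(x' | x) = R(x' | x) \, \alpha(x' | x),
%   \alpha(x' | x) = \min\{1, \; Q(x') R(x | x') / (Q(x) R(x' | x))\}.
\begin{align*}
Q(x)\,P(x'\mid x)
  &= Q(x)\,R(x'\mid x)\,
     \min\Big\{1,\;\frac{Q(x')\,R(x\mid x')}{Q(x)\,R(x'\mid x)}\Big\} \\
  &= \min\big\{Q(x)\,R(x'\mid x),\; Q(x')\,R(x\mid x')\big\} \\
  &= Q(x')\,R(x\mid x')\,
     \min\Big\{\frac{Q(x)\,R(x'\mid x)}{Q(x')\,R(x\mid x')},\;1\Big\}
   = Q(x')\,P(x\mid x'),
\end{align*}
% which is exactly the detailed balance equation for Q.
```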
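A minimal Metropolis-Hastings sketch for an unnormalized target Q on a small discrete state space, with a uniform (hence symmetric) proposal so that the R-ratio cancels; all names and numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

Q = np.array([1.0, 4.0, 2.0, 3.0])    # unnormalized target; Z = 10 is never used

def mh_step(x):
    """One Metropolis-Hastings step with a symmetric (uniform) proposal."""
    x_new = int(rng.integers(len(Q)))          # R(x'|x) uniform, so R cancels in alpha
    alpha = min(1.0, Q[x_new] / Q[x])          # acceptance probability
    return x_new if rng.uniform() < alpha else x

x, counts = 0, np.zeros(len(Q))
for t in range(100000):
    x = mh_step(x)
    if t >= 1000:                              # discard burn-in
        counts[x] += 1
print(counts / counts.sum())                   # ≈ Q / Q.sum() = [0.1, 0.4, 0.2, 0.3]
```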
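Finally, a Gibbs-sampling sketch in the spirit of the image-segmentation example: a small Ising-style grid in which each pixel's label is resampled from its conditional given its Markov blanket. The model and the coupling strength beta are stand-ins chosen for illustration, not the model from Singh '08:

```python
import numpy as np

rng = np.random.default_rng(5)

H, W, beta = 8, 8, 0.8                             # grid size and coupling (illustrative)
x = rng.integers(0, 2, size=(H, W)) * 2 - 1        # pixel labels in {-1, +1}

def conditional_prob_plus(x, i, j):
    """P(X_ij = +1 | all other pixels): only factors touching X_ij matter,
    so the conditional depends only on the 4-neighbor Markov blanket."""
    s = sum(x[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
            if 0 <= a < H and 0 <= b < W)
    return 1.0 / (1.0 + np.exp(-2.0 * beta * s))   # sigmoid of the local field

for sweep in range(100):                           # one sweep = resample every pixel once
    for i in range(H):
        for j in range(W):
            x[i, j] = 1 if rng.uniform() < conditional_prob_plus(x, i, j) else -1

print(x)   # after enough sweeps, x is (approximately) a sample from the Ising model
```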

