Outline:
- Undirected Graphical Models (finishing off)
- What you learned about so far
- BNs → MNs: Moralization
- From Markov nets to Bayes nets
- MNs → BNs: Triangulation
- Markov nets v. Pairwise MNs
- Overview of types of graphical models and transformations between them
- Mean Field and Variational Methods: First approximate inference
- Approximate inference overview
- Approximating the posterior v. approximating the prior
- KL divergence: Distance between distributions
- Find simple approximate distribution
- Back to graphical models
- D(p||q) for mean field – KL the right way
- D(q||p) for mean field – KL the reverse direction
- D(q||p) for mean field – KL the reverse direction: Entropy term
- D(q||p) for mean field – KL the reverse direction: Cross-entropy term
- What you need to know so far
- Reverse KL & The Partition Function: Back to the general case
- Understanding Reverse KL, Energy Function & The Partition Function
- Structured Variational Approximate Inference
- Optimization for mean field
- Understanding the fixed point equation
- Q_i only needs to consider factors that intersect X_i
- There are many stationary points!
- Very simple approach for finding one stationary point
- More general structured approximations
- What you need to know about variational methods

Undirected Graphical Models (finishing off)
Graphical Models – 10708
Carlos Guestrin, Carnegie Mellon University
November 3rd, 2008
Readings: K&F: 4.1, 4.2, 4.3, 4.4, 4.5

What you learned about so far
- Bayes nets
- Junction trees
- (General) Markov networks
- Pairwise Markov networks
- Factor graphs
- How do we transform between them? More formally: given a graph in one representation, find an I-map in the other.

BNs → MNs: Moralization
Theorem: Given a BN G, the Markov net H formed by moralizing G is the minimal I-map for I(G).
Intuition:
- in a Markov net, each factor must correspond to a subset of a clique
- the factors in a BN are the CPTs
- CPTs are factors over a node and its parents
- thus a node and its parents must form a clique
Effect: some independencies that could be read from the BN graph become hidden. (A minimal code sketch of moralization appears after the triangulation slide below.)
(Figure: the student BN over Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, Happy, and its moralized Markov net.)

From Markov nets to Bayes nets
(Figure: a Markov net over Exam, Grade, Job, Letter, Intelligence, SAT.)

MNs → BNs: Triangulation
Theorem: Given a MN H, let G be a Bayes net that is a minimal I-map for I(H); then G must be chordal.
Intuition:
- v-structures in a BN introduce immoralities
- these immoralities were not present in the Markov net
- triangulation eliminates immoralities
Effect: many independencies that could be read from the MN graph become hidden.
(Figure: the Markov net over Exam, Grade, Job, Letter, Intelligence, SAT, and a chordal BN that is a minimal I-map for it.)
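To make moralization concrete, here is a minimal Python sketch (not from the lecture; the `parents` input format and the function name are my own choices) that builds the moral graph of a DAG by dropping edge directions and marrying co-parents:

```python
from itertools import combinations

def moralize(parents):
    """Return the undirected edge set of the moral graph of a DAG.

    For each node: drop directions on child-parent edges and "marry"
    every pair of its parents, so that each CPT's scope
    ({node} + parents) sits inside a clique of the undirected graph.
    """
    edges = set()
    for child, pars in parents.items():
        for p in pars:                        # undirected child-parent edges
            edges.add(frozenset((child, p)))
        for p1, p2 in combinations(pars, 2):  # marry co-parents
            edges.add(frozenset((p1, p2)))
    return edges

# Fragment of the student network from the slides: Grade has parents
# Difficulty and Intelligence, so moralization adds the edge D-I.
bn = {"Difficulty": [], "Intelligence": [],
      "Grade": ["Difficulty", "Intelligence"],
      "Letter": ["Grade"], "SAT": ["Intelligence"]}
print(sorted(tuple(sorted(e)) for e in moralize(bn)))
```

On the student fragment this prints the child-parent edges plus the married edge ('Difficulty', 'Intelligence'), which is precisely why the marginal independence of Difficulty and Intelligence can no longer be read off the moralized graph.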
Markov nets v. Pairwise MNs
Every Markov network can be transformed into a pairwise Markov net:
- introduce an extra "variable" for each factor over three or more variables
- the domain size of the extra variable is exponential in the number of variables in the factor
Effect:
- any local structure in the factor is lost
- a chordal MN doesn't look chordal anymore
(Figure: a factor over A, B, C replaced by pairwise factors through an auxiliary variable.)

Overview of types of graphical models and transformations between them
(Figure: the model classes above and the transformations between them.)

Mean Field and Variational Methods: First approximate inference
Graphical Models – 10708
Carlos Guestrin, Carnegie Mellon University
November 3rd, 2008
Readings: K&F: 10.1, 10.5

Approximate inference overview
- So far, VE & junction trees: exact inference, exponential in tree-width.
- There are many, many, many approximate inference algorithms for PGMs.
- We will focus on three representative ones: sampling; variational inference; loopy belief propagation and generalized belief propagation.

Approximating the posterior v. approximating the prior
- The prior model represents the entire world; the world is complicated, so the prior model can be very complicated.
- The posterior, after making observations, can sometimes be much more sure about the way things are, and can sometimes be approximated by a simple model.
- First approach to approximate inference: find a simple model that is "close" to the posterior.
- Fundamental problems: what does "close" mean? And the posterior is the intractable result of inference, so how can we approximate what we don't have?
(Figure: the student network again.)

KL divergence: Distance between distributions
Given two distributions p and q, the KL divergence is
  D(p||q) = \sum_x p(x) \log \frac{p(x)}{q(x)}
- D(p||q) ≥ 0, with equality iff p = q.
- Not symmetric: p determines where the difference is important.
  - Where p(x) = 0 and q(x) > 0, the term contributes nothing.
  - Where p(x) > 0 and q(x) = 0, the divergence is infinite.
(A small numeric sketch of this asymmetry appears at the end of this section.)

Find simple approximate distribution
Suppose p is an intractable posterior and we want a simple q that approximates it. Since KL divergence is not symmetric, there are two choices:
- D(p||q): the true distribution p defines the support of the difference; the "correct" direction, but intractable to compute.
- D(q||p): the approximate distribution defines the support; tends to give overconfident results, but is tractable.

Back to graphical models
- Inference in a graphical model with factors \phi:
  P(X) = \frac{1}{Z} \prod_\phi \phi(X_\phi)
  and we want to compute P(X_i | e). Our p is this posterior (with the evidence e absorbed into the factors).
- What is the simplest q? One where every variable is independent:
  q(X) = \prod_i Q_i(X_i)
  This is the mean field approximation; under it, any probability can be computed very efficiently.

D(p||q) for mean field – KL the right way
With p the posterior and q = \prod_i Q_i:
  D(p||q) = -H(p) - \sum_i \sum_{x_i} p(x_i) \log Q_i(x_i)
Evaluating this requires the entropy and the single-variable marginals of p, which is exactly the intractable inference problem we set out to avoid.

D(q||p) for mean field – KL the reverse direction
  D(q||p) = \sum_x q(x) \log \frac{q(x)}{p(x)} = -H(q) - E_q[\log p(X)]
i.e., a negative entropy term plus a cross-entropy term; we examine each in turn.

D(q||p) for mean field – KL the reverse direction: Entropy term
Because q fully factorizes, its entropy decomposes into cheap local terms:
  H(q) = \sum_i H(Q_i) = -\sum_i \sum_{x_i} Q_i(x_i) \log Q_i(x_i)

D(q||p) for mean field – KL the reverse direction: Cross-entropy term
Writing p(X) = \frac{1}{Z} \prod_\phi \phi(X_\phi):
  -E_q[\log p(X)] = \log Z - \sum_\phi E_q[\log \phi(X_\phi)]
Each expectation involves only the Q_i of the variables in that factor's scope, so it is cheap to compute; \log Z is a constant that does not depend on q.

What you need to know so far
- Goal: find an efficient distribution that is close to the posterior.
- Distance: measured in terms of KL divergence.
- Asymmetry of KL: D(p||q) ≠ D(q||p).
- Computing the "right" KL is intractable, so we use the reverse KL.

Reverse KL & The Partition Function: Back to the general case
Consider again the definition of D(q||p), where p is a Markov net with partition function Z:
  p(X) = \frac{1}{Z} \tilde{P}(X), \quad \tilde{P}(X) = \prod_\phi \phi(X_\phi)
Theorem: \ln Z = F[\tilde{P}, Q] + D(Q||P), where the energy functional is
  F[\tilde{P}, Q] = E_Q[\ln \tilde{P}(X)] + H_Q(X) = \sum_\phi E_Q[\ln \phi(X_\phi)] + H_Q(X)
Since D(Q||P) ≥ 0, the energy functional is a lower bound on \ln Z, and minimizing the reverse KL over Q is equivalent to maximizing F[\tilde{P}, Q].
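To see the asymmetry numerically, here is a small self-contained Python sketch (the distributions are toy numbers, not from the slides) that computes both directions of the KL divergence using the conventions above:

```python
import math

def kl(p, q):
    """D(p||q) = sum_x p(x) * log(p(x) / q(x)).

    Conventions from the KL slide: terms with p(x) = 0 contribute
    nothing; any x with p(x) > 0 but q(x) = 0 makes D(p||q) infinite.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total

# A bimodal "posterior" p and a unimodal approximation q (toy numbers):
p = [0.45, 0.05, 0.05, 0.45]
q = [0.10, 0.40, 0.40, 0.10]
print(kl(p, q), kl(q, p))  # the two directions give different values
```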
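And here is a brute-force check of the theorem above on a tiny made-up Markov net (all factor values and Q marginals are invented for illustration): for any fully factorized Q, \ln Z should equal the energy functional plus D(Q||P).

```python
import itertools
import math

# Tiny binary Markov net with two pairwise factors, small enough
# to enumerate, so the identity can be verified exactly.
phis = {
    (0, 1): [[4.0, 1.0], [1.0, 4.0]],  # phi(X0, X1)
    (1, 2): [[3.0, 1.0], [1.0, 3.0]],  # phi(X1, X2)
}

def p_tilde(x):
    """Unnormalized measure: product of the factor values at x."""
    val = 1.0
    for (i, j), table in phis.items():
        val *= table[x[i]][x[j]]
    return val

states = list(itertools.product([0, 1], repeat=3))
Z = sum(p_tilde(x) for x in states)  # partition function

# An arbitrary fully factorized ("mean field") Q(X) = prod_i Q_i(X_i):
Q_marg = [[0.7, 0.3], [0.5, 0.5], [0.6, 0.4]]

def Q(x):
    return math.prod(Q_marg[i][xi] for i, xi in enumerate(x))

energy = sum(Q(x) * math.log(p_tilde(x)) for x in states)            # E_Q[ln P~]
entropy = -sum(Q(x) * math.log(Q(x)) for x in states)                # H_Q(X)
d_q_p = sum(Q(x) * math.log(Q(x) * Z / p_tilde(x)) for x in states)  # D(Q||P)

# Theorem: ln Z = F[P~, Q] + D(Q||P), with F = energy + entropy.
print(math.log(Z), energy + entropy + d_q_p)  # the two values agree
```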