Outline:
- Undirected Graphical Models (finishing off)
- What you learned about so far
- BNs → MNs: Moralization
- From Markov nets to Bayes nets
- MNs → BNs: Triangulation
- Markov nets v. Pairwise MNs
- Overview of types of graphical models and transformations between them
- Mean Field and Variational Methods: First approximate inference
- Approximate inference overview
- Approximating the posterior v. approximating the prior
- KL divergence: Distance between distributions
- Find simple approximate distribution
- Back to graphical models
- D(p||q) for mean field – KL the right way
- D(q||p) for mean field – KL the reverse direction
- D(q||p) for mean field – KL the reverse direction: Entropy term
- D(q||p) for mean field – KL the reverse direction: Cross-entropy term
- What you need to know so far
- Reverse KL & The Partition Function: Back to the general case
- Understanding Reverse KL, Energy Function & The Partition Function
- Structured Variational Approximate Inference
- Optimization for mean field
- Understanding the fixed point equation
- Q_i only needs to consider factors that intersect X_i
- There are many stationary points!
- Very simple approach for finding one stationary point
- More general structured approximations
- What you need to know about variational methods

Undirected Graphical Models (finishing off)
Graphical Models – 10708
Carlos Guestrin, Carnegie Mellon University
November 3rd, 2008
Readings: K&F: 4.1, 4.2, 4.3, 4.4, 4.5

What you learned about so far
- Bayes nets
- Junction trees
- (General) Markov networks
- Pairwise Markov networks
- Factor graphs
- How do we transform between them? More formally: given a graph in one representation, find an I-map in the other.

BNs → MNs: Moralization
Theorem: Given a BN G, the Markov net H formed by moralizing G is the minimal I-map for I(G).
Intuition:
- in a Markov net, each factor must correspond to a subset of a clique
- the factors in a BN are the CPTs
- CPTs are factors over a node and its parents
- thus a node and its parents must form a clique
Effect: some independencies that could be read from the BN graph become hidden. (A minimal code sketch of moralization appears after the triangulation slide below.)
(Figure: the student BN over Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, Happy, and its moralized Markov net.)

From Markov nets to Bayes nets
(Figure: a Markov net over Exam, Grade, Job, Letter, Intelligence, SAT.)

MNs → BNs: Triangulation
Theorem: Given a MN H, let G be a Bayes net that is a minimal I-map for I(H); then G must be chordal.
Intuition:
- v-structures in a BN introduce immoralities
- these immoralities were not present in the Markov net
- triangulation eliminates immoralities
Effect: many independencies that could be read from the MN graph become hidden.
(Figure: the Markov net over Exam, Grade, Job, Letter, Intelligence, SAT, and a chordal BN that is a minimal I-map for it.)
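To make moralization concrete, here is a minimal Python sketch (not from the lecture; the `parents` input format and the function name are my own choices) that builds the moral graph of a DAG by dropping edge directions and marrying co-parents:

```python
from itertools import combinations

def moralize(parents):
    """Return the undirected edge set of the moral graph of a DAG.

    For each node: drop directions on child-parent edges and "marry"
    every pair of its parents, so that each CPT's scope
    ({node} + parents) sits inside a clique of the undirected graph.
    """
    edges = set()
    for child, pars in parents.items():
        for p in pars:                        # undirected child-parent edges
            edges.add(frozenset((child, p)))
        for p1, p2 in combinations(pars, 2):  # marry co-parents
            edges.add(frozenset((p1, p2)))
    return edges

# Fragment of the student network from the slides: Grade has parents
# Difficulty and Intelligence, so moralization adds the edge D-I.
bn = {"Difficulty": [], "Intelligence": [],
      "Grade": ["Difficulty", "Intelligence"],
      "Letter": ["Grade"], "SAT": ["Intelligence"]}
print(sorted(tuple(sorted(e)) for e in moralize(bn)))
```

On the student fragment this prints the child-parent edges plus the married edge ('Difficulty', 'Intelligence'), which is precisely why the marginal independence of Difficulty and Intelligence can no longer be read off the moralized graph.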
Markov nets v. Pairwise MNs
Every Markov network can be transformed into a pairwise Markov net:
- introduce an extra "variable" for each factor over three or more variables
- the domain size of the extra variable is exponential in the number of variables in the factor
Effect:
- any local structure in the factor is lost
- a chordal MN doesn't look chordal anymore
(Figure: a factor over A, B, C replaced by pairwise factors through an auxiliary variable.)

Overview of types of graphical models and transformations between them
(Figure: the model classes above and the transformations between them.)

Mean Field and Variational Methods: First approximate inference
Graphical Models – 10708
Carlos Guestrin, Carnegie Mellon University
November 3rd, 2008
Readings: K&F: 10.1, 10.5

Approximate inference overview
- So far, VE & junction trees: exact inference, exponential in tree-width.
- There are many, many, many approximate inference algorithms for PGMs.
- We will focus on three representative ones: sampling; variational inference; loopy belief propagation and generalized belief propagation.

Approximating the posterior v. approximating the prior
- The prior model represents the entire world; the world is complicated, so the prior model can be very complicated.
- The posterior, after making observations, can sometimes be much more sure about the way things are, and can sometimes be approximated by a simple model.
- First approach to approximate inference: find a simple model that is "close" to the posterior.
- Fundamental problems: what does "close" mean? And the posterior is the intractable result of inference, so how can we approximate what we don't have?
(Figure: the student network again.)

KL divergence: Distance between distributions
Given two distributions p and q, the KL divergence is
  D(p||q) = \sum_x p(x) \log \frac{p(x)}{q(x)}
- D(p||q) ≥ 0, with equality iff p = q.
- Not symmetric: p determines where the difference is important.
  - Where p(x) = 0 and q(x) > 0, the term contributes nothing.
  - Where p(x) > 0 and q(x) = 0, the divergence is infinite.
(A small numeric sketch of this asymmetry appears at the end of this section.)

Find simple approximate distribution
Suppose p is an intractable posterior and we want a simple q that approximates it. Since KL divergence is not symmetric, there are two choices:
- D(p||q): the true distribution p defines the support of the difference; the "correct" direction, but intractable to compute.
- D(q||p): the approximate distribution defines the support; tends to give overconfident results, but is tractable.

Back to graphical models
- Inference in a graphical model with factors \phi:
  P(X) = \frac{1}{Z} \prod_\phi \phi(X_\phi)
  and we want to compute P(X_i | e). Our p is this posterior (with the evidence e absorbed into the factors).
- What is the simplest q? One where every variable is independent:
  q(X) = \prod_i Q_i(X_i)
  This is the mean field approximation; under it, any probability can be computed very efficiently.

D(p||q) for mean field – KL the right way
With p the posterior and q = \prod_i Q_i:
  D(p||q) = -H(p) - \sum_i \sum_{x_i} p(x_i) \log Q_i(x_i)
Evaluating this requires the entropy and the single-variable marginals of p, which is exactly the intractable inference problem we set out to avoid.

D(q||p) for mean field – KL the reverse direction
  D(q||p) = \sum_x q(x) \log \frac{q(x)}{p(x)} = -H(q) - E_q[\log p(X)]
i.e., a negative entropy term plus a cross-entropy term; we examine each in turn.

D(q||p) for mean field – KL the reverse direction: Entropy term
Because q fully factorizes, its entropy decomposes into cheap local terms:
  H(q) = \sum_i H(Q_i) = -\sum_i \sum_{x_i} Q_i(x_i) \log Q_i(x_i)

D(q||p) for mean field – KL the reverse direction: Cross-entropy term
Writing p(X) = \frac{1}{Z} \prod_\phi \phi(X_\phi):
  -E_q[\log p(X)] = \log Z - \sum_\phi E_q[\log \phi(X_\phi)]
Each expectation involves only the Q_i of the variables in that factor's scope, so it is cheap to compute; \log Z is a constant that does not depend on q.

What you need to know so far
- Goal: find an efficient distribution that is close to the posterior.
- Distance: measured in terms of KL divergence.
- Asymmetry of KL: D(p||q) ≠ D(q||p).
- Computing the "right" KL is intractable, so we use the reverse KL.

Reverse KL & The Partition Function: Back to the general case
Consider again the definition of D(q||p), where p is a Markov net with partition function Z:
  p(X) = \frac{1}{Z} \tilde{P}(X), \quad \tilde{P}(X) = \prod_\phi \phi(X_\phi)
Theorem: \ln Z = F[\tilde{P}, Q] + D(Q||P), where the energy functional is
  F[\tilde{P}, Q] = E_Q[\ln \tilde{P}(X)] + H_Q(X) = \sum_\phi E_Q[\ln \phi(X_\phi)] + H_Q(X)
Since D(Q||P) ≥ 0, the energy functional is a lower bound on \ln Z, and minimizing the reverse KL over Q is equivalent to maximizing F[\tilde{P}, Q].
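To see the asymmetry numerically, here is a small self-contained Python sketch (the distributions are toy numbers, not from the slides) that computes both directions of the KL divergence using the conventions above:

```python
import math

def kl(p, q):
    """D(p||q) = sum_x p(x) * log(p(x) / q(x)).

    Conventions from the KL slide: terms with p(x) = 0 contribute
    nothing; any x with p(x) > 0 but q(x) = 0 makes D(p||q) infinite.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total

# A bimodal "posterior" p and a unimodal approximation q (toy numbers):
p = [0.45, 0.05, 0.05, 0.45]
q = [0.10, 0.40, 0.40, 0.10]
print(kl(p, q), kl(q, p))  # the two directions give different values
```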
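And here is a brute-force check of the theorem above on a tiny made-up Markov net (all factor values and Q marginals are invented for illustration): for any fully factorized Q, \ln Z should equal the energy functional plus D(Q||P).

```python
import itertools
import math

# Tiny binary Markov net with two pairwise factors, small enough
# to enumerate, so the identity can be verified exactly.
phis = {
    (0, 1): [[4.0, 1.0], [1.0, 4.0]],  # phi(X0, X1)
    (1, 2): [[3.0, 1.0], [1.0, 3.0]],  # phi(X1, X2)
}

def p_tilde(x):
    """Unnormalized measure: product of the factor values at x."""
    val = 1.0
    for (i, j), table in phis.items():
        val *= table[x[i]][x[j]]
    return val

states = list(itertools.product([0, 1], repeat=3))
Z = sum(p_tilde(x) for x in states)  # partition function

# An arbitrary fully factorized ("mean field") Q(X) = prod_i Q_i(X_i):
Q_marg = [[0.7, 0.3], [0.5, 0.5], [0.6, 0.4]]

def Q(x):
    return math.prod(Q_marg[i][xi] for i, xi in enumerate(x))

energy = sum(Q(x) * math.log(p_tilde(x)) for x in states)            # E_Q[ln P~]
entropy = -sum(Q(x) * math.log(Q(x)) for x in states)                # H_Q(X)
d_q_p = sum(Q(x) * math.log(Q(x) * Z / p_tilde(x)) for x in states)  # D(Q||P)

# Theorem: ln Z = F[P~, Q] + D(Q||P), with F = energy + entropy.
print(math.log(Z), energy + entropy + d_q_p)  # the two values agree
```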