MIT 6.867 - Lecture Notes

Machine learning: lecture 22
Tommi S. Jaakkola, MIT CSAIL

Review: graphs and probabilities

• Directed graphical models (Bayesian networks). For the graph $x_1 \to x_3 \leftarrow x_2$, the associated distribution is

  $P(x_1) \cdot P(x_2) \cdot P(x_3 \mid x_1, x_2)$

  Here $x_1$ is d-separated from $x_2$ given nothing, so $x_1$ is independent of $x_2$ given nothing: graph ⇒ d-separation ⇒ independence properties of the associated distribution.

• Undirected graphical models (Markov random fields). For the graph $x_1 - x_3 - x_2$, the associated distribution is

  $\frac{1}{Z} \cdot \psi_{13}(x_1, x_3) \cdot \psi_{23}(x_2, x_3)$

  Here $x_1$ is separated from $x_2$ given $x_3$, so $x_1$ is independent of $x_2$ given $x_3$: graph ⇒ separation ⇒ independence properties of the associated distribution.

Outline

• Quantitative probabilistic inference
  – diagnosis example
  – marginalization and message passing
  – cliques, clique trees, and the junction tree algorithm

Three inference problems

• Medical diagnosis example: binary disease variables $d$ and possible findings $f$. [Figure: a bipartite network with a layer of disease variables above a layer of findings.]

1. What are the marginal posterior probabilities over the diseases given observed findings $f^* = \{f_1^*, \ldots, f_k^*\}$?
2. What is the most likely setting of all the underlying disease variables?
3. Which test should we carry out next in order to get the most information about the diseases?

Inference: graph transformation

• Consider a small instance with diseases $d_1, d_2, d_3$ and findings $f_1, f_2$:

  $P(d_1) \, P(d_2) \, P(d_3) \, P(f_1 \mid d_1, d_2) \, P(f_2 \mid d_2, d_3)$

• Fixing the findings at their observed values $f_1^*, f_2^*$ and collecting terms yields pairwise potentials over the diseases:

  $\psi_{12}(d_1, d_2) = P(d_1) \, P(d_2) \, P(f_1^* \mid d_1, d_2)$
  $\psi_{23}(d_2, d_3) = P(d_3) \, P(f_2^* \mid d_2, d_3)$

• The joint distribution is then a product of "interaction potentials":

  $P(d_1, d_2, d_3, \text{data}) = \psi_{12}(d_1, d_2) \cdot \psi_{23}(d_2, d_3)$

• We have transformed the Bayesian network into an undirected graphical model (Markov random field) over the chain $d_1 - d_2 - d_3$, with undirected interactions and independence properties based on simple graph separation.

Marginalization

• It suffices to evaluate the following probabilities:

  $P(d_1, \text{data}) = \sum_{d_2, d_3} P(d_1, d_2, d_3, \text{data})$
  $P(d_2, \text{data}) = \sum_{d_1, d_3} P(d_1, d_2, d_3, \text{data})$
  $P(d_3, \text{data}) = \sum_{d_1, d_2} P(d_1, d_2, d_3, \text{data})$

  These readily yield the posterior probabilities of interest:

  $P(d_1 \mid \text{data}) = P(d_1, \text{data}) \Big/ \sum_{d_1'} P(d_1', \text{data})$

Marginalization and messages

• Eliminating $d_1$ first:

  $P(d_2, d_3, \text{data}) = \sum_{d_1} P(d_1, d_2, d_3, \text{data})$
  $\qquad = \sum_{d_1} \psi_{12}(d_1, d_2) \cdot \psi_{23}(d_2, d_3)$
  $\qquad = \Big[ \sum_{d_1} \psi_{12}(d_1, d_2) \Big] \cdot \psi_{23}(d_2, d_3)$
  $\qquad = m_{1\to 2}(d_2) \cdot \psi_{23}(d_2, d_3)$

  The bracketed sum depends only on $d_2$: it is the message $m_{1\to 2}(d_2)$.

• Continuing by eliminating $d_2$:

  $P(d_3, \text{data}) = \sum_{d_2} P(d_2, d_3, \text{data})$
  $\qquad = \sum_{d_2} m_{1\to 2}(d_2) \cdot \psi_{23}(d_2, d_3)$
  $\qquad = m_{2\to 3}(d_3)$

• Similarly, summing out $d_3$ instead:

  $P(d_2, \text{data}) = \sum_{d_3} P(d_2, d_3, \text{data})$
  $\qquad = \sum_{d_3} m_{1\to 2}(d_2) \cdot \psi_{23}(d_2, d_3)$
  $\qquad = m_{1\to 2}(d_2) \cdot \sum_{d_3} \psi_{23}(d_2, d_3)$
  $\qquad = m_{1\to 2}(d_2) \cdot m_{3\to 2}(d_2)$

Message passing and trees

• The same message passing approach (belief propagation) works for any tree-structured model. For example, attach a fourth variable $d_4$ to $d_2$ through a potential $\psi_{24}(d_2, d_4)$; then

  $m_{2\to 3}(d_3) = \sum_{d_2} m_{1\to 2}(d_2) \, m_{4\to 2}(d_2) \, \psi_{23}(d_2, d_3)$

  $P(d_2, \text{data}) = ?$ (see the sketch below)
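To make the messages concrete, here is a minimal numpy sketch of belief propagation on the four-node tree just described. The numeric potential tables are made-up illustrative values, not from the lecture (in the diagnosis example they would be built from the CPTs and the observed findings); only the message equations themselves come from the slides.

```python
import numpy as np

# Hypothetical potential tables over binary variables, indexed psiXY[x, y].
# In the diagnosis example, e.g. psi12[d1, d2] = P(d1) P(d2) P(f1* | d1, d2).
psi12 = np.array([[0.9, 0.3],
                  [0.2, 0.6]])   # psi12[d1, d2]
psi23 = np.array([[0.8, 0.1],
                  [0.4, 0.7]])   # psi23[d2, d3]
psi24 = np.array([[0.5, 0.5],
                  [0.3, 0.9]])   # psi24[d2, d4]

# Messages: each one sums out the sending variable.
m12 = psi12.sum(axis=0)          # m_{1->2}(d2) = sum_{d1} psi12(d1, d2)
m42 = psi24.sum(axis=1)          # m_{4->2}(d2) = sum_{d4} psi24(d2, d4)
m32 = psi23.sum(axis=1)          # m_{3->2}(d2) = sum_{d3} psi23(d2, d3)
m23 = (m12 * m42) @ psi23        # m_{2->3}(d3), exactly as on the slide

# Unnormalized marginals: product of all messages arriving at a node.
p2 = m12 * m42 * m32             # P(d2, data)
p3 = m23                         # P(d3, data)

# Posteriors follow by normalization.
print("P(d2 | data) =", p2 / p2.sum())
print("P(d3 | data) =", p3 / p3.sum())

# Sanity check against brute-force marginalization of the joint
# psi12(d1,d2) psi23(d2,d3) psi24(d2,d4), with axes ordered (d1, d2, d3, d4).
joint = np.einsum('ab,bc,bd->abcd', psi12, psi23, psi24)
assert np.allclose(p2, joint.sum(axis=(0, 2, 3)))
assert np.allclose(p3, joint.sum(axis=(0, 1, 3)))
```

Dropping the $d_4$ branch (remove m42 from the products) reproduces the three-node chain computed on the previous slides.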
Back to the diagnosis problem

• This does not look like a tree... [Figure: the full bipartite diseases-findings network, with findings sharing multiple disease parents.] The transformed diagnosis network is in general not tree-structured, so the tree message passing above does not directly apply.

Exact inference

• All exact inference algorithms for Bayesian networks perform essentially the same calculations but operate on different representations.
• The junction tree algorithm is a simple message passing algorithm over clusters of variables.

Preliminary steps (steps 1 and 3 are sketched in code after this list):
1. transform the Bayesian network into an undirected model via moralization ("marry parents")
2. triangulate the resulting undirected graph (add edges)
3. identify the cliques (clusters) of the resulting triangulated graph
4. construct the junction tree from the cliques
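A minimal sketch of steps 1 and 3 on the small three-disease network from earlier; the helper names moralize and maximal_cliques are my own, not from the lecture. Step 2 is a no-op here because the moral graph (two triangles sharing $d_2$) is already triangulated.

```python
from itertools import combinations

# Bayesian network as child -> parents (the d1, d2, d3 / f1, f2 example).
parents = {'d1': [], 'd2': [], 'd3': [],
           'f1': ['d1', 'd2'], 'f2': ['d2', 'd3']}

def moralize(parents):
    """Step 1: marry co-parents of each node, then drop edge directions."""
    edges = set()
    for child, pars in parents.items():
        for p in pars:                         # undirected child-parent edges
            edges.add(frozenset((child, p)))
        for p, q in combinations(pars, 2):     # "marry parents"
            edges.add(frozenset((p, q)))
    return edges

def maximal_cliques(nodes, edges):
    """Step 3: brute-force maximal cliques (fine for a 5-node graph)."""
    def is_clique(s):
        return all(frozenset(pair) in edges for pair in combinations(s, 2))
    cliques = [set(s) for r in range(1, len(nodes) + 1)
               for s in combinations(nodes, r) if is_clique(s)]
    return [c for c in cliques if not any(c < other for other in cliques)]

edges = moralize(parents)
print(sorted(sorted(c) for c in maximal_cliques(list(parents), edges)))
# prints [['d1', 'd2', 'f1'], ['d2', 'd3', 'f2']]
```

For this graph, step 4 is immediate: the two cliques joined through the separator $\{d_2\}$ already form a junction tree, which the final sketch below puts to work.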



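Finally, a sketch of what step 4 buys us, under the same assumptions as above: with the observed findings absorbed, the clique potentials for $\{d_1, d_2, f_1\}$ and $\{d_2, d_3, f_2\}$ are just the $\psi_{12}$ and $\psi_{23}$ tables from the earlier sketch, and a single message over the separator $\{d_2\}$ reproduces the $m_{1\to 2}$ computation from the marginalization slides.

```python
import numpy as np

# Clique potentials after absorbing the observed findings; same
# illustrative numbers as in the earlier belief propagation sketch.
psi_C1 = np.array([[0.9, 0.3],
                   [0.2, 0.6]])  # clique {d1, d2, f1}, table over (d1, d2)
psi_C2 = np.array([[0.8, 0.1],
                   [0.4, 0.7]])  # clique {d2, d3, f2}, table over (d2, d3)

# Message C1 -> C2: marginalize the clique table onto the separator {d2}.
m_C1_to_C2 = psi_C1.sum(axis=0)  # identical to m_{1->2}(d2) above

# Absorbing the message turns C2's table into P(d2, d3, data).
p_d2_d3 = m_C1_to_C2[:, None] * psi_C2
print("P(d3 | data) =", p_d2_d3.sum(axis=0) / p_d2_d3.sum())
```

On this small example the clique messages coincide with the variable-level messages; in general the junction tree algorithm repeats exactly this marginalize-and-multiply update between neighboring cliques.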