CMU CS 10701 - lecture13-annotated

Machine Learning 10-701/15-781, Fall 2006
Graphical Models II: Inference
Eric Xing, Lecture 13, October 26, 2006
Reading: Chap. 8 of C. Bishop's book

Running example: the "Asia" network (X1, ..., X8), with nodes Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Bronchitis, Tuberculosis or Cancer, X-Ray Result, and Dyspnea.

Recap of Basic Prob. Concepts
- Joint probability distribution on multiple variables (chain rule):
  P(X1, X2, X3, X4, X5, X6) = P(X1) P(X2|X1) P(X3|X1,X2) P(X4|X1,X2,X3) P(X5|X1,...,X4) P(X6|X1,...,X5)
- If the Xi's are independent (P(Xi|·) = P(Xi)):
  P(X1, X2, X3, X4, X5, X6) = P(X1) P(X2) P(X3) P(X4) P(X5) P(X6) = ∏_i P(Xi)
- If the Xi's are conditionally independent (as described by a GM), the joint can be factored into simpler products, e.g.
  P(X1, X2, X3, X4, X5, X6) = P(X1) P(X2|X1) P(X3|X2) P(X4|X1) P(X5|X4) P(X6|X2,X5)

Markov Random Fields
- Structure: an undirected graph.
- Meaning: a node is conditionally independent of every other node in the network given its direct neighbors.
- Local contingency functions (potentials) and the cliques in the graph completely determine the joint distribution.
- They give correlations between variables, but no explicit way to generate samples.

Representation
- Defn: an undirected graphical model represents a distribution P(X1, ..., Xn) defined by an undirected graph H and a set of positive potential functions ψ_c associated with the cliques of H, such that
  P(x1, ..., xn) = (1/Z) ∏_{c∈C} ψ_c(x_c),
  where Z is known as the partition function:
  Z = ∑_{x1,...,xn} ∏_{c∈C} ψ_c(x_c)
- Also known as Markov Random Fields, Markov networks, ...
- The potential function can be understood as a contingency function of its arguments, assigning a "pre-probabilistic" score to their joint configuration.
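To make the definition concrete, here is a minimal sketch (not from the lecture; the graph, potential values, and variable names are invented for illustration) that evaluates the factored joint and its partition function Z by brute force on a tiny binary pairwise MRF:

```python
import itertools
import numpy as np

# A tiny pairwise MRF on three binary variables X1 -- X2 -- X3 (a chain),
# with one positive potential psi_c per edge (clique). Values are arbitrary.
psi_12 = np.array([[4.0, 1.0],
                   [1.0, 2.0]])   # psi_12[x1, x2]
psi_23 = np.array([[3.0, 1.0],
                   [1.0, 3.0]])   # psi_23[x2, x3]

def unnormalized(x1, x2, x3):
    """Product of clique potentials for one joint configuration."""
    return psi_12[x1, x2] * psi_23[x2, x3]

# Partition function Z: sum of the potential product over all configurations.
Z = sum(unnormalized(*x) for x in itertools.product([0, 1], repeat=3))

def joint(x1, x2, x3):
    """P(x1, x2, x3) = (1/Z) * prod_c psi_c(x_c)."""
    return unnormalized(x1, x2, x3) / Z

# Sanity check: the normalized joint sums to 1.
assert abs(sum(joint(*x) for x in itertools.product([0, 1], repeat=3)) - 1.0) < 1e-12
print(Z, joint(0, 0, 0))
```

Brute-force enumeration is feasible here only because there are 2^3 configurations; for large graphs, computing Z exactly is intractable, which motivates the inference algorithms discussed below.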
GMs are your old friends
- Density estimation: parametric and nonparametric methods
- Regression: linear, conditional mixture, nonparametric
- Classification: generative and discriminative approaches

An (incomplete) genealogy of graphical models
(Picture by Zoubin Ghahramani and Sam Roweis.)

Probabilistic Inference
- We now have compact representations of probability distributions: graphical models.
- A GM M describes a unique probability distribution P.
- How do we answer queries about P?
- We use "inference" as a name for the process of computing answers to such queries.

Query 1: Likelihood
- Most of the queries one may ask involve evidence.
- Evidence e is an assignment of values to a set E of variables in the domain.
- Without loss of generality, E = {X_{k+1}, ..., X_n}.
- Simplest query: compute the probability of the evidence,
  P(e) = ∑_{x1} ··· ∑_{xk} P(x1, ..., xk, e);
  this is often referred to as computing the likelihood of e.

Query 2: Conditional Probability
- Often we are interested in the conditional probability distribution of a variable given the evidence:
  P(X | e) = P(X, e) / P(e) = P(X, e) / ∑_x P(X = x, e)
  This is the a posteriori belief in X, given evidence e.
- We usually query a subset Y of all domain variables X = {Y, Z} and "don't care" about the remaining Z:
  P(Y | e) = ∑_z P(Y, Z = z | e)
  The process of summing out the "don't care" variables z is called marginalization, and the resulting P(y|e) is called a marginal probability.

Applications of a posteriori belief
- Prediction: what is the probability of an outcome given the starting condition? (The query node is a descendant of the evidence.)
- Diagnosis: what is the probability of a disease/fault given symptoms? (The query node is an ancestor of the evidence.)
- Learning under partial observation: fill in the unobserved values under an "EM" setting (more later).
- The directionality of information flow between variables is not restricted by the directionality of the edges in a GM; probabilistic inference can combine evidence from all parts of the network.

Query 3: Most Probable Assignment
- In this query we want to find the most probable joint assignment (MPA) for some variables of interest.
- Such reasoning is usually performed under some given evidence e, and ignoring (the values of) other variables z:
  MPA(Y | e) = argmax_y P(y | e) = argmax_y ∑_z P(y, z | e)
  This is the maximum a posteriori configuration of y.

Applications of MPA
- Classification: find the most likely label, given the evidence.
- Explanation: what is the most likely scenario, given the evidence?
- Cautionary note: the MPA of a variable depends on its "context", i.e., the set of variables being jointly queried.
- Example (MPA of X? MPA of (X, Y)? See the sketch below.):
    x  y  P(x,y)
    0  0  0.35
    0  1  0.05
    1  0  0.30
    1  1  0.30
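The three query types can be illustrated by brute-force summation over a small joint table. The sketch below (not from the lecture) uses the P(x, y) table from the cautionary example above and shows why the MPA of X alone can differ from the X-component of the MPA of (X, Y):

```python
# Joint table from the slide's cautionary example: P(x, y).
P = {(0, 0): 0.35, (0, 1): 0.05, (1, 0): 0.30, (1, 1): 0.30}

# Query 1 (likelihood): probability of the evidence e = {Y = 0}.
P_e = sum(P[(x, 0)] for x in (0, 1))                        # 0.65

# Query 2 (conditional probability): P(X | Y = 0) = P(X, Y=0) / P(Y=0).
P_x_given_e = {x: P[(x, 0)] / P_e for x in (0, 1)}          # {0: ~0.538, 1: ~0.462}

# Query 3 (most probable assignment): compare the MPA of X alone
# with the MPA of the pair (X, Y).
P_x = {x: sum(P[(x, y)] for y in (0, 1)) for x in (0, 1)}   # marginalize out Y
mpa_x = max(P_x, key=P_x.get)                               # x = 1   (prob 0.6)
mpa_xy = max(P, key=P.get)                                  # (0, 0)  (prob 0.35)

print(P_e, P_x_given_e, mpa_x, mpa_xy)
# The MPA of X (x = 1) disagrees with the X-component of the MPA of (X, Y),
# which is exactly the "context" caveat noted above.
```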
Complexity of Inference
- Thm: computing P(X = x | e) in a GM is NP-hard.
- Hardness does not mean we cannot solve inference; it implies that we cannot find a general procedure that works efficiently for arbitrary GMs.
- For particular families of GMs, we can have provably efficient procedures.

Approaches to inference
- Exact inference algorithms:
  - the elimination algorithm
  - the junction tree algorithms (not covered in detail here)
- Approximate inference techniques:
  - stochastic simulation / sampling methods
  - Markov chain Monte Carlo methods
  - variational algorithms (covered in advanced ML courses)

Marginalization and Elimination
- A signal transduction pathway: A -> B -> C -> D -> E.
- Query: P(e), i.e., what is the likelihood that protein E is active?
- By chain decomposition, we get
  P(e) = ∑_a ∑_b ∑_c ∑_d P(a, b, c, d, e) = ∑_a ∑_b ∑_c ∑_d P(a) P(b|a) P(c|b) P(d|c) P(e|d);
  naïve summation needs to enumerate over an exponential number of terms.

Elimination on Chains
- Rearranging terms:
  P(e) = ∑_d ∑_c ∑_b P(c|b) P(d|c) P(e|d) ∑_a P(a) P(b|a)
- Now we can perform the innermost summation:
  P(e) = ∑_d ∑_c ∑_b P(c|b) P(d|c) P(e|d) p(b)
  This summation "eliminates" one variable (here A) from the summation at a "local cost".
- Rearranging and then summing again, we get
  P(e) = ∑_d ∑_c P(d|c) P(e|d) ∑_b P(c|b) p(b) = ∑_d ∑_c P(d|c) P(e|d) p(c)
- Eliminating nodes one by one all the way to the end, we get
  P(e) = ∑_d P(e|d) p(d)
- Complexity: each step costs O(|Val(Xi)| * |Val(Xi+1)|) = O(k^2) operations, for O(nk^2) in total; compare this to naïve evaluation, which sums over the joint values of n-1 variables, O(k^n).
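As a sanity check on the complexity claim, here is a small numpy sketch (not the lecture's code; the chain length, state count, and CPTs are made up for illustration) that computes P(e) both by naïve enumeration over all joint configurations and by eliminating one variable at a time with k x k matrix-vector products:

```python
import numpy as np

# A -> B -> C -> D -> E chain, each variable with k states.
# The CPTs are random here, purely for illustration (each row sums to 1).
rng = np.random.default_rng(0)
k = 3

def random_cpt():
    m = rng.random((k, k))
    return m / m.sum(axis=1, keepdims=True)   # cpt[parent, child] = P(child | parent)

p_a = rng.dirichlet(np.ones(k))               # P(a)
p_b_a, p_c_b, p_d_c, p_e_d = (random_cpt() for _ in range(4))

# Naive: sum the full joint over a, b, c, d -- exponentially many (k^{n-1}) terms.
P_e_naive = np.zeros(k)
for a in range(k):
    for b in range(k):
        for c in range(k):
            for d in range(k):
                P_e_naive += p_a[a] * p_b_a[a, b] * p_c_b[b, c] * p_d_c[c, d] * p_e_d[d]

# Elimination: push each sum inward and eliminate one variable at a time;
# each step is a single k x k matrix-vector product, O(nk^2) in total.
p_b = p_a @ p_b_a          # p(b) = sum_a P(a) P(b|a)
p_c = p_b @ p_c_b          # p(c) = sum_b p(b) P(c|b)
p_d = p_c @ p_d_c          # p(d) = sum_c p(c) P(d|c)
P_e = p_d @ p_e_d          # P(e) = sum_d p(d) P(e|d)

assert np.allclose(P_e, P_e_naive)
print(P_e)
```

Both routes give the same P(e); the difference is cost, O(k^n) for the naïve sum versus O(nk^2) for elimination on the chain.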

