Slide 1: Directed Graphical Probabilistic Models: the sequel
William W. Cohen
Machine Learning 10-601
Feb 22 2008

Slide 2: Directed Graphical Probabilistic Models: the son of the child of the bride of the sequel
William W. Cohen
Machine Learning 10-601
Feb 27 2008

Slide 3: Outline
• Quick recap
• An example of learning
  • Given structure, find CPTs from "fully observed" data
  • Some interesting special cases of this
• Learning with hidden variables
  • Expectation-maximization
  • Handwave argument for why EM works

Slide 4: The story so far: Bayes nets
• Many problems can be solved using the joint probability P(X_1, …, X_n).
• Bayes nets describe a way to compactly write the joint.

[Monty Hall network: A = first guess, B = the money, C = the goat, D = stick or swap?, E = second guess. A and B are parents of C; A, C, and D are parents of E.]

A  P(A)      B  P(B)      A B C  P(C|A,B)      A C D E  P(E|A,C,D)
1  0.33      1  0.33      1 1 2  0.5           … … … …  …
2  0.33      2  0.33      1 1 3  0.5
3  0.33      3  0.33      1 2 3  1.0
                          1 3 2  1.0
                          … … …  …

• For a Bayes net, the joint factors according to the graph:

  P(X_1, …, X_n) = \prod_i P(X_i | X_1, …, X_{i-1}) = \prod_i P(X_i | parents(X_i))

  For this network:

  P(A,B,C,D,E) = P(A) P(B|A) P(C|A,B) P(D|A,B,C) P(E|A,B,C,D)
               = P(A) P(B) P(C|A,B) P(D) P(E|A,C,D)

• Conditional independence: I<X,E,Y> means X is independent of Y given E, i.e.

  P(X | E, Y) = P(X | E)

Slide 5: The story so far: d-separation
I<X,E,Y>, i.e. for all x, y, e: P(X=x | E=e, Y=y) = P(X=x | E=e).

[Three diagrams: a path X–Z–Y is blocked when Z ∈ E in a chain (X→Z→Y) or a common cause (X←Z→Y), or when Z and all of its descendants are outside E in a v-structure (X→Z←Y).]

There are three ways paths from X to Y given evidence E can be blocked. X is d-separated
from Y given E iff all paths from X to Y given E are blocked. If X is d-separated from Y given E, then I<X,E,Y>.

Slide 6: The story so far: "Explaining away"
[Network X → E ← Y], with P(X=1) = 0.5 and P(Y=1) = 0.5.

X Y E  P(E|X,Y)  P(E,X,Y)
0 0 0  0.96      0.24
0 0 1  0.04      0.01
0 1 0  0.04      0.01
0 1 1  0.96      0.24
1 0 0  0.04      0.01
1 0 1  0.96      0.24
1 1 0  0.04      0.01
1 1 1  0.96      0.24

With no evidence, Y ⊥ X | {} (they are marginally independent). But conditioned on E:

  P(Y=1 | E=1, X=0) = P(Y=1, E=1, X=0) / P(E=1, X=0) = 0.24 / 0.25 = 0.96
  P(Y=1 | E=1) = P(Y=1, E=1) / P(E=1) = 0.48 / 0.73 ≈ 2/3

so Y is NOT independent of X given E: once the common effect E is observed, learning the value of X changes our belief about Y.

X Y E  P(E|X,Y)  P(E,X,Y)  P(X,Y|E=1)
0 0 1  0.04      0.01      0.014
0 1 1  0.96      0.24      0.329
1 0 1  0.96      0.24      0.329
1 1 1  0.96      0.24      0.329

Slide 7: Recap: Inference in linear chain networks
[Chain X_1 → … → X_j → … → X_n, with evidence at the two ends.]

  P(X_1, …, X_n) = P(X_n | X_{n-1}) P(X_{n-1} | X_{n-2}) ⋯ P(X_2 | X_1) P(X_1)

To compute P(X_j | x_1, x_n), combine a "forward" part P(X_j = x | x_1) with a "backward" part P(x_n | X_j = x):

  P(X_j | x_1, x_n) ∝ P(X_j | x_1) P(x_n | X_j)

Instead of recursion you can use "message passing" (forward-backward, Baum-Welch).

Slide 8: Recap: Inference in polytrees
• Reduce P(X|E) to the product of two recursively calculated parts:
  • P(X=x | E+)
    • i.e., CPT for X and product of "forward" messages from the parents U_1, U_2:

      P(X=x | E+) = \sum_{u_1, u_2} P(X=x | u_1, u_2) \prod_j P(u_j | E+_{U_j})

  • P(E− | X=x)
    • i.e., combination of "backward" messages from the children Y_k, CPTs, and P(z_k | E_{Z_k \ Y_k}) — a simpler instance of P(X|E):

      P(E− | X=x) = \prod_k \sum_{y_k} \sum_{z_k} P(E−_{Y_k} | y_k) P(y_k | x, z_k) P(z_k | E_{Z_k \ Y_k})

• This can also be implemented by message-passing (belief propagation).

Slide 9: Recap: Learning for Bayes nets
• Input:
  • Sample of the joint: (x_1^1, …, x_n^1), (x_1^2, …, x_n^2), …, (x_1^m, …, x_n^m)
  • Graph structure of the variables: for i = 1, …, n, you know X_i and parents(X_i)
• Output:
  • Estimated CPTs [for the A–E network above: P(B), P(C|A,B), …]

Method (discrete variables):
• Estimate each CPT independently
• Use an MLE or MAP

Slide 10: Recap: Learning for Bayes nets
Method (discrete variables): estimate each CPT independently, using an MLE or MAP for each parameter \hat\theta_{j: x|u} = P(X_j = x | parents(X_j) = u):

  MLE:  \hat\theta_{j: x|u} = #D{X_j = x ∧ parents(X_j) = u} / \sum_{x'} #D{X_j = x' ∧ parents(X_j) = u}

  MAP:  \hat\theta_{j: x|u} = (#D{X_j = x ∧ parents(X_j) = u} + 1) / \sum_{x'} (#D{X_j = x' ∧ parents(X_j) = u} + 1)

Slide 11: Recap: A detailed example
[Network Z → X, Z → Y, with Z ∈ {undergrad, grad, prof}, X ∈ {<20, 20s, 30+}, Y ∈ {facebk, thesis, grants}. The CPTs P(Z), P(X|Z), P(Y|Z) are initially empty.]

Sample (Z, X, Y):
ugrad  <20  facebook
ugrad  20s  facebook
grad   20s  thesis
grad   20s  facebook
prof   30+  grants

Estimate \hat\theta_{z:ug}, \hat\theta_{z:gr}, \hat\theta_{z:pr} with the MAP formula, counting over D:

D:
ugrad  <20  facebook
ugrad  <20  thesis
ugrad  <20  grant
ugrad  20s  facebook
ugrad  20s  …
ugrad  30+  facebook
ugrad  …    …
grad   …    ..

Slide 12: A detailed example
Using the five-example sample and the MAP (add-one) estimator:

  \hat\theta_{z:ug} = (2+1)/(5+3) = 0.375
  \hat\theta_{z:gr} = (2+1)/(5+3) = 0.375
  \hat\theta_{z:pr} = (1+1)/(5+3) = 0.25

  \hat\theta_{x:<20|ug} = (1+1)/(2+3) = 0.4
  \hat\theta_{x:20s|ug} = (1+1)/(2+3) = 0.4
  \hat\theta_{x:30+|ug} = (0+1)/(2+3) = 0.2

Z        P(Z)
undergr  0.375
grad     0.375
prof     0.250

Slide 13: A detailed example
Filling in P(X|Z), then estimating the prof row of P(Y|Z):

Z        X    P(X|Z)
undergr  <20  .4
         20s  .4
         30+  .2
grad     <20  .2
         20s  .6
         30+  .2
prof     <20  .25
         20s  .25
         30+  .5

  \hat\theta_{y:fb|pr} = (0+1)/(1+3) = 0.25
  \hat\theta_{y:th|pr} = (0+1)/(1+3) = 0.25
  \hat\theta_{y:gr|pr} = (1+1)/(1+3) = 0.5

Slide 14: A detailed example
All CPTs estimated:

Z        P(Z)      Z        X    P(X|Z)    Z        Y       P(Y|Z)
undergr  0.375     undergr  <20  .4        undergr  facebk  .6
grad     0.375              20s  .4                 thesis  .2
prof     0.250              30+  .2                 grants  .2
                   grad     <20  .2        grad     facebk  .4
                            20s  .6                 thesis  .4
                            30+  .2                 grants  .2
                   prof     <20  .25       prof     facebk  .25
                            20s  .25                thesis  .25
                            30+  .5                 grants  .5

Now we're done learning: what can we do with this?
• Guess what your favorite professor is doing now?
• Given a new x, y, compute P(prof|x,y), P(grad|x,y), P(ugrad|x,y) … using the Bayes net.
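The MAP (add-one) estimates worked through on the slides can be reproduced in a few lines. This is a minimal sketch, not code from the lecture: the variable and value names come from the Z → X, Z → Y example, and `map_cpt` is an illustrative helper.

```python
from collections import Counter

# The five fully observed (Z, X, Y) examples from the slides.
data = [
    ("ugrad", "<20", "facebook"),
    ("ugrad", "20s", "facebook"),
    ("grad",  "20s", "thesis"),
    ("grad",  "20s", "facebook"),
    ("prof",  "30+", "grants"),
]

Z_VALS = ["ugrad", "grad", "prof"]
X_VALS = ["<20", "20s", "30+"]
Y_VALS = ["facebook", "thesis", "grants"]

def map_cpt(pairs, parent_vals, child_vals):
    """MAP (add-one) estimate of P(child | parent) from (parent, child) pairs."""
    counts = Counter(pairs)
    cpt = {}
    for u in parent_vals:
        total = sum(counts[(u, x)] for x in child_vals)
        for x in child_vals:
            # (count + 1) / (total + number of child values): one pseudocount per value.
            cpt[(u, x)] = (counts[(u, x)] + 1) / (total + len(child_vals))
    return cpt

# P(Z): MAP estimate over the whole sample.
z_counts = Counter(z for z, _, _ in data)
p_z = {z: (z_counts[z] + 1) / (len(data) + len(Z_VALS)) for z in Z_VALS}

# Each CPT is estimated independently, as the method on the slides prescribes.
p_x_given_z = map_cpt([(z, x) for z, x, _ in data], Z_VALS, X_VALS)
p_y_given_z = map_cpt([(z, y) for z, _, y in data], Z_VALS, Y_VALS)

print(p_z["prof"])                      # 0.25, matching (1+1)/(5+3)
print(p_x_given_z[("ugrad", "<20")])    # 0.4, matching (1+1)/(2+3)
print(p_y_given_z[("prof", "grants")])  # 0.5, matching (1+1)/(1+3)
```

The printed values match the filled-in tables on the final slide of the example.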
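The closing question on the last slide — given a new (x, y), compute P(prof|x,y), P(grad|x,y), P(ugrad|x,y) — is a direct application of Bayes' rule over the learned CPTs: P(Z | x, y) ∝ P(Z) P(x|Z) P(y|Z). A minimal sketch using the numbers from the final table (the function name is illustrative):

```python
# Learned CPTs from the final slide (MAP estimates with add-one smoothing).
p_z = {"ugrad": 0.375, "grad": 0.375, "prof": 0.25}
p_x_given_z = {
    "ugrad": {"<20": 0.4,  "20s": 0.4,  "30+": 0.2},
    "grad":  {"<20": 0.2,  "20s": 0.6,  "30+": 0.2},
    "prof":  {"<20": 0.25, "20s": 0.25, "30+": 0.5},
}
p_y_given_z = {
    "ugrad": {"facebook": 0.6,  "thesis": 0.2,  "grants": 0.2},
    "grad":  {"facebook": 0.4,  "thesis": 0.4,  "grants": 0.2},
    "prof":  {"facebook": 0.25, "thesis": 0.25, "grants": 0.5},
}

def posterior_z(x, y):
    """P(Z | X=x, Y=y): proportional to P(Z) P(x|Z) P(y|Z), then normalized."""
    scores = {z: p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y] for z in p_z}
    total = sum(scores.values())
    return {z: s / total for z, s in scores.items()}

post = posterior_z("30+", "grants")
print(post["prof"])  # ≈ 0.676: someone 30+ who works on grants is probably a prof
```

Because Z has only three values, normalizing the three products is all the "inference" this two-edge network needs; in a larger polytree the same quantity would come out of the message-passing scheme recapped earlier.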