Slide 1: Directed Graphical Probabilistic Models: the sequel
William W. Cohen
Machine Learning 10-601
Feb 22 2008

Slide 2: Directed Graphical Probabilistic Models: the son of the child of the bride of the sequel
William W. Cohen
Machine Learning 10-601
Feb 27 2008

Slide 3: Outline
• Quick recap
• An example of learning
  • Given structure, find CPTs from "fully observed" data
  • Some interesting special cases of this
• Learning with hidden variables
  • Expectation-maximization
  • Handwave argument for why EM works

Slide 4: The story so far: Bayes nets
• Many problems can be solved using the joint probability P(X_1, …, X_n).
• Bayes nets describe a way to compactly write the joint.

[Monty Hall network: A = first guess, B = the money, C = the goat, D = stick or swap?, E = second guess. A and B are parents of C; A, C, and D are parents of E.]

A  P(A)      B  P(B)      A B C  P(C|A,B)      A C D E  P(E|A,C,D)
1  0.33      1  0.33      1 1 2  0.5           … … … …  …
2  0.33      2  0.33      1 1 3  0.5
3  0.33      3  0.33      1 2 3  1.0
                          1 3 2  1.0
                          … … …  …

• For a Bayes net, the joint factors according to the graph:

  P(X_1, …, X_n) = \prod_i P(X_i | X_1, …, X_{i-1}) = \prod_i P(X_i | parents(X_i))

  For this network:

  P(A,B,C,D,E) = P(A) P(B|A) P(C|A,B) P(D|A,B,C) P(E|A,B,C,D)
               = P(A) P(B) P(C|A,B) P(D) P(E|A,C,D)

• Conditional independence: I<X,E,Y> means X is independent of Y given E, i.e.

  P(X | E, Y) = P(X | E)

Slide 5: The story so far: d-separation
I<X,E,Y>, i.e. for all x, y, e: P(X=x | E=e, Y=y) = P(X=x | E=e).

[Three diagrams: a path X–Z–Y is blocked when Z ∈ E in a chain (X→Z→Y) or a common cause (X←Z→Y), or when Z and all of its descendants are outside E in a v-structure (X→Z←Y).]

There are three ways paths from X to Y given evidence E can be blocked. X is d-separated
from Y given E iff all paths from X to Y given E are blocked. If X is d-separated from Y given E, then I<X,E,Y>.

Slide 6: The story so far: "Explaining away"
[Network X → E ← Y], with P(X=1) = 0.5 and P(Y=1) = 0.5.

X Y E  P(E|X,Y)  P(E,X,Y)
0 0 0  0.96      0.24
0 0 1  0.04      0.01
0 1 0  0.04      0.01
0 1 1  0.96      0.24
1 0 0  0.04      0.01
1 0 1  0.96      0.24
1 1 0  0.04      0.01
1 1 1  0.96      0.24

With no evidence, Y ⊥ X | {} (they are marginally independent). But conditioned on E:

  P(Y=1 | E=1, X=0) = P(Y=1, E=1, X=0) / P(E=1, X=0) = 0.24 / 0.25 = 0.96
  P(Y=1 | E=1) = P(Y=1, E=1) / P(E=1) = 0.48 / 0.73 ≈ 2/3

so Y is NOT independent of X given E: once the common effect E is observed, learning the value of X changes our belief about Y.

X Y E  P(E|X,Y)  P(E,X,Y)  P(X,Y|E=1)
0 0 1  0.04      0.01      0.014
0 1 1  0.96      0.24      0.329
1 0 1  0.96      0.24      0.329
1 1 1  0.96      0.24      0.329

Slide 7: Recap: Inference in linear chain networks
[Chain X_1 → … → X_j → … → X_n, with evidence at the two ends.]

  P(X_1, …, X_n) = P(X_n | X_{n-1}) P(X_{n-1} | X_{n-2}) ⋯ P(X_2 | X_1) P(X_1)

To compute P(X_j | x_1, x_n), combine a "forward" part P(X_j = x | x_1) with a "backward" part P(x_n | X_j = x):

  P(X_j | x_1, x_n) ∝ P(X_j | x_1) P(x_n | X_j)

Instead of recursion you can use "message passing" (forward-backward, Baum-Welch).

Slide 8: Recap: Inference in polytrees
• Reduce P(X|E) to the product of two recursively calculated parts:
  • P(X=x | E+)
    • i.e., CPT for X and product of "forward" messages from the parents U_1, U_2:

      P(X=x | E+) = \sum_{u_1, u_2} P(X=x | u_1, u_2) \prod_j P(u_j | E+_{U_j})

  • P(E− | X=x)
    • i.e., combination of "backward" messages from the children Y_k, CPTs, and P(z_k | E_{Z_k \ Y_k}) — a simpler instance of P(X|E):

      P(E− | X=x) = \prod_k \sum_{y_k} \sum_{z_k} P(E−_{Y_k} | y_k) P(y_k | x, z_k) P(z_k | E_{Z_k \ Y_k})

• This can also be implemented by message-passing (belief propagation).

Slide 9: Recap: Learning for Bayes nets
• Input:
  • Sample of the joint: (x_1^1, …, x_n^1), (x_1^2, …, x_n^2), …, (x_1^m, …, x_n^m)
  • Graph structure of the variables: for i = 1, …, n, you know X_i and parents(X_i)
• Output:
  • Estimated CPTs [for the A–E network above: P(B), P(C|A,B), …]

Method (discrete variables):
• Estimate each CPT independently
• Use an MLE or MAP

Slide 10: Recap: Learning for Bayes nets
Method (discrete variables): estimate each CPT independently, using an MLE or MAP for each parameter \hat\theta_{j: x|u} = P(X_j = x | parents(X_j) = u):

  MLE:  \hat\theta_{j: x|u} = #D{X_j = x ∧ parents(X_j) = u} / \sum_{x'} #D{X_j = x' ∧ parents(X_j) = u}

  MAP:  \hat\theta_{j: x|u} = (#D{X_j = x ∧ parents(X_j) = u} + 1) / \sum_{x'} (#D{X_j = x' ∧ parents(X_j) = u} + 1)

Slide 11: Recap: A detailed example
[Network Z → X, Z → Y, with Z ∈ {undergrad, grad, prof}, X ∈ {<20, 20s, 30+}, Y ∈ {facebk, thesis, grants}. The CPTs P(Z), P(X|Z), P(Y|Z) are initially empty.]

Sample (Z, X, Y):
ugrad  <20  facebook
ugrad  20s  facebook
grad   20s  thesis
grad   20s  facebook
prof   30+  grants

Estimate \hat\theta_{z:ug}, \hat\theta_{z:gr}, \hat\theta_{z:pr} with the MAP formula, counting over D:

D:
ugrad  <20  facebook
ugrad  <20  thesis
ugrad  <20  grant
ugrad  20s  facebook
ugrad  20s  …
ugrad  30+  facebook
ugrad  …    …
grad   …    ..

Slide 12: A detailed example
Using the five-example sample and the MAP (add-one) estimator:

  \hat\theta_{z:ug} = (2+1)/(5+3) = 0.375
  \hat\theta_{z:gr} = (2+1)/(5+3) = 0.375
  \hat\theta_{z:pr} = (1+1)/(5+3) = 0.25

  \hat\theta_{x:<20|ug} = (1+1)/(2+3) = 0.4
  \hat\theta_{x:20s|ug} = (1+1)/(2+3) = 0.4
  \hat\theta_{x:30+|ug} = (0+1)/(2+3) = 0.2

Z        P(Z)
undergr  0.375
grad     0.375
prof     0.250

Slide 13: A detailed example
Filling in P(X|Z), then estimating the prof row of P(Y|Z):

Z        X    P(X|Z)
undergr  <20  .4
         20s  .4
         30+  .2
grad     <20  .2
         20s  .6
         30+  .2
prof     <20  .25
         20s  .25
         30+  .5

  \hat\theta_{y:fb|pr} = (0+1)/(1+3) = 0.25
  \hat\theta_{y:th|pr} = (0+1)/(1+3) = 0.25
  \hat\theta_{y:gr|pr} = (1+1)/(1+3) = 0.5

Slide 14: A detailed example
All CPTs estimated:

Z        P(Z)      Z        X    P(X|Z)    Z        Y       P(Y|Z)
undergr  0.375     undergr  <20  .4        undergr  facebk  .6
grad     0.375              20s  .4                 thesis  .2
prof     0.250              30+  .2                 grants  .2
                   grad     <20  .2        grad     facebk  .4
                            20s  .6                 thesis  .4
                            30+  .2                 grants  .2
                   prof     <20  .25       prof     facebk  .25
                            20s  .25                thesis  .25
                            30+  .5                 grants  .5

Now we're done learning: what can we do with this?
• Guess what your favorite professor is doing now?
• Given a new x, y, compute P(prof|x,y), P(grad|x,y), P(ugrad|x,y) … using the Bayes net.
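The MAP (add-one) estimates worked through on the slides can be reproduced in a few lines. This is a minimal sketch, not code from the lecture: the variable and value names come from the Z → X, Z → Y example, and `map_cpt` is an illustrative helper.

```python
from collections import Counter

# The five fully observed (Z, X, Y) examples from the slides.
data = [
    ("ugrad", "<20", "facebook"),
    ("ugrad", "20s", "facebook"),
    ("grad",  "20s", "thesis"),
    ("grad",  "20s", "facebook"),
    ("prof",  "30+", "grants"),
]

Z_VALS = ["ugrad", "grad", "prof"]
X_VALS = ["<20", "20s", "30+"]
Y_VALS = ["facebook", "thesis", "grants"]

def map_cpt(pairs, parent_vals, child_vals):
    """MAP (add-one) estimate of P(child | parent) from (parent, child) pairs."""
    counts = Counter(pairs)
    cpt = {}
    for u in parent_vals:
        total = sum(counts[(u, x)] for x in child_vals)
        for x in child_vals:
            # (count + 1) / (total + number of child values): one pseudocount per value.
            cpt[(u, x)] = (counts[(u, x)] + 1) / (total + len(child_vals))
    return cpt

# P(Z): MAP estimate over the whole sample.
z_counts = Counter(z for z, _, _ in data)
p_z = {z: (z_counts[z] + 1) / (len(data) + len(Z_VALS)) for z in Z_VALS}

# Each CPT is estimated independently, as the method on the slides prescribes.
p_x_given_z = map_cpt([(z, x) for z, x, _ in data], Z_VALS, X_VALS)
p_y_given_z = map_cpt([(z, y) for z, _, y in data], Z_VALS, Y_VALS)

print(p_z["prof"])                      # 0.25, matching (1+1)/(5+3)
print(p_x_given_z[("ugrad", "<20")])    # 0.4, matching (1+1)/(2+3)
print(p_y_given_z[("prof", "grants")])  # 0.5, matching (1+1)/(1+3)
```

The printed values match the filled-in tables on the final slide of the example.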
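The closing question on the last slide — given a new (x, y), compute P(prof|x,y), P(grad|x,y), P(ugrad|x,y) — is a direct application of Bayes' rule over the learned CPTs: P(Z | x, y) ∝ P(Z) P(x|Z) P(y|Z). A minimal sketch using the numbers from the final table (the function name is illustrative):

```python
# Learned CPTs from the final slide (MAP estimates with add-one smoothing).
p_z = {"ugrad": 0.375, "grad": 0.375, "prof": 0.25}
p_x_given_z = {
    "ugrad": {"<20": 0.4,  "20s": 0.4,  "30+": 0.2},
    "grad":  {"<20": 0.2,  "20s": 0.6,  "30+": 0.2},
    "prof":  {"<20": 0.25, "20s": 0.25, "30+": 0.5},
}
p_y_given_z = {
    "ugrad": {"facebook": 0.6,  "thesis": 0.2,  "grants": 0.2},
    "grad":  {"facebook": 0.4,  "thesis": 0.4,  "grants": 0.2},
    "prof":  {"facebook": 0.25, "thesis": 0.25, "grants": 0.5},
}

def posterior_z(x, y):
    """P(Z | X=x, Y=y): proportional to P(Z) P(x|Z) P(y|Z), then normalized."""
    scores = {z: p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y] for z in p_z}
    total = sum(scores.values())
    return {z: s / total for z, s in scores.items()}

post = posterior_z("30+", "grants")
print(post["prof"])  # ≈ 0.676: someone 30+ who works on grants is probably a prof
```

Because Z has only three values, normalizing the three products is all the "inference" this two-edge network needs; in a larger polytree the same quantity would come out of the message-passing scheme recapped earlier.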