Bayesian Networks – Inference
Machine Learning – 10-701/15-781
Carlos Guestrin
Carnegie Mellon University
March 21st, 2007
© 2005-2007 Carlos Guestrin

Handwriting recognition
- Character recognition, e.g., kernel SVMs
- [Figure: examples of handwritten characters]

Handwriting recognition 2

Factored joint distribution – Preview
- [Figure: BN over Flu, Allergy, Sinus, Headache, Nose]

Key: Independence assumptions
- Knowing Sinus separates the other variables from each other

The independence assumption
- Local Markov Assumption: a variable X is independent of its non-descendants given its parents

Explaining away
- Local Markov Assumption: a variable X is independent of its non-descendants given its parents
- [Same network: Flu, Allergy, Sinus, Headache, Nose]

The Representation Theorem – Joint Distribution to BN
- If the conditional independencies in the BN are a subset of the conditional independencies in P, then from the joint probability distribution P we obtain a BN that encodes these independence assumptions

A general Bayes net
- Set of random variables
- Directed acyclic graph (encodes independence assumptions)
- CPTs
- Joint distribution: P(X1, …, Xn) = Πi P(Xi | PaXi)

How many parameters in a BN?
- Discrete variables X1, …, Xn
- Graph defines the parents of Xi: PaXi
- CPTs – P(Xi | PaXi)

Another example
- Variables: B – Burglar, E – Earthquake, A – Burglar alarm, N – Neighbor calls, R – Radio report
- Both burglars and earthquakes can set off the alarm
- If the alarm sounds, a neighbor may call
- An earthquake may be announced on the radio

Independencies encoded in BN
- We said: all you need is the local Markov assumption (Xi ⊥ NonDescendantsXi | PaXi)
- But then we talked about other (in)dependencies, e.g., explaining away
- What are the independencies encoded by a BN?
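The factored joint and the parameter-counting point above can be made concrete with a short sketch. This is illustrative only: the CPT numbers below are invented (the lecture does not give concrete values), and the network is the Flu/Allergy/Sinus/Headache/Nose example from the slides.

```python
# Factored joint for the Flu/Allergy/Sinus/Headache/Nose network.
# All CPT numbers are made up for illustration.
from itertools import product

# Priors P(F=t), P(A=t) over binary variables
p_flu = {True: 0.1, False: 0.9}
p_allergy = {True: 0.2, False: 0.8}

# P(S | F, A): Sinus depends on both Flu and Allergy
p_sinus = {
    (True, True):   {True: 0.9,  False: 0.1},
    (True, False):  {True: 0.7,  False: 0.3},
    (False, True):  {True: 0.6,  False: 0.4},
    (False, False): {True: 0.05, False: 0.95},
}

# P(H | S), P(N | S): Headache and Nose depend only on Sinus
p_headache = {True: {True: 0.8,  False: 0.2},  False: {True: 0.1,  False: 0.9}}
p_nose     = {True: {True: 0.75, False: 0.25}, False: {True: 0.02, False: 0.98}}

def joint(f, a, s, h, n):
    """P(F,A,S,H,N) = P(F) P(A) P(S|F,A) P(H|S) P(N|S)."""
    return (p_flu[f] * p_allergy[a] * p_sinus[(f, a)][s]
            * p_headache[s][h] * p_nose[s][n])

# The factored form needs far fewer free parameters than the full joint:
# 1 + 1 + 4 + 2 + 2 = 10 numbers instead of 2^5 - 1 = 31.
total = sum(joint(*vals) for vals in product([True, False], repeat=5))
print(round(total, 10))  # should be (numerically) 1
```

Each CPT row sums to one, so the product distribution sums to one over all 2^5 assignments, which the final line checks.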
- Only assumption is local Markov
- But many others can be derived using the algebra of conditional independencies!

Understanding independencies in BNs – BNs with 3 nodes
- Local Markov Assumption: a variable X is independent of its non-descendants given its parents
- Indirect causal effect: X → Z → Y
- Indirect evidential effect: X ← Z ← Y
- Common cause: X ← Z → Y
- Common effect: X → Z ← Y

Understanding independencies in BNs – Some examples
- [Figure: example BN over nodes A–K]

An active trail – Example
- [Figure: example BN over nodes A–H, plus F′ and F″]
- When are A and H independent?

Active trails formalized
- A path X1 – X2 – · · · – Xk is an active trail when variables O ⊆ {X1, …, Xn} are observed if, for each consecutive triplet in the trail, one of the following holds:
  - Xi−1 → Xi → Xi+1, and Xi is not observed (Xi ∉ O)
  - Xi−1 ← Xi ← Xi+1, and Xi is not observed (Xi ∉ O)
  - Xi−1 ← Xi → Xi+1, and Xi is not observed (Xi ∉ O)
  - Xi−1 → Xi ← Xi+1, and Xi is observed (Xi ∈ O), or one of its descendants is

Active trails and independence?
- Theorem: Variables Xi and Xj are independent given Z ⊆ {X1, …, Xn} if there is no active trail between Xi and Xj when the variables in Z are observed

The BN Representation Theorem
- If we obtain the BN from the joint probability distribution P, then the conditional independencies in the BN are a subset of the conditional independencies in P. Important because: every P has at least one BN structure G.
- Conversely, if the conditional independencies in the BN are a subset of the conditional independencies in P, then we obtain a BN for P. Important because: we can read independencies of P from the BN structure G.

Learning Bayes nets
- Data: x(1), …, x(m)
- Structure: known vs. unknown; data: fully observable vs. missing
- Learn structure and parameters (CPTs – P(Xi | PaXi))

Learning the CPTs
- Data: x(1), …, x(m)
- For each discrete variable Xi, estimate P(Xi | PaXi) from counts

What you need to know
- Bayesian networks: a compact representation for large probability distributions; not an algorithm
- Semantics of a BN: conditional independence assumptions
- Representation: variables, graph, CPTs
- Why BNs are useful
- Learning CPTs from fully observable data
- Play with the applet! ☺

General probabilistic inference
- Query: P(X | e)
- Using Bayes rule: P(X | e) = P(X, e) / P(e)
- Normalization: P(X | e) ∝ P(X, e)
- [Network: Flu, Allergy, Sinus, Headache, Nose]

Marginalization
- Example with Flu, Sinus, and evidence Nose = t

Probabilistic inference example
- [Network: Flu, Allergy, Sinus, Headache; evidence Nose = t]
- Inference seems exponential in the number of variables!
- Actually, inference in graphical models is NP-hard

Fast probabilistic inference example – Variable elimination
- [Network: Flu, Allergy, Sinus, Headache; evidence Nose = t]
- (Potential for) exponential reduction in computation!

Understanding variable elimination – Exploiting distributivity
- Example with Flu, Sinus, and evidence Nose = t

Understanding variable elimination – Order can make a HUGE difference
- [Network: Flu, Allergy, Sinus, Headache; evidence Nose = t]

Understanding variable elimination – Another example
- [Network: Pharmacy, Sinus, Headache; evidence Nose = t]

Variable elimination algorithm
- Given a BN and a query P(X | e) ∝ P(X, e)
- Instantiate evidence e
- Choose an ordering on the variables, e.g., X1, …, Xn (IMPORTANT!)
- For i = 1 to n, if Xi ∉ {X, e}:
  - Collect the factors f1, …, fk that include Xi
  - Generate a new factor by eliminating Xi from these factors
  - Variable Xi has been eliminated!
- Normalize P(X, e) to obtain P(X | e)

Complexity of variable elimination – (Poly)tree graphs
- Variable elimination order: start from the leaves up – find a topological order, eliminate variables in reverse order
- Linear in the number of variables!
  (versus exponential)

Complexity of variable elimination – Graphs with loops
- Exponential in the number of variables in the largest factor generated

Complexity of variable elimination – Tree-width
- Moralize the graph: connect the parents of each node into a clique and remove edge directions
- Complexity of VE: ("only") exponential in the tree-width
- Tree-width is the maximum node cut + 1

Example: Large tree-width with a small number of parents
- Compact representation does not imply easy inference

Choosing an elimination order
- Choosing the best order is NP-complete: reduction from MAX-Clique
- Many
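The variable-elimination algorithm outlined above can be sketched in code. This is a minimal illustration, not the lecture's implementation: the network is the Flu/Allergy/Sinus/Headache/Nose example, the CPT numbers are invented, and a factor is represented naively as a pair (variable names, full table).

```python
# Minimal variable elimination on the Flu/Allergy/Sinus/Headache/Nose
# network, computing P(Flu | Nose = t).  CPT numbers are invented.
from itertools import product

VALS = (True, False)

# P(S=t | F, A), P(H=t | S), P(N=t | S)
p_s = {(True, True): 0.9, (True, False): 0.7,
       (False, True): 0.6, (False, False): 0.05}
p_h = {True: 0.8, False: 0.1}
p_n = {True: 0.75, False: 0.02}

def bern(p_true, v):                     # P(V = v) for a binary variable
    return p_true if v else 1.0 - p_true

def factor(names, fn):
    """A factor = (variable names, table over all assignments)."""
    return (names, {vs: fn(dict(zip(names, vs)))
                    for vs in product(VALS, repeat=len(names))})

def multiply(f1, f2):
    names = f1[0] + [n for n in f2[0] if n not in f1[0]]
    return factor(names, lambda a: f1[1][tuple(a[n] for n in f1[0])]
                                 * f2[1][tuple(a[n] for n in f2[0])])

def sum_out(f, var):
    names = [n for n in f[0] if n != var]
    table = {}
    for vs, p in f[1].items():
        a = dict(zip(f[0], vs))
        key = tuple(a[n] for n in names)
        table[key] = table.get(key, 0.0) + p
    return (names, table)

def eliminate(factors, var):
    """Collect factors mentioning var, multiply them, sum var out."""
    touch = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    prod_f = touch[0]
    for f in touch[1:]:
        prod_f = multiply(prod_f, f)
    return rest + [sum_out(prod_f, var)]

# Factors with the evidence Nose = t already instantiated.
facs = [factor(['F'], lambda a: bern(0.1, a['F'])),
        factor(['A'], lambda a: bern(0.2, a['A'])),
        factor(['F', 'A', 'S'], lambda a: bern(p_s[(a['F'], a['A'])], a['S'])),
        factor(['S', 'H'], lambda a: bern(p_h[a['S']], a['H'])),
        factor(['S'], lambda a: p_n[a['S']])]

for v in ['H', 'A', 'S']:                # the elimination order matters!
    facs = eliminate(facs, v)

res = facs[0]
for f in facs[1:]:                       # remaining factors are over F only
    res = multiply(res, f)
Z = sum(res[1].values())                 # Z = P(Nose = t)
posterior = {F: res[1][(F,)] / Z for F in VALS}
print(posterior)                         # P(Flu | Nose = t)
```

With the order H, A, S the largest factor created has two variables, matching the slide's point that a good order keeps factors small; eliminating S first instead would immediately create a factor over F, A, H together.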
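The CPT-learning step described earlier ("Learning the CPTs": for each discrete variable Xi, estimate P(Xi | PaXi) from fully observed data by counting) can be sketched as follows. The tiny dataset is invented for illustration.

```python
# Maximum-likelihood CPT estimation from fully observed data:
# P(xi | pa) = Count(xi, pa) / Count(pa).  The dataset is made up.
from collections import Counter

data = [
    {"Flu": True,  "Sinus": True},
    {"Flu": True,  "Sinus": True},
    {"Flu": True,  "Sinus": False},
    {"Flu": False, "Sinus": True},
    {"Flu": False, "Sinus": False},
    {"Flu": False, "Sinus": False},
    {"Flu": False, "Sinus": False},
]

def mle_cpt(data, child, parents):
    """Return {(parent values, child value): Count(x, pa) / Count(pa)}."""
    joint = Counter((tuple(r[v] for v in parents), r[child]) for r in data)
    parent = Counter(tuple(r[v] for v in parents) for r in data)
    return {(pa, x): c / parent[pa] for (pa, x), c in joint.items()}

cpt = mle_cpt(data, "Sinus", ["Flu"])
print(cpt)   # e.g. P(Sinus=t | Flu=t) = 2/3 from these counts
```

For each parent configuration the estimated conditional distribution sums to one, since the child counts partition the parent counts.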