Graphical Models and Bayesian Networks
Machine Learning 10-701
Tom M. Mitchell
Center for Automated Learning and Discovery, Carnegie Mellon University
November 1, 2005

Required reading:
• Ghahramani, section 2, “Learning Dynamic Bayesian Networks” (just 3.5 pages :-)
Optional reading:
• Mitchell, chapter 6.11, Bayesian Belief Networks

Graphical Models
• Key idea:
– Conditional independence assumptions are useful – but Naïve Bayes takes them to an extreme!
– Graphical models express sets of conditional independence assumptions via graph structure
– The graph structure plus its associated parameters define a joint probability distribution over the set of variables/nodes
• Two types of graphical models:
– Directed graphs (aka Bayesian Networks) ← today
– Undirected graphs (aka Markov Random Fields)

Graphical Models – Why Care?
• Among the most important ML developments of the decade
• Graphical models allow combining:
– Prior knowledge, in the form of dependencies/independencies
– Observed data, to estimate parameters
• Principled and ~general methods for
– Probabilistic inference
– Learning
• Useful in practice
– Diagnosis, help systems, text analysis, time series models, ...

Marginal Independence
Definition: X is marginally independent of Y if
  P(X = x | Y = y) = P(X = x)   for all values x, y
Equivalently, if
  P(X = x, Y = y) = P(X = x) P(Y = y)   for all x, y
Equivalently, if
  P(Y = y | X = x) = P(Y = y)   for all x, y

Conditional Independence
Definition: X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y, given the value of Z:
  P(X = x | Y = y, Z = z) = P(X = x | Z = z)   for all x, y, z
which we often write
  P(X | Y, Z) = P(X | Z)

E.g., Bayesian Network
[Figure: a DAG over the variables StormClouds, Lightning, Rain, Thunder, and WindSurf; Lightning (L) and Rain (R) are the parents of WindSurf (W)]
A Bayes network is a directed acyclic graph defining a joint probability distribution over a set of variables.
• Each node denotes a random variable
• Each node is conditionally independent of its non-descendents, given its immediate parents
• A conditional probability distribution (CPD) is associated with each node N, defining P(N | Parents(N))

CPD for WindSurf:
  Parents (L, R)   P(W|Pa)   P(¬W|Pa)
  L, R             0         1.0
  L, ¬R            0         1.0
  ¬L, R            0.2       0.8
  ¬L, ¬R           0.9       0.1

Bayesian Networks
• Each node denotes a variable
• Edges denote dependencies
• The CPD for each node Xi describes P(Xi | 
Pa(Xi))
• The joint distribution is given by
  P(X1, ..., Xn) = ∏i P(Xi | Pa(Xi))
• Node Xi is conditionally independent of its non-descendents, given its immediate parents

Terminology:
• Parents = Pa(X) = immediate parents
• Antecedents = parents, parents of parents, ...
• Children = immediate children
• Descendents = children, children of children, ...

Bayesian Networks
• The CPD for each node Xi describes P(Xi | Pa(Xi))
• Chain rule of probability (holds for any distribution):
  P(X1, ..., Xn) = ∏i P(Xi | X1, ..., Xi-1)
• But in a Bayes net:
  P(Xi | X1, ..., Xi-1) = P(Xi | Pa(Xi))

How Many Parameters?
[Same network and WindSurf CPD as in the example above]
• In the full joint distribution over the five boolean variables? 2^5 − 1 = 31 independent parameters
• Given this Bayes net? One parameter per parent configuration per node, i.e., ∑i 2^|Pa(Xi)| in total

Bayes Net Inference
[Figure: a car-diagnosis Bayes net over variables including BatteryPower, Leak, Radio, and Starts]
• Inference: P(BattPower = t | Radio = t, Starts = f)
• Most probable explanation: what are the most likely values of Leak and BatteryPower, given Starts = f?
• Active data collection: what is the most useful variable to observe next, to improve our knowledge of node X?

Algorithm for Constructing a Bayes Network
• Choose an ordering over the variables, e.g., X1, X2, ..., Xn
• For i = 1 to n:
– Add Xi to the network
– Select parents Pa(Xi) as a minimal subset of X1, ..., 
Xi-1 such that
  P(Xi | Pa(Xi)) = P(Xi | X1, ..., Xi-1)
Notice this choice of parents assures
  P(X1, ..., Xn) = ∏i P(Xi | X1, ..., Xi-1)   (by chain rule)
                 = ∏i P(Xi | Pa(Xi))          (by construction)

Example
• Bird flu and Allergies both cause Nasal problems
• Nasal problems cause Sneezes and Headaches

What is the Bayes Network for Naïve Bayes?

Bayes Network for a Hidden Markov Model
Assume the future is conditionally independent of the past, given the present.
• Unobserved state: S(t-2) → S(t-1) → S(t) → S(t+1) → S(t+2)
• Observed output: each state S(t) has an observed child O(t)

Conditional Independence, Revisited
• We said:
– Each node is conditionally independent of its non-descendents, given its immediate parents.
• Does this rule give us all of the conditional independence relations implied by the Bayes network?
– No!
[Figure: a diamond-shaped network X1 → X2, X1 → X3, X2 → X4, X3 → X4]
– E.g., X1 and X4 are conditionally independent given {X2, X3}
– But X1 and X4 are not conditionally independent given X3 alone
– For this, we need to understand D-separation

Explaining Away
[Figure omitted: observing a common effect renders its causes dependent – learning one cause "explains away" the other]

X and Y are conditionally independent given Z, iff X and Y are D-separated by Z.
D-connection: If G is a directed graph in which X, Y and Z are disjoint sets of vertices, then X and Y are d-connected by Z in G if and only if there exists an undirected path U between some vertex in X and some vertex in Y such that (1) for every collider C on U, either C or a descendent of C is in Z, and (2) no non-collider on U is in Z.
X and Y are D-separated by Z in G if and only if they are not D-connected by Z in G.
See the d-Separation Applet: http://www.andrew.cmu.edu/user/wimberly/dsep/dSep.html

Example: A0 and A2 are conditionally independent 
given {A1, A3}.
See the d-Separation tutorial: http://www.andrew.cmu.edu/user/scheines/tutor/d-sep.html

Inference in Bayes Nets
• In general, intractable (NP-complete)
• For certain cases, tractable:
– Assigning probability to a fully observed set of variables
– Or if just one variable is unobserved
– Or for singly connected graphs (i.e., no undirected loops): belief propagation
– For multiply connected graphs (no directed loops): junction tree
• Sometimes use Monte Carlo methods
– Generate samples according to a known distribution
• Variational methods for tractable approximate solutions

Learning in Bayes Nets
• Four categories of learning problems:
– Graph structure may be known/unknown
– Variables may be observed/unobserved
• Easy case: learn parameters for a known graph structure, using fully observed data
• Gruesome case: learn both graph structure and parameters, from partly unobserved data
• More on these in the next lectures

Java Bayes Net Applet: http://www.pmr.poli.usp.br/ltd/Software/javabayes/Home/applet.html

What You Should Know
• Bayes nets are a convenient representation for encoding dependencies / conditional independence assumptions
• BN = graph plus parameters of the CPDs
– Together these define a joint distribution over the variables
– Everything else can be calculated from that, though inference may be intractable
• Reading conditional independence relations from the graph
– Each node is conditionally independent of its non-descendents, given its parents
– D-separation
– ‘Explaining away’
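The "easy case" of learning mentioned above – estimating CPD parameters for a known graph structure from fully observed data – reduces to counting relative frequencies. A minimal sketch, where the two-node network A → B, the data, and the helper `mle_cpds` are all hypothetical illustrations (not from the slides):

```python
from collections import Counter

# Hypothetical fully observed boolean data for a two-node net A -> B.
data = [
    {"A": True, "B": True}, {"A": True, "B": True},
    {"A": True, "B": False}, {"A": False, "B": False},
    {"A": False, "B": True}, {"A": False, "B": False},
]
parents = {"A": (), "B": ("A",)}  # known structure

def mle_cpds(data, parents):
    """Estimate P(node = True | parent values) by relative frequency."""
    cpds = {}
    for node, pa in parents.items():
        num, den = Counter(), Counter()
        for row in data:
            key = tuple(row[q] for q in pa)  # parent configuration
            den[key] += 1                    # rows matching this config
            num[key] += row[node]            # ... in which node is True
        cpds[node] = {k: num[k] / den[k] for k in den}
    return cpds

cpds = mle_cpds(data, parents)
print(cpds["B"][(True,)])  # 2/3: B was true in 2 of the 3 rows with A=True
print(cpds["A"][()])       # 3/6 = 0.5
```

With unobserved variables or unknown structure this simple counting no longer suffices, which is what makes the other three learning categories harder.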
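The "explaining away" effect from the summary can be made concrete with brute-force inference by enumeration, using the bird-flu/allergies collider (Flu → Nasal ← Allergy) from the earlier example. Every probability value below is invented for illustration, and `query` is a hypothetical helper, not an algorithm from the slides:

```python
from itertools import product

# Hypothetical priors and CPD for the collider Flu -> Nasal <- Allergy.
p_flu, p_allergy = 0.05, 0.2
p_nasal = {(True, True): 0.95, (True, False): 0.9,
           (False, True): 0.8, (False, False): 0.05}

def joint(flu, allergy, nasal):
    """P(flu, allergy, nasal) as the product of the three CPDs."""
    p = (p_flu if flu else 1 - p_flu) * (p_allergy if allergy else 1 - p_allergy)
    pn = p_nasal[(flu, allergy)]
    return p * (pn if nasal else 1 - pn)

def query(target, evidence):
    """P(target = True | evidence) by enumerating all assignments."""
    num = den = 0.0
    for flu, allergy, nasal in product([False, True], repeat=3):
        w = {"flu": flu, "allergy": allergy, "nasal": nasal}
        if any(w[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(flu, allergy, nasal)
        den += p
        if w[target]:
            num += p
    return num / den

# Flu and Allergy are marginally independent, but once Nasal=True is
# observed, additionally learning Allergy=True *lowers* P(Flu):
print(query("flu", {"nasal": True}))
print(query("flu", {"nasal": True, "allergy": True}))
```

Enumeration like this is exponential in the number of variables, which is why the tractable special cases and approximate methods listed under "Inference in Bayes Nets" matter in practice.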