Machine Learning 10 701 Tom M Mitchell Machine Learning Department Carnegie Mellon University February 8 2011 Today Readings Required Bishop chapter 8 through 8 2 Graphical models Bayes Nets Representing distributions Conditional independencies Simple inference Simple learning Graphical Models Key Idea Conditional independence assumptions useful but Na ve Bayes is extreme Graphical models express sets of conditional independence assumptions via graph structure Graph structure plus associated parameters define joint probability distribution over set of variables nodes Two types of graphical models today Directed graphs aka Bayesian Networks Undirected graphs aka Markov Random Fields 1 Graphical Models Why Care Among most important ML developments of the decade Graphical models allow combining Prior knowledge in form of dependencies independencies Observed data to estimate parameters Principled and general methods for Probabilistic inference Learning Useful in practice Diagnosis help systems text analysis time series models Conditional Independence Definition X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given the value of Z Which we often write E g 2 Marginal Independence Definition X is marginally independent of Y if Equivalently if Equivalently if Represent Joint Probability Distribution over Variables 3 Describe network of dependencies Bayesian Networks define Joint Distribution in terms of this graph plus parameters 4 Bayesian Network Bayes network a directed acyclic graph defining a joint probability distribution over a set of variables Each node denotes a random variable StormClouds Rain Lightning Thunder A conditional probability distribution CPD is associated with each node N defining P N Parents N Parents P W Pa P W Pa L R 0 1 0 L R 0 1 0 L R 0 2 0 8 L R 0 9 0 1 WindSurf WindSurf The joint distribution over all variables in the network is defined in terms of these CPD s plus the graph Bayesian Network What can we say about conditional independencies in a Bayes Net One thing is this Each node is conditionally independent of its non descendents given only its immediate parents StormClouds Lightning Thunder Rain WindSurf Parents P W Pa P W Pa L R 0 1 0 L R 0 1 0 L R 0 2 0 8 L R 0 9 0 1 WindSurf 5 Bayesian Networks Definition A Bayes network represents the joint probability distribution over a collection of random variables A Bayes network is a directed acyclic graph and a set of CPD s Each node denotes a random variable Edges denote dependencies CPD for each node Xi defines P Xi Pa Xi The joint distribution over all variables is defined as Pa X immediate parents of X in the graph Some helpful terminology Parents Pa X immediate parents Antecedents parents parents of parents Children immediate children Descendents children children of children 6 Bayesian Networks CPD for each node Xi describes P Xi Pa Xi Chain rule of probability But in a Bayes net How Many Parameters StormClouds Lightning Rain Parents P W Pa P W Pa L R 0 1 0 L R 0 1 0 L R 0 2 0 8 L R 0 9 0 1 WindSurf Thunder WindSurf In full joint distribution Given this Bayes Net 7 Bayes Net Inference P BattPower t Radio t Starts f Most probable explanation What is most likely value of Leak BatteryPower given Starts f Active data collection What is most useful variable to observe next to improve our knowledge of node X Algorithm for Constructing Bayes Network Choose an ordering over variables e g X1 X2 Xn For i 1 to n Add Xi to the network Select parents Pa Xi as minimal subset of X1 Xi 1 such that Notice this choice of parents assures by chain rule by construction 8 Example Bird flu and Allegies both cause Nasal problems Nasal problems cause Sneezes and Headaches What is the Bayes Network for X1 Xn with NO assumed conditional independencies 9 What is the Bayes Network for Na ve Bayes What do we do if variables are mix of discrete and real valued 10 Bayes Network for a Hidden Markov Model Assume the future is conditionally independent of the past given the present Unobserved state St 2 St 1 St St 1 St 2 Observed output Ot 2 Ot 1 Ot Ot 1 Ot 2 How Can We Train a Bayes Net 1 when graph is given and each training example gives value of every RV Easy use data to obtain MLE or MAP estimates of for each CPD P Xi Pa Xi e g like training the CPD s of a na ve Bayes classifier 2 when graph unknown or some RV s unobserved this is more difficult later 11
View Full Document