Graphical Models

Aarti Singh
Slides courtesy: Carlos Guestrin
Machine Learning 10-701/15-781
Nov 10, 2010

Recitation
• HMMs & Graphical Models
• Strongly recommended!!
• Place: NSH 1507 (Note)
• Time: 5-6 pm

From i.i.d. to dependent data
• HMM – sequential dependence
• Graphical models – general dependence

Applications
• Character recognition, e.g., with kernel SVMs
  (figure: sequences of handwritten characters "c" and "r")

Applications
• Webpage classification (Sports, Science, News)

Applications
• Speech recognition
• Diagnosis of diseases
• Studying the human genome
• Robot mapping
• Modeling fMRI data
• Fault diagnosis
• Modeling sensor network data
• Modeling protein-protein interactions
• Weather prediction
• Computer vision
• Statistical physics
• Many, many more …

Graphical Models
• Key idea:
– Conditional independence assumptions are useful
– but Naïve Bayes is extreme!
– Graphical models express sets of conditional independence assumptions via graph structure
– Graph structure plus associated parameters define a joint probability distribution over the set of variables/nodes
• Two types of graphical models:
– Directed graphs (aka Bayesian networks)
– Undirected graphs (aka Markov random fields)

Topics in Graphical Models
• Representation
– Which joint probability distributions does a graphical model represent?
• Inference
– How to answer questions about the joint probability distribution?
  • Marginal distribution of a node variable
  • Most likely assignment of node variables
• Learning
– How to learn the parameters and structure of a graphical model?

Conditional Independence
• X is conditionally independent of Y given Z: the probability distribution governing X is independent of the value of Y, given the value of Z:
  P(X = x | Y = y, Z = z) = P(X = x | Z = z)   for all values x, y, z
• Equivalent to: P(X, Y | Z) = P(X | Z) P(Y | Z)
• Also to: P(Y | X, Z) = P(Y | Z)

Directed – Bayesian Networks
• Representation: which joint probability distributions does a graphical model represent?
• For any arbitrary distribution, the chain rule gives
  P(X1, X2) = P(X1) P(X2 | X1),
  and more generally
  P(X1, …, Xn) = P(X1) P(X2 | X1) ⋯ P(Xn | X1, …, Xn-1),
  which corresponds to a fully connected directed graph between X1, …, Xn.
• Absence of edges in a
graphical model conveys useful information.

Directed – Bayesian Networks
• A BN is a directed acyclic graph (DAG) that provides a compact representation for a joint distribution.
• Local Markov assumption: a variable X is independent of its non-descendants given its parents (and only its parents).

Bayesian Networks Example
• Suppose we know the following:
– The flu causes sinus inflammation
– Allergies cause sinus inflammation
– Sinus inflammation causes a runny nose
– Sinus inflammation causes headaches
• Causal network: Flu → Sinus, Allergy → Sinus, Sinus → Headache, Sinus → Nose
• Local Markov assumption: if you have no sinus inflammation, then flu has no influence on headache (flu causes headache, but only through sinus)

Markov independence assumption
Local Markov assumption: a variable X is independent of its non-descendants given its parents (only the parents)

  variable   parents   non-descendants   assumption
  F          –         A                 F ⊥ A
  A          –         F                 A ⊥ F
  S          F, A      –                 –
  H          S         F, A, N           H ⊥ {F, A, N} | S
  N          S         F, A, H           N ⊥ {F, A, H} | S

Markov independence assumption
Joint distribution over Flu, Allergy, Sinus, Headache, Nose:
  P(F, A, S, H, N) = P(F) P(A | F) P(S | F, A) P(H | S, F, A) P(N | S, F, A, H)   (chain rule)
                   = P(F) P(A) P(S | F, A) P(H | S) P(N | S)
  using the Markov assumptions F ⊥ A, H ⊥ {F, A} | S, and N ⊥ {F, A, H} | S.

How many parameters in a BN?
• Discrete variables X1, …, Xn
• Directed acyclic graph (DAG)
– Defines the parents of Xi, Pa_Xi
• CPTs (conditional probability tables)
– P(Xi | Pa_Xi)
E.g.
Xi = S, Pa_Xi = {F, A}:

        F=f,A=f   F=t,A=f   F=f,A=t   F=t,A=t
  S=t     0.9       0.8       0.7       0.3
  S=f     0.1       0.2       0.3       0.7

With n variables, K values each, and at most d parents per node: O(nK × K^d) parameters.

Two (trivial) special cases

Fully disconnected graph (X1, X2, X3, X4 with no edges):
• parents of Xi: none
• non-descendants of Xi: X1, …, Xi-1, Xi+1, …, Xn
• assumption: Xi ⊥ {X1, …, Xi-1, Xi+1, …, Xn}

Fully connected graph:
• parents of Xi: X1, …, Xi-1
• non-descendants of Xi: none
• no independence assumption

Bayesian Networks Example
• Naïve Bayes: Xi ⊥ {X1, …, Xi-1, Xi+1, …, Xn} | Y
  P(X1, …, Xn, Y) = P(Y) P(X1 | Y) ⋯ P(Xn | Y)
• HMM: hidden states S1 → S2 → ⋯ → ST-1 → ST, each emitting an observation (St → Ot)

Explaining Away
(Flu → Sinus, Allergy → Sinus, Sinus → Headache, Sinus → Nose)
Local Markov assumption: a variable X is independent of its non-descendants given its parents (only the parents)
• F ⊥ A: P(F | A = t) = P(F)
• F ⊥ A | S? Is P(F | A = t, S = t) = P(F | S = t)?
  P(F = t | S = t) is high, but P(F = t | A = t, S = t) is not as high, since A = t explains away S = t.
  In fact, P(F = t | A = t, S = t) < P(F = t | S = t). So no!
• F ⊥ A | N? No!

Independencies encoded in a BN
• We said: all you need is the local Markov assumption
– (Xi ⊥ NonDescendants_Xi | Pa_Xi)
• But then we talked about other (in)dependencies, e.g., explaining away
• What are the independencies encoded by a BN?
– The only assumption is local Markov
– But many others can be derived using the algebra of conditional independencies!!!

D-separation
• a is D-separated from b by c ≡ a ⊥ b | c
• Three important configurations:
– Causal direction: a → c → b, giving a ⊥ b | c
– Common cause: a ← c → b, giving a ⊥ b | c
– V-structure (explaining away): a → c ← b, giving a ⊥ b but NOT a ⊥ b | c

D-separation
• A, B, C – non-intersecting sets of nodes
• A is D-separated from B by C ≡ A ⊥ B | C if every path between nodes in A and B is "blocked", i.e.
the path contains a node z such that either
• the connection at z is head-to-tail (→ z →) or tail-to-tail (← z →) and z is in C, OR
• the connection at z is head-to-head (→ z ←) and neither z nor any of its descendants is in C.

D-separation Example
(graph: a → e, f → e, e → c, f → b)
• a ⊥ b | f? Yes: consider z = f (tail-to-tail, and f is in C) or z = e (head-to-head, and neither e nor its descendant c is in C).
• a ⊥ b | c? No: consider z = e (head-to-head, but its descendant c is in C, so the path is not blocked).

Representation Theorem
• F – set of distributions that factorize according to the graph G
• I – set of distributions that respect the conditional independencies implied by the d-separation properties of G
• F ⊆ I: important because we can read independencies of P from the BN structure G
• I ⊆ F: important because given the independencies of P we can get a BN structure G

Markov Blanket
• Markov blanket of node i: the set of its parents, children, and co-parents (other parents of its children)
• Conditioning on the Markov blanket, node i is independent of all other nodes: the only terms of the joint distribution that remain are the ones which involve i

Undirected – Markov Random Fields
• Popular in the statistical physics and computer vision communities
• Example – image denoising
– xi: true value at pixel i
– yi: observed noisy value

Conditional independence properties
• No directed edges
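The image-denoising example above can be made concrete with a small sketch. The Ising-style energy, the weights beta and eta, the toy image, and the noise level below are all illustrative assumptions (the slides do not specify them); the inference method shown is iterated conditional modes (ICM), a simple coordinate-wise energy minimizer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary image with +/-1 pixels; y is a noisy copy of the clean x
# (a 10% random flip rate is an assumption for illustration).
clean = np.ones((20, 20))
clean[5:15, 5:15] = -1
noisy = np.where(rng.random(clean.shape) < 0.1, -clean, clean)

def icm_denoise(y, beta=2.0, eta=1.5, sweeps=5):
    """Iterated conditional modes on an Ising-style grid MRF with energy
    E(x, y) = -beta * sum_{neighbors i,j} x_i x_j - eta * sum_i x_i y_i."""
    x = y.copy()
    H, W = x.shape
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                # Sum over the 4-neighborhood: the Markov blanket of pixel (i, j)
                nb = sum(x[a, b]
                         for a, b in [(i-1, j), (i+1, j), (i, j-1), (i, j+1)]
                         if 0 <= a < H and 0 <= b < W)
                # Pick x_ij in {-1, +1} that minimizes the local energy
                x[i, j] = 1 if beta * nb + eta * y[i, j] >= 0 else -1
    return x

denoised = icm_denoise(noisy)
print("pixels flipped by noise:", int((noisy != clean).sum()))
print("errors after ICM:      ", int((denoised != clean).sum()))
```

Note that each ICM update touches only the pixel's 4-neighborhood, i.e. its Markov blanket in this grid MRF, which is exactly the conditional-independence structure the undirected model encodes.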
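Returning to the explaining-away slide: the inequality P(F=t | A=t, S=t) < P(F=t | S=t) can be checked by brute-force enumeration of the joint P(F, A, S, H, N) = P(F) P(A) P(S|F,A) P(H|S) P(N|S). All CPT numbers below are made-up assumptions chosen only so that flu and allergy each make sinus inflammation likely:

```python
import itertools

# Hypothetical CPTs (assumptions, not from the slides).
pF = {True: 0.1, False: 0.9}                    # P(F = f)
pA = {True: 0.2, False: 0.8}                    # P(A = a)
pS = {(True, True): 0.9, (True, False): 0.8,    # P(S = t | F = f, A = a)
      (False, True): 0.7, (False, False): 0.05}
pH = {True: 0.8, False: 0.1}                    # P(H = t | S = s)
pN = {True: 0.9, False: 0.2}                    # P(N = t | S = s)

def joint(f, a, s, h, n):
    """P(F,A,S,H,N) = P(F) P(A) P(S|F,A) P(H|S) P(N|S)."""
    ps = pS[(f, a)] if s else 1 - pS[(f, a)]
    ph = pH[s] if h else 1 - pH[s]
    pn = pN[s] if n else 1 - pN[s]
    return pF[f] * pA[a] * ps * ph * pn

def prob(**fixed):
    """Marginal probability of an assignment to a subset of the variables."""
    total = 0.0
    for f, a, s, h, n in itertools.product([True, False], repeat=5):
        v = dict(F=f, A=a, S=s, H=h, N=n)
        if all(v[k] == val for k, val in fixed.items()):
            total += joint(f, a, s, h, n)
    return total

# Marginal independence: P(F=t | A=t) equals P(F=t), so F and A are independent
p_f_given_a = prob(F=True, A=True) / prob(A=True)
print(p_f_given_a, pF[True])

# Explaining away: conditioning on S=t makes F and A dependent,
# and observing A=t lowers the probability of F
p_f_given_s = prob(F=True, S=True) / prob(S=True)
p_f_given_as = prob(F=True, A=True, S=True) / prob(A=True, S=True)
print(p_f_given_s, p_f_given_as)  # P(F=t|A=t,S=t) < P(F=t|S=t)
```

The enumeration has 2^5 = 32 terms, which is fine here, but it illustrates why exact inference by brute force is hopeless for large networks and motivates the inference topic listed at the start of the lecture.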