1Probabilistic Graphical Models10-708Eric Xing and Carlos Eric Xing and Carlos GuestrinGuestrinLecture 1 Sep 12, 2005 Class webpage: http://www.cs.cmu.edu/~epxing/Class/10708/ http://www.cs.cmu.edu/~guestrin/Class/10708/Logistics2Logistics Bi-weekly homework: 50% of grade Theory exercises Implementation exercises Final project: 30% of grade Applying PGM to your research area• NLP, IR, Computational Biology, vision, graphics … Theoretical and/or algorithmic work • a more efficient approximate inference algorithm• a new sampling scheme for a non-trivial model … Take home final: 20% of grade Theory exercises and/or analysis Policies …Logistics No formal text book, but draft chapters will be handed out in class: M. I. Jordan, An Introduction to Probabilistic Graphical Models Daphne Koller and Nir Friedman, Bayesian Networks and Beyond Mailing Lists: To contact the instructors: [email protected] Class announcements list: [email protected]. TA: Pradeep Ravikumar, Wean Hall 3713, Office hours: Fridays 4-5pm Class Assistant: Monica Hopes, Wean Hall 4616, x8-55273Visit to AsiaTuberculosisTuberculosisor CancerXRay ResultDyspneaBronchitisLung CancerSmokingX1X2X3X4X5X6X7X8What is a graphical model?--- example from medical diagnostics A possible world for a patient with lung problem: Representation: what is the joint probability dist. on multiple variables? How many state configurations in total? --- 28 Are they all needed to be represented? Do we get any scientific/medical insight? Learning: where do we get all this probabilities? Maximal-likelihood estimation? but how many data do we need? Where do we put domain knowledge in terms of plausible relationships between variables, and plausible values of the probabilities? Inference: If not all variables are observable, how to compute the conditional distribution of latent variables given evidence?),,,,,,,,( 87654321XXXXXXXXPRecap of Basic Prob. Concepts4Visit to AsiaTuberculosisTuberculosisor CancerXRay ResultDyspneaBronchitisLung CancerSmokingPatient InformationMedical DifficultiesDiagnostic TestsX1X2X3X4X5X6X7X8Dependencies among variables If Xi's are conditionally independent (as described by a PGM), the joint can be factored to a product of simpler terms, e.g., Why we may favor a PGM? Representation cost: how many probability statements are needed? Algorithms for systematic and efficient inference/learning computation• Exploring the graph structure and probabilistic (e.g., Bayesian, Markovian) semantics P(X1, X2, X3, X4, X5, X6, X7, X8)= P(X1) P(X2) P(X3| X1) P(X4| X2) P(X5| X2)P(X6| X3, X4) P(X7| X6) P(X8| X5, X6)Visit to AsiaTuberculosisTuberculosisor CancerXRay ResultDyspneaBronchitisLung CancerSmokingX1X2X3X4X5X6X7X8Visit to AsiaTuberculosisTuberculosisor CancerXRay ResultDyspneaBronchitisLung CancerSmokingX1X2X3X4X5X6X7X8X1X2X3X4X5X6X7X8Probabilistic Graphical Models2+2+4+4+4+8+4+8=36, an 8-fold reduction from 28!5 Directed edges give causality relationships (Bayesian Network or Directed Graphical Model): Undirected edges simply give correlations between variables (Markov Random Field or Undirected Graphical model):Two types of GMsStructure: DAG• Meaning: a node is conditionally independentof every other node in the network outside its Markov blanket• Local conditional distributions (CPD) and the DAGcompletely determine the joint dist. •Give causality relationshipsXY1Y2DescendentAncestorParentChildren's co-parentChildren's co-parentChildBayesian Network60.25a10.75a00.67b10.33b00.550.45a0b001a0b10.10.9a1b0a1b10.3c10.7c0ABCP(a,b,c.d) = P(a)P(b)P(c|a,b)P(d|c)D070.3c0c10.5d10.5d0Conditional probability tables (CPTs)ABCP(a,b,c.d) = P(a)P(b)P(c|a,b)P(d|c)DA~N(µa, Σa)B~N(µb, Σb)C~N(A+B, Σc)D~N(µa+C, Σa)DCP(D| C)Conditional probability density func. (CPDs)7Structure: an undirected graph• Meaning: a node is conditionally independent of every other node in the network given its direct neighbors• Local contingency functions (potentials) and the cliques in the graph completely determine the joint dist. •Give correlations between variables XY1Y2Markov Random FieldsDensity estimationRegressionClassificationParametric and nonparametric methodsLinear, conditional mixture, nonparametricGenerative and discriminative approachQXQXXYm,sXXSimplest GMs: the building blocks8(Picture by ZoubinGhahramani and Sam Roweis)An (incomplete) genealogy of graphical models Computing statistical queries regarding the network, e.g.: Is node X independent on node Y given nodes Z,W ? What is the probability of X=true if (Y=false and Z=true)? What is the joint distribution of (X,Y) if R=false? What is the likelihood of some full assignment? What is the most likely assignment of values to all or a subset the nodes of the network? General purpose algorithms exist to fully automate such computation Computational cost depends on the topology of the network Exact inference: • The junction tree algorithm Approximate inference; • Loopy belief propagation, variational inference, Monte Carlo sampling Probabilistic Inference9The goal:Given set of independent samples (assignments of random variables), find the best (the most likely?) Bayesian Network (both DAG and CPDs)(B,E,A,C,R)=(T,F,F,T,F)(B,E,A,C,R)=(T,F,T,T,F)……..(B,E,A,C,R)=(F,T,T,T,F)ERBAC0.9 0.1ebe0.2 0.80.01 0.990.9 0.1bebbeBEP(A | E,B)ERBACLearning Graphical ModelsApplication of GMs Machine Learning Computational statistics Computer vision and graphics Natural language processing Informational retrieval Robotic control Decision making under uncertainty Error-control codes Computational biology Genetics and medical diagnosis/prognosis Finance and economics Etc.10Genetic PedigreeA0A1AgB0B1BgM0M1F0F1FgC0C1CgSgSpeech recognitionA AA AX2X3X1XTY2Y3Y1YT... ... Hidden Markov Model11Reinforcement learning Partially observed Markov decision processes (POMDP)Evolution ancestorACQhQmT years?AGAGACTree Model12Solid State PhysicsIsing/Potts model Probability theory provides the glue whereby the parts are combined, ensuring that the system as a whole is consistent, and providing ways to interface models to data. The graph theoretic side of graphical models provides both an intuitively appealing interface by which humans can model highly-interacting sets of variables as well as a data
View Full Document