Machine learning: lecture 20
Tommi S. Jaakkola, MIT CSAIL
tommi@csail.mit.edu

Topics
Representation and graphical models
  - examples
Bayesian networks
  - examples, specification
  - graphs and independence
  - associated distribution

What is a good representation?
Properties of good representations:
1. Explicit
2. Modular
3. Permits efficient computation
4. etc.

Representation: explicit
Representation in terms of variables and dependencies (a graphical model).
Representation in terms of state transitions (transition diagram).
[Figure: chain over s1, s2, s3, s4 annotated with P(s1), P(s2 | s1), P(s3 | s2); transition diagram]

Representation: modular
We can easily add or remove components of the model.
Markov model: states s1, s2, s3, s4.
Hidden Markov model: states s1, s2, s3, s4 with observations x1, x2, x3, x4.

Representation: efficient computation
[Figure: hidden Markov model with states s1, s2, s3 and observations x1, x2, x3]
Posterior marginals: forward-backward
Max-probabilities: Viterbi

Graphical models: examples
Factorial hidden Markov model as a Bayesian network (directed graphical model).
[Figure: linguistic features, acoustic observations]

Graphical models: examples
Plates and repeated sampling.
[Figure: plate diagram with nodes "class", "topics" and "words", plates N and M, and sample document text]
Each document has N words sampled from a distribution that depends on the choice of topics.
The topics for each document are sampled from a class-conditional distribution.

Graphical models: examples
Lattice models, e.g., the Ising model, as a Markov random field.
[Figure: lattice with neighboring spins s1, s2]
Symmetric interactions: e.g., alignment of two nearby spins is energetically favorable.

Graphical models: examples
Factor graphs and codes (information theory).
[Figure: factor graph with bits x1, ..., x5, variables y1, ..., y5 and parity checks]
Circles denote variables, while the squares are factors (functions) that constrain the values of the variables.

Graphical models
[Figure: the factorial HMM, plate model, lattice model and factor graph from the preceding examples]
Graph semantics: graph => separation properties => independence
Association with probability distributions: independence => family of distributions
Inference and estimation: graph structure => efficient computation

Bayesian networks
Bayesian networks are directed acyclic graphs where the nodes represent variables and directed edges capture dependencies.
A mixture model as a Bayesian network: nodes i and x, with P(i) and P(x | i).
  - i is a parent of x; x is a child of i
  - "i influences x", "i causes x", "x depends on i"
Graph semantics: graph => separation properties => independence
Association with probability distributions: independence => family of distributions
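As a rough illustration of the mixture model read as a two-node Bayesian network, the sketch below builds the joint as P(i) P(x | i) and derives the marginal and posterior from it. The probability tables are made-up numbers, the observation x is assumed to be discrete for simplicity, and the helper names `joint`, `marginal` and `posterior` are hypothetical.

```python
# Minimal sketch of a mixture model as a Bayesian network i -> x.
# All numbers below are illustrative assumptions, not values from the lecture.

# P(i): prior over the mixture component (the parent node).
P_I = [0.3, 0.7]

# P(x | i): one distribution over observations per component (the child node).
P_X_GIVEN_I = [
    {"a": 0.9, "b": 0.1},  # component i = 0
    {"a": 0.2, "b": 0.8},  # component i = 1
]


def joint(i, x):
    """P(i, x) = P(i) * P(x | i), the factorization implied by the graph."""
    return P_I[i] * P_X_GIVEN_I[i][x]


def marginal(x):
    """P(x), obtained by summing the joint over the parent i."""
    return sum(joint(i, x) for i in range(len(P_I)))


def posterior(i, x):
    """P(i | x) by Bayes' rule."""
    return joint(i, x) / marginal(x)


for x in ("a", "b"):
    print(x, marginal(x), [posterior(i, x) for i in range(len(P_I))])
```

The graph only says that x has the single parent i; the tables are what pin down a particular distribution from that family.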
Example: a simple Bayesian network (coin tosses)
[Figure: coin tosses x1 and x2 are both parents of x3 = "same?"]

P(x1):  0.5  0.5
P(x2):  0.5  0.5

P(x3 | x1, x2):
       hh    ht    th    tt
  y    1.0   0.0   0.0   1.0
  n    0.0   1.0   1.0   0.0

Two levels of description:
1. graph structure (dependencies, independencies)
2. associated probability distribution

Example, cont'd
What can the graph alone tell us?
  - x1 and x2 are marginally independent
  - x1 and x2 become dependent if we know x3 (the dependence concerns our beliefs about the outcomes)

Traffic example
N = X is nice
L = traffic light
S = X decides to stop
T = the other car turns left
C = crash
[Figure: Bayesian network over N, L, T, S, C]
If we only know that X decided to stop, can X's character (variable N) tell us anything about the other car turning (variable T)?
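The two claims about the coin-toss network (x1 and x2 are marginally independent, but become dependent once x3 is known) can be checked by brute-force enumeration. The sketch below uses the tables given for P(x1), P(x2) and P(x3 | x1, x2); the helper names `p_x3`, `JOINT` and `prob` are hypothetical.

```python
from itertools import product

# Priors for the two coin tosses, as given in the example.
P_X1 = {"h": 0.5, "t": 0.5}
P_X2 = {"h": 0.5, "t": 0.5}


def p_x3(x3, x1, x2):
    """P(x3 | x1, x2): x3 is 'y' exactly when the two tosses agree."""
    same = "y" if x1 == x2 else "n"
    return 1.0 if x3 == same else 0.0


# Joint distribution from the network factorization
# P(x1, x2, x3) = P(x1) P(x2) P(x3 | x1, x2).
JOINT = {
    (x1, x2, x3): P_X1[x1] * P_X2[x2] * p_x3(x3, x1, x2)
    for x1, x2, x3 in product("ht", "ht", "yn")
}


def prob(x1=None, x2=None, x3=None):
    """Probability of the listed assignments, summing out unspecified variables."""
    total = 0.0
    for (a, b, c), p in JOINT.items():
        if x1 in (None, a) and x2 in (None, b) and x3 in (None, c):
            total += p
    return total


# Marginal independence: P(x1, x2) equals P(x1) P(x2).
print(prob(x1="h", x2="h"), prob(x1="h") * prob(x2="h"))

# Dependence given x3: learning x2 changes our belief about x1.
p_x1_given_x3 = prob(x1="h", x3="y") / prob(x3="y")
p_x1_given_x2_x3 = prob(x1="h", x2="h", x3="y") / prob(x2="h", x3="y")
print(p_x1_given_x3, p_x1_given_x2_x3)
```

Knowing only that the tosses agree leaves x1 at even odds, while additionally knowing x2 determines x1 completely.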
Graph, independence, d-separation
Are N and T independent given S?
[Figure: Bayesian network over N, L, T, S, C]
Definition: variables N and T are d-separated given S if S separates them in the moralized ancestral graph.
…
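The definition can be turned into a short procedure: keep only the query variables and their ancestors, connect ("marry") parents that share a child, drop edge directions, and test whether the conditioning set blocks every path. The sketch below is one way to do this. The edge set for the traffic network is an assumption (N and L are taken as parents of S, L as the parent of T, and S and T as parents of C), since the graph itself appears only as a figure; the names `ancestors`, `d_separated` and `TRAFFIC` are hypothetical.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set()
    stack = list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def d_separated(parents, a, b, given):
    """True if `given` separates a and b in the moralized ancestral graph."""
    given = set(given)
    keep = ancestors(parents, {a, b} | given)

    # Build the moralized, undirected graph on the ancestral set.
    adj = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:  # drop directions on parent-child edges
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(ps, 2):  # marry co-parents
            adj[p].add(q)
            adj[q].add(p)

    # Search from a, never stepping onto a conditioned node.
    stack, visited = [a], {a}
    while stack:
        v = stack.pop()
        if v == b:
            return False
        for w in adj[v]:
            if w not in visited and w not in given:
                visited.add(w)
                stack.append(w)
    return True


# Assumed structure of the traffic example (child -> list of parents).
TRAFFIC = {"S": ["N", "L"], "T": ["L"], "C": ["S", "T"]}

print(d_separated(TRAFFIC, "N", "T", set()))
print(d_separated(TRAFFIC, "N", "T", {"S"}))
```

Under the assumed edges, conditioning on S marries N and L, which opens the path from N through L to T; also conditioning on L would block it again.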