MIT 6.867 - Machine learning: lecture 20

School: Massachusetts Institute of Technology

Course: 6.867 Machine Learning

Pages 38

This preview shows page 1-2-3-18-19-36-37-38 out of 38 pages.

Machine learning: lecture 20
Tommi S. Jaakkola
MIT CSAIL
[email protected]

• Representation and graphical models
  – examples
• Bayesian networks
  – examples, specification
  – graphs and independence
  – associated distribution

What is a good representation?
• Properties of good representations
  1. Explicit
  2. Modular
  3. Permits efficient computation
  4. etc.

Representation: explicit
• Representation in terms of variables and dependencies (a graphical model):
  [Figure: graph over the states s1, s2, s3, s4]
• Representation in terms of state transitions (transition diagram):
  [Figure: transition diagram over s1, s2, s3, ... labeled P(s1), P(s2|s1), P(s3|s2)]

Representation: modular
• We can easily add/remove components of the model
  [Figure: Markov model over s1, s2, s3, s4]
  [Figure: Hidden Markov model with states s1, ..., s4 and observations x1, ..., x4]

Representation: efficient computation
  [Figure: graph over s1, s2, s3 and x1, x2, x3]
• Posterior marginals (forward-backward)
• Max-probabilities (Viterbi)

Graphical models: examples
• Factorial Hidden Markov model as a Bayesian network (directed graphical model)
  [Figure labels: acoustic observations, linguistic features]
• Plates and repeated sampling
  [Figure: plate diagram with nodes class, topics, words and plates N, M, shown next to sample text documents]
  – each document has N words, sampled from a distribution that depends on the choice of topics
  – the topics for each document are sampled from a class conditional distribution
• Lattice models (e.g., Ising model) as a Markov random field
  [Figure: lattice of nodes s1, s2, ...]
  – symmetric interactions (e.g., alignment of two nearby spins is energetically favorable)
• Factor graphs and codes (information theory)
  [Figure: factor graph with bits x1, ..., x5 and y1, ..., y5 connected to parity checks]
  – circles denote variables while the squares are factors (functions) that constrain the values of the variables

Graphical models
  [Figures: the four example models above]
• Graph semantics:
  graph ⇒ separation properties ⇒ independence
• Association with probability distributions:
  independence ⇒ family of distributions
• Inference and estimation:
  graph structure ⇒ efficient computation

Bayesian networks
• Bayesian networks are directed acyclic graphs, where the nodes represent variables and directed edges capture dependencies
  [Figure: a mixture model as a Bayesian network, with node i, node x, and tables P(i), P(x|i); annotations: "i influences x", "i causes x", "x depends on i", "parent of x", "child of i"]
• Graph semantics:
  graph ⇒ separation properties ⇒ independence
• Association with probability distributions:
  independence ⇒ family of distributions

Example
• A simple Bayesian network: coin tosses
  [Figure: nodes x1, x2, and x3 = same?]

  P(x1): 0.5  0.5
  P(x2): 0.5  0.5

  P(x3|x1, x2):
         hh    ht    th    tt
    y   1.0   0.0   0.0   1.0
    n   0.0   1.0   1.0   0.0

• Two levels of description
  1. graph structure (dependencies, independencies)
  2. associated probability distribution

Example cont'd
• What can the graph alone tell us?
  [Figure: nodes x1, x2, and x3 = same?]
• x1 and x2 are marginally independent
• x1 and x2 become dependent if we know x3 (the dependence concerns our beliefs about the outcomes)

Traffic example
  N = X is nice?
  L = traffic light
  S = X decides to stop?
  T = the other car turns left?
  C = crash?
  [Figure: graph over N, L, S, T, C]
• If we only know that X decided to stop, can X's character (variable N) tell us anything about the other car turning (variable T)?

Graph, independence, d-separation
• Are N and T independent given S?
  [Figure: graph over N, L, S, T, C]
  Definition: Variables N and T are d-separated given S if S separates them in the moralized ancestral graph
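
As a rough illustration of the two levels of description, the coin-toss network from the Example slides can be checked in a few lines of Python. This is a sketch, not code from the lecture: the names PARENTS, ancestral_set, d_separated, joint, and prob are made up for this example, and the factorization P(x1) P(x2) P(x3|x1, x2) is the standard one for a Bayesian network in which x1 and x2 are the parents of x3. The graph-level test follows the moralized ancestral graph definition above; the distribution-level check enumerates the joint built from the tables on the slide.

```python
from itertools import combinations, product

# Graph level: coin-toss network x1 -> x3 <- x2, stored as child -> parents.
PARENTS = {"x1": set(), "x2": set(), "x3": {"x1", "x2"}}


def ancestral_set(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def d_separated(parents, a, b, given):
    """Test whether `given` separates a and b in the moralized ancestral graph."""
    keep = ancestral_set(parents, {a, b} | set(given))
    # Moralize: connect each node to its parents, "marry" co-parents,
    # and ignore edge directions.
    nbrs = {n: set() for n in keep}
    for child in keep:
        ps = sorted(parents[child])
        for p in ps:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # Search from a without passing through conditioning nodes.
    seen, stack = set(given), [a]
    while stack:
        node = stack.pop()
        if node == b:
            return False
        if node not in seen:
            seen.add(node)
            stack.extend(nbrs[node] - seen)
    return True


# Distribution level: tables copied from the slide (fair coins; x3 = same?).
P_X1 = {"h": 0.5, "t": 0.5}
P_X2 = {"h": 0.5, "t": 0.5}


def p_x3(x3, x1, x2):
    """P(x3 | x1, x2): x3 is 'y' with probability 1 exactly when the tosses match."""
    return 1.0 if (x3 == "y") == (x1 == x2) else 0.0


def joint(x1, x2, x3):
    """Joint probability under the factorization P(x1) P(x2) P(x3|x1, x2)."""
    return P_X1[x1] * P_X2[x2] * p_x3(x3, x1, x2)


def prob(event):
    """Sum the joint over all assignments (x1, x2, x3) that satisfy `event`."""
    return sum(joint(a, b, c)
               for a, b, c in product("ht", "ht", "yn") if event(a, b, c))


print(d_separated(PARENTS, "x1", "x2", set()))   # True
print(d_separated(PARENTS, "x1", "x2", {"x3"}))  # False

p_h_given_same = (prob(lambda a, b, c: a == "h" and c == "y")
                  / prob(lambda a, b, c: c == "y"))
p_h_given_same_and_h = (prob(lambda a, b, c: a == "h" and b == "h" and c == "y")
                        / prob(lambda a, b, c: b == "h" and c == "y"))
print(p_h_given_same, p_h_given_same_and_h)      # 0.5 1.0
```

With nothing observed, the ancestral graph contains only x1 and x2 and no edge between them, which matches the marginal independence stated on the slide. Once x3 is observed, moralization links x1 and x2; the numbers agree, since learning x2 = h moves P(x1 = h | x3 = y) from 0.5 to 1.0. Note that the graph test alone only guarantees independence when it returns True; a False result permits dependence, and here the tables confirm it.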
