DOC PREVIEW
CMU CS 10701 - mmmn-comments

This preview shows page 1-2-3-21-22-23-43-44-45 out of 45 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Max Margin Markov Networks B Taskar C Guestrin and D Koller Neural Information Processing Systems Conference NIPS03 2003 http www cs berkeley edu taskar pubs mmmn ps Learning Associative Markov Networks B Taskar V Chatalbashev and D Koller Twenty First International Conference on Machine Learning ICML04 2004 http www cs berkeley edu taskar pubs mmamn ps Max Margin Parsing B Taskar D Klein M Collins D Koller and C Manning Empirical Methods in Natural Language Processing EMNLP04 2004 http www cs berkeley edu taskar pubs mmcfg ps Graphical Models meet Marginbased Learning Machine Learning 10701 15781 Carlos Guestrin Carnegie Mellon University April 13th 2005 Next few lectures Today Advanced topic in graphical models Next week learning to make decisions with reinforcement learning Week after Dealing with very large datasets active learning and BIG PICTURE Handwriting Recognition Character recognition kernel SVMs rr r r r c r a c z bc Support Vector Machines Advantages High dim learning kernels Generalization bounds SVM Handwriting Recognition 2 SVMs for sequences Problem of classes exponential in length brare brace zzzzz brick aaaaa brake Handwriting Recognition 2 Graphical models HMMs MNs Linear in length SVMs vs MNs Advantages High dim learning kernels Generalization bounds Efficiently exploit label correlations SVM MN SVMs MNs vs M3Ns Advantages High dim learning kernels Generalization bounds Efficiently exploit label correlations SVM MN M3N Chain Markov Net aka CRF P y x 1 Z x i xi yi i yi yi 1 xi yi exp w f xi yi yi yi 1 exp w f yi yi 1 y a z a z a z a z a z f y y I y z y a f x y I xp 1 y z x Lafferty et al 01 Chain Markov Net aka CRF P y x 1 Z x i xi yi i yi yi 1 1 Z x exp wTf x y i xi yi exp w w i f xi yw i w i y f x y f x y i yi yi 1 exp f x y w i f y i 1 y a z a z a z a z a z f x y y z y a f x y xp 1 y z x Lafferty et al 01 Max Conditional Likelihood D x1 t x1 xm t xm Estimation Classification f x y Don t need to learn entire distribution OCR Example We want argmaxword wT f word brace Equivalently wT f wT f wT f brace wT f brace wT f aaaaa aaaab brace wT f zzzzz a lot Max Margin Estimation Goal find w such that A wTf x t x wTf x y wT f x t x f x y 0 x D A y t x w fx y 0 tx y Maximize margin Gain over y grows with of mistakes in y tx y t w f craze 2 craze 2 t w f zzzzz 5 zzzzz 5 M3Ns D x1 t x1 xm t xm f x y Estimation Classification M3Ns Estimation Exponential size Dual Quadratic Program Polynomial size Factored Dual M3N Dual w f craze 2 craze w f zzzzz 5 zzzzz Exponential number of variables x y represents a probability distribution Key insight from graphical models Can use network structure to factorize distribution Dual Probability Distribution b b b b c 2 r 8 c r r r a o o a r r c c ca 2 ro 4 a 6 ra 4 o 4 e e e e 2 15 25 4 x y Factored Dual Variables Introduce factored dual variables Linear in the size of the network Rewrite dual using s maximize QuadraticObjective s t ConsistentMarginals linear constraints Equivalent to original dual Factored Objective t b b b b b 1 c r r r a o o a r r c c r 8 a 6 c 65 e c 2 o 4 r 35 e e e e 1 2 2 1 0 2 15 25 4 Factored Objective Factored Constraints normalization non negativity If network is a tree Else add clique tree constraints normalization non negativity agreement triangulation Factored Dual Objective is quadratic in network size Constraint set is exponential in tree width Linear for sequences and trees Complexity same as inference and max likelihood Factored Dual nodes edges Kernel trick works Node and edge potentials can use kernels Generalization Bound Theorem Test set Training set per label error per label error with probability at least 1 Distribution free First per label bound Dependence on L logarithmic vs linear Colllins 01 L number of nodes and edges Length 8 chars Letter 16x8 pixels 10 fold Train Test 5000 50000 letters 600 6000 words Test error average per character Handwriting Recognition 30 raw pixels quadratic kernel cubic kernel better 25 20 15 Models 10 Multiclass SVMs 45 error reduction over linear CRFs 5 CRFs 33 error reduction over multiclass SVMs 3 M nets 0 MC SVMs Crammer Singer 01 CRFs M 3 nets Named Entity Recognition Locate and classify named entities in sentences CoNLL 03 data set 200K words train 50K words test of f ic Ri ia l ch ar Bu d tle r he ad s Ba for gh da d U N 4 categories organization person location misc e g U N official Richard Butler heads for Baghdad yi org per loc misc none f yi x I yi org xi U N I yi per xi capitalized I yi loc xi known city 92 y x 91 better 90 T est F1 89 88 87 86 85 84 32 error reduction over CRFs CRFs M 3N Linear M 3N Quad Hypertext Classification WebKB dataset Four CS department websites 1300 pages 3500 links Classify each page faculty course student project other Train on three universities test on fourth better 20 relaxed dual Test Error 15 10 5 loopy belief propagation Taskar et al 02 0 53 error reduction over SVMs 38 error reduction over RMNs SVMs RMNS M 3Ns M3Ns Basic algorithm works for any low tree width graphical model Estimation Exponential size Dual Quadratic Program Polynomial size Factored Dual Other possible max margin learning problems Large tree width Markov networks with attractive potentials Parsing using probabilistic context free grammars Learning to cluster Max margin learning of any poly time problem Associative Markov networks Point features spin images point height Edge features length of edge edge orientation associative restriction i yi ij yj Max margin AMNs results Label ground building tree shrub Training 30 thousand points Testing 3 million points Segmentation results Hand labeled 180K test points Model Accuracy SVM 68 V SVM 73 M3N 93 Max margin parsing Classic learning problem P NP NP PP P NP DT NN Usually learn probabilities with counts Learn max margin discriminative model PCFG NP DT NN PP IN NP NN sea Disulfide bonds non bipartite matching 2 RSCCPCYWGGCPWGQNCYPEGCSGPKV 1 2 3 4 5 1 4 6 6 1 6 2 4 3 5 Fariselli Casadio 01 Baldi et al 04 3 5 Scoring function 2 RSCCPCYWGGCPWGQNCYPEGCSGPKV 1 2 3 4 5 2 3 4 5 1 4 6 RSCCPCYWGGCPWGQNCYPEGCSGPKV 1 3 6 6 5 String …


View Full Document

CMU CS 10701 - mmmn-comments

Documents in this Course
lecture

lecture

12 pages

lecture

lecture

17 pages

HMMs

HMMs

40 pages

lecture

lecture

15 pages

lecture

lecture

20 pages

Notes

Notes

10 pages

Notes

Notes

15 pages

Lecture

Lecture

22 pages

Lecture

Lecture

13 pages

Lecture

Lecture

24 pages

Lecture9

Lecture9

38 pages

lecture

lecture

26 pages

lecture

lecture

13 pages

Lecture

Lecture

5 pages

lecture

lecture

18 pages

lecture

lecture

22 pages

Boosting

Boosting

11 pages

lecture

lecture

16 pages

lecture

lecture

20 pages

Lecture

Lecture

20 pages

Lecture

Lecture

39 pages

Lecture

Lecture

14 pages

Lecture

Lecture

18 pages

Lecture

Lecture

13 pages

Exam

Exam

10 pages

Lecture

Lecture

27 pages

Lecture

Lecture

15 pages

Lecture

Lecture

24 pages

Lecture

Lecture

16 pages

Lecture

Lecture

23 pages

Lecture6

Lecture6

28 pages

Notes

Notes

34 pages

lecture

lecture

15 pages

Midterm

Midterm

11 pages

lecture

lecture

11 pages

lecture

lecture

23 pages

Boosting

Boosting

35 pages

Lecture

Lecture

49 pages

Lecture

Lecture

22 pages

Lecture

Lecture

16 pages

Lecture

Lecture

18 pages

Lecture

Lecture

35 pages

lecture

lecture

22 pages

lecture

lecture

24 pages

Midterm

Midterm

17 pages

exam

exam

15 pages

Lecture12

Lecture12

32 pages

lecture

lecture

19 pages

Lecture

Lecture

32 pages

boosting

boosting

11 pages

pca-mdps

pca-mdps

56 pages

bns

bns

45 pages

mdps

mdps

42 pages

svms

svms

10 pages

Notes

Notes

12 pages

lecture

lecture

42 pages

lecture

lecture

29 pages

lecture

lecture

15 pages

Lecture

Lecture

12 pages

Lecture

Lecture

24 pages

Lecture

Lecture

22 pages

Midterm

Midterm

5 pages

mdps-rl

mdps-rl

26 pages

Load more
Download mmmn-comments
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view mmmn-comments and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view mmmn-comments and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?