Graphical Models meet Margin-based Learning
Machine Learning 10-701/15-781
Carlos Guestrin
Carnegie Mellon University
April 13th, 2005

Next few lectures
- Today: an advanced topic in graphical models
- Next week: learning to make decisions with reinforcement learning
- Week after: dealing with very large datasets, active learning, and the big picture

Handwriting Recognition
Character recognition: kernel SVMs classify each character image independently.
[Figure: handwritten character images with their predicted letters]

Support Vector Machines
Advantages of SVMs:
- High-dimensional learning (kernels)
- Generalization bounds

Handwriting Recognition 2
SVMs for sequences? Treating each whole word as one class means the candidate
labels range over "aaaaa", ..., "brace", "brake", "brick", ..., "zzzzz".
Problem: the number of classes is exponential in the sequence length.

Handwriting Recognition 2
Graphical models (HMMs, Markov networks) instead use one variable per
character, so the representation is linear in the length.

SVMs vs. MNs
Advantages:                                SVM   MN
  High-dim learning (kernels)               ✓
  Generalization bounds                     ✓
  Efficiently exploit label correlations          ✓

SVMs, MNs vs. M3Ns
Advantages:                                SVM   MN   M3N
  High-dim learning (kernels)               ✓          ✓
  Generalization bounds                     ✓          ✓
  Efficiently exploit label correlations          ✓    ✓

Chain Markov Net (aka CRF*)
P(y|x) = (1/Z(x)) Π_i φ(x_i, y_i) Π_i φ(y_i, y_{i+1})
  φ(x_i, y_i) = exp{ Σ_α w_α f_α(x_i, y_i) },          e.g. f_α(x, y) = I(x_p = 1, y = 'z')
  φ(y_i, y_{i+1}) = exp{ Σ_β w_β f_β(y_i, y_{i+1}) },  e.g. f_β(y, y') = I(y = 'z', y' = 'a')
[Figure: a chain of label variables y_i (each ranging over a-z), one per observed character image x_i]
*Lafferty et al. 01

Chain Markov Net (aka CRF*)
Collecting each feature's occurrences across the chain:
  Π_i φ(x_i, y_i) = exp{ Σ_α w_α Σ_i f_α(x_i, y_i) }
  Π_i φ(y_i, y_{i+1}) = exp{ Σ_β w_β Σ_i f_β(y_i, y_{i+1}) }
so
  P(y|x) = (1/Z(x)) Π_i φ(x_i, y_i) Π_i φ(y_i, y_{i+1}) = (1/Z(x)) exp{ w^T f(x, y) }
where
  w = [..., w_α, ..., w_β, ...]
  f(x, y) = [..., f_α(x, y), ..., f_β(x, y), ...]
and the features are now counts over the chain, e.g. f_α(x, y) = #(x_p = 1, y = 'z')
and f_β(x, y) = #(y = 'z', y' = 'a').
*Lafferty et al. 01
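The log-linear factorization above is exactly what keeps inference tractable: even though there are 26^L candidate words, argmax_y w^T f(x, y) decomposes into node and edge scores and can be found by dynamic programming. Below is a minimal sketch, not from the slides; the binary pixel encoding, the W_node/W_edge weight matrices, and the random parameters are illustrative assumptions.

```python
# A minimal sketch of the chain Markov net score w^T f(x, y) and Viterbi
# decoding. W_node plays the role of the w_alpha weights (one per pixel/label
# pair) and W_edge the w_beta weights (one per label pair); both are
# illustrative assumptions, not the lecture's trained parameters.
import numpy as np

K = 26          # label alphabet size (a-z)
P = 16 * 8      # pixels per character image (16x8, as in the OCR data)

rng = np.random.default_rng(0)
W_node = rng.normal(size=(P, K))   # w_alpha: weight for I(x_p = 1, y_i = k)
W_edge = rng.normal(size=(K, K))   # w_beta:  weight for I(y_i = k, y_{i+1} = k')

def score(x, y):
    """w^T f(x, y): sum of node scores plus edge scores along the chain."""
    s = sum(x[i] @ W_node[:, y[i]] for i in range(len(y)))
    s += sum(W_edge[y[i], y[i + 1]] for i in range(len(y) - 1))
    return s

def viterbi(x):
    """argmax_y w^T f(x, y) in O(L K^2) time, despite K^L candidate labelings."""
    L = len(x)
    node = x @ W_node                      # (L, K) node scores
    best = node[0].copy()                  # best score ending in each label
    back = np.zeros((L, K), dtype=int)
    for i in range(1, L):
        cand = best[:, None] + W_edge      # (K, K): prev label -> next label
        back[i] = cand.argmax(axis=0)
        best = cand.max(axis=0) + node[i]
    y = [int(best.argmax())]
    for i in range(L - 1, 0, -1):          # follow backpointers
        y.append(int(back[i, y[-1]]))
    return y[::-1]

x = rng.integers(0, 2, size=(5, P)).astype(float)  # 5 binary character images
y_hat = viterbi(x)
# Sanity check: the Viterbi labeling beats random labelings.
assert all(score(x, y_hat) >= score(x, list(rng.integers(0, K, 5)))
           for _ in range(100))
```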
Max (Conditional) Likelihood
Given data D = {(x^1, t(x^1)), ..., (x^m, t(x^m))} and features f(x, y):
estimate w, then classify. We don't need to learn the entire distribution!

OCR Example
We want: argmax_word w^T f([image of "brace"], word) = "brace".
Equivalently:
  w^T f([image], "brace") > w^T f([image], "aaaaa")
  w^T f([image], "brace") > w^T f([image], "aaaab")
  ...
  w^T f([image], "brace") > w^T f([image], "zzzzz")
That's a lot of constraints!

Max Margin Estimation
Goal: find w such that
  w^T f(x, t(x)) > w^T f(x, y)   for all x ∈ D and all y ≠ t(x),
i.e. w^T Δf_x(y) = w^T [f(x, t(x)) − f(x, y)] > 0.
Maximize the margin γ, and let the required gain over y grow with Δt_x(y),
the number of mistakes in y:
  w^T Δf_x(y) ≥ γ Δt_x(y).
For the target "brace": Δt("craze") = 2 and Δt("zzzzz") = 5, so we require
  w^T Δf("craze") ≥ 2γ   and   w^T Δf("zzzzz") ≥ 5γ.

M3Ns
Given D = {(x^1, t(x^1)), ..., (x^m, t(x^m))} and f(x, y): estimation, then
classification, as above.

M3Ns
Estimation is a dual quadratic program. The naive dual QP has exponential
size; the factored dual reduces it to polynomial size.

M3N Dual
The dual has one variable per margin constraint, e.g. α("craze") for
w^T Δf("craze") ≥ 2γ and α("zzzzz") for w^T Δf("zzzzz") ≥ 5γ.
- Exponential number of variables
- But α_x(y) represents a probability distribution over labelings y
- Key insight from graphical models: we can use the network structure to
  factorize this distribution

Dual = Probability Distribution
[Figure: an example distribution α_x(y) over whole words and the node and
edge marginals it induces]

Factored Dual Variables
Introduce factored dual variables: the node and edge marginals µ of α_x(y).
There are only linearly many of these in the size of the network. Rewriting
the dual using the µ's:
  maximize QuadraticObjective(µ)
  s.t. µ ∈ ConsistentMarginals   (linear constraints)
This is equivalent to the original dual!

Factored Objective
[Figure: the marginals µ and per-word losses Δt computed from the example
distribution over words]

Factored Constraints
If the network is a tree: normalization, non-negativity, and agreement
(edge marginals must agree with node marginals). Otherwise: triangulate and
add clique tree constraints.

Factored Dual
- The objective is quadratic in the network size
- The constraint set is exponential in the tree-width: linear for sequences
  and trees
- Complexity is the same as inference and as max likelihood

Factored Dual
The kernel trick works! Node and edge potentials can use kernels.

Generalization Bound
Theorem: with probability at least 1 − δ, the test-set per-label error is
bounded by the training-set per-label γ-error plus a term that grows only
logarithmically in L, the number of nodes and edges.
- Distribution-free
- First per-label bound
- Dependence on L is logarithmic vs. linear [Collins 01]

Handwriting Recognition (results)
Setup: words of length ~8 characters, 16x8-pixel letters; 10-fold train/test
splits of 5000/50000 letters (600/6000 words). Models: multiclass SVMs*,
CRFs, and M3Ns.
[Chart: test error (average per character) for raw pixels, quadratic kernel,
and cubic kernel; lower is better]
M3Ns achieve a 45% error reduction over linear CRFs and a 33% error reduction
over multiclass SVMs.
*Crammer & Singer 01

Named Entity Recognition
Locate and classify named entities in sentences into 4 categories:
organization, person, location, misc. E.g. "U.N. official Richard Butler
heads for Baghdad". CoNLL 03 data set (200K words train, 50K words test).
Labels y_i ∈ {org, per, loc, misc, none}; features such as
  f(y_i, x) = [..., I(y_i = org, x_i = "U.N."), I(y_i = per, x_i = capitalized),
               I(y_i = loc, x_i = known city), ...]
[Chart: test F1 for CRFs, M3N linear, and M3N quadratic; higher is better]
32% error reduction over CRFs.

Hypertext Classification
WebKB dataset: four CS department websites, 1300 pages / 3500 links. Classify
each page as faculty, course, student, project, or other; train on three
universities, test on the fourth. M3Ns here use a relaxed dual with loopy
belief propagation.
[Chart: test error for SVMs, RMNs (*Taskar et al 02), and M3Ns; lower is
better]
53% error reduction over SVMs; 38% error reduction over RMNs.

M3Ns
The basic algorithm works for any low tree-width graphical model.
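The slides note that the factored dual makes estimation no harder than inference. One equivalent way to see why (a common constraint-generation view, not the lecture's formulation, which solves the factored QP directly): the most violated margin constraint, argmax_y [w^T f(x, y) + Δt_x(y)], can be found with the same Viterbi recursion as plain decoding, because the Hamming loss Δt_x(y) decomposes over nodes. A minimal sketch, reusing the illustrative W_node/W_edge parameterization assumed above:

```python
# Loss-augmented Viterbi: find argmax_y [ w^T f(x, y) + Delta_t_x(y) ].
# Since the Hamming loss adds 1 per mislabeled node, we simply add 1 to every
# node score except the true label's, then run ordinary Viterbi. This is a
# sketch of the constraint-generation view, not the paper's factored-dual QP.
import numpy as np

def loss_augmented_viterbi(x, y_true, W_node, W_edge):
    L, K = len(x), W_edge.shape[0]
    node = x @ W_node                       # (L, K) node scores
    node += 1.0                             # +1 everywhere ...
    node[np.arange(L), y_true] -= 1.0       # ... except the true labels
    best = node[0].copy()
    back = np.zeros((L, K), dtype=int)
    for i in range(1, L):
        cand = best[:, None] + W_edge       # prev label -> next label
        back[i] = cand.argmax(axis=0)
        best = cand.max(axis=0) + node[i]
    y = [int(best.argmax())]
    for i in range(L - 1, 0, -1):
        y.append(int(back[i, y[-1]]))
    return y[::-1]
```

Inside a cutting-plane or subgradient learner, one compares the score of the returned labeling plus its loss against the score of the true labeling, and updates w whenever the constraint w^T Δf_x(y) ≥ γ Δt_x(y) is violated; the exponential constraint set is never enumerated explicitly, matching the "complexity same as inference" point above.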
References
- Max-Margin Markov Networks, B. Taskar, C. Guestrin and D. Koller. Neural Information Processing Systems Conference (NIPS03), 2003. http://www.cs.berkeley.edu/~taskar/pubs/mmmn.ps
- Learning Associative Markov Networks, B. Taskar, V. Chatalbashev and D. Koller. Twenty-First International Conference on Machine Learning (ICML04), 2004. http://www.cs.berkeley.edu/~taskar/pubs/mmamn.ps
- Max-Margin Parsing, B. Taskar, D. Klein, M. Collins, D. Koller and C. Manning. Empirical Methods in Natural Language Processing (EMNLP04), 2004. http://www.cs.berkeley.edu/~taskar/pubs/mmcfg.ps