DOC PREVIEW
CMU CS 10708 - Notes

This preview shows page 1-2-3-4 out of 12 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 12 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

11School of Computer ScienceLarge Margin Structured ModelsProbabilistic Graphical Models (10Probabilistic Graphical Models (10--708)708)Lecture # 21Lecture # 2111/28/200711/28/2007Eric XingEric XingReceptor AKinase CTF FGene GGene HKinaseEKinase DReceptor BX1X2X3X4X5X6X7X8Receptor AKinase CTF FGene GGene HKinaseEKinase DReceptor BX1X2X3X4X5X6X7X8X1X2X3X4X5X6X7X8Eric Xing 2OCR examplebraceSequential structurexya-z a-z a-z a-z a-zyx2Eric Xing 3Image segmentation exampleSpatial structurexyyiyjEric Xing 4Structured modelsspace of feasible outputsscoring functionAssumptions: linear combination of featuressum of part scores: • index p represents a part in the structure3Eric Xing 5Learning wz Training examplesz Probabilistic approach:z Computing Zw(x) can be NP-completez Tractable models but intractable estimation z Large margin approach: z Exact and efficient when prediction is tractableb r a c eEric Xing 6Learning Strategyz Recall that in CRFz We predict based on:z And we learn based on:z MaxMargin:z We predict based on:z And we learn based on:⎭⎬⎫⎩⎨⎧==∑ccccyyxfxZxypxy ),(exp),()|(maxarg|*θθθ1{}∏∑∏⎭⎬⎫⎩⎨⎧==nccnnccnnnnnncyxfxZxypxyc),(exp),()|(maxarg,|,*θθθθθ1),(maxarg),(maxarg|*yxFwyxfxyTyccccy==∑θ{} ( )⎟⎠⎞⎜⎝⎛−=∀≠),'(),(maxmaxarg,|,'*nnnnTnyywnnxyFxyFwxywnn4Eric Xing 7MLE of Feature Based UGMsz Scaled likelihood functionz Instead of optimizing this objective directly, we attack its lower boundz The logarithm has a linear upper bound …z This bound holds for all µ, in particular, forz Thus we have ∑∑===xnnxpxpxpNNDD)|(log)(~)|(log/);();(~θθθθ1ll∑∑−=xiiiZxfxp)(log)()(~θθ1−−≤µθµθlog)()(logZZ)()(tZθµ1−=1+−−≥∑∑xttiiiZZZxfxpD)(log)()()()(~);(~)()(θθθθθlEric Xing 8Generalized Iterative Scaling (GIS)z Lower bound of scaled loglikelihoodz Definez Relax againz Assume z Convexity of exponential: z We have:1+−−≥∑∑xttiiiZZZxfxpD)(log)()()()(~);(~)()(θθθθθl)(def)(tiitiθθθ−=∆11111+−⎭⎬⎫⎩⎨⎧∆−=+−⎭⎬⎫⎩⎨⎧∆⎭⎬⎫⎩⎨⎧−=+−⎭⎬⎫⎩⎨⎧−≥∑∑∑∑∑∑∑∑∑∑∑∑∑)(log)(exp)|()()(~)(log)(exp)(exp)()()(~)(log)(exp)()()(~);(~)()()()()()()()()(tiitixtiixitiitiiitixtiixixtiiixtiiiZxfxpxfxpZxfxfZxfxpZxfZxfxpDθθθθθθθθθθθθθθl()()∑∑≤iiiiiixxexpexpππ())()(logexp)()|()()(~);(~def)()()(θθθθθθΛ=+−∆−≥∑∑∑∑1txitiitiixiZxfxpxfxpDl10 =≥∑iiixfxf)( ,)(5Eric Xing 9GISz Lower bound of scaled loglikelihoodz Take derivative:z Set to zeroz where p(t)(x) is the unnormalized version of p(x|θ(t))z Update())()(logexp)()|()()(~);(~def)()()(θθθθθθΛ=+−∆−≥∑∑∑∑1txitiitiixiZxfxpxfxpDl()∑∑∆−=∂Λ∂xittiixixfxpxfxp)()|(exp)()(~)()(θθθ)()()()()(~)()|()()(~)()()()(txitixxitixZxfxpxfxpxfxpxfxpetiθθθ∑∑∑∑==∆)()()()()()()()()(xftttititiitiexpxpθθθθ∆++=⇒∆+=11⇒()∏∏∏⎟⎠⎞⎜⎝⎛=∑⎟⎠⎞⎜⎝⎛=⎟⎠⎞⎜⎝⎛=∑∑∑∑∑∑+ixfxfxpxfxptixftxfxfxpxfxpttixftxfxpxfxptttixitixiiixitixixitixxpZZxpZZxpxp)()()()()(~)()()()()()()()(~)()()()()()()()(~)()()()()()()()()()()()()()(θθθθ1Eric Xing 10Learning wz Training examplesz Probabilistic approach:z Computing Zw(x) can be NP-completez Tractable models but intractable estimation z Large margin approach: z Exact and efficient when prediction is tractableb r a c e6Eric Xing 11OCR Examplez We want:argmaxwordwTf(, word) = “brace”z Equivalently:wTf(,“brace”) > wTf( ,“aaaaa”)wTf(,“brace”) > wTf( ,“aaaab”)…wTf(,“brace”) > wTf( ,“zzzzz”)a lot!Eric Xing 12Large Margin Estimationz Given training example (x, y*), we want:*Taskar et al. 03z Maximize marginz Mistake weighted margin:# of mistakes in y7Eric Xing 13Large Margin Estimationz Recall from SVMs: z Maximizing margin γ is equivalent to minimizing the square of the L2-norm of the weight vector w:z New objective function:Eric Xing 14z Brute force enumeration of constraints:z The constraints are exponential in the size of the structurez Alternative: min-max formulation z add only the most violated constraintz Handles more general loss functionsz Only polynomial # of constraints needed Min-max Formulation8Eric Xing 15Min-max formulationz Key step: convert the maximization in the constraint from discrete to continuousz This enables us to plug it into a QP z How to do this conversion?z Linear chain example in the next slides Ædiscrete optim.continuous optim.Eric Xing 16y ⇒ z map for linear chain structures00000. . . 0.000.100:010:100:010:100:10B:BA00000. . . 0.010.0000000. . . 0.000.1000000. . . 0.100.00B:BAB.BA B.BA B.BA B.BA9Eric Xing 17y ⇒ z map for linear chain structures000000100000000001000010normalizationagreementRewriting the maximization function in terms of indicator variables:Eric Xing 18Min-max formulationz Original problem:z Transformed problem:z Has integral solutions z for chains, treesz Can be fractional for untriangulated networks10Eric Xing 19Min-max formulationz Using strong Lagrangian duality:(beyond the scope of this lecture)z Use the result above to minimize jointly over w and µ: Eric Xing 20Min-max formulationz Formulation produces compact QP forz Low-treewidth Markov networks z Associative Markov networksz Context free grammarsz Bipartite matchingsz Any problem with compact LP inference11Eric Xing 21Results: Handwriting RecognitionLength: ~8 charsLetter: 16x8 pixels 10-fold Train/Test5000/50000 letters600/6000 words Models:Multiclass-SVMs*CRFsM3nets *Crammer & Singer 01051015202530CRFsMC–SVMs M^3 netsTest error (average per-character)rawpixelsquadratickernelcubickernel45% error reduction over linear CRFs33% error reduction over multiclass SVMsbetterEric Xing 2205101520Test ErrorSVMs RMNS M^3NsResults: Hypertext Classificationz WebKB datasetz Four CS department websites: 1300 pages/3500 linksz Classify each page: faculty, course, student, project, otherz Train on three universities/test on fourth53% error reduction over SVMs38% error reduction over RMNs*Taskar et al 02better12Eric Xing 23Summaryz Structured Maximum Margin Networksz Concise representationz Efficient QP algorithmsz Applications:z Sequencesz Treesz Matchings ….z Strong empirical results z Acknowledgments: z Adopted from two different presentations given by Ben Taskar.Eric Xing 24Open Problemsz Unsupervised CRF learning and MaxMargin Learningz We want to recognize a pattern that is


View Full Document

CMU CS 10708 - Notes

Documents in this Course
Lecture

Lecture

15 pages

Lecture

Lecture

25 pages

Lecture

Lecture

24 pages

causality

causality

53 pages

lecture11

lecture11

16 pages

Exam

Exam

15 pages

lecture

lecture

18 pages

lecture

lecture

16 pages

Lecture

Lecture

17 pages

Lecture

Lecture

15 pages

Lecture

Lecture

17 pages

Lecture

Lecture

19 pages

Lecture

Lecture

42 pages

Lecture

Lecture

16 pages

r6

r6

22 pages

lecture

lecture

20 pages

lecture

lecture

35 pages

Lecture

Lecture

19 pages

Lecture

Lecture

21 pages

lecture

lecture

21 pages

lecture

lecture

13 pages

review

review

50 pages

Semantics

Semantics

30 pages

lecture21

lecture21

26 pages

MN-crf

MN-crf

20 pages

hw4

hw4

5 pages

lecture

lecture

12 pages

Lecture

Lecture

25 pages

Lecture

Lecture

25 pages

Lecture

Lecture

14 pages

Lecture

Lecture

15 pages

Load more
Download Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?