School of Computer Science
Probabilistic Graphical Models (10-708)
Lecture #21: Large Margin Structured Models
11/28/2007
Eric Xing

[Title-slide figure: a signaling pathway (Receptor A, Receptor B, Kinases C, D, E, TF F, Genes G, H) and its graphical-model representation over variables $X_1, \dots, X_8$.]

Slide 2: OCR example
- Sequential structure: the input $x$ is a sequence of character images; the output $y$ is the corresponding letter sequence (one label in a-z per position), e.g. "brace".

Slide 3: Image segmentation example
- Spatial structure: the input $x$ is an image; the output $y$ assigns a label $y_i$ to each region, and neighboring labels $y_i, y_j$ interact.

Slide 4: Structured models
- A scoring function ranks outputs over a space of feasible outputs.
- Assumptions:
  - the score is a linear combination of features;
  - the score is a sum of part scores, where the index $p$ represents a part in the structure.

Slide 5: Learning w
- Training examples: inputs paired with labeled structured outputs, e.g. an image sequence labeled "b r a c e".
- Probabilistic approach:
  - computing $Z_w(x)$ can be NP-complete;
  - tractable models, but intractable estimation.
- Large margin approach:
  - exact and efficient when prediction is tractable.

Slide 6: Learning strategy
- Recall that in a CRF
  $p(y \mid x) = \frac{1}{Z(\theta, x)} \exp\Big\{ \sum_c \theta_c f_c(x, y_c) \Big\}$
  we predict based on
  $y^*|_{x,\theta} = \arg\max_y p(y \mid x, \theta)$
  and we learn based on
  $\theta^* = \arg\max_\theta \prod_n p(y_n \mid x_n, \theta)$.
- Max-margin: we predict based on
  $y^*|_x = \arg\max_y \sum_c \theta_c f_c(x, y_c) = \arg\max_y w^T F(x, y)$
  and we learn based on
  $w^*|_{\{x_n, y_n\}} = \arg\max_w \Big( w^T F(x_n, y_n) - \max_{y'_n \ne y_n} w^T F(x_n, y'_n) \Big), \ \forall n$.

Slide 7: MLE of feature-based UGMs
- Scaled likelihood function:
  $\tilde\ell(\theta; D) = \ell(\theta; D)/N = \sum_x \tilde p(x) \log p(x \mid \theta) = \sum_x \tilde p(x) \sum_i \theta_i f_i(x) - \log Z(\theta)$.
- Instead of optimizing this objective directly, we attack a lower bound on it.
- The logarithm has a linear upper bound:
  $\log Z(\theta) \le \mu Z(\theta) - \log \mu - 1$.
- This bound holds for all $\mu$, in particular for $\mu = Z(\theta^{(t)})^{-1}$.
- Thus we have
  $\tilde\ell(\theta; D) \ge \sum_x \tilde p(x) \sum_i \theta_i f_i(x) - \frac{Z(\theta)}{Z(\theta^{(t)})} - \log Z(\theta^{(t)}) + 1$.

Slide 8: Generalized Iterative Scaling (GIS)
- Start from the lower bound of the scaled log-likelihood above.
- Define $\Delta\theta_i^{(t)} \overset{\text{def}}{=} \theta_i - \theta_i^{(t)}$, so that
  $\frac{Z(\theta)}{Z(\theta^{(t)})} = \sum_x \frac{\exp\{\sum_i \theta_i^{(t)} f_i(x)\} \exp\{\sum_i \Delta\theta_i^{(t)} f_i(x)\}}{Z(\theta^{(t)})} = \sum_x p(x \mid \theta^{(t)}) \exp\Big\{\sum_i \Delta\theta_i^{(t)} f_i(x)\Big\}$
  and hence
  $\tilde\ell(\theta; D) \ge \sum_x \tilde p(x) \sum_i \theta_i f_i(x) - \sum_x p(x \mid \theta^{(t)}) \exp\Big\{\sum_i \Delta\theta_i^{(t)} f_i(x)\Big\} - \log Z(\theta^{(t)}) + 1$.
- Relax again: assume $f_i(x) \ge 0$ and $\sum_i f_i(x) = 1$.
- Convexity of the exponential gives $\exp\big(\sum_i \pi_i x_i\big) \le \sum_i \pi_i \exp(x_i)$, and therefore
  $\tilde\ell(\theta; D) \ge \sum_x \tilde p(x) \sum_i f_i(x)\, \theta_i - \sum_x p(x \mid \theta^{(t)}) \sum_i f_i(x) \exp\big(\Delta\theta_i^{(t)}\big) - \log Z(\theta^{(t)}) + 1 \overset{\text{def}}{=} \Lambda(\theta)$.
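As a sanity check on the convexity step in the GIS derivation, the short sketch below (my own illustration, not from the lecture) numerically verifies $\exp(\sum_i \pi_i x_i) \le \sum_i \pi_i \exp(x_i)$ for a convex combination $\pi$; in GIS, $\pi_i$ plays the role of $f_i(x)$ and $x_i$ the role of $\Delta\theta_i^{(t)}$:

```python
import math

def convexity_bound(pi, x):
    """Compare exp(sum_i pi_i * x_i) with sum_i pi_i * exp(x_i).

    For nonnegative weights pi summing to 1 (a convex combination),
    Jensen's inequality for the convex function exp gives
        exp(sum_i pi_i x_i) <= sum_i pi_i exp(x_i),
    the step GIS uses with pi_i = f_i(x) and x_i = delta_theta_i.
    """
    lhs = math.exp(sum(p * v for p, v in zip(pi, x)))
    rhs = sum(p * math.exp(v) for p, v in zip(pi, x))
    return lhs, rhs

pi = [0.2, 0.5, 0.3]   # plays the role of f_i(x): nonnegative, sums to 1
x = [1.0, -0.5, 2.0]   # plays the role of the parameter increments
lhs, rhs = convexity_bound(pi, x)
assert lhs <= rhs      # the bound holds for any convex combination
```

The bound is tight when all $x_i$ are equal (or when $\pi$ puts all mass on one coordinate), which is why GIS's requirement $\sum_i f_i(x) = 1$ matters: it makes the per-coordinate decoupling an honest lower bound rather than an approximation.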
Slide 9: GIS
- Lower bound of the scaled log-likelihood:
  $\tilde\ell(\theta; D) \ge \sum_x \tilde p(x) \sum_i f_i(x)\, \theta_i - \sum_x p(x \mid \theta^{(t)}) \sum_i f_i(x) \exp\big(\Delta\theta_i^{(t)}\big) - \log Z(\theta^{(t)}) + 1 \overset{\text{def}}{=} \Lambda(\theta)$.
- Take the derivative:
  $\frac{\partial \Lambda}{\partial \theta_i} = \sum_x \tilde p(x) f_i(x) - \sum_x p(x \mid \theta^{(t)}) \exp\big(\Delta\theta_i^{(t)}\big) f_i(x)$.
- Set it to zero:
  $e^{\Delta\theta_i^{(t)}} = \frac{\sum_x \tilde p(x) f_i(x)}{\sum_x p(x \mid \theta^{(t)}) f_i(x)} = Z(\theta^{(t)}) \frac{\sum_x \tilde p(x) f_i(x)}{\sum_x p^{(t)}(x) f_i(x)}$,
  where $p^{(t)}(x)$ is the unnormalized version of $p(x \mid \theta^{(t)})$.
- Update:
  $\theta_i^{(t+1)} = \theta_i^{(t)} + \Delta\theta_i^{(t)} \;\Rightarrow\; p^{(t+1)}(x) = p^{(t)}(x)\, e^{\sum_i \Delta\theta_i^{(t)} f_i(x)} = p^{(t)}(x) \prod_i \left( \frac{\sum_x \tilde p(x) f_i(x)}{\sum_x p(x \mid \theta^{(t)}) f_i(x)} \right)^{f_i(x)}$.

Slide 10: Learning w (recap)
- Probabilistic approach: computing $Z_w(x)$ can be NP-complete; tractable models, but intractable estimation.
- Large margin approach: exact and efficient when prediction is tractable.

Slide 11: OCR example
- We want: $\arg\max_{\text{word}} w^T f(\text{image}, \text{word}) = \text{"brace"}$.
- Equivalently:
  $w^T f(\text{image}, \text{"brace"}) > w^T f(\text{image}, \text{"aaaaa"})$
  $w^T f(\text{image}, \text{"brace"}) > w^T f(\text{image}, \text{"aaaab"})$
  ...
  $w^T f(\text{image}, \text{"brace"}) > w^T f(\text{image}, \text{"zzzzz"})$
  ... a lot of constraints!

Slide 12: Large margin estimation
- Given a training example $(x, y^*)$, we want:
  $w^T f(x, y^*) > w^T f(x, y) \quad \forall y \ne y^*$  (*Taskar et al. 03)
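To make the "a lot of constraints" point concrete, here is a minimal sketch (a hypothetical toy feature map over a two-letter alphabet, not the lecture's pixel features) that enumerates every competing output and checks the margin constraints by brute force, which is exactly what becomes infeasible at OCR scale (26^5 competitors for a five-letter word):

```python
from itertools import product

# Toy stand-in for the OCR example: score a length-3 string over a
# 2-letter alphabet with per-position unary weights only.
ALPHABET = "ab"

def score(w, y):
    # w maps (position, letter) -> weight; the score is a sum of part scores
    return sum(w.get((i, c), 0.0) for i, c in enumerate(y))

def margin_violations(w, y_star):
    """Enumerate all |ALPHABET|^L - 1 competing outputs and return those
    violating w^T f(x, y*) > w^T f(x, y) -- the brute-force constraint set."""
    s_star = score(w, y_star)
    return [
        "".join(y)
        for y in product(ALPHABET, repeat=len(y_star))
        if "".join(y) != y_star and score(w, "".join(y)) >= s_star
    ]

w = {(0, "b"): 1.0, (1, "a"): 1.0, (2, "b"): 1.0}
print(margin_violations(w, "bab"))  # -> []: all 7 constraints satisfied
```

With the zero weight vector every competitor ties the true output, so all $2^3 - 1 = 7$ constraints are violated; the min-max formulation on the following slides avoids ever materializing this list.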
Slide 12 (continued)
- Maximize the margin $\gamma$.
- Mistake-weighted margin: require a margin that scales with $\ell(y^*, y)$, the number of mistakes in $y$.

Slide 13: Large margin estimation
- Recall from SVMs: maximizing the margin $\gamma$ is equivalent to minimizing the square of the L2-norm of the weight vector $w$.
- New objective function: minimize $\frac{1}{2}\|w\|^2$ subject to the (mistake-weighted) margin constraints above.

Slide 14: Min-max formulation
- Brute-force enumeration of constraints: the number of constraints is exponential in the size of the structure.
- Alternatives:
  - the min-max formulation;
  - adding only the most violated constraint.
- These handle more general loss functions, and only a polynomial number of constraints is needed.

Slide 15: Min-max formulation
- Key step: convert the maximization in the constraint from a discrete optimization to a continuous one.
- This enables us to plug it into a QP.
- How to do this conversion? A linear-chain example follows in the next slides.

Slide 16: y ⇒ z map for linear-chain structures
[Figure: a label sequence over the alphabet {A, B} encoded as 0/1 indicator vectors, one block of indicators per node and one per edge.]

Slide 17: y ⇒ z map for linear-chain structures
- Rewrite the maximization function in terms of indicator variables, subject to normalization constraints (the indicators at each node sum to one) and agreement constraints (edge indicators are consistent with the node indicators they connect).

Slide 18: Min-max formulation
- Original problem: a discrete maximization over outputs $y$.
- Transformed problem: a continuous (linear) program over the indicator variables.
- The relaxation has integral solutions for chains and trees; it can be fractional for untriangulated networks.

Slide 19: Min-max formulation
- Using strong Lagrangian duality (beyond the scope of this lecture), replace the inner maximization by its dual minimization.
- Use this result to minimize jointly over $w$ and $\mu$.

Slide 20: Min-max formulation
- The formulation produces a compact QP for:
  - low-treewidth Markov networks;
  - associative Markov networks;
  - context-free grammars;
  - bipartite matchings;
  - any problem with compact LP inference.

Slide 21: Results: handwriting recognition
- Data: word length ~8 characters; each letter is a 16x8 pixel image; 10-fold train/test split with 5000/50000 letters and 600/6000 words.
- Models: multiclass SVMs (*Crammer & Singer 01), CRFs, and M^3-nets.
- Features: raw pixels, quadratic kernel, cubic kernel.
[Figure: average per-character test error of CRFs, multiclass SVMs, and M^3-nets under each feature set; lower is better.]
- M^3-nets achieve a 45% error reduction over linear CRFs and a 33% error reduction over multiclass SVMs.
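The "most violated constraint" idea from the min-max slides amounts to loss-augmented decoding: find $\arg\max_y \big[ w^T f(x, y) + \ell(y^*, y) \big]$. The toy below (a hypothetical unary feature map over a two-letter alphabet, my own sketch) does this search by brute force; for a chain-structured model, this same maximization is what the y ⇒ z indicator LP, or equivalently Viterbi-style dynamic programming, computes in polynomial time:

```python
from itertools import product

ALPHABET = "ab"  # toy label set; real OCR would use a-z

def hamming(y_star, y):
    # l(y*, y): number of per-position mistakes (the mistake-weighted margin loss)
    return sum(a != b for a, b in zip(y_star, y))

def loss_augmented_argmax(w, y_star):
    """Most violated constraint: argmax_y [w^T f(y) + l(y*, y)].

    Brute force over all |ALPHABET|^L outputs; with a chain-structured
    feature map the same maximum is found by dynamic programming over
    the node/edge indicator variables.
    """
    def augmented_score(y):
        score = sum(w.get((i, c), 0.0) for i, c in enumerate(y))
        return score + hamming(y_star, y)
    candidates = ("".join(y) for y in product(ALPHABET, repeat=len(y_star)))
    return max(candidates, key=augmented_score)

# With confident weights the truth itself maximizes the augmented score,
# i.e. no constraint is violated; with zero weights the loss term dominates.
w = {(0, "b"): 2.0, (1, "a"): 2.0, (2, "b"): 2.0}
print(loss_augmented_argmax(w, "bab"))   # -> "bab"
print(loss_augmented_argmax({}, "bab"))  # -> "aba"
```

In a cutting-plane training loop, the returned output would be added as a new constraint whenever its augmented score exceeds the score of $y^*$, which is why only polynomially many constraints are ever needed.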
Slide 22: Results: hypertext classification
- WebKB dataset: four CS department websites, 1300 pages / 3500 links.
- Task: classify each page as faculty, course, student, project, or other.
- Train on three universities, test on the fourth.
[Figure: test error of SVMs, RMNs (*Taskar et al. 02), and M^3-nets; lower is better.]
- M^3-nets achieve a 53% error reduction over SVMs and a 38% error reduction over RMNs.

Slide 23: Summary
- Structured maximum-margin networks:
  - concise representation;
  - efficient QP algorithms.
- Applications: sequences, trees, matchings, ...
- Strong empirical results.
- Acknowledgments: adapted from two different presentations given by Ben Taskar.

Slide 24: Open problems
- Unsupervised CRF learning and max-margin learning.
- We want to recognize a pattern that is ...