Learning Structured Prediction Models: A Large Margin Approach
Ben Taskar, U.C. Berkeley
with Vassil Chatalbashev, Michael Collins, Carlos Guestrin, Dan Klein, Daphne Koller, Chris Manning

"Don't worry, Howard. The big questions are multiple choice."

Motivating problems
- Handwriting recognition ("brace"): sequential structure, x -> y
- Object segmentation: spatial structure, x -> y
- Natural language parsing ("The screen was a sea of red"): recursive structure, x -> y
- Disulfide connectivity prediction (RSCCPCYWGGCPWGQNCYPEGCSGPKV): combinatorial structure, x -> y

Outline
- Structured prediction models: sequences (CRFs), trees (CFGs), associative Markov networks (special MRFs), matchings
- Geometric view: structured model polytopes, linear programming inference
- Structured large margin estimation: min-max formulation (application: 3D object segmentation) and certificate formulation (application: disulfide connectivity prediction)

Structured models
Mild assumption: the scoring function is a linear combination of features, s(x, y) = w^T f(x, y), maximized over the space of feasible outputs.

Chain Markov Net (aka CRF; Lafferty et al. '01)
- Node potentials: phi(x_i, y_i) = exp{ w . f(x_i, y_i) }, e.g. f(x, y) = I(x_p = 1, y = 'z')
- Edge potentials: psi(y_i, y_i+1) = exp{ w . f(y_i, y_i+1) }, e.g. f(y, y') = I(y = 'z', y' = 'a')
- P(y | x) is proportional to prod_i phi(x_i, y_i) prod_i psi(y_i, y_i+1) = exp{ w^T f(x, y) }
- Collecting terms, the global features count occurrences, e.g. f(x, y) = #(y = 'z', y' = 'a') and f(x, y) = #(x_p = 1, y = 'z'), with w = [..., w_phi, ..., w_psi, ...] and f(x, y) = [..., f_phi(x, y), ..., f_psi(x, y), ...]

Associative Markov Nets
- Point features: spin-images, point height
- Edge features: edge length, edge orientation
- "Associative" restriction: edge potentials reward neighboring nodes y_i, y_j that take the same label

PCFG
Features count production occurrences in the parse tree: #(NP -> DT NN), ..., #(PP -> IN NP), ..., #(NN -> 'sea')

Disulfide bonds: non-bipartite matching
The six cysteines of RSCCPCYWGGCPWGQNCYPEGCSGPKV (numbered 1-6) must be paired off; a bonding pattern is a perfect matching in the non-bipartite graph of candidate cysteine pairs. (Fariselli & Casadio '01, Baldi et al. '04)
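To make the matching formulation concrete, here is a minimal sketch of prediction as maximum-weight non-bipartite matching. The pair features and weight vector are invented placeholders (not the features from the talk), and NetworkX's max_weight_matching (Edmonds' blossom algorithm) stands in for the inference routine.

```python
# Disulfide connectivity as maximum-weight non-bipartite matching (sketch).
# pair_features and w are illustrative placeholders, not the talk's model.
import numpy as np
import networkx as nx

sequence = "RSCCPCYWGGCPWGQNCYPEGCSGPKV"
cys = [i for i, aa in enumerate(sequence) if aa == "C"]  # the 6 cysteines

def pair_features(seq, i, j):
    """Toy features for a candidate bond between positions i and j."""
    return np.array([
        1.0,                        # bias
        abs(i - j) / len(seq),      # normalized sequence separation
        float(seq[i - 1] == "S"),   # hypothetical local-context indicator
    ])

w = np.array([0.5, -1.0, 0.2])      # pretend these weights were learned

# Edge score for each candidate pair is the linear model w . f(x, (i, j)).
G = nx.Graph()
for a in range(len(cys)):
    for b in range(a + 1, len(cys)):
        i, j = cys[a], cys[b]
        G.add_edge(i, j, weight=float(w @ pair_features(sequence, i, j)))

# MAP prediction y* = argmax_y w^T f(x, y) over perfect matchings (blossom).
bonds = nx.max_weight_matching(G, maxcardinality=True)
print(sorted(tuple(sorted(e)) for e in bonds))
```

The same graph construction works with any edge feature map; how to learn w is the subject of the rest of the talk.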
Scoring function
The score of a candidate bonding pattern for RSCCPCYWGGCPWGQNCYPEGCSGPKV decomposes over the chosen cysteine pairs; string features describe the residues around each cysteine and their physical properties.

Structured models (continued)
- Mild assumption: linear scoring function over the space of feasible outputs
- Another mild assumption: MAP inference can be written as a linear program

MAP inference linear program
LP inference is available for chains, trees, associative Markov nets, bipartite matchings, and more.

Markov Net Inference LP
The LP optimizes over node and edge indicator (marginal) variables subject to consistency constraints. It has integral solutions y for chains and trees, and gives an upper bound for general networks.

Associative MN Inference LP
- For K = 2 labels, solutions are always integral (optimal)
- For K > 2, solutions are within a factor of 2 of optimal
- The constraint matrix A is linear in the number of nodes and edges, regardless of the tree-width ("associative" restriction)

Other Inference LPs
Context-free parsing, dynamic programs, bipartite matching, network flow, and many other combinatorial problems.

Outline (revisited): structured large margin estimation via the min-max and certificate formulations.

Learning w
Given a training example (x, y*), the probabilistic approach maximizes conditional likelihood. Problem: computing the partition function Z_w(x) is #P-complete in general.

Geometric Example
Given the training data, learn w such that w^T f(x, y*) points the "right" way, i.e. the true output scores above the alternatives.

OCR Example
We want argmax_word w^T f(x, word) = "brace". Equivalently:
  w^T f(x, "brace") > w^T f(x, "aaaaa")
  w^T f(x, "brace") > w^T f(x, "aaaab")
  ...
  w^T f(x, "brace") > w^T f(x, "zzzzz")
That is a lot of constraints!

Large margin estimation (Taskar et al. '03)
Given a training example (x, y*), we want w^T f(x, y*) to exceed w^T f(x, y) for every other y, and we maximize the margin by which it does. Mistake-weighted margin: scale the required margin for each y by its number of mistakes. To handle the exponentially many constraints: brute-force enumeration, or the min-max formulation, which "plugs in" the linear program for inference.

Min-max formulation
- Assume a linear (Hamming) loss; the loss-augmented inference max_y [w^T f(x, y) + loss(y*, y)] is then itself an LP (a toy loss-augmented Viterbi sketch for the chain case appears at the end of these notes)
- By strong LP duality, the inner maximization can be replaced by its dual minimization, so we minimize jointly over w and the dual variables z
- The formulation produces a compact QP for low-treewidth Markov networks, associative Markov networks, context-free grammars, bipartite matchings, and any problem with compact LP inference

3D Mapping
Data from a laser range finder, GPS, and IMU (data provided by Michael Montemerlo & Sebastian Thrun). Labels: ground, building, tree, shrub. Training: 30 thousand points; testing: 3 million points.

Segmentation results (hand-labeled 180K test points)
  Model   Accuracy
  SVM     68%
  V-SVM   73%
  M3N     93%
(A fly-through visualization of the labeled map accompanies the results.)

Certificate formulation
- Non-bipartite matchings: an O(n^3) combinatorial algorithm exists, but no polynomial-size LP is known
- Spanning trees: no polynomial-size LP known
- Idea: use a simple certificate of optimality instead. Intuition: verifying optimality is easier than optimizing, so we only need a compact optimality condition of y* with respect to w.

Certificate for non-bipartite matching
- Alternating cycle: every other edge is in the matching
- Augmenting alternating cycle: the total score of the cycle's edges not in the matching exceeds that of its edges in the matching
- Negate the scores of edges not in the matching: an augmenting alternating cycle becomes a negative-length alternating cycle, so the matching is optimal iff no negative-length alternating cycle exists.
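A toy brute-force check of this certificate, on an invented 6-node instance with made-up scores: after negating the scores of edges outside the matching, the matching is optimal exactly when no alternating cycle has negative length, i.e. no swap along an alternating cycle raises the score.

```python
# Toy check of the matching-optimality certificate: a perfect matching is
# maximum-weight iff no alternating cycle improves it (equivalently, iff no
# alternating cycle has negative length once non-matching scores are negated).
# The 6-node instance and its scores are invented for illustration.
import itertools

score = {frozenset(e): s for e, s in {
    (1, 6): 3.0, (2, 4): 2.5, (3, 5): 2.0,   # edges in the candidate matching
    (1, 2): 1.0, (3, 4): 1.5, (5, 6): 0.5,
    (1, 3): 0.5, (2, 5): 1.0, (4, 6): 0.5,
}.items()}
matching = [frozenset(e) for e in [(1, 6), (2, 4), (3, 5)]]

def improving_alternating_cycle_exists():
    """Brute-force over alternating cycles; fine at toy sizes.

    A cycle is described by an ordered subset of matched edges plus an
    orientation for each; consecutive matched edges are joined by
    non-matching "connector" edges.  Swapping along the cycle changes the
    matching score by sum(connectors) - sum(matched edges in the cycle).
    """
    for k in range(2, len(matching) + 1):
        for subset in itertools.permutations(matching, k):
            for orient in itertools.product([0, 1], repeat=k):
                ends = [tuple(e) if o == 0 else tuple(e)[::-1]
                        for e, o in zip(subset, orient)]
                connectors = [frozenset((ends[i][1], ends[(i + 1) % k][0]))
                              for i in range(k)]
                if any(c not in score for c in connectors):
                    continue  # some connector edge is absent from the graph
                gain = (sum(score[c] for c in connectors)
                        - sum(score[e] for e in subset))
                if gain > 1e-9:
                    return True   # a negative-length alternating cycle exists
    return False

print("matching certified optimal:", not improving_alternating_cycle_exists())
```

On real problems the point is that this condition can be encoded compactly as constraints on w, rather than checked by enumeration.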
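Finally, the loss-augmented Viterbi sketch referenced in the min-max section above: for a chain, the inner maximization max_y [w^T f(x, y) + Hamming(y*, y)] is an ordinary dynamic program. The alphabet, scores, and example are invented; this shows only the inference subproblem, not the QP that the talk derives via LP duality.

```python
# Loss-augmented Viterbi for a chain model (sketch): computes
#   max_y  w^T f(x, y) + Hamming(y*, y)
# which is the inner maximization the min-max formulation replaces by an LP.
# Node/edge scores below stand in for w . f; values are illustrative only.
import numpy as np

labels = list("abcre")               # tiny alphabet for the "brace" example
y_star = list("brace")
n, L = len(y_star), len(labels)

rng = np.random.default_rng(0)
node_score = rng.normal(size=(n, L))   # stands in for w . f(x_i, y_i)
edge_score = rng.normal(size=(L, L))   # stands in for w . f(y_i, y_{i+1})

def loss_augmented_viterbi(node_score, edge_score, labels, y_star):
    n, L = node_score.shape
    aug = node_score.copy()
    for i in range(n):                 # add the Hamming loss per position
        aug[i] += [lab != y_star[i] for lab in labels]
    best = aug[0].copy()               # best augmented score ending in each label
    back = np.zeros((n, L), dtype=int)
    for i in range(1, n):
        cand = best[:, None] + edge_score + aug[i][None, :]
        back[i] = cand.argmax(axis=0)  # best previous label for each current label
        best = cand.max(axis=0)
    y = [int(best.argmax())]
    for i in range(n - 1, 0, -1):      # backtrack from the last position
        y.append(int(back[i, y[-1]]))
    return [labels[j] for j in reversed(y)], float(best.max())

y_hat, value = loss_augmented_viterbi(node_score, edge_score, labels, y_star)
print(y_hat, value)   # most-violating labeling and its augmented score
```

The most-violating labeling it returns is exactly what the margin constraints compare y* against.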