A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Processing

Alexander M. Rush (1,2)    [email protected]
Michael Collins (2)    [email protected]

(1) Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
(2) Department of Computer Science, Columbia University, New York, NY 10027, USA

Abstract

Dual decomposition, and more generally Lagrangian relaxation, is a classical method for combinatorial optimization; it has recently been applied to several inference problems in natural language processing (NLP). This tutorial gives an overview of the technique. We describe example algorithms, formal guarantees for the method, and practical issues in implementing the algorithms. While our examples are predominantly drawn from the NLP literature, the material should be of general relevance to inference problems in machine learning. A central theme of this tutorial is that Lagrangian relaxation is naturally applied in conjunction with a broad class of combinatorial algorithms, allowing inference in models that go significantly beyond previous work on Lagrangian relaxation for inference in graphical models.

1. Introduction

In many problems in statistical natural language processing, the task is to map some input x (e.g., a string) to some structured output y (e.g., a parse tree). This mapping is often defined as

    y^{*} = \arg\max_{y \in \mathcal{Y}} h(y)    (1)

where Y is a finite set of possible structures for the input x, and h : Y → R is a function that assigns a score h(y) to each y in Y. For example, in part-of-speech tagging, x would be a sentence, and Y would be the set of all possible tag sequences for x; in parsing, x would be a sentence and Y would be the set of all parse trees for x; in machine translation, x would be a source-language sentence and Y would be the set of all possible translations for x. The problem of finding y* is referred to as the decoding problem. The size of Y typically grows exponentially with respect to the size of the input x, making exhaustive search for y* intractable.
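As a concrete point of reference (constructed for illustration, not taken from the tutorial itself), the following Python sketch implements Eq. (1) literally, with a toy stand-in for the scoring function h. Enumerating Y costs |tags|^n candidates for an n-word sentence, which is precisely the blow-up that the exact combinatorial algorithms discussed below avoid.

```python
from itertools import product

def brute_force_decode(words, tags, h):
    """Literal implementation of Eq. (1): y* = argmax over y in Y of h(y).

    Here Y is the set of all tag sequences for the sentence, so this
    enumerates len(tags) ** len(words) candidates; it is tractable
    only for toy inputs.
    """
    candidates = product(tags, repeat=len(words))  # the set Y
    return max(candidates, key=h)

# Toy usage: an illustrative scoring function that simply prefers "N" tags.
if __name__ == "__main__":
    words = ["the", "dog", "barks"]
    tags = ["D", "N", "V"]
    h = lambda y: y.count("N")
    print(brute_force_decode(words, tags, h))  # ('N', 'N', 'N')
```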
This paper gives an overview of decoding algorithms for NLP based on dual decomposition and, more generally, Lagrangian relaxation. Dual decomposition leverages the observation that many decoding problems can be decomposed into two or more sub-problems, together with linear constraints that enforce some notion of agreement between solutions to the different problems. The sub-problems are chosen such that they can be solved efficiently using exact combinatorial algorithms. The agreement constraints are incorporated using Lagrange multipliers, and an iterative algorithm (for example, a subgradient algorithm) is used to minimize the resulting dual. Dual decomposition algorithms have the following properties:

• They are typically simple and efficient. For example, subgradient algorithms involve two steps at each iteration: first, each of the sub-problems is solved using a combinatorial algorithm; second, simple additive updates are made to the Lagrange multipliers (see the sketch following this list).

• They have well-understood formal properties, in particular through connections to linear programming (LP) relaxations.

• In cases where the underlying LP relaxation is tight, they produce an exact solution to the original decoding problem, with a certificate of optimality.[1] In cases where the underlying LP is not tight, heuristic methods can be used to derive a good solution; alternatively, constraints can be added incrementally until the relaxation is tight, at which point an exact solution is recovered.

[1] A certificate of optimality is information that gives an efficient proof that the given solution is optimal; more specifically, it is information that allows a proof of optimality of the solution to be constructed in polynomial time.
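The following minimal Python sketch (constructed for this overview, not code from the tutorial) shows the shape of that loop on a toy instance: two position-factored objectives f and g that must agree on one label per position. The scoring tables, step-size schedule, and factorization are all assumptions made for the example.

```python
def dual_decomposition(scores_f, scores_g, labels, iters=100, rate=1.0):
    """Toy subgradient dual decomposition for two sub-problems that
    must agree on one label per position.

    scores_f, scores_g: lists (one entry per position) of
    {label: score} dicts defining position-factored objectives f, g.
    """
    n = len(scores_f)
    u = {(i, t): 0.0 for i in range(n) for t in labels}  # Lagrange multipliers

    for k in range(1, iters + 1):
        # Step 1: solve each sub-problem exactly, with the multipliers
        # folded into its objective (here each argmax is per-position).
        y = [max(labels, key=lambda t: scores_f[i][t] + u[i, t]) for i in range(n)]
        z = [max(labels, key=lambda t: scores_g[i][t] - u[i, t]) for i in range(n)]

        if y == z:
            return y, True  # agreement: exact solution, with a certificate

        # Step 2: simple additive updates on disagreeing positions,
        # with a decreasing step size.
        step = rate / k
        for i in range(n):
            if y[i] != z[i]:
                u[i, y[i]] -= step
                u[i, z[i]] += step

    return y, False  # no agreement within the budget: heuristic answer

# Example use: two (illustrative) scoring tables over labels "N" and "V".
if __name__ == "__main__":
    f = [{"N": 2.0, "V": 1.0}, {"N": 0.5, "V": 1.5}]
    g = [{"N": 1.0, "V": 1.2}, {"N": 0.4, "V": 2.0}]
    print(dual_decomposition(f, g, labels=["N", "V"]))  # (['N', 'V'], True)
```

When the two sub-problem solutions agree, the agreement constraints are satisfied, and a standard weak-duality argument shows that the returned structure exactly maximizes the combined objective f + g; that is the certificate of optimality referred to in the list above.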
Dual decomposition, where two or more combinatorial algorithms are used, is a special case of Lagrangian relaxation (LR). It will be useful to also consider LR methods that make use of a single combinatorial algorithm, together with a set of linear constraints that are again incorporated using Lagrange multipliers. The use of a single combinatorial algorithm is qualitatively different from dual decomposition approaches, although the techniques are very closely related.
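For orientation ahead of the formal treatment in Section 3, the single-algorithm construction can be written generically as follows. This is the standard textbook form of Lagrangian relaxation (with the linear constraints written as Ay = b), given here as a sketch rather than a quotation from the paper:

```latex
% Constrained decoding problem: the argmax over Y alone is assumed to
% have a fast combinatorial algorithm; the constraints Ay = b do not.
y^{*} = \arg\max_{y \in \mathcal{Y} \,:\, Ay = b} h(y)

% Lagrangian: fold the constraints into the objective with multipliers u.
% The inner maximization again needs only the fast algorithm for Y.
L(u) = \max_{y \in \mathcal{Y}} \Big( h(y) + u^{\top}(Ay - b) \Big)

% Weak duality: L(u) >= h(y*) for every u, so we minimize the bound,
% e.g. with subgradient steps  u <- u - alpha_k (A y_u - b),
% where y_u attains the inner maximum at the current u.
\min_{u} L(u) \ \ge\ h(y^{*})
```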


Lagrangian relaxation has a long history in the combinatorial optimization literature, going back to the seminal work of Held and Karp (1971), who derive a relaxation algorithm for the traveling salesman problem. Initial work on Lagrangian relaxation/dual decomposition for decoding in statistical models focused on the MAP problem in Markov random fields (Komodakis, Paragios, & Tziritas, 2007, 2010). More recently, decoding algorithms have been derived for several models in statistical NLP, including models that combine a weighted context-free grammar (WCFG) with a finite-state tagger (Rush, Sontag, Collins, & Jaakkola, 2010); models that combine a lexicalized WCFG with a discriminative dependency parsing model (Rush et al., 2010); head-automata models for non-projective dependency parsing (Koo, Rush, Collins, Jaakkola, & Sontag, 2010); alignment models for statistical machine translation (DeNero & Macherey, 2011); models for event extraction (Riedel & McCallum, 2011); models for combined CCG parsing and supertagging (Auli & Lopez, 2011); phrase-based models for statistical machine translation (Chang & Collins, 2011); syntax-based models for statistical machine translation (Rush & Collins, 2011); and models based on the intersection of weighted automata (Paul & Eisner, 2012). We will give an overview of several of these algorithms in this paper.

While our focus is on examples from natural language processing, the material in this tutorial should be of general relevance to inference problems in machine learning. There is clear relevance to the problem of inference in graphical models, as described, for example, by Komodakis et al. (2007, 2010); however, one central theme of this tutorial is that Lagrangian relaxation is naturally applied in conjunction with a much broader class of combinatorial algorithms than max-product belief propagation, allowing inference in models that go significantly beyond graphical models.

The remainder of this paper is structured as follows. Section 2 describes related work. Section 3 gives a formal introduction to Lagrangian relaxation. Section 4 describes a dual decomposition algorithm (from Rush et al. (2010)) for decoding a model that combines a weighted context-free grammar with a finite-state tagger.