NYU CSCI-GA 2273 - Formal Mechanisms for Capturing Regularizations


Formal Mechanisms for Capturing Regularizations

Adam Meyers, Ralph Grishman, Michiko Kosaka
New York University, 719 Broadway, 7th Floor, NY, NY 10003 USA
meyers/[email protected]
Monmouth University, West Long Branch, N.J. 07764, [email protected]

Abstract

While initial treebanks and treebank parsers primarily involved surface analysis, recent work focuses on predicate argument (PA) structure. PA structure provides a means to regularize variants of sentences (e.g., actives/passives) so that individual patterns may have better coverage (in MT, QA, IE, etc.), offsetting the sparse data problem. We encode such PA information in the GLARF framework. Our previous work discusses procedures for producing GLARF from treebanks and parsed data. This paper shows that GLARF is particularly well-suited for capturing regularization. We discuss crucial components of GLARF and demonstrate that other frameworks would require equivalent components to adequately express regularization.

1. Introduction

The past decade has seen a great deal of work on developing "treebanks" and using treebanks to create increasingly accurate parsers. Although initial work primarily involved surface-structure analysis, much recent work has focused on predicate argument (PA) structure. PA structure can be used to capture syntactic regularizations, providing a common representation for variants (such as active and passive clauses) which convey the same semantic relationship. In this way regularizations can reduce the number of patterns required to capture a semantic relation for such applications as MT, QA, and IE. While this advantage has long been recognized, the goal of recent efforts is to combine these benefits with the high accuracy of corpus-based analyzers. To a limited degree, PA structure has been added to parse trees using function tags: labels carrying grammatical role information or semantic class information (Marcus et al., 1994; Blaheta and Charniak, 2000).
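To make the regularization idea concrete, here is a minimal illustrative sketch (not the paper's actual machinery, and not GLARF's API): if both active and passive clauses are mapped to the same canonical predicate-argument triple, a single pattern such as "X acquired Y" covers both surface variants. The dictionary fields and function name are hypothetical.

```python
# Hypothetical sketch: both surface variants map to one canonical
# (logical) predicate-argument triple, so a single extraction pattern
# covers both. Field names are illustrative, not GLARF's.

def regularize(clause):
    """Map an active or passive clause to a canonical (subj, verb, obj) triple."""
    if clause["voice"] == "passive":
        # In a passive, the surface subject is the logical object and the
        # "by"-phrase (if present) supplies the logical subject.
        return (clause.get("by_obj"), clause["verb"], clause["subj"])
    return (clause["subj"], clause["verb"], clause["obj"])

active = {"voice": "active", "subj": "Disney", "verb": "acquire", "obj": "Apple"}
passive = {"voice": "passive", "subj": "Apple", "verb": "acquire", "by_obj": "Disney"}

# "Disney acquired Apple" and "Apple was acquired by Disney" regularize
# to the same triple.
assert regularize(active) == regularize(passive) == ("Disney", "acquire", "Apple")
```

A pattern matcher run over such regularized triples needs only one pattern per semantic relation, rather than one per surface variant, which is the coverage gain the paper attributes to regularization.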
However, efforts to incorporate much more PA structure information are underway.

PA structure seems to mean different things to different researchers, but all to some degree seem to address the problem of regularization – expressing noncanonical constructions in terms of canonical ones. PA structures may include any of the following elements: (1) function tags which semantically classify constituents or assign them grammatical roles; (2) labeled arcs (representing dependencies or grammatical roles) which show how constituents are related to each other; and (3) filler/gap representations, e.g., empty categories coindexed with antecedents. In our work, we focus specifically on representing regularization, incorporating these sorts of mechanisms and others into one coherent framework: GLARF (Grammatical and Logical Argument Representation Framework). By focusing on the regularization aspect of PA structure, we are trying to maximize the utility of our research for a range of applications involving generalization of syntactic patterns. Regularization could be a major tool for combating the sparse data problem because a simple pattern may be able to recognize 20 or more times the number of sentence types when applied to regularized data rather than unregularized data. If regularizations are adequately exploited, statistical NLP should be able to achieve better coverage with less training data.

Our previous work (Meyers et al., 2001a; Meyers et al., 2001b) discusses our procedures for producing GLARF from treebanks and parsed sentences – we currently have a small but growing hand-corrected GLARF corpus and have applied our GLARFing procedures to the entire Penn Treebank and thousands of computer-parsed sentences. In this paper, we show that GLARF is better suited than previous treebank frameworks for capturing regularizations. We will define three components of GLARF that are crucial to adequately representing regularizations: gap typing, index typing and arc typing.
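The three PA-structure elements just listed (function tags, labeled arcs, and filler/gap representations) can be sketched together in one toy encoding. This is an assumption-laden illustration, not GLARF's actual data structures; the node names, tag strings, and arc labels below are invented for the example "Apple was acquired by Disney".

```python
# Toy encoding of the three PA-structure elements for
# "Apple was acquired by Disney" (illustrative, not GLARF's format):
# (1) function tags on constituents, (2) labeled arcs relating them,
# (3) a filler/gap pair expressed as an empty category coindexed
#     with its antecedent via a shared index.

nodes = {
    "n1": {"word": "Apple", "tag": "NP", "index": 1},   # surface subject
    "n2": {"word": "acquired", "tag": "VBN"},
    "n3": {"word": "Disney", "tag": "NP"},
    "ec1": {"word": None, "tag": "*EC*", "index": 1},   # gap, coindexed with n1
}

arcs = [
    ("SBJ", "n2", "n1"),   # surface grammatical subject
    ("LGS", "n2", "n3"),   # logical subject (agent in the by-phrase)
    ("OBJ", "n2", "ec1"),  # canonical object position, filled by the gap
]

# The filler of the gap is whichever overt node shares the gap's index.
filler = next(n for n, attrs in nodes.items()
              if attrs["word"] and attrs.get("index") == nodes["ec1"]["index"])
assert filler == "n1"
```

The point of the sketch is that all three mechanisms interact: the coindexed empty category is what lets the canonical OBJ arc and the surface SBJ arc both resolve to "Apple".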
Without equivalent components, we argue, other frameworks will not be able to represent crucial details of regularizations. Until now, the broad range of issues involved in representing regularization has not been addressed in a single computational framework.

2. Regularization in Syntactic Theory

Regularization has been explored thoroughly in syntactic theories over the last half century, beginning with the work of Zellig Harris and continuing with subsequent work in most other frameworks (Transformational Grammar, Relational Grammar (RG), Case Grammar, Feature Structure frameworks, Dependency Grammar frameworks).

Zellig Harris used transformations (Harris, 1968) to capture paraphrase relations between related sentence types so that, for example, a passive sentence ("Apple was acquired by Disney") or a nominalization phrase ("Disney's acquisition of Apple") could be transformed into the same simple sentence ("Disney acquired Apple"). Later work in Transformational Grammar (TG) (Chomsky, 1973; Fiengo, 1974) provided a way for one analysis to represent both the noncanonical construct and its regularization (D-structure). Thus in "Apple was acquired by Disney", empty placeholders mark the canonical or logical subject and object positions, and the coindexed words occur in their surface positions. The placeholders for the gaps are called empty categories (ecs). RG and other graph-based frameworks (e.g., Feature Structure frameworks) adopted an approach in which a gap is represented by an arc rather than an empty category. Under this approach, the same constituent appears at the head of more than one arc in a graph; only one of the arcs represents the surface position of the constituent.

In addition to pure lexical or grammatical alternations where a constituent has "moved" (metaphorically) from its canonical position (e.g., passive), there are some constructions in which a single constituent is assigned multiple grammatical roles.
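The arc-based alternative described above can be sketched as follows; this is a hypothetical encoding (the field names `pred`, `arg`, and `surface` are invented for illustration), showing the same constituent at the head of two arcs, with a flag marking which arc records its surface position.

```python
# Sketch of the arc-based (RG / Feature Structure style) treatment of
# "Apple was acquired by Disney": no empty category is used; instead
# "Apple" heads both a surface SBJ arc and a logical OBJ arc.
# Field names are illustrative, not from any particular framework.

arcs = [
    {"label": "SBJ", "pred": "acquired", "arg": "Apple", "surface": True},
    {"label": "OBJ", "pred": "acquired", "arg": "Apple", "surface": False},  # logical object
    {"label": "LGS", "pred": "acquired", "arg": "Disney", "surface": True},
]

# The same constituent appears at the head of more than one arc...
apple_arcs = [a for a in arcs if a["arg"] == "Apple"]
assert len(apple_arcs) == 2

# ...but only one of those arcs represents its surface position.
assert sum(a["surface"] for a in apple_arcs) == 1
```

Compared with the empty-category treatment, the information content is the same; the two styles differ only in whether the gap is reified as a node or folded into the arc inventory.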
For example, in the control (or equi) construction "John tried to talk to Mary", "John" is the logical subject of both "tried" and "talk", where the second relation is modeled here as an ec which is coindexed with "John". While many current linguistic frameworks handle control and related phenomena, (Mel'čuk, 1988) enumerates some additional argument-sharing structures (represented with "lexical functions"). For example, "Carthage" is the subject of "suffered", as well as the logical object
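The control construction above can be sketched in the same toy style used earlier; the node names and the `logical_subject` helper below are invented for illustration and are not part of any framework discussed in the paper.

```python
# Illustrative sketch of argument sharing in the control construction
# "John tried to talk to Mary": "John" is the logical subject of both
# verbs, with the second relation mediated by a coindexed empty category.

nodes = {
    "john": {"word": "John", "index": 1},
    "ec1": {"word": None, "index": 1},  # empty subject of "talk", coindexed with "John"
}

arcs = [
    ("SBJ", "tried", "john"),
    ("SBJ", "talk", "ec1"),
]

def logical_subject(verb):
    """Resolve a verb's SBJ argument through coindexation to an overt filler."""
    (_, _, arg), = [a for a in arcs if a[0] == "SBJ" and a[1] == verb]
    idx = nodes[arg]["index"]
    return next(n["word"] for n in nodes.values()
                if n["word"] and n["index"] == idx)

# Both verbs share the same logical subject.
assert logical_subject("tried") == logical_subject("talk") == "John"
```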

