CS 674/INFO 630: Advanced Language Technologies, Fall 2007
Lecture 20 - November 8, 2007
Prof. Lillian Lee
Scribes: Cristian Danescu Niculescu-Mizil & Nam Nguyen & Myle Ott

Analyzing Syntactic Structure

Up to now we have focused on the corpus as a whole and almost entirely ignored the structural details of each document. We claim that the sentence-level structure is very important when it comes to extracting the meaning of a text; the following two sentences are a clear example in which at least the order of the words is crucial:

Google bought YouTube.
YouTube bought Google.

For analyzing the structure of a sentence we will use constituent analysis, a hierarchical model introduced by Chomsky [1]. In this framework, sentences are modeled in the form of a parse tree (see Figure 1), where leaves are lexical items, or terminals, and internal nodes are constituent labels. The constituent labels form grammatical types from the lexical items, and lexical items form the sentence. For example, the first level of the tree in Figure 1 tells us that the analyzed sentence S is made out of a noun phrase NP and a verb phrase VP. To see this more clearly, let us introduce the following, more compact, notation:

[[Police]NP [put barricades around Upson]VP]S

The constituent labels that are found immediately above the leaves correspond to parts-of-speech and serve as types for the lexical items; in our example N, V, and Pr correspond to the “noun,” “verb,” and “preposition” parts-of-speech.

One should take note of the X-bar regularity, which refers to the fact that for any X-phrase, XP, there exists a descendant of type X that is the head of the XP.
In the example in Figure 1, the head of the first NP is the noun “Police” and the head of the VP is the verb “put.”

S
  NP
    N  Police
  VP
    V  put
    NP
      N  barricades
    PrP
      Pr  around
      NP
        N  Upson

Figure 1: A parse tree for the sentence “Police put barricades around Upson.” The constituent labels are: S=sentence, NP=noun phrase, VP=verb phrase, PrP=prepositional phrase, N=noun, Pr=preposition

It should also be mentioned that constituent analysis is just one of many types of hierarchical sentence-structure analysis. For example, one commonly used alternative to constituent analysis is dependency analysis ([6], [4]). In dependency analysis, relationships between words, rather than constituents, are labeled. Consequently, dependency analysis is insensitive to word ordering and is, therefore, often used for analyzing free-order languages. Figure 2 gives a dependency analysis analogous to the constituent analysis of Figure 1.

Figure 2: A dependency analysis for the sentence “Police put barricades around Upson.”

Context Free Grammars

One important aspect of constituent analysis is distinguishing between correct and incorrect analyses. However, several ambiguities can arise with respect to syntactic structure and, therefore, one sentence can have more than one valid constituent analysis. To illustrate this, consider the following sentence:

I saw her duck [with a telescope]PrP

The following ambiguities arise:
• PrP-attachment: the PrP “with a telescope” can modify any one of the three entities:
    I: I, using a telescope, saw her duck.
    her: I saw that she was holding a telescope while she ducked.
    duck: I saw a duck with a telescope, and the duck belonged to her.
• Part-of-speech (POS) ambiguity: the word “duck” can either be a noun or a verb.

It’s important to note that not all combinations of the above choices are possible. For example, given that the word “duck” is a noun, the PrP cannot modify “her.”

In addition to ambiguities in syntactic structure, there are also often semantic, or word sense, ambiguities.
For example, in the above sentence, the word “saw” could have different meanings. Namely, it could refer to the act of sawing something (using a telescope). Alternatively, in the context of a poker game, she might have bet a duck and I might be “seeing” her bet with my own bet (a telescope).

It would be desirable to have a method to specify all and only valid constituent analyses; to do so we will start with employing Context Free Grammars (CFGs). For constituent analysis, a particular CFG, G, is given by:
• a finite set of lexical items (also called terminals)
• a finite set of constituent types (also called non-terminals)
• a finite set of one-level decompositions (also called productions or rewrite rules)
• a distinguished root type, S, corresponding to the sentence

For example, the CFG implied by Figure 1 is given by:
• lexical items: {police, put, barricades, around, Upson}
• constituent types: {S, NP, VP, PrP, N, V, Pr}
• one-level decompositions:
    (1) S → NP VP
    (2) VP → V NP PrP
    (3) PrP → Pr NP
    (4) NP → N
    (5) N → police
    (6) N → barricades
    (7) N → Upson
    (8) V → put
    (9) Pr → around
• root type: S

A parse tree is valid with respect to G if and only if every branch is a decomposition given by G and all the leaves are lexical items. These grammars are called “context-free” because a decomposition of a constituent can be employed regardless of the context in which the constituent appears.

Problems with CFGs

In this section we look at some problems that arise from using a naïve CFG.

1) Proliferation of Categories

Consider the following three errors made by the CFG implied by Figure 1, which is given above. Notice that in each case, the most obvious solution is to create new categories to handle the erroneous cases. This is appropriately referred to as the “proliferation of categories” problem.

a. Selectional Error

Let us illustrate the error by way of example:

# [[Upson]N [put]V [barricades]N [around]Pr [police]N]S

While the above sentence is “semantically doubtful” (indicated by the leading ‘#’), our CFG would accept it. This is referred to as a “selectional error,” and refers to the need for semantic information in type labels. In this case, the “selectional error” is that Upson¹ is inanimate and, thus, cannot actively put anything around anything else.

b. Case Mismatch

Let us again illustrate by example:

∗ [[Police]N [put]V [they]N [around]Pr [Upson]N]S

This sentence is syntactically incorrect (indicated by the leading ‘∗’) because, while both “they” and “them” are nouns, the former is clearly incorrect in this sentence. The problem here is that “they” and “them” are different types of nouns, i.e. they are not interchangeable because they have different cases.² Clearly this must be accounted for in
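
As a minimal sketch (not part of the lecture itself), the CFG implied by Figure 1 and the validity condition for parse trees can be written down in a few lines of Python. The names PRODUCTIONS, LEXICAL_ITEMS, is_valid, noun_phrase, and sentence are hypothetical, as is the choice to encode trees as nested tuples. Requiring the root to be S is one reading of the “distinguished root type.” The sketch also shows that this grammar accepts the selectionally doubtful sentence from (a).

```python
# Illustrative sketch only: all names here are hypothetical, and trees are
# encoded as nested tuples (label, child, ...) with plain strings as leaves.

# One-level decompositions (1)-(9) of the CFG implied by Figure 1.
PRODUCTIONS = {
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP", "PrP")),
    ("PrP", ("Pr", "NP")),
    ("NP", ("N",)),
    ("N", ("police",)),
    ("N", ("barricades",)),
    ("N", ("Upson",)),
    ("V", ("put",)),
    ("Pr", ("around",)),
}
LEXICAL_ITEMS = {"police", "put", "barricades", "around", "Upson"}
ROOT = "S"


def _check(node):
    """Every branch must be a production; every leaf must be a lexical item."""
    if isinstance(node, str):  # a leaf
        return node in LEXICAL_ITEMS
    label, *children = node
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    return (label, child_labels) in PRODUCTIONS and all(_check(c) for c in children)


def is_valid(tree):
    """Validity with respect to G; requiring root S is an assumption of this sketch."""
    return isinstance(tree, tuple) and tree[0] == ROOT and _check(tree)


def noun_phrase(word):
    return ("NP", ("N", word))


def sentence(subject, obj, place):
    """Build a tree shaped like Figure 1 with different nouns at the leaves."""
    return ("S", noun_phrase(subject),
            ("VP", ("V", "put"), noun_phrase(obj),
             ("PrP", ("Pr", "around"), noun_phrase(place))))


figure1 = sentence("police", "barricades", "Upson")
doubtful = sentence("Upson", "barricades", "police")
# A VP without its PrP is not licensed by decomposition (2).
missing_prp = ("S", noun_phrase("police"),
               ("VP", ("V", "put"), noun_phrase("barricades")))

print(is_valid(figure1))      # True
print(is_valid(doubtful))     # True: the selectional error goes undetected
print(is_valid(missing_prp))  # False
```

Because the check looks only at each node's label and the labels of its children, swapping “police” and “Upson” cannot change the outcome. This is the context-free property at work, and it is why the grammar cannot rule out the doubtful sentence without additional categories.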

