Unformatted text preview:

1Three Models for the Description of LanguageAlaa Kharbouch and Zahi KaramI. INTRODUCTIONThe grammar of a language is a device that describesthe structure of that language. The grammar is comprisedof a set of rules whose goal is twofold: first theserules can be used to create sentences of the associatedlanguage and only these, and second they can be usedto classify whether a given sentence is an element of thelanguage or not. The goal of a linguist is to discovergrammars that are simple and yet are able to fully spanthe language. In [1] Chomsky describes three possibleoptions of increasing complexity for English grammars:Finite-state, Phrase Structure and Transformational. Thispaper briefly present these three grammars and summa-rizes Chomsky’s analysis and results which state thatfinite-state grammars are inadequate because they fail tospan all possible sentences of the English language, andphrase structure grammar is overly complex.A. Definition of Basic StructuresLanguage, as defined in [1] is a finite or infinite set ofsentences each of finite length constructed from a finitealphabet, A, of symbols. A string in A is defined as aconcatenation of the symbols of A. Therefore, a grammarof a language is some sort of device that produces allthe strings that are sentences in that language and onlythese.II. FINITE STATE MARKOV PROCESSESThe first grammar that Chomsky examines is thefinite-state grammar, which is defined as one that consistsof a finite number of states (Si) with transition symbols,aij, and a set C={(Si, Sj)} of connected states. As thisgrammar evolves from state to state it produces stringsof concatenated symbols aijthat form all the sentencesof a finite-state language LG.The symbols aijmay be chosen to be phonemes, mor-phemes or words. Morphemes are defined as the small-est grammatically functioning elements of a language,e.g. ”boy”, ”s” in ”books”. Typically morphemes orwords are used as they simplify the grammar, and theneach word and morpheme is replaced by its phonemicspelling.To analyze the types of languages that that finite-stategrammars can handle, we must first define a dependencyand a dependency set:Consider the following sentence of the language Lformed by the concatenation of the symbols aiof itsalphabet (Note that _ represents a string concatenation)S = a1_ ... _ an.S is said to have an (i,j)-dependency with respect to Lif and only if replacing the symbol aiwith a symbolbi(ai6= bi) in S requires that the symbol aj(i < j) bereplaced with bjso that the sentence S is still an elementof the language L.A dependency set of S in L,D = {(α1, β1), ..., (αm, βm)},contains dependencies that are distinct in both terms andeach first element (αi) in S precedes all second elements(βi).A sentence S with an m-term dependency requires a 2mstate finite-state grammar. For a language L to be bea finite-state language we therefore require that thereis a finite m such that all valid sentences in L havedependency sets of at most size m. This requirementallows us to decide whether a given language can bedescribed by a finite state grammar or not. The languagesL1, L2, and L3described below are three examples ofnonfinite-state languages:• L1consists of sentences formed from the con-catenation of m occurrences of a followed by moccurrences of b.L1: a _ ...mtimes_ a _ b...mtimes_ b• L2consists of sentences where the second half isjust the mirror image of the first half.L2: a _ b _ b _ a _ a _ b _ b _ a• L3consists of sentences whose second half is justa copy of the first half.L3: a _ b _ b _ a _ a _ b _ b _ aThese are examples of nonfinite-state languages becausefor any fixed m we can create a valid sentence ofthat language of length 2m + 2 that will have m + 1dependencies.Now that we have a criterion to determine if a given2language can be fully represented by a finite-state gram-mar, we examine whether English satisfies that criterion.Sentences of the English language can contain an infinitenumber of embedded declarative sentences such as thoseof the form:“if ..., then ...” or “either ..., or ...”,and can have infinitely many subordinate clauses of theform:“... since ...” or “... which ...”.Each of the declarative sentences and the subordinateclauses describe a dependency, therefore we cannotbound the number of dependencies in sentences of theEnglish language.Another finite-state grammar that could be used forthe English language is the nthorder approximation.This grammar decides the probability of a given wordoccurring conditioned on the n − 1 previous words thatwere observed. This method is attractive since as nincreases the sentences created by this grammar startto resemble proper English sentences. However, thisnthorder approximation to English fails to meet therequirement that the grammar be able to distinguishbetween grammatically correct and incorrect sentences,this can best be seen by the following example presentedby Chomsky:colorless green ideas sleep furiouslyfuriously sleep ideas green colorlessIn both sentences the likelihood that any of the wordsfollows the other is almost nil. Therefore, this grammarwould treat both sentences in the same manner declaringboth as ungrammatical, and yet the first sentence isgrammatically correct while the second is not. Thissection leads to the conclusion that English is not a finite-state language.III. PHRASE STRUCTUREIn “immediate constituent analysis”, the words in asentence are initially collected as a part of phrases, andthe components of each phrase may again be groupedinto its constituent phrases, and this is repeated until thesmallest constituents, such as words or morphemes, arereached. This kind of description offers us the advantageof simplification, as each phrase or constituent can begiven a label such as “noun phrase (NP)” or “verbphrase (VP)”, and can encapsulate a complex class ofexpressions. An example of this kind of analysis onthe sentence: ”the man took the book” is shown inthe diagram in figure 1. We can ask what kind ofspecification for a grammar would correspond to thisview of language or analysis, and this brings us toPhrase-Structure grammars.the man took the book NP Verb NP VP Sentence Fig. 1. Constituent analysis exampleA phrase-structure grammar consists of a vocabularyor alphabet Vp, a finite set of initial stringsP, and a setof rules F . These rules specify an instruction to replaceXiin Vpwith another string Yialso in the vocabulary,by changing


View Full Document

MIT 6 441 - Study Guide

Download Study Guide
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Study Guide and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Study Guide 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?