DOC PREVIEW
UMass Amherst LINGUIST 726 - LINGUIST 726 Lecture Notes

This preview shows page 1-2 out of 6 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Ling 726: Mathematical Linguistics, Lec 13 (revised Nov 4): Are NLs Finite-state? V. Borschev and B. Partee, November 2, 2004 p. 1 Lecture 13. Are Natural Languages finite-state languages? (and other questions) 0. Review.......................................................................................................................................................................1 1. Inadequacy of Type 3 grammars for natural languages: Classic examples ...............................................................2 2. Issues raised in Hauser and Fitch’s work...................................................................................................................4 Reading: Section 17.3 of PtMW: Regular Languages. pp. 471-480. Especially in connection with Marc Hauser’s work: (See links in WHISC of September 23: http://people.umass.edu/potts/whisc/whisc-2004-9-23.html#hauser .) • Marc D. Hauser, Noam Chomsky, and W. Tecumseh Fitch. 2002. The faculty of language: What is it, who has it, and how did it evolve? Science 298(5598):1569-1579, 22 November. • Thomas Bever and Mario Montalbetti. 2002. Noam's ark. Science 298(5598):1565-1566, 22 November. • Mark Liberman. September 3, 2003. Update on Fitch & Hauser. Linguist List 15.2450. • W. Tecumseh Fitch and Mark D. Hauser. 2004. Computational constraints on syntactic processing in a nonhuman primate. Science Magazine 303(5656):377-380, 16 January. • Pierre Perruchet and Arnaud Rey. 2004. Does the mastery of center-embedded linguistic structures distinguish humans from nonhuman primates? To appear in the Psychonomic Bulletin and Review. • Mark Liberman. January 16, 2004. Language in humans and monkeys. Language Log. • Mark Liberman. January 16, 2004. Hi Lo Hi Lo, it's off to formal language theory we go. Language Log. • Mark Liberman. August 31, 2004. Humans context-free, monkeys finite-state? Apparently not. Language Log. • Greg Kochanski. 2004. Is a phrase structure grammar the important difference between humans and monkeys? A comment on 'Computational constraints on syntactic processing in a nonhuman primate'. • Ray Jackendoff and Steven Pinker. In press. The faculty of language: What's special about it? Cognition. 0. Review. First, let’s do on the board three finite-state automata that will be relevant to the ensuing discussion. 1. (ab)n, i.e. {(ab)n| n>0} (This is the language the tamarin monkeys reportedly learned.) 2. {ab, aabb, aaabbb, aaaabbbb} (This is a finite sublanguage of the non-finite-state language anbn. ) 3. aA*a ∪ bA*b , where A = {a,b}, i.e. the set of all strings of a’s and b’s of length ≥ 2 which begin and end with the same symbol. (This is a finite-state language with some long-distance dependency, but no center-embedding, showing that we have to separate those issues.)Ling 726: Mathematical Linguistics, Lec 13 (revised Nov 4): Are NLs Finite-state? V. Borschev and B. Partee, November 2, 2004 p. 2 Second, let’s recall the closure properties of the set of finite-state languages (= regular languages, = languages which can be generated by a type 3 grammar, alias left-linear or right-linear grammar.) These were discussed in Lecture 12; they can all be shown by construction algorithms on finite-state automata. Assume a fixed alphabet A. So all the languages to be considered are subsets of A*. Union: If X, Y are regular languages, then so is X ∪ Y. Concatenation: If X, B are regular languages, then so is XB. Kleene star: If X is a regular language, then so is X*. Complementation: If X is a regular language, then A* - X is a regular language. Intersection: If X, Y are regular languages, then so is X ∩ Y. And given that the empty language ∅ and the universal language A* are regular languages (accepted by 1-state finite state automata, the simplest ones possible), we can conclude that the class of regular languages over any fixed alphabet is a Boolean algebra. 1. Inadequacy of Type 3 grammars for natural languages: Classic examples Is English a regular language? No. The typical proof (as in PtMW, 478-9) uses closure under intersection plus the pumping lemma (pumping theorem). First let’s practice using the pumping lemma some more to show that some languages on the alphabet {a,b} are not regular languages. Example 1 (repeated from Lecture 12): {anbn | n > 0 } : the set of all strings of n a’s followed by n b’s. Good strings: ab, aabb, aaabbb, aaaabbbb, … . Bad strings: aa, aabbb, abab, bbaa, … Recall what the pumping lemma says: (= theorem 17.2 in PtMW) Theorem 17.2 If L is an infinite fal over alphabet A, then there are strings x,y,z ∈ A* such that y ≠ e and xynz ∈ L for all n ≥ 0. For example 1, the anbn languages, we argued that there can be no such string y, no matter what we take for x and z. We argued that there are only three possibilities for y: either it’s some string of a’s, or some string of b’s, or some string of a’s followed by some string of b’s. And then it’s immediately obvious that if you take some grammatical string containing such a string y and then allow that string y to ‘loop’, i.e. to repeat 2 or more times, the resulting string will NOT be of the form anbn. Example 2. {x{xR| x ∈ {a,b}*}: the set of all strings of a’s and b’s in which the second half of the string is a mirror image of the first half. All the strings are of even length. The empty string isLing 726: Mathematical Linguistics, Lec 13 (revised Nov 4): Are NLs Finite-state? V. Borschev and B. Partee, November 2, 2004 p. 3 included in this ‘mirror-image language’. Good strings: e, aa, bb, abba, aaaa, abbbba, ababbaba, … . Bad strings: a, b, aba, aabb, ababa, bbba, abbba, aaaaa, abbaba, …. In this case it’s not so easy to apply the pumping lemma directly, but we can employ a common strategy, namely first to intersect this language with a known regular language to give a language to which the pumping lemma can be applied straightforwardly. Why is this legitimate? Because we know (lecture 12; see PtMW 475-77) that the intersection of any two regular languages is regular. So let’s intersect the mirror-image language {x{xR| x ∈ {a,b}*} with the regular language aa*bbaa* (i.e. with the set of all strings containing one or more a’s followed by exactly two b’s followed by one or more a’s.) The intersection gives the set {anbban | n > 0 }. And to this set it’s


View Full Document

UMass Amherst LINGUIST 726 - LINGUIST 726 Lecture Notes

Download LINGUIST 726 Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view LINGUIST 726 Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view LINGUIST 726 Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?