This preview shows page 1-2-3-4-25-26-27-51-52-53-54 out of 54 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

6.863J Natural Language ProcessingLecture 5: What’s my line?Instructor: Robert C. [email protected]/9.611J SP04 Lecture 5The Menu Bar• AdministriviaLecture 3 posted; Lab 1a (aka “component II”) due yesterday; Lab 1b, due next Monday• Postmortem: Complexity of Kimmo/fst’s – too weak? Too strong? What makes a good computational linguistics representation? A good linguistic representation? A good algorithm?• Alternatives: morphology w/o a dictionary• What’s my line: take a chance6.863J/9.611J SP04 Lecture 5Is Kimmo necessary?• Does it explain why many non-human systems never occur (ruling them out)• Or does it overshoot?• Ans: it seems to overshoot, in at least 2 ways• Overshoots detected by computationalanalysis6.863J/9.611J SP04 Lecture 5Overshoot #1: too powerful with dependencies• More powerful than well-known grammars in linguistics (and computational linguistics)• We can use kimmo to ‘count’ – but natural languages don’t (or cannot) do this…• (Recall: we can use Kimmo to output a language with one counting relation: anbn– not a finite-state language)• But we can do more… nothing stops us from producing a language with mcounting relations, e.g,for any n, {(x, (cx)n) | x ∈ {a* b* }}, e.g., for n=3,cababcababcabab, cbbbcbbbcbbb…6.863J/9.611J SP04 Lecture 5Not captured by context-free language• (Familiar): anbncn• Intuition: use of pushdown stack – can catch onesuch pairing, but not more6.863J/9.611J SP04 Lecture 5So: Kimmo admits more than context-free languages!• So Kimmo is more powerful than this!But how powerful is it? We can still parse context-free languages in cubic time (in the length of sentences)• We shall see that Kimmo is more complexthan this!• Conjecture: all the context-sensitive languages6.863J/9.611J SP04 Lecture 5Complexity of Kimmo word recognition • All these finite-state devices, working in parallel• There is backup• Is it intrinsic to the system? Or eradicable? Or, doesn’t matter in practice?6.863J/9.611J SP04 Lecture 5Litmus test #2 – computational complexity of Kimmo – word parsing is intractable!• Kimmo Recognition Problem (KRP):Given a language defined by an arbitrary (finite) Kimmo dictionary (lexical automata) and a finite set of Kimmo rules, how long in the worst case will it take to recognize whether a form is or is not in the language?• Kimmo recognition problem is NP-hard• As hard as any other problem solvable by a nondeterminstic Turing machine in polynomial time• No known det polytime (eg, cubic) algorithm for NP-hard problems…6.863J/9.611J SP04 Lecture 5Complexity hierarchyExp-timePspace (CSL recog,intersection fsa’s,NP (traveling sales3-SAT)P (CFL recog, fsa)6.863J/9.611J SP04 Lecture 5Parsing words with Kimmo is computationally intractable• Intuition: what if the characters on the surface don’t give any clues as to what ‘features’ they ought to have underlyingly? (e.g., whether a Noun or a Verb, as in police police police)• This seems awfully close to the famous 3-SAT problem: is there an assignment of T(rue), F(alse) to the literals of an arbitrary Boolean formula in 3-conjunctive normal form s.t. the formula evaluates to true?• In fact, we can simulate this problem using Kimmo6.863J/9.611J SP04 Lecture 53-Sat (3-satisfiability) is NP-complete•Given(arb) 3-Sat formula, e.g.,• There is no known deterministic Turing machine that can figure out quickly (in polynomial time) whether there is an assignment of true or false to literals x,y, z in order to make the formula evaluates to true justby inspecting the local surface string• We could guess this in polynomial time – i.e., Nondeterministic Polynomial, or NP time (time measured in length of the formula)()()()xyz yqp xqz∨∨ ∧ ∨∨ ∧ ∨∨6.863J/9.611J SP04 Lecture 5Reduction of 3-Sat to Kimmorecognition problem• For every 3-Sat problem, we can find, in polynomial time, a corresponding Kimmo word recognition problem where there’s a valid word if the 3-Sat problem was satisfiable• If Kimmo recognition could be done in deterministic polynomial time (P) then so could 3-SAT6.863J/9.611J SP04 Lecture 5ReductionAny 3-Sat problemEquivalentKimmo recognition problemAnswer to original SATproblemEfficient (polynomialtime) transformation6.863J/9.611J SP04 Lecture 5The reduction:Given: arbitrary 3-SAT problem instance, e.g.,If we could solve Kimmo recognition easily,Then we could solve 3-Sat easily(fixed) Lexicon, LFst’s, 1per variableFast(polytime)transformationword∈L if Sat instance satisfiable(x v ¬y v z) (¬x v ¬z) (x v y)6.863J/9.611J SP04 Lecture 5Two components to 3-Sat• The fact that an x that has a truth assignment in one place, must have the same truth assignment everywhere - what morphological process is that like?• The fact that every triple must have at least 1 ‘T’ underlyingly (so that the triple is true) -what morphological process is that like? 6.863J/9.611J SP04 Lecture 5How the reduction works• Given arbitrary 3-sat formula φ, e.g.,(x v ¬y v z) (¬x v ¬z) (x v y)• Represent in the form, a ‘word’:x-yz,-xz,xy• For each variable x, we have an ‘assignment machine’ that ensures that x is mapped to T or F throughout the whole formula• We have one machine (and a fixed dictionary) to checks each disjunction to make sure that at least one disjunct is true in every conjunct6.863J/9.611J SP04 Lecture 5Two components• Agreement: vowel harmony (if round at some point, round everywhere)• Ambiguity: we can’t tell what the underlying value of xis from the surface, but if there’s at least one “t” per ‘part of word’, then we can spell out this constraint in dictionary• Note that words (like Sat formulas) must be arbitrarily long… (pas de probleme)• Dictionary is fixed…• # of Vowel harmony processes corresponds to # of distinct literals6.863J/9.611J SP04 Lecture 5Reduce until done: assignment consistency6.863J/9.611J SP04 Lecture 5Reduce until done – formula must eval to true6.863J/9.611J SP04 Lecture 5What are the implications?• FTNs inherently require backup if simulated (in the worst case) – Kimmo at least NP-hard (proof later on)• Empty elements cause computational complexity (unless restricted – equal length condition) – true in allareas of linguistics• Composition can save us, but then rule ordering must be watched carefully6.863J/9.611J SP04 Lecture 5Implications• Do we need a


View Full Document

MIT 6 863J - Lecture Notes

Documents in this Course
N-grams

N-grams

42 pages

Semantics

Semantics

75 pages

Semantics

Semantics

82 pages

Semantics

Semantics

64 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?