DOC PREVIEW
MIT 6 863J - Automata, Two-level phonology, & PC-Kimmo

This preview shows page 1-2-3-4-5-34-35-36-37-38-69-70-71-72-73 out of 73 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 73 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 73 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 73 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 73 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 73 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 73 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 73 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 73 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 73 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 73 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 73 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 73 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 73 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 73 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 73 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 73 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

6.863J Natural Language ProcessingLecture 2: Automata, Two-level phonology, & PC-Kimmo(the Hamlet lecture) Instructor: Robert C. Berwick6.863J/9.611J SP03 Lecture 2The Menu Bar• AdministriviaQuestionnaire posted (did you email it?)Lab1: split into Lab1a (this time) Lab1b (next time)• What and How: word processing, or computational morphology• What’s in a word: morphology• Modeling morpho-phonology by finite-state devices• Finite-state automata vs. finite state transducers• Some examples from English• PC-Kimmo & Laboratory 1:how-to6.863J/9.611J SP03 Lecture 2Levels of language• Phonetics/phonology/morphology: what words (or subwords) are we dealing with?• Syntax: What phrases are we dealing with? Which words modify one another?• Semantics: What’s the literal meaning?• Pragmatics: What should you conclude from the fact that I said something? How should you react?6.863J/9.611J SP03 Lecture 2The “spiral notebook” Modelthe dogs ate ice-creamθε dawgz…Sentence‘surface’formNoun phrase Verb phraseVerb Noun Phraseate ice-creamthe dogzλx, xε{dogs}, ate(x, i-c)‘sound’form‘phrase’form‘logical’form6.863J/9.611J SP03 Lecture 2Start with words: they illustrate all the problems (and solutions) in NLP• Parsing wordsCats → CAT + N(oun) + PL(ural)• Used in:• Traditional NLP applications• Finding word boundaries (e.g., Latin, Chinese)• Text to speech (boathouse)• Document retrieval (example next slide)• In particular, the problems of parsing, ambiguity,and computational efficiency (as well as the problems of how people do it)6.863J/9.611J SP03 Lecture 2Example from information retrieval• Keywork retrieval: marsupial or kangaroo or koala• Trying to form equivalence classes - ending not important• Can try to do this without extensive knowledge, but then:organization → organ European → Europegeneralization → generic noise → noisy6.863J/9.611J SP03 Lecture 2Morphology• Morphology is the study of how words are built up from smaller meaningful units called morphemes (morph= shape; logos=word)• Easy in English – what about other languages?6.863J/9.611J SP03 Lecture 2What about other languages?amarenamarainamaríanamenamaronamaránamambanamanamáisamareisamaraisamaríanosamemosamomosamremosamambaisamadamáisamamosamáremeamaraamaríaameamóamaráamambaamaamesamaresamarasamaríasamesamasteamarásamabasamaamasamareamaraamaríaameaméamaréamabaamoFutureSubj.Imp.Subj.CondPresentSubjunPreteriteFutureImperfIndic.ImperfPresent indicativeHow to love in Spanish…incomplete…you canfinish it after Valentine’s Day…6.863J/9.611J SP03 Lecture 2What about other languages?6.863J/9.611J SP03 Lecture 2What about other processes?• Stem: core meaning unit (morpheme) of a word• Affixes: bits and pieces that combine with the stem to modify its meaning and grammatical functionsPrefix: un- , anti-, etc.Suffix: -ity, -ation, etc.Infix:Tagalog: um+hinigi → humingi (borrow)Any infixes in ‘nonexotic’ language like English?Here’s one: un-f******-believable6.863J/9.611J SP03 Lecture 2OK, now how do we deal with this computationally?• What knowledge do we need?• How is that knowledge put to use?• What:duckling; beer (implies what K…?)chase + ed → chased (implies what K?)breakable + un →unbreakable (‘prefix’)• How: a bit trickier, but clearly we are at least doing this kind of mapping…6.863J/9.611J SP03 Lecture 2Our goal: PC-Kimmof lSurface formLexiconiseRulesF LY+ SLexical form6.863J/9.611J SP03 Lecture 2Two parts to the “what”1. Which units can glue to which others (roots and affixes) (or stems and affixes), eg, 2. What ‘spelling changes’ (orthographic changes) occur – like dropping the e in ‘chase + ed’ OK, let’s tackle these one at a time, but first consider a (losing) alternative…6.863J/9.611J SP03 Lecture 2KISS: A (very) large dictionary1. Impractical: some languages associate a single meaning w/ aSagan number of distinct surface forms (600 billion in Turkish)German: Leben+s+versichergun+gesellschaft+s+angestellter(life+CmpAug+insurance+CmpAug+company+CompAug+employee)Chinese compounding: about 3000 ‘words,’ combine to yield tens of thousands2. Speakers don’t represent words as a listWug test (Berko, 1958)Juvenate is rejected slower than pertoire (real prefix matters)6.863J/9.611J SP03 Lecture 2Representing possible roots + affixes as a finite-state automaton/usr/dict/wordsFSM17728 states, 37100 arcs2 sec25K words206K charsclearclevereareverfatfatherWordlistcompilerlc aeveethfaNetwork6.863J/9.611J SP03 Lecture 2Now add in states to get possible combos, as well as features+Adjr+Compb i geThis much is easy – a straightforward fsaStates = equivalence classeslfailaccept06.863J/9.611J SP03 Lecture 2English morphology: what states do we need for the fsa?• As an example, consider adjectivesBig, bigger, biggestCool, cooler, coolest, coollyRed, redder, reddestClear, clearer, clearest, clearly, unclear, unclearlyHappy, happier, happiest, happilyUnhappy, unhappier, unhappiest, unhappilyReal, unreal, silly6.863J/9.611J SP03 Lecture 2Will this fsa work?06.863J/9.611J SP03 Lecture 2Ans: no!• Accepts all adjectives above, but• Also accepts unbig, readly, realest• Common problem: overgeneration• Solution?6.863J/9.611J SP03 Lecture 2Revised picture6.863J/9.611J SP03 Lecture 2How does PC-Kimmo represent this?Here’s what the pc-kimmo fsalooks like – the fsa states are called ‘alternation classes’ or ‘lexicons’6.863J/9.611J SP03 Lecture 2PC-Kimmo states for affix combos (portion) = lexicon treeBegin (Initial)N_root Adj_prefix V_prefix(at start of file english.lex)N_root2N_root1N_suffix GenitiveNumberENDENDENDENDENDAdj_root6.863J/9.611J SP03 Lecture 2Next: what about the spelling changes? That’s harder!ü Which units can glue to which others (roots and affixes) (or stems and affixes) 2. What ‘spelling changes’ (orthographic changes) occur – like dropping the e in ‘chase + ed’6.863J/9.611J SP03 Lecture 2Mapping between surface form & underlying formc h a s e dc h a s e + e dSurface:Underlying:But clearly this can go either way – given the underlying form, we can generate the surface form – so we reallyhave a relation betw. surface & underlying form, viz.:6.863J/9.611J SP03 Lecture 2Conventional notationLexical (underlying) form: c h a s e + e dSurface form: c h a s 0 0 e dThe 0’s “line up” the lexical & surface


View Full Document

MIT 6 863J - Automata, Two-level phonology, & PC-Kimmo

Documents in this Course
N-grams

N-grams

42 pages

Semantics

Semantics

75 pages

Semantics

Semantics

82 pages

Semantics

Semantics

64 pages

Load more
Download Automata, Two-level phonology, & PC-Kimmo
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Automata, Two-level phonology, & PC-Kimmo and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Automata, Two-level phonology, & PC-Kimmo 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?