MIT 6 863J - Automata, Two-level phonology, & PC-Kimmo


Slide titles:
6.863J Natural Language Processing Lecture 2: Automata, Two-level phonology, & PC-Kimmo (the Hamlet lecture)
The Menu Bar
Levels of language
The “spiral notebook” Model
Start with words: they illustrate all the problems (and solutions) in NLP
Example from information retrieval
Morphology
What about other languages?
Slide 9
What about other processes?
OK, now how do we deal with this computationally?
Our goal: PC-Kimmo
Two parts to the “what”
KISS: A (very) large dictionary
Representing possible roots + affixes as a finite-state automaton
Now add in states to get possible combos, as well as features
English morphology: what states do we need for the fsa?
Will this fsa work?
Ans: no!
Revised picture
How does PC-Kimmo represent this?
PC-Kimmo states for affix combos (portion) = lexicon tree
Next: what about the spelling changes? That’s harder!
Mapping between surface form & underlying form
Conventional notation
Finite-state transducers: a pairing between lexical/surface strings
Definition of finite-state automaton (fsa)
Definition of finite-state transducer
Regular relations on strings
The difference between (familiar) fsa’s and fst’s: functions from…
Defining an fst for a spelling-change rule
Insert ‘e’ before non-initial z, s, x (“epenthesis”)
Successful pairing of foxes, fox+s
Now we combine the fst for the rules and the fsa for the lexicon by composition
So we’re done, no?
So, we’re done, right?
Simultaneous rules
The classical problem
Example from English (“gemination”)
What’s the difference?
But this is a problem…
When is this possible?
Schützenberger’s condition on closure of fst’s
Simultaneous read heads
The condition
Plus lexicon – lexical forms always constrained by the path we’re following through the lexicon tree
And that’s PC-Kimmo, folks… or “Two-level morphology”
Spelling change rules
How do we write these in PC-Kimmo?
PC-Kimmo 2-level Rules
Form & Semantics of 2-level Rules
a:b => l_r
Example: epenthesis
a:b <= l_r
Y-I spelling
a:b <=> l_r
Possessives with ‘s’
Example: Japanese past tense
a:b <= /l_r
Gemination (consonant doubling)
2-Level Rule Semantics: summary
Automata Notation (.rul file)
Shudder…
Limits?
Summary: what have we learned so far?
Lab 1: PC-kimmo warmup
An example – try it yourself
Outfoxed? Off to the races…
More to go…
And still more maze of twisty passages, all alike…
it’s going to try all the sublexicons w/ this bad guess..
Winding paths…
after 22 steps…
The End

6.863J Natural Language Processing
Lecture 2: Automata, Two-level phonology, & PC-Kimmo
(the Hamlet lecture)
Instructor: Robert C. [email protected]
6.863J/9.611J SP03 Lecture 2

The Menu Bar
• Administrivia
    web page: www.ai.mit.edu/courses/6.863/ now with Lecture 1, Lab1
    Questionnaire posted (did you email it?)
    Lab1: split into Lab1a (this time) Lab1b (next time)
• What and How: word processing, or computational morphology
• What’s in a word: morphology
• Modeling morpho-phonology by finite-state devices
• Finite-state automata vs. finite-state transducers
• Some examples from English
• PC-Kimmo & Laboratory 1: how-to

Levels of language
• Phonetics/phonology/morphology: what words (or subwords) are we dealing with?
• Syntax: What phrases are we dealing with? Which words modify one another?
• Semantics: What’s the literal meaning?
• Pragmatics: What should you conclude from the fact that I said something? How should you react?

The “spiral notebook” Model
[Figure: four levels of representation for one sentence]
‘surface’ form: the dogs ate ice-cream
‘sound’ form: dawgz…
‘phrase’ form: Sentence = Noun phrase (the dogz) + Verb phrase (Verb: ate; Noun Phrase: ice-cream)
‘logical’ form: x, x{dogs}, ate(x, i-c)

Start with words: they illustrate all the problems (and solutions) in NLP
• Parsing words
    Cats → CAT + N(oun) + PL(ural)
• Used in:
    • Traditional NLP applications
    • Finding word boundaries (e.g., Latin, Chinese)
    • Text to speech (boathouse)
    • Document retrieval (example next slide)
• In particular, the problems of parsing, ambiguity, and computational efficiency (as well as the problems of how people do it)

Example from information retrieval
• Keyword retrieval: marsupial or kangaroo or koala
• Trying to form equivalence classes - ending not important
• Can try to do this without extensive knowledge, but then:
    organization → organ
    European → Europe
    generalization → generic
    noise → noisy

Morphology
• Morphology is the study of how words are built up from smaller meaningful units called morphemes (morph = shape; logos = word)
• Easy in English – what about other languages?

What about other languages?
Present   Imperative   Imperf.   Future    Preterite  Present   Cond.      Imp.      Future
indic.                 indic.                         subjun.              subj.     subj.
amo       -            amaba     amaré     amé        ame       amaría     amara     amare
amas      ama, ames    amabas    amarás    amaste     ames      amarías    amaras    amares
ama       -            amaba     amará     amó        ame       amaría     amara     amare
amamos    -            -         amaremos  amamos     amemos    amaríamos  -         -
amáis     amad, amáis  amabais   -         -          -         -          amarais   amareis
aman      -            amaban    amarán    amaron     amen      amarían    amaran    amaren
How to love in Spanish… incomplete… you can finish it after Valentine’s Day…

What about other languages?

What about other processes?
• Stem: core meaning unit (morpheme) of a word
• Affixes: bits and pieces that combine with the stem to modify its meaning and grammatical functions
    Prefix: un-, anti-, etc.
    Suffix: -ity, -ation, etc.
    Infix: Tagalog: um + hingi → humingi (borrow)
    Any infixes in ‘nonexotic’ language like English?
    Here’s one: un-f******-believable

OK, now how do we deal with this computationally?
• What knowledge do we need?
• How is that knowledge put to use?
• What: duckling; beer (implies what K…?)
    chase + ed → chased (implies what K?)
    breakable + un → unbreakable (‘prefix’)
• How: a bit trickier, but clearly we are at least doing this kind of mapping…

Our goal: PC-Kimmo
[Figure: surface form f l i e s; rules; lexical form F L Y + S; lexicon]

Two parts to the “what”
1. Which units can glue to which others (roots and affixes) (or stems and affixes), e.g.,
2. What ‘spelling changes’ (orthographic changes) occur – like dropping the e in ‘chase + ed’
OK, let’s tackle these one at a time, but first consider a (losing) alternative…

KISS: A (very) large dictionary
1. Impractical: some languages associate a single
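
The two parts of the “what” can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the LEXICON table, the attach function, and the single e-dropping rule are hypothetical names and data made up for this sketch. It is not PC-Kimmo's lexicon or rule format, and it uses a plain lookup plus one string test rather than finite-state machinery.

```python
# Hypothetical toy lexicon: which suffixes may attach to which roots.
# These entries are illustrative assumptions, not PC-Kimmo's lexicon format.
LEXICON = {
    "chase": {"ed", "s"},
    "cat": {"s"},
    "walk": {"ed", "s"},
}


def attach(root: str, suffix: str) -> str:
    """Return the surface spelling of root + suffix, or raise if not licensed."""
    # Part 1: check that the lexicon allows this root/suffix combination.
    if suffix not in LEXICON.get(root, set()):
        raise ValueError(f"{root} + {suffix} is not licensed by the toy lexicon")
    # Part 2: one spelling change, dropping a final 'e' before a suffix
    # that starts with 'e' (chase + ed -> chased).
    if root.endswith("e") and suffix.startswith("e"):
        return root[:-1] + suffix
    return root + suffix


if __name__ == "__main__":
    for root, suffix in [("chase", "ed"), ("walk", "ed"), ("cat", "s")]:
        print(f"{root} + {suffix} -> {attach(root, suffix)}")
```

Here attach("chase", "ed") gives "chased" because the final e is dropped, attach("chase", "s") gives "chases" because the rule does not apply, and attach("cat", "ed") raises ValueError because the toy lexicon does not list that combination.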

