This preview shows page 1-2-3-4-5-6 out of 17 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 17 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Walk-ing the walk, talk-ing the talk –My Fair Lady LectureProfessor Robert C. [email protected] Menu Bar• Administrivia: read the instructions before opening;Lab 2 due next Monday at 6 pm; Lab 3 releasedMonday evening.• My Fair Lady: “words, words, words, I’m so sick ofwords…”• A detailed walk-through of a 2-level example• What’s in Laboratory 3?• What can the two-level (finite-state) transducers do?• What can’t fst’s do? Complexity issues &representational issue: decentralize the method?!• The coup de…BambarraThe story so far…• We divide morpho-syntax, word parsing knowledge,into 2 parts: (1) Lexicon (roots+endings); and (2)spelling changes;• We further divided spelling changes into a (small) set• We implemented both as finite state transducers(FSTs), to capture (1) output ‘glosses’ in the case ofthe lexicon; and (2) lexical/surface pairings in thecase of spelling changes• We showed that FSTs don’t behave quite like FSAs• We implement their action together as a intersectedFST, all in the system demo’d last time (“kimmo”)Today…• We’ll trace through an example in detail, to show how bothcomponents work• We will see that both lexicon and spelling change rulesinvoke nondeterminism in essential (though slightly different)ways• We will see how spelling change rules must interact; showinghow multiple spelling changes go together• We will add 1 more bit about how FSTs aren’t like finite-stateautomata• We will show how to write a new spelling change rule• We will see how all this fits together in the next Lab• We’ll probe the computational complexity of this system andthe strengths & weaknesses of this way of parsing words6.863J/9.611J SP116.863J/9.611J SP11Morphology is finite-state: regular relationsOur implementation is a so-called two-levelsystemEach spelling change FST must pass all pairs of lexical,surface character pairs: thus, the FSTs are really acting asconstraint filters (they are failure driven)This is the intersection of all the FSTsSet of paralleltwo-level rules(constraints)compiled into finite-statemachinesinterpreted as transducers;one fst is a ‘letter tree’ forthe roots+affixesfst 2fst n...Lexical (‘upper’) formSurface (‘lower’) formFST1dict6.863J/9.611J SP11The “same length” constraint• So that FSTs are closed under intersectionF O X + 0 S # lexicalf o x 0 e s # surfaceS P Y 0 + S # lexicals p i e 0 s #The same-length constraint6.863J/9.611J SP11How are underlying/surface pairingsdone in linguistic theory?• Insert e after ‘sh’, ‘x’, etc: “epenthesis”• Statement of rule is actually quite complex:• Rewrite rule: x→ y | α _ β (Chomsky & Halle, 1968)• 0:e → [Csib (c h) (s h) y:i] +:0 _ s(transducer notation)Note that the re-write notation is very powerful (in general,an arbitrary Turing machine)So the problem always was, how to implement thisefficientlyThis was not solved until the 2-level method was inventedHow hairy can these rules be, after all?6.863J/9.611J SP11From underlying forms bubbling to thesurface (from Halle, 1960)…R1 dupe a+ked re+ked a+sin re+sinR2 s-to-z a+kked re+ked a+ssin re+zinR3 k-to-s a+ksed re+sed a+ssin re+zinR4 Vowel akseyd reseyd assayn reziyn Shift accede recede assign resignaccede recede assign resign6.863J/9.611J SP114 ordered rules - write out lexical:surfacepairs (L:S pairs)• Lexical: a+ked re+ked a+sin re+sinSurface: akseyd reseyd assayn rezaynPad out so both of equal length, also noting +:0correspondencea+ked re+ked a+sin re+sinaksed re0sed assin re0zin6.863J/9.611J SP11Extract contexts to find declarativeconstraintsa+ked re+ked a+sin re+sinaksed re0sed assin re0zinRule: +:k ó a:a __ k:s +:s ó a:a __ s:sRule: k:s ó e +:0 __ V | +:k __ V +:s ó a:a __ s:s6.863J/9.611J SP11Why don’t we need to look at thederivational steps now??• We can look at the lexical:surface substringssimultaneously• So, if a rule has applied (conversely, not applied),its effects should be visible via the jointlexical:surface pairs surrounding it in somelexical:surface pairing example (or else not visibleif the rule did not apply)• Otherwise, the rule must have been superfluous (ithas no visible effects on the relation between anylexical:surface pairs)6.863J/9.611J SP11Examples for ‘e’ insertion…Fox - foxes; church - churches; bus-busesWhat elses?Must look at non-examples as well… as - ases?What is the rule?In traditional ‘rewrite form’: ε → e / {x,s,z}+__ s#Now redo this in terms of (lexical, surface) pairs, whichtells us how to build the transducer:Lexical nothing (0) is paired with e, or 0:e in context of: x:x, +:0 ____ s:s, #:#6.863J/9.611J SP11Turning the data into a finite-statetransducer with pairings• Write down the left, center, and right contexts asdeclarative constraints• In this case: x:x +:0 0:e s:s #:#Csib:Csib• Pad out with nulls (0’s) [to obey the same lengthconstraint]• Write an FST that accepts exactly this string, and rejectseverything else (we want the FSTs to work basically asfilters)6.863J/9.611J SP11And acceptance (cook until done)Csib:Csib +:0 0:e s:s #:#1 2 3 4 5 6reject@reject@@:@@:@[email protected]/9.611J SP11Tabular format for FST - the state tableRules: Epenthesis c h s Csib + # 0 @ c h s Csib 0 # e @ 1: 2 1 4 3 1 1 0 1 2: 2 3 3 3 1 1 0 1 3: 2 1 3 3 5 1 0 1 4: 2 3 3 3 5 1 0 1 5: 2 1 2 2 1 1 6 1 6. 0 0 7 0 0 0 0 0 7. 0 0 0 0 0 1 0 0For states:0 = failure state; colon after state number: an accepting stateNote! We will have to change this table a bit…let’s see why..lexicalsurfaceStatesPairsWhat about…Spy vs. Spies?Here there are several spelling change rules…This example requires twos p y + 0 ss p i 0 e s0:e <=> Csib:| y:i _ s:sEpenthesis ruley:i <=> _ 0:e +:0s p y 0 + ss p i e 0 sy-i ruleThe epenthesis automaon must be ‘aware’ of whatthe y:i automaton does…and vice-versa6.863J/9.611J SP11More than 1 Spelling change ruleName Description ExampleConsonantDoubling(gemination, G)1-letter consonantdoubled before -ing/edbeg/beggingE deletion(elision, EL),Silent e dropped before -ing, -edmake/makingE insertion(epenthesis, EP)e added after -s, -z, -ch, -sh before -sfox/foxesY replacement(Y)-y changes to -ie before -edtry/triedI spelling (I) I goes to y before vowel lie/lyingThe Lexicon FST• What’s


View Full Document

MIT 6 863J - Lecture Notes

Documents in this Course
N-grams

N-grams

42 pages

Semantics

Semantics

75 pages

Semantics

Semantics

82 pages

Semantics

Semantics

64 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?