Columbia COMS W4705 - Morphology

This preview shows page 1-2-3-24-25-26 out of 26 pages.

Outline: What is morphology?; English Inflectional Morphology; English Derivational Morphology; How do people represent words?; Parsing; Why parse words?; What do we need to build a morphological parser?; Morphotactic Models; Using FSAs to Represent the Lexicon and Do Morphological Recognition; Limitations; Parsing with Finite State Transducers; Finite State Transducers; FST for a 2-level Lexicon; FST for English Nominal Inflection; Orthographic Rules and FSTs; Summing Up; Homework 1; Word Classes

CS 4705
Lecture 3: Morphology

What is morphology?
• The study of how words are composed of morphemes (the smallest meaning-bearing units of a language)
  – Stems
  – Affixes (prefixes, suffixes, circumfixes, infixes)
    • Immaterial (prefix)
    • Trying (suffix)
    • Gesagt (circumfix)
    • Absobl**dylutely (infix)
  – Concatenative vs. non-concatenative (e.g. Arabic root-and-pattern)
• Multiple affixes
  – Unreadable
• Agglutinative languages (e.g. Turkish, Japanese) vs. inflectional languages (e.g. Latin, Russian) vs. analytic languages (e.g. Mandarin)

English Inflectional Morphology
• Word stem combines with grammatical morpheme
  – Usually produces word of same class
  – Usually serves a syntactic function (e.g. agreement)
    like → likes or liked
    bird → birds
• Nominal morphology
  – Plural forms
    • -s or -es
    • Irregular forms
    • Mass vs. count nouns (email or emails)
  – Possessives
• Verbal inflection
  – Main verbs (sleep, like, fear) are relatively regular
    • -s, -ing, -ed
    • And productive: emailed, instant-messaged, faxed, homered
    • But eat/ate/eaten, catch/caught/caught
  – Primary (be, have, do) and modal verbs (can, will, must) are often irregular and not productive
    • Be: am/is/are/were/was/been/being
  – Irregular verbs few (~250) but frequently occurring
  – English verbal inflection is much simpler than e.g.
    Latin

English Derivational Morphology
• Word stem combines with grammatical morpheme
  – Usually produces word of different class
  – More complicated than inflectional
• Example: nominalization
  – -ize verbs → -ation nouns
  – generalize, realize → generalization, realization
• Example: verbs, nouns → adjectives
  – embrace, pity → embraceable, pitiable
  – care, wit → careless, witless
• Example: adjective → adverb
  – happy → happily
• More complicated to model than inflection
  – Less productive: *science-less, *concern-less, *go-able, *sleep-able
  – Meanings of derived terms harder to predict by rule
    • clueless, careless, nerveless

How do people represent words?
• Hypotheses:
  – Full listing hypothesis: words listed
  – Minimum redundancy hypothesis: morphemes listed
• Experimental evidence:
  – Priming experiments (Does seeing/hearing one word facilitate recognition of another?) suggest neither
  – Regularly inflected forms prime their stems, but derived forms do not
  – But spoken derived words can prime stems if they are semantically close (e.g. government/govern but not department/depart)
• Speech errors suggest affixes must be represented separately in the mental lexicon
  – easy enoughly

Parsing
• Taking a surface input and identifying its components and underlying structure
• Morphological parsing: parsing a word into stem and affixes and identifying the parts and their relationships
  – Stem and features:
    • goose → goose +N +SG or goose +V
    • geese → goose +N +PL
    • gooses → goose +V +3SG
  – Bracketing: indecipherable → [in [[de [cipher]] able]]

Why parse words?
• For spell-checking
  – Is muncheble a legal word?
• To identify a word’s part-of-speech (pos)
  – For sentence parsing, for machine translation, …
• To identify a word’s stem
  – For information retrieval
• Why not just list all word forms in a lexicon?

What do we need to build a morphological parser?
• Lexicon: stems and affixes (w/ corresponding pos)
• Morphotactics of the language: model of how morphemes can be affixed to a stem
• Orthographic rules: spelling modifications that occur when affixation occurs
  – in → il in
    context of l (in- + legal)

Morphotactic Models
• English nominal inflection
  [FSA diagram: states q0, q1, q2; arcs labeled reg-n, irreg-sg-n, irreg-pl-n, plural (-s)]
• Inputs: cats, goose, geese
• Derivational morphology: adjective fragment
  [FSA diagram: states q0 to q5; arcs labeled un-, adj-root1, adj-root2, "-er, -ly, -est", "-er, -est"]
• Adj-root1: clear, happy, real
• Adj-root2: big, red

Using FSAs to Represent the Lexicon and Do Morphological Recognition
• Lexicon: We can expand each non-terminal in our NFSA into each stem in its class (e.g. adj_root2 = {big, red}) and expand each such stem to the letters it includes (e.g. red → r e d, big → b i g)
  [FSA diagram: states q0 to q7; arcs labeled un-, r, e, d, b, i, g, "-er, -est"]

Limitations
• To cover all of e.g. English will require very large FSAs with consequent search problems
  – Adding new items to the lexicon means recomputing the FSA
  – Non-determinism
• FSAs can only tell us whether a word is in the language or not. What if we want to know more?
  – What is the stem?
  – What are the affixes and what sort are they?
  – We used this information to build our FSA: can we get it back?

Parsing with Finite State Transducers
• cats → cat +N +PL
• Kimmo Koskenniemi’s two-level morphology
  – Words represented as correspondences between lexical level (the morphemes) and surface level (the orthographic word)
  – Morphological parsing: building mappings between the lexical and surface levels
    Lexical: c a t +N +PL
    Surface: c a t s

Finite State Transducers
• FSTs map between one set of symbols and another using an FSA whose alphabet Σ is composed of pairs of symbols from input and output alphabets
• In general, FSTs can be used for
  – Translator (Hello:Ciao)
  – Parser/generator (Hello:How may I help you?)
  – To map between the lexical and surface levels of Kimmo’s 2-level morphology
• FST is a 5-tuple consisting of
  – Q: set of states {q0,q1,q2,q3,q4}
  – Σ: an alphabet of complex symbols, each an i/o pair s.t.
    i ∈ I (an input alphabet) and o ∈ O (an output alphabet), and Σ ⊆ I × O
  – q0: a start state
  – F: a set of final states in Q {q4}
  – δ(q,i:o): a transition function mapping Q × Σ to Q
  – Emphatic Sheep → Quizzical Cow
    [FST diagram: states q0 to q4; arcs labeled b:m, a:o, a:o, a:o, !:?]

FST for a 2-level Lexicon
• E.g.
  Reg-n:      c a t
  Irreg-pl-n: g o:e o:e s e
  Irreg-sg-n: g o o s e
  [FST diagram: states q0 to q5; arcs labeled c, a, t, g, e:o, e:o, s, e]

FST for English Nominal Inflection
  [FST diagram: states q0 to q7; arcs labeled reg-n, irreg-n-sg, irreg-n-pl, +N:ε, +SG:-#, +PL:-s#, +PL:^s#]
• Combining (cascade or composition) this FSA with FSAs for each noun type replaces e.g. reg-n with every regular noun representation in the lexicon (cf. J&M p.76)

Orthographic Rules and FSTs
• Define additional FSTs to implement rules
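
The 5-tuple definition of an FST above can be made concrete with a small sketch of the "Emphatic Sheep → Quizzical Cow" transducer. The arc labels (b:m, a:o, !:?) and the states q0 to q4 come from the slide. The exact arc layout, in particular the a:o self-loop on q3, is an assumption modeled on the usual sheep-language automaton, and the names `DELTA` and `transduce` are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical encoding of the slide's 5-tuple (Q, Sigma, q0, F, delta).
Q: Set[str] = {"q0", "q1", "q2", "q3", "q4"}
START: str = "q0"
FINAL: Set[str] = {"q4"}

# delta maps (state, input symbol) -> (next state, output symbol).
# Each key/value pair encodes one i:o arc; the loop on q3 is an assumption.
DELTA: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("q0", "b"): ("q1", "m"),  # b:m
    ("q1", "a"): ("q2", "o"),  # a:o
    ("q2", "a"): ("q3", "o"),  # a:o
    ("q3", "a"): ("q3", "o"),  # a:o (self-loop, assumed)
    ("q3", "!"): ("q4", "?"),  # !:?
}


def transduce(word: str) -> Optional[str]:
    """Return the output string, or None if the input is rejected."""
    state = START
    output = []
    for symbol in word:
        step = DELTA.get((state, symbol))
        if step is None:
            return None  # no arc for this input symbol
        state, out_symbol = step
        output.append(out_symbol)
    # Accept only if the whole input was consumed in a final state.
    return "".join(output) if state in FINAL else None


if __name__ == "__main__":
    print(transduce("baaa!"))
```

Because this transducer is deterministic on its input side, a single pass over the input is enough. A nondeterministic FST would need search over alternative paths.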

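The surface-to-lexical mapping for English nominal inflection (cats → cat +N +PL, geese → goose +N +PL) can be illustrated with a minimal sketch. It uses plain dictionary and set lookups standing in for the composed transducer, not an actual FST, and it ignores orthographic rules such as e-insertion. The toy lexicon contents and the name `parse_noun` are assumptions for illustration only.

```python
from typing import Dict, List, Set

# Toy lexicon, split into the three noun classes named on the slides.
REG_N: Set[str] = {"cat", "dog"}
IRREG_SG_N: Set[str] = {"goose", "mouse"}
IRREG_PL_N: Dict[str, str] = {"geese": "goose", "mice": "mouse"}


def parse_noun(surface: str) -> List[str]:
    """Return every noun analysis (stem plus features) of a surface form."""
    analyses: List[str] = []
    if surface in REG_N:
        analyses.append(f"{surface} +N +SG")
    # Regular plural: strip a final -s and check the remaining stem.
    if surface.endswith("s") and surface[:-1] in REG_N:
        analyses.append(f"{surface[:-1]} +N +PL")
    if surface in IRREG_SG_N:
        analyses.append(f"{surface} +N +SG")
    if surface in IRREG_PL_N:
        analyses.append(f"{IRREG_PL_N[surface]} +N +PL")
    return analyses


if __name__ == "__main__":
    for word in ["cats", "goose", "geese", "gooses"]:
        print(word, parse_noun(word))
```

An empty list means the form has no noun analysis in this toy lexicon. For example, gooses is rejected here because the sketch covers only nouns, while the slides analyze it as a verb form.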
