MSU CSE 842 - Lecture2-Morphology

1/12/2011 CSE842, Spring 2011, MSU

CSE 842 Natural Language Processing
Lecture 2: Morphology

What is morphology?
• The study of how words are composed of morphemes (the smallest meaning-bearing units of a language)
• Two broad classes of morphemes:
– Stems: the “main” morpheme of the word, supplying its meaning
– Affixes: bits and pieces that combine with stems to modify their meanings and grammatical functions (prefixes, suffixes, circumfixes, infixes)
• Examples: impossible, enjoying
• Multiple affixes
– unreachable, unbelievable
• English words rarely have more than four or five affixes.
– Turkish words can have nine or ten affixes: Turkish is an agglutinative language.

Ways to form words
• Inflection: new forms of the same word (usually in the same class)
– Tense, number, mood, and voice marking in verbs
– Number and gender marking in nominals
– Comparison of adjectives
• Derivation: new words, usually of a different class
– Deverbal nominals
– Denominal adjectives and verbs
• Compounding: new words built from two or more other words
– Noun-noun compounding (e.g., doghouse)
• Cliticization: combining a word with a clitic, which acts syntactically like a word but appears in a reduced form (e.g., I’ve)

English Inflectional Morphology
• Word stem combines with a grammatical morpheme
– Usually produces a word of the same class
– Usually serves a grammatical role that the stem could not (e.g., agreement)
like → likes or liked
bird → birds
• Nouns have a simple inflectional morphology: markers for plural and markers for possessive
• Verbs are slightly more complex

Nominal Inflection
• Nominal morphology
– Plural forms
• -s or -es
• Irregular forms, e.g., goose/geese, mouse/mice
– Possessives
• children’s

Verbal Inflection
• Main verbs (walk, like) are relatively regular
– -s, -ing, -ed
– And productive: emailed, instant-messaged, faxed
– But eat/ate/eaten, catch/caught/caught
• Primary verbs (be, have, do) and modal verbs (can, will, must) are often irregular and not productive
– Be: am/is/are/were/was/been/being
• Irregular verbs are few (~250) but occur frequently
• English verbal inflection is much simpler than that of, e.g., Latin

Regular and Irregular Verbs

Morphological Form Classes    Regularly Inflected Verbs
Stem                          merge    try      map
-s form                       merges   tries    maps
-ing participle               merging  trying   mapping
Past form or -ed participle   merged   tried    mapped

Morphological Form Classes    Irregularly Inflected Verbs
Stem                          eat      catch    cut
-s form                       eats     catches  cuts
-ing participle               eating   catching cutting
Past form                     ate      caught   cut
-ed participle                eaten    caught   cut

English Derivational Morphology
• Word stem combines with a grammatical morpheme
– Usually produces a word of a different class
– More complicated than inflectional morphology
• Example: nominalization
– -ize verbs → -ation nouns
– generalize, realize → generalization, realization
• Example: verbs, nouns → adjectives
– embrace, pity → embraceable, pitiable
– care, wit → careless, witless
• Example: adjective → adverb
– happy → happily
• More complicated to model than inflection
– Less productive: *science-less, *concern-less, *go-able, *sleep-able
– Meanings of derived terms are harder to predict by rule

Morphological Parsing
• Taking a surface input and identifying its components and underlying structure
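The regular rows of the verb table follow a few spelling rules (silent-e deletion in merge → merging, y → ie/i in try → tries, tried, consonant doubling in map → mapping), while irregular verbs have to be listed. The block below is a minimal sketch under simplified assumptions: the consonant-doubling test is a crude consonant-vowel-consonant check that ignores stress (it would wrongly give *visitting), and `IRREGULAR`, `inflect`, and the helper functions are hypothetical names, with the exception table built only from the slide's examples.

```python
# Sketch: regular English verb inflection with simplified spelling rules.
# Irregular verbs are looked up in a hypothetical exception table.

VOWELS = set("aeiou")
NO_DOUBLE = VOWELS | set("wxy")  # final letters that are never doubled

IRREGULAR = {
    # stem: (-s form, -ing participle, past form, -ed participle)
    "eat": ("eats", "eating", "ate", "eaten"),
    "catch": ("catches", "catching", "caught", "caught"),
    "cut": ("cuts", "cutting", "cut", "cut"),
}


def consonant_y(stem: str) -> bool:
    """True for stems like 'try' whose final y follows a consonant."""
    return len(stem) > 1 and stem[-1] == "y" and stem[-2] not in VOWELS


def doubles_final_consonant(stem: str) -> bool:
    """Crude consonant-vowel-consonant test (map -> mapping); ignores stress."""
    return (
        len(stem) >= 3
        and stem[-1] not in NO_DOUBLE
        and stem[-2] in VOWELS
        and stem[-3] not in VOWELS
    )


def inflect(stem: str) -> tuple[str, str, str, str]:
    """Return (-s form, -ing participle, past form, -ed participle)."""
    if stem in IRREGULAR:
        return IRREGULAR[stem]

    # -s form: -es after sibilants, y -> ies after a consonant, else -s.
    if stem.endswith(("s", "sh", "ch", "x", "z")):
        s_form = stem + "es"
    elif consonant_y(stem):
        s_form = stem[:-1] + "ies"
    else:
        s_form = stem + "s"

    # -ing and -ed forms.
    if stem.endswith("ee"):               # agree -> agreeing, agreed
        ing, ed = stem + "ing", stem + "d"
    elif stem.endswith("e"):              # merge -> merging, merged (silent e)
        ing, ed = stem[:-1] + "ing", stem + "d"
    elif doubles_final_consonant(stem):   # map -> mapping, mapped
        ing, ed = stem + stem[-1] + "ing", stem + stem[-1] + "ed"
    elif consonant_y(stem):               # try -> trying, tried
        ing, ed = stem + "ing", stem[:-1] + "ied"
    else:                                 # walk -> walking, walked
        ing, ed = stem + "ing", stem + "ed"

    # Regular verbs share one form for the past and the -ed participle.
    return s_form, ing, ed, ed


for verb in ["merge", "try", "map", "eat", "catch", "cut"]:
    print(verb, *inflect(verb))
```

For example, `inflect("try")` returns `('tries', 'trying', 'tried', 'tried')`. Keeping irregulars in a separate table mirrors the slide's point: the few irregular verbs are frequent enough that listing them is cheaper than writing rules for them.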
• Morphological parsing: parsing a word into stem and affixes and identifying the parts and their relationships
– Stem and features:
• goose → goose +N +SG or goose +V
• geese → goose +N +PL
• gooses → goose +V +3SG

Why parse words?
• For spell-checking
– Is muncheble a legal word?
• To identify a word’s part of speech (POS)
– For sentence parsing, machine translation, …
• To identify a word’s stem
– For information retrieval
• Why not just list all word forms in a lexicon?

What do we need to build a morphological parser?
• Lexicon: stems and affixes (with their corresponding POS)
• Morphotactics of the language: a model of how morphemes can be affixed to a stem
– E.g., in English, the plural morpheme follows the noun rather than preceding it.
• Orthographic rules: spelling modifications that occur when affixation occurs
– y → ie (e.g., city → cities)

Morphotactic Models
• English nominal inflection
[Figure: FSA with states q0, q1, q2 and arcs labeled reg-n, irreg-sg-n, irreg-pl-n, and plural (-s)]
• Inputs: cats, goose, geese

• Derivational morphology: adjective fragment
[Figure: FSA with states q0, q1, q2, q5 and arcs labeled un-, ε, adj-root, and -er, -ly, -est]
What kind of adjectives will this FSA recognize/generate?

• Derivational morphology: adjective fragment
[Figure: revised FSA with states q0 to q5 and arcs labeled un-, ε, adj-root1, adj-root2, -er, -ly, -est, and -er, -est]
• Adj-root1: clear, happy
• Adj-root2: big, red

Using FSAs to Represent the Lexicon and Do Morphological Recognition
• Lexicon: we can expand each non-terminal in our NFSA into each stem in its class (e.g., adj_root2 = {big, red}) and expand each such stem into the letters it includes (e.g., red → r e d, big → b i g)
[Figure: letter-level FSA with states q0 to q7 spelling out r-e-d and b-i-g, followed by arcs for -er, -est and an ε arc]

Limitations
• Covering all of English, for example, would require very large FSAs, with consequent search problems
– Adding new items to the lexicon means recomputing the FSA
– Non-determinism
• FSAs can only tell us whether a word is in the language or not. What if we want to know more?
– What is the stem?
– What are the affixes, and what sort are they?
– We used this information to build our FSA: can we get it back?

Parsing with Finite State Transducers
• cats → cat +N +PL
• Kimmo Koskenniemi’s two-level morphology
– Words are represented as correspondences between the lexical level (the morphemes) and the surface level (the orthographic word)
– Morphological parsing: building mappings between the lexical and surface levels
Lexical:  c a t +N +PL
Surface:  c a t s

Finite State Transducers
• FSTs map between one set of symbols and another
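The English nominal-inflection FSA from the Morphotactic Models slide can be sketched as a small nondeterministic automaton whose arcs each consume a whole morpheme. This is a minimal sketch under stated assumptions: the figure is not fully recoverable, so the arc layout here is the usual textbook one (q0 -reg-n-> q1 -plural-> q2, plus irreg-sg-n and irreg-pl-n arcs from q0 to q2) with q1 and q2 taken as accepting states, and the word lists in the hypothetical `CLASSES` table are illustrative only.

```python
# Sketch: the English nominal-inflection FSA as an NFSA whose arcs
# consume whole morphemes. Arc layout and word lists are assumptions.

ARCS = [
    ("q0", "reg-n", "q1"),
    ("q1", "plural", "q2"),
    ("q0", "irreg-sg-n", "q2"),
    ("q0", "irreg-pl-n", "q2"),
]

# Hypothetical mini-lexicon for each arc label.
CLASSES = {
    "reg-n": {"cat", "dog", "bird"},
    "plural": {"s"},  # only -s, as on the slide; -es is not handled
    "irreg-sg-n": {"goose", "mouse"},
    "irreg-pl-n": {"geese", "mice"},
}

START = "q0"
FINAL = {"q1", "q2"}


def accepts_noun(word: str) -> bool:
    """Depth-first search over (state, position) configurations."""
    stack = [(START, 0)]
    seen = set()
    while stack:
        state, pos = stack.pop()
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        if pos == len(word) and state in FINAL:
            return True
        # Follow every arc leaving this state whose morpheme matches here.
        for src, label, dst in ARCS:
            if src != state:
                continue
            for morpheme in CLASSES[label]:
                if word.startswith(morpheme, pos):
                    stack.append((dst, pos + len(morpheme)))
    return False


for w in ["cats", "goose", "geese", "gooses", "geeses"]:
    print(w, accepts_noun(w))
```

The three slide inputs (cats, goose, geese) are accepted, while *geeses is rejected because no arc leaves q2. Note that the automaton only answers yes or no, which is exactly the limitation the slides raise before introducing transducers.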

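The revised adjective fragment is meant to allow un-, -er, -ly, and -est only on adj-root1 stems (clear, happy) and only -er and -est on adj-root2 stems (big, red). The sketch below is a deterministic automaton laid out here for that behavior, with the optional-un- ε arc replaced by a direct adj-root1 arc from q0; its state names need not match the slide's numbering. It also assumes the input is already segmented into morphemes, so spelling changes such as big → bigg- are ignored, and `category`, `DELTA`, and `accepts_adjective` are hypothetical names.

```python
# Sketch: the revised adjective fragment as a deterministic FSA over
# pre-segmented morphemes. State names are this sketch's own layout.
from typing import Optional

ROOT1 = {"clear", "happy"}  # adj-root1: takes un-, -er, -ly, -est
ROOT2 = {"big", "red"}      # adj-root2: takes only -er, -est


def category(morpheme: str) -> Optional[str]:
    """Map a morpheme to the arc label it can traverse (None if unknown)."""
    if morpheme == "un-":
        return "un"
    if morpheme in ROOT1:
        return "root1"
    if morpheme in ROOT2:
        return "root2"
    if morpheme == "-ly":
        return "ly"
    if morpheme in ("-er", "-est"):
        return "er/est"
    return None


# Transition table; a missing entry means the input is rejected.
DELTA = {
    ("q0", "un"): "q1",
    ("q0", "root1"): "q2",   # stands in for the epsilon arc (un- is optional)
    ("q1", "root1"): "q2",
    ("q2", "er/est"): "q3",
    ("q2", "ly"): "q3",
    ("q0", "root2"): "q4",
    ("q4", "er/est"): "q5",
}
FINAL = {"q2", "q3", "q4", "q5"}


def accepts_adjective(morphemes: list[str]) -> bool:
    state = "q0"
    for m in morphemes:
        state = DELTA.get((state, category(m)))
        if state is None:
            return False
    return state in FINAL


print(accepts_adjective(["un-", "happy", "-ly"]))
print(accepts_adjective(["red", "-ly"]))
```

Splitting the roots into two classes is what blocks forms such as *unbig and *redly; a single adj-root class containing big and red, as in the first fragment, would accept them.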

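The stem-and-features analyses on the Morphological Parsing slide (goose, geese, gooses) can be reproduced with a lexicon lookup plus suffix stripping that undoes the y → ie orthographic rule. This is a plain lookup-and-strip sketch, not a finite-state transducer; `LEXICON`, `IRREGULAR`, `NO_REGULAR_PLURAL`, and `parse` are hypothetical mini-tables and names chosen for illustration.

```python
# Sketch: surface form -> "stem +features" analyses by lexicon lookup and
# suffix stripping. Not an FST; all tables are hypothetical.

LEXICON = {  # stem -> parts of speech
    "goose": {"N", "V"},
    "cat": {"N"},
    "city": {"N"},
    "fox": {"N"},
}
IRREGULAR = {"geese": ["goose +N +PL"]}  # listed irregular forms
NO_REGULAR_PLURAL = {"goose"}            # nouns whose plural is irregular


def parse(surface: str) -> list[str]:
    analyses = list(IRREGULAR.get(surface, []))

    # Bare stem: singular noun or base-form verb.
    for pos in sorted(LEXICON.get(surface, set())):
        analyses.append(f"{surface} +N +SG" if pos == "N" else f"{surface} +V")

    # -s / -es suffix: collect candidate stems, undoing spelling rules.
    if surface.endswith("s"):
        stems = {surface[:-1]}                 # cats -> cat
        if surface.endswith("es"):
            stems.add(surface[:-2])            # foxes -> fox
        if surface.endswith("ies"):
            stems.add(surface[:-3] + "y")      # cities -> city (y -> ie)
        for stem in sorted(stems):
            for pos in sorted(LEXICON.get(stem, set())):
                if pos == "N" and stem in NO_REGULAR_PLURAL:
                    continue                   # blocks *gooses as a noun
                analyses.append(f"{stem} +N +PL" if pos == "N" else f"{stem} +V +3SG")
    return analyses


for word in ["goose", "geese", "gooses", "cats", "cities", "muncheble"]:
    print(word, parse(word))
```

Because spelling rules are only undone and never enforced, the sketch also accepts misspellings such as *foxs. Keeping the lexical and surface levels in strict correspondence is what the two-level FST approach adds.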