CMSC 723 LING 645 Intro to Computational Linguistics September 15 2004 Dorr More about FSA s Finite State Morphology J M 3 Prof Bonnie J Dorr Dr Christof Monz TA Adam Lee More about FSAs Transducers Equivalence of DFSAs and NFSAs Recognition as search depth first breadth search Recognition using NFSAs NFSA Recognition of baaa Breadth first Recognition of baaa should be q2 Regular languages Regular languages are characterized by FSAs For every NFSA there is an equivalent DFSA Regular languages are closed under concatenation Kleene closure union Concatenation Kleene Closure Union Morphology Definitions and Problems What is Morphology Topology of Morphologies Approaches to Computational Morphology Lexicons and Rules Computational Morphology Approaches Morphology The study of the way words are built up from smaller meaning units called Morphemes Abstract versus Realized HOP PAST hop ed hopped hapt Syntax Lexeme Inflected Lexeme Grammars sentences Morphology Morpheme Allomorph Morphotactics words Phonology Phoneme Allophone Phonotactics letters Phonology and Morphology Phonology vs Orthography Historical spelling night nite attention mission fish Script Limitations Spoken English has 14 vowels heed hid hayed head had hoed hood who d hide how d taught Tut toy enough English Alphabet has 5 Use vowel combinatios far fair fare Consonantal doubling hopping vs hoping Syntax and Morphology Phrase level agreement Subject Verb John studies hard STUDY 3SG Noun Adjective Las vacas hermosas Sub word phrasal structures conj prep That in book PL Poss 1PL noun Which are in our books article plural poss Topology of Morphologies Concatenative vs Templatic Derivational vs Inflectional Regular vs Irregular Concatenative Morphology Morpheme Morpheme Morpheme Stems also called lemma base form root lexeme hope ing hoping hop hopping Affixes Prefixes Antidisestablishmentarianism Suffixes Antidisestablishmentarianism Infixes hingi borrow humingi borrower in Tagalog Circumfixes sagen say gesagt said in German Agglutinative Languages uygarla t ramad klar m zdanm s n zcas na uygar la t r ama d k lar m z dan m s n z cas na Behaving as if you are among those whom we could not cause to become civilized Templatic Morphology Roots and Patterns KTB maktuub ktuuv written written Templatic Morphology Root Meaning KTB writing stuff book library office write letter writer spelling address Derivational vs Inflectional Word Classes Parts of speech noun verb adjectives etc Word class dictates how a word combines with morphemes to form new words Derivational morphology Nominalization computerization appointee killer fuzziness Formation of adjectives computational clueless embraceable CatVar Categorial Variation Database http clipdemos umiacs umd edu catvar Inflectional morphology Adds Tense number person mood aspect Word class doesn t change Word serves new grammatical role Five verb forms in English Other languages have lots more Nouns and Verbs in English Nouns have simple inflectional morphology cat cat s cat s Verbs have more complex morphology Regulars and Irregulars Nouns Cat Cats Mouse Mice Ox Oxen Goose Geese Verbs Walk Walked Go Went Fly Flew Regular English Verbs Morphological Form Classes Regularly Inflected Verbs Stem walk merge try map s form walks merges tries maps ing form walking merging trying mapping Past form or ed participle walked merged tried mapped Irregular English Verbs Morphological Form Classes Irregularly Inflected Verbs Stem eat catch cut s form eats catches cuts ing form eating catching cutting Past form ate caught cut ed participle eaten caught cut To love in Spanish Computational Morphology Finite State Morphology Finite State Transducers FST Input Output Analysis Generation Computational Morphology WORD STEM FEATURES cats cat N PL cat cat N SG cities city N PL geese goose N PL ducks duck N PL or duck V 3SG merging merge V PRES PART caught catch V PAST PART or catch V PAST Building a Morphological Parser The Rules and the Lexicon General versus Specific Regular versus Irregular Accuracy speed space The Morphology of a language Approaches Lexicon only Lexicon and Rules Finite state Automata Finite state Transducers Rules only Lexicon only Morphology The lexicon lists all surface level and lexical level pairs No rules Analysis Generation is easy Very large for English What about Arabic or Turkish Chinese acclaim acclaim acclaimed acclaimed acclaiming acclaims acclaims acclamation acclamations acclimate acclimated acclimated acclimates acclimating acclaim N acclaim V 0 acclaim V ed acclaim V en acclaim V ing acclaim N s acclaim V s acclamation acclamation acclimate acclimate acclimate acclimate acclimate N N s V 0 V ed V en V s V ing Building a Morphological Parser The Rules and the Lexicon General versus Specific Regular versus Irregular Accuracy speed space The Morphology of a language Approaches Lexicon only Lexicon and Rules Finite state Automata Finite state Transducers Rules only Lexicon and Rules FSA Inflectional Noun Morphology English Noun Lexicon reg noun Irreg pl noun Irreg sg noun plural fox cat dog geese sheep mice goose sheep mouse s English Noun Rule Lexicon and Rules FSA English Verb Inflectional Morphology reg verb stem irreg verb stem irreg past verb past past part pres part 3sg walk fry talk impeach cut speak spoken sing sang caught ate eaten ed ed ing s FSA for Derivational Morphology Adjectival Formation More Complex Derivational Morphology Using FSAs for Recognition English Nouns and their Inflection Morphological Parsing Finite state automata FSA Recognizer One level morphology Finite state transducers FST Two level morphology PC Kimmo Koskenniemi 83 input output pair Terminology for PC Kimmo Upper lexical tape Lower surface tape Characters correspond to pairs written a b If a a write a for shorthand Two level lexical entries word boundary morpheme boundary Other any feasible pair that is not in this transducer Final states indicated with and non final states indicated with Four Fold View of FSTs As a recognizer As a generator As a translator As a set relater Nominal Inflection FST Lexical and Intermediate Tapes Spelling Rules Name Rule Description Example Consonant Doubling 1 letter consonant doubled before ing ed beg begging E deletion Silent e dropped before ing and ed make making E insertion e added after s z x ch sh before s watch watches Y replacement y changes to ie before s i before ed try tries K insertion verbs ending with vowel c add k panic panicked Chomsky
View Full Document
Unlocking...