CMSC 723 / LING 645: Intro to Computational Linguistics
November 3, 2004
Lecture 9 (Dorr): Word Classes, POS Tagging (Chapter 8); Intro to Syntax (start Chapter 9)
Prof. Bonnie J. Dorr
Dr. Christof Monz
TA: Adam Lee

Administrivia
Assignment 2 extension: now due one week later, NOVEMBER 17, 2004.

Word Classes and Part-of-Speech Tagging
– Definition and Example
– Motivation
– Word Classes
– Rule-based Tagging
– Stochastic Tagging
– Transformation-Based Tagging
– Tagging Unknown Words

Definition
"The process of assigning a part-of-speech or other lexical class marker to each word in a corpus" (Jurafsky and Martin)
WORDS: the girl kissed the boy on the cheek
TAGS:  N, V, P, DET

An Example
WORD:   the  girl  kissed  the  boy   on    the  cheek
LEMMA:  the  girl  kiss    the  boy   on    the  cheek
TAG:    +DET +NOUN +VPAST  +DET +NOUN +PREP +DET +NOUN
From: http://www.xrce.xerox.com/competencies/content-analysis/fsnlp/tagger.en.html

Motivation
– Speech synthesis: pronunciation
– Speech recognition: class-based N-grams
– Information retrieval: stemming, selection of high-content words
– Word-sense disambiguation
– Corpus analysis of language & lexicography

Word Classes
– Basic word classes: Noun, Verb, Adjective, Adverb, Preposition, …
– POS is based on morphology and syntax
– Open vs. closed classes
  • Open: Nouns, Verbs, Adjectives, Adverbs
  • Closed:
    determiners: a, an, the
    pronouns: she, he, I
    prepositions: on, under, over, near, by, …

Open Class Words
– Every known human language has nouns and verbs
– Nouns: people, places, things
  • Classes of nouns: proper vs. common; count vs.
mass
– Verbs: actions and processes
– Adjectives: properties, qualities
– Adverbs: a hodgepodge!
  • Unfortunately, John walked home extremely slowly yesterday

Closed Class Words
– Idiosyncratic
– Examples:
  • prepositions: on, under, over, …
  • particles: up, down, on, off, …
  • determiners: a, an, the, …
  • pronouns: she, who, I, …
  • conjunctions: and, but, or, …
  • auxiliary verbs: can, may, should, …
  • numerals: one, two, three, third, …

Prepositions from CELEX; English Single-Word Particles; Pronouns in CELEX; Conjunctions; Auxiliaries
(slides of word-frequency tables; not reproduced here)

Word Classes: Tag Sets
– Tag sets vary in number of tags: from a dozen to over 200
– The size of a tag set depends on the language, objectives, and purpose
  • Some tagging approaches (e.g., constraint-grammar based) make fewer distinctions, e.g., conflating prepositions, conjunctions, and particles
  • Simple morphology = more ambiguity = fewer tags

Word Classes: Tag Set Example
PRP, PRP$ (Penn Treebank tag table; not fully reproduced here)

Example of Penn Treebank Tagging of a Brown Corpus Sentence
The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.

VB   DT   NN     .
Book that flight .

VBZ  DT   NN     VB    NN     ?
Does that flight serve dinner ?

The Problem
Words often have more than one word class, e.g., this:
– This is a nice day = PRP
– This day is nice = DT
– You can go this far = RB

Word Class Ambiguity (in the Brown Corpus)
Unambiguous (1 tag): 35,340
Ambiguous (2–7 tags): 4,100
  2 tags: 3,760
  3 tags: 264
  4 tags: 61
  5 tags: 12
  6 tags: 2
  7 tags: 1
(DeRose, 1988)

Part-of-Speech Tagging
– Rule-Based Tagger: ENGTWOL
– Stochastic Tagger: HMM-based
– Transformation-Based Tagger: Brill

Rule-Based Tagging
Basic Idea:
– Assign all possible tags to words
– Remove tags according to a set of rules of the type: "if word+1 is an adj, adv, or quantifier, and the following is a sentence boundary, and word-1 is not a verb like 'consider', then eliminate non-adv; else eliminate adv."
– Typically more than 1000 hand-written rules, but they may also be machine-learned.

Sample ENGTWOL Lexicon
(lexicon table not reproduced here)

Stage 1 of ENGTWOL Tagging
First Stage: Run words through a Kimmo-style morphological analyzer to get all parts of speech.
Example: Pavlov had shown
that salivation …
  Pavlov      PAVLOV N NOM SG PROPER
  had         HAVE V PAST VFIN SVO
              HAVE PCP2 SVO
  shown       SHOW PCP2 SVOO SVO SV
  that        ADV
              PRON DEM SG
              DET CENTRAL DEM SG
              CS
  salivation  N NOM SG

Stage 2 of ENGTWOL Tagging
Second Stage: Apply constraints.
Constraints are used in a negative way (to eliminate tags).
Example: Adverbial "that" rule
  Given input: "that"
  If   (+1 A/ADV/QUANT)
       (+2 SENT-LIM)
       (NOT -1 SVOC/A)
  Then eliminate non-ADV tags
  Else eliminate ADV

Stochastic Tagging
– Based on the probability of a certain tag occurring, given various possibilities
– Requires a training corpus
– No probabilities for words not in the corpus.
– The training corpus may be too different from the test corpus.

Stochastic Tagging (cont.)
Simple method: choose the most frequent tag in the training text for each word!
– Result: 90% accuracy
– Why?
– Baseline: others will do better
– HMM is an example

HMM Tagger
Intuition: pick the most likely tag for this word.
HMM taggers choose the tag sequence that maximizes this formula:
– P(word|tag) × P(tag|previous n tags)
Let T = t1, t2, …, tn
Let W = w1, w2, …, wn
Find the POS tags that generate the sequence of words, i.e., look for the most probable sequence of tags T underlying the observed words W.

Start with a Bigram-HMM Tagger
– argmaxT P(T|W)
– = argmaxT P(T)P(W|T)
– = argmaxT P(t1…tn)P(w1…wn|t1…tn)
– ≈ argmaxT [P(t1)P(t2|t1)…P(tn|tn-1)] [P(w1|t1)P(w2|t2)…P(wn|tn)]
– To tag a single word: ti = argmaxj P(tj|ti-1)P(wi|tj)
– How do we compute P(ti|ti-1)?
  • c(ti-1 ti)/c(ti-1)
– How do we compute P(wi|ti)?
  • c(wi, ti)/c(ti)
– How do we compute the most probable tag sequence?
  • Viterbi

An Example
Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN
People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
to/TO race/???
the/DT race/???
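The bigram-HMM decoding step described above (Viterbi) can be sketched in a few lines. The transition and emission probabilities below are toy values chosen for illustration (the "race" emissions echo the Brown-corpus figures used in this lecture); a real tagger would estimate them from counts as shown above.

```python
# Minimal bigram-HMM tagger sketch using the Viterbi algorithm.
# The probabilities below are invented toy values; a real tagger would
# estimate them from a tagged corpus as c(t_{i-1} t_i)/c(t_{i-1}) and
# c(w_i, t_i)/c(t_i).

def viterbi(words, tags, p_init, p_trans, p_emit):
    """Return the most probable tag sequence for `words`."""
    # best[i][t] = probability of the best tag path ending in tag t at position i
    best = [{t: p_init.get(t, 0.0) * p_emit.get((words[0], t), 0.0) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # pick the previous tag that maximizes the path probability
            prev, score = max(
                ((p, best[i - 1][p] * p_trans.get((p, t), 0.0)) for p in tags),
                key=lambda x: x[1])
            best[i][t] = score * p_emit.get((words[i], t), 0.0)
            back[i][t] = prev
    # follow backpointers from the best final tag
    last = max(best[-1], key=best[-1].get)
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

tags = ["TO", "VB", "NN", "DT"]
p_init = {"TO": 0.5, "DT": 0.5}
p_trans = {("TO", "VB"): 0.34, ("TO", "NN"): 0.021,
           ("DT", "NN"): 0.5, ("DT", "VB"): 0.01}
p_emit = {("to", "TO"): 1.0, ("the", "DT"): 0.6,
          ("race", "VB"): 0.00003, ("race", "NN"): 0.00041}

print(viterbi(["to", "race"], tags, p_init, p_trans, p_emit))   # ['TO', 'VB']
print(viterbi(["the", "race"], tags, p_init, p_trans, p_emit))  # ['DT', 'NN']
```

Note how the same ambiguous word "race" gets different tags depending on the preceding context, exactly as in the lecture example.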
ti = argmaxj P(tj|ti-1)P(wi|tj)
– max[ P(VB|TO) × P(race|VB), P(NN|TO) × P(race|NN) ]
Brown corpus estimates:
– P(NN|TO) = .021, P(race|NN) = .00041 → product = .000007
– P(VB|TO) = .34, P(race|VB) = .00003 → product = .00001
So "race" after "to" is tagged VB.

An Early Approach to Statistical POS Tagging
– PARTS tagger (Church, 1988): stores the probability of tag given word instead of word given tag.
  • P(tag|word) × P(tag|previous n tags)
  • Compare to: P(word|tag) × P(tag|previous n tags)
– Consider this alternative (on your own).
http://www.comp.lancs.ac.uk/ucrel/claws/trial.html

Transformation-Based Tagging (Brill Tagging)
– A combination of rule-based and stochastic tagging methodologies
  • Like the rule-based approach, because rules are used to specify tags in a certain environment
  • Like the stochastic approach, because machine learning is used, with a tagged corpus as input
– Input:
  • tagged corpus
  • dictionary (with most frequent tags)

Transformation-Based Tagging/Learning Algorithm
Basic idea of the tagging algorithm:
1. Set the most probable tag for each word as a start value
2. Change tags according to rules of the type "if word-1 is a
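As a quick arithmetic check of the comparison above, the products can be recomputed directly from the quoted Brown-corpus estimates (the slide's printed figures appear to be rounded):

```python
# Re-doing the bigram-HMM comparison for "race" after "to",
# using the Brown-corpus estimates quoted in the lecture.
p_vb = 0.34 * 0.00003   # P(VB|TO) * P(race|VB)
p_nn = 0.021 * 0.00041  # P(NN|TO) * P(race|NN)
print(f"VB: {p_vb:.7f}  NN: {p_nn:.7f}")
print("race ->", "VB" if p_vb > p_nn else "NN")
```

Whatever the rounding, the VB product is larger, so the tagger chooses VB for "race" after "to".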
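The two-step algorithm above can be sketched as follows. This is a minimal illustration, not Brill's actual system: the single rule template ("change tag A to B when the previous tag is Z"), the toy corpus, and the greedy error-driven search are simplified stand-ins for the full set of templates Brill uses.

```python
# Sketch of transformation-based (Brill-style) tagging: start from each
# word's most frequent tag, then greedily learn rules of the form
# "change tag A to B when the previous tag is Z".  Corpus is invented.

from itertools import product

def initial_tags(words, most_frequent):
    # Step 1: assign every word its most probable (most frequent) tag.
    return [most_frequent[w] for w in words]

def apply_rule(tags, rule):
    # Step 2: apply one transformation rule (frm -> to after prev).
    frm, to, prev = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == frm and tags[i - 1] == prev:
            out[i] = to
    return out

def errors(tags, gold):
    return sum(t != g for t, g in zip(tags, gold))

def learn_rules(tags, gold, tagset, max_rules=5):
    # Greedy error-driven learning: keep picking the rule that most
    # reduces tagging errors against the gold standard.
    rules = []
    while len(rules) < max_rules:
        candidates = [(f, t, p) for f, t, p in product(tagset, repeat=3) if f != t]
        best = min(candidates, key=lambda r: errors(apply_rule(tags, r), gold))
        if errors(apply_rule(tags, best), gold) >= errors(tags, gold):
            break  # no rule improves the tagging any further
        rules.append(best)
        tags = apply_rule(tags, best)
    return rules

# Toy example: "race" is most frequently NN, but should be VB after TO.
words = ["to", "race", "the", "race"]
gold  = ["TO", "VB", "DT", "NN"]
start = initial_tags(words, {"to": "TO", "race": "NN", "the": "DT"})
rules = learn_rules(start, gold, ["TO", "VB", "NN", "DT"])
print(rules)  # learns ('NN', 'VB', 'TO'): change NN to VB after TO
```

On this toy corpus the learner recovers the classic transformation "change NN to VB when the previous tag is TO", which fixes "to race" while leaving "the race" untouched.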