Word Classes and POS Tagging

Read J & M Chapter 8.
You may also want to look at:
http://www.georgetown.edu/faculty/ballc/ling361/tagging_overview.html

Why Do We Care about Parts of Speech?
• Pronunciation
  Hand me the lead pipe.
• Predicting what words can be expected next
  Personal pronoun (e.g., I, she) ____________
• Stemming
  -s means singular for verbs, plural for nouns
• As the basis for syntactic parsing and then meaning extraction
  I will lead the group into the lead smelter.
• Machine translation
  • (E) content +N    (F) contenu +N
  • (E) content +Adj    (F) content +Adj or satisfait +Adj

Remember the Mapping Problem
We’ve sort of ignored this issue as we’ve looked at:
• Dealing with a noisy channel
• Probabilistic techniques we can use for various subproblems
• Corpora we can analyze to collect our facts
We need to return to it now. POS tagging is the first step.

Understanding – the Big Picture
Morphology → POS Tagging → Syntax → Semantics → Discourse Integration
Generation goes backwards. For this reason, we generally want declarative representations of the facts.
POS tagging is an exception to this.

Two Kinds of Issues
• Linguistic – what are the facts about language?
• Algorithmic – what are effective computational procedures for dealing with those facts?

What is a Part of Speech?
Is this a semantic distinction? For example, maybe Noun is the class of words for people, places and things. Maybe Adjective is the class of words for properties of nouns.
Consider: green book
  book is a Noun
  green is an Adjective
Now consider:
  book worm
  This green is very soothing.

Morphological and Syntactic Definition of POS
An Adjective is a word that can fill the blank in:
  It’s so __________.
A Noun is a word that can be marked as plural.
A Noun is a word that can fill the blank in:
  the __________ is
What is green?
  It’s so green.
  Both greens could work for the walls.
  The green is a little much given the red rug.

How Many Parts of Speech Are There?
A first cut at the easy distinctions:
Open classes:
• nouns, verbs, adjectives, adverbs
Closed classes: function words
• conjunctions: and, or, but
• pronouns: I, she, him
• prepositions: with, on
• determiners: the, a, an

But It Gets Harder
provided, as in “I’ll go provided John does.”
there, as in “There aren’t any cookies.”
might, as in “I might go.” or “I might could go.”
no, as in “No, I won’t go.”

What’s a Preposition?
From the CELEX online dictionary.
Frequencies are from the COBUILD 16 million word corpus.

What’s a Pronoun?
CELEX dictionary list of pronouns:

Tagsets
Brown corpus tagset (87 tags):
http://www.scs.leeds.ac.uk/amalgam/tagsets/brown.html
Penn Treebank tagset (45 tags):
http://www.cs.colorado.edu/~martin/SLP/Figures/ (8.6)
C7 tagset (146 tags):
http://www.comp.lancs.ac.uk/ucrel/claws7tags.html

Algorithms for POS Tagging
Why can’t we just look them up in a dictionary?
• Ambiguity – In the Brown corpus, 11.5% of the word types are ambiguous (using 87 tags). Worse, 40% of the tokens are ambiguous.

Algorithms for POS Tagging
Why can’t we just look them up in a dictionary?
• Words that aren’t in the dictionary
  http://story.news.yahoo.com/news?tmpl=story&cid=578&ncid=578&e=1&u=/nm/20030922/ts_nm/iraq_usa_dc
• One idea: $P(t_i \mid w_i)$ = the probability that a random hapax legomenon in the corpus has tag $t_i$.
  Nouns are more likely than verbs, which are more likely than pronouns.
• Another idea: use morphology.

Algorithms for POS Tagging - Knowledge
• Dictionary
• Morphological rules, e.g.,
  • _____-tion
  • _____-ly
  • capitalization
• N-gram frequencies
  • to _____
  • DET _____ N
  • But what about rare words, e.g., smelt (two verb forms, melt and past tense of smell, and one noun form, a small fish)
• Combining these
  • V _____-ing    I was gracking vs. Gracking is fun.

Algorithms for POS Tagging - Approaches
• Basic approaches
  • Rule-Based
  • Stochastic
• Do we return one best answer or several answers and let later steps decide?
• How does the requisite knowledge get entered?

Training/Teaching an NLP Component
Each step of NLP analysis requires a module that knows what to do. How do such modules get created?
• By hand
• By training
Advantages of hand creation: based on sound linguistic principles, sensible to people, explainable.
Advantages of training from a corpus: less work, extensible to new languages, customizable for specific domains.

Training/Teaching a POS Tagger
The problem is tractable.
We can do a very good job with just:
• a dictionary
• a tagset
• a large corpus, usually tagged by hand
There are only somewhere between 50 and 150 possibilities for each word, and 3 or 4 words of context is almost always enough.
The task:
  ____ _ __ ______ __ _ _____
  What is the weather like in Austin?

Contrast with Training Other NLP Parts
The task:
  ____ _ __ ______ __ _ _____
  What is the weather like in Austin?
  The weather in Austin is like what?
Database tables:
  Months: Month, Days
  RainfallByStation: year, month, station, rainfall
  Stations: station, City

Rule-Based POS Tagging
Step 1: Using a dictionary, assign to each word a list of possible tags.
Step 2: Figure out what to do about words that are unknown or ambiguous. Two approaches:
• Rules that specify what to do.
• Rules that specify what not to do:
Example: Adverbial “that” rule
  Given input: “that”
  If
    (+1 A/ADV/QUANT)
    (+2 SENT-LIM)
    (NOT -1 SVOC/A)
  Then eliminate non-ADV tags
  Else eliminate ADV
It isn’t that odd vs. I consider that odd vs. I believe that he is right.
From ENGTWOL

Stochastic POS Tagging
First approximation: choose the tag that is most likely for the given word.
Next try: consider N-gram frequencies and choose the tag that is most likely in the current context. Should the context be the last N words or the last N classes?
Next try: combine the two:
$t_i = \arg\max_{j} P(t_j \mid t_{i-1})\, P(w_i \mid t_j)$
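
The two estimates named on the Stochastic POS Tagging slide, the most likely tag for a word and the most likely tag in context, can be tried on a toy corpus. The sketch below is illustrative only: the five tagged sentences, the simplified tag names, and the helper names (`prob`, `most_frequent_tag`, `bigram_tag`) are assumptions made for this example. The combined tagger scores each candidate tag as P(tag | previous tag) × P(word | tag) in a greedy left-to-right pass, with no smoothing and no search over whole tag sequences.

```python
from collections import Counter, defaultdict

# Toy hand-tagged corpus, invented for illustration (simplified tags).
TAGGED = [
    [("I", "PRON"), ("will", "MD"), ("lead", "V"), ("the", "DET"), ("group", "N")],
    [("hand", "V"), ("me", "PRON"), ("the", "DET"), ("lead", "N"), ("pipe", "N")],
    [("the", "DET"), ("lead", "N"), ("is", "V"), ("heavy", "ADJ")],
    [("they", "PRON"), ("will", "MD"), ("lead", "V")],
    [("the", "DET"), ("lead", "N"), ("smelter", "N")],
]

word_tag = defaultdict(Counter)  # word -> counts of tags seen with it (the "dictionary")
tag_word = defaultdict(Counter)  # tag -> counts of words, for P(word | tag)
trans = defaultdict(Counter)     # previous tag -> counts of next tags, for P(tag | prev)

for sent in TAGGED:
    prev = "<s>"  # sentence-start marker
    for word, tag in sent:
        word_tag[word][tag] += 1
        tag_word[tag][word] += 1
        trans[prev][tag] += 1
        prev = tag


def prob(counter, key):
    """Relative frequency of key in counter (0.0 if the counter is empty)."""
    total = sum(counter.values())
    return counter[key] / total if total else 0.0


def most_frequent_tag(words, default="N"):
    """First approximation: each word gets its most frequent tag."""
    return [word_tag[w].most_common(1)[0][0] if w in word_tag else default
            for w in words]


def bigram_tag(words, default="N"):
    """Greedy pass choosing argmax of P(tag | prev tag) * P(word | tag)."""
    tags, prev = [], "<s>"
    for w in words:
        # Unknown words fall back to a single default candidate.
        candidates = list(word_tag[w]) if w in word_tag else [default]
        best = max(candidates,
                   key=lambda t: prob(trans[prev], t) * prob(tag_word[t], w))
        tags.append(best)
        prev = best
    return tags


sentence = ["they", "will", "lead", "the", "lead", "pipe"]
print(most_frequent_tag(sentence))  # ['PRON', 'MD', 'N', 'DET', 'N', 'N']
print(bigram_tag(sentence))         # ['PRON', 'MD', 'V', 'DET', 'N', 'N']
```

In this toy corpus "lead" is more often a noun, so the word-only estimate tags both occurrences N. Adding the previous tag recovers the verb reading after "will".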
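
Returning to the unknown-word problem raised under Algorithms for POS Tagging - Knowledge: the morphological cues listed there (-tion, -ly, capitalization) and the combined cue "V _____-ing" can be turned into a small first-guess tagger. This is a rough sketch, not a full method; the function name `guess_tag`, the simplified tag labels, and the order in which the cues are checked are assumptions chosen for illustration.

```python
def guess_tag(word, prev_tag=None):
    """Guess a tag for a word missing from the dictionary.

    prev_tag is the tag of the preceding word, or None at sentence start.
    Tag labels (V, N, ADV, PROPN) are simplified placeholders.
    """
    lower = word.lower()
    # Combined cue: after a verb, an -ing form is taken as a verb
    # ("I was gracking"); otherwise it is treated as noun-like
    # ("Gracking is fun").
    if lower.endswith("ing"):
        return "V" if prev_tag == "V" else "N"
    if lower.endswith("tion"):
        return "N"
    if lower.endswith("ly"):
        return "ADV"
    # Capitalization only counts as a cue when the word is not sentence-initial.
    if word[0].isupper() and prev_tag is not None:
        return "PROPN"
    # Unknown words are more likely to be nouns than anything else.
    return "N"


print(guess_tag("gracking", prev_tag="V"))  # V
print(guess_tag("Gracking"))                # N
print(guess_tag("Austin", prev_tag="P"))    # PROPN
```

Suffix cues like these misfire on words such as "family", which is one reason to combine them with dictionary and N-gram evidence.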