UT CS 378 - Word Classes and POS Tagging

This preview shows page 1-2-3-26-27-28 out of 28 pages.


Word Classes and POS Tagging
Why Do We Care about Parts of Speech?
Remember the Mapping Problem
Understanding – the Big Picture
Two Kinds of Issues
What is a Part of Speech?
Morphological and Syntactic Definition of POS
How Many Parts of Speech Are There?
But It Gets Harder
What’s a Preposition
What’s a Pronoun?
Tagsets
Algorithms for POS Tagging
Slide 14
Algorithms for POS Tagging - Knowledge
Algorithms for POS Tagging - Approaches
Training/Teaching an NLP Component
Training/Teaching a POS Tagger
Contrast with Training Other NLP Parts
Rule-Based POS Tagging
Stochastic POS Tagging
Hybrids – the Brill Tagger
Learning Brill Tagger Transformations
The Universe of Possible Transformations?
One or Many Answers
Search
Evaluation
How Good is An Algorithm?

Word Classes and POS Tagging
Read J & M Chapter 8.
You may also want to look at:
http://www.georgetown.edu/faculty/ballc/ling361/tagging_overview.html

Why Do We Care about Parts of Speech?
• Pronunciation
  Hand me the lead pipe.
• Predicting what words can be expected next
  Personal pronoun (e.g., I, she) ____________
• Stemming
  -s means singular for verbs, plural for nouns
• As the basis for syntactic parsing and then meaning extraction
  I will lead the group into the lead smelter.
• Machine translation
  • (E) content +N → (F) contenu +N
  • (E) content +Adj → (F) content +Adj or satisfait +Adj

Remember the Mapping Problem
We’ve sort of ignored this issue as we’ve looked at:
• Dealing with a noisy channel
• Probabilistic techniques we can use for various subproblems
• Corpora we can analyze to collect our facts
We need to return to it now. POS tagging is the first step.

Understanding – the Big Picture
Morphology
POS Tagging
Syntax
Semantics
Discourse Integration
Generation goes backwards. For this reason, we generally want declarative representations of the facts.
POS tagging is an exception to this.

Two Kinds of Issues
• Linguistic – what are the facts about language?
• Algorithmic – what are effective computational procedures for dealing with those facts?

What is a Part of Speech?
Is this a semantic distinction? For example, maybe Noun is the class of words for people, places and things. Maybe Adjective is the class of words for properties of nouns.
Consider: green book
  book is a Noun
  green is an Adjective
Now consider:
  book worm
  This green is very soothing.

Morphological and Syntactic Definition of POS
An Adjective is a word that can fill the blank in:
  It’s so __________.
A Noun is a word that can be marked as plural.
A Noun is a word that can fill the blank in:
  the __________ is
What is green?
  It’s so green.
  Both greens could work for the walls.
  The green is a little much given the red rug.

How Many Parts of Speech Are There?
A first cut at the easy distinctions:
Open classes:
• nouns, verbs, adjectives, adverbs
Closed classes: function words
• conjunctions: and, or, but
• pronouns: I, she, him
• prepositions: with, on
• determiners: the, a, an

But It Gets Harder
provided, as in “I’ll go provided John does.”
there, as in “There aren’t any cookies.”
might, as in “I might go.” or “I might could go.”
no, as in “No, I won’t go.”

What’s a Preposition
From the CELEX online dictionary.
Frequencies are from the COBUILD 16 million word corpus.

What’s a Pronoun?
CELEX dictionary list of pronouns:

Tagsets
Brown corpus tagset (87 tags): http://www.scs.leeds.ac.uk/amalgam/tagsets/brown.html
Penn Treebank tagset (45 tags): http://www.cs.colorado.edu/~martin/SLP/Figures/ (8.6)
C7 tagset (146 tags): http://www.comp.lancs.ac.uk/ucrel/claws7tags.html

Algorithms for POS Tagging
Why can’t we just look them up in a dictionary?
• Ambiguity – In the Brown corpus, 11.5% of the word types are ambiguous (using 87 tags).
  Worse, 40% of the tokens are ambiguous.

Algorithms for POS Tagging
Why can’t we just look them up in a dictionary?
• Words that aren’t in the dictionary
  http://story.news.yahoo.com/news?tmpl=story&cid=578&ncid=578&e=1&u=/nm/20030922/ts_nm/iraq_usa_dc
• One idea: P(t_i | w_i) = the probability that a random hapax legomenon in the corpus has tag t_i.
  Nouns are more likely than verbs, which are more likely than pronouns.
• Another idea: use morphology.

Algorithms for POS Tagging - Knowledge
• Dictionary
• Morphological rules, e.g.,
  • _____-tion
  • _____-ly
  • capitalization
• N-gram frequencies
  • to _____
  • DET _____ N
• But what about rare words, e.g., smelt (two verb forms, melt and past tense of smell, and one noun form, a small fish)
• Combining these
  • V _____-ing: I was gracking vs. Gracking is fun.

Algorithms for POS Tagging - Approaches
• Basic approaches
  • Rule-Based
  • Stochastic
• Do we return one best answer or several answers and let later steps decide?
• How does the requisite knowledge get entered?

Training/Teaching an NLP Component
Each step of NLP analysis requires a module that knows what to do. How do such modules get created?
• By hand
• By training
Advantages of hand creation: based on sound linguistic principles, sensible to people, explainable.
Advantages of training from a corpus: less work, extensible to new languages, customizable for specific domains.

Training/Teaching a POS Tagger
The problem is tractable.
We can do a very good job with just:
• a dictionary
• a tagset
• a large corpus, usually tagged by hand
There are only somewhere between 50 and 150 possibilities for each word, and 3 or 4 words of context is almost always enough.
The task:
____ _ __ ______ __ _ _____
What is the weather like in Austin?

Contrast with Training Other NLP Parts
The task:
____ _ __ ______ __ _ _____
What is the weather like in Austin?
The weather in Austin is like what?
Months: Month, Days
RainfallByStation: year, month, station, rainfall
Stations: station, City

Rule-Based POS Tagging
Step 1: Using a dictionary, assign to each word a list of possible tags.
Step 2: Figure out what to do about words that are unknown or ambiguous. Two approaches:
• Rules that specify what to do.
• Rules that specify what not to do:
Example: Adverbial “that” rule
Given input: “that”
If
  (+1 A/ADV/QUANT)
  (+2 SENT-LIM)
  (NOT -1 SVOC/A)
Then eliminate non-ADV tags
Else eliminate ADV
  It isn’t that odd. vs
  I consider that odd. vs
  I believe that he is right.
From ENGTWOL

Stochastic POS Tagging
First approximation: choose the tag that is most likely for the given word.
Next try: consider N-gram frequencies and choose the tag that is most likely in the current context. Should the context be the last N words or the last N classes?
Next try: combine the two:
  t_i = argmax_j P(t_j | t_{i-1}) P(w_i | t_j)
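
The two stochastic ideas above (the most likely tag for a word, and the most likely tag in context) can be sketched in a few lines of Python. This is a minimal illustration, not the tagger from the course. The three-sentence corpus, the simplified tag names, the function names, and the fallback to NN for unknown words are all assumptions made for the example. The combined version multiplies a tag-bigram estimate by a word-given-tag estimate and picks tags greedily from left to right. A full implementation would normally smooth the counts and search over whole tag sequences.

```python
from collections import Counter, defaultdict

# Toy hand-tagged corpus, invented for illustration only.
# The tag names are simplified and not taken from any real tagset.
TAGGED = [
    [("I", "PRON"), ("will", "MD"), ("lead", "VB"), ("the", "DET"), ("group", "NN")],
    [("hand", "VB"), ("me", "PRON"), ("the", "DET"), ("lead", "NN"), ("pipe", "NN")],
    [("the", "DET"), ("lead", "NN"), ("is", "VB"), ("heavy", "ADJ")],
]

START = "<s>"  # pseudo-tag for the position before the first word


def train(sentences):
    """Collect the three count tables the taggers below need."""
    word_given_tag = defaultdict(Counter)  # tag -> word counts
    tag_given_prev = defaultdict(Counter)  # previous tag -> tag counts
    tags_for_word = defaultdict(Counter)   # word -> tag counts
    for sent in sentences:
        prev = START
        for word, tag in sent:
            w = word.lower()
            word_given_tag[tag][w] += 1
            tag_given_prev[prev][tag] += 1
            tags_for_word[w][tag] += 1
            prev = tag
    return word_given_tag, tag_given_prev, tags_for_word


def most_frequent_tag(word, model, default="NN"):
    """First approximation: the tag seen most often with this word."""
    counts = model[2].get(word.lower())
    return counts.most_common(1)[0][0] if counts else default


def bigram_tag(words, model, default="NN"):
    """Greedy tagging with score = P(tag | previous tag) * P(word | tag)."""
    word_given_tag, tag_given_prev, tags_for_word = model
    prev, out = START, []
    for word in words:
        w = word.lower()
        candidates = tags_for_word.get(w)
        if not candidates:
            best = default  # assumption: treat unknown words as nouns
        else:
            def score(tag):
                context_total = sum(tag_given_prev[prev].values())
                p_tag = tag_given_prev[prev][tag] / max(1, context_total)
                p_word = word_given_tag[tag][w] / sum(word_given_tag[tag].values())
                return p_tag * p_word
            best = max(candidates, key=score)
        out.append(best)
        prev = best
    return out


model = train(TAGGED)
print(most_frequent_tag("lead", model))
print(bigram_tag("I will lead the group".split(), model))
print(bigram_tag("hand me the lead pipe".split(), model))
```

In this toy corpus "lead" occurs more often as a noun, so the first approximation always tags it NN. The combined score tags it VB after a modal and NN after a determiner, which is the disambiguation the "lead pipe" and "lead smelter" examples call for.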

