UMD CMSC 723 - Lecture 9

CMSC 723 / LING 645: Intro to Computational Linguistics
November 3, 2004
Lecture 9 (Dorr): Word Classes, POS Tagging (Chapter 8); Intro to Syntax (Start Chapter 9)
Prof. Bonnie J. Dorr
Dr. Christof Monz
TA: Adam Lee

Administrivia
- Assignment 2 extension: now due one week later, November 17, 2004.

Word Classes and Part-of-Speech Tagging
- Definition and Example
- Motivation
- Word Classes
- Rule-based Tagging
- Stochastic Tagging
- Transformation-Based Tagging
- Tagging Unknown Words

Definition
"The process of assigning a part-of-speech or other lexical class marker to each word in a corpus" (Jurafsky and Martin)
[Figure: the words "the girl kissed the boy on the cheek" mapped to the tags DET, N, V, P]

An Example
  WORD:   the   girl   kissed  the   boy    on     the   cheek
  LEMMA:  the   girl   kiss    the   boy    on     the   cheek
  TAG:    +DET  +NOUN  +VPAST  +DET  +NOUN  +PREP  +DET  +NOUN
From: http://www.xrce.xerox.com/competencies/content-analysis/fsnlp/tagger.en.html

Motivation
- Speech synthesis: pronunciation
- Speech recognition: class-based N-grams
- Information retrieval: stemming, selection of high-content words
- Word-sense disambiguation
- Corpus analysis of language & lexicography

Word Classes
- Basic word classes: Noun, Verb, Adjective, Adverb, Preposition, ...
- POS based on morphology and syntax
- Open vs. closed classes
  - Open: nouns, verbs, adjectives, adverbs
  - Closed:
    - determiners: a, an, the
    - pronouns: she, he, I
    - prepositions: on, under, over, near, by, ...

Open Class Words
- Every known human language has nouns and verbs.
- Nouns: people, places, things
  - Classes of nouns: proper vs. common; count vs. mass
- Verbs: actions and processes
- Adjectives: properties, qualities
- Adverbs: hodgepodge!
  - Unfortunately, John walked home extremely slowly yesterday.

Closed Class Words
- Idiosyncratic
- Examples:
  - prepositions: on, under, over, ...
  - particles: up, down, on, off, ...
  - determiners: a, an, the, ...
  - pronouns: she, who, I, ...
  - conjunctions: and, but, or, ...
  - auxiliary verbs: can, may, should, ...
  - numerals: one, two, three, third, ...

[Table slides: Prepositions from CELEX; English Single-Word Particles; Pronouns in CELEX; Conjunctions; Auxiliaries. Tables not preserved in this preview.]

Word Classes: Tag Sets
- Vary in number of tags: a dozen to over 200.
- The size of the tag set depends on language, objectives, and purpose.
  - Some tagging approaches (e.g., constraint-grammar based) make fewer distinctions, e.g., conflating prepositions, conjunctions, and particles.
  - Simple morphology = more ambiguity = fewer tags.

Word Classes: Tag Set Example
[Table slide: Penn Treebank tags, e.g., PRP, PRP$. Table not preserved in this preview.]

Example of Penn Treebank Tagging of Brown Corpus Sentences
  The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.

  VB    DT    NN      .
  Book  that  flight  .

  VBZ   DT    NN      VB     NN      ?
  Does  that  flight  serve  dinner  ?

The Problem
- Words often have more than one word class, e.g., this:
  - This is a nice day = PRP
  - This day is nice = DT
  - You can go this far = RB

Word Class Ambiguity (in the Brown Corpus)
  Unambiguous (1 tag):    35,340
  Ambiguous (2-7 tags):    4,100
    2 tags:  3,760
    3 tags:    264
    4 tags:     61
    5 tags:     12
    6 tags:      2
    7 tags:      1
(DeRose, 1988)

Part-of-Speech Tagging
- Rule-Based Tagger: ENGTWOL
- Stochastic Tagger: HMM-based
- Transformation-Based Tagger: Brill

Rule-Based Tagging
- Basic idea:
  - Assign all possible tags to words.
  - Remove tags according to a set of rules of the type: "if word+1 is an adj, adv, or quantifier and the following is a sentence boundary and word-1 is not a verb like 'consider', then eliminate non-adv; else eliminate adv."
  - Typically more than 1,000 hand-written rules, but they may be machine-learned.

Sample ENGTWOL Lexicon
[Table slide: lexicon entries. Table not preserved in this preview.]

Stage 1 of ENGTWOL Tagging
First stage: run words through a Kimmo-style morphological analyzer to get all parts of speech.
Example: "Pavlov had shown that salivation ..."
  Pavlov      PAVLOV N NOM SG PROPER
  had         HAVE V PAST VFIN SVO
              HAVE PCP2 SVO
  shown       SHOW PCP2 SVOO SVO SV
  that        ADV
              PRON DEM SG
              DET CENTRAL DEM SG
              CS
  salivation  N NOM SG

Stage 2 of ENGTWOL Tagging
Second stage: apply constraints. Constraints are used in a negative way.
Example: adverbial "that" rule
  Given input: "that"
  If (+1 A/ADV/QUANT) (+2 SENT-LIM) (NOT -1 SVOC/A)
  Then eliminate non-ADV tags
  Else eliminate ADV

Stochastic Tagging
- Based on the probability of a certain tag occurring given various possibilities
- Necessitates a training corpus
- No probabilities for words not in the corpus
- The training corpus may be too different from the test corpus

Stochastic Tagging (cont.)
- Simple method: choose the most frequent tag in the training text for each word!
  - Result: 90% accuracy
  - Why bother with anything else? This is the baseline: other methods will do better.
  - HMM tagging is one such method.

HMM Tagger
- Intuition: pick the most likely tag for this word.
- HMM taggers choose the tag sequence that maximizes:
    P(word|tag) × P(tag|previous n tags)
- Let T = t1,t2,...,tn and W = w1,w2,...,wn.
- Find the POS tags that generate the sequence of words, i.e., look for the most probable sequence of tags T underlying the observed words W.

Start with a Bigram-HMM Tagger
  argmax_T P(T|W)
  = argmax_T P(T)P(W|T)
  = argmax_T P(t1...tn)P(w1...wn|t1...tn)
  = argmax_T [P(t1)P(t2|t1)...P(tn|tn-1)][P(w1|t1)P(w2|t2)...P(wn|tn)]
- To tag a single word: ti = argmax_j P(tj|ti-1)P(wi|tj)
- How do we compute P(ti|ti-1)?  c(ti-1,ti)/c(ti-1)
- How do we compute P(wi|ti)?  c(wi,ti)/c(ti)
- How do we compute the most probable tag sequence?  Viterbi

An Example
  Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NN
  People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN
  to/TO race/???
  the/DT race/???
- ti = argmax_j P(tj|ti-1)P(wi|tj)
  - max[ P(VB|TO)P(race|VB), P(NN|TO)P(race|NN) ]
- Brown corpus estimates:
  - P(NN|TO) × P(race|NN) = .021 × .00041 ≈ .000007
  - P(VB|TO) × P(race|VB) = .34 × .00003 ≈ .00001

An Early Approach to Statistical POS Tagging
- PARTS tagger (Church, 1988): stores the probability of tag given word instead of word given tag:
    P(tag|word) × P(tag|previous n tags)
- Compare to:
    P(word|tag) × P(tag|previous n tags)
- Consider this alternative (on your own).
- http://www.comp.lancs.ac.uk/ucrel/claws/trial.html

Transformation-Based Tagging (Brill Tagging)
- A combination of rule-based and stochastic tagging methodologies
  - Like rule-based tagging because rules are used to specify tags in a certain environment
  - Like the stochastic approach because machine learning is used, with a tagged corpus as input
- Input:
  - tagged corpus
  - dictionary (with most frequent tags)

Transformation-Based Tagging/Learning Algorithm
Basic idea of the tagging algorithm:
1. Set the most probable tag for each word as a start value.
2. Change tags according to rules of the type "if word-1 is a ..." [text cut off in preview]
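The adverbial "that" constraint from Stage 2 can be sketched as a filter over candidate tags. This is a simplified illustration, not the actual ENGTWOL constraint formalism: the context is passed in as plain arguments, and the tag names are abbreviated from the slide.

```python
def apply_that_rule(tags_for_that, next1_tag, next2_is_sentence_boundary, prev_tag):
    """Sketch of the adverbial-'that' constraint: keep only the ADV reading
    when 'that' is followed by an adjective/adverb/quantifier at a sentence
    boundary and not preceded by a 'consider'-class verb (SVOC/A);
    otherwise eliminate the ADV reading."""
    if (next1_tag in {"A", "ADV", "QUANT"}
            and next2_is_sentence_boundary
            and prev_tag != "SVOC/A"):
        # (+1 A/ADV/QUANT) (+2 SENT-LIM) (NOT -1 SVOC/A): eliminate non-ADV tags
        return {t for t in tags_for_that if t == "ADV"}
    # Else: eliminate the ADV tag
    return {t for t in tags_for_that if t != "ADV"}
```

Note the negative character of the constraint: it never adds a tag, it only removes readings that the morphological analyzer proposed in Stage 1.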
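The "most frequent tag" baseline from the stochastic-tagging slide can be sketched in a few lines; the toy training corpus below is invented for illustration.

```python
from collections import Counter, defaultdict

# Toy tagged training corpus of (word, tag) pairs -- invented for illustration.
training = [("the", "DT"), ("race", "NN"), ("is", "VBZ"), ("on", "IN"),
            ("the", "DT"), ("horses", "NNS"), ("race", "VB"),
            ("the", "DT"), ("race", "NN")]

# Count tags per word, then keep each word's single most frequent tag.
tag_counts = defaultdict(Counter)
for word, tag in training:
    tag_counts[word][tag] += 1
most_frequent = {w: c.most_common(1)[0][0] for w, c in tag_counts.items()}

def baseline_tag(words):
    """Tag each word with its most frequent training tag.
    Unknown words fall back to NN here (an arbitrary choice for the sketch)."""
    return [most_frequent.get(w, "NN") for w in words]
```

This ignores context entirely, which is why it mis-tags "race" after "to": it always emits the word's globally most frequent tag. That context-blindness is exactly what the HMM tagger addresses.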
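The to/TO race/??? disambiguation above is just a comparison of two products. Plugging in the Brown-corpus estimates quoted on the slide:

```python
# Brown-corpus estimates as quoted on the slide.
p_tag_given_prev = {("VB", "TO"): 0.34, ("NN", "TO"): 0.021}
p_word_given_tag = {("race", "VB"): 0.00003, ("race", "NN"): 0.00041}

def best_tag(word, prev_tag, candidates):
    """ti = argmax_j P(tj|ti-1) * P(wi|tj), for a single word."""
    return max(candidates,
               key=lambda t: p_tag_given_prev[(t, prev_tag)]
                             * p_word_given_tag[(word, t)])
```

Here `best_tag("race", "TO", ["VB", "NN"])` picks VB, because 0.34 × 0.00003 exceeds 0.021 × 0.00041, matching the slide's conclusion.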
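The slide answers "how do we compute the most probable tag sequence?" with "Viterbi" but gives no details. A minimal sketch of Viterbi decoding for the bigram HMM follows; the tiny probability tables are invented for illustration, not estimated from any corpus.

```python
def viterbi(words, tags, p_init, p_trans, p_emit):
    """Most probable tag sequence under a bigram HMM:
    argmax_T [P(t1) P(w1|t1)] * prod_i [P(ti|ti-1) P(wi|ti)]."""
    # best[t]: probability of the best tag path ending in tag t
    best = {t: p_init[t] * p_emit[t].get(words[0], 0.0) for t in tags}
    back = []  # back-pointers: one {tag: previous_tag} dict per position
    for w in words[1:]:
        prev_best, best, choices = best, {}, {}
        for t in tags:
            prev = max(tags, key=lambda s: prev_best[s] * p_trans[s][t])
            best[t] = prev_best[prev] * p_trans[prev][t] * p_emit[t].get(w, 0.0)
            choices[t] = prev
        back.append(choices)
    # Recover the path by following back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[t])]
    for choices in reversed(back):
        path.append(choices[path[-1]])
    return list(reversed(path))

# Invented toy model: three tags, two words.
p_init = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
p_trans = {"DT": {"DT": 0.01, "NN": 0.90, "VB": 0.09},
           "NN": {"DT": 0.10, "NN": 0.30, "VB": 0.60},
           "VB": {"DT": 0.50, "NN": 0.30, "VB": 0.20}}
p_emit = {"DT": {"the": 0.9}, "NN": {"race": 0.5}, "VB": {"race": 0.2}}
```

Unlike the per-word argmax, this maximizes over whole tag sequences, so a locally weaker tag can win if it leads to a better continuation.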
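Although the preview cuts off mid-rule, the shape of a Brill transformation ("change tag A to tag B in a certain environment") can still be sketched. The rule instance and the start tagging below are illustrative assumptions, not rules from the lecture.

```python
def apply_transformation(tagged, from_tag, to_tag, prev_tag):
    """Apply one Brill-style transformation over a (word, tag) sequence:
    change from_tag to to_tag whenever the previous word's tag is prev_tag."""
    out = list(tagged)
    for i in range(1, len(out)):
        word, tag = out[i]
        if tag == from_tag and out[i - 1][1] == prev_tag:
            out[i] = (word, to_tag)
    return out

# Step 1 of the algorithm: start from each word's most probable tag
# (assume here that "race" starts out tagged NN).
start = [("to", "TO"), ("race", "NN"), ("tomorrow", "NN")]
# Step 2: a transformation "NN -> VB when the previous tag is TO" repairs it.
```

In the full learner, transformations like this are proposed from templates and ranked by how much each one reduces tagging errors on the training corpus.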

