Lectures #16 & 17: Part of Speech Tagging, Hidden Markov Models
CS 601R, section 2: Statistical Natural Language Processing
Thanks to Dan Klein of UC Berkeley for many of the materials used in this lecture.

Last Time
- Maximum entropy models
- A technique for estimating multinomial distributions conditioned on many features
- A building block of many NLP systems

    P(c | d, λ) = exp( Σ_i λ_i f_i(c, d) ) / Σ_c' exp( Σ_i λ_i f_i(c', d) )

Goals
- To be able to model sequences
- Application: part-of-speech tagging
- Technique: hidden Markov models (HMMs)
- Think of this as sequential classification

Parts-of-Speech
- Syntactic classes of words
- Useful distinctions vary from language to language
- Tagsets vary from corpus to corpus [See M+S p. 142]
- Some tags from the Penn tagset:

    CC    conjunction, coordinating                     and both but either or
    CD    numeral, cardinal                             mid-1890 nine-thirty 0.5 one
    DT    determiner                                    a all an every no that the
    EX    existential there                             there
    FW    foreign word                                  gemeinschaft hund ich jeux
    IN    preposition or conjunction, subordinating     among whether out on by if
    JJ    adjective or numeral, ordinal                 third ill-mannered regrettable
    JJR   adjective, comparative                        braver cheaper taller
    JJS   adjective, superlative                        bravest cheapest tallest
    MD    modal auxiliary                               can may might will would
    NN    noun, common, singular or mass                cabbage thermostat investment subhumanity
    NNP   noun, proper, singular                        Motown Cougar Yvette Liverpool
    NNPS  noun, proper, plural                          Americans Materials States
    NNS   noun, common, plural                          undergraduates bric-a-brac averages
    POS   genitive marker                               ' 's
    PRP   pronoun, personal                             hers himself it we them
    PRP$  pronoun, possessive                           her his mine my our ours their thy your
    RB    adverb                                        occasionally maddeningly adventurously
    RBR   adverb, comparative                           further gloomier heavier less-perfectly
    RBS   adverb, superlative                           best biggest nearest worst
    RP    particle                                      aboard away back by on open through
    TO    "to" as preposition or infinitive marker      to
    UH    interjection                                  huh howdy uh whammo shucks heck
    VB    verb, base form                               ask bring fire see take
    VBD   verb, past tense                              pleaded swiped registered saw
    VBG   verb, present participle or gerund            stirring focusing approaching erasing
    VBN   verb, past participle                         dilapidated imitated reunified unsettled
    VBP   verb, present tense, not 3rd person singular  twist appear comprise mold postpone
    VBZ   verb, present tense, 3rd person singular      bases reconstructs marks uses
    WDT   WH-determiner                                 that what whatever which whichever
    WP    WH-pronoun                                    that what whatever which who whom
    WP$   WH-pronoun, possessive                        whose
    WRB   WH-adverb                                     however whenever where why

Part-of-Speech Ambiguity
- Example (each word's possible tags listed below it):

    Fed   raises   interest   rates   0.5   percent
    NNP   NNS      NN         NNS     CD    NN
    VBN   VBZ      VBP        VBZ
    VBD            VB

- Two basic sources of constraint:
  - Grammatical environment
  - Identity of the current word
- Many more possible features:
  - … but we won't be able to use them until next class

Why POS Tagging?
- Useful in and of itself
  - Text-to-speech: record, lead
  - Lemmatization: saw[v] → see, saw[n] → saw
  - Quick-and-dirty NP-chunk detection: grep {JJ | NN}* {NN | NNS}
- Useful as a pre-processing step for parsing
  - Less tag ambiguity means fewer parses
  - However, some tag choices are better decided by parsers!
    The  average  of  interbank  offered  rates  plummeted …
    DT   NN       IN  NN         VBD      NNS    VBD
                                 (VBN)

    The  Georgia  branch  had  taken  on   loan  commitments …
    DT   NNP      NN      VBD  VBN    RP   NN    NNS
                                      (IN)

HMMs
- We want a generative model over tag sequences T and observations W, using states s
- Assumptions:
  - The tag sequence is generated by an order-n Markov model; this corresponds to a 1st-order model over tag n-grams (below, n = 2: trigram transitions over tags, so states are tag bigrams)
  - Words are chosen independently, conditioned only on the tag
  - These are totally broken assumptions: why?

    P(T, W) = Π_i P(t_i | t_{i-2}, t_{i-1}) · P(w_i | t_i)

    P(T, W) = Π_i P(s_i | s_{i-1}) · P(w_i | s_i)

    [Figure: state chain s_0 → s_1 → … → s_n emitting w_1 … w_n, with s_1 = <·, t_1>, s_2 = <t_1, t_2>, …, s_n = <t_{n-1}, t_n>.]

Parameter Estimation
- Need two multinomials:
  - Transitions: P(t_i | t_{i-2}, t_{i-1})
  - Emissions: P(w_i | t_i)
- Can get these off a collection of tagged sentences

Practical Issues with Estimation
- Use standard smoothing methods to estimate transition scores, e.g.:

    P(t_i | t_{i-2}, t_{i-1}) = λ2 · P̂(t_i | t_{i-2}, t_{i-1}) + λ1 · P̂(t_i | t_{i-1})

- Emissions are trickier:
  - Words we've never seen before
  - Words which occur with tags we've never seen
- One option: break out the Good-Turing smoothing
- Issue: words aren't black boxes: 343,127.23  11-year  Minteria  reintroducible
- Another option: decompose words into features and use a maxent model along with Bayes' rule:

    P(w | t) = P_MAXENT(t | w) · P(w) / P(t)

Disambiguation
- Given these two multinomials, we can score any word/tag sequence pair:

    Fed   raises   interest   rates   0.5   percent   .
    NNP   VBZ      NN         NNS     CD    NN        .

    P(NNP | <,>) · P(Fed | NNP) · P(VBZ | <,NNP>) · P(raises | VBZ) · P(NN | <NNP,VBZ>) · …

- In principle, we're done: list all possible tag sequences, score each one, pick the best one (the Viterbi state sequence):

    NNP VBZ NN NNS CD NN    logP = -23
    NNP NNS NN NNS CD NN    logP = -29
    NNP VBZ VB NNS CD NN    logP = -27

    States: <,> → <,NNP> → <NNP,VBZ> → <VBZ,NN> → <NN,NNS> → <NNS,CD> → <CD,NN> → <STOP>

Finding the Best Trajectory
- Too many
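The Disambiguation slide's "list all possible tag sequences, score each one" recipe can be sketched directly. This is a minimal illustration, not the lecture's code: the probability tables, tag set, and function names below are made-up assumptions chosen so that the toy example has one clearly best sequence.

```python
import itertools
import math

# Toy trigram-HMM tables (illustrative numbers, not corpus estimates).
# '<s>' pads the left context for the first two positions.
transitions = {  # P(tag | two previous tags)
    ('<s>', '<s>', 'NNP'): 0.4,
    ('<s>', 'NNP', 'VBZ'): 0.3,
    ('<s>', 'NNP', 'NNS'): 0.1,
    ('NNP', 'VBZ', 'NN'): 0.5,
    ('NNP', 'NNS', 'NN'): 0.2,
}
emissions = {  # P(word | tag)
    ('Fed', 'NNP'): 0.01,
    ('raises', 'VBZ'): 0.02,
    ('raises', 'NNS'): 0.01,
    ('interest', 'NN'): 0.03,
}

def score(words, tags):
    """Joint log-probability log P(T, W) = sum_i log P(t_i | t_{i-2}, t_{i-1}) + log P(w_i | t_i)."""
    logp = 0.0
    context = ('<s>', '<s>')
    for w, t in zip(words, tags):
        p_trans = transitions.get((context[0], context[1], t), 0.0)
        p_emit = emissions.get((w, t), 0.0)
        if p_trans == 0.0 or p_emit == 0.0:
            return float('-inf')  # unseen event: impossible under the toy tables
        logp += math.log(p_trans) + math.log(p_emit)
        context = (context[1], t)
    return logp

def brute_force_best(words, tagset):
    """Enumerate all |tagset|^n tag sequences and keep the best-scoring one."""
    best = max(itertools.product(tagset, repeat=len(words)),
               key=lambda tags: score(words, tags))
    return best, score(words, best)

words = ['Fed', 'raises', 'interest']
tags, logp = brute_force_best(words, ['NNP', 'VBZ', 'NNS', 'NN'])
```

On this toy input the enumeration picks NNP VBZ NN, mirroring the slide's example. The cost is |tagset|^n sequences, which is exactly the "too many trajectories" problem the next slide raises.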
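Since brute-force enumeration is exponential, the Viterbi state sequence named on the Disambiguation slide is instead found by dynamic programming. A minimal sketch, assuming a bigram HMM (states are single tags rather than the lecture's tag pairs, to keep the code short) and made-up probability tables:

```python
import math

# Illustrative bigram-HMM tables (not corpus estimates).
start = {'NNP': 0.6, 'NN': 0.4}                      # P(t_1)
trans = {('NNP', 'VBZ'): 0.5, ('NNP', 'NNS'): 0.2,
         ('VBZ', 'NN'): 0.6, ('NNS', 'NN'): 0.3}     # P(t_i | t_{i-1})
emit = {('NNP', 'Fed'): 0.01, ('VBZ', 'raises'): 0.02,
        ('NNS', 'raises'): 0.01, ('NN', 'interest'): 0.03}

def viterbi(words, tags):
    """Best tag sequence via max-product dynamic programming, O(n * |tags|^2)."""
    def logp(p):
        return math.log(p) if p > 0 else float('-inf')

    # delta[t] = best log-score of any tag prefix ending in tag t
    delta = {t: logp(start.get(t, 0)) + logp(emit.get((t, words[0]), 0))
             for t in tags}
    back = []  # back[i][t] = best predecessor of t at position i+1
    for w in words[1:]:
        new_delta, pointers = {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda p: delta[p] + logp(trans.get((p, t), 0)))
            new_delta[t] = (delta[best_prev] + logp(trans.get((best_prev, t), 0))
                            + logp(emit.get((t, w), 0)))
            pointers[t] = best_prev
        back.append(pointers)
        delta = new_delta

    # Follow back-pointers from the best final state
    best = max(tags, key=lambda t: delta[t])
    path = [best]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path)), delta[best]
```

Each trellis column keeps only the best score per state, so the exponential enumeration collapses to a table fill; extending this to the lecture's trigram model just means making each state a tag bigram <t_{i-1}, t_i>.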