Hidden Markov Models and the Forward-Backward Algorithm
600.465 - Intro to NLP - J. Eisner

Outline:
- Please See the Spreadsheet
- Marginalization
- Conditionalization
- Marginalization & conditionalization in the weather example
- Interesting probabilities in the weather example
- The HMM trellis / Computing α values
- The HMM trellis / Computing β values
- Computing state probabilities / Computing arc probabilities
- Posterior tagging
- Alternative: Viterbi tagging (max-product instead of sum-product)

Please See the Spreadsheet
- I like to teach this material using an interactive spreadsheet:
  http://cs.jhu.edu/~jason/papers/#tnlp02
  (has the spreadsheet and the lesson plan)
- I'll also show the following slides at appropriate points.

Marginalization

  SALES       Jan   Feb   Mar   Apr   ...   TOTAL
  Widgets       5     0     3     2   ...      30
  Grommets      7     3    10     8   ...      80
  Gadgets       0     0     1     0   ...       2
  ...         ...   ...   ...   ...   ...
  TOTAL        99    25   126    90          1000

Write the totals in the margins; 1000 is the grand total.

Dividing every entry by the grand total turns the counts into probabilities.
Given a random sale, what & when was it?

  prob.       Jan   Feb   Mar   Apr   ...   TOTAL
  Widgets    .005     0  .003  .002   ...    .030
  Grommets   .007  .003  .010  .008   ...    .080
  Gadgets       0     0  .001     0   ...    .002
  ...         ...   ...   ...   ...   ...
  TOTAL      .099  .025  .126  .090         1.000

- marginal prob: p(Jan) = .099
- marginal prob: p(widget) = .030
- joint prob: p(Jan, widget) = .005
- marginal prob: p(anything in table) = 1.000
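The margin arithmetic above is mechanical; here is a minimal sketch in Python, using only the table entries shown (the "..." rows and columns are omitted, so these sums cover the visible entries rather than the slides' full totals).

```python
# The visible part of the probability table as a joint distribution
# p(product, month). Entries hidden behind "..." on the slide are omitted.
joint = {
    ("Widgets",  "Jan"): 0.005, ("Widgets",  "Feb"): 0.000,
    ("Widgets",  "Mar"): 0.003, ("Widgets",  "Apr"): 0.002,
    ("Grommets", "Jan"): 0.007, ("Grommets", "Feb"): 0.003,
    ("Grommets", "Mar"): 0.010, ("Grommets", "Apr"): 0.008,
    ("Gadgets",  "Jan"): 0.000, ("Gadgets",  "Feb"): 0.000,
    ("Gadgets",  "Mar"): 0.001, ("Gadgets",  "Apr"): 0.000,
}

def marginal_month(month):
    # Marginalize out the product: sum a column of the table.
    return sum(p for (_, m), p in joint.items() if m == month)

def marginal_product(product):
    # Marginalize out the month: sum a row of the table.
    return sum(p for (prod, _), p in joint.items() if prod == product)

def conditional(product, month):
    # Conditionalize: divide the joint entry by the column's marginal.
    return joint[(product, month)] / marginal_month(month)
```

With the full table, `marginal_month("Jan")` would give the slides' .099; on the visible entries alone it gives .005 + .007 + 0 = .012, and `conditional("Widgets", "Jan")` mirrors the slides' .005/.099 computation.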
Conditionalization

Given a random sale in Jan., what was it?

  prob.       Jan   Feb   Mar   Apr   ...   TOTAL  |  p(... | Jan)
  Widgets    .005     0  .003  .002   ...    .030  |  .005/.099
  Grommets   .007  .003  .010  .008   ...    .080  |  .007/.099
  Gadgets       0     0  .001     0   ...    .002  |  0
  ...         ...   ...   ...   ...   ...          |  ...
  TOTAL      .099  .025  .126  .090         1.000  |  .099/.099

- marginal prob: p(Jan) = .099
- joint prob: p(Jan, widget) = .005
- conditional prob: p(widget | Jan) = .005/.099
- Divide the Jan column through by Z = 0.099 so that it sums to 1.

Marginalization & conditionalization in the weather example
- Instead of a 2-dimensional table, now we have a 66-dimensional table:
  - 33 of the dimensions have 2 choices: {C,H} (each day's weather, Cold or Hot)
  - 33 of the dimensions have 3 choices: {1,2,3} (each day's ice cream count)
- Cross-section showing just 3 of the dimensions:

                 Weather2=C   Weather2=H
    IceCream2=1    0.000...     0.000...
    IceCream2=2    0.000...     0.000...
    IceCream2=3    0.000...     0.000...

Interesting probabilities in the weather example
- Prior probability of weather: p(Weather=CHH...)
- Posterior probability of weather (after observing evidence):
  p(Weather=CHH... | IceCream=233...)
- Posterior marginal probability that day 3 is hot:
  p(Weather3=H | IceCream=233...)
  = Σ over w such that w3=H of p(Weather=w | IceCream=233...)
- Posterior conditional probability that day 3 is hot if day 2 is:
  p(Weather3=H | Weather2=H, IceCream=233...)

The HMM trellis
The dynamic programming computation of α works forward from Start.

Edge weights (Day 1: 2 cones; Days 2 and 3: 3 cones each):
- p(H|Start) * p(2|H) = 0.5 * 0.2 = 0.1
- p(C|Start) * p(2|C) = 0.5 * 0.2 = 0.1
- p(H|H) * p(3|H) = 0.8 * 0.7 = 0.56
- p(C|C) * p(3|C) = 0.8 * 0.1 = 0.08
- p(H|C) * p(3|H) = 0.1 * 0.7 = 0.07
- p(C|H) * p(3|C) = 0.1 * 0.1 = 0.01

- This "trellis" graph has 2^33 paths. These represent all possible weather sequences that could explain the observed ice cream sequence 2, 3, 3, ...
- What is the product of all the edge weights on one path H, H, H, ...? The edge weights are chosen so that it equals p(weather=H,H,H,... & icecream=2,3,3,...).
- What is the probability at each state? It's the total probability of all paths from Start to that state.
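The per-state totals on this trellis can be reproduced with a short forward pass. A minimal sketch, assuming the transition and emission probabilities shown on the trellis edges; p(1|H) = 0.1 and p(1|C) = 0.7 are inferred so that each emission distribution sums to 1.

```python
# Transition and emission probabilities from the trellis edges.
# p(1|H) = 0.1 and p(1|C) = 0.7 are inferred (assumption), chosen so
# each state's emission probabilities sum to 1.
trans = {("Start", "H"): 0.5, ("Start", "C"): 0.5,
         ("H", "H"): 0.8, ("H", "C"): 0.1,
         ("C", "H"): 0.1, ("C", "C"): 0.8}
emit = {("H", 1): 0.1, ("H", 2): 0.2, ("H", 3): 0.7,
        ("C", 1): 0.7, ("C", 2): 0.2, ("C", 3): 0.1}

def forward(observations):
    """alphas[t][s] = total probability of all trellis paths from Start
    to state s that emit the first t+1 observations."""
    states = ("H", "C")
    alphas = []
    for t, obs in enumerate(observations):
        alpha = {}
        for s in states:
            if t == 0:
                alpha[s] = trans[("Start", s)] * emit[(s, obs)]
            else:
                alpha[s] = sum(alphas[-1][prev] * trans[(prev, s)] * emit[(s, obs)]
                               for prev in states)
        alphas.append(alpha)
    return alphas

alphas = forward([2, 3, 3])
# Day 1: alpha(H) = alpha(C) = 0.1; Day 2: alpha(H) = 0.063, alpha(C) = 0.009;
# Day 3: alpha(H) = 0.03591, alpha(C) = 0.00135 -- matching the trellis.
```

Each day's α values are computed from the previous day's in O(#states^2) time, so the 2^33 paths never have to be enumerated.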
- How can we compute it fast when there are many paths? Work left to right, reusing each state's total. The α values:
  - α(Start) = 1
  - Day 1: α(H) = 0.1, α(C) = 0.1
  - Day 2: α(H) = 0.1*0.07 + 0.1*0.56 = 0.063
           α(C) = 0.1*0.08 + 0.1*0.01 = 0.009
  - Day 3: α(H) = 0.009*0.07 + 0.063*0.56 = 0.03591
           α(C) = 0.009*0.08 + 0.063*0.01 = 0.00135

Computing α Values
Suppose the paths into yesterday's C state sum to α1 = a+b+c, the paths into yesterday's H state sum to α2 = d+e+f, and arcs of weight p1 (from C) and p2 (from H) lead to today's C state. All paths to today's C state:

  (a*p1 + b*p1 + c*p1) + (d*p2 + e*p2 + f*p2) = α1*p1 + α2*p2

Thanks, distributive law!

The HMM trellis
The dynamic programming computation of β works back from Stop (Day 34: we lose the diary).

Edge weights (Days 32 and 33: 2 cones each):
- p(C|C) * p(2|C) = 0.8 * 0.2 = 0.16
- p(H|H) * p(2|H) = 0.8 * 0.2 = 0.16
- p(H|C) * p(2|H) = 0.1 * 0.2 = 0.02
- p(C|H) * p(2|C) = 0.1 * 0.2 = 0.02
- p(Stop|C) = 0.1, p(Stop|H) = 0.1

- What is the probability at each state? It's the total probability of all paths from that state to Stop.
- How can we compute it fast when there are many paths? Work right to left. The β values:
  - Last day shown: β(C) = β(H) = 0.1 (just the Stop arc)
  - One day earlier: β(C) = 0.16*0.1 + 0.02*0.1 = 0.018
                     β(H) = 0.16*0.1 + 0.02*0.1 = 0.018
  - Two days earlier: β(C) = 0.16*0.018 + 0.02*0.018 = 0.00324
                      β(H) = 0.16*0.018 + 0.02*0.018 = 0.00324

Computing β Values
The mirror image: suppose arcs of weight p1 and p2 lead from state C to two next-day states whose outgoing paths sum to β1 = u+v+w and β2 = x+y+z. All paths from C:

  (p1*u + p1*v + p1*w) + (p2*x + p2*y + p2*z) = p1*β1 + p2*β2

Computing State Probabilities
All paths through state C pair one incoming path (a, b, or c) with one outgoing path (x, y, or z):

  a*x + a*y + a*z + b*x + b*y + b*z + c*x + c*y + c*z
  = (a+b+c)(x+y+z)
  = α(C) β(C)

Thanks, distributive law!

Computing Arc Probabilities
All paths through an arc of weight p from H to C:

  a*p*x + a*p*y + a*p*z + b*p*x + b*p*y + b*p*z + c*p*x + c*p*y + c*p*z
  = (a+b+c) p (x+y+z)
  = α(H) p β(C)

Thanks, distributive law!

Posterior tagging
- Give each word its highest-prob tag according to forward-backward. Do this independently of the other words.

    p(Det Adj) = 0.35
    p(Det N)   = 0.2
    p(N V)     = 0.45

- Output is Det V -- a sequence with probability 0!
- Defensible: maximizes expected # of correct tags.
- But not a coherent sequence.
- May screw up subsequent processing (e.g., can't find any parse).
- Expected # of correct tags:
    Det Adj: 0.55 + 0.35 = 0.9
    Det N:   0.55 + 0.2  = 0.75
    N V:     0.45 + 0.45 = 0.9
    Det V:   0.55 + 0.45 = 1.0

Alternative: Viterbi tagging
- Rather than tagging each word independently as in posterior tagging, pick the single most probable tag sequence: max-product instead of sum-product.
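The two strategies can be contrasted directly on the toy distribution above. A sketch (not the slides' code): posterior tagging takes per-word marginals and picks each word's tag independently, while the Viterbi answer is the single most probable sequence, computed here by brute-force argmax over the three enumerated sequences rather than the max-product trellis recurrence.

```python
from collections import defaultdict

# The slide's toy distribution over whole tag sequences for a 2-word sentence.
seq_prob = {("Det", "Adj"): 0.35, ("Det", "N"): 0.2, ("N", "V"): 0.45}

def posterior_tagging(seq_prob):
    # Per-word posterior marginals, then the best tag for each word
    # independently (maximizes the expected number of correct tags).
    length = len(next(iter(seq_prob)))
    marginals = [defaultdict(float) for _ in range(length)]
    for seq, p in seq_prob.items():
        for i, tag in enumerate(seq):
            marginals[i][tag] += p
    return tuple(max(m, key=m.get) for m in marginals)

def viterbi_tagging(seq_prob):
    # The single most probable whole sequence (max-product answer),
    # found by exhaustive search since the sequences are enumerated.
    return max(seq_prob, key=seq_prob.get)

print(posterior_tagging(seq_prob))  # ('Det', 'V'): probability 0 as a sequence!
print(viterbi_tagging(seq_prob))    # ('N', 'V'): probability 0.45, coherent
```

Posterior tagging's Det V has expected 1.0 correct tags but zero probability as a sequence; Viterbi's N V is coherent but expects only 0.9 correct tags, which is the slide's trade-off.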
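Putting the pieces together, a sketch of the full forward-backward computation on the ice-cream HMM: α from the forward pass, β from the backward pass, and posterior state probabilities α·β/Z, where Z = p(observations) is the total weight of all paths (the α·β product from the state-probability slide). The transition and emission numbers are the slides'; p(1|H) = 0.1 and p(1|C) = 0.7 are inferred so each emission distribution sums to 1.

```python
# Forward-backward sketch. p(1|H) and p(1|C) are inferred (assumption);
# all other numbers come from the trellis slides, including
# p(Stop|H) = p(Stop|C) = 0.1 from the backward slide.
STATES = ("H", "C")
trans = {("Start", "H"): 0.5, ("Start", "C"): 0.5,
         ("H", "H"): 0.8, ("H", "C"): 0.1, ("H", "Stop"): 0.1,
         ("C", "H"): 0.1, ("C", "C"): 0.8, ("C", "Stop"): 0.1}
emit = {("H", 1): 0.1, ("H", 2): 0.2, ("H", 3): 0.7,
        ("C", 1): 0.7, ("C", 2): 0.2, ("C", 3): 0.1}

def forward_backward(obs):
    n = len(obs)
    alpha = [dict() for _ in range(n)]
    beta = [dict() for _ in range(n)]
    # Forward: alpha[t][s] = total weight of paths from Start to s at time t.
    for t in range(n):
        for s in STATES:
            if t == 0:
                alpha[t][s] = trans[("Start", s)] * emit[(s, obs[t])]
            else:
                alpha[t][s] = sum(alpha[t-1][r] * trans[(r, s)] * emit[(s, obs[t])]
                                  for r in STATES)
    # Backward: beta[t][s] = total weight of paths from s at time t to Stop.
    for t in reversed(range(n)):
        for s in STATES:
            if t == n - 1:
                beta[t][s] = trans[(s, "Stop")]
            else:
                beta[t][s] = sum(trans[(s, r)] * emit[(r, obs[t+1])] * beta[t+1][r]
                                 for r in STATES)
    # Z = p(obs) = total weight of all Start-to-Stop paths; same at every t.
    Z = sum(alpha[0][s] * beta[0][s] for s in STATES)
    # Posterior state probabilities: all paths through (s, t), normalized.
    posterior = [{s: alpha[t][s] * beta[t][s] / Z for s in STATES}
                 for t in range(n)]
    return posterior, Z

posterior, Z = forward_backward([2, 3, 3])
```

On the observation sequence 2, 3, 3, the final-day α values 0.03591 and 0.00135 each meet a β of 0.1 (the Stop arc), giving Z = 0.003726 and a day-3 posterior for H of 0.003591/0.003726 ≈ 0.964.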