600.465 - Intro to NLP - J. Eisner

Bayes' Theorem
Let's revisit this.

Remember Language ID?
• Let p(X) = probability of text X in English
• Let q(X) = probability of text X in Polish
• Which probability is higher?
  – (we'd also like a bias toward English, since it's more likely a priori – ignore that for now)
• "Horses and Lukasiewicz are on the curriculum."
  p(x1=h, x2=o, x3=r, x4=s, x5=e, x6=s, …)

Bayes' Theorem
• p(A | B) = p(B | A) * p(A) / p(B)
• Easy to check by removing the syntactic sugar
• Use 1: Converts p(B | A) to p(A | B)
• Use 2: Updates p(A) to p(A | B)
• Stare at it so you'll recognize it later

Language ID
• Given a sentence x, I suggested comparing its probability in different languages:
  p(SENT=x | LANG=english)   (i.e., p_english(SENT=x))
  p(SENT=x | LANG=polish)    (i.e., p_polish(SENT=x))
  p(SENT=x | LANG=xhosa)     (i.e., p_xhosa(SENT=x))
• But surely for language ID we should compare
  p(LANG=english | SENT=x)
  p(LANG=polish | SENT=x)
  p(LANG=xhosa | SENT=x)

Language ID
• For language ID we should compare the a posteriori probabilities
  p(LANG=english | SENT=x)
  p(LANG=polish | SENT=x)
  p(LANG=xhosa | SENT=x)
• For ease, multiply by p(SENT=x) and compare the joint probabilities
  p(LANG=english, SENT=x)
  p(LANG=polish, SENT=x)
  p(LANG=xhosa, SENT=x)
• We must know the prior probabilities; then rewrite each joint as a priori * likelihood (what we had before):
  p(LANG=english) * p(SENT=x | LANG=english)
  p(LANG=polish) * p(SENT=x | LANG=polish)
  p(LANG=xhosa) * p(SENT=x | LANG=xhosa)
• The sum of these joints is a way to find p(SENT=x); we can divide back by that to get the posterior probs.
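The prior-times-likelihood recipe above can be sketched in a few lines of Python; the hypothesis names and probability values in the usage example are illustrative placeholders, not from the lecture:

```python
def posteriors(priors, likelihoods):
    """Turn p(H) and p(evidence | H) into p(H | evidence) by Bayes' Theorem.

    Each joint p(H, evidence) = p(H) * p(evidence | H); dividing every joint
    by p(evidence) = sum of all the joints renormalizes them to sum to 1.
    """
    joints = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joints.values())  # p(evidence): total over all hypotheses
    return {h: joints[h] / evidence for h in joints}

# Illustrative (made-up) numbers: two hypotheses, equal priors.
post = posteriors({"h1": 0.5, "h2": 0.5}, {"h1": 0.2, "h2": 0.6})
```

Dividing by the evidence is the "normalize" step from the slide: the joints are rescaled by a constant so the posteriors sum to 1.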
Let's try it!
• The model: "First we pick a random LANG, then we roll a random SENT with the LANG dice."
• Prior probs, from a very simple model – a single die whose sides are the languages of the world:
  p(LANG=english) = 0.7   (best)
  p(LANG=polish) = 0.2
  p(LANG=xhosa) = 0.1
• Likelihoods, from a set of trigram dice (actually 3 sets, one per language):
  p(SENT=x | LANG=english) = 0.00001
  p(SENT=x | LANG=polish) = 0.00004
  p(SENT=x | LANG=xhosa) = 0.00005   (best)
• Joint probabilities (prior * likelihood):
  p(LANG=english, SENT=x) = 0.000007
  p(LANG=polish, SENT=x) = 0.000008   (best compromise)
  p(LANG=xhosa, SENT=x) = 0.000005
• Probability of evidence – add up the joints to get the total probability of getting SENT=x one way or another:
  p(SENT=x) = 0.000020
• Posterior probabilities – normalize (divide by a constant so they'll sum to 1); given the evidence SENT=x, the possible languages sum to 1:
  p(LANG=english | SENT=x) = 0.000007/0.000020 = 7/20
  p(LANG=polish | SENT=x) = 0.000008/0.000020 = 8/20   (best)
  p(LANG=xhosa | SENT=x) = 0.000005/0.000020 = 5/20

General Case ("noisy channel")
• The "noisy channel" messes up a into b; the "decoder" recovers the most likely reconstruction of a:
  a –[p(A=a)]→ noisy channel –[p(B=b | A=a)]→ b → decoder → most likely a
• Examples: language → text, text → speech, spelled → misspelled, English → French
• Maximize p(A=a | B=b)
  = p(A=a) p(B=b | A=a) / p(B=b)
  = p(A=a) p(B=b | A=a) / Σ_a' p(A=a') p(B=b | A=a')
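A minimal sketch of the decoder, reusing the language-ID numbers from the slides above. Since p(B=b) is the same for every candidate a, the argmax over posteriors equals the argmax over joints, so the denominator never needs to be computed:

```python
def decode(priors, likelihoods):
    """Noisy-channel decoding: return the candidate a maximizing p(A=a | B=b).

    p(A=a | B=b) = p(A=a) * p(B=b | A=a) / p(B=b), and p(B=b) is shared by
    all candidates, so it suffices to maximize the joint probability.
    """
    return max(priors, key=lambda a: priors[a] * likelihoods[a])

# The language-ID example: priors from the language die,
# likelihoods p(SENT=x | LANG) from the trigram dice.
priors = {"english": 0.7, "polish": 0.2, "xhosa": 0.1}
likelihoods = {"english": 0.00001, "polish": 0.00004, "xhosa": 0.00005}
```

Here decode(priors, likelihoods) returns "polish": neither the highest prior (english) nor the highest likelihood (xhosa), but the best compromise between the two.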
Language ID (recap)
• For language ID we compare the a posteriori probabilities p(LANG=ℓ | SENT=x).
• For ease, multiply by p(SENT=x) and compare the joints p(LANG=ℓ, SENT=x), which we find (we need the prior probs!) as a priori * likelihood: p(LANG=ℓ) * p(SENT=x | LANG=ℓ).

General Case ("noisy channel")
• We want the most likely A to have generated the evidence B, so compare the a posteriori probabilities
  p(A=a1 | B=b)
  p(A=a2 | B=b)
  p(A=a3 | B=b)
• For ease, multiply by p(B=b) and compare the joint probabilities
  p(A=a1, B=b)
  p(A=a2, B=b)
  p(A=a3, B=b)
• which we find as follows (we need the prior probs!):
  p(A=a1) * p(B=b | A=a1)
  p(A=a2) * p(B=b | A=a2)
  p(A=a3) * p(B=b | A=a3)

Speech Recognition
• For baby speech recognition we should compare
  p(MEANING=gimme | SOUND=uhh)
  p(MEANING=changeme | SOUND=uhh)
  p(MEANING=loveme | SOUND=uhh)
• For ease, multiply by p(SOUND=uhh) and compare
  p(MEANING=gimme, SOUND=uhh)
  p(MEANING=changeme, SOUND=uhh)
  p(MEANING=loveme, SOUND=uhh)
• which we find as follows (we need the prior probs!):
  p(MEAN=gimme) * p(SOUND=uhh | MEAN=gimme)
  p(MEAN=changeme) * p(SOUND=uhh | MEAN=changeme)
  p(MEAN=loveme) * p(SOUND=uhh | MEAN=loveme)

Life or Death!
• Does Epitaph have hoof-and-mouth disease? He tested positive – oh no! And the false-positive rate is only 5%.
• p(hoof) = 0.001, so p(¬hoof) = 0.999
• p(positive test | ¬hoof) = 0.05   ("false positive")
• p(negative test | hoof) = x ≈ 0   ("false negative"), so p(positive test | hoof) = 1 − x ≈ 1
• What is p(hoof | positive test)? Don't panic – it's still very small: under 2% for any x.
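Plugging the slide's numbers into Bayes' Theorem shows why not to panic. This sketch treats the unknown false-negative rate x as a parameter:

```python
def p_hoof_given_positive(x):
    """p(hoof | positive test) by Bayes, where x = p(negative test | hoof)."""
    p_hoof = 0.001                  # prior from the slide
    p_pos_given_hoof = 1.0 - x      # p(positive test | hoof)
    p_pos_given_no_hoof = 0.05      # false-positive rate from the slide
    joint_hoof = p_hoof * p_pos_given_hoof
    joint_no_hoof = (1.0 - p_hoof) * p_pos_given_no_hoof
    # Normalize by the probability of the evidence (a positive test).
    return joint_hoof / (joint_hoof + joint_no_hoof)
```

Even in the worst case x = 0 (no false negatives at all), p_hoof_given_positive(0.0) ≈ 0.0196: the positive test raises the probability roughly twentyfold, but it is still under 2%, because the 0.001 prior is so small relative to the 5% false-positive rate.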