Bayes’ Theorem
600.465 - Intro to NLP - J. Eisner

Outline: Bayes’ Theorem · Remember Language ID? · Language ID · Let’s try it! · General Case (“noisy channel”) · Speech Recognition · Life or Death!

Bayes’ Theorem
• Let’s revisit this.

Remember Language ID?
• Let p(X) = probability of text X in English
• Let q(X) = probability of text X in Polish
• Which probability is higher?
  – (we’d also like a bias toward English since it’s more likely a priori – ignore that for now)
• “Horses and Lukasiewicz are on the curriculum.”
  p(x1=h, x2=o, x3=r, x4=s, x5=e, x6=s, …)

Bayes’ Theorem
• p(A | B) = p(B | A) * p(A) / p(B)
• Easy to check by removing the syntactic sugar
• Use 1: Converts p(B | A) to p(A | B)
• Use 2: Updates p(A) to p(A | B)
• Stare at it so you’ll recognize it later

Language ID
• Given a sentence x, I suggested comparing its probability in different languages:
  – p(SENT=x | LANG=english)   (i.e., p_english(SENT=x))
  – p(SENT=x | LANG=polish)    (i.e., p_polish(SENT=x))
  – p(SENT=x | LANG=xhosa)     (i.e., p_xhosa(SENT=x))
• But surely for language ID we should compare
  – p(LANG=english | SENT=x)
  – p(LANG=polish | SENT=x)
  – p(LANG=xhosa | SENT=x)

Language ID
• For language ID we should compare the a posteriori probabilities
  – p(LANG=english | SENT=x)
  – p(LANG=polish | SENT=x)
  – p(LANG=xhosa | SENT=x)
• For ease, multiply by p(SENT=x) and compare the joint probabilities
  – p(LANG=english, SENT=x)
  – p(LANG=polish, SENT=x)
  – p(LANG=xhosa, SENT=x)
• We must know the a priori (prior) probabilities; then rewrite each joint as prior * likelihood
  – p(LANG=english) * p(SENT=x | LANG=english)
  – p(LANG=polish) * p(SENT=x | LANG=polish)
  – p(LANG=xhosa) * p(SENT=x | LANG=xhosa)
• The likelihoods p(SENT=x | LANG=…) are what we had before; the sum of the joint probabilities is a way to find p(SENT=x), and we can divide back by that to get the posterior probabilities.

Let’s try it!
“First we pick a random LANG, then we roll a random SENT with the LANG dice.”
• Prior probabilities – from a very simple model: a single die whose sides are the languages of the world:
  – p(LANG=english) = 0.7   (best)
  – p(LANG=polish) = 0.2
  – p(LANG=xhosa) = 0.1
• Likelihoods – from a set of trigram dice (actually 3 sets, one per language):
  – p(SENT=x | LANG=english) = 0.00001
  – p(SENT=x | LANG=polish) = 0.00004
  – p(SENT=x | LANG=xhosa) = 0.00005   (best)
• Joint probabilities = prior * likelihood:
  – p(LANG=english, SENT=x) = 0.7 * 0.00001 = 0.000007
  – p(LANG=polish, SENT=x) = 0.2 * 0.00004 = 0.000008   (best compromise)
  – p(LANG=xhosa, SENT=x) = 0.1 * 0.00005 = 0.000005
• Probability of evidence: add up the joints – the total over all ways of getting SENT=x, one way or another!
  – p(SENT=x) = 0.000020
• Posterior probabilities: normalize (divide by a constant so they’ll sum to 1); given the evidence SENT=x, the possible languages sum to 1:
  – p(LANG=english | SENT=x) = 0.000007 / 0.000020 = 7/20
  – p(LANG=polish | SENT=x) = 0.000008 / 0.000020 = 8/20   (best)
  – p(LANG=xhosa | SENT=x) = 0.000005 / 0.000020 = 5/20
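The short Python sketch below just replays the arithmetic of the “Let’s try it!” slide with the priors and likelihoods given above; the function name and dictionary layout are illustrative, not part of the course materials.

```python
# Minimal sketch of the "Let's try it!" arithmetic (numbers taken from the slide).
# The helper name `posterior_over_languages` is illustrative, not from the course code.

def posterior_over_languages(prior, likelihood):
    """Turn p(LANG) and p(SENT=x | LANG) into p(LANG | SENT=x) via Bayes' Theorem."""
    # Joint probabilities: p(LANG, SENT=x) = p(LANG) * p(SENT=x | LANG)
    joint = {lang: prior[lang] * likelihood[lang] for lang in prior}
    # Probability of evidence: p(SENT=x) = sum of the joints over all languages
    evidence = sum(joint.values())
    # Posterior: normalize the joints so they sum to 1
    return {lang: joint[lang] / evidence for lang in joint}

prior = {"english": 0.7, "polish": 0.2, "xhosa": 0.1}          # the single "language die"
likelihood = {"english": 1e-5, "polish": 4e-5, "xhosa": 5e-5}  # from the trigram dice

print(posterior_over_languages(prior, likelihood))
# roughly {'english': 0.35, 'polish': 0.4, 'xhosa': 0.25}, i.e. 7/20, 8/20, 5/20
```

The normalizing constant computed along the way is exactly the probability of evidence p(SENT=x) = 0.000020 from the slide, and polish comes out best just as above.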
General Case (“noisy channel”)
• The “noisy channel” messes up a into b; the “decoder” recovers the most likely reconstruction of a:
  a  →  noisy channel  →  b  →  decoder  →  most likely reconstruction of a
  with prior p(A=a) on the input and channel model p(B=b | A=a)
• Examples of (a, b): language → text, text → speech, correctly spelled → misspelled, English → French
• Maximize p(A=a | B=b)
  = p(A=a) p(B=b | A=a) / p(B=b)
  = p(A=a) p(B=b | A=a) / Σ_a’ p(A=a’) p(B=b | A=a’)

General Case (“noisy channel”)
• We want the most likely A to have generated the evidence B, so compare the a posteriori probabilities
  – p(A = a1 | B = b)
  – p(A = a2 | B = b)
  – p(A = a3 | B = b)
• For ease, multiply by p(B=b) and compare the joint probabilities
  – p(A = a1, B = b)
  – p(A = a2, B = b)
  – p(A = a3, B = b)
• which we find as a priori probability times likelihood (we need prior probs!):
  – p(A = a1) * p(B = b | A = a1)
  – p(A = a2) * p(B = b | A = a2)
  – p(A = a3) * p(B = b | A = a3)

Speech Recognition
• For baby speech recognition we should compare the a posteriori probabilities
  – p(MEANING=gimme | SOUND=uhh)
  – p(MEANING=changeme | SOUND=uhh)
  – p(MEANING=loveme | SOUND=uhh)
• For ease, multiply by p(SOUND=uhh) and compare the joint probabilities
  – p(MEANING=gimme, SOUND=uhh)
  – p(MEANING=changeme, SOUND=uhh)
  – p(MEANING=loveme, SOUND=uhh)
• which we find as a priori probability times likelihood (we need prior probs!):
  – p(MEAN=gimme) * p(SOUND=uhh | MEAN=gimme)
  – p(MEAN=changeme) * p(SOUND=uhh | MEAN=changeme)
  – p(MEAN=loveme) * p(SOUND=uhh | MEAN=loveme)

Life or Death!
• Does Epitaph have hoof-and-mouth disease? He tested positive – oh no! The false positive rate is only 5%…
• p(hoof) = 0.001, so p(¬hoof) = 0.999
• p(positive test | ¬hoof) = 0.05   (“false pos”)
• p(negative test | hoof) = x ≈ 0   (“false neg”), so p(positive test | hoof) = 1 − x ≈ 1
• What is p(hoof | positive test)?
• Don’t panic – it’s still very small! < 1/51 for any x.
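As a sanity check on the “Life or Death!” numbers, here is a small Python sketch that plugs the slide’s values into the same prior-times-likelihood recipe; the function name and keyword arguments are hypothetical, introduced only for this illustration.

```python
# A small sketch checking the "Life or Death!" arithmetic with the slide's numbers.
# The helper name `p_hoof_given_positive` and its arguments are illustrative only.

def p_hoof_given_positive(p_hoof=0.001, false_pos=0.05, false_neg=0.0):
    """p(hoof | positive test) by the same prior * likelihood recipe as above."""
    p_not_hoof = 1.0 - p_hoof                   # p(no hoof) = 0.999
    joint_hoof = p_hoof * (1.0 - false_neg)     # p(hoof) * p(positive | hoof)
    joint_not_hoof = p_not_hoof * false_pos     # p(no hoof) * p(positive | no hoof)
    evidence = joint_hoof + joint_not_hoof      # p(positive test), one way or another
    return joint_hoof / evidence

# Even in the worst case for Epitaph (no false negatives at all, x = 0),
# the posterior is only about 0.02 -- don't panic.
print(p_hoof_given_positive(false_neg=0.0))
```

Because the prior p(hoof) is so small, the posterior stays around 2% no matter what x is, which is the point of the slide.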