600.465 - Intro to NLP - J. Eisner
Bayes' Theorem (let's revisit this)

Slide 2: Remember Language ID?
• Let p(X) = probability of text X in English
• Let q(X) = probability of text X in Polish
• Which probability is higher?
  – (we'd also like a bias toward English, since it's more likely a priori; ignore that for now)
• "Horses and Lukasiewicz are on the curriculum."
  p(x1=h, x2=o, x3=r, x4=s, x5=e, x6=s, …)

Slide 3: Bayes' Theorem
• p(A | B) = p(B | A) * p(A) / p(B)
• Easy to check by removing the syntactic sugar
• Use 1: Converts p(B | A) to p(A | B)
• Use 2: Updates p(A) to p(A | B)
• Stare at it so you'll recognize it later

Slide 4: Language ID
• Given a sentence x, I suggested comparing its probability in different languages:
  p(SENT=x | LANG=english)   i.e., p_english(SENT=x)
  p(SENT=x | LANG=polish)    i.e., p_polish(SENT=x)
  p(SENT=x | LANG=xhosa)     i.e., p_xhosa(SENT=x)
• But surely for language ID we should compare
  p(LANG=english | SENT=x)
  p(LANG=polish | SENT=x)
  p(LANG=xhosa | SENT=x)

Slide 5: Language ID
• For language ID we should compare the a posteriori probabilities
  p(LANG=english | SENT=x)
  p(LANG=polish | SENT=x)
  p(LANG=xhosa | SENT=x)
• For ease, multiply by p(SENT=x) and compare the joint probabilities
  p(LANG=english, SENT=x)
  p(LANG=polish, SENT=x)
  p(LANG=xhosa, SENT=x)
• Must know prior probabilities; then rewrite each as the a priori probability times the likelihood (what we had before):
  p(LANG=english) * p(SENT=x | LANG=english)
  p(LANG=polish) * p(SENT=x | LANG=polish)
  p(LANG=xhosa) * p(SENT=x | LANG=xhosa)
• The sum of these joint probabilities is a way to find p(SENT=x); we can divide back by that to get the posterior probabilities.

Slide 6: Let's try it!
"First we pick a random LANG, then we roll a random SENT with the LANG dice."

  prior prob                likelihood                           joint probability
  p(LANG=english) = 0.7  *  p(SENT=x | LANG=english) = 0.00001   =  0.000007
  p(LANG=polish)  = 0.2  *  p(SENT=x | LANG=polish)  = 0.00004   =  0.000008
  p(LANG=xhosa)   = 0.1  *  p(SENT=x | LANG=xhosa)   = 0.00005   =  0.000005

• The priors come from a very simple model: a single die whose sides are the languages of the world.
• The likelihoods come from a set of trigram dice (actually 3 sets, one per language).
• The prior is best for english and the likelihood is best for xhosa, but the joint probability is best for polish: the best compromise.
• Probability of evidence: p(SENT=x) = 0.000020, the total over all ways of getting SENT=x.

Slide 7: Let's try it!
• Joint probabilities, as above:
  p(LANG=english, SENT=x) = 0.000007
  p(LANG=polish,  SENT=x) = 0.000008
  p(LANG=xhosa,   SENT=x) = 0.000005
• Add them up to get the probability of evidence:
  p(SENT=x) = 0.000020   (the total probability of getting SENT=x one way or another!)
• Normalize (divide by a constant so they'll sum to 1) to get the posterior probabilities:
  p(LANG=english | SENT=x) = 0.000007/0.000020 = 7/20
  p(LANG=polish  | SENT=x) = 0.000008/0.000020 = 8/20   (best)
  p(LANG=xhosa   | SENT=x) = 0.000005/0.000020 = 5/20
• Given the evidence SENT=x, the possible languages sum to 1.
  (This computation is sketched in code after slide 9 below.)

Slide 8: Let's try it!
• Joint probabilities:
  p(LANG=english, SENT=x) = 0.000007
  p(LANG=polish,  SENT=x) = 0.000008   (best compromise)
  p(LANG=xhosa,   SENT=x) = 0.000005
• Probability of evidence:
  p(SENT=x) = 0.000020   (the total over all ways of getting x)

Slide 9: General Case ("noisy channel")
• The "noisy channel" messes up a into b: the source a is drawn with prior probability p(A=a), and the channel then produces b with probability p(B=b | A=a).
• The "decoder" finds the most likely reconstruction of a given b.
• Examples: language → text, text → speech, spelled → misspelled, English → French.
• Maximize p(A=a | B=b)
    = p(A=a) p(B=b | A=a) / p(B=b)
    = p(A=a) p(B=b | A=a) / Σ_a' p(A=a') p(B=b | A=a')
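
Below is a minimal Python sketch, not taken from the slides, of the computation walked through on slides 5-8: multiply each prior by the corresponding likelihood to get the joint probabilities, add those up to get the probability of the evidence, and divide back by that total to get the posteriors. The dictionaries simply hard-code the toy numbers from slide 6; in a real system the likelihoods would come from the per-language trigram models.

    # Bayes' theorem for language ID, using the toy numbers from slides 6-7.
    priors = {"english": 0.7, "polish": 0.2, "xhosa": 0.1}                   # p(LANG): the single language die
    likelihoods = {"english": 0.00001, "polish": 0.00004, "xhosa": 0.00005}  # p(SENT=x | LANG): from the trigram dice

    # Joint probability: p(LANG, SENT=x) = p(LANG) * p(SENT=x | LANG)
    joints = {lang: priors[lang] * likelihoods[lang] for lang in priors}

    # Probability of evidence: p(SENT=x), the total over all ways of getting SENT=x
    evidence = sum(joints.values())

    # Posterior: p(LANG | SENT=x) = joint / evidence (normalize so the posteriors sum to 1)
    posteriors = {lang: joint / evidence for lang, joint in joints.items()}

    print(joints)      # roughly 7e-06, 8e-06, 5e-06, as on slide 6
    print(evidence)    # roughly 2e-05
    print(posteriors)  # roughly 0.35, 0.40, 0.25, i.e. 7/20, 8/20, 5/20
    print(max(posteriors, key=posteriors.get))  # 'polish', the best compromise

Note that p(SENT=x) is the same divisor for every language, so ranking the joints and ranking the posteriors pick the same winner; that is exactly why slide 5 can say "for ease, multiply by p(SENT=x) and compare."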
Slide 10: Language ID
• For language ID we should compare the a posteriori probabilities
  p(LANG=english | SENT=x)
  p(LANG=polish | SENT=x)
  p(LANG=xhosa | SENT=x)
• For ease, multiply by p(SENT=x) and compare the joint probabilities
  p(LANG=english, SENT=x)
  p(LANG=polish, SENT=x)
  p(LANG=xhosa, SENT=x)
• which we find as follows (we need prior probs!), as the a priori probability times the likelihood:
  p(LANG=english) * p(SENT=x | LANG=english)
  p(LANG=polish) * p(SENT=x | LANG=polish)
  p(LANG=xhosa) * p(SENT=x | LANG=xhosa)

Slide 11: General Case ("noisy channel")
• Want the most likely A to have generated the evidence B, so compare the a posteriori probabilities
  p(A=a1 | B=b)
  p(A=a2 | B=b)
  p(A=a3 | B=b)
• For ease, multiply by p(B=b) and compare the joint probabilities
  p(A=a1, B=b)
  p(A=a2, B=b)
  p(A=a3, B=b)
• which we find as follows (we need prior probs!), as the a priori probability times the likelihood:
  p(A=a1) * p(B=b | A=a1)
  p(A=a2) * p(B=b | A=a2)
  p(A=a3) * p(B=b | A=a3)

Slide 12: Speech Recognition
• For baby speech recognition we should compare the a posteriori probabilities
  p(MEANING=gimme | SOUND=uhh)
  p(MEANING=changeme | SOUND=uhh)
  p(MEANING=loveme | SOUND=uhh)
• For ease, multiply by p(SOUND=uhh) and compare the joint probabilities
  p(MEANING=gimme, SOUND=uhh)
  p(MEANING=changeme, SOUND=uhh)
  p(MEANING=loveme, SOUND=uhh)
• which we find as follows (we need prior probs!), as the a priori probability times the likelihood:
  p(MEAN=gimme) * p(SOUND=uhh | MEAN=gimme)
  p(MEAN=changeme) * p(SOUND=uhh | MEAN=changeme)
  p(MEAN=loveme) * p(SOUND=uhh | MEAN=loveme)

Slide 13: Life or Death!
• Does Epitaph have hoof-and-mouth disease? He tested positive – oh no! The false positive rate is only 5%.
• p(hoof) = 0.001, so p(¬hoof) = 0.999
• p(positive test | ¬hoof) = 0.05   ("false pos")
• p(negative test | hoof) = x ≈ 0   ("false neg"), so p(positive test | hoof) = 1-x ≈ 1
• What is p(hoof | positive test)? Don't panic: it is still very small, at most about 1/51, whatever x is.
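
Slides 11 and 12 apply the same recipe to an arbitrary noisy channel: compare p(A=a) * p(B=b | A=a) across candidate reconstructions a, since the shared divisor p(B=b) cannot change which candidate wins. Here is a small illustrative sketch, not from the slides; the function names and the toy probabilities for the baby speech recognition example are invented purely for demonstration.

    def decode(candidates, prior, channel, b):
        """Return the candidate a that maximizes p(A=a | B=b).

        p(B=b) is the same for every candidate, so it is enough to
        maximize the joint p(A=a) * p(B=b | A=a), as on slide 11.
        """
        return max(candidates, key=lambda a: prior(a) * channel(b, a))

    # Toy usage in the style of slide 12 (all numbers made up for illustration):
    meanings = ["gimme", "changeme", "loveme"]
    prior = {"gimme": 0.5, "changeme": 0.3, "loveme": 0.2}.get                    # p(MEANING)
    channel = lambda sound, m: {"gimme": 0.1, "changeme": 0.4, "loveme": 0.3}[m]  # p(SOUND=uhh | MEANING)

    print(decode(meanings, prior, channel, "uhh"))  # 'changeme': 0.3*0.4 beats 0.5*0.1 and 0.2*0.3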
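
Slide 13's "life or death" numbers can be plugged straight into Bayes' theorem. The following is a hedged sketch rather than anything from the slides; the false negative rate x is left as a parameter so you can check that the posterior stays tiny no matter how good the test is on sick animals.

    def p_hoof_given_positive(x):
        """Posterior p(hoof | positive test), where x = p(negative test | hoof)."""
        p_hoof = 0.001                      # prior p(hoof)
        p_not_hoof = 1 - p_hoof             # 0.999
        p_pos_given_hoof = 1 - x            # true positive rate
        p_pos_given_not_hoof = 0.05         # false positive rate

        # p(positive) = total probability of a positive test, one way or another
        p_pos = p_hoof * p_pos_given_hoof + p_not_hoof * p_pos_given_not_hoof

        # Bayes' theorem: p(hoof | positive) = p(hoof) * p(positive | hoof) / p(positive)
        return p_hoof * p_pos_given_hoof / p_pos

    print(p_hoof_given_positive(0.0))   # about 0.0196, roughly 1/51, even with a perfect test on sick animals
    print(p_hoof_given_positive(0.5))   # about 0.0099, smaller still for larger x

The posterior is small because the 0.1% prior is swamped by the 5% false positive rate: out of 1000 random animals, roughly 50 healthy ones test positive for every sick one that does, so a positive test still leaves the disease quite unlikely.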

