Bayes’ Theorem
600.465 - Intro to NLP - J. Eisner

Outline: Bayes’ Theorem · Remember Language ID? · Language ID · Let’s try it! · General Case (“noisy channel”) · Speech Recognition · Life or Death!

Bayes’ Theorem
• Let’s revisit this.

Remember Language ID?
• Let p(X) = probability of text X in English
• Let q(X) = probability of text X in Polish
• Which probability is higher?
  – (we’d also like a bias toward English since it’s more likely a priori – ignore that for now)
• “Horses and Lukasiewicz are on the curriculum.”
  p(x1=h, x2=o, x3=r, x4=s, x5=e, x6=s, …)

Bayes’ Theorem
• p(A | B) = p(B | A) * p(A) / p(B)
• Easy to check by removing the syntactic sugar
• Use 1: Converts p(B | A) to p(A | B)
• Use 2: Updates p(A) to p(A | B)
• Stare at it so you’ll recognize it later

Language ID
• Given a sentence x, I suggested comparing its probability in different languages:
  – p(SENT=x | LANG=english)   (i.e., p_english(SENT=x))
  – p(SENT=x | LANG=polish)    (i.e., p_polish(SENT=x))
  – p(SENT=x | LANG=xhosa)     (i.e., p_xhosa(SENT=x))
• But surely for language ID we should compare
  – p(LANG=english | SENT=x)
  – p(LANG=polish | SENT=x)
  – p(LANG=xhosa | SENT=x)

Language ID
• For language ID we should compare the a posteriori probabilities
  – p(LANG=english | SENT=x)
  – p(LANG=polish | SENT=x)
  – p(LANG=xhosa | SENT=x)
• For ease, multiply by p(SENT=x) and compare the joint probabilities
  – p(LANG=english, SENT=x)
  – p(LANG=polish, SENT=x)
  – p(LANG=xhosa, SENT=x)
• We must know the a priori (prior) probabilities; then rewrite each joint as prior * likelihood
  – p(LANG=english) * p(SENT=x | LANG=english)
  – p(LANG=polish) * p(SENT=x | LANG=polish)
  – p(LANG=xhosa) * p(SENT=x | LANG=xhosa)
• The likelihoods p(SENT=x | LANG=…) are what we had before; the sum of the joint probabilities is a way to find p(SENT=x), and we can divide back by that to get the posterior probabilities.

Let’s try it!
“First we pick a random LANG, then we roll a random SENT with the LANG dice.”
• Prior probabilities – from a very simple model: a single die whose sides are the languages of the world:
  – p(LANG=english) = 0.7   (best)
  – p(LANG=polish) = 0.2
  – p(LANG=xhosa) = 0.1
• Likelihoods – from a set of trigram dice (actually 3 sets, one per language):
  – p(SENT=x | LANG=english) = 0.00001
  – p(SENT=x | LANG=polish) = 0.00004
  – p(SENT=x | LANG=xhosa) = 0.00005   (best)
• Joint probabilities = prior * likelihood:
  – p(LANG=english, SENT=x) = 0.7 * 0.00001 = 0.000007
  – p(LANG=polish, SENT=x) = 0.2 * 0.00004 = 0.000008   (best compromise)
  – p(LANG=xhosa, SENT=x) = 0.1 * 0.00005 = 0.000005
• Probability of evidence: add up the joints – the total over all ways of getting SENT=x, one way or another!
  – p(SENT=x) = 0.000020
• Posterior probabilities: normalize (divide by a constant so they’ll sum to 1); given the evidence SENT=x, the possible languages sum to 1:
  – p(LANG=english | SENT=x) = 0.000007 / 0.000020 = 7/20
  – p(LANG=polish | SENT=x) = 0.000008 / 0.000020 = 8/20   (best)
  – p(LANG=xhosa | SENT=x) = 0.000005 / 0.000020 = 5/20
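The short Python sketch below just replays the arithmetic of the “Let’s try it!” slide with the priors and likelihoods given above; the function name and dictionary layout are illustrative, not part of the course materials.

```python
# Minimal sketch of the "Let's try it!" arithmetic (numbers taken from the slide).
# The helper name `posterior_over_languages` is illustrative, not from the course code.

def posterior_over_languages(prior, likelihood):
    """Turn p(LANG) and p(SENT=x | LANG) into p(LANG | SENT=x) via Bayes' Theorem."""
    # Joint probabilities: p(LANG, SENT=x) = p(LANG) * p(SENT=x | LANG)
    joint = {lang: prior[lang] * likelihood[lang] for lang in prior}
    # Probability of evidence: p(SENT=x) = sum of the joints over all languages
    evidence = sum(joint.values())
    # Posterior: normalize the joints so they sum to 1
    return {lang: joint[lang] / evidence for lang in joint}

prior = {"english": 0.7, "polish": 0.2, "xhosa": 0.1}          # the single "language die"
likelihood = {"english": 1e-5, "polish": 4e-5, "xhosa": 5e-5}  # from the trigram dice

print(posterior_over_languages(prior, likelihood))
# roughly {'english': 0.35, 'polish': 0.4, 'xhosa': 0.25}, i.e. 7/20, 8/20, 5/20
```

The normalizing constant computed along the way is exactly the probability of evidence p(SENT=x) = 0.000020 from the slide, and polish comes out best just as above.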
General Case (“noisy channel”)
• The “noisy channel” messes up a into b; the “decoder” recovers the most likely reconstruction of a:
  a  →  noisy channel  →  b  →  decoder  →  most likely reconstruction of a
  with prior p(A=a) on the input and channel model p(B=b | A=a)
• Examples of (a, b): language → text, text → speech, correctly spelled → misspelled, English → French
• Maximize p(A=a | B=b)
  = p(A=a) p(B=b | A=a) / p(B=b)
  = p(A=a) p(B=b | A=a) / Σ_a’ p(A=a’) p(B=b | A=a’)

General Case (“noisy channel”)
• We want the most likely A to have generated the evidence B, so compare the a posteriori probabilities
  – p(A = a1 | B = b)
  – p(A = a2 | B = b)
  – p(A = a3 | B = b)
• For ease, multiply by p(B=b) and compare the joint probabilities
  – p(A = a1, B = b)
  – p(A = a2, B = b)
  – p(A = a3, B = b)
• which we find as a priori probability times likelihood (we need prior probs!):
  – p(A = a1) * p(B = b | A = a1)
  – p(A = a2) * p(B = b | A = a2)
  – p(A = a3) * p(B = b | A = a3)

Speech Recognition
• For baby speech recognition we should compare the a posteriori probabilities
  – p(MEANING=gimme | SOUND=uhh)
  – p(MEANING=changeme | SOUND=uhh)
  – p(MEANING=loveme | SOUND=uhh)
• For ease, multiply by p(SOUND=uhh) and compare the joint probabilities
  – p(MEANING=gimme, SOUND=uhh)
  – p(MEANING=changeme, SOUND=uhh)
  – p(MEANING=loveme, SOUND=uhh)
• which we find as a priori probability times likelihood (we need prior probs!):
  – p(MEAN=gimme) * p(SOUND=uhh | MEAN=gimme)
  – p(MEAN=changeme) * p(SOUND=uhh | MEAN=changeme)
  – p(MEAN=loveme) * p(SOUND=uhh | MEAN=loveme)

Life or Death!
• Does Epitaph have hoof-and-mouth disease? He tested positive – oh no! The false positive rate is only 5%…
• p(hoof) = 0.001, so p(¬hoof) = 0.999
• p(positive test | ¬hoof) = 0.05   (“false pos”)
• p(negative test | hoof) = x ≈ 0   (“false neg”), so p(positive test | hoof) = 1 − x ≈ 1
• What is p(hoof | positive test)?
• Don’t panic – it’s still very small! < 1/51 for any x.
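As a sanity check on the “Life or Death!” numbers, here is a small Python sketch that plugs the slide’s values into the same prior-times-likelihood recipe; the function name and keyword arguments are hypothetical, introduced only for this illustration.

```python
# A small sketch checking the "Life or Death!" arithmetic with the slide's numbers.
# The helper name `p_hoof_given_positive` and its arguments are illustrative only.

def p_hoof_given_positive(p_hoof=0.001, false_pos=0.05, false_neg=0.0):
    """p(hoof | positive test) by the same prior * likelihood recipe as above."""
    p_not_hoof = 1.0 - p_hoof                   # p(no hoof) = 0.999
    joint_hoof = p_hoof * (1.0 - false_neg)     # p(hoof) * p(positive | hoof)
    joint_not_hoof = p_not_hoof * false_pos     # p(no hoof) * p(positive | no hoof)
    evidence = joint_hoof + joint_not_hoof      # p(positive test), one way or another
    return joint_hoof / evidence

# Even in the worst case for Epitaph (no false negatives at all, x = 0),
# the posterior is only about 0.02 -- don't panic.
print(p_hoof_given_positive(false_neg=0.0))
```

Because the prior p(hoof) is so small, the posterior stays around 2% no matter what x is, which is the point of the slide.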