Introduction to N-grams
Language Modeling
Dan Jurafsky

Probabilistic Language Models
• Today's goal: assign a probability to a sentence
• Machine Translation:
  P(high winds tonite) > P(large winds tonite)
• Spell Correction:
  "The office is about fifteen minuets from my house"
  P(about fifteen minutes from) > P(about fifteen minuets from)
• Speech Recognition:
  P(I saw a van) >> P(eyes awe of an)
• + Summarization, question-answering, etc., etc.
Why?

Probabilistic Language Modeling
• Goal: compute the probability of a sentence or sequence of words:
  P(W) = P(w1, w2, w3, w4, w5 … wn)
• Related task: probability of an upcoming word:
  P(w5 | w1, w2, w3, w4)
• A model that computes either of these,
  P(W) or P(wn | w1, w2 … wn-1),
  is called a language model.
• Better: the grammar! But "language model" or "LM" is standard.

How to compute P(W)
• How to compute this joint probability:
  P(its, water, is, so, transparent, that)
• Intuition: let's rely on the Chain Rule of Probability

Reminder: The Chain Rule
• Recall the definition of conditional probabilities:
  P(B | A) = P(A, B) / P(A)
  Rewriting: P(A, B) = P(A) P(B | A)
• More variables:
  P(A, B, C, D) = P(A) P(B | A) P(C | A, B) P(D | A, B, C)
• The Chain Rule in general:
  P(x1, x2, x3, …, xn) = P(x1) P(x2 | x1) P(x3 | x1, x2) … P(xn | x1, …, xn-1)

The Chain Rule applied to compute the joint probability of words in a sentence
  P(w1 w2 … wn) = ∏i P(wi | w1 w2 … wi-1)
  P("its water is so transparent") =
    P(its) × P(water | its) × P(is | its water)
    × P(so | its water is) × P(transparent | its water is so)

How to estimate these probabilities
• Could we just count and divide?
  P(the | its water is so transparent that) =
    Count(its water is so transparent that the) / Count(its water is so transparent that)
• No! Too many possible sentences!
• We'll never see enough data for estimating these.

Markov Assumption (Andrei Markov)
• Simplifying assumption:
  P(the | its water is so transparent that) ≈ P(the | that)
• Or maybe:
  P(the | its water is so transparent that) ≈ P(the | transparent that)

Markov Assumption
• In other words, we approximate each component in the product:
  P(w1 w2 … wn) ≈ ∏i P(wi | wi-k … wi-1)
  P(wi | w1 w2 … wi-1) ≈ P(wi | wi-k … wi-1)

Simplest case: Unigram model
  P(w1 w2 … wn) ≈ ∏i P(wi)
Some automatically generated sentences from a unigram model:
  fifth, an, of, futures, the, an, incorporated, a, a, the, inflation, most, dollars, quarter, in, is, mass
  thrift, did, eighty, said, hard, 'm, july, bullish
  that, or, limited, the

Bigram model
• Condition on the previous word:
  P(wi | w1 w2 … wi-1) ≈ P(wi | wi-1)
Some automatically generated sentences from a bigram model:
  texaco, rose, one, in, this, issue, is, pursuing, growth, in, a, boiler, house, said, mr., gurria, mexico, 's, motion, control, proposal, without, permission, from, five, hundred, fifty, five, yen
  outside, new, car, parking, lot, of, the, agreement, reached
  this, would, be, a, record, november

N-gram models
• We can extend to trigrams, 4-grams, 5-grams
• In general this is an insufficient model of language
• because language has long-distance dependencies:
  "The computer which I had just put into the machine room on the fifth floor crashed."
• But we can often get away with N-gram models

Estimating N-gram Probabilities
Language Modeling

Estimating bigram probabilities
• The Maximum Likelihood Estimate:
  P(wi | wi-1) = count(wi-1, wi) / count(wi-1)
               = c(wi-1, wi) / c(wi-1)

An example
  <s> I am Sam </s>
  <s> Sam I am </s>
  <s> I do not like green eggs and ham </s>
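To make the count-and-divide estimate concrete, here is a minimal Python sketch of the MLE bigram estimator applied to the three-sentence corpus above. The function name train_bigram_mle and the plain-dictionary representation of the model are illustrative assumptions, not part of the original slides:

    from collections import defaultdict

    def train_bigram_mle(sentences):
        """MLE estimate: P(wi | wi-1) = c(wi-1, wi) / c(wi-1)."""
        bigram_counts = defaultdict(int)   # c(wi-1, wi)
        context_counts = defaultdict(int)  # c(wi-1)
        for sentence in sentences:
            tokens = sentence.split()  # assumes pre-tokenized, <s>...</s>-wrapped input
            for prev, word in zip(tokens, tokens[1:]):
                bigram_counts[(prev, word)] += 1
                context_counts[prev] += 1
        return {(prev, word): n / context_counts[prev]
                for (prev, word), n in bigram_counts.items()}

    corpus = [
        "<s> I am Sam </s>",
        "<s> Sam I am </s>",
        "<s> I do not like green eggs and ham </s>",
    ]
    model = train_bigram_mle(corpus)
    print(model[("<s>", "I")])     # 2/3: "I" opens 2 of the 3 sentences
    print(model[("I", "am")])      # 2/3
    print(model[("Sam", "</s>")])  # 1/2

Counting only within sentence boundaries, this reproduces the estimates you get by hand, e.g. P(I | <s>) = 2/3, P(Sam | <s>) = 1/3, and P(am | I) = 2/3.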
More examples: Berkeley Restaurant Project sentences
• can you tell me about any good cantonese restaurants close by
• mid priced thai food is what i'm looking for
• tell me about chez panisse
• can you give me a listing of the kinds of food that are available
• i'm looking for a good place to eat breakfast
• when is caffe venezia open during the day

Raw bigram counts
• Out of 9222 sentences

Raw bigram probabilities
• Normalize by unigrams
• Result:

Bigram estimates of sentence probabilities
  P(<s> I want english food </s>)
    = P(I | <s>)
    × P(want | I)
    × P(english | want)
    × P(food | english)
    × P(</s> | food)
    = .000031

What kinds of knowledge?
• P(english | want) = .0011
• P(chinese | want) = .0065
• P(to | want) = .66
• P(eat | to) = .28
• P(food | to) = 0
• P(want | spend) = 0
• P(i | <s>) = .25

Practical Issues
• We do everything in log space
• Avoid underflow
• (also adding is faster than multiplying)
  log(p1 × p2 × p3 × p4) = log p1 + log p2 + log p3 + log p4

Language Modeling Toolkits
• SRILM
• http://www.speech.sri.com/projects/srilm/

Google N-Gram Release, August 2006
…

Google N-Gram Release
• serve as the incoming 92
• serve as the incubator 99
• serve as the independent 794
• serve as the index 223
• serve as the indication 72
• serve as the indicator 120
• serve as the indicators 45
• serve as the indispensable 111
• serve as the indispensible 40
• serve as the individual 234
http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html

Google Book N-grams
• http://ngrams.googlelabs.com/

Evaluation and Perplexity
Language Modeling

Evaluation: How good is our model?
• Does our language model prefer good sentences to bad ones?
• Assign higher probability to "real" or "frequently observed" sentences
• than to "ungrammatical" or "rarely observed" sentences?
• We train the parameters of our model on a training set.
• We test the model's performance on data we haven't seen.
• A test set is an unseen dataset that is different from our training set, totally unused.
• An evaluation metric tells us how well our model does on the test set.

Extrinsic evaluation of N-gram models
• Best evaluation for comparing models A and B
• Put each model in a task
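Whichever evaluation we run, we end up scoring test sentences under the model, which is where the log-space advice from the Practical Issues slide matters. Below is a minimal, self-contained sketch; the bigram dictionary restates the illustrative MLE values from the "I am Sam" corpus, and the function name sentence_logprob is likewise an assumption, not from the slides:

    import math

    # Illustrative MLE bigram probabilities from the "I am Sam" corpus.
    model = {("<s>", "I"): 2/3, ("I", "am"): 2/3,
             ("am", "Sam"): 1/2, ("Sam", "</s>"): 1/2}

    def sentence_logprob(model, sentence):
        """Score a sentence as a sum of log probabilities:
        log(p1 * p2 * ... * pn) = log p1 + ... + log pn.
        Summing logs avoids the underflow that multiplying
        many small probabilities would cause."""
        logprob = 0.0
        tokens = sentence.split()
        for prev, word in zip(tokens, tokens[1:]):
            p = model.get((prev, word), 0.0)
            if p == 0.0:
                return float("-inf")  # unseen bigram: MLE assigns probability zero
            logprob += math.log(p)
        return logprob

    lp = sentence_logprob(model, "<s> I am Sam </s>")
    print(lp, math.exp(lp))  # about -2.197, i.e. P = 1/9

The early return for unseen bigrams mirrors the zeros on the "What kinds of knowledge?" slide (e.g. P(food | to) = 0): under pure MLE, a sentence containing a single unseen bigram gets probability zero.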