Natural Language Processing
Lecture 7, 9/19/2013
Jim Martin

Today
More language modeling: N-grams and smoothing
Finish Good-Turing
"Pretty good" smoothing
Bayesian prior smoothing
Word classes
Part-of-speech tagging

9/19/13 Speech and Language Processing - Jurafsky and Martin

Smoothing: Dealing with Zero Counts
Back to Shakespeare: recall that Shakespeare produced 300,000 bigram types out of $V^2$ = 844 million possible bigrams. So 99.96% of the possible bigrams were never seen; they have zero entries in the table.
Does that mean that any sentence containing one of those bigrams should have a probability of 0?
For generation (the Shannon game), it means we'll never emit those bigrams. For analysis it's a problem: if we run across a new bigram in the future, we have no choice but to assign it a probability of zero.

Zero Counts
Some of those zeros are really zeros: things that really aren't ever going to happen. There are fewer of these than you might think.
On the other hand, some of them are just rare events. If the training corpus had been a little bigger, they would have had a count. What would that count be, in all likelihood?

Zero Counts
Zipf's Law (the long-tail phenomenon):
A small number of events occur with high frequency.
A large number of events occur with low frequency.
You can quickly collect statistics on the high-frequency events. You might have to wait an arbitrarily long time to get good statistics on the low-frequency events.
Result: our estimates are necessarily sparse. We have no counts at all for the vast number of events we want to estimate.
Answer: estimate the likelihood of unseen (zero-count) N-grams.

Laplace Smoothing
Also called add-one smoothing: just add one to all the counts. Very simple.
MLE estimate: $P(w_2 \mid w_1) = \frac{C(w_1 w_2)}{C(w_1)}$
Laplace estimate: $P_{\text{Laplace}}(w_2 \mid w_1) = \frac{C(w_1 w_2) + 1}{C(w_1) + V}$, where $V$ is the vocabulary size
Reconstructed counts: $c^*(w_1 w_2) = \frac{(C(w_1 w_2) + 1)\,C(w_1)}{C(w_1) + V}$
Worked example: bigrams from BERP (the Berkeley Restaurant Project corpus).
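A minimal Python sketch of the Laplace estimate and reconstructed counts above, generalized to add-k (k = 1 is add-one). The function names are illustrative, not from the slides. C(want to) = 608 is quoted on the "Big Change to the Counts" slide; the other BERP figures (C(want) = 927, C(chinese) = 158, C(chinese food) = 82, V = 1446) are assumed from the Jurafsky and Martin BERP tables, which are not reproduced here. Under those assumptions the sketch recovers the slide's numbers: 608 becomes about 238, .66 becomes .26, and the discount for "chinese food" is about .10.

```python
# Laplace (add-one) / add-k smoothing for bigrams, plus reconstructed counts.
# ASSUMPTION: C(want), C(chinese), C(chinese food) and V are taken from the
# Jurafsky & Martin BERP tables; only C(want to) = 608 appears on the slides.

def mle_prob(c_bigram, c_history):
    """Unsmoothed estimate P(w2 | w1) = C(w1 w2) / C(w1)."""
    return c_bigram / c_history


def add_k_prob(c_bigram, c_history, vocab_size, k=1.0):
    """Add-k estimate (C(w1 w2) + k) / (C(w1) + k*V); k = 1 is Laplace."""
    return (c_bigram + k) / (c_history + k * vocab_size)


def reconstructed_count(c_bigram, c_history, vocab_size, k=1.0):
    """c* = smoothed probability * C(w1): the count that would yield the
    smoothed probability under plain MLE."""
    return add_k_prob(c_bigram, c_history, vocab_size, k) * c_history


V = 1446                          # BERP vocabulary size (assumed)
C_WANT, C_WANT_TO = 927, 608      # C(want) assumed; C(want to) from the slides
C_CHINESE, C_CHINESE_FOOD = 158, 82

print(round(mle_prob(C_WANT_TO, C_WANT), 2))                 # 0.66
print(round(add_k_prob(C_WANT_TO, C_WANT, V), 2))            # 0.26
print(round(reconstructed_count(C_WANT_TO, C_WANT, V)))      # 238
d = reconstructed_count(C_CHINESE_FOOD, C_CHINESE, V) / C_CHINESE_FOOD
print(round(d, 2))                                           # 0.1
```

Passing a k smaller than 1 (for example `add_k_prob(0, C_WANT, V, k=0.5)`) moves less probability mass to unseen bigrams than add-one does; the sketch leaves the choice of k open.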

BERP Bigram Counts (table not reproduced)

Laplace-Smoothed Bigram Counts (table not reproduced)

Laplace-Smoothed Bigram Probabilities (table not reproduced)

Reconstituted Counts (table not reproduced)

Reconstituted Counts (2) (table not reproduced)

Big Change to the Counts!
C(want to) went from 608 to 238.
P(to | want) went from .66 to .26.
Discount $d = c^*/c$: d for "chinese food" = .10, a 10x reduction.
So in general, Laplace is a blunt instrument.
We could use a more fine-grained method (add-k).
But Laplace smoothing is not generally used for N-grams, as we have much better methods.
Despite its flaws, Laplace (add-k) is still used to smooth other probabilistic models in NLP, especially:
for pilot studies
in document classification
in information retrieval
in domains where the number of zeros isn't so huge

Fun with Unix
Thanks to Ken Church: "Unix for Poets"

Better Smoothing
Intuition used by many smoothing algorithms (Good-Turing, Kneser-Ney, Witten-Bell): use the count of things we've seen once to help estimate the count of things we've never seen.

One Fish Two Fish
Imagine you are fishing. There are 8 species: carp, perch, whitefish, trout, salmon, eel, catfish, bass. (Not sure where this fishing hole is.)
You have caught, up to now: 10 carp, 3 perch, 2 whitefish, 1 trout, 1 salmon, 1 eel = 18 fish.
How likely is it that the next fish to be caught is an eel?
How likely is it that the next fish caught will be a member of a newly seen species?
Now how likely is it that the next fish caught will be an eel?
Slide adapted from Josh Goodman

Good-Turing
Notation: $N_x$ is the frequency of frequency x.
So $N_{10} = 1$: the number of
fish species seen 10 times is 1 (carp).
$N_1 = 3$: the number of fish species seen once is 3 (trout, salmon, eel).
To estimate the probability of an unseen species, use the number of species (words) we've seen once: $c^*_0 = c_1$, $p_0 = N_1/N = 3/18$.
All other estimates are adjusted downward to account for the unseen probability mass, using $c^* = (c+1)\,\frac{N_{c+1}}{N_c}$:
$c^*(\text{eel}) = c^*(1) = (1+1)\,\frac{N_2}{N_1} = 2 \times \frac{1}{3} = \frac{2}{3}$, so $P(\text{eel}) = \frac{2/3}{18} = \frac{1}{27}$ rather than the MLE $\frac{1}{18}$.
Slide from Josh Goodman

Bigram Frequencies of Frequencies and GT Re-estimates (table not reproduced)

Bigram Frequencies of Frequencies and GT Re-estimates (table not reproduced)
Example from the table: $c^*_3 = (3+1)\,\frac{N_4}{N_3} = 4 \times \frac{381}{642} = 4 \times .593 = 2.37$

GT Smoothed Bigram Probabilities (table not reproduced)

GT Complications
In practice, assume large counts ($c > k$ for some $k$) are reliable and leave them unchanged ($c^* = c$). (Formula not reproduced.)
We also need all the $N_k$ to be non-zero, so we need to smooth (interpolate) the $N_k$ counts before computing $c^*$ from them.

Pretty Good Smoothing
Maximum likelihood estimation: $P(w_2 \mid w_1) = \frac{C(w_1 w_2)}{C(w_1)}$
Laplace smoothing: $P_{\text{Laplace}}(w_2 \mid w_1) = \frac{C(w_1 w_2) + 1}{C(w_1) + V}$, where $V$ is the vocabulary size
Bayesian prior smoothing: $P_{\text{Prior}}(w_2 \mid w_1) = \frac{C(w_1 w_2) + P(w_2)}{C(w_1) + 1}$

Pretty Good Smoothing
Bayesian prior smoothing: why is there a 1 in the denominator?

Toolkits
With FSAs/FSTs: OpenFst (openfst.org)
For language modeling: SRILM, the SRI Language Modeling Toolkit. All the bells and whistles you can imagine.

Break
HW questions?

Break
Quiz is Thursday, Oct 3, covering Chapters 1 to 6.
I'll post specific readings when enough people remind (nag) me.
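Before moving on to linguistics, a minimal Python sketch of the two re-estimates covered above: the Good-Turing count $c^* = (c+1)N_{c+1}/N_c$, applied to the fishing example and to the $N_3 = 642$, $N_4 = 381$ bigram figures, and the Bayesian prior estimate. Assumptions not taken from the slides: the threshold k = 5 above which counts are left unchanged, the toy corpus used for the prior, and all function names. The sketch does not renormalize after applying the threshold, and it raises ZeroDivisionError when some $N_c$ is 0, which is why the $N_k$ counts must be smoothed first in practice.

```python
from collections import Counter


# ---------------- Good-Turing re-estimation ----------------

def freq_of_freqs(counts):
    """Return N_c: how many species (or word types) were seen exactly c times."""
    return Counter(counts.values())


def gt_count(c, n_c, k=5):
    """Good-Turing re-estimated count c* = (c + 1) * N_{c+1} / N_c.

    Counts above the (assumed) threshold k are treated as reliable and
    returned unchanged; nothing is renormalized afterwards. Raises
    ZeroDivisionError when N_c is 0.
    """
    if c > k:
        return float(c)
    return (c + 1) * n_c.get(c + 1, 0) / n_c.get(c, 0)


# Fishing example from the slides (18 fish, 6 of the 8 species seen).
catch = {"carp": 10, "perch": 3, "whitefish": 2,
         "trout": 1, "salmon": 1, "eel": 1}
N = sum(catch.values())           # 18
n_c = freq_of_freqs(catch)        # N_1 = 3, N_2 = 1, N_3 = 1, N_10 = 1
p_unseen = n_c[1] / N             # p_0 = N_1 / N = 3/18
c_star_eel = gt_count(1, n_c)     # (1 + 1) * N_2 / N_1 = 2/3
p_eel = c_star_eel / N            # (2/3) / 18 = 1/27, down from MLE 1/18

# Bigram figures quoted on the slides: N_3 = 642, N_4 = 381.
c_star_3 = gt_count(3, {3: 642, 4: 381})   # 4 * 381 / 642, about 2.37


# ---------------- Bayesian prior smoothing ----------------

def prior_smoothed(w1, w2, bigram_counts, history_counts, unigram_prob):
    """P_prior(w2 | w1) = (C(w1 w2) + P(w2)) / (C(w1) + 1)."""
    return ((bigram_counts[(w1, w2)] + unigram_prob[w2])
            / (history_counts[w1] + 1))


# Hypothetical toy corpus, for illustration only.
tokens = "i want to eat chinese food i want to eat lunch".split()
bigram_counts = Counter(zip(tokens, tokens[1:]))
history_counts = Counter(tokens[:-1])   # C(w1) counted as a bigram history
unigram_prob = {w: c / len(tokens) for w, c in Counter(tokens).items()}

# Sum the smoothed distribution for history "want" over the whole vocabulary.
row_sum = sum(prior_smoothed("want", w2, bigram_counts, history_counts, unigram_prob)
              for w2 in unigram_prob)
# An unseen bigram still gets non-zero probability from the prior P(w2).
unseen = prior_smoothed("i", "lunch", bigram_counts, history_counts, unigram_prob)

print(round(p_unseen, 3), round(p_eel, 4), round(c_star_3, 2), round(row_sum, 6))
```

The `row_sum` check bears on the slide's question: the prior probabilities $P(w_2)$ sum to 1 over the vocabulary, so adding $P(w_2)$ to every numerator adds exactly 1 to the total, and dividing by $C(w_1) + 1$ keeps each conditional distribution normalized.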

Back to Some Linguistics

Word Classes: Parts of Speech
8 (ish) traditional parts of speech: noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction, etc.
Also known as: parts of speech, lexical categories, word classes, morphological classes, lexical …