Natural Language Processing
Lecture 6
9/17/2013
Jim Martin

Today
More language modeling with N-grams
Basic counting
Probabilistic model
Independence assumptions
Simple smoothing methods

01/14/19 Speech and Language Processing, Jurafsky and Martin

N-Gram Models
We can use knowledge of the counts of N-grams to assess the conditional probability of candidate words as the next word in a sequence.
Or we can use them to assess the probability of an entire sequence of words.
Pretty much the same thing, as we'll see.

Language Modeling
Back to word prediction.
We can model the word prediction task as the ability to assess the conditional probability of a word given the previous words in the sequence:
$P(w_n \mid w_1, w_2, \ldots, w_{n-1})$
We'll call a statistical model that can assess this a Language Model.

Language Modeling
How might we go about calculating such a conditional probability?
One way is to use the definition of conditional probabilities and look for counts. So to get
$P(\text{the} \mid \text{its water is so transparent that})$
by definition that's
$\dfrac{P(\text{its water is so transparent that the})}{P(\text{its water is so transparent that})}$

Very Easy Estimate
How to estimate $P(\text{the} \mid \text{its water is so transparent that})$:
$P(\text{the} \mid \text{its water is so transparent that}) = \dfrac{\text{Count}(\text{its water is so transparent that the})}{\text{Count}(\text{its water is so transparent that})}$

Language Modeling
Unfortunately, for most sequences and for most text collections we won't get good estimates from this method. What we're likely to get is 0. Or worse, 0/0.
Clearly we'll have to be a little more clever.
Let's first use the chain rule of probability,
and then apply a particularly useful independence assumption.

The Chain Rule
Recall the definition of conditional probabilities:
$P(A \mid B) = \dfrac{P(A \wedge B)}{P(B)}$
Rewriting:
$P(A \wedge B) = P(A \mid B)\,P(B)$
For sequences:
$P(A, B, C, D) = P(A)\,P(B \mid A)\,P(C \mid A, B)\,P(D \mid A, B, C)$
In general:
$P(x_1, x_2, x_3, \ldots, x_n) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1, x_2) \cdots P(x_n \mid x_1, \ldots, x_{n-1})$

The Chain Rule
$P(\text{its water was so transparent}) = P(\text{its})\,P(\text{water} \mid \text{its})\,P(\text{was} \mid \text{its water})\,P(\text{so} \mid \text{its water was})\,P(\text{transparent} \mid \text{its water was so})$

Unfortunately
That doesn't really help, since it relies on having N-gram counts for a sequence that's only one word shorter than what we started with. Not likely to help with getting counts.
In general, we will never be able to get enough data to compute the statistics for those longer prefixes. Same problem we had for the strings themselves.

Independence Assumption
Make a simplifying assumption:
$P(\text{lizard} \mid \text{the other day I was walking along and saw a}) \approx P(\text{lizard} \mid \text{a})$
Or maybe:
$P(\text{lizard} \mid \text{the other day I was walking along and saw a}) \approx P(\text{lizard} \mid \text{saw a})$
That is, the probability in question is to some degree independent of its earlier history.

Independence Assumption
This particular kind of independence assumption is called a Markov assumption, after the Russian mathematician Andrei Markov.

Markov Assumption
Replace each component in the product with a shorter approximation, assuming a prefix of N-1 words (here $w_1^{n-1}$ abbreviates $w_1, \ldots, w_{n-1}$):
$P(w_n \mid w_1^{n-1}) \approx P(w_n \mid w_{n-N+1}^{n-1})$
Bigram (N = 2) version:
$P(w_n \mid w_1^{n-1}) \approx P(w_n \mid w_{n-1})$

Bigram Example
$P(\text{its water was so transparent}) = P(\text{its})\,P(\text{water} \mid \text{its})\,P(\text{was} \mid \text{its water})\,P(\text{so} \mid \text{its water was})\,P(\text{transparent} \mid \text{its water was so})$
$P(\text{its water was so transparent}) \approx P(\text{its})\,P(\text{water} \mid \text{its})\,P(\text{was} \mid \text{water})\,P(\text{so} \mid \text{was})\,P(\text{transparent} \mid \text{so})$

Estimating Bigram Probabilities
The Maximum Likelihood Estimate (MLE):
$P(w_i \mid w_{i-1}) = \dfrac{\text{count}(w_{i-1}, w_i)}{\text{count}(w_{i-1})}$
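The 0/0 problem with full-history counts, and how the Markov assumption sidesteps it, can be seen on a tiny example. Below is a minimal Python sketch that estimates $P(w \mid \text{history})$ by relative frequency, optionally keeping only the last N-1 words of the history. The three-sentence corpus and the helper name `ngram_mle` are hypothetical, made up for illustration; they are not from the lecture.

```python
# Hypothetical toy corpus (pre-tokenized, lowercased); not data from the lecture.
CORPUS = [
    "its water is so clear that the fish are visible".split(),
    "the water is so transparent that you can see the bottom".split(),
    "its color is so deep that the lake looks black".split(),
]


def ngram_mle(corpus, history, word, n=None):
    """Relative-frequency estimate of P(word | history).

    n=None uses the whole history; n >= 2 keeps only the last n-1 words
    (the Markov assumption). Returns None for the 0/0 case, i.e. when the
    (possibly truncated) history never occurs in the corpus.
    """
    if n is not None:
        if n < 2:
            raise ValueError("n must be >= 2")
        history = history[max(0, len(history) - (n - 1)):]
    h = tuple(history)
    k = len(h)
    hist_count = 0   # Count(history)
    joint_count = 0  # Count(history immediately followed by word)
    for sent in corpus:
        for i in range(len(sent) - k + 1):
            if tuple(sent[i:i + k]) == h:
                hist_count += 1
                if i + k < len(sent) and sent[i + k] == word:
                    joint_count += 1
    if hist_count == 0:
        return None
    return joint_count / hist_count


history = "its water is so transparent that".split()
print(ngram_mle(CORPUS, history, "the"))                 # None (0/0)
print(ngram_mle(CORPUS, history, "the", n=3))            # 0.0
print(round(ngram_mle(CORPUS, history, "the", n=2), 3))  # 0.667
```

The full-history prefix never occurs, so that estimate is 0/0; the trigram history "transparent that" occurs once but is followed by "you", giving 0; only the bigram estimate $P(\text{the} \mid \text{that})$ has enough data to be nonzero.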
An Example
<s> I am Sam </s>
<s> Sam I am </s>
<s> I do not like green eggs and ham </s>

Maximum Likelihood Estimates
The maximum likelihood estimate of some parameter of a model M from a training set T is the estimate that maximizes the likelihood of the training set T given the model M.
Suppose the word "Chinese" occurs 400 times in a corpus of a million words (the Brown corpus). What is the probability that a random word from some other text from the same distribution will be "Chinese"?
The MLE estimate is 400/1,000,000 = .0004.
This may be a bad estimate for some other corpus, but it is the estimate that makes it most likely that "Chinese" will occur 400 times in a million-word corpus.

Berkeley Restaurant Project Sentences
can you tell me about any good cantonese restaurants close by
mid priced thai food is what i'm looking for
tell me about chez panisse
can you give me a listing of the kinds of food that are available
i'm looking for a good place to eat breakfast
when is caffe venezia open during the day

Bigram Counts
Out of 9222 sentences. E.g., "I want" occurred 827 times.

Bigram Probabilities
Divide bigram counts by prefix unigram counts to get probabilities.

Bigram Estimates of Sentence Probabilities
$P(\langle s\rangle\ \text{i want english food}\ \langle /s\rangle) = P(\text{i} \mid \langle s\rangle)\,P(\text{want} \mid \text{i})\,P(\text{english} \mid \text{want})\,P(\text{food} \mid \text{english})\,P(\langle /s\rangle \mid \text{food}) = .000031$

Kinds of Knowledge
As crude as they are, N-gram probabilities capture a range of interesting facts about language.
World knowledge: $P(\text{english} \mid \text{want}) = .0011$, $P(\text{chinese} \mid \text{want}) = .0065$
Syntax: $P(\text{to} \mid \text{want}) = .66$, $P(\text{eat} \mid \text{to}) = .28$, $P(\text{food} \mid \text{to}) = 0$, $P(\text{want} \mid \text{spend}) = 0$
Discourse: $P(\text{i} \mid \langle s\rangle) = .25$
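The MLE recipe on the "Sam I am" example, and the product for the Berkeley sentence, can be checked with a few lines of Python. This is a minimal sketch; `train_bigram_mle` is a hypothetical helper name. For the Berkeley sentence these notes give only $P(\text{i} \mid \langle s\rangle) = .25$ and $P(\text{english} \mid \text{want}) = .0011$, so the other three factors (.33, .5, .68) are assumed values, taken from the Jurafsky and Martin textbook's Berkeley bigram table.

```python
import math
from collections import Counter


def train_bigram_mle(sentences):
    """MLE bigrams: P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1}).

    Each sentence is padded with <s> and </s>. count(w_{i-1}) counts the
    word only in prefix position (every token except the final </s>).
    """
    prefix_counts = Counter()
    bigram_counts = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        prefix_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {
        (prev, word): count / prefix_counts[prev]
        for (prev, word), count in bigram_counts.items()
    }


sam = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
probs = train_bigram_mle(sam)
# Prints 0.667, 0.333, 0.667, 0.500, 0.500, 0.333 in this order.
for prev, word in [("<s>", "I"), ("<s>", "Sam"), ("I", "am"),
                   ("Sam", "</s>"), ("am", "Sam"), ("I", "do")]:
    print(f"P({word} | {prev}) = {probs[(prev, word)]:.3f}")

# Berkeley sentence "<s> i want english food </s>".
# ASSUMPTION: .33, .5 and .68 are not given in these notes; they are taken
# from the textbook's bigram table. .25 and .0011 appear on the slides.
factors = [0.25, 0.33, 0.0011, 0.5, 0.68]
# Summing log probabilities avoids floating-point underflow on long sentences.
log_p = sum(math.log(p) for p in factors)
print(f"{math.exp(log_p):.6f}")  # 0.000031
```

Working in log space is a common practical choice: the product of many small probabilities quickly underflows, while a sum of logs does not.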
Shannon's Game
Assigning probabilities to sentences is all well and good, but it's not terribly entertaining. What if we turn these models around and use them to generate random sentences that are like the sentences from which the model was derived?
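A minimal Python sketch of the idea, assuming a bigram model trained on the six Berkeley Restaurant Project sentences listed in these notes: start from <s>, repeatedly sample the next word in proportion to its bigram count with the previous word, and stop at </s>. The helper names, the tokenization of "i'm" as one word, and the `max_len` cap are illustrative assumptions, not part of the lecture.

```python
import random
from collections import Counter, defaultdict

# The six Berkeley Restaurant Project sentences shown in these notes.
SENTENCES = [
    "can you tell me about any good cantonese restaurants close by",
    "mid priced thai food is what i'm looking for",
    "tell me about chez panisse",
    "can you give me a listing of the kinds of food that are available",
    "i'm looking for a good place to eat breakfast",
    "when is caffe venezia open during the day",
]


def train_followers(sentences):
    """For each word, count which words follow it (with <s>/</s> padding)."""
    followers = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, word in zip(tokens[:-1], tokens[1:]):
            followers[prev][word] += 1
    return followers


def generate(followers, rng, max_len=20):
    """Shannon-style generation: sample w_i from P(w | w_{i-1}) until </s>.

    Weighting by raw bigram counts is equivalent to sampling from the MLE
    probabilities, since every count for a given prefix is divided by the
    same prefix count. max_len is an arbitrary safety cap.
    """
    words, prev = [], "<s>"
    while len(words) < max_len:
        nexts = followers[prev]
        word = rng.choices(list(nexts), weights=list(nexts.values()), k=1)[0]
        if word == "</s>":
            break
        words.append(word)
        prev = word
    return " ".join(words)


followers = train_followers(SENTENCES)
rng = random.Random(0)  # fixed seed so runs are repeatable
for _ in range(3):
    print(generate(followers, rng))
```

With so little training data, most output simply replays a training sentence or splices two of them at a shared word such as "me", "about", or "looking"; every adjacent word pair in the output is a bigram seen in training.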