CSCI 5832 Natural Language Processing Lecture 6 Jim Martin 2 1 07 CSCI 5832 Spring 2007 1 Today 2 1 Review Noisy Channel Model Basic Probability Review Break N Gram language models 2 1 07 CSCI 5832 Spring 2007 2 1 Review FSAs FSTs can do lots of cool stuff but they can t do it all In many cases they simply don t have the power to handle the facts e g an bn More on this later e g CFGs In the case of global ambiguity they can t tell us which output is more likely to be the correct one 2 1 07 CSCI 5832 Spring 2007 3 So We ll modify finite state machines so they can tell us more about how likely various correct outputs are By applying some simple probability theory 2 1 07 CSCI 5832 Spring 2007 4 2 Noisy Channel An influential metaphor in language processing is the noisy channel model 2 1 07 CSCI 5832 Spring 2007 5 Noisy Channel Obvious applications include Speech recognition Optical character recognition Spelling correction Not so obvious Semantic analysis Machine translation I e German to English is a matter of uncorrupting the original signal 2 1 07 CSCI 5832 Spring 2007 6 3 Probability Basics Prior or unconditional probability Written as P A For now think of A as a proposition that can turn out to be True or False P A is your belief that A is true given that you know nothing else relevant to A In NLP applications this is the normalized count of some linguistic event Priors for words NPs sentences sentence types names etc 2 1 07 CSCI 5832 Spring 2007 7 Basics Conditional or posterior probabilities Written as P A B Pronounced as the probability of A given B Think of it as your belief in A given that you know absolutely that B is true In NLP applications this is the count of some event conditioned on some other usually linguistic event 2 1 07 CSCI 5832 Spring 2007 8 4 And P A B your belief in A given that you know B is true AND B is all you know that is relevant to A 2 1 07 CSCI 5832 Spring 2007 9 Conditionals Defined Conditionals Rearranging And also 2 1 07 CSCI 5832 Spring 2007 10 5 Conditionals Defined 2 1 07 CSCI 5832 Spring 2007 11 Bayes We know So rearranging things 2 1 07 CSCI 5832 Spring 2007 12 6 Bayes Memorize this 2 1 07 CSCI 5832 Spring 2007 13 Bayes and the Noisy Channel In applying Bayes to the noisy channel we want to compute the most likely source given some observed corrupt output signal Argmaxi P Sourcei Signal Often not always this is hard to get so we apply Bayes 2 1 07 CSCI 5832 Spring 2007 14 7 Bayes and Noisy Channel So argmax this instead 2 1 07 CSCI 5832 Spring 2007 15 Argmax and Bayes What does this mean Argmax Plug in each possible source and compute the corresponding probability Pick the one with the highest Note the denominator is the same for each source candidate so we can ignore it for the purposes of the argmax 2 1 07 CSCI 5832 Spring 2007 16 8 Argmax and Bayes Ignoring the denominator leaves us with two factors P Source and P Signal Source 2 1 07 CSCI 5832 Spring 2007 17 Bayesian Decoding P Source This is often referred to as a language model It encodes information about the likelihood of particular sequences or structures independent of the observed signal P Signal Source This encodes specific information about how the channel tends to introduce noise How likely is it that a given source would produce an observed signal 2 1 07 CSCI 5832 Spring 2007 18 9 Note This framework is completely general it makes minimal assumptions about the nature of the application the source or the channel 2 1 07 CSCI 5832 Spring 2007 19 Transition Up to this point we ve mostly been discussing words in isolation and their insides Now we ll switch to looking at sequences of words And we re going to worry about assigning probabilities to sequences of words 2 1 07 CSCI 5832 Spring 2007 20 10 Who Cares Why would you want to assign a probability to a sentence or Why would you want to predict the next word Lots of applications Historically it was first used effectively in automatic speech recognition 2 1 07 CSCI 5832 Spring 2007 21 Break Quiz will be 2 8 Focus on 2 3 4 maybe the start of 5 Review past quizzes Question relate to lectures readings and the assignment Yes even stuff in the readings not covered in class HW 2 to be posted asap 2 1 07 CSCI 5832 Spring 2007 22 11 Chain Rule Recall the definition of conditional probabilities Rewriting Or Or 2 1 07 CSCI 5832 Spring 2007 23 Example The big red dog P The P big the P red the big P dog the big red Better P The Beginning of sentence written as P The S 2 1 07 CSCI 5832 Spring 2007 24 12 General Case The word sequence from position 1 to n is So the probability of a sequence is 2 1 07 CSCI 5832 Spring 2007 25 Unfortunately That doesn t help since its unlikely we ll ever gather the right statistics for the prefixes 2 1 07 CSCI 5832 Spring 2007 26 13 Markov Assumption Assume that the entire prefix history isn t necessary In other words an event doesn t depend on all of its history just a fixed length near history 2 1 07 CSCI 5832 Spring 2007 27 Markov Assumption So for each component in the product replace each with its with the approximation assuming a prefix of N 2 1 07 CSCI 5832 Spring 2007 28 14 N Grams The big red dog Unigrams Bigrams Trigrams Four grams P dog P dog red P dog big red P dog the big red In general we ll be dealing with P Word Some fixed prefix 2 1 07 CSCI 5832 Spring 2007 29 Caveat The formulation P Word Some fixed prefix is not really appropriate in many applications It is if we re dealing with real time speech where we only have access to prefixes But if we re dealing with text we already have the right and left contexts There s no a priori reason to stick to left contexts 2 1 07 CSCI 5832 Spring 2007 30 15 BERP Table Counts 2 1 07 CSCI 5832 Spring 2007 31 Counts Bigram Probs Recall if we want P want I that s the P I want P want and that s just Count I want Count want 2 1 07 CSCI 5832 Spring 2007 32 16 BERP Table Bigram Probabilities 2 1 07 CSCI 5832 Spring 2007 33 Some Observations The following numbers are very informative Think about what they capture 2 1 07 P want I 32 P to want 65 P eat to 26 P food Chinese 56 P lunch eat 055 CSCI …
View Full Document
Unlocking...