CSCI 5832 Natural Language Processing Lecture 6 Jim Martin 01 14 19 CSCI 5832 Spring 2007 1 Today 2 1 Review Noisy Channel Model Basic Probability Review Break N Gram language models 01 14 19 CSCI 5832 Spring 2007 2 Review FSAs FSTs can do lots of cool stuff but they can t do it all In many cases they simply don t have the power to handle the facts e g an bn More on this later e g CFGs In the case of global ambiguity they can t tell us which output is more likely to be the correct one 01 14 19 CSCI 5832 Spring 2007 3 So We ll modify finite state machines so they can tell us more about how likely various correct outputs are By applying some simple probability theory 01 14 19 CSCI 5832 Spring 2007 4 Noisy Channel An influential metaphor in language processing is the noisy channel model 01 14 19 CSCI 5832 Spring 2007 5 Noisy Channel Obvious applications include Speech recognition Optical character recognition Spelling correction Not so obvious Semantic analysis Machine translation I e German to English is a matter of uncorrupting the original signal 01 14 19 CSCI 5832 Spring 2007 6 Probability Basics Prior or unconditional probability Written as P A For now think of A as a proposition that can turn out to be True or False P A is your belief that A is true given that you know nothing else relevant to A In NLP applications this is the normalized count of some linguistic event Priors for words NPs sentences sentence types names etc 01 14 19 CSCI 5832 Spring 2007 7 Basics Conditional or posterior probabilities Written as P A B Pronounced as the probability of A given B Think of it as your belief in A given that you know absolutely that B is true In NLP applications this is the count of some event conditioned on some other usually linguistic event 01 14 19 CSCI 5832 Spring 2007 8 And P A B your belief in A given that you know B is true AND B is all you know that is relevant to A 01 14 19 CSCI 5832 Spring 2007 9 Conditionals Defined Conditionals Rearranging P A B P A B P B P A B P A B P B And also P A B P B A P A 01 14 19 CSCI 5832 Spring 2007 10 Conditionals Defined 01 14 19 CSCI 5832 Spring 2007 11 Bayes We know So rearranging things P A B P A B P B and P A B P B A P A P A B P B P B A P A P A B 01 14 19 CSCI 5832 Spring 2007 P B A P A P B 12 Bayes Memorize this P B A P A P A B P B 01 14 19 CSCI 5832 Spring 2007 13 Bayes and the Noisy Channel In applying Bayes to the noisy channel we want to compute the most likely source given some observed corrupt output signal Argmaxi P Sourcei Signal Often not always this is hard to get so we apply Bayes 01 14 19 CSCI 5832 Spring 2007 14 Bayes and Noisy Channel So argmax this instead P Signal Source P Source P Source Signal P Signal 01 14 19 CSCI 5832 Spring 2007 15 Argmax and Bayes What does this mean Argmax P Signal Source P Source P Signal Plug in each possible source and compute the corresponding probability Pick the one with the highest Note the denominator is the same for each source candidate so we can ignore it for the purposes of the argmax 01 14 19 CSCI 5832 Spring 2007 16 Argmax and Bayes Ignoring the denominator leaves us with two factors P Source and P Signal Source 01 14 19 CSCI 5832 Spring 2007 17 Bayesian Decoding P Source This is often referred to as a language model It encodes information about the likelihood of particular sequences or structures independent of the observed signal P Signal Source This encodes specific information about how the channel tends to introduce noise How likely is it that a given source would produce an observed signal 01 14 19 CSCI 5832 Spring 2007 18 Note This framework is completely general it makes minimal assumptions about the nature of the application the source or the channel 01 14 19 CSCI 5832 Spring 2007 19 Transition Up to this point we ve mostly been discussing words in isolation and their insides Now we ll switch to looking at sequences of words And we re going to worry about assigning probabilities to sequences of words 01 14 19 CSCI 5832 Spring 2007 20 Who Cares Why would you want to assign a probability to a sentence or Why would you want to predict the next word Lots of applications Historically it was first used effectively in automatic speech recognition 01 14 19 CSCI 5832 Spring 2007 21 Break Quiz will be 2 8 Focus on 2 3 4 maybe the start of 5 Review past quizzes Question relate to lectures readings and the assignment Yes even stuff in the readings not covered in class HW 2 to be posted asap 01 14 19 CSCI 5832 Spring 2007 22 Chain Rule Recall the definition of conditional probabilities Rewriting Or Or 01 14 19 P A B P A B P B P A B P A B P B P The big P big the P the P The big P the P big the CSCI 5832 Spring 2007 23 Example The big red dog P The P big the P red the big P dog the big red Better P The Beginning of sentence written as P The S 01 14 19 CSCI 5832 Spring 2007 24 General Case w n The word sequence from position 1 to n1 is So the probability of a sequence is P w1n P w1 P w2 w1 P w3 w12 P wn w1n 1 n P w1 k 2 P wk w1k 1 01 14 19 CSCI 5832 Spring 2007 25 Unfortunately That doesn t help since its unlikely we ll ever gather the right statistics for the prefixes 01 14 19 CSCI 5832 Spring 2007 26 Markov Assumption Assume that the entire prefix history isn t necessary In other words an event doesn t depend on all of its history just a fixed length near history 01 14 19 CSCI 5832 Spring 2007 27 Markov Assumption So for each component in the product replace each with its with the approximation assuming a prefix of N n 1 1 n 1 n N 1 P wn w P wn w 01 14 19 CSCI 5832 Spring 2007 28 N Grams The big red dog Unigrams P dog Bigrams P dog red Trigrams P dog big red Four grams P dog the big red In general we ll be dealing with P Word Some fixed prefix 01 14 19 CSCI 5832 Spring 2007 29 Caveat The formulation P Word Some fixed prefix is not really appropriate in many applications It is if …
View Full Document
Unlocking...