
Natural Language Processing
Lecture 24, 12/5/2013
Jim Martin
Speech and Language Processing, Jurafsky and Martin, 01/13/19

Today
Statistical machine translation details:
  Word alignment
  Phrase-based methods
  Decoding

Stat MT
Current statistical MT systems are based on maximizing P(E|F).
We'll first start by using Bayes' rule.

Word Alignment via EM
Let's start with a two-sentence aligned corpus.

Alignment Probs
Let's define the probability of a sentence alignment as the product of its component word translation probabilities.

Alignment Probs 2
But each of those needs to be normalized.
Nothing very surprising here. Each alignment is equally likely, so they each normalize to 50/50, since there are 2 for each sentence in this case.

Word Translation Probs
So now that we know the alignment probabilities, we can gather and prorate the individual translation counts.

Word Translation Probs
To turn those into probs, we just count and divide to get new conditional probabilities.
Note these all started at 1/3. All the right ones have gone up and some of the wrong ones have gone down.

New Alignment Probs
Now we can use these new word translation probs to derive new sentence alignment probs.
And thereby get new adjusted counts.
  To get new word translation probs
  To get new alignment probs

Alignment
So if we do that, we have a word-aligned bitext.
From that we can directly get our word-to-word translation probabilities, which is really what we want.

Phrase-Based MT
Turns out that the word-based approach gets complicated and messy very quickly.
Translating phrases solves most of the problems and uses essentially the same mechanisms.
And works really, really well.

Phrase-Based Stat MT
The basic premise in phrase-based MT is that the texts consist of phrases that:
  Need to be translated
  And moved around

Phrase-Based MT
The probability of such a translation is the product of the individual phrase translation probabilities and the movement (dislocation) probabilities.

Phrase-Based MT
Of course, that means we need to know what a useful phrase is.
We could use the notion of a syntactic constituent. That's too hard.
Better is to try to discover phrases from the word-aligned corpus, and then use those discovered phrases to get estimates for our translation model.

Discovering Phrases 1
Align both ways, then intersect to get high-precision alignments.

Discovering Phrases 2
From these high-precision points, grow phrases by trying to connect the dots using candidate alignments from the union of the original alignments.

Discovering Phrases 3
These initial phrases can then be grown into larger phrases by fusing the phrases such that each proposed phrase alignment includes all the words in the component phrase alignments.

Phrase Translation
Given such phrases, we can get the required counts for our translation model from

Decoding
Basic idea is to search the space of possible English translations in an efficient manner.
Still according to

Decoding as Search
We start with a null state:
  No foreign content accounted for
  No English content produced
We drive the search by:
  1. Segmenting the foreign input (all segmentations allowed by the phrase table)
  2. Choosing foreign word phrases to cover
  3. Choosing a way to cover them
English translations are pasted left to right to previous choices.
Done when all the foreign input is covered.

Decoding (example, built up one step per slide)
Foreign input: Maria no dio una bofetada a la bruja verde
  Mary
  Mary did not
  Mary did not slap
  Mary did not slap the
  Mary did not slap the green
  Mary did not slap the green witch
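
The step-by-step example above can be read as one path through the search space: each step covers some not-yet-covered foreign words and pastes their English translation onto the right end of the output. Below is a minimal Python sketch of that bookkeeping. The phrase table entries, the span choices, and the names `Hypothesis`, `extend`, and `is_complete` are illustrative assumptions, not the lecture's actual phrase table or code.

```python
from typing import NamedTuple

FOREIGN = "Maria no dio una bofetada a la bruja verde".split()

# Hypothetical toy phrase table: foreign phrase -> a single English option.
PHRASE_TABLE = {
    ("Maria",): "Mary",
    ("no",): "did not",
    ("dio", "una", "bofetada"): "slap",
    ("a", "la"): "the",
    ("verde",): "green",
    ("bruja",): "witch",
}


class Hypothesis(NamedTuple):
    covered: frozenset  # indices of foreign words already accounted for
    english: tuple      # English words produced so far, left to right


def extend(hyp, start, end):
    """Cover foreign words [start, end) and append their translation."""
    span = set(range(start, end))
    if span & hyp.covered:
        raise ValueError("span overlaps words that are already covered")
    phrase = tuple(FOREIGN[start:end])
    english = tuple(PHRASE_TABLE[phrase].split())
    return Hypothesis(hyp.covered | span, hyp.english + english)


def is_complete(hyp):
    """Done when all the foreign input is covered."""
    return len(hyp.covered) == len(FOREIGN)


# Null state: no foreign content covered, no English produced.
hyp = Hypothesis(frozenset(), ())

# One set of choices out of many; "verde" is covered before "bruja",
# so the foreign words are consumed out of order.
for start, end in [(0, 1), (1, 2), (2, 5), (5, 7), (8, 9), (7, 8)]:
    hyp = extend(hyp, start, end)
    print(" ".join(hyp.english))
```

Tracking coverage as a set rather than a position is what allows phrases to be picked out of order while the English side still grows strictly left to right.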
Decoding
Of course, that just showed one set of choices out of many, many possible ones.
We want to incrementally pursue a large number of paths, focusing on the promising ones first.
Manage that as a heuristic search through a search space.

Decoding
Search cost is really based on two factors:
  Current cost: language model cost and translation cost for the chosen phrase
  Future cost: estimated cost to translate the remaining parts of the
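
One way to combine a current cost and a future cost is a best-first search that always expands the hypothesis with the lowest sum of the two. The sketch below is a toy version under several assumptions that are not from the lecture: all costs are invented numbers standing in for negative log probabilities, `lm_cost` is a flat per-word stand-in for a real language model, a distortion penalty (`DISTORTION`) is added for jumps in the foreign input, and the future cost is taken to be the cheapest translation-only cost of each uncovered stretch.

```python
import heapq
from itertools import count

FOREIGN = "Maria no dio una bofetada a la bruja verde".split()
N = len(FOREIGN)
INF = float("inf")
DISTORTION = 0.2  # assumed penalty per word skipped when jumping around

# Hypothetical phrase table: foreign phrase -> [(English, cost), ...].
PHRASES = {
    ("Maria",): [("Mary", 0.1)],
    ("no",): [("did not", 0.4), ("no", 0.9)],
    ("dio",): [("gave", 0.7)],
    ("una",): [("a", 0.3)],
    ("bofetada",): [("slap", 0.6)],
    ("dio", "una", "bofetada"): [("slap", 0.8)],
    ("a",): [("to", 0.5)],
    ("la",): [("the", 0.2)],
    ("a", "la"): [("the", 0.4)],
    ("bruja",): [("witch", 0.3)],
    ("verde",): [("green", 0.3)],
    ("bruja", "verde"): [("green witch", 0.4)],
}


def lm_cost(english):
    """Stand-in language model cost: a flat charge per English word."""
    return 0.1 * len(english.split())


def cheapest(phrase):
    options = PHRASES.get(tuple(phrase))
    return min(c for _, c in options) if options else INF


# SPAN_COST[i][j]: cheapest translation-only cost of covering words i..j-1.
SPAN_COST = [[INF] * (N + 1) for _ in range(N + 1)]
for length in range(1, N + 1):
    for i in range(N - length + 1):
        j = i + length
        best = cheapest(FOREIGN[i:j])
        for k in range(i + 1, j):
            best = min(best, SPAN_COST[i][k] + SPAN_COST[k][j])
        SPAN_COST[i][j] = best


def future_cost(covered):
    """Sum the span estimates over each maximal uncovered stretch."""
    total, i = 0.0, 0
    while i < N:
        if i in covered:
            i += 1
            continue
        j = i
        while j < N and j not in covered:
            j += 1
        total += SPAN_COST[i][j]
        i = j
    return total


def decode():
    tie = count()  # tie-breaker so the heap never compares hypotheses
    # Entry: (current + future, tie, current, covered, last_end, english)
    heap = [(future_cost(frozenset()), next(tie), 0.0, frozenset(), 0, ())]
    while heap:
        _, _, current, covered, last_end, english = heapq.heappop(heap)
        if len(covered) == N:
            return " ".join(english), current
        for i in range(N):
            for j in range(i + 1, N + 1):
                span = frozenset(range(i, j))
                if span & covered:
                    break  # any longer span starting at i overlaps too
                for text, t_cost in PHRASES.get(tuple(FOREIGN[i:j]), []):
                    step = t_cost + lm_cost(text) + DISTORTION * abs(i - last_end)
                    new_cov, new_cur = covered | span, current + step
                    heapq.heappush(heap, (new_cur + future_cost(new_cov),
                                          next(tie), new_cur, new_cov,
                                          j, english + (text,)))
    return None


print(decode())
```

Because the future cost here ignores the language model and distortion charges, it never overestimates the remaining cost, so the first complete hypothesis popped from the queue is the cheapest one under these toy numbers. Practical decoders typically add pruning on top of this, since the number of partial hypotheses grows quickly with sentence length.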



CU-Boulder CSCI 5832 - Lecture 24
