Unformatted text preview:

CSCI 5832 Natural Language Processing Lecture 26 Jim Martin 01 13 19 CSCI 5832 Spring 2007 1 Today 4 26 Review Machine translation framework Word based models Worked out EM alignment example Phrase based translation Syntax based 01 13 19 CSCI 5832 Spring 2007 2 Sample from Today s Al Jazeera 01 13 19 CSCI 5832 Spring 2007 3 MT Pyramid interlingua semantics semantics syntax syntax phrases phrases words words 01 13 19 SOURCE CSCI 5832 Spring 2007 TARGET 4 Bayes Rule Noisy Channel Garbled English Spanish Translation Model P s e Que hambre tengo yo English Language Model P e Decoding algorithm argmax P e P s e e I am so hungry Given a source sentence s the decoder should consider many possible translations and return the target string e that maximizes P e s By Bayes Rule we can also write this as P e x P s e P s and maximize that instead P s never changes while we compare different e s so we can equivalently maximize this 01 13 19 CSCI 5832 Spring 2007 5 P e x P s e Parts List We need probabilities for n x y The probability that word y will yield x outputs in the translation fertility p The probability of a null insertion t The actual word translation probability table d j i the probability that a word at position i will make an appearance at position j in the translation 01 13 19 CSCI 5832 Spring 2007 6 Parts List Every one of these can be learned from a sentence aligned corpus Ie A corpus where sentences are paired but nothing else is specified And the EM algorithm 01 13 19 CSCI 5832 Spring 2007 7 Word Alignment la maison la maison bleue la fleur the house the blue house the flower Inherent hidden structure revealed by EM training 01 13 19 CSCI 5832 Spring 2007 8 EM Worked out example Focus only on the word translation probs la maison la maison bleue la fleur the house the blue house the flower How many alignments are there 01 13 19 CSCI 5832 Spring 2007 9 EM Worked out Knight example Focus only on the word translation probs la maison la maison bleue la fleur the house the blue house the flower 2 6 2 How many alignments are there 01 13 19 CSCI 5832 Spring 2007 10 EM Step 1 Make up some numbers for the parameters of interest In this case just the word translation probabilities la the m the b the f the 01 13 19 la la blue la house flower m blue m house b blue f flower b house CSCI 5832 Spring 2007 11 EM Step 1 Make up some numbers for the parameters of interest In this case just the translation probs word translation probs la the 1 4 m the 1 4 b the 1 4 f the 1 4 01 13 19 la house 1 3 m house 1 3 b house 1 3 la blue 1 3 m blue 1 3 b blue 1 3 CSCI 5832 Spring 2007 la flower 1 2 f flower 1 2 12 EM Step 2 Make some simple assumptions and produce some normalized alignment probabilities la maison la maison the house the house 1 3 1 12 1 3 1 12 01 13 19 CSCI 5832 Spring 2007 la maison bleue the blue house 1 3 1 3 1 36 13 EM Step 2 normalize Make some simple assumptions and produce some normalized alignment probabilities la maison la maison la maison bleue the house the house the blue house 1 3 1 12 1 3 1 3 1 36 1 3 1 12 1 12 2 12 1 2 01 13 19 1 12 2 12 1 2 CSCI 5832 Spring 2007 1 6 For each 14 EM Step 3 Now that we have the probability of each alignment we can go back and count the evidence in the alignments for each translation pair and prorate them based on the alignments they come from 01 13 19 CSCI 5832 Spring 2007 15 EM Step 3 Now that we have the probability of each alignment we can go back and count the evidence in the alignments for each translation pair and prorate them based on the alignments they come from Huh Let s just look at la the What evidence do we have 01 13 19 CSCI 5832 Spring 2007 16 EM Step 3 Evidence for la the la maison la maison bleue la fleur the house the blue house the flower 1 01 13 19 2 CSCI 5832 Spring 2007 1 17 EM Step 3 Evidence for la the la maison la maison bleue la fleur the house the blue house the flower 1 1 01 13 19 2 2 1 6 CSCI 5832 Spring 2007 1 1 18 EM Step 3 Evidence for la the la maison la maison bleue la fleur the house the blue house the flower 1 1 2 2 1 6 1 1 8 6 Discounted count for la the 01 13 19 CSCI 5832 Spring 2007 19 EM Step 3 Do that for the other the and la normalize the la the 8 6 la the 44 8 6 m the 5 6 b the 2 6 f the 3 6 01 13 19 18 6 m the 27 m the 5 6 b the 11 18 6 f the 16 b the 2 6 18 6 f the 3 6 18 6 CSCI 5832 Spring 2007 20 EM Step 4 Do that for all the words in the table Recompute the alignment probabilities using the new word translation probabilities Recompute the fractional counts Normalize to get new word translation probs Iterate until done When you re done you can use the numbers to get the most probable alignment From which you can read off all the parameters you need 01 13 19 CSCI 5832 Spring 2007 21 EM Alignment This is totally weird voodoo science We need some parameters which we don t have We can get them from a word aligned corpus which we don t have So we make up some parameters to get the alignment and then use that alignment to get the right numbers 01 13 19 CSCI 5832 Spring 2007 22 Parts List Given a sentence alignment we can induce a word alignment Given that word alignment we can get the p t d and n parameters we need for the model Ie We can argmax P e f by max over P f e P e and we can do that by iterating over some large space of f possibilities 01 13 19 CSCI 5832 Spring 2007 23 Break Quiz Coverage Semantics Compositional and IE Chapter 16 Pages 1 29 Chapter 19 Sections 19 1 19 4 Review Bioinformatics Slides Discourse Chapter 21 Pages 1 31 skip 20 6 2 MT Chapter 24 Pages 1 41 skip 24 2 01 13 19 CSCI 5832 Spring 2007 24 Decoding Remember Viterbi Just a fancier Viterbi Given foreign sentence f find English sentence e that maximizes P e x …


View Full Document

CU-Boulder CSCI 5832 - Lecture Notes

Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?