UA CSC 620 - Advanced Topics in Natural Language Processing

C SC 620
Advanced Topics in Natural Language Processing
Lecture 24
4/22

Reading List
• Readings in Machine Translation, Eds. Nirenburg, S. et al. MIT Press 2003.
  – 19. Montague Grammar and Machine Translation. Landsbergen, J.
  – 20. Dialogue Translation vs. Text Translation – Interpretation Based Approach. Tsujii, J.-I. and M. Nagao
  – 21. Translation by Structural Correspondences. Kaplan, R. et al.
  – 22. Pros and Cons of the Pivot and Transfer Approaches in Multilingual Machine Translation. Boitet, C.
  – 31. A Framework of a Mechanical Translation between Japanese and English by Analogy Principle. Nagao, M.
  – 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.

Paper 32. A Statistical Approach to Machine Translation. Brown, P. F. et al.
• Time: Early 1990s
• Emergence of the Statistical Approach to MT and to language modelling in general
  – Statistical learning methods for context-free grammars
    • inside-outside algorithm
• Like the popular Example-Based Machine Translation (EBMT) framework discussed last time, we avoid the explicit construction of linguistically sophisticated models of grammar
• Why now, and not in the 1950s?
  – Computers 10^5 times faster
  – Gigabytes of storage
  – Large, machine-readable corpora readily available for parameter estimation
  – It’s our turn – symbolic methods have been tried for 40 years

• Machine Translation
  – Source sentence S
  – Target sentence T
  – Every pair (S,T) has a probability
  – P(T|S) = probability target is T given S
  – Bayes’ theorem
    • P(S|T) = P(S)P(T|S)/P(T)

• The Language Model: P(S)
  – bigrams:
    • w_1 w_2 w_3 w_4 w_5
    • w_1w_2, w_2w_3, w_3w_4, w_4w_5
  – sequences of words
    • S = w_1 … w_n
    • P(S) = P(w_1) P(w_2|w_1) … P(w_n|w_1 … w_{n-1})
  – product of probability of w_i given preceding context for w_i
    • problem: we need to know too many probabilities
  – bigram approximation
    • limit the context
    • P(S) ≈ P(w_1) P(w_2|w_1) … P(w_n|w_{n-1})
  – bigram probability estimation from corpora
    • P(w_i|w_{i-1}) ≈ freq(w_{i-1} w_i)/freq(w_{i-1}) in a corpus

• The Language Model: P(S)
  – n-gram models used successfully in speech recognition
  – could use trigrams:
    • w_1 w_2 w_3 w_4 w_5
    • w_1w_2w_3, w_2w_3w_4, w_3w_4w_5
  – problem
    • need even more data for parameter estimation
    • sparse data problem even with large corpora
    • handled using smoothing
      – interpolate for missing data
      – estimate trigram probabilities from bigram and unigram data

• The Translation Model: P(T|S)
  – Alignment model:
    • assume there is a transfer relationship between source and target words
    • not necessarily 1-to-1
  – Example
    • S = w_1 w_2 w_3 w_4 w_5 w_6 w_7
    • T = u_1 u_2 u_3 u_4 u_5 u_6 u_7 u_8 u_9
    • w_4 -> u_3 u_5
    • fertility of w_4 = 2
    • distortion w_5 -> u_9

• Alignment notation
  – use word positions in parentheses
  – no word position, no mapping
  – Example
    • ( Les propositions ne seront pas mises en application maintenant | The(1) proposal(2) will(4) not(3,5) now(9) be implemented(6,7,8) )
    • This particular alignment is not correct; it is an artifact of their algorithm

• How to compute probability of an alignment?
  – Need to estimate
    • Fertility probabilities
      – P(fertility=n|w) = probability word w has fertility n
    • Distortion probabilities
      – P(i|j,l) = probability target word is at position i given source word at position j, where l is the length of the target
  – Example
    • (Le chien est battu par Jean | John(6) does beat(3,4) the(1) dog(2))
      – P(f=1|John)P(Jean|John) x
      – P(f=0|does) x
      – P(f=2|beat)P(est|beat)P(battu|beat) x
      – P(f=1|the)P(Le|the) x
      – P(f=1|dog)P(chien|dog) x
      – P(f=1|<null>)P(par|<null>) x distortion probabilities…

• Not done yet
  – Given T
  – translation problem is to find S that maximizes P(S)P(T|S)
  – can’t look for all possible S in the language
• Idea (Search):
  – construct best S incrementally
  – start with a highly likely word transfer
  – and find a valid alignment
  – extending candidate S at each step
  – (Jean aime Marie | * )
  – (Jean aime Marie | John(1) * )
• Failure?
  – best S not a good translation
    • language model failed or
    • translation model failed
  – couldn’t find best S
    • search failure

• Parameter Estimation
  – English/French
    • from the Hansard corpus
      – 100 million words
      – bilingual Canadian parliamentary proceedings
      – unaligned corpus
  – Language Model
    • P(S) from bigram model
  – Translation Model
    • how to estimate this with an unaligned corpus?
    • Used EM (Expectation-Maximization) algorithm, an iterative algorithm for re-estimating probabilities
    • Need
      – P(u|w) for words u in T and w in S
      – P(n|w) for fertility n and w in S
      – P(i|j,l) for target position i and source position j and target length l

• Experiment 1: Parameter Estimation for the Translation Model
  – Pick 9,000 most common words for French and English
  – 40,000 sentence pairs
  – 81,000,000 parameters
  – Initial guess: minimal assumptions

• Experiment 1: results
  – (English) Hear, hear!
  – (French) Bravo!

• Experiment 2: Translation from French to English
  – Make task manageable
    • English lexicon
      – 1,000 most frequent English words in corpus
    • French lexicon
      – 1,700 most frequent French words in translations completely covered by the selected English words
    • 117,000 sentence pairs with words covered by the lexicons
    • 17 million parameters estimated for the translation model
    • bigram model of English
      – 570,000 sentences
      – 12 million words
  – 73 test sentences
    • Categories: (exact, alternate, different), wrong, ungrammatical
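The bigram estimate from the Language Model slide, P(w_i|w_{i-1}) ≈ freq(w_{i-1} w_i)/freq(w_{i-1}), can be sketched in a few lines of Python. This is only an illustration: the function names (train_bigram, bigram_prob, sentence_prob) and the three-sentence toy corpus are assumptions, not anything from the paper, and no smoothing is applied, so unseen bigrams get probability 0.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and adjacent word pairs over tokenized sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Relative-frequency estimate freq(prev word) / freq(prev)."""
    if unigrams[prev] == 0:
        return 0.0  # unseen context; a real model would smooth here
    return bigrams[(prev, word)] / unigrams[prev]

def sentence_prob(words, unigrams, bigrams):
    """P(S) ~ P(w_1) * product of P(w_i | w_{i-1}) for a non-empty sentence."""
    total = sum(unigrams.values())
    p = unigrams[words[0]] / total
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word, unigrams, bigrams)
    return p

# Hypothetical toy corpus, for illustration only.
corpus = [
    ["the", "dog", "barks"],
    ["the", "cat", "barks"],
    ["the", "dog", "sleeps"],
]
unigrams, bigrams = train_bigram(corpus)
p_dog_given_the = bigram_prob("the", "dog", unigrams, bigrams)
p_sentence = sentence_prob(["the", "dog", "barks"], unigrams, bigrams)
```

In this toy corpus "the" occurs 3 times and "the dog" twice, so the estimate of P(dog|the) is 2/3. The sparse data problem shows up immediately: "cat sleeps" never occurs, so any sentence containing it scores 0 without smoothing.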

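The alignment probability from the "Le chien est battu par Jean" example multiplies one fertility term per source word, one translation term per link, and one distortion term per link. A minimal sketch follows, under stated assumptions: every number in the tables is invented for illustration, the distortion model is taken to be uniform (1/l), the <null> word is placed at source position 0, and the function name alignment_prob is hypothetical.

```python
def alignment_prob(source, links, target, fert_p, trans_p, dist_p):
    """Probability of one alignment.

    source : list of source words; index 0 is the '<null>' word (assumption)
    links  : dict mapping source position j -> list of target positions i (1-based)
    target : list of target words; target position i is target[i - 1]
    fert_p : dict (n, w) -> P(fertility=n | w)
    trans_p: dict (u, w) -> P(u | w)
    dist_p : function (i, j, l) -> P(i | j, l)
    """
    l = len(target)
    p = 1.0
    for j, w in enumerate(source):
        positions = links.get(j, [])
        p *= fert_p[(len(positions), w)]          # fertility term
        for i in positions:
            u = target[i - 1]
            p *= trans_p[(u, w)] * dist_p(i, j, l)  # translation and distortion terms
    return p

# (Le chien est battu par Jean | John(6) does beat(3,4) the(1) dog(2))
source = ["<null>", "John", "does", "beat", "the", "dog"]
target = ["Le", "chien", "est", "battu", "par", "Jean"]
links = {0: [5], 1: [6], 3: [3, 4], 4: [1], 5: [2]}

# All values below are made up for illustration.
fert_p = {(1, "John"): 0.9, (0, "does"): 0.8, (2, "beat"): 0.5,
          (1, "the"): 0.9, (1, "dog"): 0.9, (1, "<null>"): 0.5}
trans_p = {("Jean", "John"): 0.9, ("est", "beat"): 0.5, ("battu", "beat"): 0.4,
           ("Le", "the"): 0.5, ("chien", "dog"): 0.9, ("par", "<null>"): 0.1}

def uniform_distortion(i, j, l):
    """Placeholder distortion model: every target position equally likely."""
    return 1.0 / l

p_alignment = alignment_prob(source, links, target, fert_p, trans_p, uniform_distortion)
```

Note that "does" has no entry in links, so it contributes only its fertility-0 term, matching the P(f=0|does) factor on the slide.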

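To give a feel for how EM can estimate P(u|w) from an unaligned corpus, here is a deliberately simplified sketch. It is not the model from the paper: it re-estimates word-translation probabilities only and ignores fertility and distortion (in the style of what is commonly called IBM Model 1). The function name em_translation_probs and the two sentence pairs are assumptions made for illustration.

```python
from collections import defaultdict

def em_translation_probs(pairs, iterations=10):
    """Estimate t[(u, w)] ~ P(u | w) from (source_words, target_words) pairs.

    Simplification: translation probabilities only, no fertility or distortion.
    """
    target_vocab = {u for _, tgt in pairs for u in tgt}
    # Initial guess with minimal assumptions: uniform over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of w translating to u
        total = defaultdict(float)   # expected count of w translating to anything
        # E-step: split each target word's unit of mass among the source words.
        for src, tgt in pairs:
            for u in tgt:
                norm = sum(t[(u, w)] for w in src)
                for w in src:
                    frac = t[(u, w)] / norm
                    count[(u, w)] += frac
                    total[w] += frac
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(float, {(u, w): c / total[w] for (u, w), c in count.items()})
    return t

# Hypothetical two-pair corpus; no word alignments are given.
pairs = [
    (["the", "dog"], ["le", "chien"]),
    (["the", "cat"], ["le", "chat"]),
]
t = em_translation_probs(pairs)
```

Because "le" co-occurs with "the" in both pairs while "chien" and "chat" each co-occur with it only once, successive iterations shift probability mass toward P(le|the), which in turn frees "dog" to explain "chien".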