6.863J Natural Language Processing
Lecture 19: Machine translation 3
Robert C. Berwick

The Menu Bar
• Administrivia:
• Start w/ final projects (final proj: was 20% – boost to 35%, 4 labs 55%?)
• Agenda:
• MT: the statistical approach
• Formalize what we did last time
• Divide & conquer: 4 steps
• Noisy channel model
• Language Model
• Translation model
• Scrambling & fertility; NULL words

Submenu
• The basic idea: moving from Language A to Language B
• The noisy channel model
• Juggling words in translation – bag of words model; divide & translate
• Using n-grams – the Language Model
• The Translation Model
• Estimating parameters from data
• Bootstrapping via EM
• Searching for the best solution

Like our alien system
• We will have two parts:
1. A bilingual dictionary that will tell us what e words go w/ what f words
2. A shake-n-bake idea of how the words might get scrambled around
• We get these from cycling between alignment & word translations – a re-estimation loop on which words are linked with which other words

'George Bush' model of translation (noisy channel)
• IBM "Model 3"
• Noisy channel story: treat the observed French text f as a "corrupted" version of an English sentence e that passed through a noisy channel; translation recovers (renders) that English
• f, e are strings of (French, English) words
• First to do this, late 80s: Brown et al., Computational Linguistics, 1990 (orig. 1988 conference) – the "Candide" system
• We'll follow that paper & the 1993 paper on estimating parameters
• 1993: Brown, Della Pietra, et al., "The Mathematics of Statistical Machine Translation: Parameter Estimation," Computational Linguistics, 19:2, 263-311

Summary of components – Model 3
• The language model: P(e)
• The translation model for P(f|e):
• Word translation t
• Distortion (scrambling) d
• Fertility Φ
• (really evil): null words e0 and f0
• Maximize (A* search) through the product space

OK, what are the other models?
• Model 1 – just t
• Model 2 – just t & simple d
• What are they for?
• As we'll see – used to pipeline training – get estimates for Model 3

How to estimate? EM Algorithm
• Formalize alignment
• Formalize the dictionary in terms of P(f|e)
• Formalize shake-n-bake in terms of P(e)
• Formalize re-estimation in terms of the EM algorithm
• Give an initial estimate (uniform), then raise the pr's of some associations, lower others (a toy EM sketch appears at the end of these notes)
• e.g. which P(f|e), or which P(? | e1 e0) – e.g. P(les|the)

The training data – Hansard
• Q: What do you think is the biggest error source in Hansard?
• A: How about this – P(? | hear, hear), as in "Hear, hear!"

Fundamentals
• The basic equation: ê = argmax_e Pr(e) Pr(f|e)
• Language Model Probability Estimation – Pr(e)
• Translation Model Probability Estimation – Pr(f|e)
• Search Problem – maximizing their product

Finding the pr estimates
• Usual problem: sparse data
• We cannot create a "sentence dictionary" E ↔ F
• Most sentences never occur in the data even once, let alone twice

Let's see what this means: P(e) x P(f|e)
• Factor 1: Language Model
• Factor 2: Translation Model

P(e) – Language model
• Review: it does the job of ordering the English words
• We estimate this from monolingual text
• Just like our alien language bigram data
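To make the "estimate P(e) from monolingual text" step concrete, here is a minimal sketch of a maximum-likelihood bigram language model of the kind the alien-language exercise used. The function names, the toy corpus, and the floor value for unseen bigrams are invented for illustration; IBM actually used trigrams, far more data, and smoothing.

```python
# Minimal sketch (not from the lecture): a maximum-likelihood bigram language
# model P(e) trained on monolingual English text, then used to score a
# candidate word order.  Real systems need smoothing and (per IBM) trigrams.
from collections import defaultdict
import math

def train_bigram_lm(sentences):
    """Count bigrams over tokenized sentences; return MLE P(w | previous w)."""
    context = defaultdict(int)
    bigram = defaultdict(int)
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            context[prev] += 1
            bigram[(prev, cur)] += 1
    return {pair: count / context[pair[0]] for pair, count in bigram.items()}

def log_prob(lm, words, floor=1e-9):
    """Score one sentence; unseen bigrams get a small floor probability."""
    padded = ["<s>"] + words + ["</s>"]
    return sum(math.log(lm.get((p, c), floor)) for p, c in zip(padded, padded[1:]))

corpus = [["the", "program", "has", "been", "implemented"],
          ["the", "proposal", "has", "been", "implemented"]]
lm = train_bigram_lm(corpus)
print(log_prob(lm, ["the", "program", "has", "been", "implemented"]))
print(log_prob(lm, ["program", "the", "been", "has", "implemented"]))  # scrambled order scores lower
```

Scoring two orderings of the same bag of words with such a model is exactly the bag-translation test described next.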
Bag translation?
• Take a sentence, cut it into words, put them in a bag, shake, recover the original sentence
• Why? To show how well the model gets the order of the English language, i.e., to test the P(e) estimate
• How? Use the n-gram model to rank different arrangements of the words:
• S better than S' if P(S) > P(S')
• Test: 100 S's, trigram model

Bag results?
• Exact reconstruction (63%)
• Please give me your response as soon as possible →
• Please give me your response as soon as possible
• Reconstruction that preserves meaning (20%)
• Now let me mention some of the disadvantages →
• Let me mention some of the disadvantages
• Rest – garbage
• In our organization research has two missions →
• In our missions research organization has two
• What is the time complexity? What k does this use?

Estimating P(e)
• IBM used trigrams
• LOTS of them… we'll see details later
• For now…

P(f|e) – Recall the Model 3 story
• Words in English are replaced by French words, then scrambled
• Let's review how
• Not word-for-word replacement (can't always have same-length sentences)

Alignment as the "Translation Model": example alignment
• English (positions 0-6): e0 And the program has been implemented
• French (positions 0-7): f0 Le programme a été mis en application
• Notation: Le(2) programme(3) a(4) été(5) mis(6) en(6) application(6) = [2 3 4 5 6 6 6]
• Another example (the slide annotates the links with t, d, and Φ):
• The proposal will not now be implemented →
• Les propositions ne seront pas mises en application maintenant

4 parameters for P(f|e)
1. Word translation, t
2. Distortion (scrambling), d
3. Fertility, Φ
4. Spurious word toss-in, p

Notation
• e = English sentence
• f = French sentence
• English sentence has words i = 1, 2, …, l
• ei = ith English word
• fj = jth French word
• l = # of words in the English sentence
• m = # of words in the French sentence
• a = alignment (a vector of integers a1 a2 … am, where each aj ranges from 0 to l)
• aj = the actual English position connected to by the jth French word in alignment a
• eaj = the actual English word connected to by the jth French word in alignment a
• Φi = fertility of English word i (i = 1 to l), given alignment a

OK, what parameters do we need?
• Look at the dependencies in the generative story!
• 3 basic parameters
• Parameter 1: which f word to generate depends only on the English word e that is doing the generating
• Example: prob(fromage | monkey)
• Denote these by t(fj | ei)

Procrustean bed
1. For each word ei in the English sentence e, i = 1, 2, …, l, we choose a fertility Φ(ei), equal to 0, 1, 2, … [25]

Fertility
• Prob that "monkey" will produce a certain # of French words
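As a way of seeing how the parameter tables above fit together, here is a highly simplified sketch (not the lecture's or IBM's code) of scoring one (French sentence, alignment) pair given an English sentence under a Model-3-like factorization. The exact Model 3 formula also contains combinatorial correction factors (the Φ0 spurious-word binomial and the Φi! permutation terms) that are deliberately omitted, and the tables t, d, n and the value p1 are assumed to be toy Python dictionaries/values rather than trained parameters.

```python
# Highly simplified sketch: combine the Model 3 parameter tables
#   t(f | e)  word translation,  d(j | i, l, m)  distortion,
#   n(phi | e)  fertility,       p1  spurious (NULL) word probability
# into a log-score for one (f, alignment) pair given e.
# The combinatorial factors of the real Model 3 formula are left out.
import math
from collections import Counter

def model3_score(e_words, f_words, align, t, d, n, p1):
    """align[j] = English position (0 = NULL) that generates French word j+1."""
    l, m = len(e_words), len(f_words)
    fert = Counter(align)                                  # fertility of each English position
    logp = 0.0
    for i, e_word in enumerate(e_words, start=1):
        logp += math.log(n.get((fert[i], e_word), 1e-9))   # fertility term n(phi | e)
    for j, (f_word, a_j) in enumerate(zip(f_words, align), start=1):
        e_word = "NULL" if a_j == 0 else e_words[a_j - 1]
        logp += math.log(t.get((f_word, e_word), 1e-9))    # word translation t(f | e)
        if a_j != 0:
            logp += math.log(d.get((j, a_j, l, m), 1e-9))  # distortion d(j | i, l, m)
    phi0 = fert[0]                                         # words generated by NULL
    # simplified spurious-word term; the exact Model 3 NULL term differs
    logp += phi0 * math.log(p1) + (m - phi0) * math.log(1 - p1)
    return logp
```

Called with the slide's example, e = "And the program has been implemented" (NULL handled as position 0), f = "Le programme a été mis en application", and align = [2, 3, 4, 5, 6, 6, 6], the computed fertilities come out as Φ(And) = 0 and Φ(implemented) = 3, matching the picture.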
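Finally, tying together the earlier EM slide ("give an initial uniform estimate, then raise the pr's of some associations, lower others") with the Model 1 → Model 3 training pipeline, here is a toy sketch of Model-1-style re-estimation of the word-translation table t(f|e). It omits the NULL word, uses an invented three-sentence bitext, and is only meant to show the shape of the re-estimation loop, not the lecture's or IBM's implementation.

```python
# Toy sketch of IBM Model 1 EM re-estimation of t(f | e):
# start uniform, then repeatedly collect expected counts and renormalize.
from collections import defaultdict
from itertools import product

def model1_em(bitext, iterations=10):
    """bitext: list of (english_words, french_words) pairs; returns t(f | e)."""
    e_vocab = {e for es, _ in bitext for e in es}
    f_vocab = {f for _, fs in bitext for f in fs}
    t = {(f, e): 1.0 / len(f_vocab) for f, e in product(f_vocab, e_vocab)}  # uniform init
    for _ in range(iterations):
        count = defaultdict(float)        # expected counts c(f, e)
        total = defaultdict(float)        # expected counts c(e)
        for es, fs in bitext:
            for f in fs:                  # E-step: split each French word's count
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        for (f, e) in count:              # M-step: renormalize observed pairs;
            t[(f, e)] = count[(f, e)] / total[e]   # never co-occurring pairs keep their init value
    return t

bitext = [(["the", "house"], ["la", "maison"]),
          (["the", "book"], ["le", "livre"]),
          (["a", "book"], ["un", "livre"])]
t = model1_em(bitext)
print(round(t[("maison", "house")], 3))   # rises above the uniform 1/5 starting value
```

After a few iterations, t(maison | house) rises well above its uniform starting value of 1/5, which is exactly the "raise some associations, lower others" behavior the slides describe; in the full pipeline these Model 1 (and Model 2) estimates seed the Model 3 parameters.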