6.863J Natural Language Processing
Lecture 19: Machine translation 3
Robert C. Berwick

The Menu Bar
• Administrivia:
• Start w/ final projects (final proj: was 20% – boost to 35%, 4 labs 55%?)
• Agenda:
• MT: the statistical approach
• Formalize what we did last time
• Divide & conquer: 4 steps
• Noisy channel model
• Language Model
• Translation model
• Scrambling & fertility; NULL words

Submenu
• The basic idea: moving from Language A to Language B
• The noisy channel model
• Juggling words in translation – bag of words model; divide & translate
• Using n-grams – the Language Model
• The Translation Model
• Estimating parameters from data
• Bootstrapping via EM
• Searching for the best solution

Like our alien system
• We will have two parts:
1. A bilingual dictionary that will tell us what e words go w/ what f words
2. A shake-n-bake idea of how the words might get scrambled around
• We get these from cycling between alignment & word translations – a re-estimation loop on which words are linked with which other words

'George Bush' model of translation (noisy channel)
• IBM "Model 3"
• Noisy channel story: treat the observed French text f as a "corrupted" version of an English sentence e that passed through a noisy channel; translation recovers (renders) that English
• f, e are strings of (French, English) words
• First to do this, late 80s: Brown et al., Computational Linguistics, 1990 (orig. 1988 conference) – the "Candide" system
• We'll follow that paper & the 1993 paper on estimating parameters
• 1993: Brown, Della Pietra, et al., "The Mathematics of Statistical Machine Translation: Parameter Estimation," Computational Linguistics, 19:2, 263-311

Summary of components – Model 3
• The language model: P(e)
• The translation model for P(f|e):
• Word translation t
• Distortion (scrambling) d
• Fertility Φ
• (really evil): null words e0 and f0
• Maximize (A* search) through the product space

OK, what are the other models?
• Model 1 – just t
• Model 2 – just t & simple d
• What are they for?
• As we'll see – used to pipeline training – get estimates for Model 3

How to estimate? EM Algorithm
• Formalize alignment
• Formalize the dictionary in terms of P(f|e)
• Formalize shake-n-bake in terms of P(e)
• Formalize re-estimation in terms of the EM algorithm
• Give an initial estimate (uniform), then raise the pr's of some associations, lower others (a toy EM sketch appears at the end of these notes)
• e.g. which P(f|e), or which P(? | e1 e0) – e.g. P(les|the)

The training data – Hansard
• Q: What do you think is the biggest error source in Hansard?
• A: How about this – P(? | hear, hear), as in "Hear, hear!"

Fundamentals
• The basic equation: ê = argmax_e Pr(e) Pr(f|e)
• Language Model Probability Estimation – Pr(e)
• Translation Model Probability Estimation – Pr(f|e)
• Search Problem – maximizing their product

Finding the pr estimates
• Usual problem: sparse data
• We cannot create a "sentence dictionary" E ↔ F
• Most sentences never occur in the data even once, let alone twice

Let's see what this means: P(e) x P(f|e)
• Factor 1: Language Model
• Factor 2: Translation Model

P(e) – Language model
• Review: it does the job of ordering the English words
• We estimate this from monolingual text
• Just like our alien language bigram data
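To make the "estimate P(e) from monolingual text" step concrete, here is a minimal sketch of a maximum-likelihood bigram language model of the kind the alien-language exercise used. The function names, the toy corpus, and the floor value for unseen bigrams are invented for illustration; IBM actually used trigrams, far more data, and smoothing.

```python
# Minimal sketch (not from the lecture): a maximum-likelihood bigram language
# model P(e) trained on monolingual English text, then used to score a
# candidate word order.  Real systems need smoothing and (per IBM) trigrams.
from collections import defaultdict
import math

def train_bigram_lm(sentences):
    """Count bigrams over tokenized sentences; return MLE P(w | previous w)."""
    context = defaultdict(int)
    bigram = defaultdict(int)
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            context[prev] += 1
            bigram[(prev, cur)] += 1
    return {pair: count / context[pair[0]] for pair, count in bigram.items()}

def log_prob(lm, words, floor=1e-9):
    """Score one sentence; unseen bigrams get a small floor probability."""
    padded = ["<s>"] + words + ["</s>"]
    return sum(math.log(lm.get((p, c), floor)) for p, c in zip(padded, padded[1:]))

corpus = [["the", "program", "has", "been", "implemented"],
          ["the", "proposal", "has", "been", "implemented"]]
lm = train_bigram_lm(corpus)
print(log_prob(lm, ["the", "program", "has", "been", "implemented"]))
print(log_prob(lm, ["program", "the", "been", "has", "implemented"]))  # scrambled order scores lower
```

Scoring two orderings of the same bag of words with such a model is exactly the bag-translation test described next.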
Bag translation?
• Take a sentence, cut it into words, put them in a bag, shake, recover the original sentence
• Why? To show how well the model gets the order of the English language, i.e., to test the P(e) estimate
• How? Use the n-gram model to rank different arrangements of the words:
• S better than S' if P(S) > P(S')
• Test: 100 S's, trigram model

Bag results?
• Exact reconstruction (63%)
• Please give me your response as soon as possible →
• Please give me your response as soon as possible
• Reconstruction that preserves meaning (20%)
• Now let me mention some of the disadvantages →
• Let me mention some of the disadvantages
• Rest – garbage
• In our organization research has two missions →
• In our missions research organization has two
• What is the time complexity? What k does this use?

Estimating P(e)
• IBM used trigrams
• LOTS of them… we'll see details later
• For now…

P(f|e) – Recall the Model 3 story
• Words in English are replaced by French words, then scrambled
• Let's review how
• Not word-for-word replacement (can't always have same-length sentences)

Alignment as the "Translation Model": example alignment
• English (positions 0-6): e0 And the program has been implemented
• French (positions 0-7): f0 Le programme a été mis en application
• Notation: Le(2) programme(3) a(4) été(5) mis(6) en(6) application(6) = [2 3 4 5 6 6 6]
• Another example (the slide annotates the links with t, d, and Φ):
• The proposal will not now be implemented →
• Les propositions ne seront pas mises en application maintenant

4 parameters for P(f|e)
1. Word translation, t
2. Distortion (scrambling), d
3. Fertility, Φ
4. Spurious word toss-in, p

Notation
• e = English sentence
• f = French sentence
• English sentence has words i = 1, 2, …, l
• ei = ith English word
• fj = jth French word
• l = # of words in the English sentence
• m = # of words in the French sentence
• a = alignment (a vector of integers a1 a2 … am, where each aj ranges from 0 to l)
• aj = the actual English position connected to by the jth French word in alignment a
• eaj = the actual English word connected to by the jth French word in alignment a
• Φi = fertility of English word i (i = 1 to l), given alignment a

OK, what parameters do we need?
• Look at the dependencies in the generative story!
• 3 basic parameters
• Parameter 1: which f word to generate depends only on the English word e that is doing the generating
• Example: prob(fromage | monkey)
• Denote these by t(fj | ei)

Procrustean bed
1. For each word ei in the English sentence e, i = 1, 2, …, l, we choose a fertility Φ(ei), equal to 0, 1, 2, … [25]

Fertility
• Prob that "monkey" will produce a certain # of French words
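As a way of seeing how the parameter tables above fit together, here is a highly simplified sketch (not the lecture's or IBM's code) of scoring one (French sentence, alignment) pair given an English sentence under a Model-3-like factorization. The exact Model 3 formula also contains combinatorial correction factors (the Φ0 spurious-word binomial and the Φi! permutation terms) that are deliberately omitted, and the tables t, d, n and the value p1 are assumed to be toy Python dictionaries/values rather than trained parameters.

```python
# Highly simplified sketch: combine the Model 3 parameter tables
#   t(f | e)  word translation,  d(j | i, l, m)  distortion,
#   n(phi | e)  fertility,       p1  spurious (NULL) word probability
# into a log-score for one (f, alignment) pair given e.
# The combinatorial factors of the real Model 3 formula are left out.
import math
from collections import Counter

def model3_score(e_words, f_words, align, t, d, n, p1):
    """align[j] = English position (0 = NULL) that generates French word j+1."""
    l, m = len(e_words), len(f_words)
    fert = Counter(align)                                  # fertility of each English position
    logp = 0.0
    for i, e_word in enumerate(e_words, start=1):
        logp += math.log(n.get((fert[i], e_word), 1e-9))   # fertility term n(phi | e)
    for j, (f_word, a_j) in enumerate(zip(f_words, align), start=1):
        e_word = "NULL" if a_j == 0 else e_words[a_j - 1]
        logp += math.log(t.get((f_word, e_word), 1e-9))    # word translation t(f | e)
        if a_j != 0:
            logp += math.log(d.get((j, a_j, l, m), 1e-9))  # distortion d(j | i, l, m)
    phi0 = fert[0]                                         # words generated by NULL
    # simplified spurious-word term; the exact Model 3 NULL term differs
    logp += phi0 * math.log(p1) + (m - phi0) * math.log(1 - p1)
    return logp
```

Called with the slide's example, e = "And the program has been implemented" (NULL handled as position 0), f = "Le programme a été mis en application", and align = [2, 3, 4, 5, 6, 6, 6], the computed fertilities come out as Φ(And) = 0 and Φ(implemented) = 3, matching the picture.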
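Finally, tying together the earlier EM slide ("give an initial uniform estimate, then raise the pr's of some associations, lower others") with the Model 1 → Model 3 training pipeline, here is a toy sketch of Model-1-style re-estimation of the word-translation table t(f|e). It omits the NULL word, uses an invented three-sentence bitext, and is only meant to show the shape of the re-estimation loop, not the lecture's or IBM's implementation.

```python
# Toy sketch of IBM Model 1 EM re-estimation of t(f | e):
# start uniform, then repeatedly collect expected counts and renormalize.
from collections import defaultdict
from itertools import product

def model1_em(bitext, iterations=10):
    """bitext: list of (english_words, french_words) pairs; returns t(f | e)."""
    e_vocab = {e for es, _ in bitext for e in es}
    f_vocab = {f for _, fs in bitext for f in fs}
    t = {(f, e): 1.0 / len(f_vocab) for f, e in product(f_vocab, e_vocab)}  # uniform init
    for _ in range(iterations):
        count = defaultdict(float)        # expected counts c(f, e)
        total = defaultdict(float)        # expected counts c(e)
        for es, fs in bitext:
            for f in fs:                  # E-step: split each French word's count
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        for (f, e) in count:              # M-step: renormalize observed pairs;
            t[(f, e)] = count[(f, e)] / total[e]   # never co-occurring pairs keep their init value
    return t

bitext = [(["the", "house"], ["la", "maison"]),
          (["the", "book"], ["le", "livre"]),
          (["a", "book"], ["un", "livre"])]
t = model1_em(bitext)
print(round(t[("maison", "house")], 3))   # rises above the uniform 1/5 starting value
```

After a few iterations, t(maison | house) rises well above its uniform starting value of 1/5, which is exactly the "raise some associations, lower others" behavior the slides describe; in the full pipeline these Model 1 (and Model 2) estimates seed the Model 3 parameters.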