
A Statistical MT Tutorial Workbook
Kevin Knight
prepared in connection with the JHU summer workshop
April 30, 1999

1. The Overall Plan

We want to automatically analyze existing human sentence translations, with an eye toward building general translation rules; we will use these rules to translate new texts automatically.

I know this looks like a thick workbook, but if you take a day to work through it, you will know almost as much about statistical machine translation as anybody!

The basic text that this tutorial relies on is Brown et al., "The Mathematics of Statistical Machine Translation," Computational Linguistics, 1993. On top of this excellent presentation, I can only add some perspective and perhaps some sympathy for the poor reader, who has (after all) done nothing wrong.

Important terms are underlined throughout!

2. Basic Probability

We're going to consider that an English sentence $e$ may translate into any French sentence $f$. Some translations are just more likely than others. Here are the basic notations we'll use to formalize "more likely":

$P(e)$: a priori probability. The chance that $e$ happens. For example, if $e$ is the English string "I like snakes," then $P(e)$ is the chance that a certain person at a certain time will say "I like snakes" as opposed to saying something else.

$P(f \mid e)$: conditional probability. The chance of $f$ given $e$. For example, if $e$ is the English string "I like snakes," and if $f$ is the French string "maison bleue," then $P(f \mid e)$ is the chance that upon seeing $e$, a translator will produce $f$. Not bloody likely, in this case.

$P(e, f)$: joint probability. The chance of $e$ and $f$ both happening. If $e$ and $f$ don't influence each other, then we can write $P(e, f) = P(e) \cdot P(f)$. For example, if $e$ stands for "the first roll of the die comes up 5" and $f$ stands for "the second roll of the die comes up 3," then $P(e, f) = P(e) \cdot P(f) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}$. If $e$ and $f$ do influence each other, then we had better write $P(e, f) = P(e) \cdot P(f \mid e)$. That means: the chance that "$e$ happens" times the chance that "if $e$ happens, then $f$ happens." If $e$ and $f$ are strings that are
mutual translations, then there's definitely some influence.

Exercise: $P(e, f) = P(f) \cdot \;?$

All these probabilities are between zero and one, inclusive. A probability of 0.5 means "there's a half a chance."

3. Sums and Products

To represent the addition of integers from 1 to $n$, we write:

$$\sum_{i=1}^{n} i = 1 + 2 + 3 + \cdots + n$$

For the product of integers from 1 to $n$, we write:

$$\prod_{i=1}^{n} i = 1 \cdot 2 \cdot 3 \cdots n$$

If there's a factor inside a summation that does not depend on what's being summed over, it can be taken outside:

$$\sum_{i=1}^{n} i \cdot k = k + 2k + 3k + \cdots + nk = k \sum_{i=1}^{n} i$$

Exercise: $\prod_{i=1}^{n} i \cdot k = \;?$

Sometimes we'll sum over all strings $e$. Here are some useful things about probabilities:

$$\sum_{e} P(e) = 1$$

$$\sum_{e} P(e \mid f) = 1$$

$$P(f) = \sum_{e} P(e) \cdot P(f \mid e)$$

You can read the last one like this: "Suppose $f$ is influenced by some event. Then for each possible influencing event $e$, we calculate the chance that (1) $e$ happened, and (2) if $e$ happened, then $f$ happened. To cover all possible influencing events, we add up all those chances."

4. Statistical Machine Translation

Given a French sentence $f$, we seek the English sentence $e$ that maximizes $P(e \mid f)$ (the "most likely" translation). Sometimes we write:

$$\operatorname*{argmax}_{e} P(e \mid f)$$

Read this argmax as follows: "the English sentence $e$, out of all such sentences, which yields the highest value for $P(e \mid f)$." If you want to think of this in terms of computer programs, you could imagine one program that takes a pair of sentences $e$ and $f$ and returns a probability $P(e \mid f)$. We will look at such a program later on.

    e, f  ->  [program]  ->  P(e|f)

Or you could imagine another program that takes a sentence $f$ as input and outputs every conceivable string $e_i$ along with its $P(e_i \mid f)$. This program would take a long time to run, even if you limit English translations to some arbitrary length.

    f  ->  [program]  ->  e1, P(e1|f) ... en, P(en|f)

5. The Noisy Channel

Memorize Bayes' Rule; it's very important!

$$P(e \mid f) = \frac{P(e) \cdot P(f \mid e)}{P(f)}$$

Exercise: Now prove it, using the exercise in section 2.

Using Bayes' Rule, we can rewrite the expression for the most likely translation:

$$\operatorname*{argmax}_{e} P(e \mid f) = \operatorname*{argmax}_{e} P(e) \cdot P(f \mid e)$$

Exercise: What happened to $P(f)$?

That means the most likely translation $e$ maximizes
the product of two terms: (1) the chance that someone would say $e$ in the first place, and (2) if he did say $e$, the chance that someone else would translate it into $f$.

The noisy channel works like this. We imagine that someone has $e$ in his head, but by the time it gets on to the printed page it is corrupted by "noise" and becomes $f$. To recover the most likely $e$, we reason about (1) what kinds of things people say in English, and (2) how English gets turned into French. These are sometimes called "source modeling" and "channel modeling." People use the noisy channel metaphor for a lot of engineering problems, like actual noise on telephone transmissions.

If you want to think of $P(e)$ in terms of computer programs, you can think of one program that takes any English string $e$ and outputs a probability $P(e)$. We'll see such a program pretty soon. Or, likewise, you can think of a program that produces a long list of all sentences $e_i$ with their associated probabilities $P(e_i)$.

    e  ->  [program]  ->  P(e)
    [program]  ->  e1, P(e1) ... en, P(en)

To think about the $P(f \mid e)$ factor, imagine another program that takes a pair of sentences $e$ and $f$ and outputs $P(f \mid e)$. Or, likewise, a program that takes a sentence $e$ and produces various sentences $f_i$ along with corresponding probabilities $P(f_i \mid e)$.

    e, f  ->  [program]  ->  P(f|e)
    e  ->  [program]  ->  f1, P(f1|e) ... fn, P(fn|e)

These last two programs are sort of like the ones in section 4, except that $P(f \mid e)$ is not the same thing as $P(e \mid f)$.

You can put the source and channel modules together like this:

    [source: P(e)]  ->  e  ->  [channel: P(f|e)]  ->  f

There are many ways to produce the same French sentence $f$. Each way corresponds to a different choice of "source" sentence $e$.

Notice that the modules have arrows pointing to the right. This is called a generative model because it is a theory of how French sentences get generated. The theory is: first an English sentence is generated, then it gets turned into French. Kind of a weird theory.

6. Bayesian Reasoning

Even though the arrows point …
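As a numerical check on the identities in sections 2 through 5, here is a minimal Python sketch of a toy noisy channel. Only the French string "maison bleue" and the dice example come from the text; the English candidate sentences and every probability are made-up assumptions chosen for illustration, and the helper names `marginal`, `posterior`, and `decode` are hypothetical, not taken from Brown et al. The sketch computes $P(f)$ by summing over $e$, applies Bayes' Rule, and confirms that maximizing $P(e) \cdot P(f \mid e)$ picks the same $e$ as maximizing $P(e \mid f)$.

```python
from fractions import Fraction as F

# ASSUMPTION: all numbers below are hypothetical toy values for illustration;
# a real system would estimate P(e) and P(f | e) from data.

# Section 2: joint probability of two independent die rolls.
p_first_is_5 = F(1, 6)
p_second_is_3 = F(1, 6)
p_joint_dice = p_first_is_5 * p_second_is_3  # P(e, f) = P(e) * P(f)

# Source model P(e): a distribution over a tiny, closed set of English
# sentences (hypothetical candidates), so it sums to 1.
source = {
    "the blue house": F(1, 2),
    "the house is blue": F(1, 4),
    "I like snakes": F(1, 4),
}

# Channel model P(f | e) for ONE fixed French sentence f = "maison bleue".
# Each value is P("maison bleue" | e); these need not sum to 1 across e.
channel = {
    "the blue house": F(3, 5),
    "the house is blue": F(1, 5),
    "I like snakes": F(0),
}

def marginal(source, channel):
    """P(f) = sum over e of P(e) * P(f | e)  (section 3)."""
    return sum(source[e] * channel[e] for e in source)

def posterior(source, channel):
    """Bayes' Rule: P(e | f) = P(e) * P(f | e) / P(f)  (section 5)."""
    p_f = marginal(source, channel)
    return {e: source[e] * channel[e] / p_f for e in source}

def decode(source, channel):
    """Noisy-channel decoding: argmax over e of P(e) * P(f | e).
    P(f) is left out because it is the same for every candidate e."""
    return max(source, key=lambda e: source[e] * channel[e])

post = posterior(source, channel)
print("P(dice) =", p_joint_dice)
print("sum_e P(e) =", sum(source.values()))
print("P(f) =", marginal(source, channel))
print("sum_e P(e|f) =", sum(post.values()))
print("argmax_e P(e|f) =", max(post, key=post.get))
print("argmax_e P(e)P(f|e) =", decode(source, channel))
```

With these made-up numbers, $P(f) = 7/20$, the posterior gives 6/7 to "the blue house" and 1/7 to "the house is blue," and both argmax lines print "the blue house." Exact fractions are used so the sums come out to exactly 1. A real translator would face a vastly larger space of candidate $e$, which is why section 4 warns that enumerating every candidate would take a long time.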



UMD CMSC 723 - A Statistical MT Tutorial Workbook

Documents in this Course
Lecture 9

Lecture 9

12 pages

Smoothing

Smoothing

15 pages

Load more
Loading Unlocking...
Login

Join to view A Statistical MT Tutorial Workbook and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view A Statistical MT Tutorial Workbook and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?