CS 294-5: Statistical Natural Language Processing
Machine Translation II
Lecture 11: 10/12/05

Phrase Movement
- Des tremblements de terre ont à nouveau touché le Japon jeudi 4 novembre.
  (Literally: "Earthquakes have once again struck Japan, Thursday, November 4.")
- On Tuesday Nov. 4, earthquakes rocked Japan once again

The HMM Model
- Model 2 preferred global monotonicity
- We want local monotonicity: most jumps are small
- HMM model (Vogel 96)
- Re-estimate using the forward-backward algorithm
- Handling nulls requires some care
- What are we still missing?
[Figure: histogram of alignment jump distances from -3 to 3]

IBM Models 3/4/5
Step-by-step generative story for "Mary did not slap the green witch" [Al-Onaizan and Knight, 1998]:
- Mary did not slap the green witch
- Mary not slap slap slap the green witch          n(3|slap)  (fertility)
- Mary not slap slap slap NULL the green witch     P(NULL)    (null insertion)
- Mary no daba una bofetada a la verde bruja       t(la|the)  (translation)
- Mary no daba una bofetada a la bruja verde       d(j|i)     (distortion)

Examples: Translation and Fertility
Example: Idioms
Example: Morphology

Alignment Error Rate
- Sure alignments S, possible alignments P (with S ⊆ P), predicted alignments A
- AER(A; S, P) = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)

Some Results [Och and Ney 03]

Decoding
- In these word-to-word models:
  - Finding best alignments is easy
  - Finding translations is hard (why?)

Bag "Generation" (Decoding)

Bag Generation as a TSP
- Imagine bag generation with a bigram LM
- Words are nodes
- Edge weights are P(w|w')
- Valid sentences are Hamiltonian paths
- Not the best news for word-based MT!
[Figure: word graph over "it", "is", "not", "clear", "."]

IBM Decoding as a TSP

Decoding, Anyway
- Simplest possible decoder: enumerate sentences, score each with TM and LM
- Greedy decoding: assign each French word its most likely English translation
- Operators:
  - Change a translation
  - Insert a word into the English (zero-fertile French)
  - Remove a word from the English (null-generated French)
  - Swap two adjacent English words
- Do hill-climbing (or annealing)

Greedy Decoding

Stack Decoding
- Stack decoding: beam search
- Usually A* estimates for completion cost
- One stack per candidate sentence length
- Other methods: dynamic programming decoders possible if we make assumptions about the set of allowable permutations

WSD?
- Remember when we discussed WSD?
- Word-based MT systems rarely have a WSD step
- Why?
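The AER formula above can be sketched directly in Python. This is a minimal sketch: alignment links are assumed to be (English index, foreign index) pairs, and the example sets below are invented for illustration.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER(A; S, P) = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    predicted: set A of proposed alignment links
    sure:      set S of gold links all annotators agree on
    possible:  set P of gold links; by definition S is a subset of P
    """
    a, s = set(predicted), set(sure)
    p = set(possible) | s  # enforce S subset of P
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

# Invented toy example: 3 predicted links, 2 sure, 1 extra possible.
A = {(0, 0), (1, 1), (2, 3)}
S = {(0, 0), (1, 1)}
P = S | {(2, 2)}
print(alignment_error_rate(A, S, P))  # 1 - (2 + 2)/(3 + 2) = 0.2
```

Note that a prediction matching only a possible (but not sure) link is not penalized as heavily as a link outside P entirely, which is why the metric uses both intersections.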
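To make the bag-generation discussion concrete, here is a brute-force sketch: every permutation of the bag is a Hamiltonian path through the word "nodes", scored by a bigram LM. The bigram log-probabilities are invented toy values, and a real decoder would use beam search or TSP-style heuristics rather than enumerating all n! orderings.

```python
import itertools

# Invented toy bigram log-probabilities; <s> and </s> mark sentence boundaries.
LOGP = {
    ("<s>", "it"): -0.5, ("it", "is"): -0.4, ("is", "not"): -0.7,
    ("not", "clear"): -0.9, ("clear", "</s>"): -0.3,
}
DEFAULT = -5.0  # crude back-off score for unseen bigrams

def score(words):
    """Sum of bigram log-probs over the padded word sequence."""
    seq = ["<s>"] + list(words) + ["</s>"]
    return sum(LOGP.get(bigram, DEFAULT) for bigram in zip(seq, seq[1:]))

def bag_generation(bag):
    """Exhaustively score every ordering of the bag -- O(n!) permutations."""
    return max(itertools.permutations(bag), key=score)

print(bag_generation({"not", "it", "clear", "is"}))
# -> ('it', 'is', 'not', 'clear')
```

The factorial blow-up here is exactly why the slides note this is "not the best news for word-based MT": even scoring with a simple bigram LM, exact bag generation is equivalent to solving a TSP-like search problem.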