Berkeley COMPSCI 294 - Statistical Natural Language Processing

CS 294-5: Statistical Natural Language Processing
Machine Translation II
Lecture 11: 10/12/05

Phrase Movement
- Des tremblements de terre ont à nouveau touché le Japon jeudi 4 novembre.
- On Tuesday Nov. 4, earthquakes rocked Japan once again

The HMM Model
- Model 2 preferred global monotonicity.
- We want local monotonicity: most jumps are small.
- HMM model (Vogel 96)
- Re-estimate using the forward-backward algorithm.
- Handling nulls requires some care.
- What are we still missing?
[Figure: distribution of jump distances, from -3 to 3]

IBM Models 3/4/5
- Mary did not slap the green witch
- Mary not slap slap slap the green witch (fertility: n(3|slap))
- Mary not slap slap slap NULL the green witch (null insertion: P(NULL))
- Mary no daba una bofetada a la verde bruja (translation: t(la|the))
- Mary no daba una bofetada a la bruja verde (distortion: d(j|i))
[Al-Onaizan and Knight, 1998]

Examples: Translation and Fertility

Example: Idioms

Example: Morphology

Alignment Error Rate
- Sure alignments S, possible alignments P (with S a subset of P), predicted alignments A.
- AER(A; S, P) = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)

Some Results [Och and Ney 03]

Decoding
- In these word-to-word models:
- Finding best alignments is easy.
- Finding translations is hard (why?)

Bag "Generation" (Decoding)

Bag Generation as a TSP
- Imagine bag generation with a bigram LM.
- Words are nodes.
- Edge weights are P(w|w').
- Valid sentences are Hamiltonian paths.
- Not the best news for word-based MT!
[Figure: graph over the words "it", "is", "not", "clear", "."]

IBM Decoding as a TSP

Decoding, Anyway
- Simplest possible decoder: enumerate sentences, score each with TM and LM.
- Greedy decoding: assign each French word its most likely English translation, then apply operators:
- Change a translation.
- Insert a word into the English (zero-fertility French).
- Remove a word from the English (null-generated French).
- Swap two adjacent English words.
- Do hill-climbing (or annealing).

Greedy Decoding

Stack Decoding
- Stack decoding: beam search.
- Usually A* estimates for completion cost.
- One stack per candidate sentence length.
- Other methods: dynamic programming decoders are possible if we make assumptions about the set of allowable permutations.

WSD?
- Remember when we discussed WSD?
- Word-based MT systems rarely have a WSD step. Why?

