CS 294-5: Statistical Natural Language Processing
Machine Translation
Dan Klein
includes slides from Manning (from Yamada, Knight)

Machine Translation
REF: According to the data provided today by the Ministry of Foreign Trade and Economic Cooperation, as of November this year, China has actually utilized 46.959 billion US dollars of foreign capital, including 40.007 billion US dollars of direct investment from foreign businessmen.
IBM4: the Ministry of Foreign Trade and Economic Cooperation, including foreign direct investment 40.007 billion US dollars today provide data include that year to November china actually using foreign 46.959 billion US dollars and
Yamada/Knight: today’s available data of the Ministry of Foreign Trade and Economic Cooperation shows that china’s actual utilization of November this year will include 40.007 billion US dollars for the foreign direct investment among 46.959 billion US dollars in foreign capital

History
1950's: Intensive research activity in MT
1960's: Direct word-for-word replacement
1966 (ALPAC): NRC Report on MT
  Conclusion: MT no longer worthy of serious scientific investigation.
1966-1975: 'Recovery period'
1975-1985: Resurgence (Europe, Japan)
1985-present: Gradual Resurgence (US)
http://ourworld.compuserve.com/homepages/WJHutchins/MTS-93.htm

Approaches
[Figure: the Vauquois triangle. Levels from bottom to top, on both the source-text and target-text sides: Word Structure, Syntactic Structure, Semantic Structure, with Interlingua at the apex. Source side, going up: Morphological Analysis, Syntactic Analysis, Semantic Analysis, Semantic Composition. Target side, going down: Semantic Decomposition, Semantic Generation, Syntactic Generation, Morphological Generation. Horizontal links: Direct, Syntactic Transfer, Semantic Transfer.]

Just a Code?
“Also knowing nothing official about, but having guessed and inferred considerable about, the powerful new mechanized methods in cryptography—methods which I believe succeed even when one does not know what language has been coded—one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’ ”
Warren Weaver (1955:18, quoting a letter he wrote in 1947)

Bag Generation

MT System Components
[Figure: noisy-channel diagram. source P(e) -> e -> channel P(f|e) -> f (observed) -> decoder -> best e]
Language Model: P(e)
Translation Model: P(f|e)
$$e_{\text{best}} = \arg\max_e P(e \mid f) = \arg\max_e P(f \mid e)\, P(e)$$

Word-to-Word Alignment

Word-to-Word Alignment (1-Many)

Word-to-Word Alignment (Many-1)

Word-to-Word (Many-Many)

Monotonic Translation
Le Japon secoué par deux nouveaux séismes
Japan shaken by two new quakes

Order Change
Le Japon est au confluent de quatre plaques tectoniques
Japan is at the junction of four tectonic plates

Phrase Movement
Des tremblements de terre ont à nouveau touché le Japon jeudi 4 novembre.
On Tuesday Nov. 4, earthquakes rocked Japan once again

Head Switching
Le tremblement de terre a fait 39 morts et 3,183 blessés.
The earthquake killed 39 and wounded 3,183.

Non-Literal Translation
Un train s'est également arrêté sans qu'aucun passager ne soit blessé.
Injuries were also avoided by the automatic shutdown of a train.

IBM Model 1
Le Japon secoué par deux nouveaux séismes
Japan shaken by two new quakes

Learning with EM
Model 1 Parameters: P(f|e)
Start with P(f|e) uniform, including P(f|null)
For each sentence:
  For each French position i:
    Calculate posterior over English positions, P(a_i|i)
    Increment count of word f_i with word e_{a_i}
Iterate until convergence
$$P(a_i \mid i) = \frac{P(f_i \mid e_{a_i})}{\sum_{a'_i} P(f_i \mid e_{a'_i})}$$

IBM Model 2

HMM Alignment Model

Modeling Fertility

Cascaded
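
The "Learning with EM" steps for Model 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's implementation: the function name `train_model1`, the `NULL` token string, the fixed iteration count (used here in place of a convergence test), and the three-sentence toy corpus are all invented for this example. Counts are incremented fractionally, by the posterior P(a_i|i), which is the usual reading of the "increment count" step.

```python
from collections import defaultdict

NULL = "<null>"  # hypothetical name for the empty English word


def train_model1(pairs, iterations=10):
    """Estimate P(f|e) with EM.

    pairs: list of (french_tokens, english_tokens).
    Returns a dict-like table t with t[(f, e)] = P(f|e).
    """
    # Every English sentence also contains the null word.
    pairs = [(f_sent, [NULL] + e_sent) for f_sent, e_sent in pairs]
    french_vocab = {f for f_sent, _ in pairs for f in f_sent}

    # Start with P(f|e) uniform, including P(f|null).
    uniform = 1.0 / len(french_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of anything aligned to e

        for f_sent, e_sent in pairs:
            for f in f_sent:  # each French position i
                # Posterior over English positions:
                # P(a_i|i) = P(f_i|e_{a_i}) / sum_{a'} P(f_i|e_{a'})
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    posterior = t[(f, e)] / z
                    count[(f, e)] += posterior
                    total[e] += posterior

        # Re-estimate P(f|e) from the expected counts.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


# Toy corpus, invented for illustration only.
corpus = [
    ("la maison".split(), "the house".split()),
    ("la maison bleue".split(), "the blue house".split()),
    ("la fleur".split(), "the flower".split()),
]
table = train_model1(corpus)
for f in ("la", "maison", "bleue"):
    print(f, round(table[(f, "house")], 3))
```

On this toy corpus, "the" and the null word co-occur with exactly the same French words, so they end up with identical distributions. That is a small reminder that Model 1 sees only co-occurrence and ignores position.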