MEMT: Multi-Engine Machine Translation Guided by Explicit Word Matching

Alon Lavie
Language Technologies Institute
Carnegie Mellon University
Joint work with: Gregory Hanneman, Justin Merrill, Shyamsundar Jayaraman, Satanjeev Banerjee, Jaime Carbonell
March 22, 2006 GALE: MEMT

MEMT Goals and Approach
•Scientific Challenge:
–How to combine the output of multiple MT engines into a synthetic output that outperforms the originals in translation quality
–Synthetic combination of the output from the original systems, NOT just selecting the best system
•Engineering Challenge:
–How to integrate multiple distributed translation engines and the MEMT combination engine in a common framework that supports ongoing development and evaluation

Synthetic Combination MEMT
Two Stage Approach:
1. Identify common words and phrases across the translations provided by the engines
2. Decode: search the space of synthetic combinations of words/phrases and select the highest scoring combined translation
Example:
1. announced afghan authorities on saturday reconstituted four intergovernmental committees
2. The Afghan authorities on Saturday the formation of the four committees of government
MEMT: the afghan authorities announced on Saturday the formation of four intergovernmental committees

The Word Alignment Matcher
•Developed by Satanjeev Banerjee as a component in our METEOR Automatic MT Evaluation metric
•Finds maximal alignment match with minimal “crossing branches”
•Allows alignment of:
–Identical words
–Morphological variants of words
–Synonymous words (based on WordNet synsets)
•Implementation: Clever search algorithm for best match using pruning of sub-optimal sub-solutions

Matcher Example
the sri lanka prime minister criticizes the leader of the country
President of Sri Lanka criticized by the country’s Prime Minister

Scoring MEMT Hypotheses
•Scoring:
–Word confidence score [0,1] based on engine confidence and reinforcement from alignments of the words
–LM score based on trigram LM
–Log-linear combination: weighted sum of logs of confidence score and LM score
–Select best scoring hypothesis based on:
  •Total score (bias towards shorter hypotheses)
  •Average score per word

Demo

Example
IBM: victims russians are one man and his wife and abusing their eight year old daughter plus a ( 11 and 7 years ) man and his wife and driver , egyptian nationality . : 0.6327
ISI: The victims were Russian man and his wife, daughter of the most from the age of eight years in addition to the young girls ) 11 7 years ( and a man and his wife and the bus driver Egyptian nationality. : 0.7054
CMU: the victims Cruz man who wife and daughter both critical of the eight years old addition to two Orient ( 11 ) 7 years ) woman , wife of bus drivers Egyptian nationality . : 0.5293
MEMT Sentence :
Selected : the victims were russian man and his wife and daughter of the eight years from the age of a 11 and 7 years in addition to man and his wife and bus drivers egyptian nationality . 0.7647 -3.25376
Oracle : the victims were russian man and wife and his daughter of the eight years old from the age of a 11 and 7 years in addition to the man and his wife and bus drivers egyptian nationality young girls . 0.7964 -3.44128

System Development
•Initial development tests performed on TIDES 2003 Arabic-to-English MT data, using IBM, ISI and CMU SMT system output
•Evaluation tests performed on Arabic-to-English EBMT Apptek and SYSTRAN system output and on three Chinese-to-English COTS systems
•Tests on GALE dry-run data currently in progress:
–MT systems from IBM, CMU, UMD

Experimental Results: Arabic-to-English
System                               METEOR Score
Apptek                               .4241
EBMT                                 .4231
Systran                              .4405
Choosing best online translation     .4432
MEMT                                 .5185
Best hypothesis generated by MEMT    .5883

Architecture and Engineering
•Challenge: How do we construct an effective architecture for running MEMT within large-scale distributed projects?
–Example: GALE Project
–Multiple MT engines running at different locations
–Input may be text or output of speech recognizers; output may go downstream to other applications (IE, Summarization, TDT)
•Approach: Using IBM’s UIMA: Unstructured Information Management Architecture
–Provides support for building robust processing “workflows” with heterogeneous components
–Components act as “annotators” at the character level within documents

UIMA-based MEMT
•MT engines and MEMT engine are set up as distributed servers:
–Communication over socket connections
–Sentence-by-sentence translation
•Java “wrappers” convert these into UIMA-style annotator components
•UIMA-based “workflows” implement a variety of asynchronous tasks, with results stored in a common Annotations Database (ADB)
–Translation workflows
–MEMT workflow
–Evaluation/scoring workflow
•ADB and ADB Collection Reader/Consumer components developed at CMU by Eric Nyberg’s group
•MEMT Workflow:
–Retrieve document translation annotations labeled by X, Y, Z from ADB
–“Annotate” the document with a new MEMT annotation
–Write back MEMT annotation into ADB

Conclusions
•New sentence-level MEMT approach with nice properties and encouraging results
•Easy to run on both research and COTS systems
•UIMA-based architecture design for effective integration in large distributed systems/projects
–Pilot study has been
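
The log-linear scoring described under "Scoring MEMT Hypotheses" can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Hypothesis`, `total_score`, `average_score` and `select_best`, the weights `w_conf` and `w_lm`, and the example numbers are all hypothetical, and the LM score is supplied here as a plain sentence probability rather than computed from a trigram LM. The sketch only shows why selecting on total score favors shorter hypotheses, while the per-word average does not.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """A candidate combined translation (hypothetical structure)."""
    words: List[str]
    confidences: List[float]  # per-word confidence, each in (0, 1]
    lm_prob: float            # sentence probability from a language model, in (0, 1]


def total_score(h: Hypothesis, w_conf: float = 1.0, w_lm: float = 1.0) -> float:
    """Weighted sum of log word confidences and the log LM probability."""
    conf_log = sum(math.log(c) for c in h.confidences)
    return w_conf * conf_log + w_lm * math.log(h.lm_prob)


def average_score(h: Hypothesis, w_conf: float = 1.0, w_lm: float = 1.0) -> float:
    """Total score normalized by hypothesis length in words."""
    return total_score(h, w_conf, w_lm) / len(h.words)


def select_best(hyps: List[Hypothesis], per_word: bool = True) -> Hypothesis:
    """Pick the highest-scoring hypothesis by average or by total score."""
    scorer = average_score if per_word else total_score
    return max(hyps, key=scorer)


# Made-up example: a short, low-confidence hypothesis versus a longer,
# high-confidence one.
short = Hypothesis(["afghan", "committees"], [0.5, 0.5], 1e-2)
longer = Hypothesis(
    ["the", "afghan", "authorities", "announced", "four", "committees"],
    [0.9] * 6,
    1e-4,
)

by_total = select_best([short, longer], per_word=False)
by_average = select_best([short, longer], per_word=True)
```

Because every log term is negative, each extra word lowers the total, so `by_total` is the short hypothesis even though its words are less reliable; normalizing by length makes `by_average` the longer one.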

