CMSC 723 / LING 645: Intro to Computational Linguistics
September 8, 2004: Dorr
MT (continued), MT Evaluation
Prof. Bonnie J. Dorr
Dr. Christof Monz
TA: Adam Lee

MT Challenges: Ambiguity
– Syntactic Ambiguity: I saw the man on the hill with the telescope
– Lexical Ambiguity: book (E) = libro, reservar (S)
– Semantic Ambiguity:
  – Homography: ball (E) = pelota, baile (S)
  – Polysemy: kill (E) = matar, acabar (S)
  – Semantic granularity: esperar (S) = wait, expect, hope (E); be (E) = ser, estar (S); fish (E) = pez, pescado (S)

MT Challenges: Divergences
– The meaning of two translationally equivalent phrases is distributed differently in the two languages
– Example:
  – English: [RUN INTO ROOM]
  – Spanish: [ENTER IN ROOM RUNNING]

Divergence Frequency
– 32% of sentences in the UN Spanish/English Corpus (5K)
– 35% of sentences in the TREC El Norte Corpus (19K)
– Divergence Types:
  – Categorial (X tener hambre → X have hunger) [98%]
  – Conflational (X dar puñaladas a Z → X stab Z) [83%]
  – Structural (X entrar en Y → X enter Y) [35%]
  – Head Swapping (X cruzar Y nadando → X swim across Y) [8%]
  – Thematic (X gustar a Y → Y like X) [6%]

Spanish/Arabic Divergences (E → E′ examples)
– Categorial: be jealous → have jealousy [tener celos] (Spanish); when he returns → upon his return [ﻪﻋﻭﺠﺮ ﺩﻧﻋ] (Arabic)
– Conflational: float → go floating [ir flotando] (Spanish); come again → return [ﺪﺎﻋ] (Arabic)
– Structural: enter the house → enter in the house [entrar en la casa] (Spanish); seek → search for [ﻦﻋ ﺙﺣﺒ] (Arabic)
– Head Swap: run in → enter running [entrar corriendo] (Spanish); do something quickly → go-quickly in doing something [ﻉﺭﺴﺍ] (Arabic)
– Thematic: I have a headache → my-head hurts me [me duele la cabeza] (Spanish); (no Arabic example)

Automatic Divergence Detection
(using narrowly defined divergence detection rules)
– Example rule: [Arg1 [V]] → [Arg1 [MotionV] Modifier(V)], e.g., "The boat floated" → "The boat went floating"
– Results:
  Language         Detected  Human Confirmed  Sample Size  Corpus Size
  Spanish – Total  11.1%     10.5%            19K          150K
  Arabic – Total   31.9%     12.5%            1K           28K

Application of Divergence Detection: Bilingual Alignment for MT
– Word-level alignments of bilingual texts are an integral part of MT models
– Divergences present a great challenge to the alignment task
– Common divergence types can be found in multiple language pairs, systematically identified, and resolved

The Problem: Alignment & Projection
– English: I began to eat the fish
– Spanish: Yo empecé a comer el pescado

Why is this a hard problem?
– English: I run into the room
– Spanish: Yo entro en el cuarto corriendo
– Divergences!
  – English: [RUN INTO ROOM]
  – Spanish: [ENTER IN ROOM RUNNING]

Our Goal: Improved Alignment & Projection
– Induce a higher interannotator agreement rate
– Increase the number of aligned words
– Decrease multiple alignments

DUSTer Approach: Divergence Unraveling
– E: I run into the room
– E′: I move-in running the room
– S: Yo entro en el cuarto corriendo

Word-Level Alignment (1): Test Setup
– Ex: John ran into the room → John entered the room running
– Divergence Detection: categorize English sentences into one of 5 divergence types
– Divergence Correction: apply the appropriate structural transformation [E → E′]

Word-Level Alignment (2): Testing Impact of Divergence Correction
– Humans align the English and foreign sentences, both before and after divergence correction
– Compare inter-annotator agreement, unaligned units, and multiple alignments

Word-Level Alignment Results
– Inter-Annotator Agreement:
  – English-Spanish: agreement increased from 80.2% to 82.9%
  – English-Arabic: agreement increased from 69.7% to 75.1%
– Number of aligned words:
  – English-Spanish: aligned words increased from 82.8% to 86%
  – English-Arabic: aligned words increased from 61.5% to 88.1%
– Multiple Alignments:
  – English-Spanish: number of links went from 1.35 to 1.16
  – English-Arabic: number of links increased from 1.48 to 1.72

Divergence Unraveling Conclusions
– Divergence handling shows promise for improving automatic alignment
– The detected rates are a conservative lower bound on divergence frequency
– Effective solution: syntactic transformation of English
– Validity of the solution shown through alignment experiments

How do we evaluate MT?
– Human-based Metrics:
  – Semantic Invariance
  – Pragmatic Invariance
  – Lexical Invariance
  – Structural Invariance
  – Spatial Invariance
  – Fluency
  – Accuracy
  – "Do you get it?"
– Automatic Metrics: Bleu

BiLingual Evaluation Understudy (BLEU; Papineni, 2001)
– An automatic technique, but it requires the pre-existence of human (reference) translations
– Approach:
  – Produce a corpus of high-quality human translations
  – Judge "closeness" numerically (word-error rate)
  – Compare n-gram matches between the candidate translation and 1 or more reference translations
– http://www.research.ibm.com/people/k/kishore/RC22176.pdf

Bleu Comparison
Chinese-English Translation Example:
– Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party.
– Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct.
– Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
– Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
– Reference 3: It is the practical guide for the army always to heed the directions of the party.

How Do We Compute Bleu Scores?
– Key Idea: a reference word should be considered exhausted after a matching candidate word is identified.
– For each word, compute (1) the candidate word count and (2) the maximum reference count
– Add the clipped counts for each word and divide by the total number of candidate words
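The clipped counting described above is the heart of BLEU's modified n-gram precision. Below is a minimal Python sketch of that idea; the function names and the repeated-word "cheater" candidate are illustrative choices, not from the lecture.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n=1):
    """Clipped (modified) n-gram precision: each candidate n-gram is
    credited at most as many times as it occurs in any single reference."""
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    # Maximum count of each n-gram across the reference translations.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.lower().split(), n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

# A degenerate candidate that repeats a common word: ordinary precision
# would be 7/7, but clipping caps "the" at its maximum reference count (2).
refs = ["the cat is on the mat", "there is a cat on the mat"]
print(modified_precision("the the the the the the the", refs))  # prints 0.2857142857142857 (= 2/7)
```

Whitespace tokenization here is a simplification; full BLEU also combines the modified precisions for n = 1 through 4 and applies a brevity penalty so that very short candidates cannot score well.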