Machine Translation: Word Alignment
Stephan Vogel
Spring Semester 2011

Outline: Overview; Fertility Models; The Generative Story; Fertility Model; Fertility Model: Constraints; Fertility Model: Some Issues; Fertility Model: Empty Position; Deficiency; IBM 4: 1st Order Distortion Model; Inverted Alignment; Characteristics of Alignment Models; Consideration: Overfitting; Extension: Using Manual Dictionaries; Extension: Using POS; And Much More …; Alignment Results; Unaligned Words; Alignment Errors for Most Frequent Words (CH-EN); Sentence Length Distribution; Summary

Overview
- IBM 3: Fertility
- IBM 4: Relative Distortion
Acknowledgement: These slides are based on slides by Hermann Ney and Franz Josef Och.

Fertility Models
- Basic concept: each word in one language can generate multiple words in the other language
  - deseo – I would like
  - übermorgen – the day after tomorrow
  - departed – fuhr ab
- The same word can generate different numbers of words -> probability distribution
- Alignment is a function -> fertility only on one side
  - In my terminology: target words have fertility, i.e. each target word can cover multiple source words
  - Others say a source word generates multiple target words
- Some source words are aligned to the NULL word, i.e. the NULL word has fertility
- Many target words are not aligned, i.e. have fertility 0

The Generative Story
[Figure: the three generation steps for English words $e_0 \ldots e_5$]
- fertility generation: fertilities 1 2 0 1 3 0 for $e_0, e_1, e_2, e_3, e_4, e_5$
- word generation: tablets $\tilde f_{01}$; $\tilde f_{11}, \tilde f_{12}$; $\tilde f_{31}$; $\tilde f_{41}, \tilde f_{42}, \tilde f_{43}$
- permutation generation: French sentence $f_1 f_2 f_3 f_4 f_5 f_6 f_7$

Fertility Model
- Alignment model:
  $$\Pr(f_1^J \mid e_0^I) = \sum_{a_1^J} \Pr(f_1^J, a_1^J \mid e_0^I)$$
- Select fertility for each English word: $\phi(e_i)$
- For each English word select a tablet of French words: $\tilde f_{ik}, \; k = 1, \ldots, \phi(e_i)$
- Select a permutation for the entire sequence of French words: $\pi: (i, k) \to j = \pi_{ik}$
- Sum over all realizations:
  $$\Pr(f_1^J, a_1^J \mid e_0^I) = \sum_{(\tilde f, \pi) \in \langle f_1^J, a_1^J \rangle} \Pr(\tilde f, \pi \mid e_0^I)$$

Fertility Model: Constraints
- Fertility bound to alignment: $\phi_i := \phi(e_i) = \sum_{j=1}^{J} \delta(i, a_j)$
- Permutation: $a_{\pi_{ik}} = i$ for all $i$ and $k = 1, \ldots, \phi_i$
- French words: $f_{\pi_{ik}} = \tilde f_{ik}$

Fertility Model
- Decomposition into factors:
  $$\Pr(\tilde f, \pi \mid e_0^I) = \Pr(\phi_0^I \mid e_0^I) \cdot \Pr(\tilde f \mid \phi_0^I, e_0^I) \cdot \Pr(\pi \mid \tilde f, \phi_0^I, e_0^I)$$
- Apply the chain rule to each factor and limit the dependencies:
  - Fertility generation (IBM 3, 4, 5):
    $$\Pr(\phi_0^I \mid e_0^I) = p\Big(\phi_0 \,\Big|\, e_0, \sum_{i=1}^{I} \phi_i\Big) \cdot \prod_{i=1}^{I} p(\phi_i \mid e_i)$$
  - Word generation (IBM 3, 4, 5):
    $$\Pr(\tilde f \mid \phi_0^I, e_0^I) = \prod_{i=0}^{I} \prod_{k=1}^{\phi_i} p(\tilde f_{ik} \mid e_i)$$
  - Permutation generation (only IBM 3):
    $$\Pr(\pi \mid \tilde f, \phi_0^I, e_0^I) = \frac{1}{\phi_0!} \prod_{i=1}^{I} \prod_{k=1}^{\phi_i} p(\pi_{ik} \mid i, I, J)$$
- Note: the factor $1/\phi_0!$ results from the special model for $i = 0$.

Fertility Model: Some Issues
- The permutation model cannot guarantee that $\pi$ is a permutation
  -> Words can be stacked on top of each other
  -> This leads to deficiency
- Position $i = 0$ is not a real position
  -> special alignment and fertility model for the empty word

Fertility Model: Empty Position
- Alignment assumptions for the empty position $i = 0$
  - Uniform position distribution for each of the $\phi_0$ French words generated from $e_0$
  - Place these French words only after all other words have been placed
- Alignment model for the positions aligned to the Empty position:
  - One position:
    $$p(\pi_{0k} = j \mid i = 0, I, J) = \begin{cases} 0 & \text{if } j \text{ is occupied} \\ \dfrac{1}{\phi_0 - k + 1} & \text{if } j \text{ is vacant} \end{cases}$$
  - All positions:
    $$\prod_{k=1}^{\phi_0} p(\pi_{0k} \mid i = 0, I, J) = \frac{1}{\phi_0!}$$

Fertility Model: Empty Position
- Fertility model for words generated by $e_0$, i.e. by the empty position
- We assume that each word from $f_1^J$ requires the Empty word with probability $[1 - p_0]$
- Probability that exactly $\phi_0$ of the $J$ words in $f_1^J$ require the Empty word:
  $$p(\phi_0 \mid e_0, J') = \binom{J'}{\phi_0} \, p_0^{\,J' - \phi_0} \, [1 - p_0]^{\phi_0} \quad \text{with } J' := \sum_{i=1}^{I} \phi_i, \; J := J' + \phi_0$$

Deficiency
- Distortion model for real words is deficient
- Distortion model for the empty word is non-deficient
- Deficiency can be reduced by aligning more words to the empty word
- Training corpus likelihood can be increased by aligning more words with the empty word
- Play with $p_0$!

IBM 4: 1st Order Distortion Model
- Introduce more detailed dependencies into the alignment (permutation) model
- First order dependency along the e-axis
[Figure: alignment dependencies, panels "HMM" and "IBM4"]

Inverted Alignment
- Consider inverted alignments $B: i \to B_i \subset \{1, \ldots, j, \ldots, J\}$
- Dependency along the I axis: jumps along the J axis
- Two first order models, $p_{=1}(\Delta j \mid \ldots)$ and $p_{>1}(\Delta j \mid \ldots)$, for aligning the first word in a set and for aligning the remaining words
- We skip the math :-)

Characteristics of Alignment Models

Model  Alignment  Fertility  E-step  Deficient
IBM1   Uniform    No         Exact   No
IBM2   0-order    No         Exact   No
HMM    1-order    No         Exact   No
IBM3   0-order    Yes        Approx  Yes
IBM4   1-order    Yes        Approx  Yes
IBM5   1-order    Yes        Approx  No

Consideration: Overfitting
- Training on data always carries the danger of overfitting
  - Model describes the training data in too much detail
  - But does not perform well on unseen test data
- Solution: Smoothing
  - Lexicon: distribute some of the probability mass from seen events to unseen events (for $p(f \mid e)$, do this for each $e$)
    - For unseen $e$: uniform distribution or ???
  - Distortion: interpolate with a uniform distribution
    $$p'(a_j \mid a_{j-1}, I) = (1 - \alpha) \, p(a_j \mid a_{j-1}, I) + \alpha \cdot \frac{1}{I}$$
  - Fertility: for many languages 'longer word' = 'more content'
    - E.g. compounds or agglutinative morphology
    - Train a model for fertility given word length, $p(\phi \mid g(e))$, and interpolate it with the word-based fertility model
    - Interpolate fertility estimates based on word frequency: for a frequent word, use the word model; for a low-frequency word, bias towards the length model

Extension: Using Manual Dictionaries
- Adding manual dictionaries
  - Simple method 1: add as bilingual data
  - Simple method 2: interpolate the manual dictionary with the trained dictionary
  - Use constrained GIZA (Gao, Nguyen, Vogel, WMT 2010)
  - Can put higher weight on word pairs from the dictionary (Och, ACL 2000)
  - Not so simple: "But dictionaries are data too" (Brown et al, HLT 93)
- Problem: manual dictionaries do not contain inflected forms
- Possible
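The fertility slides can be made concrete with a small Python sketch. It counts fertilities from an alignment (fertility of a target word = number of source positions aligned to it) and evaluates the empty-word fertility probability. The binomial form is an assumption read off the slide's description (each word requires the Empty word with probability 1 - p0). The function names, the example alignment, and the value p0 = 0.9 are hypothetical; the alignment was chosen only so that it reproduces the fertilities 1 2 0 1 3 0 from the generative story.

```python
from math import comb


def fertilities(alignment, num_target_words):
    """Count how many source positions are aligned to each target position.

    alignment[j] = i means source word j is aligned to target word i;
    i = 0 is reserved for the empty (NULL) word.
    """
    phi = [0] * (num_target_words + 1)
    for i in alignment:
        phi[i] += 1
    return phi


def empty_word_fertility_prob(phi0, num_real, p0):
    """Probability that exactly phi0 of num_real words require the empty word.

    Assumed binomial form: each of the num_real words generated by real
    target words independently requires the empty word with probability
    1 - p0.
    """
    if not 0 <= phi0 <= num_real:
        return 0.0
    return comb(num_real, phi0) * p0 ** (num_real - phi0) * (1.0 - p0) ** phi0


# Hypothetical alignment for a 7-word source sentence and target e0..e5.
alignment = [1, 0, 4, 3, 4, 1, 4]
phi = fertilities(alignment, num_target_words=5)
print(phi)  # [1, 2, 0, 1, 3, 0]

# Number of source words generated by real target words e1..e5.
num_real = sum(phi[1:])
print(empty_word_fertility_prob(phi[0], num_real, p0=0.9))
```

Lowering p0 in this sketch shifts probability mass towards larger empty-word fertilities, which is the knob the Deficiency slide refers to with "Play with p0!".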