Machine Translation: Minimum Error Rate Training
Stephan Vogel
Spring Semester 2011

Contents: Overview; Tuning the SMT System; Problems; Brute Force Approach – Manual Tuning; Automatic Tuning; Automatic Tuning on N-best Lists; Simplex (Nelder-Mead); Demo; Expansion and Contraction; Changing the Simplex; Powell Line Search; Minimum Error Training; Iterate Decoding - Optimization; Avoiding Local Minima; Random Restarts; Optimizing NOT Towards References; Optimizing Towards Different Metrics; Generalization to Other Test Sets; Large Weight = Important Feature?; Open Issues; Summary

Overview
- Optimization approaches
  - Simplex
  - Minimum error rate training (MER)
- Avoiding local minima
- Additional considerations
  - Tuning towards different metrics
  - Tuning on different development sets

Tuning the SMT System
- We use different models in the SMT system
  - The models make simplifications
  - They are trained on different amounts of data
  => The models have different levels of reliability, and their scores have different ranges
  => Give a different weight to each model:
     Q = c1*Q1 + c2*Q2 + … + cn*Qn
- Find optimal scaling factors (feature weights) c1 … cn
  - Optimal means: highest score for the chosen evaluation metric M,
    i.e. find (c1, …, cn) such that M(argmin_e Q(e, f)) is high
  - The metric M is our objective function

Problems
- The surface of the objective function is not nice
  - Not convex -> local minima (actually, many local minima)
  - Not differentiable -> gradient descent methods are not readily applicable
- There may be dangerous areas ('boundary cliffs') where a small change has a big effect
  - Example: tune on a dev set with short reference translations
  - Optimization leads towards short translations
  - A new test set has long reference translations
  - The translations are now too short -> length penalty

Brute Force Approach – Manual Tuning
- Decode with different scaling factors
  - Get a feeling for the range of good values
  - Get a feeling for the importance of the models
    - The LM is typically the most important
    - Sentence length (word count feature) balances the shortening effect of the LM
    - Word reordering is more or less effective depending on the language
  - Narrow down the range in which the scaling factors are tested
- Essentially multi-linear optimization
- Works well for a small number of models
- Time consuming (CPU-wise) if decoding takes a long time

Automatic Tuning
- Many algorithms for finding (near) optimal solutions are available
  - Simplex
  - Powell (line search)
  - MIRA (Margin Infused Relaxed Algorithm)
  - Specially designed minimum error training (Och 2003)
  - Genetic algorithms
- Note: the models themselves are not improved, only their combination
- Note: some parameters change the performance of the decoder but are not part of Q
  - Number of alternative translations
  - Beam size
  - Word reordering restrictions

Automatic Tuning on N-best Lists
- Optimization algorithms need many iterations – too expensive to run full translations
  => Use n-best lists, e.g. 1000 translations for each of 500 source sentences
- Changing the scaling factors results in re-ranking the n-best lists
- Evaluate the new 1-best translations
- Apply any of the standard optimization techniques
- Advantage: much faster
  - The counts (e.g. n-gram matches) can be pre-calculated for each translation to speed up evaluation (see the sketch below)
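To make the re-ranking step concrete, here is a minimal Python sketch. The data layout (per-sentence lists of hypotheses, each carrying a vector of model scores and a pre-computed error count) and all function and field names are illustrative assumptions, not prescribed by the slides; Q is treated as a cost to be minimized, matching the argmin formulation above.

```python
# Minimal sketch of n-best list re-ranking under new feature weights.
# Assumed (hypothetical) data layout: nbest_lists is a list over source
# sentences; each entry is a list of hypotheses, each a dict with
#   "scores": [Q1, ..., Qn]  - individual model scores
#   "error":  float          - pre-computed error count of this hypothesis

def total_score(hyp, weights):
    """Q = c1*Q1 + ... + cn*Qn (treated as a cost, lower is better)."""
    return sum(c * q for c, q in zip(weights, hyp["scores"]))

def rerank_1best(nbest_lists, weights):
    """Return the model-best hypothesis of each sentence under these weights."""
    return [min(hyps, key=lambda h: total_score(h, weights)) for hyps in nbest_lists]

def corpus_error(nbest_lists, weights):
    """Objective function: total error of the 1-best translations."""
    return sum(h["error"] for h in rerank_1best(nbest_lists, weights))
```

Here corpus_error plays the role of the objective f in the optimizers below: it is cheap to evaluate because full decoding is replaced by re-ranking a fixed n-best list.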
Simplex (Nelder-Mead)
- Start with n+1 random configurations
- Get the 1-best translation for each configuration -> objective function
- Sort the points xk according to the objective function:
  f(x1) < f(x2) < … < f(xn+1)
- Calculate x0 as the center of gravity of x1 … xn
- Replace the worst point with a point reflected through the centroid:
  xr = x0 + r * (x0 – xn+1)

Demo
- Obviously, we need to change the size of the simplex to enforce convergence
- We also want to adjust the step size
  - If the new point is the best point – increase the step size
  - If the new point is worse than x1 … xn – decrease the step size

Expansion and Contraction (the update rules are spelled out in the code sketch below)
- Reflection:
  - Calculate xr = x0 + r * (x0 – xn+1)
  - If f(x1) <= f(xr) < f(xn), replace xn+1 with xr; next iteration
- Expansion:
  - If the reflected point is better than the best, i.e. f(xr) < f(x1):
    - Calculate xe = x0 + e * (x0 – xn+1)
    - If f(xe) < f(xr), replace xn+1 with xe, else replace xn+1 with xr
    - Next iteration
  - Else contract
- Contraction:
  - The reflected point has f(xr) >= f(xn)
  - Calculate xc = xn+1 + c * (x0 – xn+1)
  - If f(xc) <= f(xn+1), replace xn+1 with xc, else shrink
- Shrinking:
  - For all xk, k = 2 … n+1: xk = x1 + s * (xk – x1)
  - Next iteration

Changing the Simplex
[Figure: the four simplex updates – reflection of xn+1 through the centroid x0, expansion further along the same direction, contraction towards x0, and shrinking of all points towards x1.]
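The update rules above translate directly into code. Below is a compact Python sketch under the same assumptions as before (f is the objective, e.g. corpus_error over a fixed n-best list); the coefficient values, iteration count, and random initialization are illustrative choices, not from the slides.

```python
import random

def nelder_mead(f, dim, iters=200, r=1.0, e=2.0, c=0.5, s=0.5):
    """Sketch of Nelder-Mead following the slides' update rules.
    f: objective to minimize over weight vectors; dim: number of weights;
    r, e, c, s: reflection, expansion, contraction, shrink coefficients."""
    # n+1 random starting configurations
    pts = [[random.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim + 1)]
    for _ in range(iters):
        pts.sort(key=f)                       # f(x1) <= ... <= f(x_{n+1})
        best, worst = pts[0], pts[-1]
        # centroid x0 of the n best points
        x0 = [sum(p[i] for p in pts[:-1]) / dim for i in range(dim)]
        # reflection: xr = x0 + r * (x0 - x_{n+1})
        xr = [x0[i] + r * (x0[i] - worst[i]) for i in range(dim)]
        if f(best) <= f(xr) < f(pts[-2]):
            pts[-1] = xr                      # accept the reflected point
        elif f(xr) < f(best):
            # expansion: xe = x0 + e * (x0 - x_{n+1})
            xe = [x0[i] + e * (x0[i] - worst[i]) for i in range(dim)]
            pts[-1] = xe if f(xe) < f(xr) else xr
        else:
            # contraction: xc = x_{n+1} + c * (x0 - x_{n+1})
            xc = [worst[i] + c * (x0[i] - worst[i]) for i in range(dim)]
            if f(xc) <= f(worst):
                pts[-1] = xc
            else:
                # shrink all points towards x1
                pts = [best] + [[best[i] + s * (p[i] - best[i]) for i in range(dim)]
                                for p in pts[1:]]
    return min(pts, key=f)
```

A typical call would be nelder_mead(lambda w: corpus_error(nbest_lists, w), dim=n_features).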
Powell Line Search
- Select directions in the search space, then:
  - Loop until convergence
    - Loop over the directions d
      - Perform a line search along direction d until convergence
- Many variants
  - Selecting the directions
    - Easiest is to use the model scores (one direction per feature weight)
    - Or combine multiple scores
  - Step size in the line search
- MER (Och 2003) is line search along the models with a smart selection of steps

Minimum Error Training
- For each hypothesis we have the total model score
  Q = Σ_k ck*Qk
- Select one model k:
  Q = ck*Qk + Σ_{n≠k} cn*Qn = ck*Qk + QRest
- As a function of ck, the total model score of each hypothesis is a line: the individual model score Qk gives the slope, QRest the offset
[Figure: total model score of one hypothesis (WER = 8) plotted against ck; the line has slope Qk and offset QRest.]

Minimum Error Training (cont.)
- Source sentence 1
- Depending on the scaling factor ck, different hypotheses are in 1-best position
- Set ck so that the metric-best hypothesis is also model-best
[Figure: model-score lines over ck for hypotheses h11 (WER = 8), h12 (WER = 5), h13 (WER = 4); as ck increases, the model-best hypothesis changes from h11 to h12 to h13, i.e. the error of the 1-best translation steps through 8, 5, 4.]

Minimum Error Training (cont.)
- Select a minimum number of evaluation points (see the line-search sketch below):
  - Calculate the intersection points
  - Keep an intersection point only if the intersecting hypotheses are minimal at that point
  - Choose the evaluation points between the intersection points
[Figure: the same three hypothesis lines as above, with the intersection points marked.]

Minimum Error Training (cont.)
- Source sentence 1, now with different error scores
- The optimization would find a different ck
=> Different metrics lead to different scaling factors
[Figure: the same lines, but now h11: WER = 8, h12: WER = 2, h13: WER = 4, so the error of the 1-best translation steps through 8, 2, 4.]

Minimum Error Training (cont.)
- Sentence 2: the best ck lies in a different range
- No single ck is optimal for all sentences
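Here is a simplified Python sketch of this line search along one weight ck, reusing corpus_error and the assumed data layout from the first sketch. For clarity it enumerates all pairwise line intersections rather than constructing the envelope of minimal lines as in Och (2003), and the search interval is an illustrative assumption.

```python
def mert_sweep(nbest_lists, weights, k, lo=-5.0, hi=5.0):
    """Sketch of Och-style line search on weight c_k (other weights fixed).
    Each hypothesis's score is linear in c_k: Q = c_k * Q_k + Q_rest,
    so the 1-best translation changes only at intersections of these lines."""
    def line(h):
        # (offset Q_rest, slope Q_k) of this hypothesis's score line in c_k
        rest = sum(c * q for i, (c, q) in enumerate(zip(weights, h["scores"])) if i != k)
        return rest, h["scores"][k]

    # collect the candidate boundary points: all pairwise intersections
    points = {lo, hi}
    for hyps in nbest_lists:
        lines = [line(h) for h in hyps]
        for i in range(len(lines)):
            for j in range(i + 1, len(lines)):
                (a1, b1), (a2, b2) = lines[i], lines[j]
                if b1 != b2:
                    x = (a2 - a1) / (b1 - b2)
                    if lo < x < hi:
                        points.add(x)

    # the error is piecewise constant in c_k, so one evaluation point
    # between consecutive intersections suffices
    xs = sorted(points)
    mids = [(u + v) / 2 for u, v in zip(xs, xs[1:])]

    def err(ck):
        w = list(weights)
        w[k] = ck
        return corpus_error(nbest_lists, w)

    return min(mids, key=err)
```

Sweeping each weight in turn with mert_sweep, and iterating until the error stops improving, gives the Powell-style outer loop described above.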