CMU LTI 11731 - Discriminative Word Alignment

Machine Translation: Discriminative Word Alignment
Stephan Vogel, Spring Semester 2011

Contents: Generative Alignment Models; Discriminative Word Alignment; Tasks; 2005 - Year of DWA; Yang Liu et al. 2005; More Features; Search; Moore 2005; Training; Modeling Alignment with CRF; Modeling Alignment Matrix; Probability of Alignment; Features; Local Features; Fertility Features; First Order Features; Inference - Finding the Best Alignment; Belief Propagation; Getting the Probability; Some Results: Spanish-English; Summary

Generative Alignment Models
- Generative word alignment models: P(f, a | e) = …
- The alignment a is a hidden variable; the actual word alignment is not observed, so we sum over all alignments
- Well-known examples: IBM Models 1-5, HMM, ITG
- They model lexical association, distortion, and fertility
- It is difficult to incorporate additional information:
  - POS of words (used in the distortion model, not as direct link features)
  - Manual dictionaries
  - Syntax information
  - …

Discriminative Word Alignment
- Model the alignment directly: p(a | f, e)
- Find the alignment that maximizes p(a | f, e)
- A well-suited framework: maximum entropy
  - A set of feature functions h_m(a, f, e), m = 1, …, M
  - A set of model parameters (feature weights) λ_m, m = 1, …, M
  - Decision rule: â = argmax_a Σ_m λ_m h_m(a, f, e)

Tasks
- Modeling: design feature functions which capture cross-lingual divergences
- Search: find the alignment with the highest probability
- Training: find the optimal feature weights
  - Minimize alignment errors given some gold-standard alignments
  - Note: the alignments are no longer hidden! This is supervised training, i.e. we evaluate against a gold standard
- Note: feature functions may themselves result from a training procedure
  - E.g. use a statistical dictionary resulting from an IBM-n alignment, trained on a large corpus
  - Here we add a further training step on a small (hand-aligned) corpus
  - (Similar to MERT for the decoder)

2005 - Year of DWA
- Yang Liu, Qun Liu, and Shouxun Lin. 2005. Log-linear Models for Word Alignment.
- Abraham Ittycheriah and Salim Roukos. 2005. A Maximum Entropy Word Aligner for Arabic-English Machine Translation.
- Ben Taskar, Simon Lacoste-Julien, and Dan Klein. 2005. A Discriminative Matching Approach to Word Alignment.
- Robert C. Moore. 2005. A Discriminative Framework for Bilingual Word Alignment.
- Necip Fazil Ayan, Bonnie J. Dorr, and Christof Monz. 2005. NeurAlign: Combining Word Alignments Using Neural Networks.

Yang Liu et al. 2005
- Start out with features used in generative alignment
- Lexicons, e.g. IBM Model 1
  - Use both directions, p(f_j | e_i) and p(e_i | f_j), => a symmetrical alignment model
  - And/or a symmetric model
- Fertility model: p(φ_i | e_i)

More Features
- Cross count: the number of crossings in the alignment
- Neighbor count: the number of links in the immediate neighborhood
- Exact match: the number of src/tgt pairs where src = tgt
- Linked word count: the total number of links (to influence alignment density)
- Link types: how many 1-1, 1-m, m-1, and n-m alignments there are
- Sibling distance: if a word is aligned to multiple words, add the distances between those aligned words
- Link co-occurrence count: given multiple alignments (e.g. Viterbi alignments from different IBM models), count how often links co-occur
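A minimal sketch (not from the slides) of how a few of these structural features might be computed, assuming the alignment is represented as a set of (j, i) link pairs over source tokens src and target tokens tgt; the exact definitions in Yang Liu et al. 2005 may differ, and all names below are illustrative:

    # Toy structural alignment features over a set of (j, i) links.
    def cross_count(links):
        """Number of crossing link pairs: j1 < j2 but i1 > i2."""
        links = sorted(links)
        return sum(1 for a, (j1, i1) in enumerate(links)
                     for (j2, i2) in links[a + 1:]
                     if j1 < j2 and i1 > i2)

    def neighbor_count(links):
        """Number of link pairs lying in each other's immediate neighborhood."""
        links = sorted(links)
        return sum(1 for a, (j1, i1) in enumerate(links)
                     for (j2, i2) in links[a + 1:]
                     if abs(j1 - j2) <= 1 and abs(i1 - i2) <= 1)

    def exact_match_count(links, src, tgt):
        """Number of linked pairs whose source and target strings are identical."""
        return sum(1 for (j, i) in links if src[j] == tgt[i])

    # Toy usage: numbers and names often survive translation unchanged.
    src = ["das", "Haus", "2005"]
    tgt = ["the", "house", "2005"]
    links = {(0, 0), (1, 1), (2, 2)}
    print(cross_count(links), neighbor_count(links),
          exact_match_count(links, src, tgt), len(links))   # last value = linked word count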
IBM1Algorithm:Start with empty alignmentLoop until no addition gain Loop over all (j,i) not in set if gain(j,i) > best_gain then store as (j’,i’) Set link(j’,i’)Stephan Vogel - Machine Translation) 10Moore 2005Log-Likelihood-based modelMeasure word association strengthValues can get largeConditional-Link-Probability-basedEstimated probability of two words being linkedUsed simpler alignment model to establish linksAdd simple smoothingAdditional features: one-to-one, one-to-many, non-monotonicityStephan Vogel - Machine Translation) 11TrainingFinding optimal alignment is non-trivialAdding link can affect nonmonotonicity, one-to-many featuresDynamic programming does not workBeam search could be usedRequires pruningParameter optimizationModified version of average perceptron learning)),,(),,(( feahfeahrefirefiiiStephan Vogel - Machine Translation) 12Modeling Alignment with CRFCRF is an undirected graphical modelEach vertex (node) represents a random variable whose distribution is to be inferredEach edge represents a dependency between two random variablesThe distribution of each discrete random variable Y in the graph is conditioned on an input sequence X. Cliques: set of nodes in graph fully connectedIn our case:Features derived from source and target words are the input sequence XAlignment links are the random variables YDifferent ways to model alignmentBlunsom & Cohn (2006): many-to-one word alignments, where each source word is aligned with zero or one target words (-> asymmetric)Niehues & Vogel (2008): model not sequence, but entire alignment matrix(->symmetric)Stephan Vogel - Machine Translation) 13Modeling Alignment MatrixRandom variables yji for all possible alignment links2 values: 0/1 – word in position j is not linked/linked to word in position iRepresented as nodes in a graphStephan Vogel - Machine Translation) 14Modeling Alignment MatrixFactored nodes x representing features (observables)Linked to random variablesDefine potential for each yjiStephan Vogel - Machine Translation) 15Probability of Alignment))(exp(1 ))(exp(1 )(1)|(pfunction potential-))(exp()(ctor weight vea- vectorfeature a -)(clique) (a nodes connected ofset a - nodes factored ofset - FNFNFNVcccVcccVccccccccccFNVFZVFZVZxyVFVVFVVStephan Vogel - Machine Translation) 16FeaturesLocal features, e.g. lexical, POS, …Fertility featuresFirst-order features: capturing relation between linksPhrase-features: interaction between word and phrase alignmentStephan Vogel - Machine Translation) 17Local FeaturesLocal information about link probabilityFeatures derived from positions j and i onlyFactored node connected to only one random variableFeaturesLexical probabilities, also


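Similarly, a minimal sketch of the perceptron-style update from the Training slide; extract_features, a_ref, and a_hyp are placeholders for the aligner's own feature extraction and search, and the weight-vector averaging used by Moore 2005 is omitted:

    # Perceptron-style update: move weights toward the gold alignment's features
    # and away from the current hypothesis's features.
    def perceptron_update(weights, a_ref, a_hyp, f, e, extract_features, lr=1.0):
        """weights[k] += lr * (h_k(a_ref, f, e) - h_k(a_hyp, f, e))."""
        h_ref = extract_features(a_ref, f, e)    # dict: feature name -> value
        h_hyp = extract_features(a_hyp, f, e)
        for k in set(h_ref) | set(h_hyp):
            weights[k] = weights.get(k, 0.0) + lr * (h_ref.get(k, 0.0) - h_hyp.get(k, 0.0))
        return weights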