Automatic Category Label Coarsening for Syntax-Based Machine Translation





Greg Hanneman and Alon Lavie
Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213 USA
{ghannema, alavie}@cs.cmu.edu

Abstract

We consider SCFG-based MT systems that get syntactic category labels from parsing both the source and target sides of parallel training data. The resulting joint nonterminals often lead to needlessly large label sets that are not optimized for an MT scenario. This paper presents a method of iteratively coarsening a label set for a particular language pair and training corpus. We apply this label collapsing on Chinese-English and French-English grammars, obtaining test-set improvements of up to 2.8 BLEU, 5.2 TER, and 0.9 METEOR on Chinese-English translation. An analysis of label collapsing's effect on the grammar and the decoding process is also given.

1 Introduction

A common modeling choice among syntax-based statistical machine translation systems is the use of synchronous context-free grammar (SCFG), where a source-language string and a target-language string are produced simultaneously by applying a series of re-write rules. Given a parallel corpus that has been statistically word-aligned and annotated with constituency structure on one or both sides, SCFG models for MT can be learned via a variety of methods. Parsing may be applied on the source side (Liu et al., 2006), on the target side (Galley et al., 2004), or on both sides of the parallel corpus (Lavie et al., 2008; Zhechev and Way, 2008).

In any of these cases, using the raw label set from source- and/or target-side parsers can be undesirable. Label sets used in statistical parsers are usually inherited directly from monolingual treebank projects, where the inventory of category labels was designed by independent teams of human linguists. These label sets are not necessarily ideal for statistical parsing, let alone for bilingual syntax-based translation models. Further, the side(s) on which syntax is represented defines the





