Experiments with a Noun-Phrase driven Statistical Machine Translation System




Sanjika Hewavitharana, Alon Lavie, and Stephan Vogel
Language Technologies Institute, Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213, USA
{sanjika, alavie, vogel}@cs.cmu.edu

Abstract

This paper presents a noun-phrase driven, two-level statistical machine translation system. Noun phrases (NPs) are used as the unit of decomposition to build a two-level hierarchy of phrases. English noun phrases are identified using a parser. The corresponding translations are induced using a statistical word alignment model. Identified noun phrase pairs in the training corpus are replaced with a tag to produce an NP-tagged corpus. This corpus is then used to extract phrase translation pairs. Both NP translations and NP-tagged phrases are used in a two-level translation decoder: NP translations tag NPs in the first level, while NP-tagged phrases match across NPs to produce translations in the second level. The two-level system shows significant improvements over a baseline SMT system. It also produces longer matching phrases, due to the generalization introduced by tagging NPs.

1 Introduction

When using statistical machine translation (SMT) systems, we often notice that the phrases used to construct the translations are rather short. On average, these phrases are less than two words long. This is in spite of the fact that some phrase extraction methods allow the extraction of arbitrarily long phrases. The main reason for this behavior is data sparseness: long, exactly matching phrases are relatively rare in the training data. In the decoder, these phrases have to compete with abundant shorter phrases. For this reason, Koehn et al. (2003) find that phrases longer than three words give little performance improvement. However, with the limited reordering strategies used in most statistical machine translation systems, a combination of short phrases does not always generate the desired translation. Zhang (2005) shows improved
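
The NP-tagging step described in the abstract (replacing identified noun phrase pairs in the training corpus with a tag) can be illustrated with a minimal sketch. This is not the authors' implementation. The tag symbol `@NP`, the function names, the half-open span representation, and the English-Spanish sentence pair are all assumptions made for illustration, and the aligned NP spans are given by hand here rather than produced by a parser and a word alignment model.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open token range [start, end); an assumed representation

NP_TAG = "@NP"  # hypothetical tag symbol, not taken from the paper


def replace_spans(tokens: Sequence[str], spans: Sequence[Span], tag: str = NP_TAG) -> List[str]:
    """Return a copy of `tokens` with every span collapsed into a single tag."""
    out: List[str] = []
    pos = 0
    for start, end in sorted(spans):
        if start < pos or end > len(tokens) or start >= end:
            raise ValueError(f"invalid or overlapping span: {(start, end)}")
        out.extend(tokens[pos:start])  # copy the tokens before the noun phrase
        out.append(tag)                # the whole noun phrase becomes one tag
        pos = end
    out.extend(tokens[pos:])           # copy the tokens after the last noun phrase
    return out


def tag_sentence_pair(
    src: Sequence[str],
    tgt: Sequence[str],
    np_pairs: Sequence[Tuple[Span, Span]],
) -> Tuple[List[str], List[str]]:
    """Replace each aligned (source span, target span) NP pair with a tag on both sides."""
    src_tagged = replace_spans(src, [s for s, _ in np_pairs])
    tgt_tagged = replace_spans(tgt, [t for _, t in np_pairs])
    return src_tagged, tgt_tagged


# Illustrative sentence pair; the NP spans are supplied by hand.
src = "the old man buys a red car".split()
tgt = "el viejo compra un coche rojo".split()
np_pairs = [((0, 3), (0, 2)), ((4, 7), (3, 6))]

src_tagged, tgt_tagged = tag_sentence_pair(src, tgt, np_pairs)
print(" ".join(src_tagged))  # @NP buys @NP
print(" ".join(tgt_tagged))  # @NP compra @NP
```

In this toy example the tagged pair "@NP buys @NP" / "@NP compra @NP" would match any sentence with the same verb regardless of which noun phrases surround it, which is the kind of generalization the abstract credits for longer matching phrases.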


