
The MILE Corpus for Less Commonly Taught Languages




name address1 address2 address3 email
name address1 address2 address3 email
name address1 address2 address3 email

Abstract

This paper describes a small, structured English corpus that is designed for translation into Less Commonly Taught Languages (LCTLs), and a set of re-usable tools for the creation of similar corpora.[1] The corpus is highly structured so that it can support machine learning with only a small amount of data. The corpus systematically explores meanings that are known to affect morphology or syntax in the world's languages. Each sentence is associated with a feature structure showing the elements of meaning that are represented in the sentence. As part of the REFLEX program, the corpus will be translated into multiple LCTLs, resulting in parallel corpora that can be used for training MT and other language technologies.

1 Introduction

Of the 6,000 living languages in the world, only a handful have the necessary monolingual or bilingual resources to build a working statistical or example-based machine translation system. Currently there are efforts to build language packs for Less Commonly Taught Languages (LCTLs). Each language pack includes parallel corpora consisting of naturally occurring text translated from English into the LCTL or vice versa. This paper describes a small corpus that supplements the naturally occurring text with a highly systematic enumeration of meanings that are known to affect morphology and syntax in the world's languages. The supplemental corpus will enable the exploration of constructions that are sparse or obscured in complex data. The corpus consists of n000 English sentences totaling n000 words. It will be translated into each of the seven targeted LCTLs each year.

This paper describes the construction of the corpus, including tools

Figure 1: A sampling of sentences from the complete elicitation corpus

[1] Acknowledge NSF AVENUE. This work was supported by the United States Central Intelligence Agency.
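As an illustration of how a sentence might be paired with a feature structure, the following is a minimal sketch and not the corpus's actual format. The class name, the helper function, the feature names (`tense`, `subject_number`), and the example sentences are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ElicitationSentence:
    """One English sentence paired with the meaning features it expresses.

    The feature names used below are hypothetical, not the corpus's own.
    """
    text: str
    features: dict = field(default_factory=dict)


def group_by_feature(sentences, feature):
    """Group sentence texts by the value they carry for a single feature."""
    groups = defaultdict(list)
    for sentence in sentences:
        if feature in sentence.features:
            groups[sentence.features[feature]].append(sentence.text)
    return dict(groups)


# Invented examples: the sentences differ in one feature at a time,
# so a contrast in the translation can be traced back to that feature.
corpus = [
    ElicitationSentence("The woman sang.",
                        {"tense": "past", "subject_number": "singular"}),
    ElicitationSentence("The women sang.",
                        {"tense": "past", "subject_number": "plural"}),
    ElicitationSentence("The woman sings.",
                        {"tense": "present", "subject_number": "singular"}),
]

by_number = group_by_feature(corpus, "subject_number")
```

Grouping by a single feature in this way is one possible route to comparing translations that differ only in that feature, which is the kind of systematic contrast a small, structured corpus is meant to expose.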


