CMU LTI 11731 - Controlled Language Input/Output

Carnegie Mellon School of Computer Science
11-731 Machine Translation
Copyright © 2005, Carnegie Mellon. All Rights Reserved.

Controlled Language Input/Output
11-731 Machine Translation
Teruko Mitamura
Language Technologies Institute
Carnegie Mellon University

Outline
• Introduction
  – What is Controlled Language?
  – Goals of Controlled Language
  – Types of Controlled Language
  – Advantages and Challenges
• History of CL & Applications
  – Document Authoring
  – Document Translation

Outline [2]
• Designing a Controlled Vocabulary and Grammar
• Deployment Issues for CL
• Evaluating the Use of Controlled Language
• Automatic Rewriting for MT

Introduction

What is Controlled Language?
• A form of language usage restricted by grammar and vocabulary rules
• No single “controlled language” for English
• Controlled language can be used:
  – solely as a guideline for authoring
  – with a checking tool to verify conformance
  – in conjunction with machine translation

Goals of Controlled Language
– Achieve consistent authoring
– Encourage clear and direct writing
– Improve the quality of translation output
– Use as input to machine translation systems (e.g., the KANT System, the CASL System)
Types of Controlled Language
• Human-oriented CL: to improve text comprehension by humans (for authors and translators)
• Machine-oriented CL: to improve “text comprehension” by computers (for CL checkers or MT systems)

Designing for Different Types of CL
[Diagram contrasting human-oriented CL and machine-oriented CL; labels: Author, Translators, Author + CL Checker, MT, HCL doc, MCL doc, TL doc, Post-editors]

Examples of Writing Rules
• Do not use sentences with more than 20 words
• Do not use passive voice
• Do not make noun clusters of more than 4 nouns
• Write only one instruction per sentence

Examples [2]
• Make your instructions as specific as possible
• Use a bulleted layout for long lists
• Present new and complex information slowly and carefully
Q: Which rules can be checked automatically?

CL Advantages
• Improves the source text:
  – readability
  – comprehensibility
  – consistency
  – reusability
• Improves translation:
  – controlled texts easier to translate
  – consistent text easier to reuse

CL Challenges
• Writing may become more time-consuming
• An additional verification step is required
• Developing a CL may be costly
• CL use must be evaluated carefully

History of CL & Applications
Roots of CL
• C.K. Ogden’s “Basic English” (1930s)
  – 850 basic words
  – an “international language”, foundation for learning standard English
  – never widely used

Roots of CL [2]
• Caterpillar Fundamental English (CFE), 1970s
  – Non-technical vocabulary and grammar
  – First version had only 850 terms
  – For non-native English speakers
  – Abandoned after ~10 years:
    • insufficient for complex writing
    • CFE difficult to train and enforce

Examples
Non-CFE: “Enlarge the hole.”
CFE: “Use a drill to make the hole larger.”
Non-CFE: “The brake components must be matched during installation.”
CFE: “The brake parts with same numbers on the lower ends of the brake shoes must be installed together.”

Survey of CLs
Ogden’s Basic English
Caterpillar Fundamental English (CFE)
Smart’s Plain English Program (PEP)
White’s International Language for Serving and Maintenance (ILSAM)
• Clark
• Rockwell International
• Hyster
• AECMA
• IBM
• Ericsson Telecom
• Boeing SE

CL Checking
• Aids an author in determining whether a text conforms to a particular CL
  – Verify all words & phrases are approved
  – Verify all writing rules are obeyed
  – May offer help to the author when words or sentences not in the CL are found
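The writing rules and the CL Checking slide give a partial answer to the question of which rules can be checked automatically. A word limit or an approved-word list is easy to test mechanically. A rule like “Make your instructions as specific as possible” is not. The following minimal Python sketch checks just those two mechanical rules. It is an illustration under stated assumptions, not the design of any particular CL checker. It assumes three things, none of which comes from the slides:

- a tiny hypothetical vocabulary (`APPROVED_WORDS`);
- a hypothetical help table (`SUGGESTIONS`) modeled on the CFE examples;
- a naive punctuation-based sentence splitter.

Rules such as “Do not use passive voice” or the noun-cluster limit would additionally need a tagger or parser.

```python
import re
from dataclasses import dataclass, field

# Hypothetical approved vocabulary (surface forms only). A real CL lexicon
# would be far larger and would include multi-word terms and inflections.
APPROVED_WORDS = {
    "a", "be", "brake", "drill", "ends", "hole", "installed", "larger",
    "lower", "make", "must", "numbers", "of", "on", "parts", "same",
    "shoes", "the", "to", "together", "use", "with",
}

# Hypothetical help messages for unapproved words, modeled on the CFE examples.
SUGGESTIONS = {
    "enlarge": "use 'make ... larger'",
    "components": "use 'parts'",
}

MAX_WORDS = 20  # writing rule: no sentences with more than 20 words


@dataclass
class Report:
    sentence: str
    problems: list[str] = field(default_factory=list)


def split_sentences(text: str) -> list[str]:
    """Naively split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def check(text: str) -> list[Report]:
    """Check each sentence against the word limit and the approved vocabulary."""
    reports = []
    for sentence in split_sentences(text):
        words = re.findall(r"[A-Za-z]+", sentence)
        report = Report(sentence)
        if len(words) > MAX_WORDS:
            report.problems.append(f"{len(words)} words (limit {MAX_WORDS})")
        for word in words:
            if word.lower() not in APPROVED_WORDS:
                # Offer help to the author when a suggestion is known.
                hint = SUGGESTIONS.get(word.lower(), "no suggestion available")
                report.problems.append(f"unapproved word '{word}': {hint}")
        reports.append(report)
    return reports


if __name__ == "__main__":
    sample = "Enlarge the hole. Use a drill to make the hole larger."
    for r in check(sample):
        status = "; ".join(r.problems) if r.problems else "OK"
        print(f"{r.sentence} -> {status}")
```

Running the script flags “Enlarge” in the first sentence and accepts the CFE rewrite. The sketch lists inflected forms such as “installed” directly in the vocabulary. A practical checker would instead use morphological analysis and recognize approved multi-word terms.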
CL for Machine Translation
• Use of software to analyze texts and translate to other languages
• Technical Translation
  – Large segment of translation market
  – Documentation for complex products (e.g., consumer electronics, computer hardware, heavy machinery, automobiles)
  – Involves large, specialized vocabulary
  – Writing style may be complicated

Challenges for MT
• Ambiguity
  – Lexical, Structural, Referential
• Complexity
  – Assigning meaning to complex syntactic structures
• Controlled language reduces the

