MIT 6.863J - Lecture 17: Machine translation I



6.863J Natural Language Processing
Lecture 17: Machine translation I
Robert C. Berwick

The Menu Bar
• Administrivia:
  • Start w/ final projects, unless there are objections
• Agenda:
  • Machine Translation (MT) as a ‘litmus test’ or ‘sandbox’ (graveyard?) for putting together all of NLP
  • Practical systems: Phraselator; Systran (Babelfish); Logos,…

6.863J/9.611J Lecture 17 Sp03

Submenu bar
• What is MT?
• Why MT as litmus test?
• A brief history of time
• Getting in the sandbox (nitty gritty)
• The current methods: the great triangle
  • Word-word
  • Transfer
  • Interlingual
  • (Statistical methods used in all)

Why study this?
• Contains all parts of NLP
• Famously hard: more or less a Turing test – have computer fool you that there’s a human translator behind the curtain
• Current applications & trends
  • Web pages
  • High quality semantics-based in restricted domains – weather reports; equipment manuals
  • Software assistants for MT
  • Automatic knowledge acquisition for improving MT

The golden (Bermuda?) triangle
[Figure: triangle of increasing abstraction, from low (word-word) through syntactic and thematic levels up to high (interlingual meaning, universal). Source s (e.g., Spanish) sits at one base corner and target t (e.g., English) at the other.]

Then too
• We all have our favorite Monty Python episodes…

The Full Monty
• “My hovercraft… is full of eels”
• Hungarian: “Can you direct me to the railway station?”
• […censored…]
• Mi aerodeslizador es lleno de anguilas
• Where is the men’s room?
• ¿Dónde está el cuarto de los hombres?

A few more idioms…
• Out of sight, out of mind
• ????????,
• From vision to heart
• Famous MT – on mag tape – to Russian: ?? ???????????, ?? ??????
  From the sighting, from the reason

What is MT?
• Use of computer to translate text (speech) from source to target language (semi)automatically
• Can have humans in the loop
• Holy Grail: FAHQT (fully automatic high-quality translation)

Why MT?
• EU uses > 2000 translators for 11 languages
• What % of web is other than English?
• 10% done w/ Systran
• Professional translator gets 15-20 cents/word (Chinese 3x as much)

MT
• Given a sentence s in the source language S, return a sentence t in the target language T that conveys the same meaning as s
• ‘conveys the same meaning’ is left unspecified!

A brief history of time – the dawn age
• 1946/47: First discussions on the feasibility of Machine Translation (Warren Weaver and Andrew Booth – after Rockefeller Fdn turned down computer analysis of protein structure…)
• 1949: Weaver’s memorandum (considered to be the single act which initiated MT R&D)
• 1950-52: MT studies at MIT (Weiner), Univ. of Washington, UCLA, National Bureau of Standards (NBS), and RAND Corporation
• 1951: Yehoshua Bar-Hillel becomes first full-time MT research person; his appointment was at MIT

The dawn age: the codebreakers
• 1952: First MT Conference, MIT
• 1952: Creation of the Georgetown University research team under Léon Dostert
• 1954: Georgetown-IBM experiment, IBM Technical Computing Bureau, NY; Russian-English MT (eventually: Systran)
• 1954: First English MT research team, Cambridge University
• 1954: First issue of Mechanical Translation
• 1955: First known Soviet MT research

And then came..
• 1956: First international conference on MT
• 1959: Bar-Hillel’s Report on the state of machine translation in the United States and Great Britain: “pig in the pen” example
• 1956-1966: Continued US efforts in MT, including: University of Washington; IBM’s Watson Research Center; University of Texas; Georgetown University; RAND Corporation; University of Michigan; MIT; National Bureau of Standards; Harvard University ...

The Dark ages..(?)
• 1964: the Automatic Language Processing Advisory Committee (ALPAC) formed by the National Academy of Sciences to study the feasibility of machine translation
• 1966: the ALPAC published its report, Language and machines: computers in translation and linguistics, known simply as The ALPAC Report
• The ALPAC Report essentially quashed MT research in the US and other parts of the world until the early 1980s, with some exceptions
• Why?

Let’s see why…
• Approach it like a cryptographic problem
• Word-for-word cipher
• Here’s a sample from alien languages (courtesy K. Knight)

Alien languages: Alpha-centauri & Betelgeuse
1a. ok-voon ororok sprok .
1b. at-voon bichat dat .
2a. ok-drubel ok-voon anok plok sprok .
2b. at-drubel at-voon pippat rrat dat .
3a. erok sprok izok hihok ghirok .
3b. totat dat arrat vat hilat .
4a. ok-voon anok drok brok jok .
4b. at-voon krat pippat sat lat .
5a. wiwok farok izok stok .
5b. totat jjat quat cat .
6a. lalok sprok izok jok stok .
6b. wat dat krat quat cat .
7a. lalok farok ororok lalok sprok izok enemok .
7b. wat jjat bichat wat dat vat eneat .
8a. lalok brok anok plok nok .
8b. iat lat pippat rrat nnat .
9a. wiwok nok izok kantok ok-yurp
9b. totat nnat quat oloat at-yurp
10a. lalok mok nok yorok ghirok clok .
10b. wat nnat gat mat bat hilat .
11a. lalok nok crrrok hihok yorok zanzanok .
11b. wat nnat arrat mat zanzanat .
12a. lalok rarok nok izok hihok mok .
12b. wat nnat forat arrat vat gat .

We will build two things
• Assume word-word translation – though not same word order
• Use alignment of words to build translation dictionary
• Use translation dictionary to improve the alignment – because it eliminates some possibilities

To begin
1a. ok-voon ororok sprok .
1b. at-voon bichat dat .
2a. ok-drubel ok-voon anok plok sprok .
2b. at-drubel at-voon pippat rrat dat .
3a. erok sprok izok hihok ghirok .
3b. totat dat arrat vat hilat .
4a. ok-voon anok drok brok jok .
4b. at-voon krat pippat sat lat .
5a. wiwok farok izok stok .
5b. totat jjat quat cat .
6a. lalok sprok izok jok stok .
6b. wat dat krat quat cat .
7a. lalok farok ororok lalok sprok izok enemok .
7b. wat jjat bichat wat dat vat eneat .
8a. lalok brok anok plok
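
The two-way bootstrapping described on the "We will build two things" slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure from the lecture: it assumes every source word has exactly one consistent translation that appears in each paired sentence, and that no two source words in a sentence share a target word. The names `CORPUS` and `build_dictionary` are hypothetical, and only sentence pairs 1 to 4 from the slide are used.

```python
# Sentence pairs 1a/1b .. 4a/4b copied from the slide (final periods dropped).
CORPUS = [
    ("ok-voon ororok sprok", "at-voon bichat dat"),
    ("ok-drubel ok-voon anok plok sprok", "at-drubel at-voon pippat rrat dat"),
    ("erok sprok izok hihok ghirok", "totat dat arrat vat hilat"),
    ("ok-voon anok drok brok jok", "at-voon krat pippat sat lat"),
]


def build_dictionary(corpus):
    """Return (resolved, unresolved) translation guesses for source words.

    Assumption (not from the slides): translation is one-to-one and
    consistent, so a word's translation occurs in every paired sentence.
    """
    pairs = [(src.split(), tgt.split()) for src, tgt in corpus]

    # Step 1: use co-occurrence to build the dictionary. The candidates for
    # a source word are the target words present in every pair containing it.
    candidates = {}
    for src_words, tgt_words in pairs:
        for s in set(src_words):
            if s in candidates:
                candidates[s] &= set(tgt_words)
            else:
                candidates[s] = set(tgt_words)

    # Step 2: use the dictionary to improve the alignment. Once a word is
    # resolved, its target word is ruled out for the other source words in
    # the same sentence, which may resolve further words.
    resolved = {}
    changed = True
    while changed:
        changed = False
        for s, cands in candidates.items():
            if s not in resolved and len(cands) == 1:
                resolved[s] = next(iter(cands))
                changed = True
        for src_words, _ in pairs:
            taken = {resolved[s] for s in src_words if s in resolved}
            for s in src_words:
                if s not in resolved and candidates[s] & taken:
                    candidates[s] -= taken
                    changed = True

    unresolved = {s: c for s, c in candidates.items() if s not in resolved}
    return resolved, unresolved


if __name__ == "__main__":
    resolved, unresolved = build_dictionary(CORPUS)
    print(sorted(resolved.items()))
    # [('anok', 'pippat'), ('ok-voon', 'at-voon'), ('ororok', 'bichat'), ('sprok', 'dat')]
    print(sorted(unresolved["plok"]))
    # ['at-drubel', 'rrat']
```

With only four pairs, "ok-voon" and "sprok" fall out of the intersection step alone, and eliminating their translations then pins down "ororok" and "anok". Words such as "plok" stay ambiguous until more sentence pairs are added. The full corpus also contains pairs of unequal length (11a/11b), which break the one-to-one assumption and would need a less strict method.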

