6.863J Natural Language Processing
Lecture 17: Machine translation I
Robert C. Berwick

The Menu Bar
• Administrivia:
  • Start w/ final projects, unless there are objections
• Agenda:
  • Machine Translation (MT) as a ‘litmus test’ or ‘sandbox’ (graveyard?) for putting together all of NLP
  • Practical systems: Phraselator; Systran (Babelfish); Logos, …

6.863J/9.611J Lecture 17 Sp03

Submenu bar
• What is MT?
• Why MT as litmus test?
• A brief history of time
• Getting in the sandbox (nitty gritty)
• The current methods: the great triangle
  • Word-word
  • Transfer
  • Interlingual
  • (Statistical methods used in all)

Why study this?
• Contains all parts of NLP
• Famously hard: more or less a Turing test (have the computer fool you into thinking there’s a human translator behind the curtain)
• Current applications & trends
  • Web pages
  • High-quality, semantics-based translation in restricted domains: weather reports; equipment manuals
  • Software assistants for MT
  • Automatic knowledge acquisition for improving MT

The golden (Bermuda?) triangle
[Figure: triangle from Source s (e.g., Spanish) to Target t (e.g., English); levels of increasing abstraction, low to high: word-word, syntactic, thematic, interlingual meaning (universal)]

Then too
• We all have our favorite Monty Python episodes…

The Full Monty
• “My hovercraft… is full of eels”
• Hungarian: “Can you direct me to the railway station?”
• […censored…]
• Mi aerodeslizador es lleno de anguilas
• Where is the men’s room?
• ¿Dónde está el cuarto de los hombres?

A few more idioms…
• Out of sight, out of mind
  • […]: “From vision to heart”
• Famous MT (on mag tape) into Russian: […], glossed as “From the sighting, from the reason”

What is MT?
• Use of a computer to translate text (speech) from a source to a target language (semi)automatically
• Can have humans in the loop
• Holy Grail: FAHQT (fully automatic high-quality translation)

Why MT?
• EU uses > 2000 translators for 11 languages
• What % of the web is other than English?
• 10% done w/ Systran
• Professional translator gets 15-20 cents/word (Chinese 3x as much)

MT
• Given a sentence s in the source language S, return a sentence t in the target language T that conveys the same meaning as s
• ‘Conveys the same meaning’ is left unspecified!

A brief history of time: the dawn age
• 1946/47: First discussions on the feasibility of Machine Translation (Warren Weaver and Andrew Booth, after the Rockefeller Foundation turned down computer analysis of protein structure…)
• 1949: Weaver’s memorandum (considered to be the single act that initiated MT R&D)
• 1950-52: MT studies at MIT (Weiner), Univ. of Washington, UCLA, National Bureau of Standards (NBS), and RAND Corporation
• 1951: Yehoshua Bar-Hillel becomes the first full-time MT researcher; his appointment was at MIT

The dawn age: the codebreakers
• 1952: First MT Conference, MIT
• 1952: Creation of the Georgetown University research team under Léon Dostert
• 1954: Georgetown-IBM experiment, IBM Technical Computing Bureau, NY; Russian-English MT (eventually: Systran)
• 1954: First English MT research team, Cambridge University
• 1954: First issue of Mechanical Translation
• 1955: First known Soviet MT research

And then came…
• 1956: First international conference on MT
• 1959: Bar-Hillel’s Report on the state of machine translation in the United States and Great Britain: the “box was in the pen” example
• 1956-1966: Continued US efforts in MT, including: University of Washington; IBM’s Watson Research Center; University of Texas; Georgetown University; RAND Corporation; University of Michigan; MIT; National Bureau of Standards; Harvard University ...

The Dark Ages… (?)
• 1964: The Automatic Language Processing Advisory Committee (ALPAC) is formed by the National Academy of Sciences to study the feasibility of machine translation
• 1966: ALPAC publishes its report Language and Machines: Computers in Translation and Linguistics, known simply as the ALPAC Report
• The ALPAC Report essentially quashed MT research in the US and other parts of the world until the early 1980s, with some exceptions
• Why?

Let’s see why…
• Approach it like a cryptographic problem
• Word-for-word cipher
• Here’s a sample from alien languages (courtesy K. Knight)

Alien languages: Alpha Centauri & Betelgeuse
1a. ok-voon ororok sprok .
1b. at-voon bichat dat .
2a. ok-drubel ok-voon anok plok sprok .
2b. at-drubel at-voon pippat rrat dat .
3a. erok sprok izok hihok ghirok .
3b. totat dat arrat vat hilat .
4a. ok-voon anok drok brok jok .
4b. at-voon krat pippat sat lat .
5a. wiwok farok izok stok .
5b. totat jjat quat cat .
6a. lalok sprok izok jok stok .
6b. wat dat krat quat cat .
7a. lalok farok ororok lalok sprok izok enemok .
7b. wat jjat bichat wat dat vat eneat .
8a. lalok brok anok plok nok .
8b. iat lat pippat rrat nnat .
9a. wiwok nok izok kantok ok-yurp
9b. totat nnat quat oloat at-yurp
10a. lalok mok nok yorok ghirok clok .
10b. wat nnat gat mat bat hilat .
11a. lalok nok crrrok hihok yorok zanzanok .
11b. wat nnat arrat mat zanzanat .
12a. lalok rarok nok izok hihok mok .
12b. wat nnat forat arrat vat gat .

We will build two things
• Assume word-word translation, though not the same word order
• Use an alignment of words to build a translation dictionary
• Use the translation dictionary to improve the alignment, because it eliminates some possibilities
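A minimal sketch of the two steps on the alien corpus above. It assumes two swapped-in techniques that the slides do not name: the dictionary is built from Dice-coefficient co-occurrence scores (a word pair that always appears in the same sentence pairs scores 1.0), and the alignment uses Melamed's competitive-linking heuristic, where pairs are linked best-first and a linked word is removed from further consideration. That removal is one concrete way the dictionary “eliminates some possibilities.” `CORPUS` is transcribed from the slide; all function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Parallel corpus transcribed from the slide: (a-sentence, b-sentence) pairs 1..12.
CORPUS = [
    ("ok-voon ororok sprok .", "at-voon bichat dat ."),
    ("ok-drubel ok-voon anok plok sprok .", "at-drubel at-voon pippat rrat dat ."),
    ("erok sprok izok hihok ghirok .", "totat dat arrat vat hilat ."),
    ("ok-voon anok drok brok jok .", "at-voon krat pippat sat lat ."),
    ("wiwok farok izok stok .", "totat jjat quat cat ."),
    ("lalok sprok izok jok stok .", "wat dat krat quat cat ."),
    ("lalok farok ororok lalok sprok izok enemok .", "wat jjat bichat wat dat vat eneat ."),
    ("lalok brok anok plok nok .", "iat lat pippat rrat nnat ."),
    ("wiwok nok izok kantok ok-yurp", "totat nnat quat oloat at-yurp"),
    ("lalok mok nok yorok ghirok clok .", "wat nnat gat mat bat hilat ."),
    ("lalok nok crrrok hihok yorok zanzanok .", "wat nnat arrat mat zanzanat ."),
    ("lalok rarok nok izok hihok mok .", "wat nnat forat arrat vat gat ."),
]


def tokenize(sentence):
    """Split on whitespace and drop the sentence-final '.' token."""
    return [tok for tok in sentence.split() if tok != "."]


def cooccurrence(corpus):
    """Count how many sentence pairs contain each word and each (a-word, b-word) pair."""
    a_count, b_count, joint = Counter(), Counter(), Counter()
    for a_sent, b_sent in corpus:
        a_words, b_words = set(tokenize(a_sent)), set(tokenize(b_sent))
        a_count.update(a_words)
        b_count.update(b_words)
        joint.update(product(a_words, b_words))
    return a_count, b_count, joint


def dice(a, b, stats):
    """Dice coefficient 2*c(a,b) / (c(a) + c(b)); 1.0 means a and b always co-occur."""
    a_count, b_count, joint = stats
    return 2 * joint[(a, b)] / (a_count[a] + b_count[b])


def best_translation(a, stats):
    """Dictionary lookup: the b-word with the highest Dice score (ties: first seen)."""
    return max(stats[1], key=lambda b: dice(a, b, stats))


def competitive_link(a_sent, b_sent, stats):
    """Greedy one-to-one alignment: link the best-scoring pair first, then drop
    both linked words so they cannot compete for any other partner."""
    a_words, b_words = set(tokenize(a_sent)), set(tokenize(b_sent))
    candidates = sorted(
        ((dice(a, b, stats), a, b) for a, b in product(a_words, b_words)),
        key=lambda c: (-c[0], c[1], c[2]),  # best score first, ties alphabetical
    )
    links, used_a, used_b = {}, set(), set()
    for _score, a, b in candidates:
        if a not in used_a and b not in used_b:
            links[a] = b
            used_a.add(a)
            used_b.add(b)
    return links


def word_for_word(a_sent, stats):
    """Translate token by token with the learned dictionary, keeping a-side word order."""
    return " ".join(best_translation(a, stats) for a in tokenize(a_sent))


if __name__ == "__main__":
    stats = cooccurrence(CORPUS)
    for word in ["ok-voon", "sprok", "nok", "lalok"]:
        print(word, "->", best_translation(word, stats))
    print(competitive_link(*CORPUS[3], stats))
    print(word_for_word("ok-voon ororok sprok .", stats))
```

On pair 4, all five links have Dice score 1.0 and are disjoint, so the greedy pass recovers ok-voon/at-voon, anok/pippat, brok/lat, jok/krat and drok/sat. Co-occurrence alone is not always enough. In pair 3, once sprok, hihok and ghirok are linked, izok scores the same (2/3) against both totat and vat, so the tie is broken only by the alphabetical rule. Because sentences are treated as word sets, the repeated lalok in 7a is also linked only once.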