MIT 6.863J - Natural Language Processing


6.863J Natural Language Processing
Lecture 24: Machine Translation 3
Instructor: Robert C. [email protected]
6.863J/9.611J SP04 Lecture 24

The Menu Bar
• Administrivia:
  • Final project!
• Agenda:
  • Formalize what we did last time: Shake ‘n Bake
  • Divide & conquer: 4 steps
    • Noisy channel model
    • Language Model
    • Translation model
    • Scrambling & Fertility

Alien languages: Alpha-Centauri & Betelgeuse
1a. ok-voon ororok sprok .
1b. at-voon bichat dat .
2a. ok-drubel ok-voon anok plok sprok .
2b. at-drubel at-voon pippat rrat dat .
3a. erok sprok izok hihok ghirok .
3b. totat dat arrat vat hilat .
4a. ok-voon anok drok brok jok .
4b. at-voon krat pippat sat lat .
5a. wiwok farok izok stok .
5b. totat jjat quat cat .
6a. lalok sprok izok jok stok .
6b. wat dat krat quat cat .
7a. lalok farok ororok lalok sprok izok enemok .
7b. wat jjat bichat wat dat vat eneat .
8a. lalok brok anok plok nok .
8b. iat lat pippat rrat nnat .
9a. wiwok nok izok kantok ok-yurp .
9b. totat nnat quat oloat at-yurp .
10a. lalok mok nok yorok ghirok clok .
10b. wat nnat gat mat bat hilat .
11a. lalok nok crrrok hihok yorok zanzanok .
11b. wat nnat arrat mat zanzanat .
12a. lalok rarok nok izok hihok mok .
12b. wat nnat forat arrat vat gat .

We will build two things
• Assume word-word translation – though not same word order
• Use alignment of words to build translation dictionary
• Use translation dictionary to improve the alignment – because it eliminates some possibilities

To begin
(The same twelve sentence pairs as on the corpus slide.)
Translation dictionary:
ghirok – hilat
ok-drubel – at-drubel
ok-voon – at-voon
ok-yurp – at-yurp
zanzanok – zanzanat

OK, what does pairing buy us?
• Sentence 1: 2 possibilities left…
  1. ororok ↔ bichat & sprok ↔ dat
  2. ororok ↔ dat & sprok ↔ bichat
  (But also: what if ororok is an untranslated auxiliary verb…?)
Which is more likely?
Look for a sentence with sprok but not ororok
Sentence (2a)
Link throughout corpus (1, 2, 3, 6, 7)
Sentence (2) now looks like a good place to crack…

Sentences 2, 3…
• S2: anok plok / pippat rrat
• S4: 4a. ok-voon anok drok brok jok .
      4b. at-voon krat pippat sat lat .
OK, anok ↔ pippat & plok ↔ rrat
S3: So far we have:
erok sprok izok hihok ghirok
totat dat arrat vat hilat
Look at 8; 11; 3 & 12; 5, 6, 9

This suggests
erok sprok izok hihok ghirok
totat dat arrat vat hilat

Note:
• Aligning builds the translation dictionary
• Building the translation dictionary aids alignment
• “Decipherment”
• We shall see how this can be automated next time

The dictionary so far…
anok – pippat
erok – totat
ghirok – hilat
hihok – arrat
izok – vat/quat
ok-drubel – at-drubel
ok-voon – at-voon
ok-yurp – at-yurp
ororok – bichat
plok – rrat
sprok – dat
zanzanok – zanzanat

Full dictionary
anok – pippat
brok – lat
clok – bat
crrrok – none?
drok – sat
enemok – eneat
erok – totat
farok – jjat
ghirok – hilat
hihok – arrat
izok – vat/quat
jok – krat
kantok – oloat
lalok – wat/iat
mok – gat
nok – nnat
ok-drubel – at-drubel
ok-voon – at-voon
ok-yurp – at-yurp
ororok – bichat
plok – rrat
rarok – forat
sprok – dat
stok – cat
wiwok – totat
yorok – mat
zanzanok – zanzanat

If you work through it you’ll get all the pairs here, save 1: crrrok
• But you are suddenly abducted to the Federation Translation Center & presented with this sentence from Betelgeuse to translate into Alpha-Centaurian:
• iat lat pippat eneat hilat oloat at-yurp .

Translation B to A
• (13b) iat lat pippat eneat hilat oloat at-yurp
• Consult dictionary – 7 words can be directly looked up
• Many possible word orders for ‘felicitous’ translation! …how do we decide?

You are given this fragment of Alpha-C text & its bigrams… to help

The translation (answer sheet)
• iat lat pippat eneat hilat oloat at-yurp
• Word for word:
  (13a) Lalok brok anok enemok ghirok kantok ok-yurp
  Lalok brok anok {enemok ghirok kantok ok-yurp}
  Lalok brok anok ghirok {enemok kantok ok-yurp}
  Lalok brok anok ghirok enemok {kantok ok-yurp}
  Final: Lalok brok anok ghirok enemok kantok ok-yurp
• (14b) totat nnat forat arrat mat bat
  erok? wiwok?
  Now what? Wiwok to…?

Various possibilities
Wiwok…
(14a) Wiwok rarok nok crrrok hihok yorok clok…

How is this like/unlike ‘real’ translation
• Only 2 of the 27 AlphaC words were ambiguous
• Sentence length unchanged in all but one
• Sentences much shorter than typical
• Words & context -
• Output word order should be sensitive to input word order (J. loves M, M loves J)
• Data cooked
• No phrasal dictionary (anok plok = pippat rrat)

The actual sentences
1. Garcia and associates.
   Garcia y asociados.
2. Carlos Garcia has three associates.
   Carlos Garcia tiene tres asociados.
3. His associates are not strong.
   Sus asociados no son fuertes.
4. Garcia has a company also.
5. Its clients are angry.
6. The associates are also angry.
7. The clients and the associates are enemies.

Statistical Machine Translation
• The fundamental idea of statistical MT is to let the computer learn how to do MT through studying the translation statistics from a bilingual corpus

What’s the data? What are we doing?
• Pairs of sentences that are translations of one another are used
• Learn parameters
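
The pairing argument from the “OK, what does pairing buy us?” slide can be tried out mechanically. The sketch below is not the lecture's method (the slides defer automation to the next lecture); it is a minimal illustration that assumes each Alpha-Centauri word has exactly one Betelgeuse translation. The names `CORPUS`, `candidates`, and `resolve` are hypothetical helpers introduced here. `candidates` intersects the Betelgeuse sides of every sentence pair that contains a given Alpha-Centauri word, and `resolve` then removes translations already claimed in the dictionary.

```python
# Sentence pairs (Alpha-Centauri, Betelgeuse) copied from the corpus slide.
CORPUS = [
    ("ok-voon ororok sprok", "at-voon bichat dat"),
    ("ok-drubel ok-voon anok plok sprok", "at-drubel at-voon pippat rrat dat"),
    ("erok sprok izok hihok ghirok", "totat dat arrat vat hilat"),
    ("ok-voon anok drok brok jok", "at-voon krat pippat sat lat"),
    ("wiwok farok izok stok", "totat jjat quat cat"),
    ("lalok sprok izok jok stok", "wat dat krat quat cat"),
    ("lalok farok ororok lalok sprok izok enemok",
     "wat jjat bichat wat dat vat eneat"),
    ("lalok brok anok plok nok", "iat lat pippat rrat nnat"),
    ("wiwok nok izok kantok ok-yurp", "totat nnat quat oloat at-yurp"),
    ("lalok mok nok yorok ghirok clok", "wat nnat gat mat bat hilat"),
    ("lalok nok crrrok hihok yorok zanzanok", "wat nnat arrat mat zanzanat"),
    ("lalok rarok nok izok hihok mok", "wat nnat forat arrat vat gat"),
]


def candidates(word, corpus=CORPUS):
    """Betelgeuse words present in every pair whose Alpha-C side has `word`."""
    target_sets = [set(b.split()) for a, b in corpus if word in a.split()]
    if not target_sets:
        return set()
    return set.intersection(*target_sets)


def resolve(word, known, corpus=CORPUS):
    """Candidates for `word` minus translations already used in `known`."""
    return candidates(word, corpus) - set(known.values())


print(sorted(candidates("sprok")))                  # ['dat']
print(sorted(candidates("ororok")))                 # ['bichat', 'dat']
print(sorted(resolve("ororok", {"sprok": "dat"})))  # ['bichat']
print(sorted(candidates("izok")))                   # []
```

The empty result for izok shows where the one-translation assumption breaks: the dictionary lists izok as vat/quat, so no single Betelgeuse word appears in every sentence containing it. Subtracting claimed translations is likewise only safe when no two Alpha-Centauri words share a translation, which erok and wiwok (both totat) violate.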