CMU LTI 11731 - SMT – Basic Ideas


Slide titles: Statistical Machine Translation: SMT – Basic Ideas; Overview; Deciphering Example; Slide 4; … Vocabularies; … Word Frequencies; … Location in Corpus; … Location in Sentence; … POS Information; Translate New Sentences: Ap - En; Translate New Sentences: En - Ap; Slide 12; Principles of SMT; Statistical versus Grammar-Based; Statistical Machine Translation; Tasks in SMT; Noisy Channel View; Bayesian Approach; SMT Architecture; Log-Linear Model; Slide 23; Corpus Statistics; All Simple, Basic, Important; BTEC Spa-Eng; Tokenization; GigaWord Corpus; And then the more interesting Stuff

Stephan Vogel - Machine Translation

Slide 1: Statistical Machine Translation
SMT – Basic Ideas
Stephan Vogel
MT Class, Spring Semester 2011

Slide 2: Overview
- Deciphering foreign text – an example
- Principles of SMT
- Data processing

Slide 3: Deciphering Example
Apinaye – English
- Apinaye belongs to the Ge family of Brazil
- Spoken by 800 (according to SIL, 1994)
- http://www.ethnologue.com/show_family.asp?subid=90784
- http://www.language-museum.com/a/apinaye.php
- Example from Linguistic Olympics 2008, see http://www.naclo.cs.cmu.edu

Parallel Corpus (some characters adapted)
  1  Kukre kokoi             The monkey eats
  2  Ape kra                 The child works
  3  Ape kokoi rats          The big monkey works
  4  Ape mi mets             The good man works
  5  Ape mets kra            The child works well
  6  Ape punui mi pinjets    The old man works badly

Can we translate a new sentence?

Slide 4: Deciphering Example
Can we build a lexicon from these sentence pairs?
Observations:
- Apinaye: Kukre (1), Ape (5); English: The (6), works (5)
- Aha! -> first guess: Ape – works
- monkey in 1, 3; child in 2, 5; man in 4, 6
- These words are distributed differently over the corpus: do we find words with a similar distribution on the Apinaye side?

Slide 5: … Vocabularies
Corpus Vocabularies (each side listed in order of first appearance; rows are not translation pairs)
  Apinaye    English
  kukre      The
  kokoi      monkey
  ape        eats
  kra        child
  rats       works
  mi         big
  mets       good
  punui      man
  pinjets    well
             old
             badly

Observations: 9 Apinaye words, 11 English words
Expectations:
- English words without translation?
- Apinaye words corresponding to more than 1 English word?

Slide 6: … Word Frequencies
Corpus Vocabularies, with frequencies
  Apinaye       English
  kukre    1    The     6
  kokoi    2    monkey  2
  ape      5    eats    1
  kra      2    child   2
  rats     1    works   5
  mi       2    big     1
  mets     2    good    1
  punui    1    man     2
  pinjets  1    well    1
                old     1
                badly   1

Suggestions:
- ‘ape’ (5) could align to ‘The’ (6) or ‘works’ (5)
- More likely that the content word ‘works’ has a match, i.e. ‘ape’ = ‘works’
- Other word pairs are difficult to predict – too many similar frequencies

Slide 7: … Location in Corpus
Corpus Vocabularies, with occurrences
  Apinaye  Sentences     English  Sentences
  kukre    1             The      1 2 3 4 5 6
  kokoi    1 3           monkey   1 3
  ape      2 3 4 5 6     eats     1
  kra      2 5           child    2 5
  rats     3             works    2 3 4 5 6
  mi       4 6           big      3
  mets     4 5           good     4
  punui    6             man      4 6
  pinjets  6             well     5
                         old      6
                         badly    6

Observations:
- Same sentences: ‘kukre’ – ‘eats’, ‘kokoi’ – ‘monkey’, ‘ape’ – ‘works’, ‘kra’ – ‘child’, ‘rats’ – ‘big’, ‘mi’ – ‘man’
- ‘mets’ (4 and 5) =? ‘good’ (4) and ‘well’ (5); makes sense
- ‘punui’ and ‘pinjets’ match ‘old’ and ‘badly’ – which is which?

Slide 8: … Location in Sentence
Corpus (each alignment pair i-j links English word i to Apinaye word j; j = 0 is the NULL word)
  Apinaye                 English                    Alignment EN - AP
  Kukre kokoi             The monkey eats            1-0 2-2 3-1
  Ape kra                 The child works            1-0 2-2 3-1
  Ape kokoi rats          The big monkey works       1-0 2-3 3-2 4-1
  Ape mi mets             The good man works         1-0 2-3 3-2 4-1
  Ape mets kra            The child works well       1-0 2-3 3-1 4-2
  Ape punui mi pinjets    The old man works badly    1-0 2-??? 3-3 4-1 5-???

Observations:
- First English word (‘The’) does not align; we say it aligns to the NULL word
- Apinaye verb in first position
- English last word aligns to 1st or 2nd position
- English -> Apinaye: reverse word order (not strictly in sentence pair 5)
Hypothesis:
- alignment for last sentence pair is 1-0 2-4 3-3 4-1 5-2
- I.e.: ‘pinjets’ – ‘old’ and ‘punui’ – ‘badly’

Slide 9: … POS Information
Corpus
  Apinaye                 POS            English                    POS
  Kukre kokoi             V N            The monkey eats            Det N V
  Ape kra                 V N            The child works            Det N V
  Ape kokoi rats          V N Adj        The big monkey works       Det Adj N V
  Ape mi mets             V N Adj        The good man works         Det Adj N V
  Ape mets kra            V Adv N        The child works well       Det N V Adv
  Ape punui mi pinjets    V ??? N ???    The old man works badly    Det Adj N V Adv

Observations:
- English determiner (‘The’) does not align; perhaps no determiners in Apinaye
- English Verb Adverb -> Apinaye: Verb Adverb -> no reordering
- English Adjective Noun -> Apinaye: Noun Adjective -> reordering
Hypothesis: ‘pinjets’ is Adj to make it N Adj, ‘punui’ is Adv (consistent with alignment hypothesis)

Slide 10: Translate New Sentences: Ap - En
Source Sentence: Ape rats mi mets
- Lexical information: works big man good/well
- Reordering information: The good man works big
- Better lexical choice: The good man works hard
- Compare: Ape mi mets -> The good man works
Source Sentence: Kukre rats kokoi punui
- Lexical information: eats big monkey badly
- Reordering information: The bad monkey eats big
- Better lexical choice: The bad monkey eats a lot

Slide 11: Translate New Sentences: En - Ap
Source Sentence: The old monkey eats a lot
- Lexical information: NULL pinjets kokoi kukre rats
- Reordering information: kukre rats kokoi pinjets
Or
- Deleting words: old monkey eats a
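The "Location in Corpus" reasoning can be tried out mechanically. The following is a minimal sketch, not the method used in the lecture: it records the set of sentence numbers in which each word occurs and proposes as translation candidates those English words whose set is identical to that of an Apinaye word. The function names `occurrences` and `match_by_occurrence` are made up for this illustration.

```python
from collections import defaultdict

# Parallel corpus from the deciphering example (Apinaye, English).
CORPUS = [
    ("Kukre kokoi", "The monkey eats"),
    ("Ape kra", "The child works"),
    ("Ape kokoi rats", "The big monkey works"),
    ("Ape mi mets", "The good man works"),
    ("Ape mets kra", "The child works well"),
    ("Ape punui mi pinjets", "The old man works badly"),
]


def occurrences(sentences):
    """Map each lowercased word to the tuple of 1-based sentence numbers containing it."""
    occ = defaultdict(set)
    for number, sentence in enumerate(sentences, start=1):
        for word in sentence.lower().split():
            occ[word].add(number)
    return {word: tuple(sorted(numbers)) for word, numbers in occ.items()}


def match_by_occurrence(corpus):
    """For each source word, list target words that occur in exactly the same sentences."""
    src_occ = occurrences(src for src, _ in corpus)
    tgt_occ = occurrences(tgt for _, tgt in corpus)
    # Index target words by their occurrence signature.
    by_signature = defaultdict(list)
    for word, signature in tgt_occ.items():
        by_signature[signature].append(word)
    return {
        word: sorted(by_signature.get(signature, []))
        for word, signature in src_occ.items()
    }


candidates = match_by_occurrence(CORPUS)
for word, matches in candidates.items():
    print(word, "->", matches)
```

On this corpus the exact-match criterion pairs ‘ape’ with ‘works’, leaves ‘mets’ without a candidate (it spans ‘good’ in sentence 4 and ‘well’ in sentence 5), and cannot separate ‘punui’ from ‘pinjets’, which is why the slides bring in sentence position and POS information next.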
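The two steps used on the "Translate New Sentences: Ap - En" slide, lexical lookup followed by reordering, can be sketched as a toy rule-based translator. Everything here is an assumption made for illustration: the `LEXICON` table encodes the hypotheses from the slides (including ‘punui’ as ‘bad’ in adjective position), the tagger assumes the Apinaye order V (Adv) N (Adj), and the output order Det (Adj) N V (Adv) is hard-coded. It does not attempt the "better lexical choice" step, and unknown words raise a `KeyError`.

```python
# Hypothetical toy lexicon: Apinaye word -> {part of speech: English word}.
LEXICON = {
    "kukre": {"V": "eats"},
    "ape": {"V": "works"},
    "kokoi": {"N": "monkey"},
    "kra": {"N": "child"},
    "mi": {"N": "man"},
    "rats": {"Adj": "big", "Adv": "big"},
    "mets": {"Adj": "good", "Adv": "well"},
    "punui": {"Adj": "bad", "Adv": "badly"},
    "pinjets": {"Adj": "old"},
}


def tag(words):
    """Assign POS tags, assuming the Apinaye order V (Adv) N (Adj)."""
    tags = []
    seen_noun = False
    for position, word in enumerate(words):
        if position == 0:
            tags.append("V")  # verb comes first
        elif "N" in LEXICON[word]:
            tags.append("N")
            seen_noun = True
        else:
            # Modifiers before the noun are adverbs, after it adjectives.
            tags.append("Adj" if seen_noun else "Adv")
    return tags


def translate(sentence):
    """Look up each word by its tag, then emit English order Det (Adj) N V (Adv)."""
    words = sentence.lower().split()
    slots = {t: LEXICON[w][t] for w, t in zip(words, tag(words))}
    english_order = ["Adj", "N", "V", "Adv"]
    return " ".join(["The"] + [slots[t] for t in english_order if t in slots])


print(translate("Ape rats mi mets"))
print(translate("Kukre rats kokoi punui"))
```

The two calls reproduce the "Reordering information" lines from the slide, "The good man works big" and "The bad monkey eats big". Choosing "hard" or "a lot" instead of "big" would need information that this sketch does not model.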

