Berkeley COMPSCI 188 - Machine translation


CS 188: Artificial Intelligence
Spring 2006

Lecture 28: Machine Translation
5/2/2006
Dan Klein – UC Berkeley

Machine Translation: Examples
[Figure: example translations]

Levels of Transfer
[Figure: the Vauquois triangle. On the source side, the source text goes through morphological analysis (word structure), syntactic analysis (syntactic structure), semantic analysis (semantic structure), and semantic composition up to an interlingua. The target side mirrors this with semantic decomposition, semantic generation, syntactic generation, and morphological generation down to the target text. Transfer can cut across at any level: direct (word level), syntactic transfer, semantic transfer, or via the interlingua.]

General Approaches
- Rule-based approaches
  - Expert-system-style rewrite systems
  - Interlingua methods (analyze and generate)
  - Lexicons come from humans or dictionaries
  - Can be very fast, and can accumulate a lot of knowledge over time (e.g. Systran)
- Statistical approaches
  - Noisy channel systems
  - Lower-level transfer
  - Lexicons discovered using parallel corpora
  - Require little human declaration of knowledge

The Coding View
“One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’ ”
  Warren Weaver (1955:18, quoting a letter he wrote in 1947)

MT System Components
[Figure: noisy channel. A source P(e), the language model, generates English e; a channel P(f|e), the translation model, produces the observed French f; the decoder recovers the best e.]

$$e_{\text{best}} = \arg\max_e P(e \mid f) = \arg\max_e P(f \mid e)\,P(e)$$

- Finds an English translation which is both fluent (language model) and semantically faithful to the French source (translation model)

Language Models
- Language models
  - Any probabilistic model capable of assigning probabilities to sentences
  - Usually n-gram models, but also PCFGs
  - Exact same technology (and software) as in ASR
  - Trained on a huge collection of monolingual corpora (documents in the target language)
[Figure: a sentence generated word by word: START, w1, w2, ..., wn-1, STOP]

Parallel Corpora
- Parallel corpora (or bitexts)
  - Collection of source-target translation pairs
  - Main resource for learning a translation model
  - Either naturally occurring (e.g. parliamentary proceedings, news translation services) or commissioned

Building a Translation Model
- Steps in building a simple statistical translation model
  - Match up words in training sentence pairs (word alignment)
  - Learn a lexicon from these alignments
  - Learn larger phrases
[Figure: word alignment between "What is the anticipated cost of collecting fees under the new proposal?" and "En vertu de les nouvelles propositions, quel est le coût prévu de perception de les droits?"]

1-to-Many Alignments
[Figure]

Many-to-Many Alignments
[Figure]

The HMM Alignment Model
- The HMM model (Vogel 96)
  - Re-estimate using the forward-backward algorithm
  - Handling nulls requires some care
- Note: alignments are not provided, but induced
[Figure: jump widths -2, -1, 0, 1, 2, 3]

Examples: Translation and Fertility
[Figure]

Phrases vs Word Models
- "il hoche la tête" / "he is nodding"

Extracting Phrases
[Figure]

Basic Phrase-Based Model [Koehn et al, 2003]
[Figure: segmentation, translation, distortion]

Decoding
- Now we have a phrase table:
  - A huge list of translation phrases (e.g. 1M phrases)
  - Each phrase has a probability P(f|e)
- When we see a new input sentence:
  - Grow a translation left to right
  - Extend translation using known phrases
  - Also multiply by language model score

The Pharaoh Decoder
- Probabilities at each step include LM and TM

Some Output
Madame la présidente, votre présidence de cette institution a été marquante.
  Mrs Fontaine, your presidency of this institution has been outstanding.
  Madam President, president of this house has been discoveries.
  Madam President, your presidency of this institution has been impressive.

Je vais maintenant m'exprimer brièvement en irlandais.
  I shall now speak briefly in Irish .
  I will now speak briefly in Ireland .
  I will now speak briefly in Irish .

Nous trouvons en vous un président tel que nous le souhaitions.
  We think that you are the type of president that we want.
  We are in you a president as the wanted.
  We are in you a president as we the wanted.

Translations
- Even human translators aren’t perfect:
  - In an Austrian ski hotel: Not to perambulate the corridors in the hours of repose in the boots of ascension.
  - In a Copenhagen airline ticket office: We take your bags and send them in all directions.
  - From a brochure of a car rental firm in Tokyo: When passenger of foot heave in sight, tootle the horn. Trumpet him melodiously at first, but if he still obstacles your passage then tootle him with
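
The noisy-channel rule from the MT System Components slide, e_best = argmax_e P(f|e) P(e), combines a translation model with the kind of n-gram language model described on the Language Models slide. The minimal Python sketch below assumes a bigram model with START/STOP symbols and add-one smoothing (the lecture does not specify a smoothing method), plus a hypothetical translation-model table `tm` that scores a few candidate English sentences for the French input "il hoche la tête". The corpus, candidates, scores, and the helper names `train_bigram_lm` and `noisy_channel_best` are invented for illustration; a real decoder searches over translations instead of ranking a fixed list.

```python
import math
from collections import Counter

START, STOP = "<s>", "</s>"


def train_bigram_lm(corpus):
    """Train a bigram LM on whitespace-tokenized sentences.

    Add-one smoothing is an illustrative assumption; the lecture only
    says n-gram models are typical.
    """
    context_counts, bigram_counts = Counter(), Counter()
    vocab = {STOP}  # every word that can be predicted, including STOP
    for sentence in corpus:
        words = [START] + sentence.split() + [STOP]
        vocab.update(words[1:])
        for prev, cur in zip(words, words[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1

    def log_prob(sentence):
        """log P(e): sum of smoothed log P(w_i | w_{i-1}) from START to STOP.

        Unseen words still get a small nonzero probability (not properly
        normalized, which is acceptable for a toy ranking).
        """
        words = [START] + sentence.split() + [STOP]
        return sum(
            math.log((bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + len(vocab)))
            for prev, cur in zip(words, words[1:])
        )

    return log_prob


def noisy_channel_best(candidates, tm_log_prob, lm_log_prob):
    """Return argmax_e [log P(f|e) + log P(e)] over a fixed candidate list."""
    return max(candidates, key=lambda e: tm_log_prob[e] + lm_log_prob(e))


# Hypothetical English (target-language) training data for the LM.
english_corpus = ["he is nodding", "she is nodding", "he is here", "the head is here"]
lm = train_bigram_lm(english_corpus)

# Hypothetical log P(f|e) for f = "il hoche la tête"; a word-based TM
# gives the same score to both word orders of "he is nodding".
tm = {
    "he is nodding": math.log(0.2),
    "nodding is he": math.log(0.2),
    "he nods the head": math.log(0.4),
}
best = noisy_channel_best(list(tm), tm, lm)
print(best)  # prints: he is nodding
```

Because the hypothetical translation model scores both word orders equally, it is the language model that rejects "nodding is he", which is the fluency role the slide assigns to P(e).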

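The HMM Alignment Model slide (Vogel 96) says alignments are induced rather than given, and are re-estimated with the forward-backward algorithm. The sketch below computes the E-step posteriors only, under these assumptions: each French word f_j aligns to exactly one English position a_j, the transition P(a_j | a_{j-1}) depends only on the jump width a_j - a_{j-1}, the first alignment is uniform, and there is no NULL word (the slide notes that nulls need extra care). The translation table `t`, the jump weighting passed as `jump_weight`, the function name `alignment_posteriors`, and the example sentence pair are all hypothetical. A full EM loop would re-estimate `t` and the jump distribution from these posteriors, and would rescale alpha and beta to avoid underflow on long sentences.

```python
def alignment_posteriors(f_words, e_words, t, jump_weight, floor=1e-6):
    """Return post[j][i] = P(a_j = i | f, e) under an HMM alignment model.

    t[(f, e)]: translation probability t(f | e); unseen pairs get `floor`.
    jump_weight(d): unnormalized score for jump width d = a_j - a_{j-1}.
    No NULL word and no rescaling: suitable for short toy sentences only.
    """
    I, J = len(e_words), len(f_words)
    # trans[k][i] = P(a_j = i | a_{j-1} = k), normalized over the I positions.
    trans = []
    for k in range(I):
        row = [jump_weight(i - k) for i in range(I)]
        total = sum(row)
        trans.append([w / total for w in row])
    # emit[j][i] = t(f_j | e_i)
    emit = [[t.get((f, e), floor) for e in e_words] for f in f_words]

    # Forward pass: alpha[j][i] = P(f_1..f_j, a_j = i), uniform start.
    alpha = [[emit[0][i] / I for i in range(I)]]
    for j in range(1, J):
        alpha.append([
            emit[j][i] * sum(alpha[j - 1][k] * trans[k][i] for k in range(I))
            for i in range(I)
        ])

    # Backward pass: beta[j][i] = P(f_{j+1}..f_J | a_j = i).
    beta = [[1.0] * I for _ in range(J)]
    for j in range(J - 2, -1, -1):
        beta[j] = [
            sum(trans[i][k] * emit[j + 1][k] * beta[j + 1][k] for k in range(I))
            for i in range(I)
        ]

    likelihood = sum(alpha[J - 1])  # P(f | e) under the model
    return [[alpha[j][i] * beta[j][i] / likelihood for i in range(I)] for j in range(J)]


# Hypothetical parameters: a jump of +1 (monotone order) is most likely.
english = ["the", "green", "house"]
french = ["la", "maison", "verte"]
t = {("la", "the"): 0.7, ("maison", "house"): 0.8, ("verte", "green"): 0.8}
post = alignment_posteriors(french, english, t, lambda d: 2.0 ** -abs(d - 1))
for f, row in zip(french, post):
    print(f, "->", english[max(range(len(english)), key=row.__getitem__)])
```

Each row of `post` sums to 1; in EM these posteriors act as fractional alignment counts. Here the strong translation probabilities outweigh the monotone jump preference, so "maison" and "verte" are aligned out of order.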

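The Extracting Phrases and Basic Phrase-Based Model slides [Koehn et al, 2003] build a phrase table from word-aligned sentence pairs. A common consistency criterion keeps a phrase pair if it contains at least one alignment link and no word inside the pair is linked to a word outside it. The sketch below applies that criterion to the slide's "il hoche la tête" / "he is nodding" pair, using a hypothetical word alignment and a hypothetical helper `extract_phrases`. It simplifies the usual algorithm by not extending phrases over unaligned boundary words.

```python
def extract_phrases(e_words, f_words, alignment, max_len=4):
    """Return phrase pairs (english, french) consistent with `alignment`.

    alignment: set of (e_index, f_index) links. A pair is consistent if no
    link connects a word inside the pair to a word outside it. Unaligned
    boundary words are not added (a simplification of the usual algorithm).
    """
    pairs = set()
    for e_start in range(len(e_words)):
        for e_end in range(e_start, min(e_start + max_len, len(e_words))):
            # French positions linked to the English span [e_start, e_end].
            linked = [f for (e, f) in alignment if e_start <= e <= e_end]
            if not linked:
                continue
            f_start, f_end = min(linked), max(linked)
            if f_end - f_start + 1 > max_len:
                continue
            # Reject if a French word in the span is linked outside the English span.
            if any(f_start <= f <= f_end and not e_start <= e <= e_end
                   for (e, f) in alignment):
                continue
            pairs.add((" ".join(e_words[e_start:e_end + 1]),
                       " ".join(f_words[f_start:f_end + 1])))
    return pairs


english = "he is nodding".split()
french = "il hoche la tête".split()
# Hypothetical alignment: he-il, is-hoche, nodding-{hoche, la, tête}.
links = {(0, 0), (1, 1), (2, 1), (2, 2), (2, 3)}
for pair in sorted(extract_phrases(english, french, links)):
    print(pair)
```

"is" alone is rejected because "hoche" is also linked to "nodding"; this is how a many-to-many alignment forces the larger pair "is nodding" / "hoche la tête" into the phrase table, which a word-by-word model could not represent.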