Speech Processing 15-492/18-492

Speech Translation
- Three part systems
    - ASR --> Translation --> TTS
- System configurations
    - One way: phrasal
    - One way: broadcast/lecture
    - 1.5 way: phrasal with limited answers
    - Two way: full two way

Machine Translation Technologies
- Phrasal
    - Phrase to phrase lookup
- Template
    - Template fillers, fixed translation
- Interlingua
    - Translation into a meaning representation
- Statistical Machine Translation
    - From a large collection of parallel text
- Classification-based translation
    - Identify classes and deal directly with them

Choices in Translation
- Choose any two …
    - High accuracy
    - Large vocabulary
    - Fully automatic
- Speech vs Text
    - Speech is less clear than text
    - Less speech to train from
    - Needs to be real-time (probably)

Simple Translation
- Phrase to Phrase
    - Greetings
    - Do you need medical attention?
    - Relatively easy to build, but limited use
- Template translations
    - The next train leaves at TIME from gate GATE from PLACE
    - Limited but still useful

Interlingua
- Translate sentences into a standard form
- Generate sentences from the standard form
- PROS:
    - Can do multiple languages easily
    - Can be very accurate
- CONS:
    - Designing a universal interlingua is very hard
    - Doesn’t do well when out of domain

Statistical Machine Translation
- Build probabilistic models from parallel text
- Parallel text often available from
    - Bilingual organizations
    - Governments, UN
- Relatively easy to collect
    - Requires translators rather than MT experts

Learning from Parallel Text

Statistical Machine Translation
- PROS:
    - Data collection doesn’t require MT experts
    - Data driven
    - Degrades gracefully when out of domain
- CONS:
    - Needs all language pairs
    - Needs good/lots of data
    - Hard to fix specific errors

SPEECH Translation
- Speech isn’t text
    - Different style, hard to find lots of examples
- Speech isn’t fluent
    - False starts, hesitations, ungrammatical
- ASR never makes errors ☺

One Way: Broadcast
- One speaker
    - Lecturer: can modify the language model
- Multiple speakers
    - May be repeat speakers (news anchor)
    - May have other noises: music etc. (TV programs)
- Doesn’t need to be real time (maybe)

Two Way: Dialog
- Users can detect their own errors and correct them
- Needs to be real time
- One user may be much more familiar with the system
    - How do you teach the other user?
- Typically domain directed

Speech Technology Issues
- ASR:
    - Disfluencies, dialects, speaking style
    - Unfamiliarity with system
- TTS:
    - MT output isn’t always fluent
    - TTS says it anyway
    - Can be hard to understand

Speech Technology Issues
- Spoken not Written Languages
    - Arabic vs Arabic Dialects
- Mixture of languages
- Politeness levels
- Gender in speech
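
The phrase-to-phrase lookup and template translation from the Simple Translation slide can be sketched in a few lines. This is only an illustrative sketch, not the course's implementation: the names `PHRASES`, `TEMPLATES`, `normalize` and `translate` are hypothetical, the English-to-Spanish entries are made-up examples, and the slot fillers (TIME, GATE, PLACE) are assumed to be copied through untranslated.

```python
import re

# Hypothetical phrase table: whole source phrase -> fixed target phrase.
PHRASES = {
    "hello": "hola",
    "do you need medical attention?": "¿necesita atención médica?",
}

# Hypothetical templates: a source pattern with named slots, paired with a
# fixed target sentence that reuses the slot values unchanged.
TEMPLATES = [
    (
        re.compile(
            r"^the next train leaves at (?P<TIME>\S+) "
            r"from gate (?P<GATE>\S+) from (?P<PLACE>.+)$",
            re.IGNORECASE,
        ),
        "El próximo tren sale a las {TIME} de la puerta {GATE} desde {PLACE}",
    ),
]


def normalize(text):
    """Lowercase and collapse whitespace so lookups tolerate minor variation."""
    return " ".join(text.lower().split())


def translate(text):
    """Try an exact phrase match first, then the templates; None if neither fits."""
    key = normalize(text)
    if key in PHRASES:
        return PHRASES[key]
    cleaned = " ".join(text.split())  # keep original casing for slot values
    for pattern, target in TEMPLATES:
        match = pattern.match(cleaned)
        if match:
            return target.format(**match.groupdict())
    return None  # out of coverage: the "limited use" noted on the slide


if __name__ == "__main__":
    print(translate("Do you need medical attention?"))
    print(translate("The next train leaves at 10:30 from gate 4 from Pittsburgh"))
    print(translate("Where is the library?"))
```

Anything outside the table or the templates returns `None`, which is the trade-off the slide points to: such systems are easy to build and accurate within their coverage, but the coverage is small.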