Speech Processing 15-492/18-492
Speech Translation
Case study: Transtac Details

Transtac: Two-Way S2S System
- DARPA program, developed for
  - Checkpoints, medical and civil defense
- Requirements
  - Two way
  - Eyes-free (no screen)
  - Portable
  - Usable by real users

Transtac System
- Laptop secured in backpack
- Optional speech control
- Push-to-talk buttons
- Close-talking microphone
- Small powerful speakers

Transtac System Details
- Two way system
  - 2 ASR systems: English and Iraqi
  - 2 way statistical translation
  - 2 synthesizers
- Push-to-talk system
  - (Users don't like "translate everything mode")
- Echo back ASR result
  - And then translation

Iraqi Language
- Iraqi Arabic is a dialect
  - Most Iraqis write Modern Standard Arabic
  - Most Iraqis do not write their own dialect
- No standardized spelling
  - Transtac project invented one
  - But Iraqis may not be used to it
- Arabic (MSA and dialects)
  - Do not write short vowels in words

Data for Training
- Collected human mediated dialogs
  - Human acts as a machine
  - Passed a microphone back and forth
  - Try to get people not to talk at the same time
- Large number of collections (over 4 years)
  - 650 thousand sentence pairs
  - Many different speakers
- Hand transcribed by experts (in Iraqi spelling)
- Hand translated (source sentences and interpreter's)

Iraqi ASR
- Acoustic model from Iraqi data
  - Based on MSA phoneset
  - Needs to be small fast models
  - Discriminative training
  - Speaker specific adaptation
- Lexicon
  - Based on LDC provided lexicon
  - Multiple pronunciations/typos still a problem
  - Statistically trained LTS rules
- Language model
  - Trained on Iraqi input (and translated output)

English ASR
- Acoustic model
  - Originally using other models
  - Then trained from collected data
  - (Mostly military personnel)
- Lexicon
  - Existing lexicon but needed to add military speak: MRAP, IED
- Language model
  - Trained from data provided
  - Trained from "similar" data found on the web
  - Trained from hand created "typical" examples

TTS
- Standard English TTS
  - Appropriate "command" voice
  - Unit selection
  - Added lots of military vocabulary
- Iraqi TTS
  - Recorded from Iraqi radio announcer
  - Based on example sentences in the domain
  - LDC lexicon and LTS rules (same as ASR)
  - Hand tuned

S2S Interface Issues
- How do you teach people to use the system
  - "Transtac say instructions"
  - Not really sufficient
- How can you tell it translated correctly
  - Give (speech) feedback
  - Backtranslation
  - ASR echo back

S2S Interface Issues
- How do you translate names
  - A correct translation/transliteration is hard to understand
  - Mark names in translations
  - "My name is … Abdullah"
  - "He lives on … al-Aqar … street"

S2S Evaluation (Transtac)
- Offline tests
  - ASR->Text and Text->Text
  - Compare to translation references
  - WER and "BLEU" score
- Online tests
  - Concept transfer (through defined scenarios)
  - Speed (number of concepts per minute)
  - (English speech masking)
- Utility tests
  - Does it really work

Transtac Participants
- Developer groups
  - IBM
  - SRI
  - BBN
  - CMU
  - USC
- Evaluations
  - Twice a year in Iraqi (somewhere in DC)
  - One surprise language (Farsi, Bahasa Malay)
  - Other evaluations with military groups

Does it work??
- Yes, mostly
  - 27 concepts out of 30-ish turns
- Systems are mostly similar
  - But some better than others
- Other techniques
  - Belt/holster based PC with handheld speaker
  - Small PC in pouch
  - Chest mounted array microphone

S2S ASR Advanced issues
- Tight
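
The offline tests in the evaluation slide score ASR output with WER. As a rough illustration of what that number measures, here is a minimal sketch assuming the usual definition (word-level edit distance divided by the number of reference words) and plain whitespace tokenization. The function name `word_error_rate` and the example sentences are hypothetical and are not taken from the Transtac evaluation tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("where is the checkpoint", "where is a checkpoint"))
```

Because the count is normalized by the reference length, a hypothesis with many inserted words can score above 1.0. Real scoring setups usually also normalize case and punctuation before comparing, which this sketch does not attempt.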