Speech Processing 15-492/18-492

Speech Processing: Current Topics and Future Challenges
Commercial and Research

Current and Future
  - What are the hot topics in speech?
  - What currently works?
  - What could work soon (5-10 years)?
  - What are the industry hot topics?
  - What are the research challenges?

Spoken Dialog: Now
  Industry:
  - Location-based querying
  - Google: 411, smartphone
  - Microsoft Live Search: smartphone
  - Yahoo (Vlingo)
  - BlackBerry, iPhone (owners have money)
  - How do you make money out of this…

Spoken Dialog: Now
  Research:
  - Error recovery
  - Adaptive systems
  - Rapid deployment
  - Learning dialog structure from data

ASR: Now
  Industry:
  - Moving from grammar-based to N-gram-based recognition
  - Broadcast news transcription for IR (information retrieval)
  - Robust speech recognition:
    - In the car, outdoors, in a noisy office
  - LM adaptation from other sources:
    - Using click-through data and search queries
  - Pronunciation variants ("wrong" ones too)
  - Medical transcription

ASR: Now
  Research:
  - Discriminative training:
    - Acoustic parameter projections that discriminate between the correct answers and their competitors
  - Robust recognition:
    - Far-field microphones
    - Blind source separation
  - Out-of-vocabulary words
  - Unsupervised training

TTS: Now
  Industry:
  - Building custom voices (including your own voice)
  - Multilingual synthesis on small devices:
    - e.g. for GPS navigation across Europe
  - Easy methods to build new languages

TTS: Now
  Research:
  - Improving statistical synthesis
  - Rapid support for new languages
  - Emotional speech synthesis
  - Automatic building of voices from data:
    - Without any human intervention
  - Synthesis beyond the sentence
  - Synthesis with more text analysis

Speech-to-Speech Translation
  Industry:
  - One-way systems, domain-limited systems
  - Simple, targeted cell phone systems
  Research:
  - Two-way systems, large domains
  - One-way lecture/broadcast news translation

VC and SID: Now
  Voice conversion:
  - Cross-lingual voice conversion
  - Emotion/style conversion
  - Conversion without training data
  Speaker ID:
  - Accuracy on large data sets (> 1000 speakers)
  - Cross-channel/cross-language ID
  - More information in ID (prosody, vocabulary)

CALL: Now
  Industry:
  - Pronunciation training
  - Scenario practice
  Research:
  - Game-based tools
  - Measuring educational contribution

Speech Processing Future
  - Hard challenges (PhD topics and beyond)
  - All on the research side
    - But maybe in research labs

Speech Recognition without Speech
  - Using other modalities:
    - Lip movement, muscle movement
  - Silent speech:
    - No generated audio
    - Just think about the words
  - Gesture recognition

Conversational Systems
  - Participant in a meeting
  - True conversational speech
  - Appropriate non-word speech generation:
    - Know when to speak, when to laugh, when to listen
  - Appropriate conversational timing:
    - Able to interrupt when it has something to say
  - Have something to say

Summaries and Discussions
  - Describe a paper/movie/event
    - Appropriate summary
    - Allow questions
    - Know when to use style/emotion
  - Not just speech <-> text
    - Understand more of the text content

Final Notes
  - Don't forget to fill in the Faculty Course Evaluation
  - Final homework due:
    - Monday 8th, 3:30pm
  - Final exam:
    - Tuesday 16th, 1pm-4pm