CMU CS 15492 - Speech Processing: Current Topics and Future Challenges, Commercial and Research

School name Carnegie Mellon University

Course Cs 15492- Special Topic: Speech Processing

Pages 16

This preview shows page 1-2-3-4-5 out of 16 pages.

Speech Processing 15-492/18-492
Speech Processing
Current Topics and Future challenges
Commercial and Research

Current and Future
- What are the hot topics in Speech
- What currently works
- What could work soon (5-10 years)
- What are the industry hot topics
- What are the research challenges

Spoken Dialog: Now
Industry:
- Location based querying
  - Google: 411, smartphone
  - Microsoft Live Search: smartphone
  - Yahoo (Vlingo)
- Blackberry, iPhone
  - (Owners have money)
- How do you make money out of this …

Spoken Dialog: Now
Research
- Error recovery
- Adaptive systems
- Rapid deployment
- Learning dialog structure from data

ASR: Now
Industry
- Moving from grammar based to N-gram based
- Broadcast news transcription of IR
- Robust speech recognition:
  - In car, outside, in noisy office
- LM adaptation from other sources
  - Using click through and search queries
- Pronunciation variants (“wrong” ones too)
- Medical transcription

ASR: Now
Research:
- Discriminative training
  - Acoustic parameter projections to discriminate between the correct answers and competitors
- Robust recognition
  - Far field microphones
  - Blind source separation
- Out of vocabulary words
- Unsupervised training

TTS: Now
Industry
- Building custom voices (and your voice)
- Multilingual on small devices
  - E.g. for GPS Navigation over Europe
- Easy methods to build new languages

TTS: Now
Research
- Improving statistical synthesis
- Rapid support in new languages
- Emotional speech synthesis
- Automatic building of voices from data
  - Without any human intervention
- Synthesis beyond the sentence
- Synthesis with more text analysis

Speech to Speech Translation
Industry
- One way systems, domain limited systems
- Simple targeted cell phone systems
Research
- Two way systems, large domains
- One way lecture/broadcast news

VC and SID: Now
Voice conversion
- Cross Lingual Voice Conversion
- Emotion/style conversion
- Conversion without training data
Speaker ID
- Accuracy on large data sets (> 1000 speakers)
- Cross channel/language ID
- More information in ID (prosody, vocab)

CALL: Now
Industry
- Pronunciation training
- Scenario practicing
Research
- Game based tools
- Measuring educational contribution

Speech Processing Future
- Hard challenges (PhD topics and beyond)
- All on the research side
- But maybe in Research Labs

Speech Reco without Speech
- Using other modalities
  - Lip movement, muscle movement
- Silent speech
  - No generated audio
  - Just think about the words
- Gesture recognition

Conversational Systems
- Participant in a meeting
- True conversational speech
- Appropriate non-word speech generation
- Know when to speak, when to laugh, when to listen
- Appropriate timing conversation
- Able to interrupt when having something to say
- Have something to say

Summaries and Discussions
- Describe a paper/movie/event
- Appropriate summary
- Allow questions
- Know when to use style/emotion
- Not just speech<->text
- Understand more of the text content

Final Notes
- Don’t forget to fill in Faculty Course Evaluation
- Final Homework due
  - Monday 8th 3:30pm
- Final exam
  - Tuesday 16th 1pm-4pm
