CMU CS 15492 - review

This preview shows page 1-2-3-4-5-6 out of 17 pages.



Speech Processing 15-492/18-492
Review: ASR, TTS, Dialog, S2S, VC, SID and CALL

Speech Overview
    ASR
        Automatic Speech Recognition (AM and LM)
    TTS
        Text to speech: unit selection and statistical parametric synthesis
    Dialog
        Spoken dialog systems: VoiceXML, direct and mixed initiative dialogs
    VC
        Voice conversion, transformation, morphing
    SID
        Speaker ID, speaker recognition
    CALL
        Computer Aided Language Learning
    S2S
        Speech to Speech translation

ASR
    Acoustic models
        Acoustic models (usually HMMs)
        Modeling all ways to say each phoneme
    Language models
        Modeling word sequence likelihoods
        Trigrams and grammars

ASR and Bayes rule
    The recognizer looks for the word sequence W that is most likely given the acoustics A. By Bayes rule:
        $\hat{W} = \arg\max_W P(W \mid A) = \arg\max_W P(A \mid W)\,P(W)$
    (P(A) does not depend on W, so it drops out of the maximization.)
        $P(A \mid W)$: acoustic model
        $P(W)$: language model

ASR Evaluation
    WER
        Word error rate vs. accuracy
    What is the expected/acceptable WER of
        Dictation
        Dialog systems
        Speech IR
        Conversational speech with a far-field microphone with multiple overlapping non-native speakers (who know each other) with heavy vehicle traffic in the background

TTS
    Text analysis
        Homographs, symbols, expansion
    Linguistic analysis
        Pronunciation lexicons
        Prosody: breaks, intonation, duration
    Waveform synthesis
        Formant synthesis, concatenative synthesis, statistical parametric synthesis

Waveform Synthesis
    Diphones
        Mid-phone to mid-phone speech units
    Unit selection
        Selecting appropriate sub-word units from large databases of natural speech
    Statistical parametric speech synthesis
        Build speech model of "averages" of similar speech
    Limited domain synthesis
        Targeted synthesis

TTS Evaluation
    Yes, that sounds like a robot
    Human listening tests
        MOS scale for "likable"
        SUS sentences for understandability
    Human personal preferences

Spoken Dialog Systems
    VoiceXML (and SALT)
        Tree-based dialog systems
    Olympus
        More general dialog systems
    System types:
        System initiative
        Mixed initiative
        HMIHY (How may I help you)

Spoken Dialog System Evaluation
    Task completion
    Call length
    Number of turns
    (Number of calls)
    Break down by
        New/repeat callers
        Different usage types

New Languages
    Text examples
        For finding nice prompts
        For building language models
    Phoneme definitions
    Pronunciation lexicon
    Recordings
        Lots for ASR, one good one for TTS

Speech to Speech
    Real time
    Targeted/wide vocabulary
    Speech not text
    Often resource-limited target language
        Need a written form, and collect own data

Voice Conversion
    Convert source speech to target speaker
        Small amount of target speaker data (e.g. 30 utterances)
        GMM-based models
    Uses
        Speaker conversion
        Style conversion
        Cross-lingual voice conversion
        De-identification
    Evaluation
        Listening
        Speaker ID systems

Speaker ID
    Speaker recognition
        Who is speaking
        Security, password access
        Diarization (who is speaking in a meeting)
    Speaker, language, dialect, style ID
    Techniques
        GMM and phone-based techniques

CALL
    Computer aided language learning
    Reading tutors
        First and second language learners
    Second language learners
        Pronunciation trainers
        Fluency practice
        Interactive scenario experience

